---
description: 'Instructions on how to attach cloud storage using UI'
---

In CVAT you can use AWS-S3 and Azure Blob Container cloud storages to store image datasets for your tasks.

First, you need to create a manifest file for your image dataset. Information on how to do that is available
on the [Simple command line to prepare dataset manifest file](/docs/manual/advanced/dataset_manifest/) page.

After the manifest file has been created, you can upload it and your dataset to an AWS-S3 or
Azure Blob Container cloud storage.

## Using AWS-S3
### Create AWS account

First, you need to create an AWS account. To do this, [register](https://portal.aws.amazon.com/billing/signup#/start)
following the five-step instructions
(even if you plan to use a free basic account, you may need to link a credit card to verify your identity).

To learn more about the operation and benefits of the AWS cloud,
take the free [AWS Cloud Practitioner Essentials](https://www.aws.training/Details/eLearning?id=60697) course,
which will be available after registration.

### Create a bucket

After the account is created, go to the [AWS-S3 console](https://s3.console.aws.amazon.com/s3/home)
and click `Create bucket`.

You'll be taken to the bucket creation page. Here you have to specify the bucket name and region;
optionally, you can copy the settings of another bucket by clicking the `Choose bucket` button.
The `Block all public access` checkbox can stay enabled, since we will use an
`access key ID` and `secret access key` to gain access.
In the remaining sections you can leave the default settings and click `Create bucket`.
After you create the bucket, it will appear in the list of buckets.

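If you prefer the command line, the same bucket can be created with the [AWS CLI](https://aws.amazon.com/cli/). This is a sketch, assuming the CLI is installed and configured with your credentials; the bucket name and region below are placeholders:

```
# create a bucket in a chosen region (bucket names are globally unique)
aws s3 mb s3://your-cvat-bucket --region us-east-1

# confirm the bucket now appears in your list of buckets
aws s3 ls
```
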
### Create user and configure permissions

To access the bucket, you will need to create a user. To do this, go to
[IAM](https://console.aws.amazon.com/iamv2/home#/users) and click `Add users`.
Choose the AWS access type that provides an access key ID and secret access key (programmatic access).

After pressing the `Next` button to configure permissions, you need to create a user group.
To do this, click `Create group`, enter the `group name`, and select the `AmazonS3ReadOnlyAccess`
permission policy using the search
(if you want the new user to have write access to the bucket, select `AmazonS3FullAccess` instead).

You can also add tags for the user (optional) and review the entered data. In the last step of creating a user,
you will be provided with the `access key ID` and `secret access key`;
they will need to be used in CVAT when adding the cloud storage.

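The same user setup can be sketched with the AWS CLI, assuming it is installed and configured with administrator credentials; the user and group names below are placeholders:

```
# create a group with read-only access to S3 and add a new user to it
aws iam create-group --group-name cvat-readers
aws iam attach-group-policy --group-name cvat-readers \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
aws iam create-user --user-name cvat-user
aws iam add-user-to-group --group-name cvat-readers --user-name cvat-user

# generate the access key ID / secret access key pair needed by CVAT
aws iam create-access-key --user-name cvat-user
```
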
### Upload dataset

For example, let's take [The Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/):

- Download the [archive with images](https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz).
- Unpack the archive into a prepared folder
  and create a manifest file as described in the [prepare manifest file section](/docs/manual/advanced/dataset_manifest/):

  ```
  python <cvat repository>/utils/dataset_manifest/create.py --output-dir <yourfolder> <yourfolder>
  ```

- When the manifest file is ready, open the previously prepared bucket and click `Upload`:

- Drag the manifest file and the image folder onto the page and click `Upload`:

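The download and upload steps above can also be sketched as shell commands, assuming `curl`, `tar`, and a configured AWS CLI are available; `your-cvat-bucket` is a placeholder:

```
# download and unpack the archive with images
curl -LO https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
mkdir -p pets && tar -xzf images.tar.gz -C pets

# create the manifest file next to the images
python <cvat repository>/utils/dataset_manifest/create.py --output-dir pets pets

# upload the images and the manifest to the bucket
aws s3 sync pets s3://your-cvat-bucket
```
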
## Attach new cloud storage

After you upload the dataset and manifest file to an AWS-S3 or Azure Blob Container,
you will be able to attach a cloud storage. To do this, press the `Attach new cloud storage`
button on the `Cloud storages` page and fill out the following form:



- `Display name` - the display name of the cloud storage; it will be used as the name
  of an item on the cloud storages page.
- `Provider` - choose the provider of the cloud storage:
  - [AWS-S3](#using-aws-s3):
    - [`Bucket`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket) - cloud storage bucket name
    - [`Authorization type`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-best-practices.html):
      - `Key id and secret access key pair` - available on the [IAM](https://console.aws.amazon.com/iamv2/home?#/users) page:
        - `ACCESS KEY ID`
        - `SECRET ACCESS KEY ID`
      - `Anonymous access` - for anonymous access, you need to enable public access to the bucket
    - `Region` - here you can choose a region from the list or add a new one. To get more information, click
      [`?`](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions)
    - `Manifest` - the path to the manifest file.
      For more information click on [`?`](/docs/manual/advanced/dataset_manifest/).

To publish the cloud storage, click `submit`, after which it will be available on
the [Cloud storages page](/docs/manual/basics/cloud-storages/).

## Using AWS Data Exchange

### Subscribe to data set

You can use AWS Data Exchange to add image datasets.
For example, consider adding the `500 Image & Metadata Free Sample` set of datasets.
Go to [Browse catalog](https://console.aws.amazon.com/dataexchange) and use the search to find
`500 Image & Metadata Free Sample`, open the dataset page and click `Continue to subscribe`.
You will be taken to the complete subscription request page; read the information provided
and click `Send subscription request to provider`.

### Export to bucket

After that, this dataset will appear in the
[list of subscriptions](https://console.aws.amazon.com/dataexchange/home/subscriptions#/subscriptions).
Now you need to export the dataset to `Amazon S3`.
First, create a new bucket as [described above](#create-a-bucket).
To export one of the datasets to the new bucket, open its `Entitled data`, select one of the data sets,
select the corresponding revision, and click `Export to Amazon S3`
(please note that if the bucket and dataset are located in different regions, export fees may apply).
In the window that appears, select the created bucket and click `Export`.

### Prepare manifest file

Now you need to prepare a manifest file. I used the [AWS CLI](https://aws.amazon.com/cli/) and the
[script for preparing the manifest file](https://github.com/openvinotoolkit/cvat/tree/develop/utils/dataset_manifest).
Perform the installation using the [aws-shell](https://github.com/awslabs/aws-shell) manual;
I used `aws-cli 1.20.49`, `Python 3.7.9`, and `Windows 10`.
You can configure your credentials by running `aws configure`.
You will need to enter your `Access Key ID` and `Secret Access Key`, as well as the region:

```
aws configure
Access Key ID: <your Access Key ID>
Secret Access Key: <your Secret Access Key>
Default region name: <your region>
```


Copy the content of the bucket to a folder on your computer:

```
aws s3 cp <s3://bucket-name> <yourfolder> --recursive
```


After copying the files, you can create a manifest file as described in the
[prepare manifest file section](/docs/manual/advanced/dataset_manifest/):

```
python <cvat repository>/utils/dataset_manifest/create.py --output-dir <yourfolder> <yourfolder>
```

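Before uploading, you can sanity-check that the resulting `manifest.jsonl` is valid line-delimited JSON. This is a minimal sketch; the toy lines below only illustrate the format, and the field names in them are illustrative:

```
# build a toy manifest for illustration only
printf '%s\n' \
  '{"version":"1.1"}' \
  '{"type":"images"}' \
  '{"name":"Abyssinian_1","extension":".jpg"}' > manifest.jsonl

# every line must parse as JSON; the command fails on the first invalid line
python3 -c "import json; [json.loads(l) for l in open('manifest.jsonl')]" && echo "manifest OK"
```
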

When the manifest file is ready, you can upload it to the AWS-S3 bucket. If you granted full write permissions
when you created the user, run:

```
aws s3 cp <yourfolder>/manifest.jsonl <s3://bucket-name>
```


If you granted read-only permissions, upload the manifest through the browser instead:
open the bucket, click `Upload`, drag the manifest file onto the page and click `Upload`.

Now you can [attach a new cloud storage](#attach-new-cloud-storage) using the `500 Image & Metadata Free Sample` dataset.