diff --git a/site/content/en/docs/manual/advanced/dataset_manifest.md b/site/content/en/docs/manual/advanced/dataset_manifest.md index 5d2c1f2f..64db6415 100644 --- a/site/content/en/docs/manual/advanced/dataset_manifest.md +++ b/site/content/en/docs/manual/advanced/dataset_manifest.md @@ -123,7 +123,7 @@ A maifest file contains some intuitive information and some specific like: ```json {"version":"1.0"} {"type":"images"} -{"name":"image1","extension":".jpg","width":720,"height":405,"checksum":"548918ec4b56132a5cff1d4acabe9947"} -{"name":"image2","extension":".jpg","width":183,"height":275,"checksum":"4b4eefd03cc6a45c1c068b98477fb639"} -{"name":"image3","extension":".jpg","width":301,"height":167,"checksum":"0e454a6f4a13d56c82890c98be063663"} +{"name":"image1","extension":".jpg","width":720,"height":405,"meta":{"related_images":[]},"checksum":"548918ec4b56132a5cff1d4acabe9947"} +{"name":"image2","extension":".jpg","width":183,"height":275,"meta":{"related_images":[]},"checksum":"4b4eefd03cc6a45c1c068b98477fb639"} +{"name":"image3","extension":".jpg","width":301,"height":167,"meta":{"related_images":[]},"checksum":"0e454a6f4a13d56c82890c98be063663"} ``` diff --git a/site/content/en/docs/manual/basics/attach-cloud-storage.md b/site/content/en/docs/manual/basics/attach-cloud-storage.md index 70c8a930..86117f86 100644 --- a/site/content/en/docs/manual/basics/attach-cloud-storage.md +++ b/site/content/en/docs/manual/basics/attach-cloud-storage.md @@ -6,13 +6,72 @@ description: 'Instructions on how to attach cloud storage using UI' --- In CVAT you can use AWS-S3 and Azure Blob Container cloud storages to store image datasets for your tasks. -Initially you need to create a manifest file for your image dataset. Information on how to do that is available -on the [Simple command line to prepare dataset manifest file](/docs/manual/advanced/dataset_manifest) page. 
-After the manifest file has been created, you can upload it and your dataset to an AWS-S3 or
-Azure Blob Container cloud storage.
+## Using AWS-S3
-After that you will be able to attach a cloud storage. To do this, press the `Attach new cloud storage`
+### Create AWS account
+
+First, you need to create an AWS account. To do this, [complete the 5-step registration](https://portal.aws.amazon.com/billing/signup#/start)
+following the instructions
+(even if you plan to use a free basic account, you may need to link a credit card to verify your identity).
+
+To learn more about the operation and benefits of the AWS cloud,
+take the free [AWS Cloud Practitioner Essentials](https://www.aws.training/Details/eLearning?id=60697) course,
+which becomes available after registration.
+
+### Create a bucket
+
+After the account is created, go to the [AWS-S3 console](https://s3.console.aws.amazon.com/s3/home)
+and click `Create bucket`.
+
+![](/images/aws-s3_tutorial_1.jpg)
+
+You'll be taken to the bucket creation page. Here you have to specify the bucket name and region;
+optionally, you can copy the settings of another bucket by clicking the `Choose bucket` button.
+The `Block all public access` checkbox can be left enabled, since we will use an `access key ID` and `secret access key` to gain access.
+In the following sections, you can leave the default settings and click `Create bucket`.
+After you create the bucket, it will appear in the list of buckets.
+
+### Create user and configure permissions
+
+To access the bucket, you will need to create a user. To do this, go to [IAM](https://console.aws.amazon.com/iamv2/home#/users)
+and click `Add users`. Choose the AWS access type that gives you an access key ID and secret access key.
+
+![](/images/aws-s3_tutorial_2.jpg)
+
+After pressing the `Next` button to configure permissions, you need to create a user group.
+To do this, click `Create group`, input the `group name`, and select the permission policy `AmazonS3ReadOnlyAccess`
+using the search (if you want the user you create to have write access to the bucket, select `AmazonS3FullAccess`).
+
+![](/images/aws-s3_tutorial_3.jpg)
+
+You can also add tags for the user (optional) and review the entered data. In the last step of creating a user,
+you will be provided with an `access key ID` and `secret access key`;
+you will need them in CVAT when adding cloud storage.
+
+![](/images/aws-s3_tutorial_4.jpg)
+
+### Upload dataset
+
+For example, let's take [The Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/):
+- Download the [archive with images](https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz).
+- Unpack the archive into the prepared folder
+  and create a manifest file as described in the [prepare manifest file section](/docs/manual/advanced/dataset_manifest/):
+  ```
+  python <cvat repository>/utils/dataset_manifest/create.py --output-dir <output directory> <dataset directory>
+  ```
+- When the manifest file is ready, open the previously prepared bucket and click `Upload`:
+
+  ![](/images/aws-s3_tutorial_5.jpg)
+
+- Drag the manifest file and image folder onto the page and click `Upload`:
+
+  ![](/images/aws-s3_tutorial_1.gif)
+
+## Attach new cloud storage
+
+After you upload the dataset and manifest file to an AWS-S3 or Azure Blob Container,
+you will be able to attach a cloud storage. To do this, press the `Attach new cloud storage`
 button on the `Cloud storages` page and fill out the following form:

 ![](/images/image228.jpg)

@@ -22,17 +81,17 @@ button on the `Cloud storages` page and fill out the following form:
   of an item on cloud storages page.
 - `Provider` - choose provider of the cloud storage:
-  - [AWS-S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/GetStartedWithS3.html):
+  - [AWS-S3](#using-aws-s3):

-    - `Bucket` - cloud storage bucket name
+    - [`Bucket`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket) - cloud storage bucket name
     - [`Authorization type`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-best-practices.html):

-      - `Key id and secret access key pair`:
+      - `Key id and secret access key pair` - available on [IAM](https://console.aws.amazon.com/iamv2/home?#/users)
        - `ACCESS KEY ID`
        - `SECRET ACCESS KEY ID`
-      - `Anonymous access`
+      - `Anonymous access` - for anonymous access, you need to enable public access to the bucket
     - `Region` - here you can choose a region from the list or add a new one.
 To get more information, click [`?`](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions)
@@ -60,3 +119,69 @@ For more information click on [`?`](/docs/manual/advanced/dataset_manifest/).
 To publish the cloud storage, click `submit`, after which it will be available on the
 [Cloud storages page](/docs/manual/basics/cloud-storages/).
+
+## Using AWS Data Exchange
+
+### Subscribe to a dataset
+
+You can use AWS Data Exchange to add image datasets.
+For example, consider adding the dataset `500 Image & Metadata Free Sample`.
+Go to [browse catalog](https://console.aws.amazon.com/dataexchange) and use the search to find
+`500 Image & Metadata Free Sample`. Open the dataset page and click `Continue to subscribe`;
+you will be taken to the complete subscription request page. Read the information provided
+and click `Send subscription request to provider`.
+
+![](/images/aws-s3_tutorial_6.jpg)
+
+### Export to bucket
+
+After that, this dataset will appear in the
+[subscriptions list](https://console.aws.amazon.com/dataexchange/home/subscriptions#/subscriptions).
+Now you need to export the dataset to `Amazon S3`.
+First, create a new bucket as [described above](#create-a-bucket).
+To export one of the datasets to the new bucket, open its `Entitled data`, select one of the datasets,
+select the corresponding revision, and click `Export to Amazon S3`
+(please note that if the bucket and dataset are located in different regions, export fees may apply).
+In the window that appears, select the created bucket and click `Export`.
+
+![](/images/aws-s3_tutorial_7.jpg)
+
+### Prepare manifest file
+
+Now you need to prepare a manifest file. This tutorial uses the [AWS CLI](https://aws.amazon.com/cli/) and
+the [script for preparing a manifest file](https://github.com/openvinotoolkit/cvat/tree/develop/utils/dataset_manifest).
+Install the AWS CLI following the [aws-shell](https://github.com/awslabs/aws-shell) manual;
+this tutorial was tested with `aws-cli 1.20.49`, `Python 3.7.9`, and `Windows 10`.
+You can configure credentials by running `aws configure`.
+You will need to enter your `Access Key ID` and `Secret Access Key` as well as the region.
+
+```
+aws configure
+Access Key ID: <your Access Key ID>
+Secret Access Key: <your Secret Access Key>
+```
+
+Copy the content of the bucket to a folder on your computer:
+
+```
+aws s3 cp <s3://bucket> <prepared folder> --recursive
+```
+
+After copying the files, you can create a manifest file as described in the [prepare manifest file section](/docs/manual/advanced/dataset_manifest/):
+
+```
+python <cvat repository>/utils/dataset_manifest/create.py --output-dir <output directory> <dataset directory>
+```
+
+When the manifest file is ready, you can upload it to the AWS-S3 bucket. If you gave full write permissions
+when you created the user, run:
+
+```
+aws s3 cp <prepared folder>/manifest.jsonl <s3://bucket>
+```
+
+If you have given read-only permissions, upload through the browser: click `Upload`,
+drag the manifest file onto the page, and click `Upload`.
+
+![](/images/aws-s3_tutorial_5.jpg)
+
+Now you can [attach new cloud storage](#attach-new-cloud-storage) using the dataset `500 Image & Metadata Free Sample`.
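Before uploading, a quick sanity check of `manifest.jsonl` can catch formatting problems early. The sketch below is not part of CVAT; `read_manifest` is a hypothetical helper that merely parses the line-delimited JSON format shown in the dataset manifest docs (two header lines for `version` and `type`, then one record per image).

```python
import json

def read_manifest(lines):
    """Parse manifest.jsonl content: one JSON object per line, where the
    first two lines hold the version and type and the rest describe images."""
    records = [json.loads(line) for line in lines if line.strip()]
    # Merge the two header objects into a single dict for convenience.
    header = {k: v for record in records[:2] for k, v in record.items()}
    images = records[2:]
    return header, images

sample = [
    '{"version":"1.0"}',
    '{"type":"images"}',
    '{"name":"image1","extension":".jpg","width":720,"height":405,'
    '"meta":{"related_images":[]},"checksum":"548918ec4b56132a5cff1d4acabe9947"}',
]
header, images = read_manifest(sample)
print(header["type"], len(images))  # → images 1
```

A mismatch between the image records here and the files actually present in the bucket is a common cause of task creation failures, so checking counts and names locally is cheaper than re-uploading.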
diff --git a/site/content/en/images/aws-s3_tutorial_1.gif b/site/content/en/images/aws-s3_tutorial_1.gif new file mode 100644 index 00000000..e5c98e1e Binary files /dev/null and b/site/content/en/images/aws-s3_tutorial_1.gif differ diff --git a/site/content/en/images/aws-s3_tutorial_1.jpg b/site/content/en/images/aws-s3_tutorial_1.jpg new file mode 100644 index 00000000..7f90b15b Binary files /dev/null and b/site/content/en/images/aws-s3_tutorial_1.jpg differ diff --git a/site/content/en/images/aws-s3_tutorial_2.jpg b/site/content/en/images/aws-s3_tutorial_2.jpg new file mode 100644 index 00000000..4d1ab7c6 Binary files /dev/null and b/site/content/en/images/aws-s3_tutorial_2.jpg differ diff --git a/site/content/en/images/aws-s3_tutorial_3.jpg b/site/content/en/images/aws-s3_tutorial_3.jpg new file mode 100644 index 00000000..6992fdbc Binary files /dev/null and b/site/content/en/images/aws-s3_tutorial_3.jpg differ diff --git a/site/content/en/images/aws-s3_tutorial_4.jpg b/site/content/en/images/aws-s3_tutorial_4.jpg new file mode 100644 index 00000000..ef208105 Binary files /dev/null and b/site/content/en/images/aws-s3_tutorial_4.jpg differ diff --git a/site/content/en/images/aws-s3_tutorial_5.jpg b/site/content/en/images/aws-s3_tutorial_5.jpg new file mode 100644 index 00000000..9c7eb0f6 Binary files /dev/null and b/site/content/en/images/aws-s3_tutorial_5.jpg differ diff --git a/site/content/en/images/aws-s3_tutorial_6.jpg b/site/content/en/images/aws-s3_tutorial_6.jpg new file mode 100644 index 00000000..9fc8863b Binary files /dev/null and b/site/content/en/images/aws-s3_tutorial_6.jpg differ diff --git a/site/content/en/images/aws-s3_tutorial_7.jpg b/site/content/en/images/aws-s3_tutorial_7.jpg new file mode 100644 index 00000000..86adb765 Binary files /dev/null and b/site/content/en/images/aws-s3_tutorial_7.jpg differ diff --git a/utils/dataset_manifest/core.py b/utils/dataset_manifest/core.py index 794fdf7e..158628d1 100644 --- 
a/utils/dataset_manifest/core.py +++ b/utils/dataset_manifest/core.py @@ -159,7 +159,7 @@ class DatasetImagesReader: else os.path.basename(image) name, extension = os.path.splitext(img_name) image_properties = { - 'name': name, + 'name': name.replace('\\', '/'), 'extension': extension, 'width': img.width, 'height': img.height,
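The `core.py` change above normalizes Windows path separators so that image names written to the manifest always use forward slashes, matching bucket keys on every platform. A standalone sketch of that behavior (the helper name is illustrative, not CVAT's API):

```python
import os

def manifest_name(image_path):
    # Strip the extension and normalize separators: on Windows,
    # os.path produces backslash-separated relative paths, but manifest
    # entries and S3 object keys use forward slashes everywhere.
    name, _extension = os.path.splitext(image_path)
    return name.replace('\\', '/')

print(manifest_name('images\\Abyssinian_1.jpg'))  # → images/Abyssinian_1
```

Without the `replace`, a manifest generated on Windows would contain names like `images\Abyssinian_1`, which would never match the `images/Abyssinian_1.jpg` keys in the bucket.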