| title | linkTitle | weight | description |
|---|---|---|---|
| Attach cloud storage | Attach cloud storage | 21 | Instructions on how to attach cloud storage using UI |
In CVAT, you can use AWS-S3 and Azure Blob Container cloud storage to store image datasets for your tasks.
Using AWS-S3
Create AWS account
First, you need to create an AWS account. To do this, complete the five-step registration by following the instructions (even if you plan to use a free basic account, you may need to link a credit card to verify your identity).
To learn more about how the AWS cloud works and what it offers, take the free AWS Cloud Practitioner Essentials course, which becomes available after registration.
Create a bucket
After the account is created, go to the AWS-S3 console
and click Create bucket.
You'll be taken to the bucket creation page. Here, specify the bucket name and region.
Optionally, you can copy the settings of another bucket by clicking the Choose bucket button.
The Block all public access checkbox can stay enabled, because we will use an access key ID and secret access key to gain access.
For the remaining sections, you can keep the default settings and click Create bucket.
After you create the bucket, it will appear in the list of buckets.
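The bucket creation page rejects names that break the AWS naming rules. As a quick local pre-check, here is a minimal Python sketch that covers only a subset of those rules: length 3 to 63, allowed characters, no adjacent periods, not an IPv4 address, no `xn--` prefix and no `-s3alias` suffix. The function name `bucket_name_problems` is hypothetical and the check is not exhaustive. The AWS S3 bucket naming documentation remains authoritative.

```python
import ipaddress
import re

# Lowercase letters, digits, dots and hyphens; the first and last character
# must be a letter or digit. Length is checked separately.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]*[a-z0-9]$")


def bucket_name_problems(name: str) -> list[str]:
    """Return problems found in `name` (empty list if none were found).

    Covers only a subset of the AWS S3 bucket naming rules.
    """
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be 3-63 characters long")
    if not _NAME_RE.match(name):
        problems.append("use only a-z, 0-9, '.', '-' and start/end with a letter or digit")
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    try:
        ipaddress.IPv4Address(name)  # succeeds only for dotted IPv4 strings
        problems.append("must not be formatted as an IP address")
    except ValueError:
        pass
    if name.startswith("xn--"):
        problems.append("must not start with 'xn--'")
    if name.endswith("-s3alias"):
        problems.append("must not end with '-s3alias'")
    return problems


if __name__ == "__main__":
    # Hypothetical candidate names
    for candidate in ["cvat-pets-dataset", "CVAT_Pets", "192.168.5.4"]:
        print(candidate, bucket_name_problems(candidate) or "looks OK")
```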
Create user and configure permissions
To access the bucket, you need to create a user. To do this, go to IAM
and click Add users. Choose the AWS access type that provides an access key ID and secret access key.
After clicking the Next button to configure permissions, create a user group.
To do this, click Create group, enter the group name, and use the search to find and select the AmazonS3ReadOnlyAccess permission policy
(if you want the user to have write access to the bucket, select AmazonS3FullAccess instead).
Optionally, you can add tags for the user and then review the entered data. In the last step of creating a user,
you will be given an access key ID and secret access key;
you will need them in CVAT when adding cloud storage.
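AmazonS3ReadOnlyAccess grants read access to all buckets in the account. If you prefer to limit the user to the single bucket used by CVAT, a custom policy is one option. The sketch below builds such a policy document as JSON, which you could paste into the IAM policy editor. It assumes that listing the bucket and reading its objects is enough for read-only use in CVAT, so verify that against your CVAT version. The function name and the bucket name are hypothetical.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document allowing list and read access to one bucket.

    Assumption: s3:ListBucket + s3:GetObject suffice for read-only CVAT use.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is granted on the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # GetObject is granted on the objects inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "cvat-pets-dataset" is a hypothetical bucket name
    print(json.dumps(read_only_bucket_policy("cvat-pets-dataset"), indent=2))
```

Splitting the statement in two matters: `s3:ListBucket` applies to the bucket ARN, while `s3:GetObject` applies to object ARNs ending in `/*`.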
Upload dataset
For example, let's take The Oxford-IIIT Pet Dataset:
- Download the archive with images.
- Unpack the archive into the prepared folder and create a manifest file as described in the prepare manifest file section:
  ```bash
  python <cvat repository>/utils/dataset_manifest/create.py --output-dir <yourfolder> <yourfolder>
  ```
- When the manifest file is ready, open the previously prepared bucket and click Upload.
- Drag the manifest file and the image folder onto the page and click Upload.
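Before uploading, you can confirm that every image listed in the manifest actually exists in the folder you are about to upload. The sketch below makes several assumptions about the manifest format, so check them against a manifest produced by `create.py`:
- the manifest is a JSON Lines file;
- image entries carry a `name` key (path without extension, relative to the dataset folder) and an `extension` key;
- header lines without a `name` key, such as the version and type entries, can be skipped.

The helper `missing_manifest_images` is hypothetical.

```python
import json
from pathlib import Path


def missing_manifest_images(manifest_path: str, image_root: str) -> list[str]:
    """Return manifest entries whose image file is absent under image_root.

    Assumed entry shape: {"name": "<relative path without extension>",
    "extension": ".jpg", ...}; lines without "name" are treated as headers.
    """
    root = Path(image_root)
    missing = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if "name" not in entry:
                continue  # header line such as {"version": ...} or {"type": ...}
            relative = entry["name"] + entry.get("extension", "")
            if not (root / relative).is_file():
                missing.append(relative)
    return missing
```

For example, `missing_manifest_images("<yourfolder>/manifest.jsonl", "<yourfolder>")` should return an empty list when every listed image is present.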
Attach new cloud storage
After you upload the dataset and manifest file to AWS-S3 or Azure Blob Container,
you will be able to attach a cloud storage. To do this, click the Attach new cloud storage
button on the Cloud storages page and fill out the following form:

- Display name - the display name of the cloud storage.
- Description (optional) - a description of the cloud storage; it appears when you click the ? button of an item on the Cloud storages page.
- Provider - choose the provider of the cloud storage:
  - AWS-S3:
    - Bucket - the cloud storage bucket name.
    - Authorization type:
      - Key id and secret access key pair - the ACCESS KEY ID and SECRET ACCESS KEY, available on IAM.
      - Anonymous access - for anonymous access, you need to enable public access to the bucket.
    - Region - choose a region from the list or add a new one. For more information, click ?.
  - Azure Blob Container:
    - Container name - the name of the cloud storage container.
    - Authorization type:
      - Account name and SAS token.
      - Anonymous access - Account name only.
- Manifest - the path to the manifest file on your cloud storage. You can add multiple manifest files using the Add manifest button. For more information, click ?.

To publish the cloud storage, click Submit. It will then be available on
the Cloud storages page.
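The fields the form asks for depend on the provider and the authorization type. The following hypothetical checklist summarizes them as a Python mapping and reports which entries are still empty. The field names mirror the UI labels listed above, not CVAT's REST API. Treating Display name and Manifest as mandatory, and Description and Region as optional, is an assumption.

```python
# Hypothetical checklist of the "Attach new cloud storage" form.
# Names mirror the UI labels; which fields are mandatory is an assumption.
COMMON_FIELDS = ["Display name", "Manifest"]

PROVIDER_FIELDS = {
    ("AWS-S3", "Key id and secret access key pair"): [
        "Bucket", "Access key id", "Secret access key"],
    ("AWS-S3", "Anonymous access"): ["Bucket"],
    ("Azure Blob Container", "Account name and SAS token"): [
        "Container name", "Account name", "SAS token"],
    ("Azure Blob Container", "Anonymous access"): [
        "Container name", "Account name"],
}


def missing_fields(provider: str, auth_type: str, form: dict[str, str]) -> list[str]:
    """List checklist fields that are absent or blank in `form`."""
    key = (provider, auth_type)
    if key not in PROVIDER_FIELDS:
        raise ValueError(f"unknown provider/authorization combination: {key}")
    wanted = COMMON_FIELDS + PROVIDER_FIELDS[key]
    return [field for field in wanted if not form.get(field, "").strip()]
```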
Using AWS Data Exchange
Subscribe to data set
You can use AWS Data Exchange to add image datasets.
For example, consider adding the 500 Image & Metadata Free Sample set of datasets.
Go to Browse catalog and use the search to find
500 Image & Metadata Free Sample. Open the dataset page and click Continue to subscribe.
You will be taken to the Complete subscription request page; read the information provided
and click Send subscription request to provider.
Export to bucket
After that, this dataset will appear in the
list of subscriptions.
Now you need to export the dataset to Amazon S3.
First, create a new bucket as described above.
To export one of the datasets to the new bucket, open Entitled data, select one of the datasets,
select the corresponding revision, and click Export to Amazon S3
(note that if the bucket and the dataset are located in different regions, export fees may apply).
In the window that appears, select the created bucket and click Export.
Prepare manifest file
Now you need to prepare a manifest file. I used the AWS CLI and the
script for preparing a manifest file.
Install the AWS CLI by following the aws-shell manual;
I used aws-cli 1.20.49, Python 3.7.9, and Windows 10.
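You can configure credentials by running `aws configure`.
You will need to enter your Access Key ID and Secret Access Key, as well as the region.
```
aws configure
AWS Access Key ID [None]: <your Access Key ID>
AWS Secret Access Key [None]: <your Secret Access Key>
Default region name [None]: <your region>
Default output format [None]:
```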
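`aws configure` saves the keys in the `credentials` file and the region in the `config` file, both in the `.aws` folder of your home directory. Both go under the `[default]` profile unless you pass `--profile`. The sketch below reads both files with Python's standard `configparser` to confirm what was stored, without printing the secret key. The function name `configured_profile` is hypothetical.

```python
import configparser
from pathlib import Path


def configured_profile(credentials_text: str, config_text: str,
                       profile: str = "default") -> dict:
    """Summarize what `aws configure` stored for one profile.

    credentials_text: contents of ~/.aws/credentials
    config_text: contents of ~/.aws/config
    The secret key itself is never returned, only whether it is set.
    """
    # interpolation=None so '%' in values is not treated specially
    creds = configparser.ConfigParser(interpolation=None)
    creds.read_string(credentials_text)
    conf = configparser.ConfigParser(interpolation=None)
    conf.read_string(config_text)
    # In the config file, named profiles use a "profile NAME" section header
    conf_section = profile if profile == "default" else f"profile {profile}"
    return {
        "access_key_id": creds.get(profile, "aws_access_key_id", fallback=None),
        "secret_set": bool(creds.get(profile, "aws_secret_access_key", fallback=None)),
        "region": conf.get(conf_section, "region", fallback=None),
    }


if __name__ == "__main__":
    aws_dir = Path.home() / ".aws"
    creds_file, conf_file = aws_dir / "credentials", aws_dir / "config"
    if creds_file.is_file() and conf_file.is_file():
        print(configured_profile(creds_file.read_text(), conf_file.read_text()))
```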
Copy the contents of the bucket to a folder on your computer:
```bash
aws s3 cp <s3://bucket-name> <yourfolder> --recursive
```
After copying the files, you can create a manifest file as described in the prepare manifest file section:
```bash
python <cvat repository>/utils/dataset_manifest/create.py --output-dir <yourfolder> <yourfolder>
```
When the manifest file is ready, you can upload it to the AWS-S3 bucket. If you gave the user full write permissions when you created it, run:
```bash
aws s3 cp <yourfolder>/manifest.jsonl <s3://bucket-name>
```
If you gave read-only permissions, upload the file through the browser: open the bucket, click Upload, drag the manifest file onto the page, and click Upload.
Now you can attach new cloud storage using the dataset 500 Image & Metadata Free Sample.








