Dataset and annotation formats
How to add a new annotation format support
- Add a Python script to dataset_manager/formats.
- Add an import statement to registry.py.
- Implement some importers and exporters as the format requires.
Each format is supported by an importer and an exporter.
Each of them can be a function or a class decorated with
the importer or exporter decorator from registry.py. Examples:
# importer and exporter decorators from dataset_manager/formats/registry.py
from .registry import exporter, importer

@importer(name="MyFormat", version="1.0", ext="ZIP")
def my_importer(file_object, task_data, **options):
    ...

@importer(name="MyFormat", version="2.0", ext="XML")
class my_importer:
    def __call__(self, file_object, task_data, **options):
        ...

@exporter(name="MyFormat", version="1.0", ext="ZIP")
def my_exporter(file_object, task_data, **options):
    ...
Each decorator defines format parameters such as:
- name
- version
- file extension. For the importer, it can be a comma-separated list.
These parameters are combined to produce a visible name. It can be set explicitly by the display_name argument.
Importer arguments:
- file_object - a file with annotations or dataset
- task_data - an instance of the TaskData class.
Exporter arguments:
- file_object - a file for annotations or dataset
- task_data - an instance of the TaskData class.
- options - format-specific options. save_images is the option that distinguishes whether the dataset or only annotations are requested.
TaskData provides many task properties and interfaces
to add and read task annotations.
Public members:
- TaskData.Attribute - class, namedtuple('Attribute', 'name, value')
- TaskData.LabeledShape - class, namedtuple('LabeledShape', 'type, frame, label, points, occluded, attributes, group, z_order')
- TaskData.TrackedShape - class, namedtuple('TrackedShape', 'type, points, occluded, frame, attributes, outside, keyframe, z_order')
- TaskData.Track - class, namedtuple('Track', 'label, group, shapes')
- TaskData.Tag - class, namedtuple('Tag', 'frame, label, attributes, group')
- TaskData.Frame - class, namedtuple('Frame', 'frame, name, width, height, labeled_shapes, tags')
- TaskData.shapes - property, an iterator over LabeledShape objects
- TaskData.tracks - property, an iterator over Track objects
- TaskData.tags - property, an iterator over Tag objects
- TaskData.meta - property, a dictionary with task information
- TaskData.group_by_frame() - method, returns an iterator over Frame objects, which groups annotation objects by frame. Note that TrackedShapes will be represented as LabeledShapes.
- TaskData.add_tag(tag) - method, tag should be an instance of the Tag class
- TaskData.add_shape(shape) - method, shape should be an instance of the LabeledShape class
- TaskData.add_track(track) - method, track should be an instance of the Track class
Sample exporter code:
...
# dump meta info if necessary
...
# iterate over all frames
for frame_annotation in task_data.group_by_frame():
    # get frame info
    image_name = frame_annotation.name
    image_width = frame_annotation.width
    image_height = frame_annotation.height
    # iterate over all shapes on the frame
    for shape in frame_annotation.labeled_shapes:
        label = shape.label
        xtl = shape.points[0]
        ytl = shape.points[1]
        xbr = shape.points[2]
        ybr = shape.points[3]
        # iterate over shape attributes
        for attr in shape.attributes:
            attr_name = attr.name
            attr_value = attr.value
        ...
        # dump annotation code
        file_object.write(...)
...
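To see the exporter loop run end to end, here is a self-contained sketch. It is illustrative only: FakeTaskData is a hypothetical minimal stand-in for TaskData that implements just group_by_frame(), the namedtuples are rebuilt from the field lists documented above, and the output line format ("image_name label xtl ytl xbr ybr attrs") is invented for this example, not a CVAT format. The save_images option is ignored.

```python
import io
from collections import namedtuple

# Namedtuples mirroring the field lists documented above for TaskData.
Attribute = namedtuple('Attribute', 'name, value')
LabeledShape = namedtuple(
    'LabeledShape', 'type, frame, label, points, occluded, attributes, group, z_order')
Frame = namedtuple('Frame', 'frame, name, width, height, labeled_shapes, tags')


class FakeTaskData:
    """Hypothetical stand-in for TaskData: only group_by_frame() is provided."""

    def __init__(self, frames):
        self._frames = frames

    def group_by_frame(self):
        return iter(self._frames)


def my_exporter(file_object, task_data, **options):
    # One text line per rectangle: image name, label, xtl ytl xbr ybr, attributes.
    for frame_annotation in task_data.group_by_frame():
        for shape in frame_annotation.labeled_shapes:
            if shape.type != 'rectangle':
                continue  # this toy format only stores boxes
            fields = [frame_annotation.name, shape.label,
                      *map(str, shape.points[:4])]
            if shape.attributes:
                fields.append(';'.join(f'{a.name}={a.value}' for a in shape.attributes))
            file_object.write(' '.join(fields) + '\n')


# Usage with one frame holding one rectangle.
task_data = FakeTaskData([
    Frame(0, 'img1.jpg', 640, 480, [
        LabeledShape('rectangle', 0, 'car', [10, 20, 110, 220], False,
                     [Attribute('color', 'red')], 0, 0),
    ], []),
])
buf = io.StringIO()
my_exporter(buf, task_data)
print(buf.getvalue(), end='')  # img1.jpg car 10 20 110 220 color=red
```

In a real format module, file_object and task_data are supplied by CVAT; the stand-in only exists so the loop can be exercised in isolation.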
Sample importer code:
...
# read file_object
...
for parsed_shape in parsed_shapes:
    shape = task_data.LabeledShape(
        type="rectangle",
        points=[0, 0, 100, 100],
        occluded=False,
        attributes=[],
        label="car",
        frame=99,
        group=0,
        z_order=0,
    )
    task_data.add_shape(shape)
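A runnable version of the importer loop, with the elided parsing filled in by an assumption: the input is a hypothetical CSV file with the columns frame,label,xtl,ytl,xbr,ybr. FakeTaskData is a hypothetical stand-in that exposes LabeledShape (with the documented fields) and simply records what add_shape() receives; the real TaskData does much more.

```python
import csv
import io
from collections import namedtuple

# Mirrors the documented TaskData.LabeledShape fields.
LabeledShape = namedtuple(
    'LabeledShape', 'type, frame, label, points, occluded, attributes, group, z_order')


class FakeTaskData:
    """Hypothetical stand-in for TaskData that just collects added shapes."""
    LabeledShape = LabeledShape

    def __init__(self):
        self.added = []

    def add_shape(self, shape):
        self.added.append(shape)


def my_importer(file_object, task_data, **options):
    # Each CSV row: frame,label,xtl,ytl,xbr,ybr (hypothetical toy format).
    for row in csv.DictReader(file_object):
        shape = task_data.LabeledShape(
            type='rectangle',
            frame=int(row['frame']),
            label=row['label'],
            points=[float(row[k]) for k in ('xtl', 'ytl', 'xbr', 'ybr')],
            occluded=False,
            attributes=[],
            group=0,
            z_order=0,
        )
        task_data.add_shape(shape)


data = io.StringIO('frame,label,xtl,ytl,xbr,ybr\n99,car,0,0,100,100\n')
task_data = FakeTaskData()
my_importer(data, task_data)
print(task_data.added[0].frame, task_data.added[0].points)  # 99 [0.0, 0.0, 100.0, 100.0]
```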
Format specifications
CVAT
This is the native CVAT annotation format. It supports all CVAT annotation features, so it can be used to make data backups.
- supported annotations: Rectangles, Polygons, Polylines, Points, Cuboids, Tags, Tracks
- attributes are supported
CVAT for images export
Downloaded file: a ZIP file of the following structure:
taskname.zip/
├── images/
| ├── img1.png
| └── img2.jpg
└── annotations.xml
- tracks are split by frames
CVAT for videos export
Downloaded file: a ZIP file of the following structure:
taskname.zip/
├── images/
| ├── frame_000000.png
| └── frame_000001.png
└── annotations.xml
- shapes are exported as single-frame tracks
CVAT loader
Uploaded file: an XML file or a ZIP file of the structures above
Datumaro format
Datumaro is a tool that can help with complex dataset and annotation transformations, format conversions, dataset statistics, merging, custom formats, etc. It is used as a provider of dataset support in CVAT, so basically everything possible in CVAT is possible in Datumaro too, but Datumaro can offer additional dataset operations.
- supported annotations: any 2D shapes, labels
- supported attributes: any
Pascal VOC
- supported annotations:
  - Rectangles (detection and layout tasks)
  - Tags (action- and classification tasks)
  - Polygons (segmentation task)
- supported attributes:
  - occluded (both UI option and a separate attribute)
  - truncated and difficult (should be defined for labels as checkbox-es)
  - action attributes (import only, should be defined as checkbox-es)
  - arbitrary attributes (in the attributes section of XML files)
Pascal VOC export
Downloaded file: a zip archive of the following structure:
taskname.zip/
├── JPEGImages/
│ ├── <image_name1>.jpg
│ ├── <image_name2>.jpg
│ └── <image_nameN>.jpg
├── Annotations/
│ ├── <image_name1>.xml
│ ├── <image_name2>.xml
│ └── <image_nameN>.xml
├── ImageSets/
│ └── Main/
│ └── default.txt
└── labelmap.txt
# labelmap.txt
# label : color_rgb : 'body' parts : actions
background:::
aeroplane:::
bicycle:::
bird:::
Pascal VOC import
Uploaded file: a zip archive of the structure declared above or the following:
taskname.zip/
├── <image_name1>.xml
├── <image_name2>.xml
└── <image_nameN>.xml
It must be possible for CVAT to match the frame name and the file name
from the annotation .xml file (the filename tag, e.g.
<filename>2008_004457.jpg</filename>).
There are 2 options:
- full match between the frame name and the file name from the annotation .xml (in cases when the task was created from images or an image archive).
- match by frame number. The file name should be <number>.jpg or frame_000000.jpg. It should be used when the task was created from a video.
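The two matching rules can be sketched as a small lookup function. This is a hypothetical helper written for illustration, not CVAT's actual matching code; frame_names (the task's frame names in frame order) is an assumed input.

```python
import os
import re

# "<number>" or "frame_<number>" once the extension is stripped.
_FRAME_NUMBER = re.compile(r'^(?:frame_)?(\d+)$')


def match_frame(filename, frame_names):
    """Return the frame index for a VOC <filename> value, or None.

    1. Full match against the task's frame names (tasks from images/archives).
    2. Otherwise, match by the frame number encoded as "<number>.jpg" or
       "frame_000000.jpg" (tasks created from video).
    """
    if filename in frame_names:
        return frame_names.index(filename)
    stem = os.path.splitext(os.path.basename(filename))[0]
    m = _FRAME_NUMBER.match(stem)
    if m:
        number = int(m.group(1))
        if 0 <= number < len(frame_names):
            return number
    return None


frames = ['2008_004457.jpg', '2008_004458.jpg']
print(match_frame('2008_004458.jpg', frames))  # 1
print(match_frame('frame_000000.jpg', ['frame_a', 'frame_b']))  # 0
```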
Segmentation mask export
Downloaded file: a zip archive of the following structure:
taskname.zip/
├── labelmap.txt # optional, required for non-VOC labels
├── ImageSets/
│ └── Segmentation/
│ └── default.txt # list of image names without extension
├── SegmentationClass/ # merged class masks
│ ├── image1.png
│ └── image2.png
└── SegmentationObject/ # merged instance masks
├── image1.png
└── image2.png
# labelmap.txt
# label : color (RGB) : 'body' parts : actions
background:0,128,0::
aeroplane:10,10,128::
bicycle:10,128,0::
bird:0,108,128::
boat:108,0,100::
bottle:18,0,8::
bus:12,28,0::
Mask is a PNG image with 1 or 3 channels where each pixel
has its own color, which corresponds to a label.
Colors are generated following the Pascal VOC algorithm.
(0, 0, 0) is used for background by default.
- supported shapes: Rectangles, Polygons
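The Pascal VOC colormap is commonly generated by spreading the bits of the label index across the R, G and B channels. The sketch below is that widely used algorithm; it is assumed, not verified, to match the exact code CVAT and Datumaro use. In the standard VOC label order, index 1 is aeroplane and index 15 is person.

```python
def voc_colormap(n=256):
    """Return n (r, g, b) tuples following the Pascal VOC colormap scheme."""
    def bit(value, idx):
        return (value >> idx) & 1

    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        # Bits 0, 1, 2 of each 3-bit group go to R, G, B, filling from the MSB down.
        for j in range(8):
            r |= bit(c, 0) << (7 - j)
            g |= bit(c, 1) << (7 - j)
            b |= bit(c, 2) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap


cmap = voc_colormap()
print(cmap[0], cmap[1], cmap[15])  # (0, 0, 0) (128, 0, 0) (192, 128, 128)
```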
Segmentation mask import
Uploaded file: a zip archive of the following structure:
taskname.zip/
├── labelmap.txt # optional, required for non-VOC labels
├── ImageSets/
│ └── Segmentation/
│ └── <any_subset_name>.txt
├── SegmentationClass/
│ ├── image1.png
│ └── image2.png
└── SegmentationObject/
├── image1.png
└── image2.png
It is also possible to import grayscale (1-channel) PNG masks. For grayscale masks, provide a list of labels with one line per color index, from 0 up to the maximum color index used in the images. The lines must be in the right order so that the line index is equal to the color index. Lines can have arbitrary, but different, colors. If there are gaps in the used color indices in the annotations, they must be filled with arbitrary dummy labels. Example:
background:0,128,0:: # color index 0
aeroplane:10,10,128:: # color index 1
_dummy2:2,2,2:: # filler for color index 2
_dummy3:3,3,3:: # filler for color index 3
boat:108,0,100:: # color index 4
...
_dummy198:198,198,198:: # filler for color index 198
_dummy199:199,199,199:: # filler for color index 199
...
the last label:12,28,0:: # color index 200
- supported shapes: Polygons
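Writing the filler lines by hand is error-prone. Below is a hypothetical helper that builds such a labelmap from a {color_index: (label, (r, g, b))} mapping, using the _dummyN:N,N,N:: filler convention from the example above. It assumes the filler colors do not collide with a real label's color; if they do, pick different filler colors, since all lines must have different colors.

```python
def grayscale_labelmap(labels):
    """Build labelmap.txt text so that line i describes color index i.

    labels: dict {color_index: (label_name, (r, g, b))}; gaps are filled
    with "_dummy<i>:<i>,<i>,<i>::" placeholder labels.
    """
    lines = []
    for idx in range(max(labels) + 1):
        if idx in labels:
            name, (r, g, b) = labels[idx]
            lines.append(f'{name}:{r},{g},{b}::')
        else:
            lines.append(f'_dummy{idx}:{idx},{idx},{idx}::')
    return '\n'.join(lines) + '\n'


text = grayscale_labelmap({
    0: ('background', (0, 128, 0)),
    1: ('aeroplane', (10, 10, 128)),
    4: ('boat', (108, 0, 100)),
})
print(text, end='')
# background:0,128,0::
# aeroplane:10,10,128::
# _dummy2:2,2,2::
# _dummy3:3,3,3::
# boat:108,0,100::
```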
How to create a task from Pascal VOC dataset
- Download the Pascal VOC dataset (it can be downloaded from the PASCAL VOC website).
- Create a CVAT task with the following labels:
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor
You can add ~checkbox=difficult:false ~checkbox=truncated:false attributes for each label if you want to use them. Select interesting image files (see Creating an annotation task guide for details).
- Zip the corresponding annotation files.
- Click the Upload annotation button, choose Pascal VOC ZIP 1.1 and select the zip file with annotations from the previous step. It may take some time.
YOLO
- Format specification
- supported annotations: Rectangles
YOLO export
Downloaded file: a zip archive with the following structure:
archive.zip/
├── obj.data
├── obj.names
├── obj_<subset>_data/
│ ├── image1.txt
│ └── image2.txt
└── train.txt # list of subset image paths
# the only valid subsets are: train, valid
# train.txt and valid.txt:
obj_<subset>_data/image1.jpg
obj_<subset>_data/image2.jpg
# obj.data:
classes = 3 # optional
names = obj.names
train = train.txt
valid = valid.txt # optional
backup = backup/ # optional
# obj.names:
cat
dog
airplane
# image_name.txt:
# label_id - id from obj.names
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# label_id cx cy rw rh
1 0.3 0.8 0.1 0.3
2 0.7 0.2 0.3 0.1
Each annotation *.txt file has a name that corresponds to the name of
the image file (e.g. frame_000001.txt is the annotation
for the frame_000001.jpg image).
The *.txt file structure: each line describes a label and a bounding box
in the format label_id cx cy rw rh.
obj.names contains the ordered list of label names.
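Because YOLO stores the box center and size relative to the image, converting a line to absolute corner coordinates (the xtl, ytl, xbr, ybr form used for rectangles elsewhere in this document) requires the image size. A minimal sketch; the function names and the 640x480 image size are illustrative, and the 6-digit rounding in the reverse direction is an arbitrary choice.

```python
def yolo_to_corners(line, img_w, img_h):
    """Parse 'label_id cx cy rw rh' and return (label_id, xtl, ytl, xbr, ybr) in pixels."""
    label_id, cx, cy, rw, rh = line.split()
    cx, rw = float(cx) * img_w, float(rw) * img_w
    cy, rh = float(cy) * img_h, float(rh) * img_h
    return int(label_id), cx - rw / 2, cy - rh / 2, cx + rw / 2, cy + rh / 2


def corners_to_yolo(label_id, xtl, ytl, xbr, ybr, img_w, img_h):
    """Inverse conversion: absolute corners to a YOLO annotation line."""
    cx = (xtl + xbr) / 2 / img_w
    cy = (ytl + ybr) / 2 / img_h
    rw = (xbr - xtl) / img_w
    rh = (ybr - ytl) / img_h
    return f'{label_id} {cx:.6f} {cy:.6f} {rw:.6f} {rh:.6f}'


box = yolo_to_corners('1 0.3 0.8 0.1 0.3', 640, 480)
print(box)
print(corners_to_yolo(*box, 640, 480))
```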
YOLO import
Uploaded file: a zip archive of the same structure as above. It must be possible to match the CVAT frame (image name) and the annotation file name. There are 2 options:
- full match between the image name and the name of the annotation *.txt file (in cases when a task was created from images or an archive of images).
- match by frame number (if CVAT cannot match by name). The file name should be in the format <number>.jpg. It should be used when a task was created from a video.
How to create a task from YOLO formatted dataset (from VOC for example)
- Follow the official guide (see the Training YOLO on VOC section) and prepare the YOLO formatted annotation files.
- Zip train images:
zip images.zip -j -@ < train.txt
- Create a CVAT task with the following labels:
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor
Select images.zip as data. Most likely you should use the share functionality because the size of images.zip is more than 500 MB. See Creating an annotation task guide for details.
- Create obj.names with the following label names, one per line, in this order:
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor
- Zip all label files together (add only the label files that correspond to the train subset):
cat train.txt | while read p; do f=${p##*/}; echo ${p%/*/*}/labels/${f%%.*}.txt; done | zip labels.zip -j -@ obj.names
- Click the Upload annotation button, choose YOLO 1.1 and select the zip file with labels from the previous step.
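The label-zipping step can also be done in Python. This sketch reproduces the path rewrite of the shell loop (drop the last two path components, then append labels/<name>.txt, e.g. .../VOC2007/JPEGImages/000005.jpg becomes .../VOC2007/labels/000005.txt). The function names are hypothetical, and it assumes train.txt and obj.names are in the current directory; like zip -j, it stores files without their directory paths.

```python
import zipfile
from pathlib import Path


def label_path(image_path):
    """Map <root>/<dir>/<name>.<ext> to <root>/labels/<name>.txt (like ${p%/*/*}/labels/...)."""
    p = Path(image_path)
    stem = p.name.split('.', 1)[0]  # cut at the first dot, like ${f%%.*}
    return p.parent.parent / 'labels' / f'{stem}.txt'


def build_labels_zip(train_list='train.txt', names='obj.names', out='labels.zip'):
    """Zip obj.names plus the label file of every image in train_list (flat, like zip -j)."""
    with zipfile.ZipFile(out, 'w') as zf:
        zf.write(names, arcname=Path(names).name)
        for line in Path(train_list).read_text().splitlines():
            if line.strip():
                lp = label_path(line.strip())
                zf.write(lp, arcname=lp.name)


print(label_path('/data/VOCdevkit/VOC2007/JPEGImages/000005.jpg').as_posix())
# /data/VOCdevkit/VOC2007/labels/000005.txt
```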
MS COCO Object Detection
COCO export
Downloaded file: a zip archive with the following structure:
archive.zip/
├── images/
│ ├── <image_name1.ext>
│ ├── <image_name2.ext>
│ └── ...
└── annotations/
└── instances_default.json
- supported annotations: Polygons, Rectangles
- supported attributes:
  - is_crowd (checkbox or integer with values 0 and 1) - specifies that the instance (an object group) should have an RLE-encoded mask in the segmentation field. All the grouped shapes are merged into a single mask; the largest one defines all the object properties
  - score (number) - the annotation score field
  - arbitrary attributes - will be stored in the attributes annotation section
Note: there is also support for COCO keypoints via Datumaro:
- Install Datumaro
pip install datumaro
- Export the task in the Datumaro format, unzip
- Export the Datumaro project in coco/coco_person_keypoints formats
datum export -f coco -p path/to/project [-- --save-images]
This way, one can export CVAT points as single keypoints or
keypoint lists (without the visibility COCO flag).
COCO import
Uploaded file: a single unpacked *.json or a zip archive with the structure above (without images).
- supported annotations: Polygons, Rectangles (if the segmentation field is empty)
How to create a task from MS COCO dataset
- Download the MS COCO dataset. For example, 2017 Val images and 2017 Train/Val annotations.
- Create a CVAT task with the following labels:
person bicycle car motorcycle airplane bus train truck boat "traffic light" "fire hydrant" "stop sign" "parking meter" bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard "sports ball" kite "baseball bat" "baseball glove" skateboard surfboard "tennis racket" bottle "wine glass" cup fork knife spoon bowl banana apple sandwich orange broccoli carrot "hot dog" pizza donut cake chair couch "potted plant" bed "dining table" toilet tv laptop mouse remote keyboard "cell phone" microwave oven toaster sink refrigerator book clock vase scissors "teddy bear" "hair drier" toothbrush
- Select val2017.zip as data (see Creating an annotation task guide for details).
- Unpack annotations_trainval2017.zip.
- Click the Upload annotation button, choose COCO 1.1 and select the instances_val2017.json annotation file. It can take some time.
TFRecord
TFRecord is a very flexible format, but we try to follow the format used in TF object detection, with minimal modifications.
Used feature description:
import tensorflow as tf

image_feature_description = {
    'image/filename': tf.io.FixedLenFeature([], tf.string),
    'image/source_id': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    # Object boxes and classes.
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
    'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    'image/object/class/text': tf.io.VarLenFeature(tf.string),
}
TFRecord export
Downloaded file: a zip archive with the following structure:
taskname.zip/
├── default.tfrecord
└── label_map.pbtxt
# label_map.pbtxt
item {
id: 1
name: 'label_0'
}
item {
id: 2
name: 'label_1'
}
...
- supported annotations: Rectangles, Polygons (as masks, manually via Datumaro)
How to export masks:
- Export annotations in the Datumaro format
- Apply the polygons_to_masks and boxes_to_masks transforms
datum transform -t polygons_to_masks -p path/to/proj -o ptm
datum transform -t boxes_to_masks -p ptm -o btm
- Export in the TF Detection API format
datum export -f tf_detection_api -p btm [-- --save-images]
TFRecord import
Uploaded file: a zip archive of the following structure:
taskname.zip/
└── <any name>.tfrecord
- supported annotations: Rectangles
How to create a task from TFRecord dataset (from VOC2007 for example)
- Create a label_map.pbtxt file with the following content:
item {
id: 1
name: 'aeroplane'
}
item {
id: 2
name: 'bicycle'
}
item {
id: 3
name: 'bird'
}
item {
id: 4
name: 'boat'
}
item {
id: 5
name: 'bottle'
}
item {
id: 6
name: 'bus'
}
item {
id: 7
name: 'car'
}
item {
id: 8
name: 'cat'
}
item {
id: 9
name: 'chair'
}
item {
id: 10
name: 'cow'
}
item {
id: 11
name: 'diningtable'
}
item {
id: 12
name: 'dog'
}
item {
id: 13
name: 'horse'
}
item {
id: 14
name: 'motorbike'
}
item {
id: 15
name: 'person'
}
item {
id: 16
name: 'pottedplant'
}
item {
id: 17
name: 'sheep'
}
item {
id: 18
name: 'sofa'
}
item {
id: 19
name: 'train'
}
item {
id: 20
name: 'tvmonitor'
}
- Use create_pascal_tf_record.py to convert the VOC2007 dataset to TFRecord format. For example:
python create_pascal_tf_record.py --data_dir <path to VOCdevkit> --set train --year VOC2007 --output_path pascal.tfrecord --label_map_path label_map.pbtxt
- Zip train images:
cat <path to VOCdevkit>/VOC2007/ImageSets/Main/train.txt | while read p; do echo <path to VOCdevkit>/VOC2007/JPEGImages/${p}.jpg ; done | zip images.zip -j -@
- Create a CVAT task with the following labels:
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor
Select images.zip as data. See Creating an annotation task guide for details.
- Zip the pascal.tfrecord and label_map.pbtxt files together:
zip anno.zip -j <path to pascal.tfrecord> <path to label_map.pbtxt>
- Click the Upload annotation button, choose TFRecord 1.0 and select the zip file with labels from the previous step. It may take some time.
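Instead of typing the label_map.pbtxt content from the first step by hand, it can be generated from the ordered label list. A minimal sketch; it assumes ids start at 1 (as in the file above) and that label names contain no quote characters. Write the returned string to label_map.pbtxt.

```python
VOC_LABELS = ('aeroplane bicycle bird boat bottle bus car cat chair cow '
              'diningtable dog horse motorbike person pottedplant sheep sofa '
              'train tvmonitor').split()


def make_label_map(labels):
    """Return label_map.pbtxt text with ids starting at 1, in list order."""
    items = []
    for i, name in enumerate(labels, start=1):
        items.append(f"item {{\n  id: {i}\n  name: '{name}'\n}}\n")
    return ''.join(items)


print(make_label_map(VOC_LABELS[:2]), end='')
# item {
#   id: 1
#   name: 'aeroplane'
# }
# item {
#   id: 2
#   name: 'bicycle'
# }
```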
MOT sequence
MOT export
Downloaded file: a zip archive of the following structure:
taskname.zip/
├── img1/
| ├── image1.jpg
| └── image2.jpg
└── gt/
├── labels.txt
└── gt.txt
# labels.txt
cat
dog
person
...
# gt.txt
# frame_id, track_id, x, y, w, h, "not ignored", class_id, visibility, <skipped>
1,1,1363,569,103,241,1,1,0.86014
...
- supported annotations: Rectangle shapes and tracks
- supported attributes:
visibility (number), ignored (checkbox)
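A sketch of reading gt.txt rows according to the column comment above. The helper name and the keys of the returned dict are illustrative; columns after visibility (marked <skipped> above) are ignored, and the "not ignored" column is inverted into an ignored flag to match the attribute name.

```python
import csv
import io


def parse_mot_row(fields):
    """Convert one gt.txt row (list of strings) into typed values.

    Columns follow the comment above: frame_id, track_id, x, y, w, h,
    "not ignored", class_id, visibility; any further columns are skipped.
    """
    frame_id, track_id, x, y, w, h, not_ignored, class_id, visibility = fields[:9]
    return {
        'frame_id': int(frame_id),
        'track_id': int(track_id),
        'bbox': (float(x), float(y), float(w), float(h)),
        'ignored': float(not_ignored) == 0,  # the column stores "not ignored"
        'class_id': int(class_id),
        'visibility': float(visibility),
    }


gt_txt = io.StringIO('1,1,1363,569,103,241,1,1,0.86014\n')
rows = [parse_mot_row(row) for row in csv.reader(gt_txt)]
print(rows[0]['bbox'], rows[0]['ignored'])  # (1363.0, 569.0, 103.0, 241.0) False
```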
MOT import
Uploaded file: a zip archive of the structure above or:
taskname.zip/
├── labels.txt # optional, mandatory for non-official labels
└── gt.txt
- supported annotations: Rectangle tracks
MOTS PNG
MOTS PNG export
Downloaded file: a zip archive of the following structure:
taskname.zip/
└── <any_subset_name>/
├── images/
| ├── image1.jpg
| └── image2.jpg
└── instances/
├── labels.txt
├── image1.png
└── image2.png
# labels.txt
cat
dog
person
...
- supported annotations: Rectangle and Polygon tracks
MOTS PNG import
Uploaded file: a zip archive of the structure above
- supported annotations: Polygon tracks
LabelMe
LabelMe export
Downloaded file: a zip archive of the following structure:
taskname.zip/
├── img1.jpg
└── img1.xml
- supported annotations: Rectangles, Polygons (with attributes)
LabelMe import
Uploaded file: a zip archive of the following structure:
taskname.zip/
├── Masks/
| ├── img1_mask1.png
| └── img1_mask2.png
├── img1.xml
├── img2.xml
└── img3.xml
- supported annotations: Rectangles, Polygons, Masks (as polygons)
ImageNet
ImageNet export
Downloaded file: a zip archive of the following structure:
# if we save images:
taskname.zip/
├── label1/
| ├── label1_image1.jpg
| └── label1_image2.jpg
└── label2/
├── label2_image1.jpg
├── label2_image3.jpg
└── label2_image4.jpg
# if we keep only annotations:
taskname.zip/
├── <any_subset_name>.txt
└── synsets.txt
- supported annotations: Labels
ImageNet import
Uploaded file: a zip archive of the structure above
- supported annotations: Labels
CamVid
CamVid export
Downloaded file: a zip archive of the following structure:
taskname.zip/
├── labelmap.txt # optional, required for non-CamVid labels
├── <any_subset_name>/
| ├── image1.png
| └── image2.png
├── <any_subset_name>annot/
| ├── image1.png
| └── image2.png
└── <any_subset_name>.txt
# labelmap.txt
# color (RGB) label
0 0 0 Void
64 128 64 Animal
192 0 128 Archway
0 128 192 Bicyclist
0 128 64 Bridge
Mask is a PNG image with 1 or 3 channels where each pixel
has its own color, which corresponds to a label.
(0, 0, 0) is used for background by default.
- supported annotations: Rectangles, Polygons
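The CamVid labelmap.txt lines put the RGB color first and the label name last. A small parser sketch (hypothetical helper) that also builds the reverse color-to-label lookup, which is handy when reading 3-channel masks; label names may contain spaces because only the first three fields are split off.

```python
def parse_camvid_labelmap(text):
    """Return {label: (r, g, b)} from 'R G B Label' lines; '#' lines are skipped."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        r, g, b, name = line.split(maxsplit=3)
        labels[name] = (int(r), int(g), int(b))
    return labels


labelmap = parse_camvid_labelmap('''# color (RGB) label
0 0 0 Void
64 128 64 Animal
192 0 128 Archway
''')
color_to_label = {color: name for name, color in labelmap.items()}
print(color_to_label[(64, 128, 64)])  # Animal
```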
CamVid import
Uploaded file: a zip archive of the structure above
- supported annotations: Polygons
WIDER Face
WIDER Face export
Downloaded file: a zip archive of the following structure:
taskname.zip/
├── labels.txt # optional
├── wider_face_split/
│ └── wider_face_<any_subset_name>_bbx_gt.txt
└── WIDER_<any_subset_name>/
└── images/
├── 0--label0/
│ └── 0_label0_image1.jpg
└── 1--label1/
└── 1_label1_image2.jpg
- supported annotations: Rectangles (with attributes), Labels
- supported attributes:
blur, expression, illumination, pose, invalid, occluded (both the annotation property and an attribute)
WIDER Face import
Uploaded file: a zip archive of the structure above
- supported annotations: Rectangles (with attributes), Labels
- supported attributes:
blur, expression, illumination, occluded, pose, invalid
VGGFace2
VGGFace2 export
Downloaded file: a zip archive of the following structure:
taskname.zip/
├── labels.txt # optional
├── <any_subset_name>/
| ├── label0/
| | └── image1.jpg
| └── label1/
| └── image2.jpg
└── bb_landmark/
├── loose_bb_<any_subset_name>.csv
└── loose_landmark_<any_subset_name>.csv
# labels.txt
# n000001 car
label0 <class0>
label1 <class1>
- supported annotations: Rectangles, Points (landmarks - groups of 5 points)
VGGFace2 import
Uploaded file: a zip archive of the structure above
- supported annotations: Rectangles, Points (landmarks - groups of 5 points)
Market-1501
Market-1501 export
Downloaded file: a zip archive of the following structure:
taskname.zip/
├── bounding_box_<any_subset_name>/
│ └── image_name_1.jpg
└── query/
├── image_name_2.jpg
└── image_name_3.jpg
# if we keep only annotations:
taskname.zip/
└── images_<any_subset_name>.txt
# images_<any_subset_name>.txt
query/image_name_1.jpg
bounding_box_<any_subset_name>/image_name_2.jpg
bounding_box_<any_subset_name>/image_name_3.jpg
# image_name = 0001_c1s1_000015_00.jpg
0001 - person id
c1 - camera id (there are 6 cameras in total)
s1 - sequence
000015 - frame number in sequence
00 - means that this bounding box is the first one among several
- supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
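The image-name convention above can be decoded with a regular expression. A sketch with a hypothetical helper; it assumes names follow the 0001_c1s1_000015_00.jpg pattern exactly and rejects anything else. The person id is kept as a string to preserve its leading zeros.

```python
import re

# <person id>_c<camera id>s<sequence>_<frame>_<bbox index>.jpg
MARKET_NAME = re.compile(r'^(\d+)_c(\d+)s(\d+)_(\d+)_(\d+)\.jpg$')


def parse_market_name(name):
    m = MARKET_NAME.match(name)
    if m is None:
        raise ValueError(f'not a Market-1501 image name: {name}')
    person_id, camera_id, sequence, frame, bbox_index = m.groups()
    return {
        'person_id': person_id,
        'camera_id': int(camera_id),
        'sequence': int(sequence),
        'frame': int(frame),
        'bbox_index': int(bbox_index),
    }


print(parse_market_name('0001_c1s1_000015_00.jpg'))
# {'person_id': '0001', 'camera_id': 1, 'sequence': 1, 'frame': 15, 'bbox_index': 0}
```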
Market-1501 import
Uploaded file: a zip archive of the structure above
- supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
ICDAR13/15
ICDAR13/15 export
Downloaded file: a zip archive of the following structure:
# word recognition task
taskname.zip/
└── word_recognition/
└── <any_subset_name>/
├── images
| ├── word1.png
| └── word2.png
└── gt.txt
# text localization task
taskname.zip/
└── text_localization/
└── <any_subset_name>/
├── images
| ├── img_1.png
| └── img_2.png
├── gt_img_1.txt
└── gt_img_2.txt
# text segmentation task
taskname.zip/
└── text_segmentation/
└── <any_subset_name>/
├── images
| ├── 1.png
| └── 2.png
├── 1_GT.bmp
├── 1_GT.txt
├── 2_GT.bmp
└── 2_GT.txt
Word recognition task:
- supported annotations: Label icdar with attribute caption
Text localization task:
- supported annotations: Rectangles and Polygons with label icdar and attribute text
Text segmentation task:
- supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center
ICDAR13/15 import
Uploaded file: a zip archive of the structure above
Word recognition task:
- supported annotations: Label icdar with attribute caption
Text localization task:
- supported annotations: Rectangles and Polygons with label icdar and attribute text
Text segmentation task:
- supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center