Description
The purpose of this application is to add support for multiple annotation formats to CVAT. It lets users download and upload annotations in different formats and makes it easy to add support for new ones.
How to add support for a new annotation format
- Write a Python script that will be executed via the exec() function. The following items must be defined in the script:
- format_spec - a dictionary with the following structure:

    format_spec = {
        "name": "CVAT",
        "dumpers": [
            {
                "display_name": "{name} {format} {version} for videos",
                "format": "XML",
                "version": "1.1",
                "handler": "dump_as_cvat_interpolation"
            },
            {
                "display_name": "{name} {format} {version} for images",
                "format": "XML",
                "version": "1.1",
                "handler": "dump_as_cvat_annotation"
            }
        ],
        "loaders": [
            {
                "display_name": "{name} {format} {version}",
                "format": "XML",
                "version": "1.1",
                "handler": "load",
            }
        ],
    }

- name - a unique name for each format
- dumpers and loaders - lists of objects that describe the exposed dumpers and loaders. Each object must have the following keys:
- display_name - a unique string used as the ID for dumpers and loaders. This string is also displayed in the CVAT UI. Named placeholders can be used, as in Python's format function (only the name, format and version variables are supported).
- format - a string used as the file extension for a dumped annotation.
- version - a string with the format version.
- handler - the name of the function that will be called; the function must be defined at the top level of the script.
- dumper/loader handler functions. Each function should have the following signature:

    def dump_handler(file_object, annotations):

Two variables are available inside the script environment:
- file_object - Python's standard file object, as returned by the open() function, exposing a file-oriented API (with methods such as read() or write()) to an underlying resource.
- annotations - an instance of the Annotation class.
The Annotation class exposes an API and some additional predefined types that let loader/dumper code get and add shapes.
Short description of the public methods:
- Annotation.shapes - property, returns a generator of Annotation.LabeledShape objects
- Annotation.tracks - property, returns a generator of Annotation.Track objects
- Annotation.tags - property, returns a generator of Annotation.Tag objects
- Annotation.group_by_frame() - method, returns an iterator over Annotation.Frame objects, which group annotation objects by frame. Note that TrackedShapes are represented as Annotation.LabeledShape.
- Annotation.meta - property, returns a dictionary with the task meta information, for example the video source name, number of frames, number of jobs, etc.
- Annotation.add_tag(tag) - tag should be an instance of the Annotation.Tag class
- Annotation.add_shape(shape) - shape should be an instance of the Annotation.LabeledShape class
- Annotation.add_track(track) - track should be an instance of the Annotation.Track class
- Annotation.Attribute = namedtuple('Attribute', 'name, value')
- name - String, name of the attribute
- value - String, value of the attribute
- Annotation.LabeledShape = namedtuple('LabeledShape', 'type, frame, label, points, occluded, attributes, group, z_order') LabeledShape.__new__.__defaults__ = (0, None)
- TrackedShape = namedtuple('TrackedShape', 'type, points, occluded, frame, attributes, outside, keyframe, z_order') TrackedShape.__new__.__defaults__ = (None, )
- Track = namedtuple('Track', 'label, group, shapes')
- Tag = namedtuple('Tag', 'frame, label, attributes, group') Tag.__new__.__defaults__ = (0, )
- Frame = namedtuple('Frame', 'frame, name, width, height, labeled_shapes, tags')
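The `__new__.__defaults__` assignments listed above give default values to the rightmost fields of a namedtuple, so for LabeledShape the pair (0, None) applies to group and z_order. A minimal sketch of that behaviour follows; the tuple is re-declared locally only for illustration (inside a real script the type is reached through the annotations object).

```python
from collections import namedtuple

# Local re-declaration for illustration; the field list is copied from the
# description above.
LabeledShape = namedtuple(
    'LabeledShape',
    'type, frame, label, points, occluded, attributes, group, z_order')
# Defaults are matched against the last fields: group=0, z_order=None.
LabeledShape.__new__.__defaults__ = (0, None)

# group and z_order may therefore be omitted.
shape = LabeledShape(
    type="rectangle", frame=0, label="car",
    points=[0, 0, 100, 100], occluded=False, attributes=[])

print(shape.group, shape.z_order)
```

All other fields have no default and must be passed explicitly.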
Pseudocode for a dumper script

    ...
    # dump meta info if necessary
    ...

    # iterate over all frames
    for frame_annotation in annotations.group_by_frame():
        # get frame info
        image_name = frame_annotation.name
        image_width = frame_annotation.width
        image_height = frame_annotation.height

        # iterate over all shapes on the frame
        for shape in frame_annotation.labeled_shapes:
            label = shape.label
            xtl = shape.points[0]
            ytl = shape.points[1]
            xbr = shape.points[2]
            ybr = shape.points[3]

            # iterate over shape attributes
            for attr in shape.attributes:
                attr_name = attr.name
                attr_value = attr.value
    ...
    # dump annotation code
    file_object.write(...)
    ...

Pseudocode for a loader script

    ...
    # read file_object
    ...

    for parsed_shape in parsed_shapes:
        # LabeledShape has no "outside" field, so it is not passed here
        shape = annotations.LabeledShape(
            type="rectangle",
            points=[0, 0, 100, 100],
            occluded=False,
            attributes=[],
            label="car",
            frame=99,
        )

        annotations.add_shape(shape)

Full examples can be found in the corresponding *.py files (cvat.py, coco.py, yolo.py, etc.).
- Add the path to the new Python script to the annotation app settings:

    BUILTIN_FORMATS = (
        os.path.join(path_prefix, 'cvat.py'),
        os.path.join(path_prefix, 'pascal_voc.py'),
    )
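Putting the two steps together, a complete format script might look like the sketch below. The "SimpleCSV" format, the `dump` handler name and the `FakeAnnotations` stand-in are all hypothetical and exist only so the handler can be exercised outside CVAT. Expanding display_name with str.format is an assumption based on the placeholder description above.

```python
import io
from collections import namedtuple

# --- contents of a hypothetical format script, e.g. simple_csv.py ---
format_spec = {
    "name": "SimpleCSV",  # hypothetical format name
    "dumpers": [
        {
            "display_name": "{name} {format} {version}",
            "format": "CSV",
            "version": "1.0",
            "handler": "dump",  # name of the top-level function below
        },
    ],
    "loaders": [],
}


def dump(file_object, annotations):
    # One line per shape: image name, label, then the shape points.
    for frame_annotation in annotations.group_by_frame():
        for shape in frame_annotation.labeled_shapes:
            fields = [frame_annotation.name, shape.label]
            fields += [str(p) for p in shape.points]
            file_object.write(",".join(fields) + "\n")


# --- stand-ins used only to run the handler outside CVAT ---
Frame = namedtuple('Frame', 'frame, name, width, height, labeled_shapes, tags')
Shape = namedtuple('Shape', 'label, points')


class FakeAnnotations:
    def group_by_frame(self):
        yield Frame(0, "img0.jpg", 640, 480,
                    [Shape("car", [0, 0, 100, 100])], [])


out = io.StringIO()
dump(out, FakeAnnotations())

# Expand the placeholders the way the display_name description suggests.
dumper = format_spec["dumpers"][0]
display_name = dumper["display_name"].format(
    name=format_spec["name"],
    format=dumper["format"],
    version=dumper["version"])
```

Because the handler only relies on write() and group_by_frame(), it can be checked against an in-memory buffer before the script is registered in BUILTIN_FORMATS.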
Ideas for improvements
- An annotation format manager, similar to the DL Model manager, with which the user can add custom format support by writing dumper/loader scripts.
- A custom loader/dumper often requires additional Python packages, so it would be useful if CVAT provided an API that lets users install Python dependencies from their own code without changing the CVAT source code. A possible solution: install additional modules via a pip call into a separate directory for each annotation format to reduce version conflicts. Custom code could then run in an extended environment without affecting core CVAT modules. This functionality could also be useful for the Auto Annotation module.
Format specifications
CVAT
This is the native CVAT annotation format. See the detailed format description.
CVAT XML for images dumper
- downloaded file: Single unpacked XML
- supported shapes - Rectangles, Polygons, Polylines, Points
CVAT XML for videos dumper
- downloaded file: Single unpacked XML
- supported shapes - Rectangles, Polygons, Polylines, Points
CVAT XML Loader
- uploaded file: Single unpacked XML
- supported shapes - Rectangles, Polygons, Polylines, Points
Pascal VOC
Pascal dumper description
- downloaded file: a zip archive of the following structure:

    taskname.zip/
    ├── Annotations/
    │   ├── <image_name1>.xml
    │   ├── <image_name2>.xml
    │   └── <image_nameN>.xml
    ├── ImageSets/
    │   └── Main/
    │       └── default.txt
    └── labelmap.txt

- supported shapes: Rectangles
- additional comments: If you plan to use the truncated and difficult attributes, please add the corresponding items to the CVAT label attributes: ~checkbox=difficult:false ~checkbox=truncated:false
Pascal loader description
- uploaded file: a zip archive of the structure declared above or the following:

    taskname.zip/
    ├── <image_name1>.xml
    ├── <image_name2>.xml
    ├── <image_nameN>.xml
    └── labelmap.txt # optional

The labelmap.txt file contains dataset labels. It must be included if the dataset labels differ from the VOC default labels. The file structure:

    # label : color_rgb : 'body' parts : actions
    background:::
    aeroplane:::
    bicycle:::
    bird:::

It must be possible for CVAT to match the frame (image name) and the file name from the annotation *.xml file (the filename tag, e.g. <filename>2008_004457.jpg</filename>). There are 2 options:
- full match between the image name and the filename from the annotation *.xml (when the task was created from images or an image archive).
- match by frame number (if CVAT cannot match by name). The file name should be in the format <number>.jpg. This should be used when the task was created from a video.
- supported shapes: Rectangles
- limitations: only the Pascal VOC object detection format is supported
- additional comments: the CVAT task should be created with the full label set that may be in the annotation files
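The two matching options described above can be sketched as follows. This is an illustration of the rule, not CVAT's actual implementation; `match_frame` and `frame_names` are hypothetical names.

```python
import os


def match_frame(annotation_filename, frame_names):
    """Return the frame index for a <filename> value, or None if no match.

    frame_names is a hypothetical list of task image names, indexed by frame.
    """
    # Option 1: full match between the image name and the annotation filename.
    if annotation_filename in frame_names:
        return frame_names.index(annotation_filename)
    # Option 2: fall back to the frame number encoded as <number>.jpg.
    stem, _ = os.path.splitext(os.path.basename(annotation_filename))
    if stem.isdigit():
        return int(stem)
    return None
```

For a task created from a video, frame_names would contain no matching image names, so only the numeric fallback applies.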
How to create a task from Pascal VOC dataset
- Download the Pascal VOC dataset (it can be downloaded from the PASCAL VOC website)
- Create a CVAT task with the following labels:

    aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor

You can add ~checkbox=difficult:false ~checkbox=truncated:false attributes for each label if you want to use them.
Select interesting image files (see the Creating an annotation task guide for details)
- Zip the corresponding annotation files
- Click the Upload annotation button, choose Pascal VOC ZIP 1.1 and select the *.zip file with annotations from the previous step. It may take some time.
YOLO
Yolo dumper description
- downloaded file: a zip archive with the following structure (see the format specification):

    archive.zip/
    ├── obj.data
    ├── obj.names
    ├── obj_<subset>_data
    │   ├── image1.txt
    │   └── image2.txt
    └── train.txt # list of subset image paths

    # the only valid subsets are: train, valid
    # train.txt and valid.txt:
    obj_<subset>_data/image1.jpg
    obj_<subset>_data/image2.jpg

    # obj.data:
    classes = 3 # optional
    names = obj.names
    train = train.txt
    valid = valid.txt # optional
    backup = backup/ # optional

    # obj.names:
    cat
    dog
    airplane

    # image_name.txt:
    # label_id - id from obj.names
    # cx, cy - relative coordinates of the bbox center
    # rw, rh - relative size of the bbox
    # label_id cx cy rw rh
    1 0.3 0.8 0.1 0.3
    2 0.7 0.2 0.3 0.1

Each annotation *.txt file has a name that corresponds to the name of the image file (e.g. frame_000001.txt is the annotation for the frame_000001.jpg image). The *.txt file structure: each line describes a label and a bounding box in the following format: label_id cx cy w h. obj.names contains the ordered list of label names.
- supported shapes - Rectangles
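As an illustration of the line format described above, the following sketch converts an absolute rectangle (xtl, ytl, xbr, ybr, in pixels) into a relative `label_id cx cy rw rh` line. The function name and the six-decimal formatting are choices made for this example only.

```python
def to_yolo_line(label_id, xtl, ytl, xbr, ybr, image_width, image_height):
    # Center of the box, relative to the image size.
    cx = (xtl + xbr) / 2 / image_width
    cy = (ytl + ybr) / 2 / image_height
    # Size of the box, relative to the image size.
    rw = (xbr - xtl) / image_width
    rh = (ybr - ytl) / image_height
    return "%d %.6f %.6f %.6f %.6f" % (label_id, cx, cy, rw, rh)


line = to_yolo_line(1, 100, 200, 300, 400, 1000, 800)
```

All four box values end up in the range 0 to 1 as long as the rectangle lies inside the image.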
Yolo loader description
- uploaded file: a zip archive of the same structure as above. It must be possible to match the CVAT frame (image name) and the annotation file name. There are 2 options:
- full match between the image name and the name of the annotation *.txt file (when the task was created from images or an archive of images).
- match by frame number (if CVAT cannot match by name). The file name should be in the format <number>.jpg. This should be used when the task was created from a video.
- supported shapes: Rectangles
- additional comments: the CVAT task should be created with the full label set that may be in the annotation files
How to create a task from YOLO formatted dataset (from VOC for example)
- Follow the official guide (see the Training YOLO on VOC section) and prepare the YOLO formatted annotation files.
- Zip the train images

    zip images.zip -j -@ < train.txt

- Create a CVAT task with the following labels:

    aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor

Select images.zip as data. Most likely you should use the share functionality because the size of images.zip is more than 500Mb. See the Creating an annotation task guide for details.
- Create obj.names with the following content:

    aeroplane
    bicycle
    bird
    boat
    bottle
    bus
    car
    cat
    chair
    cow
    diningtable
    dog
    horse
    motorbike
    person
    pottedplant
    sheep
    sofa
    train
    tvmonitor

- Zip all label files together (only the label files that correspond to the train subset are needed)

    cat train.txt | while read p; do echo ${p%/*/*}/labels/${${p##*/}%%.*}.txt; done | zip labels.zip -j -@ obj.names

- Click the Upload annotation button, choose YOLO ZIP 1.1 and select the *.zip file with labels from the previous step. It may take some time.
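The label-zipping command above uses a nested parameter expansion, which is zsh syntax and may not be accepted by other shells. A rough Python equivalent is sketched below. It assumes the VOC layout in which label files live in a labels/ directory next to the image directory; the function names are hypothetical.

```python
import os
import zipfile


def label_path_for(image_path):
    # Drop the last two path components (e.g. JPEGImages/000001.jpg) ...
    dataset_root = os.path.dirname(os.path.dirname(image_path))
    # ... and keep the file name up to its first dot.
    stem = os.path.basename(image_path).split(".")[0]
    return os.path.join(dataset_root, "labels", stem + ".txt")


def zip_labels(train_list="train.txt", names_file="obj.names",
               out="labels.zip"):
    with open(train_list) as f:
        image_paths = [line.strip() for line in f if line.strip()]
    with zipfile.ZipFile(out, "w") as archive:
        for image_path in image_paths:
            label = label_path_for(image_path)
            # Storing only the base name mirrors zip's -j option.
            archive.write(label, arcname=os.path.basename(label))
        archive.write(names_file, arcname=os.path.basename(names_file))
```

Calling zip_labels() in the directory that holds train.txt and obj.names should produce a labels.zip with the same flat layout as the shell command.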
MS COCO Object Detection
COCO dumper description
- downloaded file: single unpacked json. A detailed description of the MS COCO format can be found here
- supported shapes - Polygons, Rectangles (interpreted as polygons)
COCO loader description
- uploaded file: single unpacked *.json.
- supported shapes: an object is interpreted as a Polygon if the segmentation field of the annotation is not empty, otherwise as a Rectangle with coordinates from the bbox field.
- additional comments: the CVAT task should be created with the full label set that may be in the annotation files
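The loader rule above (polygon when segmentation is present, rectangle otherwise) can be sketched as below. The function name is hypothetical, only the first polygon is used for brevity, and RLE-encoded segmentations are not handled. COCO stores bbox as [x, y, width, height], so it is converted to corner coordinates here.

```python
def coco_annotation_to_shape(annotation):
    """Return (shape_type, points) for one COCO annotation dict."""
    segmentation = annotation.get("segmentation")
    # A non-empty list of polygons means the object is a polygon.
    # RLE (dict) segmentations are not handled in this sketch.
    if isinstance(segmentation, list) and segmentation:
        return "polygon", list(segmentation[0])
    # Otherwise fall back to the bounding box: [x, y, width, height].
    x, y, w, h = annotation["bbox"]
    return "rectangle", [x, y, x + w, y + h]
```

The returned points follow the [xtl, ytl, xbr, ybr] order used for rectangles elsewhere in this document.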
How to create a task from MS COCO dataset
- Download the MS COCO dataset. For example 2017 Val images and 2017 Train/Val annotations.
- Create a CVAT task with the following labels:

    person bicycle car motorcycle airplane bus train truck boat "traffic light" "fire hydrant" "stop sign" "parking meter" bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard "sports ball" kite "baseball bat" "baseball glove" skateboard surfboard "tennis racket" bottle "wine glass" cup fork knife spoon bowl banana apple sandwich orange broccoli carrot "hot dog" pizza donut cake chair couch "potted plant" bed "dining table" toilet tv laptop mouse remote keyboard "cell phone" microwave oven toaster sink refrigerator book clock vase scissors "teddy bear" "hair drier" toothbrush

Select val2017.zip as data (see the Creating an annotation task guide for details)
- Unpack annotations_trainval2017.zip
- Click the Upload annotation button, choose COCO JSON 1.0 and select the instances_val2017.json annotation file. It may take some time.
TFRecord
TFRecord is a very flexible format, but we try to follow the format used in TF object detection with minimal modifications. The feature description used:
image_feature_description = {
'image/filename': tf.io.FixedLenFeature([], tf.string),
'image/source_id': tf.io.FixedLenFeature([], tf.string),
'image/height': tf.io.FixedLenFeature([], tf.int64),
'image/width': tf.io.FixedLenFeature([], tf.int64),
# Object boxes and classes.
'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
'image/object/class/label': tf.io.VarLenFeature(tf.int64),
'image/object/class/text': tf.io.VarLenFeature(tf.string),
}
TFRecord dumper description
- downloaded file: a zip archive with the following structure:

    taskname.zip
    ├── task2.tfrecord
    └── label_map.pbtxt

- supported shapes - Rectangles
TFRecord loader description
- uploaded file: a zip archive with the following structure:

    taskname.zip
    └── task2.tfrecord

- supported shapes: Rectangles
- additional comments: the CVAT task should be created with the full label set that may be in the annotation files
How to create a task from TFRecord dataset (from VOC2007 for example)
- Create label_map.pbtxt file with the following content:
item {
id: 1
name: 'aeroplane'
}
item {
id: 2
name: 'bicycle'
}
item {
id: 3
name: 'bird'
}
item {
id: 4
name: 'boat'
}
item {
id: 5
name: 'bottle'
}
item {
id: 6
name: 'bus'
}
item {
id: 7
name: 'car'
}
item {
id: 8
name: 'cat'
}
item {
id: 9
name: 'chair'
}
item {
id: 10
name: 'cow'
}
item {
id: 11
name: 'diningtable'
}
item {
id: 12
name: 'dog'
}
item {
id: 13
name: 'horse'
}
item {
id: 14
name: 'motorbike'
}
item {
id: 15
name: 'person'
}
item {
id: 16
name: 'pottedplant'
}
item {
id: 17
name: 'sheep'
}
item {
id: 18
name: 'sofa'
}
item {
id: 19
name: 'train'
}
item {
id: 20
name: 'tvmonitor'
}
- Use create_pascal_tf_record.py to convert the VOC2007 dataset to TFRecord format. For example:

    python create_pascal_tf_record.py --data_dir <path to VOCdevkit> --set train --year VOC2007 --output_path pascal.tfrecord --label_map_path label_map.pbtxt

- Zip the train images

    cat <path to VOCdevkit>/VOC2007/ImageSets/Main/train.txt | while read p; do echo <path to VOCdevkit>/VOC2007/JPEGImages/${p}.jpg ; done | zip images.zip -j -@

- Create a CVAT task with the following labels:

    aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor

Select images.zip as data. See the Creating an annotation task guide for details.
- Zip the pascal.tfrecord and label_map.pbtxt files together

    zip anno.zip -j <path to pascal.tfrecord> <path to label_map.pbtxt>

- Click the Upload annotation button, choose TFRecord ZIP 1.0 and select the *.zip file with labels from the previous step. It may take some time.
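The label_map.pbtxt file from the first step is repetitive, so it can be generated instead of typed by hand. A minimal sketch, assuming the same 20 VOC labels with ids starting at 1 as shown above (`make_label_map` is a hypothetical helper):

```python
VOC_LABELS = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def make_label_map(labels):
    # One "item { id, name }" entry per label, ids starting at 1.
    items = []
    for index, name in enumerate(labels, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index, name))
    return "".join(items)


label_map_text = make_label_map(VOC_LABELS)

# To write the file next to the script:
# with open("label_map.pbtxt", "w") as f:
#     f.write(label_map_text)
```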
PNG mask
Mask dumper description
- downloaded file: a zip archive with the following structure:

    taskname.zip
    ├── labelmap.txt # optional, required for non-VOC labels
    ├── ImageSets/
    │   └── Segmentation/
    │       └── default.txt # list of image names without extension
    ├── SegmentationClass/ # merged class masks
    │   ├── image1.png
    │   └── image2.png
    └── SegmentationObject/ # merged instance masks
        ├── image1.png
        └── image2.png

A mask is a png image with several (RGB) channels where each pixel has its own color, which corresponds to a label. Color generation corresponds to the Pascal VOC color generation algorithm. (0, 0, 0) is used for background. The labelmap.txt file contains the values of the used colors in RGB format. The file structure:

    # label:color_rgb:parts:actions
    background:0,128,0::
    aeroplane:10,10,128::
    bicycle:10,128,0::
    bird:0,108,128::
    boat:108,0,100::
    bottle:18,0,8::
    bus:12,28,0::

- supported shapes - Rectangles, Polygons
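For reference, the widely used Pascal VOC colormap routine spreads the bits of the label index across the R, G and B channels. The sketch below follows that routine; whether CVAT uses exactly this code is an assumption, and `voc_color` is a name chosen for this example.

```python
def voc_color(index):
    """Return the (r, g, b) color for a label index, VOC colormap style."""
    r = g = b = 0
    # Take three bits of the index at a time and place them, from the
    # most significant bit downwards, into the three channels.
    for shift in range(7, -1, -1):
        r |= (index & 1) << shift
        g |= ((index >> 1) & 1) << shift
        b |= ((index >> 2) & 1) << shift
        index >>= 3
    return r, g, b


colors = [voc_color(i) for i in range(21)]
```

Index 0 yields (0, 0, 0), which matches the background color mentioned above.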
Mask loader description
- uploaded file: a zip archive of the following structure:

    name.zip
    ├── labelmap.txt # optional, required for non-VOC labels
    ├── ImageSets/
    │   └── Segmentation/
    │       └── <any_subset_name>.txt
    ├── SegmentationClass/
    │   ├── image1.png
    │   └── image2.png
    └── SegmentationObject/
        ├── image1.png
        └── image2.png

- supported shapes: Polygons
- additional comments: the CVAT task should be created with the full label set that may be in the annotation files
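When preparing an archive for upload, it can help to check that labelmap.txt parses as expected. The sketch below reads the `label:color_rgb:parts:actions` lines into a dictionary of colors. `parse_labelmap` is a hypothetical helper, and the parts and actions fields are ignored here.

```python
def parse_labelmap(text):
    """Map each label name to an (r, g, b) tuple, or None if no color is given."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# label:color_rgb:..." header comment.
        if not line or line.startswith("#"):
            continue
        name, color, *_ = line.split(":")
        rgb = tuple(int(c) for c in color.split(",")) if color else None
        labels[name] = rgb
    return labels


sample = (
    "# label:color_rgb:parts:actions\n"
    "background:0,128,0::\n"
    "aeroplane:10,10,128::\n"
    "bird:::\n"
)
label_colors = parse_labelmap(sample)
```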
MOT sequence
Dumper
- downloaded file: a zip archive of the following structure:

    taskname.zip/
    ├── img1/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── gt/
        ├── labels.txt
        └── gt.txt

    # labels.txt
    cat
    dog
    person
    ...

    # gt.txt
    # frame_id, track_id, x, y, w, h, "not ignored", class_id, visibility, <skipped>
    1,1,1363,569,103,241,1,1,0.86014
    ...

- supported annotations: Rectangle shapes and tracks
- supported attributes: visibility (number), ignored (checkbox)
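A gt.txt line can be read with the standard csv module, as in the sketch below. Two points are assumptions rather than facts from this document: class_id is treated as a 1-based index into labels.txt, and x, y are treated as the top-left corner of the box (the MOT challenge convention). `parse_gt` is a hypothetical helper.

```python
import csv
import io

MOT_FIELDS = ["frame_id", "track_id", "x", "y", "w", "h",
              "not_ignored", "class_id", "visibility"]


def parse_gt(text, labels):
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue
        # Extra trailing columns (the "<skipped>" ones) are dropped by zip().
        values = dict(zip(MOT_FIELDS, record))
        x, y = float(values["x"]), float(values["y"])
        w, h = float(values["w"]), float(values["h"])
        rows.append({
            "frame_id": int(values["frame_id"]),
            "track_id": int(values["track_id"]),
            # Assumption: x, y is the top-left corner of the box.
            "points": [x, y, x + w, y + h],
            "ignored": values["not_ignored"] == "0",
            # Assumption: class_id is a 1-based index into labels.txt.
            "label": labels[int(values["class_id"]) - 1],
            "visibility": float(values["visibility"]),
        })
    return rows


rows = parse_gt("1,1,1363,569,103,241,1,1,0.86014\n",
                ["cat", "dog", "person"])
```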
Loader
- uploaded file: a zip archive of the structure above or:

    taskname.zip/
    ├── labels.txt # optional, mandatory for non-official labels
    └── gt.txt

- supported annotations: Rectangle tracks
LabelMe
Dumper
- downloaded file: a zip archive of the following structure:

    taskname.zip/
    ├── img1.jpg
    └── img1.xml

- supported annotations: Rectangles, Polygons (with attributes)
Loader
- uploaded file: a zip archive of the following structure:

    taskname.zip/
    ├── Masks/
    │   ├── img1_mask1.png
    │   └── img1_mask2.png
    ├── img1.xml
    ├── img2.xml
    └── img3.xml

- supported annotations: Rectangles, Polygons, Masks (as polygons)