
Description

The purpose of this application is to add support for multiple annotation formats to CVAT. It lets users download and upload annotations in different formats and makes it easy to add support for new ones.

How to add a new annotation format support

Write a Python script that will be executed via the exec() function. The following items must be defined in the script:
- format_spec - a dictionary with the following structure:
```
format_spec = {
  "name": "CVAT",
  "dumpers": [
      {
          "display_name": "{name} {format} {version} for videos",
          "format": "XML",
          "version": "1.1",
          "handler": "dump_as_cvat_interpolation"
      },
      {
          "display_name": "{name} {format} {version} for images",
          "format": "XML",
          "version": "1.1",
          "handler": "dump_as_cvat_annotation"
      }
  ],
  "loaders": [
      {
          "display_name": "{name} {format} {version}",
          "format": "XML",
          "version": "1.1",
          "handler": "load",
      }
  ],
}
```
  - name - a unique name for the format
  - dumpers and loaders - lists of objects that describe the exposed dumpers and loaders. Each object must have the following keys:
    1. display_name - a unique string used as the ID for dumpers and loaders. This string is also displayed in the CVAT UI. Named placeholders can be used, as in the Python format function (only the name, format and version variables are supported).
    2. format - a string used as the file extension for a dumped annotation.
    3. version - a string with the version.
    4. handler - the name of the function that will be called; it must be defined at the top level of the script.
- dumper/loader handler functions. Each function should have the following signature:
```
def dump_handler(file_object, annotations):
```
Two variables are available inside the script environment:
- file_object - Python's standard file object, as returned by the open() function, exposing a file-oriented API (with methods such as read() or write()) to an underlying resource.
- annotations - an instance of the Annotation class.
The Annotation class exposes an API and some additional pre-defined types that allow loader/dumper code to get and add shapes.
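
To make this contract concrete, here is a minimal sketch of how a format script could be evaluated with exec() and how its handlers could be resolved by name. The `load_format` helper, the toy `SCRIPT` and the `FakeAnnotations` stand-in are hypothetical and are not CVAT's actual loading code; only `format_spec`, its keys and the handler signature come from the description above.

```python
import io

# A toy format script, as it might be stored in a *.py file (hypothetical).
SCRIPT = '''
format_spec = {
    "name": "TXT",
    "dumpers": [
        {
            "display_name": "{name} {format} {version}",
            "format": "TXT",
            "version": "0.1",
            "handler": "dump",
        },
    ],
    "loaders": [],
}

def dump(file_object, annotations):
    file_object.write("shapes: %d" % len(list(annotations.shapes)))
'''

def load_format(source):
    """Execute a format script and map display names to dumper handlers."""
    scope = {}
    exec(source, scope)  # handlers must live at the top level of the script
    spec = scope["format_spec"]
    dumpers = {}
    for dumper in spec["dumpers"]:
        # Only name, format and version are available as placeholders.
        display_name = dumper["display_name"].format(
            name=spec["name"],
            format=dumper["format"],
            version=dumper["version"],
        )
        dumpers[display_name] = scope[dumper["handler"]]
    return spec, dumpers

class FakeAnnotations:
    """Stand-in for the Annotation class; only `shapes` is mimicked."""
    shapes = ()

spec, dumpers = load_format(SCRIPT)
buffer = io.StringIO()
dumpers["TXT TXT 0.1"](buffer, FakeAnnotations())
print(buffer.getvalue())
```

The handler is looked up by its string name in the namespace produced by exec(), which is why it has to be defined at the top scope of the script.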

Short description of the public methods:
- Annotation.shapes - property, returns a generator of Annotation.LabeledShape objects
- Annotation.tracks - property, returns a generator of Annotation.Track objects
- Annotation.tags - property, returns a generator of Annotation.Tag objects
- Annotation.group_by_frame() - method, returns an iterator over Annotation.Frame objects, which group annotation objects by frame. Note that TrackedShapes will be represented as Annotation.LabeledShape.
- Annotation.meta - property, returns a dictionary that represents the task meta information, for example the video source name, number of frames, number of jobs, etc.
- Annotation.add_tag(tag) - tag should be an instance of the Annotation.Tag class
- Annotation.add_shape(shape) - shape should be an instance of the Annotation.LabeledShape class
- Annotation.add_track(track) - track should be an instance of the Annotation.Track class
- Annotation.Attribute = namedtuple('Attribute', 'name, value')
  - name - String, name of the attribute
  - value - String, value of the attribute
- Annotation.LabeledShape = namedtuple('LabeledShape', 'type, frame, label, points, occluded, attributes, group, z_order'), with LabeledShape.__new__.__defaults__ = (0, None)
- Annotation.TrackedShape = namedtuple('TrackedShape', 'type, points, occluded, frame, attributes, outside, keyframe, z_order'), with TrackedShape.__new__.__defaults__ = (None, )
- Annotation.Track = namedtuple('Track', 'label, group, shapes')
- Annotation.Tag = namedtuple('Tag', 'frame, label, attributes, group'), with Tag.__new__.__defaults__ = (0, )
- Frame = namedtuple('Frame', 'frame, name, width, height, labeled_shapes, tags')
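
The `__new__.__defaults__` assignments fill the rightmost fields of each namedtuple, so those fields can be omitted when an object is created. A small standalone sketch for LabeledShape, re-declared here outside of CVAT purely for illustration:

```python
from collections import namedtuple

LabeledShape = namedtuple(
    'LabeledShape',
    'type, frame, label, points, occluded, attributes, group, z_order',
)
# Two defaults cover the last two fields: group=0, z_order=None.
LabeledShape.__new__.__defaults__ = (0, None)

shape = LabeledShape(
    type="rectangle",
    frame=0,
    label="car",
    points=[0, 0, 100, 100],
    occluded=False,
    attributes=[],
)
print(shape.group, shape.z_order)
```

All other fields remain mandatory; leaving one out raises a TypeError.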
Pseudocode for a dumper script
```
...
# dump meta info if necessary
...

# iterate over all frames
for frame_annotation in annotations.group_by_frame():
    # get frame info
    image_name = frame_annotation.name
    image_width = frame_annotation.width
    image_height = frame_annotation.height

    # iterate over all shapes on the frame
    for shape in frame_annotation.labeled_shapes:
        label = shape.label
        xtl = shape.points[0]
        ytl = shape.points[1]
        xbr = shape.points[2]
        ybr = shape.points[3]

        # iterate over shape attributes
        for attr in shape.attributes:
            attr_name = attr.name
            attr_value = attr.value
...
# dump annotation code
file_object.write(...)
...
```
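
As a more concrete variant of the pseudocode, the sketch below dumps rectangles to CSV. The `dump_as_csv` handler, the CSV layout and the `FakeAnnotations` stand-in are assumptions made for illustration; the namedtuples are simplified copies of the ones listed above, and in a real script they are provided by CVAT.

```python
import csv
import io
from collections import namedtuple

# Simplified stand-ins for the types listed above.
Attribute = namedtuple('Attribute', 'name, value')
LabeledShape = namedtuple(
    'LabeledShape', 'type, frame, label, points, occluded, attributes')
Frame = namedtuple('Frame', 'frame, name, width, height, labeled_shapes, tags')

class FakeAnnotations:
    """Mimics only group_by_frame() of the Annotation class."""
    def __init__(self, frames):
        self._frames = frames

    def group_by_frame(self):
        return iter(self._frames)

def dump_as_csv(file_object, annotations):
    """Hypothetical dumper: one CSV row per rectangle."""
    writer = csv.writer(file_object, lineterminator="\n")
    writer.writerow(["image", "label", "xtl", "ytl", "xbr", "ybr", "attributes"])
    for frame_annotation in annotations.group_by_frame():
        for shape in frame_annotation.labeled_shapes:
            if shape.type != "rectangle":
                continue  # this sketch only handles boxes
            attrs = ";".join(
                "%s=%s" % (attr.name, attr.value) for attr in shape.attributes)
            writer.writerow(
                [frame_annotation.name, shape.label, *shape.points, attrs])

box = LabeledShape(
    "rectangle", 0, "car", [10, 20, 110, 220], False,
    [Attribute("color", "red")])
frames = [Frame(0, "frame_000000.jpg", 1920, 1080, [box], [])]
out = io.StringIO()
dump_as_csv(out, FakeAnnotations(frames))
print(out.getvalue())
```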
Pseudocode for a loader code
```
...
# read file_object
...

for parsed_shape in parsed_shapes:
    shape = annotations.LabeledShape(
        type="rectangle",
        points=[0, 0, 100, 100],
        occluded=False,
        attributes=[],
        label="car",
        frame=99,
    )

    annotations.add_shape(shape)
```
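
A runnable variant of the loader pseudocode could look like the sketch below. The input layout (`frame,label,xtl,ytl,xbr,ybr` rows) and the `FakeAnnotations` stand-in are hypothetical; in CVAT the `annotations` object is supplied by the application.

```python
import csv
import io
from collections import namedtuple

class FakeAnnotations:
    """Stand-in exposing only what this sketch needs from Annotation."""
    LabeledShape = namedtuple(
        'LabeledShape',
        'type, frame, label, points, occluded, attributes, group, z_order',
    )
    LabeledShape.__new__.__defaults__ = (0, None)

    def __init__(self):
        self.added = []

    def add_shape(self, shape):
        self.added.append(shape)

def load(file_object, annotations):
    """Hypothetical loader for rows of: frame,label,xtl,ytl,xbr,ybr."""
    for row in csv.reader(file_object):
        if not row:
            continue  # skip empty lines
        frame, label, *coords = row
        annotations.add_shape(annotations.LabeledShape(
            type="rectangle",
            frame=int(frame),
            label=label,
            points=[float(c) for c in coords],
            occluded=False,
            attributes=[],
        ))

annotations = FakeAnnotations()
load(io.StringIO("99,car,0,0,100,100\n"), annotations)
print(annotations.added[0])
```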
Full examples can be found in the corresponding *.py files (cvat.py, coco.py, yolo.py, etc.).

Add path to a new python script to the annotation app settings:

BUILTIN_FORMATS = (
  os.path.join(path_prefix, 'cvat.py'),
  os.path.join(path_prefix, 'pascal_voc.py'),
)

Ideas for improvements

An annotation format manager, similar to the DL Model manager, with which the user can add custom format support by writing dumper/loader scripts.
A custom loader/dumper often requires additional Python packages, so it would be useful if CVAT provided an API that allows users to install Python dependencies from their own code without changing the CVAT source code. A possible solution is to install additional modules via a pip call into a separate directory for each annotation format, which reduces version conflicts. Custom code could then run in an extended environment without affecting the core CVAT modules. This functionality could also be useful for the Auto Annotation module.

Format specifications

CVAT

This is the native CVAT annotation format. See the detailed format description in the CVAT documentation.

CVAT XML for images dumper

downloaded file: Single unpacked XML
supported shapes - Rectangles, Polygons, Polylines, Points

CVAT XML for videos dumper

downloaded file: Single unpacked XML
supported shapes - Rectangles, Polygons, Polylines, Points

CVAT XML Loader

uploaded file: Single unpacked XML
supported shapes - Rectangles, Polygons, Polylines, Points

Pascal VOC

Format specification

Pascal dumper description

downloaded file: a zip archive of the following structure:

taskname.zip/
├── Annotations/
│   ├── <image_name1>.xml
│   ├── <image_name2>.xml
│   └── <image_nameN>.xml
├── ImageSets/
│   └── Main/
│       └── default.txt
└── labelmap.txt

supported shapes: Rectangles
additional comments: If you plan to use the truncated and difficult attributes, add the corresponding items to the CVAT label attributes: ~checkbox=difficult:false ~checkbox=truncated:false

Pascal loader description

uploaded file: a zip archive of the structure declared above or the following:
```
taskname.zip/
├── <image_name1>.xml
├── <image_name2>.xml
├── <image_nameN>.xml
└── labelmap.txt # optional
```
The labelmap.txt file contains dataset labels. It must be included if dataset labels differ from VOC default labels. The file structure:
```
# label : color_rgb : 'body' parts : actions
background:::
aeroplane:::
bicycle:::
bird:::
```
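
A labelmap.txt file with this layout can be read with a few lines of Python. This is only a sketch: the `parse_labelmap` helper is hypothetical, and treating parts and actions as comma-separated lists is an assumption.

```python
def parse_labelmap(text):
    """Parse `label:color_rgb:parts:actions` lines into a dict."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        name, color, parts, actions = line.split(":")
        labels[name] = {
            # An empty color field means no explicit color is given.
            "color": tuple(int(c) for c in color.split(",")) if color else None,
            "parts": parts.split(",") if parts else [],
            "actions": actions.split(",") if actions else [],
        }
    return labels

labelmap = parse_labelmap("""\
# label : color_rgb : 'body' parts : actions
background:::
aeroplane:::
bicycle:10,128,0::
""")
print(labelmap["bicycle"]["color"])
```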
It must be possible for CVAT to match the frame (image name) and file name from annotation *.xml file (the tag filename, e.g. <filename>2008_004457.jpg</filename>). There are 2 options:
1. full match between image name and filename from annotation *.xml (in cases when task was created from images or image archive).
2. match by frame number (if CVAT cannot match by name). File name should be in the following format <number>.jpg. It should be used when task was created from a video.
supported shapes: Rectangles
limitations: only the Pascal VOC object detection format is supported
additional comments: the CVAT task should be created with the full label set that may be in the annotation files
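
The two matching options described above can be sketched as follows. The `match_frame` helper and the `frame_names` list (task image names indexed by frame number) are hypothetical and only illustrate the rule; they are not CVAT's implementation.

```python
import os
import re

def match_frame(filename, frame_names):
    """Return the frame index for an annotation `filename`, or None."""
    # Option 1: full match between the image name and the filename tag.
    if filename in frame_names:
        return frame_names.index(filename)
    # Option 2: fall back to the frame number from a `<number>.jpg` name.
    match = re.fullmatch(r"(\d+)\.jpg", os.path.basename(filename))
    return int(match.group(1)) if match else None

# Task created from images: matched by name.
print(match_frame("2008_004457.jpg", ["2008_000001.jpg", "2008_004457.jpg"]))
# Task created from a video: matched by frame number.
print(match_frame("000042.jpg", []))
```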

How to create a task from Pascal VOC dataset

Download the Pascal VOC dataset (it can be downloaded from the PASCAL VOC website)
Create a CVAT task with the following labels:
```
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor
```
You can add ~checkbox=difficult:false ~checkbox=truncated:false attributes for each label if you want to use them.

Select the image files of interest (see the Creating an annotation task guide for details)
Zip the corresponding annotation files
Click the Upload annotation button, choose Pascal VOC ZIP 1.1 and select the *.zip file with annotations from the previous step. It may take some time.

YOLO

Yolo dumper description

downloaded file: a zip archive with the following structure (see the YOLO format specification for details):

archive.zip/
├── obj.data
├── obj.names
├── obj_<subset>_data
│   ├── image1.txt
│   └── image2.txt
└── train.txt # list of subset image paths

# the only valid subsets are: train, valid
# train.txt and valid.txt:
obj_<subset>_data/image1.jpg
obj_<subset>_data/image2.jpg

# obj.data:
classes = 3 # optional
names = obj.names
train = train.txt
valid = valid.txt # optional
backup = backup/ # optional

# obj.names:
cat
dog
airplane

# image_name.txt:
# label_id - id from obj.names
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# label_id cx cy rw rh
1 0.3 0.8 0.1 0.3
2 0.7 0.2 0.3 0.1

Each annotation *.txt file has a name that corresponds to the name of the image file (e.g. frame_000001.txt is the annotation for the frame_000001.jpg image). Each line of the *.txt file describes a label and a bounding box in the format label_id cx cy rw rh. obj.names contains the ordered list of label names.
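
Because the coordinates are relative to the image size, a dumper or loader has to convert between absolute corner coordinates and the `label_id cx cy rw rh` line format. A minimal sketch, with hypothetical helper names and an arbitrary choice of six decimal places:

```python
def to_yolo_line(label_id, xtl, ytl, xbr, ybr, image_width, image_height):
    """Convert an absolute box (corner points) to a YOLO annotation line."""
    cx = (xtl + xbr) / 2 / image_width    # relative x of the box center
    cy = (ytl + ybr) / 2 / image_height   # relative y of the box center
    rw = (xbr - xtl) / image_width        # relative box width
    rh = (ybr - ytl) / image_height       # relative box height
    return "%d %.6f %.6f %.6f %.6f" % (label_id, cx, cy, rw, rh)

def from_yolo_line(line, image_width, image_height):
    """Convert a YOLO annotation line back to absolute corner points."""
    label_id, cx, cy, rw, rh = line.split()
    w = float(rw) * image_width
    h = float(rh) * image_height
    xtl = float(cx) * image_width - w / 2
    ytl = float(cy) * image_height - h / 2
    return int(label_id), [xtl, ytl, xtl + w, ytl + h]

line = to_yolo_line(1, 100, 200, 300, 400, 1000, 800)
print(line)
print(from_yolo_line(line, 1000, 800))
```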

supported shapes - Rectangles

Yolo loader description

uploaded file: a zip archive of the same structure as above. It must be possible to match the CVAT frame (image name) and the annotation file name. There are 2 options:
1. full match between image name and name of annotation *.txt file (in cases when a task was created from images or archive of images).
2. match by frame number (if CVAT cannot match by name). File name should be in the following format <number>.jpg. It should be used when task was created from a video.
supported shapes: Rectangles
additional comments: the CVAT task should be created with the full label set that may be in the annotation files

How to create a task from YOLO formatted dataset (from VOC for example)

Follow the official guide (see the Training YOLO on VOC section) and prepare the YOLO formatted annotation files.
Zip train images
```
zip images.zip -j -@ < train.txt
```
Create a CVAT task with the following labels:
```
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor
```
Select images.zip as data. Most likely you will need to use the share functionality because images.zip is larger than 500 MB. See the Creating an annotation task guide for details.

Create obj.names with the following content:

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor

Zip all label files together (we need to add only label files that correspond to the train subset)

cat train.txt | while read p; do f=${p##*/}; echo "${p%/*/*}/labels/${f%%.*}.txt"; done | zip labels.zip -j -@ obj.names

Click the Upload annotation button, choose YOLO ZIP 1.1 and select the *.zip file with labels from the previous step. It may take some time.

MS COCO Object Detection

COCO dumper description

downloaded file: a single unpacked *.json file. See the MS COCO data format documentation for a detailed description.
supported shapes - Polygons, Rectangles (interpreted as polygons)

COCO loader description

uploaded file: a single unpacked *.json file.
supported shapes: an object is interpreted as a Polygon if the segmentation field of the annotation is not empty; otherwise it is interpreted as a Rectangle with coordinates from the bbox field.
additional comments: the CVAT task should be created with the full label set that may be in the annotation files
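
The interpretation rule for COCO objects can be sketched as below. The `coco_annotation_to_shape` helper is hypothetical; it relies on the COCO convention that bbox is stored as [x, y, width, height], takes only the first polygon of the segmentation field, and does not handle RLE-encoded masks.

```python
def coco_annotation_to_shape(annotation):
    """Map one COCO annotation dict to a (shape_type, points) pair."""
    segmentation = annotation.get("segmentation")
    if segmentation:
        # Non-empty segmentation: use the first polygon's flat x, y list.
        return "polygon", list(segmentation[0])
    # Otherwise fall back to the box, converting [x, y, w, h] to corners.
    x, y, w, h = annotation["bbox"]
    return "rectangle", [x, y, x + w, y + h]

print(coco_annotation_to_shape(
    {"segmentation": [[10, 10, 50, 10, 50, 40]], "bbox": [10, 10, 40, 30]}))
print(coco_annotation_to_shape(
    {"segmentation": [], "bbox": [10, 10, 40, 30]}))
```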

How to create a task from MS COCO dataset

Download the MS COCO dataset. For example 2017 Val images and 2017 Train/Val annotations.

Create a CVAT task with the following labels:

person bicycle car motorcycle airplane bus train truck boat "traffic light" "fire hydrant" "stop sign" "parking meter" bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard "sports ball" kite "baseball bat" "baseball glove" skateboard surfboard "tennis racket" bottle "wine glass" cup fork knife spoon bowl banana apple sandwich orange broccoli carrot "hot dog" pizza donut cake chair couch "potted plant" bed "dining table" toilet tv laptop mouse remote keyboard "cell phone" microwave oven toaster sink refrigerator book clock vase scissors "teddy bear" "hair drier" toothbrush

Select val2017.zip as data (See Creating an annotation task guide for details)

Unpack annotations_trainval2017.zip
Click the Upload annotation button, choose COCO JSON 1.0 and select the instances_val2017.json annotation file. It may take some time.

TFRecord

TFRecord is a very flexible format, but we try to match the format used in the TF object detection API with minimal modifications. The following feature description is used:

image_feature_description = {
    'image/filename': tf.io.FixedLenFeature([], tf.string),
    'image/source_id': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    # Object boxes and classes.
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
    'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    'image/object/class/text': tf.io.VarLenFeature(tf.string),
}

TFRecord dumper description

downloaded file: a zip archive with following structure:

taskname.zip
├── task2.tfrecord
└── label_map.pbtxt

supported shapes - Rectangles

TFRecord loader description

uploaded file: a zip archive with following structure:
```
taskname.zip
└── task2.tfrecord
```
supported shapes: Rectangles
additional comments: the CVAT task should be created with the full label set that may be in the annotation files

How to create a task from TFRecord dataset (from VOC2007 for example)

Create label_map.pbtxt file with the following content:

item {
	id: 1
	name: 'aeroplane'
}
item {
	id: 2
	name: 'bicycle'
}
item {
	id: 3
	name: 'bird'
}
item {
	id: 4
	name: 'boat'
}
item {
	id: 5
	name: 'bottle'
}
item {
	id: 6
	name: 'bus'
}
item {
	id: 7
	name: 'car'
}
item {
	id: 8
	name: 'cat'
}
item {
	id: 9
	name: 'chair'
}
item {
	id: 10
	name: 'cow'
}
item {
	id: 11
	name: 'diningtable'
}
item {
	id: 12
	name: 'dog'
}
item {
	id: 13
	name: 'horse'
}
item {
	id: 14
	name: 'motorbike'
}
item {
	id: 15
	name: 'person'
}
item {
	id: 16
	name: 'pottedplant'
}
item {
	id: 17
	name: 'sheep'
}
item {
	id: 18
	name: 'sofa'
}
item {
	id: 19
	name: 'train'
}
item {
	id: 20
	name: 'tvmonitor'
}
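
Writing this file by hand is error-prone for long label lists, so it can also be generated. A small sketch, assuming ids start at 1 and follow the label order as in the listing above; the `make_label_map` helper is hypothetical:

```python
VOC_LABELS = (
    "aeroplane bicycle bird boat bottle bus car cat chair cow diningtable "
    "dog horse motorbike person pottedplant sheep sofa train tvmonitor"
).split()

def make_label_map(labels):
    """Render labels as label_map.pbtxt items with ids starting at 1."""
    items = []
    for index, name in enumerate(labels, start=1):
        items.append("item {\n\tid: %d\n\tname: '%s'\n}\n" % (index, name))
    return "".join(items)

with open("label_map.pbtxt", "w") as label_map_file:
    label_map_file.write(make_label_map(VOC_LABELS))
```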

Use create_pascal_tf_record.py to convert the VOC2007 dataset to the TFRecord format. For example:

python create_pascal_tf_record.py --data_dir <path to VOCdevkit> --set train --year VOC2007 --output_path pascal.tfrecord --label_map_path label_map.pbtxt

Zip train images

cat <path to VOCdevkit>/VOC2007/ImageSets/Main/train.txt | while read p; do echo <path to VOCdevkit>/VOC2007/JPEGImages/${p}.jpg  ; done | zip images.zip -j -@

Create a CVAT task with the following labels:

aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor

Select images.zip as data. See Creating an annotation task guide for details.

Zip pascal.tfrecord and label_map.pbtxt files together

zip anno.zip -j <path to pascal.tfrecord> <path to label_map.pbtxt>

Click the Upload annotation button, choose TFRecord ZIP 1.0 and select the *.zip file with labels from the previous step. It may take some time.

PNG mask

Mask dumper description

downloaded file: a zip archive with the following structure:

taskname.zip
├── labelmap.txt # optional, required for non-VOC labels
├── ImageSets/
│   └── Segmentation/
│       └── default.txt # list of image names without extension
├── SegmentationClass/ # merged class masks
│   ├── image1.png
│   └── image2.png
└── SegmentationObject/ # merged instance masks
    ├── image1.png
    └── image2.png

A mask is a PNG image with several (RGB) channels in which each pixel has a color that corresponds to a label. Colors are generated according to the Pascal VOC color generation algorithm. (0, 0, 0) is used for the background. The labelmap.txt file contains the values of the used colors in RGB format. The file structure:

# label:color_rgb:parts:actions
background:0,128,0::
aeroplane:10,10,128::
bicycle:10,128,0::
bird:0,108,128::
boat:108,0,100::
bottle:18,0,8::
bus:12,28,0::
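
For reference, the color generation scheme commonly associated with the Pascal VOC devkit spreads the bits of the class index over the three channels. The sketch below reproduces that scheme; whether CVAT uses exactly this code is an assumption, and the `voc_colormap` name is hypothetical.

```python
def voc_colormap(size=256):
    """Generate the Pascal VOC color for each class index below `size`."""
    colors = []
    for index in range(size):
        r = g = b = 0
        value = index
        for shift in range(8):
            # Take three bits at a time and place them from the top bit down.
            r |= ((value >> 0) & 1) << (7 - shift)
            g |= ((value >> 1) & 1) << (7 - shift)
            b |= ((value >> 2) & 1) << (7 - shift)
            value >>= 3
        colors.append((r, g, b))
    return colors

colormap = voc_colormap()
print(colormap[0])   # background
print(colormap[1])   # first class
print(colormap[15])  # fifteenth class
```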

supported shapes - Rectangles, Polygons

Mask loader description

uploaded file: a zip archive of the following structure:

name.zip
├── labelmap.txt # optional, required for non-VOC labels
├── ImageSets/
│   └── Segmentation/
│       └── <any_subset_name>.txt
├── SegmentationClass/
│   ├── image1.png
│   └── image2.png
└── SegmentationObject/
    ├── image1.png
    └── image2.png

supported shapes: Polygons
additional comments: the CVAT task should be created with the full label set that may be in the annotation files

MOT sequence

Dumper

downloaded file: a zip archive of the following structure:

taskname.zip/
├── img1/
│   ├── image1.jpg
│   └── image2.jpg
└── gt/
    ├── labels.txt
    └── gt.txt

# labels.txt
cat
dog
person
...

# gt.txt
# frame_id, track_id, x, y, w, h, "not ignored", class_id, visibility, <skipped>
1,1,1363,569,103,241,1,1,0.86014
...

supported annotations: Rectangle shapes and tracks
supported attributes: visibility (number), ignored (checkbox)
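
A gt.txt file with the column layout above can be read with the csv module. This is only a sketch: the `MotRecord` type and `parse_gt` helper are hypothetical, and columns after visibility are ignored.

```python
import csv
import io
from collections import namedtuple

MotRecord = namedtuple(
    'MotRecord',
    'frame_id, track_id, x, y, w, h, not_ignored, class_id, visibility',
)

def parse_gt(file_object):
    """Parse MOT gt.txt rows into MotRecord tuples."""
    records = []
    for row in csv.reader(file_object):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        records.append(MotRecord(
            frame_id=int(row[0]),
            track_id=int(row[1]),
            x=float(row[2]),
            y=float(row[3]),
            w=float(row[4]),
            h=float(row[5]),
            not_ignored=bool(int(row[6])),
            class_id=int(row[7]),
            visibility=float(row[8]),
        ))
    return records

records = parse_gt(io.StringIO("1,1,1363,569,103,241,1,1,0.86014\n"))
print(records[0])
```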

Loader

uploaded file: a zip archive of the structure above or:

taskname.zip/
├── labels.txt # optional, mandatory for non-official labels
└── gt.txt

supported annotations: Rectangle tracks

LabelMe

Dumper

downloaded file: a zip archive of the following structure:
```
taskname.zip/
├── img1.jpg
└── img1.xml
```
supported annotations: Rectangles, Polygons (with attributes)

Loader

uploaded file: a zip archive of the following structure:

taskname.zip/
├── Masks/
│   ├── img1_mask1.png
│   └── img1_mask2.png
├── img1.xml
├── img2.xml
└── img3.xml

supported annotations: Rectangles, Polygons, Masks (as polygons)