Replace VOC format support in CVAT with Datumaro (#1167)

* Add image meta reading to voc

* Replace voc support in cvat

* Bump format version

* Materialize lazy transforms in voc export

* Store voc instance id as group id

* Add flat format import

* Add documentation

* Fix format name in doc
Branch: main
Author: zhiltsov-max, committed 6 years ago via GitHub
Commit: cd8ef2aca4 (parent 9850094773)

@@ -170,44 +170,58 @@ This is native CVAT annotation format.
- supported shapes - Rectangles, Polygons, Polylines, Points

### [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/)
- [Format specification](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/devkit_doc.pdf)

#### Pascal dumper description
- downloaded file: a zip archive of the following structure:
  ```bash
  taskname.zip/
  ├── Annotations/
  │   ├── <image_name1>.xml
  │   ├── <image_name2>.xml
  │   └── <image_nameN>.xml
  ├── ImageSets/
  │   └── Main/
  │       └── default.txt
  └── labelmap.txt
  ```
- supported shapes: Rectangles
- additional comments: If you plan to use the `truncated` and `difficult` attributes, please add the corresponding
  items to the CVAT label attributes:
  `~checkbox=difficult:false ~checkbox=truncated:false`
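The `Annotations/*.xml` files follow the standard Pascal VOC detection layout. A minimal sketch of reading the bounding boxes from one such file with the standard library (the file contents here are illustrative, not taken from a real export):

```python
import xml.etree.ElementTree as ET

# Illustrative contents of Annotations/<image_name1>.xml (names are examples)
VOC_XML = """<annotation>
  <filename>frame_000001.jpg</filename>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

def read_voc_boxes(xml_text):
    # Return (label, [xmin, ymin, xmax, ymax]) for every <object> element
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter('object'):
        bb = obj.find('bndbox')
        points = [float(bb.find(t).text) for t in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.find('name').text, points))
    return boxes

print(read_voc_boxes(VOC_XML))  # [('person', [10.0, 20.0, 110.0, 220.0])]
```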
#### Pascal loader description
- uploaded file: a zip archive of the structure declared above or the following:
  ```bash
  taskname.zip/
  ├── <image_name1>.xml
  ├── <image_name2>.xml
  ├── <image_nameN>.xml
  └── labelmap.txt # optional
  ```
  The `labelmap.txt` file contains dataset labels. It **must** be included
  if dataset labels **differ** from VOC default labels. The file structure:
  ```bash
  # label : color_rgb : 'body' parts : actions
  background:::
  aeroplane:::
  bicycle:::
  bird:::
  ```
  It must be possible for CVAT to match the frame (image name) and the file name from the annotation \*.xml
  file (the `filename` tag, e.g. `<filename>2008_004457.jpg</filename>`). There are 2 options:
  1. full match between the image name and the filename from the annotation \*.xml
     (in cases when the task was created from images or an image archive).
  1. match by frame number (if CVAT cannot match by name). The file name should
     be in the format `<number>.jpg`.
     It should be used when the task was created from a video.
- supported shapes: Rectangles
- limitations: Support of the Pascal VOC object detection format
- additional comments: the CVAT task should be created with the full label set that may appear in the annotation files
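The `labelmap.txt` grammar above is simple enough to sketch a reader for. The field handling here is an assumption based on the documented line format (`label : color_rgb : 'body' parts : actions`), not CVAT's actual parser:

```python
def parse_labelmap(text):
    # Parse "label : color_rgb : 'body' parts : actions" lines;
    # '#' lines are comments, empty fields stay empty
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, color, parts, actions = line.split(':')
        labels[name] = {
            'color': color.strip(),
            'parts': [p.strip() for p in parts.split(',') if p.strip()],
            'actions': [a.strip() for a in actions.split(',') if a.strip()],
        }
    return labels

text = """# label : color_rgb : 'body' parts : actions
background:::
person:255,0,0:head,hand:walking
"""
print(parse_labelmap(text)['person']['parts'])  # ['head', 'hand']
```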
#### How to create a task from a Pascal VOC dataset
1. Download the Pascal VOC dataset (can be downloaded from the
@@ -222,7 +236,7 @@ This is native CVAT annotation format.
   (See the [Creating an annotation task](cvat/apps/documentation/user_guide.md#creating-an-annotation-task)
   guide for details)
1. zip the corresponding annotation files
1. click the `Upload annotation` button, choose `Pascal VOC ZIP 1.1`
   and select the *.zip file with annotations from the previous step.
   It may take some time.
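The zipping step can be sketched in Python; `zip_annotations` and its default output name are illustrative, not part of CVAT:

```python
import os
from glob import glob
from zipfile import ZipFile

def zip_annotations(anno_dir, out_path='annotations.zip'):
    # Pack all *.xml annotation files (and labelmap.txt, if present) for upload
    with ZipFile(out_path, 'w') as zf:
        for path in glob(os.path.join(anno_dir, '*.xml')):
            zf.write(path, arcname=os.path.basename(path))
        labelmap = os.path.join(anno_dir, 'labelmap.txt')
        if os.path.isfile(labelmap):
            zf.write(labelmap, arcname='labelmap.txt')
    return out_path
```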

@@ -8,7 +8,7 @@ format_spec = {
        {
            "display_name": "{name} {format} {version}",
            "format": "ZIP",
            "version": "1.1",
            "handler": "dump"
        },
    ],
@@ -16,101 +16,57 @@ format_spec = {
        {
            "display_name": "{name} {format} {version}",
            "format": "ZIP",
            "version": "1.1",
            "handler": "load"
        },
    ],
}
def load(file_object, annotations):
    from glob import glob
    import os
    import os.path as osp
    import shutil
    from pyunpack import Archive
    from tempfile import TemporaryDirectory
    from datumaro.plugins.voc_format.importer import VocImporter
    from cvat.apps.dataset_manager.bindings import import_dm_annotations

    archive_file = file_object if isinstance(file_object, str) else getattr(file_object, "name")
    with TemporaryDirectory() as tmp_dir:
        Archive(archive_file).extractall(tmp_dir)

        # support flat archive layout
        anno_dir = osp.join(tmp_dir, 'Annotations')
        if not osp.isdir(anno_dir):
            anno_files = glob(osp.join(tmp_dir, '**', '*.xml'), recursive=True)
            subsets_dir = osp.join(tmp_dir, 'ImageSets', 'Main')
            os.makedirs(subsets_dir, exist_ok=True)
            with open(osp.join(subsets_dir, 'train.txt'), 'w') as subset_file:
                for f in anno_files:
                    subset_file.write(osp.splitext(osp.basename(f))[0] + '\n')
            os.makedirs(anno_dir, exist_ok=True)
            for f in anno_files:
                shutil.move(f, anno_dir)

        dm_project = VocImporter()(tmp_dir)
        dm_dataset = dm_project.make_dataset()
        import_dm_annotations(dm_dataset, annotations)

def dump(file_object, annotations):
    from cvat.apps.dataset_manager.bindings import CvatAnnotationsExtractor
    from cvat.apps.dataset_manager.util import make_zip_archive
    from datumaro.components.project import Environment, Dataset
    from tempfile import TemporaryDirectory

    env = Environment()
    id_from_image = env.transforms.get('id_from_image_name')

    extractor = CvatAnnotationsExtractor('', annotations)
    extractor = extractor.transform(id_from_image)
    extractor = Dataset.from_extractors(extractor) # apply lazy transforms
    converter = env.make_converter('voc_detection')
    with TemporaryDirectory() as temp_dir:
        converter(extractor, save_dir=temp_dir)
        make_zip_archive(temp_dir, file_object)
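The flat-archive branch of `load` can be isolated into a standalone helper for clarity. This sketch mirrors that logic (the real loader then hands the directory to Datumaro's `VocImporter`); `normalize_flat_voc_layout` is a hypothetical name:

```python
import os
import os.path as osp
import shutil
from glob import glob

def normalize_flat_voc_layout(extracted_dir, subset='train'):
    # Move loose *.xml files into Annotations/ and generate an
    # ImageSets/Main/<subset>.txt listing, as the flat-import path does.
    anno_dir = osp.join(extracted_dir, 'Annotations')
    if osp.isdir(anno_dir):
        return anno_dir  # already in the standard VOC layout
    anno_files = glob(osp.join(extracted_dir, '**', '*.xml'), recursive=True)
    subsets_dir = osp.join(extracted_dir, 'ImageSets', 'Main')
    os.makedirs(subsets_dir, exist_ok=True)
    with open(osp.join(subsets_dir, subset + '.txt'), 'w') as subset_file:
        for path in anno_files:
            subset_file.write(osp.splitext(osp.basename(path))[0] + '\n')
    os.makedirs(anno_dir, exist_ok=True)
    for path in anno_files:
        shutil.move(path, anno_dir)
    return anno_dir
```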

@@ -8,7 +8,7 @@ format_spec = {
        {
            "display_name": "{name} {format} {version}",
            "format": "ZIP",
            "version": "1.1",
            "handler": "dump"
        },
    ],
@@ -16,7 +16,7 @@ format_spec = {
        {
            "display_name": "{name} {format} {version}",
            "format": "ZIP",
            "version": "1.1",
            "handler": "load"
        },
    ],

@@ -2650,8 +2650,8 @@ class TaskAnnotationAPITestCase(JobAnnotationAPITestCase):
        elif annotation_format == "CVAT XML 1.1 for images":
            annotations["shapes"] = rectangle_shapes_with_attrs + rectangle_shapes_wo_attrs
        elif annotation_format == "PASCAL VOC ZIP 1.1" or \
                annotation_format == "YOLO ZIP 1.1" or \
                annotation_format == "TFRecord ZIP 1.0":
            annotations["shapes"] = rectangle_shapes_wo_attrs

@@ -235,7 +235,8 @@ class _Converter:
            if bbox is not None:
                _write_xml_bbox(bbox, obj_elem)

            for part_bbox in filter(
                    lambda x: obj.group and obj.group == x.group,
                    layout_bboxes):
                part_elem = ET.SubElement(obj_elem, 'part')
                ET.SubElement(part_elem, 'name').text = \
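The corrected predicate only links part boxes to objects that carry a real (non-zero) group id; the old `obj.id == x.group` test could accidentally match boxes whose group happened to equal an unrelated id. A toy sketch of the difference (`Box` and `parts_for` are illustrative names, not Datumaro API):

```python
from collections import namedtuple

Box = namedtuple('Box', 'id group')

def parts_for(obj, layout_bboxes):
    # New predicate: a part belongs to an object only when both share
    # a truthy group id (group 0 / None means "ungrouped")
    return [x for x in layout_bboxes if obj.group and obj.group == x.group]

parts = [Box(id=2, group=1), Box(id=3, group=0)]
print(parts_for(Box(id=1, group=1), parts))  # [Box(id=2, group=1)]
print(parts_for(Box(id=1, group=0), parts))  # [] - ungrouped objects get no parts
```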

@@ -4,6 +4,7 @@
# SPDX-License-Identifier: MIT

from collections import defaultdict
import logging as log
import os
import os.path as osp
from xml.etree import ElementTree as ET
@@ -13,7 +14,7 @@ from datumaro.components.extractor import (SourceExtractor, Extractor,
    AnnotationType, Label, Mask, Bbox, CompiledMask
)
from datumaro.util import dir_items
from datumaro.util.image import lazy_image, Image
from datumaro.util.mask_tools import lazy_mask, invert_colormap

from .format import (
@@ -52,8 +53,12 @@ class VocExtractor(SourceExtractor):
                subset_name = None
            subset = __class__.Subset(subset_name, self)
            subset.items = []
            with open(osp.join(subsets_dir, subset_file_name + '.txt'), 'r') as f:
                for line in f:
                    line = line.split()[0].strip()
                    if line:
                        subset.items.append(line)
            subsets[subset_name] = subset
        return subsets
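The new subset reader keeps only the first column of each line and skips empties. A standalone sketch of the intended behavior (stripping before splitting, so blank lines cannot raise `IndexError`; `read_subset_items` is an illustrative name):

```python
def read_subset_items(lines):
    # Keep the first whitespace-separated token of each non-empty line;
    # VOC subset files may carry a second column (the 1/-1 presence flag)
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        items.append(line.split()[0])
    return items

print(read_subset_items(['2008_004457  1\n', '\n', '2008_004458 -1\n']))
# ['2008_004457', '2008_004458']
```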
@@ -84,12 +89,7 @@ class VocExtractor(SourceExtractor):
        for ann_item in det_anno_items:
            with open(osp.join(det_anno_dir, ann_item + '.xml'), 'r') as f:
                ann_file_data = f.read()
            det_annotations[ann_item] = ann_file_data

        self._annotations[VocTask.detection] = det_annotations
@@ -134,6 +134,19 @@ class VocExtractor(SourceExtractor):
    def _get(self, item_id, subset_name):
        image = osp.join(self._path, VocPath.IMAGES_DIR,
            item_id + VocPath.IMAGE_EXT)

        det_annotations = self._annotations.get(VocTask.detection)
        if det_annotations is not None:
            det_annotations = det_annotations.get(item_id)
        if det_annotations is not None:
            root_elem = ET.fromstring(det_annotations)
            height = root_elem.find('size/height')
            if height is not None:
                height = int(height.text)
            width = root_elem.find('size/width')
            if width is not None:
                width = int(width.text)
            if height and width:
                image = Image(path=image, size=(height, width))

        annotations = self._get_annotations(item_id)
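The image-size probing added to `_get` can be sketched in isolation: parse the `<size>` element and fall back when it is missing or zero (`read_voc_image_size` is an illustrative name, not Datumaro API):

```python
import xml.etree.ElementTree as ET

def read_voc_image_size(xml_text):
    # Return (height, width) from the <size> element, or None if absent/zero
    root = ET.fromstring(xml_text)
    h, w = root.find('size/height'), root.find('size/width')
    if h is None or w is None:
        return None
    h, w = int(h.text), int(w.text)
    return (h, w) if h and w else None

xml = '<annotation><size><height>600</height><width>800</width></size></annotation>'
print(read_voc_image_size(xml))  # (600, 800)
```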
@@ -217,7 +230,7 @@ class VocExtractor(SourceExtractor):
        for obj_id, object_elem in enumerate(root_elem.findall('object')):
            obj_id += 1
            attributes = {}
            group = obj_id

            obj_label_id = None
            label_elem = object_elem.find('name')
@@ -262,20 +275,21 @@ class VocExtractor(SourceExtractor):
                for action, present in actions.items():
                    attributes[action] = present

            has_parts = False
            for part_elem in object_elem.findall('part'):
                part = part_elem.find('name').text
                part_label_id = self._get_label_id(part)
                part_bbox = self._parse_bbox(part_elem)

                if self._task is not VocTask.person_layout:
                    break
                if part_bbox is None:
                    continue
                has_parts = True
                item_annotations.append(Bbox(*part_bbox, label=part_label_id,
                    group=group))

            if self._task is VocTask.person_layout and not has_parts:
                continue
            if self._task is VocTask.action_classification and not actions:
                continue
@@ -699,7 +713,7 @@ class VocComp_9_10_Extractor(VocResultsExtractor):
    def _load_categories(self):
        from collections import OrderedDict
        from .format import VocAction
        label_map = OrderedDict((a.name, [[], [], []]) for a in VocAction)
        self._categories = make_voc_categories(label_map)

@@ -211,7 +211,7 @@ class VocExtractorTest(TestCase):
                    'difficult': False,
                    'occluded': False,
                },
                id=1, group=1,
            ),
            Bbox(4, 5, 2, 2, label=self._label('person'),
                attributes={
@@ -382,14 +382,14 @@ class VocConverterTest(TestCase):
            def __iter__(self):
                return iter([
                    DatasetItem(id=1, subset='a', annotations=[
                        Bbox(2, 3, 4, 5, label=2, id=1, group=1,
                            attributes={
                                'truncated': False,
                                'difficult': False,
                                'occluded': True,
                            }
                        ),
                        Bbox(2, 3, 4, 5, label=3, id=2, group=2,
                            attributes={
                                'truncated': True,
                                'difficult': False,
@@ -399,7 +399,7 @@ class VocConverterTest(TestCase):
                    ]),
                    DatasetItem(id=2, subset='b', annotations=[
                        Bbox(5, 4, 6, 5, label=3, id=1, group=1,
                            attributes={
                                'truncated': False,
                                'difficult': True,
@@ -498,16 +498,16 @@ class VocConverterTest(TestCase):
            def __iter__(self):
                return iter([
                    DatasetItem(id=1, subset='a', annotations=[
                        Bbox(2, 3, 4, 5, label=2,
                            id=1, group=1, attributes={
                                'truncated': True,
                                'difficult': False,
                                'occluded': False,
                                # no attributes here in the label categories
                            }
                        ),
                        Bbox(5, 4, 3, 2, label=self._label('person'),
                            id=2, group=2, attributes={
                                'truncated': True,
                                'difficult': False,
                                'occluded': False,
@@ -579,7 +579,7 @@ class VocConverterTest(TestCase):
            def __iter__(self):
                yield DatasetItem(id=1, annotations=[
                    # drop non voc label
                    Bbox(2, 3, 4, 5, label=self._label('cat'), id=1, group=1,
                        attributes={
                            'truncated': False,
                            'difficult': False,
@@ -615,16 +615,15 @@ class VocConverterTest(TestCase):
        class DstExtractor(TestExtractorBase):
            def __iter__(self):
                yield DatasetItem(id=1, annotations=[
                    Bbox(2, 3, 4, 5, label=self._label(VOC.VocLabel(1).name),
                        id=1, group=1, attributes={
                            'truncated': False,
                            'difficult': False,
                            'occluded': False,
                        }
                    ),
                    Bbox(1, 2, 3, 4, label=self._label('non_voc_label'),
                        id=2, group=2, attributes={
                            'truncated': False,
                            'difficult': False,
                            'occluded': False,
@@ -663,15 +662,15 @@ class VocConverterTest(TestCase):
        class DstExtractor(TestExtractorBase):
            def __iter__(self):
                yield DatasetItem(id=1, annotations=[
                    Bbox(2, 3, 4, 5, label=self._label('label_1'),
                        id=1, group=1, attributes={
                            'truncated': False,
                            'difficult': False,
                            'occluded': False,
                        }
                    ),
                    Bbox(1, 2, 3, 4, label=self._label('label_2'),
                        id=2, group=2, attributes={
                            'truncated': False,
                            'difficult': False,
                            'occluded': False,
