
Quick start guide

Installation

Prerequisites

  • Python (3.5+)
  • OpenVINO (optional)

Installation steps

Download the project to any directory.

Set up a virtual environment:

python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate
while read -r p; do pip install $p; done < requirements.txt

Usage

The directory containing the project should be in the PYTHONPATH environment variable; alternatively, invoke the commands from that directory.

As a python module:

python -m datumaro --help

As a standalone python script:

python datum.py --help

As a python library:

import datumaro
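
For example, a minimal sketch of working with an existing project through the library. The Project class, its load() method and make_dataset() call are assumptions about this version's library API; check the sources if they differ:

from datumaro.components.project import Project

# Load a previously created or imported project and materialize its dataset
project = Project.load('/home/project_dir')
dataset = project.make_dataset()

# Iterate over dataset items and their annotations
for item in dataset:
     print(item.id, item.subset, len(item.annotations))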

Workflow

Note: the command invocation syntax is subject to change; refer to the --help output

The key object is the project. It can be created or imported with the project create and project import commands. A project is a combination of a dataset and an environment.

If you want to interact with models, you should add them to the project first.

Implemented commands (CLI design doc):

  • project create
  • project import
  • project diff
  • project transform
  • source add
  • explain

Create a project

Usage:

python datum.py project create --help

python datum.py project create \
     -d <project_dir>

Example:

python datum.py project create -d /home/my_dataset

Import a project

This command creates a project from an existing dataset. Supported formats:

  • MS COCO
  • Custom formats via custom importers and extractors (see the sketch at the end of this section)

Usage:

python -m datumaro project import --help

python -m datumaro project import \
     <dataset_path> \
     -d <project_dir> \
     -t <format>

Example:

python -m datumaro project import \
     /home/coco_dir \
     -d /home/project_dir \
     -t ms_coco

An MS COCO-like dataset should have the following directory structure:

COCO/
├── annotations/
│   ├── instances_val2017.json
│   ├── instances_train2017.json
├── images/
│   ├── val2017
│   ├── train2017

Everything in the annotation file name after the last _ is treated as the subset name; for example, instances_val2017.json produces the val2017 subset.
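
For custom formats, an extractor sketch might look roughly as follows. This is an illustrative sketch only: the Extractor and DatasetItem names and their exact interfaces are assumptions (BboxObject, LabelCategories and AnnotationType are used the same way in the interpretation script further below); check the extractors bundled with the project for the real interface.

from datumaro.components.extractor import (Extractor, DatasetItem,
     BboxObject, LabelCategories, AnnotationType)

class MyFormatExtractor(Extractor):
     # Toy source with two hard-coded items; a real extractor would parse
     # the annotation files found under the dataset path it is given.
     def __init__(self, path):
          super().__init__()
          self._items = [
               DatasetItem(id=0, subset='train', annotations=[
                    BboxObject(10, 20, 30, 40, label=0),
               ]),
               DatasetItem(id=1, subset='val', annotations=[]),
          ]

     def __iter__(self):
          return iter(self._items)

     def __len__(self):
          return len(self._items)

     def categories(self):
          label_categories = LabelCategories()
          label_categories.add('person')
          return { AnnotationType.label: label_categories }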

Register a model

Supported models:

  • OpenVINO
  • Custom models via custom launchers

Usage:

python -m datumaro model add --help

Example: register OpenVINO model

A model consists of a graph description and weights. There is also a script used to convert model outputs to internal data structures.

python -m datumaro model add \
     <model_name> openvino \
     -d <path_to_xml> -w <path_to_bin> -i <path_to_interpretation_script>

Interpretation script for an OpenVINO detection model (convert.py):

from datumaro.components.extractor import *

max_det = 10
conf_thresh = 0.1

def process_outputs(inputs, outputs):
     # inputs = model input, array of images, shape = (N, H, W, C)
     # outputs = model output, shape = (N, 1, K, 7)
     # results = conversion result, [ [ Annotation, ... ], ... ]
     results = []
     for input, output in zip(inputs, outputs):
          input_height, input_width = input.shape[:2]
          detections = output[0]
          image_results = []
          for i, det in enumerate(detections):
               label = int(det[1])
               conf = det[2]
               if conf <= conf_thresh:
                    continue

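               # det = [image_id, label, conf, x_min, y_min, x_max, y_max],
               # with box coordinates normalized to [0, 1] (the usual OpenVINO
               # DetectionOutput layout, which this sample appears to assume)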
               x = max(int(det[3] * input_width), 0)
               y = max(int(det[4] * input_height), 0)
               w = min(int(det[5] * input_width - x), input_width)
               h = min(int(det[6] * input_height - y), input_height)
               image_results.append(BboxObject(x, y, w, h,
                    label=label, attributes={'score': conf} ))

          results.append(image_results[:max_det])

     return results

def get_categories():
     # Optionally, provide output categories - label map etc.
     # Example:
     label_categories = LabelCategories()
     label_categories.add('person')
     label_categories.add('car')
     return { AnnotationType.label: label_categories }
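
As the comments above indicate, process_outputs() is expected to return one list of annotations per input image (capped at max_det detections here), and get_categories() supplies the label map used to interpret the label indices the model produces.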

Run a model inference

This command creates a new project from the current project. The annotations in the new project are the model outputs.

Usage:

python -m datumaro project transform --help

python -m datumaro project transform \
     -m <model_name> \
     -d <save_dir>

Example:

python -m datumaro project import <...>
python -m datumaro model add mymodel <...>
python -m datumaro project transform -m mymodel -d ../mymodel_inference

Compare datasets

The command compares two datasets and saves the results in the specified directory. The current project is considered to be "ground truth".

python -m datumaro project diff --help

python -m datumaro project diff <other_project_dir> -d <save_dir>

Example: compare a dataset with model inference

python -m datumaro project import <...>
python -m datumaro model add mymodel <...>
python -m datumaro project transform <...> -d ../inference
python -m datumaro project diff ../inference -d ../diff

Run inference explanation

Usage:

python -m datumaro explain --help

python -m datumaro explain \
     -m <model_name> \
     -d <save_dir> \
     -t <target> \
     <method> \
     <method_params>

Example: run inference explanation on a single image with visualization

python -m datumaro project create <...>
python -m datumaro model add mymodel <...>
python -m datumaro explain \
     -m mymodel \
     -t 'image.png' \
     rise \
     -s 1000 --progressive

Extract data subset based on filter

This command creates a subproject from a project, including only the items that satisfy some condition. XPath is used as the query format.

Usage:

python -m datumaro project extract --help

python -m datumaro project extract \
     -p <source_project> \
     -d <destination_dir> \
     -f '<filter expression>'

Example:

python -m datumaro project extract \
     -p ../test_project \
     -d ../test_project-extract \
     -f '/item[image/width < image/height]'
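
The example filter keeps only the items whose image is taller than it is wide. Judging by the item representation below, expressions over annotation fields should work the same way; for instance, /item[annotation/label_id = 41] would keep items containing at least one annotation with label 41 (an illustrative query, not taken from the project documentation).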

Item representation:

<item>
  <id>290768</id>
  <subset>minival2014</subset>
  <image>
    <width>612</width>
    <height>612</height>
    <depth>3</depth>
  </image>
  <annotation>
    <id>80154</id>
    <type>bbox</type>
    <label_id>39</label_id>
    <x>264.59</x>
    <y>150.25</y>
    <w>11.199999999999989</w>
    <h>42.31</h>
    <area>473.87199999999956</area>
  </annotation>
  <annotation>
    <id>669839</id>
    <type>bbox</type>
    <label_id>41</label_id>
    <x>163.58</x>
    <y>191.75</y>
    <w>76.98999999999998</w>
    <h>73.63</h>
    <area>5668.773699999998</area>
  </annotation>
  ...
</item>