[Datumaro] CLI updates + better documentation (#1057)
parent
095d6d4611
commit
93b3c091f5
@@ -0,0 +1,119 @@

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Testing](#testing)
- [Design](#design-and-code-structure)

## Installation

### Prerequisites

- Python (3.5+)
- OpenVINO (optional)

``` bash
git clone https://github.com/opencv/cvat
```

Optionally, install a virtual environment:

``` bash
python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate
```

Then install all dependencies:

``` bash
# install the packages one by one, in file order
while read -r p; do pip install $p; done < requirements.txt
```

If you're working inside the CVAT environment:

``` bash
. .env/bin/activate
while read -r p; do pip install $p; done < datumaro/requirements.txt
```

## Usage

> The directory containing Datumaro should be in the `PYTHONPATH`
> environment variable or `cvat/datumaro/` should be the current directory.

``` bash
datum --help
python -m datumaro --help
python datumaro/ --help
python datum.py --help
```

``` python
import datumaro
```
## Testing

It is expected that all Datumaro functionality is covered and checked by
unit tests. Tests are placed in the `tests/` directory.

To run the tests, use:

``` bash
python -m unittest discover -s tests
```

If you're working inside the CVAT environment, you can also use:

``` bash
python manage.py test datumaro/
```

## Design and code structure

- [Design document](docs/design.md)
### Command-line

Take [Docker](https://www.docker.com/) as an example. Basically,
the interface is divided into contexts and single commands.
Contexts are semantically grouped commands related to a single
topic or target. Single commands are handy shortcuts for the most
frequently used commands, as well as special commands that are
hard to fit into any specific context.
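For example, project export can be invoked both as a command inside the
`project` context and as a single-command shortcut (both forms come from the
CLI definition in this patch):

``` bash
datum project export -f coco    # full form: 'export' inside the 'project' context
datum export -f coco            # single-command shortcut for the same action
```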

|
||||
|
||||
- The diagram above was created with [FreeMind](http://freemind.sourceforge.net/wiki/index.php/Main_Page)
|
||||
|
||||
Model-View-ViewModel (MVVM) UI pattern is used.
|
||||
|
||||

|
||||
|
||||
### Datumaro project and environment structure

<!--lint disable fenced-code-flag-->
```
├── [datumaro module]
└── [project folder]
    ├── .datumaro/
    │   ├── config.yml
    │   ├── .git/
    │   ├── importers/
    │   │   ├── custom_format_importer1.py
    │   │   └── ...
    │   ├── statistics/
    │   │   ├── custom_statistic1.py
    │   │   └── ...
    │   ├── visualizers/
    │   │   ├── custom_visualizer1.py
    │   │   └── ...
    │   └── extractors/
    │       ├── custom_extractor1.py
    │       └── ...
    ├── dataset/
    └── sources/
        ├── source1
        └── ...
```
<!--lint enable fenced-code-flag-->
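The `importers/`, `statistics/`, `visualizers/`, and `extractors/` directories
above hold optional plugin scripts. As a very rough sketch of what such a
script contains (hypothetical class and file names; the actual base-class
interface in `datumaro.components.extractor` may differ):

``` python
# .datumaro/extractors/custom_extractor1.py -- a hypothetical sketch
from datumaro.components.extractor import DatasetItem, Extractor

class CustomExtractor(Extractor):
    def __init__(self, url):
        super().__init__()
        # a real plugin would parse the data found at 'url' here
        self._items = [DatasetItem(id='sample1')]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)
```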
@@ -1,36 +1,176 @@

# Dataset Framework (Datumaro)

A framework to build, transform, and analyze datasets.

<!--lint disable fenced-code-flag-->
```
CVAT annotations --                 ---> Annotation tool
       ...          \              /
COCO-like dataset -----> Datumaro ---> dataset ------> Model training
       ...          /              \
VOC-like dataset --                 ---> Publication etc.
```
<!--lint enable fenced-code-flag-->

## Contents

- [Documentation](#documentation)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Examples](#examples)
- [Contributing](#contributing)

## Documentation

- [Quick start guide](docs/quickstart.md)
- [User manual](docs/user_manual.md)
- [Design document](docs/design.md)
- [Contributing](CONTRIBUTING.md)

## Features

- Dataset format conversions:
  - COCO (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`*)
    - [Format specification](http://cocodataset.org/#format-data)
    - `labels` is our extension: like `instances`, but with only `category_id`
  - PASCAL VOC (`classification`, `detection`, `segmentation` (class, instances), `action_classification`, `person_layout`)
    - [Format specification](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html)
  - YOLO (`bboxes`)
    - [Format specification](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data)
  - TF Detection API (`bboxes`, `masks`)
    - Format specifications: [bboxes](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md), [masks](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/instance_segmentation.md)
  - CVAT
    - [Format specification](https://github.com/opencv/cvat/blob/develop/cvat/apps/documentation/xml_format.md)
- Dataset building operations:
  - Merging multiple datasets into one
  - Dataset filtering with custom conditions, for instance:
    - remove all annotations except polygons of a certain class
    - remove images without a specific class
    - remove occluded annotations from images
    - keep only vertically-oriented images
    - remove small-area bounding boxes from annotations
- Annotation conversions, for instance:
  - polygons to instance masks and vice versa
  - apply a custom colormap for mask annotations
  - remap dataset labels
- Dataset comparison
- Model integration:
  - Inference (OpenVINO and custom models)
  - Explainable AI ([RISE algorithm](https://arxiv.org/abs/1806.07421))

> Check the [design document](docs/design.md) for a full list of features

## Installation

Optionally, create a virtual environment:

``` bash
python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate
```

Install Datumaro package:

``` bash
pip install 'git+https://github.com/opencv/cvat#egg=datumaro&subdirectory=datumaro'
```
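
After installation, a quick check that the CLI is available (assuming the
`datum` entry point ends up on `PATH`):

``` bash
datum --version
datum --help
```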

## Usage

There are several options available:
- [A standalone command-line tool](#standalone-tool)
- [A python module](#python-module)

### Standalone tool

<!--lint disable fenced-code-flag-->
```
        User
          |
          v
+------------------+
|       CVAT       |
+--------v---------+       +------------------+       +--------------+
| Datumaro module  | ----> | Datumaro project | <---> | Datumaro CLI | <--- User
+------------------+       +------------------+       +--------------+
```
<!--lint enable fenced-code-flag-->

``` bash
datum --help
python -m datumaro --help
```

### Python module

Datumaro can be used in custom scripts as a library in the following way:

``` python
from datumaro.components.project import Project # project-related things
import datumaro.components.extractor # annotations and high-level interfaces
# etc.
project = Project.load('directory')
```
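
Continuing the snippet above, a project can be flattened into a single dataset
and iterated; `make_dataset()` is the same call the CLI uses internally (treat
the printed item fields as illustrative):

``` python
from datumaro.components.project import Project

project = Project.load('directory')
dataset = project.make_dataset() # merge all project sources into one dataset

for item in dataset: # dataset items carry an id and a list of annotations
    print(item.id, len(item.annotations))
```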

## Examples

<!--lint disable list-item-indent-->
<!--lint disable list-item-bullet-indent-->

- Convert [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#data) to COCO, keep only images that contain the `cat` class:
  ```bash
  # Download VOC dataset:
  # http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
  datum project import --format voc --input-path <path/to/voc>
  datum project export --format coco --filter '/item[annotation/label="cat"]'
  ```

- Convert only non-occluded annotations from a CVAT-annotated project to TFRecord:
  ```bash
  # export Datumaro dataset in CVAT UI, extract somewhere, go to the project dir
  datum project extract --filter '/item/annotation[occluded="False"]' \
    --mode items+anno --output-dir not_occluded
  datum project export --project not_occluded \
    --format tf_detection_api -- --save-images
  ```

- Annotate COCO, extract an image subset, re-annotate it in CVAT, update the old dataset:
  ```bash
  # Download COCO dataset http://cocodataset.org/#download
  # Put images to coco/images/ and annotations to coco/annotations/
  datum project import --format coco --input-path <path/to/coco>
  datum project export --filter '/image[images_I_dont_like]' --format cvat \
    --output-dir reannotation
  # import dataset and images to CVAT, re-annotate
  # export Datumaro project, extract to 'reannotation-upd'
  datum project merge reannotation-upd
  datum project export --format coco
  ```

- Annotate instance polygons in CVAT, export as masks in COCO:
  ```bash
  datum project import --format cvat --input-path <path/to/cvat.xml>
  datum project export --format coco -- --segmentation-mode masks
  ```

- Apply an OpenVINO detection model to some COCO-like dataset,
  then compare annotations with ground truth and visualize in TensorBoard:
  ```bash
  datum project import --format coco --input-path <path/to/coco>
  # create model results interpretation script
  datum model add mymodel openvino \
    --weights model.bin --description model.xml \
    --interpretation-script parse_results.py
  datum model run --model mymodel --output-dir mymodel_inference/
  datum project diff mymodel_inference/ --format tensorboard --output-dir diff
  ```

<!--lint enable list-item-bullet-indent-->
<!--lint enable list-item-indent-->

## Contributing

Feel free to [open an Issue](https://github.com/opencv/cvat/issues/new) if you
think something needs to be changed. You are welcome to participate in
development; instructions are available in our [developer manual](CONTRIBUTING.md).
@@ -0,0 +1,109 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse
import logging as log
import sys

from . import contexts, commands
from .util import CliException, add_subparser
from ..version import VERSION


_log_levels = {
    'debug': log.DEBUG,
    'info': log.INFO,
    'warning': log.WARNING,
    'error': log.ERROR,
    'critical': log.CRITICAL
}

def loglevel(name):
    return _log_levels[name]

def _make_subcommands_help(commands, help_line_start=0):
    desc = ""
    for command_name, _, command_help in commands:
        # pad the command name so that all help texts start at the same column
        desc += ("  %-" + str(max(0, help_line_start - 2 - 1)) + "s%s\n") % \
            (command_name, command_help)
    return desc

def make_parser():
    parser = argparse.ArgumentParser(prog="datumaro",
        description="Dataset Framework",
        formatter_class=argparse.RawDescriptionHelpFormatter)

    parser.add_argument('--version', action='version', version=VERSION)
    parser.add_argument('--loglevel', type=loglevel, default='info',
        help="Logging level (options: %s; default: %s)" % \
            (', '.join(_log_levels.keys()), "%(default)s"))

    known_contexts = [
        ('project', contexts.project, "Actions on projects (datasets)"),
        ('source', contexts.source, "Actions on data sources"),
        ('model', contexts.model, "Actions on models"),
    ]
    known_commands = [
        ('create', commands.create, "Create project"),
        ('add', commands.add, "Add source to project"),
        ('remove', commands.remove, "Remove source from project"),
        ('export', commands.export, "Export project"),
        ('explain', commands.explain, "Run Explainable AI algorithm for model"),
    ]

    # Argparse doesn't support subparser groups:
    # https://stackoverflow.com/questions/32017020/grouping-argparse-subparser-arguments
    help_line_start = max((len(e[0]) for e in known_contexts + known_commands),
        default=0)
    help_line_start = max((2 + help_line_start) // 4 + 1, 6) * 4 # align to tabs
    subcommands_desc = ""
    if known_contexts:
        subcommands_desc += "Contexts:\n"
        subcommands_desc += _make_subcommands_help(known_contexts,
            help_line_start)
    if known_commands:
        if subcommands_desc:
            subcommands_desc += "\n"
        subcommands_desc += "Commands:\n"
        subcommands_desc += _make_subcommands_help(known_commands,
            help_line_start)
    if subcommands_desc:
        subcommands_desc += \
            "\nRun '%s COMMAND --help' for more information on a command." % \
                parser.prog

    subcommands = parser.add_subparsers(title=subcommands_desc,
        description="", help=argparse.SUPPRESS)
    for command_name, command, _ in known_contexts + known_commands:
        add_subparser(subcommands, command_name, command.build_parser)

    return parser

def set_up_logger(args):
    log.basicConfig(format='%(asctime)s %(levelname)s: %(message)s',
        level=args.loglevel)

def main(args=None):
    parser = make_parser()
    args = parser.parse_args(args)

    set_up_logger(args)

    if 'command' not in args:
        # no subcommand was given
        parser.print_help()
        return 1

    try:
        return args.command(args)
    except CliException as e:
        log.error(e)
        return 1
    except Exception as e:
        log.error(e)
        raise


if __name__ == '__main__':
    sys.exit(main())
@@ -1,21 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse

from . import source as source_module


def build_parser(parser=argparse.ArgumentParser()):
    source_module.build_add_parser(parser). \
        set_defaults(command=source_module.add_command)

    return parser

def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)

    return args.command(args)
@@ -0,0 +1,6 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

from . import add, create, explain, export, remove
@@ -0,0 +1,8 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.source import build_add_parser as build_parser
@@ -0,0 +1,8 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.project import build_create_parser as build_parser
@@ -0,0 +1,8 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.project import build_export_parser as build_parser
@@ -0,0 +1,8 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.source import build_remove_parser as build_parser
@@ -0,0 +1,6 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

from . import project, source, model, item
@@ -0,0 +1,36 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse

from ...util import add_subparser


def build_export_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor()
    return parser

def build_stats_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor()
    return parser

def build_diff_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor()
    return parser

def build_edit_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor()
    return parser

def build_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor()

    subparsers = parser.add_subparsers()
    add_subparser(subparsers, 'export', build_export_parser)
    add_subparser(subparsers, 'stats', build_stats_parser)
    add_subparser(subparsers, 'diff', build_diff_parser)
    add_subparser(subparsers, 'edit', build_edit_parser)

    return parser
@@ -0,0 +1,647 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse
from enum import Enum
import logging as log
import os
import os.path as osp
import shutil

from datumaro.components.project import Project
from datumaro.components.comparator import Comparator
from datumaro.components.dataset_filter import DatasetItemEncoder
from datumaro.components.extractor import AnnotationType
from .diff import DiffVisualizer
from ...util import add_subparser, CliException, MultilineFormatter
from ...util.project import make_project_path, load_project, \
    generate_next_dir_name


def build_create_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(help="Create empty project",
        description="""
            Create a new empty project.|n
            |n
            Examples:|n
            - Create a project in the current directory:|n
            |s|screate -n myproject|n
            |n
            - Create a project in other directory:|n
            |s|screate -o path/I/like/
        """,
        formatter_class=MultilineFormatter)

    parser.add_argument('-o', '--output-dir', default='.', dest='dst_dir',
        help="Save directory for the new project (default: current dir)")
    parser.add_argument('-n', '--name', default=None,
        help="Name of the new project (default: same as project dir)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.set_defaults(command=create_command)

    return parser

def create_command(args):
    project_dir = osp.abspath(args.dst_dir)
    project_path = make_project_path(project_dir)

    if osp.isdir(project_dir) and os.listdir(project_dir):
        if not args.overwrite:
            raise CliException("Directory '%s' already exists "
                "(pass --overwrite to force creation)" % project_dir)
        else:
            shutil.rmtree(project_dir)
    os.makedirs(project_dir, exist_ok=True)

    if not args.overwrite and osp.isfile(project_path):
        raise CliException("Project file '%s' already exists "
            "(pass --overwrite to force creation)" % project_path)

    project_name = args.name
    if project_name is None:
        project_name = osp.basename(project_dir)

    log.info("Creating project at '%s'" % project_dir)

    Project.generate(project_dir, {
        'project_name': project_name,
    })

    log.info("Project has been created at '%s'" % project_dir)

    return 0
def build_import_parser(parser_ctor=argparse.ArgumentParser):
    import datumaro.components.importers as importers_module
    builtin_importers = [name for name, cls in importers_module.items]

    parser = parser_ctor(help="Create project from existing dataset",
        description="""
            Creates a project from an existing dataset. The source can be:|n
            - a dataset in a supported format (check 'formats' section below)|n
            - a Datumaro project|n
            |n
            Formats:|n
            Datasets come in a wide variety of formats. Each dataset
            format defines its own data structure and rules on how to
            interpret the data. For example, the following data structure
            is used in COCO format:|n
            /dataset/|n
            - /images/<id>.jpg|n
            - /annotations/|n
            |n
            In Datumaro dataset formats are supported by
            Extractor-s and Importer-s.
            An Extractor produces a list of dataset items corresponding
            to the dataset. An Importer creates a project from the
            data source location.
            It is possible to add a custom Extractor and Importer.
            To do this, you need to put an Extractor and
            Importer implementation scripts to
            <project_dir>/.datumaro/extractors
            and <project_dir>/.datumaro/importers.|n
            |n
            List of supported dataset formats: %s|n
            |n
            Examples:|n
            - Create a project from VOC dataset in the current directory:|n
            |s|simport -f voc -i path/to/voc|n
            |n
            - Create a project from COCO dataset in other directory:|n
            |s|simport -f coco -i path/to/coco -o path/I/like/
        """ % ', '.join(builtin_importers),
        formatter_class=MultilineFormatter)

    parser.add_argument('-o', '--output-dir', default='.', dest='dst_dir',
        help="Directory to save the new project to (default: current dir)")
    parser.add_argument('-n', '--name', default=None,
        help="Name of the new project (default: same as project dir)")
    parser.add_argument('--copy', action='store_true',
        help="Copy the dataset instead of saving source links")
    parser.add_argument('--skip-check', action='store_true',
        help="Skip source checking")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('-i', '--input-path', required=True, dest='source',
        help="Path to import project from")
    parser.add_argument('-f', '--format', required=True,
        help="Source project format")
    # parser.add_argument('extra_args', nargs=argparse.REMAINDER,
    #     help="Additional arguments for importer (pass '-- -h' for help)")
    parser.set_defaults(command=import_command)

    return parser

def import_command(args):
    project_dir = osp.abspath(args.dst_dir)
    project_path = make_project_path(project_dir)

    if osp.isdir(project_dir) and os.listdir(project_dir):
        if not args.overwrite:
            raise CliException("Directory '%s' already exists "
                "(pass --overwrite to force creation)" % project_dir)
        else:
            shutil.rmtree(project_dir)
    os.makedirs(project_dir, exist_ok=True)

    if not args.overwrite and osp.isfile(project_path):
        raise CliException("Project file '%s' already exists "
            "(pass --overwrite to force creation)" % project_path)

    project_name = args.name
    if project_name is None:
        project_name = osp.basename(project_dir)

    log.info("Importing project from '%s' as '%s'" % \
        (args.source, args.format))

    source = osp.abspath(args.source)
    project = Project.import_from(source, args.format)
    project.config.project_name = project_name
    project.config.project_dir = project_dir

    if not args.skip_check or args.copy:
        log.info("Checking the dataset...")
        dataset = project.make_dataset()
    if args.copy:
        log.info("Cloning data...")
        dataset.save(merge=True, save_images=True)
    else:
        project.save()

    log.info("Project has been created at '%s'" % project_dir)

    return 0

class FilterModes(Enum):
    # primary
    items = 1
    annotations = 2
    items_annotations = 3

    # shortcuts
    i = 1
    a = 2
    i_a = 3
    a_i = 3
    annotations_items = 3

    @staticmethod
    def parse(s):
        s = s.lower()
        s = s.replace('+', '_')
        return FilterModes[s]

    @classmethod
    def make_filter_args(cls, mode):
        if mode == cls.items:
            return {}
        elif mode == cls.annotations:
            return {
                'filter_annotations': True
            }
        elif mode == cls.items_annotations:
            return {
                'filter_annotations': True,
                'remove_empty': True,
            }
        else:
            raise NotImplementedError()

    @classmethod
    def list_options(cls):
        return [m.name.replace('_', '+') for m in cls]

def build_export_parser(parser_ctor=argparse.ArgumentParser):
    import datumaro.components.converters as converters_module
    builtin_converters = [name for name, cls in converters_module.items]

    parser = parser_ctor(help="Export project",
        description="""
            Exports the project dataset in some format. Optionally, a filter
            can be passed; check the 'extract' command description for more
            info. Each dataset format has its own options, which
            are passed after the '--' separator (see examples); pass '-- -h'
            for more info. If not stated otherwise, by default
            only annotations are exported; to include images, pass the
            '--save-images' parameter.|n
            |n
            Formats:|n
            In Datumaro dataset formats are supported by Converter-s.
            A Converter produces a dataset of a specific format
            from dataset items. It is possible to add a custom Converter.
            To do this, you need to put a Converter
            definition script to <project_dir>/.datumaro/converters.|n
            |n
            List of supported dataset formats: %s|n
            |n
            Examples:|n
            - Export project as a VOC-like dataset, include images:|n
            |s|sexport -f voc -- --save-images|n
            |n
            - Export project as a COCO-like dataset in other directory:|n
            |s|sexport -f coco -o path/I/like/
        """ % ', '.join(builtin_converters),
        formatter_class=MultilineFormatter)

    parser.add_argument('-e', '--filter', default=None,
        help="Filter expression for dataset items")
    parser.add_argument('--filter-mode', default=FilterModes.i.name,
        type=FilterModes.parse,
        help="Filter mode (options: %s; default: %s)" % \
            (', '.join(FilterModes.list_options()), '%(default)s'))
    parser.add_argument('-o', '--output-dir', dest='dst_dir', default=None,
        help="Directory to save output (default: a subdir in the current one)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.add_argument('-f', '--format', required=True,
        help="Output format")
    parser.add_argument('extra_args', nargs=argparse.REMAINDER, default=None,
        help="Additional arguments for converter (pass '-- -h' for help)")
    parser.set_defaults(command=export_command)

    return parser

def export_command(args):
    project = load_project(args.project_dir)

    dst_dir = args.dst_dir
    if dst_dir:
        if not args.overwrite and osp.isdir(dst_dir) and os.listdir(dst_dir):
            raise CliException("Directory '%s' already exists "
                "(pass --overwrite to force creation)" % dst_dir)
    else:
        dst_dir = generate_next_dir_name('%s-export-%s' % \
            (project.config.project_name, args.format))
    dst_dir = osp.abspath(dst_dir)

    try:
        converter = project.env.make_converter(args.format,
            cmdline_args=args.extra_args)
    except KeyError:
        raise CliException("Converter for format '%s' is not found" % \
            args.format)

    filter_args = FilterModes.make_filter_args(args.filter_mode)

    log.info("Loading the project...")
    dataset = project.make_dataset()

    log.info("Exporting the project...")
    dataset.export_project(
        save_dir=dst_dir,
        converter=converter,
        filter_expr=args.filter,
        **filter_args)
    log.info("Project exported to '%s' as '%s'" % \
        (dst_dir, args.format))

    return 0

def build_extract_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(help="Extract subproject",
        description="""
            Extracts a subproject that contains only items matching filter.
            A filter is an XPath expression, which is applied to the XML
            representation of a dataset item. Check the '--dry-run' parameter
            to see XML representations of the dataset items.|n
            |n
            To filter annotations use the mode ('-m') parameter.|n
            Supported modes:|n
            - 'i', 'items'|n
            - 'a', 'annotations'|n
            - 'i+a', 'a+i', 'items+annotations', 'annotations+items'|n
            When filtering annotations, use the 'items+annotations'
            mode to indicate that annotation-less dataset items should be
            removed. To select an annotation, write an XPath that
            returns 'annotation' elements (see examples).|n
            |n
            Examples:|n
            - Filter images with width < height:|n
            |s|sextract -e '/item[image/width < image/height]'|n
            |n
            - Filter images with large-area bboxes:|n
            |s|sextract -e '/item[annotation/type="bbox" and
                annotation/area>2000]'|n
            |n
            - Filter out all irrelevant annotations from items:|n
            |s|sextract -m a -e '/item/annotation[label = "person"]'|n
            |n
            - Filter out all irrelevant annotations from items:|n
            |s|sextract -m a -e '/item/annotation[label="cat" and
                area > 99.5]'|n
            |n
            - Filter occluded annotations and items, if no annotations left:|n
            |s|sextract -m i+a -e '/item/annotation[occluded="True"]'
        """,
        formatter_class=MultilineFormatter)

    parser.add_argument('-e', '--filter', default=None,
        help="XML XPath filter expression for dataset items")
    parser.add_argument('-m', '--mode', default=FilterModes.i.name,
        type=FilterModes.parse,
        help="Filter mode (options: %s; default: %s)" % \
            (', '.join(FilterModes.list_options()), '%(default)s'))
    parser.add_argument('--dry-run', action='store_true',
        help="Print XML representations to be filtered and exit")
    parser.add_argument('-o', '--output-dir', dest='dst_dir', default=None,
        help="Output directory (default: update current project)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.set_defaults(command=extract_command)

    return parser

def extract_command(args):
    project = load_project(args.project_dir)

    if not args.dry_run:
        dst_dir = args.dst_dir
        if dst_dir:
            if not args.overwrite and osp.isdir(dst_dir) and os.listdir(dst_dir):
                raise CliException("Directory '%s' already exists "
                    "(pass --overwrite to force creation)" % dst_dir)
        else:
            dst_dir = generate_next_dir_name('%s-filter' % \
                project.config.project_name)
        dst_dir = osp.abspath(dst_dir)

    dataset = project.make_dataset()

    filter_args = FilterModes.make_filter_args(args.mode)

    if args.dry_run:
        dataset = dataset.extract(filter_expr=args.filter, **filter_args)
        for item in dataset:
            encoded_item = DatasetItemEncoder.encode(item, dataset.categories())
            xml_item = DatasetItemEncoder.to_string(encoded_item)
            print(xml_item)
        return 0

    if not args.filter:
        raise CliException("Expected a filter expression ('-e' argument)")

    os.makedirs(dst_dir, exist_ok=False)
    dataset.extract_project(save_dir=dst_dir, filter_expr=args.filter,
        **filter_args)

    log.info("Subproject has been extracted to '%s'" % dst_dir)

    return 0

def build_merge_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(help="Merge projects",
        description="""
            Updates items of the current project with items
            from the other project.|n
            |n
            Examples:|n
            - Update a project with items from other project:|n
            |s|smerge -p path/to/first/project path/to/other/project
        """,
        formatter_class=MultilineFormatter)

    parser.add_argument('other_project_dir',
        help="Directory of the project to get data updates from")
    parser.add_argument('-o', '--output-dir', dest='dst_dir', default=None,
        help="Output directory (default: current project's dir)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.set_defaults(command=merge_command)

    return parser

def merge_command(args):
    first_project = load_project(args.project_dir)
    second_project = load_project(args.other_project_dir)

    dst_dir = args.dst_dir
    if dst_dir:
        if not args.overwrite and osp.isdir(dst_dir) and os.listdir(dst_dir):
            raise CliException("Directory '%s' already exists "
                "(pass --overwrite to force creation)" % dst_dir)

    first_dataset = first_project.make_dataset()
    first_dataset.update(second_project.make_dataset())

    first_dataset.save(save_dir=dst_dir)

    if dst_dir is None:
        dst_dir = first_project.config.project_dir
    dst_dir = osp.abspath(dst_dir)
    log.info("Merge results have been saved to '%s'" % dst_dir)

    return 0

def build_diff_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(help="Compare projects",
        description="""
            Compares two projects.|n
            |n
            Examples:|n
            - Compare two projects, consider bboxes matching if their IoU > 0.7,|n
            |s|s|s|sprint results to Tensorboard:|n
            |s|sdiff path/to/other/project -o diff/ -f tensorboard --iou-thresh 0.7
        """,
        formatter_class=MultilineFormatter)

    parser.add_argument('other_project_dir',
        help="Directory of the second project to be compared")
    parser.add_argument('-o', '--output-dir', dest='dst_dir', default=None,
        help="Directory to save comparison results (default: do not save)")
    parser.add_argument('-f', '--format',
        default=DiffVisualizer.DEFAULT_FORMAT,
        choices=[f.name for f in DiffVisualizer.Format],
        help="Output format (default: %(default)s)")
    parser.add_argument('--iou-thresh', default=0.5, type=float,
        help="IoU match threshold for detections (default: %(default)s)")
    parser.add_argument('--conf-thresh', default=0.5, type=float,
        help="Confidence threshold for detections (default: %(default)s)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the first project to be compared (default: current dir)")
    parser.set_defaults(command=diff_command)

    return parser

def diff_command(args):
    first_project = load_project(args.project_dir)
    second_project = load_project(args.other_project_dir)

    comparator = Comparator(
        iou_threshold=args.iou_thresh,
        conf_threshold=args.conf_thresh)

    dst_dir = args.dst_dir
    if dst_dir:
        if not args.overwrite and osp.isdir(dst_dir) and os.listdir(dst_dir):
            raise CliException("Directory '%s' already exists "
                "(pass --overwrite to force creation)" % dst_dir)
    else:
        dst_dir = generate_next_dir_name('%s-%s-diff' % (
            first_project.config.project_name,
            second_project.config.project_name)
        )
    dst_dir = osp.abspath(dst_dir)
    if dst_dir:
        log.info("Saving diff to '%s'" % dst_dir)

    visualizer = DiffVisualizer(save_dir=dst_dir, comparator=comparator,
        output_format=args.format)
    visualizer.save_dataset_diff(
        first_project.make_dataset(),
        second_project.make_dataset())

    return 0

def build_transform_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(help="Transform project",
        description="""
            Applies some operation to dataset items in the project
            and produces a new project.

            [NOT IMPLEMENTED YET]
        """,
        formatter_class=MultilineFormatter)

    parser.add_argument('-t', '--transform', required=True,
        help="Transform to apply to the project")
    parser.add_argument('-o', '--output-dir', dest='dst_dir', default=None,
        help="Directory to save output (default: current dir)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.set_defaults(command=transform_command)

    return parser

def transform_command(args):
    raise NotImplementedError("Not implemented yet.")

    # project = load_project(args.project_dir)

    # dst_dir = args.dst_dir
    # if dst_dir:
    #     if not args.overwrite and osp.isdir(dst_dir) and os.listdir(dst_dir):
    #         raise CliException("Directory '%s' already exists "
    #             "(pass --overwrite to force creation)" % dst_dir)
    # dst_dir = osp.abspath(args.dst_dir)

    # project.make_dataset().transform_project(
    #     method=args.transform,
    #     save_dir=dst_dir
    # )

    # log.info("Transform results saved to '%s'" % dst_dir)

    # return 0

def build_info_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(help="Get project info",
        description="""
            Outputs project info.
        """,
        formatter_class=MultilineFormatter)

    parser.add_argument('--all', action='store_true',
        help="Print all information")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.set_defaults(command=info_command)

    return parser

def info_command(args):
    project = load_project(args.project_dir)
    config = project.config
    env = project.env
    dataset = project.make_dataset()

    print("Project:")
    print("  name:", config.project_name)
    print("  location:", config.project_dir)
    print("Plugins:")
    print("  importers:", ', '.join(env.importers.items))
    print("  extractors:", ', '.join(env.extractors.items))
    print("  converters:", ', '.join(env.converters.items))
    print("  launchers:", ', '.join(env.launchers.items))

    print("Sources:")
    for source_name, source in config.sources.items():
        print("  source '%s':" % source_name)
        print("    format:", source.format)
        print("    url:", source.url)
        print("    location:", project.local_source_dir(source_name))

    def print_extractor_info(extractor, indent=''):
        print("%slength:" % indent, len(extractor))

        categories = extractor.categories()
        print("%scategories:" % indent, ', '.join(c.name for c in categories))

        for cat_type, cat in categories.items():
            print("%s  %s:" % (indent, cat_type.name))
            if cat_type == AnnotationType.label:
                print("%s    count:" % indent, len(cat.items))

                count_threshold = 10
                if args.all:
                    count_threshold = len(cat.items)
                labels = ', '.join(c.name for c in cat.items[:count_threshold])
                if count_threshold < len(cat.items):
                    labels += " (and %s more)" % (
                        len(cat.items) - count_threshold)
                print("%s    labels:" % indent, labels)

    print("Dataset:")
    print_extractor_info(dataset, indent="  ")

    subsets = dataset.subsets()
    print("  subsets:", ', '.join(subsets))
    for subset_name in subsets:
        subset = dataset.get_subset(subset_name)
        print("  subset '%s':" % subset_name)
        print_extractor_info(subset, indent="    ")

    print("Models:")
    for model_name, model in env.config.models.items():
        print("  model '%s':" % model_name)
        print("    type:", model.launcher)

    return 0


def build_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(
        description="""
            Manipulate projects.|n
            |n
            By default, the project to be operated on is searched for
            in the current directory. An additional '-p' argument can be
            passed to specify project location.
        """,
        formatter_class=MultilineFormatter)

    subparsers = parser.add_subparsers()
    add_subparser(subparsers, 'create', build_create_parser)
    add_subparser(subparsers, 'import', build_import_parser)
    add_subparser(subparsers, 'export', build_export_parser)
    add_subparser(subparsers, 'extract', build_extract_parser)
    add_subparser(subparsers, 'merge', build_merge_parser)
    add_subparser(subparsers, 'diff', build_diff_parser)
    add_subparser(subparsers, 'transform', build_transform_parser)
    add_subparser(subparsers, 'info', build_info_parser)

    return parser
@@ -0,0 +1,247 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse
import logging as log
import os
import os.path as osp
import shutil

from ...util import add_subparser, CliException, MultilineFormatter
from ...util.project import load_project


def build_add_parser(parser_ctor=argparse.ArgumentParser):
    import datumaro.components.extractors as extractors_module
    extractors_list = [name for name, cls in extractors_module.items]

    base_parser = argparse.ArgumentParser(add_help=False)
    base_parser.add_argument('-n', '--name', default=None,
        help="Name of the new source")
    base_parser.add_argument('-f', '--format', required=True,
        help="Source dataset format")
    base_parser.add_argument('--skip-check', action='store_true',
        help="Skip source checking")
    base_parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")

    parser = parser_ctor(help="Add data source to project",
        description="""
            Adds a data source to a project. The source can be:|n
            - a dataset in a supported format (check 'formats' section below)|n
            - a Datumaro project|n
            |n
            The source can be either a local directory or a remote
            git repository. Each source type has its own parameters, which can
            be checked by:|n
            '%s'.|n
            |n
            Formats:|n
            Datasets come in a wide variety of formats. Each dataset
            format defines its own data structure and rules on how to
            interpret the data. For example, the following data structure
            is used in COCO format:|n
            /dataset/|n
            - /images/<id>.jpg|n
            - /annotations/|n
            |n
            In Datumaro dataset formats are supported by Extractor-s.
            An Extractor produces a list of dataset items corresponding
            to the dataset. It is possible to add a custom Extractor.
            To do this, you need to put an Extractor
            definition script to <project_dir>/.datumaro/extractors.|n
            |n
            List of supported source formats: %s|n
            |n
            Examples:|n
            - Add a local directory with VOC-like dataset:|n
            |s|sadd path path/to/voc -f voc_detection|n
            - Add a local file with CVAT annotations, call it 'mysource'|n
            |s|s|s|sto the project somewhere else:|n
            |s|sadd path path/to/cvat.xml -f cvat -n mysource -p somewhere/else/
        """ % ('%(prog)s SOURCE_TYPE --help', ', '.join(extractors_list)),
        formatter_class=MultilineFormatter,
        add_help=False)
    parser.set_defaults(command=add_command)

    sp = parser.add_subparsers(dest='source_type', metavar='SOURCE_TYPE',
        help="The type of the data source "
            "(call '%s SOURCE_TYPE --help' for more info)" % parser.prog)

    dir_parser = sp.add_parser('path', help="Add local path as source",
        parents=[base_parser])
    dir_parser.add_argument('url',
        help="Path to the source")
    dir_parser.add_argument('--copy', action='store_true',
        help="Copy the dataset instead of saving source links")

    repo_parser = sp.add_parser('git', help="Add git repository as source",
        parents=[base_parser])
    repo_parser.add_argument('url',
        help="URL of the source git repository")
    repo_parser.add_argument('-b', '--branch', default='master',
        help="Branch of the source repository (default: %(default)s)")
    repo_parser.add_argument('--checkout', action='store_true',
        help="Do branch checkout")

    # NOTE: add common parameters to the parent help output
    # the other way could be to use parse_known_args()
    display_parser = argparse.ArgumentParser(
        parents=[base_parser, parser],
        prog=parser.prog, usage="%(prog)s [-h] SOURCE_TYPE ...",
        description=parser.description, formatter_class=MultilineFormatter)
    class HelpAction(argparse._HelpAction):
        def __call__(self, parser, namespace, values, option_string=None):
            display_parser.print_help()
            parser.exit()

    parser.add_argument('-h', '--help', action=HelpAction,
        help='show this help message and exit')

    # TODO: needed distinction on how to add an extractor or a remote source

    return parser

def add_command(args):
    project = load_project(args.project_dir)

    if args.source_type == 'git':
        name = args.name
        if name is None:
            name = osp.splitext(osp.basename(args.url))[0]

        if project.env.git.has_submodule(name):
            raise CliException("Git submodule '%s' already exists" % name)

        try:
            project.get_source(name)
            raise CliException("Source '%s' already exists" % name)
        except KeyError:
            pass

        rel_local_dir = project.local_source_dir(name)
        local_dir = osp.join(project.config.project_dir, rel_local_dir)
        url = args.url
        project.env.git.create_submodule(name, local_dir,
            url=url, branch=args.branch, no_checkout=not args.checkout)
    elif args.source_type == 'path':
        url = osp.abspath(args.url)
        if not osp.exists(url):
            raise CliException("Source path '%s' does not exist" % url)

        name = args.name
        if name is None:
            name = osp.splitext(osp.basename(url))[0]

        if project.env.git.has_submodule(name):
            raise CliException("Git submodule '%s' already exists" % name)

        try:
            project.get_source(name)
            raise CliException("Source '%s' already exists" % name)
        except KeyError:
            pass

        rel_local_dir = project.local_source_dir(name)
        local_dir = osp.join(project.config.project_dir, rel_local_dir)

        if args.copy:
            log.info("Copying from '%s' to '%s'" % (url, local_dir))
            if osp.isdir(url):
                # copytree requires the destination dir not to exist
                shutil.copytree(url, local_dir)
                url = rel_local_dir
            elif osp.isfile(url):
                os.makedirs(local_dir)
                shutil.copy2(url, local_dir)
                url = osp.join(rel_local_dir, osp.basename(url))
            else:
                raise Exception("Expected file or directory")
        else:
            os.makedirs(local_dir)

    project.add_source(name, { 'url': url, 'format': args.format })

    if not args.skip_check:
        log.info("Checking the source...")
        try:
            project.make_source_project(name).make_dataset()
        except Exception:
            shutil.rmtree(local_dir, ignore_errors=True)
            raise

    project.save()

    log.info("Source '%s' has been added to the project, location: '%s'" \
        % (name, rel_local_dir))

    return 0

def build_remove_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(help="Remove source from project",
        description="Remove a source from a project.")

    parser.add_argument('-n', '--name', required=True,
        help="Name of the source to be removed")
    parser.add_argument('--force', action='store_true',
        help="Ignore possible errors during removal")
    parser.add_argument('--keep-data', action='store_true',
        help="Do not remove source data")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.set_defaults(command=remove_command)

    return parser

def remove_command(args):
    project = load_project(args.project_dir)

    name = args.name
    if not name:
        raise CliException("Expected source name")
    try:
        project.get_source(name)
    except KeyError:
        if not args.force:
            raise CliException("Source '%s' does not exist" % name)

    if project.env.git.has_submodule(name):
        if args.force:
            log.warning("Forcefully removing the '%s' source..." % name)

        project.env.git.remove_submodule(name, force=args.force)

    source_dir = osp.join(project.config.project_dir,
        project.local_source_dir(name))
    project.remove_source(name)
    project.save()

    if not args.keep_data:
        shutil.rmtree(source_dir, ignore_errors=True)

    log.info("Source '%s' has been removed from the project" % name)

    return 0

def build_parser(parser_ctor=argparse.ArgumentParser):
    parser = parser_ctor(description="""
            Manipulate data sources inside of a project.|n
            |n
            A data source is a source of data for a project.
            The project combines multiple data sources into one dataset.
            The role of a data source is to provide dataset items - images
            and/or annotations.|n
            |n
            By default, the project to be operated on is searched for
            in the current directory. An additional '-p' argument can be
            passed to specify project location.
        """,
        formatter_class=MultilineFormatter)

    subparsers = parser.add_subparsers()
    add_subparser(subparsers, 'add', build_add_parser)
    add_subparser(subparsers, 'remove', build_remove_parser)

    return parser
@@ -1,21 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse

from . import project as project_module


def build_parser(parser=argparse.ArgumentParser()):
    project_module.build_create_parser(parser) \
        .set_defaults(command=project_module.create_command)

    return parser

def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)

    return args.command(args)
@@ -1,69 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse
import os.path as osp

from datumaro.components.project import Project
from datumaro.util.command_targets import (TargetKinds, target_selector,
    ProjectTarget, SourceTarget, ImageTarget, ExternalDatasetTarget,
    is_project_path
)

from . import project as project_module
from . import source as source_module
from . import item as item_module


def export_external_dataset(target, params):
    raise NotImplementedError()

def build_parser(parser=argparse.ArgumentParser()):
    parser.add_argument('target', nargs='?', default=None)
    parser.add_argument('params', nargs=argparse.REMAINDER)

    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")

    return parser

def process_command(target, params, args):
    project_dir = args.project_dir
    target_kind, target_value = target
    if target_kind == TargetKinds.project:
        return project_module.main(['export', '-p', target_value] + params)
    elif target_kind == TargetKinds.source:
        return source_module.main(['export', '-p', project_dir, '-n', target_value] + params)
    elif target_kind == TargetKinds.item:
        return item_module.main(['export', '-p', project_dir, target_value] + params)
    elif target_kind == TargetKinds.external_dataset:
        return export_external_dataset(target_value, params)
    return 1

def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)

    project_path = args.project_dir
    if is_project_path(project_path):
        project = Project.load(project_path)
    else:
        project = None
    try:
        args.target = target_selector(
            ProjectTarget(is_default=True, project=project),
            SourceTarget(project=project),
            ExternalDatasetTarget(),
            ImageTarget()
        )(args.target)
        if args.target[0] == TargetKinds.project:
            if is_project_path(args.target[1]):
                args.project_dir = osp.dirname(osp.abspath(args.target[1]))
    except argparse.ArgumentTypeError as e:
        print(e)
        parser.print_help()
        return 1

    return process_command(args.target, args.params, args)
@@ -1,33 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse


def run_command(args):
    return 0

def build_run_parser(parser):
    return parser

def build_parser(parser=argparse.ArgumentParser()):
    command_parsers = parser.add_subparsers(dest='command')

    build_run_parser(command_parsers.add_parser('run')). \
        set_defaults(command=run_command)

    return parser

def process_command(command, args):
    return 0

def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)
    if 'command' not in args:
        parser.print_help()
        return 1

    return args.command(args)
@@ -1,38 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse


def build_export_parser(parser):
    return parser

def build_stats_parser(parser):
    return parser

def build_diff_parser(parser):
    return parser

def build_edit_parser(parser):
    return parser

def build_parser(parser=argparse.ArgumentParser()):
    command_parsers = parser.add_subparsers(dest='command_name')

    build_export_parser(command_parsers.add_parser('export'))
    build_stats_parser(command_parsers.add_parser('stats'))
    build_diff_parser(command_parsers.add_parser('diff'))
    build_edit_parser(command_parsers.add_parser('edit'))

    return parser

def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)
    if 'command' not in args:
        parser.print_help()
        return 1

    return args.command(args)
@ -1,361 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse
import logging as log
import os
import os.path as osp
import shutil

from datumaro.components.project import Project
from datumaro.components.comparator import Comparator
from datumaro.components.dataset_filter import DatasetItemEncoder
from .diff import DiffVisualizer
from ..util.project import make_project_path, load_project


def build_create_parser(parser):
    parser.add_argument('-d', '--dest', default='.', dest='dst_dir',
        help="Save directory for the new project (default: current dir)")
    parser.add_argument('-n', '--name', default=None,
        help="Name of the new project (default: same as project dir)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    return parser

def create_command(args):
    project_dir = osp.abspath(args.dst_dir)
    project_path = make_project_path(project_dir)

    if osp.isdir(project_dir) and os.listdir(project_dir):
        if not args.overwrite:
            log.error("Directory '%s' already exists "
                "(pass --overwrite to force creation)" % project_dir)
            return 1
        else:
            shutil.rmtree(project_dir)
    os.makedirs(project_dir, exist_ok=args.overwrite)

    if not args.overwrite and osp.isfile(project_path):
        log.error("Project file '%s' already exists "
            "(pass --overwrite to force creation)" % project_path)
        return 1

    project_name = args.name
    if project_name is None:
        project_name = osp.basename(project_dir)

    log.info("Creating project at '%s'" % (project_dir))

    Project.generate(project_dir, {
        'project_name': project_name,
    })

    log.info("Project has been created at '%s'" % (project_dir))

    return 0

def build_import_parser(parser):
    import datumaro.components.importers as importers_module
    importers_list = [name for name, cls in importers_module.items]

    parser.add_argument('-s', '--source', required=True,
        help="Path to import a project from")
    parser.add_argument('-f', '--format', required=True,
        help="Source project format (options: %s)" % (', '.join(importers_list)))
    parser.add_argument('-d', '--dest', default='.', dest='dst_dir',
        help="Directory to save the new project to (default: current dir)")
    parser.add_argument('-n', '--name', default=None,
        help="Name of the new project (default: same as project dir)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('--copy', action='store_true',
        help="Copy the dataset instead of saving source links")
    parser.add_argument('--skip-check', action='store_true',
        help="Skip source checking")
    # parser.add_argument('extra_args', nargs=argparse.REMAINDER,
    #     help="Additional arguments for importer (pass '-- -h' for help)")
    return parser

def import_command(args):
    project_dir = osp.abspath(args.dst_dir)
    project_path = make_project_path(project_dir)

    if osp.isdir(project_dir) and os.listdir(project_dir):
        if not args.overwrite:
            log.error("Directory '%s' already exists "
                "(pass --overwrite to force creation)" % project_dir)
            return 1
        else:
            shutil.rmtree(project_dir)
    os.makedirs(project_dir, exist_ok=args.overwrite)

    if not args.overwrite and osp.isfile(project_path):
        log.error("Project file '%s' already exists "
            "(pass --overwrite to force creation)" % project_path)
        return 1

    project_name = args.name
    if project_name is None:
        project_name = osp.basename(project_dir)

    log.info("Importing project from '%s' as '%s'" % \
        (args.source, args.format))

    source = osp.abspath(args.source)
    project = Project.import_from(source, args.format)
    project.config.project_name = project_name
    project.config.project_dir = project_dir

    if not args.skip_check or args.copy:
        log.info("Checking the dataset...")
        dataset = project.make_dataset()
    if args.copy:
        log.info("Cloning data...")
        dataset.save(merge=True, save_images=True)
    else:
        project.save()

    log.info("Project has been created at '%s'" % (project_dir))

    return 0

def build_build_parser(parser):
    return parser

def build_export_parser(parser):
    parser.add_argument('-e', '--filter', default=None,
        help="Filter expression for dataset items. Examples: "
            "extract images with width < height: "
            "'/item[image/width < image/height]'; "
            "extract images with large-area bboxes: "
            "'/item[annotation/type=\"bbox\" and annotation/area>2000]'; "
            "filter out irrelevant annotations from items: "
            "'/item/annotation[label = \"person\"]'"
    )
    parser.add_argument('-a', '--filter-annotations', action='store_true',
        help="Filter annotations instead of dataset "
            "items (default: %(default)s)")
    parser.add_argument('-d', '--dest', dest='dst_dir', required=True,
        help="Directory to save output")
    parser.add_argument('-f', '--output-format', required=True,
        help="Output format")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('extra_args', nargs=argparse.REMAINDER, default=None,
        help="Additional arguments for converter (pass '-- -h' for help)")
    return parser

def export_command(args):
    project = load_project(args.project_dir)

    dst_dir = osp.abspath(args.dst_dir)
    if not args.overwrite and osp.isdir(dst_dir) and os.listdir(dst_dir):
        log.error("Directory '%s' already exists "
            "(pass --overwrite to force creation)" % dst_dir)
        return 1
    os.makedirs(dst_dir, exist_ok=args.overwrite)

    log.info("Loading the project...")
    dataset = project.make_dataset()

    log.info("Exporting the project...")
    dataset.export_project(
        save_dir=dst_dir,
        output_format=args.output_format,
        filter_expr=args.filter,
        filter_annotations=args.filter_annotations,
        cmdline_args=args.extra_args)
    log.info("Project exported to '%s' as '%s'" % \
        (dst_dir, args.output_format))

    return 0

def build_stats_parser(parser):
    parser.add_argument('name')
    return parser

def build_docs_parser(parser):
    return parser

def build_extract_parser(parser):
    parser.add_argument('-e', '--filter', default=None,
        help="XML XPath filter expression for dataset items. Examples: "
            "extract images with width < height: "
            "'/item[image/width < image/height]'; "
            "extract images with large-area bboxes: "
            "'/item[annotation/type=\"bbox\" and annotation/area>2000]'; "
            "filter out irrelevant annotations from items: "
            "'/item/annotation[label = \"person\"]'"
    )
    parser.add_argument('-a', '--filter-annotations', action='store_true',
        help="Filter annotations instead of dataset "
            "items (default: %(default)s)")
    parser.add_argument('--remove-empty', action='store_true',
        help="Remove an item if there are no annotations left after filtration")
    parser.add_argument('--dry-run', action='store_true',
        help="Print XML representations to be filtered and exit")
    parser.add_argument('-d', '--dest', dest='dst_dir', required=True,
        help="Output directory")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    return parser

def extract_command(args):
    project = load_project(args.project_dir)

    dst_dir = osp.abspath(args.dst_dir)
    if not args.dry_run:
        os.makedirs(dst_dir, exist_ok=False)

    dataset = project.make_dataset()

    kwargs = {}
    if args.filter_annotations:
        kwargs['remove_empty'] = args.remove_empty

    if args.dry_run:
        dataset = dataset.extract(filter_expr=args.filter,
            filter_annotations=args.filter_annotations, **kwargs)
        for item in dataset:
            encoded_item = DatasetItemEncoder.encode(item, dataset.categories())
            xml_item = DatasetItemEncoder.to_string(encoded_item)
            print(xml_item)
        return 0

    dataset.extract_project(save_dir=dst_dir, filter_expr=args.filter,
        filter_annotations=args.filter_annotations, **kwargs)

    log.info("Subproject extracted to '%s'" % (dst_dir))

    return 0

def build_merge_parser(parser):
    parser.add_argument('other_project_dir',
        help="Directory of the project to get data updates from")
    parser.add_argument('-d', '--dest', dest='dst_dir', default=None,
        help="Output directory (default: current project's dir)")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    return parser

def merge_command(args):
    first_project = load_project(args.project_dir)
    second_project = load_project(args.other_project_dir)

    first_dataset = first_project.make_dataset()
    first_dataset.update(second_project.make_dataset())

    dst_dir = args.dst_dir
    first_dataset.save(save_dir=dst_dir)

    if dst_dir is None:
        dst_dir = first_project.config.project_dir
    dst_dir = osp.abspath(dst_dir)
    log.info("Merge result saved to '%s'" % (dst_dir))

    return 0

def build_diff_parser(parser):
    parser.add_argument('other_project_dir',
        help="Directory of the second project to be compared")
    parser.add_argument('-d', '--dest', default=None, dest='dst_dir',
        help="Directory to save comparison results (default: do not save)")
    parser.add_argument('-f', '--output-format',
        default=DiffVisualizer.DEFAULT_FORMAT,
        choices=[f.name for f in DiffVisualizer.Format],
        help="Output format (default: %(default)s)")
    parser.add_argument('--iou-thresh', default=0.5, type=float,
        help="IoU match threshold for detections (default: %(default)s)")
    parser.add_argument('--conf-thresh', default=0.5, type=float,
        help="Confidence threshold for detections (default: %(default)s)")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the first project to be compared (default: current dir)")
    return parser

def diff_command(args):
    first_project = load_project(args.project_dir)
    second_project = load_project(args.other_project_dir)

    comparator = Comparator(
        iou_threshold=args.iou_thresh,
        conf_threshold=args.conf_thresh)

    save_dir = args.dst_dir
    if save_dir is not None:
        log.info("Saving diff to '%s'" % save_dir)
        os.makedirs(osp.abspath(save_dir))
    visualizer = DiffVisualizer(save_dir=save_dir, comparator=comparator,
        output_format=args.output_format)
    visualizer.save_dataset_diff(
        first_project.make_dataset(),
        second_project.make_dataset())

    return 0

def build_transform_parser(parser):
    parser.add_argument('-d', '--dest', dest='dst_dir', required=True,
        help="Directory to save output")
    parser.add_argument('-m', '--model', dest='model_name', required=True,
        help="Model to apply to the project")
    parser.add_argument('-f', '--output-format', required=True,
        help="Output format")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    return parser

def transform_command(args):
    project = load_project(args.project_dir)

    dst_dir = osp.abspath(args.dst_dir)
    os.makedirs(dst_dir, exist_ok=False)
    project.make_dataset().apply_model(
        save_dir=dst_dir,
        model_name=args.model_name)

    log.info("Transform results saved to '%s'" % (dst_dir))

    return 0


def build_parser(parser=argparse.ArgumentParser()):
    command_parsers = parser.add_subparsers(dest='command_name')

    build_create_parser(command_parsers.add_parser('create')) \
        .set_defaults(command=create_command)

    build_import_parser(command_parsers.add_parser('import')) \
        .set_defaults(command=import_command)

    build_export_parser(command_parsers.add_parser('export')) \
        .set_defaults(command=export_command)

    build_extract_parser(command_parsers.add_parser('extract')) \
        .set_defaults(command=extract_command)

    build_merge_parser(command_parsers.add_parser('merge')) \
        .set_defaults(command=merge_command)

    build_build_parser(command_parsers.add_parser('build'))
    build_stats_parser(command_parsers.add_parser('stats'))
    build_docs_parser(command_parsers.add_parser('docs'))
    build_diff_parser(command_parsers.add_parser('diff')) \
        .set_defaults(command=diff_command)

    build_transform_parser(command_parsers.add_parser('transform')) \
        .set_defaults(command=transform_command)

    return parser

def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)
    if 'command' not in args:
        parser.print_help()
        return 1

    return args.command(args)
@ -1,21 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse

from . import source as source_module


def build_parser(parser=argparse.ArgumentParser()):
    source_module.build_add_parser(parser) \
        .set_defaults(command=source_module.remove_command)

    return parser

def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)

    return args.command(args)
@ -1,254 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse
import logging as log
import os
import os.path as osp
import shutil

from ..util.project import load_project


def build_create_parser(parser):
    parser.add_argument('-n', '--name', required=True,
        help="Name of the source to be created")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    return parser

def create_command(args):
    project = load_project(args.project_dir)
    config = project.config

    name = args.name

    if project.env.git.has_submodule(name):
        log.fatal("Submodule '%s' already exists" % (name))
        return 1

    try:
        project.get_source(name)
        log.fatal("Source '%s' already exists" % (name))
        return 1
    except KeyError:
        pass

    dst_dir = osp.join(config.project_dir, config.sources_dir, name)
    project.env.git.init(dst_dir)

    project.add_source(name, { 'url': name })
    project.save()

    log.info("Source '%s' has been added to the project, location: '%s'" \
        % (name, dst_dir))

    return 0

def build_import_parser(parser):
    sp = parser.add_subparsers(dest='source_type')

    repo_parser = sp.add_parser('repo')
    repo_parser.add_argument('url',
        help="URL of the source git repository")
    repo_parser.add_argument('-b', '--branch', default='master',
        help="Branch of the source repository (default: %(default)s)")
    repo_parser.add_argument('--checkout', action='store_true',
        help="Do branch checkout")

    dir_parser = sp.add_parser('dir')
    dir_parser.add_argument('url',
        help="Path to the source directory")
    dir_parser.add_argument('--copy', action='store_true',
        help="Copy the dataset instead of saving source links")

    parser.add_argument('-n', '--name', default=None,
        help="Name of the new source")
    parser.add_argument('-f', '--format', default=None,
        help="Name of the source dataset format (default: 'project')")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.add_argument('--skip-check', action='store_true',
        help="Skip source checking")
    return parser

def import_command(args):
    project = load_project(args.project_dir)

    if args.source_type == 'repo':
        name = args.name
        if name is None:
            name = osp.splitext(osp.basename(args.url))[0]

        if project.env.git.has_submodule(name):
            log.fatal("Submodule '%s' already exists" % (name))
            return 1

        try:
            project.get_source(name)
            log.fatal("Source '%s' already exists" % (name))
            return 1
        except KeyError:
            pass

        dst_dir = project.local_source_dir(name)
        project.env.git.create_submodule(name, dst_dir,
            url=args.url, branch=args.branch, no_checkout=not args.checkout)

        source = { 'url': args.url }
        if args.format:
            source['format'] = args.format
        project.add_source(name, source)

        if not args.skip_check:
            log.info("Checking the source...")
            project.make_source_project(name)
        project.save()

        log.info("Source '%s' has been added to the project, location: '%s'" \
            % (name, dst_dir))
    elif args.source_type == 'dir':
        url = osp.abspath(args.url)
        if not osp.exists(url):
            log.fatal("Source path '%s' does not exist" % url)
            return 1

        name = args.name
        if name is None:
            name = osp.splitext(osp.basename(url))[0]

        try:
            project.get_source(name)
            log.fatal("Source '%s' already exists" % (name))
            return 1
        except KeyError:
            pass

        dst_dir = url
        if args.copy:
            dst_dir = project.local_source_dir(name)
            log.info("Copying from '%s' to '%s'" % (url, dst_dir))
            shutil.copytree(url, dst_dir)
            url = name

        source = { 'url': url }
        if args.format:
            source['format'] = args.format
        project.add_source(name, source)

        if not args.skip_check:
            log.info("Checking the source...")
            project.make_source_project(name)
        project.save()

        log.info("Source '%s' has been added to the project, location: '%s'" \
            % (name, dst_dir))

    return 0

def build_remove_parser(parser):
    parser.add_argument('-n', '--name', required=True,
        help="Name of the source to be removed")
    parser.add_argument('--force', action='store_true',
        help="Ignore possible errors during removal")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    return parser

def remove_command(args):
    project = load_project(args.project_dir)

    name = args.name
    if name is None:
        log.fatal("Expected source name")
        return 1

    if project.env.git.has_submodule(name):
        if args.force:
            log.warning("Forcefully removing the '%s' source..." % (name))

        project.env.git.remove_submodule(name, force=args.force)

    project.remove_source(name)
    project.save()

    log.info("Source '%s' has been removed from the project" % (name))

    return 0

def build_export_parser(parser):
    parser.add_argument('-n', '--name', required=True,
        help="Source dataset to be extracted")
    parser.add_argument('-e', '--filter', default=None,
        help="Filter expression for dataset items. Examples: "
            "extract images with width < height: "
            "'/item[image/width < image/height]'; "
            "extract images with large-area bboxes: "
            "'/item[annotation/type=\"bbox\" and annotation/area>2000]'"
    )
    parser.add_argument('-a', '--filter-annotations', action='store_true',
        help="Filter annotations instead of dataset "
            "items (default: %(default)s)")
    parser.add_argument('-d', '--dest', dest='dst_dir', required=True,
        help="Directory to save output")
    parser.add_argument('-f', '--output-format', required=True,
        help="Output format")
    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")
    parser.add_argument('--overwrite', action='store_true',
        help="Overwrite existing files in the save directory")
    parser.add_argument('extra_args', nargs=argparse.REMAINDER, default=None,
        help="Additional arguments for converter (pass '-- -h' for help)")
    return parser

def export_command(args):
    project = load_project(args.project_dir)

    dst_dir = osp.abspath(args.dst_dir)
    if not args.overwrite and osp.isdir(dst_dir) and os.listdir(dst_dir):
        log.error("Directory '%s' already exists "
            "(pass --overwrite to force creation)" % dst_dir)
        return 1
    os.makedirs(dst_dir, exist_ok=args.overwrite)

    log.info("Loading the project...")
    source_project = project.make_source_project(args.name)
    dataset = source_project.make_dataset()

    log.info("Exporting the project...")
    dataset.export_project(
        save_dir=dst_dir,
        output_format=args.output_format,
        filter_expr=args.filter,
        filter_annotations=args.filter_annotations,
        cmdline_args=args.extra_args)
    log.info("Source '%s' exported to '%s' as '%s'" % \
        (args.name, dst_dir, args.output_format))

    return 0

def build_parser(parser=argparse.ArgumentParser()):
    command_parsers = parser.add_subparsers(dest='command_name')

    build_create_parser(command_parsers.add_parser('create')) \
        .set_defaults(command=create_command)
    build_import_parser(command_parsers.add_parser('import')) \
        .set_defaults(command=import_command)
    build_remove_parser(command_parsers.add_parser('remove')) \
        .set_defaults(command=remove_command)
    build_export_parser(command_parsers.add_parser('export')) \
        .set_defaults(command=export_command)

    return parser


def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)
    if 'command' not in args:
        parser.print_help()
        return 1

    return args.command(args)
@ -1,69 +0,0 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import argparse
import os.path as osp

from datumaro.components.project import Project
from datumaro.util.command_targets import (TargetKinds, target_selector,
    ProjectTarget, SourceTarget, ExternalDatasetTarget, ImageTarget,
    is_project_path
)

from . import project as project_module
from . import source as source_module
from . import item as item_module


def compute_external_dataset_stats(target, params):
    raise NotImplementedError()

def build_parser(parser=argparse.ArgumentParser()):
    parser.add_argument('target', nargs='?', default=None)
    parser.add_argument('params', nargs=argparse.REMAINDER)

    parser.add_argument('-p', '--project', dest='project_dir', default='.',
        help="Directory of the project to operate on (default: current dir)")

    return parser

def process_command(target, params, args):
    project_dir = args.project_dir
    target_kind, target_value = target
    if target_kind == TargetKinds.project:
        return project_module.main(['stats', '-p', target_value] + params)
    elif target_kind == TargetKinds.source:
        return source_module.main(['stats', '-p', project_dir, target_value] + params)
    elif target_kind == TargetKinds.item:
        return item_module.main(['stats', '-p', project_dir, target_value] + params)
    elif target_kind == TargetKinds.external_dataset:
        return compute_external_dataset_stats(target_value, params)
    return 1

def main(args=None):
    parser = build_parser()
    args = parser.parse_args(args)

    project_path = args.project_dir
    if is_project_path(project_path):
        project = Project.load(project_path)
    else:
        project = None
    try:
        args.target = target_selector(
            ProjectTarget(is_default=True, project=project),
            SourceTarget(project=project),
            ExternalDatasetTarget(),
            ImageTarget()
        )(args.target)
        if args.target[0] == TargetKinds.project:
            if is_project_path(args.target[1]):
                args.project_dir = osp.dirname(osp.abspath(args.target[1]))
    except argparse.ArgumentTypeError as e:
        print(e)
        parser.print_help()
        return 1

    return process_command(args.target, args.params, args)
@ -0,0 +1,55 @@

# Copyright (C) 2018 Intel Corporation
#
# SPDX-License-Identifier: MIT

from collections import OrderedDict
import os
import os.path as osp

from datumaro.components.extractor import DatasetItem, Extractor
from datumaro.util.image import lazy_image


class ImageDirExtractor(Extractor):
    _SUPPORTED_FORMATS = ['.png', '.jpg']

    def __init__(self, url):
        super().__init__()

        assert osp.isdir(url)

        # Collect all supported images from the directory; the file name
        # (without the extension) becomes the item id.
        items = []
        for name in os.listdir(url):
            path = osp.join(url, name)
            if self._is_image(path):
                item_id = osp.splitext(name)[0]
                item = DatasetItem(id=item_id, image=lazy_image(path))
                items.append((item.id, item))

        items = sorted(items, key=lambda e: e[0])
        items = OrderedDict(items)
        self._items = items

        self._subsets = None

    def __iter__(self):
        for item in self._items.values():
            yield item

    def __len__(self):
        return len(self._items)

    def subsets(self):
        return self._subsets

    def get(self, item_id, subset=None, path=None):
        if path or subset:
            raise KeyError()
        return self._items[item_id]

    def _is_image(self, path):
        for ext in self._SUPPORTED_FORMATS:
            if osp.isfile(path) and path.endswith(ext):
                return True
        return False
@ -0,0 +1,26 @@

# Copyright (C) 2019 Intel Corporation
#
# SPDX-License-Identifier: MIT

import os.path as osp


class ImageDirImporter:
    EXTRACTOR_NAME = 'image_dir'

    def __call__(self, path, **extra_params):
        from datumaro.components.project import Project # cyclic import
        project = Project()

        if not osp.isdir(path):
            raise Exception("Can't find a directory at '%s'" % path)

        # Register the directory as a project source handled by
        # the 'image_dir' extractor.
        source_name = osp.basename(osp.normpath(path))
        project.add_source(source_name, {
            'url': source_name,
            'format': self.EXTRACTOR_NAME,
            'options': dict(extra_params),
        })

        return project
Binary file not shown. (Size before: 91 KiB, after: 35 KiB)
@ -0,0 +1,563 @@

# Quick start guide

## Contents

- [Installation](#installation)
- [Interfaces](#interfaces)
- [Supported dataset formats and annotations](#formats-support)
- [Command line workflow](#command-line-workflow)
  - [Create a project](#create-project)
  - [Add and remove data](#add-and-remove-data)
  - [Import a project](#import-project)
  - [Extract a subproject](#extract-subproject)
  - [Merge projects](#merge-project)
  - [Export a project](#export-project)
  - [Compare projects](#compare-projects)
  - [Get project info](#get-project-info)
  - [Register a model](#register-model)
  - [Run inference](#run-inference)
  - [Run inference explanation](#explain-inference)
- [Links](#links)

## Installation

### Prerequisites

- Python (3.5+)
- OpenVINO (optional)

### Installation steps

Optionally, set up a virtual environment:

``` bash
python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate
```

Install Datumaro:

``` bash
pip install 'git+https://github.com/opencv/cvat#egg=datumaro&subdirectory=datumaro'
```

> You can change the installation branch with `.../cvat@<branch_name>#egg...`
> Note that the `--force-reinstall` parameter may be needed in this case.

## Interfaces

As a standalone tool:

``` bash
datum --help
```

As a python module:

> The directory containing Datumaro should be in the `PYTHONPATH`
> environment variable, or `cvat/datumaro/` should be the current directory.

``` bash
python -m datumaro --help
python datumaro/ --help
python datum.py --help
```

As a python library:

``` python
import datumaro
```
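
For instance, the library API can be used to load an existing project and
inspect its items programmatically. The snippet below is a minimal sketch
built from the classes used elsewhere in this guide (`Project`,
`make_dataset()`); the `test_project` path is just a placeholder:

``` python
from datumaro.components.project import Project

# Load a project that was previously created or imported on disk
project = Project.load('test_project')

# Combine the project's sources into a single dataset
dataset = project.make_dataset()

# Iterate over the dataset items and their annotations
for item in dataset:
    print(item.id, len(item.annotations))
```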
## Formats support

List of supported formats:

- COCO (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`*)
  - [Format specification](http://cocodataset.org/#format-data)
  - `labels` are our extension - like `instances`, but with only `category_id`
- PASCAL VOC (`classification`, `detection`, `segmentation` (class, instances), `action_classification`, `person_layout`)
  - [Format specification](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html)
- YOLO (`bboxes`)
  - [Format specification](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data)
- TF Detection API (`bboxes`, `masks`)
  - Format specifications: [bboxes](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md), [masks](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/instance_segmentation.md)
- CVAT
  - [Format specification](https://github.com/opencv/cvat/blob/develop/cvat/apps/documentation/xml_format.md)

List of supported annotation types:

- Labels
- Bounding boxes
- Polygons
- Polylines
- (Key-)Points
- Captions
- Masks

## Command line workflow

> **Note**: command invocation syntax is subject to change,
> **always refer to the command's --help output**

The key object is the Project. A Project is a combination of
a Project's own dataset, a number of external data sources, and an environment.
An empty Project can be created with the `project create` command,
and an existing dataset can be imported with the `project import` command.
A typical way to obtain projects is to export tasks in the CVAT UI.

Available CLI commands:
![CLI design doc](images/cli_design.png)

If you want to interact with models, you need to add them to the project first.

### Import project

This command creates a Project from an existing dataset.

Supported formats are listed in the command help.
In Datumaro, dataset formats are supported by Extractors and Importers.
An Extractor produces a list of dataset items corresponding
to the dataset. An Importer creates a Project from a
data source location. It is possible to add custom Extractors and Importers.
To do this, put the Extractor and Importer implementation scripts in
`<project_dir>/.datumaro/extractors` and `<project_dir>/.datumaro/importers`.
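
As a reference for the Importer interface, the `ImageDirImporter` added in
this change is a callable that receives a path and returns a configured
Project. A custom importer sketch following the same shape (the format name
`my_format` is hypothetical and must match the custom Extractor's name):

``` python
import os.path as osp


class MyFormatImporter:
    EXTRACTOR_NAME = 'my_format' # the name of the matching custom Extractor

    def __call__(self, path, **extra_params):
        from datumaro.components.project import Project # cyclic import
        project = Project()

        if not osp.isdir(path):
            raise Exception("Can't find a directory at '%s'" % path)

        # Register the dataset location as a project source
        source_name = osp.basename(osp.normpath(path))
        project.add_source(source_name, {
            'url': source_name,
            'format': self.EXTRACTOR_NAME,
            'options': dict(extra_params),
        })

        return project
```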
Usage:

``` bash
datum project import --help

datum project import \
    -i <dataset_path> \
    -o <project_dir> \
    -f <format>
```

Example: create a project from a COCO-like dataset

``` bash
datum project import \
    -i /home/coco_dir \
    -o /home/project_dir \
    -f coco
```

An _MS COCO_-like dataset should have the following directory structure:

<!--lint disable fenced-code-flag-->
```
COCO/
├── annotations/
│   ├── instances_val2017.json
│   ├── instances_train2017.json
├── images/
│   ├── val2017
│   ├── train2017
```
<!--lint enable fenced-code-flag-->

Everything after the last `_` in an annotation file name is considered a subset
name in the COCO format, e.g. `instances_val2017.json` describes the `val2017` subset.

### Create project

The command creates an empty project. Once a Project is created, there are
a few options to interact with it.

Usage:

``` bash
datum project create --help

datum project create \
    -o <project_dir>
```

Example: create an empty project `my_dataset`

``` bash
datum project create -o my_dataset/
```

### Add and remove data

A Project can be attached to a number of external Data Sources. Each Source
describes a way to produce dataset items. A Project combines dataset items from
all the sources and its own dataset into one composite dataset. You can manage
project sources with the commands in the `source` command line context.

Datasets come in a wide variety of formats. Each dataset
format defines its own data structure and rules on how to
interpret the data. For example, the following data structure
is used in the COCO format:
<!--lint disable fenced-code-flag-->
```
/dataset/
- /images/<id>.jpg
- /annotations/
```
<!--lint enable fenced-code-flag-->

In Datumaro, dataset formats are supported by Extractors.
An Extractor produces a list of dataset items corresponding
to the dataset. It is possible to add a custom Extractor
(see the sketch below). To do this, put an Extractor
definition script in `<project_dir>/.datumaro/extractors`.
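
A custom Extractor only needs to produce `DatasetItem`s. A minimal sketch,
modeled on the `ImageDirExtractor` added in this change (the class name and
the `.jpg`-only filter are hypothetical):

``` python
import os
import os.path as osp

from datumaro.components.extractor import DatasetItem, Extractor
from datumaro.util.image import lazy_image


class MyFormatExtractor(Extractor):
    def __init__(self, url):
        super().__init__()

        # Produce one item per image file; images are loaded lazily
        self._items = [
            DatasetItem(id=osp.splitext(name)[0],
                image=lazy_image(osp.join(url, name)))
            for name in sorted(os.listdir(url))
            if name.endswith('.jpg')
        ]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)
```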
Usage:

``` bash
datum source add --help
datum source remove --help

datum source add \
    path <path> \
    -p <project dir> \
    -n <name>

datum source remove \
    -p <project dir> \
    -n <name>
```

Example: create a project from a bunch of different annotations and images,
and generate a TFRecord dataset for the TF Detection API for model training

``` bash
datum project create
# 'default' is the name of the subset below
datum source add path <path/to/coco/instances_default.json> -f coco_instances
datum source add path <path/to/cvat/default.xml> -f cvat
datum source add path <path/to/voc> -f voc_detection
datum source add path <path/to/datumaro/default.json> -f datumaro
datum source add path <path/to/images/dir> -f image_dir
datum project export -f tf_detection_api
```

### Extract subproject

This command creates a sub-Project from a Project. The new project
includes only the items satisfying some condition.
[XPath](https://devhints.io/xpath) is used as the query format.

There are several filtering modes available (the `-m/--mode` parameter).
Supported modes:
- 'i', 'items'
- 'a', 'annotations'
- 'i+a', 'a+i', 'items+annotations', 'annotations+items'

When filtering annotations, use the 'items+annotations'
mode to indicate that annotation-less dataset items should be
removed. To select an annotation, write an XPath that
returns 'annotation' elements (see the examples below).

Usage:

``` bash
datum project extract --help

datum project extract \
    -p <project dir> \
    -o <output dir> \
    -e '<xpath filter expression>'
```

Example: extract a dataset with only images whose width < height

``` bash
datum project extract \
    -p test_project \
    -o test_project-extract \
    -e '/item[image/width < image/height]'
```

Example: extract a dataset with only large annotations of class `cat` and any non-`person` ones

``` bash
datum project extract \
    -p test_project \
    -o test_project-extract \
    --mode annotations -e '/item/annotation[(label="cat" and area > 999.5) or label!="person"]'
```

Example: extract a dataset with only occluded annotations, remove empty images

``` bash
datum project extract \
    -p test_project \
    -o test_project-extract \
    -m i+a -e '/item/annotation[occluded="True"]'
```

Item representations are available with the `--dry-run` parameter:

``` xml
<item>
  <id>290768</id>
  <subset>minival2014</subset>
  <image>
    <width>612</width>
    <height>612</height>
    <depth>3</depth>
  </image>
  <annotation>
    <id>80154</id>
    <type>bbox</type>
    <label_id>39</label_id>
    <x>264.59</x>
    <y>150.25</y>
    <w>11.199999999999989</w>
    <h>42.31</h>
    <area>473.87199999999956</area>
  </annotation>
  <annotation>
    <id>669839</id>
    <type>bbox</type>
    <label_id>41</label_id>
    <x>163.58</x>
    <y>191.75</y>
    <w>76.98999999999998</w>
    <h>73.63</h>
    <area>5668.773699999998</area>
  </annotation>
  ...
</item>
```

### Merge projects

This command combines multiple Projects into one.

Usage:

``` bash
datum project merge --help

datum project merge \
    -p <project dir> \
    -o <output dir> \
    <other project dir>
```

Example: update annotations in the `first_project` with annotations
from the `second_project` and save the result as `merged_project`

``` bash
datum project merge \
    -p first_project \
    -o merged_project \
    second_project
```

### Export project

This command exports a Project in a specified format.

Supported formats are listed in the command help.
In Datumaro, dataset formats are supported by Converters.
A Converter produces a dataset of a specific format
from dataset items. It is possible to add a custom Converter.
To do this, put a Converter
definition script in `<project_dir>/.datumaro/converters`.
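
A minimal custom Converter sketch, assuming a Converter is a callable that
receives the dataset (an Extractor) and an output directory - check the
built-in converters in `datumaro.components.converters` for the exact
interface; the text format written below is hypothetical:

``` python
import os
import os.path as osp


class MyFormatConverter:
    def __call__(self, extractor, save_dir):
        os.makedirs(save_dir, exist_ok=True)

        # Write one annotation file per dataset item in a custom text format
        for item in extractor:
            with open(osp.join(save_dir, '%s.txt' % item.id), 'w') as f:
                for ann in item.annotations:
                    f.write('%s\n' % ann.type.name)
```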
Usage:

``` bash
datum project export --help

datum project export \
    -p <project dir> \
    -o <output dir> \
    -f <format> \
    [-- <additional format parameters>]
```

Example: save the project as a VOC-like dataset, including images

``` bash
datum project export \
    -p test_project \
    -o test_project-export \
    -f voc \
    -- --save-images
```

### Get project info

This command outputs project status information.

Usage:

``` bash
datum project info --help

datum project info \
    -p <project dir>
```

Example:

``` bash
datum project info -p /test_project

Project:
  name: test_project2
  location: /test_project
Sources:
  source 'instances_minival2014':
    format: coco_instances
    url: /coco_like/annotations/instances_minival2014.json
Dataset:
  length: 5000
  categories: label
    label:
      count: 80
      labels: person, bicycle, car, motorcycle (and 76 more)
  subsets: minival2014
    subset 'minival2014':
      length: 5000
      categories: label
        label:
          count: 80
          labels: person, bicycle, car, motorcycle (and 76 more)
```

### Register model

Supported models:
- OpenVINO
- Custom models via custom `launchers`

Usage:

``` bash
datum model add --help
```

Example: register an OpenVINO model

A model consists of a graph description and weights. There is also a script
used to convert the model outputs to internal data structures.

``` bash
datum project create
datum model add \
    -n <model_name> openvino \
    -d <path_to_xml> -w <path_to_bin> -i <path_to_interpretation_script>
```

Interpretation script for an OpenVINO detection model (`convert.py`):

``` python
from datumaro.components.extractor import *

max_det = 10
conf_thresh = 0.1

def process_outputs(inputs, outputs):
    # inputs = model input, an array of images, shape = (N, C, H, W)
    # outputs = model output, shape = (N, 1, K, 7)
    # results = conversion result, [ [ Annotation, ... ], ... ]
    results = []
    for input, output in zip(inputs, outputs):
        input_height, input_width = input.shape[:2]
        detections = output[0]
        image_results = []
        for det in detections:
            label = int(det[1])
            conf = det[2]
            if conf <= conf_thresh:
                continue

            x = max(int(det[3] * input_width), 0)
            y = max(int(det[4] * input_height), 0)
            w = min(int(det[5] * input_width - x), input_width)
            h = min(int(det[6] * input_height - y), input_height)
            image_results.append(BboxObject(x, y, w, h,
                label=label, attributes={'score': conf}))

        results.append(image_results[:max_det])

    return results

def get_categories():
    # Optionally, provide output categories - a label map etc.
    # Example:
    label_categories = LabelCategories()
    label_categories.add('person')
    label_categories.add('car')
    return { AnnotationType.label: label_categories }
```

### Run model

This command applies a model to dataset images and produces a new project.

Usage:

``` bash
datum model run --help

datum model run \
    -p <project dir> \
    -m <model_name> \
    -o <save_dir>
```

Example: launch inference on a dataset

``` bash
datum project import <...>
datum model add mymodel <...>
datum model run -m mymodel -o inference
```

### Compare projects

The command compares two datasets and saves the results in the
specified directory. The current project is considered to be
"ground truth".

``` bash
datum project diff --help

datum project diff <other_project_dir> -o <save_dir>
```

Example: compare a dataset with model inference

``` bash
datum project import <...>
datum model add mymodel <...>
datum project transform <...> -o inference
datum project diff inference -o diff
```

### Explain inference

Usage:

``` bash
datum explain --help

datum explain \
    -m <model_name> \
    -o <save_dir> \
    -t <target> \
    <method> \
    <method_params>
```

Example: run inference explanation on a single image with visualization

``` bash
datum project create <...>
datum model add mymodel <...>
datum explain \
    -m mymodel \
    -t 'image.png' \
    rise \
    -s 1000 --progressive
```

## Links
- [TensorFlow detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)
- [How to convert a model to OpenVINO format](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models.html)
- [Model conversion script example](https://github.com/opencv/cvat/blob/3e09503ba6c6daa6469a6c4d275a5a8b168dfa2c/components/tf_annotation/install.sh#L23)
@ -1,5 +0,0 @@

import unittest


if __name__ == '__main__':
    unittest.main()
@ -0,0 +1,48 @@

import numpy as np
import os.path as osp

from unittest import TestCase

from datumaro.components.project import Project
from datumaro.components.extractor import Extractor, DatasetItem
from datumaro.util.test_utils import TestDir
from datumaro.util.image import save_image


class ImageDirFormatTest(TestCase):
    class TestExtractor(Extractor):
        def __iter__(self):
            return iter([
                DatasetItem(id=1, image=np.ones((10, 6, 3))),
                DatasetItem(id=2, image=np.ones((5, 4, 3))),
            ])

    def test_can_load(self):
        with TestDir() as test_dir:
            source_dataset = self.TestExtractor()

            # Dump the test items as images into a temporary directory
            for item in source_dataset:
                save_image(osp.join(test_dir.path, '%s.jpg' % item.id),
                    item.image)

            # Import the directory back and compare with the source
            project = Project.import_from(test_dir.path, 'image_dir')
            parsed_dataset = project.make_dataset()

            self.assertListEqual(
                sorted(source_dataset.subsets()),
                sorted(parsed_dataset.subsets()),
            )

            self.assertEqual(len(source_dataset), len(parsed_dataset))

            for subset_name in source_dataset.subsets():
                source_subset = source_dataset.get_subset(subset_name)
                parsed_subset = parsed_dataset.get_subset(subset_name)
                self.assertEqual(len(source_subset), len(parsed_subset))
                for idx, (item_a, item_b) in enumerate(
                        zip(source_subset, parsed_subset)):
                    self.assertEqual(item_a, item_b, str(idx))

            self.assertEqual(
                source_dataset.categories(),
                parsed_dataset.categories())
@ -1,2 +1,2 @@
Pillow==6.2.0
requests==2.20.1
Pillow>=6.2.0
requests>=2.20.1