Auto annotation

Description

The application is enabled automatically if the OpenVINO™ component is installed. It lets you use custom models for auto annotation. Only models in the OpenVINO™ toolkit format are supported. To annotate a task with a custom model, first convert the model to the intermediate representation (IR) format with the Model Optimizer tool. See the OpenVINO documentation for details.

Usage

To annotate a task with a custom model, prepare four files:

Model config (*.xml) - a text file with network configuration.
Model weights (*.bin) - a binary file with trained weights.

Label map (*.json) - a simple JSON file containing a dictionary-like label_map object that maps label numbers (written as string keys) to label names. Example:

{
  "label_map": {
    "0": "background",
    "1": "aeroplane",
    "2": "bicycle",
    "3": "bird",
    "4": "boat",
    "5": "bottle",
    "6": "bus",
    "7": "car",
    "8": "cat",
    "9": "chair",
    "10": "cow",
    "11": "diningtable",
    "12": "dog",
    "13": "horse",
    "14": "motorbike",
    "15": "person",
    "16": "pottedplant",
    "17": "sheep",
    "18": "sofa",
    "19": "train",
    "20": "tvmonitor"
  }
}
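The label map uses string keys, while the shape methods described later expect integer labels. The sketch below shows one way to load such a file and look up names by integer label. The helper name `load_label_map` is hypothetical and is not part of CVAT; how CVAT itself parses the file is not shown in this document.

```python
import json


def load_label_map(text):
    """Parse label map JSON text and return {int label number: label name}.

    Hypothetical helper for illustration; not part of CVAT.
    """
    data = json.loads(text)
    label_map = data["label_map"]  # raises KeyError if the top-level key is missing
    return {int(number): name for number, name in label_map.items()}


example = '{"label_map": {"0": "background", "1": "aeroplane", "2": "bicycle"}}'
labels = load_label_map(example)
print(labels[1])
```

Converting the keys once keeps later lookups by the integer class index simple.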

Interpretation script (*.py) - a file that converts the network's output layer into a predefined structure that CVAT can process. The code runs inside a restricted Python environment, but some built-in functions such as str, int, float, max, min, and range are available.

Two variables are also available in the script's scope:

detections - a list of dictionaries with detections for each frame:
- frame_id - frame number
- frame_height - frame height
- frame_width - frame width
- detections - the network output as an np.ndarray (see ExecutableNetwork.infer for details).

results - an instance of a Python class that collects the converted results. Use the following methods to add shapes:

# xtl, ytl, xbr, ybr - expected values are float or int
# label - expected value is int
# frame_number - expected value is int
# attributes - dictionary of attribute_name: attribute_value pairs, for example {"confidence": "0.83"}
add_box(self, xtl, ytl, xbr, ybr, label, frame_number, attributes=None)

# points - list of (x, y) pairs of float or int, for example [(57.3, 100), (67, 102.7)]
# label - expected value is int
# frame_number - expected value is int
# attributes - dictionary of attribute_name: attribute_value pairs, for example {"confidence": "0.83"}
add_points(self, points, label, frame_number, attributes=None)
add_polygon(self, points, label, frame_number, attributes=None)
add_polyline(self, points, label, frame_number, attributes=None)
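
To try an interpretation script outside CVAT, a small stand-in for `results` can record the calls it receives. This is a minimal sketch: the class name `FakeResults`, its `shapes` list, and the recorded dictionary layout are hypothetical. Only the method signatures are taken from the listing above, and CVAT's real class may validate or store shapes differently.

```python
class FakeResults:
    """Hypothetical stand-in for the `results` object; records each call."""

    def __init__(self):
        self.shapes = []

    def _record(self, shape_type, points, label, frame_number, attributes):
        # Store every shape in one common layout so tests can inspect it.
        self.shapes.append({
            "type": shape_type,
            "points": list(points),
            "label": int(label),
            "frame": int(frame_number),
            "attributes": dict(attributes or {}),
        })

    def add_box(self, xtl, ytl, xbr, ybr, label, frame_number, attributes=None):
        self._record("box", [(xtl, ytl), (xbr, ybr)], label, frame_number, attributes)

    def add_points(self, points, label, frame_number, attributes=None):
        self._record("points", points, label, frame_number, attributes)

    def add_polygon(self, points, label, frame_number, attributes=None):
        self._record("polygon", points, label, frame_number, attributes)

    def add_polyline(self, points, label, frame_number, attributes=None):
        self._record("polyline", points, label, frame_number, attributes)


results = FakeResults()
results.add_box(10, 20, 110.5, 220, label=1, frame_number=0,
                attributes={"confidence": "0.83"})
results.add_points([(57.3, 100), (67, 102.7)], label=2, frame_number=0)
print(len(results.shapes))
```

Because the stand-in exposes the same method names, a script written against it should need no changes to its `results.add_*` calls when it is uploaded.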

Examples

Person-vehicle-bike-detection-crossroad-0078 (OpenVINO toolkit)

Links

Task labels: person vehicle non-vehicle

label_map.json:

{
  "label_map": {
    "1": "person",
    "2": "vehicle",
    "3": "non-vehicle"
  }
}

Interpretation script for SSD-based networks:

def clip(value):
  # Clamp a normalized coordinate to the [0.0, 1.0] range.
  return max(min(1.0, value), 0.0)

for frame_results in detections:
  frame_height = frame_results["frame_height"]
  frame_width = frame_results["frame_width"]
  frame_number = frame_results["frame_id"]
  # Network output; the last axis holds one detection's values:
  # index 1 = label, 2 = confidence, 3..6 = normalized xtl, ytl, xbr, ybr.
  output = frame_results["detections"]

  for i in range(output.shape[2]):
    confidence = output[0, 0, i, 2]
    # Skip detections below the 0.5 confidence threshold.
    if confidence < 0.5:
      continue

    # Scale the clipped, normalized coordinates to pixel units.
    results.add_box(
      xtl=clip(output[0, 0, i, 3]) * frame_width,
      ytl=clip(output[0, 0, i, 4]) * frame_height,
      xbr=clip(output[0, 0, i, 5]) * frame_width,
      ybr=clip(output[0, 0, i, 6]) * frame_height,
      label=int(output[0, 0, i, 1]),
      frame_number=frame_number,
      attributes={
        "confidence": "{:.2f}".format(confidence),
      },
    )
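
The clipping and scaling arithmetic in the script above can be checked without CVAT or OpenVINO. The sketch below applies the same steps to plain Python lists. The function name `decode_ssd_rows`, the `threshold` parameter, and the sample rows are hypothetical, and the row layout is assumed only from the indices the script uses (1 = label, 2 = confidence, 3 to 6 = normalized box corners).

```python
def clip(value):
    """Clamp a normalized coordinate to the [0.0, 1.0] range."""
    return max(min(1.0, value), 0.0)


def decode_ssd_rows(rows, frame_width, frame_height, threshold=0.5):
    """Convert SSD-style rows to pixel-space boxes (hypothetical helper).

    Each row is assumed to be [_, label, confidence, xtl, ytl, xbr, ybr]
    with coordinates normalized to the frame size.
    """
    boxes = []
    for row in rows:
        confidence = row[2]
        if confidence < threshold:
            continue  # drop low-confidence detections
        boxes.append({
            "xtl": clip(row[3]) * frame_width,
            "ytl": clip(row[4]) * frame_height,
            "xbr": clip(row[5]) * frame_width,
            "ybr": clip(row[6]) * frame_height,
            "label": int(row[1]),
            "confidence": "{:.2f}".format(confidence),
        })
    return boxes


rows = [
    [0, 2, 0.9, 0.25, 0.5, 0.75, 1.5],  # kept; ybr is clipped from 1.5 to 1.0
    [0, 1, 0.3, 0.1, 0.1, 0.2, 0.2],    # skipped: confidence below 0.5
]
boxes = decode_ssd_rows(rows, frame_width=200, frame_height=100)
print(boxes)
```

For a 200 x 100 frame, the first row yields a box from (50.0, 50.0) to (150.0, 100.0) with label 2, and the second row is filtered out.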

Landmarks-regression-retail-0009 (OpenVINO toolkit)