Auto annotation
Description
The application will be enabled automatically if the OpenVINO™ component is installed. It allows you to use custom models for auto annotation. Only models in the OpenVINO™ toolkit format are supported. If you would like to annotate a task with a custom model, please convert it to the intermediate representation (IR) format via the Model Optimizer tool. See the OpenVINO documentation for details.
Usage
To annotate a task with a custom model you need to prepare 4 files:

- Model config (*.xml) - a text file with the network configuration.
- Model weights (*.bin) - a binary file with trained weights.
- Label map (*.json) - a simple JSON file with a label_map dictionary-like object mapping label numbers (string keys) to label names. Example:

  {
    "label_map": {
      "0": "background",
      "1": "aeroplane",
      "2": "bicycle",
      "3": "bird",
      "4": "boat",
      "5": "bottle",
      "6": "bus",
      "7": "car",
      "8": "cat",
      "9": "chair",
      "10": "cow",
      "11": "diningtable",
      "12": "dog",
      "13": "horse",
      "14": "motorbike",
      "15": "person",
      "16": "pottedplant",
      "17": "sheep",
      "18": "sofa",
      "19": "train",
      "20": "tvmonitor"
    }
  }
- Interpretation script (*.py) - a file used to convert the network output layer to a predefined structure which can be processed by CVAT. This code will be run inside a restricted Python environment, but it is possible to use some builtin functions like str, int, float, max, min, range.

  Two variables are also available in the scope:
  - detections - a list of dictionaries with detections for each frame:
    - frame_id - frame number
    - frame_height - frame height
    - frame_width - frame width
    - detections - output np.ndarray (see ExecutableNetwork.infer for details)
  - results - an instance of a Python class with converted results. The following methods should be used to add shapes:

    # xtl, ytl, xbr, ybr - expected values are float or int
    # label - expected value is int
    # frame_number - expected value is int
    # attributes - dictionary of attribute_name: attribute_value pairs,
    #   for example {"confidence": "0.83"}
    add_box(self, xtl, ytl, xbr, ybr, label, frame_number, attributes=None)

    # points - list of (x, y) pairs of float or int,
    #   for example [(57.3, 100), (67, 102.7)]
    # label - expected value is int
    # frame_number - expected value is int
    # attributes - dictionary of attribute_name: attribute_value pairs,
    #   for example {"confidence": "0.83"}
    add_points(self, points, label, frame_number, attributes=None)
    add_polygon(self, points, label, frame_number, attributes=None)
    add_polyline(self, points, label, frame_number, attributes=None)
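To get a feel for how these pieces fit together, here is a minimal sketch of such an execution environment. The `Results` class and the restricted namespace below are illustrative only, not CVAT's actual implementation; they simply mirror the `add_*` methods and the whitelisted builtins described above:

```python
# Hypothetical runner sketch (not CVAT's real code): an interpretation
# script is executed with only `detections`, `results`, and a handful of
# builtins in scope.

class Results:
    """Illustrative collector mirroring the add_* methods described above."""
    def __init__(self):
        self.boxes = []
        self.points = []

    def add_box(self, xtl, ytl, xbr, ybr, label, frame_number, attributes=None):
        self.boxes.append({"xtl": xtl, "ytl": ytl, "xbr": xbr, "ybr": ybr,
                           "label": label, "frame": frame_number,
                           "attributes": attributes or {}})

    def add_points(self, points, label, frame_number, attributes=None):
        self.points.append({"points": points, "label": label,
                            "frame": frame_number,
                            "attributes": attributes or {}})

# A toy interpretation script, stored as a string the way a user-supplied
# *.py file would be.
script = """
for frame_results in detections:
    results.add_box(10, 20, 30, 40, label=1,
                    frame_number=frame_results["frame_id"])
"""

results = Results()
detections = [{"frame_id": 0, "frame_height": 480, "frame_width": 640,
               "detections": None}]  # in CVAT this is a np.ndarray

# Restricted namespace: only whitelisted builtins plus the two variables.
safe_globals = {
    "__builtins__": {"str": str, "int": int, "float": float,
                     "max": max, "min": min, "range": range},
    "detections": detections,
    "results": results,
}
exec(script, safe_globals)
print(results.boxes[0]["label"])  # → 1
```

After the script finishes, the shapes accumulated in `results` would be converted to CVAT annotations.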
Examples
Person-vehicle-bike-detection-crossroad-0078 (OpenVINO toolkit)
Task labels: person vehicle non-vehicle
label_map.json:
{
"label_map": {
"1": "person",
"2": "vehicle",
"3": "non-vehicle"
}
}
Interpretation script for SSD based networks:
def clip(value):
    return max(min(1.0, value), 0.0)

for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]

    for i in range(frame_results["detections"].shape[2]):
        confidence = frame_results["detections"][0, 0, i, 2]
        if confidence < 0.5:
            continue

        results.add_box(
            xtl=clip(frame_results["detections"][0, 0, i, 3]) * frame_width,
            ytl=clip(frame_results["detections"][0, 0, i, 4]) * frame_height,
            xbr=clip(frame_results["detections"][0, 0, i, 5]) * frame_width,
            ybr=clip(frame_results["detections"][0, 0, i, 6]) * frame_height,
            label=int(frame_results["detections"][0, 0, i, 1]),
            frame_number=frame_number,
            attributes={
                "confidence": "{:.2f}".format(confidence),
            },
        )
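The script above can be exercised outside CVAT by fabricating a detections ndarray with the typical SSD output layout (1, 1, N, 7), where the last axis holds [image_id, label, confidence, xtl, ytl, xbr, ybr] in normalized coordinates. The `FakeResults` class is a hypothetical stand-in for CVAT's results object, and the loop repeats the script body verbatim so the sketch runs standalone:

```python
# Sanity-check sketch for the SSD interpretation script, using a fabricated
# detections array instead of real network output.
import numpy as np

class FakeResults:  # hypothetical stand-in for CVAT's results object
    def __init__(self):
        self.boxes = []
    def add_box(self, xtl, ytl, xbr, ybr, label, frame_number, attributes=None):
        self.boxes.append((xtl, ytl, xbr, ybr, label, frame_number, attributes))

raw = np.zeros((1, 1, 2, 7), dtype=np.float32)
raw[0, 0, 0] = [0, 1, 0.90, 0.10, 0.20, 0.30, 0.40]  # confident detection
raw[0, 0, 1] = [0, 2, 0.10, 0.50, 0.50, 0.60, 0.60]  # below the 0.5 threshold

detections = [{"frame_id": 0, "frame_height": 480,
               "frame_width": 640, "detections": raw}]
results = FakeResults()

# The interpretation script body, repeated so this sketch is self-contained:
def clip(value):
    return max(min(1.0, value), 0.0)

for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]
    for i in range(frame_results["detections"].shape[2]):
        confidence = frame_results["detections"][0, 0, i, 2]
        if confidence < 0.5:
            continue
        results.add_box(
            xtl=clip(frame_results["detections"][0, 0, i, 3]) * frame_width,
            ytl=clip(frame_results["detections"][0, 0, i, 4]) * frame_height,
            xbr=clip(frame_results["detections"][0, 0, i, 5]) * frame_width,
            ybr=clip(frame_results["detections"][0, 0, i, 6]) * frame_height,
            label=int(frame_results["detections"][0, 0, i, 1]),
            frame_number=frame_number,
            attributes={"confidence": "{:.2f}".format(confidence)},
        )

print(len(results.boxes))  # → 1 (only the high-confidence detection survives)
```

The low-confidence row is filtered out by the 0.5 threshold, and the normalized coordinates are scaled back to pixel space using the frame dimensions.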
Landmarks-regression-retail-0009 (OpenVINO toolkit)
Task labels: left_eye right_eye tip_of_nose left_lip_corner right_lip_corner
label_map.json:
{
"label_map": {
"0": "left_eye",
"1": "right_eye",
"2": "tip_of_nose",
"3": "left_lip_corner",
"4": "right_lip_corner"
}
}
Interpretation script:
def clip(value):
    return max(min(1.0, value), 0.0)

for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]

    for i in range(0, frame_results["detections"].shape[1], 2):
        x = frame_results["detections"][0, i, 0, 0]
        y = frame_results["detections"][0, i + 1, 0, 0]

        results.add_points(
            points=[(clip(x) * frame_width, clip(y) * frame_height)],
            label=i // 2,  # see the label map and the model output specification
            frame_number=frame_number,
        )
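This script can be checked the same way, by fabricating an output blob of shape (1, 10, 1, 1) holding five (x, y) pairs in normalized coordinates. As before, `FakeResults` is a hypothetical stand-in and the loop repeats the script body so the sketch runs standalone:

```python
# Sanity-check sketch for the landmarks interpretation script.
import numpy as np

class FakeResults:  # hypothetical stand-in for CVAT's results object
    def __init__(self):
        self.keypoints = []
    def add_points(self, points, label, frame_number, attributes=None):
        self.keypoints.append((points, label, frame_number))

# Five normalized (x, y) pairs, flattened as [x0, y0, x1, y1, ...].
raw = np.array([0.2, 0.3, 0.8, 0.3, 0.5, 0.5, 0.35, 0.8, 0.65, 0.8],
               dtype=np.float32).reshape(1, 10, 1, 1)
detections = [{"frame_id": 0, "frame_height": 100, "frame_width": 100,
               "detections": raw}]
results = FakeResults()

# The interpretation script body, repeated so this sketch is self-contained:
def clip(value):
    return max(min(1.0, value), 0.0)

for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]
    for i in range(0, frame_results["detections"].shape[1], 2):
        x = frame_results["detections"][0, i, 0, 0]
        y = frame_results["detections"][0, i + 1, 0, 0]
        results.add_points(
            points=[(clip(x) * frame_width, clip(y) * frame_height)],
            label=i // 2,
            frame_number=frame_number,
        )

print(len(results.keypoints))  # → 5, with labels 0..4 matching label_map.json
```

Each consecutive pair of values in the output becomes one keypoint, and `i // 2` maps the pair index to the corresponding entry in label_map.json (0 = left_eye, ..., 4 = right_lip_corner).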