Manifest (#2763)
* Added support for manifest file
* Added data migration
* Updated tests
* Update CHANGELOG
* Update manifest documentation
* Fix case with 3d data

Co-authored-by: Nikita Manovich <nikita.manovich@intel.com>

parent e41c301251
commit 6c38ad0701
@@ -0,0 +1,83 @@
# Generated by Django 3.1.1 on 2021-02-20 08:36

import glob
import os
from re import search

from django.conf import settings
from django.db import migrations

from cvat.apps.engine.models import (DimensionType, StorageChoice,
    StorageMethodChoice)
from utils.dataset_manifest import ImageManifestManager, VideoManifestManager

def migrate_data(apps, schema_editor):
    Data = apps.get_model("engine", "Data")
    query_set = Data.objects.filter(storage_method=StorageMethodChoice.CACHE)
    for db_data in query_set:
        try:
            upload_dir = '{}/{}/raw'.format(settings.MEDIA_DATA_ROOT, db_data.id)
            if os.path.exists(os.path.join(upload_dir, 'meta_info.txt')):
                os.remove(os.path.join(upload_dir, 'meta_info.txt'))
            else:
                for path in glob.glob(f'{upload_dir}/dummy_*.txt'):
                    os.remove(path)
            # this is needed in the case of a long data migration
            if os.path.exists(os.path.join(upload_dir, 'manifest.jsonl')):
                continue
            data_dir = upload_dir if db_data.storage == StorageChoice.LOCAL else settings.SHARE_ROOT
            if hasattr(db_data, 'video'):
                media_file = os.path.join(data_dir, db_data.video.path)
                manifest = VideoManifestManager(manifest_path=upload_dir)
                meta_info = manifest.prepare_meta(media_file=media_file)
                manifest.create(meta_info)
                manifest.init_index()
            else:
                manifest = ImageManifestManager(manifest_path=upload_dir)
                sources = []
                if db_data.storage == StorageChoice.LOCAL:
                    for (root, _, files) in os.walk(data_dir):
                        sources.extend([os.path.join(root, f) for f in files])
                    sources.sort()
                # when the share is used, the entire data structure cannot be restored explicitly
                else:
                    sources = [os.path.join(data_dir, db_image.path) for db_image in db_data.images.all().order_by('frame')]
                if any(list(filter(lambda x: x.dimension == DimensionType.DIM_3D, db_data.tasks.all()))):
                    content = []
                    for source in sources:
                        name, ext = os.path.splitext(os.path.relpath(source, upload_dir))
                        content.append({
                            'name': name,
                            'extension': ext
                        })
                else:
                    meta_info = manifest.prepare_meta(sources=sources, data_dir=data_dir)
                    content = meta_info.content

                if db_data.storage == StorageChoice.SHARE:
                    def _get_frame_step(str_):
                        match = search(r"step\s*=\s*([1-9]\d*)", str_)
                        return int(match.group(1)) if match else 1
                    step = _get_frame_step(db_data.frame_filter)
                    start = db_data.start_frame
                    stop = db_data.stop_frame + 1
                    images_range = range(start, stop, step)
                    result_content = []
                    for i in range(stop):
                        item = content.pop(0) if i in images_range else dict()
                        result_content.append(item)
                    content = result_content
                manifest.create(content)
                manifest.init_index()
        except Exception as ex:
            print(str(ex))

class Migration(migrations.Migration):

    dependencies = [
        ('engine', '0037_task_subset'),
    ]

    operations = [
        migrations.RunPython(migrate_data)
    ]
@@ -1,277 +0,0 @@
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

import av
from collections import OrderedDict
import hashlib
import os
from cvat.apps.engine.utils import rotate_image

class WorkWithVideo:
    def __init__(self, **kwargs):
        if not kwargs.get('source_path'):
            raise Exception('No sourse path')
        self.source_path = kwargs.get('source_path')

    @staticmethod
    def _open_video_container(sourse_path, mode, options=None):
        return av.open(sourse_path, mode=mode, options=options)

    @staticmethod
    def _close_video_container(container):
        container.close()

    @staticmethod
    def _get_video_stream(container):
        video_stream = next(stream for stream in container.streams if stream.type == 'video')
        video_stream.thread_type = 'AUTO'
        return video_stream

    @staticmethod
    def _get_frame_size(container):
        video_stream = WorkWithVideo._get_video_stream(container)
        for packet in container.demux(video_stream):
            for frame in packet.decode():
                if video_stream.metadata.get('rotate'):
                    frame = av.VideoFrame().from_ndarray(
                        rotate_image(
                            frame.to_ndarray(format='bgr24'),
                            360 - int(container.streams.video[0].metadata.get('rotate')),
                        ),
                        format='bgr24',
                    )
                return frame.width, frame.height

class AnalyzeVideo(WorkWithVideo):
    def check_type_first_frame(self):
        container = self._open_video_container(self.source_path, mode='r')
        video_stream = self._get_video_stream(container)

        for packet in container.demux(video_stream):
            for frame in packet.decode():
                self._close_video_container(container)
                assert frame.pict_type.name == 'I', 'First frame is not key frame'
                return

    def check_video_timestamps_sequences(self):
        container = self._open_video_container(self.source_path, mode='r')
        video_stream = self._get_video_stream(container)

        frame_pts = -1
        frame_dts = -1
        for packet in container.demux(video_stream):
            for frame in packet.decode():

                if None not in [frame.pts, frame_pts] and frame.pts <= frame_pts:
                    self._close_video_container(container)
                    raise Exception('Invalid pts sequences')

                if None not in [frame.dts, frame_dts] and frame.dts <= frame_dts:
                    self._close_video_container(container)
                    raise Exception('Invalid dts sequences')

                frame_pts, frame_dts = frame.pts, frame.dts
        self._close_video_container(container)

def md5_hash(frame):
    return hashlib.md5(frame.to_image().tobytes()).hexdigest()

class PrepareInfo(WorkWithVideo):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        if not kwargs.get('meta_path'):
            raise Exception('No meta path')

        self.meta_path = kwargs.get('meta_path')
        self.key_frames = {}
        self.frames = 0

        container = self._open_video_container(self.source_path, 'r')
        self.width, self.height = self._get_frame_size(container)
        self._close_video_container(container)

    def get_task_size(self):
        return self.frames

    @property
    def frame_sizes(self):
        return (self.width, self.height)

    def check_key_frame(self, container, video_stream, key_frame):
        for packet in container.demux(video_stream):
            for frame in packet.decode():
                if md5_hash(frame) != key_frame[1]['md5'] or frame.pts != key_frame[1]['pts']:
                    self.key_frames.pop(key_frame[0])
                return

    def check_seek_key_frames(self):
        container = self._open_video_container(self.source_path, mode='r')
        video_stream = self._get_video_stream(container)

        key_frames_copy = self.key_frames.copy()

        for key_frame in key_frames_copy.items():
            container.seek(offset=key_frame[1]['pts'], stream=video_stream)
            self.check_key_frame(container, video_stream, key_frame)

    def check_frames_ratio(self, chunk_size):
        return (len(self.key_frames) and (self.frames // len(self.key_frames)) <= 2 * chunk_size)

    def save_key_frames(self):
        container = self._open_video_container(self.source_path, mode='r')
        video_stream = self._get_video_stream(container)
        frame_number = 0

        for packet in container.demux(video_stream):
            for frame in packet.decode():
                if frame.key_frame:
                    self.key_frames[frame_number] = {
                        'pts': frame.pts,
                        'md5': md5_hash(frame),
                    }
                frame_number += 1

        self.frames = frame_number
        self._close_video_container(container)

    def save_meta_info(self):
        with open(self.meta_path, 'w') as meta_file:
            for index, frame in self.key_frames.items():
                meta_file.write('{} {}\n'.format(index, frame['pts']))

    def get_nearest_left_key_frame(self, start_chunk_frame_number):
        start_decode_frame_number = 0
        start_decode_timestamp = 0

        with open(self.meta_path, 'r') as file:
            for line in file:
                frame_number, timestamp = line.strip().split(' ')

                if int(frame_number) <= start_chunk_frame_number:
                    start_decode_frame_number = frame_number
                    start_decode_timestamp = timestamp
                else:
                    break

        return int(start_decode_frame_number), int(start_decode_timestamp)

    def decode_needed_frames(self, chunk_number, db_data):
        step = db_data.get_frame_step()
        start_chunk_frame_number = db_data.start_frame + chunk_number * db_data.chunk_size * step
        end_chunk_frame_number = min(start_chunk_frame_number + (db_data.chunk_size - 1) * step + 1, db_data.stop_frame + 1)
        start_decode_frame_number, start_decode_timestamp = self.get_nearest_left_key_frame(start_chunk_frame_number)
        container = self._open_video_container(self.source_path, mode='r')
        video_stream = self._get_video_stream(container)
        container.seek(offset=start_decode_timestamp, stream=video_stream)

        frame_number = start_decode_frame_number - 1
        for packet in container.demux(video_stream):
            for frame in packet.decode():
                frame_number += 1
                if frame_number < start_chunk_frame_number:
                    continue
                elif frame_number < end_chunk_frame_number and not ((frame_number - start_chunk_frame_number) % step):
                    if video_stream.metadata.get('rotate'):
                        frame = av.VideoFrame().from_ndarray(
                            rotate_image(
                                frame.to_ndarray(format='bgr24'),
                                360 - int(container.streams.video[0].metadata.get('rotate'))
                            ),
                            format='bgr24'
                        )
                    yield frame
                elif (frame_number - start_chunk_frame_number) % step:
                    continue
                else:
                    self._close_video_container(container)
                    return

        self._close_video_container(container)

class UploadedMeta(PrepareInfo):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        uploaded_meta = kwargs.get('uploaded_meta')
        assert uploaded_meta is not None, 'No uploaded meta path'

        with open(uploaded_meta, 'r') as meta_file:
            lines = meta_file.read().strip().split('\n')
            self.frames = int(lines.pop())

            key_frames = {int(line.split()[0]): int(line.split()[1]) for line in lines}
            self.key_frames = OrderedDict(sorted(key_frames.items(), key=lambda x: x[0]))

    @property
    def frame_sizes(self):
        container = self._open_video_container(self.source_path, 'r')
        video_stream = self._get_video_stream(container)
        container.seek(offset=next(iter(self.key_frames.values())), stream=video_stream)
        for packet in container.demux(video_stream):
            for frame in packet.decode():
                if video_stream.metadata.get('rotate'):
                    frame = av.VideoFrame().from_ndarray(
                        rotate_image(
                            frame.to_ndarray(format='bgr24'),
                            360 - int(container.streams.video[0].metadata.get('rotate'))
                        ),
                        format='bgr24'
                    )
                self._close_video_container(container)
                return (frame.width, frame.height)

    def save_meta_info(self):
        with open(self.meta_path, 'w') as meta_file:
            for index, pts in self.key_frames.items():
                meta_file.write('{} {}\n'.format(index, pts))

    def check_key_frame(self, container, video_stream, key_frame):
        for packet in container.demux(video_stream):
            for frame in packet.decode():
                assert frame.pts == key_frame[1], "Uploaded meta information does not match the video"
                return

    def check_seek_key_frames(self):
        container = self._open_video_container(self.source_path, mode='r')
        video_stream = self._get_video_stream(container)

        for key_frame in self.key_frames.items():
            container.seek(offset=key_frame[1], stream=video_stream)
            self.check_key_frame(container, video_stream, key_frame)

        self._close_video_container(container)

    def check_frames_numbers(self):
        container = self._open_video_container(self.source_path, mode='r')
        video_stream = self._get_video_stream(container)
        # not all videos contain information about numbers of frames
        if video_stream.frames:
            self._close_video_container(container)
            assert video_stream.frames == self.frames, "Uploaded meta information does not match the video"
            return
        self._close_video_container(container)

def prepare_meta(media_file, upload_dir=None, meta_dir=None, chunk_size=None):
    paths = {
        'source_path': os.path.join(upload_dir, media_file) if upload_dir else media_file,
        'meta_path': os.path.join(meta_dir, 'meta_info.txt') if meta_dir else os.path.join(upload_dir, 'meta_info.txt'),
    }
    analyzer = AnalyzeVideo(source_path=paths.get('source_path'))
    analyzer.check_type_first_frame()
    analyzer.check_video_timestamps_sequences()

    meta_info = PrepareInfo(source_path=paths.get('source_path'),
                            meta_path=paths.get('meta_path'))
    meta_info.save_key_frames()
    meta_info.check_seek_key_frames()
    meta_info.save_meta_info()
    smooth_decoding = meta_info.check_frames_ratio(chunk_size) if chunk_size else None
    return (meta_info, smooth_decoding)

def prepare_meta_for_upload(func, *args):
    meta_info, smooth_decoding = func(*args)
    with open(meta_info.meta_path, 'a') as meta_file:
        meta_file.write(str(meta_info.get_task_size()))
    return smooth_decoding
@@ -0,0 +1,118 @@
## A simple command line tool to prepare a dataset manifest file

### Steps before use

When used separately from the Computer Vision Annotation Tool (CVAT), the required dependencies must be installed.

#### Ubuntu 20.04

Install dependencies:

```bash
# General
sudo apt-get update && sudo apt-get --no-install-recommends install -y \
    python3-dev python3-pip python3-venv pkg-config
```

```bash
# Library components
sudo apt-get install --no-install-recommends -y \
    libavformat-dev libavcodec-dev libavdevice-dev \
    libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
```

Create an environment and install the necessary python modules:

```bash
python3 -m venv .env
. .env/bin/activate
pip install -U pip
pip install -r requirements.txt
```

### Usage

```bash
usage: python create.py [-h] [--force] [--output-dir .] source

positional arguments:
  source                Source paths

optional arguments:
  -h, --help            show this help message and exit
  --force               Use this flag to prepare the manifest file for video data if by default the video
                        does not meet the requirements and a manifest file is not prepared
  --output-dir OUTPUT_DIR
                        Directory where the manifest file will be saved
```

### Alternative way to use with openvino/cvat_server

```bash
docker run -it --entrypoint python3 -v /path/to/host/data/:/path/inside/container/:rw openvino/cvat_server \
    utils/dataset_manifest/create.py --output-dir /path/to/manifest/directory/ /path/to/data/
```

### Usage examples

Create a dataset manifest in the current directory for a video that contains enough key frames:

```bash
python create.py ~/Documents/video.mp4
```

Create a dataset manifest for a video that does not contain enough key frames:

```bash
python create.py --force --output-dir ~/Documents ~/Documents/video.mp4
```

Create a dataset manifest for a directory of images:

```bash
python create.py --output-dir ~/Documents ~/Documents/images/
```

Create a dataset manifest from a pattern (`*`, `?`, `[]` may be used):

```bash
python create.py --output-dir ~/Documents "/home/${USER}/Documents/**/image*.jpeg"
```

Create a dataset manifest with `openvino/cvat_server`:

```bash
docker run -it --entrypoint python3 -v ~/Documents/data/:${HOME}/manifest/:rw openvino/cvat_server \
    utils/dataset_manifest/create.py --output-dir ~/manifest/ ~/manifest/images/
```

### Examples of generated `manifest.jsonl` files

A manifest file contains some self-explanatory fields and some more specific ones:

- `pts` - the time at which the frame should be shown to the user
- `checksum` - the `md5` hash sum of the specific image/frame

#### For a video

```json
{"version":"1.0"}
{"type":"video"}
{"properties":{"name":"video.mp4","resolution":[1280,720],"length":778}}
{"number":0,"pts":0,"checksum":"17bb40d76887b56fe8213c6fded3d540"}
{"number":135,"pts":486000,"checksum":"9da9b4d42c1206d71bf17a7070a05847"}
{"number":270,"pts":972000,"checksum":"a1c3a61814f9b58b00a795fa18bb6d3e"}
{"number":405,"pts":1458000,"checksum":"18c0803b3cc1aa62ac75b112439d2b62"}
{"number":540,"pts":1944000,"checksum":"4551ecea0f80e95a6c32c32e70cac59e"}
{"number":675,"pts":2430000,"checksum":"0e72faf67e5218c70b506445ac91cdd7"}
```

#### For a dataset with images

```json
{"version":"1.0"}
{"type":"images"}
{"name":"image1","extension":".jpg","width":720,"height":405,"checksum":"548918ec4b56132a5cff1d4acabe9947"}
{"name":"image2","extension":".jpg","width":183,"height":275,"checksum":"4b4eefd03cc6a45c1c068b98477fb639"}
{"name":"image3","extension":".jpg","width":301,"height":167,"checksum":"0e454a6f4a13d56c82890c98be063663"}
```
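
Since the manifest is a JSON Lines file, it can also be inspected without any of the utility code. Below is a minimal reading sketch, not part of this utility, which assumes a `manifest.jsonl` in the current directory and separates the header records from the per-frame/per-image records using only the standard library:

```python
import json

# Collect the header records ({"version": ...}, {"type": ...} and, for a video,
# {"properties": ...}) separately from the per-frame or per-image records.
header, items = {}, []
with open('manifest.jsonl') as manifest_file:  # hypothetical path for this example
    for line in manifest_file:
        if not line.strip():
            continue
        record = json.loads(line)
        if 'number' in record or 'name' in record:
            items.append(record)
        else:
            header.update(record)

print(header.get('type'), len(items))  # e.g. "video 6" or "images 3"
```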
@@ -0,0 +1,4 @@
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
from .core import VideoManifestManager, ImageManifestManager
@@ -0,0 +1,446 @@
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

import av
import json
import os
from abc import ABC, abstractmethod
from collections import OrderedDict
from contextlib import closing
from PIL import Image
from .utils import md5_hash, rotate_image

class VideoStreamReader:
    def __init__(self, source_path):
        self.source_path = source_path
        self._key_frames = OrderedDict()
        self.frames = 0

        with closing(av.open(self.source_path, mode='r')) as container:
            self.width, self.height = self._get_frame_size(container)

    @staticmethod
    def _get_video_stream(container):
        video_stream = next(stream for stream in container.streams if stream.type == 'video')
        video_stream.thread_type = 'AUTO'
        return video_stream

    @staticmethod
    def _get_frame_size(container):
        video_stream = VideoStreamReader._get_video_stream(container)
        for packet in container.demux(video_stream):
            for frame in packet.decode():
                if video_stream.metadata.get('rotate'):
                    frame = av.VideoFrame().from_ndarray(
                        rotate_image(
                            frame.to_ndarray(format='bgr24'),
                            360 - int(container.streams.video[0].metadata.get('rotate')),
                        ),
                        format='bgr24',
                    )
                return frame.width, frame.height

    def check_type_first_frame(self):
        with closing(av.open(self.source_path, mode='r')) as container:
            video_stream = self._get_video_stream(container)

            for packet in container.demux(video_stream):
                for frame in packet.decode():
                    if not frame.pict_type.name == 'I':
                        raise Exception('First frame is not key frame')
                    return

    def check_video_timestamps_sequences(self):
        with closing(av.open(self.source_path, mode='r')) as container:
            video_stream = self._get_video_stream(container)

            frame_pts = -1
            frame_dts = -1
            for packet in container.demux(video_stream):
                for frame in packet.decode():

                    if None not in {frame.pts, frame_pts} and frame.pts <= frame_pts:
                        raise Exception('Invalid pts sequences')

                    if None not in {frame.dts, frame_dts} and frame.dts <= frame_dts:
                        raise Exception('Invalid dts sequences')

                    frame_pts, frame_dts = frame.pts, frame.dts

    def rough_estimate_frames_ratio(self, upper_bound):
        analyzed_frames_number, key_frames_number = 0, 0
        _processing_end = False

        with closing(av.open(self.source_path, mode='r')) as container:
            video_stream = self._get_video_stream(container)
            for packet in container.demux(video_stream):
                for frame in packet.decode():
                    if frame.key_frame:
                        key_frames_number += 1
                    analyzed_frames_number += 1
                    if upper_bound == analyzed_frames_number:
                        _processing_end = True
                        break
                if _processing_end:
                    break
        # In our case there are no videos with a non-key first frame, so at least 1 key frame is guaranteed
        return analyzed_frames_number // key_frames_number

    def validate_frames_ratio(self, chunk_size):
        upper_bound = 3 * chunk_size
        ratio = self.rough_estimate_frames_ratio(upper_bound + 1)
        assert ratio < upper_bound, 'Too few keyframes'

    def get_size(self):
        return self.frames

    @property
    def frame_sizes(self):
        return (self.width, self.height)

    def validate_key_frame(self, container, video_stream, key_frame):
        for packet in container.demux(video_stream):
            for frame in packet.decode():
                if md5_hash(frame) != key_frame[1]['md5'] or frame.pts != key_frame[1]['pts']:
                    self._key_frames.pop(key_frame[0])
                return

    def validate_seek_key_frames(self):
        with closing(av.open(self.source_path, mode='r')) as container:
            video_stream = self._get_video_stream(container)

            key_frames_copy = self._key_frames.copy()

            for key_frame in key_frames_copy.items():
                container.seek(offset=key_frame[1]['pts'], stream=video_stream)
                self.validate_key_frame(container, video_stream, key_frame)

    def save_key_frames(self):
        with closing(av.open(self.source_path, mode='r')) as container:
            video_stream = self._get_video_stream(container)
            frame_number = 0

            for packet in container.demux(video_stream):
                for frame in packet.decode():
                    if frame.key_frame:
                        self._key_frames[frame_number] = {
                            'pts': frame.pts,
                            'md5': md5_hash(frame),
                        }
                    frame_number += 1
            self.frames = frame_number

    @property
    def key_frames(self):
        return self._key_frames

    def __len__(self):
        return len(self._key_frames)

    # TODO: needs to be changed in the future
    def __iter__(self):
        for idx, key_frame in self._key_frames.items():
            yield (idx, key_frame['pts'], key_frame['md5'])


class DatasetImagesReader:
    def __init__(self, sources, is_sorted=True, use_image_hash=False, *args, **kwargs):
        self._sources = sources if is_sorted else sorted(sources)
        self._content = []
        self._data_dir = kwargs.get('data_dir', None)
        self._use_image_hash = use_image_hash

    def __iter__(self):
        for image in self._sources:
            img = Image.open(image, mode='r')
            img_name = os.path.relpath(image, self._data_dir) if self._data_dir \
                else os.path.basename(image)
            name, extension = os.path.splitext(img_name)
            image_properties = {
                'name': name,
                'extension': extension,
                'width': img.width,
                'height': img.height,
            }
            if self._use_image_hash:
                image_properties['checksum'] = md5_hash(img)
            yield image_properties

    def create(self):
        for item in self:
            self._content.append(item)

    @property
    def content(self):
        return self._content

class _Manifest:
    FILE_NAME = 'manifest.jsonl'
    VERSION = '1.0'

    def __init__(self, path, is_created=False):
        assert path, 'A path to the manifest file was not found'
        self._path = os.path.join(path, self.FILE_NAME) if os.path.isdir(path) else path
        self._is_created = is_created

    @property
    def path(self):
        return self._path

    @property
    def is_created(self):
        return self._is_created

    @is_created.setter
    def is_created(self, value):
        assert isinstance(value, bool)
        self._is_created = value

# Needed for faster iteration over the manifest file; it is generated when working inside CVAT
# and is not generated when a manifest is created manually
class _Index:
    FILE_NAME = 'index.json'

    def __init__(self, path):
        assert path and os.path.isdir(path), 'No index directory path'
        self._path = os.path.join(path, self.FILE_NAME)
        self._index = {}

    @property
    def path(self):
        return self._path

    def dump(self):
        with open(self._path, 'w') as index_file:
            json.dump(self._index, index_file, separators=(',', ':'))

    def load(self):
        with open(self._path, 'r') as index_file:
            self._index = json.load(index_file,
                object_hook=lambda d: {int(k): v for k, v in d.items()})

    def create(self, manifest, skip):
        assert os.path.exists(manifest), 'The manifest file does not exist, the index cannot be created'
        with open(manifest, 'r+') as manifest_file:
            while skip:
                manifest_file.readline()
                skip -= 1
            image_number = 0
            position = manifest_file.tell()
            line = manifest_file.readline()
            while line:
                if line.strip():
                    self._index[image_number] = position
                    image_number += 1
                position = manifest_file.tell()
                line = manifest_file.readline()

    def partial_update(self, manifest, number):
        assert os.path.exists(manifest), 'The manifest file does not exist, the index cannot be updated'
        with open(manifest, 'r+') as manifest_file:
            manifest_file.seek(self._index[number])
            line = manifest_file.readline()
            while line:
                if line.strip():
                    self._index[number] = manifest_file.tell()
                    number += 1
                line = manifest_file.readline()

    def __getitem__(self, number):
        assert 0 <= number < len(self), \
            'An invalid index number: {}\nMax: {}'.format(number, len(self))
        return self._index[number]

    def __len__(self):
        return len(self._index)

class _ManifestManager(ABC):
    BASE_INFORMATION = {
        'version': 1,
        'type': 2,
    }
    def __init__(self, path, *args, **kwargs):
        self._manifest = _Manifest(path)

    def _parse_line(self, line):
        """ Getting a random line from the manifest file """
        with open(self._manifest.path, 'r') as manifest_file:
            if isinstance(line, str):
                assert line in self.BASE_INFORMATION.keys(), \
                    'An attempt to get non-existent information from the manifest'
                for _ in range(self.BASE_INFORMATION[line]):
                    fline = manifest_file.readline()
                return json.loads(fline)[line]
            else:
                assert self._index, 'No prepared index'
                offset = self._index[line]
                manifest_file.seek(offset)
                properties = manifest_file.readline()
                return json.loads(properties)

    def init_index(self):
        self._index = _Index(os.path.dirname(self._manifest.path))
        if os.path.exists(self._index.path):
            self._index.load()
        else:
            self._index.create(self._manifest.path, 3 if self._manifest.TYPE == 'video' else 2)
            self._index.dump()

    @abstractmethod
    def create(self, content, **kwargs):
        pass

    @abstractmethod
    def partial_update(self, number, properties):
        pass

    def __iter__(self):
        with open(self._manifest.path, 'r') as manifest_file:
            manifest_file.seek(self._index[0])
            image_number = 0
            line = manifest_file.readline()
            while line:
                if not line.strip():
                    continue
                yield (image_number, json.loads(line))
                image_number += 1
                line = manifest_file.readline()

    @property
    def manifest(self):
        return self._manifest

    def __len__(self):
        if hasattr(self, '_index'):
            return len(self._index)
        else:
            return None

    def __getitem__(self, item):
        return self._parse_line(item)

    @property
    def index(self):
        return self._index

class VideoManifestManager(_ManifestManager):
    def __init__(self, manifest_path, *args, **kwargs):
        super().__init__(manifest_path)
        setattr(self._manifest, 'TYPE', 'video')
        self.BASE_INFORMATION['properties'] = 3

    def create(self, content, **kwargs):
        """ Creating and saving a manifest file """
        with open(self._manifest.path, 'w') as manifest_file:
            base_info = {
                'version': self._manifest.VERSION,
                'type': self._manifest.TYPE,
                'properties': {
                    'name': os.path.basename(content.source_path),
                    'resolution': content.frame_sizes,
                    'length': content.get_size(),
                },
            }
            for key, value in base_info.items():
                json_item = json.dumps({key: value}, separators=(',', ':'))
                manifest_file.write(f'{json_item}\n')

            for item in content:
                json_item = json.dumps({
                    'number': item[0],
                    'pts': item[1],
                    'checksum': item[2]
                }, separators=(',', ':'))
                manifest_file.write(f"{json_item}\n")
        self._manifest.is_created = True

    def partial_update(self, number, properties):
        pass

    @staticmethod
    def prepare_meta(media_file, upload_dir=None, chunk_size=36, force=False):
        source_path = os.path.join(upload_dir, media_file) if upload_dir else media_file
        meta_info = VideoStreamReader(source_path=source_path)
        meta_info.check_type_first_frame()
        try:
            meta_info.validate_frames_ratio(chunk_size)
        except AssertionError:
            if not force:
                raise
        meta_info.check_video_timestamps_sequences()
        meta_info.save_key_frames()
        meta_info.validate_seek_key_frames()
        return meta_info

# TODO: add generic manifest structure file validation
class ManifestValidator:
    def validate_base_info(self):
        with open(self._manifest.path, 'r') as manifest_file:
            assert self._manifest.VERSION == json.loads(manifest_file.readline())['version']
            assert self._manifest.TYPE == json.loads(manifest_file.readline())['type']

class VideoManifestValidator(VideoManifestManager):
    def __init__(self, **kwargs):
        self.source_path = kwargs.pop('source_path')
        super().__init__(**kwargs)

    def validate_key_frame(self, container, video_stream, key_frame):
        for packet in container.demux(video_stream):
            for frame in packet.decode():
                assert frame.pts == key_frame['pts'], "The uploaded manifest does not match the video"
                return

    def validate_seek_key_frames(self):
        with closing(av.open(self.source_path, mode='r')) as container:
            video_stream = self._get_video_stream(container)
            last_key_frame = None

            for _, key_frame in self:
                # check that the key frame sequence is sorted
                if last_key_frame and last_key_frame['number'] >= key_frame['number']:
                    raise AssertionError('Invalid saved key frames sequence in manifest file')
                container.seek(offset=key_frame['pts'], stream=video_stream)
                self.validate_key_frame(container, video_stream, key_frame)
                last_key_frame = key_frame

    def validate_frame_numbers(self):
        with closing(av.open(self.source_path, mode='r')) as container:
            video_stream = self._get_video_stream(container)
            # not all videos contain information about numbers of frames
            frames = video_stream.frames
            if frames:
                assert frames == self['properties']['length'], "The uploaded manifest does not match the video"
                return

class ImageManifestManager(_ManifestManager):
    def __init__(self, manifest_path):
        super().__init__(manifest_path)
        setattr(self._manifest, 'TYPE', 'images')

    def create(self, content, **kwargs):
        """ Creating and saving a manifest file """
        with open(self._manifest.path, 'w') as manifest_file:
            base_info = {
                'version': self._manifest.VERSION,
                'type': self._manifest.TYPE,
            }
            for key, value in base_info.items():
                json_item = json.dumps({key: value}, separators=(',', ':'))
                manifest_file.write(f'{json_item}\n')

            for item in content:
                json_item = json.dumps({
                    key: value for key, value in item.items()
                }, separators=(',', ':'))
                manifest_file.write(f"{json_item}\n")
        self._manifest.is_created = True

    def partial_update(self, number, properties):
        pass

    @staticmethod
    def prepare_meta(sources, **kwargs):
        meta_info = DatasetImagesReader(sources=sources, **kwargs)
        meta_info.create()
        return meta_info
@@ -0,0 +1,91 @@
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
import argparse
import mimetypes
import os
import sys
from glob import glob

def _define_data_type(media):
    media_type, _ = mimetypes.guess_type(media)
    if media_type:
        return media_type.split('/')[0]

def _is_video(media_file):
    return _define_data_type(media_file) == 'video'

def _is_image(media_file):
    return _define_data_type(media_file) == 'image'

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--force', action='store_true',
        help='Use this flag to prepare the manifest file for video data '
            'if by default the video does not meet the requirements and a manifest file is not prepared')
    parser.add_argument('--output-dir', type=str, help='Directory where the manifest file will be saved',
        default=os.getcwd())
    parser.add_argument('source', type=str, help='Source paths')
    return parser.parse_args()

def main():
    args = get_args()

    manifest_directory = os.path.abspath(args.output_dir)
    os.makedirs(manifest_directory, exist_ok=True)
    source = os.path.abspath(args.source)

    sources = []
    if not os.path.isfile(source): # directory/pattern with images
        data_dir = None
        if os.path.isdir(source):
            data_dir = source
            for root, _, files in os.walk(source):
                sources.extend([os.path.join(root, f) for f in files if _is_image(f)])
        else:
            items = source.lstrip('/').split('/')
            position = 0
            try:
                for item in items:
                    if set(item) & {'*', '?', '[', ']'}:
                        break
                    position += 1
                else:
                    raise Exception('Wrong positional argument')
                assert position != 0, 'Wrong pattern: there must be a common root'
                data_dir = source.split(items[position])[0]
            except Exception as ex:
                sys.exit(str(ex))
            sources = list(filter(_is_image, glob(source, recursive=True)))
        try:
            assert len(sources), 'No images were found'
            manifest = ImageManifestManager(manifest_path=manifest_directory)
            meta_info = manifest.prepare_meta(sources=sources, is_sorted=False,
                use_image_hash=True, data_dir=data_dir)
            manifest.create(meta_info)
        except Exception as ex:
            sys.exit(str(ex))
    else: # video
        try:
            assert _is_video(source), 'You can specify a video path or a directory/pattern with images'
            manifest = VideoManifestManager(manifest_path=manifest_directory)
            try:
                meta_info = manifest.prepare_meta(media_file=source, force=args.force)
            except AssertionError as ex:
                if str(ex) == 'Too few keyframes':
                    msg = 'NOTE: prepared manifest file contains too few key frames for smooth decoding.\n' \
                        'Use --force flag if you still want to prepare a manifest file.'
                    print(msg)
                    sys.exit(2)
                else:
                    raise
            manifest.create(meta_info)
        except Exception as ex:
            sys.exit(str(ex))

    print('The manifest file has been prepared')

if __name__ == "__main__":
    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    sys.path.append(base_dir)
    from dataset_manifest.core import VideoManifestManager, ImageManifestManager
    main()
@@ -0,0 +1,3 @@
av==8.0.2 --no-binary=av
opencv-python-headless==4.4.0.42
Pillow==7.2.0
@@ -0,0 +1,24 @@
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
import hashlib
import cv2 as cv
from av import VideoFrame

def rotate_image(image, angle):
    height, width = image.shape[:2]
    image_center = (width/2, height/2)
    matrix = cv.getRotationMatrix2D(image_center, angle, 1.)
    abs_cos = abs(matrix[0, 0])
    abs_sin = abs(matrix[0, 1])
    bound_w = int(height * abs_sin + width * abs_cos)
    bound_h = int(height * abs_cos + width * abs_sin)
    matrix[0, 2] += bound_w/2 - image_center[0]
    matrix[1, 2] += bound_h/2 - image_center[1]
    matrix = cv.warpAffine(image, matrix, (bound_w, bound_h))
    return matrix

def md5_hash(frame):
    if isinstance(frame, VideoFrame):
        frame = frame.to_image()
    return hashlib.md5(frame.tobytes()).hexdigest() # nosec
@@ -1,30 +0,0 @@
# Simple command line for prepare meta information for video data

**Usage**

```bash
usage: prepare.py [-h] [-chunk_size CHUNK_SIZE] video_file meta_directory

positional arguments:
  video_file            Path to video file
  meta_directory        Directory where the file with meta information will be saved

optional arguments:
  -h, --help            show this help message and exit
  -chunk_size CHUNK_SIZE
                        Chunk size that will be specified when creating the task with specified video and generated meta information
```

**NOTE**: For smooth video decoding, the `chunk size` must be greater than or equal to the ratio of number of frames
to a number of key frames.
You can understand the approximate `chunk size` by preparing and looking at the file with meta information.

**NOTE**: If ratio of number of frames to number of key frames is small compared to the `chunk size`,
then when creating a task with prepared meta information, you should expect that the waiting time for some chunks
will be longer than the waiting time for other chunks. (At the first iteration, when there is no chunk in the cache)

**Examples**

```bash
python prepare.py ~/Documents/some_video.mp4 ~/Documents
```
@@ -1,37 +0,0 @@
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT
import argparse
import sys
import os

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('video_file',
                        type=str,
                        help='Path to video file')
    parser.add_argument('meta_directory',
                        type=str,
                        help='Directory where the file with meta information will be saved')
    parser.add_argument('-chunk_size',
                        type=int,
                        help='Chunk size that will be specified when creating the task with specified video and generated meta information')

    return parser.parse_args()

def main():
    args = get_args()
    try:
        smooth_decoding = prepare_meta_for_upload(prepare_meta, args.video_file, None, args.meta_directory, args.chunk_size)
        print('Meta information for video has been prepared')

        if smooth_decoding != None and not smooth_decoding:
            print('NOTE: prepared meta information contains too few key frames for smooth decoding.')
    except Exception:
        print('Impossible to prepare meta information')

if __name__ == "__main__":
    base_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    sys.path.append(base_dir)
    from cvat.apps.engine.prepare import prepare_meta, prepare_meta_for_upload
    main()