GPU-Support: Mask-RCNN + Minor GPU fixes (#2714)

* fixed cpu mask rcnn+preparation for gpu
* fix-limit gpu memory to 30% of total memory per worker

Co-authored-by: Nikita Manovich <nikita.manovich@intel.com>
5 years ago · 59c3b28116
parent daedff4204
commit 59c3b28116
9 changed files with 97 additions and 59 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - CVAT-3D: support lidar data on the server side (<https://github.com/openvinotoolkit/cvat/pull/2534>)
 - GPU support for Mask-RCNN and improvement in its deployment time (<https://github.com/openvinotoolkit/cvat/pull/2714>)
 - CVAT-3D: Load all frames corresponding to the job instance
  (<https://github.com/openvinotoolkit/cvat/pull/2645>)
 - Intelligent scissors with OpenCV javascript (<https://github.com/openvinotoolkit/cvat/pull/2689>)
@ -23,7 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Updated HTTPS install README section (cleanup and described more robust deploy)
 - Logstash is improved for using with configurable elasticsearch outputs (<https://github.com/openvinotoolkit/cvat/pull/2531>)
- Bumped nuclio version to 1.5.16
+- Bumped nuclio version to 1.5.16 (<https://github.com/openvinotoolkit/cvat/pull/2578>)
 - All methods for interactive segmentation accept negative points as well
 - Persistent queue added to logstash (<https://github.com/openvinotoolkit/cvat/pull/2744>)
@ -36,7 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 -
 ### Fixed
-
+- More robust execution of nuclio GPU functions by limiting the GPU memory consumption per worker (<https://github.com/openvinotoolkit/cvat/pull/2714>)
 - Kibana startup initialization (<https://github.com/openvinotoolkit/cvat/pull/2659>)
 - The cursor jumps to the end of the line when renaming a task (<https://github.com/openvinotoolkit/cvat/pull/2669>)
 - SSLCertVerificationError when remote source is used (<https://github.com/openvinotoolkit/cvat/pull/2683>)
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@ -122,10 +122,10 @@ You develop CVAT under WSL (Windows subsystem for Linux) following next steps.
 ### DL models as serverless functions
-Install [nuclio platform](https://github.com/nuclio/nuclio):
+Follow this [guide](/cvat/apps/documentation/installation_automatic_annotation.md) to install Nuclio:
 - You have to install `nuctl` command line tool to build and deploy serverless
-  functions. Download [the latest release](https://github.com/nuclio/nuclio/blob/development/docs/reference/nuctl/nuctl.md#download).
+  functions.
 - The simplest way to explore Nuclio is to run its graphical user interface (GUI)
  of the Nuclio dashboard. All you need in order to run the dashboard is Docker. See
  [nuclio documentation](https://github.com/nuclio/nuclio#quick-start-steps)
--- a/README.md
+++ b/README.md
@ -80,7 +80,7 @@ For more information about supported formats look at the
 | [f-BRS](/serverless/pytorch/saic-vul/fbrs/nuclio)                                                       | interactor | PyTorch    | X   |     |
 | [Inside-Outside Guidance](/serverless/pytorch/shiyinzhang/iog/nuclio)                                   | interactor | PyTorch    | X   |     |
 | [Faster RCNN](/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio)                              | detector   | TensorFlow | X   | X   |
-| [Mask RCNN](/serverless/tensorflow/matterport/mask_rcnn/nuclio)                                         | detector   | TensorFlow | X   |     |
+| [Mask RCNN](/serverless/tensorflow/matterport/mask_rcnn/nuclio)                                         | detector   | TensorFlow | X   | X   |
 <!--lint enable maximum-line-length-->
--- a/cvat/apps/documentation/installation.md
+++ b/cvat/apps/documentation/installation.md
@ -290,7 +290,7 @@ docker-compose -f docker-compose.yml \
 ### Semi-automatic and automatic annotation
-Please follow [instructions](/cvat/apps/documentation/installation_automatic_annotation.md)
+Please follow this [guide](/cvat/apps/documentation/installation_automatic_annotation.md).
 ### Stop all containers
--- a/cvat/apps/documentation/installation_automatic_annotation.md
+++ b/cvat/apps/documentation/installation_automatic_annotation.md
@ -53,47 +53,80 @@
  - See [deploy_cpu.sh](/serverless/deploy_cpu.sh) for more examples.
  #### GPU Support
-
+  You will need to install [Nvidia Container Toolkit](https://www.tensorflow.org/install/docker#gpu_support).
-  You will need to install Nvidia Container Toolkit and make sure your docker supports GPU. Follow [Nvidia docker instructions](https://www.tensorflow.org/install/docker#gpu_support).
+  Also you will need to add `--resource-limit nvidia.com/gpu=1 --triggers '{"myHttpTrigger": {"maxWorkers": 1}}'` to
-  Also you will need to add `--resource-limit nvidia.com/gpu=1` to the nuclio deployment command.
+  the nuclio deployment command. You can increase `maxWorkers` if you have enough GPU memory.
  As an example, below will run on the GPU:
  ```bash
-  nuctl deploy tf-faster-rcnn-inception-v2-coco-gpu \
+  nuctl deploy --project-name cvat \
-    --project-name cvat --path "serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio" --platform local \
+    --path `pwd`/tensorflow/matterport/mask_rcnn/nuclio \
-    --base-image tensorflow/tensorflow:2.1.1-gpu \
+    --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
-    --desc "Faster RCNN from Tensorflow Object Detection GPU API" \
+    --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
-    --image cvat/tf.faster_rcnn_inception_v2_coco_gpu \
+    --image cvat/tf.matterport.mask_rcnn_gpu \
    --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
    --resource-limit nvidia.com/gpu=1
  ```
  **Note:**
-
+  - The number of functions you can deploy on the GPU is limited by your GPU memory.
  - Since the model is loaded during deployment, the number of GPU functions you can deploy will be limited to your GPU memory.
  - See [deploy_gpu.sh](/serverless/deploy_gpu.sh) script for more examples.
-####Debugging Nuclio Functions:
+**Troubleshooting Nuclio Functions:**
 - You can open the nuclio dashboard at [localhost:8070](http://localhost:8070). Make sure your functions are up and running without any errors.
 - Test your deployed DL model as a serverless function. The command below should work on Linux and Mac OS.
  ```bash
  image=$(curl https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png --output - | base64 | tr -d '\n')
  cat << EOF > /tmp/input.json
  {"image": "$image"}
  EOF
  cat /tmp/input.json | nuctl invoke openvino.omz.public.yolo-v3-tf -c 'application/json'
  ```
- To check for internal server errors, run `docker ps -a` to see the list of containers. Find the container that you are interested, e.g. `nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu`. Then check its logs by
+  <details>
  ```bash
-  docker logs <name of your container>
+  20.07.17 12:07:44.519    nuctl.platform.invoker (I) Executing function {"method": "POST", "url": "http://:57308", "headers": {"Content-Type":["application/json"],"X-Nuclio-Log-Level":["info"],"X-Nuclio-Target":["openvino.omz.public.yolo-v3-tf"]}}
  20.07.17 12:07:45.275    nuctl.platform.invoker (I) Got response {"status": "200 OK"}
  20.07.17 12:07:45.275                     nuctl (I) >>> Start of function logs
  20.07.17 12:07:45.275 ino.omz.public.yolo-v3-tf (I) Run yolo-v3-tf model {"worker_id": "0", "time": 1594976864570.9353}
  20.07.17 12:07:45.275                     nuctl (I) <<< End of function logs
  > Response headers:
  Date = Fri, 17 Jul 2020 09:07:45 GMT
  Content-Type = application/json
  Content-Length = 100
  Server = nuclio
  > Response body:
  [
      {
          "confidence": "0.9992254",
          "label": "person",
          "points": [
              39,
              124,
              408,
              512
          ],
          "type": "rectangle"
      }
  ]
  ```
  </details>
 - To check for internal server errors, run `docker ps -a` to see the list of containers.
  Find the container that you are interested in, e.g., `nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu`.
  Then check its logs with `docker logs <name of your container>`,
  e.g.,
  ```bash
  docker logs nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu
  ```
- If you would like to debug a code inside a container, you can use vscode to directly attach to a container [instructions](https://code.visualstudio.com/docs/remote/attach-container). To apply your changes, make sure to restart the container.
+- To debug code inside a container, you can use VS Code to attach to the container (see [instructions](https://code.visualstudio.com/docs/remote/attach-container)).
-
+  To apply your changes, make sure to restart the container.
  ```bash
  docker restart <name_of_the_container>
  ```
  > **⚠ WARNING:**
  > Do not use the nuclio dashboard to stop the container: on any modification it rebuilds the container and you will lose your changes.
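The shell pipeline in the troubleshooting docs above builds a JSON body of the form `{"image": "<base64>"}`. A minimal Python sketch of the same step follows. The helper name `build_payload` is hypothetical and not part of CVAT or nuclio; only the payload shape is taken from the docs.

```python
import base64
import json


def build_payload(image_bytes: bytes) -> str:
    """Return a JSON request body for raw image bytes.

    Hypothetical helper: only the {"image": <base64>} shape comes from the docs.
    """
    # b64encode emits no line breaks, so no `tr -d '\n'` step is needed.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


if __name__ == "__main__":
    # Placeholder bytes; in practice these would be read from an image file.
    print(build_payload(b"abc"))
```

The resulting string can be written to a file such as `/tmp/input.json` and piped to `nuctl invoke` exactly as in the shell example.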
--- a/serverless/deploy_gpu.sh
+++ b/serverless/deploy_gpu.sh
@ -8,8 +8,18 @@ nuctl create project cvat
 nuctl deploy --project-name cvat \
    --path "$SCRIPT_DIR/tensorflow/faster_rcnn_inception_v2_coco/nuclio" \
    --platform local --base-image tensorflow/tensorflow:2.1.1-gpu \
-    --desc "Faster RCNN from Tensorflow Object Detection GPU API" \
+    --desc "GPU based Faster RCNN from Tensorflow Object Detection API" \
    --image cvat/tf.faster_rcnn_inception_v2_coco_gpu \
-    --resource-limit nvidia.com/gpu=1
+    --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
    --resource-limit nvidia.com/gpu=1 --verbose
 nuctl deploy --project-name cvat \
    --path "$SCRIPT_DIR/tensorflow/matterport/mask_rcnn/nuclio" \
    --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
    --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
    --image cvat/tf.matterport.mask_rcnn_gpu \
    --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
    --resource-limit nvidia.com/gpu=1 --verbose
 nuctl get function
--- a/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio/model_loader.py
+++ b/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio/model_loader.py
@ -15,9 +15,10 @@ class ModelLoader:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')
-
+            gpu_fraction = 0.333
-            config = tf.ConfigProto()
+            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction,
-            config.gpu_options.allow_growth = True
+                                        allow_growth=True)
            config = tf.ConfigProto(gpu_options=gpu_options)
            self.session = tf.Session(graph=detection_graph, config=config)
            self.image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
--- a/serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml
+++ b/serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml
@ -102,22 +102,19 @@ spec:
      value: /opt/nuclio/Mask_RCNN
  build:
    image: cvat/tf.matterport.mask_rcnn
-    baseImage: tensorflow/tensorflow:2.1.0-py3
+    baseImage: tensorflow/tensorflow:1.13.1-py3
    directives:
      postCopy:
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
-          value: apt update && apt install --no-install-recommends -y git curl libsm6 libxext6 libgl1-mesa-glx
+          value: apt update && apt install --no-install-recommends -y git curl
        - kind: RUN
-          value: git clone https://github.com/matterport/Mask_RCNN.git
+          value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
        - kind: RUN
          value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
        - kind: RUN
-          value: pip3 install scipy cython matplotlib scikit-image opencv-python-headless h5py \
+          value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image Pillow
            imgaug IPython[all] tensorflow==1.13.1 keras==2.1.0 pillow pyyaml
        - kind: RUN
          value: pip3 install pycocotools
  triggers:
    myHttpTrigger:
--- a/serverless/tensorflow/matterport/mask_rcnn/nuclio/model_loader.py
+++ b/serverless/tensorflow/matterport/mask_rcnn/nuclio/model_loader.py
@ -1,4 +1,4 @@
-# Copyright (C) 2018-2020 Intel Corporation
+# Copyright (C) 2020-2021 Intel Corporation
 #
 # SPDX-License-Identifier: MIT
@ -6,24 +6,13 @@ import os
 import numpy as np
 import sys
 from skimage.measure import find_contours, approximate_polygon
 # workaround for tf.placeholder() is not compatible with eager execution
 # https://github.com/tensorflow/tensorflow/issues/18165
 import tensorflow as tf
 tf.compat.v1.disable_eager_execution()
 #import tensorflow.compat.v1 as tf
 #   tf.disable_v2_behavior()
 # The directory should contain a clone of
 # https://github.com/matterport/Mask_RCNN repository and
 # downloaded mask_rcnn_coco.h5 model.
 MASK_RCNN_DIR = os.path.abspath(os.environ.get('MASK_RCNN_DIR'))
 if MASK_RCNN_DIR:
    sys.path.append(MASK_RCNN_DIR)  # To find local version of the library
    sys.path.append(os.path.join(MASK_RCNN_DIR, 'samples/coco'))
 from mrcnn import model as modellib
-import coco
+from mrcnn.config import Config
 class ModelLoader:
    def __init__(self, labels):
@ -31,12 +20,21 @@ class ModelLoader:
        if COCO_MODEL_PATH is None:
            raise OSError('Model path env not found in the system.')
-        class InferenceConfig(coco.CocoConfig):
+        class InferenceConfig(Config):
-            # Set batch size to 1 since we'll be running inference on
+            NAME = "coco"
-            # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
+            NUM_CLASSES = 1 + 80  # COCO has 80 classes
            GPU_COUNT = 1
            IMAGES_PER_GPU = 1
        # Limit GPU memory to about one third (0.333) to leave room for other nuclio GPU functions. Increase the fraction as needed.
        import keras.backend.tensorflow_backend as ktf
        def get_session(gpu_fraction=0.333):
            gpu_options = tf.GPUOptions(
                per_process_gpu_memory_fraction=gpu_fraction,
                allow_growth=True)
            return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
        ktf.set_session(get_session())
        # Print config details
        self.config = InferenceConfig()
        self.config.display()
@ -54,7 +52,7 @@ class ModelLoader:
        for i in range(len(output["rois"])):
            score = output["scores"][i]
            class_id = output["class_ids"][i]
-            mask = output["masks"][:,:,i]
+            mask = output["masks"][:, :, i]
            if score >= threshold:
                mask = mask.astype(np.uint8)
                contours = find_contours(mask, MASK_THRESHOLD)
@ -74,6 +72,4 @@ class ModelLoader:
                    "type": "polygon",
                })
-        return results
+        return results
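Both model loaders in this change cap each process at `gpu_fraction = 0.333` of GPU memory. As a rough worked example, the sketch below computes how many such processes can share one GPU. It assumes nothing else is using the GPU and that each model actually fits in its share. The function name `max_processes_per_gpu` is hypothetical and does not appear in the commit.

```python
import math


def max_processes_per_gpu(gpu_fraction: float) -> int:
    """Upper bound on processes that can each reserve `gpu_fraction` of one GPU."""
    if not 0.0 < gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in (0, 1]")
    # A small epsilon guards against float rounding just below a whole number.
    return math.floor(1.0 / gpu_fraction + 1e-9)


if __name__ == "__main__":
    # The fraction used in this commit; 1 / 0.333 is about 3.003.
    print(max_processes_per_gpu(0.333))
```

This is only an upper bound: a model whose weights and activations need more than its share will still fail to load, so `maxWorkers` should be raised with care.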