GPU-Support: Mask-RCNN + Minor GPU fixes (#2714)

* fixed cpu mask rcnn+preparation for gpu
* fix-limit gpu memory to 30% of total memory per worker

Co-authored-by: Nikita Manovich <nikita.manovich@intel.com>
main
Ali Jahani 5 years ago committed by GitHub
parent daedff4204
commit 59c3b28116

@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added ### Added
- CVAT-3D: support lidar data on the server side (<https://github.com/openvinotoolkit/cvat/pull/2534>) - CVAT-3D: support lidar data on the server side (<https://github.com/openvinotoolkit/cvat/pull/2534>)
- GPU support for Mask-RCNN and improvement in its deployment time (<https://github.com/openvinotoolkit/cvat/pull/2714>)
- CVAT-3D: Load all frames corresponding to the job instance - CVAT-3D: Load all frames corresponding to the job instance
(<https://github.com/openvinotoolkit/cvat/pull/2645>) (<https://github.com/openvinotoolkit/cvat/pull/2645>)
- Intelligent scissors with OpenCV javascript (<https://github.com/openvinotoolkit/cvat/pull/2689>) - Intelligent scissors with OpenCV javascript (<https://github.com/openvinotoolkit/cvat/pull/2689>)
@ -23,7 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Updated HTTPS install README section (cleanup and described more robust deploy) - Updated HTTPS install README section (cleanup and described more robust deploy)
- Logstash is improved for using with configurable elasticsearch outputs (<https://github.com/openvinotoolkit/cvat/pull/2531>) - Logstash is improved for using with configurable elasticsearch outputs (<https://github.com/openvinotoolkit/cvat/pull/2531>)
- Bumped nuclio version to 1.5.16 - Bumped nuclio version to 1.5.16 (<https://github.com/openvinotoolkit/cvat/pull/2578>)
- All methods for interactive segmentation accept negative points as well - All methods for interactive segmentation accept negative points as well
- Persistent queue added to logstash (<https://github.com/openvinotoolkit/cvat/pull/2744>) - Persistent queue added to logstash (<https://github.com/openvinotoolkit/cvat/pull/2744>)
@ -36,7 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- -
### Fixed ### Fixed
- More robust execution of nuclio GPU functions by limiting the GPU memory consumption per worker (<https://github.com/openvinotoolkit/cvat/pull/2714>)
- Kibana startup initialization (<https://github.com/openvinotoolkit/cvat/pull/2659>) - Kibana startup initialization (<https://github.com/openvinotoolkit/cvat/pull/2659>)
- The cursor jumps to the end of the line when renaming a task (<https://github.com/openvinotoolkit/cvat/pull/2669>) - The cursor jumps to the end of the line when renaming a task (<https://github.com/openvinotoolkit/cvat/pull/2669>)
- SSLCertVerificationError when remote source is used (<https://github.com/openvinotoolkit/cvat/pull/2683>) - SSLCertVerificationError when remote source is used (<https://github.com/openvinotoolkit/cvat/pull/2683>)

@ -122,10 +122,10 @@ You develop CVAT under WSL (Windows subsystem for Linux) following next steps.
### DL models as serverless functions ### DL models as serverless functions
Install [nuclio platform](https://github.com/nuclio/nuclio): Follow this [guide](/cvat/apps/documentation/installation_automatic_annotation.md) to install Nuclio:
- You have to install `nuctl` command line tool to build and deploy serverless - You have to install `nuctl` command line tool to build and deploy serverless
functions. Download [the latest release](https://github.com/nuclio/nuclio/blob/development/docs/reference/nuctl/nuctl.md#download). functions.
- The simplest way to explore Nuclio is to run its graphical user interface (GUI) - The simplest way to explore Nuclio is to run its graphical user interface (GUI)
of the Nuclio dashboard. All you need in order to run the dashboard is Docker. See of the Nuclio dashboard. All you need in order to run the dashboard is Docker. See
[nuclio documentation](https://github.com/nuclio/nuclio#quick-start-steps) [nuclio documentation](https://github.com/nuclio/nuclio#quick-start-steps)

@ -80,7 +80,7 @@ For more information about supported formats look at the
| [f-BRS](/serverless/pytorch/saic-vul/fbrs/nuclio) | interactor | PyTorch | X | | | [f-BRS](/serverless/pytorch/saic-vul/fbrs/nuclio) | interactor | PyTorch | X | |
| [Inside-Outside Guidance](/serverless/pytorch/shiyinzhang/iog/nuclio) | interactor | PyTorch | X | | | [Inside-Outside Guidance](/serverless/pytorch/shiyinzhang/iog/nuclio) | interactor | PyTorch | X | |
| [Faster RCNN](/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio) | detector | TensorFlow | X | X | | [Faster RCNN](/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio) | detector | TensorFlow | X | X |
| [Mask RCNN](/serverless/tensorflow/matterport/mask_rcnn/nuclio) | detector | TensorFlow | X | | | [Mask RCNN](/serverless/tensorflow/matterport/mask_rcnn/nuclio) | detector | TensorFlow | X | X |
<!--lint enable maximum-line-length--> <!--lint enable maximum-line-length-->

@ -290,7 +290,7 @@ docker-compose -f docker-compose.yml \
### Semi-automatic and automatic annotation ### Semi-automatic and automatic annotation
Please follow [instructions](/cvat/apps/documentation/installation_automatic_annotation.md) Please follow this [guide](/cvat/apps/documentation/installation_automatic_annotation.md).
### Stop all containers ### Stop all containers

@ -53,47 +53,80 @@
- See [deploy_cpu.sh](/serverless/deploy_cpu.sh) for more examples. - See [deploy_cpu.sh](/serverless/deploy_cpu.sh) for more examples.
#### GPU Support #### GPU Support
You will need to install [Nvidia Container Toolkit](https://www.tensorflow.org/install/docker#gpu_support).
You will need to install Nvidia Container Toolkit and make sure your docker supports GPU. Follow [Nvidia docker instructions](https://www.tensorflow.org/install/docker#gpu_support). Also you will need to add `--resource-limit nvidia.com/gpu=1 --triggers '{"myHttpTrigger": {"maxWorkers": 1}}'` to
Also you will need to add `--resource-limit nvidia.com/gpu=1` to the nuclio deployment command. the nuclio deployment command. You can increase `maxWorkers` if you have enough GPU memory.
As an example, below will run on the GPU: As an example, below will run on the GPU:
```bash ```bash
nuctl deploy tf-faster-rcnn-inception-v2-coco-gpu \ nuctl deploy --project-name cvat \
--project-name cvat --path "serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio" --platform local \ --path `pwd`/tensorflow/matterport/mask_rcnn/nuclio \
--base-image tensorflow/tensorflow:2.1.1-gpu \ --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
--desc "Faster RCNN from Tensorflow Object Detection GPU API" \ --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
--image cvat/tf.faster_rcnn_inception_v2_coco_gpu \ --image cvat/tf.matterport.mask_rcnn_gpu
--triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
--resource-limit nvidia.com/gpu=1 --resource-limit nvidia.com/gpu=1
``` ```
**Note:** **Note:**
- The number of GPU deployed functions will be limited to your GPU memory.
- Since the model is loaded during deployment, the number of GPU functions you can deploy will be limited to your GPU memory.
- See [deploy_gpu.sh](/serverless/deploy_gpu.sh) script for more examples. - See [deploy_gpu.sh](/serverless/deploy_gpu.sh) script for more examples.
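
The memory note above can be turned into a rough budget. The following is a minimal sketch, assuming every worker process is capped at the same fixed fraction of GPU memory (this change hard-codes `0.333`) and that nothing else is using the GPU. The helper name `max_gpu_workers` and the `reserved_fraction` parameter are hypothetical and are not part of CVAT or nuclio.

```python
import math


def max_gpu_workers(fraction_per_worker: float, reserved_fraction: float = 0.0) -> int:
    """Upper bound on the number of workers that can share one GPU.

    fraction_per_worker: share of GPU memory each worker may claim (0 < f <= 1).
    reserved_fraction: share kept free for other processes (hypothetical knob).
    """
    if not 0.0 < fraction_per_worker <= 1.0:
        raise ValueError("fraction_per_worker must be in (0, 1]")
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    available = 1.0 - reserved_fraction
    # Small epsilon guards against floating-point results such as 2.9999999.
    return math.floor(available / fraction_per_worker + 1e-9)


if __name__ == "__main__":
    # With the 0.333 cap, at most three workers fit on one GPU in total.
    print(max_gpu_workers(0.333))
```

The bound counts workers across all deployed GPU functions, so raising `maxWorkers` for one function leaves less room for the others. Real usage also includes per-process driver overhead, so treat the result as an upper limit rather than a guarantee.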
####Debugging Nuclio Functions: **Troubleshooting Nuclio Functions:**
- You can open the nuclio dashboard at [localhost:8070](http://localhost:8070). Make sure your functions are up and running without any errors.
- Test your deployed DL model as a serverless function. The command below should work on Linux and Mac OS.
```bash
image=$(curl https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png --output - | base64 | tr -d '\n')
cat << EOF > /tmp/input.json
{"image": "$image"}
EOF
cat /tmp/input.json | nuctl invoke openvino.omz.public.yolo-v3-tf -c 'application/json'
```
- To check for internal server errors, run `docker ps -a` to see the list of containers. Find the container that you are interested, e.g. `nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu`. Then check its logs by <details>
```bash ```bash
docker logs <name of your container> 20.07.17 12:07:44.519 nuctl.platform.invoker (I) Executing function {"method": "POST", "url": "http://:57308", "headers": {"Content-Type":["application/json"],"X-Nuclio-Log-Level":["info"],"X-Nuclio-Target":["openvino.omz.public.yolo-v3-tf"]}}
20.07.17 12:07:45.275 nuctl.platform.invoker (I) Got response {"status": "200 OK"}
20.07.17 12:07:45.275 nuctl (I) >>> Start of function logs
20.07.17 12:07:45.275 ino.omz.public.yolo-v3-tf (I) Run yolo-v3-tf model {"worker_id": "0", "time": 1594976864570.9353}
20.07.17 12:07:45.275 nuctl (I) <<< End of function logs
> Response headers:
Date = Fri, 17 Jul 2020 09:07:45 GMT
Content-Type = application/json
Content-Length = 100
Server = nuclio
> Response body:
[
{
"confidence": "0.9992254",
"label": "person",
"points": [
39,
124,
408,
512
],
"type": "rectangle"
}
]
``` ```
</details>
- To check for internal server errors, run `docker ps -a` to see the list of containers.
Find the container that you are interested in, e.g., `nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu`.
Then check its logs by `docker logs <name of your container>`
e.g., e.g.,
```bash ```bash
docker logs nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu docker logs nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu
``` ```
- If you would like to debug a code inside a container, you can use vscode to directly attach to a container [instructions](https://code.visualstudio.com/docs/remote/attach-container). To apply your changes, make sure to restart the container. - To debug a code inside a container, you can use vscode to attach to a container [instructions](https://code.visualstudio.com/docs/remote/attach-container).
To apply your changes, make sure to restart the container.
```bash ```bash
docker restart <name_of_the_container> docker restart <name_of_the_container>
``` ```
> **⚠ WARNING:**
> Do not use nuclio dashboard to stop the container because with any modifications, it rebuilds the container and you will lose your changes.
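
The test invocation in the troubleshooting list sends a JSON body with a base64-encoded image and receives a JSON list of detections. The same payload can be built and the response read from Python. This is a minimal sketch based only on the request and response shapes shown in that example; the function names `build_payload` and `parse_detections` and the `min_confidence` parameter are hypothetical.

```python
import base64
import json


def build_payload(image_bytes: bytes) -> str:
    """Return the JSON request body: {"image": "<base64 of the raw file bytes>"}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


def parse_detections(body: str, min_confidence: float = 0.5):
    """Parse a response body and keep detections at or above min_confidence.

    The sample response reports confidence as a string, so it is converted
    to float before comparing.
    """
    kept = []
    for item in json.loads(body):
        confidence = float(item["confidence"])
        if confidence >= min_confidence:
            kept.append((item["label"], confidence, item["type"], item["points"]))
    return kept


if __name__ == "__main__":
    sample = (
        '[{"confidence": "0.9992254", "label": "person", '
        '"points": [39, 124, 408, 512], "type": "rectangle"}]'
    )
    for label, confidence, shape, points in parse_detections(sample):
        print(label, confidence, shape, points)
```

The string returned by `build_payload` can be written to a file and piped to `nuctl invoke` exactly as `/tmp/input.json` is in the shell example.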

@ -8,8 +8,18 @@ nuctl create project cvat
nuctl deploy --project-name cvat \ nuctl deploy --project-name cvat \
--path "$SCRIPT_DIR/tensorflow/faster_rcnn_inception_v2_coco/nuclio" \ --path "$SCRIPT_DIR/tensorflow/faster_rcnn_inception_v2_coco/nuclio" \
--platform local --base-image tensorflow/tensorflow:2.1.1-gpu \ --platform local --base-image tensorflow/tensorflow:2.1.1-gpu \
--desc "Faster RCNN from Tensorflow Object Detection GPU API" \ --desc "GPU based Faster RCNN from Tensorflow Object Detection API" \
--image cvat/tf.faster_rcnn_inception_v2_coco_gpu \ --image cvat/tf.faster_rcnn_inception_v2_coco_gpu \
--resource-limit nvidia.com/gpu=1 --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
--resource-limit nvidia.com/gpu=1 --verbose
nuctl deploy --project-name cvat \
--path "$SCRIPT_DIR/tensorflow/matterport/mask_rcnn/nuclio" \
--platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
--desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
--image cvat/tf.matterport.mask_rcnn_gpu \
--triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
--resource-limit nvidia.com/gpu=1 --verbose
nuctl get function nuctl get function
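
Both deploy commands above pass the same set of flags and differ only in path, base image, description and image name. The sketch below shows one way to assemble that argument list in Python so the `--triggers` JSON is generated rather than hand-quoted. It uses only flags that appear in the script above; the function `build_deploy_command` and its parameters are hypothetical, and the sketch only builds the command without running it.

```python
import json
import shlex


def build_deploy_command(path, base_image, image, description,
                         max_workers=1, gpus=1, project="cvat"):
    """Build the argv list for a GPU `nuctl deploy` call like the ones above."""
    # The trigger name myHttpTrigger matches the one used in the script.
    triggers = {"myHttpTrigger": {"maxWorkers": max_workers}}
    return [
        "nuctl", "deploy",
        "--project-name", project,
        "--path", path,
        "--platform", "local",
        "--base-image", base_image,
        "--desc", description,
        "--image", image,
        "--triggers", json.dumps(triggers),
        "--resource-limit", f"nvidia.com/gpu={gpus}",
        "--verbose",
    ]


if __name__ == "__main__":
    command = build_deploy_command(
        path="tensorflow/matterport/mask_rcnn/nuclio",
        base_image="tensorflow/tensorflow:1.15.5-gpu-py3",
        image="cvat/tf.matterport.mask_rcnn_gpu",
        description="GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow.",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, which also avoids the shell quoting and line-continuation issues of a hand-written script.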

@ -15,9 +15,10 @@ class ModelLoader:
serialized_graph = fid.read() serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph) od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='') tf.import_graph_def(od_graph_def, name='')
gpu_fraction = 0.333
config = tf.ConfigProto() gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction,
config.gpu_options.allow_growth = True allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)
self.session = tf.Session(graph=detection_graph, config=config) self.session = tf.Session(graph=detection_graph, config=config)
self.image_tensor = detection_graph.get_tensor_by_name('image_tensor:0') self.image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

@ -102,22 +102,19 @@ spec:
value: /opt/nuclio/Mask_RCNN value: /opt/nuclio/Mask_RCNN
build: build:
image: cvat/tf.matterport.mask_rcnn image: cvat/tf.matterport.mask_rcnn
baseImage: tensorflow/tensorflow:2.1.0-py3 baseImage: tensorflow/tensorflow:1.13.1-py3
directives: directives:
postCopy: postCopy:
- kind: WORKDIR - kind: WORKDIR
value: /opt/nuclio value: /opt/nuclio
- kind: RUN - kind: RUN
value: apt update && apt install --no-install-recommends -y git curl libsm6 libxext6 libgl1-mesa-glx value: apt update && apt install --no-install-recommends -y git curl
- kind: RUN - kind: RUN
value: git clone https://github.com/matterport/Mask_RCNN.git value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
- kind: RUN - kind: RUN
value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5 value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
- kind: RUN - kind: RUN
value: pip3 install scipy cython matplotlib scikit-image opencv-python-headless h5py \ value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image Pillow
imgaug IPython[all] tensorflow==1.13.1 keras==2.1.0 pillow pyyaml
- kind: RUN
value: pip3 install pycocotools
triggers: triggers:
myHttpTrigger: myHttpTrigger:

@ -1,4 +1,4 @@
# Copyright (C) 2018-2020 Intel Corporation # Copyright (C) 2020-2021 Intel Corporation
# #
# SPDX-License-Identifier: MIT # SPDX-License-Identifier: MIT
@ -6,24 +6,13 @@ import os
import numpy as np import numpy as np
import sys import sys
from skimage.measure import find_contours, approximate_polygon from skimage.measure import find_contours, approximate_polygon
# workaround for tf.placeholder() is not compatible with eager execution
# https://github.com/tensorflow/tensorflow/issues/18165
import tensorflow as tf import tensorflow as tf
tf.compat.v1.disable_eager_execution()
#import tensorflow.compat.v1 as tf
# tf.disable_v2_behavior()
# The directory should contain a clone of
# https://github.com/matterport/Mask_RCNN repository and
# downloaded mask_rcnn_coco.h5 model.
MASK_RCNN_DIR = os.path.abspath(os.environ.get('MASK_RCNN_DIR')) MASK_RCNN_DIR = os.path.abspath(os.environ.get('MASK_RCNN_DIR'))
if MASK_RCNN_DIR: if MASK_RCNN_DIR:
sys.path.append(MASK_RCNN_DIR) # To find local version of the library sys.path.append(MASK_RCNN_DIR) # To find local version of the library
sys.path.append(os.path.join(MASK_RCNN_DIR, 'samples/coco'))
from mrcnn import model as modellib from mrcnn import model as modellib
import coco from mrcnn.config import Config
class ModelLoader: class ModelLoader:
def __init__(self, labels): def __init__(self, labels):
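
Note on the `MASK_RCNN_DIR` lookup above: `os.path.abspath` raises `TypeError` when given `None`, so the `if MASK_RCNN_DIR:` guard cannot catch a missing variable, and once the call succeeds the result is never empty. A minimal sketch of a lookup that checks the variable first follows; the helper `add_library_dir` and its `env` and `path_list` parameters are hypothetical and exist only to make the behaviour testable.

```python
import os
import sys


def add_library_dir(env_name="MASK_RCNN_DIR", env=None, path_list=None):
    """Append the directory named by env_name to the import path, if it is set.

    Returns the absolute path that was added, or None when the variable is
    missing or empty.
    """
    env = os.environ if env is None else env
    path_list = sys.path if path_list is None else path_list
    raw = env.get(env_name)
    if not raw:
        # Check before calling abspath, which does not accept None.
        return None
    resolved = os.path.abspath(raw)
    if resolved not in path_list:
        path_list.append(resolved)  # To find the local version of the library
    return resolved
```

Calling `add_library_dir()` with no arguments reads `os.environ` and extends `sys.path`, which mirrors what the module-level code above intends.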
@ -31,12 +20,21 @@ class ModelLoader:
if COCO_MODEL_PATH is None: if COCO_MODEL_PATH is None:
raise OSError('Model path env not found in the system.') raise OSError('Model path env not found in the system.')
class InferenceConfig(coco.CocoConfig): class InferenceConfig(Config):
# Set batch size to 1 since we'll be running inference on NAME = "coco"
# one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU NUM_CLASSES = 1 + 80 # COCO has 80 classes
GPU_COUNT = 1 GPU_COUNT = 1
IMAGES_PER_GPU = 1 IMAGES_PER_GPU = 1
# Limit GPU memory to about a third (gpu_fraction=0.333) to leave room for other nuclio GPU functions. Increase the fraction as needed
import keras.backend.tensorflow_backend as ktf
def get_session(gpu_fraction=0.333):
gpu_options = tf.GPUOptions(
per_process_gpu_memory_fraction=gpu_fraction,
allow_growth=True)
return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
ktf.set_session(get_session())
# Print config details # Print config details
self.config = InferenceConfig() self.config = InferenceConfig()
self.config.display() self.config.display()
@ -54,7 +52,7 @@ class ModelLoader:
for i in range(len(output["rois"])): for i in range(len(output["rois"])):
score = output["scores"][i] score = output["scores"][i]
class_id = output["class_ids"][i] class_id = output["class_ids"][i]
mask = output["masks"][:,:,i] mask = output["masks"][:, :, i]
if score >= threshold: if score >= threshold:
mask = mask.astype(np.uint8) mask = mask.astype(np.uint8)
contours = find_contours(mask, MASK_THRESHOLD) contours = find_contours(mask, MASK_THRESHOLD)
@ -74,6 +72,4 @@ class ModelLoader:
"type": "polygon", "type": "polygon",
}) })
return results return results
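
The GPU memory fraction is hard-coded as the default of `get_session` in this change. One possible way to make it adjustable per deployment is to read it from an environment variable and validate it before building the TensorFlow session. This is a sketch under assumptions: the variable name `GPU_MEMORY_FRACTION` and the helper `read_gpu_fraction` are hypothetical and are not defined by this commit.

```python
import os

DEFAULT_GPU_FRACTION = 0.333  # same value as the hard-coded default above


def read_gpu_fraction(env=None, name="GPU_MEMORY_FRACTION",
                      default=DEFAULT_GPU_FRACTION):
    """Return the GPU memory fraction from the environment, or the default.

    Raises ValueError if the value is not a number in the range (0, 1].
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a number, got {raw!r}") from None
    # The comparison also rejects NaN, which fails every ordering test.
    if not 0.0 < value <= 1.0:
        raise ValueError(f"{name} must be in (0, 1], got {value}")
    return value
```

The result could then be passed as `get_session(gpu_fraction=read_gpu_fraction())`, leaving the rest of the loader unchanged.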