Training Mask R-CNN on the AIcrowd Food Recognition Challenge Dataset

Preparing the data
Download the dataset from https://www.aicrowd.com/challenges/food-recognition-challenge/dataset_files (if you do not have an account yet, you must register and agree to join the competition before you can download it; alternatively, you can create a Kaggle account and download it from Kaggle), then arrange it in the following directory structure inside the assets folder:
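As a rough sanity check of the prepared folder, here is a minimal sketch. It assumes each subset directory holds an annotations.json file and an images directory, which is the layout the loader reports for test_images later in this post. The helper name check_subset is hypothetical and not part of Mask_RCNN or the dataset tooling.

```python
import os


def check_subset(dataset_dir, subset):
    """Return the list of expected paths that are missing for one subset.

    Assumed layout (hypothetical helper, not part of Mask_RCNN):
        <dataset_dir>/<subset>/annotations.json
        <dataset_dir>/<subset>/images/
    """
    subset_dir = os.path.join(dataset_dir, subset)
    annotations = os.path.join(subset_dir, "annotations.json")
    images = os.path.join(subset_dir, "images")

    missing = []
    if not os.path.isfile(annotations):
        missing.append(annotations)
    if not os.path.isdir(images):
        missing.append(images)
    return missing


if __name__ == "__main__":
    # "test_images" is the subset name used later in this post.
    for path in check_subset("./assets/food", "test_images"):
        print("missing:", path)
```

An empty result means both expected entries exist for that subset; it does not validate the contents of the annotation file.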

Setting up the proxy
Add the proxy information to /etc/systemd/system/docker.service.d/http-proxy.conf as follows:
[Service]
Environment="HTTP_PROXY=http://10.144.3.26:8080/" "HTTPS_PROXY=http://10.144.3.26:8080/"
Add the proxy information to /etc/environment as follows:
http_proxy="http://10.144.3.26:8080/"
https_proxy="http://10.144.3.26:8080/"
HTTP_PROXY="http://10.144.3.26:8080/"
HTTPS_PROXY="http://10.144.3.26:8080/"
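To confirm that the proxy variables are actually visible to a process (for example inside the container), a small sketch like the one below may help. The helper names proxy_report and proxies_consistent are hypothetical, and the consistency check assumes, as in this post, that HTTP and HTTPS traffic go through the same proxy.

```python
import os

PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def proxy_report(env=None):
    """Map each proxy variable name to its value, or None if it is unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in PROXY_VARS}


def proxies_consistent(report):
    """True when every variable that is set points at the same proxy URL.

    A trailing slash is ignored, so "http://host:8080" and
    "http://host:8080/" count as the same proxy.
    """
    values = {value.rstrip("/") for value in report.values() if value}
    return len(values) <= 1


if __name__ == "__main__":
    report = proxy_report()
    for name, value in report.items():
        print(name, "=", value)
    print("consistent:", proxies_consistent(report))
```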

Writing docker-compose.yml

version: "3.3"
services:
  engine:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - http_proxy=${http_proxy}
        - https_proxy=${https_proxy}
    image: maskrcnn/tf:1.15
    ports:
      - "8889:8888"
    volumes:
      - ./Mask_RCNN/:/engine
    tty: true
    stdin_open: true
    privileged: true
    command: tail -f /dev/null

Writing the Dockerfile

FROM nvcr.io/nvidia/tensorflow:20.03-tf1-py3
# Print the proxy variables visible at build time (debugging aid)
RUN env | grep -i proxy
ENV NVIDIA_VISIBLE_DEVICES=all
ENV http_proxy=http://proxy.srv.cc.nttcom.co.jp:8080/
ENV https_proxy=http://proxy.srv.cc.nttcom.co.jp:8080/
ENV no_proxy=localhost,127.0.0.1
# Update and install in one layer so the package index is never stale
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y libsm6 libxext6 libxrender-dev
WORKDIR /engine
COPY ./Mask_RCNN/ /engine
RUN python setup.py install
RUN pip install opencv-python scikit-image keras==2.0.8 imgaug

Building the image
docker-compose build --build-arg http_proxy=http://10.144.3.26:8080 --build-arg https_proxy=http://10.144.3.26:8080/
Starting the container
docker-compose up -d
Note: build and up are run as separate steps because, when building and starting in one step (up --build), the --build-arg values may not be passed through to docker-compose.
Training command
Use the pretrained COCO weights to train on the new dataset:
docker-compose run engine python samples/food/food.py train --dataset ./assets/food --model coco

Tips
To choose which GPU TensorFlow/Keras uses, set the following before importing TensorFlow:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="1" # specify which GPU(s) to be used
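The same idea can be wrapped in a small helper. This is only a sketch: select_gpus is a hypothetical name, and it relies on these variables being set before TensorFlow initializes the GPU, which is easiest to guarantee by calling it before the import.

```python
import os


def select_gpus(gpu_ids):
    """Restrict CUDA to the given GPU indices (hypothetical helper).

    Call this before TensorFlow/Keras initializes the GPU; the simplest
    way to guarantee that is to call it before importing them.
    """
    # Order devices by PCI bus ID so indices match what nvidia-smi shows.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


select_gpus([1])           # expose only the GPU with index 1
# import tensorflow as tf  # import TensorFlow only after the call above
```

Inside the process, the visible GPUs are renumbered from 0, so the GPU exposed here appears to TensorFlow as device 0.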

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"  # specify which GPU(s) to be used

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

import warnings
warnings.filterwarnings("ignore")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import Food config
sys.path.append(os.path.join(ROOT_DIR, "samples/food/"))  # To find local version
import food

%matplotlib inline 

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(MODEL_DIR, "food20200625T0350/mask_rcnn_food_0039.h5")

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "assets/food")
class InferenceConfig(food.FoodConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 62  # 1 Background + 61 food classes
    IMAGE_MAX_DIM=256
    IMAGE_MIN_DIM=256
    NAME = "food"
    DETECTION_MIN_CONFIDENCE=0

config = InferenceConfig()
# config.display()
# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load the trained weights (despite its name, COCO_MODEL_PATH points to the food checkpoint)
model.load_weights(COCO_MODEL_PATH, by_name=True)
Re-starting from epoch 39
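The weights path used here is hard-coded to one training run and one epoch. As a sketch of how the newest checkpoint could be located instead, the helper below assumes the directory layout seen in this post (logs/food20200625T0350/mask_rcnn_food_0039.h5). The name latest_checkpoint is hypothetical and is not part of the Mask_RCNN library.

```python
import glob
import os
import re


def latest_checkpoint(model_dir, name="food"):
    """Return the newest checkpoint path under model_dir, or None.

    Assumes the layout seen in this post:
        <model_dir>/<name><timestamp>/mask_rcnn_<name>_<epoch>.h5
    Run directories are compared by name (the timestamp sorts lexically),
    then checkpoints by epoch number.
    """
    pattern = os.path.join(model_dir, name + "*", "mask_rcnn_" + name + "_*.h5")
    epoch_re = re.compile(r"_(\d+)\.h5$")

    candidates = []
    for path in glob.glob(pattern):
        match = epoch_re.search(os.path.basename(path))
        if match:
            run_dir = os.path.basename(os.path.dirname(path))
            candidates.append((run_dir, int(match.group(1)), path))

    # Tuples compare by run directory first, then by epoch.
    return max(candidates)[2] if candidates else None
```

The returned path could then be passed to model.load_weights(..., by_name=True) in place of the fixed path.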
from food import FoodDataset
dataset_test = FoodDataset()
dataset_test.load_dataset(dataset_dir=IMAGE_DIR, subset='test_images', return_food=True)
dataset_test.prepare()
class_names = dataset_test.class_names

assert len(class_names)==62, "Check Again DatasetConfig"
Annotation Path:  /engine/assets/food/test_images/annotations.json
Image Dir:  /engine/assets/food/test_images/images
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
dataset = dataset_test
fig = plt.figure(figsize=(10, 30))
for i in range(4):

    image_id = random.choice(dataset.image_ids)

    original_image, image_meta, gt_class_id, gt_bbox, gt_mask =\
        modellib.load_image_gt(dataset, config, 
                               image_id, use_mini_mask=False)

    print(original_image.shape)
    plt.subplot(6, 2, 2*i + 1)
    visualize.display_instances(original_image, gt_bbox, gt_mask, gt_class_id, 
                                dataset.class_names, ax=fig.axes[-1])

    plt.subplot(6, 2, 2*i + 2)
    results = model.detect([original_image]) #, verbose=1)
    r = results[0]
    visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'], 
                                dataset.class_names, r['scores'], ax=fig.axes[-1])
(256, 256, 3)
(256, 256, 3)
(256, 256, 3)
(256, 256, 3)
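Because DETECTION_MIN_CONFIDENCE is set to 0 in the inference config, low-confidence detections are drawn as well. The sketch below filters a result dictionary by score after detection. It assumes the result layout used in the notebook ('rois' of shape [N, 4], 'class_ids' and 'scores' of shape [N], 'masks' of shape [H, W, N]); the helper name filter_detections and the 0.7 threshold are illustrative assumptions.

```python
import numpy as np


def filter_detections(r, min_score=0.7):
    """Keep only detections whose score is at least min_score.

    Assumes 'rois' is [N, 4], 'class_ids' and 'scores' are [N],
    and 'masks' is [H, W, N] (hypothetical helper).
    """
    keep = np.where(r["scores"] >= min_score)[0]
    return {
        "rois": r["rois"][keep],
        "class_ids": r["class_ids"][keep],
        "scores": r["scores"][keep],
        "masks": r["masks"][:, :, keep],  # instances live on the last axis
    }
```

The filtered dictionary has the same keys as the original, so it can be passed to visualize.display_instances in the same way as r.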

Note: if you hit a "Failed to get convolution algorithm" error (cuDNN failed to initialize), see https://github.com/tensorflow/tensorflow/issues/24828 and https://qiita.com/kikusumk3/items/907565559739376076b9 for how to set the GPU options. The cause is that TensorFlow by default claims all available GPU memory; this behavior can be disabled as follows:
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

or alternatively:
import tensorflow as tf
config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5))
sess = tf.Session(config=config)

UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv1/convolution}}]]
     [[mrcnn_detection/ExpandDims/_1623]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv1/convolution}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:

Have fun!
References:
https://www.kaggle.com/c/data-science-bowl-2018/discussion/54089
https://www.kaggle.com/c/data-science-bowl-2018/discussion/56326
