Installing and Training the TensorFlow Version of YOLOv2 - Hirotaka Hachiya's Memo Blog

Several TensorFlow ports of YOLOv2 are available on GitHub, so I installed thtrieu's darkflow and tried training it.

thtrieu's darkflow: https://github.com/thtrieu/darkflow

Installation
Detection with flow
Training with flow
Network architecture

Installation

The installation steps are as follows.

> git clone https://github.com/thtrieu/darkflow
> cd darkflow
> python3 setup.py build_ext --inplace
> pip install -e .
> pip install .

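What this does: darkflow/cython_utils contains Cython code for NMS (non-maximum suppression) and bounding-box (bb) handling, so setup.py compiles that code and then the darkflow package is installed. As far as I can tell from the darkflow README, the three commands after the clone are alternative ways of installing, so running just one of them should be enough.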

Next, download the pretrained YOLOv2 model file yolov2.weights and put it in the bin folder.

> cd bin
> wget https://pjreddie.com/media/files/yolov2.weights

Detection with flow

Then go back to the darkflow root directory (the paths below are relative to it) and run flow as follows.

> flow --model cfg/yolo.cfg --load bin/yolov2.weights
/home/hachiya/works/yolo2/darkflow/darkflow/dark/darknet.py:54: UserWarning: ./cfg/yolov2.cfg not found, use cfg/yolo.cfg instead
  cfg_path, FLAGS.model))
Parsing cfg/yolo.cfg
Loading bin/yolov2.weights ...
Successfully identified 203934260 bytes
Finished in 0.04647064208984375s
Model has a coco model name, loading coco labels.

Building net ...
Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
       |        | input                            | (?, 608, 608, 3)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 608, 608, 32)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 304, 304, 32)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 304, 304, 64)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 152, 152, 64)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 152, 152, 128)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 152, 152, 64)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 152, 152, 128)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 76, 76, 128)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 76, 76, 256)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 76, 76, 128)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 76, 76, 256)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 38, 38, 256)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 38, 38, 512)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 38, 38, 256)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 38, 38, 512)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 38, 38, 256)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 38, 38, 512)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 19, 19, 512)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 19, 19, 1024)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 19, 19, 512)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 19, 19, 1024)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 19, 19, 512)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 19, 19, 1024)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 19, 19, 1024)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 19, 19, 1024)
 Load  |  Yep!  | concat [16]                      | (?, 38, 38, 512)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 38, 38, 64)
 Load  |  Yep!  | local flatten 2x2                | (?, 19, 19, 256)
 Load  |  Yep!  | concat [27, 24]                  | (?, 19, 19, 1280)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 19, 19, 1024)
 Load  |  Yep!  | conv 1x1p0_1    linear           | (?, 19, 19, 425)
-------+--------+----------------------------------+---------------
Running entirely on CPU
2018-06-13 13:24:26.305912: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-06-13 13:24:27.013552: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-06-13 13:24:27.013818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-06-13 13:24:27.013833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Finished in 4.2318079471588135s

Forwarding 8 inputs ...
Total time = 3.7926931381225586s / 8 inps = 2.1093190797819523 ips
Post processing 8 inputs ...
Total time = 0.30597639083862305s / 8 inps = 26.145808106545484 ips

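A warning "./cfg/yolov2.cfg not found" is shown, but cfg/yolo.cfg appears to be loaded in its place. For the 8 images, the log reports about 3.8 s for the forward pass (about 2.1 images/s) and about 0.3 s for post-processing. Note that the log also says "Running entirely on CPU" (no --gpu option was given), so these timings probably do not reflect the speed of the GTX 1080 Ti.
The 8 images are under sample_img, and the detection results are saved in sample_img/out. One of the results is shown below.
[Image: example detection result]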

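By the way, if you add the "--json" argument, files with a .json extension are written to sample_img/out, containing the label, score, and coordinates of each detected bounding box, as shown below. Very handy.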

[{"label": "person", "confidence": 0.82, "topleft": {"x": 189, "y": 95}, "bottomright": {"x": 271, "y": 380}}, {"label": "dog", "confidence": 0.8, "topleft": {"x": 69, "y": 258}, "bottomright": {"x": 209, "y": 355}}, {"label": "horse", "confidence": 0.89, "topleft": {"x": 397, "y": 127}, "bottomright": {"x": 605, "y": 352}}]
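A minimal sketch of how such a JSON file could be consumed afterwards, assuming only the structure shown in the example above (label, confidence, topleft, bottomright). The function name summarize and the threshold value are made up for illustration and are not part of darkflow.

```python
import json

# Contents of one detection file, copied from the example output above.
raw = (
    '[{"label": "person", "confidence": 0.82, "topleft": {"x": 189, "y": 95}, '
    '"bottomright": {"x": 271, "y": 380}}, '
    '{"label": "dog", "confidence": 0.8, "topleft": {"x": 69, "y": 258}, '
    '"bottomright": {"x": 209, "y": 355}}, '
    '{"label": "horse", "confidence": 0.89, "topleft": {"x": 397, "y": 127}, '
    '"bottomright": {"x": 605, "y": 352}}]'
)


def summarize(detections, min_confidence=0.0):
    """Return (label, confidence, width, height) for boxes at or above the threshold."""
    rows = []
    for det in detections:
        if det["confidence"] < min_confidence:
            continue
        width = det["bottomright"]["x"] - det["topleft"]["x"]
        height = det["bottomright"]["y"] - det["topleft"]["y"]
        rows.append((det["label"], det["confidence"], width, height))
    return rows


detections = json.loads(raw)
for row in summarize(detections, min_confidence=0.81):
    print(row)
```

With a real output file, the string would be replaced by reading the .json file from sample_img/out.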

Training with flow

To train, just add the --train argument.

> flow --model cfg/yolo.cfg --load bin/yolov2.weights --train --gpu 1.0

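However, running it as-is produces the following error:
"Error: Annotation directory not found ../pascal/VOCdevkit/ANN/ ."
Note that --gpu specifies how much of the GPU to use, as a fraction between 0 and 1 (as far as I can tell, it is the fraction of GPU memory handed to TensorFlow). It is not a GPU ID, so be careful.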

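Looking at darkflow/defaults.py, dataset and annotation are defined as shown below, so apparently all you need to do is pass --dataset and --annotation pointing at your own data. Moreover, the annotations seem to be expected in Pascal VOC format.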

self.define('dataset', '../pascal/VOCdevkit/IMG/', 'path to dataset directory')
self.define('annotation', '../pascal/VOCdevkit/ANN/', 'path to annotation directory')
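For readers who have not seen the Pascal VOC annotation layout, here is a small sketch of reading one such XML file with the Python standard library. The sample XML is hypothetical (the file name and all numbers are invented), and parse_voc_annotation is an illustrative helper, not darkflow's own parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down annotation in the Pascal VOC layout.
# The file name and all coordinates are invented for illustration.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>69</xmin><ymin>258</ymin><xmax>209</xmax><ymax>355</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>189</xmin><ymin>95</ymin><xmax>271</xmax><ymax>380</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) for one VOC XML string."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        coords = [int(bndbox.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return root.find("filename").text, boxes


filename, boxes = parse_voc_annotation(SAMPLE_XML)
print(filename, boxes)
```

Each image in the dataset directory is paired with one XML file of this kind in the annotation directory.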

I would have liked to train on the KITTI data right away, but this time I tried Pascal VOC2007.

> mkdir data
> cd data
> curl -O https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar
> tar xf VOCtest_06-Nov-2007.tar
> ls VOCdevkit/VOC2007/
Annotations  ImageSets  JPEGImages  SegmentationClass  SegmentationObject

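The images (dataset) are in "data/VOCdevkit/VOC2007/JPEGImages" and the annotations are in "data/VOCdevkit/VOC2007/Annotations", so I pass these as arguments and run flow as follows. Because of GPU memory constraints I used tiny-yolo.cfg as the model, and Adam as the optimizer.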

>  flow --model cfg/tiny-yolo.cfg --train --gpu 1.0 --dataset "data/VOCdevkit/VOC2007/JPEGImages/" --annotation "data/VOCdevkit/VOC2007/Annotations" --trainer adam
Parsing cfg/tiny-yolo.cfg
Loading None ...
Finished in 5.0067901611328125e-05s
Model has a coco model name, loading coco labels.

Building net ...
Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
       |        | input                            | (?, 416, 416, 3)
 Init  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 416, 416, 16)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 208, 208, 16)
 Init  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 208, 208, 32)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 104, 104, 32)
 Init  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 104, 104, 64)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 52, 52, 64)
 Init  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 52, 52, 128)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 26, 26, 128)
 Init  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 26, 26, 256)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 13, 13, 256)
 Init  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 512)
 Load  |  Yep!  | maxp 2x2p0_1                     | (?, 13, 13, 512)
 Init  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 1024)
 Init  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 1024)
 Init  |  Yep!  | conv 1x1p0_1    linear           | (?, 13, 13, 425)
-------+--------+----------------------------------+---------------
GPU mode with 1.0 usage
cfg/tiny-yolo.cfg loss hyper-parameters:
        H       = 13
        W       = 13
        box     = 5
        classes = 80
        scales  = [1.0, 5.0, 1.0, 1.0]
...
Dataset size: 4952
Dataset of 4952 instance(s)
Training statistics:
        Learning rate : 1e-05
        Batch size    : 16
        Epoch number  : 1000
        Backup every  : 2000
step 1 - loss 108.20623016357422 - moving ave loss 108.20623016357422
step 2 - loss 108.07720184326172 - moving ave loss 108.19332733154297
step 3 - loss 108.5759048461914 - moving ave loss 108.23158508300781
step 4 - loss 110.49064636230469 - moving ave loss 108.4574912109375
step 5 - loss 111.99656677246094 - moving ave loss 108.81139876708986
step 6 - loss 109.51751708984375 - moving ave loss 108.88201059936524
step 7 - loss 107.66697692871094 - moving ave loss 108.76050723229982
step 8 - loss 109.469482421875 - moving ave loss 108.83140475125734
...
Checkpoint at step 1125
step 1126 - loss 84.48749542236328 - moving ave loss 85.27347111125957
step 1127 - loss 84.01033020019531 - moving ave loss 85.14715702015314
step 1128 - loss 84.40534210205078 - moving ave loss 85.07297552834291
step 1129 - loss 85.56282043457031 - moving ave loss 85.12196001896565
step 1130 - loss 87.02748107910156 - moving ave loss 85.31251212497924
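
As a rough sanity check of the training statistics in the log above, the step counts can be related to the dataset size and batch size. This is only a sketch: it assumes that "Backup every 2000" is counted in training examples rather than steps, which is an assumption on my part, although it is consistent with "Checkpoint at step 1125" in the log.

```python
# Values copied from the "Training statistics" part of the log above.
dataset_size = 4952
batch_size = 16
backup_every = 2000  # assumed to be a number of training examples, not steps

# Number of full batches per pass over the dataset.
steps_per_epoch = dataset_size // batch_size

# Number of steps between checkpoints under the assumption above.
steps_per_checkpoint = backup_every // batch_size

print("steps per epoch:", steps_per_epoch)
print("steps per checkpoint:", steps_per_checkpoint)
print("1125 is a checkpoint step:", 1125 % steps_per_checkpoint == 0)
```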

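When training runs, a folder named ckpt is created and checkpoints are saved there. Incidentally, after about 10,000 steps the loss drops to around 5. However, I noticed afterwards that tiny-yolo.cfg has 80 classes, that is, it is a config file for MS COCO. So I trained with cfg/yolo-voc.cfg instead, as follows.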

> flow --model cfg/yolo-voc.cfg --train --gpu 1.0 --dataset "data/VOCdevkit/VOC2007/JPEGImages/" --annotation "data/VOCdevkit/VOC2007/Annotations" --trainer adam
...
GPU mode with 1.0 usage
cfg/yolo-voc.cfg loss hyper-parameters:
        H       = 13
        W       = 13
        box     = 5
        classes = 20
        scales  = [1.0, 5.0, 1.0, 1.0]
Building cfg/yolo-voc.cfg loss
Building cfg/yolo-voc.cfg train op
2018-06-14 00:11:42.967112: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-06-14 00:11:43.538943: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-06-14 00:11:43.539323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-06-14 00:11:43.539337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-06-14 00:11:43.541108: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 10.91G (11712987136 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
...
step 113 - loss 107.9711685180664 - moving ave loss 107.79369719607465
step 114 - loss 108.29700469970703 - moving ave loss 107.84402794643789
step 115 - loss 107.34602355957031 - moving ave loss 107.79422750775115
step 116 - loss 109.22856140136719 - moving ave loss 107.93766089711275
step 117 - loss 106.66355895996094 - moving ave loss 107.81025070339757
step 118 - loss 109.05477905273438 - moving ave loss 107.93470353833126
step 119 - loss 107.35346984863281 - moving ave loss 107.87658016936142
step 120 - loss 107.55178833007812 - moving ave loss 107.8441009854331
step 121 - loss 108.16707611083984 - moving ave loss 107.87639849797378
step 122 - loss 107.94886779785156 - moving ave loss 107.88364542796157
step 123 - loss 108.37859344482422 - moving ave loss 107.93314022964783
step 124 - loss 106.92852783203125 - moving ave loss 107.83267898988618
step 125 - loss 108.65249633789062 - moving ave loss 107.91466072468663
Checkpoint at step 125
...
Finish 5 epoch(es)
step 1546 - loss 68.74520874023438 - moving ave loss 69.44457628681288
step 1547 - loss 68.43368530273438 - moving ave loss 69.34348718840502
step 1548 - loss 68.29949188232422 - moving ave loss 69.23908765779694
step 1549 - loss 69.23208618164062 - moving ave loss 69.23838751018131
step 1550 - loss 69.6432876586914 - moving ave loss 69.27887752503233
step 1551 - loss 69.58462524414062 - moving ave loss 69.30945229694316
step 1552 - loss 67.54881286621094 - moving ave loss 69.13338835386995
step 1553 - loss 68.92791748046875 - moving ave loss 69.11284126652983
step 1554 - loss 68.00341033935547 - moving ave loss 69.00189817381239
...
step 57833 - loss 3.7261016368865967 - moving ave loss 4.266748164827713
step 57834 - loss 2.4047257900238037 - moving ave loss 4.080545927347322
step 57835 - loss 4.587594509124756 - moving ave loss 4.131250785525066
step 57836 - loss 2.978076934814453 - moving ave loss 4.015933400454005
step 57837 - loss 5.362588882446289 - moving ave loss 4.150598948653233

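As shown above, CUDA_ERROR_OUT_OF_MEMORY was reported, but training still ran and the loss converged. The error is presumably because --gpu 1.0 makes TensorFlow try to reserve the full 10.91 GiB while only 10.75 GiB was free. A slightly smaller value such as --gpu 0.9 might avoid it (untested).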
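
The "moving ave loss" column in the logs looks like an exponential moving average of the per-step loss. The sketch below checks that guess against the first four steps of the tiny-yolo run. The decay of 0.9 is inferred from the logged numbers, not taken from the darkflow source, so treat it as an assumption.

```python
# Per-step loss and logged moving average for steps 1-4 of the tiny-yolo run.
losses = [108.20623016357422, 108.07720184326172, 108.5759048461914, 110.49064636230469]
logged = [108.20623016357422, 108.19332733154297, 108.23158508300781, 108.4574912109375]


def moving_average(values, decay=0.9):
    """Exponential moving average, initialised with the first value."""
    averaged = []
    current = None
    for value in values:
        if current is None:
            current = value
        else:
            current = decay * current + (1.0 - decay) * value
        averaged.append(current)
    return averaged


for step, (mine, theirs) in enumerate(zip(moving_average(losses), logged), start=1):
    print(step, mine, theirs, abs(mine - theirs) < 1e-6)
```

Because of this smoothing, the moving average reacts slowly to individual noisy steps, which makes it the easier column to watch for convergence.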

Network architecture

The final 1x1 conv layer

Init | Yep! | conv 1x1p0_1 linear | (?, 13, 13, 425)

has 425 output filters, which break down as follows.
Note: for details, see "darkflow/cython_utils/cy_yolo2_findboxes.pyx".

dimension-cluster anchors (5) × { bbox coordinates (4 dims) + confidence (1 dim) + 80 classes } = 425
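
The breakdown above is simple arithmetic, sketched here as a small helper. The function name region_output_channels is made up for illustration.

```python
def region_output_channels(num_anchors, num_classes, coords=4):
    """Channels of the last conv layer: anchors x (box coords + confidence + classes)."""
    return num_anchors * (coords + 1 + num_classes)


# 5 anchors and 80 classes (MS COCO), as in cfg/yolo.cfg and cfg/tiny-yolo.cfg.
print(region_output_channels(5, 80))

# By the same formula, a 20-class (Pascal VOC) model with 5 anchors would need fewer channels.
print(region_output_channels(5, 20))
```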

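The bbox coordinates are stored in the following order.

Offset score for the x coordinate on the feature map (a sigmoid is applied at the end, so the box center moves by at most one cell of the feature map)
Offset score for the y coordinate on the feature map (a sigmoid is applied at the end, so the box center moves by at most one cell of the feature map)
Log ratio (scale factor) relative to the anchor width
Log ratio (scale factor) relative to the anchor height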
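
The four values listed above can be turned into a box roughly as follows. This is a sketch of the usual YOLOv2 decoding, written from my understanding of the method rather than copied from cy_yolo2_findboxes.pyx. The function name decode_box and its argument names are illustrative, and the 13x13 grid is the tiny-yolo case from the log.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def decode_box(tx, ty, tw, th, col, row, anchor_w, anchor_h, grid_w=13, grid_h=13):
    """Decode raw outputs of one anchor in cell (row, col).

    Returns (center_x, center_y, width, height), all relative to the image size.
    """
    # The sigmoid keeps the center inside the cell that predicts it.
    center_x = (col + sigmoid(tx)) / grid_w
    center_y = (row + sigmoid(ty)) / grid_h
    # Width and height scale the anchor (given in grid cells) by exp of the log ratio.
    width = anchor_w * math.exp(tw) / grid_w
    height = anchor_h * math.exp(th) / grid_h
    return center_x, center_y, width, height


# All-zero outputs in the middle cell: the center sits mid-cell and the box equals the anchor.
print(decode_box(0.0, 0.0, 0.0, 0.0, col=6, row=6, anchor_w=1.87446, anchor_h=2.06253))
```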

The anchors below, set in cfg/yolo.cfg, define 5 anchors, each given in (width, height) order. Since the config assumes the MS COCO dataset (http://cocodataset.org/#home), there are 80 classes.

[region]
anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=80
coords=4
num=5
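
To make the (width, height) pairing concrete, the anchors line can be split into pairs as sketched below. The conversion to pixels assumes the anchors are expressed in grid cells and uses the 608x608 input and 19x19 output from the yolo.cfg log above (32 pixels per cell). That interpretation is my assumption, not something stated in the cfg file.

```python
# The anchors value copied from the [region] section above.
anchors_line = "0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828"

values = [float(v) for v in anchors_line.split(",")]

# Even positions are widths, odd positions are heights.
anchors = list(zip(values[0::2], values[1::2]))

# Assumed pixels per grid cell: 608 input pixels / 19 output cells.
stride = 608 // 19

for width, height in anchors:
    print(f"{width:8.5f} x {height:8.5f} cells  ~ {width * stride:6.1f} x {height * stride:6.1f} px")
```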