Memo Blog

Deep Learning Notes (TensorFlow GPU on Windows)

Last time, I ran TensorFlow on Windows using the CPU version of the Docker image.
hirotaka-hachiya.hatenablog.com

This time, to run TensorFlow on a GPU, I start a Docker container from the GPU version of the image.

> docker run -it gcr.io/tensorflow/tensorflow:latest-devel-gpu

Docker downloaded everything it needed automatically and the container started, so I tested it from python and got the error below.

root@d30158b5383d:~# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64

Apparently libcuda.so.1 cannot be found. After looking around a little, I found libcuda.so in /usr/local/cuda/lib64/stubs, so I created a symbolic link named libcuda.so.1 and added that directory to LD_LIBRARY_PATH.

root@d30158b5383d:~# cd /usr/local/cuda/lib64/stubs/
root@d30158b5383d:/usr/local/cuda/lib64/stubs# ls
libcublas.so  libcufftw.so    libcusparse.so  libnpps.so         libnvrtc.so
libcuda.so    libcurand.so    libnppc.so      libnvidia-ml.so
libcufft.so   libcusolver.so  libnppi.so      libnvidia-ml.so.1
root@d30158b5383d:/usr/local/cuda/lib64/stubs# ln -s libcuda.so libcuda.so.1
root@d30158b5383d:/usr/local/cuda/lib64/stubs# export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH
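To confirm which directories on LD_LIBRARY_PATH actually contain libcuda.so.1 after a change like this, something along the following lines might help. This is only a rough sketch: the function name find_in_library_path is made up for illustration, and it only looks at LD_LIBRARY_PATH, whereas the real dynamic loader also consults other locations (such as the ldconfig cache).

```python
import os


def find_in_library_path(filename, library_path=None):
    """Return the LD_LIBRARY_PATH directories that contain `filename`.

    `library_path` defaults to the current LD_LIBRARY_PATH value.
    Hypothetical helper for illustration; not part of TensorFlow or CUDA.
    """
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in library_path.split(os.pathsep):
        # Empty entries can appear when the variable starts or ends with ':'.
        if directory and os.path.exists(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    print(find_in_library_path("libcuda.so.1"))
```

An empty list would mean that no directory on LD_LIBRARY_PATH provides the file, which matches the situation reported in the first error message.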

As shown below, the tensorflow module can now be loaded from python.

root@d30158b5383d:/usr/local/cuda/lib64/stubs# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
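Note that "successfully opened" only says the file could be loaded. It does not say the driver behind it works. A minimal sketch for checking this directly is to load the library with ctypes and call cuInit(0) from the CUDA driver API, where a return value of 0 means CUDA_SUCCESS. The helper name probe_cuda_driver is my own, and I have not verified what value the stub library returns, so treat the result as a hint only.

```python
import ctypes


def probe_cuda_driver(libname="libcuda.so.1"):
    """Try to load the CUDA driver library and call cuInit(0).

    Returns (loaded, result). `loaded` is False if the library could not be
    opened. `result` is the integer CUresult from cuInit (0 = CUDA_SUCCESS),
    or None if the call could not be made.
    Hypothetical helper for illustration only.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        # The dynamic loader could not find or open the library.
        return False, None
    try:
        cu_init = lib.cuInit
    except AttributeError:
        # The library loaded but does not export cuInit.
        return True, None
    cu_init.argtypes = [ctypes.c_uint]
    cu_init.restype = ctypes.c_int
    return True, cu_init(0)


if __name__ == "__main__":
    loaded, result = probe_cuda_driver()
    print("loaded=%s cuInit=%s" % (loaded, result))
```

If loaded is True but the result is nonzero, the library is present but the GPU driver cannot be initialized.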

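Next, I ran the MNIST sample program. It did run, but driver-related errors appeared.
Apparently the GPU driver is not installed, so the GPU cannot be accessed... The libcuda.so under stubs is only a stub for linking, so making it loadable does not provide a working driver.
The underlying problem seems to be that a virtual environment on Windows cannot access PCI devices in the first place. If you want to use a GPU, it may be better to simply run on Linux.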
http://scriptlife.hacca.jp/contents/programming/2016/09/14/post-1766/

root@d30158b5383d:/usr/local/cuda/lib64/stubs# python -m tensorflow.models.image.mnist.convolutional
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUresult(-1)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:140] kernel driver does not appear to be running on this host (d30158b5383d): /proc/driver/nvidia/version does not exist
Initialized!
Step 0 (epoch 0.00), 16.7 ms
Minibatch loss: 12.053, learning rate: 0.010000
Minibatch error: 90.6%
Step 100 (epoch 0.12), 798.4 ms
Minibatch loss: 3.276, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.2%
Step 200 (epoch 0.23), 504.9 ms
Minibatch loss: 3.457, learning rate: 0.010000
Minibatch error: 14.1%
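The diagnostic line in the log above says that /proc/driver/nvidia/version does not exist. The same check can be done by hand before starting a long run. The following is a small sketch under that assumption: the function name nvidia_kernel_driver_version is hypothetical, and the path is simply the one printed in the log.

```python
import os


def nvidia_kernel_driver_version(path="/proc/driver/nvidia/version"):
    """Return the contents of the NVIDIA driver version file, or None.

    None means the file is absent, which is the condition the TensorFlow
    log reports as "kernel driver does not appear to be running".
    Hypothetical helper for illustration only.
    """
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    version = nvidia_kernel_driver_version()
    if version is None:
        print("NVIDIA kernel driver not visible; TensorFlow will likely fall back to CPU")
    else:
        print(version)
```

In the container used here this would report that the driver is not visible, which is consistent with the training continuing on the CPU after the cuInit failure.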