Last time, I ran TensorFlow on Windows using the CPU-only Docker image.
hirotaka-hachiya.hatenablog.com
This time, in order to run TensorFlow on a GPU, I start a Docker container from the GPU version of the image.
> docker run -it gcr.io/tensorflow/tensorflow:latest-devel-gpu
The required packages were installed automatically and the container started, so I tested it from Python and got the error below.
root@d30158b5383d:~# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
Apparently libcuda.so.1 cannot be found. After a quick search I found a libcuda.so in /usr/local/cuda/lib64/stubs, so I created a symbolic link to it named libcuda.so.1 and added that directory to LD_LIBRARY_PATH.
root@d30158b5383d:~# cd /usr/local/cuda/lib64/stubs/
root@d30158b5383d:/usr/local/cuda/lib64/stubs# ls
libcublas.so  libcufftw.so    libcusparse.so  libnpps.so          libnvrtc.so
libcuda.so    libcurand.so    libnppc.so      libnvidia-ml.so
libcufft.so   libcusolver.so  libnppi.so      libnvidia-ml.so.1
root@d30158b5383d:/usr/local/cuda/lib64/stubs# ln -s libcuda.so libcuda.so.1
root@d30158b5383d:/usr/local/cuda/lib64/stubs# export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH
As shown below, the tensorflow module can now be imported in Python.
root@d30158b5383d:/usr/local/cuda/lib64/stubs# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
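Whether the dynamic loader can resolve a library name such as libcuda.so.1 can also be checked without importing TensorFlow. The following is a minimal sketch using Python's standard ctypes module; the helper name can_load_library is hypothetical and not part of TensorFlow. It only tells you whether the file can be opened under the current LD_LIBRARY_PATH, not whether a working driver is behind it.

```python
import ctypes


def can_load_library(name):
    """Return True if the dynamic loader can open the shared library `name`.

    Hypothetical helper for illustration; it asks the loader to open the
    library by name, which honours LD_LIBRARY_PATH on Linux.
    """
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library cannot be found or opened.
        return False


if __name__ == "__main__":
    # libcuda.so.1 is the name reported as missing in the log above.
    name = "libcuda.so.1"
    status = "loadable" if can_load_library(name) else "NOT found"
    print("{}: {}".format(name, status))
```

Note that LD_LIBRARY_PATH is read when the process starts, so the export has to happen before launching python.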
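Next, I ran the MNIST sample program. It ran, but driver-related errors appeared.
Apparently the GPU driver is not installed, so the GPU cannot be accessed.
The underlying problem seems to be that a virtual environment on Windows cannot access PCI devices in the first place. If you want to use a GPU, it may be better to simply run on Linux.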
http://scriptlife.hacca.jp/contents/programming/2016/09/14/post-1766/
root@d30158b5383d:/usr/local/cuda/lib64/stubs# python -m tensorflow.models.image.mnist.convolutional
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUresult(-1)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:140] kernel driver does not appear to be running on this host (d30158b5383d): /proc/driver/nvidia/version does not exist
Initialized!
Step 0 (epoch 0.00), 16.7 ms
Minibatch loss: 12.053, learning rate: 0.010000
Minibatch error: 90.6%
Step 100 (epoch 0.12), 798.4 ms
Minibatch loss: 3.276, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.2%
Step 200 (epoch 0.23), 504.9 ms
Minibatch loss: 3.457, learning rate: 0.010000
Minibatch error: 14.1%
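The two messages in the log above (the failed cuInit call and the missing /proc/driver/nvidia/version) can be reproduced outside TensorFlow. The sketch below is an illustration under assumptions, not what TensorFlow does internally: the function names kernel_driver_present and try_cu_init are hypothetical, while cuInit itself is the CUDA driver API entry point, which takes an unsigned int flags argument and returns a CUresult code (0 means success). A libcuda.so taken from the stubs directory is intended for linking only, so it is expected to load but not to initialise a real driver.

```python
import ctypes
import os

# Path named in TensorFlow's diagnostic message in the log above.
NVIDIA_VERSION_FILE = "/proc/driver/nvidia/version"


def kernel_driver_present(path=NVIDIA_VERSION_FILE):
    """Return True if the NVIDIA kernel driver's version file exists."""
    return os.path.exists(path)


def try_cu_init(library="libcuda.so.1"):
    """Call cuInit(0) from the CUDA driver library.

    Returns the integer CUresult code (0 means success), or None if the
    library or the cuInit symbol cannot be found. Hypothetical helper.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    try:
        cu_init = lib.cuInit
    except AttributeError:
        return None
    cu_init.argtypes = [ctypes.c_uint]
    cu_init.restype = ctypes.c_int
    return cu_init(0)


if __name__ == "__main__":
    print("kernel driver present: {}".format(kernel_driver_present()))
    print("cuInit result: {}".format(try_cu_init()))
```

If the version file is missing, no amount of library path adjustment inside the container will help, because the driver has to be running on the host kernel that the container shares.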