TensorFlow Configuration and Optimization Notes
2017/02/10
Notes for installing TensorFlow on Linux, with GPU enabled.
Background
TensorFlow is the second-generation ML framework from Google. (See this comparison of deep learning software.) The current state-of-the-art image recognition models (inception-v3) use this framework.
Prerequisites
These notes assume Fedora 24 with an Nvidia GTX 1060 installed, running the nvidia driver as opposed to nouveau. See Fedora 24 Notes, and RPM Fusion’s installation page, for installing the Nvidia drivers. In summary:
dnf install -y xorg-x11-drv-nvidia akmod-nvidia "kernel-devel-uname-r == $(uname -r)"
dnf install xorg-x11-drv-nvidia-cuda
dnf install vulkan
Afterwards, install some devel packages.
dnf install -y vulkan-devel
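Before moving on to CUDA, it may be worth confirming that the nvidia kernel module, and not nouveau, is the one actually loaded. The following is a minimal sketch: the helper name gpu_driver_from_lsmod is made up for illustration, and nvidia-smi is only run if it happens to be on the PATH.

```shell
# Hypothetical helper: report which GPU kernel driver shows up in
# lsmod-style output read from stdin (nvidia wins if both appear).
gpu_driver_from_lsmod() {
    awk '$1 == "nvidia" { found = "nvidia" }
         $1 == "nouveau" && found == "" { found = "nouveau" }
         END { print (found == "" ? "none" : found) }'
}

if command -v lsmod >/dev/null 2>&1; then
    echo "GPU driver: $(lsmod | gpu_driver_from_lsmod)"
fi

# If the driver utilities are installed, this lists the card and driver version.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
fi
```

If this reports nouveau, the akmod build probably has not finished or the machine has not been rebooted since the driver install.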
Download the Nvidia GPU CUDA Toolkit. The version used for this install is 8.0.61, and the network install for Fedora x86_64 was used.
This version of the CUDA Toolkit is not C++11/C++14/C++17 aware. So, be aware! One way around this is to modify the toolkit's GCC version check as shown below, and compile with -std=gnu++98.
117c117,118
< #if __GNUC__ > 5
---
> /* bkoz use -std=c++98 if necessary */
> #if __GNUC__ > 6
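The flag has to be passed explicitly because GCC 6 and later default to -std=gnu++14, while older releases default to gnu++98. As a rough sketch, a build script could pick the flag from the compiler version; the helper name std_flag_for_gcc is an assumption made for illustration, not part of any toolkit.

```shell
# Hypothetical helper: print the extra -std flag for a given GCC version.
# GCC 6+ defaults to gnu++14; older releases already default to gnu++98.
std_flag_for_gcc() {
    major="${1%%.*}"          # "6.3.1" -> "6", "7" -> "7"
    if [ "$major" -ge 6 ]; then
        echo "-std=gnu++98"
    fi
}

if command -v gcc >/dev/null 2>&1; then
    ver="$(gcc -dumpversion)"
    echo "GCC $ver: extra flag '$(std_flag_for_gcc "$ver")'"
fi
```

An empty flag means the compiler default is already the older dialect.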
Next, compile top-of-tree OpenCV (aka 3.2) with CUDA enabled. To do so, use the following configure line, adjusting the paths for your system:
cmake -DVERBOSE=1 \
  -DCMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS} -std=gnu++98 -Wno-deprecated-gpu-targets" \
  -D BUILD_EXAMPLES=1 \
  -D BUILD_DOCS=1 \
  -D WITH_OPENNI=1 \
  -D WITH_CUDA=1 \
  -D CUDA_FAST_MATH=1 \
  -D WITH_CUBLAS=1 \
  -D WITH_FFMPEG=1 \
  -D WITH_EIGEN=1 \
  -D ENABLE_FAST_MATH=1 \
  -D ENABLE_SSE3=1 \
  -D ENABLE_AVX=1 \
  -D CMAKE_BUILD_TYPE=RELEASE \
  -D ENABLE_PRECOMPILED_HEADERS=OFF \
  -D CMAKE_INSTALL_PREFIX=/home/bkoz/bin/H-opencv \
  -D OPENCV_EXTRA_MODULES_PATH=/home/bkoz/src/opencv_contrib.git/modules \
  /home/bkoz/src/opencv.git/
Admittedly, this abuse of CMAKE_CXX_FLAGS is not optimal. Maybe EXTRA_CXX_FLAGS?
Now, for Nvidia cuDNN. The version used for this install is 5.1.
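One way to double-check which cuDNN version ended up installed is to read the version macros out of cudnn.h. This is only a sketch: the header location under /usr/local/cuda is an assumption about where the toolkit and cuDNN were unpacked, and the helper name cudnn_version is made up for illustration.

```shell
# Hypothetical helper: print MAJOR.MINOR from the CUDNN_* macros in a header.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR" { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR" { minor = $3 }
         END { if (major != "") print major "." minor }' "$1"
}

# Assumed install location; set CUDA_HOME if yours differs.
header="${CUDA_HOME:-/usr/local/cuda}/include/cudnn.h"
if [ -f "$header" ]; then
    echo "cuDNN $(cudnn_version "$header")"
else
    echo "cudnn.h not found at $header"
fi
```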
When that is done, use pip to install TensorFlow.
sudo pip install --upgrade pip; sudo pip install tensorflow-gpu
This should output something like:
Collecting tensorflow-gpu
  Downloading tensorflow_gpu-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl (89.7MB)
    100% |████████████████████████████████| 89.7MB 19kB/s
Requirement already satisfied: mock>=2.0.0 in /usr/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib64/python2.7/site-packages (from tensorflow-gpu)
Collecting protobuf>=3.1.0 (from tensorflow-gpu)
  Downloading protobuf-3.2.0-cp27-cp27mu-manylinux1_x86_64.whl (5.6MB)
    100% |████████████████████████████████| 5.6MB 284kB/s
Collecting wheel (from tensorflow-gpu)
  Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB)
    100% |████████████████████████████████| 71kB 3.3MB/s
Requirement already satisfied: pbr>=0.11 in /usr/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already satisfied: funcsigs>=1 in /usr/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-gpu)
Installing collected packages: protobuf, wheel, tensorflow-gpu
Successfully installed protobuf-3.2.0 tensorflow-gpu-0.12.1 wheel-0.29.0
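A quick sanity check after the install is to ask pip which version it recorded and then try importing the package. This is a sketch only; the helper name pkg_version is made up for illustration, and the import step assumes the python on the PATH is the same interpreter pip installed into.

```shell
# Hypothetical helper: pull the Version: field out of `pip show` output on stdin.
pkg_version() {
    awk -F': ' '$1 == "Version" { print $2; exit }'
}

if command -v pip >/dev/null 2>&1; then
    echo "tensorflow-gpu: $(pip show tensorflow-gpu 2>/dev/null | pkg_version)"
fi

# Importing the package exercises the CUDA and cuDNN shared libraries as well.
if command -v python >/dev/null 2>&1; then
    python -c 'import tensorflow as tf; print(tf.__version__)' \
        || echo "tensorflow import failed"
fi
```

If the import fails on a missing shared library, the loader path for the CUDA and cuDNN libraries is the first thing to look at.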
After this has completed, add in Keras.
Optimization
For Nvidia GPUs, take a look at this interesting post from Netflix. In sum, add
NVreg_CheckPCIConfigSpace=0
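NVreg_CheckPCIConfigSpace is a parameter of the nvidia kernel module, so one plausible place to set it is a modprobe options file. The sketch below only writes a scratch file in the current directory for review; the file name nvidia-tuning.conf is an assumption, and copying the result into /etc/modprobe.d is left as a manual step.

```shell
# Write the option to a scratch file first (hypothetical file name).
out="${OUT:-./nvidia-tuning.conf}"
printf 'options nvidia NVreg_CheckPCIConfigSpace=0\n' > "$out"
cat "$out"

# After reviewing, install it as root and reload the module or reboot, e.g.:
#   sudo install -m 644 "$out" /etc/modprobe.d/
```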