TensorFlow Configuration and Optimization Notes

Notes for installing TensorFlow on Linux, with GPU enabled.

Background

TensorFlow is the second-generation ML framework from Google. (See this comparison of deep learning software.) The current state-of-the-art image recognition models (Inception-v3) use this framework.

Prerequisites

Assuming Fedora 24 with an Nvidia GTX 1060 installed, running the nvidia driver as opposed to nouveau. See Fedora 24 Notes, and RPM Fusion’s installation page for installing the Nvidia drivers. In sum,

dnf install -y xorg-x11-drv-nvidia akmod-nvidia "kernel-devel-uname-r == $(uname -r)"
dnf install xorg-x11-drv-nvidia-cuda
dnf install vulkan

Afterwards, install some devel packages.

dnf install -y vulkan-devel

Download the Nvidia CUDA Toolkit. The version used for this install is 8.0.61, via the network installer for Fedora x86_64.

This version of the CUDA Toolkit is not C++11/C++14/C++17 aware. So, be aware! One way around this is to mod the toolkit’s host_config.h as below, and use -std=gnu++98.

117c117,118
< #if __GNUC__ > 5
---
> /* bkoz use -std=c++98 if necessary */
> #if __GNUC__ > 6

Next, compile top-of-tree OpenCV (aka 3.2) with CUDA enabled. To do so, use the following configure invocation, modified for the paths on your system:

cmake -DVERBOSE=1 -DCMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS} -std=gnu++98 -Wno-deprecated-gpu-targets" -D BUILD_EXAMPLES=1 -D BUILD_DOCS=1 -D WITH_OPENNI=1 -D WITH_CUDA=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 -D WITH_FFMPEG=1 -D WITH_EIGEN=1 -D ENABLE_FAST_MATH=1 -D ENABLE_SSE3=1 -D ENABLE_AVX=1 -D CMAKE_BUILD_TYPE=RELEASE -D ENABLE_PRECOMPILED_HEADERS=OFF  -D CMAKE_INSTALL_PREFIX=/home/bkoz/bin/H-opencv -D OPENCV_EXTRA_MODULES_PATH=/home/bkoz/src/opencv_contrib.git/modules /home/bkoz/src/opencv.git/

Admittedly, this abuse of CMAKE_CXX_FLAGS is not optimal. Maybe EXTRA_CXX_FLAGS?

Now, for Nvidia cuDNN. The version used for this install is 5.1. Download it from the Nvidia developer site and copy its headers and libraries in alongside the CUDA Toolkit’s include and lib64 directories.

When that is done, use pip to install TensorFlow.

sudo pip install --upgrade pip;
sudo pip install tensorflow-gpu

This should output something like:

Collecting tensorflow-gpu
  Downloading tensorflow_gpu-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl (89.7MB)
    100% |████████████████████████████████| 89.7MB 19kB/s 
Requirement already satisfied: mock>=2.0.0 in /usr/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib64/python2.7/site-packages (from tensorflow-gpu)
Collecting protobuf>=3.1.0 (from tensorflow-gpu)
  Downloading protobuf-3.2.0-cp27-cp27mu-manylinux1_x86_64.whl (5.6MB)
    100% |████████████████████████████████| 5.6MB 284kB/s 
Collecting wheel (from tensorflow-gpu)
  Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB)
    100% |████████████████████████████████| 71kB 3.3MB/s 
Requirement already satisfied: pbr>=0.11 in /usr/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already satisfied: funcsigs>=1 in /usr/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-gpu)
Installing collected packages: protobuf, wheel, tensorflow-gpu
Successfully installed protobuf-3.2.0 tensorflow-gpu-0.12.1 wheel-0.29.0
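
As a quick sanity check that the GPU-enabled build is actually being used, here is a minimal sketch against the tensorflow-gpu 0.12 API installed above; logging device placement should show the ops landing on gpu:0.

import tensorflow as tf

# Build a trivial graph and ask TensorFlow to log which device each op
# lands on; with a working CUDA/cuDNN install, gpu:0 shows up in the log.
a = tf.constant([1.0, 2.0, 3.0], name='a')
b = tf.constant([4.0, 5.0, 6.0], name='b')
c = a + b

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))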

After this has completed, add in Keras.
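
Once Keras is installed (pip install keras), a minimal sanity-check sketch, assuming the TensorFlow backend is selected in ~/.keras/keras.json:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Tiny two-layer model; building and running it confirms Keras is driving
# the GPU-backed TensorFlow install from above.
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy')

x = np.random.rand(32, 4).astype('float32')
print(model.predict(x, batch_size=8).shape)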

Optimization

For Nvidia GPUs, take a look at this interesting post from Netflix. In sum, set the following nvidia kernel module parameter (for example, via a file in /etc/modprobe.d/):

NVreg_CheckPCIConfigSpace=0

All the Uhuras

All the Left/Right Uhuras, 2014
All the Center Uhuras, 2014

68 cm x 86.5 cm, Inkjet over lapis and silvertone wash on Awagami Bamboo 250 gsm paper. Master jedi paper tricks via Emily York.

These prints are composed of 288 cropped images of Uhura from the television show Star Trek. Each frame of the first season is analyzed with facial recognition software, and the Uhura faces that are found are either inscribed with tattoo-like circles representing individual facial-detection algorithms, or scaled, cropped, and center-aligned via sophisticated image-processing routines.

To offset the explicitly computed nature of this work, the images are aligned on broken grids, and floated on an organic background of silvertone metallic or lapis mineral pigments.

OpenCV Configuration and Optimization Notes


Background

The default package for OpenCV on Fedora 20 (f20) is

opencv-2.4.7-6

The performance of such algorithms as CascadeClassifier::detectMultiScale and opencv_traincascade can be improved by installing additional packages and then enabling them with various build flags when rebuilding OpenCV.

Looking through the opencv.spec SRPM file, one finds various enable flags for configuration tweaking and tuning when rebuilding with rpmbuild.

The most relevant for optimization:

--with eigen3
--with sse3

The most relevant for extending capabilities:

--with ffmpeg
--with openni

The default package can be rebuilt with these optimizations using syntax like:

rpmbuild -ba opencv.spec --with ffmpeg --with openni --with eigen3 --with sse3

However, even when using these flags on f20, the configuration summary that cmake prints at configure time doesn’t enthuse. So, rebuild the upstream sources without RPM to master the package configuration, and then bring this knowledge back into the RPM package. Old school, yo.

Start off the SRPM hacking by looking at the upstream source repository and rebasing the f20 sources to the latest release of OpenCV (2.4.9). To get a cmake build going with dependency tracking, build the opencv sources as specified in the link.

The file CMakeLists.txt has the build-time configure options.

A list of the most interesting:

WITH_CUDA
WITH_CUFFT
WITH_BLAS

WITH_FFMPEG
WITH_OPENNI

WITH_EIGEN
WITH_IPP
WITH_TBB / BUILD_TBB
WITH_OPENMP
WITH_OPENCL

ENABLE_DYNAMIC_CUDA
ENABLE_FAST_MATH
ENABLE_SSE3

Setup, Install Prerequisites.

A couple of these are easy to enable, with dependencies already pre-packaged.

For development, you’ll need the following dependencies:

yum install -y gtk2-devel libtheora-devel libvorbis-devel libraw1394-devel libdc1394-devel jasper-devel libpng-devel libjpeg-devel libtiff-devel libv4l-devel libGL-devel gtkglext-devel OpenEXR-devel zlib-devel python2-devel swig python-sphinx gstreamer-devel gstreamer-plugins-base-devel opencl-headers gstreamer-plugins-bad-free-devel gstreamer-python-devel gstreamer-plugins-bad-free-devel-docs gstreamer-plugins-base-devel-docs gstreamer-plugins-ugly-devel-docs libpng12-devel mesa-libGLES-devel

To execute binaries that have been compiled with this optimized version of opencv, one will need to install the OpenCL runtime.

For OPENNI

yum install -y openni openni-devel openni-doc

For FFMPEG

yum install -y ffmpeg ffmpeg-devel

For TBB

yum install -y tbb tbb-devel tbb-doc

For EIGEN

yum install -y eigen3-devel eigen3-doc

For IPP

To enable WITH_IPP, more elaborate configuration is required. First, install Intel Integrated Performance Primitives (aka IPP). From the User’s Guide: “Note that opencv_traincascade application can use TBB for multi-threading. To use it in multicore mode OpenCV must be built with TBB.”

After IPP is installed, the system must be configured to find it. To fix up the library paths, pick one of two options.

One: add the following to LD_LIBRARY_PATH and LD_RUN_PATH:

/opt/intel/ipp/lib/intel64:/opt/intel/lib/intel64/

Two: add the following files under /etc/ld.so.conf.d, then run ldconfig:

tbb.conf
/opt/intel/lib/intel64

ipp.conf
/opt/intel/ipp/lib/intel64

Furthermore, for OpenCV configuration to find the installed IPP at SRPM build time, the environment variable IPPROOT must be set, as follows:

setenv IPPROOT /opt/intel/ipp

Build SRPM

Build the modified opencv package with the following custom SPEC file. No configuration options are necessary: WITH_IPP, WITH_TBB, WITH_EIGEN are all enabled.

Then, force install it over the default libs as follows:

rpm -Uvh --nodeps opencv-2.4.9-3 etc etc.

Recompile the opencv app in question, and voila. Optimized. Speedups may vary; here, processing times improved by roughly 2.3x.

Notes on The Machine Is Learning

This is a series of art videos that were generated as a by-product of an ongoing computational media project further described in GVOD, GVOD + Analytics: Star Treks \\///, etc.


The Machine is Learning, v2.2.
12:37 minutes, 960 x 720 pixels.


The Machine is Learning, v2.3 lbp
12:37 minutes, 960 x 720 pixels.


The Machine is Learning, v5.7 multi
12:37 minutes, 960 x 720 pixels.


The Machine is Learning, v5.7 multi ghost
12:37 minutes, 960 x 720 pixels.

GVOD + Analytics: Star Treks \\///

This is a fan studies and media assemblage experiment, loosely associated with Professor Abigail De Kosnik’s Fan Data/Net Difference Project at the Berkeley Center for New Media. It uses technology associated with copyright verification and the surveillance state to deconstruct serial television into a hybrid media form.

The motivating question for this work is simple. How does one quantize serial television? Given a television episode,  such as the third episode of Star Trek, how can it be measured and then compared to other episodes of Star Trek? Can characters of the original Star Trek television series be compared to characters in different Star Trek universes and franchises, such as comparing Kirk/Spock in Star Trek to Janeway/Seven-of-Nine in Star Trek Voyager? Given a media text, how do you tag and score it? If you cannot score the whole text, can you score a character or characters? How do characters or elements of a media text become countable?

Media Texts:

Star Trek (The Original Series), aka TOS, 1966 to 1969. Episodes: 79. Running time each: 50 minutes. English subtitles from subscene for Seasons 1, 2, 3.

Star Trek Voyager, aka VOY, 1995 to 2001. Episodes: S03E26 to S07E26, ie #68 to #172, a total of 104. Running time each varies between 45 and 46 minutes.

Media Focus/Themes:

The pairs of Kirk/Spock in Star Trek the Original Series and Janeway/Seven of Nine in Star Trek Voyager will be compared in a media-analytic fashion.

A popular fanfic genre is called One True Pairing, aka OTP, which is a perceived or invented romantic relationship between two characters. One of the best known examples of OTP is the pair of Kirk and Spock on TOS. Indeed, fanfic involving Kirk and Spock is so popular that it has its own nomenclature: it is called slash, or slash fic.

The pair of Janeway and Seven of Nine are comparable to Kirk and Spock as both the Janeway and Kirk characters are captains of space ships, and both the Seven of Nine and Spock characters are presented as “the other” to human characters: both the Borg and Vulcans are presented as otherworldly, non-human. The two pairs are different in other areas, the most obvious being gender: K/S is male, J/7 is female.

Some edit tapes for K/S can be found on YouTube for Seasons 1, 2, and 3. Some fanvids for J/7.

Open Questions:

This is a meta-vidding tool with an analytic overlay. It takes serial television shows and adds facial recognition to count face time and change the focus of viewing to specific character pairs instead of entire episodes. Developing the technology to answer these analytic questions, answering and understanding the answers, and formulating the next round of questions is the purpose of this project.

1. Should the method use the first 79 episodes in which the character-pairs appear together? How do you normalize across series and pairs?

Or minute-normalized, after the edits? The current times are:

TOS == 79 x 50 minutes == 3950 “character-pair” minutes total

VOY == 104 x 43 minutes == 4472 “character-pair” minutes total

2. Best method for facial recognition.

One idea is to use openFrameworks and incorporate an addon: get the FaceTracker library (see the video explaining it), then the ofxFaceTracker addon for openFrameworks.

Another is to use opencv directly.

OpenCV documentation main page.

Tutorial: Object detection with cascade classifiers.

User guide: Cascade Classifier Training.

Contrib/Experimental: Face Recognition with OpenCV. See the cv::FaceRecognizer class.
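
As a concrete starting point for the direct-OpenCV route, a minimal detection sketch using the Python bindings and one of the stock Haar cascades; the cascade path and frame filename are placeholders for whatever is on the local system.

import cv2

# Stock frontal-face Haar cascade shipped with OpenCV; adjust the path for
# the local install (e.g. somewhere under /usr/share/OpenCV/haarcascades/).
cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

frame = cv2.imread('tmp-1/out0001.png')
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns a list of (x, y, w, h) face rectangles.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(48, 48))
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite('detected.png', frame)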

Many, many variants go into this. Some good links:

Samuel Molinari, People’s Control Over Their Own Image in Photos on Social Networks, 2012-05-08

Aligning Faces in C++

Tutorial: OpenCV haartraining Naotoshi Seo

Notes on traincascades parameters

Recommended values for detecting

ffmpeg concat

LBP and Facial Recognition Example with Obama

Simple Face recognition using OpenCV, Etienne Membrives, The Pebibyte

Xiangxin Zhu, Deva Ramanan, Face Detection, Pose Estimation, and Landmark Localization in the Wild, CVPR 2012 (IEEE Xplore)

3. Measuring “character” and “character-pair” screen time. How is this related to the Bechdel test? [2+ women, who talk to each other, about something besides a man] Can this be used to visualize it, or its flaws as currently conceived? What is Bechdel version 2.0? [2+ women, who talk to each other, about something besides a man, or kids, or family] Can we use this tool to develop new forms?

4. How to auto-tag? How to populate the details of each scene in a tagged format? If the original sources have subtitles, is there a way to dump the subs to SRT, and then populate the body of the WordPress post with the transcript? Or, is there a way to use Google’s transcription APIs to try and upload/subtitle/rip?
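
On the subtitle question, dumping an SRT to a plain transcript is straightforward; a rough sketch (the filename is a placeholder) that skips cue numbers and timestamps and keeps only the dialogue:

import re

def srt_to_transcript(path):
    # Strip SRT cue numbers, timestamps, and simple formatting tags,
    # returning the dialogue text only.
    lines = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.isdigit() or '-->' in line:
                continue
            lines.append(re.sub(r'<[^>]+>', '', line))
    return ' '.join(lines)

print(srt_to_transcript('TOS-S01E03.srt'))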

5. Can the Netflix taxonomy be replicated? Given the patents, can some other organization scheme be devised?

Methodology:

0. Prerequisites

The software/hardware base is Linux (Fedora 20) on Intel x86_64, using Nvidia graphics hardware and software. I.e., a contemporary rendering and film-production workstation.

Additional software is required on top of this base: for instance, a g++ development environment, ffmpeg, opencv, and openFrameworks 0.8.0.

Make sure opencv is installed.

yum install -y opencv opencv-core opencv-devel opencv-python opencv-devel-docs

See OpenCV Configuration and Optimization Notes for more information about speeding up OpenCV on fedora.

1. Digitize selected episodes for processing with digital tools

Decrypt via MakeMKV. Compress to a 3k constant-bitrate rip with HandBrake.

Using the 720p version of TOS in a Matroska media container. Downloaded SRT subtitles from fan sites. The media ends up being 960×720, 24 frames per second.

2. Quantize each episode to a select number of frames.

Make sure ffmpeg is installed.

yum install -y ffmpeg ffmpeg-devel ffmpeg-libs

Sample math as follows. Assume a fifty minute show has 24 frames a second. That is:

50 minutes x 60 seconds in a minute x 24 frames a second == 72k total frames an episode.

Assuming a one-frame-a-second sample resolution gives 3k frames for the total set of frames in TOS episode one. Use ffmpeg to create a thumbnail image every X seconds of video, here set to one image every second.

Via:

TMPDIR=tmp-1
mkdir $TMPDIR;
# extract one frame per second of the input video as numbered PNGs
ffmpeg -i $1 -f image2 -vf fps=fps=1 ${TMPDIR}/out%4d.png;

3. Sort through frames and set aside twelve frames of Kirk faces, twelve frames of Spock faces.

This is used later to train the facial recognition. Note: you definitely need hundreds or even thousands of positive samples for faces. In the case of faces, you should consider all the race and age groups, emotions, and perhaps beard styles.

For example, meet the Kirks.

And here are the Spocks.

In addition, this technique requires a negative set of images. These are images that are from the media source, but do not contain any of the faces that are going to be recognized. These are used to train the facial recognizer. Meet the non-K/S-faces.
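
A minimal sketch of training the recognizer from those hand-sorted crops. This assumes the OpenCV 2.4 Python bindings (cv2.createLBPHFaceRecognizer; the 3.x contrib module moves this under cv2.face), and the directory names are placeholders for the Kirk, Spock, and negative sets.

import glob
import cv2
import numpy as np

LABELS = {'kirk': 0, 'spock': 1, 'other': 2}

def load_set(pattern, label):
    # Load grayscale face crops, resized to a common size for training.
    images, labels = [], []
    for path in glob.glob(pattern):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        images.append(cv2.resize(img, (96, 96)))
        labels.append(label)
    return images, labels

images, labels = [], []
for name, label in LABELS.items():
    imgs, lbls = load_set('faces/%s/*.png' % name, label)
    images += imgs
    labels += lbls

# OpenCV 2.4.x API; 3.x uses cv2.face.createLBPHFaceRecognizer() instead.
recognizer = cv2.createLBPHFaceRecognizer()
recognizer.train(images, np.array(labels))
recognizer.save('ks-lbph.yml')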

4. Seed facial recognition with faces to recognize. Scan frames with facial recognition according to some input and expected-result algorithm, and come up with edit lists that can be used to select the frames that are relevant to the character-pair.

Need either timecodes or some other measure that can be dumped as an edit decision list or specific timecode marks. Some persistent data structure? Edits made.
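
A sketch of that pass, under the same assumptions as the snippets above: walk the one-frame-per-second PNGs from step 2, and since frame index N corresponds to second N of running time, dump the seconds where a Kirk or Spock face is recognized as a crude, persistable edit list.

import glob
import re
import cv2

cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
recognizer = cv2.createLBPHFaceRecognizer()   # 2.4 API, as in the training sketch
recognizer.load('ks-lbph.yml')

hits = []
for path in sorted(glob.glob('tmp-1/out*.png')):
    second = int(re.search(r'(\d+)\.png$', path).group(1))
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (96, 96))
        label, confidence = recognizer.predict(face)
        # Labels 0 and 1 are Kirk and Spock in the training sketch; lower
        # confidence values are closer matches and could be thresholded.
        if label in (0, 1):
            hits.append(second)
            break

with open('ks-hits.txt', 'w') as f:
    f.write('\n'.join(str(s) for s in hits))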

5. Decompose episode into character-pair edit vids.

Use edit decision list or specific timecode marks, as above. Automate ffmpeg to make edits.
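
A sketch of that automation, assuming the per-second hits have already been collapsed into (start, end) ranges (the example ranges below are made up); stream-copying keeps the cuts fast and lossless, and the ffmpeg concat approach linked above can then rejoin the segments into a single character-pair vid.

import subprocess

EPISODE = 'TOS-S01E03.mkv'            # placeholder source file
RANGES = [(512, 547), (1203, 1288)]   # (start, end) seconds from the edit list

for i, (start, end) in enumerate(RANGES):
    out = 'ks-cut-%02d.mkv' % i
    # -ss/-t select the segment; -c copy avoids re-encoding.
    subprocess.check_call(['ffmpeg', '-ss', str(start), '-i', EPISODE,
                           '-t', str(end - start), '-c', 'copy', out])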

6. Store in a WordPress container, one post per edit vid? Then, with another post, tie together all of a single episode’s edit vids into one linked post?

Legal

There are both copyright risks and patent opportunities in this line of inquiry.

Production Notes:

Further reading:
cinemetrics
How Netflix Reverse Engineered Hollywood, Alexis Madrigal, The Atlantic, 2014-01-02