TensorFlow Configuration and Optimization Notes

Notes for installing TensorFlow on linux, with GPU enabled.

Background

TensorFlow is the second-generation ML framework from Google. (See this comparison of deep learning software.) The current state-of-the-art image recognition models (inception-v3) use this framework.

Prerequisites

Assuming Fedora 24 with Nvidia 1060 installed, running nvidia as opposed to nouveau drivers. See Fedora 24 Notes, and RPM Fusion’s installation page for installing the Nvidia drivers. In sum,

dnf install -y xorg-x11-drv-nvidia akmod-nvidia "kernel-devel-uname-r == $(uname -r)"
dnf install xorg-x11-drv-nvidia-cuda
dnf install vulkan

Afterwards, install the development packages.

dnf install -y vulkan-devel

Download the Nvidia GPU CUDA Toolkit. The version used for this install is 8.0.61, and the network install for Fedora x86_64 was used.

This version of the CUDA Toolkit is not C++11/C++14/C++17 aware. So, be aware! One way around this is to patch the toolkit's GCC version check as below, and compile with -std=gnu++98.

117c117,118
< #if __GNUC__ > 5
---
> /* bkoz use -std=c++98 if necessary */
> #if __GNUC__ > 6
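
The guard in the patch keys off GCC's predefined version macro, and the -std=gnu++98 workaround changes what __cplusplus reports. A small sketch for checking what a given translation unit actually sees; the function names are hypothetical, and the code is deliberately C++98-clean so it also compiles under the workaround flag.

```cpp
// Sketch only: function names are made up for illustration.
#include <sstream>
#include <string>

// GCC and GCC-compatible compilers predefine __GNUC__; 0 means "not GNU".
int gnu_major_version()
{
#ifdef __GNUC__
  return __GNUC__;
#else
  return 0;
#endif
}

// __cplusplus is 199711L for C++98/C++03 and 201103L or larger from C++11 on.
bool is_cxx11_or_later()
{
  return __cplusplus >= 201103L;
}

// One-line summary, handy to print from a test program.
std::string describe_toolchain()
{
  std::ostringstream out;
  out << "GNU major " << gnu_major_version()
      << ", __cplusplus " << __cplusplus;
  return out.str();
}
```

Printing describe_toolchain() from a file built with and without -std=gnu++98 is a quick way to confirm that the flag is reaching the host compiler.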

Next, compile top-of-tree OpenCV (aka 3.2) with CUDA enabled. To do so, use the following configuration, adjusting the paths for the local system:

cmake -DVERBOSE=1 -DCMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS} -std=gnu++98 -Wno-deprecated-gpu-targets" -D BUILD_EXAMPLES=1 -D BUILD_DOCS=1 -D WITH_OPENNI=1 -D WITH_CUDA=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 -D WITH_FFMPEG=1 -D WITH_EIGEN=1 -D ENABLE_FAST_MATH=1 -D ENABLE_SSE3=1 -D ENABLE_AVX=1 -D CMAKE_BUILD_TYPE=RELEASE -D ENABLE_PRECOMPILED_HEADERS=OFF  -D CMAKE_INSTALL_PREFIX=/home/bkoz/bin/H-opencv -D OPENCV_EXTRA_MODULES_PATH=/home/bkoz/src/opencv_contrib.git/modules /home/bkoz/src/opencv.git/

Admittedly, this abuse of CMAKE_CXX_FLAGS is not optimal. Maybe EXTRA_CXX_FLAGS?

Now, for Nvidia cuDNN. The version used for this install is 5.1.

When that is done, use pip to install TensorFlow.

sudo pip install --upgrade pip;
sudo pip install tensorflow-gpu

This should output something like:

Collecting tensorflow-gpu
  Downloading tensorflow_gpu-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl (89.7MB)
    100% |████████████████████████████████| 89.7MB 19kB/s 
Requirement already satisfied: mock>=2.0.0 in /usr/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib64/python2.7/site-packages (from tensorflow-gpu)
Collecting protobuf>=3.1.0 (from tensorflow-gpu)
  Downloading protobuf-3.2.0-cp27-cp27mu-manylinux1_x86_64.whl (5.6MB)
    100% |████████████████████████████████| 5.6MB 284kB/s 
Collecting wheel (from tensorflow-gpu)
  Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB)
    100% |████████████████████████████████| 71kB 3.3MB/s 
Requirement already satisfied: pbr>=0.11 in /usr/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already satisfied: funcsigs>=1 in /usr/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-gpu)
Installing collected packages: protobuf, wheel, tensorflow-gpu
Successfully installed protobuf-3.2.0 tensorflow-gpu-0.12.1 wheel-0.29.0

After this has completed, add in Keras.

Optimization

For Nvidia GPUs, take a look at this interesting post from Netflix. In sum, add this option for the nvidia kernel module:

NVreg_CheckPCIConfigSpace=0

Notes on the Deep Deep Deepest

 

Reading

 

Approaches

  • SVM (Support Vector Machines)
  • RBM (Restricted Boltzmann Machines)
  • NN/Convolutional NN/DNN

 

Silicon Valley Fun

TensorFlow Dev Summit

February 15, 2017 @ Googleplex, Mountain View, CA

 

Software

theano

  • python-theano-doc
  • python3-theano
  • python2-theano

tensorflow

  • TensorFlow
  • github
  • Current models for facial recognition include VGG-19, VGG-16, and inception-v3. Of the listed models, inception-v3 seems to have the advantage, at least as of early 2017.

keras

scikit-learn

opencv

 

GPU Hardware

Recommended GPUs are: Nvidia GTX 1080, 1070, 980, and 970. Maximize CUDA cores.

libabigail aka C++ Instrumentation and Analysis


Background

Libabigail is shorthand for the alternative, which just so happens to be a bit of a mouthful: “GNU Application Binary Interface Generic Analysis and Instrumentation Library.”

This is a current compiler/language research topic to provide a serialized XML form of C++11 sources as compiled by GNU g++, and a way of looking at the data produced. This data can be parsed to more accurately determine ABI compatibility, to better understand code additions and changes and how these change the exported interface, to examine and prototype how C++11 language usage determines linkage, etc.

Discussions about this functionality started at the “C++ ABI BOF” at the GNU Tools Cauldron 2012 Prague. This work was created at Red Hat, by Benjamin Kosnik, Jason Merrill, and Dodji Seketeli. Some updates at 2013 Cauldron. See “Cauldron 2013 GCC ABI BOF.”

Development sources are written in mixed C++2003/C++11, hosted in git, based on GCC trunk, and tracking what will be gcc-4.9.0. The branch is administered by Dodji Seketeli.

Please feel free to try it out, but know that the state is experimental and quite raw.

Feedback and assistance are welcome.

Starting from a git working tree as described in GitMirror, check out the libabigail branch as follows:

git checkout -b libabigail origin/libabigail

To stay up to date, use:

git pull


Overview

How is this expected to be used? First, a libabigail top-level directory is either added to the GCC sources or compiled as a first step and put into some PREFIX directory. The GNU C++ compiler, g++, is configured to use this new library with:

configure .. --with-abigail=$PREFIX

Thus configured, the C++ front end is built, installed, and used as the primary compiler. All sources are compiled with an additional flag, -fdump-abi.

So, this command:

g++ -c -fdump-abi somefile.cc

Creates two files:

  • somefile.o

    The object file

  • somefile.cc.bi

    The XML instrumentation file


API/ABI

basics

Toplevel namespace is abigail.

The interface header files in libabigail:

abg-ir.h
abg-corpus.h

Doxygen is used to document the sources: try make html to generate the documentation, and look in libabigail-build-dir/doc/api/html/index.html to read it.

And then the binary interface is in libabigail.so.

notes

Each object file is compiled to a translation_unit. The sum of all translation_units is a corpus.

Compiler-generated files are read as serialized input to a translation_unit and de-serialized. And any modified form is written to an output file in serialized form.
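
As a rough mental model of the relationship between the two, here is a sketch in which a corpus simply owns a list of translation units. The types live in a made-up namespace sketch and are not libabigail's actual classes; the real interface is in the headers listed above.

```cpp
// Hypothetical model of "corpus = sum of translation units".
// These are NOT the libabigail types, just an illustration.
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// One compiled object file, reduced here to a path plus symbol names.
struct translation_unit
{
  std::string path;
  std::vector<std::string> exported_symbols;
};

// The collection of all translation units that make up a library.
class corpus
{
public:
  void add(std::shared_ptr<translation_unit> tu)
  { units_.push_back(std::move(tu)); }

  std::size_t unit_count() const { return units_.size(); }

  // Total number of exported symbols across every unit.
  std::size_t symbol_count() const
  {
    std::size_t n = 0;
    for (const auto& tu : units_)
      n += tu->exported_symbols.size();
    return n;
  }

private:
  std::vector<std::shared_ptr<translation_unit>> units_;
};

} // namespace sketch
```

In this picture, de-serializing one compiler-generated file would fill in one translation_unit, and adding each of them to the corpus gives the whole-library view.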

The interface to the C++ intermediate representation is best viewed in the class documentation.

Opinions and Wild Guesses

1. Some formatting tips.

– classes “read” as types, data, member functions. In that order.

– doxygen gives feedback on the state of the doxygen parse in the form of a log, as you run “make html.” Read this log: doxygen is a fuzzy parse. There are formatting things you can do to make it better. Do them. It’s easier to fix up these errors than to figure out why the generated HTML is poor.

2. Use of shared_ptr is intriguing.

There are not really a lot of existing usage patterns for std::shared_ptr in libstdc++ (in C++11 , , ). If you look at the page of boost idioms for shared_ptr usage:

http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/sp_techniques.html

One notices that there’s not a lot of use of shared_ptrs in interfaces. Yet in libabigail, that is very common. I’m curious about this style question.

And most usage is up for debate: see this Stack Overflow discussion about using shared_ptrs as function arguments. Should the parameters be const reference or just shared_ptr? And another.

Some interesting thinking from Microsoft on shared_ptr usage.
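
To make the parameter-passing question concrete, here is a small sketch of the three usual options and what each does to the reference count. The struct and function names are invented for the example.

```cpp
// Illustration of shared_ptr parameter styles; all names are hypothetical.
#include <memory>

struct node { int value; };

// By value: the callee holds its own reference for the duration of the
// call, so the count observed inside is one higher than the caller's.
long count_by_value(std::shared_ptr<node> p)
{ return p.use_count(); }

// By const reference: no copy is made, so the count is unchanged.
long count_by_const_ref(const std::shared_ptr<node>& p)
{ return p.use_count(); }

// When the callee does not take part in ownership at all, a plain
// reference to the pointee keeps shared_ptr out of the interface.
int read_value(const node& n)
{ return n.value; }
```

By value says "I may keep this", const reference says "I might, but usually won't", and a plain reference says "ownership is not my business". Which of these belongs in a library interface is exactly the style question above.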

3. Use of virtual binary operators is odd.

The old adage is that operators cause havoc in overload resolution. These are binary operators, but the stigma lingers. A vague feeling is not the same as something definite that’s a hard no. It’s more like the pirate code than a strict coding convention or hard rule. I would say that if you ever start to see strange bugs due to overloading, consider making these (non-operator) functions.

Otherwise, do it.
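
If strange bugs do show up, one possible fallback is sketched here with hypothetical classes (not libabigail's): keep the virtual dispatch in a named member function, and layer a single non-virtual operator on top if the operator syntax is still wanted.

```cpp
// Hypothetical classes for illustration; not taken from libabigail.
struct item
{
  explicit item(int id) : id_(id) {}
  virtual ~item() {}

  // Named virtual function: dispatches on the left-hand object only.
  virtual bool equals(const item& other) const
  { return id_ == other.id_; }

  int id_;
};

struct sized_item : item
{
  sized_item(int id, int size) : item(id), size_(size) {}

  // Only equal to another sized_item with the same id and size.
  bool equals(const item& other) const override
  {
    const sized_item* o = dynamic_cast<const sized_item*>(&other);
    return o != nullptr && id_ == o->id_ && size_ == o->size_;
  }

  int size_;
};

// One non-virtual operator for the whole hierarchy. Asking both sides
// makes the result independent of operand order.
inline bool operator==(const item& a, const item& b)
{ return a.equals(b) && b.equals(a); }
```

With item plain(1) and sized_item sized(1, 8), plain.equals(sized) is true while sized.equals(plain) is false. That asymmetry is easy to spot in a named call and easy to miss behind a virtual operator==, which dispatches on the left operand in the same way.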

LSB C++ Notes

Install

Base hardware/software config: F18 adair + x86_64.

Test harness for the Linux Standard Base. Bugzilla for the project. The C++ library parts of this are using component “lsb-test-libstdcpp” in bugzilla. The components “Cpp-T2C” and “libstdcpp” are also relevant: these cover generated test files and any reference to libstdc++.so, respectively.

There are a couple of different projects and configurations. For testing the C++ runtime on GNU linux systems (aka libstdc++), the one wanted is the LSB Distribution Testkit. This requires that the packages redhat-lsb, redhat-lsb-core, and ctags be installed.

basics:

%mkdir $src/lsb

%cd $src/lsb

%tar xvfz lsb-dist-testkit-manager-4.1.7-1.x86_64.tar.gz
./
./lsb-dist-testkit-manager/
./lsb-dist-testkit-manager/inst-config
./lsb-dist-testkit-manager/install.sh
./lsb-dist-testkit-manager/post-install.sh
./lsb-dist-testkit-manager/lsb-dist-checker-4.1.0.11-1.x86_64.rpm
./lsb-dist-testkit-manager/lsb-setup-4.1.0-1.noarch.rpm

Then, from the extracted lsb-dist-testkit-manager directory, install it:

%./install.sh

A bunch of install spewage omitted. The relevant details:

LSB Distribution Checker

  • Project homepage.
  • installed in: /opt/lsb/test/manager
  • to start: /opt/lsb/test/manager/bin/dist-checker-start.pl

Run Distribution Checker

Now, start up the web-ui:

%/opt/lsb/test/manager/bin/dist-checker-start.pl
The port '8888' will be used by the Distribution Checker's web-UI server.
If you want to change this, run /opt/lsb/test/manager/bin/dist-checker-start.pl <port>

Server started. Log file location:
/var/opt/lsb/test/manager/log/dc-server.log.8888

The start page should be opened in a browser shortly.
If it doesn't open, you can load it at http://localhost:8888/

Go to Firefox and open:

http://localhost:8888/

Select Custom Tests Mode. Hit the red “Refresh List” button. Then, check the box next to “Libstdc++ Tests” in the “Runtime Interface Tests” category. Scroll to the bottom, and hit the green “Run Selected Test” button.

This will chug a bit, and then you’ll get a webpage saying: Success and Passed.

[Screenshot: f18-lsb-original-results]

Now, what goes on here? What compiler and library are being tested?

The magic is here:

/opt/lsb/test/libstdcpp_4.1.0/run_tests

Notes

Test files for the Distribution Checker/libstdc++ are based on the GCC-4.1.x release. For the 4.1.2 release, libstdc++/testsuite ran 2131 testcases in native mode. The LSB testsuite has 1978 testcases, and 1786 are run. Jump ahead six years, and the 4.8.0 release has 5367 testcases.

Red Hat Use

See the main LSB at Red Hat page on the wiki. It’s got a lot of great info about current test results.

ABI Notes, commentary

Looking at past releases, zoom in on a couple in particular:

  • gcc-4.2.1, fixed in gcc-4.2.2, GCC PR 33678. Problem was re-ordering of vtables in libsupc++/typeinfo. In std::type_info, the virtual member functions must be ordered as: __is_pointer_p, __is_function_p, __do_catch, __do_upcast. Found via LSB vtable change detector.
  • gcc-4.4.x and above support for exception propagation, GCC PR 40296. Look at __cxa_get_exception_ptr backports. For std::exception_ptr, std::current_exception, __cxa_get_exception_ptr, backports of this functionality are required if the base system is to deal with C++11 calls. So, back-port exception handling changes to RHEL 5 systems (RHEL 6 supports this, as it is based on 4.4).
  • gcc-4.7.x, including pre-release.
    • For std::list, there is an added data member, _M_size.  Reverted, fixed in 4.7.2. This is problematic when std::list is part of a function signature.
    • For std::pair, an addressable change in C++11 but not C++98, due to the addition of a non-trivial move constructor. Fixed in 4.7.2. That meant in C++98, std::pair was passed in register. In C++11, it was not. Note that this is unrelated to the known C++98/C++11 incompatibility with std::make_pair found by James Dennett (see Appendix D: code that explicitly specifies make_pair<template_arguments> will break in C++11; replace with std::pair<template_arguments> or just let the compiler deduce). Some commentary from Andreas Jaeger’s blog post about it, the openSUSE bug 767666, GCC changes reference to it.
    • For std::unordered_map, and the rest of the C++11 unordered associative containers, the situation is a bit different as this is a C++11 only component. Internal changes change the data model. Deemed incompatible, but only for C++11, so allowed per C++11 ABI policy.
    • For std::condition_variable_any, added data member _M_mutex from this change. Deemed incompatible, but only for C++11, so allowed per C++11 ABI policy.
    • For std::num_get, vtable change from this change. Fixed via this change, pre-release of 4.7.0. Found via DTS-1.0 testing, running hybrid libstdc++.dts.so with base system running gcc-4.4 libstdc++.so.
    • For std::complex, there are signature changes in member functions, GCC PR 53429. This is a C++98/C++11 compatibility question involving vague linkage and template in-lining. Unresolved.
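
Two of the items above (the added std::list data member, and the non-trivial move constructor in std::pair) are easy to reproduce in miniature. The sketch below uses stand-in structs in a made-up namespace, not the real libstdc++ types, and checks the consequences at compile time.

```cpp
// Stand-in types for illustration only; these are not the libstdc++
// definitions, just the same kinds of change in miniature.
#include <cstddef>
#include <type_traits>

namespace sketch {

// "Before": a list header with two links.
struct list_header_old { void* next; void* prev; };

// "After": an added size member. sizeof changes, so objects laid out by
// one version cannot be handed to code compiled against the other.
struct list_header_new { void* next; void* prev; std::size_t size; };

static_assert(sizeof(list_header_new) > sizeof(list_header_old),
              "adding a data member changes the object layout");

// A pair-like type with only implicit special members is trivially
// copyable, which lets the calling convention pass it in registers.
struct pair_trivial { int first; int second; };

// Adding a user-provided move constructor makes the type non-trivial.
// Under the Itanium C++ ABI used by GCC, such a type is passed by
// invisible reference instead.
struct pair_nontrivial
{
  int first;
  int second;
  pair_nontrivial(int a, int b) : first(a), second(b) {}
  pair_nontrivial(const pair_nontrivial&) = default;
  pair_nontrivial(pair_nontrivial&& o) noexcept
  : first(o.first), second(o.second) {}
};

static_assert(std::is_trivially_copyable<pair_trivial>::value,
              "implicit special members keep the type trivial");
static_assert(!std::is_trivially_copyable<pair_nontrivial>::value,
              "a user-provided move constructor makes it non-trivial");

} // namespace sketch
```

Neither change touches a function name or signature, which is why they slip past a casual source-level review and need ABI-level tooling to catch.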

Also of note is the LSB tracker bug for C++11 support.

General C++11 ABI policy, as per GCC wiki

Some links to cool tools:

pkgdiff, like rpmdiff but open.

ACC, ABI Compliance Checker, lsb.

Combining all this, look at a compatibility report for boost.

Notes on Generative, openFrameworks

Background

Processing and openFrameworks are related. For Processing, see Casey Reas at UCLA.

For a beginning, see: Hello, Processing!

Some people: Casey Reas (west), Ben Fry (east). Via John Maeda. Chris Reilly (west), Chandler McWilliams

Setup

Base is ulloa config, Fedora 18 on x86_64. Secondary is Fedora 17 on x86_64. Note, you’ll need to have a video card and driver suitable for running OpenGL. Intel graphics are easy, Nvidia can be done but will need to use proprietary drivers, and not the default nouveau driver.

Follow notes from openframeworks site for linux/64 install, starting with going into the Fedora scripts directory:

cd /home/bkoz/src/openframeworks.v0073_linux64/scripts/linux/fedora

And then:

sudo ./install_codeblocks.sh

sudo ./install_codecs.sh

These should install some packages, if the codeblocks IDE and some of the development packages for audio or video codecs aren’t already installed. Then, do this script, which may start compiling things:

sudo ./install_dependencies.sh

The cairo includes need a slight work-around: either use CXXFLAGS="-I/usr/include/cairo" or apply the following small patch:

*** libs/openFrameworks/graphics/ofCairoRenderer.h.orig 2012-12-28 15:48:06.649358899 -0800
--- libs/openFrameworks/graphics/ofCairoRenderer.h      2012-12-28 15:48:59.502292659 -0800
***************
*** 1,10 ****
#pragma once

! #include "cairo-features.h"
! #include "cairo-pdf.h"
! #include "cairo-svg.h"
! #include "cairo.h"
#include 
#include 
#include "ofMatrix4x4.h"
--- 1,10 ----
#pragma once

! #include "cairo/cairo-features.h"
! #include "cairo/cairo-pdf.h"
! #include "cairo/cairo-svg.h"
! #include "cairo/cairo.h"
#include 
#include 
#include "ofMatrix4x4.h"

After applying the patch, the dependency-making script above should complete without error. It will probably end with something that looks like:

to launch the application

cd bin
./projectGeneratorSimple

Instead, go up a level and run the

buildAllExamples.sh

script, and then the

testAllExamples.sh

script.

If everything has gone correctly, then the last script (testAllExamples.sh) will throw up window after window of delicious OF candy, fun to watch. To go on to the next example, close the topmost window.

If you get to this point, then the preliminary setup should be correct.

linux tcp examples failing, start tcp server? kill firewall? punch holes for openframeworks ports?


Examine Examples

Open up the example project files within the Codeblocks IDE.


Start hacking

Put all projects into the compiled openframeworks directory, in the “apps” subdirectory.


See

0. Form+Code website, links

1. Generative Design website, code page table of contents.

2. openprocessing website, community site.

3. openFrameworks website.

4. ofauckland, ofxCairo examples for linux

5. creativeapplications.net

6. Golan Levin is or was an openframeworks user

7. memo’s libs (MSA libs) on github. Download, then copy into your addons directory.

8. toxiclibs, another good lib

9. podcast on creative coding

10. cinder, which is another, alternative framework not based on processing.

11. generative.net, another good meta-site

Static Linking for C++ Shared Objects

Prerequisites

Assuming recent-vintage linux with standard C/C++ developer setup, including g++, readelf, ld, gold etc., etc. Reference platform is Fedora 17 (F17) using gcc-4.7.2. Linking static libraries may require the installation of non-default packages. For C/C++, this means:

yum install -y glibc-static libstdc++-static

Problem Description

Create a shared library that uses C++ internally, but with an external C interface and all dependent libraries statically-linked.
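
For orientation, the general shape of such an interface is sketched below, separate from the example used in the rest of this section. The function names are hypothetical. The point is that only plain C types cross the boundary and no C++ exception is allowed to escape through it.

```cpp
// Sketch of a C interface over a C++ implementation; names are made up.
#include <stdexcept>
#include <string>

namespace {

// Internal C++ detail: free to use std::string and to throw.
std::string make_greeting(const char* name)
{
  if (name == nullptr)
    throw std::invalid_argument("name is null");
  return std::string("hello, ") + name;
}

} // anonymous namespace

// Exported C entry point: C types only, errors reported as a return
// code. Returns the greeting length in bytes, or -1 on any failure.
extern "C" int greeting_length(const char* name)
{
  try
    {
      return static_cast<int>(make_greeting(name).size());
    }
  catch (...)
    {
      // Never let an exception unwind into a C caller.
      return -1;
    }
}
```

A C caller then only needs the prototype int greeting_length(const char*); and never sees the C++ runtime directly, which is what makes statically linking that runtime into the shared object attractive.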

Some quick background/primer on creating a shared library with current GNU tools on recent linux. Consider the default case, making a shared library.

Take the first source file, a generic library function implemented in C++ and exported via extern "C".

// filename: 28811.cc
#include <iostream>
#include <string>

extern "C" void announce()
{
  const std::string s("oooohllalala");
  std::cout << s << std::endl;
}

And then the second file, which uses it.

// filename: 28811-loop.cc
extern "C" void announce();

int main()
{
  announce();
  return 0;
}

Compile via:

g++ -shared -fPIC -O2 -g 28811.cc -o libannounce.so

g++ -g -O2 -L. 28811-loop.cc -lannounce -o 28811.exe

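To run the executable, the dynamic loader must be able to find libannounce.so: either set LD_LIBRARY_PATH to the directory containing it, or have LD_RUN_PATH set to that directory when linking so the path is recorded in the executable. Then:

./28811.exe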

Which should print:

oooohllalala

Now that that is confirmed to be working, here’s a closer look at the library file libannounce.so that was produced.

A look at the linked dependencies:

%ldd libannounce.so 
	linux-vdso.so.1 =>  (0x00007fffed1ff000)
	libstdc++.so.6 => /mnt/share/bld/gcc.git-trunk/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6 (0x00007fd87e7e5000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fd87e4c3000)
	libgcc_s.so.1 => /mnt/share/bld/gcc.git-trunk/gcc/libgcc_s.so.1 (0x00007fd87e4ad000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fd87e0f6000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003b72200000)

This shows that the support library has a dependent link on the C language runtime (i.e. libc.so and libm.so), the GNU C/C++ language support library (libgcc_s.so), the C++ language runtime (libstdc++.so), the kernel-provided virtual DSO (linux-vdso.so), and the linux dynamic loader (ld-linux-x86-64.so).

A look at what’s in the library:

readelf -s libannounce.so | grep announce
    25: 0000000000000d20   226 FUNC    GLOBAL DEFAULT   11 announce
    61: 0000000000000d20   226 FUNC    GLOBAL DEFAULT   11 announce

This is our baseline setup. Now, try some variations.

The first variation: try with libgcc_s.so statically-linked.

g++ -shared -static-libgcc -fPIC -O2 -g 28811.cc -o libannounce.so

And then look at the library produced:

%ldd libannounce.so 
	linux-vdso.so.1 =>  (0x00007fff075ff000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f27ceeb9000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f27cebbd000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f27ce806000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003b72200000)
	libgcc_s.so.1 => /mnt/share/bld/gcc.git-trunk/gcc/libgcc_s.so.1 (0x00007f27ce7f0000)

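libgcc_s.so is no longer a direct dependency, as expected. The libgcc_s.so.1 entry that still shows up is pulled in indirectly by libstdc++.so.6, since ldd resolves dependencies recursively. Use readelf -d libannounce.so to see only the direct (NEEDED) entries.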

The second variation: try with libgcc_s.so and libstdc++.so statically-linked.

So:

g++ -shared -static-libgcc -static-libstdc++ -fPIC -O2 -g 28811.cc -o libannounce.so

Gives:

%ldd libannounce.so 
	linux-vdso.so.1 =>  (0x00007fff819ff000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fdba493a000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fdba4582000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003b72200000)

As expected.

Let’s try a third variation, and go for a static link of the C libraries too. I.e.:

g++ -shared -static-libgcc -static-libstdc++ -fPIC -O2 -g 28811.cc -Wl,-Bstatic -lc -lm -o libannounce.so

This doesn’t work:

g++ -shared -static-libgcc -static-libstdc++ -fPIC -O2 -g 28811.cc -Wl, -static -lc -lm -o libannounce.so
/usr/bin/ld: /mnt/share/bin/H-x86_64-gcc/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.0/crtbeginT.o: relocation R_X86_64_32 against `__TMC_END__' can not be used when making a shared object; recompile with -fPIC
/mnt/share/bin/H-x86_64-gcc/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.0/crtbeginT.o: could not read symbols: Bad value
collect2: error: ld returned 1 exit status


Known Bugs

PR28811
PR54482

Older bugs:

PR52689

Some of these bugs are many years old, and the build machinery has changed since they were first filed.

Basics on impacted sources:

src/c++98/compatibility.cc
src/c++98/compatibility-list-2.cc
src/c++11/compatibility-c++0x.cc
src/c++11/compatibility-atomic-c++0x.cc
src/c++11/compatibility-thread-c++0x.cc

and configure via PIC_CXXFLAGS

src/c++98/Makefile.am
src/c++11/Makefile.am

testing via

./lib/libstdc++.exp
     [list "incdir=$srcdir" "additional_flags=-w -shared 
testsuite/17_intro/static.cc:
     { dg-options "-static-libstdc++ -std=gnu++11" }
testsuite/17_intro/static_pic.cc:
     { dg-options "-shared -fPIC -static-libgcc -static-libstdc++"

It looks like the C++ compiler testsuite has a bunch of -fPIC tests, some -fPIC -fvisibility=hidden tests (mostly in the context of local statics), and some -static tests.

Questions

1) Are these expectations valid for C too?

First, what the GCC Manual says about the specific link options. Use “N1975 Dynamic Shared Objects: Survey and Issues” as background.

Then figure out the uses for:

  • -static
  • partial static shortcuts: -static-libgcc, -static-libstdc++
  • partial static via:
    -Wl,-static -lssl -lcrypto

2) How to check whether an object has relocations?

readelf -r foo.o

Ian Taylor has a nice description of relocations.

Template-fu, Zen Linking

Welcome, weary traveller.

Please, enter the dojo. Have some tea. Sit, and listen to me expound on the state of linking today.

There are a number of new techniques for linking in C++11. Some are not widely known. Some require long nights, on cold drafty mountain tops to fully master.

The new forms:

1. Extern Template

When you want white. Nothing. A truly private implementation, with only the API exported. Use extern template on class specializations to tell the compiler to not implicitly instantiate any symbols when the class is used by user code. For template functions as well.

Smartly done on forward-declarations, after the main class has been defined, making them post-declarations.  Pretty much anything goes: the syntax is the same as the syntax for explicit instantiations. Precisely because the two are a matched pair: with the power to prohibit instantiations comes the responsibility to explicitly instantiate them in some form. Wax on, wax off.

With C++11, extern template is portable. GNU C++ users have used it widely since 2002.
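
A minimal sketch of the matched pair, squeezed into one file for brevity. In practice the template and the extern template lines would sit in a header, and the explicit instantiation definitions in exactly one source file of the library. The names box and twice are made up.

```cpp
// Header part: the template definitions users see.
template<typename T>
struct box
{
  explicit box(const T& v) : value(v) {}
  T get() const { return value; }
  T value;
};

template<typename T>
T twice(T x) { return x + x; }

// Header part, after the definitions ("post-declarations"): explicit
// instantiation declarations. User code that includes this will not
// implicitly instantiate these specializations; it expects the library
// to provide them.
extern template struct box<int>;
extern template int twice<int>(int);

// Library source file part: the matching explicit instantiation
// definitions. Same syntax, minus the extern. Wax on, wax off.
template struct box<int>;
template int twice<int>(int);
```

If the definitions in the last two lines are forgotten, user code can fail at link time with undefined references to box<int> members or twice<int>, which is the "responsibility" half of the bargain.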

2. -fvisibility=hidden

And why it’s different from extern template. There seems to be a lot of confusion out there, about this. And let’s face it, the syntax is atrocious! Absolutely abominable.

GNU extensions, apply as attribute on namespaces.
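
A sketch of what the attribute form looks like, assuming GCC or Clang on an ELF platform. The macro and function names are hypothetical. The difference from extern template is that visibility decides which symbols a shared object exports, while extern template decides where an instantiation is emitted in the first place.

```cpp
// Assumes GCC or Clang targeting ELF; names are made up for illustration.
#define SKETCH_EXPORT __attribute__((visibility("default")))

// Everything declared in this namespace gets hidden visibility, whatever
// the command-line default is. Hidden symbols are usable inside the
// shared object but are not exported from it.
namespace detail __attribute__((visibility("hidden")))
{
  int scale(int x) { return 3 * x; }
}

// Public entry point: stays exported even when the file is compiled
// with -fvisibility=hidden.
extern "C" SKETCH_EXPORT int scaled_answer(int x)
{
  return detail::scale(x) + 1;
}
```

After building this into a shared object with -shared -fPIC -fvisibility=hidden, readelf -s on the result should show scaled_answer as GLOBAL DEFAULT, while the detail::scale symbol should not appear among the exported dynamic symbols.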

3. constexpr

Mantis-style.

4. Namespace association. Tarsier-style.

But I will not bore you, weary traveller. Sit and enjoy your beverage. There will be plenty of time to talk about new techniques and methods later, after you have rested from your voyage.