TensorFlow Configuration and Optimization Notes

Notes for installing TensorFlow on Linux, with GPU support enabled.

Background

TensorFlow is the second-generation ML framework from Google. (See this comparison of deep learning software.) The current state-of-the-art image recognition models (inception-v3) use this framework.

Prerequisites

Assuming Fedora 24 with Nvidia 1060 installed, running nvidia as opposed to nouveau drivers. See Fedora 24 Notes, and RPM Fusion’s installation page for installing the Nvidia drivers. In sum,

dnf install -y xorg-x11-drv-nvidia akmod-nvidia "kernel-devel-uname-r == $(uname -r)"
dnf install xorg-x11-drv-nvidia-cuda
dnf install vulkan

Afterwards, install some development packages.

dnf install -y vulkan-devel

Download the Nvidia GPU CUDA Toolkit. The version used for this install is 8.0.61, and the network install for Fedora x86_64 was used.

This version of the CUDA Toolkit is not C++11/C++14/C++17 aware. So, be aware! One way around this is to patch the toolkit's GCC version check as in the diff below, and use -std=gnu++98.

117c117,118
< #if __GNUC__ > 5
---
> /* bkoz use -std=c++98 if necessary */
> #if __GNUC__ > 6

Next, compile top-of-tree OpenCV (aka 3.2) with CUDA enabled. To do so, use the following configure line, modified for the paths on your system:

cmake -DVERBOSE=1 -DCMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS} -std=gnu++98 -Wno-deprecated-gpu-targets" -D BUILD_EXAMPLES=1 -D BUILD_DOCS=1 -D WITH_OPENNI=1 -D WITH_CUDA=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 -D WITH_FFMPEG=1 -D WITH_EIGEN=1 -D ENABLE_FAST_MATH=1 -D ENABLE_SSE3=1 -D ENABLE_AVX=1 -D CMAKE_BUILD_TYPE=RELEASE -D ENABLE_PRECOMPILED_HEADERS=OFF  -D CMAKE_INSTALL_PREFIX=/home/bkoz/bin/H-opencv -D OPENCV_EXTRA_MODULES_PATH=/home/bkoz/src/opencv_contrib.git/modules /home/bkoz/src/opencv.git/

Admittedly, this abuse of CMAKE_CXX_FLAGS is not optimal. Maybe EXTRA_CXX_FLAGS?

Now, for Nvidia cuDNN. The version used for this install is 5.1.

When that is done, use pip to install TensorFlow.

sudo pip install --upgrade pip;
sudo pip install tensorflow-gpu

This should output something like:

Collecting tensorflow-gpu
  Downloading tensorflow_gpu-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl (89.7MB)
    100% |████████████████████████████████| 89.7MB 19kB/s 
Requirement already satisfied: mock>=2.0.0 in /usr/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib64/python2.7/site-packages (from tensorflow-gpu)
Collecting protobuf>=3.1.0 (from tensorflow-gpu)
  Downloading protobuf-3.2.0-cp27-cp27mu-manylinux1_x86_64.whl (5.6MB)
    100% |████████████████████████████████| 5.6MB 284kB/s 
Collecting wheel (from tensorflow-gpu)
  Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB)
    100% |████████████████████████████████| 71kB 3.3MB/s 
Requirement already satisfied: pbr>=0.11 in /usr/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already satisfied: funcsigs>=1 in /usr/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-gpu)
Installing collected packages: protobuf, wheel, tensorflow-gpu
Successfully installed protobuf-3.2.0 tensorflow-gpu-0.12.1 wheel-0.29.0

After this has completed, add in Keras.

Optimization

For Nvidia GPUs, take a look at this interesting post from Netflix. In sum, add the following option for the nvidia kernel module:

NVreg_CheckPCIConfigSpace=0

libabigail aka C++ Instrumentation and Analysis


Background

Libabigail is shorthand for the alternative, which just so happens to be a bit of a mouthful: “GNU Application Binary Interface Generic Analysis and Instrumentation Library.”

This is a current compiler/language research topic to provide a serialized XML form of C++11 sources as compiled by GNU g++, and a way of looking at the data produced. This data can be parsed to more accurately determine ABI compatibility, to better understand code additions and changes and how these change the exported interface, to examine and prototype how C++11 language usage determines linkage, etc.

Discussions about this functionality started at the “C++ ABI BOF” at the GNU Tools Cauldron 2012 in Prague. This work was created at Red Hat by Benjamin Kosnik, Jason Merrill, and Dodji Seketeli. There were some updates at the 2013 Cauldron; see “Cauldron 2013 GCC ABI BOF.”

Development sources are written in mixed C++2003/C++11, hosted in git, based on GCC trunk, and tracking what will be gcc-4.9.0. The branch is administered by Dodji Seketeli.

Please feel free to try it out, but know that the state is experimental and quite raw.

Feedback and assistance are welcome.

Starting from a git working tree as described in GitMirror, add the libabigail repository as follows:

git checkout -b libabigail origin/libabigail

To stay up to date, use:

git pull


Overview

How is this expected to be used? First, a libabigail top-level directory is either added to the GCC sources or compiled as a first step and put into some PREFIX directory. The GNU C++ compiler, g++, is configured to use this new library with:

configure .. --with-abigail=$PREFIX

Thus configured, the C++ front end is built, installed, and used as the primary compiler. All sources are compiled with an additional flag, -fdump-abi.

So, this command:

g++ -c -fdump-abi somefile.cc

Creates two files:

  • somefile.o

    The object file

  • somefile.cc.bi

    The XML instrumentation file


API/ABI

basics

The top-level namespace is abigail.

The interface header files in libabigail:

abg-ir.h
abg-corpus.h

Doxygen is used to document the sources: try make html to generate, and look in libabigail-build-dir/doc/api/html/index.html to read it.

And then the binary interface is in libabigail.so.

notes

Each object file is compiled to a translation_unit. The sum of all translation_units is a corpus.

Compiler-generated files are read as serialized input to a translation_unit and de-serialized. And any modified form is written to an output file in serialized form.

The interface to the C++ intermediate representation is best viewed in the class documentation.

Opinions and Wild Guesses

1. Some formatting tips.

– classes “read” as types, data, member functions. In that order.

– doxygen gives feedback on the state of the doxygen parse in the form of a log, as you run “make html.” Read this log: doxygen is a fuzzy parse. There are formatting things you can do to make it better. Do them. It’s easier to fix up these errors than to figure out why the generated HTML is poor.

2. Use of shared_ptr is intriguing.

There are not really a lot of existing usage patterns for std::shared_ptr in libstdc++ (in C++11 , , ). If you look at the page of boost idioms for shared_ptr usage:

http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/sp_techniques.html

One notices that there’s not a lot of use of shared_ptrs in interfaces. Yet in libabigail, that is very common. I’m curious about this style question.

And most usage is up for debate: see this Stack Overflow discussion about using shared_ptrs as function arguments. Should the parameters be const reference or just shared_ptr? And another.

Some interesting thinking from Microsoft on shared_ptr usage.
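The by-value versus const-reference question can be made concrete by watching the reference count. What follows is a minimal sketch, not libabigail code: node, count_by_value, and count_by_const_ref are hypothetical names made up for illustration.

```cpp
#include <memory>

// Hypothetical type for illustration; not part of libabigail.
struct node { int value; };

// By value: the parameter is its own shared_ptr, so for the duration of
// the call the count is one higher than what the caller holds.
long count_by_value(std::shared_ptr<node> p)
{ return p.use_count(); }

// By const reference: no copy is made, so the count is unchanged.
long count_by_const_ref(const std::shared_ptr<node>& p)
{ return p.use_count(); }
```

With a single owner, passing it to count_by_value reports 2 and passing it to count_by_const_ref reports 1. The by-value form pays for an atomic increment and decrement on each call, which is the usual argument for const reference when the callee does not keep a copy.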

3. Use of virtual binary operators is odd.

The old adage is that operators cause havoc in overload resolution. These are binary operators, but the stigma lingers. A vague feeling is not the same as something definite that’s a hard no. It’s more like the pirate code than a strict coding convention or hard rule. I would say that if you ever start to see strange bugs due to overloading, consider making these (non-operator) functions.

Otherwise, do it.
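For comparison, here is one way the non-operator alternative could look: a named virtual function does the work, and a plain non-member operator forwards to it. This is a sketch under assumptions, with base_node and ptr_node as hypothetical stand-ins rather than libabigail's actual classes.

```cpp
#include <typeinfo>

// Hypothetical hierarchy for illustration; not libabigail's classes.
struct base_node
{
  explicit base_node(int size) : size_(size) {}
  virtual ~base_node() {}

  // Named virtual function: requires the same dynamic type and size.
  virtual bool equals(const base_node& other) const
  { return typeid(*this) == typeid(other) && size_ == other.size_; }

  int size_;
};

struct ptr_node : base_node
{
  ptr_node(int size, int pointee) : base_node(size), pointee_(pointee) {}

  bool equals(const base_node& other) const override
  {
    if (!base_node::equals(other))
      return false;
    // Safe: the base check established that other has the same
    // dynamic type as *this, which is at least a ptr_node.
    return pointee_ == static_cast<const ptr_node&>(other).pointee_;
  }

  int pointee_;
};

// Non-virtual, non-member operator: only one overload takes part in
// overload resolution, and dispatch happens inside equals.
inline bool operator==(const base_node& l, const base_node& r)
{ return l.equals(r); }
```

The design choice here is that overload resolution only ever sees a single operator==, so any surprises are confined to ordinary virtual dispatch.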

Compiler Feature Testing

SG10 is a working group for C++ users to try and figure out how to port between C++11 and C++14. It’s part of the ISO C++ standardization plan for post-C++11 work.

SG10 first met in Bristol, United Kingdom during the spring of 2013. There have been two teleconferences, and an archived mailing list has been set up for discussion.

The goal is to have some consensus for an approach that vendors can use as C++14 is implemented. In particular, draft recommendations are due prior to the Chicago meeting, which starts on August 23, 2013.

Background

Below is a summary of the macro interface for relevant languages (C and C++), operating systems (POSIX), and notable implementations (GNU, EDG, Clang, Boost).

C

6.10.8 Predefined macro names

6.10.8.1 Mandatory macros
Required:

__DATE__
__FILE__
__LINE__
__STDC__
__STDC_HOSTED__
__STDC_VERSION__ (201ymmL)
__TIME__

6.10.8.2 Environment macros
Conditionally defined:

__STDC_ISO_10646__
__STDC_MB_MIGHT_NEQ_WC__
__STDC_UTF_16__ (char16_t types are UTF-16 encoded)
__STDC_UTF_32__ (char32_t types are UTF-32 encoded)

6.10.8.3 Conditional feature macros

__STDC_ANALYZABLE__ (1 iff conforms to Annex L)
__STDC_IEC_559__ (1 iff IEC 60559 floating point)
__STDC_IEC_559_COMPLEX__ (1 iff IEC 60559 complex)
__STDC_LIB_EXT1__ (201ymmL Annex K, bounds checking interfaces)
__STDC_NO_ATOMICS__ (1 iff no atomics)
__STDC_NO_COMPLEX__ (1 iff no complex.h types)
__STDC_NO_THREADS__ (1 iff no threads)
__STDC_NO_VLA__ (1 iff no VLA)

POSIX

See the unistd.h include file.

2.1.3 POSIX Conformance

_POSIX_VERSION (200809L iff all mandatory functions and headers)

Defined with a value other than -1:

_POSIX_CHOWN_RESTRICTED
_POSIX_NO_TRUNC (pathname components longer than NAME_MAX are an error)

Defined with value = 200809L

_POSIX_ASYNCHRONOUS_IO
_POSIX_BARRIERS
_POSIX_CLOCK_SELECTION
_POSIX_MAPPED_FILES
_POSIX_MEMORY_PROTECTION
_POSIX_READER_WRITER_LOCKS
_POSIX_REALTIME_SIGNALS
_POSIX_SEMAPHORES
_POSIX_SPIN_LOCKS
_POSIX_THREAD_SAFE_FUNCTIONS
_POSIX_THREADS
_POSIX_TIMEOUT
_POSIX_TIMERS

Defined with a value > 0

_POSIX_JOB_CONTROL
_POSIX_SAVED_IDS

Defined optionally (-1 means no, 0 maybe, >0 means yes)

_POSIX_ADVISORY_INFO
_POSIX_CPUTIME
_POSIX_FSYNC
_POSIX_IPV6
_POSIX_MEMLOCK
_POSIX_MEMLOCK_RANGE
_POSIX_MESSAGE_PASSING
_POSIX_MONOTONIC_CLOCK
_POSIX_PRIORITIZED_IO
_POSIX_PRIORITY_SCHEDULING
_POSIX_RAW_SOCKETS
_POSIX_SHARED_MEMORY_OBJECTS
_POSIX_SPAWN
_POSIX_SPORADIC_SERVER
_POSIX_SYNCHRONIZED_IO
_POSIX_THREAD_ATTR_STACKADDR
_POSIX_THREAD_CPUTIME
_POSIX_THREAD_ATTR_STACKSIZE
_POSIX_THREAD_PRIO_INHERIT
_POSIX_THREAD_PRIO_PROTECT
_POSIX_THREAD_PRIORITY_SCHEDULING
_POSIX_THREAD_PROCESS_SHARED
_POSIX_THREAD_SPORADIC_SERVER
_POSIX_TRACE
_POSIX_TRACE_EVENT_FILTER
_POSIX_TRACE_INHERIT
_POSIX_TRACE_LOG
_POSIX_TYPED_MEMORY_OBJECTS

User defined to specify version

_POSIX_C_SOURCE
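The three-state convention for the optional macros (-1 means no, 0 maybe, >0 means yes) can be turned into a small check that falls back to sysconf for the "maybe" case. A minimal sketch, assuming a POSIX host; have_monotonic_clock is a hypothetical helper name, and _POSIX_MONOTONIC_CLOCK is just one option picked from the list above.

```cpp
#include <unistd.h>

// Hypothetical helper: true when the monotonic clock option is usable.
// Undefined or -1 means unsupported, a positive value means always
// supported, and 0 means the answer has to come from sysconf at run time.
bool have_monotonic_clock()
{
#if defined(_POSIX_MONOTONIC_CLOCK) && _POSIX_MONOTONIC_CLOCK > 0
  return true;
#elif defined(_POSIX_MONOTONIC_CLOCK) && _POSIX_MONOTONIC_CLOCK == 0
  return sysconf(_SC_MONOTONIC_CLOCK) > 0;
#else
  return false;
#endif
}
```

The same shape should work for the other optional symbols by swapping in the matching _SC_ name.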

C++

As of the post-C++11 draft standard, N3485, the current lay of the land is divided into:

16.8 Predefined macro names [cpp.predefined]

Required:

__cplusplus (201103L)
__DATE__ ("Mmm dd yyyy")
__FILE__
__LINE__
__STDC_HOSTED__ (1 iff hosted, 0 iff freestanding)
__TIME__ ("hh:mm:ss")

Conditionally defined:

__STDC__
__STDC_MB_MIGHT_NEQ_WC__
__STDC_VERSION__
__STDC_ISO_10646__
__STDCPP_STRICT_POINTER_SAFETY__ (1 iff strict pointer safety as per 3.7.4.3)
__STDCPP_THREADS__ (1 iff more than single thread supported)

Some other notable implementation interfaces follow.

gnu

From “The C Preprocessor” manual, section 3.7 on Predefined Macros.

compiler

__GNUC__
__GNUC_MINOR__
__GNUC_PATCHLEVEL__
__GNUG__
__VERSION__
__OPTIMIZE__ (iff -On, where n > 0)
__NO_INLINE__ (if -fno-inline, or not optimizing)
__GNUC_GNU_INLINE__
__GNUC_STDC_INLINE__
__CHAR_UNSIGNED__
__SIZE_TYPE__ + others (correct underlying type)
__SIZEOF_INT__ + others (size of type)
__DEPRECATED
__EXCEPTIONS (if -fexceptions)
__GXX_WEAK (1 if comdat, weak supported)
__GXX_RTTI (if -frtti)
__PRETTY_FUNCTION__
__STRICT_ANSI__ (no GNU extensions)
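A sketch of how these compiler macros tend to get used: a combined version number built from the three __GNUC__ parts, and a throw-or-abort switch keyed on __EXCEPTIONS, in the spirit of _GLIBCXX_THROW_OR_ABORT. The SKETCH_ names and checked_divide are hypothetical. Note the assumption that a compiler which does not define __EXCEPTIONS is treated as having exceptions off, which is wrong for non-GNU-compatible compilers.

```cpp
#include <cstdlib>
#include <stdexcept>

// Single comparable number, e.g. 40900 for gcc-4.9.0; 0 if not GNU-like.
#if defined(__GNUC__)
# define SKETCH_GCC_VERSION \
    (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
# define SKETCH_GCC_VERSION 0
#endif

// Throw when -fexceptions is in effect, abort otherwise.
#if defined(__EXCEPTIONS)
# define SKETCH_THROW_OR_ABORT(exc) (throw (exc))
#else
# define SKETCH_THROW_OR_ABORT(exc) (std::abort())
#endif

// Hypothetical function that reports an error either way.
int checked_divide(int num, int den)
{
  if (den == 0)
    SKETCH_THROW_OR_ABORT(std::domain_error("division by zero"));
  return num / den;
}
```

The same pattern applies to __GXX_RTTI for guarding typeid and dynamic_cast.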

runtime

_GNU_SOURCE
_GLIBCXX_CONSTEXPR 
_GLIBCXX_USE_CONSTEXPR 
_GLIBCXX_NOEXCEPT
_GLIBCXX_USE_NOEXCEPT
_GLIBCXX_THROW
_GLIBCXX_THROW_OR_ABORT
_GLIBCXX_WEAK_DEFINITION
_GLIBCXX_USE_DECIMAL_FLOAT
_GLIBCXX_USE_NANOSLEEP
_GLIBCXX_USE_WCHAR_T
_GLIBCXX_USE_LONG_LONG
_GLIBCXX_USE_C99_STDINT_TR1
__try
__throw
__throw_exception_again

boost

BOOST_NO_EXCEPTIONS
BOOST_BUILT_IN_EXCEPTIONS_MISSING_WHAT
BOOST_NO_TYPEID
BOOST_NO_RTTI

BOOST_HAS_LONG_LONG
BOOST_HAS_DECLTYPE
BOOST_HAS_RVALUE_REFS
BOOST_HAS_STATIC_ASSERT
BOOST_HAS_VARIADIC_TMPL

BOOST_NO_CXX11_TEMPLATE_ALIASES

BOOST_NO_CXX11_CONSTEXPR
BOOST_NO_CXX11_NOEXCEPT
BOOST_NO_CXX11_NULLPTR
BOOST_NO_CXX11_RANGE_BASED_FOR
BOOST_NO_CXX11_UNIFIED_INITIALIZATION_SYNTAX

BOOST_NO_CXX11_SCOPED_ENUMS
BOOST_NO_CXX11_CHAR16_T
BOOST_NO_CXX11_CHAR32_T
BOOST_NO_CXX11_EXPLICIT_CONVERSION_OPERATORS
BOOST_NO_CXX11_LAMBDAS
BOOST_NO_CXX11_LOCAL_CLASS_TEMPLATE_PARAMETERS
BOOST_NO_CXX11_RAW_LITERALS
BOOST_NO_CXX11_UNICODE_LITERALS
BOOST_NO_CXX11_USER_DEFINED_LITERALS

clang

See Feature Checking Macros. Uses a generalized mechanism via “builtin function-like” macros.

__has_builtin(x)
__has_feature(x)
__has_feature(cxx_exceptions)
__has_feature(cxx_rtti)
__has_extension(x)
__has_attribute(x)
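Because these are function-like, code that also has to build with compilers lacking them usually defines fallbacks first, so that every query evaluates to false there. A minimal sketch of that idiom; SKETCH_CLANG_RTTI and clang_reports_rtti are hypothetical names.

```cpp
// Compatibility fallbacks: compilers without the checks answer "no".
#ifndef __has_feature
# define __has_feature(x) 0
#endif
#ifndef __has_extension
# define __has_extension __has_feature
#endif

// Collapse one query into a plain 0/1 macro for use in ordinary code.
#if __has_feature(cxx_rtti)
# define SKETCH_CLANG_RTTI 1
#else
# define SKETCH_CLANG_RTTI 0
#endif

// Hypothetical helper exposing the result at run time.
inline bool clang_reports_rtti()
{ return SKETCH_CLANG_RTTI != 0; }
```

A result of 0 here only means the compiler did not report the feature through this interface, not that RTTI is necessarily off.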

edg

__EDG__ 
__EDG_VERSION__ 

__cplusplus 
__cplusplus_cli 
__SIGNED_CHARS__
__PRETTY_FUNCTION__ 
_WCHAR_T 
_BOOL 
__NO_LONG_LONG
__CHAR16_T_AND_CHAR32_T

__ARRAY_OPERATORS 
__PLACEMENT_DELETE 

__EXCEPTIONS 
__RTTI 

__VARIADIC_TEMPLATES

Proposed Language Predefines

The plan is to start with new language features, and to offer several modular macros that sub-divide the C++2014 feature set, while retaining a relationship with the main versioning macro (__cplusplus).

In addition, there is much interest in a solution that could be used to resolve some of the lingering portability issues with C++2011 (constexpr, variadic templates), and even C++2003 (exceptions, RTTI).

Starting with proposed C++11/14 language features, add predefined macros of the form:

__cpp + language feature

So, for constexpr, the macro becomes:

__cpp_constexpr

The value is determined to be:

1) if C++11 constexpr is not supported, __cpp_constexpr < 201103L
2) if C++11 constexpr is supported, __cpp_constexpr >= 201103L
3) if C++14 constexpr is supported, __cpp_constexpr > 201103L

In the last case, there is a bit of ambiguity. How do you distinguish between a C++11 conformant compiler, an experimental C++14 compiler of a particular vintage, and a C++14 conformant compiler?

One way would be to use the same form used by __cplusplus. This macro value is computed from the year + month of the standard’s adoption by ISO. In a similar manner, pre-standard features could be defined as the year + month that the feature was voted into the working C++ draft.
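The year + month encoding is easy to take apart arithmetically, which is part of its appeal. A small sketch, with adoption_date and decode as hypothetical names:

```cpp
// Hypothetical type holding the two halves of a yyyymmL value.
struct adoption_date { long year; long month; };

// Splits a value of the form used by __cplusplus into year and month.
constexpr adoption_date decode(long value)
{ return adoption_date{ value / 100, value % 100 }; }

// 201103L is the C++11 value of __cplusplus.
static_assert(decode(201103L).year == 2011, "year field");
static_assert(decode(201103L).month == 3, "month field");
```

Since the values are plain integers that grow with time, an ordinary >= comparison in an #if is enough to ask "at least this vintage".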

Take the evolution of constexpr, as a useful for-instance. Using

__cpp_constexpr

Set it to the following values based on different language dialect flags, and compare to the primary C++ macro, __cplusplus.

c++ dialect | flag | __cplusplus | __cpp_constexpr
c++98 | -std=gnu++98 | 199711L | 199711L
c++11 | -std=gnu++11 | 201103L | 201103L
pre-c++14 with N3302/N3470/N3469/N3471 lib changes | -std=gnu++1y | 201300L | 201210L
pre-c++14 with above + N3652 (relaxed) language changes | -std=gnu++1y | 201300L | 201304L

Proposed Library Defines

Starting with proposed C++11/14 library features, add macros of the form:

__cpp_lib_ + header name

So, for the C++11 <future> header, the macro becomes:

__cpp_lib_futures

The value is determined to be:

1) if C++11 futures is not supported, __cpp_lib_futures < 201103L
2) if C++11 futures is supported, __cpp_lib_futures >= 201103L
3) if C++14 futures is supported, __cpp_lib_futures > 201103L

This would require library implementors to create a header file with this macro definition. (As opposed to not having the header, or pre-defining this macro, or having the library feature testing macros live in one particular header.)

Example Usage

Guarding for C++11 constexpr:

#if __cpp_constexpr >= 201103L
  constexpr int i = 66;
#endif

Guarding for C++14 relaxed constexpr, given C++11 assumed.

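#if __cpp_constexpr >= 201304L
/* incr is assumed here; any constexpr function of int will do */
constexpr int incr(int k) { return k + 1; }

/* a local variable in a constexpr function needs the relaxed rules */
constexpr int h(int k)
{
  int x = incr(k);
  return x;
}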
#endif
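Another common shape is a fallback macro, in the spirit of _GLIBCXX_CONSTEXPR above, so that the same declaration compiles whether or not the feature is present. A sketch using the macro name proposed in this post; SKETCH_CONSTEXPR and twice are hypothetical names.

```cpp
// Expands to constexpr where the feature test says C++11 constexpr is
// available, and to nothing otherwise.
#if defined(__cpp_constexpr) && __cpp_constexpr >= 201103L
# define SKETCH_CONSTEXPR constexpr
#else
# define SKETCH_CONSTEXPR
#endif

// inline keeps the definition header-safe when the macro expands to nothing.
inline SKETCH_CONSTEXPR int twice(int k)
{ return 2 * k; }
```

The defined() test matters: an undefined macro evaluates to 0 in an #if, which gives the right answer here but hides typos in the macro name.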

Open Questions

1. Macro conventions.

The macro naming convention, the numbers of macros, type, form, etc. are all up for debate.

Some consensus on:

a. The committee is against function-style macros, but there is no explicit rationale for this.

b. The prefix with the most support is: __cpp_.

c. Language feature macros should be pre-defined and not tied to a particular header.

2. What about feature testing in older versions of C++?

In the C++11 standard, two new macros were added, proto feature-testing macros. These macros may establish a naming precedent.

__STDCPP_STRICT_POINTER_SAFETY__
__STDCPP_THREADS__

If the committee feels that this is not precedent, and that new functionality means a new name, then hopefully these will be incorporated into whatever naming scheme is now proposed, and the C++11 forms deprecated.

3. Longest-standing feature-testing portability wart is from 1997, starting with the language features exceptions and run-time type identification.

Solving generalized feature testing in a manner simpatico with both older and newer language features is highly desirable. Some background on GNU issues with this is in PR 25191.

4. How do individual feature tests fit in with the global version macro for C++ (__cplusplus)?

Right now, there’s only one real macro, so everything depends on it. But when there are more, how do the multiple feature test macros interact with __cplusplus? Is there a general way to indicate that there is a compiler setting or command line flag that has explicitly disabled parts of the specified language dialect?

No, there is not. Should there be just one, or should a bunch of smaller macros also be checked?

Surveying a couple compilers for standard operating procedures, it seems as if the usual behavior is to treat the command line dialect flag as the base language target, rather than indicating full or strict conformance.

Then, strict language conformance is available via specific command-line flags (-ansi, or -std=c++98), and defines __STRICT_ANSI__ or another equivalent macro.

So, distinguishing between some vendor extensions and strict standard conformance is possible at compile time.

But disabling whole chunks of the regularly-supported language, like specific builtin types or language features, doesn’t distinguish itself in the same manner at compile time.

For instance, in the C++11 dialect, gnu/clang/edg front ends set __cplusplus to 201103L. Even when language features required for full conformance, like exceptions or long long integers, are explicitly disabled.

The C language has the idea of a pre-defined macro that indicates conformance (__STDC__), and separates out dialect (__STDC_VERSION__). In practice, __STDC__ may indicate conformance + extensions, or explicitly non-conformant behavior. So, not especially useful.

In C++ these macros are explicitly implementation-defined, so even less useful.

POSIX has a runtime test that can be used to determine functionality, i.e. sysconf.

5. What about multi-vendor setups hosted on a single operating system?