Notes on installing TensorFlow on Linux, with GPU support enabled.
Background
TensorFlow is the second-generation ML framework from Google. (See this comparison of deep learning software.) The current state-of-the-art image recognition models (Inception-v3) use this framework.
Prerequisites
Assuming Fedora 24 with an Nvidia GTX 1060 installed, running the nvidia rather than the nouveau drivers. See Fedora 24 Notes, and RPM Fusion’s installation page for installing the Nvidia drivers. In sum,
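the RPM Fusion route looks roughly like the following sketch. The repository URLs and package names follow RPM Fusion's howto pages; verify against the current pages before running.

```shell
# Enable the RPM Fusion free and nonfree repositories.
sudo dnf install \
  https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
  https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm

# Install the Nvidia driver (akmod rebuilds the kernel module on kernel
# updates) plus the CUDA-enabled driver bits needed for GPU compute.
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda

# Reboot so the nvidia module loads in place of nouveau.
```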
All the Left/Right Uhuras, 2014
All the Center Uhuras, 2014
68 cm x 86.5 cm, Inkjet over lapis and silvertone wash on Awagami Bamboo 250 gsm paper. Master jedi paper tricks via Emily York.
These prints are composed of 288 cropped images of Uhura from the television show Star Trek. Each frame of the first season is analyzed with facial recognition software, and found Uhura faces are either inscribed with tattoo-like circles representing individual facial detection algorithms, or scaled, cropped, and center-aligned via sophisticated image-processing routines.
To offset the explicitly computed nature of this work, the images are aligned on broken grids, and floated on an organic background of silvertone metallic or lapis mineral pigments.
Two 1080p LED televisions, two 8GB SD cards, video at 960 x 720 pixels, loops of 3:00:00 and 2:00:00.
ONE Tens, hundreds, thousands. How many photographs do you look at every day? Do you swipe through photographs on a smart phone, scroll through images on Tumblr, scrub through frames on Netflix? More images per day than last year? Two years ago? Ten years ago? I wanted to explore an environment of extreme multiplicity. Thousands, tens of thousands of images. I wanted to create and manipulate images in pre-determined ways, but using image mass-generation techniques. Metadata is a by-product of the mass manufacture and processing of images, and exploring the oceans of possible patterns created as a by-product of a machine counting thousands of images had great appeal. Like generating waves with a mechanical device in a wave pool, this would be a super-human way of counting, with very clear rules about the perception of the image: detection, perception codified. I picked facial recognition algorithms as my art tool, and use them to count faces.
Uhura Audit 23 Details S01E04 frame 1391: All Faces, All Eyes, All Lips, 2014
3 x 960 x 720 pixels
TWO The appeal of All Eyes, Multiple Eyes, Fanciful Eyes. The three augmented frames above comprise a sample from the 1391st second of Star Trek TOS, the 4th episode of the first season (S01E04). As part of the initial processing and detection of objects, each frame is scanned for face-like objects, then for eye-like objects, and then for lip-like objects. The first frame is “All Faces,” which is composed of 5 algorithms run up to 3 different ways. The second frame is “All Eyes,” which is composed of 7 algorithms, all run the same way. The last is “All Lips,” which is generated by 2 algorithms, both run in a similar manner. Note that “All Eyes” is generated by running the algorithms over the entire frame, instead of over just the detected face area. I am consciously mis-using these algorithms to generate ornamentation. This is of dubious utility for anything other than generating pictures that dazzle: generated virtual tattoos, a lot of pixel ink spread around to establish identity.
A Few Frames from Spock Audit 23 Results, 2014
4 x 960 x 720 pixels
THREE Spock Eyebrows and Cat-Eye Makeup Considered Harmful. The frames above show a face detection mistake that heavily impacts efforts to use human-face detection algorithms on the non-human face of the character Spock. In the Star Trek universe, Spock is a hybrid human: half human, half alien (Vulcan). For some reason, the green box that outlines the boundary of the detected Spock face in each of the images above truncates the face at or above the lips. I speculate that this augmentation pattern is caused by the heavily arched eyebrows of Spock being mistaken for prominent cheek bones, effectively spiking the recognized facial region upwards, clipping the mouth. Looking at different episodes of the series, I see this same mis-detection replicated in Uhura frames with extreme cat-eye makeup.
Git Wins, 2014
g+ post, text
FOUR Reflections on Process. This exploration started as a generative art project targeting media. Little did I know that I’d end up buying my first television sets as part of the development process, and losing myself in the void of a consciously computed image environment. Intermittently back from the void, let me report on what I’ve found from time spent doing art and science at the same time. I see a lot of engineering processes: a strangely elliptical 10-13 day project management schedule for keeping technical tasks on track, source code versioning, a lot of linux system-level administration and tuning. Tens and tens of thousands of lines of C++11 and one hot processor, the requisite kludgy bash scripts, and gigabytes of image files nestled in directories by the six thousand. I started inserting art tasks into the project management scheme, forcing visual priorities to become explicit. As an art process, the manipulation of thousands of frames, picking apart video and putting it back together, has given some interesting insights into Star Trek. I’ve noticed certain editing patterns previously invisible, the favoring of certain sides of characters by the camera, and that counting a character’s frames changes the character’s meta-narrative in my head. All this formalist film analysis, done by algorithm.
Sort 4 Negatives All Faces, 2014
960 x 720, 8:28 minutes
FIVE Faceless, Negatives, and the Fleeting Form. In addition to finding faces, one can use face detection to assure that there is no face: to detect nothing. That’s what these frames are, a moody collection of people walking through closing doors and turning corners, of machines levitating through space, space ships cruising the galaxy, of cropped hands, the backs of heads, murky faces in the shadows. It turns out that alongside the rise of face detection, and the rise of its capabilities in easy-to-use commercial forms like Facebook photo-tagging, there is a concurrent rise in the desire for explicit face removal. Search the #faceless tag on the social media platform of your choice for a peek into this particularly interesting data set.
A Few Frames from Kirk Audit 16 Results, 2014
3 x 960 x 720 pixels
SIX Kirk Dimples Considered Harmful. Another mistaken detection, particular to Kirk: the deep chin cleft is a shape that eye detectors can mistake for an eye. Proper tuning and some additional computation fix this issue, but the mind reels. Human faces are mostly symmetric, but filled with parts that make them non-uniform. There is no recognizer for scars, dimples, or birthmarks.
Select Frames from Audit 16 Results, 2014
4 x 960 x 720 pixels
SEVEN Computers Recognizing Computers. Both the Uhura and Spock characters are often seated in front of a bank of blinking lights, a typical 1960s imagineering of technology. By chance, the relative size and positions of the lights comprising the background technology, and the profile nature of the seated character, combine to confuse the face recognition algorithms.
126 Uhuras as Seen on TV, 2014
17 x 45″, inkjet on two sheets of Awagami paper, space
EIGHT The rise of Uhura. Before I started this project, my favorite Star Trek character was Spock. Now it’s Uhura. Of the three Star Trek characters I’m stalking with face recognition, Uhura has the fewest good samples. At this point, there are 2909 Kirk positive samples, 1838 Spock samples, and 288 Uhura samples. Just as a point of reference, there are 351 title frames in an equivalent sample of Star Trek frames. That there are fewer solo-Uhura frames in a typical Star Trek episode than title-credit and end-credit sequence frames can be explicitly quantified. I catch myself constantly scheming to figure out edits and algorithms that will give her more screen time, retroactively. I triple-count the Uhura frames as I count all her eyes, all her faces, all her lips. Computer, tell me about gender and race in 1966-69 USA.
Sort 4 All Grid Uhuras Waterfall, 2014
960 x 720, 48 seconds
Miscellaneous Augmentation Keys, 2014
svg files, text
NINE Compression, augmentation, visualization. Part of this project involved running every single face, eye, and mouth recognizer over each “frame” of a Star Trek episode. To get an idea of how the algorithms were performing in my setup, I created these color-coded augmentation schemes that allowed me to look at the results of each algorithm, directly applied to the original frame. Each frame is then smashed up against another in a compressed form, and watched on a screen or projected against a wall. That’s what these video clips encode: first detected faces, and then later recognized characters and specific character interactions.
audit version: 23f 2014-07-08
kirk samples: 2909
kirk samples with detected faces: 2780
front faces: 1081
(h x w) Max 527 Median 312 Min 134
(faces <= size) 1025 <= 400 610 <= 320 168 <= 240
profile faces: 639
(h x w) Max 502 Median 317 Min 123
(faces <= size) 582 <= 400 330 <= 320 83 <= 240
tos-101-0130
face: 294 162 407 407
eye 1: [82 x 82 from (83, 117)]
lips: 3
[161 x 97 from (113, 301)]
[155 x 93 from (60, 117)]
[144 x 86 from (229, 129)]
Kirk Audit 23 Results, 2014
c++ 11, math, text
TEN Rule-based Perception. Humans see faces; they don’t consciously count eye offsets, think about detecting mouths, or consider that a sufficiently cleft chin can trick a common eye recognizer. Humans look at faces and see friends, family, stereotypes and simplifications, other people. Computers look at faces and calculate eye position, estimate posture, automatically center-align eyes and re-scale the entire face to fit. Faces are calculated and enumerated: parts of the software that created this piece detect faces in an algorithmic and speculative fashion, checking that two eyes are detected within the face boundary, that the detected eyes are about the same size, that there is some horizontal spacing between the eyes, and that the vertical distance between the eyes is not so extreme as to indicate failure. Humans see faces. Computers see points of interest in a region specified as interesting. Teaching a computer to see like humans therefore involves moments where humans decide that the picture in front of the computer is a full-frontal face when it has no more than 15% deviation from an imaginary nose plane, and when both ears are visible. When everything on a face is measured and quantified, there is so much more information, and with it the corresponding ability to both recognize and mis-interpret. Make no mistake: soon computers will recognize more faces and emotions than a sleep-deprived parent or a tired law enforcement officer. And then, what?
Ambient
Sol LeWitt Artist’s Books, Corraini Edizioni, 2010
The default package for OpenCV on Fedora 20 (f20) is
opencv-2.4.7-6
The performance of algorithms such as CascadeClassifier::detectMultiScale and opencv_traincascade can be improved by installing additional packages and then enabling them with build flags when rebuilding OpenCV.
Looking through the opencv.spec SRPM file, various enable flags are provided for configuration tweaking and tuning purposes when rebuilding with rpmbuild.
The most relevant for optimization:
--with eigen3
--with sse3
The most relevant for extending capabilities:
--with ffmpeg
--with openni
The default package can be rebuilt with these optimizations using syntax like:
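A sketch of that rebuild invocation, using rpmbuild's standard `--with` conditional-build syntax (the exact SRPM filename is inferred from the package version above and will vary):

```shell
# Rebuild the Fedora SRPM with the optimization flags enabled.
# --with passes bcond options through to the spec file's configure logic.
rpmbuild --rebuild --with eigen3 --with sse3 opencv-2.4.7-6.fc20.src.rpm
```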
However, even when using these flags on f20, the output provided by cmake at configuration time doesn’t enthuse. So, rebuild the upstream sources without RPM to master the package configuration, and then bring this knowledge back into the RPM package. Old school, yo.
Looking at the upstream source repository, and then rebasing the f20 sources to the latest release of OpenCV (2.4.9) starts off the SRPM hacking. To get a cmake build going, build the opencv sources as specified in the link, to get dependency tracking working.
The file CMakeLists.txt has the build-time configure options.
To enable WITH_IPP, more elaborate configuration is required. First, install the Intel Performance Primitives (aka IPP). Note also, from the User’s Guide, that the opencv_traincascade application can use TBB for multi-threading; to use it in multi-core mode, OpenCV must be built with TBB.
After IPP is installed, the system must be configured so that it can be found easily. To fix up the paths, pick one of two options.
One: add the following to LD_LIBRARY_PATH and LD_RUN_PATH:
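For example, something like the following, assuming a default IPP install under /opt/intel/ipp (the lib subdirectory name depends on your IPP version and architecture):

```shell
# Add the IPP runtime libraries to the loader search paths (bash syntax).
export LD_LIBRARY_PATH=/opt/intel/ipp/lib/intel64:$LD_LIBRARY_PATH
export LD_RUN_PATH=/opt/intel/ipp/lib/intel64:$LD_RUN_PATH
```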
Furthermore, for OpenCV configuration to find the installed IPP at SRPM build time, the environment variable IPPROOT must be set, as follows:
setenv IPPROOT /opt/intel/ipp
Build SRPM
Build the modified opencv package with the following custom SPEC file. No configuration options are necessary: WITH_IPP, WITH_TBB, WITH_EIGEN are all enabled.
Then, force install it over the default libs as follows:
rpm -Uvh --nodeps opencv-2.4.9-3 etc etc.
Recompile the opencv app in question, and voilà: optimized. Speedups may vary; I’m seeing ~2.3x speedups in processing times.
This is a series of art videos that were generated as a by-product of an ongoing computational media project further described in GVOD, GVOD + Analytics: Star Treks \\///, etc.
The Machine is Learning, v2.2, 12:37 minutes, 960 x 720 pixels.
The Machine is Learning, v2.3 lbp, 12:37 minutes, 960 x 720 pixels.
The Machine is Learning, v5.7 multi, 12:37 minutes, 960 x 720 pixels.
The Machine is Learning, v5.7 multi ghost, 12:37 minutes, 960 x 720 pixels.
This is a fan studies and media assemblage experiment, loosely associated with Professor Abigail De Kosnik’s Fan Data/Net Difference Project at the Berkeley Center for New Media. It uses technology associated with copyright verification and the surveillance state to deconstruct serial television into a hybrid media form.
The motivating question for this work is simple. How does one quantize serial television? Given a television episode, such as the third episode of Star Trek, how can it be measured and then compared to other episodes of Star Trek? Can characters of the original Star Trek television series be compared to characters in different Star Trek universes and franchises, such as comparing Kirk/Spock in Star Trek to Janeway/Seven-of-Nine in Star Trek Voyager? Given a media text, how do you tag and score it? If you cannot score the whole text, can you score a character or characters? How do characters or elements of a media text become countable?
Media Texts:
Star Trek (The Original Series), aka TOS, 1966 to 1969. Episodes: 79. Running time each: 50 minutes. English subtitles from subscene for Seasons 1, 2, 3.
Star Trek Voyager, aka VOY, 1995 to 2001. Episodes: S03E26 to S07E26, ie #68 to #172, a total of 104. Running time each varies between 45 and 46 minutes.
Media Focus/Themes:
The pairs of Kirk/Spock in Star Trek the Original Series and Janeway/Seven of Nine in Star Trek Voyager will be compared in a media-analytic fashion.
James Kirk
Kathryn Janeway
Spock
Seven of Nine
A popular fanfic genre is called One True Pairing, aka OTP, which is a perceived or invented romantic relationship between two characters. One of the best known examples of OTP is the pair of Kirk and Spock on TOS. Indeed, fanfic involving Kirk and Spock is so popular that it has its own nomenclature: it is called slash, or slash fic.
The pair of Janeway and Seven of Nine are comparable to Kirk and Spock: both the Janeway and Kirk characters are captains of space ships, and both the Seven of Nine and Spock characters are presented as “the other” to human characters: both the Borg and Vulcans are presented as otherworldly, non-human. The two pairs are different in other areas, the most obvious being gender: K/S is male, J/7 is female.
Some edit tapes for K/S can be found on YouTube for Seasons 1, 2, and 3. Some fanvids for J/7.
Open Questions:
This is a meta-vidding tool with an analytic overlay. It takes serial television shows and adds facial recognition to count face time and change the focus of viewing to specific character pairs instead of entire episodes. Developing the technology to answer these analytic questions, answering and understanding the answers, and formulating the next round of questions is the purpose of this project.
1. Should the method use the first 79 episodes in which the character-pairs are together? How do you normalize across the series and pairs?
Or minute-normalized, after the edits? The current times are:
TOS == 79 x 50 minutes == 3950 “character-pair” minutes total
VOY == 104 x 43 minutes == 4472 “character-pair” minutes total
2. What is the best method for facial recognition?
One idea is to use openframeworks, and incorporate an addon. Get FaceTracker library. See video explaining it. Get ofxFaceTracker addon for openframeworks.
3. Measuring “character” and “character-pair” screen time. How is this related to the Bechdel test? [2+ women, who talk to each other, about something besides a man] Can this be used to visualize the test, or its flaws as currently conceived? What is Bechdel version 2.0? [2+ women, who talk to each other, about something besides a man, or kids, or family] Can we use this tool to develop new forms?
4. How to auto-tag? How to populate the details of each scene in a tagged format? If original sources have subtitles, is there a way to dump the subs to SRT, and then populate the body of the wordpress with the transcript? Or, is there a way to use google’s transcription API’s to try and upload/subtitle/rip?
5. Can the Netflix taxonomy be replicated? Given the patents, can some other organization scheme be devised?
Methodology:
0. Prerequisites
The software/hardware base is: Linux (Fedora 20) on Intel x86_64, using Nvidia graphics hardware and software. I.e., a contemporary rendering and film-production workstation.
Additional software is required on top of this base: for instance, a g++ development environment, ffmpeg, opencv, and openFrameworks 0.8.0.
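On Fedora 20, installing that base might look roughly like this (package names are my best guess; ffmpeg requires the RPM Fusion repository, and openFrameworks 0.8.0 is installed from its own download rather than a distro package):

```shell
# Compiler toolchain plus vision/media development libraries.
sudo yum install gcc-c++ make cmake opencv-devel ffmpeg ffmpeg-devel
```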
3. Sort through frames and set aside twelve frames of Kirk faces, twelve frames of Spock faces.
These are used later to train the facial recognizer. Note: you definitely need hundreds and even thousands of positive samples for faces. In the case of faces you should consider all the race and age groups, emotions, and perhaps beard styles.
For example, meet the Kirks.
And here are the Spocks.
In addition, this technique requires a negative set of images. These are images that are from the media source, but do not contain any of the faces that are going to be recognized. These are used to train the facial recognizer. Meet the non-K/S-faces.
4. Seed facial recognition with faces to recognize. Scan frames with facial recognition according to some input and expected-result algorithm, and come up with edit lists that can be used to extract the frames that are relevant to the character-pair.
Need either timecode or some other measure that can be dumped with an edit decision list or specific timecode marks. Some persistent data structure? Edits made.
5. Decompose episode into character-pair edit vids.
Use edit decision list or specific timecode marks, as above. Automate ffmpeg to make edits.
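A sketch of that ffmpeg step, cutting one character-pair segment per entry in the edit decision list (the filenames and timecodes here are hypothetical):

```shell
# Extract one segment without re-encoding; -ss/-to are the in/out
# timecodes taken from the edit decision list.
ffmpeg -i tos-s01e03.mp4 -ss 00:12:04 -to 00:12:31 -c copy pair-clip-001.mp4
```

With `-c copy` the cut snaps to the nearest keyframes; re-encoding instead of copying would give frame-accurate edits at the cost of speed.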
6. Store in wordpress container, one post per edit vid? Then with another post, tie together all of a single episode edit vids into one linked post?
Legal
There are both copyright risks and patent opportunities in this line of inquiry.