ffmpeg | sunglint

Ten Thoughts On Detection and Recognition

2014/07/10

Installation of The Machine Is Learning v# at Pro Arts Oakland, 2014

Two 1080p LED televisions, two 8 GB SD cards, video at 960 x 720 pixels, loops of 3:00:00 and 2:00:00.

ONE Tens, hundreds, thousands. How many photographs do you look at every day? Do you swipe through photographs on a smart phone, scroll through images on Tumblr, scrub through frames on Netflix? More images per day than last year? Two years ago? Ten years ago? I wanted to explore an environment of extreme multiplicity. Thousands, tens of thousands of images. I wanted to create and manipulate images in pre-determined ways, but with techniques for generating images en masse. Metadata is a by-product of the mass manufacture and processing of images, and exploring the oceans of possible patterns that emerge when a machine counts thousands of images had great appeal. Like generating waves with a mechanical device in a wave pool, this would be a super-human way of counting, with very clear rules about the perception of the image: detection, perception codified. I picked facial recognition algorithms as my art tool, and I use them to count faces.

Uhura Audit 23 Details S01E04 frame 1391: All Faces, All Eyes, All Lips, 2014

3 x 960 x 720 pixels

TWO The appeal of All Eyes, Multiple Eyes, Fanciful Eyes. The three augmented frames above are a sample from the 1391st second of Star Trek: The Original Series (TOS), the fourth episode of the first season (S01E04). As part of the initial processing and detection of objects, each frame is scanned for face-like objects, then for eye-like objects, and then for lip-like objects. The first frame, "All Faces," combines 5 algorithms run in up to 3 different ways. The second frame, "All Eyes," combines 7 algorithms, all run the same way. The last, "All Lips," is generated by 2 algorithms, also run in a similar manner. Note that "All Eyes" is generated by running the eye algorithms over the entire frame, instead of over just the detected face area. I am consciously misusing these algorithms to generate ornamentation. This is of dubious utility for anything other than generating pictures that dazzle: generated virtual tattoos, a lot of pixel ink spread around to establish identity.
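For the curious, here is the difference between the two ways of running an eye detector, as a minimal C++11 sketch rather than the project's code. It assumes a plain Rect struct standing in for OpenCV's cv::Rect (top-left x, y plus width and height). When an eye cascade runs on a cropped face area, its hits come back relative to that crop and have to be shifted into frame coordinates. When it runs over the whole frame, as for "All Eyes," the face-restricted behavior can be approximated afterwards by keeping only hits that fall inside some detected face. The names contains, roiToFrame, and eyesInsideFaces are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cv::Rect: top-left corner plus width and height.
struct Rect {
    int x, y, w, h;
};

// True when `inner` lies entirely inside `outer`.
bool contains(const Rect& outer, const Rect& inner) {
    return inner.x >= outer.x && inner.y >= outer.y &&
           inner.x + inner.w <= outer.x + outer.w &&
           inner.y + inner.h <= outer.y + outer.h;
}

// A detector run on a cropped face region reports hits relative to that crop;
// shift such a hit back into full-frame coordinates.
Rect roiToFrame(const Rect& hitInRoi, const Rect& face) {
    assert(hitInRoi.x >= 0 && hitInRoi.y >= 0);  // crop-relative hits start at the crop origin
    return Rect{face.x + hitInRoi.x, face.y + hitInRoi.y, hitInRoi.w, hitInRoi.h};
}

// Approximate face-restricted detection on whole-frame results: keep only the
// eye hits that fall inside at least one detected face.
std::vector<Rect> eyesInsideFaces(const std::vector<Rect>& eyes,
                                  const std::vector<Rect>& faces) {
    std::vector<Rect> kept;
    for (const Rect& eye : eyes) {
        for (const Rect& face : faces) {
            if (contains(face, eye)) {
                kept.push_back(eye);
                break;  // count each eye once, even if faces overlap
            }
        }
    }
    return kept;
}
```

Skipping a filter like eyesInsideFaces is roughly what lets whole-frame eye hits spill across costumes, consoles, and set walls as ornamentation.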

A Few Frames from Spock Audit 23 Results, 2014

4 x 960 x 720 pixels

THREE Spock Eyebrows and Cat-Eye Makeup Considered Harmful. The frames above show a face detection mistake that heavily impacts efforts to use human-face detection algorithms on the non-human face of the character Spock. In the Star Trek universe, Spock is a hybrid: half human, half alien (Vulcan). For some reason, the green box that outlines the boundary of the detected Spock face in each of the images above truncates the face at or above the lips. I speculate that this pattern is caused by Spock's heavily arched eyebrows being mistaken for prominent cheekbones, effectively shifting the recognized facial region upwards and clipping the mouth. Looking at other episodes of the series, I see this same mis-detection replicated in Uhura frames with extreme cat-eye makeup.

Git Wins, 2014

g+ post, text

FOUR Reflections on Process. This exploration started as a generative art project targeting media. Little did I know that I'd end up buying my first television sets as part of the development process, and lose myself in the void of a consciously computed image environment. Intermittently back from the void, let me report on what I've found from time spent doing art and science at the same time. I see a lot of engineering processes: a strangely elliptical 10-13 day project management schedule for keeping technical tasks on track, source code versioning, a lot of Linux system-level administration and tuning. Tens and tens of thousands of lines of C++11 and one hot processor, the requisite kludgy bash scripts, and gigabytes of image files nestled in directories by the six thousand. I started inserting art tasks into the project management scheme, forcing visual priorities to become explicit. As an art process, the manipulation of thousands of frames, picking apart video and putting it back together, has given me some interesting insights into Star Trek. I've noticed editing patterns that were previously invisible, the camera favoring certain sides of certain characters, and that counting a character's frames changes that character's meta-narrative in my head. All this is formalist film analysis, done by algorithm.

Sort 4 Negatives All Faces, 2014

960 x 720, 8:28 minutes

FIVE Faceless, Negatives, and the Fleeting Form. In addition to finding faces, one can use face detection to ensure that there is no face, to detect nothing. That's what these frames are: a moody collection of people walking through closing doors and turning corners, of machines levitating through space, spaceships cruising the galaxy, cropped hands, the backs of heads, murky faces in the shadows. It turns out that the rise of face detection, and of its easy-to-use commercial forms like Facebook photo-tagging, has been accompanied by a concurrent rise in the desire for explicit face removal. Search the #faceless tag on the social media platform of your choice for a peek into this particularly interesting data set.
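Detecting nothing can be sketched as a filter over per-frame detection counts. This is a minimal C++11 sketch under stated assumptions, not the selection code used for the piece: it assumes a hypothetical FrameResult record holding how many faces each face detector reported for a frame. A frame is kept as a negative only when every detector came back empty, so a single stray hit from any cascade disqualifies it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-frame record: the frame id plus the number of faces
// each face detector reported for that frame.
struct FrameResult {
    std::string id;
    std::vector<std::size_t> facesPerDetector;
};

// A frame is "faceless" only when no detector found anything.
bool isFaceless(const FrameResult& frame) {
    for (std::size_t n : frame.facesPerDetector) {
        if (n > 0) return false;
    }
    return true;
}

// Collect the ids of all faceless frames, preserving input order.
std::vector<std::string> facelessFrames(const std::vector<FrameResult>& frames) {
    std::vector<std::string> ids;
    for (const FrameResult& f : frames) {
        if (isFaceless(f)) ids.push_back(f.id);
    }
    return ids;
}
```

Requiring every detector to agree makes the negative set conservative; a looser rule, such as a majority vote, would admit frames that only some cascades consider faceless.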

A Few Frames from Kirk Audit 16 Results, 2014

3 x 960 x 720 pixels

SIX Kirk Dimples Considered Harmful. Another mistaken detection, particular to Kirk: his deep chin cleft is a shape that eye detectors can mistake for an eye. Proper tuning and some additional computation fix this issue, but the mind reels. Human faces are mostly symmetric, but filled with parts that make them non-uniform. There is no recognizer for scars, dimples, or birthmarks.
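One guess at what "additional computation" could look like, offered as an assumption rather than the fix actually used here: eyes sit in the upper part of a frontal face, so an eye candidate whose vertical center falls in the lower part of the face box, where a chin cleft lives, can be thrown out. This C++11 sketch assumes face-relative coordinates (as returned by a detector run on the face area) and a hypothetical plain Rect struct. The 0.5 cutoff is an illustrative value, not a tuned one.

```cpp
#include <vector>

// Hypothetical stand-in for cv::Rect: top-left corner plus width and height.
struct Rect {
    int x, y, w, h;
};

// Keep eye candidates whose vertical center lies in the upper `maxFraction`
// of the face box. Candidates are face-relative, so only face.h is used.
// A chin cleft sits near the bottom edge of the box and fails this test.
std::vector<Rect> plausibleEyes(const std::vector<Rect>& candidates,
                                const Rect& face, double maxFraction = 0.5) {
    std::vector<Rect> kept;
    const double cutoff = face.h * maxFraction;  // hypothetical threshold
    for (const Rect& c : candidates) {
        const double centerY = c.y + c.h / 2.0;
        if (centerY < cutoff) kept.push_back(c);
    }
    return kept;
}
```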

Select Frames from Audit 16 Results, 2014

4 x 960 x 720 pixels

SEVEN Computers Recognizing Computers. Both Uhura and Spock are often seated in front of a bank of blinking lights, a typical 1960s imagineering of technology. By chance, the relative size and position of the lights in this background technology, combined with the profile view of the seated character, confuse the face recognizer algorithms.

126 Uhuras as Seen on TV, 2014

17 x 45″, inkjet on two sheets of Awagami paper, space

EIGHT The rise of Uhura. Before I started this project, my favorite Star Trek character was Spock. Now it's Uhura. Of the three Star Trek characters I'm stalking with face recognition, Uhura has the fewest good samples. At this point, there are 2909 Kirk positive samples, 1838 Spock samples, and 288 Uhura samples. As a point of reference, there are 351 title frames in an equivalent sample of Star Trek frames. That a typical Star Trek episode has fewer solo-Uhura frames than title, credit, and end-credit sequence frames can be explicitly quantified. I catch myself constantly scheming to figure out edits and algorithms that will give her more screen time, retroactively. I triple-count the Uhura frames as I count all her eyes, all her faces, all her lips. Computer, tell me about gender and race in 1966-69 USA.
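The comparison above is plain arithmetic. A small C++11 sketch (the sampleShares helper is hypothetical) turns raw sample counts into each character's share of the total:

```cpp
#include <cassert>
#include <map>
#include <string>

// Fraction of all positive samples that belong to each character.
std::map<std::string, double> sampleShares(const std::map<std::string, int>& counts) {
    int total = 0;
    for (const auto& kv : counts) total += kv.second;
    assert(total > 0);  // shares are undefined for an empty audit
    std::map<std::string, double> shares;
    for (const auto& kv : counts) {
        shares[kv.first] = static_cast<double>(kv.second) / total;
    }
    return shares;
}
```

Fed the counts above (2909, 1838, 288), it gives Kirk about 57.8%, Spock about 36.5%, and Uhura about 5.7% of the 5,035 character samples. Uhura's 288 samples are also about 82% of the 351 title frames.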

Sort 4 All Grid Uhuras Waterfall, 2014

960 x 720, 48 seconds

Miscellaneous Augmentation Keys, 2014

svg files, text

NINE Compression, augmentation, visualization. Part of this project involved running every single face, eye, and mouth recognizer over each sampled "frame" of a Star Trek episode. To get an idea of how the algorithms were performing in my setup, I created these color-coded augmentation schemes, which let me look at the results of each algorithm drawn directly on the original frame. Each frame is then smashed up against another in a compressed form, and watched on a screen or projected against a wall. That's what these video clips encode: first detected faces, and later recognized characters and specific character interactions.

audit version: 23f      2014-07-08

kirk samples: 2909
kirk samples with detected faces: 2780

front faces: 1081
(h x w)              Max 527      Median 312   Min 134    
(faces <= size)      1025 <= 400  610 <= 320   168 <= 240

profile faces: 639
(h x w)              Max 502      Median 317   Min 123    
(faces <= size)      582 <= 400   330 <= 320   83 <= 240


tos-101-0130
  face:   294  162  407  407
  eye 1: [82 x 82 from (83, 117)]
  lips:   3
          [161 x 97 from (113, 301)]
          [155 x 93 from (60, 117)]
          [144 x 86 from (229, 129)]
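The size rows in the audit above read as a few order statistics over detected face sizes. Here is a minimal C++11 sketch of how such rows could be computed, assuming one side length in pixels per face (the table gives a single number under "(h x w)"). The struct and function names are hypothetical, and the even-count median convention (lower middle value) is an assumption, since the audit doesn't say which one it uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of face sizes (one side length in pixels per face).
struct SizeSummary {
    int maxSize, medianSize, minSize;
};

// Takes the sizes by value because sorting reorders them.
SizeSummary summarize(std::vector<int> sizes) {
    assert(!sizes.empty());
    std::sort(sizes.begin(), sizes.end());
    // Assumption: for an even count, report the lower of the two middle values.
    return SizeSummary{sizes.back(), sizes[(sizes.size() - 1) / 2], sizes.front()};
}

// How many faces are no larger than `limit` pixels (the "faces <= size" row).
std::size_t countAtMost(const std::vector<int>& sizes, int limit) {
    return static_cast<std::size_t>(std::count_if(
        sizes.begin(), sizes.end(), [limit](int s) { return s <= limit; }));
}
```

Given the front-face sizes, countAtMost with limits of 400, 320, and 240 would produce the three figures in the "(faces <= size)" row.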

Kirk Audit 23 Results, 2014

c++ 11, math, text

TEN Rule-based Perception. Humans see faces without consciously counting eye offsets, thinking about detecting mouths, or considering that a sufficiently cleft chin can trick a common eye recognizer. Humans look at faces and see friends, family, stereotypes and simplifications, other people. Computers look at faces and calculate eye positions, estimate posture, automatically center-align the eyes and rescale the entire face to fit. Faces are calculated and enumerated: parts of the software that created this piece detect faces in an algorithmic and speculative fashion, checking that two eyes are detected within the face boundary, that the detected eyes are about the same size, that there is some horizontal spacing between the eyes, and that the vertical distance between the eyes is not so extreme as to indicate failure. Humans see faces. Computers see points of interest in a region specified as interesting. Teaching a computer to see like a human therefore involves moments where humans decide that the picture in front of them is a full-frontal face when it deviates by no more than 15% from an imaginary nose plane. That both ears are visible. When everything on a face is measured and quantified, there is so much more information, and with it a corresponding ability to both recognize and misinterpret. Make no mistake, soon computers will recognize more faces and emotions than a sleep-deprived parent or a tired law enforcement officer. And then, what?
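The four checks listed above translate fairly directly into code. Here is a minimal C++11 sketch, not the software used for this piece: it assumes a plain Rect struct (x, y, width, height, in the same coordinate frame as the face box) and hypothetical tolerances in EyePairRules, since the actual thresholds aren't stated. Eye size is compared by width, spacing is the horizontal gap between the two boxes, and vertical distance is measured between box centers.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for cv::Rect: top-left corner plus width and height.
struct Rect {
    int x, y, w, h;
};

// Hypothetical tolerances; the thresholds used for this piece are not stated.
struct EyePairRules {
    double maxSizeRatio = 1.5;     // wider eye at most 1.5x the narrower one
    double minGapFraction = 0.25;  // horizontal gap >= 0.25 x mean eye width
    double maxDyFraction = 0.5;    // center-to-center vertical offset <= 0.5 x mean eye height
};

bool inside(const Rect& outer, const Rect& inner) {
    return inner.x >= outer.x && inner.y >= outer.y &&
           inner.x + inner.w <= outer.x + outer.w &&
           inner.y + inner.h <= outer.y + outer.h;
}

// Applies the four sanity checks to a candidate eye pair; the order of a and b does not matter.
bool plausibleEyePair(const Rect& face, const Rect& a, const Rect& b,
                      const EyePairRules& rules = EyePairRules()) {
    // 1. Both eyes lie within the face boundary.
    if (!inside(face, a) || !inside(face, b)) return false;

    // 2. The eyes are about the same size (compared by width).
    const double wa = a.w, wb = b.w;
    if (std::max(wa, wb) > rules.maxSizeRatio * std::min(wa, wb)) return false;

    // 3. There is some horizontal spacing between the two boxes.
    const Rect& left = (a.x <= b.x) ? a : b;
    const Rect& right = (a.x <= b.x) ? b : a;
    const double gap = right.x - (left.x + left.w);
    if (gap < rules.minGapFraction * (wa + wb) / 2.0) return false;

    // 4. The vertical distance between eye centers is not extreme.
    const double dy = std::abs((a.y + a.h / 2.0) - (b.y + b.h / 2.0));
    if (dy > rules.maxDyFraction * (a.h + b.h) / 2.0) return false;

    return true;
}
```

With these defaults, a candidate low on the face, such as a chin cleft paired with a real eye, fails the vertical-offset check. Each tolerance trades missed faces against mis-detections.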

Ambient

Sol LeWitt Artist’s Books, Corraini Edizioni, 2010

Agnes Martin, Tremolo, 1962

Andy Warhol, Shadows, 1978-79

Barnett Newman, The Stations of the Cross, 1958-66

Research Notes

Trevor Paglen, 1: Is Photography Over?, Fotomuseum Winterthur, 2014-03-03

Trevor Paglen, 2: Seeing Machines, Fotomuseum Winterthur, 2014-03-13

Trevor Paglen, 3: Scripts, Fotomuseum Winterthur, 2014-03-24

Trevor Paglen, 4: Geographies of Photography, Fotomuseum Winterthur, 2014-04-11

Filed under art, technology Tagged with being human, c++11, california, cognition, dena beard, detection, digital humanities, disquiet, eigen, embellishment, face detection, face recognition, facerec, facial recognition, ffmpeg, fisher, fun, generative, generative art, generative fan labor, gig networking, haar, hella c++, lbph, machine learning, matrix, memory, metadata, mkv, NAS, oakland, objdetect, off into the unknown, opencv, pessoa, pro arts, recognition, reconstruction, the kiss, the machine is learning, uhura, uhura #1, uhura fandom, unease, what's a face?, what's front? what's profile?

OpenCV Configuration and Optimization Notes

2014/06/04

Background

The default package for OpenCV on Fedora 20 (f20) is

opencv-2.4.7-6

The performance of algorithms such as CascadeClassifier::detectMultiScale, and of tools such as opencv_traincascade, can be improved by installing additional packages and then enabling them with the corresponding build flags when rebuilding OpenCV.

Looking through the opencv.spec file in the SRPM, various enable flags are provided for configuration tweaking and tuning when rebuilding with rpmbuild.

The most relevant for optimization:

--with eigen3
--with sse3

The most relevant for extending capabilities:

--with ffmpeg
--with openni

The default package can be rebuilt with these optimizations using syntax like:

rpmbuild -ba opencv.spec --with ffmpeg --with openni --with eigen3 --with sse3

However, even when using these flags on f20, the configuration summary that cmake prints doesn't enthuse. So, rebuild the upstream sources without RPM to master the package configuration, and then bring this knowledge back into the RPM package. Old school, yo.

Looking at the upstream source repository, and then rebasing the f20 sources to the latest release of OpenCV (2.4.9), starts off the SRPM hacking. To get a cmake build going, build the OpenCV sources as specified in the linked instructions, to get dependency tracking working.

The file CMakeLists.txt has the build-time configure options.

A list of the most interesting:

WITH_CUDA
WITH_CUFFT
WITH_BLAS

WITH_FFMPEG
WITH_OPENNI

WITH_EIGEN
WITH_IPP
WITH_TBB / BUILD_TBB
WITH_OPENMP
WITH_OPENCL

ENABLE_DYNAMIC_CUDA
ENABLE_FAST_MATH
ENABLE_SSE3

Setup, Install Prerequisites.

A couple of these are easy to enable, with dependencies already pre-packaged.

For development, you’ll need the following dependencies:

yum install -y gtk2-devel libtheora-devel libvorbis-devel libraw1394-devel libdc1394-devel jasper-devel libpng-devel libjpeg-devel libtiff-devel libv4l-devel libGL-devel gtkglext-devel OpenEXR-devel zlib-devel python2-devel swig python-sphinx gstreamer-devel gstreamer-plugins-base-devel opencl-headers gstreamer-plugins-bad-free-devel gstreamer-python-devel gstreamer-plugins-bad-free-devel-docs gstreamer-plugins-base-devel-docs gstreamer-plugins-ugly-devel-docs libpng12-devel mesa-libGLES-devel

To execute binaries that have been compiled with this optimized version of OpenCV, you will need to install the OpenCL runtime.

For OPENNI

yum install -y openni openni-devel openni-doc

For FFMPEG

yum install -y ffmpeg ffmpeg-devel

For TBB

yum install -y tbb tbb-devel tbb-doc

For EIGEN

yum install -y eigen3-devel eigen3-doc

For IPP

To enable WITH_IPP, more elaborate configuration is required. First, install Intel Integrated Performance Primitives (aka IPP). A related note on TBB, from the User's Guide: the opencv_traincascade application can use TBB for multi-threading; to use it in multicore mode, OpenCV must be built with TBB.

After IPP is installed, the system must be configured to find its libraries. To fix up the library paths, pick one of two options.

One: add the following to LD_LIBRARY_PATH and LD_RUN_PATH:

/opt/intel/ipp/lib/intel64:/opt/intel/lib/intel64/

Two: add the following two files under /etc/ld.so.conf.d/, then run ldconfig to refresh the linker cache:

tbb.conf, containing:
/opt/intel/lib/intel64

ipp.conf, containing:
/opt/intel/ipp/lib/intel64

Furthermore, for OpenCV's configuration step to find the installed IPP at SRPM build time, the environment variable IPPROOT must be set, as follows (csh syntax; the bash equivalent is export IPPROOT=/opt/intel/ipp):

setenv IPPROOT /opt/intel/ipp

Build SRPM

Build the modified opencv package with the following custom SPEC file. No configuration options are necessary: WITH_IPP, WITH_TBB, WITH_EIGEN are all enabled.

Then, force install it over the default libs as follows:

rpm -Uvh --nodeps opencv-2.4.9-3 etc etc.

Recompile the OpenCV app in question, and voilà. Optimized. Speedups may vary; I'm seeing ~2.3x speedups in processing times.
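To check a figure like that ~2.3x on your own workload, time the same processing step once linked against the stock opencv-2.4.7 package and once against the rebuilt one. This is a minimal sketch using only std::chrono; the work you pass in (for instance, a hypothetical wrapper that calls CascadeClassifier::detectMultiScale over a fixed set of frames) is yours to supply, and the helper names are illustrative.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock time of `work` over `runs` calls, in milliseconds.
double averageMillis(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < runs; ++i) work();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}

// Speedup of the optimized build relative to the stock build; > 1 means faster.
double speedup(double baselineMillis, double optimizedMillis) {
    return baselineMillis / optimizedMillis;
}
```

Using steady_clock rather than system_clock keeps the measurement immune to wall-clock adjustments during long runs; averaging over several runs smooths out disk and cache effects on the first pass.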

Filed under technology Tagged with c++11, eigen, f20, fedora, ffmpeg, ipp, linux, opencv, openmp, openni, optimization, tbb, unix

Notes on The Machine Is Learning

2014/03/16

This is a series of art videos that were generated as a by-product of an ongoing computational media project further described in GVOD, GVOD + Analytics: Star Treks \\///, etc.

The Machine is Learning, v2.2.
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v2.3 lbp
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v5.7 multi
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v5.7 multi ghost
12:37 minutes, 960 x 720 pixels.

Filed under art, technology Tagged with 720p, algorithms, cascades, eye recognition, face recognition, facial recognition, ffmpeg, generative, haar, lbp, object recognition, opencv, star trek, surfacing survellience, the man eater, tos
