facial recognition

All the Uhuras

2014/07/31

All the Left/Right Uhuras, 2014
All the Center Uhuras, 2014

68 cm x 86.5 cm, Inkjet over lapis and silvertone wash on Awagami Bamboo 250 gsm paper. Master jedi paper tricks via Emily York.

These prints are composed of 288 cropped images of Uhura from the television show Star Trek. Each frame of the first season is analyzed with facial recognition software, and found Uhura faces are either inscribed with tattoo-like circles representing individual facial detection algorithms, or scaled, cropped, and center-aligned via sophisticated image-processing routines.

To offset the explicitly computed nature of this work, the images are aligned on broken grids, and floated on an organic background of silvertone metallic or lapis mineral pigments.

Filed under art Tagged with art, awagami, bamboo, binding, broken grid, california, facerec, facial detection, facial recognition, filmstrip, flandmarks, GAF 2014, grid, hong kong, inkjet, landmarks, lapis, master jedi, opencv, overprinting, paper tricks, san-fran-cisco, savoir faire, seam them together, silvertone, uhura, uhura fandom, warhol, wash, wheatpaste, works on paper

Ten Thoughts On Detection and Recognition

2014/07/10

Installation of The Machine Is Learning v# at Pro Arts Oakland, 2014

Two 1080p LED televisions, 2 8GB SD Cards, video @ 960 x 720 pixel, loops of 3:00:00 and 2:00:00 hours.

ONE Tens, hundreds, thousands. How many photographs do you look at every day? Do you swipe through photographs on a smart phone, scroll through images on Tumblr, scrub through frames on Netflix? More images per day than last year? Two years ago? Ten years ago? I wanted to explore an environment of extreme multiplicity. Thousands, tens of thousands of images. I wanted to create and manipulate images in pre-determined ways, but using image mass-generation techniques. Metadata is a by-product of the mass manufacture and processing of images, and exploring the oceans of possible patterns created as a by-product of a machine counting thousands of images had great appeal. Like generating waves with a mechanical device in a wave pool, this would be a super-human way of counting, with very clear rules about the perception of the image: detection, perception codified. I picked facial recognition algorithms as my art tool, and use them to count faces.

Uhura Audit 23 Details S01E04 frame 1391: All Faces, All Eyes, All Lips, 2014

3 x 960 x 720 pixels

TWO The appeal of All Eyes, Multiple Eyes, Fanciful Eyes. The three augmented frames above comprise a sample from the 1391th second of Star Trek TOS, the 4th episode of the first season (S01E04). As part of the initial processing and detection of objects, each frame is scanned for face-like objects, then for eye-like objects, and then for lip-like objects. The first frame is “All Faces,” which is comprised of 5 algorithms run up to 3 different ways. The second frame is “All Eyes,” which is comprised of 7 algorithms, all run the same way. The last is “All Lips,” which is generated by 2 algorithms, also all run in a similar manner. Note that “All Eyes” is generated by running the algorithm over the entire frame, instead of over just the detected face area. I am consciously mis-using this algorithm to generate ornamentation. This is of dubious utility for anything other than generating pictures that dazzle: generated virtual tattoos, a lot of pixel ink spread around to establish identity.

A Few Frames from Spock Audit 23 Results, 2014

4 x 960 x 720 pixels

THREE Spock Eyebrows and Cat-Eye Makeup Considered Harmful. The frames above show a face detection mistake that heavily impacts efforts to use human-face detection algorithms on the non-human face of the character Spock. In the Star Trek universe, Spock is a hybrid human: half human, half alien (Vulcan). For some reason, the green box that outlines the boundary of the detected Spock face in each of the images above truncates the face at or above the lips. I speculate that this augmentation pattern is caused by the heavily arched eyebrows of Spock being mistaken for prominent cheek bones, effectively spiking the recognized facial region upwards, clipping the mouth. Looking at different episodes of the series, I see this same mis-detection replicated in Uhura frames with extreme cat-eye makeup.

Git Wins, 2014

g+ post, text

FOUR Reflections on Process. This exploration started as a generative art project targeting media. Little did I know that I’d end up buying my first television sets as part of the development process, and lose myself into the void of a consciously computed image environment. Intermittently back from the void, let me report on what I’ve found from time spent doing art and science at the same. I see a lot of engineering processes: a strangely elliptical 10-13 day project management schedule for keeping on track of technical tasks, source code versioning, a lot of linux system-level administration and tuning. Tens and tens of thousands of lines of C++11 and one hot processor, the requisite kludgy bash scripts, and gigabytes of image files nestled in directories by the six thousand. I started inserting art tasks into the project management scheme, forcing visual priorities to become explicit. As an art process, the manipulation of thousands of frames, picking apart video and putting it back together has given some interesting insights into Star Trek. I’ve noticed certain editing patterns previously invisible, the favoring of certain sides of characters by the camera, and that counting the characters frames changes the character’s meta-narrative in my head. All these formalist film analysis, done by algorithm.

Sort 4 Negatives All Faces, 2014

960 x 720, 8:28 minutes

FIVE Faceless, Negatives, and the Fleeting Form. In addition to finding faces, one can use face detection to assure that there is no face, to detect nothing. That’s what these frames are, a moody collection of people walking through closing doors, turning corners, of machines levitating through space, space ships cruising the galaxy, of cropped hands, the backs of heads, murky faces in the shadows. It turns out that along with the rise of face detection, and the rise in the capabilities in easy-to-use commercial forms like Facebook photo-tagging, is a concurrent rise in the desire for explicit face removal. Search the #faceless tag on the social media platform of your choice for a peek into this particularly interesting data set.

A Few Frames from Kirk Audit 16 Results, 2014

3 x 960 x 720 pixels

SIX Kirk Dimples Considered Harmful. Another mistaken detection, particular to Kirk: the deep chin cleft is a shape that eye detectors can consider eyes. Proper tuning and some additional computation fixes this issue, but the mind reels. Human faces are mostly symmetric, but filled with parts that make them non-uniform. There is no recognizer for scars, dimples, or birth marks.

Select Frames from Audit 16 Results, 2014

4 x 960 x 720 pixels

SEVEN Computers Recognizing Computers. Both Uhura and Spock characters are often seated in front of bank of blinking lights, a typical 1960’s imagineering of technology. By chance, the relative size and positions of the lights comprising the background technology, and the profile nature of the seated character, combine to confuse the face recognizer algorithms.

126 Uhuras as Seen on TV, 2014

17 x 45″, inkjet on two sheets of Awagami paper, space

EIGHT The rise of Uhura. Before I started this project, my favorite Star Trek character was Spock. Now it’s Uhura. Of the three Star Trek characters I’m stalking with face recognition, Uhura has the least number of good samples. At this point, there are 2909 Kirk positive samples, 1838 Spock samples, and 288 Uhura samples. Just as a point of reference, there are 351 title frames in an equivalent sample of Star Trek frames. That there are less solo-Uhura frames in a typical Star Trek episode than title-credit-end-credit sequence frames can be explicitly quantified. I catch myself constantly scheming to figure out edits and algorithms that will give her more screen time, retro-actively. I triple-count the Uhura frames as I count all her eyes, all her faces, all her lips. Computer, tell me about gender and race in 1966-69 USA.

Sort 4 All Grid Uhuras Waterfall, 2014

960 x 720, 48 seconds

Miscellaneous Augmentation Keys, 2014

svg files, text

NINE Compression, augmentation, visualization. A part of this project involved running every single face, eye, and mouth recognizer over a “frame” of Star Trek episode. To get an idea how the algorithms were performing in my setup, I created these color-coded augmentation schemes that allowed me to look at the results of algorithm, directly applied on the original frame. Each frame is then smashed up against another in a compressed form, and watched on a screen or projected against a wall. That’s what these video clips encode: first detected faces, and then later recognized characters and specific character interactions.

audit version: 23f      2014-07-08

kirk samples: 2909
kirk samples with detected faces: 2780

front faces: 1081
(h x w)              Max 527      Median 312   Min 134    
(faces <= size)      1025 <= 400  610 <= 320   168 <= 240

profile faces: 639
(h x w)              Max 502      Median 317   Min 123    
(faces <= size)      582 <= 400   330 <= 320   83 <= 240


tos-101-0130
  face:   294  162  407  407
  eye 1: [82 x 82 from (83, 117)]
  lips:   3
[161 x 97 from (113, 301)]
[155 x 93 from (60, 117)]
[144 x 86 from (229, 129)]

Kirk Audit 23 Results, 2014

c++ 11, math, text

TEN Rule-based Perception. Humans see faces, and don’t consciously count eye offsets, have to think about detecting mouths, or consider that a sufficiently cleft chin can trick a common eye recognizer. Humans look at faces and see friends, family, stereotypes and simplifications, other people. Computers look at faces and calculate eye position, estimate posture, automatically center-align eyes and re-scale the entire face to fit. Faces are calculated and enumerated: parts of the software that created this piece detect faces in an algorithmic and speculative fashion: checks that two eyes are detected with the face boundary, that the detected eyes are about the same size, that there is some horizontal spacing between the eyes, and that the vertical distance between the eyes is not so extreme as to indicate failure. Humans see faces. Computers see points of interest in a region specified as interesting. Teaching a computer to see like humans therefore involves moments where the humans decide that the picture in front of you is a full-frontal face when it has no more than 15% deviation from an imaginary nose plane. That both ears are visible. When everything on a face is measured and quantified, there is so much more information, and with it the corresponding ability to both recognize and mis-interpret. Make no mistake, soon computers will recognize more faces and emotions than a sleep-deprived parent, or a tired law enforcement officer. And then, what?

Ambient

Sol LeWitt Artist’s Books, Corraini Edizioni, 2010

Agnes Martin, Tremolo, 1962

Andy Warhol, Shadows, 1978-79

Barnett Newman, The Stations of the Cross, 1958-66

Research Notes

Trevor Paglen, 1: Is Photography Over?, Foto Museum Winterthur, 2014-03-03

Trevor Paglen, 2: Seeing Machines, Foto Museum Winterthur, 2014-03-13

Trevor Paglen, 3: Scripts, Foto Museum Winterthur, 2014-03-24

Trevor Paglen, 4: Geographies of Photography, Foto Museum Winterthur, 2014-04-11

Filed under art, technology Tagged with being human, c++11, california, cognition, dena beard, detection, digital humanities, disquiet, eigen, embellishment, face detection, face recognition, facerec, facial recognition, ffmpeg, fisher, fun, generative, generative art, generative fan labor, gig networking, haar, hella c++, lbph, machine learning, matrix, memory, metadata, mkv, NAS, oakland, objdetect, off into the unknown, opencv, pessoa, pro arts, recognition, reconstruction, the kiss, the machine is learning, uhura, uhura #1, uhura fandom, unease, what's a face?, what's front? what's profile?

Notes on The Machine Is Learning

2014/03/16

This is a series of art videos that were generated as a by-product of an ongoing computational media project further described in GVOD, GVOD + Analytics: Star Treks \\///, etc.

The Machine is Learning, v2.2.
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v2.3 lbp
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v5.7 multi
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v5.7 multi ghost
12:37 minutes, 960 x 720 pixels.

Filed under art, technology Tagged with 720p, algorithms, cascades, eye recognition, face recognition, facial recognition, ffmpeg, generative, haar, lbp, object recognition, opencv, star trek, surfacing survellience, the man eater, tos

GVOD + Analytics: Star Treks \\///

2014/03/02

This is a fan studies and media assemblage experiment, loosely associated with Professor Abigail De Kosnik’s Fan Data/Net Difference Project at the Berkeley Center for New Media. It uses technology associated with copyright verification and the surveillance state to desconstruct serial television into a hybrid media form.

The motivating question for this work is simple. How does one quantize serial television? Given a television episode, such as the third episode of Star Trek, how can it be measured and then compared to other episodes of Star Trek? Can characters of the original Star Trek television series be compared to characters in different Star Trek universes and franchises, such as comparing Kirk/Spock in Star Trek to Janeway/Seven-of-Nine in Star Trek Voyager? Given a media text, how do you tag and score it? If you cannot score the whole text, can you score a character or characters? How do characters or elements of a media text become countable?

Media Texts:

Star Trek (The Original Series), aka TOS, 1966 to 1969. Episodes: 79. Running time each: 50 minutes. English subtitles from subscene for Seasons 1, 2, 3.

Star Trek Voyager, aka VOY, 1995 to 2001. Episodes: S03E26 to S07E26, ie #68 to #172, a total of 104. Running time each varies between 45 and 46 minutes.

Media Focus/Themes:

The pairs of Kirk/Spock in Star Trek the Original Series and Janeway/Seven of Nine in Star Trek Voyager will be compared in a media-analytic fashion.

: James Kirk

: Kathryn Janeway

: Spock

: Seven of Nine

A popular fanfic genre is called One True Pairing, aka OTP, which is a perceived or invented romantic relationship between two characters. One of the best known examples of OTP is the pair of Kirk and Spock on TOS. Indeed, fanfic involving Kirk and Spock is so popular to have its own nomenclature, and is called slash, or slash fic.

The pair of Janeway and Seven of Nine are comparable to Kirk and Spock as both the Janeway and Kirk characters are captains of space ships, and both the Seven of Nine and Spock characters are presented as “the other” to human characters: both the borg and vulcans are presented as otherworldly, non-human. The two pairs are different in other areas, the most obvious being gender: K/S is male, J/7 is female.

Some edit tapes for K/S can be found on YouTube for Seasons 1, 2, and 3. Some fanvids for J/7.

Open Questions:

This is a meta-vidding tool with an analytic overlay. It takes serial television shows and adds facial recognition to count face time and change the focus of viewing to specific character pairs instead of entire episodes. Developing the technology to answer these analytic questions, answering and understanding the answers, and formulating the next round of questions is the purpose of this project.

1. Should the method be the first 79 episodes that the character-pairs are together? How do you normalize the series and pairs?

Or minute-normalized, after the edits? The current times are:

TOS == 79 x 50 minutes == 3950 “character-pair” minutes total

VOY == 104 x 43 minutes == 4472 “character-pair” minutes total

2. Best method for facial recognition.

One idea is to use openframeworks, and incorporate an addon. Get FaceTracker library. See video explaining it. Get ofxFaceTracker addon for openframeworks.

Another is to use opencv directly.

OpenCv documentation main page.

Tutorial: Object detection with cascade classifiers.

User guide: Cascade Classifier Training.

Contrib/Experimental: Face Recognition with OpenCV. See the cv::FaceRecognizer class.

Many, many variants go into this. Some good links:

Samuel Molinari, People’s Control Over Their Own Image in
Photos on Social Networks, 2012-05-08

Aligning Faces in C++

Tutorial: OpenCV haartraining Naotoshi Seo

Notes on traincascades parameters

Recommended values for detecting

ffmpeg concat

LBP and Facial Recognition Example with Obama

Simple Face recognition using OpenCV, Etienne Membrives, The Pebibyte

IEEE Xplore: Face detection, pose estimation, landmark localization in the wild, 2012

Xiangxin Zhu, Ramanan, D

3. Measuring “character” and “character-pair” screen time. How is this related to the bechdel test? [2+ women, who talk to each other, about something besides a man] Can be this used to visualize it or flaws as currently conceived? What is bechdel version 2.0? [2+ women, who talk to each other, about something besides a man, or kids, or family] Can we use this tool to develop new forms?

4. How to auto-tag? How to populate the details of each scene in a tagged format? If original sources have subtitles, is there a way to dump the subs to SRT, and then populate the body of the wordpress with the transcript? Or, is there a way to use google’s transcription API’s to try and upload/subtitle/rip?

5. Can the netflix taxonomy be replicated? Given the patents, can some other organization scheme be devised?

Methodology:

0. Prerequisites

Software/hardware base is: Linux (Fedora 20) on Intel x86_64, using Nvidia graphics hardware and software. Ie, contemporary rendering and film production workstation.

Additional software is required on top of this base. For instance, g++ development environment, ffmpeg, opencv, openframeworks.080.

Make sure opencv is installed.

yum install -y opencv opencv-core opencv-devel opencv-python opencv-devel-docs

See OpenCV Configuration and Optimization Notes for more information about speeding up OpenCV on fedora.

1. Digitize selected episodes for processing with digital tools

Decrypt via makemkv. Compress to 3k constant rip with HandBrake.

Using 720p version of TOS in matroska media container. Downloaded SRT subtitles from fan sites. Media ends up being: 960×720, 24 frames a second.

2. Quantize each episode to a select number of frames.

Make sure ffmpeg is installed.

yum install -y ffmpeg ffmpeg-devel ffmpeg-libs

Sample math as follows. Assume a fifty minute show has 24 frames a second. That is:

50 minutes x 60 seconds in a minute x 24 frames a second == 72k total frames an episode.

Assuming a one-frame-a-second sample resolution gives 3k frames for the total set of frames in TOS episode one. Use ffmpeg to create a thumbnail image ever X seconds of video. And set to one image every second.

Via:

TMPDIR=tmp-1
mkdir $TMPDIR;
ffmpeg -i $1 -f image2 -vf fps=fps=1 ${TMPDIR}/out%4d.png;

3. Sort through frames and set aside twelve frames of Kirk faces, twelve frames of Spock faces.

This is used later, to train the facial recognition. Note: you definitely need hundreds and even thousands of positive samples for faces. In the case of faces you should consider all the race and age groups, emotions and perhaps beard styles.

For example, meet the Kirks.

And here are the Spocks.

In addition, this technique requires a negative set of images. These are images that are from the media source, but do not contain any of the faces that are going to be recognized. These are used to train the facial recognizer. Meet the non-K/S-faces.

4. Seed facial recognition with faces to recognize. Scan frames with facial recognition according to some input and expected result algorithm, and come up with edit lists that can be used to frames that are relevant to the character-pair.

Need either timecode or some other measure that can be dumped with an edit decision list or specific timecode marks. Some persistent data structure? Edits made.

5. Decompose episode into character-pair edit vids.

Use edit decision list or specific timecode marks, as above. Automate ffmpeg to make edits.

6. Store in wordpress container, one post per edit vid? Then with another post, tie together all of a single episode edit vids into one linked post?

Legal

There are both copyright risks and patent opportunities in this line of inquiry.

Production Notes:

Further
cinemetrics
How Netflix Reverse Engineered Hollywood, Alexis Madrigal, The Atlantic, 2014-01-02

Filed under art, technology Tagged with analytics, anti-yt, art, assemblage, binary pattern recognition, broadcast, california, cascades, censorship, comparative media, cv::faceRecognizer, deconstruct, edit tape, empower, facial recognition, fan labor, fem slash, fuck the surveillance state, fuck-yt, generative, generative fan labor, gvod, gvoda, haar, hyperlinking, j/7, janeway, k/s, kirk, media, media fandom, media literacy, media quanta, meta, metavid, mixed-media, mulitple medias, multi-media, multiple literacy, next level, ninjas, nvidia tesla, object recognition, of, opencv, openframeworks, OTP, pattern recognition, processing, production, quanta, quantize, remix, resist, san-fran-cisco, self-censorship, serial media, serial television, shipping, slash, spock, star trek, streaming, stunt, subversion, supercuts, television, vidding, yt-yourself

sunglint

All the Uhuras

Ten Thoughts On Detection and Recognition

Notes on The Machine Is Learning

GVOD + Analytics: Star Treks \\///

Tags

Archives