Notes on The Machine Is Learning

This is a series of art videos that were generated as a by-product of an ongoing computational media project further described in GVOD, GVOD + Analytics: Star Treks \\///, etc.

The Machine is Learning, v2.2.
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v2.3 lbp
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v5.7 multi
12:37 minutes, 960 x 720 pixels.

The Machine is Learning, v5.7 multi ghost
12:37 minutes, 960 x 720 pixels.

GVOD + Analytics: Star Treks \\///



This is a fan studies and media assemblage experiment, loosely associated with Professor Abigail De Kosnik’s Fan Data/Net Difference Project at the Berkeley Center for New Media. It uses technology associated with copyright verification and the surveillance state to deconstruct serial television into a hybrid media form.

The motivating question for this work is simple. How does one quantize serial television? Given a television episode, such as the third episode of Star Trek, how can it be measured and then compared to other episodes of Star Trek? Can characters of the original Star Trek television series be compared to characters in different Star Trek universes and franchises, such as comparing Kirk/Spock in Star Trek to Janeway/Seven of Nine in Star Trek Voyager? Given a media text, how do you tag and score it? If you cannot score the whole text, can you score a character or characters? How do characters or elements of a media text become countable?

Media Texts:

Star Trek (The Original Series), aka TOS, 1966 to 1969. Episodes: 79. Running time each: 50 minutes. English subtitles from subscene for Seasons 1, 2, 3.

Star Trek Voyager, aka VOY, 1995 to 2001. Episodes: S03E26 to S07E26, ie #68 to #172, a total of 104. Running time each varies between 45 and 46 minutes.

Media Focus/Themes:

The pairs of Kirk/Spock in Star Trek the Original Series and Janeway/Seven of Nine in Star Trek Voyager will be compared in a media-analytic fashion.

A popular fanfic genre is called One True Pairing, aka OTP, which is a perceived or invented romantic relationship between two characters. One of the best known examples of OTP is the pair of Kirk and Spock on TOS. Indeed, fanfic involving Kirk and Spock is so popular that it has its own nomenclature: it is called slash, or slash fic.

The pair of Janeway and Seven of Nine is comparable to Kirk and Spock, as both the Janeway and Kirk characters are captains of spaceships, and both the Seven of Nine and Spock characters are presented as “the other” to human characters: both the Borg and the Vulcans are presented as otherworldly, non-human. The two pairs differ in other areas, the most obvious being gender: K/S is male, J/7 is female.

Some edit tapes for K/S can be found on YouTube for Seasons 1, 2, and 3. There are also some fanvids for J/7.

Open Questions:

This is a meta-vidding tool with an analytic overlay. It takes serial television shows and adds facial recognition to count face time and change the focus of viewing to specific character pairs instead of entire episodes. The purpose of this project is to develop the technology to address the analytic questions below, to interpret the answers, and to formulate the next round of questions.

1. Should the method be to use the first 79 episodes in which each character pair appears together? How do you normalize the series and pairs?

Or minute-normalized, after the edits? The current times are:

TOS == 79 x 50 minutes == 3950 “character-pair” minutes total

VOY == 104 x 43 minutes == 4472 “character-pair” minutes total
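One possible reading of "minute-normalized" is to express a pair's screen time as a fraction of each series' total minutes. The sketch below only restates the arithmetic above. The `Series` class and `screen_time_share` function are hypothetical names, and the 395-minute figure in the usage line is a placeholder, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class Series:
    name: str
    episodes: int
    minutes_per_episode: int

    @property
    def total_minutes(self) -> int:
        # Total "character-pair" minutes available in the series.
        return self.episodes * self.minutes_per_episode


def screen_time_share(pair_minutes: float, series: Series) -> float:
    """Fraction of the series' running time in which the pair is on screen."""
    return pair_minutes / series.total_minutes


# Episode counts and running times as listed in the notes above.
TOS = Series("TOS", 79, 50)
VOY = Series("VOY", 104, 43)

if __name__ == "__main__":
    for series in (TOS, VOY):
        print(series.name, series.total_minutes)
    # Placeholder input: 395 minutes of pair screen time in TOS.
    print(screen_time_share(395, TOS))
```

Comparing shares rather than raw minutes keeps the longer VOY run from dominating the comparison.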

2. Best method for facial recognition.

One idea is to use openframeworks, and incorporate an addon. Get FaceTracker library. See video explaining it. Get ofxFaceTracker addon for openframeworks.

Another is to use OpenCV directly.

OpenCV documentation main page.

Tutorial: Object detection with cascade classifiers.

User guide: Cascade Classifier Training.

Contrib/Experimental: Face Recognition with OpenCV. See the cv::FaceRecognizer class.
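As a rough illustration of the cascade classifier route, the sketch below detects faces in one extracted frame using OpenCV's Python bindings. It performs detection only, not recognition of a particular character. The function names, the `min_side` threshold, and the `scaleFactor`/`minNeighbors` values are assumptions chosen for illustration, and the cascade XML path must be supplied by the caller.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels


def detect_faces(image_path: str, cascade_path: str) -> List[Box]:
    """Return bounding boxes of faces found in one frame image."""
    import cv2  # imported here so keep_large() stays usable without OpenCV

    classifier = cv2.CascadeClassifier(cascade_path)
    if classifier.empty():
        raise ValueError("could not load cascade: " + cascade_path)
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError("could not read image: " + image_path)
    gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
    # scaleFactor and minNeighbors are untuned starting values (assumption).
    found = classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [(int(x), int(y), int(w), int(h)) for (x, y, w, h) in found]


def keep_large(boxes: List[Box], min_side: int = 40) -> List[Box]:
    """Drop detections smaller than min_side pixels on either side."""
    return [b for b in boxes if b[2] >= min_side and b[3] >= min_side]
```

A call might look like `keep_large(detect_faces("out0001.png", "/path/to/haarcascade_frontalface_default.xml"))`, where the cascade location depends on the local OpenCV install.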

Many, many variants go into this. Some good links:

Samuel Molinari, People’s Control Over Their Own Image in
Photos on Social Networks, 2012-05-08

Aligning Faces in C++

Tutorial: OpenCV haartraining Naotoshi Seo

Notes on traincascades parameters

Recommended values for detecting



ffmpeg concat

LBP and Facial Recognition Example with Obama

Simple Face recognition using OpenCV, Etienne Membrives, The Pebibyte


IEEE Xplore: Face detection, pose estimation, landmark localization in the wild, 2012

Xiangxin Zhu, Ramanan, D



3. Measuring “character” and “character-pair” screen time. How is this related to the Bechdel test? [2+ women, who talk to each other, about something besides a man] Can this be used to visualize the test, or its flaws as currently conceived? What is Bechdel version 2.0? [2+ women, who talk to each other, about something besides a man, or kids, or family] Can we use this tool to develop new forms?
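If the recognizer emits, for each sampled second, the set of character names it found, then character and character-pair screen time reduce to counting. The sketch below assumes that input shape; the `screen_time` function and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def screen_time(frames: Iterable[Set[str]]) -> Tuple[Counter, Counter]:
    """Count sampled seconds per character and per co-occurring pair.

    frames holds one set of recognized names per sampled second.
    """
    singles: Counter = Counter()
    pairs: Counter = Counter()
    for names in frames:
        unique = sorted(set(names))
        singles.update(unique)
        # Every unordered pair seen in the same frame shares that second.
        pairs.update(combinations(unique, 2))
    return singles, pairs


if __name__ == "__main__":
    # Placeholder data: five sampled seconds, not real measurements.
    sample = [{"Kirk"}, {"Kirk", "Spock"}, set(), {"Spock"}, {"Kirk", "Spock"}]
    singles, pairs = screen_time(sample)
    print(singles["Kirk"], singles["Spock"], pairs[("Kirk", "Spock")])
```

Pair keys are sorted alphabetically, so ("Kirk", "Spock") and ("Spock", "Kirk") never appear as separate entries.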

4. How to auto-tag? How to populate the details of each scene in a tagged format? If original sources have subtitles, is there a way to dump the subs to SRT, and then populate the body of the WordPress post with the transcript? Or, is there a way to use Google’s transcription APIs to try and upload/subtitle/rip?
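For the subtitle route, an SRT file is plain text made of numbered cues, each with a `start --> end` line followed by the dialogue. A minimal parsing sketch, assuming well-formed SRT input, might look like the following; `parse_srt` and `transcript` are hypothetical helper names, and the sample cues in the usage section are placeholders.

```python
import re
from typing import List, Tuple

Cue = Tuple[float, float, str]  # start seconds, end seconds, text
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp: str) -> float:
    match = _TIME.search(stamp)
    if match is None:
        raise ValueError("bad SRT timestamp: " + stamp)
    hours, minutes, secs, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + secs + millis / 1000.0


def parse_srt(text: str) -> List[Cue]:
    """Split SRT text into (start, end, text) cues, skipping malformed blocks."""
    cues: List[Cue] = []
    text = text.replace("\r\n", "\n").lstrip("\ufeff")
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->", 1)
        body = " ".join(line.strip() for line in lines[2:])
        cues.append((_seconds(start), _seconds(end), body))
    return cues


def transcript(cues: List[Cue]) -> str:
    """Join cue text into a plain transcript, one cue per line."""
    return "\n".join(text for _, _, text in cues if text)


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nFirst line\n"
    print(transcript(parse_srt(sample)))
```

Keeping the start and end times alongside the text means the same cues can later be matched against character-pair edit points.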

5. Can the Netflix taxonomy be replicated? Given the patents, can some other organization scheme be devised?


0. Prerequisites

Software/hardware base is: Linux (Fedora 20) on Intel x86_64, using Nvidia graphics hardware and software. That is, a contemporary rendering and film production workstation.

Additional software is required on top of this base. For instance, a g++ development environment, ffmpeg, OpenCV, and openFrameworks 0.8.0.

Make sure opencv is installed.

yum install -y opencv opencv-core opencv-devel opencv-python opencv-devel-docs

See OpenCV Configuration and Optimization Notes for more information about speeding up OpenCV on fedora.

1. Digitize selected episodes for processing with digital tools

Decrypt via makemkv. Compress to a 3k constant bitrate rip with HandBrake.

Using 720p version of TOS in matroska media container. Downloaded SRT subtitles from fan sites. Media ends up being: 960×720, 24 frames a second.

2. Quantize each episode to a select number of frames.

Make sure ffmpeg is installed.

yum install -y ffmpeg ffmpeg-devel ffmpeg-libs


Sample math as follows. Assume a fifty minute show has 24 frames a second. That is:

50 minutes x 60 seconds in a minute x 24 frames a second == 72k total frames an episode.

Assuming a one-frame-a-second sample resolution gives 50 x 60 == 3k frames for the total set of frames in TOS episode one. Use ffmpeg to create a thumbnail image every X seconds of video, with X set to one so that one image is written per second.


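# Create the output directory if needed; $1 is the input video file.
mkdir -p "$TMPDIR";
# Write one zero-padded PNG per second of video.
ffmpeg -i "$1" -f image2 -vf fps=1 "${TMPDIR}/out%04d.png";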
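The frame arithmetic, and the mapping from an extracted image back to a position in the episode, can be sketched as follows. The helper names are hypothetical. The timecode mapping assumes that image number N corresponds to second N - 1 of the source, which is an approximation: the exact timestamps chosen by ffmpeg's fps filter may differ slightly.

```python
import re


def source_frames(minutes: int, fps: int = 24) -> int:
    """Total frames in an episode at the source frame rate."""
    return minutes * 60 * fps


def sampled_frames(minutes: int, samples_per_second: int = 1) -> int:
    """Number of thumbnails written at the given sampling rate."""
    return minutes * 60 * samples_per_second


def frame_index(filename: str) -> int:
    """Extract the numeric index from a name such as out0042.png."""
    match = re.search(r"out(\d+)\.png$", filename)
    if match is None:
        raise ValueError("unexpected frame name: " + filename)
    return int(match.group(1))


def frame_timecode(index: int) -> str:
    """Approximate HH:MM:SS for a one-per-second frame (numbering starts at 1)."""
    seconds = index - 1
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


if __name__ == "__main__":
    print(source_frames(50), sampled_frames(50))
    print(frame_timecode(frame_index("out0042.png")))
```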

3. Sort through frames and set aside twelve frames of Kirk faces, twelve frames of Spock faces.

This is used later, to train the facial recognition. Note: you definitely need hundreds and even thousands of positive samples for faces. In the case of faces you should consider all the race and age groups, emotions and perhaps beard styles.

For example, meet the Kirks.

And here are the Spocks.

In addition, this technique requires a negative set of images. These are images that are from the media source, but do not contain any of the faces that are going to be recognized. These are used to train the facial recognizer. Meet the non-K/S-faces.

4. Seed facial recognition with faces to recognize. Scan frames with facial recognition according to some input and expected result algorithm, and come up with edit lists that can be used to select the frames that are relevant to the character-pair.

Need either timecode or some other measure that can be dumped with an edit decision list or specific timecode marks. Some persistent data structure? Edits made.
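One candidate for the persistent data structure is a list of (start, end) segments in seconds, stored as JSON. The sketch below assumes the recognizer yields the sampled seconds in which the character-pair was found. The function names, the `max_gap` and `min_length` defaults, and the episode label are all hypothetical.

```python
import json
from typing import Iterable, List, Tuple

Segment = Tuple[int, int]  # start second (inclusive), end second (exclusive)


def build_edl(hit_seconds: Iterable[int], max_gap: int = 2,
              min_length: int = 3) -> List[Segment]:
    """Merge nearby hit seconds into segments and drop very short ones."""
    merged: List[List[int]] = []
    for second in sorted(set(hit_seconds)):
        if merged and second - merged[-1][1] <= max_gap:
            merged[-1][1] = second + 1  # extend the current segment
        else:
            merged.append([second, second + 1])  # start a new segment
    return [(s, e) for s, e in merged if e - s >= min_length]


def edl_to_json(episode: str, segments: List[Segment]) -> str:
    """Serialize an edit decision list for one episode."""
    data = {"episode": episode, "segments": [list(s) for s in segments]}
    return json.dumps(data, indent=2)


def edl_from_json(text: str) -> Tuple[str, List[Segment]]:
    """Load an edit decision list written by edl_to_json."""
    data = json.loads(text)
    return data["episode"], [(s, e) for s, e in data["segments"]]


if __name__ == "__main__":
    # Placeholder hits, not real recognizer output.
    hits = [10, 11, 12, 14, 30, 31, 32, 33, 50]
    print(edl_to_json("tos-s01e01", build_edl(hits)))
```

Bridging small gaps keeps a scene from being chopped up when the recognizer misses a frame or two, and the minimum length filters out isolated false positives.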

5. Decompose episode into character-pair edit vids.

Use edit decision list or specific timecode marks, as above. Automate ffmpeg to make edits.
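Given an edit decision list expressed as (start, end) pairs in seconds, the ffmpeg invocations can be generated rather than typed by hand. The sketch below only builds the argument lists and does not run them. The helper names and the `clip` file prefix are hypothetical. It uses stream copy (`-c copy`), which avoids re-encoding but cuts on keyframes, so clip boundaries are approximate.

```python
from typing import List, Sequence, Tuple


def cut_command(source: str, start: int, end: int, dest: str) -> List[str]:
    """ffmpeg arguments that copy the span [start, end) seconds into dest."""
    return ["ffmpeg", "-ss", str(start), "-i", source,
            "-t", str(end - start), "-c", "copy", dest]


def cut_commands(source: str, segments: Sequence[Tuple[int, int]],
                 prefix: str = "clip") -> List[List[str]]:
    """One cut command per segment, writing clip001.mkv, clip002.mkv, ..."""
    return [cut_command(source, s, e, "%s%03d.mkv" % (prefix, n))
            for n, (s, e) in enumerate(segments, 1)]


def concat_list(paths: Sequence[str]) -> str:
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join("file '%s'\n" % p for p in paths)


def concat_command(list_file: str, dest: str) -> List[str]:
    """ffmpeg arguments that join the listed clips into one edit vid."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", dest]


if __name__ == "__main__":
    for command in cut_commands("episode.mkv", [(10, 15), (30, 34)]):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the output looks right.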

6. Store in a WordPress container, one post per edit vid? Then, with another post, tie together all of a single episode’s edit vids into one linked post?


There are both copyright risks and patent opportunities in this line of inquiry.

Production Notes:

How Netflix Reverse Engineered Hollywood, Alexis Madrigal, The Atlantic, 2014-01-02