GVOD + Analytics: Star Treks \\///



This is a fan studies and media assemblage experiment, loosely associated with Professor Abigail De Kosnik’s Fan Data/Net Difference Project at the Berkeley Center for New Media. It uses technology associated with copyright verification and the surveillance state to deconstruct serial television into a hybrid media form.

The motivating question for this work is simple: how does one quantize serial television? Given a television episode, such as the third episode of Star Trek, how can it be measured and then compared to other episodes of Star Trek? Can characters of the original Star Trek television series be compared to characters in different Star Trek universes and franchises, such as comparing Kirk/Spock in Star Trek to Janeway/Seven-of-Nine in Star Trek Voyager? Given a media text, how do you tag and score it? If you cannot score the whole text, can you score a character or characters? How do characters or elements of a media text become countable?

Media Texts:

Star Trek (The Original Series), aka TOS, 1966 to 1969. Episodes: 79. Running time each: 50 minutes. English subtitles from subscene for Seasons 1, 2, 3.

Star Trek Voyager, aka VOY, 1995 to 2001. Episodes: S03E26 to S07E26, i.e. #68 to #172, a total of 104. Running time of each varies between 45 and 46 minutes.

Media Focus/Themes:

The pairs of Kirk/Spock in Star Trek the Original Series and Janeway/Seven of Nine in Star Trek Voyager will be compared in a media-analytic fashion.

A popular fanfic genre is called One True Pairing, aka OTP, which is a perceived or invented romantic relationship between two characters. One of the best known examples of OTP is the pair of Kirk and Spock on TOS. Indeed, fanfic involving Kirk and Spock is so popular as to have its own nomenclature: it is called slash, or slash fic.

The pair of Janeway and Seven of Nine is comparable to Kirk and Spock: both the Janeway and Kirk characters are captains of starships, and both the Seven of Nine and Spock characters are presented as “the other” to the human characters, since both the Borg and the Vulcans are portrayed as otherworldly, non-human. The two pairs differ in other areas, the most obvious being gender: K/S is male, J/7 is female.

Some K/S edit tapes covering Seasons 1, 2, and 3 can be found on YouTube, as can some J/7 fanvids.

Open Questions:

This is a meta-vidding tool with an analytic overlay. It takes serial television shows and adds facial recognition to count face time, changing the focus of viewing from entire episodes to specific character pairs. Developing the technology to answer these analytic questions, understanding the answers, and formulating the next round of questions is the purpose of this project.

1. Should the method be to use the first 79 episodes in which the character-pairs are together? How do you normalize across series and pairs?

Or should the comparison be minute-normalized, after the edits? The current times are:

TOS == 79 x 50 minutes == 3950 “character-pair” minutes total

VOY == 104 x 43 minutes == 4472 “character-pair” minutes total
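The arithmetic above can be checked mechanically. A minimal Python sketch of minute-normalization, using the per-episode figures as given here (50 minutes for TOS, 43 for the VOY edits):

```python
# Minute-normalization sketch for comparing "character-pair" totals
# across series of unequal length. Figures are taken from the notes above.

def pair_minutes(episodes, minutes_per_episode):
    """Total character-pair minutes for a series."""
    return episodes * minutes_per_episode

tos_total = pair_minutes(79, 50)   # 3950 minutes
voy_total = pair_minutes(104, 43)  # 4472 minutes

def normalize(count, total_minutes):
    """Scale a raw per-pair count (e.g. seconds of shared face time)
    by the series total, so different series are comparable."""
    return count / total_minutes

print(tos_total, voy_total)  # 3950 4472
```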

2. What is the best method for facial recognition?

One idea is to use openFrameworks and incorporate an addon: get the FaceTracker library, watch the video explaining it, and get the ofxFaceTracker addon for openFrameworks.

Another is to use OpenCV directly.

OpenCV documentation main page.

Tutorial: Object detection with cascade classifiers.

User guide: Cascade Classifier Training.

Contrib/Experimental: Face Recognition with OpenCV. See the cv::FaceRecognizer class.
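As a concrete starting point for the cascade-classifier route, here is a minimal Python sketch. The cascade file name is OpenCV's stock frontal-face model (its install path varies by distribution), and cv2 is imported lazily so the filtering helper below works even without OpenCV installed:

```python
def detect_faces(image_path, cascade_path="haarcascade_frontalface_default.xml"):
    """Return (x, y, w, h) face rectangles found in one frame.

    Uses OpenCV's Haar cascade detector; cv2 is imported here so the
    rest of the module is usable without OpenCV.
    """
    import cv2
    cascade = cv2.CascadeClassifier(cascade_path)
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return [tuple(r) for r in
            cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)]

def big_enough(rects, min_side=48):
    """Drop tiny detections, which tend to be noise at 960x720."""
    return [r for r in rects if r[2] >= min_side and r[3] >= min_side]
```

The detection parameters (scaleFactor, minNeighbors, min_side) are guesses to be tuned against the actual frames.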

Many, many variants go into this. Some good links:

Samuel Molinari, People’s Control Over Their Own Image in Photos on Social Networks, 2012-05-08

Aligning Faces in C++

Tutorial: OpenCV haartraining, Naotoshi Seo

Notes on traincascades parameters

Recommended values for detecting

ffmpeg concat

LBP and Facial Recognition Example with Obama

Simple Face Recognition using OpenCV, Etienne Membrives, The Pebibyte

Face detection, pose estimation, and landmark localization in the wild, Xiangxin Zhu and D. Ramanan, IEEE Xplore, 2012

3. Measuring “character” and “character-pair” screen time. How is this related to the Bechdel test? [2+ women, who talk to each other, about something besides a man] Can this be used to visualize it, or its flaws as currently conceived? What is Bechdel version 2.0? [2+ women, who talk to each other, about something besides a man, or kids, or family] Can we use this tool to develop new forms?
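One way to make pair screen time countable: given a per-sampled-second record of which characters the recognizer saw, co-presence is a simple intersection. A sketch (the character labels are hypothetical, and note that a real Bechdel measure would also need dialogue, not just faces):

```python
def pair_seconds(frames, a, b):
    """Count sampled seconds in which both characters appear.

    frames is a list with one entry per sampled second; each entry is the
    set of character labels recognized in that frame.
    """
    return sum(1 for present in frames if a in present and b in present)

# Toy example: 5 sampled seconds of a hypothetical scene.
frames = [{"kirk"}, {"kirk", "spock"}, {"spock"}, {"kirk", "spock"}, set()]
print(pair_seconds(frames, "kirk", "spock"))  # 2
```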

4. How to auto-tag? How to populate the details of each scene in a tagged format? If the original sources have subtitles, is there a way to dump the subs to SRT, and then populate the body of the WordPress post with the transcript? Or, is there a way to use Google’s transcription APIs to upload/subtitle/rip?
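Whatever the subtitle source, SRT is plain text and parses with the standard library alone. A sketch that pulls (start, end, text) triples from an .srt dump so they can later be matched against recognized frames:

```python
import re

# One SRT cue: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm", then text lines.
CUE = re.compile(
    r"(\d+)\s*\n(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*\n(.*?)(?:\n\n|\Z)",
    re.DOTALL,
)

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, subtitle_text)."""
    cues = []
    for m in CUE.finditer(text):
        start = to_seconds(*m.groups()[1:5])
        end = to_seconds(*m.groups()[5:9])
        cues.append((start, end, m.group(10).strip()))
    return cues
```

Real fan-made SRT files can be messier than this (BOMs, CRLF endings, stray blank lines), so treat this as a first pass.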

5. Can the Netflix taxonomy be replicated? Given the patents, can some other organization scheme be devised?


0. Prerequisites

The software/hardware base is Linux (Fedora 20) on Intel x86_64, using Nvidia graphics hardware and software. That is, a contemporary rendering and film production workstation.

Additional software is required on top of this base: for instance, the g++ development environment, ffmpeg, OpenCV, and openFrameworks 0.8.0.

Make sure opencv is installed.

yum install -y opencv opencv-core opencv-devel opencv-python opencv-devel-docs

See the OpenCV Configuration and Optimization Notes for more information about speeding up OpenCV on Fedora.

1. Digitize selected episodes for processing with digital tools

Decrypt via MakeMKV. Compress to a 3k constant-bitrate rip with HandBrake.

Using the 720p version of TOS in a Matroska media container, with SRT subtitles downloaded from fan sites. The media ends up being 960×720, at 24 frames a second.

2. Quantize each episode to a select number of frames.

Make sure ffmpeg is installed.

yum install -y ffmpeg ffmpeg-devel ffmpeg-libs


The sample math is as follows. Assume a fifty-minute show has 24 frames a second. That is:

50 minutes x 60 seconds in a minute x 24 frames a second == 72k total frames an episode.

Assuming a one-frame-a-second sample resolution gives 3k frames for the total set of frames in TOS episode one. Use ffmpeg to create a thumbnail image every X seconds of video, and set it to one image every second.


mkdir -p "$TMPDIR";
ffmpeg -i "$1" -f image2 -vf fps=fps=1 "${TMPDIR}/out%04d.png";
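A quick sanity check after extraction is to compare the number of thumbnails with the episode length; at one frame per second, a 50-minute episode should yield about 3000 files. A stdlib-only sketch:

```python
import glob
import os

def count_frames(frame_dir, pattern="out*.png"):
    """Count the thumbnail frames ffmpeg wrote into a directory."""
    return len(glob.glob(os.path.join(frame_dir, pattern)))

def expected_frames(minutes, fps=1):
    """Expected thumbnail count for a show of the given length."""
    return minutes * 60 * fps

print(expected_frames(50))  # 3000
```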

3. Sort through frames and set aside twelve frames of Kirk faces, twelve frames of Spock faces.

This is used later, to train the facial recognition. Note: you will need hundreds or even thousands of positive samples of faces. In the case of faces you should consider all the race and age groups, emotions, and perhaps beard styles.

For example, meet the Kirks.

And here are the Spocks.

In addition, this technique requires a negative set of images: images from the media source that do not contain any of the faces to be recognized. These are used to train the facial recognizer. Meet the non-K/S faces.
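The OpenCV training tools consume plain-text listing files: a background file with one negative-image path per line, and an info file where each line is a positive image path, a face count, and the bounding boxes. Generating those lines is trivial; a sketch (the directory layout here is an assumption):

```python
import os

def bg_lines(neg_dir, names):
    """Lines for the negatives file (bg.txt): one image path per line."""
    return [os.path.join(neg_dir, n) for n in sorted(names)]

def info_line(path, rects):
    """One positives-info line: <path> <count> <x y w h> per face."""
    coords = " ".join("%d %d %d %d" % r for r in rects)
    return "%s %d %s" % (path, len(rects), coords)

# Example: one frame containing a single face box.
print(info_line("frames/out0001.png", [(120, 40, 64, 64)]))
# frames/out0001.png 1 120 40 64 64
```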

4. Seed facial recognition with faces to recognize. Scan frames with facial recognition according to some input and expected result algorithm, and come up with edit lists that can be used to extract the frames that are relevant to the character-pair.

This needs either timecode or some other measure that can be dumped with an edit decision list or specific timecode marks. Some persistent data structure for the edits made?
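Since frames are sampled at one per second, a list of seconds in which the pair was recognized can be merged into contiguous segments, and those segments are the edit decision list. A minimal sketch:

```python
def to_segments(hit_seconds, gap=1):
    """Merge sorted second indices into (start, end) segments.

    Seconds closer together than `gap` are joined, so a brief recognition
    dropout doesn't split one shot in two. End times are exclusive.
    """
    segments = []
    for s in sorted(hit_seconds):
        if segments and s - segments[-1][1] <= gap:
            segments[-1][1] = s
        else:
            segments.append([s, s])
    return [(a, b + 1) for a, b in segments]

print(to_segments([10, 11, 12, 30, 31]))  # [(10, 13), (30, 32)]
```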

5. Decompose episode into character-pair edit vids.

Use edit decision list or specific timecode marks, as above. Automate ffmpeg to make edits.
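Given segments in seconds, ffmpeg can cut each one with -ss and -t; this sketch only builds the command lines, leaving execution to subprocess or a shell script:

```python
def cut_command(source, start, end, out_path):
    """Build an ffmpeg argv that copies one (start, end) segment.

    Stream copy (-c copy) is fast but snaps to keyframes; re-encode
    instead if frame-accurate cuts are needed.
    """
    return [
        "ffmpeg", "-ss", str(start), "-i", source,
        "-t", str(end - start), "-c", "copy", out_path,
    ]

# One command per segment of a hypothetical episode file.
cmds = [cut_command("s01e01.mkv", a, b, "clip%03d.mkv" % i)
        for i, (a, b) in enumerate([(10, 13), (30, 32)])]
print(cmds[0])
```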

6. Store in a WordPress container, one post per edit vid? Then, with another post, tie together all of a single episode’s edit vids into one linked post?


There are both copyright risks and patent opportunities in this line of inquiry.

Production Notes:

How Netflix Reverse Engineered Hollywood, Alexis Madrigal, The Atlantic, 2014-01-02

libabigail aka C++ Instrumentation and Analysis


Libabigail is shorthand for the alternative, which just so happens to be a bit of a mouthful: “GNU Application Binary Interface Generic Analysis and Instrumentation Library.”

This is a current compiler/language research topic to provide a serialized XML form of C++11 sources as compiled by GNU g++, and a way of looking at the data produced. This data can be parsed to more accurately determine ABI compatibility, to better understand code additions and changes and how these change the exported interface, to examine and prototype how C++11 language usage determines linkage, etc.

Discussions about this functionality started at the “C++ ABI BOF” at the GNU Tools Cauldron 2012 Prague. This work was created at Red Hat, by Benjamin Kosnik, Jason Merrill, and Dodji Seketeli. Some updates at 2013 Cauldron. See “Cauldron 2013 GCC ABI BOF.”

Development sources are written in mixed C++2003/C++11, hosted in git, based on GCC trunk, and tracking what will become gcc-4.9.0. The branch is administered by Dodji Seketeli.

Please feel free to try it out, but know that the state is experimental and quite raw.

Feedback and assistance are welcome.

Starting from a git working tree as described in GitMirror, add the libabigail repository as follows:

git checkout -b libabigail origin/libabigail

To stay up to date, use:

git pull


How is this expected to be used? First, a libabigail top-level directory is either added to the GCC sources or compiled as a first step and put into some PREFIX directory. The GNU C++ compiler, g++, is configured to use this new library with:

configure .. --with-abigail=$PREFIX

Thus configured, the C++ front end is built, installed, and used as the primary compiler. All sources are compiled with an additional flag, -fdump-abi.

So, this command:

g++ -c -fdump-abi somefile.cc

Creates two files:

  • somefile.o

    The object file

  • somefile.cc.bi

    The XML instrumentation file



The toplevel namespace is abigail.

The interface header files in libabigail:


Doxygen is used to document the sources: try make html to generate the documentation, and look in libabigail-build-dir/doc/api/html/index.html to read it.

And then the binary interface is in libabigail.so.


Each object file is compiled to a translation_unit. The sum of all translation_units is a corpus.

Compiler-generated files are read as serialized input to a translation_unit and de-serialized. Any modified form is written to an output file in serialized form.
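To get a feel for what a serialized translation_unit contains without assuming any particular schema, the XML dump can be summarized with the standard library; this sketch just counts element tags in a .bi file:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def tag_histogram(xml_text):
    """Count element tags in a serialized dump, e.g. a .bi file.

    No particular schema is assumed; this is only a quick overview of
    what the serializer emitted.
    """
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())

# Toy input standing in for a real somefile.cc.bi dump.
sample = "<unit><decl/><decl/><type/></unit>"
print(tag_histogram(sample)["decl"])  # 2
```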

The interface to the C++ intermediate representation is best viewed in the class documentation.

Opinions and Wild Guesses

1. Some formatting tips.

– classes “read” as types, data, member functions. In that order.

– doxygen gives feedback on the state of the doxygen parse in the form of a log as you run “make html.” Read this log: doxygen does a fuzzy parse, and there are formatting things you can do to make it better. Do them. It’s easier to fix up these errors than to figure out why the generated HTML is poor.

2. Use of shared_ptr is intriguing.

There are not really a lot of existing usage patterns for std::shared_ptr in libstdc++ (in C++11). If you look at the page of boost idioms for shared_ptr usage, one notices that there’s not a lot of use of shared_ptrs in interfaces. Yet in libabigail, that is very common. I’m curious about this style question.

And most usage is up for debate; see this Stack Overflow discussion about using shared_ptrs as function arguments. Should the parameters be const references or just shared_ptrs? And another.

Some interesting thinking from Microsoft on shared_ptr usage.

3. Use of virtual binary operators is odd.

The old adage is that operators cause havoc in overload resolution. These are binary operators, but the stigma lingers. A vague feeling is not the same as something definite that’s a hard no; it’s more like the pirate code than a strict coding convention or hard rule. I would say that if you ever start to see strange bugs due to overloading, consider making these (non-operator) functions.

Otherwise, carry on.

Notes on Generative, openFrameworks


Processing and openFrameworks are related. For Processing, see Casey Reas at UCLA.

See Hello, Processing! for a beginning.

Some people: Casey Reas (west), Ben Fry (east), via John Maeda; also Chris Reilly (west) and Chandler McWilliams.


The base is the ulloa config, Fedora 18 on x86_64; secondary is Fedora 17 on x86_64. Note: you’ll need to have a video card and driver suitable for running OpenGL. Intel graphics are easy; Nvidia can be done, but will need the proprietary drivers, not the default nouveau driver.

Follow notes from openframeworks site for linux/64 install, starting with going into the Fedora scripts directory:

cd /home/bkoz/src/openframeworks.v0073_linux64/scripts/linux/fedora

And then:

sudo ./install_codeblocks.sh

sudo ./install_codecs.sh

These should install some packages, if the Codeblocks IDE and some of the development packages for audio or video codecs aren’t already installed. Then run this script, which may start compiling things:

sudo ./install_dependencies.sh

Some editing/slight work-arounds for the cairo includes means using CXXFLAGS="-I/usr/include/cairo" or applying the following small patch:

*** libs/openFrameworks/graphics/ofCairoRenderer.h.orig 2012-12-28 15:48:06.649358899 -0800
--- libs/openFrameworks/graphics/ofCairoRenderer.h      2012-12-28 15:48:59.502292659 -0800
*** 1,10 ****
#pragma once

! #include "cairo-features.h"
! #include "cairo-pdf.h"
! #include "cairo-svg.h"
! #include "cairo.h"
#include "ofMatrix4x4.h"
--- 1,10 ----
#pragma once

! #include "cairo/cairo-features.h"
! #include "cairo/cairo-pdf.h"
! #include "cairo/cairo-svg.h"
! #include "cairo/cairo.h"
#include "ofMatrix4x4.h"

After applying the patch, the dependency-making script above should complete without error. It will probably end with something that looks like:

to launch the application

cd bin

Instead, go up a level and run the compile script, and then the test script.

If everything has gone correctly, then the last script (testAllExamples.sh) will throw up window after window of delicious OF candy, fun to watch. To go on to the next example, close the topmost window.

If you get to this point, then the preliminary setup should be correct.

Open issue: the Linux TCP examples are failing. Start a TCP server? Kill the firewall? Punch holes for the openFrameworks ports?

Examine Examples

Open up the example project files within the Codeblocks IDE.

Start hacking

Put all projects into the compiled openframeworks directory, in the “apps” subdirectory.


0. Form+Code website, links

1. Generative Design website, code page table of contents.

2. openprocessing website, community site.

3. openFrameworks website.

4. ofauckland, ofxCairo examples for linux

5. creativeapplications.net

6. Golan Levin is or was an openframeworks user

7. memo’s libs (MSA libs) on github. Download, then copy into your addons directory.

8. toxiclibs, another good lib

9. podcast on creative coding

10. cinder, which is another, alternative framework not based on processing.

11. generative.net, another good meta-site