Assuming Fedora 24, with GeoIP installed as follows:
dnf install -y GeoIP GeoIP-devel GeoIP-GeoLite-data GeoIP-GeoLite-data-extra;
dnf install -y libcurl libcurl-devel;
dnf install -y libevent libevent-devel --allowerasing;
dnf install -y intltool;
Test that this is working by running geoiplookup on the command line
with an IP address or hostname as its argument. The output should resemble:
GeoIP Country Edition: US, United States
GeoIP City Edition, Rev 1: US, CA, California, Mountain View, 94035, 37.386002, -122.083801, 807, 650
GeoIP ASNum Edition: AS15169 Google Inc.
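To use this output from a script, one option is to split each line into its edition name and fields. The following is a minimal sketch, assuming the "edition: comma-separated fields" layout shown in the sample above; the function name parse_geoiplookup is hypothetical and not part of any GeoIP package.

```python
def parse_geoiplookup(output: str) -> dict:
    """Map each edition name to the list of comma-separated fields after it."""
    results = {}
    for line in output.splitlines():
        # Split at the first ": " only, since edition names may contain commas.
        edition, sep, value = line.partition(": ")
        if not sep:
            continue  # skip blank or unrecognized lines
        results[edition.strip()] = [field.strip() for field in value.split(", ")]
    return results


# Sample text copied from the geoiplookup output above.
sample = """\
GeoIP Country Edition: US, United States
GeoIP City Edition, Rev 1: US, CA, California, Mountain View, 94035, 37.386002, -122.083801, 807, 650
GeoIP ASNum Edition: AS15169 Google Inc.
"""

parsed = parse_geoiplookup(sample)
country_code = parsed["GeoIP Country Edition"][0]
city = parsed["GeoIP City Edition, Rev 1"][3]
```

Splitting on ", " would break on a field that itself contains a comma followed by a space, so this is only suitable for quick checks.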
Other dependencies: rapidjson and rapidxml, for the visualization interface.
dnf install -y rapidjson rapidjson-devel rapidxml-devel;
SSL and crypto support is assumed.
dnf install -y openssl openssl-libs openssl-devel;
For the BitTorrent protocol, use libtorrent built from source.
In addition to the operating system packages, Python support for geolocation is needed. In particular,
bencode, geoip2, and requests are needed. So:
pip install GeoIP
pip install GeoIP2
pip install bencode
pip install requests
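Bencoding is the encoding used for .torrent files and for HTTP tracker responses, which is why a bencode package is on the list. As an illustration of the format only, here is a minimal decoder sketch in pure Python. It is not the API of the bencode package installed above; the names bdecode and _decode_at, and the sample info-hash, are hypothetical.

```python
def bdecode(data: bytes):
    """Decode exactly one bencoded value from data and return it."""
    value, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode_at(data: bytes, i: int):
    """Decode the value starting at index i; return (value, next_index)."""
    lead = data[i:i + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                       # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode_at(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":                       # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode_at(data, i)
            value, i = _decode_at(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():                     # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {i}")


# A made-up scrape-style response for one torrent (hypothetical 20-byte hash).
HYPOTHETICAL_HASH = b"\x01" * 20
sample = (b"d5:filesd20:" + HYPOTHETICAL_HASH +
          b"d8:completei5e10:downloadedi50e10:incompletei10eeee")
stats = bdecode(sample)[b"files"][HYPOTHETICAL_HASH]
```

Keys and strings come back as bytes rather than str, because bencoded strings are raw byte sequences (an info-hash is not valid text).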
Part one of the BitTorrent distribution research was Fall 2015. Part two is Fall 2016.
BitTorrent. Common terms and jargon.
The basic idea is as per The BitTorrent Protocol Specification. Of particular note are the tracker and, especially, the tracker scrape protocol.
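The tracker scrape convention derives a scrape URL from the announce URL: if the text after the last "/" begins with "announce", that word is replaced with "scrape"; otherwise the tracker is taken not to support scraping. A minimal sketch of that rule follows. The function names and the example.com URLs are hypothetical, and the info-hash in the usage lines is made up.

```python
from typing import Optional
from urllib.parse import quote


def scrape_url(announce_url: str) -> Optional[str]:
    """Return the scrape URL for an announce URL, or None if unsupported."""
    head, slash, tail = announce_url.rpartition("/")
    if not slash or not tail.startswith("announce"):
        return None
    return head + "/scrape" + tail[len("announce"):]


def scrape_request(announce_url: str, info_hash: bytes) -> Optional[str]:
    """Build a scrape URL that asks about a single 20-byte info-hash."""
    base = scrape_url(announce_url)
    if base is None:
        return None
    separator = "&" if "?" in base else "?"
    # The raw hash bytes are percent-encoded, not hex-encoded.
    return base + separator + "info_hash=" + quote(info_hash, safe="")


# Usage with a hypothetical tracker and a made-up hash.
url = scrape_request("http://example.com/announce", b"\x01" * 20)
unsupported = scrape_url("http://example.com/a")
```

The response to such a request is a bencoded dictionary, so the result would still need decoding before the seeder and leecher counts can be read.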
Magnet links: see the Wikipedia entry. PEX, DHT, and magnet links are all explained by Lifehacker.
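A magnet link is a URI whose query parameters carry the info-hash (xt), an optional display name (dn), and optional tracker URLs (tr). A minimal parsing sketch using only the standard library is below; parse_magnet is a hypothetical helper, and the hash, name, and tracker in the example are made up.

```python
from urllib.parse import parse_qs, urlparse

BTIH_PREFIX = "urn:btih:"


def parse_magnet(uri: str) -> dict:
    """Extract info-hashes, display name, and trackers from a magnet URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parsed.query)  # also percent-decodes the values
    info_hashes = [xt[len(BTIH_PREFIX):]
                   for xt in params.get("xt", [])
                   if xt.startswith(BTIH_PREFIX)]
    return {
        "info_hashes": info_hashes,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


# Made-up example link.
example = ("magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
           "&dn=Example+Name"
           "&tr=udp%3A%2F%2Ftracker.example.org%3A6969")
info = parse_magnet(example)
```

A link with no tr parameters gives an empty tracker list, in which case peers would have to be found through DHT or PEX.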
1. Trackers. List of trackers, announcements, UDP, HTTP.
Fetishizing the most current trackers: 1, 2.
2. Public torrents, private torrents, SSL torrents. Why can some of these be scraped while others cannot? Running
transmission-show -s my.torrent returns status for public and private torrents, but not for SSL torrents.
3. Look at transmission dependencies: openssl, libcurl. See the libtransmission includes: transmission.h, variant.h, utils.h. See:
struct tr_tracker_stat, tr_torrentTrackers, tr_torrentTrackersFree, tr_info, tr_tracker_info, tr_torrent_activity. Of note, it looks like peers are read from multiple sources, not just the tracker: TR_PEER_FROM_PEX, TR_PEER_FROM_DHT, TR_PEER_FROM_TRACKER.
4. Look at the SSL cert example in libtorrent. See the libtorrent GitHub source repository.
5. What kind of tree/node visualizations will work? See The Book of Trees.
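On item 1 above: UDP trackers differ from HTTP trackers in that a client must first obtain a connection ID before it can announce or scrape. The sketch below builds that initial connect request and checks the reply, following the packet layout in the UDP tracker protocol (BEP 15). The function names are hypothetical, and no network traffic is sent here; sending the packet over a socket is left out.

```python
import struct

PROTOCOL_ID = 0x41727101980   # fixed magic constant defined by BEP 15
ACTION_CONNECT = 0


def build_connect_request(transaction_id: int) -> bytes:
    """Pack the 16-byte connect request: protocol ID, action, transaction ID."""
    return struct.pack(">QII", PROTOCOL_ID, ACTION_CONNECT, transaction_id)


def parse_connect_response(packet: bytes, transaction_id: int) -> int:
    """Validate a connect response and return the 64-bit connection ID."""
    if len(packet) < 16:
        raise ValueError("connect response too short")
    action, received_id, connection_id = struct.unpack(">IIQ", packet[:16])
    if action != ACTION_CONNECT:
        raise ValueError("unexpected action in response")
    if received_id != transaction_id:
        raise ValueError("transaction ID mismatch")
    return connection_id


# Round trip against a fabricated response; a real client would pick the
# transaction ID at random and read the response from a UDP socket.
request = build_connect_request(0x1234)
fake_response = struct.pack(">IIQ", ACTION_CONNECT, 0x1234, 99)
connection_id = parse_connect_response(fake_response, 0x1234)
```

All fields are big-endian, and the returned connection ID is what later announce and scrape requests must carry.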
One. Data set 1 is four torrents from October 11 for the same recent serial television episode, each from a different uploading group: LOL, DIMENSION, AFG, mSD.
Two. Data set 2 is three torrents from October 17, over 8 locales, during 3 time periods.
Three. Prepare for TWD 2016-10-23 Season 7 premiere.
- visualization: three or more approaches, data types, presentation/exhibition, prototype
- scraper analytics
- archival data format
- persistent scraping, archival interface
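For the archival data format item above, one possible starting point is one JSON object per line per scrape, which appends cleanly during persistent scraping. This is only a sketch of an assumed layout, not a decided format: the ScrapeRecord class, every field name, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ScrapeRecord:
    """One tracker scrape result for one torrent (hypothetical schema)."""
    timestamp: str    # ISO 8601, UTC
    info_hash: str    # hex-encoded
    tracker: str      # announce URL that was scraped
    seeders: int
    leechers: int
    completed: int


def to_json_line(record: ScrapeRecord) -> str:
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json_line(line: str) -> ScrapeRecord:
    """Rebuild a record from a line written by to_json_line."""
    return ScrapeRecord(**json.loads(line))


# Made-up values for illustration.
record = ScrapeRecord(
    timestamp="2016-10-17T20:00:00Z",
    info_hash="0123456789abcdef0123456789abcdef01234567",
    tracker="udp://tracker.example.org:6969",
    seeders=5,
    leechers=10,
    completed=50,
)
line = to_json_line(record)
restored = from_json_line(line)
```

Geolocation fields per peer could be added later as extra keys without invalidating older lines, as long as the reader tolerates missing fields.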