---
title: Third Party Tracking Pixels
author: Brett Langdon
date: 2013-05-03
template: article.jade
---
An overview of what a third party tracking pixel is and how to create/use them.
---
So, what exactly do we mean by “third party tracking pixel” anyway?
Let’s try to break it down piece by piece:
### Tracking Pixel:
A pixel refers to a tag placed on a site that serves no purpose other than calling
out to a web page or script that is not the page you are currently visiting.
These pixels are usually an HTML script tag that points to a JavaScript file with
no content, or an img tag with an empty or transparent 1 pixel by 1 pixel GIF image
(hence the term “pixel”). A tracking pixel is a pixel that calls out to another
page or script in order to give it information about the user’s visit to the page.
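
As a rough illustration of the img tag variant, the sketch below shows how a server
might answer a pixel request with a transparent 1 pixel by 1 pixel GIF. It is
written for Python 3 and WSGI (the server examples later in this article use
Python 2), and the handler name `pixel_image` is an assumption made for
illustration, not part of the article’s example code.

```python
import base64

# A widely used 1x1 transparent GIF, base64-encoded.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def pixel_image(environ, respond):
    """Hypothetical WSGI handler that answers an <img> pixel request."""
    headers = [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL_GIF))),
    ]
    respond("200 OK", headers)
    # In Python 3 a WSGI response body is an iterable of bytes.
    return [PIXEL_GIF]
```

A page would then reference it with something like
`<img src="/pixel.gif" width="1" height="1">`, where the path `/pixel.gif` is
again only a placeholder.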
### Third Party:
Third party just means the pixel points to a website that is not the current
website. For example, Google Analytics is a third party tracking tool because you
place scripts on your website that call and send data to Google.
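
To make the definition concrete, here is a minimal Python 3 sketch that compares
the host of the page with the host of the resource it loads. The function name
`is_third_party` and the example URLs are hypothetical, and comparing plain
hostnames is a simplification; it only illustrates the idea of “a website that is
not the current website”.

```python
from urllib.parse import urlsplit


def is_third_party(page_url, resource_url):
    """Return True when the resource lives on a different host than the page."""
    page_host = urlsplit(page_url).hostname
    resource_host = urlsplit(resource_url).hostname
    # A relative URL such as "/track.js" has no host of its own, so it
    # resolves against the page and counts as first party.
    if resource_host is None:
        return False
    return resource_host != page_host


same_site = is_third_party("http://localhost:8000/", "/track.js")
other_site = is_third_party(
    "https://www.example.com/", "https://tracker.example.net/track.js"
)
```

Here `same_site` is `False` and `other_site` is `True`.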
## What is the point?
Why do people do this? In the case of Google Analytics, people do not wish to
collect and process the analytics for their own website; instead they want a third
party host to do it for them, but they need a way of sending their users’ data to
Google. Using pixels and JavaScript to send the data to Google offers the company a
few benefits. For starters, they do not need any extra overhead on their servers
for a service that sends data directly to Google; by using pixels and scripts they
get to offload this overhead onto their users (that’s right, we are using our
personal computers’ resources to send analytical data about ourselves to Google for
websites that use Google Analytics). Secondly, because a tracking pixel runs client
side (in the user’s browser), it can gather more information about the user. The
information made available to us through JavaScript is far greater than what is
given to our servers via HTTP headers.
## How do we do it?
Next we will walk through the basics of how to create third party tracking pixels.
Code examples for the following discussion can be found
[here](https://github.com/brettlangdon/tracking-server-examples).
We will walk through four examples of tracking pixels accompanied by the server
code needed to serve and receive the pixels. The server is written in Python and
some basic understanding of Python is required to follow along. The server examples
are written using only standard Python WSGI modules, so no extra installation is
needed. We will start off with a very simple example of using a tracking pixel, and
then each following example will add features to the pixel.
## Simple Example
For this example all we want to accomplish is to have a web server that returns
HTML containing our tracking pixel as well as a handler to receive the call from
our tracking pixel. Our end goal is to serve this HTML content:
```html
<h1>Welcome</h1>
<script src="/track.js"></script>
```
As you can see, this is fairly simple HTML; the important part is the script tag
pointing to “/track.js”, which is our tracking pixel. When the user’s browser loads
the page, this script tag makes a call to our server, and our server can then log
information about that user. So we start with a WSGI handler for the HTML code:
```python
def html_content(environ, respond):
    headers = [('Content-Type', 'text/html')]
    respond('200 OK', headers)
    return [
        """
        <h1>Welcome</h1>
        <script src="/track.js"></script>
        """
    ]
```
Next we want to make sure that we have a handler for the calls to “/track.js”
from the script tag:
```python
def track_user(environ, respond):
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    prefixes = ['PATH_', 'HTTP', 'REQUEST', 'QUERY']
    for key, value in environ.iteritems():
        if any(key.startswith(prefix) for prefix in prefixes):
            print '%s: %s' % (key, value)
    return ['']
```
In this handler we take various pieces of information about the user’s request and
simply print them to the screen. The endpoint “/track.js” is not meant to point to
actual JavaScript, so we return an empty string. When this code runs you should see
something like the following:
```
brett$ python tracking_server.py
Tracking Server Listening on Port 8000...
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET / HTTP/1.1" 200 89
HTTP_REFERER: http://localhost:8000/
REQUEST_METHOD: GET
QUERY_STRING:
HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP_CONNECTION: keep-alive
PATH_INFO: /track.js
HTTP_HOST: localhost:8000
HTTP_ACCEPT: */*
HTTP_USER_AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31
HTTP_ACCEPT_LANGUAGE: en-US,en;q=0.8
HTTP_DNT: 1
HTTP_ACCEPT_ENCODING: gzip,deflate,sdch
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET /track.js HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET /favicon.ico HTTP/1.1" 204 0
```
You can see above that the browser first makes the request “GET /”, which returns
our HTML containing the tracking pixel, and then directly afterwards makes the
request “GET /track.js”, which prints out various information about the incoming
request. This example is not very useful as is, but it helps to illustrate the key
point of a tracking pixel: we are having the browser make a request on behalf of
the user without the user’s knowledge. In this case we are making a call back to
our own server, but our script tag could just as easily point to a third party server.
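
The handlers above are plain WSGI callables, so something still has to route each
request path to the right one. The following is only a sketch of how that wiring
might look with the standard library’s `wsgiref` in Python 3. The `ROUTES` table
and the `application` and `serve` names are assumptions, the two handlers are
simplified stand-ins for the ones in this section, and the 204 fallback mirrors the
favicon.ico response in the log output above.

```python
from wsgiref.simple_server import make_server


def html_content(environ, respond):
    # Stand-in for the page handler from this section.
    respond("200 OK", [("Content-Type", "text/html")])
    return [b'<h1>Welcome</h1><script src="/track.js"></script>']


def track_user(environ, respond):
    # Stand-in for the tracking handler; a real one would log the request.
    respond("200 OK", [("Content-Type", "application/javascript")])
    return [b""]


# Hypothetical routing table: request path -> WSGI handler.
ROUTES = {"/": html_content, "/track.js": track_user}


def application(environ, respond):
    """Dispatch on PATH_INFO; unknown paths get an empty 204 response."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        respond("204 No Content", [])
        return [b""]
    return handler(environ, respond)


def serve(port=8000):
    """Run the application until interrupted."""
    server = make_server("", port, application)
    print("Tracking Server Listening on Port %d..." % port)
    server.serve_forever()
```

Calling `serve()` starts the server; pointing a browser at
http://localhost:8000/ should then trigger a second request for /track.js.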
## Add Some Search Data
Our previous, simple example does not really provide us with any particularly
useful information, other than letting us see that a user’s browser made the call
to our server. For this next example we want to build upon the previous one by
sending some data along with the tracking pixel; in this case, some search data.
Let us assume that our web page allows users to make searches, and that searches
are given to the page through the URL query string parameter “search”. We want to
pass that value on to our tracking pixel, for which we will use the query string
parameter “s”. So our requests will look as follows:
* http://localhost:8000?search=my cool search
* http://localhost:8000/track.js?s=my cool search
To do this, we simply append the value of the “search” query string parameter to
the track.js script tag in our HTML:
```python
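from urllib import quote
from urlparse import parse_qs


def html_content(environ, respond):
    query = parse_qs(environ['QUERY_STRING'])
    # URL-encode the search term before placing it in the script URL
    search = quote(query.get('search', [''])[0])
    headers = [('Content-Type', 'text/html')]
    respond('200 OK', headers)
    return [
        """
        <h1>Welcome</h1>
        <script src="/track.js?s=%s"></script>
        """ % search
    ]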
```
For our tracking pixel handler we will simply print the value of the query string
parameter “s” and again return an empty string.
```python
def track_user(environ, respond):
    query = parse_qs(environ['QUERY_STRING'])
    search = query.get('s', [''])[0]
    print 'User Searched For: %s' % search
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    return ['']
```
When run, the output will look similar to:
```
brett$ python tracking_server.py
Tracking Server Listening on Port 8000...
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /?search=my%20cool%20search HTTP/1.1" 200 110
User Searched For: my cool search
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /track.js?s=my%20cool%20search HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /favicon.ico HTTP/1.1" 204 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /?search=another%20search HTTP/1.1" 200 108
User Searched For: another search
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /track.js?s=another%20search HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /favicon.ico HTTP/1.1" 204 0
```
Here we can see the two search requests made to our web page and the corresponding
requests to track.js. Again, this example might not seem like much, but it
provides a way to pass values from our web page along to the tracking server. In
this case we are passing search terms, but we could pass along any other
information we needed.
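
As a sketch of what passing “any other information” could look like, the Python 3
snippet below builds a pixel URL from several values and reads them back on the
receiving side. The helper names and the extra parameters `page` and `lang` are
made up for illustration; only “s” comes from the example above.

```python
from urllib.parse import parse_qs, urlencode


def build_pixel_url(base, **params):
    """Append URL-encoded parameters to the pixel URL."""
    return base + "?" + urlencode(params)


def read_pixel_params(query_string):
    """Flatten parse_qs output, keeping the first value of each parameter."""
    return {key: values[0] for key, values in parse_qs(query_string).items()}


url = build_pixel_url("/track.js", s="my cool search", page="/products", lang="en")
params = read_pixel_params(url.split("?", 1)[1])
# params == {"s": "my cool search", "page": "/products", "lang": "en"}
```

Encoding on the way out and decoding on the way in keeps spaces and slashes in the
values from breaking the URL.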
## Track Users with Cookies
So now we are getting somewhere: our tracking server is able to receive some
search data about the requests made to our web page. The problem now is that we
have no way of associating this information with a specific user; how can we know
when a specific user searches for multiple things? Cookies to the rescue. In this
example we are going to add support for using cookies to assign each visiting user
a unique id, which will allow us to associate all the search data we receive with
“specific” users. Yes, I say “specific” with quotes because we can only associate
the data with a given cookie; if multiple people share a computer then we will
probably think they are a single person. As well, if someone clears the cookies for
their browser then we lose all association with that user and have to start all
over again with a new cookie. Lastly, if a user does not allow cookies for their
browser then we will be unable to associate any data with them, as every time they
visit our tracking server we will see them as a new user.

So, how do we do this? When we receive a request from a user we look to see whether
we have given them a cookie with a user id. If so, we associate the incoming data
with that user id; if there is no user cookie, we generate a new user id and give
it to the user.
```python
from Cookie import SimpleCookie
from urlparse import parse_qs
from uuid import uuid4


def track_user(environ, respond):
    cookies = SimpleCookie()
    cookies.load(environ.get('HTTP_COOKIE', ''))

    # cookies.get() returns a Morsel object, so read the id from its .value
    morsel = cookies.get('id')
    user_id = morsel.value if morsel else None
    if not user_id:
        user_id = uuid4()
        print 'User did not have id, giving: %s' % user_id

    query = parse_qs(environ['QUERY_STRING'])
    search = query.get('s', [''])[0]
    print 'User %s Searched For: %s' % (user_id, search)

    headers = [
        ('Content-Type', 'application/javascript'),
        ('Set-Cookie', 'id=%s' % user_id)
    ]
    respond('200 OK', headers)
    return ['']
```
This is great! Not only can we now obtain search data from a third party website,
but we can also do our best to associate that data with a given user. In this
instance a single user is anyone who shares the same user id in their
browser’s cookies.
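
The cookie in the handler above is set without a lifetime, so the id would
typically only last for the browser session. As a hedged sketch, the Python 3
helpers below separate reading the id from building the `Set-Cookie` header and
add a one year `Max-Age`. The helper names, the `Path` value and the lifetime are
all assumptions for illustration, not something the example code does.

```python
from http.cookies import SimpleCookie
from uuid import uuid4

ONE_YEAR = 365 * 24 * 60 * 60  # assumed cookie lifetime, in seconds


def get_or_create_user_id(environ):
    """Return (user_id, is_new) based on the request's "id" cookie."""
    cookies = SimpleCookie()
    cookies.load(environ.get("HTTP_COOKIE", ""))
    if "id" in cookies:
        return cookies["id"].value, False
    return str(uuid4()), True


def id_cookie_header(user_id):
    """Build a Set-Cookie header tuple that keeps the id across sessions."""
    cookie = SimpleCookie()
    cookie["id"] = user_id
    cookie["id"]["path"] = "/"
    cookie["id"]["max-age"] = ONE_YEAR
    return ("Set-Cookie", cookie["id"].OutputString())
```

A tracking handler could call `get_or_create_user_id(environ)` first and append
`id_cookie_header(user_id)` to its response headers.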
## Cache Busting
So what exactly is cache busting? Our browsers are smart: they know that we do not
like to wait a long time for a web page to load, and they have learned that they do
not need to refetch content they have seen before if they cache it. For example, an
image on a web site might get cached by your web browser so that every time you
reload the page the image is loaded locally as opposed to being fetched from the
remote server. Cache busting is a way to ensure that the browser does not cache the
content of our tracking pixel. We want the user’s browser to follow the tracking
pixel to our server for every page request they make, because we want to follow
everything that user does. When the browser caches our tracking pixel’s content (an
empty string), we lose out on data. Cache busting is the term used when we
programmatically generate query string parameters to make calls to our tracking
pixel look unique, and therefore ensure that the browser follows the pixel rather
than loading it from its cache.

To do this we need to add an extra endpoint to our server. We need the HTML for the
web page, a cache busting script and finally our track.js handler. The cache
busting script uses JavaScript to add our track.js script tag to the web page. This
means that as the web page loads, JavaScript runs to manipulate the DOM and add our
cache busted track.js script tag to the HTML. So, what does this look like?
```javascript
var now = new Date().getTime();
var random = Math.random() * 99999999999;
document.write('<script src="/track.js?t=' + now + '&r=' + random + '"></script>');
```
This script adds the extra query string parameters “r”, which is a random number,
and “t”, which is the current timestamp in milliseconds. This gives us a request
that is unique enough to trick the browser into ignoring anything it has in its
cache for track.js and forces it to make the request anyway. Using a cache buster
requires us to modify the HTML we serve slightly, so that it serves up the cache
busting JavaScript as opposed to our track.js pixel.
```html
<h1>Welcome</h1>
<script src="/buster.js"></script>
```
And we need the following to serve up the cache buster script buster.js:
```python
def cache_buster(environ, respond):
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)

    # JavaScript served as buster.js: read "search" from the page URL, then
    # write a track.js script tag with unique "t" and "r" parameters.
    cb_js = """
    function getParameterByName(name){
        name = name.replace(/[\[]/, "\\\[").replace(/[\]]/, "\\\]");
        var regexS = "[\\?&]" + name + "=([^&#]*)";
        var regex = new RegExp(regexS);
        var results = regex.exec(window.location.search);
        if(results == null){
            return "";
        }
        return decodeURIComponent(results[1].replace(/\+/g, " "));
    }

    var now = new Date().getTime();
    var random = Math.random() * 99999999999;
    var search = encodeURIComponent(getParameterByName('search'));
    document.write('<script src="/track.js?t=' + now + '&r=' + random + '&s=' + search + '"></script>');
    """

    return [cb_js]
```
We do not care very much if the browser caches our cache buster script because
it will always generate a new unique track.js url every time it is run.
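
The same idea can also be expressed on the server. The Python 3 sketch below builds
a track.js URL with the “t” and “r” parameters described above, which could be
useful if the page handler wrote the script tag itself instead of serving
buster.js. This is an alternative under the assumption that the HTML page is not
cached; the function name `cache_busted_url` is hypothetical.

```python
import random
import time
from urllib.parse import urlencode


def cache_busted_url(base, search=""):
    """Return the pixel URL with timestamp ("t") and random ("r") parameters."""
    params = {
        "t": int(time.time() * 1000),     # current time in milliseconds
        "r": random.randrange(10 ** 11),  # random integer below 10^11
        "s": search,
    }
    return base + "?" + urlencode(params)


url = cache_busted_url("/track.js", "my cool search")
```

Each call yields a URL of the form `/track.js?t=...&r=...&s=my+cool+search`, so
repeated page loads are very unlikely to request the same URL twice.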
## Conclusion
There is a lot going on here and probably a lot to digest, so let’s quickly review
what we have learned. For starters, we learned that companies use tracking pixels
or tags on web pages whose sole purpose is to make your browser call out to
external third party sites in order to track information about your internet usage
(usually; they can be used for other things as well). We also looked into some very
simplistic ways of implementing a server whose job is to accept tracking pixel
calls in various forms.

We learned that these tracking servers can use cookies stored in your browser to
store a unique id for you in order to help associate the collected data with you,
and that you can remove this association by clearing your cookies or by not
allowing them at all. Lastly, we learned that browser caching can cause issues for
our tracking pixels and data collection, and that we can get around it using cache
busting JavaScript.
As a reminder, the full working code examples can be found at
https://github.com/brettlangdon/tracking-server-examples.