
Migrate content to hugo

Branch: dev/hugo
Brett Langdon, 9 years ago
Parent commit: 8749084ac3
GPG Key ID: A2ECAB73CE12147F (no known key found for this signature in database)
33 changed files with 3134 additions and 0 deletions
1. config.toml (+22, -0)
2. content/about/index.md (+2, -0)
3. content/writing/about/browser-fingerprinting/index.md (+107, -0)
4. content/writing/about/continuous-nodejs-module/index.md (+62, -0)
5. content/writing/about/cookieless-user-tracking/index.md (+167, -0)
6. content/writing/about/detect-flash-with-javascript/index.md (+62, -0)
7. content/writing/about/fail2ban-honeypot/index.md (+141, -0)
8. content/writing/about/fastest-python-json-library/index.md (+136, -0)
9. content/writing/about/forge-configuration-parser/index.md (+184, -0)
10. content/writing/about/generator-pipelines-in-python/index.md (+265, -0)
11. content/writing/about/goodbye-grunt-hello-tend/index.md (+173, -0)
12. content/writing/about/how-ads-are-delivered/index.md (+85, -0)
13. content/writing/about/javascript-documentation-generation/index.md (+86, -0)
14. content/writing/about/javascript-interview-questions/index.md (+43, -0)
15. content/writing/about/lets-make-a-metrics-beacon/index.md (+242, -0)
16. content/writing/about/managing-go-dependencies-with-git-subtree/index.md (+145, -0)
17. content/writing/about/my-new-website/index.md (+37, -0)
18. content/writing/about/my-python-web-crawler/index.md (+203, -0)
19. content/writing/about/os-x-battery-percentage-command-line/index.md (+31, -0)
20. content/writing/about/pharos-popup-on-osx-lion/index.md (+46, -0)
21. content/writing/about/php-stop-malicious-image-uploads/index.md (+77, -0)
22. content/writing/about/python-redis-queue-workers/index.md (+90, -0)
23. content/writing/about/sharing-data-from-php-to-javascript/index.md (+87, -0)
24. content/writing/about/the-battle-of-the-caches/index.md (+95, -0)
25. content/writing/about/third-party-tracking-pixels/index.md (+352, -0)
26. content/writing/about/what-i'm-up-to-these-days/index.md (+42, -0)
27. content/writing/about/why-benchmarking-tools-suck/index.md (+86, -0)
28. content/writing/about/write-code-every-day/index.md (+56, -0)
29. static/css/lato.css (+1, -0)
30. static/css/site.css (+9, -0)
31. static/images/avatar.png (BIN)
32. static/images/avatar@2x.png (BIN)
33. static/images/favicon.ico (BIN)

config.toml (+22, -0)

@@ -0,0 +1,22 @@
baseurl = "https://brett.is/"
title = "Brett.is"
languageCode = "en-us"
theme = "hugo-cactus-theme"
googleAnalytics = "UA-34513423-1"
disqusShortname = "brettlangdon"
[params]
customCSS = ["css/lato.css", "css/site.css"]
name = "Brett Langdon"
description = "A geek with a blog"
bio = "A geek with a blog"
aboutAuthor = "A geek with a blog"
twitter = "brett_langdon"
enableRSS = true
iconFont = "font-awesome"
[social]
twitter = "https://twitter.com/brett_langdon"
github = "https://github.com/brettlangdon"
linkedin = "https://www.linkedin.com/in/brettlangdon"
rss = "https://brett.is/index.xml"

content/about/index.md (+2, -0)

@@ -0,0 +1,2 @@
---
---

content/writing/about/browser-fingerprinting/index.md (+107, -0)

@@ -0,0 +1,107 @@
---
title: Browser Fingerprinting
author: Brett Langdon
date: 2013-06-05
template: article.jade
---
Ever want to know what browser fingerprinting is or how it is done?
---
## What is Browser Fingerprinting?
A browser or <a href="http://en.wikipedia.org/wiki/Device_fingerprint" target="_blank">device fingerprint</a>
is an identifier generated from information retrieved from
a single device that can be used to identify that device alone.
For example, as you will see below, browser fingerprinting can be used to generate
an identifier for the browser you are currently viewing this website with.
Even if you clear your cookies (which is how most third-party companies
track your browser), the identifier should be the same every time it is generated
for your specific device/browser. A browser fingerprint is usually generated from
the browser's <a href="https://en.wikipedia.org/wiki/User_agent" target="_blank">user agent</a>,
timezone offset, list of installed plugins, available fonts, screen resolution,
language and more. The <a href="https://www.eff.org/" target="_blank">EFF</a> did
a <a href="https://panopticlick.eff.org/browser-uniqueness.pdf" target="_blank">study</a>
on how unique a browser fingerprint for a given client can be and which browser
information provides the most entropy. To see how unique your browser is please
check out their demo application
<a href="https://panopticlick.eff.org/" target="_blank">Panopticlick</a>.
## What can it be used for?
Ok, so great, but who cares? How can browser fingerprinting be used? Right now
the majority of <a href="http://kb.mozillazine.org/User_tracking" target="_blank">user tracking</a>
is done by the use of cookies. For example, when you go to a website that has
[tracking pixels](http://brett.is/writing/about/third-party-tracking-pixels/)
(which are “invisible” scripts or images loaded in the background of the web page)
the third party company receiving these tracking calls will inject a cookie into
your browser which has a unique, usually randomly generated, identifier that is
used to associate stored data about you like collected
<a href="http://searchengineland.com/what-is-retargeting-160407" target="_blank">site or search retargeting</a>
data. This way when you visit them again with the same cookie they can lookup
previously associated data for you.
So, if this is how it is usually done, why do we care about browser fingerprints?
Well, the main problem with cookies is that they are volatile: if you manually delete
your cookies, then the company that put a cookie there loses all association with
you and any data they have on you is no longer useful. As well, if a client does
not allow third party cookies (or any cookies) in their browser, then the company
will be unable to track the client at all.
A browser fingerprint, on the other hand, is a more constant way to identify a given
client, as long as they have javascript enabled (which seems to be a thing
most websites cannot properly function without), and it allows the client to be
identified even if they do not allow cookies in their browser.
## How do we do it?
Like I mentioned before, to generate a browser fingerprint you must have javascript
enabled, as it is the easiest way to gather the most information about a browser.
Javascript gives us access to things like your screen size, language, installed
plugins, user agent, timezone offset, and other points of interest. This
information is basically smooshed together into a string and then hashed to generate
the identifier. The more information you can gather about a single browser, the more
unique of a fingerprint you can generate and the fewer collisions you will have.
Collisions? Yes: if you end up with two laptops of the same make, model, year,
OS version and browser version, with the exact same features and plugins enabled, then
the hashes will be exactly the same and anyone relying on the fingerprint will
treat both of those devices as the same. But, if you read the white paper by the EFF
listed above, you will see that their method for generating browser fingerprints
was usually unique across almost 3 million different devices. For some companies
that much uniqueness is more than enough to rely on fingerprints to identify
devices; others may have more than 3 million users.
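To make the idea concrete, here is a minimal sketch of the approach described above (this is not the fingerprintjs code; it uses a deliberately simple, non-cryptographic string hash in place of murmurhash3):
```javascript
// Illustrative only: combine a handful of browser attributes and hash
// them into a short identifier. Real libraries gather far more signals
// and use a better hash (e.g. murmurhash3).
function simpleHash(str){
    var hash = 0;
    for(var i = 0; i < str.length; i++){
        hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0;
    }
    return (hash >>> 0).toString(16);
}

function toyFingerprint(){
    var parts = [
        navigator.userAgent,
        navigator.language,
        screen.width + "x" + screen.height,
        screen.colorDepth,
        new Date().getTimezoneOffset()
    ];
    // "smoosh" everything together into a string, then hash it
    return simpleHash(parts.join("###"));
}
```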
Where does this really come into play? Most websites have their users
create an account and log in before allowing them access to portions of the site or
the ability to look up stored information, maybe their credit card payment
information, home address, e-mail address, etc. Where browser fingerprints are
useful is for trying to identify anonymous visitors to a web application. For
example, [third party trackers](/writing/about/third-party-tracking-pixels/)
who are collecting search or other kinds of data.
## Some Code
There is a project on <a href="https://www.github.com/" target="_blank">github</a>
by user <a href="https://github.com/Valve" target="_blank">Valentin Vasilyev (Valve)</a>
called <a href="https://github.com/Valve/fingerprintjs" target="_blank">fingerprintjs</a>
which is a client side javascript library for generating browser fingerprints.
If you are interested in seeing some production-worthy code for generating
browser fingerprints, please take a look at that project. It uses information like
user agent, language, color depth, timezone offset, whether session or local storage
is available, and a listing of all installed plugins, and it hashes everything using
<a href="https://sites.google.com/site/murmurhash/" target="_blank">murmurhash3</a>.
## Your <a href="https://github.com/Valve/fingerprintjs" target="_blank">fingerprintjs</a> Fingerprint: *<span id="fingerprint">Could not generate fingerprint</span>*
<script type="text/javascript" src="/js/fingerprint.js"></script>
<script type="text/javascript">
var fingerprint = new Fingerprint().get();
document.getElementById("fingerprint").innerHTML = fingerprint;
</script>
**Resources:**
* <a href="http://panopticlick.eff.org/" target="_blank">panopticlick.eff.org</a> - find out how rare your browser fingerprint is.
* <a href="https://github.com/Valve/fingerprintjs" target="_blank">github.com/Valve/fingerprintjs</a> - client side browser fingerprinting library.

content/writing/about/continuous-nodejs-module/index.md (+62, -0)

@@ -0,0 +1,62 @@
---
title: Continuous NodeJS Module
author: Brett Langdon
date: 2012-04-28
template: article.jade
---
A look into my new NodeJS module called Continuous.
---
Greetings everyone. I wanted to take a moment to mention the new NodeJS module
that I just published called Continuous.
Continuous is a fairly simple module that is aimed to aid in running blocks of
code repeatedly; it is an event-based interface for setTimeout and setInterval.
With Continuous you can choose to run code at a set or random interval and
can also hook into events.
## Installation
```bash
npm install continuous
```
## Continuous Usage
```javascript
var continuous = require('continuous');

var run = new continuous({
    minTime: 1000,
    maxTime: 3000,
    random: true,
    callback: function(){
        return Math.round( new Date().getTime()/1000.0 );
    },
    limit: 5
});

run.on('complete', function(count, result){
    console.log('I have run ' + count + ' times');
    console.log('Results:');
    console.dir(result);
});

run.on('started', function(){
    console.log('I Started');
});

run.on('stopped', function(){
    console.log('I am Done');
});

run.start();

setTimeout( function(){
    run.stop();
}, 5000 );
```
For more information check out Continuous on
<a href="https://github.com/brettlangdon/continuous" target="_blank">GitHub</a>.

content/writing/about/cookieless-user-tracking/index.md (+167, -0)

@@ -0,0 +1,167 @@
---
title: Cookieless User Tracking
author: Brett Langdon
date: 2013-11-30
template: article.jade
---
A look into various methods of online user tracking without cookies.
---
Over the past few months, in my free time, I have been researching various
methods for cookieless user tracking. I have a previous article that talks
about how to write a
<a href="/writing/about/third-party-tracking-pixels/" target="_blank">tracking server</a>
which uses cookies to follow people between requests. However, recently
browsers are beginning to disallow third party cookies by default which means
developers have to come up with other ways of tracking users.
## Browser Fingerprinting
You can use client side javascript to generate a
<a href="/writing/about/browser-fingerprinting/" target="_blank">browser fingerprint</a>,
or, a unique identifier for a specific user's browser (since that is what cookies
are actually tracking). Once you have the browser's fingerprint you can then
send that id along with any other requests you make.
```javascript
var user_id = generateBrowserFingerprint();
document.write(
    '<script type="text/javascript" src="/track/user/' + user_id + '"></scr' + 'ipt>'
);
```
## Local Storage
Newer browsers come equipped with a feature called
<a href="http://diveintohtml5.info/storage.html" target="_blank">local storage</a>
, which is used as a simple key-value store accessible through javascript.
So instead of relying on cookies as your persistent storage, you can store the
user id in local storage instead.
```javascript
var user_id = localStorage.getItem("user_id");
if(user_id == null){
    user_id = generateNewId();
    localStorage.setItem("user_id", user_id);
}
document.write(
    '<script type="text/javascript" src="/track/user/' + user_id + '"></scr' + 'ipt>'
);
```
This can also be combined with a browser fingerprinting library for generating
the new id.
## ETag Header
There is a feature of HTTP requests called an
<a href="http://en.wikipedia.org/wiki/HTTP_ETag" target="_blank">ETag Header</a>
which can be exploited for the sake of user tracking. The way an ETag works is
that when a request is made the server responds with an ETag header with
a given value (usually an id for the requested document, or maybe a hash
of it); whenever the browser then makes another request for that document it will
send an _If-None-Match_ header with the value of the _ETag_ provided by the server
last time. The server can then make a decision as to whether or not new content
needs to be served based on the id/hash provided by the browser.
As you may have figured out, instead we can assign a unique user id as the ETag
header for a response, then when the browser makes a request for that page again
it will send us the user id.
This is useful, except for the fact that we can only provide a single id per
user per endpoint. For example, if I use the urls `/track/user` and
`/collect/data` there is no way for me to get the browser to send the same
_If-None-Match_ header for both urls.
### Example Server
```python
from uuid import uuid4
from wsgiref.simple_server import make_server


def tracking_server(environ, start_response):
    user_id = environ.get("HTTP_IF_NONE_MATCH")
    if not user_id:
        user_id = uuid4().hex
    start_response("200 Ok", [
        ("ETag", user_id),
    ])
    return [user_id]


if __name__ == "__main__":
    try:
        httpd = make_server("", 8000, tracking_server)
        print "Tracking Server Listening on Port 8000..."
        httpd.serve_forever()
    except KeyboardInterrupt:
        print "Exiting..."
```
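A minimal client-side sketch that could pair with the example server above (hypothetical; it assumes the tracking server is on the same origin and relies on the browser re-sending the cached ETag as _If-None-Match_ on later requests):
```javascript
// Hypothetical client for the example ETag server above: the response
// body (and ETag header) is the user id the server assigned or echoed back.
var xhr = new XMLHttpRequest();
xhr.open("GET", "/");
xhr.onload = function(){
    var userId = xhr.getResponseHeader("ETag") || xhr.responseText;
    console.log("user id:", userId);
};
xhr.send();
```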
## Redirect Caching
Redirect caching is similar in concept to the ETag tracking method in that
we rely on the browser cache to store the user id for us. With redirect caching
we have our tracking url `/track`; when someone goes there we perform a 301
redirect to `/<user_id>/track`. The user's browser will then cache that 301
redirect and the next time the user goes to `/track` it will just go to
`/<user_id>/track` instead.
Just like the ETag method, we run into an issue where this method really only
works for a single endpoint url. We cannot use it as an end-all-be-all for
tracking users across a site or multiple sites.
### Example Server
```python
from uuid import uuid4
from wsgiref.simple_server import make_server


def tracking_server(environ, start_response):
    if environ["PATH_INFO"] == "/track":
        start_response("301 Moved Permanently", [
            ("Location", "/%s/track" % uuid4().hex),
        ])
    else:
        start_response("200 Ok", [])
    return [""]


if __name__ == "__main__":
    try:
        httpd = make_server("", 8000, tracking_server)
        print "Tracking Server Listening on Port 8000..."
        httpd.serve_forever()
    except KeyboardInterrupt:
        print "Exiting..."
```
## Evercookie
A project worth noting is Samy Kamkar's
<a href="http://samy.pl/evercookie/" target="_blank">Evercookie</a>
which uses standard cookies, flash objects, silverlight isolated storage,
web history, etags, web cache, local storage, global storage... and more
all at the same time to track users. This library exercises every possible
method for storing a user id which makes it a reliable method for ensuring
that the id is stored, but at the cost of being very intrusive and persistent.
## Other Methods
I am sure there are other methods out there, these are just the few that I
decided to focus on. If anyone has any other methods or ideas please leave a comment.
## References
* <a href="http://ochronus.com/tracking-without-cookies/" target="_blank">http://ochronus.com/tracking-without-cookies/</a>
* <a href="http://ochronus.com/user-tracking-http-redirect/" target="_blank">http://ochronus.com/user-tracking-http-redirect/</a>
* <a href="http://samy.pl/evercookie/" target="_blank">http://samy.pl/evercookie/</a>

content/writing/about/detect-flash-with-javascript/index.md (+62, -0)

@@ -0,0 +1,62 @@
---
title: Detect Flash with JavaScript
author: Brett Langdon
date: 2013-06-05
template: article.jade
---
A quick, easy and lightweight way to detect Flash support in clients.
---
Recently I had to find a good way of detecting if <a href="http://www.adobe.com/products/flashplayer.html" target="_blank">Flash</a>
is enabled in the browser. There are two main libraries, the
<a href="http://solutionpartners.adobe.com/products/flashplayer/download/detection_kit/" target="_blank">Adobe Flash Detection Kit</a>
and <a href="https://code.google.com/p/swfobject/" target="_blank">SWFObject</a>,
which are both very good at detecting whether Flash is enabled, getting
the version of Flash installed, and dynamically embedding and manipulating
<a href="http://en.wikipedia.org/wiki/SWF" target="_blank">swf</a> files
in your web application. But all I needed was a **yes** or a **no** for whether
Flash was there or not, without the added overhead of unneeded code.
My goal was to write the least amount of JavaScript while still being able
to detect Flash across browsers.
```javascript
function detectflash(){
    if(navigator.plugins != null && navigator.plugins.length > 0){
        return navigator.plugins["Shockwave Flash"] && true;
    }
    if(~navigator.userAgent.toLowerCase().indexOf("webtv")){
        return true;
    }
    if(~navigator.appVersion.indexOf("MSIE") && !~navigator.userAgent.indexOf("Opera")){
        try{
            return new ActiveXObject("ShockwaveFlash.ShockwaveFlash") && true;
        } catch(e){}
    }
    return false;
}
```
For those unfamiliar with the tilde (~) operator in javascript, please read
<a href="http://dreaminginjavascript.wordpress.com/2008/07/04/28/" target="_blank">this article</a>,
but the short version is, used with indexOf these two lines are equivalent:
```javascript
~navigator.appVersion.indexOf("MSIE")
navigator.appVersion.indexOf("MSIE") != -1
```
To use the above function:
```javascript
if(detectflash()){
    alert("Flash is enabled");
} else {
    alert("Flash is not available");
}
```
And that is it. Pretty simple and a much shorter version than the alternatives;
compressed and mangled I have gotten this code to under 400 bytes.
I tested this code with IE 5.5+, Firefox and Chrome without any issues.

content/writing/about/fail2ban-honeypot/index.md (+141, -0)

@@ -0,0 +1,141 @@
---
title: Fail2Ban Honeypot
author: Brett Langdon
date: 2012-02-04
template: article.jade
---
How to use Python and Fail2Ban to write an auto-blocking honeypot.
---
I have been practicing for the upcoming NECCDC competition and playing
around with various security concepts, and one that I thought of trying was
creating a honeypot that automagically blocks IPs when they get trapped. What I have is
a honeypot script written in Python that logs intruders to a log file and a
<a href="http://fail2ban.org/" target="_blank">Fail2Ban</a>
definition that will block the IP address. Below I will show you the Fail2Ban
honeypot that I have thrown together.
## Installation
We first need to install
<a href="http://python.org/" target="_blank">python</a> and
<a href="http://fail2ban.org/" target="_blank">fail2ban</a>.
The installation process might differ depending on which Linux distribution
you are using.
```bash
sudo apt-get install python fail2ban
```
## Honeypot
Copy the following python script and create a file `honeypot.py`.
```python
import socket
import threading
import sys


class HoneyThread(threading.Thread):
    def __init__(self, logfile, port):
        self.logfile = logfile
        self.port = port
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind(('', port))
        self.sock.listen(1)
        print 'Listening on: ', port
        super(HoneyThread, self).__init__()

    def run(self):
        while True:
            channel, details = self.sock.accept()
            logstr = (
                'Connection from %s:%s on port %s\r\n' %
                (details[0], details[1], self.port)
            )
            self.logfile.write('%s\r\n' % logstr)
            print logstr
            self.logfile.flush()
            channel.send('You Just Got Stuck In Some Honey')
            channel.close()


# each port given on the command line gets its own listening thread
ports = []
for arg in sys.argv[1:]:
    ports.append(int(arg))

threads = []
logfile = open('/var/log/honeypot.log', 'a')
for p in ports:
    threads.append(HoneyThread(logfile, p))

for thread in threads:
    thread.start()

print 'Bring it on!'
```
Some may notice a slight issue: this script is meant to run 24/7 and never be
stopped. There is no particular way of stopping the threads short of killing
the process or restarting the machine.
## Running Honeypot
To run the honeypot simply issue the following command:
```bash
python honeypot.py 22 25 80 443
```
Replace the ports shown with the ports that you want the honeypot to run on.
When someone tries to connect to one of the supplied ports this script will
display on the screen the ip address that connected, the port they connected from
and the port they were trying to reach. It will also log the incident to
the `/var/log/honeypot.log` file.
## Fail2Ban
Now to setup fail2ban to block the ip address when it is captured.
A new filter definition needs to be created in `/etc/fail2ban/filter.d/honeypot.conf`.
```ini
[Definition]
# one possible pattern; it matches the honeypot's log lines,
# e.g. "Connection from 1.2.3.4:54321 on port 80"
failregex = Connection from <HOST>:\d+ on port \d+
```
And the filter has to be set in `/etc/fail2ban/jail.conf`.
```ini
...
[honeypot]
enabled = true
filter = honeypot
logpath = /var/log/honeypot.log
action = iptables-allports[name=Honeypot, protocol=all]
maxretry = 1
...
```
Please make sure to read up on fail2ban's various actions; the `iptables-allports`
action is used here with `protocol: all`, meaning that the IP address is banned from
making any connection on any port using any protocol (tcp, udp, icmp, etc.). Also
change `maxretry` as you see fit: with it set to 1, any single access to the
honeypot will ban the IP for the configured amount of time (600 seconds by
default); if you want, this can be changed to 2 or 3 so that only someone persistent
about trying to access the false service gets banned.
And that is it, just start Fail2Ban and test by trying to access one of the
honeypot ports. This can be done from a second machine using telnet.
```bash
telnet 192.168.1.11 80
```
Replace `192.168.1.11` with the IP address of the machine running the honeypot
and `80` with the port you wish to test.
And there you have it, a Fail2Ban honeypot written in Python. Deploy and Enjoy.

content/writing/about/fastest-python-json-library/index.md (+136, -0)

@@ -0,0 +1,136 @@
---
title: The Fastest Python JSON Library
author: Brett Langdon
date: 2013-09-22
template: article.jade
---
My results from benchmarking a handful of Python JSON parsing libraries.
---
Most who know me well know that I am usually not one for benchmarks,
especially blindly posted benchmark results in blog posts (like this one is going to be).
So, rather than trying to say that “this library is better than that library” or to convince you that you are going to end up with the same results as me,
remember to take these results with a grain of salt.
You might end up with different results than me.
Take these results as interesting findings which help supplement your own experiments.
Ok, now that that diatribe is over with, LET'S GET TO THE COOL STUFF!
We use JSON for a bunch of stuff at work, whether it is a third party system that uses JSON to communicate or storing JSON blobs in the database.
We have done some naive benchmarking in the past and came to the conclusion that [jsonlib2](https://pypi.python.org/pypi/jsonlib2/) is the library for us.
Well, I started a personal project that also uses JSON and I decided to revisit benchmarking Python JSON libraries to see if there are any “better” ones out there.
I ended up with the following libraries to test:
[standard lib json](http://docs.python.org/2/library/json.html), [jsonlib2](https://pypi.python.org/pypi/jsonlib2/), [simplejson](https://pypi.python.org/pypi/simplejson/), [yajl](https://pypi.python.org/pypi/yajl) (yet another json library) and lastly [ujson](https://pypi.python.org/pypi/ujson) (ultrajson).
For the test, I wanted to test parsing and serializing a large json blob, in this case, I simply took a snapshot of data from the [Twitter API Console](https://dev.twitter.com/console).
Ok, enough with this context b.s., let's see some code and some results.
```python
import json
import timeit

# json data as a str
json_data = open("./fixture.json").read()
# json data parsed into python objects
data = json.loads(json_data)

number = 500
repeat = 4
print "Average run time over %s executions repeated %s times" % (number, repeat)

# we will store the fastest run times here
fastest_dumps = (None, -1)
fastest_loads = (None, -1)

for library in ("ujson", "simplejson", "jsonlib2", "json", "yajl"):
    print "-" * 20
    # thanks yajl for not setting __version__
    exec("""
try:
    from %s import __version__
except Exception:
    __version__ = None
""" % library)
    print "Library: %s" % library
    # for jsonlib2 this is a tuple... thanks guys
    print "Version: %s" % (__version__, )

    # time json.dumps
    timer = timeit.Timer(
        "json.dumps(data)",
        setup="""
import %s as json
data = %r
""" % (library, data)
    )
    total = sum(timer.repeat(repeat=repeat, number=number))
    per_call = total / (number * repeat)
    print "%s.dumps(data): %s (total) %s (per call)" % (library, total, per_call)
    if fastest_dumps[1] == -1 or total < fastest_dumps[1]:
        fastest_dumps = (library, total)

    # time json.loads
    timer = timeit.Timer(
        "json.loads(data)",
        setup="""
import %s as json
data = %r
""" % (library, json_data)
    )
    total = sum(timer.repeat(repeat=repeat, number=number))
    per_call = total / (number * repeat)
    print "%s.loads(data): %s (total) %s (per call)" % (library, total, per_call)
    if fastest_loads[1] == -1 or total < fastest_loads[1]:
        fastest_loads = (library, total)

print "-" * 20
print "Fastest dumps: %s %s (total)" % fastest_dumps
print "Fastest loads: %s %s (total)" % fastest_loads
```
Ok, we need to talk about this code for a second.
It really is not the cleanest code I have ever written.
We start off by loading the fixture JSON data as both the raw JSON text and as a parsed Python list of objects.
Then for each of the libraries we want to test, we try to get their version information and finally we use [timeit](http://docs.python.org/2/library/timeit.html) to test how long it takes to serialize the parsed fixture data into a JSON string and then we test parsing the JSON string of the fixture data into a list of objects.
And lastly, we store the name of the library with the fastest total run time for either “dumps” or “loads” and then at the end we print which was fastest.
Here are the results I got when running on my macbook pro:
```text
Average run time over 500 executions repeated 4 times
--------------------
Library: ujson
Version: 1.33
ujson.dumps(data): 1.97361302376 (total) 0.000986806511879 (per call)
ujson.loads(data): 2.05873394012 (total) 0.00102936697006 (per call)
--------------------
Library: simplejson
Version: 3.3.0
simplejson.dumps(data): 3.24183320999 (total) 0.001620916605 (per call)
simplejson.loads(data): 2.20791387558 (total) 0.00110395693779 (per call)
--------------------
Library: jsonlib2
Version: (1, 3, 10)
jsonlib2.dumps(data): 2.211810112 (total) 0.001105905056 (per call)
jsonlib2.loads(data): 2.55381131172 (total) 0.00127690565586 (per call)
--------------------
Library: json
Version: 2.0.9
json.dumps(data): 2.35674309731 (total) 0.00117837154865 (per call)
json.loads(data): 5.23104810715 (total) 0.00261552405357 (per call)
--------------------
Library: yajl
Version: None
yajl.dumps(data): 2.85826969147 (total) 0.00142913484573 (per call)
yajl.loads(data): 3.03867292404 (total) 0.00151933646202 (per call)
--------------------
Fastest dumps: ujson 1.97361302376 (total)
Fastest loads: ujson 2.05873394012 (total)
```
So there we have it.
My tests show that [ujson](https://pypi.python.org/pypi/ujson) is the fastest python json library (when running on my mbp and when parsing or serializing a “large” json dataset).
I have added the test scripts, fixture data and results in [this gist](https://gist.github.com/brettlangdon/6b007ef89fd7d2931a22) if anyone wants to run locally and post their results in the comments below.
I would be curious to see the results of others.

content/writing/about/forge-configuration-parser/index.md (+184, -0)

@@ -0,0 +1,184 @@
---
title: Forge configuration parser
author: Brett Langdon
date: 2015-06-27
template: article.jade
---
An overview of how I wrote a configuration file format and parser.
---
I have recently finished the initial work on a project,
[forge](https://github.com/brettlangdon/forge), which is a
configuration file syntax and parser written in Go. I was working
on a project where I was trying to determine what configuration
language I wanted to use, and whether I tested out
[YAML](https://en.wikipedia.org/wiki/YAML) or
[JSON](https://en.wikipedia.org/wiki/JSON) or
[ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt
right. What I really wanted was a format similar to
[nginx](http://wiki.nginx.org/FullExample),
but I couldn't find any existing packages for Go which supported this
syntax. A-ha, I smell an opportunity.
I have always been interested in programming languages, in their
design and implementation. I have always wanted to write my own
programming language, but since I have never had any formal education
on the subject I have always gone about it on my own. I bring it
up because this project has some similarities. You have a defined
syntax that gets parsed into some sort of intermediate format. The
part that is missing is where the intermediate format is then
translated into machine or byte code and actually executed. Since this
is just a configuration language, that is not necessary.
## Project overview
You can see the repository for
[forge](https://github.com/brettlangdon/forge) for current usage and
documentation.
A forge configuration file is made up of _directives_. There are 3
kinds of _directives_:
* _settings_: Which are in the form `<KEY> = <VALUE>`
* _sections_: Which are used to group more _directives_ `<SECTION-NAME> { <DIRECTIVES> }`
* _includes_: Used to pull in settings from other forge config files `include <FILENAME/GLOB>`
Forge also supports various types of _setting_ values:
* _string_: `key = "some value";`
* _bool_: `key = true;`
* _integer_: `key = 5;`
* _float_: `key = 5.5;`
* _null_: `key = null;`
* _reference_: `key = some_section.key;`
Most of these setting types are probably fairly self explanatory
except for _reference_. A _reference_ in forge is a way to have the
value of one _setting_ be a pointer to another _setting_. For example:
```config
global = "value";

some_section {
    key = "some_section.value";
    global_ref = global;
    local_ref = .key;
    ref_key = ref_section.ref_key;
}

ref_section {
    ref_key = "hello";
}
```
In this example we see 3 examples of _references_. A _reference_ value
is an identifier (`global`), possibly multiple identifiers separated
by a period (`ref_section.ref_key`); a _reference_ can also begin
with a period (`.key`). Every _reference_ which is not prefixed with a period
is resolved from the global section (the outermost level). So in this
example a _reference_ to `global` will point to the value of
`"value"` and `ref_section.ref_key` will point to the value of
`"hello"`. A _local reference_ is one which is prefixed with a period;
those are resolved starting from the section that the
_setting_ is defined in. So in this case, `local_ref` will point to
the value of `"some_section.value"`.
That is a rough idea of how forge files are defined, so let's see a
quick example of how you can use it from Go.
```go
package main

import (
    "fmt"

    "github.com/brettlangdon/forge"
)

func main() {
    settings, _ := forge.ParseFile("example.cfg")

    if settings.Exists("global") {
        value, _ := settings.GetString("global")
        fmt.Println(value)
    }

    settings.SetString("new_key", "new_value")
    settingsMap := settings.ToMap()
    fmt.Println(settingsMap["new_key"])

    jsonBytes, _ := settings.ToJSON()
    fmt.Println(string(jsonBytes))
}
```
## How it works
Lets dive in and take a quick look at the parts that make forge
capable of working.
**Example config file:**
```config
# Top comment
global = "value";

section {
    a_float = 50.67;

    sub_section {
        a_null = null;
        a_bool = true;
        a_reference = section.a_float;  # Gets replaced with `50.67`
    }
}
```
Basically what forge does is take a configuration file in the format defined
above and parse it into what is essentially a `map[string]interface{}`.
The code itself is comprised of two main parts: the tokenizer (or scanner) and the
parser. The tokenizer turns the raw source code (like the above) into a stream of tokens. If
you printed the token representation of the code above, it could look like:
```
(COMMENT, "Top comment")
(IDENTIFIER, "global")
(EQUAL, "=")
(STRING, "value")
(SEMICOLON, ";")
(IDENTIFIER, "section")
(LBRACKET, "{")
(IDENTIFIER, "a_float")
(EQUAL, "=")
(FLOAT, "50.67")
(SEMICOLON, ";")
....
```
Then the parser takes in this stream of tokens and tries to parse them based on some known
grammar. For example, a directive is in the form
`<IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON>` (where `<VALUE>` can be
`<STRING>`, `<BOOL>`, `<INTEGER>`, `<FLOAT>`, `<NULL>`,
`<REFERENCE>`). When the parser sees `<IDENTIFIER>` it'll look ahead
to the next tokens to try and match them to this rule; if they match then
it knows to add this setting to the internal `map[string]interface{}`
for that identifier. If it doesn't match anything then there is a syntax
error and the parser reports it.
The part that I think is interesting is that I opted to just write the
tokenizer and parser by hand rather than using a library that converts
a language grammar into a tokenizer (like flex/bison). I have done
this before and was inspired to do so after learning that that is how
the go programming language is written, you can see here
[parser.go](https://github.com/golang/go/blob/258bf65d8b157bfe311ce70c93dd854022a25c9d/src/go/parser/parser.go)
(not a light read at 2500 lines). The
[scanner.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/scanner.go)
and
[parser.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/parser.go)
might prove to be slightly easier reads for those who are interested.
## Conclusion
That is just a brief overview of the project and a slight dip
into the inner workings of it. I am extremely interested in continuing
to learn as much as I can about programming languages and
parsers/compilers. I am going to put together a series of blog posts
that walk through what I have learned so far and which might help
guide the reader through creating something similar to forge.
Enjoy.

content/writing/about/generator-pipelines-in-python/index.md (+265, -0)

@@ -0,0 +1,265 @@
---
title: Generator Pipelines in Python
author: Brett Langdon
date: 2012-12-18
template: article.jade
---
A brief look into what a generator pipeline is and how to write one in Python.
---
Generator pipelines are a great way to break apart complex processing into
smaller pieces when processing lists of items (like lines in a file). For those
who are not familiar with <a href="http://www.python.org" target="_blank">Python</a>
generators or the concept behind generator pipelines, I strongly recommend
reading this article first:
<a href="http://www.dabeaz.com/generators-uk/index.html" target="_blank">Generator Tricks for Systems Programmers</a>
by <a href="http://www.dabeaz.com/" target="_blank">David M. Beazley</a>.
It will surely take you more in-depth than I am going to go.
A brief introduction on generators. There are two types of generators,
generator expressions and generator functions. A
<a href="http://www.python.org/dev/peps/pep-0289/" target="_blank">generator expression</a>
looks similar to a
<a href="http://www.python.org/dev/peps/pep-0202/" target="_blank">list comprehension</a>
but the simple difference is that it uses parenthesis over square brackets.
A <a href="http://www.python.org/dev/peps/pep-0255/" target="_blank">generator function</a>
is a function which contains the keyword
<a href="http://docs.python.org/2/reference/simple_stmts.html#grammar-token-yield_stmt" target="_blank">yield</a>;
yield is used to pass a value from within the function to the calling expression
without exiting the function (unlike a return statement).
## Generator Expression
```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print sum(num for num in nums)
num_gen = (num for num in nums)
for num in num_gen:
    print num
```
On line 2 of the above, when passing a generator into a function, the extra parentheses
are not needed. Otherwise you can create a standalone generator, like on line 3;
this expression simply creates the generator, it does not iterate over the list of
numbers until it is passed into the for loop on line 4.
## Generator Function
```python
def nums():
    nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for num in nums:
        yield num
print sum(nums())
for num in nums():
    print num
```
This block of code does the exact same as the example above but uses a generator
function instead of a generator expression. When the function nums is called it
will loop through the list of numbers and one by one pass them back up to either
the call to sum or the for loop.
Generators (either expressions or functions) are not the same as returning a list
of items (let's say numbers). They do not wait for all possible items to be yielded
before the items are returned. Each item is returned as it is yielded. For example,
with the generator function code above, the number 1 is printed on line 7
before the number 2 is yielded on line 4.
So, cool, alright, generators are nice, but what about generator pipelines? A
generator pipeline takes these generators (expressions or functions) and
chains them together. Let's look at a case where they might be useful.
## Example: Without Generators
```python
def process(num):
    # filter out non-evens
    if num % 2 != 0:
        return
    num = num * 3
    num = 'The Number: %s' % num
    return num
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for num in nums:
    print process(num)
```
This code is fairly simple and may not seem like the best example for creating a
generator pipeline, but it is nice because we can break it down into small parts.
For starters we need to filter out any non-even numbers, then we need to multiply
the number by 3, and finally we convert the number to a string. Let's see what this
looks like as a pipeline.
## Generator Pipeline
```python
def even_filter(nums):
    for num in nums:
        if num % 2 == 0:
            yield num
def multiply_by_three(nums):
    for num in nums:
        yield num * 3
def convert_to_string(nums):
    for num in nums:
        yield 'The Number: %s' % num
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pipeline = convert_to_string(multiply_by_three(even_filter(nums)))
for num in pipeline:
    print num
```
This code example might look more complex than the previous example, but it
provides a good example of how (with generators) you can chain together a set of
very small and concise processes over a set of items. So, how does this example
really work? Each number in the list nums passes through each of the three
functions and is printed before the next item has its chance to make it through.
1. The Number 1 is checked for even, it is not so processing for that number stops
2. The Number 2 is checked for even, it is so it is yielded to `multiply_by_three`
3. The Number 2 is multiplied by 3 and yielded to `convert_to_string`
4. The Number 2 is formatted into the string and yielded to the for loop on line 14
5. The Number 2 is printed as _“The Number: 2″_
6. The Number 3 is checked for even, it is not so processing for that number stops
7. The Number 4 is checked for even, it is so it is yielded to `multiply_by_three`
8. … etc…
This continues until all of the numbers have either been ignored (by even_filter)
or have been yielded. If you wanted to, you can change the order in which the
chain is created to change the order in which each process runs (try swapping
even_filter and multiply_by_three).
So, how about a more practical example? What if we needed to process an
<a href="http://httpd.apache.org/" target="_blank">Apache</a> log file? We can use
a generator pipeline to break the processing into very small functions for
filtering and parsing. We will use the following example line format for our
processing:
```
127.0.0.1 [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
```
## Processing Apache Logs
```python
class LogProcessor(object):
    def __init__(self, file):
        self._file = file
        self._filters = []

    def add_filter(self, new_filter):
        if callable(new_filter):
            self._filters.append(new_filter)

    def process(self):
        # this is the pattern for creating a generator
        # pipeline, we start with a generator then wrap
        # each consecutive generator with the pipeline itself
        pipeline = self._file
        for new_filter in self._filters:
            pipeline = new_filter(pipeline)
        return pipeline


def parser(lines):
    """Split each line based on spaces and
    yield the resulting list.
    """
    for line in lines:
        yield [part.strip('"[]') for part in line.split(' ')]


def mapper(lines):
    """Convert each line to a dict
    """
    for line in lines:
        tmp = {}
        tmp['ip_address'] = line[0]
        tmp['timestamp'] = line[1]
        tmp['timezone'] = line[2]
        tmp['method'] = line[3]
        tmp['request'] = line[4]
        tmp['version'] = line[5]
        tmp['status'] = int(line[6])
        tmp['size'] = int(line[7])
        yield tmp


def status_filter(lines):
    """Filter out lines whose status
    code is not 200
    """
    for line in lines:
        # if the status is not 200
        # then the line is ignored
        # and does not make it through
        # the pipeline to the end
        if line['status'] == 200:
            yield line


def method_filter(lines):
    """Filter out lines whose method
    is not 'GET'
    """
    for line in lines:
        # all lines with method not equal
        # to 'get' are dropped
        if line['method'].lower() == 'get':
            yield line


def size_converter(lines):
    """Convert the size (in bytes)
    into megabytes
    """
    mb = 9.53674e-7
    for line in lines:
        line['size'] = line['size'] * mb
        yield line


# setup the processor
log = open('./sample.log')
processor = LogProcessor(log)

# this is the order we want the functions to run
processor.add_filter(parser)
processor.add_filter(mapper)
processor.add_filter(status_filter)
processor.add_filter(method_filter)
processor.add_filter(size_converter)

# process() returns the generator pipeline
for line in processor.process():
    # line will be a dict whose status is
    # 200 and method is 'GET' and whose
    # size is expressed in megabytes
    print line

log.close()
```
So there you have it. A more practical example of how to use generator pipelines.
We have set up a simple class that is used to iterate through a log file of a
specific format and perform a set of operations on each log line in a specified
order. By making each operation a very small generator function we now have modular
line processing, meaning we can move our filters, parsers and converters around in
any order we want. We can swap the order of the method and status filters and move
the size converter before the filters. It would not make sense, but we could move
the parser and mapper functions around as well (this might break things).
This generator pipeline will do the following:
1. yield a single line from the log file
2. Split that line based on spaces and yield the resulting list
3. yield a dict from the single line list
4. check the line’s status code, yield if 200, goto step 1 otherwise
5. check the line’s method, yield if ‘get’, goto step 1 otherwise
6. convert the line’s size to megabytes, yield the line
7. the line is printed in the for loop, goto step 1 (repeat for all other lines)
Do you use generators and generator pipelines differently in your Python code?
Please feel free to share any tips/tricks or anything I may have missed in
the above. Enjoy.

content/writing/about/goodbye-grunt-hello-tend/index.md (+173, -0)

@@ -0,0 +1,173 @@
---
title: Goodbye Grunt, Hello Tend
author: Brett Langdon
date: 2014-06-09
template: article.jade
---
I recently decided to give Grunt a try, which caused me to write my
own node.js build system.
---
For the longest time I had refused to move away from [Makefiles](http://mrbook.org/tutorials/make/)
to [Grunt](http://gruntjs.com/) or some other [node.js](https://nodejs.org) build system.
But I finally gave in and decided to take an afternoon to give Grunt a go.
Initially it seemed promising: Grunt had a plugin for everything and it
supported watching files and directories (the one feature I really wanted
for my `make` build setup).
I tried to move over a fairly simplistic `Makefile` that I already had written
into a `Gruntfile`. However, after about an hour (or more) of trying to get `grunt`
set up with [grunt-cli](https://github.com/gruntjs/grunt-cli) and all the other
plugins installed and configured to do the right thing, I realized that `Grunt`
wasn't for me. I took a simple 10 (ish) line `Makefile` and turned it into a 40+
line `Gruntfile` and it still didn't seem to do exactly what I wanted. What I
had to reflect on was why I should spend all this time trying to learn how to
configure some convoluted plugins when I already know the correct commands to
execute. Then I realized what I really wanted wasn't a new build system but
simply `watch` for a `Makefile`.
I have attempted to get some form of watch working with a `Makefile` in the past,
but it usually involves using inotify and I've never gotten it working exactly
how I wanted. So, I decided to start writing my own system, because, why not
spend more time on perfecting my build system. My requirements were fairly simple:
I wanted a way to watch a directory/files for changes and, when they change, simply run
a single command (ultimately `make <target>`); I wanted the ability to also run
long-running processes like `node server.js` and restart them if certain files
have changed; and lastly, unlike other watch based systems I have seen, I wanted
a way to run a command as soon as I start up the watch program (so you don't have
to start the watching, then go open/save a newline change to a file to get it to
build for the first time).
What I came up with was [tend](https://github.com/brettlangdon/tend), which solves
nearly all of my needs, which were simply "watch for make". So how do you use it?
### Installation
```bash
npm install -g tend
```
### Usage
```
Usage:
tend
tend <action>
tend [--restart] [--start] [--ignoreHidden] [--filter <filter>] [<dir> <command>]
tend (--help | --version)
Options:
-h --help Show this help text
-v --version Show tend version information
-r --restart If <command> is still running when there is a change, stop and re-run it
-i --ignoreHidden Ignore changes to files which start with "."
-f --filter <filter> Use <filter> regular expression to filter which files trigger the command
-s --start Run <command> as soon as tend executes
```
### Example CLI Usage
The following will watch for changes to any `js` files in the directory `./src/`
when any of them change or are added it will run `uglifyjs` to combine them into
a single file.
```bash
tend --ignoreHidden --filter "*.js" ./src "uglifyjs -o ./public/main.min.js ./src/*.js"
```
The following will run a long-running process, starting it as soon as `tend` starts
and restarting the program whenever files in `./routes/` have changed.
```bash
tend --restart --start --filter "*.js" ./routes "node server.js"
```
### Config File
Instead of running `tend` commands one at a time from the command line, you can provide
`tend` with a `.tendrc` file of multiple directories/files to watch with commands
to run.
The following `.tendrc` file is set up to run the same commands as shown above.
```ini
; global settings
ignoreHidden=true
[js]
filter=*.js
directory=./src
command=uglifyjs -o ./public/main.min.js ./src/*.js
[app]
filter=*.js
directory=./routes
command=node ./app/server.js
restart=true
start=true
```
You can then simply run `tend` without any arguments to have `tend` watch for
all changes configured in your `.tendrc` file.
Running:
```bash
tend
```
Will basically execute:
```bash
tend --ignoreHidden --filter "*.js" ./src "uglifyjs -o ./public/main.min.js ./src/*.js" \
& tend --restart --start --filter "*.js" ./routes "node server.js"
```
Along with running multiple targets at once, you can run specific targets from
a `.tendrc` file as well, `tend <target>`.
```bash
tend js
```
Will run the `js` target once.
```bash
tend --ignoreHidden --filter "*.js" ./src "uglifyjs -o ./public/main.min.js ./src/*.js"
```
### With Make
If I haven't beaten a dead horse enough, I am a `Makefile` kind of person, and
that is exactly what I wanted to use this new tool with. So below is an example
of a `Makefile` and its corresponding `.tendrc` file.
```make
js:
	uglifyjs -o ./public/main.min.js ./src/*.js

app:
	node server.js

.PHONY: js app
```
```ini
ignoreHidden=true
[js]
filter=*.js
directory=./src
command=make js
[app]
filter=*.js
directory=./routes
command=make app
restart=true
start=true
```
### Conclusion
So that is mostly it. Nothing overly exciting and nothing really new here, just
another watch/build system written in node to add to the list. For the most part
this tool does exactly what I want for now, but if anyone has any ideas on how
to make this better, or knows of any other better/easier tools which do similar things,
please let me know; I am more than willing to continue maintaining this tool.

content/writing/about/how-ads-are-delivered/index.md (+85, -0)

@@ -0,0 +1,85 @@
---
title: How Ads are Delivered
author: Brett Langdon
date: 2012-09-02
template: article.jade
---
A really brief look into how online advertising works.
---
For the last 6 months or so I have been working in the ad tech industry for a
search re-targeting company,
<a href="http://www.magnetic.com" target="_blank">Magnetic</a>,
as a software engineer working on software to deliver ads online, and I wanted
to share some of the things I have learned.
When I started working for them I did not realize how online ads are delivered.
I thought that web sites offer up space to advertisers and then they show various
ads based on what the web site wants them to show. Well, this isn’t really wrong
but not quite right. There are a few more pieces to the puzzle.
### Advertiser
An advertiser is the person, or agency, that wishes to deliver ads to the internet.
### Publisher
A publisher is a person, or organization, that has online ad space that they
wish to fill.
### Ad Exchange
An ad exchange is a company that allows various advertisers to bid on available
ad space provided by publishers.
## How It Works
This is the part I never fully understood until I started working in the industry
(there are still parts I do not know). The magic for most online ads is in the ad
exchange. When a user goes to a website there are various iframes on the page which
the publisher has pointed to the ad exchange. This lets the exchange know that
there is space currently available.
So the exchange then compiles a bid request which contains as much information
about the ad space and user as possible. This information can contain simple
things like the size of the ad and the location of the ad (above or below the fold),
to various information about the user: geo location, window size, etc.
The bid request is sent out to all of the advertisers to let them know about the
potential ad space available. The advertisers must then decide whether or
not they want to bid on that ad space, based on the information provided. If they
have an ad that meets the criteria then they will return a bid response to the ad
exchange telling them their bid. The bid price for an ad is provided in micro
dollars, or one one-millionth of a dollar. Another common unit in ad tech is CPM,
or cost per mille, which denotes the price for every one thousand ad impressions.
Once the ad exchange has all the bids they take the ad with the highest bid to
deliver. The cost you pay is not the price you bid, but one bidding unit above
the next highest bid. Lastly, the ad is delivered to the user.
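As a rough sketch (my simplification, not any exchange's actual logic), that clearing step could look something like this, with prices in micro dollars:
```javascript
// Simplified second-price style clearing: the highest bid wins and pays
// one bidding unit more than the next highest bid.
function clearAuction(bids, unit){
    unit = unit || 1; // one micro dollar
    var sorted = bids.slice().sort(function(a, b){ return b.price - a.price; });
    if(sorted.length === 0){ return null; }
    var runnerUp = sorted[1] ? sorted[1].price : 0;
    return { advertiser: sorted[0].advertiser, pays: runnerUp + unit };
}

// e.g. bids of 2,500,000 and 2,000,000 micro dollars: the winner pays 2,000,001
console.log(clearAuction([
    { advertiser: "a", price: 2500000 },
    { advertiser: "b", price: 2000000 }
]));
```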
One thing to note, which I find very cool, is this all happens in real time for
every page request that a user makes. Next time you go to a website which contains
ads, stop to think about what had happened for that ad to become available to you.
## Why Is This Cool?
Some might not find this topic very interesting; others might hold a grudge about the
fact that ads are being shown on websites, or the fact that some companies are
maintaining search information about them on their systems (in order to make
future ad decisions based on available ad space for you specifically). To me this
is interesting because of the scale at which these systems need to operate. Our company
does not make a few hundred bids per day or even per hour; that many can happen in seconds.
We also do not make any “static” decisions based on the bids that we receive,
instead we are trying to make informed, real time decisions as to which ads we
want to show.
Our systems need to not only be scalable, for an increase in available bids, but
they also need to be fast. If we waited for a SQL query to finish we would lose
out on hundreds of bids before we got our response. Our system is based heavily
on caching and rebuilding useful information for bidding. The fact that our
company works under these constraints requires our developers (that includes me)
to think outside the box and about the bigger picture.

content/writing/about/javascript-documentation-generation/index.md (+86, -0)

@@ -0,0 +1,86 @@
---
title: Javascript Documentation Generation
author: Brett Langdon
date: 2015-02-03
template: article.jade
---
I have always been trying to find a good Javascript documentation generator and
I have never really been very happy with any that I have found. So I've decided
to just write my own, DocAST.
---
The problem I have always had with documentation generators is that they are
either hard to theme or are sometimes very strict about the way doc strings are
supposed to be written, making it potentially difficult to switch between
documentation generators if you have to. So, as a fun exercise, I've decided to
just try writing one myself, [DocAST](https://github.com/brettlangdon/docast).
What is different about DocAST? I've seen a few documentation parsers which use
regular expressions to parse out the comment blocks, which works perfectly well,
except I've decided to have some fun and use
[AST](http://en.wikipedia.org/wiki/Abstract_syntax_tree) parsing to grab the
code blocks from the scripts. As well, DocAST doesn't try to force you into any
specific theme or display; instead it is used simply to extract documentation
from scripts. Lastly, DocAST doesn't use any specific documentation format for
signifying parameters, returns or exceptions; it will traverse the AST of the
code block to find them for you, so most of the time you just need to add a
simple block comment describing the function above it.
Let's just get to an example:
```javascript
// script.js

/*
 * This is my super cool function that does all sorts of cool stuff
 **/
function superCool(arg1, arg2){
    if(arg1 === arg2){
        throw new Exception("arg1 and arg2 cant be the same");
    }
    var sum = arg1 + arg2;
    return sum;
}
```
```shell
$ docast extract ./script.js
$ cat out.json
```
```javascript
[
    {
        "name": "superCool",
        "params": [
            "arg1",
            "arg2"
        ],
        "returns": [
            "sum"
        ],
        "raises": [
            "Exception"
        ],
        "doc": " This is my super cool function that does all sorts of cool stuff\n"
    }
]
```
For more information check out the github page for
[DocAST](https://github.com/brettlangdon/docast).
The other benefit I have found with a documentation parser (something that just
extracts the documentation information as opposed to trying to build the final output)
is that you can get fun and creative with how you use the parsed information. For
example, someone suggested writing your doc strings as
[yaml](http://www.yaml.org/). When you extract the doc string you just parse the yaml
to get an object, which is then easy to pass on to [jade](http://jade-lang.com/)
or some other templating engine to generate your documentation. If you want to
see an example of this, check out the doc strings in DocAST's own source at
https://github.com/brettlangdon/docast/blob/master/lib/index.js#L127, the generated
documentation at http://brettlangdon.github.io/docast/, and the code used to generate
those docs at https://github.com/brettlangdon/docast/tree/master/docs.
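As a rough, hypothetical sketch of that idea (not part of DocAST itself), assuming the `js-yaml` and `jade` modules and a `docs.jade` template of your own, the extracted output could be turned into rendered docs roughly like so:

```javascript
// Hypothetical sketch: treat each extracted "doc" string as YAML and
// hand the parsed objects to a template. The file names and template
// are assumptions, not part of DocAST.
var yaml = require('js-yaml');
var jade = require('jade');

// out.json is the output of `docast extract` shown above.
var extracted = require('./out.json');
var docs = extracted.map(function(entry){
  entry.meta = yaml.safeLoad(entry.doc) || {};
  return entry;
});

console.log(jade.renderFile('./docs.jade', {docs: docs}));
```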

+ 43
- 0
content/writing/about/javascript-interview-questions/index.md View File

@ -0,0 +1,43 @@
---
title: JavaScript Interview Questions
author: Brett Langdon
date: 2012-09-01
template: article.jade
---
Preliminary review of the book "JavaScript Interview Questions".
---
A few weeks ago I pre-ordered a wonderful book,
<a href="http://o2js.com/interview-questions/" target="_blank">JavaScript Interview Questions</a>,
written by
<a href="http://o2js.com/volkan" target="_blank">Volkan Özçelik</a>.
So even though the book is not yet finished I thought I would take a moment
to give a brief overview of what I have read so far.
When I started reading the book it was a mere 20-30 pages long, and most of it was
empty chapters and sections outlining the structure of the eventual full copy. Now,
just a few weeks further along, Volkan has begun filling in the other sections nicely;
the book has passed 150 pages and there is still much more to come. This book will
cover every topic surrounding a professional JavaScript interview, from how to handle
technical JavaScript questions to how to apply for a job and respond to an offer.
This book provides an intimate and in-depth look into the heart of JavaScript and
the parts that make the language unique. As if years of professional, tried-and-true
experience were not enough, Volkan also offers various resources and links to related
material throughout the book, both to support the points he makes and to provide
alternative points of view on the topics he covers. So far the references section of
the book takes up 5 pages and includes over 100 unique links, with more to come.
For those who feel that this book is only for people looking to get a leg up in a
JavaScript interview, please reconsider: it will help you learn the hidden secrets
that make you a better JavaScript developer and an all around better interviewee.
If <a href="http://o2js.com/interview-questions/" target="_blank">JavaScript Interview Questions</a>
sounds interesting to you, then please check out the
<a href="http://o2js.com/assets/javascript-interview-questions.pdf" target="_blank">book teaser</a>
for a free lesson on JavaScript Closures and consider pre-ordering the book.

+ 242
- 0
content/writing/about/lets-make-a-metrics-beacon/index.md View File

@ -0,0 +1,242 @@
---
title: Lets Make a Metrics Beacon
author: Brett Langdon
date: 2014-06-22
template: article.jade
---
Recently I wrote a simple javascript metrics beacon
library. Let me show you what I came up with and how it works.
---
So, what do I mean by "javascript metrics beacon library"? Think
[RUM (Real User Monitoring)](http://en.wikipedia.org/wiki/Real_user_monitoring) or
[Google Analytics](http://www.google.com/analytics/),
it is a javascript library used to capture/aggregate metrics/data
from the client side and send that data to a server either in one
big batch or in small increments.
For those who do not like reading articles and just want the code you
can find the current state of my library on github: https://github.com/brettlangdon/sleuth
Before we get into anything technical, let's just take a quick look at an
example usage:
```html
<script type="text/javascript" src="//raw.githubusercontent.com/brettlangdon/sleuth/master/sleuth.min.js"></script>
<script type="text/javascript">
Sleuth.init({
url: "/track",
});
// static tags to identify the browser/user
// these are sent with each call to `url`
Sleuth.tag('uid', userId);
Sleuth.tag('productId', productId);
Sleuth.tag('lang', navigator.language);
// set some metrics to be sent with the next sync
Sleuth.track('clicks', buttonClicks);
Sleuth.track('images', imagesLoaded);
// manually sync all data
Sleuth.sendAllData();
</script>
```
Alright, so let's cover a few concepts from above: `tags`, `metrics`, and `syncing`.
### Tags
Tags are meant to be a way to uniquely identify the metrics that are being sent
to the server and are generally used to break apart metrics. For example, you might
have a metric to track whether or not someone clicks an "add to cart" button, using tags
you can then break out that metric to see how many times the button has been pressed
for each `productId` or browser or language or any other piece of data you find
applicable for segmenting your metrics. Tags can also be used when tracking data for
[A/B Tests](http://en.wikipedia.org/wiki/A/B_testing) where you want to segment your
data based on which part of the test the user was included in.
### Metrics
Metrics are simply data points to track for a given request. Good metrics to record
are things like load times, elements loaded on the page, time spent on the page,
number of times buttons are clicked or other user interactions with the page.
### Syncing
Syncing refers to sending the data from the client to the server. I call it
"syncing" since we want to aggregate as much data as possible on the client side and
send fewer, but larger, requests rather than making a request to the server for
each metric we mean to track. We do not want to overload the client if we mean to
track a lot of user interactions on the site.
## How To Do It
Alright, enough of the simple examples and explanations; let's dig into the source a bit
to find out how to aggregate the data on the client side and how to sync that data
to the server.
### Aggregating Data
Collecting the data we want to send to the server isn't too bad. We are just going
to take any calls to `Sleuth.track(key, value)` and store the value either in
[LocalStorage](http://diveintohtml5.info/storage.html) or in an object until we need
to sync. For example, this is the `track` method of `Sleuth`:
```javascript
Sleuth.prototype.track = function(key, value){
if(this.config.useLocalStorage && window.localStorage !== undefined){
window.localStorage.setItem('Sleuth:' + key, value);
} else {
this.data[key] = value;
}
};
```
The only thing of note above is that it will fall back to storing in `this.data`
if LocalStorage is not available. We also namespace all data stored in
LocalStorage with the prefix "Sleuth:" to ensure there are no name collisions with
anyone else using LocalStorage.
Also `Sleuth` will be kind enough to capture data from `window.performance` if it
is available and enabled (it is by default). And it simply grabs everything it can
to sync up to the server:
```javascript
Sleuth.prototype.captureWindowPerformance = function(){
if(this.config.performance && window.performance !== undefined){
if(window.performance.timing !== undefined){
this.data.timing = window.performance.timing;
}
if(window.performance.navigation !== undefined){
this.data.navigation = {
redirectCount: window.performance.navigation.redirectCount,
type: window.performance.navigation.type,
};
}
}
};
```
For an idea of what is stored in `window.performance.timing`, check out
[Navigation Timing](https://developer.mozilla.org/en-US/docs/Navigation_timing).
### Syncing Data
Ok, so this is really the important part of this library. Collecting the data isn't
hard. In fact, no one really needs a library to do that for them when you can
just as easily store a global object to aggregate the data. So why am I making a
"big deal" about syncing the data? It really isn't too hard when you can just
make a simple AJAX call using jQuery's `$.ajax(...)` to ship a JSON string up to some
server side listener.
The approach I wanted to take was a little different, yes, by default `Sleuth` will
try to send the data using AJAX to a server side url "/track", but what about when
the server which collects the data lives on another hostname?
[CORS](http://en.wikipedia.org/wiki/Cross-origin_resource_sharing) can be less than
fun to deal with, and rather than worrying about any domain security I just wanted
a method that can send the data from anywhere I want back to whatever server I want
regardless of where it lives. So, how? Simple, javascript pixels.
A javascript pixel is simply a `script` tag which is written to the page with
`document.write` whose `src` attribute points to the url that you want to make the
call to. The browser will then call that url without using AJAX just like it would
with a normal `script` tag loading javascript. For a more in-depth look at tracking
pixels you can read a previous article of mine:
[Third Party Tracking Pixels](http://brett.is/writing/about/third-party-tracking-pixels/).
The point of going with this method is that we get CORS-free GET requests from any
client to any server. But some people are probably thinking, "wait, a GET request
doesn't help us send data from the client to the server"? This is why we will encode
our JSON string of data for the url and simply send it along as a query string
parameter. Enough talk, let's see what this looks like:
```javascript
var encodeObject = function(data){
var query = [];
for(var key in data){
query.push(encodeURIComponent(key) + '=' + encodeURIComponent(data[key]));
};
return query.join('&');
};
var drop = function(url, data, tags){
// base64 encode( stringify(data) )
tags.d = window.btoa(JSON.stringify(data));
// these parameters are used for cache busting
tags.n = new Date().getTime();
tags.r = Math.random() * 99999999;
// make sure we url encode all parameters
url += '?' + encodeObject(tags);
document.write('<sc' + 'ript type="text/javascript" src="' + url + '"></scri' + 'pt>');
};
```
That is basically it. We simply base64 encode a JSON string version of the data and send
it as a query string parameter. There are a few odd things that might stand out above,
mainly the url length limitations on a base64 encoded JSON string, the "cache busting",
and the weird breaking up of the tag "script". A safe url length limit to live under is around
[2000](http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers)
characters to accommodate Internet Explorer, which from some very crude testing means each request
can hold around 50 or so separate metrics, each containing a string value. Cache busting
can be read about more in-depth in my article about tracking pixels
(http://brett.is/writing/about/third-party-tracking-pixels/#cache-busting), but the short
version is: we add a random number and the current timestamp to the query string to ensure that
the browser, a cdn, or anyone else in between doesn't cache the request being made to the server;
this way you will not get any missed metrics calls. Lastly, breaking up the `script` tag
into "sc + ript" and "scri + pt" makes it harder for anyone blocking scripts from writing
`script` tags to detect that a script tag is being written to the DOM (an `img` or
`iframe` tag could also be used instead of a `script` tag).
### Unload
How do we know when to send the data? If you are trying to measure how much time
a user spends on each page, or want to make sure you collect as much data as possible
on the client side, then you want to wait until the last second before
syncing the data to the server. By using LocalStorage to store the data you can ensure
that you will be able to access that data the next time you see that user, but who wants
to wait? And what if the user never comes back? I want my data now dammit!
Simple, let's bind an event to `window.onunload`! Woot, done... wait... why isn't my data
being sent to me? Initially I was trying to use `window.onunload` to sync data back, but
found that it didn't always work with pixel dropping, while AJAX requests worked most of the time.
After some digging I found that with `window.onunload` I was hitting a race condition on
whether or not the DOM was still available, meaning I couldn't reliably use `document.write`
or even query the DOM for more metrics to sync on unload.
In comes `window.onbeforeunload` to the rescue! For those who don't know about it (I
didn't before this project), `window.onbeforeunload` is exactly what it sounds like:
an event that gets fired before `window.onunload`, which also happens before the DOM
gets unloaded. So you can reliably use it to write to the DOM (like the pixels) or
to query the DOM for any extra information you want to sync up.
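As a minimal sketch (not Sleuth's exact internals), wiring the final sync up to this event looks roughly like the following, using the `Sleuth.sendAllData()` call from the earlier example:

```javascript
// Sketch: flush any remaining metrics right before the page unloads,
// while the DOM is still available for document.write based pixels.
window.onbeforeunload = function(){
  Sleuth.sendAllData();
};
```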
## Conclusion
So what do you think? There really isn't too much to it is there? Especially since we
only covered the client side of the piece and haven't touched on how to collect and
interpret this data on the server (maybe that'll be a follow up post). Again this is mostly
a simple implementation of a RUM library, but hopefully it sparks an interest to build
one yourself or even just to give you some insight into how Google Analytics or other
RUM libraries collect/send data from the client.
I think this project was neat because I do not always do client side
javascript, and every time I do I tend to learn something pretty cool. In this case it was
learning the differences between `window.onunload` and `window.onbeforeunload`, as well
as some of the cool things that are tracked by default in `window.performance`. I
definitely urge people to check out the documentation on `window.performance`.
### TODO
What is next for [Sleuth](https://github.com/brettlangdon/sleuth)? I am not sure yet.
I am thinking of implementing more ways of tracking data, like adding counter support,
rate limiting, and automatic incremental data syncs. I am open to ideas of how other people
would use a library like this, so please leave a comment here or open an issue on the
project's github page with any thoughts you have.
## Links
* [Sleuth](https://github.com/brettlangdon/sleuth)
* [Third Party Tracking Pixels](http://brett.is/writing/about/third-party-tracking-pixels/)
* [LocalStorage](http://diveintohtml5.info/storage.html)
* [Navigation Timing](https://developer.mozilla.org/en-US/docs/Navigation_timing)
* [window.onbeforeunload](https://developer.mozilla.org/en-US/docs/Web/API/Window.onbeforeunload)
* [window.onunload](https://developer.mozilla.org/en-US/docs/Web/API/Window.onunload)
* [RUM](http://en.wikipedia.org/wiki/Real_user_monitoring)
* [Google Analytics](http://www.google.com/analytics/)
* [A/B Testing](http://en.wikipedia.org/wiki/A/B_testing)

+ 145
- 0
content/writing/about/managing-go-dependencies-with-git-subtree/index.md View File

@ -0,0 +1,145 @@
---
title: Managing Go dependencies with git-subtree
author: Brett Langdon
date: 2016-02-03
template: article.jade
---
Recently I have decided to make the switch to using `git-subtree` for managing dependencies of my Go projects.
---
For a while now I have been searching for a good way to manage dependencies for my [Go](https://golang.org/)
projects. I think I have finally found a work flow that I really like that uses
[git-subtree](http://git.kernel.org/cgit/git/git.git/plain/contrib/subtree/git-subtree.txt).
When I began investigating different ways to manage dependencies I had a few small goals or concepts I wanted to follow.
### Keep it simple
I have always been drawn to the simplicity of Go and the tools that surround it.
I didn't want to add a lot of overhead or complexity into my work flow when programming in Go.
### Vendor dependencies
I decided right away that I wanted to vendor my dependencies, that is, where all of my dependencies
live under a top level `vendor/` directory in each repository.
This also means that I wanted to use the `GO15VENDOREXPERIMENT="1"` flag.
### Maintain the full source code of each dependency in each repository
The idea here is that each project will maintain the source code for each of its dependencies
instead of having a dependency manifest file, like `package.json` or `Godeps.json`, to manage the dependencies.
This was more of an acceptance than a decision. It wasn't a hard requirement that
each repository maintains the full source code for each of its dependencies, but
I was willing to accept that as a by-product of a good work flow.
## In come git-subtree
When researching methods of managing dependencies with `git`, I came across a great article
from Atlassian, [The power of Git subtree](https://developer.atlassian.com/blog/2015/05/the-power-of-git-subtree/),
which outlined how to use `git-subtree` for managing repository dependencies... exactly what I was looking for!
The main idea with `git-subtree` is that it is able to fetch a full repository and place
it inside of your repository. However, it differs from `git-submodule` because it does not
create a link/reference to a remote repository; instead it fetches all the files from that
remote repository, places them under a directory in your repository, and then treats them as
though they are part of your repository (there is no additional `.git` directory).
If you pair `git-subtree` with its `--squash` option, it will squash the remote repository
down to a single commit before pulling it into your repository.
As well, `git-subtree` has the ability to issue a `pull` to update a child repository.
Let's just take a look at how using `git-subtree` would work.
### Adding a new dependency
We want to add a new dependency, [github.com/miekg/dns](https://github.com/miekg/dns)
to our project.
```
git subtree add --prefix vendor/github.com/miekg/dns https://github.com/miekg/dns.git master --squash
```
This command will pull in the full repository for `github.com/miekg/dns` at `master` to `vendor/github.com/miekg/dns`.
And that is it, `git-subtree` will have created two commits for you, one for the squash of `github.com/miekg/dns`
and another for adding it as a child repository.
### Updating an existing dependency
If you want to then update `github.com/miekg/dns` you can just run the following:
```
git subtree pull --prefix vendor/github.com/miekg/dns https://github.com/miekg/dns.git master --squash
```
This command will again pull down the latest version of `master` from `github.com/miekg/dns` (assuming it has changed)
and create two commits for you.
### Using tags/branches/commits
`git-subtree` also works with tags, branches, or commit hashes.
Say we want to pull in a specific version of `github.com/brettlangdon/forge` which uses tags to manage versions.
```
git subtree add --prefix vendor/github.com/brettlangdon/forge https://github.com/brettlangdon/forge.git v0.1.5 --squash
```
And then, if we want to update to a later version, `v0.1.7`, we can just run the following:
```
git subtree pull --prefix vendor/github.com/brettlangdon/forge https://github.com/brettlangdon/forge.git v0.1.7 --squash
```
## Making it all easier
I really like using `git-subtree`, a lot, but the syntax is a little cumbersome.
The Atlassian article I mentioned earlier ([here](https://developer.atlassian.com/blog/2015/05/the-power-of-git-subtree/))
suggests adding `git` aliases to make using `git-subtree` easier.
I decided to take this one step further and write a `git` command, [git-vendor](https://github.com/brettlangdon/git-vendor)
to help manage subtree dependencies.
I won't go into much detail here, since it is outlined in the repository as well as at https://brettlangdon.github.io/git-vendor/,
but the project's goal is to make working with `git-subtree` easier for managing Go dependencies:
mainly, to be able to add subtrees and give them a name, to list all current subtrees,
and to update a subtree by name rather than by repo + prefix path.
Here is a quick preview:
```
$ git vendor add forge https://github.com/brettlangdon/forge v0.1.5
$ git vendor list
forge@v0.1.5:
name: forge
dir: vendor/github.com/brettlangdon/forge
repo: https://github.com/brettlangdon/forge
ref: v0.1.5
commit: 4c620b835a2617f3af91474875fc7dc84a7ea820
$ git vendor update forge v0.1.7
$ git vendor list
forge@v0.1.7:
name: forge
dir: vendor/github.com/brettlangdon/forge
repo: https://github.com/brettlangdon/forge
ref: v0.1.7
commit: 0b2bf8e484ce01c15b87bbb170b0a18f25b446d9
```
## Why not...
### Godep/&lt;package manager here&gt;
I decided early on that I did not want to "deal" with a package manager unless I had to.
This is not to say that there is anything wrong with [godep](https://github.com/tools/godep)
or any of the other currently available package managers out there, I just wanted to keep
the work flow simple and as close to what Go supports with respect to vendored dependencies
as possible.
### git-submodule
I have been asked "why not `git-submodule`?", and I think anyone who has had to work
with `git-submodule` will agree that it isn't really the best option out there.
It isn't as though it cannot get the job done, but the extra work flow needed
when working with submodules is a bit of a pain, mostly when working on a project with
multiple contributors, or with contributors who are either not aware that the project
is using submodules or who have never worked with them before.
### Something else?
This isn't the end of my search; I will always be keeping an eye out for new and
different ways to manage my dependencies. However, this is by far my favorite to date.
If anyone has any suggestions, please feel free to leave a comment.

+ 37
- 0
content/writing/about/my-new-website/index.md View File

@ -0,0 +1,37 @@
---
title: My New Website
author: Brett Langdon
date: 2013-11-16
template: article.jade
---
Why did I redo my website?
What makes it any better?
Why are there old posts that are missing?
---
I just wanted to write a quick post about my new site.
Some of you who are not familiar with my site might not notice the difference,
but trust me... it is different and for the better.
So what has changed?
For starters, I think the new design is a little simpler than the previous one,
but more importantly it is no longer in [Wordpress](http://www.wordpress.org).
It is now maintained with [Wintersmith](https://github.com/jnordberg/wintersmith),
a static site generator built in [node.js](http://nodejs.org/) that
uses [Jade](http://jade-lang.com) templates and [markdown](http://daringfireball.net/projects/markdown/).
Why is this better?
Well, for starters, I think writing in markdown is a lot easier than using Wordpress.
It means I can use whatever text editor I want (emacs in this case) to write my
articles. As well, I no longer need to have PHP and MySQL set up just to
serve up silly static content like blog posts and a few images.
This also means I can keep my blog entirely in [GitHub](http://github.com/).
So far I am fairly happy with the move to Wintersmith, except for having to move all my
current blog posts over to markdown, but I will slowly keep porting them over until
I have them all converted. So, please bear with me during this transition,
as there may be a few posts missing when I initially publish this new site.
Check out my blog in GitHub, [brett.is](http://github.com/brettlangdon/brett.is.git).

+ 203
- 0
content/writing/about/my-python-web-crawler/index.md View File

@ -0,0 +1,203 @@
---
title: My Python Web Crawler
author: Brett Langdon
date: 2012-09-09
template: article.jade
---
How to write a very simplistic Web Crawler in Python for fun.
---
Recently I decided to take on a new project, a Python based
<a href="http://en.wikipedia.org/wiki/Web_crawler" target="_blank">web crawler</a>
that I am dubbing Breakdown. Why? I have always been interested in web crawlers
and have written a few in the past, one previously in Python and another before
that as a class project in C++. So what makes this project different?
For starters I want to try and store and expose different information about the
web pages it is visiting. Instead of trying to analyze web pages and develop a
ranking system (like
<a href="http://en.wikipedia.org/wiki/PageRank" target="_blank">PageRank</a>)
that allows people to easily search for pages based on keywords, I instead want to
just store the information that is used to make those decisions and allow people
to use them how they wish.
For example, I want to provide an API for people to be able to search for specific
web pages. If the page is found in the system, it will return back an easy to use
data structure that contain the pages
<a href="http://en.wikipedia.org/wiki/Meta_element" target="_blank">meta data</a>,
keyword histogram, list of links to other pages and more.
## Overview of Web Crawlers
What is a web crawler? We can start with the simplest definition: it is a program
that, starting from a single web page, moves from web page to web
page using only the urls found in each page, beginning with those
provided in the original page. This is how search engines like
<a href="http://www.google.com/" target="_blank">Google</a>,
<a href="http://www.bing.com/" target="_blank">Bing</a> and
<a href="http://www.yahoo.com/" target="_blank">Yahoo</a>
obtain the content they need for their search sites.
But a web crawler is not just about moving from site to site (even though this
can be fun to watch). Most web crawlers have a higher purpose, like (in the case
of search engines) ranking the relevance of a web page based on the content
provided within the page and its html meta data, to allow people to more easily
search for content on the internet. Other web crawlers are used for more
invasive purposes, like obtaining e-mail addresses to use for marketing or spam.
So what goes into making a web crawler? A web crawler, again, is not just about
moving from place to place however it feels. Web sites can actually dictate how
web crawlers access the content on their sites and how they should move around on
their site. This information is provided in the
<a href="http://www.robotstxt.org/" target="_blank">robots.txt</a>
file that can be found on most websites
(<a href="http://en.wikipedia.org/robots.txt" target="_blank">here is wikipedia’s</a>).
A rookie mistake when building a web crawler is to ignore this file. These
robots.txt files are provided as a set of guidelines and rules that web crawlers
must adhere to for a given domain, otherwise you are liable to get your IP and/or
User Agent banned. Robots.txt files tell crawlers which pages or directories to
ignore or even which ones they should consider; a quick sketch of checking them
from Python follows below.
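Here is a rough sketch (not part of the Breakdown prototype shown later) of how a Python 2 crawler might honor robots.txt using the standard library's robot parser:

```python
# Sketch only: check robots.txt before fetching a page.
# Uses Python 2's robotparser module (urllib.robotparser on Python 3).
import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('http://en.wikipedia.org/robots.txt')
rp.read()

# Only fetch the page if the rules allow our crawler's User Agent.
if rp.can_fetch('breakdown', 'http://en.wikipedia.org/wiki/Web_crawler'):
    print 'allowed to crawl this page'
```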
Along with ensuring that you follow along with robots.txt please be sure to
provide a useful and unique
<a href="http://en.wikipedia.org/wiki/User_agent" target="_blank">User Agent</a>.
This is so that sites can identify that you are a robot and not a human.
For example, if you see a User Agent of *“breakdown”* on your website, hi, it’s me.
Do not use known User Agents like
*“Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.19 (KHTML, like Gecko) Ubuntu/12.04 Chromium/18.0.1025.168 Chrome/18.0.1025.168 Safari/535.19″*;
this is, again, an easy way to get your IP address banned on many sites.
Lastly, it is important to consider adding rate limiting to your crawler. It is
wonderful to be able to crawl websites, and move between them, very quickly (no one likes
to wait for results), but this is another sure fire way of getting your IP banned
by websites. Net admins do not like bots tying up all of their network's
resources and making it difficult for actual users to use their site.
## Prototype of Web Crawler
So this afternoon I decided to take around an hour or so and prototype out the
code to crawl from page to page extracting links and storing them in the database.
All this code does at the moment is download the content of a url, parse out all
of the urls, find the new urls that it has not seen before, append them to a queue
for further processing, and insert them into the database. This process has
2 queues and 2 different thread types for processing each link.
There are two different types of processes within this module. The first is the
Grabber, which takes a single url from a queue and downloads the text
content of that url using the
<a href="http://docs.python-requests.org/en/latest/index.html" target="_blank">Requests</a>
Python module. It then passes the content along to a queue that the Parser uses
to get new content to process. The Parser takes the content retrieved by the
Grabber process from that queue and simply parses out all the links
contained within the site's html content. It then checks MongoDB to see whether that
url has been retrieved already or not; if not, it will append the new url to the
queue that the Grabber uses to retrieve new content and also insert this url
into the database.
Using multiple threads per process (X for Grabbers and Y
for Parsers), along with two different queues to share information between
the two, allows this crawler to be self sufficient once it gets started with a
single url. The Grabbers help feed the queue that the Parsers work off of and the
Parsers feed the queue that the Grabbers work from.
For now, this is all that my prototype does, it only stores links and crawls from
site to site looking for more links. What I have left to do is expand upon the
Parser to parse out more information from the html including things like meta
data, page title, keywords, etc, as well as to incorporate
<a href="http://www.robotstxt.org/" target="_blank">robots.txt</a> into the
processing (to keep from getting banned) and automated rate limiting
(right now I have a 3 second pause between each web request).
## How Did I Do It?
So I assume at this point you want to see some code? The code is not up on
GitHub just yet, I have it hosted on my own private git repo for now and will
gladly open source the code once I have a better prototype.
Let's just take a very quick look at how I am sharing data between the different
threads.
### Parser.py
```python
import threading
class Thread(threading.Thread):
def __init__(self, content_queue, url_queue):
self.c_queue = content_queue
self.u_queue = url_queue
super(Thread, self).__init__()
def run(self):
while True:
data = self.c_queue.get()
#process data
for link in links:
self.u_queue.put(link)
self.c_queue.task_done()
```
### Grabber.py
```python
import threading
class Thread(threading.Thread):
def __init__(self, url_queue, content_queue):
self.c_queue = content_queue
self.u_queue = url_queue
super(Thread, self).__init__()
def run(self):
while True:
next_url = self.u_queue.get()
#data = requests.get(next_url)
while self.c_queue.full():
pass
self.c_queue.put(data)
self.u_queue.task_done()
```
### Breakdown
```python
from breakdown import Parser, Grabber
from Queue import Queue
num_threads = 4
max_size = 1000
url_queue = Queue()
content_queue = Queue(maxsize=max_size)
parsers = [Parser.Thread(content_queue, url_queue) for i in xrange(num_threads)]
grabbers = [Grabber.Thread(url_queue, content_queue) for i in xrange(num_threads)]
for thread in parsers+grabbers:
thread.daemon = True
thread.start()
url_queue.put('http://brett.is/')
```
Let's talk about this process quickly. The Breakdown code is provided as an executable
script to start the crawler. It creates “num_threads” threads for each process
(Grabber and Parser). It starts each thread and then appends the starting point
for the crawler, http://brett.is/. One of the Grabber threads will then pick up on
the single url, make a web request to get the content of that url and append it
to “content_queue”. Then one of the Parser threads will pick up on the content
data from “content_queue”, it will process the data from the web page html,
parsing out all of the links and then appending those links onto “url_queue”. This
will then allow the other Grabber threads an opportunity to make new web requests
to get more content to pass to the Parsers threads. This will continue on and on
until there are no links left (hopefully never).
## My Results
I ran this script for a few minutes, maybe 10-15, and I ended up with over 11,000
links ranging from my domain,
<a href="http://www.pandora.com/" target="_blank">pandora</a>,
<a href="http://www.twitter.com/" target="_blank">twitter</a>,
<a href="http://www.linkedin.com/" target="_blank">linkedin</a>,
<a href="http://www.github.com/" target="_blank">github</a>,
<a href="http://www.sony.com/" target="_blank">sony</a>,
and many many more. Now that I have a decent base prototype I can continue forward
and expand upon the processing and logic that goes into each web request.
Look forward to more posts about this in the future.

+ 31
- 0
content/writing/about/os-x-battery-percentage-command-line/index.md View File

@ -0,0 +1,31 @@
---
title: OS X Battery Percentage Command Line
author: Brett Langdon
date: 2012-03-18
template: article.jade
---
Quick and easy utility to get OS X battery usage from the command line.
---
Recently I learned how to enable full screen console mode for OS X, but the first
issue I ran into was trying to determine how much charge was left in my laptop's battery.
Yes, of course I could use the fancy little button on the side that lights up and
shows me, but that would be way too easy for a programmer, so instead I
wrote this script. The script gathers the battery's current and max capacity
and simply divides them to give you a percentage of battery life left.
Just create this script (I named mine “battery”), make it executable with
“chmod +x battery”, and move it somewhere on your PATH (I moved mine into “/usr/sbin/”).
Then to use it simply run the command “battery” and you'll get an output similar to “3.900%”
(yes, as of the writing of this my battery needs a charge).
```bash
#!/bin/bash
current=`ioreg -l | grep CurrentCapacity | awk '{print $5}'`
max=`ioreg -l | grep MaxCapacity | awk '{print $5}'`
echo `echo "scale=3;$current/$max*100" | bc -l`'%'
```
Enjoy!

+ 46
- 0
content/writing/about/pharos-popup-on-osx-lion/index.md View File

@ -0,0 +1,46 @@
---
title: Pharos Popup on OSX Lion
author: Brett Langdon
date: 2012-01-28
template: article.jade
---
Fixing Pharos Popup app on OS X Lion.
---
My University uses
<a href="http://www.pharos.com/" target="_blank">Pharos</a>
print servers to manage a few printers on campus and we were running into an
issue of the Pharos popup and notify applications not working properly with OSX
Lion. As I work for the Apple technician on campus I was tasked with finding out
why. The popup installation was setting up the applications to run on startup just
fine, the postflight script was invoking Popup.app, and the drivers we were using
worked perfectly when we mapped the printer by IP, so what was going on? On
further examination, the two applications were in fact not being properly
started either after install or on boot.
I managed to find a work around that caused the applications to run. I manually
ran each of them from the command line (running them through Finder resulted in failure) and
magically they worked as expected. Now whenever my machine starts up they start
on boot without my having to run them manually; even if I uninstall the applications
and reinstall them I no longer have to manually run them… but why?
```bash
voltaire:~ brett$ open /Library/Application\ Support/Pharos/Popup.app
voltaire:~ brett$ open /Library/Application\ Support/Pharos/Notify.app
voltaire:~ brett$ ps aux | grep Pharos
brett 600 0.0 0.1 655276 3984 ?? S 2:55PM 0:00.10 /Library/Application Support/Pharos/Popup.app/Contents/MacOS/Popup -psn_0_237626
brett 543 0.0 0.1 655156 3652 ?? S 2:45PM 0:00.08 /Library/Application Support/Pharos/Notify.app/Contents/MacOS/Notify -psn_0_233529
brett 608 0.0 0.0 2434892 436 s001 R+ 2:56PM 0:00.00 grep Pharos
```
I am still not 100% sure why this work around worked, especially since the
postflight script included with the Popup package is set to run Popup.app after
installation. The only explanation I can come up with is that OSX keeps a list of
“trusted” applications (you know, that popup that asks if you really want
to run a program that was downloaded from the internet), and Popup.app and
Notify.app are not being properly added to that list unless run manually.
I am still looking into a solution that can be packaged with the Popup package and
will post more information here when I find out more.

+ 77
- 0
content/writing/about/php-stop-malicious-image-uploads/index.md View File

@ -0,0 +1,77 @@
---
title: PHP - Stop Malicious Image Uploads
author: Brett Langdon
date: 2012-02-01
template: article.jade
---
Quick and easy trick for detecting and stopping malicious image uploads to PHP.
---
Recently I have been practicing for the upcoming NECCDC competition and have
come across a few issues that will need to be overcome, including how to stop
malicious image uploads.
I was reading
<a href="http://www.acunetix.com/websitesecurity/upload-forms-threat.htm" target="_blank">this</a>
article on
<a href="http://www.acunetix.com/" target="_blank">Acunetix.com</a>
about the threats of having upload forms in PHP.
The general idea behind this exploit against Apache and PHP is that a user
uploads an image whose content contains PHP code and whose extension includes
‘php’, for example an image ‘new-house.php.jpg’ that contains:
```
... (image contents)
<?php phpinfo(); ?>
... (image contents)
```
When the file is uploaded and then viewed, Apache, if improperly set up, will process the
image as PHP because of the ‘.php’ in the extension, and the malicious code will be
executed on your server when the image is accessed.
## My Solution
I was trying to find a good way to resolve this issue quickly without opening
more security holes. I have seen some solutions that use the function
<a href="http://us2.php.net/manual/en/function.getimagesize.php" target="_blank">getimagesize</a>
to try and determine if the file is an image, but if the malicious code is
injected into the middle of an actual image this function will still return
the actual image size and the file will validate as an image. The solution I
came up with is to explicitly convert each uploaded image to a jpeg using
<a href="http://us2.php.net/manual/en/function.imagecreatefromjpeg.php" target="_blank">imagecreatefromjpeg</a>
and
<a href="http://us2.php.net/manual/en/function.imagejpeg.php" target="_blank">imagejpeg</a>
functions.
```php
<?php
$image = imagecreatefromjpeg( './new-house.php.jpg' );
imagejpeg( $image, './new-house.php.jpg' );
```
If the original image contains malicious code, an error will be thrown and
`$image` will not contain an image; this is a way to try to sanitize the
upload. The code can also be extended so that if the image is invalid, a
placeholder image is created and saved instead.
```php
<?php
// @ is used to silence the possible error from this call.
$image = @imagecreatefromjpeg( './new-house.php.jpg' );
if( !$image ):
$image = imagecreate(100,20);
$greenish = imagecolorallocate( $image, 180,200,180 );
imagefill( $image, 0, 0, $greenish );
$black = imagecolorallocate( $image, 0,0,0 );
imagestring( $image, 1, 5, 5, 'No.. No..', $black );
endif;
imagejpeg( $image, './new-house.php.jpg' );
```
Enjoy.

+ 90
- 0
content/writing/about/python-redis-queue-workers/index.md View File

@ -0,0 +1,90 @@
---
title: Python Redis Queue Workers
author: Brett Langdon
date: 2014-10-14
template: article.jade
---
Learn an easy, distributed approach to processing jobs
from a Redis queue in Python.
---
Recently I started thinking about a new project. I want to write my own Continuous Integration (CI)
server. I know what you are thinking... "Why?!" and yes I agree, there are a bunch of good ones out
there now, I just want to do it. The first problem I came across was how to have distributed workers
to process the incoming builds for the CI server. I wanted something that was easy to start up on
multiple machines and that needed minimal configuration to get going.
The design is relatively simple: there is a main queue which jobs can be pulled from and a second queue
that each worker process pulls jobs into to denote processing. The main queue is meant as a list of things that
have to be processed, while the processing queue is a list of pending jobs which are being processed by the
workers. For this example we will be using [Redis lists](http://redis.io/commands#list) since they support
the short feature list we require.
### worker.py
Let's start with the worker process; the job of the worker is simply to grab a job from the queue and process it.
```python
import redis
def process(job_id, job_data):
print "Processing job id(%s) with data (%r)" % (job_id, job_data)
def main(client, processing_queue, all_queue):
while True:
# try to fetch a job id from "<all_queue>:jobs"
# and push it to "<processing_queue>:jobs"
job_id = client.brpoplpush(all_queue, processing_queue)
if not job_id:
continue
# fetch the job data
job_data = client.hgetall("job:%s" % (job_id, ))
# process the job
process(job_id, job_data)
# cleanup the job information from redis
client.delete("job:%s" % (job_id, ))
client.lrem(processing_queue, 1, job_id)
if __name__ == "__main__":
import socket
import os
client = redis.StrictRedis()
try:
main(client, "processing:jobs", "all:jobs")
except KeyboardInterrupt:
pass
```
The above script does the following:
1. Try to fetch a job from the queue `all:jobs` pushing it to `processing:jobs`
2. Fetch the job data from a [hash](http://redis.io/commands#hash) key with the name `job:<job_id>`
3. Process the job information
4. Remove the hash key `job:<job_id>`
5. Remove the job id from the queue `processing:jobs`
With this design we will always be able to determine how many jobs are currently queued for processing
by looking at the list `all:jobs`, and we will also know exactly how many jobs are being processed
by looking at the list `processing:jobs`, which contains the job ids that all workers are
working on.
Also, we are not tied down to running just 1 worker on 1 machine. With this design we can run multiple
worker processes on as many nodes as we want, as long as they all have access to the same Redis server.
There are a few limitations, which are all rooted in Redis' [limits on lists](http://redis.io/topics/data-types),
but this should be good enough to get started.
There are a few other approaches that can be taken here as well. Instead of using a single processing queue
we could use a separate queue for each worker. Then we can see which jobs are currently being processed
by each individual worker. This approach would also let the workers try to fetch
from their worker specific queue first before looking at `all:jobs`, so we can either assign jobs to specific
workers or let a worker recover from failed processing by starting with the last job it was working
on before failing; a rough sketch of that idea follows below.
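Here is a rough sketch of that per-worker queue idea. The queue naming and recovery logic are my own assumption, not part of qw or the worker above:

```python
# Sketch only: give each worker its own processing queue and recover
# any job left behind by a previous failure before pulling new work.
import socket

import redis

client = redis.StrictRedis()
worker_queue = "worker:%s:jobs" % socket.gethostname()

# If our queue still holds a job from a failed run, process it first.
job_id = client.lindex(worker_queue, 0)
if job_id is None:
    # Otherwise block until a new job arrives on the shared queue.
    job_id = client.brpoplpush("all:jobs", worker_queue)
```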
## qw
I have developed the library [qw](https://github.com/brettlangdon/qw) (or QueueWorker) to implement a similar
pattern to this, so if you are interested in playing around with it or want to see a more developed implementation,
please check out the project's [github page](https://github.com/brettlangdon/qw) for more information.

+ 87
- 0
content/writing/about/sharing-data-from-php-to-javascript/index.md View File

@ -0,0 +1,87 @@
---
title: Sharing Data from PHP to JavaScript
author: Brett Langdon
date: 2014-03-16
template: article.jade
---
A quick example of how I decided to share dynamic content from PHP with my JavaScript.
---
So the other day I was refactoring some of the client side code I was working on and
came across something like the following:
### page.php
```php
<html>
...
<script type="text/javascript">
var modelTitle = "<?=$myModel->getTitle()?>";
// do something with modelTitle
</script>
</html>
```
There isn't really anything wrong here; in fact this seems to be a fairly common practice
(from the little research I did). So... what's the big deal? Why write an article about it?
My issue with the above is: what if the JavaScript gets fairly large (as mine was)? The
ideal thing to do is to move the js into its own file, minify/compress it, and serve it
from a CDN so it doesn't affect page load time. But now we have content that needs to be
added dynamically from the PHP script in order for the js to run. How do we solve it? The
approach that I took, which probably isn't original at all but I think is neat enough to
share, was to let PHP make the data available to the script through `window.data`.
### page.php
```php
<html>
...
<?php
$pageData = array(
'modelTitle' => $myModel->getTitle(),
);
?>
<script type="text/javascript">
window.data = <?=json_encode($pageData)?>;
</script>
<script type="text/javascript" src="//my-cdn.com/scripts/page-script.min.js"></script>
</html>
```
### page-script.js
```javascript
// window.data.modelTitle is available for me to use
console.log("My Model Title: " + window.data.modelTitle);
```
Nothing really fancy, shocking, new, or different here, just passing data from PHP to js.
Something to note is that our PHP code has to set `window.data` before we load
our external script so that `window.data` is available when the script runs. This
shouldn't be too much of an issue, since most web developers are used to putting all
of their `script` tags at the end of the page.
Some might wonder why I decided to use `window.data`, why not just set
`var modelTitle = "<?=$myModel->getTitle()?>";`? I think it is better to try and have a
convention for where the data from the page will come from. Having to rely on a bunch of
global variables being set isn't really a safe way to write this. What if you overwrite
an existing variable or if some other script overwrites your data from the PHP script?
This is still a cause for concern with `window.data`, but at least you only have to keep
track of a single variable. As well, I think organizationally it is easier and more concise
to have `window.data = <?=json_encode($pageData)?>;` as opposed to:
```php
var modelTitle = "<?=$myModel->getTitle()?>";
var modelId = "<?=$myModel->getId()?>";
var username = "<?=getCurrentUser()?>";
...
```
I am sure there are other ways to do this sort of thing, like with AJAX or having an
initialization function that PHP calls with the correct variables it needs to pass, etc.
This was just what I came up with and the approach I decided to take.
If anyone has other methods of sharing dynamic content between PHP and js, please leave a
comment and let me know, I am curious as to what most other devs are doing to handle this.

+ 95
- 0
content/writing/about/the-battle-of-the-caches/index.md View File

@ -0,0 +1,95 @@
---
title: The Battle of the Caches
author: Brett Langdon
date: 2013-08-01
template: article.jade
---
A co-worker and I set out to each build our own http proxy cache.
One of them was written in Go and the other as a C++ plugin for
Kyoto Tycoon.
---
So, I know what most people are thinking: “Not another cache benchmark post,
with skewed or biased results.” But luckily that is not what this post is about;
there are no opinionated graphs showing that my favorite caching system happens
to be better than all the other ones. Instead, this post is about why at work we
decided to write our own API caching system rather than use <a href="http://www.varnish-cache.org/" target="_blank">Varnish</a>
(a tested, tried and true HTTP caching system).
Let us discuss the problem we have to solve. The system we have is a simple
request/response HTTP server that needs to have very low latency (a few
milliseconds, usually 2-3 on average) and we are adding a third-party HTTP API
call to almost every request that we see. I am sure some people see the issue
right away: any network call is going to add at least half a millisecond to a whole millisecond
to your processing time, and that is if the two servers are in the same datacenter,
more if they are not. That is just network traffic; on top of that we must rely on the
performance of the third-party API, hoping that they are able to maintain a
consistent response time under heavy load. If, in total, this third-party API call
is adding more than 2 milliseconds response time to each request that our system
is processing then that greatly reduces the capacity of our system.
THE SOLUTION! Let's use Varnish. This is the logical solution: put a caching
system in front of the API. The content we are requesting isn’t changing very often
(every few days, if that) and a cache can help hide the added latency from the API
call. So, we tried this but had very little luck; no matter what we tried we could
not get Varnish to respond in under 2 milliseconds per request (which was a main
requirement of the solution we were looking for). That means Varnish is out, and the next
solution is to write our own caching system.
Now, before people start flooding the comments calling me a troll or yelling at me
for not trying this or that or some other thing, let me try to really explain why
we decided to write our own cache rather than spend extra days investing time in
Varnish or some other known HTTP cache. We have a fairly specific requirement for
our cache: very low and consistent latency. “Consistent” is the key word that really
matters to us. We decided fairly early on that getting no response on a cache miss
is better for our application than blocking and waiting for a response from the
proxy call. This is a very odd requirement and most HTTP caching systems do not
support it since it almost defeats their purpose (be “slow” 1-2 times so you can be
fast all the other times). As well, HTTP is not a requirement for us, that is,
from the cache to the API server HTTP must be used, but it is not a requirement
that our application calls to the cache using HTTP. Headers add extra bandwidth
and processing that are not required for our application.
So we decided that our ideal cache would have 3 main requirements:
1. Must have a consistent response time, returning nothing early over waiting for a proper response
2. Support the <a href="https://github.com/memcached/memcached/blob/master/doc/protocol.txt" target="_blank">Memcached Protocol</a>
3. Support TTLs on the cached data
This behavior works basically like so: call the cache; if it is a cache miss,
return an empty response and queue the request for a background process to make the
call to the API server. Every identical request coming in (until the proxy call
returns a result) will receive an empty response but will not add the request to the
queue. As soon as the proxy call returns, the cache is updated and every identical call
coming in will yield the proper response. After a given TTL the data in
the cache is considered old and is re-fetched. A rough sketch of this flow is shown below.
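As a rough illustration only (this is not Ferrite's or CacheOrBust's actual code, and it leaves out the memcached protocol, Kyoto Cabinet, and TTL handling), the miss behavior could be sketched in Go like this:

```go
package cache

import "sync"

// Cache sketches the behavior described above: a miss returns nothing
// immediately and the real fetch happens once, in the background.
type Cache struct {
	mu      sync.Mutex
	store   map[string][]byte
	pending map[string]bool
	fetch   func(key string) []byte // the proxy call to the third-party API
}

// NewCache returns a Cache that uses fetch to fill misses in the background.
func NewCache(fetch func(key string) []byte) *Cache {
	return &Cache{
		store:   make(map[string][]byte),
		pending: make(map[string]bool),
		fetch:   fetch,
	}
}

// Get returns the cached value, or nil immediately on a miss while a
// single background goroutine performs the real fetch.
func (c *Cache) Get(key string) []byte {
	c.mu.Lock()
	defer c.mu.Unlock()
	if value, ok := c.store[key]; ok {
		return value
	}
	// Only the first miss for a key queues a fetch; identical requests
	// until it completes also get an empty response.
	if !c.pending[key] {
		c.pending[key] = true
		go func() {
			value := c.fetch(key)
			c.mu.Lock()
			c.store[key] = value
			delete(c.pending, key)
			c.mu.Unlock()
		}()
	}
	return nil
}
```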
This was then seen as a challenge between a co-worker,
<a href="http://late.am/" target="_blank">Dan Crosta</a>, and myself to see who
can write the better/faster caching system with these requirements. His solution,
entitled “CacheOrBust”, was a
<a href="http://fallabs.com/kyototycoon/" target="_blank">Kyoto Tycoon</a> plugin
written in C++ which simply used a subset of the memcached protocol as well as some
background workers and a request queue to perform the fetching. My solution,
<a href="https://github.com/brettlangdon/ferrite" target="_blank">Ferrite</a>, is a
custom server written in <a href="http://golang.org/" target="_blank">Go</a>
(originally written in C) that has the same functionality (except using
<a href="http://golang.org/doc/effective_go.html#goroutines" target="_blank">goroutines</a>
rather than background workers and a queue). Both servers used
<a href="http://fallabs.com/kyotocabinet/" target="_blank">Kyoto Cabinet</a>
as the underlying caching data structure.
So… results already! As with most fairly competitive competitions, it is always a
sad day when there is a tie. That's right: two similar solutions, written in two
different programming languages, yielded similar results (we probably have
Kyoto Cabinet to thank). Both of our caching systems were able to give us the
results we wanted, **consistent** sub-millisecond response times, averaging about
.5-.6 milliseconds per response (different physical servers, but the same datacenter),
regardless of whether the response was a cache hit or a cache miss. Usually the
moral of the story is: “do not re-invent the wheel, use something that already
exists that does what you want,” but realistically sometimes this isn’t an option.
Sometimes you have to bend the rules a little to get exactly what your application
needs, especially when dealing with low latency systems, where every millisecond counts.
Just be smart about the decisions you make and make sure you have sound
justification for them, especially if you decide to build it yourself.

+ 352
- 0
content/writing/about/third-party-tracking-pixels/index.md View File

@ -0,0 +1,352 @@
---
title: Third Party Tracking Pixels
author: Brett Langdon
date: 2013-05-03
template: article.jade
---
An overview of what a third party tracking pixel is and how to create/use them.
---
So, what exactly do we mean by “third party tracking pixel” anyways?
Lets try to break it down piece by piece:
### Tracking Pixel:
A pixel refers to a tag that is placed on a site that offers no merit other than
calling out to a web page or script that is not part of the current page you are visiting.
These pixels are usually an html script tag that points to a javascript file with
no content, or an img tag with an empty or transparent 1 pixel by 1 pixel gif image
(hence the term “pixel”). A tracking pixel is the term used to describe a pixel
that calls to another page or script in order to provide it information about the
user's visit to the page.
### Third Party:
Third party just means the pixel points to a website that is not the current
website. For example,
<a href="http://www.google.com/analytics/" target="_blank">Google Analytics</a>
is a third party tracking tool because you place scripts on your website
that calls and sends data to Google.
## What is the point?
Why do people do this? In the case of Google Analytics, site owners do not wish to track
and analyze their own analytics for their website; instead they want a third party
host to do it for them, but they need a way of sending their users' data to Google.
Using pixels and javascript to send the data to Google offers the company a few
benefits. For starters, they do not need any extra overhead on their own servers for
a service that sends data directly to Google; by using pixels and scripts they
get to offload this overhead onto their users (that's right, we are using our
personal computers' resources to send analytical data about ourselves to Google for
websites that use Google Analytics). Secondly, because a tracking
pixel runs client side (in the user's browser), we are able to gather more
information about the user. The information that is made available to us through
the use of javascript is far greater than what is given to our servers via
HTTP headers.
## How do we do it?
Next we will walk through the basics of how to create third party tracking pixels.
Code examples for the following discussion can be found
<a href="https://github.com/brettlangdon/tracking-server-examples" target="_blank">here</a>.
We will walk through four examples of tracking pixels accompanied by the server
code needed to serve and receive the pixels. The server is written in
<a href="http://python.org/" target="_blank">Python</a> and some basic
understanding of Python is required to follow along. The server examples are
written using only standard Python wsgi modules, so no extra installation is
needed. We will start off with a very simple example of using a tracking pixel and
then add features to the pixel with each example that follows.
## Simple Example
For this example all we want to accomplish is to have a web server that returns
HTML containing our tracking pixel as well as a handler to receive the call from
our tracking pixel. Our end goal is to serve this HTML content:
```html
<html>
<head></head>
<body>
<h2>Welcome</h2>
<script src="/track.js"></script>
</body>
</html>
```
As you can see, this is fairly simple HTML; the important part is the script tag
pointing to "/track.js", which is our tracking pixel. When the user's browser loads
the page this script will make a call to our server, and our server can then log
information about that user. So we start with a wsgi handler for the HTML code:
```python
def html_content(environ, respond):
headers = [('Content-Type', 'text/html')]
respond('200 OK', headers)
return [
"""
<html><head></head><body>
<h2>Welcome</h2><script src="/track.js"></script>
</body></html>
"""
]
```
Next we want to make sure that we have a handler for the calls to “/track.js”
from the script tag:
```python
def track_user(environ, respond):
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    # Log the interesting parts of the WSGI environ (path, headers, query).
    prefixes = ['PATH_', 'HTTP', 'REQUEST', 'QUERY']
    for key, value in environ.iteritems():
        if any(key.startswith(prefix) for prefix in prefixes):
            print '%s: %s' % (key, value)
    return ['']
```
In this handler we take various pieces of information about the user’s request
and simply print them to the screen. The endpoint “/track.js” is not meant to
serve actual JavaScript, so instead we return an empty string. When this
code runs you should see something like the following:
```
brett$ python tracking_server.py
Tracking Server Listening on Port 8000...
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET / HTTP/1.1" 200 89
HTTP_REFERER: http://localhost:8000/
REQUEST_METHOD: GET
QUERY_STRING:
HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP_CONNECTION: keep-alive
PATH_INFO: /track.js
HTTP_HOST: localhost:8000
HTTP_ACCEPT: */*
HTTP_USER_AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31
HTTP_ACCEPT_LANGUAGE: en-US,en;q=0.8
HTTP_DNT: 1
HTTP_ACCEPT_ENCODING: gzip,deflate,sdch
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET /track.js HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET /favicon.ico HTTP/1.1" 204 0
```
You can see in the above that first the browser makes the request “GET /” which
returns our HTML containing the tracking pixel, then directly afterwards makes a
request for “GET /track.js” which prints out various information about the incoming
request. This example is not very useful as is, but helps to illustrate the key
point of a tracking pixel. We are having the browser make a request on behalf of
the user without the user’s knowledge. In this case we are making a call back to
our own server, but our script tag could easily point to a third party server.
## Add Some Search Data
Our previous, simple, example does not really provide us with any particularly
useful information other than allowing us to track that a user’s browser made the
call to our server. For this next example we want to build upon the previous one by
sending some data along with the tracking pixel; in this case, some search data.
Let us assume that our web page allows users to make searches; searches
are given to the page through a URL query string parameter “search”. We want to
pass that value on to our tracking pixel, for which we will use the
query string parameter “s”. So our requests will look as follows:
* http://localhost:8000?search=my cool search
* http://localhost:8000/track.js?s=my cool search
To do this, we simply append the value of the “search” parameter onto our track.js
script tag in our HTML:
```python
# Both helpers come from the Python 2 standard library.
from urllib import quote
from urlparse import parse_qs


def html_content(environ, respond):
    # Pull the "search" parameter off the page request and forward it
    # to the tracking pixel as "s" (URL-encoded).
    query = parse_qs(environ['QUERY_STRING'])
    search = quote(query.get('search', [''])[0])
    headers = [('Content-Type', 'text/html')]
    respond('200 OK', headers)
    return [
        """
        <html><head></head><body>
        <h2>Welcome</h2><script src="/track.js?s=%s"></script>
        </body></html>
        """ % search
    ]
```
For our tracking pixel handler we will simply print the value of the query string
parameter “s” and again return an empty string.
```python
from urlparse import parse_qs


def track_user(environ, respond):
    # Read the search term forwarded by the pixel and log it.
    query = parse_qs(environ['QUERY_STRING'])
    search = query.get('s', [''])[0]
    print 'User Searched For: %s' % search
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    return ['']
```
When run the output will look similar to:
```
brett$ python tracking_server.py
Tracking Server Listening on Port 8000...
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /?search=my%20cool%20search HTTP/1.1" 200 110
User Searched For: my cool search
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /track.js?s=my%20cool%20search HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /favicon.ico HTTP/1.1" 204 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /?search=another%20search HTTP/1.1" 200 108
User Searched For: another search
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /track.js?s=another%20search HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /favicon.ico HTTP/1.1" 204 0
```
Here we can see the two search requests made to our web page and the corresponding
requests to track.js. Again, this example might not seem like much, but
it demonstrates a way of passing values from our web page along to the
tracking server. In this case we are passing search terms, but we could also pass
along any other information we needed, as the sketch below illustrates.
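For example, a hypothetical variation of the handler (not part of the original
examples) could also record which page the pixel was embedded on by reading the
Referer header:
```python
from urlparse import parse_qs


def track_user(environ, respond):
    query = parse_qs(environ['QUERY_STRING'])
    search = query.get('s', [''])[0]
    # The Referer header tells us which page embedded the pixel; any other
    # value could just as easily be forwarded as a query parameter like "s".
    page = environ.get('HTTP_REFERER', 'unknown')
    print 'User Searched For: %s (from page %s)' % (search, page)
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    return ['']
```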
## Track Users with Cookies
So now we are getting somewhere: our tracking server is able to receive some
search data about the requests made to our web page. The problem now is that we have
no way of associating this information with a specific user; how can we know when
a specific user searches for multiple things? Cookies to the rescue. In this
example we are going to add support for using cookies to assign each visiting
user a unique id, which will allow us to associate all the search data
we receive with “specific” users. Yes, I say “specific” with quotes because we can
only associate the data with a given cookie; if multiple people share a computer
then we will probably think they are a single person. As well, if someone clears
their browser’s cookies then we lose all association with that user and have
to start all over again with a new cookie. Lastly, if a user does not allow cookies
in their browser then we will be unable to associate any data with them, as every
time they visit our tracking server we will see them as a new user. So, how do we
do this? When we receive a request from a user we check whether we have already
given them a cookie with a user id; if so, we associate the incoming data
with that user id, and if not, we generate a new user id and give it to the user.
```python
# Standard library imports (Python 2).
from Cookie import SimpleCookie
from urlparse import parse_qs
from uuid import uuid4


def track_user(environ, respond):
    # Look for an existing "id" cookie; SimpleCookie returns a Morsel,
    # so we take its .value to get the actual user id string.
    cookies = SimpleCookie()
    cookies.load(environ.get('HTTP_COOKIE', ''))
    morsel = cookies.get('id')
    user_id = morsel.value if morsel else None
    if not user_id:
        user_id = uuid4()
        print 'User did not have id, giving: %s' % user_id
    query = parse_qs(environ['QUERY_STRING'])
    search = query.get('s', [''])[0]
    print 'User %s Searched For: %s' % (user_id, search)
    headers = [
        ('Content-Type', 'application/javascript'),
        ('Set-Cookie', 'id=%s' % user_id)
    ]
    respond('200 OK', headers)
    return ['']
```
This is great! Not only can we now obtain search data from a third party website,
but we can also do our best to associate that data with a given user. In this
instance a single user is anyone who shares the same user id in their
browser’s cookies.
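To make “associating data with a user” a little more concrete, here is a minimal
sketch of my own (not part of the original examples) that keeps an in-memory
dictionary of searches per user id; a real tracking server would persist this
data somewhere durable.
```python
from collections import defaultdict

# Maps user id -> list of search terms seen for that id. Purely
# illustrative: restarting the server wipes everything.
searches_by_user = defaultdict(list)


def record_search(user_id, search):
    # Could be called from inside track_user once the user id is known.
    searches_by_user[str(user_id)].append(search)
    print 'User %s has searched for: %s' % (
        user_id, ', '.join(searches_by_user[str(user_id)]))
```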
## Cache Busting
So what exactly is cache busting? Our browsers are smart: they know that we do not
like to wait a long time for a web page to load, and they have also learned that they
do not need to refetch content they have seen before if they cache it. For
example, an image on a web site might get cached by your web browser so that every time
you reload the page the image can be loaded locally as opposed to being fetched
from the remote server. Cache busting is a way to ensure that the browser does not
cache the content of our tracking pixel. We want the user’s browser to follow the
tracking pixel to our server for every page request they make, because we want to
follow everything that that user does. When the browser caches our tracking
pixel’s content (an empty string) we lose out on data. Cache busting is the
term used when we programmatically generate query string parameters to make calls
to our tracking pixel look unique and therefore ensure that the browser follows
the pixel rather than loading from its cache. To do this we need to add an extra
endpoint to our server. We need the HTML for the web page, along with a cache busting
script and finally our track.js handler. A cache busting script uses JavaScript
to add our track.js script tag to the web page. This means that after the web page
is loaded, JavaScript will run to manipulate the
<a href="http://en.wikipedia.org/wiki/Document_Object_Model" target="_blank">DOM</a>
and add our cache busted track.js script tag to the HTML. So, what does this
look like?
```javascript
var now = new Date().getTime();
var random = Math.random() * 99999999999;
document.write('<script type="text/javascript" src="/track.js?t=' + now + '&r=' + random + '"></script>');
```
This script adds two extra query string parameters: “r”, which is a random number,
and “t”, which is the current timestamp in milliseconds. This gives us a unique
enough request to trick the browser into ignoring anything it has in
its cache for track.js and forces it to make the request anyway. Using a cache
buster requires us to modify the HTML we serve slightly, to serve up the cache
busting JavaScript as opposed to our track.js pixel directly.
```html
<html>
<head></head>
<body>
<h2>Welcome</h2>
<script src="/buster.js"></script>
</body>
</html>
```
And we need the following to serve up the cache buster script buster.js:
```python
def cache_buster(environ, respond):
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    # JavaScript served to the page: it rebuilds the track.js URL with a
    # timestamp and random number so the browser never sees the same URL twice.
    cb_js = """
    function getParameterByName(name){
        name = name.replace(/[\[]/, "\\\[").replace(/[\]]/, "\\\]");
        var regexS = "[\\?&]" + name + "=([^&#]*)";
        var regex = new RegExp(regexS);
        var results = regex.exec(window.location.search);
        if(results == null){
            return "";
        }
        return decodeURIComponent(results[1].replace(/\+/g, " "));
    }
    var now = new Date().getTime();
    var random = Math.random() * 99999999999;
    var search = getParameterByName('search');
    document.write('<script src="/track.js?t=' + now + '&r=' + random + '&s=' + search + '"></script>');
    """
    return [cb_js]
```
We do not care very much whether the browser caches the cache buster script itself,
because it will always generate a new, unique track.js URL every time it runs.
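A complementary approach, not covered in the original examples but worth
mentioning, is to ask the browser outright not to cache the pixel by adding
cache-related response headers to the track.js handler. A sketch of what that
might look like:
```python
def track_user(environ, respond):
    # Same handler shape as before, but the extra headers tell the browser
    # (and most intermediate caches) not to store the response at all.
    headers = [
        ('Content-Type', 'application/javascript'),
        ('Cache-Control', 'no-cache, no-store, must-revalidate'),
        ('Pragma', 'no-cache'),
        ('Expires', '0'),
    ]
    respond('200 OK', headers)
    return ['']
```
In practice the two techniques are often combined, since older browsers and some
proxies do not reliably honor these headers.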
## Conclusion
There is a lot going on here and probably a lot to digest, so let’s quickly review
what we have learned. For starters, we learned that companies use tracking
pixels or tags on web pages whose sole purpose is to make your browser call out to
external third party sites in order to track information about your internet
usage (usually; they can be used for other things as well). We also looked into
some very simplistic ways of implementing a server whose job is to accept
tracking pixel calls in various forms.
We learned that these tracking servers can use cookies stored in your browser to
hold a unique id for you in order to help associate the collected data with you,
and that you can remove this association by clearing your cookies or by not allowing
them at all. Lastly, we learned that browsers can cause issues for our tracking
pixels and data collection, and that we can get around them using a cache busting
script.
As a reminder, the full working code examples can be located at
<a href="https://github.com/brettlangdon/tracking-server-examples" target="_blank">https://github.com/brettlangdon/tracking-server-examples</a>.

+ 42
- 0
content/writing/about/what-i'm-up-to-these-days/index.md View File

@ -0,0 +1,42 @@
---
title: What I'm up to these days
author: Brett Langdon
date: 2015-06-19
template: article.jade
---
It has been a while since I have written anything on my blog. Might as well get started
somewhere, with a brief summary of what I have been working on lately.
---
It has been far too long since I last wrote in this blog. I always have these aspirations
of writing all the time about all the things I am working on. The problem generally comes
back to me not feeling confident enough to write about anything I am working on. "Oh, a
post like that probably already exists", "There are smarter people than me out there
writing about this, why bother". It is an unfortunate feeling to try and get over.
So, here is where I am making an attempt. I will try to write more; it'll be healthy for
me. I always hear of people setting reminders in their calendars to block off time to
write blog posts, even if they end up only writing a few sentences, which seems like a
great idea that I intend to try.
Ok, enough with the "I haven't been feeling confident" drivel, on to what I actually have
been up to lately.
Since my last post I have a new job. I am now a Senior Software Engineer at
[underdog.io](https://underdog.io/). We are a small early stage startup (4 employees, just
over a year old) in the hiring space. For candidates our site basically acts like
a common application for what is now over 150 venture backed startups in New York City and San
Francisco. In the short time I have been working there I have been very impressed, and I am glad that
I took their offer. I work with some awesome and smart people and I am still learning a
lot, whether it is about coding or just trying to run a business.
I originally planned to end this post by talking about a programming project I have been
working on, but it ended up being four times longer than the text above, so I have decided
instead to write a separate post about it. Apparently, even though I have not been writing
lately, I have a lot to say.
Thanks for bearing with this "I have to write something" post. I am not going to make a
promise that I am going to write more, because it is something that could easily fall
through, like it usually does... but I shall give it my all!

+ 86
- 0
content/writing/about/why-benchmarking-tools-suck/index.md View File

@ -0,0 +1,86 @@
---
title: Why Benchmarking Tools Suck
author: Brett Langdon
date: 2012-10-22
template: article.jade
---
A brief aside into why I think no benchmarking tool is exactly correct
and why I wrote my own.
---
Benchmarking is (or should be) a fairly important part of most developers’ jobs:
determining the load that the systems they build can withstand. We are
currently at a point in our development lifecycle at work where load testing is a
fairly high priority. We need to be able to answer questions like: what kind of
load can our servers currently handle as a whole? What kind of load can a single
server handle? How much throughput can we gain by adding X more servers? What
happens when we overload our servers? What happens when our concurrency doubles?
These are all questions that most of us have probably been asked at some point in our
careers. Luckily there is a plethora of HTTP benchmarking tools to help try
to answer these questions. Tools like
<a href="http://httpd.apache.org/docs/2.2/programs/ab.html" target="_blank">ab</a>,
<a href="http://www.joedog.org/siege-home/" target="_blank">siege</a>,
<a href="https://github.com/newsapps/beeswithmachineguns" target="_blank">beeswithmachineguns</a>,
<a href="http://curl-loader.sourceforge.net/" target="_blank">curl-loader</a>
and one I wrote recently (today),
<a href="https://github.com/brettlangdon/tommygun" target="_blank">tommygun</a>.
Every single one of those tools sucks, including the one I wrote (and will
probably keep using/maintaining). Why? Don’t a lot of people use them? Yes,
almost everyone I know has used ab (most of you probably have) and I know a
decent handful of people who use siege, but that does not mean that they are
the most useful for all use cases. In fact they tend to only be useful for a
limited set of testing. Ab is great if you want to test a single web page, but
what if you need to test multiple pages at once, or in a sequence? I’ve also
personally experienced huge performance issues with running ab from a Mac. These
scope issues of ab make way for other tools such as siege and curl-loader, which
can test multiple pages at a time or in a sequence, but at what cost? Currently at
work we are having issues getting siege to properly parse and test a few hundred
thousand URLs, some of which contain binary POST data.
On top of only really having a limited set of use cases, each benchmarking tool
also introduces overhead on the machine that you are benchmarking from. Ab might
be able to test your servers faster and with more concurrency than curl-loader
can, but if only curl-loader can test your specific use case, which do you use?
Curl-loader can probably benchmark exactly what you’re trying to test, but if it
cannot generate the level of load you are looking for, then how useful a
tool is it? What if you need to scale your benchmarking tool? How do you scale
it? What if you are running the test from the same machine as
your development environment? What kind of effect will running the benchmarking
tool itself have on your application?
So, what is the solution then? I think instead of trying to develop these command
line tools to fit each scenario, we should try to develop a benchmarking framework
with all of the right pieces that we need. For example, develop a platform that
has the functionality to run a given task concurrently, but where you supply the
task for it to run. This way the benchmarking tool does not become obsolete and
useless as your application evolves. It also paves the way for the tool to
be protocol agnostic, allowing people to easily write tests for HTTP web
applications or even services that do not speak HTTP, such as message queues
or in-memory stores. This framework should also provide a way to scale the tool
to allow more throughput and overload on your system. Last, but not least, the
platform should be lightweight and introduce as little overhead as
possible, for those who do not have EC2 available to them for testing, or who do
not have spare servers lying around to test from. A rough sketch of the idea
follows below.
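Here is a minimal sketch of my own illustrating the core of such a framework in
Python (this is an illustration, not an existing tool): the caller supplies any
zero-argument callable as the task, and the harness only handles concurrency and
timing.
```python
import time
from threading import Thread


def run_benchmark(task, concurrency=10, tasks_per_worker=100):
    # `task` is any zero-argument callable: an HTTP request, a queue
    # publish, a cache write, etc. The harness only supplies concurrency
    # and measurement, so it stays protocol agnostic.
    durations = []

    def worker():
        for _ in range(tasks_per_worker):
            start = time.time()
            task()
            durations.append(time.time() - start)

    threads = [Thread(target=worker) for _ in range(concurrency)]
    started = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - started

    total = concurrency * tasks_per_worker
    print 'completed %d tasks in %.2fs (%.1f tasks/sec)' % (
        total, elapsed, total / elapsed)
    print 'average task time: %.4fs' % (sum(durations) / len(durations))
```
An HTTP benchmark would pass a function that performs one request; a message
queue benchmark would pass a function that publishes one message, and so on.
Scaling out would then be a matter of running this harness from many machines at
once.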
I am not saying that up until now load testing has been nothing but a pain, or that
the tools we have available to us (for free) are the worst things out there
and should not be trusted. I just feel that they do not and cannot meet every use
case, and that I have been plagued by this issue in the past. How can you properly
load test your application if you do not have the right load testing tool for
the job?
So, I know what some might be thinking, “sounds neat, when will your framework
be ready for me to use?” That is a nice idea, but if the past few months are any
indication of how much free time I have, I might not be able to get anything done
right away (seeing how I was able to write my load testing tool while on vacation).
I am however, more than willing to contribute to anyone else’s attempt at this
framework and I am especially more than willing to help test anyone else’s
framework.
**Side Note:** If anyone knows of any tool or framework that currently tries to
achieve my “goal”, please let me know. I was unable to find any tools out there
that worked as I described or even got close, but I might not have searched for
the right thing or maybe skipped over the right link, etc.

+ 56
- 0
content/writing/about/write-code-every-day/index.md View File

@ -0,0 +1,56 @@
---
title: Write code every day
author: Brett Langdon
date: 2015-07-02
template: article.jade
---
Just like a poet or an athlete, practicing code every day will only make you better.
---
Lately I have been trying to get into blogging more, and every article I read says, "you need to write every day".
It doesn't matter if what I write down gets published; forming the habit of trying to write something every day
is what counts. The more I write the easier it will become, the more natural it will feel and the better I will get at it.
This isn't just true of writing or blogging; it can be said of almost anything: riding a bike,
playing basketball, reading, cooking. The more you do it, the easier it will become and
the better you will get.
As the title of this post alludes, the same is true of programming. If you want to be really good at programming
you have to write code every day. The more code you write, the easier it'll be to write and the better you will be at programming.
Just like any other task I've listed in this article, writing code every day, even if you are used to it, can be really
hard to do and a really hard habit to keep.
"What should I write?" The answer to this question is going to be different for everyone, but it is the hurdle which
you must first overcome to work your way towards writing code every day. Usually people write code to solve problems
that they have, but not everyone has problems to solve. There is usually a chicken and the egg problem. You need to
write code to have coding problems, and you need to have coding problems to have something to write. So, where should
you start?
For myself, one of the things I like doing is to rewrite things that already exist. Sometimes it can be hard to come up with a
new and different idea or even a new approach to an existing idea. However, there are millions of existing projects out
there to copy. The idea I go for is to try and replicate the overall goal of the project, but in my own way. That might
mean writing it in a different language, or changing the API for it or just taking some wacky new approach to solving the same issue.
More often than not the above exercise leads me to a problem that I can then go off and solve. For example, a few weeks ago
I sat down and decided I wanted to write a web server in `go` (think `nginx`/`apache`). I knew going into the project I wanted
a really nice and easy to use configuration file to define the settings. So, I did what most people do these days and
used `json`, but that didn't really feel right to me. I then tried `yaml`, but again it didn't feel like what I wanted. I
probably could have used the `ini` format and made custom rules for the keys and values, but that felt hacky too. This spawned
a new project to solve the problem I was having, which ended up being [forge](https://github.com/brettlangdon/forge),
a hand coded configuration file syntax and parser for `go` that is a neat mix between `json` and `nginx`
configuration file syntax.
Anywho, enough of me trying to self-promote projects. The main point is that by trying to replicate something that
already exists, without really trying to do anything new, I came up with an idea which spawned another project and
for at least a week (and continuing now) gave me a reason to write code every day. Not only did I write something
useful that I can now use in any future project of mine, I also learned something I did not know before: I learned
how to hand-code a syntax parser in `go`.
Ultimately, try to take "coding every day" not as a challenge to write something useful every day, but to learn
something new every day. Learn part of a new language, a new framework, learn how to take something apart or put
it back together. Write code every day and learn something new every day. The more you do this, the more you will
learn and the better you will become.
Go forth and happy coding. :)

+ 1
- 0
static/css/lato.css View File

@ -0,0 +1 @@
css

+ 9
- 0
static/css/site.css View File

@ -0,0 +1,9 @@
#wrapper,
.profile #wrapper,
#wrapper.home {
max-width: 900px;
}
a.symbol {
margin-right: 0.7rem;
}

BIN
static/images/avatar.png View File

Width: 128  |  Height: 128  |  Size: 25 KiB

BIN
static/images/avatar@2x.png View File

Width: 1024  |  Height: 1024  |  Size: 554 KiB

BIN
static/images/favicon.ico View File

