---
title: Let's Make a Metrics Beacon
author: Brett Langdon
date: 2014-06-22
template: article.jade
---

Recently I wrote a simple JavaScript metrics beacon
library. Let me show you what I came up with and how it works.

---

So, what do I mean by "JavaScript metrics beacon library"? Think
[RUM (Real User Monitoring)](http://en.wikipedia.org/wiki/Real_user_monitoring) or
[Google Analytics](http://www.google.com/analytics/):
a JavaScript library used to capture and aggregate metrics/data
on the client side and send that data to a server, either in one
big batch or in small increments.

For those who do not like reading articles and just want the code, you
can find the current state of my library on GitHub: https://github.com/brettlangdon/sleuth

Before we get into anything technical, let's take a quick look at an
example usage:

```html


```

Alright, so let's cover a few concepts from above: `tags`, `metrics` and `syncing`.

### Tags
Tags describe the context of the metrics being sent to the server and are
generally used to break those metrics apart. For example, you might
have a metric to track whether or not someone clicks an "add to cart" button. Using tags,
you can then break out that metric to see how many times the button has been pressed
for each `productId`, browser, language or any other piece of data you find
useful for segmenting your metrics. Tags can also be used when tracking data for
[A/B Tests](http://en.wikipedia.org/wiki/A/B_testing), where you want to segment your
data based on which part of the test the user was included in.

### Metrics
Metrics are simply data points to track for a given request.
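
To make the "add to cart" example a little more concrete, here is a rough sketch of what breaking a metric apart by a tag could look like once the data reaches a server. This post only covers the client side, so the record shape (one object per synced request, with `tags` and `metrics` fields) and the `segmentByTag` function are hypothetical and exist only for illustration.

```javascript
// Hypothetical record shape: one object per synced request, holding the
// tags it was sent with and the metrics tracked on that page view.
var records = [
  { tags: { productId: 'p42', browser: 'firefox' }, metrics: { addToCart: 1 } },
  { tags: { productId: 'p42', browser: 'chrome' }, metrics: { addToCart: 1 } },
  { tags: { productId: 'p7', browser: 'chrome' }, metrics: { addToCart: 1 } },
  { tags: { productId: 'p7', browser: 'chrome' }, metrics: { loadTime: 812 } }
];

// Sum one metric across all records, grouped by the value of one tag.
function segmentByTag(records, metric, tag) {
  var totals = {};
  records.forEach(function (record) {
    if (!(metric in record.metrics)) {
      return; // this request never tracked the metric
    }
    // Requests sent without the tag are grouped under "(none)".
    var segment = tag in record.tags ? record.tags[tag] : '(none)';
    totals[segment] = (totals[segment] || 0) + Number(record.metrics[metric]);
  });
  return totals;
}

console.log(segmentByTag(records, 'addToCart', 'productId')); // { p42: 2, p7: 1 }
console.log(segmentByTag(records, 'addToCart', 'browser'));   // { firefox: 1, chrome: 2 }
```

The same metric can be sliced by any tag without changing what the client sends, which is the whole point of tagging.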

Good metrics to record
are things like load times, elements loaded on the page, time spent on the page,
the number of times buttons are clicked or other user interactions with the page.

### Syncing
Syncing refers to sending the data from the client to the server. I call it
"syncing" because we want to aggregate as much data as we can on the client side and send
fewer, larger requests rather than making a request to the server for
each metric we want to track. We do not want to overload the client with requests if we plan to
track a lot of user interactions on the site.

## How To Do It
Alright, enough of the simple examples and explanations; let's dig into the source a bit
to find out how to aggregate the data on the client side and how to sync that data
to the server.

### Aggregating Data
Collecting the data we want to send to the server isn't too bad. We are just going
to take any calls to `Sleuth.track(key, value)` and store the values either in
[LocalStorage](http://diveintohtml5.info/storage.html) or in an object until we need
to sync. For example, this is the `track` method of `Sleuth`:

```javascript
Sleuth.prototype.track = function(key, value){
  if(this.config.useLocalStorage && window.localStorage !== undefined){
    // Namespace the key to avoid collisions with other LocalStorage users.
    // Note: LocalStorage only stores strings, so `value` is converted to one.
    window.localStorage.setItem('Sleuth:' + key, value);
  } else {
    // Fall back to keeping the data in memory on this instance.
    this.data[key] = value;
  }
};
```

The only things of note above are that it falls back to storing in `this.data`
if LocalStorage is not available, and that we namespace all data stored in
LocalStorage with the prefix "Sleuth:" to ensure there is no name collision with
anyone else using LocalStorage.

Also, `Sleuth` will be kind enough to capture data from `window.performance` if it
is available and enabled (it is by default).
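
The values in `window.performance.timing` are absolute timestamps (milliseconds since the epoch), so the interesting numbers are the differences between them. As a hypothetical example of what a collector might do with them, here is a sketch that derives a few durations relative to `navigationStart`. The field names come from the Navigation Timing `PerformanceTiming` interface, but `summarizeTiming` and the particular durations chosen are only an illustration, not something the `Sleuth` code in this post does.

```javascript
// Turn raw Navigation Timing timestamps into durations (in ms) measured
// from navigationStart.
function summarizeTiming(timing) {
  var start = timing.navigationStart;

  // A field that is still 0 means that phase hasn't happened yet
  // (loadEventEnd, for example, stays 0 until the load event has finished).
  var since = function (field) {
    return timing[field] > 0 ? timing[field] - start : null;
  };

  return {
    timeToFirstByte: since('responseStart'),
    domContentLoaded: since('domContentLoadedEventEnd'),
    pageLoad: since('loadEventEnd')
  };
}

var sample = {
  navigationStart: 1403400000000,
  responseStart: 1403400000180,
  domContentLoadedEventEnd: 1403400000950,
  loadEventEnd: 0 // load event not finished yet
};

console.log(summarizeTiming(sample));
// → { timeToFirstByte: 180, domContentLoaded: 950, pageLoad: null }
```

Because `loadEventEnd` is only filled in after the load event completes, timing data captured too early will be missing the full page load time.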

`Sleuth` doesn't try to pick and choose; it simply grabs everything it can
to sync up to the server:

```javascript
Sleuth.prototype.captureWindowPerformance = function(){
  if(this.config.performance && window.performance !== undefined){
    var timing = window.performance.timing;
    if(timing !== undefined){
      // Copy the numeric fields instead of keeping the live PerformanceTiming
      // object: its values are getters on the prototype, so JSON.stringify()
      // only sees them in browsers that implement toJSON() for it.
      this.data.timing = {};
      for(var field in timing){
        if(typeof timing[field] === 'number'){
          this.data.timing[field] = timing[field];
        }
      }
    }
    if(window.performance.navigation !== undefined){
      this.data.navigation = {
        redirectCount: window.performance.navigation.redirectCount,
        type: window.performance.navigation.type
      };
    }
  }
};
```

For an idea of what is stored in `window.performance.timing`, check out
[Navigation Timing](https://developer.mozilla.org/en-US/docs/Navigation_timing).

### Syncing Data
OK, so this is really the important part of this library. Collecting the data isn't
hard. In fact, no one really needs a library to do that for them when you
could just as easily use a global object to aggregate the data. So why make a
"big deal" about syncing the data? That isn't too hard either: you can just
make a simple AJAX call using jQuery's `$.ajax(...)` to ship a JSON string up to some
server-side listener.

The approach I wanted to take was a little different. Yes, by default `Sleuth` will
try to send the data using AJAX to the server-side URL "/track", but what about when
the server that collects the data lives on another hostname?
[CORS](http://en.wikipedia.org/wiki/Cross-origin_resource_sharing) can be less than
fun to deal with, and rather than worrying about cross-domain security I just wanted
a method that can send the data from any page back to whatever server I want,
regardless of where it lives. So, how? Simple: JavaScript pixels.

A JavaScript pixel is simply a `script` tag that is written to the page with
`document.write` and whose `src` attribute points to the URL you want to call.
The browser will then request that URL without using AJAX, just like it would
with a normal `script` tag loading JavaScript.

For a more in-depth look at tracking
pixels, you can read a previous article of mine:
[Third Party Tracking Pixels](http://brett.is/writing/about/third-party-tracking-pixels/).

The point of going with this method is that we get CORS-free GET requests from any
client to any server. But some people are probably thinking, "wait, how does a GET request
help us send data from the client to the server?" This is why we encode
our JSON string of data for the URL and simply send it as a query string
parameter. Enough talk, let's see what this looks like:

```javascript
// Turn {a: 1, b: 'x y'} into "a=1&b=x%20y".
var encodeObject = function(data){
  var query = [];
  for(var key in data){
    // Skip anything inherited through the prototype chain.
    if(Object.prototype.hasOwnProperty.call(data, key)){
      query.push(encodeURIComponent(key) + '=' + encodeURIComponent(data[key]));
    }
  }

  return query.join('&');
};

var drop = function(url, data, tags){
  // base64 encode( stringify(data) )
  // Note: btoa() throws on characters outside the Latin-1 range.
  tags.d = window.btoa(JSON.stringify(data));

  // these parameters are used for cache busting
  tags.n = new Date().getTime();
  tags.r = Math.random() * 99999999;

  // make sure we url encode all parameters
  url += '?' + encodeObject(tags);

  // "script" is split up so the literal tag never appears in the source
  document.write('<sc' + 'ript src="' + url + '"></scri' + 'pt>');
};
```

That is basically it. We simply base64 encode a JSON string version of the data and send it
as a query string parameter. A few things above might look odd, mainly
the URL length limitations of a base64-encoded JSON string, the "cache busting" and the weird
breaking up of the tag "script". A safe URL length to stay under is around
[2000 characters](http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers)
to accommodate Internet Explorer. Since base64 makes the JSON about a third larger (and URL encoding
its `+`, `/` and `=` characters adds a bit more), some very crude testing showed that each request
can hold around 50 or so separate metrics, each containing a string value.
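
One way to live with that limit, sketched here under the assumption that the collector is happy to receive several pixels per page view, is to pack the metrics greedily into as many URLs as needed, each under a maximum length. `buildUrl` and `batchUrls` are hypothetical helpers, not part of `Sleuth`. They reuse the same base64-of-JSON `d` parameter as `drop`, and the `maxLength` passed in should leave some room for the `n` and `r` cache-busting parameters that `drop` adds.

```javascript
// Hypothetical helper: build one pixel URL with the same encoding as drop(),
// i.e. base64(JSON) in a "d" parameter next to the URL-encoded tags.
// (btoa is a global in browsers and in Node 16+.)
function buildUrl(base, data, tags) {
  var query = ['d=' + encodeURIComponent(btoa(JSON.stringify(data)))];
  Object.keys(tags).forEach(function (key) {
    query.push(encodeURIComponent(key) + '=' + encodeURIComponent(tags[key]));
  });
  return base + '?' + query.join('&');
}

// Greedily pack metrics into as few URLs as possible, each no longer than
// maxLength. A single metric that is too big on its own still gets a URL.
function batchUrls(base, data, tags, maxLength) {
  var urls = [];
  var batch = {};
  var batchSize = 0;

  Object.keys(data).forEach(function (key) {
    if (batchSize > 0) {
      // Try the batch with this metric added; flush first if it won't fit.
      var candidate = Object.assign({}, batch);
      candidate[key] = data[key];
      if (buildUrl(base, candidate, tags).length > maxLength) {
        urls.push(buildUrl(base, batch, tags));
        batch = {};
        batchSize = 0;
      }
    }
    batch[key] = data[key];
    batchSize += 1;
  });

  if (batchSize > 0) {
    urls.push(buildUrl(base, batch, tags));
  }
  return urls;
}

// Example: 200 small string metrics, kept under the ~2000 character limit
// with a little headroom left for drop()'s cache-busting parameters.
var metrics = {};
for (var i = 0; i < 200; i++) {
  metrics['metric' + i] = 'value ' + i;
}
var urls = batchUrls('http://collector.example.com/track', metrics, { page: 'home' }, 1950);
console.log(urls.length + ' requests, longest is ' +
  Math.max.apply(null, urls.map(function (u) { return u.length; })) + ' characters');
```

Each URL could then be fired as its own pixel. Rebuilding the candidate URL for every metric is quadratic, which is fine for a few hundred metrics; a metric whose encoded value alone exceeds `maxLength` would need a different transport anyway.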

Cache busting
is covered in more depth in my article about tracking pixels
(http://brett.is/writing/about/third-party-tracking-pixels/#cache-busting), but the short
version is: we add a random number and the current timestamp to the query string to ensure that
the browser, CDN or anyone in between doesn't cache the request being made to the server,
so you will not miss any metrics calls. Lastly, breaking up the `script` tag
into "sc + ript" and "scri + pt" makes it harder for anyone blocking scripts from writing
`script` tags to detect that a script tag is being written to the DOM. It also keeps a literal
`</script>` from prematurely closing the surrounding `script` block if this code is ever inlined
into a page. (An `img` or `iframe` tag could also be used instead of a `script` tag.)

### Unload
How do we know when to send the data? If you are trying to measure how much time
visitors spend on each page, or want to make sure you collect as much data
as possible on the client side, then you want to wait until the last second before
syncing the data to the server. By using LocalStorage to store the data you can ensure
that you will be able to access that data the next time you see that user, but who wants
to wait? And what if the user never comes back? I want my data now, dammit!

Simple, let's bind an event to `window.onunload`! Woot, done... wait... why isn't my data
being sent to me? Initially I was trying to use `window.onunload` to sync data back, but
found that it didn't always work with pixel dropping, while AJAX requests worked most of the time.
After some digging I found that with `window.onunload` I was hitting a race condition on
whether or not the DOM was still available, meaning I couldn't use `document.write`
or even query the DOM for more metrics to sync.

In comes `window.onbeforeunload` to the rescue!
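
A minimal sketch of wiring a final sync to that event might look like the following. `syncOnLeave` and its `collect` callback are hypothetical names, not `Sleuth`'s API, and the window object is passed in so the function can be tried outside a browser. It prefers `navigator.sendBeacon`, which newer browsers provide specifically for sending data as a page goes away (note that it issues a POST, so the collector would need to accept that as well as the GET pixel), and otherwise falls back to an `img` pixel, one of the alternatives to a `script` tag mentioned above.

```javascript
// Hypothetical helper, not Sleuth's API: send one last batch of data when
// the page is about to go away. `win` is the browser's window object,
// `url` the collector endpoint and `collect` returns the query string.
function syncOnLeave(win, url, collect) {
  win.addEventListener('beforeunload', function () {
    var target = url + '?' + collect();

    // sendBeacon() hands the request to the browser so it can finish after
    // the page is gone. It sends a POST and returns false if the request
    // could not be queued, in which case we fall through to the pixel.
    var nav = win.navigator;
    if (nav && typeof nav.sendBeacon === 'function' && nav.sendBeacon(target)) {
      return;
    }

    // Fallback: an image pixel created with new Image() instead of
    // document.write(). The browser may cancel it as the page unloads.
    var pixel = new win.Image();
    pixel.src = target;

    // Don't call event.preventDefault() here: that is how a page asks the
    // browser to show its "leave this page?" confirmation.
  });
}
```

In a page this would be called along the lines of `syncOnLeave(window, '/track', collectQueryString)`, where `collectQueryString` is whatever builds the same query string `drop` does. Using `addEventListener` rather than assigning `window.onbeforeunload` also avoids clobbering any handler the page already has.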

For those who don't know about it (I
didn't before this project), `window.onbeforeunload` is exactly what it sounds like:
an event that fires before `window.onunload`, and therefore before the DOM
gets unloaded. So, in my testing at least, you can use it to write to the DOM (like the pixels) or
to query the DOM for any extra information you want to sync up. One caution: calling
`document.write` on a page that has already finished loading implicitly reopens and replaces
the whole document, so creating the pixel element through the DOM is the safer choice at this point.

## Conclusion
So what do you think? There really isn't too much to it, is there? Especially since we
only covered the client side of the piece and haven't touched on how to collect and
interpret this data on the server (maybe that'll be a follow-up post). Again, this is mostly
a simple implementation of a RUM library, but hopefully it sparks an interest in building
one yourself, or at least gives you some insight into how Google Analytics and other
RUM libraries collect and send data from the client.

I think this project was neat because I do not always do client-side
JavaScript, and every time I do I tend to learn something pretty cool. In this case it was
the differences between `window.onunload` and `window.onbeforeunload`, as well
as some of the cool things that are tracked by default in `window.performance`. I
definitely urge people to check out the documentation on `window.performance`.

### TODO
What is next for [Sleuth](https://github.com/brettlangdon/sleuth)? I am not sure yet.
I am thinking of implementing more ways of tracking data, like adding counter support,
rate limiting and automatic incremental data syncs. I am open to ideas of how other people
would use a library like this, so please leave a comment here or open an issue on the
project's GitHub page with any thoughts you have.

## Links
* [Sleuth](https://github.com/brettlangdon/sleuth)
* [Third Party Tracking Pixels](http://brett.is/writing/about/third-party-tracking-pixels/)
* [LocalStorage](http://diveintohtml5.info/storage.html)
* [Navigation Timing](https://developer.mozilla.org/en-US/docs/Navigation_timing)
* [window.onbeforeunload](https://developer.mozilla.org/en-US/docs/Web/API/Window.onbeforeunload)
* [window.onunload](https://developer.mozilla.org/en-US/docs/Web/API/Window.onunload)
* [RUM](http://en.wikipedia.org/wiki/Real_user_monitoring)
* [Google Analytics](http://www.google.com/analytics/)
* [A/B Testing](http://en.wikipedia.org/wiki/A/B_testing)