For whatever reason, Go insisted on loading rm.sm[i] in several chunks,
even though it could be loaded in a single 64-bit block. Instead, let's
reorder our loads to minimize the amount of memory we're uselessly
moving around.
This gives us about a 15% perf boost in
github.com/julienschmidt/go-http-routing-benchmark's
BenchmarkGoji_StaticAll, and questionable benefits (i.e., not
distinguishable from noise but certainly no worse) on Goji's own
benchmarks.
Swap out the naive "try all the routes in order" router with a "compile
a trie down to bytecode" router. It's a ton faster, while providing all
the same semantics.
See the documentation at the top of web/fast_router.go for more.