Syntax extensions and regular expressions for Rust

Posted on 04/21/2014 at 7:51pm

A few weeks ago, I set out to add regular expressions to the Rust distribution with an implementation and feature set heavily inspired by Russ Cox’s RE2. It was just recently added to the Rust distribution.

This regex crate includes a syntax extension that compiles a regular expression to native Rust code when a Rust program is compiled. This can be thought of as “ahead of time” compilation or something similar to compile time function execution. These special natively compiled regexes have the same exact API as regular expressions compiled at runtime.

In this article, I will outline my implementation strategy—including code samples on how to write a Rust syntax extension—and describe how I was able to achieve an identical API between regexes compiled at compile time and regexes compiled at runtime.

Brief notes on format

All code samples in this post compile with rustc from commit 8bc286 (compiled on 2014-06-30, 21:16:32+0). Code samples with an // Output: comment are tested before uploaded.

Most code samples should correspond to a complete program that is compilable. This makes samples a bit longer than they need to be, but I think it’s important to include complete working examples for a language still in development and not (yet) widely known.

Note that Rust is still under heavy development. I will do my best to keep this article updated with any breaking changes.

How a regex matcher works

There are many different ways to implement a regular expression engine, so I will just briefly outline how mine works, which should closely correspond to RE2.

First, a regular expression is parsed and converted into an abstract syntax tree. For example, the regex a|b might be represented as Alternate(Literal(a), Literal(b)), which signifies “match either a or b.” Second, the regex’s abstract syntax is converted to a sequence of instructions. This sequence of instructions can then be evaluated by a virtual machine which will determine if a regex matches some text (and where it matches).

For the regex a|b, the sequence of instructions looks something like:

  • 1: Split(2, 4)
  • 2: Char(a)
  • 3: Jump(5)
  • 4: Char(b)
  • 5: Match

In this case, Split means “jump to two different instructions simultaneously.” In particular, if either branch executes the Match instruction, then the entire regex will match. If both Char instructions fail, then it’s impossible to reach the Match instruction.

Clarifying the divide: native regexes vs. dynamic regexes

The word “compiled” is heavily overloaded, so it’s worth making sure that we get our terms straight. For example, in Python, you can compile a regular expression with re.compile("..."), which is done at runtime, and converts the pattern given into a data structure that is fed into a matching algorithm. Often, it is good practice to compile a regular expression outside of loops so that it doesn’t have to go through the overhead of compilation each time the expression is used to search text. (Assuming you don’t want to rely on Python’s caching.)

This is emphatically not what I mean by compiling regexes “ahead of time.” What I mean is the more literal translation: a regular expression is converted to native Rust code when you compile your Rust program. That is, there is (virtually) zero cost of compiling a regex during runtime. Perhaps more importantly, because it’s compiled to native Rust code, your regex will always run faster. (I promise evidence of this claim with benchmarks toward the end of this article.)

Let’s clarify with an example. First, we can try compiling a regular expression at runtime:

Notice that we call Regex::new to create a regex, and the call may fail if the expression given isn’t a valid regex. I call this a dynamic regex because the regular expression compilation happens at runtime. This example is analogous to Python’s re.compile.

When a regular expression is compiled dynamically, it is converted to a sequence of instructions that are executed by a virtual machine. This virtual machine can execute any valid regular expression expressed as a sequence of instructions.

In contrast, I’ve called regexes that are compiled to Rust code when your Rust program compiles native. A native regex is indistinguishable from a dynamic regex, except that it must be created with a regex! macro:

The highlighted lines indicate a change from the previous code sample.

(Note about Rust: Syntax extensions require a special compiler directive #[phase(plugin)] to work. The phase directive must be enabled explicitly: #![feature(phase)].)

Notice that we no longer have to check if regex! returns an error because all errors are converted to compile time errors. For example, the following program will fail to compile:

The error given by rustc: Regex syntax error near position 3: Unclosed parenthesis.

So what exactly does regex! expand to if not a sequence of instructions? Well, remember the virtual machine I mentioned earlier that can execute a sequence of instructions? That’s precisely what regex! gets replaced with, except it’s specialized to the particular sequence of instructions corresponding to the regular expression given. This specialization allows us to perform optimizations such as removing heap allocation and embedding the instructions directly instead of relying on a generalized algorithm.

If you want an example, you can take a look at how this program expands the regex! macro into this program. (It’s not important to understand the expanded code, but it is notable that it expands to a lot of code! If you have a lot of regex! calls in your program, then you might see your binary size increase. But compiling with -O optimization will help with that.)

Native regexes are pretty good. They provide extra safety (cannot compile a program with an invalid regex) and extra performance. They do come with a downside though: code bloat. If your program has a lot of regex! calls (hundreds) and is compiled unoptimized, then you’ll notice a bigger binary. However, optimization can help shrink them back down.

Related work

Note that I am not the first to do this. D has ctRegex which claims to do something similar. There is also xpressive in Boost, but native regexes must be written with template syntax. Finally, there is also CL-PPCRE for Common Lisp which also claims to have native regexes.

Nimrod seems like it is capable of producing native regexes, but I don’t think it has been done yet.

A related project is Ragel, which can compile state machines to native code in a variety of languages. In principle, Ragel is also producing native regexes, but my approach is automated by the Rust compiler and provides an API identical to that of dynamic regexes.

(Please alert me if I’ve missed anything.)

Native regexes are implemented with macros

Rust has hygienic macros, but they can be written in one of two ways. The first way is to use macro_rules! to conveniently specify some source transformation with quasiquoting. For example, here is how the try! macro is defined, which provides an easy way to return early in a function that returns a value with type Result<T, E>:

Notice the use of quasiquoting with $e. The expression $e is spliced into the match expression at compile time. The quasiquoting in this case distinguishes between something that parameterizes the macro at compile time and the actual source code being written.

In general, if a macro can be written with macro_rules!, then it should be written with macro_rules! because they are simple to write. However, they are not powerful enough to implement native regexes because they are restricted to source code transformations. To implement native regexes, we need to actually compile a regular expression during the compilation of a Rust program.

Luckily, the second way to write a macro is via a syntax extension (also known as a “procedural macro”). This is done by a compile time hook that lets you execute arbitrary Rust code, rewrite the abstract syntax to whatever you want and opens up access to the Rust compiler’s parser. In short, you get a lot of power. And it’s enough to implement native regexes. The obvious drawback is that they are more difficult to implement.

Note that all macros are invoked with the name! syntax, regardless of whether they are defined with macro_rules! or as a syntax extension.

Setting up a syntax extension

Syntax extensions only became available to user programs a few months ago. Before that, they were only available to the Rust compiler. Thus, there is little documentation, so I’ve learned what I know by example and by examining the syntax crate, which defines Rust’s abstract syntax, provides a parser and the syntax extension functionality itself. (Note that syntax extensions are currently an experimental feature of Rust.)

Since there is a fair bit of boiler plate required, let’s take a look at implementing a trivial macro that returns the factorial of 5:

(N.B. There are some comments in the code above that explain some details.)

The magic happens with the macro_registrar function, which must be exported and labelled with the special #[macro_registrar] attribute. This function is where the compiler lets you register syntax extensions, which have a name and provide a value of type SyntaxExtension which indicates the sort of expansion that you intend to do. In this case, we’re using NormalTT, which is a function-like macro, but there are others, including the macro_rules! form.

Here, we register a single macro: factorial, which is expanded by the expand function. The type of expand is specified by the NormalTT expansion, which gives us an extension context (e.g., the parsing state), a span (e.g., a region of the actual code, used for error reporting) and a token tree. But most importantly, the return value is a Box<MacResult> (“macro result”), where MacResult is a trait. Types that satisfy the MacResult trait can be automatically spliced into the AST of a Rust program. In this case, we’re just creating an expression, so we can use the MacExpr helper type to construct a value that has type Box<MacResult>. Finally, we see that MacExpr::new has type Gc<Expr> -> Box<MacResult>. Ah ha! So if we can build an expression, then we can build a Box<MacResult>, and therefore be done with our macro expansion.

The body of the expand function shows how to manually create an expression with a single integer literal. Even though the expression is very simple, I wrote two helper functions uint_literal and dummy_expr to encapsulate some of the more mundane details. If we have to go through this much work for just a single literal, you might imagine that it gets very tedious pretty quickly.

But remember the quasiquoting that was available to use when defining macros with macro_rules!? They are available to us here too, courtesy of the syntax::ext::quote module. Here’s the same syntax extension implemented using quasiquoting:

The quote_expr! macro does all the heavy lifting for us. It splices the value of answer into an expression for us, handling all the nitty-gritty details automatically. (We could also write, say, $answer + 1 which would produce an expression with an addition operation. The $answer part is computed at compile time and the addition operation would be done at runtime—if it isn’t optimized away.) Fundamentally, the quasiquoting works the same here as it does with macro_rules!.

(There is some middle ground between building the AST manually and using quasiquoting, but I won’t cover it here.)

And finally, we can actually use it with:

The comments in the code explain some of the extra compiler directives we have to write in order to import a syntax extension. (Note that this is no different than importing the log crate from the Rust distribution, except that the log crate requires #[phase(syntax, link)] because it provides more stuff than just a syntax extension.)

Accessing the parser

The above example demonstrates how to write a really simple—but pretty useless—syntax extension. To write a useful syntax extension, we probably need to define a macro that can accept arguments. With macro_rules!, this is really easy because there’s some convenient syntax similar to defining a regular function. But in the land of syntax extensions, we must deal with Rust’s parser directly.

An obvious extension to our factorial! macro is to let it take an integer argument so that we can compute any factorial. But how do we get the arguments given to a macro defined as a syntax extension?

Well, if you look back at the last example, then you’ll notice that the expand function has a few parameters that we didn’t use. One of which is a sequence of token trees with type &[ast::TokenTree]. Luckily, a sequence of ast::TokenTree values can be used to build a parser with new_parser_from_tts defined in the syntax::parse module. With a parser, we can ask it for expressions—and once we have expressions, we can translate those to real Rust values. In this case, that’s just going to be an integer. (You can see the parser API here, and although it is largely undocumented, most of the names are reasonably descriptive.)

Let’s take a look at the code. The principal changes from the last example are highlighted:

The first few changes are fairly trivial. This time, instead of hard-coding the factorial of 5, we parse an unsigned integer literal and pass that to the factorial function. If a single unsigned integer literal could not be parsed, then we return a “dummy expression”. The actual error is written to the parser state in parse. The code is structured this way to allow the compiler to continue even if an error is found.

The parse function shows how to create a parser from a sequence of token trees. A single expression is parsed, and we use pattern matching to make sure it corresponds to an unsigned integer literal. (To figure out which value constructors to use in your patterns, you’ll want to dig into the ast::Expr documentation, specifically the Expr_ type.)

After we find a literal, we make sure it’s the last thing in the sequence of token trees and then return the integer as a real Rust value with type u64.

If anything unexpected occurs, we log an error with the Rust compiler via the ExtCtxt and return None. As a bonus, we use the syntax::print::pprust module to pretty-print Rust expressions to make our error messages nicer.

Finally, the macro can be used the same way as last time, except we call factorial! with an integer argument:

This concludes the tutorial aspect of this article. Now we’ll get back to regexes.

Creating the regex! syntax extension

Instead of going through the full code generator, which would require more explanation about how the virtual machine works, I’ll explain the representation of native regexes and how some of the optimizations work.

But first, we must figure out how the regex! macro will be used. If possible, it should reuse the existing API for the Regex type. The benefit of this is that native regexes won’t introduce any additional complexity on top of dynamic regexes other than how they are constructed. The regex! macro should also probably return an expression, so that we can write things like regex!("abc").is_match("xyz").

Perhaps we can construct a value with type Regex. If we could do that, then we’d be done—because such values already support all the convenient matching, splitting and replacing functions. So let’s take a look at the representation of a dynamic Regex:

The representation shown here is very simple: it’s just a sequence of instructions. The problem is that this representation is for dynamic regexes. Remember that dynamic regexes work by being compiled to a sequence of instructions that is then executed by a general virtual machine. The VM can be thought of as a Rust function with type fn(insts: &[Inst], search: &str) -> Captures, where Captures is just the location of matches in the search text.

Native regexes on the other hand should be native Rust code with precisely the same type as above with one omission: the sequence of instructions. Why? Because the sequence of instructions is encoded directly into the function by the regex! syntax extension. Therefore, the type of a native regex VM would have to be fn(search: &str) -> Captures.

So we’re left with an interesting set of constraints. On the one hand, we have a VM that can execute a sequence of instructions on search text, and on the other, we have a specialized VM that just executes on search text. Since a regex must be either dynamic or native, we can represent this state of affairs with a sum type:

And now we can easily write a function that executes a Regex on search text, regardless of whether its dynamic or native:

These details of the representation are hidden from well behaving clients, and therefore, dynamic and native regexes are indistinguishable.

Why did you just say “well behaving clients”?

Shouldn’t Rust’s module system prevent clients from accessing parts of the representation that aren’t exported?

Herein lay the dirty little secret of the regex_macros crate: it must be able to access the internal representation of a Regex (along with other things, like the set of all instructions). But if you look at the public API documentation, you’ll not see any such details of the representation exposed. This is because they are hidden using a special #[doc(hidden)] attribute, but you can see that an entire module of things is exported for the exclusive benefit of the regex_macros crate.

Therefore, the “well behaving clients” qualification is necessary. Misbehaving clients could access the internal representation of the Regex type, and thereby distinguish between dynamic and native regexes.

(It’d be possible to ameliorate this somewhat by exposing a hidden constructor function.)

Performance optimizations

This is still early days for the regex crate in Rust, but there are two primary optimizations made with native regexes. The first is removing heap allocation. Briefly, the general VM for dynamic regexes has a few places where it needs to allocate things on the heap. Primarily, the allocation is for storing capture groups (i.e., the location of submatches in search text). The heap allocation is necessary because the dynamic VM doesn’t know at compile time how many capture groups any particular regex might have.

This of course is not the case for native regexes, since the regex is compiled when your Rust program is compiled. Therefore, we can put all of the capture group information on the stack using explicitly sized vectors (i.e., [T, ..N]). This removes all heap allocation (except for the return value).

The second optimization comes from encoding the instructions directly into one big match expression based on the position of the instruction. This removes a lot of small computations necessary to discover what the “next” instruction is.

Some other smaller optimizations were made, such as removing conditional branches based on flags set in a particular instruction (like case insensitivity, multi-line mode, etc.) or converting a binary search on a character class to a single match expression.


As promised, here are some benchmarks run on an Intel i3930K:

Native regexes provide a universal speedup.

The more interesting observation is that native regexes drastically reduce constant factors associated with matching a particular regex against some text. This is evidenced by the easy_32, easy1_32, medium_32 and hard_32 benchmarks, where there is an order of magnitude improvement in almost all of them.

These benchmarks in particular test a few different regexes (based on particular optimizations related to literal prefixes) on 32 bytes of text. As one might imagine, with such a small search string, constant factors are likely to dominate the performance of the matching algorithm. Not surprisingly, native regexes win big here—likely because there is almost no heap allocation occurring.

Future work

There are still many more optimizations that can be performed:

  • Creating a “one pass” NFA where no backtracking need occur.
  • Attempt to do state compression in the code generator via a more intelligent analysis on the sequence of instructions.
  • The big one: implement a DFA, similar to how RE2/C++ works.


Many thanks to the following:


Other examples of syntax extensions

When documentation is scarce, the next best thing is to look at examples. Here are the ones I know about:

  • simple-ext is an example of a syntax extension for sorting literal strings. You can see it used here.
  • hexfloat for hexadecimal floating-point literals (part of Rust distribution).
  • fourcc four-character code library (part of Rust distribution).
  • rust-phf implements compile time static maps.

Please ping me if you know of more examples. I’ll add them here.

181 Comments - Post comment

Writing type parametric functions in Go

Posted on 04/06/2013 at 12:54pm

Go’s only method of compile time safe polymorphism is structural subtyping, and this article will do nothing to change that. Instead, I’m going to present a package ty with facilities to write type parametric functions in Go that maintain run time type safety, while also being convenient for the caller to use.

By run time type safety, I mean that the types of a function’s arguments are consistent with its parametric type or else the function predictably fails at run time with a reasonable error message. Stated differently, a lack of run time type safety would permit arguments that are inconsistent with the function’s parametric type at the call site, but might fail with unrelated and hard to debug errors (or worse, not fail at all). Thus, run time type safety in this context is a statement about failure modes.

I will provide examples that clarify run time type safety later in the article.

Warm up

Briefly, type parametric functions operate on their inputs without explicit knowledge of the types of their inputs. That is, they are parameterized on the types of their arguments.

If Go had parametric polymorphism available to users, here’s what a Map function might look like:

// Forall A, B ...
func Map(f func(A) B, xs []A) []B {
    ys := make([]B, len(xs))
    for i, x := range xs {
        ys[i] = f(x)
    return ys
Map(func(x int) int { return x * x }, []int{1, 2, 3})
// Returns: [1, 4, 9]

Note: Go has several built in functions that use parametric polymorphism: append, close, delete, copy, cap and len.


The purpose of the ty package is to give the programmer the ability to write the aforementioned Map function such that

  • It is easy for the caller to use.
  • The Map function is not overly difficult to write.
  • Type safety is maintained at run time.


Let’s skip the brouhaha and assume you buy into the notion that type parametric functions are useful in the hands of the user. The question remains: why is such an addition useful when Go already has powerful reflection tools? The answer is: working with reflection can be terribly inconvenient, and verifying the consistency of types can be complex and error prone.

I will attempt to convince you of this with code samples using the familiar Map function.

An attempt without using reflection

In Go, the type interface{} corresponds to the set of types that implement the empty interface. Stated differently: all Go types. It is appropriate to think of an interface{} type as conceptually analogous to a void * in C, but there are important operational differences. For example, Go is memory safe, which prevents arbitrary conversion of a value from one type to another. This limitation in particular makes a Map function without reflection more clumsy than how you might write and use it in C. Also, in Go, a value with interface{} type still contains information about the value’s underlying type, which we will exploit later.

Let’s start with writing Map using interface{}:

func Map(f func(interface{}) interface{}, xs []interface{}) []interface{} {
    ys := make([]interface{}, len(xs))
    for i, x := range xs {
        ys[i] = f(x)
    return ys

This part isn’t so bad, but the burden on the caller is outrageous:

square := func(x interface{}) interface{} {
    return x.(int) * x.(int)
nums := []int{1, 2, 3, 4}

gnums := make([]interface{}, len(nums))
for i, x := range nums {
    gnums[i] = x
gsquared := Map(square, gnums)
squared := make([]int, len(gsquared))
for i, x := range gsquared {
    squared[i] = x.(int)

Since we can’t do arbitrary type conversions, we need to allocate a new slice for the arguments, while also allocating a new slice for the return value of interface{} and type assert each element individually. (A type assertion in Go is a way to state knowledge about the underlying type of an interface value. In the above code, the type assertion will crash the program if it fails.) Moreover, the function f provided by the caller must also be generic. Anything with this much burden on the caller probably isn’t worth it.

With regard to run time type safety, most of it is contained inside the user-supplied f function, but error messages don’t address the underlying cause. For example, the following code

Map(func(a interface{}) interface{} { return len(a.(string)) },
    []interface{}{1, 2, 3})

fails with a stack trace and an error message: interface conversion: interface is int, not string.

A reflective interlude

For those that haven’t worked with reflection in Go before, The Laws of Reflection is a great introduction to the topic. It is suitable even if you don’t know Go.

I don’t consider it to be a necessary read before moving on, but it is important to know this (from “The Laws of Reflection”):

Reflection in computing is the ability of a program to examine its own structure, particularly through types; it’s a form of metaprogramming.

With that, let us move on to a Map that uses reflection.

Reflection in Go 1.0.x

We can make the burden a bit easier on the caller using Map by wielding the power of reflection to examine and manipulate the structure of a program. We wield this power by exploiting the fact that interface{} values still contain information about the underlying type of the value it contains. But this exploitation comes with the price of a more painful Map function:

func Map(f interface{}, xs interface{}) []interface{} {
    vf := reflect.ValueOf(f)
    vxs := reflect.ValueOf(xs)
    ys := make([]interface{}, vxs.Len())

    for i := 0; i < vxs.Len(); i++ {
        ys[i] = vf.Call([]reflect.Value{vxs.Index(i)})[0].Interface()
    return ys

Here are the key differences between this Map and the last one:

  • The type of f is now interface{} instead of func(interface{}) interface{}.
  • The type of xs is now interface{} instead of []interface{}.
  • The user’s f function is now applied using reflection instead of a regular Go function application.
  • The xs slice is accessed using reflection instead of the regular Go indexing operation.

The differentiating theme here is to move the entire world of Map into an interface{} type and rely on the reflect package to operate on the structure of those unknown values. In particular, we’ve given up some compile time type safety in exchange for lifting some burdens from the caller:

square := func(x int) int {
    return x * x
nums := []int{1, 2, 3, 4}

gsquared := Map(square, nums)
squared := make([]int, len(gsquared))
for i, x := range gsquared {
    squared[i] = x.(int)

Namely, the client is no longer mandated to write f as a generic function. It can use its own types without worrying about type assertions. Moreover, the client no longer needs to allocate a new slice for the input of the function.

But the caller still needs to type assert each element in the returned slice. How can we remove such a burden?

It turns out that it’s impossible using reflection in Go 1.0.x.

Reflection in Go tip (soon to be Go 1.1)

The release notes for Go 1.1 detail many welcomed changes, but for this article, we care about the changes made to the reflect package. In particular, three new critical functions were added: ChanOf, MapOf and SliceOf. These functions allow the creation of new types from existing types. With Go 1.0.x, such operations were not possible (except with pointer types).

This now allows us to write a Map function that uses the return type of f to construct a new slice type, which we can then populate and return to the caller.

func Map(f interface{}, xs interface{}) interface{} {
    vf := reflect.ValueOf(f)
    vxs := reflect.ValueOf(xs)

    tys := reflect.SliceOf(vf.Type().Out(0))
    vys := reflect.MakeSlice(tys, vxs.Len(), vxs.Len())

    for i := 0; i < vxs.Len(); i++ {
        y := vf.Call([]reflect.Value{vxs.Index(i)})[0]
    return vys.Interface()

Our Map function has gotten a bit more annoying, but after practice with the reflect package, it’s possible to see how most lines correspond to a regular Go operation. On the bright side, the caller’s obligations have dropped to nearly the level that we saw with the first generic Map example:

squared := Map(func(x int) int { return x * x }, []int{1, 2, 3}).([]int)

The only burden on the caller is to type assert the return value of the function. Indeed, this is the best we can do in this regard: all type parametric functions that return a value from now on have this restriction and only this restriction unless stated otherwise.

Run time type safety

The most recent iteration of the Map function is annoying to write, but not quite painful. Unfortunately, that’s about to change. Consider what happens when we try to subvert the parametric type of Map (which is func(func(A) B, []A) []B) by running this code:

Map(func(a string) int { return len(a) }, []int{1, 2, 3}).([]int)

The program fails with a stack trace and an error message: Call using int as type string.

Since our program is small, it is easy to see where we went wrong. But in a larger program, such an error message can be confusing. Moreover, type failures could occur anywhere which might make them more confusing. Even worse, other type parametric functions (not Map) might not fail at all—which results in a total loss of type safety, even at run time.

Therefore, to provide useful and consistent error messages, we must check the invariants in the parametric type of Map. Why? Because the Go type of Map is func(interface{}, interface{}) interface{} while the parametric type of Map is func(func(A) B, []A) []B. Since an interface{} type can correspond to any type, we need to be exhaustive in our checking:

  1. Map’s first parameter type must be func(A) B
  2. Map’s second parameter type must be []A1 where A == A1.
  3. Map’s return type must be []B1 where B == B1.

Given those invariants, here’s a Map function that enforces them and produces sane error messages. (I leave it to the reader to imagine better ones.)

func Map(f interface{}, xs interface{}) interface{} {
    vf := reflect.ValueOf(f)
    vxs := reflect.ValueOf(xs)

    ftype := vf.Type()
    xstype := vxs.Type()

    // 1) Map's first parameter type must be `func(A) B`
    if ftype.Kind() != reflect.Func {
        log.Panicf("`f` should be %s but got %s", reflect.Func, ftype.Kind())
    if ftype.NumIn() != 1 {
        log.Panicf("`f` should have 1 parameter but it has %d parameters",
    if ftype.NumOut() != 1 {
        log.Panicf("`f` should return 1 value but it returns %d values",

    // 2) Map's second parameter type must be `[]A1` where `A == A1`.
    if xstype.Kind() != reflect.Slice {
        log.Panicf("`xs` should be %s but got %s", reflect.Slice, xstype.Kind())
    if xstype.Elem() != ftype.In(0) {
        log.Panicf("type of `f`'s parameter should be %s but xs contains %s",
            ftype.In(0), xstype.Elem())

    // 3) Map's return type must be `[]B1` where `B == B1`.
    tys := reflect.SliceOf(vf.Type().Out(0))

    vys := reflect.MakeSlice(tys, vxs.Len(), vxs.Len())
    for i := 0; i < vxs.Len(); i++ {
        y := vf.Call([]reflect.Value{vxs.Index(i)})[0]
    return vys.Interface()

The result is a lot of pain, but when one tries to subvert its type

Map(func(a string) int { return len(a) }, []int{1, 2, 3}).([]int)

you’ll get a stack trace with a better error message: type of f's parameter should be string but xs contains int. The error message is better than what we’ve seen, but the type checking in Map has dwarfed the actual function of Map.

Fortunately, this pain can be avoided through abstraction.


Recall the type constraints for Map, which has parametric type func(func(A) B, []A) []B:

  1. Map’s first parameter type must be func(A) B
  2. Map’s second parameter type must be []A1 where A == A1.
  3. Map’s return type must be []B1 where B == B1.

We can interpret the above constraints as a unification problem, which in this case is the problem of finding a set of valid Go types that can replace all instances of A, A1, B and B1 in the type of Map. We can view these Go types as a set of substitutions.

More generally, given a parametric type of a function and the non-parametric types of the function’s arguments at run time, find a set of substitutions that unifies the parametric type with its arguments. As a bonus, we can use those substitutions to construct new types that Map may use to make new values.

To be concrete, let’s restate the constraints of Map in terms of a unification problem. (Note that this isn’t really a traditional unification problem, since the types of the arguments are not allowed to be parametric.)

Assume that all types with the Go prefix are real Go types like int, string or []byte.

  1. Unify the type func(A) B with the first argument. The result is a substitution from A to GoA and a substitution from B to GoB.
  2. Unify the type []A with the second argument. The result is a substitution from A to GoA1 such that GoA1 == GoA.
  3. Substitute GoB into []B to get []GoB.

So that if Map is invoked like so

strlen := func(s string) int { return len(s) }
lens := Map(strlen, []string{"abc", "ab", "a"}).([]int)

then A = string and B = int.

A generalized version of this algorithm is implemented in ty.Check, which is too big to list here. The input of ty.Check is a pointer to the type of a parametric function and every argument. The output is a slice of reflection values of the arguments, a slice of reflection types of the return values and a type environment containing the substitutions.

Writing Map with ty.Check

Here’s the code:

// Map has a parametric type:
//  func Map(f func(A) B, xs []A) []B
// Map returns the list corresponding to the return value of applying
// `f` to each element in `xs`.
func Map(f, xs interface{}) interface{} {
    chk := ty.Check(
        new(func(func(ty.A) ty.B, []ty.A) []ty.B),
        f, xs)
    vf, vxs, tys := chk.Args[0], chk.Args[1], chk.Returns[0]

    xsLen := vxs.Len()
    vys := reflect.MakeSlice(tys, xsLen, xsLen)
    for i := 0; i < xsLen; i++ {
        vy := vf.Call([]reflect.Value{vxs.Index(i)})[0]
    return vys.Interface()

The latter half of the function is something you ought to be deeply familiar with by now. But the first parts of the function are new and worth inspection:

chk := ty.Check(
    new(func(func(ty.A) ty.B, []ty.A) []ty.B),
    f, xs)

The first argument to ty.Check is a nil function pointer with a parametric type. Even though it doesn’t point to a valid function, the Check function can still query the type information.

But wait. How am I writing a parametric type in Go? The trick is to define a type that can never be equal to any other type unless explicitly declared to be:

type TypeVariable struct {
    noImitation struct{}

type A TypeVariable
type B TypeVariable

And by convention, the ty.Check function interprets those types (and only those types) to be parametric. You may define your own type variables too:

type K ty.TypeVariable
type V ty.TypeVariable

ty.Check has the following useful invariant: If Check returns, then the types of the arguments are consistent with the parametric type of the function, and the parametric return types of the function were made into valid Go types that are not parametric. Otherwise, there is a bug in ty.Check.

Let’s test that invariant. Using the above definition of Map, if one tries to run this code

Map(func(a string) int { return len(a) }, []int{1, 2, 3}).([]int)

then you’ll get a stack trace and a descriptive error message

Error type checking
        func(func(ty.A) ty.B, []ty.A) []ty.B
with argument types
        (func(string) int, []int)
Type error when unifying type '[]ty.A' and '[]int': Type variable A expected 
type 'string' but got 'int'.

Can we write functions other than Map?

Sure. Let’s take a look at how to shuffle any slice in place.

// Shuffle has a parametric type:
//  func Shuffle(xs []A)
// Shuffle shuffles `xs` in place using a default random number
// generator.
func Shuffle(xs interface{}) {
    chk := ty.Check(
    vxs := chk.Args[0]

    // Used for swapping in the loop.
    // Equivalent to `var tmp A`.
    tmp := reflect.New(vxs.Type().Elem()).Elem()

    // Implements the Fisher-Yates shuffle:
    for i := vxs.Len() - 1; i >= 1; i-- {
        j := rand.Intn(i + 1)

        // Swapping is a bit painful.

Or an implementation of set union, where a set is a map from any type that can be a key to a bool:

// Union has a parametric type:
//  func Union(a map[A]bool, b map[A]bool) map[A]bool
// Union returns the union of two sets, where a set is represented as a
// `map[A]bool`. The sets `a` and `b` are not modified.
func Union(a, b interface{}) interface{} {
    chk := ty.Check(
        new(func(map[ty.A]bool, map[ty.A]bool) map[ty.A]bool),
        a, b)
    va, vb, tc := chk.Args[0], chk.Args[1], chk.Returns[0]

    vtrue := reflect.ValueOf(true)
    vc := reflect.MakeMap(tc)
    for _, vkey := range va.MapKeys() {
        vc.SetMapIndex(vkey, vtrue)
    for _, vkey := range vb.MapKeys() {
        vc.SetMapIndex(vkey, vtrue)
    return vc.Interface()

Which can be used like so:

A := map[string]bool{
    "springsteen": true,
    "j. geils": true,
    "seger": true,
B := map[string]bool{
    "petty": true,
    "seger": true,
AandB := Union(A, B).(map[string]bool)

Sorts, parallel map, memoization, channels without a fixed buffer…

… and more can be found in the documentation of the ty/fun package.

Here’s a quick example of memoizing a recursive function that I think is pretty cool:

// Memoizing a recursive function like `fibonacci`:
// Write it like normal.
var fib func(n int64) int64
fib = func(n int64) int64 {
    switch n {
    case 0:
        return 0
    case 1:
        return 1
    return fib(n-1) + fib(n-2)

// Wrap it with a memoizing function.
// The type assert here is the *only* burden on the caller.
fib = fun.Memo(fib).(func(int64) int64)

// Will keep your CPU busy for a long time
// without memoization.

And here’s the definition of Memo.

Back to reality

There’s no such thing as a free lunch. The price one must pay to write type parametric functions in Go is rather large:

  1. Type parametric functions are SLOW.
  2. Absolutely zero compile time type safety.
  3. Writing type parametric functions is annoying.
  4. Unidiomatic Go.

I think that items 2, 3 and 4 are fairly self-explanatory if you’ve been reading along. But I have been coy about 1: the performance of type parametric functions.

As a general rule, they are slow because reflection in Go is slow. In most cases, slow means at least an order of magnitude slower than an equivalent implementation that is not type parametric (i.e., hard coded for a particular Go type). But, there is hope yet. Let’s take a look at some benchmarks comparing non-parametric (builtin) functions with their type parametric (reflect) counterparts.

benchmark                    builtin ns/op  reflect ns/op        delta
BenchmarkFibonacciMemo-12             5895          43896     +644.63%
BenchmarkFibonacciNoMemo-12        6827001        6829859       +0.04%
BenchmarkParMapSquare-12            408320         572307      +40.16%
BenchmarkParMapPrime-12            5289594        5510075       +4.17%
BenchmarkMapSquare-12                 8499         457844    +5287.03%
BenchmarkMapPrime-12              34265372       32220176       -5.97%
BenchmarkShuffle-12                 240036        1018408     +324.27%
BenchmarkSample-12                  262565         271122       +3.26%
BenchmarkSort-12                    137293        7716737    +5520.63%
BenchmarkQuickSort-12                 6325        6051563   +95576.89%

Benchmarks were run on an Intel i7 3930K (12 threads), 32GB of memory, Linux 3.8.4 and Go tip on commit c879a45c3389 with GOMAXPROCS=12. Code for all benchmarks can be found in *test.go files.

There is a lot of data in these benchmarks, so I won’t talk about everything. However, it is interesting to see that there are several data points where type parametric functions don’t perform measurably worse. For example, let’s look at our old friend Map. The type parametric version gets blown away in the MapSquare benchmark, which just squares a slice of integers. But the MapPrime benchmark has them performing similarly.

The key is what the benchmark MapPrime is doing: performing a very naive algorithm to find the prime factorization of every element in a slice of large integers. The operation itself ends up dwarfing the overhead of using reflection. From a performance perspective, this means Map is only useful when either performance doesn’t matter or when you know the operation being performed isn’t trivial.

But what about ParMap? ParMap is a function that spawns N goroutines and computes f over the slice concurrently. Even when not using reflection this approach bears a lot of overhead because of synchronization. Indeed, the ParMapSquare benchmark shows that the type parametric version is only slightly slower than the built in version. And of course, it is comparable in the ParMapPrime benchmark as well. This suggests that, from a performance perspective, the decision procedure for using a builtin ParMap is the same as the decision procedure for using a reflective ParMap.

If you have any questions about the benchmarks, I’d be happy to answer them in the comments.

What’s next?

Tumbling down the rabbit hole with parametric data types.

Show me the code, dammit.

  • Working examples of all code in this article can be found in a gist. (Although you will need Go tip to run most of them.)
  • Repository for the ty package, which includes ty/fun and ty/data.
  • Documentation with many more details and examples than were provided in this article.
27 Comments - Post comment

Introducing NFLGame: Programmatic access to live NFL game statistics

Posted on 08/30/2012 at 12:45am

As a programmer and a fantasy football addict, I am embarassed by the means through which we must expend ourselves to get data in a machine readable form. This lack of open source software cripples the community with sub-standard tools, and most importantly, detracts from some really cool and fun things that could be done with easily available statistics. Many tools are either out-dated or broken, or if they work, they are closed source and often cost money.

Yesterday I started work on a new library package that I hope will start to improve this sorry state of affairs.

nflgame is a Python package that provides convenient access to NFL statistics. This includes games that are currently being played, or games as far back as the 2009 season.

nflgame works by reading a JSON feed that powers’s live GameCenter.

Since game statistics never change after a game has been played, the JSON data is automatically cached and saved to disk if the game is no longer being played. The next time statistics for that game are queried, the data will be read from disk. (nflgame comes preloaded with data from every game in the pre- and regular season since 2009.)

The API for nflgame is small and hopefully easy to use—even for those without much or any experience programming.


Let’s start off with a quick teaser to showcase some of nflgame’s power.

Who lead the league in rushing between weeks 10 and 14 of the 2010 regular season?

I won’t make you hunt down the answer. Here’s five lines of code that lists the top ten rushers in weeks 10-14 of the 2009 season:

>>> import nflgame
>>> games =, week=[10, 11, 12, 13, 14])
>>> players = nflgame.combine(games)
>>> for p in players.rushing().sort("rushing_yds").limit(10):
...     print p, p.rushing_yds
M.Jones-Drew 632
M.Turner 480
A.Foster 466
F.Jackson 462
K.Moreno 462
J.Charles 458
P.Hillis 426
C.Johnson 416
S.Jackson 405
B.Green-Ellis 401

Back to basics

If you are a beginning programmer (or don’t have any experience programming), I strongly urge you to read my Tutorial for non programmers: Installation and examples. What follows is a condensed version of the tutorial that might be a bit too confusing for those without programming experience.

nflgame is designed around three core concepts: games, players and lists of players (that are implemented as Python generators). Games can be selected based on season, week and team. Players in each game can then be accessed by name, statistical categories (i.e., passing, rushing, defense, etc.), or even statistical values—such as finding all players in a list with at least one receiving touchdown.

Games can be selected one at a time:

>>> import nflgame
>>> buf_at_ne =, 17, "NE", "BUF")

Or in bulk (every game in the 2009 season):

>>> import nflgame
>>> season09 =

Each game comes with its own players attribute that holds player statistics for every player that participated in the game. Additionally, games have attributes like winner, home, away, score_home, score_away and clock that report meta-information about the game itself.

So to get every player with at least one passing statistic in a game:

>>> import nflgame
>>> game =, 17, "NE", "BUF")
>>> print game.players.passing()
[B.Hoyer, T.Brady, R.Fitzpatrick]

And the same thing can be done with rushing, receiving, defense, kicking, etc., by simply replacing passing with one of the aforementioned.

Each player comes with his own grouping of statistics. To extend upon the previous example, consider printing out some passing statistics associated with each passer in the game:

>>> for p in game.players.passing():
...     print p, p.passing_cmp, p.passing_att, p.passing_yds
B.Hoyer 1 1 22
T.Brady 23 35 338
R.Fitzpatrick 29 46 307

Filtering, sorting and combining—oh my!

No data API would be complete without a means of filtering the data according to its values.

To find all players on the home team of the current game:

>>> print game.players.filter(home=True)
[B.Hoyer, T.Brady, B.Green-Ellis, A.Hernandez, J.Edelman, S.Ridley, D.Woodhead, R.Gronkowski, W.Welker, S.Gostkowski, Z.Mesko, M.Slater, K.Arrington, M.Anderson, J.Mayo, A.Molden, N.Jones, P.
Chung, B.Deaderick, D.Fletcher, D.McCourty, V.Wilfork, N.Koutouvides, R.Ninkovich, K.Love, L.Polite, B.Spikes, S.Moore]

In this case, New England is the home team, so only players on the Patriots are returned.

A more advanced use of filter is to use predicates to determine whether a particular stat should be filtered or not. For example, here we look at every player in the game on the home team with at least one interception:

>>> print game.players.defense().filter(home=True, defense_int=lambda x: x >= 1)
[A.Molden, D.McCourty, S.Moore]

Any player list can be sorted according to a statistical field. For example, we might want to see a list of rushing leaders in the game by yards:

>>> for p in game.players.rushing().sort("rushing_yds"):
>>> ...     print p, p.rushing_att, p.rushing_yds
>>> ...     
>>> ... 
S.Ridley 15 81
C.Spiller 13 60
R.Fitzpatrick 5 36
A.Hernandez 2 26
B.Green-Ellis 7 22
J.Edelman 1 6
G.Wilson 1 6
D.Woodhead 1 5
T.Choice 1 4
B.Hoyer 3 -2

Player statistics for the same player from different games can be combined to represent statistics from multiple games. Additionally, player generators can be concatenated. This combination allows one to construct searchable player generators of any makeup: from only games in a certain week, to all games in a season (or multiple seasons!).

For example, to find the top ten rushing leaders from week 2 of the 2009 season, we simply select all games from that week, combine the games into a single player list, and use our familiar searching methods exemplified above to get our answer:

>>> week2 =, 2)
>>> players = nflgame.combine(week2)
>>> for p in players.rushing().sort("rushing_yds").limit(10):
...     print p, p.rushing_att, p.rushing_yds, p.rushing_tds
F.Gore 16 207 2
C.Johnson 16 197 2
F.Jackson 28 163 0
C.Benson 29 141 0
R.Brown 24 136 2
M.Barber 18 124 1
M.Turner 28 105 1
S.Jackson 17 104 0
F.Jones 7 96 1
A.Peterson 15 92 1

What if you wanted to see who passed for the most touchdowns in the first five weeks of the 2011 season?

>>> games1_5 =, week=[1, 2, 3, 4, 5])
>>> players = nflgame.combine(games1_5)
>>> for p in players.passing().sort("passing_tds").limit(10):
...     print p, p.passing_tds
T.Brady 13
A.Rodgers 12
M.Stafford 11
D.Brees 10
R.Fitzpatrick 9
M.Hasselbeck 8
E.Manning 8
K.Orton 8
J.Flacco 7
M.Schaub 7

Or how about the receiving leaders for the entire 2009 season?

>>> season2009 =
>>> players = nflgame.combine(season2009)
>>> for p in players.receiving().sort("receiving_yds").limit(15):
...     print p, p.receiving_yds, p.receiving_rec, p.receiving_tds
A.Johnson 1504 95 9
W.Welker 1336 122 4
S.Holmes 1243 78 4
R.Wayne 1243 95 10
M.Austin 1230 74 11
S.Rice 1200 78 6
R.Moss 1189 78 13
S.Smith 1163 97 7
A.Gates 1145 78 7
D.Jackson 1120 60 9
H.Ward 1106 87 6
V.Jackson 1097 63 9
G.Jennings 1091 66 4
R.White 1087 79 10
B.Marshall 1081 93 10

Finally, with any of the above examples, you can export the statistics to a CSV file that can be read by Excel. For example, to export the entire 2011 season in just a single line:

>>> nflgame.combine("2011.csv")

What’s next?

For the short-term, I’d really like to come up with an easy and elegant way of providing alerts (emails and texts) for your fantasy football team. For example, whenever a player on your team—or your opponent’s team—scores a lot of points like a touchdown or a field goal. I did something like this last year using a cobbled together hack-job that I’m ashamed of, and it was a lot of fun. (I did screen scraping on the ESPN fantasy web site for all the statistics.)

The real problem here is keeping your roster up to date. It’s pretty easy if you’re using Yahoo because of their Fantasy Sports API, but I don’t think any other league web site offers such amenities.

In the long-term, nflgame could certainly be the statistical back-bone for fantasy football league software. But I don’t think that’s on my personal radar any time soon.

I am aware of phpFFL, which purports to be open source fantasy football league software—but development seems to have stalled. (It also looks like they are screen scraping CBS Sports for statistics, which I really want to avoid.) Plus, I vowed a long time ago never to take up another serious project in PHP again. I value my mental health too highly.


488 Comments - Post comment

Running Archlinux on the Lenovo Thinkpad T430

Posted on 07/01/2012 at 12:39am

In sum, Archlinux is working beautifully. What follows is a rough run down of my notes while installing, configuring, tuning and using Archlinux on the Lenovo Thinkpad T430.


  • i7 3520M (Dual core, Four threads, 4M cache, 2.9GHz)
  • 14” 1600x900 display
  • Intel HD Graphics 4000, no discrete graphics card
  • 4GB memory
  • 128GB Crucial m4 2.5” (7mm) SSD (laptop came with a 320GB 7200rpm platter)
  • 9 cell 70++ battery
  • Intel Centrino Wireless-N 2200 (2x2 BGN)

Software that I typically use

No desktop environment. Openbox/pytyle3/pager-multihead. Gkrellm. Konsole. Google Chrome. Vim. Konversation. Wicd.

Battery life

On 66% brightness, about 8 hours seems to be the sweet spot for typical usage (web browsing and vim). 100% brightness seems to knock this down to the 5-6 hour range. I’ve only had the laptop for a couple of days, so these numbers are pretty rough and based somewhat on extrapolation.

I have all of the tweaks suggested by powertop enabled. This includes wifi, audio, SATA link and usb power management.

I also have the ondemand CPU frequency governor enable, which is the default nowadays anyway. Changing governors (tested powersave, performance and conservative) works perfectly.

Disable KMS when using the current Archlinux installer

The current default Archlinux installer uses an older kernel that doesn’t include the updated drivers for Intel’s HD 4000 graphics (and possibly wifi?), so KMS fails once it tries to load—which ends up borking the display. To get around this, add ‘nomodeset’ to the kernel boot parameters to disable KMS. You’ll have to do this again on initial boot if you install the kernel that comes with the installer. Once an updated kernel is installed, this boot parameter is no longer needed since the driver for the Intel HD 4000 graphics chipset is included. The available snapshot installers have an updated kernel, and therefore disabling KMS is unnecessary if you’re using them.

Most of the current T420 wiki page is relevant

And by that I mean, most things just work. Wifi, graphics (with xf86-video-intel), CPU frequency scaling, screen brightness, keyboard backlight, and excellent two finger scrolling (out-of-the-box with xf86-input-synaptics, no configuration necessary despite what the T420 wiki article says).


I installed thinkfan and added the thinkpad_acpi kernel module to MODULES in /etc/rc.conf. I also added the thinkfan daemon to DAEMONS in /etc/rc.conf. To allow thinkfan to control the fan, enable fan_control by creating /etc/modprobe.d/thinkfan.conf with:

options thinkpad_acpi fan_control=1

And this is my thinkfan /etc/thinkfan.conf:

sensor /sys/devices/virtual/thermal/thermal_zone0/temp

(0, 0, 40)
(1, 38, 43)
(2, 41, 50)
(3, 44, 54)
(4, 51, 63)
(5, 55, 67)
(7, 61, 32767)

The settings are tweaked (a little less aggressive) from Thinkpad T420 thinkfan settings. I’ve tested it with the utilities in the cpuburn package and it seems to work well. For me, the fan stays off when idle and kicks into its lowest settings on typical usage (web browsing and vim, for me).

Recall that I do not have discrete graphics (i.e., nvidia) and have an SSD, which both have impact temperature.


This also worked out of the box with one small tweak. In /etc/acpi/, I replaced

    case "$2" in


    case "$2" in

This was needed because ac_adapter ACPI0003:00 are the first two arguments used when the ac_adapter event is triggered.


Works beautifully. Literally no problems.


I don’t use it. acpid, pm-utils and powertop’s recommended tweaks are enough for me. (I use acpid to raise and lower the brightness when the AC adapter is [un]plugged, and make sure that on wakeup/boot, the brightness is set appropriately.)

Sleep button

For whatever reason I haven’t been able to discover, I cannot detect any power button presses (neither through acpi_listen or xev). However, the T430 has an extra unlabeled button to the right of the “turn microphone on/off” button that shows up in xev as XF86Launch1. Thus, I put this in my ~/.xbindkeys:

"sudo pm-suspend"

And it’s now a sleep button, as long as you set ‘pm-suspend’ to require no password using visudo. For example:

%wheel ALL=(ALL) NOPASSWD: /usr/sbin/pm-suspend

Web cam

I installed guvcview and then ran it. It worked. (It looks like the uvcvideo kernel module is automatically loaded.)


Alsa just works. Speakers sound good enough to me. (I’m no audiophile though, and have never been too picky over audio quality. My ears aren’t very discerning.)

What I Haven’t tested

VGA. Mini display port. Mic. USB 3.0. mSATA. Memory card slot.

I don’t anticipate any problems with these things, though.


Archlinux works beautifully on this machine. I literally could not have imagined a smoother experience. I’ve installed Arch on several laptops, and this was by far the easiest. Having the wifi driver included in the kernel is an especially nice thing that I’m not used to. (Been a victim of broadcom for many years.) Also, the integrated graphics works beautifully, although I am not doing any compositing beyond xcompmgr transparency.

The laptop also has excellent ventilation and actually stays cool enough for it to be bearable to sit directly on my lap indefinitely. Even when the CPU is chugging.

This laptop also comes with the option of adding nvidia via Optimus. I recommend staying away from this unless you’re willing to deal with Bumblebee or absolutely need a dedicated graphics card. Early reviews seem to indicate that Intel’s HD 4000 Graphics is quite excellent for integrated graphics, particularly compared to the available nvidia option for this laptop (NVIDIA NVS 5400M).

There is also a 1366x768 screen option, and based on preliminary reviews, it isn’t that great. Go with the 1600x900 screen—it seems like the sweet spot.

Finally, this is my first Thinkpad. Incidentally, this is the first time that the T-series has had a “chiclet style” keyboard. Personally, I love it and have been partial to chiclet style keyboards in the past. The reviews already out there are not lying when they speak highly of its quality; it’s on the best-feeling keyboards I’ve ever typed on.

With that said, it seems like most of the complaints revolve around the layout. The one bugger that has got me so far is the location of the ‘fn’ and ‘control’ keys. It feels like they should be swapped. xev actually picks up ‘fn’ key releases but not key presses. I was unsuccessful in swapping the keys using xmodmap, but perhaps there is a BIOS option somewhere that will allow a swap. (EDIT: Indeed, there is a BIOS option that successfully swaps the ‘fn’ and control keys.) This is my only complaint with the keyboard layout. (But remember, I was never used to the old layout!)

In sum, this appears to be a great laptop to run Linux on.

74 Comments - Post comment

Daemonizing Go Programs (With a BSD-style rc.d example)

Posted on 04/27/2012 at 9:12pm

Go, by its very nature, is multithreaded. This makes a traditional approach of daemonizing Go programs by forking a bit difficult.

To get around this, you could try something as simple as backgrounding your Go program and instructing it to ignore the HUP signal:

nohup your-go-binary &

But what if your Go program is a web server that you need to be able to stop and start? (Particularly during development.) It can quickly become a pain to use the above approach, as you’ll have to look up the process identifier each time you need to stop your server. Moreover, using nohup isn’t an ideal means to turn your program into a daemon, since it doesn’t accomplish tasks commonly associated with daemons, like setting the root directory as the current working directory, setting the umask to 0, and more.

In steps daemonize, which runs any command as a Unix daemon. It automatically performs the aforementioned tasks and allows stdout and stderr to be redirected to files of your choosing. Here’s a quick example usage:

daemonize -o stdout.log -e stderr.log /absolute/path/to/go-program

While daemonize takes care of the nitty gritty details of becoming a daemon, we still cannot start or stop our program as easily as we can with other common daemons (like httpd, crond, cupsd, etc.).

In order to accomplish such a thing easily, it’s usually convenient to mimmick how other daemons are set up.

Since I use Archlinux, my daemons are organized in a BSD-style setup. Namely, they are all located in /etc/rc.d. Building off of the /etc/rc.d/crond daemon, I came up with the following for the daemon powering this blog (located at /etc/rc.d/blog-burntsushid):


. /etc/rc.conf
. /etc/rc.d/functions

cmd="/usr/sbin/daemonize -o $logOut -e $logErr $full"

# Go environment setup.
export GOROOT=/opt/go
export GOPATH=/home/andrew/go/world:/home/andrew/go/me

# You shouldn't have to edit below this line.
PID=$(pidof -o %PPID $full)

case "$1" in
    stat_busy "Starting $name daemon"
    [[ -z "$PID" ]] && su $user -m -c "$cmd" &>/dev/null \
    && { add_daemon $name; stat_done; } \
    || { stat_fail; exit 1; }
    stat_busy "Stopping $name daemon"
    [[ -n "$PID" ]] && kill $PID &>/dev/null \
    && { rm_daemon $name; stat_done; } \
    || { stat_fail; exit 1; }
    stat_busy "Reloading $name daemon"
    [[ -n "$PID" ]] && kill -HUP $PID &>/dev/null \
    && { stat_done; } \
    || { stat_fail; exit 1; }
    $0 stop
    sleep 1
    $0 start
    echo "usage: $0 {start|stop|restart|reload}"
exit 0

Simply altering the environment variables at the top should be enough to adapt it to your own purposes. In particular, $name refers to the name of the daemon—it needn’t correspond to any actual file. $full corresponds to the absolute path name of your Go binary. (The path must be absolute because daemonize requires it.) $logOut and $logErr correspond to the log files containing the stdout and the stderr of your program. $cmd corresponds to the full daemonize command. $user is the name of the user that should run the daemon. I’ve chosen to run my blog as myself for security purposes. $GOROOT, $GOPATH and $GOMAXPROCS should be set according to your Go environment.

Finally, the command is actually run using:

su $user -m -c "$cmd"

Using su will run the daemon as the user you specified. The -m switch tells su to use the current environment to run the command in, which is required for the $GO variables to have any effect.

Your Go program can now be started, stopped or restarted like so:

/etc/rc.d/blog-burntsushid start
/etc/rc.d/blog-burntsushid restart
/etc/rc.d/blog-burntsushid stop

The Archlinux Wiki has more information on writing rc.d scripts.

22 Comments - Post comment

Adding Thread Safety to the X Go Binding

Posted on 04/21/2012 at 8:52pm

The X Go Binding (XGB) is a low level library that provides an API to interact with running X servers. One can only communicate with an X server by sending data over a network connection; protocol requests, replies and errors need to be perfectly constructed down to the last byte. Xlib did precisely this, and then some. As a result, Xlib became huge and difficult to maintain.

In recent years, the XCB project made things a bit more civilized by generating C code from XML files describing the X client protocol using Python. While the Python to generate said code is no walk in the park, it is typically preferred to the alternative: keeping the X core protocol up to date along with any number of extensions that exist as well. (There are other benefits to XCB, like easier asynchronicity, but that is beyond the scope of this post.)

XGB proceeds in a similar vain; Python is used to generate Go code that provides an API to interact with the X protocol. Unlike its sister project XCB, it is not thread safe. In particular, if X requests are made in parallel, the best case scenario is a jumbled request or reply and the worst case scenario is an X server crash. Parallel requests can be particularly useful when images are being sent to the X server to be painted on the screen; other useful work could be done in the interim.

For example, in Wingo (a window manager written purely in Go; still in development), it would be great to do something like this when first managing a client:

func manage(windowId xgb.Id) {
    go func() {
        // load window icon images
        // and turn them into X pixmaps

    // the rest of the code to manage a client

Assuming GOMAXPROCS has been set appropriately, this would allow Wingo to show and map a window without regard to the time required to prep images associated with each client. (i.e., icons, images containing the title of the window, alt-tab cylcing icons, etc.) Such parallelism is particularly useful when the user has configured Wingo to use large images—which noticeably results in lag when first managing a window. Without thread safety in XGB, this sort of parallelism is impossible. Since drawing images to X windows is a common task, parallelism can be particularly useful. Thus, thread safety in XGB—being the only barrier to this sort of parallelism—is certainly desirable.

This is a perfect opportunity for Go to shine. But before we get into the juicy tidbits, I must discuss a few low-level and nasty details of X.

As said previously, we communicate with X over a network connection. As a client, we send requests and we read replies, events and errors. In particular, replies, events and errors all come to us on the same wire—XGB must deduce which kind of thing its reading by looking at the value of the first byte of each 32 byte block. (Sometimes replies can be longer than 32 bytes, but we can safely skip over that detail for this post.)

On a conceptual level, events are inherently separate from replies and errors. In particular, when a request expects a response (not all requests do!), it will either get a reply or an error from X. Thus, when issuing a request that expects a response, we are implicitly creating a contract with the X server that we’ll receive something corresponding to that request. We will revisit this response contract later; remember it!

(In this post, I’ll be focusing on the thread safety of receiving responses. Some amount of work also had to be done to ensure thread safe writing and reading of events—among other things. But the thread safety of receiving responses is much more interesting.)

You may be wondering: how do requests and responses match up? Both X and XGB keep track of how many requests have been sent. Each request we send, therefore, has a unique serial identifier associated with it. This identifier is also known as a cookie—and it’s included in every single reply and error sent to us from the X server. Therefore, whenever we send a request that expects a response, we need some way to store the cookie so we know when we’ve received the response (which will either be a reply or an error). Naively, this could be a simple map from cookie identifiers to responses. Here are the types:

type Cookies map[uint16]*Response
type Response struct {
    reply []byte
    err error

The types say that we have a map of cookie identifiers (unsigned 16-bit integers) to responses—where responses are either a reply (some slice of bytes) or an error. We could then populate this map by reading from our X connection with something like (excuse some pseudo code to keep it brief):

cookies := make(Cookies)

go func() {
    for {
        io.ReadFull(xConn, responseBytes)
        cookieId := getCookieFromBytes(responseBytes)

        if _, ok := cookies[cookieId]; ok {
            if responseBytes is reply {
                cookies[cookieId] = &Response{reply: responseBytes}
            } else if responseBytes is error {
                cookies[cookieId] = &Response{err: errorFromBytes(responseBytes)}
            } else {
        } else {
            panic("Got unexpected response")

But now what? If we’re waiting for a particular reply (i.e., with some particular cookie identifier), we could try something like:

func WaitForReply(cookieId uint16) *Response {
    for {
        if response, ok := cookies[cookieId]; ok {
            return response

But how much time should we wait between checks? If it’s too short, we’ll end up spinning and if it’s too long we’ll be blocking when we should be handling a response.

The underlying problem here is that replies may not come in the order that we need them. We can’t wait on a single channel for responses to come in, because we may be waiting for more than one reply and we can’t be sure of the order they’ll arrive in.

Another way to think about this problem is in terms of the response contract I mentioned earlier. The response contract is quite specific: whenever a request is sent that expects a response (we know which requests expect a response ahead of time), it must get either a reply or an error from the X server.

This is a perfect situation for goroutines and channels. Instead of thinking about the cookie as some identifier yielding a response, we can think about a cookie as an identifier with a “promise” it will return either a reply or an error in the future. That “promise” can be represented as a pair of channels: one channel gets a reply and the other gets an error. Let’s revisit our types:

type Cookies map[uint16]*Cookie
type Cookie struct {
    replyChan make(chan []byte, 1)
    errChan make(chan error, 1)

So that replyChan and errChan are the pieces that will fulfill the cookie’s promise (they are buffered so they don’t block the X read loop). The promise is fullfilled in the code that reads from the X connection. Instead of creating a response that has either a reply or an error, we can use the cookie’s channels to send either a reply or an error. Only two lines need changing:

cookies := make(Cookies)

go func() {
    for {
        io.ReadFull(xConn, responseBytes)
        cookieId := getCookieFromBytes(responseBytes)

        if cookie, ok := cookies[cookieId]; ok {
            if responseBytes is reply {
-->             cookie.replyChan <- responseBytes
            } else if responseBytes is error {
-->             cookie.errChan <- errorFromBytes(responseBytes)
            } else {
        } else {
            panic("Got unexpected response")

Our WaitForReply code also becomes much better, and will always do just the right amount of blocking:

func WaitForReply(cookie *Cookie) ([]byte, error) {
    select {
        case reply := <-cookie.replyChan:
            return reply, nil
        case err := <-cookie.errChan:
            return nil, err

So that we now have thread safe—and parallel—responses.

You can find my work in a clone of XGB called jamslam-x-go-binding. In addition to thead safety, several bugs have been fixed (most notably with using ChangeProperty on 32 bit values) and support for the Xinerama extension has been added. Support for other extensions is on the roadmap.

Any comments or criticisms on my approach are greatly appreciated.

31 Comments - Post comment

Dynamic Workspaces in Window Management

Posted on 04/21/2012 at 5:30pm

Do you have dynamic workspaces in your window manager?

You might be wondering: what in the world are dynamic workspaces? A dynamic workspace model allows one to add, remove or rename workspaces on the fly. Comparatively, in a typical window manager (or desktop environment) configuration, you tell the window manager to have x number of workspaces. When you start your window manager, you’ll have x workspaces, and you can typically cycle between them using some variation of “next workspace” or “previous workspace” commands. The disadvantage with this model is that it’s difficult to have a large number of workspaces—else you might forget which window is on each workspace.

Another approach (if you have a particularly nice window manager) is to specify the names of each workspace available to you. This lets you segregate windows into nice groups. Perhaps one workspace is named “web” and is dedicated to web browsing. Perhaps another one is named “editing” and is full of vim windows. And so on. Named workspaces are useful because they are easier to manage in large number—all one needs is to glance at the name of a workspace and you’ll know what kind of windows are there.

Dynamic workspaces extend upon the named workspace approach. While named workspaces don’t change in size or name, dynamic workspaces can. Named workspaces encourage general groups like “web” or “editing”, dynamic workspaces encourage task-oriented groups.

For example, while developing this blog application, I created several workspaces to manage windows related to development. Namely, blog, view and geils. blog contained terminals running vim for editing the source code, view contained a browser window I used to view my changes and geils contained a terminal logged into my web server (named Geils). After I was done working on my blog, I simply deleted those three workspaces. (I could also leave them there to come back to later—which is something I often do with persistent projects like my window manager.)

I really enjoy this approach to workspace management because it’s so specific to the tasks you’re working on. A static approach is too restraining for me; sometimes windows don’t fit nicely in each category you’ve created or sometimes you can’t forsee how a certain window will be used. Additionally, sometimes you want to have more than one workspace dedicated to some general task like “web”. But how many should you make up front? Too few and you’re restrained; but too many and you have unused workspaces cluttering up your environment.

Okay, you convinced me. How can I get dynamic workspaces?

Typically, your window manager has to support it. Xmonad comes to mind, as it has a DynamicWorkspaces module in xmonad-contrib. I’d also be willing to bet that Awesome could also do it, given its flexible Lua configuration (but I have never used Awesome extensively).

If you’re using Openbox (or possibly any EWMH compliant window manager), I’ve developed a pager that provides dynamic workspaces. (If you’re using Archlinux, it’s in the AUR.)

The pager is configured in Python, and by default, Super+Return is bound to a prompt that will show all available workspaces. If you type in a workspace name that exists, you’ll switch to that workspace. If you type in a workspace name that doesn’t exist, the pager will create that workspace and switch you to it. Finally, Super+BackSpace will delete a workspace (only when it doesn’t have any windows).

(If there’s demand, I may write another post that describes my pager in more detail. It actually has quite a few more nice features other than dynamic workspaces.)

27 Comments - Post comment