Concurrency in Go: Don’t Start, Run

Ryan Collingham
Jun 20, 2021
Some hardworking goroutines

Concurrency is often considered one of Go’s strongest suits, if not its ace in the hole. People like how simple it is to launch goroutines to do work asynchronously, and how the runtime manages these lightweight goroutines so efficiently that you needn’t worry about launching thousands or even millions of them in a single program.

However, this simplicity can be a double-edged sword: without the safety rails that many other languages and runtimes impose, concurrency in Go can quickly become a surefire way to shoot yourself in the foot in ways that even a seasoned Go developer might not be able to quickly diagnose and fix. I highly recommend reading this excellent blog post, Go statement considered harmful, for a thoughtful, first-principles critique of the Go statement.

As a tl;dr: the author explains how the Go statement has many of the same properties as the Goto statement in languages of old. In languages such as Fortran and COBOL, the Goto statement was far more powerful than its modern-day cousin as seen in C/C++ and Go itself: not only did it allow execution to jump to any line in the same function, but to any line in any function in the entire program. This property was considered problematic by many leading computer scientists, since it very quickly made programs difficult if not impossible to reason about. Since any function call could jump to any other part of the program, you had to consider the entire program when making changes — limiting how large and complex a program could get without becoming an unmaintainable mess. The original Goto statement destroyed the abstraction offered by functions and libraries.

How is all of this relevant to concurrency? Well, much like the Old Testament Goto statement, the Go statement also allows the execution path to jump outside of the current function boundary and right into any other part of the program. Without guardrails, this property can very easily make your Go programs difficult if not impossible to reason about, and breaks down abstraction boundaries.

However, while the author of “Go statement considered harmful” believes that Go statements are irreparably broken, going on to build and flaunt yet another Python concurrency framework (if I have one piece of advice for concurrency in Python, it is: Don’t), I believe that the sting of the Go statement can be drawn, much like that of the old-fashioned Goto, by implementing and following some simple rules. Sure, in an ideal world the Go language and compiler would provide better tools to enforce these safety rules and make them feel more natural. However, like it or not, it seems to be part of Go’s philosophy and design to have many unwritten rules and a laissez-faire attitude towards enforcing them: if you want strict, compile-time enforced rules, consider switching to Rust instead.

The Golden Rule(s)

My golden rule is actually made up of a hard rule and a strong recommendation:

  1. Always consider the lifetime of a new goroutine — know when and how it will terminate.
  2. Strongly prefer keeping the lifetime of a goroutine to a single block.

If you follow both of these rules, you will avoid many of the pitfalls of concurrency in Go. To explain them, consider an example of a common pattern. Let’s say you want to design a handler which periodically refreshes a value in a background goroutine. Maybe the value takes a long time to compute and so it is preferable to serve a cached value to requests and periodically refresh it in the background, or maybe the value represents some kind of authorization token that needs to be re-negotiated with a server every so often. I will first demonstrate a design which breaks the golden rules, explain the issues, and then show how it may be refactored.

A Bad Example: The “Start-Close” Pattern

In the Go standard library and beyond, you’ll often see a pattern where a “Close” function is deferred to perform some teardown later. Close methods are called when you are done reading from a file, finished reading the body of a network request, or finished using a DB connection, to name some common examples.

Though entirely appropriate for looking after OS resources like file descriptors, I do not recommend this approach for looking after your background goroutines. Though it may look entirely reasonable at first glance, there are a couple of issues. First, consider it from a code-maintenance perspective. Look at just the Start() method — can you tell what the lifetime of the goroutine launched there will be? No: we need to look elsewhere, at the Close() method, to see that the context gets cancelled and the waitgroup is waited on until the goroutine has terminated. So while the first of our golden rules is followed, the second is not.

What happens if the caller neglects to call Close()? Or what if they call Close() before Start() has been called? In the first case a goroutine is leaked; in the second the program will panic when Close() calls the nil cancel function. Generally, you want to design your interfaces so that, if possible, there is only one obvious way to use them.

It is possible to patch over these shortcomings by adding some additional bookkeeping — Close() could check if the cancel func is non-nil, or we could implement a finalizer function to call Close() if necessary before the handler gets garbage collected. However, these aren’t perfect solutions — there is a better way.

A Better Example: The “Run” Pattern

You will be familiar with this pattern if you have ever used the http.Server type — except instead of being called “Run”, the blocking method you call is one of the “Serve*” or “ListenAndServe*” family. Essentially, the concurrency is being lifted up a level, to the caller, rather than buried within the Handler itself.

Now, when we look at the code, it is trivial to see what the lifetime of the background goroutine used to run the handler will be from looking at just a single func — immediately below our go statement, we can see that the context is cancelled before we wait on the goroutine to terminate. This goroutine will never outlive the function in which it is spawned, and so the caller does not ever need to care about it — it is a non-leaky abstraction. There is no “wrong” way to use the Handler, no footguns or panics awaiting us if we get it wrong.

A Bonus: Better Error Handling

So far our handler has been very simple — it simply loops and updates the stored value to a random integer until its context gets cancelled. However, let’s imagine that our handler is doing something a bit more complex, and that it might hit an unrecoverable error. This doesn’t easily fit into the “Start-Close” pattern, but the “Run” pattern can easily be modified to accommodate better error handling:

Our sync.WaitGroup has been replaced with an errgroup.Group, a highly useful type to keep in your concurrency toolbelt. When any of the goroutines spawned by the Go() method returns a non-nil error, the errgroup cancels the shared context; Wait() then blocks until all of the goroutines have exited.

In our example, we’ve set a “bomb” that goes off whenever the number 42 happens to be generated by our handler. When it does, the goroutine which reads and prints values from the handler will see that the context has been cancelled and exit.

Being able to swap out the sync.WaitGroup for an errgroup.Group depending on your error-handling needs is a convenient benefit of lifting control of the concurrency out from inside the Handler.

The Exception which Proves the Rule

Of course, every good rule needs an exception. If you’ve used pretty much any networking server library in Go, including http, gRPC, pubsub and many others, you will be familiar with the concept of registering a handler func which gets called to handle incoming requests. Because the Go runtime cleverly multiplexes goroutines onto OS threads, there is usually no need for the implementation of these servers to maintain pools of worker goroutines as you would in other languages; instead, they simply spawn a new goroutine to handle each new request and move on, without waiting for that goroutine to complete. This violates both of our golden rules: not only does the goroutine not terminate in the same block in which it was spawned, but the library has no idea when it will terminate, if at all.

This is down to a fairly fundamental computer-science issue known as the halting problem: there is no way for a library to know when, or even if, the handler function it has been given will terminate without actually running it. Sure, it can pass in a context with a deadline, but it is up to the called code whether it actually respects that deadline. In this case, spawning a new goroutine for every request without knowing when it will terminate is the only reasonable way to ensure high throughput of requests in Go without imposing arbitrary limits on how many requests may be processed concurrently.

While this is a valid exception, it is one that most Go programmers are unlikely to ever come across themselves, unless they happen to be developing a library for some new and exotic network protocol which does not already have a widely used implementation available. The vast majority of Go code will fall within the context of the request handler and not the dispatcher — and this is the domain of the golden rules.

In Summary

When dealing with concurrency in Go or any other language, remember the Golden Rules and keep the lifetime of your goroutines at the forefront of your mind. If possible, wait for all goroutines spawned in a function to finish before you exit that function, using a sync.WaitGroup or an errgroup.Group. Spawning a goroutine which outlives the function is a side effect, and functions without side effects are much easier to reason about and should be preferred wherever possible.

From comparing two common ways of spawning background goroutines, you can see how “lifting up” the concurrency can result in code that is more explicit and therefore easier to reason about and maintain.
