Synchronization in Go (But Your Hands Are Tied)
To implement a long-running goroutine that needs to be stopped by a user interrupt, signal, timer, etc., I normally swear by the following pattern, which takes context.Context as an argument to the daemon loop:
// Figure 1
type Daemon struct {
	// .... omitted
}

func (d *Daemon) Run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		// do your thing
	}
}

// Caller's code somewhere
ctx, cancel := context.WithCancel(...)
go d.Run(ctx) // Run blocks, so start it in its own goroutine

// sometime later...
cancel()
The reason is that this allows the code writer to _NOT_ use any extra synchronization such as sync.Mutex, chan, sync.Cond, sync/atomic, etc.
(On top of this, I also like to separate the outer object (Daemon here) from the daemon loop state, if any, such as loop counters or connection pools or whatever, but I digress…)
So… yes, I have a solution in my head already. But what if I couldn’t enforce this?
What if, for example, for historical reasons it wasn’t possible to just switch to this pattern? What if our hands were tied?
When you do not accept context.Context (or similar) as a parameter, you will inevitably need to store the lifecycle state in your object as an instance variable.
For example, let’s say you want Daemon to implement a Stop() method, and a way for the caller to wait for its exit, such as Wait(). One natural solution for any Go programmer is to use channels:
// Figure 2
type Daemon struct {
	done   chan struct{}
	exited chan struct{}
}

func (d *Daemon) Run() {
	d.done = make(chan struct{})
	d.exited = make(chan struct{})
	defer close(d.exited)

	for {
		select {
		case <-d.done:
			return
		default:
		}
		// do your thing
	}
}

func (d *Daemon) Stop() {
	close(d.done)
}

func (d *Daemon) Wait() {
	<-d.exited
}
So this is fine. Except when you start thinking about… users. Yes, them. Users. They don’t read the manual. They copy and paste. They will happily shoot their own foot when given the opportunity.
There are three problems that I see with this approach.
- There’s nothing stopping the user from using the zero value Daemon and calling methods on it.
- There’s nothing stopping the user from calling methods out of order (e.g. Stop(), Wait(), then maybe Run()).
- There’s no synchronization between the methods. This is Go: users are perfectly allowed to run each of Run(), Stop(), and Wait() in its own goroutine, so things need to be synchronized.
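To make the first two problems concrete, here is a small sketch of what happens when Stop() from Figure 2 is called on a zero value, and when it is called twice. The tryStop helper is made up for this example; it only converts the panic into a string so that the program keeps running. (Wait() on a zero value is not shown because receiving from a nil channel simply blocks forever.)

```go
package main

import "fmt"

type Daemon struct {
	done   chan struct{}
	exited chan struct{}
}

// Stop is the same as in Figure 2.
func (d *Daemon) Stop() {
	close(d.done)
}

// tryStop is a hypothetical helper that calls Stop() and reports a panic
// as a string instead of crashing the program.
func tryStop(d *Daemon) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	d.Stop()
	return "ok"
}

func main() {
	var d Daemon // zero value: both channels are nil

	fmt.Println(tryStop(&d)) // close of nil channel

	d.done = make(chan struct{}) // what Run() would have done
	fmt.Println(tryStop(&d))     // ok
	fmt.Println(tryStop(&d))     // close of closed channel
}
```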
Let’s see if we can do anything about these problems.
Avoiding the Uninitialized
The first one could be mitigated by providing a constructor, and not allowing users to instantiate the object themselves. This way we can make sure that Daemon has all of the channels initialized when the object gets into the users’ hands.
// Figure 3
type daemon struct { ... } // unexported
func New() *daemon {
return &daemon{...}
}
This is perfectly fine, except maybe that it’s clunky exposing a type name that users cannot use when they read the docs. Also, I would even venture to say that a lot of Go programmers would find returning unexported objects kind of weird.
(We could do something similar but return an exported interface instead, but that’s also controversial — I do it, but some people and even some linters don’t like it)
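A rough sketch of that interface variant could look like the following. The Runner interface name is an assumption made up for this example, and the method bodies are simply those of Figure 2 with the channel creation moved into New(). It only addresses the zero-value problem: calling Stop() twice would still panic.

```go
package main

import "fmt"

// Runner is a hypothetical exported interface; the name is made up here.
type Runner interface {
	Run()
	Stop()
	Wait()
}

// daemon stays unexported, so users cannot create a zero value themselves.
type daemon struct {
	done   chan struct{}
	exited chan struct{}
}

// New is the only way to obtain a Runner, so the channels always exist.
func New() Runner {
	return &daemon{
		done:   make(chan struct{}),
		exited: make(chan struct{}),
	}
}

func (d *daemon) Run() {
	defer close(d.exited)
	for {
		select {
		case <-d.done:
			return
		default:
		}
		// do your thing
	}
}

func (d *daemon) Stop() { close(d.done) }

func (d *daemon) Wait() { <-d.exited }

func main() {
	d := New()
	go d.Run()
	d.Stop() // safe even if Run() has not started yet: done already exists
	d.Wait()
	fmt.Println("exited")
}
```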
Besides, that requires code change. And no, users do not like code change. I’d like to get away with an exported struct, if I can help it. We’ll punt this for now.
Making Sense of the Chaotic Ordering
The second problem is really about protecting users from causing panics by accessing things out of order. Basically, your code should either return an error, or actually work, even if you do this:
// Figure 4
var d Daemon
go d.Stop() // goroutine 1
go d.Run() // goroutine 2
d.Wait() // goroutine 3, or current goroutine
The heart of the problem is that all three methods use shared instance variables stored in Daemon, which need some guarantees. Unfortunately the original code only initializes the channels once Run() is called, so you cannot guarantee that the required channels are initialized when Stop() or Wait() is called. You can’t force the user to always call things in order.
Of course, you could document this fact and ask users to call Stop() and Wait() only after Run() has been called. Except, remember, users don’t read documentation. Ideally, the code in Figure 4 would Just Work.
We have several obstacles:
- There is no clear initializer that can be run before any of the methods. We already ruled out using a constructor. And no, we are not changing “public API”.
- Whatever we do, we need proper synchronization, as we will be accessing shared variables.
This means we need a mechanism that works without an initializer, and we need to also synchronize things correctly. That means we will need to tackle item 3 in order to fix all of this.
Dancing With The Synchronization
Now, I’ve thought long and hard about this, and thus far I have only been able to come to the conclusion that, in order to make this not blow up, we will simply need to block the calls to Stop() and Wait() until Run() is called.
So first, the easy part. We need to provide Daemon with a synchronization primitive. Luckily, we can use sync.RWMutex without explicit initialization. Unfortunately, using a non-pointer sync.RWMutex as a struct field means that you can no longer safely make struct copies of Daemon, but I’d say that’s a trade-off we can live with.
// Figure 5
type Daemon struct {
	done   chan struct{}
	exited chan struct{}
	mu     sync.RWMutex
}

func (d *Daemon) Run() {
	d.mu.Lock()
	d.done = make(chan struct{})
	d.exited = make(chan struct{})
	d.mu.Unlock()

LOOP:
	for {
		select {
		case <-d.done:
			fmt.Println("detected stop")
			break LOOP
		default:
		}
		// do your thing
	}

	d.mu.Lock()
	close(d.exited) // won't reset d.exited, because it will be used by Wait()
	d.done = nil
	d.mu.Unlock()
	fmt.Println("clean exit")
}
You can see the critical sections properly protected by the mutex.
For Stop() and Wait() we’re going to have to write a method that blocks them until Run() is called. Those who are used to this sort of code will quickly realize that we’re basically implementing sync.Cond as a busy loop (and no, we can’t use sync.Cond here because it would require initialization using a constructor or the like).
// Figure 6
func (d *Daemon) Stop() {
	d.mu.Lock()
	for d.done == nil { // has not started
		d.mu.Unlock()
		<-time.After(100 * time.Millisecond)
		d.mu.Lock()
	}
	fmt.Println("Stop!")
	close(d.done)
	d.mu.Unlock()
}

func (d *Daemon) Wait() {
	d.mu.RLock()
	for d.exited == nil {
		d.mu.RUnlock()
		<-time.After(100 * time.Millisecond)
		d.mu.RLock()
	}
	ch := d.exited
	d.mu.RUnlock()
	<-ch
	fmt.Println("wait done")
}
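For comparison, this is roughly what the blocking in Stop() would look like with sync.Cond if a constructor were allowed. It is only a sketch under that assumption: New(), start(), and stopBeforeStart() are made-up names for this example, and start() stands in for the first few lines of Run() in Figure 5.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Daemon struct {
	mu      sync.Mutex
	started *sync.Cond // needs the address of mu, hence the constructor
	done    chan struct{}
}

// New is hypothetical: the whole premise above is that we cannot add it.
func New() *Daemon {
	d := &Daemon{}
	d.started = sync.NewCond(&d.mu)
	return d
}

// start does what the top of Run() does in Figure 5, then wakes any waiters.
func (d *Daemon) start() {
	d.mu.Lock()
	d.done = make(chan struct{})
	d.mu.Unlock()
	d.started.Broadcast()
}

// Stop blocks until start() has run, without a polling interval.
func (d *Daemon) Stop() {
	d.mu.Lock()
	for d.done == nil {
		d.started.Wait() // releases mu while waiting, reacquires before returning
	}
	close(d.done)
	d.mu.Unlock()
}

// stopBeforeStart calls Stop() first and start() later, then reports
// whether the done channel ended up closed.
func stopBeforeStart() bool {
	d := New()
	stopped := make(chan struct{})
	go func() {
		d.Stop()
		close(stopped)
	}()

	time.Sleep(10 * time.Millisecond) // give Stop() a chance to block first
	d.start()
	<-stopped

	select {
	case <-d.done:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(stopBeforeStart()) // true
}
```

The shape of the loop is the same as in Figure 6; the difference is that the waiter sleeps until it is woken by Broadcast() instead of polling every 100 ms.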
Then We Force Our Will Unto The Users
And given this code, the following two snippets will work as expected, even with our hands tied. Well, mostly, if you are okay with the ungodly amount of synchronization we had to write and the hideous busy loop.
// Figure 7, the "correct" way to use
var d Daemon
go d.Run()
// mimic somebody stopping the daemon
time.AfterFunc(time.Second, d.Stop)

d.Wait()
And even if you call things out of order, it still works:
// Figure 8, the "out-of-order" way to use
var d Daemon
go d.Stop()
go d.Wait()
go d.Run()

// give the goroutines chance to exit
<-time.After(time.Second)
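Both of the above should print out the messages in this order (the last two lines can occasionally swap, because Wait() is released as soon as d.exited is closed, just before Run() prints its message):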
Stop!
detected stop
clean exit
wait done
Try it for yourself! (Crossing my fingers that I don’t have any major bugs)
This was the best answer I could come up with to make this work with my hands tied. And no, if given the chance to use context.Context I would never write this code. But these sorts of constraints exist all over, so it’s better to think of ways to overcome them (while you concoct ways to rewrite the whole codebase).
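One direction that might be worth exploring, offered only as a sketch and not as a drop-in replacement: sync.Once is also usable as a zero value, so each method could lazily create the channels before touching them. The setup() method, the initOnce and stopOnce fields, and the outOfOrder() helper are made-up names for this example. The semantics differ from Figures 5 and 6: Stop() called before Run() returns immediately instead of blocking, and Run() then exits on its first iteration. Calling Run() twice would still panic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Daemon struct {
	initOnce sync.Once
	stopOnce sync.Once
	done     chan struct{}
	exited   chan struct{}
}

// setup lazily creates the channels; whichever method runs first does the work.
func (d *Daemon) setup() {
	d.initOnce.Do(func() {
		d.done = make(chan struct{})
		d.exited = make(chan struct{})
	})
}

func (d *Daemon) Run() {
	d.setup()
	defer close(d.exited)
	for {
		select {
		case <-d.done:
			return
		default:
		}
		// do your thing
	}
}

// Stop may be called any number of times, before or after Run().
func (d *Daemon) Stop() {
	d.setup()
	d.stopOnce.Do(func() { close(d.done) })
}

func (d *Daemon) Wait() {
	d.setup()
	<-d.exited
}

// outOfOrder calls Stop(), Wait(), then Run() on a zero value and reports
// whether Wait() returned within one second.
func outOfOrder() bool {
	var d Daemon
	d.Stop()

	waited := make(chan struct{})
	go func() {
		d.Wait()
		close(waited)
	}()
	go d.Run()

	select {
	case <-waited:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println(outOfOrder()) // true
}
```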
Please let me know if you have better solutions!