Golang Mutexes

One big problem can kill our application under seemingly very obscure circumstances. So-called “race conditions” will break everything you love once things start to warm up scale-wise: your code is suddenly exposed to multiple workers trying to read from and write to the same resource, causing problems that are very hard to debug and pinpoint.

Let's demonstrate the problem a bit: we're building a hit counter for our website.

package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

var wg sync.WaitGroup

func main() {
    // Create our counter object
    counter := &Counter{0}

    // We handle this concurrently, because we handle HTTP requests concurrently
    wg.Add(1)
    go incrementCounter(counter)

    wg.Add(1)
    go incrementCounter(counter)

    // Wait for all goroutines to finish
    wg.Wait()

    fmt.Printf("%v", counter.count())
}

func incrementCounter(counter *Counter) {

    i := counter.count()

    // Emulate the processing of some checks, obviously not all requests are equal in processing time
    time.Sleep(time.Duration(rand.Int31n(1000)) * time.Millisecond)

    // Increment counter
    i++

    // Set count
    counter.setCount(i)

    wg.Done()
}

type Counter struct {
    i int
}

func (c *Counter) count() int {
    return c.i
}

func (c *Counter) setCount(count int) {
    c.i = count
}

Running this code, you will find it outputs 1 instead of the 2 you might expect.
This is because each goroutine copies the counter value into a local variable, increments that copy, and then writes it back. The last goroutine to finish sets the counter back to 1, because when it grabbed the initial value of the counter it was still 0.
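
Before fixing anything, it helps to confirm the race. Go ships with a built-in race detector; assuming the program above is saved as main.go (the file name here is just an example), you can run it with the -race flag:

go run -race main.go

The exact output varies, but it will flag the conflicting read and write on the counter and the goroutines involved.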

Mutexes

Mutex is short for mutual exclusion. These amazing things allow you to patch the most obvious forms of race conditions efficiently and mostly effortlessly. Mutexes, like WaitGroup, are part of the sync package, which provides tooling for keeping goroutines in check. A mutex essentially works by flipping a flag that marks whether the resource is currently in use by something else. Let's fix the above code:

package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

var wg sync.WaitGroup

func main() {
    // Create our counter object
    counter := &Counter{i: 0}

    // We handle this concurrently, because we handle HTTP requests concurrently
    // Add a goroutine to the WaitGroup
    wg.Add(1)
    go incrementCounter(counter)
    // Add a goroutine to the WaitGroup
    wg.Add(1)
    go incrementCounter(counter)

    // Wait for all goroutines to finish
    wg.Wait()

    fmt.Printf("%v", counter.count())
}

func incrementCounter(counter *Counter) {
    // locks the mutex, or blocks the goroutine until the mutex is unlocked and then locks it
    counter.Lock()

    i := counter.count()

    // Emulate the processing of some checks, obviously not all requests are equal in duration
    time.Sleep(time.Duration(rand.Int31n(1000)) * time.Millisecond)

    // Increment counter
    i++

    // Set count
    counter.setCount(i)

    // unlocks the embedded mutex, unblocking any goroutines stalled on Lock
    counter.Unlock()

    wg.Done()
}

type Counter struct {
    // embed sync.Mutex so Counter gets its Lock and Unlock methods
    sync.Mutex
    i int
}

func (c *Counter) count() int {
    return c.i
}

func (c *Counter) setCount(count int) {
    c.i = count
}

Now our Counter struct embeds a Mutex, so we can invoke the methods Lock and Unlock on it.
Lock will lock the Mutex, and if it's already locked it will block the goroutine until the Mutex is Unlock-ed, so only a single goroutine can access the resource at a time. If you do something clever like embedding the mutex on the resource itself, like I did here, and make sure everything that accesses the resource first acquires the lock, we should be good to go.
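
For example, here is a minimal sketch (not part of the original example; the increment method is hypothetical) of pushing the locking into the Counter itself, so every caller goes through methods that take the lock and nobody can forget to do it:

package main

import (
    "fmt"
    "sync"
)

// Counter guards its own state; callers never touch i directly
type Counter struct {
    sync.Mutex
    i int
}

// increment does the lock, update and unlock in one place
func (c *Counter) increment() {
    c.Lock()
    c.i++
    c.Unlock()
}

// count also takes the lock, so reads can't race with writes
func (c *Counter) count() int {
    c.Lock()
    n := c.i
    c.Unlock()
    return n
}

func main() {
    var wg sync.WaitGroup
    counter := &Counter{}

    // Two goroutines incrementing concurrently, just like before
    for n := 0; n < 2; n++ {
        wg.Add(1)
        go func() {
            counter.increment()
            wg.Done()
        }()
    }

    wg.Wait()

    fmt.Println(counter.count())
}

Whether you lock inside the methods like this, or around the whole incrementCounter function like we did above, is a design choice; the important part is that every access to the shared state goes through the mutex.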

However, there is one thing you should note. If you don't properly Unlock the mutex afterwards, you block every other goroutine waiting for it indefinitely, causing a deadlock. So always make sure your code Unlocks the mutex. One tip I can give to avoid forgetting to unlock is to use defer.
Defer postpones execution until the surrounding function returns; it basically tells the runtime “do this once the function is done”, and it runs even if the function exits early or panics. I myself nearly always put a defer Unlock right after locking a mutex, so we don't have to follow the code flow to figure out when and where the unlock happens.
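
As a quick standalone illustration (unrelated to the counter; describe is just a throwaway example function), the deferred call runs when the surrounding function returns, no matter which return path is taken:

package main

import "fmt"

func describe(n int) string {
    // runs when describe returns, whichever return below is hit
    defer fmt.Println("describe is done")

    if n%2 == 0 {
        return "even"
    }
    return "odd"
}

func main() {
    fmt.Println(describe(3))
}

This prints “describe is done” followed by “odd”.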

Let’s refine the corrected code a bit more:

package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

var wg sync.WaitGroup

func main() {
    // Create our counter object
    counter := &Counter{i: 0}

    // We handle this concurrently, because we handle HTTP requests concurrently
    // Add a goroutine to the WaitGroup
    wg.Add(1)
    go incrementCounter(counter)
    // Add a goroutine to the WaitGroup
    wg.Add(1)
    go incrementCounter(counter)

    // Wait for all goroutines to finish
    wg.Wait()

    fmt.Printf("%v", counter.count())
}

func incrementCounter(counter *Counter) {
    // locks the mutex, or blocks the goroutine until the mutex is unlocked and then locks it
    counter.Lock()

    // unlocks the embedded mutex, unblocking any goroutines stalled on Lock
    defer counter.Unlock()

    // Mark this goroutine as done in the WaitGroup
    defer wg.Done()

    i := counter.count()

    // Emulate the processing of some checks, obviously not all requests are equal in duration
    time.Sleep(time.Duration(rand.Int31n(1000)) * time.Millisecond)

    // Increment counter
    i++

    // Set count
    counter.setCount(i)
}

type Counter struct {
    // embed sync.Mutex so Counter gets its Lock and Unlock methods
    sync.Mutex
    i int
}

func (c *Counter) count() int {
    return c.i
}

func (c *Counter) setCount(count int) {
    c.i = count
}

Thanks for reading