Stack Allocation vs Heap: Boosting Go Performance

From Corea24, the free encyclopedia of technology

The Hidden Cost of Heap Allocations

Every time a Go program requests memory from the heap, a significant amount of code executes behind the scenes to satisfy that request. This process isn't just slow—it also adds pressure on the garbage collector (GC). Even with modern improvements like the Green Tea collector, GC overhead remains a bottleneck for many applications. The result? Programs that spend more time managing memory than doing actual work.

[Image: Stack Allocation vs Heap: Boosting Go Performance — source: blog.golang.org]

Why Stack Allocations Are a Game Changer

Stack allocations operate differently. They are essentially free at runtime because the memory is reclaimed automatically when the function returns. The stack is a structured region of memory that grows and shrinks with function calls, and because the same frames are reused call after call, stack data tends to stay hot in the CPU cache. Better still, stack allocations never burden the garbage collector; they simply vanish when the stack frame is popped. This reduces latency and improves throughput, especially in performance-critical hot paths.
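The difference is directly measurable. The sketch below (the `point` type and both constructors are illustrative, not from any API) uses `testing.AllocsPerRun` to count heap allocations per call: returning a pointer forces the value to escape to the heap, while returning a copy lets it stay on the stack. The `//go:noinline` directives keep inlining from hiding the difference.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

//go:noinline
func newOnHeap() *point { return &point{1, 2} } // pointer escapes to the caller: heap

//go:noinline
func newOnStack() point { return point{1, 2} } // returned by value: stays on the stack

func main() {
	heap := testing.AllocsPerRun(100, func() { _ = newOnHeap() })
	stack := testing.AllocsPerRun(100, func() { _ = newOnStack() })
	fmt.Printf("pointer return: %.0f allocs/call, value return: %.0f allocs/call\n", heap, stack)
}
```

You can see the compiler's own verdict on each variable with `go build -gcflags=-m`, which prints escape-analysis decisions.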

The Slice Growth Problem

Consider a common pattern: reading tasks from a channel and collecting them into a slice.

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

At first glance this looks harmless, but behind the scenes the append operation triggers repeated heap allocations. Here's what happens on each iteration:

  • Iteration 1: The backing array doesn't exist, so append allocates a new array of size 1.
  • Iteration 2: The array is full; append allocates a new array of size 2 (doubling). The old array becomes garbage.
  • Iteration 3: Full again, new array of size 4 allocated. Old array (size 2) discarded.
  • Iteration 4: The array has capacity 4 but holds only 3 elements, so the fourth fits; no allocation.
  • Iteration 5: Full, allocate size 8, and so on.

This doubling strategy ensures that later appends rarely cause allocations, but the early iterations are expensive. For small slices—or when the slice never grows large—this startup phase dominates. Every allocation touches the heap, produces garbage, and forces the GC to work harder.

Constant-Sized Slices: A Stack-Friendly Solution

Recent Go releases address this by enabling stack allocation for slices whose backing array size is known at compile time. If you can define a maximum number of elements (e.g., from a constant or a tight loop bound), the compiler can allocate the backing array directly on the stack. This eliminates heap allocation entirely for that slice.

When Does the Compiler Use the Stack?

The optimization applies when the slice's capacity is a compile-time constant. For example:

const maxTasks = 100

func process(c chan task) {
    var tasks [maxTasks]task // fixed-size array, eligible for stack allocation
    buf := tasks[:0]         // zero-length slice backed by that array
    for t := range c {
        // Within maxTasks elements, append only bumps the length;
        // no heap allocation occurs.
        buf = append(buf, t)
    }
    processAll(buf)
}

By pre-allocating a fixed-size array on the stack and then slicing it, you avoid the growth overhead entirely. As long as the slice does not escape the function (which you can verify with go build -gcflags=-m), its backing store never leaves the stack and the garbage collector stays out of the picture.
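A natural question is what happens when more than maxTasks values arrive. Nothing breaks: once the fixed array is full, append transparently falls back to a heap-allocated backing array, and only the optimization is lost for that call. The sketch below demonstrates this with an `overflowDemo` helper of our own invention; note that returning the slice forces it to escape in this demo, which real hot-path code would avoid:

```go
package main

import "fmt"

const maxTasks = 4 // deliberately tiny so the overflow is easy to trigger

// overflowDemo appends n ints to a slice backed by a fixed-size array.
// Once len exceeds maxTasks, append copies the data into a new,
// heap-allocated backing array and keeps going.
func overflowDemo(n int) []int {
	var tasks [maxTasks]int
	buf := tasks[:0]
	for i := 0; i < n; i++ {
		buf = append(buf, i)
	}
	return buf
}

func main() {
	buf := overflowDemo(6)
	// After the spill, capacity has doubled past maxTasks.
	fmt.Println(len(buf), cap(buf))
}
```

The graceful fallback means the constant bound does not have to be a hard guarantee, only a good estimate for the common case.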

Real-World Benefits

In hot paths—like request handlers or event loops—this change can significantly reduce allocation counts and GC pause times. Benchmarks show up to 30% reduction in memory management overhead for slices that rarely exceed a small constant size. The cache locality also improves because the data stays close to the function's stack frame.

When to Apply This Technique

Not every slice benefits from stack allocation. The key is knowing the upper bound of the slice at compile time. Use it when:

  • The slice size is limited by a constant or a small loop bound (e.g., under a few thousand elements).
  • The slice is created frequently in performance-critical code paths.
  • You want to minimize GC pressure and heap fragmentation.

For unbounded slices—those that grow arbitrarily large—heap allocation remains necessary. However, even in those cases, the initial small-growth phases can be avoided by pre-allocating a reasonable capacity with make([]task, 0, initialCapacity). This reduces the number of early reallocations.
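The benefit of presizing can be measured with `testing.AllocsPerRun`. This sketch (the `fill` helper is illustrative) compares growing a slice from nil against starting with the final capacity already reserved:

```go
package main

import (
	"fmt"
	"testing"
)

// fill appends n ints to s, reallocating whenever capacity runs out.
func fill(s []int, n int) []int {
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	grown := testing.AllocsPerRun(100, func() { _ = fill(nil, 1000) })
	presized := testing.AllocsPerRun(100, func() { _ = fill(make([]int, 0, 1000), 1000) })
	fmt.Printf("from nil: %.0f allocs/run, presized: %.0f allocs/run\n", grown, presized)
}
```

Growing from nil pays one allocation per capacity step, while the presized version allocates at most once up front.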

Conclusion

Stack allocation is one of the most effective optimizations in Go. By moving small, fixed-size data structures off the heap, you reduce allocation costs, decrease GC workload, and improve memory locality. The latest Go releases make this easier than ever for slices with compile-time constant sizes. Start examining your hot loops—you might be surprised at how many heap allocations can be eliminated with a simple constant-bound array.