How to Optimize Go Performance with Stack Allocation for Slices

Introduction

Heap allocations are a common source of slowdown in Go programs. Each allocation requires a call into the memory allocator, and the garbage collector must later clean up after it. But there is a simple trick to avoid many of these allocations: stack-allocate slices whose capacity is a compile-time constant. Stack allocations are cheap, sometimes effectively free, and they impose zero load on the garbage collector. This guide walks you through identifying hotspots, pre-allocating slices, and verifying that your optimizations actually land on the stack.


What You Need

A working Go toolchain (go build and go test), a program with a hot loop that appends to a slice, and a rough upper bound on how many elements that slice will ever hold.

Step-by-Step Guide

Step 1: Identify Hot Loops That Grow Slices

Look for loops that repeatedly append to a slice without a pre-allocated capacity. These are prime candidates for heap allocations. Example:

func process(c chan task) {
    var tasks []task // nil slice: no backing array yet
    for t := range c {
        tasks = append(tasks, t) // reallocates whenever capacity is exhausted
    }
    processAll(tasks)
}

In this code, tasks starts as nil, so each time the backing array fills up, the runtime must allocate a new, larger array on the heap and copy the old contents over. The first several appends therefore trigger a series of small allocations, each one creating garbage and adding pressure on the GC.
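To see this growth behavior for yourself, here is a minimal standalone sketch (not from the original article) that prints the capacity each time append reallocates the backing array:

package main

import "fmt"

func main() {
    var xs []int
    prevCap := 0
    for i := 0; i < 100; i++ {
        xs = append(xs, i)
        if cap(xs) != prevCap {
            // A capacity change means append allocated a new backing
            // array and copied the old elements into it.
            fmt.Printf("len=%3d  cap %3d -> %3d (new backing array)\n",
                len(xs), prevCap, cap(xs))
            prevCap = cap(xs)
        }
    }
}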

Step 2: Determine the Maximum Slice Size

If you know, at the point of allocation, the maximum number of elements the slice will ever hold, you can pre-allocate the backing array once. For example, if you know that c will never deliver more than 100 tasks, you can allocate exactly that size. If the size is a compile-time constant, the Go compiler may place the backing array on the stack instead of the heap.

Tip: You can often derive the maximum from the problem domain: reading a fixed number of input lines, processing a known number of work items, and so on. When in doubt, measure the distribution of lengths in your production environment and pick a constant that covers the realistic worst case.
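One way to gather that measurement, as a rough sketch (maxSeen and recordLen are illustrative names, not from the original article), is to track the largest length observed and call the hook wherever the slice is finalized, e.g. just before processAll(tasks):

import "sync/atomic"

// maxSeen records the largest slice length observed so far.
var maxSeen atomic.Int64

// recordLen updates maxSeen if n exceeds the current maximum.
func recordLen(n int) {
    for {
        cur := maxSeen.Load()
        if int64(n) <= cur || maxSeen.CompareAndSwap(cur, int64(n)) {
            return
        }
    }
}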

Step 3: Pre-Allocate the Slice with make

Replace the initial var tasks []task with a pre-allocated slice:

func process(c chan task) {
    tasks := make([]task, 0, 100)  // length 0, capacity 100: one allocation up front
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Now, instead of starting with a tiny backing array and growing it repeatedly, the slice has room for 100 elements right away. As long as no more than 100 tasks arrive, every append simply places the new element in the existing backing array, with no further allocation. The backing array is allocated exactly once, by make.

When is the allocation on the stack? If the capacity is a compile-time constant small enough for the compiler's threshold (the gc compiler currently limits implicit allocations such as make to 64 KB for stack placement), and if the slice does not escape to the heap (e.g., it is not returned or stored in a global variable), then the backing array lives on the stack. For many hot loops, both conditions hold.
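The constant-capacity requirement is easy to see in a small sketch (the function names here are illustrative):

func constCap() {
    buf := make([]byte, 0, 64) // constant capacity: eligible for the stack
    for i := 0; i < 64; i++ {
        buf = append(buf, byte(i))
    }
    _ = buf
}

func varCap(n int) {
    buf := make([]byte, 0, n) // capacity unknown at compile time: escapes to heap
    for i := 0; i < n; i++ {
        buf = append(buf, byte(i))
    }
    _ = buf
}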

Step 4: Verify Escape Analysis

To confirm that your pre-allocated slice stays on the stack, ask the compiler to print its escape-analysis decisions:

go build -gcflags='-m -m' 2>&1 | grep escape

Look for lines like:

./main.go:10:6: make([]task, 0, 100) does not escape

If you see “escapes to heap”, something prevents stack allocation. Common causes include:

- The slice is returned from the function or stored in a global variable, a struct field, or another object that outlives the call.
- The capacity is not a compile-time constant.
- The slice is passed to a function the compiler cannot see through, such as an interface call like fmt.Println.
- The constant capacity exceeds the compiler's stack-allocation threshold.

If the slice does escape, reconsider your design—perhaps you can process the data without returning the whole slice, or copy the results to the heap only once.
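As a rough sketch of that last idea (collect is an illustrative name, and the 100-element bound is carried over from the earlier steps), you can keep a small scratch slice on the stack and pay for a single heap copy only when handing results back:

// collect drains c into a stack-eligible scratch buffer, then copies the
// results into out. The scratch slice itself never escapes; only the final
// copy touches the heap, and only if out needs to grow.
func collect(c chan task, out []task) []task {
    scratch := make([]task, 0, 100) // constant capacity, does not escape
    for t := range c {
        scratch = append(scratch, t)
    }
    return append(out, scratch...)
}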

Step 5: Benchmark the Difference

Write a simple benchmark to quantify the performance gain:

func BenchmarkProcess(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // Rebuild the channel each iteration: process drains it and the
        // producer closes it, so it cannot be reused across iterations.
        c := make(chan task, 100)
        go func() {
            for j := 0; j < 100; j++ {
                c <- task{} // zero-value task; fill in fields as needed
            }
            close(c)
        }()
        process(c)
    }
}

Run with:

go test -bench=BenchmarkProcess -benchmem

Compare the allocations per operation (allocs/op) and the time per operation (ns/op). The pre-allocated version should report drastically fewer allocations, since the slice's backing array no longer contributes any, and correspondingly lower latency.
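To compare the two versions rigorously, one option (assuming you have benchstat, from golang.org/x/perf/cmd/benchstat, installed) is to save the output of repeated runs before and after the change and diff them statistically:

go test -bench=BenchmarkProcess -benchmem -count=10 > old.txt
# apply the make([]task, 0, 100) change, then:
go test -bench=BenchmarkProcess -benchmem -count=10 > new.txt
benchstat old.txt new.txt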

Tips for Success

Make the capacity a named constant so the compiler can prove it at compile time and readers can see where the bound comes from. Re-run the escape analysis after any refactor; a seemingly unrelated change, such as passing the slice to a new function, can push it back to the heap. And avoid very large constant capacities: stacks are grown and copied on demand, so oversized frames carry their own cost.

By following these steps, you can turn a wasteful heap-allocation pattern into a clean stack-based one. The result: faster code, less pressure on the garbage collector, and happier users.
