How to Reduce Heap Allocations by Stack-Allocating Slices in Go
<h2>Introduction</h2>
<p>Heap allocations are a major source of performance overhead in Go programs. Each call to the memory allocator consumes CPU cycles, and allocated objects place additional strain on the garbage collector. In contrast, <strong>stack allocations</strong> are nearly free—they require no explicit deallocation and are automatically reclaimed when the function returns. This guide walks you through a practical technique: <em>stack-allocating slices of known or bounded size</em> to eliminate heap pressure and accelerate your hot code paths.</p>
<h2>What You’ll Need</h2>
<ul>
<li>A Go development environment (version 1.20 or later for best escape analysis)</li>
<li>Basic familiarity with Go slices, arrays, and memory management</li>
<li>A profiler (e.g., <code>pprof</code>) to measure heap allocations before and after</li>
</ul>
<h2>Step-by-Step Instructions</h2>
<h3 id="step1">Step 1: Identify Heap‑Allocation Hot Spots</h3>
<p>Run your program under the profiler to locate functions where slices are repeatedly grown via <code>append</code>. Look for patterns like:</p>
<pre><code>var tasks []task
for t := range c {
    tasks = append(tasks, t)
}
processAll(tasks)</code></pre>
<p>Each time the slice’s backing array fills, the runtime must allocate a new—and usually larger—array on the heap. This produces garbage and slows down the inner loop. In your profiler output, pay special attention to <code>runtime.mallocgc</code> and <code>runtime.growslice</code> call stacks.</p>
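A quick way to see this cost without a full profiling session is <code>testing.AllocsPerRun</code>, which reports the average number of heap allocations a function performs. The sketch below (not from the article; <code>task</code> and the count of 64 are illustrative) measures the append-growth pattern shown above:

```go
package main

import (
	"fmt"
	"testing"
)

type task struct{ id int }

// countAppendAllocs reports the average number of heap allocations
// incurred by growing a nil slice to n elements via append.
func countAppendAllocs(n int) float64 {
	return testing.AllocsPerRun(100, func() {
		var tasks []task
		for i := 0; i < n; i++ {
			tasks = append(tasks, task{id: i})
		}
		_ = tasks
	})
}

func main() {
	// Every time the backing array fills, growslice allocates a
	// larger one, so several allocations per run are expected.
	fmt.Printf("allocations to append 64 tasks: %.0f\n", countAppendAllocs(64))
}
```

Because the slice starts nil and roughly doubles on each growth, appending 64 elements typically costs several separate allocations per run.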
<h3 id="step2">Step 2: Determine the Maximum Slice Size</h3>
<p>Ask yourself: <strong>Is the maximum number of elements known at compile time?</strong> Even a loose upper bound is enough. For example, if you know you will never process more than 64 tasks, that bound allows stack allocation. If the bound depends on runtime input (e.g., <code>len(users)</code>), you may still be able to pre‑allocate capacity with <code>make</code>, but that alone does not move the allocation to the stack.</p>
<h3 id="step3">Step 3: Replace the Dynamic Slice with a Fixed‑Size Array</h3>
<p>When the maximum size is a compile‑time constant, use a <strong>stack‑allocated array</strong> and then slice it:</p>
<pre><code>func process(c chan task) {
    var tasks [64]task // stack-allocated array
    var n int
    for t := range c {
        if n == len(tasks) {
            // Handle overflow (log, return error, etc.)
            break
        }
        tasks[n] = t
        n++
    }
    processAll(tasks[:n]) // slice the array
}</code></pre>
<p>Because <code>[64]task</code> has a size known at compile time, the compiler can place it on the stack, provided escape analysis proves it does not escape; if <code>processAll</code> stores the slice beyond the call, the whole array is moved to the heap. The slice <code>tasks[:n]</code> is merely a view of that stack memory, so no heap allocation occurs, and the <code>append</code> loop is gone entirely.</p>
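This is easy to verify empirically. In the sketch below (not from the article), <code>sum</code> stands in for <code>processAll</code>; since it only reads the slice, the fixed array should not escape, and <code>testing.AllocsPerRun</code> should report zero allocations:

```go
package main

import (
	"fmt"
	"testing"
)

type task struct{ id int }

// sum stands in for processAll; it only reads the slice, so the
// backing array does not escape and can live on the stack.
func sum(tasks []task) int {
	total := 0
	for _, t := range tasks {
		total += t.id
	}
	return total
}

// collect fills a fixed-size array and processes the filled prefix.
func collect(n int) int {
	var tasks [64]task // compile-time-constant size: eligible for the stack
	count := 0
	for i := 0; i < n && count < len(tasks); i++ {
		tasks[count] = task{id: i}
		count++
	}
	return sum(tasks[:count]) // a view of the stack array
}

func main() {
	allocs := testing.AllocsPerRun(100, func() { _ = collect(64) })
	fmt.Printf("allocations per run: %.0f\n", allocs)
}
```

If escape analysis ever fails here (for instance, after a refactor that retains the slice), the allocation count immediately becomes non-zero, which makes this a useful regression check.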
<h3 id="step4">Step 4: Use <code>make</code> with Capacity for Bounded but Dynamic Sizes</h3>
<p>If the maximum size is a runtime value (e.g., <code>len(data)</code>), pre‑allocate the backing array with <code>make([]task, 0, maxSize)</code>. This avoids the incremental growth overhead, but note that <code>make</code> with a non‑constant capacity still allocates on the heap. To truly push the array onto the stack, its size must be a compile‑time constant (see Step 3). Even so, pre‑allocation reduces the whole loop to a single allocation and generates no intermediate garbage:</p>
<pre><code>func process(c chan task, max int) {
    tasks := make([]task, 0, max)
    for t := range c {
        if len(tasks) == cap(tasks) {
            break
        }
        tasks = append(tasks, t)
    }
    // ...
}</code></pre>
<h3 id="step5">Step 5: Leverage Pooling for Repeated Slices</h3>
<p>If you must use dynamic slices and cannot determine a maximum size, consider reusing backing arrays via <code>sync.Pool</code>. While not strictly stack allocation, this reduces heap churn. Combine with Steps 1–4 to minimise allocations in the most critical paths.</p>
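A minimal sketch of the pooling idea follows; the <code>encode</code> helper and the 1024-byte starting capacity are illustrative, not from the article. Pool entries hold <code>*[]byte</code> (a common idiom) so that putting a buffer back does not itself allocate when the value is boxed into an interface:

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool reuses scratch buffers across calls, reducing heap churn
// when no compile-time size bound exists.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 1024) // illustrative starting capacity
		return &b
	},
}

// encode is a hypothetical hot-path helper that needs a temporary buffer.
func encode(data string) string {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reuse the backing array, reset the length
	buf = append(buf, '[')
	buf = append(buf, data...)
	buf = append(buf, ']')
	out := string(buf) // copy out before the buffer is recycled
	*bp = buf          // keep any growth for the next user
	bufPool.Put(bp)
	return out
}

func main() {
	fmt.Println(encode("hello")) // prints "[hello]"
}
```

Note that anything returned to the caller must be copied out of the pooled buffer first, since the buffer may be reused by another goroutine immediately after <code>Put</code>.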
<h3 id="step6">Step 6: Verify with Escape Analysis and Profiling</h3>
<p>After refactoring, confirm that your objects stay on the stack. Pass <code>-gcflags='-m'</code> to print the compiler’s escape‑analysis decisions; adding <code>-l</code> disables inlining so the report maps more directly to your source:</p>
<pre><code>go build -gcflags='-m -l' yourfile.go</code></pre>
<p>Look for lines like <code>moved to heap</code> or <code>escapes to heap</code>. If your fixed‑size array or slice is reported as “does not escape”, it is stack‑allocated. Rerun the profiler and verify reduced <code>mallocgc</code> calls and lower GC pauses.</p>
<h2>Conclusion and Tips</h2>
<ul>
<li><strong>Start small.</strong> Only rewrite the top 2–3 hot spots identified by profiling. Premature optimisation can hurt readability.</li>
<li><strong>Respect stack limits.</strong> A fixed‑size array of several megabytes can overflow the stack (<code>runtime: goroutine stack exceeds 1000000000-byte limit</code>), and current compilers move very large locals to the heap regardless. Keep arrays modest (kilobytes rather than megabytes) unless you raise the limit with <code>runtime/debug.SetMaxStack</code>.</li>
<li><strong>Watch for sharing.</strong> If part of the array is taken as a slice and returned or stored beyond the function call, the entire array escapes to the heap. Be careful with slices passed to goroutines or stored in global variables.</li>
<li><strong>Combine with compiler hints.</strong> Declare a variable with <code>var buf [64]byte</code> inside a function – it is almost always stack‑allocated. The Go escape analyzer is conservative, so simple code is best.</li>
<li><strong>Test for correctness.</strong> When you replace <code>append</code> with manual index tracking, confirm you handle the “full” case gracefully (overflow, error, or dynamic fallback).</li>
</ul>
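The “dynamic fallback” mentioned in the last tip can be sketched in a few lines (not from the article; the bound of 8 and the <code>sumUpTo</code> helper are illustrative): slice a small fixed array at zero length, and let <code>append</code> spill to the heap only when the bound is exceeded.

```go
package main

import "fmt"

type task struct{ id int }

// sumUpTo shows a stack-first dynamic fallback: the slice starts as a
// view of a small fixed array, and append spills to the heap only when
// that array fills, so the common small case stays allocation-free.
func sumUpTo(n int) int {
	var buf [8]task
	tasks := buf[:0] // no heap allocation while len <= 8
	for i := 0; i < n; i++ {
		tasks = append(tasks, task{id: i}) // heap fallback only if n > 8
	}
	total := 0
	for _, t := range tasks {
		total += t.id
	}
	return total
}

func main() {
	fmt.Println(sumUpTo(5), sumUpTo(20)) // prints "10 190"
}
```

This keeps the fast path allocation-free while remaining correct for arbitrarily large inputs, at the cost of one copy when the spill happens. Crucially, the slice must not outlive the function, or the fixed array escapes to the heap anyway.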
<p>By applying these steps, you can convert expensive heap allocations into cheap stack allocations, making your Go programs faster and more cache‑friendly. Start with the most performance‑sensitive loops and work outward—the gains can be substantial.</p>