ClimsTech
Kubernetes & platform · 22 Oct 2025

Smaller, safer container images: a practical guide

The typical production container ships over 1 GB and 50+ CVEs it never needed. Multi-stage builds, minimal bases, layer discipline, and CI scanning cut that by 85–95%. Here is the complete technical playbook with benchmarks, worked examples, and the pitfalls that trip teams up in practice.

ClimsTech Engineering · 18 min read

Container image bloat is a tax on every deployment — not a dramatic one-time cost, but a compounding one. Slower autoscaling at peak load, more CVEs to triage every sprint, a larger attack surface for any runtime compromise. The standard Debian-based node:22 image ships with roughly 50 to 60 known vulnerabilities before you write a single line of application code — 15 to 20 of them rated high or critical, per Minimus platform data from 2025. Research from Alibaba Cloud's Function Compute team found that image pulls account for more than 72% of total cold-start latency in containerised workloads. None of this happens because teams are careless. It happens because developer-convenience defaults get applied to production workloads and never revisited. Multi-stage builds, a minimal base image, and disciplined layer ordering cut image size by 85–95%. This post is the complete technical playbook.

The cost of the default: a standard node:22 image shipped to production

  • 50–60 CVEs in node:22 before your code (Minimus, 2025)
  • Over 72% of cold-start time spent pulling the image (Alibaba FaaSNet research)
  • 95% potential size reduction vs full base (Docker hardened images, 2024)

Source: Minimus 2025; Alibaba Cloud FaaSNet research; Docker hardened images documentation, 2024

Why image size is an operations problem, not just a hygiene one

The instinct to treat image size as cosmetic misunderstands where the cost lands.

Autoscaling latency. Kubernetes spins up new pods when load spikes. A new pod on a fresh node must pull the image before it can serve a single request. A 1.1 GB image at a realistic registry pull rate of 100 MB/s takes roughly 11 seconds to transfer before container initialisation even starts. A 30 MB distroless image takes under a second. During a traffic burst that triggered the HPA, that 10-second gap per pod is the window in which your users see errors. With 20 pods scaling simultaneously, you are looking at a meaningful fraction of a minute where the capacity you provisioned has not materialised.
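The transfer-time arithmetic above can be reproduced with a few lines of Python. This is a rough sketch, not a benchmark: the function name is hypothetical, the 100 MB/s rate is the assumption used in the paragraph, and it models transfer time only, ignoring layer decompression, registry latency, and parallel layer downloads.

```python
def pull_seconds(image_mb: float, rate_mb_per_s: float = 100.0) -> float:
    """Estimate image transfer time in seconds (transfer only, no unpacking)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return image_mb / rate_mb_per_s


full_image = pull_seconds(1100)  # 1.1 GB single-stage image
minimal_image = pull_seconds(30)  # 30 MB distroless image

print(f"full: {full_image:.1f}s")
print(f"minimal: {minimal_image:.1f}s")
print(f"gap per pod: {full_image - minimal_image:.1f}s")
```

Swapping in a measured pull rate from your own registry and nodes gives a more honest number than the assumed 100 MB/s.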

CVE triage burden. In 2024, Aqua Security tracked approximately 28,000 newly disclosed CVEs. Every package in your image is a target surface. Shipping node:22 — which includes a full Debian system with curl, apt, bash, libsystemd, perl, and dozens of other packages — means your security team receives a long finding list, most entries relating to OS packages the application never calls. Signal-to-noise degrades, and real findings get buried. A distroless or minimal-base image collapses that list to a handful of entries that are actually relevant.

Registry and network costs. Images are pushed on every CI build. On a team running 50 builds per day, a 1 GB image versus a 100 MB image is up to 45 GB of additional registry traffic daily (50 builds × 0.9 GB), before counting the pulls on every node that runs the image. Layer deduplication reduces the push side when lower layers are unchanged, but the number still matters on bandwidth-metered registries: Amazon ECR, Google Artifact Registry, and Azure Container Registry all bill for cross-region data transfer.

Runtime attack surface. If an attacker exploits an application-level vulnerability and achieves code execution, a full Debian base hands them bash, curl, wget, apt, and Python to pivot with. A distroless image has none of those. It does not eliminate the exploit, but it substantially constrains what the attacker can do next: a meaningful improvement in defence in depth without any application code change.

Multi-stage builds: the highest-leverage change

A multi-stage Dockerfile uses multiple FROM instructions. Each stage creates a fresh filesystem. The final stage — the one that becomes the production image — contains only what you explicitly copy into it with COPY --from=build or COPY --from=deps. The compiler, build tools, dev dependencies, and test harness never appear in the artifact shipped to production.

Node.js: three-stage pattern

The pattern below separates production dependency installation, the build, and the runtime:

# Stage 1: production dependencies only (no devDependencies)
FROM node:22-slim AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
 
# Stage 2: full build (needs devDependencies for TypeScript, bundler, etc.)
FROM node:22-slim AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
 
# Stage 3: runtime — distroless, no shell, nonroot user
FROM gcr.io/distroless/nodejs22-debian12 AS run
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
USER nonroot
CMD ["/app/dist/server.js"]

The deps stage installs only what runs in production. The build stage gets everything it needs. The run stage gets only the compiled output and the production node_modules. Nothing from the build toolchain reaches the registry.

Go: FROM scratch

Go's static compilation makes the size reduction most dramatic. A statically linked binary needs nothing from the host OS:

FROM golang:1.22-alpine AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build \
  -ldflags="-s -w" \
  -trimpath \
  -o /app/server .
 
FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]

The result is your binary plus a CA certificate bundle. Depending on application size, this typically lands between 10 and 25 MB. Zero OS packages, zero base-image CVEs. The only CVEs that can appear are in Go's standard library and in the modules compiled into the binary, whether direct or transitive dependencies.

Two flags warrant explanation. -ldflags="-s -w" strips the symbol table and DWARF debug information, shaving 20–30% off binary size with no runtime impact. -trimpath removes local filesystem paths from the compiled binary, preventing build-system paths from leaking into production stack traces.

The payoff in concrete terms

Before: single-stage node:22 (bloated)

  • Full Debian bookworm base, over 1 GB compressed
  • Both dev and prod node_modules in final layer
  • bash, curl, apt, perl present at runtime
  • 50–60 CVEs before your application code
  • Runs as root (UID 0) by default

After: three-stage distroless build (optimised)

  • gcr.io/distroless/nodejs22-debian12, approximately 160 MB
  • Production node_modules and compiled dist only
  • No shell, no package manager, no extraneous OS tools
  • Fewer than 10 CVEs in the base layer
  • Runs as nonroot (UID 65532)

Node.js service: single-stage vs three-stage multi-stage Dockerfile
Source: GoogleContainerTools distroless documentation, 2024; Minimus, 2025

Choosing the right base image

Every base image decision is a trade-off across size, CVE count, tooling compatibility, and debuggability. The table below applies specifically to a Node.js stack; the same hierarchy holds for JVM and Python with their respective distroless variants.

| Base | Compressed size (approx.) | Typical CVE count | Has shell | When to use |
|------|--------------------------|-------------------|-----------|-------------|
| node:22 | over 1 GB | 50–60 | bash | Development only. Never production. |
| node:22-slim | ~250 MB | 30–40 | bash | Acceptable short-term migration step |
| node:22-alpine | ~165 MB | 10–20 | sh | Good general default; verify glibc compat first |
| gcr.io/distroless/nodejs22-debian12 | ~160 MB | under 10 | none | Production Node.js services |
| gcr.io/distroless/base-debian12 | 29.7 MB | minimal | none | Go/Rust with glibc linkage |
| gcr.io/distroless/static-debian12 | 1.9 MB | minimal | none | Go/Rust with CGO_ENABLED=0 |
| scratch | 0 MB | 0 | none | Fully static binaries; copy in TLS certs manually |

The distroless size figures come from GoogleContainerTools documentation published in mid-2024. The node:22 and Alpine figures are approximate benchmarks that vary by minor version; the order of magnitude is stable. The chart below shows the comparison for the most common Node.js bases:

Compressed base image sizes by variant (approximate, Node.js, linux/amd64)

  • node:22 (full Debian): ~1,100 MB
  • node:22-slim: ~250 MB
  • node:22-alpine: ~165 MB
  • distroless/nodejs22-debian12: ~160 MB
  • distroless/base-debian12: 29.7 MB

Source: GoogleContainerTools distroless documentation, 2024; Docker Hub official images

Alpine and native modules. Alpine uses musl libc rather than glibc. For pure JavaScript workloads this is irrelevant. For services with native Node addons (sharp, better-sqlite3, bcrypt, canvas, grpc), a binary built for glibc fails to load on musl, and the failure surfaces at the point of first use, not at container start. The container passes health checks, then throws an error from Module._extensions when the specific code path loads the addon. If the dependency tree includes any native addon without a musl build, use node:22-slim or distroless/nodejs22-debian12 (both are Debian/glibc-based). Check the full tree with npm ls --all before switching; --depth 0 lists only direct dependencies and misses native addons that are pulled in transitively.

Debugging without a shell. The absence of a shell in distroless is a security property, and it is usually the first thing that surprises teams. kubectl exec -it pod -- /bin/bash fails because /bin/bash does not exist in the image. The correct approach is to debug against a separate build stage that retains the shell (docker build --target build .) or to attach an ephemeral debug container: kubectl debug -it pod --image=busybox --target=<container-name>, where --target takes the name of the container in the pod spec, not the Dockerfile stage name. Reaching for the :debug tag on distroless images, which adds busybox, should be a deliberate, temporary decision with a ticket to remove it, not the permanent baseline.

Layer ordering: free rebuild speed

Docker layer caching is simple in theory and routinely mis-applied in practice. Every instruction creates a layer. When an instruction changes, every subsequent layer is invalidated and rebuilt from scratch. The rule is: place instructions that change most frequently at the bottom.

For a Node.js service, the correct order is:

FROM node:22-slim AS build
 
# Layer 1: system packages (change almost never)
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 make g++ \
  && rm -rf /var/lib/apt/lists/*
 
WORKDIR /app
 
# Layer 2: dependency manifests (change only when adding/removing packages)
COPY package*.json ./
 
# Layer 3: install (expensive; only busts when manifests change)
RUN npm ci
 
# Layer 4: application source (changes every commit)
COPY . .
 
# Layer 5: compiled artifact (depends on source)
RUN npm run build

The violation that causes the most CI waste is COPY . . placed before RUN npm ci. A code change in any file busts the dependency cache, turning a cache hit of a few seconds into a cold install that takes minutes. This exact error appears in many README examples that get copy-pasted into real projects.
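The invalidation rule can be illustrated with a toy model in Python. This is a simplified sketch and not how Docker or BuildKit actually compute cache keys: it assumes each step's key is a hash of its parent's key, the instruction text, and a digest of the files the step reads. The function names and the "src-v1" style digests are hypothetical.

```python
import hashlib


def cache_keys(steps):
    """One key per (instruction, input_digest) step, chained to the parent key."""
    keys, parent = [], ""
    for instruction, input_digest in steps:
        material = f"{parent}|{instruction}|{input_digest}".encode()
        parent = hashlib.sha256(material).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(before, after):
    """Instructions in `after` whose key differs from the previous build."""
    pairs = zip(after, cache_keys(after), cache_keys(before))
    return [step[0] for step, new_key, old_key in pairs if new_key != old_key]


def good_order(src):
    return [
        ("COPY package*.json ./", "manifest-v1"),
        ("RUN npm ci", ""),
        ("COPY . .", src),
        ("RUN npm run build", ""),
    ]


def bad_order(src):
    return [
        ("COPY . .", src),
        ("RUN npm ci", ""),
        ("RUN npm run build", ""),
    ]


# A source-only change: the manifests stay at "manifest-v1".
print(rebuilt_steps(good_order("src-v1"), good_order("src-v2")))
print(rebuilt_steps(bad_order("src-v1"), bad_order("src-v2")))
```

In the good ordering only the source copy and the build step rerun; in the bad ordering the changed COPY sits above npm ci, so the install reruns on every commit.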

Concrete math. Suppose npm ci takes 90 seconds cold and 4 seconds from cache. A team of five engineers each making 8 commits per day generates 40 CI builds per day. With correct layer ordering and a layer cache that persists between CI runs, 38 of those 40 builds hit the npm cache (only the two that actually change package.json miss it). The saving is 38 builds multiplied by 86 seconds each: 3,268 seconds, or just under 55 minutes of CI time recovered per day on one service. Multiply by the number of services in the organisation.

One team, one service, correct Dockerfile layer order: roughly 55 minutes of CI time returned per day.
Calculation: 40 builds/day, 90s cold npm ci vs 4s from cache, correct layer order
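The same calculation as a small Python helper, so the inputs can be replaced with figures from your own pipeline. The function name and parameters are hypothetical; the default scenario uses the numbers stated above.

```python
def daily_cache_saving_seconds(builds, manifest_changes, cold_s, cached_s):
    """Seconds saved per day when only manifest-changing builds miss the cache."""
    cache_hits = builds - manifest_changes
    return cache_hits * (cold_s - cached_s)


saved = daily_cache_saving_seconds(
    builds=40,           # 5 engineers x 8 commits each
    manifest_changes=2,  # builds that touch package.json
    cold_s=90,           # npm ci without cache
    cached_s=4,          # npm ci layer served from cache
)
print(saved)                 # 3268
print(round(saved / 60, 1))  # 54.5
```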

.dockerignore: keep secrets and junk out of every layer

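.dockerignore uses syntax similar to .gitignore, with one important difference: patterns are matched relative to the root of the build context, so *.log only matches log files at the top level. Use a leading **/ to match at any depth. The file must sit at the root of the build context. A baseline for a Node.js project:

# dependencies and VCS metadata
**/node_modules/
.git/
.github/
# secrets: root level, then any subdirectory
.env
.env.*
**/.env
**/.env.*
# build output, logs, and OS junk
dist/
coverage/
.nyc_output/
**/*.log
**/.DS_Store
# tests
**/*.test.ts
**/*.spec.ts
**/__tests__/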

The two entries that matter most for security are .env and .env.*. Any COPY . . instruction without a .dockerignore embeds those files into an image layer. Image layers are plain tar archives: not encrypted, not access-controlled beyond registry permissions. Anyone with pull access to the registry, or who receives a copy of the image tarball, can run docker save -o image.tar image, unpack the layer archives, and read the contents, even when the image has no shell. Deleting the file in a later instruction does not help, because the earlier layer still contains it. Credentials embedded in layers are a documented and recurring class of secrets leak; it is not a theoretical risk.

The .git/ exclusion matters for build context size. Git object databases accumulate quickly on long-lived repositories. A repository with 18 months of history routinely carries 200–400 MB of git objects that serve no purpose inside a container image.

One precision note: COPY --chown=user:group src dest is more efficient than a subsequent RUN chown -R. Changing ownership modifies each file, so the RUN chown -R instruction adds a new layer containing a second full copy of every file it touches, approximately doubling the space those files occupy in the image. The --chown flag sets ownership at copy time with no extra layer at all.

Scanning images in CI

The goal of image scanning is not zero CVEs — that is practically unachievable with any real runtime base — but a defensible posture: no unfixed, high-severity CVEs in packages the application actually ships. Trivy from Aqua Security is the most widely deployed open-source scanner and integrates cleanly into GitHub Actions:

name: Build and scan
 
on:
  push:
    branches: [main]
  pull_request:
 
jobs:
  build-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - name: Build production image
        run: |
          docker build \
            --target run \
            -t app:${{ github.sha }} .
 
      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: app:${{ github.sha }}
          format: table
          exit-code: '1'
          ignore-unfixed: true
          vuln-type: os,library
          severity: CRITICAL,HIGH
 
      - name: Export SBOM
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: app:${{ github.sha }}
          format: cyclonedx
          output: sbom.cdx.json
          vuln-type: os,library
 
      - uses: actions/upload-artifact@v4
        with:
          name: sbom
          path: sbom.cdx.json

ignore-unfixed: true is the critical flag. Without it, the build fails on CVEs for which no patched version of the package yet exists — meaning the build fails indefinitely regardless of what the team does. The objective is actionable findings: vulnerabilities where a patched package version is available and the team can act.
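To make the gating rule concrete, here is a minimal Python sketch that applies the same policy (CRITICAL or HIGH, and a fix exists) to a report produced with trivy image --format json. It only approximates what ignore-unfixed does inside Trivy. The field names (Results, Vulnerabilities, Severity, FixedVersion) are assumed from Trivy's JSON output and should be checked against the version you run; the sample report, package names, and CVE IDs are placeholders.

```python
import json

BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def actionable_findings(report: dict) -> list:
    """Return vulnerabilities that are blocking AND have a fixed version."""
    findings = []
    for result in report.get("Results", []):
        # Clean targets may carry no "Vulnerabilities" key, or a null value.
        for vuln in result.get("Vulnerabilities") or []:
            severe = vuln.get("Severity") in BLOCKING_SEVERITIES
            fixable = bool(vuln.get("FixedVersion"))
            if severe and fixable:
                findings.append(vuln)
    return findings


# Hypothetical report fragment; IDs, packages, and versions are placeholders.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {
      "Target": "app (debian 12)",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample",
         "Severity": "CRITICAL", "FixedVersion": "1.2.4"},
        {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother",
         "Severity": "HIGH", "FixedVersion": ""},
        {"VulnerabilityID": "CVE-0000-0003", "PkgName": "libminor",
         "Severity": "LOW", "FixedVersion": "2.0.1"}
      ]
    },
    {"Target": "node_modules", "Vulnerabilities": null}
  ]
}
""")

blocking = actionable_findings(SAMPLE_REPORT)
print([v["VulnerabilityID"] for v in blocking])
```

Only the first entry blocks the build: the second has no fixed version yet, and the third is below the severity threshold.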

Trivy vs Grype vs Snyk. Grype from Anchore is the main open-source alternative to Trivy. Both are actively maintained and competent for image scanning. Trivy has broader coverage (IaC scanning, secret detection, and SBOM generation alongside image scanning), which makes it the better single-tool choice. Snyk offers a commercial option with a richer policy engine, finer-grained suppression workflows, and faster vulnerability database updates in some categories; for teams with formal compliance requirements around scanner SLAs, the managed option is worth evaluating. The vulnerability database latency gap between free and commercial tools is real: Snyk has historically published advisories faster for certain CVE classes than the open-source databases that Trivy and Grype consume.

Recommended container image build and release pipeline
  1. Lint Dockerfile

    Run hadolint in CI. It catches root left as the final USER, apt lists not cleaned in the same RUN layer, and shell pipes that bypass error propagation.

  2. Multi-stage build

    Build targeting the run stage only. The build stage stays in CI cache but never reaches the registry. Tag with the git SHA, not latest.

  3. Trivy scan

    Fail on CRITICAL and HIGH CVEs that have a fix available. Export an SBOM in CycloneDX or SPDX format as a CI artifact for audit trail and supply chain attestation.

  4. Push to registry

    Immutable SHA-tagged images allow exact rollbacks and eliminate ambiguity about which build is running in production.

  5. Nightly base rebuild

    Schedule a nightly (or at least weekly) CI run that rebuilds the final stage with docker build --pull --no-cache even when no code has changed, so base image security patches are picked up as they are published.

Source: ClimsTech Engineering

Pitfalls that cost teams the most

apt-get without cache cleanup

# Wrong: apt cache stays in the layer permanently
RUN apt-get update
RUN apt-get install -y curl
 
# Correct: single RUN, cache cleaned inside the same layer
RUN apt-get update \
  && apt-get install -y --no-install-recommends curl \
  && rm -rf /var/lib/apt/lists/*

Each RUN instruction commits a layer snapshot. The package lists written by apt-get update in layer N cannot be removed in layer N+1: they are baked into layer N's compressed archive, and a later rm only hides them. The --no-install-recommends flag prevents apt from pulling in recommended packages (the Recommends field, which apt installs by default): optional components that are rarely needed inside a container and add 5–20 MB per package.

Running as root in production

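Distroless images define a nonroot user at UID 65532 and GID 65532. Omitting USER nonroot on the default distroless tags means the container process runs as root, which violates most enterprise security policies, fails the restricted profile of the Kubernetes Pod Security Standards, and makes a container escape materially more dangerous. Many managed Kubernetes services now enforce runAsNonRoot: true at the namespace level; a container whose image runs as root is then rejected by the kubelet and never starts. Because the kubelet cannot verify a user that is given by name, the numeric form USER 65532:65532 is the safer spelling when runAsNonRoot is enforced. Add the USER instruction as the second-to-last line in your final stage. The official node images already ship an unprivileged node user (UID 1000); for other non-distroless bases, create a dedicated system user explicitly: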

RUN addgroup --system --gid 1001 appgroup \
  && adduser --system --uid 1001 --ingroup appgroup appuser
USER appuser

Alpine and musl: the silent failure mode

Switching from node:22-slim to node:22-alpine saves approximately 85 MB and is a sensible optimisation for pure JavaScript services. The trap is native Node addons. Packages like sharp, better-sqlite3, bcrypt, and canvas load prebuilt binaries, and a binary built for glibc (for example, node_modules installed on a Debian build stage or CI runner and copied into the Alpine image) cannot load on Alpine's musl libc. The container starts, passes health checks, and then throws from Module._extensions with an error such as "Error loading shared library ld-linux-x86-64.so.2: No such file or directory" on first use of the affected module, not on startup. The failure is silent until the specific code path executes.

The diagnosis is one command before switching base images. Run it against the full tree, because native addons are often transitive dependencies:

npm ls --all | grep -iE 'sharp|sqlite|bcrypt|canvas|grpc|node-gyp'

If anything matches, use node:22-slim or distroless/nodejs22-debian12.

Using latest as a base tag

# Unpredictable, unauditable, unrollbackable
FROM node:latest
 
# Correct: pin to a specific minor version
FROM node:22.11-slim
 
# Maximum reproducibility: pin to the image digest
FROM node:22.11-slim@sha256:a3f9...

latest resolves to a different image whenever the image maintainers publish a new release, including a new major version. A CI build that passed on Monday may fail on Tuesday because latest now points to Node.js 24. More significantly, you cannot reconstruct which base image a production image was built from, which is a requirement for any meaningful CVE audit or incident investigation. Version tags are also mutable (22.11-slim is rebuilt when its underlying packages are patched), so for supply chain security requirements the SHA digest pin is the only fully reproducible option.
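A check for unpinned bases is easy to automate. The Python sketch below is a hypothetical helper, not a replacement for a real linter: it scans FROM lines and reports images that use latest or carry no tag at all, while skipping scratch and references to earlier build stages. It assumes one FROM per line and does not resolve ARG substitutions such as ${BASE_IMAGE}.

```python
import re

FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+(?P<alias>\S+))?",
    re.IGNORECASE,
)


def classify_base(image):
    """Return 'scratch', 'digest', 'tag', 'latest', or 'untagged'."""
    if image.lower() == "scratch":
        return "scratch"
    if "@sha256:" in image:
        return "digest"
    # Look only at the last path segment so a registry port is not read as a tag.
    name = image.rsplit("/", 1)[-1]
    if ":" not in name:
        return "untagged"
    return "latest" if name.split(":", 1)[1] == "latest" else "tag"


def unpinned_bases(dockerfile_text):
    """List base images that resolve to a moving target (latest or no tag)."""
    stage_names, problems = set(), []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group("image")
        is_stage_ref = image.lower() in stage_names
        if not is_stage_ref and classify_base(image) in {"latest", "untagged"}:
            problems.append(image)
        if match.group("alias"):
            stage_names.add(match.group("alias").lower())
    return problems


EXAMPLE = """\
FROM node:latest AS build
FROM node:22.11-slim AS deps
FROM build AS test
FROM gcr.io/distroless/nodejs22-debian12
FROM scratch
"""
print(unpinned_bases(EXAMPLE))
```

Note that an untagged reference is reported as well, since a base with no tag resolves to latest implicitly.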

Stale base images accumulate CVEs silently

An optimised Dockerfile built against a base image that was last refreshed six months ago ships all the CVEs published in those six months, regardless of how clean the Dockerfile structure is. This is the most common reason for teams to pass an initial image audit and then fail a re-audit three months later without having changed the Dockerfile. The nightly rebuild step in the process flow above addresses this directly, provided the rebuild actually pulls the base again (docker build --pull) and the base is pinned by tag; a digest pin never changes on its own and has to be bumped. For that, Renovate Bot and Dependabot both support automated PRs to bump base image tags and digests.

Large files silently included in build context

COPY . . includes everything the .dockerignore does not exclude. Teams that store test fixtures, database seed files, or model weights in the repository may be silently sending hundreds of megabytes to the Docker daemon before a single layer executes. The symptom is a slow "Sending build context to Docker daemon" line in CI logs on the legacy builder, or a slow "transferring context" step under BuildKit. Run docker build --progress=plain . to see the actual context transfer size, then add the offending paths to .dockerignore.

Maintaining the gains

Optimising the Dockerfile once is necessary but not sufficient. The decay patterns are predictable: a new engineer adds a RUN apt-get install without cache cleanup; a hotfix does COPY . . before restoring layer order; the base image pin drifts to a 14-month-old SHA; a devDependency migrates from the build stage to the run stage during a refactor.

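The most effective preventive control is hadolint in CI. It is a Dockerfile linter that catches common structural errors in under a second: root left as the last USER (DL3002), apt lists not cleaned (DL3009), untagged or latest base images (DL3006, DL3007), and pipes without pipefail (DL4006). It does not check instruction order, so COPY . . before RUN npm ci still has to be caught in review. The GitHub Actions step: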

- uses: hadolint/hadolint-action@v3.1.0
  with:
    dockerfile: Dockerfile
    failure-threshold: warning

The second control is treating the Dockerfile as a first-class review artefact rather than an afterthought. Engineers who understand layer ordering and base image tradeoffs catch regressions in code review before they reach the main branch. A five-line summary of the layer ordering rule in the repository CONTRIBUTING document goes a long way.

The third is scheduling: a nightly CI workflow that rebuilds the final image stage, runs Trivy, and opens a PR if new CRITICAL or HIGH CVEs appear. This keeps the security posture current without requiring any manual intervention.