Container image bloat is a tax on every deployment — not a dramatic one-time cost, but a compounding one. Slower autoscaling at peak load, more CVEs to triage every sprint, a larger attack surface for any runtime compromise. The standard Debian-based node:22 image ships with roughly 50 to 60 known vulnerabilities before you write a single line of application code — 15 to 20 of them rated high or critical, per Minimus platform data from 2025. Research from Alibaba Cloud's Function Compute team found that image pulls account for more than 72% of total cold-start latency in containerised workloads. None of this happens because teams are careless. It happens because developer-convenience defaults get applied to production workloads and never revisited. Multi-stage builds, a minimal base image, and disciplined layer ordering cut image size by 85–95%. This post is the complete technical playbook.
- 50–60 CVEs in node:22 before your code (Minimus, 2025)
- Over 72% of cold-start time spent pulling the image (Alibaba FaaSNet research)
- 95% potential size reduction versus a full base image (Docker hardened images, 2024)

Source: Minimus 2025; Alibaba Cloud FaaSNet research; Docker hardened images documentation, 2024
Why image size is an operations problem, not just a hygiene one
The instinct to treat image size as cosmetic misunderstands where the cost lands.
Autoscaling latency. Kubernetes spins up new pods when load spikes. A new pod on a fresh node must pull the image before it can serve a single request. A 1.1 GB image at a realistic registry pull rate of 100 MB/s takes roughly 11 seconds to transfer before container initialisation even starts. A 30 MB distroless image takes under a second. During a traffic burst that triggered the HPA, that 10-second gap per pod is the window in which your users see errors. With 20 pods scaling simultaneously, you are looking at a meaningful fraction of a minute where the capacity you provisioned has not materialised.
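The pull-time arithmetic above can be sketched in a few lines. This is a rough estimate under stated assumptions: the function name `pull_seconds` is illustrative, the 100 MB/s rate is the figure from the paragraph above, and the model counts transfer time only (no registry latency, layer extraction, or compression effects).

```python
def pull_seconds(image_mb: float, rate_mb_per_s: float = 100.0) -> float:
    """Estimate transfer time for an image pull (transfer only, no extraction)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return image_mb / rate_mb_per_s


full = pull_seconds(1100)    # the 1.1 GB image from the text
minimal = pull_seconds(30)   # the 30 MB image from the text

print(f"full: {full:.1f}s, minimal: {minimal:.1f}s, gap: {full - minimal:.1f}s")
# prints: full: 11.0s, minimal: 0.3s, gap: 10.7s
```

Plugging in a measured pull rate from your own cluster gives a more honest number than the default used here.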
CVE triage burden. In 2024, Aqua Security tracked approximately 28,000 newly disclosed CVEs. Every package in your image is a target surface. Shipping node:22 — which includes a full Debian system with curl, apt, bash, libsystemd, perl, and dozens of other packages — means your security team receives a long finding list, most entries relating to OS packages the application never calls. Signal-to-noise degrades, and real findings get buried. A distroless or minimal-base image collapses that list to a handful of entries that are actually relevant.
Registry and network costs. Images are pushed on every CI build. On a team running 50 builds per day, a 1 GB image versus a 100 MB image is up to 45 GB of additional registry traffic daily (900 MB extra per push, times 50 pushes, assuming no layers are reused between pushes). That number matters on bandwidth-metered registries: Amazon ECR, Google Artifact Registry, and Azure Container Registry all bill for cross-region data transfer.
Runtime attack surface. If an attacker exploits an application-level vulnerability and achieves code execution, a full Debian base hands them bash, curl, wget, apt, and Python to pivot with. A distroless image ships none of those tools. It does not eliminate the exploit, but it substantially constrains what the attacker can do next: a meaningful improvement in defence in depth without any application code change.
Multi-stage builds: the highest-leverage change
A multi-stage Dockerfile uses multiple FROM instructions. Each stage creates a fresh filesystem. The final stage — the one that becomes the production image — contains only what you explicitly copy into it with COPY --from=build or COPY --from=deps. The compiler, build tools, dev dependencies, and test harness never appear in the artifact shipped to production.
Node.js: three-stage pattern
The pattern below separates production dependency installation, the build, and the runtime:
# Stage 1: production dependencies only (no devDependencies)
FROM node:22-slim AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Stage 2: full build (needs devDependencies for TypeScript, bundler, etc.)
FROM node:22-slim AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: runtime — distroless, no shell, nonroot user
FROM gcr.io/distroless/nodejs22-debian12 AS run
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
USER nonroot
CMD ["/app/dist/server.js"]

The deps stage installs only what runs in production. The build stage gets everything it needs. The run stage gets only the compiled output and the production node_modules. Nothing from the build toolchain reaches the registry.
Go: FROM scratch
Go's static compilation makes the size reduction most dramatic. A statically linked binary needs nothing from the host OS:
# Build stage: compile a fully static binary
FROM golang:1.22-alpine AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags="-s -w" \
    -trimpath \
    -o /app/server .

# Runtime stage: nothing but the binary and the CA bundle
FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /app/server /server
# scratch has no /etc/passwd, so use a numeric unprivileged UID:GID
USER 65532:65532
EXPOSE 8080
ENTRYPOINT ["/server"]

The result is your binary plus a CA certificate bundle. Depending on application size, this typically lands between 10 and 25 MB. Zero OS packages, zero base-image CVEs. The only CVEs that can appear are in Go's standard library and the modules your binary links in, direct or transitive.
Two flags warrant explanation. -ldflags="-s -w" strips the symbol table and DWARF debug information, shaving 20–30% off binary size with no runtime impact. -trimpath removes local filesystem paths from the compiled binary, preventing build-system paths from leaking into production stack traces.
The payoff in concrete terms
Before: single-stage node:22
- Full Debian bookworm base, over 1 GB on disk (uncompressed)
- Both dev and prod node_modules in final layer
- bash, curl, apt, perl present at runtime
- 50–60 CVEs before your application code
- Runs as root (UID 0) by default
After: three-stage distroless build
- gcr.io/distroless/nodejs22-debian12 — approximately 160 MB
- Production node_modules and compiled dist only
- No shell, no package manager, no extraneous OS tools
- Fewer than 10 CVEs in the base layer
- Runs as nonroot (UID 65532)
Choosing the right base image
Every base image decision is a trade-off across size, CVE count, tooling compatibility, and debuggability. The table below applies specifically to a Node.js stack; the same hierarchy holds for JVM and Python with their respective distroless variants.
| Base | Image size, uncompressed (approx.) | Typical CVE count | Has shell | When to use |
|------|--------------------------|-------------------|-----------|-------------|
| node:22 | over 1 GB | 50–60 | bash | Development only. Never production. |
| node:22-slim | ~250 MB | 30–40 | bash | Acceptable short-term migration step |
| node:22-alpine | ~165 MB | 10–20 | sh | Good general default; verify glibc compat first |
| gcr.io/distroless/nodejs22-debian12 | ~160 MB | under 10 | none | Production Node.js services |
| gcr.io/distroless/base-debian12 | 29.7 MB | minimal | none | Go/Rust with glibc linkage |
| gcr.io/distroless/static-debian12 | 1.9 MB | minimal | none | Go/Rust with CGO_ENABLED=0 |
| scratch | 0 MB | 0 | none | Fully static binaries; copy in TLS certs manually |
The distroless size figures come from GoogleContainerTools documentation published in mid-2024. The node:22 and Alpine figures are approximate benchmarks that vary by minor version; the order of magnitude is stable. The chart below shows the comparison for the most common Node.js bases:
Alpine and native modules. Alpine uses musl libc rather than glibc. For pure JavaScript workloads this is irrelevant. For services with native Node addons (sharp, better-sqlite3, bcrypt, canvas, grpc), a prebuilt binary linked against glibc fails when the addon is first loaded. If the module is required lazily, that happens at the point of first use, not at container start: the container passes health checks, then throws a loader error from Module._extensions when the specific code path runs. If the dependency tree includes any native addon without a musl build, use node:22-slim or distroless/nodejs22-debian12 (both are Debian/glibc-based). Check the full tree with npm ls --all before switching, because native addons are often transitive dependencies.
Debugging without a shell. The absence of a shell in distroless is a security property, and it is usually the first thing that surprises teams. kubectl exec -it pod -- /bin/bash fails because that executable does not exist in the image. The correct approach is to debug against a separate build stage that retains the shell (docker build --target build .) or to attach an ephemeral debug container: kubectl debug -it pod --image=busybox --target=<container-name>, where --target names the container in the pod spec, not the Dockerfile stage. Reaching for the :debug tag on distroless images, which adds busybox, should be a deliberate, temporary decision with a ticket to remove it, not the permanent baseline.
Layer ordering: free rebuild speed
Docker layer caching is simple in theory and routinely mis-applied in practice. Every instruction creates a layer. When an instruction changes, every subsequent layer is invalidated and rebuilt from scratch. The rule is: place instructions that change most frequently at the bottom.
For a Node.js service, the correct order is:
FROM node:22-slim AS build

# Layer 1: system packages (change almost never)
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 make g++ \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Layer 2: dependency manifests (change only when adding/removing packages)
COPY package*.json ./

# Layer 3: install (expensive; only busts when manifests change)
RUN npm ci

# Layer 4: application source (changes every commit)
COPY . .

# Layer 5: compiled artifact (depends on source)
RUN npm run build

The violation that causes the most CI waste is COPY . . placed before RUN npm ci. A change to any file then busts the dependency cache, turning a cache hit of a few seconds into a cold install that can take minutes. The same mistake appears in many README examples that get copy-pasted into real projects.
Concrete math. Suppose npm ci takes 90 seconds cold and 4 seconds from cache. A team of five engineers making 8 commits per day generates 40 CI builds per day. With correct layer ordering, 38 of those 40 builds hit the npm cache (only the two that actually change package.json miss it). The saving is 38 builds multiplied by 86 seconds each: 3,268 seconds — just under 55 minutes of CI time recovered per day — on one service. Multiply by the number of services in the organisation.
One team, one service, correct Dockerfile layer order: roughly 55 minutes of CI time returned per day.
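The calculation above is easy to rerun with your own numbers. The sketch below reproduces it; the function name `daily_ci_savings` and its parameters are illustrative, and the inputs are the figures quoted in the text.

```python
def daily_ci_savings(builds_per_day: int, manifest_changes: int,
                     cold_s: float, cached_s: float) -> tuple[int, float]:
    """Return (cache hits, seconds saved per day) for a correctly ordered Dockerfile."""
    cache_hits = builds_per_day - manifest_changes
    saved_s = cache_hits * (cold_s - cached_s)
    return cache_hits, saved_s


# Five engineers, 8 commits each; two builds per day touch package.json.
hits, saved = daily_ci_savings(builds_per_day=5 * 8, manifest_changes=2,
                               cold_s=90, cached_s=4)
print(f"{hits} cache hits, {saved} s saved (~{saved / 60:.1f} min/day)")
# prints: 38 cache hits, 3268 s saved (~54.5 min/day)
```

The result only holds if the CI runner actually preserves the layer cache between builds, which is an assumption worth checking for your pipeline.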
.dockerignore: keep secrets and junk out of every layer
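.dockerignore uses syntax similar to .gitignore and must sit at the root of the build context. One difference matters: patterns are matched relative to the context root, so a bare *.log or node_modules/ only matches at the top level. Prefix a pattern with **/ to match at any depth. A baseline for a Node.js project:

**/node_modules/
.git/
.env
.env.*
**/*.log
coverage/
.nyc_output/
dist/
**/.DS_Store
**/*.test.ts
**/*.spec.ts
**/__tests__/
.github/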
The two entries that matter most for security are .env and .env.*. Any COPY . . instruction without a .dockerignore embeds those files into an image layer. Image layers are plain tar archives: not encrypted, not access-controlled beyond registry permissions. Anyone with pull access to the registry, or who receives a copy of the image tarball, can run docker save image -o image.tar, unpack the layer archives, and read the files, including files that a later layer deleted. Credentials embedded in layers are a documented and recurring class of secrets leak; it is not a theoretical risk.
The .git/ exclusion matters for build context size. Git object databases accumulate quickly on long-lived repositories. A repository with 18 months of history routinely carries 200–400 MB of git objects that serve no purpose inside a container image.
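One way to keep these exclusions from silently disappearing is a small CI guard. The sketch below is illustrative only: `missing_ignore_entries` and the `REQUIRED` tuple are hypothetical names, and the check is a literal comparison of entries, not an implementation of Docker's pattern matching.

```python
REQUIRED = (".env", ".env.*", ".git", "node_modules")  # assumed minimum set


def missing_ignore_entries(dockerignore_text: str, required=REQUIRED) -> list[str]:
    """Return required entries that do not appear literally in a .dockerignore."""
    entries = set()
    for raw in dockerignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        entries.add(line.rstrip("/"))  # treat "dir/" and "dir" as the same entry
    return [name for name in required if name not in entries]


sample = """\
# build junk
node_modules/
.git/
.env
*.log
"""
print(missing_ignore_entries(sample))
# prints: ['.env.*']
```

In CI, the same function could read the real file and exit non-zero when the returned list is not empty.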
One precision note: COPY --chown=user:group src dest is more efficient than a subsequent RUN chown -R. Changing ownership in a later RUN forces every affected file to be copied into a new layer, so the files are stored twice and that part of the image roughly doubles in size. The --chown flag sets ownership at copy time, with no extra layer at all.
Scanning images in CI
The goal of image scanning is not zero CVEs — that is practically unachievable with any real runtime base — but a defensible posture: no unfixed, high-severity CVEs in packages the application actually ships. Trivy from Aqua Security is the most widely deployed open-source scanner and integrates cleanly into GitHub Actions:
name: Build and scan
on:
  push:
    branches: [main]
  pull_request:
jobs:
  build-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build production image
        run: |
          docker build \
            --target run \
            -t app:${{ github.sha }} .
      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: app:${{ github.sha }}
          format: table
          exit-code: '1'
          ignore-unfixed: true
          vuln-type: os,library
          severity: CRITICAL,HIGH
      - name: Export SBOM
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: app:${{ github.sha }}
          format: cyclonedx
          output: sbom.cdx.json
          vuln-type: os,library
      - uses: actions/upload-artifact@v4
        with:
          name: sbom
          path: sbom.cdx.json

ignore-unfixed: true is the critical flag. Without it, the build fails on CVEs for which no patched version of the package yet exists, meaning the build fails indefinitely regardless of what the team does. The objective is actionable findings: vulnerabilities where a patched package version is available and the team can act.
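To make the "actionable findings" idea concrete, here is a minimal sketch of the same filter applied to a scanner report in Python. It assumes the report is a Trivy-style JSON document with a Results list whose entries carry Vulnerabilities with Severity and FixedVersion fields; the sample data and CVE identifiers are made up for illustration, and `actionable_findings` is a hypothetical helper, not part of Trivy.

```python
BLOCKING = {"CRITICAL", "HIGH"}


def actionable_findings(report: dict) -> list[tuple[str, str, str]]:
    """Return (id, package, fixed version) for blocking findings that have a fix."""
    found = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            fixed = vuln.get("FixedVersion") or ""  # empty or absent means unfixed
            if vuln.get("Severity") in BLOCKING and fixed:
                found.append((vuln.get("VulnerabilityID", "?"),
                              vuln.get("PkgName", "?"), fixed))
    return found


# Made-up report fragment: only the first entry is both severe and fixable.
sample = {"Results": [{"Target": "app", "Vulnerabilities": [
    {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libfoo",
     "Severity": "HIGH", "FixedVersion": "1.2.4"},
    {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libbar",
     "Severity": "CRITICAL", "FixedVersion": ""},
    {"VulnerabilityID": "CVE-0000-0003", "PkgName": "libbaz",
     "Severity": "LOW", "FixedVersion": "2.0.1"},
]}]}

print(actionable_findings(sample))
# prints: [('CVE-0000-0001', 'libfoo', '1.2.4')]
```

The unfixed CRITICAL entry is deliberately excluded from the gate; it still belongs in a report someone reviews, just not in a check that blocks merges.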
Trivy vs Grype vs Snyk. Grype from Anchore is the main open-source alternative to Trivy. Both are actively maintained and competent for image scanning. Trivy has broader coverage — IaC scanning, secret detection, and SBOM generation alongside image scanning — which makes it the better single-tool choice. Snyk offers a commercial option with a richer policy engine, finer-grained suppression workflows, and faster vulnerability database updates in some categories; for teams with formal compliance requirements around scanner SLAs, the managed option is worth evaluating. The vulnerability database latency gap between free and commercial tools is real: Snyk has historically published fixes faster for certain CVE classes than the open-source databases that Trivy and Grype consume.
- 01 Lint Dockerfile: Run hadolint in CI. It catches a final USER left as root, apt lists not cleaned up after install, and shell pipe idioms that bypass error propagation.
- 02 Multi-stage build: Build targeting the run stage only. The build stage stays in CI cache but never reaches the registry. Tag with the git SHA, not latest.
- 03 Trivy scan: Fail on CRITICAL and HIGH CVEs that have a fix available. Export an SBOM in CycloneDX or SPDX format as a CI artifact for audit trail and supply chain attestation.
- 04 Push to registry: Immutable SHA-tagged images allow exact rollbacks and eliminate ambiguity about which build is running in production.
- 05 Nightly base rebuild: Schedule a nightly (or at minimum weekly) CI run that rebuilds the final stage even with no code changes, to pick up base image security patches as they are published.
Source: ClimsTech Engineering
Pitfalls that cost teams the most
apt-get without cache cleanup
# Wrong: apt cache stays in the layer permanently
RUN apt-get update
RUN apt-get install -y curl

# Correct: single RUN, cache cleaned inside the same layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

Each RUN instruction commits a layer snapshot. The package lists written by apt-get update in layer N cannot be removed in layer N+1, because they are already baked into layer N's archive. The --no-install-recommends flag stops apt from pulling in packages that are merely recommended: documentation, locale data, and optional components that are rarely needed inside a container and add 5–20 MB per package.
Running as root in production
Distroless images define a nonroot user at UID 65532 and GID 65532. Omitting USER nonroot means the container process runs as root, which violates most enterprise security policies and the restricted profile of the Kubernetes Pod Security Standards, and makes a container escape materially more dangerous. Many managed Kubernetes services now enforce runAsNonRoot: true at the namespace level; under that setting the kubelet refuses to start a container whose image would run as root. The kubelet also cannot verify a named user such as nonroot, so in those clusters use the numeric form USER 65532:65532 or set runAsUser in the pod's securityContext. Add the USER instruction as the second-to-last line in your final stage. For non-distroless bases, create a dedicated system user explicitly:
RUN addgroup --system --gid 1001 appgroup \
    && adduser --system --uid 1001 --ingroup appgroup appuser
USER appuser

Alpine and musl: the silent failure mode
Switching from node:22-slim to node:22-alpine saves approximately 85 MB and is a sensible optimisation for pure JavaScript services. The trap is native Node addons. Packages like sharp, better-sqlite3, bcrypt, and canvas rely on compiled binaries, and a prebuilt binary linked against glibc will not load on Alpine's musl libc. If the module is loaded lazily, the container starts, passes health checks, and then throws a dynamic-loader error (for example, a missing glibc loader or an unresolved GLIBC symbol version) on first use of the affected module, not on startup. The failure is silent until the specific code path executes.
The diagnosis is one command before switching base images:
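npm ls --all | grep -iE 'sharp|sqlite|bcrypt|canvas|grpc|node-gyp'

The --all flag matters: native addons are frequently transitive dependencies, which --depth=0 would not list. If anything matches, use node:22-slim or distroless/nodejs22-debian12, or confirm that the package publishes a musl build.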
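The same check can be run against the lockfile, which also covers packages that are not on a hand-written name list. The sketch below assumes a package-lock.json in lockfile version 2 or 3 format, where a top-level packages map keys each entry by its node_modules path and marks packages that run install scripts with hasInstallScript. The name list and the function `native_addon_candidates` are illustrative, and an install script is only a hint that a package may compile or download native code.

```python
KNOWN_NATIVE = {"sharp", "better-sqlite3", "bcrypt", "canvas", "grpc"}


def native_addon_candidates(lock: dict) -> list[str]:
    """List packages in a parsed package-lock.json that may contain native code."""
    hits = set()
    for path, meta in (lock.get("packages") or {}).items():
        if not path:
            continue  # the "" key is the root project itself
        # "node_modules/a/node_modules/b" -> "b"; scoped names keep their scope
        name = path.split("node_modules/")[-1]
        if name in KNOWN_NATIVE or meta.get("hasInstallScript"):
            hits.add(name)
    return sorted(hits)


# Made-up lockfile fragment for illustration.
sample_lock = {"packages": {
    "": {"name": "app"},
    "node_modules/express": {},
    "node_modules/sharp": {"hasInstallScript": True},
    "node_modules/foo/node_modules/bcrypt": {},
    "node_modules/@scope/tool": {"hasInstallScript": True},
}}

print(native_addon_candidates(sample_lock))
# prints: ['@scope/tool', 'bcrypt', 'sharp']
```

In practice the dictionary would come from json.load on the project's package-lock.json; anything the function returns deserves a test run on the Alpine image before the base is switched.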
Using latest as a base tag
# Unpredictable, unauditable, unrollbackable
FROM node:latest

# Correct: pin to a specific minor version
FROM node:22.11-slim

# Maximum reproducibility: pin to the image digest
FROM node:22.11-slim@sha256:a3f9...

latest resolves to a different image whenever the image maintainers publish a new release, including new major versions. A CI build that passed on Monday may fail on Tuesday because latest now points to Node.js 24. More significantly, you cannot reconstruct which base image a production image was built from, which is a requirement for any meaningful CVE audit or incident investigation. For supply chain security requirements, the SHA digest pin is the only fully reproducible option.
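A pinning rule like this is simple to enforce mechanically. The following sketch scans Dockerfile text for FROM lines that use latest or no tag at all; `unpinned_base_images` is a hypothetical helper, it treats scratch and references to earlier stages as fine, and it does not resolve ARG substitutions in image names.

```python
def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images referenced without a version tag or digest."""
    stages = set()
    findings = []
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        args = [p for p in parts[1:] if not p.startswith("--")]  # drop --platform=...
        if not args:
            continue
        image = args[0]
        alias = args[2].lower() if len(args) >= 3 and args[1].upper() == "AS" else None
        is_stage_ref = image.lower() in stages
        if image != "scratch" and not is_stage_ref and "@sha256:" not in image:
            last = image.rsplit("/", 1)[-1]  # ignore any registry host:port prefix
            tag = last.split(":", 1)[1] if ":" in last else ""
            if tag in ("", "latest"):
                findings.append(image)
        if alias:
            stages.add(alias)
    return findings


sample = """\
FROM node:latest AS deps
FROM node:22.11-slim AS build
FROM build AS test
FROM gcr.io/distroless/nodejs22-debian12 AS run
"""
print(unpinned_base_images(sample))
# prints: ['node:latest', 'gcr.io/distroless/nodejs22-debian12']
```

Note that the untagged distroless reference is flagged too: an image without a tag resolves to latest, so it drifts in the same way.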
Stale base images accumulate CVEs silently
An optimised Dockerfile built against a base image that was last refreshed six months ago ships all the CVEs published in those six months, regardless of how clean the Dockerfile structure is. This is the most common reason for teams to pass an initial image audit and then fail a re-audit three months later without having changed the Dockerfile. The nightly rebuild step in the process flow above addresses this directly. Alternatively, Renovate Bot and Dependabot both support automated PRs to bump base image tags and digests.
Large files silently included in build context
COPY . . includes everything the .dockerignore does not exclude. Teams that store test fixtures, database seed files, or model weights in the repository may be silently sending hundreds of megabytes to the Docker daemon before a single layer executes. The symptom is a slow Sending build context to Docker daemon step (legacy builder) or a slow transferring context step (BuildKit) in CI logs. Run docker build --progress=plain . to see the actual context transfer size, then add the offending paths to .dockerignore.
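When the context is unexpectedly large, it helps to see which top-level paths account for the bytes. The sketch below walks a directory and totals file sizes per top-level entry; `context_size_by_top_level` is an illustrative name, and the function deliberately ignores .dockerignore, so it shows what would be sent if nothing were excluded.

```python
import os


def context_size_by_top_level(root: str) -> list[tuple[str, int]]:
    """Total file sizes in bytes per top-level entry under root, largest first."""
    totals: dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files directly in root are grouped under "."
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or count symlinks
            totals[top] = totals.get(top, 0) + os.path.getsize(path)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Calling context_size_by_top_level(".") from the repository root and printing the first few entries usually points straight at the directory that belongs in .dockerignore.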
Maintaining the gains
Optimising the Dockerfile once is necessary but not sufficient. The decay patterns are predictable: a new engineer adds a RUN apt-get install without cache cleanup; a hotfix does COPY . . before restoring layer order; the base image pin drifts to a 14-month-old SHA; a devDependency migrates from the build stage to the run stage during a refactor.
The most effective preventive control is hadolint in CI. It is a Dockerfile linter that catches common structural errors (a final USER left as root, apt lists not cleaned, untagged or latest base images) in under a second. It does not reason about layer ordering, so a COPY . . placed before the dependency install still has to be caught in review:

- uses: hadolint/hadolint-action@v3.1.0
  with:
    dockerfile: Dockerfile
    failure-threshold: warning

The second control is treating the Dockerfile as a first-class review artefact rather than an afterthought. Engineers who understand layer ordering and base image tradeoffs catch regressions in code review before they reach the main branch. A five-line summary of the layer ordering rule in the repository CONTRIBUTING document goes a long way.
The third is scheduling: a nightly CI workflow that rebuilds the final image stage, runs Trivy, and opens a PR if new CRITICAL or HIGH CVEs appear. This keeps the security posture current without requiring any manual intervention.