Securing the CI/CD supply chain: DevSecOps that doesn't slow you down

The build pipeline used to be infrastructure. It is now a primary attack vector. SolarWinds' build environment was compromised before a single customer was touched. Codecov's CI bootstrap script was modified to exfiltrate environment variables, and tens of thousands of pipelines leaked credentials for two months before anyone noticed. The XZ Utils backdoor was planted through a years-long social engineering campaign targeting an open-source maintainer; it shipped in rolling and pre-release Linux distributions via a completely routine package update and was caught shortly before it reached stable releases. The 3CX breach reached customers through a trojanized upstream Electron dependency. The common thread across all of them: none were application vulnerabilities. They were pipeline vulnerabilities. DevSecOps done badly answers this with gates that generate 200 findings per PR and block on a 2016 CVE in a package nobody loads at runtime, so teams learn the override workflow inside a week. Done well, controls are inline, the signal-to-noise ratio is high, and a blocked build means someone needs to act today.

Software supply chain threat growth

156% YoY growth in open-source malware (2023 to 2024)

742% average annual growth in supply chain attacks (3-year average)

95% of vulnerable packages had a fix available at time of consumption

Source: Sonatype, 10th Annual State of the Software Supply Chain, 2024

The six links you are actually defending

Before tooling, inventory what you are protecting. The software supply chain has six distinct links. Most teams secure the first two and ignore the rest. The major breaches of the last five years exploited links three through six.

1. Source: your code, your Infrastructure-as-Code, and your CI configuration files. A .github/workflows/deploy.yml with AWS_ACCESS_KEY_ID in its environment block is as sensitive as any application file. The CI configuration is the specification of your entire build and deploy process, and it runs with elevated permissions by design.

2. Dependencies: direct and transitive open-source packages. The PyTorch nightly compromise in December 2022 exploited a package named torchtriton that engineers never requested themselves: it was pulled in as a dependency of the nightly build, and a malicious package of the same name on PyPI took precedence over the legitimate one on PyTorch's own index. You cannot protect what you cannot see.

3. Base images: a FROM node:20 in your Dockerfile inherits Debian's full package surface plus Node's. A production distroless image ships with a handful of packages. The gap between these two baselines is your avoidable attack surface.

4. Build credentials: GITHUB_TOKEN, AWS access keys, container registry push credentials. A long-lived credential stored in a CI environment variable does not expire, is often scoped to the team rather than the job, and is one accidental echo away from appearing in a build log.

5. Artifacts: the compiled binary, container image, or published package that leaves your pipeline. Without signing and provenance attestation, "we deployed version X" is an assertion. With them, it is a verifiable cryptographic fact.

6. Deploy path: the Kubernetes RBAC, Helm values, Terraform state, and CD tooling connecting an artifact to running infrastructure. A CI job with cluster-admin is a blast radius measured in entire clusters.

Map your dependencies before you scan them

Running Grype or Trivy without understanding the dependency graph produces noise, not risk signal. Before configuring any scanner, measure the actual surface you are defending:

# Node.js: count every package in the resolved tree (the first line is the project root)
npm ls --all --parseable 2>/dev/null | tail -n +2 | wc -l
 
# Go: unique module@version entries in the full dependency graph
go mod graph | tr ' ' '\n' | sort -u | wc -l
 
# Python: all installed packages in the current environment
pip list --format=json | jq 'length'

A bare Express.js application routinely resolves 150–300 transitive packages. A Go service using common infrastructure clients frequently crosses 100 modules. That is your actual attack surface, not the dozen entries in your package.json or go.mod.
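The gap between declared and resolved packages can also be measured directly from a lockfile. The following is a minimal sketch, assuming an npm lockfile in the v2/v3 layout (a top-level "packages" map whose empty key is the project root); the function name and the inlined sample are hypothetical and exist only for illustration.

```python
import json


def count_lockfile_packages(lock: dict) -> dict:
    """Split the entries of an npm lockfile (v2/v3 layout) into direct and transitive."""
    packages = lock.get("packages", {})
    root = packages.get("", {})
    direct_names = set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))

    direct = transitive = 0
    for path in packages:
        if path == "":  # the empty key is the project itself, not a dependency
            continue
        # A direct dependency is installed at node_modules/<name>; nested or
        # unlisted paths were pulled in by some other package.
        if path.removeprefix("node_modules/") in direct_names:
            direct += 1
        else:
            transitive += 1
    return {"direct": direct, "transitive": transitive}


# Hypothetical lockfile fragment, inlined so the sketch runs without a project.
SAMPLE_LOCK = json.loads("""
{
  "packages": {
    "": {"dependencies": {"express": "^4.19.0"}},
    "node_modules/express": {"version": "4.19.2"},
    "node_modules/accepts": {"version": "1.3.8"},
    "node_modules/express/node_modules/debug": {"version": "2.6.9"}
  }
}
""")

print(count_lockfile_packages(SAMPLE_LOCK))  # {'direct': 1, 'transitive': 2}
```

Against a real project, load package-lock.json with json.load and pass the result to the same function; the transitive count is the number to track over time.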

The remediation gap: fix availability vs. actual patching

Vulnerable packages consumed where a patched version already existed: 95%

Vulnerable dependencies actually patched within 12 months: ~20%

Source: Sonatype, 10th Annual State of the Software Supply Chain, 2024

The chart above illustrates the core problem in dependency security. Sonatype's 2024 research found that 95% of the time a vulnerable open-source package is pulled into a build, a patched release is already published. Separately, 80% of application dependencies go un-upgraded for more than a year. The ~20% patched-within-12-months figure above is derived directly from that finding. The remediation gap is not technical — the fixes exist and are ready. It is operational: teams have not built the habit of consuming available updates, partly because scanner alert volume makes genuine signal hard to distinguish.

The controls worth running on every build

Not all security controls belong in the critical path of a PR. The productive question is not "what can we scan?" but "what justifies blocking a merge right now?" This table governs which controls belong at which stage and what the correct failure mode is:

| Control | Pipeline stage | Block the build when... | Otherwise |
|---|---|---|---|
| SCA (Grype, Trivy, Snyk) | PR + merge | Critical or high severity, fix available, in the production artifact | Open a ticket |
| Secrets scan (Gitleaks, TruffleHog) | Pre-commit + PR | Any credential pattern matches | Block immediately |
| SAST (Semgrep, CodeQL) | PR | High-confidence, high-severity rule hit in new code | Open a ticket |
| Dockerfile lint (Hadolint) | PR | FROM :latest, root user, no explicit user directive | Warn on PR only |
| Container image scan (Trivy) | Post-build + deploy gate | Critical OS-level CVE, publicly exploitable, fix available | Open a ticket |
| License audit (FOSSA, Scancode) | PR | Copyleft license in a proprietary binary | Escalate |
| Dependency confusion check | PR + build | Package resolves from public registry instead of internal proxy | Block immediately |
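The SCA row of the table reduces to a three-part condition, which is easy to express as code. This is a sketch under stated assumptions: the Finding fields and the sca_decision name are hypothetical and do not correspond to any scanner's output schema; a real pipeline would populate them from the scanner's JSON.

```python
from dataclasses import dataclass

# Severities that may block a merge, per the SCA row of the table.
BLOCKING_SEVERITIES = {"critical", "high"}


@dataclass
class Finding:
    vuln_id: str
    severity: str
    fix_available: bool
    in_production_artifact: bool


def sca_decision(finding: Finding) -> str:
    """Return "block" only when all three conditions hold, otherwise "ticket"."""
    blocks = (
        finding.severity.lower() in BLOCKING_SEVERITIES
        and finding.fix_available
        and finding.in_production_artifact
    )
    return "block" if blocks else "ticket"


# Hypothetical findings: same severity, different context.
prod_fixable = Finding("CVE-EXAMPLE-1", "High", True, True)
dev_only = Finding("CVE-EXAMPLE-2", "High", True, False)
no_fix = Finding("CVE-EXAMPLE-3", "Critical", False, True)

for f in (prod_fixable, dev_only, no_fix):
    print(f.vuln_id, sca_decision(f))
```

Only the first finding blocks. The other two still produce work, but as tickets rather than failed builds.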

Getting SCA calibration right

The default behavior of most SCA tools is to fail on any finding above a CVSS score threshold. CVSS scores a vulnerability in isolation — it does not account for whether the vulnerable code path is reachable from your application, whether exploitation requires network access your service does not expose, or whether a fix exists at all. A better Grype configuration:

# .grype.yaml — commit this to the repository root
fail-on-severity: high
ignore:
  # Do not block on findings with no published fix
  - fix-state: "not-fixed"
  - fix-state: "wont-fix"
  # Document suppressed false positives with rationale
  - package:
      name: "webpack-dev-server"
      type: "npm"
    vulnerability: "CVE-2024-XXXXX"
    reason: "dev-only dependency, absent from production image"

For finer-grained policy, feed Grype's JSON output through Open Policy Agent. The EPSS (Exploit Prediction Scoring System) probability score adds useful signal alongside CVSS: a CVSS 9.8 with EPSS 0.003 carries a different operational priority than a CVSS 7.5 with EPSS 0.42, where the latter indicates roughly a 42% probability of observed exploitation within 30 days.

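# policy/vulns.rego: committed alongside application code, reviewed in PR
package main

# Rego v1 syntax; accepted by OPA 0.59+ and required by OPA 1.x
import rego.v1

deny contains msg if {
    # In Grype's JSON, artifact and vulnerability are siblings inside each match
    m := input.matches[_]
    m.vulnerability.severity == "Critical"
    m.vulnerability.fix.state == "fixed"
    msg := sprintf("Critical CVE with fix available: %s in %s@%s",
        [m.vulnerability.id, m.artifact.name, m.artifact.version])
}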

This is security policy as code: reviewable, diffable, not dependent on a GUI toggle set three years ago by someone who has since left the company.
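The CVSS-versus-EPSS comparison above can be made concrete with a small triage sketch. The ordering rule here (EPSS first, CVSS as tie-breaker) is one possible choice and an assumption of this example, not a standard; the finding identifiers are hypothetical and the scores are the two pairs quoted in the text.

```python
def triage_order(findings: list) -> list:
    """Sort findings so the most likely-to-be-exploited come first.

    EPSS (probability of exploitation activity in the next 30 days) is the
    primary key; CVSS breaks ties between findings with equal EPSS.
    """
    return sorted(findings, key=lambda f: (f["epss"], f["cvss"]), reverse=True)


# The two cases from the text, with hypothetical identifiers.
findings = [
    {"id": "FINDING-A", "cvss": 9.8, "epss": 0.003},
    {"id": "FINDING-B", "cvss": 7.5, "epss": 0.42},
]

for f in triage_order(findings):
    print(f["id"], f["cvss"], f["epss"])
```

FINDING-B is listed first despite its lower CVSS score, which is the reprioritization the EPSS signal is meant to produce.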

Secrets scanning: pre-commit is not optional

Pipeline-stage secrets scanning is a backstop. The real control is preventing secrets from entering git history in the first place. Once a credential lands in a commit, rotating it is correct and necessary — but it is not sufficient. The secret may already exist in forks, collaborator clones, CI caches, and log aggregation systems.

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.21.2
    hooks:
      - id: gitleaks
        args: ["--config", ".gitleaks.toml"]

Run pre-commit install after adding this file. On first configuration, baseline the scan to suppress any pre-existing false positives, then treat every new match as a mandatory block.
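To make "credential pattern" concrete, here is a toy illustration of the kind of matching a secrets scanner performs. It is a sketch, not a substitute for Gitleaks: the two patterns (the widely documented AWS access key ID prefix format and a PEM private key header) and the scan_text helper are assumptions chosen for this example, and real scanners add entropy checks, allowlists, and hundreds of rules.

```python
import re

# Two illustrative rules; real rule sets are far larger.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list:
    """Return (line_number, rule_name) for every line that matches a rule."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
staged = 'region = "us-east-1"\nAWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n'
print(scan_text(staged))  # [(2, 'aws-access-key-id')]
```

A pre-commit hook runs this kind of check against staged changes and exits non-zero on any hit, which is what stops the commit from being created.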

Sign everything, ship an SBOM

Artifact signing and SBOM generation are the two controls with the widest blast-radius benefit per unit of implementation effort. They do not prevent an attack. They change the time-to-answer for "are we affected by this CVE?" from two days of log archaeology and Slack threads to a ten-second registry query.

Keyless signing with Sigstore

Sigstore's keyless workflow uses short-lived OIDC identity tokens from the CI environment. There are no private signing keys to rotate, store, or lose:

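# GitHub Actions: sign the built image by digest (never by tag).
# The job needs `permissions: id-token: write` so cosign can request the OIDC token.
# Keyless signing is the default in cosign 2.x; COSIGN_EXPERIMENTAL=1 is only needed on 1.x.
- name: Sign container image
  run: |
    cosign sign --yes \
      --rekor-url https://rekor.sigstore.dev \
      "${IMAGE_REGISTRY}/${IMAGE_NAME}@${IMAGE_DIGEST}"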
 
# At deploy time: verify before the workload is scheduled
- name: Verify image signature
  run: |
    cosign verify \
      --certificate-oidc-issuer https://token.actions.githubusercontent.com \
      --certificate-identity-regexp \
        "^https://github.com/your-org/your-repo/.github/workflows/.*" \
      "${IMAGE_REGISTRY}/${IMAGE_NAME}@${IMAGE_DIGEST}"

Signing without enforcing verification at admission is pure theater. The verification step belongs in an OPA Gatekeeper or Kyverno ClusterPolicy so that it cannot be bypassed by a deploy script that someone modifies under pressure.

Generating and attesting SBOMs

CISA published updated Minimum Elements for a Software Bill of Materials in 2025, promoting several previously "recommended" fields to baseline requirements and adding guidance for SaaS and AI software components. CISA, NSA, and 19 international partner agencies issued joint guidance urging cross-sector SBOM adoption. The regulatory trajectory is unambiguous: SBOMs are baseline hygiene, not an advanced capability reserved for compliance-heavy industries.

SBOM lifecycle: from build step to CVE response

1. Generate: Syft produces a CycloneDX or SPDX JSON SBOM from the final built image, not from the source tree. The multi-stage build strips dev tooling; scan what actually runs.

2. Attest: Cosign attests the SBOM as a signed predicate to the image digest in the OCI registry. The SBOM travels with the artifact, not alongside it in a separate store that can fall out of sync.

3. Verify at deploy: The admission controller or CD job verifies both the image signature and the SBOM attestation before scheduling the workload. No valid attestation means the deploy does not proceed.

4. Query on CVE: When a new CVE is published, query all attested SBOMs across the registry to identify affected images. This takes seconds. The alternative, tracing which teams pulled which dependency version across all deployments, takes days.

Source: CISA Minimum Elements for a Software Bill of Materials, 2025; Sigstore documentation

# Step 1: generate from the final image, not the source tree
syft "${IMAGE_REGISTRY}/${IMAGE_NAME}@${IMAGE_DIGEST}" \
  -o cyclonedx-json=./sbom.cyclonedx.json
 
# Step 2: attest the SBOM to the OCI registry alongside the image
cosign attest --yes \
  --predicate ./sbom.cyclonedx.json \
  --type cyclonedx \
  "${IMAGE_REGISTRY}/${IMAGE_NAME}@${IMAGE_DIGEST}"
 
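# Step 3: at deploy time, verify the SBOM attestation (same identity flags as the signature check)
cosign verify-attestation \
  --type cyclonedx \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --certificate-identity-regexp \
    "^https://github.com/your-org/your-repo/.github/workflows/.*" \
  "${IMAGE_REGISTRY}/${IMAGE_NAME}@${IMAGE_DIGEST}"
 
# Step 4: on CVE announcement, scan a stored SBOM for the CVE (repeat for each attested image)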
grype sbom:./sbom.cyclonedx.json \
  --add-cpes-if-none \
  --output json \
  | jq '.matches[] | select(.vulnerability.id == "CVE-YYYY-XXXXX")'

The SPDX and CycloneDX formats are both widely supported. CycloneDX has slightly better tooling support in the Grype/Syft ecosystem as of 2025; SPDX is the format referenced in NIST SP 800-218. Either is correct; what matters is that you generate, attest, and can query them.
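The "query on CVE" step is, at its core, a lookup across stored SBOMs. The sketch below assumes the SBOMs have already been fetched and parsed into dictionaries in CycloneDX JSON form (a top-level "components" list with "name" and "version" fields); the image references, package name, and versions are hypothetical placeholders.

```python
def images_containing(sboms: dict, name: str, versions: set) -> list:
    """Return the image references whose SBOM lists `name` at an affected version."""
    affected = []
    for image_ref, sbom in sboms.items():
        for component in sbom.get("components", []):
            if component.get("name") == name and component.get("version") in versions:
                affected.append(image_ref)
                break  # one match is enough to flag the image
    return sorted(affected)


# Hypothetical SBOMs keyed by image reference (digests shortened for readability).
SBOMS = {
    "registry.example.com/service-a@sha256:1111": {
        "bomFormat": "CycloneDX",
        "components": [{"type": "library", "name": "example-lib", "version": "2.4.0"}],
    },
    "registry.example.com/service-b@sha256:2222": {
        "bomFormat": "CycloneDX",
        "components": [{"type": "library", "name": "example-lib", "version": "2.5.1"}],
    },
    "registry.example.com/service-c@sha256:3333": {
        "bomFormat": "CycloneDX",
        "components": [{"type": "library", "name": "other-lib", "version": "1.0.0"}],
    },
}

print(images_containing(SBOMS, "example-lib", {"2.4.0", "2.4.1"}))
```

Only service-a is reported. Because the keys are digests rather than tags, the answer identifies exactly which built artifacts need a rebuild.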

Harden the pipeline itself

The scanning and signing controls protect what you build. These protect the infrastructure doing the building.

Before: long-lived CI credentials

AWS access key and secret stored as CI environment variables
Key is valid indefinitely unless manually rotated
Scoped to the team or project, not to the specific job
Appears in build logs on accidental echo or debug output
Shared across branches, environments, and developers

After: OIDC-federated credentials

CI job requests a short-lived AWS credential via OIDC token exchange
Credential expires after one hour by default
IAM role scoped to the specific ECR repos and ECS services this job touches
No long-lived secret stored in CI; the exchange yields only temporary credentials
Trust policy restricts to the specific repo and environment (or branch)

Replacing long-lived CI secrets with OIDC federation. Source: AWS IAM OIDC documentation; GitHub Actions OIDC documentation

The GitHub Actions configuration for OIDC:

jobs:
  deploy:
    permissions:
      id-token: write   # required to request the OIDC token
      contents: read
    environment: production
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-production-deploy
          aws-region: us-east-1
          # With OIDC the session defaults to 1 hour (role-duration-seconds: 3600)

The IAM trust policy — scope it as tightly as your deploy pattern allows:

{
  "Effect": "Allow",
  "Principal": {
    "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
  },
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": {
      "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
      "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:environment:production"
    }
  }
}

A trust policy scoped to repo:your-org/* (with a wildcard) allows every repository in your GitHub organization to assume the production deploy role — including repositories created by contractors last week. Use StringEquals with the full repo path and environment, not StringLike with a wildcard, unless you have a documented reason.
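The wildcard check described above is simple enough to automate in CI. The following is a minimal sketch that inspects a single trust policy statement, already parsed from JSON into a dictionary; the wildcard_subjects helper is hypothetical, and it only looks at the GitHub OIDC sub condition key used in the policy above.

```python
SUB_KEY = "token.actions.githubusercontent.com:sub"


def wildcard_subjects(statement: dict) -> list:
    """Return every `sub` pattern in the statement that contains a wildcard."""
    risky = []
    for _operator, conditions in statement.get("Condition", {}).items():
        value = conditions.get(SUB_KEY)
        if value is None:
            continue
        # IAM allows a condition value to be a single string or a list of strings.
        values = value if isinstance(value, list) else [value]
        risky.extend(v for v in values if "*" in v or "?" in v)
    return risky


tight = {
    "Effect": "Allow",
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringEquals": {SUB_KEY: "repo:your-org/your-repo:environment:production"}},
}
loose = {
    "Effect": "Allow",
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringLike": {SUB_KEY: "repo:your-org/*"}},
}

print(wildcard_subjects(tight))  # []
print(wildcard_subjects(loose))  # ['repo:your-org/*']
```

A non-empty result is a reasonable trigger for a mandatory review rather than an automatic failure, since a documented wildcard can be legitimate.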

Pin third-party Actions to commit SHAs

GitHub Actions version tags are mutable. uses: actions/checkout@v4 is a pointer that an account takeover can redirect to a malicious commit. Pin to the specific SHA that was reviewed:

# Good: pinned to a specific reviewed commit (confirm each SHA against the upstream release tag before copying it)
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
- uses: docker/setup-buildx-action@b5730b7e2b8d6b63e2c3e48e6e68df91ecf8e2b8  # v3.10.0
 
# Bad: the tag can be moved to point at different code without warning
- uses: actions/checkout@v4

Apply the same principle to Dockerfile base images:

# Pin by digest — the tag :20-alpine is for human readability only
FROM node:20-alpine@sha256:a73a7e659c9eed4dc916a68e7acb476f852e1bd6f3be1f02b14ca40daa17ec0c
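Pinning only helps if it stays enforced, so a small lint over workflow files is a natural companion. This sketch uses a line-based regular expression rather than a YAML parser to stay dependency-free, which is an assumption that holds for conventionally formatted workflows; the unpinned_actions name is hypothetical.

```python
import re

# Captures the reference after "uses:", stopping at whitespace or a trailing comment.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
# A full 40-character commit SHA after the "@".
SHA_RE = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list:
    """Return every `uses:` reference that is not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1)
        if ref.startswith("./"):  # actions inside this repository are versioned with it
            continue
        if not SHA_RE.search(ref):
            unpinned.append(ref)
    return unpinned


WORKFLOW = """\
steps:
  - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
  - uses: actions/checkout@v4
  - uses: ./.github/actions/local-build
"""

print(unpinned_actions(WORKFLOW))  # ['actions/checkout@v4']
```

Running this over every file in .github/workflows and failing on a non-empty result keeps new tag-based references from slipping in through later edits.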

Audit service account permissions

In Kubernetes, verify what the CI service account can actually do before granting it to a deploy job:

kubectl auth can-i --list \
  --as=system:serviceaccount:ci:deploy-agent \
  -n production

If the output includes secrets get, pods/exec, or any verb on resources in namespaces outside the deploy target, the service account is over-privileged. The correct scope is: read the current deployment, apply updated manifests, and update image references in the target namespace. Nothing else. cluster-admin bound to a CI service account is a complete cluster compromise waiting on a credential leak.

Pitfalls that appear in production

These are failure modes from real pipeline audits, not from documentation.

Pitfall: Scanning the wrong artifact. Most pipelines scan the source tree or the Dockerfile, not the final built image. A multi-stage Dockerfile's builder stage pulls in a compiler, test frameworks, and build tooling — 600+ packages. The production runner stage copies a single binary. Scanning the source sees 600 packages. Scanning the production image sees fewer than 20. Grype and Trivy both accept OCI image URIs. Scan the artifact that will actually run.

Pitfall: Dependency confusion attacks. A dependency confusion attack works by registering a public package with the same name as an internal package, at a higher version number. npm, pip, and gem all resolve the higher version from the public registry unless configured otherwise. Fix: use a private registry (AWS CodeArtifact, GCP Artifact Registry, Artifactory) that proxies and caches public registries, and configure your package manager to resolve exclusively from it. Validate in CI that no resolved package came from an unexpected source:

# Detect packages resolving outside your internal registry (npm lockfile)
jq '.packages | to_entries[]
    | select(.value.resolved != null)
    | select(
        .value.resolved
        | startswith("https://your-private-registry.example.com")
        | not
      )
    | .key' package-lock.json

Pitfall: Signing the tag, not the digest. Container tags are mutable. Cosign resolves a tag to a digest at the moment it runs, so signing your-registry/app:latest signs whatever that tag points to at that instant, which is not necessarily the image you just built if another push landed in between. The same race exists at deploy time: the tag can move between the verification step and the image pull, so the image that passed verification is not guaranteed to be the one that runs. Always sign and verify by digest. Your CD tooling should resolve the tag to a digest at deploy time and pin that digest in the Kubernetes manifest.
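The pinning step at the end of that pitfall is mostly string handling once the digest is known. Here is a sketch of that rewrite, assuming the digest has already been resolved by the registry or build tooling; the pin_to_digest helper and the example references are hypothetical.

```python
import re

DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def pin_to_digest(image_ref: str, digest: str) -> str:
    """Rewrite a tag-based image reference into a digest-pinned one."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    repo = image_ref.split("@", 1)[0]       # drop any digest already present
    head, sep, last = repo.rpartition("/")  # only the final path segment can carry a tag
    last = last.split(":", 1)[0]            # drop the tag; registry ports in `head` stay intact
    return f"{head}{sep}{last}@{digest}"


# Placeholder digest, not a real image.
EXAMPLE_DIGEST = "sha256:" + "a" * 64

print(pin_to_digest("your-registry/app:latest", EXAMPLE_DIGEST))
print(pin_to_digest("localhost:5000/app:1.2.3", EXAMPLE_DIGEST))
```

Splitting on the last path segment matters because a registry host can contain a colon of its own (a port), which a naive split on ":" would mistake for the tag separator.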

Pitfall: OIDC trust policies scoped too broadly. As described above: a trust policy scoped to an entire GitHub organization gives every repository in that organization access to your production deploy role. Tighten the sub condition to the specific repo and environment (or branch). Use StringEquals rather than StringLike where possible.

Pitfall: Transitive dependency updates being ignored. Dependabot and Renovate create PRs for direct dependencies. High-impact vulnerabilities frequently live in transitive packages that neither tool updates by default. Configure Renovate with transitiveRemediation: true, or add an allow rule with dependency-type: all to the Dependabot configuration so that indirect dependencies are included. These PRs are typically patch-level changes with low risk. The risk of not merging them accumulates quietly.

95% of the time a vulnerable open-source component is consumed, a patched version already exists in the registry — the gap is operational, not technical.

— Sonatype, 10th Annual State of the Software Supply Chain, 2024

A worked example: reaching a defensible baseline in one sprint

Scenario: an existing monorepo, three Node.js microservices and one Go service, no pipeline security controls, CI running on GitHub Actions. The goal is a defensible baseline in 10 working days without breaking current deployments.

Days 1–2: Audit without blocking. Run scanners in non-failing mode to understand the current state before changing any failure conditions.

# Trivy across the current production images — exit code 0 regardless of findings
for IMAGE in service-a service-b service-c service-d; do
  trivy image --exit-code 0 --format json \
    --output "trivy-${IMAGE}.json" \
    "your-registry/${IMAGE}:latest"
done
 
# Summarize by severity across all services
jq -s '[.[].Results[]?.Vulnerabilities[]?
        | .Severity]
        | group_by(.)
        | map({(.[0]): length})
        | add' trivy-*.json

Categorize the output into three buckets: critical-with-fix-available (block candidates in sprint), medium-no-fix (near-term tickets), everything else (background). Present this as a risk inventory, not a failing grade.
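The three-bucket split can be scripted against the Trivy reports produced in the audit step. This is a sketch, assuming Trivy's JSON layout of Results[].Vulnerabilities[] with Severity, FixedVersion, and VulnerabilityID fields; the bucket names and the sample report are hypothetical.

```python
def bucket(vuln: dict) -> str:
    """Assign one vulnerability to a risk-inventory bucket."""
    severity = vuln.get("Severity", "UNKNOWN")
    has_fix = bool(vuln.get("FixedVersion"))  # absent or empty means no published fix
    if severity == "CRITICAL" and has_fix:
        return "critical-with-fix"
    if severity == "MEDIUM" and not has_fix:
        return "medium-no-fix"
    return "background"


def categorize(report: dict) -> dict:
    """Group the vulnerability IDs of one Trivy report by bucket."""
    buckets = {"critical-with-fix": [], "medium-no-fix": [], "background": []}
    # `or []` guards against reports where Results or Vulnerabilities is missing or null.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            buckets[bucket(vuln)].append(vuln["VulnerabilityID"])
    return buckets


# Hypothetical report fragment with placeholder IDs.
REPORT = {
    "Results": [
        {"Target": "service-a", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-EXAMPLE-1", "Severity": "CRITICAL", "FixedVersion": "1.2.4"},
            {"VulnerabilityID": "CVE-EXAMPLE-2", "Severity": "MEDIUM"},
            {"VulnerabilityID": "CVE-EXAMPLE-3", "Severity": "LOW", "FixedVersion": "3.0.1"},
        ]},
        {"Target": "package-lock.json"},  # a target with no findings
    ]
}

print(categorize(REPORT))
```

The first bucket becomes the list of block candidates for the sprint; the other two feed the ticket backlog.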

Days 3–4: Secrets scanning, no exceptions. Add pre-commit Gitleaks and a pipeline scan job. Run a historical scan on the last 200 commits before going live with the pre-commit hook.

# Scan recent git history for exposed credentials
gitleaks detect --source . --log-opts="HEAD~200..HEAD"
 
# If findings exist: rotate the credential immediately, then assess
# whether history scrubbing is warranted (required for public repos or
# repos with external collaborators; optional for fully private repos)

Days 5–7: SCA with a calibrated block policy. Create .grype.yaml at the repository root with fail-on-severity: high and ignore rules for no-fix-available findings. Add Syft SBOM generation and Grype scanning to every PR workflow. Fail only on critical/high with a published fix. Route everything else to a Jira board via the pipeline's notification step.

# .github/workflows/security.yml (excerpt)
jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - name: Build image
        run: docker build -t "$IMAGE" .
 
      - name: Generate SBOM
        run: |
          syft "$IMAGE" -o cyclonedx-json=sbom.json
 
      - name: Scan SBOM with Grype
        run: |
          grype sbom:./sbom.json \
            --config .grype.yaml \
            --output table \
            --fail-on high

Days 8–10: OIDC migration for one service. Migrate the lowest-risk service's deploy job from long-lived secrets to OIDC. Validate the trust policy is scoped to repo and environment. Document the IAM role's permission set. Do not delete the long-lived keys immediately — keep them for two weeks to confirm no job is still using them before revoking.

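At the end of this sprint, the pipeline has a first layer of defense against the incidents described earlier. A Codecov-style environment variable exfiltration finds no long-lived credentials to steal in the migrated service (OIDC federation), newly committed secrets are blocked before they reach history (pre-commit and pipeline secrets scanning), and known-vulnerable dependencies with a published fix fail the build (calibrated SCA). A PyTorch-style dependency confusion attack is only covered once the registry source check from the pitfalls section is added to CI. The full hardening program (Sigstore signing, attested SBOMs, SLSA level 2 provenance, OPA admission policies) is a quarter's work. This is the first sprint.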

What to remember

Inventory all six supply chain links before choosing any tooling. Source and dependencies are two of six — base images, build credentials, artifacts, and the deploy path are where the major breaches of the last five years lived.
Calibrate SCA to block only on: critical or high severity, fix available, present in the production artifact. Everything else is a ticket, not a build failure. Security that survives contact with a deadline is security that actually blocks real threats.
Secrets scanning must run pre-commit, not only in CI. Once a credential lands in git history, rotating it is necessary but not sufficient — it persists in forks, clones, CI caches, and log systems.
Replace long-lived CI credentials with OIDC federation scoped to the specific repository, branch, and environment. A wildcard trust policy covering an entire organization grants your production deploy role to every repo in the org.
Sign artifacts by content digest using Sigstore and attest a CycloneDX or SPDX SBOM to the OCI registry. Querying attested SBOMs when a CVE lands takes seconds; Slack and wiki archaeology takes days.
Pin third-party Actions and base images to commit SHAs or content digests. Mutable tags are a silent trust dependency on every upstream maintainer account — any one of which can be compromised.