Kubernetes ships optimized for getting started, not for staying safe. That tradeoff is deliberate — the project prioritizes usability, and a cluster that demands a security specialist before first use is one most teams will not adopt. But the gap between a default installation and a production-ready one is wide, and automated attack tooling understands the defaults better than most cluster operators do. A new cluster appearing on the internet faces its first automated probe within roughly 18 minutes (Kubezilla, 2025). The Red Hat 2024 Kubernetes Security Report found that over 50 percent of respondents identified misconfigurations as the leading cause of their security incidents — not zero-days, not sophisticated nation-state supply-chain attacks. Defaults and operator oversights. That is actually good news: the largest fraction of risk is addressable by operational discipline, not an arms race with well-resourced adversaries.
| Metric | Value | Source |
|---|---|---|
| Time to first attack probe (new cluster on internet) | ~18 min | Kubezilla 2025 |
| Incidents caused by misconfiguration | 50%+ | Red Hat 2024 |
| Production images with critical/high CVEs | 87% | Aikido Security 2024 |
| Orgs with over half of workloads over-privileged | 33% | CNCF Benchmark 2024 |
Source: Red Hat Kubernetes Security Report 2024; Kubezilla 2025; Aikido Security 2024; CNCF Benchmark Report April 2024
This post is not a 200-point checklist. It covers the layer-by-layer controls with the highest signal-to-effort ratio, with specific guidance on ordering and tradeoffs. None of these controls require a platform team or a Kubernetes security specialist to implement. They require discipline and a willingness to fix the things that break when you turn them on.
1. Network Policies: Default-Deny First, Then Allow
The default Kubernetes network model is a flat layer-3 mesh. Every pod can reach every other pod in the cluster across every namespace without restriction. A compromised service becomes a pivot point to every other service reachable from that pod, and without NetworkPolicy that means all of them. One misconfigured application, one compromised dependency, one leaked container credential, and an attacker has lateral movement across every workload in the cluster.
Installing a default-deny NetworkPolicy is the single highest-leverage change you can make to a default cluster. One YAML file, applied per namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
An empty podSelector matches every pod in the namespace. Including Egress in policyTypes is not optional: ingress-only default-deny leaves outbound traffic completely unrestricted, which still permits data exfiltration and command-and-control beaconing from a compromised container.
The blast-radius reduction is concrete and calculable. In a namespace with 15 services and flat networking, each service can reach all 14 others, which gives 15 × 14 = 210 possible directed lateral paths. After applying a strict allow-list where each service declares only its real upstream and downstream dependencies, a typical microservice communicates with two or three others. That leaves roughly 15 × 2 = 30 to 15 × 3 = 45 permitted paths across the namespace, and any single compromised service can reach 2 or 3 peers instead of 14. That is roughly an 80 to 85 percent reduction in lateral movement surface for the cost of a YAML file and an afternoon of traffic mapping.
Default flat networking
- Every pod reaches every other pod across all namespaces
- A compromised service can reach all 14 peers in a 15-service namespace (210 possible paths namespace-wide)
- Egress is unrestricted — exfiltration and C2 beaconing invisible at network layer
- No codified baseline means anomaly detection is impossible
Default-deny + explicit allow-list
- Each service explicitly declares its upstream and downstream peers
- Lateral reach collapses to 2–3 declared paths per service
- Egress is enumerated — unexpected destinations blocked by default
- Codified baselines make network-layer anomaly detection viable
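As an illustration of what "explicitly declares its upstream and downstream peers" can look like, here is a minimal sketch of one allow rule layered on top of the default-deny policy. The labels `app: payments-api`, `app: checkout`, and `app: payments-db`, and the ports, are hypothetical placeholders and not taken from any real deployment.

```yaml
# Hypothetical peers: replace the app labels and ports with your own.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-api-allow
  namespace: payments
spec:
  # Applies only to the payments-api pods.
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only the checkout service may call payments-api, and only on TCP 8080.
    - from:
        - podSelector:
            matchLabels:
              app: checkout
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # payments-api may open connections to its database and nothing else.
    - to:
        - podSelector:
            matchLabels:
              app: payments-db
      ports:
        - protocol: TCP
          port: 5432
```

NetworkPolicies are additive, so this rule widens the default-deny only for the selected pods. The pods still need a separate DNS egress rule to resolve service names.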
One implementation requirement: NetworkPolicy is only enforced if your CNI plugin supports it. Kubenet does not. Calico, Cilium, Weave, and Antrea all do. Cilium is worth specific attention for workloads that need Layer 7 policy (HTTP method and path filtering) or DNS-based egress restriction — both are outside the standard NetworkPolicy API.
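Before applying default-deny in production, add an explicit egress rule allowing DNS (UDP and TCP port 53 to your cluster DNS service) or you will cut pod DNS resolution immediately. Map traffic flows first, apply the deny, then add allow rules. The standard NetworkPolicy API has no warn or audit mode (those are Pod Security Admission labels and have no effect on network traffic), so rely on your CNI's flow visibility during the mapping phase, for example Hubble with Cilium, so you are not flying blind.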
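A minimal sketch of the DNS egress rule follows. It assumes cluster DNS runs in `kube-system` with the `k8s-app: kube-dns` pod label, which is the usual CoreDNS setup on kubeadm-style clusters. Check the labels on your own DNS pods before relying on it, and note that node-local DNS caches need a different rule.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  # Every pod in the namespace may talk to cluster DNS.
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in the same list item are ANDed:
        # DNS pods, and only in kube-system.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumption: default CoreDNS label
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```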
2. RBAC: No More "Temporary" cluster-admin
The most common finding in Kubernetes security reviews is a cluster-admin ClusterRoleBinding that was added during initial setup or an incident response session and was never removed. The comment in the commit usually says "temporary." The mechanism to enforce that intent does not exist in RBAC, so the binding stays indefinitely.
cluster-admin grants full control over every resource in the cluster — reading every Secret, modifying RBAC itself, scheduling privileged pods on any node. A service account with cluster-admin turns any container compromise that can access the mounted token into a full cluster takeover. Audit what exists first:
kubectl get clusterrolebindings \
  -o custom-columns='NAME:.metadata.name,ROLE:.roleRef.name,SUBJECTS:.subjects' \
  | grep cluster-admin
For each result, determine whether the subject genuinely requires cluster-wide admin rights. In nearly all cases the answer is no. Replace ClusterRole grants with namespace-scoped Role and RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: monitoring
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
The verbs matter as much as the resources. The table below shows the risk gradient:
| Verb or subresource | Risk | Notes |
|---|---|---|
| get, list, watch | Low | Read-only; still restrict on Secrets and credentials |
| create, update, patch | Medium | Enables resource creation and modification |
| delete | High | Enables disruption of running workloads |
| escalate, bind, impersonate | Critical | Can elevate privileges to cluster-admin transitively |
| * (wildcard) | Critical | Functionally equivalent to cluster-admin |
| pods/exec | Critical | Direct shell access on any matched pod |
Wildcards on service accounts are the most common path from workload compromise to cluster takeover in real incidents. The CNCF K8s Benchmark Report (April 2024, covering 330,000-plus workloads across hundreds of organizations) found that a third of organizations had more than half their workloads carrying excessive privileges — in most cases inherited from a permissive default or from an operator who used wildcards because correct scoping required effort that was not budgeted.
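A Role grants nothing until it is bound to a subject. As a sketch, the `pod-reader` Role from earlier in this section could be bound to a single service account like this. The account name `metrics-collector` is a hypothetical placeholder.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: monitoring
subjects:
  - kind: ServiceAccount
    name: metrics-collector   # hypothetical service account
    namespace: monitoring
roleRef:
  # roleRef is immutable; changing it means deleting and recreating the binding.
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```

One way to confirm the resulting scope is `kubectl auth can-i list pods -n monitoring --as=system:serviceaccount:monitoring:metrics-collector`, which should succeed in `monitoring` and be denied in other namespaces.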
Service accounts should also have automountServiceAccountToken: false set unless the workload actively calls the Kubernetes API. Most application pods do not:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: api
automountServiceAccountToken: false
An auto-mounted token is a valid API credential that an attacker gets for free if they achieve code execution inside the container. Disabling it on accounts that do not need it costs nothing. Override it at the Pod level in the PodSpec if specific pods in the same namespace do legitimately need API access.
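The Pod-level override can be sketched as follows. The pod name and image are hypothetical placeholders; the point is that `automountServiceAccountToken` on the PodSpec takes precedence over the value on the ServiceAccount.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: leader-elector          # hypothetical pod that calls the Kubernetes API
  namespace: api
spec:
  serviceAccountName: my-app
  # Overrides automountServiceAccountToken: false on the ServiceAccount
  # for this pod only.
  automountServiceAccountToken: true
  containers:
    - name: leader-elector
      image: my-registry/leader-elector:1.0.0   # placeholder image
```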
3. Pod Security: Closing the Container Escape Paths
Container escapes — breakouts from the containerized process into the host OS — almost universally exploit permissions the workload never needed: privileged mode, host namespace sharing (hostPID, hostNetwork, hostIPC), dangerous Linux capabilities. The NSA/CISA guidance, the CNCF Security Technical Advisory Group, and every red team report on Kubernetes consistently identify this as the primary in-cluster escalation vector.
Pod Security Standards (PSS) and the Pod Security Admission (PSA) controller are the built-in mechanism for restricting these at admission time. PSA became stable in Kubernetes 1.25, the same release that removed the deprecated PodSecurityPolicy. Three profiles cover the full range:
| Profile | What it blocks | When to use |
|---|---|---|
| Privileged | Nothing; allows all | CNI, CSI, and node-level system components only |
| Baseline | Privileged containers, dangerous capabilities, host namespaces | Default for all application namespaces |
| Restricted | Everything in Baseline plus: requires non-root UID, drops all capabilities, requires seccomp | Application workloads that can meet the requirements |
Enable enforcement via namespace labels:
apiVersion: v1
kind: Namespace
metadata:
  name: api
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
Use warn and audit labels during migration: they surface violations without blocking deployments. Set a concrete deadline (one sprint at most) to remediate the warnings, then flip enforce on. Teams that leave PSA permanently in warn mode get no security benefit from it. A warn violation appears only as a client-side warning at apply time, and an audit violation only as an annotation in the API server audit log, and in most teams nobody monitors either.
Note the enforce-version: latest setting. Pinning to a specific version (such as v1.25) means the policy does not pick up new Restricted profile requirements introduced in later Kubernetes releases. Use latest to track the current cluster version.
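For workloads to comply with the Restricted profile, the PodSpec security context needs the following (readOnlyRootFilesystem and an explicit runAsUser go beyond what the profile strictly requires, but are worth setting alongside it):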
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: api
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
readOnlyRootFilesystem: true prevents an attacker with code execution from writing malware to the container filesystem, modifying binaries, or installing reconnaissance tools. The container can still write to explicitly declared emptyDir or persistentVolumeClaim volume mounts, but the image layer itself is immutable. Combined with allowPrivilegeEscalation: false (which blocks setuid binaries from gaining elevated privileges), these two flags alone eliminate a substantial fraction of the post-compromise actions available to an attacker inside the container.
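Applications that need scratch space under a read-only root filesystem usually get it from an emptyDir. The fragment below is a sketch of that pattern. The mount path `/tmp`, the size limit, and the image name are assumptions to adjust per workload.

```yaml
spec:
  containers:
    - name: api
      image: my-registry/my-app:1.0.0   # placeholder image
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        # The only writable path in the container.
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 64Mi   # caps how much an attacker or a bug can write
```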
seccompProfile: RuntimeDefault applies the container runtime's built-in seccomp filter (both containerd and CRI-O ship sensible defaults). For high-risk workloads, a custom Localhost profile can restrict the syscall surface further, but RuntimeDefault is a safe practical starting point that requires no custom profile authoring.
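For the high-risk case, a Localhost profile is referenced from the same field. In this sketch the file name `profiles/api-restricted.json` is hypothetical. The path is resolved relative to the kubelet's seccomp directory, and the file has to exist on every node that can schedule the pod.

```yaml
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # Hypothetical profile file, relative to the kubelet seccomp root
      # (commonly /var/lib/kubelet/seccomp).
      localhostProfile: profiles/api-restricted.json
```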
4. Secrets: Base64 Is Not Encryption
Kubernetes Secrets are stored in etcd as base64-encoded strings. Base64 is not encryption — it is a reversible encoding scheme that any CLI tool decodes in milliseconds. Anyone with read access to etcd, or to the API server with sufficient RBAC permissions, can extract every Secret in the cluster. Two controls address this at different layers.
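Encryption at rest. The API server accepts an EncryptionConfiguration file that encrypts Secrets before writing to etcd. A common local-key choice is AES-GCM with a 256-bit key. The Kubernetes documentation recommends it only when keys are rotated automatically, and points to a KMS v2 provider where one is available: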
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aesgcm:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}
Pass --encryption-provider-config to kube-apiserver to activate it. After enabling, rewrite existing Secrets through the encryption path so they are not left as plaintext in etcd:
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
This closes the direct etcd-read vector: an attacker who can read etcd files on disk cannot read the secret values. It does not protect against a valid API server call from an over-permissioned account; RBAC controls that path.
External secret stores. For genuinely sensitive credentials (database passwords, private keys, payment processor tokens), treat Kubernetes Secrets as a runtime delivery mechanism, not as the authoritative store. The External Secrets Operator (ESO, a CNCF project) synchronizes from AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, or Azure Key Vault into Kubernetes Secrets on a configurable schedule:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-password
  namespace: api
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: db-password
    creationPolicy: Owner
  data:
    - secretKey: password
      remoteRef:
        key: prod/api/db
        property: password
Rotation becomes a property of the external store, not a Kubernetes operation. Set a short refreshInterval (1 hour or less for high-value secrets) so that rotated credentials propagate automatically without manual intervention.
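The ExternalSecret above refers to a ClusterSecretStore named `aws-secrets-manager` that has to be defined separately. A minimal sketch for AWS follows. The region, the service account name, and its namespace are assumptions, and the field layout follows the `v1beta1` API, so check it against the ESO version you run.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1              # assumption: your secrets' region
      auth:
        jwt:
          # Assumption: a service account annotated for IAM role access (IRSA).
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```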
One underrated problem: secrets injected as environment variables are visible to every process in the container, frequently logged by frameworks on startup, and included in crash dumps and heap snapshots. Mount secrets as files from volumes instead of environment variables wherever the application can consume them. File-based mounting limits exposure to code that explicitly reads the path.
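A sketch of file-based delivery is shown below. It reuses the `db-password` Secret name from the ExternalSecret example; the mount path and image are hypothetical. Each key in the Secret becomes a file, so the value would be read from `/var/run/secrets/db/password`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  namespace: api
spec:
  containers:
    - name: api
      image: my-registry/my-app:1.0.0   # placeholder image
      volumeMounts:
        - name: db-credentials
          mountPath: /var/run/secrets/db   # hypothetical path
          readOnly: true
  volumes:
    - name: db-credentials
      secret:
        secretName: db-password
```

Unlike environment variables, a mounted Secret is refreshed in place when the underlying Secret changes, although the application still has to re-read the file.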
5. Image Supply Chain: Scan, Minimize, Sign
The 87 percent figure — nearly nine in ten production container images carrying a critical or high CVE (Aikido Security analysis, 2024) — is not primarily a tooling failure. It is a failure of what happens after scanning. Teams that run Trivy or Snyk in CI but merge anyway because "we'll fix it in the next sprint" accumulate vulnerability debt that compounds faster than it gets paid down. Three practices together actually move the needle.
Scanning with blocking gates. Add Trivy or Grype to CI as a blocking step, not just a reporting step. Gate on CRITICAL and HIGH severity CVEs that have a fix available. The --ignore-unfixed flag is important: blocking on CVEs where upstream has not released a patch generates noise that trains teams to dismiss all scan output.
trivy image \
  --exit-code 1 \
  --severity CRITICAL,HIGH \
  --ignore-unfixed \
  --format table \
  my-registry/my-app:${GIT_SHA}
Minimal base images. The standard node:20 image carries roughly 500 to 700 packages and typically presents dozens of CVEs. node:20-alpine reduces this to around 40 to 50 packages. gcr.io/distroless/nodejs20-debian12 reduces it further and removes the shell entirely: if there is no sh binary in the image, there is no shell for an attacker to exec into after achieving code execution. The ergonomic tradeoff is debuggability: distroless containers cannot be exec'd into interactively. Use kubectl debug with an ephemeral container attached at incident time rather than widening the base image to accommodate routine debugging.
For statically compiled Go or Rust binaries, a FROM scratch base produces a single-layer image with zero OS packages and zero inherited CVEs.
Image signing. Cosign (part of Sigstore, an OpenSSF project) signs container images at push time and enables cryptographic verification at deploy time. Paired with a Kyverno or OPA/Gatekeeper admission policy, you can require that every image deployed to a production namespace carries a valid signature from your CI pipeline. An unsigned image, whether pushed manually, injected via a compromised registry mirror, or introduced by a supply-chain attack, is rejected at admission before it ever runs.
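The admission half might look like the following Kyverno sketch. The policy name, the `production` namespace, and the image pattern are assumptions, the public key is a placeholder, and field names such as `validationFailureAction` have shifted between Kyverno releases, so treat this as a starting point to check against your installed version.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images       # hypothetical policy name
spec:
  validationFailureAction: Enforce  # reject, rather than only report
  background: false
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-ci-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - production        # assumption: your production namespace
      verifyImages:
        - imageReferences:
            - "my-registry/my-app*"
          attestors:
            - entries:
                - keys:
                    # Placeholder: paste the contents of cosign.pub here.
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <contents of cosign.pub>
                      -----END PUBLIC KEY-----
```

Signing in CI and manual verification with the same key pair use the commands that follow.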
cosign sign --key cosign.key my-registry/my-app:${GIT_SHA}
cosign verify --key cosign.pub my-registry/my-app:${GIT_SHA}
6. Runtime Detection: When Static Controls Are Not Enough
Static controls define what should not be possible. They do not tell you when something anomalous is happening within the permitted envelope. An attacker who gains code execution inside a reasonably hardened container will attempt filesystem reconnaissance, internal network probing, and credential harvesting before attempting lateral movement. Runtime detection is the layer that catches this behavior.
Falco is the de facto open-source runtime security tool for Kubernetes. It runs as a DaemonSet and consumes the kernel system call stream via eBPF (in modern deployments) to detect anomalous behavior in real time. Default rules cover the highest-signal events:
- Shell spawned inside a container
- Sensitive file reads (/etc/shadow, /proc/self/mem, /.dockerenv)
- Network connections on unexpected ports from known services
- setuid/setgid privilege escalation attempts
- Writes to system binary directories
A structured Falco alert:
{
  "output": "Shell spawned in container (user=root container=api k8s.pod=api-78f9d-xm2 shell=bash)",
  "priority": "WARNING",
  "rule": "Terminal shell in container",
  "time": "2026-01-08T14:23:01.000Z"
}
Deploy falcosidekick alongside Falco to route alerts to your SIEM, PagerDuty, or Slack. A shell-in-container Falco alert correlated with a new outbound connection from the same pod is the kind of compound signal that should page someone immediately. Falco running but writing to pod stdout with no downstream consumer is worthless, so wire the routing on day one.
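If Falco is installed from the falcosecurity Helm chart, routing can be switched on in the chart values. The sketch below assumes that chart's value names (`driver.kind`, `falco.json_output`, and the `falcosidekick` subchart settings), which may differ between chart versions. The webhook is a placeholder and belongs in a secret, not in version control.

```yaml
# Sketch of Helm values for the falcosecurity/falco chart (names assumed
# from that chart; confirm them with `helm show values` for your version).
driver:
  kind: modern_ebpf        # eBPF-based syscall capture
falco:
  json_output: true        # structured alerts, as in the example above
falcosidekick:
  enabled: true
  config:
    slack:
      webhookurl: "<slack-webhook-url>"   # placeholder
      minimumpriority: warning            # drop anything below WARNING
```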
API server audit logging is the second pillar. Enable it. A surprisingly large fraction of clusters run without it because it is not active by default and requires disk allocation. The following policy is a practical production starting point:
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  - level: Request
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""
        resources: ["pods", "pods/exec", "pods/attach"]
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services"]
level: Metadata on Secrets logs who accessed what and when, without capturing the secret value itself. level: Request on pod/exec events logs the full request. An exec into a production pod outside a maintenance window is either a policy violation or an active incident, and you want a record of it either way.
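The policy only takes effect once the API server is pointed at it. On a kubeadm-style control plane that usually means editing the static pod manifest. The fragment below is a sketch with assumed file paths and retention values; managed services expose audit logging through their own settings.

```yaml
# Fragment of the kube-apiserver static pod manifest (kubeadm layout assumed).
# All paths and retention numbers are placeholders.
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --audit-policy-file=/etc/kubernetes/audit/policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit/audit.log
        - --audit-log-maxage=30      # days to keep old log files
        - --audit-log-maxbackup=10   # number of rotated files to keep
        - --audit-log-maxsize=100    # megabytes before rotation
      volumeMounts:
        - name: audit-policy
          mountPath: /etc/kubernetes/audit
          readOnly: true
        - name: audit-log
          mountPath: /var/log/kubernetes/audit
  volumes:
    - name: audit-policy
      hostPath:
        path: /etc/kubernetes/audit
        type: DirectoryOrCreate
    - name: audit-log
      hostPath:
        path: /var/log/kubernetes/audit
        type: DirectoryOrCreate
```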
7. Getting the Order Right
Applying these controls in the wrong sequence creates operational disruption without corresponding security improvement. Enabling strict pod security before workloads are ready breaks deployments. Encrypting etcd before fixing RBAC means the encrypted Secrets are still accessible to over-permissioned accounts that can call kubectl get secret. The sequence matters:
1. Audit existing state
Run Kubescape or Trivy Operator to baseline current RBAC bindings, privileged pods, host mounts, and secrets in environment variables. Do not change anything yet — understand what you are working with before touching it.
2. Network policies (default-deny)
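Apply default-deny to every namespace containing sensitive workloads. Map undocumented flows with your CNI's flow visibility tooling first, since NetworkPolicy has no warn or audit mode, and have the DNS and allow rules ready before the deny goes live. Complete the rollout within one sprint. This closes the widest single gap for the least implementation effort.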
3. RBAC cleanup
Remove cluster-admin bindings that are not strictly necessary. Scope service account roles to namespaces and minimum required verbs. Disable automountServiceAccountToken on accounts that do not call the Kubernetes API.
4. Pod security standards
Apply Baseline enforcement to all application namespaces. Migrate to Restricted where workloads can comply. Add readOnlyRootFilesystem and drop ALL capabilities to existing workload PodSpecs.
5. Secrets hardening
Enable EncryptionConfiguration on the API server. Deploy External Secrets Operator and migrate high-value secrets to Vault or your cloud secrets manager. Audit for secrets in image layers and environment variables.
6. Image hygiene
Add Trivy as a blocking CI gate on CRITICAL and HIGH with --ignore-unfixed. Switch base images to Alpine or distroless. Integrate Cosign signing for production builds and an admission policy to enforce signatures at deploy time.
7. Runtime detection
Deploy Falco with falcosidekick routing alerts to your SIEM or alerting stack. Enable API server audit logging scoped to Secrets and pod/exec events. Wire alerts into your on-call rotation before considering the cluster production-ready.
Source: NSA/CISA Kubernetes Hardening Guidance 1.2, 2022; CNCF Security Technical Advisory Group
8. Real-World Pitfalls and Fixes
These are the failure modes that appear consistently across cluster security reviews. Each has a specific cause and a specific fix.
Pitfall: Egress missing from default-deny. Teams apply ingress default-deny correctly but omit Egress from policyTypes. Ingress is locked; egress is wide open. A compromised container can freely exfiltrate data or beacon to command-and-control infrastructure without triggering any network-layer alert. Fix: always include both Ingress and Egress in the default-deny NetworkPolicy. Add an explicit egress rule for UDP and TCP port 53 to your cluster DNS before applying it or you will cut pod DNS resolution.
Pitfall: Default service account with ambient permissions. Helm charts and operators that do not specify a dedicated ServiceAccount attach workloads to the default service account. Any permissions granted to default in the namespace apply to every workload in that namespace that does not explicitly declare a different account. Fix: create a dedicated ServiceAccount per application, never grant permissions to default, and set automountServiceAccountToken: false on the namespace's default service account explicitly.
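Both halves of that fix fit in one manifest. This is a sketch for a namespace called `api`; the dedicated account name `orders-api` is hypothetical.

```yaml
# Neutralize the namespace's default account: no token is mounted into
# pods that fall back to it.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: api
automountServiceAccountToken: false
---
# One dedicated account per application; bind Roles to this, never to default.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api          # hypothetical application account
  namespace: api
automountServiceAccountToken: false
```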
Pitfall: Secrets baked into image layers at build time. A COPY .env /app/.env instruction or credentials passed via --build-arg at build time embeds the secret in an immutable image layer. That layer persists in the registry, in CI artifact storage, in every local Docker cache across the development team, and potentially in public layer caches if the registry is public. Deleting the image tag does not remove the layer data from the registry. Fix: never pass secrets at build time. Audit Dockerfiles for COPY, ADD, and ARG statements that reference credential files. Use runtime injection via mounted volumes or the External Secrets Operator.
Pitfall: Broad RBAC roles granted to CI/CD pipelines. Deployment pipelines frequently receive cluster-admin because scoping the role correctly required effort that was not budgeted at project start. A compromised CI/CD credential then gives an attacker full cluster access. Fix: grant pipelines only the verbs required for deployment, typically get, list, create, update, patch on Deployments, Services, and ConfigMaps in specific target namespaces. Use IRSA (EKS), Workload Identity (GKE), or Microsoft Entra Workload ID (AKS, the successor to the deprecated pod identity add-on) to issue short-lived credentials scoped to each pipeline run instead of long-lived static tokens.
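The scoped pipeline role described in that fix can be sketched as below. The role name `ci-deployer` and the `api` namespace are hypothetical. Deployments live in the `apps` API group, while Services and ConfigMaps are in the core group.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-deployer         # hypothetical pipeline role
  namespace: api            # one Role per target namespace
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "configmaps"]
    verbs: ["get", "list", "create", "update", "patch"]
```

Pipelines that wait on rollout status will likely also need `watch` on Deployments. Add verbs only when a concrete pipeline step fails without them.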
Pitfall: PSA permanently in warn mode. Teams enable PSA in warn mode, the violations surface only as client-side warnings at apply time (and, in audit mode, as annotations in the API server audit log), nobody monitors either, and the security posture does not change. Fix: treat warn mode as a migration tool with a sprint-length deadline. Create a policy-as-code check in CI (Kyverno ClusterPolicy or OPA Conftest) that fails the pipeline if any workload manifest would violate the Restricted profile in the target namespace. That puts the feedback where engineers can act on it: in the PR, not in a log nobody reads.
Pitfall: Falco installed but alerts not routed. Falco running with output going to DaemonSet pod stdout, with no downstream consumer, provides no operational value. Security incidents occur with zero notification. Fix: deploy falcosidekick alongside Falco on day one and route alerts to at least one monitored destination — a Slack channel, a SIEM, an alerting stack — before the cluster receives production traffic.
Pitfall: Trivy gates reporting but not blocking. Scan results posted as PR comments are ignored under delivery pressure. Engineers merge with acknowledged-but-unfixed CVEs because the build does not fail. Fix: set --exit-code 1 in the Trivy command and use --ignore-unfixed to reduce noise. CRITICAL and HIGH CVEs with available fixes should break the build. Teams that fight scanner noise because they are blocking on un-patchable CVEs are using the tool incorrectly; --ignore-unfixed solves this.
Kubernetes is not secure by default. Cluster administrators must configure security settings based on their workloads and the environment in which they are deployed.