DevOps | 2026-05-09 | 16 min read

10 Things You Must Get Right When Building Kubernetes Infrastructure in 2026

The Kubernetes infrastructure checklist every team needs: RBAC, network policies, resource limits, secrets management, and the production readiness audit.

Skillzmist Engineering

Cloud & DevOps Team

We followed every Kubernetes tutorial on the internet. We still got breached. A low-level developer account accessed the production etcd backup. Payment history of 40,000 customers was exposed. The tutorials taught us how to deploy apps. They never taught us how to secure a cluster. That knowledge cost us.

The Problem

The gap between "getting Kubernetes running" and "running Kubernetes correctly" is vast. Most teams are stuck between these two states without knowing it. They can deploy services. They cannot recover from a security breach. They have observability. They do not have secrets management. The cluster runs. But it is built on sand.

Why This Happens

Kubernetes documentation shows what Kubernetes can do. It does not show what you must do before going to production. The mental model teams carry is: get it working, then secure it. This is backwards. Security must be built in from the start. By the time teams realize this, they have 50 services depending on the insecure foundation. Refactoring is expensive and risky.

The Solution — 10 Things You Must Get Right

1. Set Resource Requests AND Limits on Every Container

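The Kubernetes scheduler uses requests to place pods on nodes with enough free capacity. Limits cap what a container may consume: exceeding a CPU limit gets it throttled, and exceeding a memory limit gets it OOM-killed. Without requests the scheduler is blind; without limits, noisy neighbors starve or crash each other.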

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Also enforce this at the namespace level using ResourceQuota so a single team cannot consume all cluster resources.
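
A minimal sketch of that namespace-level enforcement, assuming the production namespace used elsewhere in this article. The object names and all numbers are illustrative, not recommendations. The LimitRange is included because, once a quota covers CPU and memory, pods that omit requests or limits are rejected unless defaults exist.

```yaml
# Caps the total resources the namespace may claim (illustrative values)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota        # hypothetical name
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
---
# Supplies defaults for containers that omit requests or limits
apiVersion: v1
kind: LimitRange
metadata:
  name: production-defaults     # hypothetical name
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    default:
      cpu: 500m
      memory: 512Mi
```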

2. RBAC: Least-Privilege Service Accounts for Every Workload

Kubernetes RBAC is the difference between a cluster where any pod can access any secret and one where each pod can only access what it needs.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-service
  namespace: production

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-service-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
  resourceNames: ["api-service-config"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
  resourceNames: ["api-service-secrets"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-service-binding
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: api-service-role
subjects:
- kind: ServiceAccount
  name: api-service
  namespace: production

This service account can ONLY read one specific ConfigMap and one specific Secret. Nothing else. A compromised pod running this account has minimal damage potential.
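
The Role only helps if the workload actually runs under that service account; pods that do not set serviceAccountName fall back to the namespace's default account. A sketch of the wiring, assuming a Deployment named api-service with an app: api-service label and the image tag used later in this article:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
spec:
  replicas: 3                   # illustrative
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      # Run under the least-privilege account instead of "default"
      serviceAccountName: api-service
      # Set to false if the application never calls the Kubernetes API
      automountServiceAccountToken: true
      containers:
      - name: api-service
        image: myregistry.azurecr.io/api-service:v1.2.3
```

To check the result, kubectl auth can-i get secrets --as=system:serviceaccount:production:api-service -n production should answer "no", because the Role only grants access to one named Secret.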

3. Network Policies: Default-Deny Then Permit What Is Needed

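By default, every pod in a Kubernetes cluster can talk to every other pod on any port. In a 50-service microservices cluster, that is 2,450 directed service-to-service paths (50 × 49), all open, each one a route for lateral movement. Note that NetworkPolicy objects are only enforced if the cluster's network plugin supports them (Calico and Cilium do, for example); without such a plugin the API server accepts the policies and nothing enforces them.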

# Deny all traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

---
# Allow only payment service to call database service on port 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-payment-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: payment-service
    ports:
    - protocol: TCP
      port: 5432

---
# Allow DNS egress (required for service discovery)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    # DNS falls back to TCP for large responses
    - protocol: TCP
      port: 53

Start with deny-all. Then selectively allow what is needed. This prevents lateral movement if a pod is compromised.
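
One detail is easy to miss: because the default-deny policy blocks Egress as well as Ingress, the ingress rule on the database is not enough on its own. The payment service also needs permission to send that traffic. A sketch of the matching egress rule, with a hypothetical policy name:

```yaml
# Allow payment-service pods to open connections to the database on 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-payment-egress-to-database   # hypothetical name
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
```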

4. Never Run Containers as Root

A compromised container running as root has the keys to the kingdom. Running as non-root limits the damage.

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
    - ALL
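
Setting this per container relies on every team remembering to do it. One way to enforce it cluster-side is Pod Security Admission, which is built into Kubernetes and is switched on with namespace labels. A sketch for the production namespace follows; note that the restricted profile also requires a seccompProfile of type RuntimeDefault (or Localhost), so the securityContext above would need that field added before pods are admitted.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    # Also surface violations as client warnings and audit annotations
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Starting with only the warn and audit labels, then adding enforce once the warnings are gone, is a lower-risk way to roll this out on an existing namespace.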

5. Secrets Management: Use External Secrets Operator or Vault

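Kubernetes Secrets are base64-encoded, not encrypted. Unless encryption at rest is configured on the API server, they are stored in etcd in plaintext. If Secret manifests are committed alongside Deployments, the values also end up in git history. Never hardcode secrets in manifests.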

Use External Secrets Operator (ESO) to sync secrets from AWS Secrets Manager, HashiCorp Vault, or Azure Key Vault:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-store
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-service-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-store
    kind: SecretStore
  target:
    name: api-service-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-password
    remoteRef:
      key: prod/api-service/db-password

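ESO re-syncs the value every refreshInterval (1h here), so a rotation performed in AWS Secrets Manager propagates to the cluster automatically. ESO does not rotate the secret itself; rotation happens in the external store. Secret values never appear in git.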
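
On the consuming side, the workload references the synced Secret like any other Kubernetes Secret. A container spec fragment as a sketch; the environment variable name is hypothetical:

```yaml
# Fragment of a container spec
env:
- name: DATABASE_PASSWORD          # hypothetical variable name
  valueFrom:
    secretKeyRef:
      name: api-service-secrets    # the Secret created by the ExternalSecret
      key: database-password
```

Environment variables are read once at container start, so a refreshed value only takes effect after the pod restarts. Mounting the Secret as a volume avoids that, at the cost of the application re-reading the file.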

6. Pod Disruption Budgets for Critical Services

During cluster maintenance (node drains, upgrades), Kubernetes evicts pods so they can be rescheduled onto other nodes. Without a PodDisruptionBudget (PDB), all replicas of a service might be evicted at the same time, causing downtime.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-service

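This PDB keeps at least 2 replicas of api-service running during voluntary disruptions such as node drains and upgrades. The eviction API refuses to evict a pod if doing so would violate the PDB. It does not protect against involuntary disruptions such as a node crash.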
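
One caveat with minAvailable: if the Deployment runs exactly 2 replicas, minAvailable: 2 permits no evictions at all and node drains will hang. An alternative sketch that expresses the budget relative to whatever the replica count is:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  # Allow at most one api-service pod to be down voluntarily at any time
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api-service
```

Use one of the two forms per workload, not both; a PDB accepts either minAvailable or maxUnavailable.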

7. Image Scanning in CI Before Deployment

Scan container images for vulnerabilities before they are deployed. Trivy is free, open source, and detects known CVEs in OS packages and application dependencies.

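# --exit-code 1 makes the command fail when findings at the listed severities exist
# (without it, trivy reports findings but still exits 0)
trivy image --exit-code 1 --severity HIGH,CRITICAL myregistry.azurecr.io/api-service:v1.2.3

# Gate the deployment on scan success in CI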
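
As a sketch of what gating could look like, here is a GitHub Actions workflow. It assumes the trivy CLI is already installed on the runner and that the runner can pull the image; the workflow name, job names, and the placeholder deploy step are hypothetical.

```yaml
name: image-scan                 # hypothetical workflow name
on:
  push:
    branches: [main]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
    - name: Scan image with Trivy
      # Assumption: trivy is on the PATH and registry credentials are configured
      run: |
        trivy image --exit-code 1 --severity HIGH,CRITICAL \
          myregistry.azurecr.io/api-service:v1.2.3
  deploy:
    # Runs only if the scan job succeeded
    needs: scan
    runs-on: ubuntu-latest
    steps:
    - name: Deploy
      run: echo "deploy step goes here"   # placeholder
```

The needs: scan dependency is what makes the scan a gate: a non-zero exit from Trivy fails the scan job, and the deploy job is skipped.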

8. Separate Node Pools for Different Workload Types

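Run cluster add-ons (CoreDNS, ingress controllers, the monitoring stack) on dedicated nodes and application pods on separate nodes, so a noisy application cannot starve the components you need to debug it. Per-node DaemonSets such as kube-proxy are the exception: they run on every node by design.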

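# Pod spec: schedule only onto nodes labeled for application workloads
nodeSelector:
  workload-type: application

---
# Matching label on the node. In practice the node pool or node group
# configuration sets this; nodes are rarely created from a manifest.
apiVersion: v1
kind: Node
metadata:
  labels:
    workload-type: application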
9. Cluster Autoscaling with Karpenter (Not Cluster Autoscaler)

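Cluster Autoscaler still works, but it scales pre-defined node groups and is comparatively slow to react. Karpenter, where your platform supports it, provisions nodes directly from the requirements of pending pods, picks an instance type that fits the workload, and consolidates underutilized nodes. In practice that usually means faster scale-up and lower cost.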
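
A minimal NodePool sketch, assuming Karpenter's v1 API on AWS and an existing EC2NodeClass named default. The names, the CPU ceiling, and the consolidation delay are illustrative assumptions; check the field names against the Karpenter version you run.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                  # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumed to exist already
      requirements:
      # Let Karpenter choose between spot and on-demand capacity
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand", "spot"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
  # Upper bound on what this pool may provision in total
  limits:
    cpu: "1000"
  disruption:
    # Remove or replace nodes that are empty or underutilized
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```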

10. Etcd Backup Automation with Velero

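Etcd is the Kubernetes database. If etcd is corrupted or deleted, the entire cluster state is lost. Velero does not snapshot etcd itself; it backs up cluster resources through the Kubernetes API, plus persistent volume data, which is what you need to rebuild workloads on a new cluster. On self-managed control planes, also take etcd snapshots (for example with etcdctl snapshot save).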

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  template:
    ttl: 720h  # Keep for 30 days
    storageLocation: default
    volumeSnapshotLocations:
    - default
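
A backup is only proven once it has been restored. As a sketch of a restore drill, the Restore below pulls the production namespace from the most recent successful backup of the schedule above into a scratch namespace. It assumes Velero runs in the velero namespace; the restore name and the restore-test namespace are hypothetical.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-drill            # hypothetical name
  namespace: velero              # assumes the default Velero install namespace
spec:
  # With no backupName set, Velero restores from the latest successful
  # backup created by this schedule
  scheduleName: daily-backup
  includedNamespaces:
  - production
  # Restore into a scratch namespace so the drill cannot touch live workloads
  namespaceMapping:
    production: restore-test
```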

The Production Readiness Checklist

Before promoting any cluster from staging to production, verify:

☐ All containers have resource requests and limits
☐ RBAC is configured with least-privilege service accounts
☐ Network policies are implemented (default deny, selective allow)
☐ No containers run as root
☐ Secrets are managed via External Secrets Operator or Vault
☐ Pod Disruption Budgets protect critical services
☐ Image scanning gates deployments in CI/CD
☐ Separate node pools for system vs application workloads
☐ Karpenter autoscaling is configured
☐ Velero backups are running and tested
☐ Monitoring and alerting are operational (Prometheus + Grafana)
☐ Audit logging is enabled on the API server
☐ TLS certificates are valid, monitored for expiry, and renewed automatically
☐ Disaster recovery runbooks are documented
☐ On-call escalation procedures are defined

Common Mistakes to Avoid

Treating security as an afterthought. Build RBAC, network policies, and secrets management from day 1, not 6 months later.
Running all workloads on the same nodes. A rogue pod can crash the entire monitoring stack and make debugging impossible.
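No backup strategy for etcd. Without a backup, losing etcd means losing the cluster state for good. Test restores regularly, not just backups.
Resource limits without requests (or vice versa). The scheduler cannot make good placement decisions without requests. Without memory limits, one leaking pod can exhaust a node's memory and take its neighbors down with it.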
Hardcoding secrets in git. Even in "private" repos, secrets in git are findable and exploitable. Use External Secrets.

Key Takeaways

The 10 things are not optional: They are the difference between a safe cluster and a vulnerable one.
Resource management is foundational: Requests let the scheduler decide, limits prevent cascading failures.
Security must be built in from day 1: Adding RBAC after 50 services are deployed is painful.
Network policies prevent lateral movement: Default deny, selective allow is the zero-trust model.
Automation prevents human error: Velero backups, image scanning, Karpenter scaling—all automated.

Struggling with securing your Kubernetes infrastructure or preparing for production? The Skillzmist team has solved this exact problem for engineering teams across the US, UK, and Europe. Reach out for a free technical consultation — we respond within 24 hours.
