Project ID: 21-Enterprise Kubernetes Platform Engineering-IND Team 2
12-week capstone: secure, observable, GitOps-driven computer vision on Kubernetes (CNCF certification track).
Platform scope
01 · Overview
This capstone simulates a real enterprise Kubernetes environment, designed, deployed, secured, and operated end-to-end in alignment with the full CNCF certification track.
Cluster Architecture & Operations
Multi-node clusters via kind, k3s, and EKS. Control plane inspection, namespace isolation, resource scheduling, and HPA-driven autoscaling.
Application Delivery
Helm-packaged microservices with rolling updates, zero-downtime deployments, liveness/readiness probes, and externalised config via ConfigMaps and Secrets.
GitOps with ArgoCD
Declarative, Git-driven deployments with automatic reconciliation. Drift detection and self-heal demonstrated live against manually edited cluster state.
Shift-Left Security
Trivy image scanning with CI exit-code gates, Sealed Secrets review, and manifest validation scripts blocking non-compliant deploys before any kubectl apply runs.
Observability Stack
Prometheus metrics collection and Grafana dashboards. HPA events, resource trends, and incident simulation with documented recovery procedures.
CI/CD Automation
GitHub Actions pipelines: checkout → YAML validate → Docker build → security scan → Helm deploy. Blocking gates enforce quality before any apply runs.
02 · Skills Demonstrated
Hands-on engineering across the full Kubernetes production stack: every skill listed was implemented, observed, and documented in lab.
Bootstrapped multi-node kind clusters with custom config. Verified control plane health, inspected API server and DNS endpoints, and established kubeconfig contexts.
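A minimal sketch of such a multi-node kind configuration (node counts and file name are illustrative):

```yaml
# kind-cluster.yaml: one control plane plus two workers (counts illustrative)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Created with `kind create cluster --name capstone --config kind-cluster.yaml`, then verified via `kubectl cluster-info` and `kubectl get nodes`.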
Created namespace-scoped environments via YAML manifests. Verified workload isolation with kubectl exec and context switching enforced per team boundary.
Applied NoSchedule taints, confirmed pods remain Pending without matching tolerations, then added tolerations and observed placement. Used nodeSelector for additional control.
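A sketch of the pattern, assuming an illustrative `team=cv` taint key and node label:

```yaml
# Node tainted with: kubectl taint nodes <node> team=cv:NoSchedule
# Without the toleration below, the pod stays Pending.
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  nodeSelector:
    workload: cv              # optional extra placement control (assumed label)
  tolerations:
    - key: team
      operator: Equal
      value: cv
      effect: NoSchedule
  containers:
    - name: app
      image: nginx:1.27
```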
Deployed Metrics Server, created HPA targeting CPU utilisation, generated synthetic load inside pods, confirmed automatic replica scale-up via HPA event log.
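A representative HPA manifest; the target Deployment name, replica bounds, and CPU threshold here are assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```

Scale-up events then appear under `kubectl describe hpa app-hpa`.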
Diagnosed and resolved four failure scenarios: CrashLoopBackOff (bad entrypoint), ImagePullBackOff (invalid tag), Pending pod (resource exhaustion), and Service selector mismatch.
Configured maxSurge: 1 / maxUnavailable: 0 for zero-downtime guarantees. Simulated a bad image deploy; old pods kept serving. Recovered instantly with kubectl rollout undo.
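The strategy block in question, embedded in a minimal Deployment sketch (names and probe are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during rollout
      maxUnavailable: 0    # never drop below the desired replica count
  selector:
    matchLabels: {app: app}
  template:
    metadata:
      labels: {app: app}
    spec:
      containers:
        - name: app
          image: nginx:1.27
          readinessProbe:
            httpGet: {path: /, port: 80}
```

A bad image never passes readiness, so the surge pod is held back and the old replicas keep serving until `kubectl rollout undo deployment/app`.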
Authored Helm charts parameterising image tag, replica count, and service type via values.yaml. Executed install, upgrade, and rollback; demonstrated the full release lifecycle across 4 revisions.
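A values.yaml sketch with the three parameters described (key names are assumptions):

```yaml
image:
  repository: myapp        # placeholder repository
  tag: "1.2.0"
replicaCount: 2
service:
  type: ClusterIP
```

Exercised with `helm install`, `helm upgrade --set image.tag=...`, and `helm rollback <release> <revision>`.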
Injected config via envFrom (restart required) and volume mounts (auto-refresh without restart). Managed base64-encoded Secrets with --dry-run=client previews and safe describe inspection.
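Both injection styles in one pod sketch, assuming an illustrative `app-config` ConfigMap:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      envFrom:
        - configMapRef:
            name: app-config   # env vars: a restart is needed to pick up changes
      volumeMounts:
        - name: config-vol
          mountPath: /etc/app  # mounted keys refresh in place on the kubelet sync
  volumes:
    - name: config-vol
      configMap:
        name: app-config
```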
Implemented sidecar log-reader pattern using emptyDir shared volume between nginx main app and busybox sidecar. Confirmed shared filesystem at runtime with pod showing 2/2 Ready.
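A sketch of that sidecar pod (file paths and image tags assumed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-sidecar-demo
spec:
  volumes:
    - name: logs
      emptyDir: {}               # shared scratch space, lives as long as the pod
  containers:
    - name: nginx
      image: nginx:1.27
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
    - name: log-reader
      image: busybox:1.36
      command: ["sh", "-c", "tail -F /logs/access.log"]
      volumeMounts:
        - name: logs
          mountPath: /logs
```

`kubectl get pod log-sidecar-demo` reports 2/2 Ready once both containers are up.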
Built a four-stage pipeline: checkout → YAML syntax validation → Docker build → Helm deploy. Triggered on pull_request to main. The pipeline structure mirrors production GitOps workflows.
Deployed ArgoCD with an Application manifest pointing to the Git repo. Enabled auto-sync and self-heal. Introduced live drift via kubectl edit; ArgoCD detected and reverted it within seconds.
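A workflow sketch of those four stages; file paths, names, and the deploy target are assumptions (the Helm step presumes kubeconfig access to a test cluster):

```yaml
# .github/workflows/ci.yaml
name: ci
on:
  pull_request:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate YAML
        run: pip install yamllint && yamllint k8s/
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Helm deploy
        run: helm upgrade --install myapp ./chart --set image.tag=${{ github.sha }}
```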
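The Application manifest pattern used for this, with a placeholder repo URL and path:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: warehouse-cv-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/platform-monorepo   # placeholder
    targetRevision: main
    path: k8s/overlays/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: warehouse-cv
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual edits to live state
```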
Scanned two container images for CRITICAL/HIGH CVEs. Used --exit-code 1 to block CI on critical findings, saving JSON audit output. Demonstrated warn-only vs. hard-gate comparison.
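The blocking scan as a CI step sketch (image name is a placeholder); dropping `--exit-code 1` yields the warn-only variant:

```yaml
- name: Trivy scan (hard gate)
  run: |
    trivy image --severity CRITICAL,HIGH --exit-code 1 \
      --format json --output trivy-report.json myapp:${{ github.sha }}
```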
Provisioned dedicated ServiceAccounts per workload (trainer-sa, intake-sa, inference-sa, dashboard-sa). Namespace-scoped read-only Role + RoleBinding for the dashboard, limiting API access to pods, services, endpoints, and configmaps only.
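The dashboard's Role and RoleBinding, reconstructed from the description above (object names are assumptions):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dashboard-readonly
  namespace: warehouse-cv
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints", "configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dashboard-readonly
  namespace: warehouse-cv
subjects:
  - kind: ServiceAccount
    name: dashboard-sa
    namespace: warehouse-cv
roleRef:
  kind: Role
  name: dashboard-readonly
  apiGroup: rbac.authorization.k8s.io
```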
Default-deny ingress for all pods in warehouse-cv with explicit allow rules for ingress-nginx → services and approved app-to-app paths only. Enforced via the dev overlay, leaving no implicit inter-pod reachability.
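A sketch of the default-deny plus one allow rule (policy names and pod labels are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: warehouse-cv
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes: ["Ingress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-nginx
  namespace: warehouse-cv
spec:
  podSelector:
    matchLabels:
      app: results-dashboard   # assumed label on the dashboard pods
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
```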
Pod Security Admission labels (restricted) applied to the application namespace. All workloads enforce runAsNonRoot, explicit UID 1000, seccompProfile: RuntimeDefault, and dropped all capabilities with privilege escalation disabled.
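The namespace label plus a pod-spec sketch that satisfies the restricted profile (image name is a placeholder):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: warehouse-cv
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
  namespace: warehouse-cv
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:1.0
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```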
Cluster credentials delivered via SealedSecret: encrypted values committed to Git, decrypted only by the in-cluster controller. The setup script interactively reseals if the placeholder sentinel is detected, making credential rotation an explicit, auditable step.
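What lands in Git is only ciphertext, roughly of this shape (names illustrative, value truncated):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: registry-credentials
  namespace: warehouse-cv
spec:
  encryptedData:
    password: AgBy3i4OJSWK...   # kubeseal output, truncated here
```

Produced with `kubeseal --format yaml < secret.yaml > sealed-secret.yaml`; only the controller's private key can decrypt it.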
kube-prometheus-stack deployed via Helm. ServiceMonitor configured for the application namespace. Grafana dashboard provisioned via ConfigMap. System ingress exposes Prometheus and Grafana at dedicated internal hostnames.
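A ServiceMonitor sketch for the application namespace; the release label and port name must match the actual deployment and are assumptions here:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: warehouse-cv-services
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match Prometheus's selector
spec:
  namespaceSelector:
    matchNames: ["warehouse-cv"]
  selector:
    matchLabels:
      monitoring: enabled            # assumed label on the target Services
  endpoints:
    - port: http
      path: /metrics
```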
ArgoCD auto-sync with prune, self-heal, and foreground pruning eliminates config drift. CI pipeline validates YAML with yamllint, renders Kustomize overlays, and runs kubeconform on all rendered output before any merge.
03 · Final Platform
The capstone culminated in a fully deployed warehouse computer-vision pipeline: three Flask microservices running YOLO detection, managed entirely through GitOps.
footage-intake
Serves camera frame data from local image files. Exposes /stream, /frame, and /health. HPA maintains minimum two replicas.
cv-inference
Fetches frames from intake and runs YOLO object detection. Returns JSON detections via POST /detect. Sealed credentials for model registry access.
results-dashboard
Web UI that proxies the video stream and polls inference for live detections. Exposed at dashboard.warehouse-cv.internal via NGINX ingress.
model-finetune
Nightly CronJob for GPU-backed YOLO fine-tuning. Mounts a 20Gi PVC for model artifacts and consumes sealed object-store credentials. Scheduled for GPU nodes.
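A CronJob sketch matching that description; schedule, labels, and image are assumptions:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-finetune
  namespace: warehouse-cv
spec:
  schedule: "0 2 * * *"            # nightly run; exact hour assumed
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          nodeSelector:
            gpu: "true"            # assumed GPU-node label
          containers:
            - name: finetune
              image: model-finetune:latest   # placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 1
              envFrom:
                - secretRef:
                    name: object-store-credentials   # unsealed by the controller
              volumeMounts:
                - name: artifacts
                  mountPath: /models
          volumes:
            - name: artifacts
              persistentVolumeClaim:
                claimName: model-artifacts   # the 20Gi PVC
```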
GitOps Delivery
Two ArgoCD applications sync from Kustomize overlays: warehouse-cv-dev and warehouse-cv-addons-dev. Auto-sync, prune, and self-heal enforce Git as the sole source of truth.
One-Command Bootstrap
A single interactive script handles preflight checks, kind cluster creation, Sealed Secrets, ArgoCD, ingress-nginx, and kube-prometheus-stack, taking the platform from zero to a fully synced cluster.
```
warehouse-cv    # footage-intake · cv-inference · results-dashboard · model-finetune
argocd          # GitOps controller: auto-sync, prune, self-heal
ingress-nginx   # North-south traffic, internal hostname routing
monitoring      # kube-prometheus-stack · ServiceMonitor · Grafana dashboard
kube-system     # Sealed Secrets controller
```
04 · Architecture
The platform layers cluster infrastructure, application delivery, security controls, and observability into a cohesive production environment.
Cluster Infrastructure
Application Layer
CI/CD & GitOps
Security Controls
Observability
Certification Alignment
| Focus Area | CKA | CKAD | CKS |
|---|---|---|---|
| Cluster Architecture | ✓ | | |
| Namespaces & Quotas | ✓ | | |
| Scheduling & Resources | ✓ | | |
| Troubleshooting | ✓ | | |
| Application Deployment | | ✓ | |
| ConfigMaps & Secrets | ✓ | ✓ | |
| Scaling & Probes | ✓ | ✓ | |
| CI/CD & GitOps | | ✓ | |
| RBAC & NetworkPolicy | | | ✓ |
| Pod Security & Scanning | | | ✓ |
| Monitoring & Observability | | | ✓ |
05 · Roadmap
Structured to mirror real Kubernetes industry adoption: platform fundamentals first, application patterns next, security hardening last.
Kubernetes Foundations
Cluster setup with kind, kubectl fundamentals, namespaces, context switching, and basic Deployment + Service patterns.
Working cluster with namespaced environments
Cluster Operations & Troubleshooting
Resource requests/limits, taints, tolerations, HPA, and debugging CrashLoopBackOff, ImagePullBackOff, and selector mismatches.
Resource isolation policies + troubleshooting report
Application Deployment Phase
ConfigMaps and Secrets injection, liveness and readiness probes, rolling updates with maxUnavailable: 0, and rollback workflows.
Stable app deployments · CKA Exam Target
Application Patterns & CI/CD
Helm chart packaging, multi-container sidecar pods, GitHub Actions CI pipeline, and Blue-Green/Canary deployment strategies.
Helm-packaged apps + automated CI pipeline
GitOps & Application Security
ArgoCD auto-sync and self-heal, Trivy image scanning with CI gates, Sealed Secrets review, and manifest validation scripts.
GitOps pipeline · CKAD Exam Target
Security & Observability
Least-privilege RBAC, scoped ServiceAccounts, NetworkPolicies, Pod Security Standards (restricted), Sealed Secrets, Prometheus + Grafana, and ArgoCD drift protection.
Warehouse CV platform, fully deployed ✓
06 · Deliverables
Each phase closes with a concrete, demonstrable deliverable mapped to CNCF certification objectives.
Kubernetes Foundations
Working cluster with namespaced environments and baseline kubectl output captured as a cluster snapshot.
Cluster Operations & Troubleshooting
Scalable workloads, resource isolation policies, and a documented troubleshooting report.
Application Deployment
Stable deployments, zero-downtime rolling updates, and externalised configuration management.
Application Patterns & CI/CD
Helm-packaged apps, automated CI pipeline with validation gate, and release workflow docs.
GitOps & Application Security
GitOps-driven deployments with Trivy-scanned images and CI-enforced security gates.
Kubernetes Security & Observability
Hardened cluster, Grafana dashboards, and live final architecture demo.
07 · Repository
All manifests, Helm charts, CI configs, and lab logs are version-controlled in a single monorepo.
```
├── docs/        # Architecture diagrams and security models
├── Docker/      # Service Dockerfiles and image build script
├── k8s/         # Manifests: Base, Overlays, RBAC
├── gitops/      # ArgoCD ApplicationSets
├── monitoring/  # Prometheus + Grafana configs
├── scripts/     # cluster-setup.sh bootstrap script
└── labs/        # Weekly milestone logs
```
08 · Team
Five engineers building production-grade Kubernetes infrastructure and working toward CNCF certification.
Advisor / Instructor: Arthur Choi · Industry Sponsor: Sudheer Amgothu