Kubernetes Deployment

TL;DR

Deploy the open-source CMDOP server on Kubernetes for scale. Run the control plane (api_server, :8000) and the relay (grpc_server, :50051) as separate Deployments backed by managed PostgreSQL and Redis. Expose REST over an Ingress with TLS via cert-manager, and the gRPC relay over a gRPC-aware Ingress. Use /healthz/live and /healthz/ready probes, and autoscale the relay on active gRPC streams rather than CPU.

Kubernetes is the scale tier (L3 in the scale ladder — up to ~50,000 agents). The server is a multi-process Python stack, so run each process as its own Deployment rather than as a single container.

There is no published Helm chart yet. Deploy with the manual manifests below, or adapt the OSS deploy/compose.oss.yml with a Compose-to-Kubernetes tool. At this scale, use managed Postgres and Redis rather than in-cluster single instances.

What are the prerequisites?

Kubernetes 1.24+
kubectl configured
Managed PostgreSQL 16 and Redis 7 (recommended), or in-cluster operators such as CloudNativePG
cert-manager (for TLS) and an Ingress controller with gRPC support (the examples below assume ingress-nginx and a ClusterIssuer named letsencrypt-prod)

How do I create the namespace?


# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cmdop

How do I configure secrets?


# secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cmdop-secrets
  namespace: cmdop
type: Opaque
stringData:
  CMDOP_DATABASE_URL: postgresql+asyncpg://cmdop_app:PASS@postgres:5432/cmdop
  CMDOP_ADMIN_DATABASE_URL: postgresql+asyncpg://cmdop_admin:PASS@postgres:5432/cmdop
  CMDOP_REDIS_URL: redis://redis:6379/0
  CMDOP_INTERNAL_SECRET: "<openssl rand -hex 32>"

How do I configure non-secret settings?


# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cmdop-config
  namespace: cmdop
data:
  CMDOP_DB_MODE: standalone
  CMDOP_ENVIRONMENT: prod
  CMDOP_REST_HOST: "0.0.0.0"
  CMDOP_REST_PORT: "8000"
  CMDOP_GRPC_HOST: "0.0.0.0"
  CMDOP_GRPC_PORT: "50051"
  CMDOP_LOG_JSON: "true"

All processes read the same CMDOP_* environment. Mount both the ConfigMap and the Secret via envFrom so each Deployment shares the configuration.

How do I run migrations?

Run alembic upgrade head as a Job (or an init container) before rolling out the Deployments, using the same image as api_server:


# migrate-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cmdop-migrate
  namespace: cmdop
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: ghcr.io/cmdop/cmdop-api:latest
          command: ["alembic", "upgrade", "head"]
          envFrom:
            - configMapRef: {name: cmdop-config}
            - secretRef: {name: cmdop-secrets}
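
If you prefer the init-container route mentioned above, the fragment below is a minimal sketch of how it could be attached to the api Deployment's pod spec. It reuses the image, command, ConfigMap, and Secret from the Job; the placement and the choice to share the api image tag are assumptions, not something the project documents. Every replica runs its own init container, so several pods may invoke Alembic at the same time during a rollout. For that reason the standalone Job is usually the safer default.

```yaml
# Fragment for api-deployment.yaml: goes under spec.template.spec,
# alongside the existing "containers" list. Sketch only.
initContainers:
  - name: migrate
    # Assumption: same image and tag as the api container, so the
    # migration scripts match the code being rolled out.
    image: ghcr.io/cmdop/cmdop-api:latest
    command: ["alembic", "upgrade", "head"]
    envFrom:
      - configMapRef: {name: cmdop-config}
      - secretRef: {name: cmdop-secrets}
```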

What do the Deployments look like?

Run the control plane and the relay as separate Deployments.


# api-deployment.yaml — REST control plane
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cmdop-api
  namespace: cmdop
spec:
  replicas: 2
  selector:
    matchLabels: {app: cmdop-api}
  template:
    metadata:
      labels: {app: cmdop-api}
    spec:
      containers:
        - name: api
          image: ghcr.io/cmdop/cmdop-api:latest
          ports:
            - {containerPort: 8000, name: http}
          envFrom:
            - configMapRef: {name: cmdop-config}
            - secretRef: {name: cmdop-secrets}
          livenessProbe:
            httpGet: {path: /healthz/live, port: 8000}
            periodSeconds: 10
          readinessProbe:
            httpGet: {path: /healthz/ready, port: 8000}
            periodSeconds: 5
          resources:
            requests: {cpu: 500m, memory: 1Gi}
            limits: {cpu: "2", memory: 4Gi}


# grpc-deployment.yaml — the relay
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cmdop-grpc
  namespace: cmdop
spec:
  replicas: 2
  selector:
    matchLabels: {app: cmdop-grpc}
  template:
    metadata:
      labels: {app: cmdop-grpc}
    spec:
      containers:
        - name: grpc
          image: ghcr.io/cmdop/cmdop-grpc:latest
          ports:
            - {containerPort: 50051, name: grpc}
          envFrom:
            - configMapRef: {name: cmdop-config}
            - secretRef: {name: cmdop-secrets}
          resources:
            requests: {cpu: 500m, memory: 1Gi}
            limits: {cpu: "2", memory: 4Gi}

For grpc_server, use grpc_health_probe for liveness and readiness rather than an HTTP probe; the server registers the gRPC Health Checking Protocol. Run the worker as a third Deployment with no Service and no ports.
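
The relay probes could look roughly like the fragment below. It assumes the relay image ships the grpc_health_probe binary at /bin/grpc_health_probe, which this page does not confirm; the probe periods simply mirror the api Deployment.

```yaml
# Fragment for grpc-deployment.yaml: add under the "grpc" container.
# Assumption: the image contains grpc_health_probe at this path.
livenessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:50051"]
  periodSeconds: 10
readinessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:50051"]
  periodSeconds: 5
# Alternative without the binary: the built-in Kubernetes gRPC probe,
# enabled by default since 1.24, for example:
#   readinessProbe:
#     grpc: {port: 50051}
```

A worker Deployment might be sketched as follows. The image name, Deployment name, and replica count are hypothetical placeholders; substitute whatever image and command actually run the worker process.

```yaml
# worker-deployment.yaml (sketch, names are assumptions)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cmdop-worker
  namespace: cmdop
spec:
  replicas: 1
  selector:
    matchLabels: {app: cmdop-worker}
  template:
    metadata:
      labels: {app: cmdop-worker}
    spec:
      containers:
        - name: worker
          image: ghcr.io/cmdop/cmdop-worker:latest  # hypothetical image name
          # No ports and no Service: the worker accepts no inbound traffic.
          envFrom:
            - configMapRef: {name: cmdop-config}
            - secretRef: {name: cmdop-secrets}
```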

How do I expose the services?


# services.yaml
apiVersion: v1
kind: Service
# The app label lets a ServiceMonitor select this Service.
metadata: {name: cmdop-api, namespace: cmdop, labels: {app: cmdop-api}}
spec:
  selector: {app: cmdop-api}
  ports:
    - {name: http, port: 80, targetPort: 8000}
---
apiVersion: v1
kind: Service
metadata: {name: cmdop-grpc, namespace: cmdop}
spec:
  selector: {app: cmdop-grpc}
  ports:
    - {name: grpc, port: 50051, targetPort: 50051}

How do I configure Ingress with TLS?

REST control plane:


# api-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cmdop-api
  namespace: cmdop
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - {hosts: [api.example.com], secretName: cmdop-api-tls}
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: {service: {name: cmdop-api, port: {number: 80}}}

gRPC relay (needs a gRPC-aware backend protocol):


# grpc-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cmdop-grpc
  namespace: cmdop
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
spec:
  ingressClassName: nginx
  tls:
    - {hosts: [grpc.example.com], secretName: cmdop-grpc-tls}
  rules:
    - host: grpc.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: {service: {name: cmdop-grpc, port: {number: 50051}}}
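
Relay streams are long-lived, and NGINX closes a proxied stream that stays idle past its read or send timeout (60 seconds by default). With ingress-nginx, the proxy timeout annotations are understood to map onto the gRPC timeouts when backend-protocol is GRPC, so a sketch like the following may be needed. The one-hour values are illustrative assumptions; verify the behavior against your controller version.

```yaml
# Fragment for grpc-ingress.yaml: extra entries under metadata.annotations.
# Values are in seconds and purely illustrative; tune to your stream lifetimes.
nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
```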

How do I deploy PostgreSQL and Redis?

Prefer managed offerings (RDS / Cloud SQL / Aiven for Postgres; ElastiCache / Memorystore / Upstash for Redis). If you must run them in-cluster, CloudNativePG and a Redis operator are reasonable choices. Point CMDOP_DATABASE_URL, CMDOP_ADMIN_DATABASE_URL, and CMDOP_REDIS_URL at them. With PgBouncer, use transaction mode (RLS uses SET LOCAL, which transaction mode supports).

How do I configure autoscaling?

Scale the relay on active gRPC streams, not CPU: a relay can hold thousands of mostly-idle streams while CPU stays low. Export the relay's stream metrics, expose the active-stream count as a per-pod custom metric (via Prometheus Adapter or KEDA), and drive an HPA off its per-pod average. The example below targets an average of 500 streams per pod:


apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: {name: cmdop-grpc, namespace: cmdop}
spec:
  scaleTargetRef: {apiVersion: apps/v1, kind: Deployment, name: cmdop-grpc}
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric: {name: cmdop_grpc_active_streams}
        target: {type: AverageValue, averageValue: "500"}
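
For the HPA to resolve cmdop_grpc_active_streams, something has to publish that series through the custom metrics API. With Prometheus Adapter, a rule along these lines is one way to do it. This is a sketch under two assumptions: that the relay exports a gauge with exactly this name, and that Prometheus scrapes the relay pods and attaches namespace and pod labels to the series.

```yaml
# Prometheus Adapter configuration (sketch).
# Assumption: the relay exports a gauge named cmdop_grpc_active_streams
# and the scraped series carry "namespace" and "pod" labels.
rules:
  - seriesQuery: 'cmdop_grpc_active_streams{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: namespace}
        pod: {resource: pod}
    name:
      matches: "^(.*)$"
      as: "${1}"
    # Sum per pod so the HPA can average the value across replicas.
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```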

How do I apply everything?


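kubectl apply -f namespace.yaml
kubectl apply -f secrets.yaml
kubectl apply -f configmap.yaml
# A Job's pod template is immutable, so remove any previous run before re-applying.
kubectl delete job cmdop-migrate -n cmdop --ignore-not-found
kubectl apply -f migrate-job.yaml
# kubectl wait times out after 30s by default; allow longer for migrations.
kubectl wait --for=condition=complete job/cmdop-migrate -n cmdop --timeout=300s
kubectl apply -f api-deployment.yaml -f grpc-deployment.yaml -f services.yaml
kubectl apply -f api-ingress.yaml -f grpc-ingress.yaml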

How do I verify the deployment?


kubectl get pods -n cmdop
kubectl get svc,ingress -n cmdop
kubectl logs -n cmdop -l app=cmdop-grpc -f

How do I set up monitoring?

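The api_server serves /metrics on :8000. Keep it off the public Ingress: the api Ingress above forwards every path under / to the API, so /metrics is reachable externally unless you block that path at the Ingress controller. Scrape it in-cluster with a ServiceMonitor (this requires the Prometheus Operator CRDs). A ServiceMonitor selects Services, not Pods, by label, so the cmdop-api Service must carry the app: cmdop-api label: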


apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata: {name: cmdop-api, namespace: cmdop}
spec:
  selector:
    matchLabels: {app: cmdop-api}
  endpoints:
    - {port: http, path: /metrics}

After every rollout, run the just audit-rls recipe against the database to confirm RLS coverage.

How do I troubleshoot Kubernetes issues?


kubectl describe pod -n cmdop -l app=cmdop-grpc
kubectl logs -n cmdop -l app=cmdop-grpc --previous
kubectl exec -it -n cmdop deploy/cmdop-api -- sh

What should I read next?

Docker — the simpler single-node Compose path
Self-Hosted — config, TLS, and the scale ladder