keda

Overview

  • Namespace: keda
  • Purpose: Kubernetes Event-Driven Autoscaling - PRODUCTION
  • Age: ~213 days (since May 2024)
  • Status: Active - Event-driven autoscaling platform
  • Workloads: 3 deployments (all active)
  • Environment: PRODUCTION - Enables queue-based scaling

Architecture

KEDA (Kubernetes Event-Driven Autoscaling) scales workloads in response to external event sources. It runs as three deployments:

  • KEDA Operator: Core operator for ScaledObject management (1 replica)
  • Admission Webhooks: Validation for ScaledObject creation (1 replica)
  • Metrics API Server: Serves the external metrics API (external.metrics.k8s.io) that the HPA reads for scaling (1 replica)

Auto-Scaling Configuration

Not Applicable:

  • KEDA itself provides autoscaling capability (not auto-scaled)
  • Fixed 1 replica for each component
  • Core infrastructure component

Workload Categories

KEDA Operator (1 deployment)

Name | Replicas | Status | Purpose
--- | --- | --- | ---
keda-operator | 1/1 | Running | Event-driven autoscaling operator

Admission Webhooks (1 deployment)

Name | Replicas | Status | Purpose
--- | --- | --- | ---
keda-admission-webhooks | 1/1 | Running | Webhook validation for ScaledObjects

Metrics API Server (1 deployment)

Name | Replicas | Status | Purpose
--- | --- | --- | ---
keda-operator-metrics-apiserver | 1/1 | Running | External metrics provider for the HPA

Services

Name | Type | Cluster IP | Ports | Purpose
--- | --- | --- | --- | ---
keda-operator | ClusterIP | 10.8.30.55 | 9666 | Operator metrics
keda-admission-webhooks | ClusterIP | 10.8.19.108 | 443 | Webhook service
keda-operator-metrics-apiserver | ClusterIP | 10.8.27.114 | 443, 8080 | External metrics API

Access & Management

View all resources:

kubectl get all -n keda
kubectl get scaledobjects --all-namespaces | head -20

Check operator status:

# View KEDA deployments
kubectl get deployments -n keda

# View KEDA logs
kubectl logs -f deployment/keda-operator -n keda
kubectl logs -f deployment/keda-admission-webhooks -n keda

# Check for errors
kubectl logs deployment/keda-operator -n keda --tail=100 | grep -i "error"

Monitor scaling:

# List all ScaledObjects
kubectl get scaledobjects --all-namespaces

# Watch ScaledObject activity
kubectl get scaledobjects -A -w

# Check ScaledObject details
kubectl describe scaledobject <name> -n <namespace>

Restart services:

# Restart KEDA operator
kubectl rollout restart deployment/keda-operator -n keda

# Restart all KEDA components
kubectl rollout restart deployment --all -n keda
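
A restart command returns as soon as the rollout is requested, not when the new pods are ready. The sketch below wraps the restart in a helper that waits for each rollout to finish. The function name and the 120s default timeout are illustrative choices, not part of the existing tooling.

```shell
#!/usr/bin/env bash
# Restart every deployment in a namespace, then block until each rollout completes.
# restart_and_wait is a hypothetical helper; the default timeout is an assumption.
restart_and_wait() {
  local ns="${1:-keda}" timeout="${2:-120s}"
  kubectl rollout restart deployment --all -n "$ns" || return 1
  local d
  # "-o name" yields entries such as deployment.apps/keda-operator
  for d in $(kubectl get deployments -n "$ns" -o name); do
    kubectl rollout status "$d" -n "$ns" --timeout="$timeout" || return 1
  done
}

# Usage: restart_and_wait keda 180s
```

With a single replica per component, scaling decisions pause while the pod restarts, so waiting on the rollout confirms when autoscaling is back.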

Monitoring

Operator metrics:

kubectl top pods -n keda

# Forward the operator service port to localhost (runs in the foreground until interrupted)
kubectl port-forward -n keda service/keda-operator 9666:9666

Metrics server:

# Check metrics API
kubectl port-forward -n keda service/keda-operator-metrics-apiserver 8080:8080

# List the external metrics API served by KEDA
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .

Events:

# Events are sorted oldest first, so tail shows the most recent ones
kubectl get events -n keda --sort-by='.lastTimestamp' | tail -20

Scaling Sources

KEDA supports scaling based on:

  • Message Queues: RabbitMQ, Kafka, Azure Service Bus
  • Databases: PostgreSQL, MySQL, Redis
  • Cloud Services: AWS SQS, Azure Queue Storage
  • Metrics: Prometheus, Datadog, New Relic
  • HTTP: Custom webhooks and HTTP endpoints
  • Schedules: Cron scaler (time-window based scaling)
  • External: Custom scalers
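
Each source in the list above is configured as a trigger on a ScaledObject. As a minimal sketch, the helper below prints a manifest with one Prometheus trigger. The function name, the replica bounds, and the assumption that the target Deployment shares the ScaledObject's name are all illustrative; values are inserted verbatim, so a query containing YAML-special sequences would need quoting.

```shell
#!/usr/bin/env bash
# Print a ScaledObject manifest with a single Prometheus trigger.
# render_scaledobject is a hypothetical helper; min/max replica counts are placeholders.
render_scaledobject() {
  local name="$1" namespace="$2" server="$3" query="$4" threshold="$5"
  cat <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  scaleTargetRef:
    name: ${name}   # Deployment to scale (assumed to share the name)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: ${server}
        query: ${query}
        threshold: "${threshold}"
EOF
}

# Usage (server-side dry run, which also exercises admission validation):
#   render_scaledobject worker default http://prometheus.monitoring:9090 \
#     'sum(rate(jobs_total[2m]))' 100 | kubectl apply --dry-run=server -f -
```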

Data Flow

Event Source (Queue, Metrics, etc.)
  ↓
KEDA Operator (Monitoring)
  ↓
ScaledObject (Scaling configuration)
  ↓
Metrics API Server (Provides metrics)
  ↓
Kubernetes HPA (Scales deployment)
  ↓
Pods scale up/down based on events
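
The final HPA step in this flow comes down to simple arithmetic. As a rough sketch, assuming an average-value target (the KEDA default metric type), the desired replica count is the metric value divided by the per-pod target, rounded up and clamped to the configured bounds. The function below is illustrative only: it ignores the HPA tolerance band, stabilization windows, and KEDA's own scale-to-zero handling.

```shell
#!/usr/bin/env bash
# desired = ceil(metric / target), clamped to [min, max]; integer inputs only.
# desired_replicas is a hypothetical helper for reasoning about expected scale.
desired_replicas() {
  local metric="$1" target="$2" min="$3" max="$4"
  local want=$(( (metric + target - 1) / target ))   # integer ceiling division
  (( want < min )) && want=$min
  (( want > max )) && want=$max
  echo "$want"
}

desired_replicas 95 10 1 20   # prints 10
desired_replicas 95 10 1 8    # prints 8 (capped at max)
desired_replicas 0 10 1 20    # prints 1 (held at min)
```

For example, a queue of 95 messages with a target of 10 messages per pod asks for 10 pods, unless the maximum replica count is lower.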

KEDA Workflow

1. Operator

  • 1 replica (core component)
  • Watches ScaledObjects across cluster
  • Monitors event sources
  • Creates/manages HPA resources
  • Handles scaling logic

2. Admission Webhooks

  • Validates ScaledObject YAML
  • Prevents invalid configurations
  • Webhook validation rules
  • Ensures correct trigger syntax

3. Metrics API Server

  • Serves the Kubernetes external metrics API (external.metrics.k8s.io)
  • Integrates with Kubernetes metrics
  • Provides scaling metrics
  • Enables native HPA integration
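
The HPA can only read KEDA's metrics if the metrics server is registered with the API aggregation layer. KEDA registers under the external metrics group (external.metrics.k8s.io). A minimal check, with a hypothetical function name, might look like this:

```shell
#!/usr/bin/env bash
# Succeeds only if the external metrics APIService reports Available=True.
# external_metrics_available is a hypothetical helper name.
external_metrics_available() {
  local status
  status=$(kubectl get apiservice v1beta1.external.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  [ "$status" = "True" ]
}

# Usage:
#   if external_metrics_available; then echo "external metrics API is served"
#   else echo "external metrics API is NOT available"; fi
```

If this check fails while the metrics server pod is running, the APIService object and the service on port 443 are the first things to inspect.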

Production Considerations

High Availability

Single Point of Failure:

  • Operator: 1 replica (no HA)
  • Webhooks: 1 replica (no HA)
  • Metrics API: 1 replica (no HA)
  • A restart or failure of any one component can delay scaling decisions until it recovers

Recommendations

  1. Operator Resilience:

    • Current: 1 replica (acceptable for non-critical scaling)
    • Consider 2+ replicas if scaling criticality requires HA
    • Pod restart = brief scaling delay (not permanent)
  2. Webhook Redundancy (Optional):

    • Current: 1 replica (acceptable for validation)
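    • Running workloads and existing ScaledObjects are unaffected if it is briefly unavailable
    • While it is down, ScaledObject create/update requests are rejected or admitted without validation, depending on the webhook's failurePolicy (they are not queued)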
  3. Monitor ScaledObjects:

    • Verify ScaledObjects are active
    • Check scaling triggers are working
    • Monitor metrics collection
    • Alert on operator failures
  4. Error Handling:

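    • Configure the fallback section of each ScaledObject (failureThreshold, replicas) rather than a second HPA on the same workload, since two autoscalers on one target conflict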
    • Document scaling trigger failures
    • Have manual scaling procedures
    • Regular testing of scaling
  5. Performance Tuning:

    • Adjust reconcile intervals
    • Configure concurrent scaling limits
    • Tune metrics polling frequency (pollingInterval on each ScaledObject; KEDA defaults to 30 seconds)
    • Monitor API latency
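
For the manual scaling procedure mentioned under Error Handling, one option is KEDA's pause annotation, which pins a workload at a fixed replica count until it is removed. The sketch below assumes the installed KEDA version supports autoscaling.keda.sh/paused-replicas; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Pin a ScaledObject's target at a fixed replica count; KEDA stops scaling it.
# Assumes the installed KEDA version honours the paused-replicas annotation.
pause_scaledobject() {
  local name="$1" ns="$2" replicas="$3"
  kubectl annotate scaledobject "$name" -n "$ns" --overwrite \
    "autoscaling.keda.sh/paused-replicas=${replicas}"
}

# Remove the annotation (trailing "-") so KEDA resumes normal scaling.
resume_scaledobject() {
  local name="$1" ns="$2"
  kubectl annotate scaledobject "$name" -n "$ns" \
    "autoscaling.keda.sh/paused-replicas-"
}

# Usage:
#   pause_scaledobject <name> <namespace> 5
#   resume_scaledobject <name> <namespace>
```

Pausing through the ScaledObject avoids a tug-of-war with the generated HPA, which would otherwise revert a plain kubectl scale.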

Troubleshooting

Operator issues:

# Check operator logs
kubectl logs -f deployment/keda-operator -n keda

# Check operator status
kubectl get deployments -n keda -o wide

# Check for errors
kubectl logs deployment/keda-operator -n keda --tail=50 | grep -i "error\|fail\|warn"

# Restart operator
kubectl rollout restart deployment/keda-operator -n keda

ScaledObject issues:

# List all ScaledObjects
kubectl get scaledobjects -A

# Check specific ScaledObject
kubectl describe scaledobject <name> -n <namespace>

# Check ScaledObject events
kubectl describe scaledobject <name> -n <namespace> | grep -A 20 "Events:"

# Check generated HPA
kubectl get hpa -n <namespace> | grep keda
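
When many ScaledObjects exist, describing them one by one is slow. As a sketch, the helper below lists only those whose Ready condition is not True, using kubectl custom columns. The function name is hypothetical, and the output format is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Print namespace/name for every ScaledObject whose Ready condition is not "True".
# unready_scaledobjects is a hypothetical helper name.
unready_scaledobjects() {
  kubectl get scaledobjects -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status' \
    | awk '$3 != "True" { print $1 "/" $2 " (Ready=" $3 ")" }'
}

# Usage: unready_scaledobjects
```

An empty result means every ScaledObject reports Ready; any line printed is a candidate for kubectl describe.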

Metrics collection issues:

# Check if the external metrics API is available
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq '.resources[] | .name'

# Verify metrics endpoint (port-forward runs in the foreground; run curl from a second terminal)
kubectl port-forward -n keda service/keda-operator-metrics-apiserver 8080:8080
curl http://localhost:8080/metrics

# Check scaler logs
kubectl logs deployment/keda-operator -n keda | grep -i "scaler\|trigger\|metric"

Webhook issues:

# Check webhook service
kubectl get svc -n keda keda-admission-webhooks

# Check webhook logs
kubectl logs deployment/keda-admission-webhooks -n keda --tail=50

# Verify webhook is active
kubectl get validatingwebhookconfigurations | grep keda
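
How the cluster behaves when the webhook pod is unavailable depends on the failurePolicy of each registered webhook: Fail rejects requests, Ignore admits them unvalidated. A small sketch for reading that setting follows. The function name is hypothetical, and matching on "keda" in the configuration name is an assumption about how the install names it.

```shell
#!/usr/bin/env bash
# Show the failurePolicy of each validating webhook configuration whose name mentions keda.
# keda_webhook_policies is a hypothetical helper; the name filter is an assumption.
keda_webhook_policies() {
  kubectl get validatingwebhookconfigurations --no-headers \
    -o custom-columns='NAME:.metadata.name,POLICY:.webhooks[*].failurePolicy' \
    | grep -i keda
}

# Usage: keda_webhook_policies
```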

Performance Metrics

Current Scale

  • Operator: 1 replica (core component)
  • Webhooks: 1 replica (validation)
  • Metrics API: 1 replica (metrics provider)
  • Total Active Pods: 3 pods

Stability

  • KEDA Age: ~213 days (mature)
  • Deployment Status: All healthy
  • Pod Restarts: Check for recent restarts
  • ScaledObjects: Monitor active count
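
To check for recent restarts, the restart counters can be read directly from pod status. The helper below is a minimal sketch (the function name is hypothetical) that sums the counters across the containers of each pod and prints only pods that have restarted at least once.

```shell
#!/usr/bin/env bash
# Print "<pod> <total restarts>" for pods in the namespace with one or more restarts.
# pods_with_restarts is a hypothetical helper name.
pods_with_restarts() {
  local ns="${1:-keda}"
  kubectl get pods -n "$ns" --no-headers \
    -o custom-columns='NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount' \
    | awk '{
        n = split($2, c, ",")              # one counter per container
        total = 0
        for (i = 1; i <= n; i++) total += c[i]
        if (total > 0) print $1, total
      }'
}

# Usage: pods_with_restarts keda
```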

Architecture Notes

  • Event-Driven: Enables queue-based scaling (Kafka, RabbitMQ, etc.)
  • External Metrics: Extends the Kubernetes metrics system through the external metrics API
  • Extensible: Pluggable scalers for different event sources
  • Standard: Uses native Kubernetes HPA under the hood
  • Critical Role: Powers all queue-based autoscaling