etcd

Overview

  • Namespace: etcd
  • Purpose: Key-Value Store for APISIX Configuration - PRODUCTION
  • Age: ~567 days (created around November 2023 or earlier)
  • Status: Active - Distributed key-value store
  • Workloads: 1 StatefulSet (1 replica)
  • Environment: PRODUCTION - Configuration storage for APISIX

Architecture

A single-member etcd key-value store that holds APISIX configuration and service discovery data:

  • StatefulSet: Single etcd member (1 replica) - No clustering
  • Persistent Storage: etcd data is kept on a persistent volume so that state survives pod restarts
  • Service: Both ClusterIP and headless service for discovery

Auto-Scaling Configuration

Not Applicable:

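  • No HPA is configured (an HPA can target a StatefulSet, but etcd membership should not be changed by a CPU- or memory-driven autoscaler)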
  • Single member (not clustered)
  • Fixed 1 replica

Workload Categories

Distributed Key-Value Store (1 StatefulSet)

Name          Replicas   Status    Purpose
apisix-etcd   1/1        Running   etcd key-value store (single member)

Services

Name                   Type        Cluster IP        Ports        Purpose
apisix-etcd            ClusterIP   10.8.20.89        2379, 2380   Client and peer communication
apisix-etcd-headless   ClusterIP   None (Headless)   2379, 2380   DNS discovery for StatefulSet

Access & Management

View all resources:

kubectl get all -n etcd
kubectl get statefulset -n etcd
kubectl get pvc -n etcd

Check etcd pod:

# View etcd pod
kubectl get pods -n etcd

# View etcd logs
kubectl logs -f statefulset/apisix-etcd -n etcd

# Check etcd health
kubectl exec -it statefulset/apisix-etcd -n etcd -- etcdctl endpoint health

Access etcd CLI:

# Get shell access
kubectl exec -it statefulset/apisix-etcd -n etcd -- sh

# Check keys (from within pod)
etcdctl --endpoints=localhost:2379 get --prefix "/"

# Get all APISIX keys (with the default prefix, routes are under /apisix/routes)
etcdctl --endpoints=localhost:2379 get --prefix "/apisix"

Monitor storage:

# Check persistent volume claim status and capacity (this does not show used space)
kubectl get pvc -n etcd

# Check volume details
kubectl describe pvc -n etcd

Restart/Maintenance:

# Careful: a restart makes etcd unavailable until the pod is Ready again, and data is lost if the pod has no persistent storage
kubectl rollout restart statefulset/apisix-etcd -n etcd

# Scale operations (not recommended for single-member)
# kubectl scale statefulset apisix-etcd --replicas=1 -n etcd

Monitoring

Pod metrics:

kubectl top pods -n etcd

# Check pod resource requests/limits
kubectl describe pod -n etcd | grep -A 5 "Requests\|Limits"

Storage metrics:

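# Check persistent volume claim capacity and storage class
kubectl get pvc -n etcd -o wide

# Expose the etcd client port locally (leave running), then scrape metrics from another terminal
kubectl port-forward -n etcd statefulset/apisix-etcd 2379:2379
curl -s http://localhost:2379/metrics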
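
The raw metrics output is long, so a small filter helps when watching it by hand. The sketch below keeps only a few metrics that matter for a single-member etcd. The function name `etcd_key_metrics` is hypothetical, and the usage line assumes metrics are served on the client port without TLS or authentication.

```shell
# Keep only a handful of etcd metrics from Prometheus text read on stdin.
# etcd_key_metrics is an illustrative helper, not part of etcd or kubectl.
etcd_key_metrics() {
  grep -E '^(etcd_server_has_leader|etcd_mvcc_db_total_size_in_bytes|etcd_disk_wal_fsync_duration_seconds_(sum|count)|etcd_disk_backend_commit_duration_seconds_(sum|count)) '
}
```

With the port-forward running in another terminal, a possible invocation is `curl -s http://localhost:2379/metrics | etcd_key_metrics`.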

Events:

# Events are sorted oldest first, so tail shows the most recent ones
kubectl get events -n etcd --sort-by='.lastTimestamp' | tail -20

Data Flow

APISIX Gateway

etcd client API (Port 2379)

apisix-etcd pod (StatefulSet)

Persistent Volume (data storage)

Stored key-value data

etcd Workflow

1. Key-Value Storage

  • Single member (no clustering)
  • Configuration persistence
  • Atomic transactions
  • Watch mechanism for changes
  • APISIX configuration storage

2. Service Discovery

  • DNS discovery via headless service
  • Peer communication on port 2380 (exposed, but idle while there is a single member)
  • Client requests on port 2379
  • Health checks and status

3. Data Persistence

  • Persistent volume for state
  • Automatic state recovery
  • Data consistency guarantees
  • Backup considerations

Production Considerations

High Availability

CRITICAL ISSUE - NO CLUSTERING:

  • Single member (1 replica) - NO REDUNDANCY
  • Single point of failure for APISIX configuration
  • Pod restart = etcd is unavailable until the pod is Ready again
  • Quorum of one: losing the only member stops all reads and writes
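
To make the redundancy gap concrete, the sketch below computes the Raft quorum size and the number of member failures a cluster can absorb for a given member count. The function names are illustrative only.

```shell
# Quorum for an n-member etcd cluster: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while the cluster still has quorum: n - quorum(n).
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare the current single member with common clustered sizes.
for n in 1 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

A single member has a quorum of 1 and tolerates 0 failures, while 3 members tolerate 1 and 5 members tolerate 2.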

Data Safety

PRODUCTION RISK:

  • Single member means no fault tolerance
  • Pod failure = configuration loss (if no persistent storage)
  • No replication across nodes
  • Backup/recovery dependent on persistent volume

Recommendations

  1. URGENT: Cluster etcd:

    • Current: Single member (not production-ready)
    • Recommended: 3-member cluster (tolerates the loss of one member)
    • Provides fault tolerance and automatic leader failover
    • An odd member count keeps a clear Raft majority during network partitions
  2. Persistent Storage:

    • Verify persistent volume is healthy
    • Monitor disk space usage
    • Check volume backup strategy
    • Ensure volume snapshots are configured
  3. Monitoring:

    • Monitor pod restart count
    • Check disk space availability
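    • Monitor etcd commit latency (etcd_disk_backend_commit_duration_seconds) and WAL fsync latency (etcd_disk_wal_fsync_duration_seconds)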
    • Alert on health check failures
  4. Backup Strategy:

    • Regular etcd snapshots
    • Test restore procedures
    • Store backups in safe location
    • Document recovery procedures
  5. Disaster Recovery:

    • Document single-member etcd limitations
    • Plan cluster upgrade path
    • Prepare restore procedures
    • Have backup keys/configuration
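
The backup recommendation above can be sketched as a small script. This is a minimal example under stated assumptions, not the established procedure for this cluster: it assumes the pod is named apisix-etcd-0 (ordinal 0 of the StatefulSet), that /tmp is writable in the container, that the image contains tar (required by kubectl cp), and that etcd runs without TLS or authentication. The helper names `snapshot_name` and `backup_etcd` are hypothetical.

```shell
# Build a timestamped snapshot file name.
# $1 = prefix, $2 = optional UTC timestamp (defaults to the current time).
snapshot_name() {
  ts="${2:-$(date -u +%Y%m%dT%H%M%SZ)}"
  echo "${1}-${ts}.db"
}

# Save a snapshot inside the pod, then copy it to the current directory.
# Assumes pod apisix-etcd-0, writable /tmp, tar in the image, no TLS/auth.
backup_etcd() {
  name="$(snapshot_name etcd-snapshot)"
  kubectl exec apisix-etcd-0 -n etcd -- etcdctl snapshot save "/tmp/${name}" &&
    kubectl cp "etcd/apisix-etcd-0:/tmp/${name}" "./${name}"
}
```

Calling `backup_etcd` from a machine with cluster access would leave a file such as etcd-snapshot-<timestamp>.db locally; it should then be moved to storage outside the cluster and exercised in a restore test.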

Troubleshooting

Health checks:

# Check etcd health (from pod)
kubectl exec -it statefulset/apisix-etcd -n etcd -- etcdctl endpoint health

# Check cluster status (single-member)
kubectl exec -it statefulset/apisix-etcd -n etcd -- etcdctl member list

# Show endpoint status (DB size, leader, Raft term and index)
kubectl exec -it statefulset/apisix-etcd -n etcd -- etcdctl endpoint status --write-out=table

Storage issues:

# Check persistent volume
kubectl get pvc -n etcd
kubectl describe pvc -n etcd

# Check disk usage (from pod)
kubectl exec -it statefulset/apisix-etcd -n etcd -- df -h

# Check data directory
kubectl exec -it statefulset/apisix-etcd -n etcd -- ls -la /bitnami/etcd/data/

Data integrity:

# Defragment etcd (reclaims disk space freed by compaction; the member blocks requests while it runs)
kubectl exec -it statefulset/apisix-etcd -n etcd -- etcdctl defrag

# Check database size
kubectl exec -it statefulset/apisix-etcd -n etcd -- du -sh /bitnami/etcd/data/

# List the first 20 keys (no TTY, so the output pipes cleanly)
kubectl exec statefulset/apisix-etcd -n etcd -- etcdctl get --prefix "/" --keys-only | head -20
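
Database size is most useful when compared against the backend quota, because etcd stops accepting writes once the quota is exceeded. The sketch below extracts the dbSize field from `etcdctl endpoint status --write-out=json` output and expresses it as a percentage of the quota. It assumes the etcd default quota of 2 GiB unless a second argument is given; the function names are hypothetical, and the actual quota for this deployment should be checked.

```shell
# Extract the first dbSize value (bytes) from endpoint status JSON on stdin.
extract_db_size() {
  grep -o '"dbSize":[0-9]*' | head -n 1 | cut -d: -f2
}

# Integer percentage of the backend quota in use (rounded down).
# $1 = database size in bytes, $2 = quota in bytes (assumed default: 2 GiB).
db_quota_pct() {
  quota="${2:-2147483648}"
  echo $(( $1 * 100 / quota ))
}
```

A possible invocation: `size="$(kubectl exec statefulset/apisix-etcd -n etcd -- etcdctl endpoint status --write-out=json | extract_db_size)"; db_quota_pct "$size"`.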

Connection issues:

# Test connectivity from APISIX (assumes curl is available in the APISIX image)
kubectl exec -it deployment/apisix -n apisix -- curl -v http://apisix-etcd.etcd:2379/health

# Test from another pod in the cluster
kubectl exec -it <pod> -- curl -v http://apisix-etcd.etcd.svc.cluster.local:2379/health
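
For scripted checks, the /health response body can be turned into an exit code. This is a minimal sketch that assumes etcd answers with a JSON body such as {"health":"true"}; the helper name `etcd_is_healthy` is hypothetical.

```shell
# Exit 0 when the /health response body on stdin reports a healthy member.
# Illustrative helper; it only inspects the "health" field of the JSON body.
etcd_is_healthy() {
  grep -q '"health" *: *"true"'
}
```

From a pod with curl, a possible use is `curl -s http://apisix-etcd.etcd.svc.cluster.local:2379/health | etcd_is_healthy && echo "etcd healthy"`.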

Performance Metrics

Current Scale

  • etcd Members: 1 (single member, NOT CLUSTERED)
  • Storage: Persistent volume (check PVC for size)
  • Replicas: 1 (StatefulSet)
  • Age: ~567 days

Stability

  • StatefulSet Age: ~567 days (object age only; judge stability by pod restarts)
  • Pod Restarts: Check recent restart count
  • Data Persistence: Depends on PVC
  • Critical Role: Stores all APISIX configuration
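
The restart count mentioned above can be read with a jsonpath query and summed. The sketch below adds up whitespace-separated counts from stdin; the function name `sum_restarts` is hypothetical.

```shell
# Sum whitespace-separated restart counts read from stdin; prints 0 for empty input.
sum_restarts() {
  awk '{ for (i = 1; i <= NF; i++) total += $i } END { print total + 0 }'
}
```

A possible invocation: `kubectl get pods -n etcd -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}' | sum_restarts`.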

Architecture Notes

  • Single Member: Current deployment has no redundancy
  • StatefulSet: Maintains stable identity and persistent storage
  • Headless Service: DNS discovery for peer communication
  • etcd v3: Modern version with strong consistency guarantees
  • PRODUCTION ISSUE: Single member not suitable for production HA