rfd--test-library--be

Overview

  • Namespace: rfd--test-library--be
  • Purpose: Referring Doctor Test Library Backend - PRODUCTION
  • Age: ~2 years 151 days (since May 2023)
  • Status: Active - Test catalog and library management
  • Workloads: 6 deployments (5 active, 1 scaled to 0)
  • Environment: PRODUCTION - Test information and catalog

Architecture

Test library system handling test catalog, master data sync, and publishing:

  • Main Application: REST API backend (4 replicas) - Excellent HA with HPA
  • Event Consumer: Master data synchronization (1 deployment)
  • Workers: Default worker and Kafka publisher (active), notification worker (scaled to 0)
  • Scheduler: Cron jobs for scheduled tasks

Auto-Scaling Configuration

HPA Configured:

  • 1 Standard HPA: Main API (4-10 replicas, CPU + Memory based)
  • Currently: 4 replicas (min 4, max 10); CPU at 13% against an 80% target, memory at 60MB against a 150MB target
  • HPA age: 209 days (proven setup)

Workload Categories

Main Application (1 deployment with HPA)

Name | Replicas | HPA Status | Purpose
rfd--test-library--be--app--prod | 4/4 | HPA Active (4-10, CPU+Memory) | Main test library API

Event Consumer (1 deployment)

Name | Replicas | Status | Purpose
consumer-rfd-sync-master-data | 1/1 | Running | RFD master data sync

Workers (3 deployments - 2 active, 1 scaled to 0)

Name | Replicas | Status | Purpose
wrk--default | 1/1 | Running | Default worker queue
wrk--kafka-publisher | 1/1 | Running | Kafka event publisher
wrk--notifications | 0/0 | Scaled to 0 | Notification worker (INACTIVE)

Scheduler (1 deployment)

Name | Replicas | Status | Purpose
cron--prod | 1/1 | Running | Scheduled cron jobs

HPA Details

Standard HPA - Main API

Name: rfd--test-library--be--app--prod-hpa
Current: 4 replicas
Min: 4 replicas
Max: 10 replicas
Metrics:
- CPU: 13% / 80% target
- Memory: 60MB / 150MB target
Status: Healthy, below target thresholds
Age: 209 days (proven configuration)
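
To see why the HPA currently sits at its minimum, the replica calculation can be sketched in shell. This follows the general Kubernetes rule (desired = ceil(current replicas × current value / target value), highest result across metrics wins, then clamped to min/max). It is a simplified sketch: it ignores the HPA tolerance band and stabilization windows, and the function names are hypothetical.

```shell
#!/bin/sh
# Replica proposal for one metric:
#   desired = ceil(current_replicas * current_value / target_value)
# Integer ceiling division avoids floating point.
desired_for_metric() {
  current_replicas=$1; current_value=$2; target_value=$3
  echo $(( (current_replicas * current_value + target_value - 1) / target_value ))
}

# Take the highest proposal across CPU and memory, then clamp to [min, max].
hpa_desired_replicas() {
  min=$1; max=$2; replicas=$3
  cpu=$4; cpu_target=$5; mem=$6; mem_target=$7
  by_cpu=$(desired_for_metric "$replicas" "$cpu" "$cpu_target")
  by_mem=$(desired_for_metric "$replicas" "$mem" "$mem_target")
  desired=$by_cpu
  if [ "$by_mem" -gt "$desired" ]; then desired=$by_mem; fi
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# Figures from this page: min 4, max 10, 4 replicas,
# CPU 13% (target 80%), memory 60MB (target 150MB).
hpa_desired_replicas 4 10 4 13 80 60 150
```

With the figures above, both metrics propose fewer than 4 replicas, so the minimum of 4 is what keeps the deployment at its current size.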

Services

Name | Type | Cluster IP | Ports | NodePort | Purpose
rfd--test-library--be--app--prod | NodePort | 10.8.21.192 | 80 | 30299 | Main test library API

Access & Management

View all resources:

kubectl get all -n rfd--test-library--be

Check HPA:

# Check HPA status
kubectl get hpa -n rfd--test-library--be

# Detailed HPA info
kubectl describe hpa rfd--test-library--be--app--prod-hpa -n rfd--test-library--be

# Watch HPA scaling
kubectl get hpa -n rfd--test-library--be -w

Check main application:

# View app pods (4 replicas with HPA)
kubectl get pods -n rfd--test-library--be | grep "app--prod"

# Stream logs (kubectl logs on a deployment follows a single pod, not all replicas)
kubectl logs -f deployment/rfd--test-library--be--app--prod -n rfd--test-library--be

Check consumer and workers:

# Master data sync consumer
kubectl logs -f deployment/rfd--test-library--be--consumer-rfd-sync-master-data--prod -n rfd--test-library--be

# Kafka publisher worker
kubectl logs -f deployment/rfd--test-library--be--wrk--kafka-publisher--prod -n rfd--test-library--be

# Default worker
kubectl logs -f deployment/rfd--test-library--be--wrk--default--prod -n rfd--test-library--be

Restart services:

# Restart main app (HPA will maintain replicas)
kubectl rollout restart deployment/rfd--test-library--be--app--prod -n rfd--test-library--be

# Restart consumer
kubectl rollout restart deployment/rfd--test-library--be--consumer-rfd-sync-master-data--prod -n rfd--test-library--be

# Restart all workers (active only)
kubectl get deployments -n rfd--test-library--be | grep wrk | grep -v "0/0" | awk '{print $1}' | xargs -I {} kubectl rollout restart deployment/{} -n rfd--test-library--be
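
The worker filter above can also be written as a small function that matches on the name column and the desired replica count, rather than grepping the whole row. This is a sketch under assumptions: the function name `active_workers` is hypothetical, the AGE values in the sample rows are made up, and the full name of the notifications deployment is assumed to follow the `--prod` pattern of the other workers.

```shell
#!/bin/sh
# Print names of worker deployments whose desired replica count is above zero.
# Expects rows from `kubectl get deployments --no-headers` on stdin:
#   NAME  READY  UP-TO-DATE  AVAILABLE  AGE
active_workers() {
  awk '$1 ~ /--wrk--/ { split($2, ready, "/"); if (ready[2] + 0 > 0) print $1 }'
}

# Sample rows for illustration only (AGE values are invented).
active_workers <<'EOF'
rfd--test-library--be--app--prod 4/4 4 4 209d
rfd--test-library--be--wrk--default--prod 1/1 1 1 209d
rfd--test-library--be--wrk--kafka-publisher--prod 1/1 1 1 209d
rfd--test-library--be--wrk--notifications--prod 0/0 0 0 209d
EOF
```

Against the cluster, the same function would sit between `kubectl get deployments -n rfd--test-library--be --no-headers` and the `xargs ... kubectl rollout restart` step.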

Monitoring

Resource usage:

kubectl top pods -n rfd--test-library--be --sort-by=memory
kubectl top pods -n rfd--test-library--be --sort-by=cpu

HPA monitoring:

# Current HPA status
kubectl get hpa -n rfd--test-library--be

# HPA events
kubectl describe hpa rfd--test-library--be--app--prod-hpa -n rfd--test-library--be | grep -A 20 "Events:"

Events:

# Sorted oldest-first, so tail shows the most recent events
kubectl get events -n rfd--test-library--be --sort-by='.lastTimestamp' | tail -20

Data Flow

Doctor Test Library Request
    ↓
rfd--test-library--be--app--prod (NodePort 30299)
    ↓
Main Test Library API (4 replicas - HPA scaling 4-10)
    ↓
Database (external)
    ↓
Events Published to Kafka
├─ Master Data Sync → consumer-rfd-sync-master-data
└─ Kafka Publisher → wrk--kafka-publisher
    ↓
Workers Process Background Jobs
    ↓
Cron Jobs → Scheduled Tasks
    ↓
Test catalog, doctor test information

Test Library Workflow

1. Test Library API (Excellent Scaling)

  • 4 replicas currently active (HPA managing)
  • HPA scales 4-10 based on CPU + Memory
  • Test catalog queries
  • Test information lookup
  • Test pricing information
  • Test requirements and instructions
  • Package information

2. Master Data Synchronization

  • consumer-rfd-sync-master-data syncs test master data
  • Test catalog updates
  • Pricing synchronization
  • Test information updates
  • Integration with central test database

3. Kafka Event Publisher

  • wrk--kafka-publisher publishes events to Kafka
  • Test catalog change notifications
  • Event-driven architecture
  • Integration with other services

4. Background Worker

  • wrk--default handles general background tasks
  • Data processing
  • Batch operations

5. Notification Worker (Scaled to 0)

  • wrk--notifications scaled to 0
  • Previously handled notification tasks
  • May be deprecated or moved elsewhere

6. Scheduled Tasks

  • Cron jobs for periodic test library updates
  • Cache refreshes
  • Data validation

Production Considerations

High Availability

Excellent Configuration:

  • Main API: 4 replicas with HPA (4-10) - excellent HA and scaling
  • HPA proven setup (209 days)
  • Current usage well below HPA targets (CPU: 13% of an 80% target, Memory: 60MB of a 150MB target)
  • Very mature namespace (~2 years)

Single Points of Failure:

  • Consumer: 1 replica (master data sync)
  • Workers: 1 replica each
  • Cron job: 1 replica

Scaled to 0:

  • wrk--notifications: Scaled to 0 (review if needed)

Recommendations

  1. Maintain Current HPA Setup (Already Excellent):

    • HPA for main API: Working well (4 replicas, max 10)
    • Healthy metrics (CPU at 13% against an 80% target, memory at 40% of its 150MB target)
    • Good headroom for scaling
    • Min: 4 replicas provides good HA baseline
  2. Consumer Resilience:

    • consumer-rfd-sync-master-data: 1 replica
    • Consider 2 replicas for master data sync reliability
    • Critical for test catalog accuracy
  3. Worker Resilience:

    • wrk--kafka-publisher: 1 replica (consider 2 for event reliability)
    • wrk--default: 1 replica (consider 2 for HA)
  4. Review Scaled-to-0 Worker:

    • wrk--notifications scaled to 0
    • Confirm if permanently unused
    • Remove if deprecated
  5. Recent Activity:

    • Very active: Pods updated 7 days ago
    • Frequent updates (multiple in past month)
    • Monitor stability after updates
  6. Monitoring Priorities:

    • HPA scaling events
    • Master data sync status
    • Kafka publishing success
    • Consumer lag
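
For recommendations 2 and 3, the replica changes could be previewed before anything is applied. The sketch below only prints the `kubectl scale` commands. The helper name `scale_command` is hypothetical, and the deployment names assume the `rfd--test-library--be--<name>--prod` pattern used elsewhere on this page. Whether a second consumer or worker replica actually shares the load depends on Kafka partition counts and queue semantics, which this page does not document, so treat it as a starting point rather than a tested change.

```shell
#!/bin/sh
NAMESPACE="rfd--test-library--be"

# Build (but do not run) the command that sets a deployment's replica count.
scale_command() {
  deployment=$1; replicas=$2
  echo "kubectl scale deployment/$deployment --replicas=$replicas -n $NAMESPACE"
}

# Print the commands for review. The cron deployment is left out on purpose:
# a second scheduler replica may run each job twice.
for name in consumer-rfd-sync-master-data wrk--kafka-publisher wrk--default; do
  scale_command "rfd--test-library--be--${name}--prod" 2
done
```

Once reviewed, the printed lines can be run one at a time, checking `kubectl get pods -n rfd--test-library--be` after each.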

Troubleshooting

HPA issues:

# Check HPA status
kubectl get hpa -n rfd--test-library--be

# Detailed HPA metrics
kubectl describe hpa rfd--test-library--be--app--prod-hpa -n rfd--test-library--be

# Watch for scaling events
kubectl get hpa -n rfd--test-library--be -w

Main API issues:

# Check all 4 API pods
kubectl get pods -n rfd--test-library--be | grep "app--prod"

# Check recent logs (covers all containers of one pod; kubectl picks a single pod of the deployment)
kubectl logs deployment/rfd--test-library--be--app--prod -n rfd--test-library--be --all-containers=true --tail=100

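# Test API endpoint: forward local port 8080 to service port 80, then send requests to http://localhost:8080 from another terminal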
kubectl port-forward -n rfd--test-library--be service/rfd--test-library--be--app--prod 8080:80

Master data sync issues:

# Check consumer
kubectl logs -f deployment/rfd--test-library--be--consumer-rfd-sync-master-data--prod -n rfd--test-library--be

# Check for sync errors
kubectl logs deployment/rfd--test-library--be--consumer-rfd-sync-master-data--prod -n rfd--test-library--be --tail=100 | grep -i "error\|sync\|fail"

# Restart consumer
kubectl rollout restart deployment/rfd--test-library--be--consumer-rfd-sync-master-data--prod -n rfd--test-library--be
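
Because the consumer's own log lines are likely to mention "sync" often, a narrower count of error lines can be a quicker health signal than the grep above. A minimal sketch, assuming plain-text logs; the function name `count_error_lines` and the sample log lines are hypothetical.

```shell
#!/bin/sh
# Count log lines on stdin that mention an error or failure (case-insensitive).
# grep -c exits non-zero when the count is 0, so `|| true` keeps the exit status clean.
count_error_lines() {
  grep -ciE 'error|fail' || true
}

# Invented sample log lines, for illustration only.
count_error_lines <<'EOF'
INFO sync started
ERROR failed to fetch master data
INFO sync finished
WARN retry failed
EOF
```

Against the cluster, the input would come from `kubectl logs deployment/rfd--test-library--be--consumer-rfd-sync-master-data--prod -n rfd--test-library--be --tail=100`.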

Kafka publisher issues:

# Check Kafka publisher
kubectl logs -f deployment/rfd--test-library--be--wrk--kafka-publisher--prod -n rfd--test-library--be

# Check for publishing errors
kubectl logs deployment/rfd--test-library--be--wrk--kafka-publisher--prod -n rfd--test-library--be --tail=100 | grep -i "error\|kafka\|publish"

# Restart publisher
kubectl rollout restart deployment/rfd--test-library--be--wrk--kafka-publisher--prod -n rfd--test-library--be

Cron job issues:

# Check cron
kubectl logs -f deployment/rfd--test-library--be--cron--prod -n rfd--test-library--be

# Restart cron
kubectl rollout restart deployment/rfd--test-library--be--cron--prod -n rfd--test-library--be

Performance Metrics

Current Scale

  • Main API: 4/10 replicas (HPA active, CPU+Memory based)
    • CPU: 13% (target: 80%)
    • Memory: 60MB (target: 150MB)
    • Healthy headroom for scaling
  • Consumer: 1 replica (master data sync)
  • Workers: 2 active workers at 1 replica each
  • Cron: 1 replica
  • Scaled to 0: 1 worker (notifications)
  • Total Active Pods: 8 pods (4 API + 1 consumer + 2 workers + 1 cron)
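
The "healthy headroom" on the main API can be put into numbers by dividing each HPA target by the current value. This is a rough sketch: it assumes per-pod usage grows evenly and ignores the HPA tolerance band, and the function name `headroom` is hypothetical.

```shell
#!/bin/sh
# How far a metric can grow (as a multiple of its current value)
# before it reaches the HPA target.
headroom() {
  current=$1; target=$2
  awk -v c="$current" -v t="$target" 'BEGIN { printf "%.2f\n", t / c }'
}

# Figures from this page: CPU 13% vs 80% target, memory 60MB vs 150MB target.
echo "CPU headroom:    $(headroom 13 80)x"
echo "Memory headroom: $(headroom 60 150)x"
```

Memory has the smaller multiple, so under these assumptions it is the metric more likely to trigger the first scale-out above 4 replicas.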

Stability

  • Namespace Age: ~2 years (very mature)
  • HPA Age: 209 days (proven scaling setup)
  • Recent Updates: 7 days ago (very active development)
  • Auto-Scaling: Excellent (Standard HPA with good metrics)