
spc--lis

Overview

  • Namespace: spc--lis
  • Purpose: Sapoche Laboratory Information System (LIS) - PRODUCTION
  • Age: 416 days (~14 months)
  • Status: Active - Critical medical laboratory system
  • Workloads: 26 deployments (24 active, 2 scaled to 0)
  • Environment: PRODUCTION - Handles all laboratory test processing

Architecture

The Laboratory Information System (LIS) manages the complete laboratory workflow:

  • Main Application: REST API backend (6 replicas with HPA)
  • Event Consumers: Process laboratory events and results (12 deployments: 11 active, 1 scaled to 0)
  • Batch Publishers: Async job publishers for various workflows (11 deployments: 10 active, 1 scaled to 0)
  • Cron: Scheduled tasks, run as the spc--lis--be--cron--prod deployment
  • Observability: OpenTelemetry collector for tracing

Auto-Scaling Configuration

HorizontalPodAutoscalers (4 HPAs)

| HPA Name | Target | Min | Max | Current | Metrics | Type |
|---|---|---|---|---|---|---|
| spc--lis--be--app--prod | Main app | 2 | 20 | 6 | CPU: 46%/80%, Mem: 500Mi | Standard HPA |
| keda-hpa-consumer-lis-sample-status | Sample status consumer | 1 | 10 | 1 | Queue: 0/10 | KEDA |
| keda-hpa-consumer-lis-work-order | Work order consumer | 1 | 4 | 1 | Queue: 0/10 | KEDA |
| keda-hpa-consumer-pdf-webhook | PDF webhook consumer | 2 | 15 | 2 | Queue: 0/50 | KEDA |

Scaling Summary:

  • Main app auto-scales based on CPU/memory load
  • 3 consumers use KEDA for queue-based autoscaling
  • PDF webhook maintains minimum 2 replicas for availability
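The main app's replica count follows the standard Kubernetes HPA rule, desired = ceil(current replicas × current metric / target metric), clamped to min/max. A rough sketch for a single metric with the default 10% tolerance; it does not model the scale-down stabilization window or the "highest proposal across metrics" step, and `hpa_desired` is a hypothetical helper, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate the HPA replica calculation for ONE metric.
# Usage: hpa_desired CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MIN MAX
hpa_desired() {
  local current=$1 metric=$2 target=$3 min=$4 max=$5 diff desired
  # Default HPA tolerance: no change while |metric/target - 1| <= 0.1
  diff=$(( metric > target ? metric - target : target - metric ))
  if [ $(( diff * 10 )) -le "$target" ]; then
    echo "$current"
    return 0
  fi
  # Integer ceiling of current * metric / target
  desired=$(( (current * metric + target - 1) / target ))
  # Clamp to the HPA's minReplicas / maxReplicas
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}
```

With the table's values, `hpa_desired 6 46 80 2 20` prints 4, so CPU alone would allow scaling down to 4. The real HPA also evaluates the memory metric and keeps the highest proposal, and it delays scale-down by the stabilization window (300 seconds by default), which may explain why 6 replicas are running.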

Workload Categories

Main Application (1 deployment)

| Name | Replicas | Status | Purpose |
|---|---|---|---|
| spc--lis--be--app--prod | 6/6 | Running + HPA | Main LIS API (auto-scales 2-20) |

The main application handles:

  • Laboratory test ordering
  • Sample tracking
  • Result entry and verification
  • Quality control
  • Integration with lab analyzers
  • RESTful API for frontend and integrations

Event Consumers (12 deployments: 11 active, 1 scaled to 0)

Process laboratory-related events from message queues:

| Name | Replicas | Status | Purpose |
|---|---|---|---|
| consumer-iris-test-result | 1/1 | Running | IRIS test result integration |
| consumer-lis-auto-verify | 1/1 | Running | Automatic result verification |
| consumer-lis-qc-data | 1/1 | Running | Quality control data processing |
| consumer-lis-sample-status | 1/1 | Running + HPA | Sample status updates (scales 1-10) |
| consumer-lis-test-results-sync | 1/1 | Running | Test result synchronization |
| consumer-lis-vid-attachment-upload | 2/2 | Running | Visit ID attachment uploads (2 replicas) |
| consumer-lis-work-order | 1/1 | Running + HPA | Work order processing (scales 1-4) |
| consumer-order | 1/1 | Running | Order processing |
| consumer-pdf-webhook | 2/2 | Running + HPA | PDF webhook events (scales 2-15) |
| consumer-status-attune | 1/1 | Running | Attune device status |
| consumer-status-attune-forwarder | 1/1 | Running | Attune status forwarder |

Scaled to 0:

  • consumer-lis-attune-evoke-ai (AI integration, inactive)

Batch Publishers (11 deployments: 10 active, 1 scaled to 0)

Publish async jobs for background processing:

| Name | Replicas | Status | Purpose |
|---|---|---|---|
| wrk--batch-publisher | 1/1 | Running | General batch job publisher |
| wrk--batch-publisher-audit | 1/1 | Running | Audit log publishing |
| wrk--batch-publisher-lis-auto-verify | 1/1 | Running | Auto-verification jobs |
| wrk--batch-publisher-lis-qc-data | 1/1 | Running | QC data jobs |
| wrk--batch-publisher-lis-test-result | 1/1 | Running | Test result processing jobs |
| wrk--batch-publisher-lis-test-result-corp | 1/1 | Running | Corporate test results |
| wrk--batch-publisher-lis-test-result-repush | 1/1 | Running | Result re-push jobs |
| wrk--batch-publisher-lis-test-results-sync | 1/1 | Running | Result sync jobs |
| wrk--batch-publisher-sample-status | 1/1 | Running | Sample status jobs |
| wrk--batch-publisher-work-order | 1/1 | Running | Work order jobs |

Scaled to 0:

  • wrk--batch-publisher-lis-attune-evoke-ai (AI integration, inactive)

Supporting Services (2 deployments)

| Name | Replicas | Status | Purpose |
|---|---|---|---|
| spc--lis--be--cron--prod | 1/1 | Running | Scheduled cron jobs |
| spc-lis-otel-collector | 1/1 | Running | OpenTelemetry trace collector |

Services

| Name | Type | Cluster IP | Ports | NodePort | Purpose |
|---|---|---|---|---|---|
| spc--lis--be--app--prod | NodePort | 10.8.27.117 | 80 | 32705 | Main LIS API |
| spc-lis-otel-collector | ClusterIP | 10.8.24.224 | 4317, 4318 | - | OTLP trace collection |

Access & Management

View all resources:

kubectl get all -n spc--lis

Check main application:

# View app pods
kubectl get pods -n spc--lis | grep "app--prod"

# Check HPA status
kubectl describe hpa spc--lis--be--app--prod -n spc--lis

# View logs
kubectl logs -f deployment/spc--lis--be--app--prod -n spc--lis

Check consumers:

# All consumers
kubectl get pods -n spc--lis | grep consumer

# KEDA-scaled consumers
kubectl get hpa -n spc--lis | grep keda

# Consumer logs
kubectl logs -f deployment/spc--lis--be--consumer-lis-sample-status--prod -n spc--lis

Check batch publishers:

# All workers
kubectl get pods -n spc--lis | grep "wrk--"

# Worker logs
kubectl logs -f deployment/spc--lis--be--wrk--batch-publisher-lis-test-result--prod -n spc--lis

Scaling:

# View current scaling
kubectl get hpa -n spc--lis

# Manual scale (HPA will override)
kubectl scale deployment spc--lis--be--app--prod -n spc--lis --replicas=10

# Check KEDA scaled objects
kubectl get scaledobjects -n spc--lis
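The queue name and per-replica threshold behind each KEDA consumer live in the ScaledObject's trigger metadata. A read-only sketch with a hypothetical helper; the trigger type and metadata keys depend on the queue system, which this page does not name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each trigger's type and metadata for one ScaledObject.
# Read-only; uses kubectl's JSONPath output with a range over .spec.triggers.
scaledobject_triggers() {
  kubectl get scaledobject "$1" -n "${NAMESPACE:-spc--lis}" \
    -o jsonpath='{range .spec.triggers[*]}{.type}{"\t"}{.metadata}{"\n"}{end}'
}

# Usage:
#   scaledobject_triggers spc--lis--be--consumer-pdf-webhook--prod
```

This is useful when reviewing the 10/50-message thresholds, since the values actually applied in the cluster may differ from what is recorded here.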

Restart services:

# Restart main app
kubectl rollout restart deployment/spc--lis--be--app--prod -n spc--lis

# Restart specific consumer
kubectl rollout restart deployment/spc--lis--be--consumer-pdf-webhook--prod -n spc--lis

# Restart all consumers
kubectl get deployments -n spc--lis | grep consumer | awk '{print $1}' | xargs -I {} kubectl rollout restart deployment/{} -n spc--lis
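The one-liner above restarts every consumer at the same moment. On a production system a gentler pattern is to restart one consumer at a time and wait for each rollout to finish. A sketch with a hypothetical `restart_consumers_sequentially` function, assuming every consumer Deployment name contains "consumer" as in the tables above; it stops at the first failed or timed-out rollout:

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart consumer Deployments one at a time.
NAMESPACE="${NAMESPACE:-spc--lis}"

restart_consumers_sequentially() {
  local deploy
  # "-o name" prints "deployment.apps/<name>", which the rollout commands accept as-is
  for deploy in $(kubectl get deployments -n "$NAMESPACE" -o name | grep consumer); do
    echo "Restarting ${deploy}"
    kubectl rollout restart "$deploy" -n "$NAMESPACE" || return 1
    # Wait until the new pods are ready before touching the next consumer (5 min cap)
    kubectl rollout status "$deploy" -n "$NAMESPACE" --timeout=300s || return 1
  done
}
```

This takes longer than the parallel version, but at most one consumer is ever mid-restart, and a bad rollout halts the loop instead of spreading to every consumer.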

Monitoring

Resource usage:

kubectl top pods -n spc--lis --sort-by=memory
kubectl top pods -n spc--lis --sort-by=cpu
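To see how load splits between the app, consumers, and publishers, the `kubectl top` output can be aggregated by name pattern. A minimal sketch with a hypothetical `summarize_top` filter; the grouping by substrings (`consumer`, `wrk--`, `app--prod`) is an assumption based on the naming on this page, and it relies on `kubectl top` printing CPU in millicores (`12m`) and memory in MiB (`345Mi`):

```shell
#!/usr/bin/env bash
# Hypothetical filter: sum CPU (millicores) and memory (MiB) per workload category.
# Input: `kubectl top pods --no-headers` lines of the form "NAME CPU MEMORY".
summarize_top() {
  awk '
    {
      group = "other"
      if ($1 ~ /consumer/) group = "consumer"
      else if ($1 ~ /wrk--/) group = "publisher"
      else if ($1 ~ /app--prod/) group = "app"
      cpu = $2; sub(/m$/, "", cpu)    # "150m" -> 150
      mem = $3; sub(/Mi$/, "", mem)   # "500Mi" -> 500
      cpu_sum[group] += cpu
      mem_sum[group] += mem
    }
    END { for (g in cpu_sum) printf "%s %dm %dMi\n", g, cpu_sum[g], mem_sum[g] }
  ' | sort
}

# Usage:
#   kubectl top pods -n spc--lis --no-headers | summarize_top
```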

HPA metrics:

# All HPAs
kubectl get hpa -n spc--lis

# Detailed HPA status
kubectl describe hpa spc--lis--be--app--prod -n spc--lis
kubectl describe hpa keda-hpa-spc--lis--be--consumer-pdf-webhook--prod -n spc--lis

Deployment status:

kubectl get deployments -n spc--lis
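The READY column is easy to misread across 26 rows. A sketch of a hypothetical filter that prints only Deployments with fewer ready replicas than desired; `custom-columns` prints `<none>` when `.status.readyReplicas` is absent (no ready pods), so that case is treated as 0:

```shell
#!/usr/bin/env bash
# Hypothetical filter: print "NAME ready/desired" for Deployments not fully ready.
# Input: "NAME DESIRED READY" lines without a header.
unready_deployments() {
  awk '{
    ready = ($3 == "<none>") ? 0 : $3
    if (ready + 0 < $2 + 0) print $1 " " ready "/" $2
  }'
}

# Usage (read-only):
#   kubectl get deployments -n spc--lis --no-headers \
#     -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas,READY:.status.readyReplicas \
#     | unready_deployments
```

Deployments intentionally scaled to 0 (the two AI integration workloads) have 0 desired and 0 ready, so they are not reported.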

Events:

kubectl get events -n spc--lis --sort-by='.lastTimestamp' | tail -20

Traces (OpenTelemetry):

# Check OTEL collector
kubectl logs -f deployment/spc-lis-otel-collector -n spc--lis

# Port forward to access OTEL endpoints
kubectl port-forward -n spc--lis deployment/spc-lis-otel-collector 4317:4317
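The collector Service also exposes 4318, which suggests the OTLP/HTTP receiver is enabled. Assuming it is, a quick smoke test is to POST an empty trace export request to `/v1/traces` over a port-forward (`kubectl port-forward -n spc--lis deployment/spc-lis-otel-collector 4318:4318`). An empty `resourceSpans` list is a valid OTLP request, so a healthy receiver should answer HTTP 200. `otlp_http_smoke_test` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that an OTLP/HTTP trace receiver accepts requests.
# Assumes a port-forward to the collector's port 4318 is already running.
otlp_http_smoke_test() {
  local endpoint="${1:-http://localhost:4318}" code
  # -w prints only the HTTP status code; curl prints 000 if it cannot connect
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -X POST "${endpoint}/v1/traces" \
    -H 'Content-Type: application/json' \
    -d '{"resourceSpans":[]}')
  if [ "$code" = "200" ]; then
    echo "OTLP/HTTP receiver OK"
  else
    echo "OTLP/HTTP receiver returned HTTP ${code}" >&2
    return 1
  fi
}
```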

Data Flow

External Requests (via APISIX/Traefik)
  ↓
spc--lis--be--app--prod (NodePort 32705)
  ↓
Main LIS API (currently 6 replicas; HPA range 2-20)
  ↓
Database (external)
  ↓
Events Published to Message Queue
  ↓
Consumers Process Events
  ↓
Batch Publishers Create Background Jobs
  ↓
Workers Process Jobs (in other namespaces)
  ↓
Results, Notifications, PDFs

OpenTelemetry Tracing

Application → OTEL Collector (4317/4318) → Backend (Grafana/Jaeger)

Laboratory Workflow

1. Test Ordering

  • Orders created via API
  • Work orders published to queue
  • consumer-order and consumer-lis-work-order process them

2. Sample Collection & Tracking

  • Sample status updates
  • consumer-lis-sample-status processes the updates (KEDA-scaled, 1-10 replicas)
  • Barcode scanning and tracking

3. Analysis & Results

  • Analyzer integration (IRIS, Attune devices)
  • consumer-iris-test-result processes results
  • Auto-verification via consumer-lis-auto-verify

4. Quality Control

  • QC data processing
  • consumer-lis-qc-data handles QC events
  • wrk--batch-publisher-lis-qc-data publishes QC jobs

5. Result Verification & Reporting

  • Manual/auto verification
  • PDF generation via consumer-pdf-webhook
  • Result synchronization across systems

6. Attachments & Documents

  • Visit attachment uploads via consumer-lis-vid-attachment-upload (2 replicas for reliability)
  • PDF webhooks via consumer-pdf-webhook (2-15 replicas, scaled on queue depth)
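When an incident is reported against a workflow stage rather than a specific service, it helps to pull logs from every Deployment behind that stage. A sketch with hypothetical helper names; the stage-to-deployment mapping is taken from the workflow lists above, and the full names assume the `spc--lis--be--<name>--prod` pattern used in the kubectl commands on this page:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: map a workflow stage to its Deployments and tail their logs.
NAMESPACE="${NAMESPACE:-spc--lis}"

stage_deployments() {
  case "$1" in
    ordering)    echo "consumer-order consumer-lis-work-order" ;;
    sample)      echo "consumer-lis-sample-status" ;;
    analysis)    echo "consumer-iris-test-result consumer-lis-auto-verify consumer-status-attune" ;;
    qc)          echo "consumer-lis-qc-data wrk--batch-publisher-lis-qc-data" ;;
    reporting)   echo "consumer-pdf-webhook consumer-lis-test-results-sync" ;;
    attachments) echo "consumer-lis-vid-attachment-upload consumer-pdf-webhook" ;;
    *) echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

stage_logs() {
  local names short
  names=$(stage_deployments "$1") || return 1
  for short in $names; do
    echo "=== ${short} ==="
    # Last 50 lines from one pod of each Deployment (kubectl picks the pod)
    kubectl logs "deployment/spc--lis--be--${short}--prod" -n "$NAMESPACE" --tail=50
  done
}

# Usage:
#   stage_logs reporting
```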

Production Considerations

High Availability

Well Configured:

  • Main API: 6 replicas with HPA (scales to 20)
  • Critical consumers: 2 replicas (vid-attachment-upload, pdf-webhook)
  • KEDA autoscaling for queue-based consumers

Single Points of Failure:

  • Most consumers: 1 replica
  • All batch publishers: 1 replica
  • Cron job: 1 replica
  • OTEL collector: 1 replica
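To re-check this list against the live cluster, filter every Deployment's desired replica count for exactly 1. A minimal sketch with a hypothetical `single_replica_deployments` filter; note that for the KEDA-managed consumers `.spec.replicas` reflects whatever the HPA last set, so they appear here while their queues are idle:

```shell
#!/usr/bin/env bash
# Hypothetical filter: print Deployments whose desired replica count is exactly 1.
# Expects "NAME REPLICAS" lines without a header on stdin.
single_replica_deployments() {
  awk '$2 == 1 { print $1 }'
}

# Usage against the live namespace (read-only):
#   kubectl get deployments -n spc--lis --no-headers \
#     -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas \
#     | single_replica_deployments
```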

Auto-Scaling Summary

| Workload | Type | Current | Min | Max | Scaling Metric |
|---|---|---|---|---|---|
| Main API | Standard HPA | 6 | 2 | 20 | CPU 80%, Mem 500Mi |
| Sample Status Consumer | KEDA | 1 | 1 | 10 | Queue depth 10 |
| Work Order Consumer | KEDA | 1 | 1 | 4 | Queue depth 10 |
| PDF Webhook Consumer | KEDA | 2 | 2 | 15 | Queue depth 50 |

Recommendations

  1. Main API Scaling:

    • Currently at 6 replicas (30% of max capacity)
    • Consider lowering the CPU target from 80% to 70% so the HPA scales out earlier under load
    • Monitor during peak hours
  2. Consumer Reliability:

    • Critical consumers at 1 replica - consider baseline of 2
    • KEDA autoscaling is configured, but all three KEDA consumers currently sit at their minimum replica count (queues empty)
    • Review queue thresholds (10/50 messages)
  3. Batch Publisher Resilience:

    • All at 1 replica - single points of failure
    • Consider 2 replicas for critical publishers:
      • lis-test-result
      • lis-test-result-corp
      • sample-status
  4. Observability:

    • OTEL collector at 1 replica
    • Consider 2+ replicas or running the collector as a DaemonSet
    • Monitor trace collection lag
  5. Resource Cleanup:

    • 2 deployments scaled to 0 (AI integration)
    • Review and remove if permanently unused
  6. Monitoring Priorities:

    • Main API response times
    • Queue depths and consumer lag
    • PDF generation success rate
    • Auto-verification accuracy
    • Sample tracking accuracy

Troubleshooting

Main API issues:

# Check API pods
kubectl get pods -n spc--lis | grep "app--prod"

# Check HPA status
kubectl describe hpa spc--lis--be--app--prod -n spc--lis

# Check logs
kubectl logs -f deployment/spc--lis--be--app--prod -n spc--lis --tail=100

# Test API endpoint
kubectl port-forward -n spc--lis service/spc--lis--be--app--prod 8080:80
# Access http://localhost:8080

Consumer not processing:

# Check consumer status
kubectl get pods -n spc--lis | grep consumer-lis-sample-status

# Check KEDA scaler
kubectl describe scaledobject spc--lis--be--consumer-lis-sample-status--prod -n spc--lis

# Check logs
kubectl logs -f deployment/spc--lis--be--consumer-lis-sample-status--prod -n spc--lis

# Restart consumer
kubectl rollout restart deployment/spc--lis--be--consumer-lis-sample-status--prod -n spc--lis

HPA not scaling:

# Check HPA events
kubectl describe hpa spc--lis--be--app--prod -n spc--lis

# Check metrics server
kubectl top nodes
kubectl top pods -n spc--lis

# Check KEDA operator
kubectl get pods -n keda
kubectl logs -n keda deployment/keda-operator
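Both HPA types here depend on aggregated metrics APIs: the CPU/memory HPA reads `metrics.k8s.io` (served by metrics-server), and the KEDA-created HPAs read `external.metrics.k8s.io` (served by KEDA's metrics API server). If either APIService is not Available, the HPAs cannot fetch metrics. A read-only sketch with a hypothetical helper that reports both:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the Available condition of both metrics APIServices.
# Returns non-zero if either is missing or not Available.
check_metrics_apis() {
  local svc status rc=0
  for svc in v1beta1.metrics.k8s.io v1beta1.external.metrics.k8s.io; do
    status=$(kubectl get apiservice "$svc" \
      -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)
    echo "${svc}: ${status:-missing}"
    [ "$status" = "True" ] || rc=1
  done
  return "$rc"
}
```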

PDF generation delays:

# Check PDF webhook consumer
kubectl get hpa -n spc--lis | grep pdf-webhook
kubectl describe hpa keda-hpa-spc--lis--be--consumer-pdf-webhook--prod -n spc--lis

# Check consumer logs
kubectl logs -f deployment/spc--lis--be--consumer-pdf-webhook--prod -n spc--lis

# Check ScaledObject status and conditions (the current queue depth appears in the TARGETS column of the HPA listing above)
kubectl describe scaledobject spc--lis--be--consumer-pdf-webhook--prod -n spc--lis

Result synchronization issues:

# Check sync consumer
kubectl logs -f deployment/spc--lis--be--consumer-lis-test-results-sync--prod -n spc--lis

# Check sync batch publisher
kubectl logs -f deployment/spc--lis--be--wrk--batch-publisher-lis-test-results-sync--prod -n spc--lis

# Restart both
kubectl rollout restart deployment/spc--lis--be--consumer-lis-test-results-sync--prod -n spc--lis
kubectl rollout restart deployment/spc--lis--be--wrk--batch-publisher-lis-test-results-sync--prod -n spc--lis

Tracing issues:

# Check OTEL collector
kubectl logs -f deployment/spc-lis-otel-collector -n spc--lis

# Check collector metrics
kubectl port-forward -n spc--lis deployment/spc-lis-otel-collector 8888:8888
# Access http://localhost:8888/metrics

Performance Metrics

Current Scale (Production Load)

  • Main API: 6 replicas (moderate load, can scale to 20)
  • Consumers:
    • Critical: 2 replicas (attachment upload, PDF webhook)
    • Standard: 1 replica (sample-status and work-order also autoscale via KEDA)
  • Batch Publishers: 1 replica each
  • Total Pods: 31 at current replica counts (6 app + 13 consumer + 10 publisher + 2 supporting), more when the HPAs scale out

Scaling Behavior

  • Main API HPA: Scales on CPU (80% target utilization) and memory (500Mi)
  • KEDA HPAs: Scale based on queue depth
    • Sample status: 10 messages per replica
    • Work order: 10 messages per replica
    • PDF webhook: 50 messages per replica
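KEDA feeds the queue length to the HPA it creates as an external metric. Assuming the triggers use KEDA's default `AverageValue` metric type, the per-replica targets above translate into replica counts roughly as ceil(queue length / target), clamped to min/max. A hypothetical helper that ignores HPA tolerance and stabilization:

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate replicas for a queue trigger with a per-replica target.
# Usage: keda_desired QUEUE_LENGTH TARGET_PER_REPLICA MIN MAX
keda_desired() {
  local queue=$1 target=$2 min=$3 max=$4 desired
  # Integer ceiling of queue / target
  desired=$(( (queue + target - 1) / target ))
  # Clamp to minReplicaCount / maxReplicaCount
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}
```

For example, a backlog of 120 PDF webhook messages gives `keda_desired 120 50 2 15` = 3 replicas, and the work-order consumer reaches its cap of 4 once 31 or more messages are queued.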

Integration Points

External Systems

  1. Lab Analyzers:

    • IRIS (hematology analyzer)
    • Attune (flow cytometer)
    • Consumer-based integration
  2. Corporate Systems:

    • Test result synchronization
    • Corporate reporting
  3. PDF Generation:

    • Webhook-based PDF generation
    • Result reports, labels, certificates
  4. Patient Portal:

    • Test result delivery
    • Notification integration

Important Notes

PRODUCTION ENVIRONMENT:

  • This is a CRITICAL PRODUCTION system handling medical laboratory data
  • Downtime directly impacts patient care and laboratory operations
  • Changes must be tested in staging first
  • Coordinate with laboratory operations team
  • Monitor carefully during deployments
  • Have immediate rollback plan ready

Compliance: Laboratory data is subject to strict regulatory requirements (HIPAA, local medical regulations).