
Production1 Cluster (asia-southeast1)

Cluster Information

  • Cluster Name: gke_diagvn_asia-southeast1_production1
  • Project: diagvn
  • Region: Asia Southeast 1 (Singapore)
  • Zone: Regional (multi-zone)
  • Environment: Production
  • Kubernetes Version: v1.30.9-gke.1127000 / v1.30.3-gke.1969002

Control Plane

  • API Server: https://35.185.183.71
  • Status: Running
  • Components:
    • GLBCDefaultBackend
    • KubeDNS
    • Metrics-server

Cluster Resources

Nodes (10 nodes across 4 node pools)

| Node Name | Node Pool | Machine Type | CPU | Memory | Age | Usage |
|---|---|---|---|---|---|---|
| gke-production1-diag-vpn-enable-pool-cf6245a3-ckjf | diag-vpn-enable-pool | e2-standard-4 | 4 | 16 GB | 224d | CPU: 15%, Mem: 56% |
| gke-production1-high-performance-a0df873e-g9nd | high-performance | t2d-standard-2 | 2 | 8 GB | 10h | CPU: 26%, Mem: 68% |
| gke-production1-high-performance-a0df873e-sjtp | high-performance | t2d-standard-2 | 2 | 8 GB | 16d | CPU: 18%, Mem: 57% |
| gke-production1-high-performance-a0df873e-vg7v | high-performance | t2d-standard-2 | 2 | 8 GB | 14d | CPU: 39%, Mem: 62% |
| gke-production1-production-asia-south-7b3f9129-hijo | production-asia-southeast1-b | e2-standard-4 | 4 | 16 GB | 381d | CPU: 29%, Mem: 84% |
| gke-production1-production-asia-south-8d9044df-jiw3 | production-asia-southeast1-b | e2-standard-4 | 4 | 16 GB | 381d | CPU: 33%, Mem: 63% |
| gke-production1-production-asia-south-8d9044df-uilb | production-asia-southeast1-b | e2-standard-4 | 4 | 16 GB | 381d | CPU: 31%, Mem: 65% |
| gke-production1-production-asia-south-e4fd6e82-e28p | production-asia-southeast1-b | e2-standard-4 | 4 | 16 GB | 381d | CPU: 47%, Mem: 93% (WARNING) |
| gke-production1-production-asia-south-e4fd6e82-ex49 | production-asia-southeast1-b | e2-standard-4 | 4 | 16 GB | 381d | CPU: 98% (CRITICAL), Mem: 93% (WARNING) |
| gke-production1-production-chrome-spo-409316b8-hlrk | production-chrome-spot (Spot) | e2-standard-8 | 8 | 32 GB | 4h | CPU: 5%, Mem: 4% |

Total Capacity: 38 vCPUs, 152 GB RAM (sum of the per-node figures above)

CRITICAL: Node ex49 has 98% CPU usage and 93% memory usage.

WARNING: Node e28p has 93% memory usage.
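The alert levels above can be reproduced from live metrics. The following is a minimal sketch, not the rule this page was generated with: the function name `flag_nodes` and the 90% / 95% thresholds (`WARN_PCT`, `CRIT_PCT`) are assumptions chosen only because they match the markers shown in the node table.

```shell
# Flag nodes whose CPU% or memory% crosses a threshold.
# Reads `kubectl top nodes --no-headers` output on stdin:
#   NAME  CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
# WARN_PCT and CRIT_PCT are assumed defaults, not values taken from this page.
flag_nodes() {
  awk -v warn="${WARN_PCT:-90}" -v crit="${CRIT_PCT:-95}" '
    function level(p) {
      if (p >= crit) return "CRITICAL"
      if (p >= warn) return "WARNING"
      return "OK"
    }
    {
      cpu = $3 + 0; mem = $5 + 0        # "98%" converts to 98
      c = level(cpu); m = level(mem)
      if (c != "OK" || m != "OK")
        printf "%s cpu=%d%%(%s) mem=%d%%(%s)\n", $1, cpu, c, mem, m
    }'
}
```

Typical use would be `kubectl top nodes --no-headers | flag_nodes`; nodes below both thresholds produce no output.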

Node Pools Summary

| Pool | Machine Type | Spot | Nodes | Total vCPUs | Total Memory | Purpose |
|---|---|---|---|---|---|---|
| diag-vpn-enable-pool | e2-standard-4 | No | 1 | 4 | 16 GB | VPN-enabled workloads |
| high-performance | t2d-standard-2 | No | 3 | 6 | 24 GB | High-performance workloads |
| production-asia-southeast1-b | e2-standard-4 | No | 5 | 20 | 80 GB | Main production workloads |
| production-chrome-spot | e2-standard-8 | Yes | 1 | 8 | 32 GB | Chrome headless (cost-optimized) |
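As a cross-check on the capacity figures, the pool totals can be recomputed from node count and per-node size. This is a small sketch: the helper name `sum_capacity` and its input layout are illustrative, and the numbers fed to it are the ones in the tables above.

```shell
# Sum vCPUs and memory across node pools.
# Input on stdin, one pool per line: "pool nodes vcpus_per_node mem_gb_per_node".
sum_capacity() {
  awk '{ cpu += $2 * $3; mem += $2 * $4 }
       END { printf "%d vCPUs, %d GB RAM\n", cpu, mem }'
}

sum_capacity <<'EOF'
diag-vpn-enable-pool          1 4 16
high-performance              3 2 8
production-asia-southeast1-b  5 4 16
production-chrome-spot        1 8 32
EOF
# prints: 38 vCPUs, 152 GB RAM
```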

Workload Overview

  • Total Deployments: 375
  • Total Namespaces: 88
  • Active Namespaces with Workloads: 66
  • HorizontalPodAutoscalers: 40
  • StatefulSets: Multiple (databases, etcd, monitoring)

Access

To connect to this cluster:

# Switch context
kubectx gke_diagvn_asia-southeast1_production1

# Verify connection
kubectl cluster-info

# View nodes
kubectl get nodes

# View all resources
kubectl get all --all-namespaces
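Because several clusters may be configured on the same workstation, it can help to confirm the active context before running anything against production. The guard below is a sketch; `require_context` is a hypothetical helper name, and it relies only on `kubectl config current-context`.

```shell
# Abort unless kubectl currently points at the expected context.
require_context() {
  expected="$1"
  current="$(kubectl config current-context 2>/dev/null)"
  if [ "$current" != "$expected" ]; then
    echo "refusing to continue: current context is '${current:-<none>}', expected '$expected'" >&2
    return 1
  fi
}
```

For example: `require_context gke_diagvn_asia-southeast1_production1 && kubectl get nodes`.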

External Access Points

| Service | Namespace | External IP | Ports | Purpose |
|---|---|---|---|---|
| apisix-gateway | apisix | 34.87.114.121 | 80, 443, 2222 | API Gateway |
| traefik | default | 34.126.178.249 | 80, 443 | Primary ingress controller |
| workflow-management | workflow-management | 34.124.238.59 | 5678 | n8n workflow automation |

Application Namespaces (Organized by Domain)

Infrastructure & Platform Services

| Namespace | Purpose | Deployments | Status |
|---|---|---|---|
| apisix | API Gateway | Multiple | Active |
| etcd | Key-value store | StatefulSet | Active |
| monitoring | Prometheus/Grafana stack | 10 deployments | Active |
| keda | Event-driven autoscaling | 3 | Active |
| sentry | Error tracking | Active | Active |
| sentry-relay | Error tracking relay | 2 | Active |
| netsuite-producer | NetSuite ERP Event Producer | 0 | Empty |

Patient Portal (PAT) Services

| Namespace | Purpose | Deployments | Status |
|---|---|---|---|
| pat--authenticate | Authentication service | Multiple | Active |
| pat--booking--be | Booking backend | 14 | Active |
| pat--patient-mgt--be | Patient management | Multiple | Active |
| pat--test-result--be | Test results | 21 | Active |
| pat--notification--be | Notifications | 8 | Active |
| pat--reminder-booking--be | Booking reminders | Multiple | Active |
| pat--dtc--be | DTC backend | Multiple | Active |
| pat--dlq--be | Dead letter queue | Multiple | Active |
| pat--webapp | Patient web application | Multiple | Active |

Referral/Results (RFD) Services

| Namespace | Purpose | Deployments | Status |
|---|---|---|---|
| rfd--webapp | RFD web application | 7 | Active |
| rfd--user-mgt | User management | 7 | Active |
| rfd--notification | Notifications | 8 | Active |
| rfd--test-library--be | Test library backend | Multiple | Active |
| rfd--doctor-kyc | Doctor KYC | Multiple | Active |
| rfd--doctor-statement | Doctor statements | Multiple | Active |
| rfd--doctor-test-result | Doctor test results | Multiple | Active |
| rfd--order-history | Order history | Multiple | Active |
| rfd--dashboard | Dashboard | Multiple | Active |
| rfd--mixpanel-integration | Analytics integration | Multiple | Active |

Sapoche Clinic (SPC) Services

| Namespace | Purpose | Deployments | Status |
|---|---|---|---|
| spc--lis | Laboratory Information System | 26 | Active |
| spc--pos | Point of Sale | 18 | Active |
| spc--webapp | Clinic web application | 7 | Active |
| spc--appointment--be | Appointments | 8 | Active |
| spc--audit | Audit logging | Multiple | Active |
| spc--delivery--be | Delivery management | Multiple | Active |
| spc--promotion--be | Promotions | Multiple | Active |
| spc--booking-qs | Booking queue system | Multiple | Active |
| spc--noti-centre--be | Notification center | Multiple | Active |
| spc--pdf-generate--be | PDF generation | 10 | Active |
| spc--purchase-order--be | Purchase orders | Multiple | Active |
| spc--websocket-server | WebSocket server | Multiple | Active |

Sapoche Core Services

| Namespace | Purpose | Deployments | Status |
|---|---|---|---|
| sapoche | Main Sapoche application | 39 | Active |
| sapoche-micro-finance | Finance microservice | 1 | Active |
| sapoche-micro-search-pid | Patient ID search | 1 | Active |
| sapoche-micro-search-vid | Visit ID search | 1 | Active |

Data Streaming Services

| Namespace | Purpose | Deployments | Status |
|---|---|---|---|
| data-streaming | Core data streaming | 3 | Inactive (Scaled to 0) |
| data-streaming-consumers | Consumer services (6 namespaces) | Multiple | Active |
| ds-ns-producer | NetSuite producer | 3 | Active |
| prod-ds-producer | Production data streaming | Multiple | Active |
| netsuite-consumer | NetSuite consumer | 11 | Active |

DIAG Internal Applications

| Namespace | Purpose | Deployments | Status |
|---|---|---|---|
| diagcorp | DIAG corporate portal | 4 (all scaled to 0) | Inactive |
| diagpass | DIAG employee pass system | 3 (all scaled to 0) | Inactive |
| diag-accounting-team | Accounting tools | 1 | Active |

Supporting Services

| Namespace | Purpose | Deployments | Status |
|---|---|---|---|
| imp | IMP application | 26 | Active |
| ecg | ECG services | 3 + CronJob | Active |
| signature | Signature services | 1 active, 4 scaled to 0 | Partially Active |
| webportal | Web portal | Multiple | Active |
| short-links | URL shortener | Multiple | Active |
| dynamic-links | Dynamic link generator | Multiple | Active |
| chrome-headless | Headless Chrome for automation | 4 | Active |
| external-container | External containers | Multiple | Active |
| scalar-docs | API documentation | Multiple | Active |
| workflow-management | n8n workflow automation | 1 | Active |
| auto-assign-bot | Automated assignment | Multiple | Active |
| slackbot | Slack integration | Multiple | Active |
| zendesk | Customer support integration | Multiple | Active |
| fingerprint | Biometric authentication | Multiple | Active |
| nocodb | No-code database | Multiple | Active |
| pmm | Performance monitoring | Multiple | Active |

Critical Alerts

Resource Pressure

IMMEDIATE ACTION REQUIRED:

Node gke-production1-production-asia-south-e4fd6e82-ex49:

  • CPU Usage: 98% (CRITICAL)
  • Memory Usage: 93% (WARNING)
  • Age: 381 days
  • Recommended Actions:
    1. Identify high-consuming pods: kubectl top pods --all-namespaces --sort-by=cpu
    2. Consider horizontal scaling of workloads
    3. Investigate whether pods can be moved to less-loaded nodes in the pool
    4. Add more nodes to the pool, or raise the resource requests of the heaviest pods so the scheduler places fewer of them on each node
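Step 1 lists the heaviest pods cluster-wide. To narrow that to the pods actually running on the affected node, one option is to join pod placement with pod usage. The sketch below assumes metrics-server is serving `kubectl top`; the function name `pods_on_node_by_cpu` is hypothetical.

```shell
# List pods scheduled on one node, sorted by current CPU usage (millicores).
# Joins `kubectl get pods` (placement) with `kubectl top pods` (usage).
pods_on_node_by_cpu() {
  node="$1"
  tmp="$(mktemp)" || return 1
  # Namespace and name of every pod bound to the node.
  kubectl get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$node" \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name > "$tmp"
  # Keep only usage rows whose namespace/name appears in the placement list.
  kubectl top pods --all-namespaces --no-headers |
    awk 'NR == FNR { on_node[$1 "/" $2] = 1; next }
         { key = $1 "/" $2; if (key in on_node) print $3 + 0, key }' "$tmp" - |
    sort -rn
  rm -f "$tmp"
}
```

For example: `pods_on_node_by_cpu gke-production1-production-asia-south-e4fd6e82-ex49 | head -20`. Each output line is the CPU usage in millicores followed by `namespace/pod`.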

Node gke-production1-production-asia-south-e4fd6e82-e28p:

  • Memory Usage: 93% (WARNING)
  • Recommended Action: Monitor closely, prepare to scale

Performance Optimization

  1. Node Distribution: 5 nodes in main pool show varying load (29-98% CPU)
  2. Spot Node: Chrome spot node significantly underutilized (5% CPU, 4% mem)
  3. HPAs: 40 HPAs configured - verify they're responding appropriately
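One quick signal for item 3 is an HPA that is already at its maximum replica count, since it cannot scale any further under load. The following is a sketch of that check, not an established procedure for this cluster; `hpas_at_max` is a hypothetical helper name.

```shell
# Print HPAs whose current replica count has reached spec.maxReplicas.
hpas_at_max() {
  kubectl get hpa --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,MAX:.spec.maxReplicas,CUR:.status.currentReplicas |
    # "+ 0" forces numeric comparison; a missing value such as "<none>" becomes 0.
    awk '$3 + 0 > 0 && $4 + 0 >= $3 + 0 { printf "%s/%s replicas=%d/%d\n", $1, $2, $4, $3 }'
}
```

An empty result means no HPA is currently capped. Any HPA listed is a candidate for a higher `maxReplicas` or for more node capacity.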

Important Notes

  • This is a production cluster - exercise extreme caution when making changes
  • CRITICAL resource pressure on node ex49 requires immediate attention
  • Large scale: 375 deployments across 88 namespaces
  • Multiple major applications: PAT (patient portal), RFD (referral), SPC (clinic), Sapoche
  • Microservices architecture with domain-based namespace organization
  • 40 HPAs configured for auto-scaling

Quick Commands

# Check critical node
kubectl top node gke-production1-production-asia-south-e4fd6e82-ex49
kubectl describe node gke-production1-production-asia-south-e4fd6e82-ex49

# Find high-consuming pods
kubectl top pods --all-namespaces --sort-by=cpu | head -20
kubectl top pods --all-namespaces --sort-by=memory | head -20

# Sum CPU usage (millicores) by namespace, highest first
kubectl top pods --all-namespaces --no-headers | awk '{cpu[$1] += $3} END {for (ns in cpu) print cpu[ns], ns}' | sort -rn

# View HPAs
kubectl get hpa --all-namespaces

# View all LoadBalancer services
kubectl get svc --all-namespaces --field-selector spec.type=LoadBalancer

# List pods running on the critical node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=gke-production1-production-asia-south-e4fd6e82-ex49

Architecture Overview

Production1 cluster follows a microservices architecture organized by business domains:

  1. PAT: Patient-facing portal services (9 namespaces)
  2. RFD: Referral and test results management (10 namespaces)
  3. SPC: Sapoche clinic management system (12 namespaces)
  4. Sapoche: Core Sapoche platform and microservices (4 namespaces)
  5. Data Streaming: Event streaming and integration services (8 namespaces)
  6. Infrastructure: Supporting platform services (monitoring, API gateway, databases)
  7. DIAG Internal: Corporate applications (diagcorp, diagpass, accounting)

Each domain has multiple microservices deployed in separate namespaces for isolation and independent scaling.
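The domain-based naming (`pat--`, `rfd--`, `spc--`) makes it straightforward to count namespaces per domain. The sketch below treats the text before the first `--` as the domain and groups everything else under `other`; that grouping rule and the function name `count_by_domain` are assumptions for illustration.

```shell
# Count namespaces per domain prefix (the text before the first "--").
# Reads one namespace name per line on stdin.
count_by_domain() {
  awk '{
         i = index($0, "--")
         d = i ? substr($0, 1, i - 1) : "other"
         n[d]++
       }
       END { for (d in n) print n[d], d }' |
    sort -k1,1nr -k2,2
}
```

For example: `kubectl get ns --no-headers -o custom-columns=NAME:.metadata.name | count_by_domain`.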