This architecture provides a highly available, multi-regional deployment for KnowledgeCity that ensures Saudi Arabian users' data remains in the KSA region, U.S. users' data remains in the U.S., and KnowledgeCity course content is globally distributable with minimal latency. The design is built on GCP and leverages GKE for container orchestration, Cloud Storage for media, Pub/Sub for eventing, and ClickHouse for analytics. It targets high availability (multi-AZ per region), horizontal scalability, and cost optimization.
- High-level topology
- Multi-regional & data-residency model
- Compute & microservice layout
- Media pipeline (upload → convert → delivery)
- Analytics
- Observability & SRE
- Dashboards and troubleshooting
- Scaling & failure handling
- Networking, security & compliance
- Cost optimization
- Cloud provider: Google Cloud Platform (GCP). Rationale: KSA and US regions available, mature global networking, managed services to speed delivery and reduce ops cost.
- Infra as code: Terraform (module-per-region) with a small wrapper (Terragrunt) to instantiate new regions quickly and consistently.
- Projects & network:
- Use a single global GCP VPC. Create one subnet per region (and per cluster) inside that VPC. Each subnet will host the regional GKE cluster node pools and have dedicated secondary CIDR ranges for Kubernetes pods and services.
- Orchestration: Google Kubernetes Engine (GKE) clusters in each region (3+ zones per region).
- CI/CD & registry: Artifact Registry + GitHub Actions / Argo Rollouts for builds, image signing, vulnerability scanning, and progressive rollouts.
- Per-region primary data:
- Each customer’s personal or regulated data is stored only in their home region (KSA or US). This applies to databases and regional object storage buckets.
- Implement identity/tenant routing at the API layer (ingress) so requests are routed to the correct regional backend.
- Databases:
- Cloud SQL (regional primary + standby in another zone) for each region.
- Global content:
- Course materials and public content are stored in a global bucket (or multi-region bucket) and served through CDN caching.
- Frontend
- Store the frontend static assets in a Cloud Storage bucket and serve them via Cloud CDN to keep content closer to users and reduce latency.
- GKE clusters per region, spread across 3 availability zones. For each region:
- Node pools:
- control node pool (standard nodes) for critical services and stateful workloads.
- spot / preemptible node pool for batch/convert workloads (taints + tolerations).
- Use pod anti-affinity to distribute pods across zones.
- Use PodDisruptionBudgets for availability during upgrades.
- Enable Cluster Autoscaler + Horizontal Pod Autoscaler (HPA) per workload.
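A minimal sketch of how these scheduling policies could be expressed, assuming an illustrative `video-convert` workload and a `workload-type=spot` taint on the spot pool (both names are assumptions, not fixed by this design):

```yaml
# Illustrative manifests; names, labels, and the taint key are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: video-convert
spec:
  replicas: 3
  selector:
    matchLabels: {app: video-convert}
  template:
    metadata:
      labels: {app: video-convert}
    spec:
      # Allow scheduling onto the tainted spot/preemptible node pool.
      tolerations:
        - key: workload-type
          operator: Equal
          value: spot
          effect: NoSchedule
      # Prefer spreading replicas across zones for availability.
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels: {app: video-convert}
                topologyKey: topology.kubernetes.io/zone
      containers:
        - name: worker
          image: REGION-docker.pkg.dev/PROJECT/media/video-convert:latest
---
# Keep at least 2 workers available during voluntary disruptions (upgrades).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: video-convert-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels: {app: video-convert}
```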
- Ingress:
- Global HTTP(S) Load Balancer provides a single global entry. LB routes requests to regional backends based on geography / latency / failover policy.
- Use GKE Ingress + Gateway API (CRDs) for per-cluster routing and consistent config.
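A hedged sketch of the Gateway API config on GKE; the gateway name, hostname, and backend service are illustrative assumptions:

```yaml
# Illustrative Gateway + HTTPRoute; names and hostnames are assumptions.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: api-tls-cert
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: external-gateway
  hostnames: ["api.example.com"]
  rules:
    - matches:
        - path: {type: PathPrefix, value: /api}
      backendRefs:
        - name: api-backend
          port: 8080
```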
- Microservices pattern:
- Monolith API can be containerized and deployed in each region (if latency/availability requires it) or run centrally if residency permits.
- Analytics, video conversion, and future microservices are independent containers with clear contracts (REST/gRPC + events).
- Develop a Helm chart to make it easy to deploy future microservices to the GKE clusters.
- Upload
- Each region has dedicated Cloud Storage buckets (regional) for raw uploads and processed outputs to enforce residency.
- Frontend uploads directly to the regional bucket using short-lived signed URLs (V4 signed URLs) to keep traffic out of app servers and ensure secure upload.
- Eventing
- Cloud Storage notifications → Pub/Sub topic (regional).
- Video conversion service in the same region subscribes (pull) to the topic and processes jobs.
- Processing
- Worker pods (on spot node pools) fetch files, transcode into required formats/bitrates, upload processed outputs to the regional processed bucket.
- Use metadata to tag outputs (quality, codec) and write processing results to the regional database or a processing-status topic.
- Delivery
- Processed files for global course content are replicated or published to a global bucket / origin for CDN if allowed by residency rules.
- Configure Google Cloud CDN (or Cloudflare/third-party CDN) in front of the origin, with cache keys and TTLs tuned for course assets.
- Use signed URLs and token authentication for private video streaming (short TTLs, optional signed cookies for players).
- Storage lifecycle
- Raw uploads lifecycle: move to Coldline/Archive after retention period; processed outputs stay in Standard or Multi-region as required.
- ClickHouse: provision ClickHouse clusters per region (or use ClickHouse Cloud). Aggregate events locally and stream selected aggregates cross-region for global reports.
- Metrics: Prometheus in each cluster (Prometheus Operator) + Thanos for global, long-term storage, deduplication, downsampling and cross-cluster querying.
- Logs: Loki deployed per cluster with object storage (GCS) as long-term chunk storage.
- Dashboards / Alerting: Grafana as the single pane of glass (data sources: Thanos Querier, Loki). Alerts via Prometheus/Thanos Ruler / Alertmanager or Grafana Alerting with high-availability configuration.
- Tracing (recommended): Instrument apps with OpenTelemetry and centralize traces to a collector.
- Alerting & SLOs: Define SLOs for availability and latency (e.g., 99.99% availability targets), configure alerts and escalation playbooks.
- Instrument Frontend with RUM (Real User Monitoring) to capture client-side metrics and errors (latency, first paint, JS errors).
- Trace requests end-to-end with OpenTelemetry (browser -> backend) to correlate frontend latency to backend services.
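An SLO alert like the one described above could be sketched as a PrometheusRule CRD; the metric name, job label, and burn-rate threshold are illustrative assumptions:

```yaml
# Illustrative SLO alert; metric names and thresholds are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-slo-alerts
spec:
  groups:
    - name: api-availability
      rules:
        - alert: HighErrorRateFastBurn
          # Fires when >5% of requests fail over 5 minutes, burning the
          # availability error budget far faster than sustainable.
          expr: |
            sum(rate(http_requests_total{job="api",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="api"}[5m])) > 0.05
          for: 2m
          labels:
            severity: page
```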
- CI/CD & Release strategy: Canary deployments, automated canary analysis, and rollout automation via Argo CD and Argo Rollouts.
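A canary strategy with Argo Rollouts could look like the sketch below; the service name, image path, weights, and pause durations are assumptions:

```yaml
# Illustrative Argo Rollouts canary; names, weights, and durations are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  replicas: 5
  selector:
    matchLabels: {app: api}
  template:
    metadata:
      labels: {app: api}
    spec:
      containers:
        - name: api
          image: REGION-docker.pkg.dev/PROJECT/apps/api:stable
  strategy:
    canary:
      steps:
        - setWeight: 10          # shift 10% of traffic to the new version
        - pause: {duration: 10m} # hold while canary analysis runs
        - setWeight: 50
        - pause: {duration: 10m}
        # full promotion proceeds automatically after the final step
```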
- Provide ready dashboards:
- Cluster overview (CPU, memory, pod count, node health).
- Application latency/error dashboards (per service).
- Video pipeline metrics (queue depth, job time, success/failure rate).
- CDN / edge metrics and cache hit ratio.
- Correlate logs & metrics in Grafana: link a metric alert to relevant logs via log labels (namespace, pod, request_id).
- Add RUM dashboards (frontend) to measure user experience and tie to backend metrics.
- Autoscaling: HPA (requests / CPU / custom metrics queue length) + cluster autoscaler to scale nodes automatically.
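Scaling on queue length could be sketched with an HPA driven by the Pub/Sub backlog as an external metric; this assumes the Custom Metrics Stackdriver Adapter is installed, and the deployment and subscription names are illustrative:

```yaml
# Illustrative HPA on Pub/Sub backlog; names and targets are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: video-convert-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: video-convert
  minReplicas: 1
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: video-convert-sub
        target:
          type: AverageValue
          averageValue: "100"   # aim for ~100 queued jobs per worker
```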
- Regional failover:
- For user-facing content: global LB with region-based routing and failover. If a region is down, LB can failover traffic to another region (with potential user data access implications — make sure residency policy is respected).
- For user data: maintain per-region primaries. In catastrophic region failure, have documented DR procedure to restore from backups or promote cross-region replica (must comply with residency rules/legal constraints).
- Resilience of video pipeline: Use Pub/Sub message retries, dead letter topics.
- Network
- Global VPC: one VPC spanning regions. Subnets are regional objects (e.g., subnet-ksa-1, subnet-usa-1) to logically separate traffic and enforce regional residency.
- Private connectivity: Private Service Connect / VPC peering to managed services (ClickHouse cloud/Cloud SQL) to keep traffic on the private network.
- Subnet per cluster: create a dedicated subnet for each GKE cluster and assign secondary CIDR ranges for pod & service IPs (GKE VPC-native / alias IPs).
- Route egress through regional Cloud NAT or egress VMs with strict firewall rules to control outbound IPs (important for compliance & whitelisting).
- Network security
- Network segmentation: use firewall rules, tags, and service accounts to isolate traffic between subnets (e.g., frontend, backend, video-processing, analytics).
- WAF & DDoS: Cloud Armor in front of the Load Balancer for WAF rules and DDoS protection. Policies can be managed declaratively via YAML files.
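A hedged sketch of a declarative Cloud Armor policy (importable with `gcloud compute security-policies import`); the policy name and rule set are illustrative assumptions:

```yaml
# Illustrative Cloud Armor policy; name and rules are assumptions.
name: edge-waf-policy
description: WAF and default rules for the global load balancer
rules:
  - action: deny(403)
    priority: 1000
    description: Block common XSS patterns
    match:
      expr:
        expression: evaluatePreconfiguredExpr('xss-stable')
  - action: allow
    priority: 2147483647
    description: Default allow
    match:
      versionedExpr: SRC_IPS_V1
      config:
        srcIpRanges: ["*"]
```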
- Restrict access to sensitive dashboards and applications via Identity-Aware Proxy (IAP).
- Encryption & keys: CMEK (Cloud KMS) for sensitive resources; encrypt objects at rest and in transit.
- IAM: Least privilege for service accounts; workload identity for pods to access GCP services.
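On the Kubernetes side, Workload Identity boils down to annotating a service account with the GCP service account it impersonates; `PROJECT_ID`, the namespace, and account names below are assumptions, and the matching `roles/iam.workloadIdentityUser` binding must be granted on the GCP side:

```yaml
# Illustrative Workload Identity binding; names and project are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: video-convert
  namespace: media
  annotations:
    iam.gke.io/gcp-service-account: video-convert@PROJECT_ID.iam.gserviceaccount.com
```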
- Secret management
- Use HashiCorp Vault to store secrets and the Vault Agent Injector to inject secrets into Kubernetes pods.
- Use dynamic credentials and key rotation wherever possible with HashiCorp Vault.
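With the Vault Agent Injector, secret injection is driven by pod annotations; the Vault role and secret path below are illustrative assumptions:

```yaml
# Illustrative Vault Agent Injector annotations; role and paths are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels: {app: api}
  template:
    metadata:
      labels: {app: api}
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "api"
        # Renders the secret to /vault/secrets/db-creds inside the pod.
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/api"
    spec:
      containers:
        - name: api
          image: REGION-docker.pkg.dev/PROJECT/apps/api:latest
```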
- CI/CD
- Run GitHub Actions self-hosted runners in our cluster and use Workload Identity service accounts to access other GCP services, eliminating static credentials.
- Integrate image scanning tool into the CI/CD pipeline to scan images for vulnerabilities.
- Integrate SonarQube's automated static analysis and Quality Gates into the CI/CD pipeline so builds fail automatically when code doesn't meet defined thresholds for bugs, vulnerabilities, code smells, or test coverage.
- Audit and compliance
- Use HashiCorp Vault audit devices to record who accessed which secrets and when.
- Use Google Cloud Audit Logs to observe who did what on which resources and when.
- Use spot/preemptible nodes for non-critical batch processing.
- Storage lifecycle policies for raw video to move cold items to cheaper classes.
- CDN + aggressive edge caching for course assets to minimize origin egress.
- Right-size DB instances and use read replicas only where necessary.
- Use GKE Custom Compute Classes to encode scheduling policies that prefer on-demand node capacity for steady state workloads and automatically place overflow or fault-tolerant workloads on spot (preemptible) node pools. For predictable baseline capacity, purchase a resource-based Committed Use Discount (CUD) sized to that baseline (for example, 200 vCPU / 800 GB RAM) for a 1- or 3-year term to reduce compute costs (typical discounts ≈37% for 1 year and ≈55% for 3 years).
