feat: github event byoc demo #122
base: main
Conversation
Pull request overview
This PR introduces a comprehensive GitHub event analytics demo for AutoMQ's BYOC (Bring Your Own Cloud) platform, showcasing the Table Topic feature that automatically converts streaming data into Apache Iceberg tables for real-time analytics.
Key changes:
- Adds Terraform infrastructure for deploying a complete GitHub event analytics pipeline on AWS EKS
- Implements a Python event producer that ingests GitHub archive data into Kafka topics (see the sketch after this list)
- Provides a Marimo-based analytics notebook for real-time visualization and querying via AWS Athena
- Configures S3 Tables and Athena for querying Iceberg-formatted event data
- Switches default node capacity type to SPOT instances for cost optimization
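
For orientation, the sketch below (our illustration, not code from this PR) shows the general shape of the ingestion loop behind the producer: download one hourly GH Archive file, decode it, and publish each event to a Kafka topic. The topic name, bootstrap address, and the use of JSON instead of Avro with a schema registry are simplifications.

```python
# Illustrative sketch only; the actual producer in this PR uses Avro and a
# schema registry. Assumes the requests and confluent-kafka packages.
import gzip
import json

import requests
from confluent_kafka import Producer

def fetch_hour(hour_key: str):
    """Download and decode one hourly GH Archive file, e.g. '2025-01-01-15'."""
    resp = requests.get(f"https://data.gharchive.org/{hour_key}.json.gz", timeout=60)
    resp.raise_for_status()
    for line in gzip.decompress(resp.content).splitlines():
        yield json.loads(line)

if __name__ == "__main__":
    producer = Producer({"bootstrap.servers": "<kafka-bootstrap>:9092"})  # placeholder address
    for raw in fetch_hour("2025-01-01-15"):
        producer.produce("github-events", value=json.dumps(raw))  # topic name is illustrative
    producer.flush()
```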
Reviewed changes
Copilot reviewed 21 out of 23 changed files in this pull request and generated 14 comments.
Summary per file:
| File | Description |
|---|---|
| byoc-examples/setup/kubernetes/aws/terraform/main.tf | Changed node group capacity type from ON_DEMAND to SPOT for cost savings |
| byoc-examples/features/table-topic/policy-eks/terraform/variables.tf | Added variable definitions for producer node group configuration |
| byoc-examples/features/table-topic/policy-eks/terraform/providers.tf | Added Terraform provider configurations for AWS, Helm, Kubernetes, and random |
| byoc-examples/features/table-topic/policy-eks/terraform/eks.tf | Added EKS environment module reference |
| byoc-examples/features/table-topic/github-event/terraform/variables.tf | Defined variables for region, resource naming, node groups, and producer configuration |
| byoc-examples/features/table-topic/github-event/terraform/s3_table.tf | Created S3 Tables bucket, namespace, and Athena workgroup resources |
| byoc-examples/features/table-topic/github-event/terraform/providers.tf | Configured providers with Kubernetes and Helm authentication to EKS |
| byoc-examples/features/table-topic/github-event/terraform/producer.tf | Defined EKS node group dedicated to running producer workloads |
| byoc-examples/features/table-topic/github-event/terraform/outputs.tf | Added outputs for console endpoints, credentials, cluster info, and resource ARNs |
| byoc-examples/features/table-topic/github-event/terraform/main.tf | Configured AutoMQ BYOC environment module with VPC integration |
| byoc-examples/features/table-topic/github-event/terraform/eks.tf | Set up EKS cluster with security group rules and access policies |
| byoc-examples/features/table-topic/github-event/notebook/service.yaml | Created Kubernetes service for notebook access |
| byoc-examples/features/table-topic/github-event/notebook/requirements.txt | Listed Python dependencies for analytics notebook |
| byoc-examples/features/table-topic/github-event/notebook/deployment.yaml | Configured Kubernetes deployment for Marimo analytics notebook |
| byoc-examples/features/table-topic/github-event/notebook/create-configmap.sh | Script to create ConfigMap from notebook source files |
| byoc-examples/features/table-topic/github-event/notebook/analysis.py | Marimo notebook implementing real-time GitHub event analytics with auto-refresh |
| byoc-examples/features/table-topic/github-event/event-producer/requirements.txt | Listed Python dependencies for event producer |
| byoc-examples/features/table-topic/github-event/event-producer/github_event.avsc | Defined Avro schema for GitHub event data structure |
| byoc-examples/features/table-topic/github-event/event-producer/event_producer.py | Implemented continuous producer fetching GitHub archive data and publishing to Kafka |
| byoc-examples/features/table-topic/github-event/event-producer/deployment.yaml | Configured Kubernetes deployment for event producer |
| byoc-examples/features/table-topic/github-event/event-producer/create-configmap.sh | Script to create ConfigMap from producer source files |
| value: "kf-pfiq444nipaz43tk.automqlab-k3f3.automq.private:9092" | ||
| - name: SCHEMA_REGISTRY_URL | ||
| value: "http://kf-pfiq444nipaz43tk.automqlab-k3f3.automq.private:8081" |
Copilot AI · Dec 9, 2025
Hardcoded Kafka bootstrap servers and schema registry URL. These should be parameterized using Terraform outputs or ConfigMap/environment variables that can be dynamically configured per environment.
| value: "kf-pfiq444nipaz43tk.automqlab-k3f3.automq.private:9092" | |
| - name: SCHEMA_REGISTRY_URL | |
| value: "http://kf-pfiq444nipaz43tk.automqlab-k3f3.automq.private:8081" | |
| valueFrom: | |
| configMapKeyRef: | |
| name: github-event-producer-config | |
| key: KAFKA_BOOTSTRAP_SERVERS | |
| - name: SCHEMA_REGISTRY_URL | |
| valueFrom: | |
| configMapKeyRef: | |
| name: github-event-producer-config | |
| key: SCHEMA_REGISTRY_URL |
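
On the producer side, the complementary pattern (a sketch under the assumption that event_producer.py reads its configuration at startup; the variable names mirror the ConfigMap keys above) is to take these endpoints from the environment rather than from hardcoded literals:

```python
import os

# Injected by the Deployment's env entries (configMapKeyRef above); the
# localhost fallbacks are for local development only and are our assumption.
KAFKA_BOOTSTRAP_SERVERS = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
SCHEMA_REGISTRY_URL = os.environ.get("SCHEMA_REGISTRY_URL", "http://localhost:8081")
```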
| "type": t, | ||
| "created_at": GithubEventMapper.to_timestamp(raw.get("created_at")), | ||
| "actor_login": raw.get("actor", {}).get("login"), | ||
| "repo_name": raw.get("repo", {}).get("name") |
Copilot AI · Dec 9, 2025
Missing field mapping. The Avro schema defines an org_login field (line 11 in github_event.avsc), but it's not being populated in the map_event function. Add "org_login": raw.get("org", {}).get("login") to the record dictionary to match the schema.
| "repo_name": raw.get("repo", {}).get("name") | |
| "repo_name": raw.get("repo", {}).get("name"), | |
| "org_login": raw.get("org", {}).get("login") |
          exec:
            command:
            - /bin/sh
            - -c
            - "ps aux | grep -v grep | grep event_producer.py || exit 1"
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 10
          failureThreshold: 3
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - "ps aux | grep -v grep | grep event_producer.py || exit 1"
Copilot AI · Dec 9, 2025
Inefficient liveness/readiness probe. Using ps aux | grep is resource-intensive and unreliable for checking if a Python script is healthy. Consider implementing a simple HTTP health endpoint in the producer script or using a file-based health check that the script touches periodically.
Suggested change:
-          exec:
-            command:
-            - /bin/sh
-            - -c
-            - "ps aux | grep -v grep | grep event_producer.py || exit 1"
-          initialDelaySeconds: 60
-          periodSeconds: 30
-          timeoutSeconds: 10
-          failureThreshold: 3
-        readinessProbe:
-          exec:
-            command:
-            - /bin/sh
-            - -c
-            - "ps aux | grep -v grep | grep event_producer.py || exit 1"
+          httpGet:
+            path: /healthz
+            port: 8080
+          initialDelaySeconds: 60
+          periodSeconds: 30
+          timeoutSeconds: 10
+          failureThreshold: 3
+        readinessProbe:
+          httpGet:
+            path: /healthz
+            port: 8080
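
One way to back an httpGet probe like this (a sketch, assuming the producer's main loop runs in the foreground of event_producer.py) is to serve a tiny /healthz endpoint from a daemon thread using only the standard library:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Respond 200 on /healthz so Kubernetes probes succeed while the process is alive.
        if self.path == "/healthz":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        # Silence per-request logging so probe traffic does not flood the producer logs.
        pass

def start_health_server(port: int = 8080) -> None:
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

# Call start_health_server() once at startup, before entering the produce loop.
```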
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = " > 5.0.0"
Copilot AI · Dec 9, 2025
Invalid version constraint syntax. There's an extra space before the > operator. Should be ">= 5.0.0" or "> 5.0.0" without the leading space.
Suggested change:
-      version = " > 5.0.0"
+      version = "> 5.0.0"
resource "aws_athena_workgroup" "lab-athena-workgroup" {
  name = "${local.resource_suffix}-primary" # 覆盖默认的 primary 工作组
Copilot AI · Dec 9, 2025
Comment contains Chinese text: "覆盖默认的 primary 工作组" (Override default primary workgroup). Consider translating to English for consistency with the rest of the codebase.
| name = "${local.resource_suffix}-primary" # 覆盖默认的 primary 工作组 | |
| name = "${local.resource_suffix}-primary" # Override default primary workgroup |
resource "aws_s3tables_table_bucket" "event_table_bucket" {
  name = "${local.resource_suffix}-event-analytics-bucket"
Copilot AI · Dec 9, 2025
Missing force_destroy attribute. The S3 Tables bucket resource should have force_destroy = true to allow Terraform to delete the bucket even if it contains data, similar to the aws_s3_bucket.athena_results resource below. Without this, terraform destroy may fail if tables exist in the bucket.
| name = "${local.resource_suffix}-event-analytics-bucket" | |
| name = "${local.resource_suffix}-event-analytics-bucket" | |
| force_destroy = true |
| "repo_name": raw.get("repo", {}).get("name") | ||
| } | ||
|
|
||
| return record |
Copilot AI · Dec 9, 2025
Missing return statement. The function map_event doesn't explicitly return the record dictionary. While Python functions implicitly return None if no return statement is provided, this appears to be a bug as the function is supposed to return the mapped event record.
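
Taking the org_login and return findings together, map_event could look roughly like this (a sketch reconstructed from the fields visible in the snippets above; GithubEventMapper.to_timestamp and the variable t come from the diff context, and the exact indentation and class placement in the PR may differ):

```python
def map_event(raw: dict) -> dict:
    t = raw.get("type")
    record = {
        "type": t,
        "created_at": GithubEventMapper.to_timestamp(raw.get("created_at")),
        "actor_login": raw.get("actor", {}).get("login"),
        "repo_name": raw.get("repo", {}).get("name"),
        "org_login": raw.get("org", {}).get("login"),  # field defined in github_event.avsc
    }
    return record  # explicit return so callers receive the mapped record
```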
        # After successful processing, update variable and move to next hour
        last_processed_hour = current_hour
Copilot AI · Dec 9, 2025
Variable last_processed_hour is not used.
Suggested change:
-        # After successful processing, update variable and move to next hour
-        last_processed_hour = current_hour
+        # After successful processing, move to next hour
        else:
            # Resume from last processed time point
            current_hour = last_processed_hour + timedelta(hours=1)
            logger.info(f"🔄 Resuming from {format_hour_key(current_hour)}")
Copilot AI · Dec 9, 2025
This statement is unreachable.
Suggested change:
-        else:
-            # Resume from last processed time point
-            current_hour = last_processed_hour + timedelta(hours=1)
-            logger.info(f"🔄 Resuming from {format_hour_key(current_hour)}")
+        # (Resume logic removed: unreachable code)
| output "initial_password" { | ||
| description = "Initial password for the AutoMQ BYOC environment" | ||
| value = module.automq-byoc.automq_byoc_initial_password | ||
| } |
Copilot AI · Dec 9, 2025
The Terraform configuration exposes the AutoMQ BYOC initial console password in cleartext via the initial_password output. The password therefore appears in terraform apply output and in any CI logs or shared automation that runs this module, so anyone with access to those logs can reuse the credentials to access and control the BYOC environment. Consider not exposing the password as an output at all (retrieving it interactively or via a secret store instead), or at minimum marking the output as sensitive and ensuring Terraform runs only in tightly controlled environments so this value is not written to shared logs.
| output "initial_password" { | |
| description = "Initial password for the AutoMQ BYOC environment" | |
| value = module.automq-byoc.automq_byoc_initial_password | |
| } |
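
If the output is kept for convenience, a lighter-weight mitigation (our suggestion, not part of this PR) is to mark it as sensitive so Terraform redacts the value from plan/apply output and the default terraform output listing; it can then still be read deliberately with terraform output -raw initial_password, though the value remains in state:

```hcl
# Sketch only: keeps the output but redacts it from CLI plan/apply output.
# A secret store or interactive retrieval is still preferable.
output "initial_password" {
  description = "Initial password for the AutoMQ BYOC environment"
  value       = module.automq-byoc.automq_byoc_initial_password
  sensitive   = true
}
```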
No description provided.