
Conversation

@johnluoyx (Contributor)

No description provided.

Copilot AI review requested due to automatic review settings (December 9, 2025 02:17)
Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive GitHub event analytics demo for AutoMQ's BYOC (Bring Your Own Cloud) platform, showcasing the Table Topic feature that automatically converts streaming data into Apache Iceberg tables for real-time analytics.

Key changes:

  • Adds Terraform infrastructure for deploying a complete GitHub event analytics pipeline on AWS EKS
  • Implements a Python event producer that ingests GitHub archive data into Kafka topics
  • Provides a Marimo-based analytics notebook for real-time visualization and querying via AWS Athena
  • Configures S3 Tables and Athena for querying Iceberg-formatted event data
  • Switches default node capacity type to SPOT instances for cost optimization

Reviewed changes

Copilot reviewed 21 out of 23 changed files in this pull request and generated 14 comments.

Summary per file:

byoc-examples/setup/kubernetes/aws/terraform/main.tf: Changed node group capacity type from ON_DEMAND to SPOT for cost savings
byoc-examples/features/table-topic/policy-eks/terraform/variables.tf: Added variable definitions for producer node group configuration
byoc-examples/features/table-topic/policy-eks/terraform/providers.tf: Added Terraform provider configurations for AWS, Helm, Kubernetes, and random
byoc-examples/features/table-topic/policy-eks/terraform/eks.tf: Added EKS environment module reference
byoc-examples/features/table-topic/github-event/terraform/variables.tf: Defined variables for region, resource naming, node groups, and producer configuration
byoc-examples/features/table-topic/github-event/terraform/s3_table.tf: Created S3 Tables bucket, namespace, and Athena workgroup resources
byoc-examples/features/table-topic/github-event/terraform/providers.tf: Configured providers with Kubernetes and Helm authentication to EKS
byoc-examples/features/table-topic/github-event/terraform/producer.tf: Defined EKS node group dedicated to running producer workloads
byoc-examples/features/table-topic/github-event/terraform/outputs.tf: Added outputs for console endpoints, credentials, cluster info, and resource ARNs
byoc-examples/features/table-topic/github-event/terraform/main.tf: Configured AutoMQ BYOC environment module with VPC integration
byoc-examples/features/table-topic/github-event/terraform/eks.tf: Set up EKS cluster with security group rules and access policies
byoc-examples/features/table-topic/github-event/notebook/service.yaml: Created Kubernetes service for notebook access
byoc-examples/features/table-topic/github-event/notebook/requirements.txt: Listed Python dependencies for analytics notebook
byoc-examples/features/table-topic/github-event/notebook/deployment.yaml: Configured Kubernetes deployment for Marimo analytics notebook
byoc-examples/features/table-topic/github-event/notebook/create-configmap.sh: Script to create ConfigMap from notebook source files
byoc-examples/features/table-topic/github-event/notebook/analysis.py: Marimo notebook implementing real-time GitHub event analytics with auto-refresh
byoc-examples/features/table-topic/github-event/event-producer/requirements.txt: Listed Python dependencies for event producer
byoc-examples/features/table-topic/github-event/event-producer/github_event.avsc: Defined Avro schema for GitHub event data structure
byoc-examples/features/table-topic/github-event/event-producer/event_producer.py: Implemented continuous producer fetching GitHub archive data and publishing to Kafka
byoc-examples/features/table-topic/github-event/event-producer/deployment.yaml: Configured Kubernetes deployment for event producer


Comment on lines +36 to +38
value: "kf-pfiq444nipaz43tk.automqlab-k3f3.automq.private:9092"
- name: SCHEMA_REGISTRY_URL
value: "http://kf-pfiq444nipaz43tk.automqlab-k3f3.automq.private:8081"
Copilot AI Dec 9, 2025

Hardcoded Kafka bootstrap servers and schema registry URL. These should be parameterized using Terraform outputs or ConfigMap/environment variables that can be dynamically configured per environment.

Suggested change
-          value: "kf-pfiq444nipaz43tk.automqlab-k3f3.automq.private:9092"
-        - name: SCHEMA_REGISTRY_URL
-          value: "http://kf-pfiq444nipaz43tk.automqlab-k3f3.automq.private:8081"
+          valueFrom:
+            configMapKeyRef:
+              name: github-event-producer-config
+              key: KAFKA_BOOTSTRAP_SERVERS
+        - name: SCHEMA_REGISTRY_URL
+          valueFrom:
+            configMapKeyRef:
+              name: github-event-producer-config
+              key: SCHEMA_REGISTRY_URL

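To complement the ConfigMap suggestion, a minimal sketch of the consuming side, assuming the producer reads both endpoints from the environment. The variable names mirror the ConfigMap keys above; the localhost fallbacks are illustrative defaults, not values from the PR:

import os

# Resolve connection endpoints from the environment so the same image
# works across environments; the env var names match the ConfigMap keys
# proposed above. The fallback values are placeholders for local runs.
KAFKA_BOOTSTRAP_SERVERS = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
SCHEMA_REGISTRY_URL = os.environ.get("SCHEMA_REGISTRY_URL", "http://localhost:8081")
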
"type": t,
"created_at": GithubEventMapper.to_timestamp(raw.get("created_at")),
"actor_login": raw.get("actor", {}).get("login"),
"repo_name": raw.get("repo", {}).get("name")
Copilot AI Dec 9, 2025

Missing field mapping. The Avro schema defines an org_login field (line 11 in github_event.avsc), but it's not being populated in the map_event function. Add "org_login": raw.get("org", {}).get("login") to the record dictionary to match the schema.

Suggested change
-        "repo_name": raw.get("repo", {}).get("name")
+        "repo_name": raw.get("repo", {}).get("name"),
+        "org_login": raw.get("org", {}).get("login")

Comment on lines +59 to +73
          exec:
            command:
              - /bin/sh
              - -c
              - "ps aux | grep -v grep | grep event_producer.py || exit 1"
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 10
          failureThreshold: 3
        readinessProbe:
          exec:
            command:
              - /bin/sh
              - -c
              - "ps aux | grep -v grep | grep event_producer.py || exit 1"
Copilot AI Dec 9, 2025

Inefficient liveness/readiness probe. Using ps aux | grep is resource-intensive and unreliable for checking if a Python script is healthy. Consider implementing a simple HTTP health endpoint in the producer script or using a file-based health check that the script touches periodically.

Suggested change
-          exec:
-            command:
-              - /bin/sh
-              - -c
-              - "ps aux | grep -v grep | grep event_producer.py || exit 1"
-          initialDelaySeconds: 60
-          periodSeconds: 30
-          timeoutSeconds: 10
-          failureThreshold: 3
-        readinessProbe:
-          exec:
-            command:
-              - /bin/sh
-              - -c
-              - "ps aux | grep -v grep | grep event_producer.py || exit 1"
+          httpGet:
+            path: /healthz
+            port: 8080
+          initialDelaySeconds: 60
+          periodSeconds: 30
+          timeoutSeconds: 10
+          failureThreshold: 3
+        readinessProbe:
+          httpGet:
+            path: /healthz
+            port: 8080

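As a minimal sketch of what could back those httpGet probes, the producer might run a tiny HTTP server on a daemon thread. The /healthz path and port 8080 come from the suggestion above; the threading approach and the names here are assumptions, not the PR's code:

import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer 200 on the probe path, 404 for anything else.
        if self.path == "/healthz":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        # Keep probe traffic out of the producer's logs.
        pass

def start_health_server(port: int = 8080) -> None:
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    # Daemon thread: the health server exits with the main process.
    threading.Thread(target=server.serve_forever, daemon=True).start()

Calling start_health_server() at producer startup would let both probes switch to httpGet; a richer check could also verify the Kafka connection before returning 200.
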
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = " > 5.0.0"
Copilot AI Dec 9, 2025

Invalid version constraint syntax. There's an extra space before the > operator. Should be ">= 5.0.0" or "> 5.0.0" without the leading space.

Suggested change
-      version = " > 5.0.0"
+      version = "> 5.0.0"



resource "aws_athena_workgroup" "lab-athena-workgroup" {
name = "${local.resource_suffix}-primary" # 覆盖默认的 primary 工作组
Copilot AI Dec 9, 2025

Comment contains Chinese text: "覆盖默认的 primary 工作组" (Override default primary workgroup). Consider translating to English for consistency with the rest of the codebase.

Suggested change
-  name = "${local.resource_suffix}-primary" # 覆盖默认的 primary 工作组
+  name = "${local.resource_suffix}-primary" # Override default primary workgroup

@@ -0,0 +1,30 @@
resource "aws_s3tables_table_bucket" "event_table_bucket" {
name = "${local.resource_suffix}-event-analytics-bucket"
Copilot AI Dec 9, 2025

Missing force_destroy attribute. The S3 Tables bucket resource should have force_destroy = true to allow Terraform to delete the bucket even if it contains data, similar to the aws_s3_bucket.athena_results resource below. Without this, terraform destroy may fail if tables exist in the bucket.

Suggested change
-  name = "${local.resource_suffix}-event-analytics-bucket"
+  name          = "${local.resource_suffix}-event-analytics-bucket"
+  force_destroy = true

"repo_name": raw.get("repo", {}).get("name")
}

return record
Copilot AI Dec 9, 2025

Missing return statement. The function map_event doesn't explicitly return the record dictionary. While Python functions implicitly return None if no return statement is provided, this appears to be a bug as the function is supposed to return the mapped event record.

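Combining this finding with the earlier org_login one, a minimal sketch of a corrected map_event. GithubEventMapper.to_timestamp is assumed from the quoted code, and the field set is limited to what the snippets show:

def map_event(raw: dict) -> dict:
    # Map a raw GitHub archive event onto the Avro schema fields.
    record = {
        "type": raw.get("type"),
        "created_at": GithubEventMapper.to_timestamp(raw.get("created_at")),
        "actor_login": raw.get("actor", {}).get("login"),
        "repo_name": raw.get("repo", {}).get("name"),
        # Field flagged as missing above; GitHub omits "org" for events
        # outside an organization, so this can be None.
        "org_login": raw.get("org", {}).get("login"),
    }
    # The explicit return this comment asks for; without it the caller
    # would receive None.
    return record
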
Comment on lines +268 to +269
# After successful processing, update variable and move to next hour
last_processed_hour = current_hour
Copilot AI Dec 9, 2025

Variable last_processed_hour is not used.

Suggested change
-            # After successful processing, update variable and move to next hour
-            last_processed_hour = current_hour
+            # After successful processing, move to next hour

Comment on lines +241 to +244
        else:
            # Resume from last processed time point
            current_hour = last_processed_hour + timedelta(hours=1)
            logger.info(f"🔄 Resuming from {format_hour_key(current_hour)}")
Copilot AI Dec 9, 2025

This statement is unreachable.

Suggested change
-        else:
-            # Resume from last processed time point
-            current_hour = last_processed_hour + timedelta(hours=1)
-            logger.info(f"🔄 Resuming from {format_hour_key(current_hour)}")
+        # (Resume logic removed: unreachable code)

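This finding and the unused last_processed_hour above are two views of the same problem: the assignment is never read because the branch that reads it cannot be reached. A minimal sketch of a loop shape in which the checkpoint is both written and read; fetch_and_publish is a hypothetical stand-in, and the loop structure is an assumption, not the PR's code:

import time
from datetime import datetime, timedelta

def fetch_and_publish(hour: datetime) -> None:
    # Hypothetical stand-in for downloading the archive for `hour`
    # and publishing its events to Kafka.
    print(f"processed {hour.isoformat()}")

def run(start_hour: datetime) -> None:
    last_processed_hour = None
    while True:
        if last_processed_hour is None:
            # First iteration: begin at the configured start hour.
            current_hour = start_hour
        else:
            # Resume from the checkpoint; unlike the flagged code, this
            # branch is reached on every iteration after the first.
            current_hour = last_processed_hour + timedelta(hours=1)
        fetch_and_publish(current_hour)
        # Record the checkpoint so the assignment is read next time.
        last_processed_hour = current_hour
        time.sleep(1)  # pacing placeholder; real code waits for the next archive
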
Comment on lines +11 to +14
output "initial_password" {
description = "Initial password for the AutoMQ BYOC environment"
value = module.automq-byoc.automq_byoc_initial_password
}
Copilot AI Dec 9, 2025

The Terraform configuration exposes the AutoMQ BYOC initial console password in cleartext via the initial_password output. The password therefore appears in terraform apply output and in any CI logs or shared automation that runs this module, so anyone with access to those logs can reuse the credential to access and control the BYOC environment. Consider not exposing the password as an output at all (retrieve it interactively or from a secret store instead), or at minimum mark the output as sensitive and run Terraform only in tightly controlled environments so the value is not written to shared logs.

Suggested change
-output "initial_password" {
-  description = "Initial password for the AutoMQ BYOC environment"
-  value       = module.automq-byoc.automq_byoc_initial_password
-}
