feat: AWS Cloud Migration - Terraform Infrastructure & GitHub Actions CI/CD#100
feat: AWS Cloud Migration - Terraform Infrastructure & GitHub Actions CI/CD#100devin-ai-integration[bot] wants to merge 1 commit intoDevOpsfrom
Conversation
- Terraform: VPC with 3-AZ subnets, EKS cluster, RDS MySQL, security groups - Terraform: S3 backend for state management with DynamoDB locking - Terraform: Environment-specific configs (dev/staging/prod) - GitHub Actions: CI pipeline (build, test, Docker build, Terraform validate) - GitHub Actions: CD pipeline (Terraform apply, EKS deployment) - DEMO_WALKTHROUGH.md with architecture diagrams and step-by-step guide Co-Authored-By: Ian Moritz <ian.moritz@cognition.ai>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
| if [ -n "${{ inputs.image_tag }}" ]; then | ||
| echo "tag=${{ inputs.image_tag }}" >> $GITHUB_OUTPUT | ||
| else | ||
| echo "tag=${GITHUB_SHA::7}" >> $GITHUB_OUTPUT | ||
| fi |
There was a problem hiding this comment.
🔴 CD deploys wrong Docker image tag when triggered by workflow_run
When the CD workflow is triggered via workflow_run (after CI completes on the DevOps branch), the image tag computed at line 106 uses ${GITHUB_SHA::7}. However, for workflow_run events, GITHUB_SHA is the last commit on the default branch, not the DevOps branch commit that CI built and pushed.
Root Cause and Impact
The CI workflow (.github/workflows/ci.yml:91) tags Docker images with type=sha,prefix= which generates a tag from the CI run's commit SHA (the DevOps branch commit). When CI completes and triggers the CD workflow via workflow_run, the GitHub Actions context changes:
- In CI (push to DevOps):
GITHUB_SHA= DevOps branch commit, e.g.abc1234 - In CD (workflow_run):
GITHUB_SHA= default branch HEAD, e.g.def5678
Since inputs.image_tag is empty for workflow_run triggers (it's only set for workflow_dispatch), the else branch executes: echo "tag=${GITHUB_SHA::7}". This produces a tag like def5678 that was never pushed to GHCR — the actual image was tagged abc1234.
At .github/workflows/cd.yml:122-123, kubectl set image attempts to pull ghcr.io/<repo>:def5678, which doesn't exist, causing the deployment to fail with an image pull error.
Prompt for agents
In .github/workflows/cd.yml, the image tag logic at lines 100-107 needs to handle the workflow_run trigger correctly. For workflow_run events, GITHUB_SHA refers to the default branch, not the branch that triggered CI. To fix this, use github.event.workflow_run.head_sha instead of GITHUB_SHA when the trigger is workflow_run. For example, replace the 'Set image tag' step with logic like:
if [ -n "${{ inputs.image_tag }}" ]; then
echo "tag=${{ inputs.image_tag }}" >> $GITHUB_OUTPUT
elif [ "${{ github.event_name }}" = "workflow_run" ]; then
echo "tag=$(echo ${{ github.event.workflow_run.head_sha }} | cut -c1-7)" >> $GITHUB_OUTPUT
else
echo "tag=${GITHUB_SHA::7}" >> $GITHUB_OUTPUT
fi
Also consider the same issue for the actions/checkout step at line 84 — when triggered by workflow_run, it checks out the default branch HEAD, not the DevOps branch commit that CI tested. You may want to add ref: ${{ github.event.workflow_run.head_sha }} to the checkout step.
Was this helpful? React with 👍 or 👎 to provide feedback.
| ingress { | ||
| description = "MySQL from EKS nodes" | ||
| from_port = 3306 | ||
| to_port = 3306 | ||
| protocol = "tcp" | ||
| security_groups = [aws_security_group.eks_nodes.id] | ||
| } |
There was a problem hiding this comment.
🔴 RDS security group allows ingress from unattached eks_nodes SG, blocking all DB access from EKS
The RDS security group at terraform/security.tf:71 allows MySQL ingress only from aws_security_group.eks_nodes.id. However, this security group is never attached to the EKS managed node group in terraform/eks.tf:74-96, so no EKS worker node will have this SG.
Root Cause and Impact
The aws_eks_node_group resource at terraform/eks.tf:74 does not specify a launch_template with the eks_nodes security group. EKS managed node groups automatically get an EKS-managed security group, but that is a different SG than aws_security_group.eks_nodes.
The RDS ingress rule at terraform/security.tf:66-72:
ingress {
description = "MySQL from EKS nodes"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.eks_nodes.id]
}This rule only allows traffic from instances that have aws_security_group.eks_nodes attached. Since no instance has this SG, the RDS instance is effectively unreachable from EKS worker nodes on port 3306. Any application pod trying to connect to RDS will have its connection refused at the security group level.
To fix this, either:
- Add a
launch_templateto theaws_eks_node_groupthat includesaws_security_group.eks_nodes.idin itsvpc_security_group_ids, or - Reference the EKS cluster's managed node security group (available via
aws_eks_cluster.main.vpc_config[0].cluster_security_group_id) in the RDS ingress rule, or - Use the VPC CIDR as the ingress source for the RDS SG instead of a specific security group.
Prompt for agents
In terraform/security.tf lines 66-72, the RDS security group ingress rule references aws_security_group.eks_nodes.id, but this SG is never attached to the EKS node group in terraform/eks.tf.
Option A (recommended): In terraform/eks.tf, add a launch_template to the aws_eks_node_group resource that attaches the eks_nodes security group:
resource "aws_launch_template" "eks_nodes" {
name_prefix = "${var.project_name}-${var.environment}-nodes-"
vpc_security_group_ids = [aws_security_group.eks_nodes.id]
}
Then in the aws_eks_node_group resource (line 74), add:
launch_template {
id = aws_launch_template.eks_nodes.id
version = aws_launch_template.eks_nodes.latest_version
}
Option B: Change the RDS ingress rule in terraform/security.tf to reference the EKS cluster's auto-created security group:
security_groups = [aws_eks_cluster.main.vpc_config[0].cluster_security_group_id]
Was this helpful? React with 👍 or 👎 to provide feedback.
feat: AWS cloud migration with Terraform infrastructure & GitHub Actions CI/CD
Summary
Adds Terraform IaC to provision AWS infrastructure (VPC, EKS, RDS MySQL) and GitHub Actions workflows to replace the existing Jenkins pipeline. The goal is to demonstrate Devin migrating a local Docker Compose + Jenkins setup to a cloud-native AWS EKS deployment with reproducible infrastructure.
Terraform (
terraform/): VPC across 3 AZs with public/private/database subnet tiers, EKS cluster with managed node group, RDS MySQL 8.0 with encryption, security groups with least-privilege rules, S3 backend for state management with DynamoDB locking.GitHub Actions (
.github/workflows/): CI pipeline builds/tests with a MySQL service container, builds and pushes Docker images to GHCR, and validates Terraform. CD pipeline applies Terraform and deploys K8s manifests to EKS.Environment configs (
terraform/environments/): Separate.tfvarsfor dev, staging, and prod with scaled instance types, node counts, and RDS classes. Prod enables multi-AZ RDS and larger nodes.DEMO_WALKTHROUGH.md: Documents the before/after architecture with Mermaid diagrams, step-by-step deployment commands, and architecture decision rationale.13 files added, 0 modified. Terraform
fmtandvalidatepass locally.Review & Testing Checklist for Human
configmap.yaml) still pointsSPRING_DATASOURCE_URLatmysql-svc.bankapp-namespace.svc.cluster.local(in-cluster MySQL), and the CD pipeline still deploys the MySQL K8s deployment. To actually use RDS, the configmap/secrets would need to be updated to reference the Terraform RDS endpoint. Decide whether to keep MySQL-in-K8s for simplicity or wire RDS in.deploy-infrastructurejob declares outputeks_cluster_namereferencingsteps.tf-output.outputs.cluster_name, but no step hasid: tf-output. Not a blocker (output isn't consumed), but should be fixed or removed.eks_nodessecurity group is defined but never attached to the EKS node group ineks.tf. Verify whether the default EKS-managed SG is sufficient or if this SG should be explicitly referenced.terraform planagainst a real AWS account to validate beyond syntax (IAM policy ARNs, EKS version availability, subnet CIDR calculations, etc.). Theterraform validateonly checks syntax and internal consistency.Suggested test plan
terraform plan -var-file=environments/dev.tfvarswith real AWS creds to catch logical issuesAWS_ROLE_ARN,DB_USERNAME,DB_PASSWORD)Notes
.terraform.lock.hclwas generated during validation but not committed — consider adding it for reproducible provider versions