Draft
Changes from 9 commits
48 commits
ce33511
Add AWS architecture diagram and documentation
killev May 24, 2025
3fcc647
Migrate infrastructure from MongoDB EC2 to managed services with comp…
killev May 24, 2025
196a33b
Standardize Terraform backend configuration and add initialization sc…
killev May 24, 2025
e5e4b60
Enhance Terraform backend initialization with improved error handling…
killev May 24, 2025
45d75c7
Refactor Terraform configuration and remove DynamoDB state locking
killev May 24, 2025
0bad9ce
Refactor Terraform configuration and split IAM policies
killev May 24, 2025
5f51b9c
Reorganize Terraform IAM policies and simplify Redis configuration
killev May 24, 2025
a3f4473
Add additional AWS permissions to Terraform core management policy
killev May 24, 2025
0ffa8ab
Enhance ECS infrastructure with configurable parameters and auto-scaling
killev May 24, 2025
a6bd59a
Make Terraform configuration more flexible and configurable
killev May 24, 2025
e807885
Update Terraform infrastructure configuration and IAM policies - Enha…
killev May 25, 2025
d824a1d
Update Terraform IAM policies for core and service management
killev May 25, 2025
aa4b964
Replace auto-generated Redis auth token with user-provided variable
killev May 25, 2025
156934d
Enhance AWS infrastructure configuration and S3 setup
killev May 26, 2025
37c7f8a
Add bastion host module for secure SSH access to infrastructure
killev May 26, 2025
b21f767
Improve bastion host configuration and IAM permissions
killev May 26, 2025
ff1a0d0
Add AWS deployment infrastructure and fix configuration issues
killev May 27, 2025
9f1d99d
Add Terraform deployment automation and release tracking
killev May 27, 2025
e72605e
feat: refactor DocumentDB connection to use separate environment vari…
killev May 27, 2025
6d4c974
Improve DocumentDB connection configuration for AWS compatibility
killev May 27, 2025
2851747
Update infrastructure configuration and deployment workflow - Update …
killev May 27, 2025
5d1a159
Merge commit '0db4fcbe6a3b20651351d3131cb2a51467191b61' into create-t…
killev May 27, 2025
5c8bc9e
Standardize environment naming and enhance Terraform configuration
killev May 27, 2025
7e5da24
Restructure infrastructure configuration and update CI/CD workflow
killev May 27, 2025
1b2fece
Update GitHub workflow for Terraform deployment configuration
killev May 27, 2025
15a40d5
Fix GitHub Actions workflow for terraform configuration
killev May 27, 2025
4616764
Reorganize deployment configuration into environments structure
killev May 27, 2025
a53edfa
Fix directory name typo in deployment configuration
killev May 27, 2025
283e35b
Fix Terraform configuration and AWS region in deploy workflow
killev May 27, 2025
4b67b30
Fix terraform plan output path and enable S3 plan storage
killev May 27, 2025
0e79606
Fix Terraform plan file paths in GitHub workflow
killev May 27, 2025
b400800
Fix AWS region configuration in deployment workflow
killev May 27, 2025
88e7564
Simplify GitHub workflow and update ECR repository naming
killev May 28, 2025
8b49799
Update Terraform version to 1.12.0 in apply job
killev May 28, 2025
ad646a3
Fix Terraform lock file consistency issue in GitHub Actions - Add ste…
killev May 28, 2025
03a5c5b
Fix duplicate cd command in terraform apply step
killev May 28, 2025
c2bb29c
Remove duplicate cd deployment command in workflow
killev May 28, 2025
9d2154a
Update Docker build step in AWS deployment workflow
killev May 28, 2025
797968f
Update Docker build configuration and ignore patterns
killev May 28, 2025
50d0594
Simplify ECS secrets configuration in Terraform
killev May 28, 2025
d21c003
Add AWS environment destruction workflow and documentation
killev Jun 5, 2025
5f884d9
Update wget version to 1.25.0-r1 in Dockerfile
killev Jun 5, 2025
b98f1ab
Resolve merge conflicts: keep full implementation and AWS certificate
killev Jun 7, 2025
775bd50
Add conditional approval for AWS environment destruction
killev Jun 7, 2025
168bb71
Add environment specification to destroy workflow jobs
killev Jun 7, 2025
a1aced8
Optimize GitHub Actions destroy workflow conditions
killev Jun 7, 2025
b59b21d
Fix manual approval conditions in destroy AWS environment workflow
killev Jun 7, 2025
3b5bac3
Add force delete option to ECR repository
killev Jun 7, 2025
5 changes: 5 additions & 0 deletions .gitignore
@@ -144,3 +144,8 @@ website/public/apos-frontend
dump.archive
.prettierrc
aposUsersSafe.json
terraform.tfvars
terraform/.terraform
dev.tfplan
plan.out.txt
.cursor/tmp
142 changes: 113 additions & 29 deletions docs/Infrastructure.md
@@ -4,7 +4,7 @@

* **AWS Region**: `us-east-1`
* **Environments**: `dev`, `staging`, `prod`
* **Domain**: `sf-website-<env>.prettyclear.com`
* **Domain**: `sf-website-<env>.sandbox-prettyclear.com`
* **Structure**: Modular Terraform setup for multi-environment support
* **Resource Tags** (applied to all resources):

@@ -158,7 +158,8 @@
* ECR (container image repository)
* ALB (for ingress)
* CloudWatch (for logs & metrics)
* MongoDB EC2 instance (database)
* DocumentDB Cluster (database)
* ElastiCache Redis (caching)
* S3 Attachments Bucket
* **Service Name**: `sf-website`
* **Container Image**: Built from project Dockerfile and stored in ECR
@@ -167,7 +168,8 @@
* **Auto-scaling**: Based on CPU usage (target: 70%)
* **Environment Variables**:
* `NODE_ENV=production`
* `APOS_MONGODB_URI=mongodb://<mongodb-hostname>:27017/apostrophe`
* `APOS_MONGODB_URI=mongodb://<documentdb-cluster-endpoint>:27017/apostrophe`
* `REDIS_URI=redis://<elasticache-cluster-endpoint>:6379`
* `SESSION_SECRET=<from parameter store>`
* `APOS_S3_BUCKET=sf-website-s3-attachments-<env>`
* `APOS_S3_REGION=us-east-1`
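
The environment variables listed above can be checked before the application starts. The following is a minimal Python sketch, not part of this PR: the helper names `missing_vars` and `expected_bucket` are hypothetical, and treating every listed variable as required is an assumption.

```python
# Variables named in the documentation above. Treating all of them as
# required is an assumption made for this sketch.
REQUIRED_VARS = (
    "NODE_ENV",
    "APOS_MONGODB_URI",
    "REDIS_URI",
    "SESSION_SECRET",
    "APOS_S3_BUCKET",
    "APOS_S3_REGION",
)


def missing_vars(environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def expected_bucket(env):
    """Bucket name following the documented sf-website-s3-attachments-<env> pattern."""
    return f"sf-website-s3-attachments-{env}"
```

Calling `missing_vars(os.environ)` at startup (after `import os`) would list anything the ECS task definition failed to inject.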
@@ -198,7 +200,7 @@
* ACM (for SSL certificates)
* **Type**: HTTPS-only
* **SSL**: Via AWS ACM
* **Domain**: `sf-website-<env>.prettyclear.com`
* **Domain**: `sf-website-<env>.sandbox-prettyclear.com`

---

@@ -219,7 +221,7 @@
* ECS Cluster (via APOS_CDN_URL environment variable)
* **Origin**: S3 bucket `sf-website-s3-attachments-<env>`
* **Access**: Origin access identity (OAI) to restrict direct S3 access
* **Custom domain**: `sf-website-media-<env>.prettyclear.com`
* **Custom domain**: `sf-website-media-<env>.sandbox-prettyclear.com`
* **SSL Certificate**: Managed through AWS ACM
* **Cache Behavior**:
* Default TTL: 86400 seconds (1 day)
@@ -247,51 +249,133 @@
* **Resource Integration**:
* ECS Apostrophe Cluster
* ALB
* DocumentDB Cluster
* ElastiCache Redis
* Slack (for alerts)
* **Features**:
* ECS logs and detailed metrics
* ALB metrics (e.g., 5xx, latency)
* DocumentDB cluster and instance metrics
* ElastiCache Redis performance metrics
* CloudWatch alarms for key metrics
* **Alerts**: Sent to Slack
* **Log retention**: 90 days

---

### 📄 MongoDB on EC2
### 🔴 Amazon ElastiCache (Redis)

* **MongoDB**:
* **Instance Name**: `sf-website-mongodb-<env>`
* **Purpose**: Primary data store for ApostropheCMS
* **ElastiCache Redis Cluster**:
* **Cluster Name**: `sf-website-redis-<env>`
* **Purpose**: Managed Redis service for session storage and application caching
* **Resource Tags**:
* `Name: sf-website-mongodb-<env>`
* `Name: sf-website-redis-<env>`
* `Project: Website`
* `CostCenter: Website`
* `Environment: <environment>`
* `Owner: peter.ovchyn`
* **Resource Integration**:
* ECS Apostrophe Cluster
* AWS Backup service
* CloudWatch (for monitoring)
* Parameter Store (for credentials)
* **Instance Type**: t3.medium (2 vCPU, 4GB RAM)
* **Storage**: 100GB gp3 EBS volume with 3000 IOPS
* **AMI**: Amazon Linux 2
* **Deployment**: Single EC2 instance in private subnet
* Cache Subnet Group (for networking)
* **Engine Version**: Redis 7.0 (latest stable)
* **Node Configuration**:
* **Node Type**: `cache.t3.micro` (2 vCPU, 0.5 GiB RAM) for dev/staging
* **Node Type**: `cache.t3.small` (2 vCPU, 1.37 GiB RAM) for production
* **Number of Nodes**: 1 (single node for simplicity)
* **Port**: 6379 (Redis standard)
* **Deployment**:
* Deployed in private subnets
* Cache Subnet Group spans both availability zones
* **Security**:
* No public IP assigned
* Security group allows ingress only from ECS service security group on port 27017
* SSH access via Session Manager (no direct SSH allowed)
* **Authentication**: Username/password authentication enabled
* Credentials stored in AWS Parameter Store
* VPC security group restricting access to ECS service only
* No public access
* Transit encryption enabled
* Auth token enabled for authentication
* **Authentication**:
* Auth token stored in AWS Parameter Store
* Referenced in ECS task environment variables
* **Backup Strategy**:
* Daily automated snapshots of EBS volume
* Retention period: 7 daily, 4 weekly
* Snapshot automation via AWS Backup service
* **Automatic Backups**:
* Daily snapshots enabled
* Retention period: 5 days
* Backup window: 02:00-03:00 UTC
* **Monitoring**:
* CloudWatch metrics for cluster performance
* CloudWatch alarms for:
* CPU utilization > 80%
* Memory usage > 80%
* Connection count thresholds
* Cache hit ratio < 80%
* **High Availability**:
* Automatic failover enabled
* Multi-AZ deployment for production environment
* Automatic minor version updates during maintenance window
* **Network Configuration**:
* **Cache Subnet Group**: `sf-website-redis-subnet-group-<env>`
* **Security Group**: `sf-website-redis-sg-<env>`
* **Endpoint**: Primary endpoint for read/write operations
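
The alarm thresholds in the Monitoring list above can be illustrated with a short Python sketch. It assumes the hit ratio is derived from the `CacheHits` and `CacheMisses` counters that ElastiCache publishes to CloudWatch; the function names are hypothetical, and in practice the same arithmetic would more likely live in a CloudWatch metric-math alarm defined in Terraform.

```python
def cache_hit_ratio(hits, misses):
    """Hit ratio in percent, or None when there has been no traffic."""
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


def breached_alarms(cpu_percent, memory_percent, hits, misses):
    """Return the names of the documented thresholds that are breached."""
    breached = []
    if cpu_percent > 80:
        breached.append("cpu")
    if memory_percent > 80:
        breached.append("memory")
    ratio = cache_hit_ratio(hits, misses)
    # A cache with no traffic is not treated as a low hit ratio.
    if ratio is not None and ratio < 80:
        breached.append("cache_hit_ratio")
    return breached
```

Returning `None` for an idle cache avoids a false alarm right after deployment, when no requests have been served yet.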

---

### 📄 Amazon DocumentDB

* **DocumentDB Cluster**:
* **Cluster Name**: `sf-website-documentdb-<env>`
* **Purpose**: Managed MongoDB-compatible database service for ApostropheCMS
* **Resource Tags**:
* `Name: sf-website-documentdb-<env>`
* `Project: Website`
* `CostCenter: Website`
* `Environment: <environment>`
* `Owner: peter.ovchyn`
* **Resource Integration**:
* ECS Apostrophe Cluster
* CloudWatch (for monitoring)
* Parameter Store (for credentials)
* DB Subnet Group (for networking)
* **Engine Version**: 4.0.0 (MongoDB compatible)
* **Cluster Configuration**:
* **Primary Instance**: `db.t3.medium` (2 vCPU, 4GB RAM)
* **Replica Instances**: 1 replica for high availability
* **Storage**: Encrypted with AWS managed keys
* **Port**: 27017 (MongoDB standard)
* **Deployment**:
* Multi-AZ deployment across private subnets
* DB Subnet Group spans both availability zones
* **Security**:
* VPC security group restricting access to ECS service only
* TLS encryption in transit required
* No public access
* Authentication required
* **Authentication**:
* Master username/password stored in AWS Parameter Store
* Referenced in ECS task environment variables via Parameter Store
* Database: `apostrophe`
* **Backup Strategy**:
* **Automated Backups**:
* Backup retention period: 7 days
* Backup window: 03:00-04:00 UTC
* Point-in-time recovery enabled
* **Manual Snapshots**: Available for major releases
* **Monitoring**:
* CloudWatch agent for system metrics
* Custom MongoDB metrics published to CloudWatch
* Alerts for disk usage, connections, and query performance
* CloudWatch metrics for cluster and instance performance
* Enhanced monitoring enabled (60-second granularity)
* CloudWatch alarms for:
* CPU utilization > 80%
* Database connections > 80% of max
* Free storage < 20%
* Read/Write latency thresholds
* **Parameter Group**:
* Custom parameter group for performance optimization
* TLS enforcement enabled
* Audit logging enabled for security compliance
* **High Availability**:
* Configured for future upgrade to a replica set
* Placeholder DNS record for future replica nodes
* Multi-AZ replica instance for automatic failover
* Cross-AZ backup replication
* Automatic minor version updates during maintenance window
* **Network Configuration**:
* **DB Subnet Group**: `sf-website-documentdb-subnet-group-<env>`
* **Security Group**: `sf-website-documentdb-sg-<env>`
* **Endpoint**: Cluster endpoint for write operations
* **Reader Endpoint**: Available for read-only operations
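
Because the cluster requires TLS and authentication, the connection string needs more than host and port. The sketch below assembles one in Python from separate values, under stated assumptions: the function name is hypothetical, the query options follow AWS's published DocumentDB connection-string guidance (`tls`, `replicaSet=rs0`, `readPreference`, `retryWrites=false`), and the CA bundle option is omitted. How the application in this PR actually builds its URI is not shown in the diff.

```python
from urllib.parse import quote_plus, urlencode


def documentdb_uri(host, username, password, database="apostrophe", port=27017):
    """Build a MongoDB-style URI for a TLS-enabled DocumentDB cluster."""
    options = urlencode({
        "tls": "true",                           # TLS in transit is required
        "replicaSet": "rs0",                     # DocumentDB's fixed replica set name
        "readPreference": "secondaryPreferred",  # optional: send reads to the replica
        "retryWrites": "false",                  # retryable writes are not supported
    })
    # Percent-encode credentials so characters like '@' or '/' stay valid.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host}:{port}/{database}?{options}"
```

Percent-encoding matters here because a generated master password can contain characters such as `@` or `/` that would otherwise break URI parsing.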
115 changes: 115 additions & 0 deletions docs/aws-architecture-diagram.md
@@ -0,0 +1,115 @@
# SF Website AWS Architecture

## Principal AWS Architecture Diagram

```mermaid
flowchart TB
%% External actors
Users[👥 Users]
GitHub[🐙 GitHub Actions CI/CD]

%% Public facing components
ALB[⚖️ Application Load Balancer<br/>sf-website-alb-env<br/>HTTPS Only]
CF[🌐 CloudFront Distribution<br/>sf-website-media-env<br/>CDN for Media Assets]

%% Compute layer
ECS[🚢 ECS Fargate Cluster<br/>sf-website-ecs-cluster-env<br/>Apostrophe CMS App]
ECR[🐳 ECR Repository<br/>sf-website-ecr-env<br/>Container Images]

%% Storage layer
S3_Attachments[🪣 S3 Attachments Bucket<br/>sf-website-s3-attachments-env<br/>Media & Files]
S3_Logs[🪣 S3 Logs Bucket<br/>sf-website-s3-logs-env<br/>Centralized Logs]
MongoDB[📄 MongoDB on EC2<br/>sf-website-mongodb-env<br/>t3.medium + 100GB EBS]

%% Security & Identity
IAM_Task[👤 ECS Task Role<br/>sf-website-ecs-task-env<br/>S3 Access Permissions]
IAM_Exec[👤 ECS Execution Role<br/>sf-website-ecs-execution-env<br/>ECR & Parameter Store]
ParamStore[🔐 Parameter Store<br/>Session Secrets & DB Credentials]

%% Monitoring & Backup
CloudWatch[📊 CloudWatch<br/>sf-website-cloudwatch-env<br/>Logs & Metrics]
AWSBackup[💾 AWS Backup<br/>Daily EBS Snapshots<br/>7 daily, 4 weekly retention]

%% User flows
Users -->|HTTPS requests| ALB
Users -->|Media requests| CF

%% CI/CD flow
GitHub -->|Build & Push| ECR
GitHub -->|Deploy| ECS

%% Load balancer to application
ALB -->|Route traffic| ECS

%% CloudFront to storage
CF -->|Origin requests| S3_Attachments

%% ECS relationships
ECS -->|Pull images| ECR
ECS -->|Read/Write media| S3_Attachments
ECS -->|Database operations| MongoDB
ECS -->|Get secrets| ParamStore
ECS -->|Send logs| CloudWatch

%% IAM relationships
IAM_Task -.->|Assume role| ECS
%% Review comment (Copilot AI, May 24, 2025): Consider adding a legend to clarify the significance of dashed arrows (used for IAM relationships) versus solid arrows in the Mermaid diagram for better reader comprehension.
IAM_Exec -.->|Assume role| ECS
IAM_Task -.->|S3 permissions| S3_Attachments
IAM_Exec -.->|ECR permissions| ECR
IAM_Exec -.->|Parameter Store| ParamStore

%% Logging flows
ALB -->|Access logs| S3_Logs
CF -->|Access logs| S3_Logs
S3_Attachments -->|Server logs| S3_Logs

%% Monitoring
ECS -->|Metrics & logs| CloudWatch
ALB -->|Metrics| CloudWatch
MongoDB -->|System metrics| CloudWatch

%% Backup
AWSBackup -->|Snapshot| MongoDB
%% Review comment (Copilot AI, May 24, 2025): Update the documentation to specify that AWS Backup snapshots target the EBS volume of the MongoDB EC2 instance, ensuring clear understanding of what is being backed up.

%% Styling
classDef public fill:#e1f5fe
classDef compute fill:#f3e5f5
classDef storage fill:#e8f5e8
classDef security fill:#fff3e0
classDef monitoring fill:#fce4ec

class ALB,CF public
class ECS,ECR compute
class S3_Attachments,S3_Logs,MongoDB storage
class IAM_Task,IAM_Exec,ParamStore security
class CloudWatch,AWSBackup monitoring
```

## Key Architecture Components

### 🌐 Public Layer
- **Application Load Balancer**: HTTPS-only entry point for web traffic
- **CloudFront**: Global CDN for media asset delivery from S3

### 🚢 Compute Layer
- **ECS Fargate**: Serverless container hosting for Apostrophe CMS
- **ECR**: Private container registry for application images

### 🪣 Storage Layer
- **S3 Attachments**: Media files and uploads from CMS
- **S3 Logs**: Centralized logging for all services
- **MongoDB on EC2**: Primary database with automated backups

### 👤 Security Layer
- **IAM Roles**: Least-privilege access for ECS tasks
- **Parameter Store**: Secure storage for secrets and configuration

### 📊 Operations Layer
- **CloudWatch**: Monitoring, metrics, and alerting
- **AWS Backup**: Automated daily snapshots with retention policies

## Environment Isolation
All resources are tagged and named with environment suffix:
- `dev`, `staging`, `prod`
- Complete isolation between environments
- Consistent naming: `sf-website-<service>-<env>`
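
The naming convention above is simple enough to express as a helper. This is an illustrative Python sketch, not code from the PR: `resource_name` and `resource_tags` are hypothetical names, and the tag set mirrors the resource tags listed in `docs/Infrastructure.md`.

```python
ENVIRONMENTS = ("dev", "staging", "prod")


def resource_name(service, env):
    """Return a name following the sf-website-<service>-<env> convention."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"sf-website-{service}-{env}"


def resource_tags(service, env, owner="peter.ovchyn"):
    """Return the tag set documented for every resource."""
    return {
        "Name": resource_name(service, env),
        "Project": "Website",
        "CostCenter": "Website",
        "Environment": env,
        "Owner": owner,
    }
```

Rejecting unknown environment names early keeps a typo such as `stagging` from creating a fourth, untracked set of resources.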
66 changes: 66 additions & 0 deletions docs/infrastructureQNA.md
@@ -0,0 +1,66 @@
# Infrastructure Q&A for Terraform Implementation

## Questions and Answers

### Q1: Certificate ARNs
**Question**: What are the actual ARN values for your existing SSL certificates?
- Main app certificates: `sf-website-{env}.sandbox-prettyclear.com`
- Media certificates: `sf-website-media-{env}.sandbox-prettyclear.com`

**Answer**: Wildcard certificate `*.sandbox-prettyclear.com` covers all subdomains
**ARN**: `arn:aws:acm:us-east-1:548271326349:certificate/7e11016f-f90e-4800-972d-622bf1a82948`

---

### Q2: Route 53 Hosted Zone ID
**Question**: What's the hosted zone ID for `sandbox-prettyclear.com`?

**Answer**: [Skipped for now - will address later]

---

### Q3: Parameter Store Secrets
**Question**: Should I generate these automatically or do you have specific values?
- DocumentDB master username/password
- SESSION_SECRET
- Any other app secrets?

**Answer**:
- **DocumentDB master username/password**: Store in tfvars files
- **SESSION_SECRET**: User will provide specific value in tfvars
- **Other secrets**: Based on docker-compose.yml:
  - **REDIS_URI**: Will be auto-generated (ElastiCache endpoint)
  - **BASE_URL**: Will be auto-generated from ALB domain
  - **SERVICE_ACCOUNT_PRIVATE_KEY**: User will provide if using Google Cloud Storage
  - **NODE_ENV**: Will be set to 'production'

---

### Q4: Deployment Scope
**Question**: Should I create Terraform to deploy all three environments at once, or one environment at a time (which one first)?

**Answer**: The Terraform configuration should create one environment at a time, with the environment specified via a tfvars file.
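
A one-environment-at-a-time workflow could be driven by a small wrapper like the Python sketch below. The `-var-file` and `-out` flags are standard `terraform plan` options; the `environments/<env>/terraform.tfvars` layout and the function name are assumptions for illustration, while the `<env>.tfplan` output name matches the `dev.tfplan` entry added to `.gitignore`.

```python
VALID_ENVS = ("dev", "staging", "prod")


def plan_command(env, var_dir="environments"):
    """Build the argv for planning exactly one environment."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    # Hypothetical layout: one tfvars file per environment directory.
    var_file = f"{var_dir}/{env}/terraform.tfvars"
    return ["terraform", "plan", f"-var-file={var_file}", f"-out={env}.tfplan"]
```

The returned list can be passed directly to `subprocess.run(..., check=True)`, which avoids shell quoting issues.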

---

### Q5: Remote State
**Question**: Do you want S3 backend for Terraform state storage?

**Answer**: Yes, use S3 bucket for Terraform state storage with DynamoDB for state locking.
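
Backend settings can be supplied at `terraform init` time through repeated `-backend-config=KEY=VALUE` arguments. The Python sketch below builds such an argument list; `bucket`, `key`, `region`, `encrypt`, and `dynamodb_table` are real S3 backend settings, but the per-environment state key and the function name are assumptions. The lock table is optional because commit 45d75c7 in this PR removes DynamoDB state locking.

```python
def backend_init_args(bucket, env, region="us-east-1", lock_table=None):
    """Build the argv for `terraform init` against an S3 backend."""
    settings = {
        "bucket": bucket,
        "key": f"{env}/terraform.tfstate",  # hypothetical per-environment key
        "region": region,
        "encrypt": "true",
    }
    if lock_table:
        # Only relevant if DynamoDB locking is kept.
        settings["dynamodb_table"] = lock_table
    args = ["terraform", "init"]
    args += [f"-backend-config={key}={value}" for key, value in settings.items()]
    return args
```

Keeping one state key per environment means a `dev` apply can never overwrite `prod` state, even though both share a bucket.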

---

### Q6: CI/CD Integration
**Question**: Do you need IAM roles for GitHub Actions to deploy?

**Answer**: Yes, include all 3:
- IAM role that GitHub Actions can assume
- Permissions for Terraform operations (creating/updating resources)
- ECR permissions for pushing Docker images
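
The answer does not say how GitHub Actions authenticates to AWS. One common option is GitHub's OIDC provider, in which case the assumable role needs a trust policy like the one generated below. This Python sketch assumes OIDC is used; the function name, account ID, and repository are placeholders, not values from this PR.

```python
import json


def github_oidc_trust_policy(account_id, repo):
    """Trust policy letting workflows from one repository assume a role."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Audience set by the official AWS credentials action.
                    "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                    # Restrict to workflows from a single repository.
                    "StringLike": {f"{provider}:sub": f"repo:{repo}:*"},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(github_oidc_trust_policy("123456789012", "example-org/example-repo"), indent=2))
```

The `sub` condition is the part that matters most: without it, any repository on GitHub could request credentials for the role.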

---

### Q7: CloudWatch Alerts
**Question**: For notifications, do you have Slack webhook URLs, or should I create SNS topics instead?

**Answer**: Slack webhook URLs, to be provided in the tfvars file.
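
CloudWatch alarms cannot post to a webhook directly, so something has to translate an alarm into a Slack message; a frequent arrangement is an SNS topic feeding a small Lambda function. The Python sketch below shows only the message-building step, under that assumption. Slack incoming webhooks accept a JSON body with a `text` field; the function name and message wording are hypothetical.

```python
import json


def slack_alarm_payload(alarm_name, state, reason, env):
    """Encode a CloudWatch alarm transition as a Slack webhook request body."""
    text = f"[{env}] CloudWatch alarm {alarm_name} is {state}: {reason}"
    return json.dumps({"text": text}).encode("utf-8")
```

The returned bytes would be sent as the body of an HTTP POST to the webhook URL with a `Content-Type: application/json` header.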