This project provisions GCS buckets and Cloud Run Functions with Terraform to load remote files from a given list of URLs directly into GCS, without storing them on a local disk.
It downloads each URL in the list and writes the file straight to your GCS bucket on GCP. Nothing is persisted on the local machine: no local disk usage and no time spent on intermediate copies.
- Process files later on GCP services
- Archive data in GCS without local copies
- One-off or recurring remote-to-GCS transfers
- Have access to a GCP project with sufficient IAM permissions for GCS buckets and Cloud Run Functions. A free trial account is enough to try this out: https://cloud.google.com/free
- Install the gcloud CLI: https://cloud.google.com/sdk/docs/install-sdk
- Install Docker Compose: https://docs.docker.com/compose/install/
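Once installed, you can sanity-check the tooling and authenticate from a terminal (a quick sketch using standard gcloud and Docker commands):

```sh
gcloud --version          # confirm the gcloud CLI is installed
docker compose version    # confirm Docker Compose v2 is available
gcloud auth login         # authenticate gcloud with your Google account
```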
```sh
git clone <your-fork-or-this-repo-url>
cd bucketloader
```

Fill in environment variables (used by Docker Compose):
- Open `env.tmp` and set the path to your gcloud config (Linux/macOS default: `~/.config/gcloud`).
- Save your changes, then move the file to the path expected by Docker Compose:
```sh
mv env.tmp .env
```
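The resulting `.env` might look like this (the variable name here is hypothetical; use the one actually defined in `env.tmp`):

```sh
# Hypothetical example: the real variable name comes from env.tmp.
GCLOUD_CONFIG_PATH=$HOME/.config/gcloud
```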
Provide Terraform input variables:

- Open `tfvars.tmp` in the repo root and replace the placeholders (GCP project ID, bucket name, region, etc.).
- Move it to the path Terraform expects:
```sh
mv tfvars.tmp terraform/terraform.tfvars
```

Info: `.env` and `terraform/terraform.tfvars` are ignored by Git; both paths are already listed in `.gitignore`.
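For reference, a filled-in `terraform/terraform.tfvars` could also be generated like this (variable names and values are illustrative; keep the placeholders that `tfvars.tmp` actually defines):

```sh
# Illustrative only: variable names and values are hypothetical.
cat > terraform/terraform.tfvars <<'EOF'
project_id  = "my-gcp-project"
bucket_name = "my-unique-bucket-name"
region      = "europe-west1"
EOF
```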
```sh
docker compose up -d
```

Open a shell inside the running container:

```sh
docker exec -it terraform sh
```

If the next commands are not found immediately, run `cd /workspace` inside the container and try again.
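Once inside, you can confirm the toolchain before proceeding (assuming Terraform is installed in the image, as the container name suggests):

```sh
terraform version    # confirm Terraform is available inside the container
```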
Test the Terraform plan (no changes applied):
```sh
./engine/starter.sh test
```

Apply the plan to create the bucket and deploy the Cloud Run Function:

```sh
./engine/starter.sh apply
```

Resource creation typically takes about a minute. When it finishes, you should see Terraform's `Apply complete!` summary.
Exit the container when done:

```sh
exit
```

Prepare your URL list locally by editing `urls.txt` (one URL per line). Then start the loader from your terminal:

```sh
./load_to_bucket.sh
```

You can verify progress and results in the GCP Console once transfers begin.
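You can also watch objects arrive from the CLI (a sketch; substitute the bucket name you set in `terraform.tfvars`):

```sh
# List the bucket's contents; the bucket name below is illustrative.
gcloud storage ls gs://my-unique-bucket-name/
```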
The repo includes a GitHub Actions workflow that warns you if a secret (key, ID, etc.) is accidentally published.
This repo also ships `env.tmp` as a template you rename to `.env`. You can use any alternative method you prefer instead of (or in addition to) that approach. Whichever method you choose, never commit secrets, and if you do keep a `.env` file, make sure it is listed in `.gitignore`.
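A quick, idempotent way to enforce that from a shell (a standard Git hygiene idiom, not specific to this repo):

```sh
# Append .env to .gitignore only if it is not already listed.
grep -qxF '.env' .gitignore || echo '.env' >> .gitignore
```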
This is a side project meant as development practice; the GCP-native Storage Transfer Service provides agentless URL-to-GCS transfers and should be preferred for enterprise-grade production workloads.
Modify load_to_bucket.sh to read the region value from an environment variable instead of setting it in the script.
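A minimal sketch of that change, assuming the script currently hardcodes the region in a `REGION` variable (the name and example value are illustrative):

```sh
# Take the region from the environment instead of hardcoding it;
# fail fast with a usage hint if REGION is unset.
REGION="${REGION:?Set REGION, e.g. REGION=europe-west1 ./load_to_bucket.sh}"
```

The loader would then be invoked as `REGION=europe-west1 ./load_to_bucket.sh`.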

