This project calculates the average elevation for all ZIP codes (ZCTAs) in the United States using high-resolution elevation data. It outputs a CSV file with ZIP codes and their corresponding average elevations, which can be used for studying correlations between elevation and health outcomes like skin cancer.
- Elevation Data:
  - Source: USGS National Map Elevation Data
  - AWS Bucket: USGS Elevation AWS Bucket
  - Resolution: ~30 meters (~1/9 arc-second)
  - Format: GeoTIFF files
- ZIP Code Data (ZCTAs):
  - Source: U.S. Census Bureau TIGER/Line Shapefiles
  - Format: Shapefiles (.shp)
This project requires specific dependencies that can be installed using a Conda environment.
- Open an Anaconda Prompt and use the provided environment.yml file to recreate the environment:
conda env create -f environment.yml
- Activate the environment:
conda activate elevation_analysis_env
- Ensure that the tif_links.txt file is in the project directory (it should be there by default after cloning the repository). This file contains the URLs of the required elevation GeoTIFF files.
- Run the download script in the conda environment to fetch the elevation data:
python path/to/download_elevation_map.py
This script downloads the ZIP Code shapefile archive and the elevation tiles. The ZIP Code archive is unzipped automatically. Ensure that sufficient storage (~500GB) is available for the elevation tiles, which are saved to the elevation_tiles directory.
- Run the elevation processing script in the conda environment:
python path/to/process_elevations.py
This script generates the zip_code_elevations.csv file, which contains the average elevation for each ZIP code.
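As a rough illustration of what "average elevation" can mean when a ZCTA straddles several elevation tiles, the sketch below accumulates a running sum and count of valid pixels and divides at the end. It is not taken from process_elevations.py; the `RunningMean` class, the nodata value, and the pixel values are all assumptions made for the example.

```python
from typing import Iterable, Optional

# Assumed nodata marker; check the GeoTIFF metadata for the value actually used.
NODATA = -999999.0


class RunningMean:
    """Accumulate a mean over pixel values that arrive in several batches."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def add(self, values: Iterable[float]) -> None:
        # Skip nodata pixels so gaps in a tile do not distort the mean.
        for value in values:
            if value != NODATA:
                self.total += value
                self.count += 1

    def mean(self) -> Optional[float]:
        # None signals that no valid pixel was seen for this ZCTA.
        return self.total / self.count if self.count else None


# Hypothetical pixel values for one ZCTA that overlaps two tiles.
acc = RunningMean()
acc.add([100.0, 102.0, NODATA])  # pixels from the first tile
acc.add([98.0])                  # pixels from the second tile
print(acc.mean())  # 100.0
```

Keeping a sum and a count per ZCTA, rather than averaging per-tile averages, weights every valid pixel equally regardless of which tile it came from.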
The final output is a CSV file, zip_code_elevations.csv, containing:
- ZIP Code: The ZIP code (ZCTA).
- Average Elevation: The average elevation above sea level, in meters, for the corresponding ZIP code.
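For downstream analysis, the output can be read with Python's standard csv module. The sketch below assumes the header row uses exactly the column names listed above ("ZIP Code" and "Average Elevation"); adjust the keys if the actual file differs. The `load_elevations` helper and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def load_elevations(handle: TextIO) -> Dict[str, float]:
    """Map each ZIP code (kept as a string) to its average elevation in meters."""
    elevations = {}
    for row in csv.DictReader(handle):
        # ZIP codes stay strings so leading zeros (e.g. "02134") survive.
        elevations[row["ZIP Code"]] = float(row["Average Elevation"])
    return elevations


# Placeholder rows for illustration only; these are not real results.
sample = io.StringIO("ZIP Code,Average Elevation\n00001,12.5\n99999,1500.0\n")
elevations = load_elevations(sample)
print(elevations["00001"])  # 12.5
```

With the real file, the same function can be called as `load_elevations(open("zip_code_elevations.csv", newline=""))`. Parsing ZIP codes as integers would silently drop leading zeros, which is why they are kept as strings here.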
- The elevation data requires significant storage (~500GB).
- Processing time may vary depending on hardware and network speeds, but can take multiple days due to the size of the dataset.
- Both scripts resume downloading or processing where they left off if a previous run was terminated early.
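One way the resume behavior for downloads could work is sketched below: skip every URL in tif_links.txt whose tile already exists locally, and write each download to a temporary name first so an interrupted transfer is never mistaken for a finished one. This is an assumption about the approach, not a description of download_elevation_map.py; the function names, the `.part` suffix, and the example.com URLs are hypothetical.

```python
import os
import tempfile
import urllib.request
from typing import List
from urllib.parse import urlparse


def local_path(url: str, tiles_dir: str) -> str:
    # Use the last path component of the URL as the local file name.
    return os.path.join(tiles_dir, os.path.basename(urlparse(url).path))


def pending_urls(links_file: str, tiles_dir: str) -> List[str]:
    """Return the URLs from links_file whose tile is not yet in tiles_dir."""
    with open(links_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    return [u for u in urls if not os.path.exists(local_path(u, tiles_dir))]


def download(url: str, tiles_dir: str) -> None:
    """Download to a temporary name, then rename once the transfer completes."""
    os.makedirs(tiles_dir, exist_ok=True)
    final = local_path(url, tiles_dir)
    partial = final + ".part"
    urllib.request.urlretrieve(url, partial)
    os.replace(partial, final)  # only finished files get the final name


# Demonstration with placeholder URLs; nothing is actually downloaded here.
with tempfile.TemporaryDirectory() as tmp:
    links = os.path.join(tmp, "tif_links.txt")
    tiles = os.path.join(tmp, "elevation_tiles")
    os.makedirs(tiles)
    with open(links, "w") as f:
        f.write("https://example.com/tiles/a.tif\nhttps://example.com/tiles/b.tif\n")
    # Pretend a.tif was fetched by an earlier run.
    open(os.path.join(tiles, "a.tif"), "w").close()
    remaining = pending_urls(links, tiles)
print(remaining)  # ['https://example.com/tiles/b.tif']
```

A full run would then loop over `pending_urls(...)` and call `download` for each entry; rerunning after an interruption repeats only the tiles that never reached their final file name.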