- Contents
- Summary
- Dependencies
- Setup
- Download Code and Token From GitHub
- Set Up AWS Before Creating App
- Create App in AWS Elastic Beanstalk
- Deploy App in AWS EC2
- Notes
- API
- Endpoints
- Additional Details
- Examples
- Data Model
- Future Features
- Metadata
This application retrieves, stores, processes, and accesses data about daily weather and yearly crop yields.
This application ran live on AWS Elastic Beanstalk in July and August 2024, with API usage information at the link below.
http://gconan-corteva-challenge.us-west-2.elasticbeanstalk.com/apidocs
See `requirements.txt` for the full list of dependencies.
- Python v3.10+
- Python-Poetry
- NumPy v2.0.0+
- Pandas v2.2.2+
- Flask-SQLAlchemy v3.1.1+
- SQLAlchemy v2.0.31+
- PsycoPG2-Binary v2.9.9+
- Dask[dataframe] v2024.7.0+
- Flasgger v0.9.7.1+
- Clone this (`Corteva-Challenge`) repository, then select all of its contents and export them as a .ZIP file. Do not just download the repository as a .ZIP file through the GitHub website, because that will put all of the repo contents in a top-level subdirectory within the .ZIP file and cause errors later.
- Get a valid GitHub authorization token to access the GitHub API.
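The export step can also be scripted. The following is a minimal sketch, not part of this repository, that zips a cloned repo so its files sit at the root of the archive instead of inside a wrapper folder. The function name `zip_repo_contents` and the choice to skip the `.git` directory are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def zip_repo_contents(repo_dir: str, zip_path: str) -> list[str]:
    """Zip the contents of repo_dir so they sit at the root of the archive."""
    repo = Path(repo_dir).resolve()
    target = Path(zip_path).resolve()
    archived = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(repo.rglob("*")):
            relative = path.relative_to(repo)
            # Skip directories, Git metadata, and the archive itself.
            if not path.is_file() or ".git" in relative.parts or path == target:
                continue
            # arcname is relative to the repo root, so no wrapper folder is added.
            archive.write(path, arcname=relative.as_posix())
            archived.append(relative.as_posix())
    return archived
```

Calling `zip_repo_contents("Corteva-Challenge", "Corteva-Challenge.zip")` from the parent directory of the clone would return the archived paths, which should start with file names such as `app.py` rather than with a folder name.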
- Open the AWS Management Console in your browser.
- Create, or log in to, an AWS account.
- Create and download an AWS EC2 key pair. I named mine `Corteva-Challenge`.
- In the IAM page of the AWS Management Console, go to `Roles` and click `Create role`. Under `Use case`, select `EC2`. Click `Next`. On the `Add permissions` page, add the following permissions policies, then click `Next`.
  - `AmazonS3FullAccess`
  - `AWSElasticBeanstalkRoleWorkerTier`
  - `AWSElasticBeanstalkWebTier`
  - `AWSElasticBeanstalkWorkerTier`
- Name the role and click `Create role`.
- Create an EC2 instance profile using this role.
- Open the Elastic Beanstalk page in the AWS Management Console.
- Click `Create application`.
- Fill in the `Application name`, `Environment name`, and `Domain` fields. For this example, I named my app `Corteva-Challenge` with an environment called `Corteva-Challenge-env` at the subdomain `gconan-corteva-challenge`.
- In the `Platform` field, select `Python`, and for `Platform branch` select `Python 3.11`.
- Check `Upload your code` and `Local file`, then click `Choose file` and upload your `.zip` file copy of the `Corteva-Challenge` code repo. Click `Next`.
- Click `Use an existing service role` and select the default `aws-elasticbeanstalk-service-role`. Under `EC2 key pair`, select the key pair you downloaded earlier. Under `EC2 instance role`, select the role you created earlier.
- Under `Public IP address`, click the `Activated` box. Also click the `Enable database` switch. Under `Username` and `Password`, type `postgres`.<sup>1</sup>
- Under `Environment properties`, click `Add environment property`. Name it `GITHUB_TOKEN` and enter the entire token string you generated.
- Click `Next`, and on the `Review` page click `Submit` to create the app.
- Open Amazon RDS in the AWS Management Console. Click `Databases` in the sidebar, then select the database you generated when you created your Elastic Beanstalk application. Under `Connected compute resources`, click `Actions` and click `Set up EC2 connection`. In the `EC2 instance` dropdown, select the EC2 instance running your application, then click `Continue`.
- From the EC2 page of the AWS Management Console, click `Instances`, and then the string under the `Instance ID` of the instance running your application. Click `Connect`, ensure that `Connect using EC2 Instance Connect` is checked, and then click the `Connect` button at the bottom-right.
- In the EC2 Instance Connect command-line terminal, run `source /var/app/venv/staging-*/bin/activate`.<sup>2</sup>
- From the Elastic Beanstalk page of the AWS Management Console, click `Environments` and then the env you created (e.g. `Corteva-Challenge-env`). Copy the URL path under `Domain`.
- From the RDS page of the AWS Management Console, go to `Databases` and then click the database you started for this app. Copy the URI path listed under `Endpoint & Port`.
- In the EC2 Instance Connect command-line terminal, activate the environment and define its variables.<sup>2</sup> In the terminal,
  - Run `ls -d /var/app/venv/staging-*/bin` to get the `bin` directory path.
  - Run `export PYTHONPATH=` followed by the `bin` directory path.
  - Run `source ${PYTHONPATH}/activate`
  - Run `export GITHUB_TOKEN=` followed by the entire GitHub access token string you generated.
  - Run `export SQLALCHEMY_DATABASE_URI=postgresql+psycopg2://postgres:postgres@` followed by the database URI path and `:5432/postgres` at the end.
- In the `Elastic Beanstalk > Environments > Corteva-Challenge-env > Configuration` section, define those environment variables again.<sup>3</sup> In the `Environment properties` section of the `Configuration` page,
  - If there is no environment variable named `GITHUB_TOKEN`, then add one and set its value to the entire GitHub token string.
  - If there is no environment variable named `PYTHONPATH`, then add one and set its value to the `bin` directory path.
  - If there is no environment variable named `SQLALCHEMY_DATABASE_URI`, then add one and set its value to the same string `postgresql+psycopg2://postgres:postgres@{database-URI}:5432/postgres`.
- To start the application, connect to its host via EC2 Instance Connect and then do the following:
  - `cd` to the directory containing `app.py`. That file should be in a subdirectory of `/var/app/current/`.
  - Run `flask setup-db`.<sup>2</sup>
  - Load all data into the database by running `flask load-data`.<sup>2</sup>
- The application should now be fully usable. Navigate in your browser to the domain URL you copied earlier, and you should be able to access any of the API endpoints defined below as paths under that domain.
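The `SQLALCHEMY_DATABASE_URI` value is easy to mistype by hand, so here is a minimal sketch of assembling it in Python. The helper name `build_database_uri` and the host `db.example.com` are hypothetical; the defaults mirror the `postgres` username, password, database name, and port `5432` used in the steps above.

```python
from urllib.parse import quote


def build_database_uri(host: str, user: str = "postgres", password: str = "postgres",
                       port: int = 5432, database: str = "postgres") -> str:
    """Assemble a PostgreSQL connection URI that uses the psycopg2 driver."""
    # Percent-encode credentials so characters like "@" or ":" cannot break the URI.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgresql+psycopg2://{credentials}@{host}:{port}/{database}"


# "db.example.com" stands in for the endpoint copied from the RDS page.
print(build_database_uri("db.example.com"))
```

Encoding the credentials only matters once the placeholder password is replaced with a real one that may contain reserved characters.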
In a full production deployment used by actual clients, I would write the app to:
1. use a secure username and password, and require user authentication to access the application.
2. run its setup steps automatically. For this test deployment, I do them manually.
3. ensure that environment variables are passed between Elastic Beanstalk and the EC2 instance terminal. Currently, environment variables must be defined in both places.
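Because the variables must currently be defined in two places, a quick check before running the setup commands can catch one that was missed. This is a minimal sketch, assuming the three variable names used in the deployment steps; the function `missing_variables` is hypothetical and not part of the application.

```python
import os

# Names taken from the deployment steps in this README.
REQUIRED_VARIABLES = ("GITHUB_TOKEN", "PYTHONPATH", "SQLALCHEMY_DATABASE_URI")


def missing_variables(environ=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```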
- `/` returns a simple message stating whether the application is running.
- `/apidocs` uses Flasgger to provide additional information on this application's API endpoints and what data they allow you to access.
- `/api/crop` returns crop yield data: the number of crop bushels per year.
- `/api/weather` returns daily weather report data: the daily maximum/minimum temperature and precipitation at each weather station. This endpoint accepts several parameters to filter the data:
  - `station_id=N` will only include reports from the weather station with the ID number N.
  - `max_date=YYYY-MM-DD` will exclude any reports after the specified date in ISO 8601 format.
  - `min_date=YYYY-MM-DD` will exclude any reports before the specified date in ISO 8601 format.
- `/api/weather/stats` returns overall weather report data: the average minimum/maximum temperature and total precipitation at a given station during a given year.
  - `station_id=N` will only include reports from the weather station with the ID number N.
  - `year=YYYY` will only include stations' reports for the year YYYY.
- `/api/weather/stations` returns the name and ID number of every weather station.
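To make the `/api/weather` filter semantics concrete, here is a minimal in-memory sketch. It is not the application's actual query code, which runs against the database; it assumes each report is a dict with `station_id` and `date` keys, and it treats both date bounds as inclusive, since the parameters only exclude reports strictly before or after the given dates.

```python
from datetime import date


def filter_reports(reports, station_id=None, min_date=None, max_date=None):
    """Keep the reports that pass every filter that was supplied."""
    # Dates arrive as ISO 8601 strings (YYYY-MM-DD), as in the query parameters.
    earliest = date.fromisoformat(min_date) if min_date else None
    latest = date.fromisoformat(max_date) if max_date else None
    kept = []
    for report in reports:
        if station_id is not None and report["station_id"] != station_id:
            continue
        if earliest is not None and report["date"] < earliest:
            continue
        if latest is not None and report["date"] > latest:
            continue
        kept.append(report)
    return kept
```

Omitting a parameter leaves that filter off, so calling `filter_reports(reports)` with no keyword arguments returns every report.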
- The `/api/weather`, `/api/weather/stations`, and `/api/crop` endpoints return paginated results. They accept two parameters to select results by page:
  - `per_page=N` groups results into pages of N. If `page` is omitted, the first N results are returned.
  - `page=N` returns the Nth page. If `per_page` is omitted, each page holds 50 results.
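The pagination arithmetic can be sketched as follows, assuming pages are numbered from 1 and using the defaults described above (first page, 50 results per page). The helper `page_bounds` is hypothetical and only illustrates which results a given page covers.

```python
def page_bounds(page: int = 1, per_page: int = 50) -> tuple[int, int]:
    """Return the 1-based positions of the first and last result on a page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    first = (page - 1) * per_page + 1
    last = page * per_page
    return first, last


print(page_bounds(page=2, per_page=20))  # (21, 40)
```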
Navigate to this API endpoint to access the twenty-first through fortieth daily weather reports from 1997 at station 5:
/api/weather?page=2&per_page=20&min_date=1997-01-01&max_date=1997-12-31&station_id=5
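A query string like the one above can be built programmatically rather than typed by hand. This is a minimal sketch using only the standard library; the helper `weather_url` and the base URL `http://localhost:5000` are assumptions for illustration, so substitute the domain of your own deployment.

```python
from urllib.parse import urlencode


def weather_url(base_url: str, **params) -> str:
    """Build an /api/weather URL, skipping any parameter set to None."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    url = f"{base_url.rstrip('/')}/api/weather"
    return f"{url}?{query}" if query else url


# "http://localhost:5000" is a placeholder, not the deployed address.
print(weather_url("http://localhost:5000", page=2, per_page=20,
                  min_date="1997-01-01", max_date="1997-12-31", station_id=5))
```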
Navigate to this API endpoint to access the average yearly maximum/minimum temperature and total precipitation at weather station 3 in 1998:
/api/weather/stats?station_id=3&year=1998
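The statistics this endpoint describes (average maximum temperature, average minimum temperature, and total precipitation for one station and year) can be sketched over in-memory records as below. This is an illustration under assumptions, not the application's implementation: the function `yearly_stats`, the output key names, and the dict-based report format are all hypothetical, and missing measurements are not handled.

```python
from statistics import fmean


def yearly_stats(reports, station_id: int, year: int):
    """Summarize one station's reports for one year, or return None if there are none."""
    selected = [r for r in reports
                if r["station_id"] == station_id and r["date"].year == year]
    if not selected:
        return None
    return {
        "station_id": station_id,
        "year": year,
        "avg_max_temp": fmean(r["max_temp"] for r in selected),
        "avg_min_temp": fmean(r["min_temp"] for r in selected),
        "total_precipitation": sum(r["precipitation"] for r in selected),
    }
```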
classDiagram
WeatherStation "1" --> "many" WeatherReport : generates
class WeatherStation {
+id: int
+created: datetime
+name: string
+updated: datetime
}
class WeatherReport {
+id: int
+date: date
+max_temp: float
+min_temp: float
+precipitation: int
+station_id: int
}
class CropYield {
+id: int
+corn_bushels: int
+created: datetime
+year: int
}
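For readers who prefer code to diagrams, the same three classes can be mirrored as plain Python dataclasses. This is only a sketch of the fields shown in the diagram; the application's real models live in `models.py` and map to database tables, which these dataclasses do not.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class WeatherStation:
    id: int
    created: dt.datetime
    name: str
    updated: dt.datetime


@dataclass
class WeatherReport:
    id: int
    date: dt.date
    max_temp: float
    min_temp: float
    precipitation: int
    station_id: int  # refers to WeatherStation.id (one station, many reports)


@dataclass
class CropYield:
    id: int
    corn_bushels: int
    created: dt.datetime
    year: int
```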
The following are not currently features of this application, but I would add them if implementing it for production-level use by actual clients.
- Add Yearly Statistics Class/Model. Explicitly define a SQL database table, and corresponding Python class in `models.py`, to store the yearly statistics returned from the `/api/weather/stats` endpoint.
- User Authentication. Instead of allowing data access to anyone who can access the page, the application could require user authentication.
- Scheduled Data Ingestion. The application could query the source data files and update its database at specified intervals, like on a `cron` job.
- Statistical Predictive Modeling. The application could use daily weather reports to predict yearly crop yield. In its most basic form, the application would correlate the data columns of the `weather_report` table in a given year with the `corn_bushels` yield for that year. Further models would identify which stations and periods of time best predict the yield.
- Filtering By Station Name. Instead of accepting the arbitrary `station_id` parameter, the `/api/weather` endpoint could accept a `station_name` parameter and determine the ID number of that station by `SELECT`ing that `station_name` in the `weather_station` table.
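The station-name lookup described in the last item could look roughly like the sketch below. It uses the standard library's `sqlite3` module in place of the application's PostgreSQL database so that it runs anywhere, and it assumes a `weather_station` table with `id` and `name` columns as in the data model; the function `station_id_for_name` is hypothetical.

```python
import sqlite3


def station_id_for_name(connection: sqlite3.Connection, station_name: str):
    """Return the ID of the station with this name, or None if no station matches."""
    # A parameterized query keeps user-supplied names from being run as SQL.
    row = connection.execute(
        "SELECT id FROM weather_station WHERE name = ?", (station_name,)
    ).fetchone()
    return None if row is None else row[0]
```

The endpoint could then pass the returned ID to the same filter it already applies for `station_id`.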
- Written 2024-07-15 by @GregConan (gregmconan@gmail.com)
- Updated 2024-11-24 by @GregConan (gregmconan@gmail.com)