Data Engineering Challenge

Construct a data pipeline capable of normalising and storing data in an AWS (Amazon Web Services) environment.

Requirements

The solution should be built on AWS services
Code should be written in Python
Data should be normalised as much as possible
Data processing should be triggered by AWS S3 event notifications. More info can be found here.
Data processing should be done in an AWS Lambda function
Final data should be stored on AWS S3 or one of the AWS databases (RDS, DynamoDB etc.)
At least one unit test should be created
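
As a rough illustration of the trigger and unit test requirements, here is a minimal sketch of a Lambda entry point that reads the bucket and object key out of an S3 event notification, together with one unit test. The names `extract_s3_objects`, `handler`, `raw-bucket` and the test class are assumptions made for this example, not part of the challenge. The sketch only parses the event; downloading, normalising and storing the file is left out.

```python
import unittest
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for each ObjectCreated record in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        # Inside the event payload the name has no "s3:" prefix, e.g. "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded (spaces become "+"), so decode them.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point; real processing would go where the comment is."""
    objects = extract_s3_objects(event)
    # A full solution would download each object here (for example with boto3),
    # normalise the rows and write the result to S3 or a database.
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}


class ExtractS3ObjectsTest(unittest.TestCase):
    def test_decodes_key_and_skips_other_events(self):
        event = {
            "Records": [
                {
                    "eventName": "ObjectCreated:Put",
                    "s3": {
                        "bucket": {"name": "raw-bucket"},
                        "object": {"key": "data/Open_DataRDW+copy.csv"},
                    },
                },
                {
                    "eventName": "ObjectRemoved:Delete",
                    "s3": {"bucket": {"name": "raw-bucket"}, "object": {"key": "old.csv"}},
                },
            ]
        }
        self.assertEqual(
            extract_s3_objects(event),
            [("raw-bucket", "data/Open_DataRDW copy.csv")],
        )
```

Keeping the event parsing in a plain function means the test can run locally, for example with `python -m unittest`, without any AWS credentials.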

Data

You can find the dataset in the data directory as the Open_DataRDW.csv file.
The dataset is an open data subset from https://opendata.rdw.nl/Voertuigen.

Data processing & normalisation

Vehicle type (Voertuigsoort) should be normalised to its English term. The same goes for body type (Inrichting) and colour (Eerste kleur).
All date fields should be converted to ISO-8601 timezone-aware datetime format.
Depending on where the final data is stored, it may need to be grouped by vehicle make (e.g. Ford, Audi etc.). If the final output is JSON or CSV, it should be split into a separate file per make. If the final output is a database, grouping is not necessary, but uniqueness should be explicitly defined by license plate number.
Any n.v.t. ("niet van toepassing", i.e. not applicable) null values in the colour column should be correctly translated in the final output (e.g. empty for CSV, null for DBs).
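
The rules above could be sketched roughly as follows. This is only an illustration under several assumptions: the lookup tables are small, incomplete samples; the column names `Merk` (make), `Kenteken` (license plate) and `Datum tenaamstelling` are assumed rather than taken from this README; dates are assumed to arrive as `YYYYMMDD` strings; and UTC is chosen as the timezone purely for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustrative, incomplete lookup tables (keys lower-cased); extend as needed.
VEHICLE_TYPES = {"personenauto": "Passenger car", "bedrijfsauto": "Commercial vehicle"}
BODY_TYPES = {"stationwagen": "station wagon", "hatchback": "hatchback"}
COLOURS = {"rood": "red", "blauw": "blue", "zwart": "black"}
NULL_MARKERS = {"", "n.v.t."}  # values treated as "no data"


def translate(value, table):
    """Map a Dutch term to English; null markers become None, unknowns pass through."""
    if value is None or value.strip().lower() in NULL_MARKERS:
        return None
    cleaned = value.strip()
    return table.get(cleaned.lower(), cleaned)


def to_iso_datetime(raw, fmt="%Y%m%d"):
    """Convert a date string to an ISO-8601 timezone-aware datetime (UTC assumed)."""
    if raw is None or not raw.strip():
        return None
    return datetime.strptime(raw.strip(), fmt).replace(tzinfo=timezone.utc).isoformat()


def normalise_row(row, date_columns=()):
    """Return a normalised copy of one CSV row (a dict keyed by column name)."""
    out = dict(row)
    out["Voertuigsoort"] = translate(row.get("Voertuigsoort"), VEHICLE_TYPES)
    out["Inrichting"] = translate(row.get("Inrichting"), BODY_TYPES)
    out["Eerste kleur"] = translate(row.get("Eerste kleur"), COLOURS)
    for column in date_columns:
        out[column] = to_iso_datetime(row.get(column))
    return out


def group_by_make(rows, make_column="Merk"):
    """Group rows by vehicle make, e.g. to write one output file per make."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[make_column]].append(row)
    return dict(groups)
```

A `None` value maps naturally onto both outputs: `csv.DictWriter` writes it as an empty field, and most database drivers store it as null. For a database target, the license plate column would serve as the primary or partition key to enforce uniqueness.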

Submitting work

Code should be shared with @smagu and @Kontuzijus as collaborators on a private repository when it is ready for review
Any explanations should be written either in README.md or within the code as comments

Hints and suggestions

AWS Free Tier should be sufficient for all the necessary work
AWS resources such as the Lambda function and S3 events can either be created manually or deployed using a framework such as Serverless.
The easiest approach is to use the AWS console and the supplied examples
To imitate a new S3 object, you may upload a new file by hand and then use the s3:ObjectCreated:* event notification to trigger AWS Lambda
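
If you prefer to script the event notification rather than click through the console, the configuration could look roughly like the sketch below. The helper name `build_notification_config`, the `.csv` suffix filter and the example ARN are assumptions for illustration. The function only builds the configuration dictionary, so it can be inspected or tested without AWS access.

```python
def build_notification_config(lambda_arn, suffix=".csv"):
    """Build an S3 notification configuration that invokes a Lambda on new objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Fire on every kind of object creation (put, post, copy, multipart).
                "Events": ["s3:ObjectCreated:*"],
                # Optional filter: only react to keys ending in the given suffix.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }


# Hypothetical ARN, used only to show the call shape.
config = build_notification_config(
    "arn:aws:lambda:eu-west-1:123456789012:function:normalise-rdw"
)

# Applying it would need boto3 and AWS credentials, for example:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="your-bucket-name", NotificationConfiguration=config
#   )
```

Note that S3 can only invoke the function if the Lambda's resource policy grants it permission to do so; the console adds this automatically when you attach the trigger there.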

carVertical/data-engineering-homework
