39 commits
53fb38e
FlugHafen pipeline - first commit
Crazz-Zaac Oct 26, 2024
14afdf4
exercise 2 initial commit
Crazz-Zaac Nov 3, 2024
71c382d
Project plan inital commit
Crazz-Zaac Nov 3, 2024
d72f740
ex 3 initial commit
Crazz-Zaac Nov 13, 2024
b3c8dee
minor change
Crazz-Zaac Nov 13, 2024
6a6507a
update to project plan
Crazz-Zaac Nov 13, 2024
f40dfd7
update to project plan
Crazz-Zaac Nov 13, 2024
e9454dc
update to project plan
Crazz-Zaac Nov 13, 2024
3976fad
Update project-plan-example.md
Crazz-Zaac Nov 14, 2024
57d181b
Update project-plan-example.md
Crazz-Zaac Nov 14, 2024
d7b0bd8
Update project-plan-example.md
Crazz-Zaac Nov 14, 2024
de98368
exercise 3 initial commit
Crazz-Zaac Nov 18, 2024
b93c460
removed python code, with complete jv code
Crazz-Zaac Nov 19, 2024
c60b392
code revision and documentation
Crazz-Zaac Nov 19, 2024
2f5ba0e
report pdf
Crazz-Zaac Nov 27, 2024
6fd741d
exercise 4 first commit
Crazz-Zaac Dec 1, 2024
917bd69
fixing data transform
Crazz-Zaac Dec 4, 2024
6bc38c3
test cases
Crazz-Zaac Dec 5, 2024
ae92cb8
rounded off temperature, validated month
Crazz-Zaac Dec 5, 2024
a43772a
data unit fixed
Crazz-Zaac Dec 5, 2024
382dc6a
test cases fixed
Crazz-Zaac Dec 5, 2024
c988083
exercise 5 initial commit
Crazz-Zaac Dec 5, 2024
5d6db94
exercise 5 adding validation
Crazz-Zaac Dec 5, 2024
cd12db8
workflow and requirements added
Crazz-Zaac Dec 11, 2024
30fdcdd
requirements updated
Crazz-Zaac Dec 11, 2024
5d143db
minor changes
Crazz-Zaac Dec 17, 2024
7fd19bb
minor changes
Crazz-Zaac Dec 17, 2024
5d185fc
minor changes
Crazz-Zaac Dec 17, 2024
2510641
minor changes
Crazz-Zaac Dec 19, 2024
a3849a8
experiments for the report
Crazz-Zaac Jan 9, 2025
08e5063
final report of the experiment
Crazz-Zaac Jan 9, 2025
ef80e88
minor changes
Crazz-Zaac Jan 9, 2025
5e90bcb
report finalization
Crazz-Zaac Jan 13, 2025
67a21f4
minor issue fixed
Crazz-Zaac Jan 13, 2025
dd1a3ae
minor issue fixed
Crazz-Zaac Jan 13, 2025
6a0af3c
import error fixed
Crazz-Zaac Jan 13, 2025
e7e36eb
import error fixed
Crazz-Zaac Jan 13, 2025
7b8cdca
first commit
Crazz-Zaac Jan 17, 2025
b6dda54
Add CC BY 4.0 License and update README
Crazz-Zaac Jan 17, 2025
31 changes: 31 additions & 0 deletions .github/workflows/project_feedback.yml
@@ -0,0 +1,31 @@
name: Run Tests
run-name: ${{ github.actor }} is running tests

on:
  push:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-20.04

    steps:
      - uses: actions/checkout@v2

      - name: Set up Python 3.11
        uses: actions/setup-python@v2
        with:
          # Quoted so YAML keeps the version as a string rather than a float
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      # Project and exercise feedback
      - name: Make test executable
        run: chmod +x /home/runner/work/made-template/made-template/project/tests.sh

      - name: Run project tests
        run: /home/runner/work/made-template/made-template/project/tests.sh
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,3 +1,8 @@
.DS_Store
/data/*
!/data/.gitkeep
!/data/.gitkeep
.mypy_cache/
temp_dir/
project/__pycache__/
*.pyc
*.sqlite
24 changes: 24 additions & 0 deletions LICENSE
@@ -0,0 +1,24 @@
Creative Commons Attribution 4.0 International License (CC BY 4.0)

Copyright (c) [2025][FAU]

You are free to:

- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Notices:

You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an exception or limitation.

No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.

For more information, visit: https://creativecommons.org/licenses/by/4.0/
70 changes: 44 additions & 26 deletions README.md
@@ -1,37 +1,55 @@
# Methods of Advanced Data Engineering Template Project
## A comparative analysis of North America’s and Latin America & Caribbean’s contributions to renewable energy and their adoption of climate mitigation efforts from 1960 to 2023

This template project provides some structure for your open data project in the MADE module at FAU.
This repository contains (a) a data science project that is developed by the student over the course of the semester, and (b) the exercises that are submitted over the course of the semester.
This report presents a comparative analysis of the contributions and progress of North America and Latin America & Caribbean in renewable energy
adoption and climate mitigation efforts over the period 1960 to 2023.

To get started, please follow these steps:
1. Create your own fork of this repository. Feel free to rename the repository right after creation, before you let the teaching instructors know your repository URL. **Do not rename the repository during the semester**.
## Data License
The World Bank strives to enhance public access to and use of data that it collects and publishes. The data are organized in datasets listed in The World Bank Data Catalog (the “Datasets”). Unless labeled otherwise, the Datasets are provided under the [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets). The data used in this project span multiple countries, enabling both regional and country-level analysis, and cover the period 1960 to 2023.

## Project Work
Your data engineering project will run alongside lectures during the semester. We will ask you to regularly submit project work as milestones, so you can reasonably pace your work. All project work submissions **must** be placed in the `project` folder.
| Item                           | Value                                                                      |
|--------------------------------|----------------------------------------------------------------------------|
| North America data             | [Download](https://api.worldbank.org/v2/en/country/NAC?downloadformat=csv) |
| Latin America & Caribbean data | [Download](https://api.worldbank.org/v2/en/country/LCN?downloadformat=csv) |
| Metadata URL                   | [Link](https://data.worldbank.org/country)                                 |
| Data type                      | CSV                                                                        |
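
The download links above can be read programmatically. The following is a minimal sketch, not the project's actual `prepare_data.py`: it assumes the download is a ZIP archive whose main data file has a name starting with `API_`, and that this file begins with four preamble lines before the header row. The function name `read_worldbank_csv` is hypothetical.

```python
import csv
import io
import zipfile

# Assumption: the data CSV starts with four preamble lines
# (source, blank, last-updated date, blank) before the header row.
PREAMBLE_LINES = 4


def read_worldbank_csv(zip_bytes: bytes) -> list[dict[str, str]]:
    """Return the rows of the main data CSV inside a World Bank ZIP download."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assumption: the data file name starts with "API_";
        # the remaining members are treated as metadata files.
        data_name = next(
            name
            for name in archive.namelist()
            if name.startswith("API_") and name.endswith(".csv")
        )
        # "utf-8-sig" drops a byte-order mark if one is present.
        text = archive.read(data_name).decode("utf-8-sig")

    # Skip the preamble, then let DictReader use the next line as the header.
    lines = text.splitlines()[PREAMBLE_LINES:]
    return list(csv.DictReader(lines))
```

The bytes passed in could come from `urllib.request.urlopen(url).read()` with one of the URLs in the table. Every value is returned as a string, so year columns still need converting to numbers before analysis.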

### Exporting a Jupyter Notebook
Jupyter Notebooks can be exported using `nbconvert` (`pip install nbconvert`). For example, to export the example notebook to HTML: `jupyter nbconvert --to html examples/final-report-example.ipynb --embed-images --output final-report.html`


## Exercises
During the semester you will need to complete exercises using [Jayvee](https://github.com/jvalue/jayvee). You **must** place your submission in the `exercises` folder in your repository and name them according to their number from one to five: `exercise<number from 1-5>.jv`.
### Description of Directories and Files

In regular intervals, exercises will be given as homework to complete during the semester. Details and deadlines will be discussed in the lecture, also see the [course schedule](https://made.uni1.de/).
- **data**: Contains all the datasets used in the project.
- **examples**: Includes example files provided by the course facilitators for reference.
- **exercises**: Contains solutions to exercises from the WinSem2024 course.
- **project**: Holds all the analysis work done during the semester, including reports, scripts, and temporary files.
  - **analysis-report.pdf**: The final analysis report.
  - **data-report.pdf**: A report on the data used in the project.
  - **final_report.ipynb**: The final report in Jupyter Notebook format.
  - **pipeline.sh**: A shell script for running the data processing pipeline.
  - **prepare_data.py**: A Python script for preparing the data.
  - **project-plan-example.md**: An example of a project plan.
  - **run_tests.py**: A Python script for running tests.
  - **temp_dir**: A temporary directory for intermediate files.
  - **tests.sh**: A shell script for running tests.
  - **version_check.py**: A Python script for checking package versions.
- **requirements.txt**: Lists the dependencies required for the project.

### Exercise Feedback
We provide automated exercise feedback using a GitHub action (that is defined in `.github/workflows/exercise-feedback.yml`).
### Usage

To view your exercise feedback, navigate to Actions → Exercise Feedback in your repository.
1. Clone the repository:
   ```bash
   git clone <repository-url>
   ```
2. Create and activate a virtual environment:
   ```bash
   python -m venv <env_name>
   source <env_name>/bin/activate  # on Windows: <env_name>\Scripts\activate
   ```
3. Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```
4. Run the data pipeline using `project/pipeline.sh`.
5. You can now open `final_report.ipynb` and explore.

The exercise feedback is executed whenever you make a change in files in the `exercise` folder and push your local changes to the repository on GitHub. To see the feedback, open the latest GitHub Action run, open the `exercise-feedback` job and `Exercise Feedback` step. You should see command line output that contains output like this:

```sh
Found exercises/exercise1.jv, executing model...
Found output file airports.sqlite, grading...
Grading Exercise 1
Overall points 17 of 17
---
By category:
Shape: 4 of 4
Types: 13 of 13
```

## License

This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt the work as long as you provide appropriate credit. For more details, see the [LICENSE](LICENSE) file or visit [Creative Commons](https://creativecommons.org/licenses/by/4.0/).
74 changes: 74 additions & 0 deletions exercises/exercise1.jv
@@ -0,0 +1,74 @@
pipeline FlugHafen{

//1. The FlugHafen pipeline connects the blocks via pipes to move data from a CSV file
// on the web into a SQLite file sink.
FlugHafenHttpExtractor
-> FlugHafenTextFileInterpreter;

//2. The FlugHafenTextFileInterpreter output is used as input for the FlugHafenCsvFileInterpreter
// block which is then used as input for the FlugHafenDataSelector block.
FlugHafenTextFileInterpreter
-> FlugHafenCsvFileInterpreter
// -> FlugHafenDatabaseWriter
-> FlugHafenDataSelector
-> FlugHafenTableInterpreter
-> FlugHafenLoader;

//3. The FlugHafenHttpExtractor block is of type HttpExtractor and the URL is specified.
block FlugHafenHttpExtractor oftype HttpExtractor {
// URL of the data source
url: "https://opendata.rhein-kreis-neuss.de/api/explore/v2.1/catalog/datasets/rhein-kreis-neuss-flughafen-weltweit/exports/csv?lang=en&timezone=Europe%2FBerlin&use_labels=true&delimiter=%3B";
}

//4. The FlugHafenTextFileInterpreter block is of type TextFileInterpreter.
block FlugHafenTextFileInterpreter oftype TextFileInterpreter { }

//5. Since we only need a specific range of the data, we use the CellRangeSelector block.
block FlugHafenDataSelector oftype CellRangeSelector {
// Keep columns A to I of every row
select: range A1:I*;
}

//6. The FlugHafenCsvFileInterpreter block is of type CSVInterpreter and the delimiter is specified.
block FlugHafenCsvFileInterpreter oftype CSVInterpreter {
// Specify the separator as a semicolon for the CSV
delimiter: ';';
}

// block FlugHafenDatabaseWriter oftype DatabaseWriter {
// // The name of the database
// database: "flughafen.db";
// // The name of the table
// table: "flughafen";
// }

//7. The FlugHafenTableInterpreter block is of type TableInterpreter and the necessary columns are specified.
block FlugHafenTableInterpreter oftype TableInterpreter {
// The first row contains the header
header: true;
// The columns of the table
columns: [
"Lfd. Nummer" oftype integer,
"Name des Flughafens" oftype text,
"Ort" oftype text,
"Land" oftype text,
"IATA" oftype text,
"ICAO" oftype text,
"Latitude" oftype decimal,
"Longitude" oftype decimal,
"Altitude" oftype integer,

];
}

//8. Finally the FlugHafenLoader block is of type SQLiteLoader and the table name and file name are specified.
block FlugHafenLoader oftype SQLiteLoader {
// The name of the table
table: "airports";
// The name of the file
file: "airports.sqlite";
}


}
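
For readers unfamiliar with Jayvee, the pipeline above can be approximated in plain Python. This is a rough sketch under stated assumptions, not a statement of how Jayvee works internally: the function name `load_airports` is hypothetical, columns are selected by position (A to I) rather than by header name, and rows that fail type conversion are simply skipped.

```python
import csv
import io
import sqlite3

# Converters for columns A..I, mirroring the types listed in the TableInterpreter block.
CONVERTERS = [int, str, str, str, str, str, float, float, int]


def load_airports(csv_text: str, connection: sqlite3.Connection) -> int:
    """Parse semicolon-separated text, keep columns A..I, and store them in 'airports'."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    header = next(reader)[: len(CONVERTERS)]

    rows = []
    for record in reader:
        if len(record) < len(CONVERTERS):
            continue  # too few cells to fill columns A..I
        try:
            # zip() stops after the ninth cell, which drops columns beyond I.
            rows.append([convert(cell) for convert, cell in zip(CONVERTERS, record)])
        except ValueError:
            continue  # a cell did not parse as the expected type

    # Quote column names because the headers contain spaces and dots.
    quoted = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f"CREATE TABLE IF NOT EXISTS airports ({quoted})")
    connection.executemany(f"INSERT INTO airports VALUES ({placeholders})", rows)
    connection.commit()
    return len(rows)
```

Passing `sqlite3.connect("airports.sqlite")` as the connection would write to the same file name the loader block uses. The return value is the number of rows stored.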

85 changes: 85 additions & 0 deletions exercises/exercise2.jv
@@ -0,0 +1,85 @@
pipeline TreePlanting{

//1. The TreePlanting pipeline connects the blocks via pipes to move data from a CSV file
// on the web into a SQLite file sink.
TreePlantingHttpExtractor
-> TreePlantingTextFileInterpreter;

//2. The TreePlantingTextFileInterpreter output is used as input for the TreePlantingCsvFileInterpreter
// block, whose output is then passed to the TreePlantingBaumartDeutschDeleter block.
TreePlantingTextFileInterpreter
-> TreePlantingCsvFileInterpreter
// -> TreePlantingDatabaseWriter
-> TreePlantingBaumartDeutschDeleter
-> TreePlantingTableInterpreter
-> TreePlantingLoader;

//3. The TreePlantingHttpExtractor block is of type HttpExtractor and the URL is specified.
block TreePlantingHttpExtractor oftype HttpExtractor {
// URL of the data source
url: "https://opendata.rhein-kreis-neuss.de/api/v2/catalog/datasets/stadt-neuss-herbstpflanzung-2023/exports/csv";
}

//4. The TreePlantingTextFileInterpreter block is of type TextFileInterpreter.
block TreePlantingTextFileInterpreter oftype TextFileInterpreter { }

//5. The TreePlantingCsvFileInterpreter block is of type CSVInterpreter and the delimiter is specified.
block TreePlantingCsvFileInterpreter oftype CSVInterpreter {
// Specify the separator as a semicolon for the CSV
delimiter: ';';
}

//6. The TreePlantingBaumartDeutschDeleter block is of type ColumnDeleter and the column to be deleted is specified.
block TreePlantingBaumartDeutschDeleter oftype ColumnDeleter {
// Column E holds the German tree species name (baumart_deutsch)
delete: [column E];
}

//7. The TreePlantingTableInterpreter block is of type TableInterpreter and the necessary columns are specified.
block TreePlantingTableInterpreter oftype TableInterpreter {
// The first row contains the header
header: true;
// The columns of the table
columns: [
"lfd_nr" oftype integer,
"stadtteil" oftype Vogelsang,
"standort" oftype text,
"baumart_botanisch" oftype text,
"id" oftype GeoCoordinate,
"baumfamilie" oftype text,

];
}

block TreePlantingLoader oftype SQLiteLoader {
// The name of the table
table: "trees";
// The name of the file
file: "trees.sqlite";
}


valuetype Vogelsang oftype text {
// The value of the column
constraints: [
// only allow column values that start with "Vogelsang"
VogelsangStadteil
];
}

valuetype GeoCoordinate oftype text {
// The value of the column
constraints: [
// only allow column values that match the pattern of a geo coordinate
Geopoints
];
}

// Only allow values that start with "Vogelsang".
constraint VogelsangStadteil on text: value matches(/^Vogelsang.*/);

// Only allow values of the form "<1-3 digits>.<digits>, <1-3 digits>.<digits>".
constraint Geopoints on text: value matches(/^\d{1,3}\.\d+,\s*\d{1,3}\.\d+$/);


}
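
The two constraints can be checked outside Jayvee with ordinary regular expressions. The sketch below assumes the intent stated in the comments above: `stadtteil` must start with "Vogelsang" and `id` must look like a latitude/longitude pair. The helper name `is_valid_row` is hypothetical.

```python
import re

# Prefix check: "Vogelsang" followed by anything.
# In regex syntax "g*" means "zero or more g", so a prefix test needs ".*"
# (or no suffix at all) after the literal text.
VOGELSANG = re.compile(r"^Vogelsang.*")

# Two decimal numbers with 1-3 digits before the point, separated by a comma
# and optional whitespace.
GEOPOINT = re.compile(r"^\d{1,3}\.\d+,\s*\d{1,3}\.\d+$")


def is_valid_row(stadtteil: str, geo_id: str) -> bool:
    """Return True if both values satisfy their constraint."""
    return bool(VOGELSANG.match(stadtteil)) and bool(GEOPOINT.match(geo_id))
```

A row such as `("Vogelsang Bols-Siedlung", "51.2094, 6.6613")` passes, while a different district name or a semicolon-separated coordinate pair is rejected.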
