This repo collects four small database projects I worked on while learning SQL and relational databases:
- Case Study: City of Chicago Data Portal
- Human Resources Database
- Pet Sale Database
- Exploring Automotive Industry Trends in India
All of them focus on different parts of the SQL story: designing tables, loading data, writing queries, and in the Chicago case, running SQL directly from a Jupyter Notebook.
For the Chicago case study I didn’t just write SQL in a database console. Instead, I ran SQL inside a Jupyter notebook.
The basic idea:

- Install the needed packages for database connectivity:

  ```
  pip install sqlalchemy==1.3.9 ibm_db_sa
  ```

- Load the SQL magic extension in the notebook:

  ```
  %load_ext sql
  ```

- Connect to the Db2 on Cloud database using a connection string:

  ```
  %sql ibm_db_sa://username:password@hostname:port/database
  ```

- From that point on, any cell starting with `%%sql` can run pure SQL:

  ```sql
  %%sql
  SELECT * FROM CHICAGO_SCHOOLS FETCH FIRST 5 ROWS ONLY;
  ```
This setup let me:
- Keep Python and SQL side by side in one notebook
- Write text, comments, and explanations as Markdown between queries
- Re-run queries and tweak them easily during exploration
Exactly the same approach can be used for the other projects too (HR and Pet Sale): define tables in the database, connect from Jupyter, and then run queries with `%%sql` cells while using Python for plotting or extra analysis if needed.
Folder: Case Study: City of Chicago Data Portal
Main notebook:

- `Chicago Dataset.ipynb`
This was a more realistic project: working with open data from the City of Chicago.
The goal was to build a small analytical database in Db2 using three public datasets:
- Socioeconomic Indicators – hardship index, per‑capita income, unemployment, education, etc. by community area
- Public Schools – enrollment, school type, safety, health, and environment ratings
- Crime (or similar civic) data that can be linked by community area or location
The assignment walked through the full flow:
- Download CSV files from Chicago’s Data Portal
- Load them into Db2 tables from Jupyter
- Write SQL queries to answer specific questions about Chicago communities, schools, and hardship levels
- Database: IBM Db2 on Cloud
- Interface: Jupyter Notebook (`Chicago Dataset.ipynb`)
- SQL extension: `%load_ext sql` with the `ibm_db_sa` / `sqlalchemy` connector
- Language: standard SQL (Db2 flavor)
Inside the notebook, I:
- Explored the structure of each dataset (columns, data types, keys such as community area number).
- Created three Db2 tables for the Chicago datasets.
- Loaded CSV data from the local environment into those tables.
- Wrote SQL queries to answer assignment-style questions such as:
  - Which community areas have the highest hardship index?
  - How do school safety or environment ratings vary across communities?
  - How do socioeconomic indicators relate to where schools are located?
Each query was run from a Jupyter cell using `%%sql`, with the outputs rendered as tables directly in the notebook. This made it easy to iterate and annotate the results with short explanations.
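To give a flavour of those cells, here is a minimal sketch of the hardship-index question. The table and column names (`CENSUS_DATA`, `COMMUNITY_AREA_NAME`, `HARDSHIP_INDEX`) are illustrative placeholders and may differ from the ones used in the actual notebook:

```sql
%%sql
-- Five community areas with the highest hardship index
-- (table and column names are assumptions; adjust them to the tables you loaded)
SELECT COMMUNITY_AREA_NAME, HARDSHIP_INDEX
FROM CENSUS_DATA
ORDER BY HARDSHIP_INDEX DESC
FETCH FIRST 5 ROWS ONLY;
```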
Folder: Exploring Automotive Industry Trends in India
Files:

- `Exploring Trends in the AutomotiveIndustry.sql`
This project uses a dataset of cars listed for sale in India to explore trends in the automotive market.
Each row represents a specific car listing with information such as:
- `Name` (model / variant)
- `year` (model year)
- `selling_price`
- `km_driven`
- `fuel`
- `seller_type`
- `transmission`
- `owner` (first owner, second owner, etc.)
- `mileage`
- `engine`
- `max_power`
- `torque`
- `seats`
The idea here was to treat this as a small analytical dataset and write SQL queries that answer realistic questions like:
- Which types of cars hold their value better?
- How do fuel type, transmission, and ownership history affect selling price?
- Are there clear patterns over time and mileage?
The SQL file `Exploring Trends in the AutomotiveIndustry.sql` connects to a database called `cars_info` and queries a table `car_info`.
The script walks through a series of numbered questions, for example:
- Average selling price by fuel type
  - For cars with manual transmission and first‑owner status.
  - Uses `WHERE` filters plus `GROUP BY fuel` and `AVG(selling_price)`.
- Top 3 models by average mileage
  - Considers only cars with more than 5 seats.
  - Groups by `Name` and orders by `AVG(mileage)` to find the most efficient larger cars.
- Models with large price swings
  - Finds car models where the difference between max and min selling price is greater than 10,000.
  - Uses `GROUP BY Name` plus `HAVING MAX(selling_price) - MIN(selling_price) > 10000`.
- Models priced above average but with below‑average mileage
  - Compares each record against the overall average selling price and overall average mileage.
  - Identifies cars that are relatively expensive but not very efficient.
- Cumulative selling price over time
  - Calculates the cumulative sum of selling prices by model over the years.
  - Uses window functions with `SUM(selling_price) OVER (PARTITION BY Name ORDER BY year)`.
- Models priced near the global average
  - Flags models whose selling price is within ±10% of the overall average.
- Exponential moving average (EMA)
  - Computes an EMA of selling prices per model with a smoothing factor of 0.2.
  - Uses window functions to simulate a running average over time.
- Models with year‑over‑year price drops
  - Uses `LAG(selling_price) OVER (PARTITION BY Name ORDER BY year)` to compare each year with the previous one.
  - Finds models where the selling price decreased versus the prior year (see the sketch after this list).
- Highest total mileage per transmission type
  - Aggregates total `km_driven` for each `Name` within each `transmission`.
  - Returns the car names with the highest total mileage for manual vs automatic.
- Average selling price per year for top models
  - First ranks each model by its overall selling price using the `RANK()` window function.
  - Then, for the top 3 models, computes the average selling price per year.
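As a concrete illustration, here is a rough sketch of two of these queries (the cumulative total and the year-over-year drop). This is the general shape, using the column names listed above, not the exact code from `Exploring Trends in the AutomotiveIndustry.sql`:

```sql
-- Cumulative selling price per model over the years
SELECT Name,
       year,
       selling_price,
       SUM(selling_price) OVER (PARTITION BY Name ORDER BY year) AS cumulative_price
FROM car_info;

-- Models whose selling price dropped versus the previous year
SELECT Name, year, selling_price, prev_price
FROM (
    SELECT Name,
           year,
           selling_price,
           LAG(selling_price) OVER (PARTITION BY Name ORDER BY year) AS prev_price
    FROM car_info
) t
WHERE prev_price IS NOT NULL
  AND selling_price < prev_price;
```

The subquery is only there because `LAG()` cannot appear directly in a `WHERE` clause; the window function is computed first and then filtered in the outer query.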
This automotive project adds a few more advanced SQL skills to the repo:
- Working with a realistic, messy business dataset (used car listings).
- Using aggregations and `HAVING` to compare models.
- Using window functions:
  - `SUM() OVER` for cumulative totals
  - `AVG() OVER` for moving averages / EMA
  - `LAG()` to compare current vs previous year
  - `RANK()` to select “top N” models
- Combining filters on price, mileage, seats, fuel type, and ownership to answer targeted questions.
It’s essentially a compact case study of how to use SQL to explore pricing and performance trends in the Indian automotive market.
Folder: Human Resources Database
SQL scripts:

- `HR database-DDL.sql` – DDL (schema design)
- `HR database-Queries.sql` – Queries, Part 1
- `HR database-Queries_2.sql` – Queries, Part 2 (joins and more)
This project is a mini HR system for a fictional company.
The idea was to:
- Design a simple HR schema from scratch, and
- Practice writing realistic queries that an HR analyst or HRIS system might run.
Instead of using a large real dataset, everything here is schema + queries: focusing on table design, relationships, joins, and aggregations.
The `HR database-DDL.sql` script builds the core tables:

- `EMPLOYEES` – employee id, first/last name, SSN, birth date, sex, address, job id, salary, manager id, department id
- `JOBS` – job identifier, job title, minimum and maximum salary
- `JOB_HISTORY` – records of previous job assignments for each employee
- `DEPARTMENTS` – department id, name, manager, and location id
- `LOCATIONS` – locations tied to departments
This schema models:
- One-to-many relationships (department → employees, job → employees)
- Historical records for people who have changed jobs (`JOB_HISTORY`)
- Geographical / organisational hierarchy via locations and departments
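For orientation, here is a condensed sketch of what two of these tables might look like in the DDL. The real `HR database-DDL.sql` defines more columns and constraints, and the exact column names (other than the ones mentioned elsewhere in this README, such as `B_DATE`, `SEX`, and `SALARY`) are assumptions:

```sql
-- Condensed sketch only; the actual DDL script is more complete
CREATE TABLE DEPARTMENTS (
    DEPT_ID_DEP  CHAR(9)     NOT NULL,   -- department id (column name assumed)
    DEP_NAME     VARCHAR(15) NOT NULL,
    MANAGER_ID   CHAR(9),
    LOC_ID       CHAR(9),
    PRIMARY KEY (DEPT_ID_DEP)
);

CREATE TABLE EMPLOYEES (
    EMP_ID   CHAR(9)      NOT NULL,      -- employee id (column name assumed)
    F_NAME   VARCHAR(15)  NOT NULL,
    L_NAME   VARCHAR(15)  NOT NULL,
    B_DATE   DATE,
    SEX      CHAR(1),
    SALARY   DECIMAL(10, 2),
    DEP_ID   CHAR(9),                    -- links each employee to a department
    PRIMARY KEY (EMP_ID)
);
```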
The two query scripts show how to ask common HR questions using SQL:
From `HR database-Queries.sql`:
- Filter employees by address
  - e.g. employees whose address contains `Elgin,IL`.
- Filter by birth decade
  - e.g. employees born in the 1970s using `B_DATE` patterns.
- Find employees in a salary band and department
  - e.g. salary between 60,000 and 70,000 in department 5.
- Group by department to compute:
  - `COUNT(*)` (headcount)
  - `AVG(SALARY)` (average salary)
  - and then use `HAVING` to keep only departments with fewer than 4 employees (sketched below).
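A sketch of that last grouping query (column names are assumed where the README does not mention them):

```sql
-- Headcount and average salary per department, keeping only small departments
SELECT DEP_ID,
       COUNT(*)    AS HEADCOUNT,
       AVG(SALARY) AS AVG_SALARY
FROM EMPLOYEES
GROUP BY DEP_ID
HAVING COUNT(*) < 4;
```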
From `HR database-Queries_2.sql`:
- Use `INNER JOIN` between `EMPLOYEES` and `JOB_HISTORY` to list:
  - Employees in department 5 and their job start dates.
- Add a join to `JOBS` to show job titles along with employee names and start dates.
- Compare `LEFT OUTER JOIN` vs `INNER JOIN` to see how unmatched rows behave:
  - For employees born before 1980 and their departments.
- Use `FULL OUTER JOIN` / `LEFT JOIN` between `EMPLOYEES` and `DEPARTMENTS`:
  - To list employees with department names,
  - Or to filter based on criteria like gender (`SEX = 'M'`).
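Roughly, those join queries look like the sketch below; the join-column names are assumptions and may differ from the actual script:

```sql
-- Employees in department 5 with job start dates and job titles (INNER JOINs)
SELECT E.F_NAME, E.L_NAME, JH.START_DATE, J.JOB_TITLE
FROM EMPLOYEES E
INNER JOIN JOB_HISTORY JH ON E.EMP_ID   = JH.EMPL_ID
INNER JOIN JOBS        J  ON JH.JOBS_ID = J.JOB_IDENT
WHERE E.DEP_ID = '5';

-- LEFT OUTER JOIN keeps employees even when no department row matches
SELECT E.F_NAME, E.L_NAME, D.DEP_NAME
FROM EMPLOYEES E
LEFT OUTER JOIN DEPARTMENTS D ON E.DEP_ID = D.DEPT_ID_DEP
WHERE YEAR(E.B_DATE) < 1980;
```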
This HR database project shows:
- How to turn a business domain (HR) into a relational schema.
- How to use DDL to define tables and primary keys.
- How to write DML / query scripts that:
  - Filter rows using `WHERE`, `BETWEEN`, `LIKE`, and functions like `YEAR()`.
  - Combine tables using `INNER`, `LEFT`, and `FULL OUTER` joins.
  - Aggregate and summarise data using `GROUP BY` and `HAVING`.
It’s a good example of using pure SQL to answer practical questions about employees, departments, and compensation, and it can easily be run either in a Db2 console or from Jupyter using the same `%load_ext sql` pattern as the Chicago notebook.
Folder: Pet Sale Database
SQL scripts:

- `DDL - Exercise.sql`
- `PETSALE-FUNCTIONS.sql`
This one is a lightweight, sandbox-style project.
The idea: build a tiny pet shop sales dataset and then use it to practice SQL functions and basic DDL/DML without the complexity of a big schema.
It’s intentionally small and simple so the focus stays on the SQL itself.
- Drop the `PETSALE` table if it exists.
- Create a new `PETSALE` table:

  ```sql
  create table PETSALE (
      ID INTEGER PRIMARY KEY NOT NULL,
      ANIMAL VARCHAR(20),
      QUANTITY INTEGER,
      SALEPRICE DECIMAL(6,2),
      SALEDATE DATE
  );
  ```

- Insert a handful of rows (sketched below), for example:
  - Cats, dogs, parrots, hamsters, goldfish
  - Various quantities and sale prices
  - Sales across late May and June 2018
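The inserts look something like this; the values here are illustrative, not the exact rows from the script:

```sql
-- Illustrative rows only
INSERT INTO PETSALE (ID, ANIMAL, QUANTITY, SALEPRICE, SALEDATE)
VALUES
    (1, 'Cat',    9, 450.09, '2018-05-29'),
    (2, 'Dog',    3, 666.66, '2018-06-01'),
    (3, 'Parrot', 2,  50.00, '2018-06-04');
```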
In `DDL - Exercise.sql` there is also an extended exercise that:

- Creates a `PETSALE` table with both `SALEPRICE` and `PROFIT`, and
- A `PET` table with `ANIMAL` and `QUANTITY`,
- Then experiments with inserts, updates, and joins as part of the practice (see the sketch below).
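A sketch of the kind of statements in that exercise; the update rule and filter value are made-up placeholders, not taken from the script:

```sql
-- Second table from the extended exercise
CREATE TABLE PET (
    ANIMAL   VARCHAR(20),
    QUANTITY INTEGER
);

-- Example update against the extended PETSALE table (illustrative rule only)
UPDATE PETSALE
SET PROFIT = SALEPRICE * 0.10
WHERE ANIMAL = 'Dog';
```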
`PETSALE-FUNCTIONS.sql` runs a bunch of built‑in SQL functions against the `PETSALE` data:
Numeric and aggregate functions:
- `SUM(SALEPRICE)` – total revenue
- `MAX(QUANTITY)` – largest quantity sold in one row
- `AVG(SALEPRICE)` – average sale price
- `AVG(SALEPRICE / QUANTITY)` for dogs – average per‑unit price
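For example, the aggregate queries have roughly this shape (the `'Dog'` filter value is assumed to match how the rows were inserted):

```sql
-- Total revenue and the largest single-row quantity
SELECT SUM(SALEPRICE) AS TOTAL_REVENUE,
       MAX(QUANTITY)  AS MAX_QUANTITY
FROM PETSALE;

-- Average per-unit price for dogs
SELECT AVG(SALEPRICE / QUANTITY) AS AVG_UNIT_PRICE
FROM PETSALE
WHERE ANIMAL = 'Dog';
```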
String functions:
- `LENGTH(ANIMAL)` – length of each animal name
- `UCASE(ANIMAL)` / `LCASE(ANIMAL)` – upper/lowercase conversions
- `DISTINCT(UCASE(ANIMAL))` – unique animal types in uppercase
- Filter rows with case-insensitive conditions, e.g.:

  ```sql
  select * from PETSALE where LCASE(ANIMAL) = 'cat';
  ```
Date functions:
- `DAY(SALEDATE)` – day of the month (for a given animal)
- `COUNT(*)` where `MONTH(SALEDATE) = '05'` – sales in May
- `(SALEDATE + 3 DAYS)` – add three days to the sale date
- `(CURRENT DATE - SALEDATE)` – days since each sale
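Put together, the date queries have roughly this shape (Db2-flavoured syntax; the `'Cat'` filter value is illustrative):

```sql
-- Day of the month for cat sales (filter value assumed)
SELECT DAY(SALEDATE) FROM PETSALE WHERE ANIMAL = 'Cat';

-- Number of sales in May
SELECT COUNT(*) FROM PETSALE WHERE MONTH(SALEDATE) = 5;

-- Three days after each sale, and the time since each sale (as a Db2 date duration)
SELECT SALEDATE + 3 DAYS,
       CURRENT DATE - SALEDATE
FROM PETSALE;
```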
This pet shop mini-database shows:
- How to create and populate a small table quickly.
- How to use aggregate functions for basic reporting.
- How to work with string and date functions in SQL.
- How to test and understand expressions like date arithmetic and case‑insensitive matching.
It’s a nice “lab bench” for learning SQL without getting distracted by a complicated schema.
Across these four projects, I worked with:
- DDL (CREATE TABLE, primary keys, basic design)
- DML and queries (SELECT, INSERT, UPDATE, DELETE in the exercises)
- Joins across multiple related tables
- Aggregations and groupings for reporting
- Built‑in functions for numbers, strings, and dates
- Database connectivity from Jupyter, using the `sql` magic extension and Db2 drivers
The Chicago notebook shows how to wire up a real cloud database into a Python notebook, while the HR and Pet Sale projects show how to design schemas and write queries that match real‑world questions.
Yasir Savanur
These exercises are based on the “Databases and SQL for Data Science” course material, extended with my own notes and practice queries.