Lino Galiana • Data scientist, Insee (French national statistical institute)
Course given in two top French engineering schools (ENSAE and ENSAI) and available for self-paced learning.
This repository hosts the source files for Python for Data Science, my hands-on course designed to take students from first contact with
Python to practical data science workflows.
It is taught in the 2nd year (Master 1) of two French engineering schools:
- ENSAE since 2021
- ENSAI from 2026
The material combines explanations, examples, and exercises, with an emphasis on reproducibility and real-world datasets.
All chapters, with notes, examples, and exercises provided as Jupyter notebooks, are available at https://pythonds.linogaliana.fr/.
📌 License and attribution
This course is released under the Creative Commons CC BY-NC-SA license.
If you use this course material, please cite:
Galiana, Lino. 2025. Python pour la data science. https://doi.org/10.5281/zenodo.8229676
@book{galiana2025,
author = {Galiana, Lino},
title = {Python pour la data science},
date = {2025},
url = {https://pythonds.linogaliana.fr/},
doi = {10.5281/zenodo.8229676},
langid = {fr}
}
A few examples of figures produced during the course (click to open the course website):
This course is suitable for both beginners and advanced learners.
The syllabus below is fully clickable and collapsible.
1. Getting started: why Python for data science?
🔗 https://pythonds.linogaliana.fr/en/content/getting-started/
- Getting a functional Python environment for data science
- How to deal with a data set
- Python basics
2. Data wrangling
🔗 https://pythonds.linogaliana.fr/en/content/manipulation/
- Numpy, the foundation of data science
- Introduction to Pandas
- Data wrangling with Pandas
- Spatial data with GeoPandas
- Webscraping with Python
- Retrieving data with APIs
- Mastering regular expressions
- Importing data from Parquet and S3
3. Data visualisation and communication
🔗 https://pythonds.linogaliana.fr/en/content/visualisation/
- Building graphics with Python
- Introduction to cartography
4. Modeling
🔗 https://pythonds.linogaliana.fr/en/content/modelisation/
- Why preprocessing matters
- Evaluating model quality
- Introduction to classification
- Introduction to regression
- Feature selection
- Clustering
5. Natural Language Processing (NLP)
🔗 https://pythonds.linogaliana.fr/en/content/nlp/
- Cleaning and structuring texts
- Bag-of-words approach
- Text embeddings
The course content relies heavily on open data, including French datasets (from data.gouv and Insee) and American datasets.
Complementary course with Romain Avouac (@avouacr):
https://ensae-reproductibilite.github.io/website/
Tip
Run examples instantly on SSP Cloud or Google Colab. Here is an example for the Pandas chapter:
I welcome contributions!












