
linogaliana/python-datascientist


Data Science with Python



Lino Galiana • Data scientist, Insee (French national statistical institute)

Course given in two top French engineering schools (ENSAE and ENSAI) and available for self-paced learning.


Note

This is the English 🇬🇧🇺🇸 version of the README. To see the French 🇫🇷 version, click here:

fr


About

This repository hosts the source files for Python for Data Science, a hands-on course designed to take students from first contact with Python to practical data science workflows.

It is taught in the 2nd year (Master 1) of two French engineering schools:

  • ENSAE since 2021
  • ENSAI from 2026

The material combines explanations, examples, and exercises, with an emphasis on reproducibility and real-world datasets.

All chapters (notes, examples, and exercises, provided as Jupyter notebooks) can be found at https://pythonds.linogaliana.fr/.

📌 License and attribution

This course is released under the Creative Commons CC BY-NC-SA license.

If you use this course material, please cite:

Galiana, Lino. 2025. Python pour la data science. https://doi.org/10.5281/zenodo.8229676

@book{galiana2025,
  author = {Galiana, Lino},
  title = {Python pour la data science},
  date = {2025},
  url = {https://pythonds.linogaliana.fr/},
  doi = {10.5281/zenodo.8229676},
  langid = {fr}
}

🎨 Gallery

A few examples of figures produced during the course:

  • Top 50 French Velib stations
  • Correlation matrix
  • Forest map
  • Waffle chart

More examples

  • Population map
  • Top carbon emission cities
  • Leaflet map example
  • Pandas structure
  • Haute Garonne map
  • Spilhaus projected map example with Python
  • Bulbizarre scraped image
  • Velib time use


📖 Course content

This course is suitable for both beginners and advanced learners.


1. Getting started: why Python for data science?

🔗 https://pythonds.linogaliana.fr/en/content/getting-started/

  • Getting a functional Python environment for data science
  • How to deal with a data set
  • Python basics

2. Data wrangling

🔗 https://pythonds.linogaliana.fr/en/content/manipulation/

  • Numpy, the foundation of data science
  • Introduction to Pandas
  • Data wrangling with Pandas
  • Spatial data with GeoPandas
  • Webscraping with Python
  • Retrieving data with APIs
  • Mastering regular expressions
  • Importing data from Parquet and S3

3. Data visualisation and communication

🔗 https://pythonds.linogaliana.fr/en/content/visualisation/

  • Building graphics with Python
  • Introduction to cartography

4. Modeling

🔗 https://pythonds.linogaliana.fr/en/content/modelisation/

  • Why preprocessing matters
  • Evaluating model quality
  • Introduction to classification
  • Introduction to regression
  • Feature selection
  • Clustering

5. Natural Language Processing (NLP)

🔗 https://pythonds.linogaliana.fr/en/content/nlp/

  • Cleaning and structuring texts
  • Bag-of-words approach
  • Text embeddings

🔗 Resources

The course content relies heavily on open data, including French datasets (from data.gouv and Insee) and American datasets.

Complementary course with Romain Avouac (@avouacr):
https://ensae-reproductibilite.github.io/website/


🚀 Accessing the course in Jupyter Notebooks

Tip

Run examples instantly on SSP Cloud or Google Colab. Here are the launch options for the Pandas chapter, as an example:

  • SSP Cloud (VSCode)
  • SSP Cloud (Jupyter)
  • Open in Colab


🤝 Contributing

I welcome contributions!

Note

See the guide for contributors:

CONTRIBUTING.md
