Data analysis of the BUAP university transport system (Route CU-CU2) to optimize transit times and student mobility using Python & ML.

Dannap7337/University-Transport-Optimization

🚌 University Transport Optimization Analysis (STU BUAP)

📋 Project Overview

The University Transport System (STU) connects the main campus (CU) with the new engineering campus (CU2). This project analyzes user data to identify saturation patterns, optimize transit schedules, and reduce the environmental impact of inefficient trips.

Authors: Danna Patricia Riveroll Martínez & Josué Salvador Marín Nieva.

🎯 Objectives

  • Optimize Logistics: Reduce wait times and improve bus scheduling efficiency.
  • User Experience: Analyze student satisfaction and overcrowding (standing frequency).
  • Sustainability: Minimize the carbon footprint by maximizing unit capacity per trip.

🛠️ Tech Stack & Tools

  • Language: Python 3.x
  • Data Processing: Pandas, NumPy (ETL, Regex cleaning).
  • Machine Learning: Scikit-Learn (Decision Tree Classifier).
  • Visualization: Matplotlib, PowerBI (Dashboards).

📊 Methodology

1. ETL & Data Cleaning

  • Dataset: ~3,000 survey records after cleaning (from ~3,400 raw entries).
  • Techniques:
    • Handling missing values via mean/mode imputation.
    • Removing duplicates and standardizing time formats using Regex.
    • Outlier analysis using boxplots (preserved for realistic representation of wait times).
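The cleaning steps above could look roughly like the following sketch. The column names (`departure_time`, `wait_minutes`, `satisfaction`), the sample rows, and the time pattern are assumptions made for illustration; the actual survey columns are in Spanish and may be structured differently.

```python
import re

import pandas as pd

# Hypothetical sample rows; not taken from the real survey.
raw = pd.DataFrame({
    "departure_time": ["7:5", "07:05 am", "1:30 PM", "13.30", "6:45", "7:5"],
    "wait_minutes": [20.0, None, 45.0, 30.0, 15.0, 20.0],
    "satisfaction": [2, 2, None, 3, 2, 2],
})

# Accepts forms such as "7:5", "07:05 am", "1:30 PM", "13.30" (assumed variants).
TIME_RE = re.compile(
    r"^\s*(\d{1,2})[:.h](\d{1,2})\s*([ap])?\.?m?\.?\s*$", re.IGNORECASE
)


def standardize_time(value):
    """Return the time as 24-hour 'HH:MM', or None if it cannot be parsed."""
    if not isinstance(value, str):
        return None
    match = TIME_RE.match(value)
    if match is None:
        return None
    hour, minute, meridiem = int(match.group(1)), int(match.group(2)), match.group(3)
    if meridiem:
        # 12-hour clock: 12 AM -> 00, 12 PM -> 12, 1 PM -> 13, ...
        hour = hour % 12 + (12 if meridiem.lower() == "p" else 0)
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"


def clean(df):
    out = df.copy()
    # Standardize first so that "7:5" and "07:05" count as the same value.
    out["departure_time"] = out["departure_time"].map(standardize_time)
    out = out.drop_duplicates().reset_index(drop=True)
    # Mean imputation for the numeric column, mode imputation for the rating.
    out["wait_minutes"] = out["wait_minutes"].fillna(out["wait_minutes"].mean())
    out["satisfaction"] = out["satisfaction"].fillna(out["satisfaction"].mode().iloc[0])
    return out


cleaned = clean(raw)
print(cleaned)
```

Times are standardized before duplicates are dropped so that the same answer written in two formats is not counted twice. Outliers are left untouched here, consistent with the decision to preserve them.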

2. Exploratory Data Analysis (EDA)

Key insights derived from the data:

  • Peak Hours: Identified high demand windows at 6:00-8:00 AM and 1:00-3:00 PM.
  • Wait Times: Most users wait between 15 and 60 minutes.
  • Satisfaction: Predominantly low (Level 2/5), correlated with overcrowding.
  • Correlation: Found a weak positive correlation (0.22) between wait times and the likelihood of standing during the trip.
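As a minimal sketch of how the peak-hour and correlation figures might be computed with Pandas: the rows, the column names, and the intermediate labels ("Sometimes", "Often") are invented for illustration, so the numbers it prints will not match the reported results.

```python
import pandas as pd

# Hypothetical cleaned records; not taken from the real survey.
survey = pd.DataFrame({
    "departure_time": ["06:30", "07:10", "07:45", "13:20",
                       "14:05", "10:00", "07:15", "13:50"],
    "wait_minutes": [30, 45, 60, 20, 40, 15, 50, 25],
    "standing": ["Often", "Always", "Always", "Sometimes",
                 "Often", "Never", "Sometimes", "Often"],
})

# Assumed ordinal encoding of the standing-frequency answers.
STANDING_ORDER = {"Never": 0, "Sometimes": 1, "Often": 2, "Always": 3}
survey["standing_level"] = survey["standing"].map(STANDING_ORDER)

# Peak hours: count responses per departure hour, busiest first.
survey["hour"] = survey["departure_time"].str.slice(0, 2).astype(int)
trips_per_hour = survey.groupby("hour").size().sort_values(ascending=False)

# Pearson correlation between wait time and standing level.
corr = survey["wait_minutes"].corr(survey["standing_level"])

print(trips_per_hour.head(3))
print(f"wait vs. standing correlation: {corr:.2f}")
```

Because standing frequency is ordinal, a rank-based measure such as `corr(..., method="spearman")` may be a reasonable alternative to Pearson.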

3. Machine Learning Model

  • Algorithm: Decision Tree Classifier.
  • Target: Predicting "Standing Frequency" (Saturation level).
  • Results: The model reached 51% accuracy. It predicted the extreme classes (Always Standing, Never Standing) well but struggled with intermediate states, which suggests that richer features are needed.
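The following sketch shows one way a Decision Tree Classifier for standing frequency could be set up with Scikit-Learn. The features (`wait_minutes`, `hour`), the synthetic labelling rule, and the hyperparameters are all assumptions for illustration. It does not reproduce the project's model or its 51% accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 400

# Synthetic features (hypothetical stand-ins for the survey columns).
wait_minutes = rng.integers(5, 90, size=n)
hour = rng.integers(6, 21, size=n)

# Synthetic target: 0 = never standing ... 3 = always standing.
# Longer waits and peak hours push the level up; noise blurs the classes.
score = (
    wait_minutes / 30
    + np.isin(hour, [6, 7, 13, 14]).astype(float)
    + rng.normal(0, 0.7, size=n)
)
standing_level = np.clip(np.round(score), 0, 3).astype(int)

X = np.column_stack([wait_minutes, hour])
y = standing_level

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# A shallow tree keeps the model interpretable and limits overfitting.
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.2f}")
# Per-class precision and recall show how each standing level is predicted.
print(classification_report(y_test, y_pred, zero_division=0))
```

The per-class report is the useful part when overall accuracy is modest: it shows whether errors concentrate in the intermediate levels, as described in the results above.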

💡 Proposed Solution

Based on the data analysis, we propose a staggered logistical schedule:

  • Coordination: Buses departing every 15/30 minutes synchronized between CU and CU2.
  • Efficiency: Ensuring units do not return empty ("deadheading") by aligning departures with arrival peaks, reducing unnecessary emissions.
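One possible reading of this proposal is sketched below: 15-minute headways during the peak windows found in the EDA, 30-minute headways otherwise, and return departures from CU2 timed to the arrival of each unit. The 25-minute trip duration, the immediate turnaround, and the peak/off-peak split of the 15/30-minute intervals are assumptions, not figures from the study.

```python
from datetime import datetime, timedelta

PEAK_WINDOWS = [(6, 8), (13, 15)]        # peak hours reported in the EDA
PEAK_HEADWAY = timedelta(minutes=15)     # assumed peak interval
OFFPEAK_HEADWAY = timedelta(minutes=30)  # assumed off-peak interval
TRIP_DURATION = timedelta(minutes=25)    # hypothetical CU -> CU2 travel time


def is_peak(moment):
    """True if the moment falls inside one of the peak windows."""
    return any(start <= moment.hour < end for start, end in PEAK_WINDOWS)


def departures(start, end):
    """List departure times in [start, end), using the headway of each slot."""
    times = []
    current = start
    while current < end:
        times.append(current)
        current += PEAK_HEADWAY if is_peak(current) else OFFPEAK_HEADWAY
    return times


day = datetime(2024, 1, 1)  # arbitrary date, only the clock time matters
cu_departures = departures(day.replace(hour=6), day.replace(hour=10))

# Each unit leaves CU2 as soon as it arrives, so the return leg is scheduled
# rather than run empty (assumes zero turnaround time).
cu2_departures = [t + TRIP_DURATION for t in cu_departures]

for out, back in zip(cu_departures, cu2_departures):
    print(f"CU {out:%H:%M}  ->  CU2 return {back:%H:%M}")
```

A real schedule would also need fleet size, turnaround time, and demand in the CU2 to CU direction, none of which are modeled here.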

Note: The source code, datasets, and detailed PDF reports contained in this repository are in Spanish.