The University Transport System (STU) connects the main campus (CU) with the new engineering campus (CU2). This project analyzes user data to identify saturation patterns, optimize transit schedules, and reduce the environmental impact of inefficient trips.
Authors: Danna Patricia Riveroll Martínez & Josué Salvador Marín Nieva.
- Optimize Logistics: Reduce wait times and improve bus scheduling efficiency.
- User Experience: Analyze student satisfaction and overcrowding (standing frequency).
- Sustainability: Minimize the carbon footprint by maximizing unit capacity per trip.
- Language: Python 3.x
- Data Processing: pandas, NumPy (ETL, regex-based cleaning).
- Machine Learning: scikit-learn (Decision Tree Classifier).
- Visualization: Matplotlib, Power BI (Dashboards).
- Dataset: ~3,000 survey records after cleaning (from ~3,400 raw entries).
- Techniques:
  - Missing values handled with mean imputation (numeric fields) and mode imputation (categorical fields).
  - Duplicates removed and time formats standardized with regular expressions (regex).
  - Outliers in wait times inspected with boxplots but retained, so the data keeps a realistic picture of wait times.
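The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration under assumptions, not the project's actual code (which is in Spanish): the column names `wait_minutes`, `campus`, and `departure_time`, the helpers `normalize_time`, `clean_survey`, and `flag_outliers_iqr`, the accepted time formats, and the toy rows are all hypothetical. Outliers are only flagged with the 1.5 × IQR rule that a standard boxplot uses for its whiskers, so they stay in the data.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical column names; the real survey headers are in Spanish.
NUMERIC_COLS = ["wait_minutes"]
CATEGORICAL_COLS = ["campus"]
TIME_COL = "departure_time"

# Accepts e.g. "7:30", "07.30", "13:15", "1:15 PM", "7 a.m." (lowercased before matching).
TIME_PATTERN = re.compile(r"^(\d{1,2})(?:[:.](\d{2}))?\s*([ap])?\.?\s*m?\.?$")


def normalize_time(raw):
    """Return the time as 'HH:MM' (24-hour), or None if it cannot be parsed."""
    if not isinstance(raw, str):
        return None
    match = TIME_PATTERN.match(raw.strip().lower())
    if match is None:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    meridiem = match.group(3)
    if meridiem == "p" and hour < 12:
        hour += 12
    elif meridiem == "a" and hour == 12:
        hour = 0
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"


def clean_survey(df):
    out = df.copy()
    # Standardize times first so duplicates written in different formats collapse.
    out[TIME_COL] = out[TIME_COL].map(normalize_time)
    # Drop duplicates before imputing, so filled-in values cannot create new duplicates.
    out = out.drop_duplicates().reset_index(drop=True)
    for col in NUMERIC_COLS:
        out[col] = out[col].fillna(out[col].mean())  # mean imputation
    for col in CATEGORICAL_COLS:
        out[col] = out[col].fillna(out[col].mode().iloc[0])  # mode imputation
    return out


def flag_outliers_iqr(series, k=1.5):
    """Flag values beyond the boxplot whiskers (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy rows for illustration only (not survey data).
raw = pd.DataFrame({
    "wait_minutes": [20.0, np.nan, 45.0, 45.0, 300.0],
    "campus": ["CU", "CU2", None, None, "CU"],
    "departure_time": ["7:30", "1:15 PM", "7 a.m.", "7 a.m.", "13:15"],
})
cleaned = clean_survey(raw)
outlier_mask = flag_outliers_iqr(cleaned["wait_minutes"])  # flagged, not dropped
```

Deduplicating before imputation is a design choice here: imputing first could make two different incomplete rows look identical and silently merge them.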
Key insights derived from the data:
- Peak Hours: High-demand windows at 6:00-8:00 AM and 1:00-3:00 PM.
- Wait Times: Most users wait between 15 and 60 minutes.
- Satisfaction: Predominantly low (level 2 out of 5), correlated with overcrowding.
- Correlation: A weak positive correlation (0.22) between wait time and the likelihood of standing during the trip.
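The peak-hour and correlation figures above could be computed along these lines. The column names, the five ordinal standing labels and their 0-4 encoding, the helper names, and the toy rows are hypothetical. The README does not say which correlation coefficient produced 0.22, so `Series.corr` is shown with its default (Pearson); a rank-based coefficient such as Spearman's (`method="spearman"`) may suit ordinal labels better.

```python
import pandas as pd

# Hypothetical ordinal labels for "standing frequency"; the survey's own wording differs.
STANDING_ORDER = ["Never", "Rarely", "Sometimes", "Often", "Always"]


def hourly_demand(times):
    """Count trips per hour from 'HH:MM' strings."""
    hours = pd.to_datetime(pd.Series(times), format="%H:%M").dt.hour
    return hours.value_counts().sort_index()


def peak_hours(times, top=4):
    """Return the `top` busiest hours, in chronological order."""
    return sorted(hourly_demand(times).nlargest(top).index.tolist())


def wait_standing_corr(df, method="pearson"):
    """Correlate wait time with standing frequency encoded as 0 (Never) .. 4 (Always)."""
    codes = df["standing_frequency"].map({label: i for i, label in enumerate(STANDING_ORDER)})
    return df["wait_minutes"].corr(codes, method=method)


# Toy rows for illustration only (not survey data).
times = ["06:10", "06:40", "07:05", "07:30", "07:55", "10:00",
         "13:05", "13:40", "14:20", "14:50", "17:00"]
ratings = pd.DataFrame({
    "wait_minutes": [10, 15, 30, 45, 60, 90],
    "standing_frequency": ["Never", "Never", "Sometimes", "Rarely", "Often", "Always"],
})

print(peak_hours(times))               # busiest hours of the toy sample
print(wait_standing_corr(ratings))     # Pearson r on the toy sample
```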
- Algorithm: Decision Tree Classifier.
- Target: Predicting "Standing Frequency" (Saturation level).
- Results: The model reached 51% accuracy. It predicted the extreme classes (Always Standing vs. Never Standing) well but struggled with the intermediate states, indicating that richer features are needed to separate them.
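A minimal scikit-learn sketch of the classifier described above follows. The feature columns (`wait_minutes`, `hour`, `satisfaction`), `max_depth=4`, the 75/25 stratified split, the helper `train_standing_model`, and the three-level target are assumptions for illustration, and the data is synthetic, so it will not reproduce the reported 51%. The per-class output of `classification_report` is the part that shows whether extreme classes are predicted better than intermediate ones.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature and target names; the project's actual columns may differ.
FEATURES = ["wait_minutes", "hour", "satisfaction"]
TARGET = "standing_frequency"


def train_standing_model(df, max_depth=4, seed=0):
    """Fit a decision tree and report overall and per-class test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=0.25, random_state=seed, stratify=df[TARGET]
    )
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Per-class precision/recall shows whether extremes beat intermediate levels.
    per_class = classification_report(y_test, y_pred, output_dict=True, zero_division=0)
    return model, accuracy, per_class


# Synthetic data for illustration only; it does not reproduce the survey or the 51% result.
rng = np.random.default_rng(42)
n = 400
wait = rng.uniform(5, 90, n)
score = wait / 30 + rng.normal(0, 0.8, n)  # toy rule: longer waits -> more standing
data = pd.DataFrame({
    "wait_minutes": wait,
    "hour": rng.integers(6, 20, n),
    "satisfaction": rng.integers(1, 6, n),
    "standing_frequency": np.array(["Never", "Sometimes", "Always"])[np.digitize(score, [1.0, 2.0])],
})
model, accuracy, per_class = train_standing_model(data)
```

Capping the depth keeps the tree interpretable and limits overfitting; stratifying the split keeps rare intermediate classes represented in the test set.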
Based on the data analysis, we propose a staggered logistical schedule:
- Coordination: Buses departing at 15- or 30-minute intervals, synchronized between CU and CU2.
- Efficiency: Align departures with arrival peaks so units do not return empty ("deadheading"), reducing unnecessary emissions.
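One way to turn the proposal into a concrete timetable is sketched below. It assumes, which the analysis does not specify, that 15-minute headways apply inside the peak windows found above and 30-minute headways elsewhere, that service runs 06:00 to 20:00, and that a CU to CU2 trip takes 20 minutes (`travel_minutes`, hypothetical). Each CU2 departure is set to the arrival of the bus from CU, so the same unit turns around with passengers instead of deadheading back. All names are illustrative.

```python
# Peak windows from the analysis (6:00-8:00 AM, 1:00-3:00 PM), in minutes after midnight.
PEAK_WINDOWS = [(6 * 60, 8 * 60), (13 * 60, 15 * 60)]
PEAK_HEADWAY, OFF_PEAK_HEADWAY = 15, 30  # assumed split of the proposed 15/30-minute intervals


def headway(minute):
    """Minutes until the next departure, given the current departure time."""
    in_peak = any(start <= minute < end for start, end in PEAK_WINDOWS)
    return PEAK_HEADWAY if in_peak else OFF_PEAK_HEADWAY


def fmt(minute):
    """Format minutes after midnight as 'HH:MM'."""
    return f"{minute // 60:02d}:{minute % 60:02d}"


def build_timetable(first=6 * 60, last=20 * 60, travel_minutes=20):
    """Pair each CU departure with a CU2 departure timed to the same bus's arrival.

    travel_minutes is a hypothetical CU -> CU2 trip time.
    """
    trips = []
    t = first
    while t <= last:
        trips.append((fmt(t), fmt(t + travel_minutes)))
        t += headway(t)
    return trips


timetable = build_timetable()
for cu_departure, cu2_departure in timetable[:3]:
    print(f"CU {cu_departure} -> CU2 {cu2_departure}")
```

Under these assumptions the first lines printed are `CU 06:00 -> CU2 06:20`, `CU 06:15 -> CU2 06:35`, and `CU 06:30 -> CU2 06:50`; real headways and travel times would come from the survey data and route measurements.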
Note: The source code, datasets, and detailed PDF reports contained in this repository are in Spanish.