Machine learning-based wind power prediction using XGBoost regression with meteorological data and engineered features.
This project implements a wind power forecasting system using XGBoost (Extreme Gradient Boosting) to predict normalized wind turbine power output from wind speed and direction measurements at multiple heights. Through careful feature engineering and multi-height wind data integration, the model reaches a test R2 of roughly 0.89 on held-out data (see the results below).
- XGBoost Regression: Gradient boosting for accurate non-linear predictions
- Multi-Height Wind Data: Utilizes measurements at 10m, 50m, and 100m
- Feature Engineering: Rolling averages, wind angles, and power law extrapolation
- Temporal Features: Hour and month encoding for seasonal patterns
- RMSE Optimization: Model tuned to minimize prediction error
- Raw Measurements: U10, V10, U100, V100 (wind components)
- Derived Features:
- Absolute wind speed
- Wind angle/direction
- 50m height estimates (power law)
- Temporal Aggregations:
- 6-hour rolling average (recent trend)
- 24-hour rolling average (daily pattern)
- 30-day rolling average (seasonal baseline)
- 5-hour centered average (smoothing)
- Time Features: Hour of day, month (see the sketch below)
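The hour and month features can be parsed straight from the TIMESTAMP column (format `YYYYMMDD HH:MM`, as described in the data section). A minimal sketch; `add_time_features` is an illustrative helper name, not necessarily part of the project's `transform_df`:

```python
import pandas as pd

def add_time_features(df):
    """Add hour-of-day and month columns parsed from the TIMESTAMP column."""
    df = df.copy()
    ts = pd.to_datetime(df['TIMESTAMP'], format='%Y%m%d %H:%M')
    df['hour'] = ts.dt.hour
    df['month'] = ts.dt.month
    return df
```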
n_estimators=1000
learning_rate=0.01
max_depth=4
reg_lambda=15 (L2 regularization)
reg_alpha=0.001 (L1 regularization)
early_stopping_rounds=50

Extrapolates wind speed to turbine hub height (50m):
V(h) = V(10m) × (h / 10)^α
where α = 0.11 (typical for open terrain)
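A minimal sketch of this extrapolation (the helper name and default arguments are illustrative; α = 0.11 as stated above):

```python
ALPHA = 0.11  # power law exponent, typical for open terrain

def extrapolate_wind_speed(v10, target_height=50.0, ref_height=10.0, alpha=ALPHA):
    """Scale a 10 m wind speed to target_height using V(h) = V(10m) * (h / 10)^alpha."""
    return v10 * (target_height / ref_height) ** alpha

# e.g. extrapolate_wind_speed(6.0) ≈ 7.2 m/s at 50 m (6.0 * 5**0.11)
```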
Absolute Speed = sqrt(u² + v²)
Wind Angle = arcsin(v / speed)
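Expressed with NumPy, these become (a sketch following the formulas above; the original code may differ, and `arctan2(v, u)` is a common alternative for the angle):

```python
import numpy as np

def wind_speed_and_angle(u, v):
    """Return absolute wind speed and wind angle from u/v wind components."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)                        # sqrt(u^2 + v^2)
    safe_speed = np.where(speed > 0, speed, 1.0)  # avoid division by zero in calm conditions
    angle = np.arcsin(np.where(speed > 0, v / safe_speed, 0.0))
    return speed, angle
```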
Rolling averages capture temporal patterns at multiple scales (see the pandas sketch after this list):
- 6-hour: Recent weather trends
- 24-hour: Daily cycles
- 30-day: Seasonal variations
- 5-hour centered: Local smoothing
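One way to build these aggregations with pandas, assuming hourly rows sorted by TIMESTAMP; the column name `speed100` is an assumption, not taken from the project code:

```python
def add_rolling_features(df, col='speed100'):
    """Append the rolling-mean features described above (hourly resolution assumed)."""
    df = df.copy()
    s = df[col]
    df[f'{col}_roll6h'] = s.rolling(6, min_periods=1).mean()                        # recent trend
    df[f'{col}_roll24h'] = s.rolling(24, min_periods=1).mean()                      # daily pattern
    df[f'{col}_roll30d'] = s.rolling(24 * 30, min_periods=1).mean()                 # seasonal baseline
    df[f'{col}_roll5h_centered'] = s.rolling(5, center=True, min_periods=1).mean()  # local smoothing
    return df
```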
- TIMESTAMP: Date and time (YYYYMMDD HH:MM)
- U10, V10: Wind components at 10m (m/s)
- U100, V100: Wind components at 100m (m/s)
- POWER: Normalized power output (0-1)
- Same format as training data (without POWER column)
- Covers the forecast horizon (up to 672 hours, i.e. 28 days, in this project)
python>=3.7
pandas>=1.3.0
numpy>=1.21.0
scikit-learn>=1.0.0
xgboost>=1.6.0
pip install pandas numpy scikit-learn xgboost

import pandas as pd
from sklearn.model_selection import train_test_split
import xgboost as xgb
# Load and transform the data; transform_df applies the feature engineering described above
train_data = pd.read_csv('TrainData_A.csv')
train_data_prepared = transform_df(train_data)
# Split features and target
target = train_data_prepared['POWER']
params = train_data_prepared.drop('POWER', axis=1)
# Train/test split
X_train, X_test, Y_train, Y_test = train_test_split(
    params, target, test_size=0.1, random_state=42
)
# Train XGBoost model
xgb_model = xgb.XGBRegressor(
    n_estimators=1000,
    learning_rate=0.01,
    max_depth=4,
    reg_lambda=15,
    reg_alpha=0.001,
    early_stopping_rounds=50  # constructor-level early stopping requires xgboost >= 1.6
)
xgb_model.fit(X_train, Y_train,
              eval_set=[(X_train, Y_train), (X_test, Y_test)],
              verbose=100)

# Load forecast data
test_data = pd.read_csv('WeatherForecastInput_A.csv')
test_data_prepared = transform_df(test_data)
# Generate predictions
predictions = xgb_model.predict(test_data_prepared)
# Save results
result = pd.DataFrame({
    'TIMESTAMP': test_data['TIMESTAMP'],
    'FORECAST': predictions
})
result.to_csv('forecast_results.csv', index=False)

The model has been trained, tested, and evaluated successfully.
Training: 26,590 samples, 62 features
Test R2 Score: 0.8850
Forecast R2 Score: 0.9003
Generated 672 hourly predictions
Model saved to wind_power_model.pkl
| Metric | Value |
|---|---|
| Test R2 Score | 0.885 |
| Forecast R2 Score | 0.900 |
| Final Train RMSE | 0.1057 |
| Final Val RMSE | 0.1070 |
| Features | 62 |
| Training samples | 26,590 |
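The saved `wind_power_model.pkl` can be reloaded for later inference without retraining; a minimal sketch using joblib (included in the quick-start install below; the project's exact persistence code is not shown here):

```python
import joblib

# Persist the fitted model (corresponds to the wind_power_model.pkl artifact above).
joblib.dump(xgb_model, 'wind_power_model.pkl')

# Reload it later and predict on prepared forecast features.
loaded_model = joblib.load('wind_power_model.pkl')
predictions = loaded_model.predict(test_data_prepared)
```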
pip install pandas numpy scikit-learn xgboost joblib
python wind_power_forecast.py

The model shows:
- Rapid initial improvement (first 100 iterations)
- Steady convergence after 300 iterations
- No significant overfitting (train/val gap < 2%)
Top predictive features (typical):
- Absolute wind speed at 100m
- Rolling averages of wind speed
- Wind angle
- Hour of day
- Month (seasonal patterns)
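Importances can be read directly from the fitted estimator; a short sketch (exact feature names depend on `transform_df`):

```python
import pandas as pd

importances = pd.Series(
    xgb_model.feature_importances_,
    index=X_train.columns,
).sort_values(ascending=False)

print(importances.head(10))  # ten most important engineered features
```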
- Energy Trading: Day-ahead market bidding
- Grid Operations: Balancing and dispatch planning
- Renewable Integration: Managing variability
- Financial Planning: Revenue forecasting
- Maintenance Scheduling: Optimal downtime planning
- Non-linear Relationships: Captures power curve behavior
- Robust to Outliers: Tree-based method handles extreme values
- Fast Inference: Real-time predictions
- Interpretable: Feature importance analysis available
- Regularization: Prevents overfitting on noisy data
- Ensemble with other models (LSTM, Random Forest)
- Weather regime classification
- Turbine-specific power curves
- Uncertainty quantification
- Online learning for model updates
- Spatial features (nearby turbines)
- Training Period: Historical wind farm data
- Temporal Resolution: 1 hour
- Forecast Horizon: 1-672 hours (up to 28 days)
- Output Range: [0, 1] (normalized power)
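Because power is normalized to [0, 1], raw regression outputs can be clipped to that range before export (an optional post-processing step, not necessarily part of the project code):

```python
import numpy as np

# Keep predictions inside the physically meaningful normalized range.
predictions = np.clip(xgb_model.predict(test_data_prepared), 0.0, 1.0)
```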
Validation strategy:
- 90/10 train/test split
- Time-series aware splitting (no future leakage)
- Early stopping to prevent overfitting
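A chronological hold-out (no shuffling) is one way to keep the "no future leakage" property; a minimal sketch, assuming the rows of `params`/`target` are sorted by TIMESTAMP (note that the training example above uses `train_test_split` with a fixed random seed instead):

```python
# Hold out the most recent 10% of the time-ordered data as the validation set.
split_idx = int(len(params) * 0.9)
X_train, X_test = params.iloc[:split_idx], params.iloc[split_idx:]
Y_train, Y_test = target.iloc[:split_idx], target.iloc[split_idx:]
```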
This project is available for educational and research purposes.