Satellite Fleet Health Management System
Real-time ISS telemetry monitoring with ML-powered diagnostics
CONSTELLATION: Continuous Operations Network for Satellite Telemetry Evaluation, Life-cycle Analysis, Tracking, Intelligence, Operations, and Notification
Mission: Provide a production-grade satellite fleet health management platform that uses real ISS telemetry to demonstrate predictive maintenance, anomaly detection, and operational decision support for aerospace and defense applications.
CONSTELLATION monitors the International Space Station's attitude control and communications subsystems using real-time telemetry from NASA's public Lightstreamer feed. The system provides:
Real-Time Telemetry Processor
- AWS Lambda function subscribing to NASA Lightstreamer
- Filters for attitude control and communications subsystems
- Writes to DynamoDB for real-time access
- Archives to S3 for historical analysis
Parameters Monitored:
Attitude Control (Reaction Wheels & CMGs):
- USLAB000084: Reaction Wheel Assembly (RWA) speed
- USLAB000085: RWA bearing temperature
- USLAB000086: RWA current draw
- USLAB000087: CMG (Control Moment Gyroscope) momentum
- Attitude quaternions (pitch, roll, yaw)
- Rate gyro outputs
Communications:
- S-band transponder power levels
- Ku-band signal strength
- Antenna pointing accuracy
- Data throughput metrics
- Ground station contact windows
- Communication link quality indicators
Cross-Cutting:
- Power system voltage/current (affects both subsystems)
- Thermal readings (reaction wheel bearings, transmitter temps)
- Time-on-orbit (cumulative degradation tracking)
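The filtering step of the telemetry processor could be sketched roughly as below. This is a minimal illustration, not the project's actual Lambda code: it assumes each update arrives as a dict with `id`, `value`, and `timestamp` keys (a hypothetical record shape, not the Lightstreamer wire format), and `MONITORED` and `filter_updates` are illustrative names. Only the four attitude control IDs listed above are included; the DynamoDB and S3 writes are left out.

```python
from typing import Iterable, Iterator

# Parameter IDs taken from the attitude control list above; the subsystem
# label is an illustrative grouping, not an official NASA category.
MONITORED = {
    "USLAB000084": "attitude_control",  # RWA speed
    "USLAB000085": "attitude_control",  # RWA bearing temperature
    "USLAB000086": "attitude_control",  # RWA current draw
    "USLAB000087": "attitude_control",  # CMG momentum
}


def filter_updates(updates: Iterable[dict]) -> Iterator[dict]:
    """Yield only monitored updates, tagged with their subsystem.

    Assumes each update is a dict with "id", "value" and "timestamp" keys
    (a hypothetical record shape).
    """
    for update in updates:
        subsystem = MONITORED.get(update.get("id"))
        if subsystem is None:
            continue  # not a parameter this system tracks
        try:
            value = float(update["value"])
        except (KeyError, TypeError, ValueError):
            continue  # drop malformed readings instead of failing the batch
        yield {
            "id": update["id"],
            "subsystem": subsystem,
            "value": value,
            "timestamp": update.get("timestamp"),
        }
```

Dropping malformed readings silently keeps the sketch short; a real pipeline would probably count or log them so that feed problems remain visible.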
Time Series Features:
- Rolling statistics (mean, std, min, max) over multiple windows (1hr, 6hr, 24hr, 7day)
- Rate of change calculations
- Autocorrelation features
- Fourier transform for periodic patterns
- Lag features (t-1, t-6, t-24 for hourly data)
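As a rough illustration of the rolling statistics, rate of change, and lag features, the sketch below uses only the standard library. It assumes evenly spaced samples (for example hourly readings), so a window is expressed as a sample count; `rolling_features` is a hypothetical helper name, and a real pipeline would more likely use a dataframe library.

```python
from statistics import mean, pstdev


def rolling_features(values, window):
    """Return one feature dict per sample, using a trailing window.

    `values` is assumed to be evenly spaced (e.g. hourly) readings, so
    `window` is a sample count rather than a duration.
    """
    features = []
    for i, current in enumerate(values):
        # Trailing window ending at the current sample (shorter at the start).
        recent = values[max(0, i - window + 1): i + 1]
        features.append({
            "mean": mean(recent),
            "std": pstdev(recent),  # population std; 0.0 for a single sample
            "min": min(recent),
            "max": max(recent),
            # Rate of change and lag-1 are undefined for the first sample.
            "rate_of_change": current - values[i - 1] if i > 0 else None,
            "lag_1": values[i - 1] if i > 0 else None,
        })
    return features
```

For hourly data, calling this with `window=6` or `window=24` would correspond to the 6hr and 24hr windows listed above.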
Domain-Specific Features:
- Reaction wheel friction coefficient (derived from speed vs. current)
- Thermal cycling count (number of orbital day/night transitions)
- Momentum accumulation rate
- Communication link budget margin
- Signal degradation trends
- Anomaly persistence scores
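One possible way to derive a friction indicator from speed and current is a least-squares slope, sketched below. The physical assumption (that steady-state motor current scales with the torque needed to overcome friction) and the name `friction_proxy` are illustrative; this is a trend indicator under those assumptions, not a calibrated friction coefficient.

```python
def friction_proxy(speeds, currents):
    """Least-squares slope of current against wheel speed.

    Assumption: at steady state, motor current scales with the torque needed
    to overcome friction, so a rising slope suggests growing viscous drag.
    """
    n = len(speeds)
    if n != len(currents) or n < 2:
        raise ValueError("need two or more paired samples")
    mean_s = sum(speeds) / n
    mean_c = sum(currents) / n
    var_s = sum((s - mean_s) ** 2 for s in speeds)
    if var_s == 0:
        raise ValueError("speed must vary to estimate a slope")
    cov = sum((s - mean_s) * (c - mean_c) for s, c in zip(speeds, currents))
    return cov / var_s
```

Computing this slope over successive windows and tracking how it drifts is one way such a feature could feed the degradation models.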
Engineering Calculations:
- Power efficiency ratios
- Thermal dissipation rates
- Bearing wear indicators
- Transmitter efficiency
- Pointing error accumulation
Model 1: Anomaly Detection (Isolation Forest + LSTM Autoencoder)
Purpose: Real-time detection of unusual telemetry patterns
Approach:
- Isolation Forest for fast, lightweight anomaly flagging
- LSTM Autoencoder for complex temporal anomaly detection
- Ensemble voting for final anomaly score
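The ensemble step could take several forms. The sketch below shows weighted soft voting, which is only one option and not necessarily the scheme used here. It assumes each model's output has already been normalized to an anomaly score in [0, 1]; the function names, weights, and threshold are hypothetical.

```python
def ensemble_anomaly_score(scores, weights):
    """Weighted average of per-model anomaly scores, each assumed in [0, 1]."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same models")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[m] * weights[m] for m in scores) / total


def is_anomaly(scores, weights, threshold=0.5):
    """Flag an anomaly when the combined score reaches the threshold."""
    return ensemble_anomaly_score(scores, weights) >= threshold
```

Weighting the LSTM autoencoder more heavily than the Isolation Forest, or the reverse, trades sensitivity to temporal patterns against speed of response; the right balance would have to come from validation data.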
Training Data:
- Nominal operational periods (confirmed healthy operation)
- Labeled anomalies from NASA incident reports
Metrics: Precision, Recall, F1-Score, False Positive Rate
Model 2: Degradation Forecasting (Temporal Fusion Transformer)
Purpose: Predict subsystem performance degradation over time
Targets:
- Reaction wheel bearing temperature trend
- Solar panel output decline
- Battery capacity fade
- Communication signal strength degradation
Features: Time series telemetry + orbital mechanics (radiation exposure, thermal cycling)
Output: Forecasted parameter values with confidence intervals (7, 30, 90 days ahead)
Model 3: Survival Analysis (Cox Proportional Hazards)
Purpose: Estimate time-to-failure for critical components
Approach:
- Cox model for component-level survival curves
- Censored data handling for components still operational
- Hazard ratios for risk factors (high temps, usage patterns)
Output: Probability of failure within time windows (30d, 60d, 90d, 180d)
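For reference, the standard Cox proportional hazards relations that connect hazard ratios to windowed failure probabilities are given below. Here $x$ is a component's covariate vector (for example temperature and usage features), $\beta$ the fitted coefficients, and $h_0$, $S_0$ the baseline hazard and baseline survival function; the symbol names are chosen for illustration.

$$
h(t \mid x) = h_0(t)\,\exp(\beta^\top x), \qquad
S(t \mid x) = S_0(t)^{\exp(\beta^\top x)}, \qquad
P(T \le t \mid x) = 1 - S(t \mid x)
$$

The hazard ratio for a one-unit increase in covariate $x_j$ is $\exp(\beta_j)$, and evaluating $P(T \le t \mid x)$ at $t = 30, 60, 90, 180$ days gives the windowed failure probabilities.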
Model 4: Fault Classification (XGBoost)
Purpose: Diagnose root cause when anomalies occur
Classes:
- Thermal stress
- Mechanical wear (bearings, gimbals)
- Electrical fault
- Software/command error
- External disturbance (debris impact, space weather)
- Normal operational variation
Features: Anomaly signatures, subsystem interactions, environmental context
Output: Ranked list of probable causes with confidence scores
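Turning classifier output into a ranked list is a small step, sketched below. It assumes the classifier already yields one probability per class (as a `predict_proba`-style call would); `rank_causes` is a hypothetical helper, and the probabilities in the usage example are made up.

```python
def rank_causes(class_probabilities, top_k=3):
    """Return the top_k (cause, probability) pairs, most likely first.

    `class_probabilities` maps each fault class name to the probability
    the classifier assigned to it.
    """
    ranked = sorted(
        class_probabilities.items(), key=lambda item: item[1], reverse=True
    )
    return ranked[:top_k]


# Illustrative values only, not real model output.
example = {
    "Thermal stress": 0.10,
    "Mechanical wear": 0.55,
    "Electrical fault": 0.20,
    "Normal operational variation": 0.15,
}
top_two = rank_causes(example, top_k=2)
```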
Constraint Satisfaction Problem:
Variables:
- Maintenance task list (derived from predictions)
- Available maintenance windows
- Crew availability (for ISS; ground station access for unmanned satellites)
- Orbital position constraints
- Mission priority levels
Constraints:
- Ground station contact requirements
- Crew schedule conflicts
- Tool/equipment availability
- Task dependencies (some maintenance requires others first)
- Safety margins (don't defer critical items)
Objective Function:
- Minimize risk-weighted maintenance delay
- Balance urgency vs. operational disruption
- Optimize crew time utilization
Algorithm: Mixed Integer Programming (MIP) using PuLP or Google OR-Tools
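One way the scheduling problem might be written as a MIP is sketched below. All symbols are illustrative assumptions: $T$ is the set of tasks, $W$ the set of maintenance windows indexed in time order, $x_{iw} = 1$ if task $i$ is assigned to window $w$, $r_i$ the risk weight of task $i$, $d_{iw}$ the delay incurred by doing task $i$ in window $w$, $c_i$ the crew time task $i$ needs, $C_w$ the crew time available in window $w$, and $P$ the set of pairs $(i, j)$ where task $i$ must precede task $j$.

$$
\begin{aligned}
\min_{x} \quad & \sum_{i \in T} \sum_{w \in W} r_i \, d_{iw} \, x_{iw} \\
\text{s.t.} \quad & \sum_{w \in W} x_{iw} = 1 && \forall i \in T \\
& \sum_{i \in T} c_i \, x_{iw} \le C_w && \forall w \in W \\
& \sum_{w \in W} w \, x_{iw} + 1 \le \sum_{w \in W} w \, x_{jw} && \forall (i, j) \in P \\
& x_{iw} \in \{0, 1\} && \forall i \in T,\ w \in W
\end{aligned}
$$

Under this formulation, a safety margin for a critical task can be enforced by fixing $x_{iw} = 0$ for every window later than that task's deadline.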
┌─────────────────────────────────────────────────────────────────┐
│ DATA INGESTION LAYER │
├─────────────────────────────────────────────────────────────────┤
│ NASA Lightstreamer → Lambda (Real-time) → DynamoDB │
│ Historical Archive → S3 Data Lake │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ FEATURE ENGINEERING LAYER │
├─────────────────────────────────────────────────────────────────┤
│ • Time series windowing │
│ • Statistical feature extraction │
│ • Subsystem correlation analysis │
│ • Degradation rate calculation │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ ML MODEL SUITE │
├─────────────────────────────────────────────────────────────────┤
│ Anomaly Detection → Isolation Forest / Autoencoder │
│ Degradation Forecast → LSTM / Temporal Fusion Transformer │
│ Survival Analysis → Cox Proportional Hazards / Weibull │
│ Fault Classification → Random Forest / XGBoost │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ OPERATIONAL LAYER │
├─────────────────────────────────────────────────────────────────┤
│ Health Scoring → Maintenance Scheduling → Alert Generation │
│ Dashboard (Streamlit) → CloudWatch Monitoring │
└─────────────────────────────────────────────────────────────────┘
### Model Training Strategy
Local Development:
- Jupyter notebooks for experimentation
- GPU-enabled local training for initial model development
- Small data samples for rapid iteration
Production Training:
- AWS SageMaker for full dataset training
- Hyperparameter tuning with SageMaker Automatic Model Tuning
- Distributed training for large models
- Model versioning with MLflow
Anomaly Detection:
- Precision, Recall, F1-Score
- False Positive Rate (critical for operational systems)
- Detection latency
- ROC-AUC, PR-AUC
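The threshold-based metrics above follow directly from the confusion counts. The sketch below assumes binary labels where a truthy value means "anomaly"; `detection_metrics` is a hypothetical helper name.

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, F1 and false positive rate from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    # Each ratio falls back to 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }
```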
Degradation Forecasting:
- RMSE, MAE, MAPE
- Prediction interval coverage
- Directional accuracy (did we predict the trend correctly?)
- Forecast horizon performance (7d vs 30d vs 90d)
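A minimal sketch of the forecasting metrics follows. The definition of directional accuracy used here (comparing the sign of the predicted move from the previous actual value with the sign of the actual move) is one reasonable choice among several, and `forecast_metrics` is a hypothetical helper name.

```python
from math import sqrt


def _sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def forecast_metrics(actual, predicted, lower, upper):
    """RMSE, MAE, MAPE, interval coverage and directional accuracy."""
    n = len(actual)
    if n == 0 or not (n == len(predicted) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined where the actual value is zero, so skip those points.
    pct = [abs(e / a) for a, e in zip(actual, errors) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else None
    # Share of actual values that fall inside the prediction interval.
    coverage = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper)) / n
    # Did the forecast move in the same direction as the actual series?
    hits = [
        _sign(actual[i] - actual[i - 1]) == _sign(predicted[i] - actual[i - 1])
        for i in range(1, n)
    ]
    directional = sum(hits) / len(hits) if hits else None
    return {
        "rmse": rmse,
        "mae": mae,
        "mape": mape,
        "coverage": coverage,
        "directional_accuracy": directional,
    }
```

Running this separately on the 7, 30, and 90 day forecasts would give the per-horizon comparison mentioned above.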
Survival Analysis:
- Concordance index (C-index)
- Brier score
- Calibration plots (predicted vs observed survival)
- Time-dependent AUC
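To make the concordance index concrete, the sketch below implements a simplified version of Harrell's C-index. It assumes higher risk scores mean earlier expected failure, skips pairs with tied times, and counts tied risk scores as half concordant; survival analysis libraries handle these edge cases more carefully, and `concordance_index` here is an illustrative function, not a library call.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index.

    times: observed time for each unit (failure or censoring time)
    events: 1 if the failure was observed, 0 if the unit is censored
    risk_scores: model output, higher meaning earlier expected failure
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored unit cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of failure times.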
Fault Classification:
- Accuracy, Precision, Recall per class
- Confusion matrix
- Top-k accuracy (are correct diagnoses in top 3 predictions?)
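Top-k accuracy can be computed as below. The sketch assumes each prediction is a list of class names ordered from most to least likely; `top_k_accuracy` is a hypothetical helper name.

```python
def top_k_accuracy(true_labels, ranked_predictions, k=3):
    """Fraction of cases whose true label is among the top k predictions."""
    if not true_labels or len(true_labels) != len(ranked_predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)
```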
# Clone repository
git clone <your-repo-url>
cd constellation
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start telemetry collection
python -m src.ingestion.collect_telemetry
Data will be saved to data/raw/ in date-partitioned Parquet files.
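The exact partition layout is not specified here. As an illustration only, a date-partitioned path under `data/raw/` could be built as follows; the `date=YYYY-MM-DD` directory naming, the `telemetry.parquet` file name, and the `partition_path` helper are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def partition_path(timestamp, root="data/raw"):
    """Build a date-partitioned file path for a Unix timestamp (UTC).

    The directory and file naming is a hypothetical layout, chosen only
    to illustrate the idea of one partition per calendar day.
    """
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
    return PurePosixPath(root) / f"date={day}" / "telemetry.parquet"
```

Partitioning by UTC date keeps the layout independent of the machine's local time zone.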
This is a portfolio project. For questions or collaboration:
Data: NASA ISS Telemetry (Public)