Machine learning pipeline for magnetic distortion classification using synthetic sensor data and Audio Spectrogram Transformer (AST) models.
This repository provides a complete, self-contained machine learning pipeline for time-series classification. It specializes in magnetic distortion detection using an innovative approach that treats sensor data as audio spectrograms, leveraging state-of-the-art audio classification models.
- **Self-Contained Synthetic Data**: Generate realistic IMU sensor data with configurable magnetic distortion
- **Audio Classification Approach**: Convert time-series to spectrograms for audio model training
- **Modern ML Pipeline**: HuggingFace Transformers with Audio Spectrogram Transformer (AST)
- **Educational Notebooks**: Complete tutorials that work in Google Colab without external dependencies
- **Configurable Training**: Easy to adjust distortion levels, motion patterns, and training parameters
The easiest way to get started is with our self-contained Jupyter notebooks:
Notebooks README - Complete guide to all notebooks

- `classify.ipynb` - Live classification demo with synthetic data
- `dataset.ipynb` - Interactive synthetic data exploration
- `fine_tune.ipynb` - Train your own models

Try in Google Colab:
```
/workspaces/time-series-classifier/
├── src/
│   ├── synthetic/                 # Core synthetic data generation
│   │   ├── __init__.py            # Module exports
│   │   ├── generator.py           # SyntheticDataGenerator class
│   │   └── math_utils.py          # Math utilities (quaternions, vectors, etc.)
│   ├── mcap_utils/                # MCAP processing utilities
│   │   ├── __init__.py            # Module exports
│   │   ├── reader.py              # Data reading functions
│   │   ├── visualization.py       # Plotting and visualization
│   │   ├── dataset.py             # ML dataset creation
│   │   └── spectrogram.py         # Spectrogram processing
│   └── mcap_utilities.py          # Original monolithic file (kept for compatibility)
├── config/
│   └── default_plan.json          # Default configuration for synthetic data generation
├── scripts/
│   ├── script_utils.py            # Common utilities for all scripts
│   ├── convert_sensor_logs.py     # Convert recorded JSON logs to MCAP
│   ├── generate_synthetic.py      # CLI for synthetic data generation
│   ├── classify.py                # Classification script
│   ├── fine_tune.py               # Fine-tuning script
│   └── synthetic_data.py          # Data processing script
├── examples/
│   ├── basic_example.py           # Usage demonstration
│   └── using_recorded_data.py     # Example using converted sensor logs
├── tests/
│   └── test_synthetic.py          # Test suite
└── docs/
    ├── synthetic_data.md          # Synthetic data generation documentation
    └── converting_sensor_logs.md  # Guide for converting recorded logs
```
All scripts use common utilities from script_utils.py for consistent environment setup, file management, and Nstrumenta integration.
Provides common functions for all scripts:
- `init_script_environment()` - Sets up the Python path and Nstrumenta client
- `setup_working_directory()` - Creates and manages working directories
- `fetch_nstrumenta_file()` - Downloads files from Nstrumenta, with optional extraction
- `upload_with_prefix()` - Uploads files with organized remote paths
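A hypothetical sketch of how a script might wire these helpers together. The signatures, arguments, and return values below are assumptions based on the descriptions above, not a documented API:

```python
# Hypothetical usage sketch; script_utils must be on the Python path, and the
# exact signatures/return values below are assumptions, not a documented API.

def main():
    from script_utils import (
        init_script_environment,
        setup_working_directory,
        fetch_nstrumenta_file,
        upload_with_prefix,
    )

    client = init_script_environment()            # Python path + Nstrumenta client
    workdir = setup_working_directory("temp")     # create/reuse a scratch directory
    fetch_nstrumenta_file("input.mcap", workdir)  # download an input file
    # ... process the data here ...
    upload_with_prefix("results.json", "runs/")   # upload under an organized prefix
```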
Convert recorded sensor log JSON files to MCAP format with labels and experiment configurations.
Features:
- Converts iPhone sensor logs (or similar JSON format) to MCAP
- Generates label files with single event for entire recording
- Creates experiment configs compatible with training pipeline
- Automatically uploads to Nstrumenta
Usage:
```bash
# Convert all sensor logs in temp/ directory
python scripts/convert_sensor_logs.py
```

```python
# Programmatic usage
from convert_sensor_logs import convert_sensor_log
convert_sensor_log("temp/Sensor_Log_xyz.json", distortion_level="0")
```

See docs/converting_sensor_logs.md for detailed documentation.
Command-line interface for generating synthetic sensor data from motion plans.
The fine_tune.py script fine-tunes a pre-trained audio classification model on a custom dataset. It performs the following steps:
- Setup and Initialization: Initializes the working directory and sets up the environment.
- Data Preparation: Downloads necessary input files and creates spectrograms from time-series data.
- Dataset Creation: Creates a dataset from the spectrogram files and corresponding labels.
- Model Configuration: Loads a pre-trained model and updates its configuration based on the dataset labels.
- Training: Splits the dataset into training and testing subsets and trains the model using the `Trainer` class from the `transformers` library.
- Evaluation: Evaluates the model on the test set and logs the metrics.
- Model Saving: Saves the trained model and uploads it to the Nstrumenta platform.
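The "creates spectrograms from time-series data" step can be illustrated with a minimal NumPy stand-in. This is not the pipeline's actual spectrogram code, just the underlying idea: slice a sensor channel into overlapping windows and take the FFT magnitude of each:

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=64, hop=32):
    """Naive STFT magnitude spectrogram for a 1-D sensor channel."""
    window = np.hanning(n_fft)
    frames = [
        np.abs(np.fft.rfft(window * x[i:i + n_fft]))
        for i in range(0, len(x) - n_fft + 1, hop)
    ]
    return np.array(frames).T  # (freq_bins, time_frames)

# 100 Hz magnetometer-like signal: a 5 Hz tone plus noise
t = np.arange(0, 10, 0.01)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(t.size)
spec = magnitude_spectrogram(x)
print(spec.shape)  # (33, 30)
```

The resulting 2-D array is what gets rendered as an image-like input for the audio model.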
The classify.py script classifies time-series data using a fine-tuned model. It performs the following steps:
- Setup and Initialization: Initializes the working directory and sets up the environment.
- Data Preparation: Downloads necessary input files and creates spectrograms from time-series data if they do not already exist.
- Model Loading: Loads the fine-tuned model for time-series classification.
- Spectrogram Classification: Classifies the spectrogram data using the loaded model.
- Result Upload: Uploads the classification results to the Nstrumenta platform.
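The "Spectrogram Classification" step ultimately maps per-window model logits to the distortion labels used in this repo. A toy stand-in (the logit values are invented for illustration):

```python
import numpy as np

# Toy stand-in for the classification step: softmax over per-window logits,
# then pick the most likely distortion label. Logit values are made up.
LABELS = ["none", "low", "high"]
logits = np.array([[2.1, 0.3, -1.0],   # window 1
                   [-0.5, 0.2, 3.4]])  # window 2
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
pred = [LABELS[i] for i in probs.argmax(axis=1)]
print(pred)  # ['none', 'high']
```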
```bash
python scripts/generate_synthetic.py --plan config/default_plan.json --output data.mcap
```

```python
from synthetic import SyntheticDataGenerator

generator = SyntheticDataGenerator()
generator.generate("config/default_plan.json", "output.mcap")
```

```python
from mcap_utils import read_synthetic_sensor_data, plot_synthetic_sensor_data

data = read_synthetic_sensor_data("output.mcap")
plot_synthetic_sensor_data("output.mcap")
```

```python
from mcap_utils import extract_imu_windows

windows = extract_imu_windows("output.mcap", window_size_ns=1e9)
```

The notebooks automatically handle repository setup and environment configuration for Google Colab. Just click and run:
For advanced users and production workflows, use the command-line scripts:
Experiment files define datasets for training and classification. Here's an example from our synthetic magnetic distortion data:
```json
{
  "dirname": "synthetic_datasets/training_sequence_0",
  "labelFiles": [
    {
      "filePath": "projects/nst-test/data/synthetic_datasets/training_sequence_0/training_sequence_0.labels.json"
    }
  ],
  "description": "Synthetic dataset: training_sequence_0",
  "segments": [
    {
      "name": "high_motion_0",
      "duration_s": 6.942836128288839,
      "rotation_rpy_degrees": {
        "roll": 1.5799431492871534,
        "pitch": -6.920307915411698,
        "yaw": -77.28209866177114
      },
      "magnetic_distortion": 2.115377702012322,
      "mag_distortion": {
        "level": "high"
      }
    },
    {
      "name": "none_motion_1",
      "duration_s": 18.26988031964769,
      "rotation_rpy_degrees": {
        "roll": 18.08781247960689,
        "pitch": 6.5994098443636915,
        "yaw": 81.38743907152028
      },
      "magnetic_distortion": 0.0,
      "mag_distortion": {
        "level": "none"
      }
    }
  ],
  "metadata": {
    "generated_by": "synthetic_data.py",
    "sample_rate": 100,
    "total_duration_s": 600.6333273280001,
    "classification_type": "mag_distortion",
    "distortion_levels": ["none", "high", "low"]
  }
}
```

Key Fields:

- `dirname`: Points to the directory containing the MCAP data files
- `labelFiles`: Array of label files with classification data
- `description`: Human-readable description of the dataset
- `segments`: Detailed information about each data segment, including motion parameters and distortion levels
- `metadata`: Additional information about data generation and the classification schema
Magnetic Distortion Levels:
- `none` (0): No magnetic distortion applied
- `low` (1): Low-level magnetic field distortion
- `high` (2): High-level magnetic field distortion
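Mapping an experiment file's segments to these integer classes takes only a few lines. The snippet below uses an inline, trimmed-down experiment file for illustration; real files carry the full field set shown above:

```python
import json
from collections import Counter

# Trimmed-down experiment file with illustrative values (not a real dataset)
experiment = json.loads("""
{
  "dirname": "synthetic_datasets/training_sequence_0",
  "segments": [
    {"name": "high_motion_0", "duration_s": 6.9, "mag_distortion": {"level": "high"}},
    {"name": "none_motion_1", "duration_s": 18.3, "mag_distortion": {"level": "none"}}
  ],
  "metadata": {"classification_type": "mag_distortion"}
}
""")

# Textual levels -> integer class ids, as listed above
LEVEL_TO_CLASS = {"none": 0, "low": 1, "high": 2}
labels = [LEVEL_TO_CLASS[s["mag_distortion"]["level"]]
          for s in experiment["segments"]]
print(labels, dict(Counter(labels)))  # [2, 0] {2: 1, 0: 1}
```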
Use a Nstrumenta API key from your project:
Use an access token from Hugging Face settings:
https://huggingface.co/docs/hub/en/security-tokens
To fine-tune a model using synthetic data, you have two options:
Option 1: Use the Notebook (Recommended)
```python
# Open notebooks/fine_tune.ipynb in Jupyter or Google Colab
# All synthetic data generation and training is automated
```

Option 2: Command Line

```bash
python scripts/fine_tune.py
```

The fine-tuning process:
- Generates synthetic training data with multiple magnetic distortion scenarios
- Creates spectrograms from time-series data for audio classification
- Trains an Audio Spectrogram Transformer (AST) model
- Evaluates performance and saves the trained model
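The train/test split in the process above can be pictured with a toy standalone sketch; the 80/20 fraction, the seed, and the file names here are arbitrary choices for illustration, not the script's actual values:

```python
import random

# Toy illustration of the train/test split step. The 80/20 fraction, seed,
# and file names are arbitrary, not the pipeline's actual values.
random.seed(0)
samples = [(f"spec_{i:03d}.png", random.choice(["none", "low", "high"]))
           for i in range(100)]
random.shuffle(samples)
split = int(0.8 * len(samples))
train, test = samples[:split], samples[split:]
print(len(train), len(test))  # 80 20
```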
Option 1: Use the Notebook (Recommended)
```python
# Open notebooks/classify.ipynb for interactive classification demo
# Generates test data and runs inference automatically
```

Option 2: Command Line

```bash
python scripts/classify.py
```

The synthetic data generator creates realistic IMU sensor data with controllable magnetic distortion:
```json
{
  "initialization": {
    "sample_rate": 100,
    "pose": {
      "origin": {"lat": 38.446, "lng": -122.687, "height": 0.0}
    }
  },
  "segments": [
    {
      "name": "high_distortion_test",
      "duration_s": 60.0,
      "rotation_rpy_degrees": {"roll": 30.0, "pitch": 0.0, "yaw": 0.0},
      "magnetic_distortion": 2.5,
      "mag_distortion": {"level": "high"}
    }
  ]
}
```

- `none` (0.0): Clean magnetic field data
- `low` (1.0): Subtle magnetic disturbances
- `high` (2.5): Strong magnetic interference
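One way to picture how a `magnetic_distortion` magnitude shapes the generated magnetometer data. This is purely illustrative; the ambient field values and the noise model are assumptions, and the actual generator's distortion model may differ:

```python
import numpy as np

# Illustrative only: scale a random disturbance by the distortion magnitude
# and add it to a clean ambient field. Field values and noise scale are
# assumptions, not the generator's actual model.
EARTH_FIELD_UT = np.array([22.0, 5.0, -42.0])  # rough ambient field, microtesla

def distorted_field(n_samples, magnitude, seed=0):
    rng = np.random.default_rng(seed)
    disturbance = magnitude * rng.normal(scale=5.0, size=(n_samples, 3))
    return EARTH_FIELD_UT + disturbance

clean = distorted_field(1000, 0.0)  # "none"
high = distorted_field(1000, 2.5)   # "high"
print(np.allclose(clean, EARTH_FIELD_UT), float(np.std(high - EARTH_FIELD_UT)) > 5)
# True True
```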
- Notebooks README - Complete notebook documentation and tutorials
- Synthetic Data Guide - Technical details on data generation
- Scripts Documentation - Command-line tool reference
```bash
pip install "datasets[audio]==3.0.1" mcap==1.2.1 torch torchaudio "transformers[torch]" numpy
git clone https://github.com/nstrumenta/time-series-classifier.git
cd time-series-classifier
# Open notebooks/classify.ipynb and run all cells!
```

The synthetic data approach enables controlled experiments with perfect ground truth:
- Magnetometer Data: Realistic 3-axis magnetic field measurements
- Accelerometer Data: Motion-correlated acceleration patterns
- Gyroscope Data: Angular velocity measurements
- Perfect Labels: Exact magnetic distortion classifications
- Reproducible: Same synthetic data every time


