Machine learning pipeline for magnetic distortion classification using synthetic sensor data and Audio Spectrogram Transformer (AST) models.
This repository provides a complete, self-contained machine learning pipeline for time-series classification. It specializes in magnetic distortion detection using an innovative approach that treats sensor data as audio spectrograms, leveraging state-of-the-art audio classification models.
- **Self-Contained Synthetic Data**: Generate realistic IMU sensor data with configurable magnetic distortion
- **Audio Classification Approach**: Convert time-series to spectrograms for audio model training
- **Modern ML Pipeline**: HuggingFace Transformers with Audio Spectrogram Transformer (AST)
- **Educational Notebooks**: Complete tutorials that work in Google Colab without external dependencies
- **Configurable Training**: Easy to adjust distortion levels, motion patterns, and training parameters
The easiest way to get started is with our self-contained Jupyter notebooks:
Notebooks README - Complete guide to all notebooks

- `classify.ipynb` - Live classification demo with synthetic data
- `dataset.ipynb` - Interactive synthetic data exploration
- `fine_tune.ipynb` - Train your own models

Try in Google Colab:
```
/workspaces/time-series-classifier/
├── src/
│   ├── synthetic/                 # Core synthetic data generation
│   │   ├── __init__.py            # Module exports
│   │   ├── generator.py           # SyntheticDataGenerator class
│   │   └── math_utils.py          # Math utilities (quaternions, vectors, etc.)
│   ├── mcap_utils/                # MCAP processing utilities
│   │   ├── __init__.py            # Module exports
│   │   ├── reader.py              # Data reading functions
│   │   ├── visualization.py       # Plotting and visualization
│   │   ├── dataset.py             # ML dataset creation
│   │   └── spectrogram.py         # Spectrogram processing
│   └── mcap_utilities.py          # Original monolithic file (kept for compatibility)
├── config/
│   └── default_plan.json          # Default configuration for synthetic data generation
├── scripts/
│   ├── script_utils.py            # Common utilities for all scripts
│   ├── convert_sensor_logs.py     # Convert recorded JSON logs to MCAP
│   ├── generate_synthetic.py      # CLI for synthetic data generation
│   ├── classify.py                # Classification script
│   ├── fine_tune.py               # Fine-tuning script
│   └── synthetic_data.py          # Data processing script
├── examples/
│   ├── basic_example.py           # Usage demonstration
│   └── using_recorded_data.py     # Example using converted sensor logs
├── tests/
│   └── test_synthetic.py          # Test suite
└── docs/
    ├── synthetic_data.md          # Synthetic data generation documentation
    └── converting_sensor_logs.md  # Guide for converting recorded logs
```
All scripts use common utilities from script_utils.py for consistent environment setup, file management, and Nstrumenta integration.
Provides common functions for all scripts:
- `init_script_environment()` - Sets up the Python path and Nstrumenta client
- `setup_working_directory()` - Creates and manages working directories
- `fetch_nstrumenta_file()` - Downloads files from Nstrumenta, with optional extraction
- `upload_with_prefix()` - Uploads files with organized remote paths
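A hypothetical sketch of how a script might wire these helpers together. The signatures, arguments, and return values below are assumptions based on the descriptions above, not a documented API:

```python
# Hypothetical usage sketch; script_utils must be on the Python path, and the
# exact signatures/return values below are assumptions, not a documented API.

def main():
    from script_utils import (
        init_script_environment,
        setup_working_directory,
        fetch_nstrumenta_file,
        upload_with_prefix,
    )

    client = init_script_environment()            # Python path + Nstrumenta client
    workdir = setup_working_directory("temp")     # create/reuse a scratch directory
    fetch_nstrumenta_file("input.mcap", workdir)  # download an input file
    # ... process the data here ...
    upload_with_prefix("results.json", "runs/")   # upload under an organized prefix
```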
Convert recorded sensor log JSON files to MCAP format with labels and experiment configurations.
Features:
- Converts iPhone sensor logs (or similar JSON format) to MCAP
- Generates label files with single event for entire recording
- Creates experiment configs compatible with training pipeline
- Automatically uploads to Nstrumenta
Usage:
```bash
# Convert all sensor logs in temp/ directory
python scripts/convert_sensor_logs.py
```

```python
# Programmatic usage
from convert_sensor_logs import convert_sensor_log
convert_sensor_log("temp/Sensor_Log_xyz.json", distortion_level="0")
```

See docs/converting_sensor_logs.md for detailed documentation.
Command-line interface for generating synthetic sensor data from motion plans.
The fine_tune.py script fine-tunes a pre-trained audio classification model on a custom dataset. It performs the following steps:
- Setup and Initialization: Initializes the working directory and sets up the environment.
- Data Preparation: Downloads necessary input files and creates spectrograms from time-series data.
- Dataset Creation: Creates a dataset from the spectrogram files and corresponding labels.
- Model Configuration: Loads a pre-trained model and updates its configuration based on the dataset labels.
- Training: Splits the dataset into training and testing subsets and trains the model using the `Trainer` class from the `transformers` library.
- Evaluation: Evaluates the model on the test set and logs the metrics.
- Model Saving: Saves the trained model and uploads it to the Nstrumenta platform.
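The "creates spectrograms from time-series data" step can be illustrated with a minimal NumPy stand-in. This is not the pipeline's actual spectrogram code, just the underlying idea: slice a sensor channel into overlapping windows and take the FFT magnitude of each:

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=64, hop=32):
    """Naive STFT magnitude spectrogram for a 1-D sensor channel."""
    window = np.hanning(n_fft)
    frames = [
        np.abs(np.fft.rfft(window * x[i:i + n_fft]))
        for i in range(0, len(x) - n_fft + 1, hop)
    ]
    return np.array(frames).T  # (freq_bins, time_frames)

# 100 Hz magnetometer-like signal: a 5 Hz tone plus noise
t = np.arange(0, 10, 0.01)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(t.size)
spec = magnitude_spectrogram(x)
print(spec.shape)  # (33, 30)
```

The resulting 2-D array is what gets rendered as an image-like input for the audio model.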
The classify.py script classifies time-series data using a fine-tuned model. It performs the following steps:
- Setup and Initialization: Initializes the working directory and sets up the environment.
- Data Preparation: Downloads necessary input files and creates spectrograms from time-series data if they do not already exist.
- Model Loading: Loads the fine-tuned model for time-series classification.
- Spectrogram Classification: Classifies the spectrogram data using the loaded model.
- Result Upload: Uploads the classification results to the Nstrumenta platform.
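The "Spectrogram Classification" step ultimately maps per-window model logits to the distortion labels used in this repo. A toy stand-in (the logit values are invented for illustration):

```python
import numpy as np

# Toy stand-in for the classification step: softmax over per-window logits,
# then pick the most likely distortion label. Logit values are made up.
LABELS = ["none", "low", "high"]
logits = np.array([[2.1, 0.3, -1.0],   # window 1
                   [-0.5, 0.2, 3.4]])  # window 2
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
pred = [LABELS[i] for i in probs.argmax(axis=1)]
print(pred)  # ['none', 'high']
```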
```bash
python scripts/generate_synthetic.py --plan config/default_plan.json --output data.mcap
```

```python
from synthetic import SyntheticDataGenerator

generator = SyntheticDataGenerator()
generator.generate("config/default_plan.json", "output.mcap")
```

```python
from mcap_utils import read_synthetic_sensor_data, plot_synthetic_sensor_data

data = read_synthetic_sensor_data("output.mcap")
plot_synthetic_sensor_data("output.mcap")
```

```python
from mcap_utils import extract_imu_windows

windows = extract_imu_windows("output.mcap", window_size_ns=1e9)
```

The notebooks automatically handle repository setup and environment configuration for Google Colab. Just click and run:
For advanced users and production workflows, use the command-line scripts:
Experiment files define datasets for training and classification. Here's an example from our synthetic magnetic distortion data:
```json
{
  "dirname": "synthetic_datasets/training_sequence_0",
  "labelFiles": [
    {
      "filePath": "projects/nst-test/data/synthetic_datasets/training_sequence_0/training_sequence_0.labels.json"
    }
  ],
  "description": "Synthetic dataset: training_sequence_0",
  "segments": [
    {
      "name": "high_motion_0",
      "duration_s": 6.942836128288839,
      "rotation_rpy_degrees": {
        "roll": 1.5799431492871534,
        "pitch": -6.920307915411698,
        "yaw": -77.28209866177114
      },
      "magnetic_distortion": 2.115377702012322,
      "mag_distortion": {
        "level": "high"
      }
    },
    {
      "name": "none_motion_1",
      "duration_s": 18.26988031964769,
      "rotation_rpy_degrees": {
        "roll": 18.08781247960689,
        "pitch": 6.5994098443636915,
        "yaw": 81.38743907152028
      },
      "magnetic_distortion": 0.0,
      "mag_distortion": {
        "level": "none"
      }
    }
  ],
  "metadata": {
    "generated_by": "synthetic_data.py",
    "sample_rate": 100,
    "total_duration_s": 600.6333273280001,
    "classification_type": "mag_distortion",
    "distortion_levels": ["none", "high", "low"]
  }
}
```

Key Fields:

- `dirname`: Points to the directory containing the MCAP data files
- `labelFiles`: Array of label files with classification data
- `description`: Human-readable description of the dataset
- `segments`: Detailed information about each data segment, including motion parameters and distortion levels
- `metadata`: Additional information about data generation and the classification schema
Magnetic Distortion Levels:
- `none` (0): No magnetic distortion applied
- `low` (1): Low-level magnetic field distortion
- `high` (2): High-level magnetic field distortion
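Mapping an experiment file's segments to these integer classes takes only a few lines. The snippet below uses an inline, trimmed-down experiment file for illustration; real files carry the full field set shown above:

```python
import json
from collections import Counter

# Trimmed-down experiment file with illustrative values (not a real dataset)
experiment = json.loads("""
{
  "dirname": "synthetic_datasets/training_sequence_0",
  "segments": [
    {"name": "high_motion_0", "duration_s": 6.9, "mag_distortion": {"level": "high"}},
    {"name": "none_motion_1", "duration_s": 18.3, "mag_distortion": {"level": "none"}}
  ],
  "metadata": {"classification_type": "mag_distortion"}
}
""")

# Textual levels -> integer class ids, as listed above
LEVEL_TO_CLASS = {"none": 0, "low": 1, "high": 2}
labels = [LEVEL_TO_CLASS[s["mag_distortion"]["level"]]
          for s in experiment["segments"]]
print(labels, dict(Counter(labels)))  # [2, 0] {2: 1, 0: 1}
```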
Use a Nstrumenta API key from your project:
Use an access token from Hugging Face settings:
https://huggingface.co/docs/hub/en/security-tokens
To fine-tune a model using synthetic data, you have two options:
Option 1: Use the Notebook (Recommended)
```python
# Open notebooks/fine_tune.ipynb in Jupyter or Google Colab
# All synthetic data generation and training is automated
```

Option 2: Command Line

```bash
python scripts/fine_tune.py
```

The fine-tuning process:
- Generates synthetic training data with multiple magnetic distortion scenarios
- Creates spectrograms from time-series data for audio classification
- Trains an Audio Spectrogram Transformer (AST) model
- Evaluates performance and saves the trained model
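The train/test split in the process above can be pictured with a toy standalone sketch; the 80/20 fraction, the seed, and the file names here are arbitrary choices for illustration, not the script's actual values:

```python
import random

# Toy illustration of the train/test split step. The 80/20 fraction, seed,
# and file names are arbitrary, not the pipeline's actual values.
random.seed(0)
samples = [(f"spec_{i:03d}.png", random.choice(["none", "low", "high"]))
           for i in range(100)]
random.shuffle(samples)
split = int(0.8 * len(samples))
train, test = samples[:split], samples[split:]
print(len(train), len(test))  # 80 20
```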
Option 1: Use the Notebook (Recommended)
```python
# Open notebooks/classify.ipynb for interactive classification demo
# Generates test data and runs inference automatically
```

Option 2: Command Line

```bash
python scripts/classify.py
```

The synthetic data generator creates realistic IMU sensor data with controllable magnetic distortion:
```json
{
  "initialization": {
    "sample_rate": 100,
    "pose": {
      "origin": {"lat": 38.446, "lng": -122.687, "height": 0.0}
    }
  },
  "segments": [
    {
      "name": "high_distortion_test",
      "duration_s": 60.0,
      "rotation_rpy_degrees": {"roll": 30.0, "pitch": 0.0, "yaw": 0.0},
      "magnetic_distortion": 2.5,
      "mag_distortion": {"level": "high"}
    }
  ]
}
```

- `none` (0.0): Clean magnetic field data
- `low` (1.0): Subtle magnetic disturbances
- `high` (2.5): Strong magnetic interference
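One way to picture how a `magnetic_distortion` magnitude shapes the generated magnetometer data. This is purely illustrative; the ambient field values and the noise model are assumptions, and the actual generator's distortion model may differ:

```python
import numpy as np

# Illustrative only: scale a random disturbance by the distortion magnitude
# and add it to a clean ambient field. Field values and noise scale are
# assumptions, not the generator's actual model.
EARTH_FIELD_UT = np.array([22.0, 5.0, -42.0])  # rough ambient field, microtesla

def distorted_field(n_samples, magnitude, seed=0):
    rng = np.random.default_rng(seed)
    disturbance = magnitude * rng.normal(scale=5.0, size=(n_samples, 3))
    return EARTH_FIELD_UT + disturbance

clean = distorted_field(1000, 0.0)  # "none"
high = distorted_field(1000, 2.5)   # "high"
print(np.allclose(clean, EARTH_FIELD_UT), float(np.std(high - EARTH_FIELD_UT)) > 5)
# True True
```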
- Notebooks README - Complete notebook documentation and tutorials
- Synthetic Data Guide - Technical details on data generation
- Scripts Documentation - Command-line tool reference
```bash
pip install "datasets[audio]==3.0.1" mcap==1.2.1 torch torchaudio "transformers[torch]" numpy
git clone https://github.com/nstrumenta/time-series-classifier.git
cd time-series-classifier
# Open notebooks/classify.ipynb and run all cells!
```

The synthetic data approach enables controlled experiments with perfect ground truth:
- Magnetometer Data: Realistic 3-axis magnetic field measurements
- Accelerometer Data: Motion-correlated acceleration patterns
- Gyroscope Data: Angular velocity measurements
- Perfect Labels: Exact magnetic distortion classifications
- Reproducible: Same synthetic data every time


