A research framework for discovering structure, invariants, and emergence in mathematical sequences and symbolic systems
ARKHĒ is an open research framework for studying first principles in rule-based systems where simple constraints give rise to complex structure.
The project treats mathematical sequences, symbolic representations, and learning systems as objects of inspection rather than targets for optimization. Models are used as analytical probes, not authorities. Performance is measured only insofar as it helps expose structure, invariants, or failure modes.
ARKHĒ is designed for transparency, reproducibility, and long-term relevance. It is not a demo platform, benchmark suite, or production system.
ARKHĒ follows production-grade engineering standards where they support research integrity, reproducibility, and auditability.
ARKHĒ is a Python framework for mathematical sequence research and machine learning experimentation. Inspired by Collatz conjecture research, the framework provides tools for:
- Sequence Generation: Extensible framework for generating and analyzing mathematical sequences
- Machine Learning: Transformer-based models for sequence prediction
- Analysis Tools: Statistical analysis, visualization, and pattern detection
- Interactive Interfaces: CLI, Streamlit web app, and Jupyter notebooks
- Research Support: Comprehensive documentation and example notebooks
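For orientation, the Collatz rule at the heart of the framework can be sketched in a few lines of plain Python. This is the standard mathematical definition, not the framework's own `CollatzSequence` implementation:

```python
def collatz_step(n: int) -> int:
    """One application of the Collatz rule: halve evens, send odd n to 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

def collatz_sequence(start: int) -> list:
    """Iterate the rule until the trajectory reaches 1."""
    seq = [start]
    while seq[-1] != 1:
        seq.append(collatz_step(seq[-1]))
    return seq

seq = collatz_sequence(27)
print(len(seq), max(seq))  # 112 9232 -- 27 has a famously long trajectory
```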
- 🔢 Mathematical Sequences: Base framework for sequence generation with Collatz implementation and long-step optimization
- 🤖 ML Models: Complete transformer architecture (CollatzTransformer) for sequence prediction
- 📊 Analysis Tools: Statistical analysis, visualization, and pattern detection
- 🔬 Pattern Detection: Algorithms for identifying patterns in sequences
- 📈 Experiment Tracking: Integration with wandb and MLflow
- 🔧 Extensible: Easy to add custom sequence types via registry system
- 💻 CLI Application: Full-featured command-line interface with 4 commands
  - generate: Generate Collatz sequences (TXT, JSON, CSV)
  - train: Train transformer models
  - evaluate: Evaluate trained models
  - analyze: Batch sequence analysis
- 🌐 Streamlit Dashboard: Interactive web application with 5 pages (including Health Check)
- Sequence Explorer with real-time visualization
- Model Inference interface
- Statistical Analysis dashboard
- Health Check monitoring page
- Interactive charts and data export
- 📓 Jupyter Notebooks: 3 complete notebooks with working examples
- ✅ Type Hints: Full type annotation coverage
- ✅ Documentation: 12+ comprehensive guides (~80KB+)
- ✅ Testing: 40+ tests across 7 test files
- ✅ Code Quality: Linting, formatting, and quality checks
- ✅ Health Monitoring: Real-time system health checks and diagnostics
- ✅ Production Deployment: Comprehensive deployment guide with Docker, monitoring, and security best practices
- ✅ Metrics Export: Prometheus-compatible metrics for production monitoring
- ✅ Distributed Tracing: OpenTelemetry-based tracing for request/operation tracking
- ✅ Enhanced Observability: Request logging, performance profiling, and error tracking
- ✅ Maintainability: Clean architecture and design patterns
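The metrics export mentioned above follows the Prometheus text exposition format. A stdlib-only sketch of the kind of output such an exporter emits (the metric name below is illustrative, not part of the framework's actual metric set):

```python
def render_prometheus_metrics(metrics: dict) -> str:
    """Render a name -> (help_text, value) mapping as Prometheus gauges."""
    lines = []
    for name, (help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")  # human-readable description
        lines.append(f"# TYPE {name} gauge")        # metric type declaration
        lines.append(f"{name} {value}")             # the sample itself
    return "\n".join(lines) + "\n"

print(render_prometheus_metrics({
    "arkhe_requests_in_flight": ("Requests currently being processed.", 3),
}))
```

A Prometheus server scraping an endpoint that returns this text records the gauge value on each scrape.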
# Clone the repository
git clone https://github.com/monigarr/ARKHE.git
cd ARKHE
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Verify installation
python -c "import sys; sys.path.insert(0, 'src'); from math_research import CollatzSequence; print('✓ ARKHE installed successfully!')"

from math_research.sequences import CollatzSequence
from math_research.analysis import SequenceStatistics, SequenceVisualizer
# Generate a Collatz sequence
seq = CollatzSequence(start=27)
sequence = seq.generate()
# Analyze the sequence
stats = SequenceStatistics(sequence)
print(stats.summary())
# Visualize
visualizer = SequenceVisualizer()
fig, ax = visualizer.plot_sequence(sequence, title="Collatz Sequence Starting at 27")

# Generate a sequence
python -m src.apps.cli generate --start 27 --output sequence.txt --show-stats
# Train a model
python -m src.apps.cli train --num-samples 10000 --epochs 10
# Evaluate a model
python -m src.apps.cli evaluate --checkpoint checkpoints/best_model.pt --test-size 1000
# Analyze sequences
python -m src.apps.cli analyze --start 1 --end 100 --output analysis.json

# Build and run with Docker Compose
docker-compose --profile streamlit up # Web interface
docker-compose --profile cli run arkhe-cli python -m src.apps.cli generate --start 27
docker-compose --profile dev up -d # Development environment
# Or use Docker directly
docker build -t arkhe:latest .
docker run --rm arkhe:latest python -m src.apps.cli --help

See Docker Setup Guide for complete instructions.
# Launch interactive web interface
streamlit run src/apps/streamlit_demo/app.py
# OR use the launcher script
python run_streamlit.py

Then open your browser to http://localhost:8501
# Launch Jupyter
jupyter notebook src/notebooks/

Try the notebooks:
- 01_collatz_basics.ipynb - Basic sequence operations
- 02_sequence_analysis.ipynb - Complete statistical analysis (18 cells): multiple sequences, visualizations, pattern analysis, and comparisons
- 03_transformer_training.ipynb - Complete training pipeline
ARKHE/
├── src/
│ ├── math_research/ # Main package
│ │ ├── sequences/ # Sequence generation (base, collatz, registry)
│ │ ├── analysis/ # Analysis tools (statistics, visualization, patterns)
│ │ ├── ml/ # Machine learning
│ │ │ ├── models/ # Transformer models
│ │ │ ├── encoding/ # Data encoding (multi-base)
│ │ │ ├── training/ # Training pipeline (trainer, data_loader, experiment_tracker)
│ │ │ └── evaluation/ # Metrics and error analysis
│ │ └── utils/ # Utilities (config, logging, validators)
│ ├── apps/
│ │ ├── cli/ # Command-line interface
│ │ │ ├── main.py # CLI entry point
│ │ │ └── commands/ # Command implementations
│ │ └── streamlit_demo/ # Streamlit web application
│ │ └── app.py # Main Streamlit app
│ └── notebooks/ # Jupyter notebooks
│ ├── 01_collatz_basics.ipynb
│ ├── 02_sequence_analysis.ipynb
│ └── 03_transformer_training.ipynb
├── tests/ # Test suite
│ ├── unit/ # Unit tests (40+ tests)
│ └── integration/ # Integration tests
├── docs/ # Documentation
│ ├── guides/ # User guides
│ ├── api/ # API documentation
│ ├── architecture/ # Architecture docs
│ └── examples/ # Example scripts
├── configs/ # Configuration files
├── scripts/ # Utility scripts
├── data/ # Data directory
│ ├── raw/ # Raw data
│ ├── processed/ # Processed data
│ └── models/ # Saved models
├── checkpoints/ # Model checkpoints (created during training)
├── README.md # This file
├── CHANGELOG.md # Version history
├── requirements.txt # Python dependencies
└── pyproject.toml # Project configuration
- Getting Started Guide - Installation and quick start
- Usage Examples - Comprehensive code examples
- Training Guide - Complete ML training workflow
- FAQ - Common questions and troubleshooting
- Streamlit Setup - Streamlit-specific help
- API Documentation Index - Complete API reference
- Quick Start Script - Runnable example script
from math_research.sequences import BaseSequence, CollatzSequence
# Use built-in Collatz sequence
seq = CollatzSequence(start=27)
sequence = seq.generate()
# Long step optimization
long_step = seq.compute_long_step(27)
print(f"k={long_step['k']}, k'={long_step['k_prime']}, result={long_step['result']}")
# Generate with long steps
long_steps = seq.generate_with_long_steps()

from math_research.ml import (
CollatzTransformer,
MultiBaseEncoder,
CollatzDataset,
Trainer,
compute_accuracy,
)
# Create dataset
dataset = CollatzDataset(start_range=(1, 10000), num_samples=10000, base=24)
# Create model
model = CollatzTransformer(vocab_size=24, d_model=512, nhead=8, num_layers=6)
# Train
trainer = Trainer(model, train_loader, val_loader)
history = trainer.train(num_epochs=20)
# Evaluate
accuracy = compute_accuracy(predictions, targets)

See Training Guide for complete examples.
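Conceptually, the MultiBaseEncoder step amounts to writing each integer as a digit sequence in a chosen base (24 here), which becomes the token stream the transformer consumes. A stdlib-only illustration of that idea, not the framework's actual encoder:

```python
def to_base(n: int, base: int = 24) -> list:
    """Most-significant-first digit expansion of n in the given base."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits[::-1]

def from_base(digits, base: int = 24) -> int:
    """Inverse of to_base: fold digits back into an integer."""
    n = 0
    for d in digits:
        n = n * base + d
    return n

tokens = to_base(9232, base=24)            # 9232 is the peak of the trajectory from 27
assert from_base(tokens, base=24) == 9232  # round-trips exactly
print(tokens)  # [16, 0, 16]
```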
from math_research.analysis import SequenceStatistics, SequenceVisualizer
stats = SequenceStatistics(sequence)
summary = stats.summary() # Comprehensive statistics
visualizer = SequenceVisualizer()
fig, ax = visualizer.plot_sequence(sequence, show_peaks=True)
fig, ax = visualizer.plot_log_sequence(sequence)
fig, ax = visualizer.plot_histogram(sequence)

- Python 3.8 or higher
- NumPy >= 1.24.0
- SciPy >= 1.10.0
- Pandas >= 2.0.0
- PyTorch >= 2.0.0
- tqdm >= 4.65.0
- Matplotlib >= 3.7.0
- Seaborn >= 0.12.0 (optional)
- Streamlit >= 1.50.0 (for web app)
- Jupyter (for notebooks)
- PyYAML (for configuration)
- pytest (for testing)
See requirements.txt for complete list.
Minimum:
- CPU: Multi-core processor
- RAM: 8 GB
- Storage: 10 GB
Recommended (for ML training):
- CPU: High-performance multi-core (Intel i7/i9, AMD Ryzen 7/9)
- RAM: 16+ GB
- GPU: NVIDIA GPU with CUDA support (8+ GB VRAM recommended)
# Install development dependencies
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
# Run tests
pytest tests/
# Run with coverage
pytest tests/ --cov=src/math_research
# Run linting
flake8 src/

The project includes comprehensive test coverage:
- 40+ test functions across 7 test files
- Unit tests for all major components
- Integration tests for complete pipelines
- Parametrized tests for multiple configurations
Run tests:
pytest tests/ -v

Please see CONTRIBUTING.md for guidelines on contributing to the project.
# Generate sequence and save as JSON
python -m src.apps.cli generate --start 27 --format json --output sequence.json
# Train model with config file
python -m src.apps.cli train --config configs/training/collatz_transformer.yaml
# Evaluate with custom test range
python -m src.apps.cli evaluate --checkpoint checkpoints/best_model.pt --test-range 10000 20000
# Batch analysis with step size
python -m src.apps.cli analyze --start 1 --end 1000 --step 10 --output batch_analysis.json

# Custom sequence class
from math_research.sequences.base import BaseSequence
class MySequence(BaseSequence):
def step(self, n: int) -> int:
return n * 2 + 1
seq = MySequence(start=5)
sequence = seq.generate(max_iterations=10)

See Usage Examples Guide for more.
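The registry system that makes custom sequence types pluggable usually boils down to a name-to-class mapping populated by a decorator. The sketch below illustrates the pattern with hypothetical names; the framework's real registry lives in src/math_research/sequences/:

```python
SEQUENCE_REGISTRY = {}

def register_sequence(name: str):
    """Class decorator that records a sequence type under a lookup name."""
    def decorator(cls):
        SEQUENCE_REGISTRY[name] = cls
        return cls
    return decorator

@register_sequence("double_plus_one")
class MySequence:
    def __init__(self, start: int):
        self.start = start

    def step(self, n: int) -> int:
        return n * 2 + 1

# Instantiate by name, e.g. from a config file or a CLI flag
seq = SEQUENCE_REGISTRY["double_plus_one"](start=5)
print(seq.step(5))  # 11
```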
This framework is inspired by research on training transformers to predict Collatz sequences:
"Transformers know more than they can tell: Learning the Collatz sequence"
Key insights:
- Transformers can learn complex arithmetic functions with proper encoding
- Base-24 and base-32 encodings yield the best performance
- Models learn specific patterns (k, k' values) rather than universal algorithms
- Error patterns are explainable, not random hallucinations
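The k and k' patterns referenced above come from the "long step" view of Collatz: for odd n, all the halvings after 3n + 1 are collapsed into a single jump to the next odd value. A sketch of the k computation, using the common formulation from the literature (the framework's compute_long_step returns a dict and may define k' differently):

```python
def long_step(n: int):
    """For odd n, jump to the next odd value (3n + 1) / 2**k.
    k counts the halvings absorbed into the single step."""
    assert n % 2 == 1, "long steps are defined on odd inputs"
    m = 3 * n + 1
    k = 0
    while m % 2 == 0:  # strip every factor of 2
        m //= 2
        k += 1
    return k, m

print(long_step(27))  # (1, 41): 3*27 + 1 = 82 = 2**1 * 41
```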
- 38 Python source files in src/
- 7 test files with 40+ test functions
- 3 Jupyter notebooks with complete examples
- 12+ documentation files (~80KB+)
- 4 CLI commands fully functional
- 5 Streamlit pages interactive
- Comprehensive test coverage for all major components
- Enterprise infrastructure: Security policy, containerization, CI/CD, API docs
This project is licensed under the MIT License - see the LICENSE file for details.
MoniGarr
- Email: monigarr@MoniGarr.com
- Website: MoniGarr.com
Research Interests:
- AI/ML Research and Development
- Extended Reality (XR) Applications
- 3D Graphics and Visualization
- Robotics and Autonomous Systems
- Computer Vision
- Navigation Systems
- Natural Language Processing (NLP)
- Low-Resource Languages (spoken within English-speaking communities)
- Inspired by Collatz conjecture research and transformer-based sequence learning
- Built with excellent open-source tools (PyTorch, NumPy, Streamlit, Jupyter, etc.)
- Design principles influenced by research on interpretable ML
- Additional sequence types (Fibonacci, Prime sequences, etc.)
- Enhanced visualization tools (3D plots, animations)
- Advanced ML architectures (attention variants, hybrid models)
- Distributed training support
- Real-time training monitoring
- Model comparison and benchmarking tools
- Sequence animation and playback
- REST API for model serving
For questions, issues, or contributions:
- Check the FAQ first
- Review documentation
- Open an issue on GitHub
- See CONTRIBUTING.md for contribution guidelines
- If you build upon or extend the framework, attribution is appreciated.
See CHANGELOG.md for detailed version history and changes.
If you use ARKHĒ in academic work, technical reports, or derivative research, please cite it as follows:
ARKHĒ Framework
MoniGarr. ARKHĒ: A Framework for First-Principle Analysis of Rule-Based Systems.
GitHub repository: https://github.com/monigarr/ARKHE
BibTeX:
@software{arkhe_framework,
author = {MoniGarr},
title = {ARKHĒ: A Framework for First-Principle Analysis of Rule-Based Systems},
year = {2026},
url = {https://github.com/monigarr/ARKHE}
}
---
**Note:** This is a research framework designed for mathematical exploration and ML experimentation. It prioritizes interpretability and research insights over production deployment optimization.
