253 changes: 253 additions & 0 deletions OPERATORS.md
# OOCMatrix - Out-of-Core Matrix Operations

## Overview

The `OOCMatrix` class provides a high-level, NumPy-like API for out-of-core matrix operations. It wraps existing NumPy/SciPy operations with lazy evaluation, block-wise processing, and streaming constructs, focusing on **orchestration** rather than reimplementing mathematical operations.

## Design Philosophy

**Key Principle**: Don't reimplement optimized math libraries; orchestrate them intelligently.

- **Reuse Existing Libraries**: All mathematical operations (matrix multiply, element-wise ops, reductions) use NumPy/SciPy/Pandas for in-block computations
- **Focus on Orchestration**: The framework handles block loading, buffer management, lazy evaluation, and scheduling (see the sketch below)
- **Transparent Out-of-Core**: Provides a familiar API that doesn't require full matrices in RAM
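
To make the orchestration idea concrete, here is a minimal sketch of the block-wise delegation pattern, assuming a raw row-major binary file and a fixed row-block size (the helper name, file layout, and buffering below are illustrative assumptions, not the framework's actual internals): the framework only decides which block to read and where to write the result, while NumPy does the math.

```python
import numpy as np

def blockwise_apply_sketch(src, dst, shape, block_rows, op, dtype=np.float32):
    """Illustrative only: stream row-blocks of a binary matrix through `op`.

    `src`/`dst` are paths to raw row-major binary files; the real OOCMatrix
    implementation may use a different on-disk format and buffering policy.
    """
    n_rows, n_cols = shape
    a = np.memmap(src, dtype=dtype, mode='r', shape=shape)
    out = np.memmap(dst, dtype=dtype, mode='w+', shape=shape)
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        block = np.asarray(a[start:stop])   # load one block into RAM
        out[start:stop] = op(block)         # NumPy does the actual math
    out.flush()
```

Every other operation in the framework follows the same shape: load a block, call an existing NumPy/SciPy routine, then write or combine the result.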

## API Reference

### Creating OOCMatrix

```python
from paper import OOCMatrix
import numpy as np

# Load existing matrix
A = OOCMatrix('data.bin', shape=(10_000_000, 1000), dtype=np.float32, mode='r')

# Create new matrix
B = OOCMatrix('new.bin', shape=(1000, 500), dtype=np.float32, mode='w+', create=True)
```

### Block-wise Operations

#### blockwise_apply(op, output_path=None)
Apply a function to each block independently.

```python
# Per-block normalization (each block uses its own mean and std)
A_norm = A.blockwise_apply(lambda x: (x - x.mean()) / x.std())

# Apply ReLU activation
A_relu = A.blockwise_apply(lambda x: np.maximum(0, x))

# Custom transformation
A_transformed = A.blockwise_apply(lambda x: np.tanh(x * 2.0))
```

#### blockwise_reduce(op, combine_op=None)
Reduce across all blocks using a reduction operation.

```python
# Custom reduction
max_val = A.blockwise_reduce(np.max, combine_op=max)

# Sum with custom combining
total = A.blockwise_reduce(np.sum, combine_op=lambda x, y: x + y)
```
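
The reduce pattern itself is simple enough to sketch (assuming the documented `iterate_blocks()` helper; this is not the library's actual implementation): each block is collapsed with `op`, and the partial results are folded together with `combine_op`.

```python
# Illustrative only: how a block-wise reduction can be orchestrated.
def blockwise_reduce_sketch(matrix, op, combine_op):
    acc = None
    for block, _pos in matrix.iterate_blocks():
        partial = op(block)                        # e.g. np.max(block)
        acc = partial if acc is None else combine_op(acc, partial)
    return acc
```

For the block-wise result to match the in-memory result, `combine_op` should be associative over the partials (as `max` and `+` are).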

### Matrix Operations

#### matmul(other, op=None, output_path=None)
Matrix multiplication with an optional custom per-block operation (`op` defaults to NumPy's dot product).

```python
# Use NumPy dot product (default)
C = A.matmul(B, op=np.dot)

# Use custom operation
C = A.matmul(B, op=lambda a, b: np.dot(a, b))
```
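
The orchestration behind the out-of-core product is easiest to see on plain in-memory NumPy arrays (a sketch under that assumption; block sizes, accumulation buffers, and disk scheduling in the real framework may differ): A is read in tiles along its rows, B in matching tiles along its columns, and each output tile is accumulated with the supplied `op`.

```python
import numpy as np

def blocked_matmul_sketch(A, B, block, op=np.dot):
    """Illustrative tiled multiply; OOCMatrix streams the same tiles from disk."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(0, m, block):           # row tiles of A / C
        for j in range(0, n, block):       # column tiles of B / C
            for p in range(0, k, block):   # shared (inner) dimension
                C[i:i + block, j:j + block] += op(A[i:i + block, p:p + block],
                                                  B[p:p + block, j:j + block])
    return C
```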

### Statistical Operations

All operations are computed block-wise without loading the full matrix:

```python
total = A.sum() # Sum of all elements
mean = A.mean() # Mean value
std = A.std() # Standard deviation
min_val = A.min() # Minimum value
max_val = A.max() # Maximum value
```
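
How these can be computed without ever holding the full matrix is illustrated by the following single-pass sketch, which accumulates a count, a running sum, and a running sum of squares per block (assuming the documented `iterate_blocks()` helper; the framework's actual accumulation strategy may differ).

```python
import numpy as np

# Sketch only. The sum-of-squares formula can lose precision on
# ill-conditioned data; a production implementation might prefer
# Welford's algorithm.
def streaming_mean_std(matrix):
    count, total, total_sq = 0, 0.0, 0.0
    for block, _pos in matrix.iterate_blocks():
        b = block.astype(np.float64)
        count += b.size
        total += b.sum()
        total_sq += (b * b).sum()
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, np.sqrt(max(var, 0.0))
```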

### Lazy Evaluation & Operators

Build computation plans that execute only when materialized:

```python
# Lazy operations (don't execute immediately)
result = (A + B) * 2
result = A @ B
result = A * scalar

# Execute the plan
materialized = result.compute('output.bin')
```

### Block Iteration

Stream through matrix blocks for custom processing:

```python
for block, (row_start, col_start) in A.iterate_blocks():
    # block is a NumPy array
    # Process each block independently
    process(block)
```

## Usage Examples

### Example 1: Normalization

```python
from paper import OOCMatrix
import numpy as np

A = OOCMatrix('data.bin', shape=(10_000_000, 1000))

# Compute statistics (block-wise)
mean = A.mean()
std = A.std()

# Normalize using existing NumPy operations
A_normalized = A.blockwise_apply(lambda x: (x - mean) / std)

print(f"Normalized mean: {A_normalized.mean()}") # ~0
print(f"Normalized std: {A_normalized.std()}") # ~1
```

### Example 2: Matrix Multiplication Pipeline

```python
A = OOCMatrix('A.bin', shape=(10_000_000, 1000))
B = OOCMatrix('B.bin', shape=(1000, 500))

# Matrix multiplication using NumPy dot
C = A.matmul(B, op=np.dot)

# Process results block-by-block
for block, (r, c) in C.iterate_blocks():
    # Send to downstream system
    send_to_database(block, position=(r, c))
```

### Example 3: Complex Expression with Fusion

```python
A = OOCMatrix('A.bin', shape=(1_000_000, 1000))
B = OOCMatrix('B.bin', shape=(1_000_000, 1000))

# Build lazy expression (operator fusion optimization)
result = (A + B) * 2

# Execute optimized plan
C = result.compute('result.bin')
```

### Example 4: Custom Block Processing

```python
A = OOCMatrix('data.bin', shape=(10_000_000, 1000))

# Apply custom function using SciPy
from scipy.special import expit

A_sigmoid = A.blockwise_apply(lambda x: expit(x))

# Chain operations
A_processed = A.blockwise_apply(
    lambda x: np.clip((x - x.mean()) / x.std(), -3, 3)
)
```

## Performance Characteristics

- **Memory Efficient**: Only loads blocks needed for current computation
- **Lazy Evaluation**: Builds DAG for optimization before execution
- **Operator Fusion**: Automatically fuses compatible operations, e.g. `(A + B) * scalar` (see the sketch below)
- **Buffer Management**: Intelligent caching and eviction strategies
- **Parallelization**: Thread-based parallelism for fused operations
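
A minimal sketch of what fusion buys (the expression representation here is a simplification of whatever plan the framework actually builds): instead of materializing `A + B` to disk and then scanning it again to multiply by the scalar, both steps run on each block while it is still in memory.

```python
# Unfused evaluation of (A + B) * 2 writes an intermediate T = A + B to disk
# and then re-reads it to scale it. A fused plan applies both operations per
# block (a simplified illustration, assuming aligned iterables of NumPy blocks).
def fused_add_scale(a_blocks, b_blocks, scale):
    for a_blk, b_blk in zip(a_blocks, b_blocks):
        yield (a_blk + b_blk) * scale   # one pass, no intermediate matrix
```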

## Comparison with Similar Frameworks

| Feature | OOCMatrix | Dask | Vaex |
|---------|-----------|------|------|
| Lazy Evaluation | ✓ | ✓ | ✓ |
| Operator Fusion | ✓ | Limited | Limited |
| Block Iteration | ✓ | ✓ | ✓ |
| NumPy Backend | ✓ | ✓ | ✓ |
| Buffer Management | Advanced | Basic | Basic |
| Custom Operations | ✓ | ✓ | ✓ |

## Implementation Details

### No Kernel Rewrites
All mathematical operations use existing, optimized libraries:
- Matrix operations: `np.dot`, `np.matmul`
- Element-wise: `np.add`, `np.multiply`, etc.
- Reductions: `np.sum`, `np.mean`, `np.std`, etc.
- Custom: Any NumPy/SciPy function

### Focus Areas
The framework provides:
1. **Block Orchestration**: Managing how data flows through operations
2. **Lazy DAG Building**: Constructing computation plans
3. **Buffer Management**: Caching, eviction, prefetching (a minimal cache sketch follows this list)
4. **Scheduling**: Optimal ordering of block operations
5. **Streaming**: Iterating over results without materialization
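
The buffer-management role can be illustrated with a deliberately simple LRU block cache (the framework's actual policy, pinning, and prefetching logic are not shown and may differ):

```python
from collections import OrderedDict

class LRUBlockCache:
    """Toy LRU cache keyed by (row_start, col_start); illustration only."""

    def __init__(self, capacity_blocks, load_block):
        self.capacity = capacity_blocks
        self.load_block = load_block       # callable that reads a block from disk
        self.cache = OrderedDict()

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)    # mark as most recently used
            return self.cache[key]
        block = self.load_block(key)       # cache miss: hit the disk
        self.cache[key] = block
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False) # evict least recently used block
        return block
```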

### What We Don't Do
- ❌ Reimplement matrix multiplication
- ❌ Rewrite NumPy/SciPy kernels
- ❌ Custom linear algebra code
- ❌ Low-level optimization (handled by NumPy/BLAS)

### What We Do
- ✓ Smart block loading and caching
- ✓ Lazy evaluation and operator fusion
- ✓ Memory-efficient streaming
- ✓ Computation plan optimization
- ✓ Transparent out-of-core orchestration

## Testing

The OOCMatrix implementation includes comprehensive tests:

```bash
# Run all operator tests
python -m unittest tests.test_operators

# Run specific test class
python -m unittest tests.test_operators.TestOOCMatrix
python -m unittest tests.test_operators.TestOOCMatrixIntegration
```

Test coverage includes (an illustrative test sketch follows this list):
- Block iteration
- Block-wise operations
- Lazy evaluation
- Operator overloading
- Statistical operations
- Context manager protocol
- Integration with existing Plan infrastructure
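
As a hedged example of the kind of check such a test might contain (the test class and data below are hypothetical, not the actual contents of `tests/test_operators.py`), a block-wise reduction can be compared against its in-memory NumPy counterpart:

```python
import unittest
import numpy as np

class BlockwiseSumSketch(unittest.TestCase):
    """Hypothetical example; the real assertions live in tests/test_operators."""

    def test_blockwise_sum_matches_numpy(self):
        rng = np.random.default_rng(0)
        data = rng.standard_normal((1000, 16))   # stand-in for an OOCMatrix
        block_rows = 128
        partials = [data[i:i + block_rows].sum()
                    for i in range(0, data.shape[0], block_rows)]
        self.assertAlmostEqual(sum(partials), float(data.sum()), places=8)
```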

## Benchmarks

See the main README for performance comparisons with Dask and other frameworks.

## Future Extensions

Potential areas for enhancement:
- GPU backend support (CuPy)
- Additional reduction operations
- More sophisticated operator fusion patterns
- Distributed computation support
- Integration with Pandas for heterogeneous data
72 changes: 65 additions & 7 deletions README.md
# Paper framework

Paper is a lightweight Python framework for performing matrix computations on datasets that are too large to fit into main memory. It is designed around the principle of lazy evaluation, which allows it to build a computation plan and apply powerful optimizations, such as operator fusion, before executing any costly I/O operations.

The architecture is inspired by modern data systems and academic research (e.g., PreVision), with a clear separation between the logical plan, the physical execution backend, and an intelligent optimizer.

## OOCMatrix API - High-Level NumPy-like Interface

The framework now includes **OOCMatrix**, a high-level wrapper that provides a NumPy-like API for out-of-core operations. This API focuses on **orchestration** rather than reimplementing mathematical operations, leveraging existing NumPy/SciPy/Pandas operations with lazy evaluation and block-wise processing.

### Key Features

- **NumPy-like API**: Familiar interface for matrix operations
- **Lazy Evaluation**: Build computation plans before execution
- **Block-wise Processing**: Automatically chunks large datasets
- **Operator Support**: Reuses existing optimized libraries (NumPy, SciPy) for in-block operations
- **Memory Efficient**: Smart block loading, buffer management, and streaming

### Quick Start

```python
from paper import OOCMatrix
import numpy as np

# Create or load large matrices
A = OOCMatrix('fileA.bin', shape=(10_000_000, 1000))
B = OOCMatrix('fileB.bin', shape=(1000, 1000))

# Perform operations using existing NumPy functions
C = A.matmul(B, op=np.dot)

# Iterate over result blocks (no full matrix in memory)
for block, (r, c) in C.iterate_blocks():
    process(block)  # Each block uses NumPy operations

# Common operations (computed block-wise)
mean = A.mean()
std = A.std()
total = A.sum()

# Apply custom transformations using NumPy
A_normalized = A.blockwise_apply(lambda x: (x - mean) / std)

# Lazy evaluation with operator fusion
result = (A + B) * 2 # Builds plan, doesn't execute
materialized = result.compute('output.bin') # Executes optimized plan
```

### Examples

See `examples_oocmatrix.py` for comprehensive examples including:
- Basic statistics (sum, mean, std, min, max)
- Block-wise transformations
- Lazy evaluation and operator fusion
- Matrix multiplication with custom operations
- Block iteration for streaming workflows

### Architecture


### Testing

```bash
# Run all tests (including OOCMatrix tests)
python run_tests.py

# Or run specific test modules
python -m unittest tests.test_operators
python -m unittest tests.test_core
python -m unittest tests.test_fusion_operations
```

### Running Examples

```bash
# Run OOCMatrix examples
python examples_oocmatrix.py
```

### Buffer Manager architecture