Implement ML workload using Paper's operators with gene expression classification #43
The framework had infrastructure benchmarking (matrix multiplication timing) but lacked end-to-end ML problem-solving with actionable results. This adds a complete gene expression classification workflow that implements ML algorithms using Paper's out-of-core operators instead of external ML libraries, demonstrating Paper can handle ML workloads directly.
Changes
Paper ML Module (paper_ml.py) - NEW
- @ (matmul), .T (transpose), * (scalar mult), + (add), - (sub)
New Operator Added (paper/numpy_api.py)
- Subtraction (-) implemented as A - B = A + (-1 * B)
- Supports updates such as weights = weights - learning_rate * gradient
Core ML Pipeline (ml_classification.py)
Example & Documentation
- examples/ml_classification_example.py - Standalone demonstration
- ML_TASK.md - Complete workflow documentation updated to emphasize Paper operators
- demo_real_dataset.py with ML step
Testing
Usage
Output includes accuracy, ROC AUC, and timing - demonstrating Paper's ML computation capability and solution quality.
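As a rough sketch of how accuracy, ROC AUC, and timing could be reported using the scikit-learn utilities listed under Dependencies: the synthetic data, the least-squares scoring step, and all variable names below are assumptions for illustration and are not taken from ml_classification.py.

```python
import time

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a gene expression matrix (samples x features).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# scikit-learn is used only for splitting and metrics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

start = time.perf_counter()
# Placeholder model (assumption): a least-squares fit on centered labels.
# In the PR, this step would be carried out with Paper's operators instead.
w, *_ = np.linalg.lstsq(X_train, y_train - 0.5, rcond=None)
scores = X_test @ w
elapsed = time.perf_counter() - start

y_pred = (scores > 0).astype(int)
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, scores)

print(f"accuracy={accuracy:.3f} roc_auc={auc:.3f} time={elapsed:.4f}s")
```

ROC AUC is computed from the continuous scores rather than the thresholded predictions, since it measures ranking quality across all thresholds.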
What This Achieves
✅ ML algorithms run on Paper's operators - not external libraries
✅ Benchmarks Paper's ML workload - matrix ops for gradient descent
✅ Demonstrates out-of-core ML capability - Paper handles ML computation
✅ Minimal framework changes - only subtraction operator added
This implementation shows Paper is more than a data loading framework - it can perform ML computations using its own out-of-core operators.
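To illustrate the idea, here is a minimal sketch of gradient descent written with only the five operators named above (@, .T, *, +, -). OpsArray is a hypothetical in-memory stand-in backed by NumPy, not Paper's actual API, and the model is plain least-squares regression rather than the PR's classifier, chosen because its gradient needs no elementwise nonlinearity.

```python
import numpy as np


class OpsArray:
    """Hypothetical stand-in for a Paper array: exposes only @, .T, *, +, -."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __matmul__(self, other):
        return OpsArray(self.data @ other.data)

    @property
    def T(self):
        return OpsArray(self.data.T)

    def __mul__(self, scalar):
        # Scalar multiplication only; the scalar must be a Python number.
        return OpsArray(self.data * float(scalar))

    __rmul__ = __mul__

    def __add__(self, other):
        return OpsArray(self.data + other.data)

    def __sub__(self, other):
        # Subtraction composed from existing operators: A - B = A + (-1 * B)
        return self + (-1 * other)


def fit_least_squares(X, y, learning_rate=0.1, steps=500):
    """Gradient descent on mean squared error using only the operators above."""
    n = X.data.shape[0]
    weights = OpsArray(np.zeros((X.data.shape[1], 1)))
    for _ in range(steps):
        residual = X @ weights - y
        gradient = (1.0 / n) * (X.T @ residual)
        weights = weights - learning_rate * gradient
    return weights


# Synthetic, noise-free data so the fit can be checked against known weights.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 3))
true_w = np.array([[1.5], [-2.0], [0.5]])

X = OpsArray(X_raw)
y = OpsArray(X_raw @ true_w)
w = fit_least_squares(X, y)
print(w.data.ravel())
```

Composing subtraction from addition and scalar multiplication keeps the set of primitive operators small, at the cost of one extra intermediate array per subtraction.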
Dependencies
scikit-learn for utilities only (train_test_split, metrics); dask[array] and psutil for benchmarking (already used by existing benchmark scripts). The ML algorithms themselves are implemented using Paper's operators.