NebTorch is a minimal Autograd engine built from scratch using NumPy, inspired by PyTorch’s automatic differentiation system.
In 11-785: Introduction to Deep Learning, a graduate-level course at CMU taught by Prof. Bhiksha Raj Ramakrishnan, I completed a sequence of assignments covering everything from foundational concepts to advanced topics in Deep Learning, including neural networks, optimization, and more. The course provided both a theoretical and a practical understanding of neural networks, along with a brief introduction to Autograd.
After completing the course, I was inspired to dive deeper and build my own Autograd engine from scratch. Building NebTorch has been very rewarding: I have solidified my understanding of Deep Learning and Automatic Differentiation, and, most of all, I have gained a deeper appreciation for frameworks such as PyTorch and TensorFlow.
Most of the course content is openly available online: https://deeplearning.cs.cmu.edu/F24/index.html
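
The workflow is the same as PyTorch's: operations on tensors record a computation graph, and calling `backward()` on a scalar result fills in gradients. The short sketch below illustrates the idea; `nebtorch.tensor`, `.data`, and `.backward()` appear in the full example further down, while the operator overloads, the `.sum()` method, and the `.grad` attribute are assumptions modeled on PyTorch.

```python
import numpy as np
import nebtorch

# Leaf tensors (gradient tracking is assumed to be on by default).
a = nebtorch.tensor(np.array([[1.0, 2.0], [3.0, 4.0]]))
b = nebtorch.tensor(np.array([[0.5], [1.5]]))

# Compose differentiable ops: matmul (@), add, then a sum reduction to a scalar.
y = a @ b + b            # shape (2, 1)
loss = y.sum()           # scalar

# Backpropagate through the recorded computation graph.
loss.backward()

print(loss.data)         # forward value as a NumPy array
print(a.grad)            # d(loss)/d(a), assuming gradients are stored in `.grad`
```
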
Here's a complete example demonstrating how to use NebTorch to train a simple Multi-Layer Perceptron (MLP) on the Iris dataset:
```python
import numpy as np

import nebtorch
from nebtorch import Module, Tensor
from nebtorch.nn import Linear, ReLU, CrossEntropyLoss, Softmax
from nebtorch.optim import SGD

from sklearn import datasets
from sklearn.model_selection import train_test_split


class MLP(Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear_1 = Linear(in_features=in_features, out_features=256)
        self.act = ReLU()
        self.linear_2 = Linear(in_features=256, out_features=out_features)

    def forward(self, input: Tensor):
        out = self.linear_1(input)
        out = self.act(out)
        logits = self.linear_2(out)
        return logits


# Load and prepare data
iris = datasets.load_iris()
X = iris.data
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

# Convert to NebTorch tensors
X_train = nebtorch.tensor(X_train)
Y_train = nebtorch.tensor(Y_train)
X_test = nebtorch.tensor(X_test)
Y_test = nebtorch.tensor(Y_test)

# Hyperparameters
INPUT_FEATURES = X_train.shape[1]
NUM_CLASSES = np.max(Y) + 1
EPOCHS = 100
BATCH_SIZE = 5

# Initialize model, loss, and optimizer
model = MLP(INPUT_FEATURES, NUM_CLASSES)
criterion = CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.01)

# Training loop
num_batches = X_train.shape[0] // BATCH_SIZE
for epoch in range(EPOCHS):
    for i in range(num_batches):
        model.train()
        optimizer.zero_grad()

        # Get batch
        start_idx = i * BATCH_SIZE
        end_idx = start_idx + BATCH_SIZE
        input = X_train[start_idx:end_idx]
        target = Y_train[start_idx:end_idx]

        # Forward pass
        out = model(input)
        loss = criterion(out, target)

        # Backward pass
        loss.backward()
        optimizer.step()

    # Print progress
    if epoch % 10 == 0:
        print(f"Epoch {epoch:3d} | Loss: {loss.data.item():.4f}")

# Evaluate on test set
model.eval()
out = model(X_test)
loss = criterion(out, Y_test)

# Calculate accuracy
softmax = Softmax(dim=1)
predictions = np.argmax(softmax(out).data, axis=1)
accuracy = np.sum(predictions == Y_test.data) / Y_test.shape[0] * 100
print(f"Test Accuracy: {accuracy:.2f}%")
```

NebTorch implements the following components:

| Component | Description |
|---|---|
| Module | Base class for all neural network modules |
| Tensor | Multi-dimensional data structure with automatic differentiation support |
| Parameter | Special tensor for trainable model parameters |
| Optimizer | Base class for all optimizers |
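
As a rough illustration of how `Module`, `Parameter`, and `Tensor` fit together, here is a hand-rolled affine layer. By analogy with PyTorch, it assumes that `Parameter` wraps a raw array, that parameters assigned as attributes are discovered by `Module.parameters()`, and that `Parameter` is importable from the top-level package; the actual constructor and import path may differ.

```python
import numpy as np
from nebtorch import Module, Parameter, Tensor


class Affine(Module):
    """A hand-rolled y = x @ W + b layer (sketch only; Linear already provides this)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Assumption: Parameter wraps raw data and is registered via attribute assignment.
        self.W = Parameter(np.random.randn(in_features, out_features) * 0.01)
        self.b = Parameter(np.zeros(out_features))

    def forward(self, x: Tensor) -> Tensor:
        return x @ self.W + self.b   # matmul plus broadcasted add
```

In practice you would just use `Linear`; the point is that `parameters()` is what hands the two `Parameter`s to an optimizer such as `SGD`.
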
| Component | Description |
|---|---|
| Add | Element-wise addition with broadcasting |
| Subtract | Element-wise subtraction with broadcasting |
| Negate | Element-wise negation |
| Multiply | Element-wise multiplication with broadcasting |
| Divide | Element-wise division with broadcasting |
| Matrix Multiplication | Matrix multiplication (@ operator) |
| Transpose | Matrix transposition |
| Reshape | Tensor reshaping |
| Log | Natural logarithm |
| Exp | Exponential function |
| Power | Element-wise power operation |
| Mean | Mean reduction with axis support |
| Variance | Variance reduction with axis support |
| Sum | Sum reduction with axis support |
| Max | Maximum reduction with axis support |
| Slice | Tensor indexing and slicing |
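
Because every operation above is differentiable, arbitrary compositions of them can be backpropagated through. In the sketch below, the method names (`.reshape`, `.exp`, `.log`, `.mean`), multi-axis slicing, and the `.grad` attribute are assumptions modeled on PyTorch; the table only guarantees that the underlying operations exist.

```python
import numpy as np
import nebtorch

x = nebtorch.tensor(np.linspace(0.1, 1.2, 12))
ones = nebtorch.tensor(np.ones((3, 4)))

# Chain several of the listed ops: Reshape, Exp, Add, Log, Slice, Mean.
z = x.reshape((3, 4))        # Reshape
z = (z.exp() + ones).log()   # Exp, Add, Log -- a softplus-like expression
z = z[:, 1:3]                # Slice
out = z.mean()               # Mean reduction to a scalar

out.backward()
print(out.data)              # forward value
print(x.grad)                # gradient w.r.t. x (same shape as x, assuming `.grad`)
```
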
| Component | Description |
|---|---|
| Sigmoid | Sigmoid activation function |
| Tanh | Hyperbolic tangent activation |
| ReLU | Rectified Linear Unit |
| GELU | Gaussian Error Linear Unit |
| Softmax | Softmax with dimension support |
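
All activations are modules that can be called directly on tensors, as `ReLU` and `Softmax(dim=1)` are in the MLP example above. The sketch below assumes the `Sigmoid`, `Tanh`, and `GELU` constructors take no arguments and live in `nebtorch.nn` alongside `ReLU` and `Softmax`.

```python
import numpy as np
import nebtorch
from nebtorch.nn import Sigmoid, Tanh, ReLU, GELU, Softmax

x = nebtorch.tensor(np.random.randn(4, 3))

print(ReLU()(x).data)          # max(0, x), element-wise
print(Sigmoid()(x).data)       # 1 / (1 + exp(-x))
print(Tanh()(x).data)          # hyperbolic tangent
print(GELU()(x).data)          # Gaussian Error Linear Unit
print(Softmax(dim=1)(x).data)  # each row sums to 1
```
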
| Component | Description |
|---|---|
| Linear | Fully connected layer |
| Conv1d_stride1 | 1D convolution with stride 1 |
| Conv2d_stride1 | 2D convolution with stride 1 |
| Conv2d | 2D convolution with configurable stride |
| MaxPool2d_stride1 | 2D max pooling with stride 1 |
| MeanPool2d_stride1 | 2D mean pooling with stride 1 |
| MaxPool2d | 2D max pooling with configurable stride |
| MeanPool2d | 2D mean pooling with configurable stride |
| BatchNorm1d | 1D batch normalization |
| LayerNorm | Layer normalization |
| Dropout | Dropout regularization |
| Embedding | Embedding layer for sparse inputs |
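
Here is a sketch of a small CNN built from the layers above. Only `Linear`'s keyword arguments appear in this README; the `Conv2d`, `MaxPool2d`, and `Dropout` signatures below follow the usual PyTorch conventions (in/out channels, kernel size, stride; pool kernel and stride; drop probability) and should be treated as assumptions, as should the no-padding shape arithmetic.

```python
import numpy as np
import nebtorch
from nebtorch import Module, Tensor
from nebtorch.nn import Conv2d, MaxPool2d, ReLU, Dropout, Linear


class SmallCNN(Module):
    """Sketch only: constructor arguments other than Linear's are assumed."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.conv = Conv2d(1, 8, 3, 2)   # assumed: in_ch, out_ch, kernel_size, stride
        self.act = ReLU()
        self.pool = MaxPool2d(2, 2)      # assumed: kernel_size, stride
        self.drop = Dropout(0.1)         # assumed: drop probability
        # 28x28 input -> 13x13 after the stride-2 conv -> 6x6 after the pool (no padding assumed)
        self.head = Linear(in_features=8 * 6 * 6, out_features=num_classes)

    def forward(self, x: Tensor) -> Tensor:
        out = self.pool(self.act(self.conv(x)))
        out = out.reshape((x.shape[0], 8 * 6 * 6))   # flatten via the Reshape op
        out = self.drop(out)
        return self.head(out)


model = SmallCNN(num_classes=10)
logits = model(nebtorch.tensor(np.random.randn(4, 1, 28, 28)))   # expected shape (4, 10)
```
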
| Component | Description |
|---|---|
| RNNCell | Recurrent neural network cell |
| GRUCell | Gated Recurrent Unit cell |
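
A sketch of stepping `RNNCell` over a sequence. The `(input_size, hidden_size)` constructor, the `cell(x_t, h_prev) -> h_next` call convention, and the `nebtorch.nn` import path mirror `torch.nn.RNNCell` and are assumptions here; `GRUCell` would be stepped the same way.

```python
import numpy as np
import nebtorch
from nebtorch.nn import RNNCell

batch, seq_len, input_size, hidden_size = 4, 10, 8, 16
cell = RNNCell(input_size, hidden_size)              # assumed (input_size, hidden_size)

# One tensor per time step, so the loop does not rely on multi-axis slicing.
steps = [nebtorch.tensor(np.random.randn(batch, input_size)) for _ in range(seq_len)]
h = nebtorch.tensor(np.zeros((batch, hidden_size)))  # initial hidden state

for x_t in steps:
    h = cell(x_t, h)                                 # assumed cell(x_t, h_prev) -> h_next
```
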
| Component | Description |
|---|---|
| Upsampling1d | 1D upsampling |
| Downsample1d | 1D downsampling |
| Upsample2d | 2D upsampling |
| Downsample2d | 2D downsampling |
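
The resampling modules are presumably the building blocks behind the strided convolution and pooling layers (a stride-k layer expressed as its stride-1 counterpart followed by downsampling by k). The factor-based constructor below is an assumption, as are the exact output shapes.

```python
import numpy as np
import nebtorch
from nebtorch.nn import Downsample2d, Upsample2d

x = nebtorch.tensor(np.random.randn(2, 3, 8, 8))   # (N, C, H, W)

down = Downsample2d(2)   # assumed: keep every 2nd element along H and W
up = Upsample2d(2)       # assumed: insert zeros between elements (factor 2)

y = down(x)              # spatial size roughly halved (exact convention assumed)
z = up(y)                # spatial size roughly restored, with zeros interleaved
```
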
| Component | Description |
|---|---|
| MultiheadAttention | Multi-head attention mechanism |
| Scaled Dot-Product Attention | Scaled dot-product attention |
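
A sketch of calling `MultiheadAttention` on a batch of sequences. The `(embed_dim, num_heads)` constructor, the `(query, key, value)` call convention, the batch-first layout, and the import path follow `torch.nn.MultiheadAttention` and are assumptions about NebTorch's API.

```python
import numpy as np
import nebtorch
from nebtorch.nn import MultiheadAttention

batch, seq_len, embed_dim, num_heads = 2, 5, 32, 4

attn = MultiheadAttention(embed_dim, num_heads)   # assumed (embed_dim, num_heads)
x = nebtorch.tensor(np.random.randn(batch, seq_len, embed_dim))

# Self-attention: query, key, and value are all the same sequence.
out = attn(x, x, x)                               # assumed (q, k, v) call; expected shape (2, 5, 32)
```
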
| Component | Description |
|---|---|
| Loss | Base class for all loss functions |
| MSELoss | Mean Squared Error loss |
| CrossEntropyLoss | Cross-entropy loss with softmax |
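
Losses follow the same `criterion(prediction, target)` call pattern as `CrossEntropyLoss` in the MLP example. The sketch below assumes `MSELoss` lives in `nebtorch.nn` alongside it, takes predictions and targets of the same shape, and reduces with a mean.

```python
import numpy as np
import nebtorch
from nebtorch.nn import MSELoss

pred = nebtorch.tensor(np.array([[0.2], [0.7], [1.1]]))
target = nebtorch.tensor(np.array([[0.0], [1.0], [1.0]]))

criterion = MSELoss()
loss = criterion(pred, target)   # mean squared error over all elements (assumed reduction)
loss.backward()                  # gradients flow back to `pred`
print(loss.data)
```
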
| Component | Description |
|---|---|
| SGD | Stochastic Gradient Descent |