Troubleshooting

This guide covers common issues and their solutions when using AiDotNet.

Installation Issues

Package Not Found

Error:

error: Unable to find package 'AiDotNet'

Solutions:

Check your internet connection
Clear NuGet cache: dotnet nuget locals all --clear
Add nuget.org source: dotnet nuget add source https://api.nuget.org/v3/index.json

Version Conflicts

Error:

error: Version conflict detected for System.Memory

Solution: Add an explicit package reference to your project file:

<PackageReference Include="System.Memory" Version="4.5.5" />

Platform Not Supported

Error:

System.PlatformNotSupportedException: AiDotNet.Gpu is not supported on this platform

Solutions:

Verify you're on a supported platform (Windows x64, Linux x64, macOS)
For GPU: ensure CUDA is installed (Windows/Linux) or Metal is available (macOS)
Use the CPU-only version if a GPU is not available
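
To confirm which platform the process is actually running on, the check can be sketched with the .NET base class library alone. This is a minimal sketch, not AiDotNet's own detection logic: the helper names IsListedPlatform and CurrentOs are hypothetical, and the accepted OS/architecture pairs are an assumption taken from the list above.

```csharp
using System;
using System.Runtime.InteropServices;

// Returns true when the OS/architecture pair matches the platforms listed above.
// Which combinations AiDotNet really accepts is an assumption based on that list.
static bool IsListedPlatform(OSPlatform os, Architecture arch)
{
    if (os == OSPlatform.OSX) return true; // the list does not restrict macOS by architecture
    bool isWindowsOrLinux = os == OSPlatform.Windows || os == OSPlatform.Linux;
    return isWindowsOrLinux && arch == Architecture.X64;
}

// Detects the current OS using only RuntimeInformation.
static OSPlatform CurrentOs()
{
    if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return OSPlatform.Windows;
    if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) return OSPlatform.Linux;
    if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) return OSPlatform.OSX;
    return OSPlatform.Create("UNKNOWN");
}

Console.WriteLine($"OS: {RuntimeInformation.OSDescription}");
Console.WriteLine($"Process architecture: {RuntimeInformation.ProcessArchitecture}");
Console.WriteLine($"Listed as supported: {IsListedPlatform(CurrentOs(), RuntimeInformation.ProcessArchitecture)}");
```

A process architecture of X86 on a 64-bit machine usually means the project is being built or launched as 32-bit.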

GPU Issues

CUDA Not Found

Error:

CUDA driver not found. Please install CUDA toolkit.

Solutions:

Install NVIDIA driver (version 525+)
Install CUDA Toolkit 12.x from NVIDIA
Verify the driver with nvidia-smi and the toolkit with nvcc --version
Add CUDA to PATH:
- Windows: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin
- Linux: /usr/local/cuda/bin
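
Whether the CUDA tools are reachable from the process can be checked by scanning PATH for them. The sketch below uses only the base class library; FindOnPath is a hypothetical helper, and it assumes the executables are named nvidia-smi and nvcc (with an .exe suffix on Windows).

```csharp
using System;
using System.IO;
using System.Linq;

// Searches a PATH-style string for an executable and returns its full path, or null.
static string? FindOnPath(string fileName, string? pathVariable)
{
    if (string.IsNullOrEmpty(pathVariable)) return null;
    return pathVariable
        .Split(Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries)
        .Select(entry => Path.Combine(entry.Trim(), fileName))
        .FirstOrDefault(candidate => File.Exists(candidate));
}

string exeSuffix = OperatingSystem.IsWindows() ? ".exe" : "";
string? pathValue = Environment.GetEnvironmentVariable("PATH");
foreach (string tool in new[] { "nvidia-smi", "nvcc" })
{
    string? found = FindOnPath(tool + exeSuffix, pathValue);
    Console.WriteLine(found is null ? $"{tool}: not found on PATH" : $"{tool}: {found}");
}
```

Run this from the same shell or service account that launches training, since PATH can differ between a terminal, an IDE, and a background service.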

GPU Out of Memory

Error:

CUDA out of memory. Tried to allocate 2.00 GiB

Solutions:

Reduce batch size:
```
.WithBatchSize(16)  // or smaller
```

Enable gradient checkpointing:

.WithGradientCheckpointing(enabled: true)

Use mixed precision:
```
.WithMixedPrecision(enabled: true)
```
Clear GPU cache:
```
GpuMemory.ClearCache();
```
Use a smaller model variant

GPU Not Detected

Error:

No CUDA-capable device detected

Solutions:

Verify that the GPU is visible to the driver: nvidia-smi
Reinstall NVIDIA drivers
Check CUDA compatibility with your GPU
Try rebooting

Training Issues

Model Not Learning (Loss Not Decreasing)

Possible Causes and Solutions:

Learning rate too high

.WithOptimizer(OptimizerType.Adam, learningRate: 1e-4)  // Try smaller

Learning rate too low

.WithOptimizer(OptimizerType.Adam, learningRate: 1e-2)  // Try larger

Data not normalized

.ConfigurePreprocessing(p => p.Normalize())

Wrong loss function
- Multi-class classification: CrossEntropy
- Binary classification: BinaryCrossEntropy
- Regression: MSE or MAE

Network too simple
- Add more layers
- Increase hidden units
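
To make the "data not normalized" cause concrete, here is a minimal sketch of z-score standardization for a single feature column. It is not what the library's Normalize() call does internally (that is not specified here); Standardize is a hypothetical helper, and it uses the population standard deviation.

```csharp
using System;
using System.Linq;

// Standardizes one feature column to zero mean and unit variance (z-score).
// A constant column has zero variance, so it is mapped to all zeros instead of dividing by zero.
static float[] Standardize(float[] values)
{
    double mean = values.Average(v => (double)v);
    double variance = values.Average(v => (v - mean) * (v - mean));
    double std = Math.Sqrt(variance);
    return values.Select(v => std > 0 ? (float)((v - mean) / std) : 0f).ToArray();
}

float[] rawPixels = { 0f, 64f, 128f, 255f };
Console.WriteLine(string.Join(", ", Standardize(rawPixels).Select(v => v.ToString("F3"))));
```

Inputs on very different scales (for example, raw pixel values next to values between 0 and 1) are a common reason a loss curve stays flat, so it is worth inspecting the feature ranges before tuning the optimizer.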

NaN Loss

Error:

Loss is NaN

Solutions:

Reduce the learning rate significantly (for example, by a factor of 10)
Add gradient clipping:
```
.WithGradientClipping(maxNorm: 1.0)
```

Check for NaN or infinite values in input data:

if (data.Any(x => float.IsNaN(x) || float.IsInfinity(x))) throw new ArgumentException("Input data contains NaN or Infinity");

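Use numerically stable operations (for example, add a small epsilon before taking a logarithm or dividing)
Initialize weights properly (for example, Xavier/Glorot or He initialization)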
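
As one example of a numerically stable operation, the sketch below compares a naive softmax with the usual max-subtraction variant. It is a standalone illustration, not AiDotNet code; StableSoftmax and NaiveSoftmax are hypothetical helper names.

```csharp
using System;
using System.Linq;

// Softmax with the maximum logit subtracted first. The result is mathematically
// unchanged, but Math.Exp never sees a large positive argument, so it cannot overflow.
static double[] StableSoftmax(double[] logits)
{
    double max = logits.Max();
    double[] exps = logits.Select(z => Math.Exp(z - max)).ToArray();
    double sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}

// Naive version for comparison: Math.Exp(1000) is Infinity, and Infinity / Infinity is NaN.
static double[] NaiveSoftmax(double[] logits)
{
    double[] exps = logits.Select(z => Math.Exp(z)).ToArray();
    double sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}

double[] largeScores = { 1000.0, 1001.0, 1002.0 };
Console.WriteLine(string.Join(", ", NaiveSoftmax(largeScores)));   // every entry is NaN
Console.WriteLine(string.Join(", ", StableSoftmax(largeScores)));  // finite values that sum to 1
```

A single NaN produced this way propagates through the loss and every gradient, which is why the whole run reports a NaN loss rather than one bad sample.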

Overfitting

Symptoms:

Training accuracy high, validation accuracy low
Training loss low, validation loss high

Solutions:

Add regularization:

nn.AddDropout(0.5)
nn.AddBatchNormalization()

Use weight decay:

.WithOptimizer(OptimizerType.AdamW, weightDecay: 0.01)

Early stopping:
```
.WithEarlyStopping(patience: 10)
```

Data augmentation:

.WithDataAugmentation(aug => aug
    .RandomHorizontalFlip()
    .RandomRotation(15))

Get more training data
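
The patience value used for early stopping can be illustrated with a small standalone sketch. It assumes the common rule "stop after patience consecutive epochs without a new best validation loss"; the library's exact rule may differ, and StopEpoch is a hypothetical helper.

```csharp
using System;

// Returns the 0-based epoch at which training would stop, given per-epoch
// validation losses, or -1 if the patience budget is never exhausted.
static int StopEpoch(double[] validationLosses, int patience)
{
    double best = double.PositiveInfinity;
    int epochsWithoutImprovement = 0;
    for (int epoch = 0; epoch < validationLosses.Length; epoch++)
    {
        if (validationLosses[epoch] < best)
        {
            best = validationLosses[epoch];
            epochsWithoutImprovement = 0; // a new best resets the counter
        }
        else if (++epochsWithoutImprovement >= patience)
        {
            return epoch;
        }
    }
    return -1;
}

double[] lossHistory = { 0.90, 0.70, 0.60, 0.62, 0.61, 0.65 };
Console.WriteLine(StopEpoch(lossHistory, patience: 3)); // prints 5
```

A small patience stops early on noisy validation curves, while a large one lets the model overfit for longer before stopping.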

Underfitting

Symptoms:

Both training and validation accuracy are low
Model performs poorly on all data

Solutions:

Use a more complex model
Train for more epochs
Reduce regularization
Engineer more informative features
Check data quality

Class Imbalance

Symptoms:

Model always predicts majority class
High accuracy but low recall on minority class

Solutions:

Class weighting:

.ConfigureDataHandling(d => d.ClassWeights(ClassWeights.Balanced))

Oversampling:

.ConfigureDataHandling(d => d.Resample(ResamplingMethod.SMOTE))

Use appropriate metrics (F1, AUC instead of accuracy)

Focal loss for extreme imbalance:

.WithLossFunction(LossType.FocalLoss, gamma: 2.0)
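
To see what class weighting does numerically, here is a minimal sketch of one common "balanced" scheme, where each class weight is total samples divided by (number of classes times the class count). Whether ClassWeights.Balanced uses exactly this formula is an assumption; BalancedWeights is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// weight(c) = totalSamples / (numClasses * count(c)), so rarer classes get larger weights.
static Dictionary<int, double> BalancedWeights(IReadOnlyList<int> labels)
{
    var counts = labels.GroupBy(l => l).ToDictionary(g => g.Key, g => g.Count());
    return counts.ToDictionary(
        kv => kv.Key,
        kv => (double)labels.Count / (counts.Count * kv.Value));
}

int[] sampleLabels = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 }; // 9 majority samples, 1 minority sample
foreach (var (label, weight) in BalancedWeights(sampleLabels).OrderBy(kv => kv.Key))
    Console.WriteLine($"class {label}: weight {weight:F3}");
```

With nine samples of class 0 and one of class 1, the minority class receives nine times the weight of the majority class, so a single misclassified minority sample costs as much as nine majority ones.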

Memory Issues

Out of Memory (CPU)

Error:

System.OutOfMemoryException

Solutions:

Use streaming/batching for large datasets:

.ConfigureDataLoading(d => d.StreamFromDisk = true)

Reduce batch size
Use memory-mapped files for large data
Process data in chunks
Run as a 64-bit process (a 32-bit process cannot address more than 4 GB)
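
Chunked processing and the 64-bit check can be sketched without any library support. In this illustration, InChunks and FakeRows are hypothetical helpers, and FakeRows stands in for rows read lazily from disk; on .NET 6 and later, Enumerable.Chunk offers similar batching out of the box.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Lazily groups a sequence into arrays of at most chunkSize items,
// so only one chunk is held in memory at a time.
static IEnumerable<float[]> InChunks(IEnumerable<float> source, int chunkSize)
{
    var buffer = new List<float>(chunkSize);
    foreach (float value in source)
    {
        buffer.Add(value);
        if (buffer.Count == chunkSize)
        {
            yield return buffer.ToArray();
            buffer.Clear();
        }
    }
    if (buffer.Count > 0) yield return buffer.ToArray(); // final partial chunk
}

// Stand-in for rows streamed from disk; values are produced one at a time.
static IEnumerable<float> FakeRows(int n)
{
    for (int i = 0; i < n; i++) yield return i;
}

Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
double runningTotal = 0;
foreach (float[] chunk in InChunks(FakeRows(10), chunkSize: 4))
{
    runningTotal += chunk.Sum();
    Console.WriteLine($"processed chunk of {chunk.Length} values");
}
Console.WriteLine($"sum = {runningTotal}");
```

If Is64BitProcess prints False, check the project's platform target before changing anything else.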

Memory Leak

Symptoms:

Memory usage grows continuously during training

Solutions:

Dispose resources properly:
```
using var tensor = CreateTensor();
```

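Force a garbage collection to check whether the memory is reclaimable (a diagnostic step; it cannot free objects that are still referenced):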

GC.Collect();
GC.WaitForPendingFinalizers();

Use using statements for all disposable objects
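
One way to confirm a managed leak is to measure heap growth across repeated iterations. The sketch below is a rough diagnostic built on GC.GetTotalMemory; ManagedGrowthBytes is a hypothetical helper, and it only sees managed memory, so native or GPU buffers held by undisposed objects will not show up here.

```csharp
using System;
using System.Collections.Generic;

// Runs an action repeatedly and reports how much the managed heap grew.
// GC.GetTotalMemory(true) forces a collection first, so only memory that is
// still reachable is counted.
static long ManagedGrowthBytes(Action step, int iterations)
{
    step(); // warm-up so one-time allocations are not counted as growth
    long before = GC.GetTotalMemory(forceFullCollection: true);
    for (int i = 0; i < iterations; i++) step();
    long after = GC.GetTotalMemory(forceFullCollection: true);
    return after - before;
}

var retained = new List<byte[]>();
// This step keeps every buffer alive by storing it in a list.
long leaky = ManagedGrowthBytes(() => retained.Add(new byte[100_000]), 50);
// This step lets each buffer become garbage immediately.
long clean = ManagedGrowthBytes(() => { var tmp = new byte[100_000]; tmp[0] = 1; }, 50);
Console.WriteLine($"retaining step grew by ~{leaky / 1024} KiB");
Console.WriteLine($"non-retaining step grew by ~{clean / 1024} KiB");
```

Growth that scales with the iteration count points to references being retained (for example, in a list, cache, or event handler) rather than to garbage collector timing.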

Data Issues

Shape Mismatch

Error:

Shape mismatch: expected [32, 784] but got [32, 28, 28]

Solutions:

Flatten input if needed:
```
nn.AddFlattenLayer()
```
Check data preprocessing pipeline
Verify input dimensions match model expectations
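
The error message above corresponds to a batch of 28x28 images being passed where flattened 784-element rows are expected. A minimal sketch of that reshape using plain arrays (FlattenImages is a hypothetical helper, not the library's flatten layer):

```csharp
using System;

// Flattens [batch, height, width] into [batch, height * width] in row-major order.
static float[,] FlattenImages(float[,,] images)
{
    int batch = images.GetLength(0), height = images.GetLength(1), width = images.GetLength(2);
    var flat = new float[batch, height * width];
    for (int b = 0; b < batch; b++)
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                flat[b, y * width + x] = images[b, y, x];
    return flat;
}

var batchOfImages = new float[32, 28, 28];
float[,] flattened = FlattenImages(batchOfImages);
Console.WriteLine($"[{flattened.GetLength(0)}, {flattened.GetLength(1)}]"); // prints [32, 784]
```

Printing the shape at each stage of the preprocessing pipeline is usually the quickest way to find where the dimensions stop matching.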

Data Type Mismatch

Error:

Cannot convert double[] to float[]

Solutions:

Use consistent types throughout:

var features = data.Select(x => (float)x).ToArray();

Specify correct type parameter:

new PredictionModelBuilder<float, float[], int>()  // not double
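
When converting existing double data, it helps to guard against values that do not fit in a float, since the cast silently turns them into Infinity. A minimal sketch (ToFloatArray is a hypothetical helper):

```csharp
using System;

// Converts double-precision features to single precision. Finite values outside
// the float range become +/-Infinity after the cast, so they are rejected here.
static float[] ToFloatArray(double[] values)
{
    float[] result = Array.ConvertAll(values, v => (float)v);
    for (int i = 0; i < result.Length; i++)
    {
        if (float.IsInfinity(result[i]) && !double.IsInfinity(values[i]))
            throw new OverflowException($"Value at index {i} does not fit in a float: {values[i]}");
    }
    return result;
}

double[] rawFeatures = { 0.5, 1.25, 3.0 };
float[] floatFeatures = ToFloatArray(rawFeatures);
Console.WriteLine(string.Join(", ", floatFeatures));
```

Converting once at load time keeps the rest of the pipeline on a single numeric type and avoids repeated casts inside the training loop.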

Model Loading Issues

Model Not Found

Error:

FileNotFoundException: Could not find model file 'model.aidotnet'

Solutions:

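Check that the file path is correct
Use an absolute path (relative paths are resolved against the process working directory, which may differ from the project folder)
Verify that the file exists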
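
A quick way to see where a relative path actually points is to resolve it before loading. This sketch uses only System.IO; ResolveModelPath is a hypothetical helper, and model.aidotnet is the file name from the error message above.

```csharp
using System;
using System.IO;

// Resolves a possibly relative path against the current working directory
// and reports whether a file exists there.
static (string FullPath, bool Exists) ResolveModelPath(string path)
{
    string fullPath = Path.GetFullPath(path);
    return (fullPath, File.Exists(fullPath));
}

var (resolved, exists) = ResolveModelPath("model.aidotnet");
Console.WriteLine($"Working directory: {Directory.GetCurrentDirectory()}");
Console.WriteLine($"Resolved path:     {resolved}");
Console.WriteLine($"File exists:       {exists}");
```

If the model ships next to the executable, building the path from AppContext.BaseDirectory is often more reliable than relying on the working directory.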

Model Version Incompatible

Error:

Model version 1.0 is not compatible with AiDotNet version 2.0

Solutions:

Retrain model with current version

Use model migration:

var model = await Model.LoadAndMigrateAsync("old_model.aidotnet");

Corrupted Model File

Error:

InvalidDataException: Model file is corrupted

Solutions:

Re-download or retrain the model
Check file integrity (for example, compare a checksum against the original file)
Ensure the file wasn't modified or truncated during transfer
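
File integrity can be checked by comparing a SHA-256 digest of the local file with one computed where the model was saved. The sketch below assumes .NET 5 or later (for Convert.ToHexString); Sha256Hex is a hypothetical helper, and the temporary file stands in for a real model file.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Computes the SHA-256 digest of a file as lowercase hex, streaming the file
// so large models are not loaded into memory at once.
static string Sha256Hex(string path)
{
    using var sha = SHA256.Create();
    using var stream = File.OpenRead(path);
    byte[] hash = sha.ComputeHash(stream);
    return Convert.ToHexString(hash).ToLowerInvariant();
}

// Demo on a temporary file; replace demoFile with the path to your model.
string demoFile = Path.GetTempFileName();
File.WriteAllText(demoFile, "abc");
Console.WriteLine(Sha256Hex(demoFile));
File.Delete(demoFile);
```

If the two digests differ, the file changed after it was saved, and re-downloading or retraining is the safer option.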

Performance Issues

Slow Training

Solutions:

Enable GPU:

.ConfigureGpu(gpu => gpu.Enabled = true)

Increase batch size (if memory allows)
Use mixed precision:
```
.WithMixedPrecision(enabled: true)
```

Use multiple data loading workers:

.ConfigureDataLoading(d => d.NumWorkers = 4)

Profile to find bottlenecks:

.ConfigureProfiling(p => p.Enabled = true)

Slow Inference

Solutions:

Use batch inference
Enable GPU for inference

Use model quantization:

var quantized = model.Quantize(QuantizationType.INT8);

Export to ONNX and use ONNX Runtime

Getting Help

If you can't resolve your issue:

Search existing issues: GitHub Issues
Create a new issue with:
- AiDotNet version
- .NET version
- OS and hardware
- Minimal reproducible code
- Full error message and stack trace
Ask in discussions: GitHub Discussions
