Troubleshooting

franklinic edited this page Jan 19, 2026 · 1 revision

This guide covers common issues and their solutions when using AiDotNet.

Installation Issues

Package Not Found

Error:

error: Unable to find package 'AiDotNet'

Solutions:

  1. Check your internet connection
  2. Clear NuGet cache: dotnet nuget locals all --clear
  3. Add the nuget.org source if it is missing: dotnet nuget add source https://api.nuget.org/v3/index.json --name nuget.org

Version Conflicts

Error:

error: Version conflict detected for System.Memory

Solution: Add an explicit package reference to your project file (.csproj) so that a single version is resolved:

<PackageReference Include="System.Memory" Version="4.5.5" />

Platform Not Supported

Error:

System.PlatformNotSupportedException: AiDotNet.Gpu is not supported on this platform

Solutions:

  1. Verify you're on a supported platform (Windows x64, Linux x64, macOS)
  2. For GPU: ensure CUDA is installed (Windows/Linux) or Metal is available (macOS)
  3. Use CPU-only version if GPU is not available
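
To confirm which platform the runtime actually sees, the check in step 1 can be sketched with the .NET base class library alone. `PlatformReport` is a hypothetical helper name and nothing here calls AiDotNet.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PlatformReport
{
    // Builds a one-line summary of the OS, machine architecture, and process bitness.
    public static string Describe()
    {
        string os =
            RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "Windows" :
            RuntimeInformation.IsOSPlatform(OSPlatform.Linux) ? "Linux" :
            RuntimeInformation.IsOSPlatform(OSPlatform.OSX) ? "macOS" : "Other";

        // OSArchitecture describes the machine; ProcessArchitecture describes this process.
        return $"{os} / OS arch {RuntimeInformation.OSArchitecture} / " +
               $"process arch {RuntimeInformation.ProcessArchitecture} / " +
               $"64-bit process: {Environment.Is64BitProcess}";
    }
}
```

Printing `PlatformReport.Describe()` at startup and including it in bug reports makes platform mismatches easier to spot.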

GPU Issues

CUDA Not Found

Error:

CUDA driver not found. Please install CUDA toolkit.

Solutions:

  1. Install NVIDIA driver (version 525+)
  2. Install CUDA Toolkit 12.x from NVIDIA
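  3. Verify the driver with nvidia-smi and the toolkit with nvcc --version
  4. Add the CUDA bin directory to PATH (adjust the version folder to match your install):
    • Windows: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin
    • Linux: /usr/local/cuda/bin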
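
If it is unclear whether the PATH change took effect for your process, a small lookup like the following can help. This is a sketch using only the base class library; `PathLookup` is a hypothetical helper and it simply searches the directories listed in a PATH-style string.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class PathLookup
{
    // Searches each directory in `pathVariable` for `tool`; on success, `fullPath`
    // holds the first match.
    public static bool TryFind(string tool, string pathVariable, out string fullPath)
    {
        // Windows executables carry an .exe suffix; Linux and macOS ones do not.
        string fileName = RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? tool + ".exe"
            : tool;

        foreach (string dir in pathVariable.Split(Path.PathSeparator))
        {
            if (dir.Length == 0) continue;
            string candidate = Path.Combine(dir, fileName);
            if (File.Exists(candidate))
            {
                fullPath = candidate;
                return true;
            }
        }
        fullPath = "";
        return false;
    }
}
```

For example, `PathLookup.TryFind("nvcc", Environment.GetEnvironmentVariable("PATH") ?? "", out var path)` reports whether the toolkit compiler is reachable from the current process.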

GPU Out of Memory

Error:

CUDA out of memory. Tried to allocate 2.00 GiB

Solutions:

  1. Reduce batch size:

    .WithBatchSize(16)  // or smaller
  2. Enable gradient checkpointing:

    .WithGradientCheckpointing(enabled: true)
  3. Use mixed precision:

    .WithMixedPrecision(enabled: true)
  4. Clear GPU cache:

    GpuMemory.ClearCache();
  5. Use smaller model variant
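
Finding a batch size that fits is usually a matter of halving until the error goes away. The sketch below automates that search. `fits` is a hypothetical callback that runs one training step at the given batch size and returns false when it runs out of memory; how that failure surfaces in AiDotNet is not assumed here.

```csharp
using System;

public static class BatchSizeSearch
{
    // Halves the batch size until `fits` succeeds; returns 0 if even a batch of 1 fails.
    public static int LargestWorking(int start, Func<int, bool> fits)
    {
        if (start < 1) throw new ArgumentOutOfRangeException(nameof(start));

        for (int size = start; size >= 1; size /= 2)
        {
            if (fits(size)) return size;
        }
        return 0;
    }
}
```

Starting at 64 with a limit of 20 samples, the search tries 64, 32, and then settles on 16.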

GPU Not Detected

Error:

No CUDA-capable device detected

Solutions:

  1. Verify that the GPU is visible to the driver: nvidia-smi
  2. Reinstall NVIDIA drivers
  3. Check CUDA compatibility with your GPU
  4. Try rebooting

Training Issues

Model Not Learning (Loss Not Decreasing)

Possible Causes and Solutions:

  1. Learning rate too high

    .WithOptimizer(OptimizerType.Adam, learningRate: 1e-4)  // Try smaller
  2. Learning rate too low

    .WithOptimizer(OptimizerType.Adam, learningRate: 1e-2)  // Try larger
  3. Data not normalized

    .ConfigurePreprocessing(p => p.Normalize())
  4. Wrong loss function

    • Multi-class classification: CrossEntropy
    • Binary classification: BinaryCrossEntropy
    • Regression: MSE or MAE
  5. Network too simple

    • Add more layers
    • Increase hidden units
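
To illustrate what normalization in cause 3 usually means, here is a minimal z-score standardization in plain C#. It is an assumption that this matches what `Normalize()` does; the wiki does not say which scheme AiDotNet applies, and `Standardizer` is a hypothetical name.

```csharp
using System;
using System.Linq;

public static class Standardizer
{
    // Returns (x - mean) / std for each value; a constant input maps to all zeros.
    public static float[] Standardize(float[] values)
    {
        if (values.Length == 0) return Array.Empty<float>();

        double mean = values.Average(v => (double)v);
        double variance = values.Average(v => (v - mean) * (v - mean));
        double std = Math.Sqrt(variance);

        if (std == 0) return new float[values.Length];
        return values.Select(v => (float)((v - mean) / std)).ToArray();
    }
}
```

After this step each feature has zero mean and unit variance, which keeps gradients on a comparable scale across features.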

NaN Loss

Error:

Loss is NaN

Solutions:

  1. Reduce learning rate significantly
  2. Add gradient clipping:
    .WithGradientClipping(maxNorm: 1.0)
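  3. Check for NaN or infinite values in the input data:
    if (data.Any(x => float.IsNaN(x) || float.IsInfinity(x)))  // needs using System.Linq
        throw new ArgumentException("Input data contains NaN or Infinity");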
  4. Use numerically stable operations
  5. Initialize weights properly
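
For intuition about what gradient clipping with a `maxNorm` does, the following sketch rescales a gradient vector so that its L2 norm never exceeds the limit. It is a generic illustration on a plain `float[]`, not AiDotNet's implementation, and `GradientClipper` is a hypothetical name.

```csharp
using System;

public static class GradientClipper
{
    // Scales `gradients` in place so their L2 norm does not exceed `maxNorm`.
    // Returns the norm measured before clipping.
    public static double ClipByNorm(float[] gradients, double maxNorm)
    {
        double sumSquares = 0;
        foreach (float g in gradients) sumSquares += (double)g * g;
        double norm = Math.Sqrt(sumSquares);

        if (norm > maxNorm && norm > 0)
        {
            float scale = (float)(maxNorm / norm);
            for (int i = 0; i < gradients.Length; i++) gradients[i] *= scale;
        }
        return norm;
    }
}
```

Logging the returned pre-clip norm each step is a cheap way to see whether gradients are exploding before the loss turns into NaN.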

Overfitting

Symptoms:

  • Training accuracy high, validation accuracy low
  • Training loss low, validation loss high

Solutions:

  1. Add regularization:

    nn.AddDropout(0.5)
    nn.AddBatchNormalization()
  2. Use weight decay:

    .WithOptimizer(OptimizerType.AdamW, weightDecay: 0.01)
  3. Early stopping:

    .WithEarlyStopping(patience: 10)
  4. Data augmentation:

    .WithDataAugmentation(aug => aug
        .RandomHorizontalFlip()
        .RandomRotation(15))
  5. Get more training data
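
The `patience` setting in early stopping is easiest to understand as a counter of epochs without improvement. The class below is a standalone sketch of that logic under the assumption that lower validation loss is better; `EarlyStopper` is a hypothetical name and the library's own stopping rule may differ in detail.

```csharp
using System;

public sealed class EarlyStopper
{
    private readonly int _patience;
    private double _best = double.PositiveInfinity;
    private int _epochsWithoutImprovement;

    public EarlyStopper(int patience) => _patience = patience;

    // Call once per epoch with the validation loss; returns true when training should stop.
    public bool ShouldStop(double validationLoss)
    {
        if (validationLoss < _best)
        {
            _best = validationLoss;
            _epochsWithoutImprovement = 0;
        }
        else
        {
            _epochsWithoutImprovement++;
        }
        return _epochsWithoutImprovement >= _patience;
    }
}
```

With `patience: 2`, the losses 1.0, 0.9, 0.95, 0.93 stop training on the fourth epoch, because neither of the last two beat 0.9.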

Underfitting

Symptoms:

  • Both training and validation accuracy are low
  • Model performs poorly on all data

Solutions:

  1. Use a more complex model
  2. Train for more epochs
  3. Reduce regularization
  4. Add or engineer more informative features
  5. Check data quality (label errors, missing values)

Class Imbalance

Symptoms:

  • Model always predicts majority class
  • High accuracy but low recall on minority class
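
The second symptom is easy to reproduce with a few lines of arithmetic. The sketch below computes accuracy and minority-class recall from plain label arrays; it is independent of AiDotNet, and `BinaryMetrics` is a hypothetical helper.

```csharp
using System;

public static class BinaryMetrics
{
    // Labels are 0 (majority) or 1 (minority). Returns accuracy and minority-class recall.
    public static (double Accuracy, double Recall) Evaluate(int[] actual, int[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int correct = 0, truePositives = 0, positives = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            if (actual[i] == predicted[i]) correct++;
            if (actual[i] == 1)
            {
                positives++;
                if (predicted[i] == 1) truePositives++;
            }
        }

        double accuracy = actual.Length == 0 ? 0 : (double)correct / actual.Length;
        double recall = positives == 0 ? 0 : (double)truePositives / positives;
        return (accuracy, recall);
    }
}
```

With 95 negatives and 5 positives, a model that always predicts 0 scores 95% accuracy and 0% recall on the minority class.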

Solutions:

  1. Class weighting:

    .ConfigureDataHandling(d => d.ClassWeights(ClassWeights.Balanced))
  2. Oversampling:

    .ConfigureDataHandling(d => d.Resample(ResamplingMethod.SMOTE))
  3. Use appropriate metrics (F1, AUC instead of accuracy)

  4. Focal loss for extreme imbalance:

    .WithLossFunction(LossType.FocalLoss, gamma: 2.0)
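
As a rough guide to what balanced class weights look like, a common heuristic is total samples divided by (number of classes × samples in the class). The sketch below computes that; whether `ClassWeights.Balanced` uses exactly this formula is an assumption, and `BalancedWeights` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BalancedWeights
{
    // weight[c] = totalSamples / (numClasses * count[c]); rarer classes get larger weights.
    public static Dictionary<int, double> Compute(IEnumerable<int> labels)
    {
        var counts = labels.GroupBy(l => l).ToDictionary(g => g.Key, g => g.Count());
        int total = counts.Values.Sum();
        int numClasses = counts.Count;

        return counts.ToDictionary(
            kv => kv.Key,
            kv => (double)total / (numClasses * (double)kv.Value));
    }
}
```

For 90 samples of class 0 and 10 of class 1, this gives weights of about 0.56 and 5.0, so each minority error counts roughly nine times as much.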

Memory Issues

Out of Memory (CPU)

Error:

System.OutOfMemoryException

Solutions:

  1. Use streaming/batching for large datasets:

    .ConfigureDataLoading(d => d.StreamFromDisk = true)
  2. Reduce batch size

  3. Use memory-mapped files for large data

  4. Process data in chunks

  5. Run as a 64-bit process; a 32-bit process cannot address more than 4 GB regardless of installed RAM
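
Processing data in chunks (step 4) can be sketched as follows. The example assumes a hypothetical file layout of raw little-endian 32-bit floats and reuses one buffer, so memory use stays constant no matter how large the file is.

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Reads the file as float32 values and hands them to `process` one chunk at a time.
    // The second argument to `process` is the number of valid entries in the buffer.
    public static void ForEachChunk(string path, int floatsPerChunk, Action<float[], int> process)
    {
        if (floatsPerChunk < 1) throw new ArgumentOutOfRangeException(nameof(floatsPerChunk));

        using var reader = new BinaryReader(File.OpenRead(path));
        long remaining = reader.BaseStream.Length / sizeof(float);
        var chunk = new float[floatsPerChunk];   // reused for every chunk

        while (remaining > 0)
        {
            int count = (int)Math.Min(floatsPerChunk, remaining);
            for (int i = 0; i < count; i++) chunk[i] = reader.ReadSingle();
            process(chunk, count);
            remaining -= count;
        }
    }
}
```

Because the buffer is reused, the callback should copy anything it needs to keep beyond the current chunk.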

Memory Leak

Symptoms:

  • Memory usage grows continuously during training

Solutions:

  1. Dispose resources properly:

    using var tensor = CreateTensor();
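  2. Force a full garbage collection periodically (mainly useful as a diagnostic; it does not release objects that are still referenced):

    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();  // reclaim objects freed by the finalizers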
  3. Use using statements for all disposable objects
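
The effect of a missing `using` can be shown with a small stand-in type. `TrackedBuffer` is hypothetical and not an AiDotNet type; it only counts how many instances have been created but not yet disposed.

```csharp
using System;
using System.Threading;

// Stand-in for any type that owns unmanaged memory.
public sealed class TrackedBuffer : IDisposable
{
    private static int _live;
    private bool _disposed;

    public static int LiveCount => Volatile.Read(ref _live);

    public TrackedBuffer() => Interlocked.Increment(ref _live);

    public void Dispose()
    {
        if (_disposed) return;   // Dispose must be safe to call more than once
        _disposed = true;
        Interlocked.Decrement(ref _live);
    }
}

public static class LeakDemo
{
    // Creates one buffer per iteration and returns the live count afterwards.
    public static int Run(int iterations, bool dispose)
    {
        for (int i = 0; i < iterations; i++)
        {
            if (dispose)
            {
                using var buffer = new TrackedBuffer();   // disposed at end of scope
            }
            else
            {
                _ = new TrackedBuffer();                  // never disposed: count grows
            }
        }
        return TrackedBuffer.LiveCount;
    }
}
```

A similar counter added temporarily to your own disposable wrappers can show which allocation site is responsible for steady growth.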

Data Issues

Shape Mismatch

Error:

Shape mismatch: expected [32, 784] but got [32, 28, 28]

Solutions:

  1. Flatten input if needed:

    nn.AddFlattenLayer()
  2. Check data preprocessing pipeline

  3. Verify input dimensions match model expectations
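
For the error shown above, flattening means turning each 28×28 sample into a single row of 784 values. Here is a sketch on plain multidimensional arrays, independent of AiDotNet's tensor types; `ShapeHelper` is a hypothetical name.

```csharp
using System;

public static class ShapeHelper
{
    // Converts [batch, height, width] into [batch, height * width], row-major per sample.
    public static float[,] Flatten(float[,,] images)
    {
        int batch = images.GetLength(0);
        int height = images.GetLength(1);
        int width = images.GetLength(2);

        var flat = new float[batch, height * width];
        for (int n = 0; n < batch; n++)
            for (int r = 0; r < height; r++)
                for (int c = 0; c < width; c++)
                    flat[n, r * width + c] = images[n, r, c];
        return flat;
    }
}
```

Printing `GetLength(0)` and `GetLength(1)` of the result before training is a quick way to confirm the shape matches what the first layer expects.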

Data Type Mismatch

Error:

Cannot convert double[] to float[]

Solutions:

  1. Use consistent types throughout:

    var features = data.Select(x => (float)x).ToArray();
  2. Specify correct type parameter:

    new PredictionModelBuilder<float, float[], int>()  // not double
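
One caveat when casting: a `double` that is too large for `float` silently becomes infinity, which can later show up as a NaN loss. The sketch below converts an array and fails loudly in that case; `FloatConversion` is a hypothetical helper.

```csharp
using System;

public static class FloatConversion
{
    // Converts each value to float and throws if a finite double overflows to infinity.
    public static float[] ToFloat(double[] values)
    {
        return Array.ConvertAll(values, d =>
        {
            float f = (float)d;
            if (float.IsInfinity(f) && !double.IsInfinity(d))
                throw new OverflowException($"Value {d} does not fit in a float.");
            return f;
        });
    }
}
```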

Model Loading Issues

Model Not Found

Error:

FileNotFoundException: Could not find model file 'model.aidotnet'

Solutions:

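  1. Check that the file path is correct, including the file extension
  2. Use an absolute path; relative paths are resolved against the process's working directory, which may differ from the project directory
  3. Verify that the file exists at that location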
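
A quick way to see where a relative path actually points is to resolve it before loading. The helper below is a sketch with a hypothetical name (`ModelPath`) that uses only `System.IO` and reports the absolute path and working directory when the file is missing.

```csharp
using System;
using System.IO;

public static class ModelPath
{
    // Resolves `path` against the current working directory and returns the absolute path.
    // Throws with both paths in the message so a wrong working directory is obvious.
    public static string ResolveExisting(string path)
    {
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException(
                $"Model file not found at '{fullPath}' " +
                $"(working directory: '{Directory.GetCurrentDirectory()}').",
                fullPath);
        }
        return fullPath;
    }
}
```

Passing the returned absolute path to the model loader removes any dependence on where the process was started.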

Model Version Incompatible

Error:

Model version 1.0 is not compatible with AiDotNet version 2.0

Solutions:

  1. Retrain model with current version
  2. Use model migration:
    var model = await Model.LoadAndMigrateAsync("old_model.aidotnet");

Corrupted Model File

Error:

InvalidDataException: Model file is corrupted

Solutions:

  1. Re-download or re-train the model
  2. Check file integrity, for example by comparing its size or checksum with the original
  3. Ensure file wasn't modified
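
One way to check integrity is to compare a SHA-256 digest of the file with a digest recorded when the model was saved. This sketch uses the standard `System.Security.Cryptography` API; `FileChecksum` is a hypothetical helper, and it assumes you recorded a reference digest yourself.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileChecksum
{
    // Returns the lowercase hex SHA-256 digest of the file's contents.
    public static string Sha256Hex(string path)
    {
        using var sha = SHA256.Create();
        using var stream = File.OpenRead(path);
        byte[] hash = sha.ComputeHash(stream);
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}
```

If the digest differs from the one recorded at save time, the file was truncated or modified in transit and should be copied or downloaded again.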

Performance Issues

Slow Training

Solutions:

  1. Enable GPU:

    .ConfigureGpu(gpu => gpu.Enabled = true)
  2. Increase batch size (if memory allows)

  3. Use mixed precision:

    .WithMixedPrecision(enabled: true)
  4. Use multiple data loading workers:

    .ConfigureDataLoading(d => d.NumWorkers = 4)
  5. Profile to find bottlenecks:

    .ConfigureProfiling(p => p.Enabled = true)
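
Before reaching for a full profiler, coarse wall-clock timing of each phase (data loading, forward pass, and so on) often shows where the time goes. The helper below is a generic sketch built on `System.Diagnostics.Stopwatch`; `PhaseTimer` is a hypothetical name and is unrelated to the library's profiling option.

```csharp
using System;
using System.Diagnostics;

public static class PhaseTimer
{
    // Runs `work`, returns its result, and reports the elapsed wall-clock time.
    public static T Measure<T>(string label, Func<T> work, out TimeSpan elapsed)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = work();
        stopwatch.Stop();

        elapsed = stopwatch.Elapsed;
        Console.WriteLine($"{label}: {elapsed.TotalMilliseconds:F1} ms");
        return result;
    }
}
```

If data loading dominates, more loader workers help; if compute dominates, GPU and mixed precision are the better levers.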

Slow Inference

Solutions:

  1. Use batch inference
  2. Enable GPU for inference
  3. Use model quantization:
    var quantized = model.Quantize(QuantizationType.INT8);
  4. Export to ONNX and use ONNX Runtime
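
Batch inference (step 1) starts with grouping inputs into fixed-size batches. The sketch below does only that grouping on a generic list; how a batch is then passed to the model depends on the AiDotNet API in use and is not shown. `Batching` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Yields consecutive slices of at most `batchSize` items; the last one may be shorter.
    public static IEnumerable<T[]> InBatches<T>(IReadOnlyList<T> items, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < items.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, items.Count - start);
            var batch = new T[count];
            for (int i = 0; i < count; i++) batch[i] = items[start + i];
            yield return batch;
        }
    }
}
```

Five inputs with a batch size of 2 produce batches of 2, 2, and 1 items.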

Getting Help

If you can't resolve your issue:

  1. Search existing issues: GitHub Issues

  2. Create a new issue with:

    • AiDotNet version
    • .NET version
    • OS and hardware
    • Minimal reproducible code
    • Full error message and stack trace
  3. Ask in discussions: GitHub Discussions
