This demonstration shows how Strix Halo maintains system responsiveness and prevents CPU memory starvation when running intensive AI workloads alongside CPU-intensive tasks.
```bash
./scripts/install_prerequisites.sh
```

This installs Ollama, Python packages, CMake, and the required LLM model.
```bash
./scripts/memory_qos_demo.sh --duration 600
```

This runs a 10-minute demonstration (600 seconds). Adjust the duration as needed.
```bash
LATEST_CSV=$(ls -t logs/memory_qos_metrics_*.csv | head -1)
python3 scripts/visualize_memory_qos.py --metrics-file "$LATEST_CSV"
```

This generates a visualization showing all metrics across the four phases.
The resulting dashboard summarizes memory QoS effectiveness:
Memory QoS Protection: CPU stays responsive and memory availability is maintained even under heavy AI loads.
The demonstration runs through four phases:
- Baseline: Memory-intensive workload only (CMake builds + Python memory operations)
- Transition: Workload + LLM starting (QoS should activate immediately)
- Contended: Both workloads fully active (QoS should maintain memory availability)
- Recovery: LLM stopped, workload continues (system should recover gracefully)
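The four phases above can be sketched as a simple scheduling plan. This is illustrative only: the equal four-way split and the `plan_phases` helper are assumptions, not the demo script's actual logic.

```python
# Hypothetical sketch of the demo's phase schedule (assumes equal phase lengths).
PHASES = ["baseline", "transition", "contended", "recovery"]

def plan_phases(total_seconds):
    """Split the total run into four equal phases, returning (name, seconds) pairs."""
    per_phase = total_seconds / len(PHASES)
    return [(name, per_phase) for name in PHASES]

# A 600-second run gives four 150-second phases.
for name, secs in plan_phases(600):
    print(f"{name}: {secs:.0f}s")  # the real script starts/stops the LLM and samples metrics here
```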
The demo measures and visualizes:
- **Memory Availability (Primary QoS Metric)**
  - Target: should remain stable throughout all phases
  - Higher is better
  - Demonstrates memory QoS protection
- **Memory Retention by Phase**
  - Target: ≥95% (Excellent), ≥85% (Good)
  - Shows how much memory availability is retained compared to baseline
  - Higher is better
- **Swap Usage**
  - Target: 0% (no swap usage indicates no memory pressure)
  - Lower is better
  - Zero swap usage is a strong indicator of effective QoS
- **CPU Usage**
  - Shows CPU utilization across phases
  - Stability matters more than the absolute percentage
- **System Load Average**
  - Shows system load over 1-, 5-, and 15-minute windows
  - Lower is better
  - Demonstrates system responsiveness
- **I/O Bandwidth**
  - Shows fair sharing of I/O resources
  - Higher is better
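The retention metric can be computed from the metrics CSV roughly as below. The column names `phase` and `mem_available_mb` are assumptions for illustration; the actual schema is defined by the demo script.

```python
import csv
from collections import defaultdict
from io import StringIO

def retention_by_phase(rows):
    """Average per-phase memory availability as a percentage of the baseline average."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["phase"]] += float(row["mem_available_mb"])  # assumed column names
        counts[row["phase"]] += 1
    averages = {p: totals[p] / counts[p] for p in totals}
    baseline = averages["baseline"]
    return {p: round(100.0 * v / baseline, 1) for p, v in averages.items()}

# Toy data standing in for logs/memory_qos_metrics_*.csv
sample = StringIO(
    "phase,mem_available_mb\n"
    "baseline,1000\nbaseline,1000\n"
    "contended,900\nrecovery,980\n"
)
retention = retention_by_phase(csv.DictReader(sample))
print(retention)  # {'baseline': 100.0, 'contended': 90.0, 'recovery': 98.0}
```

By this measure, the toy run's contended phase retains 90% of baseline availability, which would rate "Good" against the ≥85% threshold above.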
Strix Halo's memory QoS ensures that:
- CPU traffic gets guaranteed bandwidth allocation
- Memory latency floors are enforced for CPU access
- Foreground work remains interactive even under heavy AI load
- Memory availability is maintained (no starvation)
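Two of these guarantees (memory availability, absence of swap pressure) can be spot-checked from userspace by sampling `/proc/meminfo` on Linux. This is a monitoring sketch under that assumption, not part of the demo itself:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ("Key:   value kB") into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info

def swap_used_pct(info):
    """Percentage of swap in use; 0% is the demo's target."""
    total = info["SwapTotal"]
    return 0.0 if total == 0 else 100.0 * (total - info["SwapFree"]) / total

# In practice: info = parse_meminfo(open("/proc/meminfo").read())
sample = (
    "MemTotal:       1000 kB\n"
    "MemAvailable:    600 kB\n"
    "SwapTotal:       500 kB\n"
    "SwapFree:        400 kB\n"
)
info = parse_meminfo(sample)
print(info["MemAvailable"], swap_used_pct(info))  # 600 20.0
```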
This is what enables the "always-on AI PC" experience: AI assistants you don't have to shut down when you need to do real work.
- Memory QoS Demo Documentation: Complete guide to the demo, workloads, and metrics
- Installation Guide: Detailed installation instructions
- Ollama not found: Ensure Ollama is installed and in PATH
- Model not available: Run `ollama pull codellama:7b` first
- Python packages missing: Run `pip install -r requirements.txt`
- CMake not found: Install build tools: `sudo apt-get install cmake build-essential`
- Project path not found: Ensure `test-projects/json` exists, or use `--project-path` to specify a different path
See Troubleshooting Guide for more details.
Contributions, issues, and feature requests are welcome! Feel free to check the existing issues or open a new one.
This demo shows absolute performance characteristics of the developer's Strix Halo hardware. All measurements are taken on Strix Halo systems only. This suite makes no direct comparisons or competitive claims.
This project is licensed under the MIT License - see the LICENSE file for details.
