
Conversation

@planetf1

@planetf1 planetf1 commented Jan 20, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Adds pytest markers to categorize tests by backend, resource requirements, and test type.

Key benefit: Run pytest with zero config—tests auto-skip based on your system (Ollama running? GPU? API keys?) with clear messages about what's missing.

Advanced: Target specific tests: pytest -m "ollama" or pytest -m "not (requires_gpu or requires_api_key)"
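
For illustration, a test opted into this system might look like the following sketch (the marker name is registered by this PR; the test body is hypothetical):

```python
import pytest

# The "ollama" marker is registered in pyproject.toml by this PR;
# test/conftest.py auto-skips the test when nothing is listening on
# the Ollama port.
@pytest.mark.ollama
def test_ollama_roundtrip():
    ...  # hypothetical test body
```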

Auto-Detection

| Check    | Method             | Threshold            |
|----------|--------------------|----------------------|
| Ollama   | Port 11434 socket  | Must be listening    |
| API Keys | Env vars           | Must exist           |
| GPU      | torch (CUDA/MPS)   | Must be available    |
| RAM      | psutil             | 48GB for heavy tests |
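
A minimal sketch of how these checks can be implemented (function names are illustrative, not necessarily those used in test/conftest.py):

```python
import os
import socket

import psutil  # added as a dependency by this PR


def ollama_listening(host: str = "127.0.0.1", port: int = 11434) -> bool:
    # A successful TCP connect means the Ollama service is listening.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


def api_key_present(var: str) -> bool:
    # e.g. var="WATSONX_API_KEY"; checks existence only, not validity.
    return bool(os.environ.get(var))


def gpu_available() -> bool:
    try:
        import torch
        return torch.cuda.is_available() or torch.backends.mps.is_available()
    except ImportError:
        return False


def enough_ram(required_gb: float = 48.0) -> bool:
    return psutil.virtual_memory().total / 1024**3 >= required_gb
```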

Usage

```
pytest                                    # Auto-skips based on system
pytest -m "not llm"                       # Fast unit tests only
pytest -m "ollama"                        # Ollama tests only
pytest --ignore-gpu-check                 # Override (for debugging)
```

Implementation

Markers: ollama, openai, watsonx, huggingface, vllm, requires_api_key, requires_gpu, requires_heavy_ram, qualitative, llm

Files:

  • pyproject.toml (psutil dep + markers)
  • test/conftest.py (auto-detection + skip logic)
  • test/MARKERS_GUIDE.md (430-line guide)
  • README.md / docs/tutorial.md (links)
  • 4 test files (example markers)
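
One common way to wire markers to skips, shown here as a sketch building on the detection helpers above (not necessarily the PR's exact code):

```python
# test/conftest.py (sketch)
import pytest


def pytest_collection_modifyitems(config, items):
    # Compute each capability once per session; map marker -> (ok, reason).
    checks = {
        "ollama": (ollama_listening(), "Ollama not listening on port 11434"),
        "requires_gpu": (gpu_available(), "no CUDA/MPS GPU available"),
        "requires_heavy_ram": (enough_ram(48.0), "Insufficient RAM (< 48GB)"),
    }
    for item in items:
        for marker, (ok, reason) in checks.items():
            if marker in item.keywords and not ok:
                item.add_marker(
                    pytest.mark.skip(reason=f"Skipping test: {reason}")
                )
```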

Notes:

See test/MARKERS_GUIDE.md for complete documentation.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Testing notes: Added markers to 4 test files; auto-detection tested via existing suite; all pre-commit hooks passed.

…g#322)

Implements comprehensive marker system to categorize tests by backend,
resource requirements, and test type. Auto-detects system capabilities
(Ollama, API keys, GPU, RAM) and skips tests with clear messaging.

- Add backend markers: ollama, openai, watsonx, huggingface, vllm
- Add capability markers: requires_api_key, requires_gpu, requires_heavy_ram
- Auto-detection: Ollama (port 11434), GPU (CUDA/MPS), RAM (48GB threshold)
- CLI overrides: --ignore-{gpu,ram,ollama,api-key}-check (see the sketch after this list)
- Add comprehensive test/MARKERS_GUIDE.md documentation
- Link guide from README.md and docs/tutorial.md
- Extends existing CI optimizations from generative-computing#293 (preserves qualitative skip)
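
For the CLI overrides, option registration in conftest.py could look like this sketch (the flag names come from the commit message; the implementation is illustrative):

```python
# conftest.py (sketch)
def pytest_addoption(parser):
    for name in ("gpu", "ram", "ollama", "api-key"):
        parser.addoption(
            f"--ignore-{name}-check",
            action="store_true",
            default=False,
            help=f"Run tests even if the {name} availability check fails",
        )
```

The skip logic would then consult config.getoption("--ignore-gpu-check") (and its siblings) before adding a skip marker.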
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@mergify

mergify bot commented Jan 20, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@planetf1
Author

I started the RAM check at 32GB, but even after extensive cleanup on my MacBook M1 I couldn't run the PyTorch tests, so I raised it to 48GB, which I cannot test locally. Changes to the model or quantization could also reduce the footprint (I could try adding tests with a smaller model in a follow-up issue/PR if it helps?), so the need for this threshold may change, but I think it expresses a clear, developer-friendly intent.

@planetf1 planetf1 marked this pull request as ready for review January 20, 2026 14:26
@planetf1
Author

planetf1 commented Jan 20, 2026

Example output from uv run pytest -v

test/backends/test_huggingface.py::test_adapters SKIPPED (Skipping test: Insufficient RAM (32.0GB < ...) [  0%]

or

test/backends/test_litellm_watsonx.py::test_generate_from_raw SKIPPED (Skipping test: watsonx API ke...) [ 11%]

Currently the detection code doesn't check that the required models are available.

Keeping this in draft while I verify execution, at least on my local 32GB Mac.

@planetf1 planetf1 marked this pull request as draft January 20, 2026 14:43
- Add markers to 7 more backend test files (litellm_ollama, openai_ollama,
  vision_ollama, vision_openai, litellm_watsonx, vllm_tools, huggingface_tools)
- Add note that Ollama check only verifies service, not loaded models
- Add fixture to normalize OLLAMA_HOST from 0.0.0.0 to 127.0.0.1 for client connections
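
A sketch of what that OLLAMA_HOST normalization fixture could look like (the fixture name is hypothetical):

```python
import os

import pytest


@pytest.fixture(autouse=True)
def normalize_ollama_host(monkeypatch):
    # 0.0.0.0 is a server bind address, not a client-connectable one;
    # point clients at loopback instead.
    host = os.environ.get("OLLAMA_HOST", "")
    if "0.0.0.0" in host:
        monkeypatch.setenv("OLLAMA_HOST", host.replace("0.0.0.0", "127.0.0.1"))
```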
Adds xgrammar to hf optional dependencies to support constrained decoding
in granite_common. Fixes test_answerability failure.
Adds pytest markers to 5 stdlib test files based on model requirements:

Heavy RAM tests (8B+ models):
- test_spans.py: granite-3.3-8b (huggingface, gpu, heavy_ram)
- test_think_budget_forcing.py: gpt-oss:20b (ollama, heavy_ram)

GPU-only tests (4B models):
- test_rag.py: granite-4.0-micro (huggingface, gpu)

Lightweight tests (1B models):
- test_sofai_sampling.py: llama3.2:1b (ollama)
- test_sofai_graph_coloring.py: llama3.2:1b (ollama)
Adds markers to 3 test files using large Ollama models:
- test_genslot.py: granite3.3:8b (22GB, unquantized)
- test_component_typing.py: granite3.3:8b (22GB, unquantized)
- test_think_budget_forcing.py: gpt-oss:20b (adds requires_gpu)

The granite3.3:8b Ollama model is 22GB (full precision), requiring
both GPU and heavy RAM despite being only 8B parameters.
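
Applied at module level, these markers can be combined in a pytestmark list; a sketch of what a file like test_genslot.py might declare:

```python
import pytest

# granite3.3:8b at full precision (~22GB) needs both a GPU and heavy RAM.
pytestmark = [
    pytest.mark.ollama,
    pytest.mark.requires_gpu,
    pytest.mark.requires_heavy_ram,
]
```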
@planetf1
Author

With the current configuration I was able to run uv run pytest -v and all tests succeeded. Some were close on resources (macOS, ARM M1, 32GB), in part because Ollama leaves models loaded, but it's a starting point. Some further refinements/test changes may make the experience smoother.

@planetf1 planetf1 marked this pull request as ready for review January 20, 2026 17:25


Development

Successfully merging this pull request may close these issues.

tests: qualitative vs LLM usage
