@llbbl llbbl commented Sep 3, 2025

Set up Python Testing Infrastructure

Summary

This PR sets up the testing infrastructure for the behaviour cloning benchmarks project: Poetry-managed test dependencies, pytest configuration with coverage reporting, a tests/ directory layout, and shared fixtures that developers can build on when writing and running tests.

Changes Made

Package Management

  • Poetry Setup: Configured Poetry as the primary package manager with a complete pyproject.toml
  • Dependency Management: Added testing dependencies as development-only packages:
    • pytest ^7.4.0 - Main testing framework
    • pytest-cov ^4.1.0 - Coverage reporting
    • pytest-mock ^3.11.1 - Enhanced mocking utilities

Testing Configuration

  • pytest Configuration: Comprehensive test discovery, strict settings, and custom markers
  • Coverage Settings: 80% threshold with HTML, XML, and terminal reporting
  • Custom Markers:
    • @pytest.mark.unit - Unit tests
    • @pytest.mark.integration - Integration tests
    • @pytest.mark.slow - Long-running tests
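For readers unfamiliar with marker registration, the sketch below shows one way the three markers could be declared from Python through pytest's `pytest_configure` hook. This is an illustration only: the PR most likely registers them in the pytest section of pyproject.toml instead, and the `MARKERS` dictionary is a hypothetical name. Under strict marker settings, pytest rejects any marker that has not been registered one way or the other.

```python
# Hypothetical conftest.py fragment: register the custom markers in code.
# Assumption: the PR itself may declare these in pyproject.toml instead.

MARKERS = {
    "unit": "Unit tests",
    "integration": "Integration tests",
    "slow": "Long-running tests",
}


def pytest_configure(config):
    """pytest calls this hook once at startup with its Config object."""
    for name, description in MARKERS.items():
        # Each entry has the form "<marker name>: <description>".
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, `pytest --markers` lists them alongside the built-in ones.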

Directory Structure

tests/
├── __init__.py
├── conftest.py              # Shared fixtures and configuration
├── test_infrastructure.py   # Infrastructure validation tests
├── unit/
│   ├── __init__.py
│   └── test_sample_unit.py
└── integration/
    ├── __init__.py
    └── test_sample_integration.py

Shared Test Fixtures

Created comprehensive fixtures in conftest.py:

  • File System: temp_dir, temp_file for temporary file testing
  • Configuration: mock_config with realistic test settings
  • Components: mock_agent, mock_dataset, mock_environment for behavioral testing
  • Data: sample_observation, sample_action for consistent test data
  • Environment: clean_environment, disable_gpu for isolated testing
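As a rough idea of what such fixtures can look like, here is a minimal sketch of three of them. The fixture names come from the list above, but the bodies are assumptions rather than the PR's actual code: the `act` method on the mock agent, its return value, and the `build_mock_agent` helper are hypothetical, and hiding GPUs through `CUDA_VISIBLE_DEVICES` is just one common approach.

```python
import os
import shutil
import tempfile
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest


def build_mock_agent():
    """Hypothetical helper: a stand-in agent whose act() returns a fixed action."""
    agent = MagicMock(name="agent")
    agent.act.return_value = [0.0, 0.0]
    return agent


@pytest.fixture
def temp_dir():
    """Yield a fresh temporary directory and delete it after the test."""
    path = Path(tempfile.mkdtemp())
    yield path
    shutil.rmtree(path, ignore_errors=True)


@pytest.fixture
def mock_agent():
    """Provide the mock agent to any test that requests it by name."""
    return build_mock_agent()


@pytest.fixture
def disable_gpu():
    """Hide CUDA devices for the duration of one test (assumed mechanism)."""
    with patch.dict(os.environ, {"CUDA_VISIBLE_DEVICES": ""}):
        yield
```

A test requests a fixture by naming it as a parameter, for example `def test_agent_acts(mock_agent, disable_gpu): ...`.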

Development Tools

  • Commands: Both poetry run test and poetry run tests work
  • Coverage: HTML reports in htmlcov/, XML reports in coverage.xml
  • Filtering: Run specific test types with -m unit, -m integration, -m slow

Updated .gitignore

Added comprehensive exclusions for:

  • Testing artifacts (.pytest_cache/, .coverage, htmlcov/)
  • Development environments (venv/, .venv/, etc.)
  • Build artifacts (build/, dist/, *.egg-info/)
  • IDE files (.vscode/, etc.)
  • Claude settings (.claude/*)

Usage Instructions

Running Tests

# Run all tests with coverage
poetry run test

# Run tests without coverage
poetry run pytest --no-cov

# Run specific test types
poetry run pytest -m unit         # Unit tests only
poetry run pytest -m integration  # Integration tests only
poetry run pytest -m slow         # Slow tests only

# Run with different verbosity
poetry run pytest -v           # Verbose output
poetry run pytest -q           # Quiet output

Coverage Reports

  • Terminal: Shows during test run with --cov-report=term-missing
  • HTML: Open htmlcov/index.html in browser for detailed coverage
  • XML: Machine-readable coverage data in coverage.xml
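Because coverage.xml is machine-readable, other tooling can check it against the 80% threshold. The sketch below assumes the Cobertura-style layout that coverage.py writes, where the root `coverage` element carries a `line-rate` attribute between 0 and 1. The function names and the sample XML are illustrative and not part of this PR.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Return overall line coverage as a percentage from coverage.xml content."""
    root = ET.fromstring(xml_text)
    # Assumption: the root element exposes line-rate as a fraction in [0, 1].
    return float(root.attrib["line-rate"]) * 100.0


def meets_threshold(xml_text: str, threshold: float = 80.0) -> bool:
    """True when line coverage reaches the threshold (80% in this PR)."""
    return line_coverage_percent(xml_text) >= threshold


# Made-up sample data, trimmed to the one attribute the sketch reads.
SAMPLE_REPORT = '<coverage line-rate="0.85" branch-rate="0"></coverage>'
```

In practice the text would come from `Path("coverage.xml").read_text()` after a test run.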

Writing Tests

  • Place unit tests in tests/unit/
  • Place integration tests in tests/integration/
  • Use fixtures from conftest.py for common test data and mocks
  • Mark tests appropriately: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow

Validation

✅ All 23 validation tests pass
✅ Both poetry run test and poetry run tests commands work
✅ Coverage reporting generates properly in all formats
✅ Test markers function correctly
✅ Shared fixtures work as expected
✅ Directory structure follows best practices

Next Steps

Developers can now:

  1. Start writing unit tests for individual components in tests/unit/
  2. Create integration tests for multi-component interactions in tests/integration/
  3. Use the comprehensive fixture library for consistent test data
  4. Run targeted test suites using markers
  5. Monitor code coverage to maintain quality standards

The infrastructure is ready for use as soon as the dependencies are installed with poetry install; no further configuration is required.

- Configure Poetry as package manager with pyproject.toml
- Add pytest, pytest-cov, and pytest-mock as test dependencies
- Create structured test directories (tests/unit, tests/integration)
- Configure coverage reporting with 80% threshold and multiple formats
- Set up shared fixtures and test utilities in conftest.py
- Add custom pytest markers for unit, integration, and slow tests
- Update .gitignore with testing artifacts and development files
- Create validation tests to verify infrastructure functionality
- Enable both 'poetry run test' and 'poetry run tests' commands