Skip to content
/ Circepy Public

[under development] A one to one python implementation of the circe-be cohort definition classes

License

Notifications You must be signed in to change notification settings

OHDSI/Circepy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CIRCE Python Implementation

Python Tests codecov License PyPI

Caution

This project is currently under active testing and development. It is a Python implementation of the OHDSI CIRCE-BE Java library. While we aim for 1:1 parity, this version is an Alpha release and should be used with caution in production environments.

A Python implementation of the OHDSI CIRCE-BE (Cohort Inclusion and Restriction Criteria Engine) for generating SQL queries from cohort definitions in the OMOP Common Data Model.

Overview

CIRCE Python provides a comprehensive toolkit for working with OMOP CDM cohort definitions:

  • Cohort Definition Modeling: Create and validate cohort expressions using Pydantic models
  • SQL Generation: Generate SQL queries from cohort definitions for OMOP CDM v5.x
  • Concept Set Management: Handle concepts and concept sets from OMOP vocabularies
  • Validation & Checking: Comprehensive validation with 40+ checker implementations
  • Print-Friendly Output: Generate human-readable markdown descriptions of cohort definitions
  • CLI Interface: Command-line tools for validation, SQL generation, and markdown rendering

Package Status

Important

This package is currently in Alpha status and undergoing rigorous parity testing against the Java implementation.

  • Version: 0.1.0 (Alpha)
  • Tests: 3,400+ passing
  • Coverage: 34% (Core logic focus)
  • Python: 3.8+
  • License: Apache 2.0

Installation

Note

This package is currently in private development. Install from source using Git.

From Source (Current Method)

# Clone the repository
git clone https://github.com/OHDSI/ohdsi-circepy.git
cd Circepy

# Install in development mode with all dependencies
pip install -e ".[dev]"

# Verify installation
circe --help

See INSTALLATION.md for detailed installation instructions, troubleshooting, and setup options.

From PyPI (Coming Soon)

# Coming in future release
pip install ohdsi-circepy

Quick Start

Command-Line Interface

The easiest way to use CIRCE is through the command-line interface:

# Validate a cohort expression JSON file
circe validate cohort.json

# Generate SQL from a cohort expression
circe generate-sql cohort.json --output cohort.sql

# Render a cohort expression to markdown
circe render-markdown cohort.json --output cohort.md

# Process a cohort expression (validate, generate SQL, and render markdown)
circe process cohort.json --validate --sql --markdown

See the CLI Documentation section below for more details.

Python API

from circe import CohortExpression
from circe.cohortdefinition import PrimaryCriteria, ConditionOccurrence
from circe.cohortdefinition.core import ObservationFilter, ResultLimit
from circe.vocabulary import ConceptSet, ConceptSetExpression, ConceptSetItem, Concept

# Create a cohort expression
cohort = CohortExpression(
    title="Type 2 Diabetes Cohort",
    primary_criteria=PrimaryCriteria(
        criteria_list=[
            ConditionOccurrence(
                codeset_id=1,
                first=True
            )
        ],
        observation_window=ObservationFilter(prior_days=0, post_days=0),
        primary_limit=ResultLimit(type="All")
    ),
    concept_sets=[
        ConceptSet(
            id=1,
            name="Type 2 Diabetes",
            expression=ConceptSetExpression(
                items=[
                    ConceptSetItem(
                        concept=Concept(
                            concept_id=201826,
                            concept_name="Type 2 diabetes mellitus"
                        ),
                        include_descendants=True
                    )
                ]
            )
        )
    ]
)

# Generate SQL using the API
from circe.api import build_cohort_query
from circe.cohortdefinition import BuildExpressionQueryOptions

options = BuildExpressionQueryOptions()
options.cdm_schema = 'cdm'
options.vocabulary_schema = 'cdm'
options.cohort_id = 1
options.target_table = 'scratch.cohort'
sql = build_cohort_query(cohort, options)
print(sql)

What's Included

This package provides a complete Python implementation of CIRCE-BE with:

  • 3,400+ passing tests with focused coverage on core logic
  • 18+ SQL builders for all OMOP CDM domains:
    • Condition Occurrence/Era
    • Drug Exposure/Era
    • Procedure Occurrence
    • Measurement, Observation
    • Visit Occurrence/Detail
    • Device Exposure, Specimen
    • Death, Location Region
    • Observation Period, Payer Plan Period
    • And more...
  • Full cohort expression validation with comprehensive error checking
  • Markdown rendering for human-readable cohort descriptions
  • Complete CLI interface with 4 commands (validate, generate-sql, render-markdown, process)
  • Java interoperability - supports both camelCase and snake_case field names for seamless Java CIRCE-BE compatibility

⚠️ Java Fidelity Requirement

This project maintains 1:1 compatibility with Java CIRCE-BE.

  • All Python classes replicate Java functionality exactly
  • Field names support both Java (camelCase) and Python (snake_case) formats
  • SQL generation produces identical results to Java implementation
  • All changes are validated against Java schema

See JAVA_CLASS_MAPPINGS.md for complete class mappings.

Package Structure

circe/
├── cohortdefinition/          # Core cohort definition classes
│   ├── builders/              # SQL query builders (18+ builders)
│   ├── printfriendly/         # Human-readable markdown output
│   └── negativecontrols/      # Negative control generation
├── vocabulary/                # Concept and concept set management
├── check/                     # Validation and checking framework
│   ├── checkers/              # 40+ specific checker implementations
│   ├── operations/            # Check operations
│   ├── utils/                 # Check utilities
│   └── warnings/              # Warning classes
├── helper/                    # Utility helper classes
├── api.py                     # High-level API functions
└── cli.py                     # Command-line interface

Features

✅ Implemented

  • Complete cohort definition data model with Pydantic validation
  • 18+ SQL builders covering all OMOP CDM domains
  • Comprehensive CLI interface (validate, generate-sql, render-markdown, process)
  • Java interoperability with camelCase/snake_case field support
  • Cohort expression validation with 40+ checker implementations
  • Markdown rendering for print-friendly descriptions
  • Full test suite (3,400+ tests)
  • Type hints throughout with py.typed marker
  • Concept set expression handling
  • Window criteria and correlated criteria support
  • Date adjustments and custom era strategies
  • Observation period and demographic criteria
  • Inclusion rules and censoring criteria

Command-Line Interface

CIRCE provides a comprehensive command-line interface for validating, generating SQL, and rendering cohort expressions.

Validate Command

Validate a cohort expression JSON file against the CIRCE standard:

circe validate cohort.json

Options:

  • --verbose, -v: Display all validation warnings including INFO level
  • --quiet, -q: Suppress non-error output

Exit codes:

  • 0: Valid (no errors or warnings)
  • 1: Invalid (errors found)
  • 2: Valid but has warnings

Generate SQL Command

Generate SQL from a cohort expression:

# Output to stdout
circe generate-sql cohort.json

# Output to file
circe generate-sql cohort.json --output cohort.sql

# With custom schema names
circe generate-sql cohort.json --cdm-schema my_cdm --vocab-schema my_vocab --cohort-id 123

Options:

  • --output, -o: Output SQL file path (default: stdout)
  • --sql-options: JSON file with BuildExpressionQueryOptions
  • --cdm-schema: CDM schema name (default: @cdm_database_schema)
  • --vocab-schema: Vocabulary schema name (default: @vocabulary_database_schema)
  • --cohort-id: Cohort ID for SQL generation
  • --validate: Validate before generating SQL (default: True)
  • --no-validate: Skip validation before generating SQL
  • --verbose, -v: Verbose output
  • --quiet, -q: Suppress non-error output

Render Markdown Command

Render a cohort expression to human-readable markdown:

# Output to stdout
circe render-markdown cohort.json

# Output to file
circe render-markdown cohort.json --output cohort.md

Options:

  • --output, -o: Output markdown file path (default: stdout)
  • --validate: Validate before rendering markdown (default: True)
  • --no-validate: Skip validation before rendering markdown
  • --verbose, -v: Verbose output
  • --quiet, -q: Suppress non-error output

Process Command

Process a cohort expression with multiple operations:

# Validate, generate SQL, and render markdown
circe process cohort.json --validate --sql --markdown

# Generate SQL with custom output file
circe process cohort.json --sql output.sql

# Generate SQL and markdown with default file names
circe process cohort.json --sql --markdown

Options:

  • --validate: Validate the cohort expression
  • --sql [FILE]: Generate SQL (optionally specify output file, default: input file with .sql extension)
  • --markdown [FILE]: Render markdown (optionally specify output file, default: input file with .md extension)
  • --sql-options: JSON file with BuildExpressionQueryOptions
  • --cdm-schema: CDM schema name (default: @cdm_database_schema)
  • --vocab-schema: Vocabulary schema name (default: @vocabulary_database_schema)
  • --cohort-id: Cohort ID for SQL generation
  • --verbose, -v: Verbose output
  • --quiet, -q: Suppress non-error output

CLI Examples

# Validate a cohort expression
circe validate my_cohort.json

# Generate SQL with custom schema
circe generate-sql my_cohort.json --output my_cohort.sql \
    --cdm-schema my_cdm_schema \
    --vocab-schema my_vocab_schema \
    --cohort-id 1

# Generate SQL and markdown in one command
circe process my_cohort.json --sql --markdown

# Validate, generate SQL, and render markdown
circe process my_cohort.json --validate --sql my_cohort.sql --markdown my_cohort.md

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/OHDSI/Circepy.git
cd Circepy

# Install with development dependencies
pip install -e ".[dev]"

# Verify installation
pytest --version
circe --help

Running Tests

pytest

All 3,400+ tests should pass.

Code Formatting

black circe/
isort circe/

Type Checking

mypy circe/

Compatibility Notes

This implementation is designed to be compatible with OHDSI CIRCE-BE Java version. The Python package:

  • Accepts JSON cohort definitions from OHDSI Atlas and other tools
  • Generates SQL identical to the Java implementation
  • Supports all OMOP CDM v5.x versions
  • Maintains field name compatibility (camelCase and snake_case)

Troubleshooting

Import Errors

If you encounter import errors, ensure the package is properly installed:

pip install --upgrade ohdsi-circepy

SQL Generation Issues

  • Verify your cohort expression JSON is valid using circe validate
  • Check that all concept IDs reference valid OMOP concepts
  • Ensure schema names are correctly specified

Performance Considerations

For large cohort definitions with many criteria:

  • SQL generation typically completes in < 1 second
  • Validation runs in < 500ms for most cohorts
  • Memory usage scales with the number of criteria (typically < 100MB)

FAQ

Q: Is this compatible with OHDSI Atlas? A: Yes, this package can process cohort definition JSON files exported from Atlas.

Q: Can I use this with CDM v5.3? A: Yes, the package supports all OMOP CDM v5.x versions.

Q: How do I convert camelCase JSON to Python? A: The package automatically handles both camelCase and snake_case field names.

Q: Does this replace the Java CIRCE-BE? A: No, this is a complementary Python implementation. Both produce identical SQL output.

Contributing

Contributions are welcome! Please see our Contributing Guidelines for details.

Key areas for contribution:

  • Additional test coverage
  • Performance optimizations
  • Documentation improvements
  • Bug fixes and issue reports

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

This project is based on the Java CIRCE-BE implementation by the OHDSI community. We thank all contributors to the original Java implementation.

Special thanks to:

  • The OHDSI community for their continued support
  • Contributors to the Java CIRCE-BE project
  • The Pydantic team for their excellent validation library

Support

Related Projects

About

[under development] A one to one python implementation of the circe-be cohort definition classes

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •