CIRCE Python Implementation

Caution

This project is currently under active testing and development. It is a Python implementation of the OHDSI CIRCE-BE Java library. While we aim for 1:1 parity, this version is an Alpha release and should be used with caution in production environments.

A Python implementation of the OHDSI CIRCE-BE (Cohort Inclusion and Restriction Criteria Engine) for generating SQL queries from cohort definitions in the OMOP Common Data Model.

Overview

CIRCE Python provides a comprehensive toolkit for working with OMOP CDM cohort definitions:

Cohort Definition Modeling: Create and validate cohort expressions using Pydantic models
SQL Generation: Generate SQL queries from cohort definitions for OMOP CDM v5.x
Concept Set Management: Handle concepts and concept sets from OMOP vocabularies
Validation & Checking: Comprehensive validation with 40+ checker implementations
Print-Friendly Output: Generate human-readable markdown descriptions of cohort definitions
CLI Interface: Command-line tools for validation, SQL generation, and markdown rendering

Package Status

Important

This package is currently in Alpha status and undergoing rigorous parity testing against the Java implementation.

Version: 0.1.0 (Alpha)
Tests: 3,400+ passing
Coverage: 34% (Core logic focus)
Python: 3.8+
License: Apache 2.0

Installation

Note

This package is currently in private development. Install from source using Git.

From Source (Current Method)

# Clone the repository
git clone https://github.com/OHDSI/ohdsi-circepy.git
cd Circepy

# Install in development mode with all dependencies
pip install -e ".[dev]"

# Verify installation
circe --help

See INSTALLATION.md for detailed installation instructions, troubleshooting, and setup options.

From PyPI (Coming Soon)

# Coming in future release
pip install ohdsi-circepy

Quick Start

Command-Line Interface

The easiest way to use CIRCE is through the command-line interface:

# Validate a cohort expression JSON file
circe validate cohort.json

# Generate SQL from a cohort expression
circe generate-sql cohort.json --output cohort.sql

# Render a cohort expression to markdown
circe render-markdown cohort.json --output cohort.md

# Process a cohort expression (validate, generate SQL, and render markdown)
circe process cohort.json --validate --sql --markdown

See the CLI Documentation section below for more details.

Python API

from circe import CohortExpression
from circe.cohortdefinition import PrimaryCriteria, ConditionOccurrence
from circe.cohortdefinition.core import ObservationFilter, ResultLimit
from circe.vocabulary import ConceptSet, ConceptSetExpression, ConceptSetItem, Concept

# Create a cohort expression
cohort = CohortExpression(
    title="Type 2 Diabetes Cohort",
    primary_criteria=PrimaryCriteria(
        criteria_list=[
            ConditionOccurrence(
                codeset_id=1,
                first=True
            )
        ],
        observation_window=ObservationFilter(prior_days=0, post_days=0),
        primary_limit=ResultLimit(type="All")
    ),
    concept_sets=[
        ConceptSet(
            id=1,
            name="Type 2 Diabetes",
            expression=ConceptSetExpression(
                items=[
                    ConceptSetItem(
                        concept=Concept(
                            concept_id=201826,
                            concept_name="Type 2 diabetes mellitus"
                        ),
                        include_descendants=True
                    )
                ]
            )
        )
    ]
)

# Generate SQL using the API
from circe.api import build_cohort_query
from circe.cohortdefinition import BuildExpressionQueryOptions

options = BuildExpressionQueryOptions()
options.cdm_schema = 'cdm'
options.vocabulary_schema = 'cdm'
options.cohort_id = 1
options.target_table = 'scratch.cohort'
sql = build_cohort_query(cohort, options)
print(sql)

What's Included

This package provides a complete Python implementation of CIRCE-BE with:

3,400+ passing tests with focused coverage on core logic
18+ SQL builders for all OMOP CDM domains:
- Condition Occurrence/Era
- Drug Exposure/Era
- Procedure Occurrence
- Measurement, Observation
- Visit Occurrence/Detail
- Device Exposure, Specimen
- Death, Location Region
- Observation Period, Payer Plan Period
- And more...
Full cohort expression validation with comprehensive error checking
Markdown rendering for human-readable cohort descriptions
Complete CLI interface with 4 commands (validate, generate-sql, render-markdown, process)
Java interoperability - supports both camelCase and snake_case field names for seamless Java CIRCE-BE compatibility

⚠️ Java Fidelity Requirement

This project maintains 1:1 compatibility with Java CIRCE-BE.

All Python classes replicate Java functionality exactly
Field names support both Java (camelCase) and Python (snake_case) formats
SQL generation produces identical results to Java implementation
All changes are validated against Java schema

See JAVA_CLASS_MAPPINGS.md for complete class mappings.

Package Structure

circe/
├── cohortdefinition/          # Core cohort definition classes
│   ├── builders/              # SQL query builders (18+ builders)
│   ├── printfriendly/         # Human-readable markdown output
│   └── negativecontrols/      # Negative control generation
├── vocabulary/                # Concept and concept set management
├── check/                     # Validation and checking framework
│   ├── checkers/              # 40+ specific checker implementations
│   ├── operations/            # Check operations
│   ├── utils/                 # Check utilities
│   └── warnings/              # Warning classes
├── helper/                    # Utility helper classes
├── api.py                     # High-level API functions
└── cli.py                     # Command-line interface

Features

✅ Implemented

Command-Line Interface

CIRCE provides a comprehensive command-line interface for validating, generating SQL, and rendering cohort expressions.

Validate Command

Validate a cohort expression JSON file against the CIRCE standard:

circe validate cohort.json

Options:

--verbose, -v: Display all validation warnings including INFO level
--quiet, -q: Suppress non-error output

Exit codes:

0: Valid (no errors or warnings)
1: Invalid (errors found)
2: Valid but has warnings

Generate SQL Command

Generate SQL from a cohort expression:

# Output to stdout
circe generate-sql cohort.json

# Output to file
circe generate-sql cohort.json --output cohort.sql

# With custom schema names
circe generate-sql cohort.json --cdm-schema my_cdm --vocab-schema my_vocab --cohort-id 123

Options:

--output, -o: Output SQL file path (default: stdout)
--sql-options: JSON file with BuildExpressionQueryOptions
--cdm-schema: CDM schema name (default: @cdm_database_schema)
--vocab-schema: Vocabulary schema name (default: @vocabulary_database_schema)
--cohort-id: Cohort ID for SQL generation
--validate: Validate before generating SQL (default: True)
--no-validate: Skip validation before generating SQL
--verbose, -v: Verbose output
--quiet, -q: Suppress non-error output

Render Markdown Command

Render a cohort expression to human-readable markdown:

# Output to stdout
circe render-markdown cohort.json

# Output to file
circe render-markdown cohort.json --output cohort.md

Options:

--output, -o: Output markdown file path (default: stdout)
--validate: Validate before rendering markdown (default: True)
--no-validate: Skip validation before rendering markdown
--verbose, -v: Verbose output
--quiet, -q: Suppress non-error output

Process Command

Process a cohort expression with multiple operations:

# Validate, generate SQL, and render markdown
circe process cohort.json --validate --sql --markdown

# Generate SQL with custom output file
circe process cohort.json --sql output.sql

# Generate SQL and markdown with default file names
circe process cohort.json --sql --markdown

Options:

--validate: Validate the cohort expression
--sql [FILE]: Generate SQL (optionally specify output file, default: input file with .sql extension)
--markdown [FILE]: Render markdown (optionally specify output file, default: input file with .md extension)
--sql-options: JSON file with BuildExpressionQueryOptions
--cdm-schema: CDM schema name (default: @cdm_database_schema)
--vocab-schema: Vocabulary schema name (default: @vocabulary_database_schema)
--cohort-id: Cohort ID for SQL generation
--verbose, -v: Verbose output
--quiet, -q: Suppress non-error output

CLI Examples

# Validate a cohort expression
circe validate my_cohort.json

# Generate SQL with custom schema
circe generate-sql my_cohort.json --output my_cohort.sql \
    --cdm-schema my_cdm_schema \
    --vocab-schema my_vocab_schema \
    --cohort-id 1

# Generate SQL and markdown in one command
circe process my_cohort.json --sql --markdown

# Validate, generate SQL, and render markdown
circe process my_cohort.json --validate --sql my_cohort.sql --markdown my_cohort.md

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/OHDSI/Circepy.git
cd Circepy

# Install with development dependencies
pip install -e ".[dev]"

# Verify installation
pytest --version
circe --help

Running Tests

pytest

All 3,400+ tests should pass.

Code Formatting

black circe/
isort circe/

Type Checking

mypy circe/

Compatibility Notes

This implementation is designed to be compatible with OHDSI CIRCE-BE Java version. The Python package:

Accepts JSON cohort definitions from OHDSI Atlas and other tools
Generates SQL identical to the Java implementation
Supports all OMOP CDM v5.x versions
Maintains field name compatibility (camelCase and snake_case)

Troubleshooting

Import Errors

If you encounter import errors, ensure the package is properly installed:

pip install --upgrade ohdsi-circepy

SQL Generation Issues

Verify your cohort expression JSON is valid using circe validate
Check that all concept IDs reference valid OMOP concepts
Ensure schema names are correctly specified

Performance Considerations

For large cohort definitions with many criteria:

SQL generation typically completes in < 1 second
Validation runs in < 500ms for most cohorts
Memory usage scales with the number of criteria (typically < 100MB)

FAQ

Q: Is this compatible with OHDSI Atlas? A: Yes, this package can process cohort definition JSON files exported from Atlas.

Q: Can I use this with CDM v5.3? A: Yes, the package supports all OMOP CDM v5.x versions.

Q: How do I convert camelCase JSON to Python? A: The package automatically handles both camelCase and snake_case field names.

Q: Does this replace the Java CIRCE-BE? A: No, this is a complementary Python implementation. Both produce identical SQL output.

Contributing

Contributions are welcome! Please see our Contributing Guidelines for details.

Key areas for contribution:

Additional test coverage
Performance optimizations
Documentation improvements
Bug fixes and issue reports

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

This project is based on the Java CIRCE-BE implementation by the OHDSI community. We thank all contributors to the original Java implementation.

Special thanks to:

The OHDSI community for their continued support
Contributors to the Java CIRCE-BE project
The Pydantic team for their excellent validation library

Support

Repository: https://github.com/OHDSI/circepy
Issues: https://github.com/OHDSI/circepy/issues
Installation Guide: INSTALLATION.md
PyPI: https://pypi.org/project/circepy/ (coming soon)
Documentation: https://ohdsi-circepy.readthedocs.io/ (coming soon)

Related Projects

OHDSI CIRCE-BE (Java) - Original Java implementation
OHDSI Common Data Model - OMOP CDM specification
OHDSI Atlas - Web-based cohort definition tool
OHDSI WebAPI - RESTful API for OHDSI tools

Name		Name	Last commit message	Last commit date
Latest commit History 134 Commits
.agent		.agent
.github/workflows		.github/workflows
circe		circe
debug_app		debug_app
docs		docs
examples		examples
scripts		scripts
tests		tests
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
INSTALLATION.md		INSTALLATION.md
JAVA_CLASS_MAPPINGS.md		JAVA_CLASS_MAPPINGS.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
PUBLISHING_GUIDE.md		PUBLISHING_GUIDE.md
README.md		README.md
RELEASE_CHECKLIST.md		RELEASE_CHECKLIST.md
circe_sql.R		circe_sql.R
cohort_definition.py		cohort_definition.py
coverage.json		coverage.json
evaluation.md		evaluation.md
java_cohort_expression_schema.json		java_cohort_expression_schema.json
pyproject.toml		pyproject.toml
python_cohort_expression_schema.json		python_cohort_expression_schema.json
requirements.txt		requirements.txt
tox.ini		tox.ini

License

OHDSI/Circepy

Folders and files

Latest commit

History

Repository files navigation

CIRCE Python Implementation

Overview

Package Status

Installation

From Source (Current Method)

From PyPI (Coming Soon)

Quick Start

Command-Line Interface

Python API

What's Included

⚠️ Java Fidelity Requirement

Package Structure

Features

✅ Implemented

Command-Line Interface

Validate Command

Generate SQL Command

Render Markdown Command

Process Command

CLI Examples

Development

Setup Development Environment

Running Tests

Code Formatting

Type Checking

Compatibility Notes

Troubleshooting

Import Errors

SQL Generation Issues

Performance Considerations

FAQ

Contributing

License

Acknowledgments

Support

Related Projects

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Contributors 2

Uh oh!

Languages

Packages