Context-free Grammar for FHIR

Context-free grammars (CFGs) are a crucial element for streamlining the use of Large Language Models (LLMs) to generate structured data via constrained decoding and structured generation. Creating such CFGs manually can be difficult, especially when the language to be enforced follows a complex, deeply nested document structure. HL7 FHIR is a powerful standard for encoding medical and clinical information. This work focuses on dynamically generating CFGs that can be used to enforce and stabilize LLM outputs so that they closely follow the FHIR structure.
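As a hedged illustration of what dynamically generating part of a grammar can mean, the sketch below turns a FHIR value set into grammar alternatives, so that a constrained decoder can only emit one of the listed codes. It assumes a Lark-style EBNF notation and uses the FHIR R5 MedicationStatement.status codes (recorded, entered-in-error, draft). The helper name enum_rule is hypothetical and is not taken from this repository.

```python
import json


def enum_rule(rule_name: str, codes: list[str]) -> str:
    """Build a grammar rule that only accepts the given codes as JSON strings.

    Hypothetical helper; Lark-style EBNF syntax is assumed.
    """
    # The inner json.dumps produces the JSON token (draft -> "draft");
    # the outer json.dumps escapes that token as a quoted grammar literal.
    literals = [json.dumps(json.dumps(code)) for code in codes]
    return f"{rule_name}: " + " | ".join(literals)


# Codes of the (required) MedicationStatement.status binding in FHIR R5.
R5_STATUS_CODES = ["recorded", "entered-in-error", "draft"]

if __name__ == "__main__":
    print(enum_rule("medication_statement_status", R5_STATUS_CODES))
```

With such a rule in place, an R4 status like "completed" is simply not producible, which is consistent with the CFG-guided behaviour reported for the constrained-value test cases.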

To compare the CFG-guided approach with JSON schema-guided and unguided generation, we created a set of test cases that highlight potential differences in the resulting FHIR data.

Results

Grammar Synthesis

Install the dependencies first.

Setup Steps

# Create a venv with Python 3.12 (the version must be < 3.13)
# e.g., using uv to install and select a specific Python version:
uv python install 3.12
uv venv --python 3.12 env

source env/bin/activate
python -m ensurepip --upgrade

# Install dependencies
python3 -m pip install -r requirements.txt

You also need the command-line JSON processor jq, which can be installed (e.g. on Debian/Ubuntu) with

sudo apt install jq

You can generate a custom FHIR CFG as follows:

./generate_FHIR_grammar.sh
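How generate_FHIR_grammar.sh works internally is not described here. As a hedged sketch of one step such a generator could perform, the function below maps a FHIR element's cardinality onto a grammar fragment: optional elements (min 0) become optional members, and repeating elements (max "*") become JSON arrays with at least one item, since FHIR does not permit empty elements. The helper name member_fragment and the Lark-style syntax are assumptions; joining members with commas and whitespace handling are left out for brevity.

```python
import json


def member_fragment(key: str, value_rule: str, min_card: int, max_card: str) -> str:
    """Grammar fragment for one JSON object member of a FHIR element.

    Hypothetical helper; Lark-style EBNF syntax is assumed.
    min_card: FHIR minimum cardinality (0 or 1).
    max_card: FHIR maximum cardinality ("1" or "*").
    """
    key_literal = json.dumps(json.dumps(key))  # grammar literal matching "key"
    if max_card == "*":
        # Repeating element: a JSON array that must contain at least one item.
        value = f'"[" {value_rule} ("," {value_rule})* "]"'
    else:
        value = value_rule
    member = f'{key_literal} ":" {value}'
    # Optional elements may be omitted entirely.
    return member if min_card >= 1 else f"({member})?"


if __name__ == "__main__":
    print(member_fragment("status", "medication_statement_status", 1, "1"))
    print(member_fragment("telecom", "contact_point", 0, "*"))
```

Requiring a non-empty array, rather than allowing "[]", rules out outputs such as a Patient with an empty telecom array by construction.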

You can test the grammar as follows:

export CUDA_VISIBLE_DEVICES=0
source env/bin/activate

# Using CFG
python3 demo_outlines_v1_cfg.py
# Using JSON schema
python3 demo_outlines_v1_jsonschema.py

Test Cases

The prompts and individual results of all test cases can be found in testcases/. The results are summarized below (🟢 correct, 🟡 syntactically valid but semantically wrong, 🔴 failed):

ID	Category	Description	CFG-guided	JSON schema-guided	Unguided
1.0	Output Cleanliness	Basic Patient Test	🟢	🟢	🟢
2.1	Version Compliance	MedicationStatement with CodeableReference	🟢	🟢	🔴: medication.reference set to str
2.2	Version Compliance	MedicationStatement with basic medication.concept and medication.subject	🟢	🟢	🔴: R4 issue
2.3	Version Compliance	MedicationStatement with multiple coding systems	🟢	🟢	🔴: R4 issue; invalid medication.coding
3.1	Structural Validity	Condition Resource Generation	🟢	🔴: Patient-like object generated	🟢
3.2	Structural Validity	Patient with telecom/address field	🟢	🔴: Patient-like object with empty array	🟢
3.3	Structural Validity	MedicationStatement with encounter & dosage	🟢	🟢	🔴: R4 issue
3.4	Structural Validity	MedicationStatement with structured Dosage information	🟢	🔴: Patient-like object	🔴: R4 issue; duration as object
3.5	Structural Validity	MedicationStatement with multiple Dosages	🟢	🟢	🔴: R4 issue; doseQuantity in dosage item
4.1	Constrained Values	MedicationStatement with “stored” status	🟡: syntactically valid, semantically wrong: status: entered-in-error	🔴: Patient-like object	🔴: R4 issue; syntactically wrong: status: completed
4.2	Constrained Values	MedicationStatement with “erroneous” status	🟢	🔴: status correct, but hallucinated extra information	🔴: R4 issue
4.3	Constrained Values	MedicationStatement with “completed” status and “hours”, “days” dosage values	🟢: status set to “draft”; “hours” → “h”, “days” → “d”	🔴: Set invalid values “completed”, “hours”, “days”	🔴: R4 issue; set invalid values “completed”, “hours”, “days”; duration as object
4.4	Constrained Values	MedicationStatement with mixed valid/invalid UCUM codes	🟡: syntactically valid, semantically wrong: “week” → “d”	🔴: missed duration/durationUnit entirely, hallucinated when, asNeeded	🔴: R4 issue; duration as object
5.1	Schema Robustness	MedicationStatement with non-standard order (status last)	🟢	🔴: Missing “status”, added “adherence”	🔴: R4 issue
5.2	Schema Robustness	MedicationStatement with non-standard order (status last; dosage before medication)	🟢	🟢	🔴: R4 issue; hallucinated fields: id, meta
5.3	Schema Robustness	Deliberate Trigger: MedicationStatement with forbidden note field	🔴: Repetition loop, unable to generate “note” field by CFG	🔴: Patient-like object	🔴: R4 issue
5.4	Schema Robustness	Deliberate Trigger: MedicationStatement with forbidden dosage text field	🔴: Encoded the textual information correctly, but as structured information instead (→ hallucination)	🔴: Patient-like object	🔴: R4 issue
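Several of the failures above can be detected automatically after generation. The following is a minimal post-hoc check, assuming the model output has already been parsed with json.loads. It flags non-MedicationStatement objects, R4-style medication[x] elements (R5 uses a single CodeableReference element named medication), status codes outside the R5 value set, non-numeric Timing.repeat.duration values, and timing units outside the FHIR UnitsOfTime codes (s, min, h, d, wk, mo, a). The function name and messages are hypothetical and not part of run_testcases.py. It only catches syntactic problems: a semantically wrong but valid unit, such as “week” → “d” in test case 4.4, passes unnoticed, and it is no substitute for a full FHIR validator.

```python
# Hypothetical post-hoc check for generated FHIR R5 MedicationStatement resources.
R4_MEDICATION_KEYS = {"medicationCodeableConcept", "medicationReference"}
R5_STATUS_CODES = {"recorded", "entered-in-error", "draft"}
UNITS_OF_TIME = {"s", "min", "h", "d", "wk", "mo", "a"}


def check_medication_statement(resource: dict) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    problems = []
    if resource.get("resourceType") != "MedicationStatement":
        problems.append("not a MedicationStatement")
    # R4 used the choice type medication[x]; R5 uses "medication" (CodeableReference).
    if R4_MEDICATION_KEYS.intersection(resource):
        problems.append("R4-style medication[x] element")
    elif "medication" not in resource:
        problems.append("missing medication")
    if resource.get("status") not in R5_STATUS_CODES:
        problems.append(f"invalid status: {resource.get('status')!r}")
    for dosage in resource.get("dosage", []):
        repeat = dosage.get("timing", {}).get("repeat", {})
        # Timing.repeat.duration is a plain decimal, not a Quantity-like object.
        duration = repeat.get("duration")
        if duration is not None and not isinstance(duration, (int, float)):
            problems.append("duration is not a number")
        for key in ("durationUnit", "periodUnit"):
            if key in repeat and repeat[key] not in UNITS_OF_TIME:
                problems.append(f"invalid {key}: {repeat[key]!r}")
    return problems
```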

To reproduce our test results, ensure the following prerequisites:

The dependencies and virtual environment are installed.
The grammar has been generated with the unmodified generation script.
You have access to the Llama 3.3 70B-Instruct model via Hugging Face.

Perform the following steps:

# Remove the old result data
rm testcases/*_collected.txt testcases/*_output.*.txt

# Setup envs
source env/bin/activate
export HF_TOKEN="<your HF token>"
export CUDA_VISIBLE_DEVICES=0

# Run the experiments again
python3 run_testcases.py

j-frei/CFG4FHIR
