★ Spotlight paper @ NeurIPS 2025 Workshop: Bridging Language, Agent, and World Models for Reasoning and Planning (LAW)
★ Contents. This repository houses the source code for generating CausalARC tasks, as well as static datasets of presampled tasks (data/static_evaluation_set/) and text prompts (data/prompts/).
★ Learn more. See our full project page here: https://jmaasch.github.io/carc/
★ Contribute. If you are interested in contributing to this open source project, contact me on LinkedIn.
On-the-fly reasoning often requires adaptation to novel problems under limited data and distribution shift. This work introduces CausalARC: an experimental testbed for AI reasoning in low-data and out-of-distribution regimes, modeled after the Abstraction and Reasoning Corpus (ARC). Each CausalARC reasoning task is sampled from a fully specified causal world model, formally expressed as a structural causal model (SCM). Principled data augmentations provide observational, interventional, and counterfactual feedback about the world model in the form of few-shot, in-context learning demonstrations. As a proof-of-concept, we illustrate the use of CausalARC for four language model evaluation settings: (1) abstract reasoning with test-time training, (2) counterfactual reasoning with in-context learning, (3) program synthesis, and (4) causal discovery with logical reasoning. Within- and between-model performance varied heavily across tasks, indicating room for significant improvement in language model reasoning.

Figure: The CausalARC testbed for reasoning evaluation.
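To make the three kinds of feedback concrete, the sketch below walks through observational, interventional, and counterfactual sampling on a toy two-variable SCM. It is purely illustrative: the variable names, mechanisms, and helper functions are hypothetical and are not part of the CausalARC code in causal_arc/; in CausalARC itself, the analogous samples are rendered as few-shot grid demonstrations.

```python
import random

# A toy structural causal model, for illustration only (not an actual CausalARC world model):
#   X := U_x            (exogenous noise)
#   Y := X XOR U_y      (structural equation for Y)

def sample_noise(rng):
    """Draw the exogenous noise terms U_x and U_y."""
    return {"U_x": rng.randint(0, 1), "U_y": rng.randint(0, 1)}

def evaluate(noise, do_x=None):
    """Evaluate the structural equations, optionally under the intervention do(X = do_x)."""
    x = noise["U_x"] if do_x is None else do_x
    y = x ^ noise["U_y"]
    return {"X": x, "Y": y}

rng = random.Random(0)
u = sample_noise(rng)

observational = evaluate(u)                            # factual sample from the SCM
interventional = evaluate(sample_noise(rng), do_x=1)   # do(X = 1) with fresh noise
counterfactual = evaluate(u, do_x=1)                   # do(X = 1) holding the factual noise fixed

print(observational, interventional, counterfactual)
```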
Please cite this work:
@inproceedings{maasch2025causalarc,
title={CausalARC: Abstract Reasoning with Causal World Models},
author={Maasch, Jacqueline and Kalantari, John and Khezeli, Kia},
booktitle={NeurIPS 2025 Workshop on Bridging Language, Agent, and World Models for Reasoning and Planning},
year={2025}
}
Repository structure:
.
├── causal_arc                     # All source code for CausalARC (task generation, data processing, etc.).
│   ├── carc_augment.py
│   ├── carc_tasks_counting.py
│   ├── carc_tasks_extension.py
│   ├── carc_tasks_logical.py
│   ├── carc_tasks_order.py
│   ├── carc_utils.py
│   └── carc.py
├── data
│   ├── prompts                    # Prompt dictionaries submitted to LLMs for LangChain experiments.
│   │   ├── causal_discovery
│   │   │   └── discovery_logical_compose_and_xor_prompts.json
│   │   ├── counterfactual_reasoning
│   │   │   ├── counting_extension_ordering
│   │   │   │   └── cf_reasoning_counting_extension_ordering_prompts.json
│   │   │   └── logical
│   │   │       └── cf_reasoning_logical_prompts.json
│   │   └── program_synthesis
│   │       ├── program_synthesis_nexamples4_prompts.json
│   │       ├── program_synthesis_nexamples6_prompts.json
│   │       └── program_synthesis_nexamples8_prompts.json
│   └── static_evaluation_set      # The version of the static dataset used in MARC TTT experiments.
│       └── v0_09-01-25
│           ├── counting
│           │   ├── causal_arc_counting_solutions.json
│           │   └── causal_arc_counting.json
│           ├── extension
│           │   ├── causal_arc_extension_solutions.json
│           │   └── causal_arc_extension.json
│           ├── logical
│           │   ├── causal_arc_logical_solutions.json
│           │   └── causal_arc_logical.json
│           └── ordering
│               ├── causal_arc_ordering_solutions.json
│               └── causal_arc_ordering.json
├── demos
│   ├── causal_discovery_pc_algorithm.ipynb    # Run the PC algorithm on a CausalARC SCM.
│   ├── preview_causal_arc_tasks.ipynb         # View examples from all CausalARC SCMs.
│   ├── prompt_generation                      # Demonstrations of prompt sampling functions.
│   │   ├── prompt_causal_discovery_logical_composition.ipynb
│   │   ├── prompt_counterfactual_counting_ordering_extension.ipynb
│   │   └── prompt_program_synthesis.ipynb
│   └── task_sampling                          # Demonstrations of task / grid sampling functions.
│       ├── causal_arc_task_construction_counting.ipynb
│       ├── causal_arc_task_construction_extension.ipynb
│       ├── causal_arc_task_construction_logical.ipynb
│       └── causal_arc_task_construction_ordering.ipynb
├── experiments
│   ├── langchain                  # LangChain scripts used to query proprietary models.
│   └── marc_results               # Raw output dictionaries from MARC TTT experiments.
└── README.md
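As a starting point, the sketch below loads one of the static evaluation files listed above using only the standard library. The file paths are taken from the tree; the JSON schema is not documented here, so the snippet only inspects the top-level structure rather than assuming particular keys.

```python
import json
from pathlib import Path

# Paths taken from the repository tree above (logical tasks from the v0_09-01-25 release).
root = Path("data/static_evaluation_set/v0_09-01-25/logical")

tasks = json.loads((root / "causal_arc_logical.json").read_text())
solutions = json.loads((root / "causal_arc_logical_solutions.json").read_text())

# Inspect the top-level structure without assuming a specific schema.
print(type(tasks), len(tasks))
if isinstance(tasks, dict):
    print(list(tasks)[:5])   # first few task identifiers
elif isinstance(tasks, list):
    print(tasks[0])          # first presampled task

print(type(solutions), len(solutions))
```

The same pattern applies to the counting, extension, and ordering splits, and to the prompt dictionaries under data/prompts/.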