8 changes: 5 additions & 3 deletions README.md
@@ -3,7 +3,7 @@
The up-to-date documentation regarding usage and features of CausalBench can be found at [https://docs.causalbench.org](https://docs.causalbench.org).

Registration at [CausalBench website](https://causalbench.org) is required in order to utilize the CausalBench package.
### Install CausalBench:
`pip install causalbench-asu`

## Overview
@@ -42,7 +42,7 @@ To start using CausalBench, follow these steps:

## Contributing

CausalBench is an open-source project and welcomes contributions from the community. We plan to announce the contribution guideline soon.

## License

@@ -58,13 +58,15 @@ for Causal-Learning Benchmarking for Efficacy, Reproducibility, and Scientific
Collaboration".

## Support Benchmark Context
CausalBench is structured to support different machine learning tasks and dataset types. The supported context will expand with user contributions. Currently (as of 8/12/25), these models and tasks are provided.

| Dataset | File | Description |
|-----------------------|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| Abalone | data, static graph | |
| Adult | data, static graph | |
| Sachs | data, static graph | |
| California Housing | data | Regression dataset from sklearn with 20,640 samples predicting median house values in California districts |
| Diabetes | data | Regression dataset from sklearn with 442 samples predicting disease progression from physiological variables |
| NetSim | data, static graph | Brain FMRI scan<br/> - 28 simulations <br/> - Each has different DGPs, num of nodes (5, 50), num of observations (50 to 5000), 1400 datasets in total |
| Time series simulated | data, temporal graph | |
| Telecom | data, temporal graph | |

Comment on lines 60 to 70
Copilot AI Feb 6, 2026

PR description mentions a new test_regression_datasets.py test script, but it doesn't appear to be included in this change set (and isn't present under causalbench-asu/tests/). If automated coverage for loading these datasets is intended, please add the test file (or update the PR description to match what’s actually being delivered).

Binary file not shown.
20,641 changes: 20,641 additions & 0 deletions causalbench-asu/tests/data/california_housing/california_housing_data.csv

Large diffs are not rendered by default.

49 changes: 49 additions & 0 deletions causalbench-asu/tests/data/california_housing/config.yaml
@@ -0,0 +1,49 @@
# California Housing dataset: https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset
type: dataset
name: california_housing
source: sklearn
url: https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset
description: Predict median house values in California districts based on census data (regression task)
files:
  file1:
    type: csv
    data: dataframe
    path: california_housing_data.csv
    headers: true
    columns:
      MedInc:
        header: MedInc
        type: continuous
        data: decimal
      HouseAge:
        header: HouseAge
        type: continuous
        data: decimal
      AveRooms:
        header: AveRooms
        type: continuous
        data: decimal
      AveBedrms:
        header: AveBedrms
        type: continuous
        data: decimal
      Population:
        header: Population
        type: continuous
        data: decimal
      AveOccup:
        header: AveOccup
        type: continuous
        data: decimal
      Latitude:
        header: Latitude
        type: continuous
        data: decimal
      Longitude:
        header: Longitude
        type: continuous
        data: decimal
      MedHouseVal:
        header: MedHouseVal
        type: continuous
        data: decimal

Comment on lines +16 to +48
Copilot AI Feb 6, 2026

The column type fields use continuous, which is inconsistent with existing dataset configs that use values like ratio/nominal (see tests/data/panama/config.yaml, tests/data/abalone/config.yaml). If continuous isn't a recognized schema value, this dataset config may not load. Please switch these to the supported type value (likely ratio for these numeric columns) or extend the schema/loader to accept continuous.

Suggested change (lines 16 to 48, with type: continuous replaced by type: ratio):
        type: ratio
        data: decimal
      HouseAge:
        header: HouseAge
        type: ratio
        data: decimal
      AveRooms:
        header: AveRooms
        type: ratio
        data: decimal
      AveBedrms:
        header: AveBedrms
        type: ratio
        data: decimal
      Population:
        header: Population
        type: ratio
        data: decimal
      AveOccup:
        header: AveOccup
        type: ratio
        data: decimal
      Latitude:
        header: Latitude
        type: ratio
        data: decimal
      Longitude:
        header: Longitude
        type: ratio
        data: decimal
      MedHouseVal:
        header: MedHouseVal
        type: ratio

24 changes: 24 additions & 0 deletions causalbench-asu/tests/data/california_housing/download_data.py
@@ -0,0 +1,24 @@
"""
Download California Housing dataset from sklearn
"""

from sklearn.datasets import fetch_california_housing


def download_california_housing():
    # Load the dataset
    california = fetch_california_housing(as_frame=True)

    # Combine features and target
    data = california.frame

    # Save to CSV
    data.to_csv("california_housing_data.csv", index=False)

    print(f"Dataset saved with {len(data)} samples")
    print(f"Features: {california.feature_names}")
    print(f"Target: {california.target_names}")


if __name__ == "__main__":
    download_california_housing()
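
The script above writes the CSV that config.yaml points to. A quick way to confirm the two stay in sync is to compare the CSV header with the configured column names and count the data rows. The sketch below uses only the standard library; `check_csv` and `CALIFORNIA_COLUMNS` are hypothetical names introduced for illustration and are not part of CausalBench or this PR.

```python
import csv


def check_csv(path, expected_columns):
    """Return the number of data rows after confirming the header matches.

    Hypothetical helper for illustration; not part of CausalBench.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        if header != list(expected_columns):
            raise ValueError(f"Unexpected header: {header}")
        # Every remaining line is one sample.
        return sum(1 for _ in reader)


# Column names in the order they appear in the California Housing config.yaml.
CALIFORNIA_COLUMNS = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms", "Population",
    "AveOccup", "Latitude", "Longitude", "MedHouseVal",
]
```

Calling `check_csv("california_housing_data.csv", CALIFORNIA_COLUMNS)` from the dataset directory would be expected to return 20,640 if the file matches the README description, which is consistent with the 20,641 added lines reported for the CSV (one header line plus the samples).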
Binary file added causalbench-asu/tests/data/diabetes.zip
Binary file not shown.
57 changes: 57 additions & 0 deletions causalbench-asu/tests/data/diabetes/config.yaml
@@ -0,0 +1,57 @@
# Diabetes dataset: https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset
type: dataset
name: diabetes
source: sklearn
url: https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset
description: Predict disease progression one year after baseline from physiological variables (regression task)
files:
  file1:
    type: csv
    data: dataframe
    path: diabetes_data.csv
    headers: true
    columns:
      age:
        header: age
        type: continuous
        data: decimal
      sex:
        header: sex
        type: continuous
        data: decimal
      bmi:
        header: bmi
        type: continuous
        data: decimal
      bp:
        header: bp
        type: continuous
        data: decimal
      s1:
        header: s1
        type: continuous
        data: decimal
      s2:
        header: s2
        type: continuous
        data: decimal
      s3:
        header: s3
        type: continuous
        data: decimal
      s4:
        header: s4
        type: continuous
        data: decimal
      s5:
        header: s5
        type: continuous
        data: decimal
      s6:
        header: s6
        type: continuous
        data: decimal
      target:
        header: target
        type: continuous
        data: decimal

Comment on lines +16 to +56
Copilot AI Feb 6, 2026

The column type values are set to continuous, but other dataset configs in this repo use values like ratio/nominal (e.g., tests/data/panama/config.yaml). If the dataset loader only recognizes the existing enum values, continuous will fail schema validation or parsing. Please align these column type fields with the accepted values used elsewhere (e.g., use ratio for numeric continuous variables) or update the loader/schema to explicitly support continuous.

Suggested change (lines 16 to 56, with type: continuous replaced by type: ratio):
        type: ratio
        data: decimal
      sex:
        header: sex
        type: ratio
        data: decimal
      bmi:
        header: bmi
        type: ratio
        data: decimal
      bp:
        header: bp
        type: ratio
        data: decimal
      s1:
        header: s1
        type: ratio
        data: decimal
      s2:
        header: s2
        type: ratio
        data: decimal
      s3:
        header: s3
        type: ratio
        data: decimal
      s4:
        header: s4
        type: ratio
        data: decimal
      s5:
        header: s5
        type: ratio
        data: decimal
      s6:
        header: s6
        type: ratio
        data: decimal
      target:
        header: target
        type: ratio

Collaborator

Actually, this is important. @prat-man @Shu-Wan I don't recall us having any "type" for datasets. We may need to remove them. Please confirm.
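
Whichever way the open question about the column type field is settled, a small check can list the columns whose type falls outside an agreed set before a config is merged. The sketch below works on an already-parsed config (a plain dict) and is illustrative only: `find_unsupported_types` is a hypothetical helper, and the allowed set is an assumption taken from the two values the review comments name as examples (ratio, nominal). The actual CausalBench loader and schema are not shown in this PR and may accept other values or none at all.

```python
# Assumed set of accepted column types. The review comments only mention
# "ratio" and "nominal" as examples, so this is not the confirmed schema.
ALLOWED_COLUMN_TYPES = {"ratio", "nominal"}


def find_unsupported_types(config, allowed=ALLOWED_COLUMN_TYPES):
    """Return (file, column, type) triples whose type is not in `allowed`.

    Hypothetical helper for illustration; not part of CausalBench.
    """
    problems = []
    for file_name, file_spec in config.get("files", {}).items():
        for column, spec in file_spec.get("columns", {}).items():
            column_type = spec.get("type")
            if column_type not in allowed:
                problems.append((file_name, column, column_type))
    return problems


# A trimmed-down stand-in for the parsed diabetes config.yaml.
example_config = {
    "type": "dataset",
    "name": "diabetes",
    "files": {
        "file1": {
            "type": "csv",
            "columns": {
                "age": {"header": "age", "type": "continuous", "data": "decimal"},
                "bmi": {"header": "bmi", "type": "ratio", "data": "decimal"},
            },
        }
    },
}

print(find_unsupported_types(example_config))
# → [('file1', 'age', 'continuous')]
```

If the maintainers decide to drop the column type field entirely, the same loop could instead report every column that still carries a type key.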