CausalBenchOrg · Shu-Wan · Feb 6, 2026 · Copilot
diff --git a/README.md b/README.md
@@ -3,7 +3,7 @@
 The up-to-date documentation regarding usage and features of CausalBench can be found at [https://docs.causalbench.org](https://docs.causalbench.org).
 
 Registration at [CausalBench website](https://causalbench.org) is required in order to utilize the CausalBench package.
-### Install CausalBench: 
+### Install CausalBench:
 `pip install causalbench-asu`
 
 ## Overview
@@ -42,7 +42,7 @@ To start using CausalBench, follow these steps:
 
 ## Contributing
 
-CausalBench is an open-source project and welcomes contributions from the community. We plan to announce the contribution guideline soon. 
+CausalBench is an open-source project and welcomes contributions from the community. We plan to announce the contribution guidelines soon.
 
 ## License
 
@@ -58,13 +58,15 @@ for Causal-Learning Benchmarking for Efficacy, Reproducibility, and Scientific
 Collaboration".
 
 ## Support Benchmark Context
-CausalBench is structured to support different machine learning tasks and dataset types. With user contribution, the supported context will be expanded, currently (as of 8/12/25), these models and tasks are provided.    
+CausalBench is structured to support different machine learning tasks and dataset types. The supported context will expand with user contributions; currently (as of 8/12/25), the following datasets are provided.
 
 | Dataset               | File                 | Description                                                                                                                                           |
 |-----------------------|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
 | Abalone               | data, static graph   |                                                                                                                                                       |
 | Adult                 | data, static graph   |                                                                                                                                                       |
 | Sachs                 | data, static graph   |                                                                                                                                                       |
+| California Housing    | data                 | Regression dataset from sklearn with 20,640 samples predicting median house values in California districts                                            |
+| Diabetes              | data                 | Regression dataset from sklearn with 442 samples predicting disease progression from physiological variables                                          |
 | NetSim                | data, static graph   | Brain FMRI scan<br/> - 28 simulations <br/> - Each has different DGPs, num of nodes (5, 50), num of observations (50 to 5000), 1400 datasets in total |
 | Time series simulated | data, temporal graph |                                                                                                                                                       |
 | Telecom               | data, temporal graph |                                                                                                                                                       |

diff --git a/causalbench-asu/tests/data/california_housing.zip b/causalbench-asu/tests/data/california_housing.zip
diff --git a/causalbench-asu/tests/data/california_housing/california_housing_data.csv b/causalbench-asu/tests/data/california_housing/california_housing_data.csv
diff --git a/causalbench-asu/tests/data/california_housing/config.yaml b/causalbench-asu/tests/data/california_housing/config.yaml
@@ -0,0 +1,49 @@
+# California Housing dataset: https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset
+type: dataset
+name: california_housing
+source: sklearn
+url: https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset
+description: Predict median house values in California districts based on census data (regression task)
+files:
+    file1:
+        type: csv
+        data: dataframe
+        path: california_housing_data.csv
+        headers: true
+        columns:
+            MedInc:
+                header: MedInc
+                type: ratio
+                data: decimal
+            HouseAge:
+                header: HouseAge
+                type: ratio
+                data: decimal
+            AveRooms:
+                header: AveRooms
+                type: ratio
+                data: decimal
+            AveBedrms:
+                header: AveBedrms
+                type: ratio
+                data: decimal
+            Population:
+                header: Population
+                type: ratio
+                data: decimal
+            AveOccup:
+                header: AveOccup
+                type: ratio
+                data: decimal
+            Latitude:
+                header: Latitude
+                type: ratio
+                data: decimal
+            Longitude:
+                header: Longitude
+                type: ratio
+                data: decimal
+            MedHouseVal:
+                header: MedHouseVal
+                type: ratio
+                data: decimal
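Each entry under `files.file1.columns` maps a column name to its `header` in the CSV, plus a `type` and a `data` kind. A minimal stdlib sketch for checking that every declared `header` actually appears in the CSV's header row, assuming the config has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `declared_headers` and `missing_headers` are hypothetical helpers, not part of the CausalBench API.

```python
from __future__ import annotations

import csv
from typing import Iterable


def declared_headers(config: dict, file_key: str = "file1") -> list[str]:
    """Return the CSV headers declared under files.<file_key>.columns."""
    columns = config["files"][file_key]["columns"]
    return [spec["header"] for spec in columns.values()]


def missing_headers(
    config: dict, csv_lines: Iterable[str], file_key: str = "file1"
) -> list[str]:
    """Return declared headers that are absent from the CSV's first row."""
    # Only the header row is parsed; an empty file yields no headers
    first_row = next(csv.reader(csv_lines), [])
    present = set(first_row)
    return [h for h in declared_headers(config, file_key) if h not in present]
```

For example, with PyYAML installed, `missing_headers(yaml.safe_load(open("config.yaml")), open("california_housing_data.csv", newline=""))` should return an empty list once the CSV has been generated.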
diff --git a/causalbench-asu/tests/data/california_housing/download_data.py b/causalbench-asu/tests/data/california_housing/download_data.py
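@@ -0,0 +1,30 @@
+"""
+Download the California Housing dataset from sklearn and save it as
+california_housing_data.csv next to this script, where config.yaml expects it.
+"""
+
+from pathlib import Path
+
+from sklearn.datasets import fetch_california_housing
+
+# Resolve the output path relative to this file, not the working directory
+OUTPUT_PATH = Path(__file__).resolve().parent / "california_housing_data.csv"
+
+
+def download_california_housing():
+    # Fetch the dataset (downloaded on first use, then cached by sklearn)
+    california = fetch_california_housing(as_frame=True)
+
+    # `frame` holds the 8 features plus the MedHouseVal target column
+    data = california.frame
+
+    # Save to CSV without the pandas index
+    data.to_csv(OUTPUT_PATH, index=False)
+
+    print(f"Dataset saved to {OUTPUT_PATH} with {len(data)} samples")
+    print(f"Features: {california.feature_names}")
+    print(f"Target: {california.target_names}")
+
+
+if __name__ == "__main__":
+    download_california_housing()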
diff --git a/causalbench-asu/tests/data/diabetes.zip b/causalbench-asu/tests/data/diabetes.zip
diff --git a/causalbench-asu/tests/data/diabetes/config.yaml b/causalbench-asu/tests/data/diabetes/config.yaml
@@ -0,0 +1,57 @@
+# Diabetes dataset: https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset
+type: dataset
+name: diabetes
+source: sklearn
+url: https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset
+description: Predict disease progression one year after baseline from physiological variables (regression task)
+files:
+    file1:
+        type: csv
+        data: dataframe
+        path: diabetes_data.csv
+        headers: true
+        columns:
+            age:
+                header: age
+                type: ratio
+                data: decimal
+            sex:
+                header: sex
+                type: ratio
+                data: decimal
+            bmi:
+                header: bmi
+                type: ratio
+                data: decimal
+            bp:
+                header: bp
+                type: ratio
+                data: decimal
+            s1:
+                header: s1
+                type: ratio
+                data: decimal
+            s2:
+                header: s2
+                type: ratio
+                data: decimal
+            s3:
+                header: s3
+                type: ratio
+                data: decimal
+            s4:
+                header: s4
+                type: ratio
+                data: decimal
+            s5:
+                header: s5
+                type: ratio
+                data: decimal
+            s6:
+                header: s6
+                type: ratio
+                data: decimal
+            target:
+                header: target
+                type: ratio
+                data: decimal
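The diabetes `config.yaml` points at `diabetes_data.csv`, but no script for producing that file appears in this diff. A minimal sketch of a counterpart to the California Housing `download_data.py`, assuming the CSV should be scikit-learn's `load_diabetes(as_frame=True).frame` written without the index; the function name `download_diabetes` and its `output_path` parameter are hypothetical. Unlike `fetch_california_housing`, `load_diabetes` reads data bundled with scikit-learn, so no network access is needed.

```python
"""
Hypothetical sketch: write sklearn's Diabetes dataset to a CSV whose
columns match those declared in the diabetes config.yaml.
"""

from pathlib import Path

import pandas as pd
from sklearn.datasets import load_diabetes


def download_diabetes(output_path: Path) -> pd.DataFrame:
    # load_diabetes reads data shipped with scikit-learn (no download needed)
    diabetes = load_diabetes(as_frame=True)

    # `frame` holds the 10 features (age, sex, bmi, bp, s1-s6) plus `target`
    data = diabetes.frame

    # Save with a header row and without the pandas index
    data.to_csv(output_path, index=False)
    return data
```

Called from a script in the diabetes directory as `download_diabetes(Path(__file__).resolve().parent / "diabetes_data.csv")`, it should produce the file at the path the config expects.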