Merged
4 changes: 4 additions & 0 deletions .beads/issues.jsonl
@@ -0,0 +1,4 @@
{"id":"policyengine-us-data-apq","title":"Add age and demographics to pre-tax contribution QRF imputation","description":"The QRF in puf.py that imputes pre_tax_contributions from CPS to PUF uses only employment_income as a predictor. Age, filing status, and number of dependents are strong predictors of 401(k) participation and contribution rates. Adding these should improve the distributional accuracy.","status":"closed","priority":2,"issue_type":"feature","created_at":"2026-01-31T08:01:22.72749-05:00","updated_at":"2026-01-31T08:08:02.675063-05:00","closed_at":"2026-01-31T08:08:02.675063-05:00"}
{"id":"policyengine-us-data-jhh","title":"Parameterize retirement contribution limits by year","description":"The contribution waterfall in cps.py hardcodes 2022 limits ($20,500 401k, $6,500 catch-up, $6,000 IRA, $1,000 IRA catch-up). These should be pulled from PolicyEngine parameters or a year-indexed lookup so the dataset builds correctly for any year.","status":"closed","priority":2,"issue_type":"bug","created_at":"2026-01-31T08:01:18.941246-05:00","updated_at":"2026-01-31T08:08:02.614396-05:00","closed_at":"2026-01-31T08:08:02.614396-05:00"}
{"id":"policyengine-us-data-mnw","title":"Use SS_SC source code for Social Security retirement/disability split","description":"Currently cps.py uses a hard age-62 cutoff to split SS into retirement vs disability. The CPS ASEC has SS_SC (Social Security source codes) that distinguish retirement, disability, and survivor benefits. Use these codes instead of the age heuristic.","status":"closed","priority":2,"issue_type":"bug","created_at":"2026-01-31T08:01:21.01419-05:00","updated_at":"2026-01-31T08:08:02.644611-05:00","closed_at":"2026-01-31T08:08:02.644611-05:00"}
{"id":"policyengine-us-data-x4q","title":"Calibrate taxable pension fraction from SOI data","description":"imputation_parameters.yaml sets taxable_pension_fraction to 1.0 with the comment 'no SOI data, so arbitrary assumption.' But the SOI targets CSV includes both total_pension_income and taxable_pension_income by AGI bracket. Use the ratio of these to set a data-driven fraction instead of assuming 100% taxable.","status":"closed","priority":2,"issue_type":"bug","created_at":"2026-01-31T08:01:24.590331-05:00","updated_at":"2026-01-31T08:08:02.70425-05:00","closed_at":"2026-01-31T08:08:02.70425-05:00"}
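Note on issue `policyengine-us-data-jhh`: a year-indexed lookup of the kind the description proposes could look roughly like the sketch below. The 2022 figures are the ones quoted in the issue; the 2023 row, the age-50 catch-up threshold, and all names (`ContributionLimits`, `LIMITS_BY_YEAR`, `max_401k_contribution`) are illustrative assumptions and are not taken from `cps.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContributionLimits:
    """Annual retirement contribution limits in dollars (hypothetical container)."""

    k401: int
    k401_catch_up: int
    ira: int
    ira_catch_up: int


# Year-indexed lookup. The 2022 row uses the figures quoted in the issue;
# the 2023 row is added for illustration and should be checked against IRS
# publications before use.
LIMITS_BY_YEAR = {
    2022: ContributionLimits(k401=20_500, k401_catch_up=6_500, ira=6_000, ira_catch_up=1_000),
    2023: ContributionLimits(k401=22_500, k401_catch_up=7_500, ira=6_500, ira_catch_up=1_000),
}

# Assumption: catch-up contributions are available from age 50.
CATCH_UP_AGE = 50


def limits_for(year: int) -> ContributionLimits:
    """Return the limits for a year, failing loudly instead of silently reusing 2022."""
    try:
        return LIMITS_BY_YEAR[year]
    except KeyError:
        raise ValueError(f"No contribution limits configured for {year}") from None


def max_401k_contribution(year: int, age: int) -> int:
    """Maximum 401(k) deferral for a person of the given age in the given year."""
    limits = limits_for(year)
    catch_up = limits.k401_catch_up if age >= CATCH_UP_AGE else 0
    return limits.k401 + catch_up
```

Raising on an unknown year is a deliberate choice in this sketch: a dataset build for an unconfigured year stops with an error rather than quietly applying another year's limits.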
1 change: 1 addition & 0 deletions Makefile
@@ -61,6 +61,7 @@ database:
python policyengine_us_data/db/etl_age.py
python policyengine_us_data/db/etl_medicaid.py
python policyengine_us_data/db/etl_snap.py
+ python policyengine_us_data/db/etl_state_income_tax.py
python policyengine_us_data/db/etl_irs_soi.py
python policyengine_us_data/db/validate_database.py

4 changes: 4 additions & 0 deletions changelog_entry.yaml
@@ -0,0 +1,4 @@
- bump: minor
  changes:
    added:
    - Add state income tax calibration targets from Census STC FY2023 data
Original file line number Diff line number Diff line change
@@ -105,10 +105,11 @@
  targets_df, X_sparse, household_id_mapping = builder.build_matrix(
      sim,
      target_filter={
-         "stratum_group_ids": [4],
+         "stratum_group_ids": [4, 7],  # 4=SNAP households, 7=state income tax
          "variables": [
              "health_insurance_premiums_without_medicare_part_b",
              "snap",
+             "state_income_tax",  # Census STC state income tax collections
          ],
      },
  )
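For readers unfamiliar with the filter, the sketch below shows one way a `target_filter` dict of this shape could select calibration targets. It is not the actual `build_matrix` logic: `Target`, `filter_targets`, the `match` parameter, the toy values, and group id 99 are all hypothetical, and the diff does not show whether the real builder combines the two criteria with OR or AND, so both readings are exposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A toy calibration target (hypothetical; not the real schema)."""

    variable: str
    stratum_group_id: int
    value: float


def filter_targets(targets, target_filter, match="any"):
    """Keep targets that satisfy the filter.

    match="any": keep a target if its group id OR its variable is listed.
    match="all": keep a target only if BOTH are listed.
    Which of these the real builder uses is an assumption left open here.
    """
    if match not in ("any", "all"):
        raise ValueError("match must be 'any' or 'all'")
    group_ids = set(target_filter.get("stratum_group_ids", []))
    variables = set(target_filter.get("variables", []))
    combine = any if match == "any" else all
    return [
        t
        for t in targets
        if combine([t.stratum_group_id in group_ids, t.variable in variables])
    ]


# Toy data: values are placeholders and group id 99 is made up.
targets = [
    Target("snap", 4, 1.0),
    Target("state_income_tax", 7, 2.0),
    Target("eitc", 6, 3.0),
    Target("health_insurance_premiums_without_medicare_part_b", 99, 4.0),
]

target_filter = {
    "stratum_group_ids": [4, 7],
    "variables": [
        "health_insurance_premiums_without_medicare_part_b",
        "snap",
        "state_income_tax",
    ],
}

selected_any = filter_targets(targets, target_filter, match="any")
selected_all = filter_targets(targets, target_filter, match="all")
```

Under the OR reading the health insurance premium target survives even though its group id is not listed; under the AND reading it is dropped. That difference is worth checking against the builder before relying on either behavior.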
7 changes: 5 additions & 2 deletions policyengine_us_data/db/DATABASE_GUIDE.md
@@ -30,8 +30,9 @@ make promote-database # Copy DB + raw inputs to HuggingFace clone
| 4 | `etl_age.py` | Census ACS 1-year | Age distribution: 18 bins x 488 geographies |
| 5 | `etl_medicaid.py` | Census ACS + CMS | Medicaid enrollment (admin state-level, survey district-level) |
| 6 | `etl_snap.py` | USDA FNS + Census ACS | SNAP participation (admin state-level, survey district-level) |
- | 7 | `etl_irs_soi.py` | IRS | Tax variables, EITC by child count, AGI brackets, conditional strata |
- | 8 | `validate_database.py` | No | Checks all target variables exist in policyengine-us |
+ | 7 | `etl_state_income_tax.py` | No | State income tax collections (Census STC FY2023, hardcoded) |
+ | 8 | `etl_irs_soi.py` | IRS | Tax variables, EITC by child count, AGI brackets, conditional strata |
+ | 9 | `validate_database.py` | No | Checks all target variables exist in policyengine-us |

### Raw Input Caching

@@ -108,6 +109,7 @@ The `stratum_group_id` field categorizes strata:
| 4 | SNAP | SNAP recipient strata |
| 5 | Medicaid | Medicaid enrollment strata |
| 6 | EITC | EITC recipients by qualifying children |
+ | 7 | State Income Tax | State-level income tax collections (Census STC) |
| 100-118 | IRS Conditional | Each IRS variable paired with conditional count constraints |

### Conditional Strata (IRS SOI)
@@ -216,6 +218,7 @@ SELECT
WHEN 4 THEN 'SNAP'
WHEN 5 THEN 'Medicaid'
WHEN 6 THEN 'EITC'
+ WHEN 7 THEN 'State Income Tax'
END AS group_name,
COUNT(*) AS stratum_count
FROM strata
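To try the updated group-name query without the real database, a minimal sketch using an in-memory SQLite table may help. Only the `strata` table name, the `stratum_group_id` column, and the four `WHEN` branches come from the guide; the `stratum_id` column, the `CASE stratum_group_id` head, the `GROUP BY` and `ORDER BY` clauses, and the sample rows are assumptions, since the hunk cuts the query off.

```python
import sqlite3

# In-memory stand-in for the strata table (hypothetical minimal schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE strata ("
    "stratum_id INTEGER PRIMARY KEY, "
    "stratum_group_id INTEGER NOT NULL)"
)

# Made-up rows: two SNAP strata, one Medicaid, one EITC, three state income tax.
conn.executemany(
    "INSERT INTO strata (stratum_group_id) VALUES (?)",
    [(4,), (4,), (5,), (6,), (7,), (7,), (7,)],
)

# The CASE branches mirror the guide; GROUP BY / ORDER BY are assumed.
query = """
SELECT
    CASE stratum_group_id
        WHEN 4 THEN 'SNAP'
        WHEN 5 THEN 'Medicaid'
        WHEN 6 THEN 'EITC'
        WHEN 7 THEN 'State Income Tax'
    END AS group_name,
    COUNT(*) AS stratum_count
FROM strata
GROUP BY stratum_group_id
ORDER BY stratum_group_id
"""

rows = conn.execute(query).fetchall()
for group_name, stratum_count in rows:
    print(f"{group_name}: {stratum_count}")

conn.close()
```

Without the new `WHEN 7` branch, the state income tax strata would still be counted but would show up with a NULL `group_name`, which is why the guide's query needs the extra line.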