From 0a14f946771798ee2e4c81a9c92fd099f7fd25f1 Mon Sep 17 00:00:00 2001 From: David Trimmer Date: Wed, 17 Dec 2025 13:40:21 -0500 Subject: [PATCH 1/4] Add EITC childless families analysis notebook MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Analyzes childless families by EITC phase status using state-specific datasets. Features: - Uses state datasets (hf://policyengine/policyengine-us-data/states/{STATE}.h5) - 5 phase statuses: No income, Pre-phase-in, Full amount, Partially phased out, Fully phased out - Weighted household counts and percentages by state - Separate summary CSVs for 2024 and 2025 - Includes federal and state EITC amounts - Household characteristics: marital status, age, AGI Fixes #99 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .gitignore | 3 + .../eitc_childless_analysis.ipynb | 1500 +++++++++++++++++ ...tc_childless_phase_status_summary_2024.csv | 256 +++ ...tc_childless_phase_status_summary_2025.csv | 255 +++ 4 files changed, 2014 insertions(+) create mode 100644 eitc_childless_analysis/eitc_childless_analysis.ipynb create mode 100644 eitc_childless_analysis/eitc_childless_phase_status_summary_2024.csv create mode 100644 eitc_childless_analysis/eitc_childless_phase_status_summary_2025.csv diff --git a/.gitignore b/.gitignore index 8b248d0..d8ba267 100644 --- a/.gitignore +++ b/.gitignore @@ -163,3 +163,6 @@ cython_debug/ # and can be added to the global gitignore or merged into this file. For a more nuclear # option (not recommended) you can uncomment the following to ignore the entire idea folder. #.idea/ + +# Large data files +eitc_childless_analysis/eitc_childless_families_*.csv diff --git a/eitc_childless_analysis/eitc_childless_analysis.ipynb b/eitc_childless_analysis/eitc_childless_analysis.ipynb new file mode 100644 index 0000000..5b7af7e --- /dev/null +++ b/eitc_childless_analysis/eitc_childless_analysis.ipynb @@ -0,0 +1,1500 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# EITC Analysis: Childless Families by Phase-in/Phase-out Status\n", + "\n", + "This notebook analyzes childless families (tax units with no EITC qualifying children) who receive the Earned Income Tax Credit (EITC), including:\n", + "- Federal EITC amounts\n", + "- State EITC amounts (where applicable)\n", + "- EITC schedule position (pre-phase-in, full amount, partially phased out, fully phased out)\n", + "- Household characteristics (marital status, state, demographics)\n", + "\n", + "**Data Source:** State-specific datasets from PolicyEngine\n", + "\n", + "**Years Analyzed:** 2024 and 2025" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup and Imports" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "from policyengine_us import Microsimulation\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "pd.set_option('display.max_columns', None)\n", + "pd.set_option('display.width', None)\n", + "pd.set_option('display.float_format', lambda x: f'{x:,.2f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Helper Function: Determine EITC Phase Status\n", + "\n", + "The EITC has four distinct regions:\n", + "1. **Pre-phase-in**: Earning below the level needed to reach maximum credit\n", + "2. **Full amount (plateau)**: Earning enough for maximum credit, not yet in phase-out\n", + "3. **Partially phased out**: In phase-out range, but still receiving some credit\n", + "4. **Fully phased out**: Income too high, EITC = $0" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "def determine_eitc_phase_status_vectorized(df):\n", + " \"\"\"\n", + " Vectorized version to determine EITC phase status for a DataFrame.\n", + " \n", + " Categories:\n", + " - No income: No/minimal earned income, not receiving EITC\n", + " - Pre-phase-in: Earning but haven't reached maximum credit yet\n", + " - Full amount: At maximum credit (plateau region)\n", + " - Partially phased out: In phase-out region, still receiving some credit\n", + " - Fully phased out: Income too high, EITC reduced to $0\n", + " \"\"\"\n", + " conditions = [\n", + " # No income: earned income is 0 or very low AND not receiving EITC\n", + " (df['tax_unit_earned_income'] <= 100) & (df['eitc'] <= 0),\n", + " \n", + " # Fully phased out: EITC is 0 AND had some earned income AND there was reduction\n", + " (df['eitc'] <= 0) & (df['tax_unit_earned_income'] > 100) & (df['eitc_reduction'] > 0),\n", + " \n", + " # Fully phased out: EITC is 0 AND phased_in >= maximum (meaning they would have gotten max but it's all reduced)\n", + " (df['eitc'] <= 0) & (df['eitc_phased_in'] >= df['eitc_maximum']),\n", + " \n", + " # Pre-phase-in: Receiving EITC but haven't hit maximum yet\n", + " (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),\n", + " \n", + " # Partially phased out: Receiving EITC with some reduction\n", + " (df['eitc'] > 0) & (df['eitc_reduction'] > 0),\n", + " \n", + " # Full amount: At maximum, no reduction\n", + " (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0)\n", + " ]\n", + " \n", + " choices = [\n", + " 'No income',\n", + " 'Fully phased out',\n", + " 'Fully phased out',\n", + " 'Pre-phase-in',\n", + " 'Partially phased out',\n", + " 'Full amount'\n", + " ]\n", + " \n", + " return np.select(conditions, choices, default='No income')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Data and Calculate Variables\n", + "\n", + "We'll run the analysis for both 2024 and 2025." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "# List of all US states\n", + "ALL_STATES = [\n", + " 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', \n", + " 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', \n", + " 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', \n", + " 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', \n", + " 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'\n", + "]\n", + "\n", + "# Phase status order for sorting\n", + "PHASE_ORDER = ['No income', 'Pre-phase-in', 'Full amount', 'Partially phased out', 'Fully phased out']\n", + "\n", + "def run_state_eitc_analysis(state_abbr, year):\n", + " \"\"\"\n", + " Run EITC analysis for ALL childless households (not just recipients) for a given state and year.\n", + " \"\"\"\n", + " try:\n", + " # Load the state-specific dataset\n", + " dataset_path = f\"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5\"\n", + " sim = Microsimulation(dataset=dataset_path)\n", + " \n", + " # Calculate tax unit level variables\n", + " data = {}\n", + " \n", + " tax_unit_vars = [\n", + " 'tax_unit_id',\n", + " 'tax_unit_weight',\n", + " 'eitc',\n", + " 'eitc_maximum',\n", + " 'eitc_phased_in',\n", + " 'eitc_reduction',\n", + " 'eitc_child_count',\n", + " 'state_eitc',\n", + " 'adjusted_gross_income',\n", + " 'tax_unit_earned_income',\n", + " 'filing_status',\n", + " 'age_head',\n", + " 'age_spouse',\n", + " ]\n", + " \n", + " for var in tax_unit_vars:\n", + " result = sim.calculate(var, period=year)\n", + " data[var] = result.values if hasattr(result, 'values') else np.array(result)\n", + " \n", + " df = pd.DataFrame(data)\n", + " df['state'] = state_abbr\n", + " \n", + " # Filter to childless families only (include ALL, not just EITC recipients)\n", + " childless_mask = df['eitc_child_count'] == 0\n", + " df_childless = df[childless_mask].copy()\n", + " \n", + " if len(df_childless) == 0:\n", + " return None\n", + " \n", + " # Determine EITC phase status for ALL childless households\n", + " df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)\n", + " \n", + " # Add year column\n", + " df_childless['year'] = year\n", + " \n", + " # Map filing status codes to readable labels\n", + " filing_status_map = {\n", + " 1: 'Single',\n", + " 2: 'Joint',\n", + " 3: 'Separate',\n", + " 4: 'Head of Household',\n", + " 5: 'Widow(er)'\n", + " }\n", + " df_childless['filing_status_label'] = df_childless['filing_status'].map(filing_status_map).fillna('Unknown')\n", + " \n", + " return df_childless\n", + " \n", + " except Exception as e:\n", + " print(f\" Error processing {state_abbr}: {e}\")\n", + " return None\n", + "\n", + "\n", + "def run_all_states_analysis(year, states=None):\n", + " \"\"\"\n", + " Run EITC analysis for all states for a given year.\n", + " \"\"\"\n", + " if states is None:\n", + " states = ALL_STATES\n", + " \n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Running analysis for {year}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " all_results = []\n", + " \n", + " for i, state in enumerate(states):\n", + " print(f\"Processing {state} ({i+1}/{len(states)})...\", end=\" \")\n", + " result = run_state_eitc_analysis(state, year)\n", + " if result is not None and len(result) > 0:\n", + " weighted_count = result['tax_unit_weight'].sum()\n", + " print(f\"{len(result):,} records, {weighted_count:,.0f} weighted\")\n", + " all_results.append(result)\n", + " else:\n", + " print(\"No data found\")\n", + " \n", + " if all_results:\n", + " combined = pd.concat(all_results, ignore_index=True)\n", + " print(f\"\\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units\")\n", + " return combined\n", + " else:\n", + " return pd.DataFrame()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run Analysis for 2024 and 2025" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Running analysis for 2024\n", + "============================================================\n", + "Processing AL (1/51)... 25,751 records, 1,422,123 weighted\n", + "Processing AK (2/51)... 1,182 records, 205,778 weighted\n", + "Processing AZ (3/51)... 30,120 records, 1,905,622 weighted\n", + "Processing AR (4/51)... 15,144 records, 683,842 weighted\n", + "Processing CA (5/51)... 238,247 records, 11,676,756 weighted\n", + "Processing CO (6/51)... 34,120 records, 1,602,958 weighted\n", + "Processing CT (7/51)... 19,827 records, 1,119,846 weighted\n", + "Processing DE (8/51)... 3,801 records, 265,233 weighted\n", + "Processing DC (9/51)... 4,995 records, 247,082 weighted\n", + "Processing FL (10/51)... 45,655 records, 6,828,672 weighted\n", + "Processing GA (11/51)... 56,638 records, 2,867,909 weighted\n", + "Processing HI (12/51)... 8,416 records, 401,230 weighted\n", + "Processing ID (13/51)... 7,678 records, 420,636 weighted\n", + "Processing IL (14/51)... 56,631 records, 4,061,833 weighted\n", + "Processing IN (15/51)... 33,456 records, 1,707,284 weighted\n", + "Processing IA (16/51)... 14,070 records, 834,990 weighted\n", + "Processing KS (17/51)... 15,776 records, 746,492 weighted\n", + "Processing KY (18/51)... 22,109 records, 1,122,918 weighted\n", + "Processing LA (19/51)... 21,674 records, 1,255,035 weighted\n", + "Processing ME (20/51)... 7,782 records, 436,655 weighted\n", + "Processing MD (21/51)... 39,963 records, 1,737,465 weighted\n", + "Processing MA (22/51)... 40,034 records, 2,445,482 weighted\n", + "Processing MI (23/51)... 41,722 records, 2,947,462 weighted\n", + "Processing MN (24/51)... 32,839 records, 1,579,933 weighted\n", + "Processing MS (25/51)... 13,414 records, 751,858 weighted\n", + "Processing MO (26/51)... 29,883 records, 1,572,474 weighted\n", + "Processing MT (27/51)... 7,850 records, 322,606 weighted\n", + "Processing NE (28/51)... 7,585 records, 555,046 weighted\n", + "Processing NV (29/51)... 10,477 records, 962,804 weighted\n", + "Processing NH (30/51)... 2,774 records, 466,176 weighted\n", + "Processing NJ (31/51)... 53,826 records, 2,670,506 weighted\n", + "Processing NM (32/51)... 10,333 records, 674,804 weighted\n", + "Processing NY (33/51)... 111,004 records, 6,089,496 weighted\n", + "Processing NC (34/51)... 55,174 records, 3,018,448 weighted\n", + "Processing ND (35/51)... 3,391 records, 208,559 weighted\n", + "Processing OH (36/51)... 52,414 records, 3,171,406 weighted\n", + "Processing OK (37/51)... 17,840 records, 1,141,744 weighted\n", + "Processing OR (38/51)... 27,048 records, 1,384,394 weighted\n", + "Processing PA (39/51)... 59,791 records, 4,057,412 weighted\n", + "Processing RI (40/51)... 7,429 records, 397,583 weighted\n", + "Processing SC (41/51)... 26,703 records, 1,387,951 weighted\n", + "Processing SD (42/51)... 1,071 records, 257,659 weighted\n", + "Processing TN (43/51)... 11,099 records, 2,125,824 weighted\n", + "Processing TX (44/51)... 46,778 records, 8,270,492 weighted\n", + "Processing UT (45/51)... 18,448 records, 728,702 weighted\n", + "Processing VT (46/51)... 3,780 records, 213,103 weighted\n", + "Processing VA (47/51)... 47,926 records, 2,348,494 weighted\n", + "Processing WA (48/51)... 12,571 records, 2,709,062 weighted\n", + "Processing WV (49/51)... 6,981 records, 519,592 weighted\n", + "Processing WI (50/51)... 27,609 records, 1,731,936 weighted\n", + "Processing WY (51/51)... 2,712 records, 170,182 weighted\n", + "\n", + "Total: 1,493,541 records, 96,431,536 weighted tax units\n" + ] + } + ], + "source": [ + "# Run for 2024 - all states\n", + "df_2024 = run_all_states_analysis(2024)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Running analysis for 2025\n", + "============================================================\n", + "Processing AL (1/51)... 25,751 records, 1,435,327 weighted\n", + "Processing AK (2/51)... 1,182 records, 207,689 weighted\n", + "Processing AZ (3/51)... 30,120 records, 1,923,315 weighted\n", + "Processing AR (4/51)... 15,144 records, 690,191 weighted\n", + "Processing CA (5/51)... 238,247 records, 11,785,171 weighted\n", + "Processing CO (6/51)... 34,120 records, 1,617,841 weighted\n", + "Processing CT (7/51)... 19,827 records, 1,130,243 weighted\n", + "Processing DE (8/51)... 3,801 records, 267,696 weighted\n", + "Processing DC (9/51)... 4,995 records, 249,376 weighted\n", + "Processing FL (10/51)... 45,655 records, 6,892,074 weighted\n", + "Processing GA (11/51)... 56,638 records, 2,894,537 weighted\n", + "Processing HI (12/51)... 8,416 records, 404,956 weighted\n", + "Processing ID (13/51)... 7,678 records, 424,542 weighted\n", + "Processing IL (14/51)... 56,631 records, 4,099,546 weighted\n", + "Processing IN (15/51)... 33,456 records, 1,723,135 weighted\n", + "Processing IA (16/51)... 14,070 records, 842,742 weighted\n", + "Processing KS (17/51)... 15,776 records, 753,423 weighted\n", + "Processing KY (18/51)... 22,109 records, 1,133,344 weighted\n", + "Processing LA (19/51)... 21,674 records, 1,266,688 weighted\n", + "Processing ME (20/51)... 7,782 records, 440,709 weighted\n", + "Processing MD (21/51)... 39,963 records, 1,753,596 weighted\n", + "Processing MA (22/51)... 40,034 records, 2,468,188 weighted\n", + "Processing MI (23/51)... 41,722 records, 2,974,829 weighted\n", + "Processing MN (24/51)... 32,839 records, 1,594,602 weighted\n", + "Processing MS (25/51)... 13,414 records, 758,839 weighted\n", + "Processing MO (26/51)... 29,883 records, 1,587,074 weighted\n", + "Processing MT (27/51)... 7,850 records, 325,601 weighted\n", + "Processing NE (28/51)... 7,585 records, 560,199 weighted\n", + "Processing NV (29/51)... 10,477 records, 971,744 weighted\n", + "Processing NH (30/51)... 2,774 records, 470,505 weighted\n", + "Processing NJ (31/51)... 53,826 records, 2,695,300 weighted\n", + "Processing NM (32/51)... 10,333 records, 681,069 weighted\n", + "Processing NY (33/51)... 111,004 records, 6,146,035 weighted\n", + "Processing NC (34/51)... 55,174 records, 3,046,473 weighted\n", + "Processing ND (35/51)... 3,391 records, 210,495 weighted\n", + "Processing OH (36/51)... 52,414 records, 3,200,852 weighted\n", + "Processing OK (37/51)... 17,840 records, 1,152,346 weighted\n", + "Processing OR (38/51)... 27,048 records, 1,397,248 weighted\n", + "Processing PA (39/51)... 59,791 records, 4,095,084 weighted\n", + "Processing RI (40/51)... 7,429 records, 401,274 weighted\n", + "Processing SC (41/51)... 26,703 records, 1,400,838 weighted\n", + "Processing SD (42/51)... 1,071 records, 260,051 weighted\n", + "Processing TN (43/51)... 11,099 records, 2,145,562 weighted\n", + "Processing TX (44/51)... 46,778 records, 8,347,282 weighted\n", + "Processing UT (45/51)... 18,448 records, 735,468 weighted\n", + "Processing VT (46/51)... 3,780 records, 215,082 weighted\n", + "Processing VA (47/51)... 47,926 records, 2,370,298 weighted\n", + "Processing WA (48/51)... 12,571 records, 2,734,216 weighted\n", + "Processing WV (49/51)... 6,981 records, 524,417 weighted\n", + "Processing WI (50/51)... 27,609 records, 1,748,017 weighted\n", + "Processing WY (51/51)... 2,712 records, 171,762 weighted\n", + "\n", + "Total: 1,493,541 records, 97,326,840 weighted tax units\n" + ] + } + ], + "source": [ + "# Run for 2025 - all states\n", + "df_2025 = run_all_states_analysis(2025)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Combined dataset: 2,987,082 records\n" + ] + } + ], + "source": [ + "# Combine both years\n", + "df_combined = pd.concat([df_2024, df_2025], ignore_index=True)\n", + "print(f\"\\nCombined dataset: {len(df_combined):,} records\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary Statistics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### EITC Phase Status Distribution" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "======================================================================\n", + "EITC Phase Status by State - 2024\n", + "======================================================================\n", + "\n", + "======================================================================\n", + "EITC Phase Status by State - 2025\n", + "======================================================================\n", + "\n", + "2024 Summary (first 20 rows):\n", + "state eitc_phase_status weighted_households pct_of_state avg_federal_eitc avg_state_eitc\n", + " AK No income 64,211.33 31.20 0.00 0.00\n", + " AK Pre-phase-in 3,593.07 1.70 515.63 0.00\n", + " AK Full amount 0.26 0.00 632.00 0.00\n", + " AK Partially phased out 1,670.44 0.80 626.76 0.00\n", + " AK Fully phased out 136,303.23 66.20 0.00 0.00\n", + " AL No income 598,891.06 42.10 0.00 0.00\n", + " AL Pre-phase-in 3,394.00 0.20 354.86 0.00\n", + " AL Full amount 579.79 0.00 632.00 0.00\n", + " AL Partially phased out 10,719.72 0.80 448.06 0.00\n", + " AL Fully phased out 808,538.31 56.90 0.00 0.00\n", + " AR No income 232,860.75 34.10 0.00 0.00\n", + " AR Pre-phase-in 2,328.13 0.30 453.60 0.00\n", + " AR Full amount 225.77 0.00 632.00 0.00\n", + " AR Partially phased out 5,891.05 0.90 390.64 0.00\n", + " AR Fully phased out 442,536.25 64.70 0.00 0.00\n", + " AZ No income 672,398.75 35.30 0.00 0.00\n", + " AZ Pre-phase-in 16,732.68 0.90 489.91 0.00\n", + " AZ Full amount 813.98 0.00 632.00 0.00\n", + " AZ Partially phased out 14,077.30 0.70 468.21 0.00\n", + " AZ Fully phased out 1,201,599.25 63.10 0.00 0.00\n", + "\n", + "2025 Summary (first 20 rows):\n", + "state eitc_phase_status weighted_households pct_of_state avg_federal_eitc avg_state_eitc\n", + " AK No income 64,630.09 31.10 0.00 0.00\n", + " AK Pre-phase-in 3,626.43 1.70 540.81 0.00\n", + " AK Full amount 0.27 0.00 649.00 0.00\n", + " AK Partially phased out 1,685.95 0.80 627.09 0.00\n", + " AK Fully phased out 137,746.20 66.30 0.00 0.00\n", + " AL No income 600,602.44 41.80 0.00 0.00\n", + " AL Pre-phase-in 3,424.46 0.20 372.11 0.00\n", + " AL Full amount 586.22 0.00 649.00 0.00\n", + " AL Partially phased out 10,817.39 0.80 439.31 0.00\n", + " AL Fully phased out 819,896.50 57.10 0.00 0.00\n", + " AR No income 233,882.83 33.90 0.00 0.00\n", + " AR Pre-phase-in 2,349.75 0.30 475.76 0.00\n", + " AR Full amount 227.08 0.00 649.00 0.00\n", + " AR Partially phased out 5,943.54 0.90 379.36 0.00\n", + " AR Fully phased out 447,788.06 64.90 0.00 0.00\n", + " AZ No income 676,085.38 35.20 0.00 0.00\n", + " AZ Pre-phase-in 16,887.52 0.90 513.83 0.00\n", + " AZ Full amount 821.81 0.00 649.00 0.00\n", + " AZ Partially phased out 14,207.12 0.70 460.65 0.00\n", + " AZ Fully phased out 1,215,313.50 63.20 0.00 0.00\n" + ] + } + ], + "source": [ + "def create_phase_status_summary(df, year_label):\n", + " \"\"\"\n", + " Create summary of EITC phase status by state with weighted counts and percentages.\n", + " \"\"\"\n", + " print(f\"\\n{'='*70}\")\n", + " print(f\"EITC Phase Status by State - {year_label}\")\n", + " print(f\"{'='*70}\")\n", + " \n", + " # Calculate weighted counts by state and phase status\n", + " summary = df.groupby(['state', 'eitc_phase_status']).agg({\n", + " 'tax_unit_weight': 'sum',\n", + " }).reset_index()\n", + " \n", + " summary.columns = ['state', 'eitc_phase_status', 'weighted_households']\n", + " \n", + " # Calculate state totals for percentage\n", + " state_totals = summary.groupby('state')['weighted_households'].sum().reset_index()\n", + " state_totals.columns = ['state', 'state_total']\n", + " \n", + " # Merge to get percentages\n", + " summary = summary.merge(state_totals, on='state')\n", + " summary['pct_of_state'] = (summary['weighted_households'] / summary['state_total'] * 100).round(1)\n", + " \n", + " # Add average EITC amounts (only for those receiving)\n", + " avg_eitc = df[df['eitc'] > 0].groupby(['state', 'eitc_phase_status']).apply(\n", + " lambda x: pd.Series({\n", + " 'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " 'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " })\n", + " ).reset_index()\n", + " \n", + " summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')\n", + " summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)\n", + " summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)\n", + " \n", + " # Reorder columns\n", + " summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', \n", + " 'avg_federal_eitc', 'avg_state_eitc']]\n", + " \n", + " # Sort by state and phase status order\n", + " summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", + " summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", + " \n", + " return summary\n", + "\n", + "# Run for 2024 and 2025\n", + "summary_2024 = create_phase_status_summary(df_2024, \"2024\")\n", + "summary_2025 = create_phase_status_summary(df_2025, \"2025\")\n", + "\n", + "print(\"\\n2024 Summary (first 20 rows):\")\n", + "print(summary_2024.head(20).to_string(index=False))\n", + "print(\"\\n2025 Summary (first 20 rows):\")\n", + "print(summary_2025.head(20).to_string(index=False))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Distribution by Filing Status (Marital Status)" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "======================================================================\n", + "Example Households by Phase Status - 2024\n", + "======================================================================\n", + " phase_status state marital_status age_head agi earned_income federal_eitc state_eitc\n", + " Pre-phase-in TN Unknown 30 938.96 2,316.38 177.20 0.00\n", + " Pre-phase-in NY Unknown 38 2.51 2.51 0.19 0.06\n", + " Pre-phase-in NY Unknown 44 3,109.13 3,268.07 250.01 75.00\n", + " Full amount GA Unknown 72 9,529.73 9,529.73 632.00 0.00\n", + " Full amount NY Unknown 31 12,709.29 13,765.16 632.00 189.60\n", + " Full amount CA Unknown 48 7,210.69 15,053.27 632.00 159.20\n", + "Partially phased out AL Unknown 25 12,807.38 13,072.27 422.22 0.00\n", + "Partially phased out NY Unknown 64 13,765.90 13,765.16 369.15 65.75\n", + "Partially phased out AZ Unknown 46 3,667.19 10,394.97 627.03 0.00\n" + ] + } + ], + "source": [ + "def show_example_households(df, year_label, n_examples=3):\n", + " \"\"\"\n", + " Show example households from each phase status with key characteristics.\n", + " \"\"\"\n", + " print(f\"\\n{'='*70}\")\n", + " print(f\"Example Households by Phase Status - {year_label}\")\n", + " print(f\"{'='*70}\")\n", + " \n", + " examples = []\n", + " \n", + " for phase in ['Pre-phase-in', 'Full amount', 'Partially phased out']:\n", + " phase_df = df[df['eitc_phase_status'] == phase]\n", + " if len(phase_df) > 0:\n", + " # Get random sample of examples\n", + " sample = phase_df.sample(min(n_examples, len(phase_df)), random_state=42)\n", + " for _, row in sample.iterrows():\n", + " examples.append({\n", + " 'phase_status': phase,\n", + " 'state': row['state'],\n", + " 'marital_status': row['filing_status_label'],\n", + " 'age_head': int(row['age_head']),\n", + " 'agi': row['adjusted_gross_income'],\n", + " 'earned_income': row['tax_unit_earned_income'],\n", + " 'federal_eitc': row['eitc'],\n", + " 'state_eitc': row['state_eitc'],\n", + " })\n", + " \n", + " examples_df = pd.DataFrame(examples)\n", + " return examples_df\n", + "\n", + "# Show examples for 2024\n", + "examples_2024 = show_example_households(df_2024, \"2024\")\n", + "print(examples_2024.to_string(index=False))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Distribution by State" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Top 15 States by EITC Recipients - 2024\n", + "============================================================\n", + "state Tax Units (Weighted) Total Federal EITC Total State EITC Avg Federal EITC Avg State EITC Has State EITC\n", + " CA 11,676,756.00 126,396,768.00 392,533,280.00 10.82 33.62 True\n", + " TX 8,270,492.50 102,216,176.00 0.00 12.36 0.00 False\n", + " FL 6,828,671.50 50,078,040.00 0.00 7.33 0.00 False\n", + " NY 6,089,496.00 64,632,924.00 17,955,152.00 10.61 2.95 True\n", + " IL 4,061,833.00 43,125,848.00 8,625,170.00 10.62 2.12 True\n", + " PA 4,057,412.25 41,305,212.00 0.00 10.18 0.00 False\n", + " OH 3,171,405.75 30,410,496.00 9,123,148.00 9.59 2.88 True\n", + " NC 3,018,447.50 15,553,126.00 0.00 5.15 0.00 False\n", + " MI 2,947,462.50 30,062,786.00 9,018,837.00 10.20 3.06 True\n", + " GA 2,867,909.25 20,237,260.00 0.00 7.06 0.00 False\n", + " WA 2,709,062.25 42,446,576.00 27,457,220.00 15.67 10.14 True\n", + " NJ 2,670,505.50 31,733,258.00 55,209,756.00 11.88 20.67 True\n", + " MA 2,445,482.50 27,926,758.00 11,170,704.00 11.42 4.57 True\n", + " VA 2,348,493.50 14,553,468.00 102,224,696.00 6.20 43.53 True\n", + " TN 2,125,824.00 16,333,918.00 0.00 7.68 0.00 False\n", + "\n", + "============================================================\n", + "Top 15 States by EITC Recipients - 2025\n", + "============================================================\n", + "state Tax Units (Weighted) Total Federal EITC Total State EITC Avg Federal EITC Avg State EITC Has State EITC\n", + " CA 11,785,171.00 129,709,128.00 394,849,248.00 11.01 33.50 True\n", + " TX 8,347,282.00 106,447,616.00 0.00 12.75 0.00 False\n", + " FL 6,892,074.00 51,580,204.00 0.00 7.48 0.00 False\n", + " NY 6,146,035.00 66,562,396.00 18,485,602.00 10.83 3.01 True\n", + " IL 4,099,546.00 44,526,100.00 8,905,220.00 10.86 2.17 True\n", + " PA 4,095,084.50 42,804,808.00 0.00 10.45 0.00 False\n", + " OH 3,200,851.75 31,340,700.00 9,402,211.00 9.79 2.94 True\n", + " NC 3,046,473.25 15,684,542.00 0.00 5.15 0.00 False\n", + " MI 2,974,829.25 31,287,212.00 9,386,164.00 10.52 3.16 True\n", + " GA 2,894,537.00 20,446,926.00 0.00 7.06 0.00 False\n", + " WA 2,734,215.50 44,398,868.00 28,292,344.00 16.24 10.35 True\n", + " NJ 2,695,300.25 32,696,890.00 56,647,304.00 12.13 21.02 True\n", + " MA 2,468,188.00 28,870,056.00 11,548,022.00 11.70 4.68 True\n", + " VA 2,370,298.50 14,610,885.00 103,897,088.00 6.16 43.83 True\n", + " TN 2,145,561.75 16,974,484.00 0.00 7.91 0.00 False\n" + ] + } + ], + "source": [ + "def summary_by_state(df, year_label, top_n=15):\n", + " \"\"\"\n", + " Create summary by state (top N by number of recipients).\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Top {top_n} States by EITC Recipients - {year_label}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " summary = df.groupby('state').apply(\n", + " lambda x: pd.Series({\n", + " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", + " 'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),\n", + " 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n", + " 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " 'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,\n", + " })\n", + " ).reset_index()\n", + " \n", + " # Sort by number of recipients\n", + " summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)\n", + " \n", + " return summary\n", + "\n", + "# 2024\n", + "state_2024 = summary_by_state(df_2024, \"2024\")\n", + "print(state_2024.to_string(index=False))\n", + "\n", + "# 2025\n", + "state_2025 = summary_by_state(df_2025, \"2025\")\n", + "print(state_2025.to_string(index=False))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Cross-tabulation: Phase Status by Filing Status" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Phase Status by Filing Status (Weighted Tax Units) - 2024\n", + "============================================================\n", + "filing_status_label Unknown Total\n", + "eitc_phase_status \n", + "Full amount 33,314.48 33,314.48\n", + "Fully phased out 60,244,548.00 60,244,548.00\n", + "No income 34,126,456.00 34,126,456.00\n", + "Partially phased out 824,046.81 824,046.81\n", + "Pre-phase-in 1,203,184.00 1,203,184.00\n", + "Total 96,431,552.00 96,431,552.00\n", + "\n", + "============================================================\n", + "Phase Status by Filing Status (Weighted Tax Units) - 2025\n", + "============================================================\n", + "filing_status_label Unknown Total\n", + "eitc_phase_status \n", + "Full amount 33,638.47 33,638.47\n", + "Fully phased out 60,940,444.00 60,940,444.00\n", + "No income 34,307,016.00 34,307,016.00\n", + "Partially phased out 831,458.88 831,458.88\n", + "Pre-phase-in 1,214,332.12 1,214,332.12\n", + "Total 97,326,896.00 97,326,896.00\n" + ] + } + ], + "source": [ + "def crosstab_phase_by_filing(df, year_label):\n", + " \"\"\"\n", + " Create cross-tabulation of phase status by filing status.\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Phase Status by Filing Status (Weighted Tax Units) - {year_label}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Create pivot table with weighted counts\n", + " pivot = df.pivot_table(\n", + " values='tax_unit_weight',\n", + " index='eitc_phase_status',\n", + " columns='filing_status_label',\n", + " aggfunc='sum',\n", + " fill_value=0\n", + " )\n", + " \n", + " # Add totals\n", + " pivot['Total'] = pivot.sum(axis=1)\n", + " pivot.loc['Total'] = pivot.sum()\n", + " \n", + " return pivot\n", + "\n", + "# 2024\n", + "crosstab_2024 = crosstab_phase_by_filing(df_2024, \"2024\")\n", + "print(crosstab_2024.to_string())\n", + "\n", + "# 2025\n", + "crosstab_2025 = crosstab_phase_by_filing(df_2025, \"2025\")\n", + "print(crosstab_2025.to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Age Distribution" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Age Distribution of Head of Household - 2024\n", + "============================================================\n", + "age_group Tax Units (Weighted) Avg Federal EITC Avg Earned Income % of Total\n", + " Under 25 12,069,262.00 0.57 24,647.42 12.50\n", + " 25-34 14,198,971.00 35.78 76,383.24 14.70\n", + " 35-44 11,448,204.00 2.19 94,731.74 11.90\n", + " 45-54 16,595,334.00 22.36 87,682.62 17.20\n", + " 55-64 9,673,886.00 1.18 59,089.31 10.00\n", + " 65+ 32,441,214.00 0.01 25,601.59 33.60\n", + "\n", + "============================================================\n", + "Age Distribution of Head of Household - 2025\n", + "============================================================\n", + "age_group Tax Units (Weighted) Avg Federal EITC Avg Earned Income % of Total\n", + " Under 25 12,181,323.00 0.59 25,851.27 12.50\n", + " 25-34 14,330,805.00 36.65 80,112.14 14.70\n", + " 35-44 11,554,499.00 2.15 99,357.79 11.90\n", + " 45-54 16,749,416.00 22.80 91,964.61 17.20\n", + " 55-64 9,763,707.00 1.16 61,973.64 10.00\n", + " 65+ 32,742,422.00 0.01 26,850.62 33.60\n" + ] + } + ], + "source": [ + "def age_distribution(df, year_label):\n", + " \"\"\"\n", + " Create age group distribution.\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Age Distribution of Head of Household - {year_label}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Create age groups\n", + " df_copy = df.copy()\n", + " df_copy['age_group'] = pd.cut(\n", + " df_copy['age_head'],\n", + " bins=[0, 25, 35, 45, 55, 65, 100],\n", + " labels=['Under 25', '25-34', '35-44', '45-54', '55-64', '65+']\n", + " )\n", + " \n", + " summary = df_copy.groupby('age_group').apply(\n", + " lambda x: pd.Series({\n", + " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", + " 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n", + " 'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n", + " })\n", + " ).reset_index()\n", + " \n", + " total_units = summary['Tax Units (Weighted)'].sum()\n", + " summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)\n", + " \n", + " return summary\n", + "\n", + "# 2024\n", + "age_2024 = age_distribution(df_2024, \"2024\")\n", + "print(age_2024.to_string(index=False))\n", + "\n", + "# 2025\n", + "age_2025 = age_distribution(df_2025, \"2025\")\n", + "print(age_2025.to_string(index=False))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### States with State EITC Programs" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "States with State EITC Benefits - 2024\n", + "============================================================\n", + "state Tax Units (Weighted) Total State EITC Avg State EITC State EITC as % of Fed\n", + " CA 2,873,632.00 392,533,248.00 136.60 310.56\n", + " MN 313,996.50 118,287,520.00 376.72 656.23\n", + " VA 348,289.69 102,224,688.00 293.50 702.41\n", + " NJ 242,280.23 55,209,756.00 227.88 173.98\n", + " MD 64,599.95 49,036,304.00 759.08 294.33\n", + " WA 84,485.99 27,457,220.00 324.99 64.69\n", + " NY 146,240.25 17,955,152.00 122.78 27.83\n", + " MA 61,848.42 11,170,704.00 180.61 40.00\n", + " DC 21,836.86 10,643,084.00 487.39 442.39\n", + " NM 90,626.20 9,532,400.00 105.18 140.54\n", + " SC 15,907.35 9,201,450.00 578.44 125.00\n", + " OH 63,755.52 9,123,150.00 143.10 30.00\n", + " MI 58,990.42 9,018,837.00 152.89 30.00\n", + " IL 93,595.17 8,625,170.00 92.15 20.00\n", + " CO 33,525.91 7,573,860.50 225.91 50.00\n", + " CT 26,765.33 4,802,010.00 179.41 40.00\n", + " MO 25,335.12 2,362,942.00 93.27 20.00\n", + " ME 8,716.66 1,781,716.25 204.40 50.00\n", + " HI 8,067.77 1,532,535.38 189.96 40.00\n", + " IA 19,540.86 1,514,946.88 77.53 15.00\n", + " IN 25,976.84 1,273,132.25 49.01 10.00\n", + " OR 28,182.12 1,070,083.75 37.97 9.00\n", + " VT 4,973.77 849,665.75 170.83 38.00\n", + " KS 12,115.70 816,712.25 67.41 17.00\n", + " RI 9,870.70 746,713.44 75.65 16.00\n", + " LA 20,618.45 473,224.38 22.95 5.00\n", + " OK 20,021.92 449,874.88 22.47 4.57\n", + " NE 9,921.20 428,053.00 43.15 10.00\n", + " MT 6,376.92 245,735.78 38.54 10.00\n", + " DE 4,382.70 126,291.01 28.82 6.69\n", + "\n", + "============================================================\n", + "States with State EITC Benefits - 2025\n", + "============================================================\n", + "state Tax Units (Weighted) Total State EITC Avg State EITC State EITC as % of Fed\n", + " CA 2,542,093.00 394,849,216.00 155.32 304.41\n", + " MN 303,920.69 115,833,824.00 381.13 621.19\n", + " VA 351,505.81 103,897,088.00 295.58 711.09\n", + " NJ 242,613.14 56,647,304.00 233.49 173.25\n", + " MD 65,194.73 51,697,912.00 792.98 303.97\n", + " WA 85,270.31 28,292,344.00 331.80 63.72\n", + " NY 147,563.92 18,485,602.00 125.27 27.81\n", + " MA 62,421.07 11,548,022.00 185.00 40.00\n", + " DC 22,038.88 10,909,810.00 495.03 440.93\n", + " NM 90,061.51 9,856,160.00 109.44 141.47\n", + " OH 64,342.17 9,402,211.00 146.13 30.00\n", + " MI 59,535.31 9,386,164.00 157.66 30.00\n", + " SC 16,043.47 9,339,962.00 582.17 125.00\n", + " IL 94,457.95 8,905,220.00 94.28 20.00\n", + " CO 33,835.58 5,450,297.00 161.08 35.00\n", + " CT 27,013.58 4,959,099.50 183.58 40.00\n", + " MO 25,566.08 2,421,298.00 94.71 20.00\n", + " VT 5,019.33 2,304,432.25 459.11 100.00\n", + " ME 8,797.44 1,807,965.25 205.51 50.00\n", + " HI 8,142.12 1,588,292.62 195.07 40.00\n", + " IA 19,719.88 1,583,632.25 80.31 15.00\n", + " IN 26,210.19 1,314,889.75 50.17 10.00\n", + " OR 28,440.38 1,093,685.00 38.46 9.00\n", + " KS 12,228.20 826,845.44 67.62 17.00\n", + " RI 9,960.62 775,537.62 77.86 16.00\n", + " LA 20,803.49 486,003.25 23.36 5.00\n", + " OK 20,206.65 452,391.19 22.39 4.41\n", + " NE 10,012.65 435,824.81 43.53 10.00\n", + " MT 6,436.13 247,790.34 38.50 10.00\n", + " DE 4,423.04 123,470.46 27.92 6.44\n" + ] + } + ], + "source": [ + "def state_eitc_summary(df, year_label):\n", + " \"\"\"\n", + " Summary of states with state EITC programs.\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"States with State EITC Benefits - {year_label}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Filter to states with state EITC > 0\n", + " df_with_state_eitc = df[df['state_eitc'] > 0]\n", + " \n", + " if len(df_with_state_eitc) == 0:\n", + " print(\"No state EITC benefits found in the data.\")\n", + " return None\n", + " \n", + " summary = df_with_state_eitc.groupby('state').apply(\n", + " lambda x: pd.Series({\n", + " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", + " 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n", + " 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " 'State EITC as % of Fed': ((x['state_eitc'] * x['tax_unit_weight']).sum() / \n", + " (x['eitc'] * x['tax_unit_weight']).sum() * 100) if (x['eitc'] * x['tax_unit_weight']).sum() > 0 else 0,\n", + " })\n", + " ).reset_index()\n", + " \n", + " summary = summary.sort_values('Total State EITC', ascending=False)\n", + " \n", + " return summary\n", + "\n", + "# 2024\n", + "state_eitc_2024 = state_eitc_summary(df_2024, \"2024\")\n", + "if state_eitc_2024 is not None:\n", + " print(state_eitc_2024.to_string(index=False))\n", + "\n", + "# 2025\n", + "state_eitc_2025 = state_eitc_summary(df_2025, \"2025\")\n", + "if state_eitc_2025 is not None:\n", + " print(state_eitc_2025.to_string(index=False))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Export Data to CSV" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Exported 1,493,541 rows to: eitc_childless_families_2024.csv\n", + "Exported 1,493,541 rows to: eitc_childless_families_2025.csv\n" + ] + } + ], + "source": [ + "# Export detailed household data - SEPARATE files for 2024 and 2025\n", + "# Sorted by state and phase_status, without eitc_maximum\n", + "\n", + "def export_household_data(df, year):\n", + " \"\"\"Export household-level data sorted by state and phase status.\"\"\"\n", + " \n", + " export_columns = [\n", + " 'state',\n", + " 'eitc_phase_status',\n", + " 'tax_unit_id',\n", + " 'tax_unit_weight',\n", + " 'eitc',\n", + " 'state_eitc',\n", + " 'eitc_phased_in',\n", + " 'eitc_reduction',\n", + " 'tax_unit_earned_income',\n", + " 'adjusted_gross_income',\n", + " 'filing_status_label',\n", + " 'age_head',\n", + " 'age_spouse',\n", + " ]\n", + " \n", + " # Select columns that exist\n", + " available_columns = [col for col in export_columns if col in df.columns]\n", + " df_export = df[available_columns].copy()\n", + " \n", + " # Rename for clarity\n", + " df_export = df_export.rename(columns={\n", + " 'eitc': 'federal_eitc',\n", + " 'filing_status_label': 'marital_status',\n", + " })\n", + " \n", + " # Sort by state and phase status\n", + " df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", + " df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", + " \n", + " # Export\n", + " filename = f'eitc_childless_families_{year}.csv'\n", + " df_export.to_csv(filename, index=False)\n", + " print(f\"Exported {len(df_export):,} rows to: {filename}\")\n", + " \n", + " return df_export\n", + "\n", + "# Export 2024\n", + "df_export_2024 = export_household_data(df_2024, 2024)\n", + "\n", + "# Export 2025\n", + "df_export_2025 = export_household_data(df_2025, 2025)" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Sample of 2024 export data:\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
stateeitc_phase_statustax_unit_idtax_unit_weightfederal_eitcstate_eitceitc_phased_ineitc_reductiontax_unit_earned_incomeadjusted_gross_incomemarital_statusage_headage_spouse
25751AKNo income00.800.000.000.000.000.003,923.64Unknown790
25753AKNo income30.280.000.000.0010,068.100.00148,859.19Unknown7674
25754AKNo income512.270.000.00194.410.002,541.263,945.09Unknown640
25757AKNo income114,387.350.000.000.003,368.610.0061,284.13Unknown8582
25761AKNo income15639.520.000.000.00992.740.0023,307.04Unknown850
25763AKNo income181,114.780.000.000.000.000.001,403.83Unknown830
25767AKNo income220.820.000.000.000.000.002,153.92Unknown850
25769AKNo income24792.770.000.000.0020.540.0010,598.54Unknown810
25770AKNo income251.060.000.000.000.000.001,403.83Unknown850
25771AKNo income271.040.000.000.000.000.001,403.83Unknown640
\n", + "
" + ], + "text/plain": [ + " state eitc_phase_status tax_unit_id tax_unit_weight federal_eitc \\\n", + "25751 AK No income 0 0.80 0.00 \n", + "25753 AK No income 3 0.28 0.00 \n", + "25754 AK No income 5 12.27 0.00 \n", + "25757 AK No income 11 4,387.35 0.00 \n", + "25761 AK No income 15 639.52 0.00 \n", + "25763 AK No income 18 1,114.78 0.00 \n", + "25767 AK No income 22 0.82 0.00 \n", + "25769 AK No income 24 792.77 0.00 \n", + "25770 AK No income 25 1.06 0.00 \n", + "25771 AK No income 27 1.04 0.00 \n", + "\n", + " state_eitc eitc_phased_in eitc_reduction tax_unit_earned_income \\\n", + "25751 0.00 0.00 0.00 0.00 \n", + "25753 0.00 0.00 10,068.10 0.00 \n", + "25754 0.00 194.41 0.00 2,541.26 \n", + "25757 0.00 0.00 3,368.61 0.00 \n", + "25761 0.00 0.00 992.74 0.00 \n", + "25763 0.00 0.00 0.00 0.00 \n", + "25767 0.00 0.00 0.00 0.00 \n", + "25769 0.00 0.00 20.54 0.00 \n", + "25770 0.00 0.00 0.00 0.00 \n", + "25771 0.00 0.00 0.00 0.00 \n", + "\n", + " adjusted_gross_income marital_status age_head age_spouse \n", + "25751 3,923.64 Unknown 79 0 \n", + "25753 148,859.19 Unknown 76 74 \n", + "25754 3,945.09 Unknown 64 0 \n", + "25757 61,284.13 Unknown 85 82 \n", + "25761 23,307.04 Unknown 85 0 \n", + "25763 1,403.83 Unknown 83 0 \n", + "25767 2,153.92 Unknown 85 0 \n", + "25769 10,598.54 Unknown 81 0 \n", + "25770 1,403.83 Unknown 85 0 \n", + "25771 1,403.83 Unknown 64 0 " + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Preview the data\n", + "print(\"\\nSample of 2024 export data:\")\n", + "df_export_2024.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Household data exported to separate files above.\n" + ] + } + ], + "source": [ + "# CSVs already exported in previous cell\n", + "# Files created:\n", + "# - eitc_childless_families_2024.csv\n", + "# - eitc_childless_families_2025.csv\n", + "print(\"Household data exported to separate files above.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary Statistics Export" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Exported summary to: eitc_childless_phase_status_summary_2024.csv\n", + "Exported summary to: eitc_childless_phase_status_summary_2025.csv\n" + ] + } + ], + "source": [ + "# Export phase status summaries - SEPARATE files for 2024 and 2025\n", + "\n", + "def export_summary(summary_df, year):\n", + " \"\"\"Export summary sorted by state and phase status.\"\"\"\n", + " df_export = summary_df.copy()\n", + " \n", + " # Sort by state and phase status\n", + " df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", + " df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", + " \n", + " filename = f'eitc_childless_phase_status_summary_{year}.csv'\n", + " df_export.to_csv(filename, index=False)\n", + " print(f\"Exported summary to: {filename}\")\n", + " return df_export\n", + "\n", + "# Export 2024 summary\n", + "summary_2024_export = export_summary(summary_2024, 2024)\n", + "\n", + "# Export 2025 summary \n", + "summary_2025_export = export_summary(summary_2025, 2025)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Grand Totals" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "National Totals by Phase Status:\n", + "\n", + "2024:\n", + " eitc_phase_status weighted_households pct_of_total year\n", + " Full amount 33,314.48 0.00 2024\n", + " Fully phased out 60,244,548.00 62.50 2024\n", + " No income 34,126,456.00 35.40 2024\n", + "Partially phased out 824,046.81 0.90 2024\n", + " Pre-phase-in 1,203,184.00 1.20 2024\n", + "\n", + "Total childless EITC recipients: 96,431,552\n", + "\n", + "2025:\n", + " eitc_phase_status weighted_households pct_of_total year\n", + " Full amount 33,638.47 0.00 2025\n", + " Fully phased out 60,940,444.00 62.60 2025\n", + " No income 34,307,016.00 35.20 2025\n", + "Partially phased out 831,458.88 0.90 2025\n", + " Pre-phase-in 1,214,332.12 1.20 2025\n", + "\n", + "Total childless EITC recipients: 97,326,896\n" + ] + } + ], + "source": [ + "# National totals by phase status\n", + "def national_totals(df, year):\n", + " totals = df.groupby('eitc_phase_status').agg({\n", + " 'tax_unit_weight': 'sum',\n", + " }).reset_index()\n", + " totals.columns = ['eitc_phase_status', 'weighted_households']\n", + " total_all = totals['weighted_households'].sum()\n", + " totals['pct_of_total'] = (totals['weighted_households'] / total_all * 100).round(1)\n", + " totals['year'] = year\n", + " return totals\n", + "\n", + "print(\"National Totals by Phase Status:\")\n", + "print(\"\\n2024:\")\n", + "nat_2024 = national_totals(df_2024, 2024)\n", + "print(nat_2024.to_string(index=False))\n", + "print(f\"\\nTotal childless EITC recipients: {nat_2024['weighted_households'].sum():,.0f}\")\n", + "\n", + "print(\"\\n2025:\")\n", + "nat_2025 = national_totals(df_2025, 2025)\n", + "print(nat_2025.to_string(index=False))\n", + "print(f\"\\nTotal childless EITC recipients: {nat_2025['weighted_households'].sum():,.0f}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Notes\n", + "\n", + "### Data Interpretation\n", + "- **Tax unit weights** represent the number of actual tax units each record represents in the population\n", + "- All monetary values are weighted averages/totals reflecting the full population\n", + "- The enhanced CPS dataset has ~42,000 household records that are weighted to represent the US population\n", + "\n", + "### EITC Phase Status Definitions\n", + "1. **Pre-phase-in**: Earned income is below the level needed to receive the maximum credit. The credit amount equals (earned income × phase-in rate).\n", + "2. **Full amount**: Earned income is sufficient to receive the maximum credit, and income is below the phase-out threshold.\n", + "3. **Partially phased out**: Income is above the phase-out threshold, resulting in a reduced credit.\n", + "4. **Fully phased out**: Income is too high; credit is reduced to $0.\n", + "\n", + "### State EITC Programs\n", + "Not all states have state EITC programs. States with programs typically calculate their EITC as a percentage of the federal EITC amount.\n", + "\n", + "### Childless Worker EITC\n", + "The federal EITC for childless workers is significantly smaller than for workers with children. Key parameters (2024):\n", + "- Maximum credit: ~$632\n", + "- Phase-in rate: 7.65%\n", + "- Phase-out starts at: ~$9,800 (single), ~$16,400 (married)\n", + "- Phase-out rate: 7.65%" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/eitc_childless_analysis/eitc_childless_phase_status_summary_2024.csv b/eitc_childless_analysis/eitc_childless_phase_status_summary_2024.csv new file mode 100644 index 0000000..2f08905 --- /dev/null +++ b/eitc_childless_analysis/eitc_childless_phase_status_summary_2024.csv @@ -0,0 +1,256 @@ +state,eitc_phase_status,weighted_households,pct_of_state,avg_federal_eitc,avg_state_eitc +AK,No income,64211.33,31.2,0.0,0.0 +AK,Pre-phase-in,3593.0698,1.7,515.6275,0.0 +AK,Full amount,0.26411486,0.0,632.0,0.0 +AK,Partially phased out,1670.441,0.8,626.76306,0.0 +AK,Fully phased out,136303.23,66.2,0.0,0.0 +AL,No income,598891.06,42.1,0.0,0.0 +AL,Pre-phase-in,3393.997,0.2,354.8626,0.0 +AL,Full amount,579.7901,0.0,632.00006,0.0 +AL,Partially phased out,10719.72,0.8,448.05634,0.0 +AL,Fully phased out,808538.3,56.9,0.0,0.0 +AR,No income,232860.75,34.1,0.0,0.0 +AR,Pre-phase-in,2328.134,0.3,453.59937,0.0 +AR,Full amount,225.77205,0.0,632.00006,0.0 +AR,Partially phased out,5891.0483,0.9,390.64108,0.0 +AR,Fully phased out,442536.25,64.7,0.0,0.0 +AZ,No income,672398.75,35.3,0.0,0.0 +AZ,Pre-phase-in,16732.682,0.9,489.90863,0.0 +AZ,Full amount,813.97723,0.0,631.99994,0.0 +AZ,Partially phased out,14077.297,0.7,468.2071,0.0 +AZ,Fully phased out,1201599.2,63.1,0.0,0.0 +CA,No income,4351373.5,37.3,0.0,0.0 +CA,Pre-phase-in,169192.22,1.4,464.49402,220.66093 +CA,Full amount,6347.9746,0.1,632.0,214.94342 +CA,Partially phased out,129554.266,1.1,338.052,165.84186 +CA,Fully phased out,7020288.0,60.1,0.0,0.0 +CO,No income,508308.2,31.7,0.0,0.0 +CO,Pre-phase-in,18100.334,1.1,495.57047,247.78523 +CO,Full amount,607.85394,0.0,632.0,316.0 +CO,Partially phased out,14817.716,0.9,390.9891,195.49455 +CO,Fully phased out,1061123.5,66.2,0.0,0.0 +CT,No income,375591.62,33.5,0.0,0.0 +CT,Pre-phase-in,16373.769,1.5,484.28445,193.71379 +CT,Full amount,767.30066,0.1,631.99994,252.79996 +CT,Partially phased out,9624.257,0.9,373.07062,149.22826 +CT,Fully phased out,717488.44,64.1,0.0,0.0 +DC,No income,108526.7,43.9,0.0,0.0 +DC,Pre-phase-in,2963.3503,1.2,494.57617,494.57617 +DC,Full amount,184.6686,0.1,632.0,632.0 +DC,Partially phased out,2235.9175,0.9,368.30032,631.9979 +DC,Fully phased out,133171.56,53.9,0.0,0.0 +DE,No income,96851.85,36.5,0.0,0.0 +DE,Pre-phase-in,1639.9989,0.6,510.83517,22.987583 +DE,Full amount,147.16574,0.1,632.0001,28.440006 +DE,Partially phased out,2595.5366,1.0,368.74863,32.519676 +DE,Fully phased out,163998.89,61.8,0.0,0.0 +FL,No income,2428692.2,35.6,0.0,0.0 +FL,Pre-phase-in,75472.13,1.1,421.00208,0.0 +FL,Full amount,162.57959,0.0,632.00006,0.0 +FL,Partially phased out,46630.918,0.7,390.32828,0.0 +FL,Fully phased out,4277713.5,62.6,0.0,0.0 +GA,No income,1082267.2,37.7,0.0,0.0 +GA,Pre-phase-in,19097.996,0.7,465.8739,0.0 +GA,Full amount,737.23004,0.0,632.00006,0.0 +GA,Partially phased out,31459.322,1.1,345.65503,0.0 +GA,Fully phased out,1734347.4,60.5,0.0,0.0 +HI,No income,150656.95,37.5,0.0,0.0 +HI,Pre-phase-in,5178.715,1.3,494.72058,197.88823 +HI,Full amount,257.19754,0.1,632.0,252.79999 +HI,Partially phased out,2631.8538,0.7,420.5297,168.21193 +HI,Fully phased out,242505.7,60.4,0.0,0.0 +IA,No income,251013.9,30.1,0.0,0.0 +IA,Pre-phase-in,15106.781,1.8,501.14404,75.171616 +IA,Full amount,170.26959,0.0,632.00006,94.80001 +IA,Partially phased out,4263.8096,0.5,567.887,85.18305 +IA,Fully phased out,564435.0,67.6,0.0,0.0 +ID,No income,124213.31,29.5,0.0,0.0 +ID,Pre-phase-in,4232.7524,1.0,488.3741,0.0 +ID,Full amount,29.52343,0.0,631.99994,0.0 +ID,Partially phased out,3326.125,0.8,406.90085,0.0 +ID,Fully phased out,288834.3,68.7,0.0,0.0 +IL,No income,1562858.9,38.5,0.0,0.0 +IL,Pre-phase-in,56196.797,1.4,500.49594,100.099174 +IL,Full amount,1363.2563,0.0,632.0,126.399994 +IL,Partially phased out,36035.117,0.9,392.3396,78.46792 +IL,Fully phased out,2405379.0,59.2,0.0,0.0 +IN,No income,524384.75,30.7,0.0,0.0 +IN,Pre-phase-in,15157.452,0.9,480.99515,48.099514 +IN,Full amount,511.06155,0.0,632.0,63.2 +IN,Partially phased out,10308.328,0.6,496.45984,49.64599 +IN,Fully phased out,1156921.9,67.8,0.0,0.0 +KS,No income,212148.61,28.4,0.0,0.0 +KS,Pre-phase-in,4779.117,0.6,466.8283,79.36082 +KS,Full amount,216.41118,0.0,631.99994,107.44002 +KS,Partially phased out,7120.175,1.0,342.1813,58.17083 +KS,Fully phased out,522227.38,70.0,0.0,0.0 +KY,No income,423462.44,37.7,0.0,0.0 +KY,Pre-phase-in,13095.383,1.2,494.37393,0.0 +KY,Full amount,217.51695,0.0,632.0,0.0 +KY,Partially phased out,9609.8,0.9,443.0421,0.0 +KY,Fully phased out,676532.6,60.2,0.0,0.0 +LA,No income,555979.4,44.3,0.0,0.0 +LA,Pre-phase-in,10768.384,0.9,475.93637,23.796818 +LA,Full amount,408.5082,0.0,631.99994,31.6 +LA,Partially phased out,9441.553,0.8,432.26416,21.613207 +LA,Fully phased out,678437.3,54.1,0.0,0.0 +MA,No income,924162.9,37.8,0.0,0.0 +MA,Pre-phase-in,39122.17,1.6,497.64935,199.05975 +MA,Full amount,775.058,0.0,632.00006,252.79997 +MA,Partially phased out,21951.188,0.9,362.97803,145.19122 +MA,Fully phased out,1459471.0,59.7,0.0,0.0 +MD,No income,591329.2,34.0,0.0,0.0 +MD,Pre-phase-in,19581.512,1.1,469.82965,899.6156 +MD,Full amount,673.052,0.0,631.9999,1145.5214 +MD,Partially phased out,20056.42,1.2,350.75146,579.90436 +MD,Fully phased out,1105824.4,63.6,0.0,0.0 +ME,No income,160096.05,36.7,0.0,0.0 +ME,Pre-phase-in,3628.5383,0.8,479.49707,239.74854 +ME,Full amount,66.5702,0.0,632.0,316.0 +ME,Partially phased out,5021.548,1.2,354.7684,177.3842 +ME,Fully phased out,267842.22,61.3,0.0,0.0 +MI,No income,1127146.1,38.2,0.0,0.0 +MI,Pre-phase-in,40620.59,1.4,500.584,150.1752 +MI,Full amount,1816.3854,0.1,632.0,189.59996 +MI,Partially phased out,16553.447,0.6,518.3705,155.51114 +MI,Fully phased out,1761326.0,59.8,0.0,0.0 +MN,No income,477418.0,30.2,0.0,0.0 +MN,Pre-phase-in,24572.006,1.6,499.59137,515.8987 +MN,Full amount,607.66266,0.0,631.9999,637.73914 +MN,Partially phased out,13705.589,0.9,391.5037,578.87744 +MN,Fully phased out,1063629.9,67.3,0.0,0.0 +MO,No income,557544.7,35.5,0.0,0.0 +MO,Pre-phase-in,12013.582,0.8,484.0781,96.81563 +MO,Full amount,537.36273,0.0,632.0,126.4 +MO,Partially phased out,12784.181,0.8,442.70206,88.54041 +MO,Fully phased out,989594.4,62.9,0.0,0.0 +MS,No income,300984.9,40.0,0.0,0.0 +MS,Pre-phase-in,2575.797,0.3,406.55896,0.0 +MS,Full amount,166.1088,0.0,632.0,0.0 +MS,Partially phased out,7846.9844,1.0,404.25897,0.0 +MS,Fully phased out,440284.22,58.6,0.0,0.0 +MT,No income,104050.336,32.3,0.0,0.0 +MT,Pre-phase-in,2331.9133,0.7,462.88824,46.288826 +MT,Full amount,77.42616,0.0,631.99994,63.2 +MT,Partially phased out,3967.5837,1.2,334.9668,33.49668 +MT,Fully phased out,212178.6,65.8,0.0,0.0 +NC,No income,1207476.5,40.0,0.0,0.0 +NC,Pre-phase-in,11699.434,0.4,451.06644,0.0 +NC,Full amount,875.7679,0.0,632.00006,0.0 +NC,Partially phased out,24799.03,0.8,392.04834,0.0 +NC,Fully phased out,1773596.9,58.8,0.0,0.0 +ND,No income,49795.652,23.9,0.0,0.0 +ND,Pre-phase-in,4071.564,2.0,507.6821,0.0 +ND,Full amount,3.9011254,0.0,632.0,0.0 +ND,Partially phased out,932.3558,0.4,620.9536,0.0 +ND,Fully phased out,153755.1,73.7,0.0,0.0 +NE,No income,153839.03,27.7,0.0,0.0 +NE,Pre-phase-in,4230.322,0.8,484.84406,48.48441 +NE,Full amount,54.920815,0.0,631.99994,63.20001 +NE,Partially phased out,5635.957,1.0,389.42334,38.942337 +NE,Fully phased out,391285.56,70.5,0.0,0.0 +NH,No income,106673.484,22.9,0.0,0.0 +NH,Pre-phase-in,10499.317,2.3,341.3794,0.0 +NH,Full amount,0.36059391,0.0,631.9999,0.0 +NH,Partially phased out,4283.302,0.9,369.89697,0.0 +NH,Fully phased out,344719.75,73.9,0.0,0.0 +NJ,No income,739419.6,27.7,0.0,0.0 +NJ,Pre-phase-in,44810.92,1.7,469.32755,187.73102 +NJ,Full amount,816.7071,0.0,632.00006,252.79999 +NJ,Partially phased out,28470.457,1.1,357.7779,143.11118 +NJ,Fully phased out,1856987.6,69.5,0.0,0.0 +NM,No income,325146.88,48.2,0.0,0.0 +NM,Pre-phase-in,8052.7715,1.2,507.6397,126.90993 +NM,Full amount,333.1007,0.0,632.00006,158.00002 +NM,Partially phased out,6802.738,1.0,365.1997,91.29993 +NM,Fully phased out,334468.0,49.6,0.0,0.0 +NV,No income,348538.47,36.2,0.0,0.0 +NV,Pre-phase-in,14720.673,1.5,504.0872,0.0 +NV,Full amount,148.53976,0.0,632.0,0.0 +NV,Partially phased out,8670.565,0.9,444.6631,0.0 +NV,Fully phased out,590726.06,61.4,0.0,0.0 +NY,No income,2374925.0,39.0,0.0,0.0 +NY,Pre-phase-in,84690.75,1.4,473.9047,142.17131 +NY,Full amount,4793.9814,0.1,632.00006,187.95201 +NY,Partially phased out,57803.137,0.9,371.3948,86.73438 +NY,Fully phased out,3567282.8,58.6,0.0,0.0 +OH,No income,1122261.2,35.4,0.0,0.0 +OH,Pre-phase-in,35480.473,1.1,496.31277,148.89384 +OH,Full amount,1062.4825,0.0,632.00006,189.60002 +OH,Partially phased out,27212.557,0.9,445.73523,133.72055 +OH,Fully phased out,1985389.1,62.6,0.0,0.0 +OK,No income,449458.44,39.4,0.0,0.0 +OK,Pre-phase-in,12928.621,1.1,493.86475,24.441998 +OK,Full amount,425.45355,0.0,631.9999,26.308289 +OK,Partially phased out,7212.5874,0.6,491.54944,14.299281 +OK,Fully phased out,671719.44,58.8,0.0,0.0 +OR,No income,580960.9,42.0,0.0,0.0 +OR,Pre-phase-in,13906.58,1.0,489.80258,44.082233 +OR,Full amount,593.1798,0.0,631.99994,56.879993 +OR,Partially phased out,13682.355,1.0,343.76025,30.938423 +OR,Fully phased out,775251.25,56.0,0.0,0.0 +PA,No income,1519688.8,37.5,0.0,0.0 +PA,Pre-phase-in,57390.523,1.4,501.43756,0.0 +PA,Full amount,1836.4663,0.0,632.0,0.0 +PA,Partially phased out,29079.27,0.7,390.8903,0.0 +PA,Fully phased out,2449417.2,60.4,0.0,0.0 +RI,No income,134419.86,33.8,0.0,0.0 +RI,Pre-phase-in,6746.8735,1.7,491.96518,78.71443 +RI,Full amount,183.33531,0.0,632.0,101.12 +RI,Partially phased out,2940.4941,0.7,418.9308,67.02892 +RI,Fully phased out,253292.34,63.7,0.0,0.0 +SC,No income,515461.88,37.1,0.0,0.0 +SC,Pre-phase-in,5157.9634,0.4,435.94592,544.9495 +SC,Full amount,683.06305,0.0,631.99994,789.99994 +SC,Partially phased out,10066.327,0.7,464.98962,581.2448 +SC,Fully phased out,856582.0,61.7,0.0,0.0 +SD,No income,66645.664,25.9,0.0,0.0 +SD,Pre-phase-in,1245.0791,0.5,504.92484,0.0 +SD,Full amount,0.10412951,0.0,632.00006,0.0 +SD,Partially phased out,2807.9924,1.1,332.0377,0.0 +SD,Fully phased out,186959.94,72.6,0.0,0.0 +TN,No income,638658.2,30.0,0.0,0.0 +TN,Pre-phase-in,26534.91,1.2,403.13788,0.0 +TN,Full amount,507.1686,0.0,631.9999,0.0 +TN,Partially phased out,10116.405,0.5,525.4987,0.0 +TN,Fully phased out,1450007.2,68.2,0.0,0.0 +TX,No income,2445616.0,29.6,0.0,0.0 +TX,Pre-phase-in,152565.27,1.8,480.83218,0.0 +TX,Full amount,196.01257,0.0,632.00006,0.0 +TX,Partially phased out,57381.688,0.7,500.75223,0.0 +TX,Fully phased out,5614733.5,67.9,0.0,0.0 +UT,No income,199810.95,27.4,0.0,0.0 +UT,Pre-phase-in,6227.8457,0.9,467.4281,0.0 +UT,Full amount,130.31685,0.0,631.9999,0.0 +UT,Partially phased out,10770.686,1.5,357.29318,0.0 +UT,Fully phased out,511761.94,70.2,0.0,0.0 +VA,No income,796918.2,33.9,0.0,0.0 +VA,Pre-phase-in,11077.845,0.5,431.60608,64.74091 +VA,Full amount,626.7285,0.0,632.0,94.8 +VA,Partially phased out,25376.033,1.1,369.48682,55.479294 +VA,Fully phased out,1514494.6,64.5,0.0,0.0 +VT,No income,74611.41,35.0,0.0,0.0 +VT,Pre-phase-in,2901.803,1.4,506.58264,192.5014 +VT,Full amount,82.56235,0.0,631.99994,240.16003 +VT,Partially phased out,1989.4019,0.9,358.7914,136.34074 +VT,Fully phased out,133518.0,62.7,0.0,0.0 +WA,No income,823241.4,30.4,0.0,0.0 +WA,Pre-phase-in,66998.695,2.5,508.81302,324.99997 +WA,Full amount,9.786905,0.0,632.0,324.99997 +WA,Partially phased out,17477.504,0.6,477.79034,324.9583 +WA,Fully phased out,1801335.0,66.5,0.0,0.0 +WI,No income,611358.3,35.3,0.0,0.0 +WI,Pre-phase-in,15312.648,0.9,452.62857,0.0 +WI,Full amount,774.9501,0.0,632.0,0.0 +WI,Partially phased out,12071.585,0.7,421.98117,0.0 +WI,Fully phased out,1092418.4,63.1,0.0,0.0 +WV,No income,229108.64,44.1,0.0,0.0 +WV,Pre-phase-in,5850.024,1.1,508.3707,0.0 +WV,Full amount,620.71576,0.1,631.99994,0.0 +WV,Partially phased out,3761.238,0.7,542.116,0.0 +WV,Fully phased out,280251.75,53.9,0.0,0.0 +WY,No income,44996.188,26.4,0.0,0.0 +WY,Pre-phase-in,2233.8857,1.3,320.7945,0.0 +WY,Full amount,86.93167,0.1,632.0,0.0 +WY,Partially phased out,803.0245,0.5,622.7631,0.0 +WY,Fully phased out,122062.08,71.7,0.0,0.0 diff --git a/eitc_childless_analysis/eitc_childless_phase_status_summary_2025.csv b/eitc_childless_analysis/eitc_childless_phase_status_summary_2025.csv new file mode 100644 index 0000000..a2ea8be --- /dev/null +++ b/eitc_childless_analysis/eitc_childless_phase_status_summary_2025.csv @@ -0,0 +1,255 @@ +state,eitc_phase_status,weighted_households,pct_of_state,avg_federal_eitc,avg_state_eitc +AK,No income,64630.094,31.1,0.0,0.0 +AK,Pre-phase-in,3626.4304,1.7,540.8136,0.0 +AK,Full amount,0.2665671,0.0,649.0,0.0 +AK,Partially phased out,1685.9508,0.8,627.0948,0.0 +AK,Fully phased out,137746.2,66.3,0.0,0.0 +AL,No income,600602.44,41.8,0.0,0.0 +AL,Pre-phase-in,3424.464,0.2,372.11118,0.0 +AL,Full amount,586.21875,0.0,648.9999,0.0 +AL,Partially phased out,10817.388,0.8,439.31036,0.0 +AL,Fully phased out,819896.5,57.1,0.0,0.0 +AR,No income,233882.83,33.9,0.0,0.0 +AR,Pre-phase-in,2349.7502,0.3,475.75613,0.0 +AR,Full amount,227.07906,0.0,649.0,0.0 +AR,Partially phased out,5943.537,0.9,379.36224,0.0 +AR,Fully phased out,447788.06,64.9,0.0,0.0 +AZ,No income,676085.4,35.2,0.0,0.0 +AZ,Pre-phase-in,16887.523,0.9,513.83417,0.0 +AZ,Full amount,821.8081,0.0,649.00006,0.0 +AZ,Partially phased out,14207.117,0.7,460.64685,0.0 +AZ,Fully phased out,1215313.5,63.2,0.0,0.0 +CA,No income,4375658.5,37.1,0.0,0.0 +CA,Pre-phase-in,170760.03,1.4,487.17966,225.75891 +CA,Full amount,6409.468,0.1,649.0,217.95975 +CA,Partially phased out,130714.73,1.1,324.05356,164.7 +CA,Fully phased out,7101629.0,60.3,0.0,0.0 +CO,No income,511078.1,31.6,0.0,0.0 +CO,Pre-phase-in,18268.05,1.1,519.7745,181.92107 +CO,Full amount,613.7801,0.0,648.99994,227.15 +CO,Partially phased out,14953.744,0.9,379.74893,132.91211 +CO,Fully phased out,1072927.0,66.3,0.0,0.0 +CT,No income,377874.75,33.4,0.0,0.0 +CT,Pre-phase-in,16525.797,1.5,507.93967,203.17586 +CT,Full amount,774.36786,0.1,649.00006,259.6 +CT,Partially phased out,9713.412,0.9,360.43723,144.1749 +CT,Fully phased out,725354.56,64.2,0.0,0.0 +DC,No income,109168.22,43.8,0.0,0.0 +DC,Pre-phase-in,2990.8645,1.2,518.73395,518.73395 +DC,Full amount,186.38322,0.1,648.99994,648.99994 +DC,Partially phased out,2256.2388,0.9,355.38052,648.99664 +DC,Fully phased out,134774.6,54.0,0.0,0.0 +DE,No income,97403.21,36.4,0.0,0.0 +DE,Pre-phase-in,1655.2258,0.6,535.7872,24.110426 +DE,Full amount,148.53215,0.1,648.9998,29.219957 +DE,Partially phased out,2619.2864,1.0,356.51498,30.245703 +DE,Fully phased out,165869.83,62.0,0.0,0.0 +FL,No income,2441849.5,35.4,0.0,0.0 +FL,Pre-phase-in,76172.52,1.1,441.5651,0.0 +FL,Full amount,164.23816,0.0,649.0,0.0 +FL,Partially phased out,47063.992,0.7,379.02612,0.0 +FL,Fully phased out,4326824.0,62.8,0.0,0.0 +GA,No income,1087212.8,37.6,0.0,0.0 +GA,Pre-phase-in,19275.111,0.7,488.62866,0.0 +GA,Full amount,744.164,0.0,649.0,0.0 +GA,Partially phased out,31712.979,1.1,332.53235,0.0 +GA,Fully phased out,1755592.1,60.7,0.0,0.0 +HI,No income,151569.28,37.4,0.0,0.0 +HI,Pre-phase-in,5226.798,1.3,518.88556,207.55426 +HI,Full amount,259.58557,0.1,649.0,259.6 +HI,Partially phased out,2655.7375,0.7,410.4888,164.19553 +HI,Fully phased out,245244.36,60.6,0.0,0.0 +IA,No income,252454.45,30.0,0.0,0.0 +IA,Pre-phase-in,15246.479,1.8,525.61804,78.842705 +IA,Full amount,172.41685,0.0,649.00006,97.350006 +IA,Partially phased out,4300.986,0.5,565.4114,84.81173 +IA,Fully phased out,570568.1,67.7,0.0,0.0 +ID,No income,124885.96,29.4,0.0,0.0 +ID,Pre-phase-in,4272.0527,1.0,512.2291,0.0 +ID,Full amount,29.752382,0.0,649.0,0.0 +ID,Partially phased out,3353.1333,0.8,396.75974,0.0 +ID,Fully phased out,292000.62,68.8,0.0,0.0 +IL,No income,1571268.5,38.3,0.0,0.0 +IL,Pre-phase-in,56717.453,1.4,524.9404,104.98808 +IL,Full amount,1376.8824,0.0,649.0,129.8 +IL,Partially phased out,36363.61,0.9,381.12885,76.22578 +IL,Fully phased out,2433819.8,59.4,0.0,0.0 +IN,No income,526058.25,30.5,0.0,0.0 +IN,Pre-phase-in,15297.608,0.9,504.4841,50.4484 +IN,Full amount,515.559,0.0,649.0,64.899994 +IN,Partially phased out,10397.022,0.6,490.227,49.0227 +IN,Fully phased out,1170866.9,67.9,0.0,0.0 +KS,No income,213326.72,28.3,0.0,0.0 +KS,Pre-phase-in,4823.49,0.6,489.63092,83.23727 +KS,Full amount,218.4205,0.0,649.00006,110.330025 +KS,Partially phased out,7186.284,1.0,328.44672,55.83595 +KS,Fully phased out,527867.8,70.1,0.0,0.0 +KY,No income,425535.8,37.5,0.0,0.0 +KY,Pre-phase-in,13215.555,1.2,518.5075,0.0 +KY,Full amount,220.27727,0.0,648.99994,0.0 +KY,Partially phased out,9690.798,0.9,434.6611,0.0 +KY,Fully phased out,684681.4,60.4,0.0,0.0 +LA,No income,558228.4,44.1,0.0,0.0 +LA,Pre-phase-in,10867.544,0.9,499.1724,24.95862 +LA,Full amount,412.60556,0.0,649.0,32.450005 +LA,Partially phased out,9523.345,0.8,422.9088,21.14544 +LA,Fully phased out,687655.94,54.3,0.0,0.0 +MA,No income,929485.7,37.7,0.0,0.0 +MA,Pre-phase-in,39485.215,1.6,521.95667,208.78267 +MA,Full amount,782.3858,0.0,649.0,259.60004 +MA,Partially phased out,22153.469,0.9,349.9549,139.98196 +MA,Fully phased out,1476281.2,59.8,0.0,0.0 +MD,No income,594474.6,33.9,0.0,0.0 +MD,Pre-phase-in,19762.99,1.1,492.7761,974.1709 +MD,Full amount,679.5751,0.0,649.00006,1213.9137 +MD,Partially phased out,20237.688,1.2,337.385,566.70026 +MD,Fully phased out,1118441.6,63.8,0.0,0.0 +ME,No income,160843.88,36.5,0.0,0.0 +ME,Pre-phase-in,3662.2285,0.8,502.9185,251.45924 +ME,Full amount,67.18829,0.0,648.99994,324.49997 +ME,Partially phased out,5068.026,1.1,341.45905,170.72952 +ME,Fully phased out,271067.88,61.5,0.0,0.0 +MI,No income,1133642.9,38.1,0.0,0.0 +MI,Pre-phase-in,40997.25,1.4,525.0338,157.51013 +MI,Full amount,1833.6553,0.1,649.0001,194.70001 +MI,Partially phased out,16704.406,0.6,513.17175,153.95154 +MI,Fully phased out,1781650.9,59.9,0.0,0.0 +MN,No income,480188.66,30.1,0.0,0.0 +MN,Pre-phase-in,24799.537,1.6,523.99115,541.0607 +MN,Full amount,613.8611,0.0,649.00006,653.1152 +MN,Partially phased out,13830.196,0.9,379.9242,576.283 +MN,Fully phased out,1075170.2,67.4,0.0,0.0 +MO,No income,559638.9,35.3,0.0,0.0 +MO,Pre-phase-in,12124.915,0.8,507.72104,101.544205 +MO,Full amount,542.54004,0.0,648.9999,129.8 +MO,Partially phased out,12898.628,0.8,434.0235,86.804695 +MO,Fully phased out,1001869.3,63.1,0.0,0.0 +MS,No income,301091.4,39.7,0.0,0.0 +MS,Pre-phase-in,2598.4688,0.3,426.31012,0.0 +MS,Full amount,168.39967,0.0,649.0,0.0 +MS,Partially phased out,7911.6304,1.0,394.04974,0.0 +MS,Fully phased out,447068.94,58.9,0.0,0.0 +MT,No income,104646.91,32.1,0.0,0.0 +MT,Pre-phase-in,2353.5647,0.7,485.49875,48.549877 +MT,Full amount,78.14505,0.0,648.99994,64.899994 +MT,Partially phased out,4004.4219,1.2,320.77896,32.0779 +MT,Fully phased out,214518.14,65.9,0.0,0.0 +NC,No income,1213348.5,39.8,0.0,0.0 +NC,Pre-phase-in,11808.03,0.4,473.09875,0.0 +NC,Full amount,883.92944,0.0,649.0,0.0 +NC,Partially phased out,25027.451,0.8,380.5623,0.0 +NC,Fully phased out,1795405.4,58.9,0.0,0.0 +ND,No income,50085.26,23.8,0.0,0.0 +ND,Pre-phase-in,4108.928,2.0,532.4672,0.0 +ND,Full amount,4.3769407,0.0,649.00006,0.0 +ND,Partially phased out,941.0125,0.4,620.9985,0.0 +ND,Fully phased out,155355.4,73.8,0.0,0.0 +NE,No income,154722.36,27.6,0.0,0.0 +NE,Pre-phase-in,4269.599,0.8,508.52655,50.85265 +NE,Full amount,55.43074,0.0,648.99994,64.899994 +NE,Partially phased out,5687.6196,1.0,378.20206,37.82021 +NE,Fully phased out,395464.25,70.6,0.0,0.0 +NH,No income,107312.2,22.8,0.0,0.0 +NH,Pre-phase-in,10596.802,2.3,358.05423,0.0 +NH,Full amount,0.36394194,0.0,649.0,0.0 +NH,Partially phased out,4323.072,0.9,357.67905,0.0 +NH,Fully phased out,348272.1,74.0,0.0,0.0 +NJ,No income,744035.94,27.6,0.0,0.0 +NJ,Pre-phase-in,45226.242,1.7,492.24957,196.89983 +NJ,Full amount,824.88654,0.0,649.00006,259.60004 +NJ,Partially phased out,28732.479,1.1,344.521,137.80841 +NJ,Fully phased out,1876480.9,69.6,0.0,0.0 +NM,No income,326654.53,48.0,0.0,0.0 +NM,Pre-phase-in,8126.212,1.2,532.416,133.104 +NM,Full amount,337.28604,0.0,649.0,162.25 +NM,Partially phased out,6862.6577,1.0,352.83185,88.20796 +NM,Fully phased out,339088.2,49.8,0.0,0.0 +NV,No income,350563.6,36.1,0.0,0.0 +NV,Pre-phase-in,14857.255,1.5,528.70886,0.0 +NV,Full amount,150.01535,0.0,649.00006,0.0 +NV,Partially phased out,8750.994,0.9,436.09225,0.0 +NV,Fully phased out,597421.9,61.5,0.0,0.0 +NY,No income,2387931.2,38.9,0.0,0.0 +NY,Pre-phase-in,85475.5,1.4,497.05002,148.67133 +NY,Full amount,4840.017,0.1,648.99994,191.7149 +NY,Partially phased out,58324.44,0.9,358.95114,83.15456 +NY,Fully phased out,3609463.8,58.7,0.0,0.0 +OH,No income,1128293.6,35.2,0.0,0.0 +OH,Pre-phase-in,35809.71,1.1,520.55475,156.16643 +OH,Full amount,1071.496,0.0,649.0,194.7 +OH,Partially phased out,27460.96,0.9,437.14362,131.14308 +OH,Fully phased out,2008216.0,62.7,0.0,0.0 +OK,No income,451488.94,39.2,0.0,0.0 +OK,Pre-phase-in,13047.681,1.1,517.97766,25.205536 +OK,Full amount,430.25494,0.0,649.0,25.723408 +OK,Partially phased out,7278.75,0.6,485.13818,12.743043 +OK,Fully phased out,680099.75,59.0,0.0,0.0 +OR,No income,584257.56,41.8,0.0,0.0 +OR,Pre-phase-in,14035.583,1.0,513.72626,46.235363 +OR,Full amount,598.7314,0.0,649.00006,58.409996 +OR,Partially phased out,13806.066,1.0,329.78485,29.680637 +OR,Fully phased out,784550.06,56.1,0.0,0.0 +PA,No income,1528526.2,37.3,0.0,0.0 +PA,Pre-phase-in,57922.355,1.4,525.9283,0.0 +PA,Full amount,1853.9569,0.0,649.0,0.0 +PA,Partially phased out,29319.182,0.7,379.9078,0.0 +PA,Fully phased out,2477462.8,60.5,0.0,0.0 +RI,No income,135101.94,33.7,0.0,0.0 +RI,Pre-phase-in,6809.517,1.7,515.9956,82.55931 +RI,Full amount,184.90422,0.0,648.99994,103.84 +RI,Partially phased out,2966.2002,0.7,409.08417,65.45346 +RI,Fully phased out,256211.83,63.8,0.0,0.0 +SC,No income,518110.75,37.0,0.0,0.0 +SC,Pre-phase-in,5204.986,0.4,457.20728,571.5345 +SC,Full amount,690.02356,0.0,648.9999,811.2 +SC,Partially phased out,10148.458,0.7,457.65204,572.04614 +SC,Fully phased out,866683.8,61.9,0.0,0.0 +SD,No income,67121.33,25.8,0.0,0.0 +SD,Pre-phase-in,1256.6394,0.5,529.5882,0.0 +SD,Partially phased out,2834.1692,1.1,317.98163,0.0 +SD,Fully phased out,188838.94,72.6,0.0,0.0 +TN,No income,641917.3,29.9,0.0,0.0 +TN,Pre-phase-in,26780.662,1.2,422.82413,0.0 +TN,Full amount,512.49664,0.0,649.00006,0.0 +TN,Partially phased out,10210.256,0.5,520.8847,0.0 +TN,Fully phased out,1466141.0,68.3,0.0,0.0 +TX,No income,2459692.5,29.5,0.0,0.0 +TX,Pre-phase-in,153981.7,1.8,504.31863,0.0 +TX,Full amount,197.92868,0.0,649.0,0.0 +TX,Partially phased out,57914.004,0.7,494.92886,0.0 +TX,Fully phased out,5675496.0,68.0,0.0,0.0 +UT,No income,200944.84,27.3,0.0,0.0 +UT,Pre-phase-in,6285.637,0.9,490.25928,0.0 +UT,Full amount,131.55965,0.0,648.99994,0.0 +UT,Partially phased out,10868.779,1.5,344.35452,0.0 +UT,Fully phased out,517236.75,70.3,0.0,0.0 +VA,No income,800697.0,33.8,0.0,0.0 +VA,Pre-phase-in,11179.481,0.5,452.66635,90.53327 +VA,Full amount,633.57574,0.0,649.0,129.79999 +VA,Partially phased out,25600.77,1.1,356.98605,71.724556 +VA,Fully phased out,1532187.8,64.6,0.0,0.0 +VT,No income,75031.914,34.9,0.0,0.0 +VT,Pre-phase-in,2928.623,1.4,531.3219,531.3219 +VT,Full amount,83.45121,0.0,649.0,649.0 +VT,Partially phased out,2007.2561,0.9,345.8606,345.8606 +VT,Fully phased out,135030.55,62.8,0.0,0.0 +WA,No income,827862.5,30.3,0.0,0.0 +WA,Pre-phase-in,67620.664,2.5,533.66614,334.37003 +WA,Full amount,9.981713,0.0,649.0,334.36996 +WA,Partially phased out,17639.67,0.6,470.84427,321.92703 +WA,Fully phased out,1821082.5,66.6,0.0,0.0 +WI,No income,614872.7,35.2,0.0,0.0 +WI,Pre-phase-in,15454.481,0.9,474.73358,0.0 +WI,Full amount,782.4303,0.0,649.0,0.0 +WI,Partially phased out,12181.741,0.7,411.6662,0.0 +WI,Fully phased out,1104725.2,63.2,0.0,0.0 +WV,No income,230403.9,43.9,0.0,0.0 +WV,Pre-phase-in,5904.34,1.1,533.2025,0.0 +WV,Full amount,626.0814,0.1,649.0,0.0 +WV,Partially phased out,3794.6746,0.7,538.4127,0.0 +WV,Fully phased out,283687.66,54.1,0.0,0.0 +WY,No income,45254.85,26.3,0.0,0.0 +WY,Pre-phase-in,2254.627,1.3,336.4639,0.0 +WY,Full amount,87.738815,0.1,648.99994,0.0 +WY,Partially phased out,810.4804,0.5,622.8266,0.0 +WY,Fully phased out,123354.51,71.8,0.0,0.0 From 79af45341ce860366c53c3df778eb87ca25c76dc Mon Sep 17 00:00:00 2001 From: David Trimmer Date: Wed, 17 Dec 2025 13:55:31 -0500 Subject: [PATCH 2/4] Add comprehensive documentation to EITC childless analysis notebook MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added detailed comments and docstrings throughout the notebook to make it easier for recipients to understand, reproduce, and modify the analysis. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../eitc_childless_analysis.ipynb | 1051 +---------------- 1 file changed, 46 insertions(+), 1005 deletions(-) diff --git a/eitc_childless_analysis/eitc_childless_analysis.ipynb b/eitc_childless_analysis/eitc_childless_analysis.ipynb index 5b7af7e..d1c396c 100644 --- a/eitc_childless_analysis/eitc_childless_analysis.ipynb +++ b/eitc_childless_analysis/eitc_childless_analysis.ipynb @@ -3,19 +3,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "# EITC Analysis: Childless Families by Phase-in/Phase-out Status\n", - "\n", - "This notebook analyzes childless families (tax units with no EITC qualifying children) who receive the Earned Income Tax Credit (EITC), including:\n", - "- Federal EITC amounts\n", - "- State EITC amounts (where applicable)\n", - "- EITC schedule position (pre-phase-in, full amount, partially phased out, fully phased out)\n", - "- Household characteristics (marital status, state, demographics)\n", - "\n", - "**Data Source:** State-specific datasets from PolicyEngine\n", - "\n", - "**Years Analyzed:** 2024 and 2025" - ] + "source": "# EITC Analysis: Childless Families by Phase-in/Phase-out Status\n\n## Overview\nThis notebook analyzes **childless tax units** (those with no EITC-qualifying children) across all 50 US states + DC, categorizing them by where they fall on the EITC schedule.\n\n## What This Notebook Does\n1. **Loads state-specific microdata** from PolicyEngine's HuggingFace repository\n2. **Filters to childless households** (eitc_child_count == 0)\n3. **Categorizes each household** into one of 5 EITC phase statuses\n4. **Calculates weighted counts and percentages** by state\n5. **Exports summary and detailed data** to CSV files\n\n## EITC Phase Status Categories\n| Status | Description |\n|--------|-------------|\n| **No income** | No/minimal earned income ($100 or less), not receiving EITC |\n| **Pre-phase-in** | Earning income but haven't reached maximum credit yet |\n| **Full amount** | At the plateau - receiving maximum credit |\n| **Partially phased out** | In phase-out range, receiving reduced credit |\n| **Fully phased out** | Income too high, EITC reduced to $0 |\n\n## Data Source\n- **State datasets**: `hf://policyengine/policyengine-us-data/states/{STATE}.h5`\n- Each state has its own dataset with representative household microdata\n- Data is weighted to represent the actual population\n\n## Output Files\n- `eitc_childless_phase_status_summary_{year}.csv` - Aggregated by state and phase status\n- `eitc_childless_families_{year}.csv` - Detailed household-level data (large files, ~125MB each)\n\n## Years Analyzed\n- 2024 and 2025" }, { "cell_type": "markdown", @@ -26,203 +14,34 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "from policyengine_us import Microsimulation\n", - "import pandas as pd\n", - "import numpy as np\n", - "\n", - "pd.set_option('display.max_columns', None)\n", - "pd.set_option('display.width', None)\n", - "pd.set_option('display.float_format', lambda x: f'{x:,.2f}')" - ] + "source": "# =============================================================================\n# IMPORTS AND CONFIGURATION\n# =============================================================================\n# \n# policyengine_us: PolicyEngine's US tax-benefit microsimulation model\n# - Microsimulation: Class for running simulations on survey microdata\n# - Loads datasets, calculates tax/benefit variables for each household\n#\n# pandas/numpy: Standard data manipulation libraries\n# =============================================================================\n\nfrom policyengine_us import Microsimulation\nimport pandas as pd\nimport numpy as np\n\n# Configure pandas display options for better output formatting\npd.set_option('display.max_columns', None) # Show all columns\npd.set_option('display.width', None) # Don't wrap output\npd.set_option('display.float_format', lambda x: f'{x:,.2f}') # Format numbers with commas" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "## Helper Function: Determine EITC Phase Status\n", - "\n", - "The EITC has four distinct regions:\n", - "1. **Pre-phase-in**: Earning below the level needed to reach maximum credit\n", - "2. **Full amount (plateau)**: Earning enough for maximum credit, not yet in phase-out\n", - "3. **Partially phased out**: In phase-out range, but still receiving some credit\n", - "4. **Fully phased out**: Income too high, EITC = $0" - ] + "source": "## EITC Phase Status Classification\n\nThe Earned Income Tax Credit (EITC) follows a trapezoidal schedule:\n\n```\nCredit\nAmount\n ^\n | ___________\n | / \\\n | / \\\n | / \\\n | / \\\n | / \\\n |/_____________________\\____> Earned Income\n Phase-in Plateau Phase-out\n```\n\n### How We Classify Households\n\nWe use PolicyEngine's calculated variables to determine where each household falls:\n\n| Variable | Description |\n|----------|-------------|\n| `eitc` | Final EITC amount received (after all calculations) |\n| `eitc_maximum` | Maximum possible EITC for this household's filing status |\n| `eitc_phased_in` | Amount \"earned\" based on phase-in rate × earned income |\n| `eitc_reduction` | Amount reduced due to being in phase-out range |\n| `tax_unit_earned_income` | Total earned income for the tax unit |\n\n### Classification Logic\n1. **No income**: Earned income ≤ $100 AND eitc = 0\n2. **Pre-phase-in**: Receiving EITC but eitc_phased_in < eitc_maximum\n3. **Full amount**: eitc_phased_in ≥ eitc_maximum AND eitc_reduction = 0\n4. **Partially phased out**: Receiving EITC AND eitc_reduction > 0\n5. **Fully phased out**: eitc = 0 AND (has reduction OR phased_in ≥ maximum)" }, { "cell_type": "code", - "execution_count": 31, + "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "def determine_eitc_phase_status_vectorized(df):\n", - " \"\"\"\n", - " Vectorized version to determine EITC phase status for a DataFrame.\n", - " \n", - " Categories:\n", - " - No income: No/minimal earned income, not receiving EITC\n", - " - Pre-phase-in: Earning but haven't reached maximum credit yet\n", - " - Full amount: At maximum credit (plateau region)\n", - " - Partially phased out: In phase-out region, still receiving some credit\n", - " - Fully phased out: Income too high, EITC reduced to $0\n", - " \"\"\"\n", - " conditions = [\n", - " # No income: earned income is 0 or very low AND not receiving EITC\n", - " (df['tax_unit_earned_income'] <= 100) & (df['eitc'] <= 0),\n", - " \n", - " # Fully phased out: EITC is 0 AND had some earned income AND there was reduction\n", - " (df['eitc'] <= 0) & (df['tax_unit_earned_income'] > 100) & (df['eitc_reduction'] > 0),\n", - " \n", - " # Fully phased out: EITC is 0 AND phased_in >= maximum (meaning they would have gotten max but it's all reduced)\n", - " (df['eitc'] <= 0) & (df['eitc_phased_in'] >= df['eitc_maximum']),\n", - " \n", - " # Pre-phase-in: Receiving EITC but haven't hit maximum yet\n", - " (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),\n", - " \n", - " # Partially phased out: Receiving EITC with some reduction\n", - " (df['eitc'] > 0) & (df['eitc_reduction'] > 0),\n", - " \n", - " # Full amount: At maximum, no reduction\n", - " (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0)\n", - " ]\n", - " \n", - " choices = [\n", - " 'No income',\n", - " 'Fully phased out',\n", - " 'Fully phased out',\n", - " 'Pre-phase-in',\n", - " 'Partially phased out',\n", - " 'Full amount'\n", - " ]\n", - " \n", - " return np.select(conditions, choices, default='No income')" - ] + "source": "# =============================================================================\n# EITC PHASE STATUS CLASSIFICATION FUNCTION\n# =============================================================================\n# This function takes a DataFrame of households and classifies each one into\n# one of 5 EITC phase statuses based on their income and EITC calculations.\n#\n# Uses numpy's np.select() for efficient vectorized conditional logic.\n# =============================================================================\n\ndef determine_eitc_phase_status_vectorized(df):\n \"\"\"\n Classify each household into an EITC phase status category.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Must contain columns: tax_unit_earned_income, eitc, eitc_reduction,\n eitc_phased_in, eitc_maximum\n \n Returns:\n --------\n numpy.ndarray\n Array of status strings, one per row in df\n \n Categories:\n -----------\n - No income: No/minimal earned income, not receiving EITC\n - Pre-phase-in: Earning but haven't reached maximum credit yet\n - Full amount: At maximum credit (plateau region)\n - Partially phased out: In phase-out region, still receiving some credit\n - Fully phased out: Income too high, EITC reduced to $0\n \"\"\"\n \n # Define conditions in priority order (first match wins)\n # Each condition is a boolean array the same length as df\n conditions = [\n # CONDITION 1: No income\n # Household has little/no earned income AND isn't receiving EITC\n (df['tax_unit_earned_income'] <= 100) & (df['eitc'] <= 0),\n \n # CONDITION 2: Fully phased out (with reduction)\n # Not receiving EITC, but has earned income and would have had reduction\n (df['eitc'] <= 0) & (df['tax_unit_earned_income'] > 100) & (df['eitc_reduction'] > 0),\n \n # CONDITION 3: Fully phased out (hit max then reduced to zero)\n # Not receiving EITC, but phased_in amount reached/exceeded maximum\n (df['eitc'] <= 0) & (df['eitc_phased_in'] >= df['eitc_maximum']),\n \n # CONDITION 4: Pre-phase-in\n # Receiving EITC, but haven't earned enough to hit maximum yet\n (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),\n \n # CONDITION 5: Partially phased out\n # Receiving EITC, but some reduction has been applied\n (df['eitc'] > 0) & (df['eitc_reduction'] > 0),\n \n # CONDITION 6: Full amount (plateau)\n # Receiving EITC at maximum, no reduction applied\n (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0)\n ]\n \n # Labels corresponding to each condition above\n choices = [\n 'No income',\n 'Fully phased out',\n 'Fully phased out',\n 'Pre-phase-in',\n 'Partially phased out',\n 'Full amount'\n ]\n \n # np.select applies conditions in order, returns first matching choice\n # Default 'No income' catches any edge cases\n return np.select(conditions, choices, default='No income')" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "## Load Data and Calculate Variables\n", - "\n", - "We'll run the analysis for both 2024 and 2025." - ] + "source": "## Data Loading Functions\n\nThe following cell defines two key functions:\n\n### `run_state_eitc_analysis(state_abbr, year)`\nLoads and processes data for a single state:\n1. Loads the state's microdata from HuggingFace\n2. Calculates all relevant EITC and household variables\n3. Filters to childless households only\n4. Classifies each household by EITC phase status\n5. Returns a DataFrame with one row per household\n\n### `run_all_states_analysis(year)`\nOrchestrates the full analysis:\n1. Loops through all 51 states/DC\n2. Calls `run_state_eitc_analysis()` for each\n3. Combines all results into a single DataFrame\n4. Reports progress and totals\n\n### Variables Calculated\n| Variable | Description |\n|----------|-------------|\n| `tax_unit_weight` | Survey weight (how many real households this record represents) |\n| `eitc` | Federal EITC amount received |\n| `state_eitc` | State EITC amount (if state has a program) |\n| `eitc_child_count` | Number of EITC-qualifying children (we filter to 0) |\n| `filing_status` | Tax filing status (Single, Joint, etc.) |\n| `age_head` | Age of primary filer |\n| `adjusted_gross_income` | AGI for the tax unit |" }, { "cell_type": "code", - "execution_count": 32, + "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# List of all US states\n", - "ALL_STATES = [\n", - " 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', \n", - " 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', \n", - " 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', \n", - " 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', \n", - " 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'\n", - "]\n", - "\n", - "# Phase status order for sorting\n", - "PHASE_ORDER = ['No income', 'Pre-phase-in', 'Full amount', 'Partially phased out', 'Fully phased out']\n", - "\n", - "def run_state_eitc_analysis(state_abbr, year):\n", - " \"\"\"\n", - " Run EITC analysis for ALL childless households (not just recipients) for a given state and year.\n", - " \"\"\"\n", - " try:\n", - " # Load the state-specific dataset\n", - " dataset_path = f\"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5\"\n", - " sim = Microsimulation(dataset=dataset_path)\n", - " \n", - " # Calculate tax unit level variables\n", - " data = {}\n", - " \n", - " tax_unit_vars = [\n", - " 'tax_unit_id',\n", - " 'tax_unit_weight',\n", - " 'eitc',\n", - " 'eitc_maximum',\n", - " 'eitc_phased_in',\n", - " 'eitc_reduction',\n", - " 'eitc_child_count',\n", - " 'state_eitc',\n", - " 'adjusted_gross_income',\n", - " 'tax_unit_earned_income',\n", - " 'filing_status',\n", - " 'age_head',\n", - " 'age_spouse',\n", - " ]\n", - " \n", - " for var in tax_unit_vars:\n", - " result = sim.calculate(var, period=year)\n", - " data[var] = result.values if hasattr(result, 'values') else np.array(result)\n", - " \n", - " df = pd.DataFrame(data)\n", - " df['state'] = state_abbr\n", - " \n", - " # Filter to childless families only (include ALL, not just EITC recipients)\n", - " childless_mask = df['eitc_child_count'] == 0\n", - " df_childless = df[childless_mask].copy()\n", - " \n", - " if len(df_childless) == 0:\n", - " return None\n", - " \n", - " # Determine EITC phase status for ALL childless households\n", - " df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)\n", - " \n", - " # Add year column\n", - " df_childless['year'] = year\n", - " \n", - " # Map filing status codes to readable labels\n", - " filing_status_map = {\n", - " 1: 'Single',\n", - " 2: 'Joint',\n", - " 3: 'Separate',\n", - " 4: 'Head of Household',\n", - " 5: 'Widow(er)'\n", - " }\n", - " df_childless['filing_status_label'] = df_childless['filing_status'].map(filing_status_map).fillna('Unknown')\n", - " \n", - " return df_childless\n", - " \n", - " except Exception as e:\n", - " print(f\" Error processing {state_abbr}: {e}\")\n", - " return None\n", - "\n", - "\n", - "def run_all_states_analysis(year, states=None):\n", - " \"\"\"\n", - " Run EITC analysis for all states for a given year.\n", - " \"\"\"\n", - " if states is None:\n", - " states = ALL_STATES\n", - " \n", - " print(f\"\\n{'='*60}\")\n", - " print(f\"Running analysis for {year}\")\n", - " print(f\"{'='*60}\")\n", - " \n", - " all_results = []\n", - " \n", - " for i, state in enumerate(states):\n", - " print(f\"Processing {state} ({i+1}/{len(states)})...\", end=\" \")\n", - " result = run_state_eitc_analysis(state, year)\n", - " if result is not None and len(result) > 0:\n", - " weighted_count = result['tax_unit_weight'].sum()\n", - " print(f\"{len(result):,} records, {weighted_count:,.0f} weighted\")\n", - " all_results.append(result)\n", - " else:\n", - " print(\"No data found\")\n", - " \n", - " if all_results:\n", - " combined = pd.concat(all_results, ignore_index=True)\n", - " print(f\"\\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units\")\n", - " return combined\n", - " else:\n", - " return pd.DataFrame()" - ] + "source": "# =============================================================================\n# STATE LIST AND DATA LOADING FUNCTIONS\n# =============================================================================\n\n# All US states + DC (51 total)\n# Modify this list to analyze a subset of states\nALL_STATES = [\n 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', \n 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', \n 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', \n 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', \n 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'\n]\n\n# Order for sorting phase statuses (follows the EITC schedule from left to right)\nPHASE_ORDER = ['No income', 'Pre-phase-in', 'Full amount', 'Partially phased out', 'Fully phased out']\n\n\ndef run_state_eitc_analysis(state_abbr, year):\n \"\"\"\n Load and analyze EITC data for a single state.\n \n Parameters:\n -----------\n state_abbr : str\n Two-letter state abbreviation (e.g., 'CA', 'NY', 'TX')\n year : int\n Tax year to analyze (e.g., 2024, 2025)\n \n Returns:\n --------\n pandas.DataFrame or None\n DataFrame with one row per childless tax unit, or None if error\n \"\"\"\n try:\n # -----------------------------------------------------------------\n # STEP 1: Load the state's microdata from HuggingFace\n # -----------------------------------------------------------------\n # Each state has its own .h5 file with representative household data\n # The data is weighted to represent the state's actual population\n dataset_path = f\"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5\"\n sim = Microsimulation(dataset=dataset_path)\n \n # -----------------------------------------------------------------\n # STEP 2: Calculate required variables using PolicyEngine\n # -----------------------------------------------------------------\n # These are \"tax unit\" level variables (a tax unit = people filing together)\n # sim.calculate() returns a weighted array of values\n data = {}\n \n tax_unit_vars = [\n 'tax_unit_id', # Unique identifier for each tax unit\n 'tax_unit_weight', # Survey weight (represents X real households)\n 'eitc', # Federal EITC amount (final, after all calculations)\n 'eitc_maximum', # Max possible EITC for this filing status\n 'eitc_phased_in', # Amount \"earned\" via phase-in calculation\n 'eitc_reduction', # Amount reduced due to phase-out\n 'eitc_child_count', # Number of EITC-qualifying children\n 'state_eitc', # State EITC amount (0 if no state program)\n 'adjusted_gross_income', # AGI\n 'tax_unit_earned_income', # Total earned income\n 'filing_status', # 1=Single, 2=Joint, 3=Separate, 4=HoH, 5=Widow\n 'age_head', # Age of primary filer\n 'age_spouse', # Age of spouse (0 if single)\n ]\n \n # Calculate each variable and extract the numpy array\n for var in tax_unit_vars:\n result = sim.calculate(var, period=year)\n # .values extracts the underlying numpy array from PolicyEngine's result\n data[var] = result.values if hasattr(result, 'values') else np.array(result)\n \n # Create DataFrame from the calculated values\n df = pd.DataFrame(data)\n df['state'] = state_abbr # Add state identifier\n \n # -----------------------------------------------------------------\n # STEP 3: Filter to childless households only\n # -----------------------------------------------------------------\n # We want ALL childless households, not just those receiving EITC\n # This lets us calculate percentages that sum to 100%\n childless_mask = df['eitc_child_count'] == 0\n df_childless = df[childless_mask].copy()\n \n if len(df_childless) == 0:\n return None\n \n # -----------------------------------------------------------------\n # STEP 4: Classify each household by EITC phase status\n # -----------------------------------------------------------------\n df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)\n \n # -----------------------------------------------------------------\n # STEP 5: Add readable labels for filing status\n # -----------------------------------------------------------------\n df_childless['year'] = year\n \n filing_status_map = {\n 1: 'Single',\n 2: 'Joint',\n 3: 'Separate',\n 4: 'Head of Household',\n 5: 'Widow(er)'\n }\n df_childless['filing_status_label'] = df_childless['filing_status'].map(filing_status_map).fillna('Unknown')\n \n return df_childless\n \n except Exception as e:\n print(f\" Error processing {state_abbr}: {e}\")\n return None\n\n\ndef run_all_states_analysis(year, states=None):\n \"\"\"\n Run EITC analysis for all states and combine results.\n \n Parameters:\n -----------\n year : int\n Tax year to analyze\n states : list, optional\n List of state abbreviations. Defaults to ALL_STATES (all 51).\n \n Returns:\n --------\n pandas.DataFrame\n Combined DataFrame with all states' data\n \"\"\"\n if states is None:\n states = ALL_STATES\n \n print(f\"\\n{'='*60}\")\n print(f\"Running analysis for {year}\")\n print(f\"{'='*60}\")\n \n all_results = []\n \n # Process each state\n for i, state in enumerate(states):\n print(f\"Processing {state} ({i+1}/{len(states)})...\", end=\" \")\n result = run_state_eitc_analysis(state, year)\n \n if result is not None and len(result) > 0:\n # Report: raw record count and weighted population count\n weighted_count = result['tax_unit_weight'].sum()\n print(f\"{len(result):,} records, {weighted_count:,.0f} weighted\")\n all_results.append(result)\n else:\n print(\"No data found\")\n \n # Combine all state DataFrames\n if all_results:\n combined = pd.concat(all_results, ignore_index=True)\n print(f\"\\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units\")\n return combined\n else:\n return pd.DataFrame()" }, { "cell_type": "markdown", @@ -233,171 +52,24 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "============================================================\n", - "Running analysis for 2024\n", - "============================================================\n", - "Processing AL (1/51)... 25,751 records, 1,422,123 weighted\n", - "Processing AK (2/51)... 1,182 records, 205,778 weighted\n", - "Processing AZ (3/51)... 30,120 records, 1,905,622 weighted\n", - "Processing AR (4/51)... 15,144 records, 683,842 weighted\n", - "Processing CA (5/51)... 238,247 records, 11,676,756 weighted\n", - "Processing CO (6/51)... 34,120 records, 1,602,958 weighted\n", - "Processing CT (7/51)... 19,827 records, 1,119,846 weighted\n", - "Processing DE (8/51)... 3,801 records, 265,233 weighted\n", - "Processing DC (9/51)... 4,995 records, 247,082 weighted\n", - "Processing FL (10/51)... 45,655 records, 6,828,672 weighted\n", - "Processing GA (11/51)... 56,638 records, 2,867,909 weighted\n", - "Processing HI (12/51)... 8,416 records, 401,230 weighted\n", - "Processing ID (13/51)... 7,678 records, 420,636 weighted\n", - "Processing IL (14/51)... 56,631 records, 4,061,833 weighted\n", - "Processing IN (15/51)... 33,456 records, 1,707,284 weighted\n", - "Processing IA (16/51)... 14,070 records, 834,990 weighted\n", - "Processing KS (17/51)... 15,776 records, 746,492 weighted\n", - "Processing KY (18/51)... 22,109 records, 1,122,918 weighted\n", - "Processing LA (19/51)... 21,674 records, 1,255,035 weighted\n", - "Processing ME (20/51)... 7,782 records, 436,655 weighted\n", - "Processing MD (21/51)... 39,963 records, 1,737,465 weighted\n", - "Processing MA (22/51)... 40,034 records, 2,445,482 weighted\n", - "Processing MI (23/51)... 41,722 records, 2,947,462 weighted\n", - "Processing MN (24/51)... 32,839 records, 1,579,933 weighted\n", - "Processing MS (25/51)... 13,414 records, 751,858 weighted\n", - "Processing MO (26/51)... 29,883 records, 1,572,474 weighted\n", - "Processing MT (27/51)... 7,850 records, 322,606 weighted\n", - "Processing NE (28/51)... 7,585 records, 555,046 weighted\n", - "Processing NV (29/51)... 10,477 records, 962,804 weighted\n", - "Processing NH (30/51)... 2,774 records, 466,176 weighted\n", - "Processing NJ (31/51)... 53,826 records, 2,670,506 weighted\n", - "Processing NM (32/51)... 10,333 records, 674,804 weighted\n", - "Processing NY (33/51)... 111,004 records, 6,089,496 weighted\n", - "Processing NC (34/51)... 55,174 records, 3,018,448 weighted\n", - "Processing ND (35/51)... 3,391 records, 208,559 weighted\n", - "Processing OH (36/51)... 52,414 records, 3,171,406 weighted\n", - "Processing OK (37/51)... 17,840 records, 1,141,744 weighted\n", - "Processing OR (38/51)... 27,048 records, 1,384,394 weighted\n", - "Processing PA (39/51)... 59,791 records, 4,057,412 weighted\n", - "Processing RI (40/51)... 7,429 records, 397,583 weighted\n", - "Processing SC (41/51)... 26,703 records, 1,387,951 weighted\n", - "Processing SD (42/51)... 1,071 records, 257,659 weighted\n", - "Processing TN (43/51)... 11,099 records, 2,125,824 weighted\n", - "Processing TX (44/51)... 46,778 records, 8,270,492 weighted\n", - "Processing UT (45/51)... 18,448 records, 728,702 weighted\n", - "Processing VT (46/51)... 3,780 records, 213,103 weighted\n", - "Processing VA (47/51)... 47,926 records, 2,348,494 weighted\n", - "Processing WA (48/51)... 12,571 records, 2,709,062 weighted\n", - "Processing WV (49/51)... 6,981 records, 519,592 weighted\n", - "Processing WI (50/51)... 27,609 records, 1,731,936 weighted\n", - "Processing WY (51/51)... 2,712 records, 170,182 weighted\n", - "\n", - "Total: 1,493,541 records, 96,431,536 weighted tax units\n" - ] - } - ], - "source": [ - "# Run for 2024 - all states\n", - "df_2024 = run_all_states_analysis(2024)" - ] + "outputs": [], + "source": "# =============================================================================\n# RUN ANALYSIS FOR 2024\n# =============================================================================\n# This cell processes all 51 states/DC for tax year 2024.\n# \n# Output:\n# df_2024 - DataFrame containing all childless tax units from all states\n# with EITC calculations and phase status classification\n#\n# Processing time: Approximately 5-10 minutes depending on internet speed\n# (downloads ~50MB of data from HuggingFace)\n# =============================================================================\n\ndf_2024 = run_all_states_analysis(2024)" }, { "cell_type": "code", - "execution_count": 34, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "============================================================\n", - "Running analysis for 2025\n", - "============================================================\n", - "Processing AL (1/51)... 25,751 records, 1,435,327 weighted\n", - "Processing AK (2/51)... 1,182 records, 207,689 weighted\n", - "Processing AZ (3/51)... 30,120 records, 1,923,315 weighted\n", - "Processing AR (4/51)... 15,144 records, 690,191 weighted\n", - "Processing CA (5/51)... 238,247 records, 11,785,171 weighted\n", - "Processing CO (6/51)... 34,120 records, 1,617,841 weighted\n", - "Processing CT (7/51)... 19,827 records, 1,130,243 weighted\n", - "Processing DE (8/51)... 3,801 records, 267,696 weighted\n", - "Processing DC (9/51)... 4,995 records, 249,376 weighted\n", - "Processing FL (10/51)... 45,655 records, 6,892,074 weighted\n", - "Processing GA (11/51)... 56,638 records, 2,894,537 weighted\n", - "Processing HI (12/51)... 8,416 records, 404,956 weighted\n", - "Processing ID (13/51)... 7,678 records, 424,542 weighted\n", - "Processing IL (14/51)... 56,631 records, 4,099,546 weighted\n", - "Processing IN (15/51)... 33,456 records, 1,723,135 weighted\n", - "Processing IA (16/51)... 14,070 records, 842,742 weighted\n", - "Processing KS (17/51)... 15,776 records, 753,423 weighted\n", - "Processing KY (18/51)... 22,109 records, 1,133,344 weighted\n", - "Processing LA (19/51)... 21,674 records, 1,266,688 weighted\n", - "Processing ME (20/51)... 7,782 records, 440,709 weighted\n", - "Processing MD (21/51)... 39,963 records, 1,753,596 weighted\n", - "Processing MA (22/51)... 40,034 records, 2,468,188 weighted\n", - "Processing MI (23/51)... 41,722 records, 2,974,829 weighted\n", - "Processing MN (24/51)... 32,839 records, 1,594,602 weighted\n", - "Processing MS (25/51)... 13,414 records, 758,839 weighted\n", - "Processing MO (26/51)... 29,883 records, 1,587,074 weighted\n", - "Processing MT (27/51)... 7,850 records, 325,601 weighted\n", - "Processing NE (28/51)... 7,585 records, 560,199 weighted\n", - "Processing NV (29/51)... 10,477 records, 971,744 weighted\n", - "Processing NH (30/51)... 2,774 records, 470,505 weighted\n", - "Processing NJ (31/51)... 53,826 records, 2,695,300 weighted\n", - "Processing NM (32/51)... 10,333 records, 681,069 weighted\n", - "Processing NY (33/51)... 111,004 records, 6,146,035 weighted\n", - "Processing NC (34/51)... 55,174 records, 3,046,473 weighted\n", - "Processing ND (35/51)... 3,391 records, 210,495 weighted\n", - "Processing OH (36/51)... 52,414 records, 3,200,852 weighted\n", - "Processing OK (37/51)... 17,840 records, 1,152,346 weighted\n", - "Processing OR (38/51)... 27,048 records, 1,397,248 weighted\n", - "Processing PA (39/51)... 59,791 records, 4,095,084 weighted\n", - "Processing RI (40/51)... 7,429 records, 401,274 weighted\n", - "Processing SC (41/51)... 26,703 records, 1,400,838 weighted\n", - "Processing SD (42/51)... 1,071 records, 260,051 weighted\n", - "Processing TN (43/51)... 11,099 records, 2,145,562 weighted\n", - "Processing TX (44/51)... 46,778 records, 8,347,282 weighted\n", - "Processing UT (45/51)... 18,448 records, 735,468 weighted\n", - "Processing VT (46/51)... 3,780 records, 215,082 weighted\n", - "Processing VA (47/51)... 47,926 records, 2,370,298 weighted\n", - "Processing WA (48/51)... 12,571 records, 2,734,216 weighted\n", - "Processing WV (49/51)... 6,981 records, 524,417 weighted\n", - "Processing WI (50/51)... 27,609 records, 1,748,017 weighted\n", - "Processing WY (51/51)... 2,712 records, 171,762 weighted\n", - "\n", - "Total: 1,493,541 records, 97,326,840 weighted tax units\n" - ] - } - ], - "source": [ - "# Run for 2025 - all states\n", - "df_2025 = run_all_states_analysis(2025)" - ] + "outputs": [], + "source": "# =============================================================================\n# RUN ANALYSIS FOR 2025\n# =============================================================================\n# Same analysis as above but for tax year 2025.\n# PolicyEngine uses inflation-adjusted parameters for future years.\n#\n# Output:\n# df_2025 - DataFrame containing all childless tax units for 2025\n# =============================================================================\n\ndf_2025 = run_all_states_analysis(2025)" }, { "cell_type": "code", - "execution_count": 35, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Combined dataset: 2,987,082 records\n" - ] - } - ], - "source": [ - "# Combine both years\n", - "df_combined = pd.concat([df_2024, df_2025], ignore_index=True)\n", - "print(f\"\\nCombined dataset: {len(df_combined):,} records\")" - ] + "outputs": [], + "source": "# =============================================================================\n# COMBINE BOTH YEARS INTO SINGLE DATASET\n# =============================================================================\n# Creates a unified dataset with both years for cross-year comparisons.\n# The 'year' column distinguishes records from each tax year.\n#\n# Note: This combined dataset is primarily for exploratory analysis.\n# The exports are done separately by year for cleaner output files.\n# =============================================================================\n\ndf_combined = pd.concat([df_2024, df_2025], ignore_index=True)\nprint(f\"\\nCombined dataset: {len(df_combined):,} records\")" }, { "cell_type": "markdown", @@ -415,125 +87,10 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "======================================================================\n", - "EITC Phase Status by State - 2024\n", - "======================================================================\n", - "\n", - "======================================================================\n", - "EITC Phase Status by State - 2025\n", - "======================================================================\n", - "\n", - "2024 Summary (first 20 rows):\n", - "state eitc_phase_status weighted_households pct_of_state avg_federal_eitc avg_state_eitc\n", - " AK No income 64,211.33 31.20 0.00 0.00\n", - " AK Pre-phase-in 3,593.07 1.70 515.63 0.00\n", - " AK Full amount 0.26 0.00 632.00 0.00\n", - " AK Partially phased out 1,670.44 0.80 626.76 0.00\n", - " AK Fully phased out 136,303.23 66.20 0.00 0.00\n", - " AL No income 598,891.06 42.10 0.00 0.00\n", - " AL Pre-phase-in 3,394.00 0.20 354.86 0.00\n", - " AL Full amount 579.79 0.00 632.00 0.00\n", - " AL Partially phased out 10,719.72 0.80 448.06 0.00\n", - " AL Fully phased out 808,538.31 56.90 0.00 0.00\n", - " AR No income 232,860.75 34.10 0.00 0.00\n", - " AR Pre-phase-in 2,328.13 0.30 453.60 0.00\n", - " AR Full amount 225.77 0.00 632.00 0.00\n", - " AR Partially phased out 5,891.05 0.90 390.64 0.00\n", - " AR Fully phased out 442,536.25 64.70 0.00 0.00\n", - " AZ No income 672,398.75 35.30 0.00 0.00\n", - " AZ Pre-phase-in 16,732.68 0.90 489.91 0.00\n", - " AZ Full amount 813.98 0.00 632.00 0.00\n", - " AZ Partially phased out 14,077.30 0.70 468.21 0.00\n", - " AZ Fully phased out 1,201,599.25 63.10 0.00 0.00\n", - "\n", - "2025 Summary (first 20 rows):\n", - "state eitc_phase_status weighted_households pct_of_state avg_federal_eitc avg_state_eitc\n", - " AK No income 64,630.09 31.10 0.00 0.00\n", - " AK Pre-phase-in 3,626.43 1.70 540.81 0.00\n", - " AK Full amount 0.27 0.00 649.00 0.00\n", - " AK Partially phased out 1,685.95 0.80 627.09 0.00\n", - " AK Fully phased out 137,746.20 66.30 0.00 0.00\n", - " AL No income 600,602.44 41.80 0.00 0.00\n", - " AL Pre-phase-in 3,424.46 0.20 372.11 0.00\n", - " AL Full amount 586.22 0.00 649.00 0.00\n", - " AL Partially phased out 10,817.39 0.80 439.31 0.00\n", - " AL Fully phased out 819,896.50 57.10 0.00 0.00\n", - " AR No income 233,882.83 33.90 0.00 0.00\n", - " AR Pre-phase-in 2,349.75 0.30 475.76 0.00\n", - " AR Full amount 227.08 0.00 649.00 0.00\n", - " AR Partially phased out 5,943.54 0.90 379.36 0.00\n", - " AR Fully phased out 447,788.06 64.90 0.00 0.00\n", - " AZ No income 676,085.38 35.20 0.00 0.00\n", - " AZ Pre-phase-in 16,887.52 0.90 513.83 0.00\n", - " AZ Full amount 821.81 0.00 649.00 0.00\n", - " AZ Partially phased out 14,207.12 0.70 460.65 0.00\n", - " AZ Fully phased out 1,215,313.50 63.20 0.00 0.00\n" - ] - } - ], - "source": [ - "def create_phase_status_summary(df, year_label):\n", - " \"\"\"\n", - " Create summary of EITC phase status by state with weighted counts and percentages.\n", - " \"\"\"\n", - " print(f\"\\n{'='*70}\")\n", - " print(f\"EITC Phase Status by State - {year_label}\")\n", - " print(f\"{'='*70}\")\n", - " \n", - " # Calculate weighted counts by state and phase status\n", - " summary = df.groupby(['state', 'eitc_phase_status']).agg({\n", - " 'tax_unit_weight': 'sum',\n", - " }).reset_index()\n", - " \n", - " summary.columns = ['state', 'eitc_phase_status', 'weighted_households']\n", - " \n", - " # Calculate state totals for percentage\n", - " state_totals = summary.groupby('state')['weighted_households'].sum().reset_index()\n", - " state_totals.columns = ['state', 'state_total']\n", - " \n", - " # Merge to get percentages\n", - " summary = summary.merge(state_totals, on='state')\n", - " summary['pct_of_state'] = (summary['weighted_households'] / summary['state_total'] * 100).round(1)\n", - " \n", - " # Add average EITC amounts (only for those receiving)\n", - " avg_eitc = df[df['eitc'] > 0].groupby(['state', 'eitc_phase_status']).apply(\n", - " lambda x: pd.Series({\n", - " 'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", - " 'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", - " })\n", - " ).reset_index()\n", - " \n", - " summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')\n", - " summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)\n", - " summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)\n", - " \n", - " # Reorder columns\n", - " summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', \n", - " 'avg_federal_eitc', 'avg_state_eitc']]\n", - " \n", - " # Sort by state and phase status order\n", - " summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", - " summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", - " \n", - " return summary\n", - "\n", - "# Run for 2024 and 2025\n", - "summary_2024 = create_phase_status_summary(df_2024, \"2024\")\n", - "summary_2025 = create_phase_status_summary(df_2025, \"2025\")\n", - "\n", - "print(\"\\n2024 Summary (first 20 rows):\")\n", - "print(summary_2024.head(20).to_string(index=False))\n", - "print(\"\\n2025 Summary (first 20 rows):\")\n", - "print(summary_2025.head(20).to_string(index=False))" - ] + "outputs": [], + "source": "# =============================================================================\n# PHASE STATUS SUMMARY BY STATE\n# =============================================================================\n# This function creates the main summary output: for each state, what\n# percentage of childless households fall into each EITC phase status?\n#\n# Key outputs per state × phase status:\n# - weighted_households: Actual population count (using survey weights)\n# - pct_of_state: What % of that state's childless households are in this phase\n# - avg_federal_eitc: Average federal EITC for households receiving EITC\n# - avg_state_eitc: Average state EITC (for states with programs)\n#\n# The percentages should sum to 100% for each state since we include ALL\n# childless households (not just EITC recipients).\n# =============================================================================\n\ndef create_phase_status_summary(df, year_label):\n \"\"\"\n Create summary of EITC phase status by state with weighted counts and percentages.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data from run_all_states_analysis()\n year_label : str\n Label for display (e.g., \"2024\")\n \n Returns:\n --------\n pandas.DataFrame\n Summary with columns: state, eitc_phase_status, weighted_households,\n pct_of_state, avg_federal_eitc, avg_state_eitc\n \"\"\"\n print(f\"\\n{'='*70}\")\n print(f\"EITC Phase Status by State - {year_label}\")\n print(f\"{'='*70}\")\n \n # Step 1: Calculate weighted counts by state and phase status\n # tax_unit_weight is summed to get population-representative counts\n summary = df.groupby(['state', 'eitc_phase_status']).agg({\n 'tax_unit_weight': 'sum',\n }).reset_index()\n \n summary.columns = ['state', 'eitc_phase_status', 'weighted_households']\n \n # Step 2: Calculate state totals for percentage calculation\n state_totals = summary.groupby('state')['weighted_households'].sum().reset_index()\n state_totals.columns = ['state', 'state_total']\n \n # Step 3: Merge to compute percentages\n summary = summary.merge(state_totals, on='state')\n summary['pct_of_state'] = (summary['weighted_households'] / summary['state_total'] * 100).round(1)\n \n # Step 4: Add average EITC amounts (only computed for households receiving EITC)\n # This uses weighted averages: sum(value × weight) / sum(weight)\n avg_eitc = df[df['eitc'] > 0].groupby(['state', 'eitc_phase_status']).apply(\n lambda x: pd.Series({\n 'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n 'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n })\n ).reset_index()\n \n summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')\n summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)\n summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)\n \n # Step 5: Clean up columns and sort\n summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', \n 'avg_federal_eitc', 'avg_state_eitc']]\n \n # Sort by state alphabetically, then by phase status in logical order\n summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n return summary\n\n# Generate summaries for both years\nsummary_2024 = create_phase_status_summary(df_2024, \"2024\")\nsummary_2025 = create_phase_status_summary(df_2025, \"2025\")\n\n# Preview the results\nprint(\"\\n2024 Summary (first 20 rows):\")\nprint(summary_2024.head(20).to_string(index=False))\nprint(\"\\n2025 Summary (first 20 rows):\")\nprint(summary_2025.head(20).to_string(index=False))" }, { "cell_type": "markdown", @@ -544,65 +101,10 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "======================================================================\n", - "Example Households by Phase Status - 2024\n", - "======================================================================\n", - " phase_status state marital_status age_head agi earned_income federal_eitc state_eitc\n", - " Pre-phase-in TN Unknown 30 938.96 2,316.38 177.20 0.00\n", - " Pre-phase-in NY Unknown 38 2.51 2.51 0.19 0.06\n", - " Pre-phase-in NY Unknown 44 3,109.13 3,268.07 250.01 75.00\n", - " Full amount GA Unknown 72 9,529.73 9,529.73 632.00 0.00\n", - " Full amount NY Unknown 31 12,709.29 13,765.16 632.00 189.60\n", - " Full amount CA Unknown 48 7,210.69 15,053.27 632.00 159.20\n", - "Partially phased out AL Unknown 25 12,807.38 13,072.27 422.22 0.00\n", - "Partially phased out NY Unknown 64 13,765.90 13,765.16 369.15 65.75\n", - "Partially phased out AZ Unknown 46 3,667.19 10,394.97 627.03 0.00\n" - ] - } - ], - "source": [ - "def show_example_households(df, year_label, n_examples=3):\n", - " \"\"\"\n", - " Show example households from each phase status with key characteristics.\n", - " \"\"\"\n", - " print(f\"\\n{'='*70}\")\n", - " print(f\"Example Households by Phase Status - {year_label}\")\n", - " print(f\"{'='*70}\")\n", - " \n", - " examples = []\n", - " \n", - " for phase in ['Pre-phase-in', 'Full amount', 'Partially phased out']:\n", - " phase_df = df[df['eitc_phase_status'] == phase]\n", - " if len(phase_df) > 0:\n", - " # Get random sample of examples\n", - " sample = phase_df.sample(min(n_examples, len(phase_df)), random_state=42)\n", - " for _, row in sample.iterrows():\n", - " examples.append({\n", - " 'phase_status': phase,\n", - " 'state': row['state'],\n", - " 'marital_status': row['filing_status_label'],\n", - " 'age_head': int(row['age_head']),\n", - " 'agi': row['adjusted_gross_income'],\n", - " 'earned_income': row['tax_unit_earned_income'],\n", - " 'federal_eitc': row['eitc'],\n", - " 'state_eitc': row['state_eitc'],\n", - " })\n", - " \n", - " examples_df = pd.DataFrame(examples)\n", - " return examples_df\n", - "\n", - "# Show examples for 2024\n", - "examples_2024 = show_example_households(df_2024, \"2024\")\n", - "print(examples_2024.to_string(index=False))" - ] + "outputs": [], + "source": "# =============================================================================\n# EXAMPLE HOUSEHOLDS BY PHASE STATUS\n# =============================================================================\n# Shows concrete examples of households in each phase status to help\n# understand what kinds of households fall into each category.\n#\n# This is useful for validation and for explaining the analysis to stakeholders.\n# =============================================================================\n\ndef show_example_households(df, year_label, n_examples=3):\n \"\"\"\n Show example households from each phase status with key characteristics.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n n_examples : int\n Number of examples per phase status (default 3)\n \n Returns:\n --------\n pandas.DataFrame\n Sample households with key characteristics\n \"\"\"\n print(f\"\\n{'='*70}\")\n print(f\"Example Households by Phase Status - {year_label}\")\n print(f\"{'='*70}\")\n \n examples = []\n \n # Only show examples for phases where households receive some EITC\n # (No income and Fully phased out receive $0, so less interesting as examples)\n for phase in ['Pre-phase-in', 'Full amount', 'Partially phased out']:\n phase_df = df[df['eitc_phase_status'] == phase]\n if len(phase_df) > 0:\n # Random sample with fixed seed for reproducibility\n sample = phase_df.sample(min(n_examples, len(phase_df)), random_state=42)\n for _, row in sample.iterrows():\n examples.append({\n 'phase_status': phase,\n 'state': row['state'],\n 'marital_status': row['filing_status_label'],\n 'age_head': int(row['age_head']),\n 'agi': row['adjusted_gross_income'],\n 'earned_income': row['tax_unit_earned_income'],\n 'federal_eitc': row['eitc'],\n 'state_eitc': row['state_eitc'],\n })\n \n examples_df = pd.DataFrame(examples)\n return examples_df\n\n# Show examples for 2024\nexamples_2024 = show_example_households(df_2024, \"2024\")\nprint(examples_2024.to_string(index=False))" }, { "cell_type": "markdown", @@ -613,89 +115,10 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "============================================================\n", - "Top 15 States by EITC Recipients - 2024\n", - "============================================================\n", - "state Tax Units (Weighted) Total Federal EITC Total State EITC Avg Federal EITC Avg State EITC Has State EITC\n", - " CA 11,676,756.00 126,396,768.00 392,533,280.00 10.82 33.62 True\n", - " TX 8,270,492.50 102,216,176.00 0.00 12.36 0.00 False\n", - " FL 6,828,671.50 50,078,040.00 0.00 7.33 0.00 False\n", - " NY 6,089,496.00 64,632,924.00 17,955,152.00 10.61 2.95 True\n", - " IL 4,061,833.00 43,125,848.00 8,625,170.00 10.62 2.12 True\n", - " PA 4,057,412.25 41,305,212.00 0.00 10.18 0.00 False\n", - " OH 3,171,405.75 30,410,496.00 9,123,148.00 9.59 2.88 True\n", - " NC 3,018,447.50 15,553,126.00 0.00 5.15 0.00 False\n", - " MI 2,947,462.50 30,062,786.00 9,018,837.00 10.20 3.06 True\n", - " GA 2,867,909.25 20,237,260.00 0.00 7.06 0.00 False\n", - " WA 2,709,062.25 42,446,576.00 27,457,220.00 15.67 10.14 True\n", - " NJ 2,670,505.50 31,733,258.00 55,209,756.00 11.88 20.67 True\n", - " MA 2,445,482.50 27,926,758.00 11,170,704.00 11.42 4.57 True\n", - " VA 2,348,493.50 14,553,468.00 102,224,696.00 6.20 43.53 True\n", - " TN 2,125,824.00 16,333,918.00 0.00 7.68 0.00 False\n", - "\n", - "============================================================\n", - "Top 15 States by EITC Recipients - 2025\n", - "============================================================\n", - "state Tax Units (Weighted) Total Federal EITC Total State EITC Avg Federal EITC Avg State EITC Has State EITC\n", - " CA 11,785,171.00 129,709,128.00 394,849,248.00 11.01 33.50 True\n", - " TX 8,347,282.00 106,447,616.00 0.00 12.75 0.00 False\n", - " FL 6,892,074.00 51,580,204.00 0.00 7.48 0.00 False\n", - " NY 6,146,035.00 66,562,396.00 18,485,602.00 10.83 3.01 True\n", - " IL 4,099,546.00 44,526,100.00 8,905,220.00 10.86 2.17 True\n", - " PA 4,095,084.50 42,804,808.00 0.00 10.45 0.00 False\n", - " OH 3,200,851.75 31,340,700.00 9,402,211.00 9.79 2.94 True\n", - " NC 3,046,473.25 15,684,542.00 0.00 5.15 0.00 False\n", - " MI 2,974,829.25 31,287,212.00 9,386,164.00 10.52 3.16 True\n", - " GA 2,894,537.00 20,446,926.00 0.00 7.06 0.00 False\n", - " WA 2,734,215.50 44,398,868.00 28,292,344.00 16.24 10.35 True\n", - " NJ 2,695,300.25 32,696,890.00 56,647,304.00 12.13 21.02 True\n", - " MA 2,468,188.00 28,870,056.00 11,548,022.00 11.70 4.68 True\n", - " VA 2,370,298.50 14,610,885.00 103,897,088.00 6.16 43.83 True\n", - " TN 2,145,561.75 16,974,484.00 0.00 7.91 0.00 False\n" - ] - } - ], - "source": [ - "def summary_by_state(df, year_label, top_n=15):\n", - " \"\"\"\n", - " Create summary by state (top N by number of recipients).\n", - " \"\"\"\n", - " print(f\"\\n{'='*60}\")\n", - " print(f\"Top {top_n} States by EITC Recipients - {year_label}\")\n", - " print(f\"{'='*60}\")\n", - " \n", - " summary = df.groupby('state').apply(\n", - " lambda x: pd.Series({\n", - " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", - " 'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),\n", - " 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n", - " 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", - " 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", - " 'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,\n", - " })\n", - " ).reset_index()\n", - " \n", - " # Sort by number of recipients\n", - " summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)\n", - " \n", - " return summary\n", - "\n", - "# 2024\n", - "state_2024 = summary_by_state(df_2024, \"2024\")\n", - "print(state_2024.to_string(index=False))\n", - "\n", - "# 2025\n", - "state_2025 = summary_by_state(df_2025, \"2025\")\n", - "print(state_2025.to_string(index=False))" - ] + "outputs": [], + "source": "# =============================================================================\n# SUMMARY BY STATE - TOP STATES BY POPULATION\n# =============================================================================\n# Shows the states with the largest childless tax unit populations,\n# along with total and average EITC amounts.\n#\n# Useful for understanding which states contribute most to the national totals.\n# =============================================================================\n\ndef summary_by_state(df, year_label, top_n=15):\n \"\"\"\n Create summary by state showing top N by number of childless tax units.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n top_n : int\n Number of top states to show (default 15)\n \n Returns:\n --------\n pandas.DataFrame\n State-level summary sorted by weighted tax unit count\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"Top {top_n} States by EITC Recipients - {year_label}\")\n print(f\"{'='*60}\")\n \n # Calculate state-level aggregates using weighted sums/averages\n summary = df.groupby('state').apply(\n lambda x: pd.Series({\n # Total weighted tax units in state\n 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n # Total federal EITC distributed (weight × eitc amount)\n 'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),\n # Total state EITC distributed\n 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n # Weighted average federal EITC per tax unit\n 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n # Weighted average state EITC per tax unit\n 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n # Boolean: does this state have a state EITC program?\n 'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,\n })\n ).reset_index()\n \n # Sort by number of tax units (largest states first)\n summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)\n \n return summary\n\n# Generate and display for both years\nstate_2024 = summary_by_state(df_2024, \"2024\")\nprint(state_2024.to_string(index=False))\n\nstate_2025 = summary_by_state(df_2025, \"2025\")\nprint(state_2025.to_string(index=False))" }, { "cell_type": "markdown", @@ -706,72 +129,10 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "============================================================\n", - "Phase Status by Filing Status (Weighted Tax Units) - 2024\n", - "============================================================\n", - "filing_status_label Unknown Total\n", - "eitc_phase_status \n", - "Full amount 33,314.48 33,314.48\n", - "Fully phased out 60,244,548.00 60,244,548.00\n", - "No income 34,126,456.00 34,126,456.00\n", - "Partially phased out 824,046.81 824,046.81\n", - "Pre-phase-in 1,203,184.00 1,203,184.00\n", - "Total 96,431,552.00 96,431,552.00\n", - "\n", - "============================================================\n", - "Phase Status by Filing Status (Weighted Tax Units) - 2025\n", - "============================================================\n", - "filing_status_label Unknown Total\n", - "eitc_phase_status \n", - "Full amount 33,638.47 33,638.47\n", - "Fully phased out 60,940,444.00 60,940,444.00\n", - "No income 34,307,016.00 34,307,016.00\n", - "Partially phased out 831,458.88 831,458.88\n", - "Pre-phase-in 1,214,332.12 1,214,332.12\n", - "Total 97,326,896.00 97,326,896.00\n" - ] - } - ], - "source": [ - "def crosstab_phase_by_filing(df, year_label):\n", - " \"\"\"\n", - " Create cross-tabulation of phase status by filing status.\n", - " \"\"\"\n", - " print(f\"\\n{'='*60}\")\n", - " print(f\"Phase Status by Filing Status (Weighted Tax Units) - {year_label}\")\n", - " print(f\"{'='*60}\")\n", - " \n", - " # Create pivot table with weighted counts\n", - " pivot = df.pivot_table(\n", - " values='tax_unit_weight',\n", - " index='eitc_phase_status',\n", - " columns='filing_status_label',\n", - " aggfunc='sum',\n", - " fill_value=0\n", - " )\n", - " \n", - " # Add totals\n", - " pivot['Total'] = pivot.sum(axis=1)\n", - " pivot.loc['Total'] = pivot.sum()\n", - " \n", - " return pivot\n", - "\n", - "# 2024\n", - "crosstab_2024 = crosstab_phase_by_filing(df_2024, \"2024\")\n", - "print(crosstab_2024.to_string())\n", - "\n", - "# 2025\n", - "crosstab_2025 = crosstab_phase_by_filing(df_2025, \"2025\")\n", - "print(crosstab_2025.to_string())" - ] + "outputs": [], + "source": "# =============================================================================\n# CROSS-TABULATION: PHASE STATUS × FILING STATUS\n# =============================================================================\n# Creates a pivot table showing how phase status varies by filing status.\n#\n# Note: Due to data limitations in the state datasets, filing status may\n# show as \"Unknown\" for many records.\n# =============================================================================\n\ndef crosstab_phase_by_filing(df, year_label):\n \"\"\"\n Create cross-tabulation of phase status by filing status (marital status).\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n \n Returns:\n --------\n pandas.DataFrame\n Pivot table with phase status as rows, filing status as columns\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"Phase Status by Filing Status (Weighted Tax Units) - {year_label}\")\n print(f\"{'='*60}\")\n \n # Create pivot table: rows = phase status, columns = filing status\n # Values are weighted tax unit counts\n pivot = df.pivot_table(\n values='tax_unit_weight',\n index='eitc_phase_status',\n columns='filing_status_label',\n aggfunc='sum',\n fill_value=0\n )\n \n # Add row and column totals for context\n pivot['Total'] = pivot.sum(axis=1)\n pivot.loc['Total'] = pivot.sum()\n \n return pivot\n\n# Generate for both years\ncrosstab_2024 = crosstab_phase_by_filing(df_2024, \"2024\")\nprint(crosstab_2024.to_string())\n\ncrosstab_2025 = crosstab_phase_by_filing(df_2025, \"2025\")\nprint(crosstab_2025.to_string())" }, { "cell_type": "markdown", @@ -782,76 +143,10 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "============================================================\n", - "Age Distribution of Head of Household - 2024\n", - "============================================================\n", - "age_group Tax Units (Weighted) Avg Federal EITC Avg Earned Income % of Total\n", - " Under 25 12,069,262.00 0.57 24,647.42 12.50\n", - " 25-34 14,198,971.00 35.78 76,383.24 14.70\n", - " 35-44 11,448,204.00 2.19 94,731.74 11.90\n", - " 45-54 16,595,334.00 22.36 87,682.62 17.20\n", - " 55-64 9,673,886.00 1.18 59,089.31 10.00\n", - " 65+ 32,441,214.00 0.01 25,601.59 33.60\n", - "\n", - "============================================================\n", - "Age Distribution of Head of Household - 2025\n", - "============================================================\n", - "age_group Tax Units (Weighted) Avg Federal EITC Avg Earned Income % of Total\n", - " Under 25 12,181,323.00 0.59 25,851.27 12.50\n", - " 25-34 14,330,805.00 36.65 80,112.14 14.70\n", - " 35-44 11,554,499.00 2.15 99,357.79 11.90\n", - " 45-54 16,749,416.00 22.80 91,964.61 17.20\n", - " 55-64 9,763,707.00 1.16 61,973.64 10.00\n", - " 65+ 32,742,422.00 0.01 26,850.62 33.60\n" - ] - } - ], - "source": [ - "def age_distribution(df, year_label):\n", - " \"\"\"\n", - " Create age group distribution.\n", - " \"\"\"\n", - " print(f\"\\n{'='*60}\")\n", - " print(f\"Age Distribution of Head of Household - {year_label}\")\n", - " print(f\"{'='*60}\")\n", - " \n", - " # Create age groups\n", - " df_copy = df.copy()\n", - " df_copy['age_group'] = pd.cut(\n", - " df_copy['age_head'],\n", - " bins=[0, 25, 35, 45, 55, 65, 100],\n", - " labels=['Under 25', '25-34', '35-44', '45-54', '55-64', '65+']\n", - " )\n", - " \n", - " summary = df_copy.groupby('age_group').apply(\n", - " lambda x: pd.Series({\n", - " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", - " 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n", - " 'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n", - " })\n", - " ).reset_index()\n", - " \n", - " total_units = summary['Tax Units (Weighted)'].sum()\n", - " summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)\n", - " \n", - " return summary\n", - "\n", - "# 2024\n", - "age_2024 = age_distribution(df_2024, \"2024\")\n", - "print(age_2024.to_string(index=False))\n", - "\n", - "# 2025\n", - "age_2025 = age_distribution(df_2025, \"2025\")\n", - "print(age_2025.to_string(index=False))" - ] + "outputs": [], + "source": "# =============================================================================\n# AGE DISTRIBUTION ANALYSIS\n# =============================================================================\n# Shows how childless tax units are distributed by age of the head of household.\n#\n# Key insight: The childless EITC has age restrictions (25-64 for 2024 under\n# current law), so we expect most EITC recipients to fall within that range.\n# =============================================================================\n\ndef age_distribution(df, year_label):\n \"\"\"\n Create age group distribution for heads of household.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n \n Returns:\n --------\n pandas.DataFrame\n Summary by age group with weighted counts and averages\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"Age Distribution of Head of Household - {year_label}\")\n print(f\"{'='*60}\")\n \n # Create age groups using pd.cut\n df_copy = df.copy()\n df_copy['age_group'] = pd.cut(\n df_copy['age_head'],\n bins=[0, 25, 35, 45, 55, 65, 100],\n labels=['Under 25', '25-34', '35-44', '45-54', '55-64', '65+']\n )\n \n # Calculate weighted statistics by age group\n summary = df_copy.groupby('age_group').apply(\n lambda x: pd.Series({\n 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n 'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n })\n ).reset_index()\n \n # Add percentage of total\n total_units = summary['Tax Units (Weighted)'].sum()\n summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)\n \n return summary\n\n# Generate for both years\nage_2024 = age_distribution(df_2024, \"2024\")\nprint(age_2024.to_string(index=False))\n\nage_2025 = age_distribution(df_2025, \"2025\")\nprint(age_2025.to_string(index=False))" }, { "cell_type": "markdown", @@ -862,126 +157,10 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "============================================================\n", - "States with State EITC Benefits - 2024\n", - "============================================================\n", - "state Tax Units (Weighted) Total State EITC Avg State EITC State EITC as % of Fed\n", - " CA 2,873,632.00 392,533,248.00 136.60 310.56\n", - " MN 313,996.50 118,287,520.00 376.72 656.23\n", - " VA 348,289.69 102,224,688.00 293.50 702.41\n", - " NJ 242,280.23 55,209,756.00 227.88 173.98\n", - " MD 64,599.95 49,036,304.00 759.08 294.33\n", - " WA 84,485.99 27,457,220.00 324.99 64.69\n", - " NY 146,240.25 17,955,152.00 122.78 27.83\n", - " MA 61,848.42 11,170,704.00 180.61 40.00\n", - " DC 21,836.86 10,643,084.00 487.39 442.39\n", - " NM 90,626.20 9,532,400.00 105.18 140.54\n", - " SC 15,907.35 9,201,450.00 578.44 125.00\n", - " OH 63,755.52 9,123,150.00 143.10 30.00\n", - " MI 58,990.42 9,018,837.00 152.89 30.00\n", - " IL 93,595.17 8,625,170.00 92.15 20.00\n", - " CO 33,525.91 7,573,860.50 225.91 50.00\n", - " CT 26,765.33 4,802,010.00 179.41 40.00\n", - " MO 25,335.12 2,362,942.00 93.27 20.00\n", - " ME 8,716.66 1,781,716.25 204.40 50.00\n", - " HI 8,067.77 1,532,535.38 189.96 40.00\n", - " IA 19,540.86 1,514,946.88 77.53 15.00\n", - " IN 25,976.84 1,273,132.25 49.01 10.00\n", - " OR 28,182.12 1,070,083.75 37.97 9.00\n", - " VT 4,973.77 849,665.75 170.83 38.00\n", - " KS 12,115.70 816,712.25 67.41 17.00\n", - " RI 9,870.70 746,713.44 75.65 16.00\n", - " LA 20,618.45 473,224.38 22.95 5.00\n", - " OK 20,021.92 449,874.88 22.47 4.57\n", - " NE 9,921.20 428,053.00 43.15 10.00\n", - " MT 6,376.92 245,735.78 38.54 10.00\n", - " DE 4,382.70 126,291.01 28.82 6.69\n", - "\n", - "============================================================\n", - "States with State EITC Benefits - 2025\n", - "============================================================\n", - "state Tax Units (Weighted) Total State EITC Avg State EITC State EITC as % of Fed\n", - " CA 2,542,093.00 394,849,216.00 155.32 304.41\n", - " MN 303,920.69 115,833,824.00 381.13 621.19\n", - " VA 351,505.81 103,897,088.00 295.58 711.09\n", - " NJ 242,613.14 56,647,304.00 233.49 173.25\n", - " MD 65,194.73 51,697,912.00 792.98 303.97\n", - " WA 85,270.31 28,292,344.00 331.80 63.72\n", - " NY 147,563.92 18,485,602.00 125.27 27.81\n", - " MA 62,421.07 11,548,022.00 185.00 40.00\n", - " DC 22,038.88 10,909,810.00 495.03 440.93\n", - " NM 90,061.51 9,856,160.00 109.44 141.47\n", - " OH 64,342.17 9,402,211.00 146.13 30.00\n", - " MI 59,535.31 9,386,164.00 157.66 30.00\n", - " SC 16,043.47 9,339,962.00 582.17 125.00\n", - " IL 94,457.95 8,905,220.00 94.28 20.00\n", - " CO 33,835.58 5,450,297.00 161.08 35.00\n", - " CT 27,013.58 4,959,099.50 183.58 40.00\n", - " MO 25,566.08 2,421,298.00 94.71 20.00\n", - " VT 5,019.33 2,304,432.25 459.11 100.00\n", - " ME 8,797.44 1,807,965.25 205.51 50.00\n", - " HI 8,142.12 1,588,292.62 195.07 40.00\n", - " IA 19,719.88 1,583,632.25 80.31 15.00\n", - " IN 26,210.19 1,314,889.75 50.17 10.00\n", - " OR 28,440.38 1,093,685.00 38.46 9.00\n", - " KS 12,228.20 826,845.44 67.62 17.00\n", - " RI 9,960.62 775,537.62 77.86 16.00\n", - " LA 20,803.49 486,003.25 23.36 5.00\n", - " OK 20,206.65 452,391.19 22.39 4.41\n", - " NE 10,012.65 435,824.81 43.53 10.00\n", - " MT 6,436.13 247,790.34 38.50 10.00\n", - " DE 4,423.04 123,470.46 27.92 6.44\n" - ] - } - ], - "source": [ - "def state_eitc_summary(df, year_label):\n", - " \"\"\"\n", - " Summary of states with state EITC programs.\n", - " \"\"\"\n", - " print(f\"\\n{'='*60}\")\n", - " print(f\"States with State EITC Benefits - {year_label}\")\n", - " print(f\"{'='*60}\")\n", - " \n", - " # Filter to states with state EITC > 0\n", - " df_with_state_eitc = df[df['state_eitc'] > 0]\n", - " \n", - " if len(df_with_state_eitc) == 0:\n", - " print(\"No state EITC benefits found in the data.\")\n", - " return None\n", - " \n", - " summary = df_with_state_eitc.groupby('state').apply(\n", - " lambda x: pd.Series({\n", - " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", - " 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n", - " 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", - " 'State EITC as % of Fed': ((x['state_eitc'] * x['tax_unit_weight']).sum() / \n", - " (x['eitc'] * x['tax_unit_weight']).sum() * 100) if (x['eitc'] * x['tax_unit_weight']).sum() > 0 else 0,\n", - " })\n", - " ).reset_index()\n", - " \n", - " summary = summary.sort_values('Total State EITC', ascending=False)\n", - " \n", - " return summary\n", - "\n", - "# 2024\n", - "state_eitc_2024 = state_eitc_summary(df_2024, \"2024\")\n", - "if state_eitc_2024 is not None:\n", - " print(state_eitc_2024.to_string(index=False))\n", - "\n", - "# 2025\n", - "state_eitc_2025 = state_eitc_summary(df_2025, \"2025\")\n", - "if state_eitc_2025 is not None:\n", - " print(state_eitc_2025.to_string(index=False))" - ] + "outputs": [], + "source": "# =============================================================================\n# STATE EITC PROGRAM ANALYSIS\n# =============================================================================\n# Shows which states have state EITC programs and how generous they are.\n#\n# State EITCs are typically calculated as a percentage of the federal EITC,\n# ranging from ~3% (Montana) to ~125% (South Carolina).\n# =============================================================================\n\ndef state_eitc_summary(df, year_label):\n \"\"\"\n Summary of states with state EITC programs.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n \n Returns:\n --------\n pandas.DataFrame or None\n Summary for states with state EITC programs, sorted by total distributed\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"States with State EITC Benefits - {year_label}\")\n print(f\"{'='*60}\")\n \n # Filter to only households actually receiving state EITC\n df_with_state_eitc = df[df['state_eitc'] > 0]\n \n if len(df_with_state_eitc) == 0:\n print(\"No state EITC benefits found in the data.\")\n return None\n \n # Calculate state-level summaries\n summary = df_with_state_eitc.groupby('state').apply(\n lambda x: pd.Series({\n # Number of recipients (weighted)\n 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n # Total state EITC distributed\n 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n # Average per recipient\n 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n # State EITC as percentage of federal (indicates program generosity)\n 'State EITC as % of Fed': ((x['state_eitc'] * x['tax_unit_weight']).sum() / \n (x['eitc'] * x['tax_unit_weight']).sum() * 100) if (x['eitc'] * x['tax_unit_weight']).sum() > 0 else 0,\n })\n ).reset_index()\n \n # Sort by total distributed (largest programs first)\n summary = summary.sort_values('Total State EITC', ascending=False)\n \n return summary\n\n# Generate for both years\nstate_eitc_2024 = state_eitc_summary(df_2024, \"2024\")\nif state_eitc_2024 is not None:\n print(state_eitc_2024.to_string(index=False))\n\nstate_eitc_2025 = state_eitc_summary(df_2025, \"2025\")\nif state_eitc_2025 is not None:\n print(state_eitc_2025.to_string(index=False))" }, { "cell_type": "markdown", @@ -992,68 +171,10 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Exported 1,493,541 rows to: eitc_childless_families_2024.csv\n", - "Exported 1,493,541 rows to: eitc_childless_families_2025.csv\n" - ] - } - ], - "source": [ - "# Export detailed household data - SEPARATE files for 2024 and 2025\n", - "# Sorted by state and phase_status, without eitc_maximum\n", - "\n", - "def export_household_data(df, year):\n", - " \"\"\"Export household-level data sorted by state and phase status.\"\"\"\n", - " \n", - " export_columns = [\n", - " 'state',\n", - " 'eitc_phase_status',\n", - " 'tax_unit_id',\n", - " 'tax_unit_weight',\n", - " 'eitc',\n", - " 'state_eitc',\n", - " 'eitc_phased_in',\n", - " 'eitc_reduction',\n", - " 'tax_unit_earned_income',\n", - " 'adjusted_gross_income',\n", - " 'filing_status_label',\n", - " 'age_head',\n", - " 'age_spouse',\n", - " ]\n", - " \n", - " # Select columns that exist\n", - " available_columns = [col for col in export_columns if col in df.columns]\n", - " df_export = df[available_columns].copy()\n", - " \n", - " # Rename for clarity\n", - " df_export = df_export.rename(columns={\n", - " 'eitc': 'federal_eitc',\n", - " 'filing_status_label': 'marital_status',\n", - " })\n", - " \n", - " # Sort by state and phase status\n", - " df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", - " df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", - " \n", - " # Export\n", - " filename = f'eitc_childless_families_{year}.csv'\n", - " df_export.to_csv(filename, index=False)\n", - " print(f\"Exported {len(df_export):,} rows to: {filename}\")\n", - " \n", - " return df_export\n", - "\n", - "# Export 2024\n", - "df_export_2024 = export_household_data(df_2024, 2024)\n", - "\n", - "# Export 2025\n", - "df_export_2025 = export_household_data(df_2025, 2025)" - ] + "outputs": [], + "source": "# =============================================================================\n# EXPORT DETAILED HOUSEHOLD DATA\n# =============================================================================\n# Exports the full household-level dataset with all calculated variables.\n#\n# WARNING: These files are large (~125MB each) and are excluded from git\n# via .gitignore. They are generated locally when the notebook runs.\n#\n# Use cases:\n# - Detailed analysis in external tools (Excel, Stata, R)\n# - Validation of the summary statistics\n# - Custom filtering/aggregation not provided in this notebook\n# =============================================================================\n\ndef export_household_data(df, year):\n \"\"\"\n Export household-level data to CSV, sorted by state and phase status.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data from run_all_states_analysis()\n year : int\n Tax year (used in filename)\n \n Returns:\n --------\n pandas.DataFrame\n The exported data (same as written to file)\n \n Output File:\n eitc_childless_families_{year}.csv\n \"\"\"\n \n # Select columns for export (excluding eitc_maximum per user request)\n export_columns = [\n 'state', # State abbreviation\n 'eitc_phase_status', # Classification result\n 'tax_unit_id', # Unique identifier\n 'tax_unit_weight', # Survey weight\n 'eitc', # Federal EITC amount\n 'state_eitc', # State EITC amount\n 'eitc_phased_in', # Phase-in calculation\n 'eitc_reduction', # Phase-out reduction\n 'tax_unit_earned_income', # Total earned income\n 'adjusted_gross_income', # AGI\n 'filing_status_label', # Marital/filing status\n 'age_head', # Age of primary filer\n 'age_spouse', # Age of spouse (0 if none)\n ]\n \n # Only include columns that exist in the DataFrame\n available_columns = [col for col in export_columns if col in df.columns]\n df_export = df[available_columns].copy()\n \n # Rename columns for clarity in external tools\n df_export = df_export.rename(columns={\n 'eitc': 'federal_eitc',\n 'filing_status_label': 'marital_status',\n })\n \n # Sort by state (alphabetically) then by phase status (in logical EITC order)\n df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n # Write to CSV\n filename = f'eitc_childless_families_{year}.csv'\n df_export.to_csv(filename, index=False)\n print(f\"Exported {len(df_export):,} rows to: {filename}\")\n \n return df_export\n\n# Export both years to separate files\ndf_export_2024 = export_household_data(df_2024, 2024)\ndf_export_2025 = export_household_data(df_2025, 2025)" }, { "cell_type": "code", @@ -1348,40 +469,10 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Exported summary to: eitc_childless_phase_status_summary_2024.csv\n", - "Exported summary to: eitc_childless_phase_status_summary_2025.csv\n" - ] - } - ], - "source": [ - "# Export phase status summaries - SEPARATE files for 2024 and 2025\n", - "\n", - "def export_summary(summary_df, year):\n", - " \"\"\"Export summary sorted by state and phase status.\"\"\"\n", - " df_export = summary_df.copy()\n", - " \n", - " # Sort by state and phase status\n", - " df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", - " df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", - " \n", - " filename = f'eitc_childless_phase_status_summary_{year}.csv'\n", - " df_export.to_csv(filename, index=False)\n", - " print(f\"Exported summary to: {filename}\")\n", - " return df_export\n", - "\n", - "# Export 2024 summary\n", - "summary_2024_export = export_summary(summary_2024, 2024)\n", - "\n", - "# Export 2025 summary \n", - "summary_2025_export = export_summary(summary_2025, 2025)" - ] + "outputs": [], + "source": "# =============================================================================\n# EXPORT SUMMARY DATA\n# =============================================================================\n# Exports the aggregated summary by state and phase status.\n#\n# These files are small (~10KB) and ARE included in git commits.\n# This is the primary output for sharing with stakeholders.\n#\n# Output Files:\n# - eitc_childless_phase_status_summary_2024.csv\n# - eitc_childless_phase_status_summary_2025.csv\n# =============================================================================\n\ndef export_summary(summary_df, year):\n \"\"\"\n Export phase status summary to CSV, sorted by state and phase status.\n \n Parameters:\n -----------\n summary_df : pandas.DataFrame\n Summary from create_phase_status_summary()\n year : int\n Tax year (used in filename)\n \n Returns:\n --------\n pandas.DataFrame\n The exported data\n \"\"\"\n df_export = summary_df.copy()\n \n # Sort by state (alphabetically) then phase status (logical EITC order)\n df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n # Write to CSV\n filename = f'eitc_childless_phase_status_summary_{year}.csv'\n df_export.to_csv(filename, index=False)\n print(f\"Exported summary to: {filename}\")\n return df_export\n\n# Export both years\nsummary_2024_export = export_summary(summary_2024, 2024)\nsummary_2025_export = export_summary(summary_2025, 2025)" }, { "cell_type": "markdown", @@ -1392,60 +483,10 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "National Totals by Phase Status:\n", - "\n", - "2024:\n", - " eitc_phase_status weighted_households pct_of_total year\n", - " Full amount 33,314.48 0.00 2024\n", - " Fully phased out 60,244,548.00 62.50 2024\n", - " No income 34,126,456.00 35.40 2024\n", - "Partially phased out 824,046.81 0.90 2024\n", - " Pre-phase-in 1,203,184.00 1.20 2024\n", - "\n", - "Total childless EITC recipients: 96,431,552\n", - "\n", - "2025:\n", - " eitc_phase_status weighted_households pct_of_total year\n", - " Full amount 33,638.47 0.00 2025\n", - " Fully phased out 60,940,444.00 62.60 2025\n", - " No income 34,307,016.00 35.20 2025\n", - "Partially phased out 831,458.88 0.90 2025\n", - " Pre-phase-in 1,214,332.12 1.20 2025\n", - "\n", - "Total childless EITC recipients: 97,326,896\n" - ] - } - ], - "source": [ - "# National totals by phase status\n", - "def national_totals(df, year):\n", - " totals = df.groupby('eitc_phase_status').agg({\n", - " 'tax_unit_weight': 'sum',\n", - " }).reset_index()\n", - " totals.columns = ['eitc_phase_status', 'weighted_households']\n", - " total_all = totals['weighted_households'].sum()\n", - " totals['pct_of_total'] = (totals['weighted_households'] / total_all * 100).round(1)\n", - " totals['year'] = year\n", - " return totals\n", - "\n", - "print(\"National Totals by Phase Status:\")\n", - "print(\"\\n2024:\")\n", - "nat_2024 = national_totals(df_2024, 2024)\n", - "print(nat_2024.to_string(index=False))\n", - "print(f\"\\nTotal childless EITC recipients: {nat_2024['weighted_households'].sum():,.0f}\")\n", - "\n", - "print(\"\\n2025:\")\n", - "nat_2025 = national_totals(df_2025, 2025)\n", - "print(nat_2025.to_string(index=False))\n", - "print(f\"\\nTotal childless EITC recipients: {nat_2025['weighted_households'].sum():,.0f}\")" - ] + "outputs": [], + "source": "# =============================================================================\n# NATIONAL TOTALS BY PHASE STATUS\n# =============================================================================\n# Aggregates across all states to show the national distribution of\n# childless tax units by EITC phase status.\n#\n# Key insights:\n# - Most childless tax units (~62%) are \"Fully phased out\" (too much income)\n# - About 35% have \"No income\" (no earned income = no EITC)\n# - Only ~2% actually receive EITC (Pre-phase-in + Full amount + Partially)\n# =============================================================================\n\ndef national_totals(df, year):\n \"\"\"\n Calculate national totals by phase status.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year : int\n Tax year (for output column)\n \n Returns:\n --------\n pandas.DataFrame\n National summary with weighted counts and percentages\n \"\"\"\n totals = df.groupby('eitc_phase_status').agg({\n 'tax_unit_weight': 'sum',\n }).reset_index()\n totals.columns = ['eitc_phase_status', 'weighted_households']\n \n # Calculate percentage of total\n total_all = totals['weighted_households'].sum()\n totals['pct_of_total'] = (totals['weighted_households'] / total_all * 100).round(1)\n totals['year'] = year\n return totals\n\n# Display national totals\nprint(\"National Totals by Phase Status:\")\nprint(\"\\n2024:\")\nnat_2024 = national_totals(df_2024, 2024)\nprint(nat_2024.to_string(index=False))\nprint(f\"\\nTotal childless tax units: {nat_2024['weighted_households'].sum():,.0f}\")\n\nprint(\"\\n2025:\")\nnat_2025 = national_totals(df_2025, 2025)\nprint(nat_2025.to_string(index=False))\nprint(f\"\\nTotal childless tax units: {nat_2025['weighted_households'].sum():,.0f}\")" }, { "cell_type": "markdown", @@ -1497,4 +538,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file From cda33ba0a1aff29d88211c1f44ab82d6ac4b3cab Mon Sep 17 00:00:00 2001 From: David Trimmer Date: Wed, 17 Dec 2025 16:06:45 -0500 Subject: [PATCH 3/4] Refactor EITC childless analysis notebook MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Major changes: - Add comprehensive state EITC documentation at beginning (CA, MN, WA, VA, DE, MD) - Add 'Ineligible' phase status tier for SSN/age/investment income failures - Rename 'No income' to 'No earned income' with true zero threshold - Add eitc_eligible variable to data loading - Remove filing status cross-tabulation cells (redundant with CSVs) - Clean up export function to only include loaded columns - Update phase status order and notes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../eitc_childless_analysis.ipynb | 108 ++---------------- 1 file changed, 9 insertions(+), 99 deletions(-) diff --git a/eitc_childless_analysis/eitc_childless_analysis.ipynb b/eitc_childless_analysis/eitc_childless_analysis.ipynb index d1c396c..4861b17 100644 --- a/eitc_childless_analysis/eitc_childless_analysis.ipynb +++ b/eitc_childless_analysis/eitc_childless_analysis.ipynb @@ -3,14 +3,12 @@ { "cell_type": "markdown", "metadata": {}, - "source": "# EITC Analysis: Childless Families by Phase-in/Phase-out Status\n\n## Overview\nThis notebook analyzes **childless tax units** (those with no EITC-qualifying children) across all 50 US states + DC, categorizing them by where they fall on the EITC schedule.\n\n## What This Notebook Does\n1. **Loads state-specific microdata** from PolicyEngine's HuggingFace repository\n2. **Filters to childless households** (eitc_child_count == 0)\n3. **Categorizes each household** into one of 5 EITC phase statuses\n4. **Calculates weighted counts and percentages** by state\n5. **Exports summary and detailed data** to CSV files\n\n## EITC Phase Status Categories\n| Status | Description |\n|--------|-------------|\n| **No income** | No/minimal earned income ($100 or less), not receiving EITC |\n| **Pre-phase-in** | Earning income but haven't reached maximum credit yet |\n| **Full amount** | At the plateau - receiving maximum credit |\n| **Partially phased out** | In phase-out range, receiving reduced credit |\n| **Fully phased out** | Income too high, EITC reduced to $0 |\n\n## Data Source\n- **State datasets**: `hf://policyengine/policyengine-us-data/states/{STATE}.h5`\n- Each state has its own dataset with representative household microdata\n- Data is weighted to represent the actual population\n\n## Output Files\n- `eitc_childless_phase_status_summary_{year}.csv` - Aggregated by state and phase status\n- `eitc_childless_families_{year}.csv` - Detailed household-level data (large files, ~125MB each)\n\n## Years Analyzed\n- 2024 and 2025" + "source": "# EITC Analysis: Childless Filers by Phase Status\n\n## Overview\nThis notebook analyzes **childless tax units** (those with no EITC-qualifying children) across all 50 US states + DC, categorizing them by where they fall on the EITC schedule.\n\n## What This Notebook Does\n1. **Loads state-specific microdata** from PolicyEngine's HuggingFace repository\n2. **Filters to childless filers** (eitc_child_count == 0)\n3. **Checks EITC eligibility** (age requirements, SSN, investment income limits)\n4. **Categorizes each household** into one of 6 phase statuses\n5. **Calculates weighted counts and percentages** by state\n6. **Exports summary data** to CSV files\n\n## EITC Phase Status Categories\n| Status | Description |\n|--------|-------------|\n| **Ineligible** | Does not meet EITC eligibility requirements (age, SSN, investment income, or filing status) |\n| **No earned income** | No earned income, therefore no EITC |\n| **Pre-phase-in** | Has earned income but hasn't reached maximum credit yet |\n| **Full amount** | At the plateau - receiving maximum credit |\n| **Partially phased out** | In phase-out range, receiving reduced credit |\n| **Fully phased out** | Income too high, EITC reduced to $0 |\n\n## Data Source\n- **State datasets**: `hf://policyengine/policyengine-us-data/states/{STATE}.h5`\n- Each state has its own dataset with representative household microdata\n- Data is weighted to represent the actual population\n\n## Output Files\n- `eitc_childless_phase_status_summary_{year}.csv` - Aggregated by state and phase status\n\n## Years Analyzed\n- 2024 and 2025" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "## Setup and Imports" - ] + "source": "## State EITC Programs\n\nAs of 2024, **31 states plus DC** have state-level Earned Income Tax Credit programs. Most states calculate their EITC as a simple percentage match of the federal EITC, but several have unique structures.\n\n### States with Standard Federal Match Structure\nThese states calculate their state EITC as a percentage of the federal EITC amount:\n\n| State | Match % | Refundable | Notes |\n|-------|---------|------------|-------|\n| CO | 50% (2024) | Yes | Phasing down to 10% by 2034 |\n| CT | ~30% | Yes | |\n| DC | 70% | Yes | Higher match for childless workers |\n| DE | 4.5% ref / 20% non-ref | Choice | Taxpayers choose refundable OR non-refundable |\n| HI | 40% | Yes | |\n| IL | 20% | Yes | |\n| IN | 10% | Yes | |\n| IA | 15% | Yes | |\n| KS | 17% | Yes | |\n| LA | 5% | Yes | |\n| ME | 50% | Yes | |\n| MA | 40% | Yes | |\n| MI | 30% | Yes | |\n| MO | 20% | Yes | Called \"Working Families Tax Credit\" |\n| MT | 10% | Yes | |\n| NE | 10% | Yes | |\n| NJ | Variable | Yes | Varies by income |\n| NM | ~25% | Yes | |\n| NY | 30% | Yes | Plus supplemental credit |\n| OH | 30% | Yes | |\n| OK | 5% | Yes | Lowest in nation |\n| OR | 9-12% | Yes | Varies by children |\n| PA | ~10% | Yes | |\n| RI | 16% | Yes | |\n| SC | 125% | Yes | Highest in nation |\n| VT | ~38% | Yes | Increased to 100% for childless in 2025 |\n| WI | Variable | Yes | Varies by children |\n\n### States with UNIQUE/NON-STANDARD Structures\n\n#### California (CA) - CalEITC\nCalifornia does NOT simply match the federal EITC. Instead:\n- Uses an **85% adjustment factor** applied to a state-specific calculation\n- Has **different phase-in rates by number of children**:\n - 0 children: 7.65%\n - 1 child: 34%\n - 2 children: 40%\n - 3+ children: 45%\n- Has a **two-stage phase-out** structure\n- Maximum credit is lower than federal EITC\n- **Fully refundable**\n\n#### Minnesota (MN) - Working Family Credit / Child & Working Families Credit\nMinnesota **replaced** its traditional Working Family Credit in 2023 with the **Child and Working Families Credit (CWFC)**:\n- **Two-part credit structure**:\n 1. Child Tax Credit component: Fixed amount per qualifying child\n 2. Working Family Credit component: Phase-in based on earnings\n- Combined amounts phase out together based on AGI or earnings\n- **Completely independent calculation** from federal EITC\n- **Fully refundable**\n\n#### Washington (WA) - Working Families Tax Credit (WFTC)\nWashington has **no income tax** and therefore no traditional EITC. Instead:\n- Provides a **flat dollar amount** based on number of children:\n - 0 children: $300-$325\n - 1 child: $600-$640\n - 2 children: $900-$965\n - 3+ children: $1,200-$1,290\n- Phases out starting **$2,500-$5,000 below** federal EITC AGI limits\n- Requires claiming federal EITC to qualify\n- **Fully refundable**\n\n#### Virginia (VA) - Split Refundable/Non-Refundable + Low-Income Tax Credit\nVirginia has the most complex structure:\n- **Non-refundable match**: 20% of federal EITC (since 2006)\n- **Refundable match**: Variable (0% → 15% → 20% → 15% over different years)\n- **Alternative Low-Income Tax Credit (LITC)**: $300 per personal exemption\n- Taxpayers receive the **better of** EITC match or LITC\n- Separate filers receive prorated credits\n\n#### Delaware (DE) - Choice Between Refundable and Non-Refundable\nDelaware requires taxpayers to **choose one**:\n- **Refundable option**: 4.5% of federal EITC\n- **Non-refundable option**: 20% of federal EITC\n- Cannot claim both\n\n#### Maryland (MD) - Differentiated by Family Status\nMaryland varies match percentages by family composition:\n- **Married OR has children**: \n - Non-refundable: 50%\n - Refundable: 25-45%\n- **Childless unmarried filers**: Different (lower) percentages\n- Has separate parameters for different filing situations\n\n### States WITHOUT State EITC Programs\nThe following states have **no state EITC**: AL, AK, AZ, AR, FL, GA, ID, KY, MS, NV, NH, NC, ND, SD, TN, TX, UT, WV, WY" }, { "cell_type": "code", @@ -22,26 +20,26 @@ { "cell_type": "markdown", "metadata": {}, - "source": "## EITC Phase Status Classification\n\nThe Earned Income Tax Credit (EITC) follows a trapezoidal schedule:\n\n```\nCredit\nAmount\n ^\n | ___________\n | / \\\n | / \\\n | / \\\n | / \\\n | / \\\n |/_____________________\\____> Earned Income\n Phase-in Plateau Phase-out\n```\n\n### How We Classify Households\n\nWe use PolicyEngine's calculated variables to determine where each household falls:\n\n| Variable | Description |\n|----------|-------------|\n| `eitc` | Final EITC amount received (after all calculations) |\n| `eitc_maximum` | Maximum possible EITC for this household's filing status |\n| `eitc_phased_in` | Amount \"earned\" based on phase-in rate × earned income |\n| `eitc_reduction` | Amount reduced due to being in phase-out range |\n| `tax_unit_earned_income` | Total earned income for the tax unit |\n\n### Classification Logic\n1. **No income**: Earned income ≤ $100 AND eitc = 0\n2. **Pre-phase-in**: Receiving EITC but eitc_phased_in < eitc_maximum\n3. **Full amount**: eitc_phased_in ≥ eitc_maximum AND eitc_reduction = 0\n4. **Partially phased out**: Receiving EITC AND eitc_reduction > 0\n5. **Fully phased out**: eitc = 0 AND (has reduction OR phased_in ≥ maximum)" + "source": "## EITC Phase Status Classification\n\nThe Earned Income Tax Credit (EITC) follows a trapezoidal schedule:\n\n```\nCredit\nAmount\n ^\n | ___________\n | / \\\n | / \\\n | / \\\n | / \\\n | / \\\n |/_____________________\\____> Earned Income\n Phase-in Plateau Phase-out\n```\n\n### EITC Eligibility Requirements (Childless Filers)\nBefore a childless filer can receive EITC, they must meet:\n1. **Age requirement**: Between 25 and 64 years old (or 19+ if former foster youth/homeless)\n2. **SSN requirement**: Valid Social Security Number for work\n3. **Investment income limit**: Investment income must be below threshold (~$11,000 in 2024)\n4. **Filing status**: Cannot file as \"Married Filing Separately\" (in most cases)\n\n### How We Classify Households\n\nWe use PolicyEngine's calculated variables:\n\n| Variable | Description |\n|----------|-------------|\n| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |\n| `eitc` | Final EITC amount received (after all calculations) |\n| `eitc_maximum` | Maximum possible EITC for this filing status |\n| `eitc_phased_in` | Amount \"earned\" based on phase-in rate × earned income |\n| `eitc_reduction` | Amount reduced due to being in phase-out range |\n| `tax_unit_earned_income` | Total earned income for the tax unit |\n\n### Classification Logic (in priority order)\n1. **Ineligible**: `eitc_eligible == False` (fails age, SSN, investment income, or filing status)\n2. **No earned income**: `tax_unit_earned_income == 0` (eligible but no earnings)\n3. **Pre-phase-in**: Receiving EITC but `eitc_phased_in < eitc_maximum`\n4. **Full amount**: `eitc_phased_in >= eitc_maximum` AND `eitc_reduction == 0`\n5. **Partially phased out**: Receiving EITC AND `eitc_reduction > 0`\n6. **Fully phased out**: `eitc == 0` AND has income (phased out completely)" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], - "source": "# =============================================================================\n# EITC PHASE STATUS CLASSIFICATION FUNCTION\n# =============================================================================\n# This function takes a DataFrame of households and classifies each one into\n# one of 5 EITC phase statuses based on their income and EITC calculations.\n#\n# Uses numpy's np.select() for efficient vectorized conditional logic.\n# =============================================================================\n\ndef determine_eitc_phase_status_vectorized(df):\n \"\"\"\n Classify each household into an EITC phase status category.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Must contain columns: tax_unit_earned_income, eitc, eitc_reduction,\n eitc_phased_in, eitc_maximum\n \n Returns:\n --------\n numpy.ndarray\n Array of status strings, one per row in df\n \n Categories:\n -----------\n - No income: No/minimal earned income, not receiving EITC\n - Pre-phase-in: Earning but haven't reached maximum credit yet\n - Full amount: At maximum credit (plateau region)\n - Partially phased out: In phase-out region, still receiving some credit\n - Fully phased out: Income too high, EITC reduced to $0\n \"\"\"\n \n # Define conditions in priority order (first match wins)\n # Each condition is a boolean array the same length as df\n conditions = [\n # CONDITION 1: No income\n # Household has little/no earned income AND isn't receiving EITC\n (df['tax_unit_earned_income'] <= 100) & (df['eitc'] <= 0),\n \n # CONDITION 2: Fully phased out (with reduction)\n # Not receiving EITC, but has earned income and would have had reduction\n (df['eitc'] <= 0) & (df['tax_unit_earned_income'] > 100) & (df['eitc_reduction'] > 0),\n \n # CONDITION 3: Fully phased out (hit max then reduced to zero)\n # Not receiving EITC, but phased_in amount reached/exceeded maximum\n (df['eitc'] <= 0) & (df['eitc_phased_in'] >= df['eitc_maximum']),\n \n # CONDITION 4: Pre-phase-in\n # Receiving EITC, but haven't earned enough to hit maximum yet\n (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),\n \n # CONDITION 5: Partially phased out\n # Receiving EITC, but some reduction has been applied\n (df['eitc'] > 0) & (df['eitc_reduction'] > 0),\n \n # CONDITION 6: Full amount (plateau)\n # Receiving EITC at maximum, no reduction applied\n (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0)\n ]\n \n # Labels corresponding to each condition above\n choices = [\n 'No income',\n 'Fully phased out',\n 'Fully phased out',\n 'Pre-phase-in',\n 'Partially phased out',\n 'Full amount'\n ]\n \n # np.select applies conditions in order, returns first matching choice\n # Default 'No income' catches any edge cases\n return np.select(conditions, choices, default='No income')" + "source": "# =============================================================================\n# EITC PHASE STATUS CLASSIFICATION FUNCTION\n# =============================================================================\n# This function takes a DataFrame of households and classifies each one into\n# one of 6 EITC phase statuses based on eligibility, income, and EITC calculations.\n#\n# Uses numpy's np.select() for efficient vectorized conditional logic.\n# =============================================================================\n\ndef determine_eitc_phase_status_vectorized(df):\n \"\"\"\n Classify each household into an EITC phase status category.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Must contain columns: eitc_eligible, tax_unit_earned_income, eitc, \n eitc_reduction, eitc_phased_in, eitc_maximum\n \n Returns:\n --------\n numpy.ndarray\n Array of status strings, one per row in df\n \n Categories (in priority order):\n -------------------------------\n 1. Ineligible: Does not meet EITC eligibility (age, SSN, investment income)\n 2. No earned income: Eligible but has zero earned income\n 3. Pre-phase-in: Receiving EITC, still building up to maximum\n 4. Full amount: At maximum credit (plateau region)\n 5. Partially phased out: In phase-out region, still receiving some credit\n 6. Fully phased out: Income too high, EITC reduced to $0\n \"\"\"\n \n # Define conditions in PRIORITY ORDER (first match wins)\n conditions = [\n # CONDITION 1: Ineligible for EITC\n # Fails age requirement (25-64), SSN, investment income limit, or filing status\n df['eitc_eligible'] == False,\n \n # CONDITION 2: No earned income\n # Eligible for EITC but has zero earned income (cannot receive credit)\n (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] == 0),\n \n # CONDITION 3: Pre-phase-in\n # Receiving EITC, but haven't earned enough to hit maximum yet\n (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),\n \n # CONDITION 4: Full amount (plateau)\n # Receiving EITC at maximum, no reduction applied\n (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0),\n \n # CONDITION 5: Partially phased out\n # Receiving EITC, but some reduction has been applied\n (df['eitc'] > 0) & (df['eitc_reduction'] > 0),\n \n # CONDITION 6: Fully phased out\n # Eligible, has income, but EITC reduced to zero\n (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] > 0) & (df['eitc'] <= 0),\n ]\n \n # Labels corresponding to each condition above\n choices = [\n 'Ineligible',\n 'No earned income',\n 'Pre-phase-in',\n 'Full amount',\n 'Partially phased out',\n 'Fully phased out'\n ]\n \n # np.select applies conditions in order, returns first matching choice\n # Default catches any edge cases\n return np.select(conditions, choices, default='Ineligible')" }, { "cell_type": "markdown", "metadata": {}, - "source": "## Data Loading Functions\n\nThe following cell defines two key functions:\n\n### `run_state_eitc_analysis(state_abbr, year)`\nLoads and processes data for a single state:\n1. Loads the state's microdata from HuggingFace\n2. Calculates all relevant EITC and household variables\n3. Filters to childless households only\n4. Classifies each household by EITC phase status\n5. Returns a DataFrame with one row per household\n\n### `run_all_states_analysis(year)`\nOrchestrates the full analysis:\n1. Loops through all 51 states/DC\n2. Calls `run_state_eitc_analysis()` for each\n3. Combines all results into a single DataFrame\n4. Reports progress and totals\n\n### Variables Calculated\n| Variable | Description |\n|----------|-------------|\n| `tax_unit_weight` | Survey weight (how many real households this record represents) |\n| `eitc` | Federal EITC amount received |\n| `state_eitc` | State EITC amount (if state has a program) |\n| `eitc_child_count` | Number of EITC-qualifying children (we filter to 0) |\n| `filing_status` | Tax filing status (Single, Joint, etc.) |\n| `age_head` | Age of primary filer |\n| `adjusted_gross_income` | AGI for the tax unit |" + "source": "## Data Loading Functions\n\n### `run_state_eitc_analysis(state_abbr, year)`\nLoads and processes data for a single state:\n1. Loads the state's microdata from HuggingFace\n2. Calculates all relevant EITC and household variables\n3. Filters to childless filers only (`eitc_child_count == 0`)\n4. Classifies each household by EITC phase status\n5. Returns a DataFrame with one row per household\n\n### `run_all_states_analysis(year)`\nOrchestrates the full analysis:\n1. Loops through all 51 states/DC\n2. Calls `run_state_eitc_analysis()` for each\n3. Combines all results into a single DataFrame\n\n### Variables Calculated\n| Variable | Description |\n|----------|-------------|\n| `tax_unit_weight` | Survey weight (how many real households this record represents) |\n| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |\n| `eitc` | Federal EITC amount received |\n| `state_eitc` | State EITC amount (if state has a program) |\n| `eitc_child_count` | Number of EITC-qualifying children (we filter to 0) |\n| `tax_unit_earned_income` | Total earned income for the tax unit |\n| `age_head` | Age of primary filer |" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], - "source": "# =============================================================================\n# STATE LIST AND DATA LOADING FUNCTIONS\n# =============================================================================\n\n# All US states + DC (51 total)\n# Modify this list to analyze a subset of states\nALL_STATES = [\n 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', \n 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', \n 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', \n 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', \n 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'\n]\n\n# Order for sorting phase statuses (follows the EITC schedule from left to right)\nPHASE_ORDER = ['No income', 'Pre-phase-in', 'Full amount', 'Partially phased out', 'Fully phased out']\n\n\ndef run_state_eitc_analysis(state_abbr, year):\n \"\"\"\n Load and analyze EITC data for a single state.\n \n Parameters:\n -----------\n state_abbr : str\n Two-letter state abbreviation (e.g., 'CA', 'NY', 'TX')\n year : int\n Tax year to analyze (e.g., 2024, 2025)\n \n Returns:\n --------\n pandas.DataFrame or None\n DataFrame with one row per childless tax unit, or None if error\n \"\"\"\n try:\n # -----------------------------------------------------------------\n # STEP 1: Load the state's microdata from HuggingFace\n # -----------------------------------------------------------------\n # Each state has its own .h5 file with representative household data\n # The data is weighted to represent the state's actual population\n dataset_path = f\"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5\"\n sim = Microsimulation(dataset=dataset_path)\n \n # -----------------------------------------------------------------\n # STEP 2: Calculate required variables using PolicyEngine\n # -----------------------------------------------------------------\n # These are \"tax unit\" level variables (a tax unit = people filing together)\n # sim.calculate() returns a weighted array of values\n data = {}\n \n tax_unit_vars = [\n 'tax_unit_id', # Unique identifier for each tax unit\n 'tax_unit_weight', # Survey weight (represents X real households)\n 'eitc', # Federal EITC amount (final, after all calculations)\n 'eitc_maximum', # Max possible EITC for this filing status\n 'eitc_phased_in', # Amount \"earned\" via phase-in calculation\n 'eitc_reduction', # Amount reduced due to phase-out\n 'eitc_child_count', # Number of EITC-qualifying children\n 'state_eitc', # State EITC amount (0 if no state program)\n 'adjusted_gross_income', # AGI\n 'tax_unit_earned_income', # Total earned income\n 'filing_status', # 1=Single, 2=Joint, 3=Separate, 4=HoH, 5=Widow\n 'age_head', # Age of primary filer\n 'age_spouse', # Age of spouse (0 if single)\n ]\n \n # Calculate each variable and extract the numpy array\n for var in tax_unit_vars:\n result = sim.calculate(var, period=year)\n # .values extracts the underlying numpy array from PolicyEngine's result\n data[var] = result.values if hasattr(result, 'values') else np.array(result)\n \n # Create DataFrame from the calculated values\n df = pd.DataFrame(data)\n df['state'] = state_abbr # Add state identifier\n \n # -----------------------------------------------------------------\n # STEP 3: Filter to childless households only\n # -----------------------------------------------------------------\n # We want ALL childless households, not just those receiving EITC\n # This lets us calculate percentages that sum to 100%\n childless_mask = df['eitc_child_count'] == 0\n df_childless = df[childless_mask].copy()\n \n if len(df_childless) == 0:\n return None\n \n # -----------------------------------------------------------------\n # STEP 4: Classify each household by EITC phase status\n # -----------------------------------------------------------------\n df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)\n \n # -----------------------------------------------------------------\n # STEP 5: Add readable labels for filing status\n # -----------------------------------------------------------------\n df_childless['year'] = year\n \n filing_status_map = {\n 1: 'Single',\n 2: 'Joint',\n 3: 'Separate',\n 4: 'Head of Household',\n 5: 'Widow(er)'\n }\n df_childless['filing_status_label'] = df_childless['filing_status'].map(filing_status_map).fillna('Unknown')\n \n return df_childless\n \n except Exception as e:\n print(f\" Error processing {state_abbr}: {e}\")\n return None\n\n\ndef run_all_states_analysis(year, states=None):\n \"\"\"\n Run EITC analysis for all states and combine results.\n \n Parameters:\n -----------\n year : int\n Tax year to analyze\n states : list, optional\n List of state abbreviations. Defaults to ALL_STATES (all 51).\n \n Returns:\n --------\n pandas.DataFrame\n Combined DataFrame with all states' data\n \"\"\"\n if states is None:\n states = ALL_STATES\n \n print(f\"\\n{'='*60}\")\n print(f\"Running analysis for {year}\")\n print(f\"{'='*60}\")\n \n all_results = []\n \n # Process each state\n for i, state in enumerate(states):\n print(f\"Processing {state} ({i+1}/{len(states)})...\", end=\" \")\n result = run_state_eitc_analysis(state, year)\n \n if result is not None and len(result) > 0:\n # Report: raw record count and weighted population count\n weighted_count = result['tax_unit_weight'].sum()\n print(f\"{len(result):,} records, {weighted_count:,.0f} weighted\")\n all_results.append(result)\n else:\n print(\"No data found\")\n \n # Combine all state DataFrames\n if all_results:\n combined = pd.concat(all_results, ignore_index=True)\n print(f\"\\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units\")\n return combined\n else:\n return pd.DataFrame()" + "source": "# =============================================================================\n# STATE LIST AND DATA LOADING FUNCTIONS\n# =============================================================================\n\n# All US states + DC (51 total)\nALL_STATES = [\n 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', \n 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', \n 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', \n 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', \n 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'\n]\n\n# Order for sorting phase statuses (follows logical EITC flow)\nPHASE_ORDER = [\n 'Ineligible', # Cannot receive EITC (age/SSN/investment income)\n 'No earned income', # Eligible but no earnings\n 'Pre-phase-in', # Building up to maximum\n 'Full amount', # At maximum (plateau)\n 'Partially phased out', # Being reduced\n 'Fully phased out' # Reduced to $0\n]\n\n\ndef run_state_eitc_analysis(state_abbr, year):\n \"\"\"\n Load and analyze EITC data for a single state.\n \n Parameters:\n -----------\n state_abbr : str\n Two-letter state abbreviation (e.g., 'CA', 'NY', 'TX')\n year : int\n Tax year to analyze (e.g., 2024, 2025)\n \n Returns:\n --------\n pandas.DataFrame or None\n DataFrame with one row per childless tax unit, or None if error\n \"\"\"\n try:\n # Load the state's microdata from HuggingFace\n dataset_path = f\"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5\"\n sim = Microsimulation(dataset=dataset_path)\n \n # Variables to calculate\n tax_unit_vars = [\n 'tax_unit_id', # Unique identifier\n 'tax_unit_weight', # Survey weight\n 'eitc_eligible', # NEW: Whether eligible for EITC\n 'eitc', # Federal EITC amount\n 'eitc_maximum', # Max possible EITC\n 'eitc_phased_in', # Phase-in amount\n 'eitc_reduction', # Phase-out reduction\n 'eitc_child_count', # Number of EITC-qualifying children\n 'state_eitc', # State EITC amount\n 'tax_unit_earned_income', # Total earned income\n 'age_head', # Age of primary filer\n ]\n \n # Calculate each variable\n data = {}\n for var in tax_unit_vars:\n result = sim.calculate(var, period=year)\n data[var] = result.values if hasattr(result, 'values') else np.array(result)\n \n df = pd.DataFrame(data)\n df['state'] = state_abbr\n \n # Filter to childless filers only\n childless_mask = df['eitc_child_count'] == 0\n df_childless = df[childless_mask].copy()\n \n if len(df_childless) == 0:\n return None\n \n # Classify each household by EITC phase status\n df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)\n df_childless['year'] = year\n \n return df_childless\n \n except Exception as e:\n print(f\" Error processing {state_abbr}: {e}\")\n return None\n\n\ndef run_all_states_analysis(year, states=None):\n \"\"\"\n Run EITC analysis for all states and combine results.\n \"\"\"\n if states is None:\n states = ALL_STATES\n \n print(f\"\\n{'='*60}\")\n print(f\"Running analysis for {year}\")\n print(f\"{'='*60}\")\n \n all_results = []\n \n for i, state in enumerate(states):\n print(f\"Processing {state} ({i+1}/{len(states)})...\", end=\" \")\n result = run_state_eitc_analysis(state, year)\n \n if result is not None and len(result) > 0:\n weighted_count = result['tax_unit_weight'].sum()\n print(f\"{len(result):,} records, {weighted_count:,.0f} weighted\")\n all_results.append(result)\n else:\n print(\"No data found\")\n \n if all_results:\n combined = pd.concat(all_results, ignore_index=True)\n print(f\"\\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units\")\n return combined\n else:\n return pd.DataFrame()" }, { "cell_type": "markdown", @@ -74,16 +72,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "## Summary Statistics" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### EITC Phase Status Distribution" - ] + "source": "## Create and Export Summary" }, { "cell_type": "code", @@ -92,27 +81,6 @@ "outputs": [], "source": "# =============================================================================\n# PHASE STATUS SUMMARY BY STATE\n# =============================================================================\n# This function creates the main summary output: for each state, what\n# percentage of childless households fall into each EITC phase status?\n#\n# Key outputs per state × phase status:\n# - weighted_households: Actual population count (using survey weights)\n# - pct_of_state: What % of that state's childless households are in this phase\n# - avg_federal_eitc: Average federal EITC for households receiving EITC\n# - avg_state_eitc: Average state EITC (for states with programs)\n#\n# The percentages should sum to 100% for each state since we include ALL\n# childless households (not just EITC recipients).\n# =============================================================================\n\ndef create_phase_status_summary(df, year_label):\n \"\"\"\n Create summary of EITC phase status by state with weighted counts and percentages.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data from run_all_states_analysis()\n year_label : str\n Label for display (e.g., \"2024\")\n \n Returns:\n --------\n pandas.DataFrame\n Summary with columns: state, eitc_phase_status, weighted_households,\n pct_of_state, avg_federal_eitc, avg_state_eitc\n \"\"\"\n print(f\"\\n{'='*70}\")\n print(f\"EITC Phase Status by State - {year_label}\")\n print(f\"{'='*70}\")\n \n # Step 1: Calculate weighted counts by state and phase status\n # tax_unit_weight is summed to get population-representative counts\n summary = df.groupby(['state', 'eitc_phase_status']).agg({\n 'tax_unit_weight': 'sum',\n }).reset_index()\n \n summary.columns = ['state', 'eitc_phase_status', 'weighted_households']\n \n # Step 2: Calculate state totals for percentage calculation\n state_totals = summary.groupby('state')['weighted_households'].sum().reset_index()\n state_totals.columns = ['state', 'state_total']\n \n # Step 3: Merge to compute percentages\n summary = summary.merge(state_totals, on='state')\n summary['pct_of_state'] = (summary['weighted_households'] / summary['state_total'] * 100).round(1)\n \n # Step 4: Add average EITC amounts (only computed for households receiving EITC)\n # This uses weighted averages: sum(value × weight) / sum(weight)\n avg_eitc = df[df['eitc'] > 0].groupby(['state', 'eitc_phase_status']).apply(\n lambda x: pd.Series({\n 'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n 'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n })\n ).reset_index()\n \n summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')\n summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)\n summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)\n \n # Step 5: Clean up columns and sort\n summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', \n 'avg_federal_eitc', 'avg_state_eitc']]\n \n # Sort by state alphabetically, then by phase status in logical order\n summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n return summary\n\n# Generate summaries for both years\nsummary_2024 = create_phase_status_summary(df_2024, \"2024\")\nsummary_2025 = create_phase_status_summary(df_2025, \"2025\")\n\n# Preview the results\nprint(\"\\n2024 Summary (first 20 rows):\")\nprint(summary_2024.head(20).to_string(index=False))\nprint(\"\\n2025 Summary (first 20 rows):\")\nprint(summary_2025.head(20).to_string(index=False))" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Distribution by Filing Status (Marital Status)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": "# =============================================================================\n# EXAMPLE HOUSEHOLDS BY PHASE STATUS\n# =============================================================================\n# Shows concrete examples of households in each phase status to help\n# understand what kinds of households fall into each category.\n#\n# This is useful for validation and for explaining the analysis to stakeholders.\n# =============================================================================\n\ndef show_example_households(df, year_label, n_examples=3):\n \"\"\"\n Show example households from each phase status with key characteristics.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n n_examples : int\n Number of examples per phase status (default 3)\n \n Returns:\n --------\n pandas.DataFrame\n Sample households with key characteristics\n \"\"\"\n print(f\"\\n{'='*70}\")\n print(f\"Example Households by Phase Status - {year_label}\")\n print(f\"{'='*70}\")\n \n examples = []\n \n # Only show examples for phases where households receive some EITC\n # (No income and Fully phased out receive $0, so less interesting as examples)\n for phase in ['Pre-phase-in', 'Full amount', 'Partially phased out']:\n phase_df = df[df['eitc_phase_status'] == phase]\n if len(phase_df) > 0:\n # Random sample with fixed seed for reproducibility\n sample = phase_df.sample(min(n_examples, len(phase_df)), random_state=42)\n for _, row in sample.iterrows():\n examples.append({\n 'phase_status': phase,\n 'state': row['state'],\n 'marital_status': row['filing_status_label'],\n 'age_head': int(row['age_head']),\n 'agi': row['adjusted_gross_income'],\n 'earned_income': row['tax_unit_earned_income'],\n 'federal_eitc': row['eitc'],\n 'state_eitc': row['state_eitc'],\n })\n \n examples_df = pd.DataFrame(examples)\n return examples_df\n\n# Show examples for 2024\nexamples_2024 = show_example_households(df_2024, \"2024\")\nprint(examples_2024.to_string(index=False))" - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Distribution by State" - ] - }, { "cell_type": "code", "execution_count": null, @@ -120,27 +88,6 @@ "outputs": [], "source": "# =============================================================================\n# SUMMARY BY STATE - TOP STATES BY POPULATION\n# =============================================================================\n# Shows the states with the largest childless tax unit populations,\n# along with total and average EITC amounts.\n#\n# Useful for understanding which states contribute most to the national totals.\n# =============================================================================\n\ndef summary_by_state(df, year_label, top_n=15):\n \"\"\"\n Create summary by state showing top N by number of childless tax units.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n top_n : int\n Number of top states to show (default 15)\n \n Returns:\n --------\n pandas.DataFrame\n State-level summary sorted by weighted tax unit count\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"Top {top_n} States by EITC Recipients - {year_label}\")\n print(f\"{'='*60}\")\n \n # Calculate state-level aggregates using weighted sums/averages\n summary = df.groupby('state').apply(\n lambda x: pd.Series({\n # Total weighted tax units in state\n 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n # Total federal EITC distributed (weight × eitc amount)\n 'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),\n # Total state EITC distributed\n 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n # Weighted average federal EITC per tax unit\n 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n # Weighted average state EITC per tax unit\n 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n # Boolean: does this state have a state EITC program?\n 'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,\n })\n ).reset_index()\n \n # Sort by number of tax units (largest states first)\n summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)\n \n return summary\n\n# Generate and display for both years\nstate_2024 = summary_by_state(df_2024, \"2024\")\nprint(state_2024.to_string(index=False))\n\nstate_2025 = summary_by_state(df_2025, \"2025\")\nprint(state_2025.to_string(index=False))" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Cross-tabulation: Phase Status by Filing Status" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": "# =============================================================================\n# CROSS-TABULATION: PHASE STATUS × FILING STATUS\n# =============================================================================\n# Creates a pivot table showing how phase status varies by filing status.\n#\n# Note: Due to data limitations in the state datasets, filing status may\n# show as \"Unknown\" for many records.\n# =============================================================================\n\ndef crosstab_phase_by_filing(df, year_label):\n \"\"\"\n Create cross-tabulation of phase status by filing status (marital status).\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n \n Returns:\n --------\n pandas.DataFrame\n Pivot table with phase status as rows, filing status as columns\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"Phase Status by Filing Status (Weighted Tax Units) - {year_label}\")\n print(f\"{'='*60}\")\n \n # Create pivot table: rows = phase status, columns = filing status\n # Values are weighted tax unit counts\n pivot = df.pivot_table(\n values='tax_unit_weight',\n index='eitc_phase_status',\n columns='filing_status_label',\n aggfunc='sum',\n fill_value=0\n )\n \n # Add row and column totals for context\n pivot['Total'] = pivot.sum(axis=1)\n pivot.loc['Total'] = pivot.sum()\n \n return pivot\n\n# Generate for both years\ncrosstab_2024 = crosstab_phase_by_filing(df_2024, \"2024\")\nprint(crosstab_2024.to_string())\n\ncrosstab_2025 = crosstab_phase_by_filing(df_2025, \"2025\")\nprint(crosstab_2025.to_string())" - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Age Distribution" - ] - }, { "cell_type": "code", "execution_count": null, @@ -148,20 +95,6 @@ "outputs": [], "source": "# =============================================================================\n# AGE DISTRIBUTION ANALYSIS\n# =============================================================================\n# Shows how childless tax units are distributed by age of the head of household.\n#\n# Key insight: The childless EITC has age restrictions (25-64 for 2024 under\n# current law), so we expect most EITC recipients to fall within that range.\n# =============================================================================\n\ndef age_distribution(df, year_label):\n \"\"\"\n Create age group distribution for heads of household.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n \n Returns:\n --------\n pandas.DataFrame\n Summary by age group with weighted counts and averages\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"Age Distribution of Head of Household - {year_label}\")\n print(f\"{'='*60}\")\n \n # Create age groups using pd.cut\n df_copy = df.copy()\n df_copy['age_group'] = pd.cut(\n df_copy['age_head'],\n bins=[0, 25, 35, 45, 55, 65, 100],\n labels=['Under 25', '25-34', '35-44', '45-54', '55-64', '65+']\n )\n \n # Calculate weighted statistics by age group\n summary = df_copy.groupby('age_group').apply(\n lambda x: pd.Series({\n 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n 'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n })\n ).reset_index()\n \n # Add percentage of total\n total_units = summary['Tax Units (Weighted)'].sum()\n summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)\n \n return summary\n\n# Generate for both years\nage_2024 = age_distribution(df_2024, \"2024\")\nprint(age_2024.to_string(index=False))\n\nage_2025 = age_distribution(df_2025, \"2025\")\nprint(age_2025.to_string(index=False))" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### States with State EITC Programs" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": "# =============================================================================\n# STATE EITC PROGRAM ANALYSIS\n# =============================================================================\n# Shows which states have state EITC programs and how generous they are.\n#\n# State EITCs are typically calculated as a percentage of the federal EITC,\n# ranging from ~3% (Montana) to ~125% (South Carolina).\n# =============================================================================\n\ndef state_eitc_summary(df, year_label):\n \"\"\"\n Summary of states with state EITC programs.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n \n Returns:\n --------\n pandas.DataFrame or None\n Summary for states with state EITC programs, sorted by total distributed\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"States with State EITC Benefits - {year_label}\")\n print(f\"{'='*60}\")\n \n # Filter to only households actually receiving state EITC\n df_with_state_eitc = df[df['state_eitc'] > 0]\n \n if len(df_with_state_eitc) == 0:\n print(\"No state EITC benefits found in the data.\")\n return None\n \n # Calculate state-level summaries\n summary = df_with_state_eitc.groupby('state').apply(\n lambda x: pd.Series({\n # Number of recipients (weighted)\n 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n # Total state EITC distributed\n 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n # Average per recipient\n 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n # State EITC as percentage of federal (indicates program generosity)\n 'State EITC as % of Fed': ((x['state_eitc'] * x['tax_unit_weight']).sum() / \n (x['eitc'] * x['tax_unit_weight']).sum() * 100) if (x['eitc'] * x['tax_unit_weight']).sum() > 0 else 0,\n })\n ).reset_index()\n \n # Sort by total distributed (largest programs first)\n summary = summary.sort_values('Total State EITC', ascending=False)\n \n return summary\n\n# Generate for both years\nstate_eitc_2024 = state_eitc_summary(df_2024, \"2024\")\nif state_eitc_2024 is not None:\n print(state_eitc_2024.to_string(index=False))\n\nstate_eitc_2025 = state_eitc_summary(df_2025, \"2025\")\nif state_eitc_2025 is not None:\n print(state_eitc_2025.to_string(index=False))" - }, { "cell_type": "markdown", "metadata": {}, @@ -174,7 +107,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "# =============================================================================\n# EXPORT DETAILED HOUSEHOLD DATA\n# =============================================================================\n# Exports the full household-level dataset with all calculated variables.\n#\n# WARNING: These files are large (~125MB each) and are excluded from git\n# via .gitignore. They are generated locally when the notebook runs.\n#\n# Use cases:\n# - Detailed analysis in external tools (Excel, Stata, R)\n# - Validation of the summary statistics\n# - Custom filtering/aggregation not provided in this notebook\n# =============================================================================\n\ndef export_household_data(df, year):\n \"\"\"\n Export household-level data to CSV, sorted by state and phase status.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data from run_all_states_analysis()\n year : int\n Tax year (used in filename)\n \n Returns:\n --------\n pandas.DataFrame\n The exported data (same as written to file)\n \n Output File:\n eitc_childless_families_{year}.csv\n \"\"\"\n \n # Select columns for export (excluding eitc_maximum per user request)\n export_columns = [\n 'state', # State abbreviation\n 'eitc_phase_status', # Classification result\n 'tax_unit_id', # Unique identifier\n 'tax_unit_weight', # Survey weight\n 'eitc', # Federal EITC amount\n 'state_eitc', # State EITC amount\n 'eitc_phased_in', # Phase-in calculation\n 'eitc_reduction', # Phase-out reduction\n 'tax_unit_earned_income', # Total earned income\n 'adjusted_gross_income', # AGI\n 'filing_status_label', # Marital/filing status\n 'age_head', # Age of primary filer\n 'age_spouse', # Age of spouse (0 if none)\n ]\n \n # Only include columns that exist in the DataFrame\n available_columns = [col for col in export_columns if col in df.columns]\n df_export = df[available_columns].copy()\n \n # Rename columns for clarity in external tools\n df_export = df_export.rename(columns={\n 'eitc': 'federal_eitc',\n 'filing_status_label': 'marital_status',\n })\n \n # Sort by state (alphabetically) then by phase status (in logical EITC order)\n df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n # Write to CSV\n filename = f'eitc_childless_families_{year}.csv'\n df_export.to_csv(filename, index=False)\n print(f\"Exported {len(df_export):,} rows to: {filename}\")\n \n return df_export\n\n# Export both years to separate files\ndf_export_2024 = export_household_data(df_2024, 2024)\ndf_export_2025 = export_household_data(df_2025, 2025)" + "source": "# =============================================================================\n# EXPORT DETAILED HOUSEHOLD DATA\n# =============================================================================\n# Exports the full household-level dataset with all calculated variables.\n#\n# WARNING: These files are large (~125MB each) and are excluded from git\n# via .gitignore. They are generated locally when the notebook runs.\n#\n# Use cases:\n# - Detailed analysis in external tools (Excel, Stata, R)\n# - Validation of the summary statistics\n# - Custom filtering/aggregation not provided in this notebook\n# =============================================================================\n\ndef export_household_data(df, year):\n \"\"\"\n Export household-level data to CSV, sorted by state and phase status.\n \"\"\"\n \n # Select columns for export (only columns we're loading)\n export_columns = [\n 'state', # State abbreviation\n 'eitc_phase_status', # Classification result\n 'tax_unit_id', # Unique identifier\n 'tax_unit_weight', # Survey weight\n 'eitc_eligible', # Eligibility status\n 'eitc', # Federal EITC amount\n 'state_eitc', # State EITC amount\n 'eitc_phased_in', # Phase-in calculation\n 'eitc_reduction', # Phase-out reduction\n 'tax_unit_earned_income', # Total earned income\n 'age_head', # Age of primary filer\n ]\n \n # Only include columns that exist in the DataFrame\n available_columns = [col for col in export_columns if col in df.columns]\n df_export = df[available_columns].copy()\n \n # Rename columns for clarity in external tools\n df_export = df_export.rename(columns={\n 'eitc': 'federal_eitc',\n })\n \n # Sort by state (alphabetically) then by phase status (in logical EITC order)\n df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n # Write to CSV\n filename = f'eitc_childless_families_{year}.csv'\n df_export.to_csv(filename, index=False)\n print(f\"Exported {len(df_export):,} rows to: {filename}\")\n \n return df_export\n\n# Export both years to separate files\ndf_export_2024 = export_household_data(df_2024, 2024)\ndf_export_2025 = export_household_data(df_2025, 2025)" }, { "cell_type": "code", @@ -491,30 +424,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "## Notes\n", - "\n", - "### Data Interpretation\n", - "- **Tax unit weights** represent the number of actual tax units each record represents in the population\n", - "- All monetary values are weighted averages/totals reflecting the full population\n", - "- The enhanced CPS dataset has ~42,000 household records that are weighted to represent the US population\n", - "\n", - "### EITC Phase Status Definitions\n", - "1. **Pre-phase-in**: Earned income is below the level needed to receive the maximum credit. The credit amount equals (earned income × phase-in rate).\n", - "2. **Full amount**: Earned income is sufficient to receive the maximum credit, and income is below the phase-out threshold.\n", - "3. **Partially phased out**: Income is above the phase-out threshold, resulting in a reduced credit.\n", - "4. **Fully phased out**: Income is too high; credit is reduced to $0.\n", - "\n", - "### State EITC Programs\n", - "Not all states have state EITC programs. States with programs typically calculate their EITC as a percentage of the federal EITC amount.\n", - "\n", - "### Childless Worker EITC\n", - "The federal EITC for childless workers is significantly smaller than for workers with children. Key parameters (2024):\n", - "- Maximum credit: ~$632\n", - "- Phase-in rate: 7.65%\n", - "- Phase-out starts at: ~$9,800 (single), ~$16,400 (married)\n", - "- Phase-out rate: 7.65%" - ] + "source": "## Notes\n\n### Data Interpretation\n- **Tax unit weights** represent the number of actual tax units each record represents in the population\n- All monetary values are weighted averages/totals reflecting the full population\n- State datasets contain representative microdata for each state\n\n### EITC Phase Status Definitions\n1. **Ineligible**: Does not meet EITC eligibility requirements (age 25-64, valid SSN, investment income limits, or filing status)\n2. **No earned income**: Eligible for EITC but has zero earned income (cannot receive credit without earnings)\n3. **Pre-phase-in**: Earned income is below the level needed to receive the maximum credit. Credit = (earned income × 7.65%)\n4. **Full amount**: At the plateau - receiving maximum credit (~$632 for childless in 2024)\n5. **Partially phased out**: Income is above the phase-out threshold, receiving reduced credit\n6. **Fully phased out**: Income is too high; credit is reduced to $0\n\n### Childless Worker EITC Parameters (2024)\n- Maximum credit: ~$632\n- Phase-in rate: 7.65%\n- Phase-out starts at: ~$9,800 (single), ~$16,400 (married)\n- Phase-out rate: 7.65%\n- Age requirements: 25-64 years old (or 19+ if former foster youth/homeless)\n\n### State EITC Programs\nSee the State EITC Programs section at the beginning of this notebook for detailed information on each state's program, including states with unique structures (CA, MN, WA, VA, DE, MD)." } ], "metadata": { From 0989e36ccc824241ce7407496a995ef47123f5e2 Mon Sep 17 00:00:00 2001 From: David Trimmer Date: Wed, 17 Dec 2025 16:13:49 -0500 Subject: [PATCH 4/4] overhaul --- .../eitc_childless_analysis.ipynb | 2885 ++++++++++++++++- ...tc_childless_phase_status_summary_2024.csv | 255 +- ...tc_childless_phase_status_summary_2025.csv | 255 +- .../Congressional-Hackathon-2025 | 1 + 4 files changed, 3035 insertions(+), 361 deletions(-) create mode 160000 obbba_district_impacts/Congressional-Hackathon-2025 diff --git a/eitc_childless_analysis/eitc_childless_analysis.ipynb b/eitc_childless_analysis/eitc_childless_analysis.ipynb index 4861b17..e5a4890 100644 --- a/eitc_childless_analysis/eitc_childless_analysis.ipynb +++ b/eitc_childless_analysis/eitc_childless_analysis.ipynb @@ -3,97 +3,2463 @@ { "cell_type": "markdown", "metadata": {}, - "source": "# EITC Analysis: Childless Filers by Phase Status\n\n## Overview\nThis notebook analyzes **childless tax units** (those with no EITC-qualifying children) across all 50 US states + DC, categorizing them by where they fall on the EITC schedule.\n\n## What This Notebook Does\n1. **Loads state-specific microdata** from PolicyEngine's HuggingFace repository\n2. **Filters to childless filers** (eitc_child_count == 0)\n3. **Checks EITC eligibility** (age requirements, SSN, investment income limits)\n4. **Categorizes each household** into one of 6 phase statuses\n5. **Calculates weighted counts and percentages** by state\n6. **Exports summary data** to CSV files\n\n## EITC Phase Status Categories\n| Status | Description |\n|--------|-------------|\n| **Ineligible** | Does not meet EITC eligibility requirements (age, SSN, investment income, or filing status) |\n| **No earned income** | No earned income, therefore no EITC |\n| **Pre-phase-in** | Has earned income but hasn't reached maximum credit yet |\n| **Full amount** | At the plateau - receiving maximum credit |\n| **Partially phased out** | In phase-out range, receiving reduced credit |\n| **Fully phased out** | Income too high, EITC reduced to $0 |\n\n## Data Source\n- **State datasets**: `hf://policyengine/policyengine-us-data/states/{STATE}.h5`\n- Each state has its own dataset with representative household microdata\n- Data is weighted to represent the actual population\n\n## Output Files\n- `eitc_childless_phase_status_summary_{year}.csv` - Aggregated by state and phase status\n\n## Years Analyzed\n- 2024 and 2025" + "source": [ + "# EITC Analysis: Childless Filers by Phase Status\n", + "\n", + "## Overview\n", + "This notebook analyzes **childless tax units** (those with no EITC-qualifying children) across all 50 US states + DC, categorizing them by where they fall on the EITC schedule.\n", + "\n", + "## What This Notebook Does\n", + "1. **Loads state-specific microdata** from PolicyEngine's HuggingFace repository\n", + "2. **Filters to childless filers** (eitc_child_count == 0)\n", + "3. **Checks EITC eligibility** (age requirements, SSN, investment income limits)\n", + "4. **Categorizes each household** into one of 6 phase statuses\n", + "5. **Calculates weighted counts and percentages** by state\n", + "6. **Exports summary data** to CSV files\n", + "\n", + "## EITC Phase Status Categories\n", + "| Status | Description |\n", + "|--------|-------------|\n", + "| **Ineligible** | Does not meet EITC eligibility requirements (age, SSN, investment income, or filing status) |\n", + "| **No earned income** | No earned income, therefore no EITC |\n", + "| **Pre-phase-in** | Has earned income but hasn't reached maximum credit yet |\n", + "| **Full amount** | At the plateau - receiving maximum credit |\n", + "| **Partially phased out** | In phase-out range, receiving reduced credit |\n", + "| **Fully phased out** | Income too high, EITC reduced to $0 |\n", + "\n", + "## Data Source\n", + "- **State datasets**: `hf://policyengine/policyengine-us-data/states/{STATE}.h5`\n", + "- Each state has its own dataset with representative household microdata\n", + "- Data is weighted to represent the actual population\n", + "\n", + "## Output Files\n", + "- `eitc_childless_phase_status_summary_{year}.csv` - Aggregated by state and phase status\n", + "\n", + "## Years Analyzed\n", + "- 2024 and 2025" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "## State EITC Programs\n\nAs of 2024, **31 states plus DC** have state-level Earned Income Tax Credit programs. Most states calculate their EITC as a simple percentage match of the federal EITC, but several have unique structures.\n\n### States with Standard Federal Match Structure\nThese states calculate their state EITC as a percentage of the federal EITC amount:\n\n| State | Match % | Refundable | Notes |\n|-------|---------|------------|-------|\n| CO | 50% (2024) | Yes | Phasing down to 10% by 2034 |\n| CT | ~30% | Yes | |\n| DC | 70% | Yes | Higher match for childless workers |\n| DE | 4.5% ref / 20% non-ref | Choice | Taxpayers choose refundable OR non-refundable |\n| HI | 40% | Yes | |\n| IL | 20% | Yes | |\n| IN | 10% | Yes | |\n| IA | 15% | Yes | |\n| KS | 17% | Yes | |\n| LA | 5% | Yes | |\n| ME | 50% | Yes | |\n| MA | 40% | Yes | |\n| MI | 30% | Yes | |\n| MO | 20% | Yes | Called \"Working Families Tax Credit\" |\n| MT | 10% | Yes | |\n| NE | 10% | Yes | |\n| NJ | Variable | Yes | Varies by income |\n| NM | ~25% | Yes | |\n| NY | 30% | Yes | Plus supplemental credit |\n| OH | 30% | Yes | |\n| OK | 5% | Yes | Lowest in nation |\n| OR | 9-12% | Yes | Varies by children |\n| PA | ~10% | Yes | |\n| RI | 16% | Yes | |\n| SC | 125% | Yes | Highest in nation |\n| VT | ~38% | Yes | Increased to 100% for childless in 2025 |\n| WI | Variable | Yes | Varies by children |\n\n### States with UNIQUE/NON-STANDARD Structures\n\n#### California (CA) - CalEITC\nCalifornia does NOT simply match the federal EITC. Instead:\n- Uses an **85% adjustment factor** applied to a state-specific calculation\n- Has **different phase-in rates by number of children**:\n - 0 children: 7.65%\n - 1 child: 34%\n - 2 children: 40%\n - 3+ children: 45%\n- Has a **two-stage phase-out** structure\n- Maximum credit is lower than federal EITC\n- **Fully refundable**\n\n#### Minnesota (MN) - Working Family Credit / Child & Working Families Credit\nMinnesota **replaced** its traditional Working Family Credit in 2023 with the **Child and Working Families Credit (CWFC)**:\n- **Two-part credit structure**:\n 1. Child Tax Credit component: Fixed amount per qualifying child\n 2. Working Family Credit component: Phase-in based on earnings\n- Combined amounts phase out together based on AGI or earnings\n- **Completely independent calculation** from federal EITC\n- **Fully refundable**\n\n#### Washington (WA) - Working Families Tax Credit (WFTC)\nWashington has **no income tax** and therefore no traditional EITC. Instead:\n- Provides a **flat dollar amount** based on number of children:\n - 0 children: $300-$325\n - 1 child: $600-$640\n - 2 children: $900-$965\n - 3+ children: $1,200-$1,290\n- Phases out starting **$2,500-$5,000 below** federal EITC AGI limits\n- Requires claiming federal EITC to qualify\n- **Fully refundable**\n\n#### Virginia (VA) - Split Refundable/Non-Refundable + Low-Income Tax Credit\nVirginia has the most complex structure:\n- **Non-refundable match**: 20% of federal EITC (since 2006)\n- **Refundable match**: Variable (0% → 15% → 20% → 15% over different years)\n- **Alternative Low-Income Tax Credit (LITC)**: $300 per personal exemption\n- Taxpayers receive the **better of** EITC match or LITC\n- Separate filers receive prorated credits\n\n#### Delaware (DE) - Choice Between Refundable and Non-Refundable\nDelaware requires taxpayers to **choose one**:\n- **Refundable option**: 4.5% of federal EITC\n- **Non-refundable option**: 20% of federal EITC\n- Cannot claim both\n\n#### Maryland (MD) - Differentiated by Family Status\nMaryland varies match percentages by family composition:\n- **Married OR has children**: \n - Non-refundable: 50%\n - Refundable: 25-45%\n- **Childless unmarried filers**: Different (lower) percentages\n- Has separate parameters for different filing situations\n\n### States WITHOUT State EITC Programs\nThe following states have **no state EITC**: AL, AK, AZ, AR, FL, GA, ID, KY, MS, NV, NH, NC, ND, SD, TN, TX, UT, WV, WY" + "source": [ + "## State EITC Programs\n", + "\n", + "As of 2024, **31 states plus DC** have state-level Earned Income Tax Credit programs. Most states calculate their EITC as a simple percentage match of the federal EITC, but several have unique structures.\n", + "\n", + "### States with Standard Federal Match Structure\n", + "These states calculate their state EITC as a percentage of the federal EITC amount:\n", + "\n", + "| State | Match % | Refundable | Notes |\n", + "|-------|---------|------------|-------|\n", + "| CO | 50% (2024) | Yes | Phasing down to 10% by 2034 |\n", + "| CT | ~30% | Yes | |\n", + "| DC | 70% | Yes | Higher match for childless workers |\n", + "| DE | 4.5% ref / 20% non-ref | Choice | Taxpayers choose refundable OR non-refundable |\n", + "| HI | 40% | Yes | |\n", + "| IL | 20% | Yes | |\n", + "| IN | 10% | Yes | |\n", + "| IA | 15% | Yes | |\n", + "| KS | 17% | Yes | |\n", + "| LA | 5% | Yes | |\n", + "| ME | 50% | Yes | |\n", + "| MA | 40% | Yes | |\n", + "| MI | 30% | Yes | |\n", + "| MO | 20% | Yes | Called \"Working Families Tax Credit\" |\n", + "| MT | 10% | Yes | |\n", + "| NE | 10% | Yes | |\n", + "| NJ | Variable | Yes | Varies by income |\n", + "| NM | ~25% | Yes | |\n", + "| NY | 30% | Yes | Plus supplemental credit |\n", + "| OH | 30% | Yes | |\n", + "| OK | 5% | Yes | Lowest in nation |\n", + "| OR | 9-12% | Yes | Varies by children |\n", + "| PA | ~10% | Yes | |\n", + "| RI | 16% | Yes | |\n", + "| SC | 125% | Yes | Highest in nation |\n", + "| VT | ~38% | Yes | Increased to 100% for childless in 2025 |\n", + "| WI | Variable | Yes | Varies by children |\n", + "\n", + "### States with UNIQUE/NON-STANDARD Structures\n", + "\n", + "#### California (CA) - CalEITC\n", + "California does NOT simply match the federal EITC. Instead:\n", + "- Uses an **85% adjustment factor** applied to a state-specific calculation\n", + "- Has **different phase-in rates by number of children**:\n", + " - 0 children: 7.65%\n", + " - 1 child: 34%\n", + " - 2 children: 40%\n", + " - 3+ children: 45%\n", + "- Has a **two-stage phase-out** structure\n", + "- Maximum credit is lower than federal EITC\n", + "- **Fully refundable**\n", + "\n", + "#### Minnesota (MN) - Working Family Credit / Child & Working Families Credit\n", + "Minnesota **replaced** its traditional Working Family Credit in 2023 with the **Child and Working Families Credit (CWFC)**:\n", + "- **Two-part credit structure**:\n", + " 1. Child Tax Credit component: Fixed amount per qualifying child\n", + " 2. Working Family Credit component: Phase-in based on earnings\n", + "- Combined amounts phase out together based on AGI or earnings\n", + "- **Completely independent calculation** from federal EITC\n", + "- **Fully refundable**\n", + "\n", + "#### Washington (WA) - Working Families Tax Credit (WFTC)\n", + "Washington has **no income tax** and therefore no traditional EITC. Instead:\n", + "- Provides a **flat dollar amount** based on number of children:\n", + " - 0 children: $300-$325\n", + " - 1 child: $600-$640\n", + " - 2 children: $900-$965\n", + " - 3+ children: $1,200-$1,290\n", + "- Phases out starting **$2,500-$5,000 below** federal EITC AGI limits\n", + "- Requires claiming federal EITC to qualify\n", + "- **Fully refundable**\n", + "\n", + "#### Virginia (VA) - Split Refundable/Non-Refundable + Low-Income Tax Credit\n", + "Virginia has the most complex structure:\n", + "- **Non-refundable match**: 20% of federal EITC (since 2006)\n", + "- **Refundable match**: Variable (0% → 15% → 20% → 15% over different years)\n", + "- **Alternative Low-Income Tax Credit (LITC)**: $300 per personal exemption\n", + "- Taxpayers receive the **better of** EITC match or LITC\n", + "- Separate filers receive prorated credits\n", + "\n", + "#### Delaware (DE) - Choice Between Refundable and Non-Refundable\n", + "Delaware requires taxpayers to **choose one**:\n", + "- **Refundable option**: 4.5% of federal EITC\n", + "- **Non-refundable option**: 20% of federal EITC\n", + "- Cannot claim both\n", + "\n", + "#### Maryland (MD) - Differentiated by Family Status\n", + "Maryland varies match percentages by family composition:\n", + "- **Married OR has children**: \n", + " - Non-refundable: 50%\n", + " - Refundable: 25-45%\n", + "- **Childless unmarried filers**: Different (lower) percentages\n", + "- Has separate parameters for different filing situations\n", + "\n", + "### States WITHOUT State EITC Programs\n", + "The following states have **no state EITC**: AL, AK, AZ, AR, FL, GA, ID, KY, MS, NV, NH, NC, ND, SD, TN, TX, UT, WV, WY" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, "outputs": [], - "source": "# =============================================================================\n# IMPORTS AND CONFIGURATION\n# =============================================================================\n# \n# policyengine_us: PolicyEngine's US tax-benefit microsimulation model\n# - Microsimulation: Class for running simulations on survey microdata\n# - Loads datasets, calculates tax/benefit variables for each household\n#\n# pandas/numpy: Standard data manipulation libraries\n# =============================================================================\n\nfrom policyengine_us import Microsimulation\nimport pandas as pd\nimport numpy as np\n\n# Configure pandas display options for better output formatting\npd.set_option('display.max_columns', None) # Show all columns\npd.set_option('display.width', None) # Don't wrap output\npd.set_option('display.float_format', lambda x: f'{x:,.2f}') # Format numbers with commas" + "source": [ + "# =============================================================================\n", + "# IMPORTS AND CONFIGURATION\n", + "# =============================================================================\n", + "# \n", + "# policyengine_us: PolicyEngine's US tax-benefit microsimulation model\n", + "# - Microsimulation: Class for running simulations on survey microdata\n", + "# - Loads datasets, calculates tax/benefit variables for each household\n", + "#\n", + "# pandas/numpy: Standard data manipulation libraries\n", + "# =============================================================================\n", + "\n", + "from policyengine_us import Microsimulation\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# Configure pandas display options for better output formatting\n", + "pd.set_option('display.max_columns', None) # Show all columns\n", + "pd.set_option('display.width', None) # Don't wrap output\n", + "pd.set_option('display.float_format', lambda x: f'{x:,.2f}') # Format numbers with commas" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "## EITC Phase Status Classification\n\nThe Earned Income Tax Credit (EITC) follows a trapezoidal schedule:\n\n```\nCredit\nAmount\n ^\n | ___________\n | / \\\n | / \\\n | / \\\n | / \\\n | / \\\n |/_____________________\\____> Earned Income\n Phase-in Plateau Phase-out\n```\n\n### EITC Eligibility Requirements (Childless Filers)\nBefore a childless filer can receive EITC, they must meet:\n1. **Age requirement**: Between 25 and 64 years old (or 19+ if former foster youth/homeless)\n2. **SSN requirement**: Valid Social Security Number for work\n3. **Investment income limit**: Investment income must be below threshold (~$11,000 in 2024)\n4. **Filing status**: Cannot file as \"Married Filing Separately\" (in most cases)\n\n### How We Classify Households\n\nWe use PolicyEngine's calculated variables:\n\n| Variable | Description |\n|----------|-------------|\n| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |\n| `eitc` | Final EITC amount received (after all calculations) |\n| `eitc_maximum` | Maximum possible EITC for this filing status |\n| `eitc_phased_in` | Amount \"earned\" based on phase-in rate × earned income |\n| `eitc_reduction` | Amount reduced due to being in phase-out range |\n| `tax_unit_earned_income` | Total earned income for the tax unit |\n\n### Classification Logic (in priority order)\n1. **Ineligible**: `eitc_eligible == False` (fails age, SSN, investment income, or filing status)\n2. **No earned income**: `tax_unit_earned_income == 0` (eligible but no earnings)\n3. **Pre-phase-in**: Receiving EITC but `eitc_phased_in < eitc_maximum`\n4. **Full amount**: `eitc_phased_in >= eitc_maximum` AND `eitc_reduction == 0`\n5. **Partially phased out**: Receiving EITC AND `eitc_reduction > 0`\n6. **Fully phased out**: `eitc == 0` AND has income (phased out completely)" + "source": [ + "## EITC Phase Status Classification\n", + "\n", + "The Earned Income Tax Credit (EITC) follows a trapezoidal schedule:\n", + "\n", + "```\n", + "Credit\n", + "Amount\n", + " ^\n", + " | ___________\n", + " | / \\\n", + " | / \\\n", + " | / \\\n", + " | / \\\n", + " | / \\\n", + " |/_____________________\\____> Earned Income\n", + " Phase-in Plateau Phase-out\n", + "```\n", + "\n", + "### EITC Eligibility Requirements (Childless Filers)\n", + "Before a childless filer can receive EITC, they must meet:\n", + "1. **Age requirement**: Between 25 and 64 years old (or 19+ if former foster youth/homeless)\n", + "2. **SSN requirement**: Valid Social Security Number for work\n", + "3. **Investment income limit**: Investment income must be below threshold (~$11,000 in 2024)\n", + "4. **Filing status**: Cannot file as \"Married Filing Separately\" (in most cases)\n", + "\n", + "### How We Classify Households\n", + "\n", + "We use PolicyEngine's calculated variables:\n", + "\n", + "| Variable | Description |\n", + "|----------|-------------|\n", + "| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |\n", + "| `eitc` | Final EITC amount received (after all calculations) |\n", + "| `eitc_maximum` | Maximum possible EITC for this filing status |\n", + "| `eitc_phased_in` | Amount \"earned\" based on phase-in rate × earned income |\n", + "| `eitc_reduction` | Amount reduced due to being in phase-out range |\n", + "| `tax_unit_earned_income` | Total earned income for the tax unit |\n", + "\n", + "### Classification Logic (in priority order)\n", + "1. **Ineligible**: `eitc_eligible == False` (fails age, SSN, investment income, or filing status)\n", + "2. **No earned income**: `tax_unit_earned_income == 0` (eligible but no earnings)\n", + "3. **Pre-phase-in**: Receiving EITC but `eitc_phased_in < eitc_maximum`\n", + "4. **Full amount**: `eitc_phased_in >= eitc_maximum` AND `eitc_reduction == 0`\n", + "5. **Partially phased out**: Receiving EITC AND `eitc_reduction > 0`\n", + "6. **Fully phased out**: `eitc == 0` AND has income (phased out completely)" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], - "source": "# =============================================================================\n# EITC PHASE STATUS CLASSIFICATION FUNCTION\n# =============================================================================\n# This function takes a DataFrame of households and classifies each one into\n# one of 6 EITC phase statuses based on eligibility, income, and EITC calculations.\n#\n# Uses numpy's np.select() for efficient vectorized conditional logic.\n# =============================================================================\n\ndef determine_eitc_phase_status_vectorized(df):\n \"\"\"\n Classify each household into an EITC phase status category.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Must contain columns: eitc_eligible, tax_unit_earned_income, eitc, \n eitc_reduction, eitc_phased_in, eitc_maximum\n \n Returns:\n --------\n numpy.ndarray\n Array of status strings, one per row in df\n \n Categories (in priority order):\n -------------------------------\n 1. Ineligible: Does not meet EITC eligibility (age, SSN, investment income)\n 2. No earned income: Eligible but has zero earned income\n 3. Pre-phase-in: Receiving EITC, still building up to maximum\n 4. Full amount: At maximum credit (plateau region)\n 5. Partially phased out: In phase-out region, still receiving some credit\n 6. Fully phased out: Income too high, EITC reduced to $0\n \"\"\"\n \n # Define conditions in PRIORITY ORDER (first match wins)\n conditions = [\n # CONDITION 1: Ineligible for EITC\n # Fails age requirement (25-64), SSN, investment income limit, or filing status\n df['eitc_eligible'] == False,\n \n # CONDITION 2: No earned income\n # Eligible for EITC but has zero earned income (cannot receive credit)\n (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] == 0),\n \n # CONDITION 3: Pre-phase-in\n # Receiving EITC, but haven't earned enough to hit maximum yet\n (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),\n \n # CONDITION 4: Full amount (plateau)\n # Receiving EITC at maximum, no reduction applied\n (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0),\n \n # CONDITION 5: Partially phased out\n # Receiving EITC, but some reduction has been applied\n (df['eitc'] > 0) & (df['eitc_reduction'] > 0),\n \n # CONDITION 6: Fully phased out\n # Eligible, has income, but EITC reduced to zero\n (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] > 0) & (df['eitc'] <= 0),\n ]\n \n # Labels corresponding to each condition above\n choices = [\n 'Ineligible',\n 'No earned income',\n 'Pre-phase-in',\n 'Full amount',\n 'Partially phased out',\n 'Fully phased out'\n ]\n \n # np.select applies conditions in order, returns first matching choice\n # Default catches any edge cases\n return np.select(conditions, choices, default='Ineligible')" + "source": [ + "# =============================================================================\n", + "# EITC PHASE STATUS CLASSIFICATION FUNCTION\n", + "# =============================================================================\n", + "# This function takes a DataFrame of households and classifies each one into\n", + "# one of 6 EITC phase statuses based on eligibility, income, and EITC calculations.\n", + "#\n", + "# Uses numpy's np.select() for efficient vectorized conditional logic.\n", + "# =============================================================================\n", + "\n", + "def determine_eitc_phase_status_vectorized(df):\n", + " \"\"\"\n", + " Classify each household into an EITC phase status category.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Must contain columns: eitc_eligible, tax_unit_earned_income, eitc, \n", + " eitc_reduction, eitc_phased_in, eitc_maximum\n", + " \n", + " Returns:\n", + " --------\n", + " numpy.ndarray\n", + " Array of status strings, one per row in df\n", + " \n", + " Categories (in priority order):\n", + " -------------------------------\n", + " 1. Ineligible: Does not meet EITC eligibility (age, SSN, investment income)\n", + " 2. No earned income: Eligible but has zero earned income\n", + " 3. Pre-phase-in: Receiving EITC, still building up to maximum\n", + " 4. Full amount: At maximum credit (plateau region)\n", + " 5. Partially phased out: In phase-out region, still receiving some credit\n", + " 6. Fully phased out: Income too high, EITC reduced to $0\n", + " \"\"\"\n", + " \n", + " # Define conditions in PRIORITY ORDER (first match wins)\n", + " conditions = [\n", + " # CONDITION 1: Ineligible for EITC\n", + " # Fails age requirement (25-64), SSN, investment income limit, or filing status\n", + " df['eitc_eligible'] == False,\n", + " \n", + " # CONDITION 2: No earned income\n", + " # Eligible for EITC but has zero earned income (cannot receive credit)\n", + " (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] == 0),\n", + " \n", + " # CONDITION 3: Pre-phase-in\n", + " # Receiving EITC, but haven't earned enough to hit maximum yet\n", + " (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),\n", + " \n", + " # CONDITION 4: Full amount (plateau)\n", + " # Receiving EITC at maximum, no reduction applied\n", + " (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0),\n", + " \n", + " # CONDITION 5: Partially phased out\n", + " # Receiving EITC, but some reduction has been applied\n", + " (df['eitc'] > 0) & (df['eitc_reduction'] > 0),\n", + " \n", + " # CONDITION 6: Fully phased out\n", + " # Eligible, has income, but EITC reduced to zero\n", + " (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] > 0) & (df['eitc'] <= 0),\n", + " ]\n", + " \n", + " # Labels corresponding to each condition above\n", + " choices = [\n", + " 'Ineligible',\n", + " 'No earned income',\n", + " 'Pre-phase-in',\n", + " 'Full amount',\n", + " 'Partially phased out',\n", + " 'Fully phased out'\n", + " ]\n", + " \n", + " # np.select applies conditions in order, returns first matching choice\n", + " # Default catches any edge cases\n", + " return np.select(conditions, choices, default='Ineligible')" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "## Data Loading Functions\n\n### `run_state_eitc_analysis(state_abbr, year)`\nLoads and processes data for a single state:\n1. Loads the state's microdata from HuggingFace\n2. Calculates all relevant EITC and household variables\n3. Filters to childless filers only (`eitc_child_count == 0`)\n4. Classifies each household by EITC phase status\n5. Returns a DataFrame with one row per household\n\n### `run_all_states_analysis(year)`\nOrchestrates the full analysis:\n1. Loops through all 51 states/DC\n2. Calls `run_state_eitc_analysis()` for each\n3. Combines all results into a single DataFrame\n\n### Variables Calculated\n| Variable | Description |\n|----------|-------------|\n| `tax_unit_weight` | Survey weight (how many real households this record represents) |\n| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |\n| `eitc` | Federal EITC amount received |\n| `state_eitc` | State EITC amount (if state has a program) |\n| `eitc_child_count` | Number of EITC-qualifying children (we filter to 0) |\n| `tax_unit_earned_income` | Total earned income for the tax unit |\n| `age_head` | Age of primary filer |" + "source": [ + "## Data Loading Functions\n", + "\n", + "### `run_state_eitc_analysis(state_abbr, year)`\n", + "Loads and processes data for a single state:\n", + "1. Loads the state's microdata from HuggingFace\n", + "2. Calculates all relevant EITC and household variables\n", + "3. Filters to childless filers only (`eitc_child_count == 0`)\n", + "4. Classifies each household by EITC phase status\n", + "5. Returns a DataFrame with one row per household\n", + "\n", + "### `run_all_states_analysis(year)`\n", + "Orchestrates the full analysis:\n", + "1. Loops through all 51 states/DC\n", + "2. Calls `run_state_eitc_analysis()` for each\n", + "3. Combines all results into a single DataFrame\n", + "\n", + "### Variables Calculated\n", + "| Variable | Description |\n", + "|----------|-------------|\n", + "| `tax_unit_weight` | Survey weight (how many real households this record represents) |\n", + "| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |\n", + "| `eitc` | Federal EITC amount received |\n", + "| `state_eitc` | State EITC amount (if state has a program) |\n", + "| `eitc_child_count` | Number of EITC-qualifying children (we filter to 0) |\n", + "| `tax_unit_earned_income` | Total earned income for the tax unit |\n", + "| `age_head` | Age of primary filer |" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, "outputs": [], - "source": "# =============================================================================\n# STATE LIST AND DATA LOADING FUNCTIONS\n# =============================================================================\n\n# All US states + DC (51 total)\nALL_STATES = [\n 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', \n 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', \n 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', \n 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', \n 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'\n]\n\n# Order for sorting phase statuses (follows logical EITC flow)\nPHASE_ORDER = [\n 'Ineligible', # Cannot receive EITC (age/SSN/investment income)\n 'No earned income', # Eligible but no earnings\n 'Pre-phase-in', # Building up to maximum\n 'Full amount', # At maximum (plateau)\n 'Partially phased out', # Being reduced\n 'Fully phased out' # Reduced to $0\n]\n\n\ndef run_state_eitc_analysis(state_abbr, year):\n \"\"\"\n Load and analyze EITC data for a single state.\n \n Parameters:\n -----------\n state_abbr : str\n Two-letter state abbreviation (e.g., 'CA', 'NY', 'TX')\n year : int\n Tax year to analyze (e.g., 2024, 2025)\n \n Returns:\n --------\n pandas.DataFrame or None\n DataFrame with one row per childless tax unit, or None if error\n \"\"\"\n try:\n # Load the state's microdata from HuggingFace\n dataset_path = f\"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5\"\n sim = Microsimulation(dataset=dataset_path)\n \n # Variables to calculate\n tax_unit_vars = [\n 'tax_unit_id', # Unique identifier\n 'tax_unit_weight', # Survey weight\n 'eitc_eligible', # NEW: Whether eligible for EITC\n 'eitc', # Federal EITC amount\n 'eitc_maximum', # Max possible EITC\n 'eitc_phased_in', # Phase-in amount\n 'eitc_reduction', # Phase-out reduction\n 'eitc_child_count', # Number of EITC-qualifying children\n 'state_eitc', # State EITC amount\n 'tax_unit_earned_income', # Total earned income\n 'age_head', # Age of primary filer\n ]\n \n # Calculate each variable\n data = {}\n for var in tax_unit_vars:\n result = sim.calculate(var, period=year)\n data[var] = result.values if hasattr(result, 'values') else np.array(result)\n \n df = pd.DataFrame(data)\n df['state'] = state_abbr\n \n # Filter to childless filers only\n childless_mask = df['eitc_child_count'] == 0\n df_childless = df[childless_mask].copy()\n \n if len(df_childless) == 0:\n return None\n \n # Classify each household by EITC phase status\n df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)\n df_childless['year'] = year\n \n return df_childless\n \n except Exception as e:\n print(f\" Error processing {state_abbr}: {e}\")\n return None\n\n\ndef run_all_states_analysis(year, states=None):\n \"\"\"\n Run EITC analysis for all states and combine results.\n \"\"\"\n if states is None:\n states = ALL_STATES\n \n print(f\"\\n{'='*60}\")\n print(f\"Running analysis for {year}\")\n print(f\"{'='*60}\")\n \n all_results = []\n \n for i, state in enumerate(states):\n print(f\"Processing {state} ({i+1}/{len(states)})...\", end=\" \")\n result = run_state_eitc_analysis(state, year)\n \n if result is not None and len(result) > 0:\n weighted_count = result['tax_unit_weight'].sum()\n print(f\"{len(result):,} records, {weighted_count:,.0f} weighted\")\n all_results.append(result)\n else:\n print(\"No data found\")\n \n if all_results:\n combined = pd.concat(all_results, ignore_index=True)\n print(f\"\\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units\")\n return combined\n else:\n return pd.DataFrame()" + "source": [ + "# =============================================================================\n", + "# STATE LIST AND DATA LOADING FUNCTIONS\n", + "# =============================================================================\n", + "\n", + "# All US states + DC (51 total)\n", + "ALL_STATES = [\n", + " 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', \n", + " 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', \n", + " 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', \n", + " 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', \n", + " 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'\n", + "]\n", + "\n", + "# Order for sorting phase statuses (follows logical EITC flow)\n", + "PHASE_ORDER = [\n", + " 'Ineligible', # Cannot receive EITC (age/SSN/investment income)\n", + " 'No earned income', # Eligible but no earnings\n", + " 'Pre-phase-in', # Building up to maximum\n", + " 'Full amount', # At maximum (plateau)\n", + " 'Partially phased out', # Being reduced\n", + " 'Fully phased out' # Reduced to $0\n", + "]\n", + "\n", + "\n", + "def run_state_eitc_analysis(state_abbr, year):\n", + " \"\"\"\n", + " Load and analyze EITC data for a single state.\n", + " \n", + " Parameters:\n", + " -----------\n", + " state_abbr : str\n", + " Two-letter state abbreviation (e.g., 'CA', 'NY', 'TX')\n", + " year : int\n", + " Tax year to analyze (e.g., 2024, 2025)\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame or None\n", + " DataFrame with one row per childless tax unit, or None if error\n", + " \"\"\"\n", + " try:\n", + " # Load the state's microdata from HuggingFace\n", + " dataset_path = f\"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5\"\n", + " sim = Microsimulation(dataset=dataset_path)\n", + " \n", + " # Variables to calculate\n", + " tax_unit_vars = [\n", + " 'tax_unit_id', # Unique identifier\n", + " 'tax_unit_weight', # Survey weight\n", + " 'eitc_eligible', # NEW: Whether eligible for EITC\n", + " 'eitc', # Federal EITC amount\n", + " 'eitc_maximum', # Max possible EITC\n", + " 'eitc_phased_in', # Phase-in amount\n", + " 'eitc_reduction', # Phase-out reduction\n", + " 'eitc_child_count', # Number of EITC-qualifying children\n", + " 'state_eitc', # State EITC amount\n", + " 'tax_unit_earned_income', # Total earned income\n", + " 'age_head', # Age of primary filer\n", + " ]\n", + " \n", + " # Calculate each variable\n", + " data = {}\n", + " for var in tax_unit_vars:\n", + " result = sim.calculate(var, period=year)\n", + " data[var] = result.values if hasattr(result, 'values') else np.array(result)\n", + " \n", + " df = pd.DataFrame(data)\n", + " df['state'] = state_abbr\n", + " \n", + " # Filter to childless filers only\n", + " childless_mask = df['eitc_child_count'] == 0\n", + " df_childless = df[childless_mask].copy()\n", + " \n", + " if len(df_childless) == 0:\n", + " return None\n", + " \n", + " # Classify each household by EITC phase status\n", + " df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)\n", + " df_childless['year'] = year\n", + " \n", + " return df_childless\n", + " \n", + " except Exception as e:\n", + " print(f\" Error processing {state_abbr}: {e}\")\n", + " return None\n", + "\n", + "\n", + "def run_all_states_analysis(year, states=None):\n", + " \"\"\"\n", + " Run EITC analysis for all states and combine results.\n", + " \"\"\"\n", + " if states is None:\n", + " states = ALL_STATES\n", + " \n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Running analysis for {year}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " all_results = []\n", + " \n", + " for i, state in enumerate(states):\n", + " print(f\"Processing {state} ({i+1}/{len(states)})...\", end=\" \")\n", + " result = run_state_eitc_analysis(state, year)\n", + " \n", + " if result is not None and len(result) > 0:\n", + " weighted_count = result['tax_unit_weight'].sum()\n", + " print(f\"{len(result):,} records, {weighted_count:,.0f} weighted\")\n", + " all_results.append(result)\n", + " else:\n", + " print(\"No data found\")\n", + " \n", + " if all_results:\n", + " combined = pd.concat(all_results, ignore_index=True)\n", + " print(f\"\\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units\")\n", + " return combined\n", + " else:\n", + " return pd.DataFrame()" + ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Run Analysis for 2024 and 2025" + "## Run Analysis for 2024 and 2025" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Running analysis for 2024\n", + "============================================================\n", + "Processing AL (1/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5febc04fad0e4300bd2649808cf6d148", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "AL.h5: 0%| | 0.00/35.6M [00:00 0].groupby(['state', 'eitc_phase_status']).apply(\n lambda x: pd.Series({\n 'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n 'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n })\n ).reset_index()\n \n summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')\n summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)\n summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)\n \n # Step 5: Clean up columns and sort\n summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', \n 'avg_federal_eitc', 'avg_state_eitc']]\n \n # Sort by state alphabetically, then by phase status in logical order\n summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n return summary\n\n# Generate summaries for both years\nsummary_2024 = create_phase_status_summary(df_2024, \"2024\")\nsummary_2025 = create_phase_status_summary(df_2025, \"2025\")\n\n# Preview the results\nprint(\"\\n2024 Summary (first 20 rows):\")\nprint(summary_2024.head(20).to_string(index=False))\nprint(\"\\n2025 Summary (first 20 rows):\")\nprint(summary_2025.head(20).to_string(index=False))" + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "======================================================================\n", + "EITC Phase Status by State - 2024\n", + "======================================================================\n", + "\n", + "======================================================================\n", + "EITC Phase Status by State - 2025\n", + "======================================================================\n", + "\n", + "2024 Summary (first 20 rows):\n", + "state eitc_phase_status weighted_households pct_of_state avg_federal_eitc avg_state_eitc\n", + " AK Ineligible 103,108.59 50.10 0.00 0.00\n", + " AK No earned income 13,868.58 6.70 0.00 0.00\n", + " AK Pre-phase-in 3,593.07 1.70 515.63 0.00\n", + " AK Full amount 0.26 0.00 632.00 0.00\n", + " AK Partially phased out 1,670.44 0.80 626.76 0.00\n", + " AK Fully phased out 83,537.39 40.60 0.00 0.00\n", + " AL Ineligible 807,295.19 56.80 0.00 0.00\n", + " AL No earned income 108,302.58 7.60 0.00 0.00\n", + " AL Pre-phase-in 3,394.00 0.20 354.86 0.00\n", + " AL Full amount 579.79 0.00 632.00 0.00\n", + " AL Partially phased out 10,719.72 0.80 448.06 0.00\n", + " AL Fully phased out 491,831.62 34.60 0.00 0.00\n", + " AR Ineligible 365,523.75 53.50 0.00 0.00\n", + " AR No earned income 42,188.39 6.20 0.00 0.00\n", + " AR Pre-phase-in 2,328.13 0.30 453.60 0.00\n", + " AR Full amount 225.77 0.00 632.00 0.00\n", + " AR Partially phased out 5,891.05 0.90 390.64 0.00\n", + " AR Fully phased out 267,684.84 39.10 0.00 0.00\n", + " AZ Ineligible 1,030,924.88 54.10 0.00 0.00\n", + " AZ No earned income 118,057.78 6.20 0.00 0.00\n", + "\n", + "2025 Summary (first 20 rows):\n", + "state eitc_phase_status weighted_households pct_of_state avg_federal_eitc avg_state_eitc\n", + " AK Ineligible 104,066.34 50.10 0.00 0.00\n", + " AK No earned income 13,997.34 6.70 0.00 0.00\n", + " AK Pre-phase-in 3,626.43 1.70 540.81 0.00\n", + " AK Full amount 0.27 0.00 649.00 0.00\n", + " AK Partially phased out 1,685.95 0.80 627.09 0.00\n", + " AK Fully phased out 84,312.62 40.60 0.00 0.00\n", + " AL Ineligible 814,817.62 56.80 0.00 0.00\n", + " AL No earned income 109,308.14 7.60 0.00 0.00\n", + " AL Pre-phase-in 3,424.46 0.20 372.11 0.00\n", + " AL Full amount 586.22 0.00 649.00 0.00\n", + " AL Partially phased out 10,817.39 0.80 439.31 0.00\n", + " AL Fully phased out 496,373.16 34.60 0.00 0.00\n", + " AR Ineligible 368,928.66 53.50 0.00 0.00\n", + " AR No earned income 42,580.10 6.20 0.00 0.00\n", + " AR Pre-phase-in 2,349.75 0.30 475.76 0.00\n", + " AR Full amount 227.08 0.00 649.00 0.00\n", + " AR Partially phased out 5,943.54 0.90 379.36 0.00\n", + " AR Fully phased out 270,162.16 39.10 0.00 0.00\n", + " AZ Ineligible 1,040,525.88 54.10 0.00 0.00\n", + " AZ No earned income 119,153.92 6.20 0.00 0.00\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# PHASE STATUS SUMMARY BY STATE\n", + "# =============================================================================\n", + "# This function creates the main summary output: for each state, what\n", + "# percentage of childless households fall into each EITC phase status?\n", + "#\n", + "# Key outputs per state × phase status:\n", + "# - weighted_households: Actual population count (using survey weights)\n", + "# - pct_of_state: What % of that state's childless households are in this phase\n", + "# - avg_federal_eitc: Average federal EITC for households receiving EITC\n", + "# - avg_state_eitc: Average state EITC (for states with programs)\n", + "#\n", + "# The percentages should sum to 100% for each state since we include ALL\n", + "# childless households (not just EITC recipients).\n", + "# =============================================================================\n", + "\n", + "def create_phase_status_summary(df, year_label):\n", + " \"\"\"\n", + " Create summary of EITC phase status by state with weighted counts and percentages.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Household-level data from run_all_states_analysis()\n", + " year_label : str\n", + " Label for display (e.g., \"2024\")\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame\n", + " Summary with columns: state, eitc_phase_status, weighted_households,\n", + " pct_of_state, avg_federal_eitc, avg_state_eitc\n", + " \"\"\"\n", + " print(f\"\\n{'='*70}\")\n", + " print(f\"EITC Phase Status by State - {year_label}\")\n", + " print(f\"{'='*70}\")\n", + " \n", + " # Step 1: Calculate weighted counts by state and phase status\n", + " # tax_unit_weight is summed to get population-representative counts\n", + " summary = df.groupby(['state', 'eitc_phase_status']).agg({\n", + " 'tax_unit_weight': 'sum',\n", + " }).reset_index()\n", + " \n", + " summary.columns = ['state', 'eitc_phase_status', 'weighted_households']\n", + " \n", + " # Step 2: Calculate state totals for percentage calculation\n", + " state_totals = summary.groupby('state')['weighted_households'].sum().reset_index()\n", + " state_totals.columns = ['state', 'state_total']\n", + " \n", + " # Step 3: Merge to compute percentages\n", + " summary = summary.merge(state_totals, on='state')\n", + " summary['pct_of_state'] = (summary['weighted_households'] / summary['state_total'] * 100).round(1)\n", + " \n", + " # Step 4: Add average EITC amounts (only computed for households receiving EITC)\n", + " # This uses weighted averages: sum(value × weight) / sum(weight)\n", + " avg_eitc = df[df['eitc'] > 0].groupby(['state', 'eitc_phase_status']).apply(\n", + " lambda x: pd.Series({\n", + " 'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " 'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " })\n", + " ).reset_index()\n", + " \n", + " summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')\n", + " summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)\n", + " summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)\n", + " \n", + " # Step 5: Clean up columns and sort\n", + " summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', \n", + " 'avg_federal_eitc', 'avg_state_eitc']]\n", + " \n", + " # Sort by state alphabetically, then by phase status in logical order\n", + " summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", + " summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", + " \n", + " return summary\n", + "\n", + "# Generate summaries for both years\n", + "summary_2024 = create_phase_status_summary(df_2024, \"2024\")\n", + "summary_2025 = create_phase_status_summary(df_2025, \"2025\")\n", + "\n", + "# Preview the results\n", + "print(\"\\n2024 Summary (first 20 rows):\")\n", + "print(summary_2024.head(20).to_string(index=False))\n", + "print(\"\\n2025 Summary (first 20 rows):\")\n", + "print(summary_2025.head(20).to_string(index=False))" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": {}, - "outputs": [], - "source": "# =============================================================================\n# SUMMARY BY STATE - TOP STATES BY POPULATION\n# =============================================================================\n# Shows the states with the largest childless tax unit populations,\n# along with total and average EITC amounts.\n#\n# Useful for understanding which states contribute most to the national totals.\n# =============================================================================\n\ndef summary_by_state(df, year_label, top_n=15):\n \"\"\"\n Create summary by state showing top N by number of childless tax units.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n top_n : int\n Number of top states to show (default 15)\n \n Returns:\n --------\n pandas.DataFrame\n State-level summary sorted by weighted tax unit count\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"Top {top_n} States by EITC Recipients - {year_label}\")\n print(f\"{'='*60}\")\n \n # Calculate state-level aggregates using weighted sums/averages\n summary = df.groupby('state').apply(\n lambda x: pd.Series({\n # Total weighted tax units in state\n 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n # Total federal EITC distributed (weight × eitc amount)\n 'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),\n # Total state EITC distributed\n 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n # Weighted average federal EITC per tax unit\n 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n # Weighted average state EITC per tax unit\n 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n # Boolean: does this state have a state EITC program?\n 'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,\n })\n ).reset_index()\n \n # Sort by number of tax units (largest states first)\n summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)\n \n return summary\n\n# Generate and display for both years\nstate_2024 = summary_by_state(df_2024, \"2024\")\nprint(state_2024.to_string(index=False))\n\nstate_2025 = summary_by_state(df_2025, \"2025\")\nprint(state_2025.to_string(index=False))" + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Top 15 States by EITC Recipients - 2024\n", + "============================================================\n", + "state Tax Units (Weighted) Total Federal EITC Total State EITC Avg Federal EITC Avg State EITC Has State EITC\n", + " CA 11,676,756.00 126,396,768.00 392,533,280.00 10.82 33.62 True\n", + " TX 8,270,492.50 102,216,176.00 0.00 12.36 0.00 False\n", + " FL 6,828,671.50 50,078,040.00 0.00 7.33 0.00 False\n", + " NY 6,089,496.00 64,632,924.00 17,955,152.00 10.61 2.95 True\n", + " IL 4,061,833.00 43,125,848.00 8,625,170.00 10.62 2.12 True\n", + " PA 4,057,412.25 41,305,212.00 0.00 10.18 0.00 False\n", + " OH 3,171,405.75 30,410,496.00 9,123,148.00 9.59 2.88 True\n", + " NC 3,018,447.50 15,553,126.00 0.00 5.15 0.00 False\n", + " MI 2,947,462.50 30,062,786.00 9,018,837.00 10.20 3.06 True\n", + " GA 2,867,909.25 20,237,260.00 0.00 7.06 0.00 False\n", + " WA 2,709,062.25 42,446,576.00 27,457,220.00 15.67 10.14 True\n", + " NJ 2,670,505.50 31,733,258.00 55,209,756.00 11.88 20.67 True\n", + " MA 2,445,482.50 27,926,758.00 11,170,704.00 11.42 4.57 True\n", + " VA 2,348,493.50 14,553,468.00 102,224,696.00 6.20 43.53 True\n", + " TN 2,125,824.00 16,333,918.00 0.00 7.68 0.00 False\n", + "\n", + "============================================================\n", + "Top 15 States by EITC Recipients - 2025\n", + "============================================================\n", + "state Tax Units (Weighted) Total Federal EITC Total State EITC Avg Federal EITC Avg State EITC Has State EITC\n", + " CA 11,785,171.00 129,709,128.00 394,849,248.00 11.01 33.50 True\n", + " TX 8,347,282.00 106,447,616.00 0.00 12.75 0.00 False\n", + " FL 6,892,074.00 51,580,204.00 0.00 7.48 0.00 False\n", + " NY 6,146,035.00 66,562,396.00 18,485,602.00 10.83 3.01 True\n", + " IL 4,099,546.00 44,526,100.00 8,905,220.00 10.86 2.17 True\n", + " PA 4,095,084.50 42,804,808.00 0.00 10.45 0.00 False\n", + " OH 3,200,851.75 31,340,700.00 9,402,211.00 9.79 2.94 True\n", + " NC 3,046,473.25 15,684,542.00 0.00 5.15 0.00 False\n", + " MI 2,974,829.25 31,287,212.00 9,386,164.00 10.52 3.16 True\n", + " GA 2,894,537.00 20,446,926.00 0.00 7.06 0.00 False\n", + " WA 2,734,215.50 44,398,868.00 28,292,344.00 16.24 10.35 True\n", + " NJ 2,695,300.25 32,696,890.00 56,647,304.00 12.13 21.02 True\n", + " MA 2,468,188.00 28,870,056.00 11,548,022.00 11.70 4.68 True\n", + " VA 2,370,298.50 14,610,885.00 103,897,088.00 6.16 43.83 True\n", + " TN 2,145,561.75 16,974,484.00 0.00 7.91 0.00 False\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# SUMMARY BY STATE - TOP STATES BY POPULATION\n", + "# =============================================================================\n", + "# Shows the states with the largest childless tax unit populations,\n", + "# along with total and average EITC amounts.\n", + "#\n", + "# Useful for understanding which states contribute most to the national totals.\n", + "# =============================================================================\n", + "\n", + "def summary_by_state(df, year_label, top_n=15):\n", + " \"\"\"\n", + " Create summary by state showing top N by number of childless tax units.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Household-level data\n", + " year_label : str\n", + " Label for display\n", + " top_n : int\n", + " Number of top states to show (default 15)\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame\n", + " State-level summary sorted by weighted tax unit count\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Top {top_n} States by EITC Recipients - {year_label}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Calculate state-level aggregates using weighted sums/averages\n", + " summary = df.groupby('state').apply(\n", + " lambda x: pd.Series({\n", + " # Total weighted tax units in state\n", + " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", + " # Total federal EITC distributed (weight × eitc amount)\n", + " 'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),\n", + " # Total state EITC distributed\n", + " 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n", + " # Weighted average federal EITC per tax unit\n", + " 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " # Weighted average state EITC per tax unit\n", + " 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " # Boolean: does this state have a state EITC program?\n", + " 'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,\n", + " })\n", + " ).reset_index()\n", + " \n", + " # Sort by number of tax units (largest states first)\n", + " summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)\n", + " \n", + " return summary\n", + "\n", + "# Generate and display for both years\n", + "state_2024 = summary_by_state(df_2024, \"2024\")\n", + "print(state_2024.to_string(index=False))\n", + "\n", + "state_2025 = summary_by_state(df_2025, \"2025\")\n", + "print(state_2025.to_string(index=False))" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": {}, - "outputs": [], - "source": "# =============================================================================\n# AGE DISTRIBUTION ANALYSIS\n# =============================================================================\n# Shows how childless tax units are distributed by age of the head of household.\n#\n# Key insight: The childless EITC has age restrictions (25-64 for 2024 under\n# current law), so we expect most EITC recipients to fall within that range.\n# =============================================================================\n\ndef age_distribution(df, year_label):\n \"\"\"\n Create age group distribution for heads of household.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year_label : str\n Label for display\n \n Returns:\n --------\n pandas.DataFrame\n Summary by age group with weighted counts and averages\n \"\"\"\n print(f\"\\n{'='*60}\")\n print(f\"Age Distribution of Head of Household - {year_label}\")\n print(f\"{'='*60}\")\n \n # Create age groups using pd.cut\n df_copy = df.copy()\n df_copy['age_group'] = pd.cut(\n df_copy['age_head'],\n bins=[0, 25, 35, 45, 55, 65, 100],\n labels=['Under 25', '25-34', '35-44', '45-54', '55-64', '65+']\n )\n \n # Calculate weighted statistics by age group\n summary = df_copy.groupby('age_group').apply(\n lambda x: pd.Series({\n 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n 'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n })\n ).reset_index()\n \n # Add percentage of total\n total_units = summary['Tax Units (Weighted)'].sum()\n summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)\n \n return summary\n\n# Generate for both years\nage_2024 = age_distribution(df_2024, \"2024\")\nprint(age_2024.to_string(index=False))\n\nage_2025 = age_distribution(df_2025, \"2025\")\nprint(age_2025.to_string(index=False))" + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Age Distribution of Head of Household - 2024\n", + "============================================================\n", + "age_group Tax Units (Weighted) Avg Federal EITC Avg Earned Income % of Total\n", + " Under 25 12,069,262.00 0.57 24,647.42 12.50\n", + " 25-34 14,198,971.00 35.78 76,383.24 14.70\n", + " 35-44 11,448,204.00 2.19 94,731.74 11.90\n", + " 45-54 16,595,334.00 22.36 87,682.62 17.20\n", + " 55-64 9,673,886.00 1.18 59,089.31 10.00\n", + " 65+ 32,441,214.00 0.01 25,601.59 33.60\n", + "\n", + "============================================================\n", + "Age Distribution of Head of Household - 2025\n", + "============================================================\n", + "age_group Tax Units (Weighted) Avg Federal EITC Avg Earned Income % of Total\n", + " Under 25 12,181,323.00 0.59 25,851.27 12.50\n", + " 25-34 14,330,805.00 36.65 80,112.14 14.70\n", + " 35-44 11,554,499.00 2.15 99,357.79 11.90\n", + " 45-54 16,749,416.00 22.80 91,964.61 17.20\n", + " 55-64 9,763,707.00 1.16 61,973.64 10.00\n", + " 65+ 32,742,422.00 0.01 26,850.62 33.60\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# AGE DISTRIBUTION ANALYSIS\n", + "# =============================================================================\n", + "# Shows how childless tax units are distributed by age of the head of household.\n", + "#\n", + "# Key insight: The childless EITC has age restrictions (25-64 for 2024 under\n", + "# current law), so we expect most EITC recipients to fall within that range.\n", + "# =============================================================================\n", + "\n", + "def age_distribution(df, year_label):\n", + " \"\"\"\n", + " Create age group distribution for heads of household.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Household-level data\n", + " year_label : str\n", + " Label for display\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame\n", + " Summary by age group with weighted counts and averages\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Age Distribution of Head of Household - {year_label}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Create age groups using pd.cut\n", + " df_copy = df.copy()\n", + " df_copy['age_group'] = pd.cut(\n", + " df_copy['age_head'],\n", + " bins=[0, 25, 35, 45, 55, 65, 100],\n", + " labels=['Under 25', '25-34', '35-44', '45-54', '55-64', '65+']\n", + " )\n", + " \n", + " # Calculate weighted statistics by age group\n", + " summary = df_copy.groupby('age_group').apply(\n", + " lambda x: pd.Series({\n", + " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", + " 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n", + " 'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n", + " })\n", + " ).reset_index()\n", + " \n", + " # Add percentage of total\n", + " total_units = summary['Tax Units (Weighted)'].sum()\n", + " summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)\n", + " \n", + " return summary\n", + "\n", + "# Generate for both years\n", + "age_2024 = age_distribution(df_2024, \"2024\")\n", + "print(age_2024.to_string(index=False))\n", + "\n", + "age_2025 = age_distribution(df_2025, \"2025\")\n", + "print(age_2025.to_string(index=False))" + ] }, { "cell_type": "markdown", @@ -104,14 +2470,81 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "metadata": {}, - "outputs": [], - "source": "# =============================================================================\n# EXPORT DETAILED HOUSEHOLD DATA\n# =============================================================================\n# Exports the full household-level dataset with all calculated variables.\n#\n# WARNING: These files are large (~125MB each) and are excluded from git\n# via .gitignore. They are generated locally when the notebook runs.\n#\n# Use cases:\n# - Detailed analysis in external tools (Excel, Stata, R)\n# - Validation of the summary statistics\n# - Custom filtering/aggregation not provided in this notebook\n# =============================================================================\n\ndef export_household_data(df, year):\n \"\"\"\n Export household-level data to CSV, sorted by state and phase status.\n \"\"\"\n \n # Select columns for export (only columns we're loading)\n export_columns = [\n 'state', # State abbreviation\n 'eitc_phase_status', # Classification result\n 'tax_unit_id', # Unique identifier\n 'tax_unit_weight', # Survey weight\n 'eitc_eligible', # Eligibility status\n 'eitc', # Federal EITC amount\n 'state_eitc', # State EITC amount\n 'eitc_phased_in', # Phase-in calculation\n 'eitc_reduction', # Phase-out reduction\n 'tax_unit_earned_income', # Total earned income\n 'age_head', # Age of primary filer\n ]\n \n # Only include columns that exist in the DataFrame\n available_columns = [col for col in export_columns if col in df.columns]\n df_export = df[available_columns].copy()\n \n # Rename columns for clarity in external tools\n df_export = df_export.rename(columns={\n 'eitc': 'federal_eitc',\n })\n \n # Sort by state (alphabetically) then by phase status (in logical EITC order)\n df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n # Write to CSV\n filename = f'eitc_childless_families_{year}.csv'\n df_export.to_csv(filename, index=False)\n print(f\"Exported {len(df_export):,} rows to: {filename}\")\n \n return df_export\n\n# Export both years to separate files\ndf_export_2024 = export_household_data(df_2024, 2024)\ndf_export_2025 = export_household_data(df_2025, 2025)" + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Exported 1,493,541 rows to: eitc_childless_families_2024.csv\n", + "Exported 1,493,541 rows to: eitc_childless_families_2025.csv\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# EXPORT DETAILED HOUSEHOLD DATA\n", + "# =============================================================================\n", + "# Exports the full household-level dataset with all calculated variables.\n", + "#\n", + "# WARNING: These files are large (~125MB each) and are excluded from git\n", + "# via .gitignore. They are generated locally when the notebook runs.\n", + "#\n", + "# Use cases:\n", + "# - Detailed analysis in external tools (Excel, Stata, R)\n", + "# - Validation of the summary statistics\n", + "# - Custom filtering/aggregation not provided in this notebook\n", + "# =============================================================================\n", + "\n", + "def export_household_data(df, year):\n", + " \"\"\"\n", + " Export household-level data to CSV, sorted by state and phase status.\n", + " \"\"\"\n", + " \n", + " # Select columns for export (only columns we're loading)\n", + " export_columns = [\n", + " 'state', # State abbreviation\n", + " 'eitc_phase_status', # Classification result\n", + " 'tax_unit_id', # Unique identifier\n", + " 'tax_unit_weight', # Survey weight\n", + " 'eitc_eligible', # Eligibility status\n", + " 'eitc', # Federal EITC amount\n", + " 'state_eitc', # State EITC amount\n", + " 'eitc_phased_in', # Phase-in calculation\n", + " 'eitc_reduction', # Phase-out reduction\n", + " 'tax_unit_earned_income', # Total earned income\n", + " 'age_head', # Age of primary filer\n", + " ]\n", + " \n", + " # Only include columns that exist in the DataFrame\n", + " available_columns = [col for col in export_columns if col in df.columns]\n", + " df_export = df[available_columns].copy()\n", + " \n", + " # Rename columns for clarity in external tools\n", + " df_export = df_export.rename(columns={\n", + " 'eitc': 'federal_eitc',\n", + " })\n", + " \n", + " # Sort by state (alphabetically) then by phase status (in logical EITC order)\n", + " df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", + " df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", + " \n", + " # Write to CSV\n", + " filename = f'eitc_childless_families_{year}.csv'\n", + " df_export.to_csv(filename, index=False)\n", + " print(f\"Exported {len(df_export):,} rows to: {filename}\")\n", + " \n", + " return df_export\n", + "\n", + "# Export both years to separate files\n", + "df_export_2024 = export_household_data(df_2024, 2024)\n", + "df_export_2025 = export_household_data(df_2025, 2025)" + ] }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 11, "metadata": {}, "outputs": [ { @@ -147,221 +2580,199 @@ " eitc_phase_status\n", " tax_unit_id\n", " tax_unit_weight\n", + " eitc_eligible\n", " federal_eitc\n", " state_eitc\n", " eitc_phased_in\n", " eitc_reduction\n", " tax_unit_earned_income\n", - " adjusted_gross_income\n", - " marital_status\n", " age_head\n", - " age_spouse\n", " \n", " \n", " \n", " \n", " 25751\n", " AK\n", - " No income\n", + " Ineligible\n", " 0\n", " 0.80\n", + " False\n", " 0.00\n", " 0.00\n", " 0.00\n", " 0.00\n", " 0.00\n", - " 3,923.64\n", - " Unknown\n", " 79\n", - " 0\n", " \n", " \n", " 25753\n", " AK\n", - " No income\n", + " Ineligible\n", " 3\n", " 0.28\n", + " False\n", " 0.00\n", " 0.00\n", " 0.00\n", " 10,068.10\n", " 0.00\n", - " 148,859.19\n", - " Unknown\n", " 76\n", - " 74\n", - " \n", - " \n", - " 25754\n", - " AK\n", - " No income\n", - " 5\n", - " 12.27\n", - " 0.00\n", - " 0.00\n", - " 194.41\n", - " 0.00\n", - " 2,541.26\n", - " 3,945.09\n", - " Unknown\n", - " 64\n", - " 0\n", " \n", " \n", " 25757\n", " AK\n", - " No income\n", + " Ineligible\n", " 11\n", " 4,387.35\n", + " False\n", " 0.00\n", " 0.00\n", " 0.00\n", " 3,368.61\n", " 0.00\n", - " 61,284.13\n", - " Unknown\n", " 85\n", - " 82\n", + " \n", + " \n", + " 25760\n", + " AK\n", + " Ineligible\n", + " 14\n", + " 2,849.94\n", + " False\n", + " 0.00\n", + " 0.00\n", + " 632.00\n", + " 1,747.39\n", + " 31,767.87\n", + " 21\n", " \n", " \n", " 25761\n", " AK\n", - " No income\n", + " Ineligible\n", " 15\n", " 639.52\n", + " False\n", " 0.00\n", " 0.00\n", " 0.00\n", " 992.74\n", " 0.00\n", - " 23,307.04\n", - " Unknown\n", " 85\n", - " 0\n", " \n", " \n", " 25763\n", " AK\n", - " No income\n", + " Ineligible\n", " 18\n", " 1,114.78\n", + " False\n", " 0.00\n", " 0.00\n", " 0.00\n", " 0.00\n", " 0.00\n", - " 1,403.83\n", - " Unknown\n", " 83\n", - " 0\n", " \n", " \n", - " 25767\n", + " 25764\n", " AK\n", - " No income\n", - " 22\n", - " 0.82\n", - " 0.00\n", - " 0.00\n", - " 0.00\n", + " Ineligible\n", + " 19\n", + " 1,114.78\n", + " False\n", " 0.00\n", " 0.00\n", - " 2,153.92\n", - " Unknown\n", - " 85\n", - " 0\n", + " 632.00\n", + " 10,566.14\n", + " 132,357.31\n", + " 61\n", " \n", " \n", - " 25769\n", + " 25766\n", " AK\n", - " No income\n", - " 24\n", - " 792.77\n", - " 0.00\n", + " Ineligible\n", + " 21\n", + " 2.31\n", + " False\n", " 0.00\n", " 0.00\n", - " 20.54\n", + " 632.00\n", " 0.00\n", - " 10,598.54\n", - " Unknown\n", - " 81\n", - " 0\n", + " 16,941.74\n", + " 78\n", " \n", " \n", - " 25770\n", + " 25767\n", " AK\n", - " No income\n", - " 25\n", - " 1.06\n", + " Ineligible\n", + " 22\n", + " 0.82\n", + " False\n", " 0.00\n", " 0.00\n", " 0.00\n", " 0.00\n", " 0.00\n", - " 1,403.83\n", - " Unknown\n", " 85\n", - " 0\n", " \n", " \n", - " 25771\n", + " 25769\n", " AK\n", - " No income\n", - " 27\n", - " 1.04\n", - " 0.00\n", + " Ineligible\n", + " 24\n", + " 792.77\n", + " False\n", " 0.00\n", " 0.00\n", " 0.00\n", + " 20.54\n", " 0.00\n", - " 1,403.83\n", - " Unknown\n", - " 64\n", - " 0\n", + " 81\n", " \n", " \n", "\n", "" ], "text/plain": [ - " state eitc_phase_status tax_unit_id tax_unit_weight federal_eitc \\\n", - "25751 AK No income 0 0.80 0.00 \n", - "25753 AK No income 3 0.28 0.00 \n", - "25754 AK No income 5 12.27 0.00 \n", - "25757 AK No income 11 4,387.35 0.00 \n", - "25761 AK No income 15 639.52 0.00 \n", - "25763 AK No income 18 1,114.78 0.00 \n", - "25767 AK No income 22 0.82 0.00 \n", - "25769 AK No income 24 792.77 0.00 \n", - "25770 AK No income 25 1.06 0.00 \n", - "25771 AK No income 27 1.04 0.00 \n", + " state eitc_phase_status tax_unit_id tax_unit_weight eitc_eligible \\\n", + "25751 AK Ineligible 0 0.80 False \n", + "25753 AK Ineligible 3 0.28 False \n", + "25757 AK Ineligible 11 4,387.35 False \n", + "25760 AK Ineligible 14 2,849.94 False \n", + "25761 AK Ineligible 15 639.52 False \n", + "25763 AK Ineligible 18 1,114.78 False \n", + "25764 AK Ineligible 19 1,114.78 False \n", + "25766 AK Ineligible 21 2.31 False \n", + "25767 AK Ineligible 22 0.82 False \n", + "25769 AK Ineligible 24 792.77 False \n", "\n", - " state_eitc eitc_phased_in eitc_reduction tax_unit_earned_income \\\n", - "25751 0.00 0.00 0.00 0.00 \n", - "25753 0.00 0.00 10,068.10 0.00 \n", - "25754 0.00 194.41 0.00 2,541.26 \n", - "25757 0.00 0.00 3,368.61 0.00 \n", - "25761 0.00 0.00 992.74 0.00 \n", - "25763 0.00 0.00 0.00 0.00 \n", - "25767 0.00 0.00 0.00 0.00 \n", - "25769 0.00 0.00 20.54 0.00 \n", - "25770 0.00 0.00 0.00 0.00 \n", - "25771 0.00 0.00 0.00 0.00 \n", + " federal_eitc state_eitc eitc_phased_in eitc_reduction \\\n", + "25751 0.00 0.00 0.00 0.00 \n", + "25753 0.00 0.00 0.00 10,068.10 \n", + "25757 0.00 0.00 0.00 3,368.61 \n", + "25760 0.00 0.00 632.00 1,747.39 \n", + "25761 0.00 0.00 0.00 992.74 \n", + "25763 0.00 0.00 0.00 0.00 \n", + "25764 0.00 0.00 632.00 10,566.14 \n", + "25766 0.00 0.00 632.00 0.00 \n", + "25767 0.00 0.00 0.00 0.00 \n", + "25769 0.00 0.00 0.00 20.54 \n", "\n", - " adjusted_gross_income marital_status age_head age_spouse \n", - "25751 3,923.64 Unknown 79 0 \n", - "25753 148,859.19 Unknown 76 74 \n", - "25754 3,945.09 Unknown 64 0 \n", - "25757 61,284.13 Unknown 85 82 \n", - "25761 23,307.04 Unknown 85 0 \n", - "25763 1,403.83 Unknown 83 0 \n", - "25767 2,153.92 Unknown 85 0 \n", - "25769 10,598.54 Unknown 81 0 \n", - "25770 1,403.83 Unknown 85 0 \n", - "25771 1,403.83 Unknown 64 0 " + " tax_unit_earned_income age_head \n", + "25751 0.00 79 \n", + "25753 0.00 76 \n", + "25757 0.00 85 \n", + "25760 31,767.87 21 \n", + "25761 0.00 85 \n", + "25763 0.00 83 \n", + "25764 132,357.31 61 \n", + "25766 16,941.74 78 \n", + "25767 0.00 85 \n", + "25769 0.00 81 " ] }, - "execution_count": 43, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -374,7 +2785,7 @@ }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 12, "metadata": {}, "outputs": [ { @@ -402,10 +2813,64 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "metadata": {}, - "outputs": [], - "source": "# =============================================================================\n# EXPORT SUMMARY DATA\n# =============================================================================\n# Exports the aggregated summary by state and phase status.\n#\n# These files are small (~10KB) and ARE included in git commits.\n# This is the primary output for sharing with stakeholders.\n#\n# Output Files:\n# - eitc_childless_phase_status_summary_2024.csv\n# - eitc_childless_phase_status_summary_2025.csv\n# =============================================================================\n\ndef export_summary(summary_df, year):\n \"\"\"\n Export phase status summary to CSV, sorted by state and phase status.\n \n Parameters:\n -----------\n summary_df : pandas.DataFrame\n Summary from create_phase_status_summary()\n year : int\n Tax year (used in filename)\n \n Returns:\n --------\n pandas.DataFrame\n The exported data\n \"\"\"\n df_export = summary_df.copy()\n \n # Sort by state (alphabetically) then phase status (logical EITC order)\n df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n \n # Write to CSV\n filename = f'eitc_childless_phase_status_summary_{year}.csv'\n df_export.to_csv(filename, index=False)\n print(f\"Exported summary to: {filename}\")\n return df_export\n\n# Export both years\nsummary_2024_export = export_summary(summary_2024, 2024)\nsummary_2025_export = export_summary(summary_2025, 2025)" + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Exported summary to: eitc_childless_phase_status_summary_2024.csv\n", + "Exported summary to: eitc_childless_phase_status_summary_2025.csv\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# EXPORT SUMMARY DATA\n", + "# =============================================================================\n", + "# Exports the aggregated summary by state and phase status.\n", + "#\n", + "# These files are small (~10KB) and ARE included in git commits.\n", + "# This is the primary output for sharing with stakeholders.\n", + "#\n", + "# Output Files:\n", + "# - eitc_childless_phase_status_summary_2024.csv\n", + "# - eitc_childless_phase_status_summary_2025.csv\n", + "# =============================================================================\n", + "\n", + "def export_summary(summary_df, year):\n", + " \"\"\"\n", + " Export phase status summary to CSV, sorted by state and phase status.\n", + " \n", + " Parameters:\n", + " -----------\n", + " summary_df : pandas.DataFrame\n", + " Summary from create_phase_status_summary()\n", + " year : int\n", + " Tax year (used in filename)\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame\n", + " The exported data\n", + " \"\"\"\n", + " df_export = summary_df.copy()\n", + " \n", + " # Sort by state (alphabetically) then phase status (logical EITC order)\n", + " df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", + " df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", + " \n", + " # Write to CSV\n", + " filename = f'eitc_childless_phase_status_summary_{year}.csv'\n", + " df_export.to_csv(filename, index=False)\n", + " print(f\"Exported summary to: {filename}\")\n", + " return df_export\n", + "\n", + "# Export both years\n", + "summary_2024_export = export_summary(summary_2024, 2024)\n", + "summary_2025_export = export_summary(summary_2025, 2025)" + ] }, { "cell_type": "markdown", @@ -416,15 +2881,121 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 14, "metadata": {}, - "outputs": [], - "source": "# =============================================================================\n# NATIONAL TOTALS BY PHASE STATUS\n# =============================================================================\n# Aggregates across all states to show the national distribution of\n# childless tax units by EITC phase status.\n#\n# Key insights:\n# - Most childless tax units (~62%) are \"Fully phased out\" (too much income)\n# - About 35% have \"No income\" (no earned income = no EITC)\n# - Only ~2% actually receive EITC (Pre-phase-in + Full amount + Partially)\n# =============================================================================\n\ndef national_totals(df, year):\n \"\"\"\n Calculate national totals by phase status.\n \n Parameters:\n -----------\n df : pandas.DataFrame\n Household-level data\n year : int\n Tax year (for output column)\n \n Returns:\n --------\n pandas.DataFrame\n National summary with weighted counts and percentages\n \"\"\"\n totals = df.groupby('eitc_phase_status').agg({\n 'tax_unit_weight': 'sum',\n }).reset_index()\n totals.columns = ['eitc_phase_status', 'weighted_households']\n \n # Calculate percentage of total\n total_all = totals['weighted_households'].sum()\n totals['pct_of_total'] = (totals['weighted_households'] / total_all * 100).round(1)\n totals['year'] = year\n return totals\n\n# Display national totals\nprint(\"National Totals by Phase Status:\")\nprint(\"\\n2024:\")\nnat_2024 = national_totals(df_2024, 2024)\nprint(nat_2024.to_string(index=False))\nprint(f\"\\nTotal childless tax units: {nat_2024['weighted_households'].sum():,.0f}\")\n\nprint(\"\\n2025:\")\nnat_2025 = national_totals(df_2025, 2025)\nprint(nat_2025.to_string(index=False))\nprint(f\"\\nTotal childless tax units: {nat_2025['weighted_households'].sum():,.0f}\")" + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "National Totals by Phase Status:\n", + "\n", + "2024:\n", + " eitc_phase_status weighted_households pct_of_total year\n", + " Full amount 33,314.48 0.00 2024\n", + " Fully phased out 36,907,524.00 38.30 2024\n", + " Ineligible 51,325,700.00 53.20 2024\n", + " No earned income 6,137,777.50 6.40 2024\n", + "Partially phased out 824,046.81 0.90 2024\n", + " Pre-phase-in 1,203,184.00 1.20 2024\n", + "\n", + "Total childless tax units: 96,431,552\n", + "\n", + "2025:\n", + " eitc_phase_status weighted_households pct_of_total year\n", + " Full amount 33,638.47 0.00 2025\n", + " Fully phased out 37,248,152.00 38.30 2025\n", + " Ineligible 51,804,544.00 53.20 2025\n", + " No earned income 6,194,765.50 6.40 2025\n", + "Partially phased out 831,458.88 0.90 2025\n", + " Pre-phase-in 1,214,332.12 1.20 2025\n", + "\n", + "Total childless tax units: 97,326,896\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# NATIONAL TOTALS BY PHASE STATUS\n", + "# =============================================================================\n", + "# Aggregates across all states to show the national distribution of\n", + "# childless tax units by EITC phase status.\n", + "#\n", + "# Key insights:\n", + "# - Most childless tax units (~62%) are \"Fully phased out\" (too much income)\n", + "# - About 35% have \"No income\" (no earned income = no EITC)\n", + "# - Only ~2% actually receive EITC (Pre-phase-in + Full amount + Partially)\n", + "# =============================================================================\n", + "\n", + "def national_totals(df, year):\n", + " \"\"\"\n", + " Calculate national totals by phase status.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Household-level data\n", + " year : int\n", + " Tax year (for output column)\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame\n", + " National summary with weighted counts and percentages\n", + " \"\"\"\n", + " totals = df.groupby('eitc_phase_status').agg({\n", + " 'tax_unit_weight': 'sum',\n", + " }).reset_index()\n", + " totals.columns = ['eitc_phase_status', 'weighted_households']\n", + " \n", + " # Calculate percentage of total\n", + " total_all = totals['weighted_households'].sum()\n", + " totals['pct_of_total'] = (totals['weighted_households'] / total_all * 100).round(1)\n", + " totals['year'] = year\n", + " return totals\n", + "\n", + "# Display national totals\n", + "print(\"National Totals by Phase Status:\")\n", + "print(\"\\n2024:\")\n", + "nat_2024 = national_totals(df_2024, 2024)\n", + "print(nat_2024.to_string(index=False))\n", + "print(f\"\\nTotal childless tax units: {nat_2024['weighted_households'].sum():,.0f}\")\n", + "\n", + "print(\"\\n2025:\")\n", + "nat_2025 = national_totals(df_2025, 2025)\n", + "print(nat_2025.to_string(index=False))\n", + "print(f\"\\nTotal childless tax units: {nat_2025['weighted_households'].sum():,.0f}\")" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "## Notes\n\n### Data Interpretation\n- **Tax unit weights** represent the number of actual tax units each record represents in the population\n- All monetary values are weighted averages/totals reflecting the full population\n- State datasets contain representative microdata for each state\n\n### EITC Phase Status Definitions\n1. **Ineligible**: Does not meet EITC eligibility requirements (age 25-64, valid SSN, investment income limits, or filing status)\n2. **No earned income**: Eligible for EITC but has zero earned income (cannot receive credit without earnings)\n3. **Pre-phase-in**: Earned income is below the level needed to receive the maximum credit. Credit = (earned income × 7.65%)\n4. **Full amount**: At the plateau - receiving maximum credit (~$632 for childless in 2024)\n5. **Partially phased out**: Income is above the phase-out threshold, receiving reduced credit\n6. **Fully phased out**: Income is too high; credit is reduced to $0\n\n### Childless Worker EITC Parameters (2024)\n- Maximum credit: ~$632\n- Phase-in rate: 7.65%\n- Phase-out starts at: ~$9,800 (single), ~$16,400 (married)\n- Phase-out rate: 7.65%\n- Age requirements: 25-64 years old (or 19+ if former foster youth/homeless)\n\n### State EITC Programs\nSee the State EITC Programs section at the beginning of this notebook for detailed information on each state's program, including states with unique structures (CA, MN, WA, VA, DE, MD)." + "source": [ + "## Notes\n", + "\n", + "### Data Interpretation\n", + "- **Tax unit weights** represent the number of actual tax units each record represents in the population\n", + "- All monetary values are weighted averages/totals reflecting the full population\n", + "- State datasets contain representative microdata for each state\n", + "\n", + "### EITC Phase Status Definitions\n", + "1. **Ineligible**: Does not meet EITC eligibility requirements (age 25-64, valid SSN, investment income limits, or filing status)\n", + "2. **No earned income**: Eligible for EITC but has zero earned income (cannot receive credit without earnings)\n", + "3. **Pre-phase-in**: Earned income is below the level needed to receive the maximum credit. Credit = (earned income × 7.65%)\n", + "4. **Full amount**: At the plateau - receiving maximum credit (~$632 for childless in 2024)\n", + "5. **Partially phased out**: Income is above the phase-out threshold, receiving reduced credit\n", + "6. **Fully phased out**: Income is too high; credit is reduced to $0\n", + "\n", + "### Childless Worker EITC Parameters (2024)\n", + "- Maximum credit: ~$632\n", + "- Phase-in rate: 7.65%\n", + "- Phase-out starts at: ~$9,800 (single), ~$16,400 (married)\n", + "- Phase-out rate: 7.65%\n", + "- Age requirements: 25-64 years old (or 19+ if former foster youth/homeless)\n", + "\n", + "### State EITC Programs\n", + "See the State EITC Programs section at the beginning of this notebook for detailed information on each state's program, including states with unique structures (CA, MN, WA, VA, DE, MD)." + ] } ], "metadata": { @@ -448,4 +3019,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} diff --git a/eitc_childless_analysis/eitc_childless_phase_status_summary_2024.csv b/eitc_childless_analysis/eitc_childless_phase_status_summary_2024.csv index 2f08905..11a6b3d 100644 --- a/eitc_childless_analysis/eitc_childless_phase_status_summary_2024.csv +++ b/eitc_childless_analysis/eitc_childless_phase_status_summary_2024.csv @@ -1,256 +1,307 @@ state,eitc_phase_status,weighted_households,pct_of_state,avg_federal_eitc,avg_state_eitc -AK,No income,64211.33,31.2,0.0,0.0 +AK,Ineligible,103108.586,50.1,0.0,0.0 +AK,No earned income,13868.578,6.7,0.0,0.0 AK,Pre-phase-in,3593.0698,1.7,515.6275,0.0 AK,Full amount,0.26411486,0.0,632.0,0.0 AK,Partially phased out,1670.441,0.8,626.76306,0.0 -AK,Fully phased out,136303.23,66.2,0.0,0.0 -AL,No income,598891.06,42.1,0.0,0.0 +AK,Fully phased out,83537.39,40.6,0.0,0.0 +AL,Ineligible,807295.2,56.8,0.0,0.0 +AL,No earned income,108302.58,7.6,0.0,0.0 AL,Pre-phase-in,3393.997,0.2,354.8626,0.0 AL,Full amount,579.7901,0.0,632.00006,0.0 AL,Partially phased out,10719.72,0.8,448.05634,0.0 -AL,Fully phased out,808538.3,56.9,0.0,0.0 -AR,No income,232860.75,34.1,0.0,0.0 +AL,Fully phased out,491831.62,34.6,0.0,0.0 +AR,Ineligible,365523.75,53.5,0.0,0.0 +AR,No earned income,42188.39,6.2,0.0,0.0 AR,Pre-phase-in,2328.134,0.3,453.59937,0.0 AR,Full amount,225.77205,0.0,632.00006,0.0 AR,Partially phased out,5891.0483,0.9,390.64108,0.0 -AR,Fully phased out,442536.25,64.7,0.0,0.0 -AZ,No income,672398.75,35.3,0.0,0.0 +AR,Fully phased out,267684.84,39.1,0.0,0.0 +AZ,Ineligible,1030924.9,54.1,0.0,0.0 +AZ,No earned income,118057.78,6.2,0.0,0.0 AZ,Pre-phase-in,16732.682,0.9,489.90863,0.0 AZ,Full amount,813.97723,0.0,631.99994,0.0 AZ,Partially phased out,14077.297,0.7,468.2071,0.0 -AZ,Fully phased out,1201599.2,63.1,0.0,0.0 -CA,No income,4351373.5,37.3,0.0,0.0 +AZ,Fully phased out,725015.44,38.0,0.0,0.0 +CA,Ineligible,6262195.5,53.6,0.0,0.0 +CA,No earned income,752618.44,6.4,0.0,0.0 CA,Pre-phase-in,169192.22,1.4,464.49402,220.66093 CA,Full amount,6347.9746,0.1,632.0,214.94342 CA,Partially phased out,129554.266,1.1,338.052,165.84186 -CA,Fully phased out,7020288.0,60.1,0.0,0.0 -CO,No income,508308.2,31.7,0.0,0.0 +CA,Fully phased out,4356847.0,37.3,0.0,0.0 +CO,Ineligible,845305.4,52.7,0.0,0.0 +CO,No earned income,91801.25,5.7,0.0,0.0 CO,Pre-phase-in,18100.334,1.1,495.57047,247.78523 CO,Full amount,607.85394,0.0,632.0,316.0 CO,Partially phased out,14817.716,0.9,390.9891,195.49455 -CO,Fully phased out,1061123.5,66.2,0.0,0.0 -CT,No income,375591.62,33.5,0.0,0.0 +CO,Fully phased out,632325.06,39.4,0.0,0.0 +CT,Ineligible,610852.06,54.5,0.0,0.0 +CT,No earned income,67626.59,6.0,0.0,0.0 CT,Pre-phase-in,16373.769,1.5,484.28445,193.71379 CT,Full amount,767.30066,0.1,631.99994,252.79996 CT,Partially phased out,9624.257,0.9,373.07062,149.22826 -CT,Fully phased out,717488.44,64.1,0.0,0.0 -DC,No income,108526.7,43.9,0.0,0.0 +CT,Fully phased out,414601.4,37.0,0.0,0.0 +DC,Ineligible,150828.56,61.0,0.0,0.0 +DC,No earned income,16620.902,6.7,0.0,0.0 DC,Pre-phase-in,2963.3503,1.2,494.57617,494.57617 DC,Full amount,184.6686,0.1,632.0,632.0 DC,Partially phased out,2235.9175,0.9,368.30032,631.9979 -DC,Fully phased out,133171.56,53.9,0.0,0.0 -DE,No income,96851.85,36.5,0.0,0.0 +DC,Fully phased out,74248.8,30.1,0.0,0.0 +DE,Ineligible,155854.33,58.8,0.0,0.0 +DE,No earned income,16751.723,6.3,0.0,0.0 DE,Pre-phase-in,1639.9989,0.6,510.83517,22.987583 DE,Full amount,147.16574,0.1,632.0001,28.440006 DE,Partially phased out,2595.5366,1.0,368.74863,32.519676 -DE,Fully phased out,163998.89,61.8,0.0,0.0 -FL,No income,2428692.2,35.6,0.0,0.0 +DE,Fully phased out,88244.69,33.3,0.0,0.0 +FL,Ineligible,3838385.8,56.2,0.0,0.0 +FL,No earned income,433466.7,6.3,0.0,0.0 FL,Pre-phase-in,75472.13,1.1,421.00208,0.0 FL,Full amount,162.57959,0.0,632.00006,0.0 FL,Partially phased out,46630.918,0.7,390.32828,0.0 -FL,Fully phased out,4277713.5,62.6,0.0,0.0 -GA,No income,1082267.2,37.7,0.0,0.0 +FL,Fully phased out,2434553.2,35.7,0.0,0.0 +GA,Ineligible,1553378.5,54.2,0.0,0.0 +GA,No earned income,188064.14,6.6,0.0,0.0 GA,Pre-phase-in,19097.996,0.7,465.8739,0.0 GA,Full amount,737.23004,0.0,632.00006,0.0 GA,Partially phased out,31459.322,1.1,345.65503,0.0 -GA,Fully phased out,1734347.4,60.5,0.0,0.0 -HI,No income,150656.95,37.5,0.0,0.0 +GA,Fully phased out,1075171.9,37.5,0.0,0.0 +HI,Ineligible,221502.23,55.2,0.0,0.0 +HI,No earned income,23955.72,6.0,0.0,0.0 HI,Pre-phase-in,5178.715,1.3,494.72058,197.88823 HI,Full amount,257.19754,0.1,632.0,252.79999 HI,Partially phased out,2631.8538,0.7,420.5297,168.21193 -HI,Fully phased out,242505.7,60.4,0.0,0.0 -IA,No income,251013.9,30.1,0.0,0.0 +HI,Fully phased out,147704.7,36.8,0.0,0.0 +IA,Ineligible,409482.44,49.0,0.0,0.0 +IA,No earned income,46174.527,5.5,0.0,0.0 IA,Pre-phase-in,15106.781,1.8,501.14404,75.171616 IA,Full amount,170.26959,0.0,632.00006,94.80001 IA,Partially phased out,4263.8096,0.5,567.887,85.18305 -IA,Fully phased out,564435.0,67.6,0.0,0.0 -ID,No income,124213.31,29.5,0.0,0.0 +IA,Fully phased out,359791.94,43.1,0.0,0.0 +ID,Ineligible,213084.25,50.7,0.0,0.0 +ID,No earned income,20815.344,4.9,0.0,0.0 ID,Pre-phase-in,4232.7524,1.0,488.3741,0.0 ID,Full amount,29.52343,0.0,631.99994,0.0 ID,Partially phased out,3326.125,0.8,406.90085,0.0 -ID,Fully phased out,288834.3,68.7,0.0,0.0 -IL,No income,1562858.9,38.5,0.0,0.0 +ID,Fully phased out,179148.03,42.6,0.0,0.0 +IL,Ineligible,2202561.0,54.2,0.0,0.0 +IL,No earned income,286114.75,7.0,0.0,0.0 IL,Pre-phase-in,56196.797,1.4,500.49594,100.099174 IL,Full amount,1363.2563,0.0,632.0,126.399994 IL,Partially phased out,36035.117,0.9,392.3396,78.46792 -IL,Fully phased out,2405379.0,59.2,0.0,0.0 -IN,No income,524384.75,30.7,0.0,0.0 +IL,Fully phased out,1479562.0,36.4,0.0,0.0 +IN,Ineligible,843974.44,49.4,0.0,0.0 +IN,No earned income,96952.6,5.7,0.0,0.0 IN,Pre-phase-in,15157.452,0.9,480.99515,48.099514 IN,Full amount,511.06155,0.0,632.0,63.2 IN,Partially phased out,10308.328,0.6,496.45984,49.64599 -IN,Fully phased out,1156921.9,67.8,0.0,0.0 -KS,No income,212148.61,28.4,0.0,0.0 +IN,Fully phased out,740379.7,43.4,0.0,0.0 +KS,Ineligible,358161.7,48.0,0.0,0.0 +KS,No earned income,40405.902,5.4,0.0,0.0 KS,Pre-phase-in,4779.117,0.6,466.8283,79.36082 KS,Full amount,216.41118,0.0,631.99994,107.44002 KS,Partially phased out,7120.175,1.0,342.1813,58.17083 -KS,Fully phased out,522227.38,70.0,0.0,0.0 -KY,No income,423462.44,37.7,0.0,0.0 +KS,Fully phased out,335808.38,45.0,0.0,0.0 +KY,Ineligible,602998.1,53.7,0.0,0.0 +KY,No earned income,70266.05,6.3,0.0,0.0 KY,Pre-phase-in,13095.383,1.2,494.37393,0.0 KY,Full amount,217.51695,0.0,632.0,0.0 KY,Partially phased out,9609.8,0.9,443.0421,0.0 -KY,Fully phased out,676532.6,60.2,0.0,0.0 -LA,No income,555979.4,44.3,0.0,0.0 +KY,Fully phased out,426730.9,38.0,0.0,0.0 +LA,Ineligible,704199.25,56.1,0.0,0.0 +LA,No earned income,105432.836,8.4,0.0,0.0 LA,Pre-phase-in,10768.384,0.9,475.93637,23.796818 LA,Full amount,408.5082,0.0,631.99994,31.6 LA,Partially phased out,9441.553,0.8,432.26416,21.613207 -LA,Fully phased out,678437.3,54.1,0.0,0.0 -MA,No income,924162.9,37.8,0.0,0.0 +LA,Fully phased out,424784.6,33.8,0.0,0.0 +MA,Ineligible,1367253.4,55.9,0.0,0.0 +MA,No earned income,163230.92,6.7,0.0,0.0 MA,Pre-phase-in,39122.17,1.6,497.64935,199.05975 MA,Full amount,775.058,0.0,632.00006,252.79997 MA,Partially phased out,21951.188,0.9,362.97803,145.19122 -MA,Fully phased out,1459471.0,59.7,0.0,0.0 -MD,No income,591329.2,34.0,0.0,0.0 +MA,Fully phased out,853149.5,34.9,0.0,0.0 +MD,Ineligible,926722.4,53.3,0.0,0.0 +MD,No earned income,97357.57,5.6,0.0,0.0 MD,Pre-phase-in,19581.512,1.1,469.82965,899.6156 MD,Full amount,673.052,0.0,631.9999,1145.5214 MD,Partially phased out,20056.42,1.2,350.75146,579.90436 -MD,Fully phased out,1105824.4,63.6,0.0,0.0 -ME,No income,160096.05,36.7,0.0,0.0 +MD,Fully phased out,673073.6,38.7,0.0,0.0 +ME,Ineligible,233543.02,53.5,0.0,0.0 +ME,No earned income,28112.352,6.4,0.0,0.0 ME,Pre-phase-in,3628.5383,0.8,479.49707,239.74854 ME,Full amount,66.5702,0.0,632.0,316.0 ME,Partially phased out,5021.548,1.2,354.7684,177.3842 -ME,Fully phased out,267842.22,61.3,0.0,0.0 -MI,No income,1127146.1,38.2,0.0,0.0 +ME,Fully phased out,166282.9,38.1,0.0,0.0 +MI,Ineligible,1612950.5,54.7,0.0,0.0 +MI,No earned income,202805.86,6.9,0.0,0.0 MI,Pre-phase-in,40620.59,1.4,500.584,150.1752 MI,Full amount,1816.3854,0.1,632.0,189.59996 MI,Partially phased out,16553.447,0.6,518.3705,155.51114 -MI,Fully phased out,1761326.0,59.8,0.0,0.0 -MN,No income,477418.0,30.2,0.0,0.0 +MI,Fully phased out,1072715.8,36.4,0.0,0.0 +MN,Ineligible,777345.56,49.2,0.0,0.0 +MN,No earned income,92785.24,5.9,0.0,0.0 MN,Pre-phase-in,24572.006,1.6,499.59137,515.8987 MN,Full amount,607.66266,0.0,631.9999,637.73914 MN,Partially phased out,13705.589,0.9,391.5037,578.87744 -MN,Fully phased out,1063629.9,67.3,0.0,0.0 -MO,No income,557544.7,35.5,0.0,0.0 +MN,Fully phased out,670917.06,42.5,0.0,0.0 +MO,Ineligible,835192.44,53.1,0.0,0.0 +MO,No earned income,104288.97,6.6,0.0,0.0 MO,Pre-phase-in,12013.582,0.8,484.0781,96.81563 MO,Full amount,537.36273,0.0,632.0,126.4 MO,Partially phased out,12784.181,0.8,442.70206,88.54041 -MO,Fully phased out,989594.4,62.9,0.0,0.0 -MS,No income,300984.9,40.0,0.0,0.0 +MO,Fully phased out,607657.7,38.6,0.0,0.0 +MS,Ineligible,419520.0,55.8,0.0,0.0 +MS,No earned income,45843.477,6.1,0.0,0.0 MS,Pre-phase-in,2575.797,0.3,406.55896,0.0 MS,Full amount,166.1088,0.0,632.0,0.0 MS,Partially phased out,7846.9844,1.0,404.25897,0.0 -MS,Fully phased out,440284.22,58.6,0.0,0.0 -MT,No income,104050.336,32.3,0.0,0.0 +MS,Fully phased out,275905.66,36.7,0.0,0.0 +MT,Ineligible,167522.28,51.9,0.0,0.0 +MT,No earned income,18607.895,5.8,0.0,0.0 MT,Pre-phase-in,2331.9133,0.7,462.88824,46.288826 MT,Full amount,77.42616,0.0,631.99994,63.2 MT,Partially phased out,3967.5837,1.2,334.9668,33.49668 -MT,Fully phased out,212178.6,65.8,0.0,0.0 -NC,No income,1207476.5,40.0,0.0,0.0 +MT,Fully phased out,130098.75,40.3,0.0,0.0 +NC,Ineligible,1696681.8,56.2,0.0,0.0 +NC,No earned income,207752.4,6.9,0.0,0.0 NC,Pre-phase-in,11699.434,0.4,451.06644,0.0 NC,Full amount,875.7679,0.0,632.00006,0.0 NC,Partially phased out,24799.03,0.8,392.04834,0.0 -NC,Fully phased out,1773596.9,58.8,0.0,0.0 -ND,No income,49795.652,23.9,0.0,0.0 +NC,Fully phased out,1076639.2,35.7,0.0,0.0 +ND,Ineligible,93042.516,44.6,0.0,0.0 +ND,No earned income,10131.772,4.9,0.0,0.0 ND,Pre-phase-in,4071.564,2.0,507.6821,0.0 ND,Full amount,3.9011254,0.0,632.0,0.0 ND,Partially phased out,932.3558,0.4,620.9536,0.0 -ND,Fully phased out,153755.1,73.7,0.0,0.0 -NE,No income,153839.03,27.7,0.0,0.0 +ND,Fully phased out,100376.46,48.1,0.0,0.0 +NE,Ineligible,259955.0,46.8,0.0,0.0 +NE,No earned income,30927.43,5.6,0.0,0.0 NE,Pre-phase-in,4230.322,0.8,484.84406,48.48441 NE,Full amount,54.920815,0.0,631.99994,63.20001 NE,Partially phased out,5635.957,1.0,389.42334,38.942337 -NE,Fully phased out,391285.56,70.5,0.0,0.0 -NH,No income,106673.484,22.9,0.0,0.0 +NE,Fully phased out,254242.17,45.8,0.0,0.0 +NH,Ineligible,224852.69,48.2,0.0,0.0 +NH,No earned income,22175.135,4.8,0.0,0.0 NH,Pre-phase-in,10499.317,2.3,341.3794,0.0 NH,Full amount,0.36059391,0.0,631.9999,0.0 NH,Partially phased out,4283.302,0.9,369.89697,0.0 -NH,Fully phased out,344719.75,73.9,0.0,0.0 -NJ,No income,739419.6,27.7,0.0,0.0 +NH,Fully phased out,204365.39,43.8,0.0,0.0 +NJ,Ineligible,1327728.9,49.7,0.0,0.0 +NJ,No earned income,152476.06,5.7,0.0,0.0 NJ,Pre-phase-in,44810.92,1.7,469.32755,187.73102 NJ,Full amount,816.7071,0.0,632.00006,252.79999 NJ,Partially phased out,28470.457,1.1,357.7779,143.11118 -NJ,Fully phased out,1856987.6,69.5,0.0,0.0 -NM,No income,325146.88,48.2,0.0,0.0 +NJ,Fully phased out,1116202.4,41.8,0.0,0.0 +NM,Ineligible,404188.94,59.9,0.0,0.0 +NM,No earned income,55271.652,8.2,0.0,0.0 NM,Pre-phase-in,8052.7715,1.2,507.6397,126.90993 NM,Full amount,333.1007,0.0,632.00006,158.00002 NM,Partially phased out,6802.738,1.0,365.1997,91.29993 -NM,Fully phased out,334468.0,49.6,0.0,0.0 -NV,No income,348538.47,36.2,0.0,0.0 +NM,Fully phased out,200154.3,29.7,0.0,0.0 +NV,Ineligible,506265.03,52.6,0.0,0.0 +NV,No earned income,57386.582,6.0,0.0,0.0 NV,Pre-phase-in,14720.673,1.5,504.0872,0.0 NV,Full amount,148.53976,0.0,632.0,0.0 NV,Partially phased out,8670.565,0.9,444.6631,0.0 -NV,Fully phased out,590726.06,61.4,0.0,0.0 -NY,No income,2374925.0,39.0,0.0,0.0 +NV,Fully phased out,375612.94,39.0,0.0,0.0 +NY,Ineligible,3385627.2,55.6,0.0,0.0 +NY,No earned income,414966.78,6.8,0.0,0.0 NY,Pre-phase-in,84690.75,1.4,473.9047,142.17131 NY,Full amount,4793.9814,0.1,632.00006,187.95201 NY,Partially phased out,57803.137,0.9,371.3948,86.73438 -NY,Fully phased out,3567282.8,58.6,0.0,0.0 -OH,No income,1122261.2,35.4,0.0,0.0 +NY,Fully phased out,2141613.5,35.2,0.0,0.0 +OH,Ineligible,1660358.1,52.4,0.0,0.0 +OH,No earned income,201214.5,6.3,0.0,0.0 OH,Pre-phase-in,35480.473,1.1,496.31277,148.89384 OH,Full amount,1062.4825,0.0,632.00006,189.60002 OH,Partially phased out,27212.557,0.9,445.73523,133.72055 -OH,Fully phased out,1985389.1,62.6,0.0,0.0 -OK,No income,449458.44,39.4,0.0,0.0 +OH,Fully phased out,1246077.8,39.3,0.0,0.0 +OK,Ineligible,601921.5,52.7,0.0,0.0 +OK,No earned income,77404.78,6.8,0.0,0.0 OK,Pre-phase-in,12928.621,1.1,493.86475,24.441998 OK,Full amount,425.45355,0.0,631.9999,26.308289 OK,Partially phased out,7212.5874,0.6,491.54944,14.299281 -OK,Fully phased out,671719.44,58.8,0.0,0.0 -OR,No income,580960.9,42.0,0.0,0.0 +OK,Fully phased out,441851.6,38.7,0.0,0.0 +OR,Ineligible,796490.7,57.5,0.0,0.0 +OR,No earned income,91706.81,6.6,0.0,0.0 OR,Pre-phase-in,13906.58,1.0,489.80258,44.082233 OR,Full amount,593.1798,0.0,631.99994,56.879993 OR,Partially phased out,13682.355,1.0,343.76025,30.938423 -OR,Fully phased out,775251.25,56.0,0.0,0.0 -PA,No income,1519688.8,37.5,0.0,0.0 +OR,Fully phased out,468014.56,33.8,0.0,0.0 +PA,Ineligible,2221587.5,54.8,0.0,0.0 +PA,No earned income,261490.27,6.4,0.0,0.0 PA,Pre-phase-in,57390.523,1.4,501.43756,0.0 PA,Full amount,1836.4663,0.0,632.0,0.0 PA,Partially phased out,29079.27,0.7,390.8903,0.0 -PA,Fully phased out,2449417.2,60.4,0.0,0.0 -RI,No income,134419.86,33.8,0.0,0.0 +PA,Fully phased out,1486028.4,36.6,0.0,0.0 +RI,Ineligible,197708.66,49.7,0.0,0.0 +RI,No earned income,23615.35,5.9,0.0,0.0 RI,Pre-phase-in,6746.8735,1.7,491.96518,78.71443 RI,Full amount,183.33531,0.0,632.0,101.12 RI,Partially phased out,2940.4941,0.7,418.9308,67.02892 -RI,Fully phased out,253292.34,63.7,0.0,0.0 -SC,No income,515461.88,37.1,0.0,0.0 +RI,Fully phased out,166388.2,41.8,0.0,0.0 +SC,Ineligible,769749.3,55.5,0.0,0.0 +SC,No earned income,89473.05,6.4,0.0,0.0 SC,Pre-phase-in,5157.9634,0.4,435.94592,544.9495 SC,Full amount,683.06305,0.0,631.99994,789.99994 SC,Partially phased out,10066.327,0.7,464.98962,581.2448 -SC,Fully phased out,856582.0,61.7,0.0,0.0 -SD,No income,66645.664,25.9,0.0,0.0 +SC,Fully phased out,512821.5,36.9,0.0,0.0 +SD,Ineligible,118655.95,46.1,0.0,0.0 +SD,No earned income,16588.975,6.4,0.0,0.0 SD,Pre-phase-in,1245.0791,0.5,504.92484,0.0 SD,Full amount,0.10412951,0.0,632.00006,0.0 SD,Partially phased out,2807.9924,1.1,332.0377,0.0 -SD,Fully phased out,186959.94,72.6,0.0,0.0 -TN,No income,638658.2,30.0,0.0,0.0 +SD,Fully phased out,118360.67,45.9,0.0,0.0 +TN,Ineligible,1009965.5,47.5,0.0,0.0 +TN,No earned income,132509.69,6.2,0.0,0.0 TN,Pre-phase-in,26534.91,1.2,403.13788,0.0 TN,Full amount,507.1686,0.0,631.9999,0.0 TN,Partially phased out,10116.405,0.5,525.4987,0.0 -TN,Fully phased out,1450007.2,68.2,0.0,0.0 -TX,No income,2445616.0,29.6,0.0,0.0 +TN,Fully phased out,946190.3,44.5,0.0,0.0 +TX,Ineligible,4019368.2,48.6,0.0,0.0 +TX,No earned income,487542.28,5.9,0.0,0.0 TX,Pre-phase-in,152565.27,1.8,480.83218,0.0 TX,Full amount,196.01257,0.0,632.00006,0.0 TX,Partially phased out,57381.688,0.7,500.75223,0.0 -TX,Fully phased out,5614733.5,67.9,0.0,0.0 -UT,No income,199810.95,27.4,0.0,0.0 +TX,Fully phased out,3553438.8,43.0,0.0,0.0 +UT,Ineligible,345823.22,47.5,0.0,0.0 +UT,No earned income,39239.773,5.4,0.0,0.0 UT,Pre-phase-in,6227.8457,0.9,467.4281,0.0 UT,Full amount,130.31685,0.0,631.9999,0.0 UT,Partially phased out,10770.686,1.5,357.29318,0.0 -UT,Fully phased out,511761.94,70.2,0.0,0.0 -VA,No income,796918.2,33.9,0.0,0.0 +UT,Fully phased out,326509.88,44.8,0.0,0.0 +VA,Ineligible,1276002.0,54.3,0.0,0.0 +VA,No earned income,147874.81,6.3,0.0,0.0 VA,Pre-phase-in,11077.845,0.5,431.60608,64.74091 VA,Full amount,626.7285,0.0,632.0,94.8 VA,Partially phased out,25376.033,1.1,369.48682,55.479294 -VA,Fully phased out,1514494.6,64.5,0.0,0.0 -VT,No income,74611.41,35.0,0.0,0.0 +VA,Fully phased out,887535.94,37.8,0.0,0.0 +VT,Ineligible,111693.305,52.4,0.0,0.0 +VT,No earned income,12993.281,6.1,0.0,0.0 VT,Pre-phase-in,2901.803,1.4,506.58264,192.5014 VT,Full amount,82.56235,0.0,631.99994,240.16003 VT,Partially phased out,1989.4019,0.9,358.7914,136.34074 -VT,Fully phased out,133518.0,62.7,0.0,0.0 -WA,No income,823241.4,30.4,0.0,0.0 +VT,Fully phased out,83442.83,39.2,0.0,0.0 +WA,Ineligible,1369159.1,50.5,0.0,0.0 +WA,No earned income,155331.19,5.7,0.0,0.0 WA,Pre-phase-in,66998.695,2.5,508.81302,324.99997 WA,Full amount,9.786905,0.0,632.0,324.99997 WA,Partially phased out,17477.504,0.6,477.79034,324.9583 -WA,Fully phased out,1801335.0,66.5,0.0,0.0 -WI,No income,611358.3,35.3,0.0,0.0 +WA,Fully phased out,1100086.0,40.6,0.0,0.0 +WI,Ineligible,923438.06,53.3,0.0,0.0 +WI,No earned income,111346.82,6.4,0.0,0.0 WI,Pre-phase-in,15312.648,0.9,452.62857,0.0 WI,Full amount,774.9501,0.0,632.0,0.0 WI,Partially phased out,12071.585,0.7,421.98117,0.0 -WI,Fully phased out,1092418.4,63.1,0.0,0.0 -WV,No income,229108.64,44.1,0.0,0.0 +WI,Fully phased out,668991.9,38.6,0.0,0.0 +WV,Ineligible,302051.8,58.1,0.0,0.0 +WV,No earned income,36730.87,7.1,0.0,0.0 WV,Pre-phase-in,5850.024,1.1,508.3707,0.0 WV,Full amount,620.71576,0.1,631.99994,0.0 WV,Partially phased out,3761.238,0.7,542.116,0.0 -WV,Fully phased out,280251.75,53.9,0.0,0.0 -WY,No income,44996.188,26.4,0.0,0.0 +WV,Fully phased out,170577.69,32.8,0.0,0.0 +WY,Ineligible,83720.836,49.2,0.0,0.0 +WY,No earned income,9110.276,5.4,0.0,0.0 WY,Pre-phase-in,2233.8857,1.3,320.7945,0.0 WY,Full amount,86.93167,0.1,632.0,0.0 WY,Partially phased out,803.0245,0.5,622.7631,0.0 -WY,Fully phased out,122062.08,71.7,0.0,0.0 +WY,Fully phased out,74227.15,43.6,0.0,0.0 diff --git a/eitc_childless_analysis/eitc_childless_phase_status_summary_2025.csv b/eitc_childless_analysis/eitc_childless_phase_status_summary_2025.csv index a2ea8be..8dd6f83 100644 --- a/eitc_childless_analysis/eitc_childless_phase_status_summary_2025.csv +++ b/eitc_childless_analysis/eitc_childless_phase_status_summary_2025.csv @@ -1,255 +1,306 @@ state,eitc_phase_status,weighted_households,pct_of_state,avg_federal_eitc,avg_state_eitc -AK,No income,64630.094,31.1,0.0,0.0 +AK,Ineligible,104066.336,50.1,0.0,0.0 +AK,No earned income,13997.345,6.7,0.0,0.0 AK,Pre-phase-in,3626.4304,1.7,540.8136,0.0 AK,Full amount,0.2665671,0.0,649.0,0.0 AK,Partially phased out,1685.9508,0.8,627.0948,0.0 -AK,Fully phased out,137746.2,66.3,0.0,0.0 -AL,No income,600602.44,41.8,0.0,0.0 +AK,Fully phased out,84312.62,40.6,0.0,0.0 +AL,Ineligible,814817.6,56.8,0.0,0.0 +AL,No earned income,109308.14,7.6,0.0,0.0 AL,Pre-phase-in,3424.464,0.2,372.11118,0.0 AL,Full amount,586.21875,0.0,648.9999,0.0 AL,Partially phased out,10817.388,0.8,439.31036,0.0 -AL,Fully phased out,819896.5,57.1,0.0,0.0 -AR,No income,233882.83,33.9,0.0,0.0 +AL,Fully phased out,496373.16,34.6,0.0,0.0 +AR,Ineligible,368928.66,53.5,0.0,0.0 +AR,No earned income,42580.1,6.2,0.0,0.0 AR,Pre-phase-in,2349.7502,0.3,475.75613,0.0 AR,Full amount,227.07906,0.0,649.0,0.0 AR,Partially phased out,5943.537,0.9,379.36224,0.0 -AR,Fully phased out,447788.06,64.9,0.0,0.0 -AZ,No income,676085.4,35.2,0.0,0.0 +AR,Fully phased out,270162.16,39.1,0.0,0.0 +AZ,Ineligible,1040525.9,54.1,0.0,0.0 +AZ,No earned income,119153.92,6.2,0.0,0.0 AZ,Pre-phase-in,16887.523,0.9,513.83417,0.0 AZ,Full amount,821.8081,0.0,649.00006,0.0 AZ,Partially phased out,14207.117,0.7,460.64685,0.0 -AZ,Fully phased out,1215313.5,63.2,0.0,0.0 -CA,No income,4375658.5,37.1,0.0,0.0 +AZ,Fully phased out,731719.1,38.0,0.0,0.0 +CA,Ineligible,6320594.5,53.6,0.0,0.0 +CA,No earned income,759606.3,6.4,0.0,0.0 CA,Pre-phase-in,170760.03,1.4,487.17966,225.75891 CA,Full amount,6409.468,0.1,649.0,217.95975 CA,Partially phased out,130714.73,1.1,324.05356,164.7 -CA,Fully phased out,7101629.0,60.3,0.0,0.0 -CO,No income,511078.1,31.6,0.0,0.0 +CA,Fully phased out,4397087.0,37.3,0.0,0.0 +CO,Ineligible,853170.56,52.7,0.0,0.0 +CO,No earned income,92653.6,5.7,0.0,0.0 CO,Pre-phase-in,18268.05,1.1,519.7745,181.92107 CO,Full amount,613.7801,0.0,648.99994,227.15 CO,Partially phased out,14953.744,0.9,379.74893,132.91211 -CO,Fully phased out,1072927.0,66.3,0.0,0.0 -CT,No income,377874.75,33.4,0.0,0.0 +CO,Fully phased out,638180.9,39.4,0.0,0.0 +CT,Ineligible,616535.9,54.5,0.0,0.0 +CT,No earned income,68254.49,6.0,0.0,0.0 CT,Pre-phase-in,16525.797,1.5,507.93967,203.17586 CT,Full amount,774.36786,0.1,649.00006,259.6 CT,Partially phased out,9713.412,0.9,360.43723,144.1749 -CT,Fully phased out,725354.56,64.2,0.0,0.0 -DC,No income,109168.22,43.8,0.0,0.0 +CT,Fully phased out,418438.97,37.0,0.0,0.0 +DC,Ineligible,152229.14,61.0,0.0,0.0 +DC,No earned income,16775.225,6.7,0.0,0.0 DC,Pre-phase-in,2990.8645,1.2,518.73395,518.73395 DC,Full amount,186.38322,0.1,648.99994,648.99994 DC,Partially phased out,2256.2388,0.9,355.38052,648.99664 -DC,Fully phased out,134774.6,54.0,0.0,0.0 -DE,No income,97403.21,36.4,0.0,0.0 +DC,Fully phased out,74938.46,30.1,0.0,0.0 +DE,Ineligible,157301.47,58.8,0.0,0.0 +DE,No earned income,16907.258,6.3,0.0,0.0 DE,Pre-phase-in,1655.2258,0.6,535.7872,24.110426 DE,Full amount,148.53215,0.1,648.9998,29.219957 DE,Partially phased out,2619.2864,1.0,356.51498,30.245703 -DE,Fully phased out,165869.83,62.0,0.0,0.0 -FL,No income,2441849.5,35.4,0.0,0.0 +DE,Fully phased out,89064.305,33.3,0.0,0.0 +FL,Ineligible,3874125.5,56.2,0.0,0.0 +FL,No earned income,437491.3,6.3,0.0,0.0 FL,Pre-phase-in,76172.52,1.1,441.5651,0.0 FL,Full amount,164.23816,0.0,649.0,0.0 FL,Partially phased out,47063.992,0.7,379.02612,0.0 -FL,Fully phased out,4326824.0,62.8,0.0,0.0 -GA,No income,1087212.8,37.6,0.0,0.0 +FL,Fully phased out,2457056.5,35.7,0.0,0.0 +GA,Ineligible,1567838.5,54.2,0.0,0.0 +GA,No earned income,189810.28,6.6,0.0,0.0 GA,Pre-phase-in,19275.111,0.7,488.62866,0.0 GA,Full amount,744.164,0.0,649.0,0.0 GA,Partially phased out,31712.979,1.1,332.53235,0.0 -GA,Fully phased out,1755592.1,60.7,0.0,0.0 -HI,No income,151569.28,37.4,0.0,0.0 +GA,Fully phased out,1085156.0,37.5,0.0,0.0 +HI,Ineligible,223566.28,55.2,0.0,0.0 +HI,No earned income,24178.145,6.0,0.0,0.0 HI,Pre-phase-in,5226.798,1.3,518.88556,207.55426 HI,Full amount,259.58557,0.1,649.0,259.6 HI,Partially phased out,2655.7375,0.7,410.4888,164.19553 -HI,Fully phased out,245244.36,60.6,0.0,0.0 -IA,No income,252454.45,30.0,0.0,0.0 +HI,Fully phased out,149069.22,36.8,0.0,0.0 +IA,Ineligible,413299.25,49.0,0.0,0.0 +IA,No earned income,46603.25,5.5,0.0,0.0 IA,Pre-phase-in,15246.479,1.8,525.61804,78.842705 IA,Full amount,172.41685,0.0,649.00006,97.350006 IA,Partially phased out,4300.986,0.5,565.4114,84.81173 -IA,Fully phased out,570568.1,67.7,0.0,0.0 -ID,No income,124885.96,29.4,0.0,0.0 +IA,Fully phased out,363120.06,43.1,0.0,0.0 +ID,Ineligible,215063.75,50.7,0.0,0.0 +ID,No earned income,21008.61,4.9,0.0,0.0 ID,Pre-phase-in,4272.0527,1.0,512.2291,0.0 ID,Full amount,29.752382,0.0,649.0,0.0 ID,Partially phased out,3353.1333,0.8,396.75974,0.0 -ID,Fully phased out,292000.62,68.8,0.0,0.0 -IL,No income,1571268.5,38.3,0.0,0.0 +ID,Fully phased out,180814.25,42.6,0.0,0.0 +IL,Ineligible,2223068.0,54.2,0.0,0.0 +IL,No earned income,288771.25,7.0,0.0,0.0 IL,Pre-phase-in,56717.453,1.4,524.9404,104.98808 IL,Full amount,1376.8824,0.0,649.0,129.8 IL,Partially phased out,36363.61,0.9,381.12885,76.22578 -IL,Fully phased out,2433819.8,59.4,0.0,0.0 -IN,No income,526058.25,30.5,0.0,0.0 +IL,Fully phased out,1493248.9,36.4,0.0,0.0 +IN,Ineligible,851842.56,49.4,0.0,0.0 +IN,No earned income,97852.78,5.7,0.0,0.0 IN,Pre-phase-in,15297.608,0.9,504.4841,50.4484 IN,Full amount,515.559,0.0,649.0,64.899994 IN,Partially phased out,10397.022,0.6,490.227,49.0227 -IN,Fully phased out,1170866.9,67.9,0.0,0.0 -KS,No income,213326.72,28.3,0.0,0.0 +IN,Fully phased out,747229.75,43.4,0.0,0.0 +KS,Ineligible,361496.38,48.0,0.0,0.0 +KS,No earned income,40781.062,5.4,0.0,0.0 KS,Pre-phase-in,4823.49,0.6,489.63092,83.23727 KS,Full amount,218.4205,0.0,649.00006,110.330025 KS,Partially phased out,7186.284,1.0,328.44672,55.83595 -KS,Fully phased out,527867.8,70.1,0.0,0.0 -KY,No income,425535.8,37.5,0.0,0.0 +KS,Fully phased out,338917.06,45.0,0.0,0.0 +KY,Ineligible,608619.8,53.7,0.0,0.0 +KY,No earned income,70918.45,6.3,0.0,0.0 KY,Pre-phase-in,13215.555,1.2,518.5075,0.0 KY,Full amount,220.27727,0.0,648.99994,0.0 KY,Partially phased out,9690.798,0.9,434.6611,0.0 -KY,Fully phased out,684681.4,60.4,0.0,0.0 -LA,No income,558228.4,44.1,0.0,0.0 +KY,Fully phased out,430678.94,38.0,0.0,0.0 +LA,Ineligible,710803.8,56.1,0.0,0.0 +LA,No earned income,106411.75,8.4,0.0,0.0 LA,Pre-phase-in,10867.544,0.9,499.1724,24.95862 LA,Full amount,412.60556,0.0,649.0,32.450005 LA,Partially phased out,9523.345,0.8,422.9088,21.14544 -LA,Fully phased out,687655.94,54.3,0.0,0.0 -MA,No income,929485.7,37.7,0.0,0.0 +LA,Fully phased out,428668.78,33.8,0.0,0.0 +MA,Ineligible,1379958.9,55.9,0.0,0.0 +MA,No earned income,164746.48,6.7,0.0,0.0 MA,Pre-phase-in,39485.215,1.6,521.95667,208.78267 MA,Full amount,782.3858,0.0,649.0,259.60004 MA,Partially phased out,22153.469,0.9,349.9549,139.98196 -MA,Fully phased out,1476281.2,59.8,0.0,0.0 -MD,No income,594474.6,33.9,0.0,0.0 +MA,Fully phased out,861061.6,34.9,0.0,0.0 +MD,Ineligible,935364.0,53.3,0.0,0.0 +MD,No earned income,98261.51,5.6,0.0,0.0 MD,Pre-phase-in,19762.99,1.1,492.7761,974.1709 MD,Full amount,679.5751,0.0,649.00006,1213.9137 MD,Partially phased out,20237.688,1.2,337.385,566.70026 -MD,Fully phased out,1118441.6,63.8,0.0,0.0 -ME,No income,160843.88,36.5,0.0,0.0 +MD,Fully phased out,679290.75,38.7,0.0,0.0 +ME,Ineligible,235719.73,53.5,0.0,0.0 +ME,No earned income,28373.367,6.4,0.0,0.0 ME,Pre-phase-in,3662.2285,0.8,502.9185,251.45924 ME,Full amount,67.18829,0.0,648.99994,324.49997 ME,Partially phased out,5068.026,1.1,341.45905,170.72952 -ME,Fully phased out,271067.88,61.5,0.0,0.0 -MI,No income,1133642.9,38.1,0.0,0.0 +ME,Fully phased out,167818.64,38.1,0.0,0.0 +MI,Ineligible,1627968.5,54.7,0.0,0.0 +MI,No earned income,204688.86,6.9,0.0,0.0 MI,Pre-phase-in,40997.25,1.4,525.0338,157.51013 MI,Full amount,1833.6553,0.1,649.0001,194.70001 MI,Partially phased out,16704.406,0.6,513.17175,153.95154 -MI,Fully phased out,1781650.9,59.9,0.0,0.0 -MN,No income,480188.66,30.1,0.0,0.0 +MI,Fully phased out,1082636.5,36.4,0.0,0.0 +MN,Ineligible,784575.2,49.2,0.0,0.0 +MN,No earned income,93646.734,5.9,0.0,0.0 MN,Pre-phase-in,24799.537,1.6,523.99115,541.0607 MN,Full amount,613.8611,0.0,649.00006,653.1152 MN,Partially phased out,13830.196,0.9,379.9242,576.283 -MN,Fully phased out,1075170.2,67.4,0.0,0.0 -MO,No income,559638.9,35.3,0.0,0.0 +MN,Fully phased out,677136.94,42.5,0.0,0.0 +MO,Ineligible,842984.0,53.1,0.0,0.0 +MO,No earned income,105257.266,6.6,0.0,0.0 MO,Pre-phase-in,12124.915,0.8,507.72104,101.544205 MO,Full amount,542.54004,0.0,648.9999,129.8 MO,Partially phased out,12898.628,0.8,434.0235,86.804695 -MO,Fully phased out,1001869.3,63.1,0.0,0.0 -MS,No income,301091.4,39.7,0.0,0.0 +MO,Fully phased out,613266.94,38.6,0.0,0.0 +MS,Ineligible,423436.22,55.8,0.0,0.0 +MS,No earned income,46269.12,6.1,0.0,0.0 MS,Pre-phase-in,2598.4688,0.3,426.31012,0.0 MS,Full amount,168.39967,0.0,649.0,0.0 MS,Partially phased out,7911.6304,1.0,394.04974,0.0 -MS,Fully phased out,447068.94,58.9,0.0,0.0 -MT,No income,104646.91,32.1,0.0,0.0 +MS,Fully phased out,278455.0,36.7,0.0,0.0 +MT,Ineligible,169079.56,51.9,0.0,0.0 +MT,No earned income,18780.664,5.8,0.0,0.0 MT,Pre-phase-in,2353.5647,0.7,485.49875,48.549877 MT,Full amount,78.14505,0.0,648.99994,64.899994 MT,Partially phased out,4004.4219,1.2,320.77896,32.0779 -MT,Fully phased out,214518.14,65.9,0.0,0.0 -NC,No income,1213348.5,39.8,0.0,0.0 +MT,Fully phased out,131304.81,40.3,0.0,0.0 +NC,Ineligible,1712466.1,56.2,0.0,0.0 +NC,No earned income,209681.34,6.9,0.0,0.0 NC,Pre-phase-in,11808.03,0.4,473.09875,0.0 NC,Full amount,883.92944,0.0,649.0,0.0 NC,Partially phased out,25027.451,0.8,380.5623,0.0 -NC,Fully phased out,1795405.4,58.9,0.0,0.0 -ND,No income,50085.26,23.8,0.0,0.0 +NC,Fully phased out,1086606.4,35.7,0.0,0.0 +ND,Ineligible,93906.45,44.6,0.0,0.0 +ND,No earned income,10225.844,4.9,0.0,0.0 ND,Pre-phase-in,4108.928,2.0,532.4672,0.0 ND,Full amount,4.3769407,0.0,649.00006,0.0 ND,Partially phased out,941.0125,0.4,620.9985,0.0 -ND,Fully phased out,155355.4,73.8,0.0,0.0 -NE,No income,154722.36,27.6,0.0,0.0 +ND,Fully phased out,101308.375,48.1,0.0,0.0 +NE,Ineligible,262389.75,46.8,0.0,0.0 +NE,No earned income,31214.582,5.6,0.0,0.0 NE,Pre-phase-in,4269.599,0.8,508.52655,50.85265 NE,Full amount,55.43074,0.0,648.99994,64.899994 NE,Partially phased out,5687.6196,1.0,378.20206,37.82021 -NE,Fully phased out,395464.25,70.6,0.0,0.0 -NH,No income,107312.2,22.8,0.0,0.0 +NE,Fully phased out,256582.3,45.8,0.0,0.0 +NH,Ineligible,226942.31,48.2,0.0,0.0 +NH,No earned income,22381.025,4.8,0.0,0.0 NH,Pre-phase-in,10596.802,2.3,358.05423,0.0 NH,Full amount,0.36394194,0.0,649.0,0.0 NH,Partially phased out,4323.072,0.9,357.67905,0.0 -NH,Fully phased out,348272.1,74.0,0.0,0.0 -NJ,No income,744035.94,27.6,0.0,0.0 +NH,Fully phased out,206260.97,43.8,0.0,0.0 +NJ,Ineligible,1340070.0,49.7,0.0,0.0 +NJ,No earned income,153891.77,5.7,0.0,0.0 NJ,Pre-phase-in,45226.242,1.7,492.24957,196.89983 NJ,Full amount,824.88654,0.0,649.00006,259.60004 NJ,Partially phased out,28732.479,1.1,344.521,137.80841 -NJ,Fully phased out,1876480.9,69.6,0.0,0.0 -NM,No income,326654.53,48.0,0.0,0.0 +NJ,Fully phased out,1126555.0,41.8,0.0,0.0 +NM,Ineligible,407959.84,59.9,0.0,0.0 +NM,No earned income,55784.84,8.2,0.0,0.0 NM,Pre-phase-in,8126.212,1.2,532.416,133.104 NM,Full amount,337.28604,0.0,649.0,162.25 NM,Partially phased out,6862.6577,1.0,352.83185,88.20796 -NM,Fully phased out,339088.2,49.8,0.0,0.0 -NV,No income,350563.6,36.1,0.0,0.0 +NM,Fully phased out,201998.03,29.7,0.0,0.0 +NV,Ineligible,510968.28,52.6,0.0,0.0 +NV,No earned income,57919.402,6.0,0.0,0.0 NV,Pre-phase-in,14857.255,1.5,528.70886,0.0 NV,Full amount,150.01535,0.0,649.00006,0.0 NV,Partially phased out,8750.994,0.9,436.09225,0.0 -NV,Fully phased out,597421.9,61.5,0.0,0.0 -NY,No income,2387931.2,38.9,0.0,0.0 +NV,Fully phased out,379097.78,39.0,0.0,0.0 +NY,Ineligible,3417135.0,55.6,0.0,0.0 +NY,No earned income,418819.66,6.8,0.0,0.0 NY,Pre-phase-in,85475.5,1.4,497.05002,148.67133 NY,Full amount,4840.017,0.1,648.99994,191.7149 NY,Partially phased out,58324.44,0.9,358.95114,83.15456 -NY,Fully phased out,3609463.8,58.7,0.0,0.0 -OH,No income,1128293.6,35.2,0.0,0.0 +NY,Fully phased out,2161440.5,35.2,0.0,0.0 +OH,Ineligible,1675823.5,52.4,0.0,0.0 +OH,No earned income,203082.73,6.3,0.0,0.0 OH,Pre-phase-in,35809.71,1.1,520.55475,156.16643 OH,Full amount,1071.496,0.0,649.0,194.7 OH,Partially phased out,27460.96,0.9,437.14362,131.14308 -OH,Fully phased out,2008216.0,62.7,0.0,0.0 -OK,No income,451488.94,39.2,0.0,0.0 +OH,Fully phased out,1257603.4,39.3,0.0,0.0 +OK,Ineligible,607532.3,52.7,0.0,0.0 +OK,No earned income,78123.47,6.8,0.0,0.0 OK,Pre-phase-in,13047.681,1.1,517.97766,25.205536 OK,Full amount,430.25494,0.0,649.0,25.723408 OK,Partially phased out,7278.75,0.6,485.13818,12.743043 -OK,Fully phased out,680099.75,59.0,0.0,0.0 -OR,No income,584257.56,41.8,0.0,0.0 +OK,Fully phased out,445932.9,38.7,0.0,0.0 +OR,Ineligible,803898.6,57.5,0.0,0.0 +OR,No earned income,92558.29,6.6,0.0,0.0 OR,Pre-phase-in,14035.583,1.0,513.72626,46.235363 OR,Full amount,598.7314,0.0,649.00006,58.409996 OR,Partially phased out,13806.066,1.0,329.78485,29.680637 -OR,Fully phased out,784550.06,56.1,0.0,0.0 -PA,No income,1528526.2,37.3,0.0,0.0 +OR,Fully phased out,472350.72,33.8,0.0,0.0 +PA,Ineligible,2242255.8,54.8,0.0,0.0 +PA,No earned income,263918.12,6.4,0.0,0.0 PA,Pre-phase-in,57922.355,1.4,525.9283,0.0 PA,Full amount,1853.9569,0.0,649.0,0.0 PA,Partially phased out,29319.182,0.7,379.9078,0.0 -PA,Fully phased out,2477462.8,60.5,0.0,0.0 -RI,No income,135101.94,33.7,0.0,0.0 +PA,Fully phased out,1499815.2,36.6,0.0,0.0 +RI,Ineligible,199547.12,49.7,0.0,0.0 +RI,No earned income,23834.613,5.9,0.0,0.0 RI,Pre-phase-in,6809.517,1.7,515.9956,82.55931 RI,Full amount,184.90422,0.0,648.99994,103.84 RI,Partially phased out,2966.2002,0.7,409.08417,65.45346 -RI,Fully phased out,256211.83,63.8,0.0,0.0 -SC,No income,518110.75,37.0,0.0,0.0 +RI,Fully phased out,167932.03,41.8,0.0,0.0 +SC,Ineligible,776925.8,55.5,0.0,0.0 +SC,No earned income,90303.79,6.4,0.0,0.0 SC,Pre-phase-in,5204.986,0.4,457.20728,571.5345 SC,Full amount,690.02356,0.0,648.9999,811.2 SC,Partially phased out,10148.458,0.7,457.65204,572.04614 -SC,Fully phased out,866683.8,61.9,0.0,0.0 -SD,No income,67121.33,25.8,0.0,0.0 +SC,Fully phased out,517565.0,36.9,0.0,0.0 +SD,Ineligible,119758.0,46.1,0.0,0.0 +SD,No earned income,16743.0,6.4,0.0,0.0 SD,Pre-phase-in,1256.6394,0.5,529.5882,0.0 SD,Partially phased out,2834.1692,1.1,317.98163,0.0 -SD,Fully phased out,188838.94,72.6,0.0,0.0 -TN,No income,641917.3,29.9,0.0,0.0 +SD,Fully phased out,119459.266,45.9,0.0,0.0 +TN,Ineligible,1019347.2,47.5,0.0,0.0 +TN,No earned income,133740.02,6.2,0.0,0.0 TN,Pre-phase-in,26780.662,1.2,422.82413,0.0 TN,Full amount,512.49664,0.0,649.00006,0.0 TN,Partially phased out,10210.256,0.5,520.8847,0.0 -TN,Fully phased out,1466141.0,68.3,0.0,0.0 -TX,No income,2459692.5,29.5,0.0,0.0 +TN,Fully phased out,954971.2,44.5,0.0,0.0 +TX,Ineligible,4057729.0,48.6,0.0,0.0 +TX,No earned income,492069.03,5.9,0.0,0.0 TX,Pre-phase-in,153981.7,1.8,504.31863,0.0 TX,Full amount,197.92868,0.0,649.0,0.0 TX,Partially phased out,57914.004,0.7,494.92886,0.0 -TX,Fully phased out,5675496.0,68.0,0.0,0.0 -UT,No income,200944.84,27.3,0.0,0.0 +TX,Fully phased out,3585390.2,43.0,0.0,0.0 +UT,Ineligible,349041.34,47.5,0.0,0.0 +UT,No earned income,39604.105,5.4,0.0,0.0 UT,Pre-phase-in,6285.637,0.9,490.25928,0.0 UT,Full amount,131.55965,0.0,648.99994,0.0 UT,Partially phased out,10868.779,1.5,344.35452,0.0 -UT,Fully phased out,517236.75,70.3,0.0,0.0 -VA,No income,800697.0,33.8,0.0,0.0 +UT,Fully phased out,329536.16,44.8,0.0,0.0 +VA,Ineligible,1287870.1,54.3,0.0,0.0 +VA,No earned income,149247.8,6.3,0.0,0.0 VA,Pre-phase-in,11179.481,0.5,452.66635,90.53327 VA,Full amount,633.57574,0.0,649.0,129.79999 VA,Partially phased out,25600.77,1.1,356.98605,71.724556 -VA,Fully phased out,1532187.8,64.6,0.0,0.0 -VT,No income,75031.914,34.9,0.0,0.0 +VA,Fully phased out,895766.8,37.8,0.0,0.0 +VT,Ineligible,112731.08,52.4,0.0,0.0 +VT,No earned income,13113.921,6.1,0.0,0.0 VT,Pre-phase-in,2928.623,1.4,531.3219,531.3219 VT,Full amount,83.45121,0.0,649.0,649.0 VT,Partially phased out,2007.2561,0.9,345.8606,345.8606 -VT,Fully phased out,135030.55,62.8,0.0,0.0 -WA,No income,827862.5,30.3,0.0,0.0 +VT,Fully phased out,84217.46,39.2,0.0,0.0 +WA,Ineligible,1381875.4,50.5,0.0,0.0 +WA,No earned income,156773.4,5.7,0.0,0.0 WA,Pre-phase-in,67620.664,2.5,533.66614,334.37003 WA,Full amount,9.981713,0.0,649.0,334.36996 WA,Partially phased out,17639.67,0.6,470.84427,321.92703 -WA,Fully phased out,1821082.5,66.6,0.0,0.0 -WI,No income,614872.7,35.2,0.0,0.0 +WA,Fully phased out,1110296.2,40.6,0.0,0.0 +WI,Ineligible,932028.8,53.3,0.0,0.0 +WI,No earned income,112380.65,6.4,0.0,0.0 WI,Pre-phase-in,15454.481,0.9,474.73358,0.0 WI,Full amount,782.4303,0.0,649.0,0.0 WI,Partially phased out,12181.741,0.7,411.6662,0.0 -WI,Fully phased out,1104725.2,63.2,0.0,0.0 -WV,No income,230403.9,43.9,0.0,0.0 +WI,Fully phased out,675188.44,38.6,0.0,0.0 +WV,Ineligible,304861.97,58.1,0.0,0.0 +WV,No earned income,37071.91,7.1,0.0,0.0 WV,Pre-phase-in,5904.34,1.1,533.2025,0.0 WV,Full amount,626.0814,0.1,649.0,0.0 WV,Partially phased out,3794.6746,0.7,538.4127,0.0 -WV,Fully phased out,283687.66,54.1,0.0,0.0 -WY,No income,45254.85,26.3,0.0,0.0 +WV,Fully phased out,172157.69,32.8,0.0,0.0 +WY,Ineligible,84500.09,49.2,0.0,0.0 +WY,No earned income,9194.863,5.4,0.0,0.0 WY,Pre-phase-in,2254.627,1.3,336.4639,0.0 WY,Full amount,87.738815,0.1,648.99994,0.0 WY,Partially phased out,810.4804,0.5,622.8266,0.0 -WY,Fully phased out,123354.51,71.8,0.0,0.0 +WY,Fully phased out,74914.4,43.6,0.0,0.0 diff --git a/obbba_district_impacts/Congressional-Hackathon-2025 b/obbba_district_impacts/Congressional-Hackathon-2025 new file mode 160000 index 0000000..3f6d05e --- /dev/null +++ b/obbba_district_impacts/Congressional-Hackathon-2025 @@ -0,0 +1 @@ +Subproject commit 3f6d05e76400c6e396a3a4eddd34a7b3f6919fc3