diff --git a/.gitignore b/.gitignore index 8b248d0..d8ba267 100644 --- a/.gitignore +++ b/.gitignore @@ -163,3 +163,6 @@ cython_debug/ # and can be added to the global gitignore or merged into this file. For a more nuclear # option (not recommended) you can uncomment the following to ignore the entire idea folder. #.idea/ + +# Large data files +eitc_childless_analysis/eitc_childless_families_*.csv diff --git a/eitc_childless_analysis/eitc_childless_analysis.ipynb b/eitc_childless_analysis/eitc_childless_analysis.ipynb new file mode 100644 index 0000000..e5a4890 --- /dev/null +++ b/eitc_childless_analysis/eitc_childless_analysis.ipynb @@ -0,0 +1,3022 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# EITC Analysis: Childless Filers by Phase Status\n", + "\n", + "## Overview\n", + "This notebook analyzes **childless tax units** (those with no EITC-qualifying children) across all 50 US states + DC, categorizing them by where they fall on the EITC schedule.\n", + "\n", + "## What This Notebook Does\n", + "1. **Loads state-specific microdata** from PolicyEngine's HuggingFace repository\n", + "2. **Filters to childless filers** (eitc_child_count == 0)\n", + "3. **Checks EITC eligibility** (age requirements, SSN, investment income limits)\n", + "4. **Categorizes each household** into one of 6 phase statuses\n", + "5. **Calculates weighted counts and percentages** by state\n", + "6. 
**Exports summary data** to CSV files\n", + "\n", + "## EITC Phase Status Categories\n", + "| Status | Description |\n", + "|--------|-------------|\n", + "| **Ineligible** | Does not meet EITC eligibility requirements (age, SSN, investment income, or filing status) |\n", + "| **No earned income** | No earned income, therefore no EITC |\n", + "| **Pre-phase-in** | Has earned income but hasn't reached maximum credit yet |\n", + "| **Full amount** | At the plateau - receiving maximum credit |\n", + "| **Partially phased out** | In phase-out range, receiving reduced credit |\n", + "| **Fully phased out** | Income too high, EITC reduced to $0 |\n", + "\n", + "## Data Source\n", + "- **State datasets**: `hf://policyengine/policyengine-us-data/states/{STATE}.h5`\n", + "- Each state has its own dataset with representative household microdata\n", + "- Data is weighted to represent the actual population\n", + "\n", + "## Output Files\n", + "- `eitc_childless_phase_status_summary_{year}.csv` - Aggregated by state and phase status\n", + "\n", + "## Years Analyzed\n", + "- 2024 and 2025" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## State EITC Programs\n", + "\n", + "As of 2024, **31 states plus DC** have state-level Earned Income Tax Credit programs. 
Most states calculate their EITC as a simple percentage match of the federal EITC, but several have unique structures.\n", + "\n", + "### States with Standard Federal Match Structure\n", + "These states calculate their state EITC as a percentage of the federal EITC amount:\n", + "\n", + "| State | Match % | Refundable | Notes |\n", + "|-------|---------|------------|-------|\n", + "| CO | 50% (2024) | Yes | Phasing down to 10% by 2034 |\n", + "| CT | ~30% | Yes | |\n", + "| DC | 70% | Yes | Higher match for childless workers |\n", + "| DE | 4.5% ref / 20% non-ref | Choice | Taxpayers choose refundable OR non-refundable |\n", + "| HI | 40% | Yes | |\n", + "| IL | 20% | Yes | |\n", + "| IN | 10% | Yes | |\n", + "| IA | 15% | Yes | |\n", + "| KS | 17% | Yes | |\n", + "| LA | 5% | Yes | |\n", + "| ME | 50% | Yes | |\n", + "| MA | 40% | Yes | |\n", + "| MI | 30% | Yes | |\n", + "| MO | 20% | No | Called \"Working Families Tax Credit\"; non-refundable |\n", + "| MT | 10% | Yes | |\n", + "| NE | 10% | Yes | |\n", + "| NJ | Variable | Yes | Varies by income |\n", + "| NM | ~25% | Yes | |\n", + "| NY | 30% | Yes | Plus supplemental credit |\n", + "| OH | 30% | No | Non-refundable |\n", + "| OK | 5% | Yes | Lowest in nation |\n", + "| OR | 9-12% | Yes | Varies by children |\n", + "| PA | ~10% | Yes | |\n", + "| RI | 16% | Yes | |\n", + "| SC | 125% | No | Highest match in nation; non-refundable |\n", + "| VT | ~38% | Yes | Increased to 100% for childless in 2025 |\n", + "| WI | Variable | Yes | Varies by children |\n", + "\n", + "### States with UNIQUE/NON-STANDARD Structures\n", + "\n", + "#### California (CA) - CalEITC\n", + "California does NOT simply match the federal EITC. 
Instead:\n", + "- Uses an **85% adjustment factor** applied to a state-specific calculation\n", + "- Has **different phase-in rates by number of children**:\n", + " - 0 children: 7.65%\n", + " - 1 child: 34%\n", + " - 2 children: 40%\n", + " - 3+ children: 45%\n", + "- Has a **two-stage phase-out** structure\n", + "- Maximum credit is lower than federal EITC\n", + "- **Fully refundable**\n", + "\n", + "#### Minnesota (MN) - Working Family Credit / Child & Working Families Credit\n", + "Minnesota **replaced** its traditional Working Family Credit in 2023 with the **Child and Working Families Credit (CWFC)**:\n", + "- **Two-part credit structure**:\n", + " 1. Child Tax Credit component: Fixed amount per qualifying child\n", + " 2. Working Family Credit component: Phase-in based on earnings\n", + "- Combined amounts phase out together based on AGI or earnings\n", + "- **Completely independent calculation** from federal EITC\n", + "- **Fully refundable**\n", + "\n", + "#### Washington (WA) - Working Families Tax Credit (WFTC)\n", + "Washington has **no income tax** and therefore no traditional EITC. 
Instead:\n", + "- Provides a **flat dollar amount** based on number of children:\n", + " - 0 children: $300-$325\n", + " - 1 child: $600-$640\n", + " - 2 children: $900-$965\n", + " - 3+ children: $1,200-$1,290\n", + "- Phases out starting **$2,500-$5,000 below** federal EITC AGI limits\n", + "- Requires claiming federal EITC to qualify\n", + "- **Fully refundable**\n", + "\n", + "#### Virginia (VA) - Split Refundable/Non-Refundable + Low-Income Tax Credit\n", + "Virginia has the most complex structure:\n", + "- **Non-refundable match**: 20% of federal EITC (since 2006)\n", + "- **Refundable match**: Variable (0% → 15% → 20% → 15% over different years)\n", + "- **Alternative Low-Income Tax Credit (LITC)**: $300 per personal exemption\n", + "- Taxpayers receive the **better of** EITC match or LITC\n", + "- Separate filers receive prorated credits\n", + "\n", + "#### Delaware (DE) - Choice Between Refundable and Non-Refundable\n", + "Delaware requires taxpayers to **choose one**:\n", + "- **Refundable option**: 4.5% of federal EITC\n", + "- **Non-refundable option**: 20% of federal EITC\n", + "- Cannot claim both\n", + "\n", + "#### Maryland (MD) - Differentiated by Family Status\n", + "Maryland varies match percentages by family composition:\n", + "- **Married OR has children**: \n", + " - Non-refundable: 50%\n", + " - Refundable: 25-45%\n", + "- **Childless unmarried filers**: Different (lower) percentages\n", + "- Has separate parameters for different filing situations\n", + "\n", + "### States WITHOUT State EITC Programs\n", + "The following states have **no state EITC**: AL, AK, AZ, AR, FL, GA, ID, KY, MS, NV, NH, NC, ND, SD, TN, TX, UT, WV, WY" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# =============================================================================\n", + "# IMPORTS AND CONFIGURATION\n", + "# =============================================================================\n", + "# 
\n", + "# policyengine_us: PolicyEngine's US tax-benefit microsimulation model\n", + "# - Microsimulation: Class for running simulations on survey microdata\n", + "# - Loads datasets, calculates tax/benefit variables for each household\n", + "#\n", + "# pandas/numpy: Standard data manipulation libraries\n", + "# =============================================================================\n", + "\n", + "from policyengine_us import Microsimulation\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# Configure pandas display options for better output formatting\n", + "pd.set_option('display.max_columns', None) # Show all columns\n", + "pd.set_option('display.width', None) # Don't wrap output\n", + "pd.set_option('display.float_format', lambda x: f'{x:,.2f}') # Format numbers with commas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## EITC Phase Status Classification\n", + "\n", + "The Earned Income Tax Credit (EITC) follows a trapezoidal schedule:\n", + "\n", + "```\n", + "Credit\n", + "Amount\n", + " ^\n", + " | ___________\n", + " | / \\\n", + " | / \\\n", + " | / \\\n", + " | / \\\n", + " | / \\\n", + " |/_____________________\\____> Earned Income\n", + " Phase-in Plateau Phase-out\n", + "```\n", + "\n", + "### EITC Eligibility Requirements (Childless Filers)\n", + "Before a childless filer can receive EITC, they must meet:\n", + "1. **Age requirement**: At least 25 and under 65 at the end of the tax year (the lower age limits for former foster youth and homeless youth applied only to tax year 2021)\n", + "2. **SSN requirement**: Valid Social Security Number for work\n", + "3. **Investment income limit**: Investment income must not exceed the threshold ($11,600 in 2024)\n", + "4. 
**Filing status**: Cannot file as \"Married Filing Separately\" (in most cases)\n", + "\n", + "### How We Classify Households\n", + "\n", + "We use PolicyEngine's calculated variables:\n", + "\n", + "| Variable | Description |\n", + "|----------|-------------|\n", + "| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |\n", + "| `eitc` | Final EITC amount received (after all calculations) |\n", + "| `eitc_maximum` | Maximum possible EITC for this filing status |\n", + "| `eitc_phased_in` | Amount \"earned\" based on phase-in rate × earned income |\n", + "| `eitc_reduction` | Amount reduced due to being in phase-out range |\n", + "| `tax_unit_earned_income` | Total earned income for the tax unit |\n", + "\n", + "### Classification Logic (in priority order)\n", + "1. **Ineligible**: `eitc_eligible == False` (fails age, SSN, investment income, or filing status)\n", + "2. **No earned income**: `tax_unit_earned_income == 0` (eligible but no earnings)\n", + "3. **Pre-phase-in**: Receiving EITC but `eitc_phased_in < eitc_maximum`\n", + "4. **Full amount**: `eitc_phased_in >= eitc_maximum` AND `eitc_reduction == 0`\n", + "5. **Partially phased out**: Receiving EITC AND `eitc_reduction > 0`\n", + "6. 
**Fully phased out**: `eitc == 0` AND has income (phased out completely)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# =============================================================================\n", + "# EITC PHASE STATUS CLASSIFICATION FUNCTION\n", + "# =============================================================================\n", + "# This function takes a DataFrame of households and classifies each one into\n", + "# one of 6 EITC phase statuses based on eligibility, income, and EITC calculations.\n", + "#\n", + "# Uses numpy's np.select() for efficient vectorized conditional logic.\n", + "# =============================================================================\n", + "\n", + "def determine_eitc_phase_status_vectorized(df):\n", + " \"\"\"\n", + " Classify each household into an EITC phase status category.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Must contain columns: eitc_eligible, tax_unit_earned_income, eitc, \n", + " eitc_reduction, eitc_phased_in, eitc_maximum\n", + " \n", + " Returns:\n", + " --------\n", + " numpy.ndarray\n", + " Array of status strings, one per row in df\n", + " \n", + " Categories (in priority order):\n", + " -------------------------------\n", + " 1. Ineligible: Does not meet EITC eligibility (age, SSN, investment income)\n", + " 2. No earned income: Eligible but has zero earned income\n", + " 3. Pre-phase-in: Receiving EITC, still building up to maximum\n", + " 4. Full amount: At maximum credit (plateau region)\n", + " 5. Partially phased out: In phase-out region, still receiving some credit\n", + " 6. 
Fully phased out: Income too high, EITC reduced to $0\n", + " \"\"\"\n", + " \n", + " # Define conditions in PRIORITY ORDER (first match wins)\n", + " conditions = [\n", + " # CONDITION 1: Ineligible for EITC\n", + " # Fails age requirement (25-64), SSN, investment income limit, or filing status\n", + " df['eitc_eligible'] == False,\n", + " \n", + " # CONDITION 2: No earned income\n", + " # Eligible for EITC but has zero earned income (cannot receive credit)\n", + " (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] == 0),\n", + " \n", + " # CONDITION 3: Pre-phase-in\n", + " # Receiving EITC, but haven't earned enough to hit maximum yet\n", + " (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),\n", + " \n", + " # CONDITION 4: Full amount (plateau)\n", + " # Receiving EITC at maximum, no reduction applied\n", + " (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0),\n", + " \n", + " # CONDITION 5: Partially phased out\n", + " # Receiving EITC, but some reduction has been applied\n", + " (df['eitc'] > 0) & (df['eitc_reduction'] > 0),\n", + " \n", + " # CONDITION 6: Fully phased out\n", + " # Eligible, has income, but EITC reduced to zero\n", + " (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] > 0) & (df['eitc'] <= 0),\n", + " ]\n", + " \n", + " # Labels corresponding to each condition above\n", + " choices = [\n", + " 'Ineligible',\n", + " 'No earned income',\n", + " 'Pre-phase-in',\n", + " 'Full amount',\n", + " 'Partially phased out',\n", + " 'Fully phased out'\n", + " ]\n", + " \n", + " # np.select applies conditions in order, returns first matching choice\n", + " # Default catches any edge cases\n", + " return np.select(conditions, choices, default='Ineligible')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Loading Functions\n", + "\n", + "### `run_state_eitc_analysis(state_abbr, year)`\n", + "Loads and processes data for a single state:\n", + "1. 
Loads the state's microdata from HuggingFace\n", + "2. Calculates all relevant EITC and household variables\n", + "3. Filters to childless filers only (`eitc_child_count == 0`)\n", + "4. Classifies each household by EITC phase status\n", + "5. Returns a DataFrame with one row per household\n", + "\n", + "### `run_all_states_analysis(year)`\n", + "Orchestrates the full analysis:\n", + "1. Loops through all 51 states/DC\n", + "2. Calls `run_state_eitc_analysis()` for each\n", + "3. Combines all results into a single DataFrame\n", + "\n", + "### Variables Calculated\n", + "| Variable | Description |\n", + "|----------|-------------|\n", + "| `tax_unit_weight` | Survey weight (how many real households this record represents) |\n", + "| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |\n", + "| `eitc` | Federal EITC amount received |\n", + "| `state_eitc` | State EITC amount (if state has a program) |\n", + "| `eitc_child_count` | Number of EITC-qualifying children (we filter to 0) |\n", + "| `tax_unit_earned_income` | Total earned income for the tax unit |\n", + "| `age_head` | Age of primary filer |" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# =============================================================================\n", + "# STATE LIST AND DATA LOADING FUNCTIONS\n", + "# =============================================================================\n", + "\n", + "# All US states + DC (51 total)\n", + "ALL_STATES = [\n", + " 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', \n", + " 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', \n", + " 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', \n", + " 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', \n", + " 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'\n", + "]\n", + "\n", + "# Order for sorting phase statuses (follows logical EITC flow)\n", + "PHASE_ORDER = [\n", 
+ " 'Ineligible', # Cannot receive EITC (age/SSN/investment income)\n", + " 'No earned income', # Eligible but no earnings\n", + " 'Pre-phase-in', # Building up to maximum\n", + " 'Full amount', # At maximum (plateau)\n", + " 'Partially phased out', # Being reduced\n", + " 'Fully phased out' # Reduced to $0\n", + "]\n", + "\n", + "\n", + "def run_state_eitc_analysis(state_abbr, year):\n", + " \"\"\"\n", + " Load and analyze EITC data for a single state.\n", + " \n", + " Parameters:\n", + " -----------\n", + " state_abbr : str\n", + " Two-letter state abbreviation (e.g., 'CA', 'NY', 'TX')\n", + " year : int\n", + " Tax year to analyze (e.g., 2024, 2025)\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame or None\n", + " DataFrame with one row per childless tax unit, or None if error\n", + " \"\"\"\n", + " try:\n", + " # Load the state's microdata from HuggingFace\n", + " dataset_path = f\"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5\"\n", + " sim = Microsimulation(dataset=dataset_path)\n", + " \n", + " # Variables to calculate\n", + " tax_unit_vars = [\n", + " 'tax_unit_id', # Unique identifier\n", + " 'tax_unit_weight', # Survey weight\n", + " 'eitc_eligible', # NEW: Whether eligible for EITC\n", + " 'eitc', # Federal EITC amount\n", + " 'eitc_maximum', # Max possible EITC\n", + " 'eitc_phased_in', # Phase-in amount\n", + " 'eitc_reduction', # Phase-out reduction\n", + " 'eitc_child_count', # Number of EITC-qualifying children\n", + " 'state_eitc', # State EITC amount\n", + " 'tax_unit_earned_income', # Total earned income\n", + " 'age_head', # Age of primary filer\n", + " ]\n", + " \n", + " # Calculate each variable\n", + " data = {}\n", + " for var in tax_unit_vars:\n", + " result = sim.calculate(var, period=year)\n", + " data[var] = result.values if hasattr(result, 'values') else np.array(result)\n", + " \n", + " df = pd.DataFrame(data)\n", + " df['state'] = state_abbr\n", + " \n", + " # Filter to childless filers only\n", + " 
childless_mask = df['eitc_child_count'] == 0\n", + " df_childless = df[childless_mask].copy()\n", + " \n", + " if len(df_childless) == 0:\n", + " return None\n", + " \n", + " # Classify each household by EITC phase status\n", + " df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)\n", + " df_childless['year'] = year\n", + " \n", + " return df_childless\n", + " \n", + " except Exception as e:\n", + " print(f\" Error processing {state_abbr}: {e}\")\n", + " return None\n", + "\n", + "\n", + "def run_all_states_analysis(year, states=None):\n", + " \"\"\"\n", + " Run EITC analysis for all states and combine results.\n", + " \"\"\"\n", + " if states is None:\n", + " states = ALL_STATES\n", + " \n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Running analysis for {year}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " all_results = []\n", + " \n", + " for i, state in enumerate(states):\n", + " print(f\"Processing {state} ({i+1}/{len(states)})...\", end=\" \")\n", + " result = run_state_eitc_analysis(state, year)\n", + " \n", + " if result is not None and len(result) > 0:\n", + " weighted_count = result['tax_unit_weight'].sum()\n", + " print(f\"{len(result):,} records, {weighted_count:,.0f} weighted\")\n", + " all_results.append(result)\n", + " else:\n", + " print(\"No data found\")\n", + " \n", + " if all_results:\n", + " combined = pd.concat(all_results, ignore_index=True)\n", + " print(f\"\\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units\")\n", + " return combined\n", + " else:\n", + " return pd.DataFrame()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run Analysis for 2024 and 2025" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Running analysis for 2024\n", + 
"============================================================\n", + "Processing AL (1/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5febc04fad0e4300bd2649808cf6d148", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "AL.h5: 0%| | 0.00/35.6M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "25,751 records, 1,422,123 weighted\n", + "Processing AK (2/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3cbec989f729485cb1a711c01d3c2523", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "AK.h5: 0%| | 0.00/1.59M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1,182 records, 205,778 weighted\n", + "Processing AZ (3/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fa6ce605ab34489bbc0e2c1e326730e7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "AZ.h5: 0%| | 0.00/41.0M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "30,120 records, 1,905,622 weighted\n", + "Processing AR (4/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d13287a0a9bf4c1f91fd5ae6df4f370a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "AR.h5: 0%| | 0.00/21.1M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "15,144 records, 683,842 weighted\n", + "Processing CA (5/51)... 
" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8b9436f3e6de492d91228f6ac061dd92", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "CA.h5: 0%| | 0.00/334M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "238,247 records, 11,676,756 weighted\n", + "Processing CO (6/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "eac4bc5a650949028572b60d82a6ff85", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "CO.h5: 0%| | 0.00/46.3M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "34,120 records, 1,602,958 weighted\n", + "Processing CT (7/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "eceb02eb07c3406fbbeb6051af1cce04", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "CT.h5: 0%| | 0.00/27.9M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "19,827 records, 1,119,846 weighted\n", + "Processing DE (8/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "965844b28cdd4c45969c8f0c2f5a0251", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "DE.h5: 0%| | 0.00/5.47M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3,801 records, 265,233 weighted\n", + "Processing DC (9/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7c0c053c5c7e423390ef1a12fde0e5b7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "DC.h5: 0%| | 0.00/7.56M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4,995 records, 247,082 weighted\n", + "Processing FL (10/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a33b9cccb968459aa058dd367db7a66d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "FL.h5: 0%| | 0.00/56.4M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "45,655 records, 6,828,672 weighted\n", + "Processing GA (11/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a9eab9273a68484c97eb8837b8838bda", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GA.h5: 0%| | 0.00/77.1M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "56,638 records, 2,867,909 weighted\n", + "Processing HI (12/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "09f5b941fa8d40f6a7b6edc5878f79b6", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HI.h5: 0%| | 0.00/11.6M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "8,416 records, 401,230 weighted\n", + "Processing ID (13/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b8b9aa4512764832848e02e73e99bd9a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "ID.h5: 0%| | 0.00/10.4M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7,678 records, 420,636 weighted\n", + "Processing IL (14/51)... 
" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9798fd13998843f49b6d068cce2a5109", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "IL.h5: 0%| | 0.00/76.9M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "56,631 records, 4,061,833 weighted\n", + "Processing IN (15/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ce0e73ee527849d0bec80b7b74e8c3d4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "IN.h5: 0%| | 0.00/46.1M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "33,456 records, 1,707,284 weighted\n", + "Processing IA (16/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "915da8e399b342cda53d0f5fa672e6aa", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "IA.h5: 0%| | 0.00/19.6M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "14,070 records, 834,990 weighted\n", + "Processing KS (17/51)... 
" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7f203f4f805646fb86b0a4132d1583c7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "KS.h5: 0%| | 0.00/21.6M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "15,776 records, 746,492 weighted\n", + "Processing KY (18/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "916fd778263c48c4be4ec769b8d41fe0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "KY.h5: 0%| | 0.00/30.5M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "22,109 records, 1,122,918 weighted\n", + "Processing LA (19/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f75d01d62cc044488a2c0a939b79d389", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "LA.h5: 0%| | 0.00/30.3M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "21,674 records, 1,255,035 weighted\n", + "Processing ME (20/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ba6e172e03154000b08a410d28d3e674", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "ME.h5: 0%| | 0.00/11.0M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7,782 records, 436,655 weighted\n", + "Processing MD (21/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "dfc4224942a341708a3424e8625d9af8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "MD.h5: 0%| | 0.00/54.7M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "39,963 records, 1,737,465 weighted\n", + "Processing MA (22/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "eafcbfb2d5f944f4a09fb38e686e9a7d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "MA.h5: 0%| | 0.00/56.3M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "40,034 records, 2,445,482 weighted\n", + "Processing MI (23/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cea20110b2ee4285ac370fb8869bf896", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "MI.h5: 0%| | 0.00/57.3M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "41,722 records, 2,947,462 weighted\n", + "Processing MN (24/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "297a2c26953f4dee9cec234405fda4ec", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "MN.h5: 0%| | 0.00/45.4M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "32,839 records, 1,579,933 weighted\n", + "Processing MS (25/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "03dd07c82aa74f1c9e9bc0dbae9065a9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "MS.h5: 0%| | 0.00/18.9M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "13,414 records, 751,858 weighted\n", + "Processing MO (26/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "25181a2d3a33451388cd051ba832a77c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "MO.h5: 0%| | 0.00/41.1M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "29,883 records, 1,572,474 weighted\n", + "Processing MT (27/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e65ddfa0f1314741b0042d259b8f93ef", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "MT.h5: 0%| | 0.00/10.8M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7,850 records, 322,606 weighted\n", + "Processing NE (28/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "267cb0a9e34f4b60ad6bd90edc4a561a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "NE.h5: 0%| | 0.00/10.3M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7,585 records, 555,046 weighted\n", + "Processing NV (29/51)... 
" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d1c1931007be440f95ecfdacdff8b3a7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "NV.h5: 0%| | 0.00/14.2M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10,477 records, 962,804 weighted\n", + "Processing NH (30/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4433e09858184566a87c6f4abd171d11", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "NH.h5: 0%| | 0.00/3.58M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2,774 records, 466,176 weighted\n", + "Processing NJ (31/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "df877f21e79042308823efacec3c0337", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "NJ.h5: 0%| | 0.00/75.4M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "53,826 records, 2,670,506 weighted\n", + "Processing NM (32/51)... 
" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "20193afcd274477193c83f6b286cae8e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "NM.h5: 0%| | 0.00/14.7M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10,333 records, 674,804 weighted\n", + "Processing NY (33/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4e492771e45642aba7483a71fc87bb50", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "NY.h5: 0%| | 0.00/155M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "111,004 records, 6,089,496 weighted\n", + "Processing NC (34/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "79fe95f4c0c243209629ab08b64704d7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "NC.h5: 0%| | 0.00/75.2M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "55,174 records, 3,018,448 weighted\n", + "Processing ND (35/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d9ae0a05759f4cf5b33071fa475a468e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "ND.h5: 0%| | 0.00/4.80M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3,391 records, 208,559 weighted\n", + "Processing OH (36/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e844542f5b954040b78212bc3cab933b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "OH.h5: 0%| | 0.00/72.0M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "52,414 records, 3,171,406 weighted\n", + "Processing OK (37/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "820f250db78c461da843b3702d3fa2f0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "OK.h5: 0%| | 0.00/24.8M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "17,840 records, 1,141,744 weighted\n", + "Processing OR (38/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "923dcf30790e440f8874853c9f464e6d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "OR.h5: 0%| | 0.00/37.5M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "27,048 records, 1,384,394 weighted\n", + "Processing PA (39/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "67555e66f486451c8fd3fd54c10f5847", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "PA.h5: 0%| | 0.00/81.9M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "59,791 records, 4,057,412 weighted\n", + "Processing RI (40/51)... 
" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d10a8288393b4c36a8a3c74e8ef79e4b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "RI.h5: 0%| | 0.00/10.3M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7,429 records, 397,583 weighted\n", + "Processing SC (41/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1b2775944bee4c5d9a210104fc2e4731", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "SC.h5: 0%| | 0.00/36.9M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "26,703 records, 1,387,951 weighted\n", + "Processing SD (42/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "49ed2cd7c18b42e985b0b479ed3f8c4e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "SD.h5: 0%| | 0.00/1.53M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1,071 records, 257,659 weighted\n", + "Processing TN (43/51)... 
" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "61ce789e36eb43ca9cad8a237824af2f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "TN.h5: 0%| | 0.00/13.2M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "11,099 records, 2,125,824 weighted\n", + "Processing TX (44/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e656ceb21aef43508792622491df4828", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "TX.h5: 0%| | 0.00/56.5M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "46,778 records, 8,270,492 weighted\n", + "Processing UT (45/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d9ea5a74412d43379f54844c3c3fe468", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "UT.h5: 0%| | 0.00/24.9M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "18,448 records, 728,702 weighted\n", + "Processing VT (46/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "10b730b455bf42dd8746678f4b185c09", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "VT.h5: 0%| | 0.00/5.38M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3,780 records, 213,103 weighted\n", + "Processing VA (47/51)... 
" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ba76d55fca004ac8b27f5c388612c677", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "VA.h5: 0%| | 0.00/65.4M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "47,926 records, 2,348,494 weighted\n", + "Processing WA (48/51)... " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "786dbd9fab924268beab0bf97453dcd9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "WA.h5: 0%| | 0.00/15.2M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "12,571 records, 2,709,062 weighted\n", + "Processing WV (49/51)... " + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "39fd0f42363e4e739eae83a2ea996917", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "WV.h5: 0%| | 0.00/9.70M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "6,981 records, 519,592 weighted\n", + "Processing WI (50/51)... 
" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0e2ed0af9bb34b67a651b7d10f791397", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "WI.h5: 0%| | 0.00/38.0M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "27,609 records, 1,731,936 weighted\n", + "Processing WY (51/51)... 
" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f41aba6775804ca28621f687e3a08a51", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "WY.h5: 0%| | 0.00/3.95M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2,712 records, 170,182 weighted\n", + "\n", + "Total: 1,493,541 records, 96,431,536 weighted tax units\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# RUN ANALYSIS FOR 2024\n", + "# =============================================================================\n", + "# This cell processes all 51 states/DC for tax year 2024.\n", + "# \n", + "# Output:\n", + "# df_2024 - DataFrame containing all childless tax units from all states\n", + "# with EITC calculations and phase status classification\n", + "#\n", + "# Processing time: Approximately 5-10 minutes depending on internet speed\n", + "# (downloads one .h5 file per state from HuggingFace, more than 1 GB in total)\n", + "# =============================================================================\n", + "\n", + "df_2024 = run_all_states_analysis(2024)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Running analysis for 2025\n", + "============================================================\n", + "Processing AL (1/51)... 25,751 records, 1,435,327 weighted\n", + "Processing AK (2/51)... 1,182 records, 207,689 weighted\n", + "Processing AZ (3/51)... 30,120 records, 1,923,315 weighted\n", + "Processing AR (4/51)... 15,144 records, 690,191 weighted\n", + "Processing CA (5/51)... 238,247 records, 11,785,171 weighted\n", + "Processing CO (6/51)... 34,120 records, 1,617,841 weighted\n", + "Processing CT (7/51)... 
19,827 records, 1,130,243 weighted\n", + "Processing DE (8/51)... 3,801 records, 267,696 weighted\n", + "Processing DC (9/51)... 4,995 records, 249,376 weighted\n", + "Processing FL (10/51)... 45,655 records, 6,892,074 weighted\n", + "Processing GA (11/51)... 56,638 records, 2,894,537 weighted\n", + "Processing HI (12/51)... 8,416 records, 404,956 weighted\n", + "Processing ID (13/51)... 7,678 records, 424,542 weighted\n", + "Processing IL (14/51)... 56,631 records, 4,099,546 weighted\n", + "Processing IN (15/51)... 33,456 records, 1,723,135 weighted\n", + "Processing IA (16/51)... 14,070 records, 842,742 weighted\n", + "Processing KS (17/51)... 15,776 records, 753,423 weighted\n", + "Processing KY (18/51)... 22,109 records, 1,133,344 weighted\n", + "Processing LA (19/51)... 21,674 records, 1,266,688 weighted\n", + "Processing ME (20/51)... 7,782 records, 440,709 weighted\n", + "Processing MD (21/51)... 39,963 records, 1,753,596 weighted\n", + "Processing MA (22/51)... 40,034 records, 2,468,188 weighted\n", + "Processing MI (23/51)... 41,722 records, 2,974,829 weighted\n", + "Processing MN (24/51)... 32,839 records, 1,594,602 weighted\n", + "Processing MS (25/51)... 13,414 records, 758,839 weighted\n", + "Processing MO (26/51)... 29,883 records, 1,587,074 weighted\n", + "Processing MT (27/51)... 7,850 records, 325,601 weighted\n", + "Processing NE (28/51)... 7,585 records, 560,199 weighted\n", + "Processing NV (29/51)... 10,477 records, 971,744 weighted\n", + "Processing NH (30/51)... 2,774 records, 470,505 weighted\n", + "Processing NJ (31/51)... 53,826 records, 2,695,300 weighted\n", + "Processing NM (32/51)... 10,333 records, 681,069 weighted\n", + "Processing NY (33/51)... 111,004 records, 6,146,035 weighted\n", + "Processing NC (34/51)... 55,174 records, 3,046,473 weighted\n", + "Processing ND (35/51)... 3,391 records, 210,495 weighted\n", + "Processing OH (36/51)... 52,414 records, 3,200,852 weighted\n", + "Processing OK (37/51)... 
17,840 records, 1,152,346 weighted\n", + "Processing OR (38/51)... 27,048 records, 1,397,248 weighted\n", + "Processing PA (39/51)... 59,791 records, 4,095,084 weighted\n", + "Processing RI (40/51)... 7,429 records, 401,274 weighted\n", + "Processing SC (41/51)... 26,703 records, 1,400,838 weighted\n", + "Processing SD (42/51)... 1,071 records, 260,051 weighted\n", + "Processing TN (43/51)... 11,099 records, 2,145,562 weighted\n", + "Processing TX (44/51)... 46,778 records, 8,347,282 weighted\n", + "Processing UT (45/51)... 18,448 records, 735,468 weighted\n", + "Processing VT (46/51)... 3,780 records, 215,082 weighted\n", + "Processing VA (47/51)... 47,926 records, 2,370,298 weighted\n", + "Processing WA (48/51)... 12,571 records, 2,734,216 weighted\n", + "Processing WV (49/51)... 6,981 records, 524,417 weighted\n", + "Processing WI (50/51)... 27,609 records, 1,748,017 weighted\n", + "Processing WY (51/51)... 2,712 records, 171,762 weighted\n", + "\n", + "Total: 1,493,541 records, 97,326,840 weighted tax units\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# RUN ANALYSIS FOR 2025\n", + "# =============================================================================\n", + "# Same analysis as above but for tax year 2025.\n", + "# PolicyEngine uses inflation-adjusted parameters for future years.\n", + "#\n", + "# Output:\n", + "# df_2025 - DataFrame containing all childless tax units for 2025\n", + "# =============================================================================\n", + "\n", + "df_2025 = run_all_states_analysis(2025)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Combined dataset: 2,987,082 records\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# COMBINE BOTH YEARS INTO SINGLE DATASET\n", + 
"# =============================================================================\n", + "# Creates a unified dataset with both years for cross-year comparisons.\n", + "# The 'year' column distinguishes records from each tax year.\n", + "#\n", + "# Note: This combined dataset is primarily for exploratory analysis.\n", + "# The exports are done separately by year for cleaner output files.\n", + "# =============================================================================\n", + "\n", + "df_combined = pd.concat([df_2024, df_2025], ignore_index=True)\n", + "print(f\"\\nCombined dataset: {len(df_combined):,} records\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create and Export Summary" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "======================================================================\n", + "EITC Phase Status by State - 2024\n", + "======================================================================\n", + "\n", + "======================================================================\n", + "EITC Phase Status by State - 2025\n", + "======================================================================\n", + "\n", + "2024 Summary (first 20 rows):\n", + "state eitc_phase_status weighted_households pct_of_state avg_federal_eitc avg_state_eitc\n", + " AK Ineligible 103,108.59 50.10 0.00 0.00\n", + " AK No earned income 13,868.58 6.70 0.00 0.00\n", + " AK Pre-phase-in 3,593.07 1.70 515.63 0.00\n", + " AK Full amount 0.26 0.00 632.00 0.00\n", + " AK Partially phased out 1,670.44 0.80 626.76 0.00\n", + " AK Fully phased out 83,537.39 40.60 0.00 0.00\n", + " AL Ineligible 807,295.19 56.80 0.00 0.00\n", + " AL No earned income 108,302.58 7.60 0.00 0.00\n", + " AL Pre-phase-in 3,394.00 0.20 354.86 0.00\n", + " AL Full amount 579.79 0.00 632.00 0.00\n", + " AL Partially phased out 10,719.72 0.80 448.06 
0.00\n", + " AL Fully phased out 491,831.62 34.60 0.00 0.00\n", + " AR Ineligible 365,523.75 53.50 0.00 0.00\n", + " AR No earned income 42,188.39 6.20 0.00 0.00\n", + " AR Pre-phase-in 2,328.13 0.30 453.60 0.00\n", + " AR Full amount 225.77 0.00 632.00 0.00\n", + " AR Partially phased out 5,891.05 0.90 390.64 0.00\n", + " AR Fully phased out 267,684.84 39.10 0.00 0.00\n", + " AZ Ineligible 1,030,924.88 54.10 0.00 0.00\n", + " AZ No earned income 118,057.78 6.20 0.00 0.00\n", + "\n", + "2025 Summary (first 20 rows):\n", + "state eitc_phase_status weighted_households pct_of_state avg_federal_eitc avg_state_eitc\n", + " AK Ineligible 104,066.34 50.10 0.00 0.00\n", + " AK No earned income 13,997.34 6.70 0.00 0.00\n", + " AK Pre-phase-in 3,626.43 1.70 540.81 0.00\n", + " AK Full amount 0.27 0.00 649.00 0.00\n", + " AK Partially phased out 1,685.95 0.80 627.09 0.00\n", + " AK Fully phased out 84,312.62 40.60 0.00 0.00\n", + " AL Ineligible 814,817.62 56.80 0.00 0.00\n", + " AL No earned income 109,308.14 7.60 0.00 0.00\n", + " AL Pre-phase-in 3,424.46 0.20 372.11 0.00\n", + " AL Full amount 586.22 0.00 649.00 0.00\n", + " AL Partially phased out 10,817.39 0.80 439.31 0.00\n", + " AL Fully phased out 496,373.16 34.60 0.00 0.00\n", + " AR Ineligible 368,928.66 53.50 0.00 0.00\n", + " AR No earned income 42,580.10 6.20 0.00 0.00\n", + " AR Pre-phase-in 2,349.75 0.30 475.76 0.00\n", + " AR Full amount 227.08 0.00 649.00 0.00\n", + " AR Partially phased out 5,943.54 0.90 379.36 0.00\n", + " AR Fully phased out 270,162.16 39.10 0.00 0.00\n", + " AZ Ineligible 1,040,525.88 54.10 0.00 0.00\n", + " AZ No earned income 119,153.92 6.20 0.00 0.00\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# PHASE STATUS SUMMARY BY STATE\n", + "# =============================================================================\n", + "# This function creates the main summary output: for each state, what\n", + "# percentage of 
childless households fall into each EITC phase status?\n", + "#\n", + "# Key outputs per state × phase status:\n", + "# - weighted_households: Estimated population count (sum of tax_unit_weight survey weights)\n", + "# - pct_of_state: What % of that state's childless households are in this phase\n", + "# - avg_federal_eitc: Weighted average federal EITC among households with eitc > 0\n", + "# - avg_state_eitc: Weighted average state EITC among those same households (0 if none is paid)\n", + "#\n", + "# The percentages sum to roughly 100% for each state (each value is rounded to\n", + "# one decimal) since we include ALL childless households (not just EITC recipients).\n", + "# =============================================================================\n", + "\n", + "def create_phase_status_summary(df, year_label):\n", + " \"\"\"\n", + " Create summary of EITC phase status by state with weighted counts and percentages.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Household-level data from run_all_states_analysis()\n", + " year_label : str\n", + " Label for display (e.g., \"2024\")\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame\n", + " Summary with columns: state, eitc_phase_status, weighted_households,\n", + " pct_of_state, avg_federal_eitc, avg_state_eitc\n", + " \"\"\"\n", + " print(f\"\\n{'='*70}\")\n", + " print(f\"EITC Phase Status by State - {year_label}\")\n", + " print(f\"{'='*70}\")\n", + " \n", + " # Step 1: Calculate weighted counts by state and phase status\n", + " # tax_unit_weight is summed to get population-representative counts\n", + " summary = df.groupby(['state', 'eitc_phase_status']).agg({\n", + " 'tax_unit_weight': 'sum',\n", + " }).reset_index()\n", + " \n", + " summary.columns = ['state', 'eitc_phase_status', 'weighted_households']\n", + " \n", + " # Step 2: Calculate state totals for percentage calculation\n", + " state_totals = summary.groupby('state')['weighted_households'].sum().reset_index()\n", + " state_totals.columns = ['state', 'state_total']\n", + " \n", + " # Step 3: 
Merge to compute percentages\n", + " summary = summary.merge(state_totals, on='state')\n", + " summary['pct_of_state'] = (summary['weighted_households'] / summary['state_total'] * 100).round(1)\n", + " \n", + " # Step 4: Add average EITC amounts (only computed for households receiving EITC)\n", + " # This uses weighted averages: sum(value × weight) / sum(weight)\n", + " avg_eitc = df[df['eitc'] > 0].groupby(['state', 'eitc_phase_status']).apply(\n", + " lambda x: pd.Series({\n", + " 'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " 'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " })\n", + " ).reset_index()\n", + " \n", + " summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')\n", + " summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)\n", + " summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)\n", + " \n", + " # Step 5: Clean up columns and sort\n", + " summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', \n", + " 'avg_federal_eitc', 'avg_state_eitc']]\n", + " \n", + " # Sort by state alphabetically, then by phase status in logical order\n", + " summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", + " summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", + " \n", + " return summary\n", + "\n", + "# Generate summaries for both years\n", + "summary_2024 = create_phase_status_summary(df_2024, \"2024\")\n", + "summary_2025 = create_phase_status_summary(df_2025, \"2025\")\n", + "\n", + "# Preview the results\n", + "print(\"\\n2024 Summary (first 20 rows):\")\n", + "print(summary_2024.head(20).to_string(index=False))\n", + "print(\"\\n2025 Summary (first 20 rows):\")\n", + "print(summary_2025.head(20).to_string(index=False))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": 
{}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Top 15 States by EITC Recipients - 2024\n", + "============================================================\n", + "state Tax Units (Weighted) Total Federal EITC Total State EITC Avg Federal EITC Avg State EITC Has State EITC\n", + " CA 11,676,756.00 126,396,768.00 392,533,280.00 10.82 33.62 True\n", + " TX 8,270,492.50 102,216,176.00 0.00 12.36 0.00 False\n", + " FL 6,828,671.50 50,078,040.00 0.00 7.33 0.00 False\n", + " NY 6,089,496.00 64,632,924.00 17,955,152.00 10.61 2.95 True\n", + " IL 4,061,833.00 43,125,848.00 8,625,170.00 10.62 2.12 True\n", + " PA 4,057,412.25 41,305,212.00 0.00 10.18 0.00 False\n", + " OH 3,171,405.75 30,410,496.00 9,123,148.00 9.59 2.88 True\n", + " NC 3,018,447.50 15,553,126.00 0.00 5.15 0.00 False\n", + " MI 2,947,462.50 30,062,786.00 9,018,837.00 10.20 3.06 True\n", + " GA 2,867,909.25 20,237,260.00 0.00 7.06 0.00 False\n", + " WA 2,709,062.25 42,446,576.00 27,457,220.00 15.67 10.14 True\n", + " NJ 2,670,505.50 31,733,258.00 55,209,756.00 11.88 20.67 True\n", + " MA 2,445,482.50 27,926,758.00 11,170,704.00 11.42 4.57 True\n", + " VA 2,348,493.50 14,553,468.00 102,224,696.00 6.20 43.53 True\n", + " TN 2,125,824.00 16,333,918.00 0.00 7.68 0.00 False\n", + "\n", + "============================================================\n", + "Top 15 States by EITC Recipients - 2025\n", + "============================================================\n", + "state Tax Units (Weighted) Total Federal EITC Total State EITC Avg Federal EITC Avg State EITC Has State EITC\n", + " CA 11,785,171.00 129,709,128.00 394,849,248.00 11.01 33.50 True\n", + " TX 8,347,282.00 106,447,616.00 0.00 12.75 0.00 False\n", + " FL 6,892,074.00 51,580,204.00 0.00 7.48 0.00 False\n", + " NY 6,146,035.00 66,562,396.00 18,485,602.00 10.83 3.01 True\n", + " IL 4,099,546.00 44,526,100.00 8,905,220.00 10.86 2.17 True\n", + 
" PA 4,095,084.50 42,804,808.00 0.00 10.45 0.00 False\n", + " OH 3,200,851.75 31,340,700.00 9,402,211.00 9.79 2.94 True\n", + " NC 3,046,473.25 15,684,542.00 0.00 5.15 0.00 False\n", + " MI 2,974,829.25 31,287,212.00 9,386,164.00 10.52 3.16 True\n", + " GA 2,894,537.00 20,446,926.00 0.00 7.06 0.00 False\n", + " WA 2,734,215.50 44,398,868.00 28,292,344.00 16.24 10.35 True\n", + " NJ 2,695,300.25 32,696,890.00 56,647,304.00 12.13 21.02 True\n", + " MA 2,468,188.00 28,870,056.00 11,548,022.00 11.70 4.68 True\n", + " VA 2,370,298.50 14,610,885.00 103,897,088.00 6.16 43.83 True\n", + " TN 2,145,561.75 16,974,484.00 0.00 7.91 0.00 False\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# SUMMARY BY STATE - TOP STATES BY POPULATION\n", + "# =============================================================================\n", + "# Shows the states with the largest childless tax unit populations,\n", + "# along with total and average EITC amounts.\n", + "#\n", + "# Useful for understanding which states contribute most to the national totals.\n", + "# =============================================================================\n", + "\n", + "def summary_by_state(df, year_label, top_n=15):\n", + " \"\"\"\n", + " Create summary by state showing top N by number of childless tax units.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Household-level data\n", + " year_label : str\n", + " Label for display\n", + " top_n : int\n", + " Number of top states to show (default 15)\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame\n", + " State-level summary sorted by weighted tax unit count\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Top {top_n} States by EITC Recipients - {year_label}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Calculate state-level aggregates using weighted sums/averages\n", + " summary = df.groupby('state').apply(\n", + 
" lambda x: pd.Series({\n", + " # Total weighted tax units in state\n", + " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", + " # Total federal EITC distributed (weight × eitc amount)\n", + " 'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),\n", + " # Total state EITC distributed\n", + " 'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),\n", + " # Weighted average federal EITC per tax unit\n", + " 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " # Weighted average state EITC per tax unit\n", + " 'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),\n", + " # Boolean: does this state have a state EITC program?\n", + " 'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,\n", + " })\n", + " ).reset_index()\n", + " \n", + " # Sort by number of tax units (largest states first)\n", + " summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)\n", + " \n", + " return summary\n", + "\n", + "# Generate and display for both years\n", + "state_2024 = summary_by_state(df_2024, \"2024\")\n", + "print(state_2024.to_string(index=False))\n", + "\n", + "state_2025 = summary_by_state(df_2025, \"2025\")\n", + "print(state_2025.to_string(index=False))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============================================================\n", + "Age Distribution of Head of Household - 2024\n", + "============================================================\n", + "age_group Tax Units (Weighted) Avg Federal EITC Avg Earned Income % of Total\n", + " Under 25 12,069,262.00 0.57 24,647.42 12.50\n", + " 25-34 14,198,971.00 35.78 76,383.24 14.70\n", + " 35-44 11,448,204.00 2.19 94,731.74 11.90\n", + " 45-54 16,595,334.00 22.36 87,682.62 17.20\n", + " 55-64 9,673,886.00 1.18 59,089.31 
10.00\n", + " 65+ 32,441,214.00 0.01 25,601.59 33.60\n", + "\n", + "============================================================\n", + "Age Distribution of Head of Household - 2025\n", + "============================================================\n", + "age_group Tax Units (Weighted) Avg Federal EITC Avg Earned Income % of Total\n", + " Under 25 12,181,323.00 0.59 25,851.27 12.50\n", + " 25-34 14,330,805.00 36.65 80,112.14 14.70\n", + " 35-44 11,554,499.00 2.15 99,357.79 11.90\n", + " 45-54 16,749,416.00 22.80 91,964.61 17.20\n", + " 55-64 9,763,707.00 1.16 61,973.64 10.00\n", + " 65+ 32,742,422.00 0.01 26,850.62 33.60\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# AGE DISTRIBUTION ANALYSIS\n", + "# =============================================================================\n", + "# Shows how childless tax units are distributed by age of the head of household.\n", + "#\n", + "# Key insight: The childless EITC has age restrictions (25-64 for 2024 under\n", + "# current law), so we expect most EITC recipients to fall within that range.\n", + "# =============================================================================\n", + "\n", + "def age_distribution(df, year_label):\n", + " \"\"\"\n", + " Create age group distribution for heads of household.\n", + " \n", + " Parameters:\n", + " -----------\n", + " df : pandas.DataFrame\n", + " Household-level data\n", + " year_label : str\n", + " Label for display\n", + " \n", + " Returns:\n", + " --------\n", + " pandas.DataFrame\n", + " Summary by age group with weighted counts and averages\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Age Distribution of Head of Household - {year_label}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Create age groups using pd.cut\n", + " df_copy = df.copy()\n", + " df_copy['age_group'] = pd.cut(\n", + " df_copy['age_head'],\n", + " bins=[0, 25, 35, 45, 55, 65, 100],\n", + " labels=['Under 
25', '25-34', '35-44', '45-54', '55-64', '65+'],\n", + " right=False, # left-closed bins, e.g. [25, 35), so age 25 lands in '25-34' as the labels state\n", + " )\n", + " \n", + " # Calculate weighted statistics by age group\n", + " summary = df_copy.groupby('age_group').apply(\n", + " lambda x: pd.Series({\n", + " 'Tax Units (Weighted)': x['tax_unit_weight'].sum(),\n", + " 'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n", + " 'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,\n", + " })\n", + " ).reset_index()\n", + " \n", + " # Add percentage of total\n", + " total_units = summary['Tax Units (Weighted)'].sum()\n", + " summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)\n", + " \n", + " return summary\n", + "\n", + "# Generate for both years\n", + "age_2024 = age_distribution(df_2024, \"2024\")\n", + "print(age_2024.to_string(index=False))\n", + "\n", + "age_2025 = age_distribution(df_2025, \"2025\")\n", + "print(age_2025.to_string(index=False))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Export Data to CSV" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Exported 1,493,541 rows to: eitc_childless_families_2024.csv\n", + "Exported 1,493,541 rows to: eitc_childless_families_2025.csv\n" + ] + } + ], + "source": [ + "# =============================================================================\n", + "# EXPORT DETAILED HOUSEHOLD DATA\n", + "# =============================================================================\n", + "# Exports the full household-level dataset with all calculated variables.\n", + "#\n", + "# WARNING: These files are large (~125MB each) and are excluded from git\n", + "# via .gitignore. 
They are generated locally when the notebook runs.\n", + "#\n", + "# Use cases:\n", + "# - Detailed analysis in external tools (Excel, Stata, R)\n", + "# - Validation of the summary statistics\n", + "# - Custom filtering/aggregation not provided in this notebook\n", + "# =============================================================================\n", + "\n", + "def export_household_data(df, year):\n", + " \"\"\"\n", + " Export household-level data to CSV, sorted by state and phase status.\n", + " \"\"\"\n", + " \n", + " # Select columns for export (only columns we're loading)\n", + " export_columns = [\n", + " 'state', # State abbreviation\n", + " 'eitc_phase_status', # Classification result\n", + " 'tax_unit_id', # Unique identifier\n", + " 'tax_unit_weight', # Survey weight\n", + " 'eitc_eligible', # Eligibility status\n", + " 'eitc', # Federal EITC amount\n", + " 'state_eitc', # State EITC amount\n", + " 'eitc_phased_in', # Phase-in calculation\n", + " 'eitc_reduction', # Phase-out reduction\n", + " 'tax_unit_earned_income', # Total earned income\n", + " 'age_head', # Age of primary filer\n", + " ]\n", + " \n", + " # Only include columns that exist in the DataFrame\n", + " available_columns = [col for col in export_columns if col in df.columns]\n", + " df_export = df[available_columns].copy()\n", + " \n", + " # Rename columns for clarity in external tools\n", + " df_export = df_export.rename(columns={\n", + " 'eitc': 'federal_eitc',\n", + " })\n", + " \n", + " # Sort by state (alphabetically) then by phase status (in logical EITC order)\n", + " df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})\n", + " df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)\n", + " \n", + " # Write to CSV\n", + " filename = f'eitc_childless_families_{year}.csv'\n", + " df_export.to_csv(filename, index=False)\n", + " print(f\"Exported {len(df_export):,} rows to: {filename}\")\n", + " \n", + " 
return df_export\n", + "\n", + "# Export both years to separate files\n", + "df_export_2024 = export_household_data(df_2024, 2024)\n", + "df_export_2025 = export_household_data(df_2025, 2025)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Sample of 2024 export data:\n" + ] + }, + { + "data": { + "text/html": [ + "
|  | state | eitc_phase_status | tax_unit_id | tax_unit_weight | eitc_eligible | federal_eitc | state_eitc | eitc_phased_in | eitc_reduction | tax_unit_earned_income | age_head |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 25751 | AK | Ineligible | 0 | 0.80 | False | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 79 |
| 25753 | AK | Ineligible | 3 | 0.28 | False | 0.00 | 0.00 | 0.00 | 10,068.10 | 0.00 | 76 |
| 25757 | AK | Ineligible | 11 | 4,387.35 | False | 0.00 | 0.00 | 0.00 | 3,368.61 | 0.00 | 85 |
| 25760 | AK | Ineligible | 14 | 2,849.94 | False | 0.00 | 0.00 | 632.00 | 1,747.39 | 31,767.87 | 21 |
| 25761 | AK | Ineligible | 15 | 639.52 | False | 0.00 | 0.00 | 0.00 | 992.74 | 0.00 | 85 |
| 25763 | AK | Ineligible | 18 | 1,114.78 | False | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 83 |
| 25764 | AK | Ineligible | 19 | 1,114.78 | False | 0.00 | 0.00 | 632.00 | 10,566.14 | 132,357.31 | 61 |
| 25766 | AK | Ineligible | 21 | 2.31 | False | 0.00 | 0.00 | 632.00 | 0.00 | 16,941.74 | 78 |
| 25767 | AK | Ineligible | 22 | 0.82 | False | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 85 |
| 25769 | AK | Ineligible | 24 | 792.77 | False | 0.00 | 0.00 | 0.00 | 20.54 | 0.00 | 81 |