Go to market #120

Open

yash-chauhan-dev wants to merge 24 commits into main from go-to-market

Conversation

@yash-chauhan-dev

What does this PR do?

Fixes #

Changes

How was it tested?

Anything reviewers should know?

yash-chauhan-dev and others added 17 commits February 20, 2026 19:12
…olumns

The previous implementation converted a datetime column to a string using
the user's format string, then immediately parsed it back with that same
format — a tautological round-trip that always passed regardless of whether
the format actually matched the data.

Fix: after strftime(format) → to_datetime(format), compare the parsed result
against the original timestamp. Formats that discard information (e.g.
"%d/%m/%Y" on a column with time-of-day values) produce a different timestamp
on the round-trip, correctly signalling a format mismatch.
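The fix can be sketched with stdlib datetimes (the function name is hypothetical; the actual rule operates on pandas columns):

```python
from datetime import datetime

def format_round_trips(ts: datetime, fmt: str) -> bool:
    """Render with the user's format, parse the result back, and require
    equality with the original timestamp. Lossy formats fail the comparison."""
    rendered = ts.strftime(fmt)
    reparsed = datetime.strptime(rendered, fmt)
    return reparsed == ts

ts = datetime(2026, 2, 20, 19, 12)
format_round_trips(ts, "%Y-%m-%d %H:%M")  # True: format preserves all fields
format_round_trips(ts, "%d/%m/%Y")        # False: time-of-day is discarded
```

The old tautological check parsed the rendered string with the same format and never compared it back, so both calls above would have "passed".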

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… PyPI updates

- Add airflow-provider/ package with DataCheckOperator for DAG-based validation
- Add github-action/ with action.yml for CI/CD pipeline integration
- Add SARIF exporter for GitHub Code Scanning / security tooling compatibility
- Update README and README_PYPI with feature comparisons and integration guides
- Expand pyproject.toml keywords and classifiers for PyPI discoverability
- Add COMPETITIVE_COMPARISON.md and MARKET_REPORT.md
- Extend validate CLI and reporting module for new output integrations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- testing/csv/run_all.py — master runner (python run_all.py [suite...])
- testing/csv/helpers.py — TestSuite class, UTF-8 stdout, CLI detection
- testing/csv/test_{users,products,orders}.py — 137 test cases across 9 groups:
    A. Validate passing rules     B. Failure detection (exit 1/2/3)
    C. Output formats (json, sarif, markdown, csv)
    D. Sampling modes (top, count, rate, stratified, time_based, reservoir...)
    E. Profiling (terminal, json, markdown, iqr, zscore, suggestions)
    F. Schema evolution (capture, list, show, compare, history)
    G. Config management (validate, show, generate, templates)
    H/I. Extended rule coverage (distribution_type, min/max_length,
         date_format, no_future_timestamps, business_days_only, max_age,
         foreign_key_exists) — both pass and fail detection
- testing/csv/configs/ — 12 YAML configs (pass/fail/extended per source)
- .gitignore — excludes testing/venv/ and testing/csv/results/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rule implementation gaps (factory.py, numeric_rules.py, composite_rules.py):
- Add factory handlers and rule classes for positive, negative, non_negative,
  range, and boolean — previously exit=4 (no handler)
- BooleanRule handles both bool dtype and True/False string values
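The dual handling in BooleanRule might look like this per-value sketch (names and accepted spellings are illustrative, not the rule's actual implementation):

```python
_BOOL_STRINGS = {"true", "false"}  # hypothetical set of accepted spellings

def is_boolean_like(value) -> bool:
    """Accept both real bools (bool dtype) and True/False string values."""
    if isinstance(value, bool):
        return True
    return isinstance(value, str) and value.strip().lower() in _BOOL_STRINGS

is_boolean_like(True)     # True
is_boolean_like("False")  # True
is_boolean_like("yes")    # False
```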

Severity propagation (engine.py, numeric_rules.py):
- Replace replace("_min","").replace("_max","") with removesuffix() so that
  check names containing "_max" or "_min" mid-string are not corrupted;
  severity: warning checks no longer incorrectly exit=1
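The corruption that motivated the change is easy to reproduce with a hypothetical check name containing "_max" mid-string:

```python
# Hypothetical check name that happens to contain "_max" before the suffix
name = "price_max_check_min"

# Old approach: replace() strips every occurrence, corrupting the base name
old = name.replace("_min", "").replace("_max", "")    # "price_check"

# Fixed approach: removesuffix() only strips a true trailing suffix
new = name.removesuffix("_min").removesuffix("_max")  # "price_max_check"
```

Since the base name is what severity lookups key on, the old form could map a warning-severity check onto the wrong entry.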

Config validation (loader.py, schema.py):
- Remove "must have at least one check" guard so enabled:false-only configs
  exit=0 instead of exit=2
- Unify rule-type allowlist against schema.py's VALID_RULE_TYPES to stay
  in sync automatically; add missing date_range to schema
- Replace unimplemented html output format with sarif in VALID_OUTPUT_FORMATS

Temporal rules (temporal_rules.py):
- TimestampRangeRule and NoFutureTimestampsRule now match tz-awareness of
  the column before comparison to avoid tz-naive vs tz-aware TypeError
- DateFormatValidRule handles Arrow date32[day] columns via ISO string path
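The tz-matching idea, shown with stdlib datetimes rather than the rules' actual pandas columns (helper name hypothetical): comparing naive and aware values raises TypeError, so one side is coerced to match the other before comparison.

```python
from datetime import datetime, timezone

def match_tz_awareness(value: datetime, reference: datetime) -> datetime:
    """Make `value` naive or aware to match `reference` so comparisons work."""
    if reference.tzinfo is not None and value.tzinfo is None:
        return value.replace(tzinfo=reference.tzinfo)
    if reference.tzinfo is None and value.tzinfo is not None:
        return value.replace(tzinfo=None)
    return value

aware = datetime(2026, 2, 20, tzinfo=timezone.utc)
naive = datetime(2026, 2, 19)
match_tz_awareness(naive, aware) < aware  # comparison now succeeds
```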

Profiling / statistics (profiler.py, statistics.py, schema/detector.py):
- Guard df.duplicated(), series.nunique(), and value_counts() against
  unhashable Arrow complex types (list, struct, map)
- Cast Arrow decimal128 to float64 before numeric stats to avoid ArrowTypeError
- Fix re.error from duplicate named group %H in inferred date format strings
  by tracking has_hour and capping hour-segment detection to one emission
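The unhashable-type guard boils down to this pattern (a sketch with plain Python lists standing in for Arrow list/struct/map cells):

```python
def safe_nunique(values) -> int:
    """Set-based uniqueness raises TypeError on unhashable cells
    (lists, dicts); fall back to a best-effort repr-based count."""
    try:
        return len(set(values))
    except TypeError:
        return len({repr(v) for v in values})

safe_nunique([1, 2, 2])        # 2
safe_nunique([[1], [1], [2]])  # 2 — list cells are unhashable
```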

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removed the 5 aggregate/statistical rules (mean_between, std_dev_less_than,
percentile_range, z_score_outliers, distribution_type) which are anomaly
detection tools rather than row-level data quality rules. This simplifies
the rule set and avoids user confusion about what validation means.

Also includes prior go-to-market work committed together:
- SQL pushdown engine (datacheck/sql_pushdown/) for PostgreSQL, Redshift,
  MySQL, SQL Server, Snowflake, BigQuery — zero data transfer validation
- Removed profiling feature (datacheck/profiling/, cli/profile.py, config/generator.py)
- Removed custom rule plugin system (datacheck/plugins/)
- Removed sampling feature (datacheck/sampling/)
- Advanced templates for all 6 domains with sample data generation
- Performance improvements: 11x speedup for temporal rules via PyArrow,
  vectorized ops in type/bool/length rules, ThreadPoolExecutor parallelism
- Updated all docs, guides, templates, and benchmarks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ing folder

- Bump version to 2.1.0 across pyproject.toml, __init__.py, sarif_exporter.py, airflow-provider, github-action
- Remove comparison table and stale competitor references from README
- Add boolean rule and fix missing rules (range, positive, non_negative) in all summary tables
- Remove positive/non_negative from high-level summary tables (redundant with min/max)
- Fix Named Sources heading and email_valid stale reference in README
- Add DataCheckSchemaOperator query parameter (code + docs + airflow-provider README)
- Add large-table tip for schema operator using LIMIT in query
- Add guides/config-guide.md comprehensive config file reference
- Update cli-guide.md and guides to remove redundancy, add cross-links
- Remove testing/ folder and internal market/competitive reports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New tagline: "Catch data quality issues before they reach production"
- Rewrite Highlights to lead with benefits (bold) not features
- Surface SARIF, GitHub Action, and Airflow in top-level highlights
- Remove comparison table from README_PYPI.md, sync with README.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the old stats block + per-failure listing with a single Rich
rounded table showing every rule as a row (Result | Check | Column | Details):
- Result cell: passed (green) / failed (red) / warning (yellow) / info (blue) / error (red)
- Details cell: failure rate + sample bad values for failures; error message for execution errors
- One-line footer: 🟢/🟡/🔴 status, check count, row count, per-severity counts, elapsed time
- Warnings-only runs show 🟡 "Passed with warnings" instead of red

Track elapsed time in validate.py (time.monotonic) and pass to reporter.
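The timing approach, sketched (footer values illustrative): time.monotonic is preferred over time.time here because it is immune to wall-clock adjustments mid-run.

```python
import time

start = time.monotonic()
# ... run validation checks here ...
elapsed = time.monotonic() - start

# Hypothetical footer string matching the one-line format described above
footer = f"12 checks, 10,000 rows in {elapsed:.2f}s"
```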

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Show "Validating <source>" line above the table so users know what ran
  - Named source: "production_db → orders"
  - Inline file: "orders.csv (csv)"
  - Warehouse connection: "snowflake → orders"
  - File arg: "orders.parquet" (just filename, not full path)
- Execution errors truncated at 60 chars in the table Details cell with "… (see below)"
- Full error messages printed separately after the footer in a red "Execution Errors" panel

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
JSON (--format json):
- Switch from basic JSONExporter to JsonReporter (metadata, distributions, suggestions)
- Add source, elapsed_seconds to metadata
- Status now "PASSED" / "PASSED_WITH_WARNINGS" / "FAILED"
- Summary adds failed_errors, failed_warnings, failed_info, total_rows, total_columns
- Results add severity field, cleaner status values (PASS/FAIL/WARNING/INFO/ERROR)

Markdown (--format markdown):
- Source line, status icon, run summary with counts and timing at the top
- Full results table: Result | Check | Column | Details | Severity (all rules, not just failures)
- Failure details section with sample values table per failed rule
- Execution errors section with full error messages in code blocks

SARIF (--format sarif):
- Add startTimeUtc derived from elapsed time
- Add automationDetails.description for source info

CSV (--format csv / --csv-export):
- Add severity column to both export_failures and export_summary
- Drop redundant rule_name column (check_name is cleaner)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove BusinessDaysOnlyRule (country_code was never implemented)
- Remove GCS and Azure connectors (stubs with no real implementation)
- Remove DuckDB and SQLite loaders
- Remove Delta Lake and Avro loaders
- Remove min_quality_score from Airflow operator (profiling removed)
- Delete empty stub directories: core/, plugins/, profiling/, sampling/
- Fix output_path → output_file in all 7 config templates
- Clean all guides, docs, and templates of stale references
- Update airflow-provider package to match supported sources

Supported file formats: CSV, Parquet only
Supported cloud storage: S3 only
Supported databases: PostgreSQL, MySQL, MSSQL, Snowflake, BigQuery, Redshift

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rewrite README headline: 'A Linter for Data Pipelines'
- Add enforcement-first description with fail-fast diagram
- Add 'Why not observability?' section to README
- Expand CI/CD section: SARIF upload, Airflow gate, plain shell examples
- Add SQL pushdown callout in database sources section
- Remove 'continuous monitoring' roadmap item (wrong direction)
- Add Python API halt-on-failure pattern
- Rewrite README_PYPI.md with matching positioning
- Create docs/philosophy.md: detection vs enforcement, deterministic
  vs statistical, SQL pushdown rationale, zero-infra rationale,
  opinionated design principles
- Replace 'monitoring dashboards' with 'informational checks' in config guide
- Replace 'schema monitoring' with 'schema enforcement' in python-api guide

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docs/index.md: update title and opening to 'A Linter for Data Pipelines'
  with enforcement diagram and deterministic/zero-infra framing
- guides/guide-who-uses-datacheck.md: update opening from detection
  language to enforcement/gate language
- pyproject.toml: update description to 'A linter for data pipelines.
  Enforce data quality rules in CI/CD, Airflow, and beyond.'
- .github/workflows/data-quality.yml: add ready-to-use GitHub Actions
  workflow with SARIF upload to Security tab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Mental Model section: "Code has linters. Data pipelines need gates."
- "data quality rules" -> "deterministic validation rules" throughout
- Determinism bullet: "No heuristics. No anomaly scoring. No statistical guessing."
- Add "Validate Where Data Lives" section surfacing SQL pushdown as differentiator
- Add "What DataCheck Is Not" block after observability section
- Quickstart: add echo $? to reinforce gating behavior
- "Detect Schema Changes" -> "Enforce Schema Contracts" + enforcement framing
- Remove stability self-declaration from Roadmap

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Monitoring multiple tables in parallel
"""

from datetime import datetime, timedelta

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'timedelta' is not used.

Copilot Autofix


To fix the problem, remove the unused timedelta symbol from the import statement so that only the actually used datetime is imported. This removes the unnecessary dependency and makes the code cleaner without changing behavior.

Concretely, in airflow-provider/example_dags/example_schema_dag.py, update line 10 from from datetime import datetime, timedelta to from datetime import datetime. No other changes are needed, since timedelta is not referenced anywhere else in the shown code. This keeps the DAG’s behavior identical while resolving the CodeQL finding.

Suggested changeset 1
airflow-provider/example_dags/example_schema_dag.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/airflow-provider/example_dags/example_schema_dag.py b/airflow-provider/example_dags/example_schema_dag.py
--- a/airflow-provider/example_dags/example_schema_dag.py
+++ b/airflow-provider/example_dags/example_schema_dag.py
@@ -7,7 +7,7 @@
 - Monitoring multiple tables in parallel
 """
 
-from datetime import datetime, timedelta
+from datetime import datetime
 
 from airflow import DAG
 from airflow.operators.python import PythonOperator
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
series.dtype.pyarrow_dtype
):
return series.astype("float64")
except Exception:

Check notice

Code scanning / CodeQL

Empty except Note

'except' clause does nothing but pass and there is no explanatory comment.

Copilot Autofix


In general, to fix an empty except you either (a) narrow the exception to the specific expected types and explain why it is safe to ignore them, or (b) handle the exception in a meaningful way, such as logging, then proceed with a safe fallback. Here, _ensure_numeric wants to be resilient: failures in pyarrow/decimal detection should simply mean “don’t convert, just return the original series”. We should keep that behavior but avoid swallowing unexpected errors silently.

The best fix with minimal functional change is:

  • Narrow each except to a more specific set where reasonable (e.g. ImportError and AttributeError), but given limited context and to avoid changing behavior, we’ll keep Exception and add a small handling action.
  • Log the exception at debug level using the logging module so users can diagnose issues when needed, but default behavior remains unaffected.
  • Add an explanatory comment clarifying that on any error we fall back to returning the original series.

Concretely in datacheck/rules/numeric_rules.py:

  • Add import logging near the top (without modifying existing imports).
  • In the first try block (lines 20–27), replace the empty except Exception: pass with except Exception as exc: and add a comment and a logging call such as logging.getLogger(__name__).debug(...).
  • Do the same in the second try block (lines 31–37).

This preserves existing logic: _ensure_numeric still returns the original series when conversion fails, but the exceptions are no longer entirely ignored.

Suggested changeset 1
datacheck/rules/numeric_rules.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/datacheck/rules/numeric_rules.py b/datacheck/rules/numeric_rules.py
--- a/datacheck/rules/numeric_rules.py
+++ b/datacheck/rules/numeric_rules.py
@@ -1,6 +1,7 @@
 """Numeric validation rules."""
 
 import pandas as pd
+import logging
 
 from datacheck.exceptions import ColumnNotFoundError, RuleDefinitionError
 from datacheck.results import RuleResult
@@ -24,8 +25,11 @@
             series.dtype.pyarrow_dtype
         ):
             return series.astype("float64")
-    except Exception:
-        pass
+    except Exception as exc:
+        # If pyarrow is unavailable or dtype inspection fails, fall back to the original series.
+        logging.getLogger(__name__).debug(
+            "Failed to coerce Arrow-backed decimal series to float64: %s", exc
+        )
     # Handle object dtype containing Python decimal.Decimal objects
     if series.dtype == object:
         try:
@@ -33,8 +37,11 @@
             first_valid = series.dropna()
             if len(first_valid) > 0 and isinstance(first_valid.iloc[0], decimal.Decimal):
                 return pd.to_numeric(series, errors="coerce")
-        except Exception:
-            pass
+        except Exception as exc:
+            # If decimal import or conversion fails, fall back to the original series.
+            logging.getLogger(__name__).debug(
+                "Failed to coerce Decimal-containing series to numeric: %s", exc
+            )
     return series
 
 
EOF
first_valid = series.dropna()
if len(first_valid) > 0 and isinstance(first_valid.iloc[0], decimal.Decimal):
return pd.to_numeric(series, errors="coerce")
except Exception:

Check notice

Code scanning / CodeQL

Empty except Note

'except' clause does nothing but pass and there is no explanatory comment.

Copilot Autofix


General approach: Avoid bare “do-nothing” except Exception blocks. Either (a) narrow the exception type and document why it is safe to ignore, or (b) log/record the error while preserving the current non-failing behavior.

Best fix here without changing existing functionality:

  • Keep the behavior that _ensure_numeric never raises from the decimal-detection logic and instead falls back to returning series.
  • Add a comment explaining that failures are intentionally ignored because conversion is best-effort.
  • Optionally capture the exception as e so future logging can be added, but to keep behavior strictly identical we will not log or re-raise.
  • Apply the same pattern to both except blocks in _ensure_numeric (lines 20–27 and 31–37) for consistency.

Concretely in datacheck/rules/numeric_rules.py:

  • Replace except Exception: followed by pass in the Arrow/pyarrow block with except Exception: # best-effort Arrow decimal handling and include a short explanatory comment inside the block.
  • Replace the except Exception: in the decimal.Decimal block the same way.

We do not need new imports or helper methods; the changes are limited to these except blocks and comments.

Suggested changeset 1
datacheck/rules/numeric_rules.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/datacheck/rules/numeric_rules.py b/datacheck/rules/numeric_rules.py
--- a/datacheck/rules/numeric_rules.py
+++ b/datacheck/rules/numeric_rules.py
@@ -25,6 +25,8 @@
         ):
             return series.astype("float64")
     except Exception:
+        # Best-effort Arrow decimal handling: if detection/conversion fails,
+        # fall back to returning the original series unchanged.
         pass
     # Handle object dtype containing Python decimal.Decimal objects
     if series.dtype == object:
@@ -34,6 +36,7 @@
             if len(first_valid) > 0 and isinstance(first_valid.iloc[0], decimal.Decimal):
                 return pd.to_numeric(series, errors="coerce")
         except Exception:
+            # Best-effort Decimal handling: on any failure, return the series as-is.
             pass
     return series
 
EOF
)
except Exception:
pass
except Exception:

Check notice

Code scanning / CodeQL

Empty except Note

'except' clause does nothing but pass and there is no explanatory comment.

Copilot Autofix


In general, to fix empty except blocks, either (a) narrow the exception type and handle it explicitly, (b) add at least logging or a comment explaining why it is safe to ignore, or (c) re-raise after doing necessary cleanup. Here, the intent is to attempt an optimized PyArrow-based conversion but to fall back silently to pd.to_datetime if PyArrow is unavailable or the cast fails. The best fix without changing functionality is to:

  • Add explanatory comments to both except blocks clarifying that the function will fall back to pd.to_datetime.
  • Optionally narrow the outer exception to ImportError (the main expected failure), while still keeping behavior identical for callers. However, per instruction to avoid changing functionality, we’ll keep the broad catch but document it.

Because the only snippet we may edit is _to_datetime_fast in datacheck/rules/temporal_rules.py, we will replace the two except Exception: pass blocks with versions that include clear comments explaining the intentional fallbacks. We will not add imports or logging libraries (to avoid new dependencies and behavior changes). No new methods or definitions are needed; we only modify these lines in this file.

Suggested changeset 1
datacheck/rules/temporal_rules.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/datacheck/rules/temporal_rules.py b/datacheck/rules/temporal_rules.py
--- a/datacheck/rules/temporal_rules.py
+++ b/datacheck/rules/temporal_rules.py
@@ -36,8 +36,13 @@
                         name=series.name,
                     )
                 except Exception:
+                    # If any error occurs in the fast Arrow-based path, fall back to
+                    # pandas' to_datetime below to preserve correctness.
                     pass
     except Exception:
+        # If pyarrow is not available or Arrow dtype handling fails, silently fall
+        # back to pandas' to_datetime below. This keeps behavior identical while
+        # only sacrificing the fast path.
         pass
     return pd.to_datetime(series, errors="coerce", format="mixed")
 
EOF
yash-chauhan-dev and others added 7 commits February 24, 2026 11:16
- CLI help text: "Lightweight data quality validation tool" -> "A linter for data pipelines"
- CLI schema command: "Schema evolution detection" -> "Enforce schema contracts"
- datacheck/__init__.py: update module docstring
- pyproject.toml: "data quality rules" -> "deterministic validation rules"; remove data-quality/data-observability keywords, add data-linter/schema-contracts
- airflow/operators.py + __init__.py: "data quality checks" -> "validation rules"; "Detect schema" -> "Enforce schema contracts"
- airflow-provider: pyproject.toml description/keywords cleaned; provider __init__ docstring; example DAG docstring
- airflow-provider/README.md: "Detects schema changes" -> "Enforces schema contracts"
- github-action/README.md: "Validate data quality" -> "Enforce deterministic validation rules"
- guides/cli-guide.md: schema section heading + command table + code comment
- guides/python-api.md: schema operator description + Airflow example comment
- guides/guide-who-uses-datacheck.md: "schema evolution detection" -> enforcement framing; pipeline diagram comments
- docs/index.md: "detect schema changes" -> "enforce schema contracts" (3 occurrences)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LoaderFactory.create_loader extracted 'columns' explicitly but also
left it in file_kwargs, causing CSVLoader to receive it twice.
Added 'columns' to the exclusion list in file_kwargs.
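The fix boils down to excluding every explicitly consumed key from the pass-through dict (key set and function name are illustrative, not the factory's actual code):

```python
# Hypothetical keys the factory consumes explicitly before delegating
_EXPLICIT_KEYS = {"columns", "limit"}

def split_loader_kwargs(kwargs: dict) -> tuple:
    """Keys pulled out explicitly must also be excluded from the
    pass-through dict, or the loader receives them twice and raises
    a duplicate-keyword-argument TypeError."""
    columns = kwargs.get("columns")
    file_kwargs = {k: v for k, v in kwargs.items() if k not in _EXPLICIT_KEYS}
    return columns, file_kwargs

cols, rest = split_loader_kwargs({"columns": ["id"], "sep": ";"})
# cols == ["id"]; "columns" no longer appears in rest
```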

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docs/index.md: remove Dagster and Prefect (no integrations exist)
- pyproject.toml: remove dagster/prefect keywords
- github-action/README.md: remove gcs/azure from extras list; CSV/Parquet only for data-source input
- SECURITY.md: remove GCS and Azure from optional dependencies

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
validate:
- Add all 20+ options in grouped tables (data source / output / execution / logging)
- Add positional [DATA_SOURCE] argument and direct file example
- Add echo $? to reinforce gating behavior

schema compare:
- Fix incorrect comment: compare does NOT fail by default - only with --fail-on-breaking
- Add --fail-on-breaking to examples
- Add full schema compare options table

README_PYPI.md: add direct file and echo $? examples to validate quickstart

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix all GitHub Actions using non-existent @v6 versions across ci.yml,
  security.yml, auto-release.yml, release.yml, pr-version-check.yml
  (checkout@v4, setup-python@v5, upload-artifact@v4)
- Remove data-quality.yml from this repo's CI - it is a user template,
  not a workflow for the DataCheck repo itself (no .datacheck.yaml here)
- Fix validate command one-line description to enforcement language
- Fix 30 ruff linting errors: unused imports, dead variable, loop variable,
  Optional[X] -> X | None modernisation, quoted type annotations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- builder.py: use separate variable name for int(params) in min/max_length
  to avoid type conflict with str-typed v used elsewhere in the function
- sample_data.py: add type annotation to nested seg() helper and data list
- engine.py: cast to_dict() result to dict[str, Any] for parse_results()
- loader.py: add type: ignore[call-overload] on pd.read_csv calls where
  **kwargs spread prevents pandas-stubs overload resolution
- poetry.lock: regenerated after types-PyYAML and pandas-stubs were installed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>