Skip to content

Conversation

@damonmcc
Copy link
Member

@damonmcc damonmcc commented Jan 14, 2026

related to #987

most of the commits are cleanup of things I bumped into. the sections below are related to the more significant changes

all builds on this branch

successful nightly QA build

source data versions csv

a human-friendly "name" or "display name" of our input datasets are not things we've had to worry about before and it seems like data from edm-recipes are the ones we should focus on first

added name to the InputDataset model and added get_name() to the ingest_datastore.Connector

one of the new relevant tests is in the integration test file test_plan_load because plan.write_source_data_versions is only used in load.load_source_data_from_resolved_recipe. would like to punt on omptimizing where these things and their tests live

to avoid issues reading old csvs, I held off on improving the column names even though the frequently used publishing.get_source_data_versions() renames the columns

Template DB's source data versions from this branch

Screenshot 2026-01-21 at 4 52 08 PM

Seems like some datasets are old (library?) and don't have a "name" declared. library templates only had an id and a body of text (description) that sometimes had a name in the first line. that's not something worth parsing IMO, per @fvankrieken suggestion they'll be NULL

FacDB source data versions from this branch

Screenshot 2026-01-21 at 4 53 42 PM

FacDB cli

before

cli arg wasn't being used. tried to do run_pipelines for non-existing dataset named a_name and it tried to run all pipelines declared in datasets.yml

Screenshot 2026-01-14 at 12 06 52 PM

after

Screenshot 2026-01-14 at 12 08 16 PM

@codecov
Copy link

codecov bot commented Jan 14, 2026

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 74.26%. Comparing base (a9bce9d) to head (032540a).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
dcpy/lifecycle/builds/plan.py 92.30% 0 Missing and 1 partial ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
dcpy/connectors/ingest_datastore.py 100.00% <100.00%> (ø)
dcpy/lifecycle/builds/load.py 85.48% <ø> (ø)
dcpy/models/lifecycle/builds.py 98.03% <100.00%> (+0.01%) ⬆️
dcpy/models/lifecycle/ingest.py 96.73% <100.00%> (+0.02%) ⬆️
dcpy/lifecycle/builds/plan.py 88.88% <92.30%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@damonmcc damonmcc marked this pull request as draft January 15, 2026 03:42
@damonmcc damonmcc changed the title imporve FacDB utils add name to source data exports Jan 15, 2026
@damonmcc damonmcc changed the title add name to source data exports add name to build source data export Jan 15, 2026
@damonmcc damonmcc force-pushed the dm-facdb-source-names branch 26 times, most recently from 5845c61 to bc4f0fd Compare January 20, 2026 18:54
@damonmcc damonmcc requested a review from a team January 20, 2026 19:01
@damonmcc damonmcc force-pushed the dm-facdb-source-names branch from bc4f0fd to 3bb1884 Compare January 20, 2026 21:16
assert planned.is_resolved, "Dataset is not resolved"

def test_input_datasets(self):
add_required_version_var_to_env()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if you have a sec, this should be a fixture that adds the env var, yields, then pops the env var

Copy link
Contributor

@alexrichey alexrichey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! a few questions, but LGTM. You might try planning/loading a few recipes, in particular maybe the factfinder recipe_qa.py

@damonmcc damonmcc force-pushed the dm-facdb-source-names branch 2 times, most recently from 1222f07 to 50fd00f Compare January 21, 2026 21:39

- name: Dataloading
run: python -m dcpy.lifecycle.builds.load load
run: python3 -m dcpy lifecycle builds load load load
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too many loads here?

On that note, can you also run a nightly qa on this branch just as a sanity check

Copy link
Member Author

@damonmcc damonmcc Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nightly qa run successful (minus CDBG for unrelated reasons and PLUTO taking forever)

Comment on lines 11 to 13
- name: dataset_id
- name: dataset_name
- name: version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

Copy link
Member Author

@damonmcc damonmcc Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I must've done this when I was still hoping to change the csv's column names, but I kept the old ones

would prefer to keep the old ones for now (of course adding a new one) and change em later

for posterity before the rebase blast them away, was gonna make them dataset_id, dataset_name, version, file_type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair but also sad

@damonmcc damonmcc force-pushed the dm-facdb-source-names branch 2 times, most recently from c5dda68 to 53c0df9 Compare January 22, 2026 03:19
@damonmcc damonmcc force-pushed the dm-facdb-source-names branch from 53c0df9 to 032540a Compare January 22, 2026 03:54
@damonmcc damonmcc merged commit aef37fb into main Jan 22, 2026
115 of 123 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants