add name to build source data export #2174
Conversation
Codecov Report ❌
Additional details and impacted files are in the full report in Codecov by Sentry.
Force-pushed from 5845c61 to bc4f0fd
Force-pushed from bc4f0fd to 3bb1884
assert planned.is_resolved, "Dataset is not resolved"

def test_input_datasets(self):
    add_required_version_var_to_env()
nit: if you have a sec, this should be a fixture that adds the env var, yields, then pops the env var
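Something like this, for instance (a sketch; the variable name and value are placeholders for whatever add_required_version_var_to_env actually sets):

```python
import os

import pytest


@pytest.fixture
def required_version_env():
    # Placeholder variable name/value: substitute whatever
    # add_required_version_var_to_env actually sets.
    os.environ["RECIPE_REQUIRED_VERSION"] = "latest"
    yield
    os.environ.pop("RECIPE_REQUIRED_VERSION", None)


def test_input_datasets(required_version_env):
    ...  # runs with the var set; the fixture pops it afterwards
```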
alexrichey left a comment
Good stuff! A few questions, but LGTM. You might try planning/loading a few recipes, in particular maybe the factfinder recipe_qa.py.
Force-pushed from 1222f07 to 50fd00f
.github/workflows/cbbr_build.yml (Outdated)
  - name: Dataloading
-   run: python -m dcpy.lifecycle.builds.load load
+   run: python3 -m dcpy lifecycle builds load load load
too many loads here?
On that note, can you also run a nightly QA on this branch, just as a sanity check?
Nightly QA run successful (minus CDBG for unrelated reasons, and PLUTO taking forever).
products/ceqr/models/_sources.yml (Outdated)
- name: dataset_id
- name: dataset_name
- name: version
🙏
Actually, I must've done this when I was still hoping to change the csv's column names, but I kept the old ones.
Would prefer to keep the old ones for now (of course adding a new one) and change them later.
For posterity, before the rebase blasts them away: I was gonna make them dataset_id, dataset_name, version, file_type.
Fair, but also sad.
This function is only used in PFF; every other relevant product uses dcpy.lifecycle for this.
Force-pushed from c5dda68 to 53c0df9
Force-pushed from 53c0df9 to 032540a
Related to #987

Most of the commits are cleanup of things I bumped into; the sections below cover the more significant changes.

- all builds on this branch
- successful nightly QA build
source data versions csv
- a human-friendly "name" or "display name" of our input datasets is not something we've had to worry about before, and it seems like data from `edm-recipes` is what we should focus on first
- added `name` to the `InputDataset` model and added `get_name()` to the ingest_datastore `Connector` (rough sketch below)
- one of the new relevant tests is in the integration test file `test_plan_load`, because `plan.write_source_data_versions` is only used in `load.load_source_data_from_resolved_recipe`. would like to punt on optimizing where these things and their tests live
- to avoid issues reading old csvs, I held off on improving the column names, even though the frequently used `publishing.get_source_data_versions()` renames the columns

Template DB's source data versions from this branch:
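For the `InputDataset`/`Connector` change mentioned above, roughly the shape of it (a sketch, not the actual diff; the base class and the surrounding fields are assumptions):

```python
from pydantic import BaseModel


class InputDataset(BaseModel):
    # Existing fields are assumed for illustration; `name` is the addition.
    id: str
    version: str | None = None
    name: str | None = None  # human-friendly display name; None for old library templates


class Connector:
    """Ingest datastore connector, sketched with the new name lookup."""

    def get_name(self, dataset_id: str, version: str) -> str | None:
        # Assumed behavior: read the dataset's metadata from edm-recipes and
        # return its declared name, or None if the template predates the field.
        ...
```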
Seems like some datasets are old (library?) and don't have a "name" declared. Library templates only had an id and a body of text (`description`) that sometimes had a name in the first line. That's not something worth parsing IMO; per @fvankrieken's suggestion they'll be NULL.

FacDB source data versions from this branch:
FacDB cli

before
The cli arg wasn't being used: tried run_pipelines for a non-existent dataset named `a_name` and it ran all pipelines declared in `datasets.yml`.

after
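Roughly the post-fix behavior described above, as a sketch (the function and the pipeline objects here are illustrative, not the repo's actual API):

```python
# Illustrative sketch only: names here are stand-ins, not FacDB's actual code.
def run_pipelines(dataset_name: str | None, datasets: dict) -> None:
    """Run the pipeline for one declared dataset, or all of them if no name is given."""
    if dataset_name is None:
        selected = datasets  # no argument: run everything, as before
    elif dataset_name in datasets:
        selected = {dataset_name: datasets[dataset_name]}
    else:
        # a non-existent name like `a_name` should now fail loudly instead of
        # silently falling through to "run all pipelines"
        raise ValueError(f"No dataset named {dataset_name!r} declared in datasets.yml")
    for name, pipeline in selected.items():
        pipeline.run()  # assumed interface, for illustration
```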