Skip to content

Update data validation btp#21

Open
JunAishima wants to merge 6 commits intoNSLS2:mainfrom
JunAishima:update-data-validation-btp
Open

Update data validation btp#21
JunAishima wants to merge 6 commits intoNSLS2:mainfrom
JunAishima:update-data-validation-btp

Conversation

@JunAishima
Copy link
Contributor

Use data validation from bluesky-tiled-plugin, which does more checks and correction of dataset errors, as well as performing data reading checks that are more efficient than reading out the entire dataset (read first and last images).

The data validation functionality in bluesky-tiled-plugin requires additional dependencies that are included in the full "tiled" installation, which are not in tiled-client.

Also, the validation functionality requires use of the migration catalog, so this must be used instead of raw.

Changes required in other repos:

  • Correcting errors requires the ability to write back into Tiled. This requires the write:metadata and register scopes in a new Tiled API key (ansible)

 * replace the task that just reads the entire datasets with
   what is in btp, which also includes checks on the data
 * use tiled instead of tiled-client - validation requires more
   dependencies

@task(retries=2, retry_delay_seconds=10)
def read_all_streams(uid, beamline_acronym):
@flow(retries=2, retry_delay_seconds=10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a conceptual improvement in defining flows directly (instead of flows comprised of tasks)? Asking for a friend. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea is that tasks are smaller components and can be retried - https://docs.prefect.io/v3/concepts/flows#why-both-flows-and-tasks
you could say that we could have made reading the individual streams in the previous read_all_streams() as an individual task, while making the main function as a flow.
no restrictions on flows calling tasks and vice versa! (as well as a flow calling a flow and task calling a task)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note on a task calling a task, it is possible, but not recommended (see discussion here).

@task(retries=2, retry_delay_seconds=10)
def read_all_streams(uid, beamline_acronym):
@flow(retries=2, retry_delay_seconds=10)
def data_validation(uid, beamline_acronym="smi"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may still need to keep the old validation task that uses the /smi/raw catalog while the data is still being written to Mongo (and add the new validation of the /smi/migration catalog). It doesn't cost us much (correctly me if I'm wrong), but for completeness I think it makes sense to keep it.
We can the old function once Mongo is turned off and /migration is renamed /raw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you want to keep the task that reads all data as a check for /smi/raw? we should talk about this more - this change would remove one of the major advantages of this PR, that we are not reading all of the data, which could be the main reason for prefect-worker2 CPU usage issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, if improving the performance was the main reason -- maybe just remove it then. I don't know if anyone paid any attention to this reading failing.
Validating /migration (what you've added) is important though; it fixes any inconsistencies in structures to make the data readable.

@JunAishima JunAishima requested review from danielballan and removed request for AbbyGi February 18, 2026 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments