COO-140 Added preprocessing tasks and initial pipeline engine by Yashvi-Sharma · Pull Request #3 · CaltechOpticalObservatories/eregion

Yashvi-Sharma · 2026-01-21T01:02:34Z

Tested masterbias creation with both direct task calls and pipeline flow


weatherhead99 · 2026-01-22T18:41:04Z

looks like matplotlib is neither a hard nor a soft dependency of eregion (am testing the notebook in a clean venv). Not sure whether it should be or not. Maybe not, in fact. However, it is currently imported by image.py, so at the moment it has to be a hard dependency in pyproject.toml for the code to work.

EDIT: similarly for joblib
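
If matplotlib were instead kept as a soft dependency, one common pattern is to defer the import until plotting is actually requested. The following is only a minimal sketch of that idea: `require_optional`, `plot_image`, and the `plot` extra are hypothetical names for illustration, not part of eregion.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or raise a helpful ImportError.

    `extra` is a hypothetical optional-dependency group name, such as one
    that could be declared in pyproject.toml.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is needed for this feature; "
            f"install it with: pip install 'eregion[{extra}]'"
        ) from exc


def plot_image(data):
    # Imported only when plotting is requested, so importing the enclosing
    # module does not fail in an environment without matplotlib.
    plt = require_optional("matplotlib.pyplot", extra="plot")
    fig, ax = plt.subplots()
    ax.imshow(data)
    return fig
```

With this layout a clean venv can still import the module; only a call to `plot_image` would need matplotlib.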

weatherhead99 · 2026-01-22T18:47:07Z

it looks to me like tasks, configs, datamodels etc. all get installed as top-level packages (at least when doing a pip install -e . on eregion). This is bad: we shouldn't be taking up that top-level namespace (when I start up a new console I can just do import tasks and that's eregion). All of these need to be installed either under an eregion package or in a namespace package.

Also, some of these aren't included in pyproject.toml and this causes import errors if the notebook is not run in the correct relative path
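
One way to check which top-level import names a distribution claims is `importlib.metadata.packages_distributions()` (Python 3.10+). The sketch below is an illustration under assumptions: `top_level_names` is a hypothetical helper, and the result for editable installs may differ from a regular install depending on the build backend.

```python
from importlib import metadata


def top_level_names(dist_name, mapping=None):
    """Return the sorted top-level import names provided by `dist_name`.

    `mapping` maps import name -> list of distribution names; when omitted
    it is read from the current environment (requires Python 3.10+).
    """
    if mapping is None:
        mapping = metadata.packages_distributions()
    return sorted(
        name for name, dists in mapping.items() if dist_name in dists
    )


def stray_top_level_names(dist_name, mapping=None):
    """Names installed by `dist_name` other than the package of that name."""
    return [n for n in top_level_names(dist_name, mapping) if n != dist_name]
```

After a restructuring that moves everything under one package, `stray_top_level_names("eregion")` would be expected to return an empty list.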

weatherhead99 · 2026-01-22T18:52:02Z

FITSLoader doesn't appear to work with a .fits.gz file. Not immediately clear why, but this needs to work since it's close to a standard extension (including for DEIMOS original DRP data). Similarly for .fz but I haven't tested that yet
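
For reference, gzip-compressed input can be recognised by its leading magic bytes rather than by its extension. This is a standard-library-only sketch, not FITSLoader's actual code; `open_maybe_gzip` is a hypothetical helper. It does not cover `.fz` files, which to my knowledge use FITS tile compression rather than whole-file gzip.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def open_maybe_gzip(path):
    """Open `path` for binary reading, decompressing if it holds gzip data.

    The decision is based on the file contents, so `.fits.gz` files and
    compressed files with an unexpected name are handled the same way.
    """
    path = Path(path)
    with path.open("rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return path.open("rb")
```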

weatherhead99 · 2026-01-22T18:55:08Z

musing: in trying to run the pipeline, since my path is different, I was expecting an error or something, but it all works fine because the path is a glob and there are simply no files at that path on my machine. That is as it should be, but I think this is unexpected behaviour for some users and risks a silent and annoying failure.

Possibilities to think about:

an option that allows us to specify that we actually do expect some data to be there (maybe even on by default)?
for glob paths, check that the path itself actually exists (I think this is probably the usual use case), and allow an override option if it doesn't?
warnings.warn early if there actually isn't any data to process? (I know this can't be done cleanly in the lazy generator, but probably can in the non-lazy case)
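
The three possibilities above could be combined roughly as follows for the non-lazy case. This is a sketch under assumptions: `resolve_inputs`, `require_data`, and `check_parent` are hypothetical names, not eregion's API.

```python
import glob
import warnings
from pathlib import Path


def resolve_inputs(pattern, require_data=True, check_parent=True):
    """Expand a glob pattern eagerly and complain early if it is empty."""
    parent = Path(pattern).parent
    # Only check the directory when it is a literal path, not itself a glob.
    parent_is_literal = not any(ch in str(parent) for ch in "*?[")
    if check_parent and parent_is_literal and not parent.is_dir():
        raise FileNotFoundError(f"input directory does not exist: {parent}")

    matches = sorted(glob.glob(pattern))
    if not matches:
        msg = f"no files matched {pattern!r}"
        if require_data:
            raise FileNotFoundError(msg)
        # Data is optional: warn instead of failing.
        warnings.warn(msg, stacklevel=2)
    return matches
```

Here `check_parent=False` plays the role of the override, and `require_data=False` downgrades an empty match from an error to a warning.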

Yashvi-Sharma · 2026-01-22T21:09:32Z

> looks like matplotlib is neither a hard nor a soft dependency of eregion (am testing the notebook in a clean venv). Not sure whether it should be or not. Maybe not, in fact. However, it is currently imported by image.py, so at the moment it has to be a hard dependency in pyproject.toml for the code to work.
>
> EDIT: similarly for joblib

Fixed this and updated pyproject.toml

Yashvi-Sharma · 2026-01-22T21:10:18Z

> it looks to me like tasks, configs, datamodels etc. all get installed as top-level packages (at least when doing a pip install -e . on eregion). This is bad: we shouldn't be taking up that top-level namespace (when I start up a new console I can just do import tasks and that's eregion). All of these need to be installed either under an eregion package or in a namespace package.
>
> Also, some of these aren't included in pyproject.toml and this causes import errors if the notebook is not run in the correct relative path

Fixed this and moved the modules under the `eregion` package

Yashvi-Sharma · 2026-01-22T23:19:12Z

> FITSLoader doesn't appear to work with a .fits.gz file. Not immediately clear why, but this needs to work since it's close to a standard extension (including for DEIMOS original DRP data). Similarly for .fz but I haven't tested that yet

That is strange; it works when I give load_image_fits() a .gz file path, though apparently there are known issues when a file-like object is passed. Could you give an example of how it fails?

Yashvi-Sharma · 2026-01-23T01:49:15Z

> musing: in trying to run the pipeline, since my path is different, I was expecting an error or something, but it all works fine because the path is a glob and there are simply no files at that path on my machine. That is as it should be, but I think this is unexpected behaviour for some users and risks a silent and annoying failure.
>
> Possibilities to think about:
>
> an option that allows us to specify that we actually do expect some data to be there (maybe even on by default)?

This may mess up the case in which a directory is being watched for files, but we can set it to off by default in watch_mode and on otherwise. I'll add it, and if there are no files when they are expected, the task fails (which in pipeline mode should be recognized as a hard failure; I haven't added that yet but will do so as well).

> for glob paths, check that the path itself actually exists (I think this is probably the usual use case), and allow an override option if it doesn't?

Added path checking for globs in the latest commit, but what do you mean by an override option?

> warnings.warn early if there actually isn't any data to process? (I know this can't be done cleanly in the lazy generator, but probably can in the non-lazy case)

Added logger warnings, though they will only show up when the lazy generator is looped over.
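
The reason the warning appears late is that a generator function's body does not run until iteration starts. The sketch below contrasts that with a regular function that peeks at the first match immediately and stays lazy afterwards; both function names and the logger name are hypothetical, not the code in this PR.

```python
import glob
import itertools
import logging

logger = logging.getLogger("lazy_demo")


def iter_files_lazy(pattern):
    """Generator: the warning is only reached once the caller iterates."""
    found = False
    for path in glob.iglob(pattern):
        found = True
        yield path
    if not found:
        logger.warning("no files matched %r", pattern)


def iter_files_checked(pattern):
    """Regular function: warns at call time, then yields matches lazily."""
    matches = glob.iglob(pattern)
    first = next(matches, None)
    if first is None:
        logger.warning("no files matched %r", pattern)
        return iter(())
    # Put the peeked item back in front of the remaining lazy matches.
    return itertools.chain([first], matches)
```

Calling `iter_files_lazy(pattern)` on an empty directory logs nothing until the result is looped over, while `iter_files_checked(pattern)` logs as soon as it is called.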


Yashvi-Sharma added 7 commits January 12, 2026 13:55

COO-141 COO-142 COO-143 COO-144 COO-145 Added preproc tasks

7f6838c

updated tasks to use joblib for intra-parallelization instead of prefect

b8ff5d1

Moved all config loader classes to one place

da73713

Merge branch 'refs/heads/main' into preproc_tasks_COO-140

74428a1

Minor fixes, made typing hints consistent

2ab9a88

Removed prefect from intra-task parallel operations, switched that to…

e5e8146

… joblib, added hook for fitsloader

COO-150 initial commit for pipeline engine using prefect, with master…

29c5ccd

…bias creation example on DTU_dettest singledet data

Yashvi-Sharma requested review from prkrtg and weatherhead99 January 21, 2026 01:02

fixed logger configuration

68252e2

Updated pyproject.toml, restructured modules

db81bd3

Yashvi-Sharma added 2 commits January 22, 2026 17:50

added path checking, log strings, and a tiny mod in fitsloader that m…

ca53587

…ay fix the .gz loading issue

added a param for requiring data to be present in input_source

3eb093d

COO-140 Added preprocessing tasks and initial pipeline engine #3
Yashvi-Sharma wants to merge 11 commits into main from preproc_tasks_COO-140
