Skip to content

COO-140 Added preprocessing tasks and initial pipeline engine#3

Open
Yashvi-Sharma wants to merge 11 commits intomainfrom
preproc_tasks_COO-140
Open

COO-140 Added preprocessing tasks and initial pipeline engine#3
Yashvi-Sharma wants to merge 11 commits intomainfrom
preproc_tasks_COO-140

Conversation

@Yashvi-Sharma
Copy link
Collaborator

Tested masterbias creation with both direct task calls and pipeline flow

@weatherhead99
Copy link

weatherhead99 commented Jan 22, 2026

looks like matplotlib is not either a hard or soft dependency or eregion (am testing the notebook in a clean venv). Not sure whether it should be or not. Maybe not in fact. However, it is currently imported by image.py. Therefore at the moment it should be a hard dependency in pyproject.toml to make the code work

EDIT: similarly for joblib

@weatherhead99
Copy link

weatherhead99 commented Jan 22, 2026

it looks to me like tasks, configs, datamodels etc all get installed as top level packages (at least when doing a pip install -e . on eregion. This is bad, we shouldn't be taking up that top level namespace. (when I start up a new console I can just do import tasks and that's eregion.) All these need to be installed either under an eregion package or in a package namespace.

Also, some of these aren't included in pyproject.toml and this causes import errors if the notebook is not run in the correct relative path

@weatherhead99
Copy link

FITSLoader doesn't appear to work with a .fits.gz file. Not immediately clear why, but this needs to work since it's close to a standard extension (including for DEIMOS original DRP data). Similarly for .fz but I haven't tested that yet

@weatherhead99
Copy link

musing: in trying to run the pipeline, since my path is different, I was expecting an error or something, but it all works fine because the path is a glob. As it should, and there are no files at that path in my machine. But I think this is unexpected behaviour for some users and risks a silent and annoying failure.

Possibilities to think about:

  1. an option that allows us to specify that we actually do expect some data to be there? (mnaybe even on by default)?
  2. for glob paths, check that the path itself actually exists (I think this is probably the usual use case), and allow an override option if it doesn't?
  3. warnings.warn early if there actually isn't any data to process? (I know this can't be done cleanly in the lazy generator, but probably can in the non-lazy case)

@Yashvi-Sharma
Copy link
Collaborator Author

looks like matplotlib is not either a hard or soft dependency or eregion (am testing the notebook in a clean venv). Not sure whether it should be or not. Maybe not in fact. However, it is currently imported by image.py. Therefore at the moment it should be a hard dependency in pyproject.toml to make the code work

EDIT: similarly for joblib

Fixed this and updated pyproject.toml

@Yashvi-Sharma
Copy link
Collaborator Author

it looks to me like tasks, configs, datamodels etc all get installed as top level packages (at least when doing a pip install -e . on eregion. This is bad, we shouldn't be taking up that top level namespace. (when I start up a new console I can just do import tasks and that's eregion.) All these need to be installed either under an eregion package or in a package namespace.

Also, some of these aren't included in pyproject.toml and this causes import errors if the notebook is not run in the correct relative path

Fixed this and moved modules under `eregion' package

@Yashvi-Sharma
Copy link
Collaborator Author

FITSLoader doesn't appear to work with a .fits.gz file. Not immediately clear why, but this needs to work since it's close to a standard extension (including for DEIMOS original DRP data). Similarly for .fz but I haven't tested that yet

That is strange, it works when I give the load_image_fits() a .gz file path. Though apparently there are known issues when a file-like object is passed. Could you give an example of how it fails?

@Yashvi-Sharma
Copy link
Collaborator Author

musing: in trying to run the pipeline, since my path is different, I was expecting an error or something, but it all works fine because the path is a glob. As it should, and there are no files at that path in my machine. But I think this is unexpected behaviour for some users and risks a silent and annoying failure.

Possibilities to think about:

  1. an option that allows us to specify that we actually do expect some data to be there? (mnaybe even on by default)?

This may mess up the case in which a directory is being watched for files, but we can set it to off by default in watch_mode and on otherwise. I'll add it, and if there are no files when they are expected, the task fails (which in pipeline mode should be recognized as a hard failure, which I haven't added yet but will do as well)

  1. for glob paths, check that the path itself actually exists (I think this is probably the usual use case), and allow an override option if it doesn't?

Added path checking for glob in latest commit, but what do you mean by override option?

  1. warnings.warn early if there actually isn't any data to process? (I know this can't be done cleanly in the lazy generator, but probably can in the non-lazy case)

Added logger warnings, they will only show up though when lazy generator is looped on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants