Local banzai setup for running at site #427
Conversation
- Remove outdated docker-compose.yml
- Rename docker-compose.local.yml to docker-compose-site.yml
- Rename default local banzai directory from local_banzai to site_banzai
- Rename default db name from local-banzai.db to site-banzai.db
- Add line to cache sync daemon to create calibrations_cache directory if needed
```python
args_dict = args

# If a separate calibration db address is not provided, fall back to using the primary db address
if 'cal_db_address' not in args_dict or args_dict.get('cal_db_address') is None:
```
This feels like it should be in the main.py parse args code rather than the context. The context stuff doesn't really care what we put into the object itself.
```python
                    default='sqlite:///banzai-test.db',
                    help='Database address: Should be in SQLAlchemy form')
parser.add_argument('--calibration-db-address', dest='cal_db_address',
                    help='Optional separate database address for getting calibration files. Defaults to using the same address as --db-address.')
```
This is where the cal-db-address should be set. Default the arg to None and then check for None in this function.
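A minimal sketch of that suggestion, assuming an argparse-based entry point (the function name and surrounding structure are illustrative, not the PR's actual main.py):

```python
import argparse

def parse_args(argv=None):
    # Sketch: resolve the calibration-db fallback at parse time so the
    # context object never has to know about the default.
    parser = argparse.ArgumentParser()
    parser.add_argument('--db-address', dest='db_address',
                        default='sqlite:///banzai-test.db',
                        help='Database address: Should be in SQLAlchemy form')
    parser.add_argument('--calibration-db-address', dest='cal_db_address',
                        default=None,
                        help='Optional separate database address for calibration files.')
    args = parser.parse_args(argv)
    if args.cal_db_address is None:
        # Fall back to the primary db address here, not in the context
        args.cal_db_address = args.db_address
    return args
```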
I tried your suggestion of removing the cal_db_address fallback from context.py but that caused issues in the e2e tests because some parts of the code (e2e tests and celery workers) don’t use parse_args to set up the context, and were therefore failing to set the cal_db_address fallback to db_address.
It seems like the solution is one of:
- add the init logic back to context.py
- use access patterns like `getattr(runtime_context, 'cal_db_address', runtime_context.db_address)` everywhere cal_db_address is used (messy)
- require setting cal_db_address explicitly (might break existing banzai setups)
- get rid of the separate cal_db_address entirely

Any of these fixes is relatively easy to implement, but option 4 seems best from an overall complexity standpoint if we are OK with the docker-compose-local.yml setup requiring users to set up their own db from scratch.
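For context, option 2 would mean this defensive lookup at every call site (a sketch; `SimpleNamespace` stands in for whatever object the e2e tests and celery workers build without parse_args):

```python
from types import SimpleNamespace

# A context built without going through parse_args, so cal_db_address is missing
runtime_context = SimpleNamespace(db_address='sqlite:///banzai-test.db')

# Every consumer of cal_db_address would need this pattern
cal_db_address = getattr(runtime_context, 'cal_db_address', runtime_context.db_address)
```

Note that `getattr` with a default only covers a *missing* attribute; a context that explicitly sets `cal_db_address = None` would additionally need an `or runtime_context.db_address` fallback, which is part of why this option is messy.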
I've gone ahead with option 1 (adding the init logic back to context.py) just to get the tests working again.
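Option 1 can be as small as a guard in the context's `__init__`. A sketch of the idea (a minimal stand-in, not banzai's actual Context class):

```python
class Context:
    """Minimal stand-in for banzai's runtime context (illustrative only)."""
    def __init__(self, settings):
        for key, value in settings.items():
            setattr(self, key, value)
        # If a separate calibration db address was not provided (or was None),
        # fall back to the primary db address, mirroring the diff above.
        if getattr(self, 'cal_db_address', None) is None:
            self.cal_db_address = self.db_address
```

Because this runs wherever the context is constructed, callers that bypass parse_args (e2e tests, celery workers) still get a usable `cal_db_address`.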
```python
raise FrameNotAvailableError(f"Frame {frame_id} not found in archive")

# Check if 'url' field exists in the response
if 'url' not in response_data:
```
What is the error trying to check? This feels too specific and like you are trying to solve something else here.
I think I encountered some files that were missing the s3 url. Something from when I was working on the old cache setup a while back. Unfortunately I'm fuzzy on the details, but here's an example log with a file that prompted this:
```
2025-10-22 17:36:36.624 | 2025-10-22 21:36:36.620 ERROR: sync: Unexpected error downloading lsc0m412-sq35-20240315-bpm-central30x30.fits.fz (frameid: 69366609, type: BPM): 'url' | {"processName": "MainProcess"}
...
2025-10-22 17:36:36.624 | bytes = buffer.write(requests.get(response.json()['url'], stream=True, timeout=60).content)
2025-10-22 17:36:36.624 | KeyError: 'url'
```
I don't think the code is hitting this file anymore, and I tried running without this block and there were no issues. So maybe best to remove it?
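If the check is kept, replacing the bare dict access with an explicit lookup turns the opaque `KeyError: 'url'` from the log above into a descriptive error. A sketch (the local `FrameNotAvailableError` class is a stand-in for the exception used in the diff, and the function name is hypothetical):

```python
class FrameNotAvailableError(Exception):
    """Stand-in for the exception raised in the diff above."""

def get_frame_url(response_data, frame_id):
    # .get() avoids the bare KeyError: 'url' seen in the sync daemon log
    url = response_data.get('url')
    if url is None:
        raise FrameNotAvailableError(
            f"Frame {frame_id} has no 'url' field in the archive response")
    return url
```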
```python
from kombu import Connection, Exchange


def post_to_processing_queue(filename, path, broker_url, exchange_name, **kwargs):
```
This feels redundant with the file utils function. What new things does this add?
This uses the file path rather than the frameid with the intended use case of wanting to process a file that exists on the local disk. Maybe it would be cleaner to modify the file utils function to accept a frameid or path?
I am leaning towards keeping post_to_processing_queue separate from post_to_archive_queue in file utils, both because of the different intended workflows (local disk vs. S3) and because this is a small function with a limited scope that's maybe not worth abstracting elsewhere. Let me know if you disagree.
cmccully left a comment:
Some recommendations for cleanup, but overall looks fine.
You also need to add a change log message and update the pyproject version number.
- Add changelog entry and bump version to 1.28.0
- Rename .site-banzai-env to site-banzai-env (remove hidden prefix)
- Change "Running at Site" to "Running Locally" in README
- Use SQLAlchemy make_url() instead of string splitting in dbs.py
- Add argparse to queue_images.py and use named FITS extension
- Remove cal_db_address fallback from context.py (already in main.py)
- Fix kernel name in example notebook
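The `make_url()` swap is worth it because SQLAlchemy parses every URL component robustly instead of relying on hand-rolled string splitting. A quick illustration of the API (the addresses here are made up):

```python
from sqlalchemy.engine import make_url

# Parse a SQLAlchemy-style address instead of splitting the string by hand
url = make_url('postgresql://banzai:secret@db.example.org:5432/banzai')
print(url.drivername, url.host, url.port, url.database)

# Works just as well for the sqlite addresses used in this repo
print(make_url('sqlite:///site-banzai.db').database)
```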
Added the requested changes. I replied with details on two of your comments rather than changing them; let me know if anything else is needed for those to be resolved.
Changes in service of #422
This PR introduces the local setup we'll use for site deployments of banzai, sans calibration caching, which will be added in a subsequent PR.
Local Banzai Notes
To run:
This requires an env file called .site-banzai-env that should look like this:
In order to send images to be processed, run:
The data to be processed should be in the directory ${HOST_DATA_DIR}/raw. The output will be saved in ./${HOST_PROCESSED_DIR}.
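Assuming, purely for illustration, that the env file sets `HOST_DATA_DIR=./data` and `HOST_PROCESSED_DIR=processed`, the expected layout can be prepared with:

```shell
# Hypothetical values for the env vars described above
mkdir -p data/raw processed
# Raw frames to be processed go in the raw subdirectory
touch data/raw/example-frame.fits
ls data/raw
```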