-
Notifications
You must be signed in to change notification settings - Fork 58
Split uploads into three parts to reduce server workload #772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: prerelease
Are you sure you want to change the base?
Split uploads into three parts to reduce server workload #772
Conversation
|
Testing Extraction times _detected 5.5 seconds - 77M |
|
Excellent idea. |
|
At the moment the server unpacks everything that ends in .bz2. I think that just needs to be extended to be _metadata.tar.bz2. EventMonitor has no interest in anything in that part of the project. I suspect some other folks' work, which hangs off the webpages, might break when the webpage extension changes from _detected to _metadata. I've got the basics up and running, we can tweak as necessary, |
markmac99
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems all fine.
I made a couple of comments about replacing legacy formatting with f-strings where possible, and replacing any calls to print() with calls to log.debug(), which i think is worth doing as we go along making other improvements and will ensure messages aren't lost on the console.
I also personally prefer to add new return values to the end of the returned tuple rather than interjecting them, as this ensures there's no possibility of something being misinterpreted, but i suspect you've caught all use-cases for the function.
|
Great job, Dave. I think this all looks good. If we don't mind a bit of data duplication, we might want to consider adding the config and platepar with the ff/fr data. This way, if we need to pull the data and make the plate or make measurements, we won't have to download two archives and combine the files. |
Worth checking with Paul Roggemans as he might still be using the _detected files. |
|
Right, but we won't be uploading double the amount of data. We can leave the archive split as an option, just in case. |
This already happens, as extra_files are added into both archives.
Are you asking for a new option in .config/upload, such as ? Unless I hear something else from you, I'll assume I've understood what you want. I should be able to turn round all the comments in the next 24 hours. |
|
Awesome, that should be it! Could you add more details in the config option about why the archive split is done (faster serverside processing) and how the files look like (mention the two suffixes)? |
|
''' ; If upload_split is false a single archive will be uploaded upload_split: true |
|
@g7gpr Could you update the PR title and the first post to indicate that only two archives are being uploaded and that we dropped the _detected archive? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request implements a feature to split archive uploads into multiple parts to reduce server-side processing time. Instead of uploading a single large archive with all files, the system can now optionally create two separate archives: one containing only image data (FF*.fits and FR*.bin files), and another containing metadata and derived data. This allows the server to process the smaller metadata archive quickly without needing to extract the large image files.
Changes:
- Added
upload_splitconfiguration option to enable/disable archive splitting - Modified
archiveDetections()to return three values (detected, imgdata, metadata archives) and create split archives when enabled - Updated all callers of
processNight()andarchiveDetections()to handle the new return signature with three archive names - Added None-filtering logic to prevent None values from being added to upload queues
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| RMS/ConfigReader.py | Added upload_split configuration option with default value True |
| .config | Added documentation for the upload_split configuration option |
| RMS/ArchiveDetections.py | Modified archiveDetections() to create separate imgdata and metadata archives when split mode is enabled, return three archive names instead of one |
| RMS/Reprocess.py | Updated to handle three-value return from processNight(), filter None values, and pass all archives to upload manager |
| RMS/StartCapture.py | Updated runCapture() and processIncompleteCaptures() to handle three-value return from processNight(), filter None values, and improve logging |
| RMS/UploadManager.py | Added None-check when appending files to upload queue to prevent None values from being written |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if parser.has_option(section, "upload_enabled"): | ||
| config.upload_enabled = parser.getboolean(section, "upload_enabled") | ||
|
|
||
| # Enable uploading images in one archive and data derived from images in another |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space in the comment. Should be "Enable uploading" instead of "Enable uploading" (two spaces).
| # Enable uploading images in one archive and data derived from images in another | |
| # Enable uploading images in one archive and data derived from images in another |
| log.info("Adding file to upload list: %s", archive_name) | ||
| upload_manager.addFiles([archive_name]) | ||
| log.info("File added.") | ||
| log.info(f"Adding files to upload list: {files_to_add_list}", ) |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary trailing comma in the log.info() call. The comma after the f-string serves no purpose and should be removed for cleaner code.
| log.info(f"Adding files to upload list: {files_to_add_list}", ) | |
| log.info(f"Adding files to upload list: {files_to_add_list}") |
| ; other archive, suffix _metadata will only contain the | ||
| ; data derived from the images, including the timelapse. |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent terminology. The comment says "suffix _metadata will only contain" but should say "suffix _metadata, will contain" (with comma) or "suffix _metadata contains" (present tense matching the _imgdata description). Also, the term "data derived from the images" is used here, while line 379 says "data derived" - consider using consistent phrasing throughout.
| ; other archive, suffix _metadata will only contain the | |
| ; data derived from the images, including the timelapse. | |
| ; other archive, suffix _metadata, contains only the | |
| ; data derived, including the timelapse. |
| imgdata_set = (set([item for item in file_list if item.startswith("FF") and item.endswith(".fits")]) | | ||
| set([item for item in file_list if item.startswith("FR") and item.endswith(".bin")])) | ||
|
|
||
| # Create the metadata set which is the relative complement of file_list in imgdata_set |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment incorrectly describes the set operation. It states "relative complement of file_list in imgdata_set" but the code actually computes the complement of imgdata_set in file_list (i.e., elements in file_list that are not in imgdata_set). The comment should read: "Create the metadata set which is the complement of imgdata_set in file_list" or "Create the metadata set which contains all files except FF*.fits and FR*.bin files".
| # Create the metadata set which is the relative complement of file_list in imgdata_set | |
| # Create the metadata set which is the complement of imgdata_set in file_list |
| log.info(f"Adding file to upload list: {f}") | ||
| upload_manager.addFiles(files_to_add_list) | ||
| if files_to_add_list: | ||
| if len(files_to_add_list) < 2: |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent condition for logging. In line 687 of this function, the condition checks if len(files_to_add_list) == 1 for singular "File added", while here at line 837, the condition checks if len(files_to_add_list) < 2 for the same purpose. These should use the same logic for consistency. Recommend changing line 837 to if len(files_to_add_list) == 1: to match the pattern used elsewhere.
| if len(files_to_add_list) < 2: | |
| if len(files_to_add_list) == 1: |
| if config.upload_enabled: | ||
|
|
||
| # Add metadata archive first, so it might get uploaded first | ||
| files_to_add_list_unfiltered = [archive_name, metadata_archive_name, imgdata_archive_name] |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of files in the list differs from other similar code sections. In StartCapture.py line 679 and line 830, the order is [archive_name, metadata_archive_name, imgdata_archive_name], but here the order is [archive_name, metadata_archive_name, imgdata_archive_name] with a comment saying "Add metadata archive first, so it might get uploaded first". However, when upload_split is true, archive_name is None, so the actual first file would be metadata_archive_name. For consistency and to match the intent of the comment, consider ordering the list as [metadata_archive_name, imgdata_archive_name, archive_name] to ensure metadata is truly first when all three are present.
| files_to_add_list_unfiltered = [archive_name, metadata_archive_name, imgdata_archive_name] | |
| files_to_add_list_unfiltered = [metadata_archive_name, imgdata_archive_name, archive_name] |
| if files_to_add_list: | ||
| for f in files_to_add_list: | ||
| log.info(f"Adding file to upload list: {f}") | ||
| upload_manager.addFiles(files_to_add_list) | ||
| if files_to_add_list: |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logging statements are executed before calling addFiles(), but they should be executed after to maintain consistency with the pattern used in the runCapture function (lines 684-690). The current order could log files that fail to be added. Consider moving lines 833-834 to after line 835, or combine with the existing conditional logging block at lines 836-840.
| if files_to_add_list: | |
| for f in files_to_add_list: | |
| log.info(f"Adding file to upload list: {f}") | |
| upload_manager.addFiles(files_to_add_list) | |
| if files_to_add_list: | |
| upload_manager.addFiles(files_to_add_list) | |
| if files_to_add_list: | |
| for f in files_to_add_list: | |
| log.info(f"Adding file to upload list: {f}") |
To reduce the time taken for the server to extract bz2 archives, upload two additional archives.
The detected file is the original archive, retained for now for compatibility
The imgdata file contains only FF*.fits and FR*.bin files and all the "extra_files"
The metadata file contains everything except for FF*.fits and FR*.bin files.
This means that the "extra_files" are present in both archives, which is convenient.
This is on test on au000a at present.