@g7gpr
Contributor

@g7gpr g7gpr commented Dec 20, 2025

To reduce the time taken for the server to extract bz2 archives, upload two additional archives.

-rw-r--r--  1 au000a au000a  99M Dec 20 09:52 AU000A_20251219_120648_988521_detected.tar.bz2
-rw-r--r--  1 au000a au000a  83M Dec 20 09:53 AU000A_20251219_120648_988521_imgdata.tar.bz2
-rw-r--r--  1 au000a au000a 3.1M Dec 20 09:53 AU000A_20251219_120648_988521_metadata.tar.bz2

The detected file is the original archive, retained for now for compatibility.
The imgdata file contains only the FF*.fits and FR*.bin files, plus all the "extra_files".
The metadata file contains everything except the FF*.fits and FR*.bin files.

This means that the "extra_files" are present in both archives, which is convenient.
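
The split described above could be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the helper names `is_image_file`, `split_file_list` and `write_archive` are hypothetical, and the only behaviour taken from the description is that FF*.fits and FR*.bin files go into the imgdata archive, everything else goes into the metadata archive, and the extra files go into both.

```python
import fnmatch
import os
import tarfile


def is_image_file(name):
    """Return True for FF*.fits and FR*.bin files (the bulky image data)."""
    return fnmatch.fnmatchcase(name, "FF*.fits") or fnmatch.fnmatchcase(name, "FR*.bin")


def split_file_list(file_list, extra_files):
    """Split file names into (imgdata, metadata) lists.

    extra_files (for example .config and the platepar) are placed in both lists.
    """
    extras = set(extra_files)
    imgdata = sorted({f for f in file_list if is_image_file(f)} | extras)
    metadata = sorted({f for f in file_list if not is_image_file(f)} | extras)
    return imgdata, metadata


def write_archive(archive_path, source_dir, names):
    """Write the named files from source_dir into a bz2-compressed tar archive."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for name in names:
            tar.add(os.path.join(source_dir, name), arcname=name)
```

Calling `split_file_list()` once and then `write_archive()` twice, with the `_imgdata` and `_metadata` suffixes, would produce the two additional archives.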

This is under test on au000a at present.

@g7gpr g7gpr requested review from dvida and markmac99 December 20, 2025 23:57
@g7gpr
Contributor Author

g7gpr commented Dec 21, 2025

Testing

Extraction times

_detected 5.5 seconds - 77M
_metadata 0.7 seconds - 70M
_imgdata 5.1 seconds - 12M

david@lap-deb-01:~/tmp/au000a$ cd detected/
david@lap-deb-01:~/tmp/au000a/detected$ mv ../AU000A_20251220_114828_789997_detected.tar.bz2 . 
david@lap-deb-01:~/tmp/au000a/detected$ time tar -xf AU000A_20251220_114828_789997_detected.tar.bz2 

real	0m5.551s
user	0m5.524s
sys	0m0.230s
david@lap-deb-01:~/tmp/au000a/detected$ cd ../metadata/
david@lap-deb-01:~/tmp/au000a/metadata$ mv ../AU000A_20251220_114828_789997_metadata.tar.bz2 . 
david@lap-deb-01:~/tmp/au000a/metadata$ time tar -xf AU000A_20251220_114828_789997_metadata.tar.bz2 

real	0m0.705s
user	0m0.689s
sys	0m0.046s
david@lap-deb-01:~/tmp/au000a/metadata$ cd ../imgdata/
david@lap-deb-01:~/tmp/au000a/imgdata$ mv ../AU000A_20251220_114828_789997_imgdata.tar.bz2 . 
david@lap-deb-01:~/tmp/au000a/imgdata$ time tar -xf AU000A_20251220_114828_789997_imgdata.tar.bz2 

real	0m5.162s
user	0m5.079s
sys	0m0.226s
david@lap-deb-01:~/tmp/au000a/imgdata$ cd ..; du -hd1
251M	./imgdata
37M	./metadata
273M	./detected
561M	.
david@lap-deb-01:~/tmp/au000a$ ls */*.mp4
detected/AU000A_20251220_114828_789997_timelapse.mp4  imgdata/AU000A_20251220_114828_789997_timelapse.mp4  metadata/AU000A_20251220_114828_789997_timelapse.mp4
david@lap-deb-01:~/tmp/au000a$ ls */*.fits
detected/FF_AU000A_20251220_120243_004_0020480.fits  detected/FF_AU000A_20251220_170420_638_0472576.fits  detected/FF_AU000A_20251220_184202_378_0619008.fits  imgdata/FF_AU000A_20251220_124717_689_0087296.fits  imgdata/FF_AU000A_20251220_172044_428_0497152.fits  imgdata/FF_AU000A_20251220_184517_085_0623872.fits
detected/FF_AU000A_20251220_124717_689_0087296.fits  detected/FF_AU000A_20251220_172044_428_0497152.fits  detected/FF_AU000A_20251220_184517_085_0623872.fits  imgdata/FF_AU000A_20251220_132249_235_0140544.fits  imgdata/FF_AU000A_20251220_172521_118_0504064.fits  imgdata/FF_AU000A_20251220_185105_510_0632576.fits
detected/FF_AU000A_20251220_132249_235_0140544.fits  detected/FF_AU000A_20251220_172521_118_0504064.fits  detected/FF_AU000A_20251220_185105_510_0632576.fits  imgdata/FF_AU000A_20251220_132806_916_0148480.fits  imgdata/FF_AU000A_20251220_172927_065_0510208.fits  imgdata/FF_AU000A_20251220_185755_422_0642816.fits
detected/FF_AU000A_20251220_132806_916_0148480.fits  detected/FF_AU000A_20251220_172927_065_0510208.fits  detected/FF_AU000A_20251220_185755_422_0642816.fits  imgdata/FF_AU000A_20251220_140520_937_0204288.fits  imgdata/FF_AU000A_20251220_173333_012_0516352.fits  imgdata/FF_AU000A_20251220_190354_094_0651776.fits
detected/FF_AU000A_20251220_140520_937_0204288.fits  detected/FF_AU000A_20251220_173333_012_0516352.fits  detected/FF_AU000A_20251220_190354_094_0651776.fits  imgdata/FF_AU000A_20251220_141708_035_0221952.fits  imgdata/FF_AU000A_20251220_173444_747_0518144.fits  imgdata/FF_AU000A_20251220_190435_085_0652800.fits
detected/FF_AU000A_20251220_141708_035_0221952.fits  detected/FF_AU000A_20251220_173444_747_0518144.fits  detected/FF_AU000A_20251220_190435_085_0652800.fits  imgdata/FF_AU000A_20251220_142215_469_0229632.fits  imgdata/FF_AU000A_20251220_173718_464_0521984.fits  imgdata/FF_AU000A_20251220_191023_510_0661504.fits
detected/FF_AU000A_20251220_142215_469_0229632.fits  detected/FF_AU000A_20251220_173718_464_0521984.fits  detected/FF_AU000A_20251220_191023_510_0661504.fits  imgdata/FF_AU000A_20251220_144904_374_0269824.fits  imgdata/FF_AU000A_20251220_174114_163_0527872.fits  imgdata/FF_AU000A_20251220_191338_218_0666368.fits
detected/FF_AU000A_20251220_144904_374_0269824.fits  detected/FF_AU000A_20251220_174114_163_0527872.fits  detected/FF_AU000A_20251220_191338_218_0666368.fits  imgdata/FF_AU000A_20251220_151126_836_0303360.fits  imgdata/FF_AU000A_20251220_174459_615_0533504.fits  imgdata/FF_AU000A_20251220_192251_599_0680192.fits
detected/FF_AU000A_20251220_151126_836_0303360.fits  detected/FF_AU000A_20251220_174459_615_0533504.fits  detected/FF_AU000A_20251220_192251_599_0680192.fits  imgdata/FF_AU000A_20251220_153815_742_0343552.fits  imgdata/FF_AU000A_20251220_181128_023_0573184.fits  imgdata/FF_AU000A_20251220_192910_768_0689664.fits
detected/FF_AU000A_20251220_153815_742_0343552.fits  detected/FF_AU000A_20251220_181128_023_0573184.fits  detected/FF_AU000A_20251220_192910_768_0689664.fits  imgdata/FF_AU000A_20251220_155551_266_0369920.fits  imgdata/FF_AU000A_20251220_182447_351_0593152.fits  imgdata/FF_AU000A_20251220_194047_617_0707072.fits
detected/FF_AU000A_20251220_155551_266_0369920.fits  detected/FF_AU000A_20251220_182447_351_0593152.fits  detected/FF_AU000A_20251220_194047_617_0707072.fits  imgdata/FF_AU000A_20251220_164827_592_0448768.fits  imgdata/FF_AU000A_20251220_182751_811_0597760.fits  imgdata/FF_AU000A_20251220_194433_068_0712704.fits
detected/FF_AU000A_20251220_164827_592_0448768.fits  detected/FF_AU000A_20251220_182751_811_0597760.fits  detected/FF_AU000A_20251220_194433_068_0712704.fits  imgdata/FF_AU000A_20251220_165740_974_0462592.fits  imgdata/FF_AU000A_20251220_183035_775_0601856.fits
detected/FF_AU000A_20251220_165740_974_0462592.fits  detected/FF_AU000A_20251220_183035_775_0601856.fits  imgdata/FF_AU000A_20251220_120243_004_0020480.fits   imgdata/FF_AU000A_20251220_170420_638_0472576.fits  imgdata/FF_AU000A_20251220_184202_378_0619008.fits
david@lap-deb-01:~/tmp/au000a$ ls */*.txt
detected/AU000A_20251220_114828_789997_config_audit_report.txt  detected/FTPdetectinfo_007087_20251220_114828_789997.txt                                imgdata/AU000A_20251220_114828_789997_observation_summary.txt   metadata/CALSTARS_AU000A_20251220_114828_789997.txt
detected/AU000A_20251220_114828_789997_observation_summary.txt  detected/FTPdetectinfo_AU000A_20251220_114828_789997_backup_20251220_205147.085850.txt  metadata/AU000A_20251220_114828_789997_config_audit_report.txt  metadata/FTPdetectinfo_007087_20251220_114828_789997.txt
detected/AU000A_20251220_114828_789997_radiants.txt             detected/FTPdetectinfo_AU000A_20251220_114828_789997.txt                                metadata/AU000A_20251220_114828_789997_observation_summary.txt  metadata/FTPdetectinfo_AU000A_20251220_114828_789997_backup_20251220_205147.085850.txt
detected/CAL_007087_20251220_114828_789.txt                     detected/FTPdetectinfo_AU000A_20251220_114828_789997_unfiltered.txt                     metadata/AU000A_20251220_114828_789997_radiants.txt             metadata/FTPdetectinfo_AU000A_20251220_114828_789997.txt
detected/CALSTARS_AU000A_20251220_114828_789997.txt             imgdata/AU000A_20251220_114828_789997_config_audit_report.txt                           metadata/CAL_007087_20251220_114828_789.txt                     metadata/FTPdetectinfo_AU000A_20251220_114828_789997_unfiltered.txt
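
The same timing measurement can be repeated from Python instead of the shell. The sketch below is only an illustration using the standard `tarfile` module; the function name `time_extraction` is hypothetical and is not part of this PR or of the server-side scripts.

```python
import tarfile
import time


def time_extraction(archive_path, dest_dir):
    """Extract a .tar.bz2 archive into dest_dir and return the elapsed seconds."""
    start = time.perf_counter()
    with tarfile.open(archive_path, "r:bz2") as tar:
        tar.extractall(dest_dir)
    return time.perf_counter() - start
```

Running it against the `_detected`, `_metadata` and `_imgdata` archives in turn gives wall-clock figures comparable to the `real` values reported by `time tar -xf`.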

@markmac99
Contributor

Excellent idea.
Not had time to look at the code yet, but this will also require server-side changes, I think. @dvida, I guess the extractor will need some changes? Also the server-side part of event-monitor, I guess.
I assume the metadata file contains the platepar, config, ftpdetect etc. and is what the solver requires, so that should be uploaded first, followed by the other files, which probably don't need to be unpacked on the server at all (unless event monitor needs 'em).

@g7gpr
Contributor Author

g7gpr commented Dec 21, 2025

At the moment the server unpacks everything that ends in .bz2. I think the suffix it matches just needs to be extended from .bz2 to _metadata.tar.bz2. EventMonitor has no interest in anything in that part of the project. I suspect some other folks' work, which hangs off the webpages, might break when the webpage extension changes from _detected to _metadata.

I've got the basics up and running; we can tweak as necessary.
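
The server-side change suggested here amounts to matching a longer suffix. The server script is not shown in this thread, so the following is only a hedged sketch: `METADATA_SUFFIX` and `select_archives_to_extract` are hypothetical names, and the real extractor may be structured quite differently.

```python
METADATA_SUFFIX = "_metadata.tar.bz2"


def select_archives_to_extract(filenames, suffix=METADATA_SUFFIX):
    """Return only the uploaded files whose names end with the given suffix."""
    return [name for name in filenames if name.endswith(suffix)]
```

With the three archives from the first post, only `AU000A_20251219_120648_988521_metadata.tar.bz2` would be selected for unpacking.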

Contributor

@markmac99 markmac99 left a comment

Seems all fine.
I made a couple of comments about replacing legacy formatting with f-strings where possible, and replacing any calls to print() with calls to log.debug(), which I think is worth doing as we go along making other improvements and will ensure messages aren't lost on the console.
I also personally prefer to add new return values to the end of the returned tuple rather than interjecting them, as this ensures there's no possibility of something being misinterpreted, but I suspect you've caught all use-cases for the function.
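
The three review points (f-strings, log.debug() instead of print(), and appending new return values) can be illustrated together. This is a sketch under stated assumptions, not the reviewed code: `process_night_sketch` is a hypothetical function that is imagined to have returned `(night_name, archive_name)` before the change.

```python
import logging

log = logging.getLogger(__name__)


def process_night_sketch(night_name, upload_split=True):
    """Hypothetical stand-in: previously returned (night_name, archive_name)."""
    # f-strings replace legacy % or .format() formatting
    archive_name = None if upload_split else f"{night_name}_detected.tar.bz2"
    imgdata_name = f"{night_name}_imgdata.tar.bz2" if upload_split else None
    metadata_name = f"{night_name}_metadata.tar.bz2" if upload_split else None

    # log.debug() rather than print(), so the message goes through the logger
    log.debug(f"Archives for {night_name}: {archive_name}, {imgdata_name}, {metadata_name}")

    # Original values keep their positions; new values are appended at the end
    return night_name, archive_name, imgdata_name, metadata_name
```

Because the first two positions are unchanged, a caller that only reads `result[0]` and `result[1]` keeps working without modification.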

@dvida
Contributor

dvida commented Dec 22, 2025

Great job, Dave. I think this all looks good. If we don't mind a bit of data duplication, we might want to consider adding the config and platepar with the ff/fr data. This way, if we need to pull the data and make the plate or make measurements, we won't have to download two archives and combine the files.
Finally, before this goes into production, we'll have to disable uploading the _detected directory. We'll keep this as a PR until I modify the serverside script.

@markmac99
Contributor

> Finally, before this goes into production, we'll have to disable uploading the _detected directory.

Worth checking with Paul Roggemans as he might still be using the _detected files.

@dvida
Contributor

dvida commented Dec 22, 2025

Right, but we won't be uploading double the amount of data. We can leave the archive split as an option, just in case.

@g7gpr
Contributor Author

g7gpr commented Dec 22, 2025

> we might want to consider adding the config and platepar with the ff/fr data

This already happens, as extra_files are added into both archives.

david@lap-deb-01:~/tmp/au000a$ ls metadata/.config metadata/platepar_cmn2010.cal imgdata/.config imgdata/platepar_cmn2010.cal 
imgdata/.config  imgdata/platepar_cmn2010.cal  metadata/.config  metadata/platepar_cmn2010.cal

> We can leave the archive split as an option, just in case.

Are you asking for a new option in .config/upload, such as

; Upload two archives, one containing images, the other containing data derived from the images.
archive_split: true

?

Unless I hear something else from you, I'll assume I've understood what you want.

I should be able to turn round all the comments in the next 24 hours.

@dvida
Contributor

dvida commented Dec 22, 2025

Awesome, that should be it! Could you add more details in the config option about why the archive split is done (faster server-side processing) and what the files look like (mention the two suffixes)?

@g7gpr
Contributor Author

g7gpr commented Dec 22, 2025

'''
; If upload_split is true, two archives will be created.
; One archive, suffix _imgdata, contains the images. The
; other archive, suffix _metadata will only contain the
; data derived from the images, including the timelapse.
;
; The metadata archive is much smaller than the imgdata
; archive, and can be extracted very quickly. This speeds up the
; processing on the server, since the image data is generally not
; required.

; If upload_split is false, a single archive will be uploaded
; with the suffix _detected containing all images, and data derived.
; These archives are large and slow down the server side processing.

upload_split: true
'''
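
Reading the option could look like the sketch below, which follows the `parser.has_option()` / `parser.getboolean()` pattern that appears in the review further down. The section name `Upload`, the function name `read_upload_split` and the sample config text are assumptions made for illustration; the default of `True` matches the review summary.

```python
import configparser

# Illustrative config fragment; the [Upload] section name is an assumption
CONFIG_TEXT = """
[Upload]
; Split the upload into _imgdata and _metadata archives
upload_split: true
"""


def read_upload_split(config_text, section="Upload", default=True):
    """Return the upload_split setting, or the default when it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if parser.has_option(section, "upload_split"):
        return parser.getboolean(section, "upload_split")
    return default
```

`configparser` accepts both `key: value` and `key = value`, and treats lines starting with `;` as comments, so the documented block can sit directly above the option.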

@g7gpr g7gpr marked this pull request as ready for review December 28, 2025 03:22
@dvida
Contributor

dvida commented Jan 21, 2026

@g7gpr Could you update the PR title and the first post to indicate that only two archives are being uploaded and that we dropped the _detected archive?

Contributor

Copilot AI left a comment

Pull request overview

This pull request implements a feature to split archive uploads into multiple parts to reduce server-side processing time. Instead of uploading a single large archive with all files, the system can now optionally create two separate archives: one containing only image data (FF*.fits and FR*.bin files), and another containing metadata and derived data. This allows the server to process the smaller metadata archive quickly without needing to extract the large image files.

Changes:

  • Added upload_split configuration option to enable/disable archive splitting
  • Modified archiveDetections() to return three values (detected, imgdata, metadata archives) and create split archives when enabled
  • Updated all callers of processNight() and archiveDetections() to handle the new return signature with three archive names
  • Added None-filtering logic to prevent None values from being added to upload queues

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| RMS/ConfigReader.py | Added upload_split configuration option with default value True |
| .config | Added documentation for the upload_split configuration option |
| RMS/ArchiveDetections.py | Modified archiveDetections() to create separate imgdata and metadata archives when split mode is enabled, return three archive names instead of one |
| RMS/Reprocess.py | Updated to handle three-value return from processNight(), filter None values, and pass all archives to upload manager |
| RMS/StartCapture.py | Updated runCapture() and processIncompleteCaptures() to handle three-value return from processNight(), filter None values, and improve logging |
| RMS/UploadManager.py | Added None-check when appending files to upload queue to prevent None values from being written |

if parser.has_option(section, "upload_enabled"):
    config.upload_enabled = parser.getboolean(section, "upload_enabled")

# Enable  uploading images in one archive and data derived from images in another

Copilot AI Jan 21, 2026

Extra space in the comment. Should be "Enable uploading" instead of "Enable  uploading" (two spaces).

Suggested change
- # Enable  uploading images in one archive and data derived from images in another
+ # Enable uploading images in one archive and data derived from images in another

log.info("Adding file to upload list: %s", archive_name)
upload_manager.addFiles([archive_name])
log.info("File added.")
log.info(f"Adding files to upload list: {files_to_add_list}", )

Copilot AI Jan 21, 2026

Unnecessary trailing comma in the log.info() call. The comma after the f-string serves no purpose and should be removed for cleaner code.

Suggested change
- log.info(f"Adding files to upload list: {files_to_add_list}", )
+ log.info(f"Adding files to upload list: {files_to_add_list}")

Comment on lines +370 to +371
; other archive, suffix _metadata will only contain the
; data derived from the images, including the timelapse.

Copilot AI Jan 21, 2026

Inconsistent terminology. The comment says "suffix _metadata will only contain" but should say "suffix _metadata, will contain" (with comma) or "suffix _metadata contains" (present tense matching the _imgdata description). Also, the term "data derived from the images" is used here, while line 379 says "data derived" - consider using consistent phrasing throughout.

Suggested change
- ; other archive, suffix _metadata will only contain the
- ; data derived from the images, including the timelapse.
+ ; other archive, suffix _metadata, contains only the
+ ; data derived, including the timelapse.

imgdata_set = (set([item for item in file_list if item.startswith("FF") and item.endswith(".fits")]) |
               set([item for item in file_list if item.startswith("FR") and item.endswith(".bin")]))

# Create the metadata set which is the relative complement of file_list in imgdata_set

Copilot AI Jan 21, 2026

The comment incorrectly describes the set operation. It states "relative complement of file_list in imgdata_set" but the code actually computes the complement of imgdata_set in file_list (i.e., elements in file_list that are not in imgdata_set). The comment should read: "Create the metadata set which is the complement of imgdata_set in file_list" or "Create the metadata set which contains all files except FF*.fits and FR*.bin files".

Suggested change
- # Create the metadata set which is the relative complement of file_list in imgdata_set
+ # Create the metadata set which is the complement of imgdata_set in file_list
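
The direction of the set operation is easy to check with a small example. The sketch below uses illustrative file names and builds `imgdata_set` with a set comprehension rather than the exact expression in the diff; the point is only that `set(file_list) - imgdata_set` keeps the elements of `file_list` that are not image files, while the reverse difference is empty.

```python
# Illustrative file names only
file_list = [
    "FF_AU000A_20251220_120243_004_0020480.fits",
    "FR_AU000A_20251220_120243_004_0020480.bin",
    "FTPdetectinfo_AU000A_20251220_114828_789997.txt",
    "platepar_cmn2010.cal",
]

# Image data: FF*.fits and FR*.bin
imgdata_set = {
    item for item in file_list
    if (item.startswith("FF") and item.endswith(".fits"))
    or (item.startswith("FR") and item.endswith(".bin"))
}

# Complement of imgdata_set in file_list: everything that is not image data
metadata_set = set(file_list) - imgdata_set

# The reverse difference is empty, because imgdata_set is a subset of file_list
reverse_difference = imgdata_set - set(file_list)
```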

log.info(f"Adding file to upload list: {f}")
upload_manager.addFiles(files_to_add_list)
if files_to_add_list:
if len(files_to_add_list) < 2:

Copilot AI Jan 21, 2026

Inconsistent condition for logging. In line 687 of this function, the condition checks if len(files_to_add_list) == 1 for singular "File added", while here at line 837, the condition checks if len(files_to_add_list) < 2 for the same purpose. These should use the same logic for consistency. Recommend changing line 837 to if len(files_to_add_list) == 1: to match the pattern used elsewhere.

Suggested change
- if len(files_to_add_list) < 2:
+ if len(files_to_add_list) == 1:

if config.upload_enabled:

    # Add metadata archive first, so it might get uploaded first
    files_to_add_list_unfiltered = [archive_name, metadata_archive_name, imgdata_archive_name]

Copilot AI Jan 21, 2026

The order of files in the list differs from other similar code sections. In StartCapture.py line 679 and line 830, the order is [archive_name, metadata_archive_name, imgdata_archive_name], but here the order is [archive_name, metadata_archive_name, imgdata_archive_name] with a comment saying "Add metadata archive first, so it might get uploaded first". However, when upload_split is true, archive_name is None, so the actual first file would be metadata_archive_name. For consistency and to match the intent of the comment, consider ordering the list as [metadata_archive_name, imgdata_archive_name, archive_name] to ensure metadata is truly first when all three are present.

Suggested change
- files_to_add_list_unfiltered = [archive_name, metadata_archive_name, imgdata_archive_name]
+ files_to_add_list_unfiltered = [metadata_archive_name, imgdata_archive_name, archive_name]
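
The combined effect of the suggested ordering and the None-filtering mentioned in the overview can be sketched as below. `build_upload_list` is a hypothetical helper, not a function in RMS; it assumes, as the comment above states, that `archive_name` is None when `upload_split` is true.

```python
def build_upload_list(archive_name, metadata_archive_name, imgdata_archive_name):
    """Return the archives to queue, metadata first, skipping any that are None."""
    unfiltered = [metadata_archive_name, imgdata_archive_name, archive_name]
    return [name for name in unfiltered if name is not None]
```

With split uploads this yields the metadata archive followed by the imgdata archive; with a single `_detected` archive it yields a one-element list, so the same call works for both modes.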

Comment on lines +832 to +836
if files_to_add_list:
for f in files_to_add_list:
log.info(f"Adding file to upload list: {f}")
upload_manager.addFiles(files_to_add_list)
if files_to_add_list:

Copilot AI Jan 21, 2026

The logging statements are executed before calling addFiles(), but they should be executed after to maintain consistency with the pattern used in the runCapture function (lines 684-690). The current order could log files that fail to be added. Consider moving lines 833-834 to after line 835, or combine with the existing conditional logging block at lines 836-840.

Suggested change
- if files_to_add_list:
- for f in files_to_add_list:
- log.info(f"Adding file to upload list: {f}")
- upload_manager.addFiles(files_to_add_list)
- if files_to_add_list:
+ upload_manager.addFiles(files_to_add_list)
+ if files_to_add_list:
+ for f in files_to_add_list:
+ log.info(f"Adding file to upload list: {f}")
