Split uploads into three parts to reduce server workload #772

g7gpr · 2025-12-20T09:55:10Z

To reduce the time taken for the server to extract bz2 archives, upload two additional archives.

-rw-r--r--  1 au000a au000a  99M Dec 20 09:52 AU000A_20251219_120648_988521_detected.tar.bz2
-rw-r--r--  1 au000a au000a  83M Dec 20 09:53 AU000A_20251219_120648_988521_imgdata.tar.bz2
-rw-r--r--  1 au000a au000a 3.1M Dec 20 09:53 AU000A_20251219_120648_988521_metadata.tar.bz2

The detected file is the original archive, retained for now for compatibility.
The imgdata file contains only the FF*.fits and FR*.bin files, plus all the "extra_files".
The metadata file contains everything except for FF*.fits and FR*.bin files.

This means that the "extra_files" are present in both archives, which is convenient.
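
A minimal sketch of how the two archives could be written with Python's standard `tarfile` module. The names `write_split_archives` and `is_image_file`, and the argument layout, are assumptions made for illustration and are not the code in this PR; only the FF*.fits / FR*.bin rule and the duplication of the extra files come from the description above.

```python
import os
import tarfile


def is_image_file(name):
    """Return True for FF*.fits and FR*.bin files (the image data)."""
    return (name.startswith("FF") and name.endswith(".fits")) or \
           (name.startswith("FR") and name.endswith(".bin"))


def write_split_archives(source_dir, out_dir, base_name, extra_files=()):
    """Hypothetical helper: write <base_name>_imgdata.tar.bz2 and
    <base_name>_metadata.tar.bz2 from the regular files in source_dir.

    extra_files are file names (relative to source_dir) that go into both archives.
    """
    names = sorted(n for n in os.listdir(source_dir)
                   if os.path.isfile(os.path.join(source_dir, n)))
    extra = set(extra_files)

    # Image archive: FF/FR files plus the extra files
    imgdata = [n for n in names if is_image_file(n) or n in extra]
    # Metadata archive: everything that is not an FF/FR file
    metadata = [n for n in names if not is_image_file(n)]

    paths = {}
    for suffix, members in (("imgdata", imgdata), ("metadata", metadata)):
        archive_path = os.path.join(out_dir, f"{base_name}_{suffix}.tar.bz2")
        with tarfile.open(archive_path, "w:bz2") as tar:
            for n in members:
                tar.add(os.path.join(source_dir, n), arcname=n)
        paths[suffix] = archive_path
    return paths
```

Because the extra files are not FF/FR files, they land in the metadata archive anyway; the explicit `extra` set only matters for the imgdata archive.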

This is under test on au000a at present.

g7gpr · 2025-12-21T00:17:15Z

Testing

Extraction times

_detected 5.5 seconds - 77M
_metadata 0.7 seconds - 70M
_imgdata 5.1 seconds - 12M

david@lap-deb-01:~/tmp/au000a$ cd detected/
david@lap-deb-01:~/tmp/au000a/detected$ mv ../AU000A_20251220_114828_789997_detected.tar.bz2 . 
david@lap-deb-01:~/tmp/au000a/detected$ time tar -xf AU000A_20251220_114828_789997_detected.tar.bz2 

real	0m5.551s
user	0m5.524s
sys	0m0.230s
david@lap-deb-01:~/tmp/au000a/detected$ cd ../metadata/
david@lap-deb-01:~/tmp/au000a/metadata$ mv ../AU000A_20251220_114828_789997_metadata.tar.bz2 . 
david@lap-deb-01:~/tmp/au000a/metadata$ time tar -xf AU000A_20251220_114828_789997_metadata.tar.bz2 

real	0m0.705s
user	0m0.689s
sys	0m0.046s
david@lap-deb-01:~/tmp/au000a/metadata$ cd ../imgdata/
david@lap-deb-01:~/tmp/au000a/imgdata$ mv ../AU000A_20251220_114828_789997_imgdata.tar.bz2 . 
david@lap-deb-01:~/tmp/au000a/imgdata$ time tar -xf AU000A_20251220_114828_789997_imgdata.tar.bz2 

real	0m5.162s
user	0m5.079s
sys	0m0.226s
david@lap-deb-01:~/tmp/au000a/imgdata$ cd ..; du -hd1
251M	./imgdata
37M	./metadata
273M	./detected
561M	.
david@lap-deb-01:~/tmp/au000a$ ls */*.mp4
detected/AU000A_20251220_114828_789997_timelapse.mp4  imgdata/AU000A_20251220_114828_789997_timelapse.mp4  metadata/AU000A_20251220_114828_789997_timelapse.mp4
david@lap-deb-01:~/tmp/au000a$ ls */*.fits
detected/FF_AU000A_20251220_120243_004_0020480.fits  detected/FF_AU000A_20251220_170420_638_0472576.fits  detected/FF_AU000A_20251220_184202_378_0619008.fits  imgdata/FF_AU000A_20251220_124717_689_0087296.fits  imgdata/FF_AU000A_20251220_172044_428_0497152.fits  imgdata/FF_AU000A_20251220_184517_085_0623872.fits
detected/FF_AU000A_20251220_124717_689_0087296.fits  detected/FF_AU000A_20251220_172044_428_0497152.fits  detected/FF_AU000A_20251220_184517_085_0623872.fits  imgdata/FF_AU000A_20251220_132249_235_0140544.fits  imgdata/FF_AU000A_20251220_172521_118_0504064.fits  imgdata/FF_AU000A_20251220_185105_510_0632576.fits
detected/FF_AU000A_20251220_132249_235_0140544.fits  detected/FF_AU000A_20251220_172521_118_0504064.fits  detected/FF_AU000A_20251220_185105_510_0632576.fits  imgdata/FF_AU000A_20251220_132806_916_0148480.fits  imgdata/FF_AU000A_20251220_172927_065_0510208.fits  imgdata/FF_AU000A_20251220_185755_422_0642816.fits
detected/FF_AU000A_20251220_132806_916_0148480.fits  detected/FF_AU000A_20251220_172927_065_0510208.fits  detected/FF_AU000A_20251220_185755_422_0642816.fits  imgdata/FF_AU000A_20251220_140520_937_0204288.fits  imgdata/FF_AU000A_20251220_173333_012_0516352.fits  imgdata/FF_AU000A_20251220_190354_094_0651776.fits
detected/FF_AU000A_20251220_140520_937_0204288.fits  detected/FF_AU000A_20251220_173333_012_0516352.fits  detected/FF_AU000A_20251220_190354_094_0651776.fits  imgdata/FF_AU000A_20251220_141708_035_0221952.fits  imgdata/FF_AU000A_20251220_173444_747_0518144.fits  imgdata/FF_AU000A_20251220_190435_085_0652800.fits
detected/FF_AU000A_20251220_141708_035_0221952.fits  detected/FF_AU000A_20251220_173444_747_0518144.fits  detected/FF_AU000A_20251220_190435_085_0652800.fits  imgdata/FF_AU000A_20251220_142215_469_0229632.fits  imgdata/FF_AU000A_20251220_173718_464_0521984.fits  imgdata/FF_AU000A_20251220_191023_510_0661504.fits
detected/FF_AU000A_20251220_142215_469_0229632.fits  detected/FF_AU000A_20251220_173718_464_0521984.fits  detected/FF_AU000A_20251220_191023_510_0661504.fits  imgdata/FF_AU000A_20251220_144904_374_0269824.fits  imgdata/FF_AU000A_20251220_174114_163_0527872.fits  imgdata/FF_AU000A_20251220_191338_218_0666368.fits
detected/FF_AU000A_20251220_144904_374_0269824.fits  detected/FF_AU000A_20251220_174114_163_0527872.fits  detected/FF_AU000A_20251220_191338_218_0666368.fits  imgdata/FF_AU000A_20251220_151126_836_0303360.fits  imgdata/FF_AU000A_20251220_174459_615_0533504.fits  imgdata/FF_AU000A_20251220_192251_599_0680192.fits
detected/FF_AU000A_20251220_151126_836_0303360.fits  detected/FF_AU000A_20251220_174459_615_0533504.fits  detected/FF_AU000A_20251220_192251_599_0680192.fits  imgdata/FF_AU000A_20251220_153815_742_0343552.fits  imgdata/FF_AU000A_20251220_181128_023_0573184.fits  imgdata/FF_AU000A_20251220_192910_768_0689664.fits
detected/FF_AU000A_20251220_153815_742_0343552.fits  detected/FF_AU000A_20251220_181128_023_0573184.fits  detected/FF_AU000A_20251220_192910_768_0689664.fits  imgdata/FF_AU000A_20251220_155551_266_0369920.fits  imgdata/FF_AU000A_20251220_182447_351_0593152.fits  imgdata/FF_AU000A_20251220_194047_617_0707072.fits
detected/FF_AU000A_20251220_155551_266_0369920.fits  detected/FF_AU000A_20251220_182447_351_0593152.fits  detected/FF_AU000A_20251220_194047_617_0707072.fits  imgdata/FF_AU000A_20251220_164827_592_0448768.fits  imgdata/FF_AU000A_20251220_182751_811_0597760.fits  imgdata/FF_AU000A_20251220_194433_068_0712704.fits
detected/FF_AU000A_20251220_164827_592_0448768.fits  detected/FF_AU000A_20251220_182751_811_0597760.fits  detected/FF_AU000A_20251220_194433_068_0712704.fits  imgdata/FF_AU000A_20251220_165740_974_0462592.fits  imgdata/FF_AU000A_20251220_183035_775_0601856.fits
detected/FF_AU000A_20251220_165740_974_0462592.fits  detected/FF_AU000A_20251220_183035_775_0601856.fits  imgdata/FF_AU000A_20251220_120243_004_0020480.fits   imgdata/FF_AU000A_20251220_170420_638_0472576.fits  imgdata/FF_AU000A_20251220_184202_378_0619008.fits
david@lap-deb-01:~/tmp/au000a$ ls */*.txt
detected/AU000A_20251220_114828_789997_config_audit_report.txt  detected/FTPdetectinfo_007087_20251220_114828_789997.txt                                imgdata/AU000A_20251220_114828_789997_observation_summary.txt   metadata/CALSTARS_AU000A_20251220_114828_789997.txt
detected/AU000A_20251220_114828_789997_observation_summary.txt  detected/FTPdetectinfo_AU000A_20251220_114828_789997_backup_20251220_205147.085850.txt  metadata/AU000A_20251220_114828_789997_config_audit_report.txt  metadata/FTPdetectinfo_007087_20251220_114828_789997.txt
detected/AU000A_20251220_114828_789997_radiants.txt             detected/FTPdetectinfo_AU000A_20251220_114828_789997.txt                                metadata/AU000A_20251220_114828_789997_observation_summary.txt  metadata/FTPdetectinfo_AU000A_20251220_114828_789997_backup_20251220_205147.085850.txt
detected/CAL_007087_20251220_114828_789.txt                     detected/FTPdetectinfo_AU000A_20251220_114828_789997_unfiltered.txt                     metadata/AU000A_20251220_114828_789997_radiants.txt             metadata/FTPdetectinfo_AU000A_20251220_114828_789997.txt
detected/CALSTARS_AU000A_20251220_114828_789997.txt             imgdata/AU000A_20251220_114828_789997_config_audit_report.txt                           metadata/CAL_007087_20251220_114828_789.txt                     metadata/FTPdetectinfo_AU000A_20251220_114828_789997_unfiltered.txt
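
The same extraction timing could be scripted rather than run by hand with `time tar -xf`. This is only a sketch using the standard `tarfile` and `time` modules; the function name `time_extraction` is hypothetical, and the result will not match the shell figures exactly because Python's bz2 handling differs from the `tar` binary.

```python
import tarfile
import time


def time_extraction(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # "r:*" lets tarfile detect the compression (bz2 here) transparently
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(dest_dir)
    return time.perf_counter() - start
```

Running it once per archive (`_detected`, `_metadata`, `_imgdata`) into separate directories mirrors the manual test above.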

markmac99 · 2025-12-21T10:47:13Z

Excellent idea.
Not had time to look at the code yet, but this will also require server-side changes, I think. @dvida, I guess the extractor will need some changes? Also the server-side part of event-monitor, I guess.
I assume the metadata file contains the platepar, config, ftpdetect etc and is what will be required by the solver, so that should be uploaded first and then the other files which probably don't need to be unpacked at all on the server (unless event monitor needs 'em).

g7gpr · 2025-12-21T10:59:08Z

At the moment the server unpacks everything that ends in .bz2. I think that match just needs to be extended to _metadata.tar.bz2. EventMonitor has no interest in anything in that part of the project. I suspect some other folks' work, which hangs off the webpages, might break when the webpage extension changes from _detected to _metadata.
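
As a rough illustration of that server-side change (the real extractor script is not part of this PR, so `should_extract` and its `split_enabled` flag are purely hypothetical):

```python
def should_extract(filename, split_enabled=True):
    """Hypothetical server-side filter deciding which uploads get unpacked.

    With split uploads only the small _metadata archive is extracted;
    otherwise fall back to the current behaviour of unpacking any .bz2 file.
    """
    if split_enabled:
        return filename.endswith("_metadata.tar.bz2")
    return filename.endswith(".bz2")
```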

I've got the basics up and running; we can tweak as necessary.

markmac99

Seems all fine.
I made a couple of comments about replacing legacy formatting with f-strings where possible, and replacing any calls to print() with calls to log.debug(), which I think is worth doing as we go along making other improvements and will ensure messages aren't lost on the console.
I also personally prefer to add new return values to the end of the returned tuple rather than interjecting them, as this ensures there's no possibility of something being misinterpreted, but I suspect you've caught all use-cases for the function.
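
To illustrate the point about return-value order, here is a small sketch. The three functions and their return values are made-up stand-ins, not the actual signatures in RMS; they only show why appending is the safer choice for callers that index by position.

```python
def process_night_old():
    """Stand-in for a function before the change (hypothetical values)."""
    return "night_dir", "X_detected.tar.bz2"


def process_night_appended():
    """New values added at the end: the first two positions keep their meaning."""
    return "night_dir", "X_detected.tar.bz2", "X_imgdata.tar.bz2", "X_metadata.tar.bz2"


def process_night_interjected():
    """New value inserted in the middle: position 1 silently changes meaning."""
    return "night_dir", "X_imgdata.tar.bz2", "X_detected.tar.bz2", "X_metadata.tar.bz2"


# A legacy caller that indexes by position still gets the right archive
# with the appended form...
legacy_archive = process_night_appended()[1]
# ...but picks up a different archive with the interjected form.
wrong_archive = process_night_interjected()[1]
```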

RMS/ArchiveDetections.py

RMS/Reprocess.py

RMS/StartCapture.py

dvida · 2025-12-22T15:29:30Z

Great job, Dave. I think this all looks good. If we don't mind a bit of data duplication, we might want to consider adding the config and platepar with the ff/fr data. This way, if we need to pull the data and make the plate or make measurements, we won't have to download two archives and combine the files.
Finally, before this goes into production, we'll have to disable uploading the _detected directory. We'll keep this as a PR until I modify the serverside script.

markmac99 · 2025-12-22T17:43:25Z

> Finally, before this goes into production, we'll have to disable uploading the _detected directory.

Worth checking with Paul Roggemans as he might still be using the _detected files.

dvida · 2025-12-22T18:07:48Z

Right, but we won't be uploading double the amount of data. We can leave the archive split as an option, just in case.

g7gpr · 2025-12-22T22:31:20Z

> we might want to consider adding the config and platepar with the ff/fr data

This already happens, as extra_files are added into both archives.

david@lap-deb-01:~/tmp/au000a$ ls metadata/.config metadata/platepar_cmn2010.cal imgdata/.config imgdata/platepar_cmn2010.cal 
imgdata/.config  imgdata/platepar_cmn2010.cal  metadata/.config  metadata/platepar_cmn2010.cal

> We can leave the archive split as an option, just in case.

Are you asking for a new option in .config/upload, such as

; Upload two archives, one containing images, the other containing data derived from the images.
archive_split: true

?

Unless I hear something else from you, I'll assume I've understood what you want.

I should be able to turn round all the comments in the next 24 hours.

dvida · 2025-12-22T22:56:53Z

Awesome, that should be it! Could you add more details in the config option about why the archive split is done (faster serverside processing) and what the files look like (mention the two suffixes)?

g7gpr · 2025-12-22T23:35:09Z

'''
; If upload_split is true, two archives will be created.
; One archive, suffix _imgdata, contains the images. The
; other archive, suffix _metadata will only contain the
; data derived from the images, including the timelapse.
;
; The metadata archive is much smaller than the imgdata
; archive, and can be extracted very quickly. This speeds up the
; processing on the server, since the image data is generally not
; required.

; If upload_split is false a single archive will be uploaded
; with the suffix _detected containing all images, and data derived.
; These archives are large and slow down the server side processing.

upload_split: true
'''
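
A sketch of how such an option could be read, following the `has_option` / `getboolean` pattern that ConfigReader.py already uses for `upload_enabled`. The function name `read_upload_split` and the section name `Upload` are assumptions; the default of `True` matches the value shown in the config text above.

```python
from configparser import RawConfigParser


def read_upload_split(config_text, section="Upload", default=True):
    """Return the upload_split flag from an ini-style config string.

    Falls back to `default` when the section or the option is missing.
    """
    parser = RawConfigParser()
    parser.read_string(config_text)
    if parser.has_option(section, "upload_split"):
        return parser.getboolean(section, "upload_split")
    return default
```

`RawConfigParser` accepts both `key: value` and `key = value` and treats lines starting with `;` as comments, so the snippet above parses as written once it sits under a section header.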

dvida · 2026-01-21T15:39:49Z

@g7gpr Could you update the PR title and the first post to indicate that only two archives are being uploaded and that we dropped the _detected archive?

Copilot

Pull request overview

This pull request implements a feature to split archive uploads into multiple parts to reduce server-side processing time. Instead of uploading a single large archive with all files, the system can now optionally create two separate archives: one containing only image data (FF*.fits and FR*.bin files), and another containing metadata and derived data. This allows the server to process the smaller metadata archive quickly without needing to extract the large image files.

Changes:

Added upload_split configuration option to enable/disable archive splitting
Modified archiveDetections() to return three values (detected, imgdata, metadata archives) and create split archives when enabled
Updated all callers of processNight() and archiveDetections() to handle the new return signature with three archive names
Added None-filtering logic to prevent None values from being added to upload queues

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| RMS/ConfigReader.py | Added `upload_split` configuration option with default value `True` |
| .config | Added documentation for the `upload_split` configuration option |
| RMS/ArchiveDetections.py | Modified `archiveDetections()` to create separate imgdata and metadata archives when split mode is enabled, return three archive names instead of one |
| RMS/Reprocess.py | Updated to handle three-value return from `processNight()`, filter None values, and pass all archives to upload manager |
| RMS/StartCapture.py | Updated `runCapture()` and `processIncompleteCaptures()` to handle three-value return from `processNight()`, filter None values, and improve logging |
| RMS/UploadManager.py | Added None-check when appending files to upload queue to prevent None values from being written |

Copilot · 2026-01-21T15:45:29Z

RMS/ConfigReader.py

    if parser.has_option(section, "upload_enabled"):
        config.upload_enabled = parser.getboolean(section, "upload_enabled")

+    # Enable  uploading images in one archive and data derived from images in another


Extra space in the comment. Should be "Enable uploading" instead of "Enable  uploading" (two spaces).

Suggested change

-    # Enable  uploading images in one archive and data derived from images in another
+    # Enable uploading images in one archive and data derived from images in another

Copilot · 2026-01-21T15:45:29Z

RMS/StartCapture.py

-                log.info("Adding file to upload list: %s", archive_name)
-                upload_manager.addFiles([archive_name])
-                log.info("File added.")
+                log.info(f"Adding files to upload list: {files_to_add_list}", )


Unnecessary trailing comma in the log.info() call. The comma after the f-string serves no purpose and should be removed for cleaner code.

Suggested change

-    log.info(f"Adding files to upload list: {files_to_add_list}", )
+    log.info(f"Adding files to upload list: {files_to_add_list}")

Copilot · 2026-01-21T15:45:29Z

.config

+; other archive, suffix _metadata will only contain the
+; data derived from the images, including the timelapse.


Inconsistent terminology. The comment says "suffix _metadata will only contain" but should say "suffix _metadata, will contain" (with comma) or "suffix _metadata contains" (present tense matching the _imgdata description). Also, the term "data derived from the images" is used here, while line 379 says "data derived" - consider using consistent phrasing throughout.

Suggested change

-; other archive, suffix _metadata will only contain the
-; data derived from the images, including the timelapse.
+; other archive, suffix _metadata, contains only the
+; data derived, including the timelapse.

Copilot · 2026-01-21T15:45:30Z

RMS/ArchiveDetections.py

+        imgdata_set = (set([item for item in file_list if item.startswith("FF") and item.endswith(".fits")]) |
+                       set([item for item in file_list if item.startswith("FR") and item.endswith(".bin")]))
+
+        # Create the metadata set which is the relative complement of file_list in imgdata_set


The comment incorrectly describes the set operation. It states "relative complement of file_list in imgdata_set" but the code actually computes the complement of imgdata_set in file_list (i.e., elements in file_list that are not in imgdata_set). The comment should read: "Create the metadata set which is the complement of imgdata_set in file_list" or "Create the metadata set which contains all files except FF*.fits and FR*.bin files".

Suggested change

-    # Create the metadata set which is the relative complement of file_list in imgdata_set
+    # Create the metadata set which is the complement of imgdata_set in file_list
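
For reference, the set operation being described can be sketched as follows. The file names are invented examples, and only the selection rule is taken from the diff above.

```python
# Example file list (made-up names for illustration)
file_list = ["FF_a.fits", "FR_a.bin", "FTPdetectinfo_a.txt", "platepar_cmn2010.cal"]

# Image data: FF*.fits and FR*.bin files
imgdata_set = {item for item in file_list
               if (item.startswith("FF") and item.endswith(".fits"))
               or (item.startswith("FR") and item.endswith(".bin"))}

# Relative complement of imgdata_set in file_list:
# everything in file_list that is not in imgdata_set
metadata_set = set(file_list) - imgdata_set
```

By construction the two sets are disjoint and their union is the full file list.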

Copilot · 2026-01-21T15:45:30Z

RMS/StartCapture.py

+                        log.info(f"Adding file to upload list: {f}")
+                upload_manager.addFiles(files_to_add_list)
+                if files_to_add_list:
+                    if len(files_to_add_list) < 2:


Inconsistent condition for logging. In line 687 of this function, the condition checks if len(files_to_add_list) == 1 for singular "File added", while here at line 837, the condition checks if len(files_to_add_list) < 2 for the same purpose. These should use the same logic for consistency. Recommend changing line 837 to if len(files_to_add_list) == 1: to match the pattern used elsewhere.

Suggested change

-    if len(files_to_add_list) < 2:
+    if len(files_to_add_list) == 1:

Copilot · 2026-01-21T15:45:30Z

RMS/Reprocess.py

    if config.upload_enabled:

+        # Add metadata archive first, so it might get uploaded first
+        files_to_add_list_unfiltered = [archive_name, metadata_archive_name, imgdata_archive_name]


The order of files in the list differs from other similar code sections. In StartCapture.py line 679 and line 830, the order is [archive_name, metadata_archive_name, imgdata_archive_name], but here the order is [archive_name, metadata_archive_name, imgdata_archive_name] with a comment saying "Add metadata archive first, so it might get uploaded first". However, when upload_split is true, archive_name is None, so the actual first file would be metadata_archive_name. For consistency and to match the intent of the comment, consider ordering the list as [metadata_archive_name, imgdata_archive_name, archive_name] to ensure metadata is truly first when all three are present.

Suggested change

-    files_to_add_list_unfiltered = [archive_name, metadata_archive_name, imgdata_archive_name]
+    files_to_add_list_unfiltered = [metadata_archive_name, imgdata_archive_name, archive_name]
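
A small sketch of the ordering plus None-filtering discussed here. The helper name `build_upload_list` is hypothetical; the variable names follow the diff.

```python
def build_upload_list(archive_name, metadata_archive_name, imgdata_archive_name):
    """Return the archives to queue, metadata first, skipping unused (None) names."""
    candidates = [metadata_archive_name, imgdata_archive_name, archive_name]
    return [name for name in candidates if name is not None]
```

With `upload_split` enabled the `_detected` name is None and drops out, leaving metadata ahead of imgdata; with it disabled only the `_detected` archive remains.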

Copilot · 2026-01-21T15:45:31Z

RMS/StartCapture.py

+                if files_to_add_list:
+                    for f in files_to_add_list:
+                        log.info(f"Adding file to upload list: {f}")
+                upload_manager.addFiles(files_to_add_list)
+                if files_to_add_list:


The logging statements are executed before calling addFiles(), but they should be executed after to maintain consistency with the pattern used in the runCapture function (lines 684-690). The current order could log files that fail to be added. Consider moving lines 833-834 to after line 835, or combine with the existing conditional logging block at lines 836-840.

Suggested change

-    if files_to_add_list:
-        for f in files_to_add_list:
-            log.info(f"Adding file to upload list: {f}")
-    upload_manager.addFiles(files_to_add_list)
-    if files_to_add_list:
+    upload_manager.addFiles(files_to_add_list)
+    if files_to_add_list:
+        for f in files_to_add_list:
+            log.info(f"Adding file to upload list: {f}")
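
One way to keep the two call sites consistent would be a shared helper that queues first and logs afterwards. This is only a sketch: `add_files_and_log` is a hypothetical name, and unlike the code in the diff it skips the `addFiles()` call entirely when the list is empty.

```python
import logging

log = logging.getLogger(__name__)


def add_files_and_log(upload_manager, files_to_add_list):
    """Hypothetical helper: queue the files, then log what was queued.

    Returns the number of files handed to the upload manager.
    """
    if not files_to_add_list:
        return 0
    upload_manager.addFiles(files_to_add_list)
    for f in files_to_add_list:
        log.info(f"Added file to upload list: {f}")
    # Singular/plural message decided in one place
    log.info("File added." if len(files_to_add_list) == 1 else "Files added.")
    return len(files_to_add_list)
```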

g7gpr added 7 commits December 20, 2025 09:49

Initial work on splitting uploads

b0313fc

Add a small amount of logging

b17d2f4

Remove unintended cr

8fbcd84

Fix unusual exit condition from archiveDetections

2df7eb4

Clarify comments

c462902

Update docstrings

bf0951a

Update docstrings

91a66f5

g7gpr requested review from dvida and markmac99 December 20, 2025 23:57

g7gpr added 6 commits December 21, 2025 00:24

Metadata as a singular noun

2c0ca6e

Commenting clarity

152beb8

Test for existence of directory before attempting deletion

9044e24

Log files being added to upload list

3f0465f

Singular and plural file adding

45f164b

Singular and plural file adding

16942e0

markmac99 approved these changes Dec 21, 2025

g7gpr added 5 commits December 22, 2025 22:45

Use f strings

e82b896

Use f strings

086d803

Add archive_split flag

4b8f443

Change .config template

75ba0bf

Change .config template

7c45041

Implement upload split

fa93cbf

g7gpr added 2 commits December 22, 2025 23:11

Set unused archive names to None

8659f48

Upload metadata before imgdata

39bf3f0

g7gpr added 3 commits December 22, 2025 23:37

Config file commenting to explain upload_split

4f56ca3

Remove obsolete comment

e1e0df3

Catch None values

e0c483d

g7gpr marked this pull request as ready for review December 28, 2025 03:22

dvida requested a review from Copilot January 21, 2026 15:40

Copilot started reviewing on behalf of dvida January 21, 2026 15:41

Copilot AI reviewed Jan 21, 2026

Ensure _detected directory and archive are always created

72289e5
