FITS Handling: Introduce Lossless Compression for HDUs #278
base: prerelease
|
Is this expected to be safe to run on stations that will upload, or should upload be turned off when testing this? |
|
Can you please share some example compressed files? I'd like to check they're still compatible with other FITS viewing and analysis software. |
|
Here are four FF FITS files. Unfortunately, it's cloudy here right now. It would be interesting to see what the size will be on a starry night - the size reduction might not be as dramatic.
|
|
Dave, I suggest testing this separately until we can confirm that there are no effects on processing. |
|
As noted by email, I have found that the compressed files are incompatible with other FITS handling software: both FITS Liberator and PixInsight refuse to open them (they actually crash FL!). For me this feels like an issue, as we don't know what the files might be getting used for downstream of RMS. I'm also wondering whether compression is really beneficial. Storage is cheap and 3.5MB is not really that big, and the data are bzip compressed for upload to GMN, so I don't think it will save space on the GMN side or save time in uploads. I agree it means less data getting written to disk each night, but the lifespan of any decent SD card is pretty long (several years) and I've actually never had a card fail due to wear and tear. |
|
On my end, CMNbinViewer, SAOImageDS9, and FITS Liberator have no issues handling it. I don't have access to PixInsight, but PixInsight stated 'we have no interest in FITS, which we deprecated many years ago in PixInsight'. Is anyone else having any issues? It would be strange for a FITS application not to support such a basic requirement. Mark, you're making a very good point about the data being compressed before upload. The compressed and the uncompressed FITS files, once compressed in a tar.bz2, have similar sizes. So there is indeed no benefit as far as transmitting the files goes. I don't know how the tar files are handled on the receiving end. Regarding local storage, it would clearly be beneficial. So I think it would be valuable to get to the bottom of the compatibility issue before pulling this out. Luc
|
|
PixInsight will keep supporting FITS for a long time, although the PixInsight guys have been pushing their own private format for years. Nobody else is interested in it! My point was that PixInsight is a -very- widely used tool in the astro-imaging world, and if it can't open the files then it's a problem. I think the problem with FITS Liberator is that there are two versions: version 3 is distributed by ESA from their website and can't open the compressed files. Version 4 from NOIRLab can handle the files. Personally, I don't agree there's much advantage in compressing the files, because storage is cheap and the days when 3-4MB was a big file are long gone. Saving 1-1.5MB per file isn't really very significant. I realise it'd mean we could keep the CapturedFiles data for a bit longer, but it's very rare that we need to look back more than a few days, and realistically we'd only be able to keep an extra day or so. Anyway, for me, I would want this to be an optional feature. |
|
OK, will do.
|
|
Hi all, |
|
I added a 'hdu_compress' configuration setting to |
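A minimal sketch of how an optional flag like this could be read, assuming an INI-style config file. Only the `hdu_compress` key name comes from the comment above; the `[Compression]` section name and the `read_hdu_compress` helper are hypothetical and are not taken from RMS.

```python
import configparser

# Hypothetical config fragment. The section name "Compression" is an
# assumption for illustration; only the 'hdu_compress' key is named in the thread.
CONFIG_TEXT = """
[Compression]
hdu_compress: false
"""


def read_hdu_compress(config_text, default=False):
    """Return the hdu_compress flag, falling back to `default` when it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # getboolean accepts true/false, yes/no, on/off and 1/0
    return parser.getboolean("Compression", "hdu_compress", fallback=default)


hdu_compress = read_hdu_compress(CONFIG_TEXT)
print("HDU compression enabled:", hdu_compress)
```

Defaulting to `False` when the key is missing would leave existing stations writing uncompressed files unless an operator opts in.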
|
Will try it out! |
|
Update on compression ratio: my camera sees stars tonight and the compression still reduces the size by half. |
|
Worth noting that AstroPy also supports full-file compression with gzip, zip or bzip2. Such files have IMO it should be preferred approach:
As for the question of whether it is justified, I would agree that reducing the size of individual files will be helpful. Besides retaining more nights on local storage before they're cleaned, it would also help with the lifespan of SD cards, as they can only survive a limited number of write cycles. |
|
As I've said earlier, I do not see any real benefit to compressing the FITS files, and so would prefer this to be an option and not the default. Finally, bear in mind that RMS is not the only app that reads/writes the FITS files. If we change the file format, it could have an unexpected impact on how end-users process the data, in ways we cannot guess because we do not know what downstream processes the individual country networks have in place (we in UKMON don't use the FITS files, but it's possible others do). I'm sure it'd be a simple fix, but I am never in favour of introducing incompatibility. |
|
In my use case, reducing storage size by ~50% is beneficial. Storage isn't cheap here, and it would allow us to use 64GB or even 32GB microSD cards on the Pis. It seems that the required I/O write throughput will be reduced as well, so we will be able to add more RMS instances on an x86_64 server, at the cost of higher CPU usage. I really like this patch. But I'd suggest you collect more data. For instance, I'd like to see how the extra CPU usage will affect the Pis, especially the Pi3, as they are at maximum load already, before making it the default. The compatibility with other software that Mark mentioned is also important. Some people would prefer to have compression disabled for now, so they need to be aware of the option and able to choose. I think most people don't use any external software, so I'm in favour of having it as the default as long as it doesn't break any existing RMS station. I'll be happy to test it on BR0001, BR0002 and BR0003. |
|
To clarify what I meant above: it's not required to change the file format to get the benefits of compression. As such, I would argue against this patch with the current implementation of HDU-only compression. The overall idea makes sense though. AstroPy can write files in the current FITS format directly to disk, already inside a generic compressed archive, i.e. the best of both worlds: no changes to the underlying format and smaller individual files. It will result in a better compression ratio as well. I can open a branch if anybody is curious to test full-file compression? |
Hm! Yes, it's a good balance if it can write the compressed file directly (and not by writing the full uncompressed file, then compressing it and deleting the full file). I would like to test it as well! |
Would be interested to test it too. |
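To illustrate the "write the compressed file directly" point, here is a minimal sketch using only Python's standard `gzip` module. It is not the AstroPy code path discussed above; the `payload` bytes are a stand-in for a serialized FITS file, and the `write_gzipped` and `read_gzipped` helpers and the file name are hypothetical.

```python
import gzip
import os
import tempfile


def write_gzipped(payload, path):
    """Stream `payload` (bytes) straight into a gzip file at `path`.

    No uncompressed copy is ever written to disk.
    """
    with gzip.open(path, "wb") as f:
        f.write(payload)


def read_gzipped(path):
    """Return the decompressed contents of a gzip file."""
    with gzip.open(path, "rb") as f:
        return f.read()


# Stand-in for a FITS file: a header-like first block padded to 2880 bytes
# (the FITS block size), followed by four blocks of zeros.
payload = b"SIMPLE  =                    T".ljust(2880, b" ") + bytes(2880 * 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "FF_example.fits.gz")
    write_gzipped(payload, out_path)
    compressed_size = os.path.getsize(out_path)
    restored = read_gzipped(out_path)
    files_written = os.listdir(tmp_dir)

print(len(payload), "bytes in,", compressed_size, "bytes on disk")
```

Only the `.gz` file ever exists in the output directory, which is the behaviour asked for above. The ratio printed here says nothing about real FF files, since the stand-in payload is almost entirely zeros.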
|
Bumping this discussion. I think we need to address FITS file sizes as they're impacting the science GMN can perform. Storage size directly affects how many nights of data we can keep on disk for revisiting interesting events. Currently, without compression, we can only go back half as many days as we could with compressed storage.
Current storage breakdown per 12hr night:
This is particularly problematic for multi-cam setups where storage quickly becomes the limiting factor. This isn't just an upload bandwidth issue (files are already compressed for transfer). The real impact is on data retention for analysis. All compression schemes (RICE, gzip, etc.) achieve roughly the same ~50% ratio. We just need to pick one and implement it.
Option 1: Compress individual FF files
Option 2: Compress entire night directories
Both could be made optional via config flag. Thoughts on implementation approach? Any strong preferences between the options? |
|
If you compressed an entire directory, how hard would it be to retrieve an individual file? I know some compression schemes are clever, but it feels expensive. My own feeling is to compress individual FITS files from the previous night just before capture on the next night is due to start. I don't agree that this problem affects multi-camera stations more than single-camera stations. I think the greatest benefit will come from doubling the recall ability of the many 128 GB SD card stations. |
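On the question of retrieving one file from a compressed night directory: a minimal sketch with the standard `tarfile` module, assuming the directory were stored as a `.tar.bz2` (the format the thread mentions for uploads). A bzip2-compressed tar has no random-access index, so reading one member means decompressing the stream from the start up to that member. The archive name, member names and the `extract_one` helper are hypothetical.

```python
import io
import os
import tarfile
import tempfile


def extract_one(archive_path, member_name):
    """Return the bytes of a single member of a .tar.bz2 archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        member = tar.extractfile(member_name)  # raises KeyError if not present
        if member is None:  # present, but not a regular file
            raise FileNotFoundError(member_name)
        return member.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, "night.tar.bz2")

    # Build a tiny stand-in archive holding three fake FF files
    with tarfile.open(archive_path, "w:bz2") as tar:
        for i in range(3):
            data = bytes([i]) * 1000
            info = tarfile.TarInfo(name=f"FF_{i}.fits")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    # Pull out just the last file; everything before it is still decompressed
    wanted = extract_one(archive_path, "FF_2.fits")

print(len(wanted), "bytes retrieved")
```

Retrieval is simple to code, but its cost grows with the member's position in the archive, which is one argument for compressing files individually.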
|
I just want to reiterate that I still don't think this is the right approach. Storage is cheap, and it'd be a lot simpler, less risky and more compatible just to keep data uncompressed and advise buying a larger SD card. As I noted upthread, my quick tests indicate that any compression will make the data less compatible with other tools, which will make it harder for camera operators to examine and play with the data themselves. I think that ability is really important as it keeps people engaged and interested. I also believe we can solve the problem by making sure that all new non-core capabilities, such as all-day timelapses, raw video capture, daytime monitoring, contribution to contrail monitoring etc., are disabled by default and come with a clear caveat that they'll use more storage, so either a 256GB SD card or an SSD will be required. Existing station owners would then be unaffected, and anyone enabling the new features would understand the risks and impact. I do worry that we're seeing a lot of "mission creep" that's impacting camera owners without fully informing them. I also think this would work - I might be missing something here, but I have disabled all unnecessary features and my stations can still retain 9-10 days of data, which is the same as it was back in 2022. So I do think this approach would obviate the need to make the data less compatible. If we do decide to compress data, I'd strongly recommend creating a standalone service that could be run via systemd or be triggered by RMS using a signal. I feel it should not be done within RMS itself, as this will create another thread and dependency that could fail or get stuck, leading to unexpected behaviour or data loss. Monolithic apps are an antipattern. |
|
Just to clear up a misconception, all-day time-lapse has a completely negligible impact on storage as of prerelease. It needs 1.1GB overhead plus 0.1GB per day. Obviously, raw video - used for meteor - has a large impact and requires large storage (although it can be more efficient than FF files and produces higher quality observations). Even if you asked operators to spend their money on larger storage, you would still only retain half as many days as you could if you just compressed for free - which is wasteful. Sometimes the storage you get is whatever is lying at the bottom of the drawer. With everything turned off but core functions, a high latitude station with a 128GB drive can hold 2 days of uncompressed data vs 4 days of compressed data in the winter. Turning on all-day timelapse doesn't change these numbers. At the other end of the spectrum, a high latitude 6-cam station with continuous raw video turned on, and a 2TB drive, can hold 5 days of data vs 7 days compressed. This is the same whether or not all-day timelapses are produced. If we at least made it optional, I don't believe it would break any GMN pipeline. Operators needing their data to remain uncompressed would just turn the option off. The majority of people, who would rather store data efficiently, would leave the option on. This minimal PR accomplishes this without large changes to the code base. |
|
Given recent developments - rising storage costs, the introduction of raw video save, and upcoming plans for daytime FITS recording - I'd like to revisit optional lossless compression for HDUs. Currently, FITS files are a fixed size regardless of content. They compress well before upload, but consume significant local storage prior to tar archiving. Daytime frames in particular would compress exceptionally well. Is there still resistance to making HDU compression optional? Maybe separately configurable for day vs. night capture? |
|
Just to give everyone some context - @Cybis320 and I discussed using compressed FFs for daytime recording. This will enable people to turn this feature on with very little impact on storage, as daytime data will compress really well. And this data will not be used operationally as any daytime events will have to be manually analyzed. So this will enable easy "better than nothing" data collection. And voila, the GMN is then recording daytime fireballs with little to no impact on resources. |
|
In terms of space it's clearly a useful thing. My concern, as before, is that many downstream FITS tools can't handle them, e.g. as noted previously PixInsight 1.8.9 and FITS Liberator 3. There are probably only a few people who routinely use these tools on the data though. However, I'm a bit unclear about the plan. Above I think Luc mentioned needing 1.1GB plus 0.1GB per day, is that correct? Sorry to ask a lot of questions. |
|
Right, I haven't explained things all too well. Currently, the daytime recordings only consist of the frame files and are quite tiny. The only way to get daytime fireballs is to record the full MKV videos, which comes with a big hit in terms of disk space. The idea is to develop a new (optional) feature which will allow saving FFs during the day in the compressed format, and also run the fireball detector. This way, we can get away with daytime fireball monitoring with only a small impact on disk space. |
|
Understood, makes sense. I agree it's better to collect some data than none! Will be keen to see the size of the compressed FITS files compared to the current frames files. |
|
To clarify, the day FF files would be generated alongside the JPG ‘frames’ file, not as a replacement - they serve distinct purposes (Fireball/reentry vs climate) |
|
Okay, we definitely need to understand the space impact. It looks like the frames files take a minimum of 1GB, plus extra GB for compressed FITS. Will we need to recommend 256GB SD cards for Pi-based systems? Otherwise we'll lose night-time data on recent fireballs. |
g7gpr
left a comment
The console call to recursively decompress is good, but is there a way to recompress the files? From the console would be ideal. Or will RMS recurse through the directories, compressing them?
Will the compressed files work with SkyFit2, or does SkyFit2 need some work so it can open compressed files?
|
Would you like me to switch some stations to this branch? |
|
@markmac99 We still don't know what the requirements are, but this definitely won't run by default. We're making it an option and we'll do a bit of exploration first. @g7gpr The daytime FF saving is still not implemented, and I'd rather the normal FFs not be saved in the compressed format just yet. To answer the question of whether SkyFit will need any work - if the reader in FFfile.py transparently works with the compressed format, then no. I believe that it does. |
|
Here’s an initial attempt to estimate the storage impact of different options. I haven’t tested the actual compression ratio on day FITS. I still need to consider the quota management scenario - this is with quota management disabled. I've assumed the default retention policy in .config, including 20-day retention for the hypothetical day FF files (same as night, which may not make sense).
Capture Options Summary
Data Products by Mode
Storage Impact by Option (720p @ 25 fps)
Per-Hour Storage
†Reaches max value after 24 hr and then fluctuates between 0 and max.
Approximate Raw Video Storage (H.264 @ 50% of nominal bitrate)
Combined Storage Scenarios (10h night, current - uncompressed FF)
Continuous Capture (24h) Storage (current - uncompressed FF)
Continuous Capture (24h) Storage (with planned day-only comp FF)
The actual compression ratio of day FF is unknown at this time.
10-Day Storage Accumulation
Default retention settings from .config:
Standard Night Capture (10h nights)
†Raw JPGs: fixed overhead, deleted after timelapse creation
Continuous Capture with 24h FF (hypothetical, 10h night + 14h day, day FF compressed)
†Raw JPGs: fixed overhead, deleted after timelapse creation
Storage Growth Summary
*day-only compression (estimated)
|
|
Thanks for the very comprehensive stats. One thing I'd note is that night-time FF file capture in the southern UK around the solstice is 18GB (currently capturing 5217 files @ 3.5MB each = 18,259 MB), as capture is currently running from 16:15 to 07:10, approx 15 hours. The cameras in Greenland are capturing for even longer, around 18 hours = 23GB of FF files. If we take that as the worst case, I estimate 24-hr continuous capture, with compressed FF during the day, could need 27GB per day in northern latitudes, without raw video. |
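The arithmetic behind these figures can be sketched as follows. The file count, file size and capture hours are the ones quoted above; the 0.5 day-time compression ratio is an assumption based on the ~50% figure reported elsewhere in the thread, and the function names are hypothetical.

```python
def ff_storage_mb(n_files, mb_per_file=3.5):
    """Total size in MB of n_files FF files of a fixed size."""
    return n_files * mb_per_file


def continuous_capture_gb(night_hours, mb_per_hour, day_ratio=0.5):
    """Rough 24 h estimate: uncompressed night FFs plus compressed day FFs.

    day_ratio is the assumed compressed/uncompressed size ratio for day files.
    """
    day_hours = 24 - night_hours
    night_mb = night_hours * mb_per_hour
    day_mb = day_hours * mb_per_hour * day_ratio
    return (night_mb + day_mb) / 1000.0


# Figures quoted in the comment above: 5217 files over roughly 15 hours
night_mb = ff_storage_mb(5217)  # 18259.5 MB
mb_per_hour = night_mb / 15

# Worst case discussed: an 18-hour night with compressed FFs for the remaining 6 hours
worst_case_gb = continuous_capture_gb(18, mb_per_hour)

print(f"Night total: {night_mb:.1f} MB, rate: {mb_per_hour:.0f} MB/h")
print(f"24 h estimate: {worst_case_gb:.1f} GB")
```

The result should land in the same ballpark as the estimate above; it shifts with the hourly rate and with whatever compression ratio day-time files turn out to have.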
|
Actually, the cameras in Greenland capture 23 hours, pause for processing and upload, and then continue. So the worst-case scenario is effectively 24/7 nighttime capture. |
|
I think this is a really fruitful direction, especially as the compute cost vs storage cost of compression with new RPis is changing! @Cybis320 would you mind testing out on a different branch AstroPy's whole-file compression that we discussed before? IIRC all that's needed is to provide astropy function that we already use to write out FITS files
HDU compressions are cool, but I doubt any of those can outperform good old DEFLATE. And
Other downstream software (CMN_binViewer etc) can likewise be modified to detect
Otherwise I can give it a try later in January (currently on holiday without a laptop). |
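The comment above does not survive intact on what exactly downstream tools would detect. One hypothetical approach, sketched here with the standard library only, is to sniff the first bytes of a file: gzip streams start with `1f 8b`, bzip2 streams with `BZh`, and a plain FITS file with the `SIMPLE` keyword. The `sniff_container` helper is an illustration and is not part of RMS or CMN_binViewer.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"
FITS_MAGIC = b"SIMPLE"


def sniff_container(first_bytes):
    """Classify a file from its leading bytes.

    Returns "gzip", "bzip2", "fits" or "unknown".
    """
    if first_bytes.startswith(GZIP_MAGIC):
        return "gzip"
    if first_bytes.startswith(BZIP2_MAGIC):
        return "bzip2"
    if first_bytes.startswith(FITS_MAGIC):
        return "fits"
    return "unknown"


# Stand-in for the start of a FITS file (one 2880-byte header block)
plain = b"SIMPLE  =                    T".ljust(2880, b" ")

kinds = [
    sniff_container(plain[:8]),
    sniff_container(gzip.compress(plain)[:8]),
    sniff_container(bz2.compress(plain)[:8]),
    sniff_container(b"\x00" * 8),
]
print(kinds)
```

A reader could then pick the matching opener (`gzip.open`, `bz2.open`, or a plain `open`) before handing the stream to its FITS parser, without relying on the file extension.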
|
Hi Dario, Surprisingly, the current method provides the highest compression ratio in my tests.
Also, just compressing the HDUs means everything downstream (except PixInsight) just works - no extra steps needed. I'll investigate again, but I'm not seeing the advantage of doing whole-FITS compression - maybe I'm missing something. Also, I tested day FITS compression, and the compression ratio is surprisingly not that different from night FITS - ~2x all around. Edit: the numbers above are for 1080p - this doesn't affect the ratio though. |
This PR introduces lossless compression algorithms for Header/Data Units (HDUs) within our FITS file handling module ('RMS/Formats/FFfits.py'). The goal is to reduce FITS file sizes to improve storage efficiency, mitigate wear and tear on storage devices, and enhance data transfer speeds, all while maintaining the data integrity and accessibility.
Changes Made:
Impact:
Testing: