Compressed inputs and temporal outputs #878

Open
gschivley wants to merge 16 commits into GenXProject:develop from gschivley:compressed_inputs

@gschivley
Collaborator

Description

Allow gzipped CSV and Parquet files to be used for (1) input files and (2) temporal output files, with complete backwards compatibility. Any or all of the input files can be binary or compressed, as long as they are in the expected folder location and have the same file stem as the existing inputs. Annual outputs will continue to be written as CSV. The format of hourly outputs is controlled by the new parameter TemporalOutputFormat.
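The "same file stem" rule above can be pictured with a small lookup routine. This is only an illustrative sketch in Python using the standard library, not the code in this PR (which reads inputs through DuckDB). The function name `resolve_input`, the stem `hourly_inputs`, and the search order in `CANDIDATE_SUFFIXES` are all assumptions made for the example; the PR description does not say which format takes precedence when several files share a stem.

```python
import gzip
import tempfile
from pathlib import Path

# Hypothetical search order: the PR does not state which format wins
# when more than one file with the same stem is present.
CANDIDATE_SUFFIXES = (".csv", ".csv.gz", ".parquet")


def resolve_input(folder, stem):
    """Return the first existing file in `folder` named `stem` plus a known suffix."""
    folder = Path(folder)
    for suffix in CANDIDATE_SUFFIXES:
        candidate = folder / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No input named {stem!r} with any of {CANDIDATE_SUFFIXES} in {folder}"
    )


def demo():
    """Create a temporary folder holding only a gzipped CSV and resolve it by stem."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "hourly_inputs.csv.gz"  # hypothetical file name
        with gzip.open(path, "wt", newline="") as handle:
            handle.write("hour,value\n1,0.5\n2,0.7\n")
        return resolve_input(tmp, "hourly_inputs").name


if __name__ == "__main__":
    print(demo())
```

With a lookup like this, a case folder that contains only plain CSV files resolves exactly as before, which is the sense in which the change is backwards compatible.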

All inputs are read using DuckDB. Gzipped CSV and Parquet outputs are also written using DuckDB.

This new feature is especially useful when inputs contain multiple weather years of hourly data. I have had CSV files grow to multiple GB.

What type of PR is this? (check all applicable)

  • [x] Feature
  • [ ] Bug Fix
  • [ ] Documentation Update
  • [ ] Code Refactor
  • [ ] Performance Improvements

Related Tickets & Documents

This partially supersedes PR #734, which has languished. I suspect it was trying to do too many things in a single PR, so I'm starting with one discrete task that is backwards compatible.

Checklist

  • [x] Code changes are sufficiently documented; i.e. new functions contain docstrings and .md files under /docs/src have been updated if necessary.
  • [x] The latest changes on the target branch have been incorporated, so that any conflicts are taken care of before merging. This can be accomplished either by merging in the target branch (e.g. 'git merge develop') or by rebasing on top of the target branch (e.g. 'git rebase develop'). Please do not hesitate to reach out to the GenX development team if you need help with this.
  • [x] Code has been tested to ensure all functionality works as intended.
  • [x] CHANGELOG.md has been updated (if this is a 'notable' change).
  • [x] I consent to the release of this PR's code under the GNU General Public License.

How this can be tested

This is a strictly internal change.

Post-approval checklist for GenX core developers

After the PR is approved

  • Check that the latest changes on the target branch are incorporated, either via merge or rebase
  • Remember to squash and merge if incorporating into develop

lbonaldo and others added 15 commits February 4, 2025 23:13
Merge commit for Patch Release v0.4.4
Co-authored-by: gschivley <10373332+gschivley@users.noreply.github.com>
…uet)

Co-authored-by: gschivley <10373332+gschivley@users.noreply.github.com>
…utFormat setting

Co-authored-by: gschivley <10373332+gschivley@users.noreply.github.com>
…ptions

Replace CSV-only file loading with DuckDB to support CSV, CSV.GZ, and Parquet formats for inputs. Add option for gzip and parquet temporal outputs.
@gschivley gschivley changed the title from "Compressed inputs" to "Compressed inputs and temporal outputs" on Oct 31, 2025
@gschivley gschivley marked this pull request as ready for review October 31, 2025 13:30
@gschivley
Collaborator Author

To Do:

  • Rewrite tests to match format of existing tests. Wrap everything in functions that are called on include.

4 participants