Code from publication @ACM SIGSPATIAL 2018: Modular Software Framework for Compression of Structured Environmental Data.
This framework should help with the design and development of a prediction-based compression method for climate data.
This code has been tested on the following machine:
Python: 3.6.1
OS: Debian 4.11.6-1 (2017-06-19) testing (buster)
CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
MEM: 16 GiB 2400MHz DDR4
To recreate the software environment you can use the provided
spec-file.txt and requirements.txt files. Currently only GNU/Linux is supported.
conda create -n ENVNAME --file spec-file.txt
conda activate ENVNAME
pip install -r requirements.txt

macOS & Windows: Conda does not support cross-platform export of package names including versions. As soon as this feature is added, I will generate the appropriate macOS and Windows environment files.
A workflow defines the whole compression process from start to finish. The following figure gives a short overview of the steps, modifiers and objects involved in the compression process.
A workflow is defined by its five modifiers: mapper, sequencer, predictor, subtractor and encoder. These modifiers define the whole compression process (map, flatten, predict, subtract, encode) depicted in the figure above. The read and write processes are provided by the Python standard library.
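To make the five steps concrete, here is a minimal, standard-library-only sketch of such a pipeline. It is not the cframe implementation: the function names `compress` and `decompress`, the fixed-point `scale` mapping, the row-major traversal and the "previous value" predictor are all assumptions chosen for illustration.

```python
import struct
import zlib


def compress(grid, scale=100):
    """Run map, flatten, predict, subtract and encode on a 2D list of floats."""
    # map: floats -> integers via fixed-point scaling (assumed; lossy beyond 1/scale)
    ints = [[round(v * scale) for v in row] for row in grid]
    rows, cols = len(ints), len(ints[0])
    # flatten: row-major traversal order over the grid (assumed sequencer)
    order = [(r, c) for r in range(rows) for c in range(cols)]
    # predict: each value is predicted by its predecessor in the traversal
    predictions, previous = [], 0
    for r, c in order:
        predictions.append(previous)
        previous = ints[r][c]
    # subtract: residual = true value - prediction
    residuals = [ints[r][c] - p for (r, c), p in zip(order, predictions)]
    # encode: pack residuals as 32-bit integers and deflate them
    payload = zlib.compress(struct.pack("<%di" % len(residuals), *residuals))
    return payload, (rows, cols)


def decompress(payload, shape, scale=100):
    """Invert the steps above: decode, undo the prediction, reverse the mapping."""
    rows, cols = shape
    residuals = struct.unpack("<%di" % (rows * cols), zlib.decompress(payload))
    values, previous = [], 0
    for residual in residuals:
        previous += residual  # prediction was the previous reconstructed value
        values.append(previous)
    return [[values[r * cols + c] / scale for c in range(cols)] for r in range(rows)]
```

With `scale=100`, a round trip through `compress` and `decompress` reproduces each value to within 0.005, the rounding error of the assumed mapping.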
A modifier defines how to transform one object into another. It provides one function (with an appropriate inverse function for the decompression process) and operates on only one object. The following table provides an overview of the interface for each of the modifiers.
| modifier | function | inverse function | input object | output object |
|---|---|---|---|---|
| mapper | `map()` | `rev_map()` | floatarray | integerarray |
| sequencer | `flatten()` | - | integerarray | sequence object |
| predictor | `predict()` | - | integerarray, startnode, sequence object | predictionarray |
| subtractor | `subtract()` | - | integerarray, predictionarray | residualarray |
| encoder | `encode()` | `decode()` | residualarray | coded object |
The input and output objects are switched if the appropriate inverse function is used.
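The function/inverse pairing in the table could be expressed as an abstract interface. The sketch below uses the `map()` and `rev_map()` names from the table, but the class names `MapperInterface` and `ScaleMapper`, the `scale` parameter and the use of plain lists in place of the framework's array objects are hypothetical.

```python
from abc import ABC, abstractmethod


class MapperInterface(ABC):
    """Hypothetical mapper interface: one function plus its inverse."""

    @abstractmethod
    def map(self, floats):
        """Transform floating point values into integers."""

    @abstractmethod
    def rev_map(self, ints):
        """Transform integers back into floating point values."""


class ScaleMapper(MapperInterface):
    """Assumed example mapper using fixed-point scaling."""

    def __init__(self, scale=100):
        self.scale = scale

    def map(self, floats):
        return [round(v * self.scale) for v in floats]

    def rev_map(self, ints):
        # Input and output roles are switched relative to map()
        return [i / self.scale for i in ints]
```

Because the base class is abstract, a mapper that forgets to implement `rev_map()` cannot be instantiated, which keeps every modifier usable for decompression as well.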
An object defines the state of the data during the (de)compression process. The following UML diagram provides information about each object and the attributes provided by it.
The keen observer might have discovered that the integerarray, residualarray and predictionarray objects provide the same attributes and have the same properties. The separation is nevertheless necessary because each modifier is only allowed to operate on one kind of object.
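One way to picture this separation is a shared base class with three distinct subclasses, so that a modifier can reject the wrong kind of array even though the attributes are identical. The class names and the `values` and `shape` attributes below are hypothetical; the actual attributes are those given in the UML diagram.

```python
class BaseArray:
    """Hypothetical common base: all three array kinds share these attributes."""

    def __init__(self, values, shape):
        self.values = list(values)
        self.shape = tuple(shape)


class IntegerArray(BaseArray):
    pass


class PredictionArray(BaseArray):
    pass


class ResidualArray(BaseArray):
    pass


def subtract(integers, predictions):
    """Sketch of a subtractor that only accepts the object kinds it is meant for."""
    if not isinstance(integers, IntegerArray):
        raise TypeError("expected an IntegerArray as first argument")
    if not isinstance(predictions, PredictionArray):
        raise TypeError("expected a PredictionArray as second argument")
    residuals = [a - b for a, b in zip(integers.values, predictions.values)]
    return ResidualArray(residuals, integers.shape)
```

Swapping the two arguments raises a `TypeError` instead of silently producing residuals with the wrong sign, which is the kind of mistake the separate types guard against.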
The cframe folder is structured as follows:
cframe/
├── backend
├── data
├── format
├── modifier
│ ├── encoder
│ ├── mapper
│ ├── predictor
│ ├── sequencer
│ └── subtractor
├── objects
│ └── arrays
└── toolbox
- backend: Files which define the interfaces of the objects and modifiers. The modifier definitions use a `mod` suffix and the object definitions use the `obj` suffix.
- data: Example netCDF file [source]
- format: Preliminary format definition including white paper
- modifier: Actual instances for modifier objects (the following explanation is ordered by execution steps, not alphabetically)
  - mapper: Mapping of floating point values to integers
  - sequencer: Traversal of the dataset with `startnode` and optional `order` parameter (further sequencers are provided in the following repository: informationspaces)
  - predictor: Prediction of the next value based on past experience
  - subtractor: Modifier for calculation of the difference between prediction and true value
  - encoder: Methods for writing the data to disk (according to the definition in `format`)
- objects: Implementation of the objects. The arrays are put in a separate subfolder for better organisation.
- toolbox: Additional helper functions to provide workflows, parallel execution, subsetting, and quality assessment
  - feeder: Pre-allocation of arrays to help improve calculation speed during the prediction process. This prevents copying data over and over in memory.
  - parallel: Parallel execution of several workflows
  - qualityassessment: Statistics about the compressed data
  - subsetting: Random subsets of a dataset of size `N`
  - workflow: Wrapper class for the definition of a complete (de)compression process
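As an illustration of the kind of statistics a quality assessment step might report, the sketch below computes a compression ratio and the average number of bits per value. Both function names are hypothetical, and the choice of metrics is an assumption rather than a description of what `qualityassessment` actually provides.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Size of the uncompressed data divided by the size of the compressed data."""
    return len(original_bytes) / len(compressed_bytes)


def bits_per_value(compressed_bytes, n_values):
    """Average number of bits spent per data value after compression."""
    return 8 * len(compressed_bytes) / n_values
```

For example, 1000 single-precision floats occupy 4000 bytes; if the coded object is 1000 bytes long, the ratio is 4.0 and each value costs 8 bits on average.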


