Merge main into develop for release v1.0.0 #85

Open
Ianlmgoddard wants to merge 8 commits into develop from main

@Ianlmgoddard
Collaborator

No description provided.

shengy90 and others added 8 commits March 25, 2025 10:46
* Add link to Zenodo and PDF of workshop tutorial

Exclude PDF from large file precommit checks

* Fix URL error

* Add gitkeep to subfolders in data

* Update data modules to return a TypedDict as training rows.

TODO: Update models to handle TrainingData typed dict
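For readers unfamiliar with the pattern, here is a minimal sketch of what a TypedDict training row could look like. The class name `TrainingDataSketch`, the field names `kwh` and `features`, and the helper `get_feature` are all hypothetical and are not taken from the repository; the explicit `None` check mirrors the kind of handling mypy tends to ask for when a dictionary lookup may return `None`.

```python
from typing import Dict, List, TypedDict


class TrainingDataSketch(TypedDict):
    # Hypothetical fields, chosen only for illustration.
    kwh: List[float]            # one consumption profile
    features: Dict[str, float]  # any number of named features


def get_feature(row: TrainingDataSketch, name: str) -> float:
    """Look up a feature, failing loudly instead of returning None."""
    value = row["features"].get(name)
    if value is None:
        raise KeyError(f"feature {name!r} missing from training row")
    return value


row: TrainingDataSketch = {
    "kwh": [0.2, 0.4, 0.1],
    "features": {"month": 3.0, "dow": 1.0},
}
month = get_feature(row, "month")
```

Keeping the features in a mapping rather than fixed fields is one way to support an arbitrary number of features; the actual modules may do this differently.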

* Update VAE to handle arbitrary number of features

* Training GMM to handle arbitrary feature length

* Update sampling to arbitrarily handle a list of features

* Save feature list when training VAE model

* Convert some utility methods to static methods

* Fixing mypy and linting errors

Mypy needs error handling to handle cases where dictionary could return None values.

* Add Pytests

* Use torch.Tensor for typing for better typing support

* Add comments and docstrings to explain get_index

* Fix error in pytest

Pytest wasn't working: the index and value were swapped in the test.

* Add comments to pytest to explain logic

* Add to docstrings to explain training data needs to be an instance of TrainingData typed dict.

* Fix bug where gmm_labels was converted to numpy array accidentally

* Add more tests for the sampling logic

* Refactor test and assert mask len is correct

* Add support for sample weights

* Fix bug in the way quantile and mse losses were calculating averages

Can't just use simple average - need to calculate the sum of losses and divide by the sum of weights
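To illustrate the point about averaging, here is a small pure-Python sketch (not the project's loss code, which operates on tensors). The function name `weighted_mean` and the example numbers are assumptions made for illustration. It compares dividing by the number of rows against dividing by the sum of weights.

```python
from typing import Sequence


def weighted_mean(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average: sum(w_i * loss_i) / sum(w_i)."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")
    weighted_sum = sum(w * loss for w, loss in zip(weights, losses))
    return weighted_sum / total_weight


per_sample_loss = [1.0, 3.0]
sample_weights = [3.0, 1.0]  # the first row stands for three identical samples

# Dividing by the number of rows ignores how many samples each row represents.
naive = sum(w * loss for w, loss in zip(sample_weights, per_sample_loss)) / len(per_sample_loss)
# Dividing by the sum of weights matches the mean over the expanded data [1, 1, 1, 3].
correct = weighted_mean(per_sample_loss, sample_weights)
```

With these numbers the naive value is 3.0, while the weighted mean is 1.5, the same as the plain mean of the expanded samples.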

* MSE weight wasn't applied to mse_loss

* Bugs with if sample_weights

* Use the torch native repeat_interleave function

More GPU efficient

* More efficient implementation of weighted MMD loss

* Add pytests to check that weighted MMD loss implementation is correct

* Fix bugs with MSE and Quantile losses for 2D tensors

* Fixed keyerror bug

Previously sample_gmm returned a Tuple, but it was updated to return the TrainingData TypedDict. However, we forgot to update the evaluation code.

* Update check_gpl.sh to exclude the notebooks folder, since code in notebooks can sometimes have "GPL" in the raw byte strings, and notebooks are not part of the distributable software.

* Fix wrong directory

* Ignore notebooks from scancode checks

* Make mean_factor a passable argument for MIA

* Add doc strings

* Make MIA dataset a parameter users can pass in

Also fixes a typo bug

* Feature - Streaming training data (#62)

* Added prod and dev streaming packages

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Added streaming dataloader / dataset

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Added ipywidgets dev package to show tqdm

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Ignore streaming output files

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Fix import path

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Fix dataset output

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Split households should output to raw not processed

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Notebook to process streaming training data

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Fix data format issues

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Update streaming model training notebook

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* precommit checks

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Added tests for streaming dataloader

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* fix test packages

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Update README

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Update to pass in df as local variable

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

---------

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>

* Replace sklearn GMM with Pytorch lightning GMM to enable GPU training (#52)

* use pytorch GMM instead of sklearn gmm (#43)

* Expand the repeat in batches based on the weights

- Weights = number of times each sample occurs in the dataset
- Before training GMM, expand the repeat samples in the batch based on the sample weights
- Shuffle the expanded batch to prevent consecutive repeated samples which may impact GMM convergence
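The expansion described in the bullets above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `expand_and_shuffle` and the `seed` parameter are hypothetical, and the real code works on tensors, where a function such as `torch.repeat_interleave` could play the role of the list comprehension.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_and_shuffle(samples: Sequence[T], weights: Sequence[int], seed: int = 0) -> List[T]:
    """Repeat each sample `weight` times, then shuffle the expanded batch."""
    if len(samples) != len(weights):
        raise ValueError("samples and weights must have the same length")
    # Expansion: a sample with weight 3 appears three times.
    expanded = [sample for sample, weight in zip(samples, weights) for _ in range(weight)]
    # Shuffling breaks up runs of consecutive identical samples.
    random.Random(seed).shuffle(expanded)
    return expanded


batch = expand_and_shuffle(["a", "b", "c"], [3, 1, 2])
```

The expanded batch has `sum(weights)` entries, and only their order depends on the seed.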

* Abstract `prepare_data_for_model`

`prepare_data_for_model` was repeated 4x in _lightning.py scripts.
Abstract code for testing purposes.

* Avoid circular import

Note this is a temporary fix and we need a better way of bringing in the trained FaradayVAE as an import.

* Abstract out FaradayVAE

This is to prevent circular imports.
`src/opensynth/models/faraday/model.py` depends on `_lightning.py` scripts which themselves take FaradayVAE as input.
Previously, FaradayVAE was defined in `src/opensynth/models/faraday/model.py`, creating a circular import. I have moved FaradayVAE to vae_models.py to prevent the circular import.

* Fix bug

* Add licence in heading

* Update src/opensynth/models/faraday/gaussian_mixture/prepare_gmm_input.py

Co-authored-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>

* Update src/opensynth/models/faraday/vae_model.py

Co-authored-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>

* Test function preparing training data for GMM

* Address review comment

* Specify FaradayVAE type

* Abstract `prepare_data_for_model` for unit testing

Add tests recommended in PR review.

* Fix gmm regularisation and issues with div by zero (#48)

* Fix gmm regularisation and ensure minimum probability is 1/batch size
* Use sklearn for kmeans

* Split large test func into smaller test funcs

* Move GMM tests to new script

* Clean up Pytorch lightning GMM feature branch (#51)

* remove reference to different covariance types, and kmeans lightning

* remove kmeans metrics, refactor kmeans implementation to re-use code

* add tests for gmm internals

* Add sample_weights as passable argument into FaradayModel

* Add train_sample_weights as parameter to prepare_data_for_model

Add unit test to check for positive and negative behaviour

* Add train_sample_weights to all invocations of prepare_data_for_model

* Make train_sample_weights compulsory for prepare_data_for_model

This makes sure that train_sample_weights is always passed into the function explicitly, so a call that forgets to forward it fails loudly instead of silently falling back to a default of False.
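One way to make such a parameter compulsory in Python is a keyword-only argument without a default. The sketch below assumes that approach; the function name `prepare_data_sketch`, its other parameters, and its return value are hypothetical and do not reflect the real signature of `prepare_data_for_model`.

```python
from typing import Dict, List, Optional


def prepare_data_sketch(
    features: List[float],
    weights: List[int],
    *,
    train_sample_weights: bool,  # no default: every caller must decide explicitly
) -> Dict[str, Optional[List[float]]]:
    """Return the features and, only if requested, the sample weights."""
    used_weights = [float(w) for w in weights] if train_sample_weights else None
    return {"features": features, "weights": used_weights}


with_weights = prepare_data_sketch([0.1, 0.2], [1, 2], train_sample_weights=True)

# Forgetting the argument is now a TypeError rather than a silent default.
try:
    prepare_data_sketch([0.1, 0.2], [1, 2])  # type: ignore[call-arg]
except TypeError as exc:
    missing_arg_error: Optional[str] = str(exc)
else:
    missing_arg_error = None
```

A call site that drops the argument fails immediately, which is easier to catch in tests than a model quietly trained without weights.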

* Update docstrings

* Add missing test prefix

* Simplified gmm clean (#61)

Implement GMM model using PyTorch Lightning with logic based on scikit-learn
---------

Signed-off-by: Shengy <shengy90@gmail.com>
Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Co-authored-by: Shengy <shengy90@gmail.com>

---------

Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Signed-off-by: Shengy <shengy90@gmail.com>
Co-authored-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
Co-authored-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Co-authored-by: Shengy <shengy90@gmail.com>

* Feature/EnergyDiff - models, scripts, and notebooks (#65) (#67)

* add models, training of energydiff  (#1)

* add test script for data format

* add energydiff modules

* add data format adapt for energydiff

* add energydiff test for training

* add diff sampler functions

* add support tweaks for mps

* fix diffusion samplers

* add adjust diffusion logging

* add diffusion test notebook

* !FIX typo in calling q posterior mean var function

* fix change ema to init

* update test energydiff notebook

* fix PLDiffusion1D init error

* update diffusion init to adapt checkpoint loading

* update notebook

* update notebook

* add train py script

* add energydiff calibrate

* update energydiff sample params

* fix import in diffusion

* update energydiff notebook

* !fix init projection reshape
- this will make mps backend incompatible

* Add option to disable init proj

* !fix nn linear when skipping init proj

* add format-conforming ipynb energydiff

* fix energydiff ipynb

* add data to ignore

* revert gitignore

* update notebook

* del temp files

* del temp file

* formatting changes

* pre-commit check formatting

* del temp files

* add __init__ of energydiff

* add attacks support for energydiff

* add energydiff eval notebook

* add a note to notebook

* add two docstrings

* add meaningful enum.Enum

* fix replace assert with if checks

* add multiple formatting tweaks

* add renaming ambiguous var names in tutorial notebook

* add rename ambiguous var names in energydiff eval notebook

* add optimize shape transforms in ECDF class

* fix circular dependency of sampler and diffusion

* add header to sampler module

* refactor: diffusion and sampler with a base module

* add .venv to gitignore

* update dependencies

* fix calibrate transform functions

* fix a wrong if statement in diffusion

* add type hints

* Fix several assertions changed to if-raise

* change a string format to fstring

* Add noise schedule in sampler: str -> enum

* add an important comment in sampler module

* add dependency to Pipfile

* add unit test for energydiff
- calibrate
- model
- dependency

* fix calibrate and tests

* fix license in sampler module

* add license

* fix gitignore

* add test for extract + docs

* rename diffusion utils test

* add nn module unit test and docstrings

* add important ERROR raise for num sampling step (need to be >=15)
- also add tests

* add unit test for diffusion + lightning module

* add make mps test optional depending on availability

* add more explicit fixture param in diffusion test

* del position scaling
- not necessary for usual uses
- might be confusing

---------

Signed-off-by: Nan <nanlin1997@outlook.com>
Signed-off-by: Nan-Snellius <nanlin1997@outlook.com>
Signed-off-by: Nan Lin <nansense97@gmail.com>
Co-authored-by: Nan <81872363+sentient-codebot@users.noreply.github.com>
Co-authored-by: Nan Lin <nansense97@gmail.com>

* add funding info to energydiff (#68)

Signed-off-by: Nan Lin <nansense97@gmail.com>
Co-authored-by: Nan Lin <nansense97@gmail.com>

* Fix streaming notebooks to update FaradayGMM code (#70)

Also update pipfile

* Increment version

---------

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>
Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Signed-off-by: Shengy <shengy90@gmail.com>
Signed-off-by: Nan <nanlin1997@outlook.com>
Signed-off-by: Nan-Snellius <nanlin1997@outlook.com>
Signed-off-by: Nan Lin <nansense97@gmail.com>
Co-authored-by: Gus Chadney <angus.chadney@gmail.com>
Co-authored-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
Co-authored-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
Co-authored-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Co-authored-by: Nan <81872363+sentient-codebot@users.noreply.github.com>
Co-authored-by: Nan Lin <nansense97@gmail.com>
Signed-off-by: Shengy <shengy90@gmail.com>
* Replace sklearn GMM with Pytorch lightning GMM to enable GPU training (#52)

* use pytorch GMM instead of sklearn gmm (#43)

* Expand the repeat in batches based on the weights

- Weights = number of times each sample occurs in the dataset
- Before training GMM, expand the repeat samples in the batch based on the sample weights
- Shuffle the expanded batch to prevent consecutive repeated samples which may impact GMM convergence

* Abstract `prepare_data_for_model`

`prepare_data_for_model` was repeated 4x in _lightning.py scripts.
Abstract code for testing purposes.

* Avoid circular import

Note this is a temporary fix and we need a better way of bringing in the trained FaradayVAE as an import.

* Abstract out FaradayVAE

This is to prevent circular imports.
`src/opensynth/models/faraday/model.py` depends on `_lightning.py` scripts which themselves take FaradayVAE as input.
Previously, FaradayVAE was defined in `src/opensynth/models/faraday/model.py`, creating a circular import. I have moved FaradayVAE to vae_models.py to prevent the circular import.

* Fix bug

* Add licence in heading

* Update src/opensynth/models/faraday/gaussian_mixture/prepare_gmm_input.py

Co-authored-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>

* Update src/opensynth/models/faraday/vae_model.py

Co-authored-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>

* Test function preparing training data for GMM

* Address review comment

* Specify FaradayVAE type

* Abstract `prepare_data_for_model` for unit testing

Add tests recommended in PR review.

* Fix gmm regularisation and issues with div by zero (#48)

* Fix gmm regularisation and ensure minimum probability is 1/batch size
* Use sklearn for kmeans

* Split large test func into smaller test funcs

* Move GMM tests to new script

* Clean up Pytorch lightning GMM feature branch (#51)

* remove reference to different covariance types, and kmeans lightning

* remove kmeans metrics, refactor kmeans implementation to re-use code

* add tests for gmm internals

* Add sample_weights as passable argument into FaradayModel

* Add train_sample_weights as parameter to prepare_data_for_model

Add unit test to check for positive and negative behaviour

* Add train_sample_weights to all invocations of prepare_data_for_model

* Make train_sample_weights compulsory for prepare_data_for_model

This makes sure that train_sample_weights is always passed into the function explicitly, so a call that forgets to forward it fails loudly instead of silently falling back to a default of False.

* Update docstrings

* Add missing test prefix

* Simplified gmm clean (#61)

Implement GMM model using PyTorch Lightning with logic based on scikit-learn
---------

Signed-off-by: Shengy <shengy90@gmail.com>
Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Co-authored-by: Shengy <shengy90@gmail.com>

---------

Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Signed-off-by: Shengy <shengy90@gmail.com>
Co-authored-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
Co-authored-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Co-authored-by: Shengy <shengy90@gmail.com>
Signed-off-by: hajalibayram <hajalibayram@outlook.com>

* Feature/EnergyDiff - models, scripts, and notebooks (#65) (#67)

* add models, training of energydiff  (#1)

* add test script for data format

* add energydiff modules

* add data format adapt for energydiff

* add energydiff test for training

* add diff sampler functions

* add support tweaks for mps

* fix diffusion samplers

* add adjust diffusion logging

* add diffusion test notebook

* !FIX typo in calling q posterior mean var function

* fix change ema to init

* update test energydiff notebook

* fix PLDiffusion1D init error

* update diffusion init to adapt checkpoint loading

* update notebook

* update notebook

* add train py script

* add energydiff calibrate

* update energydiff sample params

* fix import in diffusion

* update energydiff notebook

* !fix init projection reshape
- this will make mps backend incompatible

* Add option to disable init proj

* !fix nn linear when skipping init proj

* add format-conforming ipynb energydiff

* fix energydiff ipynb

* add data to ignore

* revert gitignore

* update notebook

* del temp files

* del temp file

* formatting changes

* pre-commit check formatting

* del temp files

* add __init__ of energydiff

* add attacks support for energydiff

* add energydiff eval notebook

* add a note to notebook

* add two docstrings

* add meaningful enum.Enum

* fix replace assert with if checks

* add multiple formatting tweaks

* add renaming ambiguous var names in tutorial notebook

* add rename ambiguous var names in energydiff eval notebook

* add optimize shape transforms in ECDF class

* fix circular dependency of sampler and diffusion

* add header to sampler module

* refactor: diffusion and sampler with a base module

* add .venv to gitignore

* update dependencies

* fix calibrate transform functions

* fix a wrong if statement in diffusion

* add type hints

* Fix several assertions changed to if-raise

* change a string format to fstring

* Add noise schedule in sampler: str -> enum

* add an important comment in sampler module

* add dependency to Pipfile

* add unit test for energydiff
- calibrate
- model
- dependency

* fix calibrate and tests

* fix license in sampler module

* add license

* fix gitignore

* add test for extract + docs

* rename diffusion utils test

* add nn module unit test and docstrings

* add important ERROR raise for num sampling step (need to be >=15)
- also add tests
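A guard of the kind this commit describes might look like the sketch below. The names `MIN_SAMPLING_STEPS` and `check_num_sampling_steps` and the choice of `ValueError` are assumptions; only the threshold of 15 comes from the commit message. An explicit `if`/`raise` is used rather than `assert` because assertions are skipped when Python runs with `-O`.

```python
MIN_SAMPLING_STEPS = 15  # threshold taken from the commit message


def check_num_sampling_steps(num_sampling_steps: int) -> int:
    """Return the value unchanged if it is valid, otherwise raise."""
    if num_sampling_steps < MIN_SAMPLING_STEPS:
        raise ValueError(
            f"num_sampling_steps must be >= {MIN_SAMPLING_STEPS}, "
            f"got {num_sampling_steps}"
        )
    return num_sampling_steps


steps = check_num_sampling_steps(20)
```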

* add unit test for diffusion + lightning module

* add make mps test optional depending on availability

* add more explicit fixture param in diffusion test

* del position scaling
- not necessary for usual uses
- might be confusing

---------

Signed-off-by: Nan <nanlin1997@outlook.com>
Signed-off-by: Nan-Snellius <nanlin1997@outlook.com>
Signed-off-by: Nan Lin <nansense97@gmail.com>
Co-authored-by: Nan <81872363+sentient-codebot@users.noreply.github.com>
Co-authored-by: Nan Lin <nansense97@gmail.com>
Signed-off-by: hajalibayram <hajalibayram@outlook.com>

* add funding info to energydiff (#68)

Signed-off-by: Nan Lin <nansense97@gmail.com>
Co-authored-by: Nan Lin <nansense97@gmail.com>
Signed-off-by: hajalibayram <hajalibayram@outlook.com>

* Fix streaming notebooks to update FaradayGMM code (#70)

Also update pipfile

Signed-off-by: hajalibayram <hajalibayram@outlook.com>

* V0.0.6 release (#71) (#72)

* Add link to Zenodo and PDF of workshop tutorial

Exclude PDF from large file precommit checks

* Fix URL error

* Add gitkeep to subfolders in data

* Update data modules to return a TypedDict as training rows.

TODO: Update models to handle TrainingData typed dict

* Update VAE to handle arbitrary number of features

* Training GMM to handle arbitrary feature length

* Update sampling to arbitrarily handle a list of features

* Save feature list when training VAE model

* Convert some utility methods to static methods

* Fixing mypy and linting errors

Mypy needs error handling to handle cases where dictionary could return None values.

* Add Pytests

* Use torch.Tensor for typing for better typing support

* Add comments and docstrings to explain get_index

* Fix error in pytest

Pytest wasn't working: the index and value were swapped in the test.

* Add comments to pytest to explain logic

* Add to docstrings to explain training data needs to be an instance of TrainingData typed dict.

* Fix bug where gmm_labels was converted to numpy array accidentally

* Add more tests for the sampling logic

* Refactor test and assert mask len is correct

* Add support for sample weights

* Fix bug in the way quantile and mse losses were calculating averages

Can't just use simple average - need to calculate the sum of losses and divide by the sum of weights

* MSE weight wasn't applied to mse_loss

* Bugs with if sample_weights

* Use the torch native repeat_interleave function

More GPU efficient

* More efficient implementation of weighted MMD loss

* Add pytests to check that weighted MMD loss implementation is correct

* Fix bugs with MSE and Quantile losses for 2D tensors

* Fixed keyerror bug

Previously sample_gmm returned a Tuple, but it was updated to return the TrainingData TypedDict. However, we forgot to update the evaluation code.

* Update check_gpl.sh to exclude the notebooks folder, since code in notebooks can sometimes have "GPL" in the raw byte strings, and notebooks are not part of the distributable software.

* Fix wrong directory

* Ignore notebooks from scancode checks

* Make mean_factor a passable argument for MIA

* Add doc strings

* Make MIA dataset a parameter users can pass in

Also fixes a typo bug

* Feature - Streaming training data (#62)

* Added prod and dev streaming packages

* Added streaming dataloader / dataset

* Added ipywidgets dev package to show tqdm

* Ignore streaming output files

* Fix import path

* Fix dataset output

* Split households should output to raw not processed

* Notebook to process streaming training data

* Fix data format issues

* Update streaming model training notebook

* precommit checks

* Added tests for streaming dataloader

* fix test packages

* Update README

* Update to pass in df as local variable

---------

* Replace sklearn GMM with Pytorch lightning GMM to enable GPU training (#52)

* use pytorch GMM instead of sklearn gmm (#43)

* Expand the repeat in batches based on the weights

- Weights = number of times each sample occurs in the dataset
- Before training GMM, expand the repeat samples in the batch based on the sample weights
- Shuffle the expanded batch to prevent consecutive repeated samples which may impact GMM convergence

* Abstract `prepare_data_for_model`

`prepare_data_for_model` was repeated 4x in _lightning.py scripts.
Abstract code for testing purposes.

* Avoid circular import

Note this is a temporary fix and we need a better way of bringing in the trained FaradayVAE as an import.

* Abstract out FaradayVAE

This is to prevent circular imports.
`src/opensynth/models/faraday/model.py` depends on `_lightning.py` scripts which themselves take FaradayVAE as input.
Previously, FaradayVAE was defined in `src/opensynth/models/faraday/model.py`, creating a circular import. I have moved FaradayVAE to vae_models.py to prevent the circular import.

* Fix bug

* Add licence in heading

* Update src/opensynth/models/faraday/gaussian_mixture/prepare_gmm_input.py

* Update src/opensynth/models/faraday/vae_model.py

* Test function preparing training data for GMM

* Address review comment

* Specify FaradayVAE type

* Abstract `prepare_data_for_model` for unit testing

Add tests recommended in PR review.

* Fix gmm regularisation and issues with div by zero (#48)

* Fix gmm regularisation and ensure minimum probability is 1/batch size
* Use sklearn for kmeans

* Split large test func into smaller test funcs

* Move GMM tests to new script

* Clean up Pytorch lightning GMM feature branch (#51)

* remove reference to different covariance types, and kmeans lightning

* remove kmeans metrics, refactor kmeans implementation to re-use code

* add tests for gmm internals

* Add sample_weights as passable argument into FaradayModel

* Add train_sample_weights as parameter to prepare_data_for_model

Add unit test to check for positive and negative behaviour

* Add train_sample_weights to all invocations of prepare_data_for_model

* Make train_sample_weights compulsory for prepare_data_for_model

This makes sure that train_sample_weights is always passed into the function explicitly, so a call that forgets to forward it fails loudly instead of silently falling back to a default of False.

* Update docstrings

* Add missing test prefix

* Simplified gmm clean (#61)

Implement GMM model using PyTorch Lightning with logic based on scikit-learn
---------

---------

* Feature/EnergyDiff - models, scripts, and notebooks (#65) (#67)

* add models, training of energydiff  (#1)

* add test script for data format

* add energydiff modules

* add data format adapt for energydiff

* add energydiff test for training

* add diff sampler functions

* add support tweaks for mps

* fix diffusion samplers

* add adjust diffusion logging

* add diffusion test notebook

* !FIX typo in calling q posterior mean var function

* fix change ema to init

* update test energydiff notebook

* fix PLDiffusion1D init error

* update diffusion init to adapt checkpoint loading

* update notebook

* update notebook

* add train py script

* add energydiff calibrate

* update energydiff sample params

* fix import in diffusion

* update energydiff notebook

* !fix init projection reshape
- this will make mps backend incompatible

* Add option to disable init proj

* !fix nn linear when skipping init proj

* add format-conforming ipynb energydiff

* fix energydiff ipynb

* add data to ignore

* revert gitignore

* update notebook

* del temp files

* del temp file

* formatting changes

* pre-commit check formatting

* del temp files

* add __init__ of energydiff

* add attacks support for energydiff

* add energydiff eval notebook

* add a note to notebook

* add two docstrings

* add meaningful enum.Enum

* fix replace assert with if checks

* add multiple formatting tweaks

* add renaming ambiguous var names in tutorial notebook

* add rename ambiguous var names in energydiff eval notebook

* add optimize shape transforms in ECDF class

* fix circular dependency of sampler and diffusion

* add header to sampler module

* refactor: diffusion and sampler with a base module

* add .venv to gitignore

* update dependencies

* fix calibrate transform functions

* fix a wrong if statement in diffusion

* add type hints

* Fix several assertions changed to if-raise

* change a string format to fstring

* Add noise schedule in sampler: str -> enum

* add an important comment in sampler module

* add dependency to Pipfile

* add unit test for energydiff
- calibrate
- model
- dependency

* fix calibrate and tests

* fix license in sampler module

* add license

* fix gitignore

* add test for extract + docs

* rename diffusion utils test

* add nn module unit test and docstrings

* add important ERROR raise for num sampling step (need to be >=15)
- also add tests

* add unit test for diffusion + lightning module

* add make mps test optional depending on availability

* add more explicit fixture param in diffusion test

* del position scaling
- not necessary for usual uses
- might be confusing

---------

* add funding info to energydiff (#68)

* Fix streaming notebooks to update FaradayGMM code (#70)

Also update pipfile

* Increment version

---------

Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>
Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Signed-off-by: Shengy <shengy90@gmail.com>
Signed-off-by: Nan <nanlin1997@outlook.com>
Signed-off-by: Nan-Snellius <nanlin1997@outlook.com>
Signed-off-by: Nan Lin <nansense97@gmail.com>
Co-authored-by: Gus Chadney <angus.chadney@gmail.com>
Co-authored-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
Co-authored-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
Co-authored-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Co-authored-by: Nan <81872363+sentient-codebot@users.noreply.github.com>
Co-authored-by: Nan Lin <nansense97@gmail.com>
Signed-off-by: hajalibayram <hajalibayram@outlook.com>

* Train test split fixed in split_households.py:38

Signed-off-by: hajalibayram <hajalibayram@outlook.com>

* In train test split, sample size refactored to a sample fraction with default value of 0.75

Signed-off-by: hajalibayram <hajalibayram@outlook.com>
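As a rough illustration of a fraction-based split, here is a stdlib-only sketch. It assumes the fraction is the share of households assigned to training, which the commit message does not state; the function name `split_by_fraction` and the `seed` parameter are hypothetical, and the real `split_households.py` may work on data frames instead of lists.

```python
import random
from typing import List, Sequence, Tuple


def split_by_fraction(
    household_ids: Sequence[str],
    sample_fraction: float = 0.75,  # default taken from the commit message
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Shuffle the ids and put the first `sample_fraction` share in train."""
    if not 0.0 < sample_fraction < 1.0:
        raise ValueError("sample_fraction must be strictly between 0 and 1")
    shuffled = list(household_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * sample_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = split_by_fraction([f"house_{i}" for i in range(8)])
```

A fraction scales with the dataset, whereas a fixed sample size has to be retuned whenever the number of households changes.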

---------

Signed-off-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Signed-off-by: Shengy <shengy90@gmail.com>
Signed-off-by: hajalibayram <hajalibayram@outlook.com>
Signed-off-by: Nan <nanlin1997@outlook.com>
Signed-off-by: Nan-Snellius <nanlin1997@outlook.com>
Signed-off-by: Nan Lin <nansense97@gmail.com>
Signed-off-by: Gus Chadney <gus.chadney@centrefornetzero.org>
Co-authored-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
Co-authored-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
Co-authored-by: Charlotte Avery <143102500+charlotte-avery@users.noreply.github.com>
Co-authored-by: Shengy <shengy90@gmail.com>
Co-authored-by: Nan <81872363+sentient-codebot@users.noreply.github.com>
Co-authored-by: Nan Lin <nansense97@gmail.com>
Co-authored-by: Gus Chadney <angus.chadney@gmail.com>
Signed-off-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
* Update data processing scripts

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>

* Update tests

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>

* Address PR review comments

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>

* Add option to drop nulls or replace nulls with 0

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
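The two null-handling options could behave roughly as in the sketch below. This is a plain-Python stand-in: the function name `handle_nulls` and the `drop_nulls` flag are hypothetical, and the actual processing scripts presumably operate on data frames rather than lists.

```python
from typing import List, Optional


def handle_nulls(readings: List[Optional[float]], drop_nulls: bool) -> List[float]:
    """Either drop missing readings or replace them with 0."""
    if drop_nulls:
        return [r for r in readings if r is not None]
    return [0.0 if r is None else r for r in readings]


raw = [1.0, None, 2.0]
dropped = handle_nulls(raw, drop_nulls=True)
filled = handle_nulls(raw, drop_nulls=False)
```

Dropping shortens the series, while filling keeps its length but introduces artificial zero readings, so the right choice depends on what the downstream model expects.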

* Update sampled_fraction description
EDIT: removed time filter

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>

* Add logging

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>

---------

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
Signed-off-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
Signed-off-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
* Add ReparametrisationModule to init

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>

---------

Signed-off-by: Charlotte Avery <charlotte.avery@centrefornetzero.org>
Signed-off-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>
…ro.org>

Signed-off-by: Ianlmgoddard <47185808+Ianlmgoddard@users.noreply.github.com>

4 participants