Conversation
Codecov Report: ❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #376      +/-   ##
==========================================
- Coverage   93.98%   88.99%   -5.00%
==========================================
  Files          10       11       +1
  Lines        1630     1717      +87
==========================================
- Hits         1532     1528       -4
- Misses         98      189      +91
```
Force-pushed from 11f527a to 63829f4
Force-pushed from 01687ce to 208a7e0
So far this looks great.

I can see where you butted up against missing features/API. Happy to have these resolved here:
- The dim criterion for the decorrelator -> Happy to resolve here
- The No-Emulator `MachineLearningTool` -> Happy to resolve here
- `encode_data`/`decode_data` for `Vector` types -> Happy to resolve here

Maybe these can be resolved later:
- Parallel MCMC. I have created an issue to expand this so you don't have to reach into internals to do parallel MCMC -> We can resolve this in another PR if that's useful
- The EKP scheduler with a checkpoint -> We can create a PR for this in EKP (may avoid the issue below)
Regarding the examples:
- I had to rebase this branch with `main` to get the MCMC working again
- It is a little hard to interpret what the final result of `emulate_sample_linlinexp.jl` does. It also seemed like to run `emulate_sample` you always had to run `calibrate` in the same session (I wonder if this has something to do with the CheckpointScheduler definition)
- I haven't yet run the lorenz example
```julia
end

all_errs[dim_i, :] =
    [norm(post_means[:, 2] - v) / norm(post_means[:, 2]) for v in eachcol(post_means[:, 3:end])]'
```
Why is the reference emulator not used here? Otherwise this is just showing the size of the difference to a PCA-truncation mean, rather than comparing both a truncated PCA and a likelihood-informed approach to a true reference.
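For concreteness, here is a rough sketch of the comparison I have in mind; treating column 1 of `post_means` as the reference-emulator posterior mean is purely an assumption for illustration:

```julia
# Sketch only: compare every reduced-space posterior mean against a true
# reference, assumed here (hypothetically) to be stored in column 1.
ref = post_means[:, 1]
all_errs[dim_i, :] = [norm(ref - v) / norm(ref) for v in eachcol(post_means[:, 2:end])]'
```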
PS. Running the example gave:

```
0.771104  0.724079  0.686967  0.675671  0.720269  0.687709
0.397877  0.567556  0.629621  0.656653  0.666858  0.66542
0.40432   0.393665  0.350307  0.287113  0.297794  0.238406
0.436975  0.370324  0.34126   0.278414  0.262382  0.233595
0.376547  0.295315  0.151808  0.147098  0.094317  0.118197
```
```julia
    chain_type = Chains,
    stepsize = new_step,
    discard_initial = 5_000,
)
```
I see that our framework is lacking full support for parallel chains; perhaps we can address this in a future PR.
It seems like we just need to enable passing `args...` into `optimize_stepsize` so that they are forwarded into `sample`.
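Something along these lines is what I mean; the names below are stand-ins rather than the current CES API:

```julia
# Sketch of keyword forwarding (illustrative names, not the real signatures):
# any extra keyword arguments given to optimize_stepsize_sketch are handed
# straight through to the inner sampling call.
function optimize_stepsize_sketch(sampler, init_stepsize; sample_kwargs...)
    # the real version would run its stepsize-tuning loop around this call
    return sampler(; stepsize = init_stepsize, sample_kwargs...)
end

# toy usage: the extra kwargs end up in the inner call untouched
toy_sampler(; kwargs...) = Dict(kwargs)
optimize_stepsize_sketch(toy_sampler, 0.1; nchains = 4, discard_initial = 5_000)
```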
```diff
 Apply the `CanonicalCorrelation` encoder, on a columns-are-data matrix or a data vector
 """
-function encode_data(cc::CanonicalCorrelation, data::MM) where {MM <: AbstractMatrix}
+function encode_data(cc::CanonicalCorrelation, data::MorV) where {MorV <: Union{AbstractMatrix, AbstractVector}}
```
In general, in the EKP / CES ecosystem data is treated internally as columns, and thus will always be provided as a matrix. However, since this may be used as part of the external API, I guess we can allow for vectors!
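A minimal sketch of how the vector case could reuse the matrix path (the helper name is made up, not part of the package):

```julia
# Treat a data vector as a single column so the columns-are-data code applies.
as_columns(data::AbstractMatrix) = data
as_columns(data::AbstractVector) = reshape(data, :, 1)

as_columns([1.0, 2.0, 3.0])   # 3×1 matrix
```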
```julia
# Fields
$(TYPEDFIELDS)
"""
mutable struct LikelihoodInformed{FT <: Real} <: PairedDataContainerProcessor
```
In general, in the package we have never used mutable structs, only immutable ones. This is perhaps annoying, but for consistency could we stick with immutable for now and do a larger change later (sorry)?
I also saw a couple of issues where a `Union{Nothing, X}` field does weird things before mutation. Not a big deal, but it happens quite commonly.
I'm not gonna die on this hill, however: if it is a pain to make this immutable then I wouldn't worry too much about it. My guess is it just means making the stored objects vectors that you push to.
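Something like this is what I am picturing (the struct and field names here are illustrative, not the actual fields of `LikelihoodInformed`):

```julia
# Immutable struct whose stored objects are vectors you push! to,
# rather than mutable fields that get reassigned.
struct LikelihoodInformedSketch{FT <: Real}
    encoder_mats::Vector{Matrix{FT}}
end

li = LikelihoodInformedSketch(Matrix{Float64}[])
push!(li.encoder_mats, randn(3, 3))   # filled in later; the struct itself never mutates
```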
```julia
    grad = (samples_out .- mean(samples_out; dims = 2)) / (samples_in .- mean(samples_in; dims = 2))
    fill(grad, size(samples_in, 2))
else
    @assert li.grad_type == :localsl
```
Could we have useful `throw(ArgumentError(...))`s for the user instead of asserts? In our experience it is much better to tell the user how to provide the correct args than to have asserts.
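For instance, a drop-in replacement for the assert in the hunk above might look like this (the message wording is just a suggestion):

```julia
# Tell the user what went wrong and what is accepted, instead of asserting.
if li.grad_type != :localsl
    throw(ArgumentError("unsupported grad_type $(li.grad_type); expected :localsl here"))
end
```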
```julia
        sortby = (-),
    )
else
    @assert apply_to == "out" && α ≈ 0 && obs_whitened
```
Likewise for the assert statement above... and others below.
```julia
num_trials = 1
for trial in 1:num_trials
    loaded = load("datafiles/ekp_linlinexp_$(trial).jld2")
```
I found that loading this in a new session fails; it seems like the only time it worked was when I ran both `calibrate` and `emulate_sample` in the same session.
I wonder if this is because the CheckpointScheduler must be defined in the session where you load the EKP object.
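If that is the cause, an untested guess at a workaround would be to pull in the relevant definitions before loading (the included file name below is hypothetical):

```julia
# Untested guess: make sure the custom scheduler type is defined in this
# session before JLD2 tries to reconstruct the stored EKP object.
using JLD2
include("calibrate_linlinexp.jl")   # hypothetical path to wherever CheckpointScheduler is defined
loaded = load("datafiles/ekp_linlinexp_1.jld2")
```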
Implement the likelihood-informed data processor from our Overleaf
This PR makes the following changes.
- A `LikelihoodInformed` data processor, which combines input data, output data and the actual inverse problem to find good reduced spaces.
- A variant of `Decorrelator` that uses a fixed reduced-space dimension, instead of calculating it based on a variance threshold. This helps primarily when testing or doing comparisons (the two criteria are illustrated in the sketch at the end of this description).
- `examples/DimensionReduction/emulate_sample_linlinexp.jl`. I feel like we need to make it a bit more versatile and less hacky before extracting it from there. It compares `Decorrelator + LikelihoodInformed` to just `Decorrelator`.

Big TODO: The error estimate for output space reduction when α ≠ 0 is way off, which would result in high errors when using `retain_KL` as a criterion to find a reduced space. I'm still looking into why this is.

There are some test failures that I still have to look at, but I think this PR is ready to review.
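As a rough illustration of the two dimension-selection criteria mentioned above, using a plain SVD as a stand-in (this is not the package API):

```julia
# Choosing a reduced dimension either from a variance-retention threshold
# (the existing Decorrelator criterion) or as a fixed value (the new option).
using LinearAlgebra

svals = svd(randn(20, 200)).S
dim_from_variance(s, frac) = findfirst(cumsum(s .^ 2) ./ sum(s .^ 2) .>= frac)

dim_threshold = dim_from_variance(svals, 0.95)  # keep enough modes to retain 95% variance
dim_fixed = 5                                   # fix the dimension directly, e.g. for comparisons
```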