Trying to understand the testing framework #169

@mkhattab940

Hi @zaxtax

My team and I have been working on Hakaru for our undergrad capstone project. We are wrapping things up and writing a paper based on our results. However, we've been having trouble formulating our hypothesis.

We've been writing a number of RoundTrip test cases to test known relationships between distributions, but we don't understand exactly what these tests do. We've tried diving into the related files, but none of us knows Haskell well enough to figure it out. As best we can tell, the framework grabs the test cases and passes them to some Maple environment.

@JacquesCarette has told us to ask you to explain it to us.

I'll explain what we've figured out so far, and then hopefully you can fill in the gaps for us. For example, we added a test that produces the following result:

### Failure in: 6:RoundTrip:7:2:t_rayleigh_to_stdChiSq:0
haskell/Tests/TestTools.hs:130
expected:
chiSq_iid = fn n nat:
            fn mean real:
            fn stdev prob:
            q <~ plate _ of n: normal(mean, stdev)
            return summate i from 0 to size(q):
                   ((q[i] - mean) * prob2real(1/ stdev)) ^ 2
standardChiSq = fn n nat: chiSq_iid(n, nat2real(0), nat2prob(1))
standardChiSq(2)
but got:
q307 <~ normal(+0/1, 1/1)
q315 <~ normal(+0/1, 1/1)
return q307 ^ 2 + q315 ^ 2
Cases: 338  Tried: 287  Errors: 2  Failures: 20
                                               
### Failure in: 6:RoundTrip:7:2:t_rayleigh_to_stdChiSq:1
haskell/Tests/TestTools.hs:130
expected:
chiSq_iid = fn n nat:
            fn mean real:
            fn stdev prob:
            q <~ plate _ of n: normal(mean, stdev)
            return summate i from 0 to size(q):
                   ((q[i] - mean) * prob2real(1/ stdev)) ^ 2
standardChiSq = fn n nat: chiSq_iid(n, nat2real(0), nat2prob(1))
standardChiSq(2)
but got:
X3 <~ uniform(+0/1, +1/1)
return log(real2prob(X3)) * (-2/1)
Cases: 338  Tried: 288  Errors: 2  Failures: 21

So for each test case it looks like 2 tests are run. I've messed around with hk-maple a bit and it looks like this is roughly what is happening:

  • 0-test: run default and Summarize modes on the expected file and check if their outputs match
  • 1-test: run default mode on the expected file again, run Summarize mode on the test file and check if their outputs match
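
To make the guess in the two bullets concrete, here is a minimal Python sketch of the comparison logic. It is only a restatement of the guess, not the actual logic in haskell/Tests/TestTools.hs. The `run` callable and the mode name "default" are hypothetical stand-ins for invoking hk-maple in a given mode on a file and capturing the printed program; the real command-line invocation is deliberately left out.

```python
from typing import Callable

# Hypothetical: run(mode, path) returns the program text that hk-maple
# prints for the file at `path` in the given mode.
Runner = Callable[[str, str], str]


def zero_test(run: Runner, expected_path: str) -> bool:
    """Guessed 0-test: default and Summarize modes agree on the expected file."""
    default_out = run("default", expected_path)
    summarize_out = run("Summarize", expected_path)
    return default_out.strip() == summarize_out.strip()


def one_test(run: Runner, expected_path: str, test_path: str) -> bool:
    """Guessed 1-test: default mode on the expected file matches
    Summarize mode on the test file."""
    default_out = run("default", expected_path)
    summarize_out = run("Summarize", test_path)
    return default_out.strip() == summarize_out.strip()
```

Passing the runner in as a parameter keeps the sketch independent of how hk-maple is actually called, so the comparison can be tried with canned outputs first.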

However, these outputs don't always exactly match what I get when I run Summarize on these files myself (although they are very close), so I don't think this is exactly what is happening. Can you clarify how these outputs are generated?

We would also like to make sure we understand the purposes of both tests. The 0-test seems to be some sort of preliminary test before the 1-test tests the actual relationship we are interested in. As far as I can tell, Summarize seems to be a more ambitious version of Simplify. So I expect that if their outputs are equal, Summarize can't do any better, right? I think Dr. Carette had said it has to do with making sure some sort of change of variables is done correctly. Can you expand on this?

I know the 1-test is meant to produce equivalent code for Hakaru files describing equivalent distributions. Can you explain how the inference algorithms used in the test are meant to accomplish this?

For reference, this is the hypothesis we are currently working with:

Assume we know a relationship between two statistical distributions that transforms distribution A into distribution B, and that we can prove it by analyzing their PDFs.

We hypothesize that by applying the appropriate transformations to an implementation of distribution A in Hakaru, we can create a Hakaru program whose hk-maple output is equivalent to the hk-maple output for an implementation of distribution B.
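
As an illustration of what "equivalent" means here, the two "but got" programs from the failures above both describe a chi-squared distribution with 2 degrees of freedom, even though their text differs. The sketch below checks this numerically in plain Python, outside Hakaru and Maple. The function names, sample size, and grid points are all assumptions made for the illustration; only the two sampling recipes are taken from the test output.

```python
import math
import random


def sum_of_squared_normals(rng: random.Random) -> float:
    # Mirrors: q307 <~ normal(0, 1); q315 <~ normal(0, 1); return q307^2 + q315^2
    a = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, 1.0)
    return a * a + b * b


def neg_two_log_uniform(rng: random.Random) -> float:
    # Mirrors: X3 <~ uniform(0, 1); return log(X3) * (-2)
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return -2.0 * math.log(u)


def chi_sq2_cdf(x: float) -> float:
    # Closed-form CDF of the chi-squared distribution with 2 degrees of freedom.
    return 1.0 - math.exp(-x / 2.0)


def max_cdf_gap(sampler, n=50_000, seed=0, grid=(0.5, 1.0, 2.0, 4.0, 8.0)) -> float:
    """Largest gap, over the grid, between the empirical CDF of the sampler
    and the chi-squared(2) CDF."""
    rng = random.Random(seed)
    samples = [sampler(rng) for _ in range(n)]
    gaps = []
    for x in grid:
        empirical = sum(1 for s in samples if s <= x) / n
        gaps.append(abs(empirical - chi_sq2_cdf(x)))
    return max(gaps)
```

Calling `max_cdf_gap(sum_of_squared_normals)` and `max_cdf_gap(neg_two_log_uniform)` should both give small values. A sampling check like this only suggests that two programs agree in distribution; it does not show that hk-maple will print the same program for both, which is what the 1-test compares.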

Really appreciate your help with this.
