Description
The BQL documentation ( http://probcomp.csail.mit.edu/dev/bayesdb/doc/bql.html ) states the following in the section for: PROBABILITY DENSITY OF (<targets>) [GIVEN (<constraints>)] :
WARNING: The value this function returns is not a normalized probability in [0, 1], but rather a probability density with a normalization constant that is common to the column but may vary between columns. So it may take on values above 1.
Presumably, this also applies to ESTIMATE PREDICTIVE PROBABILITY, which also returns values greater than one.
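To make the concern concrete, here is a minimal sketch in plain Python (not BQL, and not how BayesDB computes anything) of why a density can exceed 1 and why raw densities from differently scaled columns are hard to compare. The helper `normal_pdf` and the example numbers are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# A narrow distribution has a density well above 1 near its mean,
# even though the density still integrates to 1 over the real line.
narrow = normal_pdf(0.0, mean=0.0, std=0.1)

# The same quantity stored in two different units (hypothetical columns):
# metres in one column, millimetres in another.
height_m = normal_pdf(1.70, mean=1.70, std=0.1)
height_mm = normal_pdf(1700.0, mean=1700.0, std=100.0)

# The two values describe an equally "typical" observation, yet the
# densities differ by the unit conversion factor.
ratio = height_m / height_mm

print(narrow, height_m, height_mm, ratio)
```

In this sketch the two height columns hold the same information, but their densities differ by a factor of 1000, which is the kind of column-dependent scaling that makes a cross-column ranking of raw densities misleading.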
As a user, it is challenging to use the results of ESTIMATE PREDICTIVE PROBABILITY for the most typical use cases (e.g., ranking the most improbable data in a .csv file, including across multiple columns) unless there is more clarity about the right way to compute the normalization constant in BQL. I understand in theory what a PDF is; however, it wasn't obvious to me how to compute the normalization constant using a series of BQL expressions or other code so that the probability densities can be compared across columns.
I think this is a documentation issue that could affect many typical users. Some examples of solutions to this issue are:
- A link, reference, or brief explanation in the documentation of how to compute the normalization constant that is currently mentioned in the documentation (probably easiest/fastest).
- An example showing how to compute the constant through a sequence of BQL expressions, and therefore how to compare probability densities for variables in different columns. This could go into one of the tutorial notebooks if not the BQL or bayeslite function references directly.
- A feature that returns the normalization constant and/or normalized versions of PROBABILITY DENSITY and PREDICTIVE PROBABILITY.
For the moment, some basic information in the form of a reference or explanation about the normalization constant would be very helpful.