BQL function reference describes a normalization constant without info on how to compute it #76

@versar

The BQL documentation (http://probcomp.csail.mit.edu/dev/bayesdb/doc/bql.html) states the following in the section on PROBABILITY DENSITY OF (<targets>) [GIVEN (<constraints>)]:

WARNING: The value this function returns is not a normalized probability in [0, 1], but rather a probability density with a normalization constant that is common to the column but may vary between columns. So it may take on values above 1.
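
To illustrate why a density can exceed 1 and why its scale differs between columns, here is a minimal sketch in plain Python. It does not use BayesDB or bayeslite; the `normal_pdf` helper and the height figures are made up for illustration. It evaluates the same measurement in two units and shows that the density value changes by the unit conversion factor.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x (hypothetical helper)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# The same observation expressed in two units: 1.70 m and 170 cm.
# The means and standard deviations are illustrative, not real data.
density_m = normal_pdf(1.70, mean=1.70, std=0.10)      # metres
density_cm = normal_pdf(170.0, mean=170.0, std=10.0)   # centimetres

print(density_m)               # greater than 1
print(density_cm)              # much smaller than 1
print(density_m / density_cm)  # ratio equals the unit factor, 100
```

Because a density carries units of one over the column's units, raw density values from columns measured on different scales are not directly comparable.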

Presumably, this also applies to ESTIMATE PREDICTIVE PROBABILITY, which can also return values greater than one.

As a user, I find it hard to use the results of ESTIMATE PREDICTIVE PROBABILITY for the most typical use cases (e.g., ranking the most improbable data in a .csv file, including across multiple columns) without more clarity about how to compute the normalization constant in BQL. I understand in theory what a PDF is. However, it was not obvious to me how to compute the normalization constant, using a series of BQL expressions or other code, so that the probability densities can be compared across columns.

I think this is a documentation issue that could affect many typical users. Possible solutions include:

  1. A link, reference, or brief explanation in the documentation of how to compute the normalization constant that is currently mentioned in the documentation (probably easiest/fastest).
  2. An example showing how to compute the constant through a sequence of BQL expressions, and therefore how to compare probability densities for variables in different columns. This could go into one of the tutorial notebooks if not the BQL or bayeslite function references directly.
  3. A feature that returns the normalization constant and/or normalized versions of PROBABILITY DENSITY and PREDICTIVE PROBABILITY.
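
As a rough illustration of the kind of cross-column comparison I am after, the sketch below converts each cell's density into its rank among the densities of the same column. This is not the normalization constant the documentation refers to and not a BayesDB feature; it is one possible workaround, and it assumes the per-cell densities have already been retrieved by some query. The function name and the numbers are hypothetical.

```python
from bisect import bisect_right


def density_percentiles(densities):
    """For each density, return the fraction of values in the same
    column whose density is less than or equal to it."""
    ordered = sorted(densities)
    n = len(ordered)
    return [bisect_right(ordered, d) / n for d in densities]


# Hypothetical per-cell densities for two columns on very different scales.
height_densities = [3.9, 3.1, 0.2, 2.4]
income_densities = [4e-5, 1e-6, 3e-5, 2e-5]

height_rank = density_percentiles(height_densities)
income_rank = density_percentiles(income_densities)

# A low rank marks a cell that is improbable relative to its own column,
# so ranks from different columns share a common [0, 1] scale.
print(height_rank)
print(income_rank)
```

Whether a rank of this kind is a sound substitute for a properly normalized value is exactly the sort of guidance the documentation could provide.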

For the moment, some basic information in the form of a reference or explanation about the normalization constant would be very helpful.
