Confidence Intervals for Difference of Binomial Proportions

Computation of confidence intervals for binomial proportions and for difference of binomial proportions.

[GitHub Pages] [Read the Docs]

🚀 NEW 🚀 Streamlit support! See here for an app deployed on Streamlit Community Cloud.

Installation

Run

python -m pip install diff-binom-confint

or install the latest version from GitHub using

python -m pip install git+https://github.com/DeepPSP/DBCI.git

or git clone this repository and install locally via

cd DBCI
python -m pip install .

Numba accelerated version

Install using

python -m pip install "diff-binom-confint[acc]"

Usage examples

from diff_binom_confint import compute_difference_confidence_interval

n_positive, n_total = 84, 101
ref_positive, ref_total = 89, 105

confint = compute_difference_confidence_interval(
    n_positive,
    n_total,
    ref_positive,
    ref_total,
    conf_level=0.95,
    method="wilson",
)
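To give a feel for the kind of computation behind such a call, here is a minimal standard-library sketch of Newcombe's hybrid score interval, which combines the Wilson intervals of the two proportions. It is an illustration only: the function names wilson_interval and newcombe_wilson_difference are hypothetical, and it is an assumption (not confirmed here) that the package's "wilson" method for differences follows this construction.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(n_positive, n_total, conf_level=0.95):
    """Wilson score interval for a single binomial proportion (illustrative)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p = n_positive / n_total
    denom = 1 + z**2 / n_total
    center = (p + z**2 / (2 * n_total)) / denom
    half_width = z * sqrt(p * (1 - p) / n_total + z**2 / (4 * n_total**2)) / denom
    return center - half_width, center + half_width


def newcombe_wilson_difference(
    n_positive, n_total, ref_positive, ref_total, conf_level=0.95
):
    """Newcombe's hybrid score interval for p - p_ref (hypothetical helper)."""
    p, p_ref = n_positive / n_total, ref_positive / ref_total
    low, up = wilson_interval(n_positive, n_total, conf_level)
    ref_low, ref_up = wilson_interval(ref_positive, ref_total, conf_level)
    delta = p - p_ref
    # Combine the distances from each estimate to its own Wilson bounds.
    lower = delta - sqrt((p - low) ** 2 + (ref_up - p_ref) ** 2)
    upper = delta + sqrt((up - p) ** 2 + (p_ref - ref_low) ** 2)
    return lower, upper


# Same counts as in the usage example above.
lower, upper = newcombe_wilson_difference(84, 101, 89, 105)
print(f"difference interval: ({lower:.4f}, {upper:.4f})")
```

The sketch does no input validation and no clipping, so it should be read as a way to understand the formula rather than as a replacement for the package function.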

Implemented methods

Confidence intervals for binomial proportions

Method (type) Implemented
wilson ✔️
wilson-cc ✔️
wald ✔️
wald-cc ✔️
agresti-coull ✔️
jeffreys ✔️
clopper-pearson ✔️
arcsine ✔️
logit ✔️
pratt ✔️
witting ✔️
mid-p ✔️
lik ✔️
blaker ✔️
modified-wilson ✔️
modified-jeffreys ✔️
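
As an illustration of how two of the simpler methods in this list differ, the following standard-library sketch computes the textbook Wald and Agresti-Coull intervals. The function names are hypothetical and the code is not taken from the package; in particular, whether the package clips bounds to [0, 1] is not assumed here.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(n_positive, n_total, conf_level=0.95):
    """Textbook Wald interval: p +/- z * sqrt(p * (1 - p) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p = n_positive / n_total
    half_width = z * sqrt(p * (1 - p) / n_total)
    return p - half_width, p + half_width


def agresti_coull_interval(n_positive, n_total, conf_level=0.95):
    """Textbook Agresti-Coull interval: Wald form around an adjusted estimate."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n_adj = n_total + z**2
    p_adj = (n_positive + z**2 / 2) / n_adj
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width


# With zero successes the Wald interval collapses to a single point,
# while the Agresti-Coull interval keeps a positive width.
print(wald_interval(0, 10))
print(agresti_coull_interval(0, 10))
```

Neither function clips its result, so the Agresti-Coull lower bound can fall below 0 for extreme counts.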

Confidence intervals for difference of binomial proportions

Method (type) Implemented
wilson ✔️
wilson-cc ✔️
wald ✔️
wald-cc ✔️
haldane ✔️
jeffreys-perks ✔️
mee ✔️
miettinen-nurminen ✔️
true-profile ✔️
hauck-anderson ✔️
agresti-caffo ✔️
carlin-louis ✔️
brown-li ✔️
brown-li-jeffrey ✔️
miettinen-nurminen-brown-li ✔️
exact ❌
mid-p ❌
santner-snell ❌
chan-zhang ❌
agresti-min ❌
wang ✔️
pradhan-banerjee ❌
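
For readers unfamiliar with these methods, here is a minimal standard-library sketch of one of the simplest, the textbook Agresti-Caffo interval, which adds one success and one failure to each group before applying the Wald formula. The function name agresti_caffo_difference is hypothetical, and the package's own implementation may differ in details such as clipping.

```python
from math import sqrt
from statistics import NormalDist


def agresti_caffo_difference(
    n_positive, n_total, ref_positive, ref_total, conf_level=0.95
):
    """Textbook Agresti-Caffo interval for p - p_ref (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Add one pseudo-success and one pseudo-failure to each group.
    n_adj, ref_adj = n_total + 2, ref_total + 2
    p_adj = (n_positive + 1) / n_adj
    p_ref_adj = (ref_positive + 1) / ref_adj
    delta = p_adj - p_ref_adj
    half_width = z * sqrt(
        p_adj * (1 - p_adj) / n_adj + p_ref_adj * (1 - p_ref_adj) / ref_adj
    )
    return delta - half_width, delta + half_width


ac_lower, ac_upper = agresti_caffo_difference(84, 101, 89, 105)
print(f"agresti-caffo sketch: ({ac_lower:.4f}, {ac_upper:.4f})")
```

Because of the pseudo-counts, the interval keeps a positive width even when both groups have zero successes, which the plain Wald interval does not.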

Creating report

Use the make_risk_report function to create a report of confidence intervals for differences of binomial proportions.

from diff_binom_confint import make_risk_report

# df_train and df_test are pandas.DataFrame objects providing the data
table = make_risk_report((df_train, df_test), target="binary_target")
# or, if df_data is a single pandas.DataFrame containing both training and testing data
table = make_risk_report(df_data, target="binary_target")

For more details, see the corresponding documentation. The produced table is similar to the following:

[Image: example risk report]

References

  1. SAS
  2. PASS
  3. statsmodels.stats.proportion
  4. scipy.stats._binomtest
  5. corplingstats
  6. DescTools.StatsAndCIs
  7. Newcombe

NOTE

Reference 1 contains errors in its descriptions of the Wilson CC, Mee, and Miettinen-Nurminen methods. The correct computation of Wilson CC is given in Reference 5. The correct computations of Mee and Miettinen-Nurminen are given in the code blocks in Reference 1.

Test data

Test data are

  1. taken (with slight modifications, e.g. the upper_bound of the miettinen-nurminen-brown-li method in the edge-case file) from Reference 1, for automatically testing the correctness of the implemented algorithms.

  2. generated using DescTools.StatsAndCIs via

    library("DescTools")
    library("data.table")
    
    results = data.table()
    for (m in c("wilson", "wald", "waldcc", "agresti-coull", "jeffreys",
                    "modified wilson", "wilsoncc", "modified jeffreys",
                    "clopper-pearson", "arcsine", "logit", "witting", "pratt",
                    "midp", "lik", "blaker")){
        ci = BinomCI(84,101,method = m)
        new_row = data.table("method" = m, "ratio"=ci[1], "lower_bound" = ci[2], "upper_bound" = ci[3])
        results = rbindlist(list(results, new_row))
    }
    fwrite(results, "./test/test-data/example-84-101.csv")  # with manual slight adjustment of method names
  3. taken from Reference 7 (Table II).

The filenames have the following patterns:

# for computing confidence interval for difference of binomial proportions
"example-(?P<n_positive>[\\d]+)-(?P<n_total>[\\d]+)-vs-(?P<ref_positive>[\\d]+)-(?P<ref_total>[\\d]+)\\.csv"

# for computing confidence interval for binomial proportions
"example-(?P<n_positive>[\\d]+)-(?P<n_total>[\\d]+)\\.csv"
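
A short sketch of how these patterns could be used to recover the counts from a filename is given below. The patterns are the ones above, written as raw strings; the helper parse_counts and the two-sample filename in the example are hypothetical (only example-84-101.csv is mentioned in this README).

```python
import re

# Pattern for files on differences of binomial proportions.
DIFF_PATTERN = re.compile(
    r"example-(?P<n_positive>[\d]+)-(?P<n_total>[\d]+)"
    r"-vs-(?P<ref_positive>[\d]+)-(?P<ref_total>[\d]+)\.csv"
)
# Pattern for files on single binomial proportions.
SINGLE_PATTERN = re.compile(
    r"example-(?P<n_positive>[\d]+)-(?P<n_total>[\d]+)\.csv"
)


def parse_counts(filename):
    """Return the counts encoded in a test-data filename as a dict of ints."""
    for pattern in (DIFF_PATTERN, SINGLE_PATTERN):
        match = pattern.fullmatch(filename)
        if match is not None:
            return {key: int(value) for key, value in match.groupdict().items()}
    raise ValueError(f"unrecognized test-data filename: {filename!r}")


print(parse_counts("example-84-101.csv"))
# Hypothetical two-sample filename built from the counts in the usage example.
print(parse_counts("example-84-101-vs-89-105.csv"))
```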

Note that the out-of-range values (e.g. > 1) are left as empty values in the .csv files.
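
When loading such files, the empty cells need to be handled explicitly. The sketch below reads a small in-memory CSV and maps empty bounds to None. The column names follow the R snippet above (method, ratio, lower_bound, upper_bound); the rows are made-up placeholders rather than real test data, and the helper names are hypothetical.

```python
import csv
import io

# Made-up placeholder rows; the second one has an empty upper_bound.
SAMPLE = """method,ratio,lower_bound,upper_bound
method-a,0.5,0.25,0.75
method-b,0.5,0.25,
"""


def to_float_or_none(cell):
    """Convert a CSV cell to float, treating an empty cell as missing."""
    return float(cell) if cell.strip() else None


def read_intervals(handle):
    """Read rows of (method, lower_bound, upper_bound) from an open CSV file."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "method": row["method"],
                "lower_bound": to_float_or_none(row["lower_bound"]),
                "upper_bound": to_float_or_none(row["upper_bound"]),
            }
        )
    return rows


rows = read_intervals(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

A test comparing computed bounds against such a file would then skip, or treat specially, any bound that was read as None.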

Known Issues

  1. Results for edge cases are incorrect for the true-profile method.
