Commit a0218dd

Merge pull request #177 from danielskatz/patch-1
minor changes for JOSS publication
2 parents 6eb704d + fc0bc4a commit a0218dd

2 files changed: +37 -37 lines changed

paper.bib

Lines changed: 8 additions & 8 deletions
@@ -22,8 +22,8 @@ @Article{ numpy
 }
 
 @software{pandas,
-author = {{The pandas development team}},
-title = {pandas-dev/pandas: Pandas},
+author = {Pandas development team},
+title = {pandas-dev/pandas: {P}andas},
 month = sep,
 year = 2025,
 publisher = {Zenodo},
@@ -52,8 +52,8 @@ @article{scipy
 Harris, Charles R. and Archibald, Anne M. and
 Ribeiro, Ant{\^o}nio H. and Pedregosa, Fabian and
 {van Mulbregt}, Paul and {SciPy 1.0 Contributors}},
-title = {{{SciPy} 1.0: Fundamental Algorithms for Scientific
-Computing in Python}},
+title = {{SciPy} 1.0: Fundamental Algorithms for Scientific
+Computing in {P}ython},
 journal = {Nature Methods},
 year = {2020},
 volume = {17},
@@ -103,8 +103,8 @@ @article{seaborn
 }
 
 @InProceedings{ statsmodels,
-author = { {S}kipper {S}eabold and {J}osef {P}erktold },
-title = { {S}tatsmodels: {E}conometric and {S}tatistical {M}odeling with {P}ython },
+author = { Skipper Seabold and Josef Perktold },
+title = { Statsmodels: Econometric and Statistical Modeling with {P}ython },
 booktitle = { {P}roceedings of the 9th {P}ython in {S}cience {C}onference },
 pages = { 92 - 96 },
 year = { 2010 },
@@ -113,7 +113,7 @@ @InProceedings{ statsmodels
 }
 
 @article{pingouin,
-title = {Pingouin: statistics in Python},
+title = {Pingouin: statistics in {P}ython},
 volume = {3},
 DOI = {10.21105/joss.01026},
 number = {31},
@@ -136,4 +136,4 @@ @inproceedings{tabddpm
 pages = {17708--17728},
 publisher = {PMLR},
 url = {https://proceedings.mlr.press/v202/kotelnikov23a.html}
-}
+}

paper.md

Lines changed: 29 additions & 29 deletions
@@ -31,25 +31,25 @@ In short, **`dython`** lowers the friction for inter-variable association analys
 
 # Statement of Need
 
-While there are many statistical and visualization libraries in Python (e.g. `pandas` [@pandas], `scipy` [@scipy], `scikit-learn` [@scikit-learn], `seaborn` [@seaborn]), they treat continuous data, categorical data and the overall visualization separately. Users often resort to custom glue code to:
+While there are many statistical and visualization libraries in Python (e.g., `pandas` [@pandas], `scipy` [@scipy], `scikit-learn` [@scikit-learn], `seaborn` [@seaborn]), they treat continuous data, categorical data, and the overall visualization separately. Users often resort to custom glue code to:
 
 1. determine which columns are categorical vs. numeric,
-2. choose an appropriate association statistic (e.g. Pearson's R for numeric–numeric, correlation ratio for numeric–categorical, Cramér’s V or Theil’s U for categorical–categorical),
+2. choose an appropriate association statistic (e.g., Pearson's R for numeric–numeric, correlation ratio for numeric–categorical, Cramér’s V or Theil’s U for categorical–categorical),
 3. compute those pairwise,
-4. assemble a matrix or graph,
+4. assemble a matrix or graph, and
 5. annotate, visualize, and interpret the results.
 
 This fragmentation results in boilerplate, inconsistency, or risk of mistakes, especially in exploratory settings or pipelines.
 
 **`dython`** addresses this gap by providing a unified, high-level API that:
 
-- **infers variable types**
+- **infers variable types**,
 
-- **automatically selects appropriate measures**
+- **automatically selects appropriate measures**,
 
-- **returns structured and annotated output**
+- **returns structured and annotated output**,
 
-- **offers visualization** (heatmaps, annotation) integrations
+- **offers visualization** (heatmaps, annotation) integrations, and
 
 - **offers model evaluation tools** (ROC, AUC, thresholding) for classification tasks
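Note for readers of this diff: steps 1 to 4 of the glue code listed in the Statement of Need hunk above can be sketched in a few lines of pandas. This is an illustration written from the textbook definitions, not `dython`'s implementation. The helper names `eta_ratio`, `cramers_v_plain`, and `mixed_associations` are hypothetical, column types are inferred from dtypes only (an assumption), and Cramér's V is computed without bias correction.

```python
import numpy as np
import pandas as pd


def eta_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Correlation ratio: sqrt(between-group sum of squares / total sum of squares)."""
    values = values.astype(float)
    grand_mean = values.mean()
    total = ((values - grand_mean) ** 2).sum()
    if total == 0:
        return 0.0  # constant column: no variance to explain
    groups = values.groupby(categories)
    between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    return float(np.sqrt(between / total))


def cramers_v_plain(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V from the chi-squared statistic of the contingency table."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    if min_dim == 0:
        return 0.0  # one variable has a single level
    return float(np.sqrt(chi2 / (n * min_dim)))


def mixed_associations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise association matrix, choosing the statistic per column-type pair."""
    # Step 1: split columns by dtype (assumption: non-numeric means categorical).
    numeric = set(df.select_dtypes(include="number").columns)
    cols = list(df.columns)
    # Step 4: the result is a square matrix with ones on the diagonal.
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # Steps 2 and 3: pick the statistic for this pair and compute it.
            if a in numeric and b in numeric:
                score = df[a].corr(df[b])  # Pearson's R
            elif a in numeric:
                score = eta_ratio(df[b], df[a])
            elif b in numeric:
                score = eta_ratio(df[a], df[b])
            else:
                score = cramers_v_plain(df[a], df[b])
            out.loc[a, b] = out.loc[b, a] = score
    return out
```

All three statistics used here are symmetric, so filling both triangles with the same value is valid. That would not hold for Theil's U, which is directional.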

@@ -62,47 +62,47 @@ Below is a summary of existing methods of `dython`, per module.
 
 | Method | Description |
 |--------|-------------|
-| associations | Computes associations between mixed-type features. |
-| cluster_correlations | Applies clustering to reorder a correlation matrix. |
-| compute_associations | Deprecated; replaced by `associations(compute_only=True)`. |
-| conditional_entropy | Computes conditional entropy of X given Y. |
-| correlation_ratio | Computes correlation between categorical and numeric variables. |
-| cramers_v | Computes Cramér’s V between categorical variables. |
-| identify_nominal_columns | Detects nominal (categorical) columns. |
-| identify_numeric_columns | Detects numeric columns. |
-| numerical_encoding | Encodes a mixed dataset into numeric format. |
-| replot_last_associations | Replots the last association heatmap. |
-| theils_u | Computes Theil’s U (uncertainty coefficient). |
+| associations | Computes associations between mixed-type features |
+| cluster_correlations | Applies clustering to reorder a correlation matrix |
+| compute_associations | Deprecated; replaced by `associations(compute_only=True)` |
+| conditional_entropy | Computes conditional entropy of X given Y |
+| correlation_ratio | Computes correlation between categorical and numeric variables |
+| cramers_v | Computes Cramér’s V between categorical variables |
+| identify_nominal_columns | Detects nominal (categorical) columns |
+| identify_numeric_columns | Detects numeric columns |
+| numerical_encoding | Encodes a mixed dataset into numeric format |
+| replot_last_associations | Replots the last association heatmap |
+| theils_u | Computes Theil’s U (uncertainty coefficient) |
 
 ## `model_utils`
 
 | Method | Description |
 |--------|-------------|
-| ks_abc | Computes KS statistic, ABC, and optional plot. |
-| metric_graph | Plots ROC/PR curves for classifiers. |
-| random_forest_feature_importance | Plots feature importance for Random Forest models. |
+| ks_abc | Computes KS statistic, ABC, and optional plot |
+| metric_graph | Plots ROC/PR curves for classifiers |
+| random_forest_feature_importance | Plots feature importance for Random Forest models |
 
 ## `sampling`
 
 | Method | Description |
 |--------|-------------|
-| boltzmann_sampling | Samples values under Boltzmann distribution. |
-| weighted_sampling | Samples values using weighted probabilities. |
+| boltzmann_sampling | Samples values under Boltzmann distribution |
+| weighted_sampling | Samples values using weighted probabilities |
 
 ## `data_utils`
 
 | Method | Description |
 |-------|-------------|
-| identify_columns_with_na | Returns dataset columns containing NA values. |
-| identify_columns_by_type | Identifies columns of requested data types. |
-| one_hot_encode | Converts a 1D array of integers into a one-hot matrix. |
-| split_hist | Plots a histogram split by categories. |
+| identify_columns_with_na | Returns dataset columns containing NA values |
+| identify_columns_by_type | Identifies columns of requested data types |
+| one_hot_encode | Converts a 1D array of integers into a one-hot matrix |
+| split_hist | Plots a histogram split by categories |
 
 ## Code Examples
 ### Associations
 
 * `dython.nominal.associations(df, theil_u=False, plot=False, return_results=False, **kwargs)`
-Computes pairwise associations across all columns in a pandas DataFrame `df`. Internally, for each pair, it selects a measure appropriate to the variable types:
+This computes pairwise associations across all columns in a pandas DataFrame `df`. Internally, for each pair, it selects a measure appropriate to the variable types:
 
 - continuous–continuous → Pearson correlation (or Spearman, if configured)
 - continuous–categorical → correlation ratio
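Note for readers of this diff: the `conditional_entropy` and `theils_u` rows in the table above name standard information-theoretic quantities. The sketch below computes them from empirical frequencies using only the standard library. It follows the textbook definitions and is not taken from `dython`; the function names are hypothetical, and returning 1.0 when H(X) is zero is a convention assumed here.

```python
import math
from collections import Counter


def entropy_of(x) -> float:
    """Shannon entropy H(X) in nats, from empirical frequencies."""
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in Counter(x).values())


def conditional_entropy_of(x, y) -> float:
    """H(X|Y) = -sum over (x, y) of p(x, y) * log(p(x, y) / p(y))."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    return -sum(
        (c / n) * math.log(c / y_counts[y_value])
        for (_, y_value), c in xy_counts.items()
    )


def uncertainty_coefficient(x, y) -> float:
    """Theil's U(X|Y) = (H(X) - H(X|Y)) / H(X): the share of X's entropy explained by Y."""
    h_x = entropy_of(x)
    if h_x == 0:
        return 1.0  # assumed convention: a constant X is fully "explained"
    return (h_x - conditional_entropy_of(x, y)) / h_x


# Theil's U is asymmetric: here x fully determines y, but y only partly determines x.
x = ["a", "a", "b", "c"]
y = ["p", "p", "q", "q"]
u_x_given_y = uncertainty_coefficient(x, y)
u_y_given_x = uncertainty_coefficient(y, x)
```

The asymmetry is the practical difference from Cramér's V: a matrix of Theil's U values is generally not symmetric, so both directions of each pair have to be computed.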
@@ -130,7 +130,7 @@ Below is a summary of existing methods of `dython`, per module.
 ```
 
 * `dython.model_utils.ks_abc(y_true, y_pred, **kwargs)`
-Perform the Kolmogorov–Smirnov test over the positive and negative distributions of a binary classifier, and compute the area between curves.
+This performs the Kolmogorov–Smirnov test over the positive and negative distributions of a binary classifier, and compute the area between curves.
 
 Example:
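Note for readers of this diff: the paper's own example lies outside this hunk. As an independent illustration of the quantities the changed sentence describes, the sketch below computes the two-sample Kolmogorov–Smirnov statistic between the score distributions of the positive and negative classes, plus the area between their empirical CDFs. The function name `ks_and_area` is hypothetical, and the area definition used here is an assumption that may differ from what `ks_abc` reports.

```python
import numpy as np


def ks_and_area(y_true, y_score):
    """KS statistic and area between the empirical CDFs of positive and negative scores."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = np.sort(y_score[y_true])    # scores of the positive class
    neg = np.sort(y_score[~y_true])   # scores of the negative class

    # Both ECDFs are step functions that only change at observed scores,
    # so evaluating them at the sorted pooled scores captures every step.
    points = np.sort(y_score)
    cdf_pos = np.searchsorted(pos, points, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, points, side="right") / neg.size
    gap = np.abs(cdf_pos - cdf_neg)

    ks = float(gap.max())
    # The gap is constant between consecutive points, so this sum is exact.
    area = float(np.sum(gap[:-1] * np.diff(points)))
    return ks, area


# A perfectly separating classifier: the class distributions do not overlap.
ks, area = ks_and_area([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

A KS statistic of 1 means there is a threshold that separates the two classes completely; values near 0 mean the classifier's scores are distributed almost identically for both classes.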
