Commit 111f32f

Added modules tables
1 parent 6892e0a commit 111f32f

1 file changed

+64
-24
lines changed

paper.md

Lines changed: 64 additions & 24 deletions
Original file line number | Diff line number | Diff line change
@@ -56,10 +56,50 @@ This fragmentation results in boilerplate, inconsistency, or mistake-risk, espec
5656
Therefore, **`dython`** helps data scientists, statisticians, and researchers spend less time writing glue code and more time focusing on insights.
5757

5858
# Functionality
59-
60-
Below is a non-exhaustive overview of core modules and features. For full API and examples, see the documentation.
61-
62-
## Associations
59+
Below is a summary of the methods `dython` currently provides, grouped by module.
60+
61+
## `nominal`
62+
63+
| Method | Description |
64+
|--------|-------------|
65+
| associations | Computes associations between mixed-type features. |
66+
| cluster_correlations | Applies clustering to reorder a correlation matrix. |
67+
| compute_associations | Deprecated; use `associations(compute_only=True)` instead. |
68+
| conditional_entropy | Computes conditional entropy of X given Y. |
69+
| correlation_ratio | Computes the correlation ratio between a categorical and a numeric variable. |
70+
| cramers_v | Computes Cramér’s V between two categorical variables. |
71+
| identify_nominal_columns | Detects nominal (categorical) columns. |
72+
| identify_numeric_columns | Detects numeric columns. |
73+
| numerical_encoding | Encodes a mixed dataset into numeric format. |
74+
| replot_last_associations | Re-plots the last association heatmap. |
75+
| theils_u | Computes Theil’s U (uncertainty coefficient). |
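
The three categorical measures listed above can be illustrated with a short, standard-library-only sketch. This is not `dython`'s code: the function bodies follow the textbook definitions, and details such as bias correction or the handling of missing values in the actual library may differ. The toy `colour` and `shade` lists are made up for illustration.

```python
import math
from collections import Counter


def conditional_entropy(x, y):
    """H(X | Y) in nats for two equal-length sequences of labels."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    entropy = 0.0
    for (_, y_val), joint in xy_counts.items():
        p_xy = joint / n
        p_y = y_counts[y_val] / n
        entropy += p_xy * math.log(p_y / p_xy)
    return entropy


def theils_u(x, y):
    """U(X | Y): share of the entropy of X that is explained by Y."""
    n = len(x)
    h_x = -sum((c / n) * math.log(c / n) for c in Counter(x).values())
    if h_x == 0:
        return 1.0  # X is constant, so nothing is left to explain
    return (h_x - conditional_entropy(x, y)) / h_x


def cramers_v(x, y):
    """Cramér's V from the chi-squared statistic (no bias correction)."""
    n = len(x)
    x_counts, y_counts = Counter(x), Counter(y)
    xy_counts = Counter(zip(x, y))
    chi2 = 0.0
    for x_val, x_n in x_counts.items():
        for y_val, y_n in y_counts.items():
            expected = x_n * y_n / n
            observed = xy_counts.get((x_val, y_val), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: colour fully determines shade, but not the other way round.
colour = ["red", "red", "blue", "blue", "green", "green"]
shade = ["warm", "warm", "cold", "cold", "cold", "cold"]

u_shade_given_colour = theils_u(shade, colour)
u_colour_given_shade = theils_u(colour, shade)
v = cramers_v(colour, shade)
```

Theil's U is asymmetric, so swapping the arguments gives a different value, whereas Cramér's V is symmetric and cannot show which variable determines the other.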
76+
77+
## `model_utils`
78+
79+
| Method | Description |
80+
|--------|-------------|
81+
| ks_abc | Computes KS statistic, ABC, and optional plot. |
82+
| metric_graph | Plots ROC/PR curves for classifiers. |
83+
| random_forest_feature_importance | Plots feature importance for Random Forest models. |
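
As a rough illustration of the quantities behind ROC plots and the KS statistic, the sketch below computes ROC points, the area under the curve, and the largest gap between true-positive and false-positive rates using only the standard library. It assumes binary labels encoded as 0 and 1 with both classes present. The helper names (`roc_points`, `auc`, `ks_statistic`) are hypothetical and are not part of `dython`, whose own functions also handle plotting and multiclass input.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) pairs obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit samples from the highest score to the lowest.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        is_last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def ks_statistic(points):
    """Largest absolute gap between TPR and FPR over all thresholds."""
    return max(abs(tpr - fpr) for fpr, tpr in points)


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_score)
roc_auc = auc(points)
ks = ks_statistic(points)
```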
84+
85+
## `sampling`
86+
87+
| Method | Description |
88+
|--------|-------------|
89+
| boltzmann_sampling | Samples values under Boltzmann distribution. |
90+
| weighted_sampling | Samples values using weighted probabilities. |
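
To make the idea of sampling under a Boltzmann distribution concrete, here is a minimal sketch using only the standard library. It assumes the common softmax convention in which larger values are more likely and probabilities are proportional to `exp(value / temperature)`. The function names and parameters are hypothetical and may not match the signatures in `dython.sampling`.

```python
import math
import random


def boltzmann_probabilities(values, temperature=1.0):
    """Softmax of values / temperature, computed in a numerically stable way."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_sample(values, temperature=1.0, k=1, rng=random):
    """Draw k values (with replacement) under the Boltzmann probabilities."""
    probs = boltzmann_probabilities(values, temperature)
    return rng.choices(values, weights=probs, k=k)


scores = [1.0, 2.0, 3.0]

# A low temperature concentrates the mass on the largest value,
# a high temperature approaches uniform sampling.
cold = boltzmann_probabilities(scores, temperature=0.1)
hot = boltzmann_probabilities(scores, temperature=100.0)

picks = boltzmann_sample(scores, temperature=1.0, k=5, rng=random.Random(0))
```

Setting every weight directly, instead of deriving it from a temperature, reduces this to plain weighted sampling.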
91+
92+
## `data_utils`
93+
94+
| Method | Description |
95+
|-------|-------------|
96+
| identify_columns_with_na | Returns dataset columns containing NA values. |
97+
| identify_columns_by_type | Identifies columns of requested data types. |
98+
| one_hot_encode | Converts a 1D array of integers into a one-hot matrix. |
99+
| split_hist | Plots a histogram split by categories. |
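
The one-hot conversion described in the table can be sketched in a few lines of plain Python. The name `one_hot_sketch` and its `num_classes` parameter are hypothetical, chosen to avoid implying that this is the library's implementation, which may return a different array type.

```python
def one_hot_sketch(labels, num_classes=None):
    """Turn a 1D sequence of non-negative integers into a one-hot matrix."""
    if any(not isinstance(v, int) or v < 0 for v in labels):
        raise ValueError("labels must be non-negative integers")
    if num_classes is None:
        # Infer the width from the largest label seen.
        num_classes = max(labels) + 1
    matrix = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        matrix.append(row)
    return matrix


encoded = one_hot_sketch([0, 2, 1])
padded = one_hot_sketch([1, 0], num_classes=4)
```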
100+
101+
## Code Examples
102+
### Associations
63103

64104
* `dython.nominal.associations(df, theil_u=False, plot=False, return_results=False, **kwargs)`
65105
Computes pairwise associations across all columns in a pandas DataFrame `df`. Internally, for each pair, it selects a measure appropriate to the variable types:
@@ -77,7 +117,7 @@ Below is a non-exhaustive overview of core modules and features. For full API an
77117
assoc_df = associations(my_df, theil_u=True, plot=True)
78118
```
79119

80-
## Model evaluation
120+
### Model evaluation
81121

82122
* `dython.model_utils.metric_graph(y_true, y_pred, metric='roc', **kwargs)`
83123
This utility helps visualize classification performance. Given an array of true labels `y_true` and predicted scores `y_pred`, it can plot ROC curves, compute the AUC for each class (in multiclass settings), and show threshold recommendations.
@@ -126,7 +166,25 @@ Dependencies include standard scientific Python packages such as
126166
`matplotlib` [@matplotlib],
127167
and `seaborn` [@seaborn].
128168

129-
# Example workflow
169+
# Usage Mention
170+
171+
Throughout its lifetime, `dython` has been used in many projects, including:
172+
173+
* [Official implementation](https://github.com/yandex-research/tab-ddpm) of TabDDPM [@tabddpm] by [Yandex Research](https://research.yandex.com)
174+
175+
* [`gretel-synthetics`](https://github.com/gretelai/gretel-synthetics/tree/a0e712852f74c238ce848456952e90a141e8da2a) by [Gretel.ai](https://gretel.ai)
176+
177+
* [`torchmetrics`](https://github.com/Lightning-AI/torchmetrics?tab=readme-ov-file) by [Lightning AI](https://lightning.ai)
178+
179+
* [`ydata-quality`](https://github.com/ydataai/ydata-quality) by [YData](https://ydata.ai)
180+
181+
182+
# Acknowledgements
183+
184+
The author thanks users and contributors who have filed issues, submitted pull requests, or suggested enhancements, as well as the authors and
185+
communities of the foundational packages on which `dython` builds.
186+
187+
# Appendix: Code Examples
130188
**A minimal example using `associations`**:
131189

132190
```python
@@ -196,22 +254,4 @@ This would output:
196254
![Example of a ROC graph plotted over the Iris Dataset](roc.png)
197255

198256

199-
# Usage Mention
200-
201-
Throughout its lifetime, `dython` has been used in many projects, including:
202-
203-
* [Official implementation](https://github.com/yandex-research/tab-ddpm) of TabDDPM [@tabddpm] by [Yandex Research](https://research.yandex.com)
204-
205-
* [`gretel-synthetics`](https://github.com/gretelai/gretel-synthetics/tree/a0e712852f74c238ce848456952e90a141e8da2a) by [Gretel.ai](https://gretel.ai)
206-
207-
* [`torchmetrics`](https://github.com/Lightning-AI/torchmetrics?tab=readme-ov-file) by [Lightning AI](https://lightning.ai)
208-
209-
* [`ydata-quality`](https://github.com/ydataai/ydata-quality) by [YData](https://ydata.ai)
210-
211-
212-
# Acknowledgements
213-
214-
The author thanks users and contributors who have filed issues, submitted pull requests, or suggested enhancements, as well as the authors and
215-
communities of the foundational packages on which `dython` builds.
216-
217257
# References
