# Statement of Need

While there are many statistical and visualization libraries in Python (e.g., `pandas` [@pandas], `scipy` [@scipy], `scikit-learn` [@scikit-learn], `seaborn` [@seaborn]), they treat continuous data, categorical data, and visualization as separate concerns. Users therefore often resort to custom glue code to:

1. determine which columns are categorical and which are numeric,
2. choose an appropriate association statistic for each pair of variable types (e.g., Pearson's R for numeric–numeric, the correlation ratio for numeric–categorical, and Cramér's V or Theil's U for categorical–categorical),
3. compute those statistics for every pair of columns,
4. assemble the results into a matrix or graph, and
5. annotate, visualize, and interpret the results.
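To make the burden concrete, the following is a minimal, dependency-free sketch of the glue code behind steps 1 to 4. It is an illustration under stated assumptions, not `dython`'s implementation: a plain dict of lists stands in for a pandas DataFrame, columns are treated as categorical whenever they contain non-numeric values, Cramér's V is computed without bias correction, and Theil's U is omitted. All names (`data`, `is_categorical`, `pearson`, `correlation_ratio`, `cramers_v`, `association`) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy dataset: column name -> list of values,
# standing in for a pandas DataFrame.
data = {
    "height": [150.0, 160.0, 170.0, 180.0, 190.0, 175.0],
    "weight": [50.0, 58.0, 66.0, 77.0, 88.0, 70.0],
    "sex": ["f", "f", "m", "m", "m", "f"],
    "shirt": ["S", "M", "M", "L", "L", "M"],
}

def is_categorical(values):
    # Step 1 (assumed heuristic): any non-numeric value means categorical.
    return not all(isinstance(v, (int, float)) for v in values)

def pearson(x, y):
    # Pearson's correlation coefficient for two numeric columns.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_ratio(categories, measurements):
    # eta = sqrt(between-group sum of squares / total sum of squares).
    groups = defaultdict(list)
    for c, m in zip(categories, measurements):
        groups[c].append(m)
    mean = sum(measurements) / len(measurements)
    ss_between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    ss_total = sum((m - mean) ** 2 for m in measurements)
    return math.sqrt(ss_between / ss_total) if ss_total else 0.0

def cramers_v(x, y):
    # Chi-squared statistic from the contingency table, no bias correction.
    n = len(x)
    joint = Counter(zip(x, y))
    cx, cy = Counter(x), Counter(y)
    chi2 = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(cx), len(cy)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

def association(x, y):
    # Step 2: pick the statistic based on the two column types.
    cat_x, cat_y = is_categorical(x), is_categorical(y)
    if not cat_x and not cat_y:
        return pearson(x, y)
    if cat_x and cat_y:
        return cramers_v(x, y)
    return correlation_ratio(x, y) if cat_x else correlation_ratio(y, x)

# Steps 3-4: compute every pair and assemble a symmetric matrix.
cols = list(data)
matrix = {(c, c): 1.0 for c in cols}
for a, b in combinations(cols, 2):
    matrix[(a, b)] = matrix[(b, a)] = association(data[a], data[b])

for a in cols:
    print(a.ljust(8), " ".join(f"{matrix[(a, b)]:5.2f}" for b in cols))
```

Every one of these steps embeds a choice (the type heuristic, whether to bias-correct, which categorical measure to use) that has to be made, and kept consistent, by hand in each project.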
This fragmentation leads to boilerplate code, inconsistent choices of statistics, and a higher risk of mistakes, especially in exploratory analysis and automated pipelines.

**`dython`** addresses this gap by providing a unified, high-level API that:

This computes pairwise associations across all columns in a pandas DataFrame `df`. Internally, for each pair of columns, it selects a measure appropriate to the two variables' types:

- continuous–continuous → Pearson correlation (or Spearman, if configured)
- continuous–categorical → correlation ratio
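For reference, the standard textbook definition of the correlation ratio for a continuous variable $y$ grouped by the categories $x$ of a categorical variable is given below; whether `dython` applies any additional adjustment is not stated here. Here $n_x$ is the number of observations in category $x$, $y_{xi}$ is the $i$-th value of $y$ in that category, $\bar{y}_x$ is the mean of $y$ within category $x$, and $\bar{y}$ is the overall mean of $y$.

$$
\eta = \sqrt{\frac{\sum_{x} n_x \left(\bar{y}_x - \bar{y}\right)^2}{\sum_{x} \sum_{i} \left(y_{xi} - \bar{y}\right)^2}}
$$

$\eta$ ranges from 0, when all category means coincide with the overall mean, to 1, when $y$ is fully determined by the category. Unlike Pearson's correlation, it is not symmetric in its arguments: the categorical variable acts as the grouping and the continuous variable as the response.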
Below is a summary of existing methods of `dython`, per module.