@Subat-01 Subat-01 commented Oct 17, 2025

Background

Table.hist needed clearer documentation and safer behavior for the group argument.

Changes

Code

  • datascience/tables.py:hist
    • Validate group is a legal column (label or index); otherwise raise ValueError with a clear message.
    • Keep the existing constraints: group cannot be combined with bin_column, and grouping allows only one numeric data column.
    • When the grouped subset is empty, return an empty figure safely (avoid exceptions).
    • Expanded docstring with two examples:
      • t.hist('height', group='gender')
      • t.hist('height', group='gender', side_by_side=True)
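
The group validation described above could look roughly like the following. This is a minimal standalone sketch, not the code in datascience/tables.py: the helper name `resolve_group_label` and its signature are hypothetical, and it works on a plain list of labels instead of a Table. Rejecting negative indices is also an assumption of the sketch.

```python
def resolve_group_label(labels, group):
    """Return the column label that `group` refers to.

    Hypothetical helper for illustration only; `labels` is a plain list of
    column labels standing in for a table's labels.
    """
    labels = list(labels)
    # An integer is treated as a column index (bool is excluded on purpose).
    if isinstance(group, int) and not isinstance(group, bool):
        if 0 <= group < len(labels):
            return labels[group]
        raise ValueError(
            "group index {} is out of range for {} columns".format(
                group, len(labels)))
    # Anything else must match an existing label exactly.
    if group in labels:
        return group
    raise ValueError(
        "group {!r} is not a column label; available labels: {}".format(
            group, labels))
```

With labels `['height', 'gender']`, both `'gender'` and the index `1` resolve to `'gender'`, while `'age'` or the index `5` raises `ValueError`.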

Docs

  • New docs/hist_grouping.md:
    • Minimal example and explanation of constraints + empty-data behavior.
    • (If the project prefers RST/toctree only, I'm happy to convert to .rst and add it to the TOC.)

Tests

  • tests/test_hist_group.py
    • Valid grouping runs without error.
    • Non-existent group column raises ValueError.
    • Empty grouped subset renders safely (creates an empty plot).

(Optional) Small maps fixes

  • datascience/maps.py
    • Provide default attribution when a string tile style is used and no attr is supplied (Folium >= 0.20).
    • Mirror text_color -> textColor option for BeautifyIcon for backward compatibility.
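
As a rough illustration of the two maps tweaks, the sketch below shows the intended logic on plain Python values. It does not call Folium, and the function names, the `DEFAULT_ATTR` placeholder text, and the dict-based options are all assumptions made for this example, not the code in datascience/maps.py.

```python
DEFAULT_ATTR = "Map tiles"  # placeholder text, an assumption for this sketch


def fill_tile_attr(tiles, attr=None):
    """Return an attribution string, defaulting when none was supplied.

    Hypothetical helper: a default is used only when `tiles` is a string
    style and the caller passed no `attr`.
    """
    if isinstance(tiles, str) and attr is None:
        return DEFAULT_ATTR
    return attr


def mirror_text_color(options):
    """Copy `text_color` to `textColor` without overwriting an explicit value.

    Hypothetical helper: returns a new dict and leaves the input untouched.
    """
    opts = dict(options)
    if "text_color" in opts and "textColor" not in opts:
        opts["textColor"] = opts["text_color"]
    return opts
```

An explicit `attr` or an explicit `textColor` always wins in this sketch, so existing callers that already pass those values would see no change.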

Verification

  • Local run: 225 passed, 1 skipped, 0 failed.

Files changed (for quick review)

  • datascience/tables.py [around lines 5310, 5398, 5420, 5455]
  • docs/hist_grouping.md
  • tests/test_hist_group.py
  • (optional) datascience/maps.py

Compatibility & Notes

  • No breaking API changes; clearer errors, docs, and tests.
  • Open to adjusting error types/doc placement or dropping the maps tweak if maintainers prefer a focused PR.

…and empty-group handling; add tests; fix folium tile attribution and BeautifyIcon textColor compatibility
@Subat-01 Subat-01 changed the title docs(hist): document group usage with examples; add group validation … hist(group): add validation & empty-data handling; docstring, guide and tests Oct 17, 2025
Constraints and behavior:
- ``group`` cannot be combined with ``bin_column``.
- ``group`` requires exactly one histogram value column. If more
than one value column is passed, a ``ValueError`` is raised.
Member

No need to specify which exception is raised if a constraint is violated.

- ``group`` cannot be combined with ``bin_column``.
- ``group`` requires exactly one histogram value column. If more
than one value column is passed, a ``ValueError`` is raised.
- If ``group`` does not reference an existing column (by label or
Member

Not needed

warnings.warn("It looks like you're making a grouped histogram with "
"a lot of groups ({:d}), which is probably incorrect."
.format(grouped.num_rows))
if grouped.num_rows == 0:
Member

What is the argument for this change in behavior?

# This code is factored as a function for clarity only.
n = len(values_dict)
if n == 0:
# Create an empty figure to maintain a no-error contract on empty groups
Member

What is the argument for this change in behavior? Why is it important to avoid exceptions in this case?


Notes and constraints:
- `group` cannot be used together with `bin_column`.
- `group` expects exactly one numeric value column (e.g., `'height'`). Passing multiple value columns raises a `ValueError`.
Member

I don't think the exception raised should be part of the API specification.

Notes and constraints:
- `group` cannot be used together with `bin_column`.
- `group` expects exactly one numeric value column (e.g., `'height'`). Passing multiple value columns raises a `ValueError`.
- If `group` does not reference an existing column label or index, a `ValueError` is raised.
Member

I don't think the exception raised should be part of the API specification.

- `group` cannot be used together with `bin_column`.
- `group` expects exactly one numeric value column (e.g., `'height'`). Passing multiple value columns raises a `ValueError`.
- If `group` does not reference an existing column label or index, a `ValueError` is raised.
- If the data are empty for all groups, `hist` creates an empty figure and returns without error.
Member

Why?

@davidwagner
Member

Thanks for these improvements!

I prefer that each pull request address one self-contained issue. Please split out the fixes to maps/tiles in a separate PR.

I do prefer that we stick to .rst.
