
Limit Memory Usage #203

Open
cbyrohl wants to merge 7 commits into main from fix-histogram2d-memory-usage

Conversation

@cbyrohl
Owner

@cbyrohl cbyrohl commented Aug 7, 2025

Provide a function to limit memory usage via `scida.init_resources()`. Will fix #197

Missing:

  • For TNGLab, call `scida.init_resources()` upon `scida.load()` to enforce the limit in case `init_resources()` has not been called yet.

Contributor

Copilot AI left a comment


Pull request overview

This pull request introduces memory management functionality for scida through a new init_resources() function. The primary goal is to allow users to limit memory usage when working with large datasets, particularly in shared environments like TNGLab or HPC clusters.

Changes:

  • Added new src/scida/init.py module with init_resources() function for configuring Dask distributed computing with memory limits
  • Extended parse_humansize() utility to support both decimal (KB, MB, GB) and binary (KiB, MiB, GiB) unit formats
  • Added comprehensive test suite in tests/test_init.py covering various initialization scenarios
  • Updated documentation to explain memory management and provide usage guidance
  • Updated dependencies (mkdocstrings, matplotlib, pint, griffe, and others)
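For illustration, the extended `parse_humansize()` might look like the following sketch, supporting both decimal and binary unit families. This is a hedged reconstruction, not scida's actual implementation:

```python
import re

# Illustrative unit table: decimal (KB = 10^3) alongside binary (KiB = 2^10).
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40, "PIB": 2**50,
}

def parse_humansize(size: str) -> int:
    """Parse a human-readable size string such as '4GiB' into bytes."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]+)\s*", size)
    if match is None:
        raise ValueError(f"cannot parse size: {size!r}")
    value, unit = match.groups()
    if unit.upper() not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(float(value) * _UNITS[unit.upper()])
```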

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| `src/scida/init.py` | New module implementing resource initialization with memory limits and TNGLab environment detection |
| `src/scida/helpers_misc.py` | Extended units dictionary to support decimal units (KB, MB, GB, TB, PB) alongside binary units |
| `src/scida/__init__.py` | Exported `init_resources` function for the public API |
| `tests/test_init.py` | Comprehensive test suite for the `init_resources` function and helper functions |
| `docs/largedatasets.md` | Updated documentation with new Preparation section and revised LocalCluster setup guidance |
| `docs/tutorial/simulations.md` | Added tip about memory management for large datasets |
| `docs/visualization.md` | Added note about known issues with recent dask versions |
| `pyproject.toml` | Updated mkdocstrings version from ^0.24.0 to ^0.30.0 |
| `poetry.lock` | Updated lock file with new dependency versions |


Comment on lines 66 to 67
```python
# Should be 50% of 16GB = 8GB, which is at the cap
assert result == "8589934592B"  # 8GB in bytes
```
Copilot AI Jan 16, 2026

The test comment says "Should be 50% of 16GB = 8GB, which is at the cap" but the implementation doesn't cap at 8GB. For a 16GB system, 50% would be exactly 8GB, but this is coincidental, not because of capping. The comment is misleading because it implies there's a cap when there isn't one. If the intent is to have a cap, the implementation should enforce it; otherwise, the comment should be updated to accurately reflect that this is just 50% of system memory without capping.

```python
else:
    cluster_args["dashboard_address"] = None

memory_limit_bytes = parse_humansize(memory_limit)
```
Copilot AI Jan 16, 2026

The function signature specifies that memory_limit can be either a string or an integer (Union[str, int]), but the parse_humansize function only handles string input. If an integer is passed (representing bytes), calling .upper() on line 176 will fail with an AttributeError. The function should either handle integer input by checking the type and returning it directly if it's already an int, or the type hint should be updated to only accept strings.

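A fix along the lines the reviewer describes could look like this sketch; `normalize_memory_limit` and the minimal `_parse_humansize` stand-in are hypothetical names, not scida's API:

```python
from typing import Union

_BINARY = {"KIB": 2**10, "MIB": 2**20, "GIB": 2**30}

def _parse_humansize(size: str) -> int:
    # Minimal stand-in for scida's parse_humansize (string input only).
    digits = "".join(ch for ch in size if ch.isdigit() or ch == ".")
    unit = size[len(digits):].strip().upper()
    return int(float(digits) * _BINARY.get(unit, 1))

def normalize_memory_limit(memory_limit: Union[str, int]) -> int:
    """Hypothetical helper: return the memory limit in bytes."""
    if isinstance(memory_limit, int):
        # Already a byte count; avoids calling .upper() on an int.
        return memory_limit
    return _parse_humansize(memory_limit)
```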

scida provides a convenient `init_resources()` function that set up reasonable local resource defaults.
**Call `scida.init_resources()` immediately after importing scida** and before any other dask operations .
For TNGLab, we impose a default memory limit of 4GB via this mechanism. Monitor memory usage via the [dask dashboard](https://docs.dask.org/en/latest/dashboard.html).
Copilot AI Jan 16, 2026

The documentation states "we impose a default memory limit of 4GB via this mechanism" for TNGLab, but the code shows that users can override this default by passing their own memory_limit and n_workers values. The documentation should clarify that this is a default that can be overridden if needed, or it should match the actual enforcement logic in the code.

Suggested change:

```diff
-For TNGLab, we impose a default memory limit of 4GB via this mechanism. Monitor memory usage via the [dask dashboard](https://docs.dask.org/en/latest/dashboard.html).
+For TNGLab, this mechanism configures a default memory limit of 4GB, which you can override by passing `memory_limit` (and optionally `n_workers`) to `scida.init_resources()`. Monitor memory usage via the [dask dashboard](https://docs.dask.org/en/latest/dashboard.html).
```

### Manual LocalCluster setup

While the result is eventually computed, it is a bit slow, primarily because the actual reading of the data off disk is the limiting factor, and we can only use resources available on our local machine.
If you need fine-grained control beyond what `scida.init_resources()` provides, you can manually configure the Local distributed scheduler:
Copilot AI Jan 16, 2026

The documentation section title changed from "Running a LocalCluster" to "Manual LocalCluster setup", but the explanation text "If you need fine-grained control beyond what scida.init_resources() provides" now refers to a function that wasn't discussed yet in this section. Consider adding a brief introduction about scida.init_resources() before this subsection, or adding a forward reference explaining that this is an alternative to the init_resources() function introduced in the Preparation section.

Suggested change:

```diff
-If you need fine-grained control beyond what `scida.init_resources()` provides, you can manually configure the Local distributed scheduler:
+If you need fine-grained control beyond what `scida.init_resources()` (introduced in the Preparation section above) provides, you can manually configure the local distributed scheduler:
```

-----
- Once initialized, all subsequent dask operations will use the configured scheduler
- Call this function before loading datasets for results
- On TNGLab, we force a memory limit of 4GB and 4 workers.
Copilot AI Jan 16, 2026

The docstring states "we force a memory limit of 4GB and 4 workers" but the implementation uses the or operator which means user-provided values are respected. The code at lines 64-65 only sets defaults when values are None, not enforcing fixed values. The docstring should say "we default to a memory limit of 4GB and 4 workers" or "we set default values of 4GB memory limit and 4 workers if not specified" to accurately reflect the implementation.


```python
total_memory = psutil.virtual_memory().total
# Use about 50% of system memory, but cap at 8GB per worker
memory_per_worker = total_memory * 0.5
```
Copilot AI Jan 16, 2026

The comment says "cap at 8GB per worker" but the implementation doesn't actually cap the memory at 8GB. The function returns 50% of total memory without any capping logic. This means on a system with 64GB of RAM, it would return 32GB, not 8GB as the comment suggests. Either add the capping logic (e.g., min(memory_per_worker, 8 * 1024**3)) or update the comment to accurately describe the behavior.

Suggested change:

```diff
-memory_per_worker = total_memory * 0.5
+half_memory = total_memory * 0.5
+memory_per_worker = min(half_memory, 8 * 1024**3)
```

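The capping behavior proposed in the suggestion can be isolated in a small pure function, which also makes it easy to unit-test. This is an illustrative sketch; in the real module the total would come from `psutil.virtual_memory().total`:

```python
def default_memory_per_worker(total_memory_bytes: int) -> int:
    """Use about 50% of system memory, capped at 8 GiB per worker (sketch)."""
    half_memory = total_memory_bytes * 0.5
    cap = 8 * 1024**3  # 8 GiB
    return int(min(half_memory, cap))
```

With this shape, a 16 GiB machine lands exactly at the cap, while larger machines are clamped to 8 GiB per worker rather than receiving half their total memory.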
Comment on lines 83 to 84
```python
# Should be capped at 8GB per worker
assert result == "8589934592B"  # 8GB in bytes
```
Copilot AI Jan 16, 2026

The test expects 8GB (8589934592 bytes) as the result for a 64GB system, but the implementation returns 50% of system memory (32GB) without any capping logic. This test will fail because the implementation doesn't match the expected behavior documented in the comments.

Suggested change:

```diff
-# Should be capped at 8GB per worker
-assert result == "8589934592B"  # 8GB in bytes
+# Should be 50% of 64GB = 32GB (no cap applied)
+assert result == "34359738368B"  # 32GB in bytes
```


```python
def test_init_disabled(self):
    """Test init with distributed computing disabled."""
    result = init_resources(use_distributed=False)
```
Copilot AI Jan 16, 2026

The test passes use_distributed=False to init_resources(), but the function signature doesn't include a use_distributed parameter. This test will fail with a TypeError.

Suggested change:

```diff
-result = init_resources(use_distributed=False)
+result = init_resources()
```

By default, scida would use all available memory and CPU cores, which is not always desired, particularly on shared systems such as HPC clusters or [TNGLab](https://www.tng-project.org/data/lab/).

scida provides a convenient `init_resources()` function that set up reasonable local resource defaults.
**Call `scida.init_resources()` immediately after importing scida** and before any other dask operations .
Copilot AI Jan 16, 2026

Missing space before the period. The sentence should read "Call scida.init_resources() immediately after importing scida and before any other dask operations."

Suggested change:

```diff
-**Call `scida.init_resources()` immediately after importing scida** and before any other dask operations .
+**Call `scida.init_resources()` immediately after importing scida** and before any other dask operations.
```

First, we need to inform scida/dask about the resources it is allowed to use or allocate.
By default, scida would use all available memory and CPU cores, which is not always desired, particularly on shared systems such as HPC clusters or [TNGLab](https://www.tng-project.org/data/lab/).

scida provides a convenient `init_resources()` function that set up reasonable local resource defaults.
Copilot AI Jan 16, 2026

Grammar issue: "set up" should be "sets up" to maintain subject-verb agreement with "function" (singular).

Suggested change:

```diff
-scida provides a convenient `init_resources()` function that set up reasonable local resource defaults.
+scida provides a convenient `init_resources()` function that sets up reasonable local resource defaults.
```

cbyrohl and others added 4 commits January 20, 2026 00:02
- Move psutil import from inside _get_default_memory_limit() to module level
- Add PSUTIL_AVAILABLE flag for fallback handling
- Fix test patches to use scida.init.psutil path
- Remove incorrect 8GB cap expectation from large system test

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Handle integer memory_limit input (fixes AttributeError on .upper())
- Fix docstring/log: "force" -> "default to... (can be overridden)"
- Fix docs: grammar ("set up" -> "sets up"), remove extra space
- Add code example for init_resources() in docs
- Clarify TNGLab defaults can be overridden
- Fix test using non-existent use_distributed parameter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix test_init_no_dask_distributed to properly simulate missing module
  using sys.modules patching instead of broken side_effect=ImportError
- Add unit tests for integer memory_limit, **cluster_kwargs forwarding,
  explicit threads_per_worker, and decimal-prefix parse_humansize
- Add integration tests that create real LocalCluster instances and
  verify memory limits are applied to workers via scheduler_info
- Integration test runs da.histogram2d under memory-constrained workers
  and validates correctness against numpy reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
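The `sys.modules` patching technique mentioned in the commits above can be sketched as follows; `distributed_available` is a hypothetical import-guard helper illustrating the pattern, not scida's actual code:

```python
import sys
from unittest import mock

def distributed_available() -> bool:
    """Hypothetical import guard, as fallback code might implement it."""
    try:
        import distributed  # noqa: F401
        return True
    except ImportError:
        return False

# Setting a module's sys.modules entry to None makes any subsequent
# `import` of that name raise ImportError, reliably simulating a missing
# optional dependency. A plain side_effect=ImportError on a mock does not
# intercept the import statement itself, which is why that approach breaks.
with mock.patch.dict(sys.modules, {"distributed": None}):
    assert distributed_available() is False
```

Outside the patched block, the real import behavior is restored automatically by `mock.patch.dict`.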


Development

Successfully merging this pull request may close these issues.

memory usage in histogram2d().
