Conversation

@GongJr0
Owner

@GongJr0 GongJr0 commented Oct 4, 2025

Renamed internal bootstrapped significance functions for clarity, added config-driven wrappers, and improved result DataFrame construction. This refactor centralizes configuration resolution and streamlines selection logic for both statsmodels and numba engines.

Summary by CodeRabbit

  • New Features

    • Added configuration-driven bootstrap analysis with standard and fast modes and new result fields (e.g., base statistics, FDR-adjusted p-values, selection flags).
    • Public wrappers to run bootstrap flows from config.
  • Bug Fixes

    • Fixed cache deletion to target the correct sector identifier, preventing stale entries.
  • Refactor

    • Unified lag selection to use the config-driven flow and updated result sorting to prioritize base p-values and frequency for clearer interpretation.

@GongJr0 GongJr0 self-assigned this Oct 4, 2025
@GongJr0 GongJr0 added this to CandleNet Oct 4, 2025
@GongJr0 GongJr0 added the enhancement New feature or request label Oct 4, 2025
@GongJr0 GongJr0 moved this to In review in CandleNet Oct 4, 2025
Contributor

coderabbitai bot commented Oct 4, 2025

Walkthrough

Refactors lag significance to a config-driven flow: adds internal config resolver and underscored bootstrapped functions, adds public wrappers that build/resume config, updates select_lags to dispatch via these wrappers, changes result fields/sorting, and renames a CorrCache.delete parameter and SQL predicate from sectors to sectors_id.

Changes

Cohort / File(s) — Summary

Autoreg lag utils refactor — CandleNet/autoreg/lag_utils.py
Added _resolve_lag_cfg(params: LagConfig, n: int) and _infer_block_len(n: int) helpers; renamed bootstrapped_significance → _bootstrapped_significance and fast_bootstrapped_significance → _fast_bootstrapped_significance; added public wrappers bootstrapped_from_config(y: pd.Series) and fast_bootstrapped_from_config(y: pd.Series) that build params, resolve the config, seed the RNG, and call the underscored implementations; updated select_lags to dispatch to these wrappers; changed the result DataFrame columns (e.g., added t_base, p_fdr, selected) and the final sorting keys to ["selected", "p_base", "freq"].

Cache delete clause change — CandleNet/cache/synergy_cache.py
In CorrCache.delete, the parameter was renamed from sectors to sectors_id and the SQL WHERE clause updated from sectors = ? to sectors_id = ?; the logging message was updated accordingly.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant SL as select_lags
  participant WR as bootstrapped_from_config / fast_bootstrapped_from_config
  participant RES as _resolve_lag_cfg
  participant SLOW as _bootstrapped_significance
  participant FAST as _fast_bootstrapped_significance

  User->>SL: call select_lags(y, params)
  SL->>WR: delegate based on engine (fast/slow)
  WR->>RES: resolve LagConfig into concrete ints (n, block_len, max_lag, B, etc.)
  RES-->>WR: resolved params
  alt engine == "fast"
    WR->>FAST: run fast bootstrap (uses RNG, computes p_base, p_fdr, freq, selected)
    FAST-->>WR: results DataFrame (t_base, p_base, p_fdr, selected, freq, ...)
  else engine == "slow"
    WR->>SLOW: run slow bootstrap (resampling bootstrap, computes p-values and masks)
    SLOW-->>WR: results DataFrame (t_base, p_base, p_fdr, selected, freq, ...)
  end
  WR-->>SL: DataFrame with resolved fields
  SL->>SL: finalize ordering/sorting by ["selected","p_base","freq"]
  SL-->>User: selected lags + metrics
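The dispatch in the diagram can be sketched roughly as follows. This is a minimal illustration only: the stub result frames and the `engine` keyword are placeholders, and the ascending directions of the final sort are an assumption, not taken from the PR.

```python
import pandas as pd

# Stubs standing in for the underscored implementations described above.
def _bootstrapped_significance(y: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({"lag": [1], "p_base": [0.01], "selected": [True], "freq": [0.9]})

def _fast_bootstrapped_significance(y: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({"lag": [1], "p_base": [0.02], "selected": [True], "freq": [0.8]})

def select_lags(y: pd.Series, engine: str = "slow") -> pd.DataFrame:
    """Dispatch to an engine, then apply the final ordering from the walkthrough."""
    run = _fast_bootstrapped_significance if engine == "fast" else _bootstrapped_significance
    out = run(y)
    return out.sort_values(
        by=["selected", "p_base", "freq"],
        ascending=[False, True, False],  # assumed directions: best lags first
    ).reset_index(drop=True)
```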

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

I hop through configs, seeds in paw,
Resolve the lags with careful law.
Fast or slow the tests align,
P-values sorted, selected fine.
Corr keys shifted — tidy and spry, carrot-coded by my eye. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, which is below the required 80.00% threshold. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly and concisely summarizes the primary refactor of lag significance functions and the associated config handling improvements, which aligns directly with the changeset's objectives. It highlights the two main components of the update without extraneous detail and is sufficiently specific for team members to understand the scope at a glance.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch FeatureGen

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
CandleNet/cache/synergy_cache.py (1)

176-186: Rename parameter to match the column it targets.

The method parameter is named sectors but the SQL query now correctly targets the sectors_id column. For clarity and consistency with other methods (insert, fetch), rename the parameter to sectors_id.

Apply this diff:

-    def delete(self, sectors: str) -> None:
+    def delete(self, sectors_id: str) -> None:
         con = self.check_con()
 
         query = f"""DELETE FROM {self.TABLE_NAME} WHERE sectors_id = ?;"""
-        con.execute(query, (sectors,))
+        con.execute(query, (sectors_id,))
         self._log(
             LogType.EVENT,
             OriginType.USER,
             CallerType.CACHE,
-            f"Deleted cache entry for sectors: {sectors}.",
+            f"Deleted cache entry for sectors: {sectors_id}.",
         )
🧹 Nitpick comments (3)
CandleNet/autoreg/lag_utils.py (3)

341-341: Fix ambiguous dash character in docstring.

The docstring contains an EN DASH (–) instead of a HYPHEN-MINUS (-). While this is a minor issue, it's flagged by static analysis and should be corrected for consistency.

Apply this diff:

-            - "hacBandwidth": Newey–West/HAC bandwidth or "auto".
+            - "hacBandwidth": Newey-West/HAC bandwidth or "auto".

Based on static analysis hints.


695-720: Simplify redundant bandwidth resolution.

Lines 707-711 check whether hacBandwidth is "auto" and conditionally use the resolved value, but _resolve_lag_cfg has already resolved "auto" to an integer at lines 361-362. You can simplify by passing r["bandwidth"] directly.

Apply this diff:

     r = _resolve_lag_cfg(params, n)
     return _bootstrapped_significance(
         y,
         max_lag=r["max_lag"],
         B=r["B"],
         block_len=r["block_len"],
-        bandwidth=(
-            params["hacBandwidth"]
-            if params["hacBandwidth"] != "auto"
-            else r["bandwidth"]
-        ),
+        bandwidth=r["bandwidth"],
         alpha=params["sigLevel"],
         use_fdr_end=(params["selectionMethod"] == "fdrAdjusted"),
         min_freq=params["stabilityFreq"] if params["requireStability"] else 0.0,

723-748: Simplify redundant bandwidth resolution.

Lines 735-739 have the same redundant check as in bootstrapped_from_config. Since _resolve_lag_cfg already resolves "auto" values, directly pass r["bandwidth"].

Apply this diff:

     r = _resolve_lag_cfg(params, n)
     return _fast_bootstrapped_significance(
         y,
         max_lag=r["max_lag"],
         B=r["B"],
         block_len=r["block_len"],
-        bandwidth=(
-            params["hacBandwidth"]
-            if params["hacBandwidth"] != "auto"
-            else r["bandwidth"]
-        ),
+        bandwidth=r["bandwidth"],
         alpha=params["sigLevel"],
         use_fdr_end=(params["selectionMethod"] == "fdrAdjusted"),
         min_freq=params["stabilityFreq"] if params["requireStability"] else 0.0,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4003831 and ca2acbf.

📒 Files selected for processing (2)
  • CandleNet/autoreg/lag_utils.py (5 hunks)
  • CandleNet/cache/synergy_cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
CandleNet/autoreg/lag_utils.py (4)
CandleNet/autoreg/fastlag/select.py (2)
  • _auto_nw_bandwidth (83-96)
  • _is_int_like (11-15)
CandleNet/_config.py (1)
  • LagConfig (27-48)
CandleNet/autoreg/fastlag/engine.py (1)
  • _ols_hac_beta_t_vectorized (6-77)
CandleNet/__init__.py (1)
  • lag_config (10-17)
🪛 Ruff (0.13.3)
CandleNet/cache/synergy_cache.py

179-179: Possible SQL injection vector through string-based query construction

(S608)

CandleNet/autoreg/lag_utils.py

341-341: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint
🔇 Additional comments (3)
CandleNet/autoreg/lag_utils.py (3)

396-561: LGTM! Appropriate conversion to internal function.

The rename to _bootstrapped_significance (underscore prefix) correctly signals this is now an internal implementation detail, with the new bootstrapped_from_config wrapper providing the public interface. The sorting logic remains correct.


564-692: LGTM! Consistent refactoring to internal function.

The rename to _fast_bootstrapped_significance is appropriate. The DataFrame column ordering is consistent between the early-return and main-path results, and the sorting logic correctly uses the by= parameter.


751-845: LGTM! Cleaner separation of concerns.

The refactor to use bootstrapped_from_config and fast_bootstrapped_from_config wrappers simplifies select_lags by delegating configuration resolution to dedicated functions. This improves maintainability and reduces duplication.

Replaces _auto_block_len with _infer_block_len in lag_utils.py and updates parameter naming in CorrCache.delete for clarity. These changes improve code readability and maintain consistency in function usage and logging.
@GongJr0 GongJr0 merged commit ccd931a into main Oct 4, 2025
6 of 7 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in CandleNet Oct 4, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
CandleNet/autoreg/lag_utils.py (1)

682-707: Minor code duplication with fast_bootstrapped_from_config.

Both bootstrapped_from_config and fast_bootstrapped_from_config (lines 710-735) follow nearly identical patterns:

  1. Get config via lag_config()
  2. Compute series length and RNG
  3. Resolve config with _resolve_lag_cfg
  4. Call the underscored function

Consider extracting the common setup into a helper function to reduce duplication.

Example:

def _prepare_config_and_rng(y: pd.Series) -> tuple[dict, dict, np.random.Generator]:
    """Prepare resolved config and RNG from global config and series."""
    params = lag_config()
    n = len(y)
    rand_seed = params.get("randomSeed")
    rng = np.random.default_rng(rand_seed)
    r = _resolve_lag_cfg(params, n)
    return params, r, rng

Then both wrappers could use:

def bootstrapped_from_config(y: pd.Series) -> pd.DataFrame:
    params, r, rng = _prepare_config_and_rng(y)
    return _bootstrapped_significance(
        y,
        max_lag=r["max_lag"],
        # ... rest of parameters
    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca2acbf and 474f172.

📒 Files selected for processing (2)
  • CandleNet/autoreg/lag_utils.py (5 hunks)
  • CandleNet/cache/synergy_cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
CandleNet/cache/synergy_cache.py (4)
CandleNet/cache/p2_cache.py (2)
  • delete (60-72)
  • TABLE_NAME (89-90)
CandleNet/cache/base_cache.py (4)
  • delete (62-63)
  • check_con (125-130)
  • TABLE_NAME (93-94)
  • _log (69-72)
CandleNet/cache/ticker_cache.py (2)
  • delete (153-163)
  • TABLE_NAME (190-191)
CandleNet/logger/logger_types.py (3)
  • LogType (4-8)
  • OriginType (11-13)
  • CallerType (16-23)
CandleNet/autoreg/lag_utils.py (4)
CandleNet/autoreg/fastlag/select.py (3)
  • _auto_nw_bandwidth (83-96)
  • _infer_block_len (99-101)
  • _is_int_like (11-15)
CandleNet/_config.py (1)
  • LagConfig (27-48)
CandleNet/autoreg/fastlag/engine.py (1)
  • _ols_hac_beta_t_vectorized (6-77)
CandleNet/__init__.py (1)
  • lag_config (10-17)
🪛 Ruff (0.13.3)
CandleNet/cache/synergy_cache.py

179-179: Possible SQL injection vector through string-based query construction

(S608)

CandleNet/autoreg/lag_utils.py

328-328: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF002)

🔇 Additional comments (4)
CandleNet/cache/synergy_cache.py (1)

176-186: LGTM! Consistent parameter naming.

The rename from sectors to sectors_id aligns with the database schema (line 207) and matches the naming used in insert (line 123) and fetch (line 144) methods. The SQL query correctly uses parameterized binding with the ? placeholder, making it safe from injection attacks.

Note: The static analysis warning (S608) about SQL injection is a false positive—the query is properly parameterized.
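To illustrate why the parameterized query is safe, here is a minimal self-contained sketch using the standard-library sqlite3 module. The table and column names mirror the review, but the schema and row values are invented for the example:

```python
import sqlite3

TABLE_NAME = "synergy_corr"  # trusted class-level constant, not user input

con = sqlite3.connect(":memory:")
con.execute(f"CREATE TABLE {TABLE_NAME} (sectors_id TEXT, corr REAL)")
con.execute(f"INSERT INTO {TABLE_NAME} VALUES (?, ?)", ("tech|energy", 0.42))

# Only the table name is interpolated; the user-supplied value is bound
# through a `?` placeholder, so it never becomes part of the SQL text.
con.execute(f"DELETE FROM {TABLE_NAME} WHERE sectors_id = ?", ("tech|energy",))
remaining = con.execute(f"SELECT COUNT(*) FROM {TABLE_NAME}").fetchone()[0]
```

Because the string interpolation touches only a class constant, S608 flags a pattern rather than an actual injection vector here.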

CandleNet/autoreg/lag_utils.py (3)

547-548: Verify the sorting key change is intentional.

The result sorting changed from by=["selected", "freq", "top_freq"] to by=["selected", "p_base", "freq"]. This prioritizes base-sample p-values over the frequency of being the top-ranked lag, which affects how equally-stable lags are ordered in the output.

Ensure this change aligns with the intended selection logic and user expectations.
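The effect of the key change can be seen on a toy frame. The column values and the ascending directions below are invented for illustration, not taken from the project:

```python
import pandas as pd

df = pd.DataFrame({
    "lag":      [1, 2, 3],
    "selected": [True, True, False],
    "p_base":   [0.03, 0.01, 0.20],
    "freq":     [0.95, 0.80, 0.10],
    "top_freq": [0.70, 0.20, 0.05],
})

# Old ordering: stability first, then how often the lag ranked top.
old = df.sort_values(by=["selected", "freq", "top_freq"], ascending=[False, False, False])
# New ordering: base-sample p-value breaks ties among selected lags.
new = df.sort_values(by=["selected", "p_base", "freq"], ascending=[False, True, False])

print(list(old["lag"]))  # [1, 2, 3]
print(list(new["lag"]))  # [2, 1, 3]
```

Under the new keys, a selected lag with a smaller base p-value outranks a more frequently top-ranked one.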


551-679: LGTM! Improved consistency between fast and slow paths.

The updates align the fast implementation with the slow version:

  • Added FDR handling (lines 600-603)
  • Expanded DataFrame output to include all fields (lines 661-673)
  • Unified sorting logic (lines 677-679)

This ensures both engines produce compatible results.
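For reference, the FDR adjustment that produces p_fdr is typically the Benjamini-Hochberg step-up procedure; a minimal sketch of how it could be derived from p_base follows (the project may instead rely on statsmodels.stats.multitest.multipletests):

```python
import numpy as np

def bh_adjust(p: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # each adjusted value is the minimum over all larger raw p-values
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(ranked, 0.0, 1.0)
    return out

p_base = np.array([0.001, 0.02, 0.04, 0.30])
p_fdr = bh_adjust(p_base)
selected = p_fdr < 0.05
```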


766-768: LGTM! Clean integration of config-driven wrappers.

The updated calls to bootstrapped_from_config and fast_bootstrapped_from_config simplify select_lags by delegating config resolution to the wrapper functions. This improves separation of concerns and reduces coupling.

Comment on lines +317 to +380
def _resolve_lag_cfg(params: LagConfig, n: int) -> dict:
    # max_lag tested
    """
    Resolve a LagConfig mapping into concrete numeric parameters used for lag testing and bootstrapping.

    This converts potentially symbolic or "auto" entries in `params` into integer values appropriate for
    a series of length `n`, applying sensible bounds and heuristics where needed.

    Parameters:
        params (LagConfig): Configuration mapping containing keys:
            - "maxLag": maximum lag to consider (may be numeric or "auto"-like value).
            - "hacBandwidth": Newey–West/HAC bandwidth or "auto".
            - "blockLen": circular block bootstrap block length or "auto".
            - "bootstrapSamples": number of bootstrap replicates or "auto".
            - "maxLagsSelected": cap on number of selected lags or "auto".
            - "minBootstrapSamples", "minLagsSelected": minimums used when resolving "auto".
        n (int): Length of the time series; used to clamp and derive data-dependent defaults.

    Returns:
        dict: A mapping with integer-valued keys:
            - "max_lag": selected max lag (clamped to at least 1 and at most n-2).
            - "bandwidth": resolved HAC bandwidth as an int.
            - "block_len": resolved block length for CBB as an int.
            - "B": number of bootstrap replicates as an int.
            - "max_selected": maximum number of lags to retain as an int (>= 0).
    """
    max_lag = int(params["maxLag"])
    max_lag = max(1, min(max_lag, n - 2))

    # bandwidth
    bw = params["hacBandwidth"]
    if isinstance(bw, str) and bw == "auto":
        bw = _auto_nw_bandwidth(n)

    # block length
    bl = params["blockLen"]
    if isinstance(bl, str) and bl == "auto":
        bl = _infer_block_len(n)

    # bootstrap samples
    B = params["bootstrapSamples"]
    if isinstance(B, str) and B == "auto":
        # heuristic: proportional to tested lags, capped
        B = max(params["minBootstrapSamples"], min(300, 20 * max_lag))

    # max lags selected
    msel_cfg = params["maxLagsSelected"]
    if isinstance(msel_cfg, str) and msel_cfg == "auto":
        msel = max(params["minLagsSelected"], min(5, max_lag))
    else:
        msel = msel_cfg

    assert _is_int_like(msel) and msel >= 0, (
        f"Unsupported maxLagsSelected: {msel_cfg}. "
        f"Must be a non-negative integer or 'auto'."
    )

    return {
        "max_lag": max_lag,
        "bandwidth": int(bw),
        "block_len": int(bl),
        "B": int(B),
        "max_selected": int(msel),
    }
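As context for the blockLen / block_len parameter resolved above: a circular block bootstrap (CBB) replicate stitches together randomly started, wrap-around blocks of the series. The following is a minimal sketch of one replicate; the real _infer_block_len heuristic and engine internals are not shown here:

```python
import numpy as np

def cbb_resample(y: np.ndarray, block_len: int, rng: np.random.Generator) -> np.ndarray:
    """One circular-block-bootstrap replicate: concatenate random wrapped blocks."""
    n = len(y)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n, size=n_blocks)
    # each row is a contiguous block, wrapping past the end of the series
    idx = (starts[:, None] + np.arange(block_len)[None, :]) % n
    return y[idx].ravel()[:n]

rng = np.random.default_rng(0)
y = np.arange(100, dtype=float)
rep = cbb_resample(y, block_len=5, rng=rng)
```

Repeating this B times and recomputing the lag statistics on each replicate yields the bootstrap distribution behind p_base and freq.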

⚠️ Potential issue | 🟡 Minor

Fix EN DASH character in docstring.

The docstring contains an EN DASH (–) character on line 328 in "Newey–West/HAC". Use a regular hyphen-minus (-) instead for better compatibility and to satisfy linting rules.

Apply this diff:

-        params (LagConfig): Configuration mapping containing keys:
-            - "maxLag": maximum lag to consider (may be numeric or "auto"-like value).
-            - "hacBandwidth": Newey–West/HAC bandwidth or "auto".
+        params (LagConfig): Configuration mapping containing keys:
+            - "maxLag": maximum lag to consider (may be numeric or "auto"-like value).
+            - "hacBandwidth": Newey-West/HAC bandwidth or "auto".

Note: The past review comment about duplicate _auto_block_len logic appears to have been addressed—this function now calls _infer_block_len(n) at line 354 instead of duplicating the logic.

Based on past review comments.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            - "hacBandwidth": Newey–West/HAC bandwidth or "auto".
+            - "hacBandwidth": Newey-West/HAC bandwidth or "auto".
🧰 Tools
🪛 Ruff (0.13.3)

328-328: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF002)

🤖 Prompt for AI Agents
CandleNet/autoreg/lag_utils.py around lines 317-380: the docstring at line ~328
uses an EN DASH in "Newey–West/HAC"; replace the EN DASH (–) with a regular
hyphen-minus ("-") so it reads "Newey-West/HAC", save the file, and run the
linter to confirm the character issue is resolved.


Labels

enhancement New feature or request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant