[FEATURE] Validation Utilities for Shape/Mask Consistency #10

@GongJr0

Feature Details

Implement lightweight utilities that verify the structural integrity of feature batches before they hit the model. These checks should be fast, torch-friendly, and easy to call in both unit tests and runtime (debug mode). The goal is to catch silent bugs in sparse-lag formatting, padding, and concatenation.

The validators should cover:

  • Tensor shapes (e.g., $(B, K, D_{val})$ for values, $(B, K)$ for lag IDs, $(B,)$ for ticker IDs).
  • Mask semantics: pad_mask is boolean, True means “ignore/pad”, and no valid (non-pad) position appears after the first padded position in a row.
  • Alignment across tensors in the same batch (same $B$, same $K$).
  • Dtype sanity (e.g., embedding indices are int64, values are float32/float64).
  • Monotone padding: once padded, all subsequent positions in that row must be padded.
  • Optional value checks: NaN/Inf guards on numeric features.
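
The shape, dtype, and monotone-padding checks listed above could look roughly like the sketch below. This is only an illustration: the function names (`check_shapes`, `check_dtypes`, `is_monotone_padding`) and the argument names are hypothetical and not taken from the codebase, and the dtype expectations simply follow the examples given in the list.

```python
import torch


def check_shapes(values, lag_ids, ticker_ids, pad_mask):
    """Return a list of shape/alignment problems (empty list means OK)."""
    problems = []
    # Rank checks: (B, K, D_val), (B, K), (B,), (B, K).
    for name, tensor, rank in (
        ("values", values, 3),
        ("lag_ids", lag_ids, 2),
        ("ticker_ids", ticker_ids, 1),
        ("pad_mask", pad_mask, 2),
    ):
        if tensor.dim() != rank:
            problems.append(f"{name}: expected rank {rank}, got {tuple(tensor.shape)}")
    if problems:
        return problems  # alignment is meaningless if ranks are wrong

    # Alignment checks: every tensor must agree on B, and on K where present.
    B, K = values.shape[:2]
    if tuple(lag_ids.shape) != (B, K):
        problems.append(f"lag_ids: expected {(B, K)}, got {tuple(lag_ids.shape)}")
    if tuple(pad_mask.shape) != (B, K):
        problems.append(f"pad_mask: expected {(B, K)}, got {tuple(pad_mask.shape)}")
    if ticker_ids.shape[0] != B:
        problems.append(f"ticker_ids: expected B={B}, got {ticker_ids.shape[0]}")
    return problems


def check_dtypes(values, lag_ids, ticker_ids, pad_mask):
    """Return a list of dtype problems (empty list means OK)."""
    problems = []
    if values.dtype not in (torch.float32, torch.float64):
        problems.append(f"values: expected float32/float64, got {values.dtype}")
    if lag_ids.dtype != torch.int64:
        problems.append(f"lag_ids: expected int64, got {lag_ids.dtype}")
    if ticker_ids.dtype != torch.int64:
        problems.append(f"ticker_ids: expected int64, got {ticker_ids.dtype}")
    if pad_mask.dtype != torch.bool:
        problems.append(f"pad_mask: expected bool, got {pad_mask.dtype}")
    return problems


def is_monotone_padding(pad_mask):
    """Per-row bool of shape (B,): True if the row never goes pad -> valid.

    Assumes pad_mask is a bool tensor of shape (B, K) with True meaning pad.
    """
    # A violation is a padded position immediately followed by a valid one.
    pad_then_valid = pad_mask[:, :-1] & ~pad_mask[:, 1:]
    return ~pad_then_valid.any(dim=1)
```

Returning lists of messages rather than raising keeps each check usable both in unit tests and inside a non-raising report; whether the real utilities should raise directly is a design choice left open here.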

Affected Modules

As stated in the parent issue.

Implementation Checklist

  • Verify ranks/dims; optional expected K/D enforcement.
  • Ensure each row’s mask transitions at most once (valid->pad).
  • Confirm common $(B, K)$ and consistent first-dimension $B$ across all inputs.
  • Flag NaNs and Infs; optionally return a boolean mask of bad rows.
  • Provide a top-level entry point that calls all checks; with strict=False, return a report dict instead of raising.
  • Wire into FeatureGen (behind a debug=True flag) to run per batch in debug mode.
  • Unit tests:
    • Happy paths: correct shapes/masks/dtypes pass.
    • Failure cases: mismatched K, ragged masks, wrong dtypes, NaNs/Infs, non-boolean masks.
    • Edge cases: K=1, empty after padding (all pad), mixed dtypes.
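
A minimal sketch of how the NaN/Inf guard and the strict/report behaviour from the checklist might fit together is shown below. The names `find_nonfinite_rows` and `validate_batch`, the report keys (`ok`, `errors`, `bad_rows`), and the choice of `ValueError` are all assumptions made for illustration. The sketch also checks padded positions for NaN/Inf, which the issue does not specify.

```python
import torch


def find_nonfinite_rows(values):
    """Boolean (B,) mask: True where a row contains any NaN or Inf."""
    bad = ~torch.isfinite(values)
    return bad.reshape(values.shape[0], -1).any(dim=1)


def validate_batch(values, pad_mask, strict=True):
    """Run a reduced set of checks on one batch.

    strict=True  -> raise ValueError on the first failing batch.
    strict=False -> return a report dict instead of raising.
    """
    report = {"ok": True, "errors": [], "bad_rows": None}

    if pad_mask.dtype != torch.bool:
        report["errors"].append(f"pad_mask must be bool, got {pad_mask.dtype}")
    elif values.dim() != 3 or tuple(pad_mask.shape) != tuple(values.shape[:2]):
        report["errors"].append(
            f"shape mismatch: values {tuple(values.shape)}, "
            f"pad_mask {tuple(pad_mask.shape)}"
        )
    else:
        # Ragged mask: a padded position followed by a valid one.
        ragged = (pad_mask[:, :-1] & ~pad_mask[:, 1:]).any(dim=1)
        if ragged.any():
            report["errors"].append("pad_mask has a pad -> valid transition")
        nonfinite = find_nonfinite_rows(values)
        if nonfinite.any():
            report["errors"].append("values contain NaN or Inf")
        # Rows that failed either per-row check.
        report["bad_rows"] = ragged | nonfinite

    report["ok"] = not report["errors"]
    if strict and not report["ok"]:
        raise ValueError("; ".join(report["errors"]))
    return report


if __name__ == "__main__":
    values = torch.zeros(2, 3, 4)
    values[1, 0, 2] = float("nan")
    pad_mask = torch.tensor([[False, False, True], [False, True, True]])
    print(validate_batch(values, pad_mask, strict=False))
```

In a debug-mode hook, the same call with `strict=True` would stop at the offending batch, while tests for the failure cases can use `strict=False` and assert on the report contents.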

Limitations

As stated in the parent issue.

Metadata

Assignees

Labels

feature: Implementation tracking for approved features

Projects

Status

In progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
