Open
Labels
feature (Implementation tracking for approved features)
Description
Feature Details
Implement lightweight utilities that verify the structural integrity of feature batches before they reach the model. These checks should be fast, torch-friendly, and easy to call both in unit tests and at runtime (in debug mode). The goal is to catch silent bugs in sparse-lag formatting, padding, and concatenation.
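To make "well-formed" concrete, the sketch below builds a padded batch that satisfies the padding and mask rules listed next, and could double as a unit-test fixture. It assumes rows arrive as ragged `(k_i, D)` tensors, right-pads them with 0, and marks padded slots with `True` in `pad_mask`. The name `pad_lag_sequences` and the choice of 0 as pad value are assumptions for illustration; lag IDs would be padded the same way.

```python
import torch


def pad_lag_sequences(seqs, max_k=None):
    """Right-pad ragged per-row value sequences into one (B, K, D) batch.

    Hypothetical helper. seqs: list of tensors shaped (k_i, D), one per row,
    all sharing D and dtype. Returns (values, pad_mask) where pad_mask[b, j]
    is True when slot j of row b is padding.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    d, dtype = seqs[0].shape[-1], seqs[0].dtype
    k = max_k if max_k is not None else max(s.shape[0] for s in seqs)
    values = torch.zeros(len(seqs), k, d, dtype=dtype)    # pad value 0 (assumption)
    pad_mask = torch.ones(len(seqs), k, dtype=torch.bool)  # start fully padded
    for b, s in enumerate(seqs):
        n = s.shape[0]
        if n > k:
            raise ValueError(f"row {b} has {n} lags but K={k}")
        values[b, :n] = s
        pad_mask[b, :n] = False  # valid prefix, padded suffix: monotone by construction
    return values, pad_mask
```

Because every row is a valid prefix followed by a padded suffix, a zero-length row simply becomes an all-pad row, which is one of the edge cases the tests should cover.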
The validators should cover:
- Tensor shapes (e.g., $(B, K, D_{val})$ for values, $(B, K)$ for lag IDs, $(B,)$ for ticker IDs).
- Mask semantics: `pad_mask` is boolean, `True` means "ignore/pad", and no non-pad position exists beyond the last valid index.
- Alignment across tensors in the same batch (same $B$, same $K$).
- Dtype sanity (e.g., embedding indices are `int64`, values are `float32`/`float64`).
- Monotone padding: once padded, all subsequent positions in that row must be padded.
- Optional value checks: NaN/Inf guards on numeric features.
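A minimal sketch of the rank, alignment, and dtype checks, assuming the tensor layout listed above and raising on the first violation. `check_structure` and its `expected_k`/`expected_d` keywords are hypothetical names, not an existing API.

```python
import torch

# Assumed layout: values (B, K, D), lag_ids (B, K), ticker_ids (B,), pad_mask (B, K).
# check_structure and its keyword arguments are hypothetical.
FLOAT_DTYPES = (torch.float32, torch.float64)


def check_structure(values, lag_ids, ticker_ids, pad_mask,
                    expected_k=None, expected_d=None):
    """Raise ValueError (shape problem) or TypeError (dtype problem) on the first violation."""
    # 1. Ranks.
    for name, t, rank in (("values", values, 3), ("lag_ids", lag_ids, 2),
                          ("ticker_ids", ticker_ids, 1), ("pad_mask", pad_mask, 2)):
        if t.dim() != rank:
            raise ValueError(f"{name}: expected rank {rank}, got shape {tuple(t.shape)}")

    # 2. Alignment: one B for every tensor, one K for every (B, K)-shaped tensor.
    b, k, d = values.shape
    if ticker_ids.shape[0] != b:
        raise ValueError(f"ticker_ids: expected B={b}, got {ticker_ids.shape[0]}")
    for name, t in (("lag_ids", lag_ids), ("pad_mask", pad_mask)):
        if tuple(t.shape) != (b, k):
            raise ValueError(f"{name}: expected shape {(b, k)}, got {tuple(t.shape)}")

    # 3. Optional expected K / D enforcement.
    if expected_k is not None and k != expected_k:
        raise ValueError(f"K={k}, expected {expected_k}")
    if expected_d is not None and d != expected_d:
        raise ValueError(f"D_val={d}, expected {expected_d}")

    # 4. Dtypes: float values, int64 embedding indices, boolean mask.
    if values.dtype not in FLOAT_DTYPES:
        raise TypeError(f"values: expected float32/float64, got {values.dtype}")
    for name, t in (("lag_ids", lag_ids), ("ticker_ids", ticker_ids)):
        if t.dtype != torch.int64:
            raise TypeError(f"{name}: expected torch.int64, got {t.dtype}")
    if pad_mask.dtype != torch.bool:
        raise TypeError(f"pad_mask: expected torch.bool, got {pad_mask.dtype}")
```

Checking ranks before unpacking `values.shape` keeps the error message about the real problem rather than a tuple-unpacking failure, and running shape checks before dtype checks means a misaligned batch is reported as misaligned.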
Affected Modules
As stated in the parent issue.
Implementation Checklist
- Verify ranks/dims; optionally enforce expected K/D.
- Ensure each row's mask transitions at most once (valid -> pad).
- Confirm a common $(B, K)$ and a consistent first dimension $B$ across all inputs.
- Flag NaNs and Infs; optionally return a boolean mask of bad rows.
- Call all checks; with `strict=False`, return a report dict instead of raising.
- Wire into `FeatureGen` (behind a `debug=True` flag) to run per batch in debug mode.
- Unit tests:
• Happy paths: correct shapes/masks/dtypes pass.
• Failure cases: mismatched K, ragged masks, wrong dtypes, NaNs/Infs, non-boolean masks.
• Edge cases: K=1, empty after padding (all pad), mixed dtypes.
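The mask-monotonicity and NaN/Inf checks, plus a `strict` switch, could look roughly like this sketch. It assumes the inputs already passed the rank/alignment checks (so `pad_mask` is `(B, K)` and `values` is `(B, K, D)`), treats NaN/Inf in padded slots as harmless (a design assumption), and uses hypothetical names (`mask_violations`, `nonfinite_rows`, `validate_batch`) and a hypothetical report layout (`ok`, `errors`, `bad_rows`). A full version would also run the shape and dtype checks from the validator list.

```python
import torch


def mask_violations(pad_mask: torch.Tensor) -> torch.Tensor:
    """(B,) bool: True where a row goes pad -> valid (rows must be valid prefix, pad suffix)."""
    if pad_mask.dtype != torch.bool:
        raise TypeError(f"pad_mask must be torch.bool, got {pad_mask.dtype}")
    m = pad_mask.to(torch.int8)
    # Step along K; -1 means a True (pad) followed by a False (valid).
    # For K == 1 the step tensor is empty and .any() gives False for every row.
    return (m[:, 1:] - m[:, :-1] < 0).any(dim=1)


def nonfinite_rows(values: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    """(B,) bool: True where a non-padded position holds NaN or +/-Inf."""
    bad = (~torch.isfinite(values)).any(dim=-1)  # (B, K)
    return (bad & ~pad_mask).any(dim=1)


def validate_batch(values, pad_mask, strict=True):
    """Raise on the first failure if strict; otherwise collect failures into a report dict."""
    report = {"ok": True, "errors": [],
              "bad_rows": torch.zeros(values.shape[0], dtype=torch.bool)}

    def fail(msg, rows=None):
        if strict:
            raise ValueError(msg)
        report["ok"] = False
        report["errors"].append(msg)
        if rows is not None:
            report["bad_rows"] |= rows

    if pad_mask.dtype != torch.bool:
        fail(f"pad_mask must be torch.bool, got {pad_mask.dtype}")
        return report  # later checks assume a boolean mask
    ragged = mask_violations(pad_mask)
    if ragged.any():
        fail(f"non-monotone pad_mask in rows {ragged.nonzero().flatten().tolist()}", ragged)
    nonfinite = nonfinite_rows(values, pad_mask)
    if nonfinite.any():
        fail(f"NaN/Inf in rows {nonfinite.nonzero().flatten().tolist()}", nonfinite)
    return report
```

Returning per-row boolean masks rather than a single flag makes the failure cases easy to assert on in tests: a ragged row or a row with a NaN shows up directly as `bad_rows[b] == True`.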
Limitations
As stated in the parent issue.
Metadata
Assignees
Projects
Status
In progress