fix(curation): Use matches_hash for statement hash due to string issues #64

tonywu1999 · 2025-11-05T22:15:55Z

Summary by CodeRabbit

Bug Fixes
- Statement identifiers are now taken from the `matches_hash` field of the parsed `stmt_json` payload instead of the direct `stmt_hash` data field. This affects curation filtering and edge construction in the data processing pipeline.

coderabbitai · 2025-11-05T22:16:04Z

Walkthrough

The code modifies statement identifier extraction in two functions: .filterIndraResponse and .constructEdgesDataFrame. Instead of using a direct stmt_hash field, both now parse stmt_json and derive the identifier from its matches_hash value. No JSON parsing error handling is introduced.

Changes

Cohort / File(s)	Summary
Statement hash extraction refactor (`R/utils_getSubnetworkFromIndra.R`)	Modified `.filterIndraResponse` and `.constructEdgesDataFrame` to derive `stmt_hash` by parsing `stmt_json` and extracting `matches_hash`, replacing direct access to `data$stmt_hash`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Area of attention: Verify that matches_hash is reliably present in the stmt_json payload in all expected cases; JSON parsing errors could break existing workflows
Area of attention: Confirm that the computed matches_hash produces equivalent results to the previous stmt_hash field for curation filtering and edge construction
Area of attention: Check if this change affects performance, particularly for large response batches that require repeated JSON parsing
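To act on the second point, a quick check could compare the legacy `stmt_hash` field with the parsed `matches_hash` for each statement in a response. This is a hypothetical sketch: `compareHashSources()` is not part of the package, the list layout (`entry$data$stmt_hash`, `entry$data$stmt_json`) is assumed from the diffs in this PR, and the `parser` argument defaults to `jsonlite::fromJSON` on the assumption that this is the `fromJSON` the file uses. A mock parser stands in for real INDRA payloads.

```r
# Hypothetical check: for each statement, report whether the legacy stmt_hash
# field and the matches_hash inside stmt_json agree when compared as strings.
# `parser` is injectable so the check can run against mocked payloads.
compareHashSources <- function(res, parser = jsonlite::fromJSON) {
    vapply(seq_along(res), function(i) {
        data <- res[[i]]$data
        legacy <- data$stmt_hash
        parsed <- tryCatch(parser(data$stmt_json)$matches_hash,
                           error = function(e) NULL)
        # A value missing on either side counts as a mismatch
        if (is.null(legacy) || is.null(parsed)) {
            return(FALSE)
        }
        identical(as.character(legacy), as.character(parsed))
    }, logical(1))
}

# Mock parser (assumption for illustration): treats the payload string
# itself as the matches_hash value
mockParser <- function(txt) list(matches_hash = txt)

res <- list(
    list(data = list(stmt_hash = "-123", stmt_json = "-123")),
    list(data = list(stmt_hash = "456", stmt_json = "789")),
    list(data = list(stmt_json = "111"))
)
compareHashSources(res, parser = mockParser)
```

Comparing as strings means formatting or precision differences, for example a numeric hash that `as.character()` renders in scientific notation, show up as mismatches rather than being silently coerced.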

Poem

🐰 Hops through JSON fields so deep,
Where matches_hash secrets sleep,
No more direct paths we take,
Parsing makes the logic wake,
Refactored bright, the code runs clean! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	Pull request description is empty with no motivation, context, detailed changes, or testing information provided by the author.	Add a comprehensive description including: motivation and context for the curation fix, detailed bullet points of changes to stmt_hash extraction logic, description of unit tests verifying the matches_hash behavior, and confirmation of checklist items (styling, documentation, contributing guidelines).

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly indicates the main change: using matches_hash instead of stmt_hash to fix string-related issues in curation filtering.


github-actions · 2025-11-05T22:16:50Z

Failed to generate code suggestions for PR

codecov-commenter · 2025-11-05T22:18:30Z

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.89%. Comparing base (b42471f) to head (fdf4253).

Files with missing lines	Patch %	Lines
R/utils_getSubnetworkFromIndra.R	50.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel      #64      +/-   ##
==========================================
- Coverage   57.90%   57.89%   -0.02%     
==========================================
  Files           7        7              
  Lines        1354     1356       +2     
==========================================
+ Hits          784      785       +1     
- Misses        570      571       +1


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

R/utils_getSubnetworkFromIndra.R (1)

128-129: Consider extracting common JSON parsing logic.

The same JSON parsing pattern appears in both .filterIndraResponse and .constructEdgesDataFrame. Consider extracting this into a helper function to reduce duplication and ensure consistent error handling.

Example helper function to add:

#' Extract matches_hash from INDRA statement JSON
#' @param stmt_json_string JSON string from INDRA statement
#' @return matches_hash value or NA_character_ on error
#' @keywords internal
#' @noRd
.extractMatchesHashFromStmtJson <- function(stmt_json_string) {
    tryCatch({
        stmt_json <- fromJSON(stmt_json_string)
        if (is.null(stmt_json$matches_hash)) {
            return(NA_character_)
        }
        as.character(stmt_json$matches_hash)
    }, error = function(e) {
        warning(paste("Failed to parse stmt_json:", e$message))
        return(NA_character_)
    })
}

Then use it in both locations:

# In .filterIndraResponse
stmt_hash <- .extractMatchesHashFromStmtJson(res[[i]]$data$stmt_json)
if (is.na(stmt_hash)) next

# In .constructEdgesDataFrame
stmt_hash = vapply(keys(res), function(x) {
    .extractMatchesHashFromStmtJson(query(res, x)$data$stmt_json)
}, "")

Also applies to: 304-307
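A variant of the proposed helper that takes the parser as an argument can be exercised without network access or real INDRA payloads. In this sketch, `.extractMatchesHash()` and its `parser` argument are hypothetical names, the default `jsonlite::fromJSON` is an assumption about which `fromJSON` the file uses, and `mockParser()` is a made-up stand-in that mimics a valid payload, a payload without `matches_hash`, and a malformed payload.

```r
# Hypothetical variant of the suggested helper. The JSON parser is passed in
# so the function can be tested with mocks; the default assumes jsonlite.
.extractMatchesHash <- function(stmt_json_string, parser = jsonlite::fromJSON) {
    # A missing, non-scalar, or empty payload cannot carry a hash
    if (!is.character(stmt_json_string) || length(stmt_json_string) != 1L ||
        !nzchar(stmt_json_string)) {
        return(NA_character_)
    }
    tryCatch({
        hash <- parser(stmt_json_string)$matches_hash
        # Require exactly one value so vapply(..., "") always gets length 1
        if (is.null(hash) || length(hash) != 1L) NA_character_ else as.character(hash)
    }, error = function(e) {
        warning("Failed to parse stmt_json: ", conditionMessage(e))
        NA_character_
    })
}

# Mock parser standing in for fromJSON (illustration only)
mockParser <- function(txt) {
    if (txt == "bad") stop("lexical error")
    if (txt == "nohash") return(list(type = "Activation"))
    list(matches_hash = txt)
}

payloads <- list("-9876543210", "nohash", "bad", NULL)
hashes <- suppressWarnings(
    vapply(payloads, .extractMatchesHash, character(1), parser = mockParser)
)
hashes
```

Returning `NA_character_` instead of skipping keeps the output aligned one-to-one with the input, which the `vapply()` call in `.constructEdgesDataFrame` requires; `.filterIndraResponse` can still skip `NA` entries with `next`, as in the usage shown in the nitpick.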

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b42471f and fdf4253.

📒 Files selected for processing (1)

R/utils_getSubnetworkFromIndra.R (2 hunks)

🔇 Additional comments (1)

R/utils_getSubnetworkFromIndra.R (1)

128-129: Commit message confirms the PR addresses string issues with statement hashes, and code changes are systematic and complete.

The verification confirms that:

PR intent is validated: The commit message explicitly states "fix(curation): Use matches_hash for statement hash due to string issues," directly addressing the concern about "string issues."

Changes are consistent and complete: Both code locations (lines 129 and 305-306) apply the same pattern—parsing stmt_json and extracting matches_hash instead of directly accessing $data$stmt_hash. A search for remaining direct $data$stmt_hash accesses found none, confirming all references were updated.

Implementation is systematic: The change consistently uses fromJSON(res[[i]]$data$stmt_json) or fromJSON(query(res, x)$data$stmt_json) followed by field extraction ($matches_hash), confirming stmt_json is an expected INDRA response field.

Tests exist: Test file at ./tests/testthat/test-getSubnetworkFromIndra.R validates basic getSubnetworkFromIndra functionality using a mocked INDRA response.

Limitation: The sandbox environment cannot verify that production INDRA API responses consistently contain stmt_json with a matches_hash field, nor can tests be executed to confirm the curation filtering logic works correctly with the new field extraction pattern. Manual testing with actual INDRA API responses is recommended to fully validate the fix.

coderabbitai · 2025-11-05T22:18:58Z

R/utils_getSubnetworkFromIndra.R

+            stmt_json <- fromJSON(res[[i]]$data$stmt_json)
+            stmt_hash <- stmt_json$matches_hash


⚠️ Potential issue | 🟠 Major

Add error handling for JSON parsing and field validation.

The fromJSON call lacks error handling and there's no validation that matches_hash exists in the parsed JSON. If the INDRA API returns malformed JSON or if the structure changes, this will cause runtime failures.

Apply this diff to add defensive error handling:

         for (i in seq_along(res)) {
-            stmt_json <- fromJSON(res[[i]]$data$stmt_json)
-            stmt_hash <- stmt_json$matches_hash
+            stmt_json <- tryCatch(
+                fromJSON(res[[i]]$data$stmt_json),
+                error = function(e) {
+                    warning(paste("Failed to parse stmt_json for statement", i, ":", e$message))
+                    return(NULL)
+                }
+            )
+            if (is.null(stmt_json) || is.null(stmt_json$matches_hash)) {
+                warning(paste("Missing matches_hash for statement", i))
+                next
+            }
+            stmt_hash <- stmt_json$matches_hash
             incorrect_count <- .get_incorrect_curation_count(stmt_hash, api_key)

🤖 Prompt for AI Agents

In R/utils_getSubnetworkFromIndra.R around lines 128-129, the call stmt_json <- fromJSON(res[[i]]$data$stmt_json) and the immediate access to stmt_json$matches_hash assume valid JSON and the presence of matches_hash. Add defensive error handling: wrap fromJSON in tryCatch to catch parsing errors (warn, then skip the statement or set stmt_hash to NA); validate that the parsed object is a list containing the matches_hash field before accessing it; handle a missing or NULL matches_hash the same way; and make sure subsequent code handles NA or absent values safely, so that malformed or changed INDRA responses do not cause runtime failures.

coderabbitai · 2025-11-05T22:18:59Z

R/utils_getSubnetworkFromIndra.R

        stmt_hash = vapply(keys(res), function(x) {
-            as.character(query(res, x)$data$stmt_hash)
+            stmt_json <- fromJSON(query(res, x)$data$stmt_json)
+            stmt_json$matches_hash
        }, ""),


⚠️ Potential issue | 🟠 Major

Add error handling and validation for JSON parsing in vapply.

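Similar to the issue in .filterIndraResponse, this fromJSON call lacks error handling. Additionally, because it runs inside a vapply that expects exactly one character value per key, a missing matches_hash (which evaluates to NULL) makes vapply stop with a length error, and any parsing error aborts the whole call. Both failures are harder to debug than a per-statement warning.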

Apply this diff to add defensive error handling:

         stmt_hash = vapply(keys(res), function(x) {
-            stmt_json <- fromJSON(query(res, x)$data$stmt_json)
-            stmt_json$matches_hash
+            tryCatch({
+                stmt_json <- fromJSON(query(res, x)$data$stmt_json)
+                if (is.null(stmt_json$matches_hash)) {
+                    return(NA_character_)
+                }
+                stmt_json$matches_hash
+            }, error = function(e) {
+                warning(paste("Failed to parse stmt_json for key", x, ":", e$message))
+                return(NA_character_)
+            })
         }, ""),
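The vapply failure mode described in this comment can be reproduced in base R without any INDRA data. When the function passed to `vapply()` returns `NULL`, as `$matches_hash` does when the field is absent, `vapply()` stops with a length error instead of producing a value for that element. The two lists below are made-up stand-ins for parsed statements, not real INDRA output.

```r
# Stand-ins for parsed stmt_json objects; the second lacks matches_hash
parsed <- list(list(matches_hash = "123"), list(type = "Activation"))

# Without a guard, the NULL from the second element aborts the whole call;
# the error message is captured here instead of stopping the script
unguarded <- tryCatch(
    vapply(parsed, function(s) s$matches_hash, ""),
    error = function(e) conditionMessage(e)
)
unguarded

# With a guard, every element yields exactly one character value
guarded <- vapply(parsed, function(s) {
    if (is.null(s$matches_hash)) NA_character_ else as.character(s$matches_hash)
}, "")
guarded
```

The guarded version returns `NA` for the statement without a hash, so downstream code can decide whether to drop or flag that edge.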

🤖 Prompt for AI Agents

In R/utils_getSubnetworkFromIndra.R around lines 304 to 307, the inline fromJSON(query(res, x)$data$stmt_json) call inside vapply lacks error handling and can break the vapply's expected character return; wrap the JSON parsing in a tryCatch that first verifies query(res, x)$data$stmt_json exists and is non-empty, attempt fromJSON inside tryCatch, check that the parsed object has a character matches_hash field, and on any error or missing/invalid value return a safe default (e.g., "" or NA_character_) so the vapply always returns a character vector and failures are logged or flagged for debugging.

tonywu1999 added 2 commits November 5, 2025 17:03

fix(curation): Use matches_hash for statement hash due to string issues

4ac3744

bug fix

fdf4253

coderabbitai bot reviewed Nov 5, 2025

tonywu1999 merged commit 5afd016 into devel Nov 5, 2025
5 checks passed

tonywu1999 deleted the fix-curation branch November 5, 2025 23:52
