
@tonywu1999 tonywu1999 commented Nov 5, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Modified statement identifier extraction to derive values from JSON payloads instead of direct data fields, affecting curation filtering and edge construction processes in the data processing pipeline.

coderabbitai bot commented Nov 5, 2025

Walkthrough

The code modifies statement identifier extraction in two functions: .filterIndraResponse and .constructEdgesDataFrame. Instead of using a direct stmt_hash field, both now parse stmt_json and derive the identifier from its matches_hash value. No JSON parsing error handling is introduced.
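To make the walkthrough concrete, here is a minimal sketch that contrasts the two extraction paths on one mocked entry. It assumes that fromJSON() comes from the jsonlite package (the diff calls it unqualified) and that data$stmt_hash arrives as an R double after JSON decoding. Both are assumptions, and the hash value is made up. The PR does not say what the "string issues" were, so this shows only one plausible failure mode. When as.character() is applied to a double, it keeps about 15 significant digits and switches to scientific notation, so a long integer hash no longer matches its original string form.

```r
library(jsonlite)

# Mocked entry shaped like res[[i]] in the diff. The hash value is made up,
# and stmt_hash being numeric is an assumption for illustration.
entry <- list(
  data = list(
    stmt_hash = 12345678901234567890,  # R stores this literal as a double
    stmt_json = '{"type": "Activation", "matches_hash": "12345678901234567890"}'
  )
)

# Old path: coerce the numeric field to a string.
old_hash <- as.character(entry$data$stmt_hash)

# New path: parse stmt_json and read matches_hash (a JSON string stays a string).
new_hash <- fromJSON(entry$data$stmt_json)$matches_hash

old_hash == new_hash  # FALSE: the coerced double has lost digits
```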

Changes

Cohort / File(s): Statement hash extraction refactor (R/utils_getSubnetworkFromIndra.R)
Summary: Modified .filterIndraResponse and .constructEdgesDataFrame to derive stmt_hash by parsing stmt_json and extracting matches_hash, replacing direct field access to data$stmt_hash.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Area of attention: Verify that matches_hash is reliably present in the stmt_json payload in all expected cases; JSON parsing errors could break existing workflows
  • Area of attention: Confirm that the computed matches_hash produces equivalent results to the previous stmt_hash field for curation filtering and edge construction
  • Area of attention: Check if this change affects performance, particularly for large response batches that require repeated JSON parsing

Poem

🐰 Hops through JSON fields so deep,
Where matches_hash secrets sleep,
No more direct paths we take,
Parsing makes the logic wake,
Refactored bright, the code runs clean!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Description check (⚠️ Warning): The pull request description is empty. The author provided no motivation, context, detailed changes, or testing information.
    Resolution: Add a comprehensive description that includes the motivation and context for the curation fix, detailed bullet points on the changes to the stmt_hash extraction logic, a description of unit tests verifying the matches_hash behavior, and confirmation of the checklist items (styling, documentation, contributing guidelines).
✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title clearly indicates the main change: using matches_hash instead of stmt_hash to fix string-related issues in curation filtering.


github-actions bot commented Nov 5, 2025

Failed to generate code suggestions for PR

@codecov-commenter

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.89%. Comparing base (b42471f) to head (fdf4253).

Files with missing lines: R/utils_getSubnetworkFromIndra.R (patch coverage 50.00%, 2 lines missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##            devel      #64      +/-   ##
==========================================
- Coverage   57.90%   57.89%   -0.02%     
==========================================
  Files           7        7              
  Lines        1354     1356       +2     
==========================================
+ Hits          784      785       +1     
- Misses        570      571       +1     
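The project figures follow from the diff table. Coverage is hits divided by tracked lines, which here equals hits + misses because no partial lines are listed. A quick check in R:

```r
# Project coverage = hits / tracked lines, as a percentage.
coverage_pct <- function(hits, lines) 100 * hits / lines

base_cov <- coverage_pct(784, 1354)  # devel (b42471f)
head_cov <- coverage_pct(785, 1356)  # head of #64 (fdf4253)

sprintf("%.2f%%", c(base_cov, head_cov))  # "57.90%" "57.89%"
head_cov - base_cov                       # about -0.012 percentage points
```

The -0.02% in the report is Codecov's own rendering of this difference. The page does not show its rounding rule, so this sketch does not try to reproduce it.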


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
R/utils_getSubnetworkFromIndra.R (1)

128-129: Consider extracting common JSON parsing logic.

The same JSON parsing pattern appears in both .filterIndraResponse and .constructEdgesDataFrame. Consider extracting this into a helper function to reduce duplication and ensure consistent error handling.

Example helper function to add:

#' Extract matches_hash from INDRA statement JSON
#' @param stmt_json_string JSON string from INDRA statement
#' @return matches_hash value or NA_character_ on error
#' @keywords internal
#' @noRd
.extractMatchesHashFromStmtJson <- function(stmt_json_string) {
    tryCatch({
        stmt_json <- fromJSON(stmt_json_string)
        if (is.null(stmt_json$matches_hash)) {
            return(NA_character_)
        }
        as.character(stmt_json$matches_hash)
    }, error = function(e) {
        warning(paste("Failed to parse stmt_json:", e$message))
        return(NA_character_)
    })
}

Then use it in both locations:

# In .filterIndraResponse
stmt_hash <- .extractMatchesHashFromStmtJson(res[[i]]$data$stmt_json)
if (is.na(stmt_hash)) next

# In .constructEdgesDataFrame
stmt_hash = vapply(keys(res), function(x) {
    .extractMatchesHashFromStmtJson(query(res, x)$data$stmt_json)
}, "")

Also applies to: 304-307
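One way to check the helper's contract (a hash string on success, NA_character_ otherwise) is to run it on a valid payload, a payload without the field, and a malformed payload. This sketch copies the suggested helper as written and assumes that fromJSON() comes from jsonlite. The payloads are made up.

```r
library(jsonlite)

# Helper as suggested above (jsonlite::fromJSON assumed).
.extractMatchesHashFromStmtJson <- function(stmt_json_string) {
  tryCatch({
    stmt_json <- fromJSON(stmt_json_string)
    if (is.null(stmt_json$matches_hash)) {
      return(NA_character_)
    }
    as.character(stmt_json$matches_hash)
  }, error = function(e) {
    warning(paste("Failed to parse stmt_json:", e$message))
    return(NA_character_)
  })
}

payloads <- c(
  valid = '{"matches_hash": "123"}',
  no_hash = '{"type": "Activation"}',
  malformed = '{"matches_hash": '
)

# suppressWarnings() hides the parse warning emitted for the malformed payload.
hashes <- suppressWarnings(
  vapply(payloads, .extractMatchesHashFromStmtJson, "", USE.NAMES = FALSE)
)
hashes  # "123" NA NA
```

Because every path returns a length-one character value, the helper can be passed directly to vapply() with a "" template.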

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b42471f and fdf4253.

📒 Files selected for processing (1)
  • R/utils_getSubnetworkFromIndra.R (2 hunks)
🔇 Additional comments (1)
R/utils_getSubnetworkFromIndra.R (1)

128-129: Commit message confirms the PR addresses string issues with statement hashes, and code changes are systematic and complete.

The verification confirms that:

  1. PR intent is validated: The commit message explicitly states "fix(curation): Use matches_hash for statement hash due to string issues," directly addressing the concern about "string issues."

  2. Changes are consistent and complete: Both code locations (lines 128-129 and 305-306) apply the same pattern: they parse stmt_json and extract matches_hash instead of accessing $data$stmt_hash directly. A search for remaining direct $data$stmt_hash accesses found none, which confirms that all references were updated.

  3. Implementation is systematic: The change consistently uses fromJSON(res[[i]]$data$stmt_json) or fromJSON(query(res, x)$data$stmt_json) followed by field extraction ($matches_hash), confirming stmt_json is an expected INDRA response field.

  4. Tests exist: Test file at ./tests/testthat/test-getSubnetworkFromIndra.R validates basic getSubnetworkFromIndra functionality using a mocked INDRA response.

Limitation: The sandbox environment cannot verify that production INDRA API responses consistently contain stmt_json with a matches_hash field, nor can tests be executed to confirm the curation filtering logic works correctly with the new field extraction pattern. Manual testing with actual INDRA API responses is recommended to fully validate the fix.

Comment on lines +128 to +129
stmt_json <- fromJSON(res[[i]]$data$stmt_json)
stmt_hash <- stmt_json$matches_hash

⚠️ Potential issue | 🟠 Major

Add error handling for JSON parsing and field validation.

The fromJSON call lacks error handling and there's no validation that matches_hash exists in the parsed JSON. If the INDRA API returns malformed JSON or if the structure changes, this will cause runtime failures.

Apply this diff to add defensive error handling:

         for (i in seq_along(res)) {
-            stmt_json <- fromJSON(res[[i]]$data$stmt_json)
-            stmt_hash <- stmt_json$matches_hash
+            stmt_json <- tryCatch(
+                fromJSON(res[[i]]$data$stmt_json),
+                error = function(e) {
+                    warning(paste("Failed to parse stmt_json for statement", i, ":", e$message))
+                    return(NULL)
+                }
+            )
+            if (is.null(stmt_json) || is.null(stmt_json$matches_hash)) {
+                warning(paste("Missing matches_hash for statement", i))
+                next
+            }
+            stmt_hash <- stmt_json$matches_hash
             incorrect_count <- .get_incorrect_curation_count(stmt_hash, api_key)
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 128-129, the call stmt_json <-
fromJSON(res[[i]]$data$stmt_json) and immediate access to stmt_json$matches_hash
assume valid JSON and presence of matches_hash; add defensive error handling:
wrap fromJSON in tryCatch to catch parsing errors (log/warn and skip or set
stmt_hash to NA), validate that parsed object is a list and contains the
matches_hash field before accessing it, handle missing or NULL matches_hash
(log/warn and skip or set NA), and ensure subsequent code branches handle
NA/absence safely so malformed or changed INDRA responses do not cause runtime
failures.
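The suggested loop can be exercised end to end with a mocked response. In this sketch, res is a plain list shaped like res[[i]]$data$stmt_json in the diff. get_incorrect_curation_count_stub() is a hypothetical stand-in for .get_incorrect_curation_count(), which in the package presumably looks up curations using api_key. The sketch also adds the is.list() check that the prompt asks for. Using `$` on an atomic vector raises an error instead of returning NULL, which could happen if stmt_json decoded to a bare JSON string.

```r
library(jsonlite)

# Hypothetical stub: the real .get_incorrect_curation_count() is not shown here.
get_incorrect_curation_count_stub <- function(stmt_hash, api_key) 0L

# Returns the hashes that reached the curation lookup.
filter_statements <- function(res, api_key = NULL) {
  checked <- character(0)
  for (i in seq_along(res)) {
    stmt_json <- tryCatch(
      fromJSON(res[[i]]$data$stmt_json),
      error = function(e) {
        warning(paste("Failed to parse stmt_json for statement", i, ":", e$message))
        NULL
      }
    )
    # !is.list() covers both a failed parse (NULL) and a non-object payload,
    # where `$` would raise an error.
    if (!is.list(stmt_json) || is.null(stmt_json$matches_hash)) {
      warning(paste("Missing matches_hash for statement", i))
      next
    }
    stmt_hash <- as.character(stmt_json$matches_hash)
    incorrect_count <- get_incorrect_curation_count_stub(stmt_hash, api_key)  # unused in this sketch
    checked <- c(checked, stmt_hash)
  }
  checked
}

res <- list(
  list(data = list(stmt_json = '{"matches_hash": "111"}')),
  list(data = list(stmt_json = 'not json')),
  list(data = list(stmt_json = '{"type": "Activation"}')),
  list(data = list(stmt_json = '"just a string"'))
)

suppressWarnings(filter_statements(res))  # "111"
```

Only the well-formed statement reaches the curation lookup. The other three are skipped with warnings and do not stop the loop.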

Comment on lines 304 to 307
         stmt_hash = vapply(keys(res), function(x) {
-            as.character(query(res, x)$data$stmt_hash)
+            stmt_json <- fromJSON(query(res, x)$data$stmt_json)
+            stmt_json$matches_hash
         }, ""),

⚠️ Potential issue | 🟠 Major

Add error handling and validation for JSON parsing in vapply.

As in .filterIndraResponse, this fromJSON call lacks error handling. Because it runs inside a vapply that expects one character value per key, a parsing error aborts the whole call. A missing or non-character matches_hash fails vapply's length or type check with an error that is harder to trace back to the offending statement.

Apply this diff to add defensive error handling:

         stmt_hash = vapply(keys(res), function(x) {
-            stmt_json <- fromJSON(query(res, x)$data$stmt_json)
-            stmt_json$matches_hash
+            tryCatch({
+                stmt_json <- fromJSON(query(res, x)$data$stmt_json)
+                if (is.null(stmt_json$matches_hash)) {
+                    return(NA_character_)
+                }
+                as.character(stmt_json$matches_hash)
+            }, error = function(e) {
+                warning(paste("Failed to parse stmt_json for key", x, ":", e$message))
+                return(NA_character_)
+            })
         }, ""),
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 304 to 307, the inline
fromJSON(query(res, x)$data$stmt_json) call inside vapply lacks error handling
and can break the vapply's expected character return; wrap the JSON parsing in a
tryCatch that first verifies query(res, x)$data$stmt_json exists and is
non-empty, attempt fromJSON inside tryCatch, check that the parsed object has a
character matches_hash field, and on any error or missing/invalid value return a
safe default (e.g., "" or NA_character_) so the vapply always returns a
character vector and failures are logged or flagged for debugging.
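The vapply failure mode is easy to reproduce. This sketch uses a named list in place of the keyed res object and does not reproduce the diff's keys()/query() accessors. It assumes jsonlite's fromJSON() and uses made-up payloads. It compares an unguarded call, which fails as soon as one key lacks matches_hash, with a guarded version along the lines of the suggested diff, which returns NA_character_ for each failing key. It also wraps the hash in as.character(), because a "" template rejects a numeric value.

```r
library(jsonlite)

# Named list standing in for the keyed `res` object; payloads are made up.
res <- list(
  a = list(data = list(stmt_json = '{"matches_hash": "111"}')),
  b = list(data = list(stmt_json = '{"type": "Activation"}')),
  c = list(data = list(stmt_json = '{"matches_hash": '))
)

# Unguarded: key "b" yields NULL, which breaks vapply's
# one-character-value-per-key contract, so the whole call errors.
unguarded <- tryCatch(
  vapply(names(res), function(x) fromJSON(res[[x]]$data$stmt_json)$matches_hash, ""),
  error = function(e) e
)
inherits(unguarded, "error")  # TRUE

# Guarded: every key yields a length-one character value.
extract_hashes <- function(res) {
  vapply(names(res), function(x) {
    tryCatch({
      stmt_json <- fromJSON(res[[x]]$data$stmt_json)
      if (!is.list(stmt_json) || is.null(stmt_json$matches_hash)) {
        NA_character_
      } else {
        # as.character() satisfies the "" template even if the hash is numeric.
        as.character(stmt_json$matches_hash)
      }
    }, error = function(e) {
      warning(paste("Failed to parse stmt_json for key", x, ":", e$message))
      NA_character_
    })
  }, "", USE.NAMES = FALSE)
}

suppressWarnings(extract_hashes(res))  # "111" NA NA
```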

@tonywu1999 tonywu1999 merged commit 5afd016 into devel Nov 5, 2025
5 checks passed
@tonywu1999 tonywu1999 deleted the fix-curation branch November 5, 2025 23:52