From dd37deb11164d461744fb98f913fe075d0b09d50 Mon Sep 17 00:00:00 2001
From: April Shen
Date: Wed, 28 Jan 2026 11:24:24 +0000
Subject: [PATCH 1/2] make upload script executable and update SOP

---
 bin/upload_to_gcloud.py                        |  0
 docs/open-targets/generate-evidence-strings.md | 12 +++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)
 mode change 100644 => 100755 bin/upload_to_gcloud.py

diff --git a/bin/upload_to_gcloud.py b/bin/upload_to_gcloud.py
old mode 100644
new mode 100755
diff --git a/docs/open-targets/generate-evidence-strings.md b/docs/open-targets/generate-evidence-strings.md
index 251189e6..99d3a27d 100644
--- a/docs/open-targets/generate-evidence-strings.md
+++ b/docs/open-targets/generate-evidence-strings.md
@@ -53,6 +53,16 @@ Nevertheless, we also report evidence strings in which ``diseaseFromSourceMappe
 
 ## 2. Manual follow-up actions
 
+### Check removed mappings and invalid evidence
+The pipeline removes mappings that would violate the current JSON schema, and outputs them to `${BATCH_ROOT}/logs/removed_mappings.tsv`.
+Check that this file is empty, or only contains expected mappings (e.g. within the material entity branch of EFO).
+This can be confirmed with the Open Targets data team if needed.
+
+Any invalid evidence strings are dropped and output in `${BATCH_ROOT}/evidence/invalid_evidence.json`.
+Check that this file is empty, or only contains expected evidence given the current state of development (e.g.
+unsupported clinical significance terms or other unsupported features).
+Anything unusual should be raised with the Open Targets team so it can be addressed as a priority.
+
 ### Update summary metrics
 After the evidence strings have been generated, summary metrics need to be updated in the Google Sheets [table](https://docs.google.com/spreadsheets/d/1g_4tHNWP4VIikH7Jb0ui5aNr0PiFgvscZYOe69g191k/) on the “Raw statistics” sheet.
 
@@ -63,7 +73,7 @@ The evidence string file (`evidence_strings.json`) must be compressed and upload
 To do this, run the following:
 ```shell
 gzip evidence_strings/evidence_strings.json
-${CODE_ROOT}/bin/upload_to_gcloud.py --input-file evidence_strings/evidence_strings.json.gz --destination-folder disease-target-evidence
+${PYTHON_BIN} ${CODE_ROOT}/bin/upload_to_gcloud.py --input-file evidence_strings/evidence_strings.json.gz --destination-folder disease-target-evidence
 ```
 
 Once the upload is complete, send an email to Open Targets (data [at] opentargets.org) containing the following information from the [metrics spreadsheet](https://docs.google.com/spreadsheets/d/1g_4tHNWP4VIikH7Jb0ui5aNr0PiFgvscZYOe69g191k/):

From 7ea0277c1b89ec20829671752b57bd2829192f75 Mon Sep 17 00:00:00 2001
From: April Shen
Date: Wed, 28 Jan 2026 15:46:37 +0000
Subject: [PATCH 2/2] update command in SOP

---
 docs/open-targets/generate-evidence-strings.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/open-targets/generate-evidence-strings.md b/docs/open-targets/generate-evidence-strings.md
index 99d3a27d..832f72f9 100644
--- a/docs/open-targets/generate-evidence-strings.md
+++ b/docs/open-targets/generate-evidence-strings.md
@@ -73,7 +73,7 @@ The evidence string file (`evidence_strings.json`) must be compressed and upload
 To do this, run the following:
 ```shell
 gzip evidence_strings/evidence_strings.json
-${PYTHON_BIN} ${CODE_ROOT}/bin/upload_to_gcloud.py --input-file evidence_strings/evidence_strings.json.gz --destination-folder disease-target-evidence
+${CODE_ROOT}/env/bin/upload_to_gcloud.py --input-file evidence_strings/evidence_strings.json.gz --destination-folder disease-target-evidence
 ```
 
 Once the upload is complete, send an email to Open Targets (data [at] opentargets.org) containing the following information from the [metrics spreadsheet](https://docs.google.com/spreadsheets/d/1g_4tHNWP4VIikH7Jb0ui5aNr0PiFgvscZYOe69g191k/):
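
Note on the new "Check removed mappings and invalid evidence" step: the SOP asks the operator to confirm that two files are empty or contain only expected entries. A first pass over both files could be scripted along the lines below. This is only a sketch and is not part of the patch. The two relative paths come from the SOP text. The helper names (`count_nonblank_lines`, `check_batch`, `report`) are hypothetical, and counting non-blank lines assumes one record per line, which the patch does not confirm (a header row in the TSV, if present, would also be counted).

```python
from pathlib import Path

# Paths relative to BATCH_ROOT, taken from the SOP text added by the patch.
FILES_TO_CHECK = (
    Path("logs") / "removed_mappings.tsv",
    Path("evidence") / "invalid_evidence.json",
)


def count_nonblank_lines(path):
    """Return the number of non-blank lines in path, or None if it is missing."""
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_batch(batch_root):
    """Map each file to its non-blank line count (None if the file is missing).

    Assumption: one record per line. The real file layouts are not
    specified in the patch, so treat the counts as a hint only.
    """
    root = Path(batch_root)
    return {rel.as_posix(): count_nonblank_lines(root / rel) for rel in FILES_TO_CHECK}


def report(batch_root):
    """Return one human-readable status line per checked file."""
    lines = []
    for name, count in check_batch(batch_root).items():
        if count is None:
            lines.append(f"{name}: missing")
        elif count == 0:
            lines.append(f"{name}: empty")
        else:
            lines.append(f"{name}: {count} non-blank line(s), review manually")
    return lines
```

With `BATCH_ROOT` exported, `print("\n".join(report(os.environ["BATCH_ROOT"])))` (after `import os`) would print the status of both files. Anything reported as non-empty still needs the manual review described in the SOP, since the script cannot tell expected entries from unusual ones.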