Description
In a Slack thread, Nikolay Akhmetov reported entities whose ElasticSearch documents show metadata:null. They can be viewed using the portal:
dataset/b3c76ba5f4b687ee400e8208dffe65bd.json,
sample/2b28ee5d395fb6b0ffcf53132858a8ff.json
metadata is stored in Neo4j as a single Python literal string, which is expected to be parsed into a Python dict using ast.literal_eval(). However, for the Samples submitted, the notes field embedded within the string is delimited with single quotes and also contains single-quoted text, so the string as a whole is not a valid Python literal.
MATCH (e:Entity) WHERE e.uuid IN ['b3c76ba5f4b687ee400e8208dffe65bd','2b28ee5d395fb6b0ffcf53132858a8ff'] RETURN e
shows
'notes': 'the 'source storage duration' of the Block corresponds to the 'cold ischemic time' for the source Organ'
for b3c76ba5f4b687ee400e8208dffe65bd and
'notes': 'the 'source storage duration' of the Block corresponds to the 'cold ischemic time' for the source Organ'
for 2b28ee5d395fb6b0ffcf53132858a8ff.
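A minimal sketch of why such a string cannot be parsed. The surrounding braces and the helper name parse_metadata are assumptions made for illustration; only the notes fragment comes from the query output above, and how the real code handles the failure has not been confirmed.

```python
import ast

# Metadata string shaped like the reported one. The outer braces are an
# assumption; the 'notes' fragment is copied from the Cypher output.
broken = (
    "{'notes': 'the 'source storage duration' of the Block corresponds "
    "to the 'cold ischemic time' for the source Organ'}"
)


def parse_metadata(raw):
    """Hypothetical helper: return the parsed value, or None if raw is not a valid literal."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # The inner single quotes end the string early, so Python sees
        # stray names after it and rejects the whole expression.
        return None


print(parse_metadata(broken))
```

If the indexing code falls back to a null value on a parse failure in a similar way, that would be consistent with the metadata:null seen in ElasticSearch.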
Investigate whether such a string was loaded from a TSV or by some other mechanism that lacks the validation done by, for example, entity-api's POST /entities/sample. Another possibility is that entity-api's verification that Sample.metadata contains JSON needs to be reworked for nested fields. However, invalid JSON in a Request body should cause
"error": "400 Bad Request: The browser (or proxy) sent a request that this server could not understand."
Using entity-api's POST /entities/sample as intended appears to work correctly right now. A JSON body on the HTTP Request of
{
[SNIP]
"protocol_url": "have we fixed the 'single-quote' problem?"
}
results in Neo4j content of
"metadata": "{'notes': "Have we fixed the 'single-quote problem' everywhere?"}"
and ElasticSearch content of
"metadata" : {"notes" : "Have we fixed the 'single-quote problem' everywhere?"}
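The Neo4j content above is what Python itself produces when a dict is converted to its literal form. The following sketch assumes, without having checked the entity-api code, that the stored string comes from repr() or str() of a dict; in that case the quoting is chosen automatically and the value round-trips through ast.literal_eval().

```python
import ast

# A value containing single quotes, as in the working example above.
metadata = {"notes": "Have we fixed the 'single-quote problem' everywhere?"}

# repr() switches the string delimiter to double quotes when the text
# contains single quotes, so the result stays a valid Python literal.
stored = repr(metadata)
print(stored)

# Parsing the stored form recovers the original dict.
assert ast.literal_eval(stored) == metadata
```

A string assembled by hand or by text substitution gets no such protection, which fits the suspicion that the flawed records arrived through a different loading path.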
Existing data in PROD will need to be fixed as reported, and the code that allows such flawed structures into Neo4j needs to be improved. (It is difficult to query for ElasticSearch documents having metadata:null because of how ES treats nulls, and because the workarounds require full scans, which are not enabled on the Production server. A Cypher query may be complex because the flawed data is embedded in a string, but maybe not.)
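One possible way to locate other affected entities is to pull the uuid and the raw metadata string out of Neo4j and test each string in Python rather than in Cypher. The sketch below assumes the rows are already available as (uuid, metadata) pairs, for instance from a query such as MATCH (e:Entity) WHERE e.metadata IS NOT NULL RETURN e.uuid, e.metadata; the function name and the row shape are hypothetical.

```python
import ast


def find_unparseable(rows):
    """Return the uuids whose metadata string does not parse into a dict.

    rows is an iterable of (uuid, raw_metadata) pairs; how they are
    fetched from Neo4j is left out and is an assumption of this sketch.
    """
    bad = []
    for uuid, raw in rows:
        if raw is None:
            # No metadata stored at all; not the problem being hunted.
            continue
        try:
            parsed = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            bad.append(uuid)
            continue
        if not isinstance(parsed, dict):
            # Parsed, but not into the dict shape metadata should have.
            bad.append(uuid)
    return bad


sample_rows = [
    ("good-uuid", "{'notes': \"a 'quoted' phrase\"}"),
    ("bad-uuid", "{'notes': 'a 'quoted' phrase'}"),
    ("empty-uuid", None),
]
print(find_unparseable(sample_rows))
```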
After fixing any reported data
- the entity-api cache for PROD needs to be cleared
- the entity should be examined calling
GET /entities/<ID>
- the entity should be re-indexed using search-api's
PUT /reindex/<ID>
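The last two steps could be scripted roughly as follows. The base URLs are placeholders, the bearer-token header is an assumption about how the services authenticate, and the cache-clearing step is omitted because no endpoint for it is given here. The sketch only builds the requests; sending them would be done with urllib.request.urlopen().

```python
import urllib.request

# Hypothetical placeholders; substitute the real service base URLs.
ENTITY_API = "https://entity-api.example.org"
SEARCH_API = "https://search-api.example.org"


def build_followup_requests(entity_id, token=None):
    """Build the GET /entities/<ID> and PUT /reindex/<ID> requests for one entity."""
    # The Authorization header is an assumption, added only when a token is given.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    examine = urllib.request.Request(
        f"{ENTITY_API}/entities/{entity_id}", headers=headers, method="GET"
    )
    reindex = urllib.request.Request(
        f"{SEARCH_API}/reindex/{entity_id}", headers=headers, method="PUT"
    )
    return [examine, reindex]


for req in build_followup_requests("2b28ee5d395fb6b0ffcf53132858a8ff"):
    print(req.get_method(), req.full_url)
```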