Description
I have been working with this library to extract chemical information from HTML pages.
I followed http://chemdataextractor.org/demo and saved https://pubs.rsc.org/en/content/articlelanding/2015/TC/C5TC02626A as an HTML file (input3.html).
Below is my code.
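from chemdataextractor import Document

# Parse the saved HTML page and serialize the extracted records
with open('input/input3.html', 'rb') as f:
    doc = Document.from_file(f)
records = doc.records.serialize()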
This does not match the records in the JSON output published at https://pubs.rsc.org/en/content/articlelanding/2015/TC/C5TC02626A .
A lot of information is missing, including smiles, fluorescence_lifetimes, etc.
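To make the gap concrete, here is a rough sketch of how the missing fields could be listed. It assumes the serialized records are a list of plain dicts; `missing_keys` and the `sample` data are hypothetical and only stand in for the real output of `doc.records.serialize()`.

```python
def missing_keys(records, expected):
    """Return the expected top-level keys that appear in none of the records."""
    seen = set()
    for record in records:
        seen.update(record.keys())
    return sorted(set(expected) - seen)


# Hypothetical sample shaped like a list of serialized records (not real output)
sample = [
    {"names": ["benzene"]},
    {"names": ["toluene"], "melting_points": [{"value": "-95", "units": "°C"}]},
]

print(missing_keys(sample, ["names", "smiles", "fluorescence_lifetimes"]))
```

With the real `records` list in place of `sample`, this would show which of the fields from the demo JSON never appear locally.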
@mcs07, I was wondering if you could publish the code that was used for the demo.
PS: Is there a method that creates the entire JSON, including abbreviations + biblio + records, or are they extracted separately and stitched together to create the final JSON output?
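If the parts do have to be stitched together by hand, I imagine it would look something like the sketch below. This is only a guess at the approach, not the demo's actual code. The function name `stitch_output`, the output keys, and the assumed shape of each abbreviation as a `(short_tokens, long_tokens, entity_type)` tuple are all my assumptions.

```python
import json


def stitch_output(records, abbreviations, biblio):
    """Combine separately extracted parts into a single JSON string.

    records:       list of dicts, e.g. from doc.records.serialize()
    abbreviations: iterable of (short_tokens, long_tokens, entity_type) tuples
                   (assumed shape; check it against your library version)
    biblio:        dict of bibliographic fields, however they were obtained
    """
    combined = {
        "biblio": biblio,
        "abbreviations": [
            {"short": " ".join(short), "long": " ".join(long_form), "type": kind}
            for short, long_form, kind in abbreviations
        ],
        "records": records,
    }
    return json.dumps(combined, indent=2, ensure_ascii=False)


# Hypothetical inputs for illustration only
example = stitch_output(
    records=[{"names": ["tetrahydrofuran"]}],
    abbreviations=[(["THF"], ["tetrahydrofuran"], "CM")],
    biblio={"title": "Example title"},
)
print(example)
```

Here `records` would be the list from the code above, and the other two arguments would come from whatever calls expose abbreviations and bibliographic data in the installed version.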