Open
Labels
enhancement: New feature or request
Description
Following a discussion with @marcellocostamagna and team who wish to benchmark 3D representations, it makes sense to remove the dependency on SMILES as both identifiers and representations, as SMILES may not be available or necessarily correct.
Main tasks
- Have `score()` accept `molecule_ids: Optional` to allow user specified identifiers, if not, use only the index. `check_duplicates` should use `molecule_ids` if present.
- Refactor `additional_formats` to `representations: Dict` (or call it "molecules" or even simply "inputs" e.g., if multiple conformers input) which will be passed to scoring functions only. This will be user specified key, value pairs where the key is the representation name and the value is a list of molecules per representation. If "smiles" is a representation, cache them, parse them and use as an additional identifier as per current implementation. Additionally we could accept any `kwarg` and convert it into the `representations` dict, potentially more backwards compatible but less explicit.
- Make `parse_smiles()` only run if SMILES are present and make rdkit sanitization optional.
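As a rough sketch of the first two tasks, something like the following could work. The helper names `resolve_ids`, `build_representations` and `find_duplicates` are hypothetical and are not part of the current code base; only `molecule_ids` and `representations` come from the proposal above. It assumes identifiers are hashable and that every representation holds one entry per molecule.

```python
from typing import Any, Dict, List, Optional, Sequence


def resolve_ids(n_molecules: int,
                molecule_ids: Optional[Sequence[Any]] = None) -> List[Any]:
    """Return user-specified identifiers, or fall back to the positional index."""
    if molecule_ids is None:
        return list(range(n_molecules))
    if len(molecule_ids) != n_molecules:
        raise ValueError("molecule_ids must have one entry per molecule")
    return list(molecule_ids)


def build_representations(representations: Optional[Dict[str, Sequence[Any]]] = None,
                          **kwargs: Sequence[Any]) -> Dict[str, List[Any]]:
    """Merge an explicit dict and any extra keyword arguments into one dict."""
    merged = {name: list(values) for name, values in (representations or {}).items()}
    for name, values in kwargs.items():
        if name in merged:
            raise ValueError(f"representation '{name}' was given twice")
        merged[name] = list(values)
    # Every representation should describe the same batch of molecules.
    if len({len(values) for values in merged.values()}) > 1:
        raise ValueError("all representations must have the same length")
    return merged


def find_duplicates(ids: Sequence[Any]) -> List[bool]:
    """Flag each entry whose identifier already appeared earlier in the batch."""
    seen = set()
    flags = []
    for identifier in ids:
        flags.append(identifier in seen)
        seen.add(identifier)
    return flags
```

For example, `build_representations({"smiles": [...]}, conformers=[...])` would yield a dict with both keys, which is the "accept any kwarg" variant; dropping the `**kwargs` path gives the more explicit option.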
Possible issues
Everything currently depends on SMILES: scoring_functions, the GUI monitor, score_metrics, etc.
Mitigating actions
- Maintain `smiles` variable for backwards compatibility, or grab "smiles" key from `representations`.
I think many other things will break, and we can come up with solutions as they appear.
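One possible shape for the backwards-compatibility path, again only as a sketch: `get_smiles` and `maybe_parse_smiles` are hypothetical names, and the parser is passed in as a plain callable so the example does not depend on RDKit. In practice the callable could wrap something like `Chem.MolFromSmiles(smi, sanitize=False)` to make sanitization optional.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def get_smiles(representations: Dict[str, Sequence[Any]],
               smiles: Optional[Sequence[str]] = None) -> Optional[List[str]]:
    """Prefer the legacy `smiles` argument, then the "smiles" key, else None."""
    if smiles is not None:
        return list(smiles)
    if "smiles" in representations:
        return list(representations["smiles"])
    return None


def maybe_parse_smiles(representations: Dict[str, Sequence[Any]],
                       parser: Callable[[str], Any],
                       smiles: Optional[Sequence[str]] = None) -> Optional[List[Any]]:
    """Run the parser only when SMILES are available; otherwise skip parsing."""
    found = get_smiles(representations, smiles)
    if found is None:
        return None
    return [parser(s) for s in found]
```

Callers that only supply 3D inputs would get `None` back and could skip any SMILES-dependent step, while existing code that still passes `smiles` would behave as before.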