Remove dependency on SMILES representation to accommodate any given representation #63

@MorganCThomas

Description

Following a discussion with @marcellocostamagna and team, who wish to benchmark 3D representations, it makes sense to remove the dependency on SMILES as both identifiers and representations, since SMILES may not be available or may not be correct.

Main tasks

  • Have score() accept molecule_ids: Optional so that users can specify their own identifiers; if none are given, use only the index. check_duplicates should use molecule_ids if present.
  • Refactor additional_formats into representations: Dict (or call it "molecules", or even simply "inputs", e.g., if multiple conformers are input), which will be passed to scoring functions only. This will hold user-specified key-value pairs, where the key is the representation name and the value is a list of molecules for that representation. If "smiles" is a representation, cache them, parse them, and use them as an additional identifier, as in the current implementation. Alternatively, we could accept any kwarg and convert it into the representations dict, which is potentially more backwards compatible but less explicit.
  • Make parse_smiles() run only if SMILES are present, and make RDKit sanitization optional.
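A minimal sketch of how these three tasks could fit together, not the actual MolScore implementation. The helper `resolve_ids`, the `smiles_parser` argument, and the returned dict layout are hypothetical names introduced only for illustration; duplicate checking here is limited to a single batch, and the parser is passed in as a callable so that the sketch does not depend on RDKit.

```python
from typing import Callable, Dict, List, Optional, Sequence


def resolve_ids(n: int, molecule_ids: Optional[Sequence[str]] = None) -> List[str]:
    """Return user-supplied identifiers, or fall back to the batch index."""
    if molecule_ids is None:
        return [str(i) for i in range(n)]
    if len(molecule_ids) != n:
        raise ValueError("molecule_ids must have one entry per molecule")
    return [str(m) for m in molecule_ids]


def check_duplicates(ids: Sequence[str]) -> List[bool]:
    """Flag identifiers that already appeared earlier in the same batch."""
    seen = set()
    flags = []
    for identifier in ids:
        flags.append(identifier in seen)
        seen.add(identifier)
    return flags


def score(
    representations: Dict[str, list],
    molecule_ids: Optional[Sequence[str]] = None,
    smiles_parser: Optional[Callable[[str], object]] = None,  # hypothetical hook
) -> dict:
    """Hypothetical score() entry point that does not require SMILES."""
    # Every representation must describe the same number of molecules.
    lengths = {len(values) for values in representations.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one representation, all of equal length")
    n = lengths.pop()

    ids = resolve_ids(n, molecule_ids)
    result = {"ids": ids, "duplicate": check_duplicates(ids)}

    # SMILES parsing runs only when a "smiles" representation is present.
    if "smiles" in representations and smiles_parser is not None:
        result["parsed"] = [smiles_parser(s) for s in representations["smiles"]]
    return result
```

With this shape, a call such as `score({"xyz": [...]}, molecule_ids=[...])` never touches SMILES parsing, while a call that includes a "smiles" key can opt into it.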

Possible issues

Everything currently depends on SMILES: scoring_functions, the GUI monitor, score_metrics, etc.

Mitigating actions

  • Maintain the smiles variable for backwards compatibility, or grab the "smiles" key from representations.
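One way this could look, as a rough sketch under assumptions: `build_representations` and `legacy_smiles` are hypothetical helper names, not existing MolScore functions. The first folds a legacy `smiles` argument and any extra kwargs into a single representations dict; the second recovers the old `smiles` variable for code that still expects it.

```python
from typing import Dict, List, Optional


def build_representations(
    smiles: Optional[List[str]] = None,
    representations: Optional[Dict[str, list]] = None,
    **kwargs: list,
) -> Dict[str, list]:
    """Merge the legacy smiles argument and arbitrary kwargs into one dict."""
    merged: Dict[str, list] = dict(representations or {})

    # Any extra keyword argument is treated as a named representation.
    for name, values in kwargs.items():
        if name in merged:
            raise ValueError(f"representation {name!r} given twice")
        merged[name] = list(values)

    # The legacy positional/keyword smiles argument maps to the "smiles" key.
    if smiles is not None:
        if "smiles" in merged:
            raise ValueError("smiles given both directly and in representations")
        merged["smiles"] = list(smiles)
    return merged


def legacy_smiles(representations: Dict[str, list]) -> Optional[List[str]]:
    """Return the "smiles" entry for older code paths, or None if absent."""
    return representations.get("smiles")
```

Components that still rely on SMILES (for example the GUI monitor) would then need to handle a `None` return when no "smiles" representation was supplied.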

I think many other things will break, and we will come up with solutions as they appear.

Metadata

Labels

enhancement: New feature or request
