
MolField hassubstruct performance for complex molecules #28

@ivannnnnnnnnn


Hello! Sorry if this question is somewhat off-topic, and sorry for my English. Maybe the django-rdkit community can help me with my problem.

I am using django-rdkit to store mol objects, and my database holds 10 million molecules. With this amount of data I run into trouble when I select molecules that contain a target molecule as a substructure, if the target molecule is complex.

For example, filtering with hassubstruct='c1ccccc1' is fast, but filtering with hassubstruct='COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1' gives a very slow query.

Has anyone run into the same problem, and do you have any recommendations for improving performance?

My next question is: which algorithm does the RDKit cartridge use for the hassubstruct (@>) operation? Does anyone know of articles about it, or can someone explain it? I'm asking because I think there may be ways to speed up the search with data mining. For example, instead of using the exact lookup to find a specific molecule, I store its SMILES in a separate field of the same model and search on that field. Perhaps substructure search could be simplified in a similar way.
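As far as I understand it, the RDKit cartridge answers @> in two stages: a GiST index built over substructure fingerprints first screens out molecules that cannot possibly match, and only the remaining candidates go through RDKit's full substructure (subgraph-isomorphism) match. Below is a toy, pure-Python sketch of that screen-then-verify idea. It assumes molecules are hand-built labeled graphs; the feature set and the backtracking matcher are simplified, hypothetical stand-ins, not RDKit's pattern fingerprint or its VF2-based matcher.

```python
# Toy two-stage substructure search: fingerprint screen, then exact match.
# Hypothetical illustration only: molecules are hand-built labeled graphs and
# the "fingerprint" is a plain feature set, not RDKit's pattern fingerprint.

def make_mol(atoms, bonds):
    """atoms: list of labels; bonds: list of (i, j, order) tuples."""
    adj = {i: {} for i in range(len(atoms))}
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return {"atoms": atoms, "adj": adj}


def ring(n, order):
    """Bonds closing atoms 0..n-1 into a single ring."""
    return [(i, (i + 1) % n, order) for i in range(n)]


def features(mol):
    """Screening features: every atom label plus every bonded label pair with its order.
    If query is a substructure of target, features(query) <= features(target)."""
    feats = set(mol["atoms"])
    for i, nbrs in mol["adj"].items():
        for j, order in nbrs.items():
            a, b = sorted((mol["atoms"][i], mol["atoms"][j]))
            feats.add((a, b, order))
    return frozenset(feats)


def has_substruct(target, query):
    """Exact check by backtracking: map each query atom to a distinct target atom
    with the same label so that every query bond exists in the target with the same order."""
    q_atoms, t_atoms = query["atoms"], target["atoms"]
    mapping, used = {}, set()

    def extend(qi):
        if qi == len(q_atoms):
            return True
        for ti in range(len(t_atoms)):
            if ti in used or t_atoms[ti] != q_atoms[qi]:
                continue
            # Bonds to already-mapped query neighbours must exist in the target too.
            if all(target["adj"][ti].get(mapping[qj]) == order
                   for qj, order in query["adj"][qi].items() if qj in mapping):
                mapping[qi] = ti
                used.add(ti)
                if extend(qi + 1):
                    return True
                del mapping[qi]
                used.discard(ti)
        return False

    return extend(0)


def search(db, query):
    """Return (names of matching molecules, number of candidates that needed the exact check)."""
    q_fp = features(query)
    candidates = [name for name, (_, fp) in db.items() if q_fp <= fp]  # cheap screen
    hits = [name for name in candidates if has_substruct(db[name][0], query)]  # expensive
    return hits, len(candidates)


BENZENE = make_mol(["c"] * 6, ring(6, "ar"))
TOLUENE = make_mol(["c"] * 6 + ["C"], ring(6, "ar") + [(0, 6, 1)])
PYRIDINE = make_mol(["n"] + ["c"] * 5, ring(6, "ar"))
ETHANOL = make_mol(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
METHANOL = make_mol(["C", "O"], [(0, 1, 1)])
CO_QUERY = make_mol(["C", "O"], [(0, 1, 1)])

# Fingerprints are precomputed once, playing the role of the index.
DB = {name: (mol, features(mol)) for name, mol in [
    ("benzene", BENZENE), ("toluene", TOLUENE), ("pyridine", PYRIDINE),
    ("ethanol", ETHANOL), ("methanol", METHANOL)]}

print(search(DB, BENZENE))  # (['benzene', 'toluene'], 3)
```

Here pyridine passes the screen but fails the exact check, which is the kind of false positive the second stage exists to remove. In this model the cost of a query is dominated by how many rows survive the screen and how expensive each exact match is, so one thing worth checking is whether the mol column actually has a GiST index; without one, every row goes through the exact check. The SMILES-field trick works for exact lookup because equality can be answered by an ordinary index, whereas "contains this substructure" cannot be reduced to string equality, which is why a screen like this is needed.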
