A Python package for analyzing matched molecular series (MMS) and proposing new molecules based on existing series with similar structure-activity relationships (SAR).
- Molecular fragmentation using SMARTS-based transformations
- Fragment database creation and querying
- Series matching with customizable parameters
- Efficient data handling with Polars DataFrames
- RDKit-based chemical structure processing
git clone https://github.com/driesvr/matchmolseries.git
cd matchmolseries
pip install .

For development installation:
git clone https://github.com/driesvr/matchmolseries.git
cd matchmolseries
pip install -e ".[dev]"

Requirements:
- Python >= 3.8
- RDKit >= 2022.9.1
- Pandas >= 1.3.0
- Polars >= 0.18.0
- NumPy >= 1.20.0
from matchmolseries import MatchMolSeries
import pandas as pd
# Initialize MatchMolSeries
mms = MatchMolSeries()
# Prepare your data
data = pd.DataFrame({
'smiles': ['c1ccccc1F', 'c1ccccc1Cl', 'c1ccccc1Br'],
'potency': [1.0, 2.0, 3.0],
'assay': ['assay1', 'assay1', 'assay1']
})
# Fragment reference molecules
fragments = mms.fragment_molecules(data)
# Optional: save fragments to a Parquet file and load them back
mms.save_fragments('fragments.parquet')
mms.load_fragments('fragments.parquet')
# Query with new molecules
query_data = pd.DataFrame({
'smiles': ['c1cnccc1F', 'c1cnccc1Br'],
'potency': [1.5, 2.5],
'assay': ['assay1', 'assay1']
})
# Find matching series
matches = mms.query_fragments(query_data, min_series_length=2)
# Stitch fragments together into full molecules
matches = mms.combine_fragments(matches)

MatchMolSeries: the main class for molecular fragmentation and analysis.
- fragment_molecules(input_df, ...): Fragment molecules from an input DataFrame
- query_fragments(query_dataset, ...): Query the fragment database
- combine_fragments(core_smiles, fragment_smiles): Combine a core with fragments into full molecules
- load_fragments(path): Load a fragment database
- save_fragments(path): Save the fragment database
The fragment dataset is stored in memory for rapid access. It can be saved as a Parquet or CSV file.
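
The idea behind series matching can be pictured with a small, self-contained sketch. It is not the package's implementation: the function `find_matching_series`, the toy dictionaries, and the `[*:1]` attachment-point notation are assumptions made for illustration. A reference series counts as a match when it shares at least `min_series_length` fragments with the query, and its remaining fragments become candidates for new molecules.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: core SMILES -> {fragment SMILES: potency}.
# The [*:1] attachment-point notation is an illustrative assumption.
reference_series: Dict[str, Dict[str, float]] = {
    "c1ccc([*:1])cc1": {"F": 1.0, "Cl": 2.0, "Br": 3.0, "I": 3.5},
    "C1CCC([*:1])CC1": {"F": 2.0, "O": 1.0},
}

# Query series on a different core: fragment SMILES -> potency.
query_series: Dict[str, float] = {"F": 1.5, "Br": 2.5}


def find_matching_series(
    query: Dict[str, float],
    reference: Dict[str, Dict[str, float]],
    min_series_length: int = 2,
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (core, shared fragments, novel fragments) for each match."""
    matches = []
    for core, fragments in reference.items():
        # Fragments present in both the query and this reference series.
        shared = sorted(set(query) & set(fragments))
        if len(shared) >= min_series_length:
            # Fragments the reference has explored but the query has not.
            novel = sorted(set(fragments) - set(query))
            matches.append((core, shared, novel))
    return matches


matches = find_matching_series(query_series, reference_series)
for core, shared, novel in matches:
    print(f"core={core} shared={shared} candidates={novel}")
```

Here only the aromatic reference series matches, because the second series shares a single fragment with the query. A real workflow would additionally rank matches by how similar the potency trends are.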
This project is licensed under the MIT License - see the LICENSE file for details.
The MMS method was originally introduced by Wawer and Bajorath. Ehmki and Kramer recommended the cRMSD metric for assessing series similarity. O'Boyle et al. proposed the fragmentation patterns used herein.
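
As a rough illustration of the cRMSD metric mentioned above, the sketch below assumes the usual centered-RMSD definition: each series is centered on its own mean potency before the root mean square deviation is taken, so a constant potency offset between two series does not count against their similarity. The function name `crmsd` is hypothetical, and the package's own calculation may differ in detail.

```python
import math
from typing import Sequence


def crmsd(query: Sequence[float], reference: Sequence[float]) -> float:
    """Centered RMSD between two equally long potency vectors.

    Values must be ordered so that position i refers to the same
    fragment in both series.
    """
    if len(query) != len(reference) or len(query) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(query)
    mean_q = sum(query) / n
    mean_r = sum(reference) / n
    # Subtract each series' own mean, then compare the centered values.
    squared = sum(
        ((q - mean_q) - (r - mean_r)) ** 2 for q, r in zip(query, reference)
    )
    return math.sqrt(squared / n)


# Potencies for the shared fragments F and Br, taken from the usage example.
query_potency = [1.5, 2.5]
reference_potency = [1.0, 3.0]
score = crmsd(query_potency, reference_potency)
print(score)
```

Lower values indicate more similar SAR trends; two series that differ only by a constant shift give a cRMSD of zero.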