This repo accompanies the preprint "Enhanced Thompson Sampling by Roulette Wheel Selection for Screening Ultra-Large Combinatorial Libraries".
Create a new conda environment:
conda create -n ts_test python=3.11
Activate your environment and install the rest of the requirements:
conda activate ts_test
pip install -r requirements.txt
python ./src/ts_paper.py quinazoline_fp_sim.json
Code in src is outdated and will not be maintained
python ./src_multiprocess/ts_main.py input.json
Note that there is a multiprocessing overhead. If time_for_scoring_single_compound * min_cpds_per_core > 0.1 s, there could be a gain from multiprocessing.
Required params:
evaluator_arg: The argument to be passed to the instantiation of the evaluator (e.g. smiles string of the query molecule, filepath to the mol file, etc.) See the relevantEvaluatorfor more info.evaluator_class_name: Required. The scoring function to use. Must be "FPEvaluator" for 2D similarity. To use your own scoring function, implement a subclass of the Evaluator baseclass in evaluators.py.reaction_smarts: Required. The SMARTS string for the reaction.reagent_file_list: Required. List of filepaths - one for each component of the reaction. Each file should contain the smiles strings of valid reagents for that component.
Optional params:
percent_of_libray: Optional. Default 0.1%. Percent of library to be screened. If 0.1% to be screened, set as 0.001 instead of 0.1 (a bit confusing).num_warmup_trials: Optional. Default 3.min_cpds_per_core: Optional. Default 50. The minimum number of compounds collected for scoring per core per iteration.scaling: Optional. Default 1. +1 if higher score is preferred; -1 otherwise.stop: Optional. Default 6000. Stop searching when without sampling a new compound for a specified number of consecutive attempts.results_filename: Optional. Default "./results.csv". Name of the file to output results to.log_filename: Optional. Log filename to save logs to. If not set, logging will be printed to stdout.hide_progress: Optional. Defaut true. false otherwise.nprocesses: Optional. Default to use all CPU cores.