From d30ea51edfbe25493e211f9f60880f4911234b41 Mon Sep 17 00:00:00 2001
From: scohenjanes5 <101665992+scohenjanes5@users.noreply.github.com>
Date: Wed, 23 Apr 2025 14:19:03 -0400
Subject: [PATCH 1/5] Add more flexible environment file for linux

Use updated instructions for installing torch via pip, not conda.
---
 env-linux.yml | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 env-linux.yml

diff --git a/env-linux.yml b/env-linux.yml
new file mode 100644
index 0000000..fc33ea9
--- /dev/null
+++ b/env-linux.yml
@@ -0,0 +1,26 @@
+name: cardiogenai_env
+channels:
+  - conda-forge
+  - defaults
+dependencies:
+  - python
+  - ipython
+  - h5py
+  - jupyter
+  - matplotlib
+  - matplotlib-venn
+  - numpy
+  - openbabel
+  - pandas
+  - rdkit
+  - scipy
+  - seaborn
+  - scikit-learn
+  - tqdm
+  - pip
+  - pip:
+    - torch
+    - torchvision
+    - torchaudio
+    - torch-geometric
+    - --extra-index-url https://download.pytorch.org/whl/cu126
\ No newline at end of file

From 34f24696955a7b81b6332e6a1aaec3f964dd0e8b Mon Sep 17 00:00:00 2001
From: scohenjanes5 <101665992+scohenjanes5@users.noreply.github.com>
Date: Wed, 23 Apr 2025 14:33:07 -0400
Subject: [PATCH 2/5] rename

---
 env-linux.yml => env-flex.yml | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename env-linux.yml => env-flex.yml (100%)

diff --git a/env-linux.yml b/env-flex.yml
similarity index 100%
rename from env-linux.yml
rename to env-flex.yml

From fdd317e25a26f04d94e936b3c30ad771c740e008 Mon Sep 17 00:00:00 2001
From: scohenjanes5 <101665992+scohenjanes5@users.noreply.github.com>
Date: Wed, 23 Apr 2025 14:37:08 -0400
Subject: [PATCH 3/5] Update install instructions

---
 README.md | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index e9e5507..bddcf6e 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,10 @@ The link between in vitro hERG ion channel inhibition and subsequent in vivo QT
 The CardioGenAI framework combines generative and discriminative ML models to re-engineer hERG-active compounds for reduced hERG channel inhibition while preserving their pharmacological activity. An autoregressive transformer is trained on a dataset that we previously curated which contains approximately 5 million unique and valid SMILES strings derived from ChEMBL 33, GuacaMol v1, MOSES, and BindingDB datasets. The model is trained autoregressively, receiving a sequence of SMILES tokens as context as well as the corresponding molecular scaffold and physicochemical properties, and iteratively predicting each subsequent token in the sequence. Once trained, this model is able to generate valid molecules conditioned on a specified molecular scaffold along with a set of physicochemical properties. For an input hERG-active compound, the generation is conditioned on the scaffold and physicochemical properties of this compound. Each generated compound is subject to filtering based on activity against hERG, NaV1.5 and CaV1.2 channels. Depending on the desired activity against each channel, the framework employs either classification models to include predicted non-blockers (i.e., pIC50 value ≥ 5.0) or regression models to include compounds within a specified range of predicted pIC50 values. Both the classification and regression models utilize the same architecture, and are trained using three feature representations of each molecule: a feature vector that is extracted from a bidirectional transformer trained on SMILES strings, a molecular fingerprint, and a graph. For each molecule in the filtered generated ensemble and the input hERG-active molecule, a feature vector is constructed from the 209 chemical descriptors available through the RDKit Descriptors module. The redundant descriptors are then removed according to pairwise mutual information calculated for every possible pair of descriptors. Cosine similarity is then calculated between the processed descriptor vector of the input molecule and the descriptor vectors of every generated molecule to identify the molecules most chemically similar to the input molecule but with desired activity against each of the cardiac ion channels.
 
 ## Installation and Setup
-Follow these instructions to install and set up CardioGenAI on your local Windows machine:
+Follow these instructions to install and set up CardioGenAI on your local machine:
+
+### Installing Miniconda
+Miniconda is a minimal installer for conda, a command-line tool for managing Python environments. Conda lets multiple combinations of package versions coexist on the same machine, so projects with different requirements can each use their own environment. Please follow the [official instructions](https://www.anaconda.com/docs/getting-started/miniconda/install) to install it before continuing.
 
 ### Cloning the Repository
 Clone the CardioGenAI repository to your local environment using the following command:
@@ -28,12 +31,20 @@ cd CardioGenAI
 ```
 
 ### Setting Up the Conda Environment
-Create a Conda environment using the `environment.yml` file provided in the repository which contains all of the necessary dependencies:
+There are two environment files. `environment.yml` contains enough information to exactly recreate the environment used to prepare the manuscript, but it only works on Windows. `env-flex.yml` also works on Linux (untested on macOS), but it will resolve to slightly different package versions than those in the Windows-only environment.
+
+Either way, you will create the conda environment using a `.yml` file provided in the repository, which contains all of the necessary dependencies:
 
+#### Exact Recreation of Manuscript Environment (Windows Only)
 ```
 conda env create -f environment.yml
 ```
+#### Flexible Install (Linux, Windows; macOS untested)
+```
+conda env create -f env-flex.yml
+```
 
+#### Activation
 Activate the newly created environment:
 
 ```

From 9876402fd79da882d58b5d81189fa1186646f307 Mon Sep 17 00:00:00 2001
From: scohenjanes5 <101665992+scohenjanes5@users.noreply.github.com>
Date: Wed, 23 Apr 2025 18:02:55 -0400
Subject: [PATCH 4/5] Additional info about CUDA

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index bddcf6e..5857f5c 100644
--- a/README.md
+++ b/README.md
@@ -40,6 +40,11 @@ Either way, you will create the conda environment using a `.yml` file provided i
 conda env create -f environment.yml
 ```
 #### Flexible Install (Linux, Windows; macOS untested)
+
+* `env-flex.yml` points pip at the PyTorch wheel index for CUDA 12.6 (`--extra-index-url https://download.pytorch.org/whl/cu126`). To target a different CUDA version, replace the last line of `env-flex.yml` with the index URL listed on the [official PyTorch site](https://pytorch.org/).
+
+* At the time of writing, PyTorch does not provide CUDA builds for macOS, so remove the `--extra-index-url` line at the end of `env-flex.yml` before creating the environment.
+
 ```
 conda env create -f env-flex.yml
 ```

From 6726ba5bfd34e79eeecb1b6295ed8d6ac4aff2b2 Mon Sep 17 00:00:00 2001
From: scohenjanes5 <101665992+scohenjanes5@users.noreply.github.com>
Date: Wed, 30 Apr 2025 09:43:12 -0400
Subject: [PATCH 5/5] Create .gitignore

---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..0d20b64
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+*.pyc
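The CUDA notes in patch 4 ask the reader to edit the last line of `env-flex.yml` by hand, either swapping the `--extra-index-url` URL or deleting it on macOS. As a minimal sketch that is not part of this series, the hypothetical helper below (`set_torch_index_url`, script name `set_torch_index.py`, both invented here for illustration) performs that edit with only the Python standard library. It assumes the pip section still contains a single `- --extra-index-url <URL>` line, as added in patch 1.

```python
"""Hypothetical helper (not part of CardioGenAI): adjust the PyTorch wheel
index in env-flex.yml. Sketch only; assumes a single
`- --extra-index-url <URL>` line in the pip section."""
import re
import sys
from pathlib import Path
from typing import Optional

# Matches e.g. "    - --extra-index-url https://download.pytorch.org/whl/cu126"
# and captures the leading "    - " so the YAML indentation is preserved.
INDEX_LINE = re.compile(r"^(?P<indent>\s*-\s*)--extra-index-url\s+\S+\s*$")


def set_torch_index_url(env_text: str, index_url: Optional[str]) -> str:
    """Return env_text with the --extra-index-url line replaced.

    If index_url is None (the macOS case described in patch 4), the line
    is dropped so pip uses its default index instead.
    """
    out = []
    for line in env_text.splitlines():
        match = INDEX_LINE.match(line)
        if match is None:
            out.append(line)  # unrelated line: keep unchanged
        elif index_url is not None:
            out.append(f"{match.group('indent')}--extra-index-url {index_url}")
        # else: index_url is None, so the matched line is omitted
    # env-flex.yml has no trailing newline; keep whatever the input had.
    trailing = "\n" if env_text.endswith("\n") else ""
    return "\n".join(out) + trailing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python set_torch_index.py env-flex.yml [INDEX_URL]
    path = Path(sys.argv[1])
    url = sys.argv[2] if len(sys.argv) > 2 else None
    path.write_text(set_torch_index_url(path.read_text(), url))
```

For example, `python set_torch_index.py env-flex.yml <index-url-from-pytorch.org>` swaps the CUDA index, and `python set_torch_index.py env-flex.yml` removes it, after which `conda env create -f env-flex.yml` proceeds as in the README. A line-based regex is used instead of a YAML parser so that the file's existing formatting and comments survive the edit untouched.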