HoVpred

Installation

The environment requires Linux with CUDA 11.0.

conda env create -f environment.yml
conda activate hovpred

Data

The training data originates from NIST and DIPPR databases, which are not freely available. The CSV files in data/ contain only the molecular identifiers (SMILES) and temperatures, with the enthalpy of vaporization values redacted. As a result, only the prediction workflow can be reproduced using the provided pre-trained model weights; model training and cross-validation require access to the original databases.

Usage

python main.py [-h] [-predict] [-watsoneq] [-K_fold] [-maxatoms MAXATOMS]
               [-lr LR] [-epoch EPOCH] [-batchsize BATCHSIZE] [-layers LAYERS]
               [-heads HEADS] [-residcon] [-explicitH] [-dropout DROPOUT]
               [-modelname MODELNAME] [-num_hidden NUM_HIDDEN] [-train_only]
               [-loss LOSS] [-sw_thr SW_THR] [-sw_decay SW_DECAY]

Optional arguments:

  -h, --help            show this help message and exit
  -predict              If specified, prediction is carried out
                        (default=False)
  -watsoneq             whether to use watson equation (default=False)
  -K_fold               whether to run KFoldCV (default=False)
  -maxatoms MAXATOMS    Maximum number of atoms in a molecule (default=64)
  -lr LR                Learning rate (default=5.0e-4)
  -epoch EPOCH          epoch (default=200)
  -batchsize BATCHSIZE  batch_size (default=256)
  -layers LAYERS        number of gnn layers (default=5)
  -heads HEADS          number of gat heads (default=5)
  -residcon             whether to use residual connection (default=True)
  -explicitH            whether to use explicit hydrogens (default=False)
  -dropout DROPOUT      dropout rate (default=0.0)
  -modelname MODELNAME  model name (default=an array of hyperparameter values)
  -num_hidden NUM_HIDDEN
                        number of nodes in hidden layers (default=32)
  -train_only           If specified, no 8:1:1 split is carried out, the whole
                        database is used for training (default=False)
  -loss LOSS            loss function (default=mse). Options - mae, mse,
                        kl_div_normal
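
The -watsoneq option refers to the Watson equation, a standard correlation that scales an enthalpy of vaporization from one temperature to another using the critical temperature. How main.py applies it is not described here, so the following is only a minimal sketch of the textbook relation. The function name `watson_hov`, its parameters, and the commonly used exponent 0.38 are assumptions for illustration, not taken from the repository.

```python
def watson_hov(hov_ref, t_ref, t_target, t_crit, exponent=0.38):
    """Scale an enthalpy of vaporization from t_ref to t_target (Watson relation).

    All temperatures are absolute (K) and must lie below t_crit.
    The result has the same unit as hov_ref.
    Hypothetical helper; not part of the HoVpred code.
    """
    if not (0 < t_ref < t_crit and 0 < t_target < t_crit):
        raise ValueError("temperatures must be positive and below the critical temperature")
    reduced_ref = t_ref / t_crit        # reduced temperature at the reference point
    reduced_target = t_target / t_crit  # reduced temperature at the target point
    return hov_ref * ((1.0 - reduced_target) / (1.0 - reduced_ref)) ** exponent


# Water: about 40.65 kJ/mol at its normal boiling point (373.15 K), Tc about 647.1 K.
hov_298 = watson_hov(40.65, t_ref=373.15, t_target=298.15, t_crit=647.1)
print(f"Estimated HoV of water at 298.15 K: {hov_298:.1f} kJ/mol")
```

The relation predicts that the enthalpy of vaporization falls as the temperature rises and vanishes at the critical point, which is the qualitative behavior a temperature-dependent model is expected to reproduce.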

Prediction

Prepare a molecules_to_predict.csv file with two columns: smiles and temperature, then run:

python main.py -predict -modelname best_211007 -loss kl_div_normal
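
The input file described above can also be generated from a script. The following is a minimal sketch using only the standard library; the example molecules, the helper name `write_prediction_input`, and the assumption that temperatures are given in kelvin are illustrative and should be checked against the molecules_to_predict.csv shipped with the repository.

```python
import csv

# Hypothetical example rows: (SMILES, temperature). The temperature unit is
# assumed to be kelvin here; verify against the provided example file.
rows = [
    ("CCO", 298.15),        # ethanol
    ("CCO", 351.5),         # ethanol at a second temperature
    ("c1ccccc1", 298.15),   # benzene
]


def write_prediction_input(path, rows):
    """Write (smiles, temperature) pairs to a CSV with the two expected columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles", "temperature"])  # header named in the README
        writer.writerows(rows)


# Note: this overwrites any existing file of the same name in the current directory.
write_prediction_input("molecules_to_predict.csv", rows)
```

One row per molecule and temperature pair keeps the file flat, so the same SMILES string can appear several times when predictions at multiple temperatures are needed.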

Training (requires NIST/DIPPR data)

python main.py -modelname test_model -loss kl_div_normal

Non-default hyperparameters can also be tested by adding the corresponding arguments from the list above, for example -layers 3 or -dropout 0.1.
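
When several hyperparameter settings are to be compared, the command line can be assembled programmatically. The following is a minimal sketch; the helper `build_training_command` is hypothetical, handles only flags that take a value (not switches such as -explicitH), and uses only flag names listed in the usage text above.

```python
import subprocess
import sys


def build_training_command(modelname, loss="kl_div_normal", **overrides):
    """Return the argv list for a training run with extra hyperparameter flags.

    Each keyword in `overrides` becomes "-<name> <value>", e.g. layers=3 -> "-layers 3".
    Hypothetical helper; not part of the HoVpred code.
    """
    command = [sys.executable, "main.py", "-modelname", modelname, "-loss", loss]
    for name, value in overrides.items():
        command += [f"-{name}", str(value)]
    return command


command = build_training_command("test_model", layers=3, heads=4, dropout=0.1)
print(" ".join(command))
# Uncomment to launch the run from the repository root (requires the training data):
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and giving each run its own -modelname keeps the runs distinguishable.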
