StructGuy

Installation

Warning

StructGuy is designed to run on high-performance computing machines. The installation downloads UniRef50 and UniRef90 and calculates their search index tables with MMseqs2, which requires 1 TB of disk space and 800 GB of memory. StructGuy can still be installed on regular computers that provide the disk space, but we do not recommend running a full model training on them.
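
Because the installation needs about 1 TB of disk space, it can help to check the target filesystem before starting. The following is a minimal sketch, not part of StructGuy: the helper names free_kb and has_free_kb and the variable REQUIRED_KB are assumptions made for illustration, and the check covers disk space only, not memory.

```shell
#!/usr/bin/env bash
# Pre-flight disk space check (illustrative sketch, not shipped with StructGuy).

# Print the free space, in KiB, of the filesystem that holds the given path.
free_kb() {
    # -P forces one output line per filesystem; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding $1 has at least $2 KiB available.
has_free_kb() {
    local available
    available=$(free_kb "$1") || return 1
    [ -n "$available" ] && [ "$available" -ge "$2" ]
}

# 1 TB (10^12 bytes) expressed in KiB.
REQUIRED_KB=976562500
```

A possible use, with the storage path as a placeholder: has_free_kb [path to storage folder] "$REQUIRED_KB" || echo "not enough free disk space" >&2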

Step 0:

StructGuy needs to be installed on top of StructMAn.

Step 1:

  • Clone the repository:
    git clone https://github.com/kalininalab/structguy.git
  • Navigate into the repository folder. Cloning creates a folder named structguy that contains the installation script install.sh:
    cd structguy

Step 2:

From inside the cloned repository, call the installation script with the following options:

  • -e Name of the conda environment that contains an installation of StructMAn.
  • -s Path where the search tables are downloaded. It needs to provide 1 TB of disk space!
  • -v Activates verbose output.

Warning

You need to be in the conda base environment to call the install.sh script.

./install.sh -e [name of StructMAn environment] -s [path to storage folder] -v
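
The requirement to call the script from the conda base environment can be checked before launching it. The sketch below assumes that conda exports the name of the active environment in CONDA_DEFAULT_ENV and that the base environment is named "base". The function names in_conda_base and run_install are hypothetical and not part of StructGuy.

```shell
#!/usr/bin/env bash
# Guarded call of install.sh (illustrative sketch).

# Succeed only if the active conda environment is "base".
in_conda_base() {
    [ "${CONDA_DEFAULT_ENV:-}" = "base" ]
}

# Call install.sh with the flags described above, but only from the base environment.
# $1: name of the StructMAn conda environment, $2: path to the storage folder.
run_install() {
    local structman_env=$1 storage_dir=$2
    if ! in_conda_base; then
        echo "error: activate the conda base environment first" >&2
        return 1
    fi
    ./install.sh -e "$structman_env" -s "$storage_dir" -v
}
```

Called as run_install [name of StructMAn environment] [path to storage folder] from inside the structguy folder, this refuses to start when another environment is active.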

Warning

In addition to downloading the search tables, this script uses MMseqs2 to calculate index tables for them. Depending on the number of available CPUs, this step can take multiple hours.

Usage

Feature Generation

Whether a dataset is used for training or for prediction, a feature table has to be calculated for it first. The first step is the calculation of structural features with the StructMAn annotation pipeline. To this end, the dataset needs to be prepared in a format that StructMAn can process, which is explained in this tutorial.

Calling StructMAn:

structman -i [path to dataset] -n [number of threads]
  • -i Path to a StructMAn-readable dataset file.
  • -n Maximum number of threads to use.

StructMAn generates a config file named [name of dataset].structguy_project.conf in the corresponding output directory. This file is required for all subsequent calls to StructGuy.

Calling the non-structural feature generation script:

structguy generate_features -i [path to structguy_project.conf] -n [number of threads]
  • -i Path to the structguy_project.conf file produced by StructMAn.
  • -n Maximum number of threads to use.
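
The two feature generation steps can be chained in a small wrapper. This is a sketch under assumptions: the function names run and generate_feature_table and the DRY_RUN switch are hypothetical, and the path to the config file is passed explicitly because the location of the StructMAn output directory depends on your setup.

```shell
#!/usr/bin/env bash
# Chain both feature generation steps (illustrative sketch).

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: dataset file, $2: structguy_project.conf written by StructMAn, $3: threads.
generate_feature_table() {
    local dataset=$1 conf=$2 threads=$3
    # Step 1: structural features via the StructMAn annotation pipeline.
    run structman -i "$dataset" -n "$threads" || return 1
    # Step 2: non-structural features via StructGuy.
    run structguy generate_features -i "$conf" -n "$threads"
}
```

Setting DRY_RUN=1 before calling generate_feature_table prints both commands without executing them, which is a cheap way to check the paths first.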

Model Training

Tip

The easiest way to use StructGuy is to download the model we trained in (add_link_to_publication_later) from Hugging Face.

Without Hyperparameter Optimization

structguy build_model -i [path to name_of_dataset.structguy_project.conf] --nocv --nohpo -n [number of threads]
  • -i Path to the structguy_project.conf file produced by StructMAn.
  • --nocv Skips all cross-validation setups and trains directly on the full dataset.
  • --nohpo Skips the hyperparameter optimization.
  • -n Maximum number of threads to use.

With Hyperparameter Optimization

Warning

This consumes large amounts of computing resources and time.

structguy build_model -i [path to name_of_dataset.structguy_project.conf] -n [number of threads]
  • -i Path to the structguy_project.conf file produced by StructMAn.
  • -n Maximum number of threads to use.

Applying a Model

structguy predict -i [path to name_of_dataset.structguy_project.conf] -m [path to model.dump file] -n [number of threads]
  • -i Path to the structguy_project.conf file produced by StructMAn.
  • -m Path to an already trained model, either generated by structguy build_model or downloaded from Hugging Face.
  • -n Maximum number of threads to use.
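
When several datasets have been processed, the same model can be applied to each of them in a loop. The sketch below assumes that all structguy_project.conf files have been collected in one directory. The function name predict_all and the DRY_RUN switch are hypothetical and not part of StructGuy.

```shell
#!/usr/bin/env bash
# Apply one trained model to every project config in a directory (illustrative sketch).

# $1: directory with *.structguy_project.conf files, $2: model file, $3: threads.
predict_all() {
    local project_dir=$1 model=$2 threads=$3 conf
    for conf in "$project_dir"/*.structguy_project.conf; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$conf" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be executed.
            printf '%s\n' "structguy predict -i $conf -m $model -n $threads"
        else
            structguy predict -i "$conf" -m "$model" -n "$threads" || return 1
        fi
    done
}
```

The loop stops at the first failing prediction, so a broken config does not go unnoticed in a long batch.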
