Warning
StructGuy is designed for high-performance computing (HPC) machines. The installation downloads UniRef50 and UniRef90 and calculates their search index tables with MMseqs2, which requires 1 TB of disk space and 800 GB of memory. StructGuy can still be installed on regular computers that provide the disk space; however, we do not recommend running a full model training on them.
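Before installing, you can check whether a machine meets these requirements. The following Bash sketch is a hypothetical helper and not part of StructGuy. It assumes a Linux system (it reads /proc/meminfo), treats 1 TB as roughly 1000 GiB, and expects the storage folder to exist already.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of StructGuy. Linux-only (reads /proc/meminfo).
# Thresholds follow the figures quoted above and can be overridden via the environment.
REQUIRED_DISK_GB=${REQUIRED_DISK_GB:-1000}
REQUIRED_MEM_GB=${REQUIRED_MEM_GB:-800}

# Prints the space available on the file system that holds directory $1, in GiB.
free_disk_gb() {
  # -P: one POSIX-formatted line per file system; -k: sizes in 1024-byte blocks.
  # Field 4 of the data line is the available space.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Prints the total physical memory in GiB (MemTotal is reported in kB).
total_mem_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Returns 0 if the storage folder $1 and the machine meet both thresholds,
# otherwise prints what is missing to stderr and returns 1.
meets_requirements() {
  local disk mem status=0
  disk=$(free_disk_gb "$1")
  mem=$(total_mem_gb)
  if [ -z "$disk" ] || [ -z "$mem" ]; then
    echo "Could not determine free disk space for $1 or total memory." >&2
    return 1
  fi
  if [ "$disk" -lt "$REQUIRED_DISK_GB" ]; then
    echo "Only ${disk} GiB free under $1, need about ${REQUIRED_DISK_GB}." >&2
    status=1
  fi
  if [ "$mem" -lt "$REQUIRED_MEM_GB" ]; then
    echo "Only ${mem} GiB of memory, need about ${REQUIRED_MEM_GB}." >&2
    status=1
  fi
  return "$status"
}
```

For example, `meets_requirements [path to storage folder] && echo ok`. On a regular computer where only the disk space matters, setting REQUIRED_MEM_GB=0 beforehand restricts the check to the storage folder.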
StructGuy needs to be installed on top of StructMAn.
- Clone the repository:
git clone https://github.com/kalininalab/structguy.git
- Navigate into the repository folder:
Cloning the repository creates a folder named structguy that contains the installation script install.sh.
cd structguy
Go into the cloned repository and call the installation script:
-e: Name of the conda environment that contains the StructMAn installation.
-s: Path to the folder where the search tables are downloaded; it needs to provide 1 TB of disk space!
-v: Activates verbose output.
Warning
You need to be in the conda base environment to call the install.sh script.
./install.sh -e [name of StructMAn environment] -s [path to storage folder] -v
Warning
In addition to downloading the search tables, this script uses MMseqs2 to calculate index tables for them. Depending on the number of available CPUs, this step can take several hours.
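To avoid starting the long installation from the wrong environment, the call can be wrapped in a check. This Bash sketch is a hypothetical wrapper, not part of the StructGuy repository; it relies on conda setting CONDA_DEFAULT_ENV to the name of the active environment and passes the documented -e, -s and -v options to install.sh unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the StructGuy repository.

# Succeeds only when the active conda environment is "base";
# conda sets CONDA_DEFAULT_ENV to the name of the activated environment.
in_conda_base() {
  [ "${CONDA_DEFAULT_ENV:-}" = "base" ]
}

# $1: name of the StructMAn conda environment, $2: storage folder for the search tables.
# Refuses to start install.sh outside the base environment, then calls it
# with the documented -e, -s and -v options.
install_structguy() {
  if [ "$#" -ne 2 ]; then
    echo "Usage: install_structguy <structman_env> <storage_folder>" >&2
    return 1
  fi
  if ! in_conda_base; then
    echo "Run 'conda activate base' before calling install.sh." >&2
    return 1
  fi
  ./install.sh -e "$1" -s "$2" -v
}
```

Run it from inside the structguy folder, e.g. `install_structguy [name of StructMAn environment] [path to storage folder]` after `conda activate base`.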
Whether you want to train on a dataset or make predictions for it, a corresponding feature table has to be calculated first. The first step is to calculate structural features with the StructMAn annotation pipeline. For this, the dataset needs to be prepared in a format that StructMAn can process, which is explained in this tutorial.
structman -i [path to dataset] -n [number of threads]
-i: Path to a StructMAn-readable dataset file.
-n: Maximum number of threads to use.
StructMAn generates a config file named [name of dataset].structguy_project.conf in the corresponding output directory.
It is required for all subsequent StructGuy calls.
structguy generate_features -i [path to structguy_project.conf] -n [number of threads]
-i: Path to the structguy_project.conf file produced by StructMAn.
-n: Maximum number of threads to use.
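Because StructMAn names the config file after the dataset, the next call needs its exact path. The Bash sketch below is a hypothetical helper, not part of StructGuy or StructMAn. It assumes you know StructMAn's output directory for the run and that this directory holds exactly one *.structguy_project.conf file. The commented lines show how it could chain the two documented calls, with nproc (GNU coreutils) supplying the thread count.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of StructGuy or StructMAn.

# Prints the single *.structguy_project.conf file in directory $1.
# Fails if there is none or more than one, so a stale config is not picked up by accident.
find_project_conf() {
  local -a matches=()
  local f
  for f in "$1"/*.structguy_project.conf; do
    # Without nullglob an unmatched pattern stays literal, so keep only real files.
    [ -f "$f" ] && matches+=("$f")
  done
  if [ "${#matches[@]}" -ne 1 ]; then
    echo "Expected exactly one *.structguy_project.conf in $1, found ${#matches[@]}." >&2
    return 1
  fi
  printf '%s\n' "${matches[0]}"
}

# Example use (placeholders in brackets, thread count taken from nproc):
#   structman -i [path to dataset] -n "$(nproc)"
#   conf=$(find_project_conf [output directory]) &&
#     structguy generate_features -i "$conf" -n "$(nproc)"
```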
Tip
The easiest way to use StructGuy is to download the model we trained in (add_link_to_publication_later) from Hugging Face and use it with structguy predict.
structguy build_model -i [path to name_of_dataset.structguy_project.conf] --nocv --nohpo -n [number of threads]
-i: Path to the structguy_project.conf file produced by StructMAn.
--nocv: Skips all cross-validation setups and trains directly on the full dataset.
--nohpo: Skips the hyperparameter optimization.
-n: Maximum number of threads to use.
Warning
A full training run, including cross-validation and hyperparameter optimization, consumes large amounts of computing resources and time.
structguy build_model -i [path to name_of_dataset.structguy_project.conf] -n [number of threads]
-i: Path to the structguy_project.conf file produced by StructMAn.
-n: Maximum number of threads to use.
structguy predict -i [path to name_of_dataset.structguy_project.conf] -m [path to model.dump file] -n [number of threads]
-i: Path to the structguy_project.conf file produced by StructMAn.
-m: Path to an already trained model, either generated by structguy build_model or downloaded from Hugging Face.
-n: Maximum number of threads to use.
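The build_model and predict calls can be collected in a small job script. This Bash sketch is hypothetical and not part of StructGuy. It only uses the options documented above. Because this page does not say where build_model stores its model.dump file, the model path is passed in explicitly. Setting DRY_RUN=1 prints the commands instead of running them, which helps when checking a script before submitting it to a cluster.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the documented build_model and predict calls.
# With DRY_RUN=1 the commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-0}

# Executes its arguments as a command, or prints them when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1: project config, $2: threads, $3: "quick" (--nocv --nohpo, the default) or "full".
train_model() {
  local conf=$1 threads=$2 mode=${3:-quick}
  case "$mode" in
    quick) run structguy build_model -i "$conf" --nocv --nohpo -n "$threads" ;;
    full)  run structguy build_model -i "$conf" -n "$threads" ;;
    *) echo "Unknown mode: $mode (use quick or full)" >&2; return 1 ;;
  esac
}

# $1: project config, $2: path to a model.dump file, $3: threads.
predict_dataset() {
  run structguy predict -i "$1" -m "$2" -n "$3"
}
```

For example, after setting DRY_RUN=1, `predict_dataset [path to name_of_dataset.structguy_project.conf] [path to model.dump file] 16` only prints the corresponding structguy predict command.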