Supplementary material for "Residue Cluster Classes Efficiently Identify Protein Interfaces and Interactions"
This site contains the training and testing datasets used to evaluate Residue Cluster Classes (RCC) vectors in classifying true and false protein-peptide interactions (PpI). For each peptide-protein pair, an RCC vector was derived from each of the two three-dimensional structures as previously reported (Fontove F and Del Rio G, 2020), and the two vectors were summed or concatenated to generate a single vector representing the PpI. Twenty-four datasets were used to generate twenty-four Weka models, which are available in the Models directory.
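The two ways of combining a peptide vector and a protein vector can be illustrated with a minimal Java sketch. It is not the code used in the study: the class name PpiVectors, the method names, the toy three-element vectors, and the peptide-first order in the concatenation are all assumptions made for illustration. Real RCC vectors have the length defined in the cited method.

```java
import java.util.Arrays;

// Hypothetical helper: combines two RCC count vectors into one PpI vector.
public class PpiVectors {

    // Element-wise sum; both vectors must have the same length.
    public static int[] sum(int[] peptide, int[] protein) {
        if (peptide.length != protein.length) {
            throw new IllegalArgumentException("RCC vectors must have equal length");
        }
        int[] out = new int[peptide.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = peptide[i] + protein[i];
        }
        return out;
    }

    // Concatenation; peptide entries first (an assumed order), then protein entries.
    public static int[] concatenate(int[] peptide, int[] protein) {
        int[] out = Arrays.copyOf(peptide, peptide.length + protein.length);
        System.arraycopy(protein, 0, out, peptide.length, protein.length);
        return out;
    }

    public static void main(String[] args) {
        int[] peptide = {1, 0, 2};   // toy values, not real RCC counts
        int[] protein = {4, 3, 0};
        System.out.println(Arrays.toString(sum(peptide, protein)));         // [5, 3, 2]
        System.out.println(Arrays.toString(concatenate(peptide, protein))); // [1, 0, 2, 4, 3, 0]
    }
}
```

Summing keeps the feature count equal to that of a single RCC vector but loses which partner contributed each count, whereas concatenation doubles the feature count and keeps the two partners separate.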
The best-performing models were then used to test how well they generalize, by assessing their ability to identify the regions relevant to the interaction in known protein-protein interaction (PPI) complexes reported in the 3DID database. The list of PPIs used in this study is available in the 3DID directory. The CodesRCCPI directory contains the Java code used to identify the regions (peptides) of protein A that are likely to interact with protein B, together with instructions to compile and run it. To validate those predictions, we used the 3did_flat.tsv file provided by the 3DID database, and we also wrote code to identify the peptides derived from protein A that are closest to protein B. This code is available in the CodesCP directory.
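The idea of finding the peptide of protein A that lies closest to protein B can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the code in CodesCP: the class name ClosestPeptide, the use of one coordinate per residue, the fixed window length, and the minimum-distance criterion are all hypothetical choices.

```java
// Hypothetical sketch: slide a fixed-length window (a "peptide") along protein A
// and report the window whose nearest point to protein B is closest.
public class ClosestPeptide {

    // Euclidean distance between two 3D points.
    public static double distance(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Smallest distance between any residue in a[start .. start+len-1] and any point of b.
    public static double windowDistance(double[][] a, int start, int len, double[][] b) {
        double best = Double.POSITIVE_INFINITY;
        for (int i = start; i < start + len; i++) {
            for (double[] q : b) {
                best = Math.min(best, distance(a[i], q));
            }
        }
        return best;
    }

    // Start index of the window of protein A closest to protein B (first one on ties).
    public static int closestWindowStart(double[][] a, int len, double[][] b) {
        if (len <= 0 || len > a.length || b.length == 0) {
            throw new IllegalArgumentException("invalid window length or empty protein B");
        }
        int bestStart = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int start = 0; start + len <= a.length; start++) {
            double d = windowDistance(a, start, len, b);
            if (d < bestDist) {
                bestDist = d;
                bestStart = start;
            }
        }
        return bestStart;
    }

    public static void main(String[] args) {
        // Toy coordinates: protein A lies along the x axis, protein B is one point near its end.
        double[][] proteinA = {{0, 0, 0}, {1, 0, 0}, {2, 0, 0}, {3, 0, 0}, {4, 0, 0}};
        double[][] proteinB = {{4, 1, 0}};
        System.out.println(closestWindowStart(proteinA, 2, proteinB)); // 3
    }
}
```

In practice the coordinates would be read from the structure files of the complex, and the window length would match the peptide length used for the RCC-based predictions.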