GitHub - msahilgit/Unsupervised-Random-Forest: URF is a self-supervised version of the traditional supervised random forest algorithm, facilitating feature selection in protein biophysics to resolve protein conformational representations.

Unbiased learning of protein conformational representation via unsupervised random forest

Accurate data representation is paramount in molecular dynamics simulations to capture the functionally relevant motions of proteins. Traditional feature selection methods, while effective, often rely on labeled data, which limits their applicability to novel systems. Here, we present unsupervised random forest (URF), a self-supervised adaptation of traditional random forests that identifies functionally critical features without requiring prior labels. URF-selected features highlight key functional regions, enabling the identification of important residues in diverse proteins. By implementing a memory-efficient version, we demonstrate URF's capability to resolve functional states in around 10 diverse systems, including folded and intrinsically disordered proteins, performing on par with or surpassing 16 leading baseline methods. Crucially, URF is guided by an internal metric, the learning coefficient, which automates hyperparameter optimization and makes the method robust and user-friendly. Benchmarking results reveal URF's distinct ability to produce functionally meaningful representations compared with previously reported methods, facilitating downstream analyses such as Markov state modeling. The investigation presented here establishes URF as a leading tool for unsupervised representation learning in protein biophysics.
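A common way to make a random forest work without labels is to train it to tell real frames apart from synthetic frames whose features have been shuffled column by column, then read off the feature importances and the leaf-sharing "proximity" between real frames, which can be clustered into states. The sketch below shows that classical construction with scikit-learn and SciPy. It is an assumption that URF follows this scheme internally; it does not implement URF's learning coefficient, and `fit_unsupervised_rf`, `compute_proximity`, and `cluster_states` are hypothetical names, not the `URF.model` API. For large proximity matrices, `fastcluster.linkage` (fastcluster is listed under Dependencies) can stand in for SciPy's `linkage`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def make_synthetic(X, rng):
    """Shuffle each column independently: marginals are kept, correlations are lost."""
    X_syn = np.empty_like(X)
    for j in range(X.shape[1]):
        X_syn[:, j] = rng.permutation(X[:, j])
    return X_syn


def fit_unsupervised_rf(X, n_estimators=200, min_samples_leaf=5, seed=0):
    """Train a forest to separate real frames (label 1) from synthetic ones (label 0)."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X, make_synthetic(X, rng)])
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
    forest = RandomForestClassifier(
        n_estimators=n_estimators, min_samples_leaf=min_samples_leaf, random_state=seed
    )
    return forest.fit(X_all, y_all)


def compute_proximity(forest, X):
    """Fraction of trees in which each pair of real frames lands in the same leaf."""
    leaves = forest.apply(X)  # leaf index per frame and tree, shape (n_frames, n_trees)
    prox = np.zeros((len(X), len(X)))
    for t in range(leaves.shape[1]):
        prox += leaves[:, t, None] == leaves[None, :, t]
    return prox / leaves.shape[1]


def cluster_states(prox, n_states):
    """Average-linkage hierarchical clustering on the distance 1 - proximity."""
    dist = 1.0 - prox
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_states, criterion="maxclust")


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f0 = np.repeat([-2.0, 2.0], 100) + 0.5 * rng.standard_normal(200)  # two toy "states"
    f1 = f0 + 0.05 * rng.standard_normal(200)  # strongly coupled to f0
    f2 = rng.standard_normal(200)  # uninformative noise feature
    X = np.column_stack([f0, f1, f2])
    forest = fit_unsupervised_rf(X)
    print("feature importances:", forest.feature_importances_.round(3))
    print("state labels:", cluster_states(compute_proximity(forest, X), 2))
```

Shuffling columns preserves each feature's distribution but breaks the couplings between features, so the forest can only separate real from synthetic frames by learning the joint structure of the real data; features carrying that structure (here `f0` and `f1`) receive high importance, while the noise feature does not.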

Reference

This repository is the implementation of the URF protocol, corresponding to the publication (ref.).

MAIN
├── URF : the unsupervised-random-forest module
│
├── data : scripts for data estimation from MD trajectories
│   ├── ASH1
│   ├── LJ polymer
│   ├── P450_binding
│   ├── P450_channel1
│   ├── SIC1
│   ├── T4L
│   ├── asyn
│   ├── mopR
│   ├── mopR_ensembles
│   ├── pASH1
│   └── pSIC1
│
├── scripts : scripts for reproducibility of results
│   ├── 0_python_modules
│   ├── ASH1
│   ├── LJ_polymer
│   ├── P450_binding
│   ├── P450_channel1
│   ├── SIC1
│   ├── T4L
│   ├── asyn
│   ├── baseline
│   │   ├── mopr
│   │   └── t4l
│   ├── functional_regions
│   │   ├── mopr
│   │   ├── t4l
│   │   └── diffnet
│   ├── hyperparameters
│   ├── mopR
│   ├── msm
│   │   ├── mopr
│   │   ├── asyn
│   │   └── vampnet
│   ├── optimization
│   ├── pASH1
│   └── pSIC1
│
└── usage : guidelines/tutorials for using URF

Dependencies

NumPy
scikit-learn
numba
tqdm
fastcluster
scipy
joblib
tables (PyTables; only needed by certain functions of proximity_matrix.py, off by default)
copy, multiprocessing, sys, gc, pickle (Python standard library, no installation needed)
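Several of the listed modules (`copy`, `multiprocessing`, `sys`, `gc`, `pickle`) ship with Python itself, and scikit-learn is imported as `sklearn`. The following is a small sketch for checking which listed modules are importable in the active environment, using only `importlib.util.find_spec`; the grouping into required, standard-library, and optional lists is an assumption drawn from the list above, not a script provided by the repository.

```python
import importlib.util

# Import names of the dependencies listed above (grouping is an assumption).
REQUIRED = ["numpy", "sklearn", "numba", "tqdm", "fastcluster", "scipy", "joblib"]
STDLIB = ["copy", "multiprocessing", "sys", "gc", "pickle"]
OPTIONAL = ["tables"]  # PyTables, only for parts of proximity_matrix.py


def check_dependencies(names):
    """Map each top-level module name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for group, names in [("required", REQUIRED), ("stdlib", STDLIB), ("optional", OPTIONAL)]:
        for name, found in check_dependencies(names).items():
            print(f"{group:9s} {name:16s} {'ok' if found else 'MISSING'}")
```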

Installation

conda create --name urf python=3.9
conda activate urf
git clone https://github.com/msahilgit/Unsupervised-Random-Forest
cd Unsupervised-Random-Forest/
pip install -e .
# see also 'alternative.txt' for use without installation

Usage

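from URF.model import unsupervised_random_forest as urf

# data: feature matrix from MD trajectories, assumed shape (n_frames, n_features)
dobj = urf()
dobj.fit(data)
lc, fimp = dobj.get_output()  # lc: learning coefficient, fimp: feature importances (assumed meaning)
# see usage/t{1,2}.ipynb for details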
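The `data` passed to `dobj.fit` is a two-dimensional feature matrix with one row per trajectory frame. As a hedged sketch, assuming inter-atom (for example C-alpha) pairwise distances as features and assuming `fimp` is a 1D array with one importance per feature column, the helpers below build such a matrix from coordinates and map the highest-ranked features back to atom pairs. `pairwise_distance_features` and `top_feature_pairs` are hypothetical helpers, not part of URF; the repository's `data/` scripts may compute different features.

```python
import numpy as np


def pairwise_distance_features(coords):
    """Turn coordinates of shape (n_frames, n_atoms, 3) into per-frame distance features.

    Returns a (n_frames, n_pairs) matrix and the (i, j) atom indices of each column.
    """
    n_atoms = coords.shape[1]
    i_idx, j_idx = np.triu_indices(n_atoms, k=1)  # each unordered pair once
    diff = coords[:, i_idx, :] - coords[:, j_idx, :]
    features = np.linalg.norm(diff, axis=-1)
    pairs = np.column_stack([i_idx, j_idx])
    return features, pairs


def top_feature_pairs(fimp, pairs, k=5):
    """Return the k atom pairs with the largest importance, highest first."""
    order = np.argsort(fimp)[::-1][:k]
    return [(tuple(int(v) for v in pairs[c]), float(fimp[c])) for c in order]
```

Keeping the `pairs` array alongside the feature matrix is what lets importances computed on anonymous columns be read back as residue-level information.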
