An ultra-fast tool that reduces the number of attributes (features) in a very large dataset without degrading dataset quality. It does this by identifying clusters of linearly related (and therefore redundant) features and keeping only the feature 'nearest' to all the other features in each cluster.
Tested on huge datasets, and mathematically sound. Read the unfinished draft here.
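To make the clustering idea above concrete, here is a minimal sketch using pandas and NetworkX. It is only an illustration of the approach, not Raven's actual implementation: the 0.95 correlation threshold, the function name redundant_feature_sketch, and the use of total correlation as the 'nearness' measure are assumptions made for the example.

import pandas as pd
import networkx as nx

def redundant_feature_sketch(df: pd.DataFrame, threshold: float = 0.95) -> list:
    """Illustrative only: find clusters of strongly correlated features and
    mark every feature except each cluster's most central one as redundant."""
    corr = df.corr().abs()                       # pairwise |correlation|; assumes numeric columns
    graph = nx.Graph()
    graph.add_nodes_from(corr.columns)
    for i, a in enumerate(corr.columns):
        for b in corr.columns[i + 1:]:
            if corr.loc[a, b] >= threshold:      # linearly related -> redundant pair
                graph.add_edge(a, b)

    redundant = []
    for cluster in nx.connected_components(graph):
        if len(cluster) < 2:
            continue
        # keep the feature with the highest total correlation to the rest of its cluster
        keeper = max(cluster, key=lambda f: corr.loc[f, list(cluster - {f})].sum())
        redundant.extend(cluster - {keeper})
    return redundant

This brute-force version is quadratic in the number of features; Raven's whole point is to perform this kind of reduction quickly on very large datasets.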
Make sure you have Pandas, NumPy and NetworkX installed. You can install these packages with pip:
pip install pandas numpy networkx
To use Raven, simply download the raw raven.py file and import it:
from raven import raven
Once you have it imported, you can identify redundant features. Here's an example usage:
import pandas as pd

really_huge_dataset = pd.read_csv('./really_huge_dataset.csv')
redundant_features = raven(really_huge_dataset)   # labels of the redundant columns
smaller_dataset = really_huge_dataset.drop(columns=redundant_features)
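As a quick sanity check (raven returns column labels, which is what the .drop(columns=...) call above expects), you can confirm that only the feature count shrinks while the row count stays the same:

print(really_huge_dataset.shape)   # (rows, original feature count)
print(smaller_dataset.shape)       # same rows, fewer features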