forked from renero/dataset

Dataset class to perform data preparation and feature engineering in machine learning.

MargaretNM/dataset

 
 

Dataset

(C) J. Renero

This class attempts, through a very simple approach, to collect the common tasks that are normally performed on pandas DataFrames, such as:

  • load data
  • set the target variable
  • describe the health status of the dataset
  • drop/keep columns or sample from simple lists
  • split the dataset
  • count categorical and numerical features
  • fix NAs (missing values)
  • find correlations
  • detect skewness
  • scale numeric values
  • detect outliers
  • one-hot encoding
  • find underrepresented categorical features
  • perform stepwise feature selection
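To make a few of the tasks above concrete, here is a minimal sketch of what they look like in plain pandas. It does not use the Dataset class or its API; the toy data, the column names, and the 20% rarity threshold are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented purely for illustration.
df = pd.DataFrame({
    "price": [100.0, 120.0, 90.0, 2000.0, 110.0, 95.0],
    "rooms": [2.0, 3.0, np.nan, 8.0, 3.0, 2.0],
    "city": ["A", "A", "B", "A", "B", "C"],
})
target = "price"  # the target variable

# Count numerical and categorical features (the target is excluded).
features = df.drop(columns=[target])
numerical = features.select_dtypes(include="number").columns.tolist()
categorical = features.select_dtypes(exclude="number").columns.tolist()

# Fix NAs: fill gaps in numeric columns with the column median.
df[numerical] = df[numerical].fillna(df[numerical].median())

# Detect skewness of the numeric columns, target included.
skewness = df[numerical + [target]].skew()

# Find correlations between each numeric feature and the target.
correlations = df[numerical].corrwith(df[target])

# Find underrepresented categories: share of rows below a
# hypothetical 20% threshold.
shares = df["city"].value_counts(normalize=True)
rare = shares[shares < 0.2].index.tolist()

# One-hot encode the categorical features.
encoded = pd.get_dummies(df, columns=categorical)
```

Each step is a line or two on its own; the value of a wrapper class is in keeping the target, the feature lists, and the transformed data together in one object.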

Install

To install this package, make sure you have Python 3.7, then run:

$ pip install git+http://github.com/renero/dataset

Or, if you prefer, clone the repository using git clone https://github.com/renero/dataset.git, and then move into the newly created folder to install it:

$ git clone https://github.com/renero/dataset.git
$ cd dataset
$ pip install -e .
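
After installing, a quick sanity check can confirm the environment. The sketch below is only an assumption-laden helper, not part of the package: `environment_ready` is a hypothetical name, the module name `dataset` is assumed from the repository name, and Python 3.7 is treated as a minimum version.

```python
import importlib.util
import sys


def environment_ready(module_name="dataset", minimum=(3, 7)):
    """Return True if the interpreter meets `minimum` and `module_name` can be found.

    Both defaults are assumptions: the module name is guessed from the
    repository name, and 3.7 is treated as a lower bound.
    """
    python_ok = sys.version_info >= minimum
    # find_spec locates a top-level module without importing it.
    module_found = importlib.util.find_spec(module_name) is not None
    return python_ok and module_found
```

Calling `environment_ready()` returns `False` if either condition fails. Note that an unrelated package that also installs a module called `dataset` would make the check pass, so treat a `True` result as a hint rather than proof.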

Examples

See the example.ipynb notebook to learn how to start using it.

Documentation

The latest documentation is available on the ReadTheDocs PyDataset Project page.
