diff --git a/README.md b/README.md
index 54b151a..5acdf90 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 A Python wrapper for MADlib - an open source library for scalable in-database machine learning algorithms.
-You can visit [PyMADlib's webpage](http://pivotalsoftware.github.io/pymadlib/) for installation and usage tutorials.
+You can visit [PyMADlib's webpage](https://pivotalsoftware.github.io/pymadlib/) for installation and usage tutorials.
 ## Algorithms
@@ -11,14 +11,14 @@ PyMADlib currently has wrappers for the following algorithms in MADlib (version
 1. K-Means
 1. LDA
-Refer [MADlib User Docs](http://doc.madlib.net/v0.5/ ) for MADlib's user documentation. Please note that PyMADlib as of now is only compatible with MADlib v0.5. You can obtain MADlib v0.5 from [MADlib v0.5](https://github.com/madlib/madlib/archive/v0.5.tar.gz). We might add support to more recent versions of MADlib depending on adoption rate. Please email me if you have a strong case for an upgrade.
+Refer [MADlib User Docs](https://madlib.apache.org/docs/v0.5/ ) for MADlib's user documentation. Please note that PyMADlib as of now is only compatible with MADlib v0.5. You can obtain MADlib v0.5 from [MADlib v0.5](https://github.com/madlib/madlib/archive/v0.5.tar.gz). We might add support to more recent versions of MADlib depending on adoption rate. Please email me if you have a strong case for an upgrade.
 ## Dependencies
 1. You'll need the python extension _**psycopg2**_ to use PyMADlib.
 1. If you have matplotlib installed, you'll see Matplotlib visualizations for Linear Regression demo.
-1. If you have installed [networkx](http://networkx.github.com/download.html), you'll see a visualization of the k-means demo
+1. If you have installed [networkx](https://networkx.github.com/download.html), you'll see a visualization of the k-means demo
 1. [PyROC](https://github.com/marcelcaraciolo/PyROC) is included in the source of this distribution with permission from its developer.
 You'll see a visualization of the ROC curves for Logistic Regression.
@@ -53,7 +53,7 @@ PyMADlib depends on `MADlib`, `psycopg2` and `Pandas`. It is easiest to work wit
 ## Build Environment Setup on Mac OS X 10.8
-* Download & install [Anaconda-1.9.0-MacOSX-x86_64.pkg] (http://repo.continuum.io/archive/Anaconda-1.9.0-MacOSX-x86_64.pkg)
+* Download & install [Anaconda-1.9.0-MacOSX-x86_64.pkg] (https://repo.continuum.io/archive/Anaconda-1.9.0-MacOSX-x86_64.pkg)
 * Open a terminal and check if you have Anaconda Python & the package manager conda
@@ -62,7 +62,7 @@ PyMADlib depends on `MADlib`, `psycopg2` and `Pandas`. It is easiest to work wit
 > vatsan-mac$ which conda
 > /Users/vatsan/anaconda/bin/conda
-* If you haven't installed PostgreSQL on your Mac already, you'll have to download & install `PostGreSQL` for Mac. This is so that we get some required libraries to compile the SQL Engine: psycopg2. The easiest way to install `PostGreSQL` on Mac is via `http://postgresapp.com/`. Once you've downloaded and installed PostGreSQL on Mac, it should typically be found under `/Library/PostgreSQL`
+* If you haven't installed PostgreSQL on your Mac already, you'll have to download & install `PostGreSQL` for Mac. This is so that we get some required libraries to compile the SQL Engine: psycopg2. The easiest way to install `PostGreSQL` on Mac is via `https://postgresapp.com/`. Once you've downloaded and installed PostGreSQL on Mac, it should typically be found under `/Library/PostgreSQL`
 > vatsan-mac$ ls /Library/PostgreSQL/9.2/
 > Library include pg_env.sh uninstall-postgresql.app
@@ -98,7 +98,7 @@ If the above command did not error out, then installation was successful.
 ## Usage Tutorial
-Visit [PyMADlib Tutorial](http://nbviewer.ipython.org/gist/vatsan/dd88abb47c2fbd9e16bd) for a tutorial on using PyMADlib
+Visit [PyMADlib Tutorial](https://nbviewer.ipython.org/gist/vatsan/dd88abb47c2fbd9e16bd) for a tutorial on using PyMADlib
 Also visit [PyMADlib IPython NB](https://gist.github.com/vatsan/dd88abb47c2fbd9e16bd) to download the IPython NB tutorial
@@ -137,9 +137,9 @@ Remember to close the Matplotlib windows that pop-up to continue with the rest o
 PyMADlib packages publicly available datasets from the UCI machine learning repository and other sources.
-1. [Wine quality dataset from UCI Machine Learning repository](http://archive.ics.uci.edu/ml/datasets/Wine+Quality)
-1. [Auto MPG dataset from UCI ML repository from UCI Machine Learning repository](http://archive.ics.uci.edu/ml/datasets/Auto+MPG)
-1. [Wine quality dataset from UCI Machine Learning repository](http://archive.ics.uci.edu/ml/datasets/Wine+Quality)
+1. [Wine quality dataset from UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets/Wine+Quality)
+1. [Auto MPG dataset from UCI ML repository from UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG)
+1. [Wine quality dataset from UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets/Wine+Quality)
 1. Obama-Romney second presidential debate (2012) transcripts
diff --git a/README.txt b/README.txt
index 24344be..7f653dc 100644
--- a/README.txt
+++ b/README.txt
@@ -3,14 +3,14 @@ Python wrapper for MADlib
 Srivatsan Ramanujam , 3 Jan 2013
 This currently implements Linear regression, Logistic Regression, SVM (regression & classification), K-Means and LDA algorithms of MADlib.
-Refer : http://doc.madlib.net/v0.5/ for MADlib's user documentation.
+Refer : https://madlib.apache.org/docs/v0.5/ for MADlib's user documentation.
 ================================================================================
 Dependencies :
 ===============
 You'll need the python extension : psycopg2 to use PyMADlib.
  (i) If you have matplotlib installed, you'll see Matplotlib visualizations for Linear Regression demo.
- (ii) If you have installed networkx (http://networkx.github.com/download.html), you'll see a visualization of the k-means demo
+ (ii) If you have installed networkx (https://networkx.github.com/download.html), you'll see a visualization of the k-means demo
  (iii) PyROC (https://github.com/marcelcaraciolo/PyROC) is included in the source of this distribution with permission from its developer.
 You'll see a visualization of the ROC curves for Logistic Regression.
 Configurations:
@@ -56,8 +56,8 @@ Datasets packaged with this installation :
 =========================================
 PyMADlib packages publicly available datasets from the UCI machine learning repository and other sources.
-1) Wine quality dataset from UCI Machine Learning repository : http://archive.ics.uci.edu/ml/datasets/Wine+Quality
-2) Auto MPG dataset from UCI ML repository : http://archive.ics.uci.edu/ml/datasets/Auto+MPG
+1) Wine quality dataset from UCI Machine Learning repository : https://archive.ics.uci.edu/ml/datasets/Wine+Quality
+2) Auto MPG dataset from UCI ML repository : https://archive.ics.uci.edu/ml/datasets/Auto+MPG
 3) Obama-Romney second presidential debate (2012) transcripts for the LDA models.
@@ -71,6 +71,6 @@ with installing psycopg2.
 Here are some blogs which discuss the issue and offer solutions:
 http://hardlifeofapo.com/psycopg2-and-postgresql-9-1-on-snow-leopard/
-http://www.initd.org/psycopg/articles/2010/11/11/links-about-building-psycopg-mac-os-x/
+https://www.initd.org/psycopg/articles/2010/11/11/links-about-building-psycopg-mac-os-x/
diff --git a/pymadlib/doc/PyMADlib Tutorial.ipynb b/pymadlib/doc/PyMADlib Tutorial.ipynb
index e7176b9..ede04b0 100644
--- a/pymadlib/doc/PyMADlib Tutorial.ipynb
+++ b/pymadlib/doc/PyMADlib Tutorial.ipynb
@@ -28,7 +28,7 @@
     "1. K-Means \n",
     "1. LDA \n",
     "\n",
-    "Refer [MADlib User Docs](http://doc.madlib.net/v0.5/ ) for MADlib's user documentation.\n",
+    "Refer [MADlib User Docs](https://madlib.apache.org/docs/v0.5/ ) for MADlib's user documentation.\n",
     "\n",
     "We can employ it to push the heavy number crunching to MADlib, while allowing us to work with awesomeness of Python in the front end."
    ]
diff --git a/pymadlib/example.py b/pymadlib/example.py
index 0ceb801..37ea10c 100644
--- a/pymadlib/example.py
+++ b/pymadlib/example.py
@@ -86,7 +86,7 @@ def linearRegressionDemo(conn):
     smat = scatter_matrix(predictions.get(['quality','prediction']), diagonal='kde')
     # 1 b) Linear Regression with categorical variables
-    # We'll use the auto_mpg dataset from UCI : http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names
+    # We'll use the auto_mpg dataset from UCI : https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names
     # make, fuel_type, fuel_system are all categorical variables, rest are real.
     #Train Linear Regression Model on a mixture of Numeric and Categorical Variables
     mdl_dict, mdl_params = lreg.train('public.auto_mpg_train',['1','height','width','length','highway_mpg','engine_size','make','fuel_type','fuel_system'],'price')
diff --git a/pymadlib/pymadlib.py b/pymadlib/pymadlib.py
index bad2c83..a570ff2 100644
--- a/pymadlib/pymadlib.py
+++ b/pymadlib/pymadlib.py
@@ -7,7 +7,7 @@
     3) SVM (regression & classification) and
     4) K-Means &
     5) PLDA
-    Refer : http://doc.madlib.net/v0.5/ for MADlib's user documentation.
+    Refer : https://madlib.apache.org/docs/v0.5/ for MADlib's user documentation.
 '''
 from utils import pivotCategoricalColumns, convertsColsToArray
 import psycopg2
@@ -98,7 +98,7 @@ def predict(self, *args):
 class LinearRegression(SupervisedLearning):
     '''
        Python Wrapper to invoke MADlib's Linear Regression Algorithm
-       http://doc.madlib.net/v0.5/group__grp__linreg.html
+       https://madlib.apache.org/docs/v0.5/group__grp__linreg.html
     '''
     def __init__(self,conn):
         super(LinearRegression,self).__init__(conn)
@@ -184,7 +184,7 @@ def predict(self, predict_table_name, actual_label_col=''):
 class LogisticRegression(SupervisedLearning):
     '''
        Python Wrapper to invoke MADlib's Logistic Regression Algorithm
-       http://doc.madlib.net/v0.5/group__grp__logreg.html
+       https://madlib.apache.org/docs/v0.5/group__grp__logreg.html
     '''
     def __init__(self,conn):
         super(LogisticRegression,self).__init__(conn)
@@ -293,7 +293,7 @@ def predict(self, predict_table_name,actual_label_col='',threshold=0.5):
 class SVM(SupervisedLearning):
     '''
        Python Wrapper to invoke MADlib's SVM Algorithm
-       http://doc.madlib.net/v0.5/group__grp__kernmach.html
+       https://madlib.apache.org/docs/v0.5/group__grp__kernmach.html
     '''
     def __init__(self,conn):
         super(SVM,self).__init__(conn)
@@ -494,7 +494,7 @@ def predict_batch(self, predict_table, output_table, id_col, data_col):
 class KMeans(object):
     '''
        Python Wrapper to invoke MADlib's KMeans Algorithm
-       http://doc.madlib.net/v0.5/group__grp__kmeans.html
+       https://madlib.apache.org/docs/v0.5/group__grp__kmeans.html
     '''
     def __init__(self,conn):
         self.dbconn = conn
@@ -611,7 +611,7 @@ def generateClusters(
 class PLDA(object):
     '''
        Python Wrapper to invoke MADlib's PLDA Algorithm
-       http://doc.madlib.net/v0.5/group__grp__plda.html
+       https://madlib.apache.org/docs/v0.5/group__grp__plda.html
     '''
     def __init__(self,conn):
         self.dbconn = conn
diff --git a/pymadlib/pyroc.py b/pymadlib/pyroc.py
index 877f84d..f62d154 100644
--- a/pymadlib/pyroc.py
+++ b/pymadlib/pyroc.py
@@ -351,7 +351,7 @@ def _calculate_counts(self,pos_data,neg_data):
 if __name__ == '__main__':
     print "PyRoC - ROC Curve Generator"
     print "By Marcel Pinheiro Caraciolo (@marcelcaraciolo)"
-    print "http://aimotion.bogspot.com\n"
+    print "http://ww1.bogspot.com\n"
     from optparse import OptionParser
     parser = OptionParser()
diff --git a/pymadlib/utils.py b/pymadlib/utils.py
index 530cd8d..504f563 100644
--- a/pymadlib/utils.py
+++ b/pymadlib/utils.py
@@ -184,7 +184,7 @@ def __getColNamesAndTypesList__(cols,col_types_dict, col_distinct_vals_dict):
     '''
        Return a list of column names and types, where any categorical column in the original table have been 'binarized'. Dummy coding is used to convert categorical columns into dummy variables.
-       Refer: http://en.wikipedia.org/wiki/Categorical_variable#Dummy_coding
+       Refer: https://en.wikipedia.org/wiki/Categorical_variable#Dummy_coding
        Inputs:
        =======
@@ -278,7 +278,7 @@ def pivotCategoricalColumns(conn,table_name,cols,label='',col_distinct_vals_dict
        Take a table_name and a set of columns (some of which may be categorical and return a new table, where the categorical columns have been pivoted.
        This method uses the "Dummy Coding" approach:
-       http://en.wikipedia.org/wiki/Categorical_variable#Dummy_coding
+       https://en.wikipedia.org/wiki/Categorical_variable#Dummy_coding
        Inputs:
        =======
diff --git a/setup.py b/setup.py
index f4d0630..32b8ad9 100644
--- a/setup.py
+++ b/setup.py
@@ -10,8 +10,8 @@
     './dist', 'EGG-INFO', '*.egg-info')
-# (c) 2005 Ian Bicking and contributors; written for Paste (http://pythonpaste.org)
-# Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
+# (c) 2005 Ian Bicking and contributors; written for Paste (https://web.archive.org/web/http%3A//pythonpaste.org/)
+# Licensed under the MIT license: https://www.opensource.org/licenses/mit-license.php
 # Note: you may want to copy this into your setup.py file verbatim, as
 # you can't import this from another package, when you don't know if
 # that package is installed yet.
@@ -98,12 +98,12 @@ def find_package_data(
     version='1.0',
     author='Srivatsan Ramanujam',
     author_email='vatsan.cs@utexas.edu',
-    url='http://vatsan.github.com/pymadlib',
+    url='https://vatsan.github.com/pymadlib',
     packages=find_packages(),
     package_data=find_package_data(only_in_packages=False,show_ignored=True),
     include_package_data=True,
     license='LICENSE.txt',
-    description='A Python wrapper for MADlib (http://madlib.net) - an open source library for scalable in-database machine learning algorithms',
+    description='A Python wrapper for MADlib (https://madlib.apache.org/) - an open source library for scalable in-database machine learning algorithms',
     long_description=open('README.txt').read(),
     install_requires=[
         "psycopg2 >= 2.4.5",