SARlib provides tools for statistical analysis, regression modeling, sample size analysis, and visualization. It includes ordinary least squares (OLS) and statistical agnostic regression (SAR) models, as well as utilities for data preprocessing and plotting.
A formal description and analysis are included in the following reference:
J. M. Gorriz, J. Ramirez, F. Segovia, C. Jimenez-Mesa, F. J. Martinez-Murcia, and J. Suckling, "Statistical agnostic regression: A machine learning method to validate regression models", Journal of Advanced Research, May 2025, doi: 10.1016/j.jare.2025.04.026.
- Data Preprocessing: Standardize and clean input data, handle missing values.
- Visualization: Scatter plots for exploratory data analysis.
- OLS Regression: Permutation-based p-value, power, R², and coefficient estimation.
- SAR Regression: Support Vector Regression with generalization bounds (PAC-Bayes, Vapnik, and IGP).
- Sample Size Analysis: Empirical study of how sample size affects model statistics.
- Plotting Utilities: Visualize loss, threshold, power, p-value, and coefficients as functions of sample size.
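To make the "permutation-based p-value" feature concrete, here is a minimal NumPy-only sketch of the general technique. It is not SARlib's implementation: the helper names `r_squared` and `permutation_pvalue`, the choice of R² as the test statistic, and the default of 500 permutations are all assumptions made for illustration.

```python
import numpy as np


def r_squared(x, y):
    """R² of an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Hypothetical helper: share of label permutations whose R² is at
    least as large as the R² observed on the unpermuted data."""
    rng = np.random.default_rng(seed)
    observed = r_squared(x, y)
    count = 0
    for _ in range(n_perm):
        # Shuffling y breaks any real link between predictors and response.
        if r_squared(x, rng.permutation(y)) >= observed:
            count += 1
    # The +1 terms keep the estimate strictly positive.
    return (count + 1) / (n_perm + 1)


rng = np.random.default_rng(42)
x = rng.standard_normal((100, 3))
y_signal = 2.0 * x[:, 0] + 0.5 * rng.standard_normal(100)  # real effect
y_null = rng.standard_normal(100)                          # no effect

p_signal = permutation_pvalue(x, y_signal)
p_null = permutation_pvalue(x, y_null)
print("p-value with signal:", p_signal)
print("p-value without signal:", p_null)
```

With a strong linear effect the observed R² sits far outside the permutation distribution, so the p-value is small. For pure noise it can fall anywhere in (0, 1].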
SARlib can be installed via PyPI:
    pip install sarlib
Alternatively, you can install it manually by downloading the source code. In that case, make sure you have the following dependencies installed:
- numpy
- matplotlib
- statsmodels
- scikit-learn
- scipy
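For a manual install, a quick way to check the dependencies listed above is to ask Python whether each one can be found. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the import name `sklearn` is used because that is how scikit-learn is imported.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above
# (scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "matplotlib", "statsmodels", "sklearn", "scipy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```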
Classes:
- SAR: Statistical Agnostic Regression with PAC-Bayes, Vapnik, and IGP bounds.
- OLS: Ordinary Least Squares regression with permutation-based significance and power analysis.
- SampleSizeAnalysis: Analyzes the effect of sample size on model performance and statistics.
Functions:
- fix_data(x, y): Standardizes and cleans input data.
- show_scatter(x, y, ...): Visualizes predictors vs. response.
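As an illustration of what "standardizes and cleans" can mean in practice, the sketch below drops rows with missing or non-finite values and z-scores each predictor column. It is an assumption about the general idea, not the actual behavior of `fix_data`: the function name `standardize_and_clean` is hypothetical, and leaving `y` unscaled is a choice made only for this example.

```python
import numpy as np


def standardize_and_clean(x, y):
    """Hypothetical helper: drop incomplete rows, then z-score the predictors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if x.ndim == 1:
        x = x[:, None]  # treat a single predictor as one column

    # Keep only rows where every predictor and the response are finite.
    keep = np.isfinite(x).all(axis=1) & np.isfinite(y)
    x, y = x[keep], y[keep]

    # Z-score each column; constant columns are centered but not rescaled.
    std = x.std(axis=0)
    std[std == 0] = 1.0
    x = (x - x.mean(axis=0)) / std
    return x, y


x_raw = [[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [5.0, 50.0]]
y_raw = [1.0, 2.0, np.nan, 4.0]
x_clean, y_clean = standardize_and_clean(x_raw, y_raw)
print(x_clean)
print(y_clean)
```

Rows 2 and 3 of the toy data each contain a missing value, so only the first and last rows survive the cleaning step.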
Usage:
- Import packages and prepare your data as NumPy arrays:

    from sarlib import SAR, OLS, SampleSizeAnalysis, show_scatter
    import numpy as np

    x = np.random.randn(100, 3)  # predictors
    y = np.random.randn(100)  # response

- Visualize data:

    show_scatter(x, y)

- Fit SAR model:

    model_sar = SAR(n_realiz=100, norm='epsins', alpha=0.05)
    stats_sar = model_sar.fit(x, y, verbose=True)

- Compare with an OLS model:

    model_ols = OLS(n_realiz=100, alpha=0.05)
    stats_ols = model_ols.fit(x, y, verbose=True)

- Analyze sample size effect:

    analysis = SampleSizeAnalysis(model_sar, x, y, steps=7)
    analysis.plot_loss()
    analysis.plot_pvalue()
    analysis.plot_coef()
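The idea behind a sample size analysis can be sketched without the library: refit a model on growing subsets of the data and watch how the estimates settle. The example below does this for plain least squares coefficients using NumPy only. It is not how `SampleSizeAnalysis` works internally. The function `coef_by_sample_size`, its `min_size` parameter, and the synthetic data are assumptions for illustration; `steps=7` simply mirrors the value used above.

```python
import numpy as np


def coef_by_sample_size(x, y, steps=7, min_size=10, seed=0):
    """Hypothetical helper: OLS slopes refit on nested subsamples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))  # one shuffle, so subsamples are nested
    sizes = np.linspace(min_size, len(y), steps).astype(int)
    coefs = []
    for n in sizes:
        idx = order[:n]
        X = np.column_stack([np.ones(n), x[idx]])  # add intercept column
        beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
        coefs.append(beta[1:])  # drop the intercept
    return sizes, np.array(coefs)


rng = np.random.default_rng(1)
x_demo = rng.standard_normal((200, 3))
true_coef = np.array([1.0, 0.0, -2.0])
y_demo = x_demo @ true_coef + 0.5 * rng.standard_normal(200)

sizes, coefs = coef_by_sample_size(x_demo, y_demo)
for n, c in zip(sizes, coefs):
    print(n, np.round(c, 2))
```

The estimates from the smallest subsamples are typically the noisiest. As the subset grows toward the full 200 samples, they approach the coefficients used to generate the data.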
All functions and classes are documented with docstrings. Please refer to the code for parameter details and usage.
Author: Sipba Group, UGR, https://sipba.ugr.es/
Please cite: J. M. Gorriz, J. Ramirez, F. Segovia, C. Jimenez-Mesa, F. J. Martinez-Murcia, and J. Suckling, "Statistical agnostic regression: A machine learning method to validate regression models", Journal of Advanced Research, May 2025, doi: 10.1016/j.jare.2025.04.026.
License: GPL Version 3