Introduction

This Python class (strangeness.py) is an anomaly/change detector based on the concept of martingales. It is designed to work on unlabelled data (unsupervised anomaly detection). One example of an unlabelled dataset is the number of steps a user takes each day; this anomaly detector can point out when the step count on a particular day is out of the ordinary.

I implemented one of many possible versions of the concept explained in the paper "Detecting Changes in Unlabeled Data Streams using Martingale" by Shen-Shyang Ho and Harry Wechsler. The basic premise is that, given a list of values, the properties (joint probability distribution) of the set should not change if the elements in the list are permuted; that is, the data are exchangeable. A change in the data stream breaks this property, and the martingale is used to detect that. I use the cluster mean as the cluster representation.
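As a rough illustration of the concept (not the code in strangeness.py), the strangeness of a point can be taken as its Euclidean distance from the cluster mean. A randomised p-value then measures how unusual the newest point is compared with every point seen so far. The function names below are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def cluster_mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of all points: the cluster representation."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def strangeness_scores(points: Sequence[Point]) -> List[float]:
    """Strangeness of each point = Euclidean distance from the cluster mean."""
    center = cluster_mean(points)
    return [math.dist(p, center) for p in points]


def randomized_p_value(scores: Sequence[float], rng: random.Random) -> float:
    """How unusual the newest (last) score is relative to all scores.

    Ties, including the newest point itself, are weighted by a uniform random
    number so the p-values are uniform on [0, 1] while the data stay exchangeable.
    """
    latest = scores[-1]
    greater = sum(1 for s in scores if s > latest)
    ties = sum(1 for s in scores if s == latest)
    return (greater + rng.random() * ties) / len(scores)
```

A small p-value means the newest point is stranger than almost everything before it, which is the evidence the martingale accumulates.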

Usage

The input to the code is a file with a header row followed by data rows in the following format:

<row Label/ID>,<value1>,<value2>,<value3>...,<valueN>

The row label can be anything that identifies the set of values; a timestamp, for example. The 'value' fields represent the state at that label. For a heart rate monitor, the label is the time of measurement and the values are the light intensities detected by the heart rate sensor. See the sample data file in the 'test-data' folder, which I generated by combining data from 4 different normal distributions.

The order of the rows matters, so please keep the records in their original order when generating an input dataset.
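A minimal sketch of reading this format in Python, assuming a comma-separated file whose first row is the header. parse_rows is a hypothetical helper, not part of strangeness.py; it yields rows in file order, which preserves the ordering requirement.

```python
import csv
from typing import Iterable, Iterator, Tuple


def parse_rows(lines: Iterable[str]) -> Iterator[Tuple[str, Tuple[float, ...]]]:
    """Yield (label, values) pairs in their original order, skipping the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # the first row is the header
    for row in reader:
        if not row:
            continue  # ignore blank lines
        label, *values = row
        yield label, tuple(float(v) for v in values)
```

To use it on a real file, open the file with open(path, newline="") and pass the file object to parse_rows; each values tuple can then be handed to the detector while the label is kept for reporting.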

Concrete usage example

Once the data file is ready, the code can be run as follows: python usageExample.py <dataset> <threshold> <minQueueLen> <epsilon>

With sample parameters:

python usageExample.py ./test-data/martingalesDataWithLabels.csv 10 50 0.92

The 3 values after the input dataset are as follows:

threshold - This value decides the sensitivity of the algorithm. A lower value causes more detections.
minQueueLen - The minimum number of input values collected before change detection starts. A lower value causes more detections.
epsilon - The ε parameter of the randomised power martingale; it also affects the sensitivity of the algorithm.
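For reference, the randomised power martingale in the Ho and Wechsler paper can be written as below, where α_i is the strangeness of point i (here, its distance from the cluster mean), θ_n is a uniform random number, and λ is the threshold. This is a sketch of the paper's formulation; I am assuming strangeness.py follows it.

```latex
p_n = \frac{\#\{i : \alpha_i > \alpha_n\} + \theta_n \, \#\{i : \alpha_i = \alpha_n\}}{n},
\qquad \theta_n \sim U[0, 1]

M_n^{(\epsilon)} = \epsilon \, p_n^{\epsilon - 1} \, M_{n-1}^{(\epsilon)},
\qquad M_0^{(\epsilon)} = 1, \qquad \epsilon \in (0, 1]

\mathbb{E}\left[\epsilon \, p^{\epsilon - 1}\right] = \epsilon \int_0^1 p^{\epsilon - 1} \, dp = 1

P\left(\sup_n M_n^{(\epsilon)} \ge \lambda\right) \le \frac{1}{\lambda}
```

The third line shows why M is a martingale when the p-values are uniform, and the last line (Ville's inequality for nonnegative martingales starting at 1) makes the threshold interpretable: while the data stay exchangeable, a threshold of 10 bounds the probability of M ever reaching it by 0.1. With ε = 0.92, a point increases M only when its p-value is below ε^{1/(1-ε)}, roughly 0.35.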

The output consists of <row Label/ID> <M Value> for each row. If the M value is greater than the threshold, the algorithm has detected an anomaly.
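If you save the output to a file, a short script can list just the flagged rows. This sketch assumes each output line is the label followed by the M value, separated by whitespace; flag_anomalies is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def flag_anomalies(lines: Iterable[str], threshold: float) -> List[Tuple[str, float]]:
    """Return (label, M) pairs whose M value is greater than the threshold.

    Assumes the M value is the last whitespace-separated field; everything
    before it is treated as the label (so labels may contain spaces).
    """
    flagged = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        label, m_value = " ".join(parts[:-1]), float(parts[-1])
        if m_value > threshold:
            flagged.append((label, m_value))
    return flagged
```

Pass it the lines of the saved output file and the same threshold used for the run.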

Adding in your python code

1. Download the class file strangeness.py into your code folder.
2. Add the import statement: from strangeness import Strangeness
3. Create the class object: strangeness = Strangeness(threshold, minQueueLen, epsilon)
4. Pass the tuple of values (without the label) to get the M value: strangeness.getMValue(valuesTuple)
5. Repeat step 4 for each data point.
6. If the value returned by getMValue() is greater than the threshold, the algorithm has detected an anomaly.
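To experiment without the repository, the steps above can be mimicked with a small stand-in class. This is a hypothetical sketch that only copies the constructor and getMValue() call shown in the steps; its internals (a queue that grows until a detection, randomised p-values, resetting the martingale to 1 and clearing the queue after a detection as described in the paper, and the extra seed argument) are my assumptions, not the contents of strangeness.py.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


class Strangeness:
    """Hypothetical stand-in mirroring Strangeness(threshold, minQueueLen, epsilon)
    and getMValue(valuesTuple); NOT the implementation in strangeness.py."""

    def __init__(self, threshold: float, minQueueLen: int, epsilon: float,
                 seed: Optional[int] = None) -> None:
        self.threshold = threshold
        self.minQueueLen = minQueueLen
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # 'seed' is an extra, hypothetical argument
        self.queue: List[Tuple[float, ...]] = []
        self.m = 1.0  # the martingale starts at 1

    def getMValue(self, values: Sequence[float]) -> float:
        self.queue.append(tuple(values))
        n = len(self.queue)
        if n < self.minQueueLen:
            return self.m  # still collecting points: no change detection yet

        # Strangeness of every queued point = distance from the cluster mean.
        center = [sum(column) / n for column in zip(*self.queue)]
        scores = [math.dist(p, center) for p in self.queue]

        # Randomised p-value of the newest point.
        latest = scores[-1]
        greater = sum(1 for s in scores if s > latest)
        ties = sum(1 for s in scores if s == latest)
        p = (greater + self.rng.random() * ties) / n
        p = max(p, 1e-12)  # avoid 0 ** (negative number)

        # Randomised power martingale update: M_n = epsilon * p^(epsilon - 1) * M_{n-1}.
        self.m *= self.epsilon * p ** (self.epsilon - 1)
        m_value = self.m
        if m_value > self.threshold:
            # Assumed reset after a detection: start a fresh queue and martingale.
            self.queue.clear()
            self.m = 1.0
        return m_value


if __name__ == "__main__":
    # Synthetic 1-D stream whose mean shifts from 0 to 5 at row 300.
    rng = random.Random(0)
    data = [(rng.gauss(0, 1),) for _ in range(300)]
    data += [(rng.gauss(5, 1),) for _ in range(100)]
    detector = Strangeness(10, 50, 0.92, seed=0)
    for row, values in enumerate(data):
        m = detector.getMValue(values)
        if m > detector.threshold:
            print(f"row {row}: M = {m:.2f} -> anomaly")
```

Resetting after each detection keeps one change from being reported on every following row; whether the original class does this is not stated here.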

Performance

For the above parameters, the code takes around 6 seconds to process 4000 points. For each new point, the cluster mean and the distances are recalculated over minQueueLen points, so keeping the queue length short reduces runtime.

Parameter Tuning

For each new problem, we need to find the optimum parameters (threshold, minQueueLen, and epsilon) that give the lowest number of false positives (false detections) and false negatives (missed detections). I have put up an example of how parameter tuning can be performed using an example dataset of S&P 500 opening values for each day. The date serves as the label for this dataset. More details are here.
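One simple way to search for these values is a grid search that scores each (threshold, minQueueLen, epsilon) combination by its false positives and false negatives against known change points. This sketch is hypothetical: run stands for any function that runs the detector with the given parameters and returns the row indices where M exceeded the threshold, and tolerance is an assumed window for how late a detection may be and still count as correct.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Params = Tuple[float, int, float]  # (threshold, minQueueLen, epsilon)


def score_detections(detected: Set[int], true_changes: Set[int],
                     tolerance: int) -> Tuple[int, int]:
    """Return (false_positives, false_negatives).

    A true change counts as found if a detection lies in [change, change + tolerance];
    detections that match no true change are false positives.
    """
    matched: Set[int] = set()
    false_negatives = 0
    for change in sorted(true_changes):
        hits = {d for d in detected if change <= d <= change + tolerance}
        if hits:
            matched |= hits
        else:
            false_negatives += 1
    return len(detected - matched), false_negatives


def grid_search(run: Callable[[Params], Set[int]], grid: Iterable[Params],
                true_changes: Set[int],
                tolerance: int) -> Optional[Tuple[Params, int, int]]:
    """Return (params, fp, fn) for the combination with the fewest total errors."""
    best: Optional[Tuple[Params, int, int]] = None
    for params in grid:
        fp, fn = score_detections(run(params), true_changes, tolerance)
        if best is None or fp + fn < best[1] + best[2]:
            best = (params, fp, fn)
    return best
```

A grid can be built with itertools.product over candidate lists for each parameter. Weighting false positives and false negatives equally is a design choice; if missed detections are costlier for your problem, weight them more heavily in the comparison.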

Disclaimer

The code is my personal work and does not in any way represent my employer.

udayankumar/anomaly-detection

Folders and files

Latest commit

History

Repository files navigation

Introduction

Usage

Concrete usage example

Adding in your python code

Performance

Parameter Tuning

Disclaimer

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages