1 change: 1 addition & 0 deletions .dockerignore
@@ -0,0 +1 @@
**/.*
6 changes: 4 additions & 2 deletions .gitignore
@@ -1,9 +1,11 @@

*.bak
.DS_Store
*.pyc

data_test/
deprecated/
build/
dist/
nab.egg-info/
.idea/
.project
@@ -13,5 +15,5 @@ nab/detectors/htmjava/.pydevproject
scripts/.ipynb_checkpoints/

# Generated files

plot_*
*resultsSummary*
3 changes: 2 additions & 1 deletion .zenodo.json
@@ -1,3 +1,4 @@
{
"description": "The Numenta Anomaly Benchmark",
"access_right": "open",
"license": {
@@ -81,5 +82,5 @@
{
"name": "breznak"
}
],
]
}
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,14 @@
# Changelog
All notable changes to this project will be documented in this file.

## [v1.1] - 2019-09-12
### Updated runtime to Python 3
- Moved Python 2 runtimes into independent detectors.
- Updated documentation and examples.

## [v1.0] - 2017-04-26
### Initial release
- Established proper Python program setup.

## [v0.8] - 2015-09-04
### Initial tag for scoreboard
29 changes: 29 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,29 @@
NAB is intended for the research community and we encourage your contributions and feedback!

Before your [pull requests](https://help.github.com/articles/using-pull-requests) can be reviewed by our team, you'll need to sign our [Contributor License](https://numenta.com/contributor-license).


#### Data
We welcome data you're willing to contribute. Specifically we're looking for data meeting the following criteria:
* real-world time-series data
* \>1000 records
* labeled anomalies

#### Anomaly detection algorithms
For us to consider adding your algorithm to the NAB repo it must meet the following criteria:
* open-source
* work with streaming data (i.e. process data in real-time)
* we must be able to fully-replicate your results

For an algorithm to be used in practice it must run online as data streams in, not in batch. Algorithms must therefore be computationally efficient enough to process streaming data, i.e. O(N). The following algorithms have been tested on NAB and do not meet these criteria:
- [Lytics Anomalyzer](https://github.com/lytics/anomalyzer)
  - Runs in O(N^2) because for each subsequent record the model retrains over all previous records.
  - The author recommended using the detector within a moving window (250 records) to speed up the algorithm, yielding the following results: 4.42 on the standard profile, 2.39 for rewarding low FP, and 8.58 for rewarding low FN. However, this still ran quite slowly; e.g. running Anomalyzer on "realKnownCause/machine_temperature_system_failure.csv" took 52m0s, compared with only 4m39s for the HTM detector.
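
The difference between O(N) and O(N^2) behavior described above can be illustrated with a minimal sketch. `RunningGaussianDetector` is a hypothetical class, not part of NAB or Anomalyzer. It assumes a simple Gaussian z-score model, chosen only because its state can be updated in constant time per record (Welford's method), so a stream of N records costs O(N) in total instead of retraining over all previous records.

```python
import math


class RunningGaussianDetector:
    """Hypothetical streaming detector: constant work per record."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def handle_record(self, value):
        """Score one record against earlier records only, then update state."""
        if self.count < 2:
            # Not enough history to estimate a spread yet.
            score = 0.0
        else:
            std = math.sqrt(self.m2 / (self.count - 1))
            deviation = abs(value - self.mean)
            if std > 0:
                z = deviation / std
                # Squash the z-score into [0, 1); larger means more anomalous.
                score = z / (z + 3.0)
            else:
                score = 0.0 if deviation == 0 else 1.0

        # Welford's update: no pass over previous records is needed.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return score


detector = RunningGaussianDetector()
stream = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 50.0]
scores = [detector.handle_record(v) for v in stream]
```

Each record is scored before it is folded into the statistics, so the detector never looks ahead in the stream. The constant `3.0` in the squashing step is an arbitrary illustrative choice.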

We investigated some popular open-source algorithms to add to NAB, and have found the following unsuitable for streaming/online anomaly detection:
- [Yahoo EGADS](https://github.com/yahoo/egads) separates time series modeling from anomaly detection. To detect anomalies EGADS compares the prediction error to a threshold, and it determines this threshold by scanning the whole data file. It may be possible to use a small part of EGADS to output a set of anomaly scores by simply outputting the prediction error, but this calls for a hardcoded threshold and is a significant departure from the algorithm.
- [Netflix's "Robust Anomaly Detection" (RAD)](https://github.com/Netflix/Surus) uses Robust Principal Component Analysis (RPCA), which is not inherently aware of time. RAD applies RPCA to time series by chunking the data according to a seasonality that you specify, thus creating "time dimensions". The algorithm scans an entire time series, and then decides where the anomalies occurred.
- [LinkedIn's luminol](https://github.com/linkedin/luminol) is a general time-series analysis toolkit, with several algorithms for anomaly detection. However, these algorithms run in batch, not streaming; they process an entire time-series and return the anomalous time windows after the fact.

#### Comments/suggestions
Want to suggest some changes to the NAB codebase? Submit an [issue](https://github.com/numenta/NAB/issues/new) and/or pull request and we'll take a look.
28 changes: 28 additions & 0 deletions Dockerfile.py27
@@ -0,0 +1,28 @@
FROM numenta/nupic:1.0.5

# Plus Java so we can run HTM.Java as well
RUN wget https://d3pxv6yz143wms.cloudfront.net/8.212.04.2/java-1.8.0-amazon-corretto-jdk_8.212.04-2_amd64.deb && \
    apt-get update && apt-get install -y java-common && apt-get install -y --no-install-recommends apt-utils && \
    dpkg --install java-1.8.0-amazon-corretto-jdk_8.212.04-2_amd64.deb

ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-amazon-corretto
ENV PATH $JAVA_HOME/bin:$PATH

ENV NAB /usr/local/src/nab

ADD . $NAB
WORKDIR $NAB
RUN python -m pip install -e .

# Run Numenta detectors
RUN echo "Running numenta detectors in Python 2.7..."
WORKDIR $NAB/nab/detectors/numenta
RUN python -m pip install -r requirements.txt
RUN python run.py --skipConfirmation

# Run HTM.Java detector
RUN echo "Running HTM.Java detector in Java 8 / Python 2.7..."
WORKDIR $NAB/nab/detectors/htmjava/nab/detectors/htmjava
RUN ./gradlew clean build
WORKDIR $NAB/nab/detectors/htmjava
RUN python run.py --skipConfirmation
147 changes: 69 additions & 78 deletions README.md
@@ -1,28 +1,31 @@
The Numenta Anomaly Benchmark [![Build Status](https://travis-ci.org/numenta/NAB.svg?branch=master)](https://travis-ci.org/numenta/NAB)
The Numenta Anomaly Benchmark (NAB) [![Build Status](https://travis-ci.org/numenta/NAB.svg?branch=master)](https://travis-ci.org/numenta/NAB) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1040335.svg)](https://doi.org/10.5281/zenodo.1040335)
-----------------------------

Welcome. This repository contains the data and scripts comprising the Numenta
Anomaly Benchmark (NAB). NAB is a novel benchmark for evaluating
Welcome. This repository contains the data and scripts which comprise the
Numenta Anomaly Benchmark (NAB) v1.1. NAB is a novel benchmark for evaluating
algorithms for anomaly detection in streaming, real-time applications. It is
comprised of over 50 labeled real-world and artificial timeseries data files plus a
novel scoring mechanism designed for real-time applications.

Included are the tools to allow you to easily run NAB on your
own anomaly detection algorithms; see the [NAB entry points
info](https://github.com/numenta/NAB/wiki#nab-entry-points). Competitive results
tied to open source code will be posted in the wiki on the
[Scoreboard](https://github.com/numenta/NAB/wiki/NAB%20Scoreboard). Let us know
about your work by emailing us at [nab@numenta.org](mailto:nab@numenta.org) or
composed of over 50 labeled real-world and artificial timeseries data files
plus a novel scoring mechanism designed for real-time applications.

Included are the tools to allow you to run NAB on your own anomaly detection
algorithms; see the [NAB entry points
info](https://github.com/numenta/NAB/wiki/NAB-Entry-Points). Competitive
results tied to open source code will be posted on the
[Scoreboard](https://github.com/numenta/NAB#scoreboard). Let us know about
your work by emailing us at [nab@numenta.org](mailto:nab@numenta.org) or
submitting a pull request.

This readme is a brief overview and contains details for setting up NAB. Please
refer to the following for more details about NAB scoring, data, and motivation:
This readme is a brief overview and contains details for setting up NAB.
Please refer to the following for more details about NAB scoring, data, and
motivation:

- [Unsupervised real-time anomaly detection for streaming data](http://www.sciencedirect.com/science/article/pii/S0925231217309864) - The main paper, covering NAB and Numenta's HTM-based anomaly detection algorithm
- [NAB Whitepaper](https://github.com/numenta/NAB/wiki#nab-whitepaper)
- [Evaluating Real-time Anomaly Detection Algorithms](http://arxiv.org/abs/1510.03336) - Original publication of NAB

We encourage you to publish your results on running NAB, and share them with us at [nab@numenta.org](nab@numenta.org). Please cite the following publication when referring to NAB:
We encourage you to publish your results on running NAB, and share them with
us at [nab@numenta.org](mailto:nab@numenta.org). Please cite the following
publication when referring to NAB:

Ahmad, S., Lavin, A., Purdy, S., & Agha, Z. (2017). Unsupervised real-time
anomaly detection for streaming data. Neurocomputing, Available online 2 June
@@ -59,27 +62,29 @@ The NAB scores are normalized such that the maximum possible is 100.0 (i.e. the

\**** We have included the results for RCF using an [AWS proprietary implementation](https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sqlrf-random-cut-forest.html); even though the algorithm code is not open source, the [algorithm description](http://proceedings.mlr.press/v48/guha16.pdf) is public and the code we used to run [NAB on RCF](nab/detectors/random_cut_forest) is open source.


† Algorithm was an entry to the [2016 NAB Competition](http://numenta.com/blog/2016/08/10/numenta-anomaly-benchmark-nab-competition-2016-winners/).

Please see [the wiki section on contributing algorithms](https://github.com/numenta/NAB/wiki/NAB-Contributions-Criteria#anomaly-detection-algorithms) for discussion on posting algorithms to the scoreboard.
Please see [the wiki section on contributing
algorithms](https://github.com/numenta/NAB/wiki/NAB-Contributions-Criteria#anomaly-detection-algorithms)
for discussion on posting algorithms to the scoreboard.

#### Corpus

The NAB corpus of 58 timeseries data files is designed to provide data for research
in streaming anomaly detection. It is comprised of both
real-world and artifical timeseries data containing labeled anomalous periods of behavior.
The NAB corpus of 58 timeseries data files is designed to provide data for
research in streaming anomaly detection. It is comprised of both real-world
and artificial timeseries data containing labeled anomalous periods of
behavior.

The majority of the data is real-world from a variety of sources such as AWS
server metrics, Twitter volume, advertisement clicking metrics, traffic data,
and more. All data is included in the repository, with more details in the [data
readme](https://github.com/numenta/NAB/tree/master/data). We are in the process
of adding more data, and actively searching for more data. Please contact us at
[nab@numenta.org](mailto:nab@numenta.org) if you have similar data (ideally with
known anomalies) that you would like to see incorporated into NAB.
and more. All data is included in the repository, with more details in the
[data readme](https://github.com/numenta/NAB/tree/master/data). Please
contact us at [nab@numenta.org](mailto:nab@numenta.org) if you have similar
data (ideally with known anomalies) that you would like to see incorporated
into NAB.

The NAB version will be updated whenever new data (and corresponding labels) is
added to the corpus; NAB is currently in v1.0.
The NAB version will be updated whenever new data (and corresponding labels)
is added to the corpus or other significant changes are made.

#### Additional Scores

@@ -96,8 +101,6 @@ run without likelihood, set the variable `self.useLikelihood` in
to `False`.




| Detector |Standard Profile | Reward Low FP | Reward Low FN |
|---------------|---------|------------------|---------------|
| Numenta HTM using NuPIC v0.5.6* | 70.1 | 63.1 | 74.3 |
@@ -110,66 +113,57 @@ to `False`.

† Algorithm was an entry to the [2016 NAB Competition](http://numenta.com/blog/2016/08/10/numenta-anomaly-benchmark-nab-competition-2016-winners/).

Installing NAB 1.0
------------------
Installing NAB
--------------

### Supported Platforms

- OSX 10.9 and higher
- Amazon Linux (via AMI)

Other platforms may work but have not been tested.

Other platforms may work. NAB has been tested on Windows 10 but is not
officially supported.

### Initial requirements

You need to manually install the following:

- [Python 2.7](https://www.python.org/download/)
- [Python 3.6](https://www.python.org/download/)
- [pip](https://pip.pypa.io/en/latest/installing.html)
- [NumPy](http://www.numpy.org/)
- [NuPIC](http://www.github.com/numenta/nupic) (only required if running the Numenta detector)

##### Download this repository
#### Download this repository

Use the Github links provided in the right sidebar.

##### Install the Python requirements

cd NAB
(sudo) pip install -r requirements.txt

This will install the required modules.

##### Install NAB

Recommended:
#### Install NAB

pip install . --user
##### Pip:

From inside the checkout directory:

> Note: If NuPIC is not already installed, the version specified in
`NAB/requirements.txt` will be installed. If NuPIC is already installed, it
will not be re-installed.

pip install -r requirements.txt
pip install . --user

If you want to manage dependency versions yourself, you can skip dependencies
with:

pip install . --user --no-deps


If you are actively working on the code and are familiar with manual
PYTHONPATH setup:

pip install -e . --install-option="--prefix=/some/other/path/"
pip install -e . --install-option="--prefix=/some/other/path/"

##### Anaconda:

conda env create

### Usage

There are several different use cases for NAB:

1. If you just want to look at all the results we reported in the paper, there
1. If you want to look at all the results we reported in the paper, there
is no need to run anything. All the data files are in the data subdirectory and
all individual detections for reported algorithms are checked in to the results
subdirectory. Please see the README files in those locations.
@@ -178,31 +172,28 @@ subdirectory. Please see the README files in those locations.
`scripts` directory for `scripts/plot.py`

1. If you have your own algorithm and want to run the NAB benchmark, please see
the [NAB Entry Points](https://github.com/numenta/NAB/wiki#nab-entry-diagram)
the [NAB Entry Points](https://github.com/numenta/NAB/wiki/NAB-Entry-Points)
section in the wiki. (The easiest option is often to simply run your algorithm
on the data and output results in the CSV format we specify. Then run the NAB
scoring algorithm to compute the final scores. This is how we scored the Twitter
algorithm, which is written in R.)

1. If you are a NuPIC user and just want to run the Numenta HTM detector follow
1. If you are a NuPIC user and want to run the Numenta HTM detector follow
the directions below to "Run HTM with NAB".

1. If you want to run everything including the bundled Skyline detector follow
the directions below to "Run full NAB". Note that this will take hours as the
Skyline code is quite slow.

1. If you just want to run NAB on one or more data files (e.g. for debugging)
1. If you want to run NAB on one or more data files (e.g. for debugging)
follow the directions below to "Run a subset of NAB".
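
The "run your algorithm on the data and output results in the CSV format we specify" route from the list above can be sketched in Python as follows. This is a minimal sketch, not NAB's own code: `write_results` and `score_fn` are hypothetical names, and the `timestamp`, `value`, and `anomaly_score` column names are assumptions that should be checked against the NAB Entry Points wiki page before scoring.

```python
import csv


def write_results(input_csv, output_csv, score_fn):
    """Read a data file and write one scored row per record.

    Assumes the input has `timestamp` and `value` columns and that
    `score_fn` maps a float value to an anomaly score in [0, 1].
    """
    with open(input_csv, newline="") as src, \
            open(output_csv, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        # Assumed header; confirm against the format NAB specifies.
        writer.writerow(["timestamp", "value", "anomaly_score"])
        for row in reader:
            score = score_fn(float(row["value"]))
            writer.writerow([row["timestamp"], row["value"], score])
```

Any detector, in any language, can produce a file of this shape; the scoring steps of `run.py` are then applied to the result files rather than to the detector itself.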


##### Run HTM with NAB

First make sure NuPIC is installed and working properly. Then:
##### Run a detector on NAB

cd /path/to/nab
python run.py -d numenta --detect --optimize --score --normalize
python run.py -d expose --detect --optimize --score --normalize

This will run the Numenta detector only and produce normalized scores. Note that
This will run the EXPoSE detector only and produce normalized scores. Note that
by default it tries to use all the cores on your machine. The above command
should take about 20-30 minutes on a current powerful laptop with 4-8 cores.
For debugging you can run subsets of the data files by modifying and specifying
@@ -212,27 +203,27 @@ specific label files (see section below). Please type:

to see all the options.

Note that to replicate results exactly as in the paper you may need to checkout
the specific version of NuPIC (and associated nupic.core) that is noted in the
[Scoreboard](https://github.com/numenta/NAB/wiki/NAB%20Scoreboard):
##### Running non-Python 3 detectors

NAB is a Python 3 framework and can only integrate Python 3 detectors. The following detectors must be run outside the NAB runtime, with their results integrated for scoring in a later step:

cd /path/to/nupic/
git checkout -b nab {TAG NAME}
cd /path/to/nupic.core/
git checkout -b nab {TAG NAME}
numenta (Python 2)
numentaTM (Python 2)
htmjava (Python 2 / Java)
twitterADVec (R)
random_cut_forest (AWS Kinesis Analytics)

Instructions on how to run each detector in its native environment can be found in the `nab/detectors/${name}` directory. The Python 2 HTM detectors are also provided in a Docker image, available with `docker pull numenta/nab:py2.7`.

##### Run full NAB

cd /path/to/nab
python run.py

This will run everything and produce results files for all anomaly detection
methods. Several algorithms are included in the repo, such as the Numenta
HTM anomaly detection method, as well as methods from the [Etsy
Skyline](https://github.com/etsy/skyline) anomaly detection library, a sliding
window detector, Bayes Changepoint, and so on. This will also pass those results
files to the scoring script to generate final NAB scores. **Note**: this option
will take many many hours to run.
This will run all detectors available in this repository and produce results
files. To run non-Python 3 detectors, see "Running non-Python 3 detectors" above.

**Note**: this option may take many hours to run.

##### Run subset of NAB data files

@@ -248,7 +239,7 @@ are interested in.
NAB on a subset of labels:

cd /path/to/nab
python run.py -d numenta --detect --windowsFile labels/combined_windows_tiny.json
python run.py -d expose --detect --windowsFile labels/combined_windows_tiny.json

This will run the `detect` phase of NAB on the data files specified in the above
JSON file. Note that scoring and normalization are not supported with this
20 changes: 20 additions & 0 deletions environment.yml
@@ -0,0 +1,20 @@
name: NAB
channels:
- defaults
- conda-forge

dependencies:
- python=3.6
- pip

# See requirements.txt
- pandas==0.20.3
- simplejson==3.11.1
- boto3==1.9.134
- scikit-learn==0.21.1

- pip:
  - boto3
  - botocore
  # Install NAB in development mode
  - -e .