Anomaly Detection Template for TIBCO Spotfire®

This template detects anomalous data points in a dataset using an autoencoder algorithm. It features automated machine learning to facilitate use by business analysts and citizen data scientists. The Time Series release of the template adds time series analysis and clustering of anomalies.

Compatible Products

TIBCO Spotfire®

Provider

TIBCO Software

Supported Versions

This analysis has been tested with Spotfire 7.10, TERR 4.4.0, and the CRAN packages data.table (version 1.10.4) and h2o (version 3.20.0.2).

License

TIBCO Component Exchange License

Overview

Anomaly detection is a way of detecting abnormal behavior. This template uses an autoencoder machine learning model to specify expected behavior and then monitors new data to match and highlight unexpected behavior. Version 2 features automated machine learning to optimize model tuning parameters. The Time Series release adds time series analysis, so the template can be used as a form of 'control chart'. It also adds input component drill-down, to find the features that contribute most to a reconstruction error, and clustering analysis, to group and analyze similar anomalies.

License Details

Release(s)

Release P1.0

Published: March 2017

Initial Release

Release P2.0

Release P1.0 for Time Series Analysis

Reviews (10)
5
sihemima-0 1:57am 03/28/2018

Very useful. Here is a video that explains the concepts as well: https://www.youtube.com/watch?v=Ebdp5Ao1o9o

 

5
jeyu 11:37am 07/10/2017

Quite useful model to present the power of machine learning

5
amartens 6:30am 07/05/2017

Great template. Very easy to use and well documented. 

5
rtanuwaj 2:49am 07/05/2017

Now it's time for Spotfire to extend the usecase for Advanced Statistic Modeling to build supervised and unsupervised models.

5
Sihem 1:07am 06/12/2017

Amazing demo that shows how Powerful Spotfire is!

Pages

Anomaly Detection with Autoencoder Machine Learning - Template for TIBCO Spotfire®

 

Anomaly detection is a way of detecting abnormal behavior. The technique first uses machine learning models to specify expected behavior and then monitors new data to match and highlight unexpected behavior (see the anomaly detection definition listed under References).

Use cases for Anomaly detection

Fighting Financial Crime – In the financial world, trillions of dollars’ worth of transactions happen every minute. Identifying suspicious ones in real time can provide organizations the necessary competitive edge in the market. Over the last few years, leading financial companies have increasingly adopted big data analytics to identify abnormal transactions, clients, suppliers, or other players. Machine Learning models are used extensively to make predictions that are more accurate.

Monitoring Equipment Sensors – Many different types of equipment, vehicles and machines now have sensors. Monitoring these sensor outputs can be crucial to detecting and preventing breakdowns and disruptions. Unsupervised learning algorithms such as autoencoders are widely used to detect anomalous data patterns that may predict impending problems.

Healthcare claims fraud – Insurance fraud is a common occurrence in the healthcare industry. It is vital for insurance companies to identify fraudulent claims and ensure that no payout is made for them. The Economist recently published an article that estimated the cost of insurance fraud, together with the expense of fighting it, at $98 billion. This amount would account for around 10% of annual Medicare and Medicaid spending. In the past few years, many companies have invested heavily in big data analytics to build supervised, unsupervised and semi-supervised models to predict insurance fraud.

Manufacturing defects – Autoencoders are also used in manufacturing to find defects. Manual inspection to find anomalies is a laborious, offline process, and building machine learning models for each part of the system is difficult. Therefore, some companies have implemented an autoencoder-based process in which sensor data on manufactured components is continuously fed into a database, and any defects (i.e. anomalies) are detected by an autoencoder model that scores the new data.

Techniques for Anomaly detection

Companies around the world have used many different techniques to fight fraud in their markets. While the list below is not comprehensive, three anomaly detection techniques have been popular:

Visual Discovery - Anomaly detection can be accomplished through visual discovery. In this process, a team of data analysts or business analysts builds bar charts, scatter plots and other visualizations to find unexpected behavior in their business. This technique often requires prior business knowledge of the industry and a lot of creative thinking to choose the visualizations that reveal the answers.

Supervised Learning - Supervised Learning is an improvement over visual discovery. In this technique, persons with business knowledge in the particular industry label a set of data points as normal or anomaly. An analyst then uses this labelled data to build machine learning models that will be able to predict anomalies on unlabeled new data.

Unsupervised Learning - Another technique that is very effective but less popular is unsupervised learning. In this technique, unlabeled data is used to build unsupervised machine learning models. These models are then used to score new data. Since the model is tailored to fit normal data, the small number of data points that are anomalies stand out.
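As a minimal sketch of how anomalies "stand out" once a model has been fitted to normal data, the example below derives a cut-off from scores observed on normal data and flags new scores above it. The score values, the function names and the choice of mean plus three standard deviations are all illustrative assumptions, not part of the template.

```python
import statistics

def fit_threshold(normal_scores, k=3.0):
    """Return mean + k * stdev of scores seen on normal data (k is an assumed choice)."""
    return statistics.mean(normal_scores) + k * statistics.stdev(normal_scores)

def flag_anomalies(scores, threshold):
    """Return the positions of scores that exceed the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical anomaly scores (for example, reconstruction errors) on normal data.
normal_scores = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08, 0.12]
threshold = fit_threshold(normal_scores)

# New data: most scores resemble the normal ones, one does not.
new_scores = [0.11, 0.09, 0.95, 0.12]
print(flag_anomalies(new_scores, threshold))  # prints [2]
```

In practice the cut-off is a business decision: a lower value of k flags more points for review, a higher value flags fewer.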

Some examples of unsupervised learning algorithms are -

Autoencoders – Unsupervised neural networks, or autoencoders, are trained to replicate the input dataset while the number of hidden units in the network is restricted. Scoring a data point produces a reconstruction error. The higher the reconstruction error, the higher the likelihood that the data point is an anomaly.

Clustering – In this technique, the analyst attempts to assign each data point to one of a pre-defined number of clusters by minimizing the within-cluster variance. K-means clustering is a common choice for this purpose, and distance-based methods such as K-nearest neighbors are used in a similar way. These models serve the purpose because data points that do not look similar to normal data end up far from the clusters of normal points, or in small clusters of their own.

One-class support vector machine – In a support vector machine, the effort is to find a hyperplane that best divides a set of labelled data into two classes. For this purpose, the distance between the two nearest data points that lie on either side of the hyperplane is maximized. For anomaly detection, a One-class support vector machine is used and those data points that lie much farther away than the rest of the data are considered anomalies.

Time Series techniques – Anomalies can also be detected through time series analytics by building models that capture trend, seasonality and levels in time series data. These models are then used along with new data to find anomalies.
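The time series approach can be sketched with a deliberately simple baseline. The example below models only a repeating seasonal level (no trend), estimates how far normal history strays from it, and flags new values that stray much further. The data, the period of 4, the function names and the multiplier k are assumptions made for illustration.

```python
import statistics

def fit_seasonal_baseline(history, period):
    """Estimate the mean level at each position in the season and the residual spread."""
    phase_means = [statistics.mean(history[p::period]) for p in range(period)]
    residuals = [x - phase_means[i % period] for i, x in enumerate(history)]
    return phase_means, statistics.pstdev(residuals)

def find_anomalies(new_values, start_index, phase_means, spread, k=4.0):
    """Return offsets of new values further than k * spread from the seasonal level."""
    period = len(phase_means)
    flagged = []
    for offset, x in enumerate(new_values):
        expected = phase_means[(start_index + offset) % period]
        if abs(x - expected) > k * spread:
            flagged.append(offset)
    return flagged

# Hypothetical history: three cycles of a pattern that repeats every 4 steps.
history = [10, 20, 30, 20, 11, 19, 31, 21, 9, 21, 29, 19]
phase_means, spread = fit_seasonal_baseline(history, period=4)

# New cycle: the third value should be near 30 but is 12.
new_values = [10, 20, 12, 20]
print(find_anomalies(new_values, len(history), phase_means, spread))  # prints [2]
```

A production model would also need to handle trend and changes in level, which this sketch ignores.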

Auto encoders explained

Autoencoders use unsupervised neural networks that are both similar to and different from a traditional feed-forward neural network. They are similar in that they use the same principles (i.e. backpropagation) to build a model. They are different in that they do not use a labelled dataset containing a target variable to build the model. An unsupervised neural network, also known as an autoencoder, takes the training dataset and attempts to replicate it at the output while the hidden layers/nodes are restricted.

The focus of this model is to learn an identity function, or an approximation of it, that allows it to predict an output that is similar to the input. The task is made non-trivial by placing restrictions on the number of hidden units in the network. For example, if we have 10 columns in a dataset and only five hidden units, the neural network is forced to learn a more restricted representation of the input. By limiting the hidden units, we can force the model to learn a pattern in the data if one indeed exists.
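The effect of restricting the hidden units can be illustrated at a much smaller scale than the 10-column example: two input columns squeezed through a single hidden unit. The sketch below is a linear autoencoder with tied weights, trained by plain gradient descent in pure Python. It is an assumption-laden toy, not the H2O model the template uses; the data, starting weights, learning rate and function names are all chosen for illustration.

```python
import random

def reconstruct(w, x):
    """Encode x into one hidden value, then decode it back to two values (tied weights)."""
    hidden = w[0] * x[0] + w[1] * x[1]
    return [w[0] * hidden, w[1] * hidden]

def reconstruction_error(w, x):
    """Squared distance between the input and its reconstruction."""
    x_hat = reconstruct(w, x)
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2

def train(data, epochs=500, learning_rate=0.05):
    """Minimise the mean reconstruction error by batch gradient descent."""
    w = [0.5, 0.5]  # fixed starting weights keep the run reproducible
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in data:
            hidden = w[0] * x[0] + w[1] * x[1]
            err = [x[0] - w[0] * hidden, x[1] - w[1] * hidden]
            err_dot_w = err[0] * w[0] + err[1] * w[1]
            for j in range(2):
                # Derivative of the squared error with respect to w[j].
                grad[j] += -2.0 * (hidden * err[j] + err_dot_w * x[j])
        w = [w[j] - learning_rate * grad[j] / len(data) for j in range(2)]
    return w

# Hypothetical "normal" data: the second column is about twice the first.
rng = random.Random(0)
normal = [[t / 10, 2 * t / 10 + rng.uniform(-0.05, 0.05)] for t in range(-10, 11)]
w = train(normal)

normal_errors = [reconstruction_error(w, x) for x in normal]
anomaly = [1.0, -1.0]  # breaks the learned relationship between the columns
print("largest error on normal data:", max(normal_errors))
print("error on the anomaly:", reconstruction_error(w, anomaly))
```

Because one hidden unit can only represent a single direction in the data, the network learns the relationship between the two columns, and a point that violates that relationship cannot be reconstructed well.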

Not restricting the number of hidden units and instead specifying a ‘sparsity’ constraint on the neural network can also find an interesting structure.
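One common way to write a sparsity constraint, following the sparse autoencoder formulation in the Stanford notes listed under References, is to add a penalty to the reconstruction cost that grows when hidden units are active more often than a small target rate. The symbols below are introduced here for illustration: $J(W,b)$ is the reconstruction cost, $\rho$ the target average activation, $\hat{\rho}_j$ the observed average activation of hidden unit $j$ over the $m$ training rows, $s$ the number of hidden units, and $\beta$ the weight of the penalty. The formula assumes activations between 0 and 1 and is not necessarily the exact form used by the template.

```latex
\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j\bigl(x^{(i)}\bigr)

J_{\text{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{s} \mathrm{KL}\bigl(\rho \,\|\, \hat{\rho}_j\bigr)

\mathrm{KL}\bigl(\rho \,\|\, \hat{\rho}_j\bigr)
  = \rho \log \frac{\rho}{\hat{\rho}_j}
  + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}
```

The penalty is zero when $\hat{\rho}_j = \rho$ and increases as the average activation of a unit moves away from the target.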

Each of the hidden units can be either active or inactive and an activation function such as ‘tanh’ or ‘Rectifier’ can be applied to the input at these hidden units to change their state.
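A short sketch of what applying an activation function to a hidden unit means in practice: the unit computes a weighted sum of its inputs and passes it through the function. The weights, inputs and helper names below are hypothetical.

```python
import math

def tanh(z):
    """Squash the weighted sum into the range (-1, 1)."""
    return math.tanh(z)

def rectifier(z):
    """Pass positive sums through unchanged; output 0 (inactive) otherwise."""
    return max(0.0, z)

def hidden_activation(weights, bias, inputs, activation):
    """Weighted sum of the inputs plus bias, passed through the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

inputs = [1.0, -2.0]
weights = [0.5, 1.0]  # weighted sum is 0.5 - 2.0 = -1.5

print(hidden_activation(weights, 0.0, inputs, rectifier))  # prints 0.0: the unit is inactive
print(hidden_activation(weights, 0.0, inputs, tanh))       # a value between -1 and 0
```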

Some forms of autoencoders, and related design topics, are as follows –

  • Undercomplete Autoencoders
  • Regularized Autoencoders
  • Representational Power, Layer Size and Depth
  • Stochastic Encoders and Decoders
  • Denoising Autoencoders

A detailed explanation of each of these topics is available in the Deep Learning book (see References).

Spotfire Template for Anomaly detection

TIBCO Spotfire’s Anomaly detection template uses an autoencoder trained in H2O for best-in-market training performance. It can be configured with document properties on Spotfire pages and used as point-and-click functionality.

Download the template from the Component Exchange. See the documentation in the download distribution for details on how to use this template.

Time Series Analysis

Using AI to detect complex anomalies in time series data

Here is a presentation on recent work using Deep Learning Autoencoders for Anomaly Detection in Manufacturing.  In a dynamic manufacturing environment, it may not be adequate to only look for known process problems, but also important to uncover and react to new, previously unseen patterns and problems as they emerge.  Univariate and linear multivariate Statistical Process Control methods have traditionally been used in manufacturing to detect anomalies.  With increasing equipment, process and product complexity, multivariate anomalies that also involve significant interactions and nonlinearities may be missed by these more traditional methods.  This is a method for identifying complex anomalies using a deep learning autoencoder.  Once the anomalies are detected, their fingerprints are generated so they can be classified and clustered, enabling investigation of the causes of the clusters.  As new data streams in, it can be scored in real-time to identify new anomalies, assign them to clusters and respond to mitigate potential problems.  These tools are no longer the exclusive province of data scientists.  After an initial configuration, the method shown can be routinely employed by engineers who do not have deep expertise in data science.  Watch the video and view the slides below:      
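The fingerprint and cluster-assignment steps described above can be sketched as follows. The source does not define how a fingerprint is computed, so this example assumes it is the vector of per-feature squared reconstruction errors, and assigns a new anomaly to the nearest of a set of previously found cluster centres. The feature names, centroid values, cluster labels and function names are all hypothetical.

```python
import math

def fingerprint(x, x_hat):
    """Per-feature squared reconstruction error (an assumed definition of 'fingerprint')."""
    return [(a - b) ** 2 for a, b in zip(x, x_hat)]

def top_feature(fp, names):
    """Name of the feature that contributes most to the reconstruction error."""
    return names[max(range(len(fp)), key=fp.__getitem__)]

def nearest_cluster(fp, centroids):
    """Label of the cluster whose centre is closest to the fingerprint."""
    return min(centroids, key=lambda label: math.dist(fp, centroids[label]))

names = ["temperature", "pressure", "vibration"]

# Hypothetical cluster centres found earlier by clustering past anomaly fingerprints.
centroids = {
    "overheating": [4.0, 0.1, 0.1],
    "vibration fault": [0.1, 0.1, 3.0],
}

# A new observation and the autoencoder's reconstruction of it (made-up values).
x = [9.0, 1.0, 0.5]
x_hat = [7.0, 1.1, 0.4]

fp = fingerprint(x, x_hat)
print(top_feature(fp, names))          # prints temperature
print(nearest_cluster(fp, centroids))  # prints overheating
```

The largest entry in the fingerprint gives the drill-down to the feature driving the error, while the nearest cluster suggests which known type of problem the new anomaly resembles.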

Autoencoder deployed to Hi Tech Manufacturing Accelerator for real-time monitoring:

References:

Anomaly detection definition - Wikipedia

Autoencoders – Deep Learning book

Autoencoders - Stanford publication

Digit recognition (Image Search)

H2O Deep learning
