Random Forest Template for TIBCO Spotfire®

Random Forest is an ensemble machine-learning algorithm built from decision trees. This template employs supervised learning to determine variable importance and make predictions, and it features automated machine learning to facilitate use by business analysts and citizen data scientists.

Compatible Products

TIBCO Spotfire®

Provider

TIBCO Software

Supported Versions

This template has been tested with TIBCO Spotfire v7.8, TERR 4.3, Java 7 (64-bit), and the CRAN packages data.table (version 1.10.4) and h2o (version 3.10.3.6).

License

TIBCO Component Exchange License

Overview

Random Forest is a machine-learning algorithm that aggregates the predictions from many decision trees, each built on a different subset of the data. This technique allows the model to be more accurate than a single decision tree in predicting new data. It is a supervised learning technique that can be used to determine variable importance and make predictions. This point-and-click template uses a distributed random forest trained in H2O for best-in-market training performance. The response can be either numeric or binary (e.g., good/bad), and predictors can be a mixture of numeric and categorical columns. Version 2 features automated machine learning to optimize model tuning parameters.


Release(s)

Release P1.0

Published: March 2017

Initial Release

Release P2.0


Random Forest Template for TIBCO Spotfire® - Wiki page

Overview

Random Forest is a machine-learning algorithm that aggregates the predictions from many decision trees built on different subsets of data. This technique allows the model to be more accurate than a single decision tree in predicting new data.

General Market Landscape

Huge amounts of data are created every minute in the world today. Companies around the world have adopted different techniques to extract value from this data, including:

Visual Discovery - Companies employ teams of data analysts to query the data in their databases, build meaningful dashboards, and surface useful patterns that inform business decisions and long-term business strategy. While useful, this process is prone to human error and consumes a lot of time.

Supervised Learning - Supervised learning takes a more refined approach to finding meaningful patterns. In this process, labeled data is used with machine-learning models such as Random Forest to predict either a numeric column (e.g., customer lifetime value) or a categorical column (e.g., the product a customer is likely to purchase). Well-known supervised learning algorithms include Random Forest, Gradient Boosting Machines, and Extreme Gradient Boosting (XGBoost).

Unsupervised Learning - Unsupervised learning finds structure in unlabeled data, for example by grouping similar records together or by flagging data points that do not fit the pattern exhibited by the rest of the data. Popular unsupervised learning algorithms include k-means clustering and autoencoders.
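As a minimal illustration of supervised learning (not the template's own code, which runs in TERR with H2O), the sketch below fits a toy 1-nearest-neighbor classifier on a handful of made-up labeled examples and predicts the category of a new record:

```python
# Toy labeled dataset: (age, monthly_spend) -> product tier purchased.
# All values are invented for illustration.
train = [
    ((25, 40.0), "basic"),
    ((31, 55.0), "basic"),
    ((45, 210.0), "premium"),
    ((52, 260.0), "premium"),
]

def predict_1nn(point):
    """Predict the label of the closest training example (1-nearest neighbor)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], point))[1]

print(predict_1nn((48, 230.0)))  # closest to the "premium" examples
```

The labels in the training data are what makes this "supervised": the model is judged on how well it reproduces known answers before being applied to new records.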

Random Forests Algorithms Explained

Random forests follow a technique known as bagging (also known as bootstrap aggregation). This is an ensemble technique in which a number of decision trees are built on different bootstrap samples of the data, and an aggregation of their individual predictions is used as the final prediction.

An illustration of this technique can be seen in the graphic below - 

The above illustration shows three decision trees and a classification obtained from each of them. The final prediction is based on majority voting and will be ‘Class B’ in the above case.
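That majority vote can be sketched in a few lines of illustrative Python (not part of the template itself):

```python
from collections import Counter

# Hypothetical class predictions from three decision trees for one row,
# matching the illustration above.
tree_votes = ["Class B", "Class A", "Class B"]

# The forest's final classification is the most common vote.
final = Counter(tree_votes).most_common(1)[0][0]
print(final)  # Class B wins 2 votes to 1
```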

When the random forest algorithm receives the data, each tree is grown on a bootstrap sample of the rows, and at each split the tree considers only a random subset of the predictors - typically sqrt(number of columns) for classification or (number of columns)/3 for regression. The algorithm builds as many trees as the number specified.

Then, a decision tree is built on each sample and each tree computes a prediction. A final prediction is computed by aggregating the individual predictions - a majority vote for classification, or an average for regression.
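The bootstrap-and-aggregate loop described above can be sketched as a toy in standard-library Python. For brevity it uses one-split "stumps" in place of full decision trees and omits the per-split feature subsampling; the actual template delegates real training to H2O.

```python
import random
from collections import Counter

random.seed(0)

# Toy binary dataset: a single numeric feature -> label ("good" above 5).
data = [(x, "good" if x > 5 else "bad") for x in range(11)]

def train_stump(sample):
    """Fit a one-split 'stump': the threshold that best separates the labels."""
    best_t, best_err = None, float("inf")
    for t in range(11):
        err = sum((label == "good") != (x > t) for x, label in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_forest(n_trees=7):
    """Bagging: each 'tree' is trained on a bootstrap sample of the rows."""
    return [train_stump(random.choices(data, k=len(data))) for _ in range(n_trees)]

def predict(thresholds, x):
    """Aggregate the trees' votes; the majority class wins."""
    votes = ["good" if x > t else "bad" for t in thresholds]
    return Counter(votes).most_common(1)[0][0]

forest = bagged_forest()
print(predict(forest, 8), predict(forest, 2))
```

Because each stump sees a slightly different bootstrap sample, individual thresholds vary, but the majority vote recovers the true boundary - the same variance-reduction effect that makes a random forest more stable than any single tree.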

Random Forests have the following advantages –

  • Reduces model overfitting relative to a single decision tree
  • Runs efficiently on large datasets
  • Handles missing data
  • Generalizes better to new data
  • Outputs variable importance

Spotfire Template for Random Forest

TIBCO Spotfire’s Random Forest template uses a distributed random forest trained in H2O for best-in-market training performance. It can be configured with document properties on Spotfire pages and used as a point-and-click functionality.

Download the template from the Component Exchange. See the documentation in the download distribution for details on how to use this template.

References:

Layman’s Introduction to Random Forests – Edwin Chen

H2O’s Distributed Random Forest

Random forest – Illustration image
