Big Data Capabilities in TIBCO Spotfire®
Last updated: 7:39am Feb 28, 2017

Overview

TIBCO Spotfire® offers three primary types of native integration with Hadoop and other big data sources:

  • Visualizing data: Native, out-of-the-box data connectors that enable very fast, interactive data visualization and simple statistics.
  • Performing more advanced statistical and machine learning calculations:
    • Bring the engine to the data: Integration with in-datasource distributed computing frameworks that enable data calculations of any complexity on big data.
    • Bring the data to the engine: Integration with external statistical engines that get data directly from any data source, including traditional databases.

Together, these modes of integration combine visual data discovery with advanced analytics. They enable business users to access, combine, and analyze data from any underlying data structure simply by interacting with powerful, easy-to-use point-and-click dashboards and workflows.

Basic Big Data Architecture

For sources such as Hadoop that support the Engine-to-Data approach

Data connectors provide fast, interactive, out-of-the-box data visualization. Simply interacting with a visualization drives the connectors to retrieve only the data slice or aggregation that the visualization needs. Connectors are available for Spark SQL, Databricks Cloud, Impala, Hive, Hortonworks, Teradata, Netezza, Vertica, and others. The connectors access data in three modes:

  • In-memory: row-level data is brought into Spotfire client memory.
  • In-datasource: calculations are performed in the datasource on big data, so only the aggregated statistics are brought back to Spotfire.
  • On-demand: subsets of row-level data are swapped in and out of memory as needed.
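To make the three access modes concrete, here is a minimal Python sketch that uses an in-memory SQLite table as a stand-in for a big data source. The table, data, and function names are hypothetical, and no Spotfire connector API is used; the sketch only illustrates where the work happens in each mode.

```python
import sqlite3


def make_source():
    """Build a small SQLite table standing in for a big data source (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 20.0), ("west", 5.0), ("west", 15.0), ("north", 7.0)],
    )
    return conn


def in_memory(conn):
    """In-memory mode: copy every row to the client, then aggregate locally."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)  # (result, number of rows transferred)


def in_datasource(conn):
    """In-datasource mode: push the aggregation down so only summary rows come back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


def on_demand(conn, region):
    """On-demand mode: fetch only the row-level slice the current view needs."""
    rows = conn.execute(
        "SELECT region, amount FROM sales WHERE region = ? ORDER BY amount",
        (region,),
    ).fetchall()
    return rows, len(rows)
```

Both `in_memory` and `in_datasource` produce the same regional totals, but the first transfers all five rows to the client while the second returns only three summary rows; on a large table that difference is what keeps the visualization responsive.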

In-datasource analytics for advanced statistical and machine learning algorithms are performed with a distributed computing framework such as Spark, MapReduce, and others. The framework runs engines distributed across the datasource nodes, often in master-slave configurations. These algorithms can be initiated from Spotfire to run in-datasource on very large datasets, returning to Spotfire only the results needed for visualizations.
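The engine-to-data idea can be illustrated with a toy, single-process Python sketch. Each list stands in for the data held on one datasource node, `local_stats` plays the role of the engine running next to that data, and `combine` merges the small partial results on the coordinating node. This is not the Spark or MapReduce API; it assumes a simple mean/variance calculation and uses the Chan et al. pairwise update to merge per-partition summaries.

```python
from functools import reduce


def local_stats(partition):
    """Runs 'on the node': summarize one partition as (count, mean, M2) via Welford's method."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in partition:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def combine(a, b):
    """Runs on the coordinator: merge two partial summaries (Chan et al. parallel update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def distributed_mean_variance(partitions):
    """Only the small (count, mean, M2) tuples travel back, never the raw rows."""
    n, mean, m2 = reduce(combine, map(local_stats, partitions))
    return mean, m2 / n  # population variance
```

For example, the partitions `[[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]` give a mean of 3.5 and a population variance of 17.5 / 6, the same as computing over all six values at once, while only three summary tuples reach the coordinator.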

Spotfire includes an embedded TIBCO Enterprise Runtime for R (TERR) engine, a robust, high-performance platform for advanced analytics in R, designed from the ground up for enterprise scalability and embedded analytics. TERR scripts can initiate distributed computing jobs via MapReduce, SparkR, H2O, or Fuzzy Logix, and TERR can also be deployed as the advanced analytics engine on Hadoop nodes.

Putting it all together: Combining all these capabilities means that very sophisticated analytic use cases can be encapsulated in easy-to-use, interactive Spotfire dashboards. This lets business users visualize and analyze data without worrying about the details of the Hadoop architecture or how calculations are performed.

Complete Big Data Architecture

Shows both the Engine-to-Data and Data-to-Engine approaches, enabling big data advanced analytics with any datasource. 

Spotfire Big Data Architectural Diagram. For a detailed explanation of this diagram, with demos that illustrate the capabilities, view this video: Deriving Value from Big Data

Technology

General

Specific Technologies

Apache Drill

Cloudera Impala

Spark

TIBCO Spark Accelerator

H2O

  • Insight to Action - Integrating Spotfire and TERR with H2O Machine Learning - H2O World 2015

Use Cases

Learn More
