TIBCO Spotfire® offers three primary types of native integration with Hadoop and other big data sources:
- Visualizing Data: Native, out-of-the-box data connectors that enable super-fast interactive data visualizations and simple statistics.
- Performing more advanced statistical and machine learning calculations:
  - Bring the engine to the data: Integration with in-datasource distributed computing frameworks that enable data calculations of any complexity on big data.
  - Bring the data to the engine: Integration with external statistical engines that get data directly from any data source, including traditional databases.
Together, these modes of integration offer a combination of visual data discovery and advanced analytics. They enable business users to access, combine, and analyze data from any underlying data structures simply by interacting with point-and-click dashboards and workflows that are powerful and easy to use.
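The difference between bringing the engine to the data and bringing the data to the engine comes down to where the calculation runs and how much data has to move. Below is a minimal Python sketch of that trade-off. It uses an in-memory SQLite database as a stand-in for a big data source. The table and column names are hypothetical, and this is not how Spotfire or TERR is implemented internally.

```python
import sqlite3
import statistics

# Stand-in for a big data source: an in-memory SQLite table.
# The table and column names (sales, region, amount) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0), ("west", 15.0), ("west", 25.0)],
)

def engine_to_data(conn):
    """Push the calculation into the source; only one row per region is transferred."""
    rows = conn.execute(
        "SELECT region, AVG(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows), len(rows)

def data_to_engine(conn):
    """Transfer every row, then compute in the external engine (plain Python here)."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    groups = {}
    for region, amount in rows:
        groups.setdefault(region, []).append(amount)
    means = {region: statistics.mean(vals) for region, vals in sorted(groups.items())}
    return means, len(rows)

print(engine_to_data(conn))  # ({'east': 20.0, 'west': 15.0}, 2)
print(data_to_engine(conn))  # ({'east': 20.0, 'west': 15.0}, 5)
```

Both functions return the same answer. The second number is the count of rows that crossed from the source to the engine. That count stays small for the first approach, and grows with the table for the second.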
Basic Big Data Architecture
For sources such as Hadoop that support the Engine-to-Data approach.
Data connectors facilitate fast, interactive, out-of-the-box data visualization. Simply interacting with visualizations drives the connectors to retrieve only the data slice or aggregation needed for that visualization. Connectors are available for Spark SQL, Databricks Cloud, Impala, Hive, Hortonworks, Teradata, Netezza, Vertica, and others. The connectors access data in three modes: in-memory, in-datasource, and on-demand.
- In-memory: Row-level data is brought into Spotfire client memory.
- In-datasource: Calculations are performed in the datasource on big data, so only the aggregate statistics are brought back to Spotfire.
- On-demand: Subsets of row-level data are swapped in and out of memory as needed.
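On-demand mode is the least obvious of the three, so a short illustration may help. Below is a minimal Python sketch of the general idea: fetch a row-level slice only when a filter asks for it, keep a bounded number of slices in memory, and evict the least recently used one. SQLite stands in for the data source. The table, the class `OnDemandSlices`, and the `capacity` parameter are all hypothetical, and this is not Spotfire's actual caching policy.

```python
import sqlite3
from collections import OrderedDict

# Stand-in data source; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("east", 1), ("east", 2), ("west", 3), ("north", 4), ("north", 5)],
)

class OnDemandSlices:
    """Hypothetical helper: keep at most `capacity` row-level slices in memory."""

    def __init__(self, conn, capacity=2):
        self.conn = conn
        self.capacity = capacity
        self.cache = OrderedDict()  # region -> rows, least recently used first
        self.fetches = 0  # round trips to the data source

    def get(self, region):
        if region in self.cache:
            self.cache.move_to_end(region)  # mark as most recently used
            return self.cache[region]
        rows = self.conn.execute(
            "SELECT region, value FROM events WHERE region = ? ORDER BY value",
            (region,),
        ).fetchall()
        self.fetches += 1
        self.cache[region] = rows
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # swap out the least recently used slice
        return rows

slices = OnDemandSlices(conn, capacity=2)
slices.get("east")   # fetched from the source
slices.get("west")   # fetched from the source
slices.get("east")   # served from memory
slices.get("north")  # fetched; "west" is swapped out
print(slices.fetches, list(slices.cache))  # 3 ['east', 'north']
```

Least-recently-used eviction is just one simple policy, chosen here for clarity. The point of the sketch is that memory use stays bounded, while the rows the user is looking at stay at full row-level detail.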
In-datasource analytics for advanced statistical and machine learning algorithms are performed with a distributed computing framework such as Spark or MapReduce. The framework runs engines distributed across the datasource nodes, often in master-slave configurations. These algorithms can be initiated from Spotfire and run in-datasource on very large datasets, returning to Spotfire only the results needed for visualizations.
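Running a calculation in-datasource usually means splitting it into a per-node "map" step and a "reduce" step that merges small partial results. Below is a minimal, single-process Python sketch of that pattern for a mean and variance. The lists in `partitions` stand in for data held on separate nodes. This is an illustration of the general MapReduce idea, not Spark or MapReduce code.

```python
from functools import reduce

# Hypothetical partitions standing in for data stored on separate datasource nodes.
partitions = [
    [2.0, 4.0, 4.0],
    [4.0, 5.0],
    [5.0, 7.0, 9.0],
]

def map_partition(values):
    """Runs 'on the node': summarize a partition as (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(a, b):
    """Merge two partial summaries; associative, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

n, total, total_sq = reduce(combine, map(map_partition, partitions))
mean = total / n
variance = total_sq / n - mean * mean  # population variance
print(n, mean, variance)  # 8 5.0 4.0
```

Only the three-number summary from each partition leaves a node, no matter how many rows the partition holds. The sum-of-squares formula is the simplest to show, but it can lose precision on large values. Production frameworks typically use a numerically stabler merge, such as Chan et al.'s parallel variance update.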
Spotfire includes an embedded TIBCO Enterprise Runtime for R (TERR) engine, a robust, high-performance platform for advanced analytics in R, designed from the ground up for enterprise scalability and embedded analytics. TERR scripts can initiate distributed computing jobs via MapReduce, SparkR, H2O, or Fuzzy Logix, and TERR can also be deployed as the advanced analytics engine on Hadoop nodes.
Putting it all together: Combining all these powerful functionalities means that very sophisticated analytic use cases can be encapsulated in easy-to-use interactive Spotfire dashboards. This empowers business users to visualize and analyze without worrying about the details of the Hadoop architecture or how calculations are performed.
Complete Big Data Architecture
Shows both the Engine-to-Data and Data-to-Engine approaches, enabling big data advanced analytics with any datasource.
Spotfire Big Data Architectural Diagram: for a detailed explanation of this diagram, with demos that illustrate its capabilities, view the video Deriving Value from Big Data.
- Best Practices for Data Access with Apache® Hadoop®, Spark™, and big data
- Spotfire Big Data Connectors - How to Navigate Big Data with Ad Hoc Visual Data Discovery
- Easy to Use Interfaces for Hadoop Calculations: Advanced Analytics with MapR and H2O
- Spotfire and Hadoop: Interactive Analysis on Big Data
- Putting It All Together: Creating a Big Data Analytic Workflow with Spotfire
- Cloudera Impala and TIBCO Spotfire®: Fast Interactive Visual Analytics on Big Data
- Spotfire running on Cloudera Impala: 20-second video showing the speed of Spotfire's in-database connectivity
- 4 Easy Steps for Ultra-Fast Visualization of Big Data with Spotfire and Spark SQL
- Spark TERR - the power of R in Spark with the speed of TERR: How to configure SparkR to use TIBCO Enterprise Runtime for R (documentation)
- An Insider's Guide to Apache Spark by insideBIGDATA, sponsored by TIBCO Spotfire
- Apache Spark blog series by insideBIGDATA, sponsored by TIBCO
TIBCO Spark Accelerator
- TIBCO Spotfire Accelerator for Apache Spark - Inspect collected data, Train models, Evaluate the trained models, Build configurations, Deploy models
- Download the Spark Accelerator
- Insight to Action - Integrating Spotfire and TERR with H2O Machine Learning - H2O World 2015
- Customer Analytics With Big Data and Making Real-Time Offers from Big Data Customer Insights using the Spark Accelerator
- Manufacturing Product Quality Improvement
- Big Data Solution Page on the Spotfire website
- Spotfire Data Access and Connectors
- TERR for R performance and Big Data
- Advanced Analytics Use-Case examples