TIBCO Big Data Capabilities
Last updated: 11:49am May 15, 2018

 Overview

Getting value from your Big Data investment requires being able to exploit the data you store in your cluster. As the word "Big" implies, searching for the pieces of information that bring ROI to your organisation can be like searching for a needle in a haystack, which is why many companies report low ROI on their big data investments.

TIBCO helps organisations exploit their big data through two main avenues, visualising the data and running advanced calculations on it:

  • Visualizing data: native, out-of-the-box data connectors enable fast, interactive data visualizations and simple statistics in a Spotfire dashboard. Give your business and advanced users the ability to query big data sources and make truly data-driven decisions from dashboards that speak the language of the business. Trigger advanced computations in your cluster and consume their results visually in Spotfire's friendly environment.
  • Performing more advanced statistical and machine learning calculations: here again, TIBCO provides different approaches to accommodate the preferences of your data science team.
    • REST API calls to remote services: from a Spotfire dashboard, use TERR to call a Scala job on your big data and consume the results visually in Spotfire. This approach is often preferred by data science teams who love coding.
    • Statistica workflows: Statistica is a workflow tool that connects to your small or big data and lets you apply out-of-the-box advanced computations to it. Call existing H2O or Spark MLlib models, or run Python and R scripts, and let Statistica manage the interaction with the cluster.
    • Spotfire Data Science: Spotfire DS gives you freedom. It is a workflow tool designed for in-database computation on diverse data sources, whether any flavour of Hadoop or more traditional databases. Don't move data around: connect to your big distributed data, use Spotfire DS to ETL the data in an ad hoc manner, then apply your favourite machine learning model or your Python or R code, all in-database. Focus on your data science and leave the engineering to us. Call your Python, R, or Jupyter notebooks, or extend existing nodes with your own customisations, combining the full power of customisation with the simplicity of comprehensive out-of-the-box workflows.
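The REST route described above boils down to sending a job request to a service running next to the cluster and decoding the rows it returns. The page describes doing this from TERR; the sketch below uses Python's standard library purely for illustration, and the `/jobs` route, the JSON body shape and the `rows` response key are hypothetical. A real job service (for example Apache Livy or a custom Scala service) defines its own routes and schema.

```python
import json
import urllib.request

# Hypothetical endpoint layout: POST {base_url}/jobs with a JSON body.
# A real Spark job service defines its own routes and payload schema.
def build_job_request(base_url, job_name, params):
    """Build (but do not send) a JSON POST request asking a remote service to run a job."""
    body = json.dumps({"job": job_name, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_job_result(raw):
    """Decode a JSON response assumed (hypothetically) to carry result rows under 'rows'."""
    payload = json.loads(raw.decode("utf-8"))
    return payload.get("rows", [])

if __name__ == "__main__":
    req = build_job_request("http://cluster.example:8090/", "sales_by_region", {"year": 2017})
    print(req.get_method(), req.full_url)
    print(parse_job_result(b'{"rows": [{"region": "EMEA", "total": 42.0}]}'))
```

To actually submit, pass the request to `urllib.request.urlopen(req)` and feed `response.read()` to `parse_job_result`. Keeping request construction separate from sending makes the payload easy to inspect and test without a live cluster.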

Together, these modes of integration offer a combination of visual data discovery and advanced analytics. They enable business users to access, combine, and analyze data from any underlying data structures simply by interacting with point-and-click dashboards and workflows that are powerful and easy to use.

Visualising Big Data with Spotfire

Moving large chunks of data across applications is hugely inefficient. Spotfire data connectors facilitate fast, interactive, out-of-the-box data visualization, and offer three modes of interaction with data stores so you can tailor data access to your needs. Simply interacting with a visualization drives the connectors to retrieve only the data slice or aggregation that visualization needs. Connectors are available for most data sources, whether traditional SQL databases or Big Data technologies such as Spark SQL, Databricks Cloud, Impala, Hive, Hortonworks, Teradata, Netezza, Vertica and others. The connectors access data in three modes (in-memory, in-datasource and on-demand):

  • In-memory mode brings row-level data into Spotfire client memory. For big data this is sensible only for aggregations, such as time series, where you want to use all of Spotfire's in-memory capabilities, e.g. the built-in time series forecasting or time series clustering to identify groups of time series (e.g. regional sales data) that move together over time.
  • On-demand mode: subsets of row-level data are swapped in and out of memory as needed. For example, you might load all sales data for one geographic region and, once you have completed your analysis, offload that region's data and load the next. Spotfire can also prompt the user at start-up for which section to load.
  • In-datasource mode: calculations are performed in the data source, so only the aggregated statistics are brought back to Spotfire. As the user interacts with Spotfire, it generates the SQL that queries the underlying data source. This lets your own data source, whether Hadoop or a traditional database, do one of the things it does best: answer SQL. Only the answer is sent to Spotfire, which means Spotfire answers your question as fast as the underlying data source allows. Furthermore, when working with in-database data, Spotfire presents users with the set of computations the underlying data source supports, making it seamless for users to compute KPIs in the relevant data store.
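The three modes differ mainly in where the computation happens and how much data crosses the wire. The sketch below uses an in-memory SQLite database as a stand-in for a big data source; the `sales` table, its columns and the SQL are illustrative assumptions, not the connector API or the SQL Spotfire actually generates.

```python
import sqlite3

def make_source():
    """Hypothetical stand-in for a big data source such as Impala or Hive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("EMEA", 1, 100.0), ("EMEA", 2, 120.0),
         ("APAC", 1, 80.0), ("APAC", 2, 95.0),
         ("AMER", 1, 150.0), ("AMER", 2, 160.0)],
    )
    return conn

def in_datasource_totals(conn):
    """In-datasource: push the aggregation down as SQL; one row per region comes back."""
    sql = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return dict(conn.execute(sql).fetchall())

def in_memory_totals(conn):
    """In-memory: pull every row into the client, then aggregate locally."""
    totals = {}
    for region, _month, amount in conn.execute("SELECT region, month, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def on_demand_rows(conn, region):
    """On-demand: load only the row-level slice the user asked for, one region at a time."""
    sql = "SELECT month, amount FROM sales WHERE region = ? ORDER BY month"
    return conn.execute(sql, (region,)).fetchall()

if __name__ == "__main__":
    conn = make_source()
    print(in_datasource_totals(conn))
    print(on_demand_rows(conn, "APAC"))
```

Both totals functions return the same numbers, but `in_memory_totals` transfers every row while `in_datasource_totals` transfers one row per region, which is the difference that matters when the table holds billions of rows.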

Spotfire running on Cloudera Impala: a 20-second video showing the speed of Spotfire's in-database connectivity

 

Beyond SQL with TIBCO - from ETL'ing to Machine Learning

TERR for Big Data Technology and Integration

    For sources like Hadoop that support the Engine-to-Data approach

    In-datasource analytics for advanced statistical and machine learning algorithms are performed with a distributed computing framework such as Spark or MapReduce. The framework runs engines distributed across the data source nodes, often in master-slave configurations. These algorithms can be initiated from Spotfire to run in-datasource on very large datasets, returning to Spotfire only the results needed for visualization.
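The Engine-to-Data idea can be illustrated with a map/reduce-style aggregation: each node summarises its local partition, and only the small summaries travel to the master. In this sketch plain Python lists stand in for partitions on different nodes; it shows the principle, not Spark's or MapReduce's API.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Partial:
    """What a worker node sends back instead of raw rows: count, sum and sum of squares."""
    n: int
    s: float
    ss: float

def map_partition(values):
    """'Map' step: runs next to one local partition of the data."""
    return Partial(len(values), float(sum(values)), float(sum(v * v for v in values)))

def combine(a, b):
    """'Reduce' step on the master: partial summaries merge by simple addition."""
    return Partial(a.n + b.n, a.s + b.s, a.ss + b.ss)

def mean_and_variance(partitions):
    """Population mean and variance computed from per-partition summaries only.
    Assumes at least one partition and at least one value overall."""
    total = reduce(combine, (map_partition(p) for p in partitions))
    mean = total.s / total.n
    return mean, total.ss / total.n - mean * mean

if __name__ == "__main__":
    node_partitions = [[1, 2, 3], [4, 5], [6]]  # pretend each list lives on a different node
    print(mean_and_variance(node_partitions))
```

The count/sum/sum-of-squares summary is easy to merge but can lose precision when values are large relative to their spread; a numerically stabler pairwise update (such as Chan et al.'s parallel variance algorithm) serves the same purpose.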

    Spotfire includes an embedded TIBCO Enterprise Runtime for R (TERR) engine. This is a robust, high-performance platform for R advanced analytics designed from the ground up for enterprise scalability and embedded analytics. TERR scripts can initiate distributed computing jobs via MapReduce, SparkR, H2O, Fuzzy Logix, Spotfire Data Science or Statistica, and TERR can also be deployed as the advanced analytics engine on Hadoop nodes.

    Putting it all together:  Combining all these powerful functionalities means that very sophisticated analytic use cases can be encapsulated in easy-to-use interactive Spotfire dashboards. This empowers business users to visualize and analyze without worrying about the details of the Hadoop architecture or how calculations are performed. 

    If your big data lives in a more traditional SQL-based database, TIBCO also supports a Data-to-Engine approach, which loads the data you are visualising directly into a large central engine to perform whichever computations you deem relevant.
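By contrast, the Data-to-Engine approach copies the query result into the analytics engine and computes there. A minimal sketch, assuming a SQLite table with hypothetical `spend` and `revenue` columns and using an ordinary least-squares line fit as the example computation:

```python
import sqlite3

def load_into_engine(conn, sql):
    """Data-to-Engine: copy the query result into the analytics engine's memory."""
    return conn.execute(sql).fetchall()

def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x, computed entirely in memory.
    Requires at least two distinct x values."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ads (spend REAL, revenue REAL)")
    conn.executemany("INSERT INTO ads VALUES (?, ?)", [(1, 3), (2, 5), (3, 7), (4, 9)])
    rows = load_into_engine(conn, "SELECT spend, revenue FROM ads")
    print(fit_line(rows))
```

This is the right trade-off when the selected data comfortably fits in the engine's memory; beyond that, the Engine-to-Data pattern avoids the transfer altogether.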

    Shows both the Engine-to-Data and Data-to-Engine approaches, enabling all three flavours of visualisation as well as advanced big data analytics with any data source.

    Spotfire Big Data Architectural Diagram - View this video for a detailed explanation of this diagram with demos that illustrate the capabilities: Deriving Value from Big Data

    Machine learning models created in TERR can be run in TIBCO event processing tools, allowing real-time, machine-learning-driven action in a closed loop of Connected Intelligence throughout your organisation.
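Running a model in an event processing tool essentially means applying a scoring function to each event as it arrives and triggering an action when the score warrants it. The sketch below shows that loop in plain Python with a logistic-regression scorer; the weights, feature names, threshold and the `flag_for_review` action are all hypothetical, and this is not the API of any TIBCO event processing product.

```python
import math

# Hypothetical coefficients standing in for a model trained offline (e.g. in TERR).
WEIGHTS = {"amount": 0.004, "foreign": 1.5}
BIAS = -4.0
THRESHOLD = 0.5

def score(event):
    """Logistic-regression probability for a single event; missing features count as 0."""
    z = BIAS + sum(w * event.get(name, 0.0) for name, w in WEIGHTS.items())
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def act_on_stream(events):
    """Score events one at a time as they arrive; yield an action when the threshold is crossed."""
    for event in events:
        p = score(event)
        if p >= THRESHOLD:
            yield ("flag_for_review", event["id"], round(p, 3))

if __name__ == "__main__":
    incoming = [
        {"id": 1, "amount": 100.0, "foreign": 0},
        {"id": 2, "amount": 800.0, "foreign": 1},
    ]
    for action in act_on_stream(incoming):
        print(action)
```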

     

    Statistica Big Data Technology and Integration

    Statistica provides a flexible architecture that can move analytics to the data (into the database or data repository) and orchestrate complex analytic pipelines that combine multiple data sources with in-database, in-memory parallel, and in-server analytics wherever each is most efficient and useful. It can push computations to big data applications such as Spark and H2O from within a simple and powerful workspace interface. Computation results can be visualized in Spotfire.
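An analytic pipeline of this kind can be pictured as a chain of nodes, some pushed down into the database and some executed in memory. The following sketch is a hypothetical two-node workflow, not Statistica's workspace format: an in-database aggregation followed by an in-memory z-score calculation, with SQLite standing in for the data repository and an assumed `orders` table.

```python
import sqlite3
from statistics import mean, pstdev

def in_database_step(conn):
    """Node 1, pushed down: aggregate in the database and return per-customer totals only."""
    sql = "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    return conn.execute(sql).fetchall()

def in_memory_step(totals):
    """Node 2, in memory: z-score each customer's total to flag unusual spenders."""
    values = [total for _, total in totals]
    mu, sigma = mean(values), pstdev(values)
    return {customer: (total - mu) / sigma for customer, total in totals}

def run_pipeline(source, steps):
    """Run the nodes in order, feeding each node's output to the next one."""
    result = source
    for step in steps:
        result = step(result)
    return result

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("A", 10.0), ("A", 20.0), ("B", 50.0), ("C", 40.0), ("C", 30.0)],
    )
    print(run_pipeline(conn, [in_database_step, in_memory_step]))
```

`in_memory_step` divides by the standard deviation, so it assumes the customer totals are not all identical.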

    Statistica integration with Big Data Applications and Spotfire 

    Machine Learning Models created in Statistica can be run in TIBCO event processing tools, allowing a closed loop of Connected Intelligence throughout your organisation.

     

    Spotfire Data Science Big Data Technology and Integration

    TIBCO Spotfire Data Science is an enterprise analytics platform that allows data scientists and business users to collaborate on advanced analytics using massively scalable in-database and in-cluster processing. Data Scientists, Analysts and Data Engineers build machine learning workflows with a minimum of code, while still leveraging the power of Big Data platforms. The collaboration interface then allows the analytics team to share insights and data with the rest of the organization, driving action for the business.
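In-database processing means both the feature engineering and the model scoring run where the data lives. A minimal sketch of that idea with SQLite as the database: features are derived with `CREATE TABLE ... AS SELECT`, and a linear model is translated into a SQL expression so scoring also happens in-database. The tables, columns and coefficients are hypothetical; this is not how Spotfire Data Science represents its workflows internally.

```python
import sqlite3

# Hypothetical linear model trained elsewhere:
# score = 0.2 + 0.05 * tickets - 0.01 * tenure_months
COEFFICIENTS = {"tickets": 0.05, "tenure_months": -0.01}
INTERCEPT = 0.2

def build_features(conn):
    """In-database ETL: derive one feature row per customer without moving data out."""
    conn.execute(
        "CREATE TABLE features AS "
        "SELECT c.id, c.tenure_months, COUNT(t.customer_id) AS tickets "
        "FROM customers c LEFT JOIN support_tickets t ON t.customer_id = c.id "
        "GROUP BY c.id, c.tenure_months"
    )

def scoring_sql(table):
    """Translate the linear model into a SQL expression so scoring also runs in-database.
    Table and column names are interpolated directly, so they must be trusted identifiers."""
    terms = " + ".join(f"({w!r} * {col})" for col, w in COEFFICIENTS.items())
    return f"SELECT id, {INTERCEPT!r} + {terms} AS score FROM {table} ORDER BY id"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, tenure_months INTEGER)")
    conn.execute("CREATE TABLE support_tickets (customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 10), (2, 30)])
    conn.executemany("INSERT INTO support_tickets VALUES (?)", [(1,)] * 4)
    build_features(conn)
    for row in conn.execute(scoring_sql("features")):
        print(row)  # only (id, score) pairs leave the database
```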

     

    For more information on Spotfire Data Science, visit the TIBCO Spotfire Data Science Community Wiki.

    Machine learning models created in Spotfire Data Science can be run in TIBCO event processing tools, allowing real-time, machine-learning-driven action in a closed loop of Connected Intelligence throughout your organisation.

     

    General Technologies and Best Practices

    General

    Specific Technologies

    Apache Drill

    Cloudera Impala

    Spark

    TIBCO Spark Accelerator

    H2O

    • Insight to Action - Integrating Spotfire and TERR with H2O Machine Learning - H2O World 2015

    Use Cases

    Learn More

     
