Google Cloud integration for TIBCO Spotfire®
- This TIBCO Spotfire® Template is designed to enable the user to utilize text analysis from GCP ML blocks and the Python text mining packages spaCy and NLTK.
- The data set in the template is the New York City Airbnb listing dataset, which can be downloaded at http://insideairbnb.com/get-the-data.html.
- This template is constructed based on:
- Spotfire native BigQuery Connector
- GCP AutoML Natural Language - Comment Classification
- GCP Natural Language API - Sentiment Analysis
- Python Data Function
- Analysis: Live comment scoring, batch scoring, word frequency analysis, and word2vec
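To make the "word frequency analysis" piece concrete, here is a minimal sketch of the kind of counting the template's Python text mining step performs. The stop-word list is illustrative only; in practice this role would be played by NLTK's stopwords corpus, and the sample reviews are invented for the example.

```python
from collections import Counter
import re

# Small illustrative stop-word list; the template would use a full
# corpus such as nltk.corpus.stopwords instead.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "in", "to", "of"}

def word_frequencies(reviews, top_n=10):
    """Tokenize review texts and count the most frequent non-stop-words."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

reviews = [
    "The host was friendly and the kitchen was spotless.",
    "Great kitchen, great location, friendly host.",
]
print(word_frequencies(reviews, top_n=3))
```

The resulting (word, count) pairs map directly onto a Spotfire data table for visualization in a bar chart or word cloud.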
In this blog, we explore how to leverage Google infrastructure and machine learning offerings inside TIBCO Spotfire® and TIBCO® Data Science products to analyze massive amounts of data and perform NLP/Text Mining faster and more precisely than traditional methods.
The bigger picture here is that the TIBCO Data Science products embrace multiple cloud platforms. We have offerings and solutions for all the major cloud platforms (AWS, GCP, Azure), and TIBCO sees GCP as another piece of the same puzzle. GCP integration in the TIBCO Data Science portfolio enables us to engage with customers irrespective of where they want to perform their machine learning: on-premises, AWS, Azure, or GCP.
This use case is an example of a customer's journey, tracked from booking through the end of the stay. An Airbnb Support Analyst analyzes the reviews from NYC and gathers insights that help zero in on the concerns raised in customer reviews.
To create good models, you need good data. If you have all the data first, you can make the algorithms more effective. So our strategy had three steps:
- Fundamentally, use a platform that is comfortable ingesting and normalizing massive amounts of data.
- Use visualizations to explore the raw data and later to explore the results from the models.
- Find ways to use machine learning to make more sense and value of the data.
The data behind the Inside Airbnb site is sourced from publicly available information on the Airbnb site. These are reviews from customers who stayed in New York City listings in 2019. The data file includes all the information needed to learn more about hosts and geographical availability, along with the metrics necessary to make predictions and draw conclusions. Guests and hosts have used Airbnb to expand their traveling possibilities and experience the world in a more unique, personalized way. The data is stored in BigQuery, Google Cloud's enterprise data warehouse solution.
Upon ingesting the data, the Support Analysts can quickly understand some key attributes of the data (e.g., plotting listing prices on a map chart to see which boroughs within NYC have high listing prices). Here the Support Analyst has the freedom to slice and dice the data by several attributes of interest. This also helps the analysts understand the demographics of all the listings and focus on the ones with negative reviews, so they can take timely and appropriate action to prevent churn.
Once we understand the data, the next step is to build a "Text Classification" model. We use Google Cloud's AutoML Natural Language service to achieve this. The detailed process for setting up the Cloud AutoML project can be found at this link. A quick summary of the setup:
- Set up your project
- Select a cloud project and enable Billing and the Cloud AutoML and Storage APIs.
- Model objectives
- Multi-label classification – allows a document to be assigned multiple labels
- Create a dataset
- Train your model (Add custom labels for classification purposes)
- Deploy the custom model to auto-generate API code
Now, focusing on the Spotfire-GCP integration, the main goal is to leverage the trained model from the above step directly in Spotfire. To achieve this, we have built a Spotfire dashboard that uses the Python integration (via a Data Function) and the API code generated in the GCP project to make a REST API call to the model deployed in the Cloud AutoML project, classifying listings into their respective labels (in this case Bedroom, Kitchen, Host, Price, Overall, Location, Neighborhood, and Decoration).
Here’s the step-by-step to set up the Python Data Function:
- Register a new Data Function (from Menu -> Tools -> Register Data Function), and select Python as your engine (if you see a red error, Python is likely not added to your Windows PATH variable).
- Copy and paste the Python code you get from the Cloud AutoML UI (from Test & Use -> Use your custom model). Note that the output is in JSON format, so we will need to reconstruct it into a data table.
- Set the Data Function inputs as defined below. The Data Function will take a string value as input.
- Define the outputs - it will return the labeling result for each text value.
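The steps above can be sketched as a small data function body. This is a minimal sketch, not the exact code generated by the Cloud AutoML UI: the model path is a placeholder, the live client call is shown only as a comment (it requires the `google-cloud-automl` package and GCP credentials), and the label/score flattening is demonstrated on a mocked response payload.

```python
import pandas as pd

# Placeholder model path -- replace with the value shown in the
# Cloud AutoML "Use your custom model" tab for your project.
MODEL_NAME = "projects/YOUR_PROJECT/locations/us-central1/models/YOUR_MODEL"

def payload_to_table(text, payload):
    """Flatten an AutoML classification payload (label/score pairs)
    into the row-per-label table Spotfire expects as the data
    function's output."""
    rows = [
        {"review": text, "label": p["display_name"], "score": p["score"]}
        for p in payload
    ]
    return pd.DataFrame(rows, columns=["review", "label", "score"])

# Sketch of the live prediction call (requires google-cloud-automl
# and valid credentials; shown as comments only):
# from google.cloud import automl_v1
# client = automl_v1.PredictionServiceClient()
# response = client.predict(request={
#     "name": MODEL_NAME,
#     "payload": {"text_snippet": {"content": review_text,
#                                  "mime_type": "text/plain"}}})
# payload = [{"display_name": p.display_name,
#             "score": p.classification.score} for p in response.payload]

# Example with a mocked response payload:
payload = [{"display_name": "Kitchen", "score": 0.91},
           {"display_name": "Host", "score": 0.64}]
output_table = payload_to_table("Clean kitchen, lovely host", payload)
```

In the registered data function, the input string parameter maps to the review text and `output_table` is declared as the output table parameter.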
Upon getting the labels, we also use other Google AI & Machine Learning products in combination, to our benefit. Here, we used the Google Natural Language API to perform Sentiment Analysis and Entity Extraction on the above reviews, via a REST API call through Spotfire's Python Data Function.
Note that the Cloud AutoML Natural Language service is very different from the Google Natural Language API, in the sense that:
- The Natural Language API is a pre-trained model, trained by Google on its own data; calls to it can be made via a REST API call in Python.
- The Cloud AutoML Natural Language service lets you train your own custom high-quality model using Google's state-of-the-art AutoML technology. These models are deployed in your GCP project and can later be called via a REST API call in Python.
The procedure to create a Python Data Function to wrap the Natural Language API is very similar to the previous implementation:
- Register a new data function
- Insert Python script
- Add inputs
- Add outputs
This data function will take a text value input and return a Spotfire table with sentiment scores.
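A minimal sketch of that data function follows. The `documents:analyzeSentiment` endpoint is the Natural Language API's documented v1 REST method, but the API-key placeholder is hypothetical, the live HTTP call is shown only as a comment (it needs the `requests` package, a valid key, and network access), and the response-to-table step runs here against a mocked JSON response.

```python
import pandas as pd

# v1 REST endpoint of the Google Natural Language API.
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"

def sentiment_to_table(text, response_json):
    """Turn an analyzeSentiment JSON response into the one-row
    Spotfire output table (score in [-1, 1], magnitude >= 0)."""
    doc = response_json["documentSentiment"]
    return pd.DataFrame([{"review": text,
                          "score": doc["score"],
                          "magnitude": doc["magnitude"]}])

# Live call sketch (requires the requests package and a valid API key):
# import requests
# body = {"document": {"type": "PLAIN_TEXT", "content": review_text}}
# response_json = requests.post(f"{ENDPOINT}?key={API_KEY}",
#                               json=body, timeout=30).json()

# Example with a mocked response:
mock = {"documentSentiment": {"score": -0.6, "magnitude": 0.6}}
table = sentiment_to_table("The bedroom was noisy.", mock)
```

The `score` column is what the dashboard's KPI charts rescale into a percentage measure, as described below.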
Here, the scores in the KPI charts are displayed as a percentage measure. The Entity KPI chart helps the Analyst understand the entities associated with a review (in this case Location, Person, etc.) and how they contribute to the overall sentiment score of the review. Now, the Support Analysts can filter all the reviews with negative sentiment scores and analyze the associated labels and entities to understand where the real problems exist. The Analyst can then inform the hosts and ask them to take timely action to fix the problems reported by the customers. In addition, the Analysts can reach out to the customers to make sure they are compensated for their inconvenience.
The Support Analysts can also look at the sentiment scores grouped by neighborhood and give customers more detailed recommendations when booking their Airbnb stay, based on neighborhood sentiment in addition to other factors like traffic, restaurants, and experiences.
Swift and targeted customer service allows hosts to continue improving the quality of their listings on the Airbnb platform; in turn, customers keep coming back to book their stays with Airbnb.
In conclusion, there are many applications of the Spotfire-GCP integration, and we have just scratched the surface of what can be achieved.
Please feel free to ask any questions on the TIBCO community with a link to this blog.
Authors - Vinoth Manamala, Eric Hsu
Co-author - Vaibhav Gedigeri
TIBCO Data Science Team.