Webinar Series - Compete & Win with Data Science and AI
Last updated:
2:45pm Nov 28, 2018

This TIBCO Community page hosts the TIBCO webinar series Compete and Win with Data Science and AI.

Click here to get access to all webinars. Webinars 1 and 2 are now available on demand, and Webinar 3 will be available shortly.

We developed this page to support the content, Q&A and hands-on exercises of the Webinar Series.

For questions, please use the Answers Section: post your question with a reference to this webinar series and you will get an answer. Include a link to this Wiki page and tag your question with #competeandwin for a quicker response.

Webinar 1 - Data Science - Compete and Win in the Algorithm Economy

This is the pdf version of the presentation of webinar 1

If you would like to trial our Analytics Software, click on these links or any of the 'try now' and 'trial' buttons on our Web Site:

Webinar 2 - Data Science Lab: Hands on demos from TIBCO’s best data scientists

This is the pdf version of the presentation of webinar 2

Get hands-on if you would like to follow-along or 'try this at home'

  • See the links to free trials above
  • Exercise 1 - Spotfire and TERR
    • You can download the multi-purpose Spotfire template that creates both a supervised and an unsupervised learning model.
    • Answer the following question: what are the main drivers of fraud in this organisation at the moment?
    • You can download this sample dataset about customer churn. In the Spotfire template, replace the FraudInputData table with this dataset, then answer the same question as above, this time about churn.
    • If you are interested in Streambase, BPM, or the Fraud Accelerator, learn more here.
  • Exercise 2 - Statistica
    • You can download these datasets to follow along with the Statistica exercise in Webinar 2. The Statistica demo files zip folder contains 3 files: Credit Scoring Historical Data, Credit Scoring Validation Data and Credit Scoring Deployment Data. These are Statistica Spreadsheet files and can be opened using Statistica software.
    • We have also created a PowerPoint which we will be using in the Webinar. You can also download 'Hands on exercise for Statistica' in case you want to set up the data at your end before we create the demo on the 11th.
  • Link to TIBCO Community Exchange for downloadable Machine Learning templates for Spotfire, or start by reviewing some further information on Machine Learning use-cases.

Webinar 3 - Real-life Examples of Using Data Science to Compete and Win

This is the pdf version of the TIBCO Slides from Webinar 3 - and here are the slides from Precise Prediction.

Questions & Answers

Please use the Answers Section: post your question with a reference to this webinar series and you will get an answer. Include a link to this Wiki page and tag your question appropriately for a quicker response. Below we list some of the questions that were asked during the live Q&A or that come up frequently in other interactions around this content.

Q1. Does Statistica support Greenplum?

Greenplum has ODBC/JDBC drivers that we can leverage to connect it to Statistica.

Apart from this, Statistica offers a wide range of connectors to import data from different sources (e.g., SQL Server, Oracle, or any database that supports the OLE DB or ODBC standards) and from databases that support multidimensional cubes, such as SAP Business Warehouse. There are also specialized interfaces to designated specialized databases (e.g., OSI PI process databases). These database connections can be one-time queries or auto-refreshing queries that update as new data are collected. Statistica also imports data from a variety of flat file formats including Excel, text (CSV or other), SAS, Minitab, JMP, and SPSS. Statistica also includes In-Database Analytics, so that no data import is required and all computations are done inside the database itself. More details can be found here: http://statistica.io/statsoft/connectivity-and-data-integration/

Q2. Are t-SNE maps implemented in either Statistica or Spotfire?

t-SNE is not implemented directly in Statistica, but the R package linked below can be brought in through Statistica to support it. In Spotfire, t-SNE maps can be built with JavaScript visualisations. You can also produce them in R or Matlab.

https://cran.r-project.org/web/packages/tsne/index.html

Q3. Where can I learn more about Data Science?

  • We are hosting a 5-part 'The Building Blocks of Data Science' Webinar Series jointly with our Data Science partner Ruths.ai. Please review the content, then join the series or watch the webinars on demand.
  • Join our Spotfire on-line user group, which offers 'how to' demos hosted by the TIBCO Data Science team in recurring TIBCO Analytics Meetups. These are aimed at advanced Spotfire/TERR users. Previous recordings are available on this TIBCO Community page.
  • If you are just starting in Data Science, it might be useful to first look for some on-line courses. For example, Johns Hopkins University has a comprehensive series of on-line courses available via Coursera. Click here for more info.

Q4. Where can I find the recordings of the previous webinars?

Click on this link, which is the main webinar series registration page. Click on the title of the webinar you are interested in (or on 'learn more') and register, and you will get access to the recorded webinar. Recorded webinars are those listed with 'on-demand' before the title; the recording usually appears 24-48 hours after the event.

Q5. Can I install and use additional R packages for machine learning and deep learning? 

Yes, this is possible in Statistica as well as in Spotfire (Tools/TERR Tools).

Q6. In Statistica Workspace do I need to manually separate Training and Validation sets? 

Not necessarily; there are various ways to build models. The demo showed the manual approach. Another way is to use the subset command to separate the data into training and validation datasets. You can also separate the training and testing datasets in the model node itself.
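To make the idea of separating training and validation data concrete outside of Statistica, here is a minimal Python sketch of a random split. It is not the Statistica subset command or model node; the function name `split_train_validation`, the 30% validation fraction, the seed and the sample records are all assumptions chosen for illustration.

```python
import random


def split_train_validation(rows, validation_fraction=0.3, seed=42):
    """Randomly partition rows into (training, validation) lists.

    Hypothetical helper for illustration only; the fraction and seed
    are arbitrary defaults, not values taken from the webinar.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for repeatability
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up records standing in for a credit scoring table.
records = [{"customer_id": i, "defaulted": i % 4 == 0} for i in range(20)]
training, validation = split_train_validation(records)
print(len(training), len(validation))  # prints: 14 6
```

Fixing the seed keeps the split repeatable, so a model can be rebuilt and compared against the same validation rows.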

Q7. Does Statistica work in real-time?

You can schedule updates of the datasets or the workflows as required. If you are looking for true real-time applications, the combination of Statistica and Streambase would be recommended.

Q8. Is it possible to connect these tools to big data platforms? 

Yes, for both Statistica and Spotfire.

  • Statistica provides a big data menu where you can use H2O, Spark, Python, C# and the in-database analytics capability of Statistica. There are some useful demos on YouTube. There is in fact a whole series, but you might find these two particularly interesting: Spark Integration and H2O Integration.
  • Spotfire connects to big data platforms so that you can efficiently visualise that data as well as call SparkTERR or Spark R jobs, H2O models, etc. See the Spark Accelerator for more information; we recommend reading the Reference Info tab first.

Q9. How big can datasets be when running advanced methods such as RF or NN within a reasonable time?

This depends on the set-up of the system. It can be seconds or minutes depending on how the platform has been integrated in the system. We have solutions like the Spark Accelerator that are set up to do exactly this on Hadoop. Because they leverage distributed processing, time does not grow linearly with data.

Q10. Where do I download the Spotfire machine learning template you used in Webinar 2?

See instructions under Webinar 2 above. You can download the multi-purpose Spotfire template that creates a supervised and unsupervised learning algorithm.

Q11. How can models be exported from Statistica to Streambase? 

You export a PMML model from Statistica and import it into Streambase. Since the acquisition, we have been considering various integration ideas in our plans. If you have any specific needs or ideas, please let us know. If you are an existing TIBCO Software user, please also take a look at the Ideas Portal on the TIBCO Community, where you can submit your product ideas directly to our product management team or vote for ideas that someone else submitted and that you like. There are some useful filters and a search option to help you find the relevant ideas.
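PMML is an XML format, so an exported model file can be inspected with ordinary XML tooling before it is handed to another system. The Python sketch below parses a small hand-written PMML document and lists its data fields and model elements. The document content, the field names and the helper `summarize_pmml` are made up for illustration; this is neither actual Statistica output nor the Streambase import mechanism.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML document (not exported from Statistica).
PMML_TEXT = """
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header description="Hypothetical credit scoring model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="age"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="income" coefficient="-0.00001"/>
      <NumericPredictor name="age" coefficient="0.01"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Top-level PMML elements that are not models themselves.
NON_MODEL_ELEMENTS = {
    "Header", "MiningBuildTask", "DataDictionary",
    "TransformationDictionary", "Extension",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(text):
    """Return the PMML version, data field names and model element names."""
    root = ET.fromstring(text.strip())
    if local_name(root.tag) != "PMML":
        raise ValueError("not a PMML document")
    fields = [el.get("name") for el in root.iter()
              if local_name(el.tag) == "DataField"]
    models = [local_name(child.tag) for child in root
              if local_name(child.tag) not in NON_MODEL_ELEMENTS]
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = summarize_pmml(PMML_TEXT)
print(summary)
```

A check like this can confirm that the exported file contains the expected fields and model type before it is imported elsewhere.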