TIBCO Statistica 13.5
Missing Data Processing in Spreadsheet Formulas (IMPORTANT)
The processing of missing data (MD, i.e., an empty cell) has changed in spreadsheet formulas. To revert to the behavior used prior to 13.5, select Home ribbon -> Options -> Spreadsheets -> Use legacy MD comparisons in formulas. After changing this selection, quit and restart Statistica.
Why was this change made? The goal was to simplify writing formulas. Previously, any comparison involving MD evaluated to MD, as the examples below show.
Prior to 13.5:
V1 = 1, V2 = MD
=iif(V1=V2, "match", "not match"), returned MD
V1 = MD, V2 = MD
=iif(V1=V2, "match", "not match"), returned MD
Starting with 13.5:
V1 = 1, V2 = MD
=iif(V1=V2, "match", "not match"), returns "not match"
V1 = MD, V2 = MD
=iif(V1=V2, "match", "not match"), returns "match"
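The difference between the two behaviors can be modeled in a few lines of Python. This is a minimal sketch, not Statistica's formula engine. It assumes MD is represented by `None`, and the names `eq_legacy`, `eq_current` and `iif` are hypothetical stand-ins that reproduce only the four examples above.

```python
MD = None  # assumption: a missing-data (empty) cell is modeled as None


def eq_legacy(a, b):
    """Pre-13.5 comparison (as modeled here): any comparison involving MD yields MD."""
    if a is MD or b is MD:
        return MD
    return a == b


def eq_current(a, b):
    """13.5 comparison (as modeled here): MD equals MD, and MD never equals a value."""
    if a is MD or b is MD:
        return a is MD and b is MD
    return a == b


def iif(condition, if_true, if_false):
    """Hypothetical iif: an MD condition propagates MD to the result."""
    if condition is MD:
        return MD
    return if_true if condition else if_false


# Replay the two V1/V2 cases from the examples under both behaviors
for v1, v2 in [(1, MD), (MD, MD)]:
    legacy = iif(eq_legacy(v1, v2), "match", "not match")
    current = iif(eq_current(v1, v2), "match", "not match")
    print(f"V1={v1}, V2={v2}: legacy={legacy}, 13.5={current}")
```

Under the legacy behavior both cases collapse to MD. A formula therefore needed extra checks to tell "different" apart from "missing".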
Spotfire Integration - More Parameterization Options
The Spotfire Analyst creating a Statistica data function can now parameterize the connected Statistica workspace. When the user registers a new Statistica Data Function by selecting a workspace, value-type input parameters are created to expose node-level parameters. These are in addition to the input parameters (table-type and value-type) and output parameters (table-type) created from the workspace's "input" nodes, "output" nodes and dictionary variables. For example, the Spotfire dashboard could be connected to the Select Predictors node described below.
This gives the analyst greater control over the analytic options. You can also reduce the number of workspace parameters shown to the Spotfire Analyst. While editing the workspace in Statistica, select the Designer View button. You can then select or deselect nodes and parameters within nodes.
Spotfire Integration - Reporting Documents node
The Reporting Documents node within a Statistica workspace could not be used prior to 13.5. Now any spreadsheet-type documents (tables) generated and stored within the Reporting Documents node appear in the Spotfire Data Function and can be used there.
Spotfire Integration - Share with Other Spotfire Analysts
... call Statistica Workspace (no code analytics) via a Spotfire Data Function
... embed Statistica Workspace within Spotfire .dxp file and share with other Spotfire Analysts
Installing Statistica on the same computer as Spotfire Analyst is now optional. The Spotfire Analyst only needs to have the Statistica Extension packages installed.
Note: The Spotfire Dashboard can only be executed locally by Spotfire Analysts. It is useful for ad hoc analysis, exploration, model building and feature selection. This functionality cannot be used by the Spotfire Consumer.
Lists Overlap Error Message
When you work with large datasets, you may select long, intricate lists of variables for analyses with different variable categories (for example: dependent, categorical, continuous). If some variables appear in more than one list, it can be hard to determine which variables are affected and need correcting. The Lists overlap error message has been improved. It now states which variables overlap and provides the following three options:
- remove duplicates from first list
- remove duplicates from second list
- edit variable selections manually
This feature decreases the time spent on variable selection within interactive modules and workspace nodes.
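The overlap check and the first two resolution options can be sketched as plain list operations. This is an illustrative Python model with hypothetical function and variable names, not Statistica's implementation. The third option (editing the selections manually) is interactive and is not modeled.

```python
def find_overlap(first, second):
    """Return variables present in both lists, in the order they appear in `first`."""
    in_second = set(second)
    return [v for v in first if v in in_second]


def remove_from_first(first, second):
    # Option 1: keep `second` intact and drop the shared variables from `first`
    shared = set(second)
    return [v for v in first if v not in shared], list(second)


def remove_from_second(first, second):
    # Option 2: keep `first` intact and drop the shared variables from `second`
    shared = set(first)
    return list(first), [v for v in second if v not in shared]


# Hypothetical selections: "Pressure" was picked as both dependent and continuous
dependent = ["Yield", "Pressure"]
continuous = ["Temperature", "Pressure", "Flow"]

overlap = find_overlap(dependent, continuous)
if overlap:
    print("Lists overlap:", ", ".join(overlap))
    dependent, continuous = remove_from_second(dependent, continuous)
print(dependent, continuous)
```

Reporting the shared names before changing anything mirrors the improved message. The user sees exactly which variables conflict and then chooses which list keeps them.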
Alternating Least Squares Deployment node
The Alternating Least Squares Deployment node has been updated. After selecting the Deploy to Enterprise button, the user can now select a new None option instead of a Data Configuration.
Customize Output for Workspace
The Customize Output feature is most commonly used to set the number of decimal places for a statistic, apply bold text and format graphics. To access this feature, right-click any node within a workspace. The shortcut menu lets you open the Customize Options dialog box.
A new option, the Suppress output check box, was added. It lets the workspace designer decide which output, per node, appears in the Reporting Documents node. All graph outputs can be suppressed for all nodes or for a specific node. Likewise, all spreadsheet (table) outputs can be suppressed for all nodes or for a specific node. Individual graphs or spreadsheets can also be suppressed.
This feature provides granular control over what the Workspace's consumer sees.
Data Health Check node
The use previous input description check box was added to the Data Health Check node to help develop templates. For example, it can be combined with the new Select Predictors node.
Elasticsearch Text Analytics node
On the Specifications -> Quick tab, select Files. The Browse For Folder dialog box now displays the selected directory.
Filter and Process Data nodes
The Filter Duplicate Cases, Filter Sparse Data, MD Imputation, Process Invariant Variables, Process MD and Rank nodes now support wildcards in variable selection. Wildcard variable selections are important for building reusable templates. Other nodes already have this functionality.
For example, a variable can be selected:
General linear (GLM) node
The general linear (GLM) node is validated and released in 13.5. This was released as "beta" in prior releases.
ITrees CHAID nodes
The Always split on minimum p option was added to the ITrees CHAID Classification and Regression nodes.
K-Means Clustering node
The K-Means Clustering node is validated and released in 13.5. This was released as "beta" in prior releases.
Lasso Regression node
The use previous input description check box was added to the Lasso Regression node to help develop templates. For example, it can be combined with the new Select Predictors node.
Normality tests node
The Normality tests node computes tests of normality for each selected variable: Kolmogorov-Smirnov test statistic, Kolmogorov-Smirnov p-value, Lilliefors p-value, Anderson-Darling test statistic, Anderson-Darling p-value and Shapiro-Wilk p-value. If two or more variables are selected, you can also compute the following tests of multivariate normality:
- Mardia’s test of multivariate skewness
- Mardia’s test of multivariate kurtosis
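To show what the two multivariate statistics measure, here is a minimal Python sketch of Mardia's skewness b1 and kurtosis b2 for exactly two variables. It uses the standard asymptotic tests:

- n*b1/6 is compared to a chi-square distribution with 4 degrees of freedom.
- (b2 - 8)/sqrt(64/n) is compared to a standard normal distribution.

The function names are hypothetical. The sketch applies no small-sample correction, so its p-values may differ from those reported by the Normality tests node.

```python
import math


def mardia_bivariate(xs, ys):
    """Mardia's multivariate skewness b1 and kurtosis b2 for two variables."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length n >= 3")
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    # Maximum-likelihood covariance (divides by n, not n - 1)
    sxx = sum(a * a for a in dx) / n
    syy = sum(b * b for b in dy) / n
    sxy = sum(a * b for a, b in zip(dx, dy)) / n
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Entries of the inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det

    def g(i, j):
        # d_i^T S^{-1} d_j for centered observations d_i and d_j
        return (dx[i] * (ixx * dx[j] + ixy * dy[j])
                + dy[i] * (ixy * dx[j] + iyy * dy[j]))

    b1 = sum(g(i, j) ** 3 for i in range(n) for j in range(n)) / n ** 2
    b2 = sum(g(i, i) ** 2 for i in range(n)) / n
    return b1, b2


def mardia_pvalues(b1, b2, n):
    """Asymptotic p-values for p = 2 variables (no small-sample correction)."""
    # Skewness: n*b1/6 ~ chi-square with p(p+1)(p+2)/6 = 4 degrees of freedom.
    # For 4 df the chi-square survival function is exp(-x/2) * (1 + x/2).
    s = n * b1 / 6.0
    p_skew = math.exp(-s / 2.0) * (1.0 + s / 2.0)
    # Kurtosis: (b2 - p(p+2)) / sqrt(8p(p+2)/n) ~ N(0, 1), with p(p+2) = 8 here
    z = (b2 - 8.0) / math.sqrt(64.0 / n)
    p_kurt = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return p_skew, p_kurt
```

For example, call `b1, b2 = mardia_bivariate(xs, ys)` and then `mardia_pvalues(b1, b2, len(xs))`. Small p-values suggest the two variables are not jointly normal. Both statistics are unchanged by shifting or rescaling either variable, so they test the shape of the distribution rather than its location or units.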
Why test for normality? The normal distribution is a foundation for many algorithms, and verifying the normality of a data set can be critical to getting a reasonable result. This can be viewed as model selection for algorithms whose assumptions include normality. For example:
"I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live." ~ Louis D. Brandeis, Associate Justice of the Supreme Court of the United States from 1916 to 1939
PI nodes
A calendar control was added to the start time and end time fields of all PI nodes.
Reporting Tables node
You can now reorder the elements in the ∑ Placement group box of the Reporting Tables node. This change also applies to the interactive module for Reporting Tables.
The node now generates a downstream dataset for further analysis by default. Prior to 13.5, this option was off by default.
Select Predictors node
The Select Predictors node connects to a single data source and can be very useful with Spotfire Integration. The user selects one target variable for a predictive analytics problem. The node then classifies the remaining variables as continuous or categorical predictors and passes this variable selection downstream.
Connect the Select Predictors node to Advanced Classification Trees (C&RT), Advanced Regression Trees (C&RT), Advanced Classification CHAID, Advanced Regression CHAID, Boosted Trees, Data Health Check, K-Nearest Neighbors, Lasso Regression, Feature Selection, MARSplines, Random Forest, SANN Classification, SANN Clustering, SANN Regression, Support Vector Machines and SVB nodes that use dependent/predictor variable selection.
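The source does not state the rule the node uses to classify predictors. The following Python sketch assumes a common heuristic: a variable is categorical if it contains non-numeric values or has at most `max_levels` distinct values (10 here, chosen arbitrarily), and continuous otherwise. `select_predictors` and `max_levels` are hypothetical names, not part of Statistica.

```python
from typing import Dict, List, Sequence, Tuple


def select_predictors(data: Dict[str, Sequence], target: str,
                      max_levels: int = 10) -> Tuple[List[str], List[str]]:
    """Split every non-target variable into continuous or categorical predictors.

    Hypothetical rule: categorical if any non-missing value is non-numeric or if
    there are at most `max_levels` distinct values; otherwise continuous.
    """
    if target not in data:
        raise KeyError(f"target variable {target!r} not found")
    continuous: List[str] = []
    categorical: List[str] = []
    for name, values in data.items():
        if name == target:
            continue  # the single target variable is never a predictor
        observed = [v for v in values if v is not None]  # skip missing data
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in observed)
        if not numeric or len(set(observed)) <= max_levels:
            categorical.append(name)
        else:
            continuous.append(name)
    return continuous, categorical


# Example: "Plan" is numeric but has only four levels, so it is treated as categorical
data = {
    "Churn": [0, 1] * 10,
    "Age": list(range(20, 40)),
    "Region": ["North", "South"] * 10,
    "Plan": [1, 2, 3, 4] * 5,
}
print(select_predictors(data, target="Churn"))
```

A distinct-value threshold is one simple way to catch numerically coded categories such as plan IDs. A real implementation may also use the variable's declared type or metadata.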
TIBCO Statistica 13.4
TIBCO Statistica 13.4 has some great new features which are all summarized and documented in detail in this section of the TIBCO product documentation site.
This upgrade also demonstrates how TIBCO Analytics products work together. One of the key new features is triggering a Statistica data function from Spotfire and rendering the results in Spotfire. We have created a video about it that you can view on the TIBCO YouTube channel:
The new version has several "quick start" workspace templates (accessible after opening a new Workspace document). We have posted two of these templates on the TIBCO Community Exchange for you to use with Statistica 13.3 if you have not yet upgraded to version 13.4:
- Data Preparation Quick Start Template for TIBCO Statistica® - This TIBCO Statistica® template guides users through the data preparation steps. It is meant as a quick start template that helps users build their own data preparation process faster. Users work through the workflow, setting up, connecting and using various nodes in sequence to prepare and clean the data for further analyses.
- Classification Model Building Template for TIBCO Statistica® - This template features a typical analytical workflow for building predictive classification data mining models with TIBCO Statistica®. In this template the user can simply change the input data source and run the whole modelling process on the new data with one click.
Other important new features include PI Event Frames, PI Asset Framework, Dynamic Time Warping, Batch Synchronization, Elasticsearch for Text Analytics, publishing PMML models to AMS (TIBCO Artifact Management Server), the Lasso Regression workspace node and improved import and export with Spotfire.
Statistica workspaces can now deploy a model into StreamBase. Combined with Statistica Monitoring & Alerting Server (MAS), this feature could be used to refresh a model in production automatically. MAS could monitor "prediction vs actual outcome" and trigger an alarm if the variance exceeded a threshold X. The alarm's job would be to rebuild models against current data, compare the candidates, select the best model and publish it into production.
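The monitoring loop described above can be sketched as follows. This is a hypothetical Python outline, not a MAS or StreamBase API. It assumes the "prediction vs actual outcome" check is a mean squared error compared against the threshold X. It also assumes that model rebuilding, scoring and publishing are supplied as callables.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MonitorResult:
    error: float  # observed prediction error for this check
    alarm: bool   # True if the error exceeded the threshold


def mean_squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed drift metric: mean squared difference of prediction vs actual outcome."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def monitor_and_refresh(
    predicted: Sequence[float],
    actual: Sequence[float],
    threshold: float,
    candidates: Sequence[Callable[[], object]],
    evaluate: Callable[[object], float],
    publish: Callable[[object], None],
) -> MonitorResult:
    """Raise an alarm when error > threshold; on alarm, rebuild, select and publish."""
    error = mean_squared_error(predicted, actual)
    if error <= threshold:
        return MonitorResult(error, False)       # model still healthy, no action
    models = [build() for build in candidates]   # rebuild against current data
    best = min(models, key=evaluate)             # compare; lowest score wins
    publish(best)                                # push the winner into production
    return MonitorResult(error, True)
```

Passing the rebuild, scoring and publishing steps in as functions keeps the alarm logic separate from any particular deployment target. In practice, each candidate might wrap a workspace run.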
Your upgrade should run smoothly, but if you do run into any issues upgrading to 13.4, please submit a TIBCO support ticket. You can also search for answers or post your question in the Answers section of this TIBCO Community.