Associate a Dataset
Associating a dataset makes that data available to the workspace under the Data tab. These are datasets that are not a part of the sandbox schema. You can use these datasets to import data that is not contained within the main data source of the workflow.
People can comment on anything in the Activity pane in the workspace overview. Comments can be edited or deleted.
A Connect View is a generated table created by a join or select statement on a database data source. It is stored locally on the application database.
You can integrate your own algorithms and processes into the Spotfire Data Science analytics engine using the Custom Operator Framework. Custom Operators are written using Scala or Java. They can use Spark for advanced machine learning and transformations.
Datasets come from databases, HDFS, or uploaded files such as CSV files. They are used within workflows, Touchpoints, and sandboxes in order to perform analyses.
A data source refers to an external data provider, either on Hadoop or as a relational database.
Data Source States
This has correct connection information and can be used in Spotfire Data Science.
Indicates that this data source is having trouble connecting properly. Verify connection parameters and try again.
A user has begun to add this data source, but has not completed the connection parameters. Users can save some data sources as incomplete. The data source is not usable until the rest of the data is provided.
This has been disabled by an administrator and cannot be used until it is turned back on. Users may not open workflows or run jobs using this data source.
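The four states described above can be modeled as a small enumeration. The sketch below is illustrative only: the state names and the `is_usable` helper are hypothetical labels chosen for this example, not identifiers from Spotfire Data Science. It also assumes that a data source with connection trouble is not usable, which the descriptions above do not state outright.

```python
from enum import Enum


class DataSourceState(Enum):
    """Hypothetical labels for the four states described above."""

    CONNECTED = "connected"            # correct connection information
    CONNECTION_ERROR = "error"         # trouble connecting; verify parameters
    INCOMPLETE = "incomplete"          # connection parameters not yet complete
    DISABLED = "disabled"              # turned off by an administrator


def is_usable(state: DataSourceState) -> bool:
    """Assumption: only a correctly connected data source can be used."""
    return state is DataSourceState.CONNECTED


# List the names of the states in which the data source can be used.
usable = [state.name for state in DataSourceState if is_usable(state)]
print(usable)
```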
After logging in, people can customize the Spotfire Data Science home page to create a dashboard of important information. Click the gear icon on the home page to customize widgets.
Design time refers to the actions a person takes before running an operator or workflow, that is, while “designing” or customizing the operator’s parameters. This is an important concept to understand when learning about Custom Operators.
Insights are pieces of information that are deemed important to a particular workspace or a workflow. People can add an insight directly or promote a note or a comment to an insight. One can also attach workflow results, files, datasets and other files uploaded from the desktop to support the importance of the finding.
People can schedule a Job to run at a regular time interval or on demand. This is useful for updating data automatically over time or for running specific tasks overnight. Jobs can be customized to run on an hourly, daily, weekly, or monthly schedule. Based on the configuration, team members can be notified of the results on success or failure.
A Milestone refers to a section of work that a team member works on. By setting a due date, people can see at a glance the progress of a particular analytic project. Milestones are shown on the workspace page under the ‘Milestones’ tab, and also on the workspace overview. One can also change the status of the project to one of the 3 available options: On Track, Needs Attention, At Risk.
People can make notes on a workspace or any workfile within that space. Notes will show under the Activity pane in the workspace overview and will be viewable to those with access to the workspace.
Notes can be promoted to Insights and commented on by other people.
Notifications alert people of important changes in the application, such as job results or collaboration information. When someone adds another person to the workspace, that person will get a notification. Many types of activities can notify people, and this can be configured individually.
An operator encapsulates an algorithm or transformation within a workflow. Operators show up as a list on the sidebar of the application within the workflow editor, and can be dragged to the canvas and connected to other operators or data sources. They are one of the main building blocks for workflows, the other being data sources. People can filter the operators based on the intended operation, such as Load, Explore, Transform, Model, Predict, and Tools.
Running a workflow executes the entire workflow, including all operators and data sources. People cannot run a whole workflow that contains invalid operators; as an alternative, they can use step run. Like step run, selecting ‘Run’ executes only operators that have not run before. To re-run everything from scratch, select ‘Clear Step Run Results’ from the contextual menu.
A sandbox is a place in the Data tab where people can bring in their training schema and perform simple explorations with the help of visualizations. One can also create external views from the sandbox datasets. Sandboxes are only available for database data sources.
A shared account can be used to provide access to a database data source for multiple members of an organization. This means that people will share a single set of credentials for the data source.
Spotfire Data Science Model
Spotfire Data Science models are generated from a workflow and then saved as .am files. This makes models portable from one workflow to another without having to build the whole model again. Spotfire Data Science models are saved under the Work Files tab of a workspace.
By right-clicking an operator’s icon and selecting ‘Step Run’, people can run only the operators and data sources needed to reach the selected operator in a workflow. Operators that have already been executed will not execute again. This allows for faster iteration, because the person does not need to run the entire workflow again.
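The step run behavior described above can be pictured as a walk over the workflow graph that skips anything with a stored result. The following is a minimal sketch under stated assumptions, not the product's implementation: the workflow is a hypothetical dictionary mapping each node to its upstream nodes, the graph is assumed to be acyclic, and `step_run`, `results`, and `execute` are names invented for this example.

```python
from typing import Callable, Dict, List


def step_run(
    workflow: Dict[str, List[str]],
    target: str,
    results: Dict[str, str],
    execute: Callable[[str], str],
) -> List[str]:
    """Run `target` and any upstream nodes that have no stored result.

    Returns the names of the nodes executed during this call, in order.
    """
    executed: List[str] = []

    def visit(node: str) -> None:
        if node in results:            # already executed: reuse the result
            return
        for parent in workflow[node]:  # upstream nodes must run first
            visit(parent)
        results[node] = execute(node)
        executed.append(node)

    visit(target)
    return executed


# Hypothetical workflow: each node lists the nodes it depends on.
workflow = {
    "dataset": [],
    "filter": ["dataset"],
    "model": ["filter"],
    "predict": ["model", "filter"],
}
results: Dict[str, str] = {}

first = step_run(workflow, "filter", results, lambda name: f"{name}-output")
second = step_run(workflow, "predict", results, lambda name: f"{name}-output")
print(first)   # nodes executed by the first step run
print(second)  # nodes executed by the second step run
```

In this sketch, emptying `results` plays the role of clearing the stored results, so the next call executes every upstream node again.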
The Sub-flow operator allows people to run another workflow from the current workflow.
Tags can be used to categorize datasets or results within the application. Tags can be added to any workfile. Selecting a tag brings up a view of all workfiles with that tag.
Spotfire Data Science Touchpoints wrap the functionality of complex workflows in an interactive application that can be consumed by the business analyst.
If the Spotfire Data Science instance includes Touchpoints:
By selecting the ‘Touchpoints’ item from the sidebar, people can see the collection of Touchpoints that team members have published.
Publish a Touchpoint
When a developer finishes creating a Touchpoint, he or she can ‘publish’ it to the Touchpoint catalog, which makes it available to all members of the application. Depending on the Touchpoint’s settings, the Touchpoint will be run either as its creator or as the person running it. An unpublished Touchpoint can still be run by members of the workspace the Touchpoint resides in.
Unpublish a Touchpoint
To remove a Touchpoint from the catalog, you can unpublish it. Unpublishing means that the Touchpoint is no longer publicly available for people in the application to view and run. It is restricted to the members of the workspace where the Touchpoint originated.
A workfile is any file that is within the Work Files section under the workspace view in the application. Workfiles can be workflows (analytic processes with operators), but they can also be SQL files, CSV files, Touchpoints, Spotfire Data Science Models, or result files saved to the workspace. Additionally, people can upload other types of files (such as ZIP files) for reference.
A workflow is a collection of datasets and operators that performs analytic tasks. It is the place where data scientists build out models using the available operators and algorithms in Spotfire Data Science. In the Work Files view, workflows are displayed as pages with the Spotfire Data Science logo attached.
People can override default parameters and define their own workflow-wide variables using a workflow variable. To set one up, click the menu Actions > Workflow Variables from a workflow. All workflow variables start with the character @. They are commonly used for input in Touchpoints. They can also be used to edit default paths and parameters for HDFS.
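To illustrate the idea of workflow-wide variables that start with @, here is a minimal sketch of how such placeholders could be substituted into a text parameter. It is an assumption-laden example and not how Spotfire Data Science resolves variables internally: the `substitute` function, the variable names `@default_schema` and `@min_age`, and the rule that a name consists of letters, digits, and underscores are all hypothetical.

```python
import re
from typing import Dict

# Assumed naming rule: "@" followed by letters, digits, or underscores.
VARIABLE_PATTERN = re.compile(r"@[A-Za-z_][A-Za-z0-9_]*")


def substitute(text: str, variables: Dict[str, str]) -> str:
    """Replace each known @variable in `text` with its value.

    Names that are not defined in `variables` are left unchanged.
    """

    def replace(match: "re.Match[str]") -> str:
        name = match.group(0)
        return variables.get(name, name)

    return VARIABLE_PATTERN.sub(replace, text)


# Hypothetical workflow variables and a parameter that refers to them.
variables = {"@default_schema": "public", "@min_age": "21"}
query = substitute(
    "SELECT * FROM @default_schema.users WHERE age >= @min_age",
    variables,
)
print(query)
```

Leaving undefined names untouched is a design choice for this sketch; a stricter version could raise an error instead so that a misspelled variable is noticed early.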
Team members can use workspaces to collaborate on a data science project. Workspaces hold workfiles, and they can also have scheduled jobs, milestones, and associated database and Hadoop datasets under the Data tab to keep track of progress on a project. A workspace can be either public or private. People can create a workspace from the workspace page.