Fitting a model using Python and TIBCO® Enterprise Runtime for R
Note: As of Spotfire 10.7, there is native support for Python data functions, which is the preferred method. Read more about this capability here.
You might find yourself working in a situation where you have Python programmers writing Python scripts and R programmers writing R scripts, but you need to share results from data across the organization. Using TIBCO Enterprise Runtime for R, Python, and a set of available packages, you can span the chasm of programming languages for meaningful results. Optionally, you can create a data function in TIBCO Spotfire to call this code, and then use the results returned from Python to create a visualization.
In this article, we install the tools and packages in TIBCO Enterprise Runtime for R that are required to pass data to Python, send a script to run in Python, and then get the fitted model back from Python. We compare the results with a model fitted using the TIBCO Enterprise Runtime for R function lm.
For this solution, we work in Windows, because Spotfire Analyst is a Windows desktop application. We need to make sure that our system meets the requirements and that we have the software applications and packages to run the code.
We are running Windows 10, which is a 64-bit system, and we have a modern system with adequate hard disk space, CPU power, and memory.
The software for our solution includes the following.
TIBCO Spotfire Analyst 7.7, which includes TIBCO Enterprise Runtime for R version 4.2.
Optionally, we can run the TIBCO Enterprise Runtime for R version 4.2 Developer Edition from our installation of RStudio.
Anaconda Python 4.1.1 or later, which includes Python 3.5 (64 bit). The installation requires 631 MB.
Download from https://www.continuum.io/downloads.
We recommend using the Anaconda installation because it includes many packages for data science including numpy, scipy, pandas, and statsmodels.
Note: You must put Python in your path, because the code that runs this example looks only in the directories listed in the PATH environment variable to find the Python executable and DLL files. If you see the following message, check that Python is in your PATH.
INFO: Could not find files for the given pattern(s).
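The PATH check described in the note can also be done from Python itself. The following is a minimal sketch using the standard library function shutil.which; the helper name find_on_path is hypothetical and is not part of the article's script. Run it with any Python interpreter you can start, for example by giving its full path.

```python
import shutil


def find_on_path(executable):
    """Return the full path to `executable` if a PATH directory contains it, else None."""
    return shutil.which(executable)


# Look for the Python executable in the directories listed in PATH.
location = find_on_path("python")
if location is None:
    print("python was not found in any directory listed in PATH")
else:
    print("python found at", location)
```

If the first message is printed, add the Anaconda installation directory to PATH and open a new command prompt before continuing.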
Note: To see system requirements for installing the software, see their individual Help topics or Support information.
Spotfire system requirements: http://support.spotfire.com/sr.asp
Anaconda system requirements: https://docs.continuum.io/anaconda/navigator
Both TIBCO Enterprise Runtime for R and Python use packages that contain specialized functions to solve specific programming and industry problems. In this case, the packages enable the two systems to connect and to communicate, exchanging data frames.
TIBCO Enterprise Runtime for R uses the following packages (plus all their dependencies) from the Comprehensive R Archive Network (CRAN).
- feather
- PythonInR
Anaconda manages finding, installing, and building binary Windows packages from available Python package resource sites. Python uses the following packages (plus all their dependencies).
- feather-format (binary package provided with this exercise: scroll to the bottom of the page).
- statsmodels (provided in the Anaconda installation).
Our solution demonstrates calling Python from TIBCO Enterprise Runtime for R to fit a linear model in Python using the ols function from the statsmodels package. Fitted values from Python are passed back to TIBCO Enterprise Runtime for R and compared with the fitted values from the lm function in TIBCO Enterprise Runtime for R.
Note: The complete script demonstrated in this example is attached to this article, for your convenience. To download and review the script, scroll to the bottom of the article and select TERRandPython.R.txt.
The data and data translation packages
For analysis in TIBCO Enterprise Runtime for R, statisticians use the data.frame object to contain the data. For analysis in Python, programmers use pandas, a powerful Python data analysis toolkit, which contains the data structure DataFrame. These two object types are not compatible. We can use the CRAN package feather and the Python package feather-format to provide the means to translate the data between the two programming languages while maintaining the structure and integrity of the data.
We use the CRAN package feather to send the data.frame object from TIBCO Enterprise Runtime for R to Python. We use the feather-format package on the Python side. Python reads in the data as a DataFrame, adding a column needed by that data structure. After running the script to process the data (fitting the model, in our example), we perform the reverse process, using feather-format in Python to send the data back to TIBCO Enterprise Runtime for R, which reads in the data, with the help of the feather package, as a new data.frame (with an additional column).
Download the attached .zip archive, feather_format.zip, included with this article. This zip archive contains the feather-format package. Copy the zip archive to the site library for your Anaconda Python installation, and then extract the .whl file it contains. For this example, we provide the .whl archive feather_format-0.3.0-cp35-cp35m-win_amd64.whl.
From a Windows command prompt, install the feather-format package.
pip install feather-format
From the Spotfire menu, click Tools > TERR Tools, and then open the TIBCO Enterprise Runtime for R console.
Note: Optionally, you can use RStudio, specifying TIBCO Enterprise Runtime for R as the engine.
At the command prompt, install the required packages (feather and PythonInR).
Load the packages.
Connect to Python by calling the pyConnect function from the PythonInR package. You should not need to specify a path.
PythonInR::pyConnect() # only needed on Windows
Assign the data set (in this case, fuel.frame) to the name ff.
ff <- Sdatasets::fuel.frame
Create a temporary path for the data set, and then write a data.frame to a feather file, passing in the data set and the temporary path.
tempff <- tempfile("ff")
write_feather(ff, path=tempff)
Set the name of the feather file in Python by calling the function PythonInR::pyExecp. Note the use of r before the file name tempff. This creates a Python raw string, so the backslashes in a Windows path are not interpreted as escape sequences.
PythonInR::pyExecp(paste0("fthrfile = r'", tempff, "'"))
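To see why the raw string matters, the sketch below rebuilds the same assignment statement in plain Python and executes it with and without the r prefix. The path is a hypothetical example of what a Windows temporary file name might look like; it is not taken from the article.

```python
# Hypothetical Windows temp path, similar in shape to what tempfile("ff") might return.
path = "C:\\Users\\me\\AppData\\Local\\Temp\\ff1a2b"

# Mirror of paste0("fthrfile = r'", tempff, "'"): build a Python assignment statement.
statement_raw = "fthrfile = r'" + path + "'"
# The same statement without the r prefix, for comparison.
statement_plain = "fthrfile = '" + path + "'"

# With the raw string, every backslash survives and the value equals the path.
namespace = {}
exec(statement_raw, namespace)
print(namespace["fthrfile"] == path)  # prints True

# Without the prefix, Python tries to read \U as a Unicode escape and fails.
try:
    exec(statement_plain, {})
    plain_ok = True
except SyntaxError:
    plain_ok = False
print(plain_ok)  # prints False
```

A raw string literal still cannot end in a single backslash or contain the quote character that delimits it. Temporary file paths normally satisfy both conditions.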
Using PythonInR::pyExec, create and pass to Python the following script.
Note: Remember that Python has rules about indentations and spacing. If you see the error "IndentationError: unexpected indent", remove the leading spaces from each line in the script.
PythonInR::pyExec('
import feather
from statsmodels.formula.api import ols
df = feather.read_dataframe(fthrfile)
linmod = ols(formula="Fuel ~ Weight + Type", data=df).fit()
pred = linmod.predict(df)
df["Fitted"] = pred
feather.write_dataframe(df, fthrfile)
')
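The indentation rule mentioned in the note can be reproduced in plain Python. This is a small illustration, not part of the article's script: it compiles a script whose lines all start with spaces, then strips the common leading whitespace with the standard library function textwrap.dedent and runs the result.

```python
import textwrap

# A script written with leading spaces on every line, as it might look
# when indented to line up with the surrounding code.
indented_script = """
    x = 1
    y = x + 1
"""

# Top-level statements must start in the first column, so this fails.
try:
    compile(indented_script, "<script>", "exec")
    indented_compiles = True
except IndentationError:
    indented_compiles = False
print(indented_compiles)  # prints False

# textwrap.dedent removes the whitespace that is common to all lines.
clean_script = textwrap.dedent(indented_script)
namespace = {}
exec(clean_script, namespace)
print(namespace["y"])  # prints 2
```

When the script is passed from TIBCO Enterprise Runtime for R, the equivalent fix is to start each line of the quoted script in the first column.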
The script performs the following tasks.
- Reads the data (fthrfile).
- Fits the model.
- Adds fitted values as a new column in the data (Fitted).
- Writes the updated data back to the feather file (fthrfile), replacing its previous contents.
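To make the fitting step more concrete, the following sketch computes fitted values for a model of the form Fuel ~ Weight + Type with plain NumPy least squares. This is a stand-in for illustration only: it does not use statsmodels or feather, the data values are made up, and the variable names are assumptions. It shows what "fitted values" means for a model with one numeric and one categorical predictor.

```python
import numpy as np

# Hypothetical toy data standing in for the Weight, Type, and Fuel columns.
weight = np.array([2560.0, 2345.0, 1845.0, 3295.0, 3480.0, 3200.0])
car_type = ["Small", "Small", "Small", "Medium", "Medium", "Medium"]
fuel = np.array([3.0, 3.0, 2.7, 4.3, 4.5, 4.2])

# Design matrix: intercept, Weight, and one 0/1 indicator column for each
# Type level except the first. The fitted values do not depend on which
# level is left out as the reference.
levels = sorted(set(car_type))
indicators = [
    [1.0 if t == level else 0.0 for t in car_type] for level in levels[1:]
]
X = np.column_stack([np.ones(len(fuel)), weight] + indicators)

# Ordinary least squares: choose coefficients that minimize the sum of
# squared residuals, then compute the fitted values.
coef, *_ = np.linalg.lstsq(X, fuel, rcond=None)
fitted = X @ coef

print(fitted)
```

In the article's script, the fitted vector plays the role of pred, which is stored in the Fitted column before the data is written back.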
Read the feather file written by Python, passing in the path created earlier to contain it. The file contains the new column with the fitted values.
ff2 <- read_feather(tempff)
Fit the same model with the TIBCO Enterprise Runtime for R function lm and extract the fitted values with the predict function.
m1 <- lm(Fuel ~ Weight + Type, data = ff)
p1 <- predict(m1, ff)
Compare to the fitted values that Python computed.
The returned value should be TRUE, which indicates that the fitted values returned by TIBCO Enterprise Runtime for R and those returned by Python are identical. We can be assured that the code ran correctly and gave us identical results.
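Fitted values computed by two different runtimes can differ in the last few bits of a floating-point number. If an exact comparison ever reports a mismatch, a tolerance-based check is a reasonable fallback. The sketch below uses the standard library function math.isclose; the helper name fitted_values_match, the tolerances, and the sample numbers are all hypothetical.

```python
import math


def fitted_values_match(a, b, rel_tol=1e-8, abs_tol=1e-12):
    """Return True if two sequences have equal length and agree element-wise within tolerance."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


# Made-up values: the first element differs only by floating-point rounding.
r_side = [0.1 + 0.2, 3.5, 4.25]
python_side = [0.3, 3.5, 4.25]

print(r_side == python_side)                     # prints False
print(fitted_values_match(r_side, python_side))  # prints True
```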