Fitting a model using Python and TIBCO® Enterprise Runtime for R
Last updated: 11:03am Jul 03, 2018

You might work in an organization where Python programmers write Python scripts and R programmers write R scripts, yet the results need to be shared across the organization. Using TIBCO Enterprise Runtime for R, Python, and a set of available packages, you can bridge the two programming languages. Optionally, you can create a data function in TIBCO Spotfire to call this code, and then use the results returned from Python to create a visualization.

In this article, we install the tools and packages in TIBCO Enterprise Runtime for R that are required to pass data to Python, send a script to run in Python, and then get the fitted model back from Python. We compare the results with a model fitted using the TIBCO Enterprise Runtime for R function lm.

Requirements

For this solution, we work in Windows, because Spotfire Analyst is a Windows desktop application. We need to make sure our system meets the requirements, and we have the software applications and packages to run the code.

System

We are running Windows 10, which is a 64-bit system, and we have a modern system with adequate hard disk space, CPU power, and memory.

Software

The software for our solution includes the following.

  • TIBCO Spotfire Analyst 7.7, which includes TIBCO Enterprise Runtime for R version 4.2.

    Optionally, we can run the TIBCO Enterprise Runtime for R version 4.2 Developer Edition from our installation of RStudio.

  • Anaconda Python 4.1.1 or later, which includes Python 3.5 (64 bit). The installation requires 631 MB.

    Download from https://www.continuum.io/downloads.

    We recommend using the Anaconda installation because it includes many packages for data science including numpy, scipy, pandas, and statsmodels.  

    Note: You must put Python in your PATH, because the code that runs this example looks only in the directories listed in the PATH environment variable to find the Python executable and DLL files. If you see the following message, check that Python is in your PATH.

    INFO: Could not find files for the given pattern(s).
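
One quick way to check the PATH requirement is to ask Python's standard library which executable a PATH search would find. The following is a minimal sketch, not part of the attached script: the helper name find_on_path is hypothetical, and the check covers only the executable, not the DLL files.

```python
import shutil


def find_on_path(program="python"):
    """Return the full path of `program` as found on PATH, or None."""
    # shutil.which searches the directories listed in PATH
    # (on Windows it also honors PATHEXT, so "python" matches python.exe).
    return shutil.which(program)


if __name__ == "__main__":
    location = find_on_path()
    if location is None:
        print("python was not found on PATH")
    else:
        print("python found at", location)
```

If the function returns None, add the Anaconda installation directory to PATH and open a new command prompt before retrying.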

Note: To see system requirements for installing the software, see their individual Help topics or Support information.

Packages

Both TIBCO Enterprise Runtime for R and Python use packages that contain specialized functions to solve specific programming and industry problems. In this case, the packages enable the two systems to connect and to communicate, exchanging data frames.

  • TIBCO Enterprise Runtime for R uses the following packages (plus all their dependencies) from the Comprehensive R Archive Network (CRAN).

    • PythonInR
    • feather
  • Anaconda manages finding, installing, and building binary Windows packages from available Python package resource sites. Python uses the following packages (plus all their dependencies).

    • feather-format (binary package provided with this exercise: scroll to the bottom of the page).
    • statsmodels (provided in the Anaconda installation).
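
Before running the example, it can help to confirm that the Python packages listed above are importable. The sketch below assumes only the standard library; the helper name missing_modules is hypothetical. Note that the feather-format package is imported under the module name feather.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "feather" is the import name of the feather-format package.
    missing = missing_modules(["feather", "statsmodels", "pandas"])
    print("missing:", missing if missing else "none")
```

Run this with the same Python installation that is on your PATH, because that is the one the example connects to.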

Example

Our solution demonstrates calling Python from TIBCO Enterprise Runtime for R to fit a linear model in Python using the ols function from the statsmodels package. Fitted values from Python are passed back to TIBCO Enterprise Runtime for R and compared with the fitted values from the lm function in TIBCO Enterprise Runtime for R.

Note  The complete script demonstrated in this example is attached to this article, for your convenience. To download and review the script, scroll to the bottom of the article and select TERRandPython.R.txt.
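
To make the comparison concrete, the sketch below shows what the formula Fuel ~ Weight + Type amounts to numerically. It is not the attached script and does not use statsmodels: it builds the design matrix by hand with NumPy, on made-up numbers rather than the fuel.frame data, and solves the least-squares problem directly. Both ols and lm solve this same kind of problem, which is why their fitted values can be compared.

```python
import numpy as np

# Made-up illustrative values, NOT the fuel.frame data set.
weight = np.array([2000.0, 2500.0, 3000.0, 3500.0, 2200.0, 3300.0])
car_type = np.array(["Small", "Small", "Large", "Large", "Small", "Large"])
fuel = np.array([3.1, 3.6, 4.4, 5.0, 3.2, 4.9])

# Treatment coding: the first level (alphabetically) is the baseline,
# and each remaining level gets a 0/1 indicator column.
levels = sorted(set(car_type))
dummies = [(car_type == level).astype(float) for level in levels[1:]]

# Design matrix: intercept, Weight, then the Type indicators.
X = np.column_stack([np.ones_like(weight), weight] + dummies)

# Least-squares coefficients, then the fitted values X @ beta.
beta, *_ = np.linalg.lstsq(X, fuel, rcond=None)
fitted = X @ beta
print(fitted)
```

The fitted values do not depend on which Type level serves as the baseline, because any choice of baseline spans the same column space.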

The data and data translation packages

For analysis in TIBCO Enterprise Runtime for R, statisticians use the data.frame object to contain the data. For analysis in Python, programmers use pandas, a powerful Python data analysis toolkit, which contains the data structure DataFrame.  These two object types are not compatible. We can use the CRAN package feather and the Python package feather-format to provide the means to translate the data between the two programming languages while maintaining the structure and integrity of the data.

We use the CRAN package feather to send the data.frame object from TIBCO Enterprise Runtime for R to Python. We use the feather-format package on the Python side. Python reads in the data as a DataFrame, adding a column needed by that data structure. After running the script to process the data (fitting the model, in our example), we perform the reverse process, using feather-format in Python to send the data back to TIBCO Enterprise Runtime for R, which reads in the data, with the help of the feather package, as a new data.frame (with an additional column).

Procedure

  1. Download the attached .zip archive, feather_format.zip, included with this article. This zip archive contains the feather-format package. Copy the zip archive to the site library for your Anaconda Python installation, and then extract the .whl file it contains. For this example, we provide the .whl archive feather_format-0.3.0-cp35-cp35m-win_amd64.whl.

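  2. From a Windows command prompt, install the feather-format package. (To install from the extracted .whl file rather than from the package index, pass the path of the .whl file to pip install in place of the package name.)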

    pip install feather-format
  3. From the Spotfire menu, click Tools > TERR Tools, and then open the TIBCO Enterprise Runtime for R console.

    Note  Optionally, you can use RStudio, specifying TIBCO Enterprise Runtime for R as the engine.

  4. At the command prompt, install the required packages.

    install.packages("feather")
    install.packages("PythonInR")
  5. Load the packages.

    library("feather")
    library("PythonInR")
  6. Connect to Python by calling the pyConnect function from the PythonInR package. You should not need to specify a path.

    PythonInR::pyConnect()  # only needed on Windows
  7. Assign the data set (in this case, fuel.frame) to the name ff.

    ff <- Sdatasets::fuel.frame
    
  8. Create a temporary path for the data set, and then write a data.frame to a feather file, passing in the data set and the temporary path.

    tempff <- tempfile("ff")
    write_feather(ff, path=tempff)
  9. Set the name of the feather file in Python by calling the function PythonInR::pyExecp. Note the r before the quoted file name. It makes the Python literal a raw string, so the backslashes in the Windows path stored in tempff are not interpreted as escape characters.

    PythonInR::pyExecp(paste0("fthrfile = r'", tempff, "'"))
  10. Using PythonInR::pyExec, create and pass to Python the following script.

    Note  Remember that Python has rules about indentations and spacing. If you see the error, "IndentationError: unexpected indent", remove the leading spaces from each line in the pyExec code.

    PythonInR::pyExec('
    import feather
    from statsmodels.formula.api import ols
    df = feather.read_dataframe(fthrfile)
    linmod = ols(formula="Fuel ~ Weight + Type", data=df).fit()
    pred = linmod.predict(df)
    df["Fitted"] = pred
    feather.write_dataframe(df, fthrfile)
    ')

    The script performs the following tasks.

    1. Reads the data from the feather file (fthrfile).
    2. Fits the model.
    3. Adds the fitted values as a new column in the data (Fitted).
    4. Writes the data back out to the same feather file (fthrfile), overwriting it.
  11. Read the feather file written by Python, passing in the path created earlier to contain it. The file contains the new column with the fitted values.

    ff2 <- read_feather(tempff)
  12. Fit the same model with the TIBCO Enterprise Runtime for R function lm and extract the fitted values with the predict function.

    m1 <- lm(Fuel ~ Weight + Type, data = ff)
    p1 <- predict(m1, ff)
  13. Compare to the fitted values that Python computed.

    all.equal(unname(p1), ff2$Fitted)

    The returned value should be TRUE, which indicates that the fitted values computed by TIBCO Enterprise Runtime for R and those returned by Python are equal within the numerical tolerance that all.equal applies. This confirms that the code ran correctly and that the two systems produced matching results.
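
The all.equal function tests for near equality rather than bit-for-bit identity, which is the appropriate check when two different numerical libraries compute the same quantity. The sketch below is a simplified Python take on that idea, not a reimplementation of all.equal: the function name roughly_all_equal is hypothetical, and the default tolerance is the square root of machine epsilon (about 1.5e-8), which is the documented default for all.equal.

```python
import math
import sys

# Square root of machine epsilon, about 1.5e-8.
TOLERANCE = math.sqrt(sys.float_info.epsilon)


def roughly_all_equal(target, current, tolerance=TOLERANCE):
    """Simplified near-equality test for two numeric sequences.

    Returns True when the mean absolute difference, scaled by the mean
    absolute value of `target`, does not exceed `tolerance`.
    """
    if len(target) != len(current):
        return False
    n = len(target)
    if n == 0:
        return True
    mean_diff = sum(abs(t - c) for t, c in zip(target, current)) / n
    mean_target = sum(abs(t) for t in target) / n
    # Use a relative difference unless the target values are essentially zero.
    if mean_target > tolerance:
        mean_diff /= mean_target
    return mean_diff <= tolerance


if __name__ == "__main__":
    a = [3.0, 4.5, 5.25]
    b = [x + 1e-12 for x in a]      # tiny floating-point noise
    print(a == b)                   # exact comparison
    print(roughly_all_equal(a, b))  # tolerance-based comparison
```

An exact comparison would reject values that differ only by rounding noise, while the tolerance-based check accepts them and still rejects genuinely different results.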

Attachments

  • feather_format.zip (80.99 KB)
  • terrandpython.r.txt (1.35 KB)