
Calling R from SPSS

IBM® SPSS® can talk to R. It’s something of a well-kept secret, judging from the low level of activity in the R blogosphere on this point. The low level of interest is not surprising: SPSS users are, more often than not, people who use only SPSS for their data analysis, and R users are accustomed to applying ugly hacks as part of doing business with R. An R user who wants to analyse data in .sav format typically opens the file in SPSS, saves it in comma-separated values (CSV) format, and opens the result in R with the read.csv() function. A cleaner way is to save to SPSS Statistics Portable (POR) format from SPSS and open the result with the read.spss() function from the foreign package. This method usually works, in the sense that only a few dozen lines of R code are then required to cope with categories, missing values, time variables, and other features that are either lost or damaged in translation. If you need to send data from R back to SPSS, the return journey is more awkward.

Tedious data manipulation notwithstanding, you can certainly work with both applications without a plug-in to connect them. Is the effort of learning the plug-in worth the gain in productivity? Is there a gain in productivity, or are the advantages of a different sort? To these questions, I would answer Yes and Yes. Translating from one data format to another is always tricky and time consuming. When you use R from SPSS, you can apply R functions to SPSS data while you maintain the integrity of the original database.

There is a further advantage to using the R integration plug-in: where R and SPSS are both used on the same data, the plug-in fosters reproducible research.

Reproducible research

Reproducible research is mainly an organizational principle. Given the original data file and the syntax file, it is possible to re-create every step of the analysis. Months later, if you need to return to the problem with additional data or a new analysis, it is possible to rebuild the original project. With SPSS, you can maintain a record of every procedure that is run on the data, be it a transformation of the data, the creation of new variables, or an analysis. If R is to play a role in the analysis, either as an assist in recoding variables or to supply a function not currently available in SPSS, maintaining both SPSS and R syntax in the same syntax file has value. You can run SPSS and R code from the same SPSS syntax file and apply it to the same database. Everything stays together.

Extending the functionality of SPSS

In a previous article, I argued that data analysts should learn R. Briefly, most advances in statistics appear first as R packages, before they are added to the drop-down menus of commercial software such as SPSS. R gives the SPSS user more tools for the job, and although you might use these tools outside of SPSS by exporting the data, data export is never seamless. With the R plug-in, you retain all the features of an SPSS database, particularly the labels of category data and the long descriptors.

R extensions

SPSS allows you to create more menu items and add them to the existing menu bar. In particular, R functions can be bundled as extensions and supplied to users through the menu, so that they can run a function implemented in R with no knowledge of R programming. Writing extensions goes beyond the scope of this article, but extensions are an important reason to learn to use the R plug-in: through them, you can supply R functions to SPSS users who are unfamiliar with R.

Finding and installing the plug-in

Installing the plug-in is fairly straightforward, but the process does contain a few hurdles. For one thing, you must start several pages before the actual download page. You also need to register with IBM developerWorks if you have not already done so. It’s free.

Another hurdle in the installation is that the plug-in works with only one version of R, not necessarily the current one. Which version of R you need depends on the version of SPSS you are running. Unfortunately, the download page does not say which. However, for SPSS version 22, use R-2.15. For SPSS version 21, use R-2.14.0.

Be warned that the R integration plug-in is strict about the R version. For SPSS version 21, for example, you must install R-2.14.0. If you install 2.14.1 or 2.14.2, the plug-in will not work. During the installation process, the plug-in looks for a folder that contains the correct version of R. For example, if you use SPSS version 21 on Windows®, it looks for C:\Program Files\R\R-2.14.0. If it can’t find the folder that it wants, the installer asks you for the location of R, and from this query you can infer the precise version of R that you need. The installation steps are as follows:

  1. Download the appropriate version of R from CRAN and install it. If you already use a different version of R and you want to keep it as your default, be sure to clear the Store version number in registry check box. If you want to install R packages to run with SPSS, you need to install them from the version of R that SPSS uses. Packages that are installed for your other version of R are invisible to the R integration plug-in.
  2. To find the plug-in for download, click Help > Working with R from the menu bar in SPSS to reach the opening page.
  3. Midway down the page, click the link for SPSS plug-ins. SPSS has many plug-ins; select the one for R. This link brings you to the login screen for IBM downloads.
  4. On the login page, log in or register (it’s free). Proceed to the download page.
  5. Each version of SPSS has its own plug-in. Find the one for your version, download it, and install it. At this stage, if you don’t have the correct version of R installed, you see a message that the installer can’t find it. Install the version of R that the installer asks for, and try again.
  6. If installation is successful, the installer displays a large documentation file. With the installation of the plug-in, this file is available from the SPSS Help menu under Programmability > R plugin. The Working with R menu command now points to more documentation and tutorials.

Using R from SPSS

The R integration plug-in does two things: It opens communication between SPSS and R, and it provides R with a package of functions with which to translate SPSS data structures into R objects.

Hello R!

Open a syntax file and type the following lines. Select them and run them by clicking the green arrow:

BEGIN PROGRAM R.
cat("\t\tHello R!\n")
END PROGRAM.

The line BEGIN PROGRAM R. launches R and loads the requisite library of data management functions. It also sets several option variables for R that override any options that you might set in your .First() function.

The first and last lines here follow the conventions of SPSS syntax code and end with a period (.). All code between those two lines is interpreted as R code and must obey the rules of R syntax, so no period marks the end of a line.

When SPSS meets the END PROGRAM. statement, it interprets subsequent commands as SPSS syntax, but it does not quit the R session. Any variables that an R chunk creates are available to subsequent R chunks during the SPSS session.

Reading data into R and returning changes to SPSS

R chunks that are called from SPSS can read and write data from external sources in the usual way. But if you run R from SPSS, it’s because you want access to an SPSS database. To illustrate different data types, I created a simple test database, which is available with the downloads. Consider the lines in Listing 1.

Listing 1. Read and write a database
BEGIN PROGRAM R.
#Pull the data into a data frame
testData = spssdata.GetDataFromSPSS()

#Pull the data dictionary into another data frame
testDict = spssdictionary.GetDictionaryFromSPSS()

#Take a look
print(testData)
print(testDict)

#Check what data types the variables of the R data frame have
lapply(testData, class)

#Set up a new SPSS database with the same dictionary
spssdictionary.SetDictionaryToSPSS("Test2", testDict)

#Copy the data to the new SPSS database
spssdata.SetDataToSPSS("Test2", testData)

#Tell SPSS you're done creating data
spssdictionary.EndDataStep()
END PROGRAM.

When you run this code, the output in Listing 2 should appear in an SPSS output file.

Listing 2. Output reading and writing a database
             CustName Age Rating        Date Weight 
1 Mary                  21      1 13594608000   55.2 
2 John                  45      3 13594694400   73.4 
3 Henry                 33      2 13563244800   80.0 
                               X1    X2              X3 X4                   X5 
varName                  CustName   Age          Rating Date                 Weight 
varLabel            Customer Name   Age Customer rating Date of first trans  Weight 
varType                        20     0               0 0                    0 
varFormat                     A20    F8              F6 ADATE10              F5.1 
varMeasurementLevel       nominal scale         ordinal scale                scale 

[1] "factor" 
[1] "numeric" 
[1] "numeric" 
[1] "numeric" 
[1] "numeric"

What just happened?

The great strength of SPSS as a data vault lies in the detailed data dictionary that you can create. You can store some of this information (variable types and names) as classes and variable names in an R data frame, but not without some loss of detail. The R integration plug-in lets you create two data frames from the active SPSS data set: one for the data and one for the data dictionary.

Data conversion from SPSS to R

Look at each variable in turn from the test database and see what happens when it is read into R:

  • CustName. This variable is a string variable of width 20 in SPSS (format A20), with nominal measurement level. It becomes a factor in R.
  • Age. This variable is numeric in SPSS, with scale measurement level and a width of 8 with no decimals (format F8). It becomes numeric in R.
  • Rating. This variable is numeric, with ordinal measurement level. The numeric codes were given descriptive labels in SPSS that are lost in translation. (For more about categorical data, see Working with categories.)
  • Date. This variable is a date, formatted in SPSS as ADATE10 (mm/dd/yyyy). It becomes numeric in R. (For more about dates, see Working with dates.)
  • Weight. This variable is numeric, formatted in SPSS to have one decimal (format F5.1). It becomes numeric in R.

The data dictionary

The data dictionary can be imported to a data frame in R, as shown in Listing 1. You don’t need this dictionary to work on the data in R, but you do need to build a data dictionary to create an SPSS database. The data dictionary is a data frame of character vectors. It has one column for each variable of the SPSS database and one row for each entry in the dictionary. As you can see from the example in Listing 2, a range of format types is available. The complete list is given in the documentation for the R plug-in.
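The dictionary is an ordinary data frame, so you can inspect or edit it in R before you pass it back to SPSS. In SPSS you would get it from spssdictionary.GetDictionaryFromSPSS(). The sketch below instead rebuilds the dictionary printed in Listing 2 by hand, with values copied from that output, so that it runs in a plain R session. The change to F6.2 at the end is only an illustrative assumption and is not part of the test database.

```r
# Rebuild the dictionary shown in Listing 2 by hand (values copied from that
# output): one column per variable, one row per dictionary entry.
test_dict <- data.frame(
  X1 = c("CustName", "Customer Name", "20", "A20", "nominal"),
  X2 = c("Age", "Age", "0", "F8", "scale"),
  X3 = c("Rating", "Customer rating", "0", "F6", "ordinal"),
  X4 = c("Date", "Date of first trans", "0", "ADATE10", "scale"),
  X5 = c("Weight", "Weight", "0", "F5.1", "scale"),
  stringsAsFactors = FALSE  # the dictionary holds character vectors
)
row.names(test_dict) <- c("varName", "varLabel", "varType",
                          "varFormat", "varMeasurementLevel")

# Entries are addressed as [entry, variable]
test_dict["varFormat", "X4"]   # "ADATE10"

# Illustrative edit (assumption): show Weight with two decimals
test_dict["varFormat", "X5"] <- "F6.2"
```

A varType of "0" marks a numeric variable. A positive value such as "20" gives the width of a string variable.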

Working with dates

The R integration function spssdata.GetDataFromSPSS(), called with no arguments, transforms dates into numbers. The number that you get is the elapsed time in seconds since midnight, 14 October 1582, which is the origin that SPSS uses for its date values.

To convert the date variable for use in R, I might add testData$Date = as.POSIXct(testData$Date, origin="1582-10-14", tz="UTC"). Alternatively, I can take advantage of the rDate argument of the GetDataFromSPSS() function (see Listing 3).

Listing 3. Reading dates from SPSS into R
#Pull the data into a data frame adjusting for dates
testData = spssdata.GetDataFromSPSS(rDate="POSIXct") 
testDict = spssdictionary.GetDictionaryFromSPSS()

              CustName Age Rating       Date Weight 
1 Mary                  21      1 2013-07-31   55.2 
2 John                  45      3 2013-08-01   73.4
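You may have read the raw numbers without rDate, or you may need to hand R dates back to SPSS as numbers. In both cases the conversion is plain arithmetic on seconds since midnight, 14 October 1582. The following minimal sketch wraps it in two helpers, spss_to_posixct() and posixct_to_spss(); both names are hypothetical. It uses the three raw Date values from Listing 2. Times are kept in UTC (an assumption) so that the local time zone cannot shift a date by a day.

```r
# SPSS date values count seconds from midnight, 14 October 1582.
# Helper names are hypothetical; tz = "UTC" avoids local-time shifts.
spss_to_posixct <- function(secs) {
  as.POSIXct(secs, origin = "1582-10-14", tz = "UTC")
}

posixct_to_spss <- function(x) {
  origin <- as.POSIXct("1582-10-14", tz = "UTC")
  as.numeric(difftime(x, origin, units = "secs"))
}

# Raw Date values from Listing 2 (Mary, John, Henry)
secs <- c(13594608000, 13594694400, 13563244800)
dates <- spss_to_posixct(secs)
format(dates, "%Y-%m-%d")   # "2013-07-31" "2013-08-01" "2012-08-02"

# Round trip: recovers the numbers that SPSS stores
back <- posixct_to_spss(dates)
```

The first two dates match the rDate="POSIXct" output above, and the third matches Henry's date in Listing 6. This confirms the 14 October origin.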

Writing time data to SPSS

The example in Listing 4 shows how to write date-time data back to SPSS from R. File IBM.csv contains a record of NYSE stock market data for IBM stock, obtained from a well-known finance site. Here you see the first few lines of data, reading back from 28 August 2013.

Listing 4. Writing dates from R to SPSS
Date        Open    High    Low     Close   Volume   Adj Close
28/08/2013  182.68  183.47  181.1   182.16  3979200  182.16
27/08/2013  183.63  184.5   182.57  182.74  3179300  182.74
26/08/2013  185.27  187     184.68  184.74  2170400  184.74
23/08/2013  185.34  185.74  184.57  185.42  2292700  185.42
22/08/2013  185.65  186.25  184.25  185.19  2354300  185.19
21/08/2013  184.67  186.57  184.28  184.86  3551000  184.86

I can read the data into SPSS, but the date format is not a format that the SPSS date-time wizard supports. R to the rescue! Using R syntax from SPSS, I can open the file from R, convert the date to an appropriate format, and create an SPSS database with the results. Here are the steps:

  1. The default working directory for the R integration plug-in is somewhere deep in the SPSS program directory tree. That’s not what you want. Set the working directory to the location of your data file so that R can find it.
  2. These lines of code read in the dates, in character format, and convert them to Portable Operating System Interface for UNIX® (POSIX) format, measured from the SPSS starting date of 14 October 1582.
  3. The spssdictionary.CreateSPSSDictionary() function automates some features of building up the data dictionary. Format DATE11 displays dates in the form dd-mmm-yyyy, for example 28-Aug-2013.
  4. Create the database and populate it.

Listing 5 shows how to carry out these steps.

Listing 5. Reading data directly into R and creating an SPSS database from them
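The following is a minimal sketch of those four steps, written under stated assumptions. The folder path is hypothetical, and so are the helper names dmy_to_spss(), make_dict(), write_to_spss() and import_ibm(). Every display format except DATE11 is my own choice. The sketch builds the dictionary by hand, in the style of Listing 7, rather than calling spssdictionary.CreateSPSSDictionary(), because this article does not show that function's arguments. The plug-in calls are skipped when the plug-in functions are absent, so you can try the conversion logic in a plain R session.

```r
# Step 1 (hypothetical path): point R at the folder that holds IBM.csv
# setwd("C:/data/stocks")

# SPSS date values count seconds from midnight, 14 October 1582
spss_origin <- as.POSIXct("1582-10-14", tz = "UTC")

# Step 2: "dd/mm/yyyy" strings -> SPSS date values (seconds since origin)
dmy_to_spss <- function(x) {
  d <- as.POSIXct(x, format = "%d/%m/%Y", tz = "UTC")
  as.numeric(difftime(d, spss_origin, units = "secs"))
}

# Step 3: one dictionary column per variable, built as in Listing 7.
# Every variable here is numeric (varType "0") at the scale level.
make_dict <- function(df, formats) {
  cols <- lapply(names(df), function(v) c(v, "", "0", formats[[v]], "scale"))
  names(cols) <- paste0("X", seq_along(cols))
  dict <- data.frame(cols, stringsAsFactors = FALSE)
  row.names(dict) <- c("varName", "varLabel", "varType",
                       "varFormat", "varMeasurementLevel")
  dict
}

# Display formats (assumed, apart from DATE11 for the date column)
ibm_formats <- c(Date = "DATE11", Open = "F8.2", High = "F8.2", Low = "F8.2",
                 Close = "F8.2", Volume = "F12.0", Adj.Close = "F8.2")

# Step 4: create and populate the SPSS database. Returns FALSE without
# doing anything when the plug-in functions are not available.
write_to_spss <- function(name, df, dict) {
  if (!exists("spssdictionary.SetDictionaryToSPSS")) return(invisible(FALSE))
  spssdictionary.SetDictionaryToSPSS(name, dict)
  spssdata.SetDataToSPSS(name, df)
  spssdictionary.EndDataStep()
  invisible(TRUE)
}

import_ibm <- function(path = "IBM.csv") {
  # read.csv renames the "Adj Close" column to Adj.Close
  ibm <- read.csv(path, stringsAsFactors = FALSE)
  ibm$Date <- dmy_to_spss(ibm$Date)
  write_to_spss("IBM", ibm, make_dict(ibm, ibm_formats))
}
```

The date column is written as a plain number, because SPSS stores dates that way internally. The DATE11 entry in the dictionary then makes SPSS display the number as a date.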

Working with categories

My simple example did not handle the categorical variable Rating at all well. R got the numeric codes for that variable but not the descriptive labels for the different levels the variable might take: Poor, Average, and Excellent.

You can do something about that issue. The factorMode argument that is shown in Listing 6 imports the value labels of a categorical variable instead of its numeric codes.

Listing 6. The factorMode argument
testData = spssdata.GetDataFromSPSS(rDate="POSIXct", factorMode="labels") 
testDict = spssdictionary.GetDictionaryFromSPSS() 
              CustName Age    Rating       Date Weight 
1 Mary                  21      Poor 2013-07-31   55.2 
2 John                  45 Excellent 2013-08-01   73.4 
3 Henry                 33   Average 2012-08-02   80.0
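You may have already read Rating as numeric codes. If so, base R's factor() can attach the labels after the fact. In the minimal sketch below, the mapping 1 = Poor, 2 = Average, 3 = Excellent is an assumption, inferred by lining up Listing 2 with Listing 6. In practice you would take the mapping from the SPSS value labels. The factor is marked as ordered because Rating has ordinal measurement level in SPSS.

```r
# Numeric codes as returned without factorMode (Mary, John, Henry)
rating_codes <- c(1, 3, 2)

# Assumed label mapping, inferred from Listings 2 and 6
rating <- factor(rating_codes, levels = 1:3,
                 labels = c("Poor", "Average", "Excellent"),
                 ordered = TRUE)

as.character(rating)    # "Poor" "Excellent" "Average"
as.integer(rating)      # the original codes: 1 3 2
rating[2] > rating[3]   # TRUE: Excellent ranks above Average
```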

Building a dictionary for categorical variables

The factorMode argument gives me a choice, depending on whether I want numeric codes or labels for a categorical variable. But I need more if I want to create an SPSS database with categorical data. The solution lies in adding further structure to the data dictionary. The example in Listing 7 illustrates how to build an SPSS database from an R data frame with factors.

The famous iris data set is bundled with base R. It is a data frame with four numeric variables and one factor, denoting one of three species of iris. To build a database in SPSS, I complete the following steps:

  1. Create a data dictionary for the iris data. This dictionary is a data frame of five columns (one for each variable of the iris set).
  2. Create a category dictionary for the factor. The R structure here is complex. It is a list of length 2. The first component contains the names of the factors. The second component is a list of lists. Each item is a list of length 2: one component for the numeric codes and one component for their labels.
  3. Begin creation of an SPSS database by “setting” the data and category dictionaries.
  4. Populate the database.
  5. End the data step.
  6. Run the code. Doing so creates a database in SPSS but does not save it to disk. The active database remains whatever it was.
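Step 2 can be automated for any data frame. The sketch below is a hypothetical helper, make_category_dict(). It builds the structure that step 2 describes: a name component listing the factor columns, and a dictionary list with one levels/labels pair per factor. Codes are numbered 1 to k in factor-level order. The sketch assumes that the plug-in accepts exactly this layout, as Listing 7 does for the single Species factor.

```r
# Hypothetical helper: category dictionary for every factor column of df,
# in the shape described in step 2
make_category_dict <- function(df) {
  fac_names <- names(df)[vapply(df, is.factor, logical(1))]
  dictionary <- lapply(fac_names, function(v) {
    labs <- levels(df[[v]])
    # numeric codes 1..k, in the same order as the factor levels
    list(levels = seq_along(labs), labels = labs)
  })
  list(name = fac_names, dictionary = dictionary)
}

iris_cat <- make_category_dict(iris)
iris_cat$name                     # "Species"
iris_cat$dictionary[[1]]$labels   # "setosa" "versicolor" "virginica"
```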

Listing 7. Building a dictionary for categorical data
BEGIN PROGRAM R.
#Build the data dictionary: one column per variable
iris.dict = vector(mode="list", length=5)
#Name the columns
names(iris.dict) = paste("X", 1:5, sep="")
#Fill in the numeric variables
for(i in 1:4){
  iris.dict[[i]] = c(names(iris)[i],"","0","F3.2","scale")
}
#Fill information for the category
iris.dict[[5]] = c("Species","Species of Iris","0","F3","nominal")
#Square it off (keeping character columns) and add row names
iris.dict = data.frame(iris.dict, stringsAsFactors=FALSE)
row.names(iris.dict) = c("varName","varLabel","varType","varFormat",
                         "varMeasurementLevel")

#Now build the category dictionary (stored here as iris.cat)
iris.cat = vector(mode="list", length=2)
names(iris.cat) = c("name","dictionary")
iris.cat$name = "Species"

#Note that the dictionary is a list of lists
#With only one category, the first list has length 1
#The dictionary list contains two lists
iris.cat$dictionary = vector(mode="list", length=1)
iris.cat$dictionary[[1]] = list(levels=c(1,2,3),
                                labels=levels(iris$Species))

#Now build the SPSS database
spssdictionary.SetDictionaryToSPSS("Iris", iris.dict, iris.cat)
spssdata.SetDataToSPSS("Iris", iris, iris.cat)
spssdictionary.EndDataStep()
END PROGRAM.



The R integration package contains many functions to provide a seamless transfer from SPSS to R. For instance, SPSS allows greater flexibility in defining missing values than R. The R integration package contains functions for managing missing values so that nothing is lost in passing from SPSS to R and back again. Another important feature is the ability to create SPSS extensions that use R. Menu items can be added to the Analysis menu that enable R functions to be run on the active data set without needing to write explicit code in a syntax file. In this way, you can make R functionality available to users who have no knowledge of R. The R integration package has a lot to offer data analysts who use both SPSS and R.