IBM® SPSS® can talk to R. It’s something of a well-kept secret, judging from the low level of activity in the R blogosphere on this point. The low level of interest is not surprising: SPSS users are, more often than not, people who use only SPSS for their data analysis; and R users are accustomed to applying ugly hacks as part of doing business with R. An R user who wants to analyse data in .sav format typically opens the file in SPSS, saves it to comma-separated values (CSV) format, and opens the result in R by using the `read.csv()` function. A cleaner way is to save to SPSS Statistics Portable (POR) format from SPSS and open the result by using the `read.spss()` function from the `foreign` package. This method usually works, in the sense that only a few dozen lines of R code are then required to cope with categories, missing values, time variables, and other features that are either lost or damaged in translation. If you need to return data from R back to SPSS, the return journey is more awkward.
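To see what "lost or damaged in translation" means in practice, here is a minimal sketch in plain R. It assumes a small made-up data frame; the names `customers` and `round_trip` are illustrative and do not come from any SPSS file. It writes a labelled ordinal variable and a date to CSV and reads them back, and both return as plain character columns.

```r
# Made-up data: an ordered factor with labels and a true Date column
customers <- data.frame(
  CustName = c("Mary", "John", "Henry"),
  Rating = factor(c(1, 3, 2), levels = 1:3,
                  labels = c("Poor", "Average", "Excellent"),
                  ordered = TRUE),
  Date = as.Date(c("2013-07-31", "2013-08-01", "2012-08-02")),
  stringsAsFactors = FALSE
)

# Round trip through a temporary CSV file
csv_file <- tempfile(fileext = ".csv")
write.csv(customers, csv_file, row.names = FALSE)
round_trip <- read.csv(csv_file, stringsAsFactors = FALSE)
unlink(csv_file)

# Compare the column classes before and after the round trip:
# Rating and Date come back as character, so the level order and
# the date type must be rebuilt by hand
print(sapply(customers, function(x) class(x)[1]))
print(sapply(round_trip, function(x) class(x)[1]))
```

Restoring the factor levels and the date class is exactly the kind of repair code that the paragraph above refers to.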

Tedious data manipulation notwithstanding, you can certainly work with both applications without a plug-in to connect them. Is the effort of learning the plug-in worth the gain in productivity? Is there a gain in productivity, or are the advantages of a different sort? To these questions, I would answer Yes and Yes. Translating from one data format to another is always tricky and time-consuming. When you use R from SPSS, you can apply R functions to SPSS data while you maintain the integrity of the original database.

A further advantage of the R integration plug-in is that, where R and SPSS are both used on the same data, it fosters reproducible research.

## Reproducible research

Reproducible research is mainly an organizational principle. Given the original data file and the syntax file, it is possible to re-create every step of the analysis from these two files. Months later, if you need to return to the problem with additional data or a new analysis, it is possible to rebuild the original project. With SPSS, you can maintain a record of every procedure that is run on the data, be it a transformation of the data, the creation of new variables, or an analysis. If R is to play a role in the analysis, either as an assist in recoding variables or to supply a function not currently available in SPSS, maintaining both SPSS and R syntax in the same syntax file has value. You can run SPSS and R code from the same SPSS syntax file and apply it to the same database. Everything stays together.

## Extending the functionality of SPSS

In a previous article, I argued that data analysts should learn R. Briefly, most advances in statistics appear first as R packages before they are added to the drop-down menus. R gives the SPSS user more tools for the job, and although you might implement these tools outside of SPSS by exporting the data, data export is never seamless. With the R plug-in, you retain all the features of an SPSS database, particularly the labels of category data and the long descriptors.

## R extensions

SPSS allows you to create more menu items and add them to the existing menu bar. In particular, R functions can be bundled as extensions and supplied to you through the menu, so you can run a function that is implemented in R with no knowledge of R programming. Writing extensions goes beyond the scope of this article, but they are an important reason to learn to use the R plug-in. Through this plug-in, you can supply R functions to SPSS users who are unfamiliar with R.

## Finding and installing the plug-in

Installing the plug-in is fairly straightforward, but the process does contain a few hurdles. For one thing, you must start several pages before the actual download page. You need to register with IBM developerWorks, if you have not already done so. It’s free.

Another hurdle in the installation is that the plug-in works with only one version of R, not necessarily the current one. Which version of R you need depends on the version of SPSS you are running. Unfortunately, the download page does not specify. However, for SPSS version 22, use R-2.15. For SPSS version 21, use R-2.14.0.

Be warned that the R integration plug-in is specific about the R version. For SPSS version 21, for example, you must install R-2.14.0. If you install 2.14.1 or 2.14.2, it will not work. During the installation process, the plug-in looks for a folder that contains the correct version of R. For example, if you use SPSS version 21 on Windows®, it looks for C:\Program Files\R\R-2.14.0. The installer queries you for the location of R if it can’t find the folder that it wants. From this query, you can infer the precise version of R you need:

- Obtain the appropriate version of R from CRAN, then download and install it. If you already use a different version of R and you want to keep it as your default, be sure to clear the **Store version number in registry** check box. If you want to install R packages to run with SPSS, you need to install them from the version of R that SPSS uses. R packages that are downloaded for the current version are invisible to the R integration package.
- To find the plug-in for download, click **Help > Working with R** from the menu bar in SPSS to reach the opening page.
- Midway down the page, click the link for SPSS plug-ins. SPSS has many plug-ins, but select the one for R. This link brings you to the login screen for IBM downloads.
- On the login page, log in or register (it’s free). Proceed to the download page.
- Each version of SPSS has its own plug-in. Find the one for your version, download it, and install it. At this stage, if you don’t have the correct version of R installed, you see a message that the installer can’t find it. Install a different version, and try again.
- If installation is successful, the installer displays a large documentation file. With the installation of the plug-in, this file is available from the SPSS **Help** menu under **Programmability > R plugin**. The **Working with R** menu command now points to more documentation and tutorials.

## Using R from SPSS

The R integration plug-in does two things: It opens communication between SPSS and R, and it provides R with a package of functions with which to translate SPSS data structures into R objects.

### Hello R!

Open a syntax file, and type the following lines. Select and run the command by clicking the green arrow:

```
BEGIN PROGRAM R.
cat("\t\tHello R!\n")
END PROGRAM.
```

The line `BEGIN PROGRAM R.` launches R and loads the requisite library of data management functions. It also sets several option variables for R that override any options that you might set in your `.First()` function.

The first and last lines here follow the conventions of SPSS syntax code and end with a period (.). All code *between* those two lines is interpreted as R code and must obey the rules of R syntax, so no period marks the end of a line.

When SPSS meets the `END PROGRAM.` statement, it interprets subsequent commands as SPSS syntax, but it does not quit the R session. Any variables that an R chunk creates are available to subsequent R chunks during the SPSS session.

## Reading data into R and returning changes to SPSS

R chunks that are called from SPSS can read and write data from external sources in the usual way. But if you run R from SPSS, it’s because you want access to an SPSS database. I created a simple test database to illustrate different data types, available with the downloads. Consider the lines in Listing 1.

##### Listing 1. Read and write a database

```
BEGIN PROGRAM R.
#Pull the data into a data frame
testData = spssdata.GetDataFromSPSS()
#Pull the data dictionary into another data frame
testDict = spssdictionary.GetDictionaryFromSPSS()
#Take a look
print(testData)
print(testDict)
#Check what data types the variables of the R data frame have
lapply(testData, class)
#Set up a new SPSS database with the same dictionary
spssdictionary.SetDictionaryToSPSS("Test2",testDict)
#Copy the data to the new SPSS database
spssdata.SetDataToSPSS("Test2", testData)
#Tell SPSS you're done creating data
spssdictionary.EndDataStep()
END PROGRAM.
```

When you run this code, the output in Listing 2 should appear in an SPSS output file.

##### Listing 2. Output reading and writing a database

```
  CustName Age Rating        Date Weight
1     Mary  21      1 13594608000   55.2
2     John  45      3 13594694400   73.4
3    Henry  33      2 13563244800   80.0
                               X1    X2              X3                  X4     X5
varName                  CustName   Age          Rating                Date Weight
varLabel            Customer Name   Age Customer rating Date of first trans Weight
varType                        20     0               0                   0      0
varFormat                     A20    F8              F6             ADATE10   F5.1
varMeasurementLevel       nominal scale         ordinal               scale  scale
$CustName
[1] "factor"
$Age
[1] "numeric"
$Rating
[1] "numeric"
$Date
[1] "numeric"
$Weight
[1] "numeric"
```

### What just happened?

The great strength of SPSS as a data vault lies in the detailed data dictionary that you can create. You can store some of this information (variable types and names) as class and variable names in an R data frame, but not without some loss of detail. The R integration plug-in lets you create two data frames from the active SPSS data set: one for the data and one for the data dictionary.

### Data conversion from SPSS to R

Look at each variable in turn from the test database and see what happens when it is read into R:

- `CustName`. This variable is a string variable of length 20 in SPSS, nominal type. It becomes a factor in R.
- `Age`. This variable is numeric in SPSS, scale type, of length 6 with no decimals. It becomes numeric in R.
- `Rating`. This variable is numeric of type ordinal. The numeric codes were given descriptive labels in SPSS that are lost in translation. (For more about categorical data, see Working with categories.)
- `Date`. This variable is a date, formatted *dd-mmm-yyyy*. It becomes numeric in R. (For more about dates, see Working with dates.)
- `Weight`. A numeric variable that is formatted in SPSS to have one decimal. It becomes numeric in R.

### The data dictionary

The data dictionary can be imported to a data frame in R, as shown in Listing 1. You don’t need this dictionary to work on the data in R, but you do need to build a data dictionary to create an SPSS database. The data dictionary is a data frame of character vectors. It has one column for each variable of the SPSS database and one row for each entry in the dictionary. As you can see from the example in Listing 2, a range of format types is available. The complete list is given in the documentation for the R plug-in.
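As a rough illustration of that layout, the sketch below rebuilds the dictionary from Listing 2 by hand in plain R, outside SPSS. The object name `testDict` follows Listing 1 and the values are copied from the Listing 2 output; the lookups at the end are just one way to query such a data frame, and the names `var_names`, `var_formats`, and `string_vars` are illustrative.

```r
# One column per SPSS variable, one row per dictionary entry, all character
testDict <- data.frame(
  X1 = c("CustName", "Customer Name", "20", "A20", "nominal"),
  X2 = c("Age", "Age", "0", "F8", "scale"),
  X3 = c("Rating", "Customer rating", "0", "F6", "ordinal"),
  X4 = c("Date", "Date of first trans", "0", "ADATE10", "scale"),
  X5 = c("Weight", "Weight", "0", "F5.1", "scale"),
  stringsAsFactors = FALSE
)
row.names(testDict) <- c("varName", "varLabel", "varType", "varFormat",
                         "varMeasurementLevel")

# Pull one dictionary entry for every variable, keyed by variable name
var_names <- unlist(testDict["varName", ])
var_formats <- unlist(testDict["varFormat", ])
names(var_formats) <- var_names
print(var_formats)

# varType is 0 for numeric variables and the string width otherwise
string_vars <- var_names[as.integer(unlist(testDict["varType", ])) > 0]
print(string_vars)
```

Because every cell is a character string, numeric entries such as `varType` need an explicit conversion before they can be compared.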

## Working with dates

The R integration function `spssdata.GetDataFromSPSS()`, with no arguments, transforms dates into numbers. The number that you get is the elapsed time in seconds from midnight, 14 October 1582.

To convert the date variable for use in R, I might add `testData$Date = as.POSIXlt(testData$Date, origin="1582-10-14")`. Alternatively, I can take advantage of a useful argument of the `GetDataFromSPSS()` function (see Listing 3).

##### Listing 3. Reading dates from SPSS into R

```
BEGIN PROGRAM R.
#Pull the data into a data frame adjusting for dates
testData = spssdata.GetDataFromSPSS(rDate="POSIXct")
testDict = spssdictionary.GetDictionaryFromSPSS()
print(testData)
END PROGRAM.
  CustName Age Rating       Date Weight
1     Mary  21      1 2013-07-31   55.2
2     John  45      3 2013-08-01   73.4
```
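The manual conversion can be checked in plain R, outside SPSS. This is a minimal sketch that uses the three raw values printed in Listing 2 and counts from midnight, 14 October 1582, the start of the SPSS date scale. The variable names are illustrative, and `tz = "UTC"` is my assumption, chosen so that the printed dates do not shift with the local time zone.

```r
# Raw SPSS date values for Mary, John, and Henry (from Listing 2)
spss_seconds <- c(13594608000, 13594694400, 13563244800)

# As calendar dates: convert seconds to days, then count from the SPSS origin
as_dates <- as.Date(spss_seconds / 86400, origin = "1582-10-14")
print(as_dates)

# As date-times: count seconds from the same origin
as_times <- as.POSIXct(spss_seconds, origin = "1582-10-14", tz = "UTC")
print(as_times)

# Seconds between the SPSS origin and the Unix epoch (1 January 1970)
spss_to_unix_offset <-
  as.numeric(as.Date("1970-01-01") - as.Date("1582-10-14")) * 86400
print(spss_to_unix_offset)
```

The results agree with the dates that `rDate="POSIXct"` returns in Listing 3, which is a quick way to confirm that the origin is right.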

### Writing time data to SPSS

The example in Listing 4 shows how to write date-time data back to SPSS from R. File IBM.csv contains a record of NYSE stock market data for IBM stock, obtained from the well-known finance site on Yahoo.com. Here you see the first few lines of data, reading back from 28 August 2013.

##### Listing 4. Writing dates from R to SPSS

```
Date Open High Low Close Volume Adj Close
28/08/2013 182.68 183.47 181.1 182.16 3979200 182.16
27/08/2013 183.63 184.5 182.57 182.74 3179300 182.74
26/08/2013 185.27 187 184.68 184.74 2170400 184.74
23/08/2013 185.34 185.74 184.57 185.42 2292700 185.42
22/08/2013 185.65 186.25 184.25 185.19 2354300 185.19
21/08/2013 184.67 186.57 184.28 184.86 3551000 184.86
```

I can read the data into SPSS, but the date format is not a format that the SPSS date-time wizard supports. R to the rescue! Using R syntax from SPSS, I can open the file from R, convert the date to an appropriate format, and create an SPSS database with the results. Here are the steps:

- The default working directory for the R integration plug-in is somewhere deep in the SPSS program directory tree. That’s not what you want. Set the working directory to the location of your data file so that R can find it.
- Read in the dates, in character format, and convert them to Portable Operating System Interface for UNIX® (POSIX) format, with the correct starting date of 14 October 1582.
- Build the data dictionary. The `spssdictionary.CreateSPSSDictionary()` function automates some features of building up the data dictionary. Format `DATE11` invokes date format 28-Aug-2013.
- Create the database and populate it.
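The date handling in the second step can be sketched in plain R, outside SPSS. This is an illustration under assumptions, not the article's own code: the CSV text is a hypothetical stand-in for IBM.csv built from the first rows of Listing 4 (with commas assumed as the separator), and `SpssSeconds` is a column name that I introduce. Whether the plug-in wants a POSIX object or raw seconds when it writes a date column should be checked against the plug-in documentation.

```r
# Hypothetical stand-in for the first rows of IBM.csv (values from Listing 4)
csv_text <- "Date,Open,High,Low,Close,Volume,Adj Close
28/08/2013,182.68,183.47,181.1,182.16,3979200,182.16
27/08/2013,183.63,184.5,182.57,182.74,3179300,182.74
26/08/2013,185.27,187,184.68,184.74,2170400,184.74"
ibm <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the dd/mm/yyyy strings into POSIX date-times
ibm$Date <- as.POSIXct(ibm$Date, format = "%d/%m/%Y", tz = "UTC")

# Seconds between the SPSS origin (14 October 1582) and the Unix epoch
spss_offset <- as.numeric(as.Date("1970-01-01") - as.Date("1582-10-14")) * 86400

# The same dates expressed on the SPSS scale
ibm$SpssSeconds <- as.numeric(ibm$Date) + spss_offset
print(ibm[, c("Date", "SpssSeconds")])
```

Setting the time zone explicitly keeps each value at midnight, so no date slips by a day when it is written back.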

Listing 5 shows how to carry out these steps.

##### Listing 5. Reading data directly into R and creating an SPSS database from them

## Working with categories

My simple example did not handle the categorical variable `Rating` at all well. R got the numeric codes for that variable but not the descriptive labels for the different levels the variable might take: `Poor`, `Average`, and `Excellent`.

You can do something about that issue. The `factorMode` argument that is shown in Listing 6 imports category labels instead of numeric codes.

##### Listing 6. The factorMode argument

```
BEGIN PROGRAM R.
testData = spssdata.GetDataFromSPSS(rDate="POSIXct", factorMode="labels")
testDict = spssdictionary.GetDictionaryFromSPSS()
print(testData)
END PROGRAM.
  CustName Age    Rating       Date Weight
1     Mary  21      Poor 2013-07-31   55.2
2     John  45 Excellent 2013-08-01   73.4
3    Henry  33   Average 2012-08-02   80.0
```
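What `factorMode="labels"` changes can be mimicked in base R. The sketch below is an analogy that runs outside SPSS, not a description of what the plug-in does internally. It takes the numeric codes for `Rating` from Listing 2 and attaches the labels `Poor`, `Average`, and `Excellent`, assuming the coding 1, 2, 3 that is implied by comparing Listing 2 with Listing 6. Making the factor ordered is my choice, because `Rating` is ordinal in the dictionary.

```r
# Numeric codes for Mary, John, and Henry, as read without factorMode
rating_codes <- c(1, 3, 2)

# Value labels in code order (assumed coding: 1 = Poor, 2 = Average, 3 = Excellent)
rating_labels <- c("Poor", "Average", "Excellent")

# Attach the labels to the codes
rating <- factor(rating_codes, levels = 1:3, labels = rating_labels,
                 ordered = TRUE)
print(rating)

# The numeric codes are still recoverable from the factor
print(as.integer(rating))
```

A labelled factor keeps both pieces of information, whereas the bare numeric column keeps only the codes.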

### Building a dictionary for categorical variables

The `factorMode` argument gives me a choice, depending on whether I want numeric codes or labels for a categorical variable. But I need more if I want to create an SPSS database with categorical data. The solution lies in adding further structure to the data dictionary. The example in Listing 7 illustrates how to build an SPSS database from an R data frame with factors.

The famous iris data set is bundled with base R. It is a data frame with four numeric variables and one factor, denoting one of three species of iris. To build a database in SPSS, I complete the following steps:

- Create a data dictionary for the iris data. This dictionary is a data frame of five columns (one for each variable of the iris set).
- Create a category dictionary for the factor. The R structure here is complex. It is a list of length 2. The first component contains the names of the factors. The second component is a list of lists. Each item is a list of length 2: one component for the numeric codes and one component for their labels.
- Begin creation of an SPSS database by “setting” the data and category dictionaries.
- Populate the database.
- End the data step.
- Run the code. Doing so creates a database in SPSS but does not save it to disk. The active database remains whatever it was.

##### Listing 7. Building a dictionary for categorical data

```
BEGIN PROGRAM R.
data(iris)
head(iris)
iris.dict = vector(mode="list", length=5)
#Name the columns
names(iris.dict) = paste("X", 1:5, sep="")
#Fill in the numeric variables
for(i in 1:4){
  iris.dict[[i]] = c(names(iris)[i],"","0","F3.2","scale")
}
#
#Fill information for the category
iris.dict[[5]] = c("Species","Species of Iris","0","F3","nominal")
#Square it off and add row names
iris.dict = data.frame(iris.dict)
row.names(iris.dict) = c("varName","varLabel","varType","varFormat",
                         "varMeasurementLevel")
#
#Now build the category dictionary
iris.cat = vector(mode="list",length=2)
names(iris.cat) = c("name","dictionary")
iris.cat$name = "Species"
#Note that the dictionary is a list of lists
#With only one category, the first list has length 1
#The dictionary list contains two lists
#
iris.cat$dictionary = vector(mode="list", length=1)
iris.cat$dictionary[[1]] = list(levels=c(1,2,3),
                                labels=levels(iris$Species))
#
#Now build the SPSS database.
spssdictionary.SetDictionaryToSPSS("Iris", iris.dict, iris.cat)
spssdata.SetDataToSPSS("Iris", iris, iris.cat)
spssdictionary.EndDataStep()
END PROGRAM.
```
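The category dictionary that Listing 7 builds by hand follows a regular pattern, so it can be generated. The helper below, `make_category_dict()`, is hypothetical: it is not part of the R integration package, and it assumes that factor codes simply run 1, 2, 3, and so on in level order, as they do in Listing 7. It runs in plain R. Whether the plug-in accepts more than one factor in exactly this layout is my reading of the description above and should be checked against the plug-in documentation.

```r
# Hypothetical helper, not part of the R integration package:
# build a category dictionary for every factor column of a data frame
make_category_dict <- function(df) {
  factor_names <- names(df)[vapply(df, is.factor, logical(1))]
  list(
    # First component: the names of the factor variables
    name = factor_names,
    # Second component: one list of codes and labels per factor
    dictionary = lapply(factor_names, function(v) {
      list(levels = as.numeric(seq_along(levels(df[[v]]))),
           labels = levels(df[[v]]))
    })
  )
}

# Applied to iris, this reproduces the structure built by hand in Listing 7
iris.cat <- make_category_dict(iris)
str(iris.cat)
```

Inspecting the result with `str()` before passing it to the plug-in is a cheap way to confirm that the nesting is what you intended.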

## Conclusion

The R integration package contains many functions to provide a seamless transfer from SPSS to R. For instance, SPSS allows greater flexibility in defining missing values than R. The R integration package contains functions for managing missing values so that nothing is lost in passing from SPSS to R and back again. Another important feature is the ability to create SPSS extensions that use R. Menu items can be added to the **Analysis** menu that enable R functions to be run on the active data set without needing to write explicit code in a syntax file. In this way, you can make R functionality available to users who have no knowledge of R. The R integration package has a lot to offer data analysts who use both SPSS and R.