Traditional SPSS syntax jobs refer to specific variable names and are written with assumptions about what the variables mean.  That is often fine, but you can sometimes accommodate variations in names or roles by using the properties and attributes of variables instead of relying on the names.  A new extension command, SPSSINC SELECT VARIABLES, can help with this.

To start with the most basic approach, suppose you would like to create a job that produces summary univariate statistics for any dataset without knowing in advance what the variables will be.  The CODEBOOK command produces very condensed statistics, but let's suppose you want more detail.  The measurement level of a variable is a good place to start in determining which statistics are appropriate, but there is no syntax that says, for example, run FREQUENCIES on all categorical variables and DESCRIPTIVES or EXAMINE on all scale (continuous, ratio) variables.  An analyst can do this easily when working interactively, since the dialog box variable lists can even be sorted by measurement level, but it is hard to build a production job with this flexibility.


Using SPSSINC SELECT VARIABLES, you can easily define macros that list the variables for selected measurement levels.  The macros can then be used in the appropriate commands.  Here's what the job might look like.

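SPSSINC SELECT VARIABLES MACRONAME="!catvars"
/PROPERTIES LEVEL=NOMINAL ORDINAL.
SPSSINC SELECT VARIABLES MACRONAME="!scalevars"
/PROPERTIES LEVEL=SCALE.
FREQUENCIES !catvars.
DESCRIPTIVES !scalevars.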
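
To make the mechanics concrete, here is a minimal Python sketch of the idea. It is not the extension's actual implementation: it takes a hypothetical dictionary of variable names and measurement levels, keeps the names at the requested levels, and builds the text of a DEFINE ... !ENDDEFINE macro. The function name macro_for_levels and the sample variables are assumptions made up for illustration.

```python
# Illustrative model only: the real extension reads measurement levels
# from the active dataset rather than from a hand-built dictionary.
def macro_for_levels(variables, levels, macro_name):
    """Return SPSS macro text listing variables whose level is in `levels`.

    `variables` maps variable name -> measurement level (hypothetical data).
    """
    wanted = {level.upper() for level in levels}
    selected = [name for name, level in variables.items()
                if level.upper() in wanted]
    return "DEFINE {0}() {1} !ENDDEFINE.".format(macro_name, " ".join(selected))


# Hypothetical dataset dictionary, listed in file order.
sample = {"gender": "NOMINAL", "educ": "ORDINAL",
          "salary": "SCALE", "age": "SCALE"}

catvars = macro_for_levels(sample, ["NOMINAL", "ORDINAL"], "!catvars")
scalevars = macro_for_levels(sample, ["SCALE"], "!scalevars")
print(catvars)
print(scalevars)
```

Once a macro like this is defined, any later command in the job can use the macro name wherever a variable list is expected.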

Simple!  If you wanted to select only variables whose names end with "Education", you could add PATTERN=".*education", a regular expression, to the selection commands above.
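
If regular expressions are unfamiliar, this small Python sketch shows how a pattern such as ".*education" picks out names. It assumes the pattern has to match the whole variable name and that case is ignored; whether the extension command behaves in exactly this way is an assumption here, and the variable names are invented.

```python
import re


def names_matching(names, pattern):
    """Return the names that the regular expression matches in full."""
    # Assumption: whole-name, case-insensitive matching.
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if regex.fullmatch(name)]


# Hypothetical variable names.
names = ["MotherEducation", "FatherEducation", "EducationYears", "income"]
matched = names_matching(names, ".*education")
print(matched)
```

Under these assumptions, EducationYears is not selected because "education" is not at the end of the name.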

The SPSSINC SELECT VARIABLES command has lots of other ways of selecting variables, and it has a dialog box interface as well. I'll only write about one additional dimension of this command, which touches on an exciting way to raise the level of generality.

Since Version 14, SPSS has allowed users to create custom attributes for variables and files.  They can be anything you want, and they are saved with the data just as variable labels, missing value codes and other metainformation are.  Useful examples might include units (currency, distance, weight), sources, roles (predictor, dependent, id), question text, privacy (confidential, semi-public, public).

Let's assume that the datasets to be fed to your job have a set of attributes that have already been assigned by the data preparation team.  You can use this information to affect the macro definitions generated by SPSSINC SELECT VARIABLES.  As one example, perhaps you want to exclude variables from your statistical summary if they don't have a useful role for analysis.  You could change the syntax above to reflect this.  Redoing just the categorical variable example, you would write:

SPSSINC SELECT VARIABLES MACRONAME="!interestingCatVars"
/PROPERTIES LEVEL=NOMINAL ORDINAL
/ATTRVALUES NAME=role VALUE="dependent" "predictor".
FREQUENCIES !interestingCatVars.
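
The combined test, measurement level plus attribute value, can be sketched in Python as follows. This is a simplified model rather than the extension's code: the select_by_attribute function, the sample records, and the exact-match comparison of attribute values are all assumptions for illustration.

```python
def select_by_attribute(variables, levels, attr_name, attr_values):
    """Return names of variables at one of `levels` whose custom attribute
    `attr_name` has one of `attr_values`.

    Variables lacking the attribute are skipped. Exact, case-sensitive
    comparison of attribute values is an assumption of this sketch.
    """
    return [v["name"] for v in variables
            if v["level"] in levels
            and v["attributes"].get(attr_name) in attr_values]


# Hypothetical metadata as a data preparation team might have assigned it.
sample = [
    {"name": "id", "level": "NOMINAL", "attributes": {"role": "id"}},
    {"name": "region", "level": "NOMINAL", "attributes": {"role": "predictor"}},
    {"name": "churned", "level": "NOMINAL", "attributes": {"role": "dependent"}},
    {"name": "income", "level": "SCALE", "attributes": {"role": "predictor"}},
    {"name": "notes", "level": "NOMINAL", "attributes": {}},
]

interesting = select_by_attribute(
    sample, {"NOMINAL", "ORDINAL"}, "role", {"dependent", "predictor"})
print(interesting)
```

Here id is dropped because its role is not of interest, income because it is a scale variable, and notes because it has no role attribute at all.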

SPSSINC SELECT VARIABLES is an extension command available from SPSS Developer Central in the Downloads section. It requires at least Version 17 and the Python programmability plugin. Check it out: no need to learn the macro language or Python programmability to start taking advantage of this. If you are building automation jobs where you don't even know the names of the datasets that might arrive (but you do have some idea of the structure!), check out the SPSSINC PROCESS FILES extension command, also available from Developer Central.
