One of my frustrations with programmability is the learning barrier.  On the one hand, the resources and capabilities available through Python programmability combined with the plug-in for PASW Statistics are tremendous.  Often a problem can be solved in a few lines of Python code that would take a page of code in PASW Statistics or be practically impossible.  On the other hand, the Python language is very different from the Statistics syntax language, so many of those who would benefit from the capabilities Python brings to Statistics are frustrated.  Python is an easy language to learn, but it's a programming language, and many of our users are not programmers.

Starting with version 16, we began to broaden the circle of people who could use programmability with the introduction of extension commands.  These give traditional-style syntax to Python or R programs.  That makes functionality packaged this way accessible to users who don't know Python.

Starting with version 17, we broadened the circle further with the Custom Dialog Builder, which makes it easy to create a point-and-click interface for programs, extension commands, or even traditional syntax.

With version 18, we have greatly simplified installation of these features, which has always been the hardest part, through the Python and R Essentials installers and the ability to build bundles that gather all the dependencies and make it really easy to install a new module or command.

All of these tools are available to users, not just SPSS staff.  Anyone with the skills can produce something useful and package it for themselves or consumers who might benefit but don't have the skills or time to create it themselves.

This is great for packages, but there is a lot more useful Python code that isn't packaged this way and remains inaccessible to users who would benefit from it.  One thing I have really wanted us to do is to make it possible to call Python functions in the SPSS transformation language just like the built-in transformation functions.

That's an architecturally hard problem that I hope we can solve.  But the latest module I have created and posted to Developer Central, SPSSINC_TRANS.py, goes a long way toward doing this.  This new extension command, which also has a dialog box interface, makes it easy for nonprogrammers to apply almost any Python function or class to the cases in the active dataset.

Ordinarily, to produce a transformation of variables in the active dataset using the Python plug-in, you have to write code to create the variable definition, access the cases, do the calculations, and save the result to the active dataset.

With this new command, all you have to do is call a function that does the calculation.  The rest is handled for you.  Yes, you still have to know about the function, its parameters, and how to write the one line of code that calls it, but that's all.  You might need to persuade a producer to write a Python function for you, but any consumer can then use it.

There are many very useful functions in the extendedTransforms.py module available from Developer Central.  I'll illustrate the usage of the SPSSINC TRANS extension command with the datetimetostr function in that module.  datetimetostr makes it easy to create a string with a date and/or time in almost any format.  Statistics already has many date/time formats, but users are always asking for others.  They could be built out of transformation syntax in most cases with enough effort, but datetimetostr makes this really easy.

The function takes two parameters: a Statistics date/time value and a pattern that describes the format you want.  It produces a string from the value according to the pattern.  The pattern notation is described in the extendedTransforms module as part of the function documentation, but here's an example.

My pattern is %A, %B %d, %Y.  That means:

  • day name (%A) followed by a comma and a space
  • month name (%B) followed by a space
  • day-of-month (%d) followed by a comma and a space
  • four-digit year (%Y).

This would turn the date value 02/09/1955 (a date value as formatted by Statistics using mm/dd/yyyy) into Wednesday, February 09, 1955.  (These names can be localized, too.)
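To see how the pattern reads, here is a minimal sketch in plain Python.  It assumes the pattern directives behave like those of Python's own strftime, which is what the notation resembles; it does not call datetimetostr itself, so check the function documentation for the exact set of directives it supports.

```python
from datetime import date

# Assumption: the directives mean the same thing as in Python's strftime.
pattern = "%A, %B %d, %Y"

# 02/09/1955 in mm/dd/yyyy notation.
birth_date = date(1955, 2, 9)

formatted = birth_date.strftime(pattern)
print(formatted)  # Wednesday, February 09, 1955 (default C locale)
```

Day and month names come from the current locale, so the output is in English only if the locale has not been changed.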

Here is the command to apply this function to a variable named bdate, creating a new string variable named datestring:


SPSSINC TRANS result = datestring type=30
/FORMULA
extendedTransforms.datetimetostr(value=bdate, pattern='''%A, %B %d, %Y''').

That tells Statistics to create a variable named datestring as a string of length 30 and, for each case, to call the datetimetostr function with the value of the Statistics variable bdate, storing the output in datestring.  (Yes, I know the triple quotes for the pattern parameter value are a little weird, but they're necessary in order to accommodate the way that literals work in both Statistics and Python.)
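For producers wondering what kind of function fits this model, here is a rough sketch.  The name datetime_to_str_sketch is hypothetical and this is not the code in extendedTransforms; it only assumes that a Statistics date/time value is a count of seconds since midnight, October 14, 1582, and that the function is called once per case.  The list at the end stands in for the cases that the command would pass for you.

```python
from datetime import datetime, timedelta

# Statistics date/time values count seconds from midnight, October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)

def datetime_to_str_sketch(value, pattern):
    """Hypothetical stand-in for datetimetostr, not the real implementation."""
    if value is None:
        # Assumption: a missing value becomes an empty string.
        return ""
    moment = SPSS_EPOCH + timedelta(seconds=value)
    return moment.strftime(pattern)

# Stand-in for the per-case loop: two cases of bdate, one of them missing.
bdate_values = [
    (datetime(1955, 2, 9) - SPSS_EPOCH).total_seconds(),
    None,
]

# Truncate to 30 characters to mimic a string variable of length 30.
datestring = [
    datetime_to_str_sketch(v, "%A, %B %d, %Y")[:30] for v in bdate_values
]
print(datestring)
```

The point of the design is that the function knows nothing about datasets or cursors.  It takes one case's value and returns one result, and everything else is left to the command.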

So, once the user has discovered this function, learned the parameter names, and read about the pattern language, that's the end of the learning task.

If you download and install this new command, you can read the details by running

SPSSINC TRANS /HELP.

Or you can use the dialog and the dialog help that go with it.  The dialog isn't nearly as elaborate as the Compute dialog, but it should be enough to get you going.

So, producers and consumers, what do you think?  Post your comments here.

P.S. At this writing, the new command is a beta version.  Use it with caution, and please report any bugs you find.
