In SPSS Modeler, it can be useful to interact with a stream during its execution. For example, you may want to select a particular subset of data to analyse by scanning the raw data and presenting a set of possible categories before doing further processing. In this article, we will use SPSS Modeler scripting to:

  1. generate a stream to scan a named data set for categories in a particular field
  2. present the available categories to the user who can then select which category they are interested in
  3. generate a histogram of a numeric field for the category chosen by the user

Along the way we will cover:

  • how to generate and configure nodes in a stream
  • some basic Java Swing UI (user interface) controls
  • some of the issues that can occur when interacting with a user during execution

Stream Structure
Our script will generate a stream that consists of a data source node (in this case a CSV import node) and two sub-branches off it:

  1. One that uses an aggregate node to identify the categories in a named field
  2. Another that uses a select node to select records where the category field has the named category before passing those records to a histogram node

Stream with two branches

Script Structure
To keep things flexible, we will define a primary function that takes the stream to build in, the location of the CSV file, the name of the categorical field we want to select from and the name of the continuous field we want to plot as a histogram:


def genHistogram(stream, csvFile, categoryColumn, continuousColumn):
    pass  # whatever we need to do...

This means we could call it as a stream script with:


genHistogram(modeler.script.stream(), '$CLEO_DEMOS/DRUG1n', 'Drug', 'Age')

where ‘Drug’ and ‘Age’ are two of the fields in the DRUG1n demonstration data set.

The general structure of the primary function is:


def genHistogram(stream, csvFile, categoryColumn, continuousColumn):
    # clear any existing nodes from the stream
    # build the stream using the information passed in the parameters
    # execute the first sub-branch to get the list of possible categories
    # present the user with a list of categories so they can pick one
    # configure the select expression based on the chosen category
    # execute the second sub-branch to create the histogram
    pass

Clearing Existing Nodes From The Stream
Streams include a clear() function, which we will use to remove any existing nodes from the stream. This isn’t strictly necessary, but it keeps things tidy if we re-execute the script multiple times.


def genHistogram(stream, csvFile, categoryColumn, continuousColumn):
    stream.clear()	# reset the stream
    # build the stream using the information passed in the parameters
    # execute the first sub-branch to get the list of possible categories
    # present the user with a list of categories so they can pick one
    # configure the select expression based on the chosen category
    # execute the second sub-branch to create the histogram

Building The Stream
The next step is to build our stream based on the information we already know (the path to the CSV file, the categorical column we’re picking the category from and the continuous column we want to plot the histogram for).

The CSV file node is created first. Since we are limiting this to a simple CSV using the defaults (comma as the field separator, first line contains the field names etc.) we only need to set the file path.

In our first sub-branch, we create an aggregate node configured to use the categorical field as its only key field. We then pass the results to a table node which will create a table output. The table output will contain a row set where each row contains two columns – the category in the key field and the number of records that category appears in (we don’t need the count but we will leave it enabled for simplicity).

In our second sub-branch, we create a select node to select the records with the chosen category. Since we don’t yet know the category, we can’t configure the expression so that will remain blank. Finally we add the histogram node with the continuous field we’re going to plot.

The function which does this is shown below:


def buildStream(stream, csvFile, categoryColumn, continuousColumn):
    # Create and configure the CSV import node
    csvNode = stream.createAt("varfile", csvFile, 92, 92)
    csvNode.setPropertyValue("full_filename", csvFile)

    # Create the branch that will determine the available categories
    # Create and configure the aggregate node
    aggNode = stream.createAt("aggregate", "Agg", 184, 92)
    aggNode.setPropertyValue("keys", categoryColumn)
    # Create the table node
    tableNode = stream.createAt("table", "Table", 276, 92)
    # Connect them together
    stream.linkPath([csvNode, aggNode, tableNode])
    
    # Now create the second branch that will select the specified category
    # and produce the required graph.
    # Just create the select node - we can't configure the select expression yet
    selectNode = stream.createAt("select", "Select", 184, 184)
    # Create and configure the histogram node
    histogramNode = stream.createAt("histogram", "Histogram", 276, 184)
    histogramNode.setPropertyValue("field", continuousColumn)
    # Connect them together
    stream.linkPath([csvNode, selectNode, histogramNode])
    
    # Return the nodes that the rest of the script will need access to
    return [tableNode, selectNode, histogramNode]

It returns a list of the nodes the rest of the script will need access to:

  • the table node in the first branch which will create the list of categories
  • the select node in the second branch which will be configured once the user has selected a category
  • the histogram node which will create the histogram
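Both branches hang off the same source node because each linkPath call connects every node in its list to the one after it, and both calls start with csvNode. If you want to check that wiring without a running Modeler session, one option is a small stand-in that just records the calls. The sketch below is purely hypothetical: FakeStream and FakeNode are illustration-only names that mimic createAt, setPropertyValue, linkPath and clear, and they are not the real Modeler API.

```python
class FakeNode(object):
    """Hypothetical stand-in for a Modeler node (not the real API)."""

    def __init__(self, nodeType, name):
        self.nodeType = nodeType
        self.name = name
        self.properties = {}

    def setPropertyValue(self, key, value):
        self.properties[key] = value


class FakeStream(object):
    """Hypothetical stand-in that records createAt and linkPath calls."""

    def __init__(self):
        self.nodes = []
        self.links = []  # (source, target) pairs

    def createAt(self, nodeType, name, x, y):
        # x and y are the canvas position; the stand-in ignores them
        node = FakeNode(nodeType, name)
        self.nodes.append(node)
        return node

    def linkPath(self, path):
        # Link each node in the list to the one after it
        for source, target in zip(path, path[1:]):
            self.links.append((source, target))

    def clear(self):
        self.nodes = []
        self.links = []


# Same layout as buildStream: one source node, two branches
stream = FakeStream()
csvNode = stream.createAt("varfile", "DRUG1n", 92, 92)
aggNode = stream.createAt("aggregate", "Agg", 184, 92)
tableNode = stream.createAt("table", "Table", 276, 92)
stream.linkPath([csvNode, aggNode, tableNode])
selectNode = stream.createAt("select", "Select", 184, 184)
histogramNode = stream.createAt("histogram", "Histogram", 276, 184)
stream.linkPath([csvNode, selectNode, histogramNode])

# Nodes fed directly by the source node: the start of each branch
branchStarts = [target for source, target in stream.links if source is csvNode]
```

Here branchStarts holds the aggregate node and the select node, i.e. the first node of each sub-branch.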

We can update our primary function to build the stream:


def genHistogram(stream, csvFile, categoryColumn, continuousColumn):
    stream.clear()	# reset the stream
    nodes = buildStream(stream, csvFile, categoryColumn, continuousColumn)
    # execute the first sub-branch to get the list of possible categories
    # present the user with a list of categories so they can pick one
    # configure the select expression based on the chosen category
    # execute the second sub-branch to create the histogram

Executing The First Sub-branch To Get The Categories
We now need to execute the table node and get a list of categories. We will create a single function that:

  1. executes the table node
  2. scans the row set in the table output and extracts the categories
  3. closes the table output (which we no longer need)
  4. sorts and returns the categories as a list

The function which does that is:


def getCategories(tableNode):
    # Supply a list to capture the objects generated by executing the stream
    result = []
    tableNode.run(result)
    # Extract the row set from the table output and get the values from the first column
    rowset = result[0].getRowSet()
    rowcount = rowset.getRowCount()
    row = 0
    values = []
    while row < rowcount:
        values.append(str(rowset.getValueAt(row, 0)))
        row+=1
    # Close/delete the table output
    result[0].close()
    # Sort the values before returning them
    values.sort()
    return values
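The while loop in getCategories can also be written as a single list comprehension over range(rowCount). The sketch below shows that variant against a hypothetical FakeRowSet, which mimics only the two row set calls the function uses (getRowCount and getValueAt); the rows and counts are made up for illustration.

```python
class FakeRowSet(object):
    """Hypothetical stand-in for a table output's row set."""

    def __init__(self, rows):
        self.rows = rows

    def getRowCount(self):
        return len(self.rows)

    def getValueAt(self, row, column):
        return self.rows[row][column]


def extractCategories(rowset):
    # Take the first column (the aggregate key) of every row as a string,
    # then return the values sorted
    values = [str(rowset.getValueAt(row, 0)) for row in range(rowset.getRowCount())]
    return sorted(values)


# (category, record count) pairs with made-up counts
rowset = FakeRowSet([("drugY", 30), ("drugC", 10), ("drugX", 20)])
categories = extractCategories(rowset)
```

Only the first column is read, so the count column the aggregate node adds is simply ignored.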

Our updated primary function now looks like:


def genHistogram(stream, csvFile, categoryColumn, continuousColumn):
    stream.clear()	# reset the stream
    nodes = buildStream(stream, csvFile, categoryColumn, continuousColumn)
    categories = getCategories(nodes[0])
    # present the user with a list of categories so they can pick one
    # configure the select expression based on the chosen category
    # execute the second sub-branch to create the histogram

Presenting The User With The Categories
In order to present the user with the categories, we are going to generate a simple UI (user interface). Since Modeler scripting uses Jython, a Java implementation of the Python language, we will use Java’s Swing UI library. Swing is a complex and powerful library (which SPSS Modeler also uses for its UI) and there are dozens of books and online tutorials devoted to it, including Oracle’s official Swing tutorial. However, we will try to keep things simple.

The UI will consist of a window containing a list of possible categories which the user can select and an OK button to confirm that selection:

Category selector window

To use Swing UI components we first need to import the ones we need into our script:


from java.awt import BorderLayout, Dimension
from javax.swing import JFrame, JButton, JPanel, JList, JScrollPane

A basic function to create the UI layout would be:


def makeUI(categories):
    frame = JFrame("Select a category", size = (300, 200))
    uilist = JList(categories)
    scroll = JScrollPane(uilist)
    scroll.setPreferredSize(Dimension(300, 200))
    button = JButton('OK')
    panel = JPanel()
    # A BorderLayout provides a simple way of laying out components in a panel
    panel.setLayout(BorderLayout())
    panel.add(scroll, BorderLayout.CENTER)
    panel.add(button, BorderLayout.SOUTH)
    frame.getContentPane().add(panel)
    frame.pack()
    return frame

That looks fine but if we were to make the frame visible, clicking the OK button wouldn’t do anything because we haven’t added any behaviour to the button. To make something happen we have to attach a callback or “listener” function that gets called when certain things happen (like the button being clicked).

Before we do that though, it’s worth stepping back and thinking about exactly what our callback needs to do:

  1. check whether the user has actually selected anything before proceeding
  2. get the selected category from the list
  3. configure the select node for that category
  4. close the selector window
  5. execute the histogram node

In order to do those operations, the callback will need access to:

  • the list component in the UI (to get the selected category)
  • the select node that will be configured
  • the name of the categorical field so an expression like “field == category” can be created and added to the select node
  • the histogram node

The problem is that the callback function itself is only passed a single object: the event that caused it to be called. So how do we provide the callback with the other values it needs?

One approach would be to store what’s needed in global variables but that can quickly get unwieldy. The approach we’re going to take is to create a “closure” which is basically a function with certain values baked into it.

A Note On Closures
A closure can be created by defining a function within a function:


def buildMultiplier(x):
    def mult(y):
        return x * y
    return mult

The buildMultiplier function is effectively a function factory – it is passed a number and returns a closure on the nested mult function that multiplies by that number. For example:


mult10 = buildMultiplier(10)
print mult10(4)    # prints 40
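Calling the factory more than once produces independent closures, each with its own baked-in value. That is the property the callback factory below relies on, since each handler must remember its own nodes and field name. A minimal sketch:

```python
def buildMultiplier(x):
    # x is captured by the nested function when buildMultiplier returns
    def mult(y):
        return x * y
    return mult


mult10 = buildMultiplier(10)
mult3 = buildMultiplier(3)

# Each closure keeps its own captured value of x
results = (mult10(4), mult3(4))  # (40, 12)
```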

What we will do is create a “callback” factory that captures the information we need.

We could do something like:


def buildOKHandler(selectNode, categoryColumn, histogramNode, uilist, frame):
    def configAndRun(event):
        pass  # whatever the handler has to do
    return configAndRun

However, this seems a little clumsy: the callback has to know about both the stream components and the UI components. It would make the functionality more reusable to separate the UI-related code from the stream-related code. One way to do this is to capture the UI-related functionality in a single object, allowing the rest of the script to worry about the stream, nodes, fields etc. To do that, we will create a new UI component class and then create instances of that class.

Creating A Class For Our UI
There are a number of ways we could define a class for our UI. For example, we could define a class that simply creates a JFrame and its associated UI elements. However, we are going to take the approach of extending the JFrame class we used in our makeUI function.

The syntax for extending a class can look a little clunky but getting to grips with it opens up a lot of opportunities for doing the same thing in other situations:


from java.awt import BorderLayout, Dimension
from javax.swing import JFrame, JButton, JPanel, JList, JScrollPane

class SelectorWindow(JFrame):
    def __init__(self):
        JFrame.__init__(self, "Select a category", size = (300, 200))
        self.uilist = JList()
        scroll = JScrollPane(self.uilist)
        scroll.setPreferredSize(Dimension(300, 200))
        button = JButton('OK')
        panel = JPanel()
        # A BorderLayout provides a simple way of laying out components in a panel
        panel.setLayout(BorderLayout())
        panel.add(scroll, BorderLayout.CENTER)
        panel.add(button, BorderLayout.SOUTH)
        self.getContentPane().add(panel)
        self.pack()

You can see that it is similar to the makeUI function we defined earlier but with a few tweaks here and there. The main ones are:

  • class SelectorWindow(JFrame): – this says we are defining a class called SelectorWindow which extends JFrame i.e. we have all the functionality of JFrame along with anything else we choose to add
  • def __init__(self): – this is the function that is called when we create a new instance of SelectorWindow
  • JFrame.__init__(self, "Select a category", size = (300, 200)) – this calls the standard JFrame constructor with a title. It also sets the initial size of the JFrame to width 300 and height 200

As well as creating and laying out the UI components, we also want our class to:

  • handle setting the list in the UI with the possible categories and making the UI visible
  • handle the details of the “OK” button callback – checking that a value has been selected, getting the selected value from the list and closing the window, leaving our main callback code to focus on configuring and executing the second branch of the stream using the category chosen by the user

To do this, we will define a function in our class that takes the list of categories to be displayed and a function to be called once the OK button has been clicked and we know which value has been selected.


class SelectorWindow(JFrame):

    # other class-related code...

    def choose(self, values, cb):
        # set the list data using the values
        # store the callback (cb) so we can call it when the user clicks OK
        # make the window visible
        pass

To call the supplied callback cb, we will define a separate callback function within the SelectorWindow class that will take care of checking whether an item has been selected, extracting that item from the list and closing the window before calling the supplied callback with the selected value. This means it should be possible to change the selection mechanism used by the UI (e.g. to use a combo-box rather than a list) without changing anything about the callback that gets passed to the choose function. Our modified class now looks like:


class SelectorWindow(JFrame):
    __callback = None
    
    # Class-specific callback
    def okCallback(self, event):
        # Check a callback has been set and the selection is not empty
        if self.__callback is not None and not self.uilist.isSelectionEmpty():
            self.setVisible(False)
            self.__callback(self.uilist.getSelectedValue())
    
    def __init__(self):
        JFrame.__init__(self, "Select a category", size = (300, 200))
        self.uilist = JList()
        scroll = JScrollPane(self.uilist)
        scroll.setPreferredSize(Dimension(300, 200))
        # Ensure the class-specific callback is always called when the OK button is clicked
        button = JButton('OK', actionPerformed = self.okCallback)
        panel = JPanel()
        # A BorderLayout provides a simple way of laying out components in a panel
        panel.setLayout(BorderLayout())
        panel.add(scroll, BorderLayout.CENTER)
        panel.add(button, BorderLayout.SOUTH)
        self.getContentPane().add(panel)
        self.pack()
    
    def choose(self, values, cb):
        self.uilist.setListData(values)
        self.__callback = cb
        # Centre the window on the screen
        self.setLocationRelativeTo(None)
        self.setVisible(True)

So the sequence is now:

  1. create an instance of SelectorWindow
  2. create our callback
  3. call the choose function on our SelectorWindow instance with the categories and our callback
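To see that sequence in isolation, the sketch below swaps the Swing window for a plain-Python stand-in. FakeSelector, select and clickOK are hypothetical, illustration-only names; clickOK applies the same checks as SelectorWindow.okCallback. Because the callback only ever receives the selected value, any one-argument callable works, even a list’s append method.

```python
class FakeSelector(object):
    """Hypothetical, Swing-free stand-in for SelectorWindow."""

    def __init__(self):
        self._callback = None
        self._values = []
        self._selected = None  # stands in for the JList selection
        self.visible = False

    def choose(self, values, cb):
        self._values = list(values)
        self._callback = cb
        self.visible = True

    def select(self, index):
        # Simulate the user clicking an item in the list
        self._selected = self._values[index]

    def clickOK(self):
        # Ignore the click unless a callback is set and something is selected
        if self._callback is not None and self._selected is not None:
            self.visible = False
            self._callback(self._selected)


chosen = []
selector = FakeSelector()
selector.choose(["drugA", "drugB"], chosen.append)
selector.clickOK()  # nothing selected yet, so nothing happens
selector.select(1)
selector.clickOK()  # hides the window and calls chosen.append("drugB")
```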

This leaves us with the issue of creating our closure that references the select node, histogram node and category field.

Configuring The Second Sub-branch Based On The Chosen Category
Earlier we defined buildOKHandler to create our callback as a closure. Since our SelectorWindow class is responsible for taking care of extracting the value from the list and closing the window, we can simplify it as:


def buildOKHandler(selectNode, categoryColumn, histogramNode):
    def configAndRun(selectedValue):
        pass  # whatever the handler has to do to configure and run the second branch
    return configAndRun

In other words we no longer need to capture the list and window from the UI (since the SelectorWindow handles that side), and our SelectorWindow will pass the selected value directly to our callback function.

So what does our callback function have to do?

  1. configure the select node
  2. execute the histogram

Configuring the select node is relatively simple. The expression will look like:

categoryColumn == selectedValue

There is an implicit assumption in this example that the category column contains strings. Although that might not always be true, it simplifies the example code if we make that assumption – if you need to support non-string categoricals then it is possible to do so but that is left as an exercise for the reader.

To make sure we can handle field names that contain non-alphanumeric characters, we will enclose the field name in single quotes ('), while the value, being a string, goes in double quotes ("):

'categoryColumn' == "selectedValue"

Making the necessary changes to our configAndRun function gives us:


def buildOKHandler(selectNode, categoryColumn, histogramNode):
    def configAndRun(selectedValue):
        expr = '\'' + categoryColumn + '\' == "' + selectedValue + '"'
        selectNode.setPropertyValue("condition", expr)
        # Then run the histogram
    return configAndRun
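The condition string can also be built with % formatting, which makes the quoting easier to read than a chain of concatenations. The sketch below checks the handler’s configuration step against a hypothetical FakeSelectNode that just records property values; buildCondition and buildSelectHandler are illustration-only names, and the histogram step is deliberately left out.

```python
class FakeSelectNode(object):
    """Hypothetical stand-in that records property values."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, key, value):
        self.properties[key] = value


def buildCondition(categoryColumn, selectedValue):
    # Field name in single quotes, string value in double quotes,
    # e.g. 'Drug' == "drugY"
    return "'%s' == \"%s\"" % (categoryColumn, selectedValue)


def buildSelectHandler(selectNode, categoryColumn):
    # Reduced closure: it only configures the select node
    def configure(selectedValue):
        selectNode.setPropertyValue("condition",
                                    buildCondition(categoryColumn, selectedValue))
    return configure


selectNode = FakeSelectNode()
handler = buildSelectHandler(selectNode, "Drug")
handler("drugY")
```

After the call, the stand-in node’s condition is 'Drug' == "drugY", the same string the concatenation above produces.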

Executing The Second Sub-branch To Create The Histogram
Obviously we’ve already shown how to execute the table node to get the categories so we can simply follow that pattern by calling the node’s run method, right?


def buildOKHandler(selectNode, categoryColumn, histogramNode):
    def configAndRun(selectedValue):
        expr = '\'' + categoryColumn + '\' == "' + selectedValue + '"'
        selectNode.setPropertyValue("condition", expr)
        # Then run the histogram
        results = []
        histogramNode.run(results)   # !! DO NOT RUN THIS: it locks up the Modeler UI
    return configAndRun

Unfortunately, if you run this, you will lock up the Modeler UI meaning the only thing you can do is kill the process and lose whatever work you were doing. So what’s the problem?

The issue lies in how Swing, the UI toolkit we are using to display the options to the user, works. Recall that the callback we are building here runs in response to a UI event (the user clicking the “OK” button). Swing handles UI events one at a time on a single thread (the event dispatch thread), so while that event is being handled, Swing cannot do anything else until the event-handling code has completed and returned control. Why does that cause a problem? Executing a stream causes the UI to be updated in various ways, but since Swing is still busy handling the “OK” button click, those updates can’t happen. The handler can’t complete until the UI is updated, but the UI can’t be updated until the handler completes, so the Modeler UI locks up. Obviously this is a problem, but the solution is relatively simple.

Best practice in many UI environments (not just Swing) is to minimise the time spent in callbacks, since this makes the UI responsive again as quickly as possible. Long-running tasks are typically run in a separate “thread”. A thread is a little like a process in that it allows multiple tasks to run in parallel (or at least gives the impression that this is what is happening). Threading, like Swing, is a complex topic and beyond the scope of this article to discuss in detail. However, Jython and Java have built-in support for threads, so we will use some of those features to run the stream without locking the UI.
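The lock-up can be reproduced in miniature without Swing or Modeler. In the sketch below, a toy EventLoop (a hypothetical stand-in for Swing’s event dispatch thread) processes posted events strictly one at a time, and longTask stands in for stream execution, which needs the loop to handle an update before it can finish. A one-second timeout replaces the real, indefinite hang. All names here are illustration-only.

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2 / Jython


class EventLoop(object):
    """Toy single-threaded event loop: events run one at a time."""

    def __init__(self):
        self.events = queue.Queue()
        self.thread = threading.Thread(target=self._loop)
        self.thread.daemon = True
        self.thread.start()

    def post(self, fn):
        self.events.put(fn)

    def _loop(self):
        while True:
            fn = self.events.get()
            if fn is None:
                break
            fn()

    def stop(self):
        self.post(None)
        self.thread.join()


def longTask(loop):
    # Needs the loop to process an "update" before it can finish;
    # returns False if that did not happen within one second
    updated = threading.Event()
    loop.post(updated.set)
    return updated.wait(1.0)


def blockingHandler(loop, outcome, done):
    # Runs the long task inside the event handler itself
    outcome.append(longTask(loop))
    done.set()


def threadedHandler(loop, outcome, done):
    # Hands the long task to a worker thread and returns straight away
    def work():
        outcome.append(longTask(loop))
        done.set()
    threading.Thread(target=work).start()


def runDemo(handler):
    # Post a simulated "OK button click" and report whether the task finished
    loop = EventLoop()
    outcome = []
    done = threading.Event()
    loop.post(lambda: handler(loop, outcome, done))
    done.wait(5.0)
    loop.stop()
    return outcome[0]
```

With this sketch, runDemo(blockingHandler) returns False because the update can only run after the handler returns, while runDemo(threadedHandler) returns True because the handler gives control back to the loop immediately.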

Executing A Node In A Separate Thread
We are going to use the Java threading facilities to run a node in a separate thread. We could use Python/Jython features but this provides an opportunity for a second example of extending Java classes in Jython.

Java’s threading support has two classes we are interested in:

  1. Thread which defines functions for controlling and monitoring a thread
  2. Runnable which defines what code the thread is actually going to run[1].
[1] Technically, Runnable is what’s called an interface in Java. An interface is like an empty class but where the functions expected to be supported have already been declared i.e. the interface defines what has to be supported while a class that implements the interface defines how that support is implemented.

We are going to define a class called NodeRunner that implements the Java Runnable interface and its single required method, run() (in Jython, implementing an interface uses the same syntax as extending a class). Each instance takes the node to be executed and the list where any results will be stored.

We will also write a convenience function called runLater that takes a node and a result list, creates a NodeRunner instance and then passes that instance to a new Thread object. The thread is then started, allowing runLater to return immediately, i.e. before the node execution has completed.


# We need to import Runnable and Thread from Java
from java.lang import Runnable, Thread

class NodeRunner(Runnable):
    def __init__(self, n, r):
        self.node = n
        self.results = r

    def run(self):
        # Don't need to do this but it can be useful
        print "Executing", self.node.getLabel()
        self.node.run(self.results)

def runLater(node, results):
    Thread(NodeRunner(node, results)).start()
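As mentioned above, the same thing could be done with Python’s own threading module instead of Java’s Thread and Runnable. The sketch below shows that alternative against a hypothetical FakeNode whose run method just appends a string; unlike the Java version, this runLater returns the thread so a caller can join it if it needs to. It has not been tried inside Modeler.

```python
import threading


class FakeNode(object):
    """Hypothetical stand-in for a Modeler node."""

    def __init__(self, label):
        self.label = label

    def getLabel(self):
        return self.label

    def run(self, results):
        results.append("output of " + self.label)


def runLater(node, results):
    # Start node.run(results) in a worker thread and return immediately
    worker = threading.Thread(target=node.run, args=(results,))
    worker.start()
    return worker


results = []
worker = runLater(FakeNode("Histogram"), results)
worker.join()  # only for the demo; a UI callback would not wait here
```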

We can now modify our callback (remember that?) to call runLater instead of executing the node directly:


def buildOKHandler(selectNode, categoryColumn, histogramNode):
    def configAndRun(selectedValue):
        expr = '\'' + categoryColumn + '\' == "' + selectedValue + '"'
        selectNode.setPropertyValue("condition", expr)
        # Then run the histogram
        results = []
        runLater(histogramNode, results)
    return configAndRun

(“Just a moment!”, I hear you say, “How come I can run the first branch directly without having to use runLater, even though I ran the script or stream by clicking on a button in the Modeler UI?”. The answer is that Modeler does the same thing as we’ve done in this script i.e. it begins the execution in a separate thread rather than directly in the button callback.)

Tying It All Together
We can now go back to our primary function and add the steps to create the callback, create a SelectorWindow and invoke its choose function:


def genHistogram(stream, csvFile, categoryColumn, continuousColumn):
    stream.clear()	# reset the stream
    nodes = buildStream(stream, csvFile, categoryColumn, continuousColumn)
    # Run the first branch to get the categories
    categories = getCategories(nodes[0])
    # Build our callback function
    handler = buildOKHandler(nodes[1], categoryColumn, nodes[2])
    # Create the SelectorWindow and invoke its choose() function
    selector = SelectorWindow()
    selector.choose(categories, handler)

Summary
This has been a long discussion with various side-tracks so if you’ve made it this far then well done. Even if you don’t need all these features, hopefully there are bits and pieces you can pull from these examples to help you make the most of the scripting functionality available in SPSS Modeler.

The full script is shown below or can be downloaded from GitHub:


from java.lang import Runnable, Thread
from java.awt import BorderLayout, Dimension
from javax.swing import JFrame, JButton, JPanel, JList, JScrollPane

class SelectorWindow(JFrame):
    __callback = None
    
    # Class-specific callback
    def okCallback(self, event):
        # Check a callback has been set and the selection is not empty
        if self.__callback is not None and not self.uilist.isSelectionEmpty():
            self.setVisible(False)
            self.__callback(self.uilist.getSelectedValue())
    
    def __init__(self):
        JFrame.__init__(self, "Select a category", size = (300, 200))
        self.uilist = JList()
        scroll = JScrollPane(self.uilist)
        scroll.setPreferredSize(Dimension(300, 200))
        # Ensure the class-specific callback is always called when the OK button is clicked
        button = JButton("OK", actionPerformed = self.okCallback)
        panel = JPanel()
        # A BorderLayout provides a simple way of laying out components in a panel
        panel.setLayout(BorderLayout())
        panel.add(scroll, BorderLayout.CENTER)
        panel.add(button, BorderLayout.SOUTH)
        self.getContentPane().add(panel)
        self.pack()
    
    def choose(self, values, cb):
        self.uilist.setListData(values)
        self.__callback = cb
        # Centre the window on the screen
        self.setLocationRelativeTo(None)
        self.setVisible(True)
		

class NodeRunner(Runnable):
    def __init__(self, n, r):
        self.node = n
        self.results = r

    def run(self):
        print "Executing", self.node.getLabel()
        self.node.run(self.results)

def runLater(node, results):
    Thread(NodeRunner(node, results)).start()


def buildStream(stream, csvFile, categoryColumn, continuousColumn):
    # Create and configure the CSV import node
    csvNode = stream.createAt("varfile", csvFile, 92, 92)
    csvNode.setPropertyValue("full_filename", csvFile)

    # Create the branch that will determine the available categories
    # Create and configure the aggregate node
    aggNode = stream.createAt("aggregate", "Agg", 184, 92)
    aggNode.setPropertyValue("keys", categoryColumn)
    # Create the table node
    tableNode = stream.createAt("table", "Table", 276, 92)
    # Connect them together
    stream.linkPath([csvNode, aggNode, tableNode])
    
    # Now create the second branch that will select the specified category
    # and produce the required graph.
    # Just create the select node - we can't configure the select expression yet
    selectNode = stream.createAt("select", "Select", 184, 184)
    # Create and configure the histogram node
    histogramNode = stream.createAt("histogram", "Histogram", 276, 184)
    histogramNode.setPropertyValue("field", continuousColumn)
    # Connect them together
    stream.linkPath([csvNode, selectNode, histogramNode])
    
    # Return the nodes that the rest of the script will need access to
    return [tableNode, selectNode, histogramNode]

def getCategories(tableNode):
    # Supply a list to capture the objects generated by executing the stream
    result = []
    tableNode.run(result)
    # Extract the row set from the table output and get the values from the first column
    rowset = result[0].getRowSet()
    rowcount = rowset.getRowCount()
    row = 0
    values = []
    while row < rowcount:
        values.append(str(rowset.getValueAt(row, 0)))
        row+=1
    # Close/delete the table output
    result[0].close()
    # Sort the values before returning them
    values.sort()
    return values

def buildOKHandler(selectNode, categoryColumn, histogramNode):
    def configAndRun(selectedValue):
        expr = '\'' + categoryColumn + '\' == "' + selectedValue + '"'
        selectNode.setPropertyValue('condition', expr)
        results = []
        runLater(histogramNode, results)
		
    return configAndRun

def genHistogram(stream, csvFile, categoryColumn, continuousColumn):
    stream.clear()	# reset the stream
    nodes = buildStream(stream, csvFile, categoryColumn, continuousColumn)
    # Run the first branch to get the categories
    categories = getCategories(nodes[0])
    # Build our callback function
    handler = buildOKHandler(nodes[1], categoryColumn, nodes[2])
    # Create the SelectorWindow and invoke its choose() function
    selector = SelectorWindow()
    selector.choose(categories, handler)
	
genHistogram(modeler.script.stream(), "$CLEO_DEMOS/DRUG1n", "Drug", "Age")

4 comments on "Using SPSS Modeler Scripting To Execute Streams Interactively"

  1. A modal dialog could be used instead of a frame. That makes it possible not to run in separate thread, because everything can be in order.

    • JulianClinton March 10, 2016

      Hi

      The issue is that, even with a modal dialog, you would still be executing a potentially long running process inside a UI callback which is considered bad practice because the UI is unresponsive until the callback returns.

      Julian

      • Suharto Anggono March 11, 2016

        Strategy:
        – The modal dialog blocks further instructions in the main program until the dialog disappears.
        – The button callback just closes the dialog.
        – After the dialog disappears, control is back to the main program. The chosen category has been captured.
        – Node is run from the main program.
        Thank you for the example.

  2. AdolfoRamírez May 03, 2017

    Great contibution, it will very usefull. Thanks Julian.
