Whether you are new to IBM SPSS Modeler or a longtime user, it is helpful to be aware of all the modeling nodes available. Just as a carpenter needs a tool for every job, a data scientist needs an algorithm for every problem. I collected descriptions of each modeling node from the documentation and summarized them to provide a quick overview of the algorithms available natively in the software. The nodes below are grouped by the type of data mining task they perform (Classification, Association, and Segmentation). The nodes in this list are available in IBM SPSS Modeler version 17.
Classification Model Nodes (1)
Name of Modeling Node 
Description 
Decision Tree Nodes
 C&R Tree
 QUEST
 CHAID
 C5.0

 The algorithms are similar in that they all construct a decision tree by recursively splitting the data into smaller and smaller subgroups
 Each algorithm has important differences that should be taken into account during model building
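
The shared recursive-splitting idea can be sketched in a few lines of Python. This is an illustrative toy, not the SPSS implementation: it picks the threshold on a single numeric field that minimizes the Gini impurity of the two resulting subgroups, which is the step a tree algorithm repeats on each subgroup.

```python
# Toy sketch of one decision-tree split (illustrative only, not SPSS code):
# choose the threshold that best separates the classes, measured by Gini impurity.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subgroup)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Find the threshold on a single numeric field that minimizes the
    weighted Gini impurity of the two resulting subgroups."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(xs, ys))  # (3, 0.0): splitting at 3 separates the classes perfectly
```

A real tree algorithm applies this search to every candidate field, recurses on each subgroup, and adds stopping and pruning rules, which is where C&R Tree, QUEST, CHAID, and C5.0 differ.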

Decision List 
 Identifies subgroups or segments that show a higher or lower likelihood of a binary (yes or no) outcome relative to the overall sample
 Allows complete control over the model, enabling you to edit segments, add your own business rules, specify how each segment is scored, and customize the model in a number of other ways to optimize the proportion of hits across all segments

Linear Models 
 Predict a continuous target based on linear relationships between the target and one or more predictors
 Relatively simple, and create an easily interpreted mathematical formula for scoring
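
To see what "an easily interpreted mathematical formula" looks like, here is a minimal ordinary-least-squares sketch for a single predictor. This is illustrative only; the SPSS node handles many predictors and much more.

```python
# Minimal ordinary-least-squares fit of y = b0 + b1*x (illustrative only).
# The fitted coefficients ARE the scoring formula a linear model produces.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0, i.e. the scoring formula y = 1 + 2x
```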

Principal Component Analysis, Factor Analysis (PCA/Factor) 
 Provides powerful data-reduction techniques to reduce the complexity of your data
 Two similar but distinct approaches are provided in one node
 The goal is to find a small number of derived fields that effectively summarize the information in the original set of fields

Neural Network 
 Approximates a wide range of predictive models with minimal demands on model structure and assumptions
 Relationships are determined during the learning process
 The tradeoff for this flexibility is that the neural network is not easily interpretable

Feature Selection 
 Reduces the choice of hundreds, or even thousands, of fields that can potentially be used as inputs for a data mining problem
 Used to identify the fields that are most important for a given analysis

Classification Model Nodes (2)
Discriminant 
 Builds a predictive model for group membership
 Model is composed of a discriminant function (or, for more than two groups, a set of discriminant functions) based on linear combinations of the predictor variables that provide the best discrimination between the groups

Logistic 
 Statistical technique for classifying records based on values of input fields
 Both binomial models (for targets with two discrete categories) and multinomial models (for targets with more than two categories) are supported

Generalized Linear Model (GenLin) 
 Expands the general linear model so that the dependent variable is linearly related to the factors and covariates via a specified link function
 Allows for the dependent variable to have a non-normal distribution
 Covers widely used statistical models, such as linear regression for normally distributed responses, logistic models for binary data, log-linear models for count data, complementary log-log models for interval-censored survival data, plus many other statistical models through its very general model formulation

Generalized Linear Mixed Models (GLMM) 
 Extend the linear model so that:
 The target is linearly related to the factors and covariates via a specified link function
 The target can have a non-normal distribution
 The observations can be correlated
 Cover a wide variety of models, from simple linear regression to complex multilevel models for non-normal longitudinal data

Cox 
 Builds a predictive model for time-to-event data
 Produces a survival function that predicts the probability that the event of interest has occurred at a given time t for given values of the predictor variables

Support Vector Machine (SVM) 
 Enables you to use a support vector machine to classify data
 Particularly suited for use with datasets with a large number of predictor fields

Bayesian Network 
 Enables you to build a probability model by combining observed and recorded evidence with “commonsense” real-world knowledge to establish the likelihood of occurrences by using seemingly unlinked attributes
 Focuses on Tree Augmented Naïve Bayes (TAN) and Markov Blanket networks that are primarily used for classification

Self-Learning Response Model (SLRM) 
 Enables you to build a model that you can continually update, or re-estimate, as a dataset grows without having to rebuild the model every time using the complete dataset
 For example, this is useful when you have several products and you want to identify which product a customer is most likely to buy if you offer it to them
 Allows you to predict which offers are most appropriate for customers and the probability of the offers being accepted

K-Nearest Neighbor (KNN) 
 Method for classifying cases based on their similarity to other cases
 Similar cases are near each other and dissimilar cases are distant from each other
 The distance between two cases is a measure of their dissimilarity
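
The distance-based idea can be illustrated with a small sketch. This is not the SPSS node; it assumes two numeric fields, Euclidean distance, and a majority vote over the k nearest cases.

```python
# Toy KNN classifier (illustrative only): a new case gets the majority
# class of its k nearest training cases under Euclidean distance.
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """train: list of ((x, y), label) pairs; query: an (x, y) point."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify(train, (1, 1)))  # prints "a": the nearest cases are all class "a"
```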

Temporal Causal Models (TCM) 
 Attempts to discover key causal relationships in time series data
 Builds an autoregressive time series model for each target and includes only those inputs that have a causal relationship with the target
 Differs from traditional time series modeling where you must explicitly specify the predictors for a target series

Spatio-Temporal Prediction (STP) 
 Uses data that contains location data, input fields for prediction (predictors), a time field, and a target field
 Each location has numerous rows in the data that represent the values of each predictor at each time of measurement
 Used to predict target values at any location within the shape data that is used in the analysis

Association Model Nodes
Apriori 
 Discovers association rules in the data
 To create an Apriori rule set, you need one or more Input fields and one or more Target fields
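
The core idea of association rule discovery, counting itemsets that clear a minimum support threshold, can be sketched as follows. This is a toy illustration of support counting with made-up transactions, not the actual Apriori algorithm, which prunes candidate itemsets level by level.

```python
# Toy support counting behind association rules (illustrative only).
# Pairs of items that appear together in enough transactions become
# the basis for rules like {bread} -> {butter}.
from itertools import combinations
from collections import Counter

transactions = [                 # hypothetical market-basket data
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"butter", "milk"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions
    containing both items) meets the threshold."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

print(frequent_pairs(transactions, 0.5))  # {('bread', 'butter'): 0.5}
```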

CARMA 
 Uses an association rules discovery algorithm to discover association rules in the data
 In contrast to Apriori, the CARMA node does not require Input or Target fields. This is integral to the way the algorithm works and is equivalent to building an Apriori model with all fields set to Both

Sequence 
 Discovers patterns in sequential or time-oriented data, in the format bread > cheese
 The elements of a sequence are item sets that constitute a single transaction
 A sequence is a list of item sets that tend to occur in a predictable order

Association Rules 
 The Association Rules node extracts a set of rules from the data, pulling out the rules with the highest information content. It is very similar to the Apriori node; however, there are some notable differences:
 The Association Rules node cannot process transactional data
 The Association Rules node can process data that has the List storage type and the Collection measurement level
 The Association Rules node can be used with IBM® SPSS® Analytic Server. This provides scalability and means that you can process big data and take advantage of faster parallel processing
 The Association Rules node provides additional settings, such as the ability to restrict the number of rules that are generated, thereby increasing the processing speed
 Output from the model nugget is shown in the Output Viewer

Segmentation Model Nodes
K-Means 
 Provides a method of cluster analysis. It can be used to cluster the dataset into distinct groups when you don’t know what those groups are at the beginning
 Instead of trying to predict an outcome, K-Means tries to uncover patterns in the set of input fields
 Records are grouped so that records within a group or cluster tend to be similar to each other, but records in different groups are dissimilar
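
The alternating assign-and-update loop behind K-Means can be sketched in plain Python. This is illustrative only, assuming two numeric fields and fixed starting centers; the SPSS node chooses starting points and the number of passes itself.

```python
# Toy K-Means loop (illustrative only): assign each record to its nearest
# cluster center, recompute each center as the mean of its records, repeat.
import math

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = coordinate-wise mean of the cluster's points.
        centers = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else ctr
                   for cl, ctr in zip(clusters, centers)]
    return centers

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
print(kmeans(points, [(0, 0), (10, 10)]))  # centers settle near the two groups
```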

Kohonen 
 Kohonen networks are a type of neural network that perform clustering, also known as a knet or a self-organizing map
 Used to cluster the dataset into distinct groups when you don’t know what those groups are at the beginning

TwoStep (TwoStep-AS is a similar node that is only available on IBM SPSS Analytic Server) 
 A twostep clustering method
 The first step makes a single pass through the data, during which it compresses the raw input data into a manageable set of subclusters
 The second step uses a hierarchical clustering method to progressively merge the subclusters into larger and larger clusters, without requiring another pass through the data

Anomaly 
 Used to identify outliers, or unusual cases, in the data
 Anomaly detection models store information on what normal behavior looks like
 Particularly useful in applications, such as fraud detection, where new patterns may constantly be emerging
 Anomaly detection is an unsupervised method, which means that it does not require a training dataset containing known cases of fraud to use as a starting point
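
A heavily simplified illustration of the unsupervised idea: here "normal" is just the mean of the data itself, and anomalies are values that sit far from it in standard-deviation units. The SPSS node builds a much richer cluster-based model of normal behavior.

```python
# Toy anomaly flagging (illustrative only, not the SPSS model):
# learn "normal" from the data (mean and spread), then flag records
# more than `threshold` standard deviations from the mean.
import statistics

def flag_anomalies(values, threshold=2.0):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 95]  # hypothetical data; 95 is unusual
print(flag_anomalies(readings))  # [95]
```

Note that no labeled examples of "fraud" were needed; the unusual record stands out relative to the rest of the data, which is the sense in which anomaly detection is unsupervised.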

Automated Modeling Nodes
Automatic Modeling Nodes
 Auto Classifier
 Auto Numeric
 Auto Clustering

 The automated modeling nodes estimate and compare a number of different modeling methods, enabling you to try out a variety of approaches in a single modeling run
 You can select the modeling algorithms to use, and the specific options for each, including combinations that would otherwise be mutually exclusive
 For example, rather than choose between the quick, dynamic, or prune methods for a Neural Net, you can try them all
 The node explores every possible combination of options, ranks each candidate model based on the measure you specify, and saves the best for use in scoring or further analysis

Time Series 
 Estimates exponential smoothing, univariate Autoregressive Integrated Moving Average (ARIMA), and multivariate ARIMA (or transfer function) models for time series
 Produces forecasts based on the time series data
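
Simple exponential smoothing, the most basic of the methods listed, can be sketched directly. This is illustrative only, with a hand-picked smoothing weight alpha; the SPSS node estimates such parameters from the data.

```python
# Toy simple exponential smoothing (illustrative only): each smoothed value
# is a weighted blend of the newest observation and the previous smoothed value.
def exp_smooth(series, alpha=0.5):
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(exp_smooth([10, 12, 14, 13], alpha=0.5))  # [10, 11.0, 12.5, 12.75]
```

The last smoothed value serves as the one-step-ahead forecast in this basic form; ARIMA models generalize this with autoregressive and moving-average terms.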

As you can see, IBM SPSS Modeler offers many algorithms that are well suited for building models to make predictions or to better understand your data. If you are interested in more information on any of these modeling nodes, please see the documentation here, or post a question in the IBM SPSS Predictive Analytics Community!
Hi Greg: I have been trying to find out how the clustering methods in SPSS Modeler handle categorical variables. Do you have a text source that specifies what each clustering algorithm does with this type of variable?