- Gradient-Boosted Trees - Supervised learning algorithm that can be used for either binary classification or regression tasks. Learn more about the implementation here.
- K-Means Clustering - Unsupervised clustering technique that accepts a user-defined number of clusters (k). Learn more about the implementation here.
- Multinomial Naive Bayes - Supervised learning variant of Naive Bayes used for classification. The input features for this algorithm should be frequencies (counts). A classic example is using a term-document frequency matrix to perform document classification. Learn more about the implementation here.
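
To make the term-frequency example for Multinomial Naive Bayes more concrete, here is a minimal sketch in plain Python. It is not the MLlib implementation and not the code behind the extension; the function names `train_multinomial_nb` and `predict`, the smoothing parameter `alpha`, and the tiny term-count matrix are all made up for illustration. It only shows why count-valued inputs fit this model: each class keeps smoothed per-term probabilities, and a document is scored by weighting the log of those probabilities by its term counts.

```python
import math
from collections import defaultdict


def train_multinomial_nb(X, y, alpha=1.0):
    """Fit a toy multinomial Naive Bayes model.

    X is a list of rows of term counts (one row per document),
    y is the list of class labels, alpha is additive smoothing.
    All names here are hypothetical, chosen for this sketch.
    """
    n_features = len(X[0])
    class_doc_counts = defaultdict(int)
    class_term_counts = defaultdict(lambda: [0.0] * n_features)

    # Accumulate document counts and per-term totals for each class.
    for row, label in zip(X, y):
        class_doc_counts[label] += 1
        totals = class_term_counts[label]
        for j, count in enumerate(row):
            totals[j] += count

    n_docs = len(y)
    model = {}
    for label, doc_count in class_doc_counts.items():
        totals = class_term_counts[label]
        denom = sum(totals) + alpha * n_features
        log_prior = math.log(doc_count / n_docs)
        # Smoothed log-probability of each term given the class.
        log_likelihood = [math.log((t + alpha) / denom) for t in totals]
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior score for one row."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, log_likelihood) in model.items():
        score = log_prior + sum(c * ll for c, ll in zip(row, log_likelihood))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative term-document counts; columns stand for the terms
# ["goal", "match", "election", "vote"]. The data is invented.
X = [
    [3, 2, 0, 0],
    [2, 1, 0, 1],
    [0, 0, 4, 2],
    [0, 1, 1, 3],
]
y = ["sports", "sports", "politics", "politics"]

model = train_multinomial_nb(X, y)
print(predict(model, [1, 2, 0, 0]))
print(predict(model, [0, 0, 2, 1]))
```

Because the score multiplies each term's log-probability by its count, negative or non-count inputs would not have a sensible interpretation, which is why the inputs are expected to be frequencies.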
We are excited to announce the release of three new extensions for SPSS Modeler, built with PySpark on algorithms implemented in MLlib: Gradient-Boosted Trees, K-Means Clustering, and Multinomial Naive Bayes. Niall McCarroll, IBM SPSS Analytic Server Software Engineer, and I developed these extensions in Modeler version 18, where it is now possible to run PySpark algorithms locally. This means that users who have Modeler 18 with Server Enablement can use these extensions to build models on local data or on distributed data in a Spark cluster on Analytic Server.