Today we released new modification releases of the IBM SPSS data science offerings, in particular IBM SPSS Modeler 18.1.1, Analytic Server 3.1.1, and Collaboration and Deployment Services.
In this release we continue our new strategy of making open source based algorithms available in the Modeler GUI without requiring any additional installation. Modeler 18.1.1 adds two Python algorithms. The first is t-SNE, a dimensionality reduction method that lets end users easily visualize groupings in their data. The second is a Python-based random forest algorithm, offered in addition to the existing random trees node. While random trees provides some options not available in the Python algorithm, in many cases the Python version will build the model more quickly. We wanted to let the data scientist using Modeler make that decision, which is why we are including both.
We are also adding three algorithms that run on Spark and can be run locally in Modeler or in Hadoop through Analytic Server. The first is K-Means on Spark; K-Means is a widely used clustering algorithm that is now available for Analytic Server for the first time. The second is XGBoost (already available in Python), now added as a Spark node. Finally, we are adding a new algorithm, Isotonic Regression-AS. Isotonic regression relaxes the constraint in linear regression that the model be completely linear; instead, the only constraint is that the prediction is non-decreasing as an input field increases.
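To make the non-decreasing constraint concrete, here is a minimal pure-Python sketch of the pool-adjacent-violators approach that is commonly used to fit isotonic regression. It is only an illustration and not the implementation behind the Isotonic Regression-AS node; the function name `isotonic_fit` and the sample data are made up for this example, and the targets are assumed to be already sorted by the input field.

```python
def isotonic_fit(values, weights=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators.

    `values` are the targets, assumed to be ordered by increasing input field.
    """
    if weights is None:
        weights = [1.0] * len(values)
    # Each block is [weighted mean, total weight, number of pooled points].
    blocks = []
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool neighbouring blocks while they violate the non-decreasing order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])
    # Expand the pooled blocks back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical targets, listed in order of increasing input value.
y = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]
fit = isotonic_fit(y)
print(fit)  # [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
```

Wherever the raw targets dip (3.0 followed by 2.0, and 4.0 followed by 3.5), the fit replaces them with their average, so the prediction never decreases as the input grows but is otherwise free to follow the data rather than a straight line.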
We have also improved the CPLEX node in this release. We added the CPLEX node in Modeler 18.1 to allow end users to run OPL code (OPL is a language used in IBM’s CPLEX Optimization Studio) in Modeler, so that they can run optimization directly as part of the Modeler flow. However, in Modeler 18.1 the CPLEX node allowed only a single input, which made it difficult to add data about constraints or costs. In this release the CPLEX node accepts multiple inputs, making it much easier to use. An end user still needs a separately purchased CPLEX Optimization Studio license if the optimization model involves more than 1000 optimization variables and 1000 constraints (which generally, but not always, translates into 1000 rows of data in Modeler).
We have received a lot of feedback that the visualization in Modeler could be improved. We are therefore exposing a new type of visualization as a beta feature. It is interactive and lets you change the colors, scaling, and even the type of graph on the fly. Since this is a beta feature, it should not be used for production implementations. We are looking for feedback on this node, so please feel free to leave a comment on the blog about how you like it.
The following are the new features in Analytic Server 3.1.1 (in addition to the new Spark nodes mentioned above):
1) A more automated offline installation procedure is now available for Hortonworks HDP.
2) You can now configure a separate YARN queue for each Analytic Server tenant, with a specific queue name and resource allocation that match the requirements of different types of AS users and jobs.
3) Data from the same data source can now be shared across Analytic Server jobs at runtime using a global RDD in the SparkContext. An “is global share” checkbox has been added to the Data Source page of the AS Admin Console. We recommend enabling this feature when multiple streams use the same data source.
4) Two more data processing operations, “Sampling” and “Distinct”, now support SQL pushback to Hive for execution.
The following are the related software support changes for Analytic Server 3.1.1:
Support for Cloudera 5.11 and 5.12.
Support for Ubuntu Linux 16.04 (with Hortonworks Data Platform 2.6 and Cloudera 5.11).
Cloudera 5.8 and 5.9 are no longer supported.
Big Insights 4.1, 4.2, and 4.2.5 are no longer supported.
MapR 5.0 is no longer supported.
Support for Apache Hive 2.1.
MongoDB 2.6 is no longer supported.
MySQL 5.1 is no longer supported.