In the new release of IBM SPSS Modeler 17.1 we introduced integration with Apache Spark. In this post I will explain more about this integration and why it is so powerful.
Why Should Customers Care about Apache Spark?
Complex workloads complete significantly faster in Spark than in Hadoop MapReduce
Spark enables users to be more productive
Users are able to:
1. Build predictive models faster
2. Conduct more experiments in less time
3. Build multiple models without waiting for the system
SPSS democratizes analytics, extending benefits to users who do not want to program
Access to a broader library of analytic algorithms delivers solutions to more use cases
1. In addition to SPSS algorithms that now run in Spark, Data Scientists can utilize more than 15 algorithms from Spark MLlib
2. Data Scientists can create new Modeler nodes to exploit MLlib algorithms and share them with non-programmer Data Scientists
3. Via shared Modeler nodes, non-programmer Data Scientists leverage Spark functionality in their own analytic workflows
The Custom Dialog Builder – Python for Spark
The Custom Dialog Builder adds Python for Spark support
- Provides access to Spark & its machine learning library (MLlib)
- Also provides access to other common Python libraries, e.g. NumPy, SciPy, scikit-learn, pandas
Data Scientists can create new Modeler nodes (extensions) that exploit algorithms from MLlib and other PySpark processes
- These nodes can be shared with non-programmer Data Scientists to democratize access to Spark capabilities
- Spark becomes usable for non-programmers with code abstracted behind a GUI
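To give a feel for the kind of PySpark code such a node could wrap, here is a minimal sketch that clusters records with MLlib's k-means. It is a standalone illustration, not the script of any shipped Modeler node: the `customers` records, the field names, and the helpers `to_feature_vectors` and `cluster_rows` are all hypothetical, and how Modeler passes stream data into and out of the script is not shown. It assumes a working PySpark installation.

```python
# Hypothetical records standing in for the data a Modeler stream would supply.
customers = [
    {"age": 23, "income": 21000},
    {"age": 25, "income": 24000},
    {"age": 58, "income": 92000},
    {"age": 61, "income": 88000},
]
fields = ["age", "income"]


def to_feature_vectors(rows, fields):
    """Turn dict-like records into numeric feature lists, in `fields` order."""
    return [[float(row[name]) for name in fields] for row in rows]


def cluster_rows(rows, fields, k=2, seed=42):
    """Fit MLlib k-means on the chosen fields and return a cluster id per row."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext.getOrCreate()
    vectors = sc.parallelize(to_feature_vectors(rows, fields))
    model = KMeans.train(vectors, k, maxIterations=20, seed=seed)
    # Assign every input record to its nearest cluster centre.
    return [model.predict(vector) for vector in vectors.collect()]
```

Calling `cluster_rows(customers, fields, k=2)` returns one cluster id per record. In a shared node, values such as `fields` and `k` would typically come from controls in the dialog, so a non-programmer never has to touch the script.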
SPSS + Spark + MLlib
We built some examples to help you get started using Spark with SPSS Modeler. These samples are available on the SPSS Gallery.
You can build your own and share them with the community! We also provide access to Python libraries such as NumPy, SciPy, scikit-learn, and pandas.