With the new release, IBM SPSS Modeler 17.1, we introduced integration with Apache Spark. In this post I explain more about this integration and why it is so powerful.

Why Should Customers Care about Apache Spark?

Complex workloads complete significantly faster in Spark than in Hadoop MapReduce

Spark enables users to be more productive

Users are able to:

1. Build predictive models faster

2. Conduct more experiments in less time

3. Build multiple models without waiting for the system

SPSS democratizes analytics, extending benefits to users who do not want to program

Access to a broader library of analytic algorithms delivers solutions to more use cases

1. In addition to SPSS algorithms that now run in Spark, Data Scientists can utilize more than 15 algorithms from Spark MLlib

2. Data Scientists can create new Modeler nodes to exploit MLlib algorithms & share them with non-programmer Data Scientists

3. Via shared Modeler nodes, non-programmer Data Scientists leverage Spark functionality in their own analytic workflows

The Custom Dialog Builder – Python for Spark

The Custom Dialog Builder adds Python for Spark support

  • Provides access to Spark & its machine learning library (MLlib)
  • Also provides access to other common Python libraries, e.g. NumPy, SciPy, scikit-learn and pandas

Data Scientists can create new Modeler nodes (extensions) that exploit algorithms from MLlib and other PySpark processes

  • These nodes can be shared with non-programmer Data Scientists to democratize access to Spark capabilities
  • Spark becomes usable for non-programmers with code abstracted behind a GUI
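
To make the idea of a Python for Spark extension node more concrete, here is a minimal sketch of the script that could sit behind such a node. It assumes the `spss.pyspark.runtime` context calls (`getContext`, `isComputeDataModelOnly`, `getSparkInputSchema`, `setSparkOutputSchema`, `getSparkInputData`, `setSparkOutputData`) as described in IBM's Python for Spark scripting documentation; please check them against your Modeler version. The helper names `run_extension_node` and `drop_incomplete_rows` are hypothetical and only there for illustration.

```python
def run_extension_node(ctx, transform):
    """Drive one node execution against a Modeler-style context object."""
    if ctx.isComputeDataModelOnly():
        # Modeler is only asking for the output schema. This sketch assumes
        # the transform leaves the columns unchanged, so the input schema
        # is passed straight through.
        ctx.setSparkOutputSchema(ctx.getSparkInputSchema())
    else:
        # Normal execution: read the upstream data, transform it, hand it back.
        ctx.setSparkOutputData(transform(ctx.getSparkInputData()))


def drop_incomplete_rows(df):
    """Example transform: drop rows with missing values from a Spark DataFrame."""
    return df.dropna()


try:
    # Assumption: this module is only importable inside a Modeler
    # Python for Spark node, so the import is guarded.
    import spss.pyspark.runtime as modeler_runtime
except ImportError:
    modeler_runtime = None

if modeler_runtime is not None:
    run_extension_node(modeler_runtime.getContext(), drop_incomplete_rows)
```

Keeping the transform in its own function means the same logic can be tried outside Modeler with any object that offers the same methods, while the GUI built in the Custom Dialog Builder stays a thin layer on top.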

SPSS + Spark + MLlib

We built some examples to help you get started using Spark on SPSS Modeler. These samples are available on the SPSS Gallery:

Collaborative Filtering: Recommendation system

Page Rank
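
If you have not met PageRank before, the following plain-Python sketch shows the computation the name refers to. It is not the code of the Gallery sample and does not use Spark; the function `pagerank`, the damping value of 0.85, the iteration count and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return {page: rank} for a graph given as {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank evenly across its links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Dangling page: spread its damped rank over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Hypothetical four-page graph: A, B and D all link to C; C links back to A.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph the page with the most incoming links ends up ranked highest, and the page nobody links to keeps only the random-jump share. A Spark version distributes the same per-link contributions across the cluster.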

You can build your own and share them with the community! We also provide access to any Python library, such as NumPy, SciPy, scikit-learn or pandas.

2 comments on "Spark integration in SPSS Modeler 17.1"

  1. Is there a way for IBM SPSS Modeler to connect to Hadoop and retrieve data without using Analytics Server, maybe by writing PySpark code? Are any samples available?

    If not, is there a plan for future releases to support IBM SPSS Modeler connectivity to Hadoop without AS?

    • Jorge Cardoso November 24, 2017

      Although I have never used it, the IBM SPSS Data Access Pack (which comes with SPSS Modeler) does have a Hadoop ODBC driver.
      If it works, I suspect it is a basic approach!
