At the recent sold-out Spark & Machine Learning Meetup in Brussels, Sven Hafeneger of IBM delivered a lightning talk called "Hyperparameter Optimization – when scikit-learn meets PySpark".

As Sven explained, Apache Spark™ is useful for more than big data problems. Even with a relatively small data set, you might still face a big computational problem. One such problem is the search for optimal hyperparameters for ML algorithms.

A data scientist typically works on a laptop with 4 cores (8 threads), so a grid search can take a long time. Spark makes it possible to carry out the grid search on a cluster with a higher degree of parallelism.
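To see why a grid search parallelizes so well, note that every parameter combination can be evaluated independently of the others. The sketch below is not code from the talk: it uses a made-up toy data set, a hypothetical `evaluate` function, and a standard-library thread pool as a stand-in for a cluster. With PySpark, assuming a `SparkContext` named `sc`, the same function could be handed to something like `sc.parallelize(grid).map(evaluate).collect()` instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Toy data, made up for illustration: y = 3 * x**2, split into train and validation.
train = [(x, 3.0 * x**2) for x in (0.5, 1.0, 1.5, 2.0, 2.5)]
valid = [(x, 3.0 * x**2) for x in (0.75, 1.25, 1.75, 2.25)]


def evaluate(params):
    """Fit y ~ w * x**power with ridge penalty alpha; return (validation MSE, params)."""
    alpha, power = params
    sxy = sum((x**power) * y for x, y in train)
    sxx = sum((x**power) ** 2 for x, _ in train)
    w = sxy / (sxx + alpha)  # closed-form one-parameter ridge solution
    mse = sum((w * x**power - y) ** 2 for x, y in valid) / len(valid)
    return mse, params


# The grid is the Cartesian product of the candidate values (4 * 3 = 12 points).
grid = list(product([0.0, 0.1, 1.0, 10.0], [1, 2, 3]))

# Each grid point is an independent task, so it can be mapped over workers.
# Threads are only a stand-in here; a cluster would spread the same map over executors.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(evaluate, grid))

# Keep the combination with the lowest validation error.
best_mse, best_params = min(results)
print(best_params, best_mse)
```

Because the tasks share nothing but the (small) data set, the wall-clock time is bounded mainly by how many workers are available, which is where a cluster has the advantage over a 4-core laptop.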

See a video of the talk on YouTube

See the slides on SlideShare
