by Alok Singh | Updated March 28, 2019 - Published August 30, 2018
Artificial intelligence, Data science
In this developer code pattern, we will use R4ML, a scalable R package running on IBM Watson™ Studio, to perform various machine-learning exercises. If you are unfamiliar with Watson Studio, it is an interactive, collaborative, cloud-based environment where data scientists, developers, and others interested in data science can use tools such as RStudio, Jupyter Notebooks, and Spark to collaborate, share, and gather insight from their data.
If you’re a data scientist who needs to train classification models at scale with a support vector machine (SVM), or to tune them using cross-validation, you’ve come to the right place.
In the age of big data, enormous volumes of data are generated every day, and analyzing that data is important for optimal business results. However, traditional data science tools do not scale to data of that size, which is why frameworks like Apache Spark were created. R4ML is one approach to scalable analysis.
This pattern provides an SVM example to demonstrate the ease and power of R4ML in implementing scalable classification. R4ML provides various out-of-the-box algorithms to experiment with. If you are new to R4ML, or want details on its functionality, support, documentation, and roadmap, see the related links.
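To make the two ideas named above concrete (SVM classification and tuning with cross-validation), here is a minimal sketch in plain Python. It does not use R4ML or its API, and it runs on synthetic two-cluster data rather than the airline data. The function names (`make_data`, `train_linear_svm`, `cross_validate`), the Pegasos-style training rule, and the candidate regularization values are all assumptions chosen for illustration, not what the pattern itself does.

```python
import random


def make_data(n_per_class, rng):
    """Synthetic 2-D data: two Gaussian blobs labelled +1 and -1 (illustrative only)."""
    data = []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            x = [rng.gauss(center, 0.5), rng.gauss(center, 0.5)]
            data.append((x, label))
    return data


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(train, lam, epochs, rng):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    No bias term is learned; that is adequate here only because the synthetic
    blobs are symmetric about the origin.
    """
    w = [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        order = list(range(len(train)))
        rng.shuffle(order)
        for i in order:
            x, y = train[i]
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y * dot(w, x)         # margin under the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:               # hinge loss is active: move toward y * x
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def accuracy(w, rows):
    correct = sum(1 for x, y in rows if (1 if dot(w, x) >= 0 else -1) == y)
    return correct / len(rows)


def cross_validate(data, lambdas, k, epochs, seed):
    """Return (best_lambda, {lambda: mean held-out accuracy}) using k-fold CV."""
    rng = random.Random(seed)
    rows = data[:]
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = {}
    for lam in lambdas:
        fold_acc = []
        for i in range(k):
            train = [r for j, fold in enumerate(folds) if j != i for r in fold]
            w = train_linear_svm(train, lam, epochs, rng)
            fold_acc.append(accuracy(w, folds[i]))
        scores[lam] = sum(fold_acc) / k
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    data = make_data(50, random.Random(0))
    best, scores = cross_validate(data, [0.001, 0.01, 0.1, 1.0], k=5, epochs=5, seed=1)
    print("mean held-out accuracy per lambda:", scores)
    print("selected lambda:", best)
```

Each candidate regularization value is scored only on folds the model did not see during training, which is what makes the comparison a fair basis for tuning. A distributed implementation would spread the training step across a cluster, but the selection logic would have the same shape.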
We will use the Airline On-Time Statistics and Delay Causes dataset from RITA. A 1-percent sample of the dataset is available from the American Statistical Association (ASA). All of the data is in the public domain. We will use a subset of this dataset that ships with R4ML, but this pattern can also work with the larger RITA dataset.
After you proceed through this pattern, you will understand how to:
Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.