In this video:
- Romeo Kienzler, Chief Data Scientist, IBM
Romeo asks the question, “Which programming language should you use for your data science project: Scala, Python, or R?” and answers it with “All of them.” He then walks you through using each of them in the IBM Data Science Experience.
Romeo uses a sample test data set from NASA, which includes accelerometer and temperature sensor data sampled every 10 seconds, so it adds up to a lot of data.
Romeo then demonstrates how to use IBM Data Science Experience, working completely in the IBM Cloud, so there are no bandwidth issues in transferring or storing the data. The Jupyter notebooks he works with in this tutorial are available in his GitHub repository:
He demonstrates how to work with Scala, Python, and R notebooks within the Data Science Experience, and shows how to use the basic data science analyses available in each. In his view, Python lies at the intersection of Scala and R: it is easy to learn, and you can do more of the heavy lifting in Python than in Scala or R alone. (Romeo reports that, in his experience, R sessions frequently crash and take a good while to run, and that you often need to downsample your data sets when working in R. He suggests pushing some of the processing to Apache Spark in the Data Science Experience to take it out of R.)
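To give a rough sense of what downsampling means for sensor data like this, here is a minimal Python sketch that averages 10-second readings into one value per minute. It is not code from the video or from Romeo's notebooks: the `downsample` function and the temperature values are made up for illustration, and it uses only the Python standard library rather than Apache Spark.

```python
from statistics import mean

# From the description above: one sensor reading every 10 seconds.
SAMPLE_INTERVAL_S = 10


def downsample(readings, factor):
    """Average each consecutive group of `factor` readings into one value.

    The final group may contain fewer than `factor` readings.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [
        mean(readings[i:i + factor])
        for i in range(0, len(readings), factor)
    ]


# Hypothetical temperature readings (made-up values), one per 10 seconds.
temperatures = [
    20.0, 20.2, 20.4, 20.6, 20.8, 21.0,
    21.2, 21.4, 21.6, 21.8, 22.0, 22.2,
]

# Six 10-second readings make up one minute.
per_minute = downsample(temperatures, 60 // SAMPLE_INTERVAL_S)
print(per_minute)
```

Twelve readings shrink to two here. On a real data set the same idea cuts the row count by the chosen factor, at the cost of smoothing out short-lived spikes.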
Follow Romeo as he tackles the most difficult challenges in data science.