Apache® Spark™ is an open-source, in-memory computing framework for distributed data processing. One of its nicest features is a simple programming model that hides the complexity inherent in distributed computing. As an added bonus, its APIs come in multiple flavors: Scala, Java, Python, and R. Integration with SWIFT Object Storage, Cloudant, Db2 Warehouse on Cloud (formerly dashDB), SQLDB, and other IBM Cloud Data Services makes development and analytics with Apache Spark more accessible, centralized, and useful.
Access your Spark instance through IBM Cloud or Data Science Experience (DSx):
- On IBM Cloud, use spark-submit to run Spark jobs programmatically. Learn more about using spark-submit with IBM Cloud.
- On DSx, use Jupyter notebooks to manipulate data and run analytics in Scala, R, or Python. Try notebooks in DSx.
When you’re ready to pull in data from different sources, follow the integration tutorials to see how it’s done.