IBM Cloud, Big Data, and Data Science educator guide

Posted: December 23, 2015 Modified: September 8, 2016

This educator guide helps you:

  • Understand the importance of developing key skills.
  • Learn about IBM’s Platform as a Service.
  • Find assets, labs, and other resources to use in your classroom.

Help your students prepare for the Cognitive Era

Rising alongside the relatively new technology of big data is a new job title, data scientist. While not tied exclusively to big data projects, the data scientist role does complement those projects because of the increased breadth and depth of data being examined.

The data scientist position represents an evolution from the business or data analyst role.

Desired skills: Solid foundation typically in computer science and applications, modeling, statistics, analytics, math, data visualization, and machine learning. Business acumen and strong communication skills will set the data scientist apart.

Market demand: Data and analytics skills are a growing need in IT and emerging data science practices. According to the Harvard Business Review, 42% of business leaders say data-driven insight will be a significant contributor to revenue over the next 3 years.

IBM provides the foundation for students to succeed in data and analytics by offering the ability to learn, contribute, and network with others. Open-source and IBM technologies such as Spark, Cloudant, dashDB, and Streams are paired with cloud computing to allow students to explore data and infuse intelligent analytics everywhere.

Help your students prepare for the Cognitive Era using IBM® Bluemix™. This guide contains assets such as introductory videos, tutorials, corresponding labs, and exercises that you can use in your classroom to build these critical skills.


This guide is for educators who want to help their students build big data and data science skills related to cloud and analytics. It provides a quick introduction to IBM Cloud Data Services technologies and is ideal for computer science and math undergraduate students or first-year graduate students.

IBM Cloud Data Services introduction

IBM Cloud Data Services provide resources for developers to get, build, and analyze data on the IBM Bluemix development cloud. The services are built on an integrated portfolio of technologies, including Analytics for Apache Spark, Cloudant, dashDB, DataWorks, and more.

  • IBM Analytics for Apache Spark: Explore the essentials of data quickly in an interactive computational environment. The large-scale data analytics capabilities of Apache Spark included in Jupyter-based notebooks provide the ideal framework for building powerful solutions around data. Visit the IBM Apache Spark Learning Center to find in-depth learning guides, tutorials, and samples.
  • IBM Cloudant: This fully-managed NoSQL database as a service (DBaaS) is built from the ground up to scale globally, run non-stop, and handle a wide variety of data types like JSON, full-text, and geospatial. To find in-depth learning guides, tutorials, and samples, visit the IBM Cloudant Learning Center.
  • IBM dashDB: The fast, fully-managed, cloud data warehouse utilizes integrated analytics to rapidly deliver answers. dashDB’s unique in-database analytics, R predictive modeling, and business intelligence tools free you to analyze your data and get precise insights quickly. To find in-depth learning guides, tutorials, and samples, visit the IBM dashDB Learning Center.
  • IBM DataWorks: DataWorks is a fully-managed data preparation and movement service that enables business analysts, developers, and data scientists to put data to work through a simple, powerful cloud application. Its processing engine is built on Apache Spark, but even ‘power Excel’ users can use the DataWorks UI to discover, cleanse, transform, and move data for their developers and analytics. DataWorks excels at preparing and moving data from on-premises systems to cloud infrastructure and back again. A key component of the IBM Cloud Data Services portfolio, the DataWorks service comes pre-integrated with other IBM services like the dashDB cloud data warehouse, Cloudant NoSQL database, and Watson Analytics. To find in-depth learning guides, tutorials, and samples, visit the IBM DataWorks Learning Center.
  • IBM Streaming Analytics: Streaming Analytics, built on the IBM Streams technology, is an advanced analytics platform. It allows user-developed applications to quickly ingest, analyze, and correlate information as it arrives from a wide variety of real-time sources. The Streaming Analytics service gives you the ability to deploy Streams applications to run on the Bluemix platform. To learn more, visit the IBM Streams developer site.


Recommended Learning Roadmap

This section provides a roadmap of learning assets, with a description of each, that you can use to learn big data and data science with IBM Cloud Data Services and to enhance your curriculum.

Introductory learning

  • The Big Data Fundamentals course from Big Data University presents a holistic approach to Big Data, taking both a top-down and a bottom-up approach to answer questions such as: What is Big Data? How do we tackle Big Data? Why are we interested in it? What is a Big Data platform?
  • The Getting Started with Data Science course from Big Data University describes introductory topics about Data Science and provides interesting examples of how Data Science is used in the real world.
  • The Data Science Methodology course from Big Data University describes the major steps involved in practicing data science, with real-world examples at each step. It covers everything from forming a concrete business or research problem, through collecting and analyzing data and building a model, to understanding the feedback after model deployment.
  • The Introduction to R course from Big Data University covers the basics of this open source language, such as factors, lists, and data frames.
  • The Introduction to NoSQL and DBaaS course from Big Data University provides an overview of the NoSQL database landscape, the benefits of using a Database-as-a-Service offering, and where Cloudant fits into the picture. It also gets you started with Cloudant through tutorials on signing up for an account, creating and replicating databases, and loading and querying data, and it concludes by pointing to additional resources.
  • The Intro to Scala course from Big Data University covers the fundamentals of the Scala language used to program data science applications, the tooling, and the development process, and introduces the more advanced features.
  • The Spark Fundamentals I and Spark Fundamentals II courses from Big Data University provide an in-depth overview of Spark. Topics include its applications; resilient distributed dataset (RDD) operations; the use of Scala, Spark SQL, MLlib, Spark Streaming, and GraphX to develop and run Spark applications; configuration, monitoring, and tuning; and data distribution, task parallelization, optimization, caching, and advanced operations.
  • The Getting Started with IBM Bluemix course from IBM developerWorks covers the fundamentals of cloud computing, Bluemix, services, DevOps, containers, Cloud Foundry, and best practices for agile and test-driven development.
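
As a classroom warm-up before the Spark Fundamentals courses, the map and reduce-by-key pattern behind many Spark programs can be sketched in plain Python. This is only an illustration using the Python standard library, not the Spark API, and the function name word_count and the sample sentences are made up for this example.

```python
from collections import defaultdict


def word_count(lines):
    """Count words with the split / pair / sum-by-key pattern.

    Plain-Python illustration only; this does not call Spark.
    """
    # Step 1 (like a "flat map"): split every line into lowercase words.
    words = (word.lower() for line in lines for word in line.split())
    # Step 2 (like a "map"): pair each word with the count 1.
    pairs = ((word, 1) for word in words)
    # Step 3 (like a "reduce by key"): sum the counts for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Hypothetical sample data for the exercise.
sample = ["big data needs big tools", "data science uses data"]
counts = word_count(sample)
print(counts)
```

In Spark, each of the three steps would run in parallel across partitions of the data; here they run in a single process so that students can trace them by hand.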

Intermediate learning

  • Complete all of the tutorials from the Cloudant Learning Center. Learn how to create a database, set its permissions, use the database with Bluemix, develop applications using Cloudant, use the HTTP API, set up database replication, create indexes and queries, and integrate with Spark and dashDB.
  • Complete all of the tutorials from the Spark Learning Center. Learn how to use Spark on Bluemix, build SQL queries, use the machine learning library, load and analyze dashDB data, load Cloudant data, use Spark streaming, develop using Scala notebooks, analyze traffic data, analyze Twitter sentiment, and analyze additional sample data.
  • Complete all of the tutorials from the dashDB Learning Center. Learn how to use dashDB on Bluemix, load data from various sources, load geospatial data, integrate dashDB and Informatica Cloud, migrate data from other databases, load XML data, connect applications, use dashDB with Watson Analytics, use dashDB with Spark, use dashDB with R, publish apps with Shiny, use dashDB with Tableau, use dashDB with Cognos, and analyze dashDB data with SPSS.
  • Complete all of the tutorials from the DataWorks Learning Center. Learn how to connect to data in DataWorks, load data for analytics, shape raw data, and use the DataWorks API.
  • Complete all of the tutorials from the Roadmap for Streaming Analytics Service on Bluemix. Learn about streams, how to build streaming applications on Bluemix, and how to integrate with other Bluemix services, including Message Hub, Cloudant, the SPSS Analytics Toolkit, the Internet of Things services, and HBase.
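
To give students a feel for the Cloudant HTTP API before they start the tutorials, the following minimal sketch builds, but does not send, a Cloudant Query request against the _find endpoint. The account name example-account, the database students, and the field names are hypothetical, and authentication is deliberately left out; the Cloudant Learning Center tutorials cover credentials and the full set of query options.

```python
import json
from urllib.request import Request


def build_find_request(account, database, selector, fields=None, limit=25):
    """Build (without sending) a POST request for Cloudant Query.

    The account and database names passed in are placeholders supplied
    by the caller; no credentials are attached in this sketch.
    """
    url = f"https://{account}.cloudant.com/{database}/_find"
    body = {"selector": selector, "limit": limit}
    if fields:
        # Restrict the returned documents to the listed fields.
        body["fields"] = fields
    data = json.dumps(body).encode("utf-8")
    return Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query: documents whose "year" field is greater than 2015.
req = build_find_request(
    "example-account",
    "students",
    {"year": {"$gt": 2015}},
    fields=["name", "year"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Keeping request construction separate from sending lets students inspect the URL and JSON body offline before they create a real account.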


  • Big Data University: Analytics, big data, and data science courses for careers in data science and data engineering.
  • Cloudant Learning Center: Resources for developers to get, build, and analyze data on the IBM Cloud.

Reference Materials

Books and articles

  • IBM Cloudant: Database as a Service Advanced Topics. Learn about advanced topics for IBM Cloudant, a NoSQL JSON document store that is optimized for handling heavy workloads of concurrent reads and writes in the cloud, a workload that is typical of large, fast-growing web and mobile apps. You can use Cloudant as a fully managed DBaaS running on public cloud platforms like IBM SoftLayer, or as an on-premises version called Cloudant Local that you can run yourself on any private, public, or hybrid cloud platform.
  • Hybrid Cloud Data and API Integration: Integrate Your Enterprise and Cloud with Bluemix Integration Services. Learn about a set of hybrid cloud capabilities in IBM Bluemix that allows businesses to innovate rapidly while providing IT control and visibility. It allows customers to quickly and easily build and operate systems that mix data and application programming interfaces (APIs) from a wide variety of sources, whether they reside on-premises or in the cloud.
  • Data integration and analytics as a service, Part 1: DataWorks. Learn how to use the DataWorks cloud service in IBM Bluemix to load or migrate data from different sources to various targets. The IBM DataWorks service, which includes the DataWorks APIs and DataWorks Forge, allows developers to load, cleanse, and profile data, and to migrate it seamlessly to different targets.

