Starting your education in big data


There is no question that big data and cloud computing are two of the hottest IT topics today. Demand for skilled individuals in these areas, and the salaries offered, are growing quickly. Fortunately, the two areas are related, so you can start your education in big data and learn cloud computing concepts along the way.

Though you can spend time researching these topics on the internet, there is a better and easier way: explore the free courses in Big Data University. Big Data University is an online educational website offering free courses about big data and databases. The site is run by the community, which includes many IBMers who voluntarily contribute to developing courses and enhancing the site. Its motto is “Learn @yourpace, @yourplace from the industry’s best.”

What is appealing about Big Data University is that most of the courses include hands-on labs that you can perform on the cloud. For example, one of the courses is sponsored by Amazon Web Services, which provides a $25 credit to learn big data on its cloud. Each course has a short test you can take, and if you pass it, you can print yourself a certificate of completion.

This article lists the courses currently available in Big Data University and the ones that are soon to be published. Though none of the courses have prerequisites, there is a suggested path for you to take them in order.

Suggested learning path

Big Data University courses are classified in three categories:

  • Big data-related topics
  • Database (DB2) related topics
  • Miscellaneous topics

Figure 1 shows the list of courses in the big data category and the order in which we recommend taking them (top to bottom, then left to right), depending on your current knowledge of big data concepts.

Figure 1. Big Data University courses — Big data category

The “Big data analytics demos” course at the top of the figure provides an overview of what big data is, why it’s important, and its characteristics. It also introduces you to the concepts of data-at-rest analytics (think of an ocean as an analogy: huge amounts of data, but not really flowing), and data-in-motion analytics (think of a river or a stream as an analogy: streams of data constantly flowing and having to analyze them in real-time).
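
The difference between the two styles of analytics can be illustrated with a minimal sketch in plain Python. This is not taken from the course material; the function names and the rolling-window approach are assumptions chosen only for illustration. The first function sees the whole data set at once (the ocean), while the second produces a result each time a new value arrives (the river).

```python
from collections import deque


def average_at_rest(readings):
    """Data at rest: the full data set is available before analysis starts."""
    return sum(readings) / len(readings)


def averages_in_motion(stream, window=3):
    """Data in motion: yield a rolling average as each new value arrives.

    Only the most recent `window` values are kept, so the whole stream
    never has to be stored.
    """
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


if __name__ == "__main__":
    # Hypothetical sample values, used only to exercise the two functions.
    print(average_at_rest([2, 4, 6]))
    for rolling in averages_in_motion(iter([2, 4, 6, 8]), window=2):
        print(rolling)
```

Real data-in-motion systems add concerns this sketch ignores, such as distribution across machines and fault tolerance.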

The courses on the left of Figure 1 (“Hadoop fundamentals I,” “Hadoop and the Amazon Cloud,” “Hadoop and the IBM SmartCloud Enterprise,” and, in beta, “Hadoop Fundamentals II”) are mainly for data-at-rest analytics. They teach you how to work with Hadoop, an open source Java framework that helps you process large amounts of data quickly. Note that these courses have labs that you can run on either the Amazon Cloud or the IBM SmartCloud Enterprise. We suggest you take the courses in this section in the order listed, from top to bottom.

In the center of Figure 1, three courses are listed:

  • “Spreadsheet-like analytics” (in beta) allows non-technical users to take advantage of big data technologies without having to learn how to write a program to run Hadoop, JAQL, and so on. It uses BigSheets, a plug-in that can be run on top of Hadoop, and is designed for the business user who is familiar with spreadsheet tools such as MS Excel.
  • “Text Analytics Essentials I” teaches you the basics of how to perform analytics on unstructured data, such as the content of an email or any other document. It uses Annotation Query Language (AQL) to specify how to filter the information. A text analytics Eclipse plug-in can be used to develop the AQL, which can later be deployed on top of Hadoop to crunch big data.
  • “Query Languages for Hadoop” (in beta) teaches you how to work with query and scripting languages such as Hive, Pig, and JAQL. These scripting languages simplify the development of map-reduce programs in Hadoop for developers with no Java expertise.
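
To give a rough idea of what a map-reduce program does, here is a minimal word-count sketch in plain Python. It does not use Hadoop, Hive, Pig, or JAQL, and all function names are hypothetical; it only imitates the three steps (map, group by key, reduce) that such a framework runs across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the sketch.
    print(word_count(["big data big cloud", "cloud data"]))
```

Higher-level languages let a developer express the same computation as a short query or script, leaving the map and reduce steps to be generated by the tool.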

On the right of the figure is the list of courses soon to be published for data-in-motion analytics (“Stream computing I” and “Stream computing II,” both in beta). They will discuss, for example, how to analyze tweets or Facebook comments as the data flows in real time. They will also discuss how to perform log analysis, complex event processing, and more.

Figure 2 shows the list of courses in the database (DB2) category and the order in which we recommend taking them (top to bottom, then left to right), depending on your current knowledge of database concepts.

Figure 2. Big Data University courses – Database (DB2) category

The “SQL fundamentals I” course at the top of Figure 2 is an introductory course that not only teaches you SQL, but also basic concepts about relational database management systems, and other systems. Take this course and read the book Database Fundamentals for the best learning experience.

The courses on the left of Figure 2 provide you with a solid foundation of core DB2 concepts. Take the “DB2 essential training I” and “DB2 essential training II” courses and read the book Getting started with DB2 Express-C for optimum results.

The soon-to-be-published “What’s new in DB2 10” course explains the new features available with the latest release of DB2 for Linux, UNIX, and Windows. It will include videos demonstrating features such as time travel query, multi-temperature storage, Oracle compatibility, and more.

In the center of Figure 2 there is one course listed: “Data Studio Essential Training I.” At the time of writing, this course was being updated to the latest version of Data Studio. In the meantime, you can review the videos in the course to get familiar with Data Studio, even though they were created for a previous version of the product.

Finally, on the right side of Figure 2, the course “DB2 academic training — 302A exam preparation” is listed. This course prepares you for IBM Exam 302A, developed for the academic community. It includes 13 lessons and a sample test that will give you a good indication of how you would do in the real exam.

Courses for miscellaneous topics

Figure 3 shows the list of courses in the Miscellaneous category.

Figure 3. Big Data University courses – miscellaneous category

The “Creating a course in Big Data University” course provides all the instructions required for anyone interested in developing a course to publish in Big Data University. We encourage you to review this course and find out how easy it is to create your own. Though all the courses in Big Data University to date are free of charge, Big Data University can also support courses that require a fee, should you want to develop one.

Finally, the “Open source development” course (in beta) includes a list of open source tasks that need to be implemented to support Big Data University features. Members from the community willing to help develop these features using PHP are free to contact us, and we can grant access to this course to review projects or tasks that need to be completed.


This article described the courses available at Big Data University that you can take to enhance your skills in big data and database technologies. The figures in the article show a suggested order in which to take them. All the courses in Big Data University are currently free, have hands-on lab exercises, and allow you to print your certificate of completion after passing a test.

Big Data University is a community site sponsored by IBM. We invite community members to develop new courses, and the “Creating a course in Big Data University” course has all the instructions to get started.