The next installment of the SQL-on-Hadoop hands-on lab is now available on Hadoop Dev. It focuses on using Big SQL to work with data that you wouldn’t typically find in a relational DBMS or data warehouse. The scenarios covered in the two new lab modules use social media data about the IBM Watson project. This data, a collection of blog posts from various global sites, is stored in JavaScript Object Notation (JSON).
Lab 5 guides you through the process of using BigSheets to explore the blog data in a spreadsheet-style format and generate a Big SQL table based on your workbook. With that done, you can query the workbook data using standard SQL. Lab 6 shows you how to work with SerDes (serializers/deserializers) so that you can present data stored in different formats as Big SQL tables. In particular, you’ll see how easy it is to register a SerDe with BigInsights, create a Big SQL table that uses the SerDe, and then query the table using standard SQL.
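To make the SerDe idea concrete, here is a toy sketch in Python. It is not the BigInsights or Hive SerDe API (real SerDes are Java classes registered with the system); the `JsonRowSerDe` class, its `deserialize`/`serialize` methods, and the column names are all hypothetical. It only shows the two directions a SerDe handles: turning a stored JSON record into a row of typed column values, and turning a row back into the stored format.

```python
import json

# Hypothetical column declarations, standing in for the column list of a
# CREATE TABLE statement. These names are illustrative, not the lab's schema.
COLUMNS = [("url", str), ("language", str), ("posts", int)]


class JsonRowSerDe:
    """Toy serializer/deserializer: JSON text <-> tuple of column values."""

    def __init__(self, columns):
        self.columns = columns

    def deserialize(self, line):
        """Parse one JSON record into a row ordered by the declared columns.

        Missing fields become None (a SQL NULL); present values are cast
        to the declared column type."""
        record = json.loads(line)
        row = []
        for name, col_type in self.columns:
            value = record.get(name)
            row.append(None if value is None else col_type(value))
        return tuple(row)

    def serialize(self, row):
        """Turn a row back into a JSON record, omitting NULL columns."""
        record = {
            name: value
            for (name, _), value in zip(self.columns, row)
            if value is not None
        }
        return json.dumps(record)


serde = JsonRowSerDe(COLUMNS)
row = serde.deserialize('{"url": "http://example.com/a", "language": "English", "posts": "3"}')
print(row)  # ('http://example.com/a', 'English', 3)
print(serde.serialize(row))
```

The table definition only describes the columns; the SerDe owns the knowledge of the on-disk format, which is why swapping in a different SerDe lets the same SQL run over data stored in a different format.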
Labs 5 and 6 differ from the earlier labs I blogged about, which use structured data typical of a relational DBMS to explore the capabilities of Big SQL. Indeed, the structured data used in those labs fits easily into a collection of FACT and DIMENSION tables that you might find in a relational data warehouse. By contrast, the social media data used in Labs 5 and 6 is stored “raw”, in JSON arrays generated by the application that collected the data. To let business analysts and SQL programmers read and analyze this data easily, an appropriate schema is layered on top of the raw data. In Lab 5, this schema is generated by a technology in BigSheets (specifically, a JSON array reader). In Lab 6, this schema is generated through SQL DDL (data definition language) statements that reference a SerDe. In both cases, the original JSON data remains unchanged in the distributed file system; only the way BigSheets and Big SQL users see it changes. This approach is sometimes referred to as “schema on read”.
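As a rough illustration of schema on read, the Python sketch below keeps one copy of raw JSON text untouched and lets two different consumers apply two different schemas to it at read time. The sample records, the `read_with_schema` helper, and both view definitions are hypothetical; in the labs, BigSheets and Big SQL play the role of the reader.

```python
import json

# Raw data as an application might write it: a JSON array of blog records.
# This sample is hypothetical, not the Watson data set used in the labs.
RAW = """[
  {"title": "First post", "country": "US", "tags": ["ai", "health"]},
  {"title": "Second post", "country": "DE", "tags": ["ai"]}
]"""


def read_with_schema(raw_text, schema):
    """Apply a schema at read time.

    `schema` maps each output column name to a function that extracts that
    column's value from one parsed record. The raw text is never rewritten;
    each call simply interprets it differently."""
    for record in json.loads(raw_text):
        yield {column: extract(record) for column, extract in schema.items()}


# Two consumers, two schemas, one unchanged copy of the raw data.
location_view = {
    "title": lambda r: r.get("title"),
    "country": lambda r: r.get("country"),
}
tag_view = {
    "title": lambda r: r.get("title"),
    "tag_count": lambda r: len(r.get("tags", [])),
}

for row in read_with_schema(RAW, location_view):
    print(row)
for row in read_with_schema(RAW, tag_view):
    print(row)
```

Contrast this with schema on write, where data would be parsed and loaded into a fixed table layout before anyone could query it.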
So if you’re ready to expand your SQL skills, check out the latest labs. And stay tuned: in the coming weeks, we’ll have more Big SQL exercises for you to explore.