About This Course
In this Apache Hive course, you’ll learn how to make querying your data much easier. First created at Facebook, Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
- Learn how to write MapReduce programs to analyze your Big Data
- Learn Hive QL, a language that provides a mechanism to project structure onto this data and query it.
- Learn how to use Hive for Data Warehousing tasks on your Big Data projects.
- Module 1 – Introduction to Hive
- Describe what Hive is, what it’s used for, and how it compares to other similar technologies
- Describe the Hive architecture
- Describe the main components of Hive
- List interesting ways others are using Hive
- Module 2 – Hive DDL
- Create databases and tables in Hive, while using a variety of different Data Types
- Run a variety of different DDL commands
- Use Partitioning to improve performance of Hive queries
- Create Managed and External tables in Hive
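To give a flavor of the DDL covered in this module, here is a minimal sketch of a partitioned managed table and an external table. The database, table, and column names (and the HDFS path) are purely illustrative, not part of the course materials.

```sql
-- Hypothetical managed table, partitioned by date for faster queries
CREATE DATABASE IF NOT EXISTS sales_db;

CREATE TABLE IF NOT EXISTS sales_db.orders (
  order_id  INT,
  customer  STRING,
  amount    DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- An external table: dropping it removes only the metadata,
-- leaving the underlying files in place
CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_raw (
  order_id INT, customer STRING, amount DECIMAL(10,2), order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/orders_raw';
```

Partitioning by a column such as `order_date` lets Hive prune whole directories of data when a query filters on that column.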
- Module 3 – Hive DML
- Load data into Hive
- Export data out of Hive
- Run a variety of different Hive QL DML queries
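As a rough sketch of the DML operations in this module, the statements below load, populate, and export data; the table names and HDFS paths are hypothetical examples, not the course's lab data.

```sql
-- Load a file from HDFS into a specific partition (path is hypothetical)
LOAD DATA INPATH '/staging/orders_2015-01-01.csv'
INTO TABLE sales_db.orders PARTITION (order_date = '2015-01-01');

-- Populate a partition from another table with INSERT ... SELECT
INSERT OVERWRITE TABLE sales_db.orders PARTITION (order_date = '2015-01-02')
SELECT order_id, customer, amount
FROM sales_db.orders_raw
WHERE order_date = '2015-01-02';

-- Export query results out of Hive to an HDFS directory
INSERT OVERWRITE DIRECTORY '/exports/big_orders'
SELECT * FROM sales_db.orders
WHERE amount > 1000;
```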
- Module 4 – Hive Operators and Functions
- Use a variety of Hive Operators in your queries
- Utilize Hive’s Built-in Functions
- Explain ways to extend Hive functionality
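A small sketch of the kind of query this module builds toward, combining comparison operators with built-in string, math, and aggregate functions (the table and columns are hypothetical):

```sql
SELECT customer,
       upper(customer)       AS customer_uc,  -- built-in string function
       round(sum(amount), 2) AS total,        -- math + aggregate functions
       count(*)              AS num_orders
FROM sales_db.orders
WHERE order_date BETWEEN '2015-01-01' AND '2015-01-31'  -- comparison operator
  AND amount > 0
GROUP BY customer
HAVING count(*) > 1;
```

When the built-in functions are not enough, Hive can be extended with user-defined functions (UDFs), one of the extension mechanisms discussed in this module.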
- This Hive course is free.
- It is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
- Labs can be performed on the Cloud or on your own 64-bit system. If using your own system, you can install the required software (Linux only) or use the supplied VMware image. More details are provided in the section “Labs setup”.
Recommended skills prior to taking this course
- Basic understanding of Apache Hadoop and Big Data
- Working knowledge of SQL
- Basic Linux operating system knowledge
Aaron Ritchie has worked in the Information Management division of IBM for over 8 years and has held a variety of roles within the Center of Excellence and Education groups. Aaron has worked as an IT Specialist, Learning Developer, and Project Manager. He is certified in multiple IBM products and enjoys working with an assortment of open-source technologies. Aaron holds a Bachelor of Science in Computer Science degree from Clarkson University and a Master of Science in Information Technology degree from WPI.
Daniel Tran is an IBM co-op student working as a Technical Curriculum Developer in Toronto, Ontario. He develops courses for customers seeking knowledge in the Big Data field, and has reworked previously developed courses, updating them for the newest software releases and recreating them on a newly developed cloud environment. He has worked with various Big Data components, including Hadoop, Pig, Hive, HBase, MapReduce & YARN, Sqoop, Oozie, and Phoenix, and has also worked on separate courses involving Machine Learning. Daniel attends the University of Alberta, where he has completed the third year of his Computer Engineering co-op program.