Learning Objectives
Create table schemas on top of data in HDFS
Parse arbitrarily complex file formats with regular expressions
Use a SQL-like query language (HiveQL) for data analysis
Use UDFs, UDAFs, UDTFs, and window functions in Hive
Extend Hive functionality with Python streaming scripts
Optimize execution of HiveQL queries with the help of partitioning, bucketing and sorting
Explain the purpose of different join types in Hive, such as Bucket-Map-Side Join and Sort-Merge-Bucket Join
Work with data skew in Hive
List the design goals (advantages) of row-column-oriented file formats (RCFile, ORC, Parquet)
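
As a preview of the Python streaming objective, here is a minimal sketch of what such a script can look like. It assumes Hive's default streaming convention: rows arrive on standard input as tab-separated text, NULL is written as the two characters \N, and each output row is written to standard output as tab-separated text. The function name transform_line and the normalization it performs (trimming and lowercasing) are illustrative choices, not part of Hive.

```python
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Illustrative transformation: trim and lowercase every field,
    passing Hive's NULL marker (the two characters \\N) through untouched.
    """
    fields = line.rstrip("\n").split("\t")
    cleaned = [f if f == "\\N" else f.strip().lower() for f in fields]
    return "\t".join(cleaned)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive feeds the script one row per line and reads one row per line back.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


# Only consume stdin when data is actually piped in (as Hive does).
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

In Hive, a script like this would typically be shipped to the cluster with ADD FILE and invoked through SELECT TRANSFORM(...) USING 'python3 normalize.py' AS (...), where the file name normalize.py and the column list are hypothetical placeholders.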