Knowledge Transfer Microsoft Certified Training Partner CTEC
Knowledge Transfer is a Microsoft Certified Gold Partner

Cloudera Training for Data Analysts: Using Pig, Hive, and Impala with Hadoop

Overview

  • Hadoop Fundamentals

    • The Motivation for Hadoop

    • Hadoop Overview

    • Data Storage: HDFS

    • Distributed Data Processing: YARN, MapReduce, and Spark

    • Data Processing and Analysis: Pig, Hive, and Impala

    • Data Integration: Sqoop

    • Other Hadoop Data Tools

    • Exercise Scenarios Explanation

  • Introduction to Pig

    • What Is Pig?

    • Pig’s Features

    • Pig Use Cases

    • Interacting with Pig

  • Basic Data Analysis with Pig

    • Pig Latin Syntax

    • Loading Data

    • Simple Data Types

    • Field Definitions

    • Data Output

    • Viewing the Schema

    • Filtering and Sorting Data

    • Commonly Used Functions

  • Processing Complex Data with Pig

    • Storage Formats

    • Complex/Nested Data Types

    • Grouping

    • Built-In Functions for Complex Data

    • Iterating Grouped Data

  • Multi-Dataset Operations with Pig

    • Techniques for Combining Data Sets

    • Joining Data Sets in Pig

    • Set Operations

    • Splitting Data Sets

  • Pig Troubleshooting and Optimization

    • Troubleshooting Pig

    • Logging

    • Using Hadoop’s Web UI

    • Data Sampling and Debugging

    • Performance Overview

    • Understanding the Execution Plan

    • Tips for Improving the Performance of Your Pig Jobs

  • Introduction to Hive and Impala

    • What Is Hive?

    • What Is Impala?

    • Schema and Data Storage

    • Comparing Hive to Traditional Databases

    • Hive Use Cases

  • Querying with Hive and Impala

    • Databases and Tables

    • Basic Hive and Impala Query Language Syntax

    • Data Types

    • Differences Between Hive and Impala Query Syntax

    • Using Hue to Execute Queries

    • Using the Impala Shell

  • Data Management

    • Data Storage

    • Creating Databases and Tables

    • Loading Data

    • Altering Databases and Tables

    • Simplifying Queries with Views

    • Storing Query Results

  • Data Storage and Performance

    • Partitioning Tables

    • Choosing a File Format

    • Managing Metadata

    • Controlling Access to Data

  • Relational Data Analysis with Hive and Impala

    • Joining Datasets

    • Common Built-In Functions

    • Aggregation and Windowing

  • Working with Impala

    • How Impala Executes Queries

    • Extending Impala with User-Defined Functions

    • Improving Impala Performance

  • Analyzing Text and Complex Data with Hive

    • Complex Values in Hive

    • Using Regular Expressions in Hive

    • Sentiment Analysis and N-Grams

    • Conclusion

  • Hive Optimization

    • Understanding Query Performance

    • Controlling the Job Execution Plan

    • Bucketing

    • Indexing Data

  • Extending Hive

    • SerDes

    • Data Transformation with Custom Scripts

    • User-Defined Functions

    • Parameterized Queries

  • Choosing the Best Tool for the Job

    • Comparing MapReduce, Pig, Hive, Impala, and Relational Databases

    • Which to Choose?


