Knowledge Transfer Microsoft Certified Training Partner CTEC

Cloudera Training for Data Analysts: Using Pig, Hive, and Impala with Hadoop Overview




  • Hadoop Fundamentals

    • The Motivation for Hadoop

    • Hadoop Overview

    • Data Storage: HDFS

    • Distributed Data Processing: YARN, MapReduce, and Spark

    • Data Processing and Analysis: Pig, Hive, and Impala

    • Data Integration: Sqoop

    • Other Hadoop Data Tools

    • Exercise Scenarios Explanation



  • Introduction to Pig

    • What Is Pig?

    • Pig’s Features

    • Pig Use Cases

    • Interacting with Pig



  • Basic Data Analysis with Pig

    • Pig Latin Syntax

    • Loading Data

    • Simple Data Types

    • Field Definitions

    • Data Output

    • Viewing the Schema

    • Filtering and Sorting Data

    • Commonly-Used Functions



  • Processing Complex Data with Pig

    • Storage Formats

    • Complex/Nested Data Types

    • Grouping

    • Built-In Functions for Complex Data

    • Iterating Grouped Data



  • Multi-Dataset Operations with Pig

    • Techniques for Combining Data Sets

    • Joining Data Sets in Pig

    • Set Operations

    • Splitting Data Sets



  • Pig Troubleshooting and Optimization

    • Troubleshooting Pig

    • Logging

    • Using Hadoop’s Web UI

    • Data Sampling and Debugging

    • Performance Overview

    • Understanding the Execution Plan

    • Tips for Improving the Performance of Your Pig Jobs



  • Introduction to Hive and Impala

    • What Is Hive?

    • What Is Impala?

    • Schema and Data Storage

    • Comparing Hive to Traditional Databases

    • Hive Use Cases



  • Querying with Hive and Impala

    • Databases and Tables

    • Basic Hive and Impala Query Language Syntax

    • Data Types

    • Differences Between Hive and Impala Query Syntax

    • Using Hue to Execute Queries

    • Using the Impala Shell



  • Data Management

    • Data Storage

    • Creating Databases and Tables

    • Loading Data

    • Altering Databases and Tables

    • Simplifying Queries with Views

    • Storing Query Results



  • Data Storage and Performance

    • Partitioning Tables

    • Choosing a File Format

    • Managing Metadata

    • Controlling Access to Data



  • Relational Data Analysis with Hive and Impala

    • Joining Datasets

    • Common Built-In Functions

    • Aggregation and Windowing



  • Working with Impala

    • How Impala Executes Queries

    • Extending Impala with User-Defined Functions

    • Improving Impala Performance



  • Analyzing Text and Complex Data with Hive

    • Complex Values in Hive

    • Using Regular Expressions in Hive

    • Sentiment Analysis and N-Grams

    • Conclusion



  • Hive Optimization

    • Understanding Query Performance

    • Controlling Job Execution Plan

    • Bucketing

    • Indexing Data



  • Extending Hive

    • SerDes

    • Data Transformation with Custom Scripts

    • User-Defined Functions

    • Parameterized Queries



  • Choosing the Best Tool for the Job

    • Comparing MapReduce, Pig, Hive, Impala, and Relational Databases

    • Which to Choose?




 


Course Schedule
  Start Date  City  Price
  8/28/2017         $3195

To inquire about future classes, request a class date if one is not scheduled.