Knowledge Transfer Microsoft Certified Training Partner CTEC
Knowledge Transfer is a Microsoft Certified Gold Partner

Cloudera Developer Training for Apache Hadoop Overview


1. Motivation for Hadoop

  • Problems with Traditional Large-Scale Systems
  • Requirements for a New Approach

2. Hadoop: Basic Concepts

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components

3. Writing a MapReduce Program

  • MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • Driver Code
  • Mapper
  • Reducer
  • Streaming API
  • Using Eclipse for Rapid Development
  • New MapReduce API

4. Integrating Hadoop into the Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from a Relational Database Management System with Sqoop
  • Importing Real-Time Data with Flume
  • Accessing HDFS Using FuseDFS and Hoop

5. Delving Deeper into the Hadoop API

  • ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data with Combiners
  • Configuration and Close Methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Directly Accessing HDFS
  • Using the Distributed Cache

6. Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Machine Learning with Mahout
  • Term Frequency
  • Inverse Document Frequency
  • Word Co-Occurrence

7. Using Hive and Pig

  • Hive Basics
  • Pig Basics

8. Practical Development Tips and Techniques

  • Debugging MapReduce Code
  • Using LocalJobRunner Mode for Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs

9. Advanced MapReduce Programming

  • Custom Writables and WritableComparables
  • Saving Binary Data Using SequenceFiles and Avro Files
  • Creating InputFormats and OutputFormats

10. Joining Data Sets in MapReduce

  • Map-Side Joins
  • Secondary Sort
  • Reduce-Side Joins

11. Graph Manipulation in Hadoop

  • Graph Techniques
  • Representing Graphs in Hadoop
  • Implementing a Sample Algorithm: Single Source Shortest Path

12. Creating Workflows with Oozie

  • Motivation for Oozie
  • Workflow Definition Format

Labs

Throughout the course, you will write Hadoop code and perform other hands-on exercises to solidify your understanding of the concepts.

 


To Inquire About Future Classes

Request a class date if one is not scheduled.