Knowledge Transfer Microsoft Certified Training Partner CTEC
Knowledge Transfer is a Microsoft Certified Gold Partner

Designing and Building Big Data Applications Overview

  • Introduction

  • Application Architecture

    • Scenario Explanation

    • Understanding the Development Environment

    • Identifying and Collecting Input Data

    • Selecting Tools for Data Processing and Analysis

    • Presenting Results to the User

  • Defining and Using Data Sets

    • Metadata Management

    • What is Apache Avro?

    • Avro Schemas

    • Avro Schema Evolution

    • Selecting a File Format

    • Performance Considerations

  • Using the Kite SDK Data Module

    • What is the Kite SDK?

    • Fundamental Data Module Concepts

    • Creating New Data Sets Using the Kite SDK

    • Loading, Accessing, and Deleting a Data Set

  • Importing Relational Data with Apache Sqoop

    • What is Apache Sqoop?

    • Basic Imports

    • Limiting Results

    • Improving Sqoop’s Performance

    • Sqoop 2

  • Capturing Data with Apache Flume

    • What is Apache Flume?

    • Basic Flume Architecture

    • Flume Sources

    • Flume Sinks

    • Flume Configuration

    • Logging Application Events to Hadoop

  • Developing Custom Flume Components

    • Flume Data Flow and Common Extension Points

    • Custom Flume Sources

    • Developing a Flume Pollable Source

    • Developing a Flume Event-Driven Source

    • Custom Flume Interceptors

    • Developing a Header-Modifying Flume Interceptor

    • Developing a Filtering Flume Interceptor

    • Writing Avro Objects with a Custom Flume Interceptor

  • Managing Workflows with Apache Oozie

    • The Need for Workflow Management

    • What is Apache Oozie?

    • Defining an Oozie Workflow

    • Validation, Packaging, and Deployment

    • Running and Tracking Workflows Using the CLI

    • Hue UI for Oozie

  • Processing Data Pipelines with Apache Crunch

    • What is Apache Crunch?

    • Understanding the Crunch Pipeline

    • Comparing Crunch to Java MapReduce

    • Working with Crunch Projects

    • Reading and Writing Data in Crunch

    • Data Collection API Functions

    • Utility Classes in the Crunch API

  • Working with Tables in Apache Hive

    • What is Apache Hive?

    • Accessing Hive

    • Basic Query Syntax

    • Creating and Populating Hive Tables

    • How Hive Reads Data

    • Using the RegexSerDe in Hive

  • Developing User-Defined Functions

    • What are User-Defined Functions?

    • Implementing a User-Defined Function

    • Deploying Custom Libraries in Hive

    • Registering a User-Defined Function in Hive

  • Executing Interactive Queries with Impala

    • What is Impala?

    • Comparing Hive to Impala

    • Running Queries in Impala

    • Support for User-Defined Functions

    • Data and Metadata Management

  • Understanding Cloudera Search

    • What is Cloudera Search?

    • Search Architecture

    • Supported Document Formats

  • Indexing Data with Cloudera Search

    • Collection and Schema Management

    • Morphlines

    • Indexing Data in Batch Mode

    • Indexing Data in Near Real Time

  • Presenting Results to Users

    • Solr Query Syntax

    • Building a Search UI with Hue

    • Accessing Impala through JDBC

    • Powering a Custom Web Application with Impala and Search

