Knowledge Transfer Microsoft Certified Training Partner CTEC
Knowledge Transfer is a Microsoft Certified Gold Partner

Cloudera Developer Training for Apache Spark Overview

  • Introduction to Spark

    • What is Spark?

    • Review: From Hadoop MapReduce to Spark

    • Review: HDFS

    • Review: YARN

    • Spark Overview

  • Spark Basics

    • Using the Spark Shell

    • RDDs (Resilient Distributed Datasets)

    • Functional Programming in Spark

  • Working with RDDs in Spark

    • Creating RDDs

    • Other General RDD Operations

  • Aggregating Data with Pair RDDs

    • Key-Value Pair RDDs

    • Map-Reduce

    • Other Pair RDD Operations

  • Writing and Deploying Spark Applications

    • Spark Applications vs. Spark Shell

    • Creating the SparkContext

    • Building a Spark Application (Scala and Java)

    • Running a Spark Application

    • The Spark Application Web UI

    • Hands-On Exercise: Write and Run a Spark Application

    • Configuring Spark Properties

    • Logging

  • Parallel Processing

    • Review: Spark on a Cluster

    • RDD Partitions

    • Partitioning of File-based RDDs

    • HDFS and Data Locality

    • Executing Parallel Operations

    • Stages and Tasks

  • Spark RDD Persistence

    • RDD Lineage

    • RDD Persistence Overview

    • Distributed Persistence

  • Basic Spark Streaming

    • Spark Streaming Overview

    • Example: Streaming Request Count

    • DStreams

    • Developing Spark Streaming Applications

  • Advanced Spark Streaming

    • Multi-Batch Operations

    • State Operations

    • Sliding Window Operations

    • Advanced Data Sources

  • Common Patterns in Spark Data Processing

    • Common Spark Use Cases

    • Iterative Algorithms in Spark

    • Graph Processing and Analysis

    • Machine Learning

    • Example: k-means

  • Improving Spark Performance

    • Shared Variables: Broadcast Variables

    • Shared Variables: Accumulators

    • Common Performance Issues

    • Diagnosing Performance Problems

  • Spark SQL and DataFrames

    • Spark SQL and the SQL Context

    • Creating DataFrames

    • Transforming and Querying DataFrames

    • Saving DataFrames

    • DataFrames and RDDs

    • Comparing Spark SQL, Impala and Hive-on-Spark

  • Conclusion


To inquire about future classes, request a class date if one is not scheduled.