Knowledge Transfer Microsoft Certified Training Partner CTEC
Knowledge Transfer is a Microsoft Certified Gold Partner
Cloudera Developer Training for Apache Spark Overview



  • Introduction to Spark

    • What is Spark?

    • Review: From Hadoop MapReduce to Spark

    • Review: HDFS

    • Review: YARN

    • Spark Overview



  • Spark Basics

    • Using the Spark Shell

    • RDDs (Resilient Distributed Datasets)

    • Functional Programming in Spark



  • Working with RDDs in Spark

    • Creating RDDs

    • Other General RDD Operations



  • Aggregating Data with Pair RDDs

    • Key-Value Pair RDDs

    • Map-Reduce

    • Other Pair RDD Operations



  • Writing and Deploying Spark Applications

    • Spark Applications vs. Spark Shell

    • Creating the SparkContext

    • Building a Spark Application (Scala and Java)

    • Running a Spark Application

    • The Spark Application Web UI

    • Hands-On Exercise: Write and Run a Spark Application

    • Configuring Spark Properties

    • Logging



  • Parallel Processing

    • Review: Spark on a Cluster

    • RDD Partitions

    • Partitioning of File-based RDDs

    • HDFS and Data Locality

    • Executing Parallel Operations

    • Stages and Tasks



  • Spark RDD Persistence

    • RDD Lineage

    • RDD Persistence Overview

    • Distributed Persistence



  • Basic Spark Streaming

    • Spark Streaming Overview

    • Example: Streaming Request Count

    • DStreams

    • Developing Spark Streaming Applications



  • Advanced Spark Streaming

    • Multi-Batch Operations

    • State Operations

    • Sliding Window Operations

    • Advanced Data Sources



  • Common Patterns in Spark Data Processing

    • Common Spark Use Cases

    • Iterative Algorithms in Spark

    • Graph Processing and Analysis

    • Machine Learning

    • Example: k-means



  • Improving Spark Performance

    • Shared Variables: Broadcast Variables

    • Shared Variables: Accumulators

    • Common Performance Issues

    • Diagnosing Performance Problems



  • Spark SQL and DataFrames

    • Spark SQL and the SQL Context

    • Creating DataFrames

    • Transforming and Querying DataFrames

    • Saving DataFrames

    • DataFrames and RDDs

    • Comparing Spark SQL, Impala and Hive-on-Spark



  • Conclusion


 



To inquire about future classes, request a class date if one is not scheduled.