Description

Learn to use Spark for your own applications in three packed hands-on days

This fast-paced 3-day course is for data engineers, data analysts, data scientists, developers and operations teams. It provides a thorough, hands-on overview of the Apache Spark platform and the technologies and paradigms it brings together.

We will explore Apache Spark: how it came into existence, how it compares with Apache Hadoop (currently the de facto big data standard), which new use cases it makes possible, and how it can make your current use cases faster and more powerful.
We will also look at Apache Spark’s Streaming architecture, which can meet most of your business’s real-time requirements, and at its SQL architecture, which offers a fast migration path from slower traditional analytical tools such as Hive to Spark SQL.
We will spend some time on Apache Spark ML/MLlib, which provide an integrated architecture for both real-time and batch analytics.
Finally, we will look at Apache Spark GraphX, Spark’s component for graphs and graph algorithms.
All of these workshops are delivered with guided hands-on labs that let attendees explore the data and the techniques and become familiar with the various paradigms.

Who Should Attend

Developers and Team Leads
Software Engineers
Business Analysts
System Analysts
Data Analysts
Data Scientists
Operations and DevOps Engineers
Java Developers
Big Data Engineers

Course Overview

1. Introduction to Big Data & Apache Spark
- Introduce Data Analysis
- Introduce Big Data
- Big Data Definition
- Introduce the techniques and challenges in Big Data
- Introduce the techniques and challenges in Distributed Computing
- Show how the functional programming approach is particularly useful in tackling these challenges
- Short overview of previous solutions: Google’s MapReduce and Apache Hadoop
- Introduce Apache Spark
Hands-on practice: Get exposure to Spark administration and setup
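The functional, MapReduce-style approach introduced in this module can be illustrated without any cluster. The following is a minimal plain-Python sketch (no Spark or Hadoop involved) of the classic word count: a map phase emits (word, 1) pairs, a shuffle groups them by key, and a reduce phase combines each group. The function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every line.
    return [(word, 1) for line in lines for word in line.lower().split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine each key's values with an associative, commutative function.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines):
    # Pure functions chained together: no shared mutable state between phases.
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Spark extends MapReduce", "MapReduce inspired Spark"]))
```

Because addition is associative and commutative, partial sums could be computed on different machines and combined in any order, which is what makes this functional style a good fit for distributed computing.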
2. Deploying & Understanding Apache Spark Architecture
- Spark Architecture in a Cluster
- Spark Ecosystem and Cluster Management
- Deploying Spark on a Cluster
- Deploying Spark on a Standalone Cluster
- Deploying Spark on a Mesos Cluster
- Deploying Spark on a YARN Cluster
- Cloud-based Deployment
Hands-on practice: Learn to deploy and begin using Spark
3. Spark Core, RDDs and the Spark Shell
- Dig deeper into Apache Spark
- Introduce Resilient Distributed Datasets (RDDs)
- Apache Spark installation (basic, local)
- Introduce the Spark Shell
- Actions and Transformations (Laziness)
- Caching
- Loading and Saving data files from the file system
Hands-on practice: Get hands-on with Spark Core and RDDs
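To make the laziness of transformations and the role of caching concrete, here is a toy plain-Python model built around a hypothetical `LazyDataset` class, which is not Spark’s RDD API: `map` and `filter` only record steps, while the actions `collect` and `count` execute them. Calling `cache()` keeps the result of the first action so later actions skip recomputation, loosely mirroring RDD caching.

```python
class LazyDataset:
    """Toy model of lazy evaluation (hypothetical, not Spark's RDD API).

    Transformations (map, filter) only record a step; actions (collect, count)
    run the recorded steps. cache() keeps the first action's result.
    """

    def __init__(self, source, steps=()):
        self._source = list(source)   # input data held in memory
        self._steps = tuple(steps)    # recorded ("map" | "filter", fn) pairs
        self._cache_enabled = False
        self._cached = None

    # Transformations: return a new dataset and run nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def cache(self):
        # Only marks the dataset; the data is stored when the first action runs.
        self._cache_enabled = True
        return self

    def _evaluate(self):
        if self._cached is not None:
            return self._cached
        data = iter(self._source)
        for kind, fn in self._steps:
            # Builtin map/filter chain the steps into one pipelined pass.
            data = map(fn, data) if kind == "map" else filter(fn, data)
        result = list(data)
        if self._cache_enabled:
            self._cached = result
        return result

    # Actions: force evaluation of the recorded pipeline.
    def collect(self):
        return list(self._evaluate())

    def count(self):
        return len(self._evaluate())


if __name__ == "__main__":
    even_squares = LazyDataset(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(even_squares.count())  # the pipeline runs only here; prints 5
```

In Spark, the same split lets the engine see the whole chain of transformations before running anything, so it can pipeline steps and recompute lost partitions from lineage.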
4. Deep Dive into RDDs
- Tailored RDD
- Pair RDD
- NewHadoop RDD
- Aggregations
- Partitioning
- Broadcast Variables
- Accumulators
Hands-on practice: You’ll learn expanded RDD capabilities
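Pair-RDD aggregation and partitioning meet in `reduceByKey`. The following plain-Python model is a sketch under simplifying assumptions, not Spark code: partitions are lists of (key, value) pairs, values are first combined per key inside each partition, pairs are then shuffled to a partition chosen by hashing the key (the same idea as a hash partitioner), and the partial results are merged. The names `hash_partition` and `reduce_by_key` are hypothetical.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    # Target partition = hash of the key modulo the partition count.
    return hash(key) % num_partitions


def reduce_by_key(partitions, func, num_partitions):
    """Toy reduceByKey over a list of partitions, each a list of (key, value) pairs."""
    # 1. Map-side combine: merge values per key within each input partition.
    combined = []
    for part in partitions:
        local = {}
        for k, v in part:
            local[k] = func(local[k], v) if k in local else v
        combined.append(local)

    # 2. Shuffle: route each partially combined pair to its target partition.
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for local in combined:
        for k, v in local.items():
            shuffled[hash_partition(k, num_partitions)][k].append(v)

    # 3. Reduce side: merge the partial results for each key.
    output = []
    for bucket in shuffled:
        out_part = []
        for k, values in bucket.items():
            acc = values[0]
            for v in values[1:]:
                acc = func(acc, v)
            out_part.append((k, acc))
        output.append(out_part)
    return output


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
    print(reduce_by_key(parts, lambda x, y: x + y, 2))
```

The map-side combine step is why `reduceByKey` usually shuffles far less data than grouping all values for a key first and reducing them afterwards.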
5. Spark SQL and DataFrames
- Spark SQL & DataFrames
- DataFrame & SQL API
- DataFrame Schema
- Datasets and Encoders
- Loading and Saving data
- Aggregations
- Joins
Hands-on practice: You’ll learn to use one of Spark’s most powerful features: DataFrames, which offer R-style data modeling backed by distributed computing clusters
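Joins are one of the operations covered in this module. As a conceptual sketch only, the following plain Python implements an inner equi-join as a hash join: it builds an index on one input and probes it with the other. Spark SQL’s planner can choose hash-based strategies such as a broadcast hash join, but the function `hash_join` and its rows-as-dicts representation are hypothetical and not Spark’s API.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner equi-join of two lists of dict rows on `key` (build, then probe)."""
    # Build phase: index the (ideally smaller) right input by the join key.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    # Probe phase: stream the left input and emit merged rows on key matches.
    # .get avoids inserting empty entries for keys with no match.
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            # On a column-name clash, the right-hand value wins in this sketch.
            joined.append({**lrow, **rrow})
    return joined


if __name__ == "__main__":
    users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
    cities = [{"id": 1, "city": "Chicago"}]
    print(hash_join(users, cities, "id"))
```

Building the index on the smaller input keeps memory use low; in a broadcast hash join that smaller input is shipped to every executor, which is conceptually similar to a broadcast variable.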
6. Spark Streaming
- Brief introduction to streaming
- Spark Streaming
- Discretized Streams
- Structured Streaming
- Stateful / Stateless Transformations
- Checkpointing
- Interoperability with Streaming Platforms (Apache Kafka)
Hands-on practice: Work with streaming, another of Spark 2.1’s most exciting features, which processes big data as it arrives and overcomes the time constraints of earlier batch-oriented big data solutions
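Stateless and stateful transformations can be modeled with micro-batches. In this plain-Python sketch (not the DStream or Structured Streaming API), each list stands for one micro-batch: the stateless step counts only the current batch, while the stateful step folds each batch into running totals, in the spirit of a running count kept with `updateStateByKey`. The functions `stateless_counts`, `update_state` and `run_stream` are hypothetical.

```python
from collections import Counter


def stateless_counts(batch):
    # Stateless: the result depends only on the current micro-batch.
    return Counter(batch)


def update_state(state, batch):
    # Stateful: merge this batch's counts into the totals from earlier batches.
    new_state = Counter(state)
    new_state.update(batch)
    return new_state


def run_stream(batches):
    """Process micro-batches in order; record (per-batch counts, running totals)."""
    state = Counter()
    history = []
    for batch in batches:
        state = update_state(state, batch)
        history.append((dict(stateless_counts(batch)), dict(state)))
    return history


if __name__ == "__main__":
    for per_batch, totals in run_stream([["click", "view"], ["click"], ["view"]]):
        print(per_batch, totals)
```

The running state is exactly what checkpointing has to persist: if it were lost, the totals could only be rebuilt by replaying every past batch.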
7. Spark MLlib and ML
- Introduction to Machine Learning
- Spark Machine Learning APIs
- Feature Extraction and Transformation
- Classification using Logistic Regression
- Best Practices in ML for Practitioners
Hands-on practice: Use Spark to build production-ready machine learning and predictive analytics solutions
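For the logistic regression topic, the following plain-Python sketch fits a binary classifier by batch gradient descent on the average log-loss, showing what the model computes: a sigmoid of a weighted sum of features. It is not Spark’s implementation (Spark ML uses its own distributed optimizers), and the learning rate, epoch count and function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent. X: list of feature lists, y: list of 0/1 labels."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x, threshold=0.5):
    # Class 1 if the predicted probability reaches the threshold.
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


if __name__ == "__main__":
    X = [[-2.0], [-1.0], [1.0], [2.0]]
    y = [0, 0, 1, 1]
    w, b = train_logistic_regression(X, y)
    print(w, b, [predict(w, b, x) for x in X])
```

Each epoch sums per-example gradients, which is the part a distributed implementation can compute in parallel across partitions and then add together.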
8. GraphX
- Brief Introduction to Graph Theory
- GraphX
- Vertex and Edge RDDs
- Graph operators
- Pregel API
- PageRank / Travelling Salesman Problem *
Hands-on practice: Get hands-on practice using GraphX
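PageRank fits the iterate-until-stable, vertex-centric model behind the Pregel API. This plain-Python sketch implements the textbook power iteration with damping factor 0.85: each vertex splits its rank evenly over its out-edges, and dangling vertices spread their rank over all vertices. It is not GraphX’s `pageRank` operator, whose conventions (for example, rank normalization) may differ; the function name and parameters are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (src, dst) edges; ranks sum to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every vertex gets the teleport share (1 - damping) / n.
        new_ranks = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                # Send an equal share of this vertex's rank along each out-edge.
                share = ranks[node] / len(targets)
                for t in targets:
                    new_ranks[t] += damping * share
            else:
                # Dangling vertex: spread its rank evenly over all vertices.
                for t in nodes:
                    new_ranks[t] += damping * ranks[node] / n
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    print(pagerank([("b", "a"), ("c", "a"), ("a", "b")]))
```

Each iteration adds the teleport share (1 - damping) and redistributes damping times the old total, so the ranks keep summing to 1.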
9. Testing and Debugging Spark
- Testing in a Distributed Environment
- Testing Spark Applications
- Debugging Spark Applications
Hands-on practice: Practice supporting Spark solutions in the lab, applying best practices for testing, debugging and handling everyday production issues

Apache Spark Big Data Boot Camp: Classroom Live, Chicago, IL, April 22, 2019

Price: $2,700
