Apache Spark Big Data Boot Camp Classroom Live Chicago, IL April 22, 2019

Price: $2,700

This course runs for a duration of 3 Days.

The class will run daily from 8:30 am to 4:30 pm Central Time.

Class Location: Chicago, IL.

Enroll today to reserve your spot!

Description

Learn to use Spark for your own applications in three packed hands-on days


This fast-paced 3-day course is for data engineers, data analysts, data scientists, developers, and operations teams. It provides a thorough, hands-on overview of the Apache Spark platform and the technologies and paradigms it comprises.

  • We will explore Apache Spark: how it came into existence, how it compares with Apache Hadoop (currently the de facto big data standard), which new use cases it makes possible, and how it can make your current use cases faster and more powerful.
  • We will also look at Apache Spark’s streaming architecture, which can address most of your business’s real-time needs, and at its SQL architecture, which supports fast migration from slower traditional analytical tools such as Hive to Spark SQL.
  • We will spend some time on Apache Spark ML and MLlib, which provide a fully integrated architecture for both real-time and batch analytics.
  • Finally, we will look at Apache Spark GraphX, which handles graph algorithms.
  • All of these workshops are delivered with guided hands-on labs, allowing attendees to explore the data and techniques and to familiarize themselves with the various paradigms.

Who Should Attend

  • Developers and Team Leads
  • Software Engineers
  • Business Analysts
  • System Analysts
  • Data Analysts
  • Data Scientists
  • Operations and DevOps Engineers
  • Java Developers
  • Big Data Engineers

Course Overview

  • Introduction to Big Data & Apache Spark
    • Introduce Data Analysis
    • Introduce Big Data
    • Big Data Definition
    • Introduce the techniques and challenges in Big Data
    • Introduce the techniques and challenges in Distributed Computing
    • Show how the functional programming approach is particularly useful in tackling these challenges
    • Short overview of previous solutions: Google’s MapReduce and Apache Hadoop
    • Introduce Apache Spark
  • Hands-on practice: Get exposure to administration and setup
  • Deploying & Understanding Apache Spark Architecture
    • Spark Architecture in a Cluster
    • Spark Ecosystem and Cluster Management
    • Deploying Spark on a Cluster
    • Deploying Spark on a Standalone Cluster
    • Deploying Spark on a Mesos Cluster
    • Deploying Spark on a YARN Cluster
    • Cloud-based Deployment
  • Hands-on practice: Learn to deploy and begin using Spark
  • Spark Core, RDDs and Spark Shell
    • Dig deeper into Apache Spark
    • Introduce Resilient Distributed Datasets (RDDs)
    • Apache Spark installation (basic, local)
    • Introduce the Spark Shell
    • Actions and Transformations (Laziness)
    • Caching
    • Loading and Saving data files from the file system
  • Hands-on practice: Get hands-on with Spark Core and RDDs
  • Deep Dive into RDDs
    • Tailored RDD
    • Pair RDD
    • NewHadoopRDD
    • Aggregations
    • Partitioning
    • Broadcast Variables
    • Accumulators
  • Hands-on practice: You’ll learn expanded RDD capabilities
  • Spark SQL and DataFrames
    • Spark SQL & DataFrames
    • DataFrame & SQL API
    • DataFrame Schema
    • Datasets and Encoders
    • Loading and Saving data
    • Aggregations
    • Joins
  • Hands-on practice: You’ll learn to use one of Spark’s most powerful features: DataFrames, which bring R-style data modeling to large computing clusters
  • Spark Streaming
    • Brief introduction to streaming
    • Spark Streaming
    • Discretized Streams
    • Structured Streaming
    • Stateful / Stateless Transformations
    • Checkpointing
    • Interoperability with Streaming Platforms (Apache Kafka)
  • Hands-on practice: Work with big data streaming, one of Spark 2.1’s most exciting features, to overcome the time constraints of previous big data solutions
  • Spark MLlib and ML
    • Introduction to Machine Learning
    • Spark Machine Learning APIs
    • Feature Extraction and Transformation
    • Classification using Logistic Regression
    • Best Practices in ML for Practitioners
  • Hands-on practice: Use Spark to make production-friendly calls to powerful machine learning services for predictive analytics
  • GraphX
    • Brief Introduction to Graph Theory
    • GraphX
    • Vertex and Edge RDDs
    • Graph operators
    • Pregel API
    • PageRank / Travelling Salesman Problem
  • Hands-on practice: Get hands-on practice using GraphX
  • Testing and Debugging Spark
    • Testing in a Distributed Environment
    • Testing Spark Applications
    • Debugging Spark Applications
  • Hands-on practice: You’ll get lab practice supporting Spark solutions, with best practices for testing, debugging, and handling day-to-day production issues

Prerequisites

Everyone can access the labs through the cloud environment set up by the instructor. Participation is not mandatory; attendees who prefer can simply observe the instructor perform the lab examples. Scala or Python skills are nice to have for better understanding what is being done in the labs.