Machine Learning with Apache Spark Training

1 Day

Description

To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data with its distributed compute capability and built-in machine learning library.

This intensive Apache Spark training course provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for Machine Learning. The course is supplemented by a variety of hands-on labs that help attendees reinforce the material covered.

Topics

  • Applied Data Science and Business Analytics
  • Machine Learning Algorithms, Techniques and Common Analytical Methods
  • Apache Spark Introduction
  • Spark’s MLlib Machine Learning Library

This Apache Spark training course has 3 hands-on labs that are outlined at the bottom of this page. The labs cover the spark-submit tool as well as the Apache Spark shell, and allow you to practice the following skills:

Lab 1 - Using the spark-submit Tool

Spark offers developers two ways of running their applications:

  • Using the spark-submit tool
  • Using Spark Shell

In this lab, we will review what is involved in using the spark-submit tool.
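
As a rough illustration of what a spark-submit invocation looks like, the sketch below only assembles the command line in Python and prints it; it does not start Spark. The helper name `build_spark_submit_command`, the file names `my_app.py` and `input.txt`, and the chosen settings are hypothetical and not taken from the lab. The `--master`, `--name`, and `--conf` options are standard spark-submit options.

```python
import shlex


def build_spark_submit_command(app_path, master="local[*]", name=None,
                               conf=None, app_args=()):
    """Assemble a spark-submit argument list (illustrative helper only)."""
    command = ["spark-submit", "--master", master]
    if name:
        command += ["--name", name]
    # Each Spark property is passed as its own --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        command += ["--conf", "{}={}".format(key, value)]
    # The application file comes after the options,
    # followed by the application's own arguments.
    command.append(app_path)
    command.extend(str(arg) for arg in app_args)
    return command


# Hypothetical application and settings, for illustration only.
command = build_spark_submit_command(
    "my_app.py",
    master="local[2]",
    name="demo",
    conf={"spark.executor.memory": "1g"},
    app_args=["input.txt"],
)

# Print a shell-safe version of the command that could be pasted into a terminal.
print(shlex.join(command))
```

The printed line can be run from a terminal on a machine where Spark is installed and `spark-submit` is on the PATH.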

Lab 2 - The Apache Spark Shell

The Spark Shell (also known as a REPL: Read/Eval/Print Loop tool) provides an interactive development environment in Spark. It is available for Scala and Python developers (Java is not yet supported).
The lab instructions below apply to the Scala version of the Spark Shell.

Lab 3 - Using Random Forests for Classification with Spark MLlib

In this lab, we will learn how to use the Random Forests implementation from Spark's Machine Learning library, MLlib, to perform object classification.
Random Forests is regarded as one of the most successful supervised learning algorithms and can be used for both classification and regression.
We will use the Python version of the library, which provides an API similar to those implemented in Scala and Java.
We will also use the spark-submit tool to submit the application from the command line rather than typing commands into the Spark Shell.
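
A minimal sketch of what such an application might look like is shown below. It is not the lab's actual code. It assumes a hypothetical CSV input in which each line is `label,feature1,feature2,...`, uses the RDD-based `pyspark.mllib` API (`LabeledPoint` and `RandomForest.trainClassifier`), and picks the number of trees, the split ratio, and the seed arbitrarily. The script name `rf_sketch.py` and the function names are also assumptions.

```python
import sys


def parse_csv_line(line):
    """Turn 'label,f1,f2,...' into (label, [f1, f2, ...]) as floats."""
    values = [float(v) for v in line.strip().split(",")]
    return values[0], values[1:]


def train_and_evaluate(sc, path, num_classes=2, num_trees=10):
    """Train a random forest classifier and return the test error rate."""
    # Imported here so the parsing helper can be used without Spark installed.
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import RandomForest

    def to_point(line):
        label, features = parse_csv_line(line)
        return LabeledPoint(label, features)

    points = sc.textFile(path).map(to_point)
    # Hold out 30% of the data for evaluation (arbitrary choice).
    train, test = points.randomSplit([0.7, 0.3], seed=42)

    model = RandomForest.trainClassifier(
        train,
        numClasses=num_classes,
        categoricalFeaturesInfo={},  # assume all features are continuous
        numTrees=num_trees,
        seed=42,
    )

    # Predict on the RDD of feature vectors, then pair labels with predictions.
    predictions = model.predict(test.map(lambda p: p.features))
    labels_and_predictions = test.map(lambda p: p.label).zip(predictions)

    total = test.count()
    wrong = labels_and_predictions.filter(lambda lp: lp[0] != lp[1]).count()
    return wrong / float(total) if total else 0.0


def main(argv):
    if len(argv) != 2:
        print("usage: spark-submit rf_sketch.py <csv_path>")
        return
    from pyspark import SparkContext

    sc = SparkContext(appName="RandomForestSketch")
    try:
        print("Test error rate:", train_and_evaluate(sc, argv[1]))
    finally:
        sc.stop()


if __name__ == "__main__":
    main(sys.argv)
```

Under these assumptions, the script would be submitted with something like `spark-submit rf_sketch.py data.csv`, where `data.csv` is a placeholder for your own labeled data file.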

The Web Age Spark class can be delivered in a traditional classroom format. This Apache Spark training can also be delivered in a synchronous instructor-led format.

Audience

  • Data Scientists
  • Business Analysts
  • Software Developers
  • IT Architects

No Upcoming Public Classes

There are currently no public events available for this course. However, you can submit a request for a new date and we will try our best to get you into a Machine Learning with Apache Spark Training class.

Private Training Available
If no date is scheduled, you don’t see a date that works for you, or you are looking for a private training event, please call 651-905-3729 or submit a request for a private session or new date.

Course Overview

  • CHAPTER 1. MACHINE LEARNING ALGORITHMS
    • Supervised vs Unsupervised Machine Learning
    • Supervised Machine Learning Algorithms
    • Unsupervised Machine Learning Algorithms
    • Choose the Right Algorithm
    • Life-cycles of Machine Learning Development
    • Classifying with k-Nearest Neighbors (SL)
    • k-Nearest Neighbors Algorithm
    • The Error Rate
    • Decision Trees (SL)
    • Random Forests
    • Unsupervised Learning Type: Clustering
    • K-Means Clustering (UL)
    • K-Means Clustering in a Nutshell
    • Regression Analysis
    • Logistic Regression
    • Summary
  • CHAPTER 2. INTRODUCTION TO FUNCTIONAL PROGRAMMING
    • What is Functional Programming (FP)?
    • Terminology: Higher-Order Functions
    • Terminology: Lambda vs Closure
    • A Short List of Languages that Support FP
    • FP with Java
    • FP With JavaScript
    • Imperative Programming in JavaScript
    • The JavaScript map (FP) Example
    • The JavaScript reduce (FP) Example
    • Using reduce to Flatten an Array of Arrays (FP) Example
    • The JavaScript filter (FP) Example
    • Common Higher-Order Functions in Python
    • Common Higher-Order Functions in Scala
    • Elements of FP in R
    • Summary
  • CHAPTER 3. INTRODUCTION TO APACHE SPARK
    • What is Apache Spark
    • A Short History of Spark
    • Where to Get Spark?
    • The Spark Platform
    • Spark Logo
    • Common Spark Use Cases
    • Languages Supported by Spark
    • Running Spark on a Cluster
    • The Driver Process
    • Spark Applications
    • Spark Shell
    • The spark-submit Tool
    • The spark-submit Tool Configuration
    • The Executor and Worker Processes
    • The Spark Application Architecture
    • Interfaces with Data Storage Systems
    • Limitations of Hadoop's MapReduce
    • Spark vs MapReduce
    • Spark as an Alternative to Apache Tez
    • The Resilient Distributed Dataset (RDD)
    • Spark Streaming (Micro-batching)
    • Spark SQL
    • Example of Spark SQL
    • Spark Machine Learning Library
    • GraphX
    • Spark vs R
    • Summary
  • CHAPTER 4. THE SPARK SHELL
    • The Spark Shell
    • The Spark Shell UI
    • Spark Shell Options
    • Getting Help
    • The Spark Context (sc) and SQL Context (sqlContext)
    • The Shell Spark Context
    • Loading Files
    • Saving Files
    • Basic Spark ETL Operations
    • Summary
  • CHAPTER 5. THE SPARK MACHINE LEARNING LIBRARY
    • What is MLlib?
    • Supported Languages
    • MLlib Packages
    • Dense and Sparse Vectors
    • Labeled Point
    • Python Example of Using the LabeledPoint Class
    • LIBSVM format
    • An Example of a LIBSVM File
    • Loading LIBSVM Files
    • Local Matrices
    • Example of Creating Matrices in MLlib
    • Distributed Matrices
    • Example of Using a Distributed Matrix
    • Classification and Regression Algorithms
    • Clustering
    • Summary
  • CHAPTER 6. TEXT MINING
    • What is Text Mining?
    • The Common Text Mining Tasks
    • What is Natural Language Processing (NLP)?
    • Some of the NLP Use Cases
    • Machine Learning in Text Mining and NLP
    • Machine Learning in NLP
    • TF-IDF
    • The Feature Hashing Trick
    • Stemming
    • Example of Stemming
    • Stop Words
    • Popular Text Mining and NLP Libraries and Packages
    • Summary
  • LAB EXERCISES
    • Lab 1. Learning the Lab Environment
    • Lab 2. The Spark Shell 
    • Lab 3. Using Random Forests for Classification with Spark MLlib 
    • Lab 4. Using k-means Algorithm from MLlib
    • Lab 5. Text Classification with Spark ML Pipeline

Prerequisites

Participants should have a general knowledge of statistics and programming.
