Python/PySpark training teaches participants how to leverage Apache Spark's massively parallel processing capabilities using PySpark, the Python API for Apache Spark. This intensive course covers both Python and PySpark to prepare students to work in a broad range of environments.
Target Audience
Skills Gained
Big Data Concepts and Systems Overview for Data Engineers
Defining Data Engineering
Data Processing Phases
Python 3 Introduction
Python Variables and Types
Control Statements and Data Collections
Functions and Modules
File I/O and Useful Modules
Practical Introduction to NumPy
Practical Introduction to pandas
Data Grouping and Aggregation with pandas
Repairing and Normalizing Data
Data Visualization in Python
Python as a Cloud Scripting Language
Introduction to Apache Spark
The Spark Shell
Spark RDDs
Parallel Data Processing with Spark
Introduction to Spark SQL
Lab Exercises