Spark and Scala Course
SCALA
Getting Started with Scala
Scala background, Scala vs. Java, and basics
Interactive Scala: REPL, data types, variables, expressions, simple functions
Running a program with the Scala compiler
Exploring the type lattice and using type inference
Defining methods and pattern matching
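
As a small taste of the topics above, the following sketch shows type inference, a simple method, and pattern matching in plain Scala. The names `GettingStarted`, `square`, and `describe` are illustrative assumptions and are not taken from the course material.

```scala
object GettingStarted {
  // A simple method; the result type is written explicitly here.
  def square(x: Int): Int = x * x

  // Pattern matching on value and type. Any is the top of Scala's type lattice,
  // so this method accepts every value.
  def describe(value: Any): String = value match {
    case 0               => "zero"
    case n: Int if n < 0 => "negative int"
    case _: Int          => "positive int"
    case s: String       => s"string of length ${s.length}"
    case _               => "something else"
  }

  def main(args: Array[String]): Unit = {
    val n = 7           // type inferred as Int
    val label = "Scala" // type inferred as String
    println(square(n))
    println(describe(n))
    println(describe(label))
  }
}
```

The same definitions can be pasted into the REPL one at a time, or compiled and run with the Scala compiler.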
Scala Environment Setup
Scala setup on Windows and UNIX
Java setup
Scala editor
Interpreter
Compiler
Functional Programming
What is functional programming?
Differences between OOP and FP
Collections
Iterating, mapping, filtering, and counting
Regular expressions and matching with them
Maps, Sets, groupBy, Options, flatten, flatMap
Word count, I/O operations, file access, flatMap
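
The collection topics above come together in the classic word count. The sketch below is a minimal illustration using only the Scala standard library; the object name `WordCount`, the method `count`, and the sample sentences are assumptions made for this example.

```scala
object WordCount {
  // Split lines into lowercase words, then count occurrences of each word.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+").toSeq) // regex split, flattened into one word list
      .filter(_.nonEmpty)                         // drop empty tokens
      .groupBy(identity)                          // word -> all occurrences of that word
      .map { case (word, hits) => word -> hits.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq("Spark and Scala", "Scala collections, Scala maps")
    val counts = count(lines)
    // Map.get returns an Option, so a missing word is handled without exceptions.
    println(counts.get("scala"))
    println(counts.getOrElse("java", 0))
  }
}
```

For file access, the input lines could instead come from `scala.io.Source.fromFile(path).getLines()`, where `path` is a placeholder for a real file and the source should be closed after reading.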
Object-Oriented Programming
Integrations
What is SBT?
Integration of Scala in the Eclipse IDE
Integration of SBT with Eclipse
Git
Introduction to Git and installation
Comparisons, branching, and merging
Rebasing, stashing, and tagging
SPARK
Environment
Configuring Apache Spark
spark-shell
spark-submit
Setting up memory (driver memory, executor memory)
Setting up cores (executor cores)
Running Spark in local mode
Spark UI explanation
YARN and cluster framework
DataFrames
Overview of DataFrames
Reading CSV/Excel files and creating a DataFrame
Cache/uncache operations on DataFrames
Persist/unpersist operations on DataFrames
Partition and repartition concepts of DataFrames
foreachPartition on DataFrames
Programming using DataFrames
How to use the DataFrame APIs effectively
A "magic" Spark job using the DataFrame concept (small project)
Defining a schema on a DataFrame
How to perform SQL operations on a DataFrame
Checkpointing in DataFrames
StructType and ArrayType in DataFrames
Complex data structures in DataFrames
Scala & Spark
Scala: the "Red Hot" programming language for Apache Spark.
According to a recent survey by Databricks, 71% of Spark users also use the Scala programming language.
Various Data Sources
CSV files
Excel files
JSON files
Parquet files
Benefits of Parquet files
Text files
Various Levels of Persistence
MEMORY_ONLY
MEMORY_ONLY_SER
MEMORY_AND_DISK
MEMORY_AND_DISK_SER
DISK_ONLY
OFF_HEAP
User-Defined Functions
Connecting Spark with S3
Cassandra database
Overview of the Cassandra database and its benefits
Partition key and collection concepts in Cassandra
Connecting Cassandra with Spark
Reading a table from Cassandra and performing transformations
Writing data to a Cassandra table with millions of rows
Redis
Overview of Redis
How to connect Spark with Redis
Collection concepts of Redis
Reading keys, hash keys, and sets from Redis and processing them in Spark
Writing various keys to Redis using Spark
Spark SQL
Overview of Spark SQL
How to write SQL in Spark
Various types of clauses in Spark SQL
Data cleaning
Spark MLlib
Introduction to machine learning and its benefits
Introduction to the Spark MLlib library
Vectors, decision trees, and matrix concepts
Classification and regression
Correlations and stratified sampling concepts
Explanation of various algorithms
Case studies
Spark Streaming and Live Project on Spark
Overview of Spark Streaming
Concepts of input DStreams and receivers
Transformations on DStreams
Window operations