Data Engineering
Syllabus
www.sevenmentor.com
PYTHON
Introduction:
What is Python and history of Python?
Installing Anaconda, Jupyter Notebook
First Program
Python Identifiers, Keywords and Indentation
Comments
Getting User Input
Python Data Types
What are keywords?
What are variables?
Python Inbuilt Functions
Control-Flow Statements:
If-else
Elif
While loop
For loop
Range Function
Break
Continue
Assert
Pass
Return
Coding Assignment
Data Structures:
What are Data Structures?
Lists in Python
Code Walkthrough on Lists
Understanding Iterators
Tuple in Python
Code Walkthrough on Tuple
Dictionaries in Python
Code Walkthrough on Dictionaries
Sets in Python
Code Walkthrough on Sets
More examples on Data Structures
Functions:
What are functions in Python?
Defining and Calling Functions
Inbuilt Functions
User Defined Functions
Lambda Function
Split Function
Strip Function
Map Function
Filter Function
Format Function
Code Walkthrough on User-Defined Functions
Regular Expressions Basics
Abstraction
Inheritance
Encapsulation
Polymorphism
Code Walkthroughs on OOP
Errors in Python
Compile-Time Errors
Runtime Errors
What is Exception?
try...except...else
try-finally clause
Raising an exception
Tkinter
Miscellaneous Topics:
SQL connection with Python using the SQLite Library
Multi-Threading and Multi-Processing
Introduction to Web-scraping
BeautifulSoup Library
NumPy Library for Data Analysis
Code Walkthrough On NumPy Library
Pandas Library for Data Analysis
Code Walkthrough On Pandas Library
Matplotlib Library for Data Analysis
Code Walkthrough On Matplotlib Library
Revision Sessions
Assignment Discussions
Project Discussion:
Defining the Business Problem
Constraints
Flow Diagram
Libraries Used
Results and Conclusion
Future Scope
References
SQL
Introduction:
What is SQL?
Why do we need SQL?
What is a Database Management System?
Types of DBMS
Execution Of SQL query
Difference Between SQL and MySQL
Introduction to MySQL
Installation of MySQL server
Download sample database
Load sample database to work.
Where
Comparison Operators
Null
Logical Operators
Aggregate Operators (Count, Max, Min, Avg, Sum)
Group By
Having
Order Of Keywords
Wildcard Operators
JOINS:
What are Joins?
Inner Join
Outer Join
Left Join
Right Join
Self Join
Subqueries / Nested Queries / Inner Queries
Triggers
Stored Procedures
DML/DDL:
DML: Insert
DML: Update, Delete
DDL: Create Table
DDL: Alter: Add, Drop, Modify
DDL: Drop Table, Truncate, Delete
DCL: Data Control Language: GRANT, REVOKE
PL/SQL
Introduction To PL/SQL:
Informal introduction to PL/SQL
Advantages of PL/SQL
Datatypes in PL/SQL
Program structure of PL/SQL
Embedding SQL statements
Using conditional statements and loops
Understanding Exception Handling:
What is an Exception?
Describing exception types
Handling system-defined exceptions
Handling user-defined exceptions
SQLCODE vs SQLERRM
PRAGMA EXCEPTION_INIT
Difference between procedures and functions
How to use inline functions?
Creating & Using Packages:
What is a Package?
Reasons to use packages
What is package specification?
What is package body?
How to instantiate a package?
How to initialise an instantiated package?
What are the package states?
Triggers In PL/SQL:
How to create triggers?
Benefits of triggers
How to trigger a trigger?
Using DML triggers and DDL triggers
How to audit a database using triggers?
What are database-level triggers?
Collections In PL/SQL:
What is a collection?
How to use arrays?
Using nested tables
How to use index by value?
Listing types of collection methods.
General overview and discussion about DBA Concepts
Introduction To Data Engineering / Data Warehouse
What is Data Engineering?
Use Cases, and Applications?
Data Engineer or Data Scientist?
What is a Data Warehouse?
Data Lakes
Data Engineering Problems
Tools of a Data Engineer
Working with Different Databases
Processing Tasks, Scheduling Tools, and Different Cloud Providers
Why Cloud Computing, Use Cases, and Applications?
Different Cloud Services
AWS
AWS Data Engineering Tools:
Core Python
SQL and NoSQL
Data Storage Tools
Data Integration Tools
Data Warehouse Tools
Data Visualization Tools
AWS Snowball
Data Storage Tools:
Amazon S3
MongoDB
Introduction:
What Is MongoDB?
Installation and Configuration
MongoDB Data Modelling
Introduction to NoSQL Architecture with MongoDB
MongoDB Advantages
MongoDB Tools, Collection and Documents
MongoDB Map Reduce
MongoDB Text Search
MongoDB Regular Expression
MongoDB Capped Collections
Administration:
MongoDB Deployment and Cluster setup
MongoDB GridFS
Trident Spout
Working with Replica Sets
MongoDB Sharding
Indexing:
Indexing and Aggregation
Indexing, query profiling and the query optimiser
GeoSpatial Indexes
Index types, Index Properties
MongoDB Advanced Indexing
MongoDB Indexing Limitations
Aggregation Introduction
SPARK
Hadoop Overview:
Need of Hadoop technology
Hadoop Cluster and Racks in detail
Overview of Map Reduce
Big data Concepts and data types
Concept of Streaming data and different tools utilisation
HDFS and Basic Hadoop commands
Scala Programming:
Scala overview and Environment Setup
OOP concepts in Scala
Control Structure and Functions
Closures and Collections
Exception Handling in Scala
Apache Spark 2.x Installation:
Download release and set
Working with Eclipse
Installing Scala IDE with Spark
Testing with different OS
Working with Apache Spark:
RDD and its Transformations
Working with Eclipse Maven, Spark context and RDD
Working with different file formats
Introduction to Spark DataFrame
DataFrames and RDDs with Spark 1.x and 2.x style
Creating Multiple Spark Contexts and Spark Sessions
Applying Own Schema to the DataFrame and basic operations
Creating Datasets and their basic operations
Dataset vs DataFrame Performance
Running a Spark Job in YARN/cluster Mode From the IDE
Spark with MySQL, transformations on a MySQL Table - DataFrame API
Query Push Down to MySQL Database
Creating Partitioned Table with Spark
Spark built-in functions and UDF
Examples with Spark SQL and RDDs
Spark job submit
Spark Streaming:
Working with data stream
Example of network
Twitter data stream
Twitter data analysis cases
Kafka:
Fundamentals of Kafka, Work Flow and Basic Operations
Creating Topics, Partition, Replication, Broker and Kafka cluster
Working with Producer and Consumer Examples
Creating Consumer Group, Leaders, Followers
Starting brokers, Listing and modifying topics
Single Node-Multiple Brokers Configuration
and
Creating Producer, Consumer and Consumer group application
Running a JAR file from terminal
Real-time case studies:
Working on different data sets
Spark MLlib:
Classification algorithm
Clustering algorithm