Code Explanation

The code streams retail data from Kafka into Spark, uses UDFs to calculate order metrics such as total cost and item count, computes time-based and country-based KPIs with window functions, and writes the results to files stored on HDFS. The steps are: import functions from Spark SQL, define the data schema, create the SparkSession, read data from Kafka, register the UDFs, select and transform the data, and calculate the KPIs that are written to files.

Uploaded by

Shilpa Kamagari

Case Study: Retail Data Analysis

This project walks through a real-world use case from the retail sector.
Order data is streamed in real time from a centralised Kafka server and
processed to calculate various key performance indicators (KPIs).

1. Various SQL functions were imported from the pyspark.sql.functions module.
The functions include window, udf, etc.
2. Various SQL types were imported from the pyspark.sql.types module. The
types include StringType, ArrayType, TimestampType, IntegerType,
DoubleType, etc.
3. SparkSession was imported from the pyspark.sql module.
4. The Spark session was initialized using
spark = SparkSession \
    .builder \
    .appName("KafkaRead") \
    .getOrCreate()
5. The data was streamed from the Kafka producer using
orderRaw = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "ec2-18-211-252-152.compute-1.amazonaws.com:9092") \
    .option("subscribe", "real-time-project") \
    .load()
with the following connection details:
Bootstrap Server - 18.211.252.152
Port - 9092
Topic - real-time-project
6. The schema was defined using
jsonSchema = StructType() \
    .add("invoice_no", StringType()) \
    .add("country", StringType()) \
    .add("timestamp", TimestampType()) \
    .add("type", StringType()) \
    .add("items", ArrayType(StructType([
        StructField("SKU", StringType()),
        StructField("title", StringType()),
        StructField("unit_price", DoubleType()),
        StructField("quantity", IntegerType())
    ])))
7. A Python function was written to compute the total cost of an order using
Total cost = ∑ (quantity × unit_price), summed over all items in the order
8. The above function was converted into a UDF (user-defined function) using
add_total_cost = udf(get_total_cost, DoubleType())
9. A Python function was written to find the total number of items in an order.
10. The above function was converted into a UDF using
add_total_count = udf(get_total_item, IntegerType())
11. A Python function was written to flag whether the invoice is a new order.
12. The above function was converted into a UDF using
add_is_order_flag = udf(get_is_order, IntegerType())
13. A Python function was written to flag whether the invoice is a return.
14. The above function was converted into a UDF using
add_is_return_flag = udf(get_is_return, IntegerType())
15. The selected columns ("invoice_no", "country", "timestamp", "Total_Items",
"Total_Cost", "is_order", "is_return") were written to the console.
16. Time-based KPIs ("Window", "OPM", "Total Sales Volume", "Average
rate of return", "Average Transaction Size") were calculated using a tumbling
window of 1 minute.
17. Time-and-country-based KPIs ("Window", "Country", "OPM", "Total
Sales Volume", "Average rate of return") were calculated using a tumbling
window of 1 minute.
18. The computed KPIs were written to files and stored on HDFS in JSON
format.
19. The streaming process was manually killed after 10 minutes.
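
The Python functions behind steps 7 to 14 are named in the UDF registrations but their bodies are not listed. A minimal sketch of what they might look like is given here. The function names come from the steps above; everything else is an assumption: that "total items" means the sum of quantities, and that the "type" field carries the values "ORDER" and "RETURN".

```python
# Hypothetical bodies for the helpers named in steps 7 to 14.
# The "ORDER"/"RETURN" type values are assumptions, not taken from the
# original code. Items may be dicts or Spark Row objects, since both
# support lookup by field name.

def get_total_cost(items):
    """Sum quantity * unit_price over the items of one invoice."""
    if items is None:
        return 0.0
    return float(sum(item["quantity"] * item["unit_price"] for item in items))


def get_total_item(items):
    """Sum the quantities of all items in one invoice (assumed meaning)."""
    if items is None:
        return 0
    return int(sum(item["quantity"] for item in items))


def get_is_order(order_type):
    """Return 1 when the invoice is a new order, else 0."""
    return 1 if order_type == "ORDER" else 0


def get_is_return(order_type):
    """Return 1 when the invoice is a return, else 0."""
    return 1 if order_type == "RETURN" else 0
```

Returning plain float and int values keeps the results compatible with the DoubleType and IntegerType return types declared when the functions are wrapped with udf.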

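
The KPI arithmetic in steps 16 and 17 can be illustrated without a Spark cluster. The sketch below is plain Python, not the original Spark code: it groups rows into fixed, non-overlapping 1-minute buckets, which is what a tumbling window does, and computes the four time-based KPIs per bucket. The source does not give the KPI formulas, so the ones used here are assumptions: OPM is the number of invoices in the window, rate of return is returns / (orders + returns), and average transaction size is total sales volume / (orders + returns). The function name tumbling_window_kpis is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def tumbling_window_kpis(rows, window_seconds=60):
    """Compute KPIs per fixed, non-overlapping time window.

    Each row is a dict with "timestamp" (timezone-aware datetime),
    "Total_Cost", "is_order" and "is_return". The KPI formulas are
    assumptions; the source names the KPIs but does not define them.
    """
    buckets = defaultdict(list)
    for row in rows:
        epoch = int(row["timestamp"].timestamp())
        # Align each event to the start of its window.
        window_start = epoch - epoch % window_seconds
        buckets[window_start].append(row)

    result = {}
    for window_start in sorted(buckets):
        group = buckets[window_start]
        orders = sum(r["is_order"] for r in group)
        returns = sum(r["is_return"] for r in group)
        total_sales = sum(r["Total_Cost"] for r in group)
        transactions = orders + returns
        start = datetime.fromtimestamp(window_start, tz=timezone.utc)
        result[start] = {
            "OPM": len(group),
            "total_sales_volume": total_sales,
            "rate_of_return": returns / transactions if transactions else 0.0,
            "avg_transaction_size": total_sales / transactions if transactions else 0.0,
        }
    return result
```

The country-based variant in step 17 would use the pair (window start, country) as the bucket key instead of the window start alone.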