DataFrame Operations Using a JSON File

This Python script uses Spark SQL to read employee data from a JSON file into a DataFrame, which it displays and then coalesces into a single partition before writing to a Parquet file. It then reads the Parquet data back, filters for rows where the stream is "JAVA", displays the filtered DataFrame, and writes it to a new Parquet file.
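spark.read.json expects newline-delimited JSON: one object per line, not a single JSON array. A minimal sketch of what emp.json might contain, and of the filter the script applies, is below; only the "stream" field comes from the original code, and the other field names and values are illustrative assumptions:

```python
import json

# Hypothetical sample records -- only the "stream" field is taken from the
# original code; the other field names and values are assumptions.
records = [
    {"name": "Asha", "stream": "JAVA"},
    {"name": "Ravi", "stream": "PYTHON"},
    {"name": "Meena", "stream": "JAVA"},
]

# Write newline-delimited JSON, the format spark.read.json expects:
# one complete JSON object per line.
with open("emp.json", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# The plain-Python equivalent of the DataFrame filter in the script:
java_emps = [r for r in records if r["stream"] == "JAVA"]
print(java_emps)
```

This keeps the same selection logic as pf.filter(pf.stream == 'JAVA'), just applied to an in-memory list instead of a distributed DataFrame.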

Uploaded by

Arpita Das
Copyright
© All Rights Reserved

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession, the entry point to Spark SQL.
spark = SparkSession \
    .builder \
    .appName("Data Frame EMPLOYEE") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Read the employee records from a JSON file into a DataFrame and display it.
df = spark.read.json("emp.json")
df.show()

# Coalesce to a single partition so the output is a single Parquet file,
# then write it to the "Employees" directory.
df.coalesce(1).write.parquet("Employees")

# Read the Parquet data back, keep only rows whose stream is "JAVA",
# display the result, and write it to a new Parquet directory.
pf = spark.read.parquet("Employees")
dfNew = pf.filter(pf.stream == 'JAVA')
dfNew.show()
dfNew.coalesce(1).write.parquet("JavaEmployees")
