Data Engineer Interview Questions

Data engineering focuses on applying data collection and research to convert raw data from various sources into useful information. Data modeling documents complex software design as diagrams to make the relationships between data objects and rules easy to understand. The main types of schemas in data modeling are the star schema and the snowflake schema. Structured data uses databases for storage and has standard integration tools, while unstructured data uses unmanaged file structures and manual processing. A Hadoop application includes common utilities, HDFS for distributed file storage, MapReduce for large-scale processing, YARN for resource management, and a NameNode that tracks files across the cluster.

Uploaded by Ghulam Mustafa

1) Explain Data Engineering.

Data engineering is a term used in big data. It focuses on the
application of data collection and research. The data generated
from various sources is just raw data; data engineering helps
convert this raw data into useful information.

2) What is Data Modelling?


Data modeling is the method of documenting a complex software
design as a diagram so that anyone can easily understand it.
It is a conceptual representation of data objects, the
associations between them, and the rules that govern them.

3) List various types of design schemas in Data Modelling


There are mainly two types of schemas in data modeling: 1) Star schema and 2)
Snowflake schema.

4) Distinguish between structured and unstructured data

Parameter          Structured Data                  Unstructured Data

Storage            DBMS                             Unmanaged file structures
Standards          ADO.NET, ODBC, and SQL           SMTP, XML, CSV, and SMS
Integration tool   ETL (Extract, Transform, Load)   Manual data entry or batch
                                                    processing that includes codes
Scaling            Schema scaling is difficult      Scaling is very easy

5) Explain all components of a Hadoop application

Hadoop Common: It is a common set of utilities and libraries that are utilized by
Hadoop.

HDFS: It is the file system in which Hadoop data is stored. It is a distributed
file system with high bandwidth.

Hadoop MapReduce: It is a programming model based on the MapReduce algorithm,
used for large-scale data processing.

Hadoop YARN: It is used for resource management within the Hadoop cluster. It can
also be used for task scheduling for users.

6) What is NameNode?

It is the centerpiece of HDFS. It stores the metadata of HDFS and tracks the
files across the cluster. The actual data is not stored here; it is stored in
the DataNodes.

7) Define Hadoop streaming


It is a utility that allows the creation of Map and Reduce jobs and
submits them to a specific cluster.
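A word count is the classic streaming job. The sketch below mimics the mapper and reducer halves as plain Python functions so the data flow is easy to follow locally; on a real cluster each half would be a separate script passed to the streaming jar with the -mapper and -reducer options.

```python
from itertools import groupby

# Word count in the style used with Hadoop streaming: the mapper emits
# (word, 1) pairs and the reducer sums the counts per key. With real
# streaming these would be two separate scripts reading stdin and writing
# tab-separated pairs to stdout.

def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Sum the counts per word; streaming delivers the pairs sorted by key."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

lines = ["big data big cluster", "data node"]
for word, count in reducer(mapper(lines)):
    print(f"{word}\t{count}")  # big 2, cluster 1, data 2, node 1
```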

8) What is the full form of HDFS?


HDFS stands for Hadoop Distributed File System.

9) Define Block and Block Scanner in HDFS


Blocks are the smallest unit of a data file. Hadoop automatically splits huge
files into these small pieces.

Block Scanner verifies the list of blocks stored on a DataNode.
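The block count for a file is simple arithmetic. A quick sketch, assuming the common Hadoop 2.x default block size of 128 MB (configurable via dfs.blocksize; older versions defaulted to 64 MB):

```python
import math

# Assuming the Hadoop 2.x default block size of 128 MB (dfs.blocksize).
BLOCK_SIZE = 128 * 1024 * 1024  # bytes

def num_blocks(file_size_bytes):
    """A file occupies ceil(size / block_size) blocks; the last may be partial."""
    return math.ceil(file_size_bytes / BLOCK_SIZE)

print(num_blocks(1024 * 1024 * 1024))  # a 1 GB file -> 8 blocks
print(num_blocks(300 * 1024 * 1024))   # a 300 MB file -> 3 blocks (128 + 128 + 44 MB)
```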

10) What are the steps that occur when Block Scanner detects a corrupted data
block?

1) First of all, when the Block Scanner finds a corrupted data block, the
DataNode reports it to the NameNode.

2) The NameNode starts creating a new replica from an uncorrupted replica of
the block.

3) The replication count of the correct replicas is checked against the
replication factor. If they match, the corrupted data block is not deleted.
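The recovery steps above can be sketched as a toy simulation. The function and variable names are illustrative, not the actual HDFS API:

```python
# Toy simulation of re-replication after a corrupt block is found.
# Names are illustrative, not the real HDFS API.
REPLICATION_FACTOR = 3

def recover(replicas, replication_factor=REPLICATION_FACTOR):
    """replicas: list of "good"/"corrupt" flags for one block's copies.
    Returns the healthy replica list after the NameNode re-replicates."""
    good = [r for r in replicas if r == "good"]
    if not good:
        raise RuntimeError("no healthy replica to copy from")
    # The NameNode schedules new copies from a healthy replica until the
    # count of good replicas matches the replication factor.
    while len(good) < replication_factor:
        good.append("good")
    return good

print(recover(["good", "corrupt", "good"]))  # ['good', 'good', 'good']
```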

11) Name two messages that the NameNode gets from a DataNode

There are two messages that the NameNode gets from a DataNode: 1) Block report
and 2) Heartbeat.

12) List various XML configuration files in Hadoop

There are four main XML configuration files in Hadoop:
mapred-site.xml
core-site.xml
hdfs-site.xml
yarn-site.xml

13) What are the five V's of big data?

The five V's of big data are:

Velocity
Value
Variety
Volume
Veracity

14) Explain the features of Hadoop


Important features of Hadoop are:

It is an open-source framework that is freely available.
Hadoop is compatible with many types of hardware, and it is easy to add new
hardware within a node.
Hadoop supports faster distributed processing of data.
It stores the data in the cluster, independent of the rest of the operations.
By default, Hadoop creates 3 replicas of each block, placed on different nodes.

15) Explain the main methods of Reducer


setup(): It is used for configuring parameters such as the size of the input
data and the distributed cache.
cleanup(): This method is used to clean up temporary files.
reduce(): It is the heart of the reducer; it is called once per key with the
associated values.
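The real Reducer API is Java (org.apache.hadoop.mapreduce.Reducer); the lifecycle can be mirrored in Python for illustration. The class and driver below are a sketch, not the actual framework:

```python
# The Reducer lifecycle mirrored in Python: the framework calls setup()
# once, reduce() once per key with that key's values, then cleanup() once.
class WordCountReducer:
    def setup(self):
        # Configure parameters, load distributed-cache data, open side files.
        self.results = {}

    def reduce(self, key, values):
        # Called once per key with all values for that key.
        self.results[key] = sum(values)

    def cleanup(self):
        # Clean up temporary files, flush output.
        return self.results

def run_reducer(reducer, grouped):
    """Drive the lifecycle the way the framework would."""
    reducer.setup()
    for key, values in grouped.items():
        reducer.reduce(key, values)
    return reducer.cleanup()

print(run_reducer(WordCountReducer(), {"a": [1, 1], "b": [1]}))  # {'a': 2, 'b': 1}
```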

16) What is the abbreviation of COSHH?


The abbreviation of COSHH is Classification and Optimization based Scheduler
for Heterogeneous Hadoop systems.

17) Explain Star Schema


Star Schema, or Star Join Schema, is the simplest type of Data Warehouse
schema. It is known as a star schema because its structure looks like a star.
In the star schema, the center of the star may have one fact table and multiple
associated dimension tables. This schema is used for querying large data sets.
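A tiny in-memory sketch of a star schema: one fact table in the middle, dimension tables around it, and a typical query that joins a fact to a dimension and aggregates. Table and column names here are made up for illustration:

```python
# Dimension tables: small lookup tables keyed by surrogate key.
dim_product = {1: "laptop", 2: "phone"}
dim_region = {10: "EU", 20: "US"}

# Fact table: rows reference the dimensions by key and carry the measures.
fact_sales = [
    {"product_id": 1, "region_id": 10, "amount": 900},
    {"product_id": 2, "region_id": 10, "amount": 500},
    {"product_id": 1, "region_id": 20, "amount": 1100},
]

def sales_by_region():
    """A typical star-schema query: join facts to a dimension, then aggregate."""
    totals = {}
    for row in fact_sales:
        region = dim_region[row["region_id"]]
        totals[region] = totals.get(region, 0) + row["amount"]
    return totals

print(sales_by_region())  # {'EU': 1400, 'US': 1100}
```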

18) How to deploy a big data solution?

Follow these steps to deploy a big data solution:

1) Integrate data from data sources like RDBMS, SAP, MySQL, and Salesforce.

2) Store the extracted data in either a NoSQL database or HDFS.

3) Deploy the big data solution using processing frameworks like Pig, Spark,
and MapReduce.

19) Explain FSCK


File System Check, or FSCK, is a command used by HDFS. The FSCK command is used
to check for inconsistencies and problems in files; for example, running
hdfs fsck / reports on the health of the entire file system.

20) Explain Snowflake Schema
A Snowflake Schema is an extension of a Star Schema, and it adds additional
dimensions. It is called a snowflake schema because its diagram looks like a
snowflake. The dimension tables are normalized, which splits the data into
additional tables.
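The normalization step is the whole difference between the two schemas. A small sketch, with made-up table and column names, showing a star-style denormalized dimension being split into two snowflake-style tables:

```python
# Star schema: the product dimension is denormalized, so category data
# repeats on every product row.
star_dim_product = {
    1: {"name": "laptop", "category": "electronics", "category_mgr": "Ana"},
    2: {"name": "phone", "category": "electronics", "category_mgr": "Ana"},
}

# Snowflake schema: the repeated category data is normalized into its own
# table, and the product dimension keeps only a foreign key.
snow_dim_category = {100: {"category": "electronics", "category_mgr": "Ana"}}
snow_dim_product = {
    1: {"name": "laptop", "category_id": 100},
    2: {"name": "phone", "category_id": 100},
}

def product_with_category(product_id):
    """Resolving a product now takes an extra join through the category table."""
    product = snow_dim_product[product_id]
    category = snow_dim_category[product["category_id"]]
    return {**product, **category}

print(product_with_category(1)["category_mgr"])  # Ana
```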
