Data Engineer Interview Questions
Hadoop Common: A shared set of utilities and libraries used by the other Hadoop
modules.
HDFS: The distributed file system in which Hadoop data is stored. It provides
high aggregate bandwidth across the cluster.
Hadoop YARN: Handles resource management within the Hadoop cluster. It also
schedules tasks for users.
6) What is NameNode?
It is the centerpiece of HDFS. It stores the metadata of HDFS and tracks the files
across the cluster. The actual data is not stored here; it is stored in the
DataNodes.
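The split between metadata and data can be pictured with a small toy model. This is only a sketch, not how HDFS is implemented: the class names, the `locate` and `read_file` helpers, the block IDs, and the file path are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy DataNode: holds the actual block contents."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Toy NameNode: holds only metadata, never file contents."""
    file_to_blocks: dict = field(default_factory=dict)   # path -> [block_id]
    block_locations: dict = field(default_factory=dict)  # block_id -> [node name]

    def locate(self, path):
        # Return (block_id, [DataNode names]) for every block of the file.
        return [(b, self.block_locations[b]) for b in self.file_to_blocks[path]]


def read_file(path, namenode, datanodes):
    # The client asks the NameNode where the blocks live,
    # then reads the bytes from the DataNodes themselves.
    data = b""
    for block_id, holders in namenode.locate(path):
        data += datanodes[holders[0]].blocks[block_id]
    return data


dn1 = DataNode("dn1", {"blk_1": b"hello "})
dn2 = DataNode("dn2", {"blk_2": b"world"})
nn = NameNode(
    file_to_blocks={"/logs/a.txt": ["blk_1", "blk_2"]},
    block_locations={"blk_1": ["dn1"], "blk_2": ["dn2"]},
)
content = read_file("/logs/a.txt", nn, {"dn1": dn1, "dn2": dn2})
```

Note that the toy `NameNode` never touches the bytes; it only answers "which blocks, on which nodes".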
Block Scanner verifies the list of blocks that are present on a DataNode.
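A rough idea of what such verification involves: HDFS detects corruption by comparing a block's contents against a checksum recorded when the block was written. The sketch below uses `zlib.crc32` from the Python standard library as a stand-in for the real checksum; the `store_block` and `scan_blocks` helpers and the block IDs are hypothetical.

```python
import zlib


def store_block(data: bytes) -> dict:
    # Keep the checksum computed at write time next to the block contents.
    return {"data": data, "checksum": zlib.crc32(data)}


def scan_blocks(blocks: dict) -> list:
    """Return the IDs of blocks whose contents no longer match their checksum."""
    corrupted = []
    for block_id, block in blocks.items():
        if zlib.crc32(block["data"]) != block["checksum"]:
            corrupted.append(block_id)
    return corrupted


blocks = {"blk_1": store_block(b"alpha"), "blk_2": store_block(b"beta")}
blocks["blk_2"]["data"] = b"bet4"  # simulate on-disk corruption
corrupted_ids = scan_blocks(blocks)
```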
10) What are the steps that occur when Block Scanner detects a corrupted data
block?
1) First, when Block Scanner finds a corrupted data block, the DataNode reports it
to the NameNode.
2) The NameNode starts creating a new replica from a healthy replica of the
corrupted block.
3) The NameNode tries to bring the count of correct replicas up to the replication
factor. Once the count matches, the corrupted data block is deleted.
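The recovery steps can be simulated in a few lines. This is a simplified sketch under stated assumptions: the replication factor is 3, the function `handle_corrupt_replica` and the node names are hypothetical, and the corrupted replica is assumed to be dropped only once the number of healthy replicas reaches the replication factor.

```python
REPLICATION_FACTOR = 3  # assumed value for this sketch


def handle_corrupt_replica(block_locations, block_id, bad_node, all_nodes):
    """Toy NameNode reaction to a corrupt-replica report from a DataNode."""
    holders = block_locations[block_id]
    # Step 1: the DataNode's report tells the NameNode which replica is bad.
    good = [n for n in holders if n != bad_node]
    if not good:
        raise RuntimeError("no healthy replica left to copy from")
    # Step 2: copy from a healthy replica onto nodes that lack the block.
    candidates = [n for n in all_nodes if n not in holders]
    while len(good) < REPLICATION_FACTOR and candidates:
        good.append(candidates.pop(0))
    # Step 3: drop the corrupted replica only once enough good copies exist.
    if len(good) >= REPLICATION_FACTOR:
        block_locations[block_id] = good
    else:
        block_locations[block_id] = good + [bad_node]  # keep it for now
    return block_locations[block_id]


locations = {"blk_7": ["dn1", "dn2", "dn3"]}
result = handle_corrupt_replica(
    locations, "blk_7", "dn2", ["dn1", "dn2", "dn3", "dn4"]
)
```

In this run the replica on `dn2` is reported as corrupt, a new copy is placed on `dn4`, and `dn2` is removed from the block's locations because three healthy replicas now exist.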
Velocity
Value
Variety
Volume
Veracity
1) Integrate data using data sources like RDBMS, SAP, MySQL, and Salesforce.
3) Deploy the big data solution using processing frameworks like Pig, Spark, and
MapReduce.