Basic Big Data Interview Questions
Whenever you go for a Big Data interview, the interviewer may ask some
basic level questions. Whether you are a fresher or experienced in the big
data field, basic knowledge is required. So, let's cover some frequently
asked basic big data interview questions and answers to help you crack the
big data interview.
Note: This is one of the basic and significant questions asked in the big data
interview. You can choose to explain the five V's in detail if you see that the
interviewer is interested in knowing more. However, simply naming them is
enough if you are only asked about the term "Big Data".
3. Tell us how big data and Hadoop are related to each other.
Answer: Big data and Hadoop are almost synonymous terms. With the rise of
big data, Hadoop, a framework that specializes in big data operations, also
became popular. Professionals can use the framework to analyze big data
and help businesses make decisions.
4. How is big data analysis helpful in increasing business revenue?
Answer: Big data analysis has become very important for businesses. It
helps businesses differentiate themselves from others and increase revenue.
Through predictive analytics, big data analytics provides businesses with
customized recommendations and suggestions. Big data analytics also
enables businesses to launch new products based on customer needs and
preferences. These factors help businesses earn more revenue, and thus
companies are using big data analytics. Companies may see a significant
increase of 5-20% in revenue by implementing big data analytics. Some
popular companies that use big data analytics to increase their revenue are
Walmart, LinkedIn, Facebook, Twitter, Bank of America, etc.
5. Explain the steps to be followed to deploy a Big Data solution.
Answer: Following are the three steps that are followed to deploy a Big
Data solution –
i. Data Ingestion
The first step in deploying a big data solution is data ingestion, i.e., the
extraction of data from various sources. The data source may be a CRM like
Salesforce, an Enterprise Resource Planning system like SAP, an RDBMS like
MySQL, or other log files, documents, social media feeds, etc. The data can
be ingested either through batch jobs or real-time streaming. The extracted
data is then stored in HDFS.
[Figure: Steps of Deploying a Big Data Solution]
ii. Data Storage
After data ingestion, the next step is to store the extracted data. The data
is stored either in HDFS or in a NoSQL database (i.e. HBase). HDFS storage
works well for sequential access, whereas HBase works well for random
read/write access.
iii. Data Processing
The final step in deploying a big data solution is data processing. The
data is processed through one of the processing frameworks like Spark,
MapReduce, Pig, etc.
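As a quick illustration of the first step, here is a minimal sketch of
batch-style ingestion using the Hadoop Java FileSystem API. The NameNode
URI and the file paths are hypothetical placeholders, not values from this
article:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IngestToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical cluster address; replace with your NameNode URI.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Batch-style ingestion: copy a local log file into HDFS,
        // where it is stored as replicated blocks.
        fs.copyFromLocalFile(new Path("/var/log/app/events.log"),
                             new Path("/data/raw/events.log"));
        fs.close();
    }
}

A real-time alternative would stream the same data in through a tool such
as Flume or Kafka instead of a batch copy.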
8. What is fsck?
Answer: fsck stands for File System Check. It is a command used by HDFS to
check for inconsistencies and report problems with files, such as missing,
under-replicated, or corrupt blocks. For example, running hdfs fsck / -files
-blocks reports the health of the entire file system. Unlike a traditional
fsck, it only reports the errors it detects and does not correct them.
11. Do you have any Big Data experience? If so, please share it with
us.
So, how will you approach the question? If you have previous experience,
start with your duties in your past position and slowly add details to the
conversation. Tell them about your contributions that made the project
successful. This is generally the 2nd or 3rd question asked in an
interview, and the later questions build on it, so answer it carefully. You
should also take care not to go overboard with a single aspect of your
previous job. Keep it simple and to the point.
13. Will you optimize algorithms or code to make them run faster?
The interviewer might also be interested in knowing whether you have any
previous experience in code or algorithm optimization. For a beginner, the
answer obviously depends on the projects they worked on in the past, while
experienced candidates can share their experience accordingly. However, be
honest about your work; it is fine if you haven't optimized code in the
past. Just let the interviewer know your real experience and you will be
able to crack the big data interview.
By answering this question correctly, you are signaling that you understand
the types of data, both structured and unstructured, and that you have
practical experience working with them. If you answer this question
specifically, you will definitely be able to crack the big data interview.
17. What happens when two users try to access the same file in the
HDFS?
HDFS NameNode supports exclusive write only. The NameNode grants a write
lease for the file to the first user; the second user's request to open the
same file for writing will be rejected.
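To see this behavior concretely, here is a minimal Java sketch, assuming a
running HDFS cluster reachable through the default configuration and a
hypothetical file path. It simulates two users with two separate FileSystem
instances; the second create fails while the first client holds the lease:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExclusiveWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // newInstance() gives each "user" its own client with its own lease.
        FileSystem user1 = FileSystem.newInstance(conf);
        FileSystem user2 = FileSystem.newInstance(conf);
        Path file = new Path("/tmp/exclusive-write-demo.txt"); // hypothetical

        // First user opens the file for writing and receives the lease.
        FSDataOutputStream out = user1.create(file, true);
        try {
            // The second user's attempt to open the same file for writing
            // is rejected by the NameNode while the lease is held.
            user2.create(file, true).close();
        } catch (IOException e) {
            System.out.println("Second writer rejected: " + e.getMessage());
        } finally {
            out.close();
            user1.close();
            user2.close();
        }
    }
}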
18. How can you recover a NameNode when it is down?
The following steps are followed to recover a NameNode –
1. Use the FsImage, which is the file system metadata replica, to start a
   new NameNode.
2. Configure the DataNodes and the clients so that they acknowledge the
   newly started NameNode.
3. Once the new NameNode has finished loading the last checkpoint from the
   FsImage and has received enough block reports from the DataNodes, it
   will start serving clients.
HDFS physically divides the input data into blocks for processing; each of
these units is known as an HDFS block.
Enhance your Big Data skills with the experts. Here is the Complete List of
Big Data Blogs, where you can find the latest news, trends, updates, and
concepts of Big Data.
22. What are the common input formats in Hadoop?
Answer: Hadoop has three common input formats – Text Input Format, which is
the default and treats each line as a value; Key-Value Text Input Format,
which splits each line of a plain text file into a key and a value; and
Sequence File Input Format, which reads files stored in Hadoop's binary
sequence file format.
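As a brief illustration, switching a MapReduce job away from the default
TextInputFormat takes one call on the Job object. The job name and input
directory below are hypothetical placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class InputFormatConfig {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "input format demo");
        // TextInputFormat is the default; KeyValueTextInputFormat instead
        // splits each line into a key and value at the first tab character.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/data/raw")); // hypothetical
        // ... set mapper/reducer classes and an output path before submitting.
    }
}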
[Figure: Core Components of Hadoop]
Blocks are the smallest continuous units of data storage on a hard drive.
In HDFS, blocks are stored distributed across the Hadoop cluster.
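To make this concrete, here is a small Java sketch, with a hypothetical
file path, that prints a file's default block size and the DataNodes
holding each of its blocks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/raw/events.log"); // hypothetical path

        FileStatus status = fs.getFileStatus(file);
        System.out.println("Default block size: " + fs.getDefaultBlockSize(file));

        // Each BlockLocation describes one block and the DataNodes holding it.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + loc.getOffset()
                    + " length=" + loc.getLength()
                    + " hosts=" + String.join(",", loc.getHosts()));
        }
        fs.close();
    }
}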
Hadoop can run in the following three modes –
i. Standalone or local: This is the default mode and does not need any
configuration. In this mode, all of the following Hadoop components use the
local file system and run in a single JVM –
NameNode
DataNode
ResourceManager
NodeManager
ii. Pseudo-distributed: In this mode, all Hadoop services run on a single
node, but each Hadoop daemon runs in its own JVM. It is commonly used for
development and testing.
iii. Fully distributed: In this mode, Hadoop master and slave services are
deployed and executed on separate nodes.
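One way to see which mode a client is configured for is to inspect the
relevant configuration keys, as in this small sketch; in standalone mode
fs.defaultFS defaults to the local file system and
mapreduce.framework.name defaults to local:

import org.apache.hadoop.conf.Configuration;

public class ModeCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // "file:///" indicates the local file system (standalone mode);
        // an hdfs:// URI indicates a pseudo- or fully distributed cluster.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS", "file:///"));
        // "local" runs MapReduce in-process; "yarn" submits to a cluster.
        System.out.println("mapreduce.framework.name = "
                + conf.get("mapreduce.framework.name", "local"));
    }
}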
Prepare yourself for the next Hadoop Job Interview with Top 50 Hadoop
Interview Questions and Answers.
Map phase – In this phase, the input data is divided into splits, and the
map tasks process the splits in parallel, producing intermediate key/value
pairs for analysis.
Reduce phase – In this phase, the intermediate data with the same key is
aggregated from the entire collection, and the final result is produced.
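The canonical illustration of these two phases is word count. The following
is a minimal sketch using the Hadoop Java MapReduce API; the input and
output paths are supplied as command-line arguments:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: each map task processes one input split in parallel,
    // emitting (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: all values that share the same key are aggregated.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}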
38. What are the Port Numbers for NameNode, Task Tracker, and Job
Tracker?
Answer: NameNode – Port 50070; Task Tracker – Port 50060; Job Tracker –
Port 50030.
Hadoop distributed file system (HDFS) uses a specific permissions model for
files and directories. Following user levels are used in HDFS –
Owner
Group
Others
For each of the user levels mentioned above, the following permissions are
applicable –
read (r)
write (w)
execute (x)
For files –
read (r): permission to read the file.
write (w): permission to write or append to the file.
(The execute permission is ignored for files, as HDFS has no concept of
executable files.)
For directories –
read (r): permission to list the contents of the directory.
write (w): permission to create or delete files or sub-directories.
execute (x): permission to access a child of the directory.
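As a short sketch of how these permissions are manipulated
programmatically, with a hypothetical file path, the Java FileSystem API
exposes them through the FsPermission class:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/raw/events.log"); // hypothetical path

        // Owner: read+write, Group: read, Others: none (i.e. mode 640).
        fs.setPermission(file, new FsPermission(
                FsAction.READ_WRITE, FsAction.READ, FsAction.NONE));

        // Read back the owner/group/others permission bits.
        FsPermission perm = fs.getFileStatus(file).getPermission();
        System.out.println("owner=" + perm.getUserAction()
                + " group=" + perm.getGroupAction()
                + " others=" + perm.getOtherAction());
        fs.close();
    }
}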