Cloud & Big Data
Experiential Learning
Submitted to:
Prof. Col. Ravindra Bhate Sir
Henish Kanani
Roll No 42201
Introduction to Big Data
First, we need to understand what data and Big Data are.
Generally speaking, data refers to the numbers, characters, or symbols on which a device
performs operations, and which can be processed, transmitted in the form of electrical signals,
and recorded on mechanical, optical, or magnetic storage media.
Big Data is also data, but of enormous size: the term describes data that is immense in
volume and grows exponentially over time. For example, the New York Stock Exchange
produces about 1 TB of new trading data per day. Facebook generates around 5 TB of data in
a single day, created through uploads of photos and videos, exchanges of messages,
comments, and so on. A single jet engine produces about 10 TB of data within 30 minutes of
flight time; with thousands of flights per day, total data generation reaches many petabytes.
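The jet-engine figure above can be sanity-checked with quick arithmetic. A minimal sketch, assuming 10 TB per engine per 30-minute flight and a purely hypothetical figure of 25,000 flights per day (the source says only "thousands"):

```python
# Back-of-the-envelope check of the jet-engine data volume.
# TB_PER_FLIGHT comes from the text; FLIGHTS_PER_DAY is an assumed
# illustrative number, not a figure from the source.
TB_PER_FLIGHT = 10
FLIGHTS_PER_DAY = 25_000  # hypothetical

daily_tb = TB_PER_FLIGHT * FLIGHTS_PER_DAY   # total terabytes per day
daily_pb = daily_tb / 1024                   # convert TB to PB (binary units)
print(f"{daily_tb} TB/day is about {daily_pb:.0f} PB/day")
```

Even under these rough assumptions, a day of flight data lands in the hundreds of petabytes, which is why the text speaks of "many petabytes".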
Sqoop & Flume
Apache Flume – Apache Flume is a tool / service / data ingestion system for gathering and
transmitting large amounts of streaming data, such as log files and events, from different
sources to a centralized data store.
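A Flume agent is wired together through a properties file that names its source, channel, and sink. A minimal sketch follows; the agent name, file paths, and HDFS URL are illustrative assumptions, not values from the source:

```
# Hypothetical Flume agent "agent1": tails a log file and delivers
# the events to HDFS through an in-memory channel.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Source: stream lines appended to an application log (path is assumed)
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Sink: write events into the centralized store, here HDFS (URL is assumed)
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events
agent1.sinks.sink1.channel = ch1
```

The source-channel-sink split is the core of Flume's design: each piece can be swapped (e.g. a Kafka channel or an HBase sink) without changing the others.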
Apache Hive – Hive is a data warehouse infrastructure tool for processing structured data in
Hadoop. It resides on top of Hadoop to summarize Big Data, and it makes querying and
analysing that data easy.
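Hive exposes this querying through HiveQL, a SQL-like language. A small sketch, assuming a hypothetical table of stock trades (the table name, columns, and HDFS location are illustrative, not from the source):

```sql
-- Hypothetical external table over trade records already stored in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS trades (
  symbol   STRING,
  price    DOUBLE,
  volume   BIGINT,
  trade_ts TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/trades';

-- Summarize Big Data: total traded volume per symbol.
SELECT symbol, SUM(volume) AS total_volume
FROM trades
GROUP BY symbol;
```

Because the table is EXTERNAL, Hive only overlays a schema on files that already sit in HDFS; dropping the table leaves the underlying data untouched.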