Service-generated big data and big data-as-a-service, by Jyotir Moy
This document provides an overview of service-generated big data and big data-as-a-service. It discusses three types of service-generated big data: service trace logs, service QoS information, and service relationship data. It also describes big data-as-a-service which includes big data infrastructure-as-a-service, platform-as-a-service, and analytics software-as-a-service to provide common big data services and analyze the large volumes of service data. The business opportunities of big data-as-a-service are also briefly discussed.
Introduction to Cloud Computing and Big Data-Hadoop, by Nagarjuna D.N
Cloud Computing Evolution
Why is Cloud Computing needed?
Cloud Computing Models
Cloud Solutions
Cloud Jobs opportunities
Criteria for Big Data
Big Data challenges
Technologies to process Big Data- Hadoop
Hadoop History and Architecture
Hadoop Eco-System
Hadoop Real-time Use cases
Hadoop Job opportunities
Hadoop and SAP HANA integration
Summary
Core Concepts and Key Technologies - Big Data Analytics, by Kaniska Mandal
Big data analytics has evolved beyond batch processing with Hadoop to extract intelligence from data streams in real time. New technologies preserve data locality, allow real-time processing and streaming, support complex analytics functions, provide rich data models and queries, optimize data flow and queries, and leverage CPU caches and distributed memory for speed. Frameworks like Spark and Shark improve on MapReduce with in-memory computation and dynamic resource allocation.
Big data refers to datasets that are too large to be managed by traditional database tools. It is characterized by volume, velocity, and variety. Hadoop is an open-source software framework that allows distributed processing of large datasets across clusters of computers. It works by distributing storage across nodes as blocks and distributing computation via a MapReduce programming paradigm where nodes process data in parallel. Common uses of big data include analyzing social media, sensor data, and using machine learning on large datasets.
The document outlines an agenda for a presentation on big data. It discusses key topics like the state of big data adoption, a holistic approach to big data, five high value use cases, technical components, and the future of big data and cloud. The presentation aims to provide an overview of big data and how organizations can take a comprehensive approach to leveraging their data assets.
Great Expectations is an open-source Python library that helps validate, document, and profile data to maintain quality. It allows users to define expectations about data that are used to validate new data and generate documentation. Key features include automated data profiling, predefined and custom validation rules, and scalability. It is used by companies like Vimeo and Heineken in their data pipelines. While helpful for testing data, it is not intended as a data cleaning or versioning tool. A demo shows how to initialize a project, validate sample taxi data, and view results.
Big Data & Analytics continues to redefine business. Data has transitioned from an underused asset to the lifeblood of the organisation, and a critical component of business intelligence, insight and strategy.
Big Data Scotland is the largest annual data analytics conference held in Scotland: it is supported by ScotlandIS and The Data Lab and free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.
The programme will explore the evolution of data analytics; looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.
Key Topics
• Tools and techniques
• Corporate data culture, business processes, digital transformation
• Business intelligence, trends, decision making
• AI, Real-time Analytics, IoT, Industry 4.0, Robotics
• Security, regulation, privacy, consent, anonymization
• Data visualisation, interpretation and communication
• CRM and Personalisation
The 3 V's of Big Data: Variety, Velocity, and Volume (from Structure:Data 2012), by Gigaom
The document discusses the 3 V's of big data: volume, velocity, and variety. It provides examples of how each V impacts data analysis and storage. It also discusses how text data has been a major driver of big data growth and challenges. The key challenges are processing large and diverse datasets quickly enough to keep up with real-time data streams and demands.
This report examines the rise of big data and analytics used to analyze large volumes of data. It is based on a survey of 302 BI professionals and interviews. Most organizations have implemented analytical platforms to help analyze growing amounts of structured data. New technologies also analyze semi-structured data like web logs and machine data. While reports and dashboards serve casual users, more advanced analytics are needed for power users to fully leverage big data.
Guest Lecture: Introduction to Big Data at Indian Institute of Technology, by Nishant Gandhi
This document provides an introduction to big data, including definitions of big data and why it is important. It discusses characteristics of big data like volume, velocity, variety and veracity. It provides examples of big data applications in various industries like GE, Boeing, social media, finance, CERN, journalism, politics and more. It also introduces NoSQL and the CAP theorem, and concludes that big data is changing business and technology by enabling new insights from data to reduce costs and optimize operations.
This document discusses big data, including the large amounts of data being collected daily, challenges with traditional DBMS solutions, the need for new approaches like Hadoop and Aster Data to handle large volumes of structured and unstructured data, techniques for analyzing big data, and case studies of companies like Mobclix and Yahoo using big data solutions.
Implementation of Big Data infrastructure and technology can be seen in various industries such as banking, retail, insurance, healthcare and media. Big Data management functions like storage, sorting, processing and analysis of such colossal volumes cannot be handled by existing database systems or technologies. Frameworks come into the picture in such scenarios. Frameworks are toolsets that offer innovative, cost-effective solutions to the problems posed by Big Data processing; they help provide insights, incorporate metadata and aid decision making aligned to business needs.
Disclaimer :
The images, company, product and service names that are used in this presentation, are for illustration purposes only. All trademarks and registered trademarks are the property of their respective owners.
Data and images were collected from various sources on the Internet.
The intention was to present the big picture of Big Data & Hadoop.
This document outlines the course content for a Big Data Analytics course. The course covers key concepts related to big data including Hadoop, MapReduce, HDFS, YARN, Pig, Hive, NoSQL databases and analytics tools. The 5 units cover introductions to big data and Hadoop, MapReduce and YARN, analyzing data with Pig and Hive, and NoSQL data management. Experiments related to big data are also listed.
At the Technology Trends seminar, with HCMC University of Polytechnics' lecturers, KMS Technology's CTO delivered a topic of Big Data, Cloud Computing, Mobile, Social Media and In-memory Computing.
This document provides an overview of big data concepts and technologies. It discusses the growth of data, characteristics of big data including volume, variety and velocity. Popular big data technologies like Hadoop, MapReduce, HDFS, Pig and Hive are explained. NoSQL databases like Cassandra, HBase and MongoDB are introduced. The document also covers massively parallel processing databases and column-oriented databases like Vertica. Overall, the document aims to give the reader a high-level understanding of the big data landscape and popular associated technologies.
This document provides an overview of big data, including its definition, characteristics, sources, tools, applications, risks, benefits and future. Big data is characterized by large volumes of data in various formats that are difficult to process using traditional data management and analysis systems. It is generated from sources like user interactions, sensors and systems logs. Tools like Hadoop and NoSQL databases enable storing, processing and analyzing big data. Organizations apply big data analytics to areas such as healthcare, retail and security. While big data poses privacy and management challenges, it also provides opportunities to gain insights and make improved decisions. The big data industry is growing rapidly and expected to be worth over $100 billion.
This document provides an overview of big data including:
- It defines big data and discusses its key characteristics of volume, velocity, and variety.
- It describes sources of big data like social media, sensors, and user clickstreams. Tools for big data include Hadoop, MongoDB, and cloud computing.
- Applications of big data analytics include smarter healthcare, traffic control, and personalized marketing. Risks include privacy and high costs. Benefits include better decisions, opportunities for new businesses, and improved customer experiences.
- The future of big data is strong, with worldwide revenues projected to grow from $5 billion in 2012 to over $50 billion in 2017, creating millions of new jobs for data scientists and analysts.
Content1. Introduction2. What is Big Data3. Characte.docx, by dickonsondorris
Content
1. Introduction
2. What is Big Data
3. Characteristic of Big Data
4. Storing, selecting and processing of Big Data
5. Why Big Data
6. How it is Different
7. Big Data sources
8. Tools used in Big Data
9. Application of Big Data
10. Risks of Big Data
11. Benefits of Big Data
12. How Big Data Impact on IT
13. Future of Big Data
Introduction
• Big Data may well be the Next Big Thing in the IT world.
• Big data burst upon the scene in the first decade of the 21st century.
• The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
• Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings.
• 'Big Data' is similar to 'small data', but bigger in size.
• Because the data is bigger, it requires different approaches: techniques, tools and architecture.
• The aim is to solve new problems, or old problems in a better way.
• Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
What is BIG DATA?
• Walmart handles more than 1 million customer transactions every hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
Three Characteristics of Big Data (the 3 V's)
• Volume: data quantity
• Velocity: data speed
• Variety: data types
1st Characteristic of Big Data: Volume
• A typical PC might have had 10 gigabytes of storage in 2000.
• Today, Facebook ingests 500 terabytes of new data every day.
• A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.
• Smartphones, the data they create and consume, and sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
2nd Characteristic of Big Data: Velocity
• Clickstreams and ad impressions capture user behavior at millions of events per second.
• High-frequency stock trading algorithms reflect market changes within microseconds.
• Machine-to-machine processes exchange data between billions of devices.
• Infrastructure and sensors generate massive log data in real time.
• Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
3rd Characteristic of Big Data: Variety
• Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
• Traditional database systems were designed to address smaller volumes of structured data, fewer updates, and a predictable, consistent data structure.
This document provides an overview of big data. It defines big data as large volumes of diverse data that are growing rapidly and require new techniques to capture, store, distribute, manage, and analyze. The key characteristics of big data are volume, velocity, and variety. Common sources of big data include sensors, mobile devices, social media, and business transactions. Tools like Hadoop and MapReduce are used to store and process big data across distributed systems. Applications of big data include smarter healthcare, traffic control, and personalized marketing. The future of big data is promising with the market expected to grow substantially in the coming years.
This document provides an overview of big data including:
- It defines big data and describes its three key characteristics: volume, velocity, and variety.
- It explains how big data is stored, selected, and processed using techniques like Hadoop and NoSQL databases.
- It discusses some common sources of big data, tools used to analyze it, and applications of big data analytics across different industries.
This document provides an overview of big data and Hadoop. It defines big data as high-volume, high-velocity, and high-variety data that requires new techniques to capture value. Hadoop is introduced as an open-source framework for distributed storage and processing of large datasets across clusters of computers. Key components of Hadoop include HDFS for storage and MapReduce for parallel processing. Benefits of Hadoop are its ability to handle large amounts of structured and unstructured data quickly and cost-effectively at large scales.
Every day we create roughly 2.5 quintillion bytes of data; 90% of the world's collected data has been generated in only the last 2 years. This slide deck covers all about big data in a simple and easy way.
Big Data Analytics & Trends Presentation discusses what big data is, why it's important, definitions of big data, data types and landscape, characteristics of big data like volume, velocity and variety. It covers data generation points, big data analytics, example scenarios, challenges of big data like storage and processing speed, and Hadoop as a framework to solve these challenges. The presentation differentiates between big data and data science, discusses salary trends in Hadoop/big data, and future growth of the big data market.
This document provides an overview of big data, including its definition, characteristics, sources, tools, applications, risks, benefits and future. Big data is characterized by its volume, velocity and variety. It is generated from sources like users, applications, sensors and more. Tools like Hadoop and databases are used to store, process and analyze big data. Big data analytics can provide benefits across many industries and applications. However, it also poses risks around privacy, costs and skills that must be addressed. The future of big data is promising, with the market expected to grow significantly in the coming years.
This document provides an overview of big data presented by five individuals. It defines big data, discusses its three key characteristics of volume, velocity and variety. It explains how big data is stored, selected and processed using techniques like Hadoop and MapReduce. Examples of big data sources and tools are provided. Applications of big data across various industries are highlighted. Both the risks and benefits of big data are summarized. The future growth of big data and its impact on IT is also outlined.
This document discusses big data, including what it is, common data sources, its volume, velocity and variety characteristics, solutions like Hadoop and its HDFS and MapReduce components, and the impact and future of big data. It explains that big data refers to large and complex datasets that are difficult to process using traditional tools. Hadoop provides a framework to store and process big data across clusters of commodity hardware.
This document provides an overview of big data and Hadoop. It defines big data as large volumes of structured, semi-structured and unstructured data that is growing exponentially and is too large for traditional databases to handle. It discusses the 4 V's of big data - volume, velocity, variety and veracity. The document then describes Hadoop as an open-source framework for distributed storage and processing of big data across clusters of commodity hardware. It outlines the key components of Hadoop including HDFS, MapReduce, YARN and related modules. The document also discusses challenges of big data, use cases for Hadoop and provides a demo of configuring an HDInsight Hadoop cluster on Azure.
This document provides an overview of big data in a seminar presentation. It defines big data, discusses its key characteristics of volume, velocity and variety. It describes how big data is stored, selected and processed. Examples of big data sources and tools used are provided. The applications and risks of big data are summarized. Benefits to organizations from big data analytics are outlined, as well as its impact on IT and future growth prospects.
Bigdata.
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term "big data" often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem."[2] Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on."[3] Scientists, business executives, practitioners of medicine, advertising and governments alike regularly meet difficulties with large data-sets in areas including Internet search, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[4] connectomics, complex physics simulations, biology and environmental research.[5]
Data sets grow rapidly - in part because they are increasingly gathered by cheap and numerous information-sensing Internet of things devices such as mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.[6][7] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[8] as of 2012, every day 2.5 exabytes (2.5×10^18 bytes) of data are generated.[9] One question for large enterprises is determining who should own big-data initiatives that affect the entire organization.[10]
Relational database management systems and desktop statistics- and visualization-packages often have difficulty handling big data. The work may require "massively parallel software running on tens, hundreds, or even thousands of servers".[11] What counts as "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."
1) Big data is being generated from many sources like web data, e-commerce purchases, banking transactions, social networks, science experiments, and more. The volume of data is huge and growing exponentially.
2) Big data is characterized by its volume, velocity, variety, and value. It requires new technologies and techniques for capture, storage, analysis, and visualization.
3) Analyzing big data can provide valuable insights but also poses challenges related to cost, integration of diverse data types, and shortage of data science experts. New platforms and tools are being developed to make big data more accessible and useful.
This presentation provides an overview of big data. It introduces the group members presenting and defines big data as large amounts of data that cannot be analyzed using traditional methods due to the volumes of data and the speed at which it is generated and needs to be processed. It describes the three main characteristics of big data as volume, velocity and variety. It also discusses storing, selecting, processing and analyzing big data and provides examples of big data sources and tools used. Potential applications and risks/benefits of big data are also summarized.
This document outlines a seminar presentation on big data. It begins with an introduction that defines big data and notes how it emerged in the early 21st century mainly through online firms. It then covers the three key characteristics of big data - volume, velocity and variety. Other sections discuss storing, selecting and processing big data, as well as tools used and applications. Risks, benefits and the future impact and growth of big data are also summarized. The presentation provides an overview of the key concepts regarding big data.
This document provides an overview of big data, including its definition, characteristics, sources, tools, applications, risks and benefits. It defines big data as large volumes of diverse data that can be analyzed to reveal patterns and trends. The three key characteristics are volume, velocity and variety. Examples of big data sources include social media, sensors and user data. Tools used for big data include Hadoop, MongoDB and analytics programs. Big data has many applications and benefits but also risks regarding privacy and regulation. The future of big data is strong with the market expected to grow significantly in coming years.
Big Data in Action : Operations, Analytics and more
1. Big Data in Action: Operations, Analytics and more
2. Agenda
• Meet & Greet Introduction.
• Unfolding the term “Big Data”.
– Evolution of Data to Big Data : Static to Stream.
– 3 V’s of Big Data.
• Overview of Implementing Big Data
– Examples of implementation of Big Data
– Implementing Big Data with Hadoop infrastructure
– Implementing Big Data with NoSQL databases like Cassandra & MongoDB.
• Advantages of implementing Big Data solutions.
• Open Forum Discussion/ Networking.
3. Vibhu Bhutani
Technical Project Manager
Started as a Java developer, I have many years of experience in developing and managing
state-of-the-art applications. With extensive experience across the phases of the SDLC model, I
lead the innovations & mobile excellence team at Softweb Solutions. I am involved in
various innovative implementations, including Big Data systems,
IoT implementations and iBeacon development at Softweb Solutions.
in/vibhuis
Welcome
4. Unfolding the Term Big Data
• IBM reported in a study that every day we create roughly 2.5 quintillion bytes of data from various sources such as climate sensors, GPS signals, social media and online transactions, of which 90% was created in the last couple of years. Big Data is a buzzword for technology that shows the potential to process huge amounts of data so that we can get valuable information out of it.
• How old is Big Data?
– It's as old as data itself; however, the parameters change every year. In 2012 it was about a couple of petabytes, and now it is about a few exabytes.
• Why do we now hear about Big Data?
– Although big data is old, nowadays more industries are becoming aware of its implications. In 2004 Google published a paper explaining the MapReduce technique for analyzing large datasets. After that, many other companies joined in and the buzzword Big Data came into existence.
• Static data VS Dynamic Data
5. Evolution of Data
Strange fact: with 76 KB of hardwired memory, NASA successfully took men to the Moon and brought them back. With an 8 GB iPhone, it could be done 108 times.
10. Application of Big Data - Cern
• In the 1960s CERN stored data on a mainframe computer.
• In the 1970s CERN distributed data across several machines, dividing the mainframe into smaller pieces of equipment; a CERN network was introduced to bridge these machines and reduce travel.
• In the 1980s these machines were placed in different countries across the US and Europe, and the Internet was introduced to connect them.
• Due to the enormous increase in data, in 2000 a CERN grid was introduced, connecting different smaller computers together to analyze and process the data.
• Detectors with 150 million sensors are used in the LHC, where protons collide at near light speed; they work like a 3D camera, taking pictures at a rate of 40 million times per second. The data is now stored in the cloud and analyzed using big data techniques.
11. Implementation of Big Data - Cern
Proton injection for collision. Collision of particles recording data in sensors.
12. Other Industries using Big Data
• Government Application:
– The US government has invested heavily in big data applications. Big data analysis played a large role in Barack Obama's successful 2012 re-election campaign.
– The Utah Data Center is a data center being constructed by the United States National Security Agency. The exact amount of storage space is unknown, but recent sources claim it will be on the order of a few exabytes.
– Big data analysis was partly responsible for the BJP and its allies winning the 2014 Indian general election.
– The UK government is using big data to improve weather forecasting and new drug release forecasts.
• Manufacturing Industries:
• Vast amounts of sensory data such as acoustics, vibration, pressure, current, voltage and controller data, in addition to historical data, constitute big data in manufacturing. This big data acts as the input to predictive tools and preventive strategies.
• Technology Industries:
• eBay and Amazon are industry leaders in maintaining large amounts of user searches and predictive analysis. This helps identify user needs and provide users with better results.
• Retail Industries:
• Walmart holds about 2.5 petabytes of data and handles 1 million customer transactions every hour.
• Amazon transacts about USD 80,000 per hour and has three of the world's largest databases.
13. Big Data Solutions - Hadoop
• Hadoop is an open-source system to reliably store and process lots of information.
• It is a Big Data solution that handles the complexity involved in the volume, variety and velocity of data.
• It turns commodity hardware into services that can handle petabytes of data in distributed environments: "Pigeon Computing".
• Hadoop is redundant, reliable, powerful, batch-process-centric and distributed.
16. Hadoop Implementation in Real World
• Yahoo:
– In 2008, Yahoo claimed the world's largest Hadoop production application. The Yahoo Search Webmap is a Hadoop application that runs on Linux with more than 10,000 cores.
• Facebook:
– In 2010, Facebook claimed that it had the largest Hadoop cluster in the world, with 21 PB of storage. On June 13, 2012 it announced the data had grown to 100 PB. On November 8, 2012 it announced the data gathered in the warehouse grows by roughly half a PB per day.
• As of 2013, Hadoop adoption is widespread. For example, more than half of the Fortune 50 use Hadoop.
• The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4 TB of raw image TIFF data (stored in S3) into 11 million finished PDFs in the space of 24 hours, at a computation cost of about $240 (not including bandwidth).
18. Introduction to NoSQL
• A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.
• Types of NoSQL Databases:
– Column: Cassandra, HBase
– Document: Apache CouchDB, MongoDB
– Key-value: Dynamo, Redis
– Graph: Neo4J
– Multi-model: OrientDB, Alchemy Database, CortexDB
19. High Level Architecture - Cassandra
• Ring based replication
• Only 1 type of server (Cassandra)
• All nodes hold data and can answer queries
• No Single Point of Failure
• Built for HA & scalability
• Multi-DC
• Data is found by key (CQL)
• Runs on JVM
21. High Level Architecture - Cassandra
Example: Single Row Partition
• Simple User system
• Identified by name (pk)
• 1 Row per partition
22. High Level Architecture - Cassandra
Example: Multiple Rows
• Comments on photos
• Comments are always selected by
the photo_id
• There are only 4 rows in 2 partitions
23. High Level Architecture - Cassandra
• Multiple rows are transposed into a single partition
• Partitions vary in size
• Old terminology - "wide row"
• Cassandra is built for fast writes. The data model should be denormalized so that as few reads as possible are needed (a CQL sketch follows below).
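To make the partitioning ideas above concrete, here is a minimal sketch in CQL executed through the DataStax Python driver. The keyspace name, replication settings and column names are illustrative assumptions rather than details taken from the slides.

```python
from cassandra.cluster import Cluster  # DataStax Python driver for Apache Cassandra

cluster = Cluster(["127.0.0.1"])       # assumes a local, single-node test cluster
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("demo")

# Single-row partition: one user per partition, identified by name (the partition key).
session.execute("""
    CREATE TABLE IF NOT EXISTS users (
        name text PRIMARY KEY,
        email text
    )
""")

# Multiple rows per partition: all comments for a photo share one partition
# (photo_id is the partition key, comment_id the clustering column), so
# "comments for photo X" is a single-partition read -- the old "wide row".
session.execute("""
    CREATE TABLE IF NOT EXISTS photo_comments (
        photo_id uuid,
        comment_id timeuuid,
        author text,
        body text,
        PRIMARY KEY (photo_id, comment_id)
    )
""")

session.execute("INSERT INTO users (name, email) VALUES (%s, %s)", ("vibhu", "vibhu@example.com"))
print(list(session.execute("SELECT * FROM users WHERE name = %s", ("vibhu",))))
```

Denormalizing comments under photo_id trades extra writes for reads that always hit a single partition, which matches the write-optimized modelling advice above.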
24. High Level Architecture – Mongo DB
• Open-source, Document-oriented, popular for its
agile and scalable approach
• Notable Features :
– JSON/BSON data model with dynamic schema
– Auto-sharding for horizontal scalability
– Built-in replication with automated fail-overs
– Full, flexible index support including secondary
indexes
– Rich document-based queries
– Aggregation framework and Map / Reduce
– GridFS for large file storage
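A minimal sketch of the dynamic-schema document model, a secondary index and the aggregation framework described above, using PyMongo; the connection string, database and collection names are assumptions made for illustration.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumes a local mongod instance
db = client["demo"]

# Dynamic schema: documents in the same collection need not share the same fields.
db.photos.insert_many([
    {"photo_id": 1, "author": "alice", "tags": ["travel", "beach"], "likes": 12},
    {"photo_id": 2, "author": "bob", "location": {"lat": 46.6, "lon": 14.3}},
])

# Secondary index plus a rich document-based query.
db.photos.create_index("author")
print(db.photos.find_one({"author": "alice"}))

# Aggregation framework: count photos per author.
pipeline = [{"$group": {"_id": "$author", "photos": {"$sum": 1}}}]
print(list(db.photos.aggregate(pipeline)))
```

For large binary objects the same client exposes GridFS (gridfs.GridFS(db)), which chunks files across documents rather than storing them inline.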
25. High Level Architecture – Mongo DB
• Ensures High Availability, Redundancy, Automated
Fail-over
• Writes to the Primary, Reads from all
• Asynchronous replication
• In conventional terms, more like Master/Slave
replication
• Members can be configured to be: Secondary-only / Non-voting / Hidden / Arbiters / Delayed
26. When to use : Mongo DB
• Unstructured data from multiple suppliers
• GridFS : Stores large binary objects
• Spring Data Services
• Embedding and linking documents
• Easy replication set up for AWS
4. Example of streaming data: consider an application that searches for certain text in the emails we send. Emails can be treated as a stream of data; algorithms identify text matching specific patterns and send an alert if something is found. Nowadays many government agencies are working on this kind of system; a small sketch follows below.
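To make this concrete, here is a minimal sketch of such a pattern-matching alert loop in Python. It is an illustration only: the watch patterns, the message source and the send_alert helper are hypothetical placeholders, not something shown on the slides.

```python
import re

# Hypothetical watch list: patterns to flag in the incoming message stream.
WATCH_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"wire transfer", r"password reset")]

def send_alert(message, pattern):
    """Placeholder alert sink; a real system might page an analyst or write to a queue."""
    print(f"ALERT: pattern {pattern.pattern!r} matched: {message[:60]}")

def scan_stream(messages):
    """Treat incoming emails as an unbounded stream and test each one as it arrives."""
    for message in messages:              # `messages` can be any iterator, e.g. a queue consumer
        for pattern in WATCH_PATTERNS:
            if pattern.search(message):
                send_alert(message, pattern)

# Tiny in-memory "stream" for demonstration:
scan_stream(iter(["Quarterly report attached", "Please confirm the wire transfer today"]))
```

The same loop shape applies whether the stream comes from a message queue, a log tailer or a streaming framework; only the iterator feeding scan_stream changes.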
5. The image shows how data storage evolved. Archaeological findings show that around 2000 BC the Phaistos Disc was used to store information. These were clay discs that embedded data and stored it for a long period of time. Later, people wrote things in pyramids, followed by stone tablets.
6. Necessity is the mother of invention. The human brain always wants to know more, and to know more we need to process more. The information era gave us the data, and to process this data we created big data.
7. The characteristics of Big Data consist of the 3 V's: Volume, Variety and Velocity. Volume represents the bulk and size of data. Every decade the definition of big data changes; previously it was hard to store KBs of data, but now we store huge amounts of data on a smartphone. The image shows the amount of data being stored in different parts of the world.
Next comes Variety, the categorization of big data. By categorizing data we make it easy for data analysts to group interdependent data and derive some advantage from it.
8. Velocity represents the speed at which this data is generated; the image shows how fast we generate it.
It is worth thinking about what happens with this enormous amount of data we are generating, and this leads to the 4th V.
9. Value: what is the value of analyzing the data? The image shows how various industries are utilizing and analyzing this data. Apart from the monetary benefits, many other fields such as machine learning, scientific experiments and medicine benefit from Big Data.
10. In 1962, Arthur Samuel wrote a computer program to play checkers. The program was defeated initially, but later Samuel wrote a subprogram to analyze the board and compute the plays for winning. When the subprogram was linked with the checkers program, the computer started to win. This was an early instance of artificial intelligence, where the data generated by the computer was recorded and used to plan its moves.
12. Some car manufacturers are gathering data from sensors in the driver's seat; they identify the pattern when a driver feels sleepy and inform the driver by vibrating the steering wheel. The same technology is being used to detect theft based on sitting patterns.
14. MapReduce is the processing part; it runs the computation and returns the results.
The second part is HDFS. It stores all the data, as files and directories, and is highly scalable and distributed.
15. This is the classic MapReduce program for word count.
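The word-count program itself appears only as an image on the slide. As a stand-in, here is a minimal sketch of the same idea written as Hadoop Streaming mapper and reducer scripts in Python (the slide most likely showed the Java MapReduce version); file names and paths are illustrative.

```python
#!/usr/bin/env python3
"""Classic word count as Hadoop Streaming scripts: run with argument "mapper" or "reducer".

Hadoop Streaming pipes each input split to the mapper's stdin and the sorted,
shuffled key/value lines to the reducer's stdin, one record per line.
"""
import sys
from itertools import groupby

def mapper(stream):
    # Emit "<word>\t1" for every word in the input split.
    for line in stream:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer(stream):
    # Input arrives sorted by key, so consecutive lines for the same word can be summed.
    parsed = (line.rstrip("\n").split("\t", 1) for line in stream)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "mapper"
    (mapper if mode == "mapper" else reducer)(sys.stdin)
```

Locally the job can be simulated with a shell pipeline such as `cat input.txt | python3 wordcount.py mapper | sort | python3 wordcount.py reducer`, which mirrors the map, shuffle/sort and reduce phases.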
17. In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
18. A column in a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple consisting of three elements: a unique name, a value and a timestamp.
Document: a trivial example would be scanning paper documents, extracting the title, author, and date from them either by OCR or by having a human locate and enter them, and storing each document in a 4-column relational database, the columns being author, title, date, and a blob full of page images.
Key-value: an associative array (map, symbol table, or dictionary) is an abstract data type composed of a collection of key-value pairs, such that each possible key appears just once in the collection.
Graph: a graph database uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data.
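As a tiny illustration of how these data models differ, here is a sketch using plain Python structures; it is purely illustrative and not tied to any particular database product.

```python
import time
from collections import namedtuple

# Column family model: the lowest-level object is a (name, value, timestamp) triple inside a row.
Column = namedtuple("Column", ["name", "value", "timestamp"])
row = {"user:42": [Column("email", "ada@example.com", time.time()),
                   Column("city", "London", time.time())]}

# Key-value model: an associative array in which each key appears exactly once.
kv_store = {"session:9f3a": {"user_id": 42, "expires": 1735689600}}

# Document model: the value is a self-describing document (here a nested dict).
document = {"title": "Big Data in Action", "author": "Softweb Solutions", "tags": ["hadoop", "nosql"]}

# Graph model: nodes plus labelled edges carrying properties.
nodes = {1: {"label": "User", "name": "Ada"}, 2: {"label": "Photo", "photo_id": 77}}
edges = [(1, "LIKES", 2, {"at": "2014-06-01"})]

print(row["user:42"][0].name, list(kv_store)[0], document["title"], edges[0][1])
```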
27. That is not to say there are no disadvantages:
Issues with finding the right talent.
Issues with finding the proper use case.
Impact on white-collar jobs due to the high demand for data scientists.
Analyzing and separating good data from Big Data.