This document defines big data and discusses its key characteristics and applications. It begins by defining big data as large volumes of structured, semi-structured, and unstructured data that is difficult to process using traditional methods. It then outlines the 5 Vs of big data: volume, velocity, variety, veracity, and variability. The document also discusses Hadoop as an open-source framework for distributed storage and processing of big data, and lists several applications of big data across various industries. Finally, it discusses both the risks and benefits of working with big data.
2. What is BIG DATA?
• The Oxford English Dictionary (OED) defines big data as:
“data of a very large size, typically to the extent that its
manipulation and management present significant
logistical challenges.”
• Big data is an evolving term that describes any
voluminous amount of structured, semi-structured and
unstructured data that has the potential to be mined for
information. Although big data doesn't refer to any
specific quantity, the term is often used when speaking
about petabytes and exabytes of data.
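To give a sense of the scale involved, the short Python sketch below converts petabytes and exabytes to bytes. It assumes decimal (SI) prefixes, where each step is a factor of 1000; the helper name `to_bytes` is illustrative and not part of the original slides.

```python
# Decimal (SI) prefixes: each step up is a factor of 1000.
# (Assumption: SI units; binary units such as PiB would use powers of 1024.)
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}

def to_bytes(value, unit):
    """Convert a size expressed in the given SI unit to bytes."""
    return value * UNITS[unit]

one_pb = to_bytes(1, "PB")
one_eb = to_bytes(1, "EB")
print(one_pb)            # 1000000000000000
print(one_eb // one_pb)  # 1000 (petabytes in one exabyte)
```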
3. .
• Big Data generates value from the storage and processing of very large quantities of
digital information that cannot be analyzed with traditional computing techniques.
5. Volume:
• Volume is the size of the data. It determines the
value and potential of the data under
consideration, and whether it can be
considered Big Data at all. The name ‘Big
Data’ itself refers to size, which is why volume
is its defining characteristic.
• Big data implies enormous volumes of data.
• Now that data is generated by machines,
networks, and human interaction on systems like
social media, the volume of data to be analyzed is
massive.
6. Velocity:
• In this context, ‘velocity’ refers to the
speed at which data is generated and how
quickly it must be processed to meet the
demands and challenges of growth and
development.
• Big Data Velocity deals with the pace at which
data flows in from sources like business
processes, machines, networks and human
interaction with things like social media sites,
mobile devices, etc.
7. Variety:
• The next aspect of Big Data is its variety:
the category to which the data belongs
(structured, semi-structured, or unstructured).
This is an essential fact for data analysts to know.
• Knowing the category helps the people who
analyze the data closely to use it effectively
to their advantage, which in turn upholds the
importance of Big Data.
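As a rough illustration of the three categories named in the definition of big data (structured, semi-structured, and unstructured), the Python sketch below reads one tiny example of each. All sample data is made up for illustration, and the word count is only a stand-in for real text analysis.

```python
import csv
import io
import json

# Structured: fixed columns, e.g. a CSV export (sample data is made up).
csv_text = "id,amount\n1,9.50\n2,12.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing but flexible, e.g. JSON with optional fields.
json_text = '{"user": "a1", "tags": ["sale", "mobile"]}'
record = json.loads(json_text)

# Unstructured: free text with no schema; here we only count words.
text = "Great product, fast delivery"
word_count = len(text.split())

print(rows[1]["amount"])  # 12.00
print(record["tags"])     # ['sale', 'mobile']
print(word_count)         # 4
```

Each category needs a different reader before any analysis can start, which is why the category is worth knowing up front.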
8. Veracity:
• Big Data Veracity refers to the biases, noise and
abnormality in data. The question is whether the
data being stored and mined is meaningful to the
problem being analyzed.
• The quality of the data being captured can vary
greatly. Accuracy of analysis depends on the
veracity of the source data.
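A minimal sketch of a veracity check, assuming a list of numeric readings in which `None` marks a missing value. The readings, the valid range, and the function name `quality_report` are all hypothetical; real data-quality checks depend on the source and the problem being analyzed.

```python
# Hypothetical sensor readings; None marks a missing value.
readings = [21.5, 22.0, None, 21.8, 950.0, 22.1]

def quality_report(values, low, high):
    """Count missing and out-of-range values in a list of readings."""
    missing = sum(1 for v in values if v is None)
    out_of_range = sum(
        1 for v in values if v is not None and not (low <= v <= high)
    )
    valid = len(values) - missing - out_of_range
    return {"missing": missing, "out_of_range": out_of_range, "valid": valid}

# Assumed plausible range for this made-up sensor: -40 to 60.
report = quality_report(readings, low=-40.0, high=60.0)
print(report)  # {'missing': 1, 'out_of_range': 1, 'valid': 4}
```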
9. Variability:
• Variability refers to the inconsistency which the data can show at times. It can be
a problem for those who analyse the data, because it hampers the process of
handling and managing the data effectively.
11. Storage And Architecture:
• Recent studies suggest that a multiple-layer architecture is one option for
dealing with big data. A distributed parallel architecture distributes data across
multiple processing units, and processing the data in parallel delivers results
much faster.
• This type of architecture inserts data into a parallel DBMS, which uses the
MapReduce and Hadoop frameworks. The framework aims to make the
processing power transparent to the end user through a front-end application
server.
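The MapReduce idea mentioned above can be sketched in a few lines of plain Python. This is a single-process toy under stated assumptions, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative, and a real framework would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

# Each document could be mapped on a different processing unit.
documents = ["big data big value", "data flows fast"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across processing units, which is where the speed-up comes from.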
12. Hadoop
• Hadoop is an open-source software framework, written in Java, for distributed
storage and distributed processing of very large data sets (Big Data) on
computer clusters built from commodity hardware.
• It is designed to scale up from a single server to thousands of machines, with a
very high degree of fault tolerance.
• Hadoop changes the economics and the dynamics of large-scale computing.
15. Government:
The use and adoption of Big Data, within governmental processes, is beneficial and
allows efficiencies in terms of cost, productivity and innovation. That said, this
process does not come without its flaws. Data analysis often requires multiple parts
of government (central and local) to work in collaboration and create new and
innovative processes to deliver the desired outcome. Below are leading
examples from the governmental Big Data space.
India:
• Big data analysis was, in part, responsible for the highly successful result of the
BJP and its allies in the Indian General Election 2014.
• The Indian Government utilises numerous techniques to ascertain how the Indian
electorate is responding to government action, and to gather ideas for policy
augmentation.
16. Risks of Big Data:
#1: Loss of agility
In a typical large-scale organization, data is housed on multiple platforms.
There is transactional data, email data, analytics data, etc. Management wants
people to be able to locate, analyze, and make decisions based on this data
quickly. But if the data isn’t evaluated, organized, and stored properly, critical
information can be either difficult or impossible to find – slowing a business
down at the exact moment when speed is essential.
#2: Loss of compliance
Laws are getting more and more complex with regard to how long companies need
to retain data, how they need to retain it, and where they need to retain it. There are
both general regulations in place as well as state- or industry-specific regulations
that may apply. It is not uncommon for regulators to perform random audits to
examine a company’s policies regarding data and their actual management of that
data. A compliance failure can result in significant fines or reputational
damage.
17. #3: Loss of security
With more data located in and moving between more places than ever before, there
is also a vastly increased number of ways to hack into that data. A security breach
can result in theft, fraud, fines … and, of course, reputational loss.
#4: Loss of money
A server may seem inexpensive at first glance – but never assume that storage is
cheap.
19. Benefits:
• Cost reduction
Big data technologies like Hadoop and cloud-based analytics can provide
substantial cost advantages. While comparisons between big data technology and
traditional architectures (data warehouses and marts in particular) are difficult
because of differences in functionality, a price comparison alone can suggest
order-of-magnitude improvements.
• Faster, better decision making
Analytics has always involved attempts to improve decision making, and big data
doesn’t change that.
20. • New products and services
Perhaps the most interesting use of big data analytics is to create new products and
services for customers. Online companies have done this for a decade or so, but
now predominantly offline firms are doing it too. GE, for example, has made a
major investment in new service models for its industrial products using big data
analytics.