Data and Information
For example: when you visit a website, it might store your IP address; that is data.
In return, it might add a cookie to your browser to mark that you visited the
website; that is data too. Your name is data, and so is your age.
• What is Data?
The quantities, characters, or symbols on which operations are performed by a computer, which may be
stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical
recording media.
Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. For
example, the volume of data that Facebook or YouTube must collect and manage on a daily basis falls
under the category of Big Data. However, Big Data is not only about scale and volume; it also involves one or
more of the following aspects: Velocity, Variety, Volume, and Complexity.
Big Data is also data, but of a huge size. Big Data is a term used to describe a collection of data that is huge in
volume and yet growing exponentially with time. In short, such data is so large and complex that none of the
traditional data management tools are able to store it or process it efficiently. “Extremely large data sets that
may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human
behavior and interactions, are known as Big Data.”
BIG DATA:
1. Data is defined as the quantities, characters, or symbols on which operations are performed by a computer.
2. Data may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or
mechanical recording media.
4. Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with
time.
5. In short such data is so large and complex that none of the traditional data management tools are able to store
it or process it efficiently.
Velocity essentially refers to the speed at which data is being created in real time. From a broader perspective, it
comprises the rate of change, the linking of incoming data sets at varying speeds, and activity bursts.
We already know that Big Data indicates huge ‘volumes’ of data that are generated on a daily basis from
various sources like social media platforms, business processes, machines, networks, human interactions, etc.
Such large amounts of data are stored in data warehouses.
Big Data sources:
Users
Sensors
Applications
Systems
TYPES:
I) Structured:
1. Any data that can be stored, accessed, and processed in a fixed format is termed
Structured Data.
2. It accounts for about 20% of the total existing data and is used the most in programming and
computer-related activities.
3. There are two sources of structured data - machines and humans.
4. All the data received from sensors, weblogs, and financial systems are classified under machine-
generated data.
5. This includes data from medical devices, GPS data, and usage statistics captured by servers and applications.
6. Human-generated structured data mainly includes all the data a human inputs into a computer, such as a
name and other personal details.
7. When a person clicks a link on the internet, or even makes a move in a game, data is created.
8. Example: An 'Employee' table in a database is an example of Structured Data.
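The 'Employee' table example can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column names and the sample rows are assumptions made for the sketch, not part of the notes.

```python
import sqlite3

# In-memory database; table columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "department TEXT NOT NULL, salary REAL NOT NULL)"
)
rows = [
    (1, "Asha", "Finance", 52000.0),
    (2, "Ravi", "Engineering", 61000.0),
    (3, "Meena", "Engineering", 58000.0),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", rows)

# Every row follows the same fixed schema, so it can be queried by column.
engineers = conn.execute(
    "SELECT name FROM Employee WHERE department = ? ORDER BY emp_id",
    ("Engineering",),
).fetchall()
print(engineers)
```

Because the format is fixed in advance, the database can reject rows that do not fit it and can answer column-based queries directly.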
III) Semi-Structured:
I) Variety:
1. Variety of Big Data refers to structured, unstructured, and semi-structured data that is
gathered from multiple sources.
2. The type and nature of the data vary greatly.
3. In earlier days, spreadsheets and databases were the only sources of data considered
by most applications.
4. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio,
etc. is also considered in analysis applications.
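As a rough illustration of variety, the sketch below holds similar information in three forms: a CSV row (structured), a JSON record (semi-structured), and a free-text sentence (unstructured). The field names and values are made up for the example.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "emp_id,name,age\n1,Asha,29\n"
structured = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but fields may vary between records.
json_text = '{"emp_id": 1, "name": "Asha", "skills": ["SQL", "Python"]}'
semi_structured = json.loads(json_text)

# Unstructured: free text with no predefined fields.
unstructured = "Asha joined the finance team last year and works with SQL."

print(structured[0]["name"])
print(semi_structured["skills"])
print(unstructured)
```

Only the first two forms can be read field by field; the free text would need further processing before it could be analyzed in the same way.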
II) Velocity:
I) Programmable:
1. With Big Data, it is possible to explore all types of data using programming logic.
2. Programming can be used to perform any kind of exploration because of the scale of the
data.
IV) Veracity:
6. Collecting loads and loads of data is of no use if the quality and trustworthiness of the
data is not up to the mark.
II) Academia
1. Big Data is also helping enhance education today.
2. Education is no longer limited to the physical bounds of the classroom; there are
numerous online educational courses to learn from.
3. Academic institutions are investing in digital courses powered by Big Data technologies
to aid the all-round development of budding learners.
III) Banking
1. The banking sector relies on Big Data for fraud detection.
2. Big Data tools can efficiently detect fraudulent acts in real-time such as misuse of
credit/debit cards, archival of inspection tracks, faulty alteration in customer stats, etc.
IV) Manufacturing
1. According to the TCS Global Trend Study, the most significant benefit of Big Data in
manufacturing is improving supply strategies and product quality.
2. In the manufacturing sector, Big Data helps create a transparent infrastructure, thereby
helping to predict uncertainties and incompetencies that can affect the business adversely.
V) IT
1. One of the largest users of Big Data, IT companies around the world are using Big Data
to optimize their functioning, enhance employee productivity, and minimize risks in
business operations.
2. By combining Big Data technologies with ML and AI, the IT sector is continually
powering innovation to find solutions even for the most complex of problems.
Challenges of Big Data
The following are among the most important challenges of Big Data:
a) Meeting the need for speed: In today’s hypercompetitive business environment, companies
not only have to find and analyze the relevant data they need, they must find it quickly.
b) Visualization helps organizations perform analyses and make decisions much more rapidly,
but the challenge is going through the sheer volumes of data and accessing the level of detail
needed, all at a high speed.
c) The challenge only grows as the degree of granularity increases. One possible solution is
hardware. Some vendors are using increased memory and powerful parallel processing to crunch
large volumes of data extremely quickly.
It takes a lot of understanding to get data in the RIGHT SHAPE so that you can use
visualization as part of data analysis.
Even if you can find and analyze data quickly and put it in the proper context for the
audience that will be consuming the information, the value of data for DECISION
MAKING PURPOSES will be jeopardized if the data is not accurate or timely.
Plotting points on a graph for analysis becomes difficult when dealing with extremely
large amounts of information or a variety of categories of information.
For example, imagine you have 10 billion rows of retail SKU data that you’re trying to
compare. The user trying to view 10 billion plots on the screen will have a hard time
seeing so many data points.
By grouping the data together, or “binning,” you can more effectively visualize the data.
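A minimal sketch of binning, assuming fixed-width bins and a small made-up list of prices standing in for the much larger SKU data set. The function name `bin_values` is hypothetical.

```python
from collections import Counter

def bin_values(values, bin_width):
    """Count how many values fall into each fixed-width bin."""
    counts = Counter()
    for v in values:
        # A bin is identified by its lower edge, e.g. 37 -> 30 for width 10.
        lower_edge = (v // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Hypothetical unit prices; a real data set would be far larger.
prices = [3, 7, 12, 18, 19, 25, 31, 34, 38, 95]
binned = bin_values(prices, bin_width=10)

# Draw one bar per bin instead of one point per row.
for lower, count in binned.items():
    print(f"{lower}-{lower + 9}: {'#' * count}")
```

However many rows the input has, the chart only needs one bar per bin, which is what keeps the visualization readable.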
1. Need For Synchronization Across Disparate Data Sources: As data sets are becoming bigger and
more diverse, there is a big challenge to incorporate them into an analytical platform. If this is
overlooked, it will create gaps and lead to wrong messages and insights.
2. Acute Shortage Of Professionals Who Understand Big Data Analysis: Analysis is what makes the
voluminous data produced every minute useful. With the exponential rise of data, a huge demand
for Big Data scientists and analysts has been created in the market, yet there is a sharp shortage
of such professionals relative to the massive amount of data being produced. Because the job of a
data scientist is multidisciplinary, it is important for business organizations to hire data
scientists with varied skills.
3. Getting Meaningful Insights Through The Use Of Big Data Analytics: It is imperative for
business organizations to gain important insights from Big Data analytics, and it is equally
important that only the relevant department has access to this information. A big challenge faced
by companies in Big Data analytics is bridging this gap in an effective manner.
4. Getting Voluminous Data Into The Big Data Platform: It is hardly surprising that data is
growing with every passing day. This means that business organizations need to handle
a large amount of data on a daily basis. The amount and variety of data available these days can
overwhelm any data engineer, which is why it is considered vital to make data access
easy and convenient for brand owners and managers.
5. Uncertainty Of Data Management Landscape: With the rise of Big Data, new technologies and
companies are being developed every day. However, a big challenge faced by companies in
Big Data analytics is finding out which technology will suit them best without
introducing new problems and potential risks.
6. Data Storage And Quality: Business organizations are growing at a rapid pace. As companies
and large business organizations grow, the amount of data they produce increases. The storage
of this massive amount of data is becoming a real challenge for everyone. Popular data storage
options like data lakes and data warehouses are commonly used to gather and store large
quantities of unstructured and structured data in its native format. The real problem arises
when a data lake or warehouse tries to combine unstructured and inconsistent data from diverse
sources and encounters errors. Missing data, inconsistent data, logic conflicts, and
duplicate data all result in data quality challenges.
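Two of the quality problems named above, missing data and duplicates, can be checked with a short script. The sketch below is only illustrative: the `audit_records` helper, the field names, and the sample records are assumptions, and inconsistent data and logic conflicts would need rules specific to the data set.

```python
def audit_records(records, required_fields):
    """Count records with missing required fields and exact duplicate records."""
    missing = 0
    duplicates = 0
    seen = set()
    for record in records:
        # A field counts as missing if it is absent, None, or an empty string.
        if any(record.get(f) in (None, "") for f in required_fields):
            missing += 1
        # Two records are duplicates if all their key/value pairs match.
        key = tuple(sorted(record.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"missing": missing, "duplicates": duplicates}

# Hypothetical records merged from several sources.
records = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "", "city": "Delhi"},
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 3, "name": "Ravi"},
]
report = audit_records(records, ["id", "name", "city"])
print(report)
```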
7. Security And Privacy Of Data: Once business enterprises discover how to use Big Data, it
brings them a wide range of possibilities and opportunities. However, it also involves
potential risks when it comes to the privacy and security of the data.
The Big Data tools used for analysis and storage utilize data from disparate sources. This
eventually leads to a high risk of exposure of the data, making it vulnerable. Thus, the rise of
voluminous amounts of data increases privacy and security concerns.
The importance of big data does not revolve around how much data a company has but how a
company utilizes the collected data. Every company uses data in its own way; the more
efficiently a company uses its data, the more potential it has to grow. The company can take data from any
1. Cost Savings: Some tools of Big Data like Hadoop and Cloud-Based Analytics can
bring cost advantages to business when large amounts of data are to be stored and these
2. Time Reductions: The high speed of tools like Hadoop and in-memory analytics can
easily identify new sources of data, which helps businesses analyze data immediately
3. Understand the market conditions: By analyzing big data you can get a better
understanding of current market conditions. For example, by analyzing customers'
purchasing behaviors, a company can find out the products that are sold the most and
produce products according to this trend. By this, it can get ahead of its competitors.
4. Control online reputation: Big data tools can do sentiment analysis, so you
can get feedback about who is saying what about your company. If you want to monitor
and improve the online presence of your business, big data tools can help with all of
this.
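To give a rough idea of what sentiment analysis does, the sketch below labels short reviews by counting words from two tiny hand-made word lists. This is a deliberately naive illustration, not how any particular big data tool works; the word lists, the `sentiment` function, and the sample reviews are all assumptions.

```python
# Tiny hand-made word lists; real tools use far larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "slow", "broken"}

def sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical customer comments about a company.
reviews = [
    "Great service and fast delivery!",
    "The product arrived broken, very poor quality.",
    "Order received on Monday.",
]
labels = [sentiment(r) for r in reviews]
print(labels)
```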
The customer is the most important asset any business depends on. There is no single
business that can claim success without first having to establish a solid customer base.
However, even with a customer base, a business cannot afford to disregard the high
competition it faces. If a business is slow to learn what customers are looking for, then
it is very easy to begin offering poor quality products. In the end, loss of clientele will
result, and this creates an adverse overall effect on business success. The use of big data
allows businesses to observe various customer related patterns and trends. Observing
Insights
Big data analytics can help change all business operations. This includes the ability to
match customer expectations, changing the company’s product line, and of course ensuring
Another huge advantage of big data is the ability to help companies innovate and
Although Big Data and Business Intelligence are two technologies used to analyze data to help
companies in the decision-making process, there are differences between them. They
differ in the way they work as much as in the type of data they analyze.
Traditional BI methodology is based on the principle of grouping all business data into a central
server. Typically, this data is analyzed in offline mode, after storing the information in an
environment called a data warehouse. The data is structured in a conventional relational database
with an additional set of indexes and forms of access to the tables (multidimensional cubes).
These are the main differences between Big Data and Business Intelligence:
1. In a Big Data environment, information is stored on a distributed file system, rather than
2. Big Data solutions carry the processing functions to the data, rather than the data to the
functions. As the analysis is centered on the information, it's easier to handle larger
3. Big Data can analyze data in different formats, both structured and unstructured. The
volume of unstructured data (data not stored in a traditional database) is growing at rates
much higher than that of structured data. Nevertheless, its analysis carries different challenges.
Big Data solutions solve them by allowing a global analysis of various sources of
information.
4. Data processed by Big Data solutions can be historical or come from real-time sources.
Thus, companies can make decisions that affect their business in an agile and efficient way.
5. Big Data technology uses massively parallel processing (MPP) concepts, which improve the
speed of analysis. With MPP, many instructions are executed simultaneously: each job is
divided into several parts that run in parallel, and at the end the partial results
are combined and presented. This allows you to analyze large volumes of information
quickly.
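The divide, process in parallel, and combine pattern described in point 5 can be sketched on a single machine. In this illustration, threads stand in for the separate nodes of a real MPP system, and the `split` and `partial_total` helpers and the sample data are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Divide data into at most `parts` contiguous chunks of nearly equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_total(chunk):
    """Work done independently on one chunk."""
    return sum(chunk)

sales = list(range(1, 1001))  # hypothetical sales amounts
chunks = split(sales, parts=4)

# Each chunk is handled by its own worker; threads stand in for the
# separate nodes of a real MPP system.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_total, chunks))

total = sum(partials)  # combine the partial results
print(partials, total)
```

The final answer is the same as a single pass over the data; what changes is that each part can be computed where its chunk of data lives.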
Why the hype around Big Data analytics?
Data Analytics and its types
Analytics is the discovery and communication of meaningful patterns in data. Especially
valuable in areas rich with recorded information, analytics relies on the simultaneous
application of statistics, computer programming, and operations research to quantify
performance. Analytics often favors data visualization to communicate insight.
Firms commonly apply analytics to business data to describe, predict, and improve
business performance. Specific areas within analytics include predictive analytics, enterprise
decision management, etc. Since analytics can require extensive computation (because of big data),
the algorithms and software used for analytics harness the most current methods in computer
science.
In a nutshell, analytics is the scientific process of transforming data into insight for making
better decisions. The goal of Data Analytics is to get actionable insights resulting in smarter
decisions and better business outcomes.
It is critical to design and build a data warehouse or Business Intelligence (BI) architecture that
provides a flexible, multi-faceted analytical ecosystem, optimized for efficient ingestion and
analysis of large and diverse data sets.