BDA-1st unit

BIG DATA ANALYTICS

(PC702IT)

Prepared By
Meghana N
Understanding Big Data

• Understanding Big Data involves recognizing its unique
characteristics that differentiate it from traditional data.
These characteristics are often described using the 5 V's:
Volume, Velocity, Variety, Veracity, and Value. Let's
explore each one:
1. Volume
• Definition: Refers to the enormous size of the data being
generated and stored. Big Data often involves terabytes,
petabytes, or even exabytes of data coming from multiple
sources like social media, sensors, and transaction records.
• Example: Facebook has reported processing more than 500
terabytes of data per day from posts, messages, and user
interactions.
2. Velocity
• Definition: The speed at which data is generated,
collected, and analyzed. With the rise of IoT (Internet of
Things) devices and digital interactions, data is produced
in real time.
• Example: Streaming services, such as Netflix or YouTube,
receive data about users' viewing preferences in real time,
which helps them recommend content almost instantaneously.
3. Variety

• Definition: Refers to the different types and formats of
data available. Big Data can be structured, semi-structured,
or unstructured.
• Examples:
• Structured: Relational databases (e.g., SQL databases
with predefined schemas).
• Semi-structured: XML, JSON files.
• Unstructured: Videos, images, social media posts,
emails, and sensor data.
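The three categories above can be illustrated with a minimal Python sketch. The sample records, field names, and the crude tokenizing step are all invented for illustration; the point is only that structured data arrives with fixed columns, semi-structured data describes its own (varying) fields, and unstructured text has no schema until one is derived.

```python
import csv
import io
import json

# Structured: every row follows the same predefined columns.
csv_text = "id,name,amount\n1,Asha,250\n2,Ravi,100\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: fields are self-describing and may differ per record.
json_text = '[{"id": 1, "tags": ["new"]}, {"id": 2, "address": {"city": "Pune"}}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; structure has to be derived.
post = "Loved the new phone, battery lasts all day!"
tokens = post.lower().replace(",", "").replace("!", "").split()

structured_fields = sorted(rows[0].keys())
json_fields = [sorted(r.keys()) for r in records]

print(structured_fields)  # same columns for every CSV row
print(json_fields)        # a different field set per JSON record
print(tokens)             # words extracted from free text
```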
4. Veracity

• Definition: The uncertainty or accuracy of the data. It
reflects the quality of the data being collected, as not all
data is clean or reliable.
• Example: Data collected from social media might contain
biases or misinformation, which can affect the accuracy of
analysis unless properly filtered and validated.
5. Value

• Definition: The usefulness or economic value that can be
derived from the data. It is not just about collecting large
volumes of data, but about being able to extract actionable
insights from it.
• Example: Retailers analyzing customer purchase patterns
to create personalized marketing strategies and boost sales.
Introduction to Big Data
• Big Data refers to extremely large datasets that are too
complex or voluminous for traditional data processing
techniques to manage effectively.
• These datasets come from a variety of sources, such as
social media, sensors, transactions, and digital
interactions, and are generated continuously in massive
quantities.
• Big Data is characterized by its size, speed, diversity,
and the challenges it presents in terms of storage,
analysis, and interpretation.
Importance of Big Data
Better Decision-Making
1. Data-Driven Decisions: Big Data allows organizations to
analyze trends, patterns, and correlations across various
data points. This insight enables more informed and
effective decision-making.
2. Predictive Analytics: Businesses can forecast future
trends by analyzing past behaviors, leading to more accurate
planning, marketing strategies, and risk management.
• Personalization: Businesses that leverage Big Data can
create highly personalized experiences for customers. For
instance, online platforms like Amazon or Netflix use Big
Data analytics to recommend products and content tailored
to individual users.
• Real-Time Adaptation: The ability to process and analyze
data in real time allows organizations to adapt quickly to
market changes, giving them a competitive edge over rivals.
Enhanced Healthcare and Medicine

• Personalized Medicine: In healthcare, Big Data is
revolutionizing treatment plans by analyzing patient
histories, medical records, and genetic information, leading
to personalized medicine approaches.
• Disease Prediction and Prevention: Predictive models using
Big Data can identify potential disease outbreaks, track the
spread of epidemics, and even predict individual risks for
certain diseases, enabling early interventions.
Challenges Posed by Big Data
• Handling and processing Big Data presents several
challenges due to its massive scale, complexity, and
continuous growth. These challenges are often tied to the
5 V's (Volume, Velocity, Variety, Veracity, and Value), but
they also go beyond them. Below are some of the key
challenges:
1. Data Storage and Management
• Issue: The sheer volume of data generated can exceed the
capacity of traditional storage systems. Organizations need
scalable storage solutions that can handle structured,
semi-structured, and unstructured data.
• Solution: Distributed storage systems like Hadoop's HDFS
(Hadoop Distributed File System) and cloud-based platforms
(AWS, Azure) are widely used to store and manage massive
datasets.
2. Data Processing and Scalability
• Issue: Processing large datasets in a timely manner can be
complex. Traditional databases and processing tools often
struggle with the scale and velocity of Big Data.
• Solution: Parallel processing frameworks like Apache Spark
and MapReduce help distribute the computational load across
multiple machines, enabling faster data processing.
3. Data Quality and Veracity
• Issue: Big Data is often incomplete, inconsistent, or
inaccurate. Low-quality data can lead to unreliable analysis
and poor decision-making.
• Solution: Techniques like data cleansing, validation, and
preprocessing can help improve the quality of data before it
is analyzed.
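As a rough illustration of the cleansing and validation step, the sketch below drops incomplete rows, rejects implausible values, removes duplicates, and normalizes names. The record layout, the `clean_records` helper, and the 0 to 120 age range are assumptions chosen for the example, not rules taken from any particular tool.

```python
def clean_records(records):
    """Drop incomplete or invalid rows and remove duplicate ids."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        rec_id = rec.get("id")
        age = rec.get("age")
        # Validation: required fields must be present.
        if rec_id is None or age is None:
            continue
        # Validation: age must be an integer in a plausible range (assumed 0-120).
        if not isinstance(age, int) or not 0 <= age <= 120:
            continue
        # De-duplication: keep only the first record seen for each id.
        if rec_id in seen_ids:
            continue
        seen_ids.add(rec_id)
        # Normalization: trim whitespace and standardize capitalization.
        name = str(rec.get("name", "")).strip().title()
        cleaned.append({"id": rec_id, "name": name, "age": age})
    return cleaned


raw = [
    {"id": 1, "name": "  asha ", "age": 34},
    {"id": 1, "name": "Asha", "age": 34},    # duplicate id
    {"id": 2, "name": "ravi", "age": None},  # missing value
    {"id": 3, "name": "MEENA", "age": 430},  # implausible value
    {"id": 4, "name": "john", "age": 51},
]
cleaned = clean_records(raw)
print(cleaned)
```

Only the first and last records survive; the others are filtered out before any analysis would see them.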
4. Data Integration
• Issue: Big Data comes from various sources and in different
formats, making it difficult to integrate into a cohesive
system.
• Solution: Integration platforms and data pipelines, such as
Apache NiFi, help streamline the ingestion, transformation,
and integration of heterogeneous data from multiple sources.
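The integration idea can be sketched in plain Python (this is not Apache NiFi, just an illustration of the same ingest, transform, and combine steps). The two sources, their field names (`cust_id`, `amt`, `customer`, `total`), and the unified schema are hypothetical.

```python
import csv
import io
import json


def from_csv(text):
    """Ingest a CSV export and map it to the unified schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(row["cust_id"]),
            "amount": float(row["amt"]),
            "source": "pos",
        }


def from_json(text):
    """Ingest a JSON feed and map it to the same unified schema."""
    for obj in json.loads(text):
        yield {
            "customer_id": obj["customer"]["id"],
            "amount": float(obj["total"]),
            "source": "web",
        }


pos_csv = "cust_id,amt\n101,20.5\n102,5\n"
web_json = '[{"customer": {"id": 101}, "total": 9.5}]'

# Once both sources share one schema they can be analyzed together.
unified = list(from_csv(pos_csv)) + list(from_json(web_json))

totals = {}
for rec in unified:
    totals[rec["customer_id"]] = totals.get(rec["customer_id"], 0.0) + rec["amount"]

print(totals)
```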
Big data analytics and its classification:
There are four types of data analytics:
1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
1. Predictive Analytics: Predictive analytics turns data into
valuable, actionable information. It uses data to determine the
probable outcome of an event or the likelihood of a situation
occurring.

2. Descriptive Analytics: Descriptive analytics looks at data and
analyzes past events for insight into how to approach future
events. It examines past performance by mining historical data to
understand the causes of past success or failure.

3. Prescriptive Analytics: Prescriptive analytics automatically
synthesizes big data, mathematical science, business rules, and
machine learning to make predictions, and then suggests decision
options that take advantage of those predictions.

4. Diagnostic Analytics: This analysis relies mainly on historical
data to answer a question or solve a problem. It looks for
dependencies and patterns in the historical data related to that
particular problem.

For example, companies use this analysis because it gives deep
insight into a problem, provided they keep detailed information at
their disposal; otherwise, data would have to be collected
separately for every problem, which would be very time-consuming.
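A small sketch may help separate three of these types. On invented monthly sales figures, the descriptive step summarizes the past, the predictive step fits a least-squares trend line and extrapolates one month ahead, and the prescriptive step applies a hypothetical reorder rule to the forecast. The function names, the data, and the business rule are assumptions made for illustration.

```python
from statistics import mean


def describe(values):
    """Descriptive: summarize what already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(ys):
    """Predictive: least-squares fit y = slope * x + intercept, x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def prescribe(forecast, stock_on_hand):
    """Prescriptive: hypothetical rule that reorders any forecast shortfall."""
    shortfall = max(0, forecast - stock_on_hand)
    return f"reorder {shortfall:.0f} units" if shortfall else "no action"


monthly_sales = [100, 110, 120, 130]
summary = describe(monthly_sales)
slope, intercept = fit_line(monthly_sales)
forecast_next = slope * len(monthly_sales) + intercept
action = prescribe(forecast_next, stock_on_hand=125)

print(summary, forecast_next, action)
```

Real predictive models are usually far richer than a straight line; the sketch only shows how the output of one stage feeds the next.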
Big Data Applications
• Big Data is revolutionizing various sectors by providing
deeper insights, enhancing decision-making, and improving
operational efficiency. Let's explore its applications in
Healthcare, Banking, Advertising, and the Technologies
driving Big Data analytics.
1. Big Data in Healthcare
• The healthcare sector generates vast amounts of data,
ranging from patient records to medical imaging, wearable
devices, and genomics. Big Data in healthcare helps in
improving patient care, reducing costs, and driving
innovations.
Applications:
Predictive Analytics for Patient Care:

• Using historical patient data, machine learning algorithms
can predict disease outbreaks, identify high-risk patients,
and recommend preventive care, which helps improve treatment
outcomes.
• Example: Hospitals use predictive analytics to identify
patients at risk of readmission, enabling timely
interventions to prevent complications.
Personalized Medicine:

• Big Data enables the analysis of genetic information
alongside clinical data to create personalized treatment
plans. This can lead to more precise diagnoses and targeted
therapies.
• Example: Cancer treatments can be customized based on the
patient's genetic makeup, improving the effectiveness of
chemotherapy or radiation treatments.
Real-Time Monitoring with IoT Devices:

• Wearable health devices (e.g., fitness trackers, heart
monitors) generate real-time data that can be analyzed to
monitor patients remotely, reducing the need for hospital
visits and enabling early detection of health issues.
• Example: Diabetic patients can use glucose monitors to
track their blood sugar levels in real time, helping manage
their condition more effectively.
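One simple way to analyze such a stream is a moving average over the most recent readings, raising an alert when it crosses a threshold. The sketch below assumes a window of three readings and a threshold of 180 mg/dL; both values and the sample readings are illustrative only and are not clinical guidance.

```python
from collections import deque


def monitor(readings, window=3, high=180):
    """Yield (index, moving_average) whenever the average exceeds `high`."""
    recent = deque(maxlen=window)  # keeps only the latest `window` readings
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            avg = sum(recent) / window
            if avg > high:
                yield i, avg


glucose_mg_dl = [110, 150, 170, 190, 200, 160, 120]
alerts = list(monitor(glucose_mg_dl))

for index, avg in alerts:
    print(f"reading {index}: moving average {avg:.1f} mg/dL is above threshold")
```

Because the function is a generator, it processes readings one at a time as they arrive instead of waiting for the whole series.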
2. Big Data in Banking

• The banking and financial services industry relies heavily
on data to understand customer behavior, detect fraud, and
optimize services. Big Data helps banks process massive
amounts of transactional data, providing insights into
financial trends and risks.
Applications:
Fraud Detection and Prevention:

• Big Data analytics helps identify fraudulent transactions
by analyzing patterns and anomalies in real time. Machine
learning models can detect unusual spending behaviors and
flag suspicious activities.
• Example: Credit card companies use Big Data to detect
irregular transactions, alerting customers immediately to
prevent fraud.
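As a very reduced stand-in for the machine learning models mentioned above, the following sketch flags a transaction when it lies more than three standard deviations from a customer's past spending (a z-score test). The threshold, the helper name `flag_unusual`, and the amounts are assumptions; production systems use many more signals than the amount alone.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts whose z-score against `history` exceeds `threshold`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return []  # no variation in history, so a z-score is undefined
    flagged = []
    for amount in new_amounts:
        z = (amount - mu) / sigma
        if abs(z) > threshold:
            flagged.append(amount)
    return flagged


history = [20, 25, 22, 30, 28, 24, 26, 27]  # past card payments
suspicious = flag_unusual(history, [29, 500, 23])
print(suspicious)
```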
Customer Segmentation and Personalization:

• Banks use customer data to segment their clients based on
spending behavior, income levels, and financial goals. This
enables them to offer personalized financial products and
services.
• Example: Personalized investment advice or loan offers can
be tailored to a customer's financial history and goals.
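Segmentation can be as simple as a set of rules over a few attributes. The sketch below groups customers by income and monthly spend; the segment names, the thresholds, and the customer records are hypothetical, and real banks typically derive segments with clustering or other statistical methods.

```python
from collections import defaultdict


def segment(customer):
    """Assign a segment from income and monthly spend (assumed thresholds)."""
    spend = customer["monthly_spend"]
    income = customer["income"]
    if income >= 100_000 and spend >= 5_000:
        return "premium"
    if spend >= 2_000:
        return "active"
    return "basic"


customers = [
    {"name": "A", "income": 120_000, "monthly_spend": 6_000},
    {"name": "B", "income": 60_000, "monthly_spend": 2_500},
    {"name": "C", "income": 40_000, "monthly_spend": 800},
    {"name": "D", "income": 150_000, "monthly_spend": 1_000},
]

segments = defaultdict(list)
for c in customers:
    segments[segment(c)].append(c["name"])

print(dict(segments))
```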
Risk Management:

• Big Data analytics enables banks to assess risk more
effectively by analyzing multiple data points, including
market trends, customer credit history, and economic
indicators. This helps in making informed lending and
investment decisions.
• Example: Banks use predictive models to assess a
borrower's creditworthiness and determine the likelihood of
loan defaults.
3. Advertising and Big Data
• In advertising, Big Data allows companies to better
understand consumer preferences and behaviors, enabling
highly targeted and effective marketing campaigns.
Applications:
Targeted Advertising:

• Big Data helps advertisers analyze consumer data, such as
browsing history, purchasing behavior, and social media
interactions, to create highly personalized and relevant ads
for specific audiences.
• Example: Google and Facebook use Big Data to display
targeted ads to users based on their search history,
interests, and social interactions.
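The matching step behind targeted advertising can be sketched as ranking ads by how many keywords they share with a user's interests. The interest sets, the ad inventory, and the `rank_ads` helper are invented for illustration; this is not how any named platform actually selects ads.

```python
def rank_ads(user_interests, ads):
    """Rank ads by the number of keywords shared with the user's interests."""
    scored = []
    for ad_name, keywords in ads.items():
        overlap = len(user_interests & keywords)  # set intersection
        if overlap:
            scored.append((overlap, ad_name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


user = {"running", "fitness", "travel"}
ads = {
    "trail shoes": {"running", "fitness", "outdoors"},
    "city break": {"travel", "hotels"},
    "gaming laptop": {"gaming", "pc"},
}
ranking = rank_ads(user, ads)
print(ranking)
```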
Real-Time Bidding (RTB):
Customer Sentiment Analysis:
• Analyzing data from social media, reviews, and feedback
allows companies to gauge customer sentiment toward their
products or services. This helps in tailoring marketing
strategies to improve brand perception.
• Example: Analyzing tweets or Facebook comments to measure
customer satisfaction after a product launch.
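A minimal, lexicon-based sketch of sentiment analysis is shown below: each comment is scored by counting words from small positive and negative word lists. The word lists and sample comments are assumptions, and real systems generally rely on trained language models that handle negation, sarcasm, and context.

```python
import re

# Tiny hand-picked lexicons, purely illustrative.
POSITIVE = {"love", "great", "good", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "slow"}


def sentiment_score(text):
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


comments = [
    "Love the new phone, great camera!",
    "Terrible battery and slow updates.",
    "It arrived on Tuesday.",
]
labels = [label(c) for c in comments]
print(labels)
```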
Big data technologies:
Big data technologies are tools, platforms, and frameworks that
facilitate the collection, storage, processing, and analysis of large,
complex datasets.
1. Data Storage Technologies:
• Hadoop Distributed File System (HDFS): HDFS is part of the
Apache Hadoop framework and is designed to store large amounts of
data across multiple machines. It provides high-throughput access to
large datasets and ensures data redundancy by replicating data across
several nodes.
• Amazon S3: Amazon Simple Storage Service (S3) is a cloud-based
object storage service that offers scalable storage for big data. It is
commonly used for storing structured and unstructured data in
large-scale data lakes.
• Google Cloud Storage: Another cloud-based storage system
that supports the storage of unstructured data and integrates
well with other Google Cloud services for data analysis.
• NoSQL Databases: Unlike traditional relational databases,
NoSQL databases are designed for scalability and flexibility.
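The replication idea described for HDFS above can be sketched as follows: a file is split into fixed-size blocks and each block is copied to several nodes. The 128 MB block size and replication factor of 3 match common HDFS defaults, but the round-robin placement, the node names, and the `place_blocks` helper are simplifying assumptions; real HDFS uses a rack-aware placement policy.

```python
import math


def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Map each block index to the nodes holding a replica (round-robin sketch)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(n_blocks):
        # Consecutive nodes, wrapping around, so replicas land on distinct nodes.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


placement = place_blocks(300, ["node1", "node2", "node3", "node4"])
for block, holders in placement.items():
    print(block, holders)
```

With four nodes and three replicas, the loss of any single node still leaves at least two copies of every block.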

2. Data Processing Technologies:

• Apache Hadoop: Hadoop is an open-source framework that
processes big data in a distributed manner. Alongside HDFS
for storage, its core components are:
• MapReduce: A programming model used for processing and
generating large datasets in parallel on a cluster.
• YARN (Yet Another Resource Negotiator): A resource
management layer of Hadoop that schedules jobs and
allocates resources.
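The MapReduce programming model can be imitated on a single machine with the classic word-count example. This sketch uses plain Python rather than the Hadoop API: the map phase emits (word, 1) pairs, the shuffle groups values by key, and the reduce phase sums each group. On a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data is everywhere"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```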
3. Data Visualization Technologies:

• Tableau: Tableau is a powerful data visualization tool that connects to
big data platforms and provides interactive, shareable dashboards.
• Power BI: A Microsoft tool that provides business intelligence and
visualization solutions. It integrates well with other Microsoft services
and big data sources.

4. Data analytics:

In big data analytics, technologies are used to clean and transform
data into information that can be used to drive business decisions.
• Apache Spark: Spark is a popular big data tool for data analysis
because it processes data in memory across a cluster, which makes it
fast and efficient at running applications.
• Splunk: Splunk is another popular big data analytics tool for
deriving insights from large datasets. It can generate graphs,
charts, reports, and dashboards.
5. Data mining:
Data mining extracts useful patterns and trends from raw
data. Big data technologies such as RapidMiner and Presto
can turn unstructured and structured data into usable
information.
• RapidMiner: RapidMiner is a data mining tool that can
be used to build predictive models.
• Presto: Presto is an open-source query engine that was
originally developed by Facebook to run analytic queries
against its large datasets.
Thank You
