
Business Intelligence & Big Data Analytics - CSE3124Y
BIG DATA ESSENTIALS

LECTURE 1
Learning Outcomes
Explain Big Data concepts.
Describe the characteristics of Big Data.
Identify the challenges and opportunities in implementing Big Data.
Discuss the application domains for Big Data.
Determine how big data analytics is being used in case studies (application domains).
Definition
Big Data is a term often used to describe data sets whose
size is beyond the capability of commonly used software
tools to capture, manage, and process.
Sagiroglu and Sinanc (2013) define Big Data as a term
“for massive data sets having large, more varied and
complex structure with the difficulties of storing,
analyzing and visualizing for further processes or results”.
Big Data
Big Data can be generated from many different sources, including:
Social networks
Banking and financial services
E-commerce services
Web-centric services
Internet search indexes
Scientific and document searches
Medical records
Web logs
Evolution of Big Data
The explosion of the Internet and social media, together with technologies such as mobile devices, sensors and applications, has led to the creation of massive data sets.
According to McAfee et al. (2012), as of 2012 about 2.5 exabytes of data were created each day, and that number was doubling every 40 months or so.
As of 2014, Google processed hundreds of petabytes (PB) of data and Facebook generated over 10 PB of log data per month (Chen et al., 2014).
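The McAfee et al. (2012) figure describes exponential growth. As a worked illustration only, assuming the 2.5 EB/day baseline and the 40-month doubling period both stay constant (an extrapolation, not a measured result), the daily volume t months after 2012 is V(t) = 2.5 × 2^(t/40) EB. The function name projected_daily_eb is hypothetical.

```python
def projected_daily_eb(months_after_2012: float,
                       base_eb: float = 2.5,
                       doubling_months: float = 40.0) -> float:
    """Project daily data creation (in exabytes), assuming steady doubling.

    V(t) = base * 2 ** (t / doubling_period)
    """
    return base_eb * 2 ** (months_after_2012 / doubling_months)


if __name__ == "__main__":
    for months in (0, 40, 80, 120):
        print(f"{months:>3} months: {projected_daily_eb(months):.1f} EB/day")
```

Under these assumptions the model gives 5 EB/day after 40 months and 10 EB/day after 80 months; real growth depends on many factors the formula ignores.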
Characteristics of Big Data

5 V’s of Big Data (Anuradha, 2015)


Characteristics of Big Data (1)
Volume:
 Volume refers to the large amount of data, with sizes ranging from terabytes to zettabytes.
 Analyzing and manipulating such large amounts of data requires substantial resources and represents a major challenge.
Velocity:
 Velocity refers to the speed at which data is created. It can be measured as data volume per unit of time.
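Because velocity is expressed as volume per unit of time, the Facebook figure quoted in the Evolution of Big Data slide (over 10 PB of log data per month) can be converted into a per-second rate. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes) and a 30-day month; the helper name velocity_gb_per_s is hypothetical.

```python
PB = 10 ** 15                       # bytes in a petabyte (decimal units, assumed)
GB = 10 ** 9                        # bytes in a gigabyte
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def velocity_gb_per_s(volume_bytes: float, seconds: float) -> float:
    """Velocity = data volume / elapsed time, returned in GB per second."""
    return volume_bytes / seconds / GB


if __name__ == "__main__":
    rate = velocity_gb_per_s(10 * PB, SECONDS_PER_MONTH)
    print(f"10 PB/month is about {rate:.2f} GB/s")
```

The result, roughly 3.86 GB arriving every second without pause, shows why velocity, and not only volume, drives the design of data ingestion systems.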
Variety:
 Variety refers to the different types of data (structured, semi-structured and unstructured) that are being stored and analysed.
 Semi-structured data combines features of structured and unstructured data: it has no rigid schema, but contains tags or markers (as in XML or JSON) that give it some structure.
 The types of data can include text, audio, video, images, sensor data, emails, log files and social media posts, amongst others.
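The three kinds of data listed under Variety call for different handling. A minimal sketch using only the Python standard library: structured data is read against a fixed set of columns (CSV), semi-structured data is parsed from self-describing JSON whose fields may differ between records, and unstructured text has no schema, so only generic processing such as splitting it into words is possible. The sample records are invented for illustration.

```python
import csv
import io
import json
import re

# Structured: every row follows the same fixed columns.
structured = "patient_id,age,glucose\n101,54,6.1\n102,61,7.8\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields can differ per record.
semi_structured = '[{"user": "a1", "likes": 3}, {"user": "b2", "tags": ["gps"]}]'
records = json.loads(semi_structured)

# Unstructured: free text with no schema; here we can only tokenise it.
unstructured = "Traffic is heavy on the ring road this morning."
words = re.findall(r"[a-z']+", unstructured.lower())

if __name__ == "__main__":
    print(rows[0]["glucose"])                        # value looked up by column name
    print(sorted({k for r in records for k in r}))   # union of keys across records
    print(len(words))                                # number of word tokens
```

Note how the CSV rows can be queried by column name immediately, the JSON records first have to be inspected to discover which keys exist, and the free text needs further analysis before it yields any fields at all.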
Characteristics of Big Data (2)
Veracity:
Veracity refers to the trustworthiness of the data.
It also covers other data quality attributes such as authenticity, reputation, availability, consistency and accountability of data.
Value:
Raw data is of no value.
Big data has to be transformed into smart data to add value to a
business or even generate revenue.
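Some aspects of veracity can be measured directly, for example completeness (are required fields filled in?) and consistency (do repeated records agree?). A minimal sketch, assuming records arrive as Python dictionaries with a hypothetical id field; the function name quality_report and the sample data are illustrative only.

```python
from collections import defaultdict


def quality_report(records, required_fields):
    """Return (completeness ratio, set of ids whose duplicates conflict)."""
    # Completeness: share of records where every required field is non-empty.
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    completeness = complete / len(records) if records else 0.0

    # Consistency: collect the distinct versions recorded for each id.
    versions = defaultdict(list)
    for r in records:
        if r not in versions[r["id"]]:
            versions[r["id"]].append(r)
    conflicts = {rid for rid, v in versions.items() if len(v) > 1}
    return completeness, conflicts


if __name__ == "__main__":
    data = [
        {"id": 1, "station": "A", "temp": 29},
        {"id": 2, "station": "", "temp": 31},   # missing station name
        {"id": 1, "station": "A", "temp": 35},  # disagrees with the first id 1
    ]
    print(quality_report(data, ["station", "temp"]))
```

Here two of the three records are complete, and id 1 is reported as inconsistent because its two copies disagree on the temperature.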
Activity 1
Differentiate between structured and unstructured data by
providing examples of both types.

How is Big Data different from traditional data?


Structured vs Unstructured
Challenges
Data Representation
 The evolution of Big Data has led to the creation of large amounts of heterogeneous data with variations in type, structure, semantics, organization, granularity and accessibility.
 Thus, representing data in a way that is both meaningful and efficient is a major challenge.
Data Analysis
 Analyzing Big Data is a challenging task due to the incompleteness and inconsistencies of semi-structured and unstructured data.
 Data volume is scaling faster than compute resources.
 The analysis of large data sets is very time-consuming.
 It is of utmost importance to address these challenges to realize the full potential of Big Data analysis.
 Furthermore, to gain real benefit and insights from Big Data analysis, the data has to be properly pre-processed, cleaned and transformed.
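The pre-processing, cleaning and transformation steps can be sketched as a small pipeline. A minimal sketch, assuming raw sensor readings arrive as hypothetical (timestamp, raw value) pairs that may be blank, malformed or duplicated: unusable values are dropped, only the first reading per timestamp is kept, and the remaining values are rescaled to the range 0 to 1 (min-max normalisation, one common transformation among many). The function name clean_and_transform is hypothetical.

```python
import math


def clean_and_transform(readings):
    """Clean (timestamp, raw_value) readings, then min-max normalise the values.

    Returns a list of (timestamp, value in [0, 1]) sorted by timestamp.
    """
    cleaned = {}
    for ts, raw in readings:
        try:
            value = float(str(raw).strip())   # blank or malformed -> ValueError
        except ValueError:
            continue                          # cleaning: drop unparseable values
        if not math.isfinite(value):
            continue                          # cleaning: drop NaN and infinity
        cleaned.setdefault(ts, value)         # cleaning: keep first reading per timestamp

    if not cleaned:
        return []
    lo, hi = min(cleaned.values()), max(cleaned.values())
    span = (hi - lo) or 1.0                   # avoid division by zero if all values equal
    return [(ts, (v - lo) / span) for ts, v in sorted(cleaned.items())]


if __name__ == "__main__":
    raw = [(1, "20"), (2, ""), (3, "abc"), (1, "21"), (4, "30"), (5, " 25 ")]
    print(clean_and_transform(raw))
```

Keeping the first reading for a repeated timestamp is an arbitrary policy chosen for brevity; a real pipeline might instead average duplicates or flag them for review.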
Data Acquisition
 Data acquisition consists of data collection, data transmission and data pre-processing.
 Big data collected from various sources such as log files and sensors often contains large amounts of redundant data.
 Removing this redundancy is therefore a major challenge.
 Appropriate compression algorithms have to be applied.
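Redundancy in acquired data can be illustrated with web log lines. A minimal sketch using Python's standard zlib module: exact duplicate lines are removed first, and the remaining lines are then compressed with DEFLATE. The sample log is invented, and real pipelines would choose the deduplication key and compression codec to suit their data.

```python
import zlib


def dedupe_lines(lines):
    """Remove exact duplicate lines while preserving first-seen order."""
    return list(dict.fromkeys(lines))


def compress_text(lines):
    """Join lines with newlines and compress the result with zlib (DEFLATE)."""
    return zlib.compress("\n".join(lines).encode("utf-8"))


if __name__ == "__main__":
    log = ["GET /index.html 200"] * 1000 + ["GET /login 302", "GET /index.html 200"]
    unique = dedupe_lines(log)
    raw_size = len("\n".join(log).encode("utf-8"))
    packed_size = len(compress_text(unique))
    print(len(log), "lines ->", len(unique), "unique lines")
    print(raw_size, "bytes raw ->", packed_size, "bytes after dedupe + compression")
```

Deduplication only helps when whole records repeat exactly; compression additionally exploits repetition inside and across the records that remain.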
Challenges
Data Storage
 Traditional RDBMSs have been found to be ill-suited for storing and processing big data.
 NoSQL databases, also referred to as non-traditional databases, are becoming increasingly popular for Big Data storage. Some examples of Big Data databases include Dynamo, Voldemort, BigTable, Cassandra, MongoDB, SimpleDB and CouchDB.
 The challenge is to offer an information storage service that provides reliable storage space as well as a powerful access interface for querying and analysing large amounts of data (Chen et al., 2014).
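One reason NoSQL stores cope with large volumes is that they scale out, spreading keys across many machines instead of keeping everything on one server. A minimal sketch of that idea, assuming a simple in-memory store that hash-partitions keys over N nodes; the class PartitionedKVStore is hypothetical. Dynamo and Cassandra use a refinement called consistent hashing, together with replication, so that adding a node moves only a fraction of the keys; this sketch omits both.

```python
import hashlib


class PartitionedKVStore:
    """Toy key-value store that hash-partitions keys across N in-memory nodes."""

    def __init__(self, num_nodes: int):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key: str) -> int:
        # Stable hash: unlike built-in hash(), which is randomised per process
        # for strings, SHA-256 maps a key to the same node in every run.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key: str, value) -> None:
        self.nodes[self._node_for(key)][key] = value

    def get(self, key: str, default=None):
        return self.nodes[self._node_for(key)].get(key, default)


if __name__ == "__main__":
    store = PartitionedKVStore(num_nodes=3)
    for i in range(9):
        store.put(f"user:{i}", {"visits": i})
    print(store.get("user:4"))
    print([len(n) for n in store.nodes])  # how many keys landed on each node
```

With plain modulo hashing, changing the number of nodes remaps most keys, which is the weakness consistent hashing was designed to avoid.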
Data Management
 Managing Big Data is one of the most difficult problems.
 A number of issues still have to be resolved, such as “access, metadata, utilization, updating, governance, and reference (in publications)”.
 New approaches are needed to qualify and validate data, since it is impractical to perform validation on every data item in large datasets.
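When validating every item is impractical, one common approach is to validate a random sample and estimate the overall error rate from it. A minimal sketch, assuming a hypothetical validation rule passed in as is_valid and a sample size chosen by the analyst; the margin of error uses the normal approximation to the binomial distribution, which is only rough when the error rate is close to 0 or 1.

```python
import math
import random


def estimate_error_rate(items, is_valid, sample_size, seed=None):
    """Validate a random sample of `items` and estimate the overall error rate.

    Returns (estimated error rate, approximate 95% margin of error).
    """
    if not items or sample_size <= 0:
        raise ValueError("need a non-empty dataset and a positive sample size")
    rng = random.Random(seed)
    sample = rng.sample(items, min(sample_size, len(items)))
    errors = sum(1 for item in sample if not is_valid(item))
    p = errors / len(sample)
    # Normal approximation to the binomial; rough when p is near 0 or 1.
    margin = 1.96 * math.sqrt(p * (1 - p) / len(sample))
    return p, margin


if __name__ == "__main__":
    # Hypothetical dataset: ages where negative values are data-entry errors.
    gen = random.Random(7)
    ages = [gen.randint(0, 90) for _ in range(100_000)]
    for i in range(0, len(ages), 50):
        ages[i] = -1                     # inject 2% invalid values
    rate, moe = estimate_error_rate(ages, lambda a: a >= 0, sample_size=2_000, seed=42)
    print(f"estimated error rate: {rate:.3f} +/- {moe:.3f}")
```

Checking 2,000 items instead of 100,000 gives an error-rate estimate with a quantified uncertainty, which is often enough to decide whether a dataset needs cleaning.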
Activity 2
1. Describe some other challenges that have arisen with the evolution of Big Data.
2. Three common types of Big Data databases are key-value databases, column-oriented databases and document-oriented databases. Differentiate between these three types of databases.
3. Categorize the existing Big Data databases into these three groups.
Application Domains
Healthcare
Enhanced 360° View of the Customer
Security/Intelligence Extension
Transportation services
…and many others
Case Study: Healthcare
Healthcare
 The volume of medical information is doubling every 5 years, and much of it is unstructured.
 81% of physicians report spending 5 hours or less per month reading medical journals.
 Big Data is currently being used in healthcare for the prediction and surveillance of diseases.
 Analysing disease patterns can help prevent the spread of disease.
 Analysing large data sets of patients’ information can help identify patients who are likely to develop a particular disease such as diabetes.
 Healthcare analytics has the potential to reduce treatment costs, predict epidemic outbreaks, avoid preventable diseases and improve the quality of life in general.
How big data analytics can help:
 Epidemic early warning
 Intensive Care Unit and remote monitoring
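Epidemic early warning can be illustrated with a very simple surveillance rule. A minimal sketch, assuming weekly case counts for one region and a hypothetical alert rule: a week is flagged when its count exceeds the mean of the previous few weeks by more than k standard deviations. Real surveillance systems use more robust methods that account for seasonality and reporting delays; the window, threshold and data here are illustrative assumptions.

```python
from statistics import mean, stdev


def outbreak_alerts(weekly_cases, window=4, k=2.0):
    """Return indices of weeks whose count exceeds baseline mean + k * stdev."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    alerts = []
    for week in range(window, len(weekly_cases)):
        baseline = weekly_cases[week - window:week]   # the preceding weeks only
        threshold = mean(baseline) + k * stdev(baseline)
        if weekly_cases[week] > threshold:
            alerts.append(week)
    return alerts


if __name__ == "__main__":
    cases = [12, 15, 11, 14, 13, 16, 12, 40, 55, 14]
    print(outbreak_alerts(cases))
```

For this made-up series the rule flags weeks 7 and 8, where the counts jump to 40 and 55, while ordinary week-to-week fluctuation stays below the threshold.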
Activity 3
https://www.datapine.com/blog/big-data-examples-in-healthcare/
Read the material on the website and summarise how big data
analytics is being used in healthcare:
First, define big data.
Second, define big data analytics.
Third, discuss how big data analytics is being used in the
healthcare sector.
Case Study 2: Transportation services
Problem:
Traffic congestion has been increasing worldwide as a result of
urbanization and population growth, reducing the efficiency of
transportation infrastructure and increasing travel time and fuel
consumption.
How big data analytics can help:
Real-time analysis of weather and traffic congestion data streams
to identify traffic patterns, thereby reducing transportation costs.
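Real-time analysis of a traffic stream can be sketched with a sliding window. A minimal sketch, assuming a hypothetical stream of (road segment, speed in km/h) readings: the monitor keeps the last few readings per segment and reports the segment as congested when its rolling average speed falls below a chosen threshold. Weather data could be joined to the same stream but is omitted here; the class name, window size and threshold are assumptions.

```python
from collections import defaultdict, deque


class CongestionMonitor:
    """Track a rolling average speed per road segment over the last N readings."""

    def __init__(self, window: int = 3, threshold_kmh: float = 20.0):
        self.threshold = threshold_kmh
        # Each segment gets its own fixed-size window; old readings fall off.
        self.readings = defaultdict(lambda: deque(maxlen=window))

    def update(self, segment: str, speed_kmh: float) -> bool:
        """Add a reading; return True if the segment is currently congested."""
        window = self.readings[segment]
        window.append(speed_kmh)
        return sum(window) / len(window) < self.threshold


if __name__ == "__main__":
    monitor = CongestionMonitor(window=3, threshold_kmh=20.0)
    stream = [("A1", 45), ("A1", 30), ("B2", 50), ("A1", 12), ("A1", 8), ("A1", 10)]
    for segment, speed in stream:
        if monitor.update(segment, speed):
            print(f"congestion on {segment}")
```

Because each reading is processed as it arrives and only a small window is kept per segment, memory use stays bounded however long the stream runs.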
Activity 4:
1. Explain what you understand by ITS (Intelligent Transport
System).
2. Describe the components of an ITS.
3. List the advantages of an ITS.
4. What challenges would governments/authorities have to overcome
to deploy an ITS?
Case Study: Financial services
Problem:
 Managing several petabytes of data, growing at 40-100% per year, under increasing pressure to prevent fraud and comply with regulations.
How big data analytics can help:
 Fraud detection
 Risk management
 360° View of the Customer
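Fraud detection is often framed as anomaly detection. A minimal sketch, assuming a hypothetical per-customer history of past transaction amounts: a new transaction is flagged when its amount lies more than k standard deviations from that customer's historical mean (a z-score rule). Production fraud systems combine many more signals and typically use trained models; this only shows the basic idea.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, k=3.0):
    """Flag `amount` if its z-score against the customer's past amounts exceeds k."""
    if len(history) < 2:
        return False                 # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu          # any deviation from a perfectly constant pattern
    return abs(amount - mu) / sigma > k


if __name__ == "__main__":
    past = [20, 25, 22, 30, 18, 24]
    for amount in (26, 500):
        print(amount, "suspicious" if is_suspicious(past, amount) else "ok")
```

A simple rule like this produces false alarms for legitimate but unusual purchases, which is one reason real systems add context such as location, merchant and time of day.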
Activity 5

Suggest other application domains where Big Data could be applied.
Big Data Tools (1)
Big Data tools based on batch processing include:
 Apache Hadoop, Dryad, Apache Mahout, Jaspersoft BI Suite, Pentaho Business Analytics, Skytree Server, Tableau, Karmasphere Studio and Analyst, and Talend Open Studio.

The use of the different tools is shown in the following figure.


Big Data Tools (2)
Big Data Tools (3)
There are additional tools and platforms that distribute the
open-source Hadoop platform, namely AWS, Cloudera,
Hortonworks and MapR Technologies (Raghupathi and
Raghupathi, 2014).

Proprietary options such as IBM’s BigInsights are also available.
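Apache Hadoop, the first batch-processing tool listed above, processes data using the MapReduce programming model. The classic word-count example can be simulated in a single Python process to show the three phases: map emits (word, 1) pairs, shuffle groups the pairs by key, and reduce sums each group. This is only a sketch of the programming model; real Hadoop jobs are written against its Java API (or run through Hadoop Streaming) and execute the phases in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key (here, by summing them)."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    splits = ["big data needs big tools", "data tools process data"]
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    counts = reduce_phase(shuffle_phase(mapped))
    print(sorted(counts.items()))
```

Because each map call depends only on its own split and each reduce call only on one key's group, a framework like Hadoop can distribute both phases over many machines.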
References
Sagiroglu, S. and Sinanc, D., 2013. Big data: A review. In 2013 International Conference on Collaboration Technologies and Systems (CTS) (pp. 42-47). IEEE.
Chen, C.P. and Zhang, C.Y., 2014. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences, 275, pp.314-347.
Chen, M., Mao, S. and Liu, Y., 2014. Big data: A survey. Mobile Networks and Applications, 19(2), pp.171-209.
McAfee, A., Brynjolfsson, E. and Davenport, T.H., 2012. Big data: the management revolution. Harvard Business Review, 90(10), pp.60-68.
Raghupathi, W. and Raghupathi, V., 2014. Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), p.3.
