
SPECIAL SECTION ON NEW TRENDS IN BRAIN SIGNAL PROCESSING AND ANALYSIS

Received September 11, 2018, accepted November 7, 2018, date of publication December 21, 2018,
date of current version January 23, 2019.
Digital Object Identifier 10.1109/ACCESS.2018.2889180

BIG DATA for Healthcare: A Survey


SAFA BAHRI¹, NESRINE ZOGHLAMI¹, MOURAD ABED², AND JOÃO MANUEL R. S. TAVARES³
1 LTSIRS Laboratory, University of Tunis El Manar, Tunis 1002, Tunisia
2 Automatic, Mechanic and Human IT Laboratory, University of Valenciennes and Hainaut-Cambrésis, 59313 Valenciennes, France
3 Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial, Departamento de Engenharia Mecânica, Faculdade de Engenharia, Universidade do Porto, 4200-465 Porto, Portugal


Corresponding author: João Manuel R. S. Tavares (tavares@fe.up.pt)
This work was supported in part by the Program of ERASMUS+. The work of J. M. R. S. Tavares was supported by the Fundo Europeu de
Desenvolvimento Regional through the Programa Operacional Regional do Norte (NORTE2020) (SciTech—Science and Technology for
Competitive and Sustainable Industries) under Project NORTE-01-0145-FEDER-000022.

ABSTRACT Recently, the massification of new technologies, which have been adopted by a large majority of the world population, has led to the accumulation of a tremendous amount of data, including clinical data. These clinical data have been gathered and interpreted by medical organizations in order to gain insights and knowledge useful for clinical decisions, drug recommendations and better diagnoses, among many other uses. This paper highlights the enormous impact of big data on medical stakeholders, namely patients, physicians, pharmaceutical and medical operators, and healthcare insurers, and also reviews the challenges that must be addressed to obtain the full benefit of big data and the available applications.

INDEX TERMS Big data, big data analytics, healthcare.

I. INTRODUCTION
Since the massification of the most recent technological trends, such as social networks, wearable devices, mobile devices and the Internet of Things, data has been continuously generated in various forms, at an unprecedented scale, from multiple sources. This massive amount of data, together with the opportunities and possibilities offered by its analysis and the challenges it raises in terms of storage, processing and analysis, has led to the appearance of a new term: ‘‘Big Data.’’
Big Data has been used by many researchers in diverse fields to support their conclusions and findings. For example, in the transport sector, Big Data Analytics technologies have been used to improve service quality, traveler satisfaction and management processes, and can suggest ways to optimize customer complaint services [28]. In [29], the effectiveness of Big Data for monitoring smart grid operations is emphasized. The work in [48] focuses on the impact of Big Data Analytics on optimizing airline routes. Big Data has also been used in education, where it can play a role in influencing student engagement and behavior [47]. The various applications of Big Data developed between 2010 and 2016 for supply chain management are identified in [6], with the prime objective of convincing industry of the effectiveness of Big Data. In addition, the work in [31] identified diverse applications of Big Data in the agriculture sector.
By gathering Big Data, organizations can improve customer satisfaction by developing products or services that match customers' needs and desires. They can also enhance their productivity and operational efficiency through resource optimization [27]. Furthermore, the Healthcare sector is considered one of the main sectors making an evolutionary breakthrough by adopting Big Data techniques and technologies. Indeed, the digitalization of medical data has increased tremendously, and huge amounts of data are generated at ever-increasing rates and in miscellaneous formats, including structured, semi-structured and unstructured datasets. The Institute for Health Technology Transformation [21] announced that medical Big Data in the U.S. had reached the zettabyte scale and may soon reach the yottabyte scale. The main features of Big Data are its diversity and the surprising number of data sources; however, these sources can be categorized into five groups [7], [21]:
• Large-scale enterprise systems: data relating to enterprise information systems;
• Online social graphs: graph data that illustrate the personal relations of social network users;
• Mobile devices: the main contributors to the Big Data phenomenon; approximately 6 billion smartphones are in use worldwide;
• Internet of Things: with the widespread use of sensors in many organizations, objects are connected with each other and with humans, giving rise to huge datasets;

2169-3536 2018 IEEE. Translations and content mining are permitted for academic research only.
VOLUME 7, 2019 Personal use is also permitted, but republication/redistribution requires IEEE permission. 7397
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
S. Bahri et al.: BIG DATA for Healthcare: A Survey

• Open data/public data: data from public and private organizations.
As reported by the McKinsey Global Institute, U.S. healthcare institutions might be able to generate more than US $300 billion in value every year if they applied Big Data creatively and efficiently; two thirds of this value would come from reducing healthcare expenditure [49]. Big Data is therefore becoming increasingly important in the Healthcare sector because of its utility in, for example, the development of Health Recommender Systems and Knowledge Discovery Systems, and the implementation of Clinical Decision Support Systems and Disease Prediction Systems.
This article reviews the state of the art in the application of Big Data in the healthcare sector. First, an overview of Big Data and its features is presented. Then, the main aspects of Big Data processes and technologies are discussed. Afterwards, relevant applications of Big Data in the healthcare area are identified. Next, Big Data Analytics is discussed, both in general terms and specifically for the healthcare sector. The article ends with a review of the challenges identified in this study, followed by the conclusions (Figure 1).
FIGURE 1. The topics addressed in this article.

II. BIG DATA: DEFINITION AND CHARACTERISTICS
During the last decade, data has been growing exponentially and unexpectedly. As reported by the International Data Corporation (IDC), the volume of data is expected to grow from a few zettabytes in 2010 to 163 zettabytes in 2025 (Figure 2). Accordingly, data storage capacity has climbed from megabytes to exabytes and is expected to reach zettabytes per annum in the next few years. Formerly, data was presented in structured formats and stored in relational databases arranged in rows and columns, with limited sources related to internal operations. Nowadays, however, many researchers believe that most of this data is unstructured [7], [34], [36], [40] and that non-relational (NoSQL) databases are necessary to manage it. This data can be categorized into web and social media data, machine-to-machine data, transaction data, biometric data and human-generated data [21], [25], [40]:
• Social media data is acquired from interactions, tweets and posts in social networks such as Facebook, LinkedIn, YouTube and Twitter;
• Machine-to-Machine data is extracted from machine sensors, meters and other devices;
• Biometric data is recovered from fingerprints, genetics, handwriting and medical images;
• Human-generated data is drawn from prescriptions, emails, messages, documents and Electronic Medical Records;
• Web data contains clickstream data generated by internet browsers.
FIGURE 2. The evolution of the data volume between 2010 and 2025 (source: IDC's Data Age 2025 study, April 2017).
This notable growth of data has given rise to the new concept of ‘‘Big Data.’’ To begin with, Big Data refers to complex datasets that strain the ability of traditional data warehouses to store, handle, manage and analyze them [6]. A formal definition of Big Data was given in [1]: ‘‘Big Data is the Information asset characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value.’’ On the other hand, the McKinsey Global Institute defines Big Data as ‘‘datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.’’ For instance, an international discount retail chain (Walmart, USA) stored about one terabyte of data in 1999; in 2012, it had the capability to store 2.5 petabytes of data derived from customer transactions. Indeed, nowadays, a considerable amount of data is generated through the widespread use of mobile devices, sensors, data-generating machines and social media networks. According to [26], data today is mainly extracted from five sources: Mobile Devices, Internet of Things, Open and Public Data, Enterprise Information Systems and Social Networks.
A well-accepted definition of Big Data was proposed in [10]: a dataset characterized by three Vs (Volume, Variety, Velocity):
Volume: relates to the huge volume of data acquired from various sources, which can range from terabytes to exabytes or more [10], [12], [34], [36]. According to IDC, the global data volume was expected to rise from 1.8 zettabytes to 40 zettabytes between 2011 and 2020 [54]. Therefore, this data needs to be managed in parallel to gain insights.
Variety: refers to the heterogeneity and diversity of data extracted from several sources, like web sites, social media


networks, Electronic Medical Records, journal documents and video. In fact, data can be of many types, i.e., structured, semi-structured and unstructured datasets, in different formats, including image, text and video. Hence, various interpretations and meanings can be extracted from the same dataset [10], [12], [34], [36].
Velocity: describes the rate of data generation, which has become time sensitive; data frequently need to be handled and processed in real time. Indeed, many data sources, such as sensors, generate data that are constantly updated and need to be followed in real time [12], [15], [34], [36].
FIGURE 3. Big data chain value: from data collection to data exploitation.
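The IDC projection quoted under Volume (1.8 zettabytes in 2011 to 40 zettabytes in 2020 [54]) can be turned into an implied compound annual growth rate (CAGR). The worked arithmetic below is an illustration, not part of the IDC figures: it assumes smooth exponential growth between the two endpoints, and V_t denotes the data volume in year t.

```latex
\[
\text{CAGR}
  = \left(\frac{V_{2020}}{V_{2011}}\right)^{1/(2020-2011)} - 1
  = \left(\frac{40\ \text{ZB}}{1.8\ \text{ZB}}\right)^{1/9} - 1
  \approx 22.2^{1/9} - 1
  \approx 0.41
\]
\[
T_{\text{double}} = \frac{\ln 2}{\ln(1 + \text{CAGR})}
  \approx \frac{0.693}{0.344}
  \approx 2.0\ \text{years}
\]
```

Under that assumption, the projection corresponds to roughly 41% growth per year, i.e., the data volume doubling about every two years.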
In addition to the aforementioned features of Big Data, many researchers have added new features because of the numerous applications available. Indeed, the initial 3Vs proposal has been extended to 4Vs [3], [26], [42], [50], 5Vs [33], [39], 6Vs [40], 7Vs [24] and 10Vs [18], [36]. An updated list of the commonly adopted Vs is presented in Table 1.
TABLE 1. Vs OF BIG DATA.

III. BIG DATA: PROCESSES AND TECHNOLOGIES
A. BIG DATA CHAIN VALUE
In order to leverage value from this considerable volume of varied data, a four-step process must be followed (Figure 3). Accordingly, a low-density set of raw data is processed and analyzed to assist decision makers in their decisions and projects.

1) BIG DATA GENERATION
As discussed in the first section, a tremendous amount of data is being generated from various sources, including internal data from company information systems, IoT data, internet data and biomedical data. Internal company data encompasses data related to the supply chain, such as production data, quality data, inventory data, sales data and administrative data, including human resources data. Internet data includes data related to internet searches, clickstream data, comments and likes, log files and messages. IoT data is generated by devices equipped with sensors and connectivity. Biomedical data include gene and drug data, as well as clinical data [44], [45].

2) BIG DATA ACQUISITION
This second step is usually subdivided into three sub-steps: Data Collection, Data Transmission and Data Pre-Processing:

a: BIG DATA COLLECTION
Big Data Collection is defined as the acquisition and retrieval of unlimited raw data, which can be structured, semi-structured or unstructured, from several sources using computational techniques and technologies. According to many authors, Big Data sources can be classified into four categories: Information Systems, Mobile devices, Internet of Things and Open Data [34], [36], [44]. An Information System is considered a centralized data warehouse containing all the information about the activities of an organization. Mobile devices, such as smartphones, tablets or PCs, generate a considerable amount of mobile data from their installed applications. Open Data is the large amount of extractable data from, for example, web pages, forums and journal articles. The Internet of Things encompasses different interconnected devices with embedded sensors able to provide continuously updated streaming data, controlled across the internet network.

b: BIG DATA TRANSMISSION
Big Data transmission is related to the transfer of data from data sources into storage management systems for data processing and analysis [44].

c: BIG DATA PRE-PROCESSING
This step ensures efficient and enhanced data for storage and analysis. In fact, collected data must be pre-processed and enhanced by eliminating redundant, noisy, incomplete and useless data, which decreases the storage requirements and improves analytic accuracy. Also, acquired low-density data needs to be integrated with other data to gain additional value [44].
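The cleaning operations named in the pre-processing step (eliminating redundant, noisy and incomplete records) can be illustrated with a minimal Python sketch. It assumes a hypothetical stream of wearable heart-rate readings: the `VitalSign` record, its fields and the plausibility range are illustrative assumptions, not part of any cited system, and a real pipeline would apply such rules at scale rather than in a single loop.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class VitalSign:
    """One hypothetical wearable-sensor record (field names are illustrative)."""
    patient_id: str
    timestamp: str
    heart_rate: Optional[float]  # beats per minute; None when the value is missing


# Assumed plausibility range used to flag noisy readings; not a clinical threshold.
PLAUSIBLE_HEART_RATE = (20.0, 250.0)


def preprocess(records: Iterable[VitalSign]) -> List[VitalSign]:
    """Remove incomplete, noisy and redundant records, keeping the first valid copy."""
    low, high = PLAUSIBLE_HEART_RATE
    seen = set()
    cleaned = []
    for rec in records:
        if rec.heart_rate is None:               # incomplete: value missing
            continue
        if not low <= rec.heart_rate <= high:    # noisy: physically implausible
            continue
        key = (rec.patient_id, rec.timestamp)
        if key in seen:                          # redundant: same patient and time
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Hypothetical sample data, invented for illustration only.
SAMPLE = [
    VitalSign("p1", "2018-01-01T08:00", 72.0),
    VitalSign("p1", "2018-01-01T08:00", 72.0),   # duplicate transmission
    VitalSign("p1", "2018-01-01T08:05", None),   # missing value
    VitalSign("p2", "2018-01-01T08:00", 999.0),  # sensor glitch
    VitalSign("p2", "2018-01-01T08:05", 80.0),
]

if __name__ == "__main__":
    for rec in preprocess(SAMPLE):
        print(rec)
```

Applied to `SAMPLE`, only the first 08:00 reading of patient p1 and the 08:05 reading of patient p2 survive. Filtering invalid values before deduplicating means that a corrupted copy of a record cannot shadow a later valid copy with the same key.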


3) BIG DATA STORAGE


This step uses databases that can handle large amounts of data of different types and formats for further analysis and processing, while guaranteeing data security, availability and reliability. Previously, data sources were relatively limited; hence, the Volume, Variety and Velocity of the data were notably smaller, which justified the use of relational database management systems (RDBMS). Currently, with the widespread use of the internet, convenient and efficient data warehouses are needed to process data. In fact, data storage equipment is becoming increasingly important and is considered the main expense of various institutions [44].
4) BIG DATA ANALYSIS
Big Data Analysis is the most important and critical step in the Big Data Chain Value, since it is where value is generated as an output. It is defined as the application of techniques and technologies to mine and extract valuable insights and hidden information from large amounts of processed and stored data [44].

B. BIG DATA TECHNOLOGIES
Formerly, analysts used relational databases and data warehouses to manage and process structured data of limited size. However, given the commonly adopted Vs of Big Data, traditional tools are inefficient and unable to handle tremendous volumes of data and extract valuable insights from them.
In order to overcome the low performance and complexity of traditional technologies, many frameworks and tools based on new distributed architectures, with high memory capacity and processing power, have been developed. Accordingly, [37] defined Big Data technologies as ‘‘a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.’’
Big Data technologies involve commercial and open-source software and services for the storage, analysis, querying, access, management and processing of data.
Apache Hadoop is a well-known software project that offers a collection of open source frameworks designed to support the collection, pre-processing, storage and mining of considerable amounts of data. As presented in Figure 4, this goal is achieved via a master-slave architecture that comprises a single NameNode and a set of DataNodes. In this architecture:
• The NameNode, which acts as the master, manages the file system namespace of the cluster. It holds the metadata, i.e., data that describe the other data;
• A Secondary NameNode periodically stores a checkpoint of the master's metadata, which is useful for recovery in case the system crashes;
• DataNodes are slaves that store the data blocks and execute all tasks required by the master NameNode.
FIGURE 4. Hadoop architecture.
Modern technologies frequently used throughout Big Data processing are:

1) DATA COLLECTION
Sqoop, whose name combines SQL and Hadoop, is an open source framework that works on top of the Hadoop Distributed File System (HDFS). It is a command-line interface that ensures the import and export of data between HDFS and relational databases, such as Enterprise Data Warehouses, Oracle, Postgres and Teradata (Figure 5). Sqoop presents several benefits for data extraction, such as fast performance and optimized exploitation of resources, and it allows storage and processing load to be offloaded to other systems [34], [17].
FIGURE 5. Sqoop tool.
Flume is a distributed and reliable open source service designed to collect, aggregate and transfer huge amounts of batch files, log files and streaming data from external machines to HDFS for storage. It presents a simple and flexible architecture based on continuous streaming data flows. Flume has many advantages, such as fault tolerance, robustness and horizontal scalability. An extensible data model is used to process massive distributed data sources [17], [34].
Kafka is an open source framework developed by the Apache Software Foundation. Thanks to its distributed design and high throughput, it is able to collect data from many sources at the same time, including data warehouses and social media networks. LinkedIn and Wikipedia are among the main users of Kafka. It is written in Scala and is characterized by its scalability and fault tolerance [17]. The Kafka architecture is composed of Producers, Brokers and Consumers.


Brokers contain topics, partitions, replicas and offsets. Producers, such as Facebook and Twitter feeds, write data to Brokers, and Consumers read data from them. Data is stored in Topics that are split into partitions, which are replicated for data security; within a partition, each record is identified by a sequential offset.
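To make the roles of producers, partitions, offsets, consumers and replicas concrete, here is a toy in-memory model in Python. It is not the Kafka client API: the `Topic`, `Consumer` and `assign_replicas` names are hypothetical, key hashing uses `zlib.crc32` as a stand-in for Kafka's own partitioner, brokers appear only in the replica-placement helper, and that round-robin placement is a simplification of what Kafka actually does.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class Topic:
    """Toy in-memory topic: one append-only log per partition."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Records with the same key always go to the same partition, which keeps
        # them in order. crc32 is a deterministic stand-in for Kafka's own hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Producer side: append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class Consumer:
    """Consumer side: reads every partition in order and tracks its own offsets."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offsets: Dict[int, int] = defaultdict(int)  # partition -> next offset to read

    def poll(self) -> List[Tuple[str, str]]:
        batch: List[Tuple[str, str]] = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])  # only records not yet consumed
            self.offsets[p] = len(log)
        return batch


def assign_replicas(num_partitions: int, brokers: List[str],
                    replication_factor: int) -> Dict[int, List[str]]:
    """Place each partition on `replication_factor` distinct brokers (simple round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
            for p in range(num_partitions)}
```

Because records with the same key land in the same partition, a consumer sees all readings of one patient in the order they were produced, and its stored offsets let it resume exactly where it stopped. With three brokers and a replication factor of 2, each partition survives the loss of any single broker.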
Chukwa is a data collection system designed to monitor large distributed systems. It works on top of HDFS, relying on HDFS to store the data collected from multiple sources and on MapReduce to analyze the acquired data. It is known for its scalability and robustness, and offers a user-friendly interface to display, monitor and analyze data [17].

2) DATA PROCESSING
MapReduce is a programming model that makes the processing of massive amounts of data simpler and faster through its efficient and cost-effective mechanisms. As shown in Figure 6, this framework has three main functions:
• Map function: breaks the input data down into independent key/value pairs;
• Shuffle (or Sort) function: key/value pairs are collected, stored and then grouped by key. The output of this function is a collection of keys with their associated values;
• Reduce function: aggregates the pairs in parallel according to a predefined program. The outputs are sets of key/value pairs stored in the output file of the MapReduce system.
FIGURE 6. MapReduce pipeline.
A well-known application of the MapReduce framework, called ‘‘Word Count,’’ is presented in Figure 7.
FIGURE 7. A MapReduce application called ‘‘Word Count’’ that reads text files in a distributed manner and determines the number of occurrences of each word.
The MapReduce framework also has a master-slave architecture (Figure 8):
• JobTracker (Master Node): is responsible for the distribution and assignment of tasks to the Slave Nodes;
• TaskTracker (Slave Nodes): performs the tasks required by the JobTracker and supervises their execution.
FIGURE 8. MapReduce architecture.
MapReduce programs can be developed in C++, Python or Java. This technology is widely used because of its fault tolerance and scalability: in case of a machine failure, another machine takes over the tasks of the failed node [8], [17].
YARN (Yet Another Resource Negotiator) is more generic than MapReduce. It is an advanced resource manager working on top of HDFS that ensures the parallel execution of various applications. In addition, it handles both batch and stream processing. This framework is also known for its scalability and security. YARN uses dynamic allocation of system resources, which allows it to improve resource utilization. YARN has a master-slave architecture like the MapReduce framework (Figure 9). In fact, the resource manager, which operates as the master, manages the assignment of jobs across the cluster. The node manager is a generalized task tracker that provides computational resources, such as containers, and manages the processes running in those containers. A container ensures the execution of an application-specific process with a constrained set of resources. An application master is in charge of managing the resources required by an individual application; it schedules tasks and assesses their progress [8], [17].
FIGURE 9. YARN architecture.
Storm: is an open source framework designed for distributed real-time computations. Unlike Hadoop MapReduce, which targets batch processing, this tool is able to handle stream data. It is characterized by its high scalability, fault tolerance and efficiency. Storm is used, for example, in real-time analytics, ETL (Extract, Transform, Load) operations, online machine learning and continuous computations [8], [17].
Flink: is an open source framework able to process stream data thanks to its distributed architecture.
Spark: is an in-memory cluster computing technology developed for fast computing of data through its sophisticated libraries:
• Spark Streaming: allows the collection and processing of data in real time;
• Spark SQL: is able to execute SQL queries;
• Spark MLlib: is a useful and powerful machine learning library for solving machine learning problems. It contains different machine learning algorithms related to classification, regression, clustering and optimization;
• SparkR: offers the same functionalities as ‘‘R,’’ the programming language for statistical computing and graphics, but with processing time reduced thanks to Spark's distributed architecture;
• Spark GraphX: is devoted to the processing of graphs.
Giraph: is an iterative graph processing system with high scalability. It is mostly used by Facebook to analyze social graphs and the connections between subscribers [8].
Zookeeper: is a distributed service designed for the synchronization of configurations across a cluster. It ensures the high availability of data [17], [27].
Oozie: is an open-source job coordinator that executes and manages job flows in the Hadoop system. Oozie is a scalable, reliable and extensible system [27].

3) BIG DATA STORAGE
To ensure the storage of a huge volume of data, Hadoop uses the distributed data storage system HDFS and a non-relational database named HBase:
HDFS is one of the primary components of a Hadoop cluster, i.e., a set of connected computers, and can support up to hundreds of nodes in a cluster. It is cost effective and offers reliable storage, high scalability and fault tolerance. In addition, HDFS can handle both structured and unstructured data. HDFS is designed for batch processing of high-latency operations. In fact, it stores data in blocks of 64 or 128 MB. In order to avoid data losses, the blocks of a file are replicated three times and stored across the cluster on three different servers. HDFS has a master-slave architecture (Figure 4) that is commonly adopted due to its capacity to reduce network congestion and increase system performance by performing computations near the data storage locations [17], [12].
HBase is an open source project built on top of HDFS and designed for low-latency operations. This non-relational database, modeled after Google's Bigtable, has the potential to host very large tables with billions of rows and millions of columns. Unlike a row-oriented relational database, which stores all the columns of a row together, HBase is a columnar database management system that stores data in columns to ensure easier access to data. As with HDFS, this database also has a master-slave architecture: the master node manages the cluster, and the slaves perform the required operations on the available data. HBase is a flexible, distributed and scalable database that supports real-time queries and automatic, configurable partitioning of data to facilitate data processing and analysis [17].
HCatalog: is a table and storage management system for Hadoop that enables users with different data processing tools to read and write data on the grid more easily [17], [27].
Avro: is an open source framework developed within the Apache Hadoop project that offers two services for developers: data serialization and data exchange [17], [27].

4) BIG DATA ANALYSIS
Pig: is an open source platform designed by Yahoo to analyze large datasets that are considered to be data flows. To develop data analysis programs, Pig uses a high-level programming language called ‘‘Pig Latin.’’ Accordingly, to analyze data, developers write Pig Latin scripts, which are then converted into MapReduce tasks by the Pig Engine component. Apache Pig presents many advantages. First, thanks to its multi-query approach, code length is reduced. Second, Pig Latin replaces Java, which is traditionally seen as more complicated, for coding MapReduce jobs. Indeed, Pig Latin is similar to SQL and easy to learn; the main difference is that Pig Latin is able to process semi-structured and unstructured data. Finally, Pig is characterized by its interactive environment and can process massive amounts of data thanks to its distributed architecture [17], [45].
Hive is a data warehouse system created by Facebook to facilitate the use of Hadoop. The collected data is stored in a structured database, comprehensible to all users. An Apache Hive database is managed through the HQL (HiveQL) language, which has the same syntax as SQL. HQL transforms queries into MapReduce jobs processed as batch tasks. Like Pig, Hive has an interactive interface with a diversity of functions useful for data analysis. Unfortunately, Hive is mostly used for structured data [17], [45].
Mahout is an open source library designed for machine learning and data mining. It works on top of HDFS in order to execute algorithms via MapReduce. It provides developers with libraries for clustering, collaborative filtering, categorization and text mining. It is scalable and can be executed in a distributed mode [17], [27].
To conclude, the Hadoop Ecosystem (Figure 10) contains very powerful tools able to collect, process, store and analyze large amounts of varied data coming from several sources and generated at a high rate. This vast potential is due to its scalability, fault tolerance, flexible scheduling and resource management, high-level and simplified programming model, distributed architecture, real-time processing, in-memory processing and high throughput.
The next section presents pertinent applications of Big Data, especially in the Healthcare sector.
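As a concrete illustration of the MapReduce model summarized in this section, the ‘‘Word Count’’ application of Figure 7 can be simulated in a few lines of Python. This is a single-process sketch, not Hadoop code: the function names are hypothetical, and the Map, Shuffle/Sort and Reduce functions of Figure 6 run one after another here, whereas Hadoop distributes them across the nodes of a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: turn one input line into independent (word, 1) key/value pairs."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values by key and return the keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(sorted(groups.items()))


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate the grouped values of one key."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, each input split would be mapped on a different node;
    # here the lines are processed sequentially in a single process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"]))
```

For the input lines ‘‘Deer Bear River,’’ ‘‘Car Car River’’ and ‘‘Deer Car Bear,’’ `word_count` returns {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}. Because each map call depends only on its own line and each reduce call only on one key's values, both phases can be parallelized independently, which is the property Hadoop exploits.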


FIGURE 10. Hadoop ecosystem.

IV. BIG DATA APPLICATIONS IN HEALTHCARE SECTOR
Healthcare is among the sectors generating a massive amount of data characterized by high velocity and variety, such as laboratory data, medical prescriptions, appointments, machine-generated data, insurance data and administrative data. In fact, in the USA, clinical data alone reached 150 exabytes in 2011. According to predictions, the volume of clinical data will soon reach zettabytes or even yottabytes [21], [40]. The authors in [7] estimate that 90% of all clinical data are unstructured, such as prescription notes, lab results and ECG data. According to the authors in [19], Big Data capability in the healthcare sector is defined as: ‘‘The ability to acquire, store, process and analyze large amounts of health data in various forms, and deliver meaningful information to users that allow them to discover business values and insights in a timely fashion.’’ Hence, there is a need to analyze this Big Data in order to:
- improve patient satisfaction and quality of care;
- reduce expenditures, which reached 17.9% of the gross domestic product in 2010 [21].
The authors in [43] suggest that the collection of clinical data should adopt the best strategies and recommendations in order to overcome the usual performance gaps. This data is generated from many sources, which encompass Electronic Health Records, medical imaging data, genetic data and prescription notes [34]. Some examples of pertinent Big Data applications in the Health Sector are the following.

A. HEALTHCARE MONITORING
A deep analysis of healthcare data can help care providers manage patients' symptoms online and adjust prescriptions [17]. For instance, with the development of wearable sensor devices, like the Apple Watch and sports bracelets, information related to physical health checkups, including blood pressure, height, weight, blood-glucose levels and blood-calcium levels, can be constantly monitored in order to give a detailed view of the condition of the patient's health. These indicators help physicians monitor patients; consequently, unnecessary visits to the doctor can be avoided and, at the same time, patients feel more independent and become more aware of their healthcare status [15], [36]. Furthermore, smart dispensers are used to detect whether drugs are being taken regularly at the right time. In the case of missed medication, the practitioner can intervene to get patients properly medicated. Besides, Big Data analytics helps physicians avoid medication errors due to drug interactions or incorrect dosages, which could easily lead to a critical situation. Big Data allows physicians to assess a patient's records in full, including the recommended medications. In the USA, about one million injuries occur due to prescription errors, leading to thousands of unnecessary deaths. Therefore, an analytical solution called ‘‘MedAware’’ has been developed: a real-time analysis of medical prescriptions can detect a wide range of errors with high precision (90%). Anticipating clinical errors can reduce unnecessary hospitalizations and re-admissions, as well as excessively long hospital stays [23].
Furthermore, Common Sensing, a company that uses Big Data technologies to ensure the follow-up of treatment given to diabetic patients, developed a replacement cap named ‘‘GoCap’’ for prefilled insulin pens, able to register the dose of insulin taken daily and the exact time when it was administered. Bluetooth technology is then used to transmit the collected data to a mobile phone or a connected glucometer. This streaming data might be transferred to care providers, who would be able to detect potential healthcare issues and to interrupt treatment in case of an emergency [16].

B. HEALTHCARE PREDICTION
Social networks, such as Facebook and Twitter, can help establish healthcare social networks. For example, patients who suffer from chronic diseases can share their own experience with other patients or doctors. This sharing helps them benefit from a broad range of experiences and expertise [30], [40]. An integrative healthcare system called ‘‘GEMINI’’ was developed to process and analyze a large amount of variable and complex healthcare information. First, for each patient, data is collected from structured sources, including patient demographics, lab test results and medication history, and from unstructured sources, for example, prescriptions. This data is then stored in a patient profile graph that presents a broad view of the patient's state of health. Then, analytics algorithms, such as classification and clustering algorithms, are used to extract valuable insights useful for administrative and clinical purposes, as well as for predictive analytics [41].
In addition, big genomics data analytics contributes to healthcare predictions by measuring changes in DNA mutations and detecting the molecules responsible for the appearance of diseases [34].
Moreover, in order to reduce the healthcare costs due to the high rates of patients who suffer from chronic diseases, such as diabetes, doctors use big data predictive analytics to identify high-risk patients and to offer them customized care [42]. To ensure the therapeutic follow-up of asthmatic patients, an American company named Asthmapolis has

developed sensors that are positioned on top of an inhaler in order to track its usage. Data related to the place and time of each use of the inhaler is collected in real time via the global positioning system (GPS). In the case of an asthma attack, recorded data is transferred to a website accessible by the patient and doctor via a smartphone or computer. On one hand, the main goal of the sensors is to help patients decide whether or not to visit certain places, as well as to predict asthma attacks. On the other hand, the aggregated data guides the practitioner in developing a customized care plan for the patient, identifying periods with a high probability of an asthmatic crisis and anticipating them by either increasing or decreasing the drug dosage [16].
The authors in [44] selected 102 out of 1000 patients suffering from metabolic syndrome in order to follow up on their recovery. Analysts collected data from 600 laboratory tests and 180 claims. Moreover, a customized treatment plan was elaborated from the patients’ health records. This application produced encouraging results: the morbidity rate might be reduced by 50% over the next 10 years. Several alternative solutions were deduced: prescription of statins, weight loss, and reduction of the total triglyceride level in the body if the blood sugar level exceeds 20%.
In order to follow the propagation of epidemics around the world, Google developed two real-time surveillance applications: Google Flu Trends and Google Dengue Trends. Data were collected from Internet searches made by people using the Google search engine to find information about symptoms, drugs, side effects, etc. However, the results showed that Google Flu Trends overestimated the prevalence of flu by 140% [12].

C. PERFORMANCE ENHANCEMENT
In order to maximize the performance of its Emergency Room (ER) and decrease ER crowding, King Faisal Specialist Hospital and Research Center carried out a successful project based on clinical big data analytics. Data relating to the emergency department was extracted from the hospital’s data warehouse. The analysis of the data introduced changes in the ER workflow that led to positive results: the ER waiting time fell from 140 min in 2014 to 62 min in 2016, the treatment time decreased from 17.5 h to 10.8 h, and the ER length of stay dropped from 20 h to 12 h over the same period. This shows the effectiveness of big data analysis in identifying areas of inefficiency and in recommending valuable solutions to improve performance [43].

D. RECOMMENDATION SYSTEMS
In general, recommendation systems are software tools and techniques developed to propose a set of suggestions for a product or a service, which help users make better decisions. Currently, in the healthcare sector, recommendation systems are increasingly used to provide medical recommendations for drugs, diagnoses, and treatment plans.
The authors of [46] developed a clinical recommendation system that helps patients obtain reliable recommendations of care providers based on their own health status. In addition, patients can contribute to the performance of the system by adding their private notes and evaluations of physicians for different health conditions. However, due to the sensitivity of such data, it must be protected from dishonest and malicious users. Moreover, for care providers, the system ensures the preservation of their reputation. As output, the system presents, for each patient, a list of the best-ranked physicians.
Furthermore, collaborative filtering is a recommendation technique that predicts a user’s opinion of an item based on the preferences of a large group of users [13]. This technique was then applied in the healthcare sector through the Collaborative Assessment and Recommendation Engine (CARE) for disease risk prediction. A patient’s first visit to a doctor provides clinical input data related to the patient’s medical history. Then, for each disease j, analysts find the patients who suffer from this disease based on health record data. The application of collaborative filtering generates p(a, j), the probability that patient a will develop disease j in the future. Finally, for each patient, the system provides the physician with a list of potential diseases ranked from highest to lowest risk. This framework is effective for decreasing readmission rates, making consistent predictions and improving quality-of-care ratings. To demonstrate its feasibility and efficiency, the system was validated on a Medicare database of 13 million patients who made 32 million medical visits over 4 years.

E. HEALTHCARE KNOWLEDGE SYSTEM
A knowledge system is defined as the combination of information, data and physician expertise used to present alternatives for potential emergency situations and to support clinical decision-making and diagnosis. The authors in [34] suggested a healthcare knowledge system based on four Big Data sources: Electronic Health Records (EHR), Clinical Notes, Genetic Data and Medical Imaging Data. EHR data includes structured data, for example, laboratory data and billing data, and unstructured data such as medication records. Laboratory data is useful for diagnosis and health monitoring. Billing data includes various codes giving access to laboratory results, clinical records and symptoms. Medication records encompass a variety of data useful for disease diagnosis and drug recommendations. Clinical notes are unstructured data used to identify common or well-known illnesses. Genetic data is a huge volume of data used in the analysis of changes in gene sequences. Finally, unstructured medical imaging data contains different kinds of image data useful for treatment, diagnosis and prognosis.

F. HEALTHCARE MANAGEMENT SYSTEM
The authors in [9] proposed a smart Healthcare Management System named ‘‘DataCare’’, to be used at hospitals or
healthcare centers, with the aim of increasing patient satisfaction by monitoring clinical Key Performance Indicators (KPIs) and identifying unexpected situations. The architecture of the system has three main modules: Data Retrieving and Aggregation, Data Processing and Analysis, and Data Visualization. In the first module, data is collected and aggregated via the AdvantCare software, which is used for supervising the communication between patients and physicians. In the second module, the authors decided to use Apache Spark to process the huge amounts of streaming data because of its high scalability and fault tolerance. Data was then stored in a MongoDB database for further analysis. The stored data was analyzed in order to extract valuable insights for healthcare predictions, clinical recommendations and alerts. In fact, DataCare is capable of forecasting future KPIs based on current data. Moreover, the framework is able to generate early, real-time alerts if an indicator exceeds its authorized range as delimited by thresholds. In addition, DataCare provides recommendations aimed at enhancing the quality of care, using 52 rules designed by physicians based on their expertise and knowledge. Finally, the Visualization module displays all the information on dashboards to ensure more accurate and efficient interpretation. DataCare was validated at a medical centre in Spain, where the expected outcomes were obtained and interesting conclusions were drawn.
From these diverse applications in the healthcare sector, one can deduce the potential of Big Data analysis for optimizing hospital operations, improving the quality of care, decreasing expenditure and readmission rates, and saving lives [11], [17], [22].

V. BIG DATA ANALYTICS
Big Data Analytics is defined as the mining of pertinent knowledge and valuable insights from large amounts of stored data [4]. The key objective of such analytics is to facilitate decision making for researchers, for example by offering dashboards, graphics or operational reporting to monitor thresholds and KPIs. This involves using mathematical and statistical methods to understand data, simulate scenarios, validate hypotheses and forecast future incidents. Data Mining is a key concept in Big Data Analytics that consists of applying data science techniques to analyze and explore large datasets in order to find meaningful and useful patterns in those data. It involves complex statistical models and sophisticated algorithms, such as machine learning algorithms, mainly to perform four categories of analytics: Descriptive analytics, Predictive analytics, Prescriptive analytics and Discovery (Exploratory) analytics. Descriptive analytics turns collected data into meaningful information for interpretation, reporting, monitoring and visualization purposes via statistical graphical tools such as pie charts, graphs, bar charts, and dashboards. Predictive analytics is commonly defined as the extrapolation of available data to support better decision making. Prescriptive analytics is associated with descriptive and predictive analytics: based on the present situation, it offers options on how to benefit from future opportunities or mitigate future risks, and details the implications of each decision option. Finally, Discovery (Exploratory) analytics reveals unexpected relationships between parameters in Big Data [4]. The authors in [24] argue that, currently, the output of predictive analytics can benefit from the potential of descriptive analytics through the use of dashboards and scorecard computations.
Big Healthcare Data Analytics (BHDA) is defined as the use of statistical, cognitive, predictive, contextual, and quantitative models for efficient and fast decision making useful for planning, forecasting, resource management, etc. Big Data Analytics helps healthcare stakeholders (medical practitioners, hospital operators, pharmaceutical and clinical researchers, and healthcare insurers) improve their findings by harnessing their internal and external Big Data [5], [11], [12], [40]. For medical practitioners, the analysis of patient data, including patient medical history, physicians’ notes, laboratory results, and clinical trial data, helps them track the progress of a proposed treatment plan and interrupt the plan to make changes if necessary; consequently, unnecessary visits can be eliminated and readmission rates decreased. For hospital operators, Big Data Analytics helps allocate resources. For instance, the analysis of location awareness data helps optimize the use of expensive healthcare equipment and devices. In addition, pharmaceutical organizations benefit from analytics when elaborating marketing strategies. In fact, by gathering and analyzing data such as sales history and drug recommendations for each patient and disease, they are able to assess their current market position, which is useful for defining strategic priorities. Furthermore, by analyzing patient demographic data, such as age and gender, and clinical data, such as disease and drug history, insurers are able to elaborate an appropriate health plan for each patient [30]. In conclusion, Big Data Analytics plays an important role in the enhancement of medical services and increases patient satisfaction. Consequently, it has the potential to improve care, save lives and lower costs.

VI. BIG DATA: CHALLENGES & PERSPECTIVES
A. BIG DATA CHALLENGES
Big Data presents various opportunities in the medical, biomedical and healthcare sectors due to its ability to yield valuable knowledge useful for improving healthcare organizations, reducing healthcare costs, and reducing unnecessary visits and readmissions [11]. However, when handling datasets characterized by huge volumes generated at very high speeds and with large diversity, e.g. structured, semi-structured, and unstructured data, many barriers and hurdles have to be overcome along the value-creation path, from data collection to data analysis. Diverse challenges, which can be categorized into five groups, must be overcome in order to benefit from the advantages of Big Data:

1) DATA COLLECTION
Data reliability is among the criteria for data selection during the collection phase. It is crucial to select data sources carefully, considering that they may contain noise, errors, and inconsistent or incomplete data. However, due to the enormous diversity of sources, treating them all and selecting the best is becoming a challenge. In addition, during data acquisition, it is necessary to integrate data external to the organization with its internal data in order to obtain knowledge and updated information about the external environment and to build accurate prediction models. This aggregation is becoming more and more challenging [17].

2) DATA PROCESSING
Data processing aims to generate cleaned, consistent and secured data for efficient and accurate analysis. The first challenge is how to collect, process and store variable data from various types of devices with limited storage capacity and CPU. The second challenge is how to ensure accuracy and consistency in decision making when aggregating dissimilar data with multiple formats [17]. However, many Big Data processing tools perform poorly in the face of computational uncertainties, inconsistencies and complexities. So, it is becoming a challenge to use suitable techniques and technologies that minimize computational processing cost and complexity. Currently, many well-known organizations require real-time data processing, in which large amounts of data are processed in real or near-real time to allow fast decision making. Therefore, there is a need to adopt Big Data technologies with high scalability [32].

3) DATA STORAGE
The volume of data collected is increasing dramatically, especially due to the spread and use of new technological trends, such as social media and remote sensing. In the recent past, developers and analysts used hard disk drives to store data. Unfortunately, these devices are no longer suitable for such data storage. Therefore, the first challenge is the use of appropriate storage media with higher input/output speeds. The main goal is to ensure data availability and accessibility for further analysis [32].

4) DATA ANALYSIS
For the extraction of relevant information from a pool of stored data generated from different sources, it is crucial to choose the analytical software and hardware well in order to produce more accurate and valuable outcomes. This requirement is becoming more and more challenging due to the diversity of available technologies designed for data analysis. Besides, the variety of data can cause unprecedented challenges for analysts. Indeed, the existing tools are unable to respond in the required time when treating high-dimensional data. The next challenge is related to how effectively multivariate data can be analyzed in order to obtain valuable knowledge as an output. Moreover, for easy and pertinent interpretation and analysis, many researchers use graphic visualization tools capable of summarizing large amounts of data into meaningful and intuitive graphic or picture formats. Thus, researchers need to use tools with high scalability. However, most of the available tools present many functional limits in terms of scalability and response time. So, once again, it is a challenge to design software or hardware that enables parallel computing and visualization processes to ensure accurate analyses [2], [32].

5) DATA SECURITY
Clinical data are very sensitive data that must be secured in order to protect them from hackers who are able to use data mining techniques to extract personal data and make it public. Therefore, it is essential to implement security procedures such as authentication, authorization and encryption to enhance data security. The challenge here is to develop a multi-level security, privacy-preserving data model for Big Data [20], [32], [35].

B. BIG DATA CHALLENGES FOR HEALTHCARE
Currently, the healthcare sector is among the sectors that generate tremendous amounts of data from multiple sources, which can be quantitative, including laboratory tests, gene arrays and sensor data, or qualitative, such as demographics and free texts [35]. According to [3], medical data is different from other data: it is very sensitive and hard to access. Unfortunately, data can be affected by the use of unmanaged data sources such as social networks. Also, any errors in measurements or in codes can severely affect the reliability of an analysis. Therefore, data trustworthiness, data quality and data consistency are yet another challenge for Big Data. Moreover, clinical data is continuously being generated; hence, real-time data streaming tools and technologies are gaining relevance [22].
As previously stated, the majority of the world’s population uses social networks to collect updated information, as well as to obtain knowledge, to communicate and to carry out personal research. Therefore, a tremendous amount of data will be generated at high speed and in various formats. Thus, it is very interesting and challenging to exploit and integrate this data in order to improve healthcare outcomes. A possible challenge is to elaborate a medical decision system for forecasting the spread of epidemics in different geographical locations by analyzing social media data such as Facebook and Twitter. In order to carry this out, a robust predictive system that considers a set of attributes related to a type of epidemic, e.g. influenza, dengue fever or cholera, must be developed. The past values extracted from social media, including comments, likes, and posts, will allow a predictive system to extrapolate these values into the future. The capabilities of Big Data technologies and the richness, massiveness and variety of social network data can be combined to provide relevant healthcare predictions.
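The extrapolation step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the social media signal has already been reduced to one weekly count of symptom-related posts per region; the function names `fit_linear_trend` and `forecast` and the sample counts are hypothetical. An ordinary least-squares linear trend stands in here for the much richer model (epidemic-specific attributes, seasonality, bias correction) that such a system would actually need.

```python
from __future__ import annotations

from typing import Sequence


def fit_linear_trend(values: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of values[t] ~ intercept + slope * t for t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_t = (n - 1) / 2  # mean of the time indices 0..n-1
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return intercept, slope


def forecast(values: Sequence[float], horizon: int) -> list[float]:
    """Extrapolate the fitted trend `horizon` steps past the last observation."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    # Post counts cannot be negative, so extrapolated values are clipped at zero.
    return [max(0.0, intercept + slope * (n + h)) for h in range(horizon)]


if __name__ == "__main__":
    # Hypothetical weekly counts of posts mentioning flu symptoms in one region.
    weekly_posts = [120, 135, 150, 170]
    print(forecast(weekly_posts, 2))  # [185.0, 201.5]
```

With the hypothetical counts above, the fitted slope is 16.5 posts per week, so the next two weeks extrapolate to 185.0 and 201.5 posts. A pure trend line ignores seasonality and the biases of search or social media signals; the Google Flu Trends overestimate reported in the Healthcare Prediction subsection is a reminder that any such forecast would need to be validated against clinical surveillance data.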

VII. CONCLUSION
This study aimed to emphasize the enormous implications of Big Data techniques and technologies for the performance and outcomes of healthcare organizations. The concept of ‘‘Big Data’’ was first presented, along with its evolution over time and its Vs: Volume, Variety, Velocity, Veracity, Variability, Validity, Viscosity, Volatility, Visualization, Virality, and Valence. Then, the process of building value from big data was described; for each step, a list of available Big Data technologies was proposed and detailed, showing the potential of these technologies for handling and analyzing huge amounts of data extracted from multiple sources and in different formats. Next, big data applications for healthcare found in the literature were classified into six groups: Healthcare Monitoring, Healthcare Prediction, Performance Enhancement, Recommendation Systems, Healthcare Knowledge Systems and Healthcare Management Systems. Based on the reviewed cases, one can confirm the numerous opportunities offered by Big Data and its analysis.
The analytical capabilities of Big Data techniques and technologies, as well as the consistent knowledge and valuable insights that can be derived from stored Big Data, are useful for making predictions, recommendations, medical diagnoses, resource allocations and personalized treatment plans. This ability may have a positive effect on the quality of healthcare and its outcomes. Big Data Analytics was also classified into four types: Descriptive Analytics, Prescriptive Analytics, Predictive Analytics and Discovery Analytics. Finally, based on the identified Big Data features, along with the Big Data value chain, many of the challenges that must be tackled were identified.

REFERENCES
[1] A. De Mauro, M. J. Greco, and M. Grimaldi, ‘‘A formal definition of big data based on its essential features,’’ Library Rev., vol. 65, no. 3, pp. 122–135, 2016.
[2] A. Gandomi and M. Haider, ‘‘Beyond the hype: Big data concepts, methods, and analytics,’’ Int. J. Inf. Manage., vol. 35, no. 2, pp. 137–144, 2015.
[3] C. H. Lee and H.-J. Yoon, ‘‘Medical big data: Promise and challenges,’’ Kidney Res. Clin. Pract., vol. 36, no. 1, pp. 3–11, 2017.
[4] D. Rajeshwari, ‘‘State of the art of big data analytics: A survey,’’ Int. J. Comput. Appl., vol. 120, no. 22, pp. 39–46, 2015.
[5] J. Cortada, D. Gordon, and B. Lenihan, ‘‘The value of analytics in healthcare: From insights to outcomes,’’ IBM Global Services, Armonk, NY, USA, Executive Rep. GBE03476-USEN-00, 2012. [Online]. Available: http://www-05.ibm.com/ch/gesundheitswesen/pdf/The_value_of_analytics_in_healthcare.pdf
[6] S. Tiwari, H. M. Wee, and Y. Daryanto, ‘‘Big data analytics in supply chain management between 2010 and 2016: Insights to industries,’’ Comput. Ind. Eng., vol. 115, pp. 319–330, Jan. 2018.
[7] N. Ilyasova, A. Kupriyanov, R. Paringer, and D. Kirsh, ‘‘Particular use of BIG DATA in medical diagnostic tasks,’’ J. Sci. Commun., vol. 28, no. 1, pp. 114–121, 2018.
[8] S. Mazumder, ‘‘Big Data Tools and Platforms,’’ in Big Data Concepts, Theories, and Applications, Y. Shui and G. Song, Eds. Cham, Switzerland: Springer, 2016, pp. 29–128.
[9] A. Baldominos, F. De Rada, and Y. Saez, ‘‘DataCare: Big data analytics solution for intelligent healthcare management,’’ Int. J. Interact. Multimedia Artif. Intell., vol. 4, no. 7, pp. 13–20, 2017.
[10] D. Laney, ‘‘3D data management: Controlling data volume, velocity and variety,’’ META Group, Stamford, CT, USA, Res. Note 6, 2001. [Online]. Available: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
[11] A. Kankanhalli, J. Hahn, S. Tan, and G. Gao, ‘‘Big data and analytics in healthcare: Introduction to the special section,’’ Inf. Syst. Frontiers, vol. 18, no. 2, pp. 233–235, 2016.
[12] V. Rajaraman, ‘‘Big data analytics,’’ Resonance, vol. 21, no. 8, pp. 695–716, 2016.
[13] V. Shobana and N. Kumar, ‘‘A personalized recommendation engine for prediction of disorders using big data analytics,’’ in Proc. IEEE ICIGEHT, Coimbatore, India, Mar. 2017, pp. 1–4.
[14] N. V. Chawla and D. A. Davis, ‘‘Bringing big data to personalized healthcare: A patient-centered framework,’’ J. Global Inf. Manage., vol. 28, no. 3, pp. 660–665, 2013.
[15] F. Liang, W. Yu, D. An, Q. Yang, X. Fu, and W. Zhao, ‘‘A survey on big data market: Pricing, trading and protection,’’ IEEE Access, vol. 6, pp. 15132–15154, 2018.
[16] R. Nambiar, A. Sethi, R. Bhardwaj, and R. Vargheese, ‘‘A look at challenges and opportunities of big data analytics in healthcare,’’ in Proc. IEEE Int. Conf. Big Data, Silicon Valley, CA, USA, Oct. 2013, pp. 17–22.
[17] A. Oussous, F. Z. Benjelloun, A. Ait Lahcen, and S. Belfkih, ‘‘Big data technologies: A survey,’’ J. King Saud Univ., Comput. Inf. Sci., vol. 30, no. 4, pp. 431–448, 2017.
[18] K. Tiampo, S. McGinnis, Y. Kropivnitskaya, J. Qin, and M. A. Bauer, ‘‘Big data challenges and hazards modeling,’’ in Risk Modeling for Hazards and Disasters, M. Gero, Ed. Amsterdam, The Netherlands: Elsevier, 2018, pp. 193–210.
[19] Y. Wang, L. Kung, and T. Byrd, ‘‘Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations,’’ Technol. Forecasting Social Change, vol. 126, pp. 3–13, Jan. 2016.
[20] K. Abouelmehdi, A. Beni-Hssane, H. Khaloufi, and M. Saadi, ‘‘Big data security and privacy in healthcare: A review,’’ in Proc. 8th Int. Conf. EUSPN, Lund, Sweden, 2017, pp. 73–80.
[21] Transforming Healthcare Through Big Data: Strategies for Leveraging Big Data in the Healthcare Industry, Inst. Health Technol., New York, NY, USA, 2013.
[22] W. Raghupathi and V. Raghupathi, ‘‘Big Data analytics in healthcare: Promise and potential,’’ Health Inf. Sci. Syst., vol. 2, no. 3, pp. 1–10, 2014.
[23] MedAware Raises $8 Million for Software to Reduce Prescription Errors, Wall Street J., Dow Jones Company, New York, NY, USA, 2017.
[24] U. Sivarajah, M. M. Kamal, I. Irani, and V. Weerakkody, ‘‘Critical analysis of big data challenges and analytical methods,’’ J. Bus. Res., vol. 70, pp. 263–286, Jan. 2017.
[25] I. Lee, ‘‘Big data: Dimensions, evolution, impacts, and challenges,’’ Bus. Horizons, vol. 60, no. 3, pp. 293–303, 2017.
[26] B. Baesens, R. Bapna, J. R. Marsden, J. Vanthienen, and J. L. Zhao, ‘‘Transformational issues of big data and analytics in networked business,’’ MIS Quart., vol. 40, no. 4, pp. 807–818, 2016.
[27] N. Khan et al., ‘‘Big data: Survey, technologies, opportunities, and challenges,’’ Sci. World J., vol. 2014, Art. no. 712826, doi: 10.1155/2014/712826.
[28] W.-K. Liu and C.-C. Yen, ‘‘Optimizing bus passenger complaint service through big data analysis: Systematized analysis for improved public sector management,’’ MDPI J., vol. 8, no. 12, p. 1319, 2016.
[29] H. Daki, A. El Hannani, A. Aqqal, A. Haidine, and A. Dahbi, ‘‘Big data management in smart grid: Concepts, requirements and implementation,’’ J. Big Data, vol. 4, no. 13, pp. 01–19, 2017.
[30] V. Palanisamy and R. Thirunavukarasu, ‘‘Implications of big data analytics in developing healthcare frameworks—A review,’’ J. King Saud Univ.-Comput. Inf. Sci., to be published, doi: 10.1016/j.jksuci.2017.12.007.
[31] V. Kamilaris, A. Kartakoullis, and F. X. Prenafeta-Bold, ‘‘A review on the practice of big data analysis in agriculture,’’ Comput. Electron. Agricult., vol. 143, pp. 23–37, Dec. 2017.
[32] D. P. Acharjya and A. P. Kauser, ‘‘A survey on big data analytics: Challenges, open research issues and tools,’’ Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 2, pp. 511–518, 2016.
[33] R. Y. Zhong, S. T. Newman, G. Q. Huang, and S. Lan, ‘‘Big data for supply chain management in the service and manufacturing sectors: Challenges, opportunities, and future perspectives,’’ Comput. Ind. Eng., vol. 101, pp. 572–591, Nov. 2016.
[34] G. Manogaran, C. Thota, D. Lopez, V. Vijayakumar, K. M. Abbas, and R. Sundarsekar, ‘‘Big data knowledge system in healthcare,’’ in Internet of Things and Big Data Technologies for Next Generation Healthcare, C. Bhatt, N. Dey, and A. S. Ashour, Eds. Springer, 2017, pp. 133–157.
[35] J. Andreu-Perez, C. C. Y. Poon, R. D. Merrifield, S. T. C. Wong, and G.-Z. Yang, ‘‘Big data for health,’’ IEEE J. Biomed. Health Inform., vol. 19, no. 4, pp. 1193–1208, Jul. 2015.
[36] G. Manogaran, D. Lopez, C. Thota, K. M. Abbas, S. Pyne, and R. Sundarsekar, ‘‘Big data analytics in healthcare Internet of Things,’’ in Innovative Healthcare Systems for the 21st Century, H. Qudrat-Ullah and P. Tsasis, Eds. Springer, 2017, pp. 263–284.
[37] IDC Iview. vol. 1142, pp. 1–12, 2011. [Online]. Available: https://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
[38] IDC Country Brief. Feb. 2013. [Online]. Available: https://www.emc.com/collateral/analyst-reports/idc-digital-universe-united-states.pdf
[39] R. Addo-Tenkorang and P. Y. Helo, ‘‘Big data applications in operations/supply-chain management: A literature review,’’ Comput. Ind. Eng., vol. 101, pp. 528–543, Nov. 2016.
[40] S. Shafqat, S. Kishwer, R. ur Rasool, J. Qadir, T. Amjad, and H. F. Ahmad, ‘‘Big data analytics enhanced healthcare systems: A review,’’ J. Supercomput., pp. 1–46, 2018, doi: 10.1007/s11227-017-2222-4.
[41] Z. J. Ling et al., ‘‘GEMINI: An integrative healthcare analytics system,’’ in Proc. 40th Int. Conf. Very Large Data Bases, Hangzhou, China, 2014, pp. 1771–1776.
[42] N. M. S. Kumar, T. Eswari, P. Sampath, and S. Lavanya, ‘‘Predictive methodology for diabetic data analysis in big data,’’ in Proc. 2nd Int. Symp. Big Data Cloud Comput. (ISBCC), vol. 50, 2015, pp. 203–208.
[43] M. Khalifa and I. Zabani, ‘‘Utilizing Health Analytics in improving the performance of healthcare services: A case study on a tertiary care hospital,’’ J. Infection Public Health, vol. 9, no. 6, pp. 757–765, 2016.
[44] M. Chen, S. Mao, Y. Zhang, and V. C. Leung, Big Data: Related Technologies, Challenges and Future Prospects. Cham, Switzerland: Springer, 2014.
[45] A. Bhadani and D. Jothimani, ‘‘Big data: Challenges, opportunities and realities,’’ in Effective Big Data Management and Opportunities for Implementation, M. K. Singh and D. G. Kumar, Eds. Hershey, PA, USA: IGI Global, 2016, pp. 1–24.
[46] T. R. Hoens, M. Blanton, A. Steele, and N. V. Chawla, ‘‘Reliable medical recommendation systems with patient privacy,’’ ACM Trans. Intell. Syst. Technol., vol. 4, no. 4, 2013, Art. no. 67.
[47] R. J. Watson and J. L. Christensen, ‘‘Big data and student engagement among vulnerable youth: A review,’’ Current Opinion Behav. Sci., vol. 18, pp. 23–27, Dec. 2017.
[48] E. Kasturi, S. P. Devi, S. V. Kiran, and S. Manivannan, ‘‘Airline route profitability analysis and optimization using BIG DATA analytics on aviation data sets under heuristic techniques,’’ Procedia Comput. Sci., vol. 87, pp. 86–92, 2016.
[49] J. Manyika et al., Big Data: The Next Frontier for Innovation, Competition and Productivity. New York, NY, USA: McKinsey Global Institute, 2011.
[50] A. Intezari and S. Gressel, ‘‘Information and reformation in KM systems: Big data and strategic decision-making,’’ J. Knowl. Manage., vol. 21, no. 1, pp. 71–91, 2017.

NESRINE ZOGHLAMI received the Diploma degree in electrical engineering and the M.Sc. degree in electronics and telecommunications from the University of Valenciennes, France, and the Ph.D. degree in industrial computer science and automatic control from the Ecole Centrale de Lille, France, in 2008. She is currently an Associate Professor in industrial computer science. She took an active part in the national working group ORT of GDR MACS. She was responsible for organizing the program committees of international conferences and workshops, including MHOSI 2005, LT 2006, LT 2007, LT 2009, MSLT 2011, Sysco 2012, ICALT 2013, GOL 2014, ICALT 2014, ICALT 2015, ICIT 2015, ICIT 2016, GOL 2016, IPAC 2016, BDWA 2016, ASET 2017, and ASET 2018. She was a Rapporteur of the H2020 proposals for the Research Executive Agency, European Commission, from 2015 to 2016. She has authored or co-authored about 50 publications, communications, and book chapters. She has authored a book on supply chain management, Optimisation à base d’agents communicants des flux logistiques pour la gestion de crise. Her main research interests include optimization, artificial intelligence, and supply chain management.

MOURAD ABED was the Vice-President (digital) of the University of Valenciennes and the Vice-Director of the Institute of Science and Technology from 2000 to 2010. He is currently a Professor (Exceptional class) in computer engineering with the University of Valenciennes and a member of the Human Computer Interaction and Automated Reasoning Research Group, Automatic, Mechanic and Human IT Laboratory. He is also the Director of the Program of Master of Science and Technology Studies, a European Project Coordinator, and the National Co-Chair of the Research Group. He has been the President or the Co-President of international conferences or special sessions and conferences for international journals. He has authored or co-authored more than 180 book chapters, journal articles, and communications. He participates in several research networks, projects, and associations.

JOÃO MANUEL R. S. TAVARES graduated in mechanical engineering from the Universidade do Porto, Portugal, in 1992. He received the M.Sc. and Ph.D. degrees in electrical and computer engineering from the Universidade do Porto, in 1995 and 2001, respectively, and the Habilitation degree in mechanical engineering, in 2015. He is currently a Senior Researcher with the Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial and an Associate Professor with the Department of Mechanical Engineering, Faculdade de Engenharia da Universidade do Porto. He is the co-editor of more than 40 books, and
the co-author of more than 35 book chapters and 600 articles in international
and national journals and conferences. He holds three international patents
and two national patents. He has been a Committee Member for several
international and national journals and conferences. He is the co-founder
and the co-editor of the book series Lecture Notes in Computational Vision
SAFA BAHRI was born in Tunis, Tunisia, in 1992. and Biomechanics (Springer). He has been a (Co-)Supervisor for several
She received the National Diploma of Engineering M.Sc. and Ph.D. theses and a Supervisor for several Postdoctoral projects.
degree in industrial engineering from the National He has participated in many scientific projects as a Researcher and as a
School of Engineering of Carthage, in 2016. Scientific Coordinator. His main research interests include computational
She is currently pursuing the joint Ph.D. degree vision, medical imaging, computational mechanics, scientific visualization,
with the LTISIRS Laboratory, National School of human–computer interaction, and new product development. He is the
Engineering of Tunisia, and the LAMIH Labo- Founder and the Editor-in-Chief of the Computer Methods in Biomechanics
ratory, University of Valenciennes and Hainaut- and Biomedical Engineering: Imaging & Visualization (Taylor & Francis)
Cambrésis. She is developing a healthcare system and the Co-Founder and the Co-Chair of the International Conference Series,
that predicts the spreading of epidemics through including CompIMAGE, ECCOMAS Vip IMAGE, ICCEBS, and BioDental.
social media data analysis. Her research interests include big data applica- More information can be found at www.fe.up.pt/ tavares.
tions in the healthcare sector.
