BIG DATA for Healthcare: A Survey
Received September 11, 2018, accepted November 7, 2018, date of publication December 21, 2018,
date of current version January 23, 2019.
Digital Object Identifier 10.1109/ACCESS.2018.2889180
ABSTRACT Recently, the widespread adoption of new technologies by a large majority of the world population has generated a tremendous amount of data, including clinical data. These clinical data have been gathered and interpreted by medical organizations in order to gain insights and knowledge useful for clinical decisions, drug recommendations, and better diagnoses, among many other uses. This paper highlights the enormous impact of big data on medical stakeholders (patients, physicians, pharmaceutical and medical operators, and healthcare insurers) and reviews the different challenges that must be taken into account to get the greatest benefit from all this big data and the available applications.
2169-3536 2018 IEEE. Translations and content mining are permitted for academic research only.
VOLUME 7, 2019 Personal use is also permitted, but republication/redistribution requires IEEE permission. 7397
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
S. Bahri et al.: BIG DATA for Healthcare: A Survey
2) DATA PROCESSING
MapReduce is a programming model that makes the processing of massive amounts of data simpler and faster through its efficient and cost-effective mechanisms. As shown in Figure 6, this framework has three main functions:
• Function Map: the input data is broken down into independent key/value pairs;
• Function Shuffle or Sort: the key/value pairs are collected, stored and then grouped by key. The output of this function is a collection of keys with their associated values;
• Function Reduce: the pairs are aggregated in parallel according to a predefined program. The outputs are sets of key/value pairs stored in the output file of the MapReduce system.
FIGURE 6. MapReduce pipeline.
A well-known application of the MapReduce framework, called "Word Count", is presented in Figure 7.
The MapReduce framework also has a master-slave architecture (Figure 8):
• JobTracker (Master Node): is responsible for distributing and assigning tasks to the slave nodes;
• TaskTracker (Slave Nodes): performs the tasks assigned by the JobTracker and supervises their execution.
FIGURE 8. MapReduce architecture.
MapReduce programs can be written in programming languages such as C++, Python or Java. The technology is widely used because of its fault tolerance and scalability: if a machine fails, another machine takes over the tasks of the failed node [8], [17].
YARN (Yet Another Resource Negotiator) is more generic than MapReduce. It is an advanced resource manager that works on top of HDFS and ensures the parallel execution of various applications. In addition, it handles both batch and stream processing. This framework is also known for its scalability and security. YARN uses dynamic allocation of system resources, which allows it to make better use of the available resources. YARN has a master-slave architecture like the MapReduce framework (Figure 9). The resource manager, which operates as the master, manages the assignment of jobs across the cluster. The node manager is a generalized task tracker that provides computational resources in the form of containers and manages the processes running in those containers. A container executes an application-specific process with a constrained set of resources. An application master is in charge of managing the resources required by an individual application; it schedules tasks and assesses their progress [8], [17].
Storm: is an open source framework designed for distributed real-time computations. Unlike HDFS, which is targeted at batch processing, this tool is able to handle stream data. It is characterized by its high scalability, fault tolerance
and efficiency. Storm is used, for example, in real-time analytics, ETL (Extract, Transform, Load) operations, online machine learning, and continuous computations [8], [17].
Flink: is an open source framework able to process stream data thanks to its distributed architecture.
Spark: is an in-memory cluster computing technology developed for fast data computing through its sophisticated libraries:
• Spark Streaming: allows data to be collected and processed in real time;
• Spark SQL: executes SQL queries;
• Spark MLlib: is a useful and powerful machine learning library for solving machine learning problems. It contains different machine learning algorithms for classification, regression, clustering and optimization;
• SparkR: offers functionalities of R, the programming language for statistical computing and graphics, with reduced processing time thanks to Spark's distributed architecture;
• Spark GraphX: is devoted to graph processing.
Giraph: is an iterative graph processing system with high scalability. It is used, in particular, by Facebook to analyze social graphs and the connections between subscribers [8].
ZooKeeper: is a distributed service designed for synchronizing configurations across a cluster. It ensures the high availability of data [17], [27].
Oozie: is an open-source job coordinator that executes and manages job flows in the Hadoop system. Oozie is a scalable, reliable and extensible system [27].
3) BIG DATA STORAGE
To store huge volumes of data, Hadoop uses the distributed file system HDFS and a non-relational database named HBase:
HDFS is one of the primary components of a Hadoop cluster (i.e. a set of connected computers) and can support hundreds of nodes. It is cost effective and offers reliable storage, high scalability and fault tolerance. In addition, HDFS can handle both structured and unstructured data. HDFS is designed for batch processing, where high latency is acceptable. It stores data in blocks of 64 MB or 128 MB. In order to avoid data loss, the blocks of a file are replicated three times by default and stored on three different servers across the cluster. HDFS has a master-slave architecture (Figure 4), which is commonly adopted because it reduces network congestion and increases system performance by performing the computations near the data storage locations [17], [12].
HBase is an open source project built on top of HDFS and designed for low-latency operations. This non-relational database, modeled after Google's Bigtable, can host very large tables with billions of rows and millions of columns. Unlike a row-oriented relational database, which stores all the columns of a row together, HBase is a column-oriented database management system that stores data by column to ease data access. As with HDFS, this database has a master-slave architecture: the master node manages the cluster, and the slaves perform the required operations on the available data. HBase is a flexible, distributed and scalable database, capable of real-time queries and of automatic, configurable partitioning of data to facilitate data processing and analysis [17].
HCatalog: is a table and storage management system for Hadoop that enables users of different data processing tools to read and write data on the grid more easily [17], [27].
Avro: is an open source framework developed within the Apache Hadoop project that offers two services to developers: data serialization and data exchange [17], [27].
4) BIG DATA ANALYSIS
Pig: is an open source platform designed by Yahoo to analyze large datasets that are treated as data flows. To develop data analysis programs, Pig uses a high-level programming language called "Pig Latin." Accordingly, to analyze data, developers write Pig Latin scripts, which the Pig Engine component then converts into MapReduce tasks. Apache Pig presents many advantages. First, thanks to its multi-query approach, code length is reduced. Second, Pig Latin replaces Java, which is traditionally seen as more complicated, for coding MapReduce jobs. Indeed, Pig Latin is similar to SQL and easy to learn; the main difference is that Pig Latin is able to process semi-structured and unstructured data. Finally, Pig offers an interactive environment and can process massive amounts of data thanks to its distributed architecture [17], [45].
Hive is a data warehouse system created by Facebook to facilitate the use of Hadoop. The collected data is stored in a structured database that is comprehensible to all users. Apache Hive databases are queried with HQL, a language whose syntax is similar to SQL. HQL queries are transformed into MapReduce jobs processed as batch tasks. Like Pig, Hive has an interactive interface with a variety of functions useful for data analysis. However, Hive is mostly used for structured data [17], [45].
Mahout is an open source library designed for machine learning and data mining. It works on top of HDFS and executes its algorithms via MapReduce. It gives developers access to libraries for clustering, collaborative filtering, categorization and text mining. It is scalable and can be executed in distributed mode [17], [27].
To conclude, the Hadoop Ecosystem (Figure 10) contains very powerful tools able to collect, process, store and analyze large amounts of varied data coming from several sources and generated at a high rate. This vast potential is due to its scalability, fault tolerance, flexible scheduling and resource management, high-level and simplified programming model, distributed architecture, real-time processing, in-memory processing and high throughput.
The next section presents pertinent applications of Big Data, especially in the Healthcare sector.
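Returning briefly to the MapReduce pipeline (Figure 6), the "Word Count" application (Figure 7) can be sketched on a single machine by chaining the Map, Shuffle/Sort and Reduce functions. This is a minimal illustration, not Hadoop's API: the function names map_phase, shuffle_phase and reduce_phase are hypothetical, and the parallel execution that Hadoop distributes over TaskTrackers is simulated here by ordinary sequential loops.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: break every input line into independent (word, 1) key/value pairs."""
    pairs: List[Tuple[str, int]] = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/Sort: collect the pairs, group their values by key and sort by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: aggregate the values of each key; for Word Count the aggregate is a sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases sequentially (Hadoop would run map and reduce in parallel)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    documents = ["big data for healthcare", "big data analytics", "healthcare data"]
    print(word_count(documents))
    # → {'analytics': 1, 'big': 2, 'data': 3, 'for': 1, 'healthcare': 2}
```

In an actual Hadoop job, many mappers and reducers run in parallel on different nodes, and the framework itself performs the shuffle by sending each intermediate key/value pair over the network to the reducer responsible for that key.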
developed sensors that are positioned on top of an inhaler in order to track inhaler usage. Data related to the place and time of the inhaler's use is collected in real time via the global positioning system (GPS). In the case of an asthma attack, the recorded data is transferred to a website accessible by the patient and the doctor via a smartphone or computer. On the one hand, the main goal of the sensors is to help patients decide whether or not to visit certain places, as well as to predict asthma attacks. On the other hand, the aggregated data guides the practitioner in developing a customized care plan for the patient, identifying periods with a high probability of an asthmatic crisis and anticipating them by either increasing or decreasing the drug dosage [16].
The authors in [44] selected 102 out of 1000 patients suffering from metabolic syndrome in order to follow up on their recovery. Analysts collected data from 600 laboratory tests and 180 claims. Moreover, a customized treatment plan was elaborated from the patients' health records. This application produced encouraging results: the morbidity rate might be reduced by 50% over the next 10 years. Several alternative solutions were deduced: prescription of statins, weight loss, and reduction of the total triglyceride level in the body if the blood sugar level exceeds 20%.
In order to follow the propagation of epidemics around the world, Google developed two real-time surveillance applications: Google Flu Trends and Google Dengue Trends. Data were collected from internet searches, since people used the Google search engine to find information about symptoms, drugs, side effects, etc. The results showed that Google Flu Trends overestimated the prevalence of flu by 140% [12].
The authors of [46] developed a clinical recommendation system that enables patients to obtain reliable recommendations of care providers with regard to their own health status. In addition, patients can contribute to improving the performance of the system by adding their private notes and evaluations of physicians for different health conditions. However, due to the sensitivity of such data, it must be protected from dishonest and malicious users. Moreover, for care providers, this system ensures the preservation of their reputation. As an output, this system presents, for each patient, a list of the best-ranked physicians.
Furthermore, collaborative filtering is a type of recommendation system that forecasts users' opinions about an item based on the preferences of a large group of users [13]. This technique was then applied in the healthcare sector through the creation of a Collaborative Assessment and Recommendation Engine (CARE) for disease risk prediction. A patient's first visit to a doctor provides clinical input data related to the patient's medical history. For each disease j, analysts then find the patients who suffer from this disease based on health record data. Applying collaborative filtering generates p(a, j), the probability that patient a will develop disease j in the future. Finally, for each patient, the system provides the physician with a list of potential diseases ranked from highest to lowest risk. This framework is effective for decreasing readmission rates, making consistent predictions and enhancing quality-of-care ratings. To ensure its feasibility and efficiency, the system was validated on a Medicare database of 13 million patients who made 32 million medical visits over 4 years.
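The estimation of p(a, j) in CARE can be illustrated with a small user-based collaborative filtering sketch. It is not the CARE algorithm itself, whose internals are not described here: the binary patient-to-disease records, the Jaccard similarity between disease histories and the function names (jaccard, disease_risk, rank_diseases) are hypothetical choices made for clarity. The risk that patient a develops disease j is taken as the similarity-weighted share of other patients who have disease j, and the diseases the patient does not yet have are ranked from highest to lowest risk.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two patients based on their sets of diagnosed diseases."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disease_risk(patient: str, disease: str, records: Dict[str, Set[str]]) -> float:
    """Estimate p(a, j) as the similarity-weighted share of other patients with disease j."""
    history = records[patient]
    weighted, total = 0.0, 0.0
    for other, diseases in records.items():
        if other == patient:
            continue  # never use the patient's own record
        similarity = jaccard(history, diseases)
        total += similarity
        if disease in diseases:
            weighted += similarity
    return weighted / total if total > 0 else 0.0


def rank_diseases(patient: str, records: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank diseases the patient does not have yet, from highest to lowest estimated risk."""
    candidates = set().union(*records.values()) - records[patient]
    scores = [(d, disease_risk(patient, d, records)) for d in candidates]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical, de-identified patient -> diagnosed-diseases records.
records = {
    "a": {"hypertension", "obesity"},
    "b": {"hypertension", "obesity", "diabetes"},
    "c": {"hypertension", "heart_disease"},
    "d": {"asthma"},
}

for disease, risk in rank_diseases("a", records):
    print(f"{disease}: {risk:.2f}")
# → diabetes: 0.67
#   heart_disease: 0.33
#   asthma: 0.00
```

Weighting by similarity means that patients whose histories resemble patient a's contribute most to p(a, j). At the scale of the Medicare validation data, such pairwise comparisons would typically have to be distributed (for example with the MapReduce or Spark tools described earlier) rather than computed in a single loop.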
healthcare centers, with the aim of increasing patient satisfaction by monitoring clinical Key Performance Indicators (KPIs) and identifying unexpected situations. The architecture of the system has three main modules: Data Retrieving and Aggregation, Data Processing and Analysis, and Data Visualization. In the first module, data is collected and aggregated via the AdvantCare software, which is used for supervising the communication between patients and physicians. In the second module, the authors decided to use Apache Spark to process the huge amounts of streaming data because of its high scalability and fault tolerance. The data was then stored in a MongoDB database for further analysis. The stored data was analyzed in order to extract valuable insights for healthcare predictions, clinical recommendations and alerts. In fact, DataCare is capable of forecasting future KPIs based on current data. Moreover, the framework is able to generate early, real-time alerts if an indicator exceeds its authorized range delimited by thresholds. In addition, DataCare provides recommendations aimed at enhancing the quality of care; it has 52 rules designed by physicians based on their expertise and knowledge. Finally, the Visualization module displays all the information on dashboards to ensure more accurate and efficient interpretation. DataCare was validated at a medical centre in Spain, where the expected outcomes were obtained, with interesting conclusions.
From these diverse applications in the healthcare sector, one can deduce the potential of Big Data analysis for optimizing hospital operations, decreasing expenditure and readmission rates, saving lives, and improving the quality of care [11], [17], [22].
V. BIG DATA ANALYTICS
Big Data Analytics is defined as the mining of pertinent knowledge and valuable insights from large amounts of stored data [4]. The key objective of such analytics is to facilitate decision making for researchers, for example by offering dashboards, graphics or operational reporting to monitor thresholds and KPIs. This involves using mathematical and statistical methods to understand data, simulate scenarios, validate hypotheses and make predictive forecasts of future incidents. Data Mining is a key concept in Big Data Analytics that consists of applying data science techniques to analyze and explore large datasets in order to find meaningful and useful patterns in those data. It involves complex statistical models and sophisticated algorithms, such as machine learning algorithms, mainly to perform four categories of analytics: Descriptive analytics, Predictive analytics, Prescriptive analytics and Discovery (Exploratory) analytics. Descriptive analytics turns collected data into meaningful information for interpreting, reporting, monitoring and visualization purposes via statistical graphical tools such as pie charts, graphs, bar charts, and dashboards. Predictive analytics is commonly defined as the extrapolation of available data to support better decision making. Prescriptive analytics is associated with Descriptive and Predictive analytics: based on the present situation, it offers options on how to benefit from future opportunities or mitigate a future risk, and details the implications of each decision option. Finally, Discovery (Exploratory) analytics reveals unexpected relationships between parameters in Big Data [4]. The authors in [24] argue that, currently, the output of predictive analytics can benefit from the potential of descriptive analytics through the use of dashboards and scorecard computations.
Big Healthcare Data Analytics (BHDA) is defined as the use of statistical, cognitive, predictive, contextual, and quantitative models for efficient and fast decision making, useful for planning, forecasting, resource management, etc. Big Data Analytics helps healthcare stakeholders (medical practitioners, hospital operators, pharmaceutical and clinical researchers, and healthcare insurers) to improve their findings by harnessing their internal and external Big Data [5], [11], [12], [40]. For medical practitioners, the analysis of patient data, including patient medical history, physicians' notes, laboratory results, and clinical trial data, helps them track the progress of a proposed treatment plan and interrupt the plan to make changes if necessary; consequently, unnecessary visits can be eliminated and readmission rates decreased. For hospital operators, Big Data Analytics helps to allocate resources. For instance, the analysis of location awareness data contributes to optimizing the use of expensive healthcare equipment and devices. In addition, pharmaceutical organizations benefit from analytics when elaborating marketing strategies. In fact, by gathering and analyzing data such as sales history and drug recommendations for each patient and disease, they are able to assess their current market position, which is useful for defining strategic priorities. Furthermore, by analyzing patient demographic data, such as age and gender, and clinical data, such as disease and drug history, the insurer is able to elaborate an appropriate health plan for each patient [30]. In conclusion, Big Data Analytics plays an important role in the enhancement of medical services and increases patient satisfaction. Consequently, it has the potential to improve care, save lives and lower costs.
VI. BIG DATA: CHALLENGES & PERSPECTIVES
A. BIG DATA CHALLENGES
Big Data presents various opportunities in the medical, biomedical and healthcare sectors due to its ability to obtain valuable knowledge useful for improving healthcare organizations and reducing healthcare costs, as well as unnecessary visits and readmissions [11]. However, when handling datasets characterized by huge volumes, generated at very high speeds and with large diversity (e.g. structured, semi-structured, and unstructured data), many barriers and hurdles have to be overcome along the path to creating value, from data collection to data analysis. These diverse challenges, which can be categorized into five groups, must be overcome to benefit from the advantages of Big Data:
1) DATA COLLECTION
Data reliability is among the criteria for data selection during the collection phase. It is crucial to select data sources carefully, considering that they may contain noise, errors, and inconsistent or incomplete data. However, due to the enormous diversity of sources, it is becoming a challenge to process them all and select the best. In addition, during data acquisition, it is necessary to integrate data external to the organization with the internal data in order to obtain updated knowledge and information about the external environment and to build accurate prediction models. This aggregation is becoming more and more challenging [17].
2) DATA PROCESSING
Data processing aims to generate cleaned, consistent and secured data for efficient and accurate analysis. The first challenge is how to collect, process and store variable data from various types of devices with limited storage capacity and CPU. The second challenge is how to ensure accuracy and consistency in decision making when aggregating dissimilar data with multiple formats [17]. However, many Big Data processing tools perform poorly with computational uncertainties, inconsistencies and complexities. So, it is becoming a challenge to use convenient techniques and technologies that minimize computational processing cost and complexity. Currently, many well-known organizations require real-time data processing, in which large amounts of data are promptly processed in real or near-real time to allow fast decision making. Therefore, there is a need to adopt Big Data technologies with high scalability [32].
3) DATA STORAGE
The volume of data collected is increasing dramatically, especially due to the spread and use of new technological trends such as social media and remote sensing. In the recent past, developers and analysts used hard disk drives to store data. Unfortunately, these devices are no longer suitable for such data storage. Therefore, the first challenge is the use of appropriate storage media with higher input/output speeds. The main goal is to ensure data availability and accessibility for further analysis [32].
4) DATA ANALYSIS
For the extraction of relevant information from a pool of stored data generated from different sources, it is crucial to choose the analytical software and hardware carefully in order to produce more accurate and valuable outcomes. This requirement is becoming more and more challenging due to the diversity of available technologies designed for data analysis. Besides, the variety of data can cause unprecedented challenges for analysts. Indeed, the existing tools are unable to respond in the required time when treating high-dimensional data. The next challenge is how to analyze multivariate data effectively in order to obtain valuable knowledge as an output. Moreover, for an easy and pertinent interpretation and analysis, many researchers use graphic visualization tools capable of summarizing large amounts of data into meaningful and intuitive graphic or picture formats. Thus, researchers need to use tools with high scalability. However, most of the available tools present many functional limits in terms of scalability and response time. So, once again, it is a challenge to design software or hardware that enables parallel computing and visualization processes to ensure accurate analyses [2], [32].
5) DATA SECURITY
Clinical data are very sensitive and must be secured in order to protect them from hackers who are able to use data mining techniques to extract personal data and make it public. Therefore, it is essential to implement security procedures such as authentication, authorization and encryption to enhance data security. The challenge here is to develop a multi-level security, privacy-preserving data model for Big Data [20], [32], [35].
B. BIG DATA CHALLENGES FOR HEALTHCARE
Currently, the healthcare sector is among the sectors that generate tremendous amounts of data from multiple sources, which can be quantitative, including laboratory tests, gene arrays and sensor data, or qualitative, such as demographics and free texts [35]. According to [3], medical data is different from other data: it is very sensitive and hard to access. Unfortunately, data can be affected by the use of unmanaged data sources such as social networks. Also, any errors in measurements or in codes can severely affect the reliability of an analysis. Therefore, data trustworthiness, data quality and data consistency are yet another challenge for Big Data. Moreover, clinical data is continuously being generated; hence, real-time data streaming tools and technologies are gaining relevance [22].
As previously stated, the majority of the world population uses social networks to collect updated information, obtain knowledge, communicate and carry out personal research. Therefore, a tremendous amount of data is generated at high speed and in various formats. Thus, it is very interesting and challenging to exploit and integrate this data in order to improve healthcare outcomes. A possible challenge is to elaborate a medical decision system for forecasting the spread of epidemics in different geographical locations by analyzing social media data such as Facebook and Twitter. In order to carry this out, a robust predictive system that considers a set of attributes related to a type of epidemic, e.g. influenza, dengue fever or cholera, must be developed. The past values extracted from social media, including comments, likes, and posts, will allow a predictive system to extrapolate these values into the future. The capabilities of Big Data technologies and the richness, massiveness and variety of social network data can be combined to provide relevant healthcare predictions.
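The extrapolation step described above can be illustrated with a deliberately simple sketch: it fits an ordinary least-squares linear trend to past daily counts of influenza-related social media posts and projects that trend a few days ahead. The daily counts, the linear model and the function names (fit_linear_trend, extrapolate) are assumptions for illustration only; a robust epidemic-forecasting system of the kind envisioned here would combine many attributes and would likely need seasonal or epidemiological models rather than a straight line.

```python
from typing import List, Tuple


def fit_linear_trend(values: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of values[t] = intercept + slope * t, for t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two past values are needed to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept, slope


def extrapolate(values: List[float], horizon: int) -> List[float]:
    """Project the fitted trend `horizon` time steps beyond the last observed value."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + h) for h in range(horizon)]


# Hypothetical daily counts of influenza-related posts in one region.
daily_posts = [120, 135, 150, 160, 178, 190]
forecast = extrapolate(daily_posts, horizon=3)
print([round(v, 1) for v in forecast])  # → [204.4, 218.4, 232.3]
```

A straight-line fit is only a baseline. The Google Flu Trends overestimation discussed earlier is a reminder that signals derived from searches or social media can mislead unless the forecasts are validated against clinical surveillance data.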
VII. CONCLUSION
This study aimed to emphasize the enormous implications of Big Data techniques and technologies for the performance and outcomes of healthcare organizations. Section 1 presented the novel concept of "Big Data", its evolution over time and its Vs: Volume, Variety, Velocity, Veracity, Variability, Validity, Viscosity, Volatility, Visualization, Virility, and Valence. Then, Section 2 described the process of building value from big data. For each step, a list of available Big Data technologies was proposed and detailed. This section showed the potential of these technologies for handling and analyzing huge amounts of data extracted from multiple sources and in different formats. Then, in Section 3, big data applications for healthcare found in the literature were classified into five groups: Healthcare Monitoring, Healthcare Prediction, Recommendation Systems, Healthcare Knowledge Systems and Healthcare Management Systems. Based on the reviewed cases, one can confirm the countless opportunities offered by Big Data and its analysis.
The analytical capabilities of Big Data techniques and technologies, as well as the consistent knowledge and valuable insights that can be derived from stored Big Data, are useful for making predictions, recommendations, medical diagnoses, resource allocations and personalized treatment plans. This ability may have a positive effect on the quality of healthcare and its outcomes. Here, Big Data Analytics was classified into four types: Descriptive Analytics, Prescriptive Analytics, Predictive Analytics and Discovery Analytics. Finally, based on the Big Data features identified, along with the Big Data value chain, many of the challenges that must be tackled were identified.
REFERENCES
[1] A. De Mauro, M. J. Greco, and M. Grimaldi, "A formal definition of big data based on its essential features," Library Rev., vol. 65, no. 3, pp. 122–135, 2016.
[2] A. Gandomi and M. Haider, "Beyond the hype: Big data concepts, methods, and analytics," Int. J. Inf. Manage., vol. 35, no. 2, pp. 137–144, 2015.
[3] C. H. Lee and H.-J. Yoon, "Medical big data: Promise and challenges," Kidney Res. Clin. Pract., vol. 36, no. 1, pp. 3–11, 2017.
[4] D. Rajeshwari, "State of the art of big data analytics: A survey," Int. J. Comput. Appl., vol. 120, no. 22, pp. 39–46, 2015.
[5] J. Cortada, D. Gordon, and B. Lenihan, "The value of analytics in healthcare: From insights to outcomes," IBM Global Services, Armonk, NY, USA, Executive Rep. GBE03476-USEN-00, 2012. [Online]. Available: http://www-05.ibm.com/ch/gesundheitswesen/pdf/The_value_of_analytics_in_healthcare.pdf
[6] S. Tiwari, H. M. Wee, and Y. Daryanto, "Big data analytics in supply chain management between 2010 and 2016: Insights to industries," Comput. Ind. Eng., vol. 115, pp. 319–330, Jan. 2018.
[7] N. Ilyasova, A. Kupriyanov, R. Paringer, and D. Kirsh, "Particular use of BIG DATA in medical diagnostic tasks," J. Sci. Commun., vol. 28, no. 1, pp. 114–121, 2018.
[8] S. Mazumder, "Big Data Tools and Platforms," in Big Data Concepts, Theories, and Applications, Y. Shui and G. Song, Eds. Cham, Switzerland: Springer, 2016, pp. 29–128.
[9] A. Baldominos, F. De Rada, and Y. Saez, "DataCare: Big data analytics solution for intelligent healthcare management," Int. J. Interact. Multimedia Artif. Intell., vol. 4, no. 7, pp. 13–20, 2017.
[10] D. Laney, "3D data management: Controlling data volume, velocity and variety," META Group, Stamford, CT, USA, Res. Note 6, 2001. [Online]. Available: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-
[11] A. Kankanhalli, J. Hahn, S. Tan, and G. Gao, "Big data and analytics in healthcare: Introduction to the special section," Inf. Syst. Frontiers, vol. 18, no. 2, pp. 233–235, 2016.
[12] V. Rajaraman, "Big data analytics," Resonance, vol. 21, no. 8, pp. 695–716, 2016.
[13] V. Shobana and N. Kumar, "A personalized recommendation engine for prediction of disorders using big data analytics," in Proc. IEEE ICIGEHT, Coimbatore, India, Mar. 2017, pp. 1–4.
[14] N. V. Chawla and D. A. Davis, "Bringing big data to personalized healthcare: A patient-centered framework," J. Global Inf. Manage., vol. 28, no. 3, pp. 660–665, 2013.
[15] F. Liang, W. Yu, D. An, Q. Yang, X. Fu, and W. Zhao, "A survey on big data market: Pricing, trading and protection," IEEE Access, vol. 6, pp. 15132–15154, 2018.
[16] R. Nambiar, A. Sethi, R. Bhardwaj, and R. Vargheese, "A look at challenges and opportunities of big data analytics in healthcare," in Proc. IEEE Int. Conf. Big Data, Silicon Valley, CA, USA, Oct. 2013, pp. 17–22.
[17] A. Oussous, F. Z. Benjelloun, A. Ait Lahcen, and S. Belfkih, "Big data technologies: A survey," J. King Saud Univ., Comput. Inf. Sci., vol. 30, no. 4, pp. 431–448, 2017.
[18] K. Tiampo, S. McGinnis, Y. Kropivnitskaya, J. Qin, and M. A. Bauer, "Big data challenges and hazards modeling," in Risk Modeling for Hazards and Disasters, M. Gero, Ed. Amsterdam, The Netherlands: Elsevier, 2018, pp. 193–210.
[19] Y. Wang, L. Kung, and T. Byrd, "Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations," Technol. Forecasting Social Change, vol. 126, pp. 3–13, Jan. 2016.
[20] K. Abouelmehdi, A. Beni-Hssane, H. Khaloufi, and M. Saadi, "Big data security and privacy in healthcare: A review," in Proc. 8th Int. Conf. EUSPN, Lund, Sweden, 2017, pp. 73–80.
[21] Transforming Healthcare Through Big Data: Strategies for Leveraging Big Data in the Healthcare Industry, Inst. Health Technol., New York, NY, USA, 2013.
[22] W. Raghupathi and V. Raghupathi, "Big Data analytics in healthcare: Promise and potential," Health Inf. Sci. Syst., vol. 2, no. 3, pp. 1–10, 2014.
[23] MedAware Raises $8 Million for Software to Reduce Prescription Errors, Wall Street J., Dow Jones Company, New York, NY, USA, 2017.
[24] U. Sivarajah, M. M. Kamal, I. Irani, and V. Weerakkody, "Critical analysis of big data challenges and analytical methods," J. Bus. Res., vol. 70, pp. 263–286, Jan. 2017.
[25] I. Lee, "Big data: Dimensions, evolution, impacts, and challenges," Bus. Horizons, vol. 60, no. 3, pp. 293–303, 2017.
[26] B. Baesens, R. Bapna, J. R. Marsden, J. Vanthienen, and J. L. Zhao, "Transformational issues of big data and analytics in networked business," MIS Quart., vol. 40, no. 4, pp. 807–818, 2016.
[27] N. Khan et al., "Big data: Survey, technologies, opportunities, and challenges," Sci. World J., vol. 2014, Art. no. 712826, doi: 10.1155/2014/712826.
[28] W.-K. Liu and C.-C. Yen, "Optimizing bus passenger complaint service through big data analysis: Systematized analysis for improved public sector management," MDPI J., vol. 8, no. 12, p. 1319, 2016.
[29] H. Daki, A. El Hannani, A. Aqqal, A. Haidine, and A. Dahbi, "Big data management in smart grid: Concepts, requirements and implementation," J. Big Data, vol. 4, no. 13, pp. 1–19, 2017.
[30] V. Palanisamy and R. Thirunavukarasu, "Implications of big data analytics in developing healthcare frameworks – A review," J. King Saud Univ., Comput. Inf. Sci., to be published, doi: 10.1016/j.jksuci.2017.12.007.
[31] V. Kamilaris, A. Kartakoullis, and F. X. Prenafeta-Boldú, "A review on the practice of big data analysis in agriculture," Comput. Electron. Agricult., vol. 143, pp. 23–37, Dec. 2017.
[32] D. P. Acharjya and A. P. Kauser, "A survey on big data analytics: Challenges, open research issues and tools," Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 2, pp. 511–518, 2016.
[33] R. Y. Zhong, S. T. Newman, G. Q. Huang, and S. Lan, "Big data for supply chain management in the service and manufacturing sectors: Challenges, opportunities, and future perspectives," Comput. Ind. Eng., vol. 101, pp. 572–591, Nov. 2016.
[34] G. Manogaran, C. Thota, D. Lopez, V. Vijayakumar, K. M. Abbas, and R. Sundarsekar, "Big data knowledge system in healthcare," in Internet of Things and Big Data Technologies for Next Generation Healthcare, C. Bhatt, N. Dey, and A. S. Ashour, Eds. Springer, 2017, pp. 133–157.
[35] J. Andreu-Perez, C. C. Y. Poon, R. D. Merrifield, S. T. C. Wong, and G.-Z. Yang, "Big data for health," IEEE J. Biomed. Health Inform., vol. 19,
Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf no. 4, pp. 1193–1208, Jul. 2015.
[36] G. Manogaran, D. Lopez, C. Thota, K. M. Abbas, S. Pyne, and NESRINE ZOGHLAMI received the Diploma
R. Sundarsekar, ‘‘Big data analytics in healthcare Internet of Things,’’ in degree in electrical engineering and the M.Sc.
Innovative Healthcare Sytems for the 21st Century, H. Oudrat-Ullah and degree in electronic and telecommunications from
P. Tsasis, Eds. Springer, 2017, pp. 263–284. the University of Valenciennes, France, and the
[37] IDC Iview. vol. 1142, pp. 1–12, 2011. [Online]. Available: https:// Ph.D. degree in industrial computer science and
www.emc.com/collateral/analyst-reports/idc-extracting-value-from- automatic from the Ecole Centrale de Lille,
chaos-ar.pdf France, in 2008. She is currently an Associate
[38] IDC Country Brief. Feb. 2013. [Online]. Available: https://www.emc.
Professor in industrial computer science. She took
com/collateral/analyst-reports/idc-digital-universe-united-states.pdf
an active part in the national working group ORT
[39] R. Addo-Tenkorang and P. Y. Helo, ‘‘Big data applications in
operations/supply-chain management: A literature review,’’ Comput. of GDR MACS. She was responsible for organiz-
Ind. Eng., vol. 101, pp. 528–543, Nov. 2016. ing the program committees of international conferences and workshops,
[40] S. Shafqat, S. Kishwer, R. ur Rasool, J. Qadir, T. Amjad, and H. F. Ahmad, including MHOSI 2005, LT 2006, LT 2007, LT 2009, MSLT 2011, Sysco
‘‘Big data analytics enhanced healthcare systems: A review,’’ J. Supercom- 2012, ICALT 2013, GOL 2014, ICALT 2014, ICALT 2015, ICIT 2015, ICIT
put., pp. 1–46, 2018, doi: 10.1007/s11227-017-2222-4. 2016, GOL 2016, IPAC2016, BDWA 2016, ASET 2017, and ASET 2018.
[41] Z. J. Ling et al., ‘‘GEMINI: An integrative healthcare analytics system,’’ She was a Rapporteur of the H2020 proposals for the Research Executive
in Proc. 40th Int. Conf. Very Large Data Bases, Hangzhou, China, 2014, Agency, European Commission, from 2015 to 2016. She has authored or
pp. 1771–1776. co-authored about 50 publications, communications, and book chapters. She
[42] N. M. S. Kumar, T. Eswari, P. Sampath, and S. Lavanya, ‘‘Predictive has authored a book Optimisation Ãă base d’agents communicants des flux
methodology for diabetic data analysis in big data,’’ in Proc. 2nd Int. Symp. logistiques pour la gestion de crise on supply chain management. Her main
Big Data Cloud Comput. (ISBCC), vol. 50, 2015, pp. 203–208. research interests include optimization, artificial intelligence, and supply
[43] M. Khalifa and I. Zabani, ‘‘Utilizing Health Analytics in improving the per- chain management.
formance of healthcare services: A case study on a tertiary care hospital,’’
J. Infection Public Health, vol. 9, no. 6, pp. 757–765, 2016.
[44] M. Chen, S. Mao, Y. Zhang, and V. C. Leung, Big Data: Related Tech- MOURAD ABED was the Vice-President (dig-
nologies, Challenges and Future Prospects. Cham, Switzerland: Springer,
ital) of the University of Valenciennes and the
2014.
Vice-Director of the Institute of Science and Tech-
[45] A. Bhadani and D. Jothimani, ‘‘Big data: Challenges, opportunities and
realities,’’ in Effective Big Data Management and Opportunities for Imple- nology, from 2000 to 2010. He is currently a
mentation, M. K. Singh and D. G. Kumar, Eds. Hershey, PA, USA: Professor (Exceptional class) in computer engi-
IGI Global, 2016, pp. 1–24. neering with the University of Valenciennes and
[46] T. R. Hoens, M. Blanton, A. Steele, and N. V. Chawla, ‘‘Reliable medical a member of the Human Computer Interaction
recommendation systems with patient privacy,’’ ACM Trans. Intell. Syst. and Automated Reasoning Research Group, Auto-
Technol., vol. 4, no. 4, 2013, Art. no. 67. matic, Mechanic and Human IT Laboratory. He is
[47] R. J. Watson and J. L. Christensen, ‘‘Big data and student engagement also the Director of the Program of Master of
among vulnerable youth: A review,’’ Current Opinion Behav. Sci., vol. 18, Science and Technology Studies, a European Project Coordinator, and the
pp. 23–27, Dec. 2017. National Co-Chair of the Research Group. He has been the President or
[48] E. Kasturi, S. P. Devi, S. V. Kiran, and S. Manivannan, ‘‘Airline route the Co-President of international conferences or special sessions and con-
profitability analysis and optimization using BIG DATA analyticson avia- ferences for international journals. He has authored or co-authored (more
tion data sets under heuristic techniques,’’ Procedia Comput. Sci., vol. 87, than 180) numerous book chapters, journal articles, and communications.
pp. 86–92, 2016. He participates in several research networks, projects, and associations.
[49] J. Manyika et al., Big Data: The Next Frontier for Innovation, Competition
and Productivity. New York, NY, USA: McKinsey Global Institute, 2011.
[50] A. Intezari and S. Gressel, ‘‘Information and reformation in KM systems:
JOÃO MANUEL R. S. TAVARES graduated in
Big data and strategic decision-making,’’ J. Knowl. Manage., vol. 21, no. 1,
mechanical engineering from the Universidade do
pp. 71–91, 2017.
Porto, Portugal, in 1992. He received the M.Sc.
and Ph.D. degrees in electrical and computer engi-
neering from the Universidade do Porto, in 1995
and 2001, respectively, and the Habilitation degree
in mechanical engineering, in 2015. He is cur-
rently a Senior Researcher with the Instituto de
Ciência e Inovação em Engenharia Mecânica e
Engenharia Industrial and an Associate Professor
with the Department of Mechanical Engineering, Faculdade de Engenharia
da Universidade do Porto. He is the co-editor of more than 40 books, and
the co-author of more than 35 book chapters and 600 articles in international
and national journals and conferences. He holds three international patents
and two national patents. He has been a Committee Member for several
international and national journals and conferences. He is the co-founder
and the co-editor of the book series Lecture Notes in Computational Vision
SAFA BAHRI was born in Tunis, Tunisia, in 1992. and Biomechanics (Springer). He has been a (Co-)Supervisor for several
She received the National Diploma of Engineering M.Sc. and Ph.D. theses and a Supervisor for several Postdoctoral projects.
degree in industrial engineering from the National He has participated in many scientific projects as a Researcher and as a
School of Engineering of Carthage, in 2016. Scientific Coordinator. His main research interests include computational
She is currently pursuing the joint Ph.D. degree vision, medical imaging, computational mechanics, scientific visualization,
with the LTISIRS Laboratory, National School of human–computer interaction, and new product development. He is the
Engineering of Tunisia, and the LAMIH Labo- Founder and the Editor-in-Chief of the Computer Methods in Biomechanics
ratory, University of Valenciennes and Hainaut- and Biomedical Engineering: Imaging & Visualization (Taylor & Francis)
Cambrésis. She is developing a healthcare system and the Co-Founder and the Co-Chair of the International Conference Series,
that predicts the spreading of epidemics through including CompIMAGE, ECCOMAS Vip IMAGE, ICCEBS, and BioDental.
social media data analysis. Her research interests include big data applica- More information can be found at www.fe.up.pt/ tavares.
tions in the healthcare sector.