Big Data Security Issues
3 authors, including:
Asoke Nath
St. Xavier's College, Kolkata
All content following this page was uploaded by Asoke Nath on 04 May 2015.
Abstract— The amount of data in the world is growing day by day, driven by the use of the internet, smartphones and
social networks. Big data is a collection of data sets that are very large as well as complex, with sizes generally in
the petabyte to exabyte range. Traditional database systems are not able to capture, store and analyze data at this
scale. As the internet grows, the amount of big data continues to grow. Big data analytics provides new ways for
businesses and governments to analyze unstructured data. Nowadays, big data is one of the most talked-about topics in
the IT industry, and it is going to play an important role in the future. Big data changes the way data is managed and
used. Applications include healthcare, traffic management, banking, retail, education and so on. Organizations are
becoming more flexible and more open, and new types of data will bring new challenges as well. The present paper
highlights important concepts of Big Data. We define Big Data and discuss the parameters along which it is defined,
including the three V's of big data: velocity, volume and variety. We also look at the processes involved in data
processing, review the security aspects of Big Data, propose a new system for the security of Big Data, and finally
present the future scope of Big Data.
Keywords— Big data, Petabyte, Exabyte, Database, velocity, volume, variety
I. INTRODUCTION
The term Big Data is now used almost everywhere in daily life. The term emerged around 2005 and refers to a wide range
of large data sets that are almost impossible to manage and process using traditional data management tools, due to
their size but also their complexity. Big Data can be seen in finance and business, where enormous amounts of stock
exchange, banking, and online and onsite purchasing data flow through computerized systems every day and are then
captured and stored for monitoring inventory, customer behaviour and market behaviour. It can also be seen in the life
sciences, where big data sets such as genome sequencing, clinical and patient data are analysed and used to advance
breakthroughs in science and research. Other areas of research where Big Data is of central importance include
astronomy, oceanography and engineering, among many others. The leap in computational and storage power enables the
collection, storage and analysis of these Big Data sets, and companies introducing innovative technological solutions
to Big Data analytics are flourishing. In this article, we explore the term Big Data as it emerged from the
peer-reviewed literature. As opposed to news items and social media articles, peer-reviewed articles offer a glimpse
into Big Data as a topic of study and into the scientific problems, methodologies and solutions that researchers are
focusing on in relation to it. The purpose of this article, therefore, is to sketch the emergence of Big Data as a
research topic from several points: (1) timeline, (2) geographic output, (3) disciplinary output, (4) types of
published papers, and (5) thematic and conceptual development.
The amount of data available to us is increasing manifold with each passing moment. Data is generated in huge amounts
all around us. Every digital process and social media exchange produces it. Systems, sensors and mobile devices
transmit it [1]. With the advancement in technology, this data is being recorded and meaningful value is being
extracted from it. Big data is an evolving term that describes any voluminous amount of structured, semi-structured
and unstructured data that has the potential to be mined for information.
The 3Vs that define Big Data are Variety, Velocity and Volume.
1) Volume: There has been an exponential growth in the volume of data that is being dealt with. Data is not just in
the form of text data, but also in the form of videos, music and large image files. Data is now stored in terms of
Terabytes and even Petabytes in different enterprises. With the growth of the database, we need to re-evaluate
the architecture and applications built to handle the data.
2) Velocity: Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags,
sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly
enough to deal with data velocity is a challenge for most organizations.
3) Variety: Today, data comes in all types of formats: structured, numeric data in traditional databases;
information created from line-of-business applications; and unstructured text documents, email, video, audio, stock
ticker data and financial transactions. We need to find ways of governing, merging and managing these diverse
forms of data.
There are two other metrics for defining Big Data:
4) Variability: In addition to the increasing velocities and varieties of data, data flows can be highly
inconsistent, with periodic peaks. Daily, seasonal and event-triggered peak data loads can be challenging to
manage, even more so when unstructured data is involved [2].
_________________________________________________________________________________________________
© 2015, IJIRAE- All Rights Reserved Page - 15
International Journal of Innovative Research in Advanced Engineering (IJIRAE) ISSN: 2349-2163
Issue 2, Volume 2 (February 2015) www.ijirae.com
5) Complexity: Today's data comes from multiple sources, and it is still an undertaking to link, match, cleanse
and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies
and multiple data linkages, or your data can quickly spiral out of control.
A data environment can lie along the extremes on any one of these parameters, or a combination of them, or even
all of them together.
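The linking, matching and cleansing step mentioned under Complexity can be sketched in a few lines. The following is only a minimal illustration, assuming two hypothetical sources (`crm` and `billing`) that identify the same customers with inconsistently formatted keys; the record layout, field names and the `normalize_key` rule are assumptions made for the example, not part of the original study.

```python
def normalize_key(raw: str) -> str:
    """Cleanse an identifier: trim, lowercase and drop separators."""
    return "".join(ch for ch in raw.strip().lower() if ch.isalnum())


def link_records(source_a, source_b, key):
    """Match records from two sources on a cleansed key and merge their fields."""
    # Index the second source by its cleansed key for constant-time matching.
    index = {normalize_key(rec[key]): rec for rec in source_b}
    linked = []
    for rec in source_a:
        clean = normalize_key(rec[key])
        match = index.get(clean)
        if match is not None:
            merged = {**match, **rec}  # combine fields from both systems
            merged[key] = clean        # store the cleansed key
            linked.append(merged)
    return linked


# Hypothetical records from two systems with inconsistent key formats.
crm = [
    {"customer_id": " AB-101 ", "name": "R. Sen"},
    {"customer_id": "ab-102", "name": "P. Das"},
]
billing = [
    {"customer_id": "ab101", "balance": 250.0},
    {"customer_id": "AB 103", "balance": 75.5},
]

linked = link_records(crm, billing, "customer_id")
print(linked)
```

Only the first customer appears in both sources once the keys are cleansed; the unmatched records are exactly the unconnected data that makes cross-system linkage an undertaking at scale.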
II. BIG DATA TECHNOLOGY: OPERATIONS VS. ANALYTICAL
The Big Data landscape can be divided into two main categories: systems that provide operational capabilities for
real-time, transactional or interactive situations where data is captured and stored, and systems that provide
analytical capabilities for retrospective and complex analysis of the data that has been stored. The following table
is a comparison between Operational and Analytical Systems in the field of Big Data.
TABLE I
Overview of Operational vs. Analytical Systems
Operational Analytical
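The distinction between the two categories can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for both kinds of system; the `purchases` table, the store names and the amounts are hypothetical. A real deployment would typically use separate, specialised systems for each workload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
)


def record_purchase(store: str, amount: float) -> int:
    """Operational workload: capture one small transaction as it happens."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO purchases (store, amount) VALUES (?, ?)", (store, amount)
        )
    return cur.lastrowid


def revenue_by_store() -> dict:
    """Analytical workload: retrospective aggregation over all stored data."""
    rows = conn.execute(
        "SELECT store, SUM(amount) FROM purchases GROUP BY store ORDER BY store"
    )
    return dict(rows.fetchall())


# Hypothetical stream of transactions arriving one at a time.
for store, amount in [("Kolkata", 120.0), ("Delhi", 80.0),
                      ("Kolkata", 45.5), ("Delhi", 10.0)]:
    record_purchase(store, amount)

totals = revenue_by_store()
print(totals)
```

The operational path touches one row per call and must respond immediately, whereas the analytical query scans everything that has been stored, which is why the two workloads are usually served by different systems.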
6. Data Interpretation: The ultimate step in Big Data processing is interpretation, that is, gaining valuable
information from the data that is processed. The information gained can be of two types. Retrospective analysis
involves gaining insights about events and actions that have already taken place. For instance, data about the
television viewership for a show in different areas can help us judge the popularity of the show in those areas.
Prospective analysis involves judging patterns and discerning future trends from data that has already been
generated. Weather prediction using big data analysis is an example of prospective analysis. Problems arising
from such interpretations pertain to fallacious and misleading trends being predicted. This is particularly
dangerous due to an increasing reliance on data for key decisions. For example, if a particular symptom is
plotted against the likelihood of being diagnosed with a particular disease, the plot might be misread as showing
that the symptom is caused by that disease, when it only shows an association. Insights gained from data
interpretation are therefore very important and are the primary reason for processing big data.
The advent of Big Data has presented new challenges in terms of data security. There is an increasing need for
research into technologies that can handle the vast volume of data and secure it efficiently. Current
technologies for securing data are slow when applied to huge amounts of data.
TABLE II
ENCRYPTION RATES OF POPULAR ALGORITHMS
From the above table we can conclude that even the most efficient algorithms give an encryption rate of 64.3
MB/sec. However, in the light of Big Data, where the amounts of data extend to gigabytes or even petabytes, we
can see a significant bottleneck in encrypting such large amounts of data. This is detrimental to the nature of Big
Data, which demands real-time processing and results. A secure but faster encryption technique is increasingly
needed.
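The scale of this bottleneck can be made concrete with a short back-of-the-envelope calculation. The sketch below takes the 64.3 MB/sec rate quoted above and assumes decimal units (1 MB = 10^6 bytes) and a single sequential encryption stream; both assumptions are ours and are made only for illustration.

```python
RATE_MB_PER_S = 64.3   # best encryption rate quoted in the text
BYTES_PER_MB = 10**6   # assumption: decimal megabytes
SECONDS_PER_DAY = 86_400


def encryption_seconds(size_bytes: float, rate_mb_per_s: float = RATE_MB_PER_S) -> float:
    """Time to encrypt size_bytes in one sequential stream at the given rate."""
    return size_bytes / (rate_mb_per_s * BYTES_PER_MB)


# Assumption: decimal units for the data sizes as well.
SIZES = {"1 GB": 10**9, "1 TB": 10**12, "1 PB": 10**15}

for label, size in SIZES.items():
    secs = encryption_seconds(size)
    print(f"{label}: {secs:,.0f} s (about {secs / SECONDS_PER_DAY:.2f} days)")
```

Under these assumptions a gigabyte takes roughly 16 seconds and a terabyte a little over four hours, while a single petabyte takes about 180 days, which is incompatible with real-time processing.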
Another glaring challenge in Big Data is query processing on encrypted data. Currently, queries on both unstructured
and structured encrypted data require the data to be decrypted first. Given the vast amounts of data involved, this
decryption can make query processing take a significant amount of time.
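The cost of decrypt-then-query can be seen in a small sketch. The cipher used here is a toy XOR keystream built from SHA-256, chosen only so that the example runs with the standard library; it is an assumption for illustration and must not be treated as a secure algorithm or as one of the algorithms benchmarked in this paper. The records and the key are likewise hypothetical.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))


toy_decrypt = toy_encrypt  # XOR is its own inverse

KEY = b"demo-key"  # hypothetical key
records = ["alice,kolkata", "bob,delhi", "carol,kolkata"]

# Each record is stored encrypted under its own nonce.
store = []
for i, row in enumerate(records):
    nonce = i.to_bytes(4, "big")
    store.append((nonce, toy_encrypt(KEY, nonce, row.encode())))


def query_city(city: str):
    """Answer a query by decrypting every record before filtering."""
    matches, decrypted = [], 0
    for nonce, ciphertext in store:
        row = toy_decrypt(KEY, nonce, ciphertext).decode()
        decrypted += 1
        if row.split(",")[1] == city:
            matches.append(row)
    return matches, decrypted


matches, decrypted = query_city("kolkata")
print(matches, decrypted)
```

Although only two records match, all three are decrypted: the work done by the query grows with the size of the encrypted store rather than with the size of the answer.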
We now look into an alternative scheme of data encryption.
One significant disadvantage of the proposed system is that security is compromised when individual elements are
encrypted, and the initial encryption takes longer than encrypting the database as a whole.
2) The second challenge is that even now, in organizations, many data points are not connected. This problem of
connectivity is a severe hurdle. Big Data is all about the collection of data from various transaction points,
so organizations need to be able to manage data from across their enterprises. In order to address the growing
volume of data created as a part of power grid operation, Siemens and Accenture recently formed a joint venture
in the smart grid field to focus on solutions and services for system integration and data management [3]. These
offerings will enable utilities to integrate operational technologies, such as real-time grid management, with
information technologies like smart metering.
3) To leverage Big Data, one has to work across departments such as IT, Engineering and Finance. Thus the
ownership and procurement of this data has to be a co-operative endeavour across these departments. This
proves to be a significant organizational challenge.
4) There is a security angle related to Big Data collection. This is a major obstacle preventing companies from
taking full advantage of Big Data Analysis.
Several issues will have to be addressed to capture the full potential of big data. Policies related to privacy, security,
intellectual property, and even liability will need to be addressed in a big data world. Organizations need not only to
put the right talent and technology in place but also structure workflows and incentives to optimize the use of big
data. Access to data is critical—companies will increasingly need to integrate information from multiple data sources,
often from third parties, and the incentives have to be in place to enable this.
REFERENCES
[1] Dona Sarkar, Asoke Nath, "Big Data – A Pilot Study on Scope and Challenges", International Journal of Advance
Research in Computer Science and Management Studies (IJARCSMS, ISSN: 2371-7782), Volume 2, Issue 12, Dec
31, Page: 9-19 (2014).
[2] http://www.cra.org/ccc/files/docs/init/bigdatawhitepaper.pdf
[3] http://www.nessi-europe.com/Files/Private/NESSI_WhitePaper_BigData.pdf
[4] http://sites.amd.com/sa/Documents/IDC_AMD_Big_Data_Whitepaper.pdf
[5] Sagiroglu, S.; Sinanc, D., "Big Data: A Review".
[6] Grosso, P.; de Laat, C.; Membrey, P., "Addressing big data issues in Scientific Data Infrastructure".
[7] Kogge, P.M., (20-24 May, 2013), "Big data, deep data, and the effect of system architectures on performance".
Szczuka, Marcin, (24-28 June, 2013), "How deep data becomes big data".
[8] META Group, "3D Data Management: Controlling Data Volume, Velocity, and Variety", February 2001.
Abdel-Karim Al Tamimi, "Performance Analysis of Data Encryption Algorithms".