This paper presents a survey on security and privacy issues in big data and NoSQL. Due to the high volume, velocity, and variety of big data, security and privacy issues differ in such streaming data infrastructures with diverse data formats, and traditional security models have difficulty dealing with data at this scale. In this paper, we present security issues in big data and highlight the security and privacy challenges in big data infrastructures and NoSQL databases.
Big data has become imperative globally. Enormous datasets, ranging from terabytes to petabytes, are in constant use, yet storing them efficiently is an arduous task. Although conventional database mechanisms have been integral to storing intricate and sizeable datasets, it is the NoSQL approach that is able to accumulate such prodigious volumes of information proficiently. In addition, the Hadoop framework is used, which comprises numerous components; one of its foremost constituents is MapReduce, the programming model through which purposive knowledge is mined. In this paper, the fundamentals of big data are discussed. Moreover, the Hadoop architecture is presented as a master-slave procedure that distributes jobs evenly and in parallel. MapReduce is illustrated with an algorithm that uses WordCount as the example for mapping and reducing datasets.
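For readers unfamiliar with the model, the canonical WordCount job can be sketched in Java against the standard org.apache.hadoop.mapreduce API. This is a generic textbook illustration of the map and reduce phases, not the exact algorithm given in the paper.

// Canonical WordCount: the map phase emits (word, 1) pairs,
// the reduce phase sums the counts per word.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // emit (word, 1)
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();           // sum the partial counts per word
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The master assigns map tasks over input splits and reduce tasks over the shuffled keys, matching the master-slave division of work described above.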
The amount of data to store, organize, and manage in any organization is very high and increases every day, a fact well known to companies such as Facebook, Google, or SAS. At this growth rate, technologies must adapt to the amount of available data, and a new approach to information processing is required. Big Data technologies have come into focus, which is one reason for the wider spread of NoSQL database models. The purpose of this article is to validate the existing (and already used) migration methods and to adapt them, in order to identify the most efficient method for migrating a relational database to a NoSQL database. We show the methodology used and the steps followed during the implementation, as well as the configuration of the environment used during the tests. Results show that in this migration process the most efficient method is what is referred to as automatic offline migration; however, it requires a longer window of unavailability than the online migration method.
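To make the offline approach concrete, a minimal sketch of one migration step is shown below, assuming a JDBC source and a MongoDB destination; the connection strings, database, and table names are placeholders, and the article's actual tooling may differ. Each relational row becomes one document.

// Hypothetical offline migration sketch: copy one relational table
// into a MongoDB collection, one document per row. Connection URLs,
// credentials, and the table name are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class OfflineTableMigrator {
  public static void main(String[] args) throws Exception {
    try (Connection sql = DriverManager.getConnection(
             "jdbc:mysql://localhost:3306/shop", "user", "password");
         MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {

      MongoCollection<Document> target =
          mongo.getDatabase("shop").getCollection("customers");

      try (Statement st = sql.createStatement();
           ResultSet rs = st.executeQuery("SELECT * FROM customers")) {
        ResultSetMetaData meta = rs.getMetaData();
        while (rs.next()) {
          // Map every column of the current row to a document field.
          Document doc = new Document();
          for (int i = 1; i <= meta.getColumnCount(); i++) {
            doc.append(meta.getColumnLabel(i), rs.getObject(i));
          }
          target.insertOne(doc);
        }
      }
    }
  }
}

The "window of unavailability" mentioned above corresponds to the time this copy runs: the source must not accept writes during it, or the two stores will diverge.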
Client-side testing/execution of server-side scripts/languages, first, enables developers to test their server-side scripts directly in the browser with the help of this module/extension, avoiding the effort needed to run scripts on web servers; second, it consolidates an approach for transporting server-side computation to the client side. This module parses and semantically analyses PHP code, transforming it into an Abstract Syntax Tree (AST). The AST is then traversed and translated to equivalent JavaScript code. The resulting hypertext and JavaScript is fed to the rendering engine, and the functionality of the PHP script is then executed by the JavaScript interpreter available within conventional browsers. Problems related to persistent data handling can be addressed by the Web Database facility available in the HTML5 specification.
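As a toy illustration of the translation step, the sketch below models a tiny PHP AST and emits equivalent JavaScript. The node classes are invented stand-ins for the module's real AST, and only echo and string concatenation are covered.

// Toy sketch of AST-to-JavaScript translation. The node classes are
// hypothetical stand-ins for the module's real PHP AST; only two
// constructs (echo and string concatenation) are modeled.
interface PhpNode {
  String toJavaScript();
}

// PHP string literal, e.g. 'hello'
class StringLiteral implements PhpNode {
  private final String value;
  StringLiteral(String value) { this.value = value; }
  public String toJavaScript() { return "\"" + value + "\""; }
}

// The PHP concatenation operator '.' becomes JavaScript '+'
class Concat implements PhpNode {
  private final PhpNode left, right;
  Concat(PhpNode left, PhpNode right) { this.left = left; this.right = right; }
  public String toJavaScript() {
    return left.toJavaScript() + " + " + right.toJavaScript();
  }
}

// PHP 'echo' becomes a document.write() call in the browser
class Echo implements PhpNode {
  private final PhpNode expr;
  Echo(PhpNode expr) { this.expr = expr; }
  public String toJavaScript() {
    return "document.write(" + expr.toJavaScript() + ");";
  }
}

public class AstDemo {
  public static void main(String[] args) {
    // Source PHP: echo 'Hello, ' . 'world!';
    PhpNode stmt = new Echo(new Concat(
        new StringLiteral("Hello, "), new StringLiteral("world!")));
    System.out.println(stmt.toJavaScript());
    // Prints: document.write("Hello, " + "world!");
  }
}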
Effective and efficient data management is key in today's competitive business environment. Data is a valuable asset for any organization: data is information, and information is knowledge. Today's enterprise applications generate large amounts of data, especially because of heavy internet usage. As a result, enterprise applications should be able to scale out and perform as required; otherwise, they will not be able to handle their data load, endangering business continuity. Data replication is a widely used technique in distributed environments, where data is stored at multiple sites within the same or differing geographical areas, enabling enterprise applications to scale out and perform well. Many modern Database Management Systems (DBMSs) provide built-in methods for data replication, and replication is possible between heterogeneous database systems. In this research, the aim is to design and implement a middleware layer for data replication from an RDBMS to a document-oriented DBMS. The middleware layer comprises a Java program and the source and destination DBMSs. The approach is to capture DML/DDL changes in the source relational DBMS, then convert and store them in an intermediate XML format. The Java program continuously looks for such data changes and pushes them to the destination DBMS. A replication method like this can address the need for application scalability in situations where both types of DBMSs are used. In live environments there are areas of possible performance improvement in the middleware layer, especially when dealing with large data volumes.
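A minimal sketch of the push step of such a middleware might look as follows, assuming MongoDB as the destination DBMS. The XML layout (a <change table="..." op="..."> element with <field> children) is invented for illustration; the research defines its own intermediate format.

// Hypothetical sketch of the middleware's push step: parse one
// intermediate XML change record and apply it to a document store.
// The XML layout is invented for illustration.
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class ChangePusher {
  public static void main(String[] args) throws Exception {
    Element change = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new File("changes/0001.xml"))
        .getDocumentElement();

    String table = change.getAttribute("table"); // target collection name
    String op = change.getAttribute("op");       // insert / update / delete

    // Collect <field name="...">value</field> children into a document.
    org.bson.Document doc = new org.bson.Document();
    NodeList fields = change.getElementsByTagName("field");
    for (int i = 0; i < fields.getLength(); i++) {
      Element f = (Element) fields.item(i);
      doc.append(f.getAttribute("name"), f.getTextContent());
    }

    try (MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {
      if ("insert".equals(op)) {
        mongo.getDatabase("replica").getCollection(table).insertOne(doc);
      }
      // update and delete would be dispatched similarly on 'op'.
    }
  }
}

In the research's design, a loop around this step polls the intermediate XML store continuously, which is what decouples the source and destination DBMSs.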
Research includes:
Google, Facebook, Twitter, and Amazon's hardware and software structures,
from storage and development platforms to internal structure, discussed in detail.
Kindly review the research report; the data is taken from highly reliable sources, and its impact will be well calculated and result driven. Feel free to contact me if more details or updates are needed.
Querying large data graphs has attracted the attention of the research community, and many solutions have been proposed, such as Oracle Semantic Technologies, Virtuoso, RDF3X, and C-Store, among others. Although such approaches have shown good performance on queries of medium complexity, they perform poorly as query complexity increases. In this paper, the authors propose the Graph Signature Index, a novel and scalable approach to indexing and querying large data graphs. The idea is to summarize a graph and, instead of executing a query on the original graph, execute it on the summaries. The authors' experiments with Yago (16M triples) have shown that, for example, a query with 4 levels costs 62 seconds using Oracle but only about 0.6 seconds with their index. The index can be implemented on top of any graph database, but they chose to implement it as an extension to Oracle on top of the SEM_MATCH table function. The paper also introduces disk-based versions of the Trace Equivalence and Bisimilarity algorithms to summarize data graphs, and discusses their complexity and usability for RDF graphs.
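The idea behind bisimulation-based summarization can be conveyed with a toy in-memory partition refinement, sketched below. The paper's algorithms are disk-based and engineered for RDF scale; this sketch only captures the textbook notion of grouping nodes whose outgoing edges look alike, on an invented five-node graph.

// Toy in-memory bisimulation summarization: nodes are repeatedly
// partitioned by the signature of their outgoing edges (label plus
// the current block of the target) until the partition stabilizes.
// Each final block becomes one summary node.
import java.util.*;

public class BisimSketch {
  // edges.get(u) = list of (label, target) pairs leaving node u
  static Map<String, List<String[]>> edges = new HashMap<>();

  static void edge(String src, String label, String dst) {
    edges.computeIfAbsent(src, k -> new ArrayList<>()).add(new String[]{label, dst});
    edges.computeIfAbsent(dst, k -> new ArrayList<>()); // register sink nodes too
  }

  public static void main(String[] args) {
    edge("a1", "worksAt", "u1");
    edge("a2", "worksAt", "u1");
    edge("a3", "worksAt", "u2");
    edge("u1", "locatedIn", "c1");
    edge("u2", "locatedIn", "c1");

    // Start with all nodes in one block; refine until stable.
    Map<String, Integer> block = new HashMap<>();
    for (String n : edges.keySet()) block.put(n, 0);

    boolean changed = true;
    while (changed) {
      // Signature of a node = sorted list of "label->block(target)" strings.
      Map<String, List<String>> sig = new HashMap<>();
      for (String n : edges.keySet()) {
        List<String> s = new ArrayList<>();
        for (String[] e : edges.get(n)) s.add(e[0] + "->" + block.get(e[1]));
        Collections.sort(s);
        sig.put(n, s);
      }
      // Re-number blocks: one fresh id per distinct signature.
      Map<List<String>, Integer> ids = new HashMap<>();
      Map<String, Integer> next = new HashMap<>();
      for (String n : edges.keySet()) {
        Integer id = ids.get(sig.get(n));
        if (id == null) { id = ids.size(); ids.put(sig.get(n), id); }
        next.put(n, id);
      }
      changed = !next.equals(block);
      block = next;
    }
    // a1, a2, a3 end in one block; u1, u2 in another; c1 alone.
    System.out.println(block);
  }
}

A query is then matched against the blocks rather than the original nodes, which is why evaluation over the summary can be orders of magnitude cheaper when the summary is much smaller than the graph.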