poster

Massive structured data management solution

Authors: Ullas Nambiar, Rajeev Gupta, Himanshu Gupta, Mukesh Mohania

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

Pages 1905 - 1908

https://doi.org/10.1145/1871437.1871760

Published: 26 October 2010

Abstract

The need to analyze structured data for business intelligence applications such as customer churn analysis and social network analysis is well known. However, the volume to which such data is expected to grow will make solutions that revolve around data warehouses hard to scale. We begin by presenting a business case that prompted us to look at building a distributed analytics platform that leverages the MapReduce framework pioneered by Google. We present the results of the study and highlight issues with the current structured-data access techniques for MapReduce platforms. Finally, we present a distributed and scalable data platform that leverages Apache Hadoop to enable business analysts to seamlessly query archived data along with data stored in the warehouse.
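To make the MapReduce idea in the abstract concrete, the following is a minimal, single-process sketch of a map/shuffle/reduce aggregation over structured records. It is an illustration only and not the platform described in the poster: the record fields (`region`, `churned`), the two sample data sets standing in for archived and warehouse data, and all function names are assumptions introduced here. A real deployment would run the same logic as distributed Hadoop jobs.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical structured records; field names are illustrative assumptions.
ARCHIVED = [
    {"customer_id": 1, "region": "north", "churned": True},
    {"customer_id": 2, "region": "north", "churned": False},
]
WAREHOUSE = [
    {"customer_id": 3, "region": "south", "churned": True},
    {"customer_id": 4, "region": "south", "churned": True},
]


def map_phase(record):
    """Emit one (region, (churned_count, record_count)) pair per record."""
    yield record["region"], (1 if record["churned"] else 0, 1)


def shuffle(pairs):
    """Group mapped values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the per-record counts for one region into a churn rate."""
    churned = sum(c for c, _ in values)
    total = sum(n for _, n in values)
    return key, churned / total


def churn_rate_by_region(*sources):
    """Run the map, shuffle, and reduce steps over all given record sources."""
    records = chain.from_iterable(sources)
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(churn_rate_by_region(ARCHIVED, WAREHOUSE))
```

Passing both sources to `churn_rate_by_region` mirrors, in a simplified way, the goal of querying archived data together with warehouse data through one interface.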

Index Terms

Massive structured data management solution
1. Information systems
  1. Data management systems
  2. Information retrieval
    1. Information retrieval query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)


Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

October 2010

2036 pages

ISBN:9781450300995

DOI:10.1145/1871437

General Chair:
Jimmy Huang (York University, Canada)

Program Chairs:
Nick Koudas (University of Toronto, Canada)
Gareth Jones (Dublin City University, Ireland)
Xindong Wu (University of Vermont, USA)
Kevyn Collins-Thompson (Microsoft Research, USA)
Aijun An (York University, Canada)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

CIKM '10

CIKM '10: International Conference on Information and Knowledge Management

October 26 - 30, 2010

Toronto, ON, Canada

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 404
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 1

Reflects downloads up to 24 Dec 2024
