
Column Access-aware In-stream Data Cache with Stream Processing Framework

Published: 01 March 2017

Abstract

In recent years, research has focused on the query bottleneck of big data, with approaches such as NoSQL databases, MapReduce, and big data processing frameworks. Although NoSQL databases offer many advantages for On-Line Analytical Processing (OLAP), migrating a Relational Database Management System (RDBMS) to NoSQL is a major undertaking, so optimizing the RDBMS remains important. In this paper, we construct a Column Access-aware In-stream Data Cache (CAIDC) for relational databases, which integrates an RDBMS with an in-memory cache. We also propose a live synchronization approach from the physical RDBMS to the in-memory data cache using a stream processing framework. On the one hand, the stream processing framework lets CAIDC provide low latency while supporting log-based triggers that maintain data consistency in the presence of updates. On the other hand, CAIDC moves frequently accessed data into a column-oriented in-memory cache according to column access frequency, so that heavy-hitter queries can be served from the cache. Finally, experimental results show that this approach supports a wide range of big data applications.
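The two mechanisms named in the abstract, promotion of columns by access frequency and log-based synchronization on updates, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `ColumnAccessCache`, the `hot_threshold` parameter, the dictionary standing in for the RDBMS, and the in-process queue standing in for the database log and stream processing framework are all assumptions made for illustration.

```python
from collections import Counter, deque


class ColumnAccessCache:
    """Toy column-oriented cache driven by column access frequency (hypothetical)."""

    def __init__(self, table, hot_threshold=2):
        self.table = table                  # stand-in for the RDBMS: {row_id: {column: value}}
        self.hot_threshold = hot_threshold  # accesses needed before a column is cached (assumed policy)
        self.hits = Counter()               # column -> number of reads seen so far
        self.columns = {}                   # cached columns: column -> {row_id: value}
        self.log = deque()                  # stand-in for the database change log

    def read(self, row_id, column):
        """Return (value, source), where source is "cache" or "database"."""
        self.hits[column] += 1
        if column in self.columns:
            return self.columns[column][row_id], "cache"
        value = self.table[row_id][column]
        if self.hits[column] >= self.hot_threshold:
            # The column is now "hot": copy the whole column into memory.
            self.columns[column] = {
                rid: row.get(column) for rid, row in self.table.items()
            }
        return value, "database"

    def write(self, row_id, column, value):
        """Update the database and record the change in the log."""
        self.table.setdefault(row_id, {})[column] = value
        self.log.append((row_id, column, value))

    def sync(self):
        """Consume pending log entries and apply them to cached columns."""
        while self.log:
            row_id, column, value = self.log.popleft()
            if column in self.columns:
                self.columns[column][row_id] = value


db = {1: {"name": "ann", "age": 30}, 2: {"name": "bob", "age": 41}}
cache = ColumnAccessCache(db, hot_threshold=2)

first = cache.read(1, "age")    # served by the database, column not yet hot
second = cache.read(2, "age")   # second access promotes the "age" column
third = cache.read(1, "age")    # now served by the in-memory column
cache.write(1, "age", 31)       # update goes to the database and the log
cache.sync()                    # log entry is applied to the cached column
fourth = cache.read(1, "age")
```

In this sketch the cache is stale between `write` and `sync`. A real deployment would presumably consume the log continuously through the stream processing framework, which is the role the abstract assigns to it.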

    Published In

    Journal of Signal Processing Systems, Volume 86, Issue 2-3
    March 2017
    240 pages
    ISSN: 1939-8018
    EISSN: 1939-8115

    Publisher

    Kluwer Academic Publishers

    United States

    Author Tags

    1. Access frequency
    2. Big data
    3. Data cache
    4. NoSQL
    5. Stream computing
    6. Stream processing

    Qualifiers

    • Article

    Cited By

    • (2022) RETRACTED ARTICLE: Dynamic multi-variant relational scheme-based intelligent ETL framework for healthcare management. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 27:1 (605-614). DOI: 10.1007/s00500-022-06938-8. Online publication date: 21-Mar-2022
    • (2020) ROLAP DW transformation proposal for OLAP architecture in NoSQL database. Proceedings of the 10th Euro-American Conference on Telematics and Information Systems (1-7). DOI: 10.1145/3401895.3401899. Online publication date: 25-Nov-2020
    • (2018) A Dependable Time Series Analytic Framework for Cyber-Physical Systems of IoT-based Smart Grid. ACM Transactions on Cyber-Physical Systems, 3:1 (1-18). DOI: 10.1145/3145623. Online publication date: 29-Aug-2018
    • (2017) Stream-based live entity resolution approach with adaptive duplicate count strategy. International Journal of Web and Grid Services, 13:3 (351-373). DOI: 10.1504/IJWGS.2017.085167. Online publication date: 1-Jan-2017
