
High-throughput publish/subscribe on top of LSM-based storage

Published: 01 March 2019

Abstract

State-of-the-art publish/subscribe systems are efficient when the subscriptions are relatively static (for instance, the set of followers in Twitter) or can fit in memory. Nowadays, however, many big data and IoT-based applications follow a highly dynamic query paradigm, where both continuous queries and data entries number in the millions and can arrive and expire rapidly. In this paper, we propose and compare several publish/subscribe storage architectures, based on the popular NoSQL log-structured merge tree (LSM) storage paradigm, to support high-throughput and highly dynamic publish/subscribe systems. Our framework naturally supports subscriptions on both historic and future streaming data, and generates instant notifications. We also extend our framework to efficiently support self-joining subscriptions, where streaming pub/sub records join with past pub/sub entries. Further, we show how hierarchical attributes, such as concept ontologies, can be efficiently supported; for example, a publication's topic is "politics" whereas a subscription's topic is "US politics." We implemented and experimentally evaluated our methods on the popular LSM-based LevelDB system, using real datasets, for simple match and self-joining subscriptions on both flat and hierarchical attributes. Our results show that our approaches achieve significantly higher throughput than state-of-the-art baselines.
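
To make the idea of subscriptions over both historic and future data concrete, the following is a minimal sketch, not the architecture evaluated in the paper. It assumes a flat attribute (a topic string) with exact matching, and it uses plain in-memory dictionaries as a stand-in for LSM-backed tables. The class `PubSubStore` and its method names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class PubSubStore:
    """Toy store that keeps both publications and subscriptions.

    Assumption: in-memory dicts stand in for LSM-backed tables, and
    matching is exact equality on a single flat attribute (the topic).
    """

    def __init__(self):
        # Hypothetical layout: one map per record kind, keyed by topic.
        self.publications = defaultdict(list)   # topic -> [publication id, ...]
        self.subscriptions = defaultdict(set)   # topic -> {subscriber id, ...}

    def subscribe(self, subscriber, topic):
        """Register a subscription and return the already stored (historic) matches."""
        self.subscriptions[topic].add(subscriber)
        return list(self.publications.get(topic, []))

    def unsubscribe(self, subscriber, topic):
        """Expire a subscription so later publications no longer notify it."""
        self.subscriptions[topic].discard(subscriber)

    def publish(self, pub_id, topic):
        """Store a publication and return the subscribers to notify immediately."""
        self.publications[topic].append(pub_id)
        return sorted(self.subscriptions.get(topic, ()))


store = PubSubStore()
store.publish("p1", "politics")                 # stored before anyone subscribes
past = store.subscribe("alice", "politics")     # historic matches for the new subscription
notified = store.publish("p2", "politics")      # future match, notified at publish time
print(past, notified)
```

Because both record kinds are stored, a new subscription can be answered from past publications and a new publication can be matched against live subscriptions. Both `subscribe` and `publish` are therefore writes followed by a lookup, which is the kind of workload the abstract describes as suited to write-optimized LSM storage.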

Published In

Distributed and Parallel Databases, Volume 37, Issue 1, March 2019, 229 pages

Publisher

Kluwer Academic Publishers, United States

Author Tags

1. Big data
2. Continuous lookup queries
3. Dewey
4. Internet of things
5. LevelDB
6. Log-structured merge tree
7. Publish/subscribe
8. Self-join subscription
