DOI: 10.1145/3485447.3512150
research-article

Multi-dimensional Probabilistic Regression over Imprecise Data Streams

Published: 25 April 2022
Abstract

    In Web of Things and Web of Events applications, massive volumes of multi-dimensional streaming data are generated automatically and continuously by diverse sources such as GPS receivers, sensors, and other measurement devices. Such data are inherently imprecise (inaccurate and/or uncertain). Monitoring these imprecise, low-level streams and extracting insights from them is challenging, yet it is necessary to capture potentially important trends in how the data change and to respond promptly. In this work, we investigate solutions for multi-dimensional and multi-granularity probabilistic regression over imprecise streaming data. The probabilistic nature of the data poses significant computational challenges for both the regression and its aggregation. We study a series of techniques for multi-dimensional probabilistic regression, including aggregation, sketching, popular-path materialization, and exception-driven querying. Extensive experiments on real and synthetic datasets demonstrate the efficiency and scalability of the proposed techniques.
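    The abstract does not spell out how regression results over imprecise data can be aggregated across dimensions or time windows. The sketch below illustrates one textbook construction under stated assumptions; it is not the paper's method. It assumes (1) each reading z_i at timestamp t_i is an independent random variable with known mean mu_i and variance var_i, and (2) the regression is ordinary least squares of value on timestamp. Because the least-squares slope is a linear combination of the readings, its expected value is the slope fitted to the means, and under independence its variance depends only on a few additive sums. Every field is a plain sum, so two cells (two sensors, or two adjacent windows) can be merged without revisiting raw readings. The names RegressionCell, add, merge, and fit are hypothetical.

    ```python
    from dataclasses import dataclass, fields


    @dataclass
    class RegressionCell:
        """Hypothetical mergeable summary for the least-squares line z = a + b*t.

        Assumption: each reading z_i is an independent random variable with
        known mean mu_i and variance var_i. All fields are additive sums."""
        n: int = 0
        s_t: float = 0.0      # sum of t_i
        s_tt: float = 0.0     # sum of t_i^2
        s_mu: float = 0.0     # sum of mu_i
        s_tmu: float = 0.0    # sum of t_i * mu_i
        s_var: float = 0.0    # sum of var_i
        s_tvar: float = 0.0   # sum of t_i * var_i
        s_ttvar: float = 0.0  # sum of t_i^2 * var_i

        def add(self, t: float, mu: float, var: float) -> None:
            """Fold one imprecise reading (timestamp, mean, variance) into the sums."""
            self.n += 1
            self.s_t += t
            self.s_tt += t * t
            self.s_mu += mu
            self.s_tmu += t * mu
            self.s_var += var
            self.s_tvar += t * var
            self.s_ttvar += t * t * var

        def merge(self, other: "RegressionCell") -> "RegressionCell":
            """Aggregate two cells by adding their sums field by field."""
            return RegressionCell(
                *(getattr(self, f.name) + getattr(other, f.name) for f in fields(self))
            )

        def fit(self) -> tuple[float, float, float]:
            """Return (E[a], E[b], Var[b]) of the least-squares fit.

            The slope is b = sum_i w_i z_i with w_i = (t_i - t_mean) / S_tt, which
            is linear in the readings, so E[b] is the slope fitted to the means.
            Under the independence assumption, Var[b] = sum_i w_i^2 var_i."""
            if self.n < 2:
                raise ValueError("need at least two readings")
            t_mean = self.s_t / self.n
            s_xx = self.s_tt - self.s_t * t_mean  # sum of (t_i - t_mean)^2
            if s_xx == 0:
                raise ValueError("all timestamps are identical")
            slope = (self.s_tmu - t_mean * self.s_mu) / s_xx
            intercept = (self.s_mu - slope * self.s_t) / self.n
            # sum_i (t_i - t_mean)^2 * var_i, expanded into the stored sums
            spread = self.s_ttvar - 2 * t_mean * self.s_tvar + t_mean ** 2 * self.s_var
            return intercept, slope, spread / (s_xx * s_xx)


    if __name__ == "__main__":
        left, right = RegressionCell(), RegressionCell()
        for t in (0, 1):
            left.add(t, 1 + 2 * t, 1.0)   # window 1: means on the line 1 + 2t
        for t in (2, 3):
            right.add(t, 1 + 2 * t, 1.0)  # window 2: same line, unit variance
        print(left.merge(right).fit())    # (1.0, 2.0, 0.2)
    ```

    The design choice is the same one that makes regression "compressible" in stream cubes: storing only sums keeps each cell constant-size, so roll-ups along any dimension cost one addition per field.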


    Cited By

    • (2023) Meta-sketch. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, and Thirteenth Symposium on Educational Advances in Artificial Intelligence, 6916-6924. DOI: 10.1609/aaai.v37i6.25846. Online publication date: 7 Feb 2023.
    • (2023) A Probabilistic Sketch for Summarizing Cold Items of Data Streams. IEEE/ACM Transactions on Networking 32, 2, 1287-1302. DOI: 10.1109/TNET.2023.3316426. Online publication date: 5 Oct 2023.
    • (2023) Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 1720-1733. DOI: 10.1109/ICDE55515.2023.00135. Online publication date: Apr 2023.


        Information

        Published In

        ACM Conferences
        WWW '22: Proceedings of the ACM Web Conference 2022
        April 2022
        3764 pages
        ISBN:9781450390965
        DOI:10.1145/3485447

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Probabilistic Regression
        2. Stream Processing
        3. WOLAP

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • NSFC

        Conference

        WWW '22
        WWW '22: The ACM Web Conference 2022
        April 25-29, 2022
        Virtual Event, Lyon, France

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Article Metrics

        • Downloads (last 12 months): 43
        • Downloads (last 6 weeks): 3
        Reflects downloads up to 27 Jul 2024
