Optimal Dynamic Subset Sampling: Theory and Applications

Authors: Lu Yi, Hanzhi Wang, Zhewei Wei

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 3116 - 3127

https://doi.org/10.1145/3580305.3599458

Published: 04 August 2023

Abstract

We study the fundamental problem of sampling independent events, called subset sampling. Specifically, consider a set of $n$ distinct events $S = \{x_1, \ldots, x_n\}$, in which each event $x_i$ has an associated probability $p(x_i)$. The subset sampling problem aims to sample a subset $T \subseteq S$ such that every $x_i$ is independently included in $T$ with probability $p(x_i)$. A naive solution is to flip a coin for each event, which takes $O(n)$ time. An ideal solution, however, is a data structure that draws a subset sample in time proportional to the expected output size $\mu = \sum_{i=1}^{n} p(x_i)$, which can be significantly smaller than $n$ in many applications. The subset sampling problem serves as an important building block in many tasks and has been studied for more than a decade.
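The gap between the naive baseline and the ideal cost can be illustrated with a short Python sketch. `naive_subset_sample` flips one coin per event and therefore always costs O(n). `sample_uniform_p` handles only the special case in which every event has the same probability p; it reaches expected O(1 + μ) time by jumping directly from one included event to the next, because the number of excluded events between two successes is geometrically distributed. All function names are hypothetical, and the geometric-jump routine is a standard textbook technique for this special case, not the ODSS algorithm.

```python
import math
import random


def naive_subset_sample(probs, rng=random):
    """Return the indices i included independently with probability probs[i].

    One coin flip per event: O(n) time, no matter how small mu is.
    """
    return [i for i, p in enumerate(probs) if rng.random() < p]


def expected_size(probs):
    """mu = sum_i p(x_i), the expected size of the sampled subset T."""
    return sum(probs)


def sample_uniform_p(n, p, rng=random):
    """Special case p(x_i) = p for all i: expected O(1 + n*p) time.

    The number of failures before the next success is Geometric(p), so we
    skip over it directly instead of flipping a coin for every event.
    """
    if p <= 0.0:
        return []
    if p >= 1.0:
        return list(range(n))
    log_q = math.log1p(-p)  # log(1 - p), strictly negative
    out, i = [], -1
    while True:
        u = 1.0 - rng.random()              # uniform in (0, 1], avoids log(0)
        i += int(math.log(u) / log_q) + 1   # jump past the failures to the next success
        if i >= n:
            return out
        out.append(i)


if __name__ == "__main__":
    rng = random.Random(42)
    probs = [0.5, 0.01, 0.01, 0.9]
    print(naive_subset_sample(probs, rng), expected_size(probs))
    print(sample_uniform_p(1_000_000, 1e-5, rng))  # about 10 indices on average
```

With n = 1,000,000 and p = 10^-5, the geometric-jump routine visits only about ten positions, while the naive routine would flip a million coins for the same expected output.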

However, most existing subset sampling methods are designed for a static setting, in which the events in set S and their associated probabilities remain unchanged over time. In a dynamic setting, these algorithms incur either a large query time or a large update time, even though time-evolving events with varying probabilities are ubiquitous in real life. Designing efficient dynamic subset sampling algorithms is therefore a pressing need, but it remains an open problem.
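To make the query/update trade-off in the dynamic setting concrete, the sketch below builds a simple dynamic structure from well-known ingredients. It is NOT ODSS and does not reach ODSS's bounds. Events are grouped into buckets by probability: bucket e holds probabilities in [2^(e-1), 2^e), with p = 1 placed in bucket 0, so 2^e is an upper bound for every event in bucket e. Inserts, deletes and probability changes take O(1) expected time (deletion swaps the last event of a bucket into the freed slot). A query runs geometric jumps inside each bucket with probability 2^e and then keeps each candidate x with probability p(x)/2^e. Since each bucket costs O(1 + its share of μ), the expected query cost is O(B + μ), where B is the number of non-empty buckets, so it degrades when probabilities span many orders of magnitude. The class `BucketSubsetSampler` and its method names are hypothetical.

```python
import math
import random


class BucketSubsetSampler:
    """Illustrative dynamic subset sampler built from power-of-two buckets.

    NOT the ODSS algorithm. Bucket e holds events with 2**(e-1) <= p < 2**e
    (p == 1.0 goes to bucket 0), so 2**e is an upper bound on every p in it.
    Updates: O(1) expected. Query: O(#non-empty buckets + mu) expected.
    """

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.prob = {}     # event -> p(event)
        self.where = {}    # event -> (bucket exponent e, index in that bucket)
        self.buckets = {}  # bucket exponent e -> list of events

    def set_prob(self, x, p):
        """Insert x or change its probability; p <= 0 removes it."""
        if x in self.prob:
            self.delete(x)
        if p <= 0.0:
            return
        p = min(p, 1.0)
        e = 0 if p >= 1.0 else math.frexp(p)[1]  # frexp: p = m * 2**e, 0.5 <= m < 1
        bucket = self.buckets.setdefault(e, [])
        self.where[x] = (e, len(bucket))
        bucket.append(x)
        self.prob[x] = p

    def delete(self, x):
        """Remove x in O(1) by moving the bucket's last event into its slot."""
        e, i = self.where.pop(x)
        del self.prob[x]
        bucket = self.buckets[e]
        last = bucket.pop()
        if last != x:
            bucket[i] = last
            self.where[last] = (e, i)
        if not bucket:
            del self.buckets[e]

    def _candidates(self, n, q):
        """Yield indices in range(n), each independently with probability q."""
        if q >= 1.0:
            yield from range(n)
            return
        log_1mq = math.log1p(-q)
        i = -1
        while True:
            u = 1.0 - self.rng.random()          # uniform in (0, 1]
            i += int(math.log(u) / log_1mq) + 1  # skip a Geometric(q) run of failures
            if i >= n:
                return
            yield i

    def sample(self):
        """Return a list containing each stored event x independently w.p. p(x)."""
        out = []
        for e, bucket in self.buckets.items():
            bound = math.ldexp(1.0, e)  # 2**e >= p for every event in this bucket
            for i in self._candidates(len(bucket), bound):
                x = bucket[i]
                # Thinning: P(candidate) * P(accept) = bound * (p / bound) = p.
                if self.rng.random() < self.prob[x] / bound:
                    out.append(x)
        return out


if __name__ == "__main__":
    s = BucketSubsetSampler(seed=7)
    for v in range(100_000):
        s.set_prob(v, 1e-4)  # mu is about 10
    s.set_prob(3, 0.9)       # probability update
    s.delete(4)              # deletion
    print(sorted(s.sample()))
```

Because each acceptance probability p(x)/2^e is at least 1/2, at most a constant fraction of the candidates in a bucket is rejected, which is what keeps the per-bucket cost proportional to that bucket's contribution to μ.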

In this paper, we propose ODSS, the first optimal dynamic subset sampling algorithm. The expected query time and update time of ODSS are both optimal, matching the lower bounds of the subset sampling problem. We present a nontrivial theoretical analysis to demonstrate the superiority of ODSS, and we conduct comprehensive experiments to evaluate its performance empirically. Moreover, we apply ODSS to a concrete application, Influence Maximization, and empirically show that ODSS can improve the complexity of existing Influence Maximization algorithms on large real-world evolving social networks.
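As background for the Influence Maximization application: one common place where subset sampling arises is the generation of reverse-reachable (RR) sets under the independent cascade model. For every node reached by a reverse breadth-first search, each incoming edge is kept independently with its propagation probability, which is exactly one subset sampling query over that node's in-edges. The sketch below performs that step with naive coin flips. The function name `random_rr_set` and the adjacency format are assumptions for illustration; how ODSS is integrated into specific Influence Maximization algorithms is described in the paper, not here.

```python
import random
from collections import deque


def random_rr_set(in_edges, target, rng=random):
    """Reverse-reachable set of `target` under the independent cascade model.

    in_edges[v] is a list of (u, p_uv) pairs: u activates v with probability p_uv.
    Choosing which in-edges of a visited node are "live" is a subset sampling
    query; here it is done naively with one coin flip per edge.
    """
    visited = {target}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        # Subset sampling step: keep each in-edge (u, v) independently w.p. p_uv.
        live = [u for u, p in in_edges.get(v, []) if rng.random() < p]
        for u in live:
            if u not in visited:
                visited.add(u)
                queue.append(u)
    return visited
```

On an evolving network, the edge probabilities in `in_edges` change over time; replacing the naive list comprehension with a dynamic subset sampler per node is the kind of substitution for which update cost and query cost both matter.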

Supplementary Material

MP4 File (0443-2min-promo.mp4, 12.84 MB)

Presentation video (short version), including an overview of the subset sampling problem and the contributions of the paper "Optimal Dynamic Subset Sampling: Theory and Applications".

MP4 File (0443-20min-video.mp4, 181.92 MB)

Presentation video (long version) for "Optimal Dynamic Subset Sampling: Theory and Applications".
