research-article

Approximating probabilistic group Steiner trees in graphs

Published: 01 October 2022

Abstract

Consider an edge-weighted graph and a number of properties of interest (PoIs). Each vertex has a probability of exhibiting each PoI. The joint probability that a set of vertices exhibits a PoI is the probability that the set contains at least one vertex exhibiting that PoI. The probabilistic group Steiner tree problem is to find a tree such that (i) for each PoI, the joint probability that the set of vertices in the tree exhibits this PoI is no smaller than a threshold value, e.g., 0.97; and (ii) the total weight of the edges in the tree is minimized. Solving this problem is useful for mining various graphs with uncertain vertex properties, but the problem is NP-hard. Existing work focuses on cases where vertex properties are certain, and cannot perform this task. To meet this challenge, we propose three approximation algorithms for solving the above problem. Let |Γ| be the number of PoIs, and let ξ be an upper bound on the number of vertices needed to satisfy the threshold value for each PoI. Algorithms 1 and 2 have tight approximation guarantees proportional to |Γ| and ξ, and time complexities exponential in ξ and |Γ|, respectively. In comparison, Algorithm 3 has a looser approximation guarantee proportional to both |Γ| and ξ, and a time complexity polynomial in both. Experiments on large real datasets show that the proposed algorithms considerably outperform the state-of-the-art related work for finding probabilistic group Steiner trees in various cases.
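The two conditions in the problem statement can be illustrated with a minimal Python sketch. It is not one of the paper's three algorithms: it only checks whether a given candidate tree meets the threshold for every PoI and computes its total edge weight. The sketch assumes that vertices exhibit a PoI independently, so the joint probability is 1 minus the product of the individual miss probabilities; the abstract does not state this assumption. The function names, the data layout, and the example graph are hypothetical.

```python
from math import prod


def joint_probability(probs):
    """Probability that at least one vertex exhibits a PoI.

    Assumption (not stated in the abstract): vertices exhibit the PoI
    independently, so the joint probability is 1 - prod(1 - p_v).
    """
    return 1.0 - prod(1.0 - p for p in probs)


def is_feasible(tree_vertices, vertex_probs, pois, threshold):
    """Condition (i): every PoI reaches the threshold on the tree's vertices."""
    return all(
        joint_probability(vertex_probs[v].get(poi, 0.0) for v in tree_vertices)
        >= threshold
        for poi in pois
    )


def tree_weight(tree_edges):
    """Quantity minimized in condition (ii): total weight of the tree's edges."""
    return sum(weight for _, _, weight in tree_edges)


# Hypothetical example: three vertices, two PoIs "x" and "y".
vertex_probs = {
    "a": {"x": 0.9, "y": 0.5},
    "b": {"x": 0.8, "y": 0.9},
    "c": {"y": 0.8},
}
tree_edges = [("a", "b", 2.0), ("b", "c", 1.5)]  # (endpoint, endpoint, weight)
tree_vertices = {"a", "b", "c"}

feasible = is_feasible(tree_vertices, vertex_probs, ["x", "y"], threshold=0.97)
weight = tree_weight(tree_edges)
```

The sketch does not verify that the edges actually form a tree, and it does not search for a minimum-weight solution; finding one is the NP-hard part that the proposed approximation algorithms address.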

Cited By

  • (2023) An Efficient Dynamic Programming Algorithm for Finding Group Steiner Trees in Temporal Graphs. International Journal of Intelligent Systems 2023. DOI: 10.1155/2023/1974161. Online publication date: 1-Jan-2023

Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 2
October 2022
266 pages
ISSN:2150-8097

Publisher

VLDB Endowment
