research-article

Leveraging graph dimensions in online graph search

Published: 01 September 2014

Abstract

Graphs have been widely used due to their expressive power to model complicated relationships. However, given a graph database Dg = {g1, g2, …, gn}, it is challenging to process graph queries, since a basic graph query usually involves costly graph operations such as maximum common subgraph and graph edit distance computation, which are NP-hard. In this paper, we study a novel DS-preserved mapping, which maps graphs in a graph database Dg onto a multidimensional space Mg under a structural dimension M using a mapping function φ(). The DS-preserved mapping preserves two things: distance and structure. Distance-preserving means that any two graphs gi and gj in Dg map to two data objects φ(gi) and φ(gj) in Mg such that the distance d(φ(gi), φ(gj)) in Mg approximates the graph dissimilarity δ(gi, gj) in Dg. Structure-preserving further means that, for a given unseen query graph q, the distance between q and any graph gi in Dg is preserved, such that δ(q, gi) ≈ d(φ(q), φ(gi)). We discuss the rationale for using the graph dimension M for online graph processing, and show how to identify a small set of subgraphs to form M efficiently. We propose an iterative algorithm, DSPM, to compute the graph dimension, and discuss its optimization techniques. We also give an approximate algorithm, DSPMap, to handle large graph databases. We conduct extensive performance studies on both real and synthetic datasets to evaluate the top-k similarity query, which finds the top-k graphs in Dg most similar to a query graph, and show the effectiveness and efficiency of our approaches.
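
The idea of the mapping can be illustrated with a small sketch. This is not the DSPM or DSPMap algorithm, whose details the abstract does not give. It only shows what it means for d(φ(gi), φ(gj)) to approximate δ(gi, gj) and how a top-k query could then be answered in the mapped space. Everything below is an assumption made for illustration: graphs are sets of edges over uniquely labeled nodes (so subgraph containment is a plain subset test), δ is an MCS-style dissimilarity counted in edges, φ is a binary containment vector over a hand-picked dimension set, and d is a scaled Euclidean distance. The names `graph`, `delta`, `phi`, `d`, `stress`, and `top_k` are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Toy representation (assumption): a graph is a frozenset of undirected edges
# over uniquely labeled nodes, so "f is a subgraph of g" is just f <= g.
def graph(*edges):
    return frozenset(tuple(sorted(e)) for e in edges)

def delta(g1, g2):
    """MCS-style dissimilarity in [0, 1], counted in edges (assumption)."""
    return 1.0 - len(g1 & g2) / max(len(g1), len(g2))

def phi(g, dims):
    """Map g to a binary vector: 1.0 where the dimension subgraph occurs in g."""
    return [1.0 if f <= g else 0.0 for f in dims]

def d(x, y):
    """Euclidean distance scaled into [0, 1] by the number of dimensions."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def stress(db, dims):
    """Total squared gap between delta and d over all graph pairs in db."""
    return sum((delta(a, b) - d(phi(a, dims), phi(b, dims))) ** 2
               for a, b in combinations(db, 2))

def top_k(q, db, dims, k):
    """Indices of the k graphs in db closest to q in the mapped space."""
    vq = phi(q, dims)
    return sorted(range(len(db)), key=lambda i: d(vq, phi(db[i], dims)))[:k]

# Hypothetical database, dimension set, and unseen query graph.
g1 = graph(("a", "b"), ("b", "c"), ("c", "d"))
g2 = graph(("a", "b"), ("b", "c"), ("c", "e"))
g3 = graph(("x", "y"), ("y", "z"))
db = [g1, g2, g3]
dims = [graph(("a", "b")), graph(("b", "c"), ("c", "d")), graph(("x", "y"))]
q = graph(("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"))

print(top_k(q, db, dims, 2))
print(stress(db, dims))
```

In this toy setting the query never needs a maximum common subgraph computation against the database: it is mapped once with `phi` and compared with cheap vector distances. A lower `stress` value indicates a dimension set whose distances track δ more closely, which is the kind of objective a dimension-selection procedure would aim to reduce.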

Cited By

  • (2019) Unsupervised graph-based rank aggregation for improved retrieval. Information Processing and Management: an International Journal 56:4 (1260-1279). DOI: 10.1016/j.ipm.2019.03.008. Online publication date: 1-Jul-2019
  • (2019) Bag of textual graphs (BoTG). Journal of the Association for Information Science and Technology 70:8 (817-829). DOI: 10.1002/asi.24167. Online publication date: 2-Jul-2019
  • (2018) Efficient structure similarity searches. The VLDB Journal - The International Journal on Very Large Data Bases 27:1 (53-78). DOI: 10.1007/s00778-017-0487-0. Online publication date: 1-Feb-2018

Published In

Proceedings of the VLDB Endowment  Volume 8, Issue 1
September 2014
100 pages
ISSN:2150-8097

Publisher

VLDB Endowment
