research-article

Leveraging graph dimensions in online graph search

Authors:

Lu Qin

Proceedings of the VLDB Endowment, Volume 8, Issue 1

Pages 85 - 96

https://doi.org/10.14778/2735461.2735469

Published: 01 September 2014

Abstract

Graphs are widely used because of their expressive power in modeling complicated relationships. However, given a graph database D_g = {g₁, g₂, …, g_n}, processing graph queries is challenging, because even a basic graph query usually involves costly graph operations such as maximum common subgraph and graph edit distance computation, which are NP-hard. In this paper, we study a novel DS-preserved mapping, which maps the graphs in a graph database D_g onto a multidimensional space M_g under a structural dimension M using a mapping function φ(). The DS-preserved mapping preserves two things: distance and structure. Distance-preserving means that any two graphs g_i and g_j in D_g map to two data objects φ(g_i) and φ(g_j) in M_g such that the distance d(φ(g_i), φ(g_j)) in M_g approximates the graph dissimilarity δ(g_i, g_j) in D_g. Structure-preserving further means that, for a previously unseen query graph q, the distance between q and any graph g_i in D_g is also preserved, such that δ(q, g_i) ≈ d(φ(q), φ(g_i)). We discuss the rationale for using the graph dimension M for online graph processing, and show how to efficiently identify a small set of subgraphs to form M. We propose an iterative algorithm, DSPM, to compute the graph dimension, and discuss its optimization techniques. We also give an approximate algorithm, DSPMap, to handle large graph databases. We conduct extensive performance studies on both real and synthetic datasets to evaluate the top-k similarity query, which finds the top-k graphs in D_g most similar to a query graph, and show the effectiveness and efficiency of our approaches.
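
The mapping idea in the abstract can be illustrated with a small sketch. This is not the DSPM or DSPMap algorithm, and it does not show how the dimension M is selected. It assumes a toy representation in which a graph is a set of labeled edges and subgraph containment is plain set inclusion (real graphs would need a subgraph isomorphism test), a binary feature vector for φ, and Euclidean distance for d. The names `phi`, `d`, `top_k`, and the sample data are hypothetical and exist only for illustration.

```python
import heapq
import math

# Assumption: a "graph" is a frozenset of labeled edges and a feature is
# "contained" in a graph when it is a subset of it. This stands in for
# subgraph isomorphism and is only a toy simplification.

def phi(graph, dimension):
    """Map a graph to a binary vector: 1.0 where the graph contains the feature."""
    return [1.0 if feature <= graph else 0.0 for feature in dimension]

def d(x, y):
    """Euclidean distance between two mapped objects in the vector space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def top_k(query, database, dimension, k):
    """Return indices of the k database graphs closest to the query after mapping."""
    q_vec = phi(query, dimension)
    scored = [(d(q_vec, phi(g, dimension)), i) for i, g in enumerate(database)]
    return [i for _, i in heapq.nsmallest(k, scored)]

# Hypothetical data: three database graphs and a dimension of three features.
database = [
    frozenset({"a-b", "b-c", "c-d"}),
    frozenset({"a-b", "b-c"}),
    frozenset({"x-y", "y-z"}),
]
dimension = [
    frozenset({"a-b"}),
    frozenset({"b-c", "c-d"}),
    frozenset({"x-y"}),
]
query = frozenset({"a-b", "b-c", "c-d", "d-e"})  # an unseen query graph

print(top_k(query, database, dimension, 2))  # → [0, 1]
```

Once the database vectors are precomputed, answering a query only requires mapping q and comparing vectors, which avoids running an NP-hard graph comparison against every graph in D_g. How closely the ranking matches the true dissimilarity δ depends on the choice of M.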

Cited By

Dourado I, Pedronette D, Torres R (2019). Unsupervised graph-based rank aggregation for improved retrieval. Information Processing and Management: an International Journal 56:4 (1260-1279). DOI: 10.1016/j.ipm.2019.03.008. Online publication date: 1-Jul-2019.
https://dl.acm.org/doi/10.1016/j.ipm.2019.03.008
Dourado Í, Galante R, Gonçalves M, da Silva Torres R (2019). Bag of textual graphs (BoTG). Journal of the Association for Information Science and Technology 70:8 (817-829). DOI: 10.1002/asi.24167. Online publication date: 2-Jul-2019.
https://dl.acm.org/doi/10.1002/asi.24167
Zhao X, Xiao C, Lin X, Zhang W, Wang Y (2018). Efficient structure similarity searches. The VLDB Journal (The International Journal on Very Large Data Bases) 27:1 (53-78). DOI: 10.1007/s00778-017-0487-0. Online publication date: 1-Feb-2018.
https://dl.acm.org/doi/10.1007/s00778-017-0487-0

Published In

Proceedings of the VLDB Endowment, Volume 8, Issue 1

September 2014

100 pages

ISSN: 2150-8097

Editors: Chen Li (University of California, Irvine), Volker Markl (TU Berlin)

Publisher

VLDB Endowment
