research-article

Fast Comparative Analysis of Merge Trees Using Locality Sensitive Hashing

Published: 12 September 2024

Abstract

Scalar field comparison is a fundamental task in scientific visualization. In topological data analysis, we compare topological descriptors of scalar fields (such as persistence diagrams and merge trees) because they provide succinct and robust abstract representations. Several similarity measures for topological descriptors seem to be both asymptotically and practically efficient with polynomial-time algorithms, but they do not scale well when handling large-scale, time-varying scientific data and ensembles. In this paper, we propose a new framework to facilitate the comparative analysis of merge trees, inspired by tools from locality sensitive hashing (LSH). LSH hashes similar objects into the same hash buckets with high probability. We propose two new similarity measures for merge trees that can be computed via LSH, using new extensions to Recursive MinHash and subpath signature, respectively. Our similarity measures are extremely efficient to compute and closely resemble the results of existing measures such as merge tree edit distance or geometric interleaving distance. Our experiments demonstrate the utility of our LSH framework in applications such as shape matching, clustering, key event detection, and ensemble summarization.
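To make the LSH idea in the abstract concrete, the following is a minimal sketch of plain MinHash on flat sets, which estimates Jaccard similarity from the fraction of matching signature entries. It is not the paper's Recursive MinHash or subpath signature extension; the function names, the use of salted BLAKE2b as the hash family, and the toy node labels are all assumptions made for illustration.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item`; `seed` selects the hash function."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=256):
    """Return one minimum hash value per hash function (items must be non-empty)."""
    items = set(items)
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two equal-length signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, used here only as a reference value."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical labels standing in for features extracted from two trees.
tree_a = {f"node{i}" for i in range(0, 60)}
tree_b = {f"node{i}" for i in range(20, 80)}

sig_a = minhash_signature(tree_a)
sig_b = minhash_signature(tree_b)
exact = jaccard(tree_a, tree_b)          # 40 shared / 80 total = 0.5
approx = estimate_jaccard(sig_a, sig_b)  # close to 0.5, up to sampling error
```

Two sets agree at a given signature position with probability equal to their Jaccard similarity, so comparing fixed-length signatures stands in for comparing the sets themselves. Grouping signature entries into bands and using each band as a bucket key is one common way to turn such signatures into hash buckets.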



Published In

IEEE Transactions on Visualization and Computer Graphics, Volume 31, Issue 1
Jan. 2025
1276 pages

Publisher

IEEE Educational Activities Department

United States

