
A Data-dependent Approach for High-dimensional (Robust) Wasserstein Alignment

Published: 11 August 2023

Abstract

Many real-world problems can be formulated as the alignment between two geometric patterns. Previously, a great deal of research has focused on the alignment of two-dimensional (2D) or 3D patterns in the field of computer vision. Recently, the alignment problem in high dimensions has found several novel applications in practice. However, research on its algorithmic aspects is still rather limited. To the best of our knowledge, most existing approaches are simple extensions of their counterparts for the 2D and 3D cases and often suffer from issues such as high computational complexity. In this article, we propose an effective framework to compress high-dimensional geometric patterns. Any existing alignment method can be applied to the compressed geometric patterns, and the time complexity can be significantly reduced. Our idea is inspired by the observation that high-dimensional data often has a low intrinsic dimension. Our framework is a “data-dependent” approach whose complexity depends on the intrinsic dimension of the input data. Our experimental results reveal that running the alignment algorithm on the compressed patterns achieves quality similar to that obtained on the original patterns, while the runtimes (including the time spent on compression) are substantially lower.
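To make the compress-then-align idea concrete, here is a minimal sketch in Python. The abstract does not specify the compression procedure, so this example substitutes a greedy farthest-point (k-center style) selection as a stand-in; the function name `compress_pattern`, the parameter `k`, and the synthetic data are all hypothetical and chosen only for illustration. It should not be read as the article's algorithm.

```python
import numpy as np

def compress_pattern(points, k, seed=0):
    """Illustrative stand-in for a compression step (not the article's method).

    Picks k representatives by greedy farthest-point selection, then weights
    each representative by the fraction of input points closest to it.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    first = int(rng.integers(n))
    chosen = [first]
    # dist[i] = distance from point i to its nearest representative so far
    dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # farthest point from the current set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    centers = points[chosen]
    # Assign every point to its nearest representative and count assignments.
    sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq.argmin(axis=1)
    weights = np.bincount(labels, minlength=k) / n
    return centers, weights

# Synthetic pattern: 2000 points in 50 dimensions lying on a 2D subspace,
# i.e. high ambient dimension but low intrinsic dimension.
rng = np.random.default_rng(1)
latent = rng.normal(size=(2000, 2))
embedding = rng.normal(size=(2, 50))
pattern = latent @ embedding

centers, weights = compress_pattern(pattern, k=20)
print(centers.shape, weights.sum())
```

An alignment routine that accepts weighted point sets could then be run on `(centers, weights)` for each pattern instead of on the full inputs, which is where the runtime saving described in the abstract would come from.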


    Published In

    ACM Journal of Experimental Algorithmics  Volume 28, Issue
    December 2023
    325 pages
    ISSN:1084-6654
    EISSN:1084-6654
    DOI:10.1145/3587923

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 August 2023
    Online AM: 21 June 2023
    Accepted: 27 May 2023
    Revised: 29 March 2023
    Received: 23 September 2022
    Published in JEA Volume 28

    Author Tags

    1. Wasserstein distance
    2. Procrustes analysis
    3. doubling dimension
    4. network alignment
    5. unsupervised cross-lingual learning
    6. domain adaptation

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D program of China
    • NSFC
    • Provincial NSF of Anhui
