research-article

A Data-dependent Approach for High-dimensional (Robust) Wasserstein Alignment

Authors:

Mingquan Ye

ACM Journal of Experimental Algorithmics, Volume 28

Article No.: 1.8, Pages 1 - 32

https://doi.org/10.1145/3604910

Published: 11 August 2023

Abstract

Many real-world problems can be formulated as the alignment of two geometric patterns. Previously, a great deal of research focused on aligning two-dimensional (2D) or 3D patterns in the field of computer vision. Recently, the alignment problem in high dimensions has found several novel applications in practice. However, research on its algorithmic aspects is still rather limited. To the best of our knowledge, most existing approaches are simple extensions of their counterparts for the 2D and 3D cases and often suffer from issues such as high computational complexity. In this article, we propose an effective framework to compress high-dimensional geometric patterns. Any existing alignment method can be applied to the compressed geometric patterns, and the time complexity can be significantly reduced. Our idea is inspired by the observation that high-dimensional data often has a low intrinsic dimension. Our framework is a “data-dependent” approach whose complexity depends on the intrinsic dimension of the input data. Our experimental results reveal that running the alignment algorithm on the compressed patterns achieves quality similar to that obtained on the original patterns, while the runtimes (including the time spent on compression) are substantially lower.
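
To make the compression idea concrete, the following is a minimal sketch of one way a point set could be reduced to a small weighted set of representatives before an alignment method is run on it. The abstract does not say which compression procedure the framework uses, so the greedy farthest-point (k-center style) selection here is an assumed stand-in, and the function name `farthest_point_compress` and the parameter `k` are hypothetical.

```python
import math
import random


def farthest_point_compress(points, k):
    """Greedy farthest-point (k-center style) compression.

    Returns (centers, weights): k representative points and the number of
    original points assigned to each of them. Illustrative stand-in only;
    not necessarily the compression used in the article.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    center_ids = [0]  # start from an arbitrary point (the first one)
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, points[0]) for p in points]
    owner = [0] * len(points)  # position in center_ids of the nearest center
    while len(center_ids) < k:
        # The next center is the point farthest from all current centers.
        far = max(range(len(points)), key=dist.__getitem__)
        center_ids.append(far)
        for i, p in enumerate(points):
            d = math.dist(p, points[far])
            if d < dist[i]:
                dist[i] = d
                owner[i] = len(center_ids) - 1
    weights = [0] * k
    for o in owner:
        weights[o] += 1
    centers = [points[i] for i in center_ids]
    return centers, weights


if __name__ == "__main__":
    random.seed(0)
    # 200 points in 50 dimensions that vary in only two coordinates,
    # i.e. high ambient dimension but low intrinsic dimension.
    pts = [
        tuple([random.random(), random.random()] + [0.0] * 48)
        for _ in range(200)
    ]
    centers, weights = farthest_point_compress(pts, 10)
    print(len(centers), sum(weights))
```

An alignment routine would then be run on the weighted representatives of the two patterns instead of on the full point sets, which is where the runtime saving described in the abstract would come from.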

Index Terms

1. Theory of computation
  1. Theory and algorithms for application domains

Published In

ACM Journal of Experimental Algorithmics Volume 28, Issue

December 2023

325 pages

ISSN:1084-6654

EISSN:1084-6654

DOI:10.1145/3587923

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2023

Online AM: 21 June 2023

Accepted: 27 May 2023

Revised: 29 March 2023

Received: 23 September 2022

Published in JEA Volume 28

Qualifiers

Research-article

Funding Sources

National Key R&D program of China
NSFC
Provincial NSF of Anhui
