DOI: 10.1145/3618260.3649629
research-article
Open access

Approximate Earth Mover’s Distance in Truly-Subquadratic Time

Published: 11 June 2024

Abstract

We design an additive approximation scheme for estimating the cost of the min-weight bipartite matching problem: given a bipartite graph with non-negative edge costs and ε > 0, our algorithm estimates the cost of matching all but an O(ε)-fraction of the vertices in truly subquadratic time O(n^{2−δ(ε)}). Our algorithm has a natural interpretation for computing the Earth Mover’s Distance (EMD), up to an ε-additive approximation. Notably, we make no assumptions about the underlying metric (more generally, the costs do not have to satisfy the triangle inequality). Note that compared to the size of the instance (an arbitrary n × n cost matrix), our algorithm runs in sublinear time. Our algorithm can also approximate a slightly more general problem: max-cardinality bipartite matching with a knapsack constraint, where the goal is to maximize the number of vertices that can be matched within a total cost budget B.
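
To make the two quantities named in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm (which this page does not describe) and it runs in exponential time; it only illustrates what is being estimated. The function names, the example cost matrix, and the normalization of EMD as the min-cost perfect matching divided by n (uniform distributions on n points each) are assumptions made for illustration.

```python
from itertools import combinations, permutations


def min_cost_matching(cost, k):
    """Minimum total cost of matching exactly k left vertices to k distinct
    right vertices, by exhaustive search (exponential time, tiny n only)."""
    n = len(cost)
    best = float("inf")
    for rows in combinations(range(n), k):
        for cols in permutations(range(n), k):
            total = sum(cost[r][c] for r, c in zip(rows, cols))
            best = min(best, total)
    return best


def emd_uniform(cost):
    """EMD between two uniform distributions on n points each, taken here
    (an assumption) as the min-cost perfect matching divided by n."""
    n = len(cost)
    return min_cost_matching(cost, n) / n


def max_matched_within_budget(cost, budget):
    """Knapsack variant: the largest k such that some matching of size k
    has total cost at most `budget`."""
    n = len(cost)
    for k in range(n, -1, -1):
        if min_cost_matching(cost, k) <= budget:
            return k
    return 0  # only reached when budget is negative


# Hypothetical 3 x 3 cost matrix; costs need not satisfy the triangle inequality.
cost = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.2],
]

print(emd_uniform(cost))
print(max_matched_within_budget(cost, 0.1))
print(max_matched_within_budget(cost, 0.5))
```

In this example the cheapest perfect matching pairs each left vertex with the right vertex of the same index. With a budget of 0.1 only two pairs can be matched, because the third pair alone costs 0.2. Any exact method must in the worst case read all n² entries of the cost matrix, which is the baseline the subquadratic bound in the abstract is measured against.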

    Published In

    STOC 2024: Proceedings of the 56th Annual ACM Symposium on Theory of Computing
    June 2024
    2049 pages
    ISBN:9798400703836
    DOI:10.1145/3618260
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Earth Mover's Distance
    2. Matching
    3. Sublinear algorithms

    Qualifiers

    • Research-article

    Funding Sources

    • Marie Skłodowska-Curie

    Conference

    STOC '24: 56th Annual ACM Symposium on Theory of Computing
    June 24 - 28, 2024
    Vancouver, BC, Canada

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
