DOI: 10.1145/3184558.3191630

Stochastic Multi-path Routing Problem with Non-stationary Rewards: Building PayU's Dynamic Routing

Published: 23 April 2018

Abstract

The payment transaction engine at PayU processes millions of transactions every day through multiple payment gateways. Routing each transaction through an appropriate payment gateway is crucial for optimizing availability and cost. Every transaction must be assigned to one of K available payment gateways, each characterized by an unknown reward probability distribution. The reward for a gateway combines its health and cost factors, and it is realized only when the gateway processes a transaction, i.e., through the transaction's success or failure. The objective of dynamic routing is to maximize the cumulative expected reward over a given horizon of transactions. To do this, the dynamic switching system needs to acquire information about the gateways (exploration) while simultaneously optimizing immediate rewards by selecting the best gateway at the moment (exploitation); the price paid for this trade-off is referred to as the regret. The main objective is to minimize the regret and thereby maximize the rewards. The basic idea is to choose a gateway according to its probability of being the best gateway. The routing problem is a direct formulation of a reinforcement learning (RL) problem. In an RL problem, an agent interacts with a dynamic, stochastic, and incompletely known environment, with the goal of finding an action-selection strategy, or policy, that optimizes some long-term performance measure. The Thompson Sampling algorithm has been shown experimentally to be close to optimal.
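
The idea of choosing a gateway according to its probability of being the best can be illustrated with a minimal Beta-Bernoulli Thompson Sampling sketch. This is not PayU's implementation: the class name `GatewayRouter`, the `simulate` helper, the binary success/failure reward (the abstract describes a reward that also includes cost), and the `discount` parameter used to forget old evidence under non-stationary rewards are all assumptions made for illustration.

```python
import random


class GatewayRouter:
    """Illustrative Beta-Bernoulli Thompson Sampling over K gateways (hypothetical)."""

    def __init__(self, n_gateways, discount=1.0, seed=None):
        # One Beta(alpha, beta) posterior per gateway, starting from a uniform Beta(1, 1) prior.
        self.alpha = [1.0] * n_gateways
        self.beta = [1.0] * n_gateways
        # Assumption: discount < 1 decays old evidence so the posterior can
        # follow gateways whose success rate drifts; discount = 1 never forgets.
        self.discount = discount
        self.rng = random.Random(seed)

    def choose(self):
        # Draw one plausible success rate per gateway and route to the largest draw.
        # A gateway is therefore picked with its posterior probability of being best.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, gateway, success):
        # Shrink every posterior toward the prior, then record the new outcome.
        for i in range(len(self.alpha)):
            self.alpha[i] = 1.0 + self.discount * (self.alpha[i] - 1.0)
            self.beta[i] = 1.0 + self.discount * (self.beta[i] - 1.0)
        if success:
            self.alpha[gateway] += 1.0
        else:
            self.beta[gateway] += 1.0


def simulate(success_rates, n_transactions, seed=0):
    """Route simulated transactions and return how often each gateway was picked."""
    router = GatewayRouter(len(success_rates), seed=seed)
    env = random.Random(seed + 1)  # separate stream for simulated gateway outcomes
    picks = [0] * len(success_rates)
    for _ in range(n_transactions):
        g = router.choose()
        picks[g] += 1
        router.update(g, env.random() < success_rates[g])
    return picks
```

Calling `simulate([0.5, 0.9, 0.6], 2000)` routes 2000 simulated transactions across three hypothetical gateways; early picks are spread out (exploration), and traffic then concentrates on the gateway with the highest success rate (exploitation). Setting `discount` slightly below 1 is one simple way, among several, to handle rewards that change over time.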

Cited By

  • (2023) Maximizing Success Rate of Payment Routing using Non-stationary Bandits. Proceedings of the Third International Conference on AI-ML Systems, pages 1-7. DOI: 10.1145/3639856.3639883. Online publication date: 25-Oct-2023
  • (2021) An AI-powered Smart Routing Solution for Payment Systems. 2021 IEEE International Conference on Big Data (Big Data), pages 2026-2033. DOI: 10.1109/BigData52589.2021.9671961. Online publication date: 15-Dec-2021
  • (2018) Energy Efficient Based Splitting for MPTCP in Heterogeneous Networks. Ad Hoc Networks, pages 105-114. DOI: 10.1007/978-3-030-05888-3_10. Online publication date: 19-Dec-2018

    Published In

    WWW '18: Companion Proceedings of the The Web Conference 2018
    April 2018
    2023 pages
    ISBN:9781450356404

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. machine learning
    2. reinforcement learning
    3. routing problem

    Qualifiers

    • Research-article

    Conference

    WWW '18
    Sponsor:
    • IW3C2
    WWW '18: The Web Conference 2018
    April 23 - 27, 2018
    Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 237
    • Downloads (last 6 weeks): 14
    Reflects downloads up to 30 Aug 2024
