Online controlled experiments at large scale

Published: 11 August 2013

    Abstract

    Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga, use online controlled experiments to guide product development and accelerate innovation. At Microsoft's Bing, the use of controlled experiments has grown exponentially over time, with over 200 concurrent experiments now running on any given day. Running experiments at large scale requires addressing multiple challenges in three areas: cultural/organizational, engineering, and trustworthiness. On the cultural and organizational front, the larger organization needs to learn the reasons for running controlled experiments and the tradeoffs between controlled experiments and other methods of evaluating ideas. We discuss why negative experiments, which degrade the user experience in the short term, should be run, given the learning value and long-term benefits. On the engineering side, we architected a highly scalable system, able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users. Classical testing and debugging techniques no longer apply when there are billions of live variants of the site, so alerts are used to identify issues rather than relying on heavy up-front testing. On the trustworthiness front, we have a high occurrence of false positives that we address, and we alert experimenters to statistical interactions between experiments. The Bing Experimentation System is credited with having accelerated innovation and increased annual revenues by hundreds of millions of dollars, by allowing us to find and focus on key ideas evaluated through thousands of controlled experiments. A 1% improvement to revenue equals more than $10M annually in the US, yet many ideas impact key metrics by 1% and are not well estimated a priori. The system has also identified many negative features that we avoided deploying, despite key stakeholders' early excitement, saving us similarly large amounts.

    Published In

    KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2013
    1534 pages
    ISBN:9781450321747
    DOI:10.1145/2487575
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. a/b testing
    2. controlled experiments
    3. randomized experiments

    Qualifiers

    • Research-article

    Conference

    KDD '13

    Acceptance Rates

    KDD '13 Paper Acceptance Rate 125 of 726 submissions, 17%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
