DOI: 10.1145/1281192.1281295
Article

Practical guide to controlled experiments on the web: listen to your customers not to the HiPPO

Published: 12 August 2007

Abstract

The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments (single factor or factorial designs), A/B tests (and their generalizations), split tests, Control/Treatment tests, and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person's Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
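
As a rough illustration of two topics the abstract names, sample size and hash-based randomization, the following is a minimal sketch and not the authors' implementation. It assumes the widely used rule of thumb n = 16σ²/Δ² per variant (roughly 80% power at a 5% two-sided significance level) and an MD5-based bucketing scheme. The function names, the 10,000-bucket granularity, and the "experiment:user" key format are hypothetical choices made for this example.

```python
import hashlib
import math


def sample_size_per_variant(std_dev: float, min_detectable_diff: float) -> int:
    """Approximate number of users needed in each variant.

    Uses the rule of thumb n = 16 * sigma^2 / delta^2, which corresponds
    to roughly 80% power at a 5% two-sided significance level.
    """
    if std_dev <= 0 or min_detectable_diff <= 0:
        raise ValueError("std_dev and min_detectable_diff must be positive")
    return math.ceil(16 * std_dev ** 2 / min_detectable_diff ** 2)


def assign_variant(user_id: str, experiment_id: str,
                   treatment_fraction: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    Hashing the experiment ID together with the user ID keeps a user's
    assignment stable across visits, and gives each experiment its own
    hash input (an illustrative design, assumed for this sketch).
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 10_000
    return "treatment" if bucket < treatment_fraction * 10_000 else "control"


if __name__ == "__main__":
    # Hypothetical inputs: metric std. dev. of 5.0, smallest change of interest 2.0.
    print(sample_size_per_variant(std_dev=5.0, min_detectable_diff=2.0))
    print(assign_variant("user-42", "checkout-button-test"))
```

With these hypothetical inputs, `sample_size_per_variant(5.0, 2.0)` returns 100, since 16 × 25 / 4 = 100. Halving the detectable difference quadruples the required sample, which is why small effects need large experiments.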

Cited By

  • (2024) Understanding and Mitigating Authority Bias in Business and Beyond. Overcoming Cognitive Biases in Strategic Management and Decision Making (57-72). DOI: 10.4018/979-8-3693-1766-2.ch004. Online publication date: 8-Mar-2024
  • (2024) A Novel Approach Using Non-Experts and Transformation Models to Predict the Performance of Experts in A/B Tests. Aerospace 11:7 (574). DOI: 10.3390/aerospace11070574. Online publication date: 12-Jul-2024
  • (2024) Evaluación de la experiencia del usuario (UX) en procesos gamificados: una revisión bibliográfica. Ingeniería y Competitividad 26:3. DOI: 10.25100/iyc.v26i3.13338. Online publication date: 2-Oct-2024

      Published In

      KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2007
      1080 pages
      ISBN:9781595936097
      DOI:10.1145/1281192

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. A/B testing
      2. controlled experiments
      3. e-commerce

      Qualifiers

      • Article

      Conference

      KDD07

      Acceptance Rates

      KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;
      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Cited By

      • (2024) Automating Pipelines of A/B Tests with Population Split Using Self-Adaptation and Machine Learning. Proceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (84-97). DOI: 10.1145/3643915.3644087. Online publication date: 15-Apr-2024
      • (2024) Evidence-Based Guidelines for Advancing Continuous Experimentation. IT Professional 26:5 (20-27). DOI: 10.1109/MITP.2024.3397541. Online publication date: Sep-2024
      • (2024) Grunt Attack: Exploiting Execution Dependencies in Microservices. 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (115-128). DOI: 10.1109/DSN58291.2024.00025. Online publication date: 24-Jun-2024
      • (2024) A Study of Response Time Instability of Microservices at High Resource Utilization in the Cloud. 2024 IEEE 6th International Conference on Cognitive Machine Intelligence (CogMI) (111-116). DOI: 10.1109/CogMI62246.2024.00024. Online publication date: 28-Oct-2024
      • (2024) Criteria definition for digital requirements using hesitant fuzzy linguistic terms sets: an application to the automotive industry. Annals of Operations Research. DOI: 10.1007/s10479-024-06449-9. Online publication date: 20-Dec-2024
      • (2024) Leveraging Website Analytics to Enhance User Experience with Pop-Ups and Drive Sales Conversions. Proceedings of 22nd International Conference on Informatics in Economy (IE 2023) (61-73). DOI: 10.1007/978-981-99-6529-8_6. Online publication date: 3-Feb-2024
      • (2024) Experimentieren in Unternehmen. Angewandte Psychologie für die Wirtschaft (225-241). DOI: 10.1007/978-3-662-68559-4_17. Online publication date: 1-Sep-2024
