DOI: 10.1145/2566486.2567967
research-article

Designing and deploying online field experiments

Published: 07 April 2014 Publication History

Abstract

Online experiments are widely used to compare specific design alternatives, but they can also be used to produce generalizable knowledge and inform strategic decision making. Doing so often requires sophisticated experimental designs, iterative refinement, and careful logging and analysis. Few tools exist that support these needs. We thus introduce a language for online field experiments called PlanOut. PlanOut separates experimental design from application code, allowing the experimenter to concisely describe experimental designs, whether common "A/B tests" and factorial designs, or more complex designs involving conditional logic or multiple experimental units. These latter designs are often useful for understanding causal mechanisms involved in user behaviors. We demonstrate how experiments from the literature can be implemented in PlanOut, and describe two large field experiments conducted on Facebook with PlanOut. For common scenarios in which experiments are run iteratively and in parallel, we introduce a namespaced management system that encourages sound experimental practice.
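
To illustrate the separation of experimental design from application code, here is a minimal Python sketch of a 2x2 factorial design with deterministic, hash-based assignment. It is not PlanOut's actual syntax or API: the names `uniform_choice` and `button_experiment`, the parameter names, and the SHA-1 hashing scheme are all assumptions made for this illustration.

```python
import hashlib


def uniform_choice(choices, unit, salt):
    """Deterministically pick one of `choices` for `unit`.

    Hypothetical helper: hashes the salt together with the unit id, so the
    same unit always receives the same value for a given parameter.
    """
    digest = hashlib.sha1(f"{salt}.{unit}".encode("utf-8")).hexdigest()
    return choices[int(digest[:15], 16) % len(choices)]


def button_experiment(userid):
    """Hypothetical 2x2 factorial design.

    Each parameter uses its own salt, so the two factors are assigned
    independently of each other.
    """
    return {
        "button_color": uniform_choice(["blue", "green"], userid, "button_color"),
        "button_text": uniform_choice(["Join", "Sign up"], userid, "button_text"),
    }


# Application code only reads parameters; it never decides who gets what.
params = button_experiment(userid=42)
print(params["button_color"], params["button_text"])
```

Because assignment is a pure function of the unit id and the parameter name, the design can be changed or analyzed without touching the rendering code, which is the kind of separation the abstract describes.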

Published In

WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014
926 pages
ISBN:9781450327442
DOI:10.1145/2566486

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. A/B testing
  2. methodology
  3. online experiments
  4. toolkits

Qualifiers

  • Research-article

Conference

WWW '14
Sponsor:
  • IW3C2

Acceptance Rates

WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
