Abstract
Social media are a promising new data source for real-world behavioral monitoring. Despite clear advantages, analyses of social media data face some challenges. In this paper, we seek to elucidate some of these challenges and draw relevant lessons from more traditional survey techniques. Beyond standard machine learning approaches, we argue that studies conducting statistical analyses of social media data should carefully consider elements of study design, and we provide behavioral examples throughout. Specifically, we focus on issues surrounding the validity of statistical conclusions that may be drawn from social media data. We discuss common pitfalls and techniques for avoiding them, so that researchers may mitigate potential problems of design.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Smith, M.C., Mazzuchi, T.A., Broniatowski, D.A. (2020). Validating Social Media Monitoring: Statistical Pitfalls and Opportunities from Public Opinion. In: Thomson, R., Bisgin, H., Dancy, C., Hyder, A., Hussain, M. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2020. Lecture Notes in Computer Science, vol 12268. Springer, Cham. https://doi.org/10.1007/978-3-030-61255-9_7
DOI: https://doi.org/10.1007/978-3-030-61255-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61254-2
Online ISBN: 978-3-030-61255-9
eBook Packages: Computer Science (R0)