research-article

Detecting adversarial advertisements in the wild

Authors: Matthew Eric Otey, Bridget Spitznagel, John Hainsworth, Yunkai Zhou

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 274 - 282

https://doi.org/10.1145/2020408.2020455

Published: 21 August 2011

Abstract

In a large online advertising system, adversaries may attempt to profit from the creation of low quality or harmful advertisements. In this paper, we present a large scale data mining effort that detects and blocks such adversarial advertisements for the benefit and safety of our users. Because both false positives and false negatives have high cost, our deployed system uses a tiered strategy combining automated and semi-automated methods to ensure reliable classification. We also employ strategies to address the challenges of learning from highly skewed data at scale, allocating the effort of human experts, leveraging domain expert knowledge, and independently assessing the effectiveness of our system.
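The abstract describes a tiered strategy that combines automated and semi-automated methods and allocates scarce human expert effort, but it does not specify the tiers. What follows is a minimal Python sketch of one way such a policy could work. It is not the authors' deployed system. It assumes a classifier that outputs an estimated probability that an ad is adversarial. The function `route_ads`, the thresholds `block_threshold` and `allow_threshold`, and `review_budget` are all hypothetical. Confident scores are decided automatically, and the uncertain middle band is queued for human experts, most uncertain first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Routing:
    """Buckets produced by a hypothetical three-tier review policy."""
    blocked: List[str] = field(default_factory=list)   # auto-blocked (automated tier)
    allowed: List[str] = field(default_factory=list)   # auto-allowed (automated tier)
    review: List[str] = field(default_factory=list)    # sent to human experts (semi-automated tier)
    deferred: List[str] = field(default_factory=list)  # uncertain, but over the review budget


def route_ads(
    scores: Dict[str, float],
    block_threshold: float = 0.95,  # hypothetical; high to keep false positives rare
    allow_threshold: float = 0.05,  # hypothetical; low to keep false negatives rare
    review_budget: int = 2,         # hypothetical number of expert reviews available
) -> Routing:
    """Route ads by model score, i.e. the estimated P(ad is adversarial)."""
    if not 0.0 <= allow_threshold < block_threshold <= 1.0:
        raise ValueError("require 0 <= allow_threshold < block_threshold <= 1")
    if review_budget < 0:
        raise ValueError("review_budget must be non-negative")

    result = Routing()
    uncertain: List[Tuple[float, str]] = []
    for ad_id, p in scores.items():
        if p >= block_threshold:
            result.blocked.append(ad_id)
        elif p <= allow_threshold:
            result.allowed.append(ad_id)
        else:
            # Distance from 0.5: a smaller value means the model is less sure.
            uncertain.append((abs(p - 0.5), ad_id))

    # Spend limited expert time on the most uncertain ads first.
    uncertain.sort()
    queue = [ad_id for _, ad_id in uncertain]
    result.review = queue[:review_budget]
    result.deferred = queue[review_budget:]
    return result


if __name__ == "__main__":
    demo = {"ad1": 0.99, "ad2": 0.02, "ad3": 0.55, "ad4": 0.30, "ad5": 0.80}
    print(route_ads(demo))
```

With the illustrative scores in `demo` and the default settings, `ad1` is blocked, `ad2` is allowed, `ad3` and `ad4` go to expert review, and `ad5` is deferred. Ordering the review queue by distance from 0.5 is a common uncertainty-sampling heuristic used here only for illustration. Because both false positives and false negatives are costly, the two thresholds would presumably be tuned against the measured precision of each automated tier.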


Cited By

Ali M, Goetzen A, Mislove A, Redmiles E, Sapiezynski P, Calandrino J, Troncoso C (2023). Problematic advertising and its disparate exposure on facebook. Proceedings of the 32nd USENIX Conference on Security Symposium, pp. 5665-5682. DOI: 10.5555/3620237.3620554. Online publication date: 9-Aug-2023.
https://dl.acm.org/doi/10.5555/3620237.3620554

Alsamman A, Schmitz A, Wimmer M (2023). Towards an Organically Growing Hate Speech Dataset in Hate Speech Detection Systems in a Smart Mobility Application. Proceedings of the 24th Annual International Conference on Digital Government Research, pp. 36-43. DOI: 10.1145/3598469.3598473. Online publication date: 11-Jul-2023.
https://dl.acm.org/doi/10.1145/3598469.3598473

Gampa P, Valsangkar A, Choubey S, A P (2023). Prioritised Moderation for Online Advertising. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2004-2012. DOI: 10.1109/CVPRW59228.2023.00194. Online publication date: Jun-2023.
https://doi.org/10.1109/CVPRW59228.2023.00194

Index Terms

1. Computing methodologies
  1. Machine learning


Published In


KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2011

1446 pages

ISBN:9781450308137

DOI:10.1145/2020408

General Chair: Chid Apte (IBM Research)

Program Chairs: Joydeep Ghosh (UT Austin), Padhraic Smyth (UC Irvine)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Conference

KDD '11

KDD '11: The 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 21-24, 2011

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


Bibliometrics & Citations

Article Metrics

Total Citations: 45

Total Downloads: 787

Downloads (last 12 months): 27

Downloads (last 6 weeks): 1

Reflects downloads up to 30 Aug 2024

Citations

Nahar N, Zhang H, Lewis G, Zhou S, Kästner C (2023). A Meta-Summary of Challenges in Building Products with ML Components – Collecting Experiences from 4758+ Practitioners. 2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN), pp. 171-183. DOI: 10.1109/CAIN58948.2023.00034. Online publication date: May-2023.
https://doi.org/10.1109/CAIN58948.2023.00034

Kaminwar S, Goschenhofer J, Thomas J, Thon I, Bischl B (2023). Structured Verification of Machine Learning Models in Industrial Settings. Big Data, 11(3), pp. 181-198. DOI: 10.1089/big.2021.0112. Online publication date: 1-Jun-2023.
https://doi.org/10.1089/big.2021.0112

Imam N, Vassilakis V (2023). Towards an Adversary-Aware ML-Based Detector of Spam on Twitter Hashtags. Proceedings of Eighth International Congress on Information and Communication Technology, pp. 401-413. DOI: 10.1007/978-981-99-3243-6_32. Online publication date: 25-Jul-2023.
https://doi.org/10.1007/978-981-99-3243-6_32

Guo C, Chiesa P, de Moor C, Fazeli M, Schofield T, Hofer K, Belachew S, Scotland A (2022). Digital Devices for Assessing Motor Functions in Mobility-Impaired and Healthy Populations: Systematic Literature Review. Journal of Medical Internet Research, 24(11), e37683. DOI: 10.2196/37683. Online publication date: 21-Nov-2022.
https://doi.org/10.2196/37683

Paul I, Negi S, Al Hasan M, Xiong L (2022). Sub-Task Imputation via Self-Labelling to Train Image Moderation Models on Sparse Noisy Data. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 3461-3471. DOI: 10.1145/3511808.3557149. Online publication date: 17-Oct-2022.
https://dl.acm.org/doi/10.1145/3511808.3557149

Malialis K, Papatheodoulou D, Filippou S, Panayiotou C, Polycarpou M (2022). Data Augmentation On-the-fly and Active Learning in Data Stream Classification. 2022 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1408-1414. DOI: 10.1109/SSCI51031.2022.10022133. Online publication date: 4-Dec-2022.
https://doi.org/10.1109/SSCI51031.2022.10022133

Ongun T, Stokes J, Or J, Tian K, Tajaddodianfar F, Neil J, Seifert C, Oprea A, Platt J, Bilge L, Dumitras T (2021). Living-Off-The-Land Command Detection Using Active Learning. Proceedings of the 24th International Symposium on Research in Attacks, Intrusions and Defenses, pp. 442-455. DOI: 10.1145/3471621.3471858. Online publication date: 6-Oct-2021.
https://dl.acm.org/doi/10.1145/3471621.3471858
