DOI: 10.1145/2020408.2020455
research-article

Detecting adversarial advertisements in the wild

Published: 21 August 2011

Abstract

In a large online advertising system, adversaries may attempt to profit from the creation of low-quality or harmful advertisements. In this paper, we present a large-scale data mining effort that detects and blocks such adversarial advertisements for the benefit and safety of our users. Because both false positives and false negatives have high cost, our deployed system uses a tiered strategy combining automated and semi-automated methods to ensure reliable classification. We also employ strategies to address the challenges of learning from highly skewed data at scale, allocating the effort of human experts, leveraging domain expert knowledge, and independently assessing the effectiveness of our system.
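
As a rough illustration of what a tiered strategy can look like, the sketch below routes an ad by a classifier score: confident scores are handled automatically and uncertain ones go to a human reviewer. This is only an assumption about one possible design, not the system described in the paper; the function name `route`, the tier labels, and both threshold values are hypothetical.

```python
# Hypothetical thresholds, chosen for illustration only; not from the paper.
BLOCK_THRESHOLD = 0.95   # at or above this score, block automatically
ALLOW_THRESHOLD = 0.05   # at or below this score, allow automatically


def route(score: float) -> str:
    """Map a classifier score in [0, 1] to one of three hypothetical tiers."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score <= ALLOW_THRESHOLD:
        return "allow"
    # Uncertain cases are deferred to a human expert (the semi-automated tier).
    return "human_review"
```

Widening the gap between the two thresholds sends more ads to human review, which trades reviewer effort for fewer automated false positives and false negatives.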


    Published In

    KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2011
    1446 pages
    ISBN:9781450308137
    DOI:10.1145/2020408

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adversarial learning
    2. data mining
    3. online advertisement

    Qualifiers

    • Research-article

    Conference

    KDD '11

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
