research-article

Using co-visitation networks for detecting large scale online display advertising exchange fraud

Authors:

Foster Provost

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 1240 - 1248

https://doi.org/10.1145/2487575.2488207

Published: 11 August 2013

Abstract

Data generated by observing the actions of web browsers across the internet is being used at an ever-increasing rate for both building models and making decisions. In fact, a quarter of the industry-track papers for KDD in 2012 were based on data generated by online actions. The models, analytics and decisions they inform all stem from the assumption that observed data captures the intent of users. However, a large portion of these observed actions are not intentional, and are effectively polluting the models. Much of this observed activity is either generated by robots traversing the internet or the result of unintended actions of real users. These non-intentional actions observed in the web logs severely bias both analytics and the models created from the data. In this paper, we will show examples of how non-intentional traffic that is produced by fraudulent activities adversely affects both general analytics and predictive models, and propose an approach using co-visitation networks to identify sites that have large amounts of this fraudulent traffic. We will then show how this approach, along with a second-stage classifier that identifies non-intentional traffic at the browser level, is deployed in production at Media6Degrees (m6d), a targeting technology company for display advertising. This deployed product acts both to filter out the fraudulent traffic from the input data and to ensure that we don't serve ads during unintended website visits.
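
The abstract does not spell out how a co-visitation network is built, so the following is only a rough sketch of one plausible reading: sites are nodes, and a directed edge runs from site A to site B when a large share of the browsers seen on A were also seen on B. The function names, the `min_share` threshold, and the toy visit log are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import permutations


def build_covisitation_edges(visits, min_share=0.5):
    """Return directed edges (a, b) between sites.

    visits: iterable of (browser_id, site) pairs.
    An edge (a, b) is added when at least `min_share` of the browsers
    observed on site a were also observed on site b. The threshold is
    an illustrative assumption, not a value from the paper.
    """
    browsers_by_site = defaultdict(set)
    for browser_id, site in visits:
        browsers_by_site[site].add(browser_id)

    edges = set()
    for a, b in permutations(browsers_by_site, 2):
        shared = browsers_by_site[a] & browsers_by_site[b]
        if len(shared) / len(browsers_by_site[a]) >= min_share:
            edges.add((a, b))
    return edges


def out_degree(edges):
    """Count outgoing edges per site; sites with no edges are omitted."""
    degree = defaultdict(int)
    for a, _ in edges:
        degree[a] += 1
    return dict(degree)


# Toy log (hypothetical): two ordinary sites with disjoint audiences and
# three sites that share exactly the same two browsers.
visits = [
    ("b1", "news.example"), ("b2", "news.example"), ("b3", "news.example"),
    ("b4", "shop.example"), ("b5", "shop.example"),
    ("bot1", "x.example"), ("bot1", "y.example"), ("bot1", "z.example"),
    ("bot2", "x.example"), ("bot2", "y.example"), ("bot2", "z.example"),
]

edges = build_covisitation_edges(visits)
degrees = out_degree(edges)
```

In this toy log the three sites sharing the same browsers end up densely linked to one another while the two ordinary sites have no edges, which is the kind of structural signal a site-level filter could look for. How the deployed system defines edges, thresholds, and flags sites is not stated in the abstract.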

Index Terms

Using co-visitation networks for detecting large scale online display advertising exchange fraud
1. Computing methodologies
  1. Modeling and simulation

Published In

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2013

1534 pages

ISBN:9781450321747

DOI:10.1145/2487575

Editors:
Rayid Ghani (University of Chicago), Ted E. Senator (SAIC), Paul Bradley (MethodCare, Inc.), Rajesh Parekh (Groupon), Jingrui He (Stevens Institute of Technology)

General Chairs:
Robert L. Grossman (University of Chicago and Open Data Group), Ramasamy Uthurusamy (General Motors Corporation, retired)

Program Chairs:
Inderjit S. Dhillon (University of Texas), Yehuda Koren (Google)

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '13: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 11 - 14, 2013

Chicago, Illinois, USA

Acceptance Rates

KDD '13 Paper Acceptance Rate 125 of 726 submissions, 17%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
