DOI: 10.5555/2887367.2887371
Article

Spamming in linked data

Published: 12 November 2012

    Abstract

    The rapidly growing commercial interest in Linked Data raises the prospect of "Linked Data spam", which we define as "deliberately misleading information (data and links) published as Linked Data, with the goal of creating financial gain for the publisher". Unlike spammers targeting conventional technologies such as email and blogs, spammers targeting Linked Data may not be able to push information directly towards consumers; instead, they may seek to exploit the lack of human involvement in the automated data integration processes performed by applications consuming Linked Data. This paper aims to lay a foundation for future work on Linked Data spam through the following contributions: i) a formal definition of spamming in Linked Data; ii) a classification of potential spamming techniques; iii) a sample dataset demonstrating these techniques, for use in evaluating anti-spamming mechanisms; iv) preliminary recommendations for anti-spamming strategies.

    Published In

    COLD'12: Proceedings of the Third International Conference on Consuming Linked Data - Volume 905
    November 2012
    135 pages
    Editors: Juan F. Sequeda, Andreas Harth, Olaf Hartig

    Publisher

    CEUR-WS.org

    Aachen, Germany

    Author Tags

    1. linked data
    2. spam
    3. spam vectors

    Qualifiers

    • Article
