DOI: 10.1145/2381896.2381902
research-article

An information theoretic framework for web inference detection

Published: 19 October 2012

Abstract

Document redaction is widely used to protect sensitive information in published documents. In a basic redaction system, sensitive and identifying terms are removed from the document. Web-based inference is an attack on redaction systems in which the redacted document is linked with other publicly available documents to infer the removed parts. Web-based inference also provides a way to detect unwanted inferences and thus to construct secure redaction systems. Previous work on web-based inference used general keyword extraction methods for document representation. We propose a systematic approach, based on information-theoretic concepts and measures, to rank the words in a document for the purpose of inference detection. We extend our results to the case of multiple sensitive words and propose a metric that takes into account possible relationships among the sensitive words, resulting in an effective and efficient inference detection system.
Through a number of experiments, we show that our approach, when used for document redaction, substantially reduces the number of inferences left in a document. We describe our approach, present the experimental results, and outline future work.
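The abstract does not state the paper's actual ranking measure. As a minimal sketch of the general idea only, assuming pointwise mutual information (PMI) as the information-theoretic measure and a small hypothetical corpus standing in for web search results, the words left in a redacted document can be ranked by how strongly they co-occur with a removed sensitive term. The names `pmi`, `rank_by_pmi`, and `TOY_CORPUS`, and the example words, are illustrative assumptions, not the authors' method or data.

```python
import math

# Hypothetical toy corpus standing in for "publicly available documents";
# each document is reduced to the set of words it contains.
TOY_CORPUS: list[set[str]] = [
    {"hiv", "antiretroviral", "clinic"},
    {"hiv", "antiretroviral", "diagnosis"},
    {"clinic", "appointment", "flu"},
    {"diagnosis", "flu", "fever"},
]


def pmi(word: str, sensitive: str, corpus: list[set[str]]) -> float:
    """Pointwise mutual information, in bits, between the events
    'a document contains `word`' and 'a document contains `sensitive`'.

    PMI = log2(P(w, s) / (P(w) P(s))). With probabilities estimated as
    document frequencies this simplifies to log2(c_ws * n / (c_w * c_s)).
    """
    n = len(corpus)
    c_w = sum(word in doc for doc in corpus)
    c_s = sum(sensitive in doc for doc in corpus)
    c_ws = sum(word in doc and sensitive in doc for doc in corpus)
    if c_ws == 0:
        # The words never co-occur (this also covers c_w == 0 or c_s == 0),
        # so the corpus gives no evidence linking them.
        return float("-inf")
    return math.log2(c_ws * n / (c_w * c_s))


def rank_by_pmi(candidates: list[str], sensitive: str,
                corpus: list[set[str]]) -> list[tuple[str, float]]:
    """Rank words remaining in a redacted document by PMI with the removed term.

    Highest scores come first: these words are most strongly associated with
    the sensitive term and so are the most likely to let it be inferred.
    Ties keep their input order because Python's sort is stable.
    """
    scored = [(w, pmi(w, sensitive, corpus)) for w in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    remaining = ["antiretroviral", "clinic", "diagnosis", "flu"]
    for word, score in rank_by_pmi(remaining, "hiv", TOY_CORPUS):
        print(f"{word:15s} {score:+.2f}")
```

On this toy corpus, "antiretroviral" (which appears only alongside "hiv") scores 1 bit and ranks first, while "flu", which never co-occurs with "hiv", scores -inf. A redaction tool built on this idea might flag every word above some threshold for removal; the paper's metric for multiple, related sensitive words is not reproduced here.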

Published In

AISec '12: Proceedings of the 5th ACM workshop on Security and artificial intelligence
October 2012
116 pages
ISBN: 9781450316644
DOI: 10.1145/2381896
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. document redaction
2. inference detection
3. information theory
4. web-based inference detection

Conference

CCS '12: the ACM Conference on Computer and Communications Security
October 19, 2012
Raleigh, North Carolina, USA

Acceptance Rates

AISec '12 paper acceptance rate: 10 of 24 submissions (42%)
Overall acceptance rate: 94 of 231 submissions (41%)

Article Metrics

• Downloads (last 12 months): 1
• Downloads (last 6 weeks): 0
Reflects downloads up to 11 Jan 2025
