research-article

An information theoretic framework for web inference detection

Authors:

Reihaneh Safavi-Naini

AISec '12: Proceedings of the 5th ACM workshop on Security and artificial intelligence

Pages 25 - 36

https://doi.org/10.1145/2381896.2381902

Published: 19 October 2012

Abstract

Document redaction is widely used to protect sensitive information in published documents. In a basic redaction system, sensitive and identifying terms are removed from the document. Web-based inference is an attack on redaction systems in which the redacted document is linked with other publicly available documents to infer the removed parts. Web-based inference also provides a way to detect unwanted inferences and thus to construct secure redaction systems. Previous work on web-based inference used general keyword extraction methods for document representation. We propose a systematic approach, based on information-theoretic concepts and measures, to rank the words in a document for the purpose of inference detection. We extend our results to the case of multiple sensitive words and propose a metric that takes into account possible relationships among the sensitive words and results in an effective and efficient inference detection system.

Using a number of experiments, we show that our approach, when used for document redaction, substantially reduces the number of inferences that are left in a document. We describe our approach, present the experimental results, and outline future work.
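The abstract does not state which information-theoretic measure is used to rank words, so the following is only an illustrative sketch and not the authors' method. It assumes pointwise mutual information (PMI) between each document word and a single sensitive term, estimated from document counts such as search-engine hit counts. The function names `pmi` and `rank_words` and all counts are hypothetical.

```python
import math


def pmi(hits_word, hits_sensitive, hits_both, total_docs):
    """Pointwise mutual information (in bits) between a word and a
    sensitive term, estimated from document counts.

    Assumption: PMI stands in for the paper's measure, which the
    abstract does not specify.
    """
    if hits_word == 0 or hits_sensitive == 0 or hits_both == 0:
        return float("-inf")  # no observed co-occurrence
    p_word = hits_word / total_docs
    p_sensitive = hits_sensitive / total_docs
    p_both = hits_both / total_docs
    return math.log2(p_both / (p_word * p_sensitive))


def rank_words(counts, hits_sensitive, total_docs):
    """Rank words by decreasing PMI with the sensitive term.

    counts maps each word to (hits_word, hits_both).
    """
    scores = {
        word: pmi(hits_word, hits_sensitive, hits_both, total_docs)
        for word, (hits_word, hits_both) in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up counts, for illustration only.
counts = {
    "insulin": (1_000, 400),
    "clinic": (5_000, 200),
    "the": (900_000, 900),
}
ranking = rank_words(counts, hits_sensitive=1_000, total_docs=1_000_000)
print(ranking)
```

Under this assumption, words that co-occur with the sensitive term far more often than chance rank first and would be the first candidates for redaction, while a common word such as "the" scores near zero.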

Cited By

Hai-Jew S. (2018). In Plaintext. The Dark Web, pp. 255-289. https://doi.org/10.4018/978-1-5225-3163-0.ch012
Hai-Jew S. (2014). In Plaintext. Remote Workforce Training, pp. 231-264. https://doi.org/10.4018/978-1-4666-5137-1.ch011

Index Terms

An information theoretic framework for web inference detection
1. Information systems
  1. Information systems applications
2. Mathematics of computing
  1. Information theory

Published In

AISec '12: Proceedings of the 5th ACM workshop on Security and artificial intelligence

October 2012

116 pages

ISBN: 9781450316644

DOI: 10.1145/2381896

General Chair:
Ting Yu, North Carolina State University, USA

Program Chairs:
V. N. Venkatakrishnan, University of Illinois at Chicago, USA
Apu Kapadia, Indiana University, Bloomington, USA

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CCS'12

Sponsor:

SIGSAC

CCS'12: the ACM Conference on Computer and Communications Security

October 19, 2012

Raleigh, North Carolina, USA

Acceptance Rates

AISec '12 Paper Acceptance Rate: 10 of 24 submissions, 42%

Overall Acceptance Rate: 94 of 231 submissions, 41%
