DOI: 10.1145/1572532.1572550
research-article

Sanitization's slippery slope: the design and study of a text revision assistant

Published: 15 July 2009

Abstract

For privacy reasons, sensitive content may be revised before it is released. The revision often consists of redaction, that is, the "blacking out" of sensitive words and phrases. Redaction has the side effect of reducing the utility of the content, often so much that the content is no longer useful. Consequently, government agencies and others are increasingly exploring the revision of sensitive content as an alternative to redaction that preserves more content utility. We call this practice sanitization. In a sanitized document, names might be replaced with pseudonyms and sensitive attributes might be replaced with hypernyms. Sanitization adds to redaction the challenge of determining which words and phrases reduce the sensitivity of content. We have designed and developed a tool to assist users in sanitizing sensitive content. Our tool leverages the Web to automatically identify sensitive words and phrases and to quickly evaluate revisions for sensitivity. The tool, however, does not identify all sensitive terms, and it mistakenly marks some innocuous terms as sensitive. This is unavoidable because of the difficulty of the underlying inference problem, and it is the main reason we have designed a sanitization assistant as opposed to a fully automated tool. We have conducted a small study of our tool in which users sanitize biographies of celebrities to hide the celebrity's identity, both with and without our tool. The user study suggests that while the tool is very valuable in encouraging users to preserve content utility and can preserve privacy, this usefulness and apparent authoritativeness may lead to a "slippery slope" in which users neglect their own judgment in favor of the tool's.
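
To make the distinction between redaction and sanitization concrete, the following is a minimal Python sketch, not the authors' tool. The lookup tables `PSEUDONYMS` and `HYPERNYMS`, the helper `_substitute`, and the sample sentence are all hypothetical; the tool described in the abstract identifies sensitive terms using the Web, whereas this sketch simply assumes they are already known.

```python
import re

# Hypothetical, hand-written lookup tables (assumptions for illustration).
# The abstract's tool finds sensitive terms via the Web; here they are given.
PSEUDONYMS = {"Jane Roe": "Person A"}
HYPERNYMS = {"cardiologist": "physician", "Palo Alto": "a California city"}


def _substitute(text, mapping):
    """Replace each whole-phrase key of `mapping` with its value.

    Longer phrases are handled first so that a short key cannot clobber
    part of a longer one. Replacements are applied sequentially, so the
    values should not themselves contain any key.
    """
    for phrase in sorted(mapping, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        replacement = mapping[phrase]
        # A function replacement avoids backslash-escape handling in re.sub.
        text = re.sub(pattern, lambda match, r=replacement: r, text)
    return text


def redact(text, sensitive_phrases):
    """Redaction: black out every sensitive phrase."""
    return _substitute(text, {p: "[REDACTED]" for p in sensitive_phrases})


def sanitize(text):
    """Sanitization: names become pseudonyms, attributes become hypernyms."""
    return _substitute(text, {**PSEUDONYMS, **HYPERNYMS})


if __name__ == "__main__":
    sample = "Jane Roe is a cardiologist in Palo Alto."
    print(redact(sample, list(PSEUDONYMS) + list(HYPERNYMS)))
    print(sanitize(sample))
```

On the sample sentence, the redacted version keeps only the sentence skeleton, while the sanitized version still says that a person practices medicine in California. That gap is the utility that sanitization aims to preserve; whether the more general terms still allow the identity to be inferred is the harder question the abstract raises.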




    Published In

    SOUPS '09: Proceedings of the 5th Symposium on Usable Privacy and Security
    July 2009
    205 pages
    ISBN:9781605587363
    DOI:10.1145/1572532

    Sponsors

    • Carnegie Mellon CyLab
    • Google Inc.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data loss prevention
    2. inference detection
    3. privacy
    4. redaction
    5. sanitization

    Qualifiers

    • Research-article

    Conference

    SOUPS '09: Symposium on Usable Privacy and Security
    July 15-17, 2009
    Mountain View, California, USA

    Acceptance Rates

    SOUPS '09 Paper Acceptance Rate 15 of 49 submissions, 31%;
    Overall Acceptance Rate 15 of 49 submissions, 31%

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 2
    Download counts are as of 03 Oct 2024.

    Cited By

    • (2023) Semantic Attack on Disassociated Transaction Data. SN Computer Science 4:4. DOI: 10.1007/s42979-023-01781-6. Online publication date: 20-Apr-2023
    • (2018) Connecting Pixels to Privacy and Utility: Automatic Redaction of Private Information in Images. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (8466-8475). DOI: 10.1109/CVPR.2018.00883. Online publication date: Jun-2018
    • (2018) Utility-preserving privacy protection of textual healthcare documents. Journal of Biomedical Informatics 52:C (189-198). DOI: 10.1016/j.jbi.2014.06.008. Online publication date: 27-Dec-2018
    • (2015) Rethinking Privacy for Extended Sanitizable Signatures and a Black-Box Construction of Strongly Private Schemes. Proceedings of the 9th International Conference on Provable Security - Volume 9451 (455-474). DOI: 10.1007/978-3-319-26059-4_25. Online publication date: 24-Nov-2015
    • (2014) Privacy Detective. Proceedings of the 13th Workshop on Privacy in the Electronic Society (35-46). DOI: 10.1145/2665943.2665958. Online publication date: 3-Nov-2014
    • (2014) Privacy protection of textual medical documents. 2014 IEEE Network Operations and Management Symposium (NOMS) (1-6). DOI: 10.1109/NOMS.2014.6838361. Online publication date: May-2014
    • (2014) Utility-preserving sanitization of semantically correlated terms in textual documents. Information Sciences 279 (77-93). DOI: 10.1016/j.ins.2014.03.103. Online publication date: Sep-2014
    • (2012) An information theoretic framework for web inference detection. Proceedings of the 5th ACM workshop on Security and artificial intelligence (25-36). DOI: 10.1145/2381896.2381902. Online publication date: 19-Oct-2012
    • (2010) Inference control to protect sensitive information in text documents. ACM SIGKDD Workshop on Intelligence and Security Informatics (1-7). DOI: 10.1145/1938606.1938611. Online publication date: 25-Jul-2010
    • (2009) The Rules of Redaction. IEEE Security and Privacy 7:6 (46-53). DOI: 10.1109/MSP.2009.183. Online publication date: 1-Nov-2009
