DOI: 10.1145/3397271.3401267

Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review

Published: 25 July 2020

Abstract

Active learning strategies are often deployed in technology-assisted review tasks, such as e-discovery and sensitivity review, to learn a classifier that can assist the reviewers with their task. In particular, an active learning strategy selects the documents that are expected to be the most useful for learning an effective classifier, so that these documents can be reviewed before the less useful ones. However, when reviewing for sensitivity, the order in which the documents are reviewed can impact the reviewers' ability to perform the review. Therefore, when deploying active learning in technology-assisted sensitivity review, we want to know when a sufficiently effective classifier has been learned, so that the active learning can stop and the reviewing order of the documents can be selected by the reviewer instead of the classifier. In this work, we propose two active learning stopping strategies for technology-assisted sensitivity review. We evaluate the effectiveness of our proposed approaches against three state-of-the-art stopping strategies from the literature. We show that our best-performing approach results in a significantly more effective sensitivity classifier (+6.6% F2) than the best-performing stopping strategy from the literature (McNemar's test, p < 0.05).
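To make the setting concrete, below is a minimal sketch of how a pool-based active learning loop with a stopping check could look. It is illustrative only, not the strategy proposed in this paper: uncertainty sampling (Lewis and Gale, 1994) stands in for the document-selection strategy, and an agreement check over a fixed, unlabelled stop set, in the spirit of Bloodgood and Vijay-Shanker's stabilizing-predictions criterion, stands in for the stopping strategy. All function and parameter names are hypothetical. For reference, F2 is the F-measure that weights recall twice as heavily as precision, F2 = 5PR / (4P + R), and McNemar's test assesses significance from the paired documents on which the two compared classifiers disagree.

```python
# Illustrative sketch only, not the method from this paper: a pool-based
# active learning loop that stops once predictions stabilise. Assumes
# numpy and scikit-learn, binary labels (sensitive / not sensitive), and
# a seed set that contains at least one document of each class.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_review_loop(X_pool, oracle, seed_idx, X_stop,
                       batch=10, agree=0.99):
    """Query the least-confident documents for reviewer labels; stop
    when two successive classifiers agree on >= `agree` of a fixed,
    unlabelled stop set (a stabilizing-predictions-style check)."""
    labels = {i: oracle(i) for i in seed_idx}   # reviewer judgments so far
    prev, clf = None, None
    while len(labels) < len(X_pool):
        idx = sorted(labels)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_pool[idx], [labels[i] for i in idx])
        cur = clf.predict(X_stop)               # predictions on the stop set
        if prev is not None and np.mean(cur == prev) >= agree:
            # Classifier has stabilised: stop active learning and hand
            # the reviewing order back to the reviewer.
            break
        prev = cur
        # Uncertainty sampling: among the documents not yet reviewed,
        # pick those closest to the decision boundary.
        rest = [i for i in range(len(X_pool)) if i not in labels]
        margin = np.abs(clf.predict_proba(X_pool[rest])[:, 1] - 0.5)
        for k in np.argsort(margin)[:batch]:
            labels[rest[k]] = oracle(rest[k])   # reviewer labels the query
    return clf
```

The threshold `agree` and the choice of stop set govern the trade-off the paper studies: stopping too early leaves an under-trained sensitivity classifier, while stopping too late keeps the reviewing order under the classifier's control for longer than necessary.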

Supplementary Material

MP4 File (3397271.3401267.mp4)
Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review, McDonald et al.



Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN: 9781450380164
DOI: 10.1145/3397271
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2020


Author Tags

  1. active learning
  2. classification
  3. sensitive information
  4. sensitivity classification
  5. technology-assisted sensitivity review

Qualifiers

  • Short-paper

Conference

SIGIR '20

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)


Article Metrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 8

Reflects downloads up to 15 Oct 2024.

Cited By
  • (2022) Unlocking digital archives: cross-disciplinary perspectives on AI and born-digital data. AI & Society 37(3), 823-835. DOI: 10.1007/s00146-021-01367-x. Online publication date: 1 Sep 2022.
  • (2021) On minimizing cost in legal document review workflows. In Proceedings of the 21st ACM Symposium on Document Engineering, 1-10. DOI: 10.1145/3469096.3469872. Online publication date: 16 Aug 2021.
  • (2021) Certifying One-Phase Technology-Assisted Reviews. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 893-902. DOI: 10.1145/3459637.3482415. Online publication date: 26 Oct 2021.
  • (2021) A framework for technology-assisted sensitivity review. ACM SIGIR Forum 53(1), 42-43. DOI: 10.1145/3458537.3458544. Online publication date: 23 Mar 2021.
