DOI: 10.1145/3397271.3401267

Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review

Published: 25 July 2020

Abstract

Active learning strategies are often deployed in technology-assisted review tasks, such as e-discovery and sensitivity review, to learn a classifier that can assist the reviewers with their task. In particular, an active learning strategy selects the documents that are expected to be the most useful for learning an effective classifier, so that these documents can be reviewed before the less useful ones. However, when reviewing for sensitivity, the order in which the documents are reviewed can impact the reviewers' ability to perform the review. Therefore, when deploying active learning in technology-assisted sensitivity review, we want to know when a sufficiently effective classifier has been learned, so that the active learning can stop and the reviewing order of the documents can be selected by the reviewer instead of the classifier. In this work, we propose two active learning stopping strategies for technology-assisted sensitivity review. We evaluate the effectiveness of our proposed approaches against three state-of-the-art stopping strategies from the literature. We show that our best-performing approach results in a significantly more effective sensitivity classifier (+6.6% F2) than the best-performing stopping strategy from the literature (McNemar's test, p < 0.05).
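To make the setting concrete, below is a minimal sketch of how a pool-based active learning loop with a stopping check could look. It is illustrative only, not the strategy proposed in this paper: uncertainty sampling (Lewis and Gale, 1994) stands in for the document-selection strategy, and an agreement check over a fixed, unlabelled stop set, in the spirit of Bloodgood and Vijay-Shanker's stabilizing-predictions criterion, stands in for the stopping strategy. All function and parameter names are hypothetical. For reference, F2 is the F-measure that weights recall twice as heavily as precision, F2 = 5PR / (4P + R), and McNemar's test assesses significance from the paired documents on which the two compared classifiers disagree.

```python
# Illustrative sketch only, not the method from this paper: a pool-based
# active learning loop that stops once predictions stabilise. Assumes
# numpy and scikit-learn, binary labels (sensitive / not sensitive), and
# a seed set that contains at least one document of each class.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_review_loop(X_pool, oracle, seed_idx, X_stop,
                       batch=10, agree=0.99):
    """Query the least-confident documents for reviewer labels; stop
    when two successive classifiers agree on >= `agree` of a fixed,
    unlabelled stop set (a stabilizing-predictions-style check)."""
    labels = {i: oracle(i) for i in seed_idx}   # reviewer judgments so far
    prev, clf = None, None
    while len(labels) < len(X_pool):
        idx = sorted(labels)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_pool[idx], [labels[i] for i in idx])
        cur = clf.predict(X_stop)               # predictions on the stop set
        if prev is not None and np.mean(cur == prev) >= agree:
            # Classifier has stabilised: stop active learning and hand
            # the reviewing order back to the reviewer.
            break
        prev = cur
        # Uncertainty sampling: among the documents not yet reviewed,
        # pick those closest to the decision boundary.
        rest = [i for i in range(len(X_pool)) if i not in labels]
        margin = np.abs(clf.predict_proba(X_pool[rest])[:, 1] - 0.5)
        for k in np.argsort(margin)[:batch]:
            labels[rest[k]] = oracle(rest[k])   # reviewer labels the query
    return clf
```

The threshold `agree` and the choice of stop set govern the trade-off the paper studies: stopping too early leaves an under-trained sensitivity classifier, while stopping too late keeps the reviewing order under the classifier's control for longer than necessary.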

Supplementary Material

MP4 File (3397271.3401267.mp4)
Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review, McDonald et al.



Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN: 9781450380164
DOI: 10.1145/3397271
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2020


Author Tags

  1. active learning
  2. classification
  3. sensitive information
  4. sensitivity classification
  5. technology-assisted sensitivity review

Qualifiers

  • Short-paper

Conference

SIGIR '20

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)


Article Metrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 8

Reflects downloads up to 15 Oct 2024.

Cited By
  • (2022) Unlocking digital archives: cross-disciplinary perspectives on AI and born-digital data. AI & Society 37(3), 823-835. DOI: 10.1007/s00146-021-01367-x. Online publication date: 1 Sep 2022.
  • (2021) On minimizing cost in legal document review workflows. In Proceedings of the 21st ACM Symposium on Document Engineering, 1-10. DOI: 10.1145/3469096.3469872. Online publication date: 16 Aug 2021.
  • (2021) Certifying One-Phase Technology-Assisted Reviews. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 893-902. DOI: 10.1145/3459637.3482415. Online publication date: 26 Oct 2021.
  • (2021) A framework for technology-assisted sensitivity review. ACM SIGIR Forum 53(1), 42-43. DOI: 10.1145/3458537.3458544. Online publication date: 23 Mar 2021.
