research-article

Contrastive Learning for Legal Judgment Prediction

Authors:

Ji-Rong Wen

ACM Transactions on Information Systems, Volume 41, Issue 4

Article No.: 113, Pages 1 - 25

https://doi.org/10.1145/3580489

Published: 21 April 2023

Abstract

Legal judgment prediction (LJP) is a fundamental task of legal artificial intelligence. It aims to automatically predict the judgment results of legal cases. Its three typical subtasks are relevant law article prediction, charge prediction, and term-of-penalty prediction. Because of its wide range of potential applications, LJP has attracted a great deal of interest, prompting the development of numerous approaches. These methods mainly focus on building a more accurate representation of a case’s fact description in order to improve the performance of judgment prediction. They overlook, however, the practical judicial scenario in which human judges often compare similar law articles or possible charges before making a final decision. To this end, we propose a supervised contrastive learning framework for the LJP task. Specifically, we train the model to distinguish (1) different law articles within the same chapter of a law and (2) similar charges under the same law article or related law articles. In this way, the model can capture the fine-grained differences between similar articles/charges, which are important for making a judgment. In addition, we optimize our model by identifying cases with the same article/charge labels, allowing it to more effectively model the relationship between a case’s fact description and its associated labels. By jointly learning the LJP task with these contrastive learning tasks, our model achieves better performance than the state-of-the-art models on two real-world datasets.
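The abstract does not give the loss function, so the following is only a minimal sketch of the general idea, assuming the widely used supervised contrastive formulation (Khosla et al., 2020) with in-batch negatives. Cases that share an article/charge label are treated as positives and all other cases in the batch as negatives. The function names, the temperature, the loss weight, and the toy embeddings are all hypothetical and are not taken from the paper. In particular, the chapter-based selection of similar articles and charges described above is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    embeddings: one vector per case (assumed output of some fact encoder)
    labels:     one article/charge label per case
    Returns 0.0 when no anchor in the batch has a same-label partner.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other case.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor cases.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        # Average negative log-probability assigned to the positives.
        losses.append(
            -sum(logits[j] - log_denom for j in positives) / len(positives)
        )
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(prediction_loss, contrastive_loss, weight=0.5):
    """Weighted sum of the LJP loss and the contrastive loss (weight is assumed)."""
    return prediction_loss + weight * contrastive_loss


# Toy batch: two well-separated clusters of case embeddings.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_labels = ["theft", "theft", "robbery", "robbery"]
mixed_labels = ["theft", "robbery", "theft", "robbery"]

aligned = supervised_contrastive_loss(toy_embeddings, aligned_labels)
mixed = supervised_contrastive_loss(toy_embeddings, mixed_labels)
print(f"aligned labels: {aligned:.4f}, mixed labels: {mixed:.4f}")
```

When the labels agree with the geometry of the embeddings, the loss is close to zero. When same-label cases sit far apart, it is large, and this is the signal that pushes the encoder to pull same-label cases together and separate confusable ones.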

Cited By

Peng Y, Lei C (2024). Using Bidirectional Encoder Representations from Transformers (BERT) to predict criminal charges and sentences from Taiwanese court judgments. PeerJ Computer Science 10 (e1841). Online publication date: 31-Jan-2024.
https://doi.org/10.7717/peerj-cs.1841
Wang T, Li F, Zhu L, Li J, Zhang Z, Shen H (2024). Invisible Black-Box Backdoor Attack against Deep Cross-Modal Hashing Retrieval. ACM Transactions on Information Systems 42:4 (1-27). Online publication date: 26-Apr-2024.
https://dl.acm.org/doi/10.1145/3650205
Li Z, Lin Z, Liang F, Pan W, Yang Q, Ming Z (2024). Decentralized Federated Recommendation with Privacy-aware Structured Client-level Graph. ACM Transactions on Intelligent Systems and Technology 15:4 (1-23). Online publication date: 22-Jan-2024.
https://dl.acm.org/doi/10.1145/3641287

Index Terms

1. Applied computing
  1. Law, social and behavioral sciences
    1. Law

Information

Published In

ACM Transactions on Information Systems Volume 41, Issue 4

October 2023

958 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/3587261

Editor:
Min Zhang
Tsinghua University, China

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 April 2023

Online AM: 18 January 2023

Accepted: 05 January 2023

Revised: 04 November 2022

Received: 02 August 2022

Published in TOIS Volume 41, Issue 4

Qualifiers

Research-article
