Abstract
Legal retrieval techniques play an important role in preserving the fairness and equality of the judicial system. COLIEE, a well-known annual international competition, aims to advance state-of-the-art retrieval models for legal texts. This paper describes the methodology employed by the TQM team in COLIEE2024. Specifically, we explore various lexical matching and semantic retrieval models, with a focus on improving the understanding of case relevance. We also integrate the resulting features using learning-to-rank techniques. In addition, we propose fine-grained heuristic pre-processing and post-processing methods to reduce the impact of irrelevant information. As a result, our methodology achieved strong performance in COLIEE2024, securing first place in Task 1 and third place in Task 3. We hope that the proposed approach can contribute valuable insights to the advancement of legal retrieval technology.
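To make the combination of lexical matching with other relevance features concrete, the following is a minimal illustrative sketch and not the TQM team's implementation. It scores candidates with a plain BM25 formula and fuses that score with one additional feature column through a fixed weighted sum. The function names, the toy documents, the placeholder "semantic" scores, and the hand-set weights are all assumptions made for illustration; in an actual learning-to-rank setup the weights would be learned from labeled relevance data rather than fixed by hand.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(feature_columns, weights):
    """Weighted sum of min-max-normalized feature columns (one score per candidate)."""
    normalized = [min_max(col) for col in feature_columns]
    n_candidates = len(normalized[0])
    return [
        sum(w * col[i] for w, col in zip(weights, normalized))
        for i in range(n_candidates)
    ]


def rank(query, docs, extra_feature, weights=(0.7, 0.3)):
    """Return candidate indices sorted by fused score, best first.

    The weights are hand-set placeholders (an assumption); a learning-to-rank
    model would fit them from training data.
    """
    lexical = bm25_scores(query, docs)
    fused = fuse([lexical, extra_feature], weights)
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


# Toy example: three tokenized "cases" and a hypothetical semantic score per case.
docs = [
    ["contract", "breach", "damages"],
    ["criminal", "appeal", "sentence"],
    ["contract", "dispute", "appeal"],
]
query = ["contract", "breach"]
semantic = [0.2, 0.1, 0.9]  # placeholder values, not produced by any real model
ranking = rank(query, docs, semantic)
```

In this toy setting the first case matches both query terms and stays on top even though the third case has the highest placeholder semantic score, which shows how the fusion weights trade off the two signals.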
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, H. et al. (2024). Towards an In-Depth Comprehension of Case Relevance for Better Legal Retrieval. In: Suzumura, T., Bono, M. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2024. Lecture Notes in Computer Science, vol 14741. Springer, Singapore. https://doi.org/10.1007/978-981-97-3076-6_15
DOI: https://doi.org/10.1007/978-981-97-3076-6_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-3075-9
Online ISBN: 978-981-97-3076-6
eBook Packages: Computer Science (R0)