-
Sequence-aware multimodal page classification of Brazilian legal documents
Authors:
Pedro H. Luz de Araujo,
Ana Paula G. S. de Almeida,
Fabricio A. Braz,
Nilton C. da Silva,
Flavio de Barros Vidal,
Teofilo E. de Campos
Abstract:
The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours to execute the initial analysis and classification of those cases -- which takes effort away from posterior, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate ou…
▽ More
The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours to execute the initial analysis and classification of those cases -- which takes effort away from posterior, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, which are stored both as an image and as a corresponding text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on document texts. We use them as extractors of visual and textual features, which are then combined through our proposed Fusion Module. Our Fusion Module can handle missing textual or visual input by using learned embeddings for missing data. Moreover, we experiment with bi-directional Long Short-Term Memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both textual and visual classifiers, especially when leveraging the sequential nature of the pages.
△ Less
Submitted 15 July, 2022; v1 submitted 2 July, 2022;
originally announced July 2022.
-
Skin Lesions Classification Using Convolutional Neural Networks in Clinical Images
Authors:
Danilo Barros Mendes,
Nilton Correia da Silva
Abstract:
Skin lesions are conditions that appear on a patient due to many different reasons. One of these can be because of an abnormal growth in skin tissue, defined as cancer. This disease plagues more than 14.1 million patients and had been the cause of more than 8.2 million deaths, worldwide. Therefore, the construction of a classification model for 12 lesions, including Malignant Melanoma and Basal Ce…
▽ More
Skin lesions are conditions that appear on a patient due to many different reasons. One of these can be because of an abnormal growth in skin tissue, defined as cancer. This disease plagues more than 14.1 million patients and had been the cause of more than 8.2 million deaths, worldwide. Therefore, the construction of a classification model for 12 lesions, including Malignant Melanoma and Basal Cell Carcinoma, is proposed. Furthermore, in this work, it is used a ResNet-152 architecture, which was trained over 3,797 images, later augmented by a factor of 29 times, using positional, scale, and lighting transformations. Finally, the network was tested with 956 images and achieve an area under the curve (AUC) of 0.96 for Melanoma and 0.91 for Basal Cell Carcinoma.
△ Less
Submitted 5 December, 2018;
originally announced December 2018.
-
Document classification using a Bi-LSTM to unclog Brazil's supreme court
Authors:
Fabricio Ataides Braz,
Nilton Correia da Silva,
Teofilo Emidio de Campos,
Felipe Borges S. Chaves,
Marcelo H. S. Ferreira,
Pedro Henrique Inazawa,
Victor H. D. Coelho,
Bernardo Pablo Sukiennik,
Ana Paula Goncalves Soares de Almeida,
Flavio Barros Vidal,
Davi Alves Bezerra,
Davi B. Gusmao,
Gabriel G. Ziegler,
Ricardo V. C. Fernandes,
Roberta Zumblick,
Fabiano Hartmann Peixoto
Abstract:
The Brazilian court system is currently the most clogged up judiciary system in the world. Thousands of lawsuit cases reach the supreme court every day. These cases need to be analyzed in order to be associated to relevant tags and allocated to the right team. Most of the cases reach the court as raster scanned documents with widely variable levels of quality. One of the first steps for the analys…
▽ More
The Brazilian court system is currently the most clogged up judiciary system in the world. Thousands of lawsuit cases reach the supreme court every day. These cases need to be analyzed in order to be associated to relevant tags and allocated to the right team. Most of the cases reach the court as raster scanned documents with widely variable levels of quality. One of the first steps for the analysis is to classify these documents. In this paper we present a Bidirectional Long Short-Term Memory network (Bi-LSTM) to classify these pieces of legal document.
△ Less
Submitted 27 November, 2018;
originally announced November 2018.