Document categorization
Recent papers in Document categorization
This paper describes a study aimed at identifying different profiles of students' project development processes. Specifically, we assessed the use of abstract data types for the development of knowledge-based projects. The concept of...
Document image classification is an important step in Office Automation, Digital Libraries, and other document image analysis applications. There is great diversity in document image classifiers: they differ in the problems they solve, in...
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine...
Document categorization is a technique through which the category of a document is determined. This paper deals with the automatic classification of Bangla documents. In this proposed categorization system, a support vector machine is...
Automatic text categorization is a primary step in information retrieval, where it is necessary to find the most relevant documents in an enormous volume. It is also useful in a wide range of web domains, ranging from portal sites to news...
Named entity recognition is a preprocessing tool for many natural language processing tasks, such as text summarization, speech translation, and document categorization. Many systems for named entity recognition have been developed over...
Word fragments, or n-grams, have been widely used to perform different Natural Language Processing tasks such as information retrieval [1] [2], document categorization [3], automatic summarization [4] or even genetic classification of...
In the process of preparing learning material for Computer Supported Learning Systems (CSLSs), one of the first steps involves finding documents relevant to the topics and to the students. This requires documents to be categorized...
Automatic document classification, owing to its various applications in data mining and information technology, is one of the important topics in computer science. Classification plays a vital role in many information management and retrieval...
Despite several decades of research in document analysis, recognition of unconstrained handwritten documents is still considered a challenging task. Previous research in this area has shown that word recognizers perform adequately on...
Information about the category (type) of a WWW page can help the user in search, filtering, and navigation tasks. We propose a multidimensional categorisation scheme, with the bibliographic dimension as the primary one....
Experiments were conducted to test several hypotheses on methods for improving document classification for the malicious insider threat problem within the Intelligence Community. Bag-of-words (BOW) representations of documents were...
Experiments were conducted to test several hypotheses on methods for improving document categorization for the malicious insider threat problem within the Intelligence Community. Bag-of-words (BOW) representations of documents were...
The traditional approach to document categorization is categorization by content, since information for categorizing a document is extracted from the document itself. In a hypertext environment like the Web, the structure of...
Today, there is an increasing demand for efficient archival and retrieval methods for online handwritten data. For such tasks, text categorization is of particular interest. The textual data available in online documents can be extracted...