In this paper, we introduce the idea of automatically illustrating complex sentences as multimodal summaries that combine pictures, structure and simplified compressed text. By including text and structure in addition to pictures, multimodal summaries provide additional clues of what happened, who did it, to whom and how, to people who may have difficulty reading or who are looking to skim quickly. We present ROC-MMS, a system for automatically creating multimodal summaries (MMS) of complex sentences by generating pictures, textual summaries and structure. We show that pictures alone are insufficient to help people understand most sentences, especially for readers who are unfamiliar with the domain. An evaluation of ROC-MMS in the Wikipedia domain illustrates both the promise and challenge of automatically creating multimodal summaries.
TimeML is an XML-based schema for annotating temporal information over discourse. The standard has been used to annotate a variety of resources and is followed by a number of tools, the creation of which constitutes hundreds of thousands of man-hours of research work. However, many current resources are not valid, do not produce valid output, or contain ambiguous or custom additions and removals. Difficulties arising from these variances were highlighted in the TempEval-3 exercise, which responded by adding its own stipulations on top of conventional TimeML. To unify the state of current resources, and to make progress toward easy adoption of its current incarnation ISO-TimeML, this paper introduces TimeML-strict: a valid, unambiguous, and easy-to-process subset of TimeML. We also introduce three resources: a schema for TimeML-strict; a validator tool for TimeML-strict, so that one may ensure documents are in the correct form; and a repair tool that corrects common invalidating errors and adds disambiguating markup in order to convert documents from the laxer TimeML standard to TimeML-strict.
In this paper, we study the outcome of using an n-gram based algorithm for Bangla text categorization. To analyze the efficiency of this methodology, we used a one-year Prothom-Alo news corpus. Our results show that n-grams of length 2 or 3 are the most useful for categorization. Using n-gram lengths greater than 3 reduces categorization performance.
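The abstract above does not say which n-gram algorithm was used, so the following is only a minimal sketch of one common approach: rank-ordered character n-gram profiles compared with an "out-of-place" distance, in the style of Cavnar and Trenkle. The function names, the `top_k` cutoff, and the toy English training strings are all assumptions made for illustration; they are not taken from the paper, which worked on Bangla news text.

```python
from collections import Counter


def ngram_profile(text, n_values=(2, 3), top_k=300):
    """Map each of the top_k most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ranked[:top_k])}


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams absent from the category get a fixed penalty."""
    penalty = len(cat_profile)
    return sum(
        abs(rank - cat_profile[gram]) if gram in cat_profile else penalty
        for gram, rank in doc_profile.items()
    )


def categorize(text, category_profiles, n_values=(2, 3)):
    """Return the category whose profile is closest to the document's profile."""
    doc_profile = ngram_profile(text, n_values)
    return min(
        category_profiles,
        key=lambda cat: out_of_place(doc_profile, category_profiles[cat]),
    )


# Hypothetical toy training data (placeholder English text, not the Prothom-Alo corpus).
training = {
    "sports": "the team won the match and the cup final match",
    "economy": "the bank raised interest rates and the market fell",
}
profiles = {cat: ngram_profile(text) for cat, text in training.items()}
```

Restricting `n_values` to `(2, 3)` mirrors the reported finding that lengths 2 and 3 were the most useful; passing other lengths is a simple way to repeat that comparison on one's own data.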
2012 IEEE Sixth International Conference on Semantic Computing, 2012
The temporal annotation scheme TimeML was developed to support research in complex temporal question answering (QA). Given the complexity of temporal QA, most of the efforts have focused, so far, on extracting temporal information, which has been evaluated with corpus-based evaluation. However, the QA task represents a natural way to evaluate temporal information understanding, and creating question sets is less costly for humans than manually annotating temporal information, which is required to perform corpus-based evaluation. Additionally, QA performance better captures the understanding of important temporal information as compared to corpus-based evaluation where all information is equally important for scoring. This paper presents a temporal QA system that performs temporal reasoning. It can be used to answer temporal questions (factoid, list and yes/no) about any document annotated in TimeML. In the paper, we show how this system can be used to evaluate automated temporal information understanding. Our QA-based evaluation results suggest that (i) the available temporal annotations are not complete, and (ii) QA provides a less costly and more reliable way of evaluating temporal understanding systems. To favour replicability, we made the temporal QA system and the question set used in the evaluation available.
This paper presents a directional advantage of n-gram modeling in terms of backward or forward n-gram modeling in Bangla. The most commonly used n-gram analysis is predominantly a forward n-gram. However, in Bangla it appears that a backward n-gram is repeatedly more successful and yields more grammatical results than a forward n-gram. This paper hypothesizes that
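To make the forward/backward distinction concrete, here is a small sketch that assumes word-level bigrams: a forward model conditions each word on the word before it, while a backward model conditions each word on the word after it. The abstract does not describe the paper's actual model, so the function names, the `<s>`/`</s>` boundary markers, and the placeholder tokens below are illustrative assumptions only.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences, backward=False):
    """Count word bigrams; with backward=True each word is conditioned on the word that follows it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        if backward:
            # Reversing the sequence turns "next word" contexts into "previous word" contexts.
            tokens = tokens[::-1]
        for context, word in zip(tokens, tokens[1:]):
            model[context][word] += 1
    return model


def most_likely(model, context):
    """Return the most frequent word seen next to `context`, or None if the context is unseen."""
    candidates = model.get(context)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Placeholder tokens stand in for real Bangla words.
sentences = ["a b c", "a b d", "e b c"]
forward_model = train_bigrams(sentences)
backward_model = train_bigrams(sentences, backward=True)
```

With the same training sentences, `most_likely(forward_model, "b")` predicts the word that tends to follow `b`, and `most_likely(backward_model, "b")` predicts the word that tends to precede it, which is the directional contrast the abstract refers to.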
Rule based Automated Pronunciation Generator Ayesha Binte Mosaddeque, Naushad UzZaman, and Mumit Khan Center for Research on Bangla Language ... sister languages Hindi, Assamese and Oriya among others, as they have all descended from Indo-Aryan with Sanskrit ...
Comparison of Unigram, Bigram, HMM and Brill's POS Tagging Approaches for some South Asian Languages Fahim Muhammad Hasan Center for Research on Bangla Language Processing BRAC University 66, Mohakhali C/A, Dhaka Bangladesh fahimht@gmail.com ...
A corpus, from a linguistic point of view, is defined as a collection of transcribed speech or written text compiled mainly to enhance linguistic research. It is as important a resource as any other in the field of language engineering. With the recent advancement in computer ...
Papers by Naushad Uzzaman