Legal texts are usually long and complicated, with characteristics that set them apart from everyday texts, so translating them is generally considered difficult. This paper introduces an approach that splits a legal sentence based on its logical structure and selects appropriate translation rules to improve phrase reordering in legal translation. We segment each English legal sentence according to its logical structure. New features carrying rich linguistic and contextual information from the split sentences are proposed for rule selection. We apply a maximum entropy model to combine this information and integrate the features of the split sentences into a tree-based SMT system for legal translation. We obtain improvements in English-to-Japanese legal translation over the Moses and Moses-chart systems.
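As a rough illustration of how a maximum entropy (log-linear) model can rank candidate translation rules, the sketch below scores each rule by a weighted sum of its active features and normalizes the scores into probabilities. The feature names, weights, and rule strings are hypothetical placeholders, not the features or rules used in the paper.

```python
import math

def maxent_rule_probs(candidates, weights):
    """Return P(rule | context) for each candidate under a log-linear model."""
    # Score of a rule = sum of weight * value over its active features.
    scores = {
        rule: sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for rule, feats in candidates.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {rule: math.exp(s - top) for rule, s in scores.items()}
    z = sum(exp_scores.values())
    return {rule: e / z for rule, e in exp_scores.items()}

# Hypothetical weights; a real system would learn them from training data.
weights = {
    "segment=condition": 0.8,
    "head_pos=VB": 0.5,
    "reorder=swap": 0.3,
    "reorder=mono": -0.2,
}

# Hypothetical candidate rules for one source span, each with its features.
candidates = {
    "X -> <X1 X2, X2 X1>": {"segment=condition": 1.0, "head_pos=VB": 1.0, "reorder=swap": 1.0},
    "X -> <X1 X2, X1 X2>": {"segment=condition": 1.0, "reorder=mono": 1.0},
}

probs = maxent_rule_probs(candidates, weights)
best_rule = max(probs, key=probs.get)
```

In this toy setting the swapping rule receives the higher probability because more of its features carry positive weight; a decoder could use such probabilities as an additional feature when choosing among rules.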
Abstract. Word segmentation for Vietnamese, as for most Asian languages, is an important task that has a significant impact on higher levels of language processing. However, it has received little attention from the community due to the lack of a common annotated corpus for ...
The main objective of this research is to extract health information, such as diseases, symptoms, treatments, and drugs, from online health forum discussions. The task is referred to as medical entity recognition (MER), defined as a Named Entity Recognition (NER) task that extracts information from unstructured text in the health field and transforms it into structured form. The approach used is supervised learning with a Conditional Random Field (CRF). We experimented with several combinations of features to find the best-performing one. The best result obtained was a precision of 70.97%, a recall of 57.83%, and an F-measure of 63.69%. The feature combination that produced the best overall result consists of the word itself, phrase, dictionary, the first preceding word, and the word length.
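A minimal sketch of how per-token features of this kind might be assembled for a CRF tagger is shown below. It covers four of the five listed features (word, dictionary membership, first preceding word, word length); the phrase feature is omitted because the abstract does not say how it is computed. The function names, the `<BOS>` marker, and the toy dictionary are assumptions made for illustration.

```python
def token_features(tokens, i, dictionary):
    """Build a feature dict for the token at position i."""
    word = tokens[i]
    return {
        "word": word.lower(),                                   # the word itself
        "in_dictionary": word.lower() in dictionary,            # dictionary lookup
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",  # first preceding word
        "word_length": len(word),                               # word length
    }

def sentence_features(tokens, dictionary):
    """Build one feature dict per token in the sentence."""
    return [token_features(tokens, i, dictionary) for i in range(len(tokens))]

# Hypothetical medical term dictionary and example forum sentence.
medical_terms = {"ibuprofen", "headache"}
tokens = ["Ibuprofen", "relieved", "my", "headache"]
features = sentence_features(tokens, medical_terms)
```

A list of feature dictionaries per sentence is a common input format for CRF toolkits, so a structure like `features` could be paired with a label sequence for training.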