CTRD: A Chinese Theme-Rheme Discourse Dataset

B Fu, Y Tong, D Tian, Y Chen, X Shi, M Zhu - … Language Processing and …, 2021 - Springer
Natural Language Processing and Chinese Computing: 10th CCF International …, 2021 - Springer
Abstract
Discourse topic structure is key to the cohesion of a discourse and reflects the essence of the text. Current Chinese discourse corpora are constructed mainly on the basis of rhetorical and semantic relations, and so ignore the functional information in discourse. To alleviate this problem, we introduce a new Chinese discourse analysis dataset called CTRD, which stands for Chinese Theme-Rheme Discourse dataset. Unlike previous discourse banks, CTRD is annotated according to a novel discourse annotation scheme based on Chinese theme-rheme theory and the thematic progression patterns of Halliday’s systemic functional grammar. We manually annotated 525 news documents from OntoNotes 4.0, with a Kappa value greater than 0.6. Preliminary experiments on this corpus verify the computability of CTRD. Finally, we make CTRD available at https://github.com/ydc/ctrd .
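The abstract reports inter-annotator agreement as a Kappa value but does not say which variant was used or how it was computed. As a rough illustration only, the sketch below computes Cohen's kappa for two annotators. The choice of Cohen's kappa, the function name `cohen_kappa`, and the "theme"/"rheme" label sequences are all assumptions made for this example and are not taken from CTRD.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not the CTRD authors' code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up annotations (assumed labels), differing on one item out of ten.
annotator_1 = ["theme", "rheme"] * 5
annotator_2 = ["theme", "rheme"] * 4 + ["rheme", "rheme"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.80"
```

In this toy case the annotators agree on 9 of 10 items (observed agreement 0.9) while chance agreement is 0.5, giving a kappa of 0.8. A value above 0.6, as reported for CTRD, is conventionally read as substantial agreement.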