Chu-Ren Huang | Hong Kong Polytechnic University - Academia.edu

Chu-Ren Huang

Hong Kong Polytechnic University, Department of Chinese and Bilingual Studies CBS, Faculty Member

Followers: 167 | Following: 126 | Co-authors: 42 | Mentions: 1

I work on language as a knowledge system, applying interdisciplinary approaches that include, but are not limited to, analytical methods, behavioural experiments, corpus-driven analysis and computational modelling. The fields I have published most in are computational and corpus linguistics, lexical semantics, Chinese linguistics and ontology. Recently I have become interested in language as a self-adaptive complex system, as well as in the cognitive and ontological basis of language.


Papers

Routledge eBooks, Jul 14, 2017

Meaning Representation and Meaning Instantiation for Chinese Nominals

The Module-Attribute Representation of Verbal Semantics: From Semantic to Argument Structure

Chapter Appendix VII: A Complete Table of Localizers

Taylor & Francis eBooks, 2017

Introduction to CKIP Parts of Speech System

Routledge eBooks, Jul 14, 2017

Routledge eBooks, Jul 14, 2017

Chapter Appendix V: Sample Segmented Text [Ya(Elegant) Level]

Taylor & Francis eBooks, 2017

Routledge eBooks, Jul 14, 2017

Routledge eBooks, Jul 14, 2017

The Module-Attribute Representation of Verbal Semantics

Pacific Asia Conference on Language, Information, and Computation, Feb 1, 2000

Chapter Appendix VI: A Complete List of Parts of Speech in Mandarin Chinese

Taylor & Francis eBooks, 2017

Routledge eBooks, 2017

Social changes through the lens of language: A big data study of Chinese modal verbs

PLOS ONE, 2022

Leech’s corpus-based comparison of English modal verbs from 1961 to 1992 showed the steep decline of all modal verbs together, which he ascribed to continuing changes towards a more equal and less authority-driven society. This study inspired many diachronic and synchronic studies, mostly on English modal verbs and largely assuming the correlation between the use of modal verbs and power relations. Yet, there are continuing debates on sampling design and the choices of corpora. In addition, this hypothesis has not been attested in any other language with comparable corpus size or examined with longitudinal studies. This study tracks the use of Chinese modal verbs from 1901 to 2009, covering the historical events of the New Culture Movement, the establishment of the PRC, the implementation of simplified characters and the completion and finalization of simplification of the Chinese writing system. We found that the usage of modal verbs did rise and fall during the last century, and f...

Variations in World Chineses

The Routledge Handbook of Chinese Applied Linguistics, 2019

Sentiment Analyzer with Rich Features for Ironic and Sarcastic Tweets

Sentiment analysis of tweets is a complex task, because these short messages employ unconventional language to increase expressiveness. The task becomes even more difficult when people use figurative language (e.g. irony, sarcasm and metaphors), because it causes a mismatch between the literal meaning and the sentiment actually expressed. In this paper, we describe a sentiment analysis system designed for handling ironic and sarcastic tweets. Features grounded on several linguistic levels are proposed and used to classify the tweets on an 11-point scale, using a decision tree. The system is evaluated on the dataset released by the organizers of SemEval 2015 Task 11. The results show that our method largely outperforms the systems proposed by the participants of the task on ironic and sarcastic tweets.

Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit?

While neural embeddings represent a popular choice for word representation in a wide variety of NLP tasks, their usage for thematic fit modeling has been limited, as they have been reported to lag behind syntax-based count models. In this paper, we propose a complete evaluation of count models and word embeddings on thematic fit estimation, taking into account a larger number of parameters and verb roles and also introducing dependency-based embeddings into the comparison. Our results show a complex scenario, in which a determining factor for performance seems to be whether the model has access to reliable syntactic information for building the distributional representations of the roles.

Transitivity in Light Verb Variations in Mandarin Chinese – A Comparable Corpus-based Statistical Approach

This paper adopts a comparable corpus-based approach to light verb variations in two varieties of Mandarin Chinese and proposes a theoretical account based on transitivity (Hopper and Thompson 1980). Light verbs are highly grammaticalized and lack strong collocation restrictions; hence they have been a challenge for empirical accounts. It is even more challenging to consider their variations between different varieties (e.g. Taiwan and Mainland Mandarin). The current study follows the research paradigm set up in Lin et al. (2014) for differentiating different light verbs and Huang et al. (2014) for automatic discovery of light verb variations. In our study, a corpus-based statistical approach is adopted to show that both internal variety differences between light verbs and external differences between different variants can be detected effectively. The distributional differences between Mainland and Taiwan can also shed light on the re-classification of syntactic types of the taken comple...

Language change in Report on the Work of the Government by Premiers of the People’s Republic of China

The present paper explores changes in topical focus and in language in the Report on the Work of the Government by Premiers of the People’s Republic of China (hereinafter Report texts). Text clustering and correspondence analysis showed how the topical focus changed across the Report texts of three selected periods. The Report texts were represented by their clause length distributions and clustered; the clustering result showed differences in clause length usage among the Report texts. The relationship between clause length and word length was also studied. The average word length decreases with clause length, and the relationship was fitted using the function y = ax^b, based on the Menzerath-Altmann Law. The relationship between the Report texts of the three periods, as represented by the fitted parameters a and b, was explored.
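As an illustration of the kind of fit the abstract describes, the sketch below estimates a and b in the power-law form y = a·x^b of the Menzerath-Altmann Law by ordinary least squares on log-transformed data. The function name `fit_power_law` and the sample numbers are hypothetical and are not taken from the paper; the authors' actual fitting procedure is not stated in the abstract and may differ.

```python
import math

def fit_power_law(xs, ys):
    """Estimate (a, b) in y = a * x**b via linear regression of log(y) on log(x).

    Assumes all xs and ys are strictly positive.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                        # slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # intercept mapped back from log space
    return a, b

# Hypothetical data: clause length in words vs. mean word length,
# generated from y = 2.0 * x**-0.1 purely for illustration.
clause_lengths = [1, 2, 4, 8, 16]
mean_word_lengths = [2.0 * x ** -0.1 for x in clause_lengths]

a, b = fit_power_law(clause_lengths, mean_word_lengths)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative b corresponds to the decrease of average word length with clause length that the abstract reports.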

Not all arguments are processed equally: a distributional model of argument complexity

Language Resources and Evaluation, 2021

Using Conceptual Norms for Metaphor Detection

Proceedings of the Second Workshop on Figurative Language Processing, 2020

POSTER: Unsupervised Measure of Word Similarity: How to Outperform Co-occurrence and Vector Cosine in VSMs

by Enrico Santus and Chu-Ren Huang

In this paper, we claim that vector cosine – which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models – can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove it, we describe and evaluate APSyn, a variant of the Average Precision that, without any optimization, outperforms the vector cosine and the co-occurrence on the standard ESL test set, with an improvement ranging between +9.00% and +17.98%, depending on the number of chosen top contexts.
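The abstract does not spell out the formula, so the following is only a minimal sketch of a rank-based intersection measure in the spirit of APSyn. It assumes that each word comes with a dictionary of context weights (for example, an association score such as PPMI; this is an assumption), keeps the N highest-weighted contexts per word, and sums the inverse of the average rank over the contexts the two words share. The names `top_contexts` and `apsyn` and the toy data are hypothetical.

```python
def top_contexts(weights, n):
    """Map the n highest-weighted contexts to their rank (1 = strongest).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return {ctx: rank for rank, (ctx, _) in enumerate(ranked, start=1)}

def apsyn(weights_1, weights_2, n=1000):
    """Sum 1 / (average rank) over contexts shared by both top-n lists."""
    ranks_1 = top_contexts(weights_1, n)
    ranks_2 = top_contexts(weights_2, n)
    shared = ranks_1.keys() & ranks_2.keys()
    return sum(1.0 / ((ranks_1[c] + ranks_2[c]) / 2.0) for c in shared)

# Toy context weights, invented for illustration only.
cat = {"purr": 3.0, "pet": 2.0, "tail": 1.0}
dog = {"bark": 3.0, "pet": 2.5, "tail": 1.0}
car = {"drive": 3.0, "wheel": 2.0, "road": 1.0}

print(apsyn(cat, dog, n=3))  # shares "pet" and "tail"
print(apsyn(cat, car, n=3))  # no shared contexts
```

Shared contexts that rank highly for both words contribute most to the score, which is why the number of top contexts N matters for the results the abstract reports.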