Abstract
This paper proposes a novel dimensionality reduction algorithm, locality alignment discriminant analysis (LADA), for visualizing regional English. In LADA, the proposed intrinsic and penalty graphs measure the similarity between each pair of textual slices, which better characterizes intra-class compactness and inter-class separability. The projection matrix obtained by the method is orthogonal, which eliminates redundancy between projection directions and is more effective at preserving the intrinsic geometry and improving discriminating ability. To evaluate the algorithm, a corpus of regional written English is designed and collected. Articles are split into slices, and each slice is transformed into a 140-dimensional data point using 140 text style markers. LADA is then applied to recognize the variations present in regional written English, and the similarity among different types of English can be observed in the resulting data plots. The visualization results and numerical comparisons indicate that LADA outperforms the existing algorithms it is compared against in handling regional English data, because it better preserves the local discriminative information embedded in the data, which makes it suitable for pattern classification.
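The abstract does not state the LADA objective, so the following is only a minimal sketch of the general idea it describes: build an intrinsic graph (same-class neighbours) and a penalty graph (different-class neighbours), then take an orthogonal projection from a symmetric eigenproblem. The difference-form criterion, the k-nearest-neighbour graph construction, the function names, the parameter `k`, and the synthetic 140-dimensional data are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def knn_graph(X, mask, k):
    """Symmetric 0/1 adjacency: each point is linked to its k nearest
    neighbours among the pairs permitted by the boolean matrix `mask`."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    d2 = np.where(mask, d2, np.inf)          # forbid disallowed pairs
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(d2[i])[:k]
        order = order[np.isfinite(d2[i, order])]
        W[i, order] = 1.0
    return np.maximum(W, W.T)

def laplacian(W):
    """Unnormalized graph Laplacian D - W."""
    return np.diag(W.sum(axis=1)) - W

def orthogonal_projection(X, y, k=5, dim=2):
    """Hypothetical difference-form criterion (an assumption, not LADA):
    maximize tr(P^T X^T (L_p - L_i) X P) subject to P^T P = I."""
    Xc = X - X.mean(axis=0)
    same = y[:, None] == y[None, :]
    np.fill_diagonal(same, False)            # a point is not its own neighbour
    diff = y[:, None] != y[None, :]
    L_i = laplacian(knn_graph(Xc, same, k))  # intrinsic graph: same class
    L_p = laplacian(knn_graph(Xc, diff, k))  # penalty graph: different class
    M = Xc.T @ (L_p - L_i) @ Xc
    M = (M + M.T) / 2                        # guard against round-off asymmetry
    _, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    return vecs[:, ::-1][:, :dim]            # eigenvectors of the largest ones

# Synthetic stand-in for text slices described by 140 style markers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(30, 140)) for c in (0.0, 0.5, 1.0)])
y = np.repeat([0, 1, 2], 30)

P = orthogonal_projection(X, y)
Z = (X - X.mean(axis=0)) @ P                 # 2-D coordinates for plotting
```

Because the projection directions are eigenvectors of a symmetric matrix, they are orthonormal by construction, which mirrors the orthogonality property the abstract highlights. The rows of `Z` can be passed to any scatter-plot routine, coloured by `y`.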
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China under Grant No. 61300209.
Cite this article
Tang, P., Zhao, M. & Chow, T.W.S. Locality Alignment Discriminant Analysis for Visualizing Regional English. Neural Process Lett 43, 295–307 (2016). https://doi.org/10.1007/s11063-015-9422-9
DOI: https://doi.org/10.1007/s11063-015-9422-9