
Locality Alignment Discriminant Analysis for Visualizing Regional English

Neural Processing Letters

Abstract

This paper proposes a novel dimensionality reduction algorithm, locality alignment discriminant analysis (LADA), for visualizing regional English. In LADA, the proposed intrinsic graph and penalty graph measure the similarity between each pair of text slices, which better characterizes intra-class compactness and inter-class separability. The projection matrix obtained by the method is orthogonal, which eliminates redundancy between projection directions and is more effective at preserving the intrinsic geometry and improving discriminating ability. To evaluate the algorithm, a corpus of regional written English is designed and collected. Articles are split into slices, and each slice is transformed into a 140-dimensional data point using 140 text style markers. LADA is then applied to recognize the variation present in regional written English, and the similarity among different types of English can be observed in the resulting data plots. Visualization and numerical comparison indicate that LADA outperforms other existing algorithms on the regional English data, because it better preserves the local discriminative information embedded in the data, which makes it suitable for pattern classification.
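The abstract does not give LADA's objective or its graph weights, so the following is only a minimal sketch of the general idea it describes: build an intrinsic (same-class) graph and a penalty (different-class) graph over the slices, then find an orthogonal projection that spreads penalty-graph neighbors apart while keeping intrinsic-graph neighbors close. Everything specific here is an assumption made for illustration, not the authors' method: the binary k-nearest-neighbor weights, the difference-form criterion solved by a single symmetric eigendecomposition, and the hypothetical names `knn_graph`, `laplacian`, and `orthogonal_graph_projection`.

```python
import numpy as np


def knn_graph(X, y, k, same_class):
    """Symmetric binary k-NN adjacency over same-class or different-class pairs."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all slices.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        mask = (y == y[i]) if same_class else (y != y[i])
        mask[i] = False  # a point is never its own neighbor
        candidates = np.flatnonzero(mask)
        nearest = candidates[np.argsort(d2[i, candidates])[:k]]
        W[i, nearest] = 1.0
    return np.maximum(W, W.T)  # symmetrize


def laplacian(W):
    """Unnormalized graph Laplacian D - W."""
    return np.diag(W.sum(axis=1)) - W


def orthogonal_graph_projection(X, y, dim=2, k=5):
    """Return a (features x dim) matrix P with orthonormal columns.

    Assumed criterion: maximize tr(P^T X^T (L_pen - L_int) X P)
    subject to P^T P = I, solved by the top eigenvectors.
    """
    Xc = X - X.mean(axis=0)
    L_int = laplacian(knn_graph(Xc, y, k, same_class=True))
    L_pen = laplacian(knn_graph(Xc, y, k, same_class=False))
    M = Xc.T @ (L_pen - L_int) @ Xc
    M = (M + M.T) / 2  # guard against floating-point asymmetry
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, -dim:][:, ::-1]  # largest eigenvalues first
```

Under these assumptions, calling `orthogonal_graph_projection(X, y)` on an array `X` of 140-dimensional slice vectors with region labels `y` returns a 140 x 2 matrix `P`, and `(X - X.mean(axis=0)) @ P` gives two-dimensional coordinates for plotting. Because `P` comes from a symmetric eigendecomposition, its columns are orthonormal, which corresponds to the orthogonality property the abstract highlights.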



Acknowledgments

This work was partly supported by the National Natural Science Foundation of China under Grant No. 61300209.

Author information

Corresponding author

Correspondence to Mingbo Zhao.


Cite this article

Tang, P., Zhao, M. & Chow, T.W.S. Locality Alignment Discriminant Analysis for Visualizing Regional English. Neural Process Lett 43, 295–307 (2016). https://doi.org/10.1007/s11063-015-9422-9


  • DOI: https://doi.org/10.1007/s11063-015-9422-9
