Corpus Building for Corporate Knowledge Discovery and Management: A Case Study of Manufacturing

Liu, Ying; Loh, Han Tong

doi:10.1007/978-3-540-74819-9_67

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4692)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

Building a collection of electronic documents, i.e. a corpus, is a cornerstone of research in information retrieval, text mining and knowledge management. In the literature, very few papers have discussed the concerns involved in building a corpus or explained the building process systematically. In this paper, we explain our work on building an enterprise corpus called Manufacturing Corpus Version 1 (MCV1) for corporate knowledge management purposes. Relevant issues, e.g. input texts, category labels and policies, as well as its parallel coding process and quality measurements, are discussed. Real-world automated text classification experiments based on MCV1 show the soundness of its coding process. Finally, suggestions are made on how the proposed approach can be implemented in a more economical manner.

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China
Ying Liu
Department of Mechanical Engineering, National University of Singapore, 21 Lower Kent Ridge Road, 119077, Singapore
Han Tong Loh

Editor information

Bruno Apolloni, Robert J. Howlett, Lakhmi Jain

Cite this paper

Liu, Y., Loh, H.T. (2007). Corpus Building for Corporate Knowledge Discovery and Management: A Case Study of Manufacturing. In: Apolloni, B., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2007. Lecture Notes in Computer Science, vol 4692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74819-9_67

DOI: https://doi.org/10.1007/978-3-540-74819-9_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74817-5
Online ISBN: 978-3-540-74819-9
eBook Packages: Computer Science, Computer Science (R0)
