Abstract
This paper discusses the classification of text documents. Based on the concept of the proximity degree, the set of words is partitioned into equivalence classes. In particular, the paper introduces the concepts of the semantic field and the association degree. Building on these concepts, it presents a fuzzy classification approach for document categorization. Furthermore, using the concept of information entropy, it derives methods for selecting, from the set of words, the key words that cover the classification of the documents, and for constructing a hierarchical structure of these key words.
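To illustrate how a proximity degree can induce equivalence classes, the following minimal Python sketch assumes that the proximity degree is given as a fuzzy relation on word pairs with values in [0, 1] (reflexive and symmetric but not necessarily transitive), and that a threshold `lam` is chosen by the user. Because such a relation is generally not transitive, the sketch takes the λ-cut and closes it transitively with a union-find structure. This is one standard way to obtain a partition, not necessarily the exact procedure of the paper; the function name, input format, and proximity values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: proximity degrees for word pairs, values in [0, 1].
# Pairs that are not listed are treated as having degree 0.
Proximity = Dict[Tuple[str, str], float]


def partition_words(words: List[str], proximity: Proximity, lam: float) -> List[List[str]]:
    """Partition `words` into the equivalence classes of the transitive
    closure of the lambda-cut {(a, b) : proximity(a, b) >= lam}."""
    parent = {w: w for w in words}

    def find(w: str) -> str:
        # Follow parent links to the class representative, halving the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Reflexivity is implicit (each word starts in its own class), symmetry is
    # implicit (union is symmetric), and union-find supplies transitivity.
    for (a, b), degree in proximity.items():
        if degree >= lam:
            union(a, b)

    classes: Dict[str, List[str]] = {}
    for w in words:
        classes.setdefault(find(w), []).append(w)
    # Sort for a deterministic, easy-to-compare result.
    return sorted(sorted(c) for c in classes.values())


words = ["computer", "software", "program", "river", "lake", "music"]
proximity = {
    ("computer", "software"): 0.8,
    ("software", "program"): 0.7,
    ("computer", "program"): 0.5,
    ("river", "lake"): 0.9,
    ("lake", "music"): 0.1,
}
print(partition_words(words, proximity, 0.6))
# → [['computer', 'program', 'software'], ['lake', 'river'], ['music']]
```

With λ = 0.6, "computer" and "program" fall into the same class even though their direct degree is only 0.5, because both are close to "software"; raising λ to 0.85 leaves only {"lake", "river"} grouped. The λ-cut of the max-min transitive closure of the relation yields the same partition, so the union-find route avoids computing the full closure matrix.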
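As an illustration of entropy-based key word selection, the sketch below uses the standard information-gain criterion (the ID3-style reduction in class entropy when documents are split by the presence or absence of a word) as an assumed stand-in for the paper's entropy measure. Applying the same selection recursively to each subset yields a simple hierarchy of key words. The document representation (a set of words plus a class label), the function names, and the data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple, Union

# Hypothetical document representation: (set of words, class label).
Doc = Tuple[Set[str], str]
Tree = Union[str, Dict[str, object]]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of the class-label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs: List[Doc], word: str) -> float:
    """Reduction in class entropy from splitting `docs` on whether `word` occurs."""
    n = len(docs)
    with_w = [label for words, label in docs if word in words]
    without_w = [label for words, label in docs if word not in words]
    remainder = (len(with_w) / n) * entropy(with_w) + (len(without_w) / n) * entropy(without_w)
    return entropy([label for _, label in docs]) - remainder


def rank_keywords(docs: List[Doc], vocabulary: Iterable[str], top_k: int) -> List[str]:
    """Return the `top_k` words with the highest gain (ties broken alphabetically)."""
    return sorted(vocabulary, key=lambda w: (-information_gain(docs, w), w))[:top_k]


def build_hierarchy(docs: List[Doc], vocabulary: Set[str], min_gain: float = 1e-9) -> Tree:
    """Recursively pick the best key word and split; leaves are majority labels.
    Assumes `docs` is non-empty."""
    labels = [label for _, label in docs]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not vocabulary:
        return majority
    best = rank_keywords(docs, vocabulary, 1)[0]
    if information_gain(docs, best) <= min_gain:
        return majority
    rest = vocabulary - {best}
    present = [d for d in docs if best in d[0]]
    absent = [d for d in docs if best not in d[0]]
    return {"word": best,
            "present": build_hierarchy(present, rest, min_gain),
            "absent": build_hierarchy(absent, rest, min_gain)}


docs = [
    ({"goal", "match", "team"}, "sport"),
    ({"team", "coach", "league"}, "sport"),
    ({"vote", "election", "party"}, "politics"),
    ({"party", "minister", "vote"}, "politics"),
]
vocabulary = {w for words, _ in docs for w in words}
print(rank_keywords(docs, vocabulary, 3))  # → ['party', 'team', 'vote']
print(build_hierarchy(docs, vocabulary))
# → {'word': 'party', 'present': 'politics', 'absent': 'sport'}
```

A positive gain guarantees that both subsets are non-empty, so the recursion never receives an empty document list. Words that occur in only one document (such as "goal") separate the classes less well and score lower, about 0.311 bits here versus 1 bit for "party", "team", and "vote".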
Additional information
This work is supported by the National Natural Science Foundation of China (Grant No. 50263006), the Foundation of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (Grant No. HP2002-2), and the Yunnan Natural Science Foundation (Grant No. 2002F0063M).
LIU WeiYi graduated from Huazhong University of Science and Technology in 1976. He was a research fellow at the City University of Hong Kong. He is currently a professor in the Department of Computer Science at Yunnan University. His research interests include fuzzy systems and data and knowledge engineering. He is a member of the IEEE Computer Society.
SONG Ning received the M.S. degree from Kunming University of Science and Technology in 1993. She is an associate professor in the Department of Metallurgical Engineering. Her current research interests include data and knowledge engineering.
About this article
Cite this article
Liu, W., Song, N. A fuzzy approach to classification of text documents. J. Comput. Sci. & Technol. 18, 640–647 (2003). https://doi.org/10.1007/BF02947124
DOI: https://doi.org/10.1007/BF02947124