Abstract
This paper is a first attempt to formalize the get-specific document classification algorithm and to fully automate it through reasoning in a propositional concept language, without requiring user involvement or a training dataset. We follow a knowledge-centric approach and convert a natural language hierarchical classification into a formal classification, in which the labels are defined in the concept language. This allows us to encode the get-specific algorithm as a problem in the concept language. The reported experimental results provide evidence of the practical applicability of the proposed approach.
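To make the idea of classification by reasoning concrete, the following is a minimal sketch and not the authors' formalization. It assumes, purely for illustration, that every label and every document is a conjunction of atomic concepts (modeled as a frozenset), that a node's concept is the conjunction of the labels on its path from the root, and that "get-specific" means placing a document in the deepest nodes whose concept subsumes the document concept. The names Node, subsumed_by, and get_specific, and the example hierarchy, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    label: frozenset  # assumption: label concept as a conjunction of atoms
    children: list = field(default_factory=list)


def subsumed_by(doc: frozenset, concept: frozenset) -> bool:
    # For pure conjunctions, doc is subsumed by concept
    # exactly when every atom of concept also occurs in doc.
    return concept <= doc


def get_specific(root: Node, doc: frozenset) -> list:
    """Return the paths (as lists of node names) to the deepest matching nodes."""
    results = []

    def visit(node, path_concept, path):
        # Node concept = conjunction of all labels from the root down to here.
        concept = path_concept | node.label
        if not subsumed_by(doc, concept):
            return False
        here = path + [node.name]
        # Visit every child so that all matching branches are explored.
        deeper = [visit(child, concept, here) for child in node.children]
        if not any(deeper):
            # No child is a more specific match, so the document stays here.
            results.append(here)
        return True

    visit(root, frozenset(), [])
    return results


# Hypothetical toy hierarchy.
tree = Node("Top", frozenset(), [
    Node("Science", frozenset({"science"}), [
        Node("Computer Science", frozenset({"computing"})),
    ]),
    Node("Arts", frozenset({"arts"})),
])

print(get_specific(tree, frozenset({"science", "computing", "logic"})))
```

In this toy setting subsumption reduces to a subset test. A propositional concept language with disjunction and negation would instead need a satisfiability check, which this sketch does not attempt.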
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giunchiglia, F., Zaihrayeu, I., Kharkevich, U. (2007). Formalizing the Get-Specific Document Classification Algorithm. In: Kovács, L., Fuhr, N., Meghini, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2007. Lecture Notes in Computer Science, vol 4675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74851-9_3
DOI: https://doi.org/10.1007/978-3-540-74851-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74850-2
Online ISBN: 978-3-540-74851-9
eBook Packages: Computer Science, Computer Science (R0)