
Formalizing the Get-Specific Document Classification Algorithm

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4675)


Abstract

This paper is a first attempt to formalize the get-specific document classification algorithm and to fully automate it through reasoning in a propositional concept language, without requiring user involvement or a training dataset. We follow a knowledge-centric approach and convert a natural language hierarchical classification into a formal classification, in which the labels are defined in the concept language. This allows us to encode the get-specific algorithm as a reasoning problem in the concept language. The reported experimental results provide evidence of the practical applicability of the proposed approach.
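The abstract does not spell out the encoding, so the following is only a minimal sketch of one simplified reading of the get-specific idea, not the paper's formalization. It assumes that every concept is a plain conjunction of atomic propositions (modelled as a set of atoms), that a node's concept is the conjunction of the label concepts on the path from the root, and that a document is placed in the deepest nodes whose concept subsumes the document's concept. The names `Node`, `get_specific`, and the toy classification are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A classification node; `atoms` is the (assumed) concept of its label."""
    label: str
    atoms: frozenset
    children: list = field(default_factory=list)


def get_specific(node, doc_atoms, inherited=frozenset()):
    """Return labels of the deepest nodes whose concept subsumes the document.

    Assumption: a concept is a conjunction of atoms, so the node concept
    subsumes the document concept exactly when the node's atoms are a
    subset of the document's atoms.
    """
    # Concept at the node: conjunction of all label concepts from the root.
    concept = inherited | node.atoms
    if not concept <= doc_atoms:
        return []  # the document is not about this node
    # Prefer more specific nodes; fall back to this node if no child fits.
    deeper = [
        hit
        for child in node.children
        for hit in get_specific(child, doc_atoms, concept)
    ]
    return deeper or [node.label]


# Hypothetical toy classification, for illustration only.
root = Node("Top", frozenset(), [
    Node("Science", frozenset({"science"}), [
        Node("Computer Science", frozenset({"computing"})),
    ]),
    Node("Arts", frozenset({"art"})),
])

print(get_specific(root, frozenset({"science", "computing", "logic"})))
```

Under these assumptions the subset test stands in for the subsumption check that a propositional reasoner would perform; label concepts with disjunction or negation would need a real satisfiability check instead.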




Editor information

László Kovács, Norbert Fuhr, Carlo Meghini


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Giunchiglia, F., Zaihrayeu, I., Kharkevich, U. (2007). Formalizing the Get-Specific Document Classification Algorithm. In: Kovács, L., Fuhr, N., Meghini, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2007. Lecture Notes in Computer Science, vol 4675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74851-9_3


  • DOI: https://doi.org/10.1007/978-3-540-74851-9_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74850-2

  • Online ISBN: 978-3-540-74851-9

  • eBook Packages: Computer Science, Computer Science (R0)
