Your spoken paper cannot be the same as your written paper
  Read more: Museums and the Web 2011 (MW2011): Presentation Guidelines | conference.archimuse.com
Computational Linguistics in Museums: Applications for Cultural Datasets
  Judith Klavans, Susan Chun, Robert Stein, Raul Guerra
Computational Linguistics
  Language - Words, Words, Words
  • Use
  • Meaning
  • Syntax
  • Shape of words
  • Sounds
Applications
  • Speech synthesis – 1980’s Talking Machines for the Blind
  • Intelligent search – pre-Google
  • Finding names – who, what, where
  • Translation
  • Speech recognition
  • Answering Questions – What is Watson?
Domains for Computational Linguistics
  • Healthcare – interpreting patient records
  • Government – helping people find information
  • International Affairs – cross-language translation
  • Law – analyzing Enron scandal email
  • Marketing – opinions on products
  • Museums – analyzing text and tags associated with objects for better access
Computational Linguistics for Metadata Building +
Computational Linguistics in Museums: Applications for Cultural Datasets
  Judith Klavans, Susan Chun, Robert Stein, Raul Guerra
Interdisciplinary Research
  Computational Linguistics in Museums
Text, Tags, Trust
  • Funded in 2008 by IMLS
  • With the University of Maryland and a collaborative of museum partners
  • Studying the relationships between social tags, scholarly text and resources, and the application of trust networks to improve access to museum collections.
MW 2011 Contributions
  • Which Computational Linguistic tools can or should be applied to tags?
  • How do these tools impact tag analysis?
  • What results differ from the initial steve.museum results from Trant 2007?
  • So what – for CL?
  • So what – for Museums?
Hard Challenges
  • What do these words really mean?
  • How can tags be related to other tags?
      • across languages
      • across users
  • How are tags over museum objects related to tags over anything else?
  • How can they be used?
Finding a Needle in the Haystack
Gallery Label
  This canvas was the first one Gauguin painted during the two months he spent in Provence.... Gauguin had rebelled against Impressionism's reliance on the visible world, and he altered nature's shapes and colors to suggest his own more subjective reaction to the landscape.
  While the rural subject and acidic colors show the influence of van Gogh, this image is more indebted to Paul Cézanne. In his careful integration of the haystack and farm buildings, Gauguin has echoed Cézanne's emphasis on geometric form.
Tools for Tags
  • Morphological Analysis – Conflate when possible
      • Cats, cat
      • Haystacks, haystack
      • Painting, paint ?
  • What words are verbs, nouns, adjectives?
  • How should multi-word tags be handled?
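The conflation step on the "Tools for Tags" slide can be illustrated with a minimal sketch. This is a naive plural-stripping heuristic written for illustration only; it is not the morphological analyzer used in the project, and the function names `conflate` and `group_by_stem` are hypothetical. It deliberately leaves "painting" alone, since the slide marks "Painting, paint ?" as an open question.

```python
from collections import defaultdict


def conflate(tag: str) -> str:
    """Map a tag to a rough base form by stripping common English plural endings.

    Assumption: a toy heuristic, not a real morphological analyzer.
    """
    t = tag.strip().lower()
    if len(t) > 3 and t.endswith("ies"):
        return t[:-3] + "y"            # ladies -> lady
    if len(t) > 3 and t.endswith(("ches", "shes", "xes", "sses")):
        return t[:-2]                  # boxes -> box, dresses -> dress
    if len(t) > 2 and t.endswith("s") and not t.endswith(("ss", "us", "is")):
        return t[:-1]                  # cats -> cat
    return t                           # everything else is left unchanged


def group_by_stem(tags):
    """Group raw tags under their conflated form."""
    groups = defaultdict(set)
    for tag in tags:
        groups[conflate(tag)].add(tag)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_stem(["Cats", "cat", "Haystacks", "haystack", "Painting"]))
```

Whether a form such as "painting" should be reduced to "paint" depends on its part of speech, which is why the slide pairs morphological analysis with the question of which words are verbs, nouns, or adjectives.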
Raw Tags or Tokens
Results
  25%   93%   68%
1. NN=25205
2. JJ=6319
3. NNS=4041
4. NN_NN=2257
5. JJ_NN=1792
6. VBG=1043
7. VBN=727
8. NP=708
9. OD_NN=454
10. JJ_NNS=413
Top 10 POS Patterns:
  1. NN=6706
  2. NN_NN=1713
  3. JJ_NN=1194
  4. JJ=921
  5. NNS=757
  6. JJ_NNS=303
  7. NN_NNS=300
  8. VBG=238
  9. NP=209
  10. VBN_NN=202
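Pattern counts of this kind (for example NN_NN or JJ_NN) could be tallied roughly as in the sketch below. It assumes a tiny hand-made lexicon, `TOY_LEXICON`, in place of a real part-of-speech tagger, and it assumes unknown words default to NN; neither assumption comes from the slides, and the example tags are invented.

```python
from collections import Counter

# Hypothetical word-to-tag lexicon standing in for a real POS tagger.
TOY_LEXICON = {
    "haystack": "NN",
    "farm": "NN",
    "house": "NN",
    "fields": "NNS",
    "yellow": "JJ",
    "rural": "JJ",
}


def pos_pattern(tag, lexicon=TOY_LEXICON, default="NN"):
    """Join the POS tag of each word in a (possibly multi-word) tag with '_'."""
    return "_".join(lexicon.get(tok, default) for tok in tag.lower().split())


def top_patterns(tags, n=10):
    """Count POS patterns over all tags and return the n most frequent."""
    return Counter(pos_pattern(t) for t in tags).most_common(n)


if __name__ == "__main__":
    sample = ["haystack", "farm", "yellow haystack", "farm house", "fields"]
    for pattern, count in top_patterns(sample):
        print(f"{pattern}={count}")
```

Treating a multi-word tag as one pattern, rather than tagging each word in isolation, is what makes combined entries such as JJ_NN appear in a ranking.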
Hard Challenges
  • What do these words really mean?
  • How can tags be related to other tags?
      • across languages
      • across users
  • How are tags over museum objects related to tags over anything else?
  • How can they be used?
Why Part of Speech?
  • Integral to most language processing pipelines
  • Precursor to parsing.
  • However, for social tags, parsing is not a meaningful step.
  • Research: Understand the nature of this kind of descriptive tagging.
  • Link part of speech information with other lexical resources for disambiguation
You shall know a word by the company it keeps. J.R. Firth
  Gold   Orange   Necklace   Ripe
What About “New England”
  • Idioms / lexicalized phrases are more difficult
  • Heuristic comparison to Wikipedia Titles matched 46% (30% distinct) of multiword tags
  • E.g. “Grapes of Wrath”, “Irish Wolfhound”, “Franco-Prussian War”
  *Klavans and Golbeck, 2010
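The heuristic comparison against Wikipedia titles can be sketched as a set lookup. The title list here is a small hypothetical sample, not a real Wikipedia dump, and reading "46% (30% distinct)" as an occurrence-level rate plus a distinct-tag rate is an assumption about the slide, not something it states. The names `normalize` and `match_rates` are illustrative.

```python
def normalize(text):
    """Lowercase, turn underscores into spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def match_rates(tags, titles):
    """Return (share of multiword tag occurrences matched, share of distinct multiword tags matched)."""
    title_set = {normalize(t) for t in titles}
    # Keep only tags with more than one word after normalization.
    multi = [t for t in (normalize(tag) for tag in tags) if " " in t]
    if not multi:
        return 0.0, 0.0
    matched = [t for t in multi if t in title_set]
    return len(matched) / len(multi), len(set(matched)) / len(set(multi))


if __name__ == "__main__":
    # Hypothetical sample titles and tags, for illustration only.
    titles = ["Grapes of Wrath", "Irish Wolfhound", "Franco-Prussian War"]
    tags = ["Grapes of Wrath", "grapes of wrath", "Irish Wolfhound",
            "blue sky thing", "haystack"]
    by_occurrence, by_distinct = match_rates(tags, titles)
    print(f"{by_occurrence:.0%} of occurrences, {by_distinct:.0%} of distinct tags")
```

An exact-match lookup like this treats a matched phrase as a single lexical unit, which is one way to keep “New England” from being split into “new” and “England”.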

MW2011: Klavans, J. +, Computational Linguistics in Museums: Applications for Cultural Datasets

  • 27. Wish List - Better ways to tame the proliferation of rich but “noisy” content
      • Clustering over tags for similarity
      • Clustering over tags and terms from text
      • Matching over existing terms to identify meaningful units
      • Apply machine learning techniques to guess meaning
      • Bigrams, Trigrams, Thesauri, Corpus Analysis
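As one possible reading of "clustering over tags for similarity", the sketch below groups tags by the overlap of their character bigrams (Jaccard similarity) using a greedy single pass. The similarity measure, the 0.5 threshold, and the function names are all assumptions chosen for illustration; the wish list does not specify a method.

```python
def char_bigrams(tag):
    """Set of adjacent character pairs in the lowercased tag."""
    t = tag.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(tags, threshold=0.5):
    """Greedy clustering: a tag joins the first cluster holding any member
    at or above the similarity threshold, otherwise it starts a new cluster."""
    clusters = []
    for tag in tags:
        grams = char_bigrams(tag)
        for members in clusters:
            if any(jaccard(grams, char_bigrams(m)) >= threshold for m in members):
                members.append(tag)
                break
        else:
            clusters.append([tag])
    return clusters


if __name__ == "__main__":
    print(cluster(["haystack", "haystacks", "hay stack", "gold", "golden"]))
```

Surface similarity of this kind only catches spelling variants; grouping tags by meaning would need the thesauri or corpus analysis the wish list also names.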
  • 28. Acknowledgements
      • Steve.museum project members
      • T3 and steve.museum museum partners
      • University of Maryland, T3 group
      • IMA Museum …
      • …and other participants

Editor's Notes

  1. Take this seriously.
  2. In presenting this paper, start with something not in the paper.
  3. Still need to finish
  4. Words, words, words.