MW2011: Klavans, J. +, Computational Linguistics in Museums: Applications for Cultural Datasets

Your spoken paper cannot be the same as your written paper.
Read more: Museums and the Web 2011 (MW2011): Presentation Guidelines | conference.archimuse.com

Computational Linguistics in Museums: Applications for Cultural Datasets
Judith Klavans, Susan Chun, Robert Stein, Raul Guerra

Computational Linguistics
Language - Words, Words, Words
- Use
- Meaning
- Syntax
- Shape of words
- Sounds

Applications
- Speech synthesis - 1980s talking machines for the blind
- Intelligent search - pre-Google
- Finding names - who, what, where
- Translation
- Speech recognition
- Answering questions - What is Watson?

Domains for Computational Linguistics
- Healthcare - interpreting patient records
- Government - helping people find information
- International affairs - cross-language translation
- Law - analyzing Enron scandal email
- Marketing - opinions on products
- Museums - analyzing text and tags associated with objects for better access

Computational Linguistics for Metadata Building +

Interdisciplinary Research: Computational Linguistics in Museums

Text, Tags, Trust
- Funded in 2008 by IMLS
- With the University of Maryland and a collaborative of museum partners
- Studying the relationships between social tags, scholarly text and resources, and the application of trust networks to improve access to museum collections

MW 2011 Contributions
- Which computational linguistic tools can or should be applied to tags?
- How do these tools impact tag analysis?
- What results differ from the initial steve.museum results from Trant 2007?
- So what - for CL?
- So what - for museums?

Hard Challenges
- What do these words really mean?
- How can tags be related to other tags?
  - across languages
  - across users
- How are tags over museum objects related to tags over anything else?
- How can they be used?

Finding a Needle in the Haystack

Gallery Label
This canvas was the first one Gauguin painted during the two months he spent in Provence.... Gauguin had rebelled against Impressionism's reliance on the visible world, and he altered nature's shapes and colors to suggest his own more subjective reaction to the landscape.
While the rural subject and acidic colors show the influence of van Gogh, this image is more indebted to Paul Cézanne. In his careful integration of the haystack and farm buildings, Gauguin has echoed Cézanne's emphasis on geometric form.

Tools for Tags
- Morphological analysis - conflate when possible
  - Cats, cat
  - Haystacks, haystack
  - Painting, paint?
- What words are verbs, nouns, adjectives?
- How should multi-word tags be handled?
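A minimal sketch of what "conflate when possible" could look like in Python. The suffix rules and the function names `naive_singular` and `conflate` are illustrative assumptions, not the tools used in the project; a real morphological analyzer would handle far more cases.

```python
from collections import defaultdict


def naive_singular(tag: str) -> str:
    """Lowercase a tag and strip a plural ending using crude suffix rules.

    This is an assumed heuristic for illustration only; it will mis-handle
    many words (for example, singular nouns that happen to end in "s").
    """
    word = tag.lower().strip()
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def conflate(tags):
    """Group raw tags under their naively singularized form."""
    groups = defaultdict(list)
    for tag in tags:
        groups[naive_singular(tag)].append(tag)
    return dict(groups)


# The example pairs from the slide.
groups = conflate(["Cats", "cat", "Haystacks", "haystack", "Painting", "paint"])
for base, members in groups.items():
    print(base, members)
```

Under these rules "Cats" and "cat" share a group, as do "Haystacks" and "haystack", while "Painting" and "paint" stay apart. That matches the slide's question mark: merging them would require deciding whether "painting" is a noun or a verb form.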

1. NN=25205
2. JJ=6319
3. NNS=4041
4. NN_NN=2257
5. JJ_NN=1792
6. VBG=1043
7. VBN=727
8. NP=708
9. OD_NN=454
10. JJ_NNS=413

Top 10 POS Patterns:
1. NN=6706
2. NN_NN=1713
3. JJ_NN=1194
4. JJ=921
5. NNS=757
6. JJ_NNS=303
7. NN_NNS=300
8. VBG=238
9. NP=209
10. VBN_NN=202
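Pattern counts like those above can be produced by joining the part-of-speech labels of each tag's words with an underscore and tallying the results. The sketch below assumes the tags have already been run through a POS tagger; the sample data and the names `tagged_tags` and `pos_pattern` are hypothetical.

```python
from collections import Counter

# Hypothetical pre-tagged input: each tag is a list of (word, POS) pairs.
tagged_tags = [
    [("haystack", "NN")],
    [("farm", "NN"), ("buildings", "NNS")],
    [("green", "JJ"), ("field", "NN")],
    [("landscape", "NN")],
    [("yellow", "JJ")],
    [("blue", "JJ"), ("sky", "NN")],
]


def pos_pattern(tagged_tag):
    """Join the POS labels of a (possibly multi-word) tag, e.g. JJ_NN."""
    return "_".join(pos for _, pos in tagged_tag)


# Tally how often each pattern occurs and keep the ten most frequent.
pattern_counts = Counter(pos_pattern(tag) for tag in tagged_tags)
top_patterns = pattern_counts.most_common(10)

for rank, (pattern, count) in enumerate(top_patterns, start=1):
    print(f"{rank}. {pattern}={count}")
```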

How can they be used?

Why Part of Speech?
- Integral to most language processing pipelines
- However, for social tags, parsing is not a meaningful step.
- Research: understand the nature of this kind of descriptive tagging.

Link part-of-speech information with other lexical resources for disambiguation
"You shall know a word by the company it keeps." (J.R. Firth)
Gold, Orange, Necklace, Ripe
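One way to read the Firth quotation in a tagging setting is to look at which other tags appear on the same object as an ambiguous tag. The sketch below assumes that "orange" is the ambiguous word in the slide's example and counts its co-occurring tags; the objects, the extra tags "fruit" and "ring", and the function name `company_of` are hypothetical.

```python
from collections import Counter

# Hypothetical data: each set holds the tags users gave one museum object.
object_tags = [
    {"orange", "gold", "necklace"},
    {"orange", "ripe", "fruit"},
    {"orange", "gold", "ring"},
]


def company_of(target, tag_sets):
    """Count the tags that co-occur with `target` on the same object."""
    company = Counter()
    for tags in tag_sets:
        if target in tags:
            company.update(tags - {target})
    return company


company = company_of("orange", object_tags)
print(company.most_common())
```

Neighbors such as "gold" and "necklace" would point toward the color reading, while "ripe" would point toward the fruit reading; how those neighbor sets are then linked to a lexical resource is left open here.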

What About “New England”?
- Idioms / lexicalized phrases are more difficult
- Heuristic comparison to Wikipedia titles matched 46% (30% distinct) of multiword tags*
- E.g. “Grapes of Wrath”, “Irish Wolfhound”, “Franco-Prussian War”
*Klavans and Golbeck, 2010
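The heuristic comparison can be sketched as a lookup of normalized multiword tags in a set of titles. This sketch assumes that the two reported figures correspond to a rate over all multiword tag occurrences and a rate over distinct multiword tags; the title set is a small hypothetical stand-in for a real Wikipedia title list, and `match_multiword_tags` is an illustrative name.

```python
# Hypothetical stand-in for a full list of Wikipedia article titles.
wikipedia_titles = {
    "Grapes of Wrath",
    "Irish Wolfhound",
    "Franco-Prussian War",
    "New England",
}


def normalize(text):
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(text.lower().split())


def match_multiword_tags(tags, titles):
    """Return match rates for multiword tags, by occurrence and by distinct tag."""
    title_index = {normalize(title) for title in titles}
    multiword = [normalize(tag) for tag in tags if len(tag.split()) > 1]
    matched = [tag for tag in multiword if tag in title_index]
    distinct = set(multiword)
    distinct_matched = distinct & title_index
    return {
        "token_rate": len(matched) / len(multiword) if multiword else 0.0,
        "distinct_rate": len(distinct_matched) / len(distinct) if distinct else 0.0,
    }


# Hypothetical tag sample; single-word tags are ignored by the matcher.
tags = ["new england", "New England", "irish wolfhound", "blue sky", "red barn", "haystack"]
rates = match_multiword_tags(tags, wikipedia_titles)
print(rates)
```

A match only indicates that a phrase is a plausible lexicalized unit; phrases such as "blue sky" that miss the title list still need separate handling.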
