Article

Collaborative multimodal photo annotation over digital paper

Authors:

Paulo Barthelmess, Philip Cohen

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces

Pages 4-11

https://doi.org/10.1145/1180995.1181000

Published: 02 November 2006

Abstract

The availability of metadata annotations over media content such as photos is known to enhance retrieval and organization, particularly for large data sets. The greatest challenge in obtaining annotations remains getting users to perform the large amount of tedious manual work that is required. In this paper we introduce an approach for semi-automated labeling based on extracting metadata from naturally occurring conversations among groups of people discussing pictures. As the burden of structuring and extracting metadata shifts from users to the system, new recognition challenges arise. We explore how multimodal language can help in 1) detecting a concise set of meaningful labels to be associated with each photo, 2) achieving robust recognition of these key semantic terms, and 3) facilitating label propagation via multimodal shortcuts. Analysis of data from a preliminary pilot collection suggests that handwritten labels may be highly indicative of the semantics of each photo, as indicated by the correlation between handwritten terms and frequently spoken ones. We point to initial directions in exploring a multimodal fusion technique that recovers robust spelling and pronunciation of these high-value terms from redundant speech and handwriting.
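The pilot observation that handwritten labels tend to coincide with frequently spoken terms can be illustrated with a small Python sketch. It assumes speech and handwriting have already been recognized into plain word lists; the stopword list, the function names, and the linear blend of handwriting confidence with spoken frequency are all hypothetical choices made for illustration. This is not the authors' fusion technique, which recovers spelling and pronunciation from redundant speech and handwriting rather than comparing word counts.

```python
from collections import Counter

# Hypothetical stopword list (an assumption); a real system would need a fuller one.
STOPWORDS = {"a", "an", "and", "at", "i", "in", "is", "it", "of",
             "the", "this", "to", "was", "we"}


def spoken_term_frequencies(transcript_words):
    """Count content words spoken while a photo is discussed (case-insensitive)."""
    return Counter(w.lower() for w in transcript_words
                   if w.lower() not in STOPWORDS)


def handwriting_in_top_spoken(handwritten, transcript_words, k=5):
    """Fraction of handwritten labels found among the k most frequent spoken words."""
    labels = [h.lower() for h in handwritten]
    if not labels:
        return 0.0
    top_k = {w for w, _ in spoken_term_frequencies(transcript_words).most_common(k)}
    return sum(h in top_k for h in labels) / len(labels)


def fuse_label(hw_nbest, transcript_words, weight=0.5):
    """Choose the handwriting hypothesis best supported by the speech.

    hw_nbest: list of (word, confidence) pairs from a handwriting recognizer,
    with confidences assumed to lie in [0, 1]. `weight` trades recognizer
    confidence against relative spoken frequency (a hypothetical linear blend).
    """
    freqs = spoken_term_frequencies(transcript_words)
    total = sum(freqs.values()) or 1  # avoid division by zero when nothing was said

    def score(item):
        word, conf = item
        return (1 - weight) * conf + weight * freqs[word.lower()] / total

    return max(hw_nbest, key=score)[0]


if __name__ == "__main__":
    transcript = ("this is mark at the lake we went to the lake "
                  "with mark and the kids the lake was cold").split()
    print(handwriting_in_top_spoken(["Lake", "Mark"], transcript, k=2))  # 1.0
    print(fuse_label([("lane", 0.55), ("lake", 0.45)], transcript))     # lake
```

With `weight=0`, `fuse_label` simply trusts the handwriting recognizer and returns "lane"; a positive weight lets repeated spoken mentions overturn a low-confidence handwriting reading, which is the intuition behind exploiting redundancy across modalities.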

Published In


ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces

November 2006

404 pages

ISBN: 159593541X

DOI: 10.1145/1180995

General Chairs: Francis Quek (Virginia Tech, USA), Jie Yang (Carnegie Mellon University, USA)

Program Chairs: Dominic Massaro (University of California, Santa Cruz, USA), Abeer Alwan (University of California, Los Angeles, USA), Timothy J. Hazen (Massachusetts Institute of Technology, USA)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICMI06: 8th International Conference on Multimodal Interfaces 2006

November 2-4, 2006

Banff, Alberta, Canada

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Bibliometrics & Citations

Article Metrics

Total Citations: 7

Total Downloads: 535

Downloads (Last 12 months): 5

Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Jan 2025

Cited By

Cherubini M, Gutierrez A, de Oliveira R, Oliver N, Mynatt E, Fitzpatrick G, Hudson S, Edwards K, Rodden T (2010). Social tagging revamped. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 985-994. doi:10.1145/1753326.1753473. Online publication date: 10-Apr-2010.
https://dl.acm.org/doi/10.1145/1753326.1753473

Malacria S, Lecolinet E, Brangier É, Michel G, Bastien J, Carbonell N (2008). Espace de caractérisation du stylo numérique [Characterization space of the digital pen]. Proceedings of the 20th Conference on l'Interaction Homme-Machine, pp. 177-184. doi:10.1145/1512714.1512749. Online publication date: 2-Sep-2008.
https://dl.acm.org/doi/10.1145/1512714.1512749

Kaiser E, Barthelmess P, Kaiser E (2007). Cross-domain matching for automatic tag extraction across redundant handwriting and speech events. Proceedings of the 2007 workshop on Tagging, mining and retrieval of human related activity information, pp. 55-62. doi:10.1145/1330588.1330597. Online publication date: 15-Nov-2007.
https://dl.acm.org/doi/10.1145/1330588.1330597

Barthelmess P, Kaiser E, McGee D, Mase K, Massaro D, Takeda K, Roy D, Potamianos A (2007). Toward content-aware multimodal tagging of personal photo collections. Proceedings of the 9th international conference on Multimodal interfaces, pp. 122-125. doi:10.1145/1322192.1322215. Online publication date: 12-Nov-2007.
https://dl.acm.org/doi/10.1145/1322192.1322215

Kaiser E, Barthelmess P, Erdmann C, Cohen P, Rosson M, Gilmore D (2007). Multimodal redundancy across handwriting and speech during computer mediated human-human interactions. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1009-1018. doi:10.1145/1240624.1240778. Online publication date: 29-Apr-2007.
https://dl.acm.org/doi/10.1145/1240624.1240778

Barthelmess P, Kaiser E, Huang X, McGee D, Cohen P, Quek F, Yang J, Massaro D, Alwan A, Hazen T (2006). Collaborative multimodal photo annotation over digital paper. Proceedings of the 8th international conference on Multimodal interfaces, pp. 131-132. doi:10.1145/1180995.1181023. Online publication date: 2-Nov-2006.
https://dl.acm.org/doi/10.1145/1180995.1181023

Barthelmess P, Kaiser E, Lunsford R, McGee D, Cohen P, Oviatt S, Gatica-Perez D, Jaimes A, Sebe N (2006). Human-centered collaborative interaction. Proceedings of the 1st ACM international workshop on Human-centered multimedia, pp. 1-8. doi:10.1145/1178745.1178747. Online publication date: 27-Oct-2006.
https://dl.acm.org/doi/10.1145/1178745.1178747
