DOI: 10.1145/1180995.1181000

Collaborative multimodal photo annotation over digital paper

Published: 02 November 2006

Abstract

The availability of metadata annotations over media content such as photos is known to enhance retrieval and organization, particularly for large data sets. The greatest challenge for obtaining annotations remains getting users to perform the large amount of tedious manual work that is required. In this paper we introduce an approach for semi-automated labeling based on extraction of metadata from naturally occurring conversations of groups of people discussing pictures among themselves. As the burden for structuring and extracting metadata is shifted from users to the system, new recognition challenges arise. We explore how multimodal language can help in 1) detecting a concise set of meaningful labels to be associated with each photo, 2) achieving robust recognition of these key semantic terms, and 3) facilitating label propagation via multimodal shortcuts. Analysis of data from a preliminary pilot collection suggests that handwritten labels may be highly indicative of the semantics of each photo, as shown by the correlation of handwritten terms with high-frequency spoken ones. We point to initial directions exploring a multimodal fusion technique to recover robust spelling and pronunciation of these high-value terms from redundant speech and handwriting.
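
The relationship between handwritten labels and high-frequency spoken terms can be illustrated with a small sketch. This is not the authors' analysis or system: the function names `top_spoken_terms` and `label_overlap`, the stop-word list, the `top_k` cutoff, and the sample transcript are all assumptions made for illustration. The sketch simply measures what fraction of a photo's handwritten labels also appear among the most frequent words spoken while that photo was discussed.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "at", "this", "that",
              "was", "we", "in", "of", "to", "it"}


def top_spoken_terms(transcript, top_k=5):
    """Return the top_k most frequent non-stop-word terms in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_k)]


def label_overlap(handwritten, transcript, top_k=5):
    """Fraction of handwritten labels found among the top_k spoken terms."""
    if not handwritten:
        return 0.0
    frequent = set(top_spoken_terms(transcript, top_k))
    hits = sum(1 for label in handwritten if label.lower() in frequent)
    return hits / len(handwritten)


# Invented example data for a single photo.
transcript = ("This is Maria at the beach. Maria loved the beach that summer. "
              "We went to the beach in Oregon.")
print(top_spoken_terms(transcript, top_k=2))
print(label_overlap(["Maria", "beach"], transcript, top_k=2))
```

A high overlap for a photo would be consistent with the abstract's observation that handwritten terms tend to coincide with frequently spoken ones. Exact string matching is the simplest possible choice here; it ignores recognition errors in both speech and handwriting, which is the problem the proposed multimodal fusion is meant to address.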

Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006
404 pages
ISBN: 159593541X
DOI: 10.1145/1180995
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic label extraction
  2. collaborative interaction
  3. intelligent interfaces
  4. multimodal processing
  5. photo annotation

Qualifiers

  • Article

Conference

ICMI06
Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Cited By

  • (2010) Social tagging revamped. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 985-994. DOI: 10.1145/1753326.1753473. Online publication date: 10-Apr-2010.
  • (2008) Espace de caractérisation du stylo numérique. Proceedings of the 20th Conference on l'Interaction Homme-Machine, pp. 177-184. DOI: 10.1145/1512714.1512749. Online publication date: 2-Sep-2008.
  • (2007) Cross-domain matching for automatic tag extraction across redundant handwriting and speech events. Proceedings of the 2007 workshop on Tagging, mining and retrieval of human related activity information, pp. 55-62. DOI: 10.1145/1330588.1330597. Online publication date: 15-Nov-2007.
  • (2007) Toward content-aware multimodal tagging of personal photo collections. Proceedings of the 9th international conference on Multimodal interfaces, pp. 122-125. DOI: 10.1145/1322192.1322215. Online publication date: 12-Nov-2007.
  • (2007) Multimodal redundancy across handwriting and speech during computer mediated human-human interactions. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1009-1018. DOI: 10.1145/1240624.1240778. Online publication date: 29-Apr-2007.
  • (2006) Collaborative multimodal photo annotation over digital paper. Proceedings of the 8th international conference on Multimodal interfaces, pp. 131-132. DOI: 10.1145/1180995.1181023. Online publication date: 2-Nov-2006.
  • (2006) Human-centered collaborative interaction. Proceedings of the 1st ACM international workshop on Human-centered multimedia, pp. 1-8. DOI: 10.1145/1178745.1178747. Online publication date: 27-Oct-2006.
