DOI: 10.1007/11677482_11
Article

The “FAME” interactive space

Published: 11 July 2005

Abstract

This paper describes the “FAME” multi-modal demonstrator, which integrates multiple communication modes – vision, speech and object manipulation – by combining the physical and virtual worlds to provide support for multi-cultural or multi-lingual communication and problem solving.
The major challenges are the automatic perception of human actions and the understanding of dialogs between people from different cultural or linguistic backgrounds. The system acts as an information butler that demonstrates context awareness using computer vision, speech and dialog modeling. The integrated computer-enhanced human-to-human communication has been publicly demonstrated at FORUM2004 in Barcelona and at IST2004 in The Hague.
Specifically, the “Interactive Space” described here features an “Augmented Table” for multi-cultural interaction, which allows several users to simultaneously perform multi-modal, cross-lingual retrieval of audio-visual documents previously recorded by an “Intelligent Cameraman” during a week-long seminar.

Published In

MLMI'05: Proceedings of the Second International Conference on Machine Learning for Multimodal Interaction
July 2005
487 pages
ISBN: 3540325492
Editors: Steve Renals, Samy Bengio

Sponsors

• NIST: National Institute of Standards and Technology
• SNSF: Swiss National Science Foundation
• ESSI/ESPRIT: European Commission

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

• (2011) A just-in-time document retrieval system for dialogues or monologues. Proceedings of the SIGDIAL 2011 Conference, pp. 350-352. DOI: 10.5555/2132890.2132935. Online publication date: 17-Jun-2011
• (2011) A speech-based just-in-time retrieval system using semantic search. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations, pp. 80-85. DOI: 10.5555/2002440.2002454. Online publication date: 21-Jun-2011
• (2010) The ACLD. Proceedings of the 2010 international workshop on Searching spontaneous conversational speech, pp. 45-48. DOI: 10.1145/1878101.1878111. Online publication date: 29-Oct-2010
• (2006) Meta-user interfaces for ambient spaces. Proceedings of the 5th international conference on Task models and diagrams for users interface design, pp. 1-15. DOI: 10.5555/1756988.1756990. Online publication date: 23-Oct-2006
• (2006) A lightweight speech detection system for perceptive environments. Proceedings of the Third international conference on Machine Learning for Multimodal Interaction, pp. 336-345. DOI: 10.1007/11965152_30. Online publication date: 1-May-2006
