DOI: 10.1145/3264869
AVSU'18: Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia
Publisher: Association for Computing Machinery, New York, NY, United States
Conference: MM '18: ACM Multimedia Conference, Seoul, Republic of Korea, 26 October 2018
ISBN: 978-1-4503-5977-1
Published: 26 October 2018
Abstract

It is our great pleasure to welcome you to the 2018 ACM Multimedia Workshop on Audio-Visual Scene Understanding for Immersive Multimedia (AVSU 2018). Audio-visual data is the most familiar form of multimedia information acquired in our daily lives, yet audio and video processing have long been pursued in separate research communities, ignoring the synergy between the two modalities. Integrated audio-visual processing, building on leading research in each domain, has the potential to drive significant advances in immersive multimedia production and reproduction. This workshop aims to provide a forum for exchanging ideas on scene understanding techniques developed in the audio and visual communities, and ultimately to unlock the creative potential of joint audio-visual signal processing to deliver a step change in a wide range of multimedia applications.

This workshop follows two successful UK-Korea Focal Point Workshops on Deep Audio-Visual Representation Learning for Multimedia Perception and Reproduction. The first was held in the UK in conjunction with CVMP 2017, where three demo systems and four talks were presented to an audience of 40. The second was held in South Korea in early 2018, with 60 attendees and seven talks by invited speakers, including the CTO of G'Audio Lab in the USA.

SESSION: Keynote Talk 1
keynote
Multimodal Fusion Strategies: Human vs. Machine

A two-hour movie, or a short clip taken from it, is intended to capture and present a meaningful (or significant) story in video to be recognized and understood by a human audience. What if we substitute the task of the human audience with that of an ...

SESSION: Oral Session 1
research-article
An Audio-Visual Method for Room Boundary Estimation and Material Recognition

In applications such as virtual and augmented reality, a plausible and coherent audio-visual reproduction can be achieved by deeply understanding the reference scene acoustics. This requires knowledge of the scene geometry and related materials. In this ...

research-article
A Deep Learning-based Stress Detection Algorithm with Speech Signal

In this paper, we propose a deep learning-based psychological stress detection algorithm using speech signals. With increasing demands for communication between humans and intelligent systems, automatic stress detection is becoming an interesting ...

SESSION: Keynote Talk 2
keynote
Spatial Audio on the Web - Create, Compress, and Render

The recent surge of VR and AR has spawned an interest in spatial audio beyond its traditional delivery over loudspeakers in, e.g., home theater environments, to headphone delivery over, e.g., mobile devices. In this talk we'll discuss a web-based ...
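For readers unfamiliar with headphone delivery in the browser: modern browsers already expose HRTF-based binaural panning through the standard Web Audio API. The short sketch below illustrates only that standard API; it is an assumption added for illustration, not the create/compress/render pipeline presented in this talk.

// Minimal sketch: binaural (headphone) rendering of one source with the
// standard Web Audio API. Illustrative only; not the system from the talk.
const ctx = new AudioContext();
const source = ctx.createBufferSource(); // any decoded AudioBuffer works here

const panner = new PannerNode(ctx, {
  panningModel: "HRTF",      // head-related transfer function -> binaural output
  distanceModel: "inverse",  // attenuate level with distance
  positionX: 2,              // two metres to the listener's right
  positionY: 0,
  positionZ: -1,             // slightly in front of the listener
});

source.connect(panner).connect(ctx.destination);
source.start();              // plays silence until a buffer is assigned

For head-tracked playback, the listener's position and orientation can be updated each frame via ctx.listener.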

SESSION: Oral Session 2
research-article
Public Access
Generation Method for Immersive Bullet-Time Video Using an Omnidirectional Camera in VR Platform

This paper proposes a method for generating immersive bullet-time video that continuously switches among images captured by multi-viewpoint omnidirectional cameras arranged around the subject. In ordinary bullet-time processing, it is possible to observe a ...

research-article
Audio-Visual Attention Networks for Emotion Recognition

We present spatiotemporal attention-based multimodal deep neural networks for dimensional emotion recognition in multimodal audio-visual video sequences. To learn the temporal attention that discriminatively focuses on emotionally salient parts within ...

research-article
Towards Realistic Immersive Audiovisual Simulations for Hearing Research: Capture, Virtual Scenes and Reproduction

Most current hearing research laboratories and hearing aid evaluation setups are not sufficient to simulate real-life situations and to evaluate future generations of hearing aids that might include gaze information and brain signals. Thus, new ...

Contributors
  • University of Surrey
  • Yonsei University
  • University of Southampton
  • Yonsei University