Research Article

Syncing Shared Multimedia through Audiovisual Bimodal Segmentation

Published: 01 July 2015

Abstract

This work stems from the particular character of contemporary social media storytelling, in which multiple users and publishing channels capture and share public events, experiences, and places. Multichannel presentation and visualization mechanisms are pursued along with novel audiovisual mixing (such as time-delay-compensation enhancement, perceptual mixing, quality-based content selection, linking to context-aware metadata, and propagating multimedia semantics), thus promoting multimodal social media editing, processing, and authoring. Although exploiting multiple time-based media (audio and video) that describe the same event can significantly enhance content, difficulties in detecting and temporally synchronizing multimedia events must be overcome. In many cases, events can be identified from audio features alone, enabling an initial, cost-effective annotation of the multimedia content. This article introduces a new audio-driven approach for the temporal alignment and management of shared audiovisual streams, presents the theoretical framework, and demonstrates the methodology in real-world scenarios. This article is part of a special issue on social multimedia and storytelling.



Published In

IEEE MultiMedia, Volume 22, Issue 3 (July–Sept. 2015), 90 pages

Publisher: IEEE Computer Society Press, Washington, DC, United States
