- research-article, July 2024
Multi‐modal video search by examples—A video quality impact analysis
- Guanfeng Wu,
- Abbas Haider,
- Xing Tian,
- Erfan Loweimi,
- Chi Ho Chan,
- Mengjie Qian,
- Awan Muhammad,
- Ivor Spence,
- Rob Cooper,
- Wing W. Y. Ng,
- Josef Kittler,
- Mark Gales,
- Hui Wang
Abstract: As video content proliferates and many video archives lack suitable metadata, video retrieval, particularly through example‐based search, has become increasingly crucial. Existing metadata often fails to meet the ...
The proliferation of video content with inadequate metadata has made multi‐modal example‐based video retrieval essential. The Multi‐modal Video Search by Examples (MVSE) framework employs advanced techniques to prioritise accuracy, efficiency, and user ...
- short-paper, August 2024
A Middleware Architecture for Enhancing Multimedia Flows with High-Level Semantic Information
- Jose Matheus Carvalho Boaro,
- Polyana Bezerra da Costa,
- Daniel de Sousa Moraes,
- Pedro Thiago Cutrim dos Santos,
- João G. Ribeiro,
- Julio Cesar Duarte,
- Alberto Sardinha,
- Sérgio Colcher
IMXw '24: Proceedings of the 2024 ACM International Conference on Interactive Media Experiences Workshops, Pages 1–6, https://doi.org/10.1145/3672406.3672407
Traditional multimedia systems were primarily concerned with coding media types, mainly for enabling their efficient storage, real-time communication, and preserving the temporal relationships present within (and between) these media. However, current ...
- research-article, January 2024
Context‐aware relation enhancement and similarity reasoning for image‐text retrieval
Abstract: Image‐text retrieval is a fundamental yet challenging task, which aims to bridge a semantic gap between heterogeneous data to achieve precise measurements of semantic similarity. The technique of fine‐grained alignment between cross‐modal ...
A novel context‐aware relation enhancement and similarity reasoning model is proposed to achieve precise image‐text retrieval, which conducts both intra‐modal relation enhancement and inter‐modal similarity reasoning while considering the global‐context ...
- research-article, October 2023
A Comparative Analysis of Sensor-, Geometry-, and Neural-Based Methods for Food Volume Estimation
MADiMa '23: Proceedings of the 8th International Workshop on Multimedia Assisted Dietary Management, Pages 21–29, https://doi.org/10.1145/3607828.3617794
With the rapid advancements in artificial intelligence and computer vision within health and nutrition fields, image-based automatic dietary assessment is gaining popularity. This automation involves food segmentation, recognition, volume estimation, and ...
- research-article, October 2023
StableNet: Distinguishing the hard samples to overcome language priors in visual question answering
Abstract: With the booming fields of computer vision and natural language processing, cross‐modal intersections such as visual question answering (VQA) have become very popular. However, several studies have shown that many VQA models suffer from severe ...
The authors found that some more complex questions cause instability in the visual question answering model. For this reason, metrics are designed to measure the questions’ complexity and the model’s stability, and incorporated the weights into the loss ...
- keynote, October 2023
Transition and Adaptability: The Cornerstone of Resilience in Future Networked Multimedia Systems and Beyond
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 5–6, https://doi.org/10.1145/3581783.3613912
Let us define transition as the "exchange" between two mechanisms with comparable functionality, but with different algorithms and implementation concepts, which are optimal depending on the respective conditions of the respective context. It is much ...
- research-article, February 2023
MCR: Multilayer cross‐fusion with reconstructor for multimodal abstractive summarisation
Abstract: Multimodal abstractive summarisation (MAS) aims to generate a textual summary from multimodal data collection, such as video‐text pairs. Despite the success of recent work, the existing methods lack a thorough analysis for consistency across ...
We propose a novel MCR model for the video‐containing multimodal abstractive summarisation task, aiming to model the thoroughly consistent and complementary semantics in multimodal data. We design the cross‐fusion module implemented by the cross‐modal ...
- short-paper, October 2021
ZoomSense: A Scalable Infrastructure for Augmenting Zoom
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 3771–3774, https://doi.org/10.1145/3474085.3478332
We have seen a dramatic increase in the adoption of teleconferencing systems such as Zoom for remote teaching and working. Although designed primarily for traditional video conferencing scenarios, these platforms are actually being deployed in many ...
- research-article, December 2020
Teaching Cultural Heritage through a Narrative-based Game
Journal on Computing and Cultural Heritage (JOCCH), Volume 13, Issue 4, Article No.: 27, Pages 1–28, https://doi.org/10.1145/3414833
Games are used in various learning situations and domains, among which is cultural heritage. Storytelling is used in games regarding cultural places, but it often takes a simple form. Thus, the authors’ aim is to investigate the possibility to ...
- research-article, February 2019
Evaluating Digital Cultural Heritage ‘In the Wild’: The Case For Reflexivity
Journal on Computing and Cultural Heritage (JOCCH), Volume 12, Issue 1, Article No.: 5, Pages 1–15, https://doi.org/10.1145/3287272
Digital heritage interpretation is often untethered from traditional museological techniques and environments. As museums and heritage sites explore the potential of locative technologies and ever more sophisticated content-triggering mechanisms for use ...
- research-article, October 2018
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1976–1983, https://doi.org/10.1145/3240508.3241911
Speechreading or lipreading is the technique of understanding and getting phonetic features from a speaker's visual features such as movement of lips, face, teeth and tongue. It has a wide range of multimedia applications such as in surveillance, ...
- research-article, October 2017
Video Annotation by Cascading Microtasks: a Crowdsourcing Approach
WebMedia '17: Proceedings of the 23rd Brazillian Symposium on Multimedia and the Web, Pages 49–56, https://doi.org/10.1145/3126858.3126897
This paper presents a general approach to perform crowdsourcing video annotation without requiring trained workers nor experts. It consists of dividing complex annotation tasks into simple and small microtasks and cascading them to generate a final ...
- research-article, August 2017
Method for unconstrained text detection in natural scene image
IET Computer Vision (CVI2), Volume 11, Issue 7, Pages 596–604, https://doi.org/10.1049/iet-cvi.2016.0452
Text detection in natural scene images is an important prerequisite for many content‐based multimedia understanding applications. The authors present a simple and effective text detection method in natural scene image. Firstly, MSERs are extracted by the ...
- research-article, July 2017
Parametric Shape Grammar Formalism for Moorish Geometric Design Analysis and Generation
Journal on Computing and Cultural Heritage (JOCCH), Volume 10, Issue 4, Article No.: 19, Pages 1–20, https://doi.org/10.1145/3064419
The goal of this article is to propose a modeling method to automatically generate original and new forms of periodic Moorish geometric patterns. The proposed method is based on the symmetry-based approach and the shape grammar formalism. The symmetry-...
- research-article, October 2016
Multimedia and Medicine: Teammates for Better Disease Detection and Survival
- Michael Riegler,
- Mathias Lux,
- Carsten Griwodz,
- Concetto Spampinato,
- Thomas de Lange,
- Sigrun L. Eskeland,
- Konstantin Pogorelov,
- Wallapak Tavanapong,
- Peter T. Schmidt,
- Cathal Gurrin,
- Dag Johansen,
- Håvard Johansen,
- Pål Halvorsen
MM '16: Proceedings of the 24th ACM international conference on Multimedia, Pages 968–977, https://doi.org/10.1145/2964284.2976760
Health care has a long history of adopting technology to save lives and improve the quality of living. Visual information is frequently applied for disease detection and assessment, and the established fields of computer vision and medical imaging ...
- research-article, May 2016
Turbo automatic speech recognition
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 24, Issue 5, Pages 846–862, https://doi.org/10.1109/TASLP.2016.2520364
Performance of automatic speech recognition (ASR) systems can significantly be improved by integrating further sources of information such as additional modalities, or acoustic channels, or acoustic models. Given the arising problem of information ...
- research-article, March 2016
Implementing a privacy‐enhanced attribute‐based credential system for online social networks with co‐ownership management
IET Information Security (ISE2), Volume 10, Issue 2, Pages 60–68, https://doi.org/10.1049/iet-ifs.2014.0466
Online social network (OSN) users are exhibiting an increased privacy‐protective behaviour especially since multimedia sharing has emerged as a popular activity over most OSN sites. Popular OSN applications could reveal much of the users’ personal ...
- research-article, November 2015
Secure and Error Resilient Approach for Multimedia Data Transmission in Constrained Networks
Q2SWinet '15: Proceedings of the 11th ACM Symposium on QoS and Security for Wireless and Mobile Networks, Pages 149–156, https://doi.org/10.1145/2815317.2815332
This paper addresses the general issue of securely transmitting captured images within constrained networks, as those of Wireless Multimedia Sensor Networks (WMSN)s. In particular, it focuses on handling transmission errors due either to unreliable ...
- article, August 2015
The challenge of promoting algorithmic thinking of both sciences- and humanities-oriented learners
Journal of Computer Assisted Learning (JOCAL), Volume 31, Issue 4, Pages 287–299, https://doi.org/10.1111/jcal.12070
The research results we present in this paper reveal that properly calibrated e-learning tools have potential to effectively promote the algorithmic thinking of both science-oriented and humanities-oriented students. After students had watched an ...
- survey, March 2015
When Location Meets Social Multimedia: A Survey on Vision-Based Recognition and Mining for Geo-Social Multimedia Analytics
ACM Transactions on Intelligent Systems and Technology (TIST), Volume 6, Issue 1, Article No.: 1, Pages 1–18, https://doi.org/10.1145/2597181
Coming with the popularity of multimedia sharing platforms such as Facebook and Flickr, recent years have witnessed an explosive growth of geographical tags on social multimedia content. This trend enables a wide variety of emerging applications, for ...