Search results for keyword: multimedia systems

research-article

Open Access

Multi‐modal video search by examples—A video quality impact analysis

IET Computer Vision (CVI2), Volume 18, Issue 7, Pages 1017–1033, https://doi.org/10.1049/cvi2.12303

Abstract

As video content continues to proliferate and many video archives lack suitable metadata, video retrieval, particularly through example‐based search, has become increasingly crucial. Existing metadata often fails to meet the ...

The proliferation of video content with inadequate metadata has made multi‐modal example‐based video retrieval essential. The Multi‐modal Video Search by Examples (MVSE) framework employs advanced techniques to prioritise accuracy, efficiency, and user ...

short-paper

A Middleware Architecture for Enhancing Multimedia Flows with High-Level Semantic Information

IMXw '24: Proceedings of the 2024 ACM International Conference on Interactive Media Experiences Workshops, Pages 1–6, https://doi.org/10.1145/3672406.3672407

Traditional multimedia systems were primarily concerned with coding media types, mainly for enabling their efficient storage, real-time communication, and preserving the temporal relationships present within (and between) these media. However, current ...

research-article

Open Access

Context‐aware relation enhancement and similarity reasoning for image‐text retrieval

IET Computer Vision (CVI2), Volume 18, Issue 5, Pages 652–665, https://doi.org/10.1049/cvi2.12270

Abstract

Image‐text retrieval is a fundamental yet challenging task that aims to bridge the semantic gap between heterogeneous data to achieve precise measurement of semantic similarity. The technique of fine‐grained alignment between cross‐modal ...

A novel context‐aware relation enhancement and similarity reasoning model is proposed to achieve precise image‐text retrieval, which conducts both intra‐modal relation enhancement and inter‐modal similarity reasoning while considering the global‐context ...

research-article

Open Access

A Comparative Analysis of Sensor-, Geometry-, and Neural-Based Methods for Food Volume Estimation

MADiMa '23: Proceedings of the 8th International Workshop on Multimedia Assisted Dietary Management, Pages 21–29, https://doi.org/10.1145/3607828.3617794

With the rapid advancements in artificial intelligence and computer vision within health and nutrition fields, image-based automatic dietary assessment is gaining popularity. This automation involves food segmentation, recognition, volume estimation, and ...

research-article

Open Access

StableNet: Distinguishing the hard samples to overcome language priors in visual question answering

IET Computer Vision (CVI2), Volume 18, Issue 2, Pages 315–327, https://doi.org/10.1049/cvi2.12249

Abstract

With the booming fields of computer vision and natural language processing, cross‐modal intersections such as visual question answering (VQA) have become very popular. However, several studies have shown that many VQA models suffer from severe ...

The authors found that some more complex questions cause instability in the visual question answering model. For this reason, they design metrics to measure the questions’ complexity and the model’s stability, and incorporate the weights into the loss ...

keynote

Transition and Adaptability: The Cornerstone of Resilience in Future Networked Multimedia Systems and Beyond

Ralf Steinmetz

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 5–6, https://doi.org/10.1145/3581783.3613912

Let us define transition as the "exchange" between two mechanisms with comparable functionality, but with different algorithms and implementation concepts, which are optimal depending on the respective conditions of the respective context. It is much ...

research-article

Open Access

MCR: Multilayer cross‐fusion with reconstructor for multimodal abstractive summarisation

IET Computer Vision (CVI2), Volume 17, Issue 4, Pages 389–403, https://doi.org/10.1049/cvi2.12173

Abstract

Multimodal abstractive summarisation (MAS) aims to generate a textual summary from a multimodal data collection, such as video‐text pairs. Despite the success of recent work, existing methods lack a thorough analysis of consistency across ...

We propose a novel MCR model for the video‐containing multimodal abstractive summarisation task, aiming to thoroughly model the consistent and complementary semantics in multimodal data. We design the cross‐fusion module implemented by the cross‐modal ...

short-paper

Open Access

ZoomSense: A Scalable Infrastructure for Augmenting Zoom

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 3771–3774, https://doi.org/10.1145/3474085.3478332

We have seen a dramatic increase in the adoption of teleconferencing systems such as Zoom for remote teaching and working. Although designed primarily for traditional video conferencing scenarios, these platforms are actually being deployed in many ...

research-article

Teaching Cultural Heritage through a Narrative-based Game

Journal on Computing and Cultural Heritage (JOCCH), Volume 13, Issue 4, Article No.: 27, Pages 1–28, https://doi.org/10.1145/3414833

Games are used in various learning situations and domains, among which is cultural heritage. Storytelling is used in games regarding cultural places, but it often takes a simple form. Thus, the authors’ aim is to investigate the possibility to ...

research-article

Evaluating Digital Cultural Heritage ‘In the Wild’: The Case For Reflexivity

Journal on Computing and Cultural Heritage (JOCCH), Volume 12, Issue 1, Article No.: 5, Pages 1–15, https://doi.org/10.1145/3287272

Digital heritage interpretation is often untethered from traditional museological techniques and environments. As museums and heritage sites explore the potential of locative technologies and ever more sophisticated content-triggering mechanisms for use ...

research-article

Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1976–1983, https://doi.org/10.1145/3240508.3241911

Speechreading or lipreading is the technique of understanding speech and extracting phonetic features from a speaker's visual features, such as the movement of the lips, face, teeth and tongue. It has a wide range of multimedia applications such as in surveillance, ...

research-article

Video Annotation by Cascading Microtasks: a Crowdsourcing Approach

WebMedia '17: Proceedings of the 23rd Brazillian Symposium on Multimedia and the Web, Pages 49–56, https://doi.org/10.1145/3126858.3126897

This paper presents a general approach to performing crowdsourced video annotation without requiring trained workers or experts. It consists of dividing complex annotation tasks into simple, small microtasks and cascading them to generate a final ...

research-article

Method for unconstrained text detection in natural scene image

IET Computer Vision (CVI2), Volume 11, Issue 7, Pages 596–604, https://doi.org/10.1049/iet-cvi.2016.0452

Text detection in natural scene images is an important prerequisite for many content‐based multimedia understanding applications. The authors present a simple and effective text detection method for natural scene images. Firstly, MSERs are extracted by the ...

research-article

Parametric Shape Grammar Formalism for Moorish Geometric Design Analysis and Generation

Journal on Computing and Cultural Heritage (JOCCH), Volume 10, Issue 4, Article No.: 19, Pages 1–20, https://doi.org/10.1145/3064419

The goal of this article is to propose a modeling method to automatically generate original and new forms of periodic Moorish geometric patterns. The proposed method is based on the symmetry-based approach and the shape grammar formalism. The symmetry-...

research-article

Open Access

Multimedia and Medicine: Teammates for Better Disease Detection and Survival

MM '16: Proceedings of the 24th ACM international conference on Multimedia, Pages 968–977, https://doi.org/10.1145/2964284.2976760

Health care has a long history of adopting technology to save lives and improve the quality of living. Visual information is frequently applied for disease detection and assessment, and the established fields of computer vision and medical imaging ...

research-article

Turbo automatic speech recognition

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 24, Issue 5, Pages 846–862, https://doi.org/10.1109/TASLP.2016.2520364

The performance of automatic speech recognition (ASR) systems can be significantly improved by integrating further sources of information such as additional modalities, acoustic channels, or acoustic models. Given the arising problem of information ...

research-article

Implementing a privacy‐enhanced attribute‐based credential system for online social networks with co‐ownership management

IET Information Security (ISE2), Volume 10, Issue 2, Pages 60–68, https://doi.org/10.1049/iet-ifs.2014.0466

Online social network (OSN) users are exhibiting increasingly privacy‐protective behaviour, especially since multimedia sharing has emerged as a popular activity on most OSN sites. Popular OSN applications could reveal much of the users’ personal ...

research-article

Secure and Error Resilient Approach for Multimedia Data Transmission in Constrained Networks

Q2SWinet '15: Proceedings of the 11th ACM Symposium on QoS and Security for Wireless and Mobile Networks, Pages 149–156, https://doi.org/10.1145/2815317.2815332

This paper addresses the general issue of securely transmitting captured images within constrained networks, such as Wireless Multimedia Sensor Networks (WMSNs). In particular, it focuses on handling transmission errors due either to unreliable ...

article

The challenge of promoting algorithmic thinking of both sciences- and humanities-oriented learners

Z. Katai

Journal of Computer Assisted Learning (JOCAL), Volume 31, Issue 4, Pages 287–299, https://doi.org/10.1111/jcal.12070

The research results we present in this paper reveal that properly calibrated e-learning tools have the potential to effectively promote the algorithmic thinking of both science-oriented and humanities-oriented students. After students had watched an ...

survey

When Location Meets Social Multimedia: A Survey on Vision-Based Recognition and Mining for Geo-Social Multimedia Analytics

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 6, Issue 1, Article No.: 1, Pages 1–18, https://doi.org/10.1145/2597181

With the popularity of multimedia sharing platforms such as Facebook and Flickr, recent years have witnessed an explosive growth of geographical tags on social multimedia content. This trend enables a wide variety of emerging applications, for ...
