MMM 2024, Amsterdam, The Netherlands - Part IV
- Stevan Rudinac, Alan Hanjalic, Cynthia C. S. Liem, Marcel Worring, Björn Þór Jónsson, Bei Liu, Yoko Yamakata:
MultiMedia Modeling - 30th International Conference, MMM 2024, Amsterdam, The Netherlands, January 29 - February 2, 2024, Proceedings, Part IV. Lecture Notes in Computer Science 14557, Springer 2024, ISBN 978-3-031-53301-3
FMM: Special Session on Foundation Models for Multimedia
- Jun Wu, Mingxin He, Yang Liu, Jingjie Lin, Zeyu Huang, Dayong Ding:
Removing Stray-Light for Wild-Field Fundus Image Fusion Based on Large Generative Models. 3-16
- Yuma Honbu, Keiji Yanai:
Training-Free Region Prediction with Stable Diffusion. 17-31
- Lei Wang, Jiabang He, Shenshen Li, Ning Liu, Ee-Peng Lim:
Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites. 32-45
- Can Zhang, Zhiqiang Wang, Yuan Zhang, Xuanya Li, Kai Hu:
GDTNet: A Synergistic Dilated Transformer and CNN by Gate Attention for Abdominal Multi-organ Segmentation. 46-57
- Xinyue Liu, Gang Yang, Yang Zhou, Yajie Yang, Weichen Huang, Dayong Ding, Jun Wu:
Fine-Grained Multi-modal Fundus Image Generation Based on Diffusion Models for Glaucoma Classification. 58-70
- Lantao Wang, Chao Ma:
Adapting Pretrained Large-Scale Vision Models for Face Forgery Detection. 71-85
ICDAR: Special Session on Intelligent Cross-Data Analysis and Retrieval
- Fuyang Yu, Zhen Wang, Dongyuan Li, Peide Zhu, Xiaohui Liang, Xiaochuan Wang, Manabu Okumura:
Towards Cross-Modal Point Cloud Retrieval for Indoor Scenes. 89-102
- Nhat-Hao Pham, Khanh-Linh Vo, Mai Anh Vu, Thu Nguyen, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen:
Correlation Visualization Under Missing Values: A Comparison Between Imputation and Direct Parameter Estimation Methods. 103-116
- Rupayan Mallick, Jenny Benois-Pineau, Akka Zemmari:
IFI: Interpreting for Improving: A Multimodal Transformer with an Interpretability Technique for Recognition of Risk Events. 117-131
- Kha-Luan Pham, Minh-Khoi Nguyen-Nhat, Anh-Huy Dinh, Quang-Tri Le, Manh-Thien Nguyen, Anh-Duy Tran, Minh-Triet Tran, Duc-Tien Dang-Nguyen:
Ookpik- A Collection of Out-of-Context Image-Caption Pairs. 132-144
- Viet-Tham Huynh, Trong-Thuan Nguyen, Quang-Thuc Nguyen, Mai-Khiem Tran, Tam V. Nguyen, Minh-Triet Tran:
LUMOS-DM: Landscape-Based Multimodal Scene Retrieval Enhanced by Diffusion Model. 145-158
XR-MACCI: Special Session on eXtended Reality and Multimedia - Advancing Content Creation and Interaction
- Helmut Neuschmied, Werner Bailer:
Mining Landmark Images for Scene Reconstruction from Weakly Annotated Video Collections. 161-174
- Panagiotis Vrachnos, Marios Krestenitis, Ilias Koulalis, Konstantinos Ioannidis, Stefanos Vrochidis:
A Framework for 3D Modeling of Construction Sites Using Aerial Imagery and Semantic NeRFs. 175-187
- Maria Pegia, Björn Þór Jónsson, Anastasia Moumtzidou, Sotiris Diplaris, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris:
Multimodal 3D Object Retrieval. 188-201
- Ioannis Kontostathis, Evlampios Apostolidis, Vasileios Mezaris:
An Integrated System for Spatio-temporal Summarization of 360-Degrees Videos. 202-215
Brave New Ideas
- Mingliang Liang, Zhouran Liu, Martha A. Larson:
Mutant Texts: A Technique for Uncovering Unexpected Inconsistencies in Large-Scale Vision-Language Models. 219-233
- Rômulo Vieira, Débora C. Muchaluat-Saade, Pablo César:
Exploring Artificial Intelligence for Advancing Performance Processes and Events in Io3MT. 234-248
Demonstrations
- Masatoshi Hamanaka:
Implementation of Melody Slot Machines. 251-257
- Faiga Alawad, Pål Halvorsen, Michael A. Riegler:
E2Evideo: End to End Video and Image Pre-processing and Analysis Tool. 258-264
- Loris Sauter, Tim Bachmann, Heiko Schuldt, Luca Rossetto:
Augmented Reality Photo Presentation and Content-Based Image Retrieval on Mobile Devices with AR-Explorer. 265-270
- Evlampios Apostolidis, Konstantinos Apostolidis, Vasileios Mezaris:
Facilitating the Production of Well-Tailored Video Summaries for Sharing on Social Media. 271-278
- Mehdi Houshmand Sarkhoosh, Sayed Mohammad Majidi Dorcheh, Cise Midoglu, Saeed Shafiee Sabet, Tomas Kupka, Dag Johansen, Michael A. Riegler, Pål Halvorsen:
AI-Based Cropping of Soccer Videos for Different Social Media Representations. 279-287
- Werner Bailer, Mihai Dogariu, Bogdan Ionescu, Hannes Fassold:
Few-Shot Object Detection as a Service: Facilitating Training and Deployment for Domain Experts. 288-294
- Boyu Xu, Ghazaleh Tanhaei, Lynda Hardman, Wolfgang Hürst:
DatAR: Supporting Neuroscience Literature Exploration by Finding Relations Between Topics in Augmented Reality. 295-300
- Tengteng Dong, Fangyuan Liu, Xinke Wang, Yishun Jiang, Xiwei Zhang, Xiao Sun:
EmoAda: A Multimodal Emotion Interaction and Psychological Adaptation System. 301-307
Video Browser Showdown
- Takayuki Hori, Kazuya Ueki, Yuma Suzuki, Hiroki Takushima, Hayato Tanoue, Haruki Sato, Takumi Takada, Aiswariya Manoj Kumar:
Waseda_Meisei_SoftBank at Video Browser Showdown 2024. 311-316
- Florian Spiess, Luca Rossetto, Heiko Schuldt:
Exploring Multimedia Vector Spaces with vitrivr-VR. 317-323
- Ralph Gasser, Rahel Arnold, Fynn Faber, Heiko Schuldt, Raphael Waltenspül, Luca Rossetto:
A New Retrieval Engine for Vitrivr. 324-331
- Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, Claudio Vairo:
VISIONE 5.0: Enhanced User Interface and AI Models for VBS2024. 332-339
- Jakub Lokoc, Zuzana Vopálková, Michael Stroh, Raphael Buchmueller, Udo Schlegel:
PraK Tool: An Interactive Search Tool Based on Video Data Services. 340-346
- Omar Shahbaz Khan, Hongyi Zhu, Ujjwal Sharma, Evangelos Kanoulas, Stevan Rudinac, Björn Þór Jónsson:
Exquisitor at the Video Browser Showdown 2024: Relevance Feedback Meets Conversational Search. 347-355
- Nick Pantelidis, Maria Pegia, Damianos Galanopoulos, Konstantinos Apostolidis, Klearchos Stavrothanasopoulos, Anastasia Moumtzidou, Konstantinos Gkountakos, Ilias Gialampoukidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris, Björn Þór Jónsson:
VERGE in VBS 2024. 356-363
- Konstantin Schall, Nico Hezel, Kai Uwe Barthel, Klaus Jung:
Optimizing the Interactive Video Retrieval Tool Vibro for the Video Browser Showdown 2024. 364-371
- Klaus Schoeffmann, Sahar Nasirihaghighi:
DiveXplore at the Video Browser Showdown 2024. 372-379
- Zhixin Ma, Jiaxin Wu, Chong Wah Ngo:
Leveraging LLMs and Generative Models for Interactive Known-Item Video Search. 380-386
- Guihe Gu, Zhengqian Wu, Jiangshan He, Lin Song, Zhongyuan Wang, Chao Liang:
TalkSee: Interactive Video Retrieval Engine Using Large Language Model. 387-393
- Thao-Nhu Nguyen, Le Minh Quang, Graham Healy, Binh T. Nguyen, Cathal Gurrin:
VideoCLIP 2.0: An Interactive CLIP-Based Video Retrieval System for Novice Users at VBS2024. 394-399
- Gia-Huy Vuong, Van-Son Ho, Tien-Thanh Nguyen-Dang, Xuan-Dang Thai, Tu-Khiem Le, Minh-Khoi Pham, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran:
ViewsInsight: Enhancing Video Retrieval for VBS 2024 with a User-Friendly Interaction Mechanism. 400-406