DOI: 10.1145/3539618.3591930
research-article
Open access

Multimodal Neural Databases

Published: 18 July 2023

Abstract

The rise in loosely-structured data available through text, images, and other modalities has called for new ways of querying them. Multimedia Information Retrieval has filled this gap and has witnessed exciting progress in recent years. Tasks such as search and retrieval of extensive multimedia archives have undergone massive performance improvements, driven to a large extent by recent developments in multimodal deep learning. However, methods in this field remain limited in the kinds of queries they support; in particular, they are unable to answer database-like queries. For this reason, inspired by recent work on neural databases, we propose a new framework, which we name Multimodal Neural Databases (MMNDBs). MMNDBs can answer complex database-like queries that involve reasoning over different input modalities, such as text and images, at scale. In this paper, we present the first architecture able to fulfill this set of requirements and test it with several baselines, showing the limitations of currently available models. The results show the potential of these new techniques to process unstructured data coming from different modalities, paving the way for future research in the area.
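To make the target query type concrete, the sketch below shows how one database-like aggregate, e.g. "how many photos show a red car?", could be approximated with an off-the-shelf CLIP dual encoder: score every image against the textual predicate and count the matches. This is an illustrative baseline only, not the architecture proposed in the paper; the checkpoint name, the similarity threshold, and the count_matching_images helper are hypothetical choices made for this example.

    # Illustrative sketch, not the paper's MMNDB architecture: approximate a
    # count-style, database-like query over an image collection with a CLIP
    # dual encoder. Checkpoint, prompt, and threshold are assumptions.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def count_matching_images(image_paths, predicate, threshold=0.25):
        """Approximate SELECT COUNT(*) over images satisfying a text predicate."""
        images = [Image.open(p).convert("RGB") for p in image_paths]
        inputs = processor(text=[predicate], images=images,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model(**inputs)
        # Cosine similarity between each image embedding and the predicate text.
        sims = torch.nn.functional.cosine_similarity(
            out.image_embeds, out.text_embeds, dim=-1)
        # Images whose similarity clears the hand-picked threshold count as matches.
        return int((sims > threshold).sum().item())

    # Usage: count_matching_images(["a.jpg", "b.jpg"], "a photo of a red car")

A full MMNDB targets richer queries than this single-predicate count, combining retrieval with reasoning over multiple modalities at scale, which is the gap the paper identifies.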

Supplemental Material

MP4 File
PRESENTATION VIDEO - An overview of Multimodal Neural Databases (MMNDBs): the motivation for expressive, database-like queries over growing multimedia collections; how MMNDBs combine large multimodal models, multimedia information retrieval, and database query processing; and a discussion of their feasibility, the proposed architecture, and directions for future research, illustrated with real-world examples.


Information

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2023

Author Tags

  1. databases
  2. multimedia information retrieval
  3. neural networks

Qualifiers

  • Research-article

Funding Sources

  • Progetti di Rilevante Interesse Nazionale
  • MUR National Recovery and Resilience Plan
  • EC H2020 RIA

Conference

SIGIR '23

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 448
  • Downloads (last 6 weeks): 49
Reflects downloads up to 13 Jan 2025


Cited By

  • (2024) IR-RAG @ SIGIR24: Information Retrieval's Role in RAG Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3036-3039. DOI: 10.1145/3626772.3657984. Online publication date: 10-Jul-2024.
  • (2024) The Power of Noise: Redefining Retrieval for RAG Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 719-729. DOI: 10.1145/3626772.3657834. Online publication date: 10-Jul-2024.
  • (2024) Sparse Vicious Attacks on Graph Neural Networks. IEEE Transactions on Artificial Intelligence, 5(5), 2293-2303. DOI: 10.1109/TAI.2023.3319306. Online publication date: May-2024.
  • (2024) The Impact of Source-Target Node Distance on Vicious Adversarial Attacks in Social Network Recommendation Systems. Advances on Graph-Based Approaches in Information Retrieval, 73-87. DOI: 10.1007/978-3-031-71382-8_6. Online publication date: 10-Oct-2024.
  • (2024) TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model. Computers Helping People with Special Needs, 285-291. DOI: 10.1007/978-3-031-62849-8_35. Online publication date: 8-Jul-2024.
  • (2023) MaaSDB: Spatial Databases in the Era of Large Language Models (Vision Paper). Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, 1-4. DOI: 10.1145/3589132.3625597. Online publication date: 13-Nov-2023.
  • (2023) Prompt-to-OS (P2OS): Revolutionizing Operating Systems and Human-Computer Interaction with Integrated AI Generative Models. 2023 IEEE 5th International Conference on Cognitive Machine Intelligence (CogMI), 128-134. DOI: 10.1109/CogMI58952.2023.00027. Online publication date: 1-Nov-2023.
