Search results for keyword: multimodal embedding

Showing 1 - 11 of 11 Results

research-article

Open Access

EASE: Learning Lightweight Semantic Feature Adapters from Large Language Models for CTR Prediction

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 4819–4827, https://doi.org/10.1145/3627673.3680048

Recent studies highlight the potential of large language models (LLMs) to enhance content integration in recommender systems by leveraging their semantic understanding capabilities. However, directly incorporating LLMs into an online inference pipeline ...

abstract

Embedding Formulae and Text for Improved Math Retrieval

Behrooz Mansouri

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Page 2700, https://doi.org/10.1145/3404835.3463264

Large data collections containing millions of math formulae are available online. Retrieving math expressions from these collections is challenging. The structural complexity of formulae requires specialized processing. When searching for mathematical ...

research-article

Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 839–848, https://doi.org/10.1145/3404835.3462881

We study the task of conversational fashion image retrieval via multiturn natural language feedback. Most previous studies are based on single-turn settings. Existing models on multiturn conversational fashion image retrieval have limitations, such as ...

research-article

CBVMR: Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint

ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pages 353–361, https://doi.org/10.1145/3206025.3206046

Up to now, only limited research has been conducted on crossmodal retrieval of suitable music for a specified video or vice versa. Moreover, much of the existing research relies on metadata such as keywords, tags, or description that must be ...

research-article

Public Access

Urbanity: A System for Interactive Exploration of Urban Dynamics from Streaming Human Sensing Data

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Pages 2503–2506, https://doi.org/10.1145/3132847.3133177

With the urbanization process worldwide, modeling the dynamics of people's activities in urban environments has become a crucial socioeconomic task. We present Urbanity, a novel system that leverages geo-tagged social media streams for modeling urban ...

research-article

Public Access

Detecting Culture-specific Tags for News Videos through Multimodal Embedding

Thematic Workshops '17: Proceedings of the on Thematic Workshops of ACM Multimedia 2017, Pages 68–74, https://doi.org/10.1145/3126686.3126754

Many videos on the Web about international events are maintained in different countries, and some come with text descriptions from different cultural points of view. We introduce a new task, detecting culture-specific tags for news videos: given video ...

research-article

Public Access

ReAct: Online Multimodal Embedding for Recency-Aware Spatiotemporal Activity Modeling

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 245–254, https://doi.org/10.1145/3077136.3080814

Spatiotemporal activity modeling is an important task for applications like tour recommendation and place search. The recently developed geographical topic models have demonstrated compelling results in using geo-tagged social media (GTSM) for ...

research-article

Public Access

TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Pages 595–604, https://doi.org/10.1145/3097983.3098027

Detecting local events (e.g., protest, disaster) at their onsets is an important task for a wide spectrum of applications, ranging from disaster control to crime monitoring and place recommendation. Recent years have witnessed growing interest in ...

short-paper

Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking

ICMR '17: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, Pages 416–419, https://doi.org/10.1145/3078971.3079038

Continuous multimodal representations suitable for multimodal information retrieval are usually obtained with methods that heavily rely on multimodal autoencoders. In video hyperlinking, a task that aims at retrieving video segments, the state of the ...

short-paper

Attention-based LSTM with Semantic Consistency for Videos Captioning

MM '16: Proceedings of the 24th ACM international conference on Multimedia, Pages 357–361, https://doi.org/10.1145/2964284.2967242

Recent progress in using Long Short-Term Memory (LSTM) for image description has motivated the exploration of their applications for automatically describing video content with natural language sentences. By taking a video as a sequence of features, ...

research-article

Public Access

Deep Visual-Semantic Hashing for Cross-Modal Retrieval

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Pages 1445–1454, https://doi.org/10.1145/2939672.2939812

Due to the storage and retrieval efficiency, hashing has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval. Cross-modal hashing, which enables efficient retrieval of images in response to text queries or vice ...
