LANISTR: Multimodal learning from structured and unstructured data
Google Research
LANISTR is a new framework that enables multimodal learning by ingesting unstructured (image, text) and structured (time series, tabular) data.
4 months ago
Multi-modal transformer architecture for medical image analysis and automated report generation
Nature
Medical practitioners examine medical images, such as X-rays, write reports based on the findings, and provide conclusive statements.
1 month ago
Cross-modal retrieval based on multi-dimensional feature fusion hashing
Frontiers
With the continued advancement and popularization of information network technology, multi-modal data, including texts, images, videos, and audio,...
4 months ago
Building Advanced Video Search: Frame Search Versus Multi-Modal Embeddings
HackerNoon
Imagine a data scientist studying wildlife behavior, analyzing hundreds of hours of video footage from cameras in a remote forest.
3 months ago
Build a contextual text and image search engine for product recommendations using Amazon Bedrock and Amazon OpenSearch Serverless
Amazon Web Services
In this post, we show how to build a contextual text and image search engine for product recommendations using the Amazon Titan Multimodal Embeddings model.
6 months ago
DeepASD: a deep adversarial-regularized graph learning method for ASD diagnosis with multimodal data
Nature
We propose DeepASD, an end-to-end trainable regularized graph learning method for ASD prediction, which incorporates heterogeneous multimodal data and latent...
1 month ago
CLIP Model and The Importance of Multimodal Embeddings
Towards Data Science
CLIP, which stands for Contrastive Language-Image Pretraining, is a deep learning model developed by OpenAI in 2021.
10 months ago
Fig. 2: Mixture of embedding experts (MEE) model that computes...
ResearchGate
Joint understanding of video and language is an active research area with many applications. Prior work in this domain typically relies on learning...
9 months ago
Fused multi-modal similarity network as prior in guiding brain imaging genetic association
Frontiers
In this study, we explore a new data-driven prior knowledge that captures the subject-level similarity by fusing multi-modal similarity networks.
17 months ago
Multimodal CNN-DDI: using multimodal CNN for drug to drug interaction associated events
Nature
Our Multi-Modal Convolutional Neural Network - Drug to Drug Interaction (MCNN-DDI) model achieved an accuracy of 90.00% and an AUPR of 94.78%.
7 months ago