# multi-modal

Here are 302 public repositories matching this topic...

Jakob-L-M / multi-modal-document-search

This repository provides a Streamlit application that lets a user upload a screenshot, which is then queried against a database of PDF documents. Both the image structure and any included text are used to find matching documents in a user-defined set.

multi-modal ocr-recognition embedding-vectors streamlit vector-database

Updated Dec 28, 2023
Python

Pruthvi-Sanghavi / air_water_land_surveillance_bot

Repository for an air, water, and land surveillance robot developed as part of the DRDO Robotics and Unmanned Systems Exposition.

quadcopter surveillance arduino-uno multi-modal differential-drive-robot

Updated Feb 10, 2021

znreza / Autoencoder-Comprehensive-Study

This report contains a comprehensive study on unsupervised feature learning using various types of autoencoders.

deep-neural-networks deep-learning autoencoder multi-modal temporal-data 3d-deep-learning denoise-autoencoder

Updated Feb 11, 2019

colurw / temporal_CNN

Time-series forecasting of market price data using a multi-modal Convolutional Neural Network

numpy pandas multi-modal time-series-forecasting tensorflow2

Updated Jun 6, 2024
Jupyter Notebook

XingchenZOU / Awesome-Multimodal-Urban-Computing

A professional list of multi-modal data fusion models and key datasets for urban computing.

deep-learning multi-modal data-fusion urban-computing

Updated Jul 28, 2024

maestriaproject / maestriaproject.github.io

Website of the project

image analysis earth observation multi-modal

Updated Jun 10, 2021
CSS

ccwutw / knowledge-networks

A Knowledge Network implementation from Knowledge Graphs

knowledge-graph multi-modal neo4j-graph

Updated Oct 19, 2022
Python

mahalrs / newsgen

Multi-Modal Image Generation for News Stories

transformers multi-modal clip text-image vqgan vqgan-clip dalle-mini

Updated May 6, 2023
Python

MikolajBaranski / LLM-Marketing-Expert

API that generates insights into the quality of marketing materials using an open-source multi-modal LLM (llava-1.5-7b-hf). Contains all relevant code as well as instructions for building a Docker image of the code.

open-source marketing multi-modal llm llava

Updated Sep 5, 2024
Python

Chiuqyan / arxiv-daily-audio-test

🎓 Automatically updates papers in selected fields daily using GitHub Actions (every 12 hours)

audio vision multi-modal llm

Updated Sep 7, 2024
Python

shubhangb97 / efficient_vision_project

Music conditioned dance prediction

multi-modal graph-convolutional-networks motion-prediction

Updated May 14, 2022
Python

matthewcheng222 / LapMMIRF

Multi-Modal Image Registration Framework using Laplacian Pyramid

machine-learning deep-learning unsupervised-learning multi-modal image-registration laplacian-pyramid

Updated May 2, 2023
Jupyter Notebook

imamsulthon / beritasatu

dependency-injection retrofit2 gson2 multi-modal viewbinding coroutines-flow dagger-hilt paging3

Updated Jun 19, 2023
Kotlin

ishitaanand2222 / CipherCapture

Updated Aug 17, 2023
JavaScript

jianzhnie / LMMRobot

LMMRobot is a professional end-to-end development framework that uses multimodal large models to support the development of embodied intelligent robots.

reinforcement-learning robotics transformer multi-modal aloha mujuco mobile-aloha

Updated Jun 24, 2024

ammarlodhi255 / metadata-augmented-neural-networks-for-wild-animal-classification

This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification".

metadata deep-learning multi-modal multi-modal-learning fusion-techniques wild-animal-classification wild-life-monitoring metadata-fusion

Updated Aug 26, 2024
Jupyter Notebook

pankajarm / multi-modal-cloud-search

Build & Host multi-modal cloud search

deep-learning embeddings multi-modal clip neural-search

Updated Nov 9, 2022
Python

Ji-eun-Kim / Translate-phrases-in-images-and-apply-original-styles

Translate text in images and apply the original style | [AI Society] X:AI | 📕 Toy project

deep-learning transformer multi-modal ocr-recognition ocr-detection t5 srnet

Updated Jan 30, 2024
Python

jayant1211 / A-Multi-Modal-Approach-to-Improve-Scene-Context

This GitHub repository focuses on an integrated approach to scene classification and image caption generation, aiming to improve the accuracy of scene evaluation in computer vision applications.

beam-search places365 multi-modal scene-recognition scene-understanding caption-generation places365-googlenet places365-cnns

Updated Feb 4, 2024
Jupyter Notebook

skl0726 / Generative-Model-Study

Generative Model Paper Review and PyTorch Code

computer-vision generative-model multi-modal

Updated Jun 12, 2024
