AutoGluon Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Foundation models have transformed fields such as computer vision and natural language processing. Pre-trained on extensive common-domain data, they serve as powerful tools for a wide range of applications. However, integrating foundation models into real-world applications remains challenging: the diversity of data modalities, the multitude of available foundation models, and their considerable size make this integration a nontrivial task.

AutoMM aims to break these barriers by substantially reducing the engineering effort and manual intervention required in data preprocessing, model selection, and fine-tuning. With AutoMM, users can adapt foundation models (from popular model zoos such as HuggingFace, TIMM, and MMDetection) to their domain-specific data using just three lines of code. The toolkit accommodates various data types, including image, text, tabular, and document data, either individually or in combination. It supports an array of tasks, including classification, regression, object detection, named entity recognition, semantic matching, and image segmentation. AutoMM is a state-of-the-art, user-friendly solution for multimodal AutoML with foundation models. For more details, refer to the paper below:

Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis. “AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models”, The International Conference on Automated Machine Learning (AutoML), 2024.
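The "three lines of code" workflow mentioned above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: the wrapper function `train_and_predict` and its parameter names are hypothetical, it assumes `autogluon.multimodal` is installed, and it assumes `train_data` and `test_data` are pandas DataFrames whose columns hold the text, image paths, or tabular features. Only `MultiModalPredictor`, `fit`, and `predict` come from AutoMM itself.

```python
def train_and_predict(train_data, test_data, label, time_limit=None):
    """Fit AutoMM on train_data and return predictions for test_data.

    train_data / test_data: pandas DataFrames (assumed); `label` names the
    target column in train_data. `time_limit` is a training budget in
    seconds, or None for no limit.
    """
    # Imported lazily so this sketch can be loaded without the heavy dependency.
    from autogluon.multimodal import MultiModalPredictor

    # The three core lines: construct, fit, predict.
    predictor = MultiModalPredictor(label=label)
    predictor.fit(train_data, time_limit=time_limit)
    return predictor.predict(test_data)
```

A call such as `train_and_predict(train_df, test_df, label="label", time_limit=60)` would then return one prediction per row of `test_df`. The column name `"label"` and the 60-second budget are placeholders; the tutorials listed below cover realistic datasets and settings.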

AutoMM Introduction

In the following, we break down the functionalities of AutoMM and provide a step-by-step guide for each.

Text Data – Classification / Regression / NER

AutoMM for Text Prediction - Quick Start

How to train high-quality text prediction models with AutoMM.

text_prediction/beginner_text.html
AutoMM for Text Prediction - Multilingual Problems

How to use AutoMM to build models on datasets with languages other than English.

text_prediction/multilingual_text.html
AutoMM for Named Entity Recognition - Quick Start

How to use AutoMM for entity extraction.

text_prediction/ner.html

Image Data – Classification / Regression

AutoMM for Image Classification - Quick Start

How to train image classification models with AutoMM.

image_prediction/beginner_image_cls.html
Zero-Shot Image Classification with CLIP

How to enable zero-shot image classification in AutoMM via a pretrained CLIP model.

image_prediction/clip_zeroshot.html

Image Data – Object Detection

Quick Start on a Tiny COCO Format Dataset

How to train a high-quality object detection model with AutoMM in under 5 minutes on a COCO-format dataset.

object_detection/quick_start/quick_start_coco.html
Prepare COCO2017 Dataset

How to prepare the COCO2017 dataset for object detection.

object_detection/data_preparation/prepare_coco17.html
Prepare Pascal VOC Dataset

How to prepare the Pascal VOC dataset for object detection.

object_detection/data_preparation/prepare_voc.html
Prepare Watercolor Dataset

How to prepare the Watercolor dataset for object detection.

object_detection/data_preparation/prepare_watercolor.html
Convert VOC Format Dataset to COCO Format

How to convert a dataset from VOC format to COCO format for object detection.

object_detection/data_preparation/voc_to_coco.html
Object Detection with DataFrame

How to use the pd.DataFrame format for object detection.

object_detection/data_preparation/object_detection_with_dataframe.html

Image Data – Segmentation

AutoMM for Semantic Segmentation - Quick Start

How to train semantic segmentation models with AutoMM.

image_segmentation/beginner_semantic_seg.html

Document Data – Classification / Regression

AutoMM for Scanned Document Classification

How to use AutoMM to build a scanned document classifier.

document_prediction/document_classification.html
Classifying PDF Documents with AutoMM

How to use AutoMM to build a PDF document classifier.

document_prediction/pdf_classification.html

Image / Text Data – Semantic Matching

Text-to-text Semantic Matching with AutoMM - Quick Start

How to use AutoMM for text-to-text semantic matching.

semantic_matching/text2text_matching.html
Image-to-Image Semantic Matching with AutoMM - Quick Start

How to use AutoMM for image-to-image semantic matching.

semantic_matching/image2image_matching.html
Image-Text Semantic Matching with AutoMM - Quick Start

How to use AutoMM for image-text semantic matching.

semantic_matching/image_text_matching.html
Zero Shot Image-Text Semantic Matching with AutoMM

How to use AutoMM for zero-shot image-text semantic matching.

semantic_matching/zero_shot_img_txt_matching.html
Text Semantic Search with AutoMM

How to use semantic embeddings to improve search ranking performance.

semantic_matching/text_semantic_search.html

Multimodal Data – Classification / Regression / NER

AutoMM for Text + Tabular - Quick Start

How AutoMM can be applied to multimodal data tables with a mix of text, numerical, and categorical columns.

multimodal_prediction/multimodal_text_tabular.html
AutoMM for Image + Text + Tabular - Quick Start

How to use AutoMM to train a model on image, text, numerical, and categorical data.

multimodal_prediction/beginner_multimodal.html
AutoMM for Entity Extraction with Text and Image - Quick Start

How to use AutoMM to train a model for multimodal named entity recognition.

multimodal_prediction/multimodal_ner.html

Advanced Topics

Single GPU Billion-scale Model Training via Parameter-Efficient Finetuning

How to take advantage of larger foundation models with the help of parameter-efficient finetuning. In the tutorial, we combine IA^3, BitFit, and gradient checkpointing to finetune FLAN-T5-XL.

advanced_topics/efficient_finetuning_basic.html
Hyperparameter Optimization in AutoMM

How to do hyperparameter optimization in AutoMM.

advanced_topics/hyperparameter_optimization.html
Knowledge Distillation in AutoMM

How to do knowledge distillation in AutoMM.

advanced_topics/model_distillation.html
Continuous Training with AutoMM

How to continue training in AutoMM.

advanced_topics/continuous_training.html
Customize AutoMM

How to customize AutoMM configurations.

advanced_topics/customization.html
AutoMM Presets

How to use AutoMM presets.

advanced_topics/presets.html
Few Shot Learning with AutoMM

How to use foundation models + SVM for few-shot learning.

advanced_topics/few_shot_learning.html
Handling Class Imbalance with AutoMM - Focal Loss

How to use AutoMM to handle class imbalance.

advanced_topics/focal_loss.html
Faster Prediction with TensorRT

How to use TensorRT to accelerate AutoMM model inference.

advanced_topics/tensorrt.html