AutoMM for Entity Extraction with Text and Image - Quick Start


We have introduced how to train an entity extraction model with text data. Here, we go a step further by integrating data from other modalities. In many real-world applications, textual data comes with data of other modalities. For example, Twitter allows you to compose tweets with text, photos, videos, and GIFs, and Amazon.com uses text, images, and videos to describe its products. These auxiliary modalities can provide additional context for resolving entities. With AutoMM, you can easily exploit multimodal data to enhance entity extraction without worrying about the details.

import os
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

Get the Twitter Dataset

In the following example, we demonstrate how to build a multimodal named entity recognition model with a real-world Twitter dataset. This dataset consists of tweets scraped from 2016 to 2017, and each tweet is composed of one sentence and one image. Let's download the dataset.

download_dir = './ag_automm_tutorial_ner'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/ner/multimodal_ner.zip'
from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)
Downloading ./ag_automm_tutorial_ner/file.zip from https://automl-mm-bench.s3.amazonaws.com/ner/multimodal_ner.zip...
100%|██████████| 423M/423M [00:09<00:00, 46.0MiB/s]

Next, we will load the CSV files.

dataset_path = download_dir + '/multimodal_ner'
train_data = pd.read_csv(f'{dataset_path}/twitter17_train.csv')
test_data = pd.read_csv(f'{dataset_path}/twitter17_test.csv')
label_col = 'entity_annotations'

We need to expand the image paths into absolute paths so that the images can be loaded during training.

image_col = 'image'
# An entry may list several ';'-separated images; use only the first one for a quick tutorial.
train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0])
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])

def path_expander(path, base_folder):
	# Prefix each ';'-separated image path with base_folder and convert it to an absolute path.
	path_l = path.split(';')
	p = ';'.join([os.path.abspath(base_folder + p_i) for p_i in path_l])
	return p

train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))

train_data[image_col].iloc[0]
'/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/ag_automm_tutorial_ner/multimodal_ner/twitter2017_images/17_06_1818.jpg'
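To see what the two transformations do without the downloaded data, the sketch below applies them to a hypothetical ';'-separated entry and adds a check for image files that are missing on disk. It assumes, as the printed path above suggests, that the stored paths begin with '/', so plain string concatenation with the base folder yields a valid path. The file names and the missing_images helper are illustrative only.

```python
import os
import tempfile

def first_image(paths):
    # Keep only the first of several ';'-separated image paths.
    return paths.split(';')[0]

def path_expander(path, base_folder):
    # Prefix each ';'-separated path with base_folder and make it absolute.
    return ';'.join(os.path.abspath(base_folder + p) for p in path.split(';'))

def missing_images(expanded):
    # Hypothetical helper: list the expanded paths that do not exist on disk.
    return [p for p in expanded.split(';') if not os.path.exists(p)]

with tempfile.TemporaryDirectory() as base:
    # Create one empty placeholder image; the second entry is deliberately absent.
    os.makedirs(os.path.join(base, 'imgs'))
    open(os.path.join(base, 'imgs', 'a.jpg'), 'wb').close()
    raw = '/imgs/a.jpg;/imgs/b.jpg'  # hypothetical stored value
    expanded = path_expander(raw, base_folder=base)
    print(expanded)
    print(missing_images(expanded))  # only the .../imgs/b.jpg path is reported
    print(path_expander(first_image(raw), base_folder=base))
```

Running missing_images over train_data[image_col] before calling fit is a cheap way to catch broken paths early.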

Each row consists of the text and image of a single tweet and the entity_annotations column, which contains the named entity annotations for the text column. Let's look at an example row and display the text and picture of the tweet.

example_row = train_data.iloc[0]

example_row
text_snippet           Uefa Super Cup : Real Madrid v Manchester United
image                 /home/ci/autogluon/docs/tutorials/multimodal/m...
entity_annotations    [{"entity_group": "B-MISC", "start": 0, "end":...
Name: 0, dtype: object
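The entity_annotations value is displayed as a JSON list of dicts with entity_group, start, and end keys, where the groups carry BIO prefixes such as B-MISC. The sketch below parses such a string and merges consecutive B-/I- spans into whole entities. It assumes that start and end are character offsets into text_snippet and that an I- span continues the preceding entity of the same type; merge_bio_spans and the toy row are hypothetical and are not AutoMM's own preprocessing.

```python
import json

def merge_bio_spans(text, annotation_json):
    """Merge B-/I- tagged character spans into whole entities (hypothetical helper)."""
    entities = []
    for ann in json.loads(annotation_json):
        tag = ann["entity_group"]
        prefix, sep, etype = tag.partition("-")
        if not sep:  # no BIO prefix: treat the tag itself as the entity type
            prefix, etype = "B", tag
        if prefix == "I" and entities and entities[-1]["entity_group"] == etype:
            # Continuation of the previous entity of the same type: extend its end offset.
            entities[-1]["end"] = ann["end"]
        else:
            # "B-" (or an "I-" with no matching predecessor) starts a new entity.
            entities.append({"entity_group": etype, "start": ann["start"], "end": ann["end"]})
    return [(text[e["start"]:e["end"]], e["entity_group"]) for e in entities]

# Hypothetical toy row, not taken from the Twitter dataset.
toy_text = "Alice flew to New York City"
toy_annotations = json.dumps([
    {"entity_group": "B-PER", "start": 0, "end": 5},
    {"entity_group": "B-LOC", "start": 14, "end": 17},
    {"entity_group": "I-LOC", "start": 18, "end": 22},
    {"entity_group": "I-LOC", "start": 23, "end": 27},
])
print(merge_bio_spans(toy_text, toy_annotations))
```

On the real data you could try merge_bio_spans(example_row['text_snippet'], example_row[label_col]), assuming the column holds a JSON string as displayed above.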

Below is the image of this tweet.

example_image = example_row[image_col]

from IPython.display import Image, display
pil_img = Image(filename=example_image, width=300)
display(pil_img)

As you can see, this photo contains the logos of the Real Madrid football club, the Manchester United football club, and the UEFA Super Cup. The key information in the tweet text is clearly also encoded here in a different modality.

Training

Now let's fit the predictor with the training data. First, we set problem_type to "ner". Because our annotations refer to a text column, we use the column_types parameter to set that column's type to text_ner; when multiple text columns are present, this ensures the model locates the correct text column for entity extraction. Here we set a tight time budget for a quick demo.

from autogluon.multimodal import MultiModalPredictor
import uuid

label_col = "entity_annotations"
model_path = f"./tmp/{uuid.uuid4().hex}-automm_multimodal_ner"
predictor = MultiModalPredictor(problem_type="ner", label=label_col, path=model_path)
predictor.fit(
	train_data=train_data,
	column_types={"text_snippet":"text_ner"},
	time_limit=300,  # seconds
)
=================== System Info ===================
AutoGluon Version:  1.1.1b20240613
Python Version:     3.10.13
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Fri May 17 18:07:48 UTC 2024
CPU Count:          8
Pytorch Version:    2.3.1+cu121
CUDA Version:       12.1
Memory Avail:       28.65 GB / 30.95 GB (92.6%)
Disk Space Avail:   187.82 GB / 255.99 GB (73.4%)
===================================================

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/79ce6e09e96b4a959dfee68b41f81d10-automm_multimodal_ner
    ```

INFO: Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
GPU 0 Name: Tesla T4
GPU 0 Memory: 0.42GB/15.0GB (Used/Total)

INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO: 
  | Name              | Type                | Params | Mode 
------------------------------------------------------------------
0 | model             | MultimodalFusionNER | 271 M  | train
1 | validation_metric | MulticlassF1Score   | 0      | train
2 | loss_func         | CrossEntropyLoss    | 0      | train
------------------------------------------------------------------
271 M     Trainable params
0         Non-trainable params
271 M     Total params
1,087.138 Total estimated model params size (MB)
INFO: Epoch 0, global step 11: 'val_ner_token_f1' reached 0.05800 (best 0.05800), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/79ce6e09e96b4a959dfee68b41f81d10-automm_multimodal_ner/epoch=0-step=11.ckpt' as top 3
INFO: Epoch 0, global step 23: 'val_ner_token_f1' reached 0.30867 (best 0.30867), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/79ce6e09e96b4a959dfee68b41f81d10-automm_multimodal_ner/epoch=0-step=23.ckpt' as top 3
INFO: Time limit reached. Elapsed time is 0:05:00. Signaling Trainer to stop.
INFO: Epoch 1, global step 29: 'val_ner_token_f1' reached 0.47867 (best 0.47867), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/79ce6e09e96b4a959dfee68b41f81d10-automm_multimodal_ner/epoch=1-step=29.ckpt' as top 3
Start to fuse 3 checkpoints via the greedy soup algorithm.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/79ce6e09e96b4a959dfee68b41f81d10-automm_multimodal_ner")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
<autogluon.multimodal.predictor.MultiModalPredictor at 0x7faf7b2b9e40>

Under the hood, AutoMM automatically detects the data modalities, selects suitable models from its multimodal model pool, and trains the selected models. If multiple backbones are available, AutoMM appends a late-fusion model on top of them.
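Late fusion can be illustrated generically: each backbone encodes its own modality, and a small head combines the resulting features. The NumPy sketch below, written for token-level NER, broadcasts a single image feature to every text token, concatenates it with the token features, and applies a linear layer to obtain per-token label logits. The random arrays stand in for backbone outputs and all sizes are arbitrary; this illustrates the idea and is not the architecture of AutoMM's MultimodalFusionNER.

```python
import numpy as np

def late_fusion_token_logits(text_feats, image_feat, weight, bias):
    """Concatenate each token's text feature with the shared image feature,
    then project to per-token label logits (generic late-fusion head)."""
    num_tokens = text_feats.shape[0]
    # Repeat the single image vector once per token: shape (T, d_image).
    image_tiled = np.broadcast_to(image_feat, (num_tokens, image_feat.shape[0]))
    fused = np.concatenate([text_feats, image_tiled], axis=1)  # (T, d_text + d_image)
    return fused @ weight + bias  # (T, num_labels)

rng = np.random.default_rng(0)
T, d_text, d_image, num_labels = 6, 8, 4, 5
text_feats = rng.normal(size=(T, d_text))  # stand-in for the text backbone output
image_feat = rng.normal(size=d_image)      # stand-in for the image backbone output
weight = rng.normal(size=(d_text + d_image, num_labels))
bias = np.zeros(num_labels)

logits = late_fusion_token_logits(text_feats, image_feat, weight, bias)
pred_labels = logits.argmax(axis=1)  # one label id per token
print(logits.shape, pred_labels)
```

Because the image feature is shared across tokens, it shifts every token's logits by the same amount; the fusion head lets the image tilt label decisions for the whole sentence while the text features still distinguish tokens.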

Evaluation

predictor.evaluate(test_data, metrics=["overall_recall", "overall_precision", "overall_f1"])
{'overall_recall': 0.39680232558139533,
 'overall_precision': 0.5967213114754099,
 'overall_f1': 0.4766477520733304}
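The overall_* metrics are entity-level scores. Assuming exact matching of span and entity type, as in seqeval-style scoring, they can be computed from sets of (row, start, end, type) tuples as sketched below; entity_scores and the toy spans are hypothetical. The last lines check that the reported overall_f1 is the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R).

```python
def entity_scores(gold, pred):
    """Exact-match entity-level precision/recall/F1 over (row, start, end, type) tuples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # entities whose span and type both match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"overall_precision": precision, "overall_recall": recall, "overall_f1": f1}

# Hypothetical toy spans: (row_id, start, end, entity_type).
gold = [(0, 0, 5, "PER"), (0, 14, 27, "LOC"), (1, 0, 4, "ORG")]
pred = [(0, 0, 5, "PER"), (0, 14, 22, "LOC")]  # the LOC span is truncated, so it does not count
print(entity_scores(gold, pred))

# Harmonic mean of the precision and recall reported above.
p, r = 0.5967213114754099, 0.39680232558139533
print(2 * p * r / (p + r))
```

A partially overlapping span counts as both a false positive and a false negative under exact matching, which is one reason entity-level F1 tends to be lower than token-level F1.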

Prediction

You can easily obtain the predictions by calling predictor.predict().

prediction_input = test_data.drop(columns=label_col).head(1)
predictions = predictor.predict(prediction_input)
print('Tweet:', prediction_input.text_snippet[0])
print('Image path:', prediction_input.image[0])
print('Predicted entities:', predictions[0])

for entity in predictions[0]:
	print(f"Word '{prediction_input.text_snippet[0][entity['start']:entity['end']]}' belongs to group: {entity['entity_group']}")
Tweet: Citifield Fan View : RT @ jehnnybgoode What a gorgeous day for baseball ! Stuck in that Saturdaze . # NewYorkMets VS # Sa …
Image path: /home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/ag_automm_tutorial_ner/multimodal_ner/twitter2017_images/16_05_01_360.jpg
Predicted entities: [{'entity_group': 'PER', 'start': 0, 'end': 9}, {'entity_group': 'ORG', 'start': 102, 'end': 113}]
Word 'Citifield' belongs to group: PER
Word 'NewYorkMets' belongs to group: ORG
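To inspect predictions for many tweets at once, the per-row entity lists can be flattened into one record per entity, using the same text[start:end] slicing as the loop above. flatten_predictions and the toy inputs are hypothetical; with the real outputs you could call flatten_predictions(prediction_input['text_snippet'].tolist(), predictions) and wrap the result in pd.DataFrame(...) for a tabular view.

```python
def flatten_predictions(texts, predictions):
    """Turn per-row lists of entity dicts into one flat record per entity (hypothetical helper)."""
    records = []
    for row_id, (text, entities) in enumerate(zip(texts, predictions)):
        for ent in entities:
            records.append({
                "row": row_id,  # positional row number, not the DataFrame index
                "word": text[ent["start"]:ent["end"]],  # recover the surface string
                "entity_group": ent["entity_group"],
                "start": ent["start"],
                "end": ent["end"],
            })
    return records

# Hypothetical toy inputs in the same format as the predictions printed above.
toy_texts = ["Alice flew to New York City", "No entities here"]
toy_predictions = [
    [{"entity_group": "PER", "start": 0, "end": 5}, {"entity_group": "LOC", "start": 14, "end": 27}],
    [],
]
for record in flatten_predictions(toy_texts, toy_predictions):
    print(record)
```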

Reloading and Continuous Training

The trained predictor is saved automatically, and you can easily reload it from its path. If you are not satisfied with the current model's performance, you can continue training the loaded model with new data.

new_predictor = MultiModalPredictor.load(model_path)
new_model_path = f"./tmp/{uuid.uuid4().hex}-automm_multimodal_ner_continue_train"
new_predictor.fit(train_data, time_limit=60, save_path=new_model_path)
test_score = new_predictor.evaluate(test_data, metrics=['overall_f1'])
print(test_score)
Load pretrained checkpoint: /home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/79ce6e09e96b4a959dfee68b41f81d10-automm_multimodal_ner/model.ckpt
=================== System Info ===================
AutoGluon Version:  1.1.1b20240613
Python Version:     3.10.13
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Fri May 17 18:07:48 UTC 2024
CPU Count:          8
Pytorch Version:    2.3.1+cu121
CUDA Version:       12.1
Memory Avail:       22.45 GB / 30.95 GB (72.5%)
Disk Space Avail:   186.21 GB / 255.99 GB (72.7%)
===================================================

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/5aecaf3c96224bebabd8341e384df286-automm_multimodal_ner_continue_train
    ```

INFO: Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
GPU 0 Name: Tesla T4
GPU 0 Memory: 0.58GB/15.0GB (Used/Total)

INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO: 
  | Name              | Type                | Params | Mode 
------------------------------------------------------------------
0 | model             | MultimodalFusionNER | 271 M  | train
1 | validation_metric | MulticlassF1Score   | 0      | train
2 | loss_func         | CrossEntropyLoss    | 0      | train
------------------------------------------------------------------
271 M     Trainable params
0         Non-trainable params
271 M     Total params
1,087.138 Total estimated model params size (MB)
INFO: Time limit reached. Elapsed time is 0:01:00. Signaling Trainer to stop.
INFO: Epoch 0, global step 6: 'val_ner_token_f1' reached 0.56733 (best 0.56733), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/5aecaf3c96224bebabd8341e384df286-automm_multimodal_ner_continue_train/epoch=0-step=6.ckpt' as top 3
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/5aecaf3c96224bebabd8341e384df286-automm_multimodal_ner_continue_train")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
{'overall_f1': 0.5273668639053255}

Other Examples

You can go to AutoMM Examples to explore more examples of AutoMM.

Customization

To learn how to customize AutoMM, please refer to Customize AutoMM.