MultiModalPredictor.optimize_for_inference

MultiModalPredictor.optimize_for_inference(providers: dict | List[str] | None = None)

Optimize the predictor’s model for inference.

Under the hood, this method converts the PyTorch module into an ONNX module, so that inference can leverage efficient execution providers in onnxruntime.

Parameters:

providers (dict or list of str, default=None) –

A list of execution providers for model prediction in onnxruntime.

By default, providers is None. In this case, the method generates an ONNX module that performs inference with TensorrtExecutionProvider in onnxruntime if the tensorrt package is properly installed; otherwise, onnxruntime falls back to the CUDA or CPU execution providers.

Returns:

onnx_module – The ONNX-based module that can be used to replace predictor._model for model inference.

Return type:

OnnxModule
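
A minimal usage sketch follows. It assumes a predictor has already been trained and saved; the load path and test_data are placeholders, not part of this API.

```python
from autogluon.multimodal import MultiModalPredictor

# Load a previously trained predictor (the path is a placeholder).
predictor = MultiModalPredictor.load("path/to/saved_predictor")

# With providers=None, onnxruntime tries TensorrtExecutionProvider first
# (if the tensorrt package is installed) and otherwise falls back to the
# CUDA or CPU execution providers.
onnx_module = predictor.optimize_for_inference()

# Alternatively, request specific onnxruntime execution providers explicitly.
onnx_module = predictor.optimize_for_inference(
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# Per the docstring above, the returned module can stand in for
# predictor._model during inference. test_data is a placeholder for data
# in the same format the predictor was trained on.
predictions = predictor.predict(test_data)
```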