[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766
@daniil-lyakhov, please analyze this feature request and open issues as its sub-tasks.
I suggest introducing the following API in NNCF to support third-party quantizers and to align better with the PyTorch 2 Export Quantization API:
### Changes Added a test to tests/torch/fx/test_models.py that compares the quantized graph with a reference quantized graph. ### Reason for changes To check that the graph is quantized correctly ### Ticket #2766 ### Tests test_quantized_model() was added in tests/torch/fx/test_models.py
…izers (#2854) ### Changes Quantizer merge logic is updated to check that all output branches are quantized before quantizers are merged and propagated up. ### Reason for changes To prevent merging of quantizers in the case of the ScaledDotProductAttention op, which should have quantizers on input ports [0, 1] and should not have a quantizer on input port 3. ### Related tickets 148211 #2766 ### Tests * Common solver test for ScaledDotProductAttention branch merging and quantization initialization * Graph tests for torch/ov backends
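The merge rule described in this change can be illustrated with a small sketch. This is not the NNCF solver code: the `Branch` class, the `can_merge_upward` function, and the requirement that all quantizer configs match are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Branch:
    """One output branch of a fork point (hypothetical structure)."""
    consumer: str             # name of the consuming op and input port
    quantizer: Optional[str]  # quantizer config id, or None if unquantized


def can_merge_upward(branches: List[Branch]) -> bool:
    """Allow a merge only if every branch is quantized.

    Assumption: the configs must also be identical to be merged.
    """
    if not branches:
        return False
    configs = {branch.quantizer for branch in branches}
    return None not in configs and len(configs) == 1


# SDPA-like case: input ports 0 and 1 are quantized, input port 3 is not,
# so the quantizers must stay on their branches.
sdpa_branches = [
    Branch("sdpa:0", "int8_sym"),
    Branch("sdpa:1", "int8_sym"),
    Branch("sdpa:3", None),
]

# Every branch is quantized with the same config, so a merge is allowed.
all_quantized = [Branch("conv_a:0", "int8_sym"), Branch("conv_b:0", "int8_sym")]

print(can_merge_upward(sdpa_branches))
print(can_merge_upward(all_quantized))
```

Without the "all branches quantized" check, a merged quantizer placed above the fork would also quantize the branch that is supposed to stay in floating point.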
### Changes Conformance test for resnet18 ### Reason for changes To extend testing scope for the TorchFX backend ### Related tickets #2766 ### Tests post_training_quantization/442 is successful
### Changes Torch FX pre-hook insertion support ### Reason for changes To enable vit_b_16 quantization ### Related tickets #2766 ### Tests test_quantized_models is updated with vit_b_16 and swin_v2_s
### Changes Constant linear layers support ### Reason for changes To support swin_v2_s FBC ### Related tickets #2766 ### Tests Build post_training_quantization/444/ is finished successfully Unit test `test_model_transformer.test_model_extraction` is added
### Changes TorchFX SmoothQuant backend implementation * module_insertion_transformation_builder is introduced * Transformation requires names for new modules and nodes * vit_b_16 is introduced in the conformance tests ### Reason for changes To improve metrics of quantized models: swin_v2_s and vit_b_16 * To insert SQ multiply nodes into the graph * To make node names human-readable and consistent * To check the SQ algorithm E2E ### Related tickets #2766 ### Tests * Smooth quant test template is implemented for the TorchFX backend * Conformance test: post_training_quantization/446/ is successful * Test models check SQ multiplies for swin_v2_s and vit_b_16 models
@MaximProshin, I would like to provide a summary for this feature request: Done:
The tasks are in progress for NNCF 2.14:
### Changes Transformation for removing fake quantize nodes and saving all weights to disk in int8 format after quantization. It works as follows: 1. Reshape the scale if qdq operation is per-channel. 2. Pattern match the quantize-dequantize nodes. 3. Filter the matches to only include quantize-dequantize ops with constant input. 4. Replace with the multiplication of the scale and input. ### Reason for changes To compress the model after quantization ### Tests Add `test_post_quantization_compression()` in `tests/torch/fx/test_model_transformer.py` which checks the data type of all weights in the model after applying quantization and also checks the value after the decompression step (element-wise multiplication operation). ### Tickets #2766 --------- Co-authored-by: Daniil Lyakhov <daniil.lyakhov@intel.com>
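As a rough illustration of the compression and decompression steps listed above, the following pure-Python sketch stores a per-channel weight as int8 values plus one scale per output channel, then restores it by element-wise multiplication. The helper names and the symmetric quantization scheme are assumptions; the real transformation operates on `torch.fx` graph nodes and tensors.

```python
from typing import List, Tuple


def compress_per_channel(weight: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Symmetric int8 compression with one scale per output channel (row)."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(value) for value in row)
        # Guard against an all-zero channel to avoid division by zero.
        scale = max_abs / 127 if max_abs > 0 else 1.0
        q_rows.append([max(-128, min(127, round(value / scale))) for value in row])
        scales.append(scale)
    return q_rows, scales


def decompress(q_rows: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Decompression step: multiply each int8 value by its channel scale."""
    return [[q * scale for q in row] for row, scale in zip(q_rows, scales)]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
q_weight, scales = compress_per_channel(weight)
restored = decompress(q_weight, scales)
print(q_weight)
print(restored)
```

Each restored value differs from the original by at most half a quantization step of its channel, which is the kind of value check the test mentioned above performs after the decompression step.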
### Changes * Resnet18 TorchFX example ### Reason for changes * To showcase NNCF TorchFX quantization ### Related tickets #2766 ### Tests test_examples/544/ - Done
plus parameter range estimators
### Changes * ~~Constant folding is applied to all TorchFX models before the quantization~~ * Some torchvision models (swin_v2_s, vit_b_16) are exported by `torch.export.export` before OV conversion * MOC transformations are applied to OpenVINO compressed models after the compression * After #2984: fixed `_compress_qdq_constant_transformation` for the per-tensor case ### Reason for changes * To align TorchFX/OV quantized models ### Related tickets #2766 ### Tests post_training_quantization/504/ is finished successfully
### Changes Constant folding is enabled by default in TorchFX backend ### Reason for changes To align quantizers placement between OV and TorchFX ### Related tickets #2766 ### Tests * test_constant_folding * test_constant_folding_with_constraints * test_models.py references are updated * post_training_quantization/535/ - finished successfully --------- Co-authored-by: Alexander Suslov <alexander.suslov@intel.com> Co-authored-by: Aamir Nazir <aamir.nazir@intel.com>
### Changes * TorchFX unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training`. ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY * BC types for `fuse_bn_node` are updated * NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three) * Model extractor does not traverse down from constants, to prevent redundant nodes in the extracted model when the constant is shared * `shared_constants_unification_transformation` is removed * Tests which require `capture_pre_autograd_graph` are removed ### Reason for changes * To migrate to the latest recommended export method for the TorchFX backend ### Related tickets #2766 ### Tests test_shared_constants_unification_not_connected_const post_training_quantization/540/ is finished successfully
PR #3075 to the release branch: ### Changes * TorchFX unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training`. ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY * BC types for `fuse_bn_node` are updated * NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three) * Model extractor does not traverse down from constants, to prevent redundant nodes in the extracted model when the constant is shared * `shared_constants_unification_transformation` is removed * Tests which require `capture_pre_autograd_graph` are removed ### Reason for changes * To migrate to the latest recommended export method for the TorchFX backend ### Related tickets #2766 ### Tests test_shared_constants_unification_not_connected_const post_training_quantization/540/ is finished successfully
### Changes * Main README.md, Usage.md and post-training quantization docs are updated with info about TorchFX ### Reason for changes * To reflect new experimental features of TorchFX in the docs ### Related tickets #2766
### Changes * Torch SDPA pattern is updated * The concat node has its input nodes in the format `args=([inp_1, ..., inp_n], dim)`, so it should be treated differently. Retrieving concat inputs by input port id is now supported in each TorchFX transformation ### Reason for changes * To support quantization of ultralytics/yolo11n in the TorchFX backend ### Related tickets #2766 157032 ### Tests * `tests/torch/fx/test_model_transformer.py` and `tests/torch/fx/test_compress_weights.py` are updated to check all cases with the concat node. All `.dot` / `.json` files were checked manually. * `tests/torch/fx/test_models.py` is updated with the `YOLO11N_SDPABlock` synthetic model to check the correctness of SDPA pattern matching
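To show why the concat node needs special handling, here is a minimal sketch of input lookup by port id. The function name `get_input_by_port` and the `is_concat` flag are hypothetical, and plain strings stand in for `torch.fx` nodes; the only detail taken from the description above is the `args=([inp_1, ..., inp_n], dim)` layout.

```python
from typing import Any, Sequence


def get_input_by_port(args: Sequence[Any], port_id: int, is_concat: bool) -> Any:
    """Return the input connected to `port_id`.

    For a regular node the inputs are the positional args themselves.
    For a concat node they are packed into the first argument, a list,
    so the port id has to index into that list instead.
    """
    if is_concat:
        return args[0][port_id]
    return args[port_id]


# Regular node: args = (input, weight, bias)
linear_args = ("x", "weight", "bias")
# Concat node: args = ([inp_1, ..., inp_n], dim)
concat_args = (["a", "b", "c"], 1)

print(get_input_by_port(linear_args, 1, is_concat=False))
print(get_input_by_port(concat_args, 2, is_concat=True))
```

Indexing `concat_args` directly with a port id would return the whole input list for port 0 and the `dim` value for port 1, which is why each transformation needs the concat-aware lookup.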
### Changes All `capture_pre_autograd_graph` calls in the conformance test were replaced by `torch.export.export_for_training`. ### Reason for changes To remove deprecated `capture_pre_autograd_graph` from the conformance test. ### Related tickets #2766 ### Tests post_training_quantization/555/ has finished successfully
### Changes * Bias fusing is removed from default transformations * `constant_folding` is updated to remove inplace operations without users * `extract_model` is updated to support the original model output as a subgraph output ### Reason for changes To make it possible to apply quantization the same way it is done by X86Quantizer ### Related tickets #2766 110985 ### Tests * All int8 references are updated and checked manually * `test_constant_folding` and `test_constant_folding_with_constraints` are updated with a constant subgraph which contains an inplace op (`relu_`) * `test_model_extraction_with_original_output` is introduced * conformance test post_training_quantization/557 has finished successfully
### Changes Folded constants do not require gradient ### Reason for changes * To unify all model constants/buffers * To make the compressed model deepcopy-able ### Related tickets #2766 ### Tests `test_constant_folding` is updated
🚀 Feature request
Quantization is a widely used technique to accelerate models, particularly when using `torch.compile`. For detailed tutorials and demonstrations on model quantization using PyTorch 2 Export Quantization, please refer to the following resources:
These guides show how to obtain a quantized model via the PyTorch 2 Export Quantization API and run it using `torch.compile`. OpenVINO provides a backend for `torch.compile`, but NNCF does not support quantization of PyTorch 2 Export (`torch.fx.GraphModule`) models, so users have to use `X86InductorQuantizer` to quantize them. Comparisons between PyTorch 2 Export INT8 models quantized by `X86InductorQuantizer` and OpenVINO INT8 models quantized by NNCF show that NNCF produces more accurate and efficient INT8 models.
The feature request is to support `torch.fx.GraphModule` models in `nncf.quantize`, to enable the creation of accurate and highly efficient models using `torch.compile` with the OpenVINO backend.
Feature Use Case
Are you going to submit a PR?