SAM-PARSER: fine-tuning SAM efficiently by parameter space reconstruction

Authors:

Wei Shen

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence

Article No.: 502, Pages 4515 - 4523

https://doi.org/10.1609/aaai.v38i5.28250

Published: 07 January 2025

Abstract

The Segment Anything Model (SAM) has received remarkable attention as it offers a powerful and versatile solution for object segmentation in images. However, fine-tuning SAM for downstream segmentation tasks under different scenarios remains a challenge, as the varied characteristics of different scenarios naturally require diverse model parameter spaces. Most existing fine-tuning methods attempt to bridge the gaps among different scenarios by introducing a set of new parameters to modify SAM's original parameter space. Unlike these works, in this paper we propose fine-tuning SAM efficiently by parameter space reconstruction (SAM-PARSER), which introduces nearly zero trainable parameters during fine-tuning. In SAM-PARSER, we assume that SAM's original parameter space is relatively complete, so that its bases are able to reconstruct the parameter space of a new scenario. We obtain the bases by matrix decomposition and fine-tune the coefficients to reconstruct the parameter space tailored to the new scenario by an optimal linear combination of the bases. Experimental results show that SAM-PARSER exhibits superior segmentation performance across various scenarios, while reducing the number of trainable parameters by a factor of approximately 290 compared with current parameter-efficient fine-tuning methods.
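
The idea of keeping the bases fixed and tuning only the coefficients can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract only says "matrix decomposition", so the use of SVD here is an assumption, and the function names `decompose` and `reconstruct`, the random stand-in weight matrix, and the toy target are all hypothetical.

```python
import numpy as np

def decompose(weight):
    """Split a weight matrix into fixed bases (u, vt) and coefficients s via SVD (assumed)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt

def reconstruct(u, coeffs, vt):
    """Rebuild a weight matrix as a linear combination of the rank-one bases u_i v_i^T."""
    return (u * coeffs) @ vt

rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 32))  # hypothetical stand-in for one pretrained layer
u, s, vt = decompose(weight)

# With unchanged coefficients, the original weights are recovered.
assert np.allclose(reconstruct(u, s, vt), weight)

# Toy "fine-tuning": fit only the coefficients to a hypothetical target by
# gradient descent on 0.5 * ||u diag(c) vt - target||_F^2; u and vt stay frozen.
target = reconstruct(u, s * rng.uniform(0.5, 1.5, size=s.shape), vt)
coeffs = s.copy()
for _ in range(200):
    residual = reconstruct(u, coeffs, vt) - target
    # d loss / d c_j = u_j^T residual v_j
    grad = np.einsum("ij,ij->j", u, residual @ vt.T)
    coeffs -= 0.5 * grad

trainable = coeffs.size  # one coefficient per basis
full = weight.size       # every entry of the matrix
print(f"trainable coefficients: {trainable}, full matrix entries: {full}")
```

In this toy setting only the coefficient vector is updated, so the number of trainable values per layer equals the number of bases rather than the number of matrix entries. The sizes here are illustrative and say nothing about the reduction reported in the abstract.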

Index Terms

SAM-PARSER: fine-tuning SAM efficiently by parameter space reconstruction
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning

Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence

February 2024

23861 pages

ISBN: 978-1-57735-887-9

Copyright © 2024 Association for the Advancement of Artificial Intelligence.

Sponsors

Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

Research-article
Research
Refereed limited
