SAM-PARSER: fine-tuning SAM efficiently by parameter space reconstruction
Article No.: 502, Pages 4515 - 4523
Abstract
Segment Anything Model (SAM) has received remarkable attention as it offers a powerful and versatile solution for object segmentation in images. However, fine-tuning SAM for downstream segmentation tasks under different scenarios remains a challenge, as the varied characteristics of different scenarios naturally require diverse model parameter spaces. Most existing fine-tuning methods attempt to bridge the gaps among different scenarios by introducing a set of new parameters to modify SAM's original parameter space. Unlike these works, in this paper we propose fine-tuning SAM efficiently by parameter space reconstruction (SAM-PARSER), which introduces nearly zero trainable parameters during fine-tuning. In SAM-PARSER, we assume that SAM's original parameter space is relatively complete, so that its bases can reconstruct the parameter space of a new scenario. We obtain the bases by matrix decomposition and fine-tune the coefficients to reconstruct the parameter space tailored to the new scenario by an optimal linear combination of the bases. Experimental results show that SAM-PARSER exhibits superior segmentation performance across various scenarios, while reducing the number of trainable parameters by approximately 290 times compared with current parameter-efficient fine-tuning methods.
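The reconstruction idea described in the abstract can be illustrated with a minimal sketch. It assumes the matrix decomposition is a singular value decomposition (SVD), which the abstract does not state explicitly, and it uses a random matrix as a stand-in for one pretrained layer. The helper names `decompose` and `reconstruct` are hypothetical and are not taken from the paper.

```python
import numpy as np


def decompose(weight):
    """Split a weight matrix into fixed bases (u, vt) and coefficients (s).

    Assumption: SVD is used as the matrix decomposition.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def reconstruct(u, coeffs, vt):
    """Linear combination of the rank-one bases u_i v_i^T weighted by coeffs."""
    return (u * coeffs) @ vt


rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 32))  # stand-in for one pretrained weight matrix

u, s, vt = decompose(weight)

# With unchanged coefficients, the original weights are recovered.
restored = reconstruct(u, s, vt)

# Fine-tuning would update only the coefficients while u and vt stay frozen.
# A toy rescaling stands in for a gradient step here.
new_coeffs = s * 1.1
adapted = reconstruct(u, new_coeffs, vt)

trainable = new_coeffs.size  # one coefficient per basis
total = weight.size          # parameters in the full matrix
print(f"trainable coefficients: {trainable} of {total} weights")
```

In this toy setting only one coefficient per basis is trainable, so a 64 x 32 matrix exposes 32 trainable values instead of 2048. The actual savings reported in the paper depend on which layers of SAM are decomposed, which this sketch does not model.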
Information
Published In
February 2024
23861 pages
ISBN:978-1-57735-887-9
Copyright © 2024 Association for the Advancement of Artificial Intelligence.
Sponsors
- Association for the Advancement of Artificial Intelligence
Publisher
AAAI Press
Publication History
Published: 07 January 2025
Qualifiers
- Research-article
- Research
- Refereed limited