Abstract
Whole Slide Image (WSI) classification with multiple instance learning (MIL) in digital pathology faces significant computational challenges. Current methods mostly rely on extensive self-supervised learning (SSL) to reach satisfactory performance, which requires long training periods and considerable computational resources. At the same time, skipping pre-training degrades performance because of the domain shift from natural images to WSIs. We introduce the Snuffy architecture, a novel MIL-pooling method based on sparse transformers that mitigates this performance loss under limited pre-training and makes continual few-shot pre-training a competitive option. Our sparsity pattern is tailored to pathology and is theoretically proven to be a universal approximator, with the tightest probabilistic sharp bound to date on the number of layers for sparse transformers. We demonstrate Snuffy's effectiveness on the CAMELYON16 and TCGA Lung cancer datasets, where it achieves superior WSI-level and patch-level accuracies. The code is available at https://github.com/jafarinia/snuffy.
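The abstract does not spell out Snuffy's sparsity pattern, so the following is only a minimal, hypothetical sketch of the general idea of sparse-attention MIL pooling, not the authors' implementation (see the linked repository for that). It assumes a bag of patch embeddings plus a learnable class token, a BigBird-style "global plus random" pattern in which the class token and a few top-scoring instances attend to and are attended by every token while all other tokens attend only to themselves, the global tokens, and a few random tokens, and a single untrained attention layer. The names `sparse_attention_mask`, `masked_attention`, `mil_pool`, `k_top`, `n_random`, and the linear instance scorer `w_inst` are illustrative assumptions.

```python
import numpy as np


def sparse_attention_mask(n_instances, global_idx, n_random, rng):
    """Boolean mask over [CLS] + instance tokens; True means "may attend".

    Hypothetical global-plus-random pattern: token 0 (the class token) and the
    instances in global_idx attend to, and are attended by, every token; every
    other token attends to itself, the global tokens, and n_random random tokens.
    """
    n = n_instances + 1                      # token 0 is the class token
    mask = np.eye(n, dtype=bool)             # self-attention is always allowed
    global_tokens = np.concatenate(([0], np.asarray(global_idx, dtype=int) + 1))
    mask[global_tokens, :] = True            # global tokens see everything
    mask[:, global_tokens] = True            # everything sees the global tokens
    for i in range(n):
        picks = rng.choice(n, size=min(n_random, n), replace=False)
        mask[i, picks] = True                # sparse random connections
    return mask


def masked_attention(x, wq, wk, wv, mask):
    """Single-head scaled dot-product attention restricted to mask == True."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = np.where(mask, scores, -np.inf)             # block disallowed pairs
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                             # exp(-inf) == 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


def mil_pool(instances, params, k_top, n_random, rng):
    """Pool a bag of instance embeddings (n, d) into one bag-level logit."""
    inst_scores = instances @ params["w_inst"]          # hypothetical instance scorer
    global_idx = np.argsort(inst_scores)[-k_top:]       # top-k instances become global
    x = np.vstack([params["cls"][None, :], instances])  # prepend the class token
    mask = sparse_attention_mask(len(instances), global_idx, n_random, rng)
    out, weights = masked_attention(x, params["wq"], params["wk"], params["wv"], mask)
    bag_logit = float(out[0] @ params["w_out"])         # read out the class token
    return bag_logit, weights[0, 1:], mask              # class-token attention per instance


# Usage with random, untrained parameters (shapes only; no real WSI features).
rng = np.random.default_rng(0)
n, d = 50, 8
instances = rng.normal(size=(n, d))
params = {name: rng.normal(scale=d ** -0.5, size=(d, d)) for name in ("wq", "wk", "wv")}
params.update(cls=rng.normal(size=d), w_out=rng.normal(size=d), w_inst=rng.normal(size=d))
logit, inst_weights, mask = mil_pool(instances, params, k_top=3, n_random=4, rng=rng)
print(f"bag logit {logit:.3f}; {mask.sum()} of {mask.size} attention pairs kept")
```

In this sketch the class-token attention weights `inst_weights` double as rough per-instance (patch-level) scores, while the class-token output gives the bag (WSI-level) prediction. For clarity the mask is materialized densely, which still costs O(n²) memory; a practical sparse transformer would compute only the allowed pairs, stack several layers, and learn all parameters.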
Acknowledgements
We extend our deepest and most special thanks to Danial Hamdi for their efforts. We also thank Mohammad Mosayyebi, Mehrab Moradzadeh, Mohammad Hosein Movasaghinia, Mohammad Azizmalayeri, Hossein Mirzaei, Mohammad Mozafari, Soroush Vafaei Tabar, Mohammad Hassan Alikhani, and Hosein Hasani.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jafarinia, H., Alipanah, A., Razavi, S., Mirzaie, N., Rohban, M.H. (2025). Snuffy: Efficient Whole Slide Image Classifier. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15147. Springer, Cham. https://doi.org/10.1007/978-3-031-73024-5_15
DOI: https://doi.org/10.1007/978-3-031-73024-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73023-8
Online ISBN: 978-3-031-73024-5
eBook Packages: Computer Science, Computer Science (R0)