Abstract
Purpose
Advances in deep learning have produced effective models for surgical video analysis; however, these models often fail to generalize across medical centers due to domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Object-centric learning has recently emerged as a promising approach for surgical scene understanding, capturing and disentangling the visual and semantic properties of surgical tools and anatomy to improve downstream task performance. In this work, we conduct a multicentric performance benchmark of object-centric approaches, focusing on critical view of safety assessment in laparoscopic cholecystectomy, and then propose an improved approach for unseen domain generalization.
Methods
We evaluate four object-centric approaches for domain generalization, establishing baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g., ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method specifically tailored for domain generalization, LG-DG, that includes a novel disentanglement loss function.
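The abstract does not specify the form of the disentanglement loss; the snippet below is only a minimal, hypothetical sketch of one way such a penalty could be written for object-centric features, namely a cross-covariance term that encourages the per-object visual and semantic embeddings to carry complementary information. The function name, tensor shapes, and choice of penalty are illustrative assumptions, not the actual LG-DG loss.

import torch

def disentanglement_penalty(visual_feats: torch.Tensor, semantic_feats: torch.Tensor) -> torch.Tensor:
    """Hypothetical disentanglement penalty (illustration only, not the LG-DG loss).

    Penalizes the cross-covariance between per-object visual and semantic
    embeddings so that the two subspaces encode complementary information.

    visual_feats:   (N, D_v) visual embeddings of N detected objects
    semantic_feats: (N, D_s) semantic embeddings of the same objects
    """
    # Center each embedding set over the batch of objects
    v = visual_feats - visual_feats.mean(dim=0, keepdim=True)
    s = semantic_feats - semantic_feats.mean(dim=0, keepdim=True)
    # Cross-covariance matrix between the two embedding spaces
    cross_cov = (v.T @ s) / max(v.shape[0] - 1, 1)
    # Sum of squared cross-covariances: zero when the spaces are decorrelated
    return (cross_cov ** 2).sum()

# Example: 16 detected objects with 128-d visual and 32-d semantic features
vis = torch.randn(16, 128)
sem = torch.randn(16, 32)
loss = disentanglement_penalty(vis, sem)

In practice, a term of this kind would be added, with a weighting coefficient, to the downstream classification objective.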
Results
Our optimized approach, LG-DG, improves on the best-performing baseline by 9.28%. More broadly, we show that object-centric methods are highly effective for domain generalization thanks to their modular approach to representation learning.
Conclusion
We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods.
Code Availability
The source code will be made publicly available at https://github.com/CAMMA-public/SurgLatentGraph.
Notes
This setting represents a very realistic scenario, as collecting dense bounding box or segmentation labels is orders of magnitude more expensive than collecting image-level annotations for classification tasks like CVS.
Acknowledgements
This work was supported by French state funds managed by the ANR within the National AI Chair program under Grant ANR-20-CHIA-0029-01 (Chair AI4ORSafety). This work was granted access to the HPC resources of IDRIS under the allocation AD011013523R1 made by GENCI.
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of interest
The authors declare no conflict of interest.
Informed Consent
This manuscript does not contain any patient data.
About this article
Cite this article
Satyanaik, S., Murali, A., Alapatt, D. et al. Optimizing latent graph representations of surgical scenes for unseen domain generalization. Int J CARS 19, 1243–1250 (2024). https://doi.org/10.1007/s11548-024-03121-2