Abstract
Purpose
Advances in deep learning have produced effective models for surgical video analysis; however, these models often fail to generalize across medical centers due to domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Object-centric learning has recently emerged as a promising approach for surgical scene understanding, capturing and disentangling the visual and semantic properties of surgical tools and anatomy to improve downstream task performance. In this work, we conduct a multicentric performance benchmark of object-centric approaches, focusing on critical view of safety assessment in laparoscopic cholecystectomy, and then propose an improved approach for unseen domain generalization.
Methods
We evaluate four object-centric approaches for domain generalization, establishing baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g., ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method specifically tailored for domain generalization, LG-DG, that includes a novel disentanglement loss function.
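The abstract does not specify the form of the disentanglement loss; the snippet below is only a minimal, hypothetical sketch of one way such a penalty could be written for object-centric features, namely a cross-covariance term that encourages the per-object visual and semantic embeddings to carry complementary information. The function name, tensor shapes, and choice of penalty are illustrative assumptions, not the actual LG-DG loss.

import torch

def disentanglement_penalty(visual_feats: torch.Tensor, semantic_feats: torch.Tensor) -> torch.Tensor:
    """Hypothetical disentanglement penalty (illustration only, not the LG-DG loss).

    Penalizes the cross-covariance between per-object visual and semantic
    embeddings so that the two subspaces encode complementary information.

    visual_feats:   (N, D_v) visual embeddings of N detected objects
    semantic_feats: (N, D_s) semantic embeddings of the same objects
    """
    # Center each embedding set over the batch of objects
    v = visual_feats - visual_feats.mean(dim=0, keepdim=True)
    s = semantic_feats - semantic_feats.mean(dim=0, keepdim=True)
    # Cross-covariance matrix between the two embedding spaces
    cross_cov = (v.T @ s) / max(v.shape[0] - 1, 1)
    # Sum of squared cross-covariances: zero when the spaces are decorrelated
    return (cross_cov ** 2).sum()

# Example: 16 detected objects with 128-d visual and 32-d semantic features
vis = torch.randn(16, 128)
sem = torch.randn(16, 32)
loss = disentanglement_penalty(vis, sem)

In practice, a term of this kind would be added, with a weighting coefficient, to the downstream classification objective.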
Results
Our optimized approach, LG-DG, improves on the best-performing baseline by 9.28%. More broadly, we show that object-centric methods are highly effective for domain generalization thanks to their modular approach to representation learning.
Conclusion
We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods.
Code Availability
The source code will be made publicly available at https://github.com/CAMMA-public/SurgLatentGraph.
Notes
This setting represents a very realistic scenario, as collecting dense bounding box or segmentation labels is orders of magnitude more expensive than collecting image-level annotations for classification tasks like CVS.
Acknowledgements
This work was supported by French state funds managed by the ANR within the National AI Chair program under Grant ANR-20-CHIA-0029-01 (Chair AI4ORSafety). This work was granted access to the HPC resources of IDRIS under the allocation AD011013523R1 made by GENCI.
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of interest
The authors declare no conflict of interest.
Informed Consent
This manuscript does not contain any patient data.
About this article
Cite this article
Satyanaik, S., Murali, A., Alapatt, D. et al. Optimizing latent graph representations of surgical scenes for unseen domain generalization. Int J CARS 19, 1243–1250 (2024). https://doi.org/10.1007/s11548-024-03121-2