-
A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities
Authors:
Ibrahim Ethem Hamamci,
Sezgin Er,
Furkan Almas,
Ayse Gulnihan Simsek,
Sevval Nil Esirgun,
Irem Dogan,
Muhammed Furkan Dasdelen,
Bastian Wittmann,
Enis Simsar,
Mehmet Simsar,
Emine Bensu Erdemir,
Abdullah Alanbay,
Anjany Sekuboyina,
Berkan Lafci,
Mehmet K. Ozdemir,
Bjoern Menze
Abstract:
A major challenge in computational research in 3D medical imaging is the lack of comprehensive datasets. Addressing this issue, our study introduces CT-RATE, the first 3D medical imaging dataset that pairs images with textual reports. CT-RATE consists of 25,692 non-contrast chest CT volumes, expanded to 50,188 through various reconstructions, from 21,304 unique patients, along with corresponding radiology text reports. Leveraging CT-RATE, we developed CT-CLIP, a CT-focused contrastive language-image pre-training framework. As a versatile, self-supervised model, CT-CLIP is designed for broad application and does not require task-specific training. Remarkably, CT-CLIP outperforms state-of-the-art, fully supervised methods in multi-abnormality detection across all key metrics, thus eliminating the need for manual annotation. We also demonstrate its utility in case retrieval, whether using imagery or textual queries, thereby advancing knowledge dissemination. The open-source release of CT-RATE and CT-CLIP marks a significant advancement in medical AI, enhancing 3D imaging analysis and fostering innovation in healthcare.
Submitted 26 March, 2024;
originally announced March 2024.
-
Simulation-Based Segmentation of Blood Vessels in Cerebral 3D OCTA Images
Authors:
Bastian Wittmann,
Lukas Glandorf,
Johannes C. Paetzold,
Tamaz Amiranashvili,
Thomas Wälchli,
Daniel Razansky,
Bjoern Menze
Abstract:
Segmentation of blood vessels in murine cerebral 3D OCTA images is foundational for in vivo quantitative analysis of the effects of neurovascular disorders, such as stroke or Alzheimer's, on the vascular network. However, to accurately segment blood vessels with state-of-the-art deep learning methods, a vast amount of voxel-level annotations is required. Since cerebral 3D OCTA images are typically plagued by artifacts and generally have a low signal-to-noise ratio, acquiring manual annotations poses an especially cumbersome and time-consuming task. To alleviate the need for manual annotations, we propose utilizing synthetic data to supervise segmentation algorithms. To this end, we extract patches from vessel graphs and transform them into synthetic cerebral 3D OCTA images paired with their matching ground truth labels by simulating the most dominant 3D OCTA artifacts. In extensive experiments, we demonstrate that our approach achieves competitive results, enabling annotation-free blood vessel segmentation in cerebral 3D OCTA images.
Submitted 11 March, 2024;
originally announced March 2024.
-
Link Prediction for Flow-Driven Spatial Networks
Authors:
Bastian Wittmann,
Johannes C. Paetzold,
Chinmay Prabhakar,
Daniel Rueckert,
Bjoern Menze
Abstract:
Link prediction algorithms aim to infer the existence of connections (or links) between nodes in network-structured data and are typically applied to refine the connectivity among nodes. In this work, we focus on link prediction for flow-driven spatial networks, which are embedded in a Euclidean space and relate to physical exchange and transportation processes (e.g., blood flow in vessels or traffic flow in road networks). To this end, we propose the Graph Attentive Vectors (GAV) link prediction framework. GAV models simplified dynamics of physical flow in spatial networks via an attentive, neighborhood-aware message-passing paradigm, updating vector embeddings in a constrained manner. We evaluate GAV on eight flow-driven spatial networks given by whole-brain vessel graphs and road networks. GAV demonstrates superior performances across all datasets and metrics and outperformed the state-of-the-art on the ogbl-vessel benchmark at the time of submission by 12% (98.38 vs. 87.98 AUC). All code is publicly available on GitHub.
Submitted 18 January, 2024; v1 submitted 25 March, 2023;
originally announced March 2023.
-
Focused Decoding Enables 3D Anatomical Detection by Transformers
Authors:
Bastian Wittmann,
Fernando Navarro,
Suprosanna Shit,
Bjoern Menze
Abstract:
Detection Transformers represent end-to-end object detection approaches based on a Transformer encoder-decoder architecture, exploiting the attention mechanism for global relation modeling. Although Detection Transformers deliver results on par with or even superior to their highly optimized CNN-based counterparts operating on 2D natural images, their success is closely coupled to access to a vast amount of training data. This, however, restricts the feasibility of employing Detection Transformers in the medical domain, as access to annotated data is typically limited. To tackle this issue and facilitate the advent of medical Detection Transformers, we propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder. Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view to regions of interest, which allows for a precise focus on relevant anatomical structures. We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights. Our code is available at https://github.com/bwittmann/transoar.
Submitted 26 February, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Relationformer: A Unified Framework for Image-to-Graph Generation
Authors:
Suprosanna Shit,
Rajat Koner,
Bastian Wittmann,
Johannes Paetzold,
Ivan Ezhov,
Hongwei Li,
Jiazhen Pan,
Sahand Sharifzadeh,
Georgios Kaissis,
Volker Tresp,
Bjoern Menze
Abstract:
A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to existing [obj]-tokens, we propose a novel learnable token, namely [rln]-token. Together with [obj]-tokens, [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with the pair-wise [obj]-token, the [rln]-token contributes to a computationally efficient relation prediction. We achieve state-of-the-art performance on multiple, diverse and multi-domain datasets that demonstrate our approach's effectiveness and generalizability.
Submitted 18 March, 2022;
originally announced March 2022.