Phys. Med. Biol. 69 (2024) 025009

PAPER

RECEIVED 4 April 2023
REVISED 25 November 2023
ACCEPTED FOR PUBLICATION 13 December 2023
PUBLISHED 10 January 2024

MM-SFENet: multi-scale multi-task localization and classification of bladder cancer in MRI with spatial feature encoder network

Yu Ren 1,2,3,5, Guoli Wang 2,3,5, Pingping Wang 2,3, Kunmeng Liu 2,3, Quanjin Liu 1,∗, Hongfu Sun 4, Xiang Li 2,3 and Benzheng Wei 2,3,∗

1 College of Electronic Engineering and Intelligent Manufacturing, Anqing Normal University, Anqing 246133, People's Republic of China
2 Center for Medical Artificial Intelligence, Shandong University of Traditional Chinese Medicine, Qingdao 266112, People's Republic of China
3 Qingdao Academy of Chinese Medical Sciences, Shandong University of Traditional Chinese Medicine, Qingdao 266112, People's Republic of China
4 Urological Department, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan 250011, People's Republic of China
5 Yu Ren and Guoli Wang contributed equally to this work and should be considered co-first authors.
∗ Authors to whom any correspondence should be addressed.

E-mail: liuquanjin@aqnu.edu.cn and wbz99@sina.com
Keywords: bladder cancer, MRI, tumor detection, multi-scale, multi-task, deep learning
Abstract
Objective. Bladder cancer is a common malignant urinary carcinoma, with muscle-invasive and non-muscle-invasive disease as its two major subtypes. This paper aims to achieve automated localization and invasiveness classification of bladder cancer based on MRI. Approach. Different from previous efforts that segment the bladder wall and tumor, we propose a novel end-to-end multi-scale multi-task spatial feature encoder network (MM-SFENet) that locates and classifies bladder cancer according to the classification criterion of the spatial relationship between the tumor and the bladder wall. First, we build a backbone with residual blocks to distinguish the bladder wall from the tumor; then, a spatial feature encoder is designed to encode the multi-level features of the backbone and learn the criterion. Main results. For multi-task learning, we substitute IoU Loss for Smooth-L1 Loss to improve the accuracy of the classification task. The model is trained and evaluated on two datasets collected from bladder cancer patients, with mAP, IoU, Acc, Sen and Spec as the evaluation metrics. These reach 93.34%, 83.16%, 85.65%, 81.51% and 89.23% on test set 1, and 80.21%, 75.43%, 79.52%, 71.87% and 77.86% on test set 2. Significance. The experimental results demonstrate the effectiveness of the proposed MM-SFENet for the localization and classification of bladder cancer. It may provide an effective supplementary diagnostic method for bladder cancer staging.
1. Introduction
Bladder cancer is considered a critical malignancy with a high recurrence rate. According to the 2022 Global
Cancer Statistics report, bladder cancer has become the sixth most common cancer worldwide and caused the
ninth highest number of deaths (Siegel et al 2022). Based on the pathological depth of the tumor invasion,
bladder cancer is characterized as a heterogeneous disease with two major subtypes: muscle-invasive bladder
cancer (MIBC) and non-muscle-invasive bladder cancer (NMIBC) (Wong et al 2021). Early and accurate classification of bladder cancer is critical for the diagnosis, treatment and follow-up of oncologic patients (Benson et al 2009). Therefore, to improve the efficacy of bladder cancer screening, accurate and reliable diagnostic methods are required.
Current diagnostic methods for bladder cancer pose great challenges for clinicians. The available tools for diagnosis and staging include: (a) optical cystoscopy, an invasive and costly method; (b) computed tomography (CT); and (c) magnetic resonance imaging (MRI). Optical cystoscopy is regarded as the gold standard for bladder cancer diagnosis, but the procedure is painful for patients and may fail to visualize certain areas within the bladder. Compared with optical cystoscopy, imaging techniques have been developed to detect tumors non-invasively. Given MRI's high soft-tissue contrast and non-invasive nature, MRI-based image texture analysis has made radiomics methods that predict tumor stage and grade a potential alternative for bladder cancer evaluation (Caglic et al 2020). The widespread adoption of medical imaging equipment has greatly enriched medical image data (Guiot et al 2022). After training on massive annotated data, data-driven deep learning models are becoming increasingly powerful tools for solving medical imaging problems such as image reconstruction (Lv et al 2021), lesion detection (Yu et al 2021), image segmentation (Pan et al 2023) and image registration (Wu et al 2022). In recent years, several studies have shown that deep learning-based tissue segmentation methods using UNet are comparable to radiologists on MRI segmentation tasks, especially for bladder wall segmentation (Kushnure and Talbar 2022).
UNet uses a convolutional neural network (CNN) to downsample and encode the global information of the image, and adds short-cut connections that fuse the downsampling and upsampling layers by channel concatenation (Dolz et al 2018). In 2016, Cha et al conducted a pilot study on bladder CT image segmentation, generating a lesion likelihood map and refining boundaries with level sets for bladder cancer segmentation (Cha et al 2016). With superior soft-tissue contrast over CT, MRI has seen the introduction of various UNet-based segmentation methods for bladder wall extraction. In 2018, Dolz et al (Dolz et al 2018) first applied UNet to segment the bladder wall and tumor in MRI, introducing progressive dilated convolutional layers to expand the receptive field and decrease the sparsity of the dilated kernels. Liu et al further enhanced UNet by embedding a pyramidal atrous convolution block that captures multi-scale contextual information for accurate bladder wall segmentation (Liu et al 2019). 3D CNNs have also been studied: Hammouda et al introduced a 3D framework for T2W MRI that incorporates contextual information for each voxel and refines the network output with a conditional random field (Hammouda et al 2019). These studies highlight the ability of CNNs, particularly in MRI, to distinguish bladder tumor from wall tissue based on texture. Zhang et al used a two-stage method, manually segmenting bladder tumors on CT images and then training a classification model for invasiveness classification (Zhang et al 2021).
Unexpectedly, we found that existing research focuses solely on segmenting the bladder wall with deep learning models, which has some limitations. So far, there is no end-to-end deep neural network for directly localizing and classifying bladder cancer; in the domain of bladder cancer localization and classification in MRI, a thorough review indicates a lack of prior studies (Bandyk et al 2021). This research gap demands urgent attention. However, bladder cancer localization and classification in MRI still pose several problems: (a) delineating the bladder wall and tumor is challenging due to very low contrast; (b) the spatial relationship between the bladder wall and tumor, the criterion for invasiveness classification, is difficult for a deep learning model to learn; and (c) tumors have varied shapes.
Accordingly, we propose a systematic model, the multi-scale multi-task spatial feature encoder network (MM-SFENet), to solve the bladder cancer localization and classification problem. Specifically, the anterior half of the network consists of a backbone with residual connections and a pyramidal spatial feature encoder (SFE) based on feature pyramid networks (FPN) (Lin et al 2017) that encodes semantic features at different levels. The posterior half generates four multi-scale predictions from different decoders to enhance the classification capability. For the localization task, we substitute IoU Loss for the four-variable-independent-regression Smooth-L1 Loss to improve localization performance. In summary, this paper makes three main contributions.
(i) We propose a novel end-to-end detector, MM-SFENet, to localize and classify bladder cancer in MRI, which is the first work of its kind.
(ii) We design an encoder, SFE, that considers multi-scale spatial features and embed it into the detector to learn the bladder cancer classification criterion.
(iii) Extensive experiments on the datasets show the importance of SFE and IoU Loss. Moreover, in comparisons with the latest detectors, MM-SFENet outperforms state-of-the-art methods.
2. Related works
Contemporary state-of-the-art object detection methods largely follow two major paradigms: two-stage detectors and one-stage detectors. As the standard paradigm of two-stage detection (Uijlings et al 2013, Girshick et al 2014, Girshick 2015, He et al 2015, Ren et al 2017, Cai and Vasconcelos 2021, Sun et al 2021), Faster R-CNN combines a proposal detector and a region-wise classifier. In the first stage, a region proposal network generates a coarse set of region proposals; in the second stage, region classifiers provide confidence scores and b-boxes for the proposed regions. Since Faster R-CNN appeared, the two-stage architecture has been established and upgraded by its descendants Cascade R-CNN (Cai and Vasconcelos 2021) and Sparse R-CNN (Sun et al 2021), which introduce multi-stage refinement and model relationships between targets, respectively. To better detect objects at multiple scales, FPN builds an embedded feature pyramid that performs feature fusion; owing to its excellent performance, it has become a basic module of the latest detectors.
Compared with two-stage detectors, one-stage detectors (Sermanet et al 2013, Liu et al 2016, Redmon and Farhadi 2017, 2018, Bochkovskiy et al 2020, Lin et al 2020) are better suited to real-time object detection, but they are less accurate. These works have made significant progress in different areas. An end-to-end bladder cancer localization and classification method can be well implemented with a deep learning-based two-stage detector.
From equation (1), it is clear that the receptive field size increases with the number of layers, so each convolutional layer 'sees' different information about the object.
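For reference, equation (1) is the standard recursive receptive-field relation (Luo et al 2017); a common form, in notation assumed here ($k_l$ is the kernel size of layer $l$, $s_i$ the stride of layer $i$, and $r_0 = 1$), is

$$r_l = r_{l-1} + (k_l - 1)\prod_{i=1}^{l-1} s_i.$$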
A deep CNN exhibits a pyramidal shape with inherent multi-scale characteristics, where higher layers
convey more abstract and semantically meaningful information about objects, while lower layers represent low-
level details like edges, contours, and positions. Detectors like SSD (Liu et al 2016) and MS-CNN (Cai et al 2016)
operate independently on feature maps from various backbone hierarchies, avoiding the mixing of high- and
low-level information. This approach, termed prediction pyramid network (PPN), is illustrated in figure 1(b).
However, PPN introduces semantic gaps due to distinct feature hierarchies, particularly affecting the accurate
prediction of small objects. To address this, Lin et al proposed the feature pyramid network (FPN) as a solution,
depicted in figure 1(c). FPN, constructed upon the characteristics of CNN, integrates multiple shallow and
abstract features in a top-down architecture, serving as an encoder that encodes multi-scale features from the
backbone and provides multi-level feature representations for the decoder (detection heads) (Redmon and
Farhadi 2017, Chen et al 2021).
Faster R-CNN and its descendants have been utilized in medical image analysis, where the demand for real-time inference is low. The first step in R-CNNs is downsampling to extract high-level semantic features, performed by the backbone. For tasks like differentiating the bladder wall from tumors in medical images, a deeper CNN-based backbone is necessary, which underscores the importance of the backbone's discriminative capacity for tissue differentiation. Because standard deep CNNs such as VGG (Simonyan and Zisserman 2014) have difficulty converging to a minimum and suffer from vanishing gradients, the literature introduces residual connections (He et al 2016), employed in the backbone to effectively shorten the gradient flow path. However, a backbone with residual connections has a limitation: when only the single last feature map is fed into the region proposal network during inference, the network is sensitive to only a specific size range. To overcome this, a common strategy is to add a feature pyramid network (FPN) as a neck after the backbone. The FPN neck enables the backbone to extract multi-scale features and fuse semantic information from different layers, including spatial information from lower layers, enhancing both localization accuracy and classification performance.
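As an illustration of this backbone-plus-neck pattern, the following sketch uses torchvision's generic ResNet-FPN utility rather than the paper's own implementation (the keyword for pretrained weights varies across torchvision versions):

```python
import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet-50 backbone with an FPN neck, randomly initialized.
backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)

feats = backbone(torch.randn(1, 3, 512, 512))
for name, f in feats.items():
    # Multi-scale maps '0'-'3' plus 'pool', each with 256 channels.
    print(name, tuple(f.shape))
```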
$$\mathrm{SmoothL1}(d)=\begin{cases}0.5\,d^{2}, & |d|<1\\ |d|-0.5, & \text{otherwise}\end{cases}\qquad(2)$$
As equation (2) shows, Smooth-L1 Loss considers only the distance d between corresponding points and therefore loses the scale information of the prediction box. This allows the network to converge rapidly during training, but deviations between the prediction box and the ground-truth box appear during the test stage.
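To make the scale problem concrete, the following sketch (our illustration, not from the paper) applies the same 1-pixel corner offset to a small and a large box: the summed Smooth-L1 loss is identical, while the IoU differs drastically.

```python
def smooth_l1(pred, gt):
    # Per-coordinate Smooth-L1 of equation (2), summed over (x1, y1, x2, y2).
    total = 0.0
    for p, g in zip(pred, gt):
        d = abs(p - g)
        total += 0.5 * d * d if d < 1 else d - 0.5
    return total

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

small_gt, small_pred = (0, 0, 4, 4), (1, 1, 5, 5)
large_gt, large_pred = (0, 0, 100, 100), (1, 1, 101, 101)
print(smooth_l1(small_pred, small_gt), iou(small_pred, small_gt))  # 2.0  ~0.39
print(smooth_l1(large_pred, large_gt), iou(large_pred, large_gt))  # 2.0  ~0.96
```

An IoU-based loss penalizes the small box's prediction far more heavily, which matches the perceived localization quality.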
(i) Dataset1: Dataset1 comprises a total of 1287 MRI images obtained from 98 patients diagnosed with bladder cancer, of which 130 images are used as an independent test set and the rest as the training set. The sequence parameters are configured as follows: 80–124 slices per scan, each slice measuring 512 × 512 pixels with a pixel resolution of 0.5 mm × 0.5 mm. The slice thickness and inter-slice spacing are both 1 mm. The 3D scanning process had acquisition times ranging from 160.456 to 165.135 s. The repetition and echo times are 2500 ms and 135 ms, respectively.
(ii) Dataset2: Dataset2 comprises a total of 2000 MRI images obtained from 121 patients diagnosed with bladder cancer. The sequence parameters are configured as follows: 80–124 slices per scan, each slice measuring 256 × 256 pixels with a pixel resolution of 0.5 mm × 0.5 mm. All other parameters are the same as for dataset1. The dataset, covering 121 cases, consists of MRI images and tumor invasiveness annotation files. We strictly divide dataset2 into train, validation and test sets in the ratio 7:2:1: 1400 images for the train set, 400 images for the validation set and 200 images for the test set.
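A minimal sketch of such a split (illustrative only; the paper does not state whether the split is per image or per patient, the latter being safer against leakage):

```python
import random

def split_dataset(image_paths, seed=0):
    # 7:2:1 train/valid/test split (1400/400/200 for the 2000 images of dataset2).
    rng = random.Random(seed)
    paths = list(image_paths)
    rng.shuffle(paths)
    n = len(paths)
    n_train, n_valid = int(0.7 * n), int(0.2 * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_valid],
            paths[n_train + n_valid:])
```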
3.4. MM-SFENet
Based on the above discussion, we propose MM-SFENet; its detailed architecture is shown in figure 4. In this section, the two key technical components of MM-SFENet are introduced in detail.
Figure 4. Model architecture for detecting and localizing bladder cancer. The FPN-based SFE lets the network detect at multiple scales and fuse multi-level information at the same time. We assign anchors of a single corresponding scale to each level, with anchor areas of {32², 64², 128², 256²} pixels. Accordingly, RoIs of different scales are assigned to the pyramid levels; the outputs are then gathered into the decoder, and duplicate predictions are eliminated by an NMS operation.
Figure 5. The architecture of SFE. SFE takes an arbitrary bladder cancer MRI as input; the short and long connections inside the network increase the feature representation capability of the model.
The feature fusion process is depicted in the right half of figure 5; it requires the feature maps to be consistent in size and channel count.
Step 1. To achieve this alignment, we perform a feature map channel transformation, changing the number of
channels to 256 for feature map Mi−1. This transformation could involve employing 1 × 1 convolutional
layers, which are commonly used for this purpose in deep learning architectures.
Step 2. Next, the deeper feature map Pi is upsampled and enlarged to match the resolution of the feature map Mi−1. Techniques like bilinear interpolation or transposed convolution can be employed for this upsampling.
Step 3. Subsequently, the fused feature map Pi−1 is obtained by combining the upsampled feature map Pi with the
transformed shallow feature map Mi−1. This fusion process typically involves an element-wise operation,
such as addition or 1 × 1 convolutional layers, which allows for the integration of information from both
feature maps.
By following this process of channel transformation, upsampling and element-wise fusion, we align the feature maps' sizes and channels and fuse their information.
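A minimal PyTorch sketch of one such fusion step (our illustration of steps 1–3; the module and variable names are ours, and the 256-channel width follows the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionStep(nn.Module):
    """One top-down fusion step: align channels, upsample, element-wise add."""

    def __init__(self, shallow_channels):
        super().__init__()
        # Step 1: 1x1 convolution maps the shallow map M_{i-1} to 256 channels.
        self.lateral = nn.Conv2d(shallow_channels, 256, kernel_size=1)

    def forward(self, p_i, m_prev):
        m_prev = self.lateral(m_prev)                 # channel alignment
        # Step 2: upsample P_i to the spatial size of M_{i-1}.
        p_up = F.interpolate(p_i, size=m_prev.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Step 3: element-wise addition fuses deep semantics and shallow detail.
        return p_up + m_prev

fuse = FusionStep(shallow_channels=512)
p_i = torch.randn(1, 256, 16, 16)      # deeper pyramid level P_i
m_prev = torch.randn(1, 512, 32, 32)   # shallower backbone map M_{i-1}
p_prev = fuse(p_i, m_prev)             # fused P_{i-1}: shape (1, 256, 32, 32)
```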
As the gradient term $\partial I/\partial \tilde{x}_{t,b,l,r}$ shows, the loss function is inversely proportional to the intersection area: the larger the intersection area, the smaller the loss. IoU Loss treats the four coordinates of the b-box as a whole, which ensures that the prediction-box scale remains similar to that of the ground-truth box during b-box regression.
$$\mathrm{AP}=\sum_{k} P(k)\,\Delta r(k)\qquad(7)$$

where P(k) is the height of the k-th rectangle under the PR curve and Δr(k) is its width. The formula for calculating mAP is

$$\mathrm{mAP}=\frac{\sum \mathrm{AP}}{m}\qquad(8)$$
where m is the total number of categories. The formula for calculating IoU is

$$\mathrm{IoU}(\mathrm{Box}_p,\ \mathrm{Box}_T)=\frac{\mathrm{Box}_p\cap \mathrm{Box}_T}{\mathrm{Box}_p\cup \mathrm{Box}_T}\qquad(9)$$

where Box_p is the predicted box area and Box_T is the ground-truth box area. The formulas for Accuracy, Sensitivity and Specificity are

$$\mathrm{Accuracy}=\frac{TP+TN}{TP+FP+TN+FN}\qquad(10)$$
$$\mathrm{Sensitivity}=\frac{TP}{TP+FN}\qquad(11)$$

$$\mathrm{Specificity}=\frac{TN}{TN+FP}.\qquad(12)$$
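Equations (10)–(12) translate directly into code; a minimal sketch over confusion-matrix counts:

```python
def classification_metrics(tp, fp, tn, fn):
    # Equations (10)-(12): accuracy, sensitivity and specificity.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```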
The bladder cancer detection process can be divided into two parts, localization and classification, and we design the loss functions for multi-task learning accordingly. We use binary cross-entropy for classification and IoU Loss for b-box regression; the total loss is given in equation (13), where p_i* is the probability the classifier assigns to MIBC, p_i is the probability it assigns to NMIBC, N_pred is the total number of prediction boxes output by the final detector, and λ is the weight of the localization loss.
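A minimal sketch of such an objective, assuming a UnitBox-style IoU Loss (Yu et al 2016), the classification term averaged over predictions and the localization term normalized by N_pred; the exact form of equation (13) may differ:

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits, cls_targets, pred_boxes, gt_boxes, lam=1.0):
    """Sketch of a BCE + lambda * IoU-Loss multi-task objective.

    cls_targets are float labels; boxes are (N, 4) tensors of (x1, y1, x2, y2).
    """
    n_pred = cls_logits.shape[0]
    cls_loss = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)

    # IoU of each predicted box against its matched ground-truth box.
    ix1 = torch.max(pred_boxes[:, 0], gt_boxes[:, 0])
    iy1 = torch.max(pred_boxes[:, 1], gt_boxes[:, 1])
    ix2 = torch.min(pred_boxes[:, 2], gt_boxes[:, 2])
    iy2 = torch.min(pred_boxes[:, 3], gt_boxes[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = ((pred_boxes[:, 2] - pred_boxes[:, 0])
              * (pred_boxes[:, 3] - pred_boxes[:, 1]))
    area_g = ((gt_boxes[:, 2] - gt_boxes[:, 0])
              * (gt_boxes[:, 3] - gt_boxes[:, 1]))
    iou = inter / (area_p + area_g - inter + 1e-7)

    loc_loss = (-torch.log(iou + 1e-7)).sum() / n_pred  # UnitBox IoU Loss
    return cls_loss + lam * loc_loss
```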
4. Results
Figure 8. The localization loss function used in these curves is Smooth-L1 Loss. Modulating the weight values in the detector's localization loss also affects the classification loss. This interconnection arises from the coupling of the two tasks: during back-propagation, the gradients concurrently optimize both.
Table 3. The results in bold are the best results obtained on the test set in Dataset1.
Classification accuracy is specific to the NMIBC subtype.
Detectors mAP (%) IoU (%) Spec (%) Sen (%) Acc (%)
Table 4. The results in bold are the best results obtained on the test set in Dataset2.
Classification accuracy is specific to the NMIBC subtype.
Detectors mAP (%) IoU (%) Spec (%) Sen (%) Acc (%)
Table 5. MM-SFENet detection results of 10-fold cross-validation and classification results of ResNet18 on the BCMnist dataset.
Fold mAP (%) IoU (%) Spec (%) Sen (%) Acc (%) BCMnist Acc (%)
In this study, when adjusting the weights of the several detection losses within our model, we observed that excessively high weight values cause the loss function to produce NaN (Not-a-Number) values during training, preventing the model from converging. Figure 8(b) shows that increasing the weight value generates additional loss, which increases the overall gradient during training. Careful adjustment proved crucial for proper convergence. The results underscore the delicate balance required when tuning the weight of the localization loss: assigning appropriate weights is essential for effective training guidance, but setting them too high is counterproductive.
The results in table 2 reveal an interesting phenomenon: the depth of the backbone network is not directly proportional to detection accuracy. ResNet50 outperforms enhanced ResNet backbones (e.g. ResNeXt101 and ResNeSt50) despite their attention-mechanism modules. This indicates that, in bladder cancer tumor detection, the anticipated need for very high feature extraction capability may be less significant than expected. Instead, our results emphasize the crucial role of feature fusion in the SFE architecture: SFE's design excels at extracting and fusing multi-scale features, which underpins its performance. In summary, our findings highlight the nuanced interplay between backbone architecture and detection accuracy in bladder cancer tumor detection.
Compared to MM-SFENet's intricate tumor detection task, BCMnist involves a single classification objective. Unlike the detection task in MM-SFENet, BCMnist's classification task excludes SFE's feature fusion because of its small input image size. Shallow neural networks prove effective in distinguishing bladder wall and tumors, as evidenced by the ablation experiments, which suggests that deep neural networks are not always necessary for detecting tumor invasiveness. Figure 9 shows the model's impressive ability to discern subtle textural differences between the bladder wall and tumors in MRI. This nuanced distinction facilitates the accurate classification of both MIBC and NMIBC subtypes. The visualization results are compelling evidence that the proposed detection model can differentiate between bladder wall and tumors at pixel-level granularity.
However, our study faces two notable limitations. Firstly, the limited availability and labeling quality of existing MRI datasets for bladder cancer constrained model development and validation, potentially affecting generalizability, since diagnostic standards and imaging machines vary among doctors and hospitals. Secondly, ensuring the interpretability of the diagnostic criteria, a common concern when deploying deep learning models for computer-aided diagnosis, remains challenging.
Notably, compared with other machine learning methods, deep learning is a complex black box. Optimizing this model in the future requires incorporating doctors' ideas and experience in disease diagnosis and treatment to enhance interpretability: only when doctors understand why the model makes an assessment can it better assist their decision-making. VI-RADS is a newly developed scoring system aimed at standardizing MRI acquisition, interpretation and reporting for bladder cancer, and it has proven to be a reliable tool for differentiating NMIBC from MIBC. Our next step is to integrate the industry-recognized VI-RADS staging standard and multi-parametric MRI into bladder cancer staging (Panebianco et al 2018). This standardized approach aims to significantly enhance the accuracy of the model for bladder cancer staging and thereby improve the prognosis of patients.
Acknowledgments
We sincerely appreciate the valuable contributions of the reviewers to our article. Their professional knowledge and rigorous evaluation helped refine the paper, and their constructive comments improved its quality and readability.

Data availability statement

The data cannot be made publicly available upon publication because they contain sensitive personal information. The data that support the findings of this study are available upon reasonable request from the authors.
Funding
This work is partly supported by the National Nature Science Foundation of China (No.61872225), the Natural
Science Foundation of Shandong Province (No.ZR2020KF013, No.ZR2020ZD44, No.ZR2019ZD04, No.
ZR2020QF043) and Introduction and Cultivation Program for Young Creative Talents in Colleges and
Universities of Shandong Province (No.2019-173), the Special fund of Qilu Health and Health Leading Talents
Training Project.
Ethical statement
This study was approved by the Ethics Committee of Shandong University of Traditional Chinese Medicine.
The ethical approval number is 2020-079. All procedures contributing to this work comply with the ethical
standards of the relevant national and institutional committees on human experimentation and the Helsinki
Declaration of 1975, as revised in 2008. All participants signed an informed consent form before the study.
References
Babjuk M et al 2022 European Association of Urology Guidelines on Non–muscle-invasive Bladder Cancer (Ta, T1, and Carcinoma in Situ)
Eur. Urol. 81 75–94
Bandyk M G, Gopireddy D R, Lall C, Balaji K and Dolz J 2021 MRI and CT bladder segmentation from classical to deep learning based
approaches: Current limitations and lessons Comput. Biol. Med. 134 104472
Benson A B et al 2009 NCCN clinical practice guidelines in oncology: hepatobiliary cancers J. Natl Comprehensive Cancer Netw. : JNCCN 7
350–91
Bochkovskiy A, Wang C Y and Liao H Y M 2020 Yolov4: optimal speed and accuracy of object detection arXiv:2004.10934
Caglic I, Panebianco V, Vargas H A, Bura V, Woo S, Pecoraro M, Cipollari S, Sala E and Barrett T 2020 MRI of Bladder Cancer: Local and
Nodal Staging J. Magn. Reson. Imaging 52 649–67
Cai Z, Fan Q, Feris R S and Vasconcelos N 2016 A unified multi-scale deep convolutional neural network for fast object detection Computer Vision—ECCV 2016 ed B Leibe et al (Springer International Publishing) pp 354–70
Cai Z and Vasconcelos N 2021 Cascade R-CNN: High Quality Object Detection and Instance Segmentation IEEE Trans. Pattern Anal. Mach.
Intell. 43 1483–98
Cha K H, Hadjiiski L M, Samala R K, Chan H P, Cohan R H, Caoili E M, Paramagul C, Alva A and Weizer A Z 2016 Bladder Cancer
Segmentation in CT for Treatment Response Assessment: Application of Deep-Learning Convolution Neural Network—A Pilot
Study Tomography 2 421–29
Chen Q, Wang Y, Yang T, Zhang X, Cheng J and Sun J 2021 You only look one-level feature IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) pp 13034–43
Dolz J, Xu X, Rony J, Yuan J, Liu Y, Granger E, Desrosiers C, Zhang X, Ben Ayed I, Lu H et al 2018 Multiregion segmentation of bladder
cancer structures in MRI with progressive dilated convolutional networks Med. Phys. 45 5482–93
Girshick R 2015 Fast R-CNN IEEE Int. Conf. on Computer Vision (ICCV) pp 1440–48
Girshick R, Donahue J, Darrell T and Malik J 2014 Rich feature hierarchies for accurate object detection and semantic segmentation IEEE Conf. on Computer Vision and Pattern Recognition pp 580–87
Gsaxner C, Pfarrkirchner B, Lindner L, Pepe A, Roth P M, Egger J and Wallner J 2018 PET-Train: automatic ground truth generation from PET acquisitions for urinary bladder segmentation in CT images using deep learning Biomedical Engineering Int. Conf. (BMEiCON) (Chiang Mai, Thailand, 21–24 November 2018) (IEEE) pp 1–5
Guiot J et al 2022 A review in radiomics: Making personalized medicine a reality via routine imaging Medicinal Res. Rev. 42 426–40
Hammouda K et al 2019 A CNN-based framework for bladder wall segmentation using MRI Int. Conf. on Advances in Biomedical Engineering (ICABME) pp 1–4
He K, Zhang X, Ren S and Sun J 2015 Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition IEEE Trans. Pattern
Anal. Mach. Intell. 37 1904–16
He K, Zhang X, Ren S and Sun J 2016 Deep residual learning for image recognition IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) pp 770–8
Kushnure D T and Talbar S N 2022 HFRU-Net: High-Level Feature Fusion and Recalibration UNet for Automatic Liver and Tumor
Segmentation in CT Images Comput. Methods Programs Biomed. 213 106501
Lin T Y, Dollár P, Girshick R, He K, Hariharan B and Belongie S 2017 Feature pyramid networks for object detection IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) pp 936–44
Lin T Y, Goyal P, Girshick R, He K, Dollár P et al 2020 Focal Loss for Dense Object Detection IEEE Trans. Pattern Anal. Mach. Intell. 42
318–27
Liu J, Liu L, Xu B, Hou X, Liu B, Chen X, Shen L and Qiu G 2019 Bladder cancer multi-class segmentation in MRI with pyramid-in-pyramid network IEEE 16th Int. Symp. on Biomedical Imaging (ISBI 2019) pp 28–31
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C 2016 SSD: Single Shot MultiBox Detector Computer Vision—ECCV 2016 (Springer International Publishing) pp 21–37
Luo W, Li Y, Urtasun R and Zemel R 2017 Understanding the effective receptive field in deep convolutional neural networks arXiv:1701.04128
Lv J, Wang C and Yang G 2021 PIC-GAN: A Parallel Imaging Coupled Generative Adversarial Network for Accelerated Multi-Channel MRI
Reconstruction Diagnostics 11 1
Omeiza D, Speakman S, Cintas C and Weldermariam K 2019 Smooth Grad-CAM++: an enhanced inference level visualization technique for deep convolutional neural network models arXiv:1908.01224
Pan S et al 2023 2D medical image synthesis using transformer-based denoising diffusion probabilistic model Phys. Med. Biol. 68 105004
Panebianco V et al 2018 Multiparametric Magnetic Resonance Imaging for Bladder Cancer: Development of VI-RADS (Vesical Imaging-
Reporting And Data System Eur. Urol. 74 294–306
Pinto J R and Tavares J M R 2017 A versatile method for bladder segmentation in computed tomography two-dimensional images under
adverse conditions Proc. Inst. Mech. Eng. 231 871–80
Redmon J and Farhadi A 2017 YOLO9000: Better, Faster, Stronger IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) pp 6517–25
Redmon J and Farhadi A 2018 YOLOv3: An incremental improvement arXiv:1804.02767
Ren S, He K, Girshick R, Sun J et al 2017 Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks IEEE Trans.
Pattern Anal. Mach. Intell. 39 1137–49
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R and LeCun Y 2013 OverFeat: integrated recognition, localization and detection using
convolutional networks arXiv:1312.6229
Siegel R L, Miller K D, Fuchs H E, Jemal A et al 2022 Cancer statistics, 2022 CA: A Cancer Journal for Clinicians 72 7–33
Simonyan K and Zisserman A 2014 Very deep convolutional networks for large-scale image recognition arXiv:1409.1556
Sun P et al 2021 Sparse R-CNN: end-to-end object detection with learnable proposals IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) pp 14449–58
Uijlings J, vandeSande K, Gevers T, Smeulders A et al 2013 Selective Search for Object Recognition Int. J. Comput. Vision 104 154–71
Wong V K, Ganeshan D, Jensen C T, Devine C E et al 2021 Imaging and Management of Bladder Cancer Cancers 13 1396
Wu C, Fu T, Wang Y, Lin Y, Wang Y, Ai D, Fan J, Song H and Yang J 2022 Fusion Siamese network with drift correction for target tracking in
ultrasound sequences Phys. Med. Biol. 67 4
Xu X P, Zhang X, Liu Y, Tian Q, Zhang G P, Yang Z Y, Lu H B and Yuan J 2017 Image and Graphics ed Y Zhao, X Kong and D Taubman (Shanghai: Springer International Publishing) pp 528–42
Yang J, Shi R, Wei D, Liu Z, Zhao L, Ke B, Pfister H, Ni B et al 2023 MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D
biomedical image classification Sci. Data 10 41
Yu C J et al 2021 Lightweight deep neural networks for cholelithiasis and cholecystitis detection by point-of-care ultrasound Comput.
Methods Programs Biomed. 211 106382
Yu J, Jiang Y, Wang Z, Cao Z and Huang T 2016 UnitBox: an advanced object detection network Proc. 24th ACM Int. Conf. on Multimedia (MM '16) (Association for Computing Machinery) pp 516–20
Zhang G et al 2021 Deep Learning on Enhanced CT Images Can Predict the Muscular Invasiveness of Bladder Cancer Front. Oncol. 11 1
Zhu Q, Du B, Yan P, Lu H and Zhang L 2018 Shape prior constrained PSO model for bladder wall MRI segmentation Neurocomputing 294
19–28