
TPSegmentDiff: An Enhanced Diffusion Model for Tactile Paving Image Segmentation

Published: 26 December 2024

Abstract

Diffusion models are widely used in image generation. This study enhances the traditional diffusion model and introduces a diffusion-based method for tactile paving image segmentation called TPSegmentDiff. We also propose a voting mechanism that reduces the random error caused by the random distribution of the initial noise, improving the performance of TPSegmentDiff. In particular, applying the voting mechanism to the TPSegmentDiff-DDPM algorithm improves MIoU by about 5% and the F1-score by about 4%.
Figure 1: We modified the diffusion model and proposed a diffusion-based image segmentation method, TPSegmentDiff, for tactile paving. To address the issue of unstable segmentation results caused by the random distribution of initial noise, we designed a voting mechanism. This mechanism plays a crucial role in improving the stability and reliability of the segmentation process, thereby enhancing the overall performance of TPSegmentDiff.

1 Introduction

Effective recognition and segmentation of tactile paving in computer vision remain challenging due to the diversity of environments and the varying wear conditions of the tactile paving itself. While effective in simple scenarios, traditional image segmentation methods often struggle in complex or dynamic environments, limiting the effectiveness and reliability of intelligent assistive devices in real-world applications[9]. Currently, methods for tactile paving segmentation can be divided into traditional image processing methods and deep learning-based methods.
Traditional methods typically use edge detection (such as Canny edge detection) and morphological processing (such as erosion and dilation) to identify the lines and texture features of tactile paving. Another common approach is color and texture analysis, which uses the color and texture characteristics of tactile paving, employing color space transformation and texture analysis techniques for segmentation. These methods perform poorly under varying lighting conditions and in complex backgrounds[6].
Deep learning introduces new approaches to tactile paving segmentation. Convolutional Neural Networks (CNNs) for tactile paving segmentation can significantly improve detection accuracy and robustness. Typical CNNs like FCN, U-Net, and SegNet have been widely applied to image segmentation tasks, automatically extracting significant object features through end-to-end learning[8][11][1]. Deep semantic segmentation networks such as DeepLab and PSPNet, by introducing techniques like atrous convolution and spatial pyramid pooling, further enhance the precision and detail retention of tactile paving segmentation[12]. Transfer learning leverages models pre-trained on large datasets to achieve good segmentation results on small tactile paving datasets, effectively reducing the need for extensive labeled data[4]. Multimodal fusion of RGB and depth images can enhance the robustness and accuracy of tactile paving segmentation: depth information provides additional geometric features, improving segmentation performance in complex scenes[16].
This study aims to propose a high-precision tactile paving image segmentation method based on diffusion models, named TPSegmentDiff. This method has the potential to significantly improve navigation for visually impaired individuals in complex urban environments. It not only increases the accuracy of the segmentation algorithm but also enhances its adaptability to changes in complex environments, such as varying lighting conditions and the wear state of tactile paving.
The remainder of the paper is organized as follows: Section 2 reviews related work in image segmentation and diffusion models. Section 3 introduces the fundamental concepts of U-Net and diffusion models. Section 4 presents our method, TPSegmentDiff, including adapting diffusion models for tactile paving segmentation. Section 5 presents the experimental analysis, followed by a comparison with existing methods. Finally, Section 6 concludes the paper and suggests future research directions.

2 Related Work

2.1 Image Segmentation

Image segmentation involves dividing an image into multiple regions with distinct features. Early image segmentation techniques were primarily based on simple threshold processing, such as the Otsu method, which determines the optimal threshold by maximizing inter-class variance[20]. More complex edge detection algorithms, like the Sobel and Canny algorithms, handle edge information in images more effectively[13][2].
Advancements in deep learning have significantly propelled the progress of image segmentation techniques. Notably, the U-Net architecture proposed by Ronneberger et al. effectively enhances local and global image information processing through deep networks and skip connections, significantly improving segmentation outcomes in fields like medical imaging[12].
Recent research has made remarkable progress in the field of image segmentation. Oktay et al. introduced the Attention U-Net, which incorporates an attention mechanism to selectively focus on important regions in the image, thereby improving segmentation accuracy and further enhancing U-Net’s performance in medical image segmentation[10].
Chen et al. developed DeepLabV3+, which combines Atrous Spatial Pyramid Pooling (ASPP) with encoder-decoder architecture, significantly enhancing the accuracy and boundary detail handling of image segmentation, particularly excelling in natural scene image segmentation[5].
Wang et al. proposed HRNet (High-Resolution Network), which maintains high-resolution feature maps to enhance the fineness and accuracy of image segmentation. This method is especially suitable for tasks requiring fine segmentation, such as facial recognition and medical image segmentation[17].
Xie et al. developed SegFormer, a Transformer-based model that leverages the global feature extraction capabilities of Transformers, significantly improving segmentation performance, especially in handling complex backgrounds and diverse scenes[18].
Cao et al. introduced Swin-UNet, which combines the strengths of Swin Transformer and U-Net through multi-scale feature fusion and global information capture, achieving high-precision image segmentation. It demonstrates outstanding performance in both medical and remote sensing image segmentation[3].

2.2 Diffusion Model

Diffusion models are generative models based on stochastic processes, initially used to simulate diffusion processes in physical phenomena. In recent years, diffusion models have been introduced into the field of machine learning, demonstrating excellent performance, particularly in image generation tasks. These models achieve the transformation from simple noise data to complex data structures by gradually introducing noise and then progressively removing it during the generation process.
The introduction of diffusion models has opened new research directions in image processing, offering new possibilities for handling complex image data.
Sohl-Dickstein et al. first proposed the theory of diffusion models, which use the inverse process of a Markov chain to denoise and recover data[14].
Ho et al. proposed the DDPM (Denoising Diffusion Probabilistic Models) algorithm, which introduced the diffusion model into the field of image generation[7].
Song et al. subsequently proposed the DDIM (Denoising Diffusion Implicit Models) algorithm, which uses a non-Markovian method to enable skip-step sampling, substantially improving image generation speed[15].
These advancements provide the theoretical foundation and technical support for applying such models in image segmentation tasks.

3 Background

The traditional diffusion models DDPM and DDIM both consist of forward and backward processes.
The forward process is mainly used to corrupt the data by adding noise, which can be described as follows: for given original data $x_0$, Gaussian noise is added step by step as in Eq.1:
\begin{equation}x_{t} = \sqrt {1 - \beta _t} x_{t-1} + \sqrt {\beta _t} \epsilon _t\end{equation}
(1)
where $x_{t-1}$ represents the image at the previous time step, i.e., the image before noise is added at the current step, $x_t$ represents the image after noise is added at the current step, and $\beta_t$ represents the variance used at the current time step. By simplification, the noisy data at time step $t$ can be calculated directly from Eq.2 and Eq.3:
\begin{equation}x_t = \sqrt {\bar{\alpha }_t} x_0 + \sqrt {1 - \bar{\alpha }_t} \epsilon\end{equation}
(2)
\begin{equation}\bar{\alpha }_t = \prod _{i=1}^{t} (1 - \beta _i)\end{equation}
(3)
where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$. $x_t$ can be viewed as a linear combination of the original data $x_0$ and standard Gaussian noise $\epsilon$. When $t$ is large, $x_t$ approximates standard Gaussian noise. At any point in the noise-addition process, $x_t$ can be derived directly from the original data $x_0$.
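For concreteness, the closed-form noising of Eq.2 can be written in a few lines of PyTorch. The sketch below assumes a linear variance schedule for $\beta_t$ with $T = 1000$ steps; the paper does not specify the schedule or its range, so these are illustrative assumptions.

import torch

# Variance schedule (assumed linear; the paper does not specify the schedule).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas                       # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{i<=t} alpha_i (Eq. 3)

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t directly from x_0 in one step (Eq. 2)."""
    eps = torch.randn_like(x0)             # standard Gaussian noise
    a_bar = alpha_bars[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps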
The backward process is mainly used to recover the data by predicting and removing noise through a parameterized model $\epsilon_\theta(x_t, t)$.
In the traditional DDPM algorithm, the backward process can be described by Eq.4:
\begin{equation}x_{t-1} = \frac{1}{\sqrt {1 - \beta _t}} \left(x_t - \frac{\beta _t}{\sqrt {1 - \bar{\alpha }_t}} \epsilon _\theta (x_t, t) \right) + \sigma _t z\end{equation}
(4)
where $\sigma_t$ represents the noise scale at time step $t$ and $z$ is standard Gaussian noise. Dropping the noise term, we have Eq.5:
\begin{equation}x_{t-1} \approx \frac{1}{\sqrt {\alpha _t}} \left(x_t - \frac{\beta _t}{\sqrt {1 - \bar{\alpha }_t}} \epsilon _\theta (x_t, t) \right)\end{equation}
(5)
In the traditional DDIM algorithm, the backward process can be described by Eq.6:
\begin{equation}x_{t-\tau } = \sqrt {\bar{\alpha }_{t-\tau }} \left(\frac{x_t - \sqrt {1 - \bar{\alpha }_t} \epsilon _\theta (x_t, t)}{\sqrt {\bar{\alpha }_t}} \right) + \sqrt {1 - \bar{\alpha }_{t-\tau } - \sigma _t^2} \epsilon _\theta (x_t, t) + \sigma _t z\end{equation}
(6)
setting $\sigma_t = 0$, we have Eq.7:
\begin{equation}x_{t-\tau } \approx \sqrt {\bar{\alpha }_{t-\tau }} \left(\frac{x_t - \sqrt {1 - \bar{\alpha }_t} \epsilon _\theta (x_t, t)}{\sqrt {\bar{\alpha }_t}} \right) + \sqrt {1 - \bar{\alpha }_{t-\tau }} \epsilon _\theta (x_t, t)\end{equation}
(7)
where $\tau$ represents the step length.
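As a sketch, the deterministic reverse updates of Eq.5 and Eq.7 translate directly into code. Here `eps_pred` stands for the output of the noise prediction model $\epsilon_\theta(x_t, t)$, and the schedule tensors are those defined in the previous sketch; the function names are illustrative.

import torch

@torch.no_grad()
def ddpm_step(x_t, t, eps_pred, betas, alphas, alpha_bars):
    """One DDPM reverse step without the noise term (Eq. 5)."""
    return (x_t - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps_pred) / alphas[t].sqrt()

@torch.no_grad()
def ddim_step(x_t, t, tau, eps_pred, alpha_bars):
    """One DDIM reverse step of stride tau with sigma_t = 0 (Eq. 7)."""
    # Predicted x_0 from the current noisy sample.
    x0_pred = (x_t - (1.0 - alpha_bars[t]).sqrt() * eps_pred) / alpha_bars[t].sqrt()
    a_prev = alpha_bars[t - tau] if t - tau >= 0 else torch.tensor(1.0)
    return a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps_pred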

4 TPSegmentDiff Method

We propose the TPSegmentDiff method based on the traditional DDPM and DDIM models. Using tactile paving optical images as prior knowledge, the trained TPSegmentDiff is capable of generating high-precision segmentation masks from random noise to achieve tactile paving segmentation.
Input: Image $img$, ground truth $gt$, model parameters $\theta$
Output: Trained model parameters $\theta$
Procedure:
1: while $\theta$ has not converged do
2:   sample a random $t \in (0, 1000)$
3:   generate random noise $\varepsilon_t \sim N(0, I)$
4:   let $gt_t = \sqrt{\bar{\alpha}_t}\, gt + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon_t$ and $x_{in,t} = img \oplus gt_t$
5:   $\varepsilon_{out,t} = model_\theta(x_{in,t}, t)$
6:   compute $loss(\varepsilon_t, \varepsilon_{out,t})$ and update parameters $\theta$
7: return $\theta$
Table 1: TPSegmentDiff Training Algorithm

4.1 TPSegmentDiff Training

Table 1 presents the TPSegmentDiff training algorithm. In traditional diffusion generative models, the noise-predicting U-Net is typically configured with three input channels and three output channels so that it predicts noise for all channels and generates three-channel images. In this study, to generate a single-channel segmentation mask, we adjusted the U-Net's channel parameters during training, setting the input channels to $c_{in} = 4$ and the output channel to $c_{out} = 1$. During training, we preprocess the input data by concatenating the optical image, denoted $img$ with shape [3, h, w], and the ground-truth segmentation mask, denoted $gt$ with shape [1, h, w], along the channel dimension to obtain $x_{in}$. In the concatenation, the segmentation mask $gt$ is placed in the last channel.
In the forward process, the first three channels, $img$, are kept unchanged as prior knowledge, and only the last channel, $gt$, is subjected to noise addition. The process is as follows: Gaussian noise $\epsilon$ is randomly generated and a random $t$ is selected. According to Eq.2, noise corresponding to time step $t$ is added to the last channel $gt$ to obtain $x_{in,t}$, which is then input into the noise prediction model along with the time step $t$. The output is the predicted single-channel noise $\epsilon_{out}$. The predicted noise $\epsilon_{out}$ is then compared with the Gaussian noise $\epsilon$ used during the noise-addition process to calculate the loss, which is used for backpropagation to update the model parameters. Figure 2 visualizes the training algorithm; a training-step sketch follows the figure.
Figure 2: TPSegmentDiff training workflow
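A minimal training-step sketch under stated assumptions: `model` is a U-Net with $c_{in} = 4$ input and $c_{out} = 1$ output channels that accepts the concatenated input and the time step, and the loss is MSE between added and predicted noise; these names, the call signature, and the batch shapes are illustrative, not the paper's exact implementation.

import torch
import torch.nn.functional as F

def train_step(model, optimizer, img, gt, alpha_bars, T=1000):
    """One TPSegmentDiff training step (a sketch).
    img: [B, 3, H, W] optical image; gt: [B, 1, H, W] ground-truth mask."""
    t = torch.randint(0, T, (1,)).item()                   # random time step
    eps = torch.randn_like(gt)                             # noise for the mask channel only
    a_bar = alpha_bars[t]
    gt_t = a_bar.sqrt() * gt + (1.0 - a_bar).sqrt() * eps  # Eq. 2 applied to gt only
    x_in = torch.cat([img, gt_t], dim=1)                   # [B, 4, H, W]; mask in last channel
    eps_out = model(x_in, torch.tensor([t]))               # predicted single-channel noise
    loss = F.mse_loss(eps_out, eps)                        # compare with the added noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()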

4.2 TPSegmentDiff Segmentation

Table 2 presents the TPSegmentDiff segmentation algorithm, which comprises the TPSegmentDiff-DDPM and TPSegmentDiff-DDIM variants, based on traditional DDPM and DDIM respectively.
Input: Model parameters $\theta$, steps $t$, random noise $x_t$, image $x_{img}$, DDIM flag, step length $\tau$
Output: Segmentation mask $x_0$
Procedure:
1: while $t > 0$ do
2:   $x_{in,t} = x_{img} \oplus x_t$
3:   $\varepsilon_t = model_\theta(x_{in,t}, t)$
4:   if DDIM: // TPSegmentDiff-DDIM Algorithm
5:     $x_{t-\tau} = \sqrt{\bar{\alpha}_{t-\tau}}\, (x_t - \sqrt{1 - \bar{\alpha}_t}\, \varepsilon_t) / \sqrt{\bar{\alpha}_t} + \sqrt{1 - \bar{\alpha}_{t-\tau}}\, \varepsilon_t$
6:     $t \leftarrow t - \tau$
7:   else: // TPSegmentDiff-DDPM Algorithm
8:     $x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \varepsilon_t \right) + \sqrt{1 - \alpha_t}\, z, \quad z \sim N(0, I)$
9:     $t \leftarrow t - 1$
10: return $x_0$
Table 2: TPSegmentDiff Segmentation Algorithm
TPSegmentDiff-DDPM: Randomly generate Gaussian noise $x_t$ with shape [1, h, w], set $x_{in,t} = img \oplus x_t$, and input it into the noise prediction model to obtain the predicted noise $\varepsilon_t$. Then, according to Eq.5, calculate $x_{t-1}$. Next, set $x_{in,t-1} = img \oplus x_{t-1}$ and input it into the noise prediction model. This process is repeated iteratively until, after a sufficient number of iterations, $x_0$ is obtained as the segmentation mask.
TPSegmentDiff-DDIM: First, specify a step length $\tau$. Randomly generate Gaussian noise $x_t$ with shape [1, h, w], set $x_{in,t} = img \oplus x_t$, and input it into the noise prediction model to obtain the predicted noise $\varepsilon_t$. Then, according to Eq.7, calculate $x_{t-\tau}$. Next, set $x_{in,t-\tau} = img \oplus x_{t-\tau}$ and input it into the noise prediction model. This process is repeated iteratively until, after a sufficient number of iterations, $x_0$ is obtained as the segmentation mask. Figure 3 visualizes the segmentation algorithm; a sampling-loop sketch follows the figure.
Figure 3: TPSegmentDiff segmentation workflow
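A sketch of the TPSegmentDiff-DDIM sampling loop described above, under the same assumptions as the training sketch (`model`, `alpha_bars`, and the call signature are illustrative). Using $\tau = 1$ with the DDPM update of Table 2 instead of the DDIM update would yield TPSegmentDiff-DDPM.

import torch

@torch.no_grad()
def segment_ddim(model, img, alpha_bars, T=1000, tau=25):
    """TPSegmentDiff-DDIM sampling sketch: the optical image stays fixed as
    prior knowledge while the mask channel is progressively denoised."""
    x_t = torch.randn(img.shape[0], 1, img.shape[2], img.shape[3])  # [B, 1, H, W]
    t = T - 1
    while t > 0:
        x_in = torch.cat([img, x_t], dim=1)        # re-attach the image every step
        eps = model(x_in, torch.tensor([t]))       # predicted noise for the mask channel
        x0_pred = (x_t - (1.0 - alpha_bars[t]).sqrt() * eps) / alpha_bars[t].sqrt()
        t_prev = max(t - tau, 0)
        a_prev = alpha_bars[t_prev] if t_prev > 0 else torch.tensor(1.0)
        x_t = a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps  # Eq. 7
        t = t_prev
    return x_t                                     # final mask estimate x_0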

4.3 TPSegmentDiff Voting Mechanism

Since the generation results of diffusion models are correlated with the initial noise distribution, there is some random error in tactile paving image segmentation. To reduce this random error, we introduce a voting mechanism that generates the segmentation mask multiple times and combines the results into a final decision. In a generated segmentation mask, each pixel value ranges from 0 to 255: the closer to 255, the more likely the pixel belongs to tactile paving, and the closer to 0, the more likely it belongs to the background. For each pixel $i$, we compute the vote as in Eq.8:
\begin{equation}pred_{vote}[i] = \frac{\sum_{k=1}^{n} pred_k[i]}{n}\end{equation}
(8)
where $pred_{vote}[i]$ represents the $i$-th pixel of the voting result, $pred_k[i]$ represents the $i$-th pixel of the $k$-th generated segmentation mask, and $n$ represents the number of generated segmentation masks used in the voting mechanism.
Then we use Eq.9 to decide the final segmentation mask:
\begin{equation}pred_{\text{final}}[i] = \begin{cases} 1, & \text{if } pred_{\text{vote}}[i] \ge 128 \\ 0, & \text{otherwise} \end{cases}\end{equation}
(9)
Equation 9 combines the prediction results of the generated segmentation masks to determine the final segmentation mask, thus reducing random error and enhancing the robustness of the segmentation, while also converting the discrete 0-255 segmentation mask into a 0-1 binary mask.
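Equations 8 and 9 amount to averaging the $n$ generated masks per pixel and thresholding at the midpoint; a minimal sketch, assuming the masks are stacked as a NumPy array:

import numpy as np

def vote(preds: np.ndarray) -> np.ndarray:
    """Voting mechanism (Eq. 8 and Eq. 9).
    preds: [n, H, W] stack of generated masks with values in 0..255."""
    pred_vote = preds.mean(axis=0)               # Eq. 8: per-pixel average over n masks
    return (pred_vote >= 128).astype(np.uint8)   # Eq. 9: binarize at 128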

5 Experiments

5.1 Model Training

We used the TP-Dataset[19] to train the model from scratch. The dataset is a tactile paving image segmentation dataset, including 1,391 tactile paving images from various scenes such as campuses, streets, railway stations, bus stations, subways, communities, and hospitals. The dataset contains RGB three-channel tactile paving images in JPG format and corresponding grayscale single-channel label mask images in PNG format. This dataset has two image segmentation categories: background (marked with 0) and tactile paving (marked with 255). The images have various resolutions with different aspect ratios.
We conducted model training and experiments on a desktop computer equipped with an Intel i9-13900KF, 63.9 GB of RAM, and an Nvidia GeForce RTX 4080 16 GB GPU. During model training, we uniformly scaled the training images to 128 pixels on the longest side, centered them, and filled the surrounding area with black pixels to make them 128×128 pixels (a sketch of this preprocessing follows). The learning rate was fixed at 1e-4, and we did not use half precision (FP16). The batch size was set to 8, and training was conducted on the described experimental platform. A total of $3.4 \times 10^5$ training steps were performed, which took about 50 hours.
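A sketch of the described preprocessing (longest side scaled to 128 pixels, centered on a black 128×128 canvas) using PIL; the resampling filter is left at PIL's default, an assumption since the paper does not specify it.

from PIL import Image

def preprocess(path: str, size: int = 128) -> Image.Image:
    """Scale the longest side to `size`, center, and pad with black pixels."""
    img = Image.open(path).convert("RGB")
    scale = size / max(img.size)                        # longest side -> `size`
    img = img.resize((max(1, round(img.width * scale)),
                      max(1, round(img.height * scale))))
    canvas = Image.new("RGB", (size, size), (0, 0, 0))  # black background
    canvas.paste(img, ((size - img.width) // 2, (size - img.height) // 2))
    return canvas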

5.2 Experimental Analysis

In order to test the performance of TPSegmentDiff under different conditions, we conducted in-depth experiments.
All TPSegmentDiff-DDPM experiments used 1000 sampling steps to generate segmentation results. For the TPSegmentDiff-DDIM algorithm, unless otherwise specified, a step length of 25 was used with 40 sampling steps, which is equivalent to 1000 steps and aligns with TPSegmentDiff-DDPM's sampling process. The performance metrics used for evaluation are Mean Intersection over Union (MIoU) and F1-score; a sketch of how these can be computed for binary masks follows.
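The following sketch computes the two metrics for a pair of binary masks. The exact conventions (per-image vs. dataset-level averaging, and the class set for MIoU) are assumptions, as the paper does not spell them out; here MIoU averages the IoU of the background and paving classes, and F1 is computed on the paving class.

import numpy as np

def miou_f1(pred: np.ndarray, gt: np.ndarray):
    """MIoU (mean of the two class IoUs) and F1 on the paving class.
    pred, gt: {0, 1} arrays of the same shape."""
    ious = []
    for cls in (0, 1):
        inter = np.logical_and(pred == cls, gt == cls).sum()
        union = np.logical_or(pred == cls, gt == cls).sum()
        ious.append(inter / union if union else 1.0)   # empty class counts as perfect
    tp = np.logical_and(pred == 1, gt == 1).sum()
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return float(np.mean(ious)), f1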
Algorithm          | Pic Time (s) | FPS   | Step Time (s)
TPSegmentDiff-DDPM | 14.691       | 0.068 | 0.0146
TPSegmentDiff-DDIM | 0.590        | 1.693 | 0.0147
Table 3: Speed comparison of TPSegmentDiff-DDPM and TPSegmentDiff-DDIM
In Table 3, pic time represents the time taken to segment one image, and step time represents the time taken by one denoising step. The results indicate that the step time of TPSegmentDiff-DDIM is close to that of TPSegmentDiff-DDPM, suggesting that adopting TPSegmentDiff-DDIM does not introduce significant additional computational overhead. By applying the TPSegmentDiff-DDIM algorithm with a larger step length, the generation speed of the segmentation mask can be improved substantially.
Figure 4: The performance of TPSegmentDiff-DDPM and TPSegmentDiff-DDIM
Figure 4 compares the performance of TPSegmentDiff-DDPM and TPSegmentDiff-DDIM on the task of tactile paving image segmentation. It shows results for both models with and without the voting mechanism.
Without voting, TPSegmentDiff-DDPM achieves an MIoU of 0.584 and an F1-score of 0.672. When the voting mechanism is applied, TPSegmentDiff-DDPM's performance improves significantly, with an MIoU of 0.632 and an F1-score of 0.711. In contrast, TPSegmentDiff-DDIM shows an MIoU of 0.665 and an F1-score of 0.731 without voting, and these metrics remain unchanged when the voting mechanism is applied. This indicates that while the voting mechanism enhances the performance of TPSegmentDiff-DDPM by reducing random errors and increasing segmentation accuracy, TPSegmentDiff-DDIM already exhibits robust performance without additional voting, highlighting its inherent stability and efficiency in this segmentation task.
Figure 5: The impact of the step length setting on TPSegmentDiff-DDIM segmentation performance (kept equivalent to 1000 total sampling steps)
Figure 5 illustrates the impact of varying the step length in TPSegmentDiff-DDIM on the model's performance, measured by MIoU and F1-score. The experimental setup varies the step length and the number of sampling steps jointly so that they remain equivalent to 1000 sampling steps, for comparison with TPSegmentDiff-DDPM at 1000 sampling steps.
As observed, when the step length is small, such as 10, the performance metrics, MIoU and F1-score, remain high, with values of approximately 0.666 and 0.732, respectively. This indicates the model effectively captures the data’s underlying distribution with frequent updates. As the step length increases to 20 and 25, the performance remains relatively stable, with minor fluctuations, suggesting a balanced trade-off between computational efficiency and model accuracy.
However, when the step length is increased beyond 50, to 100 and 200, a notable decline in both MIoU and F1-score is observed. The MIoU drops from 0.660 at a step length of 50 to 0.638 at 100, and drastically to 0.416 at 200. Similarly, the F1-score decreases from 0.721 to 0.438 as the step length increases. This decline highlights the limitations of very large step lengths, which result in fewer samples and less accurate approximations of the target distribution, ultimately impairing the model's ability to generate precise segmentation outputs.
The analysis suggests that while larger step lengths reduce computational overhead, they may compromise the model's ability to capture the fine-grained details necessary for high-quality segmentation. Consequently, selecting an optimal step length is crucial for balancing computational efficiency and segmentation accuracy.
In general, the TPSegmentDiff-DDIM algorithm achieves the best trade-off between segmentation speed and quality when the step length is set to 25. Overall, the series of experiments verifies that TPSegmentDiff performs well on the tactile paving image segmentation task.

6 Summary

This study adapts diffusion models to tactile paving image segmentation, significantly improving the model’s segmentation accuracy and robustness in complex environments. This advancement provides a new approach to tactile paving image segmentation. We have demonstrated the potential of diffusion models in handling high-variability and noise-interfered image segmentation tasks. We also introduced a voting mechanism to reduce the random error caused by the random distribution of the initial noise. The study offers theoretical support and practical evidence for the future application of diffusion models in various scenarios, such as pedestrian and vehicle image segmentation tasks, to aid autonomous vehicles and mobile robots.
Future work should optimize the use of prior knowledge, ensuring that the method can extract valuable information from it more effectively during training while avoiding overfitting caused by its repetitive use during segmentation. Additionally, techniques such as distillation and pruning could reduce model complexity and further increase segmentation speed.

References

[1]
Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (Dec 2017), 2481–2495.
[2]
John Canny. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8, 6 (1986), 679–698.
[3]
Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, and Manning Wang. 2022. Swin-unet: Unet-like pure transformer for medical image segmentation. In European conference on computer vision. Springer, 205–218.
[4]
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2017. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40, 4 (2017), 834–848.
[5]
Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV). 801–818.
[6]
Manzoor Ahmed Hashmani, Mehak Maqbool Memon, and Kamran Raza. 2020. Semantic Segmentation for Visually Adverse Images – A Critical Review. In 2020 International Conference on Computational Intelligence (ICCI). 28–33.
[7]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840–6851.
[8]
J. Long, E. Shelhamer, and T. Darrell. 2015. Fully convolutional networks for semantic segmentation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, Los Alamitos, CA, USA, 3431–3440.
[9]
Rashmi B N. 2024. Survey on Real-world Applications and Challenges of Deep Learning-Enhanced Techniques to assist visually impaired. International Journal of Intelligent Systems and Applications in Engineering 12, 3 (Mar. 2024), 3772–3790. https://ijisae.org/index.php/IJISAE/article/view/6054
[10]
Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y Hammerla, Bernhard Kainz, et al. 2018. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018).
[11]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi (Eds.). Springer International Publishing, Cham, 234–241.
[12]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer, 234–241.
[13]
Irwin Sobel and G. M. Feldman. 1990. An Isotropic 3×3 image gradient operator. https://api.semanticscholar.org/CorpusID:59909525
[14]
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 37), Francis Bach and David Blei (Eds.). PMLR, Lille, France, 2256–2265. https://proceedings.mlr.press/v37/sohl-dickstein15.html
[15]
Jiaming Song, Chenlin Meng, and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020).
[16]
Changshuo Wang, Chen Wang, Weijun Li, and Haining Wang. 2021. A brief survey on RGB-D semantic segmentation using deep learning. Displays 70 (2021), 102080.
[17]
Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. 2020. Deep high-resolution representation learning for visual recognition. IEEE transactions on pattern analysis and machine intelligence 43, 10 (2020), 3349–3364.
[18]
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. 2021. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems 34 (2021), 12077–12090.
[19]
Xingli Zhang, Lei Liang, Shenglu Zhao, and Zhihui Wang. 2024. GRFB-UNet: A new multi-scale attention network with group receptive field block for tactile paving segmentation. Expert Systems with Applications 238 (2024), 122109.
[20]
Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. 2017. Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2881–2890.
