Swin transformer based pyramid pooling network for food segmentation

Q Wang, X Dong, R Wang, H Sun - 2022 IEEE 2nd International …, 2022 - ieeexplore.ieee.org
2022 IEEE 2nd International Conference on Software Engineering and …, 2022 - ieeexplore.ieee.org
Food segmentation is critical to human health and is one of the elements of food computing, providing the basis for nutritional assessment and composition testing. Food images differ from general images in that they usually do not exhibit a unique spatial layout or common semantic patterns. Current food segmentation methods mainly rely on the deep visual features of convolutional neural networks (CNNs), which ignore the characteristics of food images and make it difficult to achieve the best segmentation performance. In this paper, we propose a Swin Transformer-based pyramid network that captures richer background and boundary information and adaptively combines local features with global features to solve the food image segmentation task. First, the pyramid pooling module (PPM) aggregates contextual information from different regions of the food image, improving the feature representation of global information. Second, the multi-scale features produced by the PPM are assembled into a feature pyramid and weighted, so that richer edge information can be extracted. Experiments on the FoodSeg103 dataset show that the method outperforms traditional methods, with significant improvements in the detail of edges and veins.
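
The two steps named in the abstract, pooling context at several region sizes and then weighting the resulting multi-scale features, can be illustrated with a minimal sketch. This is not the authors' implementation: it works on a single-channel feature map stored as nested lists, the bin sizes (1, 2, 3, 6) are borrowed from common pyramid pooling practice rather than from the abstract, nearest-neighbour upsampling stands in for whatever interpolation the paper uses, and the function names and fusion weights are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(feature: Grid, bins: int) -> Grid:
    """Average-pool a 2D map down to a bins x bins grid of region means."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Window rows: floor of the start bound, ceiling of the end bound.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            window = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled: Grid, h: int, w: int) -> Grid:
    """Resize a pooled grid back to h x w (nearest neighbour, an assumption)."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pooling(feature: Grid, bin_sizes: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return the input map plus one context map per bin size (assumed sizes)."""
    h, w = len(feature), len(feature[0])
    branches = [feature]
    for bins in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool(feature, bins), h, w))
    return branches


def weighted_fuse(branches: List[Grid], weights: Sequence[float]) -> Grid:
    """Combine the branches with normalised weights (hypothetical weighting)."""
    total = sum(weights)
    h, w = len(branches[0]), len(branches[0][0])
    return [[sum(wt * b[r][c] for wt, b in zip(weights, branches)) / total
             for c in range(w)] for r in range(h)]


if __name__ == "__main__":
    # Toy 6x6 feature map whose values increase row by row.
    feature = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
    branches = pyramid_pooling(feature)
    fused = weighted_fuse(branches, [1.0] * len(branches))
    print(len(branches), fused[0][0])
```

In a real network each branch would be a multi-channel tensor, the branches would typically be concatenated and passed through a convolution, and the weights would be learned. The sketch only shows how coarse bins contribute global context while fine bins keep local detail.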