1. Introduction
Vegetables are essential in daily diets. Data from the United Nations’ Food and Agriculture Organization show that China leads globally, with 52.25% of the world’s vegetable planting area and 58.31% of its total production [1]. With the increasing adoption of smart agricultural technologies, traditional methods of vegetable production are evolving. The use of transplanting machines, in particular, has greatly improved the efficiency of vegetable cultivation [2,3]. However, machine transplanting can produce substandard planting quality, including excessive planting depth (covered seedlings), inadequate depth (exposed seedlings), and missed hills [4]. Contributing factors include mechanical design [5,6,7], agronomy [8,9], and various environmental conditions in the field. Vavrina et al. [10] evaluated the impact of transplanting depth on tomato and bell pepper yields, finding that transplanting up to the first true leaf or cotyledon results in greater yields than transplanting to the top of the stem. As shown in Figure 1, detecting and replanting seedlings with substandard planting quality currently relies primarily on manual labor, which is labor-intensive and applies inconsistent standards. A major challenge in transitioning from manual to mechanized replanting is the development of effective target detection algorithms [11]. The speed and precision of these algorithms are critical, as they directly influence the efficiency of the replanting robots and the yield of field-grown vegetables. Therefore, this study first categorizes the seedling conditions that affect yield and then aims to develop a fast and accurate detection algorithm for these specific categories.
The current prevalent technologies for field detection include machine vision [12,13], ultrasonic sensor detection [14], and 3D Light Detection and Ranging (LiDAR) detection [15,16]. Ultrasonic sensors and 3D LiDAR can detect the presence of vegetable seedlings within an area, yet they have difficulty distinguishing the planting quality of these seedlings. Machine vision, which captures rich and precise image information, shows significant potential for detecting broccoli seedling planting quality [17,18].
Currently, deep learning is widely applied in agricultural detection [19]. In seedling planting quality assessment, researchers have focused mainly on detecting missing seedlings, with less emphasis on planting depth. Lin et al. [20] developed a detection model for field peanut seedlings, combining an improved YOLOv5s with DeepSort, and used drones for seedling emergence detection. Although efficient, this model cannot locate non-emerged seedlings or assess planting depth quality. Cui et al. [21] enhanced YOLOv5s by adjusting its detection head structure and incorporating a transformer, developing a rice missing seedling detection and counting model with a precision of 93.2%. Wu et al. [22] improved YOLOv5s by replacing its neck network with the Slim-Neck network, developing a sugarcane field missing seedling detection model and proposing a method for predicting replanting locations. However, this model tends to miss small sugarcane seedlings, which limits its usefulness for detecting the “covered seedling” category in our task. Zhang et al. [23] replaced the upsampling module in the neck network of YOLOv5s with the Content-aware ReAssembly of Features (CARAFE) module, enhancing the performance in detecting small targets.
For complex multi-class detection tasks such as broccoli planting quality assessment, deep learning models need to achieve high precision in every category. Zhao et al. [24] developed a deep learning model for grading vegetable seedlings, using ShuffleNet Version 2 (ShuffleNet-V2) as the backbone network for feature extraction and integrating the Efficient Channel Attention (ECA) mechanism. This model categorized seedlings as weak, damaged, or strong with a precision of 94.23%. Attention mechanisms enable network models to focus on relevant areas within local information. Commonly used attention mechanisms include Squeeze-and-Excitation (SE) [25,26], Convolutional Block Attention Module (CBAM) [27], and Coordinate Attention (CA) [28]. SE focuses solely on channel information, overlooking spatial information, whereas CBAM employs global pooling operations to capture local spatial information. CA, on the other hand, maintains channel information while concentrating on long-range spatial information in feature maps. Zhu et al. [29] integrated the CA mechanism with YOLOX-s, enhancing the network’s focus on regions of interest and effectively improving the detection precision for corn silk obscured by leaves.
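The difference between the pooling used by SE and by CA can be illustrated on a single-channel feature map. The following is a minimal sketch in plain Python, not the implementation used in the cited works; the function names and the toy 3×4 map are illustrative assumptions. SE-style global average pooling collapses the map to one scalar, whereas CA-style pooling averages along each spatial axis separately, so the position of a strong response remains recoverable.

```python
def global_avg_pool(fmap):
    """SE-style squeeze: one scalar per channel, spatial position is lost."""
    h, w = len(fmap), len(fmap[0])
    return sum(sum(row) for row in fmap) / (h * w)


def directional_avg_pool(fmap):
    """CA-style pooling: average along the width and along the height separately."""
    h, w = len(fmap), len(fmap[0])
    pooled_h = [sum(row) / w for row in fmap]  # one value per row
    pooled_w = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]  # one value per column
    return pooled_h, pooled_w


# Toy single-channel feature map (an assumption for illustration)
# with one strong response at row 1, column 2.
fmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 12.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

scalar = global_avg_pool(fmap)
pooled_h, pooled_w = directional_avg_pool(fmap)

# The two directional vectors still encode where the response was.
peak_row = pooled_h.index(max(pooled_h))
peak_col = pooled_w.index(max(pooled_w))
print(scalar, pooled_h, pooled_w, (peak_row, peak_col))
```

In a full CA block these two direction-aware vectors would be passed through learned convolutions before reweighting the feature map; that part is omitted here.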
To address challenges such as low recognition rates and weak robustness in natural environments, Sun et al. [30] focused on the detection of broccoli seedlings. They proposed a method based on the Faster Region-based Convolutional Neural Network (Faster R-CNN) model, achieving a recognition precision of 91.73% with an average detection time of 249 ms. While two-stage detection models like Faster R-CNN [31] offer higher precision, they process images more slowly. In contrast, one-stage detection models, such as those in the YOLO series [32], bypass the candidate region selection stage and directly treat object detection as a regression task, facilitating end-to-end detection. In 2022, the YOLOv7 architecture was introduced, outperforming all known object detectors in the range of 5 to 160 fps [33]. Among its variants, YOLOv7-tiny maintains the cascade-based model scaling strategy of YOLOv7 and features improvements in the Efficient Layer Aggregation Network (ELAN) [34]. YOLOv7-tiny employs a more compact network architecture and an optimized training strategy. By reducing model parameters and computational requirements, it offers a viable solution for target detection in computationally constrained environments.
For mobile deployment in field environments, two primary methods are typically used to reduce the size of network models: (1) Using lightweight architectures with fewer parameters, such as MobileNet [35], ShuffleNet [36,37], and GhostConv [38], which decrease the parameter count while minimizing performance loss. In precision agriculture, attention-based lightweight models are often used where high accuracy is required with fewer parameters [39]. (2) Applying techniques such as sparse training and model pruning to further reduce the model’s parameters and computational demand. To address the precise identification and localization of cabbage, Zhai et al. [40] evaluated Faster R-CNN, Single Shot MultiBox Detector (SSD), and YOLOv5. They selected YOLOv5s as the base model and made it lighter using MobileNet V3s. This model achieved a recognition precision of 93.14% with an image processing time of 54.09 ms, a 26.98% reduction in processing time compared to the base model. However, the model showed reduced precision in detecting small cabbages with missing leaves, which was particularly noticeable after transplanting. Moreover, these studies have primarily concentrated on inter-class classification, and intra-class fine-grained detection remains a significant challenge. Ref. [41] enhanced YOLOv3-tiny with Path Aggregation Network (PANet) and Spatial Attention Module (SAM) for hierarchical tomato seedling detection, effectively distinguishing no-seedling, weak, and healthy categories. However, research in this area is predominantly performed under stable conditions.
In summary, current research primarily focuses on detecting missing seedlings, mainly for assessing crop yield, with limited attention given to seedlings planted at an improper depth. Substandard planting quality has become a new and common phenomenon with the transition from semi-automatic to fully automatic transplanting machines, in which the manual picking and placing of seedlings is replaced by mechanical arms. Ensuring vegetable yield requires detecting these poorly planted seedlings. Existing algorithms require significant computational resources and memory, and they are limited in recognizing small targets and categories with similar features. For example, broccoli root balls resemble soil clods and are easily obscured by leaves, “missed hill” features resemble the background, and features in the “covered seedling” category are small. To address these challenges, our contributions are as follows: (1) We propose a method for target detection and classification of broccoli planting quality, categorizing it into “qualified seedlings”, “exposed seedlings”, “covered seedlings”, and “missed hills”, and we created a dataset for this research, contributing exploratory work to the field of vegetable planting quality detection. (2) We developed the Seedling-YOLO deep learning model for identifying substandard broccoli planting quality in the field. (3) We introduced the ELAN_P module for the backbone network, which reduces model parameters without sacrificing precision. Furthermore, by integrating CARAFE and CA, we addressed false and missed detections, which are especially prevalent in “exposed seedlings”, “covered seedlings”, and “missed hills”.
4. Discussion
Currently, the most advanced models are typically those that excel on public datasets. However, for specific tasks, these models often require customization tailored to particular recognition challenges. The detection of vegetable planting quality is a new challenge brought about by the development of fully automatic transplanting machines. In this study, we found that the features of broccoli root balls are similar to soil clods and can easily be obscured by leaves, as shown in Figure 12A,D, leading to false detections in the most advanced models. In the “missed hills” category, features left by the end effector resemble the background, making recognition difficult, as evidenced by the false detections of YOLOv7-tiny in Figure 12C. In the detection of “covered seedlings”, as shown in Figure 12B,D, the small target size leads to false detections and low precision in existing algorithms. To address these challenges, we developed Seedling-YOLO, which integrates YOLOv7-tiny with CARAFE, CA, and our proposed ELAN_P module. As shown in Table 5, the ablation experiments reveal that M3 and M4 significantly increase the model’s accuracy and recall rate. From the heatmap in Figure 13, it is evident that the model pays more attention to areas of interest, enhancing focus on small targets and reducing interference caused by background similarities. According to M5-M8, the introduction of the CARAFE and CA operators does not significantly increase the overhead of the model. This integration effectively resolves issues related to feature similarity, occlusion, and small target size, resulting in a notable improvement in detection, as illustrated in Figure 11 and Figure 12E–H. Typically, developing fast detection algorithms sacrifices model precision, but as shown in Table 5, Seedling-YOLO reduced the model’s parameters without losing precision by using ELAN_P, making it more suitable for deployment on resource-constrained devices. Previous research [20,21] focused on detecting emergence rate and missing seedlings for yield assessment, and Ref. [22] developed a sugarcane seedling replanting model with a replanting location prediction method. Although successful in predicting missing seedlings, that model tended to miss small seedlings. In contrast, Seedling-YOLO can directly locate missed hills and detect exposed and covered seedlings. This study primarily developed a high-precision, real-time detection model for replanting robots, as shown in Table 4. High precision reduces incorrect replanting, thereby enhancing the efficiency of replanting robots. A high recall rate ensures that the robot identifies and replants more of the areas that actually require it, which is crucial for overall crop yield. Additionally, the proposed model was applied to a visual chassis for recognition verification at different speeds, achieving over 90% precision at speeds up to 0.6 m/s, as shown in Figure 15. As the speed increases, motion blur causes false and missed detections, consistent with the conclusions in the literature [40,47].
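The roles of precision and recall for a replanting robot can be made concrete with the standard definitions. The sketch below is illustrative only: the counts are hypothetical and are not results from this study. A false positive corresponds to an unnecessary replanting action, and a false negative to a substandard planting that is left uncorrected.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for a single class such as "missed hills".
tp = 90  # substandard plantings correctly detected
fp = 10  # detections that would trigger unnecessary replanting
fn = 30  # substandard plantings the detector missed

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With these made-up counts, one in ten robot actions would be wasted, and a quarter of the hills needing attention would be skipped.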
Regarding the applicability and limitations of the model, Seedling-YOLO can assist replanting robots in identifying substandard plantings. It also calculates the distance between the robot and the seedling using the coordinates of the bounding box, guiding the motor to the location requiring replanting. Furthermore, the bounding box coordinates help direct the robot’s end effector for precise replanting positioning. The model is suited to replanting tasks carried out within a few days after machine transplanting. This timing not only ensures consistent growth between replanted and field-grown seedlings but also avoids the deterioration of mound surfaces due to weather or human factors, which could reduce the precision of missing seedling detection. Given the model’s adaptability to unstructured environments, it is expected to perform even better in stable, controlled settings. However, our model is currently specific to broccoli planting quality detection. In the future, we plan to collect data on more vegetable varieties so that the model can be used for planting quality detection across a broader range of vegetables. We may also incorporate deblurring algorithms so that the model can adapt to faster travel speeds of replanting robots, and add detection layers for small targets to further enhance precision on small objects.
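The text does not specify how the bounding box coordinates are converted into a distance. One simple possibility is sketched below, assuming a downward-facing camera, a flat ridge surface, and a known ground resolution; the function names, the image size, and the value of `metres_per_pixel` are hypothetical and would come from camera calibration in practice.

```python
def bbox_center(x_min, y_min, x_max, y_max):
    """Center of an axis-aligned bounding box, in pixels."""
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


def pixel_to_ground_offset(center_px, image_size_px, metres_per_pixel):
    """Offset of a pixel from the image center, converted to metres on the ground.

    Assumes the optical axis is perpendicular to the ground, so one pixel
    covers the same ground distance everywhere in the image.
    """
    cx, cy = center_px
    width, height = image_size_px
    dx = (cx - width / 2.0) * metres_per_pixel
    dy = (cy - height / 2.0) * metres_per_pixel
    return dx, dy


# Hypothetical detection in a 640 x 480 image with 1 mm of ground per pixel.
box = (400, 200, 480, 280)
center = bbox_center(*box)
dx, dy = pixel_to_ground_offset(center, (640, 480), metres_per_pixel=0.001)
print(center, round(dx, 3), round(dy, 3))
```

A tilted camera or uneven ground would require a full camera model or a depth measurement instead of a single scale factor.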
5. Conclusions
In this study, we successfully designed Seedling-YOLO, an efficient object detection algorithm for the planting quality of broccoli seedlings. This model efficiently handles real-time detection of diverse planting conditions, including qualified, exposed, covered seedlings, and missed hills, which are commonly problematic in field environments due to false and missed detections by existing algorithms.
Building on YOLOv7-tiny, we redesigned the ELAN module by incorporating Pconv, significantly reducing the model’s parameter count and thereby streamlining the backbone feature extraction process. Further enhancements were achieved by integrating the CARAFE operator, which uses a larger receptive field for upsampling to boost model precision. Additionally, we introduced CA in the shallow layers of the backbone and neck, focusing the model more on critical areas when capturing features.
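Why a partial convolution lowers the parameter count can be shown with simple arithmetic. The sketch below assumes that Pconv refers to a partial convolution that convolves only a fraction of the channels and passes the remaining channels through unchanged. The channel width (64), kernel size (3), and fraction (1/4) are illustrative assumptions, not values from this study, and the ELAN_P module itself is not reproduced here.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def partial_conv_params(channels, k, ratio):
    """Number of weights when only `ratio` of the channels are convolved."""
    c_p = int(channels * ratio)  # channels that actually pass through the convolution
    return conv_params(c_p, c_p, k)


# Illustrative layer: 64 channels in and out, 3 x 3 kernel, 1/4 of channels convolved.
standard = conv_params(64, 64, 3)
partial = partial_conv_params(64, 3, 0.25)
reduction = 1 - partial / standard

print(standard, partial, f"{reduction:.2%}")
```

This per-layer figure is not comparable to a whole-model reduction, which also depends on the layers that are left unchanged.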
The architecture of Seedling-YOLO has shown substantial improvements in terms of precision and speed. Experimental validation confirmed that the model can effectively classify four types of broccoli seedling planting quality. Notably, the AP for detecting missing seedlings increased by 11.2%. Compared to the original model, Seedling-YOLO’s parameters were reduced by 20% and FLOPs by 16%, with an mAP@0.5 of 94.3% and an FPS of 29.7. This streamlined, more accurate model is suitable for deployment on standard hardware, achieving a detection precision of 93% at a speed of 0.6 m/s and a recognition efficiency of 180 plants/min in dual-row vegetable ridges with a plant spacing of 0.4 m. These capabilities fulfill high-speed planting requirements and provide robust technical support for field vegetable seedling supplementation.
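The reported recognition efficiency is consistent with the travel speed and planting geometry. A quick check, assuming that both rows of the ridge are in view simultaneously and that every hill along the path is counted (the function name is illustrative):

```python
def plants_per_minute(speed_m_per_s, plant_spacing_m, rows):
    """Hills passed per minute at a constant travel speed."""
    metres_per_minute = speed_m_per_s * 60.0
    plants_per_row = metres_per_minute / plant_spacing_m
    return plants_per_row * rows


# Values taken from the text: 0.6 m/s, 0.4 m plant spacing, dual-row ridge.
rate = plants_per_minute(speed_m_per_s=0.6, plant_spacing_m=0.4, rows=2)
print(round(rate))
```

At 0.6 m/s the robot covers 36 m per minute, which corresponds to 90 hills per row and 180 hills across two rows.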
In future work, we plan to broaden the application of the model to include various other vegetable seedlings, aiming to increase the versatility of the seedling recognition system for diverse agricultural environments. Additionally, we plan to explore the development of advanced seedling picking and planting devices that integrate our visual recognition technology, potentially revolutionizing the mechanization of seedling planting and replanting operations. These expansions could significantly contribute to the global efforts in precision agriculture, aiming to improve crop yields, optimize resource use, and ensure food security.