Author Contributions
Conceptualization, J.E., R.G., and R.N.J.; methodology, S.K.S. and M.S.L.; software, S.K.S. and M.S.L.; formal analysis, S.K.S.; data curation, S.K.S.; data collection, S.K.S., R.K.K., J.R., and M.D.; writing—original draft preparation, S.K.S., M.S.L., R.K.K., J.R., and M.D.; writing—review and editing, S.K.S.; visualization, S.K.S.; supervision, R.N.J. and H.K.; project administration, J.E., R.G., and R.N.J.; funding acquisition, J.E., R.G., and R.N.J. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Distribution of biomass samples across quantity, plot trial site, and seasonal cut number. The harvested yield is based on the sampled area of 0.25 m² per biomass sample.
Figure 2.
Steps of image preprocessing from captured images to convolutional neural network (CNN)-inference ready images.
Figure 3.
The large-scale image acquisition platform and sample images from each day of large-scale image acquisition. The difference in yield is noticeable from the larger leaf sizes in the first row and the visible soil in the bottom row. The images were captured at a velocity of 18 km h⁻¹ without motion blur, owing to the specially developed camera system.
Figure 4.
Overview of the DeepLabv3+ [20] neural network architecture. Features from the Xception 65 [21] based backbone are concatenated at multiple scales in the encoder using atrous filters with an adjustable spatial scale. The decoder combines higher resolution features from within the Xception 65 with the multi-scale features to produce a spatially improved semantic segmentation. To further improve scale invariance, images are processed at three image scales and the results are averaged into a final class probability map. The illustration of the encoder–decoder model architecture is adapted from the original publication [20].
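The multi-scale averaging described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scale factors `(0.5, 1.0, 2.0)`, the nearest-neighbour resampler, and the `dummy_model` stand-in for the CNN are all assumptions introduced here, since the caption states only that three image scales are averaged.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array (simple stand-in resampler)."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows[:, None], cols[None, :]]


def multiscale_predict(image, predict_fn, scales=(0.5, 1.0, 2.0)):
    """Average per-pixel class probabilities over several input scales.

    `scales` is a hypothetical choice; the source does not list the values used.
    `predict_fn` maps an (h, w, 3) image to (h, w, n_classes) probabilities.
    """
    h, w = image.shape[:2]
    total = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        probs = predict_fn(resize_nearest(image, sh, sw))
        probs = resize_nearest(probs, h, w)  # back to the original resolution
        total = probs if total is None else total + probs
    return total / len(scales)


def dummy_model(img):
    """Hypothetical three-class model: softmax over brightness-derived scores."""
    g = img.mean(axis=2)
    logits = np.stack([g, 1.0 - g, np.full_like(g, 0.5)], axis=-1)
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)
```

Because each per-scale output is a probability distribution over classes, the average is one as well; the final label map would be obtained with `np.argmax(result, axis=-1)`.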
Figure 5.
Illustration of the image augmentations applied to the synthetic image crops during training of the model. (a–f) represent online augmentations applied throughout the training process; (g,h) represent style transfer of dew artifacts, used for offline augmentation of synthetic images in the subsequent fine-tuning of the 1st stage model.
Figure 6.
First row: Illustrative 900 × 900 pixel image crops showing the effect of weather conditions on image quality, depending on droplet size. (left) Sunny conditions without droplets. (center) Although the large droplets from rain are visible, the captured leaf texture is comparable to sunny conditions. (right) In dewy conditions, the appearance of the leaf texture, especially for clover, is strongly affected by specular reflections from the camera flash. Second row: Semantic segmentation results from a model trained with traditional image augmentation. Third row: Semantic segmentation results when style-transferred dewy conditions are included in the synthetic training data. Green is grass, red is clover, and blue is soil and background.
Figure 7.
Comparison between ground truth labeled image crops of 1000 × 1000 pixels and the corresponding semantic segmentations from the FCN-8s and DeepLabv3+ models. The images originate from the fall acquisition using the large-scale image acquisition platform and vary in clover content. Each row represents one image. The derived clover coverage, relative to the detected canopy, is written in white text. Red is clover, green is grass, blue is soil+background, orange is weeds, and grey is unknown. The predictions from the DeepLabv3+ based model are consistently closer to the ground truth than those from the FCN-8s based model.
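The clover coverage relative to the detected canopy can be derived from a label map roughly as sketched below. The integer class codes are hypothetical, and the sketch assumes that the canopy comprises all clover, grass, and weed pixels, with soil and unknown pixels excluded; the caption does not spell out this definition.

```python
import numpy as np

# Hypothetical class codes; the source only names the classes and their colors.
SOIL, CLOVER, GRASS, WEEDS, UNKNOWN = range(5)


def clover_fraction_of_canopy(label_map):
    """Share of canopy pixels (clover + grass + weeds) that are clover.

    Returns NaN when no canopy pixels are present.
    """
    label_map = np.asarray(label_map)
    canopy_pixels = np.isin(label_map, (CLOVER, GRASS, WEEDS)).sum()
    if canopy_pixels == 0:
        return float("nan")
    return float((label_map == CLOVER).sum() / canopy_pixels)
```

For example, a map with two clover, three grass, and one weed pixel yields a clover fraction of 2/6, regardless of how many soil or unknown pixels surround them.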
Figure 8.
Qualitative samples of semantic segmentation on four plant samples of varied yield and biomass compositions using DeepLabv3+ST followed by FCN-8s for clover species discrimination. First column: Input red green blue (RGB) image of 3000 × 3000 pixels. Second column: 1st-stage pixelwise classification of image into soil (blue), clover (red), grass (green) and weeds (orange). Third column: 2nd-stage pixelwise classification of image into soil (blue), red clover (purple), white clover (yellow), grass (green), and weeds (orange).
Figure 9.
Visualization of the correlation between the predicted visual canopy fractions and the corresponding dry matter fractions. The top row represents 915 biomass samples. The second row represents a reduced set of 752 biomass samples, due to omitted clover species annotations in most biomass samples from plot trial site B. (a–d) represent clover, weeds, white clover, and red clover fractions, respectively.
Figure 10.
Test of biomass clover fraction prediction at each plot trial site, based on a first order linear model fitted to the remaining three sites. The site-specific mean absolute error, denoted MAE, is printed in each subfigure.
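The leave-one-site-out test described in this caption can be sketched as follows: a first-order linear model is fitted to the samples from all but one site and evaluated by mean absolute error on the held-out site. The function name and argument layout are assumptions for illustration, and `np.polyfit` is used here as a generic least-squares fit; the source does not state which fitting routine was used.

```python
import numpy as np


def leave_one_site_out_mae(visual_fraction, biomass_fraction, site):
    """Per-site MAE of a first-order linear model fitted to the other sites.

    visual_fraction: predicted visual clover fraction per sample
    biomass_fraction: measured dry matter clover fraction per sample
    site: site label per sample (for example "A", "B", "C", "D")
    """
    x = np.asarray(visual_fraction, dtype=float)
    y = np.asarray(biomass_fraction, dtype=float)
    site = np.asarray(site)
    mae = {}
    for held_out in np.unique(site):
        test = site == held_out
        # Fit y = slope * x + intercept on the remaining sites only.
        slope, intercept = np.polyfit(x[~test], y[~test], deg=1)
        prediction = slope * x[test] + intercept
        mae[str(held_out)] = float(np.mean(np.abs(prediction - y[test])))
    return mae
```

Holding out an entire site, rather than random samples, tests whether the relation between visual and biomass clover fractions transfers to a location the model has not seen.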
Figure 11.
Comparison between the predicted visual canopy coverage and the corresponding yield in the biomass samples. The data are visualized for each plot trial site individually to emphasize local trends. (a–d) correspond to plot trial sites A, B, C, and D, respectively.
Figure 12.
Pixelwise classification of the four large-scale image acquisition samples from Figure 3. Blue is soil+background, red is clover, green is grass, and orange is weeds. The caption for each subfigure exemplifies the two automatically predicted metrics used for large-scale mapping.
Figure 13.
Visualization of the visual clover fraction in a subset of the large scale mapped fields. Fields (a–g) were sampled in May. Fields (h–l) were sampled in October. Each square unit represents a 5 × 5 m interpolated clover fraction. Each dot represents the visually predicted clover fraction in a corresponding image.
Figure 14.
Visualization of the predicted canopy coverage in a subset of the large scale mapped fields. Fields (a–g) were sampled in May. Fields (h–l) were sampled in October. Each square unit represents a 5 × 5 m interpolated value.
Table 1.
Comparison of the four plot trial sites. Plots at sites A, C, and D were all established with a location-specific seed mixture, followed by nitrogen application trials to induce varied clover content across the plots. Plots at site B were established with a wide range of commercially available seed mixtures, leading to high variation between the plots but inconsistent representation of the four species in the plots.
Plot Trial Site | A | B | C | D |
---|---|---|---|---|
Seeded plant species | | | | |
Lolium perenne | ✓ | (✓) | ✓ | ✓ |
× Festulolium | | (✓) | | ✓ |
Trifolium repens | ✓ | (✓) | ✓ | ✓ |
Trifolium pratense | ✓ | (✓) | | ✓ |
Herbicides | | ✓ | | |
Soil type | Loamy sand | Sandy loam | Loamy sand | Coarse sand |
Cuts per season | 4 | 4 | 5 | 5 |
No. of plots at site | 60 | >200 | 48 | 48 |
Years since plot establishment | 1–4 | 1–2 | 2 | 2 |
Sample years | 2017 | 2017–18 | 2019 | 2019 |
Acquisition weather conditions | | | | |
Sunny | ✓ | ✓ | ✓ | ✓ |
Rain | | | ✓ | ✓ |
Morning dew | ✓ | ✓ | ✓ | ✓ |
Location | | | | |
Latitude | 56.4957 | 55.3397 | 55.5370 | 56.1702 |
Longitude | 9.5693 | 12.3808 | 8.4952 | 8.7816 |
Camera system samples | | | | |
Nikon D810A + LED flash | 179 | 83 | | |
Sony a7 + ring flash | 60 | 113 | 180 | 240 |
Sony a7 + speedlight flash | | | 60 | |
Total number of biomass samples | 239 | 196 | 240 | 240 |
Table 2.
Summary of the large scale image acquisition per field.
Farm | Field | Area [ha] | Acquisition Time [mm:ss] | Images | Density [images ha⁻¹] | Speed [ha h⁻¹] |
---|---|---|---|---|---|---|
May 2018 | | | | | | |
A | 1 | 11.3 | 44:35 | 2223 | 197 | 15.3 |
A | 2 | 18.6 | 67:01 | 3398 | 183 | 16.7 |
A | 3 | 8.1 | 23:04 | 1188 | 145 | 21.1 |
A | 4 | 7.1 | 26:50 | 1330 | 185 | 15.9 |
B | 1 | 14.2 | 49:58 | 2025 | 143 | 17.1 |
B | 2 | 4.8 | 17:50 | 723 | 148 | 16.1 |
B | 3 | 9.2 | 40:43 | 1202 | 135 | 13.6 |
B | 4 | 2.2 | 10:50 | 1202 | 135 | 12.2 |
Oct 2018 | | | | | | |
A | 1 | 11.3 | 34:38 | 1380 | 122 | 19.6 |
A | 2 | 45.8 | 28:16 | 1422 | 31 | 97.2 |
A | 3 | 16.9 | 49:25 | 2423 | 143 | 20.5 |
A | 4 | 14.6 | 46:49 | 2170 | 148 | 18.7 |
A | 5 | 12.5 | 44:11 | 2163 | 173 | 17.0 |
C | 1 | 9.4 | 47:06 | 1878 | 200 | 12.0 |
C | 2 | 20.5 | 78:12 | 3324 | 162 | 15.7 |
C | 3 | 18.8 | 58:46 | 2999 | 160 | 19.2 |
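The two derived columns of Table 2 follow from the area, acquisition time, and image count of each field. The sketch below shows that arithmetic; the helper names are hypothetical, and small deviations from the tabulated values are to be expected because the printed areas are rounded to one decimal.

```python
def parse_mmss(text):
    """Convert an 'mm:ss' string (minutes may exceed 59) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def field_metrics(area_ha, time_mmss, n_images):
    """Return (image density in images per ha, acquisition speed in ha per hour)."""
    hours = parse_mmss(time_mmss) / 3600.0
    return n_images / area_ha, area_ha / hours


# Farm A, field 2, May 2018: 18.6 ha, 67:01, 3398 images.
density, speed = field_metrics(18.6, "67:01", 3398)
print(f"{density:.0f} images/ha, {speed:.1f} ha/h")
```

For this row the computed values round to the tabulated 183 images ha⁻¹ and 16.7 ha h⁻¹.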
Table 3.
Mean and per-class Intersection over Union for semantic segmentation on the GrassClover test set on the evaluation server. The baseline result is provided by the two hierarchically trained FCN-8s models [12]. ST represents the additional fine-tuning on style-transfer augmented image samples. The best result in each category is marked in bold type.
Cascaded CNN Models | | Intersection over Union [%] | | | | | |
---|---|---|---|---|---|---|---|
1st Stage Model | 2nd Stage Model | Mean | Grass | White Clover | Red Clover | Weeds | Soil |
FCN-8s [16] | FCN-8s [16] | 55.0 | 64.6 | 59.5 | 72.6 | 39.1 | 39.0 |
DeepLabv3+ST | DeepLabv3+ | 65.8 | 78.5 | 62.3 | 75.0 | 51.4 | 61.6 |
DeepLabv3+ST | FCN-8s [16] | 68.4 | 78.5 | 70.5 | 80.1 | 51.4 | 61.6 |
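For reference, the per-class Intersection over Union can be computed from a predicted and a ground truth label map as sketched below. This is a generic formulation with hypothetical class codes; how the evaluation server aggregates pixels across images is not stated in the source. The Mean column in Table 3 is consistent with the unweighted average of the five class values.

```python
import numpy as np


def per_class_iou(prediction, truth, n_classes):
    """IoU per class: |pred ∩ truth| / |pred ∪ truth|; NaN for absent classes."""
    prediction = np.asarray(prediction)
    truth = np.asarray(truth)
    ious = []
    for c in range(n_classes):
        p, t = prediction == c, truth == c
        union = np.logical_or(p, t).sum()
        intersection = np.logical_and(p, t).sum()
        ious.append(intersection / union if union else float("nan"))
    return np.array(ious, dtype=float)


def mean_iou(prediction, truth, n_classes):
    """Unweighted mean over the classes that occur in prediction or truth."""
    return float(np.nanmean(per_class_iou(prediction, truth, n_classes)))
```

The unweighted mean gives rare classes such as weeds the same influence as dominant classes such as grass, which is why a low weed IoU pulls the mean down noticeably.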
Table 4.
Comparison with Skovsen et al. [16] on the same dataset using the coefficient of determination between analyzed canopy cover and relative biomass content for each class of species. Contrary to Table 3, this evaluation includes all 915 biomass samples in the comparison to maximize the extent of experimental sites, camera systems, seeded compositions, weather conditions, and seasons. The best result in each category is marked in bold type.
Cascaded CNN Models | | Relative Biomass R² [%] | | | | |
---|---|---|---|---|---|---|
1st Stage Model | 2nd Stage Model | Total Clover | Grass | White Clover | Red Clover | Weeds |
FCN-8s [16] | FCN-8s [16] | 84.1 | 87.2 | 61.1 | 53.5 | 46.1 |
DeepLabv3+ | DeepLabv3+ | 88.6 | 87.3 | 64.8 | 44.9 | 53.8 |
DeepLabv3+ST | DeepLabv3+ | 91.3 | 90.5 | 64.4 | 45.8 | 64.6 |
DeepLabv3+ST | FCN-8s [16] | 91.3 | 90.5 | 67.9 | 51.4 | 64.6 |
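A coefficient of determination between canopy cover and relative biomass content can be computed as sketched below. The sketch assumes that R² refers to a first-order least-squares fit of biomass fraction on canopy fraction, in which case it equals the squared Pearson correlation; the source does not state the exact procedure, so this is an illustration rather than the authors' evaluation code.

```python
import numpy as np


def r_squared_linear(canopy_fraction, biomass_fraction):
    """R² of a first-order least-squares fit of biomass on canopy fraction."""
    x = np.asarray(canopy_fraction, dtype=float)
    y = np.asarray(biomass_fraction, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)        # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)
```

Multiplying the returned value by 100 gives the percentage form used in the tables.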
Table 5.
Detailed comparison of the DeepLabv3+ST model with the FCN-8s model from Skovsen et al. [16] and the extended morphological filtering from Mortensen et al. [9] on the 915 biomass samples. The relative clover content prediction is evaluated using the coefficient of determination between the detected clover fraction of canopy cover and the relative clover content in the biomass, decomposed into individual experimental sites and seasonal cuts. The generalizability of each method can be observed from the drop in predictive performance when moving from individual acquisition dates to aggregations across seasonal cuts and experimental sites.
Method | Site | Relative Clover Biomass R² [%] | | | | | |
---|---|---|---|---|---|---|---|
 | | Cut 1 | Cut 2 | Cut 3 | Cut 4 | Cut 5 | All Cuts |
Morph. filt. | Site A | 71.8 | 81.3 | 79.9 | 36.3 | - | 19.1 |
Morph. filt. | Site B | 65.6 | 68.1 | 69.9 | 22.5 | - | 64.8 |
Morph. filt. | Site C | 92.7 | 89.2 | 75.5 | 91.5 | 88.9 | 54.8 |
Morph. filt. | Site D | 67.9 | 65.3 | 61.5 | 81.8 | 68.0 | 54.2 |
Morph. filt. | All sites | 36.4 | 26.4 | 59.3 | 58.3 | 76.4 | 36.9 |
FCN-8s | Site A | 74.1 | 87.8 | 87.8 | 56.9 | - | 74.4 |
FCN-8s | Site B | 90.7 | 84.3 | 87.3 | 79.6 | - | 84.9 |
FCN-8s | Site C | 95.0 | 91.2 | 93.4 | 95.3 | 94.8 | 92.8 |
FCN-8s | Site D | 90.9 | 84.8 | 92.6 | 91.2 | 68.7 | 86.1 |
FCN-8s | All sites | 88.4 | 79.9 | 89.9 | 86.1 | 79.0 | 84.1 |
DeepLabv3+ | Site A | 82.1 | 94.4 | 95.1 | 67.0 | - | 87.8 |
DeepLabv3+ | Site B | 92.5 | 92.6 | 90.6 | 87.6 | - | 90.2 |
DeepLabv3+ | Site C | 95.5 | 93.4 | 95.4 | 97.5 | 95.6 | 94.6 |
DeepLabv3+ | Site D | 92.0 | 87.2 | 94.1 | 91.6 | 70.6 | 89.8 |
DeepLabv3+ | All sites | 91.2 | 90.7 | 92.8 | 91.4 | 85.3 | 91.3 |
Table 6.
Comparison with previously published results on image-based prediction of biomass clover fractions. For a fair comparison, only mixtures with similar species are compared. Due to the scarcity of published results on ryegrass, white clover, and red clover mixtures, only mixtures of ryegrass and white clover (site C) were included in this comparison. When a data source is referenced, the corresponding R² is reprinted from the data source. wc and rg denote white clover and ryegrass, respectively. BM is biomass. GSD is ground sampling distance.
Method | Data Source | BM Range [1000 kg ha⁻¹] | GSD [mm⁻¹] | No. Samples | No. Cuts | Eval. Sites | Species Mixture | Clover R² [%] |
---|---|---|---|---|---|---|---|---|
Morph. filtering [8] | [8] | < 2.8 | 2 | 24 | 3 | 1 | wc, rg | 85 |
FCN-8s [2] | [2] | 1.0–3.3 | 2–3 | 70 | 2 | 1 | wc, rg | 79.3 |
LC-Net [2] | [2] | 1.0–3.3 | 2–3 | 70 | 2 | 1 | wc, rg | 82.5 |
Morph. filtering [9] | Site C | 0.2–5.4 | 6 | 240 | 5 | 1 | wc, rg | 54.8 |
FCN-8s [16] | Site C | 0.2–5.4 | 6 | 240 | 5 | 1 | wc, rg | 92.8 |
DeepLabv3+ST | Site C | 0.2–5.4 | 6 | 240 | 5 | 1 | wc, rg | 94.6 |