Versions of CLC. To validate the consensus mechanism, we conducted ablation experiments on CIFAR-10 with noise rates of 0.3, 0.4, and 0.5. The ablation study does not use the benchmark model, and the number of participants is set to 19. We analyzed CLC with its crucial parts enabled in different combinations, i.e., the iterative holdout (H), the label correction (C), and the class-wise threshold aggregation (agg). The experimental results are shown in Figure
10. Without the agg part, labels are corrected based on the local thresholds alone. Since CLC explicitly handles label noise, it outperformed FedAvg under all settings, and the advantage of ‘CLC: H+C+agg’ grows as the noise rate increases. The aggregated global thresholds, the product of the consensus, better capture the noise distribution from a global perspective. Moreover, label correction further augments the available training data. Overall, the H part filters out noisy labels, the C part reuses them, and the agg part learns a better model of the noise distribution and avoids negative corrections.
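The agg part can be sketched as a simple server-side averaging of the clients' per-class thresholds. The function and variable names below are illustrative, not CLC's actual implementation, and the weighted average is one plausible aggregation rule:

```python
import numpy as np

def aggregate_class_thresholds(local_thresholds, weights=None):
    """Aggregate per-class noise thresholds reported by all clients.

    local_thresholds: list of length-K arrays, one per client, giving each
        client's locally estimated threshold for each of the K classes.
    weights: optional per-client weights (e.g., local data sizes);
        defaults to a uniform average.
    Returns a length-K array of global (consensus) thresholds.
    """
    T = np.stack(local_thresholds)            # shape: (num_clients, K)
    if weights is None:
        weights = np.ones(len(local_thresholds))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # normalize client weights
    return w @ T                              # weighted average per class

# Example: 3 clients, 2 classes
locals_ = [np.array([0.2, 0.6]), np.array([0.4, 0.4]), np.array([0.3, 0.5])]
global_t = aggregate_class_thresholds(locals_)
print(global_t)  # -> [0.3 0.5]
```

Averaging over all clients is what lets the global thresholds reflect the overall noise distribution rather than any single client's skewed view.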
Hyperparameters. We conduct an ablation study to examine the effect of the hyperparameter
\( \tau \) .
\( \tau \) is the lower threshold on the margin
\( m(x) \) that further determines the size of the holdout dataset
\( \mathcal {H} \) . We experiment with
\( \tau =0, 0.1, 0.3, 0.5 \) . Figure
11 shows the performance of CLC using different
\( \tau \) trained on USC-HAD with a noise rate of 0.4. Closer inspection of Figure
11 shows that the optimal
\( \tau \) is 0.1. The corresponding histories of the margin distributions are presented in Figures 12, 13, 14, and 15, respectively. The margin distribution here reflects the margins of those samples flagged as containing label noise. Note that
\( \tau =0 \) means that noise detection is conducted without the margin tool. As shown in Figure
12, before noisy labels are corrected, the amount of detected noise (true positives) gradually increases, but many clean samples (false positives) are mixed in. Interestingly, a large amount of noisy data and a certain amount of clean data gradually shift toward larger margin values. Our goal is to find as much noisy data as possible while minimizing the clean data mixed in. We study
\( \tau \) from 0.1 to 0.5, taking a step of 0.2. Figure
13 below illustrates that when
\( \tau =0.1 \) , the number of false-positive samples declines sharply while the number of true positives stays relatively stable; note that CLC aims to revise noisy labels as much and as accurately as possible. Unfortunately, with
\( \tau =0.3 \) , the numbers of true positives and false positives both drop drastically. Furthermore, the margin threshold of 0.5 in Figure
15 performs even worse than 0.3. In summary, these results show that a margin threshold of 0.1 yields both good recall and good precision.