CellMix: A General Instance Relationship based Method for Data Augmentation Towards Pathology Image Classification
arXiv preprint arXiv:2301.11513, 2023
In pathology image analysis, obtaining and maintaining high-quality annotated samples is an extremely labor-intensive task. To overcome this challenge, mixing-based methods have emerged as effective alternatives to traditional preprocessing data augmentation techniques. Nonetheless, these methods fail to fully consider the unique features of pathology images, such as local specificity, global distribution, and inner/outer-sample instance relationships. To better capture these characteristics and create valuable pseudo samples, we propose the CellMix framework, which employs a novel distribution-oriented in-place shuffle approach. By dividing images into patches based on the granularity of pathology instances and shuffling them within the same batch, the absolute relationships between instances can be effectively preserved when generating new samples. Moreover, we develop a curriculum learning-inspired, loss-driven strategy to handle perturbations and distribution-related noise during training, enabling the model to adaptively fit the augmented data. Our experiments on pathology image classification tasks demonstrate state-of-the-art (SOTA) performance on seven distinct datasets. This instance relationship-centered method has the potential to inform general data augmentation approaches for pathology image classification. The associated code is available at https://github.com/sagizty/CellMix.
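The in-place shuffle described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that "in-place" means a patch keeps its grid position while being exchanged between images of the same batch. The function name `inplace_patch_shuffle`, the `shuffle_ratio` parameter, and the area-weighted label mixing are hypothetical choices made for illustration.

```python
import numpy as np


def inplace_patch_shuffle(images, labels, patch_size, shuffle_ratio, rng):
    """Illustrative sketch only, not the CellMix reference code.

    images: float array of shape (B, C, H, W)
    labels: one-hot or soft labels of shape (B, K)
    patch_size: side length of the square patches (must divide H and W)
    shuffle_ratio: fraction of grid positions shuffled across the batch
    rng: a numpy.random.Generator
    """
    b, c, h, w = images.shape
    if h % patch_size or w % patch_size:
        raise ValueError("patch_size must divide both image height and width")
    gh, gw = h // patch_size, w // patch_size
    n = gh * gw  # number of patch positions per image

    # Split every image into a grid of patches: (B, N, C, p, p).
    patches = images.reshape(b, c, gh, patch_size, gw, patch_size)
    patches = patches.transpose(0, 2, 4, 1, 3, 5)
    patches = patches.reshape(b, n, c, patch_size, patch_size)

    # source[i, j] is the batch index that supplies position j of output i.
    source = np.tile(np.arange(b)[:, None], (1, n))
    num_shuffled = int(round(shuffle_ratio * n))
    for pos in rng.permutation(n)[:num_shuffled]:
        # Permute the samples at this position; the position itself is kept.
        source[:, pos] = rng.permutation(b)

    mixed = patches[source, np.arange(n)[None, :]]  # (B, N, C, p, p)

    # Assumption: the soft label is the average of the labels of the
    # source images, weighted by how many patches each one contributes.
    mixed_labels = labels[source].mean(axis=1)  # (B, K)

    # Reassemble the patch grid into full images.
    out = mixed.reshape(b, gh, gw, c, patch_size, patch_size)
    out = out.transpose(0, 3, 1, 4, 2, 5).reshape(b, c, h, w)
    return out, mixed_labels, source
```

A call such as `inplace_patch_shuffle(batch, one_hot, patch_size=16, shuffle_ratio=0.5, rng=np.random.default_rng(0))` would return the mixed batch, the soft labels, and the source-index map. The loss-driven, curriculum-inspired schedule mentioned in the abstract would then adjust quantities like `shuffle_ratio` or `patch_size` during training; its details are not given here, so it is left out of the sketch.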