¹ University of Sydney, Camperdown, NSW 2050, Australia
² Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
³ University of Maryland at College Park, College Park, MD 20742, USA
⁴ Microsoft, Redmond, WA 98052, USA

Revisiting Adaptive Cellular Recognition Under
Domain Shifts: A Contextual Correspondence View

Jianan Fan¹, Dongnan Liu¹, Canran Li¹, Hang Chang², Heng Huang³, Filip Braet¹, Mei Chen⁴, Weidong Cai¹

Abstract

Cellular nuclei recognition serves as a fundamental and essential step in the workflow of digital pathology. However, with disparate source organs and staining procedures among histology image clusters, the scanned tiles inherently conform to a non-uniform data distribution, which degrades performance in general cross-cohort usage. Despite the latest efforts leveraging domain adaptation to mitigate distributional discrepancy, those methods are restricted to modeling the morphological characteristics of each cell individually, disregarding the hierarchical latent structure and intrinsic contextual correspondences across the tumor micro-environment. In this work, we identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition and thereby propose to exploit the dependence over various biological structures for domain adaptive cellular recognition. We discover those high-level correspondences via unsupervised contextual modeling and use them as bridges to facilitate adaptation over diverse organs and stains. In addition, to further exploit the rich spatial contexts embedded amongst nuclear communities, we propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents. The proposed method is extensively evaluated on a broad spectrum of cross-domain settings under miscellaneous data distribution shifts and outperforms the state-of-the-art methods by a substantial margin. Code is available at https://github.com/camwew/Cellular-Recognition_DA_CC.

1 Introduction

In routine pathological examination, cellular recognition, which aims to identify the specific type of each cell nucleus and segment it, contributes fundamentally to the success of various downstream clinical tasks [28, 39]. Despite the numerous learning-based approaches proposed in this context [36, 21, 15], those efforts are founded upon an ill-considered assumption that histopathology imaging data is uniformly distributed across cohorts, which dismisses the potential collapse incurred by cross-organ/stain variances [9, 11]. In digital pathology practice, query samples could be acquired from divergent organs with inconsistent staining procedures [23, 37]. The inherent data distribution shifts severely harm the robustness and general applicability of learned recognition models. Quantitative evidence is shown in Fig. 1(a). The precision of cellular subtyping degrades drastically when evaluated across domains, though the performance of class-agnostic segmentation is relatively robust.

Figure 1: (a) Illustrative results for cellular recognition under domain shifts. X $\rightarrow$ Y denotes that the model is trained with data from X organ and then evaluated on Y organ. bDice and bPQ measure the accuracy of class-agnostic segmentation. $F^{*}$ denotes the $F$ score for different nuclear types, indicating subtyping accuracy. (b) Schematic diagram of the hierarchical nature of latent variables. The pathological composition principle of the tumor micro-environment inherits fundamental *invariance* regardless of underlying confounding factors such as sampling organs and staining protocols, holding great promise for formalizing domain-agnostic biomarkers.

A promising solution is to leverage well-annotated data from one source cohort and then perform unsupervised domain adaptation (UDA) [26, 19, 8] to transfer the learned domain-agnostic knowledge to another data collection with disparate distribution. However, the efficacy of existing UDA methods in the context of cross-domain cellular recognition is intrinsically undermined by the following limitations: i) Most of these efforts are subjugated to aligning low-level visual attributes in a category-agnostic manner [40, 41, 7]. Without exploitation of discriminative contextual characteristics along adaptation, there is no guarantee that cells of different subtypes can be well separated under domain shifts. ii) In spite of the recent endeavors to perform class-conditioned alignment [45, 24, 25], a major deterrent stands that they either depend on feature representations from the source domain or require pseudo-labeling to categorize objects in the target domain. In digital pathology, the large discrepancy between the source and target domain at the feature distribution level and the unreliability of pseudo-labels on cellular species would inevitably lead to error accumulation and biased alignment.

iii) Most importantly, those approaches seek to devise instance-level alignment strategy to transfer object semantics, which indicate the morphological characteristics (e.g., appearance, texture, and shape) of each individual object, to facilitate adaptation across domains [4, 46, 2]. However, cellular recognition differs from general perception tasks in computer vision by large contexts. Specifically, other than object semantics, the underpinning pathological composition principle and the resultant relational correspondences across biological structures are also indispensable for distinguishing visually ambiguous cells. As illustrated in Fig. 1(b), we characterize the composing process of tumor micro-environment with a hierarchical formulation between latent variables and representations. With conjoint parents in the directed acyclic graph [31], the representations for nuclear clusters and adjacent tissues are causally subjected to statistical dependence. The implicit correspondences among latent representations also instigate observable implications on biological structure traits. As Fig. 2 shows, the tissue structures surrounding neoplastic and epithelial cells demonstrate significant visual differences, despite the analogous morphological properties of nuclei themselves. It suggests that jointly modeling cells and surrounding tissues and exploiting their correspondences are advantageous for recognizing cells with ambiguous semantics [33]. Furthermore, the biological correlations concurrently hold that the spatial contexts among neighboring nuclei in nuclear clusters are found to be informative and could behave as discriminative markers for cell identification [1].

In this work, we propose a novel framework harnessing the inherent biological correspondences across pathological environment for cross-domain nuclei recognition. Our insight lies in leveraging the correspondences between observable biological structures to formulate a suite of surrogate tasks, from which the underlying pathological composition principle can be implicitly learned and exploited. The unveiled pathogenesis principle inherits coherence regardless of the confounding factors such as sampling organs and staining procedures, and can therefore behave as domain-invariant knowledge. By feeding neural models with the explicit invariance, we present an elegant solution to overcome shortcut learning biased from spurious correlations, which has been regarded as the primary determinant for model collapse under domain shifts [34]. In specific, we devise a multifaceted self-discovery scheme to uncover the correspondences across biological structures over complementary levels and intrinsically learn the coherent data-genesis principle to endow the adapted model with general serviceability. It is achieved by devising pretext tasks to perform counterfactual nuclei masking for exploring the implicit dependencies in an unsupervised manner. Moreover, we propose self-adaptive dynamic distillation to further leverage the correspondences within nuclear clusters via exploiting the spatial contexts of neighbouring nuclei and accordingly conduct instance-adaptive rectifications on model outputs.

In a nutshell, our contributions are three-fold: i) We develop a novel framework for cross-domain cellular nuclei recognition which, for the first time, goes beyond ambiguous object semantics and proposes to leverage the latent hierarchy in the pathological composing process and the implicit biological correspondences as bridges to foster model adaptation. ii) We propose the hierarchical self-discovery and instance-adaptive rectification methodologies to exploit the multifaceted biological correspondences for learning high-level composition principle via surrogate tasks, without the need for prior pathological knowledge. iii) We comprehensively evaluate our proposed method and demonstrate its effectiveness on diverse cross-domain settings, attaining remarkable improvements over state-of-the-art UDA methods.

2 Related Work

UDA in Biomedical Images. Aiming at mitigating the distributional bias and learning transferable knowledge across data cohorts, unsupervised domain adaptation (UDA) [47, 35] is raised as a popular line of research. Representative works for UDA propose to mitigate the discrepancy across domains from the image appearance and feature representation level. To accomplish appearance-level alignment, a generative or style hallucinative model is introduced to transform source domain images to synthetic target-like ones [29, 26, 43]. With respect to content representation-level alignment, distributional disparity regularization and adversarial optimization are commonly adopted approaches to derive domain-invariant representation [3, 38, 42]. Those methodologies have shown impressive efficacy for alleviating cross-domain gap in various scenarios such as anatomical structure segmentation [14] and nucleus detection [40].

Domain Adaptive Object Recognition. In the context of cross-domain object recognition, the prevailing approach is employing region-of-interest (ROI) based domain discriminators to align instance-level features from different domains [4]. Following a similar spirit, multi-branch architectures are further devised to compensate for the model collapse issue and resort to securing the best trade-offs between domain-specific and domain-invariant attributes [5, 17]. Style hallucination techniques are also utilized to mitigate domain discrepancy from the image appearance level [20]. However, the aforementioned approaches stand without consideration of category-related characteristics along domain alignment and therefore tend to incur negative transfer across irrelevant entity types. In recent literature, several efforts [45, 46, 22] have posited the beneficial practice of performing class-wise alignment and harnessing category pseudo-labels from the target domain. For instance, [17] proposes a multi-scale adversarial training paradigm which jointly minimizes image-level domain discrepancy and aligns semantic representation across domains in a class-aware manner. Nonetheless, considering the model vulnerability in cross-domain scenarios for the nuclei recognition task and the resulting biased pseudo-labels, securing precise class-wise alignment across distinct nuclei types is deemed impractical. Analogous to our work, [41, 24] study category-aware nuclei recognition under domain shifts. However, they adopt approaches similar to [45] and are therefore suboptimal due to the lack of consideration for domain-invariant biological composition and the unstable pseudo-label learning process.

3 Methodology

3.1 A Hierarchical View on Pathology Data Genesis

As a starting point, we characterize the underpinning pathological genesis process with a hierarchical latent variable model [18], as depicted in Fig. 1(b). The hierarchical dependence structure can be formally described as follows:

Proposition 1 (Hierarchical Formulation of Latent Variables)

Let $\mathcal{G}^{*}$ be the directed acyclic graph describing the causal structure of latent variables, with the sets of nodes and edges denoted as $\mathcal{V}$ and $\mathcal{E}$. $\mathcal{V}$ is composed of the measurable representations $\mathbf{R}$ for biological structures as well as the high-level latent variables $\mathbf{S}$. Then, the underlying data genesis procedure can be characterized with the following structural formulations: $\mathbf{R}_{i}=\sum_{\mathbf{S}_{j}\in Pa(\mathbf{R}_{i})}p_{ij}(\mathbf{S}_{j})+Q(\epsilon_{i})$, where $Pa(\cdot)$ represents the set of parents for a certain node, $p_{(\cdot,\cdot)}$ denotes the causal dependence function, and $Q(\epsilon)$ is the probability distribution over exogenous random variables $\{\epsilon_{i}\}$.

Based upon the principle, with the high-level pathological composition mechanism denoted as a latent variable $\mathbf{S}_{c}$ , we have $\mathbf{S}_{c}$ as the conjoint ancestor for different biological structures, namely $\mathbf{R}_{c}\leftarrow\mathbf{S}_{c}\rightarrow\mathbf{R}_{t}$ , where $\mathbf{R}_{c}$ and $\mathbf{R}_{t}$ stand for the representations of cells and tissues.
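To make the common-parent structure $\mathbf{R}_{c}\leftarrow\mathbf{S}_{c}\rightarrow\mathbf{R}_{t}$ concrete, the following toy simulation draws a shared latent factor and generates two observed representations from it. It is only a sketch under strong assumptions that are not part of the paper: the variables are scalars, the causal dependence functions are linear with made-up coefficients, and the exogenous noise is Gaussian.

```python
import random


def simulate(n=5000, seed=0):
    """Draw n samples from a toy model R_c <- S_c -> R_t.

    Assumptions (illustrative only): scalar variables, linear dependence
    functions with arbitrary coefficients, Gaussian exogenous noise.
    """
    rng = random.Random(seed)
    cells, tissues = [], []
    for _ in range(n):
        s_c = rng.gauss(0.0, 1.0)               # shared latent factor S_c
        r_c = 2.0 * s_c + rng.gauss(0.0, 0.5)   # R_c = p_c(S_c) + noise
        r_t = -1.5 * s_c + rng.gauss(0.0, 0.5)  # R_t = p_t(S_c) + noise
        cells.append(r_c)
        tissues.append(r_t)
    return cells, tissues


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


cells, tissues = simulate()
corr = pearson(cells, tissues)
print(f"correlation between R_c and R_t: {corr:.3f}")
```

Although neither representation is a function of the other, the two are strongly correlated because they share the parent $\mathbf{S}_{c}$. This is the statistical dependence that the surrogate tasks are meant to exploit.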

It implies that the correspondences between those biological structures can be traced back to the high-level latent factor $\mathbf{S}_{c}$ , which holds fundamental invariance across domains. As Fig. 3 illustrates, $\mathbf{S}_{c}$ serves as an intermediate node in the correspondence discovery chain (e.g., $\mathbf{R}_{t}\xmapsto{\mathbf{S}_{c}}\mathbf{R}_{c}$ ), such that the knowledge can be implicitly learned and injected to the model along the path. In this regard, we propose to implicitly learn the domain-coherent principle to foster model generalization by exploring the biological correspondences via counterfactual nuclei masking and restoration.

A schematic illustration of the proposed method is shown in Fig. 4. Specifically, we propose multifaceted correspondence self-discovery and instance-adaptive dynamic distillation, which aim to capture the inherent nuclei-tissue and nuclei-nuclei relationships and exploit relational contexts within nuclear clusters to rectify model predictions dominated by the characteristics of individual nucleus, respectively. Those surrogate tasks share a backbone network with the primary nuclei recognition branch for seamless transfer of complementary knowledge.

3.2 Multifaceted Correspondence Discovery

Correspondence across Nucleus and Tissue. As illustrated in Fig. 2, capturing the underlying correlations across nuclei and tissue structures offers discriminative contextual information and could benefit the precise identification of nuclear subtypes in spite of their ambiguous visual attributes. Those intrinsic biological correspondences are rooted in the fundamental composition mechanism of pathology data and thus inherit stronger robustness against domain shifts compared with vanilla pixel or object-level semantics. To this end, we devise a self-regulated surrogate task to exploit such domain-robust correspondences, by firstly masking nuclei pixels and then resorting to restore the concealed biological contexts. In the restoration step, the only information available is the characteristics of background tissue structures, as nuclear properties have been covered up. On that account, the task becomes predicting the attributes of nuclei according to their surrounding tissue formulation, which provides an elegant solution to learn the relationships between nuclei and tissue.

Specifically, given an image $\mathbf{I}$ and its nuclei binary mask $\mathbf{\hat{M}}$, we mask the inner details of nuclei regions by replacing all nuclei pixel values with their average value to perform counterfactual intervention: $\mathbf{\hat{I}_{mask}}=\mathbf{I}\odot(\mathbbm{1}-\mathbf{\hat{M}})+\mathtt{avg}(\mathbf{\hat{M}}\odot\mathbf{I})\odot\mathbf{\hat{M}}$, where $\odot$ denotes the element-wise product operation. Then, we restore the pixel values of masked nuclei based on the surrounding tissue characteristics and accordingly model the nuclei-tissue relationships. Considering that the functional types of nuclei are mainly dependent on the locally surrounding tissue structures [33], we construct the pixel-wise restoration decoder with convolutional filters to leverage their strong focus on local contextual information. Specifically, the masked images $\mathbf{\hat{I}_{mask}}$ are forwarded to the backbone $\bm{B}$ for encoding tissue structures, followed by the restoration block $\bm{\tilde{G}}_{\bf{INC}}$ to re-fill those masked regions per pixel. The final restoration results can be represented as $\mathbf{\hat{I}_{rec}}=\bm{\tilde{G}}_{\bf{INC}}(\bm{B}(\mathbf{\hat{I}_{mask}}))+\mathbf{\hat{I}_{mask}}$. As the optimization target of the surrogate restoration task, we impose two training objectives towards disparate yet complementary regularization endpoints:
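The masking step can be illustrated with a small dependency-free sketch on a grayscale tile stored as nested lists. It assumes that $\mathtt{avg}(\cdot)$ is the mean intensity over the nucleus pixels only, which is how the text describes it ("their average value"). The function name `mask_nuclei` and the toy tile are hypothetical and not taken from the released code.

```python
def mask_nuclei(image, mask):
    """Counterfactual nuclei masking on a 2D grayscale tile.

    image: 2D list of floats; mask: 2D list of 0/1 with the same shape.
    Assumption: avg(.) is the mean over nucleus pixels (mask == 1).
    """
    nucleus_values = [px
                      for img_row, m_row in zip(image, mask)
                      for px, m in zip(img_row, m_row) if m]
    if not nucleus_values:                      # no nuclei: nothing to mask
        return [row[:] for row in image]
    fill = sum(nucleus_values) / len(nucleus_values)
    # I_mask = I * (1 - M) + avg * M, evaluated pixel by pixel
    return [[px * (1 - m) + fill * m for px, m in zip(img_row, m_row)]
            for img_row, m_row in zip(image, mask)]


# Toy 3x3 tile: bright tissue background, darker 2x2 nucleus bottom-right.
image = [[0.9, 0.8, 0.9],
         [0.8, 0.2, 0.4],
         [0.9, 0.3, 0.3]]
mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 1]]

masked = mask_nuclei(image, mask)
for row in masked:
    print([round(v, 3) for v in row])
```

Tissue pixels pass through unchanged while every nucleus pixel collapses to one constant, so a restoration network that receives `masked` can only infer nuclear appearance from the surrounding tissue.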

$$
\mathcal{L}_{\rm{TCD}}=\mathbb{E}_{(\mathbf{I},\mathbf{\hat{I}_{rec}})\,\sim\mathcal{X}_{s/t}}[\mathcal{H}(\mathbf{I},\mathbf{\hat{I}_{rec}})+\mathcal{L}_{\rm{perpt}}(\mathbf{I},\mathbf{\hat{I}_{rec}};\mathbf{\check{D}})]. \tag{1}
$$

Here $\mathcal{X}_{s/t}$ denote the data distributions of source and target domains. The first term aims to ensure the pixel-level context coherence after restoration with a matching loss term $\mathcal{H}$ . The other perceptual regularization term enforces the generated image to not only locally approach the raw image but also exhibit harmonization and consistency from the global perspective. An adversarial discriminator $\mathbf{\check{D}}$ is trained together with the restoration network for estimating perceptual disparity and deriving $\mathcal{L}_{\rm{perpt}}$ [10].

Correspondence within Nuclear Cluster. Recently, several works [27, 1] have underlined the value of considering the community nature of cells. However, they disregard the particular value of this property under cross-domain scenarios: the correspondences within nuclei clusters are rooted in the pathogenesis of tumor [32] and could deliver implicit modeling of the invariant factor. For example, in all types of organs, epithelial cells tend to cluster together in a ring-like shape [9]. We therefore devise an instance-level nuclei restoration task to explore the inter-nuclei correspondences. The aim is to predict the attributes of a masked nucleus according to its neighbouring nuclei, which offers a simple way to learn the implicit object adjacency relationships.

As indicated in Fig. 4, in each image tile $\mathbf{I}$ , we first perform ROI align and thereafter mask the instance-wise feature of the central nucleus. Here, masking means that we do not forward the feature of the masked nucleus to the following network. We denote the instance-wise feature maps for the masked nucleus and the neighbouring nuclei as $\mathbf{F_{mask}}$ and $\{\mathbf{F_{nbr}}^{i}\}_{i=1}^{N-1}$ , respectively, where $N$ is the total number of nuclei proposals. The restoration scheme intends to retrieve the masked feature maps via contextual modeling of the neighboring nuclear community. It is essential to capture the long-range dependencies and spatial relationships among the group of adjacent nuclei so that the attributes of the masked nucleus can be reasonably restored. In this regard, we develop a ViT [6]-based framework where the high-level correlations across nuclei are exploited with the self-attention mechanism. The ROI features of unmasked nuclei are firstly passed through a ViT encoder $\bm{Z}_{\bf{FNR}}$ . A token $\mathbf{TK}$ is then concatenated with the encoded unmasked nuclei to represent the nucleus that is masked and required to be reconstructed. Positional embeddings $\mathbf{PE}$ are added to incorporate the spatial information of each nucleus into consideration. Subsequently, those representations are forwarded to the feature reconstruction head $\bm{\tilde{G}}_{\bf{FNC}}$ , which is also constructed with a series of transformer blocks, to predict the feature values of the masked nucleus:

$$
\{\mathbf{\hat{F}_{rec}}^{i}\}_{i=1}^{N}=\bm{\tilde{G}}_{\bf{FNC}}(\mathtt{concat}[\bm{Z}_{\bf{FNR}}(\{\mathbf{F_{nbr}}^{i}\}_{i=1}^{N-1}),\,\mathbf{TK}]+\mathbf{PE}). \tag{2}
$$

Then, the restored map for the masked nucleus is given by $\mathbf{\hat{F}_{rec}^{mask}}=\{\mathbf{\hat{F}_{rec}}^{i}\}_{i=1}^{N}[N]$, i.e., the $N$-th element of the output sequence, which corresponds to the position of the token $\mathbf{TK}$. The training objective of the surrogate task for inter-nuclei correspondence discovery is to restore the original feature maps of the masked nucleus based on neighbouring nuclei characteristics. Therefore, we apply the matching loss $\mathcal{H}$ to penalize the inconsistency between raw feature maps and the restoration results:

$$
\mathcal{L}_{\rm{NCD}}=\mathbb{E}_{(\mathbf{F}|\mathbf{I})\,\sim\mathcal{X}_{s/t}}[\mathcal{H}(\mathbf{F_{mask}},\mathbf{\hat{F}_{rec}^{mask}})]. \tag{3}
$$
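The bookkeeping behind Eqs. (2) and (3) can be sketched as follows: the encoded neighbour features are concatenated with the mask token, positional embeddings are added, and the last output slot is compared with the held-out feature. This sketch only illustrates the token layout. The transformer encoder and reconstruction head are replaced by an identity placeholder, feature maps are flattened to short vectors, all numbers are hypothetical, and $\mathcal{H}$ is taken to be a mean absolute ($L_{1}$) error.

```python
def build_decoder_input(encoded_neighbours, mask_token, pos_embeddings):
    """concat[Z(F_nbr), TK] + PE: the mask token occupies the last (N-th) slot."""
    tokens = [list(t) for t in encoded_neighbours] + [list(mask_token)]
    if len(tokens) != len(pos_embeddings):
        raise ValueError("need one positional embedding per nucleus")
    return [[v + p for v, p in zip(tok, pe)]
            for tok, pe in zip(tokens, pos_embeddings)]


def l1_matching_loss(target, restored):
    """Mean absolute error between two flattened feature vectors."""
    return sum(abs(t - r) for t, r in zip(target, restored)) / len(target)


# Toy setting: N = 3 nuclei, feature dimension 2 (values are made up).
encoded_neighbours = [[1.0, 0.0], [0.0, 1.0]]    # stand-in for Z(F_nbr), N-1 items
mask_token = [0.0, 0.0]                          # stand-in for TK
pos_embeddings = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]

sequence = build_decoder_input(encoded_neighbours, mask_token, pos_embeddings)

# Placeholder for the reconstruction head: identity, NOT the actual model.
restored_all = sequence
restored_mask = restored_all[-1]                 # element [N] in Eq. (2)

f_mask = [0.5, 0.5]                              # held-out feature of masked nucleus
loss = l1_matching_loss(f_mask, restored_mask)
print(len(sequence), restored_mask, round(loss, 3))
```

In an actual implementation the placeholder would be a stack of transformer blocks, so the last slot would attend to all neighbour tokens before being compared with `f_mask`.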

3.3 Self-adaptive Dynamic Distillation

In standard object recognition pipeline, identification of instance type is dominated by the appearance and texture attributes of each individual object, which are vulnerable to the data distribution biases incurred by variations in imaging protocol and staining procedure [16, 44]. To further leverage the domain-invariant contextual information and spatial relationships of nuclear clusters, we propose to take advantage of transformer’s capability to capture high-level implicit correspondences between the group of input nuclei feature representations [13] and provide adaptive guidance.

Specifically, we reuse the ViT previously deployed to characterize and embed the inter-nuclei correlations with an appended classification head. Here, the transformer-based branch operates in parallel with the basic convolution-based one. Given that the two architectures focus on different hierarchies of contextual information (i.e., low-level instance-wise attributes and high-level inter-object correspondences, respectively), performing mutual distillation could strengthen the overall results to transcend object semantics-dominated prediction. Then, we propose to adaptively adjust the trade-offs across two branches for each nucleus according to instance-wise ambiguity. First, we estimate the uncertainty of model inference with predictive entropy, which measures the quantity of information included in the model’s predictive density function [30]. With the input pairs of neural network denoted as ( $\mathbf{x}^{*},\mathbf{y}^{*}$ ) and its weights denoted by $\mathbf{W}$ , the approximate predictive distribution is parameterized as:

$$
q(\mathbf{y}^{*}|\mathbf{x}^{*})=\int p(\mathbf{y}^{*}|\mathbf{x}^{*},\mathbf{W})\,q(\mathbf{W})\,\mathrm{d}\mathbf{W}, \tag{4}
$$

where $p(\mathbf{y}^{*}|\mathbf{x}^{*},\mathbf{W})$ represents the predictive distribution and $q(\mathbf{W})$ denotes the approximate variational distribution of $\mathbf{W}$. The Monte Carlo estimate $\hat{y}$ can then be derived as:

$$
\hat{y}=\mathbb{E}_{q(\mathbf{y}^{*}|\mathbf{x}^{*})}(\mathbf{y}^{*})\approx\frac{1}{T}\sum_{t=1}^{T}\hat{\mathbf{y}^{*}}(\mathbf{x}^{*},\mathbf{W}^{t}), \tag{5}
$$

where $\hat{\mathbf{y}^{*}}$ corresponds to the label predictions and $T$ is the number of stochastic forward passes. Then, the final predictive entropy is obtained by aggregating entropy over all classes: $\mathbf{U}=-\sum_{c=1}^{C}\mathbb{P}(\hat{y}=c)\log\mathbb{P}(\hat{y}=c)$. Here $C$ denotes the number of classes and $\mathbb{P}(\hat{y}=c)$ denotes the probability of $\hat{y}$ belonging to class $c$. For nuclei recognition, we find (see Fig. 1) that the cross-domain degradation for connective and inflammatory cells, which are sparsely distributed and possess distinct morphological attributes [9], is relatively moderate. This indicates that the proposals conditioned on object semantics could be more reliable for a nucleus that is distant from other nuclei and possesses unambiguous individual characteristics. We therefore consider $\mathbf{U}^{M}/\mathbf{U}^{T}$ as the trade-off factor over the convolution- and transformer-based classification modules, with different model tendencies assigned to nuclei under divergent spatial distributions. The trade-offs are then adopted to regulate the overall loss function to deliver self-adaptive dynamic guidance:

$$
\mathcal{L}_{\rm{SDD}}=\mathbb{E}_{\mathbf{I}\,\sim\mathcal{X}_{s}/\mathcal{X}_{t}}[\frac{1}{N}\sum_{i=1}^{N}\frac{\mathbf{U}_{i}^{M}}{\mathbf{U}_{i}^{T}}\lVert\mathcal{S}_{i}^{M}-\mathcal{S}_{i}^{T}\rVert;\mathbf{I}], \tag{6}
$$

where $N$ is the total number of nuclei in image $\mathbf{I}$, and $\mathcal{S}^{M}$ and $\mathcal{S}^{T}$ are the classification scores of the two model constituents.
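Eqs. (5) and (6), together with the predictive entropy $\mathbf{U}$, can be traced numerically with a short sketch. Several choices here are assumptions rather than details given in the text: the norm $\lVert\cdot\rVert$ is taken as Euclidean, a small `eps` is added to the denominator to avoid division by zero, and all probability vectors are hypothetical toy values.

```python
import math


def mc_mean(prob_samples):
    """Average T stochastic class-probability vectors (Monte Carlo estimate, Eq. 5)."""
    t = len(prob_samples)
    c = len(prob_samples[0])
    return [sum(sample[k] for sample in prob_samples) / t for k in range(c)]


def predictive_entropy(probs):
    """U = -sum_c P(c) log P(c); classes with P(c) = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sdd_loss(scores_m, scores_t, u_m, u_t, eps=1e-8):
    """Mean over nuclei of (U_i^M / U_i^T) * ||S_i^M - S_i^T||.

    Assumptions: Euclidean norm; eps guards against U_i^T = 0.
    """
    total = 0.0
    for s_m, s_t, um, ut in zip(scores_m, scores_t, u_m, u_t):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_m, s_t)))
        total += um / (ut + eps) * dist
    return total / len(scores_m)


# One nucleus, two classes, T = 2 stochastic passes per branch (toy numbers).
s_conv = mc_mean([[0.6, 0.4], [0.4, 0.6]])       # convolution branch: ambiguous
s_vit = mc_mean([[0.95, 0.05], [0.85, 0.15]])    # transformer branch: confident
u_conv = predictive_entropy(s_conv)
u_vit = predictive_entropy(s_vit)

loss = sdd_loss([s_conv], [s_vit], [u_conv], [u_vit])
print(round(u_conv, 3), round(u_vit, 3), round(loss, 3))
```

When the convolution branch is uncertain and the transformer branch is confident, the ratio $\mathbf{U}^{M}/\mathbf{U}^{T}$ exceeds one and the discrepancy between the two score vectors is penalized more heavily for that nucleus. In the opposite case the weight shrinks below one.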

Table 1: Comparison results of our proposed method against other state-of-the-art methods for nuclei classification under three cross-organ settings. X $\rightarrow$ Y denotes that the model is trained on data acquired from X organ and then evaluated on Y organ samples. $F$ scores for each class and the class-averaged overall score are reported. Neo., Epi., Con., and Inf. denote neoplastic, epithelial, connective, and inflammatory cells, respectively. Best and second best results are highlighted in bold and underlined, respectively.

Methods	Breast $\rightarrow$ Testis (F score)					Breast $\rightarrow$ Thyroid (F score)					Breast $\rightarrow$ Bile-duct (F score)
Methods	Neo.	Epi.	Con.	Inf.	Avg.	Neo.	Epi.	Con.	Inf.	Avg.	Neo.	Epi.	Con.	Inf.	Avg.
Source-only	0.428	0.070	0.529	0.607	0.409	0.311	0.036	0.445	0.368	0.290	0.553	0.000	0.498	0.522	0.393
DA-RCNN [4]	0.527	0.357	0.579	0.460	0.481	0.224	0.303	0.381	0.417	0.331	0.548	0.024	0.466	0.568	0.401
PSA [45]	0.576	0.256	0.580	0.254	0.417	0.386	0.284	0.392	0.401	0.366	0.566	0.000	0.454	0.462	0.371
MGA [46]	0.540	0.302	0.556	0.358	0.439	0.306	0.116	0.448	0.413	0.321	0.535	0.000	0.473	0.526	0.384
PT-MAF [17]	0.452	0.000	0.547	0.554	0.388	0.354	0.000	0.410	0.351	0.279	0.516	0.000	0.460	0.446	0.355
HT [5]	0.518	0.373	0.585	0.579	0.514	0.391	0.264	0.401	0.330	0.347	0.572	0.056	0.501	0.595	0.431
BAFA [41]	0.535	0.215	0.510	0.572	0.458	0.293	0.228	0.477	0.434	0.358	0.558	0.049	0.465	0.590	0.416
CAPL-Net [24]	0.551	0.252	0.602	0.364	0.442	0.401	0.172	0.403	0.389	0.341	0.522	0.000	0.481	0.576	0.394
Ours	0.596	0.521	0.594	0.645	0.589	0.460	0.359	0.482	0.452	0.438	0.627	0.203	0.509	0.615	0.488

Training Pipeline. The overall optimization objective is to minimize the aggregated loss: $\mathcal{L}_{\rm{total}}=\mathcal{L}_{\rm{rec}}+\lambda^{*}(\mathcal{L}_{\rm{TCD}}+\mathcal{L}_{\rm{NCD}}+\mathcal{L}_{\rm{SDD}})$, where $\mathcal{L}_{\rm{rec}}$ is the base loss of the recognition model on the source domain, and $\lambda^{*}$ controls the scaling factors of surrogate tasks. The supplementary loss terms are aggregated from both the source and target domains.

4 Experiments and Results

4.1 Experimental Setup

Datasets. To verify the effectiveness and general applicability of the proposed method, we perform extensive experiments under various domain shifts incurred by the discrepancy across organs and stains. For cross-organ adaptation, we leverage four datasets sampled from different organs, i.e., breast, testis, thyroid, and bile-duct, which are retrieved from The Cancer Genome Atlas (TCGA) and [9]. Nuclei are categorized into neoplastic, epithelial, connective, and inflammatory cells. The statistics of the data used are presented in Table 2. We further evaluate our method on scenarios where both cross-organ and stain shifts are present. Following previous work [41], we adopt CoNSep [12] and PanNuke [9] as the source and target domains. CoNSep contains histology tiles from one single organ type (colon), whilst PanNuke is collected from 19 different organs. The staining procedures for those datasets are also inconsistent due to different clinical purposes and regulatory requirements across cohorts and countries. The neoplastic and epithelial classes in PanNuke are merged into one class for label space coherence.

Table 2: Class-wise statistics of nuclei in the used histology image data across different organs.

Organ Type	Number of Annotated Nuclei
Organ Type	Neoplastic	Epithelial	Connective	Inflammatory	Total
Breast	162,780	109,758	91,053	51,597	415,188
Testis	12,021	6,252	10,845	11,160	40,278
Thyroid	10,152	13,692	12,828	5,280	41,952
Bile-duct	26,460	2,352	23,316	13,995	66,123

Table 3: Comparison results on cross-organ nuclei instance segmentation. PQ metrics over each class and averaged score are reported.

Methods	Breast $\rightarrow$ Testis (PQ score)					Breast $\rightarrow$ Thyroid (PQ score)					Breast $\rightarrow$ Bile-duct (PQ score)
Methods	Neo.	Epi.	Con.	Inf.	Avg.	Neo.	Epi.	Con.	Inf.	Avg.	Neo.	Epi.	Con.	Inf.	Avg.
Source-only	0.207	0.029	0.344	0.309	0.222	0.144	0.032	0.320	0.196	0.173	0.295	0.000	0.288	0.299	0.220
DA-RCNN [4]	0.235	0.179	0.323	0.340	0.269	0.095	0.138	0.237	0.271	0.185	0.282	0.011	0.273	0.270	0.209
MGA [46]	0.243	0.150	0.289	0.266	0.237	0.121	0.045	0.282	0.257	0.176	0.294	0.000	0.263	0.295	0.212
HT [5]	0.214	0.207	0.306	0.313	0.260	0.137	0.084	0.262	0.177	0.165	0.309	0.032	0.295	0.283	0.230
BAFA [41]	0.236	0.145	0.311	0.332	0.256	0.127	0.103	0.302	0.264	0.199	0.297	0.015	0.257	0.301	0.218
Ours	0.252	0.390	0.324	0.385	0.338	0.159	0.141	0.297	0.284	0.220	0.330	0.102	0.286	0.304	0.256

Table 4: Comparison results for nuclei recognition under both organ and stain shifts. Neo-Epi. denotes the united class for neoplastic and epithelial cells.

Methods	CoNSep $\rightarrow$ PanNuke (F score)				CoNSep $\rightarrow$ PanNuke (PQ score)
Methods	Neo-Epi.	Con.	Inf.	Avg.	Neo-Epi.	Con.	Inf.	Avg.
Source-only	0.285	0.278	0.356	0.306	0.216	0.130	0.149	0.165
DA-RCNN [4]	0.316	0.254	0.411	0.327	0.227	0.122	0.165	0.171
MGA [46]	0.370	0.272	0.359	0.334	0.240	0.117	0.158	0.172
HT [5]	0.348	0.251	0.366	0.322	0.195	0.102	0.144	0.147
BAFA [41]	0.261	0.280	0.403	0.314	0.183	0.120	0.174	0.159
CAPL-Net [24]	0.335	0.264	0.332	0.310	0.231	0.113	0.155	0.167
Ours	0.404	0.300	0.401	0.368	0.258	0.149	0.166	0.191

Methods	PanNuke $\rightarrow$ CoNSep (F score)				PanNuke $\rightarrow$ CoNSep (PQ score)
Methods	Neo-Epi.	Con.	Inf.	Avg.	Neo-Epi.	Con.	Inf.	Avg.
Source-only	0.796	0.639	0.590	0.675	0.305	0.213	0.312	0.276
DA-RCNN [4]	0.819	0.588	0.603	0.669	0.289	0.174	0.335	0.265
MGA [46]	0.774	0.609	0.602	0.661	0.266	0.185	0.302	0.251
HT [5]	0.830	0.647	0.611	0.696	0.301	0.195	0.318	0.271
BAFA [41]	0.793	0.665	0.626	0.695	0.298	0.207	0.349	0.285
CAPL-Net [24]	0.768	0.574	0.606	0.649	0.273	0.178	0.320	0.257
Ours	0.856	0.696	0.618	0.723	0.356	0.252	0.352	0.320

Implementation Details and Evaluation Metrics. Following previous works [19], we adopt Mask R-CNN [16] as the base model. The matching loss $\mathcal{H}$ between a pair of inputs is implemented as an $L_{1}$ term. For nuclei masking on the target domain, we utilize the box proposals and masks generated with a model trained on the source domain, since Fig. 1 shows that the model is relatively robust to domain shifts for class-agnostic segmentation.

For evaluation on the classification task, we follow previous works [12] and adopt the $F$ score to measure the performance of nuclei classification: $F=\frac{TP}{TP+FP+FN}$. Compared with the standard $F_{1}$ score, $F_{1}=\frac{2TP}{2TP+FP+FN}$, more weight is assigned to FP and FN to emphasize false classification results. For instance segmentation, we use the panoptic quality (PQ) score [9] for quantitative evaluation. Both $F$ and $PQ$ scores are computed for each class and then averaged to demonstrate the overall performance.
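The difference between the $F$ score used here and the standard $F_{1}$ score is easy to see on a small example. The sketch below implements both from raw counts and averages per-class scores. The counts are hypothetical, returning 0 for a class with no counts is an assumed convention, and the four per-class values are the "Ours" Breast $\rightarrow$ Testis entries from Table 1.

```python
def f_score(tp, fp, fn):
    """F = TP / (TP + FP + FN), the metric stated in the text."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0   # assumed convention for empty classes


def f1_score(tp, fp, fn):
    """Standard F1 = 2TP / (2TP + FP + FN), shown for comparison only."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def class_average(scores):
    """Unweighted mean of per-class scores."""
    return sum(scores) / len(scores)


# Hypothetical counts for one class: 8 hits, 1 false positive, 1 miss.
print(f_score(8, 1, 1))               # 0.8
print(round(f1_score(8, 1, 1), 3))    # 0.889

# Per-class F scores of "Ours" for Breast -> Testis, taken from Table 1.
ours_testis = [0.596, 0.521, 0.594, 0.645]
print(round(class_average(ours_testis), 3))   # 0.589
```

Because errors are not discounted relative to true positives, $F$ is never larger than $F_{1}$ for the same counts, so the values in the tables are on a stricter scale than $F_{1}$ results reported elsewhere.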

4.2 Comparison with State-of-the-Art Methods

We compare our proposed method against the state-of-the-art UDA object recognition methods, including DA-RCNN [4], PSA [45], MGA [46], PT-MAF [17], HT [5], BAFA [41], and CAPL-Net [24] to justify its effectiveness. The results when the source domain-trained model is adopted straight for evaluations on the target domain are also presented for reference. For fair comparisons, those methods are implemented with the same backbone architecture and training settings (batch size, learning rate, etc.) as ours. We report the quantitative comparison results for each type of nuclei and the averaged value over all classes under a 3-fold cross-validation setting. Paired t-tests are also conducted between our method and the others on class-averaged overall scores. The resulting p-values for all tests are below 0.05, indicating the proposed method significantly outperforms the approaches in comparison.

In Table 1, we present the results for the classification task under cross-organ domain shifts. Our method achieves significant improvements in class-averaged scores in all three adaptation scenarios. The improvement can be attributed to our method’s capacity to identify nuclei with ambiguous semantics and to avoid the performance degradation incurred by inaccurate category pseudo-labels. The results are further discussed in Section 4.4. Table 3 presents the quantitative performance of category-wise instance segmentation. Building on its high-quality nuclei type identification, our method also yields strong class-wise and class-averaged PQ. Additionally, as depicted in Fig. 5, most competing methods fail to recognize epithelial cells and miscategorize them as neoplastic ones. In contrast, our proposed method successfully distinguishes the two types of nuclei despite their very similar visual attributes.

To further verify the generalizability of our method, we evaluate its efficacy under both organ and stain shifts. Following [41], we adopt CoNSep and PanNuke to construct the adaptation benchmark and perform bi-directional experiments. We adopt the evaluation metrics in [41] to jointly consider the performance of the nuclei detection and classification tasks. PQ scores are also presented to evaluate segmentation quality. As shown in Table 4, we compare our method with the state-of-the-art UDA object recognition approaches. The empirical improvements of our method in this more complex cross-domain setting are consistent with the previous experiments and observations, which substantiates the effectiveness and robustness of our method against various types of domain shifts in histology data.

4.3 Ablation Study

To validate the efficacy of the key components of the proposed method, we perform ablation studies on the classification and category-wise instance segmentation tasks by evaluating several variants of the method. The quantitative results are reported in Table 5, where MD denotes the mutual distillation across architectures and SA denotes the estimated trade-offs for self-adaptive dynamic distillation. All components have a positive impact on the overall classification accuracy. Specifically, employing TCD alone or the combination of NCD and MD already leads to competitive results, which confirms the importance of exploring the implicit biological correspondences for cross-domain nuclei recognition. By integrating those constituents, we reach peak performance. Moreover, the instance-adaptive guidance further boosts the $F$ score by around $3\%$. For instance segmentation, we observe that NCD tends to have a negative impact on the overall accuracy. A likely reason is that NCD uses a global pooling strategy, which inevitably discards fine-grained spatial information. In contrast, TCD benefits segmentation: when performing image-level nuclei masking, we keep all the spatial details and introduce reliable masks, which subsequently serve as guidance for segmentation.
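The remark that global pooling discards fine-grained spatial information can be illustrated with a toy example: two feature maps with the same activation at different locations become indistinguishable after global average pooling. The 2x2 maps and the helper name `global_average_pool` are hypothetical and are not the features used in NCD.

```python
from typing import List


def global_average_pool(feature_map: List[List[float]]) -> float:
    """Average all spatial positions of a single-channel feature map."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Same activation, different spatial location (toy data).
top_left = [[1.0, 0.0],
            [0.0, 0.0]]
bottom_right = [[0.0, 0.0],
                [0.0, 1.0]]

if __name__ == "__main__":
    # Both maps pool to the same value, so the location is lost.
    print(global_average_pool(top_left), global_average_pool(bottom_right))
```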

Table 5: Ablation study to verify the efficacy of key components in our method. The class-averaged overall $F$ and $PQ$ scores are presented for each case. Tes., Thyr. and Bile. stand as the abbreviations for testis, thyroid, and bile-duct, respectively. $\checkmark$ marks indicate the utilized modules. The best results are highlighted in bold.

	TCD	NCD	MD	SA	Tes. (F)	Tes. (PQ)	Thyr. (F)	Thyr. (PQ)	Bile. (F)	Bile. (PQ)
Settings	–	–	–	–	0.409	0.222	0.290	0.173	0.393	0.220
	$\checkmark$	–	–	–	0.446	0.232	0.326	0.189	0.427	0.227
	$\checkmark$	$\checkmark$	–	–	0.472	0.240	0.346	0.180	0.431	0.214
	–	$\checkmark$	$\checkmark$	–	0.508	0.256	0.382	0.197	0.465	0.234
	$\checkmark$	$\checkmark$	$\checkmark$	–	0.557	0.294	0.401	0.215	0.473	0.250
	$\checkmark$	–	$\checkmark$	$\checkmark$	0.554	0.327	0.417	0.231	0.448	0.247
	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	0.589	0.338	0.438	0.220	0.488	0.256

To investigate the impact of our proposed method in greater detail, we perform a sensitivity analysis on the set of loss weighting terms $\lambda^{*}=\{\lambda_{1},\lambda_{2},\lambda_{3}\}$, which correspond to the scaling factors of tissue correspondence discovery, nuclear correspondence discovery, and self-adaptive dynamic distillation, respectively. The results are depicted in Fig. 6.

For each weighting term, we scale it by a factor of 0.1, 0.3, 1.0, 3.0, or 10.0 while keeping the other terms fixed, and report the corresponding quantitative results for classification and category-aware instance segmentation. All experiments are conducted on the Breast $\rightarrow$ Testis setting. Our main setting (i.e., all scale factors equal to 1.0) achieves superior performance compared with most adjusted settings. In addition, decreasing the scale factor of $\lambda_{3}$ causes a noticeable performance drop on both tasks. This finding is consistent with the results of the ablation studies and substantiates the importance of the proposed instance-adaptive dynamic distillation strategy. On the other hand, increasing the scale factor of $\lambda_{2}$ has detrimental effects on instance segmentation, which is mainly attributed to the global pooling strategy used in nuclear correspondence discovery.
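The one-at-a-time protocol described above can be sketched as a small configuration generator. The base weights of 1.0 are placeholders (the actual values of $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are not restated here), and the names `BASE_WEIGHTS`, `SCALES`, and `sensitivity_configs` are hypothetical.

```python
from typing import Dict, List, Sequence

# Placeholder base weights; the real values are an assumption of this sketch.
BASE_WEIGHTS: Dict[str, float] = {"lambda_1": 1.0, "lambda_2": 1.0, "lambda_3": 1.0}
SCALES: Sequence[float] = (0.1, 0.3, 1.0, 3.0, 10.0)


def sensitivity_configs(base: Dict[str, float], scales: Sequence[float]) -> List[Dict[str, float]]:
    """Scale one weight at a time and keep the others at their base values."""
    configs = []
    for name in base:
        for scale in scales:
            cfg = dict(base)  # copy so the other weights stay fixed
            cfg[name] = base[name] * scale
            configs.append(cfg)
    return configs


if __name__ == "__main__":
    for cfg in sensitivity_configs(BASE_WEIGHTS, SCALES):
        print(cfg)
```

Each configuration would then be trained and evaluated separately; the scale factor 1.0 reproduces the main setting once per weight.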

4.4 Discussions

Identification of Nuclei with Ambiguous Semantics. Regarding the results in Table 1, the improvements of our method can be mainly attributed to its ability to precisely identify neoplastic and epithelial cells.

Unlike connective and inflammatory cells, which possess distinct individual shape and texture characteristics (i.e., the connective cell typically exhibits a flat polygonal shape and the inflammatory cell is much darker than the others in color space, as showcased in Fig. 7), neoplastic and epithelial cells are ambiguous and hard to tell apart for models conditioned on object semantics under cross-domain scenarios. It is challenging to distinguish those two types of cells solely based on their appearance attributes. Consequently, existing methods [4, 45] that perform UDA with object-wise alignment struggle to find the decision boundary separating neoplastic and epithelial cells. In contrast, by exploiting the informative correlations across biological structures, which demonstrate stronger visual contrast, our proposed method significantly improves the ability to distinguish those two types of cells with analogous morphological traits.

Sidestepping Reliance on Biased Pseudo-labels. Methods built upon category pseudo-labels [45, 41, 46] bring relatively limited empirical gains for cross-domain nuclei recognition. For example, in the Breast $\rightarrow$ Testis setting, the overall average $F$ scores of those methods are exceeded by those of methods that do not depend on category pseudo-labels by almost $5\%$. The degradation stems from the successive error accumulation caused by biased category pseudo-labels, which is inevitable considering the drastic model collapse across sampling organs and staining protocols. In this regard, by using specifically devised surrogate tasks as bridges for model transfer across domains, our proposed method dispenses with the self-training scheme and the reliance on category pseudo-labels, which contributes to the remarkable improvements.

Statistical Analysis. The key advancement of our method, as previously discussed, is its capability to distinguish nuclei with ambiguous morphological characteristics. We therefore conduct detailed statistical analysis on the nuclear classification results regarding epithelial cells. The resulting box plots and p-values shown in Fig. 8 substantiate the statistical significance of our achieved improvements.

5 Conclusion

In this work, we propose a holistic framework to facilitate cross-domain cellular nuclei recognition by exploiting implicit biological relationships at the image and instance feature levels. Additionally, we devise self-adaptive dynamic distillation to further leverage the rich relational contexts inherently present in nuclear communities with instance-aware trade-offs across model architectures. Experiments on several cross-domain settings with organ and stain shifts demonstrate that our method addresses common issues in the state-of-the-art UDA object recognition approaches and achieves compelling performance. In future work, we will investigate more challenging yet practical domain adaptation scenarios in which the cross-domain shifts also affect class distributions.

References

[1] Abousamra, S., Belinsky, D., Van Arnam, J., Allard, F., Yee, E., Gupta, R., Kurc, T., Samaras, D., Saltz, J., Chen, C.: Multi-class cell detection using spatial context representation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4005–4014 (2021)
[2] Cao, S., Joshi, D., Gui, L.Y., Wang, Y.X.: Contrastive mean teacher for domain adaptive object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 23839–23848 (2023)
[3] Chen, C., Dou, Q., Chen, H., Qin, J., Heng, P.A.: Unsupervised bidirectional cross-modality adaptation via deeply synergistic image and feature alignment for medical image segmentation. IEEE Transactions on Medical Imaging 39(7), 2494–2505 (2020)
[4] Chen, Y., Li, W., Sakaridis, C., Dai, D., Van Gool, L.: Domain adaptive faster r-cnn for object detection in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3339–3348 (2018)
[5] Deng, J., Xu, D., Li, W., Duan, L.: Harmonious teacher for cross-domain object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 23829–23838 (2023)
[6] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 (2021)
[7] Fan, J., Liu, D., Chang, H., Cai, W.: Learning to generalize over subpartitions for heterogeneity-aware domain adaptive nuclei segmentation. International Journal of Computer Vision pp. 1–24 (2024)
[8] Fan, J., Liu, D., Chang, H., Huang, H., Chen, M., Cai, W.: Taxonomy adaptive cross-domain adaptation in medical imaging via optimization trajectory distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 21174–21184 (2023)
[9] Gamper, J., Koohbanani, N.A., Benet, K., Khuram, A., Rajpoot, N.: Pannuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification. In: European Congress on Digital Pathology. pp. 11–19. Springer (2019)
[10] Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: International Conference on Machine Learning. pp. 1180–1189. PMLR (2015)
[11] Graham, S., Jahanifar, M., Azam, A., Nimir, M., Tsang, Y.W., Dodd, K., Hero, E., Sahota, H., Tank, A., Benes, K., et al.: Lizard: A large-scale dataset for colonic nuclear instance segmentation and classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 684–693 (2021)
[12] Graham, S., Vu, Q.D., Raza, S.E.A., Azam, A., Tsang, Y.W., Kwak, J.T., Rajpoot, N.: Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis 58, 101563 (2019)
[13] Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., Xu, Y., et al.: A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 87–110 (2022)
[14] Han, X., Qi, L., Yu, Q., Zhou, Z., Zheng, Y., Shi, Y., Gao, Y.: Deep symmetric adaptation network for cross-modality medical image segmentation. IEEE Transactions on Medical Imaging 41(1), 121–132 (2021)
[15] He, H., Wang, J., Wei, P., Xu, F., Ji, X., Liu, C., Chen, J.: Toposeg: Topology-aware nuclear instance segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 21307–21316 (2023)
[16] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2961–2969 (2017)
[17] He, Z., Zhang, L., Gao, X., Zhang, D.: Multi-adversarial faster-rcnn with paradigm teacher for unrestricted object detection. International Journal of Computer Vision 131(3), 680–700 (2023)
[18] Hinton, G.: How to represent part-whole hierarchies in a neural network. Neural Computation pp. 1–40 (2022)
[19] Hsu, J., Chiu, W., Yeung, S.: Darcnn: Domain adaptive region-based convolutional neural network for unsupervised instance segmentation in biomedical images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1003–1012 (2021)
[20] Huang, J., Guan, D., Xiao, A., Lu, S.: Rda: Robust domain adaptation via fourier adversarial attacking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8988–8999 (2021)
[21] Huang, J., Li, H., Wan, X., Li, G.: Affine-consistent transformer for multi-class cell nuclei detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 21384–21393 (2023)
[22] Kennerley, M., Wang, J.G., Veeravalli, B., Tan, R.T.: 2pcnet: Two-phase consistency training for day-to-night unsupervised domain adaptive object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11484–11493 (2023)
[23] Kumar, N., Verma, R., Anand, D., Zhou, Y., Onder, O.F., Tsougenis, E., Chen, H., Heng, P.A., Li, J., Hu, Z., et al.: A multi-organ nucleus segmentation challenge. IEEE Transactions on Medical Imaging 39(5), 1380–1391 (2019)
[24] Li, C., Liu, D., Li, H., Zhang, Z., Lu, G., Chang, X., Cai, W.: Domain adaptive nuclei instance segmentation and classification via category-aware feature alignment and pseudo-labelling. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 715–724. Springer (2022)
[25] Li, W., Liu, X., Yuan, Y.: Sigma: Semantic-complete graph matching for domain adaptive object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5291–5300 (2022)
[26] Liu, D., Zhang, D., Song, Y., Zhang, F., O’Donnell, L., Huang, H., Chen, M., Cai, W.: Unsupervised instance segmentation in microscopy images via panoptic domain adaptation and task re-weighting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4243–4252 (2020)
[27] Liu, P., Bilgic, M.: Relational classification of biological cells in microscopy images. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 35, pp. 344–352 (2021)
[28] Lu, C., Romo-Bucheli, D., Wang, X., Janowczyk, A., Ganesan, S., Gilmore, H., Rimm, D., Madabhushi, A.: Nuclear shape and orientation features from h&e images predict survival in early-stage estrogen receptor-positive breast cancers. Laboratory investigation 98(11), 1438–1448 (2018)
[29] Murez, Z., Kolouri, S., Kriegman, D., Ramamoorthi, R., Kim, K.: Image to image translation for domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4500–4509 (2018)
[30] Nair, T., Precup, D., Arnold, D.L., Arbel, T.: Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation. Medical image analysis 59, 101557 (2020)
[31] Peters, J., Janzing, D., Schölkopf, B.: Elements of causal inference: foundations and learning algorithms. The MIT Press (2017)
[32] Rendeiro, A.F., Ravichandran, H., Bram, Y., Chandar, V., Kim, J., Meydan, C., Park, J., Foox, J., Hether, T., Warren, S., et al.: The spatial landscape of lung pathology during covid-19 progression. Nature 593(7860), 564–569 (2021)
[33] Ryu, J., Puche, A.V., Shin, J., Park, S., Brattoli, B., Lee, J., Jung, W., Cho, S.I., Paeng, K., Ock, C.Y., et al.: Ocelot: Overlapped cell on tissue dataset for histopathology. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 23902–23912 (2023)
[34] Saranrittichai, P., Mummadi, C.K., Blaiotta, C., Munoz, M., Fischer, V.: Overcoming shortcut learning in a target domain by generalizing basic visual factors from a source domain. In: European Conference on Computer Vision. pp. 294–309. Springer (2022)
[35] Shin, H., Kim, H., Kim, S., Jun, Y., Eo, T., Hwang, D.: Sdc-uda: Volumetric unsupervised domain adaptation framework for slice-direction continuous cross-modality medical image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7412–7421 (2023)
[36] Tyagi, A.K., Mohapatra, C., Das, P., Makharia, G., Mehra, L., AP, P., et al.: Degpr: Deep guided posterior regularization for multi-class cell detection and counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 23913–23923 (2023)
[37] Verma, R., Kumar, N., Patil, A., Kurian, N.C., Rane, S., Graham, S., Vu, Q.D., Zwager, M., Raza, S.E.A., Rajpoot, N., et al.: Monusac2020: A multi-organ nuclei segmentation and classification challenge. IEEE Transactions on Medical Imaging 40(12), 3413–3423 (2021)
[38] Wu, F., Zhuang, X.: Unsupervised domain adaptation with variational approximation for cardiac segmentation. IEEE Transactions on Medical Imaging 40(12), 3555–3567 (2021)
[39] Wu, H., Wang, Z., Song, Y., Yang, L., Qin, J.: Cross-patch dense contrastive learning for semi-supervised segmentation of cellular nuclei in histopathologic images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11666–11675 (2022)
[40] Xing, F., Cornish, T.C., Bennett, T.D., Ghosh, D.: Bidirectional mapping-based domain adaptation for nucleus detection in cross-modality microscopy images. IEEE Transactions on Medical Imaging 40(10), 2880–2896 (2020)
[41] Yang, S., Zhang, J., Huang, J., Lovell, B.C., Han, X.: Minimizing labeling cost for nuclei instance segmentation and classification with cross-domain images and weak labels. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 35, pp. 697–705 (2021)
[42] Zhang, H., Zhang, Y.F., Liu, W., Weller, A., Schölkopf, B., Xing, E.P.: Towards principled disentanglement for domain generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8024–8034 (2022)
[43] Zhang, Y., Li, M., Li, R., Jia, K., Zhang, L.: Exact feature distribution matching for arbitrary style transfer and domain generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8035–8045 (2022)
[44] Zhao, Z.Q., Zheng, P., Xu, S.t., Wu, X.: Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems 30(11), 3212–3232 (2019)
[45] Zheng, Y., Huang, D., Liu, S., Wang, Y.: Cross-domain object detection through coarse-to-fine feature adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13766–13775 (2020)
[46] Zhou, W., Du, D., Zhang, L., Luo, T., Wu, Y.: Multi-granularity alignment domain adaptation for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9581–9590 (2022)
[47] Zhou, Z., Qi, L., Yang, X., Ni, D., Shi, Y.: Generalizable cross-modality medical image segmentation via style augmentation and dual normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20856–20865 (2022)
