Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

Lin, Hui; Ma, Zhiheng; Hong, Xiaopeng; Shangguan, Qinnan; Meng, Deyu

arXiv:2401.03870 (cs)

[Submitted on 8 Jan 2024]

Abstract: Transformers have become popular in recent crowd counting work because they overcome the limited receptive field of traditional CNNs. However, since crowd images always contain a large number of similar patches, the self-attention mechanism in a Transformer tends to find a homogenized solution in which the attention maps of almost all patches are identical. In this paper, we address this problem by proposing Gramformer: a graph-modulated transformer that enhances the network by adjusting the attention and the input node features, respectively, on the basis of two different types of graphs. First, an attention graph is proposed to diversify the attention maps so that they attend to complementary information. The graph is built upon the dissimilarities between patches and modulates the attention in an anti-similarity fashion. Second, a feature-based centrality encoding is proposed to discover the centrality positions, or importance, of nodes. We encode them with a proposed centrality indices scheme to modulate the node features and similarity relationships. Extensive experiments on four challenging crowd counting datasets validate the competitiveness of the proposed method. Code is available at {this https URL}.
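
The abstract gives no formulas, so the following is only a rough NumPy sketch of one way to read the two ideas, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the use of cosine similarity, the choice of keeping the `k` least similar patches as attention edges, and the use of thresholded node degree as the centrality measure.

```python
import numpy as np

def cosine_similarity(x):
    """Pairwise cosine similarity of patch features x with shape (n, d)."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)  # assumes non-zero rows
    return unit @ unit.T

def anti_similarity_mask(x, k):
    """Boolean (n, n) mask: entry (i, j) is True when patch j is among the
    k patches LEAST similar to patch i (a hypothetical edge rule)."""
    sim = cosine_similarity(x)
    least_similar = np.argsort(sim, axis=1)[:, :k]  # ascending: least similar first
    mask = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(mask, least_similar, True, axis=1)
    return mask

def masked_attention(q, key, v, mask):
    """Scaled dot-product attention restricted to the edges in mask."""
    logits = q @ key.T / np.sqrt(q.shape[1])
    logits = np.where(mask, logits, -np.inf)          # drop non-edges
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

def centrality_indices(x, threshold, num_bins):
    """Bucket each patch by its degree in a thresholded similarity graph
    (an assumed stand-in for the paper's centrality indices scheme)."""
    adjacency = cosine_similarity(x) > threshold
    np.fill_diagonal(adjacency, False)                # ignore self-loops
    degree = adjacency.sum(axis=1)
    n = x.shape[0]
    return np.minimum(degree * num_bins // max(n - 1, 1), num_bins - 1)
```

In a full model, the indices returned by `centrality_indices` would typically select rows of a learned embedding table that is added to the node features; that step is omitted here because the abstract does not describe it.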

Comments:	This is the accepted version of the paper and supplemental material to appear in AAAI 2024. Please cite the final published version. Code is available at {this https URL}
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2401.03870 [cs.CV]
	(or arXiv:2401.03870v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.03870

Submission history

From: Xiaopeng Hong
[v1] Mon, 8 Jan 2024 13:01:54 UTC (44,778 KB)
