Scalable quality control on processing of large diffusion-weighted and structural magnetic resonance imaging datasets

Kim, Michael E.; Gao, Chenyu; Ramadass, Karthik; Kanakaraj, Praitayini; Newlin, Nancy R.; Rudravaram, Gaurav; Schilling, Kurt G.; Dewey, Blake E.; Bennett, David A.; O'Bryant, Sid; Barber, Robert C.; Archer, Derek; Hohman, Timothy J.; Bao, Shunxing; Li, Zhiyuan; Landman, Bennett A.; Khairi, Nazirah Mohd; The Alzheimer's Disease Neuroimaging Initiative; The HABS-HD Study Team

arXiv:2409.17286 (cs)

[Submitted on 25 Sep 2024]

Abstract: Proper quality control (QC) is time-consuming when working with large-scale medical imaging datasets, yet it is necessary, as poor-quality data can lead to erroneous conclusions or poorly trained machine learning models. Most efforts to reduce data QC time rely on outlier detection, which cannot capture every instance of algorithm failure. Thus, there is a need to visually inspect every output of data processing pipelines in a scalable manner. We design a QC pipeline that allows for low time cost and effort in a team setting for a large database of diffusion-weighted and structural magnetic resonance images. Our proposed method satisfies the following design criteria: 1) a consistent way to perform and manage quality control across a team of researchers, 2) quick visualization of preprocessed data that minimizes the effort and time spent on the QC process without compromising the quality of the QC, and 3) a way to aggregate QC results across pipelines and datasets that can be easily shared. In addition to meeting these design criteria, we also provide information on what a successful output should look like and on common algorithm failures for various processing pipelines. Our method reduces the time spent on QC by a factor of over 20 compared to naively opening outputs in an image viewer, and we demonstrate how it can facilitate aggregation and sharing of QC results within a team. While researchers must spend time on robust visual QC of data, there are mechanisms by which the process can be streamlined and made efficient.
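
Design criterion 3 (aggregating QC results across pipelines and datasets in a form that is easy to share) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `QCRecord` fields, the status labels `pass`/`fail`/`unsure`, and the dataset and pipeline names below are all hypothetical, chosen only to show one way such an aggregation might look.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QCRecord:
    """One reviewer decision for one pipeline output (hypothetical schema)."""
    dataset: str
    pipeline: str
    scan_id: str
    status: str    # assumed labels: "pass", "fail", or "unsure"
    reviewer: str


def aggregate(records):
    """Count QC decisions per (dataset, pipeline) pair."""
    counts = {}
    for r in records:
        counts.setdefault((r.dataset, r.pipeline), Counter())[r.status] += 1
    return counts


def to_csv(counts, statuses=("pass", "fail", "unsure")):
    """Render the aggregated counts as CSV text that can be shared in a team."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["dataset", "pipeline", *statuses])
    for (dataset, pipeline), c in sorted(counts.items()):
        # Counter returns 0 for statuses that never occurred.
        writer.writerow([dataset, pipeline, *(c[s] for s in statuses)])
    return buf.getvalue()


# Illustrative records only; no real data or results are represented here.
example_records = [
    QCRecord("DatasetA", "dwi_preproc", "sub-01", "pass", "reviewer1"),
    QCRecord("DatasetA", "dwi_preproc", "sub-02", "fail", "reviewer2"),
    QCRecord("DatasetB", "t1_segmentation", "sub-01", "pass", "reviewer1"),
]
print(to_csv(aggregate(example_records)))
```

Keeping one flat record per reviewed output, keyed by dataset and pipeline, is one simple design that lets results from several reviewers be concatenated before summarizing.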

Comments:	22 pages, 12 figures, 1 table, 6 supplemental figures
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2409.17286 [cs.DC]
	(or arXiv:2409.17286v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2409.17286

Submission history

From: Michael Kim
[v1] Wed, 25 Sep 2024 18:50:51 UTC (12,383 KB)
