DOI: 10.1007/978-3-031-72086-4_24
Article

Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model

Published: 07 October 2024

Abstract

We present a knowledge augmentation strategy for assessing diagnostic groups and gait impairment from monocular gait videos. Built on a large-scale pre-trained Vision Language Model (VLM), our model learns and improves visual, textual, and numerical representations of patient gait videos through collective learning across three distinct modalities: gait videos, class-specific descriptions, and numerical gait parameters. Our contributions are twofold. First, we adopt a knowledge-aware prompt tuning strategy that uses class-specific medical descriptions to guide text prompt learning. Second, we integrate the paired gait parameters in the form of numerical texts to enhance the numeracy of the textual representation. Results demonstrate that our model not only significantly outperforms state-of-the-art methods in video-based classification tasks but also adeptly decodes the learned class-specific text features into natural language descriptions using the vocabulary of quantitative gait parameters. The code and the model will be made available at our project page: https://lisqzqng.github.io/GaitAnalysisVLM/.
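
To make two ideas from the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows (a) one possible way to render numerical gait parameters as text, and (b) the generic CLIP-style scoring commonly used with vision language models, where a video feature is compared with per-class text features by cosine similarity. The function names, the sentence template, the parameter names, the class labels, the temperature, and all numbers are assumptions chosen for illustration; none come from the paper.

```python
import math


def gait_params_to_text(params):
    """Render numerical gait parameters as a sentence (hypothetical template)."""
    parts = [f"{name} is {value:.2f} {unit}" for name, (value, unit) in params.items()]
    return "Gait parameters: " + ", ".join(parts) + "."


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(video_feat, class_text_feats, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities (generic CLIP-style scoring)."""
    names = list(class_text_feats)
    logits = [cosine(video_feat, class_text_feats[n]) / temperature for n in names]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


# Toy values, invented for illustration only.
params = {"walking speed": (0.85, "m/s"), "cadence": (98.0, "steps/min")}
text = gait_params_to_text(params)

# Stand-ins for encoder outputs; a real system would obtain these from the VLM.
video_feat = [1.0, 0.0]
class_text_feats = {"class_a": [0.9, 0.1], "class_b": [0.1, 0.9]}
probs = classify(video_feat, class_text_feats)

print(text)
print(probs)
```

In this sketch the text features are fixed vectors. In a prompt tuning setup they would instead be produced by a text encoder from learnable prompt tokens, which is where class-specific descriptions and numerical texts could enter.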

Published In

Medical Image Computing and Computer Assisted Intervention – MICCAI 2024: 27th International Conference, Marrakesh, Morocco, October 6–10, 2024, Proceedings, Part V
Oct 2024
814 pages
ISBN: 978-3-031-72085-7
DOI: 10.1007/978-3-031-72086-4
Editors: Marius George Linguraru, Qi Dou, Aasa Feragen, Stamatia Giannarou, Ben Glocker, Karim Lekadir, Julia A. Schnabel

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Pathological gait classification
2. MDS-UPDRS Gait score
3. Knowledge-augmented prompt tuning
4. Numeracy for language model
