A Multi-modal System to Assess Cognition in Children from their Physical Movements

Published: 22 October 2020
DOI: 10.1145/3382507.3418829


In recent years, computer- and game-based cognitive tests have become popular with advances in mobile technology. However, these tests require very little body movement and do not consider the influence that physical motion has on cognitive development. Our work focuses on assessing cognition in children through their physical movements. To this end, we use an assessment test, "Ball-Drop-to-the-Beat", that is both physically and cognitively demanding: the child is expected to perform specific actions in response to commands. The task is specifically designed to measure attention, response inhibition, and coordination in children. We created a dataset of 25 children performing this test. To automate scoring, we developed a computer vision-based assessment system. The vision system employs an attention-based fusion mechanism that combines multiple modalities, such as optical flow, human poses, and objects in the scene, to predict a child's action. The proposed method outperforms other state-of-the-art approaches, achieving an average accuracy of 89.8 percent in predicting actions and 88.5 percent in predicting rhythm on the Ball-Drop-to-the-Beat dataset.
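The abstract does not give the equations of the fusion mechanism. As a rough illustration only, the Python sketch below shows one common form of attention-based fusion: each modality's feature vector (here standing in for optical flow, pose, and object features) is scored against a query vector with a scaled dot product, the scores are turned into weights with a softmax, and the fused representation is the weighted sum. The function names, the fixed `query` vector, and the toy feature values are hypothetical; in a trained system the features would come from learned encoders and the query would be learned, and the paper's actual mechanism may differ.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    features: Dict[str, List[float]], query: List[float]
) -> Tuple[List[float], Dict[str, float]]:
    """Fuse per-modality feature vectors with scaled dot-product attention.

    features: modality name -> feature vector; all vectors share the query's length.
    query: hypothetical scoring vector (learned in a real model, fixed here).
    Returns the fused vector and the attention weight given to each modality.
    """
    dim = len(query)
    names = list(features)
    for name in names:
        if len(features[name]) != dim:
            raise ValueError(f"{name!r} has length {len(features[name])}, expected {dim}")

    # One relevance score per modality: dot(query, feature) / sqrt(dim).
    scores = [
        sum(q * f for q, f in zip(query, features[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)

    # Fused representation: attention-weighted sum of the modality vectors.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy example with made-up 3-dimensional features for the three modalities.
features = {
    "optical_flow": [0.9, 0.1, 0.4],
    "pose": [0.2, 0.8, 0.5],
    "objects": [0.1, 0.3, 0.9],
}
query = [1.0, 0.5, 0.0]
fused, weights = attention_fuse(features, query)
# optical_flow aligns best with the query, so it receives the largest weight.
print(weights)
print(fused)
```

Because the weights come from a softmax, they sum to 1 and the fused vector is a convex combination of the modality features; a modality whose features align more closely with the query contributes more to the fused representation.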

Supplementary Material

MP4 File (3382507.3418829.mp4)
The video provides an overview of the work, "A Multi-modal System to Assess Cognition in Children from their Physical Movements". The presentation covers the task, the dataset, and the approach used to build the automated system, along with the results. More detailed information can be found in the paper.






    Published In

    ACM Conferences
    ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
    October 2020
    920 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.



    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. attention
    2. cognitive assessment
    3. embodied cognition
    4. human activity recognition (har)
    5. multi-modal fusion
    6. response inhibition
    7. rhythm


    • Research-article


    ICMI '20
    October 25 - 29, 2020
    Virtual Event, Netherlands

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%


    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 28
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 23 Feb 2025


    Cited By

    • (2023) SmartFunction: An Immersive Vr System To Assess Attention Using Embodied Cognition. Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, 485-490. DOI: 10.1145/3594806.3596559. Online publication date: 5-Jul-2023.
    • (2023) Detecting Cognitive Fatigue in Subjects with Traumatic Brain Injury from FMRI Scans Using Self-Supervised Learning. Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, 83-90. DOI: 10.1145/3594806.3594868. Online publication date: 5-Jul-2023.
    • (2023) Remote Operated Human Robot Interactive System using Hand Gestures for Persons with Disabilities. Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, 137-139. DOI: 10.1145/3594806.3594832. Online publication date: 5-Jul-2023.
    • (2023) Multiview child motor development dataset for AI-driven assessment of child development. GigaScience 12. DOI: 10.1093/gigascience/giad039. Online publication date: 27-May-2023.
    • (2023) A Smart Sensor Suit (SSS) to Assess Cognitive and Physical Fatigue with Machine Learning. Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, 120-134. DOI: 10.1007/978-3-031-35741-1_10. Online publication date: 9-Jul-2023.
    • (2022) Self-Supervised Human Activity Representation for Embodied Cognition Assessment. Technologies 10:1, 33. DOI: 10.3390/technologies10010033. Online publication date: 17-Feb-2022.
    • (2022) Light-Weight Seated Posture Guidance System with Machine Learning and Computer Vision. Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, 595-600. DOI: 10.1145/3529190.3535341. Online publication date: 29-Jun-2022.
    • (2022) Automated System to Measure Static Balancing in Children to Assess Executive Function. Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, 569-575. DOI: 10.1145/3529190.3534750. Online publication date: 29-Jun-2022.
    • (2021) Automated system to measure Tandem Gait to assess executive functions in children. Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference, 167-170. DOI: 10.1145/3453892.3453999. Online publication date: 29-Jun-2021.
    • (2021) Self-Supervised Human Activity Recognition by Augmenting Generative Adversarial Networks. Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference, 171-176. DOI: 10.1145/3453892.3453893. Online publication date: 29-Jun-2021.
