DOI: 10.1145/2818346.2830605

Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits

Published: 09 November 2015

Abstract

Body language plays an important role in learning processes and communication. For example, communication research has produced evidence that mathematical knowledge can be embodied in gestures made by teachers and students. Likewise, body postures and gestures are utilized by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect of multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, rapid advances in hardware and software capabilities are not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used, state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component, based on OpenNI, is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for multimodal feature extraction and the creation of automated scoring models. This paper provides an overview of relevant work in multimodal research on educational tasks, then summarizes our work using multimodal sensors to develop assessments of communication skills, with attention to the use of depth sensors. Specifically, we focus on the task of public speaking assessment using the Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.
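
To make "expressive body language features" concrete, the following is a minimal Python sketch of one such feature, average hand speed, computed from Kinect v1 skeletal frames. The paper's package and its API are not reproduced on this page, so everything below (the function name mean_hand_speed, the joint indices, and the choice of feature) is an illustrative assumption rather than the authors' implementation.

    import numpy as np

    # Assumption: Kinect v1 skeleton layout (20 joints), with the hand joints
    # at indices 7 (HAND_LEFT) and 11 (HAND_RIGHT) as in the Kinect SDK.
    HAND_LEFT, HAND_RIGHT = 7, 11

    def mean_hand_speed(frames, fps=30.0):
        """Average speed of both hands in m/s.

        frames: ndarray of shape (n_frames, n_joints, 3), joint positions
                in meters. fps: sensor frame rate (Kinect streams at 30 fps).
        """
        hands = frames[:, [HAND_LEFT, HAND_RIGHT], :]   # (n, 2, 3)
        disp = np.diff(hands, axis=0)                   # frame-to-frame displacement
        speed = np.linalg.norm(disp, axis=2) * fps      # (n-1, 2), in m/s
        return float(speed.mean())

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        demo = rng.normal(size=(300, 20, 3)) * 0.01     # 10 s of synthetic frames
        print(f"mean hand speed: {mean_hand_speed(demo):.3f} m/s")

Velocity- and energy-style descriptors of this kind are common inputs to automated scoring models such as those the abstract describes; a real pipeline would also smooth the raw skeleton stream, since Kinect joint estimates are noisy.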

    Published In

    ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
    November 2015
    678 pages
ISBN: 978-1-4503-3912-4
DOI: 10.1145/2818346
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 November 2015

    Author Tags

    1. depth sensors
    2. kinect
    3. multimodal learning analytics

    Qualifiers

    • Research-article

    Conference

ICMI '15: International Conference on Multimodal Interaction
November 9-13, 2015
Seattle, Washington, USA

    Acceptance Rates

ICMI '15 paper acceptance rate: 52 of 127 submissions, 41%
Overall acceptance rate: 453 of 1,080 submissions, 42%
