DOI: 10.1145/2818346.2830605

Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits

Published: 09 November 2015

Abstract

Body language plays an important role in learning processes and communication. For example, communication research has produced evidence that mathematical knowledge can be embodied in gestures made by teachers and students. Likewise, body postures and gestures are utilized by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect of multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, rapid advances in hardware and software capabilities are not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used, state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component, based on OpenNI, is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for multimodal feature extraction and the creation of automated scoring models. This paper provides an overview of relevant work in multimodal research on educational tasks, then summarizes our work using multimodal sensors to develop assessments of communication skills, with attention to the use of depth sensors. Specifically, we focus on the task of public speaking assessment using the Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.
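
To make "expressive body language features" concrete, the following is a minimal Python sketch of one such feature, average hand speed, computed from Kinect v1 skeletal frames. The paper's package and its API are not reproduced on this page, so everything below (the function name mean_hand_speed, the joint indices, and the choice of feature) is an illustrative assumption rather than the authors' implementation.

    import numpy as np

    # Assumption: Kinect v1 skeleton layout (20 joints), with the hand joints
    # at indices 7 (HAND_LEFT) and 11 (HAND_RIGHT) as in the Kinect SDK.
    HAND_LEFT, HAND_RIGHT = 7, 11

    def mean_hand_speed(frames, fps=30.0):
        """Average speed of both hands in m/s.

        frames: ndarray of shape (n_frames, n_joints, 3), joint positions
                in meters. fps: sensor frame rate (Kinect streams at 30 fps).
        """
        hands = frames[:, [HAND_LEFT, HAND_RIGHT], :]   # (n, 2, 3)
        disp = np.diff(hands, axis=0)                   # frame-to-frame displacement
        speed = np.linalg.norm(disp, axis=2) * fps      # (n-1, 2), in m/s
        return float(speed.mean())

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        demo = rng.normal(size=(300, 20, 3)) * 0.01     # 10 s of synthetic frames
        print(f"mean hand speed: {mean_hand_speed(demo):.3f} m/s")

Velocity- and energy-style descriptors of this kind are common inputs to automated scoring models such as those the abstract describes; a real pipeline would also smooth the raw skeleton stream, since Kinect joint estimates are noisy.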

    Published In

    ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
    November 2015
    678 pages
ISBN: 978-1-4503-3912-4
DOI: 10.1145/2818346
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 November 2015

    Author Tags

    1. depth sensors
    2. kinect
    3. multimodal learning analytics

    Qualifiers

    • Research-article

    Conference

ICMI '15: International Conference on Multimodal Interaction
November 9-13, 2015
Seattle, Washington, USA

    Acceptance Rates

ICMI '15 paper acceptance rate: 52 of 127 submissions, 41%
Overall acceptance rate: 453 of 1,080 submissions, 42%
