DOI: 10.1145/2666633.2666640
Research article

Using Multimodal Cues to Analyze MLA'14 Oral Presentation Quality Corpus: Presentation Delivery and Slides Quality

Published: 12 November 2014

Abstract

The ability to create presentation slides and to deliver them effectively is increasingly important, particularly for success in both academic and professional careers. We envision that multimodal sensing and machine learning techniques can be employed to evaluate, and potentially help improve, the quality of both the content and the delivery of public presentations. To this end, we report a study using the Oral Presentation Quality Corpus provided by the 2014 Multimodal Learning Analytics (MLA) Grand Challenge. A set of multimodal features was extracted from slides, speech, posture, hand gestures, and head poses. We also examined the dimensionality of the human scores, which could be concisely represented by two Principal Component (PC) scores: comp1 for delivery skills and comp2 for slides quality. Several machine learning experiments were performed to predict the two PC scores from the multimodal features. Our experiments suggest that multimodal cues can predict human scores on presentation tasks, and that a scoring model combining verbal and visual features can outperform one using a single modality.
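The two-step pipeline described in the abstract (reduce multi-item human rubric scores to two principal components, then train a regressor to predict a component from extracted multimodal features) can be sketched as follows. This is an illustration on synthetic data only, not the paper's actual corpus, rubric, or feature set: the item counts, latent factors, and feature construction below are all hypothetical, and support vector regression stands in for the scoring models the authors compared.

```python
# Minimal sketch of "PCA on human scores, then regression from features".
# All data here is synthetic; nothing is taken from the MLA'14 corpus.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 80                                    # hypothetical number of presentations
delivery = rng.normal(size=n)             # latent "delivery skill"
slides = rng.normal(size=n)               # latent "slides quality"

# Nine hypothetical rubric items: six load on delivery, three on slides.
scores = np.column_stack(
    [delivery + 0.1 * rng.normal(size=n) for _ in range(6)]
    + [slides + 0.1 * rng.normal(size=n) for _ in range(3)]
)

# Step 1: reduce the human scores to two principal components,
# analogous to the paper's comp1 (delivery) and comp2 (slides quality).
pca = PCA(n_components=2)
pc = pca.fit_transform(scores)
explained = pca.explained_variance_ratio_.sum()

# Step 2: predict comp1 from "multimodal features" -- here, noisy proxies
# of the delivery latent -- using support vector regression.
features = np.column_stack(
    [delivery + 0.5 * rng.normal(size=n) for _ in range(6)]
)
model = make_pipeline(StandardScaler(), SVR(C=10.0))
r = cross_val_score(model, features, pc[:, 0], cv=5, scoring="r2").mean()
print(f"variance explained by 2 PCs: {explained:.2f}, cross-val R^2: {r:.2f}")
```

With two strong latent factors behind the rubric items, two components capture nearly all score variance, which is the kind of structure that justifies collapsing many rubric dimensions into two PC targets before modeling.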


Published In

MLA '14: Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge
November 2014, 68 pages
ISBN: 9781450304887
DOI: 10.1145/2666633

    Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. body tracking
    2. educational applications
    3. multimodal corpus
    4. multimodal presentation assessment
    5. public speaking

    Qualifiers

    • Research-article

    Conference

ICMI '14

    Acceptance Rates

MLA '14 paper acceptance rate: 3 of 3 submissions (100%)
Overall acceptance rate: 3 of 3 submissions (100%)

    Article Metrics

• Downloads (last 12 months): 25
• Downloads (last 6 weeks): 4
Reflects downloads up to 15 Oct 2024

Cited By
• (2024) Applications of AI-Enabled Deception Detection Using Video, Audio, and Physiological Data: A Systematic Review. IEEE Access, 12, 135207-135240. DOI: 10.1109/ACCESS.2024.3462825
• (2023) An Interactive Application for University Students to Reduce the Industry-Academia Skill Gap in the Software Engineering Field. 2023 4th International Conference for Emerging Technology (INCET), 1-6, 26 May 2023. DOI: 10.1109/INCET57972.2023.10170591
• (2022) Predicting Presentation Skill of a Speaker Using Automatic Speaker and Audience Measurement. IEEE Transactions on Learning Technologies, 15(3), 350-363, 1 Jun 2022. DOI: 10.1109/TLT.2022.3171601
• (2021) Scaling and Adopting a Multimodal Learning Analytics Application in an Institution-Wide Setting. IEEE Transactions on Learning Technologies, 14(3), 400-414, 1 Jun 2021. DOI: 10.1109/TLT.2021.3100778
• (2020) An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation. Frontiers in Computer Science, 2, 28 Jan 2020. DOI: 10.3389/fcomp.2020.00001
• (2020) Using Physiological Cues to Determine Levels of Anxiety Experienced among Deaf and Hard of Hearing English Language Learners. Companion Publication of the 2020 International Conference on Multimodal Interaction, 72-76, 25 Oct 2020. DOI: 10.1145/3395035.3425259
• (2020) Supporting Instructors to Provide Emotional and Instructional Scaffolding for English Language Learners through Biosensor-based Feedback. Proceedings of the 2020 International Conference on Multimodal Interaction, 733-737, 21 Oct 2020. DOI: 10.1145/3382507.3421159
• (2019) Technologies for automated analysis of co-located, real-life, physical learning spaces. Proceedings of the 9th International Conference on Learning Analytics & Knowledge, 11-20, 4 Mar 2019. DOI: 10.1145/3303772.3303811
• (2019) A Multi-sensor Framework for Personal Presentation Analytics. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(2), 1-21, 5 Jun 2019. DOI: 10.1145/3300941
• (2019) Complex Communication Dynamics: Exploring the Structure of an Academic Talk. Cognitive Science, 43(3), 4 Mar 2019. DOI: 10.1111/cogs.12718
• Show More Cited By
