DOI: 10.1145/2666633.2666640
Research article

Using Multimodal Cues to Analyze MLA'14 Oral Presentation Quality Corpus: Presentation Delivery and Slides Quality

Published: 12 November 2014

Abstract

The ability to create presentation slides and to deliver them effectively is increasingly important, particularly for success in both academic and professional careers. We envision that multimodal sensing and machine learning techniques can be employed to evaluate, and potentially help improve, the quality of both the content and the delivery of public presentations. To this end, we report a study using the Oral Presentation Quality Corpus provided by the 2014 Multimodal Learning Analytics (MLA) Grand Challenge. A set of multimodal features was extracted from slides, speech, posture, hand gestures, and head poses. We also examined the dimensionality of the human scores, which could be concisely represented by two Principal Component (PC) scores: comp1 for delivery skills and comp2 for slides quality. Several machine learning experiments were performed to predict the two PC scores from the multimodal features. Our experiments suggest that multimodal cues can predict human scores on presentation tasks, and that a scoring model combining verbal and visual features can outperform one using a single modality.
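The two-step pipeline described in the abstract (reduce multi-item human rubric scores to two principal components, then train a regressor to predict a component from extracted multimodal features) can be sketched as follows. This is an illustration on synthetic data only, not the paper's actual corpus, rubric, or feature set: the item counts, latent factors, and feature construction below are all hypothetical, and support vector regression stands in for the scoring models the authors compared.

```python
# Minimal sketch of "PCA on human scores, then regression from features".
# All data here is synthetic; nothing is taken from the MLA'14 corpus.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 80                                    # hypothetical number of presentations
delivery = rng.normal(size=n)             # latent "delivery skill"
slides = rng.normal(size=n)               # latent "slides quality"

# Nine hypothetical rubric items: six load on delivery, three on slides.
scores = np.column_stack(
    [delivery + 0.1 * rng.normal(size=n) for _ in range(6)]
    + [slides + 0.1 * rng.normal(size=n) for _ in range(3)]
)

# Step 1: reduce the human scores to two principal components,
# analogous to the paper's comp1 (delivery) and comp2 (slides quality).
pca = PCA(n_components=2)
pc = pca.fit_transform(scores)
explained = pca.explained_variance_ratio_.sum()

# Step 2: predict comp1 from "multimodal features" -- here, noisy proxies
# of the delivery latent -- using support vector regression.
features = np.column_stack(
    [delivery + 0.5 * rng.normal(size=n) for _ in range(6)]
)
model = make_pipeline(StandardScaler(), SVR(C=10.0))
r = cross_val_score(model, features, pc[:, 0], cv=5, scoring="r2").mean()
print(f"variance explained by 2 PCs: {explained:.2f}, cross-val R^2: {r:.2f}")
```

With two strong latent factors behind the rubric items, two components capture nearly all score variance, which is the kind of structure that justifies collapsing many rubric dimensions into two PC targets before modeling.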


Published In

MLA '14: Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge
November 2014, 68 pages
ISBN: 9781450304887
DOI: 10.1145/2666633

    Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. body tracking
    2. educational applications
    3. multimodal corpus
    4. multimodal presentation assessment
    5. public speaking

    Qualifiers

    • Research-article

    Conference

ICMI '14

    Acceptance Rates

MLA '14 paper acceptance rate: 3 of 3 submissions (100%)
Overall acceptance rate: 3 of 3 submissions (100%)

    Article Metrics

• Downloads (last 12 months): 25
• Downloads (last 6 weeks): 4
Reflects downloads up to 15 Oct 2024

Cited By
• (2024) Applications of AI-Enabled Deception Detection Using Video, Audio, and Physiological Data: A Systematic Review. IEEE Access, 12, 135207-135240. DOI: 10.1109/ACCESS.2024.3462825
• (2023) An Interactive Application for University Students to Reduce the Industry-Academia Skill Gap in the Software Engineering Field. 2023 4th International Conference for Emerging Technology (INCET), 1-6, 26 May 2023. DOI: 10.1109/INCET57972.2023.10170591
• (2022) Predicting Presentation Skill of a Speaker Using Automatic Speaker and Audience Measurement. IEEE Transactions on Learning Technologies, 15(3), 350-363, 1 Jun 2022. DOI: 10.1109/TLT.2022.3171601
• (2021) Scaling and Adopting a Multimodal Learning Analytics Application in an Institution-Wide Setting. IEEE Transactions on Learning Technologies, 14(3), 400-414, 1 Jun 2021. DOI: 10.1109/TLT.2021.3100778
• (2020) An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation. Frontiers in Computer Science, 2, 28 Jan 2020. DOI: 10.3389/fcomp.2020.00001
• (2020) Using Physiological Cues to Determine Levels of Anxiety Experienced among Deaf and Hard of Hearing English Language Learners. Companion Publication of the 2020 International Conference on Multimodal Interaction, 72-76, 25 Oct 2020. DOI: 10.1145/3395035.3425259
• (2020) Supporting Instructors to Provide Emotional and Instructional Scaffolding for English Language Learners through Biosensor-based Feedback. Proceedings of the 2020 International Conference on Multimodal Interaction, 733-737, 21 Oct 2020. DOI: 10.1145/3382507.3421159
• (2019) Technologies for automated analysis of co-located, real-life, physical learning spaces. Proceedings of the 9th International Conference on Learning Analytics & Knowledge, 11-20, 4 Mar 2019. DOI: 10.1145/3303772.3303811
• (2019) A Multi-sensor Framework for Personal Presentation Analytics. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(2), 1-21, 5 Jun 2019. DOI: 10.1145/3300941
• (2019) Complex Communication Dynamics: Exploring the Structure of an Academic Talk. Cognitive Science, 43(3), 4 Mar 2019. DOI: 10.1111/cogs.12718
• Show More Cited By
