DOI: 10.1145/3472749.3474742
Research article · Open access

Voice and Touch Based Error-tolerant Multimodal Text Editing and Correction for Smartphones

Published: 12 October 2021

Abstract

Editing operations such as cut, copy, paste, and correcting errors in typed text are often tedious and challenging to perform on smartphones. In this paper, we present VT, a voice- and touch-based multimodal text editing and correction method for smartphones. To edit text with VT, the user glides over a text fragment with a finger and dictates a command, such as "bold" to change the format of the fragment, or taps inside a text area and speaks a command such as "highlight this paragraph". For text correction, the user taps approximately on the erroneous text fragment and dictates the new content for substitution or insertion. VT combines touch and voice input with linguistic context, such as a language model and phrase similarity, to infer the user's editing intention, allowing it to handle ambiguities and noisy input signals. This is a major advantage over existing error correction methods (e.g., iOS's Voice Control), which require precise cursor control or text selection. Our evaluation shows that VT significantly improves the efficiency of text editing and correction on smartphones over both a touch-only method and iOS's Voice Control. In our user studies, VT reduced text editing time by 30.80% and text correction time by 29.97% compared with the touch-only method, and reduced text editing time by 30.81% and text correction time by 47.96% compared with iOS's Voice Control.

    Published In

    UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology
    October 2021
    1357 pages
    ISBN:9781450386357
    DOI:10.1145/3472749
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Multimodal interaction
    2. smartphones
    3. text correction
    4. text editing
    5. touch input

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    UIST '21

    Acceptance Rates

    Overall Acceptance Rate 561 of 2,567 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 433
    • Downloads (last 6 weeks): 53
    Reflects downloads up to 27 Dec 2024

    Cited By

    • (2024) SwivelTouch: Boosting Touchscreen Input with 3D Finger Rotation Gesture. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(2), 1–30. https://doi.org/10.1145/3659584. Online publication date: 15-May-2024.
    • (2024) Improving Error Correction and Text Editing Using Voice and Mouse Multimodal Interface. International Journal of Human–Computer Interaction, 1–24. https://doi.org/10.1080/10447318.2024.2352932. Online publication date: 22-May-2024.
    • (2023) GeShort: One-Handed Mobile Text Editing and Formatting with Gestural Shortcuts and a Floating Clipboard. Proceedings of the ACM on Human-Computer Interaction 7(MHCI), 1–23. https://doi.org/10.1145/3604259. Online publication date: 13-Sep-2023.
    • (2023) µGeT: Multimodal Eyes-Free Text Selection Technique Combining Touch Interaction and Microgestures. Proceedings of the 25th International Conference on Multimodal Interaction, 594–603. https://doi.org/10.1145/3577190.3614131. Online publication date: 9-Oct-2023.
    • (2023) A Human-Computer Collaborative Editing Tool for Conceptual Diagrams. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–29. https://doi.org/10.1145/3544548.3580676. Online publication date: 19-Apr-2023.
    • (2023) Context Matters: Understanding the Effect of Usage Contexts on Users' Modality Selection in Multimodal Systems. International Journal of Human–Computer Interaction 40(20), 6287–6302. https://doi.org/10.1080/10447318.2023.2250606. Online publication date: 29-Aug-2023.
    • (2023) Arrow2edit: A Technique for Editing Text on Smartphones. Human-Computer Interaction, 416–432. https://doi.org/10.1007/978-3-031-35596-7_27. Online publication date: 23-Jul-2023.
