Abstract
An important goal at Google is to make spoken access ubiquitously available. Achieving ubiquity requires two things: availability (i.e., built into every possible interaction where speech input or output can make sense) and performance (i.e., works so well that the modality adds no friction to the interaction).
This chapter is a case study of the development of Google Search by Voice – a step toward our long-term vision of ubiquitous access. While the integration of speech input into Google search is a significant step toward more ubiquitous access, it has posed many problems in terms of the performance of core speech technologies and the design of effective user interfaces. Work is ongoing and no doubt the problems are far from solved. Nonetheless, we have at the minimum achieved a level of performance showing that usage of voice search is growing rapidly, and that many users do indeed become repeat users.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri. OpenFst: A general and efficient weighted finite-state transducer library. Lecture Notes in Computer Science, 4783:11, 2007.
M. Bacchiani, F. Beaufays, J. Schalkwyk, M. Schuster, and B. Strope. Deploying GOOG-411: Early lessons in data, measurement, and testing. In Proceedings of ICASSP, pp 5260–5263, April 2008.
MJF Gales. Semi-tied full-covariance matrices for hidden Markov models. 1997.
B. Harb, C. Chelba, J. Dean, and G. Ghemawhat. Back-off language model compression. 2009.
H. Hermansky. Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America, 87(4):1738–1752, 1990.
M. Kamvar and S. Baluja. A large scale study of wireless search behavior: Google mobile search. In CHI, pp 701–709, 22–27 April 2006.
S. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Transactions on Acoustics, Speech and Signal Processing, volume 35, pp 400–401, March 1987.
D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah. Boosted MMI for model and feature-space discriminative training. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2008.
R. Sproat, C. Shih, W. Gale, and N. Chang. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, 22(3):377–404, 1996.
C. Van Heerden, J. Schalkwyk, and B. Strope. Language modeling for what-with-where on GOOG-411. 2009.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Schalkwyk, J. et al. (2010). “Your Word is my Command”: Google Search by Voice: A Case Study. In: Neustein, A. (eds) Advances in Speech Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5951-5_4
Download citation
DOI: https://doi.org/10.1007/978-1-4419-5951-5_4
Published:
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5950-8
Online ISBN: 978-1-4419-5951-5
eBook Packages: EngineeringEngineering (R0)