Whilst search engines are making great strides towards gold-standard, error-free voice search recognition, a number of challenges remain. We look at some of them here and consider how we might adapt and optimise for them. Thanks to Enrique Alfonseca, the Google Conversational Research Team and ESSIR Barcelona for the great learnings and education.
Voice Search Challenges For Search and Information Retrieval and SEO
1. #SMX #32A @dawnieando
…And how you can overcome some of them
SOME CURRENT
CHALLENGES WITH
VOICE &
CONVERSATIONAL
SEARCH
2. #SMX #32A @dawnieando
Who is Dawn Anderson?
• From rainy Manchester, UK
• A bit of a ‘pracademic’ (hybrid of academic and
practitioner)
• International SEO consultant
• Move It Marketing
• I lecture on search and digital marketing strategy
• But I mostly ‘do’ SEO
• 11 years in SEO now
• Googlebot hunter ;P ;P
• Consulting with brands, in-house teams and start-ups
• My pomeranian Bert is often featured in tweets
and posts ;P ;P
9. #SMX #32A @dawnieando
What does a good result look like?
SPOILER
• Meets informational needs
• In short answers (as applicable)
• Or the answer is at the beginning
of the paragraph or result
• Grammatically correct
(syntactically well-formed)
• No spelling mistakes
• With accurate pronunciation
11. #SMX #32A @dawnieando
• [Skip]
• [play mumford and sons reminder] - Action Response: Set a
Reminder. Time: Please specify a time. Fails to Meet. The user
wanted to play a specific song, and the device instead set a
reminder. No users would be satisfied with this response.
Bad Result - Confusion between ‘actions’ & ‘queries’
12. #SMX #32A @dawnieando
Who knows how many times Google Home cannot help?
• Only Google knows
• But they aren’t
sharing
• Search engine
embarrassment?
17. #SMX #32A @dawnieando
§ One shot at the answer
§ Berrypicking ‘evolving search’ may
not apply so easily
§ Does not benefit from query
refinement and user feedback as
desktop SERPs do
– May be why there are still many
unanswered queries
Better Ranking Is Needed As The User Focuses On A
Single Result
18. #SMX #32A @dawnieando
Accurate spelling and grammar matter a lot in voice search
Query diversity ‘clusters’
in keyboard ‘evolving’
user search
19. #SMX #32A @dawnieando
Query refinement (via
user feedback) is not
possible with voice
search
20. #SMX #32A @dawnieando
#SMXInsights
§ No query expansion or relaxation
– Precision more important than recall
– Because there can be only one (or 2)
21. #SMX #32A @dawnieando
Precision > Recall in voice
search
Accuracy > Diversity
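To make the precision versus recall trade-off concrete, here is a minimal Python sketch. The document IDs and relevance judgments are invented for illustration; only the two standard definitions are assumed. A ten-result desktop page can score full recall with modest precision, whereas a single spoken answer lives or dies on precision.

```python
def precision_recall(returned, relevant):
    """Precision: share of returned results that are relevant.
    Recall: share of all relevant results that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical relevance judgments for one query.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A ten-result desktop page can afford some irrelevant results.
desktop_results = ["d1", "d7", "d2", "d9", "d3", "d8", "d4", "d5", "d6", "d10"]
# A voice assistant reads out a single answer.
voice_result = ["d1"]

desktop_p, desktop_r = precision_recall(desktop_results, relevant_docs)
voice_p, voice_r = precision_recall(voice_result, relevant_docs)

print(f"desktop: precision={desktop_p:.2f} recall={desktop_r:.2f}")  # 0.40 / 1.00
print(f"voice:   precision={voice_p:.2f} recall={voice_r:.2f}")      # 1.00 / 0.25
```

With one result, recall is necessarily low, so the only thing left to optimise is whether that one answer is right.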
23. #SMX #32A @dawnieando
“There is no re-ordering in
voice search – no
paraphrasing – just
extraction and
compression.”
(Alfonseca, 2017,
ESSIR2017)
24. #SMX #32A @dawnieando
Example of classic IR teaching query interpretation system
25. #SMX #32A @dawnieando
§ No paraphrasing with conversational search
– Paraphrasing likely needs full understanding
of query & intent to reformulate
26. #SMX #32A @dawnieando
• The knowledge base is checked first
• Then the web is checked to ‘fill in gaps’
• Taking from the messy unstructured
data of web pages
Knowledge base first,
web text second
27. #SMX #32A @dawnieando
• Structured data (tables and data stored in databases)
• Semi-structured data (XML, JSON, meta headings [h1-h6])
• Semantically enriched data (marked-up schema, entities)
• Unstructured data (normal web text copy)
• The web is messy and noisy
• Unstructured data is difficult to make sense of (no topical
strength)
The different types of data & the problem with
unstructured data
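As an illustration of the "semantically enriched" category, the sketch below uses Python to emit schema.org markup as JSON-LD. The question and answer text are invented, and the choice of the FAQPage type is an assumption made for the example; nothing here implies that this particular markup is what a voice assistant reads.

```python
import json

# Hypothetical question and answer, for illustration only.
faq_markup = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is anaphoric resolution?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Anaphoric resolution works out which earlier "
                        "word a pronoun refers to.",
            },
        }
    ],
}

# The resulting string would sit inside a script tag of type application/ld+json.
json_ld = json.dumps(faq_markup, indent=2)
print(json_ld)
```

The same facts written as ordinary body copy would be unstructured data; the markup simply states explicitly which string is the question and which is the answer.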
29. #SMX #32A @dawnieando
• Adds meaning
• Disambiguates
• Adds structure
• Helps with context
• The web is noisy
• Unstructured data is voluminous
Structured Data is very,
very useful here
31. #SMX #32A @dawnieando
§ Tables are problematic for voice search
– Support tabular data with well formed
paragraphs and sentences
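One way to support a table with well-formed sentences is to generate them from the same rows. The following is a minimal Python sketch; the opening-hours data, the `rows_to_sentences` helper and the sentence template are all hypothetical.

```python
def rows_to_sentences(rows, template):
    """Turn each table row (a dict) into one complete sentence."""
    return [template.format(**row) for row in rows]

# Hypothetical table content, for illustration only.
opening_hours = [
    {"day": "Monday", "opens": "9 am", "closes": "5 pm"},
    {"day": "Saturday", "opens": "10 am", "closes": "2 pm"},
]

sentences = rows_to_sentences(
    opening_hours,
    "On {day} the shop opens at {opens} and closes at {closes}.",
)
for sentence in sentences:
    print(sentence)
# On Monday the shop opens at 9 am and closes at 5 pm.
# On Saturday the shop opens at 10 am and closes at 2 pm.
```

Each sentence stands on its own, so it can be extracted and read aloud without the column headers that give a table cell its meaning.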
32. #SMX #32A @dawnieando
• What may be good for featured
snippets (tabular data) may not be
good for voice search
• You may need additional strategy
for voice search & tabular data in
featured snippets
• Pete Myers from Moz found only
30% of voice search results on Google
Home came from tables in featured
snippets (Image credit: Pete Myers,
Moz)
Tables are currently problematic
33. #SMX #32A @dawnieando
CONFIRMED BY:
• Google’s Enrique Alfonseca (2017)
• Microsoft’s Harry Shum (2018)
• Conversational contextual search is difficult
Multi-turn conversations are still challenging
34. #SMX #32A @dawnieando
• (“anaphoric” is referring
upward to previously
mentioned words)
• Resolution means trying to
understand what it was
which is referred to in those
previously mentioned words
Anaphoric
Resolution
35. #SMX #32A @dawnieando
• (“cataphoric” is referring
downward to subsequent
words)
• Resolution means trying to
understand what it is which is
referred to in those
subsequent words
Cataphoric
Resolution
36. #SMX #32A @dawnieando
Likely relates to anaphoric (likely) & cataphoric (far less likely)
resolution
Pronouns still seem
problematic
47. #SMX #32A @dawnieando
(Each row: word pair, then mean human similarity rating out of 10)
money cash 9.08
money currency 9.04
football soccer 9.03
magician wizard 9.02
gem jewel 8.96
car automobile 8.94
boy lad 8.83
furnace stove 8.79
Maradona football 8.62
king queen 8.58
money bank 8.5
Jerusalem Israel 8.46
vodka gin 8.46
planet star 8.45
money dollar 8.42
vodka brandy 8.13
bank money 8.12
physics proton 8.12
planet galaxy 8.11
stock market 8.08
psychology psychiatry 8.08
planet moon 8.08
planet constellation 8.06
planet sun 8.02
tiger feline 8
planet astronomer 7.94
movie theater 7.92
planet space 7.92
baby mother 7.85
wood forest 7.73
money deposit 7.73
psychology mind 7.69
Jerusalem Palestinian 7.65
Arafat terror 7.65
computer keyboard 7.62
computer internet 7.58
money property 7.57
tennis racket 7.56
psychology cognition 7.48
book paper 7.46
book library 7.46
media radio 7.42
psychology depression 7.42
jaguar cat 7.42
movie star 7.38
bird crane 7.38
tiger cat 7.35
physics chemistry 7.35
money possession 7.29
jaguar car 7.27
cup drink 7.25
psychology health 7.23
bird cock 7.1
company stock 7.08
tiger carnivore 7.08
WordSimilarity353 Test Collection
48. #SMX #32A @dawnieando
§ Secondary or 3-way strategy may be
needed
– Add a TL;DR
– Or an executive summary
– Or a Q&A-based table of contents
– Or a ‘Short Answer’ then ‘Longer Answer’
49. #SMX #32A @dawnieando
§ Mine forums, customer service, chat &
emails
– Build word clouds to provide answers to
topics which matter to your audience
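The counting step behind a word cloud can be sketched in a few lines of Python. The messages and the tiny stop-word list below are invented for illustration; a real project would use exported forum, chat or email text and a fuller stop-word list.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "is", "my", "to", "i", "do", "how",
              "can", "what", "of", "for", "it", "and"}

def top_terms(messages, n=5):
    """Count the most frequent non-stop-words across all messages."""
    counts = Counter()
    for message in messages:
        words = re.findall(r"[a-z']+", message.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Hypothetical customer messages.
messages = [
    "How do I reset my password?",
    "Password reset email never arrived",
    "Can I change my delivery address?",
    "What is the delivery time for my order?",
]

terms = top_terms(messages, n=3)
print(terms)  # 'reset', 'password' and 'delivery' each appear twice
```

The terms that surface most often are candidates for the questions your content should answer directly.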
50. #SMX #32A @dawnieando
Soundex, Metaphone or
similar ‘misspelling’
algorithms may not apply
to voice search
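For readers unfamiliar with Soundex, here is a minimal Python sketch of the common American Soundex rules (keep the first letter, map consonants to digits, collapse repeated codes, pad to four characters). It is an illustrative implementation, not the code of any search engine. It shows that the algorithm is keyed to the first letter and the spelling, so words that sound identical can still receive different codes.

```python
# Consonant groups used by American Soundex.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Return the four-character American Soundex code for a word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of identical codes
        code = CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel (empty code) ends the current run
    return (encoded + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # R163 R163: spelling variants match
print(soundex("write"), soundex("right"))    # W630 R230: same sound, different codes
```

Because the first letter is kept verbatim, homophones such as "write" and "right" never match, which is one plausible reason such spelling-oriented algorithms may transfer poorly to spoken queries.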
52. #SMX #32A @dawnieando
• WordSimilarity353 Test Collection - http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
• Miller, G.A. and Charles, W.G., 1991. Contextual correlates of semantic similarity. Language and
cognitive processes, 6(1), pp.1-28.
• Linkedin Harry Shum. 2018. From Search to Research. [ONLINE] Available
at: https://www.linkedin.com/pulse/from-search-research-harry-shum/. [Accessed 22 February 2018].
• Coreference Resolution - The Stanford Natural Language Processing Group. 2018. The Stanford Natural
Language Processing Group. [ONLINE] Available at: https://nlp.stanford.edu/projects/coref.shtml.
[Accessed 19 February 2018].
Sources & References
54. #SMX #32A @dawnieando
EXAMPLES
• Look at Wikipedia Redirects
• Alternative names redirect to the most appropriate article
title (for example, Edison Arantes do Nascimento redirects
to Pelé) (Wikipedia)
• SPARQL queries against DBpedia identify many variations
• (Beethoven example)
• https://dbpedia.org/sparql
• https://en.wikipedia.org/wiki/Wikipedia:Redirect
Terms can have many ‘surface forms’
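A sketch of how the Beethoven surface forms might be looked up: the Python below only builds a SPARQL query and a request URL for the DBpedia endpoint named on the slide, without sending it. The use of the `dbo:wikiPageRedirects` property, the resource IRI and the `format` parameter are assumptions about how DBpedia exposes Wikipedia redirects; check them against the live endpoint before relying on them.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"  # endpoint listed on the slide

def redirect_query(resource_iri, limit=50):
    """Build a SPARQL query listing pages that redirect to resource_iri.

    Assumes DBpedia records Wikipedia redirects with dbo:wikiPageRedirects.
    """
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?surface_form WHERE {\n"
        f"  ?surface_form dbo:wikiPageRedirects <{resource_iri}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )

# Assumed resource IRI for the Beethoven example.
query = redirect_query("http://dbpedia.org/resource/Ludwig_van_Beethoven")

# Assumed query-string parameters for requesting JSON results.
request_url = ENDPOINT + "?" + urlencode(
    {"query": query, "format": "application/sparql-results+json"}
)

print(query)
print(request_url)
```

Each redirecting page title returned is a candidate surface form, that is, another string users may type or say when they mean the same entity.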
55. #SMX #32A @dawnieando
“It is concluded… the more often two words
can be substituted into the same contexts the
more similar in meaning they are judged to
be.”
(Miller & Charles, 1991)
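The idea in this quote can be sketched with simple context counts. In the Python below, the three-sentence corpus is invented, and cosine similarity over neighbouring-word counts is a common textbook stand-in, not the method used by Miller and Charles or by any search engine.

```python
from collections import Counter
from math import sqrt

def context_vector(target, sentences, window=2):
    """Count the words appearing within `window` tokens of the target word."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus.
corpus = [
    "she parked the car outside the house",
    "she parked the automobile outside the house",
    "he drank the gin with tonic",
]

car = context_vector("car", corpus)
automobile = context_vector("automobile", corpus)
gin = context_vector("gin", corpus)

print(round(cosine(car, automobile), 2))  # 1.0: identical contexts
print(round(cosine(car, gin), 2))         # 0.41: only "the" is shared
```

Words that can be swapped into the same contexts ("car" and "automobile") score as more similar than words that share little context ("car" and "gin").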
56. #SMX #32A @dawnieando
Difficult to deal with
‘query ambiguity’
Result ‘diversity’
assists with query
ambiguity in desktop
or non-voice results
57. #SMX #32A @dawnieando
Page Length
‘Normalization’ may not
apply as with traditional
results??
(Me musing)
58. #SMX #32A @dawnieando
Long numbers should be rounded
§ 60,999,888.999999999
– It reads terribly
– Needs to be rounded
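A minimal Python sketch of rounding a long number into something that reads well aloud. The two-significant-figure default, the unit names and the "about" prefix are illustrative choices, not a documented rule of any voice assistant.

```python
from math import floor, log10

UNITS = ((1_000_000_000, "billion"), (1_000_000, "million"), (1_000, "thousand"))

def round_significant(number, digits=2):
    """Round to the given number of significant figures."""
    if number == 0:
        return 0
    magnitude = floor(log10(abs(number)))
    return round(number, digits - 1 - magnitude)

def speakable(number, digits=2):
    """Return a short, rounded phrase for a number."""
    rounded = round_significant(number, digits)
    # Say "about" only when rounding actually changed the value.
    prefix = "about " if rounded != number else ""
    for threshold, name in UNITS:
        if abs(rounded) >= threshold:
            return f"{prefix}{rounded / threshold:g} {name}"
    return f"{prefix}{rounded:g}"

print(speakable(60_999_888.999999999))  # about 61 million
print(speakable(2_000_000))             # 2 million
```

Publishing the rounded figure in the copy itself, alongside the exact one where precision matters, gives an extraction system a version that is pleasant to hear.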
59. #SMX #32A @dawnieando
• First checks whether the next ‘turn’ of question relates to
the previous question
• Using LSTMs (Long Short-Term Memory)
• Bi-directional context embedding
• Query and its context are both used as input
Conversational Context & Microsoft
60. #SMX #32A @dawnieando
Katja Filippova – Google Research Team