MEASURING USER
ENGAGEMENT
MOUNIA LALMAS, YAHOO! LABS
HEATHER O’BRIEN, UNIVERSITY OF BRITISH COLUMBIA
ELAD YOM-TOV, MICROSOFT RESEARCH
© Lalmas, O'Brien & Yom-Tov 1
WHY IS IT IMPORTANT TO ENGAGE USERS?
o In today's wired world, users have enhanced expectations about
their interactions with technology
… resulting in increased competition amongst the purveyors
and designers of interactive systems.
o In addition to utilitarian factors, such as usability, we must
consider the hedonic and experiential factors of interacting with
technology, such as fun, fulfillment, play, and user engagement.
o In order to make engaging systems, we need to understand what
user engagement is and how to measure it.
2
WHY IS IT IMPORTANT TO MEASURE AND
INTERPRET USER ENGAGEMENT WELL?
CTR
… for example
3
OUTLINE
o Introduction and Scope
o Part I - Foundations
1. Approaches based on self-report measures
2. Approaches based on web analytics
3. Approaches based on physiological measures
o Part II – Advanced Aspects
1. Measuring user engagement in mobile information searching
2. Networked user engagement
3. Combining different approaches
o Conclusions
o Bibliography
4
WHO WE ARE
o Mounia Lalmas, Visiting Principal Scientist, Yahoo! Labs
• Research interest: user engagement, social media, search
• Blog: http://labtomarket.wordpress.com
o Heather O'Brien, Assistant Professor, iSchool,
University of British Columbia
• Research interests: theories of user engagement; self-
report and qualitative methods of evaluating user
engagement
• Website: http://faculty.arts.ubc.ca/hobrien/
o Elad Yom-Tov, Senior Researcher, Microsoft Research
• Research interests: learning from user behavior about
actions in the physical world
• Website: http://research.microsoft.com/en-us/people/eladyt/
5
INTRODUCTION
AND SCOPE
MEASURING USER ENGAGEMENT
6
ENGAGEMENT IS ON EVERYONE'S MIND
http://thenextweb.com/asia/2013/05/03/kakao-talk-rolls-out-plus-friend-home-a-revamped-platform-to-connect-users-with-their-favorite-brands/
http://socialbarrel.com/70-percent-of-brand-engagement-on-pinterest-come-from-users/51032/
http://iactionable.com/user-engagement/
http://www.cio.com.au/article/459294/heart_foundation_uses_gamification_drive_user_engagement/
http://www.localgov.co.uk/index.cfm?method=news.detail&id=109512
http://www.trefis.com/stock/lnkd/articles/179410/linkedin-makes-a-90-million-bet-on-pulse-to-help-drive-user-engagement/2013-04-15
7
WHAT IS USER ENGAGEMENT (UE)? (I)
o “The state of mind that we must attain in order to enjoy a
representation of an action” so that we may experience
computer worlds “directly, without mediation or distraction”
(Laurel, 1993, pp. 112-113, 116).
o "Engagement is a user's response to an interaction that
gains, maintains, and encourages their attention, particularly
when they are intrinsically motivated" (Jacques, 1996, p. 103).
o A quality of user experience that depends on the aesthetic
appeal, novelty, and usability of the system, the ability of the
user to attend to and become involved in the experience,
and the user's overall evaluation of the experience.
Engagement depends on the depth of participation the user
is able to achieve with respect to each experiential attribute
(O'Brien & Toms, 2008).
o “…explain[s] how and why applications attract people to
use them” (Sutcliffe, 2010, p. 3).
8
WHAT IS UE? (II)
o User engagement is a quality of the user experience
that emphasizes the positive aspects of interaction
– in particular the fact of being captivated by the
technology (Attfield et al, 2011).
The emotional, cognitive and behavioural connection
that exists, at any point in time and over time, between a
user and a technological resource:
• user feelings: happy, sad, excited, …
• user interactions: click, read, comment, recommend, buy, …
• user mental states: involved, lost, concentrated, …
9
INCREASED EMPHASIS ON MEASURING UE
http://www.cbc.ca/news/health/story/2012/12/20/inside-your-brain-neuromarketing.html
10
TRACKING USER BEHAVIOR
http://www.google.ca/analytics/index.html
11
HOW DO WE CAPTURE USER ENGAGEMENT?
http://www.businessweek.com/articles/2012-10-12/why-measuring-user-engagement-is-harder-than-you-think
12
WHY IS MEASURING UE IMPORTANT?
o User engagement is a complex construct
o Various approaches have been proposed for measuring
engagement, but…
• Not enough emphasis on reliability and validity of
individual measures, or triangulation of various
approaches.
o Standardization of what user engagement is and how
to measure it will benefit research, design, and users.
13
CONSIDERATIONS IN THE MEASUREMENT
OF USER ENGAGEMENT
o Short term (within session) and long term (across
multiple sessions)
o Laboratory vs. field studies
o Subjective vs. objective measurement
o Large scale (e.g., dwell time of 100,000 people) vs.
small scale (gaze patterns of 10 people)
o UE as process vs. as product
One is not better than other; it depends on what is the aim.
14
SOME CAVEATS (I)
o This tutorial assumes that web applications are "properly designed"
• We do not look into how to design good web sites (although some
user engagement measurements may inform an enhanced design).
o This tutorial is based on the "published research" literature
• We do not know how each individual company and organization
measures user engagement (although we can guess some common
baselines).
o This tutorial focuses on web applications that users "choose" to
engage with
• A web tool that has to be used, e.g., for work purposes, is totally
different (users have no choice).
o This tutorial is not an "exhaustive" account of all existing work
• We focus on work that we came across and that has influenced us; if we
have missed something important, let us know.
15
SOME CAVEATS (II)
o This tutorial focuses on web applications that are widely used by
"anybody" on a "large scale"
• User engagement in the game industry or in education has different
characteristics.
o This tutorial does not focus on the effect of advertisements on
user engagement
• We assume that web applications that display ads do so in a
"normal" way, so as not to annoy or frustrate users.
o This tutorial looks at user engagement at the web application "level"
• Although we use examples and may refer to specific sites or types of
applications, we do not focus on any particular application.
o This tutorial is not about "how" to influence user engagement
16
OUTLINE
o Introduction and Scope
o Part I - Foundations
1. Approaches based on self-report measures
2. Approaches based on web analytics
3. Approaches based on physiological measures
o Part II – Advanced Aspects
1. Measuring user engagement in mobile information searching
2. Networked user engagement
3. Combining different approaches
o Conclusions
o Bibliography
17
PART 1:
FOUNDATIONS
MEASURING USER ENGAGEMENT
18
CHARACTERISTICS OF USER ENGAGEMENT (I)
Focused attention (Webster & Ho, 1997; O'Brien, 2008)
• Users must be focused to be engaged
• Distortions in the subjective perception of time are used to measure it
Positive affect (O'Brien & Toms, 2008)
• Emotions experienced by the user are intrinsically motivating
• An initial affective "hook" can induce a desire for exploration,
active discovery or participation
Aesthetics (Jacques et al, 1995; O'Brien, 2008)
• Sensory, visual appeal of the interface stimulates the user &
promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)
Endurability (Read, MacFarlane, & Casey, 2002; O'Brien, 2008)
• People remember enjoyable, useful, engaging experiences and
want to repeat them
• Reflected in e.g. the propensity of users to recommend an
experience/a site/a product
19
CHARACTERISTICS OF USER ENGAGEMENT (II)
Novelty (Webster & Ho, 1997; O'Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeal to users' curiosity; encourage inquisitive behavior and
promote repeated engagement
Richness and control (Jacques et al, 1995; Webster & Ho, 1997)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve
this growth potential
Reputation, trust and expectation (Attfield et al, 2011)
• Trust is a necessary condition for user engagement
• An implicit contract among people and entities which is more
than technological
Motivation, interests, incentives, and benefits (Jacques et al., 1995;
O'Brien & Toms, 2008)
• Difficulties in setting up "laboratory" style experiments
• Why should users engage?
20
FORRESTER RESEARCH – THE FOUR I'S
Involvement
• Presence of a user
• Measured by e.g. number of visitors, time spent
Interaction
• Action of a user
• Measured by e.g. CTR, online transactions, uploaded photos or videos
Intimacy
• Affection or aversion of a user
• Measured by e.g. satisfaction ratings, sentiment analysis in blogs,
comments, surveys, questionnaires
Influence
• Likelihood a user advocates
• Measured by e.g. forwarded content, invitations to join
(Forrester Research, June 2008) 21
FLOW: THE THEORY OF OPTIMAL EXPERIENCE
o What is "Flow"?
the state in which people are so involved
in an activity that nothing else seems to
matter; the experience itself is so
enjoyable that people will do it even at
great cost, for the sheer sake of doing it
(Csikszentmihalyi, 1990, p. 4).
o Engagement has been called “flow without user
control” and “a subset of flow”
(Webster & Ahuja, 2004, p. 8)
22
ATTRIBUTES OF FLOW
Enjoyment , Focused attention, Absorption, Time perception,
Clear goals and feedback, Control
(Cskiszentmihalyi, 1990)
FLOW IN HUMAN COMPUTER INTERACTION (HCI)
• The “PAT” – Person, Artefact, Task Model
(Finneran & Zhang, 2003)
• Attributes and predictors of flow with work-based systems
(Webster, Trevino & Ryan, 1993)
• Relationships between flow and the tasks being performed
• Ghani & Deshpande, 1994: work tasks
• Pace, 2004: directed and exploratory search tasks
23
RELEVANCE OF FLOW TO ENGAGEMENT
Flow vs. Engagement:
• Flow: feedback from an activity; control during an interaction;
appropriate levels of challenge → Engagement: perceived usability
vital for engagement to be sustained
• Flow: focused attention → Engagement: complete absorption not
necessary; getting "sidetracked" may be acceptable and engaging
• Flow: intrinsic motivation → Engagement: may be extrinsic; may be
more fruitful to explore motivations as utilitarian and hedonic
• Flow: goal-directed behaviour → Engagement: have fun, have an
experience; see where the road takes me
• Flow: emphasis on the individual and task variables → Engagement:
personal and task relevance important, but characteristics of system
and content precipitate engagement
(O'Brien, 2008)
24
IN THE GAME INDUSTRY
Engagement – Engrossment – Total immersion
(Brown & Cairns, 2004)
(Gow et al, 2010)
… not covered in this tutorial … but we should be aware of this line of work. 25
MEASURING USER ENGAGEMENT
Measures and their characteristics:
• Self-reported engagement — questionnaire, interview, report,
product reaction cards, think-aloud.
Characteristics: subjective; short- and long-term; lab and field;
small-scale; product outcome.
• Cognitive engagement — task-based methods (time spent, follow-on
task), neurological measures (e.g. EEG), physiological measures
(e.g. eye tracking, mouse-tracking).
Characteristics: objective; short-term; lab and field; small-scale
and large-scale; process outcome.
• Interaction engagement — web analytics metrics + models.
Characteristics: objective; short- and long-term; field; large-scale;
process.
26
MEASURES
Subjective perception of time (Baldauf, Burgarda & Wittmann, 2009)
• Ask a user to make some estimation of the passage of time during
an activity
Physiological measures
• Involuntary body responses
• Gaze behavior, mouse gestures, biometrics (e.g., skin conductance,
body temperature, blood volume pulse), facial expression analysis
Follow-on task performance (Jennett et al, 2008)
• How well somebody performs on a task immediately following a
period of engaged interaction
Online behaviour
• An estimate of the degree and depth of visitor interaction against a
clearly defined set of goals
• Based on web analytics (e.g. click-through rate, comments posted)
Search (evaluation)
• Relates system effectiveness and user satisfaction
• Designing user models is an important and active research area
… a bit more about them
27
OUTLINE
o Introduction and Scope
o Part I - Foundations
1. Approaches based on self-report measures
2. Approaches based on web analytics
3. Approaches based on physiological measures
o Part II – Advanced Aspects
1. Measuring user engagement in mobile information searching
2. Networked user engagement
3. Combining different approaches
o Conclusions
o Bibliography
28
PART 1:
FOUNDATIONS
APPROACHES BASED ON SELF-REPORT MEASURES
MEASURING USER ENGAGEMENT
29
INTRODUCTION TO SELF-REPORT MEASURES
o What are self-report measures?
• A type of method commonly used in social science where
individuals express their attitudes, feelings, beliefs or
knowledge about a subject or situation.
o Why consider self-reports?
• Emphasize individuals' perceptions and subjective
experiences of their engagement with technologies.
o Self-report methods may be discrete, dimensional, and
free response. (Lopatovska & Arapakis, 2011)
30
ADVANTAGES OF SELF-REPORT MEASURES
o Flexibly applied in a variety of settings
o High internal consistency for well-constructed measures
o Convenient to administer
o Specificity in construct definition
o Quantitative self-report measures, i.e., questionnaires
• Enable statistical analysis and standardization
• Participant anonymity
• Administered to individuals or groups
• Paper-based or web-based
• Function well in large-sample research studies
(Fulmer & Frijters, 2009)
31
DISADVANTAGES OF SELF-REPORT MEASURES
o Information processing issues
• Interpretation of researchers' questions
• Developmental challenges associated with age or cognitive ability
o Communication issues
• Wording and response options
• Rapport between interviewer and interviewee
o Construct issues
o Reliability and validity issues
o Participants' responses
• What does the "neutral" category mean?
• Over-estimate behavior frequency
• Reliance on recollection.
(Fulmer & Frijters, 2009; Kobayashi & Boase, 2012)
32
APPROACHES TO STUDYING USER ENGAGEMENT
WITH SELF-REPORT MEASURES – OUTLINE
o Methods
• Interviews
• Think aloud/think after protocols
• Questionnaires
o Examples of employing each method to study
engagement
o Examples of using self-report methods
33
INTERVIEWS
o May be structured, semi-structured or unstructured.
o The interview schedule.
o May be one-on-one or one-to-many (focus groups).
o May focus on general or specific events, experiences,
or timeframes.
http://openclipart.org/detail/173434/interview-by-jammi-evil-173434
34
USING INTERVIEWS TO MEASURE USER ENGAGEMENT
o Objectives:
1. To develop an operational definition of engagement, and
2. To identify key attributes of engagement.
o Who?
• 17 online searchers, gamers, learners and shoppers.
o Why interviews?
o How were the questions formulated?
• Grounded in interdisciplinary literature review and theory
o What guided the analysis?
• Threads of Experience (McCarthy & Wright, 2004)
(O'Brien & Toms, 2008) 35
USING INTERVIEWS TO MEASURE USER ENGAGEMENT:
OUTCOMES
o Developed a process-based model of user
engagement.
o Identified attributes of engagement:
• Aesthetic and sensory appeal, affect, feedback, control,
interactivity, novelty, focused attention, motivation,
interest.
o Mapped attributes to stages in the process model.
o Benefit of using interviews.
(O'Brien & Toms, 2008) 36
THINK ALOUD/THINK AFTER PROTOCOLS
o Think aloud
• Verbalization during the human-computer interaction
o Think after or stimulated recall
• Verbalization after the human-computer interaction
o Constructive interaction
• Involves two participants verbalizing their thoughts as they
interact with each other
o Spontaneous and prompted self-report
• Participants provide feedback at fixed intervals or at
other points defined by the researcher
(Branch, 2000; Ericsson & Simon, 1984; Kelly, 2009; Van den Haak, De
Jong, & Schellens, 2009)
37
THINK ALOUD/THINK AFTER PROTOCOLS:
CONSIDERATIONS
o Automatic processes difficult to articulate.
o Complex/highly visual interactions may be
challenging to remember and/or verbalize.
o Think aloud/spontaneous or prompted self-report
• Unnatural, interruptive
• Increased cognitive load
o Think after or stimulated recall:
• Relies on memory but attention is less divided
• Researcher can draw participants' attention to
specific features of the interface, activities, etc.
(Branch, 2000; Ericsson & Simon, 1984; Kelly, 2009; Van den Haak, De
Jong, & Schellens, 2009)
38
USING THINK ALOUD TO STUDY USER
ENGAGEMENT WITH EDUCATIONAL MULTIMEDIA
o Series of studies with educational multimedia
and television advertisements
o Think aloud component of the research:
• Identified salient aspects of engagement with
content and media
• Content: Perceptions driven by personal interest
• Media: Focus on media preference, presentation,
and affordances of control in navigation
(Jacques, Preece & Carey, 1995)
39
QUESTIONNAIRES
o Closed-ended (quantitative) and open-ended
(qualitative).
o Effect of mode (Kelly et al., 2008).
o Scale development and evaluation is a
longitudinal process.
40
SCALE DEVELOPMENT AND EVALUATION (based on DeVellis, 2003)
Theoretical foundation:
• Step 1: Research Review – develop conceptual model and definition
• Step 2: Exploratory Study – collect data
Scale construction:
• Step 3: Develop Instrument – select pool of items
• Step 4: Administer Survey, Sample 1 – collect data (pre-test)
• Step 5: Data Analysis – evaluate scale reliability &
dimensionality; develop "purified" scale
Scale evaluation:
• Step 6: Administer Survey, Sample 2 – collect data
• Step 7: Data Analysis – evaluate scale validity & predictive
relationships; final scale
41
QUESTIONNAIRES FOR MEASURING USER
ENGAGEMENT
o Jacques, 1996
• 13-items
• Attention, perceived time, motivation, needs, control, attitudes, and
overall engagement
o Webster & Ho, 1997
• 15-items
• Influences on engagement: including challenge, feedback, control
and variety, and
• Engagement, including attention focus, curiosity, intrinsic interest,
and overall engagement.
o O’Brien & Toms, 2010 – User Engagement Scale (UES)
• 31-items
• Aesthetic appeal, novelty, felt involvement, focused attention,
perceived usability, and endurability (overall experience)
42
USING QUESTIONNAIRES TO STUDY ENGAGEMENT:
ROLE OF MEDIA FORMAT: EXAMPLE I
Participants (n=82) completed a pre-task survey, were assigned to a
media condition (video, audio, narrative text, or transcript text),
and experienced Story 1 and Story 2. After each story they completed
the UES and information recall questions, followed by a post-session
questionnaire (Attitudes Checklist, Schraw et al. 1998) and interviews.
(O'Brien, 2013)
43
ROLE OF FORMAT IN MEDIA ENGAGEMENT:
PREPARATION AND SCREENING OF UES
The 31 UES items underwent data screening, sub-scale reliability
checks, correlation analysis, and principal components analysis,
reducing the scale from 31 to 27 items.
44
PRINCIPAL COMPONENTS ANALYSIS (PCA) OF
REMAINING UES ITEMS
Component 1: Hedonic Engagement – 12 items, 47.9% variance,
Cronbach's alpha 0.95
Component 2: Focused Attention – 4 items, 11% variance,
Cronbach's alpha 0.87
Component 3: Affective Usability – 4 items, 5.9% variance,
Cronbach's alpha 0.75
Component 4: Cognitive Effort – 2 items, 4.6% variance,
Cronbach's alpha 0.83
Kaiser-Meyer-Olkin Measure of Sampling Adequacy = 0.89
Bartlett's Test of Sphericity: χ2(231) = 1621.12, p < 0.001
45
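Cronbach's alpha values like those above can be computed directly from a respondents-by-items score matrix; a minimal sketch in Python (sample variances; the function name is ours):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix.

    items: one row per respondent, each row a list of item scores.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Perfectly consistent items yield alpha = 1; weakly related items pull the value toward 0.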
FINDINGS FROM THE STUDY
Relationship between Story and Engagement, M(SD):
• Hedonic Engagement – Story 1 (Farming): 4.06 (1.3); Story 2 (Mining): 5.06 (1.05)
• Focused Attention – Story 1: 3.3 (1.4); Story 2: 3.93 (1.3)
• Affective Usability – Story 1: 4.69 (1.3); Story 2: 5.6 (0.9)
• Cognitive Effort – Story 1: 4.19 (1.5); Story 2: 5.29 (1.3)
Relationship between Media Condition and Engagement, M(SD):
• Hedonic Engagement – Audio: 4.7 (1.2); Video: 5 (1.1); Transcript: 3.9 (1.4); Narrative: 4.5 (1.2)
• Focused Attention – Audio: 3.6 (1.4); Video: 3.8 (1.4); Transcript: 3.5 (1.4); Narrative: 3.5 (1.5)
• Affective Usability – Audio: 5 (1.2); Video: 5.4 (1.1); Transcript: 4.9 (1.3); Narrative: 5 (1.2)
• Cognitive Effort – Audio: 4.5 (1.6); Video: 5.5 (1.1); Transcript: 4.1 (1.5); Narrative: 4.8 (1.4)
46
FINDINGS FROM THE STUDY (CONTINUED)
Multivariate tests for Story and Condition:
• Story: Λ = 0.8, F = 5.45, df = (1, 98), p = .001
• Condition: Λ = 0.78, F = 1.81, df = (3, 98), p = .04
• Story x Condition: Λ = 0.92, F = 0.54, df = (3, 98), p = .88
Significant F-tests for univariate follow-up:
• Hedonic Engagement (Story): MS = 14.05, F = 9.95, df = (1, 98), p = .002
• Focused Attention (Story): MS = 10.32, F = 4.78, df = (1, 98), p = .031
• Affective Usability (Story): MS = 23.76, F = 17.71, df = (1, 98), p < .001
• Cognitive Effort (Story): MS = 20.02, F = 11.4, df = (1, 98), p < .001
• Cognitive Effort (Condition): MS = 7.23, F = 4.11, df = (3, 98), p = .009
47
CONCLUSIONS: MEDIA FORMAT AND ENGAGEMENT
o Next steps in data analysis.
o Value of screening and examining the reliability and
principal component structure of the UES items.
o Why performance measures would not be significant in
this controlled study.
o What was learned about users' perceived engagement
in this study.
48
EMPLOYING MULTIPLE SELF-REPORT METHODS:
EXAMPLE II
o How the visual catchiness (saliency) of
"relevant" information impacts user engagement
metrics such as focused attention and emotion
(affect)
• focused attention refers to attending to something to
the exclusion of other things
• affect relates to the emotions experienced
during the interaction
o Saliency model of visual attention developed by
(Itti & Koch, 2000)
49
MANIPULATING SALIENCY
Web page screenshots and saliency maps for the salient and
non-salient conditions
(McCay-Peet et al, 2012) 50
STUDY DESIGN
o 8 tasks = finding latest news or headline on celebrity or
entertainment topic
o Affect measured pre- and post-task using the Positive and
Negative Affect Schedule (PANAS); positive items include e.g.
"determined", "attentive", negative items e.g. "hostile", "afraid"
o Focused attention measured with 7-item focused
attention subscale e.g. “I was so involved in my news tasks that I
lost track of time”, “I blocked things out around me when I was
completing the news tasks” and perceived time
o Interest level in topics (pre-task) and questionnaire (post-
task) e.g. “I was interested in the content of the web pages”, “I
wanted to find out more about the topics that I encountered on the
web pages”
o 189 (90+99) participants from Amazon Mechanical Turk
51
PANAS (10 POSITIVE ITEMS AND 10 NEGATIVE ITEMS)
o You feel this way right now, that is, at the present moment
[1 = very slightly or not at all; 2 = a little; 3 = moderately;
4 = quite a bit; 5 = extremely]
[randomize items]
distressed, upset, guilty, scared, hostile,
irritable, ashamed, nervous, jittery, afraid
interested, excited, strong, enthusiastic, proud,
alert, inspired, determined, attentive, active
(Watson, Clark & Tellegen, 1988)
52
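A PANAS administration like this one is conventionally scored by summing the ten positive and the ten negative ratings separately; a minimal sketch in Python (the function and variable names are ours, not from the instrument):

```python
# The 20 PANAS items, split into the two sub-scales listed on the slide.
POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]

def score_panas(responses):
    """Sum the 1-5 ratings into a positive and a negative affect score.

    `responses` maps each of the 20 items to a rating in 1..5, so each
    sub-scale score ranges from 10 to 50.
    """
    pos = sum(responses[item] for item in POSITIVE)
    neg = sum(responses[item] for item in NEGATIVE)
    return pos, neg
```

Comparing pre- and post-task scores then gives the change in affect reported on the following slides.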
7-ITEM FOCUSED ATTENTION SUBSCALE (PART OF THE 31-
ITEM USER ENGAGEMENT SCALE)
5-POINT SCALE (STRONGLY DISAGREE TO STRONGLY AGREE)
1. I lost myself in this news tasks experience
2. I was so involved in my news tasks that I lost track of time
3. I blocked things out around me when I was completing the
news tasks
4. When I was performing these news tasks, I lost track of
the world around me
5. The time I spent performing these news tasks just slipped
away
6. I was absorbed in my news tasks
7. During the news tasks experience I let myself go
(O'Brien & Toms, 2010)
53
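The sub-scale score is then typically the mean of the item ratings; a minimal sketch (assuming no reverse-coded items, which holds for the seven items above):

```python
def score_subscale(ratings):
    """Average Likert ratings (1 = strongly disagree ... 5 = strongly
    agree) into a single sub-scale score in the range 1..5."""
    if not ratings:
        raise ValueError("no ratings given")
    return sum(ratings) / len(ratings)
```

A respondent who answered the seven items 4, 4, 5, 3, 4, 4, 4 would score 4.0 on focused attention.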
SALIENCY AND POSITIVE AFFECT
o When headlines are visually non-salient
• users are slow at finding them, report more
distraction due to web page features, and show a
drop in affect
o When headlines are visually catchy or salient
• users find them faster, report that it is easy to
focus, and maintain positive affect
o Saliency is helpful in task performance,
focusing/avoiding distraction and in
maintaining positive affect
54
SALIENCY AND FOCUSED ATTENTION
o Adapted focused attention subscale from the online
shopping domain to entertainment news domain
o Users reported “easier to focus in the salient condition”
BUT no significant improvement in the focused
attention subscale or differences in perceived time
spent on tasks
o User interest in web page content is a good
predictor of focused attention, which in turn is a
good predictor of positive affect
55
SELF-REPORTING, CROWDSOURCING, SALIENCY AND
USER ENGAGEMENT
o Interaction of saliency, focused attention, and affect,
together with user interest, is complex.
o Using crowdsourcing worked!
o What next?
• include web page content as a quality of user
engagement in focused attention scale
• more “realistic” user (interactive) reading experience
• other measurements: mouse-tracking, eye-tracking, facial
expression analysis, etc.
(McCay-Peet, Lalmas & Navalpakkam, 2012)
56
CONSIDERATIONS WHEN EMPLOYING SELF-REPORT
MEASURES
o What is the research question?
o What is the most suitable self report method?
o How might we use self-report in studies of user
engagement?
• Gather data explicitly about engagement
• Other self-report measures may predict, validate, or
enrich other measures of engagement
o Why do self-reports get a bad rap?
57
PART 1:
FOUNDATIONS
APPROACHES BASED ON WEB ANALYTICS
MEASURING USER ENGAGEMENT
58
WEB ANALYTICS
59
8 INDICES
o Click Depth Index: page views
o Duration Index: time spent
o Recency Index: rate at which users return over time
o Loyalty Index: level of long-term interaction the user
has with the site or product (frequency)
o Brand Index: apparent awareness of the user of the
brand, site, or product (search terms)
o Feedback Index: qualitative information including
propensity to solicit additional information or supply
direct feedback
o Interaction Index: user interaction with site or product
(click, upload, transaction)
(Peterson et al, September 2008)
60
INTRA-SESSION VERSUS INTER-SESSION ENGAGEMENT
o Intra-session engagement measures our success in attracting the
user to remain on our site for as long as possible
o “Long-term engagement can be defined as the degree of voluntary
use of a system along a wide period of time…" (Febretti and
Garzotto, 2009)
o Inter-session engagement can be measured directly or, for
commercial sites, by observing lifetime customer value (CTR,
etc.).
o Some studies (Lehmann et al, 2011) report some correlation between
inter- and intra-session measures, for example, a negative
correlation between dwell time and number of active days
61
WHY NOT USE INTRA-SESSION MEASURES
EXCLUSIVELY?
o We seek to have users return to the site again and
again, and to perceive the site as beneficial to them
o Intra-session measures can easily mislead, especially over
a short time (Kohavi et al, 2012):
• Consider a very poor ranking function introduced into a
search engine by mistake: users issue more queries and click
more while struggling to find what they need, so intra-session
metrics go up
• Therefore, bucket testing may provide erroneous results
if intra-session measures are used
o Hence inter-session (long-term) engagement is the
preferred measure
62
DEPENDENCY ON USER TYPE
(Lehmann et al, 2012) observed that different users engage
with sites differently. Users were defined according to the
number of days per month that a site is used:
• Tourists: 1 day
• Interested: 2-4 days
• Average: 5-8 days
• Active: 9-15 days
• VIP: more than 16 days
Sites from the Yahoo! network were clustered according to
the proportion of users from each group. The figure shows
that different sites receive different user types and
corresponding usage.
63
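This grouping can be sketched as a simple classifier (the slide leaves 16 days ambiguous between "9-15" and "more than 16"; we assume 16 or more counts as VIP):

```python
def user_type(active_days_per_month):
    """Map a user's active days per month to the (Lehmann et al, 2012)
    groups: Tourist, Interested, Average, Active, VIP."""
    if active_days_per_month <= 1:
        return "Tourist"
    if active_days_per_month <= 4:
        return "Interested"
    if active_days_per_month <= 8:
        return "Average"
    if active_days_per_month <= 15:
        return "Active"
    return "VIP"  # assumption: 16+ days counts as VIP
```

A site's user-type profile is then the proportion of its users falling into each group.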
DEPENDENCY ON TASK AND WEBSITE
o Engagement varies by task. For example, a user who accesses a
website to check for emails (a goal-specific task) has different
engagement patterns from one who is browsing for leisure.
o In one study (Yom-Tov et al, 2013), sessions in which 50% or more of
the visited sites belonged to the five most common sites (for each
user) were classified as goal-specific.
• Goal-specific sessions accounted for 38% of sessions
• Most users (92%) have both goal-specific and non-goal-specific
sessions.
• The average downstream engagement (more later) in goal-specific
sessions was 0.16. This is to be contrasted with 0.2 during non-goal-
specific sessions.
o Dependence on the website is clear: news sites will see different
engagement patterns than online shopping sites.
64
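The goal-specific classification described above can be sketched as follows (a simplified reading of the rule, not the authors' exact implementation):

```python
from collections import Counter

def top_sites(history, n=5):
    """The n most frequently visited sites in a user's browsing history."""
    return {site for site, _ in Counter(history).most_common(n)}

def is_goal_specific(session, user_top_sites, threshold=0.5):
    """A session is goal-specific when at least `threshold` of its
    visits fall on the user's most common sites (cf. Yom-Tov et al, 2013)."""
    if not session:
        return False
    hits = sum(1 for site in session if site in user_top_sites)
    return hits / len(session) >= threshold
```

With each user's top-5 set precomputed, sessions can be labelled in a single pass over the logs.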
LARGE-SCALE MEASUREMENTS OF USER ENGAGEMENT
Single site, intra-session measures:
• Dwell time / session duration
• Play time (video)
• Click-through rate (CTR)
• Mouse movement
• Number of pages viewed (click depth)
• Conversion rate (mostly for e-commerce)
Single site, inter-session measures:
• Fraction of return visits
• Time between visits (inter-session time, absence time)
• Number of views (video)
• Total view time per month (video)
• Lifetime value
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social networks)
Multiple sites:
• Downstream engagement
• Revisits
65
ANOTHER CATEGORIZATION OF MEASURES
o (Lehmann et al, 2012) used a different categorization of
measures:
• Popularity: Total number of users to a site, number of visits, and
number of clicks
• Activity: Number of page views per visit, time per visit (dwell time)
• Loyalty: Number of days a user visits a site, number of times visited,
total time spent
o Each of these categories captures a different facet of engagement;
the categories are therefore not highly correlated
… more about this later
66
DWELL TIME AND OTHER SIMILAR MEASURES
o Definition: the contiguous time spent on a site or web page
o Similar measures: play time (for video sites)
o Cons: not clear that the user was actually looking at the site
while there
(Figure: distribution of dwell times on 50 Yahoo! websites)
67
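Dwell time can be approximated from timestamped page-view logs; a rough sketch (conventions differ, e.g., for the last page of a visit, whose viewing time is unknown and is not counted here):

```python
def dwell_times(events, timeout=1800):
    """Sum contiguous time per site from (timestamp, site) page views.

    A gap longer than `timeout` seconds, or a switch to another site,
    is treated as the end of the visit; the viewing time of the visit's
    last page cannot be observed, so it is conservatively dropped.
    """
    totals = {}
    events = sorted(events)
    for (t0, site0), (t1, site1) in zip(events, events[1:]):
        gap = t1 - t0
        if site0 == site1 and gap <= timeout:
            totals[site0] = totals.get(site0, 0) + gap
    return totals
```

Averaging these per-site totals over many users gives distributions like the one in the figure.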
DISTRIBUTION OF DWELL TIMES ON 50 YAHOO! SITES
o Dwell time varies by site type: leisure sites tend to have
longer dwell times than news, ecommerce, etc.
o Dwell time has a relatively large variance even for the same
site (recall tourists, VIP, active … users)
68
DISTRIBUTION OF USER REVISITS TO A LIST OF YAHOO!
SITES (WITHIN SESSION)
o User revisits are common in sites which may be browser
homepages, or which contain content of regular interest to
users.
o Goal-oriented sites (e.g., e-commerce) have lower revisits
in the time range observed, meaning that the revisit horizon
should be adjusted by site.
69
OTHER INTRA-SESSION MEASURES
o Clickthrough rate (CTR): number of clicks (e.g., on an ad) divided
by the number of times it was shown.
o Number of pages viewed (click depth): average number of
contiguous pages viewed within a site
• Can be problematic if the website is ill-designed.
o Number of returns to the website within a session
• Useful for websites such as news aggregators, where returns
indicate that the user believes there may be more information to
glean from the site.
o Conversion rate (mostly for e-commerce): fraction of sessions
which end in a desired user action (e.g., purchase)
• Not all sessions are expected to result in a conversion, so this
measure is not always informative. However, it has the advantage of
being closer to a website manager's goal.
70
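The rate-style measures above reduce to simple ratios; a minimal sketch (guarding against zero denominators):

```python
def click_through_rate(clicks, impressions):
    """CTR: clicks divided by the number of times the item was shown."""
    return clicks / impressions if impressions else 0.0

def conversion_rate(converted_sessions, total_sessions):
    """Fraction of sessions ending in the desired action (e.g., purchase)."""
    return converted_sessions / total_sessions if total_sessions else 0.0
```

For example, 5 clicks over 100 impressions gives a CTR of 0.05.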
INTER-SESSION ENGAGEMENT MEASURES
In general, these are the preferred measures of
engagement
o Direct value measurement:
• Lifetime value, as measured by ads clicked, monetization, etc.
o Return-rate measurements:
• Fraction of return visits: How many users return for another visit?
• Time between visits (inter-session time, absence time)
• Number of distinct views (video)
o Total use measurements:
• Total usage time per unit of time
• Number of sessions per unit of time
• Total view time per month (video)
• Number of friends on site (social networks)
71
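As an illustration of a total-use measurement, sessions per unit of time can be counted directly from timestamped session starts; a hypothetical sketch using ISO weeks as the unit:

```python
from collections import defaultdict
from datetime import datetime

def sessions_per_week(session_starts):
    """Count each user's sessions per ISO (year, week).

    session_starts: iterable of (user_id, datetime) pairs, one per
    session. Returns a dict keyed by (user_id, (iso_year, iso_week)).
    """
    counts = defaultdict(int)
    for user, ts in session_starts:
        counts[(user, tuple(ts.isocalendar())[:2])] += 1
    return dict(counts)
```

The same aggregation pattern applies to total usage time or view time per unit of time.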
ABSENCE TIME AND SURVIVAL ANALYSIS
o Easy to implement and interpret
o Can compare many things in one go
o No need to estimate baselines
o But needs lots of data to account for noise
(Dupret & Lalmas, 2013)
Yahoo! Japan (Answers search)
72
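Absence time is commonly analysed with the Kaplan-Meier survival estimator, which handles users who had not yet returned by the end of the observation window (censoring); a self-contained sketch, not the authors' implementation:

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve for absence times.

    durations: absence time per user (days until return, or until the
               observation window ended).
    observed:  True if the return was observed, False if censored.
    Returns a list of (time, survival probability) points.
    """
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(set(durations)):
        returns = sum(1 for d, o in zip(durations, observed) if d == t and o)
        if returns:
            surv *= 1 - returns / at_risk  # KM product-limit step
            curve.append((t, surv))
        at_risk -= sum(1 for d in durations if d == t)
    return curve
```

Comparing the resulting curves for two site variants is what makes absence time convenient for bucket testing.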
MODELS OF USER ENGAGEMENT BASED ON WEB
ANALYTICS … TOWARDS A TAXONOMY
Online sites differ concerning their engagement!
• Games: users spend much time per visit
• Search: users come frequently and do not stay long
• Social media: users come frequently and stay long
• Special: users come on average once
• News: users come periodically
• Service: users visit the site when needed
73
DATA AND MEASURES
Interaction data: 2M users, July 2011, 80 US sites
Popularity:
• #Users – number of distinct users
• #Visits – number of visits
• #Clicks – number of clicks
Activity:
• ClickDepth – average number of page views per visit
• DwellTimeA – average time per visit
Loyalty:
• ActiveDays – number of days a user visited the site
• ReturnRate – number of times a user visited the site
• DwellTimeL – average time a user spends on the site
74
METHODOLOGY
General models Time-based models
Dimensions
8 measures
weekdays, weekend
8 metrics per time span
#Dimensions 8 16
Kernel k-means with
Kendall tau rank correlation kernel
Number of clusters based on eigenvalue distribution of kernel matrix
Significant metric values with Kruskal-Wallis/Bonferroni
#Clusters
(Models) 6 5
Analysing cluster centroids = models
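The kernel construction above can be sketched as follows: compute the Kendall tau rank correlation between every pair of sites' metric vectors, giving the site-by-site kernel matrix that kernel k-means then clusters. A simplified tau-a version, assuming untied metric values (the paper's exact variant is not specified here, and the clustering step itself is omitted):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a rank correlation between two metric vectors."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

def kernel_matrix(profiles):
    """Site-by-site kernel: each profile holds one site's metric values
    (e.g. the 8 engagement measures)."""
    n = len(profiles)
    return [[kendall_tau(profiles[a], profiles[b]) for b in range(n)]
            for a in range(n)]
```

Using a rank correlation rather than raw values makes the comparison insensitive to the very different scales of, say, #Visits and DwellTime.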
75
MODELS OF USER ENGAGEMENT
[6 GENERAL]
o Popularity, activity and loyalty are independent from each other
o Popularity and loyalty are influenced by external and internal factors
e.g. frequency of publishing new information, events, personal interests
o Activity depends on the structure of the site
interest-specific
media (daily)
search
periodic
media
e-commerce
models based on engagement measures only
76
TIME-BASED [5 MODELS]
Models based on engagement over weekdays and weekend
hobbies,
interest-specific
weather
daily news
work-related
time-based models ≠ general models
77
MODELS OF USER ENGAGEMENT
o User engagement is complex and standard
metrics capture only a part of it
o User engagement depends on time (and users)
o First step towards a taxonomy of models of
user engagement … and associated measures
o What next?
• More sites, more models, more measures
• User demographics, time of the day, geo-location, etc.
• Online multi-tasking
(Lehmann et al, 2012) 78
ONLINE MULTI-TASKING
users spend more and more of their online session multi-tasking, e.g. emailing,
reading news, searching for information → ONLINE MULTI-TASKING
navigating between sites, using browser tabs, bookmarks, etc
seamless integration of social networks platforms into many services
leaving a site is
not a “bad thing!”
(fictitious navigation between sites within an online session)
181K users, 2 months browser
data, 600 sites, 4.8M sessions
•only 40% of the sessions have
no site revisitation
•hyperlinking, backpaging and
teleporting
79
PART 1:
FOUNDATIONS
APPROACHES BASED ON PHYSIOLOGICAL MEASURES
MEASURING USER ENGAGEMENT
80
PHYSIOLOGICAL MEASURES
o Eye tracking
o Mouse movement
o Face expression
o Psychophysiological measures
Respiration, Pulse rate
Temperature, Brain wave,
Skin conductance, …
81
WHAT IS PSYCHOPHYSIOLOGY?
o The branch of physiology dealing with the relationship between
physiological processes and thoughts, emotions, and behavior.
The body responds to psychological processes:
we exercise → we sweat
we get embarrassed → our cheeks get red and warm
o Examples of measurements
• Electroencephalography (EEG) – measures the electrical activity of the brain through
the scalp.
• Cardiovascular measures – heart rate, HR; beats per minute, BPM; heart rate
variability, HRV; vasomotor activity
• Respiratory sensors – monitors oxygen intake and carbon dioxide output.
• Electromyographic (EMG) sensors – measures electrical activity in muscles
• Pupillometry and eye-movement sensors – measure changes in pupil diameter with
thought and emotion, and eye movements
• Galvanic skin response (GSR) sensors – monitors perspiration/sweat gland activity
(also called Skin Conductance Level – SCL)
• Temperature sensors – measures changes in blood flow and body temperature
• Functional magnetic resonance imaging (fMRI) – measures brain activity by detecting
associated changes in blood flow 82
PSYCHOPHYSIOLOGY – PROS AND CONS
Pros
o More objective data (not dependent on language, memory)
o Can be performed continuously during message/task processing
o Can provide information on emotional and attentional responses
often not available to conscious awareness
Cons
o Equipment expensive and can be
cumbersome, and obtrusive
o Rarely a one-to-one
correspondence between specific
behaviors and physiological
responses
o Difficult to operationalize and isolate
a psychological construct
o Not applicable at large scale
http://flavor.monell.org/~jlundstrom/research%20behavior.html
83
WHAT IS EYE TRACKING?
o Process of measuring either the point of gaze
(where one is looking) or the motion of an
eye relative to the head.
o Eye tracker is a device for measuring eye positions and eye
movement.
o Used in research on the visual system, in psychology,
in cognitive linguistics and in product design.
Time to First Fixation
Fixations Before
First Fixation Duration
Fixation Duration
Total Fixation Duration
Fixation Count
Visit Duration
Visit Count
whole screen or AOI (area of interest)
http://en.wikipedia.org/wiki/Eye_tracking
Examples of measures:
(Lin et al, 2007)
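Several of the fixation measures listed above can be computed directly from a fixation log. A sketch, assuming a hypothetical record of time-ordered (x, y, start_ms, duration_ms) fixations and a rectangular AOI:

```python
def fixation_metrics(fixations, aoi):
    """Standard eye-tracking measures for one area of interest (AOI).

    fixations: time-ordered list of (x, y, start_ms, duration_ms).
    aoi: (x_min, y_min, x_max, y_max) bounding box.
    Returns None if the AOI was never fixated.
    """
    inside = [f for f in fixations
              if aoi[0] <= f[0] <= aoi[2] and aoi[1] <= f[1] <= aoi[3]]
    if not inside:
        return None
    first = inside[0]
    return {
        "time_to_first_fixation_ms": first[2],
        "fixations_before": fixations.index(first),
        "first_fixation_duration_ms": first[3],
        "total_fixation_duration_ms": sum(f[3] for f in inside),
        "fixation_count": len(inside),
    }
```

Visit duration and visit count would additionally require grouping consecutive in-AOI fixations into visits, which is omitted here.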
84
EYE TRACKING – ATTENTION AND SELECTION
18 users, 16 tasks each
(chose one story and rate it)
eye movement recorded
Attention (gaze)
interest has no role
position > saliency
Selection
mainly driven by interest
position > attention
(Navalpakkam et al., 2012) 85
EYE TRACKING – PROS AND CONS
o Pros
• Lots of details (fine-grained data/resolution)
• Offers direct measure of user attention +
what they are looking at
• Offers insights into how people consume & browse web
pages + why they fail at clicking on something
o Cons
• Not scalable
• Slow and expensive
• Not natural environment (e.g. at home)
• Behavior can be different in a lab setting
Can mouse movement act as a (weak) proxy of gaze?
86
WHAT IS MOUSE TRACKING?
o Using software (JavaScript) to collect user mouse
cursor positions on computer/web interface
o Aim to provide information about what people are
doing, typically to improve the design of an interface
o How does gaze, as measured by an eye tracker, relate
to mouse movement as recorded?
o Studies and applications
• Attention on web pages
• Relevance of search results
• As a proxy of relevance
• As an additional and complementary signal
(also known as cursor tracking)
87
MOUSE VS GAZE – ATTENTION ON WEB PAGES
o 90 users on 6 Yahoo! Finance articles – rich media content
o 3 treatments:
• ad always on top; ad top right + random; random (6 positions)
o Reading tasks + post-questionnaires
ad avoidance
similar patterns
shift of attention from top-left to
right as ad position changes
similar patterns
visit ad sooner & more time to
process content when ad
position moves
similar patterns
more at top position
and longer dwell
left better than right
Similar patterns between gaze and mouse in terms of user attention
when manipulating conditions (here ads)
Interesting results for “ads”
(Navalpakkam & Churchill, 2012) 88
Multimedia search
activities often
driven by
entertainment
needs, not by
information needs
RELEVANCE IN MULTIMEDIA SEARCH
(Slaney, 2011) 89
CLICK-THROUGH RATE AS PROXY OF RELEVANCE
I just wanted the phone number … I am totally satisfied 
90
GAZE AND CURSOR RELATIONSHIP
o Small difference on part of page user attends to (5 users)
o Better correlation when cursor moves and when there is lots of
movement (23 users + reading instructions)
o Search result page
• Correlate more along the y-axis than x-axis
Correlate more when cursor placed over search results
(32 users – 16 search tasks; 10 users and 20 search tasks)
BUT
1. Search result page and result page
2. Some factor?
(Rodden et al, 2008; Guo & Agichtein, 2010)
(Chen et al, 2011; Hauger et al, 2011)
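The per-axis comparison reported above can be sketched with a plain Pearson correlation over time-aligned gaze and cursor samples (a simplification of the studies' actual alignment procedures):

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def axis_correlations(gaze, cursor):
    """Per-axis gaze/cursor correlation from parallel (x, y) samples.

    The studies above report stronger correlation along the y-axis
    than the x-axis on search result pages.
    """
    return (pearson([g[0] for g in gaze], [c[0] for c in cursor]),
            pearson([g[1] for g in gaze], [c[1] for c in cursor]))
```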
91
GAZE VS MOUSE - DISTANCE
(Huang, White & Dumais, 2011) 92
GAZE VS CURSOR - FACTORS
o 38 users and 32 search tasks (navigational + informational)
o Age or gender does not seem to be a factor
o Task does not seem to be a factor (others found the opposite)
(using click entropy to classify a query)
o User individual behavior seem to matter more
o Gaze leads the cursor
o Stronger alignment when search result page loads
o Cursor behaviors: alignment increases
inactive < examining < reading < action < click
58.8% 31.9% 2.5% 5.7%
(Huang et al, 2012)
93
CAN WE PREDICT GAZE?
Better prediction when accounting for cursor behaviors and time,
in addition to cursor position only
(Huang,White&Buscher,2012)
94
CLICK VS CURSOR – HEATMAP
o Estimate search result relevance
(Bing - Microsoft employees – 366,473 queries; 21,936 unique cookies;
7,500,429 cursor move or click)
the role of hovering?
(Huang et al, 2011)
95
MOUSE MOVEMENT – WHAT CAN HOVERING TELL
ABOUT RELEVANCE?
Clickthrough rate:
% of clicks when URL
shown (per query)
Hover rate:
% of hovers over URL
(per query)
Unclicked hover:
Median time user hovers over
URL without clicking (per query)
Max hover time:
Maximum time user hovers
over a result (per SERP)
(Huang et al, 2011)
96
MOUSE MOVEMENT – WHAT CAN HOVERING TELL
ABOUT ABANDONMENT?
o Abandonment (an engagement metric in search) is when there is no
click on the search result page
• User is dissatisfied (bad abandonment)
• User found result(s) on the search result page (good abandonment)
o 858 queries (21% good vs. 79% bad abandonment, manually examined)
o Cursor trail length
• Total distance (pixel) traveled by cursor on SERP
• Shorter for good abandonment
o Movement time
• Total time (second) cursor moved on SERP
• Slower when answers in snippet (good abandonment)
o Cursor speed
• Average cursor speed (pixel/second)
• Slower when answers in snippet (good abandonment)
(Huang et al, 2011)
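The three cursor features above could be computed from a logged cursor trail roughly as follows (a sketch, not the paper's code; the trail is assumed to be time-ordered samples of position and timestamp):

```python
import math

def cursor_features(trail):
    """Cursor features used to separate good from bad abandonment.

    trail: time-ordered (x, y, t_seconds) cursor samples on the SERP.
    Returns trail length (pixels), movement time (seconds), and
    average speed (pixels/second).
    """
    # Trail length: sum of Euclidean distances between samples.
    length = sum(math.hypot(x2 - x1, y2 - y1)
                 for (x1, y1, _), (x2, y2, _) in zip(trail, trail[1:]))
    movement_time = trail[-1][2] - trail[0][2]
    speed = length / movement_time if movement_time > 0 else 0.0
    return length, movement_time, speed
```

Per the findings above, shorter trails and slower movement over the snippets are the pattern associated with good abandonment.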
97
RELEVANCE & CURSOR
o Clickthrough rate (CTR) – in a search result
• Ranking bias
• Various ways to deal with it, such as “interleaving”
• Presentation bias
• Perceived relevance from reading the snippet
o Dwell time – on landing page (post search result)
• Although a good indicator of user interest/relevance, not reliable on
its own
• Time spent reading a document (result) has been shown to improve
search quality
• Short dwell time is a good indicator of non-relevance
• BUT
• Interpreting long dwell time is not so straightforward
(the user may spend a long time locating the relevant part of a long document!)
… we recall that in search
98
RELEVANCE & CURSOR
“reading” cursor heatmap of relevant document vs “scanning” cursor heatmap
of non-relevant document (both dwell time of 30s)
(Guo & Agichtein, 2012)
99
RELEVANCE & CURSOR
“reading” a relevant long document vs “scanning” a long non-relevant
document
(Guo & Agichtein, 2012)
100
WHAT WORKS? – PREDICTING RELEVANCE
o Dwell time
o Cursor movement
• number, total distance traveled (and x- and y-axis),
speed (-), maximal coordinate
o Scroll
• frequency (-) and speed (-)
o Predefined areas of interest (AOI)
• Where main content lies
o Actual rank way less informative
in combination even better
(Guo & Agichtein, 2012)
… learning a model with:
101
FACIAL EXPRESSION AND SEARCH
16 subjects, facial expressions recorded while performing search tasks of
various levels of difficulty.
learned model (based on support vector machine) shows that facial
expressions provide good cues on topical relevance.
Potential application: personalised relevance feedback based on implicit cues.
(Arapakis et al, 2010) 102
FACEBOOK AND EMOTIONAL ENGAGEMENT (FLOW)
(Lang, 1995; Mauri et al, 2011)
valence-arousal plane
SC = skin conductance
EMG = electromyographic activity
Lang model of emotions
relaxation (3 min, panorama
pictures) → Facebook (3 min,
free navigation) → stress
(4 min, arithmetic tasks)
30 students
103
EMOTION, ENGAGEMENT AND MEASURES
o Anticipation: Humans
are curious.
o Joy: Happy users
are well-engaged,
repeat users.
o Trust: Users want to
feel safe when
interacting with your
site.
o More?
104
Plutchik‟s emotion wheel
http://uxdesign.smashingmagazine.com/2011/05/19/optimizing-emotional-engagement-in-web-design-through-metrics/
OUTLINE
o Introduction and Scope
o Part I - Foundations
1. Approaches based on self-report measures
2. Approaches based on web analytics
3. Approaches based on physiological measures
o Part II – Advanced Aspects
1. Measuring user engagement in mobile information searching
2. Networked user engagement
3. Combining different approaches
o Conclusions
o Bibliography
105
PART 2:
ADVANCED
ASPECTS
MOBILE INFORMATION SEEKING
MEASURING USER ENGAGEMENT
106
MOBILE USER ENGAGEMENT
o Mobile devices are changing the ways in which
we are learning, working, and communicating.
o The role of device has not been considered in
(published) studies of user engagement.
o However … related work has been done in the
UX literature.
107
DIARY STUDIES
1. Komaki et al, 2012
• Context heavily influenced search behavior
2. Nylander et al, 2009
• General preference for using mobile, even when an
alternative was available (51% of instances)
• Mobile use influenced by: technical ease and
functionality, and convenience, laziness, and integration
with social life and daily activities
3. Church & Smythe, 2009; Church & Oliver, 2011
• Emphasized location and time as key factors in mobile
use
108
FIELD STUDIES
o Oulasvirta et al, 2005
• Attention shifting between the mobile device and the
external environment
o Gökera & Myrhaugb, 2008
• Context closely tied to perceived relevance and value of
information
o Battarbee & Koskinen, 2005
• Emotional response of information sharing and
communication with friends in everyday life
109
BUILDING A MODEL OF ENGAGEMENT BASED ON UX
LITERATURE
o User experience (UX) literature suggests that:
• Users must focus attention on the mobile task and the
external environment (Oulasvirta et al., 2005).
• 63% of mobile searches were social in nature (Teevan et al.
2011).
• Mobile devices with constant connectivity are often “habit-forming”
(Oulasvirta et al., 2012)
• Time motivates mobile phone use (Tojib & Tsarenko, 2012).
Therefore …
110
MOBILE USER ENGAGEMENT
o Usability: ease of use, perceptual speed, integration into everyday
activities
o Aesthetic Appeal: small displays
o Status: popularity of device
o Novelty: “checking” for new information, e.g., status updates
o Focused Attention: the external environment places demands on
attention
o Felt Involvement: may be used casually to “pass time” rather than
for sustained interest or specific information needs
o Endurability
o Cross-cutting factors: SOCIAL CONTEXT and TIME, i.e. the fit between
the interaction and the context of use
111
STUDYING MOBILE USER ENGAGEMENT (IN PROGRESS)
[Process model: Point of Engagement → Period of Sustained Engagement →
Disengagement → Re-engagement]
(Absar, O‟Brien, Halbert & Trumble, 2013)
While conversing
about their
carbon footprints,
Mary and John
could not decide
which of their
cars is more
energy-efficient
“Let‟s look
it up!”
“We‟re late
for class.
Better go!”
Upon returning
home, John
decides to
search more
about hybrid
cars on his
computer
112
MOBILE USER ENGAGEMENT: EXPLORATORY STUDY
METHODS
Interview 1 → Mobile Diary Collection Period → Interview 2
TEXT + PHOTO
(Photovoice, Wang & Burris, 1997)
113
ENGAGEMENT WITH MOBILE APPS
o Focused on branded mobile apps, interactive marketing tools
o Methodology: identification and analysis of branded apps
• 2010 Interbrand Top 100 Global Brands + iTunes app store
• Analysis of features and content on the branded app according to:
vividness, novelty, motivation, control, customization, feedback, and
multiplatforming
• Distinguished product and service branded apps
o Almost all apps incorporated at least one of the seven
engagement attributes:
• control (97.2%), customization (85.8%), vividness (78.3%: entire app,
86.8%: entry page), multiplatforming (70.8%), motivation (62.3%),
feedback (55.7%), and novelty (11.3%).
(Kim, Lin & Sung, 2013)
114
PART 2:
ADVANCED
ASPECTS
NETWORKED USER ENGAGEMENT
MEASURING USER ENGAGEMENT
115
DOWNSTREAM ENGAGEMENT
o Basic premises:
• The success of a website depends not only on
itself, but also on its environment.
• This is particularly relevant for companies
running networks of properties or services
No man is an island, entire of itself
or website
116
USER BEHAVIOR WITHIN A NETWORK OF SITES
117
NETWORKED USER ENGAGEMENT:
ENGAGEMENT ACROSS A NETWORK OF SITES
 Large online providers (AOL, Google, Yahoo!, etc.)
offer not one service (site), but a network of sites
 Each service is usually optimized individually, with
some effort to direct users between them
 Success of a service depends on itself, but also on
how it is reached from other services (user traffic)
 Users switch between sites within an online session,
several sites are visited and the same site is visited
several times (online multi-tasking)
118
MEASURING DOWNSTREAM ENGAGEMENT
119
[Figure: a user session timeline; downstream engagement for site A =
% of session time remaining after the visit to site A]
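Under our reading of the slide's definition (downstream engagement of site A = share of total session time remaining after the first visit to A), the measure could be sketched as follows, assuming a hypothetical session representation of (site, dwell_seconds) visits:

```python
def downstream_engagement(session, site):
    """Downstream engagement of `site` within one user session.

    session: ordered list of (site_name, dwell_seconds) visits.
    Returns the fraction of total session time spent after the first
    visit to `site`, or None if the site was not visited.
    """
    total = sum(d for _, d in session)
    for i, (name, _) in enumerate(session):
        if name == site:
            after = sum(d for _, d in session[i + 1:])
            return after / total if total else 0.0
    return None  # site not visited in this session
```

Averaging this fraction over all sessions that touch the site gives a per-site downstream engagement score that can be compared across a network of properties.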
DOWNSTREAM ENGAGEMENT
o Varies significantly across sites
o Exhibits different distributions
according to site type
o Is not highly correlated with
other engagement measures
such as dwell time
o Optimizing downstream
engagement will have little
effect on user engagement
within that site
120
DISTRIBUTION OF DOWNSTREAM ENGAGEMENT
SCORES
(19.4M sessions, 265,000 users, 50 sites)
o Downstream engagement is not highly correlated with intra-site
measures of engagement such as dwell time (ρ = −…, p < 10^−…).
o Downstream engagement is negatively correlated with inter-session
measures such as revisits (ρ = −…, p < 10^−…).
121
There are different
modes of downstream
engagement
according to site type.
There are no obvious
characteristics of
websites that would
indicate their
downstream
distribution.
CLUSTERED DISTRIBUTION OF DOWNSTREAM
ENGAGEMENT SCORES
(19.4M sessions, 265,000 users, 50 sites)
122
DISTRIBUTION OF DOWNSTREAM ENGAGEMENT TO
A LIST OF YAHOO! WEBSITES
Varies across and within websites (19.4M sessions, 265,000 users, 50 sites) 123
NETWORKED USER ENGAGEMENT
o Downstream engagement
• Varies significantly across sites
• Exhibits different distributions according to site type
o Other measures of networked user engagement?
o Applications to companies with several services, but also
to increasingly “tightly” connected services (news and
social media)
o Let us not forget increased online multitasking
o Next: Can we quantify the network effect?
(Yom-Tov et al., 2013)
124
PART 2:
ADVANCED
ASPECTS
COMBINATIONS OF APPROACHES
MEASURING USER ENGAGEMENT
125
MEASURING USER ENGAGEMENT – WE RECALL
Self-reported engagement
• Measures: questionnaires, interviews, reports, product reaction
cards, think-aloud
• Characteristics: subjective; short- and long-term; lab and field;
small-scale; product outcome
Cognitive engagement
• Measures: task-based methods (time spent, follow-on task);
neurological measures (e.g. EEG); physiological measures (e.g. eye
tracking, mouse tracking)
• Characteristics: objective; short-term; lab and field; small- and
large-scale; process outcome
Interaction engagement
• Measures: web analytics + “data science” metrics + models
• Characteristics: objective; short- and long-term; field;
large-scale; process outcome
126
COMBINATION OF APPROACHES
SEVERAL STUDIES
USER ENGAGEMENT
127
STUDY I: GAZE AND SELF-REPORTING
o News + comments
o Sentiment, interest
o 57 users (lab-based)
o Reading task (114)
o Questionnaire (qualitative data)
o Record mouse tracking, eye tracking, facial
expression, EEG signal (quantitative data)
Three metrics: gaze, focused attention and positive affect
128
INTERESTING CONTENT PROMOTES USER
ENGAGEMENT METRICS
o All three metrics:
• focused attention, positive affect & gaze
o What is the right trade-off?
• news is news 
o Can we predict?
• provider, editor, writer, category, genre, visual
aids, …, sentimentality, …
o Role of user-generated content (comments)
• As measure of engagement?
• To promote engagement?
129
LOTS OF SENTIMENTS BUT WITH NEGATIVE
CONNOTATIONS!
o Positive affect (and interest, enjoyment and wanting to
know more) correlates
• positively with sentimentality (lots of emotion)
• negatively with positive polarity (happy news)
SentiStrength (from -5 to 5 per word)
sentimentality: sum of absolute values (amount of sentiment)
polarity: sum of values (direction of the sentiment: positive vs negative)
(Thelwall, Buckley & Paltoglou, 2012)
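Both quantities follow directly from the per-word SentiStrength-style scores; a minimal sketch:

```python
def sentiment_scores(word_scores):
    """Sentimentality and polarity from per-word sentiment scores.

    word_scores: SentiStrength-style values, each in -5..5.
    sentimentality: sum of absolute values (amount of sentiment)
    polarity: plain sum (direction: positive vs negative)
    """
    sentimentality = sum(abs(s) for s in word_scores)
    polarity = sum(word_scores)
    return sentimentality, polarity
```

Text can thus be highly sentimental (large sentimentality) while having near-zero polarity, which is exactly the "lots of sentiment, negative connotation" pattern discussed above.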
130
EFFECT OF COMMENTS ON USER ENGAGEMENT
o 6 rankings of comments:
• most replied, most popular, newest
• sentimentality high, sentimentality low
• polarity plus, polarity minus
o Longer gaze on
• newest and most popular for interesting news
• most replied and high sentimentality for non-interesting
news
o Can we leverage this to prolong user
attention?
131
GAZE, SENTIMENTALITY, INTEREST
o Interesting and “attractive” content!
o Sentiment as a proxy of focus
attention, positive affect and gaze?
o Next
• Larger-scale study
• Other domains (beyond daily news!)
• Role of social signals (e.g. Facebook, Twitter)
• Lots more data: mouse tracking, EEG, facial
expression
(Arapakis et al., 2013)
132
STUDY II: MOUSE TRACKING AND SELF-REPORTING
o 324 users from Amazon Mechanical Turk (between
subject design)
o Two domains (BBC News and Wikipedia)
o Two tasks (reading and search)
o “Normal vs Ugly” interface
o Questionnaires (qualitative data)
• focused attention, positive affect, novelty,
• interest, usability, aesthetics
• + demographics, handedness & hardware
o Mouse tracking (quantitative data)
• movement speed, movement rate, click rate, pause
length, percentage of time still
133
“Ugly” vs “Normal” Interface (BBC News)
134
“Ugly” vs “Normal” (Wikipedia)
135
MOUSE TRACKING CAN TELL ABOUT
o Age
o Hardware
• Mouse
• Trackpad
o Task
• Searching: There are many different types of phobia.
What is Gephyrophobia a fear of?
• Reading: (Wikipedia) Archimedes, Section 1:
Biography
136
MOUSE TRACKING COULD NOT TELL MUCH ON
o focused attention and positive affect
o user interests in the task/topic
o BUT BUT BUT BUT
• “ugly” variant did not result in lower aesthetics scores
• although BBC > Wikipedia
 BUT – the comments left …
• Wikipedia: “The website was simply awful. Ads flashing everywhere, poor
text colors on a dark blue background.”; “The webpage was entirely blue. I
don't know if it was supposed to be like that, but it definitely detracted
from the browsing experience.”
• BBC News: “The website's layout and color scheme were a bitch to
navigate and read.”; “Comic sans is a horrible font.”
137
MOUSE TRACKING AND USER ENGAGEMENT
o Task and hardware
o Do we have a Hawthorne Effect???
o “Usability” vs engagement
• “Even uglier” interface?
o Within- vs between-subject design?
o What next?
• Sequence of movements
• Automatic clustering
(Warnock & Lalmas, 2013)
138
STUDY III: SELF-REPORT AND BEHAVIOURAL DATA
o Information Visualization System
• McGill Library Catalogue: Engineering Subject Area
• Version 1: visualization
• Version 2: visualization + audio
o Participatory Design Study
o Experiment
• n=24 engineering students
• Tasks: six information retrieval and hierarchical
navigation tasks
• Data collected: self-report and performance metrics
(Absar, 2012) 139
FINDINGS
o No difference in performance accuracy or time on task
o Aesthetics and perceived usability were higher for the
audio-visual system.
o Perceived ease of use was also rated higher for the
audio-visual system.
o Open-ended comments offered insights into
participants‟ perceptions and interactions.
140
STUDY IV: ONLINE NEWS INTERACTIONS
http://www.cbc.ca/news/
(O'Brien & Lebow, in press)
141
SELF-REPORT, BEHAVIOR AND PHYSIOLOGICAL DATA:
MEASURES
o Pre-task questionnaire
• Demographics + news behaviours
o Interaction with website
• Performance: Time on task, reading time, browsing time, number of
pages visited within site, whether participants clicked on links to
recommended content
• Physiological: heart rate (HR), electrodermal activity (EDA),
electromyogram (EMG) [subset of participants]
o Post-session questionnaire
• User Engagement Scale (UES) (O‟Brien & Toms, 2010)
• Cognitive Absorption Scale (CAS) (Agarwal & Karahanna, 2000)
• System Usability Scale (SUS) (Brooke, 1996)
o Think-After Interview
• Questions about the items selected for the task
• Questions about overall experience
142
SELF-REPORT, BEHAVIOR AND PHYSIOLOGICAL DATA:
RESULTS
o Self-report UES, CAS and SUS
• Positive correlations support criterion validity of the
measures
• Designation of “low,” “medium” and “high” scores for each
group based on median
• All questionnaires were positively correlated with
aggregate interest in the articles
o UES and Physiological Data
HR EDA EMG
UES -0.38 -0.25 -0.21
143
SELF-REPORT, BEHAVIOR AND PHYSIOLOGICAL DATA:
RESULTS
o UES and Behavioural Data
• Use of Links
• UES scores were not significantly different between those who clicked
on links (M=3.8, SD=0.95) and those who did not (M=4.29, SD=0.52)
• U(1)=51.5, p=0.15
                 High M(SD)    Medium M(SD)  Low M(SD)     Kruskal-Wallis (χ²)  p
Reading time     6:03 (2:34)   6:05 (1:56)   6:56 (3:29)   1.15                 0.56
Browsing time    4:03 (2:29)   5:17 (3:49)   7:29 (4:09)   3.98                 0.13
Total time       10:07 (3:37)  11:23 (5:10)  14:26 (5:02)  5.09                 0.07
# pages visited  9.5 (5.0)     10.3 (3.6)    16.3 (8.4)    3.89                 0.14
144
THINK-AFTER INTERVIEW
o Did participants‟ experiences with online news fit the
process model of user engagement (O‟Brien & Toms, 2008)?
o What attributes of user engagement were significant to
participants in the online news environment?
• Novelty, affect, usability, personal interest and relevance
o Evidence of two types of engagement (O‟Brien, 2011)
• Content engagement
• Interface engagement
145
OUTLINE
o Introduction and Scope
o Part I - Foundations
1. Approaches based on self-report measures
2. Approaches based on web analytics
3. Approaches based on physiological measures
o Part II – Advanced Aspects
1. Measuring user engagement in mobile information searching
2. Networked user engagement
3. Combining different approaches
o Conclusions
o Bibliography
146
CONCLUSIONS
MEASURING USER ENGAGEMENT
147
AN EXAMPLE OF PUTTING IT ALL TOGETHER: HEART
FRAMEWORK
o Happiness
• Satisfaction
o Engagement
o Adoption
o Retention
o Task success
• Efficiency and effectiveness
large-scale behavioral data
• Based on experience in working with user-
centered products
• Not all measures appropriate to all products
HEART framework is “more” about user experience
(Rodden, Hutchinson & Fu, 2010) 148
PULSE MEASURES … THE OLD WAY
o Page views
• Increase may mean increase of popularity or getting lost
o Uptime
• Outage is bad
o Latency
• Slow is bad
o Seven-day active users
• Number of users who used the application at least once a week
• Does not differentiate between new and returning users
o Earnings
• Too many steps in the purchasing flow is bad
• Intra-session vs. inter-session
149
HAPPINESS
o Subjective aspects
• satisfaction, visual appeal, likelihood to recommend,
perceived ease of use
o Survey
o Possibility to track over time
iGoogle (personalised home page)
weekly in-product
survey
major redesign → satisfaction decreases (1…7)
over time → measure recovers
(sign of change aversion) 150
ENGAGEMENT
o Level of involvement
o Behavioral proxies
• Frequency, intensity, depth of interaction over a time period
o Reported as an average and not in total
Gmail example
• PULSE metric: at least one visit per week
• HEART metric: five or more visits in a week, a strong predictor of
long-term retention
151
ADOPTION AND RETENTION
o Adoption: how many new users for a given period
o Retention: percentage of users still active after some given period
o Useful for new applications or those undergoing change
o Should account for seasonal changes and external events
Google Finance (2008 stock market meltdown)
• PULSE: page views ↑, seven-day active users ↑
• HEART (adoption and retention): are new users interested in the
crisis? are current users panicking? are new users staying?
• Outcome: better understanding of event-driven traffic spikes
152
TASK SUCCESS … GOAL-ORIENTED
o Behavioral measures of user experience
• efficiency (e.g. time to complete a task); effectiveness (e.g. percent of
task completed); error rate
• e.g. sending an email; finding a location
o Remote usability on a large scale
o Difficult with standard log data unless an optimal path exists for a
type of task
Google Maps example: a dual search box (“what” / “where”) vs. a
single search box, compared via A/B testing on error rates
153
GOALS – SIGNALS - MEASURES
o Measures
• should relate to one or several goals of the
application/product
• Used to track progress towards that goal
1. articulate the goal(s) of an application/feature
2. identify signals that indicate success
3. build/choose corresponding measures to track
(Rodden, Hutchinson & Fu, 2010)
154
GOALS – SOME TIPS
o What are the goals of the product/features in terms of user
experience (user engagement)?
o What tasks do users need to accomplish?
o What is the redesign cycle trying to achieve?
o Retention or adoption:
• Is it more important to acquire new users or to keep existing ones
more engaged?
o Goal associated with a feature is not the same as goal of the whole
product
o Measures (to be used or developed) should not be used to solely
drive the goals
155
SIGNALS – SOME TIPS
o What is success? What is failure?
o What feelings and perceptions correlate with success
and failure?
o What actions indicate that goals are met?
o Data sources
• logs, surveys, panel of judges
o Sensitive and specific signals
• need to observe some reaction when user experience is
better or worse
• failure often easier to identify than success
• undo event, abandonment, frustration
156
MEASURES – SOME TIPS
o Raw counts need to be normalised
o Ratios, percentages, and averages per user are often
more useful
o Accuracy of metrics
• bots, all important actions recovered
o Keep comparing measures with “conventional”
ones (e.g. comScore matters)
157
(Rodden, Hutchinson & Fu, 2010)
OPEN RESEARCH QUESTIONS
… IN NO PARTICULAR ORDER
o A great deal of emphasis on users and systems, but less evidence
about the role of task, device, and context on user engagement.
o We tend to focus on characteristics of users in the moment of
interaction. But are there individual differences that may predict
the level of engagement that can be achieved?
o Psychophysiological measurement may not be sensitive enough
for measuring “general” or “average” engagement (e.g. News or
Mail sites) … although it will likely bring great insights.
o How to “use” physiological measures – interpreting the data they
generate – is an important area for exploration.
o For any measurement that we “think” may be important (e.g.
cursor vs. relevance), we need to make explicit connections to
engagement.
o Be careful of the WEIRD syndrome
(Western, Educated, Industrialized, Rich, and Democratic)
158
CONCLUSIONS
o We covered a range of self-report, performance and physiological
metrics.
o We focused on different characteristics of measures, including
intra- vs. inter-session; subjective vs. objective; process- vs.
product-based, small- vs. large-scale; and lab vs. field.
Take-Aways
o No one measure is perfect or complete.
o All studies have different constraints.
o More details on methods used in published literature will enhance
communication around UE measures, which will advance study of
UE.
o Need to ensure methods are applied consistently with attention to
reliability.
o More emphasis should be placed on using mixed methods to
improve the validity of the measures.
159
ACKNOWLEDGEMENTS
o Dr. Lalmas work in collaboration with Ioannis
Arapakis, Ricardo Baeza-Yates, Berkant
Cambazoglu, Georges Dupret, Janette Lehmann and
others at Yahoo! Labs.
o Dr. O‟Brien‟s work is supported by the Social
Science and Humanities Research Council (SSHRC)
of Canada and the Networks of Centres of
Excellence Graphics, Animation and New Media
(NCE GRAND) Project (http://www.grand-nce.ca/).
160

More Related Content

Measuring User Engagement
  • 7. ENGAGEMENT IS ON EVERYONE'S MIND http://thenextweb.com/asia/2013/05/03/kakao-talk-rolls-out-plus-friend-home-a-revamped-platform-to-connect-users-with-their-favorite-brands/ http://socialbarrel.com/70-percent-of-brand-engagement-on-pinterest-come-from-users/51032/ http://iactionable.com/user-engagement/ http://www.cio.com.au/article/459294/heart_foundation_uses_gamification_drive_user_engagement/ http://www.localgov.co.uk/index.cfm?method=news.detail&id=109512 http://www.trefis.com/stock/lnkd/articles/179410/linkedin-makes-a-90-million-bet-on-pulse-to-help-drive-user-engagement/2013-04-15 7
  • 8. WHAT IS USER ENGAGEMENT (UE)? (I) o "The state of mind that we must attain in order to enjoy a representation of an action" so that we may experience computer worlds "directly, without mediation or distraction" (Laurel, 1993, pp. 112-113, 116). o "Engagement is a user's response to an interaction that gains, maintains, and encourages their attention, particularly when they are intrinsically motivated" (Jacques, 1996, p. 103). o A quality of user experience that depends on the aesthetic appeal, novelty, and usability of the system, the ability of the user to attend to and become involved in the experience, and the user's overall evaluation of the experience. Engagement depends on the depth of participation the user is able to achieve with respect to each experiential attribute (O'Brien & Toms, 2008). o "…explain[s] how and why applications attract people to use them" (Sutcliffe, 2010, p. 3). 8
  • 9. WHAT IS UE? (II) o User engagement is a quality of the user experience that emphasizes the positive aspects of interaction – in particular the fact of being captivated by the technology (Attfield et al, 2011). The emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource • user feelings: happy, sad, excited, … • user mental states: involved, lost, concentrated… • user interactions: click, read comment, recommend, buy… 9
  • 10. INCREASED EMPHASIS ON MEASURING UE http://www.cbc.ca/news/health/story/2012/12/20/inside-your-brain-neuromarketing.html 10
  • 12. HOW DO WE CAPTURE USER ENGAGEMENT? http://www.businessweek.com/articles/2012-10-12/why-measuring- user-engagement-is-harder-than-you-think 12
  • 13. WHY IS MEASURING UE IMPORTANT? o User engagement is a complex construct o Various approaches have been proposed for measuring engagement, but… • Not enough emphasis on reliability and validity of individual measures, or triangulation of various approaches. o Standardization of what user engagement is and how to measure it will benefit research, design, and users. 13
  • 14. CONSIDERATIONS IN THE MEASUREMENT OF USER ENGAGEMENT o Short term (within session) and long term (across multiple sessions) o Laboratory vs. field studies o Subjective vs. objective measurement o Large scale (e.g., dwell time of 100,000 people) vs. small scale (gaze patterns of 10 people) o UE as process vs. as product One is not better than the other; it depends on the aim. 14
  • 15. SOME CAVEATS (I) o This tutorial assumes that web applications are "properly designed" • We do not look into how to design a good web site (although some user engagement measurements may inform an enhanced design). o This tutorial is based on the "published research" literature • We do not know how each individual company and organization measures user engagement (although we can guess at some common baselines). o This tutorial focuses on web applications that users "choose" to engage with • A web tool that has to be used, e.g. for work purposes, is totally different (users have no choice). o This tutorial is not an "exhaustive" account of all existing work • We focus on work that we came across and that has influenced us; if we have missed something important, let us know. 15
  • 16. SOME CAVEATS (II) o This tutorial focuses on web applications that are widely used by "anybody" on a "large scale" • User engagement in the game industry or in education has different characteristics. o This tutorial does not focus on the effect of advertisements on user engagement • We assume that web applications that display ads do so in a "normal" way, so as not to annoy or frustrate users. o This tutorial looks at user engagement at the web application "level" • Although we use examples and may refer to specific sites or types of applications, we do not focus on any particular application. o This tutorial is not about "how" to influence user engagement 16
  • 17. OUTLINE o Introduction and Scope o Part I - Foundations 1. Approaches based on self-report measures 2. Approaches based on web analytics 3. Approaches based on physiological measures o Part II – Advanced Aspects 1. Measuring user engagement in mobile information searching 2. Networked user engagement 3. Combining different approaches o Conclusions o Bibliography 17
  • 19. CHARACTERISTICS OF USER ENGAGEMENT (I) • Users must be focused to be engaged • Distortions in the subjective perception of time are used to measure it Focused attention (Webster & Ho, 1997; O'Brien, 2008) • Emotions experienced by the user are intrinsically motivating • An initial affective "hook" can induce a desire for exploration, active discovery or participation Positive affect (O'Brien & Toms, 2008) • Sensory, visual appeal of the interface stimulates the user & promotes focused attention • Linked to design principles (e.g. symmetry, balance, saliency) Aesthetics (Jacques et al, 1995; O'Brien, 2008) • People remember enjoyable, useful, engaging experiences and want to repeat them • Reflected in e.g. the propensity of users to recommend an experience/a site/a product Endurability (Read, MacFarlane, & Casey, 2002; O'Brien, 2008) 19
  • 20. CHARACTERISTICS OF USER ENGAGEMENT (II) • Novelty, surprise, unfamiliarity and the unexpected • Appeals to users' curiosity; encourages inquisitive behavior and promotes repeated engagement Novelty (Webster & Ho, 1997; O'Brien, 2008) • Richness captures the growth potential of an activity • Control captures the extent to which a person is able to achieve this growth potential Richness and control (Jacques et al, 1995; Webster & Ho, 1997) • Trust is a necessary condition for user engagement • An implicit contract among people and entities which is more than technological Reputation, trust and expectation (Attfield et al, 2011) • Difficulties in setting up "laboratory" style experiments • Why should users engage? Motivation, interests, incentives, and benefits (Jacques et al., 1995; O'Brien & Toms, 2008) 20
  • 21. FORRESTER RESEARCH – THE FOUR I'S • Presence of a user • Measured by e.g. number of visitors, time spent Involvement • Action of a user • Measured by e.g. CTR, online transactions, uploaded photos or videos Interaction • Affection or aversion of a user • Measured by e.g. satisfaction ratings, sentiment analysis in blogs, comments, surveys, questionnaires Intimacy • Likelihood that a user advocates • Measured by e.g. forwarded content, invitations to join Influence (Forrester Research, June 2008) 21
  • 22. FLOW: THE THEORY OF OPTIMAL EXPERIENCE o What is "Flow"? The state in which people are so involved in an activity that nothing else seems to matter; the experience itself is so enjoyable that people will do it even at great cost, for the sheer sake of doing it (Csikszentmihalyi, 1990, p. 4). o Engagement has been called "flow without user control" and "a subset of flow" (Webster & Ahuja, 2004, p. 8) 22
  • 23. ATTRIBUTES OF FLOW Enjoyment, Focused attention, Absorption, Time perception, Clear goals and feedback, Control (Csikszentmihalyi, 1990) FLOW IN HUMAN COMPUTER INTERACTION (HCI) • The "PAT" – Person, Artefact, Task model (Finneran & Zhang, 2003) • Attributes and predictors of flow with work-based systems (Webster, Trevino & Ryan, 1993) • Relationships between flow and the tasks being performed • Ghani & Deshpande, 1994: work tasks • Pace, 2004: directed and exploratory search tasks 23
  • 24. RELEVANCE OF FLOW TO ENGAGEMENT Flow vs. Engagement: • Feedback from an activity, control during an interaction, appropriate levels of challenge → perceived usability is vital for engagement to be sustained • Focused attention → complete absorption is not necessary; getting "sidetracked" may be acceptable and engaging • Intrinsic motivation → may be extrinsic; may be more fruitful to explore motivations as utilitarian and hedonic • Goal-directed behaviour → have fun, have an experience; see where the road takes me • Emphasis on the individual and task variables → personal and task relevance are important, but characteristics of system and content precipitate engagement (O'Brien, 2008) 24
  • 25. IN THE GAME INDUSTRY Engagement – Engrossment – Total immersion (Brown & Cairns, 2004) (Gow et al, 2010) … not covered in this tutorial … but we should be aware of this line of work. 25
  • 26. MEASURING USER ENGAGEMENT Measures | Characteristics: • Self-reported engagement: questionnaire, interview, report, product reaction cards, think-aloud | Subjective; short- and long-term; lab and field; small-scale; product outcome • Cognitive engagement: task-based methods (time spent, follow-on task), neurological measures (e.g. EEG), physiological measures (e.g. eye tracking, mouse-tracking) | Objective; short-term; lab and field; small-scale and large-scale; process outcome • Interaction engagement: web analytics metrics + models | Objective; short- and long-term; field; large-scale; process outcome 26
  • 27. MEASURES • Ask a user to make some estimation of the passage of time during an activity. Subjective perception of time (Baldauf, Burgarda & Wittmann, 2009) • Involuntary body responses • Gaze behavior, mouse gestures, biometrics (e.g., skin conductance, body temperature, blood volume pulse), facial expression analysis Physiological measures • How well somebody performs on a task immediately following a period of engaged interaction Follow-on task performance (Jennett et al, 2008) • An estimate of the degree and depth of visitor interaction against a clearly defined set of goals • Based on web analytics (e.g. click-through rate, comments posted) Online behaviour • Relate system effectiveness and user satisfaction • Designing user models is an important and active research area Search (evaluation) … a bit more about them 27
  • 28. OUTLINE o Introduction and Scope o Part I - Foundations 1. Approaches based on self-report measures 2. Approaches based on web analytics 3. Approaches based on physiological measures o Part II – Advanced Aspects 1. Measuring user engagement in mobile information searching 2. Networked user engagement 3. Combining different approaches o Conclusions o Bibliography 28
  • 29. PART 1: FOUNDATIONS APPROACHES BASED ON SELF-REPORT MEASURES MEASURING USER ENGAGEMENT 29
  • 30. INTRODUCTION TO SELF-REPORT MEASURES o What are self-report measures? • A type of method commonly used in social science where individuals express their attitudes, feelings, beliefs or knowledge about a subject or situation. o Why consider self-reports? • Emphasize individuals‟ perceptions and subjective experiences of their engagement with technologies. o Self-report methods may be discrete, dimensional, and free response. (Lopatovska & Arapakis, 2011) 30
  • 31. ADVANTAGES OF SELF-REPORT MEASURES o Flexibly applied in a variety of settings o High internal consistency for well-constructed measures o Convenient to administer o Specificity in construct definition o Quantitative self-report measures, i.e., questionnaires • Enable statistical analysis and standardization • Participant anonymity • Administered to individuals or groups • Paper-based or web-based • Function well in large-sample research studies (Fulmer & Frijters, 2009) ✓31
  • 32. DISADVANTAGES OF SELF-REPORT MEASURES o Information processing issues • Interpretation of researchers' questions • Developmental challenges associated with age or cognitive ability o Communication issues • Wording and response options • Rapport between interviewer and interviewee o Construct issues o Reliability and validity issues o Participants' responses • What does the "neutral" category mean? • Over-estimation of behavior frequency • Reliance on recollection (Fulmer & Frijters, 2009; Kobayashi & Boase, 2012) ✗ 32
  • 33. APPROACHES TO STUDYING USER ENGAGEMENT WITH SELF-REPORT MEASURES – OUTLINE o Methods • Interviews • Think aloud/think after protocols • Questionnaires o Examples of employing each method to study engagement o Examples of using self-report methods 33
  • 34. INTERVIEWS o May be structured, semi-structured or unstructured. o The interview schedule. o May be one-on-one or one-to-many (focus groups). o May focus on general or specific events, experiences, or timeframes. http://openclipart.org/detail/173434/interview-by-jammi-evil-173434 34
  • 35. USING INTERVIEWS TO MEASURE USER ENGAGEMENT o Objectives: 1. To develop an operational definition of engagement, and 2. To identify key attributes of engagement. o Who? • 17 online searchers, gamers, learners and shoppers. o Why interviews? o How were the questions formulated? • Grounded in interdisciplinary literature review and theory o What guided the analysis? • Threads of Experience (McCarthy & Wright, 2004) (O‟Brien & Toms, 2008) 35
  • 36. USING INTERVIEWS TO MEASURE USER ENGAGEMENT: OUTCOMES o Developed a process-based model of user engagement. o Identified attributes of engagement: • Aesthetic and sensory appeal, affect, feedback, control, interactivity, novelty, focused attention, motivation, interest. o Mapped attributes to stages in the process model. o Benefit of using interviews. (O‟Brien & Toms, 2008) 36
  • 37. THINK ALOUD/THINK AFTER PROTOCOLS o Think aloud • Verbalization during the human-computer interaction o Think after or stimulated recall • Verbalization after the human-computer interaction o Constructive interaction • Involves two participants verbalizing their thoughts as they interact with each other o Spontaneous and prompted self-report • Participants provide feedback at fixed intervals or at other points defined by the researcher (Branch, 2000; Ericsson & Simon, 1984; Kelly, 2009; Van den Haak, De Jong, & Schellens, 2009) 37
  • 38. THINK ALOUD/THINK AFTER PROTOCOLS: CONSIDERATIONS o Automatic processes are difficult to articulate. o Complex/highly visual interactions may be challenging to remember and/or verbalize. o Think aloud/spontaneous or prompted self-report • Unnatural, interruptive • Increased cognitive load o Think after or stimulated recall: • Relies on memory but attention is less divided • Researcher can draw participants' attention to specific features of the interface, activities, etc. (Branch, 2000; Ericsson & Simon, 1984; Kelly, 2009; Van den Haak, De Jong, & Schellens, 2009) 38
  • 39. USING THINK ALOUD TO STUDY USER ENGAGEMENT WITH EDUCATIONAL MULTIMEDIA o Series of studies with educational multimedia and television advertisements o Think aloud component of the research: • Identified salient aspects of engagement with content and media • Content: Perceptions driven by personal interest • Media: Focus on media preference, presentation, and affordances of control in navigation (Jacques, Preece & Carey, 1995) 39
  • 40. QUESTIONNAIRES o Closed-ended (quantitative) and open-ended (qualitative). o Effect of mode (Kelly et al., 2008). o Scale development and evaluation is a longitudinal process. 40
  • 41. SCALE DEVELOPMENT AND EVALUATION (based on DeVellis, 2003) Theoretical foundation: develop conceptual model and definition. Scale construction – Step 1: Research review; Step 2: Exploratory study (collect data); Step 3: Develop instrument (select pool of items); Step 4: Administer survey, sample 1 (collect data, pre-test). Scale evaluation – Step 5: Data analysis (evaluate scale reliability & dimensionality; develop "purified" scale); Step 6: Administer survey, sample 2 (collect data); Step 7: Data analysis (evaluate scale validity & predictive relationships; final scale) 41
  • 42. QUESTIONNAIRES FOR MEASURING USER ENGAGEMENT o Jacques, 1996 • 13 items • Attention, perceived time, motivation, needs, control, attitudes, and overall engagement o Webster & Ho, 1997 • 15 items • Influences on engagement, including challenge, feedback, control and variety, and • Engagement, including attention focus, curiosity, intrinsic interest, and overall engagement o O'Brien & Toms, 2010 – User Engagement Scale (UES) • 31 items • Aesthetic appeal, novelty, felt involvement, focused attention, perceived usability, and endurability (overall experience) 42
  • 43. USING QUESTIONNAIRES TO STUDY ENGAGEMENT: ROLE OF MEDIA FORMAT: EXAMPLE I Story 1 Story 2 UES + Information Recall Questions Post Session Questionnaire – Attitudes Checklist (Schraw et al. 1998) + Interviews Pre-task Survey Media Condition Video Audio Narrative text Transcript text Participants (n=82) (O‟Brien, 2013) 43
  • 44. ROLE OF FORMAT IN MEDIA ENGAGEMENT: PREPARATION AND SCREENING OF UES Data Screening Reliability of sub-scales Correlation analysis Principal Components Analysis - 12 items - 2 items 31 items 27 items 44
  • 45. PRINCIPAL COMPONENTS ANALYSIS (PCA) OF REMAINING UES ITEMS Component Description No. Items % Variance Cronbach's alpha 1 Hedonic Engagement 12 47.9 0.95 2 Focused Attention 4 11 0.87 3 Affective Usability 4 5.9 0.75 4 Cognitive Effort 2 4.6 0.83 Kaiser-Meyer-Olkin Measure of Sampling Adequacy = 0.89 Bartlett's Test of Sphericity: x2=1621.12(231), p<0.001 45
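The reliability column in the table above is Cronbach's alpha, computed per component from the item ratings. As a rough illustration of how such a figure is obtained, here is a minimal pure-Python sketch of the standard formula; the respondent data in the usage note is invented for illustration and does not reproduce the study's data.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a subscale.

    scores: one row per respondent, each row a list of item ratings.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of totals),
    using sample variances (ddof = 1)."""
    k = len(scores[0])          # number of items in the subscale
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

For example, two perfectly correlated items such as `[[1, 1], [2, 2], [3, 3]]` yield an alpha of 1.0, while uncorrelated items drive alpha toward zero.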
  • 46. FINDINGS FROM THE STUDY Component Story 1: Farming M(SD) Story 2: Mining M(SD) Hedonic Engagement 4.06 (1.3) 5.06 (1.05) Focused Attention 3.3 (1.4) 3.93 (1.3) Affective Usability 4.69 (1.3) 5.6 (0.9) Cognitive Effort 4.19 (1.5) 5.29 (1.3) Relationship between Story and Engagement Component Audio M(SD) Video M(SD) Transcript M(SD) Narrative M(SD) Hedonic Engagement 4.7(1.2) 5(1.1) 3.9(1.4) 4.5(1.2) Focused Attention 3.6(1.4) 3.8(1.4) 3.5(1.4) 3.5(1.5) Affective Usability 5(1.2) 5.4(1.1) 4.9(1.3) 5(1.2) Cognitive Effort 4.5(1.6) 5.5(1.1) 4.1(1.5) 4.8(1.4) Relationship between Media Condition and Engagement 46
  • 47. FINDINGS FROM THE STUDY (CONTINUED) Effect Λ F df(1) df(2) p Story 0.8 05.45 1 98 .001 Condition 0.78 1.81 3 98 .04 Story x Condition 0.92 0.54 3 98 .88 Multivariate Tests for Story and Condition UES Component Effect MS F df(1) df(2) p Hedonic Engagement Story 14.05 9.95 1 98 .002 Focused Attention Story 10.32 4.78 1 98 .031 Affective Usability Story 23.76 17.71 1 98 .000 Cognitive Effort Story 20.02 11.4 1 98 .000 Cognitive Effort Condition 7.23 4.11 3 98 .009 Significant F-tests for Univariate Follow-up 47
  • 48. CONCLUSIONS: MEDIA FORMAT AND ENGAGEMENT o Next steps in data analysis. o Value of screening and examining the reliability and principal component structure of the UES items. o Why performance measures would not be significant in this controlled study. o What was learned about users‟ perceived engagement in this study. 48
  • 49. EMPLOYING MULTIPLE SELF-REPORT METHODS: EXAMPLE II o How the visual catchiness (saliency) of "relevant" information impacts user engagement metrics such as focused attention and emotion (affect) • focused attention refers to concentrating on one thing to the exclusion of other things • affect relates to the emotions experienced during the interaction o Saliency model of visual attention developed by Itti & Koch (2000) 49
  • 50. MANIPULATING SALIENCY Web page screenshot vs. saliency maps, for the salient condition and the non-salient condition (McCay-Peet et al, 2012) 50
  • 51. STUDY DESIGN o 8 tasks = finding the latest news or headline on a celebrity or entertainment topic o Affect measured pre- and post-task using the Positive (e.g. "determined", "attentive") and Negative (e.g. "hostile", "afraid") Affect Schedule (PANAS) o Focused attention measured with the 7-item focused attention subscale (e.g. "I was so involved in my news tasks that I lost track of time", "I blocked things out around me when I was completing the news tasks") and perceived time o Interest level in topics (pre-task) and questionnaire (post-task), e.g. "I was interested in the content of the web pages", "I wanted to find out more about the topics that I encountered on the web pages" o 189 (90+99) participants from Amazon Mechanical Turk 51
  • 52. PANAS (10 POSITIVE ITEMS AND 10 NEGATIVE ITEMS) o You feel this way right now, that is, at the present moment [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely] [randomize items] distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active (Watson, Clark & Tellegen, 1988) 52
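Scoring the PANAS is a simple sum: the ten positive-item ratings and the ten negative-item ratings are summed separately, giving a positive-affect and a negative-affect score that each range from 10 to 50. A sketch, assuming responses arrive as a dict keyed by item name (a representation chosen for this example, not prescribed by the instrument):

```python
# The twenty PANAS items (Watson, Clark & Tellegen, 1988), rated 1-5.
POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]

def panas_scores(ratings):
    """Sum the 1-5 ratings into (positive affect, negative affect),
    each in the range 10-50."""
    pa = sum(ratings[item] for item in POSITIVE)
    na = sum(ratings[item] for item in NEGATIVE)
    return pa, na
```

Comparing pre- and post-task scores (as in the saliency study) then reduces to a subtraction per participant.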
  • 53. 7-ITEM FOCUSED ATTENTION SUBSCALE (PART OF THE 31-ITEM USER ENGAGEMENT SCALE) 5-POINT SCALE (STRONGLY DISAGREE TO STRONGLY AGREE) 1. I lost myself in this news tasks experience 2. I was so involved in my news tasks that I lost track of time 3. I blocked things out around me when I was completing the news tasks 4. When I was performing these news tasks, I lost track of the world around me 5. The time I spent performing these news tasks just slipped away 6. I was absorbed in my news tasks 7. During the news tasks experience I let myself go (O'Brien & Toms, 2010) 53
  • 54. SALIENCY AND POSITIVE AFFECT o When headlines are visually non-salient • users are slow at finding them, report more distraction due to web page features, and show a drop in affect o When headlines are visually catchy or salient • users find them faster, report that it is easy to focus, and maintain positive affect o Saliency helps task performance, focusing/avoiding distraction, and maintaining positive affect 54
  • 55. SALIENCY AND FOCUSED ATTENTION o Adapted focused attention subscale from the online shopping domain to entertainment news domain o Users reported “easier to focus in the salient condition” BUT no significant improvement in the focused attention subscale or differences in perceived time spent on tasks o User interest in web page content is a good predictor of focused attention, which in turn is a good predictor of positive affect 55
  • 56. SELF-REPORTING, CROWDSOURCING, SALIENCY AND USER ENGAGEMENT o Interaction of saliency, focused attention, and affect, together with user interest, is complex. o Using crowdsourcing worked! o What next? • include web page content as a quality of user engagement in focused attention scale • more “realistic” user (interactive) reading experience • other measurements: mouse-tracking, eye-tracking, facial expression analysis, etc. (McCay-Peet, Lalmas & Navalpakkam, 2012) 56
  • 57. CONSIDERATIONS WHEN EMPLOYING SELF-REPORT MEASURES o What is the research question? o What is the most suitable self report method? o How might we use self-report in studies of user engagement? • Gather data explicitly about engagement • Other self-report measures may predict, validate, or enrich other measures of engagement o Why do self-reports get a bad rap? 57
  • 58. PART 1: FOUNDATIONS APPROACHES BASED ON WEB ANALYTICS MEASURING USER ENGAGEMENT 58
  • 60. 8 INDICES o Click Depth Index: page views o Duration Index: time spent o Recency Index: rate at which users return over time o Loyalty Index: level of long-term interaction the user has with the site or product (frequency) o Brand Index: apparent awareness of the user of the brand, site, or product (search terms) o Feedback Index: qualitative information including propensity to solicit additional information or supply direct feedback o Interaction Index: user interaction with site or product (click, upload, transaction) (Peterson et al, September 2008) 60
  • 61. INTRA-SESSION VERSUS INTER-SESSION ENGAGEMENT o Intra-session engagement measures our success in attracting the user to remain on our site for as long as possible o "Long-term engagement can be defined as the degree of voluntary use of a system along a wide period of time…" (Febretti and Garzotto, 2009) o Inter-session engagement can be measured directly or, for commercial sites, by observing lifetime customer value (CTR, etc.) o Some studies (Lehmann et al, 2011) report some correlation between inter- and intra-session measures, for example a weak negative correlation between dwell time and number of active days 61
  • 62. WHY NOT USE INTRA-SESSION MEASURES EXCLUSIVELY? o We seek to have users return to the site again and again, and to perceive the site as beneficial to them o Intra-session measures can easily mislead, especially over a short time (Kohavi et al, 2012): • Consider a very poor ranking function introduced into a search engine by mistake: users issue more queries and clicks to find what they need, so intra-session activity looks like increased engagement • Therefore, bucket testing may provide erroneous results if intra-session measures are used o Hence inter-session (long-term) engagement is the preferred measure 62
  • 63. DEPENDENCY ON USER TYPE (Lehmann et al, 2012) observed that different users engage with sites differently. Users were defined according to the number of days per month that a site is used: • Tourists: 1 day • Interested: 2-4 days • Average: 5-8 days • Active: 9-15 days • VIP: 16 or more days Sites from the Yahoo! network were clustered according to the proportion of users from each group. The figure shows that different sites receive different user types and corresponding usage. 63
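Once each user's active days per month are counted, the grouping above is trivial to operationalize. A sketch; note the VIP boundary is taken here as 16+ days so that the ranges are contiguous (the slide's wording, "more than 16 days", leaves day 16 ambiguous):

```python
def user_type(active_days):
    """Map a user's number of active days in a month to the user
    groups of Lehmann et al. (2012)."""
    if active_days <= 1:
        return "Tourist"
    if active_days <= 4:
        return "Interested"
    if active_days <= 8:
        return "Average"
    if active_days <= 15:
        return "Active"
    return "VIP"       # assumed boundary: 16 or more active days
```

A site's per-group user proportions (the clustering features in the study) are then just a histogram of these labels over its visitors.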
  • 64. DEPENDENCY ON TASK AND WEBSITE o Engagement varies by task. For example, a user who accesses a website to check for emails (a goal-specific task) has different engagement patterns from one who is browsing for leisure. o In one study (Yom-Tov et al, 2013), sessions in which 50% or more of the visited sites belonged to the five most common sites (for each user) were classified as goal-specific. • Goal-specific sessions accounted for 38% of sessions • Most users (92%) have both goal-specific and non-goal-specific sessions. • The average downstream engagement (more later) in goal-specific sessions was 0.16. This is to be contrasted with 0.2 during non-goal-specific sessions. o The dependence on website is clear: news sites will see different engagement patterns than online shopping sites. 64
  • 65. LARGE-SCALE MEASUREMENTS OF USER ENGAGEMENT Intra-session measures Inter-session measures Single site • Dwell time, session duration • Play time (video) • Click through rate (CTR) • Mouse movement • Number of pages viewed (click depth) • Conversion rate (mostly for e-commerce) • Fraction of return visits • Time between visits (inter-session time, absence time) • Number of views (video) • Total view time per month (video) • Lifetime value • Number of sessions per unit of time • Total usage time per unit of time • Number of friends on site (social networks) Multiple sites • Downstream engagement • Revisits 65
  • 66. ANOTHER CATEGORIZATION OF MEASURES o (Lehmann et al, 2012) used a different categorization of measures: • Popularity: Total number of users of a site, number of visits, and number of clicks • Activity: Number of page views per visit, time per visit (dwell time) • Loyalty: Number of days a user visits a site, number of times visited, total time spent o Each of these categories captures a different facet of engagement, and they are therefore not highly correlated with one another … more about this later 66
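All three facets can be computed from the same visit log. A hypothetical sketch for one site over one month, assuming each visit is recorded as a (user, day, pages_viewed, dwell_seconds) tuple; the exact metric definitions in Lehmann et al. (2012) may differ in detail:

```python
from collections import defaultdict

def engagement_facets(visits):
    """Popularity, activity and loyalty facets for one site.

    visits: list of (user_id, day, pages_viewed, dwell_seconds)
    tuples, one per visit, over a fixed time window."""
    users = {v[0] for v in visits}
    days_per_user = defaultdict(set)
    for user, day, _, _ in visits:
        days_per_user[user].add(day)
    n = len(visits)
    return {
        "popularity": {"users": len(users), "visits": n},
        "activity": {
            "click_depth": sum(v[2] for v in visits) / n,   # pages/visit
            "dwell_time": sum(v[3] for v in visits) / n,    # secs/visit
        },
        "loyalty": {
            "active_days": sum(len(d) for d in days_per_user.values()) / len(users),
            "return_rate": n / len(users),                  # visits/user
        },
    }
```

Because popularity counts raw volume while activity and loyalty are per-visit and per-user averages, a site can score high on one facet and low on another, which is why the slide notes the facets are not highly correlated.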
  • 67. DWELL TIME AND OTHER SIMILAR MEASURES o Definition The contiguous time spent on a site or web page o Similar measures Play time (for video sites) o Cons Not clear that the user was actually looking at the site while there Distribution of dwell times on 50 Yahoo! websites 67
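Computing dwell time from raw page-view timestamps first requires splitting a user's activity into sessions. A common convention, assumed here rather than taken from the slides, is a 30-minute inactivity cutoff between sessions:

```python
from datetime import datetime, timedelta

def session_dwell_times(timestamps, gap_minutes=30):
    """Split one user's page-view timestamps on one site into sessions
    separated by inactivity gaps, and return each session's dwell time
    in seconds (first event to last event of the session)."""
    timestamps = sorted(timestamps)
    gap = timedelta(minutes=gap_minutes)
    dwell, start, prev = [], timestamps[0], timestamps[0]
    for t in timestamps[1:]:
        if t - prev > gap:                      # inactivity: close session
            dwell.append((prev - start).total_seconds())
            start = t
        prev = t
    dwell.append((prev - start).total_seconds())
    return dwell
```

Note the caveat from the slide applies here too: the time between the first and last recorded event says nothing about whether the user was actually looking at the site, and a session consisting of a single page view gets a dwell time of zero under this definition.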
  • 68. o Dwell time varies by site type: leisure sites tend to have longer dwell times than news, ecommerce, etc. o Dwell time has a relatively large variance even for the same site DISTRIBUTION OF DWELL TIMES ON 50 YAHOO! SITES (recall tourists, VIP, active … users) 68
  • 69. o User revisits are common in sites which may be browser homepages, or contain content which is of regular interest to users. o Goal-oriented sites (e.g., e-commerce) have lower revisits in the time range observed, meaning that revisit horizon should be adjusted by site. DISTRIBUTION OF USER REVISITS TO A LIST OF YAHOO! SITES (WITHIN SESSION) 69
  • 70. OTHER INTRA-SESSION MEASURES o Clickthrough rate (CTR): number of clicks (e.g., on an ad) divided by the number of times it was shown. o Number of pages viewed (click depth): average number of contiguous pages viewed within a site • Can be problematic if the website is ill-designed. o Number of returns to the website within a session • Useful for websites such as news aggregators, where returns indicate that the user believes there may be more information to glean from the site. o Conversion rate (mostly for e-commerce): fraction of sessions which end in a desired user action (e.g., purchase) • Not all sessions are expected to result in a conversion, so this measure not always informative. However, it has the advantage of being closer to a website manager‟s goal. 70
  • 71. INTER-SESSION ENGAGEMENT MEASURES In general, these are the preferred measures of engagement o Direct value measurement: • Lifetime value, as measured by ads clicked, monetization, etc. o Return-rate measurements: • Fraction of return visits: How many users return for another visit? • Time between visits (inter-session time, absence time) • Number of distinct views (video) o Total use measurements: • Total usage time per unit of time • Number of sessions per unit of time • Total view time per month (video) • Number of friends on site (social networks) 71
  • 72. ABSENCE TIME AND SURVIVAL ANALYSIS Easy to implement and interpret Can compare many things in one go No need to estimate baselines But need lots of data to account for noise (Dupret & Lalmas, 2013) Yahoo! Japan (Answers search) 72
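Absence time lends itself to survival analysis: each gap until a user's next visit is a "time to return", censored if the user had not returned by the end of the observation window. A minimal Kaplan-Meier estimator in pure Python; this is the standard technique, sketched here as an illustration rather than the exact implementation behind the Yahoo! Japan comparison:

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve over absence times.

    durations: time until the user returned (or until observation ended)
    observed:  True if the return was observed, False if censored.
    Returns a list of (time, survival probability) points."""
    pairs = sorted(zip(durations, observed))
    at_risk, surv, curve, i = len(pairs), 1.0, [], 0
    while i < len(pairs):
        t, returns, censored = pairs[i][0], 0, 0
        while i < len(pairs) and pairs[i][0] == t:   # group ties at time t
            if pairs[i][1]:
                returns += 1
            else:
                censored += 1
            i += 1
        if returns:
            surv *= 1 - returns / at_risk
            curve.append((t, surv))
        at_risk -= returns + censored
    return curve
```

Comparing the curves of two user buckets then shows, without estimating a baseline, which condition brings users back sooner, matching the "compare many things in one go" advantage on the slide.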
  • 73. MODELS OF USER ENGAGEMENT BASED ON WEB ANALYTICS … TOWARDS A TAXONOMY Online sites differ concerning their engagement! Games Users spend much time per visit Search Users come frequently and do not stay long Social media Users come frequently and stay long Special Users come on average once News Users come periodically Service Users visit the site when needed 73
  • 74. DATA AND MEASURES Interaction data, 2M users, July 2011, 80 US sites Popularity #Users Number of distinct users #Visits Number of visits #Clicks Number of clicks Activity ClickDepth Average number of page views per visit DwellTimeA Average time per visit Loyalty ActiveDays Number of days a user visited the site ReturnRate Number of times a user visited the site DwellTimeL Average time a user spends on the site 74
  • 75. METHODOLOGY General models Time-based models Dimensions 8 measures weekdays, weekend; 8 metrics per time span #Dimensions 8 16 Kernel k-means with a Kendall tau rank correlation kernel Number of clusters based on the eigenvalue distribution of the kernel matrix Significant metric values with Kruskal-Wallis/Bonferroni #Clusters (Models) 6 5 Analysing cluster centroids = models 75
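The Kendall tau rank correlation used in the kernel above compares how two sites rank the same set of measurements: pairs ordered the same way count as concordant, pairs ordered oppositely as discordant. A pure-Python sketch of the tau-a statistic (no tie correction, an assumption of this example); the kernel matrix for kernel k-means would be filled with such pairwise correlations:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a rank correlation between two equal-length metric
    vectors (e.g. the engagement measures of two sites).
    Returns a value in [-1, 1]; assumes no tied values."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Using a rank correlation rather than raw values makes the kernel insensitive to each metric's scale, which matters when mixing counts (visits) with durations (dwell time).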
  • 76. MODELS OF USER ENGAGEMENT [6 GENERAL] o Popularity, activity and loyalty are independent of each other o Popularity and loyalty are influenced by external and internal factors, e.g. frequency of publishing new information, events, personal interests o Activity depends on the structure of the site interest-specific media (daily) search periodic media e-commerce models based on engagement measures only 76
  • 77. TIME-BASED [5 MODELS] Models based on engagement over weekdays and weekend hobbies, interest-specific weather daily news work-related time-based models ≠ general models 77
  • 78. MODELS OF USER ENGAGEMENT o User engagement is complex and standard metrics capture only a part of it o User engagement depends on time (and users) o First step towards a taxonomy of models of user engagement … and associated measures o What next? • More sites, more models, more measures • User demographics, time of the day, geo-location, etc. • Online multi-tasking (Lehmann et al, 2012) 78
  • 79. ONLINE MULTI-TASKING users spend more and more of their online session multi-tasking, e.g. emailing, reading news, searching for information  ONLINE MULTI-TASKING navigating between sites, using browser tabs, bookmarks, etc seamless integration of social networks platforms into many services leaving a site is not a “bad thing!” (fictitious navigation between sites within an online session) 181K users, 2 months browser data, 600 sites, 4.8M sessions •only 40% of the sessions have no site revisitation •hyperlinking, backpaging and teleporting 79
  • 80. PART 1: FOUNDATIONS APPROACHES BASED ON PHYSIOLOGICAL MEASURES MEASURING USER ENGAGEMENT 80
  • 81. PHYSIOLOGICAL MEASURES o Eye tracking o Mouse movement o Face expression o Psychophysiological measures Respiration, Pulse rate Temperature, Brain wave, Skin conductance, … 81
  • 82. WHAT IS PSYCHOPHYSIOLOGY? o The branch of physiology dealing with the relationship between physiological processes and thoughts, emotions, and behavior. The body responds to psychological processes: we exercise  we sweat; we get embarrassed  our cheeks get red and warm. o Examples of measurements • Electroencephalography (EEG) – measures the electrical activity of the brain through the scalp. • Cardiovascular measures – heart rate (HR); beats per minute (BPM); heart rate variability (HRV); vasomotor activity. • Respiratory sensors – monitor oxygen intake and carbon dioxide output. • Electromyographic (EMG) sensors – measure electrical activity in muscles. • Eye-based measures – changes in pupil diameter with thought and emotion (pupillometry) and eye movements. • Galvanic skin response (GSR) sensors – monitor perspiration/sweat gland activity (also called Skin Conductance Level, SCL). • Temperature sensors – measure changes in blood flow and body temperature. • Functional magnetic resonance imaging (fMRI) – measures brain activity by detecting associated changes in blood flow. 82
  • 83. PSYCHOPHYSIOLOGY – PROS AND CONS Pros o More objective data (not dependent on language, memory) o Can be performed continuously during message/task processing o Can provide information on emotional and attentional responses often not available to conscious awareness Cons o Equipment is expensive and can be cumbersome and obtrusive o Rarely a one-to-one correspondence between specific behaviors and physiological responses o Difficult to operationalize and isolate a psychological construct o Not applicable at large scale http://flavor.monell.org/~jlundstrom/research%20behavior.html 83
  • 84. WHAT IS EYE TRACKING? o Process of measuring either the point of gaze (where one is looking) or the motion of an eye relative to the head. o An eye tracker is a device for measuring eye positions and eye movements. o Used in research on the visual system, in psychology, in cognitive linguistics and in product design. Examples of measures (Lin et al, 2007), for the whole screen or an AOI (area of interest): Time to First Fixation, Fixations Before, First Fixation Duration, Fixation Duration, Total Fixation Duration, Fixation Count, Visit Duration, Visit Count. http://en.wikipedia.org/wiki/Eye_tracking 84
  • 85. EYE TRACKING – ATTENTION AND SELECTION 18 users, 16 tasks each (chose one story and rate it) eye movement recorded Attention (gaze) interest has no role position > saliency Selection mainly driven by interest position > attention (Navalpakkam et al., 2012) 85
  • 86. EYE TRACKING – PROS AND CONS o Pros • Lots of detail (fine-grained data/resolution) • Offers a direct measure of user attention + what they are looking at • Offers insights into how people consume & browse web pages + why they fail at clicking on something o Cons • Not scalable • Slow and expensive • Not a natural environment (e.g. at home) • Behavior can be different in a lab setting Can mouse movement act as a (weak) proxy of gaze? 86
  • 87. WHAT IS MOUSE TRACKING? o Using software (JavaScript) to collect user mouse cursor positions on a computer/web interface o Aim is to provide information about what people are doing, typically to improve the design of an interface o How gaze, as measured by an eye tracker, relates to mouse movement as recorded o Studies and applications • Attention on web pages • Relevance of search results • As a proxy of relevance • As an additional and complementary signal (also known as cursor tracking) 87
  • 88. MOUSE VS GAZE – ATTENTION ON WEB PAGES o 90 users on 6 Yahoo! Finance articles – rich media content o 3 treatments: • ad always on top; ad top right + random; random (6 positions) o Reading tasks + post-questionnaires ad avoidance similar patterns shift of attention from top-left to right as ad position change similar patterns visit ad sooner & more time to process content when ad position moves similar patterns more at top position and longer dwell left better than right Similar patterns between gaze and mouse in terms of user attention when manipulating conditions (here ads) Interesting results for “ads” (Navalpakkam & Churchill, 2012) 88
  • 89. Multimedia search activities often driven by entertainment needs, not by information needs RELEVANCE IN MULTIMEDIA SEARCH (Slaney, 2011) 89
  • 90. CLICK-THROUGH RATE AS PROXY OF RELEVANCE I just wanted the phone number … I am totally satisfied  90
  • 91. GAZE AND CURSOR RELATIONSHIP o Small difference on part of page user attends to (5 users) o Better correlation when cursor moves and when there is lots of movement (23 users + reading instructions) o Search result page • Correlate more along the y-axis than x-axis Correlate more when cursor placed over search results (32 users – 16 search tasks; 10 users and 20 search tasks) BUT 1. Search result page and result page 2. Some factor? (Rodden et al, 2008; Guo & Agichtein, 2010) (Chen et al, 2011; Hauger et al, 2011) 91
  • 92. GAZE VS MOUSE - DISTANCE (Huang, White & Dumais, 2011) 92
  • 93. GAZE VS CURSOR - FACTORS o 38 users and 32 search tasks (navigational + informational) o Age or gender does not seem to be a factor o Task does not seem to be a factor (others found the opposite) (using click entropy to classify a query) o User individual behavior seems to matter more o Gaze leads the cursor o Stronger alignment when the search result page loads o Cursor behaviors: alignment increases inactive < examining < reading < action < click 58.8% 31.9% 2.5% 5.7% (Huang et al, 2012) 93
  • 94. CAN WE PREDICT GAZE? Better prediction when accounting for cursor behaviors and time, in addition to cursor position only (Huang, White & Buscher, 2012) 94
  • 95. CLICK VS CURSOR – HEATMAP o Estimate search result relevance (Bing - Microsoft employees – 366,473 queries; 21,936 unique cookies; 7,500,429 cursor move or click) the role of hovering? (Huang et al, 2011) 95
  • 96. MOUSE MOVEMENT – WHAT CAN HOVERING TELL ABOUT RELEVANCE? Clickthrough rate: % of clicks when URL shown (per query). Hover rate: % of hovers over URL (per query). Unclicked hover: median time a user hovers over a URL without clicking (per query). Max hover time: maximum time a user hovers over a result (per SERP). (Huang et al, 2011) 96
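A sketch of how these hover signals might be aggregated from cursor logs for one result URL. The `impressions` schema and function name are assumptions for illustration, and "median" here is the simple upper-median of the sorted values:

```python
def hover_metrics(impressions):
    """Hover-based relevance signals for one result URL.

    `impressions` is a list of dicts (hypothetical schema), one per
    time the URL was shown: {"hover_secs": [...], "clicked": bool}.
    """
    shown = len(impressions)
    hovered = [imp for imp in impressions if imp["hover_secs"]]
    clicks = sum(1 for imp in impressions if imp["clicked"])
    # longest hover on impressions that were hovered but not clicked
    unclicked = [max(imp["hover_secs"]) for imp in impressions
                 if imp["hover_secs"] and not imp["clicked"]]
    return {
        "click_through_rate": clicks / shown,
        "hover_rate": len(hovered) / shown,
        # upper-median hover time over unclicked hovers
        "unclicked_hover": (sorted(unclicked)[len(unclicked) // 2]
                            if unclicked else 0.0),
        "max_hover_time": max((t for imp in hovered
                               for t in imp["hover_secs"]), default=0.0),
    }
```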
  • 97. MOUSE MOVEMENT – WHAT CAN HOVERING TELL ABOUT ABANDONMENT? o Abandonment (an engagement metric in search) is when there is no click on the search result page • User is dissatisfied (bad abandonment) • User found result(s) on the search result page (good abandonment) o 858 queries manually examined (21% good vs. 79% bad abandonment) o Cursor trail length • Total distance (pixels) traveled by cursor on SERP • Shorter for good abandonment o Movement time • Total time (seconds) cursor moved on SERP • Slower when answers in snippet (good abandonment) o Cursor speed • Average cursor speed (pixels/second) • Slower when answers in snippet (good abandonment) (Huang et al, 2011) 97
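The three cursor features above can be sketched as follows, assuming a hypothetical trail format of `(x, y, t_secs)` cursor samples on the SERP:

```python
import math

def cursor_features(trail):
    """Cursor features used to separate good from bad abandonment.

    `trail` is a list of (x, y, t_secs) cursor samples (hypothetical
    format) in time order.
    """
    # total pixel distance traveled between consecutive samples
    length = sum(math.dist(trail[i][:2], trail[i + 1][:2])
                 for i in range(len(trail) - 1))
    move_time = trail[-1][2] - trail[0][2]
    return {
        "trail_length_px": length,     # shorter -> good abandonment
        "movement_time_s": move_time,  # slower -> answer in snippet
        "speed_px_per_s": length / move_time if move_time else 0.0,
    }
```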
  • 98. RELEVANCE & CURSOR … we recall that in search: o Clickthrough rate (CTR) – in a search result • Ranking bias • Various ways to deal with it, such as "interleaving" • Presentation bias • Perceived relevance from reading the snippet o Dwell time – on landing page (post search result) • Although a good indicator of user interest/relevance, not reliable on its own • Time spent reading a document (result) has been shown to improve search quality • Short dwell time is a good indication of non-relevance • BUT • Interpreting long dwell time is not so straightforward (the user may spend a long time localising the relevant part in a long document!) 98
  • 99. RELEVANCE & CURSOR “reading” cursor heatmap of relevant document vs “scanning” cursor heatmap of non-relevant document (both dwell time of 30s) (Guo & Agichtein, 2012) 99
  • 100. RELEVANCE & CURSOR “reading” a relevant long document vs “scanning” a long non-relevant document (Guo & Agichtein, 2012) 100
  • 101. WHAT WORKS? – PREDICTING RELEVANCE … learning a model with: o Dwell time o Cursor movement • number, total distance traveled (and along x- and y-axis), speed (-), maximal coordinate o Scroll • frequency (-) and speed (-) o Predefined areas of interest (AOI) • Where the main content lies o Actual rank way less informative. In combination, even better. (Guo & Agichtein, 2012) 101
  • 102. FACIAL EXPRESSION AND SEARCH 16 subjects, facial expressions recorded while performing search tasks of various levels of difficulty. A learned model (based on a support vector machine) shows that facial expressions provide good cues on topical relevance. Potential application: personalised relevance feedback based on implicit cues. (Arapakis et al, 2010) 102
  • 103. FACEBOOK AND EMOTIONAL ENGAGEMENT (FLOW) (Lang, 1995; Mauri et al, 2011) valence-arousal plane SC = skin conductance EMG = electromyographic activity Lang model of emotions relaxation (3 min, panorama pictures)  Facebook (3 min, free navigation)  stress (4 min, arithmetic tasks) 30 students 103
  • 104. EMOTION, ENGAGEMENT AND MEASURES o Anticipation: Humans are curious. o Joy: Happy users mean well engaged, repeat users. o Trust: Users want to feel safe when interacting with your site. o More? 104 Plutchik‟s emotion wheel http://uxdesign.smashingmagazine.com/2011/05/19/optimizing-emotional-engagement-in-web-design-through-metrics/
  • 105. OUTLINE o Introduction and Scope o Part I - Foundations 1. Approaches based on self-report measures 2. Approaches based on web analytics 3. Approaches based on physiological measures o Part II – Advanced Aspects 1. Measuring user engagement in mobile information searching 2. Networked user engagement 3. Combining different approaches o Conclusions o Bibliography 105
  • 106. PART 2: ADVANCED ASPECTS MOBILE INFORMATION SEEKING MEASURING USER ENGAGEMENT 106
  • 107. MOBILE USER ENGAGEMENT o Mobile devices are changing the ways in which we are learning, working, and communicating. o The role of device has not been considered in (published) studies of user engagement. o However … related work has been done in the UX literature. 107
  • 108. DIARY STUDIES 1. Komaki et al, 2012 • Context heavily influenced search behavior 2. Nylander et al, 2009 • General preference for using mobile, even when an alternative was available (51% of instances) • Mobile use influenced by: technical ease and functionality, convenience, laziness, and integration with social life and daily activities 3. Church & Smyth, 2009; Church & Oliver, 2011 • Emphasized location and time as key factors in mobile use 108
  • 109. FIELD STUDIES o Oulasvirta et al, 2005 • Attention shifting between the mobile device and the external environment o Gökera & Myrhaugb, 2008 • Context closely tied to perceived relevance and value of information o Battarbee & Koskinen, 2005 • Emotional response of information sharing and communication with friends in everyday life 109
  • 110. BUILDING A MODEL OF ENGAGEMENT BASED ON UX LITERATURE o User experience (UX) literature suggests that: • Users must focus attention on the mobile task and the external environment (Oulasvirta et al., 2005). • 63% of mobile searches were social in nature (Teevan et al., 2011). • Mobile devices with constant connectivity are often "habit-forming" (Oulasvirta et al., 2012). • Time motivates mobile phone use (Tojib & Tsarenko, 2012). Therefore … 110
  • 111. MOBILE USER ENGAGEMENT (model diagram): Usability – ease of use, perceptual speed, integration into everyday activities. Aesthetic appeal – small displays. Status – popularity of device (SOCIAL CONTEXT). Novelty – "checking" for new information, e.g., status updates. Endurability. Focused attention – the external environment places demands on attention (TIME). Felt involvement – may be used casually to "pass time" rather than for sustained interest or specific information needs. Fit between the interaction and context of use. 111
  • 112. STUDYING MOBILE USER ENGAGEMENT (IN PROGRESS) (Absar, O'Brien, Halbert & Trumble, 2013) Point of Engagement  Period of Sustained Engagement  Disengagement  Re-engagement. While conversing about their carbon footprints, Mary and John could not decide which of their cars is more energy-efficient: "Let's look it up!" "We're late for class. Better go!" Upon returning home, John decides to search more about hybrid cars on his computer. 112
  • 113. MOBILE USER ENGAGEMENT: EXPLORATORY STUDY METHODS Interview 1  Mobile Diary Collection Period  Interview 2. TEXT + PHOTO (Photovoice, Wang & Burris, 1997) 113
  • 114. ENGAGEMENT WITH MOBILE APPS o Focused on branded mobile apps, interactive marketing tools o Methodology: identification and analysis of branded apps • 2010 Interbrand Top 100 Global Brands + iTunes app store • Analysis of features and content on the branded app according to: vividness, novelty, motivation, control, customization, feedback, and multiplatforming • Distinguished product and service branded apps o Almost all apps incorporated at least one of the seven engagement attributes: • control (97.2%), customization (85.8%), vividness (78.3%: entire app, 86.8%: entry page), multiplatforming (70.8%), motivation (62.3%), feedback (55.7%), and novelty (11.3%). (Kim, Lin & Sung, 2013) 114
  • 115. PART 2: ADVANCED ASPECTS NETWORKED USER ENGAGEMENT MEASURING USER ENGAGEMENT 115
  • 116. DOWNSTREAM ENGAGEMENT o Basic premises: • The success of a website depends not only on itself, but also on its environment. • This is particularly relevant for companies running networks of properties or services No man is an island, entire of itself or website 116
  • 117. USER BEHAVIOR WITHIN A NETWORK OF SITES 117
  • 118. NETWORKED USER ENGAGEMENT: ENGAGEMENT ACROSS A NETWORK OF SITES  Large online providers (AOL, Google, Yahoo!, etc.) offer not one service (site), but a network of sites  Each service is usually optimized individually, with some effort to direct users between them  Success of a service depends on itself, but also on how it is reached from other services (user traffic)  Users switch between sites within an online session, several sites are visited and the same site is visited several times (online multi-tasking) 118
  • 119. MEASURING DOWNSTREAM ENGAGEMENT (diagram) A user session containing site A; downstream engagement for site A = % of remaining session time after the user leaves site A. 119
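A sketch of the downstream-engagement measure illustrated here, assuming a session is an ordered list of `(site, dwell_secs)` page visits (a simplification of real session logs; the function name is illustrative):

```python
def downstream_engagement(session, site):
    """Fraction of the session's time spent after the user leaves
    `site` for the last time (% remaining session time).

    `session` is an ordered list of (site, dwell_secs) page visits.
    """
    total = sum(d for _, d in session)
    # time elapsed up to and including the last visit to `site`
    last = max(i for i, (s, _) in enumerate(session) if s == site)
    elapsed = sum(d for _, d in session[:last + 1])
    return (total - elapsed) / total
```

For example, a session A(10s), B(20s), A(10s), C(60s) gives site A a downstream engagement of 60%.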
  • 120. DOWNSTREAM ENGAGEMENT o Varies significantly across sites o Exhibits different distributions according to site type o Is not highly correlated with other engagement measures such as dwell time o Optimizing downstream engagement will have little effect on user engagement within that site 120
  • 121. DISTRIBUTION OF DOWNSTREAM ENGAGEMENT SCORES (19.4M sessions, 265,000 users, 50 sites) o Downstream engagement is not highly correlated with intra-site measures of engagement such as dwell time. o Downstream engagement is negatively correlated with inter-session measures such as revisits. 121
  • 122. CLUSTERED DISTRIBUTION OF DOWNSTREAM ENGAGEMENT SCORES (19.4M sessions, 265,000 users, 50 sites) There are different modes of downstream engagement according to site type. There are no obvious characteristics of websites that would indicate their downstream distribution. 122
  • 123. DISTRIBUTION OF DOWNSTREAM ENGAGEMENT TO A LIST OF YAHOO! WEBSITES Varies across and within websites (19.4M sessions, 265,000 users, 50 sites) 123
  • 124. NETWORKED USER ENGAGEMENT o Downstream engagement • Varies significantly across sites • Exhibits different distributions according to site type o Other measures of networked user engagement? o Applications to companies with several services but also to increasing “tightly” connected services (news and social media) o Let us not forget increased online multitasking o Next: Can we quantify the network effect? (Yom-Tov et al., 2013) 124
  • 125. PART 2: ADVANCED ASPECTS COMBINATIONS OF APPROACHES MEASURING USER ENGAGEMENT 125
  • 126. MEASURING USER ENGAGEMENT – WE RECALL Self-reported engagement: questionnaire, interview, report, product reaction cards, think-aloud. Characteristics: subjective; short- and long-term; lab and field; small-scale; product outcome. Cognitive engagement: task-based methods (time spent, follow-on task), neurological measures (e.g. EEG), physiological measures (e.g. eye tracking, mouse tracking). Characteristics: objective; short-term; lab and field; small-scale and large-scale; process outcome. Interaction engagement: web analytics + "data science" metrics + models. Characteristics: objective; short- and long-term; field; large-scale; process outcome. 126
  • 127. COMBINATION OF APPROACHES SEVERAL STUDIES USER ENGAGEMENT 127
  • 128. STUDY I: GAZE AND SELF-REPORTING o News + comments o Sentiment, interest o 57 users (lab-based) o Reading task (114) o Questionnaire (qualitative data) o Record mouse tracking, eye tracking, facial expression, EEG signal (quantitative data) Three metrics: gaze, focus attention and positive affect 128
  • 129. INTERESTING CONTENT PROMOTES USER ENGAGEMENT METRICS o All three metrics: • focused attention, positive affect & gaze o What is the right trade-off? • news is news  o Can we predict? • provider, editor, writer, category, genre, visual aids, …, sentimentality, … o Role of user-generated content (comments) • As a measure of engagement? • To promote engagement? 129
  • 130. LOTS OF SENTIMENT, BUT WITH NEGATIVE CONNOTATIONS! o Positive affect (and interest, enjoyment and wanting to know more) correlates • Positively with sentimentality (lots of emotion) • Negatively with positive polarity (happy news) SentiStrength (from -5 to 5 per word) sentimentality: sum of absolute values (amount of sentiment) polarity: sum of values (direction of the sentiment: positive vs negative) (Thelwall, Buckley & Paltoglou, 2012) 130
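The two features can be sketched directly from per-word scores. Note that real SentiStrength reports positive/negative scores per text, so treating scores as per-word values here follows the slide's simplification:

```python
def sentiment_features(word_scores):
    """Given per-word sentiment scores in [-5, 5], compute the
    two features used above."""
    return {
        # amount of sentiment, regardless of direction
        "sentimentality": sum(abs(s) for s in word_scores),
        # direction of sentiment: positive vs negative
        "polarity": sum(word_scores),
    }
```

A text can thus be highly sentimental ("lots of emotion") while having near-zero or negative polarity, which is exactly the pattern the slide reports for engaging news.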
  • 131. EFFECT OF COMMENTS ON USER ENGAGEMENT o 6 rankings of comments: • most replied, most popular, newest • sentimentality high, sentimentality low • polarity plus, polarity minus o Longer gaze on • newest and most popular for interesting news • most replied and high sentimentality for non-interesting news o Can we leverage this to prolong user attention? 131
  • 132. GAZE, SENTIMENTALITY, INTEREST o Interesting and “attractive” content! o Sentiment as a proxy of focus attention, positive affect and gaze? o Next • Larger-scale study • Other domains (beyond daily news!) • Role of social signals (e.g. Facebook, Twitter) • Lots more data: mouse tracking, EEG, facial expression (Arapakis et al., 2013) 132
  • 133. STUDY II: MOUSE TRACKING AND SELF-REPORTING o 324 users from Amazon Mechanical Turk (between-subjects design) o Two domains (BBC News and Wikipedia) o Two tasks (reading and search) o "Normal vs Ugly" interface o Questionnaires (qualitative data) • focused attention, positive affect, novelty, • interest, usability, aesthetics • + demographics, handedness & hardware o Mouse tracking (quantitative data) • movement speed, movement rate, click rate, pause length, percentage of time still 133
  • 134. “Ugly” vs “Normal” Interface (BBC News) 134
  • 135. “Ugly” vs “Normal” (Wikipedia) 135
  • 136. MOUSE TRACKING CAN TELL ABOUT o Age o Hardware • Mouse • Trackpad o Task • Searching: There are many different types of phobia. What is Gephyrophobia a fear of? • Reading: (Wikipedia) Archimedes, Section 1: Biography 136
  • 137. MOUSE TRACKING COULD NOT TELL MUCH ON o focused attention and positive affect o user interests in the task/topic o BUT BUT BUT BUT • “ugly” variant did not result in lower aesthetics scores • although BBC > Wikipedia  BUT – the comments left … • Wikipedia: “The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background.”; “The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience.” • BBC News: “The website's layout and color scheme were a bitch to navigate and read.”; “Comic sans is a horrible font.” 137
  • 138. MOUSE TRACKING AND USER ENGAGEMENT o Task and hardware o Do we have a Hawthorne Effect??? o “Usability” vs engagement • “Even uglier” interface? o Within- vs between-subject design? o What next? • Sequence of movements • Automatic clustering (Warnock & Lalmas, 2013) 138
  • 139. STUDY III: SELF-REPORT AND BEHAVIOURAL DATA o Information Visualization System • McGill Library Catalogue: Engineering Subject Area • Version 1: visualization • Version 2: visualization + audio o Participatory Design Study o Experiment • n=24 engineering students • Tasks: six information retrieval and hierarchical navigation tasks • Data collected: self-report and performance metrics (Absar, 2012) 139
  • 140. FINDINGS o No difference in performance accuracy or time on task o Aesthetics and perceived usability were rated higher for the audio-visual system. o Perceived ease of use was also rated higher for the audio-visual system. o Open-ended comments offered insights into participants' perceptions and interactions. 140
  • 141. STUDY IV: ONLINE NEWS INTERACTIONS http://www.cbc.ca/news/ (O'Brien & Lebow, in press) 141
  • 142. SELF-REPORT, BEHAVIOR AND PHYSIOLOGICAL DATA: MEASURES o Pre-task questionnaire • Demographics + news behaviours o Interaction with website • Performance: time on task, reading time, browsing time, number of pages visited within site, whether participants clicked on links to recommended content • Physiological: heart rate (HR), electrodermal activity (EDA), electromyogram (EMG) [subset of participants] o Post-session questionnaire • User Engagement Scale (UES) (O'Brien & Toms, 2010) • Cognitive Absorption Scale (CAS) (Agarwal & Karahanna, 2000) • System Usability Scale (SUS) (Brooke, 1996) o Think-After Interview • Questions about the items selected for the task • Questions about overall experience 142
  • 143. SELF-REPORT, BEHAVIOR AND PHYSIOLOGICAL DATA: RESULTS o Self-report UES, CAS and SUS • Positive correlations support criterion validity of the measures • Designation of "low," "medium" and "high" scores for each group based on the median • All questionnaires were positively correlated with aggregate interest in the articles o Correlations between UES and physiological data: HR −0.38, EDA −0.25, EMG −0.21 143
  • 144. SELF-REPORT, BEHAVIOR AND PHYSIOLOGICAL DATA: RESULTS o UES and Behavioural Data • Use of links: UES scores were not significantly different between those who clicked on links (M=3.8, SD=0.95) and those who did not (M=4.29, SD=0.52); U(1)=51.5, p=0.15 • By UES group, M(SD) for High / Medium / Low, with Kruskal-Wallis χ² and p: Reading time 6:03 (2:34) / 6:05 (1:56) / 6:56 (3:29), χ²=1.15, p=0.56; Browsing time 4:03 (2:29) / 5:17 (3:49) / 7:29 (4:09), χ²=3.98, p=0.13; Total time 10:07 (3:37) / 11:23 (5:10) / 14:26 (5:02), χ²=5.09, p=0.07; # pages visited 9.5 (5.0) / 10.3 (3.6) / 16.3 (8.4), χ²=3.89, p=0.14 144
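The Kruskal-Wallis statistic reported here compares the ranks of an observed measure across the low/medium/high UES groups. A minimal sketch without external libraries (it omits the tie correction that statistics packages apply):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction): tests whether
    several groups of observations share the same distribution.

    `groups` is a list of lists of numeric observations.
    """
    # pool all observations, remembering which group each came from
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    n = len(pooled)
    return (12.0 / (n * (n + 1))
            * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
            - 3 * (n + 1))
```

With ties present (as in real timing data), a corrected implementation such as `scipy.stats.kruskal` should be preferred.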
  • 145. THINK-AFTER INTERVIEW o Did participants‟ experiences with online news fit the process model of user engagement (O‟Brien & Toms, 2008)? o What attributes of user engagement were significant to participants in the online news environment? • Novelty, affect, usability, personal interest and relevance o Evidence of two types of engagement (O‟Brien, 2011) • Content engagement • Interface engagement 145
  • 146. OUTLINE o Introduction and Scope o Part I - Foundations 1. Approaches based on self-report measures 2. Approaches based on web analytics 3. Approaches based on physiological measures o Part II – Advanced Aspects 1. Measuring user engagement in mobile information searching 2. Networked user engagement 3. Combining different approaches o Conclusions o Bibliography 146
  • 148. AN EXAMPLE OF PUTTING IT ALL TOGETHER: HEART FRAMEWORK o Happiness • Satisfaction o Engagement o Adoption o Retention o Task success • Efficiency and effectiveness large-scale behavioral data • Based on experience in working with user- centered products • Not all measures appropriate to all products HEART framework is “more” about user experience (Rodden, Hutchinson & Fu, 2010) 148
  • 149. PULSE MEASURES … THE OLD WAY o Page views • An increase may mean an increase in popularity, or users getting lost o Uptime • Outage is bad o Latency • Slow is bad o Seven-day active users • Number of users who used the application at least once in a week • Does not differentiate between new and returning users o Earnings • Too many steps in the purchasing flow is bad • Intra-session vs. inter-session 149
  • 150. HAPPINESS o Subjective aspects • satisfaction, visual appeal, likelihood to recommend, perceived ease of use o Survey o Possibility to track over time iGoogle (personalised home page) weekly in-product survey major redesign  satisfaction decreases (1…7) over time  measure recovers (sign of change aversion) 150
  • 151. ENGAGEMENT o Level of involvement o Behavioral proxies • Frequency, intensity, depth of interaction over a time period o Reported as an average and not in total GMAIL example: at least one visit per week (PULSE) vs. five or more visits in a week (HEART) – a strong predictor of long-term retention 151
  • 152. ADOPTION AND RETENTION o Adoption: how many new users for a given period o Retention: percentage of users still active after some given period o Useful for new applications or those undergoing change o Should account for seasonal changes and external events Google Finance (stock market meltdown 2008) page view  seven-day  Adoption Retention PULSE HEART new users interested in the crisis?? current users panicking? new users staying? better understanding of event-driven traffic spikes 152
  • 153. TASK SUCCESS … GOAL-ORIENTED o Behavioral measures of user experience • efficiency (e.g. time to complete a task); effectiveness (e.g. percent of task completed); error rate • e.g. sending an email; finding a location o Remote usability on a large scale o Difficult with standard log data unless an optimal path exists for a type of task Google map dual box for search - what - where single search box A/B testing Error rates 153
  • 154. GOALS – SIGNALS – MEASURES o Measures • should relate to one or several goals of the application/product • are used to track progress towards that goal 1. articulate the goal(s) of an application/feature 2. identify signals that indicate success 3. build/choose corresponding measures to track (Rodden, Hutchinson & Fu, 2010) 154
  • 155. GOALS – SOME TIPS o What are the goals of the product/features in terms of user experience (user engagement)? o What tasks users need to accomplish? o What is the redesign cycle trying to achieve? o Retention or adoption: • Is it more important to acquire new users or to keep existing ones more engaged? o Goal associated with a feature is not the same as goal of the whole product o Measures (to be used or developed) should not be used to solely drive the goals 155
  • 156. SIGNALS – SOME TIPS o What is success? What is failure? o What feelings and perceptions correlate with success and failure? o What actions indicate that goals are met? o Data sources • logs, surveys, panel of judges o Sensitive and specific signals • need to observe some reaction when user experience is better or worse • failure often easier to identify than success • undo event, abandonment, frustration 156
  • 157. MEASURES – SOME TIPS o Raw counts need to be normalised o Ratios, percentages, and averages per user are often more useful o Accuracy of metrics • bots, all important actions recovered o Keep comparing measures with "conventional" ones (e.g. comScore matters) 157 (Rodden, Hutchinson & Fu, 2010)
  • 158. OPEN RESEARCH QUESTIONS … IN NO PARTICULAR ORDER o A great deal of emphasis on users and systems, but less evidence about the role of task, device, and context on user engagement. o We tend to focus on characteristics of users in the moment of interaction. But are there individual differences that may predict the level of engagement that can be achieved? o Psychophysiological measurement may not be sensitive enough for measuring "general" or "average" engagement (e.g. News or Mail sites) … although it will likely bring great insights. o How to "use" physiological measures – interpretation of the data generated – is an important area for exploration. o For any measurement that we "think" may be important (e.g. cursor vs. relevance), we need to make explicit connections to engagement. o Be careful of the WEIRD syndrome (Western, Educated, Industrialized, Rich, and Democratic). 158
  • 159. CONCLUSIONS o We covered a range of self-report, performance and physiological metrics. o We focused on different characteristics of measures, including intra- vs. inter-session; subjective vs. objective; process- vs. product-based, small- vs. large-scale; and lab vs. field. Take-Aways o No one measure is perfect or complete. o All studies have different constraints. o More details on methods used in published literature will enhance communication around UE measures, which will advance study of UE. o Need to ensure methods are applied consistently with attention to reliability. o More emphasis should be placed on using mixed methods to improve the validity of the measures. 159
  • 160. ACKNOWLEDGEMENTS o Dr. Lalmas work in collaboration with Ioannis Arapakis, Ricardo Baeza-Yates, Berkant Cambazoglu, Georges Dupret, Janette Lehmann and others at Yahoo! Labs. o Dr. O‟Brien‟s work is supported by the Social Science and Humanities Research Council (SSHRC) of Canada and the Networks of Centres of Excellence Graphics, Animation and New Media (NCE GRAND) Project (http://www.grand-nce.ca/). 160

Editor's Notes

  1. THUS: User engagement is a high priority for users, but also those who develop and design information systems and applications.
  2. Here it is one slide simply making the point that we need to measure well as it can be easy to interpret something like CTR wrongly. The slide is only about an example here.
  3. We see interest in the concept everywhere we go.
  4. Take this recent story from CBC online news. The premise here is to use eye tracking goggles to monitor customers attention to products; an electroencephalogram (EEG) cap with electrodes monitors the brain response, including positive and negative valence (“emotion”)
  5. Web search companies are examining a host of metrics to try and capture user engagement. Some of the things they are investigating are dwell time, depth of navigation within sites, return visits, etc. They are trying to find patterns of engagement within this data.
  6. Discusses the use of social media sites, like Facebook and Twitter, but how a lot of traffic is coming from other sources, like email and chat messages. This is heightened by mobile use, where "links are passed from social network to apps to chat to e-mail, and tracking them quite quickly becomes almost impossible." Criticizes analytics programs, saying the data collected in one program doesn't match that which is collected in others or on publishers' websites.
  7. User engagement is a high priority for users, but also those who develop and design information systems and applications. User engagement is a complex construct, and is influenced by user, system, task, and contextual variables. Standardization around what user engagement is and how to measure it will benefit the research and design communities, and ultimately the users.
  8. Large scale (e.g., dwell time of all visitors to a website in a specified time frame) versus small scale (gaze patterns of 10 participants). I think a potentially good point to make here is that large-scale studies are not necessarily better or more important than small-scale studies. It depends largely on what we are measuring. For example, if we are studying a behaviour that is common and consistent across people, e.g., an imitation phenomenon, then it makes more sense to do a small-scale but exhaustive study. If we are studying a behaviour that exhibits considerable variance across people, then it makes sense to study a larger sample of the population and capture as much of that variance as possible.
  9. Another characterization from the white paper/consultancy world, but it also fits what we discussed before.
  10. Originally, flow was developed by watching people engaged in activities such as creating art, meditating, reading, etc. across a variety of cultures. Over the past several years it has been applied to exploring users' reactions to and motivation for using specific web applications based on task and other situational factors (Ghani & Deshpande, 1994). Webster and Ahuja (2004) argue that engagement is passive compared to flow; I don't agree that engagement is passive. To me the main difference is in the characteristics of flow and engagement, particularly with respect to focused attention and motivation.
  11. Characteristics of flow: enjoyment; ability to focus attention on the task at hand; absorption/less awareness of external distractions; loss of the perception of time passing; the activity provides clear goals and feedback; the person feels in control. Flow theory is all about creating meaning for individuals during the activity, a process brought about when feelings, thoughts, and actions are in concert with one another and with intentions, derived from social or biological influences.
  12. This slide is just a reference to work done in the game industry; we indeed don’t focus on this, but we acknowledge the work being done there.
  13. Self-report: long-term engagement is harder to measure because participants need to agree to be interviewed, etc., multiple times / return for multiple studies. Lab and field: studies may be experimental, conducted through the web, or in participants' natural contexts. Process: really difficult to get at with self-report, which usually relies on people's perceptions of the experience at the end of a study/task or upon reflection. Cognitive engagement: talk about scalability issues.
  14. In this section we describe the method and give an example of how it has been used to study engagement. We then end this section with examples of two studies that use multiple self-report methods.
  15. Will talk through the process of developing the UES using this diagram.
  16. Approximately 20 people in each condition; 80 valid cases
  17. Is mouse movement correlated with dwell time?
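One quick way to probe this question is to correlate the two signals per page view. The following is a minimal plain-Python sketch; the log values are invented for illustration (real numbers would come from client-side mouse and dwell-time logging):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed in plain Python."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-page-view measurements: total mouse travel (pixels)
# and dwell time (seconds). These values are invented for illustration.
mouse_travel = [1200, 3400, 800, 5100, 2600, 4300, 1500, 3900]
dwell_time = [15, 42, 9, 63, 30, 55, 18, 47]

r = pearson_r(mouse_travel, dwell_time)
print(f"Pearson r = {r:.2f}")
```

A strong positive r would support using mouse activity as a cheap proxy for attention; in practice one would also test for significance and control for page length.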
  18. Komaki et al., 2012: 4-week data collection period; participants recorded information about their mobile search activities using an online questionnaire. Participant searches (183 in total) were analyzed by context/location (home, office, commute, etc.) and information needs (trivia, business hours, bus timetables, weather, etc.), as well as the success rate of searches and the reasons cited by participants for failures; context influenced the information need, but also query formulation and search success. Nylander et al., 2009: one-week reporting period; in addition to keeping a diary of mobile searches, participants were asked to save examples of web pages they interacted with on their computers but NOT their mobile phones. Church and colleagues: both studies were four-week diary studies. Church and Oliver, 2011: 70% of mobile use occurred in the home, workplace or other non-mobile location; importance of being able to integrate the phone into everyday life activities and minimize disruption to these activities. Church & Oliver, 2009: 30% of searches were geographical or location dependent; many searches were also temporally dependent even if they didn't feature explicit references to time in the queries.
  19. Oulasvirta et al., 2005: various mobile conditions paired with search tasks; the search tasks "consisted of retrieving a piece of information from a given website and reporting that to the experimenter (e.g., 'Find your favorite item from today's menu at the University restaurant')". Findings: "Attention dwelled in the environment much more while outdoors than indoors, the difference between the laboratory and the busy street being almost tenfold"; social tasks (like conversation) also required more attention. The study also found that the mobile context often took precedence over the task at hand, and that participants were not able to ignore their environment even when asked to prioritize tasks. Göker & Myrhaug, 2008: user studies situated in a field experiment testing a mobile tourist app; participants were given information tasks while walking outdoors in a tourist area of Seville, and asked to provide 'relevance judgements' of information that was provided via the system's content tags; temporal context: e.g., restaurant info tied to meal time. Battarbee & Koskinen, 2005: "several groups of friends exchanged multimedia messages with each other for about five weeks" using mobile phones; emotional experience of sharing and communicating experiences.
  20. We consider previous attributes of engagement, summarized from previous literature and presented/confirmed in O'Brien & Toms, 2008 and 2010, as they relate to mobile use characteristics, and we propose Time and Social Contexts as unique affordances of mobile devices specified in the UX literature that may be important for UE.
  21. Explain more about what photovoice is and why we are using it in this study
  22. Branded mobile apps: "software downloadable to a mobile device which prominently displays a brand identity, often via the name of the app and the appearance of a brand logo or icon, throughout the user experience" (Bellman et al., 2011, p. 191; as cited in Kim et al., 2013). The authors looked at top global brands and searched iTunes to locate whether each company had an app; only free apps were included in the study, and 68 companies with branded apps were found. Excluded were branded apps by major media organizations (e.g., MTV, Sony Pictures), the music entertainment industry (e.g., Sony Entertainment), and the software/portal services category (e.g., Google, Yahoo, Apple), because these mobile apps were alternate versions of already popular channels/portals; also excluded were apps that required logins or user accounts. "This three-step process yielded a total of 106 branded apps (from 50 brands across 11 business categories)." Engagement attributes, common observations: use of recommendations; saving of personal information ("my" store); location-specific information based on the person's geographical position; vividness commonly conveyed using images; ability to share information and connect to social media sites. p. 63: "From a practical perspective, the implications of this study may be useful in understanding how current branded apps incorporate mobile technologies (e.g., location-based service, bar code, smartphone motion sensor) to engage their consumers."
  23. Site = service
  24. Reporting here on the final experiment in the thesis. Conducted a participatory design study to select and refine the sounds used in the audio version of the system, and a pre-test with 6 people to test the tasks, procedures and instruments. 24 engineering students used both systems (in counterbalanced order) to complete six information retrieval and hierarchical navigation tasks. Self-reported engagement: used a subset of the UES (11 items) that the researcher felt was most applicable to the evaluation of the system.
  25. Aesthetics and Perceived Usability were higher for the audio-visual system. Novelty and Endurability were not significantly different between the two systems. Perceived ease of use was also rated higher for the audio-visual system. Open-ended comments revealed that some people wanted more time to become accustomed to the sounds, while others wanted the sounds to be continuous (less distracting).
  26. 30 people interacted with an online news website. They were given a simulated task scenario (Borlund, 2003) that was social in nature. The scenario stated: "You will be attending a social gathering later that evening where you may not know many people. You decide to browse CBC news for items that might help you make conversation. You decide that three items will give you enough to talk about." Quasi-experimental procedure: 1. Complete a demographics and news browsing questionnaire; 2. Interact with the website; 3. Complete a post-session questionnaire (User Engagement Scale; Cognitive Absorption Scale [Agarwal & Karahanna, 2000]; System Usability Scale [Brooke, 1996]); 4. Talk-after interview facilitated by Morae.
  27. 1. Complete a demographics and news browsing questionnaire; 2. Interact with the website; 3. Complete a post-session questionnaire (User Engagement Scale; Cognitive Absorption Scale [Agarwal & Karahanna, 2000]; System Usability Scale [Brooke, 1996]). The CAS is a multidimensional scale and consists of five dimensions: temporal dissociation (TD), focused immersion (FIM), heightened enjoyment (HE), control (CO), and curiosity (CU) (Agarwal & Karahanna, 2000); the SUS is a unidimensional scale designed to assess the usability of a computer system or application. 4. Talk-after interview facilitated by Morae: participants identified the three news items they selected for the task, described what they were about and why they chose them, and rated them for interest, intellectual stimulation, and willingness to share; they were also asked questions pertaining to their overall experience during the session, i.e., what they liked and did not like. A subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, and electromyogram over the corrugator supercilii muscle).
  28. Self-report measures: good criterion validity, but we need to look at discriminant validity. With a larger sample size, we could have performed factor or confirmatory path analysis to determine which components of each scale overlap. No significant relationships between UES scores and physiological data. This may be because we tracked physiology over a lengthy period of time (12-22 minutes, e.g.), so the number of readings captured for each person differed and the value used is an average; we were not looking at specific time periods in the browsing session, which might be more fruitful for engagement.
  29. 23 people did not click on links. There were no significant differences in reading, browsing, or total time for the UES. However, you can see that those in the low group spent more time reading, browsing and overall than the other two groups, and visited more pages. Normally we would consider more time spent a GOOD thing and an indicator that someone was engaged. Interestingly, when we look at these measures for COGNITIVE ABSORPTION, we see more time spent reading, browsing and overall for the HIGH and LOW groups, and these differences are statistically significant. SUS was statistically significant for clicking on links, with higher usability reported by those who did not click on links to other content. The results raise questions about the self-report measures (e.g., discriminant validity issues, but also whether the UES is not registering because it is measuring both utilitarian and hedonic qualities of experience), and also about the performance measures best suited to studying engagement. Since the number of pages visited was highest for those in the low UES, CAS, and SUS groups, does depth of exploration on a site mean the user is engaged? Perhaps not in an experimental setting. Also, how does time relate to engagement? In this case more time was expected to be associated with more enjoyment, but this was not the case.
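The group comparisons described in this note (time measures across low/medium/high engagement groups) come down to a one-way ANOVA. Here is a minimal plain-Python sketch of the F statistic, using invented session times; a p-value would additionally require the F distribution (e.g., `scipy.stats.f`):

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of independent samples."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical total session times (seconds) per engagement group.
low = [820, 910, 760, 880, 930]     # low self-reported engagement
medium = [640, 700, 590, 720, 660]
high = [610, 680, 570, 650, 700]

f_stat = one_way_anova_f([low, medium, high])
print(f"F(2, 12) = {f_stat:.2f}")
```

With real data, a large F (relative to the critical value for the group and error degrees of freedom) would indicate that time-on-task genuinely differs across engagement levels rather than by chance.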
  30. Process model: the point of engagement was initiated by the organization and aesthetic appeal of the news website. Maintaining engagement depended on the usability and presentation of the website, but also on participants' interest in the content. Disengagement occurred when participants became frustrated with navigation, or when the aesthetic and interactive features of the website interfered with reading. Novelty of content also played a role here. By asking participants to talk about the whole experience, rather than only the aspects that were engaging, we were able to see what facilitated and deterred engagement with online news. By leaving the questions about what was engaging and non-engaging open-ended, we were able to learn about participants' perceptions of the website and the news content. This led us to hypothesize that there may be two types of user engagement, one driven by content and the other by the interface, which may be salient for people based on personality traits or context. The study leaves us wondering about how different measures of engagement fit together; the interview data provided a lot of richness and indicates that self-report measures may want to pick up on the two types of engagement and how these relate to behavioural and physiological measures.