
Fundamentals of HCI

1st Semester, 2023

Lecture 11

Chapter 8: User Interface Evaluation

Dr. Wajanat Rayes


Information Science Department

Lecture Outline
• Evaluation Criteria
❖ Usability (Quantitative and Qualitative Measures)

❖ UX

• Evaluation Methods
❖ Focus Interview/Observation Study

❖ Expert/Heuristic Reviews

❖ Measurement

❖ Surveys

• Safety and Ethics in Evaluation

• Homework 8
Introduction
• Designers can become so entranced with their creations that they may
fail to evaluate them adequately.

• Experienced designers have attained the wisdom and humility to know that
extensive testing is a necessity.

• Even if the design began with HCI considerations, there will still be holes:

– Design is an iterative process anyway

– Priorities change

Evaluation Criteria

When evaluating the interaction model and interface, there are largely two
criteria: usability and user experience (UX).

Evaluation Criteria: Usability

Usability:

• Usability refers to the ease of use and learnability of the user interface.

• Usability can be measured in two ways, quantitatively or qualitatively.

https://www.nngroup.com/articles/quant-vs-qual/

Evaluation Criteria: Usability - Quantitative Measures

Quantitative (quant)
• Quantitative measures offer an indirect assessment of the usability of a design.
They can be based on users’ performance on a given task (e.g., task-completion
times, success rates, number of errors) or can reflect participants’ perception of
usability (e.g., satisfaction ratings).

Evaluation Criteria: Usability - Quantitative Measures

• Popular choices of quantitative measure are:


❖ Task completion time
❖ Task completion amount in a unit time (e.g. score)
❖ Task error rate

• For example, suppose we would like to test a new motion-based interface for a
smartphone game. We could have a pool of subjects play the game using both the
conventional touch-based interface and the newly proposed motion-based one,
then compare the scores to assess the comparative effectiveness of the new
interface.

• The underlying assumption is that task performance is closely correlated with
usability (ease of use and learnability). However, such an assumption is quite
arguable. In other words, task performance measures, while quantitative, only
reveal the aspect of efficiency, or merely the aspect of ease of use, not
necessarily the entire usability.
Evaluation Criteria: Usability - Quantitative Measures

• The aspect of learnability should be, and can be, assessed in a more explicit
way, by measuring the time and effort (e.g. memory load) required for users to
learn the interface.

• The problem is that it is difficult to gather a “homogeneous” pool of subjects
with similar backgrounds.

• Learnability generally involves many more biasing factors, such as
educational/experiential/cultural background, age, gender, etc.

• In practice, quantitative measurements cannot be applied to all the possible
tasks for a given application and interface. Usually, only a few representative
tasks are chosen for evaluation, which sometimes makes the evaluation only
partial.

• To complement the shortcomings of quantitative evaluation, qualitative
evaluations are often conducted together with the quantitative analysis.
Evaluation Criteria: Usability – Qualitative Measures

Qualitative (qual)
– Consists of observational findings that identify design features that are easy or
hard to use.

– Offers a direct assessment of the usability of a system: researchers observe
participants struggle with specific UI elements and infer which aspects of the
design are problematic and which work well. They can always ask participants
follow-up questions and change the course of the study to gain insight into the
specific issue that a participant experiences.

NASA TLX (Task Load Index) is one of the often-used semi-standard questionnaires
for this purpose [NASA]. The NASA Task Load Index method assesses workload on
7-point scales; increments of high, medium, and low estimates for each point result
in 21 gradations on the scale.

[Figure: excerpts from the IBM Usability Questionnaire for computer systems [IBM].]
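To make the workload computation concrete, here is a minimal sketch of the
standard NASA TLX weighted scoring. The six subscale names are the standard ones;
the ratings and weights below are hypothetical.

    # Minimal sketch of NASA TLX weighted scoring (hypothetical ratings and
    # weights). Each of the six subscales is rated on a 21-gradation scale,
    # commonly scored 0-100 in steps of 5; weights come from 15 pairwise
    # comparisons of the subscales and sum to 15.
    ratings = {
        "mental demand": 70, "physical demand": 20, "temporal demand": 55,
        "performance": 40, "effort": 65, "frustration": 50,
    }
    weights = {
        "mental demand": 4, "physical demand": 1, "temporal demand": 3,
        "performance": 2, "effort": 3, "frustration": 2,
    }
    assert sum(weights.values()) == 15  # one tally per pairwise comparison

    overall = sum(ratings[k] * weights[k] for k in ratings) / 15
    print(f"overall workload: {overall:.1f} / 100")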
Evaluation Criteria: UX

• There is no precise definition of UX. It is generally accepted that user
experience is total in the sense that it is not just about the interface but also
about the whole product/application, and it even extends to the product family
(such as the Apple products or MS Office).

• It is also deeply related to the user’s emotions and perceptions that result from
the actual or anticipated use of the application (through the given interface).

• Such affective response is very much dependent on the context of use.

Evaluation Criteria: UX – Cont.

• Thus, UX evaluation involves a more comprehensive assessment of emotional
response, under a variety of usage contexts and across a family of
products/applications/interfaces.

• A distinction can be made between usability methods, which have the objective
of improving human performance, and user experience methods, which have the
objective of improving user satisfaction with achieving both pragmatic and
hedonic goals.

• Note that the notion of UX encompasses usability, that is, “usually” high UX
translates to high usability and high emotional attachment.

User Experience

[Figure 8.3: diagram relating the User (emotion/affect, expectation, prior
experience, physical fitness, personality, motivation, skill, age, etc.), the
Product (value, function, size, weight, preference, satisfaction, aesthetics,
reputation, adaptability, mobility, etc.), and their Interaction (usability),
all situated within social factors (time pressure, peer pressure, social status,
social obligations, etc.), cultural factors (gender, trend, rules, language,
norms, standards, religion, etc.), and the context of usage (time, place,
weather, public/private usage, etc.).]

Figure 8.3 Various aspects to be considered in totality for assessing user experience (UX).
Evaluation Methods

• There are a variety of evaluation methods.

• A given method may be general and applicable to many different situations and
objectives, or more specific and fitting for a particular criterion or usage
situation.

• Overall, an evaluation method can be characterized by the following factors:

– Timing of analysis (e.g. throughout the application development stage:
early, middle, late/after)

– Type and number of evaluators (e.g. several HCI experts vs. hundreds of
domain users)

– Formality (e.g. controlled experiment vs. quick and informal assessment)

– Place of evaluation (laboratory vs. in-situ field testing).


Focus Interview / Observation Study

• One of the easiest and most straightforward evaluation methods

• Interview the actual/potential users and observe their interaction
behavior, either with the finished product or through a simulated run.

• The interview can be conducted in a simple question-and-answer form
and may involve actual usage of the given system/interface.

• Depending on the stage of the development at which the evaluation
takes place, the application or interface may not be ready for such a
test drive.

Focus Interview / Observation Study

• Thus, a simple paper/digital mock-up is used so that a particular usage
scenario can be enacted, on which the interview can be based.

• While mock-ups provide a tangible product and thus an improved feel
for the system/interface in question (vs. a mere rough sketch), at an
early stage of the development important interactivity may not have
been implemented. In this case, a “Wizard-of-Oz” type of testing is
often employed, where a human administrator fakes the system
response “behind the curtain.”

• User interaction behaviors during the system usage or simulation are
recorded or video-taped for more detailed analysis.
What is Wizard-of-Oz?

• The Wizard of Oz is a UX research method that involves interaction with
a mock interface controlled by a human. It is used to test costly concepts
inexpensively and to narrow down the problem space.

https://www.nngroup.com/articles/wizard-of-oz/
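As a concrete (and entirely hypothetical) illustration, the sketch below runs a
bare-bones text-based Wizard-of-Oz session: the participant types requests to a
mock “assistant,” while a hidden human wizard types the system responses by hand.
The console setup and prompts are assumptions for illustration.

    # Minimal sketch (hypothetical console setup) of a text-based
    # Wizard-of-Oz session; all turns are time-stamped for later analysis.
    import time

    log = []  # (elapsed seconds, speaker, utterance)
    start = time.time()

    def record(speaker, utterance):
        log.append((round(time.time() - start, 1), speaker, utterance))

    print("Mock assistant ready. Type 'quit' to end the session.")
    while True:
        request = input("participant> ")
        if request.strip().lower() == "quit":
            break
        record("participant", request)
        # The wizard reads the request and fakes the system response by hand.
        response = input("[wizard, hidden from participant]> ")
        record("system", response)
        print(f"assistant: {response}")

    for turn in log:  # dump the transcript for later analysis
        print(turn)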

Focus Interview / Observation Study

• The interview is often “focused”

– On particular user groups (e.g. elderly) or

– Features of the system/interface (e.g. information layout) to save time.

• One particular interviewing technique is called the “cognitive walkthrough,” in
which the subject (or expert) is asked to “speak aloud” his or her thought process.

– In this case, the technique is focused on investigating any gap between
the interaction model of the system and that of the user.

– We can deduce that cognitive walkthroughs are suited to the relatively early
stages of design, namely interaction modeling or interface selection (vs.
specific interface design).

Figure 8.4 Interviewing a subject upon simulating the usage of the interface with a mock-up

Subject: “… I expect to see the list of latest horror movies and will try to
select it with a mouse …”
Interviewer: “Please tell me what is going through your mind as is …”

Figure 8.5 A cognitive walkthrough with the interviewer.

Focus Interview / Observation Study

• Note that the interview/simulation method, due to its simplicity, can be used not
only for evaluation but also for interaction modeling and exploration of
alternatives at the “early” design stage.

• We have already seen design tools such as storyboards and wire-framing, which
can be used in conjunction with users or experts for simultaneous analysis and
design.

• The user interviewing/observation technique, being somewhat free-form, is easy
to administer, but is not structured to be comprehensive. The following table
summarizes the characteristics of the interview/simulation/observation approach.

Summary

• Evaluators / Size: Actual users / medium-sized (10~15)

• Type of evaluators: Focused (e.g. by expertise, age group, gender, etc.)

• Formality: Usually informal (not a controlled experiment)

• Timing and objectives:

– Early: interaction model and flow (enactment: mock-up / Wizard of Oz)

– Middle: interface selection (enactment: mock-up / Wizard of Oz / partial
simulation)

– Late/After: interface design issues, i.e. look and feel such as aesthetics,
color, contrast, font size, icon location, labeling, layout, etc. (enactment:
simulation / actual system)

• Overall: easy to administer / free-form, but neither structured nor comprehensive

Table 8.1 Summary: Interview, Usage, and Observation Method
Expert/Heuristic Reviews

• Expert heuristic evaluation is very similar to the interview method.

• The difference is that the evaluators are HCI experts and the analysis is carried out
against a pre-prepared HCI guideline, hence the name “heuristics.”

– For instance, the guideline can be general or more specific with respect to
application genre (e.g. for games), cognitive/ergonomic load, corporate UI
design style (e.g. the Android UI guideline), etc.

Expert/Heuristic Reviews

The following lists Nielsen’s ten general UI heuristics:
https://www.nngroup.com/articles/ten-usability-heuristics/

1. Visibility of system status: The system should always keep users informed about
what is going on, through appropriate feedback within reasonable time.

2. Match between system and the real world: The system should speak the users’
language, with words, phrases, and concepts familiar to the user, rather than
system-oriented terms. Follow real-world conventions, making information appear
in a natural and logical order.

3. User control and freedom: Users often choose system functions by mistake and will
need a clearly marked “emergency exit” to leave the unwanted state without having
to go through an extended dialogue. Support undo and redo.

Expert/Heuristic Reviews

4. Consistency and standards: Users should not have to wonder whether different
words, situations, or actions mean the same thing. Follow platform conventions.

5. Error prevention: Even better than good error messages is a careful design that
prevents a problem from occurring in the first place. Either eliminate error-prone
conditions or check for them and present users with a confirmation option before
they commit to the action.

6. Recognition rather than recall: Minimize the user’s memory load by making objects,
actions, and options visible. The user should not have to remember information
from one part of the dialogue to another. Instructions for use of the system should
be visible or easily retrievable whenever appropriate.

Expert/Heuristic Reviews

7. Flexibility and efficiency of use: Accelerators—unseen by the novice user—may
often speed up the interaction for the expert user such that the system can cater
to both inexperienced and experienced users. Allow users to tailor frequent actions.

8. Aesthetic and minimalist design: Dialogues should not contain information that is
irrelevant or rarely needed. Every extra unit of information in a dialogue competes
with the relevant units of information and diminishes their relative visibility.

9. Help users recognize, diagnose, and recover from errors: Error messages should be
expressed in plain language (no error codes), precisely indicate the problem, and
constructively suggest a solution.

10. Help and documentation: Even though it is better if the system can be used without
documentation, it may be necessary to provide help and documentation. Any such
information should be easy to search, be focused on the user’s task, list concrete
steps to be carried out, and not be too large.
Expert/Heuristic Reviews

• The expert heuristic evaluation is one of the most popular methods of UI
evaluation because it is quick and dirty and relatively cost-effective (involving
only a few UI experts).

• A few (typically 3~5) UI and domain experts are brought in to evaluate the UI
implementation in the late stage of the development or even against a finished
product.

• The disadvantage of the expert review is that feedback from actual users is
absent, and the HCI expert may not understand the needs of the actual users.

– Even experienced expert reviewers have great difficulty knowing how typical
users, especially first-time users, will really behave.

• The small-sized evaluator pool is compensated for by the evaluators’ expertise.
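As a sketch of how the output of such a review might be organized, the findings
below are tagged with the Nielsen heuristic they violate and a severity rating,
then aggregated. The findings, severity values, and expert labels are hypothetical.

    # Minimal sketch (hypothetical findings): aggregating an expert review by
    # Nielsen heuristic number and severity (0 = cosmetic ... 4 = catastrophic).
    from collections import defaultdict

    # Each finding: (expert, heuristic number, severity, note).
    findings = [
        ("A", 1, 3, "no progress indicator during upload"),
        ("B", 1, 2, "status bar hidden in full-screen mode"),
        ("A", 4, 1, "OK/Cancel button order differs between dialogs"),
        ("C", 9, 3, "error message shows a raw error code"),
    ]

    by_heuristic = defaultdict(list)
    for expert, heuristic, severity, note in findings:
        by_heuristic[heuristic].append((severity, expert, note))

    # Report the worst problems first within each heuristic.
    for heuristic in sorted(by_heuristic):
        print(f"Heuristic {heuristic}:")
        for severity, expert, note in sorted(by_heuristic[heuristic], reverse=True):
            print(f"  severity {severity} (expert {expert}): {note}")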

Summary

• Evaluators / Size: HCI experts / small-sized (3~5)

• Type of evaluators: Focused (experts on application-specific HCI rules,
corporate-specific design style, user ergonomics, etc.), interface consistency

• Formality: Usually informal (not a controlled experiment)

• Timing and objectives:

– Middle: interface selection, interaction model (enactment: scenarios /
storyboards)

– Late/After: interface design issues, i.e. look and feel such as aesthetics,
color, contrast, font size, icon location, labeling, layout, etc. (enactment:
simulation / actual system)

• Overall: easy and quick, but prior heuristics are assumed to exist and no actual
user feedback is reflected

Table 8.3 Summary of the Expert Review Method
Measurement

• In contrast to interviews and observation, measurement methods attempt to
indirectly quantify the goodness of the interaction/interface design with a
score, through representative task performance (quantitative) or quantified
answers from carefully prepared subjective surveys (qualitative).

• Typical indicators for quantitative task performance are

– Task completion time

– Score (or amount of task performance in unit time)

– Errors (produced in unit time).

Figure 8.6 The initial (left) and redesigned (right) “play” activity/layer for No Sheets: The
new design after evaluation uses a landscape mode and fewer primary colors. The icons for
fast-forward and review are changed to the conventional style, and the current tempo is
shown on top.

Surveys
• On the other hand, numerical scores can be obtained from surveys.
• Surveys are used because many aspects of usability or user experience are based on user
perception, which is not directly measurable.

• However, answers about user-perceived qualities are highly variable and much more
susceptible to users’ intrinsic backgrounds.

• To reduce such biases, a few provisions can be made, for example:

– Using a large number of subjects (e.g. more than 30 people)
– Using an odd-leveled (5 or 7) answer scale (also known as a Likert scale), so that
there always exists a middle-level answer
– Carefully wording and explaining the survey questions for clarity and understanding

• Even though the result of the survey is a numerical score, the nature of the measurement is
still qualitative because survey questions usually deal with user perception qualities.

Online Surveys

• Online surveys avoid the cost of printing and the extra effort needed
for distribution and collection of paper forms.

• Many people prefer to answer a brief survey displayed on a screen,
instead of filling in and returning a printed form.

Survey Guidelines

• Minimize the number of questions: Too many questions result in fatigue and hence
unreliable responses.

• Use an odd-level scale, 5 or 7 (or Likert scale): Research has shown that odd answer
levels with a mid value, with 5 or 7 levels, produce the best results.

• Use consistent polarity: E.g. negative responses correspond to level 1 and positive
to 7, consistently throughout the survey.

• Make questions compact and understandable: Questions should be clear and easy to
understand. If it is difficult to convey the meaning of a question in compact form,
the administrator should explain it verbally.

• Give subjects compensation: Without compensation, subjects will not do their best
or perform the given task reliably.

• Categorize the questions: For easier understanding and good flow, questions of the
same nature should be grouped and answered in a block, e.g. answer “ease of use”
related questions, then “ease of learning,” and so on.

Table 8.4 Guidelines for a Good Survey
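To illustrate the consistent-polarity and categorization guidelines above, the
following minimal sketch reverse-codes negatively worded items on a 7-point scale
and averages the items of each question category. The questionnaire items,
categories, and responses are hypothetical.

    # Minimal sketch (hypothetical questionnaire): enforcing consistent
    # polarity on a 7-point Likert scale by reverse-coding negatively worded
    # items, then averaging per-category scores.
    SCALE_MAX = 7
    negative_items = {"hard_to_navigate", "needed_manual"}

    responses = {                # one subject's raw answers, 1..7
        "easy_to_use": 6, "hard_to_navigate": 2,
        "quick_to_learn": 5, "needed_manual": 3,
    }

    def coded(item, value):
        # Reverse-code so a higher number is always the positive response.
        return SCALE_MAX + 1 - value if item in negative_items else value

    categories = {
        "ease of use": ["easy_to_use", "hard_to_navigate"],
        "ease of learning": ["quick_to_learn", "needed_manual"],
    }
    for category, items in categories.items():
        avg = sum(coded(i, responses[i]) for i in items) / len(items)
        print(f"{category}: {avg:.1f} / {SCALE_MAX}")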
Safety and Ethics in Evaluation

• Most HCI evaluation involves simple interviews and/or carrying out simple tasks
using paper mock-ups, simulation systems, or prototypes. Thus, safety problems
rarely occur.
• However, precautions are still needed. For example, even interviews can become
long and time consuming, causing subjects to become fatigued.
• Some seemingly harmless tasks may bring about unexpected harmful effects, both
physical and mental.
• Therefore, evaluations must be conducted on volunteers who have signed consent
forms. Even with signed consent, subjects have the right to discontinue the
evaluation task at any time.
• The purpose and the procedure should be sufficiently explained and made clear
to the subjects prior to any experiments. Many organizations run what is called
an Institutional Review Board (IRB), which reviews proposed evaluative
experiments to ascertain the safety and rights of the subjects. It is best to
consult or obtain permission from the IRB whenever there is even a small doubt
about possible effects on the subjects during the experiments.
Summary
• We have looked at various methods for evaluating the interface at different stages
in the development process. As already emphasized, even though all the required
provisions and knowledge may have been put to use to create the initial versions
of the UI, many compromises may be made during the actual implementation,
resulting in a product somewhat different from what was originally intended at the
design stage.

• It is also quite possible that during the development, the requirements simply
change. This is why the explicit evaluation step is a must and, in fact, the whole
design-implement-evaluate cycle must ideally be repeated at least a few times
until a stable result is obtained.
Homework 8

• Complete HW8 on BB
• Due Thursday, October 26th, 11:59 pm
• Individual homework
• 2 marks
