Lecture 1: Fundamentals of Gestural Interaction
Jean Vanderdonckt, UCLouvain
Vrije Universiteit Brussel, March 5, 2020, 2 pm-4 pm
Location: Room PL9.3.62, Pleinlaan 9, B-1050 Brussels
Definition and usage of gestures
Source:
https://www.usenix.org/legacy/publications/library/proceedings/usenix03/tech/freenix03/full_papers/worth/worth_html/xstroke.html
• A gesture is any particular type of body
movement performed in 0D, 1D, 2D, 3D or
more
• e.g., Hand movement (supination, pronation, etc.)
• e.g., Head movement (lips, eyes, face, etc.)
• e.g., Full body movement (silhouette, posture, etc.)
[Diagram: coordinate systems illustrating gesture spaces from 0D to 6D, with axes x, y, z and additional parameters u, v, w as dimensions increase]
Definition and usage of gestures
Sources: Kendon, A., Do gestures communicate? A review. Research on Language and Social Interaction 27, 1994, 175-200.
Graham, J. A., Argyle, M. (1975). A cross-cultural study of the communication of extra-verbal meaning by gestures. International
Journal of Psychology 10, 57-67
• Benefits of gestures
• Gestures are rich sources of information and
expression
• Gestures convey information to listeners
• Gestures communicate attitudes and emotions both
voluntarily and involuntarily
• Any body limb can be used
• Hands and fingers, Head and shoulders
• Whole body,…
• Contact-based or contact-less
Definition and usage of gestures
• Benefits of gestures
• Gestures are fast to produce
• Shorter gestures are perceived as easier (93%)
• Gestures are a “natural” equivalent of keyboard
shortcuts
Definition and usage of gestures
• Benefits of gestures
• Gestures are fast to produce
• Gestures could even become menu shortcuts
Sources: C Appert, S Zhai, Using strokes as command shortcuts: cognitive benefits and toolkit support, Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems CHI 2009.
[Screenshot: experiment UI with dictionary, shortcuts, and testing area panels]
Definition and usage of gestures
• Benefits of gestures
• Gestures are fast to produce
• Gestures as shortcuts: cognitive advantages
in learning and recall (3 times more)
Source: C Appert, S Zhai, Using strokes as command shortcuts: cognitive benefits and toolkit support, Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems CHI 2009.
Definition and usage of gestures
• Pen-based gestures (contact-based)
• are hand movements captured by a pointing
device (pen, mouse, stylus) on a constrained 2D
surface (tablet, whiteboard, wall screen,…)
enabling pen-based interaction and computing
Definition and usage of gestures
• Pen-based gestures (contact-based)
• Example: UI design by sketching (GAMBIT)
Source: Ugo Braga Sangiorgi, François Beuvens, Jean Vanderdonckt, User interface design by collaborative sketching. Conference on
Designing Interactive Systems 2012: 378-387.
Definition of strokes
• A stroke is a time series of points. The first point is
the starting point (often marked on a drawing), the last
point is the ending point
• A pen-based gesture is a time series of strokes
• A shape is a stroke (iconic) gesture corresponding to
a geometrical figure, e.g., a square
• Shape recognition could be viewed as similar to gesture recognition
• A shape is closed if the distance between the starting
and ending points falls below some threshold, and open otherwise
• Circles, squares, or diamonds are closed shapes, while question
marks and brackets are open shapes
[Diagram: relationship between stroke, gesture, shape, and motion]
Definition of gestures, which can be
• Unistroke or multistroke, depending on the number
of strokes produced by the user
• (synonyms: single-stroke and multi-stroke)
• Unidirectional or multidirectional, depending on the
number of directions involved (one or many)
• ex.: a unistroke, unidirectional gesture is called a flick
• Verb: strike or propel (something) with a sudden quick
movement of the fingers; noun: a sudden quick
movement [Source: Merriam-Webster]
• ex.: a multi-orientation flick is called a mark
Definition of gestures
[Figure: example gestures with 1, 2, and 3 orientations]
A gesture is said to be
• Unistroke
• Multistroke
• Unidirectional
• Multidirectional
• Flick
• Mark
Definition of shapes
The same shape can be drawn in many different ways
• Unistroke
• Multistroke
• Unidirectional
• Multidirectional
[Figure: the same shape drawn as flicks, strokes, and marks]
Invariance properties
1. Variation of strokes → stroke clustering
Stroke invariance = independence of any
combination of strokes
2. Variation of directions → direction interpretation
Direction invariance = independence of the
direction in which strokes are produced
Invariance properties
3. Variation of sampling → re-sampling needed
Sampling invariance = independence of the
sampling capability of the input device
Source: J.O. Wobbrock, A.D. Wilson, Y. Li, Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes, Proc.
of UIST’2007.
Invariance properties
Source: U. Sangiorgi, V. Genaro Motti, F. Beuvens, J. Vanderdonckt, Assessing lag perception in electronic sketching, Proc. of NordiCHI ’12.
4. Variation of lag → different perceptions by platform
Lag invariance = independence of any speed
The range below 20 FPS was rejected by users.
The difference between grades is not
significant above 24 FPS.
No conclusive evidence that subjects
perceived differences of 2, 5 and 10 FPS when
testing pairs of rates.
Invariance properties
5. Variation of position → different locations possible
Translation invariance = independence of the
location of articulation
Invariance properties
6. Variation of size → different sizes depending on the
surface of the platform: rescaling may be needed
Scale invariance = independence of size/scale
Source: U. Sangiorgi, F. Beuvens, J. Vanderdonckt, User Interface Design by Collaborative Sketching, Proc. of DIS’2012
Invariance properties
7. Variation of angle → different orientations
depending on position
Rotation invariance = independence of any rotation
Invariance properties
8. Variation of skew → linear/bilinear transformations may be
needed to recover the initial shape
Skew invariance = independence of any transformation
Source:
https://www.usenix.org/legacy/publications/library/proceedings/usenix03/tech/freenix03/full_papers/worth/worth_html/xstroke.html
Example: xStroke recognizer
Principle: transform a gesture into a string by traversing
cells in a 3x3 matrix
Invariance properties
Horizontal flick = 123, 456, or 789
Vertical flick = 147, 258, 369
Main diagonal = 159
Second diagonal = 357
Variations of an uppercase ‘B’
123 654 456 987
741 23 654 456 987
147 741 23 654 456 987
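To make the principle concrete, here is a minimal sketch of the traversal idea (in Python; the function and its grid handling are illustrative, not xStroke's actual implementation): snap each sampled point to one of the cells of a 3x3 grid laid over a fixed region such as the window, number the cells 1-9 row by row, and record the sequence of distinct cells visited.

Code sketch (illustrative):
def cell_string(points, region):
    """points: list of (x, y); region: (min_x, min_y, max_x, max_y) of the
    3x3 grid area, e.g. the window. Returns the traversal string."""
    min_x, min_y, max_x, max_y = region
    w, h = max_x - min_x, max_y - min_y
    out = []
    for x, y in points:
        col = min(int(3 * (x - min_x) / w), 2)   # 0..2, left to right
        row = min(int(3 * (y - min_y) / h), 2)   # 0..2, top to bottom (screen y)
        cell = str(3 * row + col + 1)            # '1'..'9'
        if not out or out[-1] != cell:           # record only cell changes
            out.append(cell)
    return "".join(out)

print(cell_string([(1, 5), (5, 5), (9, 5)], (0, 0, 10, 10)))  # -> '456'
print(cell_string([(1, 1), (5, 5), (9, 9)], (0, 0, 10, 10)))  # -> '159'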
Source:
https://www.usenix.org/legacy/publications/library/proceedings/usenix03/tech/freenix03/full_papers/worth/worth_html/xstroke.html
Example: xStroke recognizer
- Stroke invariance: no (unistroke)
- Direction invariance: no, unless all variations are included
Invariance properties
Left-to-Right (LTR):
Horizontal flick = 123, 456, or 789
Vertical flick = 147, 258, 369
Main diagonal = 159
Second diagonal = 357
Right-to-Left (RTL) instead of Left-to-Right (LTR):
Horizontal flick = 321, 654, 987
Vertical flick = 741, 852, 963
Main diagonal = 951
Second diagonal = 753
Plus all sub-cases: 12, 23, 45, 56, 78, 89, etc.
Source:
https://www.usenix.org/legacy/publications/library/proceedings/usenix03/tech/freenix03/full_papers/worth/worth_html/xstroke.html
Example: xStroke recognizer
- Stroke invariance: no (unistroke)
- Direction invariance: no, unless all variations are included
Invariance properties
Problem: “fuzzy” or imprecise traversals cause
false positives
Example: 12569 produced for the main diagonal (159)
Source:
https://www.usenix.org/legacy/publications/library/proceedings/usenix03/tech/freenix03/full_papers/worth/worth_html/xstroke.html
Example: xStroke recognizer
- Sampling invariance: yes, but depends on the software
- Lag invariance: no, depends on the device
Invariance properties
Problem: missed traversals cause false
negatives
A unistroke becomes a multi-stroke
Source:
https://www.usenix.org/legacy/publications/library/proceedings/usenix03/tech/freenix03/full_papers/worth/worth_html/xstroke.html
Example: xStroke recognizer
- Translation invariance: yes, two modes
- Scale invariance: yes, bounding box
- Rotation invariance: no
Invariance properties
[Screenshots: xStroke in window mode and in full-screen mode]
Rubine (1991)
Stroke gesture recognizers
• Statistical classification for unistroke gestures
• A gesture G is an array of P sampled points
• Features
– Incrementally computable
– Meaningful
– Enough, but not too many
Rubine (1991)
Stroke gesture recognizers
[Figure: Rubine's geometric features (13 in the original paper), each computed incrementally from the sampled points]
Rubine (1991)
Stroke gesture recognizers
• Training phase
• Recognition / classification phase
[Diagram: gesture samples for class c train an optimal linear classifier that partitions feature space (x1, x2) into regions C1, C2, C3]
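The classification rule itself is a linear evaluation over the features: each class c gets a score v_c = w_c0 + Σ_i w_ci · f_i, and the class with the highest score wins. The sketch below (Python; illustrative, showing only two of Rubine's features and omitting the covariance-based training of the weights) captures that evaluation step.

Code sketch (illustrative):
import math

def features(points):
    # Two of Rubine's features as examples: cosine and sine of the
    # initial angle of the stroke (from the first and third sample point).
    (x0, y0), (x2, y2) = points[0], points[2]
    d = math.hypot(x2 - x0, y2 - y0) or 1e-9   # guard degenerate strokes
    return [(x2 - x0) / d, (y2 - y0) / d]

def classify(points, weights):
    # weights: {class_name: [w0, w1, ..., wF]}, learned during training.
    f = features(points)
    score = lambda w: w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))
    return max(weights, key=lambda c: score(weights[c]))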
Rubine (1991)
Stroke gesture recognizers
• Stroke invariance: no (unistroke)
• Direction invariance: yes, by considering symmetrical gestures
• Sampling invariance: yes
• Lag invariance: yes, but depends on the device
• Translation invariance: yes, with translation to (0,0)
• Scale invariance: no
• Rotation invariance: no
SiGeR (2005)
Training phase
Vector string:
Stroke   Vector direction
Left     L
Right    R
Up       U
Down     D
Regular expression:
(NE|E|SE)+(NW|N|NE)+(SW|W|NW)+(SE|S|SW)+.
LU,U,U,U,U,U,RU,RU,RU,RU,RU,RU,R,R,R,R,R,R,R,R,RD,RD,RD, RD,D,D,D,D,LD,LD,LD,LD,L,L,LD,LD,LD,D,D,D,D,D,D,D,D,D,L
Stroke gesture recognizers
SiGeR (2005)
Training phase Regular expression:
(NE|E|SE)+(NW|N|NE)+(SW|W|NW)+(SE|S|SW)+.
Stroke gesture recognizers
The SiGeR (Simple Gesture Recogniser) algorithm was
developed by Scott Swigart for the Microsoft Developer
Network to illustrate the implementation of custom
gesture recognizers for the Microsoft Tablet PC platform.
The algorithm classifies gestures on the basis of regular
expressions.
Code listing:
Dim strokeInfo As New StrokeInfo(InkPicture1.Ink.Strokes(0))
If strokeInfo.IsMatch(Vectors.Ups & Vectors.Rights & _
                      Vectors.Downs & Vectors.Lefts & Vectors.Downs) Then
    MessageBox.Show("Question Mark")
End If
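The same idea transcribes directly to other languages. Below is a hedged Python sketch (helper names are illustrative, not part of the MSDN sample): quantize each segment between consecutive sample points into one of eight compass directions, concatenate the tokens, and match the slide's regular expression against the resulting string.

Code sketch (illustrative):
import math, re

DIRS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]   # 45-degree sectors

def direction_string(points):
    # One direction token per segment between consecutive sample points.
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y0 - y1, x1 - x0)           # screen y grows downward
        out.append(DIRS[int(round(4 * angle / math.pi)) % 8])
    return "".join(out)

# The slide's pattern, matched against the token string. Note that raw
# concatenation of multi-letter tokens is ambiguous in corner cases
# ('N' + 'E' reads as 'NE'); the real SiGeR tokenization is more careful.
QUESTION_MARK = re.compile(r"(NE|E|SE)+(NW|N|NE)+(SW|W|NW)+(SE|S|SW)+.")

def is_question_mark(points):
    return QUESTION_MARK.match(direction_string(points)) is not None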
LVS (Levenshtein-based, 2007)
• Training
• Transformation into a character chain
• Recognition
• Comparison to all training examples
• Calculating Edit-Distance
• The smallest distance is chosen
Source: Coyette, A., Kieffer, S., Vanderdonckt, J., Multi-Fidelity Prototyping of User Interfaces, Proc. of 11th IFIP TC 13 Int. Conf. on
Human-Computer Interaction Interact’2007
Stroke gesture recognizers
LVS (Levenshtein-based, 2007)
Stroke gesture recognizers
• Each gesture is superimposed with a grid
• Each sampling point is mapped to its closest grid node
• For adjacent nodes, a stroke direction (out of 8) is selected
• A string is generated from direction coding of input
• Complex task of comparison of drawing inputs is reduced to simple
calculation of string edit distances
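A minimal sketch of that front end (Python; the grid size and direction alphabet are illustrative), as run before computing edit distances:

Code sketch (illustrative):
DIR_CHARS = {(1, 0): "0", (1, -1): "1", (0, -1): "2", (-1, -1): "3",
             (-1, 0): "4", (-1, 1): "5", (0, 1): "6", (1, 1): "7"}

def chain_code(points, cell=10.0):
    # Snap each sample point to its closest grid node, then encode each
    # move between consecutive distinct nodes as one of 8 direction chars.
    nodes = [(round(x / cell), round(y / cell)) for x, y in points]
    out = []
    for (c0, r0), (c1, r1) in zip(nodes, nodes[1:]):
        if (c0, r0) == (c1, r1):
            continue                         # same node: nothing to encode
        dx = (c1 > c0) - (c1 < c0)           # sign of the step: -1, 0, or 1
        dy = (r1 > r0) - (r1 < r0)
        out.append(DIR_CHARS[(dx, dy)])
    return "".join(out)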
LVS (Levenshtein-based, 2007)
Stroke gesture recognizers
• Distance is the length of the shortest sequence of edit commands that transforms s into t
• Simplest set of operations:
– Copy character from s over t
– Delete a character in s (cost 1)
– Insert a character in t (cost 1)
– Substitute one character for another (cost 1)
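With these costs, the distance is the textbook dynamic-programming recurrence; a compact two-row Python version (illustrative):

Code sketch (illustrative):
def levenshtein(s, t):
    prev = list(range(len(t) + 1))           # distances from "" to t[:j]
    for i, cs in enumerate(s, 1):
        cur = [i]                            # distance from s[:i] to ""
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # copy or substitute
        prev = cur
    return prev[-1]

assert levenshtein("0007766", "0017766") == 1   # one substitution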
LVS (Levenshtein-based, 2007)
Stroke gesture recognizers
• How to obtain rotation independence?
LVS (Levenshtein-based, 2007)
Stroke gesture recognizers
• Training types for Levenshtein on digits
– User-dependent: recognition system trained by the user tester
– User-independent (1): recognition system trained by a user different from the user
tester
– User-independent (2): recognition system trained by several users different from the
user tester
Many recognition techniques exist from AI/ML
• Hidden Markov Models (HMM)
• Support Vector Machines (SVM)
• Feature based linear discriminants
• Neural Networks
• CNN
• …
Challenges:
• They require advanced knowledge of machine learning &
pattern recognition
• Complex to implement, debug, and train
• Parameters that need tweaking
• Risk of overfitting
• Limited modifiability when new gestures need to be incorporated
Stroke gesture recognizers
Benefits:
• They require minimal programming knowledge, nothing from AI
• Easy to implement, debug, and train
• No calibration
• No need to retrain/remodel when new gestures appear
• Straightforward geometric interpretation
• Performance
• Recognition rate > 90%
• Execution time < 1 ms
• Reasonable computational complexity
• Limited amount of LOC (lines of code)
Stroke gesture recognizers: recent recognizers in HCI
Several recognizers:
• Members of the $-family
• $1: unistroke
• $N: multi-stroke
• $P: point-cloud multistroke
• $P+: optimized for users with special needs (the most accurate
today)
• $Q: optimized for low-end devices (the fastest one today)
• Other recognizers
• Protractor
• $N-Protractor
• $3, Protractor3D
• 1ȼ
• Penny Pincher
• Match-Up & Conquer
• …
• Common ground: they all use the Nearest-Neighbor-Classification
(NNC) technique from pattern matching
Stroke gesture recognizers: recent recognizers in HCI
Nearest-Neighbor-Classification (NNC): basic principle
1. Build a training set (or template set) =
set of classes of gestures captured from
users to serve as reference
2. Define a (dis)similarity function =
function F that compares two gestures:
- a candidate gesture
- any training/reference gesture
from the training set
3. Compare the candidate with every
reference and return the class of the
most similar reference
Stroke gesture recognizers: recent recognizers in HCI
F(candidate, reference1) = 1.51
F(candidate, reference2) = 0.98
F(candidate, reference3) = 0.12
F(candidate, reference4) = 0.12
(illustrative values for four reference gestures)
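The whole principle fits in a few lines; a sketch (Python, illustrative), where F is whatever dissimilarity the chosen recognizer defines:

Code sketch (illustrative):
def nnc_classify(candidate, training_set, F):
    # training_set: list of (reference_gesture, class_label) pairs.
    # Returns the class of the most similar reference and its distance.
    best_class, best_d = None, float("inf")
    for reference, cls in training_set:
        d = F(candidate, reference)
        if d < best_d:
            best_class, best_d = cls, d
    return best_class, best_d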
Nearest-Neighbor-Classification (NNC) for 2D strokes
[Figure: nearest-neighbor classification for 2D strokes. Candidate points (dots) and reference points (x) are plotted in the unit square [0..1] x [0..1]; k-NN keeps the k nearest neighbors, 1-NN the single nearest neighbor. Applied to gesture recognition, a candidate gesture (points p1..p4) is compared against each reference gesture (points q1..q4) from the training set]
Stroke gesture recognizers: recent recognizers in HCI
Nearest-Neighbor-Classification (NNC)
• Pre-processing steps to ensure invariance
• Re-sampling
• Points with the same spacing in between: isometricity
• Points with the same timestamp in between: isochronicity
• Same number of points: isoparameterization
• Re-Scaling
• Normalisation of the bounding box into [0..1]x[0..1] square
• Rotation to reference angle
• Rotate to 0°
• Re-rotating and distance computation
• Distance computed between candidate gesture and
reference gestures (1-NN)
Stroke gesture recognizers: recent recognizers in HCI
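A sketch of that pre-processing chain (Python, illustrative; it mirrors the $1 pipeline with equidistant re-sampling, bounding-box normalisation, and translation of the centroid to the origin, leaving out rotation to the reference angle):

Code sketch (illustrative):
import math

def resample(points, n=64):
    # Re-sample to n points with equal spacing along the path (isometric).
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    step, acc = path / (n - 1), 0.0
    pts, out = list(points), [points[0]]
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= step and d > 0:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)                 # keep measuring from q
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:                      # guard against rounding shortfall
        out.append(pts[-1])
    return out[:n]

def scale_to_unit(points):
    # Normalise the bounding box into the [0..1] x [0..1] square.
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = (max(xs) - min(xs)) or 1e-9, (max(ys) - min(ys)) or 1e-9
    return [((x - min(xs)) / w, (y - min(ys)) / h) for x, y in points]

def translate_to_origin(points):
    # Move the centroid to (0, 0).
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]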
The $1 recognizer
• Unistroke gesture recognizer
• Euclidean distance as dissimilarity
• Implementable in 100 LOC
• 1-to-1 point matching
• 99% accuracy with 3+ samples per gesture
• Accessible at https://depts.washington.edu/aimgroup/proj/dollar/
Stroke gesture recognizers: recent recognizers in HCI
Source: Jacob O. Wobbrock, Andrew D. Wilson, Yang Li. (2007). Gestures without libraries, toolkits or training: a $1 recognizer for user
interface prototypes. In Proc. UIST '07, 159-168
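After the pre-processing sketched above, the $1 dissimilarity reduces to the mean Euclidean distance between corresponding points (the full recognizer also searches a small range of rotations with a golden-section search, omitted here):

Code sketch (illustrative):
import math

def path_distance(a, b):
    # a, b: equal-length lists of (x, y) points from two gestures.
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)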
The $N recognizer
• Multistroke gesture recognizer
• Euclidean distance as dissimilarity
• Implementable in 200 LOC
• Uses $1 after generating all stroke-order and stroke-direction permutations
• Accessible at http://depts.washington.edu/aimgroup/proj/dollar/ndollar.html
Stroke gesture recognizers: recent recognizers in HCI
Source: Anthony, L., Wobbrock, J.O. (2010). A lightweight multistroke recognizer for user interface prototypes. In Proc. of Graphics
Interface 2010, 245-252, 2010
The $P recognizer
• Multistroke gesture recognizer
• Aims to be articulation-independent: removes all the
temporal information associated with gesture points
• Users perform gestures in various ways, not easily accountable in
training sets
• 442 possible ways to draw a square with 1, 2, 3, and 4 strokes
Stroke gesture recognizers: recent recognizers in HCI
Source: Vatavu, R.D., Anthony, L., Wobbrock, J.O. (2012). Gestures as Point Clouds: A $P Recognizer for User Interface Prototypes. Int.
Conf. on Multimodal Interaction 2012, ICMI’12
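A sketch of the point-cloud idea (Python, illustrative): stroke order and direction are discarded, and each candidate point is greedily matched to the nearest still-unmatched reference point. The published $P additionally weights matches by order and restarts from several points; only the unweighted core is shown.

Code sketch (illustrative):
import math

def cloud_distance(a, b):
    # a, b: equal-size point clouds (lists of (x, y)), strokes ignored.
    unmatched = list(range(len(b)))
    total = 0.0
    for p in a:
        j = min(unmatched, key=lambda k: math.dist(p, b[k]))
        total += math.dist(p, b[j])
        unmatched.remove(j)                  # each reference point used once
    return total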
Comparison of some $* recognizers
Stroke gesture recognizers: recent recognizers in HCI
Source: Vatavu, R.D., Anthony, L., Wobbrock, J.O. (2012). Gestures as Point Clouds: A $P Recognizer for User Interface Prototypes. Int.
Conf. on Multimodal Interaction 2012, ICMI’12
Other gesture recognizers
• Protractor (Li, CHI 2010)
• $3 (Kratz and Rohs, IUI 2010)
• Protractor3D (Kratz and Rohs, IUI 2011)
• $N-Protractor (Anthony and Wobbrock, GI 2012)
• 1ȼ (Herold and Stahovich, SBIM 2012)
• Match-Up & Conquer (Rekik, Vatavu, Grisoni, AVI 2014)
• Penny Pincher (Taranta and LaViola, GI 2015)
• Jackknife (Taranta et al., CHI 2017)
• Other dissimilarity functions
• Dynamic Time Warping (DTW), Hausdorff, modified Hausdorff,
Levenshtein + chain coding,…
• Techniques for optimizing training sets
• 1F (Vatavu, IUI 2012)
• metaheuristics (Pittman, Taranta, LaViola, IUI 2016).
Stroke gesture recognizers: recent recognizers in HCI
Example: Match-up & Conquer
• Robust recognition of multi-touch gestures
• Users perform multi-touch gestures in various ways, not easily
accountable in training sets
Stroke gesture recognizers: recent recognizers in HCI
Source: Yosra Rekik, Radu-Daniel Vatavu, Laurent Grisoni. 2014. Match-Up & Conquer: A Two-Step Technique for Recognizing
Unconstrained Bimanual and Multi-Finger Touch Input. In Proc. of AVI’14, 201-208
Nearest-Neighbor-Classification (NNC)
• Two families of approaches
• “Between points” distance
• $-Family recognizers: $1, $3, $N, $P, $P+,
$V, $Q,…
• Variants and optimizations: ProTractor,
Protractor3D,…
• “Vector between points” distance
• PennyPincher, JackKnife,…
[Vatavu R.-D. et al, ICMI ’12]
[Taranta E.M. et al, C&G ’16]
Software dimension: Which algorithm?
Nearest-Neighbor-Classification (NNC)
• Two families of approaches
• “Between points” distance
• $-Family recognizers: $1, $3, $N, $P, $P+,
$V, $Q,…
• Variants and optimizations: ProTractor,
Protractor3D,…
• “Vector between points” distance
• PennyPincher, JackKnife,…
• A third new family of approaches
• “Vector between vectors” distance:
our approach
Software dimension: Which algorithm?
• Local Shape Distance between 2 triangles based on
similarity (Roselli’s distance)
[Figure: two triangles, one with side vectors 𝑎 and 𝑏 and third side 𝑎 + 𝑏, the other with side vectors 𝑢 and 𝑣 and third side 𝑢 + 𝑣, compared by similarity]
Paolo Roselli
Università degli Studi di Roma, Italy
Software dimension: Which algorithm?
Source: Lorenzo Luzzi & Paolo Roselli, The shape of planar smooth gestures and the convergence of a gesture recognizer, Aequationes Mathematicae 94 (2020), 219-233.
• Step 1. Vectorization for each pair of vectors between
three consecutive points
[Figure: training gesture (points p1..p6) and candidate gesture (points q1..q6) sampled before vectorization]
Candidate
gesture
Software dimension: Which algorithm?
Source: Jean Vanderdonckt, Paolo Roselli, Jorge Luis Pérez-Medina, !FTL, an Articulation-Invariant Stroke Gesture Recognizer with
Controllable Position, Scale, and Rotation Invariances. ICMI 2018: 125-134
• Step 2. Mapping candidate’s triangles onto training
gesture’s triangles
[Figure: the training gesture (p1..p6) and candidate gesture (q1..q6) decomposed into consecutive triangles p1p2p3, p2p3p4, p3p4p5, p4p5p6, with each candidate triangle mapped onto its training counterpart]
Software dimension: Which algorithm?
Source: Jean Vanderdonckt, Paolo Roselli, Jorge Luis Pérez-Medina, !FTL, an Articulation-Invariant Stroke Gesture Recognizer with
Controllable Position, Scale, and Rotation Invariances. ICMI 2018: 125-134
• Step 3. Computation of Local Shape Distance between
pairs of triangles
[Figure: corresponding triangle pairs of the training gesture (p1..p6) and candidate gesture (q1..q6)]
(N)LSD (p1p2p3, q1q2q3) = 0.02
(N)LSD (p2p3p4, q2q3q4) = 0.04
(N)LSD (p3p4p5, q3q4q5) = 0.0001
(N)LSD (p4p5p6, q4q5q6) = 0.03
Software dimension: Which algorithm?
Source: Jean Vanderdonckt, Paolo Roselli, Jorge Luis Pérez-Medina, !FTL, an Articulation-Invariant Stroke Gesture Recognizer with
Controllable Position, Scale, and Rotation Invariances. ICMI 2018: 125-134
• Step 4. Summing all individual figures into the final one
• Step 5. Iterate for every training gesture
[Figure: the same triangle pairs as in Step 3]
(N)LSD sum = 0.02 + 0.04 + 0.0001 + 0.03 = 0.0901
(indicative figures)
Software dimension: Which algorithm?
Source: Jean Vanderdonckt, Paolo Roselli, Jorge Luis Pérez-Medina, !FTL, an Articulation-Invariant Stroke Gesture Recognizer with
Controllable Position, Scale, and Rotation Invariances. ICMI 2018: 125-134
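A hedged sketch of the vector-between-vectors idea (Python, illustrative; this is one reading of the approach, not the paper's exact LSD formula): each pair of consecutive vectors, i.e. each triple of consecutive points, is summarized by the complex quotient of the second vector by the first, a quantity invariant under translation, rotation, and uniform scaling; per-triangle dissimilarities are then summed as in Steps 3-4.

Code sketch (illustrative):
def local_shapes(points):
    # points: list of (x, y); one complex 'shape' per triple of points.
    zs = [complex(x, y) for x, y in points]
    vecs = [b - a for a, b in zip(zs, zs[1:])]
    return [w / v for v, w in zip(vecs, vecs[1:]) if v != 0]

def ftl_like_distance(candidate, training):
    # Steps 3-4: compare triangles pairwise and sum the local distances.
    sc, st = local_shapes(candidate), local_shapes(training)
    return sum(abs(a - b) for a, b in zip(sc, st))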
• Physical demand depends on variables
• Gesture form: specifies which form of gesture is elicited.
Possible values are:
• S= stroke, when the gesture only consists of taps and flicks
• T= static, when the gesture is performed in only one location
• M= static with motion, when the gesture is performed with a
static pose while the rest of the body is moving
• D= dynamic, when the gesture captures change or motion
User experience dimension: evaluation criteria
• Physical demand depends on variables
• Laterality: characterizes how the two hands are
employed to produce gestures, with two categories, as
done in many studies. Possible values are:
• D= dominant unimanual, N= non-dominant unimanual,
S= symmetric bimanual, A= asymmetric bimanual
User experience dimension: evaluation criteria
Source: https://www.tandfonline.com/doi/abs/10.1080/00222895.1987.10735426
[Photos: laterality examples D, N, S, and A for a right-handed user]
• Aigner et al.’s taxonomy of mid-air gestures
• P= Pointing gestures (= deictic
gestures) indicate people,
objects, directions
Psychological dimension: What is a gesture?
Source: Aigner, R., Wigdor, D., Benko, H., Haller, M., Lindlbauer, D., Ion, A., Zhao, S., et al. (2012). Understanding Mid-Air Hand
Gestures: A Study of Human Preferences in Usage of Gesture Types for HCI.
• Aigner et al.’s taxonomy of mid-air gestures
• Semaphoric gestures are hand
postures and movements
conveying specific meanings
• T= Static semaphorics are identified by a
specific hand posture. Example: a flat palm
facing away from the actor means “stop”.
• D= Dynamic semaphorics convey information
through their temporal aspects. Example: a
circular hand motion means “rotate”
• S= Semaphoric strokes are single,
stroke-like movements such as hand flicks. Example:
a left flick of the hand means “dismiss this
object”
Psychological dimension: What is a gesture?
Source: Aigner, R., Wigdor, D., Benko, H., Haller, M., Lindlbauer, D., Ion, A., Zhao, S., et al. (2012). Understanding Mid-Air Hand
Gestures: A Study of Human Preferences in Usage of Gesture Types for HCI.
• Aigner et al.’s taxonomy of mid-air gestures
• M= Manipulation gestures guide
movement in a short
feedback loop. Thus, they feature
a tight relationship between the
movements of the actor and the
movements of the object to be
manipulated. The actor waits for
the entity to “follow” before
continuing
Psychological dimension: What is a gesture?
Source: https://www.microsoft.com/en-us/research/publication/understanding-mid-air-hand-gestures-a-study-of-human-
preferences-in-usage-of-gesture-types-for-hci/
• When to recognize?
Stroke gesture recognizers
Source: Olivier Bau, Wendy E. Mackay, OctoPocus: a dynamic guide for learning gesture-based command sets. UIST 2008: 37-46
• A Gesture Set is a set of potentially related gestures that could
be used together in one or several interactive applications
• A gesture set could be defined for a particular
- population of users, e.g., motor-impaired people
- set of tasks, e.g., annotation of X-rays
- platform/device, e.g., for an operating system, a style guide
- environment, e.g., machine verification
• A gesture set is said to be
- domain-dependent if used only in a particular application
domain (e.g., mammography, pediatrics, nuclear power plant)
- domain-independent if used across several applications (e.g.,
for an operating system)
Gesture sets: training and reference
• xStroke gesture set: a set of frequent gestures
• All printable ASCII characters: letters a-z, digits 0-9, and punctuation
Gesture sets: examples
• xStroke gesture set: a set of frequent gestures
• All additional gestures necessary to function as a full keyboard
replacement: Space, BackSpace, Return, Tab, Escape, Shift, Control,
Meta
Gesture sets: examples
• xStroke gesture set: a set of frequent gestures
• Advanced set of gestures:
• Numbers in letter mode
• Punctuation in letter mode
Gesture sets: examples
1. Design applications with touch interaction as the primary
expected input method.
2. Provide visual feedback for interactions of all types (touch, pen,
stylus, mouse, etc.)
3. Optimize targeting by adjusting touch target size, contact
geometry, scrubbing and rocking.
4. Optimize accuracy through the use of snap points and
directional "rails".
5. Provide tooltips and handles to help improve touch accuracy for
tightly packed UI items.
6. Avoid timed interactions whenever possible (example of
appropriate use: touch and hold).
7. Avoid using the number of fingers to distinguish the
manipulation whenever possible.
Gesture sets: guidelines
Thank you very much
for your attention