[DSC 2016] Event Series: Chi-Chun Lee / Big Data Analysis of Human Behavior

1
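Chi-Chun Lee (Jeremy)
Department of Electrical Engineering, National Tsing Hua University
Behavioral Informatics and Interaction Computation Lab (BIIC)
Big Data Analysis of Human Behavior:
How Data Science Is Applied in Education and Health Care
2017 January 15th

Session 4: open exchange
If you have prepared something:
come straight up on stage and let's talk it through together?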
2

THIS
IS
SUBWAY
MAP
Data
Science

Naïve Bayes Algorithm
Transfer learning
Apriori Algorithm
Gaussian distribution
Random Forests
Logistic Regression
(Deep)Neural Networks
Decision Trees
Nearest Neighbour
Support Vector Machine
K-Means Algorithm
Linear Regression
Active learning
Domain adaptation
Semi-supervised learning
Reinforcement learning
Unsupervised learning
Supervised learning

9
Emotion
Health Care
Education
Voice Recognition
Symptom diagnosis
Behavior Activity
Image Recognition
Medical
IBM Pathway Genomics
Detection of Diabetic Retinopathy in Retinal Fundus Photographs
customer behavior
Medical Imaging
Genomic Medicine
Cross-disciplinary integration: all human-related

What do I do ?
&
What am I going to share ?
10

11
Behavioral Signal Processing
Professor Shrikanth Narayanan, USC

12
Seek a window into human mind and traits…
…through engineering approach
S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203–1233, 2013.

13
Behavioral Signal Processing (BSP)
Compute Human Behavior Traits and States for Domain Experts Decision Making
• Help experts to do things they know in a more efficient manner at scale
• Develop novel behavioral analytics framework for possible scientific discovery
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .

Part 1:
The start of cross-disciplinary integration: education, health care, and signal processing
14

15
Analyzing human behavior . . .

16
Signals and Systems
A high-level abstraction . . .

18
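[Diagram labels: internal state; system (behavior production mechanism); signal: overt behavior; observation; system (behavior perception mechanism); internal state; signal: subjective record; decision]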

19
Education:
National Academy for Educational Research, principal training program

20
Principal training: impromptu speech scoring

21
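[Diagram labels: internal state; behavior production mechanism; signal: overt behavior (vocal expression, body movement, spoken content as text); observation by senior principals; behavior perception mechanism; internal state; signal: subjective measurement (training-program speech rating scale); expert decision (graduation grade, placement)]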

23
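[Diagram labels: internal state; behavior production mechanism; signal: overt behavior; self-reflection; self-perception mechanism; internal state; signal: self report; expert decision]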

24
Health care:
Chang Gung Memorial Hospital, Emergency Department

25
Pre-test
Post-test
Internal-medicine emergency triage

26
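[Diagram labels: internal state; behavior production mechanism; signal: overt behavior (vocal expression, facial expression, physiological signals); internal perception mechanism; signal: self report (NRS, self-reported pain); expert decision (triage indicators, clinical treatment)]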

28
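[Diagram labels, two interacting people: each has an internal state, a behavior production mechanism, a behavior perception mechanism, and overt behavior; an observer produces the signal: subjective measurement, which feeds the expert decision]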

29
Psychiatry:
National Taiwan University Hospital, Department of Psychiatry

30
Autism Diagnostic Observation Schedule (ADOS)

31
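[Diagram labels, two interacting people (clinician and child): internal states, behavior production and perception mechanisms, overt behavior shown as social interaction (dyadic multimodal behavior); observation yields the signal: subjective measurement (ADOS, the autism spectrum observation scale); expert decision: autism diagnosis, analysis of behavior categories]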

32
The role of BSP . . .
Human high-level cognitive ability: expert decision making
BSP
Technology
High reliability, repeatability, and scalability

QUANTITATIVE：
QUANTITATIVE EVIDENCE DIRECTLY FROM MEASURABLE SIGNALS
EFFICIENCY :
HELP DO THINGS THAT EXPERTS KNOW TO DO WELL MORE
EFFICIENTLY, CONSISTENTLY & AT SCALE
SUPPLEMENTARY:
COMPLEMENT WITH GOLD STANDARD METHOD WHEN APPROPRIATE
POSSIBILITY:
TOOLS FOR NOVEL ACTIONABLE INSIGHT DISCOVERY
33
COMPUTING BEHAVIORAL TRAITS & STATES FOR DECISION MAKING & ACTION
…aim..

34
BSP enablers . . . (one half of the puzzle)
• Text processing
• Voice activity detection
• Alignment
• Transcription
• Keyword spotting
• Prosody modeling
• Voice quality
• Diarization
• Speaker identification
• Dialog act tagging
• Face detection
• Expression recognition
• Action recognition
• Language understanding
• Affective computing
• Speaker state and trait
• Joint speech-visual processing
• Interaction modeling
• Sentiment analysis

35
Enabling Technologies (signal processing, machine learning)
• Low-level descriptors: acoustic, motion, text, and image features
• Speech recognition, face recognition, action recognition
• Dialog act tagging, keyword spotting, text processing
• Sentiment analysis, affect recognition, speaker states and traits
• Visual-speech processing, interaction modeling
Domain Experts Knowledge
• Subjective assessment; internal state & construct; neuro-developmental disorder
• Evidence-based observational coding; intervention efficacy; coder variability control
• Development of coding manual; self-report measure validity; coding mechanism
• Social behavior, affective behavior, communicative behavior, dyadic behavior
Together: human behavioral signal processing

36
Four key elements
Behavioral signal processing

BSP
INGREDIENTS
37
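I. Collection: data representativeness
• Realistic, application-oriented data
II. Preprocessing: thorough preprocessing
• Who said what (voice activity detector, speaker clustering)
• Face and body detection and tracking
III. Modeling: building an appropriate model
• Apply algorithms that fit the characteristics and amount of the data
• Algorithm = feature computation + machine learning
IV. Evaluation: multi-faceted model evaluation
• Evaluate system validity in several meaningful ways
• Choose metrics carefully, application-oriented
Sustained cross-disciplinary collaboration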

38
Close cross-disciplinary collaboration
BSP
INGREDIENTS
Domain experts
Engineers
Human behavioral signal data scientists

40
Computational Methods that Model Human Behavior Signals
• Manifested in Overt and Covert Cues
• Processed and Used by Humans Explicitly or Implicitly
• Facilitate Human Analysis and Decision Making
Outcome of Behavioral Signal Processing
• Behavioral Analytics
QUANTIFYING HUMAN EXPRESSED BEHAVIOR AND
HUMAN “FELT SENSE”
DERIVING INTERPRETABLE BEHAVIOR ANALYTICS
FROM DATA FOR ACTIONABLE INSIGHTS

41
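Main themes of today's talk:
National Academy for Educational Research principal training program: building automatic scores for impromptu speeches
Emergency internal-medicine triage: developing an observational pain index
Autism spectrum observation scale (ADOS): analyzing the social-communication ratings in the scale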

42
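Parenting magazine (親子天下, March 13, 2013), under the headline "Rigid question types? Who tested the teachers into dullness?", plainly raised doubts about how the current teacher certification exam is written.
CommonWealth Magazine (天下雜誌, May 29, 2013), in "Teachers held hostage by standard answers", argued that rigid exam question types already limit the basic abilities the exam was originally meant to assess in teachers.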

44
National Academy for Educational Research
Annual pre-service principal training program

45
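Case study
School development plan
Educational visit report
Subject test
About 200 trainees per year
Currently: scored by two mentor principals
Could a computer help with the scoring?
Daily journal
Impromptu speech
School administration exercise
Daily conduct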

47
Can you tell the difference?

48
1. Subjective evaluation
2. Time-consuming
3. Non-scalable
1. Body movement
2. Voice and intonation
3. Text
Goal of this collaboration: large-scale use in other selection or training programs

49

50
[Bar chart: THE NUMBER OF EMERGENCY PATIENTS, 2010~2015; y-axis from 0 to 8,000,000; highlighted value 7,200,000]

52
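Taiwan Triage and Acuity Scale (TTAS)
• Respiratory distress
• Hemodynamics
• Level of consciousness
• Trauma
• Body temperature
• Pain level (peripheral or central)
Self-report scale (NRS-11)
has some problems

Nurses give a score after their own observation
The difficulty in implementation of NRS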
53

54
Develop an 'automated observational' scale from audio and video signals
to help with the possible problems of the current NRS-11

55

56
social-communicative neurodevelopmental disorder
• Prevalence: 1 in 68 children (1 in 42 males) diagnosed [CDC2014]
• ASD: “Spectrum” disorder due to the extreme heterogeneity
• Intervention leads to improved outcomes
The role of BSP in autism?
What is Autism?

57
ROLE OF BSP?
Automatically analyze the social and interactive behavior between clinician and child during the ADOS diagnostic interaction
AIM?
• Analysis at scale
• Quantitative evidence from signals
• New finding beyond current status-quo
in psychiatry (?)

58
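This item concerns the absolute amount of reciprocal exchange, in any mode of communication, over the whole assessment, and how it is distributed across contexts.
• Extensive use of verbal or nonverbal behavior for social exchange (appears to include reciprocal intent, giving opinions, comments, or nonverbal behavior)
• Some reciprocal social communication, but reduced in frequency or amount, or in the number of contexts in which it occurs (regardless of non-social talk)
• Most communication is object-oriented, or consists of answering questions, echolalia, no specific preoccupations, no social chat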
Qualitative description

59
Example: a snippet of an actual clinical ADOS diagnostic session

60
Can we?
Automatically measure spontaneous social (verbal/nonverbal) behavior between clinician and child to predict the child's rating of atypical amount of reciprocal social communication

BSP
INGREDIENTS
62
Data representativeness
Thorough preprocessing
Appropriate model building
Algorithm application
Machine learning

64
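Through digital data collection, integration of cross-disciplinary knowledge from the human sciences, and development of signal processing and machine learning algorithms, human behavior is observed, quantified, analyzed, and recognized, giving experts a brand-new decision-making tool
$\text{Behavioral Analytics} = f(\text{SP/ML} + \text{Behavior Science})$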

Part 2:
Human behavior data collection and processing
65

BSP
INGREDIENTS
66
Data representativeness
Thorough preprocessing
Appropriate model building
Algorithm application
Machine learning

67
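Data representativeness
Acquiring audio-video signal data
• Robust multimodal signal (audio, video) recording and processing
• Record natural human behavior in an ecologically valid way
• Requirements on ease of application and realism
Ratings for the recognition target
• Use an established instrument
• Scientific rigor
• Ensure domain-applicable analytics output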

68

69
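Where
BIIC (ideal): a soundproof isolation room
Originally: an unrestricted classroom at the National Academy for Educational Research
Compromise: a classroom kept as quiet as possible
When
BIIC (ideal): every exam each principal takes in the training program
Originally: none
Compromise: record everything that staffing allows
How
BIIC (ideal): headset microphones, multi-track audio, face and body motion, Kinect, all synchronized
Originally: no equipment, and it must be operable with minimal staff
Compromise: upper-body video with an external microphone
Ensure current system is not altered too much at the BEGINNING
at-scale, ease-of-application is crucial
Strike a balance between ecological validity and quality control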

70
!! How close the collaboration is also matters !!
Someone suddenly walks by
The class bell? @@
Handled once discovered

71
So far: 360 speech audio-video recordings collected

72

73
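National Academy for Educational Research: pre-service principal training program, impromptu speech rating form
• Content fits the topic
• Clear, orderly structure
• Wording appropriate for the audience
• Appropriate demeanor
• Standard pronunciation
• Suitable intonation and volume, fluent delivery
• Good time control, unhurried ending
• Total score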

74
Emergency internal-medicine triage: automated observational pain index study

75
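Where
Originally: none
Compromise: a separate consultation room kept as quiet as possible
When
BIIC (ideal): pre-test, post-test, and follow-up for every patient
Originally: no audio-video
Compromise: record everything within our capacity
How
BIIC (ideal): full frontal video
Originally: no audio-video
Compromise: a single camera recording at close range
Ensure current system is not altered too much at the BEGINNING
at-scale, ease-of-application is crucial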

76
!! Because this is a real-world setting !! many things can happen
Lots of people around
Handled once discovered

77
So far: 250 patients collected, with pre-test and post-test audio-video data

78

79
Verbal Numerical Rating Scale (NRS)
11-level self-report pain-level assessment (0-10)
Considered the clinically valid 'gold standard' for assessing pain
Heart rate, blood pressure, pain location, age, and sex were collected at the same time

80

81
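Where
Originally: a relatively quiet consultation room
Result: a quiet, separate consultation room
Research Oriented:
We have a little more flexibility in the room design!!
When
BIIC (ideal): when patients come in for ADOS
Originally: when patients come in for ADOS
Result: when patients come in for ADOS
How
BIIC (ideal): full frontal video
Originally: a single camera
Result: several cameras recording at close range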

82
Two HD-cameras Two lapel microphones (synced through mixers)
~40 subjects

83

Autism Diagnostic Observation Schedule [Lord 2001]
• Subject interacts with a psychologist for ~45 minutes
• Current gold standard, research-level observational coding
• Psychologists are trained using stringent training protocol
• Semi-structured assessment that elicits socio-communicative behavior from children with ASD for diagnosis
• Multiple subpart events (14) with ratings of a wide range of socio-communicative behaviors (28)
84

85
Judgment by professional clinicians, with internal quality control
[Figure: the ADOS qualitative rating descriptions for reciprocal social communication shown on slide 58]
The entire ADOS scale is collected at the same time

86
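Data representativeness
Tip 1: discuss, discuss, discuss
Tip 2: how you collect the data affects the preprocessing later
Tip 3: imagine the future application
Tip 4: domain experts' input is crucial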

BSP
INGREDIENTS
87
Data representativeness
Thorough preprocessing

88
Thorough preprocessing
The hard groundwork before "seeing, hearing, reading, and recognizing"
• Pre-processing
• Data collection-dependent
• Smart utilization of current progress in audio-video processing
What is the label to be learned?
• Label consistency
• Reliable labeling
• Construct validity

89

90
Voice Activity Detector (VAD)
First, automatically find the parts that contain speech

• Speech signal per session
• Energy of every frame
– frame = 25 ms
– standard deviation (normalize D.C. offset)
• Threshold
– speech percentage in the wav
• Speech segments
– Energy > threshold energy
Short-time energy formula:
$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$
A simple voice activity detector
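
The energy-threshold detector above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lab's implementation: the function names, the non-overlapping frames, and the rule "keep the top `speech_fraction` of frames by energy" are all hypothetical choices made for the example.

```python
import math

def short_time_energy(x, frame_len, hop):
    """Energy of each frame: the sum of squared samples in that frame."""
    energies = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energies.append(sum(s * s for s in frame))
    return energies

def simple_vad(x, sample_rate, frame_ms=25, speech_fraction=0.5):
    """Return one boolean per frame: True where the frame is treated as speech.

    Assumption: the threshold is set so that roughly `speech_fraction`
    of the frames pass it.
    """
    # Remove the D.C. offset and scale by the standard deviation.
    mean = sum(x) / len(x)
    std = math.sqrt(sum((s - mean) ** 2 for s in x) / len(x)) or 1.0
    x = [(s - mean) / std for s in x]

    frame_len = int(sample_rate * frame_ms / 1000)
    energies = short_time_energy(x, frame_len, frame_len)  # non-overlapping frames

    ranked = sorted(energies)
    k = int(len(ranked) * (1 - speech_fraction))
    threshold = ranked[k - 1] if k > 0 else float("-inf")
    return [e > threshold for e in energies]

# Toy signal: 0.5 s of near-silence followed by 0.5 s of a 100 Hz tone, at 1 kHz.
signal = [0.01 * (-1) ** n for n in range(500)]
signal += [math.sin(2 * math.pi * 100 * n / 1000) for n in range(500)]
flags = simple_vad(signal, sample_rate=1000)
```

Consecutive `True` frames would then be merged into speech segments; real recordings usually need smoothing of the frame decisions as well.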

[Figure: human-labeled speech segments compared with VAD-detected speech segments]
Quantify the audio, video, and text of these segments (Part 3)

93

94
First, automatically find the parts with the patient's voice (diarization)
First, extract the face and landmark points from the video

95
Unsupervised diarization
Segmentation and Clustering (Diarization)
Segment first, then merge
Speaker B
Speaker A
Where are speaker
changes?
Which segments are
from the same speaker?

96
Segmentation and Clustering (Diarization)
Segment first, then merge
Compute each speaker's voice features
(Mel-frequency cepstral coefficients)
MFCC low-level feature computation
Low-level descriptors
(Part 3)
Per frame

97
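Segmentation:
speaker change detection
1. First remove the silent segments (learned with machine learning)
2. For each group of frames, measure whether the two sides of the midpoint differ
Commonly used: Bayesian Information Criterion (BIC)
The result looks like: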

98
Clustering
speaker change
detection
1. Generate i-vector for each ‘segment’
2. Compute pairwise similarity between clusters
3. Merge closest clusters
4. Update distances of remaining clusters to
new cluster
5. Iterate steps 2-4 until stopping criterion is
met
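
The five clustering steps above can be sketched as a small agglomerative loop. This is only an illustration: i-vector extraction is not shown, so each segment is assumed to already have an embedding vector, and the cosine similarity, centroid linkage, and `stop_similarity` threshold are assumptions made for the example rather than details from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def agglomerative_cluster(embeddings, stop_similarity=0.5):
    """Merge the closest clusters until no pair is similar enough."""
    # Step 1 (assumed done elsewhere): one embedding per segment.
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        # Step 2: pairwise similarity between cluster centroids.
        best = None
        for i in range(len(clusters)):
            ci = centroid([embeddings[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([embeddings[k] for k in clusters[j]])
                sim = cosine(ci, cj)
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        # Step 5: stopping criterion.
        if best[0] < stop_similarity:
            break
        # Steps 3-4: merge the closest pair; centroids are recomputed next round.
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Four toy segment embeddings: two per (hypothetical) speaker.
segments = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
speakers = agglomerative_cluster(segments)
```

Each returned cluster is a list of segment indices attributed to one speaker.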

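Speaker
Diarization
Human-labeled
Interviewer's questions
Patient
Some parts could still be improved, but others are actually not that far off
Now we can compute what the computer should "hear" for the patient!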

100
Detect the face and 68 facial landmarks (OpenFace toolkit)
Turn the video into still frames

101
Face detection
68 facial landmark
detection
Pre-trained
Constrained local neural field method

102
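The real world is not this simple . . .
Apply several rules afterwards to correct errors automatically
For example: face turned too far sideways → the eye-to-cheek distance suddenly becomes too small
There were actually quite a few problems, and we fixed them one by one (learn the hard way!)
Now we can compute "facial expression" and "voice features"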

103
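From here we can compute "head movement" and "voice features"
Autism spectrum observation scale (ADOS): computing the qualitative social-communication descriptions in the scale repeats the steps above; the data collection process differs somewhat, of course
TAILORED SOLUTION
Tip 1: the data collection process matters, matters, matters
Tip 2: knowing your data is a must, a must, a must
Tip 3: some samples are simply too wrong; recognize that and do not use them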

104
Thorough preprocessing
• Pre-processing
Preprocessing: not only on the signal side

105
Label processing

Senior principals' scores have different dynamic ranges
Rank normalization
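
A minimal sketch of rank normalization, assuming each rater's scores are converted to ranks and rescaled to [0, 1]. The rescaling and the averaging of tied ranks are assumptions made for this example; the slides only name the technique.

```python
def rank_normalize(scores):
    """Map one rater's scores to ranks scaled to [0, 1]; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0  # 0-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    denominator = max(len(scores) - 1, 1)
    return [r / denominator for r in ranks]

# Two hypothetical raters with different dynamic ranges but the same ordering.
rater_a = rank_normalize([70, 90, 80])
rater_b = rank_normalize([85, 95, 90])
```

After normalization the two raters give identical values, because only the ordering of the speeches is kept.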

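107
Since the labels are in hand, analyze them while you are at it; this is an important preprocessing step
• Content fits the topic (20%)
• Clear, orderly structure (20%)
• Wording appropriate for the audience (20%)
• Suitable intonation and volume, fluent delivery (10%)
• Appropriate demeanor (10%)
• Standard pronunciation (10%)
• Good time control (10%)
– Total score (100%)
PCA
First principal axis weights
4 dimensions: 95% variance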
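Remember to compute the inter-evaluator agreement level
The higher-level the concept, the more easily raters agree!
In the end, recognition is run on the rank-normalized total score
Depends on the scenarios (sometimes reviewers too!)
Cronbach's alpha, intra-class correlation, Fleiss' kappa, Cohen's kappa
Inter-evaluator agreement per rating:
Content fits the topic + clear, orderly structure + wording appropriate for the audience: 0.55
Suitable intonation and volume, fluent delivery: 0.39
Appropriate demeanor: 0.43
Standard pronunciation: 0.58
Total score: 0.63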

109
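Label processing

110
What is there to process in a self-report?
Patient: high pain
Patient: moderate pain
During development and validation,
which part should be the most accurate?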

111
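Because treatment records exist,
which samples should be used to develop and validate the framework?
Rule:
• received an injection
• self-reported pain dropped clearly between pre-test and post-test
Data samples
IEEE
The importance of cross-disciplinary knowledge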

112
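Autism spectrum observation scale (ADOS): computing the qualitative social-communication descriptions in the scale
Label processing

113
What is the problem? The labels come from experts
Social reciprocity requires dialogue; ADOS is very comprehensive
Description of picture
Creating a story
Emotion
Joint interactive play
Choose the segments to analyze carefully for the label being analyzed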

114
Thorough preprocessing
• Pre-processing
Tip 1: the label to be learned must be analyzed thoroughly
Tip 2: domain experts are very good at this
Tip 3: read, read, read

115
[Figure: map of enabling technologies (low-level descriptors; speech, face, and action recognition; voice activity detection; diarization; text processing; sentiment analysis; affect recognition; speaker states and traits; visual-speech processing; interaction modeling) and domain-expert knowledge, joined as human behavioral signal processing]
How to collect data labels
Label preprocessing
Signal preprocessing

116
1. We have data
2. We have usable labels/data
3. Now we can develop behavior analytics

Part 3:
How computers "hear", "read", and "see"
to understand human behavior
117

BSP
INGREDIENTS
118
Data representativeness
Thorough preprocessing
Appropriate model building
Algorithm application
Machine learning

119
[Figure: the map of enabling technologies and domain-expert knowledge from slide 35, shown again]

120
In human computing (mainly signal-based) research
Data & algorithm go hand-in-hand
Algorithms change very quickly

121
Hearing, reading, and seeing: the basics in a nutshell?

122
[Diagram: low-level signal descriptors → encoding/profile of speech characteristics, visual characteristics, and text → integration of multiple behavior modalities → Behavioral Analytics]

123
First: low-level descriptors (LLDs)
The computer starts to make sense of behavior

124
Encoding/profile: speech characteristics
Each frame
Overlapping step
Captures all kinds of variation
Speech production system
Source: vocal folds; filter: vocal tract

125
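Three commonly used LLDs
Intensity (pressure): variation in loudness
Conceptually: the energy of each short-time frame
$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$
Pitch (source): variation in vocal intonation
Conceptually: the autocorrelation function of each short-time frame
$$R_n(k) = \sum_{m=0}^{N-1} x(n+m)\,x(n+m+k), \quad 0 \le k \le J$$
Find the k that makes this function large in order to quantify pitch
MFCC (filter): Mel-frequency cepstral coefficients
Conceptually: map the speech signal onto the Mel scale, take the log energy on the Mel scale, and apply an inverse discrete Fourier transform to move to the cepstral domain. The MFCCs are the amplitudes of this cepstrum (13 dimensions).
First- and second-order derivatives are usually appended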
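
The autocorrelation idea for pitch can be sketched as follows. This is a bare illustration, assuming a single clean voiced frame: the sum is truncated at the frame boundary, and the search range `fmin`/`fmax` is a hypothetical choice for the example. Practical pitch trackers add voicing decisions and smoothing on top.

```python
import math

def autocorrelation(frame, max_lag):
    """R(k) = sum over m of x(m) * x(m + k), truncated at the frame boundary."""
    n = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(n - k))
            for k in range(max_lag + 1)]

def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Pick the lag with the largest autocorrelation inside the allowed pitch range."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    r = autocorrelation(frame, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return sample_rate / best_lag

# A 50 ms frame of a 200 Hz tone sampled at 8 kHz (period = 40 samples).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
pitch_hz = estimate_pitch(frame, sample_rate)
```

Here the largest autocorrelation value in the search range falls at a lag of 40 samples, which corresponds to 200 Hz.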

126
Tools that are quick to get started with
openSMILE
• Versatile and fast audio feature extractor
• Open-source and cross-platform
• Abundant speech-related features: signal energy, loudness, Mel-spectra, MFCC, PLP-CC, pitch
• Audio I/O; many supported I/O formats: WEKA, HTK, LibSVM
Praat
• Direct visualization
• Somewhat easier to use
There are many more . . .

127
Encoding/profile: visual characteristics
Commonly used to describe an image (photo) frame: texture, shape, keypoint, edge
• Histogram of oriented gradients (HOG)
• Scale-invariant feature transform (SIFT)
• Local binary pattern (LBP)
• 3D SIFT
• HOG3D
[Figures: histogram of oriented gradients (HOG); local binary pattern (LBP)]

128
Easy-to-start tools
• C++: OpenCV
• Python: cv2 (OpenCV), scikit-image

129
What if you want to compute motion rather than per-frame features?
Improved Dense Trajectory
Optical flow
Trajectory + HOG + HOF + MBH

130
If that was unclear, just watch
An example we ran on a different dataset; watch the points moving

131
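Next: audio and video encoding (encoding/profile)
Audio: one vector every 10 ms
Video: one vector every 66 ms
The 'analysis unit' is usually one utterance or one session
Wherever the label is attached (its time granularity) defines your analysis unit
Within one analysis unit,
produce a single encoded vector:
an overall description of that sequence of low-level features
Two basic encodings are introduced here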
132
[Figure: audio and video frame sequences grouped into analysis units]

133
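Functionals
LLDs
The simplest approach is to compute some statistical functions that describe how each feature behaves within the analysis unit
Do not underestimate this: in much speaker state and emotion recognition work, once such results become the baseline they are very hard to beat!!
$\text{dimension} = \#\mathrm{LLD} \times \#\mathrm{FUNCTIONAL}$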
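
A small sketch of the functionals encoding: every LLD track in an analysis unit is summarized by a handful of statistics, and the results are concatenated. The particular five statistics chosen here are an assumption for illustration; toolkits typically offer many more.

```python
import math

def functionals(values):
    """Summary statistics of one LLD track: mean, std, min, max, range."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [mean, std, min(values), max(values), max(values) - min(values)]

def encode_unit(lld_frames):
    """lld_frames: list of frames, each a list of LLD values for that frame.

    Returns one fixed-length vector of size (#LLD x #functionals),
    regardless of how many frames the analysis unit contains.
    """
    vector = []
    for track in zip(*lld_frames):  # one track per LLD across all frames
        vector.extend(functionals(track))
    return vector

# Three frames with two hypothetical LLDs each (e.g. pitch-like and energy-like values).
unit = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
encoded = encode_unit(unit)
```

With 2 LLDs and 5 functionals the encoded vector has 10 dimensions, matching the dimension rule above.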

134
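Bag-of-feature encoding
LLDs → k-means clustering → dictionary → histograms
The dictionary is built beforehand with k-means clustering
$\text{dimension} = k$
The histogram can be thought of as describing the mix of patterns in the unit
Both encodings are quite commonly used on audio and video features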
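
The bag-of-feature pipeline can be sketched with a plain k-means written out by hand. This is an illustrative toy under stated assumptions: the evenly spaced initialization, the fixed iteration count, and the normalized histogram are choices made for the example, and a real system would normally rely on a library implementation.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(codebook, frame):
    """Index of the codeword closest to the frame."""
    return min(range(len(codebook)), key=lambda c: sq_dist(codebook[c], frame))

def kmeans(frames, k, iterations=20):
    """Build a k-word dictionary from training frames (Lloyd's algorithm)."""
    step = len(frames) // k
    codebook = [list(frames[i * step]) for i in range(k)]  # deterministic init
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(codebook, f)].append(f)
        for c, members in enumerate(groups):
            if members:  # keep the old codeword if a cluster is empty
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def bag_of_features(frames, codebook):
    """Normalized histogram of codeword assignments: one k-dimensional vector."""
    hist = [0] * len(codebook)
    for f in frames:
        hist[nearest(codebook, f)] += 1
    return [h / len(frames) for h in hist]

training_frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
dictionary = kmeans(training_frames, k=2)
unit_frames = [[0.0, 0.1], [5.0, 5.0], [5.0, 4.9], [4.9, 5.0]]
histogram = bag_of_features(unit_frames, dictionary)
```

The dictionary is learned once on training data; every analysis unit is then encoded against the same dictionary.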

135
[Figure: audio and video frame sequences grouped into analysis units]

136
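Encoding/profile: text
Word
Use the word directly as the unit to compute a vector for a document (document vector = the analysis unit's vector)
Distributed word representation
Encode by representing each word as a vector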
137
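Term Weighting Method
A simplifying representation by term count
Term Frequency: how important (or informative) a word is in a document.
$$\mathrm{tf}_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$$
Inverse Document Frequency: how important (or informative) a word is in the corpus.
$$\mathrm{idf}_{t,D} = \log \frac{N}{1 + |\{d \in D : t \in d\}|}$$
Term Frequency–Inverse Document Frequency (TF-IDF):
$$\mathrm{tfidf}_{t,d,D} = \mathrm{tf}_{t,d} \times \mathrm{idf}_{t,D}$$
Sometimes this alone is already very effective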
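
A short sketch of the TF-IDF weighting with the smoothed denominator 1 + document frequency, as in the formula on this slide. The function name and the toy documents are hypothetical; note that with this smoothing a term appearing in every document receives a slightly negative weight.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = sum(counts.values())
        weights.append({
            term: (count / total) * math.log(n_docs / (1 + doc_freq[term]))
            for term, count in counts.items()
        })
    return weights

docs = [["a", "b", "a"], ["a", "c"], ["a", "d"], ["a", "e"]]
weights = tf_idf(docs)
```

In this toy corpus the rare term "b" gets a positive weight in the first document, while the ubiquitous term "a" is pushed below zero.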

138
The unit does not have to be a single word . . .
N-gram
Turn unigram terms into bigram terms at the word-tokenization step
For instance,
John also likes to watch football games
[ 'John also' , 'also likes' , 'likes to' , 'to watch' , 'watch football' , 'football games' ]
[ 1 , 1 , 1 , 1 , 1 , 1 ]
This can be extended indefinitely
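
The bigram example above can be reproduced with a few lines of Python; the helper name `ngrams` is just an illustrative choice.

```python
from collections import Counter

def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "John also likes to watch football games".split()
bigrams = ngrams(tokens, 2)
bigram_counts = Counter(bigrams)
```

Changing `n` gives trigrams and longer units; the count vector over all n-grams in the corpus then plays the same role as the unigram term counts.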
[Figure: a sentence from a speech transcript used as the text example]
139
Distributed word representation
Represent each word as a vector
CBOW: predicts the word given its context
Skip-gram: predicts the context given a word
The distributed representation encoded in the hidden layer of the neural network is used as the representation of words

140
[Figure: the word vectors of a transcript sentence are averaged into one text vector]
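
A minimal sketch of averaging word vectors into one document vector. The 2-dimensional `toy_embeddings` table is made up for illustration; in practice the vectors would come from a trained Word2Vec model, and skipping out-of-vocabulary tokens is an assumption of this example.

```python
def document_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None  # no token of this document is in the vocabulary
    return [sum(col) / len(known) for col in zip(*known)]

# Hypothetical toy embeddings for three segmented Chinese words.
toy_embeddings = {"提升": [1.0, 0.0], "老師": [0.0, 1.0], "教學": [1.0, 1.0]}
doc_vec = document_vector(["提升", "我們", "老師", "教學"], toy_embeddings)
```

The result has the same dimensionality as a single word vector, no matter how long the transcript is.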

141
That covers low-level descriptors for three modalities
The computer starts to make sense of behavior

142
Multimodal behavior fusion is almost a must in this kind of work

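143
The two simplest multimodal behavior fusion methods. Which is better?
• Early (feature-level) fusion: the audio, video, and text encodings are combined first, then one machine learning model produces the Behavioral Analytics
• Decision fusion: each modality produces its own Behavioral Analytics, and the outputs are combined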
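
The difference between the two fusion schemes can be shown in a few lines. This sketch leaves the actual models out: `early_fusion` only builds the joint feature vector that a single model would consume, and `late_fusion` combines scores that per-modality models are assumed to have produced already. The equal default weights are an assumption for the example.

```python
def early_fusion(audio_vec, video_vec, text_vec):
    """Feature-level fusion: one long vector goes into a single model."""
    return list(audio_vec) + list(video_vec) + list(text_vec)

def late_fusion(scores, weights=None):
    """Decision-level fusion: weighted combination of per-modality model outputs."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))

joint_features = early_fusion([1, 2], [3], [4, 5])
fused_score = late_fusion([0.2, 0.4, 0.9])  # hypothetical audio, video, text scores
```

Which scheme works better depends on the data: early fusion lets the model see cross-modal interactions but raises the feature dimension, while decision fusion keeps each model small.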

144
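[Diagram: speech, visual, and text encodings → integration of multiple behavior modalities → Behavioral Analytics]
Note*
Academic trend: integration into various forms of (D/R)NN and (B)LSTM
BSP-type work can use them too; just be aware of f(# of data), and sometimes interpretability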

145
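That was . . .
a brief introduction to the basic concepts
Beyond an introduction:
if you have a suitable BSP problem at hand and do not know where to start . . .
try the above first!!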

146
Finally, back to the three case studies . . .
They are built on the foundation above

147

148
Modeling: acoustic feature computation
Acoustic description: Dense Unit Acoustic Features
• Speech segmentation into segments $S_1, S_2, \dots, S_N$
• Each $A_{S_X:Y}$ = a unit-level (200 ms) $p$-length dense acoustic feature (acoustic LLDs + functionals)
• K-means bag-of-word encoding
Modeling: body movement feature computation
Body movement description: Dense Trajectory
• $l$-frame dense points tracking: TRAJ, MBHxy
• Each $V_m$ = a unit-level (66 ms) $q$-length derived video feature ($V_1, \dots, V_M$)
• Gaussian mixture model Fisher encoding

149
好齁…那也希望能夠透過這樣子的一個方
式來…提升我們老師的教學…來檢視各位
老師的教學成果，
是不是對我們所有的學生齁…有實質的一
個幫助，
‧‧‧‧‧‧
所以有了今天週三進修做為開端，
未來我們會研議更多積極的策略加強本校的
英語教學，
好好把我們的孩子提升他的英語力，
擁有英語力才有競爭力，‧‧‧‧‧‧

所以|c 教師|n 評鑑|v 它|r 不是|c 在|p
檢驗|vn 它|r 是|v 在|p 證明|n
那|r 也|d 希望|v 能夠|v 透過|v 這樣|r 子|ng
的|uj 一個|m 方式|n
來|zg 提升|v 我們|r 老師|n 的|uj 教學|n
來|zg 檢視|v 各位|r 老師|n 的|uj 教學|n
成果|n 是不是|l 對|p 我們|r 所有|b 的|uj
學生|n 有|v 實質,的|uj 一個|m 幫助|v
Speech prompt: explain to teachers your view on teacher evaluation
所以教師評鑑它不是在檢驗它是在證明
那也希望能夠透過這樣子的一個方式
來提升我們老師的教學
來檢視各位老師的教學成果是不是對我們
所有的學生有實質的一個幫助
Chinese must be segmented into words first; the word is the unit
Jieba
Built to be the best Python Chinese word segmentation module
Academia Sinica also has a word segmentation system

151
Word2Vec: background corpus, web crawling
Yahoo news, 萌典 (Moedict), wiki, PTT

152
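Transcript feature computation
[Figure: transcript tokens → N-gram + k-means over all documents → BOW per document; Word2vec (model of the N surrounding words) → document vector]
New:
an encoding that combines the functional, context, and BOW approaches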

153
[Diagram: low-level descriptors → encoding/profile for speech, visual, and text → machine learning per modality (audio, video, text) → decision fusion → Behavioral Analytics]

154
How accurate are these analytics?
Total score: Spearman correlation = 0.621
Inter-evaluator agreement: 0.63
Individual modalities: Spearman correlation of roughly 0.3 - 0.4
Validation to be continued . . . (Part 4)

155

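156
Modeling: acoustic and facial behavior feature computation
Audio and facial expression: high-dimensional feature computation
[Figure: raw audio-video recording → segments S1, S2, ..., Sk → MFCC, pitch, intensity → z-scored features per segment, $Z_{s_k}: [1, n_k]$]
Correcting for individual differences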

157
Action-unit inspired facial ‘low-level descriptors’ computation
Facial landmarks
Head pose estimation (X, Y, Z axes, angle θ)
Head orientation movement

158
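[Diagram: low-level descriptors → encoding/profile for speech and visual characteristics → early fusion of audio and video → machine learning → Behavioral Analytics]
A support vector "classifier" separates severe pain from mild pain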

159
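Triage nurse
Family member
Patient
NRS self-report: severe pain
Computer recognition: severe pain

160
Triage nurse
Family member
Patient
NRS self-report: mild pain
Computer recognition: mild pain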

161
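Accuracy? Compared against the self-report NRS-11!
Accuracy
High vs. low pain recognition: 74%
High vs. medium vs. low pain recognition: 52%
Audio > video
The literature mostly examines the face
Validation to be continued . . . (Part 4)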

162

163
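From audio and video, develop clinician-child social interaction measures to analyze and predict the amount of reciprocal social communication
Compute both people's behavior during the conversation
Commonality between the two people's behaviors: mutual information
Quantitatively, Automatically
ADOS description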

164
ADOS
Emotion Part
Multimodal Turn-taking Behavior
Coordination Time Series
Automatic generating a time-series of
multimodal behavior coordination measure
across a session . . .

165
Audio
– Pitch
– Intensity
– MFCC
– Delta 、 Delta-Delta
Video
– Head poses
– Eye gaze
– Delta 、 Delta-Delta

166
[Diagram: low-level descriptors → encoding/profile for speech and visual characteristics → early fusion of audio and video]
Canonical correlation analysis gave the better result

167
ADOS
Emotion Part
Multimodal Turn-taking Behavior
Coordination Time Series
Automatic generating a time-series of
multimodal behavior coordination measure
across a session . . .

168
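Encode after fusion (symbols) → mutual information becomes easy to compute
For each turn-taking:
short-time (1.5-second) behavioral mutual information
Sliding window
Take the maximum as the representative value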
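
A minimal sketch of the turn-level measure described above, assuming both speakers' fused behavior has already been quantized into aligned symbol sequences. The window length in symbols, the hop size, and the function names are hypothetical; how the 1.5-second window maps to a number of symbols depends on the frame rate, which is not specified here.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two aligned symbol sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def turn_level_mi(x, y, window, hop=1):
    """Slide a window over one turn-taking and keep the maximum short-time MI."""
    values = [
        mutual_information(x[s:s + window], y[s:s + window])
        for s in range(0, len(x) - window + 1, hop)
    ]
    return max(values)

# Toy symbol sequences for one turn: independent at first, then fully dependent.
clinician = [1, 2, 1, 2, 1, 1, 2, 2]
child = [3, 3, 4, 4, 3, 3, 4, 4]
turn_value = turn_level_mi(clinician, child, window=4, hop=4)
```

Computing one such value per turn gives the per-session series (n turns, n values) that the next slide turns into session-level descriptors.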

169
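[Figure: symbol sequences X: 3 3 3 2 1 1 2 1 3 ... and Y: 2 1 2 1 3 1 1 2 3 ..., compared within a 1.5 s window that is shifted along the turn]
n turns give n values → session-level descriptors
Machine learning: logistic regression → Behavioral Analytics
Multimodal clinician-child behavioral dependency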
170
Binary Classification between
typical vs. atypical

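ADOS: social reciprocity score (B9)
Recognition: typical
Expert diagnosis: typical
Multimodal interaction behavior dependency index

ADOS: social reciprocity score (B9)
Recognition: severe
Expert diagnosis: severe
Multimodal interaction behavior dependency index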

173
Accuracy = 0.74
Accuracy = 0.81

As with any data science work,
accuracy is only one aspect
It can be used to optimize the analytics
What next? (Part 4)
174

175
[Diagram: the early fusion and decision fusion schemes from slide 143, shown again]
How computers "hear", "read", and "see" to understand human behavior

176
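The concept is basically not too hard,
but I think it is actually quite hard in practice
General end-to-end system needs more R&D
Many current approaches are
context-dependent (whatever works)
There are of course some good rules of thumb
Map to the construct you want to automate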

177
[Diagram: speech, visual, and text encodings → integration of multiple behavior modalities → Behavioral Analytics]
Note*
f(# of data), and interpretability

BSP
INGREDIENTS
178
Data representativeness
Thorough preprocessing
Appropriate model building
Algorithm application
Machine learning

Part 4:
Continuing the cross-disciplinary integration: giving experts one more, brand-new decision-making tool
179

BSP
INGREDIENTS
180
Data representativeness
Thorough preprocessing
Appropriate model building
Algorithm application
Machine learning

181

182
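Validation actually needs to cover several aspects
This must be done in consultation with the experts

183
Pretty good, right?
Pretty good?
Hmm?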

184
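Evaluation: experiment
The same 10 speech audio-video recordings, rated at the start and again X months later
[Figure: correlations among mentor principal 1, mentor principal 2, and the audio-video scoring: r = 0.527, r = 0.612, r = 0.648]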

185
𝐒𝐩𝐞𝐚𝐫𝐦𝐚𝐧 𝐜𝐨𝐫𝐫𝐞𝐥𝐚𝐭𝐢𝐨𝐧 = 𝟎. 𝟔𝟐𝟏
Higher consistency
Pretty good, right?
Pretty good

Extension
186
Good collaborative vibe . . .
Keep going!

187
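Comprehensive training for senior education administrators: assessed from many angles
• Case study
• School development plan
• Educational visit report
• Subject test
• Daily journal
• Impromptu speech
• School administration exercise
• Daily conduct
For judging speech quality, is the impromptu speech item the only one that can be used to build the model?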

188
Using a simple multi-task learning approach
Each type of score (training assessment) is one task
Task 1: useful features
Task 2: useful features
. . .
Task 8: useful features
Kernel fusion
Multi-task learning
Multimodal behavior fusion

189
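Accuracy again improved noticeably
Which training assessments, when included, significantly help build the speech score?
• Case study
• School development plan
• Educational visit report
• Daily journal
Not the written test!
An actionable insight that was not clear before
Hence, the project continues…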

190

191
(Low: 0-3, moderate: 4-6, high: 7-10)
Patient: high pain
Recognition: low pain
Patient: low pain
Recognition: high pain
Accuracy
High vs. low pain recognition: 74%

192
Content Validity
Validity
Construct Validity
Criterion Validity

193
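Observational pain index: appears to work for children, acute pain, and the elderly
It differs from self-report to a considerable degree
Can it complement the current gold standard?
Patients who received a painkiller injection (or took painkillers)
NRS-11
A-V + heart rate and blood pressure features
43% had reduced pain
70% had reduced pain
Project continues…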

194
What might these high and low mutual-information social behaviors be?

195
High mutual information seems to POINT TO HIGHER ATYPICALITY

196
Let's plot it and see
One benefit of BSP: you know exactly what you are computing, so you can trace it back

197
Psychologists unconsciously alter communicative social behavior strategy (cueing
behavior?) as conditioned on ASD kids ability to carry out reciprocal communication
during interaction

198
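Automatically generate, from audio-video signals,
a value for the dependency of dyadic interaction behavior
(head movement, voice characteristics)
Clinician-child behavioral dependency ↑, autism severity ↑
Recognition: 0.81 accuracy
The clinician's behavior reveals more about the severity of the child's
social behavior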

199
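Subjective description → objective, data-driven description of human behavior
Offers a brand-new, objective third-person view for analyzing the social behavior of children with autism
Insight beyond current capability, opportunity now emerges…
We can now start imagining the application of this insight:
(1) Earlier detection (caregiver behavior models?)
(2) Training for physicians and counselors?
More?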

200
Descriptors included | Child prosody | Psych prosody | Child and psych prosody
Spearman's ρ | 0.64*** | 0.79*** | 0.67***
Psychologists' acoustics are at least as predictive of child ASD severity ratings
Similar to earlier findings on English ADOS data! [1]
[1] Daniel Bone, Chi-Chun Lee, Matthew P. Black, Marian E. Williams, Pat Levitt, Sungbok Lee, and Shrikanth Narayanan, "The Psychologist as an Interlocutor in Autism Spectrum Disorder Assessment: Insights from a Study of Spontaneous Prosody", Journal of Speech, Language, and Hearing Research, 2014, 57(4), 1162-1177.
Hard to obtain scientific insights without such behavioral analytics for domain experts
NEED MORE VERIFICATION

201
Before concluding,
two extra small examples:
1. How important correct data collection is

Is it Technical? Example Pitfall 1
Controlling for Channel Factors
• Interspeech 2013 Autism Challenge
• Baseline approach
– Black-box (works well)
– 2-class baseline: 92.8% UAR (chance is 50% UAR)
• Hypothesis: model captures channel, not diagnosis
– ASD/SLI from 2 clinics, TD from classrooms
• Simple experiment showed channel differences
– Matched baseline
• Conclusion: Remit (or note) noise sources in data collection.
202
Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, and Shrikanth Narayanan, "Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds", InterSpeech, 2013.

203
Two small examples:
2. How important correct cross-validation is

Is it Technical? Example Pitfall 2
Behavior Analysis & Modeling: Cross-validation
They do not perform speaker-separated cross-fold validation!
• Can we detect United States Senators' party affiliations from speech features (with black-box approach)?
– Performance increases as # samples/speaker increases
– Conclusion: Always perform speaker-separated cross-validation!
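
A minimal sketch of speaker-separated folds: all samples from one speaker land in the same fold, so no speaker appears in both training and test data. The round-robin assignment of speakers to folds is an assumption for the example; libraries such as scikit-learn provide grouped splitters for the same purpose.

```python
def speaker_separated_folds(speaker_ids, n_folds):
    """Return (train_indices, test_indices) pairs with no speaker shared across the split."""
    speakers = sorted(set(speaker_ids))
    folds = []
    for f in range(n_folds):
        held_out = set(speakers[f::n_folds])  # round-robin speaker assignment
        test = [i for i, s in enumerate(speaker_ids) if s in held_out]
        train = [i for i, s in enumerate(speaker_ids) if s not in held_out]
        folds.append((train, test))
    return folds

# Seven samples from four hypothetical speakers.
ids = ["a", "a", "b", "b", "c", "c", "d"]
folds = speaker_separated_folds(ids, n_folds=2)
```

Splitting by sample instead of by speaker would let the model recognize voices it has already heard, which is the pitfall the slide warns about.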
204

206
Beyond BSP, related fields:
Affective Computing
Social Signal Processing
Paralinguistic Recognition
Physiological/Pathological Disorder Recognition/Prediction

207
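Beyond the directions covered today, other topics:
In-car driver behavior analysis
Early risk assessment for physical and mental illness
Tracking of disease progression
In-home behavior tracking
In-classroom learning diagnosis
…on and on…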

208
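Every step, every application domain, every technique developed from the data, and every collaboration with experts from other fields deepens our understanding of human behavior and internal states.
All of these topics are still very much "in development"
The whole field is still quite new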

209
Motivational Interviewing: Addiction Therapy

210
By professor Shrikanth Narayanan
System in clinical trial

211
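It sounds hard, but once it is explained
it does not seem that hard
So where is the difficulty? Simply in stepping out to understand it and getting started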

212
Behavioral Signal Processing (BSP)
Compute Human Behavior Traits and States for Domain Experts Decision Making
• Help experts to do things they know in a more efficient manner at scale
• Develop novel behavioral analytics framework for possible scientific discovery
Transformative effort . . .

213
OF
FOR
BY
COMPUTING
HUMANS
Human action and behavior data
Meaningful analysis, timely decision making &
intervention (action)
Collaborative integration of human expertise
with automated processing
By professor Shrikanth Narayanan

214
[Figure: the map of enabling technologies and domain-expert knowledge from slide 35, shown again]
Relatively new:
RICH R&D
OPPORTUNITIES
(CHALLENGES)

215
BSP
INGREDIENTS
Not a single technique,
but a process

217
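Human behavior is extremely complex
Digital data collection (unstructured)
Behavioral signal processing (objective)
Machine learning (artificial intelligence): finding patterns (nonlinear predictability)
Contextualized in applications across many domains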

218
Cross-disciplinary collaboration is truly exciting
I was challenged and inspired

221
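Technology, data, human behavior, artificial intelligence, cross-disciplinary collaboration
Decision-making tools for experts, and all kinds of new possibilities
A microscope: more than just "magnifying"
We can research and develop meaningful technology applications that help society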
Challenging the status quo/ Pushing scientific boundary
Making a positive impact

222
BiiC lab @ NTHU EE
http://biic.ee.nthu.edu.tw
THANK YOU . . .
many COLLABORATORS + the entire BIIC lab
