[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems


2012/04/08 #NIPSreading
Nakatani Shuyo

Crowdsourcing
• Outsource work to an undefined public
– Most workers are not experts
– Some workers may be SPAMMERs
• Amazon Mechanical Turk
– Split a large task into microtasks
– Workers earn a few cents per microtask


Spammer and Hammer
• Spam/Spammer
– submits arbitrary answers just to collect the fee
• Ham/Hammer
– answers questions correctly
• It is difficult to tell spammers from hammers
– The requester doesn't have a gold standard
– Workers are neither persistent nor identifiable

Questions
• How to assess the reliability of workers
– Is this worker a spammer or a hammer?
• How to minimize the total price
– ∝ number of task assignments
• How to predict answers
– majority voting? EM algorithm?
• How to bound the error rate
– derive an upper bound on the error


Setting
• $t_i$: tasks, $i = 1, \dots, m$
• $w_j$: workers, $j = 1, \dots, n$
• $(l, r)$-regular bipartite graph between tasks $t_1, \dots, t_m$ and workers $w_1, \dots, w_n$
– Each task is assigned to $l$ workers.
– Each worker is assigned $r$ tasks.
• Given $m$ and $r$, how to select $l$?
– Since $ml = nr$, $n = ml/r$ is then determined.
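The setting above can be sketched in a few lines. This is only an illustration: the slide does not say how the $(l, r)$-regular graph is drawn, so the random half-edge matching below (a configuration-model style construction that may produce repeated edges) is an assumption, and `regular_bipartite_edges` is a hypothetical helper name.

```python
import random

def regular_bipartite_edges(m, l, r, seed=0):
    """Sample edges (task i, worker j) of a random (l, r)-regular bipartite graph.

    Assumption: half-edges are matched by a random shuffle, so the same
    (task, worker) pair can occasionally appear more than once.
    """
    if (m * l) % r != 0:
        raise ValueError("m * l must be divisible by r")
    n = m * l // r  # number of workers, from m * l = n * r
    task_stubs = [i for i in range(m) for _ in range(l)]    # l stubs per task
    worker_stubs = [j for j in range(n) for _ in range(r)]  # r stubs per worker
    random.Random(seed).shuffle(worker_stubs)
    return n, list(zip(task_stubs, worker_stubs))

n, edges = regular_bipartite_edges(m=6, l=2, r=3)
print(n, len(edges))  # 4 12
```

With $m = 6$, $l = 2$, $r = 3$ the number of workers is forced to $n = 6 \cdot 2 / 3 = 4$, and the graph has $ml = 12$ edges.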


Model
• $s_i = \pm 1$: correct answer to $t_i$ (unobserved)
• $A_{ij}$: answer of $w_j$ to $t_i$ (observed)
• $p_j = P(A_{ij} = s_i)$ for all $i$: reliability of worker $w_j$
– Assumed to be independent of the task
• $q = \mathbf{E}[(2p_j - 1)^2]$: average quality parameter
– $q \in [0, 1]$; a value close to 1 indicates that most workers are diligent
– $q$ is set to 0.3 in the later experiment


Example: spammer-hammer model
• For a given $q \in [0, 1]$:
• $p_j = 1$ with probability $q$
– $w_j$ is a perfect hammer (all answers correct).
• $p_j = 1/2$ with probability $1 - q$
– $w_j$ is a spammer (random answers).
• Then $\mathbf{E}[(2p_j - 1)^2] = q \times 1 + (1 - q) \times 0 = q$
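As a quick sanity check of the identity above, the spammer-hammer model can be sampled and the mean of $(2p_j - 1)^2$ compared with $q$. The function names and the sample size are illustrative choices, not part of the original slides.

```python
import random

def sample_reliabilities(n, q, rng):
    """p_j = 1 (hammer) with probability q, otherwise p_j = 1/2 (spammer)."""
    return [1.0 if rng.random() < q else 0.5 for _ in range(n)]

def empirical_q(p):
    """Sample mean of (2 * p_j - 1)^2, an estimate of the quality parameter q."""
    return sum((2 * pj - 1) ** 2 for pj in p) / len(p)

rng = random.Random(0)
p = sample_reliabilities(100_000, q=0.3, rng=rng)
print(empirical_q(p))  # should land close to 0.3
```

Each hammer contributes $(2 \cdot 1 - 1)^2 = 1$ and each spammer contributes $(2 \cdot \tfrac{1}{2} - 1)^2 = 0$, so the estimate is simply the fraction of hammers in the sample.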


Iterative Inference
• $x_{i \to j}$: real-valued task messages from $t_i$ to $w_j$
• $y_{j \to i}$: worker messages from $w_j$ to $t_i$

from [Karger+ NIPS11]
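The slide text names only the two message types; the update rules were in the figure. The sketch below fills them in from memory of [Karger+ NIPS11] and should be read as an assumption: each message is a sum over the *other* neighbours, $x_{i \to j} = \sum_{j' \in \partial i \setminus j} A_{ij'} y_{j' \to i}$ and $y_{j \to i} = \sum_{i' \in \partial j \setminus i} A_{i'j} x_{i' \to j}$, with worker messages initialised from a Gaussian with mean 1 and variance 1. The name `iterative_inference` and the `noisy_init` switch are hypothetical, and the code assumes at most one answer per (task, worker) pair.

```python
import random

def iterative_inference(A, k=10, seed=0, noisy_init=True):
    """A maps an edge (i, j) to the answer A_ij in {+1, -1}; returns {i: estimate}."""
    rng = random.Random(seed)
    tasks, workers = {}, {}
    for (i, j) in A:
        tasks.setdefault(i, []).append(j)    # workers who answered task i
        workers.setdefault(j, []).append(i)  # tasks answered by worker j
    # Initial worker messages: noisy around 1, or exactly 1 for a deterministic run.
    y = {(j, i): rng.gauss(1.0, 1.0) if noisy_init else 1.0 for (i, j) in A}
    for _ in range(k):
        # Task message to w_j: weighted vote of the other workers on t_i.
        x = {(i, j): sum(A[i, jj] * y[jj, i] for jj in tasks[i] if jj != j)
             for (i, j) in A}
        # Worker message to t_i: how well w_j agrees with votes on its other tasks.
        y = {(j, i): sum(A[ii, j] * x[ii, j] for ii in workers[j] if ii != i)
             for (i, j) in A}
    # Decision: sign of the weighted sum over all neighbours (ties go to +1).
    return {i: 1 if sum(A[i, j] * y[j, i] for j in tasks[i]) >= 0 else -1
            for i in tasks}

# Toy check (hypothetical data): 4 tasks, 3 workers, every worker always correct.
s = [1, -1, 1, -1]
A_perfect = {(i, j): s[i] for i in range(4) for j in range(3)}
print(iterative_inference(A_perfect, k=3, noisy_init=False))
# {0: 1, 1: -1, 2: 1, 3: -1}
```

On graphs this small, noisy workers can easily flip the result, so the toy check uses perfect workers only; the guarantees discussed in the talk are asymptotic in $m$.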

Prediction
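• predicted answer:
$$\hat{s}_i\big(\{A_{ij}\}_{(i,j) \in E}\big) = \mathrm{sign}\Big(\sum_{j \in \partial i} A_{ij}\, y_{j \to i}\Big)$$
– where $\partial i$: neighborhood of $t_i$
• error rate:
$$\limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\Big(s_i \neq \hat{s}_i\big(\{A_{ij}\}_{(i,j) \in E}\big)\Big)$$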
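The prediction rule and an empirical version of the error rate can be written directly from the two formulas above. The toy answers below are made up for illustration. Setting every worker message to 1 weights all workers equally, which reduces the rule to plain majority voting; breaking ties toward +1 is an arbitrary choice of this sketch.

```python
def predict(A, y):
    """s_hat_i = sign(sum over workers j of task i of A_ij * y_{j->i})."""
    totals = {}
    for (i, j), a in A.items():
        totals[i] = totals.get(i, 0.0) + a * y[j, i]
    return {i: 1 if t >= 0 else -1 for i, t in totals.items()}

def error_rate(s, s_hat):
    """Fraction of tasks whose predicted answer differs from the true one."""
    return sum(s[i] != s_hat[i] for i in s) / len(s)

# Hypothetical toy data: 3 tasks, 3 workers, every worker answers every task.
s = {0: 1, 1: -1, 2: 1}
A = {(0, 0): 1, (0, 1): 1, (0, 2): -1,
     (1, 0): -1, (1, 1): 1, (1, 2): 1,
     (2, 0): 1, (2, 1): 1, (2, 2): 1}
y_uniform = {(j, i): 1.0 for (i, j) in A}  # equal weights = majority voting
s_hat = predict(A, y_uniform)
print(s_hat, error_rate(s, s_hat))  # task 1 is outvoted 2 to 1, so 1 of 3 tasks is wrong
```

Replacing `y_uniform` with messages produced by an iterative scheme changes only the weights; the decision rule stays the same.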


Performance Guarantee


Theorem 2.1
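• For given $l > 1$, $r > 1$, and $q \in [0, 1]$, let $\hat{l} = l - 1$, $\hat{r} = r - 1$.
• Assume $m$ tasks are assigned to $n = ml/r$ workers according to an $(l, r)$-regular bipartite graph
• Estimate $\hat{s}_i$ from $k$ iterations of the iterative algorithm
• If $\mu \equiv \mathbf{E}[2p_j - 1] > 0$ and $q^2 > 1/(\hat{l}\hat{r})$, then
$$\limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\Big(s_i \neq \hat{s}_i\big(\{A_{ij}\}_{(i,j) \in E}\big)\Big) \le e^{-lq / (2\rho_k^2)}$$
– where $\rho_k^2$ is the quantity defined in [Karger+ NIPS11] (formula image not preserved)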


Corollary 2.2
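• Under the hypotheses of Theorem 2.1,
$$\limsup_{k \to \infty} \limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\Big(s_i \neq \hat{s}_i\big(\{A_{ij}\}_{(i,j) \in E}\big)\Big) \le e^{-lq / (2\rho_\infty^2)}$$
• where $\rho_\infty^2$ is the quantity defined in [Karger+ NIPS11] (formula image not preserved)

– For $q = 0.3$, $l = r = 25$, the r.h.s. is 0.31
– For $q = 0.5$, $l = 25$, $r = 10$, the r.h.s. is 0.15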
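The two numerical values above can be reproduced with a short calculation. The closed form used for the limiting quantity is not in the slide text; it is recalled from the paper and is an assumption here: $\rho_\infty^2 = \big(3 + 1/(q\hat{r})\big) / \big(1 - 1/(q^2 \hat{l}\hat{r})\big)$ with $\hat{l} = l - 1$, $\hat{r} = r - 1$. The function names are hypothetical.

```python
import math

def rho_inf_sq(l, r, q):
    """Assumed closed form of the limiting variance-like term in the bound."""
    l_hat, r_hat = l - 1, r - 1
    if q * q * l_hat * r_hat <= 1:
        raise ValueError("hypothesis q^2 > 1 / (l_hat * r_hat) is violated")
    return (3 + 1 / (q * r_hat)) / (1 - 1 / (q * q * l_hat * r_hat))

def error_bound(l, r, q):
    """Right-hand side exp(-l * q / (2 * rho_inf^2)) of the corollary."""
    return math.exp(-l * q / (2 * rho_inf_sq(l, r, q)))

print(round(error_bound(25, 25, 0.3), 2))  # 0.31
print(round(error_bound(25, 10, 0.5), 2))  # 0.15
```

Under this assumed formula both quoted values come out as stated, which suggests the reconstruction matches what the slide originally showed. The `ValueError` branch mirrors the hypothesis $q^2 > 1/(\hat{l}\hat{r})$ of Theorem 2.1.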


Experiments
• $m = n = 1000$, $l = r$
• left panel: $q = 0.3$, $l \in [1, 30]$
• right panel: $l = 25$, $q \in [0, 0.4]$

from [Karger+ NIPS11]
