[Karger+] Iterative Learning for
Reliable Crowdsourcing Systems

        2012/04/08 #NIPSreading
            Nakatani Shuyo
Crowdsourcing
• Outsource work to an unspecified group of the public
  – Most workers are not experts
  – Some workers may be spammers
• Amazon Mechanical Turk
  – Separate a large task into microtasks
  – Workers earn a few cents per microtask


Spammer and Hammer
• Spam/Spammer
  – submits arbitrary answers to collect the fee
• Ham/Hammer
  – answers questions correctly
• It is difficult to identify spam/spammers
  – The requester doesn't have a gold standard
  – Workers are neither persistent nor identifiable
Questions
• How to ensure the reliability of workers
  – Is this worker a spammer or a hammer?
• How to minimize the total price
  – ∝ number of task assignments
• How to predict answers
  – majority voting? EMA?
• How to bound the error rate
  – estimate an upper bound
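
As a point of reference for the "majority voting?" question, a minimal sketch of that baseline is shown here. It assumes binary answers encoded as +1/−1 and stored in a dictionary keyed by (task, worker); the function name `majority_vote`, the tie-breaking rule, and the toy data are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

def majority_vote(answers):
    """answers maps (task, worker) -> +1 or -1; returns task -> +1 or -1."""
    totals = defaultdict(int)
    for (task, _worker), a in answers.items():
        totals[task] += a
    # Ties are broken toward +1 in this sketch (an arbitrary choice).
    return {task: (1 if total >= 0 else -1) for task, total in totals.items()}

# Toy data: two tasks, three workers each.
answers = {(0, 0): 1, (0, 1): 1, (0, 2): -1,
           (1, 0): -1, (1, 1): -1, (1, 2): 1}
print(majority_vote(answers))
```

Majority voting weights every worker equally, so a spammer's vote counts as much as a hammer's.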

Setting
• $t_i$: tasks, $i = 1, \dots, m$
• $w_j$: workers, $j = 1, \dots, n$
• $(l, r)$-regular bipartite graph between tasks $t_1, \dots, t_m$ and workers $w_1, \dots, w_n$
   – Each task is assigned to $l$ workers.
   – Each worker is assigned $r$ tasks.
• Given $m$ and $r$, how to select $l$?
   – Since $ml = nr$, the number of workers $n = ml/r$ is then determined.
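
One way to build such an assignment is sketched below. It assumes a configuration-model style construction (each task gets l "half-edges", each worker gets r, and the two sides are paired at random); the slides do not say how the graph is drawn, so this construction, the function name `regular_bipartite_edges`, and the toy sizes are assumptions.

```python
import random

def regular_bipartite_edges(m, l, r, seed=0):
    """Return (n, edges) where edges is a list of (task, worker) pairs.

    Every task appears in exactly l edges and every worker in exactly r.
    The same (task, worker) pair can occur twice; this sketch keeps such
    duplicates rather than resampling.
    """
    if (m * l) % r != 0:
        raise ValueError("m * l must be divisible by r")
    n = m * l // r                      # from ml = nr
    rng = random.Random(seed)
    task_stubs = [i for i in range(m) for _ in range(l)]
    worker_stubs = [j for j in range(n) for _ in range(r)]
    rng.shuffle(worker_stubs)           # random pairing of half-edges
    return n, list(zip(task_stubs, worker_stubs))

n, edges = regular_bipartite_edges(m=6, l=2, r=3)
print(n, len(edges))
```

The total number of assignments is ml, which is the quantity the requester pays for.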

Model
• $s_i = \pm 1$: correct answer of $t_i$ (unobserved)
• $A_{ij}$: answer to $t_i$ from $w_j$ (observed)
• $p_j = P(A_{ij} = s_i)$ for all $i$: reliability of worker $w_j$
   – Assumed to be independent of the task
• $\mathbf{E}[(2p_j - 1)^2] = q$: average quality parameter
   – $q \in [0, 1]$; $q$ close to 1 indicates that almost all workers are diligent
   – $q$ is set to 0.3 in the later experiment

Example: spammer-hammer model
• For a given $q \in [0, 1]$:
• $p_j = 1$ with probability $q$
   – $w_j$ is a perfect hammer (all answers correct).
• $p_j = 1/2$ with probability $1 - q$
   – $w_j$ is a spammer (random answers).
• Then $\mathbf{E}[(2p_j - 1)^2] = q \times 1 + (1 - q) \times 0 = q$
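
The identity above can be checked empirically with a small sketch: draw worker reliabilities from the spammer-hammer model and average $(2p_j - 1)^2$. The function names, the sample size of 10000, and the seed are illustrative assumptions.

```python
import random

def sample_reliabilities(n, q, seed=0):
    """Draw p_j = 1 (hammer) with probability q, else p_j = 1/2 (spammer)."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < q else 0.5 for _ in range(n)]

def average_quality(ps):
    """Empirical version of E[(2 p_j - 1)^2]."""
    return sum((2 * p - 1) ** 2 for p in ps) / len(ps)

ps = sample_reliabilities(n=10000, q=0.3)
q_hat = average_quality(ps)
print(q_hat)  # should land near 0.3, up to sampling noise
```

Each hammer contributes 1 and each spammer contributes 0, so the average is simply the fraction of hammers in the sample.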


Iterative Inference
• $x_{i \to j}$: real-valued task messages from $t_i$ to $w_j$
• $y_{j \to i}$: worker messages from $w_j$ to $t_i$




(Figure from [Karger+ NIPS11])
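
The figure with the algorithm is not reproduced in this transcript. As recalled from [Karger+ NIPS11], the message updates take roughly the following form; the iteration superscript $(k)$ and the Gaussian initialization are taken from the paper rather than from the slide text, so treat this as a sketch to be checked against the original.

```latex
\begin{align*}
y_{j \to i}^{(0)} &\sim \mathcal{N}(1, 1), \\
x_{i \to j}^{(k)} &= \sum_{j' \in \partial i \setminus j} A_{ij'} \, y_{j' \to i}^{(k-1)}, \\
y_{j \to i}^{(k)} &= \sum_{i' \in \partial j \setminus i} A_{i'j} \, x_{i' \to j}^{(k)}.
\end{align*}
```

Here $\partial i$ is the set of workers assigned to $t_i$ and $\partial j$ is the set of tasks assigned to $w_j$. Each message excludes the contribution of its own recipient.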
Prediction
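• predicted answer:
      $\hat{s}_i(\{A_{ij}\}_{(i,j) \in E}) = \mathrm{sign}\left( \sum_{j \in \partial i} A_{ij} \, y_{j \to i} \right)$
   – where $\partial i$ is the neighborhood of $t_i$
• error rate:
      $\limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\left( s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j) \in E}) \right)$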
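
The following is a minimal end-to-end sketch of the iterative algorithm and the decision rule above, run on synthetic spammer-hammer data. Several pieces are assumptions rather than content of the slides: the update rules and the N(1, 1) initialization follow my reading of [Karger+ NIPS11], the assignment graph is drawn by random half-edge pairing, and all names and sizes (`simulate`, m = 200, l = r = 7, q = 0.5, 10 iterations) are hypothetical.

```python
import random

def random_regular_assignment(m, l, r, rng):
    """Random (l, r)-regular pairing; returns n and a list of (task, worker) edges."""
    n = m * l // r
    task_stubs = [i for i in range(m) for _ in range(l)]
    worker_stubs = [j for j in range(n) for _ in range(r)]
    rng.shuffle(worker_stubs)
    return n, list(zip(task_stubs, worker_stubs))

def iterative_inference(m, n, edges, A, k_max=10, rng=None):
    """edges[e] = (task, worker); A[e] = answer (+1/-1) on edge e."""
    if rng is None:
        rng = random.Random(0)
    task_edges = [[] for _ in range(m)]
    worker_edges = [[] for _ in range(n)]
    for e, (i, j) in enumerate(edges):
        task_edges[i].append(e)
        worker_edges[j].append(e)
    y = [rng.gauss(1.0, 1.0) for _ in edges]  # initial worker messages
    for _ in range(k_max):
        # Task message on edge e: sum of A*y over the task's other edges.
        task_sum = [sum(A[e] * y[e] for e in es) for es in task_edges]
        x = [task_sum[i] - A[e] * y[e] for e, (i, _) in enumerate(edges)]
        # Worker message on edge e: sum of A*x over the worker's other edges.
        worker_sum = [sum(A[e] * x[e] for e in es) for es in worker_edges]
        y = [worker_sum[j] - A[e] * x[e] for e, (_, j) in enumerate(edges)]
    # Decision rule: sign of the weighted sum over all edges of each task.
    totals = [sum(A[e] * y[e] for e in es) for es in task_edges]
    return [1 if t >= 0 else -1 for t in totals]

def simulate(m=200, l=7, r=7, q=0.5, seed=1):
    """Return the empirical error rate on one synthetic spammer-hammer instance."""
    rng = random.Random(seed)
    n, edges = random_regular_assignment(m, l, r, rng)
    s = [rng.choice([-1, 1]) for _ in range(m)]
    p = [1.0 if rng.random() < q else 0.5 for _ in range(n)]
    A = [s[i] if rng.random() < p[j] else -s[i] for (i, j) in edges]
    s_hat = iterative_inference(m, n, edges, A, rng=rng)
    return sum(a != b for a, b in zip(s, s_hat)) / m

print(simulate())
```

Messages are stored per edge, so a repeated (task, worker) pair from the random pairing is handled as two separate edges. The message magnitudes grow with each iteration; only their signs matter for the final prediction.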

Performance Guarantee



Theorem 2.1
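• For given $l > 1$, $r > 1$, and $q \in [0, 1]$, let $\hat{l} = l - 1$ and $\hat{r} = r - 1$.
• Assume $m$ tasks are assigned to $n = ml/r$ workers according to an $(l, r)$-regular bipartite graph
• Estimate $\hat{s}$ from $k$ iterations of the iterative algorithm
• If $\mu \equiv \mathbf{E}[2p_j - 1] > 0$ and $q^2 > 1/(\hat{l}\hat{r})$, then
      $\limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\left( s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j) \in E}) \right) \le e^{-lq / (2\rho_k^2)}$
   – where $\rho_k^2$ is as defined in [Karger+ NIPS11]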

Corollary 2.2
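• Under the hypotheses of Theorem 2.1,
      $\limsup_{k \to \infty} \limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\left( s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j) \in E}) \right) \le e^{-lq / (2\rho_\infty^2)}$
• where $\rho_\infty^2$ is as defined in [Karger+ NIPS11]
   – For $q = 0.3$, $l = r = 25$, r.h.s. = 0.31
   – For $q = 0.5$, $l = 25$, $r = 10$, r.h.s. = 0.15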
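
The two numerical values can be reproduced with a short calculation. The sketch assumes the closed form $\rho_\infty^2 = \left(3 + \frac{1}{q\hat{r}}\right) \frac{q^2 \hat{l}\hat{r}}{q^2 \hat{l}\hat{r} - 1}$ with $\hat{l} = l - 1$, $\hat{r} = r - 1$, as recalled from [Karger+ NIPS11]; the formula itself does not appear in this transcript, and the function names are illustrative.

```python
import math

def rho_inf_sq(l, r, q):
    """Assumed closed form of rho_infinity^2 (see the lead-in)."""
    l_hat, r_hat = l - 1, r - 1
    g = q * q * l_hat * r_hat
    if g <= 1:
        raise ValueError("the bound requires q^2 * (l-1) * (r-1) > 1")
    return (3 + 1 / (q * r_hat)) * g / (g - 1)

def error_bound(l, r, q):
    """Right-hand side of Corollary 2.2: exp(-l q / (2 rho_inf^2))."""
    return math.exp(-l * q / (2 * rho_inf_sq(l, r, q)))

print(round(error_bound(25, 25, 0.3), 2))  # 0.31
print(round(error_bound(25, 10, 0.5), 2))  # 0.15
```

Both values agree with the right-hand sides quoted above, which supports the assumed form of $\rho_\infty^2$.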

Experiments
• m = n = 1000, l = r
• left: q=0.3, 𝑙 ∈ [1,30]
• right: l = 25, 𝑞 ∈ [0, 0.4]




(Figure from [Karger+ NIPS11])
Lower Bound



