Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling

Chen, Xi; Lin, Qihang; Zhou, Dengyong

arXiv:1403.3080v2 (cs)

[Submitted on 12 Mar 2014 (v1), last revised 24 Apr 2014 (this version, v2)]

Abstract: In crowd labeling, a large number of unlabeled data instances are outsourced to a crowd of workers. Workers are paid for each label they provide, but the labeling requester usually has only a limited budget. Since data instances differ in labeling difficulty and workers differ in reliability, it is desirable to have an optimal policy that allocates the budget among all instance-worker pairs so that the overall labeling accuracy is maximized. We consider categorical labeling tasks and formulate the budget allocation problem as a Bayesian Markov decision process (MDP), which simultaneously conducts learning and decision making. Using the dynamic programming (DP) recurrence, one can obtain the optimal allocation policy. However, DP quickly becomes computationally intractable as the size of the problem increases. To address this challenge, we propose a computationally efficient approximate policy, called the optimistic knowledge gradient policy. Our MDP framework is quite general: it applies both to pull crowdsourcing marketplaces with homogeneous workers and to push marketplaces with heterogeneous workers. It can also incorporate contextual information about instances when it is available. Experiments on both simulated and real data show that the proposed policy achieves higher labeling accuracy than existing policies at the same budget level.
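
The abstract names the optimistic knowledge gradient policy but does not spell out its update rule, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes binary labels, homogeneous and fully reliable workers, and an independent Beta(1, 1) prior on each instance's probability of being labeled positive. None of these details come from the abstract. All function names are hypothetical, and ties are broken by taking the first instance, which is a simplification.

```python
import math
import random


def prob_positive(a, b):
    """P(theta >= 0.5) for theta ~ Beta(a, b), with positive integer a and b."""
    n = a + b - 1
    # Closed form of the Beta tail at 0.5 for integer parameters.
    return sum(math.comb(n, j) for j in range(a)) / 2 ** n


def expected_accuracy(a, b):
    """Posterior probability that the currently favored label is the true one."""
    p = prob_positive(a, b)
    return max(p, 1.0 - p)


def optimistic_gain(a, b):
    """Larger of the two one-step accuracy gains (next label positive or negative)."""
    now = expected_accuracy(a, b)
    gain_if_positive = expected_accuracy(a + 1, b) - now
    gain_if_negative = expected_accuracy(a, b + 1) - now
    return max(gain_if_positive, gain_if_negative)


def allocate(true_thetas, budget, seed=0):
    """Spend `budget` labels one at a time on the instance with the largest optimistic gain.

    `true_thetas` are simulation-only ground-truth probabilities of a positive label
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    counts = [[1, 1] for _ in true_thetas]  # Beta(1, 1) prior for every instance
    for _ in range(budget):
        i = max(range(len(counts)), key=lambda k: optimistic_gain(*counts[k]))
        if rng.random() < true_thetas[i]:
            counts[i][0] += 1  # simulated worker answered "positive"
        else:
            counts[i][1] += 1  # simulated worker answered "negative"
    return counts


if __name__ == "__main__":
    posterior = allocate([0.9, 0.55, 0.2], budget=12)
    for a, b in posterior:
        print(a, b, "positive" if prob_positive(a, b) >= 0.5 else "negative")
```

Under these assumptions, each step costs one pass over the instances, in contrast to the DP recurrence, whose state space grows with every possible combination of label counts.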

Comments:	39 pages
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1403.3080 [cs.LG]
	(or arXiv:1403.3080v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1403.3080

Submission history

From: Xi Chen [view email]
[v1] Wed, 12 Mar 2014 19:55:00 UTC (392 KB)
[v2] Thu, 24 Apr 2014 08:52:28 UTC (396 KB)
