research-article

Hear the whole story: towards the diversity of opinion in crowdsourcing markets

Editors: Chen Li, Volker Markl

Authors: Ting Wu, Lei Chen, Pan Hui, Chen Jason Zhang, Weikai Li

Proceedings of the VLDB Endowment, Volume 8, Issue 5

Pages 485 - 496

https://doi.org/10.14778/2735479.2735482

Published: 01 January 2015

Abstract

The recent surge in popularity of crowdsourcing has brought with it a new opportunity for engaging human intelligence in the process of data analysis. Crowdsourcing provides a fundamental mechanism for enabling online workers to participate in tasks that are either too difficult for a computer to solve alone or too expensive to assign to experts. In social science, four elements are required to form a wise crowd: Diversity of Opinion, Independence, Decentralization, and Aggregation. However, while the other three elements have already been studied and implemented in current crowdsourcing platforms, Diversity of Opinion has not been functionally enabled. In this paper, we address algorithmic optimizations for the diversity of opinion in crowdsourcing marketplaces.

From a computational perspective, building a wise crowd requires quantitatively modeling diversity and taking it into consideration when constructing the crowd. In a crowdsourcing marketplace, we usually encounter two basic paradigms for worker selection: building a crowd that waits for tasks to arrive, and selecting workers for a given task. We therefore propose the Similarity-driven Model (S-Model) and the Task-driven Model (T-Model) for the two paradigms, respectively. Under both models, we propose efficient and effective algorithms to enlist a budgeted number of workers with optimal diversity. We have verified our solutions with extensive experiments on both synthetic and real datasets.
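
To make the idea of enlisting a budgeted number of workers with high diversity concrete, the following is a minimal sketch under stated assumptions. It is not the S-Model or T-Model algorithm from the paper, whose details the abstract does not give. The diversity measure (sum of pairwise Jaccard distances over worker tag sets), the greedy selection rule, and all names (`jaccard_distance`, `crowd_diversity`, `greedy_diverse_crowd`, `profiles`, `budget`) are hypothetical choices made for illustration only.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Dissimilarity of two tag sets: 0.0 if identical, 1.0 if disjoint.

    Assumption: each worker is described by a set of tags.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def crowd_diversity(crowd, profiles):
    """Assumed diversity score: sum of pairwise distances within the crowd."""
    return sum(
        jaccard_distance(profiles[u], profiles[v])
        for u, v in combinations(crowd, 2)
    )


def greedy_diverse_crowd(profiles, budget):
    """Greedy heuristic (not guaranteed optimal): repeatedly add the worker
    whose total distance to the already selected workers is largest."""
    selected = []
    candidates = sorted(profiles)  # sorted so that ties break deterministically
    while candidates and len(selected) < budget:
        best = max(
            candidates,
            key=lambda w: sum(
                jaccard_distance(profiles[w], profiles[s]) for s in selected
            ),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```

Calling `greedy_diverse_crowd(profiles, budget)` with a dictionary that maps worker names to tag sets returns at most `budget` names. A greedy rule is used here only because it is short and easy to follow; it gives no optimality guarantee for this objective.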

Information

Published in: Proceedings of the VLDB Endowment, Volume 8, Issue 5, January 2015, 181 pages

ISSN: 2150-8097

Editors: Chen Li (University of California, Irvine), Volker Markl (TU Berlin)

Publisher: VLDB Endowment
