Hear the whole story: towards the diversity of opinion in crowdsourcing markets

Published: 01 January 2015

Abstract

The recent surge in popularity of crowdsourcing has brought with it a new opportunity for engaging human intelligence in the process of data analysis. Crowdsourcing provides a fundamental mechanism for enabling online workers to participate in tasks that are either too difficult for a computer to solve alone or too expensive to assign to experts. In social science, four elements are required to form a wise crowd: Diversity of Opinion, Independence, Decentralization and Aggregation. However, while the other three elements have already been studied and implemented in current crowdsourcing platforms, Diversity of Opinion has not been functionally enabled. In this paper, we address algorithmic optimizations for the diversity of opinion in crowdsourcing marketplaces.
From a computational perspective, building a wise crowd requires us to model diversity quantitatively and to take it into account when constructing the crowd. In a crowdsourcing marketplace, we usually encounter two basic paradigms for worker selection: building a crowd that waits for tasks to come, and selecting workers for a given task. We therefore propose the Similarity-driven Model (S-Model) and the Task-driven Model (T-Model) for these two paradigms, respectively. Under both models, we propose efficient and effective algorithms to enlist a budgeted number of workers with optimal diversity. We have verified our solutions with extensive experiments on both synthetic and real datasets.
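To make the idea of enlisting a budgeted number of diverse workers concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes that diversity can be approximated by a low total pairwise similarity among the selected workers, that a symmetric similarity matrix `sim` is already available, and that the budget `k` is simply a head count. The function names `total_similarity` and `greedy_diverse_crowd` are hypothetical, and the greedy heuristic carries no optimality guarantee.

```python
from itertools import combinations


def total_similarity(sim, crowd):
    """Sum of pairwise similarities inside the crowd (lower means more diverse).

    Assumption: sim is a symmetric matrix (list of lists) of similarity scores.
    """
    return sum(sim[i][j] for i, j in combinations(sorted(crowd), 2))


def greedy_diverse_crowd(sim, k):
    """Greedily pick k workers while keeping total pairwise similarity low.

    Heuristic sketch only: it is not the S-Model or T-Model algorithm.
    """
    n = len(sim)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of workers")

    # Seed with the worker who is least similar to everyone else overall.
    seed = min(range(n), key=lambda w: sum(sim[w][v] for v in range(n) if v != w))
    crowd = [seed]

    # Repeatedly add the worker who adds the least similarity to the crowd.
    while len(crowd) < k:
        candidates = [w for w in range(n) if w not in crowd]
        best = min(candidates, key=lambda w: sum(sim[w][c] for c in crowd))
        crowd.append(best)

    return sorted(crowd)
```

Calling `greedy_diverse_crowd(sim, k)` on a small hand-built matrix returns the indices of the chosen workers, and `total_similarity` can then be used to compare that crowd against alternatives. A task-specific selection would need a different objective, which this sketch does not attempt to model.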


Published In

Proceedings of the VLDB Endowment, Volume 8, Issue 5
January 2015
181 pages
ISSN: 2150-8097
Editors: Chen Li, Volker Markl

Publisher

VLDB Endowment

Qualifiers

  • Research-article
