Crowdsourcing Enumeration Queries: Estimators and Interfaces

Published: 01 July 2015

Abstract

Hybrid human/computer database systems promise to greatly expand the usefulness of query processing by incorporating the crowd for data gathering and other tasks. Such systems raise many implementation questions. Perhaps the most fundamental issue is that the closed world assumption underlying relational query semantics does not hold in such systems. As a consequence, the meaning of even simple queries can be called into question. Furthermore, query progress monitoring becomes difficult due to non-uniformities in the arrival of crowd-sourced data and peculiarities of how people work in crowd-sourcing systems. To address these issues, we develop statistical tools that enable users and systems developers to reason about query completeness. These tools can also help drive query execution and crowd-sourcing strategies. We evaluate our techniques using experiments on a popular crowd-sourcing platform.

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 27, Issue 7
July 2015
281 pages

Publisher

IEEE Educational Activities Department

United States

Author Tags

  1. user interfaces
  2. Database design
  3. modeling and management

Qualifiers

  • Research-article
