research-article

Crowdsourcing Enumeration Queries: Estimators and Interfaces

Authors:

Beth Trushkowsky,

Michael J. Franklin,

Purnamrita Sarkar,

Venketaram Ramachandran

IEEE Transactions on Knowledge and Data Engineering, Volume 27, Issue 7

Pages 1796 - 1809

https://doi.org/10.1109/TKDE.2014.2339857

Published: 01 July 2015

Abstract

Hybrid human/computer database systems promise to greatly expand the usefulness of query processing by incorporating the crowd for data gathering and other tasks. Such systems raise many implementation questions. Perhaps the most fundamental issue is that the closed world assumption underlying relational query semantics does not hold in such systems. As a consequence, the meaning of even simple queries can be called into question. Furthermore, query progress monitoring becomes difficult due to non-uniformities in the arrival of crowd-sourced data and peculiarities of how people work in crowd-sourcing systems. To address these issues, we develop statistical tools that enable users and systems developers to reason about query completeness. These tools can also help drive query execution and crowd-sourcing strategies. We evaluate our techniques using experiments on a popular crowd-sourcing platform.
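The abstract does not name the statistical tools it develops. As a purely illustrative sketch of what reasoning about query completeness could look like, the code below applies the bias-corrected Chao1 species-richness estimator to a stream of crowd answers. The choice of Chao1 and the function names `chao1_estimate` and `completeness` are assumptions made for this example, not the authors' method.

```python
from collections import Counter


def chao1_estimate(answers):
    """Bias-corrected Chao1 estimate of the total number of distinct items.

    `answers` is an iterable of hashable crowd responses, duplicates included.
    Illustrative only: the estimator choice is an assumption of this sketch.
    """
    item_counts = Counter(answers)           # how often each distinct item was seen
    observed = len(item_counts)              # number of distinct items seen so far
    freq_of_freq = Counter(item_counts.values())
    f1 = freq_of_freq.get(1, 0)              # items seen exactly once
    f2 = freq_of_freq.get(2, 0)              # items seen exactly twice
    # Many singletons relative to doubletons suggests many unseen items remain.
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


def completeness(answers):
    """Fraction of the estimated answer set that has been observed so far."""
    answers = list(answers)
    estimated = chao1_estimate(answers)
    if estimated == 0:
        return 0.0
    return len(set(answers)) / estimated
```

For example, calling `completeness(["a", "a", "b", "b", "c", "d"])` compares the four distinct answers seen against the estimated total. A value near 1 would suggest that further crowd work is unlikely to surface many new items, under the estimator's sampling assumptions, which crowd workers may not satisfy.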



Published In

IEEE Transactions on Knowledge and Data Engineering Volume 27, Issue 7

July 2015

281 pages

ISSN:1041-4347

Copyright © 2014.

Publisher

IEEE Educational Activities Department

United States
