research-article


Answering enumeration queries with the crowd

Authors:

Beth Trushkowsky,

Tim Kraska,

Michael J. Franklin,

Purnamrita Sarkar

Communications of the ACM, Volume 59, Issue 1

Pages 118 - 127

https://doi.org/10.1145/2845644

Published: 21 December 2015


References

[1] Bar-Yossef, Z., Gurevich, M. Efficient search engine measurements. ACM Trans. Web 5, 4 (Oct. 2011), 18:1--18:48.

[2] Broder, A., Fontoura, M., Josifovski, V., Kumar, R., Motwani, R., Nabar, S., Panigrahy, R., Tomkins, A., Xu, Y. Estimating corpus size via queries. In Proceedings of CIKM (2006).

[3] Bunge, J., Fitzpatrick, M. Estimating the number of species: A review. J. Am. Stat. Assoc. 88, 421 (1993), 364--373.

[4] Bunge, J., et al. Comparison of three estimators of the number of species. J. Appl. Stat. 22, 1 (1995), 45--59.

[5] Burnham, K.P., Overton, W.S. Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 65, 3 (1978), 625--633.

[6] Chao, A. Species estimation and applications. In Encyclopedia of Statistical Sciences, 2nd edn. N. Balakrishnan, C.B. Read, and B. Vidakovic, eds. Wiley, New York, 2005, 7907--7916.

[7] Chao, A., Lee, S. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87, 417 (1992), 210--217.

[8] Charikar, M., et al. Towards estimation error guarantees for distinct values. In Proceedings of PODS (2000).

[9] Colwell, R.K., Coddington, J.A. Estimating terrestrial biodiversity through extrapolation. Philos. Trans. Biol. Sci. 345, 1311 (1994), 101--118.

[10] Doan, A., et al. Crowdsourcing applications and platforms: A data management perspective. PVLDB 4, 12 (2011), 1508--1509.

[11] Franklin, M.J., et al. CrowdDB: Answering queries with crowdsourcing. In Proceedings of SIGMOD (2011).

[12] Good, I.J. The population frequencies of species and the estimation of population parameters. Biometrika 40, 3/4 (1953), 237--264.

[13] Gray, J., et al. Quickly generating billion-record synthetic databases. In Proceedings of SIGMOD (1994).

[14] Haas, P.J., et al. Sampling-based estimation of the number of distinct values of an attribute. In Proceedings of VLDB (1995).

[15] Heer, J., et al. Crowdsourcing graphical perception: Using Mechanical Turk to assess visualization design. In Proceedings of CHI (2010).

[16] Ipeirotis, P.G., Provost, F., Wang, J. Quality management on Amazon Mechanical Turk. In Proceedings of HCOMP (2010).

[17] Liu, K.-L., Yu, C., Meng, W. Discovering the representative of a search engine. In Proceedings of CIKM (2002).

[18] Lu, J., Li, D. Estimating deep web data source size by capture-recapture method. Inf. Retr. 13, 1 (Feb. 2010), 70--95.

[19] Marcus, A., Wu, E., Madden, S., Miller, R. Crowdsourced databases: Query processing with people. In Proceedings of CIDR (2011).

[20] Parameswaran, A., Polyzotis, N. Answering queries using humans, algorithms and databases. In Proceedings of CIDR (2011).

[21] Shen, T., et al. Predicting the number of new species in further taxonomic sampling. Ecology 84, 3 (2003).

[22] Trushkowsky, B., Kraska, T., Franklin, M.J., Sarkar, P. Crowdsourced enumeration queries. In Proceedings of ICDE (2013).

[23] Wang, J., et al. A sample-and-clean framework for fast and accurate query processing on dirty data. In Proceedings of SIGMOD (2014), 469--480.



Reviews

Reviewer: Amos O Olagunju

Human crowds are valuable assets for supplying, in real time, answers that complement query results derived solely from relational database management systems (RDBMSs). But when should a crowdsourced query, issued to augment a database query result, be terminated so that its response is reliable? Trushkowsky and colleagues offer statistical tools that users and developers of RDBMSs can use to weigh the time and cost of a query against the accuracy and completeness of its response. Because completeness means covering every group of possible answers, the size of a query result (its cardinality) is needed to compute what percentage of that result has been collected. The authors compellingly introduce a power-law distribution data model that helps to overcome data-sampling problems attributable to cultural and regional biases and to the varied ways in which workers use web search strategies. The paper introduces and evaluates a metric for estimating the cardinality of a query answered through human intelligence tasks (HITs) on Amazon's Mechanical Turk (AMT), and for judging when that estimate has stabilized and converged. The authors introduce algorithms that help to minimize the influence of individuals who might dominate and bias query responses, and they present classes of distributions characterized by the coverage and variance of user responses to crowdsourced queries. Experiments with several thousand HIT queries on AMT, using United Nations and US data, show significant improvement over well-known studies.

List walking is the situation in which the total size of a query result is underpredicted because multiple responses are heavily skewed or similar. The authors propose and validate a heuristic algorithm based on binomial probabilities to detect and overcome list walking; it successfully detected severe list walks in the United Nations data.
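To make the cardinality-estimation and list-walking ideas above more concrete, here is a minimal Python sketch under stated assumptions. For cardinality it uses the sample-coverage estimator of Chao and Lee [7], which appears in the reference list; for list walking it uses a simplified binomial null model in which each worker independently reproduces a given ordered run of answers with probability `p`. Neither is presented as the authors' exact formulation, and the function names, the probability `p`, the threshold `alpha`, and the sample answers are all hypothetical.

```python
from collections import Counter
from math import comb, inf


def chao92_estimate(answers):
    """Chao-Lee sample-coverage estimate of the total number of distinct items.

    `answers` holds every crowd answer received so far, duplicates included."""
    counts = Counter(answers)            # item -> number of times it was given
    n = sum(counts.values())             # total answers received
    c = len(counts)                      # distinct items seen so far
    if n == 0:
        return 0.0
    f = Counter(counts.values())         # f[i] = number of items seen exactly i times
    coverage = 1 - f[1] / n              # estimated sample coverage C_hat
    if coverage == 0:
        return inf                       # all answers are singletons: no finite estimate yet
    # Squared coefficient of variation of the answer probabilities, clamped at 0.
    # coverage > 0 means some item was seen at least twice, so n >= 2 here.
    s = sum(i * (i - 1) * fi for i, fi in f.items())
    gamma2 = max(c / coverage * s / (n * (n - 1)) - 1, 0.0)
    return c / coverage + n * (1 - coverage) / coverage * gamma2


def binomial_tail(k, w, p):
    """P(X >= k) for X ~ Binomial(w, p)."""
    return sum(comb(w, j) * p**j * (1 - p) ** (w - j) for j in range(k, w + 1))


def looks_like_list_walking(k, w, p, alpha=0.01):
    """Flag an ordered run of answers shared by k of w workers as likely list walking.

    Hypothetical null model: each worker independently reproduces the same run,
    in the same order, with probability p. A tail probability below alpha
    suggests the workers are walking a common list rather than answering independently."""
    return binomial_tail(k, w, p) < alpha


if __name__ == "__main__":
    # Hypothetical answers to "list the countries of South America".
    answers = ["Brazil", "Peru", "Brazil", "Chile", "Peru", "Bolivia"]
    print(chao92_estimate(answers))
    # Hypothetical: 4 of 10 workers share a run that an independent worker
    # would reproduce with probability 0.01.
    print(looks_like_list_walking(k=4, w=10, p=0.01))
```

The estimator returns infinity while every answer is still a singleton, which mirrors the intuition that no stable cardinality estimate exists until answers start repeating.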
The authors also present algorithms for computing the cost-benefit tradeoffs of generating precise accounts and estimates of query responses derived from both traditional and real-time crowdsourced RDBMSs. Users should be empowered to contribute to, and reason about, the results of relational database searches and retrievals. Although the paper sheds new light on applications of the well-known power laws [1] and the binomial distribution, I encourage statisticians and database specialists to read it and address the outstanding questions the authors raise. Is there a clear distinction between operations such as SELECT, JOIN, and PROJECT and relational operators on real query results? What impact do human behaviors have on the sampling process in crowdsourced queries?

Online Computing Reviews Service
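The cost-benefit question the review raises (how many new answers more crowd work is likely to buy) can be sketched with the estimator of Shen et al. [21], cited in the reference list, for the number of new species found in further sampling. This minimal sketch assumes the estimated number of still-unseen items, `f0_hat`, is supplied by a richness estimator such as Chao and Lee's [7]; the function name and the sample numbers are hypothetical, and this is not presented as the authors' exact method.

```python
def expected_new_items(n, f1, f0_hat, m):
    """Expected number of previously unseen items among m additional answers.

    Follows the form of Shen et al.'s new-species estimator:
        f0_hat * (1 - (1 - f1 / (n * f0_hat + f1)) ** m)
    n:      answers received so far
    f1:     items seen exactly once
    f0_hat: estimated number of items not yet seen (e.g. N_hat - distinct seen)
    m:      number of additional answers under consideration (hypothetical budget)
    """
    if m <= 0 or f0_hat <= 0 or f1 == 0:
        # The formula gives 0 here; the guard also avoids dividing by zero.
        return 0.0
    return f0_hat * (1 - (1 - f1 / (n * f0_hat + f1)) ** m)


if __name__ == "__main__":
    # Hypothetical state: 5 answers, 1 singleton, about 0.75 items estimated unseen.
    for m in (1, 10, 100):
        print(m, round(expected_new_items(5, 1, 0.75, m), 3))
```

As `m` grows, the expected gain approaches `f0_hat`, so each additional batch of HITs buys fewer new items; comparing that marginal gain with the cost of the batch gives a simple stopping rule.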


Information

Published In

Communications of the ACM Volume 59, Issue 1

January 2016

120 pages

ISSN:0001-0782

EISSN:1557-7317

DOI:10.1145/2859829

Editor: Moshe Y. Vardi

Association for Computing Machinery, New York, NY

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 December 2015

Published in CACM Volume 59, Issue 1


Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation


Bibliometrics & Citations

Article Metrics

Total Citations: 9

Total Downloads: 16,866

Downloads (last 12 months): 830

Downloads (last 6 weeks): 74

Reflects downloads up to 22 Jan 2025

Cited By

Yin B, Zhang P, Xu B, Chen H, Ji Y (2023). Crowd-enabled multiple Pareto-optimal queries for multi-criteria decision-making services. Future Generation Computer Systems 148, 342-356. Online publication date: Nov-2023. https://doi.org/10.1016/j.future.2023.06.007

Barquero G, Troya J, Vallecillo A (2021). Improving query performance on dynamic graphs. Software and Systems Modeling (SoSyM) 20:4, 1011-1041. Online publication date: 1-Aug-2021. https://dl.acm.org/doi/10.1007/s10270-020-00832-3

Swidan M, Alwan A, Turaev S, Ibrahim H, Abualkishik A, Gulzar Y (2020). Skyline Queries Computation on Crowdsourced-Enabled Incomplete Database. IEEE Access 8, 106660-106689. Online publication date: 2020. https://doi.org/10.1109/ACCESS.2020.3000664

Rekatsinas T, Deshpande A, Parameswaran A (2019). CRUX. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (eds. Zhu W, Tao D, Cheng X, Cui P, Rundensteiner E, Carmel D, He Q, Xu Yu J), 841-850. Online publication date: 3-Nov-2019. https://dl.acm.org/doi/10.1145/3357384.3357976

Kraska T (2018). Northstar. Proceedings of the VLDB Endowment 11:12, 2150-2164. Online publication date: 1-Aug-2018. https://dl.acm.org/doi/10.14778/3229863.3240493

Difallah D, Filatova E, Ipeirotis P (2018). Demographics and Dynamics of Mechanical Turk Workers. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (eds. Chang Y, Zhai C, Liu Y, Maarek Y), 135-143. Online publication date: 2-Feb-2018. https://dl.acm.org/doi/10.1145/3159652.3159661

Chung Y, Krishnan S, Kraska T (2017). A data quality metric (DQM). Proceedings of the VLDB Endowment 10:10, 1094-1105. Online publication date: 1-Jun-2017. https://dl.acm.org/doi/10.14778/3115404.3115414

Guo S, Chen R, Li H (2016). A Real-Time Collaborative Testing Approach for Web Application: Via Multi-tasks Matching. 2016 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), 61-68. Online publication date: Aug-2016. https://doi.org/10.1109/QRS-C.2016.13

Majumdar R (2016). Robots at the Edge of the Cloud. Proceedings of the 22nd International Conference on Tools and Algorithms for the Construction and Analysis of Systems - Volume 9636, 3-13. Online publication date: 2-Apr-2016. https://dl.acm.org/doi/10.1007/978-3-662-49674-9_1
