
Answering enumeration queries with the crowd

Published: 21 December 2015

References

[1] Bar-Yossef, Z., Gurevich, M. Efficient search engine measurements. ACM Trans. Web 5, 4 (Oct. 2011), 18:1--18:48.
[2] Broder, A., Fontoura, M., Josifovski, V., Kumar, R., Motwani, R., Nabar, S., Panigrahy, R., Tomkins, A., Xu, Y. Estimating corpus size via queries. In Proceedings of the CIKM (2006).
[3] Bunge, J., Fitzpatrick, M. Estimating the number of species: A review. J. Am. Stat. Assoc. 88, 421 (1993), 364--373.
[4] Bunge, J., et al. Comparison of three estimators of the number of species. J. Appl. Stat. 22, 1 (1995), 45--59.
[5] Burnham, K.P., Overton, W.S. Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 65, 3 (1978), 625--633.
[6] Chao, A. Species estimation and applications. In Encyclopedia of Statistical Sciences, 2nd edn. N. Balakrishnan, C.B. Read, and B. Vidakovic, eds. Wiley, New York, 2005, 7907--7916.
[7] Chao, A., Lee, S. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87, 417 (1992), 210--217.
[8] Charikar, M., et al. Towards estimation error guarantees for distinct values. In Proceedings of the PODS (2000).
[9] Colwell, R.K., Coddington, J.A. Estimating terrestrial biodiversity through extrapolation. Philos. Trans. Biol. Sci. 345, 1311 (1994), 101--118.
[10] Doan, A., et al. Crowdsourcing applications and platforms: A data management perspective. PVLDB 4, 12 (2011), 1508--1509.
[11] Franklin, M.J., et al. CrowdDB: Answering queries with crowdsourcing. In Proceedings of the SIGMOD (2011).
[12] Good, I.J. The population frequencies of species and the estimation of population parameters. Biometrika 40, 3/4 (1953), 237--264.
[13] Gray, J., et al. Quickly generating billion-record synthetic databases. In Proceedings of the SIGMOD (1994).
[14] Haas, P.J., et al. Sampling-based estimation of the number of distinct values of an attribute. In Proceedings of the VLDB (1995).
[15] Heer, J., et al. Crowdsourcing graphical perception: Using mechanical turk to assess visualization design. In Proceedings of the CHI (2010).
[16] Ipeirotis, P.G., Provost, F., Wang, J. Quality management on Amazon mechanical turk. In Proceedings of the HCOMP (2010).
[17] Liu, K.-L., Yu, C., Meng, W. Discovering the representative of a search engine. In Proceedings of the CIKM (2002).
[18] Lu, J., Li, D. Estimating deep web data source size by capture--recapture method. Inf. Retr. 13, 1 (Feb. 2010), 70--95.
[19] Marcus, A., Wu, E., Madden, S., Miller, R. Crowdsourced databases: Query processing with people. In Proceedings of the CIDR (2011).
[20] Parameswaran, A., Polyzotis, N. Answering queries using humans, algorithms and databases. In Proceedings of the CIDR (2011).
[21] Shen, T., et al. Predicting the number of new species in further taxonomic sampling. Ecology 84, 3 (2003).
[22] Trushkowsky, B., Kraska, T., Franklin, M.J., Sarkar, P. Crowdsourced enumeration queries. In Proceedings of the ICDE (2013).
[23] Wang, J., et al. A sample-and-clean framework for fast and accurate query processing on dirty data. In Proceedings of the SIGMOD (2014), 469--480.

Reviews

Amos O Olagunju

Human crowds are valuable assets for providing additional responses in real time to cognate query results derived solely from relational database management systems (RDBMSs). But how should query results from human crowds, designed to simultaneously augment database query results, be terminated to provide reliable responses? Trushkowsky and colleagues offer statistical tools for users and developers of RDBMSs to use in scrutinizing the time and cost benefits of the accuracy and inclusiveness of query responses. In an effort to cover the inclusiveness of all behavioral groups, the size of a query result (cardinality) is useful for computing the percentage of each interest survey group. The authors compellingly introduce an effective power law distribution data model that helps to overcome the data sampling problems attributable to cultural and regional biases, and a variety of the knowledgeable uses of web search strategies. Clearly, the paper introduces and evaluates a metric for estimating the stable and convergent cardinality of "human intelligence tasks from Amazon's mechanical Turk (HIT-AMT)." The authors introduce algorithms that help to minimize the influence of individuals who might dominate and bias query responses. They present the concepts of the classes of distributions of coverage and variance of user responses to crowdsourced queries. The experimental results of the test statistics with several thousands of queries in HIT with the AMT RDBMS of United Nations and US data show some significant improvement over well-known studies. List walking is a situation when the total size of a query result is underpredicted due to multiple heavily skewed, or similar, survey responses. The authors propose and validate a heuristic binomial probabilistic algorithm to detect and overcome list walking. The algorithms successfully detected severe list walks in the United Nations database.
Undoubtedly, the authors present algorithms for computing cost-benefit tradeoffs of generating precise accounts and estimates of query responses derived from the traditional and real-time crowdsourced RDBMSs. There is no doubt that users should be empowered to contribute and reason about query results in all relational database management searches and retrieval results. In spite of the new light shed on the applications of the well-known power-laws [1] and the binomial distribution in this paper, I encourage all statisticians and database specialists to read and address the outstanding remaining unanswered questions raised by the authors. Is there a clear distinction between operations such as SELECT, JOIN, and PROJECT versus relational operators on real query results? What impacts do human behaviors have on the sampling process in crowdsourced queries?

Online Computing Reviews Service


Published In

Communications of the ACM, Volume 59, Issue 1
January 2016
120 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/2859829
Editor: Moshe Y. Vardi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 December 2015
Published in CACM Volume 59, Issue 1

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 830
  • Downloads (Last 6 weeks): 74
Reflects downloads up to 22 Jan 2025

Citations

Cited By

  • (2023) Crowd-enabled multiple Pareto-optimal queries for multi-criteria decision-making services. Future Generation Computer Systems 148 (342-356). DOI: 10.1016/j.future.2023.06.007. Online publication date: Nov-2023
  • (2021) Improving query performance on dynamic graphs. Software and Systems Modeling (SoSyM) 20:4 (1011-1041). DOI: 10.1007/s10270-020-00832-3. Online publication date: 1-Aug-2021
  • (2020) Skyline Queries Computation on Crowdsourced-Enabled Incomplete Database. IEEE Access 8 (106660-106689). DOI: 10.1109/ACCESS.2020.3000664. Online publication date: 2020
  • (2019) CRUX. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (841-850). DOI: 10.1145/3357384.3357976. Online publication date: 3-Nov-2019
  • (2018) Northstar. Proceedings of the VLDB Endowment 11:12 (2150-2164). DOI: 10.14778/3229863.3240493. Online publication date: 1-Aug-2018
  • (2018) Demographics and Dynamics of Mechanical Turk Workers. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (135-143). DOI: 10.1145/3159652.3159661. Online publication date: 2-Feb-2018
  • (2017) A data quality metric (DQM). Proceedings of the VLDB Endowment 10:10 (1094-1105). DOI: 10.14778/3115404.3115414. Online publication date: 1-Jun-2017
  • (2016) A Real-Time Collaborative Testing Approach for Web Application: Via Multi-tasks Matching. 2016 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C) (61-68). DOI: 10.1109/QRS-C.2016.13. Online publication date: Aug-2016
  • (2016) Robots at the Edge of the Cloud. Proceedings of the 22nd International Conference on Tools and Algorithms for the Construction and Analysis of Systems - Volume 9636 (3-13). DOI: 10.1007/978-3-662-49674-9_1. Online publication date: 2-Apr-2016
