research-article

Crowdsourced Selection on Multi-Attribute Data

Authors:

Jianhua Feng

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Pages 307 - 316

https://doi.org/10.1145/3132847.3132891

Published: 06 November 2017

Abstract

Crowdsourced selection asks the crowd to select the entities that satisfy a query condition, e.g., selecting the photos of people wearing sunglasses from a given set of photos. Existing studies focus on a single query predicate. In this paper, we study the crowdsourced selection problem on multi-attribute data, e.g., selecting the photos of women who have dark eyes and are wearing sunglasses. A straightforward method asks the crowd to check every predicate in the query for every entity, which incurs a high monetary cost. Instead, we can select an optimized predicate order and ask the crowd about the entities following that order: if an entity does not satisfy a predicate, we can prune it without asking about the remaining predicates, which reduces the cost. There are two challenges in finding the optimized predicate order. The first is how to determine the predicate order, and the second is how to capture the correlation among different predicates. To address these challenges, we propose a predicate-order-based framework to reduce the monetary cost. First, we define an expectation tree to store the selectivities of predicates and to estimate the best predicate order. In each iteration, we estimate the best predicate order from the expectation tree and choose a predicate as a question to ask the crowd. After getting the result for the current predicate, we choose the next predicate to ask, until the result for the entity is determined. We then update the expectation tree using the answers obtained from the crowd and continue to the next iteration. We also study the problem of answering multiple queries simultaneously and reduce its cost using the correlation between queries. Finally, we propose a confidence-based method to improve the quality. The experimental results show that our predicate-order-based algorithm is effective and reduces the cost significantly compared with baseline approaches.
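To make the pruning idea concrete, the following is a minimal sketch of selection with an adaptive predicate order. It is not the paper's expectation tree: it keeps one running selectivity estimate per predicate and treats predicates as independent, so it ignores the correlation the abstract names as a challenge. The names `SelectivityTracker`, `crowd_select`, and the `ask` callback (a stand-in for posting one question to the crowd) are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


class SelectivityTracker:
    """Running estimate of each predicate's selectivity (fraction of 'yes' answers).

    Assumption: predicates are treated as independent; this is a simplification,
    not the expectation tree described in the abstract.
    """

    def __init__(self, predicates: Sequence[str]) -> None:
        # Start every predicate with one pseudo 'yes' and one pseudo 'no' (selectivity 0.5).
        self.yes: Dict[str, float] = {p: 1.0 for p in predicates}
        self.no: Dict[str, float] = {p: 1.0 for p in predicates}

    def selectivity(self, predicate: str) -> float:
        return self.yes[predicate] / (self.yes[predicate] + self.no[predicate])

    def update(self, predicate: str, answer: bool) -> None:
        if answer:
            self.yes[predicate] += 1.0
        else:
            self.no[predicate] += 1.0

    def order(self) -> List[str]:
        # Ask the predicate most likely to fail first, so entities are pruned early.
        return sorted(self.yes, key=self.selectivity)


def crowd_select(
    entities: Sequence[Hashable],
    predicates: Sequence[str],
    ask: Callable[[Hashable, str], bool],
) -> Tuple[List[Hashable], int]:
    """Return the entities satisfying all predicates and the number of questions asked."""
    tracker = SelectivityTracker(predicates)
    selected: List[Hashable] = []
    questions = 0
    for entity in entities:
        satisfied = True
        for predicate in tracker.order():
            questions += 1
            answer = ask(entity, predicate)
            tracker.update(predicate, answer)
            if not answer:
                satisfied = False  # prune: skip the remaining predicates
                break
        if satisfied:
            selected.append(entity)
    return selected, questions


# Toy data standing in for crowd answers (hypothetical, assumed error-free).
PHOTOS = {
    1: {"female": True, "dark_eyes": True, "sunglasses": False},
    2: {"female": True, "dark_eyes": False, "sunglasses": False},
    3: {"female": False, "dark_eyes": True, "sunglasses": False},
    4: {"female": True, "dark_eyes": True, "sunglasses": True},
}

if __name__ == "__main__":
    result, cost = crowd_select(
        list(PHOTOS),
        ["female", "dark_eyes", "sunglasses"],
        lambda photo_id, predicate: PHOTOS[photo_id][predicate],
    )
    print(result, cost)
```

On this toy input the sketch asks fewer questions than the 12 needed to check every predicate on every photo, because once `sunglasses` is observed to fail it moves to the front of the order. Under the same independence assumption, the expected number of questions per entity for an order with selectivities s1, s2, ..., sn is 1 + s1 + s1*s2 + ... + s1*...*s(n-1), which is why low-selectivity predicates are asked first when all questions cost the same.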


Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

November 2017

2604 pages

ISBN: 9781450349185

DOI: 10.1145/3132847

General Chairs:
Ee-Peng Lim, Singapore Management University, Singapore
Marianne Winslett, University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore

Program Chairs:
Mark Sanderson, RMIT, Australia
Ada Fu, Chinese University of Hong Kong, Hong Kong
Jimeng Sun, Georgia Tech, USA
Shane Culpepper, RMIT, Australia
Eric Lo, Chinese University of Hong Kong, Hong Kong
Joyce Ho, Emory University, USA
Debora Donato, Mix Tech, Inc., USA
Rakesh Agrawal, Data Insights Laboratories, USA
Yu Zheng, Microsoft Research Asia, China
Carlos Castillo, Qatar Computing Research Institute, Qatar
Aixin Sun, Nanyang Technological University, Singapore
Vincent S. Tseng, National Cheng Kung University, Taiwan
Chenliang Li, Wuhan University, China

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '17

CIKM '17: ACM Conference on Information and Knowledge Management

November 6 - 10, 2017

Singapore, Singapore

Acceptance Rates

CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
