DOI: 10.1145/3132847.3132891
research-article

Crowdsourced Selection on Multi-Attribute Data

Published: 06 November 2017

    Abstract

    Crowdsourced selection asks the crowd to select entities that satisfy a query condition, e.g., selecting the photos of people wearing sunglasses from a given set of photos. Existing studies focus on a single query predicate. In this paper, we study the crowdsourced selection problem on multi-attribute data, e.g., selecting the photos of women with dark eyes who are wearing sunglasses. A straightforward method asks the crowd to check every predicate in the query for every entity, which incurs a high monetary cost. Instead, we can select an optimized predicate order and ask the crowd to answer the entities following that order. If an entity does not satisfy a predicate, we can prune it without asking the remaining predicates, which reduces the cost. There are two challenges in finding the optimized predicate order: the first is how to determine the predicate order, and the second is how to capture the correlation among different predicates. To address these challenges, we propose a predicate-order-based framework to reduce monetary cost. First, we define an expectation tree to store selectivities on predicates and to estimate the best predicate order. In each iteration, we estimate the best predicate order from the expectation tree and then choose a predicate as a question to ask the crowd. After obtaining the answer for the current predicate, we choose the next predicate to ask, and repeat until we obtain the result. We then update the expectation tree using the answers obtained from the crowd and continue to the next iteration. We also study the problem of answering multiple queries simultaneously, and reduce its cost using the correlation between queries. Finally, we propose a confidence-based method to improve the quality. Experimental results show that our predicate-order-based algorithm is effective and can reduce cost significantly compared with baseline approaches.
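
    The effect of predicate order on cost can be illustrated with a minimal sketch. It is not the expectation-tree method of the paper: it assumes that predicate selectivities are known in advance, that predicates are independent, and that every question costs one unit. The function names (`expected_questions`, `best_order`, `select`) and the selectivity values are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def expected_questions(selectivities):
    """Expected questions per entity when predicates are asked in the given
    order and the entity is pruned at the first negative answer.
    Assumes independent predicates and unit cost per question."""
    expected, p_reach = 0.0, 1.0
    for s in selectivities:
        expected += p_reach  # asked only if all earlier predicates passed
        p_reach *= s         # probability the entity also passes this one
    return expected


def best_order(selectivity_by_predicate):
    """Brute-force search over all orders (fine for a handful of predicates)."""
    names = list(selectivity_by_predicate)
    return min(
        permutations(names),
        key=lambda order: expected_questions(
            [selectivity_by_predicate[p] for p in order]
        ),
    )


def select(entities, order, ask):
    """Ask predicates in `order`; stop asking about an entity at its first 'no'.
    `ask(entity, predicate)` stands in for a crowd answer (hypothetical)."""
    selected, questions = [], 0
    for entity in entities:
        for predicate in order:
            questions += 1
            if not ask(entity, predicate):
                break  # pruned: remaining predicates are never asked
        else:
            selected.append(entity)
    return selected, questions


# Illustrative selectivities (made up, not taken from the paper).
selectivity = {"female": 0.5, "dark_eyes": 0.6, "sunglasses": 0.1}
order = best_order(selectivity)

photos = [
    {"female": True, "dark_eyes": True, "sunglasses": True},
    {"female": True, "dark_eyes": False, "sunglasses": False},
    {"female": False, "dark_eyes": True, "sunglasses": False},
]
result, cost = select(photos, order, ask=lambda e, p: e[p])
```

    Under these assumptions the most selective predicate is asked first, so most entities are pruned after a single question, whereas checking every predicate for every entity would always cost three questions per entity. The paper's framework differs in that it estimates selectivities from crowd answers as they arrive and accounts for correlation among predicates.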

    Published In

    CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
    November 2017
    2604 pages
    ISBN:9781450349185
    DOI:10.1145/3132847

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Conference

    CIKM '17

    Acceptance Rates

    CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2022) Cost-effective crowdsourced join queries for entity resolution without prior knowledge. Future Generation Computer Systems, 127:C (240-251). DOI: 10.1016/j.future.2021.09.008. Online publication date: 1-Feb-2022
    • (2021) A Cost-Efficient Framework for Crowdsourced Data Collection in Vehicular Networks. IEEE Internet of Things Journal, 8:17 (13567-13581). DOI: 10.1109/JIOT.2021.3065716. Online publication date: 1-Sep-2021
    • (2019) Improving Multiclass Classification in Crowdsourcing by Using Hierarchical Schemes. The World Wide Web Conference (2694-2700). DOI: 10.1145/3308558.3313749. Online publication date: 13-May-2019
    • (2018) A Rating-Ranking Method for Crowdsourced Top-k Computation. Proceedings of the 2018 International Conference on Management of Data (975-990). DOI: 10.1145/3183713.3183762. Online publication date: 27-May-2018
    • (2018) Crowdsourced Operators. Crowdsourced Data Management (97-154). DOI: 10.1007/978-981-10-7847-7_7. Online publication date: 13-Oct-2018
