DOI: 10.1145/3546930.3547500
research-article

Exploratory training: when trainers learn

Published: 17 August 2022

Abstract

Data systems often present examples to users and solicit labels in order to learn a target concept, in settings that range from supervised to semi-supervised learning. The examples may even be selected actively, i.e., via active learning. Current systems assume that users always label correctly, with at most a small, fixed chance of error. In several settings, however, users must explore and learn about the underlying data before they can label examples correctly, particularly for complex target concepts and models. For example, to label accurately for a model that detects noisy or abnormal values, users might need to investigate the underlying data to understand which values are typical and clean. As users gradually learn about the target concept and the data, they may revise their labeling strategies. Because the errors in this setting are significant and non-stationary, current systems may use incorrect labels and learn inaccurate models from the users. We report preliminary results from a user study over real-world datasets on modeling human learning while users train the system, and lay out the next steps in this investigation.
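The contrast between a fixed error rate and errors that shrink as the labeler learns can be made concrete with a toy calculation. The sketch below is not the paper's model: the exponential learning curve, the function names, and every parameter value (`initial`, `final`, `half_life`) are assumptions chosen only for illustration.

```python
def error_probability(t, initial=0.4, final=0.05, half_life=20.0):
    """Hypothetical learning curve: chance that the t-th label (t = 0, 1, ...) is wrong.

    The error starts at `initial`, decays toward `final`, and the gap
    between them halves every `half_life` labeled examples.
    """
    decay = 0.5 ** (t / half_life)
    return final + (initial - final) * decay


def expected_errors(start, stop):
    """Expected number of wrong labels among examples start..stop-1."""
    return sum(error_probability(t) for t in range(start, stop))


n = 100      # total labeled examples in the session (assumed)
window = 20  # size of the early and late windows being compared

early = expected_errors(0, window)       # labels given before the user has learned much
late = expected_errors(n - window, n)    # labels given near the end of the session

# A stationary-noise model spreads the same total error evenly over the session,
# so it predicts this many errors in any window of the same size.
stationary = window * expected_errors(0, n) / n

print(f"expected errors, first {window} labels: {early:.2f}")
print(f"expected errors, last {window} labels:  {late:.2f}")
print(f"stationary-noise prediction per window: {stationary:.2f}")
```

Under these assumed parameters, the early window contains more expected errors than a stationary model predicts and the late window contains fewer, which is the mismatch the abstract attributes to current systems.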

Published In

HILDA '22: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
June 2022
67 pages
ISBN: 9781450394420
DOI: 10.1145/3546930

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bayesian model
  2. collaborative learning
  3. data exploration
  4. functional dependencies
  5. human in loop
  6. human learning
  7. hypothesis-testing model

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '22

Acceptance Rates

Overall Acceptance Rate 28 of 56 submissions, 50%
