research-article

Exploratory training: when trainers learn

Authors:

Omeed Habibelahian,

Rajesh Shrestha,

Arash Termehchy,

Paolo Papotti

HILDA '22: Proceedings of the Workshop on Human-In-the-Loop Data Analytics

Article No.: 3, Pages 1 - 5

https://doi.org/10.1145/3546930.3547500

Published: 17 August 2022

Abstract

Data systems often present examples and solicit labels from users to learn a target concept in supervised or semi-supervised learning. The examples may even be selected actively, i.e., through active learning. Current systems assume that users always label correctly, with at most a small, fixed chance of error. In several settings, however, users may have to explore and learn about the underlying data before they can label examples correctly, particularly for complex target concepts and models. For example, to label accurately for a model that detects noisy or abnormal values, users might need to investigate the underlying data to understand which values are typical and clean. As users gradually learn about the target concept and data, they may revise their labeling strategies. Because errors in this setting are both significant and non-stationary, current systems may use incorrect labels and learn inaccurate models from the users. We report preliminary results from a user study over real-world datasets on modeling human learning while users train the system, and lay out the next steps in this investigation.
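The contrast between a fixed chance of error and non-stationary errors can be illustrated with a small simulation. This is a minimal sketch, not the model studied in the paper: the exponential decay of the error rate, the function names, and all parameter values (`start`, `floor`, `rate`, `eps`) are assumptions chosen only for illustration.

```python
import math
import random


def fixed_error_rate(t, eps=0.1):
    """Stationary noise: the same chance of a wrong label at every step t."""
    return eps


def learning_error_rate(t, start=0.4, floor=0.05, rate=0.1):
    """Non-stationary noise (assumed form): the error rate decays
    exponentially from `start` toward `floor` as the user sees more data."""
    return floor + (start - floor) * math.exp(-rate * t)


def simulate_labels(true_labels, error_rate, seed=0):
    """Flip each binary label with probability error_rate(t) at step t."""
    rng = random.Random(seed)
    noisy = []
    for t, y in enumerate(true_labels):
        flip = rng.random() < error_rate(t)
        noisy.append(1 - y if flip else y)
    return noisy


def windowed_error(true_labels, noisy_labels, window):
    """Fraction of wrong labels in each consecutive window of steps."""
    rates = []
    for s in range(0, len(true_labels), window):
        pairs = list(zip(true_labels[s:s + window], noisy_labels[s:s + window]))
        rates.append(sum(a != b for a, b in pairs) / len(pairs))
    return rates


if __name__ == "__main__":
    truth = [i % 2 for i in range(200)]
    for name, fn in [("fixed", fixed_error_rate), ("learning", learning_error_rate)]:
        noisy = simulate_labels(truth, fn, seed=1)
        print(name, windowed_error(truth, noisy, window=50))
```

Under these assumptions, the per-window error of the learning annotator tends to be high in the first window and low in later ones, whereas the fixed-noise annotator stays roughly flat. A system that assumes a single small error rate would treat the early and late labels of the learning annotator as equally reliable.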

Index Terms

Exploratory training: when trainers learn
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI design and evaluation methods
      1. User models
2. Information systems
  1. Data management systems
    1. Information integration
      1. Data cleaning
  2. Information systems applications
    1. Decision support systems
      1. Data analytics

Information & Contributors

Information

Published In

HILDA '22: Proceedings of the Workshop on Human-In-the-Loop Data Analytics

June 2022

67 pages

ISBN: 9781450394420

DOI: 10.1145/3546930

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '22

Sponsor:

SIGMOD

SIGMOD/PODS '22: International Conference on Management of Data

June 12, 2022

Philadelphia, Pennsylvania

Acceptance Rates

Overall Acceptance Rate 28 of 56 submissions, 50%
