Active learning using adaptive resampling
VS Iyengar, C Apte, T Zhang - Proceedings of the sixth ACM SIGKDD international conference on Knowledge …, 2000 - dl.acm.org
Abstract
Classification modeling (also known as supervised learning) is an extremely useful analytical technique for developing predictive and forecasting applications. The explosive growth in data warehousing and internet usage has made large amounts of data potentially available for developing classification models. For example, natural language text is widely available in many forms (e.g., electronic mail, news articles, reports, and web page contents). Categorization of data is a common activity that can be automated to a large extent using supervised learning methods. Examples include routing of electronic mail, satellite image classification, and character recognition. However, these tasks require labeled data sets of sufficiently high quality, with adequate instances for training the predictive models. Much of the online data, particularly the unstructured variety (e.g., text), is unlabeled. Labeling is usually an expensive manual process performed by domain experts. Active learning addresses this problem by identifying the subset of the data that needs to be labeled and using this subset to generate classification models. We present an active learning method that uses adaptive resampling in a natural way to significantly reduce the size of the required labeled set, while generating a classification model that achieves the high accuracies possible with current adaptive resampling methods.
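The abstract describes the method only at a high level. As a rough illustration of the general idea, and not the authors' algorithm (whose details are not given here), the Python sketch below pairs boosting with a margin-based query rule: an AdaBoost-style ensemble of decision stumps is trained on the labeled set, and the unlabeled points on which the weighted vote is closest to zero are sent to the labeler. It assumes binary labels in {-1, +1}, reweights examples instead of resampling them, and the names `boost`, `select_queries`, `active_learn`, and the `oracle` callback standing in for the domain expert are all hypothetical.

```python
import math

# Hypothetical sketch: boosting (by reweighting, a stand-in for adaptive
# resampling) plus margin-based querying. Not the paper's exact method.


def stump_predict(x, f, t, s):
    """Decision stump: predict s if feature f exceeds threshold t, else -s."""
    return s if x[f] > t else -s


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        # Candidate thresholds: one below the minimum, then all midpoints.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, t, s) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, s)
    return best


def boost(X, y, rounds=10):
    """AdaBoost by reweighting: misclassified examples gain weight each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, s = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, s))
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, s))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def margin(ensemble, x):
    """Normalized weighted vote in [-1, 1]; values near 0 mean members disagree."""
    total = sum(a for a, _, _, _ in ensemble) or 1.0
    return sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble) / total


def select_queries(ensemble, pool, k):
    """Indices of the k pool points with the smallest |margin| (ties: pool order)."""
    return sorted(range(len(pool)), key=lambda i: abs(margin(ensemble, pool[i])))[:k]


def active_learn(X, oracle, seed, batch=2, iterations=3, rounds=10):
    """Grow the labeled set by repeatedly asking the oracle about low-margin points."""
    labels = {i: oracle(i) for i in seed}
    for _ in range(iterations):
        idx = list(labels)
        ensemble = boost([X[i] for i in idx], [labels[i] for i in idx], rounds)
        unlabeled = [i for i in range(len(X)) if i not in labels]
        if not unlabeled:
            break
        for p in select_queries(ensemble, [X[i] for i in unlabeled], batch):
            labels[unlabeled[p]] = oracle(unlabeled[p])
    idx = list(labels)
    return boost([X[i] for i in idx], [labels[i] for i in idx], rounds), idx


# Usage on toy 1-D data; the oracle is a hypothetical labeling expert.
X = [[float(v)] for v in range(20)]
oracle = lambda i: 1 if 5 <= X[i][0] < 15 else -1
ensemble, labeled = active_learn(X, oracle, seed=[0, 10, 19])
print(len(labeled))  # 3 seed labels + 2 queries x 3 iterations = 9
```

Ties in the margin are broken by pool order, which matters when every ensemble member agrees; a fuller implementation might break ties randomly or weight the query rule by the boosting distribution itself.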