DOI: 10.1145/1150402.1150520

Pragmatic text mining: minimizing human effort to quantify many issues in call logs

Published: 20 August 2006

Abstract

We discuss our experiences in analyzing customer-support issues from the unstructured free-text fields of technical-support call logs. Identifying frequent issues and quantifying them accurately is essential in order to track aggregate costs broken down by issue type, to target engineering resources appropriately, and to provide the best diagnosis, support, and documentation for the most common issues. We present a new set of techniques for doing this efficiently on an industrial scale, without requiring manual coding of calls in the call center. Our approach involves (1) a new text clustering method to identify common and emerging issues; (2) a method to rapidly train large numbers of categorizers in a practical, interactive manner; and (3) a method to accurately quantify categories, even in the face of inaccurate classifications and training sets that necessarily cannot match the class distribution of each new month's data. We present our methodology and a tool we developed and deployed that uses these methods for tracking ongoing support issues and discovering emerging issues at HP.
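
To illustrate why step (3) is a separate problem from classification, here is a minimal sketch of one well-known correction, often called adjusted classify-and-count. The abstract does not say which quantification method the authors use, so this is an assumption chosen for illustration, and the names `adjusted_count`, `observed_rate`, `tpr`, and `fpr` are hypothetical. The idea is that a classifier with true-positive rate tpr and false-positive rate fpr, applied to data with true prevalence p, flags a fraction p*tpr + (1 - p)*fpr of the items, and that relation can be solved for p.

```python
def observed_rate(predictions):
    """Fraction of items the classifier labeled positive."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(1 for p in predictions if p) / len(predictions)


def adjusted_count(observed_positive_rate, tpr, fpr):
    """Estimate the true prevalence of a category.

    Assumes tpr and fpr were measured beforehand (for example by
    cross-validation on the training set) and stay roughly stable
    when the class distribution shifts.
    """
    if tpr <= fpr:
        raise ValueError("classifier must satisfy tpr > fpr")
    estimate = (observed_positive_rate - fpr) / (tpr - fpr)
    # Clip to a valid proportion; sampling noise can push it outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Hypothetical month of 1000 calls, 125 of which the classifier flagged.
flags = [True] * 125 + [False] * 875
prevalence = adjusted_count(observed_rate(flags), tpr=0.80, fpr=0.05)
```

In this made-up example the raw count says 12.5% of calls belong to the category, while the corrected estimate is about 10%, because some of the flagged calls are expected false positives.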


Published In

KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2006
986 pages
ISBN: 1595933395
DOI: 10.1145/1150402
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. applications
  2. log processing
  3. quantification
  4. supervised machine learning
  5. text classification
  6. text mining

Conference

KDD06

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
