Precision-Recall-Gain curves: PR analysis done right

Published: 07 December 2015, in NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1. MIT Press, Cambridge, MA, United States.

Abstract

Precision-Recall analysis abounds in applications of binary classification where true negatives do not add value and hence should not affect assessment of the classifier's performance. Perhaps inspired by the many advantages of receiver operating characteristic (ROC) curves and the area under such curves for accuracy-based performance assessment, many researchers have taken to reporting Precision-Recall (PR) curves and associated areas as a performance metric. We demonstrate in this paper that this practice is fraught with difficulties, mainly because of incoherent scale assumptions: for example, the area under a PR curve takes the arithmetic mean of precision values, whereas the Fβ score applies the harmonic mean. We show how to fix this by plotting PR curves in a different coordinate system, and demonstrate that the new Precision-Recall-Gain curves inherit all key advantages of ROC curves. In particular, the area under Precision-Recall-Gain curves conveys an expected F1 score on a harmonic scale, and the convex hull of a Precision-Recall-Gain curve allows us to calibrate the classifier's scores so as to determine, for each operating point on the convex hull, the interval of β values for which the point optimises Fβ. We demonstrate experimentally that the area under traditional PR curves can easily favour models with lower expected F1 score than others, and so the use of Precision-Recall-Gain curves will result in better model selection.
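To make the coordinate change concrete, below is a minimal Python sketch of computing a Precision-Recall-Gain curve from classifier scores. It assumes the gain mapping x ↦ (x − π) / ((1 − π)·x), with π the proportion of positives, applied to both precision and recall; this matches the transformation the paper describes, but the helper name prg_curve, the use of scikit-learn's precision_recall_curve, the clipping of points below the prior, and the trapezoidal AUPRG estimate at the end are all illustrative choices, not the authors' code. For reference, Fβ = (1 + β²)·prec·rec / (β²·prec + rec), the harmonic-style combination of precision and recall mentioned in the abstract.

```python
# Illustrative sketch (not the authors' implementation): map an empirical
# PR curve into Precision-Recall-Gain space. Assumes scikit-learn is
# available; prg_curve is a hypothetical helper name.
import numpy as np
from sklearn.metrics import precision_recall_curve

def gain(x, pi):
    """Gain transform: maps values in [pi, 1] onto [0, 1]."""
    return (x - pi) / ((1.0 - pi) * x)

def prg_curve(y_true, y_score):
    """Return (recall-gain, precision-gain) points of the PRG curve."""
    y_true = np.asarray(y_true)
    pi = y_true.mean()  # proportion of positives
    prec, rec, _ = precision_recall_curve(y_true, y_score)
    with np.errstate(divide="ignore", invalid="ignore"):
        prec_gain = gain(prec, pi)
        rec_gain = gain(rec, pi)
    # Keep only points inside the unit square (precision and recall >= pi);
    # operating points below the prior have negative gain and are clipped
    # here purely for plotting convenience.
    keep = (prec_gain >= 0) & (rec_gain >= 0)
    return rec_gain[keep], prec_gain[keep]

# Toy usage: crude trapezoidal estimate of the area under the PRG curve.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.2, size=1000)  # imbalanced labels, pi ~ 0.2
s = np.where(y == 1, rng.normal(1.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))
rg, pg = prg_curve(y, s)
order = np.argsort(rg)
x, f = rg[order], pg[order]
auprg = float(np.sum(np.diff(x) * (f[1:] + f[:-1]) / 2.0))
print(f"AUPRG ~ {auprg:.3f}")
```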


Cited By

  • (2019) A Semi-Boosted Nested Model With Sensitivity-Based Weighted Binarization for Multi-Domain Network Intrusion Detection. ACM Transactions on Intelligent Systems and Technology, 10(3):1-27. https://doi.org/10.1145/3313778
  • (2018) Smoky vehicle detection based on multi-feature fusion and ensemble neural networks. Multimedia Tools and Applications, 77(24):32153-32177. https://doi.org/10.1007/s11042-018-6248-2
  • (2017) Designing multi-label classifiers that maximize F measures. Pattern Recognition, 61(C):394-404. https://doi.org/10.1016/j.patcog.2016.08.008