How to Limit Label Dissipation in Neural-network Validation: Exploring Label-free Early-stopping Heuristics

Published: 01 June 2023

Abstract

In recent years, deep learning (DL) has achieved impressive successes in many application domains, including handwritten-text recognition. However, DL methods demand a long training process and large amounts of human-labeled data. To address these issues, we explore several label-free heuristics for detecting the early-stopping point when training convolutional neural networks: (1) the cumulative distribution of the standard deviation of kernel weights (SKW); (2) the moving standard deviation of SKW; and (3) the standard deviation of the sum of weights over a window in the epoch series. We applied the proposed methods to the common RIMES and Bentham data sets as well as another highly challenging historical data set. Compared with the usual stopping criterion, which uses labels for validation, the label-free heuristics are at least 10 times faster per epoch on the same training set. Although the alternative stopping heuristics may require additional epochs, their total computing time never exceeds that of the usual criterion. The character error rate (%) on the test set is about one percentage point lower for the label-free heuristics than for the usual stopping criterion. The label-free early-stopping methods have two benefits: they do not require a computationally intensive evaluation of a validation set at each epoch, and all labels can be used for training, which specifically benefits underrepresented word or letter classes.
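To make the heuristics concrete, the following minimal Python sketch illustrates heuristic (2), the moving standard deviation of SKW. The names (skw, MovingStdSKWStopper), the window size of 10 epochs, and the threshold of 1e-4 are illustrative assumptions for this sketch, not values taken from the paper; heuristics (1) and (3) would instead track the cumulative distribution of SKW or the standard deviation of summed weights over the window.

import numpy as np

def skw(kernels):
    # SKW: standard deviation over all convolutional kernel weights,
    # computed once per epoch without any labeled validation data.
    flat = np.concatenate([np.ravel(k) for k in kernels])
    return float(np.std(flat))

class MovingStdSKWStopper:
    # Heuristic (2), sketched: stop when the moving standard deviation
    # of the per-epoch SKW series falls below a threshold, i.e., when
    # the spread of the kernel weights has stabilized.
    def __init__(self, window=10, threshold=1e-4):  # illustrative values
        self.window = window
        self.threshold = threshold
        self.skw_series = []

    def should_stop(self, kernels):
        self.skw_series.append(skw(kernels))
        if len(self.skw_series) < self.window:
            return False  # too few epochs for a moving estimate
        recent = np.array(self.skw_series[-self.window:])
        return float(np.std(recent)) < self.threshold

# Example with random stand-in "kernels"; in practice these would be
# the CNN's convolutional weight tensors read out after each epoch.
stopper = MovingStdSKWStopper()
rng = np.random.default_rng(0)
for epoch in range(100):
    kernels = [rng.normal(scale=np.exp(-epoch), size=(3, 3, 32))]
    if stopper.should_stop(kernels):
        print(f"label-free stop at epoch {epoch}")
        break

In use, should_stop would be called once per training epoch with the network's convolutional kernels, and training halts as soon as it returns True, so no per-epoch validation pass over labeled data is needed.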



      Published In

      Journal on Computing and Cultural Heritage, Volume 16, Issue 1
      March 2023
      437 pages
      ISSN: 1556-4673
      EISSN: 1556-4711
      DOI: 10.1145/3572829

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 June 2023
      Online AM: 27 April 2023
      Accepted: 09 September 2022
      Revised: 30 July 2022
      Received: 31 October 2021
      Published in JOCCH Volume 16, Issue 1

      Author Tags

      1. Deep learning
      2. early-stopping criterion
      3. convolutional neural networks
      4. historical handwritten word recognition

      Qualifiers

      • Research-article

      Funding Sources

      • Netherlands Organisation for Scientific Research (NWO)
