Prediction intervals of loan rate for mortgage data based on bootstrapping technique: A comparative study
Abstract
The prediction interval is an important guide for financial organizations when pricing loan rates. In this paper, we consider four regression models combined with the bootstrap technique to compute prediction intervals: classical linear regression, Huber regression, RANSAC, and Theil-Sen regression. Two mortgage datasets are used for the study, and 5-fold cross-validation is used to estimate performance. The classical and Huber regression models perform similarly, and both produce narrow intervals. Although the RANSAC model attains a slightly higher coverage rate, it yields the widest intervals. When coverage rates are comparable, the model with the narrower interval is preferable. We therefore recommend the classical and Huber regression models with the bootstrap method for computing prediction intervals.
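The procedure pairs each regression model with the bootstrap to form an interval for a new observation. As a concrete illustration, here is a minimal sketch of a percentile-type pairs bootstrap in Python with scikit-learn; the 95% level and the resample count $R$ match the study's setup, while the helper name, the synthetic data, and the residual-resampling step are illustrative assumptions rather than the authors' exact procedure.

```python
# Hedged sketch: percentile pairs-bootstrap prediction interval.
# `bootstrap_prediction_interval` is a hypothetical helper name; the data
# below are synthetic and only illustrate the mechanics.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

def bootstrap_prediction_interval(model, X, y, X_new, R=3000, alpha=0.05,
                                  seed=None):
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty((R, len(X_new)))
    for r in range(R):
        idx = rng.integers(0, n, size=n)            # resample (X, y) pairs
        fitted = clone(model).fit(X[idx], y[idx])
        resid = y[idx] - fitted.predict(X[idx])
        # Add a resampled residual so the interval covers a new observation,
        # not just the conditional mean.
        preds[r] = fitted.predict(X_new) + rng.choice(resid, size=len(X_new))
    lower = np.percentile(preds, 100 * alpha / 2, axis=0)
    upper = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return lower, upper

# Illustrative usage on synthetic data (R kept small for speed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
lo, hi = bootstrap_prediction_interval(LinearRegression(), X, y, X[:5], R=500)
```

Swapping any of the robust estimators from Table 1 in place of LinearRegression() gives the corresponding robust interval.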
Keywords:
- Mortgage rate prediction intervals
- Huber regression
- random sample consensus (RANSAC)
- Theil-Sen regression
Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.
Table 1. Default parameters for robust models

Model      Parameters
Huber      epsilon=1.35, max_iter=100, alpha=0.0001, tol=0.00001
RANSAC     max_trials=100, stop_probability=0.99, loss='absolute_error'
Theil Sen  max_subpopulation=10000, max_iter=300, tol=0.001
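The parameter names in Table 1 correspond to scikit-learn's robust estimators. The following sketch instantiates the three robust models with these defaults alongside the classical baseline; the dictionary layout is an illustrative assumption, while the values match scikit-learn's documented defaults.

```python
# Hedged sketch: the four competing models with the Table 1 parameters.
from sklearn.linear_model import (HuberRegressor, LinearRegression,
                                  RANSACRegressor, TheilSenRegressor)

models = {
    "Classical": LinearRegression(),
    "Huber": HuberRegressor(epsilon=1.35, max_iter=100, alpha=0.0001,
                            tol=0.00001),
    "RANSAC": RANSACRegressor(max_trials=100, stop_probability=0.99,
                              loss="absolute_error"),
    "Theil Sen": TheilSenRegressor(max_subpopulation=10000, max_iter=300,
                                   tol=0.001),
}
```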
Table 2. Coverage rate for the two datasets

R value   Model      1st Dataset   2nd Dataset
R=3000    Classical  95.18%        94.88%
          Huber      95.05%        94.93%
          RANSAC     97.89%        97.91%
          Theil Sen  95.05%        94.93%
R=5000    Classical  95.14%        94.84%
          Huber      94.91%        95.01%
          RANSAC     97.89%        97.99%
          Theil Sen  95.23%        94.97%
R=7000    Classical  95.14%        94.93%
          Huber      94.95%        94.84%
          RANSAC     97.84%        97.91%
          Theil Sen  94.86%        94.97%
R=9000    Classical  95.09%        94.93%
          Huber      95.09%        94.84%
          RANSAC     97.94%        98.08%
          Theil Sen  95.18%        94.88%

Table 3. Running time for the two datasets (seconds)

R value   Model      1st Dataset   2nd Dataset
R=3000    Classical    103    104
          Huber       1176   1213
          RANSAC      4027   4069
          Theil Sen   2482   2720
R=5000    Classical    170    172
          Huber       1981   2009
          RANSAC      6715   6779
          Theil Sen   4187   4587
R=7000    Classical    238    238
          Huber       2737   2808
          RANSAC      9491   9480
          Theil Sen   5949   6507
R=9000    Classical    304    322
          Huber       3517   3668
          RANSAC     12287  12289
          Theil Sen   7824   8549
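The coverage rate in Table 2 is the fraction of held-out observations whose observed rate falls inside its bootstrap interval, estimated with 5-fold cross-validation as stated in the abstract. A hedged sketch of this evaluation loop, assuming the hypothetical bootstrap_prediction_interval helper sketched after the abstract:

```python
# Hedged sketch: 5-fold cross-validated coverage of bootstrap intervals.
# Relies on the hypothetical bootstrap_prediction_interval helper above.
import numpy as np
from sklearn.model_selection import KFold

def cv_coverage(model, X, y, R=3000, alpha=0.05, seed=0):
    hits, total = 0, 0
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train, test in kf.split(X):
        lo, hi = bootstrap_prediction_interval(model, X[train], y[train],
                                               X[test], R=R, alpha=alpha)
        hits += np.sum((y[test] >= lo) & (y[test] <= hi))
        total += len(test)
    return hits / total   # compare with the ~95% rates in Table 2
```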
Table 4. Tukey test of the mean interval widths for the first dataset

R value   Comparison              Difference of Means   P-value   95% Confidence Interval
R=3000    Classical vs Huber      -0.0103               0.3012    (-0.0255, 0.0049)
          RANSAC vs Huber          0.6364               0.0010    (0.6212, 0.6516)
          Theil Sen vs Huber       0.0862               0.0010    (0.0710, 0.1014)
          RANSAC vs Classical      0.6467               0.0010    (0.6315, 0.6619)
          Theil Sen vs Classical   0.0966               0.0010    (0.0813, 0.1118)
          Theil Sen vs RANSAC     -0.5501               0.0010    (-0.5653, -0.5349)
R=5000    Classical vs Huber      -0.0099               0.3076    (-0.0246, 0.0048)
          RANSAC vs Huber          0.6341               0.0010    (0.6194, 0.6488)
          Theil Sen vs Huber       0.0871               0.0010    (0.0724, 0.1018)
          RANSAC vs Classical      0.6440               0.0010    (0.6293, 0.6587)
          Theil Sen vs Classical   0.0970               0.0010    (0.0823, 0.1117)
          Theil Sen vs RANSAC     -0.5470               0.0010    (-0.5617, -0.5323)
R=7000    Classical vs Huber      -0.0134               0.0866    (-0.0280, 0.0012)
          RANSAC vs Huber          0.6348               0.0010    (0.6201, 0.6494)
          Theil Sen vs Huber       0.0833               0.0010    (0.0687, 0.0979)
          RANSAC vs Classical      0.6482               0.0010    (0.6335, 0.6628)
          Theil Sen vs Classical   0.0967               0.0010    (0.0821, 0.1113)
          Theil Sen vs RANSAC     -0.5515               0.0010    (-0.5661, -0.5368)
R=9000    Classical vs Huber      -0.0151               0.0365    (-0.0295, -0.0007)
          RANSAC vs Huber          0.6352               0.0010    (0.6207, 0.6496)
          Theil Sen vs Huber       0.0843               0.0010    (0.0699, 0.0987)
          RANSAC vs Classical      0.6503               0.0010    (0.6358, 0.6647)
          Theil Sen vs Classical   0.0994               0.0010    (0.0850, 0.1138)
          Theil Sen vs RANSAC     -0.5509               0.0010    (-0.5653, -0.5364)
Table 5. Tukey test of the mean interval widths for the second dataset

R value   Comparison              Difference of Means   P-value   95% Confidence Interval
R=3000    Classical vs Huber      -0.0062               0.6138    (-0.0195, 0.0071)
          RANSAC vs Huber          0.5572               0.0010    (0.5439, 0.5704)
          Theil Sen vs Huber       0.0485               0.0010    (0.0353, 0.0618)
          RANSAC vs Classical      0.5633               0.0010    (0.5501, 0.5766)
          Theil Sen vs Classical   0.0547               0.0010    (0.0414, 0.0680)
          Theil Sen vs RANSAC     -0.5086               0.0010    (-0.5219, -0.4954)
R=5000    Classical vs Huber      -0.0002               0.9000    (-0.0131, 0.0127)
          RANSAC vs Huber          0.5600               0.0010    (0.5471, 0.5729)
          Theil Sen vs Huber       0.0484               0.0010    (0.0355, 0.0613)
          RANSAC vs Classical      0.5602               0.0010    (0.5473, 0.5731)
          Theil Sen vs Classical   0.0487               0.0010    (0.0357, 0.0616)
          Theil Sen vs RANSAC     -0.5116               0.0010    (-0.5245, -0.4986)
R=7000    Classical vs Huber      -0.0017               0.9000    (-0.0144, 0.0111)
          RANSAC vs Huber          0.5609               0.0010    (0.5482, 0.5736)
          Theil Sen vs Huber       0.0466               0.0010    (0.0338, 0.0593)
          RANSAC vs Classical      0.5625               0.0010    (0.5498, 0.5753)
          Theil Sen vs Classical   0.0482               0.0010    (0.0355, 0.0609)
          Theil Sen vs RANSAC     -0.5143               0.0010    (-0.5270, -0.5016)
R=9000    Classical vs Huber      -0.0030               0.9000    (-0.0156, 0.0096)
          RANSAC vs Huber          0.5624               0.0010    (0.5498, 0.5750)
          Theil Sen vs Huber       0.0455               0.0010    (0.0329, 0.0581)
          RANSAC vs Classical      0.5654               0.0010    (0.5528, 0.5780)
          Theil Sen vs Classical   0.0485               0.0010    (0.0359, 0.0611)
          Theil Sen vs RANSAC     -0.5169               0.0010    (-0.5295, -0.5043)
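Tables 4 and 5 compare the mean interval widths of the four models with Tukey's honestly-significant-difference test. A sketch using statsmodels' pairwise_tukeyhsd; the width samples below are synthetic placeholders chosen only to mimic the ordering in Table 4, standing in for the per-observation interval widths (upper minus lower):

```python
# Hedged sketch: Tukey HSD on interval widths; the width arrays are
# synthetic stand-ins, not the study's data.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
widths = {
    "Classical": rng.normal(1.00, 0.05, 500),
    "Huber":     rng.normal(1.01, 0.05, 500),
    "RANSAC":    rng.normal(1.65, 0.05, 500),   # widest, as in Table 4
    "Theil Sen": rng.normal(1.10, 0.05, 500),
}
data = np.concatenate(list(widths.values()))
groups = np.concatenate([[m] * len(w) for m, w in widths.items()])
print(pairwise_tukeyhsd(endog=data, groups=groups, alpha=0.05).summary())
```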
Figure 1. Histogram of the noterate (loan rate).
Figure 2. Box-plots for different $R$ values and models for the 1st dataset.
Figure 3. Box-plots for different $R$ values and models for the 2nd dataset.