Prediction intervals of loan rate for mortgage data based on bootstrapping technique: A comparative study
Abstract
The prediction interval is an important guide for financial organizations when pricing loan rates. In this paper, we consider four regression models combined with the bootstrap technique to compute prediction intervals: classical linear regression, Huber regression, RANSAC, and Theil-Sen regression. Two mortgage datasets are used for the study, and 5-fold cross-validation is used to estimate performance. The classical and Huber regression models perform similarly, and both produce narrow intervals. Although the RANSAC model attains a slightly higher coverage rate, it yields the widest intervals. When coverage rates are comparable, the model with the narrower interval is preferable. We therefore recommend the classical and Huber regression models with the bootstrap method for computing prediction intervals.
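The procedure pairs each regression model with the bootstrap to form an interval for a new observation. As a concrete illustration, here is a minimal sketch of a percentile-type pairs bootstrap in Python with scikit-learn; the 95% level and the resample count $R$ match the study's setup, while the helper name, the synthetic data, and the residual-resampling step are illustrative assumptions rather than the authors' exact procedure.

```python
# Hedged sketch: percentile pairs-bootstrap prediction interval.
# `bootstrap_prediction_interval` is a hypothetical helper name; the data
# below are synthetic and only illustrate the mechanics.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

def bootstrap_prediction_interval(model, X, y, X_new, R=3000, alpha=0.05,
                                  seed=None):
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty((R, len(X_new)))
    for r in range(R):
        idx = rng.integers(0, n, size=n)            # resample (X, y) pairs
        fitted = clone(model).fit(X[idx], y[idx])
        resid = y[idx] - fitted.predict(X[idx])
        # Add a resampled residual so the interval covers a new observation,
        # not just the conditional mean.
        preds[r] = fitted.predict(X_new) + rng.choice(resid, size=len(X_new))
    lower = np.percentile(preds, 100 * alpha / 2, axis=0)
    upper = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return lower, upper

# Illustrative usage on synthetic data (R kept small for speed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
lo, hi = bootstrap_prediction_interval(LinearRegression(), X, y, X[:5], R=500)
```

Swapping any of the robust estimators from Table 1 in place of LinearRegression() gives the corresponding robust interval.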
Keywords:
- Mortgage rate prediction intervals
- Huber regression
- random sample consensus (RANSAC)
- Theil-Sen regression
Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.
Table 1. Default parameters for robust models

Model      Parameters
Huber      epsilon=1.35, max_iter=100, alpha=0.0001, tol=0.00001
RANSAC     max_trials=100, stop_probability=0.99, loss='absolute_error'
Theil Sen  max_subpopulation=10000, max_iter=300, tol=0.001
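The parameter names in Table 1 correspond to scikit-learn's robust estimators. The following sketch instantiates the three robust models with these defaults alongside the classical baseline; the dictionary layout is an illustrative assumption, while the values match scikit-learn's documented defaults.

```python
# Hedged sketch: the four competing models with the Table 1 parameters.
from sklearn.linear_model import (HuberRegressor, LinearRegression,
                                  RANSACRegressor, TheilSenRegressor)

models = {
    "Classical": LinearRegression(),
    "Huber": HuberRegressor(epsilon=1.35, max_iter=100, alpha=0.0001,
                            tol=0.00001),
    "RANSAC": RANSACRegressor(max_trials=100, stop_probability=0.99,
                              loss="absolute_error"),
    "Theil Sen": TheilSenRegressor(max_subpopulation=10000, max_iter=300,
                                   tol=0.001),
}
```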
Table 2. Coverage rate for the two datasets

R value   Model      1st Dataset   2nd Dataset
R=3000    Classical  95.18%        94.88%
          Huber      95.05%        94.93%
          RANSAC     97.89%        97.91%
          Theil Sen  95.05%        94.93%
R=5000    Classical  95.14%        94.84%
          Huber      94.91%        95.01%
          RANSAC     97.89%        97.99%
          Theil Sen  95.23%        94.97%
R=7000    Classical  95.14%        94.93%
          Huber      94.95%        94.84%
          RANSAC     97.84%        97.91%
          Theil Sen  94.86%        94.97%
R=9000    Classical  95.09%        94.93%
          Huber      95.09%        94.84%
          RANSAC     97.94%        98.08%
          Theil Sen  95.18%        94.88%

Table 3. Running time for the two datasets (seconds)

R value   Model      1st Dataset   2nd Dataset
R=3000    Classical    103    104
          Huber       1176   1213
          RANSAC      4027   4069
          Theil Sen   2482   2720
R=5000    Classical    170    172
          Huber       1981   2009
          RANSAC      6715   6779
          Theil Sen   4187   4587
R=7000    Classical    238    238
          Huber       2737   2808
          RANSAC      9491   9480
          Theil Sen   5949   6507
R=9000    Classical    304    322
          Huber       3517   3668
          RANSAC     12287  12289
          Theil Sen   7824   8549
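The coverage rate in Table 2 is the fraction of held-out observations whose observed rate falls inside its bootstrap interval, estimated with 5-fold cross-validation as stated in the abstract. A hedged sketch of this evaluation loop, assuming the hypothetical bootstrap_prediction_interval helper sketched after the abstract:

```python
# Hedged sketch: 5-fold cross-validated coverage of bootstrap intervals.
# Relies on the hypothetical bootstrap_prediction_interval helper above.
import numpy as np
from sklearn.model_selection import KFold

def cv_coverage(model, X, y, R=3000, alpha=0.05, seed=0):
    hits, total = 0, 0
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train, test in kf.split(X):
        lo, hi = bootstrap_prediction_interval(model, X[train], y[train],
                                               X[test], R=R, alpha=alpha)
        hits += np.sum((y[test] >= lo) & (y[test] <= hi))
        total += len(test)
    return hits / total   # compare with the ~95% rates in Table 2
```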
Table 4. Tukey test of the mean interval widths for the first dataset

R value   Comparison              Difference of Means   P-value   95% Confidence Interval
R=3000    Classical vs Huber      -0.0103               0.3012    (-0.0255, 0.0049)
          RANSAC vs Huber          0.6364               0.0010    (0.6212, 0.6516)
          Theil Sen vs Huber       0.0862               0.0010    (0.0710, 0.1014)
          RANSAC vs Classical      0.6467               0.0010    (0.6315, 0.6619)
          Theil Sen vs Classical   0.0966               0.0010    (0.0813, 0.1118)
          Theil Sen vs RANSAC     -0.5501               0.0010    (-0.5653, -0.5349)
R=5000    Classical vs Huber      -0.0099               0.3076    (-0.0246, 0.0048)
          RANSAC vs Huber          0.6341               0.0010    (0.6194, 0.6488)
          Theil Sen vs Huber       0.0871               0.0010    (0.0724, 0.1018)
          RANSAC vs Classical      0.6440               0.0010    (0.6293, 0.6587)
          Theil Sen vs Classical   0.0970               0.0010    (0.0823, 0.1117)
          Theil Sen vs RANSAC     -0.5470               0.0010    (-0.5617, -0.5323)
R=7000    Classical vs Huber      -0.0134               0.0866    (-0.0280, 0.0012)
          RANSAC vs Huber          0.6348               0.0010    (0.6201, 0.6494)
          Theil Sen vs Huber       0.0833               0.0010    (0.0687, 0.0979)
          RANSAC vs Classical      0.6482               0.0010    (0.6335, 0.6628)
          Theil Sen vs Classical   0.0967               0.0010    (0.0821, 0.1113)
          Theil Sen vs RANSAC     -0.5515               0.0010    (-0.5661, -0.5368)
R=9000    Classical vs Huber      -0.0151               0.0365    (-0.0295, -0.0007)
          RANSAC vs Huber          0.6352               0.0010    (0.6207, 0.6496)
          Theil Sen vs Huber       0.0843               0.0010    (0.0699, 0.0987)
          RANSAC vs Classical      0.6503               0.0010    (0.6358, 0.6647)
          Theil Sen vs Classical   0.0994               0.0010    (0.0850, 0.1138)
          Theil Sen vs RANSAC     -0.5509               0.0010    (-0.5653, -0.5364)
Table 5. Tukey test of the mean interval widths for the second dataset

R value   Comparison              Difference of Means   P-value   95% Confidence Interval
R=3000    Classical vs Huber      -0.0062               0.6138    (-0.0195, 0.0071)
          RANSAC vs Huber          0.5572               0.0010    (0.5439, 0.5704)
          Theil Sen vs Huber       0.0485               0.0010    (0.0353, 0.0618)
          RANSAC vs Classical      0.5633               0.0010    (0.5501, 0.5766)
          Theil Sen vs Classical   0.0547               0.0010    (0.0414, 0.0680)
          Theil Sen vs RANSAC     -0.5086               0.0010    (-0.5219, -0.4954)
R=5000    Classical vs Huber      -0.0002               0.9000    (-0.0131, 0.0127)
          RANSAC vs Huber          0.5600               0.0010    (0.5471, 0.5729)
          Theil Sen vs Huber       0.0484               0.0010    (0.0355, 0.0613)
          RANSAC vs Classical      0.5602               0.0010    (0.5473, 0.5731)
          Theil Sen vs Classical   0.0487               0.0010    (0.0357, 0.0616)
          Theil Sen vs RANSAC     -0.5116               0.0010    (-0.5245, -0.4986)
R=7000    Classical vs Huber      -0.0017               0.9000    (-0.0144, 0.0111)
          RANSAC vs Huber          0.5609               0.0010    (0.5482, 0.5736)
          Theil Sen vs Huber       0.0466               0.0010    (0.0338, 0.0593)
          RANSAC vs Classical      0.5625               0.0010    (0.5498, 0.5753)
          Theil Sen vs Classical   0.0482               0.0010    (0.0355, 0.0609)
          Theil Sen vs RANSAC     -0.5143               0.0010    (-0.5270, -0.5016)
R=9000    Classical vs Huber      -0.0030               0.9000    (-0.0156, 0.0096)
          RANSAC vs Huber          0.5624               0.0010    (0.5498, 0.5750)
          Theil Sen vs Huber       0.0455               0.0010    (0.0329, 0.0581)
          RANSAC vs Classical      0.5654               0.0010    (0.5528, 0.5780)
          Theil Sen vs Classical   0.0485               0.0010    (0.0359, 0.0611)
          Theil Sen vs RANSAC     -0.5169               0.0010    (-0.5295, -0.5043)
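Tables 4 and 5 compare the mean interval widths of the four models with Tukey's honestly-significant-difference test. A sketch using statsmodels' pairwise_tukeyhsd; the width samples below are synthetic placeholders chosen only to mimic the ordering in Table 4, standing in for the per-observation interval widths (upper minus lower):

```python
# Hedged sketch: Tukey HSD on interval widths; the width arrays are
# synthetic stand-ins, not the study's data.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
widths = {
    "Classical": rng.normal(1.00, 0.05, 500),
    "Huber":     rng.normal(1.01, 0.05, 500),
    "RANSAC":    rng.normal(1.65, 0.05, 500),   # widest, as in Table 4
    "Theil Sen": rng.normal(1.10, 0.05, 500),
}
data = np.concatenate(list(widths.values()))
groups = np.concatenate([[m] * len(w) for m, w in widths.items()])
print(pairwise_tukeyhsd(endog=data, groups=groups, alpha=0.05).summary())
```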
Figure 1. Histogram of the noterate (loan rate).
Figure 2. Box-plots for different $R$ values and models for the 1st dataset.
Figure 3. Box-plots for different $R$ values and models for the 2nd dataset.