
Prediction intervals of loan rate for mortgage data based on bootstrapping technique: A comparative study

  • *Corresponding author: Donglin Wang

  • Prediction intervals are an important guide for financial organizations when pricing loan rates. In this paper, we consider four models (classical, Huber, RANSAC, and Theil Sen regression) combined with the bootstrap technique to calculate prediction intervals. Two datasets are used in the study, and $ 5 $-fold cross-validation is used to estimate performance. The classical regression and Huber regression models perform similarly, and both produce narrow intervals. Although the RANSAC model has a slightly higher coverage rate, it has the widest intervals. When coverage rates are similar, the model with the narrower interval is preferred. Therefore, the classical and Huber regression models with the bootstrap method are recommended for calculating prediction intervals.

    Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.
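
The abstract does not spell out the bootstrap procedure, so the following is only a minimal sketch of one common variant: resample the training pairs, refit the model, add a resampled residual to each prediction, and take percentiles over the replicates. The helper name `bootstrap_prediction_interval`, the synthetic data, and the small `R=200` (the article uses values of $ R $ from 3000 to 9000) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def bootstrap_prediction_interval(model, X_train, y_train, X_new,
                                  R=200, alpha=0.05, seed=0):
    """Percentile bootstrap prediction interval (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    draws = np.empty((R, len(X_new)))
    for r in range(R):
        # Resample training pairs with replacement and refit a fresh model.
        idx = rng.integers(0, n, size=n)
        fitted = clone(model).fit(X_train[idx], y_train[idx])
        # Add a resampled in-sample residual so the interval reflects noise
        # in the response, not only uncertainty in the fitted line.
        residuals = y_train[idx] - fitted.predict(X_train[idx])
        noise = rng.choice(residuals, size=len(X_new), replace=True)
        draws[r] = fitted.predict(X_new) + noise
    lower = np.percentile(draws, 100 * alpha / 2, axis=0)
    upper = np.percentile(draws, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


# Synthetic stand-in data (assumption): the mortgage datasets are not used here.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0.0, 1.0, size=300)

# 5-fold cross-validation: build intervals on each training fold and
# check them against the held-out fold.
covered, widths = [], []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    lo, hi = bootstrap_prediction_interval(
        LinearRegression(), X[train_idx], y[train_idx], X[test_idx]
    )
    covered.append((y[test_idx] >= lo) & (y[test_idx] <= hi))
    widths.append(hi - lo)

coverage = float(np.mean(np.concatenate(covered)))
mean_width = float(np.mean(np.concatenate(widths)))
print(f"coverage rate: {coverage:.2%}, mean width: {mean_width:.3f}")
```

Any scikit-learn regressor can be passed in place of `LinearRegression()`, which is how a robust model could be compared against the classical one under the same resampling scheme.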

  • Figure 1.  Histogram of the noterate

    Figure 2.  Box-plot for Different $ R $ and Models for 1st Data

    Figure 3.  Box-plot for Different $ R $ and Models for 2nd Data

    Table 1.  Default parameters for robust models

    Model      Parameters
    Huber      epsilon=1.35, max_iter=100, alpha=0.0001, tol=0.00001
    RANSAC     max_trials=100, stop_probability=0.99, loss='absolute_error'
    Theil Sen  max_subpopulation=10000, max_iter=300, tol=0.001
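
The parameter names in Table 1 coincide with the arguments of scikit-learn's linear-model estimators. Assuming that library (version 1.0 or later, where `loss='absolute_error'` is accepted), the four models could be instantiated roughly as follows. The dictionary name `models` is hypothetical, and the classical model is taken here to be ordinary least squares.

```python
from sklearn.linear_model import (
    HuberRegressor,
    LinearRegression,
    RANSACRegressor,
    TheilSenRegressor,
)

# Parameter values copied from Table 1; everything else is left at the
# library defaults.
models = {
    "Classical": LinearRegression(),  # assumption: ordinary least squares
    "Huber": HuberRegressor(epsilon=1.35, max_iter=100, alpha=0.0001, tol=0.00001),
    "RANSAC": RANSACRegressor(
        max_trials=100, stop_probability=0.99, loss="absolute_error"
    ),
    "Theil Sen": TheilSenRegressor(
        max_subpopulation=10000, max_iter=300, tol=0.001
    ),
}

for name, model in models.items():
    print(name, type(model).__name__)
```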

    Table 3.  Running time for the two datasets

    R value  Model      1st Dataset (seconds)  2nd Dataset (seconds)
    R=3000   Classical  103                    104
    R=3000   Huber      1176                   1213
    R=3000   RANSAC     4027                   4069
    R=3000   Theil Sen  2482                   2720
    R=5000   Classical  170                    172
    R=5000   Huber      1981                   2009
    R=5000   RANSAC     6715                   6779
    R=5000   Theil Sen  4187                   4587
    R=7000   Classical  238                    238
    R=7000   Huber      2737                   2808
    R=7000   RANSAC     9491                   9480
    R=7000   Theil Sen  5949                   6507
    R=9000   Classical  304                    322
    R=9000   Huber      3517                   3668
    R=9000   RANSAC     12287                  12289
    R=9000   Theil Sen  7824                   8549

    Table 2.  Coverage rate for the two datasets

    R value  Model      Coverage rate (1st Dataset)  Coverage rate (2nd Dataset)
    R=3000   Classical  95.18%                       94.88%
    R=3000   Huber      95.05%                       94.93%
    R=3000   RANSAC     97.89%                       97.91%
    R=3000   Theil Sen  95.05%                       94.93%
    R=5000   Classical  95.14%                       94.84%
    R=5000   Huber      94.91%                       95.01%
    R=5000   RANSAC     97.89%                       97.99%
    R=5000   Theil Sen  95.23%                       94.97%
    R=7000   Classical  95.14%                       94.93%
    R=7000   Huber      94.95%                       94.84%
    R=7000   RANSAC     97.84%                       97.91%
    R=7000   Theil Sen  94.86%                       94.97%
    R=9000   Classical  95.09%                       94.93%
    R=9000   Huber      95.09%                       94.84%
    R=9000   RANSAC     97.94%                       98.08%
    R=9000   Theil Sen  95.18%                       94.88%
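
The listing does not define the coverage rate, so the following uses the usual definitions as an assumption. Here $ y_i $ is a held-out observation, $ [L_i, U_i] $ is its prediction interval, and $ n $ is the number of held-out observations. None of these symbols are taken from the article.

\begin{equation}
\mathrm{CR} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ L_i \le y_i \le U_i \}, \qquad
\bar{W} = \frac{1}{n} \sum_{i=1}^{n} \left( U_i - L_i \right)
\end{equation}

Under these definitions, a nominal $ 95\% $ interval is well calibrated when $ \mathrm{CR} $ is close to $ 0.95 $. Among models with similar $ \mathrm{CR} $, the one with the smaller mean width $ \bar{W} $ is the more informative.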

    Table 4.  Tukey test of mean of the widths for the first dataset

    R value  Comparison              Difference of Mean  P-value  $ 95\% $ Confidence Interval
    R=3000   Classical vs Huber      -0.0103             0.3012   (-0.0255, 0.0049)
    R=3000   RANSAC vs Huber         0.6364              0.0010   (0.6212, 0.6516)
    R=3000   Theil Sen vs Huber      0.0862              0.0010   (0.0710, 0.1014)
    R=3000   RANSAC vs Classical     0.6467              0.0010   (0.6315, 0.6619)
    R=3000   Theil Sen vs Classical  0.0966              0.0010   (0.0813, 0.1118)
    R=3000   Theil Sen vs RANSAC     -0.5501             0.0010   (-0.5653, -0.5349)
    R=5000   Classical vs Huber      -0.0099             0.3076   (-0.0246, 0.0048)
    R=5000   RANSAC vs Huber         0.6341              0.0010   (0.6194, 0.6488)
    R=5000   Theil Sen vs Huber      0.0871              0.0010   (0.0724, 0.1018)
    R=5000   RANSAC vs Classical     0.6440              0.0010   (0.6293, 0.6587)
    R=5000   Theil Sen vs Classical  0.0970              0.0010   (0.0823, 0.1117)
    R=5000   Theil Sen vs RANSAC     -0.5470             0.0010   (-0.5617, -0.5323)
    R=7000   Classical vs Huber      -0.0134             0.0866   (-0.0280, 0.0012)
    R=7000   RANSAC vs Huber         0.6348              0.0010   (0.6201, 0.6494)
    R=7000   Theil Sen vs Huber      0.0833              0.0010   (0.0687, 0.0979)
    R=7000   RANSAC vs Classical     0.6482              0.0010   (0.6335, 0.6628)
    R=7000   Theil Sen vs Classical  0.0967              0.0010   (0.0821, 0.1113)
    R=7000   Theil Sen vs RANSAC     -0.5515             0.0010   (-0.5661, -0.5368)
    R=9000   Classical vs Huber      -0.0151             0.0365   (-0.0295, -0.0007)
    R=9000   RANSAC vs Huber         0.6352              0.0010   (0.6207, 0.6496)
    R=9000   Theil Sen vs Huber      0.0843              0.0010   (0.0699, 0.0987)
    R=9000   RANSAC vs Classical     0.6503              0.0010   (0.6358, 0.6647)
    R=9000   Theil Sen vs Classical  0.0994              0.0010   (0.0850, 0.1138)
    R=9000   Theil Sen vs RANSAC     -0.5509             0.0010   (-0.5653, -0.5364)
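
A pairwise comparison of mean interval widths like the one in Table 4 can be sketched with Tukey's HSD test. The example below assumes SciPy 1.8 or later (`scipy.stats.tukey_hsd`); the software used in the article is not stated in this listing. The width samples are synthetic placeholders, not the article's data.

```python
import numpy as np
from scipy.stats import tukey_hsd

# Synthetic interval widths per model (assumption: illustrative values only).
rng = np.random.default_rng(0)
widths = {
    "Classical": rng.normal(2.00, 0.10, size=500),
    "Huber": rng.normal(2.01, 0.10, size=500),
    "RANSAC": rng.normal(2.64, 0.10, size=500),
    "Theil Sen": rng.normal(2.10, 0.10, size=500),
}
names = list(widths)

# Tukey's HSD compares every pair of group means while controlling the
# family-wise error rate.
result = tukey_hsd(*widths.values())
ci = result.confidence_interval(confidence_level=0.95)

# statistic[i, j] is mean(group i) - mean(group j).
for i in range(len(names)):
    for j in range(i):
        print(
            f"{names[i]} vs {names[j]}: "
            f"diff={result.statistic[i, j]:+.4f}, "
            f"p={result.pvalue[i, j]:.4f}, "
            f"95% CI=({ci.low[i, j]:.4f}, {ci.high[i, j]:.4f})"
        )
```

A confidence interval that excludes zero corresponds to a significant difference in mean width at the $ 5\% $ level, which is how the rows of the table can be read.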

    Table 5.  Tukey test of mean of the widths for the second dataset

    R value  Comparison              Difference of Mean  P-value  $ 95\% $ Confidence Interval
    R=3000   Classical vs Huber      -0.0062             0.6138   (-0.0195, 0.0071)
    R=3000   RANSAC vs Huber         0.5572              0.0010   (0.5439, 0.5704)
    R=3000   Theil Sen vs Huber      0.0485              0.0010   (0.0353, 0.0618)
    R=3000   RANSAC vs Classical     0.5633              0.0010   (0.5501, 0.5766)
    R=3000   Theil Sen vs Classical  0.0547              0.0010   (0.0414, 0.0680)
    R=3000   Theil Sen vs RANSAC     -0.5086             0.0010   (-0.5219, -0.4954)
    R=5000   Classical vs Huber      -0.0002             0.9000   (-0.0131, 0.0127)
    R=5000   RANSAC vs Huber         0.5600              0.0010   (0.5471, 0.5729)
    R=5000   Theil Sen vs Huber      0.0484              0.0010   (0.0355, 0.0613)
    R=5000   RANSAC vs Classical     0.5602              0.0010   (0.5473, 0.5731)
    R=5000   Theil Sen vs Classical  0.0487              0.0010   (0.0357, 0.0616)
    R=5000   Theil Sen vs RANSAC     -0.5116             0.0010   (-0.5245, -0.4986)
    R=7000   Classical vs Huber      -0.0017             0.9000   (-0.0144, 0.0111)
    R=7000   RANSAC vs Huber         0.5609              0.0010   (0.5482, 0.5736)
    R=7000   Theil Sen vs Huber      0.0466              0.0010   (0.0338, 0.0593)
    R=7000   RANSAC vs Classical     0.5625              0.0010   (0.5498, 0.5753)
    R=7000   Theil Sen vs Classical  0.0482              0.0010   (0.0355, 0.0609)
    R=7000   Theil Sen vs RANSAC     -0.5143             0.0010   (-0.5270, -0.5016)
    R=9000   Classical vs Huber      -0.0030             0.9000   (-0.0156, 0.0096)
    R=9000   RANSAC vs Huber         0.5624              0.0010   (0.5498, 0.5750)
    R=9000   Theil Sen vs Huber      0.0455              0.0010   (0.0329, 0.0581)
    R=9000   RANSAC vs Classical     0.5654              0.0010   (0.5528, 0.5780)
    R=9000   Theil Sen vs Classical  0.0485              0.0010   (0.0359, 0.0611)
    R=9000   Theil Sen vs RANSAC     -0.5169             0.0010   (-0.5295, -0.5043)