
A model of inductive bias learning

Published: 01 March 2000, Journal of Artificial Intelligence Research, Volume 12, Issue 1

Abstract

A major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper, a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Explicit bounds are also derived demonstrating that learning multiple tasks within an environment of related tasks can potentially give much better generalization than learning a single task.
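To make the selection criterion concrete, the following is a minimal sketch of the multi-task rule the abstract describes; the notation (the family of hypothesis spaces \(\mathbb{H}\), the task count \(n\), the per-task sample size \(m\), and the loss \(\ell\)) is introduced here for illustration and need not match the paper's own. The bias learner samples \(n\) tasks from the environment, draws \(m\) training examples \((x_{ij}, y_{ij})\) for each task \(i\), and selects the hypothesis space whose best per-task hypotheses achieve the smallest average empirical loss:

\[
\hat{\mathcal{H}} \;\in\; \arg\min_{\mathcal{H} \in \mathbb{H}} \; \frac{1}{n} \sum_{i=1}^{n} \; \min_{h \in \mathcal{H}} \; \frac{1}{m} \sum_{j=1}^{m} \ell\bigl(h(x_{ij}),\, y_{ij}\bigr).
\]

The guarantee sketched in the abstract is that, for sufficiently large \(n\) and \(m\), a space \(\hat{\mathcal{H}}\) chosen this way will, with high probability, also contain good hypotheses for novel tasks drawn from the same environment.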

