
A model of inductive bias learning

Published: 01 March 2000, Journal of Artificial Intelligence Research, Volume 12, Issue 1

Abstract

A major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper, a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Explicit bounds are also derived demonstrating that learning multiple tasks within an environment of related tasks can potentially give much better generalization than learning a single task.
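To make the selection criterion concrete, the following is a minimal sketch of the multi-task rule the abstract describes; the notation (the family of hypothesis spaces \(\mathbb{H}\), the task count \(n\), the per-task sample size \(m\), and the loss \(\ell\)) is introduced here for illustration and need not match the paper's own. The bias learner samples \(n\) tasks from the environment, draws \(m\) training examples \((x_{ij}, y_{ij})\) for each task \(i\), and selects the hypothesis space whose best per-task hypotheses achieve the smallest average empirical loss:

\[
\hat{\mathcal{H}} \;\in\; \arg\min_{\mathcal{H} \in \mathbb{H}} \; \frac{1}{n} \sum_{i=1}^{n} \; \min_{h \in \mathcal{H}} \; \frac{1}{m} \sum_{j=1}^{m} \ell\bigl(h(x_{ij}),\, y_{ij}\bigr).
\]

The guarantee sketched in the abstract is that, for sufficiently large \(n\) and \(m\), a space \(\hat{\mathcal{H}}\) chosen this way will, with high probability, also contain good hypotheses for novel tasks drawn from the same environment.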

