Abstract
Mixture of experts (ME) is one of the most popular and interesting combining methods, and it has great potential to improve performance in machine learning. ME is built on the divide-and-conquer principle, in which the problem space is divided among a few neural network experts that are supervised by a gating network. Earlier works on ME developed different strategies for dividing the problem space among the experts. To survey and analyse these methods more clearly, we present a categorisation of the ME literature based on this difference: ME implementations are classified into two groups according to the partitioning strategy used and to how and when the gating network is involved in the partitioning and combining procedures. In the first group, the conventional ME and its extensions stochastically partition the problem space into a number of subspaces through a specially designed error function, and the experts become specialised in those subspaces. In the second group, the problem space is explicitly partitioned by a clustering method before the experts are trained, and each expert is then assigned to one of the resulting subspaces. Because the first group partitions the problem space implicitly, through a tacit competitive process between the experts, we call it the mixture of implicitly localised experts (MILE); the second group, which relies on pre-specified clusters, is called the mixture of explicitly localised experts (MELE). The properties of the two groups are examined in comparison with each other, and a discussion of the advantages and disadvantages of each shows that the two approaches have complementary features. Moreover, the features of ME are compared with those of other popular combining methods, including boosting and negative correlation learning. As the investigated methods have complementary strengths and limitations, previous studies that attempted to combine their features in integrated approaches are reviewed, and some directions for future research are suggested.
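To make the MILE/MELE distinction concrete, the following is a minimal, illustrative sketch (not taken from the survey) of a MILE-style mixture of experts: two linear experts and a softmax gating network are trained jointly by gradient descent on the squared error of the gated output, so the gate's competitive weighting implicitly localises each expert. The toy data, expert and gate parameterisation, and learning rate are assumptions made for illustration only, and the classical ME papers employ a somewhat different, explicitly competitive error function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-dimensional regression with two regimes that a single
# linear model cannot fit well (illustrative data, not from the survey).
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.where(X[:, 0] < 0.0, -2.0 * X[:, 0], 3.0 * X[:, 0])

n_experts, lr = 2, 0.5
W_experts = rng.normal(scale=0.1, size=(n_experts, 1))  # one linear expert per row
W_gate = rng.normal(scale=0.1, size=(n_experts, 1))     # gating network weights

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    outputs = X @ W_experts.T        # expert predictions, shape (N, n_experts)
    gate = softmax(X @ W_gate.T)     # gating probabilities, shape (N, n_experts)
    return outputs, gate, (gate * outputs).sum(axis=1)

for _ in range(2000):
    outputs, gate, y_hat = forward(X)
    err = (y_hat - y)[:, None]       # error of the combined output
    # Experts are pulled towards the targets in proportion to how strongly
    # the gate selects them; this tacit competition localises the experts.
    W_experts -= lr * ((gate * err).T @ X) / len(X)
    # The gate learns to weight more heavily whichever expert is
    # currently more accurate on each input.
    grad_gate = gate * (outputs - y_hat[:, None]) * err
    W_gate -= lr * (grad_gate.T @ X) / len(X)

_, _, y_hat = forward(X)
print("MSE of the gated combination:", np.mean((y_hat - y) ** 2))
```

A MELE-style counterpart would instead first partition X with a clustering algorithm such as k-means, train one expert per cluster, and only afterwards use the cluster memberships (or a gate fitted to them) to combine the experts.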