Abstract
An important problem in neuroscience is to understand how brains extract relevant signals from mixtures of unknown sources, i.e., perform blind source separation. To model how the brain performs this task, we seek a biologically plausible single-layer neural network implementation of a blind source separation algorithm. For biological plausibility, we require the network to satisfy the following three basic properties of neuronal circuits: (i) the network operates in the online setting; (ii) synaptic learning rules are local; and (iii) neuronal outputs are nonnegative. Closest to our work is that of Pehlevan et al. (Neural Comput 29:2925–2954, 2017), which considers nonnegative independent component analysis (NICA), a special case of blind source separation that assumes the mixture is a linear combination of uncorrelated, nonnegative sources. They derive an algorithm with a biologically plausible two-layer network implementation. In this work, we improve upon their result by deriving two algorithms for NICA, each with a biologically plausible single-layer network implementation. The first algorithm maps onto a network with indirect lateral connections mediated by interneurons. The second algorithm maps onto a network with direct lateral connections and multi-compartmental output neurons.
References
Bahroun Y, Chklovskii D, Sengupta A (2021) A normative and biologically plausible algorithm for independent component analysis. Adv Neural Inf Process Syst 34:7368–7384
Bee MA, Micheyl C (2008) The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J Comp Psychol 122(3):235
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Bronkhorst AW (2015) The cocktail-party problem revisited: early processing and selection of multi-talker speech. Atten Percept Psychophys 77(5):1465–1487
Cherry EC (1953) Some experiments on the recognition of speech, with one and with two ears. J Acoust Soc Am 25(5):975–979
Desimone R, Duncan J (1995) Neural mechanisms of selective visual attention. Ann Rev Neurosci 18(1):193–222
Donoho D, Stodden V (2003) When does non-negative matrix factorization give a correct decomposition into parts? Adv Neural Inf Process Syst 16
Erdogan AT, Pehlevan C (2020) Blind bounded source separation using neural networks with local learning rules. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp 3812–3816
Friedrich RW, Laurent G (2001) Dynamic optimization of odor representations by slow temporal patterning of mitral cell activity. Science 291(5505):889–894
Gschwend O, Abraham NM, Lagier S, Begnaud F, Rodriguez I, Carleton A (2015) Neuronal pattern separation in the olfactory bulb improves odor discrimination learning. Nat Neurosci 18(10):1474–1482
Huang K, Sidiropoulos ND, Swami A (2013) Non-negative matrix factorization revisited: uniqueness and algorithm for symmetric decomposition. IEEE Trans Signal Process 62(1):211–224
Hulse SH, MacDougall-Shackleton SA, Wisniewski AB (1997) Auditory scene analysis by songbirds: stream segregation of birdsong by European starlings (Sturnus vulgaris). J Comp Psychol 111(1):3
Hyvärinen A, Hoyer P (2000) Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 12(7):1705–1720
Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications. Neural Netw 13(4–5):411–430
Laughlin SB, Sejnowski TJ (2003) Communication in neuronal networks. Science 301(5641):1870–1874
Laurberg H, Christensen MG, Plumbley MD, Hansen LK, Jensen SH (2008) Theorems on positive data: on the uniqueness of NMF. Comput Intell Neurosci 2008
Lipshutz D, Windolf C, Golkar S, Chklovskii DB (2020) A biologically plausible neural network for slow feature analysis. Adv Neural Inf Process Syst 33:14986–14996
Lipshutz D, Bahroun Y, Golkar S, Sengupta AM, Chklovskii DB (2021) A biologically plausible neural network for multichannel canonical correlation analysis. Neural Comput 33(9):2309–2352
McDermott JH (2009) The cocktail party problem. Current Biol 19(22):R1024–R1027
Narayan R, Best V, Ozmeral E, McClaine E, Dent M, Shinn-Cunningham B, Sen K (2007) Cortical interference effects in the cocktail party problem. Nat Neurosci 10(12):1601–1607
Oja E, Plumbley M (2004) Blind separation of positive sources by globally convergent gradient search. Neural Comput 16(9):1811–1825
Pehlevan C, Chklovskii DB (2019) Neuroscience-inspired online unsupervised learning algorithms: artificial neural networks. IEEE Signal Process Mag 36(6):88–96
Pehlevan C, Mohan S, Chklovskii DB (2017) Blind nonnegative source separation using biological neural networks. Neural Comput 29(11):2925–2954
Plumbley M (2002) Conditions for nonnegative independent component analysis. IEEE Signal Process Lett 9(6):177–180
Plumbley MD (2003) Algorithms for nonnegative independent component analysis. IEEE Trans Neural Netw 14(3):534–543
Plumbley MD, Oja E (2004) A “nonnegative PCA” algorithm for independent component analysis. IEEE Trans Neural Netw 15(1):66–76
Rivera-Alba M, Peng H, de Polavieja GG, Chklovskii DB (2014) Wiring economy can account for cell body placement across species and brain areas. Current Biol 24(3):R109–R110
Shinn-Cunningham BG (2008) Object-based auditory and visual attention. Trends Cognit Sci 12(5):182
Wilson RI, Mainen ZF (2006) Early events in olfactory processing. Ann Rev Neurosci 29:163–201
Yuan Z, Oja E (2004) A FastICA algorithm for non-negative independent component analysis. In: International Conference on Independent Component Analysis and Signal Separation, Springer, pp 1–8
Acknowledgements
We are grateful to Lucy Reading-Ikkanda for creating Figs. 2 and 3. We thank Siavash Golkar, Johannes Friedrich, Tiberiu Tesileanu, Alex Genkin, Jason Moore and Yanis Bahroun for helpful comments and feedback on an earlier draft of this work. We especially thank Siavash Golkar for pointing out that, in Sect. 4.2, \(\mathbf{W}_{XZ}(\mathbf{x}_t-\overline{\mathbf{x}}_t)(\mathbf{x}_t-\overline{\mathbf{x}}_t)^\top \) is not equal to \((\mathbf{c}_t-\overline{\mathbf{c}}_t)(\mathbf{x}_t-\overline{\mathbf{x}}_t)^\top \) due to the (suppressed) time-dependency of the weights \(\mathbf{W}_{XZ}\). Finally, we thank the two referees for their careful reading of our paper and for their helpful comments.
Additional information
Communicated by Benjamin Lindner.
Appendices
A Saddle point property
Here we recall the following minmax property for a function that satisfies the saddle point property (Boyd and Vandenberghe 2004, Sect. 5.4).
Theorem 1
Let \(V\subseteq \mathbb {R}^n\), \(W\subseteq \mathbb {R}^m\) and \(f:V\times W\rightarrow \mathbb {R}\). Suppose f satisfies the saddle point property; that is, there exists \((\mathbf{a}^*,\mathbf{b}^*)\in V\times W\) such that
$$f(\mathbf{a}^*,\mathbf{b})\le f(\mathbf{a}^*,\mathbf{b}^*)\le f(\mathbf{a},\mathbf{b}^*)\quad \text {for all } \mathbf{a}\in V,\ \mathbf{b}\in W.$$
Then
$$\min _{\mathbf{a}\in V}\max _{\mathbf{b}\in W}f(\mathbf{a},\mathbf{b})=f(\mathbf{a}^*,\mathbf{b}^*)=\max _{\mathbf{b}\in W}\min _{\mathbf{a}\in V}f(\mathbf{a},\mathbf{b}).$$
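As a simple illustration (ours, not from the original), take \(V=W=\mathbb {R}\) and \(f(a,b)=a^2-b^2\). The point \((a^*,b^*)=(0,0)\) satisfies the saddle point property, since \(f(0,b)=-b^2\le 0\le a^2=f(a,0)\), and indeed \(\min _a\max _b f(a,b)=\min _a a^2=0=\max _b(-b^2)=\max _b\min _a f(a,b)\).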
B Optimization over neural activity matrices \((\mathbf{Y},\mathbf{N})\) in the derivation of Algorithm 1
In this section, we show that when \(\mathbf{W}_{XY}\) and \(\mathbf{W}_{YN}\) are at their optimal values, the optimal neural activity matrices \((\mathbf{Y},\mathbf{N})\) can be approximated via projected gradient descent. We first compute the optimal values of \(\mathbf{W}_{XY}\) and \(\mathbf{W}_{YN}\).
Lemma 1
Suppose \((\mathbf{W}_{XY}^*,\mathbf{W}_{YN}^*,\mathbf{Y}^*,\mathbf{N}^*)\) is an optimal solution of objective (5). Then
$$\mathbf{W}_{XY}^*=\mathbf{P}\mathbf{A}^\top \qquad \text {and}\qquad \mathbf{W}_{YN}^{*\top }\mathbf{W}_{YN}^*=\mathbf{P}\mathbf{A}^\top \mathbf{A}\mathbf{P}^\top ,$$
for some permutation matrix \(\mathbf{P}\).
Proof
From (Pehlevan et al. 2017, Theorem 3), we know that every solution of the objective
is of the form \(\mathbf{Y}=\mathbf{F}\mathbf{X}\), where \(\mathbf{F}\) is a whitening matrix. In particular, since \(\mathbf{Y}=\mathbf{F}\mathbf{A}\mathbf{S}\) and \(\mathbf{S}\) also has the identity covariance matrix, \(\mathbf{Y}\) is an orthogonal transformation of \(\mathbf{S}\). Furthermore, since \(\mathbf{S}\) is well grounded, by (Plumbley 2002, Theorem 1), \(\mathbf{Y}\) is nonnegative if and only if \(\mathbf{F}\mathbf{A}\) is a permutation matrix. Therefore, every solution \(\mathbf{Y}^*\) of the objective
is of the form \(\mathbf{Y}^*=\mathbf{P}\mathbf{S}\) for some permutation matrix \(\mathbf{P}\). In addition, differentiating the expression
with respect to \(\delta \mathbf{Y}\) and setting the derivative equal to zero, we see that at the optimal value, \(\delta \mathbf{N}^{*,\top }\delta \mathbf{N}^*=\delta \mathbf{X}^\top \delta \mathbf{X}=\delta \mathbf{S}^\top \mathbf{A}^\top \mathbf{A}\delta \mathbf{S}\).
Differentiating \(L_1\) with respect to \(\mathbf{W}_{XY}\) and \(\mathbf{W}_{YN}\), we see that the optimal values for the synaptic weight matrices are achieved at \(\mathbf{W}_{XY}=\frac{1}{T}\delta \mathbf{Y}\delta \mathbf{X}^\top \) and \(\mathbf{W}_{YN}=\frac{1}{T}\delta \mathbf{N}\delta \mathbf{Y}^\top \). Thus, substituting \(\delta \mathbf{Y}^*=\mathbf{P}\delta \mathbf{S}\), \(\delta \mathbf{X}=\mathbf{A}\delta \mathbf{S}\) and \(\frac{1}{T}\delta \mathbf{S}\delta \mathbf{S}^\top =\mathbf{I}\),
$$\mathbf{W}_{XY}^*=\frac{1}{T}\delta \mathbf{Y}^*\delta \mathbf{X}^\top =\mathbf{P}\left( \tfrac{1}{T}\delta \mathbf{S}\delta \mathbf{S}^\top \right) \mathbf{A}^\top =\mathbf{P}\mathbf{A}^\top $$
and
$$\mathbf{W}_{YN}^{*\top }\mathbf{W}_{YN}^*=\frac{1}{T^2}\delta \mathbf{Y}^*\delta \mathbf{N}^{*,\top }\delta \mathbf{N}^*\delta \mathbf{Y}^{*,\top }=\mathbf{P}\mathbf{A}^\top \mathbf{A}\mathbf{P}^\top .$$
\(\square \)
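To make the whitening step concrete, here is a minimal numerical check (our addition, not part of the proof), assuming unit-scale exponential sources, which are nonnegative, well grounded and have unit variance: any whitening matrix \(\mathbf{F}\) computed from the mixture makes \(\mathbf{F}\mathbf{A}\) orthogonal, up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)
k, T = 3, 200_000

# Unit-scale exponential sources: nonnegative, well grounded, unit variance.
S = rng.exponential(scale=1.0, size=(k, T))
A = rng.normal(size=(k, k))  # random square mixing matrix
X = A @ S

# ZCA whitening matrix F = Cov(X)^{-1/2}, computed from the centered mixtures.
dX = X - X.mean(axis=1, keepdims=True)
evals, evecs = np.linalg.eigh(dX @ dX.T / T)
F = evecs @ np.diag(evals ** -0.5) @ evecs.T

# Y = F X has identity covariance, so Q = F A is orthogonal (numerically).
Q = F @ A
print(np.allclose(Q @ Q.T, np.eye(k), atol=0.05))  # True, up to sampling error
```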
Next, we show that when \(\mathbf{W}_{XY}\) and \(\mathbf{W}_{YN}\) are at their optimal values, the optimal \((\mathbf{Y}^*,\mathbf{N}^*)\) can be approximated by running the projected gradient dynamics in Eq. (6).
Lemma 2
Suppose \(\mathbf{W}_{XY}=\mathbf{P}\mathbf{A}^\top \) and \(\mathbf{W}_{YN}^\top \mathbf{W}_{YN}=\mathbf{P}\mathbf{A}^\top \mathbf{A}\mathbf{P}^\top \) for some permutation matrix \(\mathbf{P}\). Then
is a solution of the minmax problem
In particular, \((\mathbf{Y}^*,\mathbf{N}^*)\) is the unique solution of the minmax problem
which can be approximated by running the projected gradient dynamics in Eq. (6).
Proof
We first relax the condition that \(\mathbf{Y}\) be a nonnegative linear transformation of \(\mathbf{X}\) and consider the minmax problem
After differentiating with respect to \(\delta \mathbf{Y}\) and \(\delta \mathbf{N}\), we see that this objective is optimized when the centered matrices \(\delta \mathbf{Y}\) and \(\delta \mathbf{N}\) are given by
Next, we see that the above relations for the centered matrices hold when \(\mathbf{Y}\) and \(\mathbf{N}\) are given by Eq. (19), where we have used the fact that \(\mathbf{W}_{XY}=\mathbf{P}\mathbf{A}^\top \) and \(\mathbf{W}_{YN}^\top \mathbf{W}_{YN}=\mathbf{P}\mathbf{A}^\top \mathbf{A}\mathbf{P}^\top \). Note that \(\mathbf{Y}\) is a linear transformation of \(\mathbf{X}\) and \(\mathbf{Y}\) is nonnegative since it is a permutation of the nonnegative sources. It follows that \((\mathbf{Y},\mathbf{N})\) is also a solution to the constrained minmax problem (20). Finally, differentiating the objective in Eq. (21) with respect to \(\mathbf{Y}\) and \(\mathbf{N}\), we see that the optimal \(\mathbf{Y}\) and \(\mathbf{N}\) are again given by Eq. (19). \(\square \)
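The projected gradient dynamics of Eq. (6) are not reproduced here. As a purely schematic Python sketch of projected gradient descent-ascent of the kind referenced in the proof, the snippet below descends in \(\mathbf{y}\) (with projection onto the nonnegative orthant) and ascends in \(\mathbf{n}\); the saddle objective, step size and iteration count are illustrative stand-ins, not the paper's objective (5).

```python
import numpy as np

def descend_ascend(x, W_XY, W_YN, dt=0.05, n_steps=1000):
    """Projected gradient descent-ascent on the stand-in saddle objective
    l(y, n) = -2 y^T W_XY x + y^T y + 2 n^T W_YN y - n^T n,
    minimized over y >= 0 and maximized over n (illustrative only)."""
    y = np.zeros(W_XY.shape[0])
    n = np.zeros(W_YN.shape[0])
    for _ in range(n_steps):
        grad_y = -2 * W_XY @ x + 2 * y + 2 * W_YN.T @ n  # dl/dy
        grad_n = 2 * W_YN @ y - 2 * n                    # dl/dn
        y = np.maximum(y - dt * grad_y, 0.0)  # descend in y, project onto y >= 0
        n = n + dt * grad_n                   # ascend in n
    return y, n
```

At a fixed point of these dynamics, \(\mathbf{n}\) maximizes the concave part exactly, and \(\mathbf{y}\) satisfies the KKT conditions of the nonnegatively constrained minimization.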
C Decoupling the interneuron synapses
The NICA algorithm derived in Sect. 4.1 requires the interneuron-to-output neuron synaptic weight matrix \(\mathbf{W}_{NY}\) to be the transpose of the output neuron-to-interneuron synaptic weight matrix \(\mathbf{W}_{YN}\). Enforcing this symmetry would require a centralized mechanism, which is not biologically plausible; this difficulty is commonly referred to as the weight transport problem.
Here, we show that the symmetry of the two weight matrices asymptotically follows from the learning rules in Algorithm 1, even when the symmetry does not hold at initialization. Let \(\mathbf{W}_{NY,0}\) and \(\mathbf{W}_{YN,0}\) denote the initial values of \(\mathbf{W}_{NY}\) and \(\mathbf{W}_{YN}\). Then, in view of the update rules in Algorithm 1, the difference \(\mathbf{W}_{NY}-\mathbf{W}_{YN}^\top \) after t updates is given by
$$\mathbf{W}_{NY}-\mathbf{W}_{YN}^\top =\prod _{s=1}^{t}\left( 1-\eta _s\right) \left( \mathbf{W}_{NY,0}-\mathbf{W}_{YN,0}^\top \right) .$$
In particular, the difference decays exponentially.
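The following Python sketch (our addition) illustrates this decay numerically. The exponential-moving-average form of the updates and the decaying step size are assumptions chosen to match the structure of Algorithm 1's local learning rules.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
W_NY = rng.normal(size=(k, k))  # interneuron-to-output weights
W_YN = rng.normal(size=(k, k))  # output-to-interneuron weights

for t in range(1, 2001):
    eta = 0.1 / (1 + 1e-3 * t)  # assumed decaying learning rate
    y = rng.random(k)           # stand-in output activity
    n = rng.random(k)           # stand-in interneuron activity
    # Both synapses see transposed copies of the same Hebbian outer product,
    # so the asymmetry obeys D_t = (1 - eta_t) D_{t-1}.
    W_NY += eta * (np.outer(n, y) - W_NY)
    W_YN += eta * (np.outer(y, n) - W_YN)

print(np.linalg.norm(W_NY - W_YN.T))  # prints a value indistinguishable from zero
```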
D Details of numerical experiments
The simulations were performed on an Apple iMac with a 2.8 GHz Quad-Core Intel Core i7 processor. For each of the algorithms that we implement, we use a time-dependent learning rate of the form
$$\eta _t=\frac{\eta _0}{1+\gamma t}. \qquad (22)$$
To choose the parameters, we performed a grid search over \(\eta _0\in \{10^{-1},10^{-2},10^{-3},10^{-4}\}\) and \(\gamma \in \{10^{-2},10^{-3},10^{-4},10^{-5},10^{-6},10^{-7}\}\). In Table 1, we report the best-performing hyperparameters we found for each algorithm. We now detail our implementation of each algorithm.
1. Bio-NICA with interneurons (Algorithm 1): The neural outputs were computed using the convex quadratic program solver solve_qp from the Python package quadprog (see the sketch after this list). After each iteration, we checked whether any output neuron had been inactive over all iterations so far; if so, we flipped the sign of its feedforward inputs. In addition, if the norm of a row vector of \(\mathbf{W}_{XY}\) fell below 0.1, we replaced that row with a random vector to keep it from becoming degenerate, and if a singular value of \(\mathbf{W}_{XY}\), \(\mathbf{W}_{YN}\) or \(\mathbf{W}_{NY}\) fell below 0.01, we replaced the singular value with 1 (we performed these checks every 100 iterations).
2. Bio-NICA with 2-compartmental neurons (Algorithm 2): The neural outputs were computed using solve_qp from quadprog. We used the time-dependent learning rate of Eq. (22) and included \(\tau \in \{0.01,0.03,0.05,0.08,0.1,0.3,0.5,0.8,1,3\}\) in the grid search to find the best performance. After each iteration, we checked whether any output neuron had been inactive over all iterations so far; if so, we flipped the sign of its feedforward inputs. In addition, if an eigenvalue of \(\mathbf{W}_{ZZ}\) fell below 0.01, we replaced the eigenvalue with 1 to prevent \(\mathbf{W}_{ZZ}\) from becoming degenerate (we checked every 100 iterations).
3. 2-layer NSM: We implemented the algorithm of Pehlevan et al. (2017) with time-dependent learning rates. For the whitening layer, we used the optimal time-dependent learning rate reported in Pehlevan et al. (2017): \(\zeta _t=0.01/(1+0.01t)\). For the NSM layer, we used the time-dependent learning rate of Eq. (22). The neuronal outputs were computed using solve_qp from quadprog. After each iteration, we checked whether any output neuron had been inactive over all iterations so far; if so, we flipped the sign of its feedforward inputs.
4. Nonnegative PCA (NPCA): We used the online version given in Plumbley and Oja (2004). The algorithm assumes the inputs are whitened but not centered, so we performed the noncentered whitening offline. After each iteration, we checked whether any output neuron had been inactive over all iterations so far; if so, we flipped the sign of its feedforward inputs.
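For reference, here is a minimal Python sketch of the two steps shared by these implementations: computing outputs with quadprog's solve_qp and sign-flipping the feedforward weights of never-active neurons. The quadratic form \(\frac{1}{2}\mathbf{y}^\top \mathbf{M}\mathbf{y}-\mathbf{c}^\top \mathbf{y}\) is a generic stand-in; the actual matrices \(\mathbf{M}\) and \(\mathbf{c}\) are algorithm-specific and are not reproduced here.

```python
import numpy as np
from quadprog import solve_qp

def nonneg_output(M, c):
    """Compute y = argmin_{y >= 0} 0.5 y^T M y - c^T y.

    quadprog's solve_qp solves min 0.5 x^T G x - a^T x subject to
    C^T x >= b, so nonnegativity corresponds to C = I, b = 0.
    M must be symmetric positive definite."""
    k = len(c)
    return solve_qp(M, c, np.eye(k), np.zeros(k))[0]

def flip_silent_neurons(W_XY, ever_active, y):
    """Sign-flip the feedforward weights of output neurons that have not
    been active in any iteration so far (the check described above)."""
    ever_active |= (y > 0)
    W_XY[~ever_active] *= -1.0
    return W_XY, ever_active

# Hypothetical single iteration, with the learning rate of Eq. (22):
#   eta = eta_0 / (1.0 + gamma * t)
#   y = nonneg_output(M, c)
#   W_XY, ever_active = flip_silent_neurons(W_XY, ever_active, y)
```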
Cite this article
Lipshutz, D., Pehlevan, C. & Chklovskii, D.B. Biologically plausible single-layer networks for nonnegative independent component analysis. Biol Cybern 116, 557–568 (2022). https://doi.org/10.1007/s00422-022-00943-8