Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Prediction has a central role in the foundations of Bayesian statistics and is now the main focus in many areas of machine learning, in contrast to the more classical focus on inference. We argue that, in the basic setting of random sampling (that is, in the Bayesian approach, exchangeability), the uncertainty expressed by the posterior distribution and by credible intervals can indeed be understood in terms of prediction. The posterior law on the unknown distribution is centred on the predictive distribution, and we prove that it is marginally asymptotically Gaussian, with variance depending on the predictive updates, i.e. on how the predictive rule incorporates information as new observations become available. This allows one to obtain asymptotic credible intervals based only on the predictive rule (without having to specify the model and the prior law), sheds light on frequentist coverage as related to the predictive learning rule, and, we believe, opens a new perspective towards a notion of ...
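As a toy illustration of intervals built from a predictive rule alone, consider the Beta-Bernoulli model, whose predictive probability is p_n = (a + S_n)/(a + b + n). The sketch below (ours, not the paper's general construction) compares the interval p_n ± z·sqrt(p_n(1 − p_n)/n) suggested by the predictive rule with the exact Beta posterior interval; the prior parameters and data are arbitrary choices.

```python
# Minimal sketch, assuming a Beta-Bernoulli model (illustrative only):
# the predictive rule is p_n = (a + S_n) / (a + b + n), and an
# asymptotic credible interval built from it alone is
#   p_n +/- z * sqrt(p_n * (1 - p_n) / n).
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
a, b = 1.0, 1.0                    # uniform prior (our choice)
n = 2000
x = rng.binomial(1, 0.3, size=n)   # data from a true theta = 0.3
s = x.sum()

p_n = (a + s) / (a + b + n)        # predictive probability
z = 1.96
half = z * np.sqrt(p_n * (1 - p_n) / n)
print("predictive-based interval:", (p_n - half, p_n + half))

# Exact posterior interval, for comparison
lo, hi = beta.ppf([0.025, 0.975], a + s, b + n - s)
print("exact Beta posterior interval:", (lo, hi))
```

The two intervals agree closely for moderate n, which is the kind of behaviour the abstract describes: the credible interval can be read off the predictive rule without writing down the model and prior explicitly.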
Abstract: We present some results on the asymptotic distribution of the number of matches in a Bayesian framework, that is, when the random mechanism generating the data is specified through a prior distribution. In particular, we determine necessary and sufficient conditions for the convergence in distribution of the number of matches to a mixture of Poisson laws, in the case of a generic prior distribution, that is, under the sole assumption that the data are exchangeable. Asymptotic results in this setting were already known in the literature for particular prior distributions. The proofs rely on recent results on random coloured graphs with exchangeable colours.
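A small simulation (with parameters of our own choosing) illustrates the mixed-Poisson limit: conditionally on a prior draw p of the data law, the number of matching pairs among n exchangeable observations is approximately Poisson with mean n(n − 1)/2 · Σ_k p_k², so marginally it is approximately a Poisson mixture.

```python
# Illustrative simulation, not the paper's construction: matches among
# n exchangeable draws, with the data law p drawn from a Dirichlet prior.
import numpy as np

rng = np.random.default_rng(1)
n, N, reps = 100, 20000, 2000
counts, lams = [], []
for _ in range(reps):
    p = rng.dirichlet(np.ones(N))            # prior draw of the data law
    xs = rng.choice(N, size=n, p=p)          # exchangeable sample given p
    _, c = np.unique(xs, return_counts=True)
    counts.append(int((c * (c - 1) // 2).sum()))    # number of matching pairs
    lams.append(n * (n - 1) / 2 * (p ** 2).sum())   # conditional Poisson mean
print("mean matches:", np.mean(counts), " mixed-Poisson mean:", np.mean(lams))
```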
We consider fully connected feed-forward deep neural networks (NNs) whose weights and biases are independent and identically distributed according to symmetric centered stable distributions. We show that the infinite-width limit of the NN, under a suitable scaling of the weights, is a stochastic process whose finite-dimensional distributions are multivariate stable distributions. The limiting process is referred to as the stable process, and it generalizes the class of Gaussian processes recently obtained as infinite-width limits of NNs (Matthews et al., 2018b). The parameters of the stable process can be computed via an explicit recursion over the layers of the network. Our result contributes to the theory of fully connected feed-forward deep NNs, and it paves the way to extending recent lines of research that rely on Gaussian infinite-width limits.
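A quick numerical sketch (one hidden layer, parameters of our own choosing) shows the scaling at work: with symmetric α-stable weights, the pre-activation must be rescaled by n^{−1/α} rather than the Gaussian n^{−1/2}, and under that scaling its distribution stabilizes as the width n grows.

```python
# Sketch of the n^(-1/alpha) scaling for a one-hidden-layer NN with
# symmetric alpha-stable weights (illustrative choices throughout).
import numpy as np
from scipy.stats import levy_stable

alpha, x = 1.8, 0.7
rng = np.random.default_rng(2)

def output(width, samples=2000):
    # first-layer weights u and output weights w, all symmetric alpha-stable
    u = levy_stable.rvs(alpha, 0, size=(samples, width), random_state=rng)
    w = levy_stable.rvs(alpha, 0, size=(samples, width), random_state=rng)
    return (w * np.tanh(u * x)).sum(axis=1) / width ** (1 / alpha)

for n in (10, 100, 1000):
    out = output(n)
    # quantiles stabilize across widths under the stable scaling
    print(n, np.quantile(out, [0.05, 0.5, 0.95]).round(3))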
P{x ∈ {0,1}^ℤ : x_{i_1} = 1, . . . , x_{i_k} = 1} := det[ f̂(i_r − i_s) ]_{1 ≤ r,s ≤ k},  (i_1, . . . , i_k ∈ ℤ distinct, k ≥ 1)

is a determinantal probability measure (see Lyons 2003). Determinantal processes are applied in various fields, such as mathematical physics, random matrix theory, representation theory and ergodic theory; for a survey see Soshnikov (2000). If the function f : [0, 1] → [0, 1] is a trigonometric function of the form
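For concreteness, a small computation (with an f of our own choosing, not one from the text) evaluates such a determinantal correlation directly: taking f = 1_[0,c] on [0,1], the Fourier coefficients f̂(k) = ∫₀¹ f(t) e^{−2πikt} dt give a Hermitian Toeplitz kernel, and the determinant below is the probability of seeing a 1 at each of the chosen sites.

```python
# Worked example of the displayed formula, for f = 1_[0, c] (our choice).
import numpy as np

c = 0.3

def fhat(k):
    # Fourier coefficient of the indicator of [0, c]
    if k == 0:
        return c
    return (1 - np.exp(-2j * np.pi * k * c)) / (2j * np.pi * k)

idx = np.array([0, 1, 5])    # i_1, ..., i_k, distinct sites in Z
K = np.array([[fhat(r - s) for s in idx] for r in idx])
# Hermitian matrix, so the determinant is real
print("P(x_0 = x_1 = x_5 = 1) =", np.linalg.det(K).real)
```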
We present applications of the concentration function in both global and local sensitivity analyses, along with its connection with Choquet capacities.
Journal of Statistical Planning and Inference, 1994
The concentration function, extending the classical notion of the Lorenz curve, is well suited for comparing probability measures. Such a feature can be useful in various Bayesian robustness problems, when a probability measure is taken as a baseline to be compared with other measures by means of their functional forms. Neighbourhood classes Γ of probability measures, including well-known ones, can be defined through the concentration function, and both prior and posterior expectations of given functions of the unknown parameter are studied. The ranges of such expectations over Γ can be found by restricting the search to the extremal measures in Γ. The concentration function can also be used as a criterion to assess posterior robustness, when considering sensitivity to changes in the likelihood and the prior.
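For discrete measures the concentration function is easy to compute, and a minimal sketch (our own, with arbitrary atoms) shows the idea: the concentration function of P relative to a baseline Q gives, for each level x, the smallest P-mass carried by sets of Q-mass x, which for atoms amounts to sorting by the likelihood ratio dP/dQ and accumulating.

```python
# Minimal sketch for two discrete measures on the same atoms
# (values are arbitrary, for illustration).
import numpy as np

p = np.array([0.10, 0.40, 0.25, 0.25])  # measure under comparison
q = np.array([0.25, 0.25, 0.25, 0.25])  # baseline measure
order = np.argsort(p / q)               # increasing likelihood ratio dP/dQ
cum_q = np.cumsum(q[order])             # x-axis: Q-mass
cum_p = np.cumsum(p[order])             # y-axis: concentration function phi(x)
for x_, y_ in zip(cum_q, cum_p):
    print(f"phi({x_:.2f}) = {y_:.2f}")
# With Q uniform, this reduces to the classical Lorenz curve of P.
```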
Sankhyā: The Indian Journal of Statistics, Series A, 1994
... U_V ∈ K_{q,α}(P_0), so that U_V is a neighbourhood of P_0 in the topology τ_Q. ... P(C) ≥ lim sup_n P_n(C). Hence for any ε > 0 there exists n_0(C) such that, for any n ≥ n_0(C), P_n(C) < P(C) + ε. Furthermore, let P_0(C) = x_0 − δ; then it ...
Patrizia Berti, Dipartimento di Matematica Pura ed Applicata "G. Vitali", Università di Modena. E-mail: berti.patrizia@unimo.it
Sandra Fortini, Istituto di Metodi Quantitativi, Università "L. Bocconi". E-mail: sandra.fortini@uni-bocconi.it
Lucia Ladelli, Dipartimento di Matematica, ...
In modern deep learning, there is a recent and growing literature on the interplay between large-width asymptotics for deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed weights, and classes of Gaussian stochastic processes (SPs). Such an interplay has proved critical in several contexts of practical interest, e.g. Bayesian inference under Gaussian SP priors, kernel regression for infinitely wide deep NNs trained via gradient descent, and information propagation within infinitely wide NNs. Motivated by empirical analyses showing the potential of replacing Gaussian distributions with Stable distributions for the NN’s weights, in this paper we investigate large-width asymptotics for (fully connected) feed-forward deep Stable NNs, i.e. deep NNs with Stable-distributed weights. First, we show that as the width goes to infinity jointly over the NN’s layers, a suitably rescaled deep Stable NN converges weakly to a Stable SP whose distribution is characterized r...
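To fix ideas, here is a sketch of the kind of layer-wise recursion involved, written in notation of our own choosing (symmetric α-stable weights with scale σ_w n^{−1/α}, biases with scale σ_b, activation φ); it is an assumption-laden paraphrase, not the paper's exact statement.

```latex
% Sketch in our own notation, not the paper's exact result.
f_i^{(l)}(x) \;=\; \sum_{j=1}^{n} w_{ij}^{(l)}\,
    \phi\!\bigl(f_j^{(l-1)}(x)\bigr) + b_i^{(l)},
\qquad
f_i^{(l)}(x) \;\xrightarrow[n\to\infty]{d}\;
    \mathrm{St}\!\bigl(\alpha,\; \sigma^{(l)}(x)\bigr),
% with the scale parameter propagating through an alpha-th-moment recursion:
\bigl(\sigma^{(l)}(x)\bigr)^{\alpha}
    \;=\; \sigma_b^{\alpha} + \sigma_w^{\alpha}\,
      \mathbb{E}\bigl[\,\bigl|\phi\bigl(f^{(l-1)}(x)\bigr)\bigr|^{\alpha}\bigr].
```

The point of the recursion is that, as in the Gaussian case (where α = 2 recovers the usual variance recursion), the limiting law at each layer is determined by a single scale functional of the previous layer.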
The connection between infinite-width neural networks (NNs) and Gaussian processes (GPs) has been well known since the seminal work of Neal [1996]. While numerous theoretical refinements have been proposed in recent years, the connection between NNs and GPs relies on two critical distributional assumptions on the NN’s parameters: i) finite variance; ii) independent and identical distribution (iid). In this paper, we consider the problem of removing assumption i) in the context of deep feed-forward convolutional NNs. We show that the infinite-channel limit of a deep feed-forward convolutional NN, under suitable scaling, is a stochastic process with multivariate stable finite-dimensional distributions, and we give an explicit recursion over the layers for its parameters. Our contribution extends recent results of Favaro et al. [2020] to convolutional architectures, and it paves the way to exciting lines of research that rely on GP limits.
The interplay between infinite-width neural networks (NNs) and classes of Gaussian processes (GPs) has been well known since the seminal work of Neal (1996). While numerous theoretical refinements have been proposed in recent years, the interplay between NNs and GPs relies on two critical distributional assumptions on the NN’s parameters: A1) finite variance; A2) independent and identical distribution (iid). In this paper, we consider the problem of removing A1 in the general context of deep feed-forward convolutional NNs. In particular, we assume iid parameters distributed according to a stable distribution, and we show that the infinite-channel limit of a deep feed-forward convolutional NN, under suitable scaling, is a stochastic process with multivariate stable finite-dimensional distributions. Such a limiting distribution is then characterized through an explicit backward recursion for its parameters over the layers. Our contribution extends results of Favaro et al. (2020) to conv...