Latent Factor Analysis in Short Panels: Alain-Philippe Fortin, Patrick Gagliardini, Olivier Scaillet May 31, 2024
* 1 University of Geneva, 2 Swiss Finance Institute, 3 Università della Svizzera italiana. Acknowledgements: We are grateful to A. Onatski for
his very insightful discussion (Onatski (2023)) of our paper Fortin, Gagliardini, Scaillet (2023) at the 14th Annual SoFiE Conference in Cambridge,
which prompted us to exploit factor analysis for estimation in short panels. We thank D. Amengual, L. Barras, S. Bonhomme, M. Caner, F. Carlini,
I. Chaieb, F. Ghezzi, A. Horenstein, G. Imbens, S. Kim, F. Kleibergen, H. Langlois, T. Magnac, S. Ng, E. Ossola, Y. Potiron, F. Trojani, participants
at (EC)^2 2022, QFFE 2023, SoFiE 2023, NASM 2023, IPDC 2023, FinEML, 22ème Journée d'Économétrie, HKUST workshop, SFI research days, GIGS conference, and seminars at UNIMIB, UNIBE, UNIGE, Warwick, Bristol, QMUL, CUHK, and Luxembourg. The first and third authors also acknowledge financial support from the Swiss National Science Foundation (grants UN11140 and 100018_215573).
1 Introduction
Latent variable models have been used for a long time in econometrics (Aigner et al. (1984)).
Here, we study large cross-sectional latent factor models with small time dimension. Two common
methods for estimation of latent factor spaces are principal component analysis (PCA) and factor
analysis (FA), see Anderson (2003) Chapters 11 and 14. They cover multiple applications in
finance and economics as well as in social sciences in general. They are often used in exploratory
analysis of data. Omitted latent factors are also called interactive fixed effects in the panel literature
(Pesaran (2006), Bai (2009), Moon and Weidner (2015), Freyberger (2018)). In recent work,
Fortin, Gagliardini and Scaillet (FGS, 2023) show how we can use PCA to conduct inference on the
number of factors in such models without making Gaussian assumptions. Their methodology relies
on sphericity of the idiosyncratic variances since this restriction is both necessary and sufficient
for consistency of latent factor estimates with small T (Theorem 4 of Bai (2003)). FGS provides a
discussion of the (in)consistency of the PCA estimator with fixed T from the vantage point of the
well-known incidental parameter problem of the panel data literature (Neyman and Scott (1948); see Lancaster (2000) for a review). In PCA, sphericity allows identifying the number k of factors from the first k eigenvalue spacings being larger than zero, and the subsequent ones being zero. On
the contrary, the FA strategy does not exploit eigenvalue spacings and does not require sphericity.
However, inference with small T up to now mostly relies on (often restrictive) assumptions such
as Gaussian variables (with a notable exception by Anderson and Amemiya (1988)) and error
homoskedasticity across sample units. Those are untenable assumptions in our application with
stock returns. The strong assumption of sphericity might also fail to hold in some samples. If that happens, our Monte Carlo experiments with non-Gaussian errors show that the eigenvalue spacing test of FGS exhibits size distortions of over 80 percentage points for a nominal size of 5%.1
1. That massive over-rejection translates into an average estimated number of factors often above 10 instead of 2 (the true value) in our simulation design when T = 24 (a sample size close to the T = 20 in our empirics). We get similar Monte Carlo results for a constrained likelihood ratio test when sphericity does not hold (see Section 4.3).
Our
Monte Carlo experiments also reveal that the classical chi-square test of the FA theory obtained under cross-sectionally homoskedastic Gaussian errors (Anderson (1963)) suffers from massive over-rejection, by around 80 percentage points for a nominal size of 5%, when T = 24.
A central and practical issue in applied work with latent factors is to determine the number of
factors. For models with unobservable (latent) factors only, Connor and Korajczyk (1993) are the
first to develop a test for the number of factors for large balanced panels of individual stock returns
in time-invariant models under covariance stationarity and homoskedasticity. Unobservable factors
are estimated by the method of asymptotic principal components developed by Connor and Korajczyk (1986) (see also Stock and Watson (2002)). For heteroskedastic settings, the recent literature
on large balanced panels with static factors has extended the toolkit available to researchers. A
first strand of that literature focuses on consistent estimation procedures for the number of factors.
Bai and Ng (2002) introduce a penalized least-squares strategy to estimate the number of factors, assumed to be at least one. Ando and Bai (2015) extend that approach when explanatory variables are present in
the linear specification (see Bai (2009) for homogeneous regression coefficients). Onatski (2010)
looks at the behavior of differences in adjacent eigenvalues to determine the number of factors
when n and T are both large and comparable. Ahn and Horenstein (2013) opt for a similar strategy based on eigenvalue ratios. Caner and Han (2014) propose an estimator with a group bridge penalization to determine the number of unobservable factors. Based on the framework of Gagliardini, Ossola and Scaillet (2016), Gagliardini, Ossola and Scaillet (2019) build a simple diagnostic
criterion for approximate factor structure in large panel datasets. Given observable factors, the
criterion checks whether the errors are weakly cross-sectionally correlated, or share one or more
unobservable common factors (interactive effects), and selects their number; see Gagliardini, Ossola and Scaillet (2020) for a survey of estimation of large dimensional conditional factor models
in finance. A second strand of that literature develops inference procedures for hypotheses on the
number of latent factors. Onatski (2009) deploys a characterization of the largest eigenvalues of a
Wishart-distributed covariance matrix with large dimensions in terms of the Tracy-Widom Law. To
get a Wishart distribution, Onatski (2009) assumes either Gaussian errors, or T much larger than n.
Kapetanios (2010) uses subsampling to estimate the limit distribution of the adjacent eigenvalues.
This paper puts forward methodological and empirical contributions that complement the above
literature. (i) On the methodological side, we extend the inferential tools of FA to non-Gaussian
and non-i.i.d. settings. First, we characterize the asymptotic distribution of FA estimators obtained
under a pseudo maximum likelihood approach where the time-series dimension is held fixed while
the cross-sectional dimension diverges. Hence, the asymptotic analysis targets short panels, and
allows for cross-sectionally heteroskedastic and weakly dependent errors. Cochrane (2005, p. 226) argues in favour of the development of appropriate large-n small-T tools for evaluating asset pricing models, a problem only partially addressed in finance. In a short panel setting, Zaffaroni (2019) considers inference for latent factors in conditional linear asset pricing models under sphericity based on PCA, including estimation of the number of factors.2 The small T setting mitigates concerns for panel unbalancedness (outside the straightforward missing-at-random mechanism) and
corresponds to a locally time-invariant factor structure accommodating globally time-dependent
features of general forms. It is also appealing to macroeconomic data observed quarterly. For the
sake of space, we put part of the theory, namely inference for FA estimates, in the Online Appendix
(OA). We refer to Bai and Li (2016) for inference when n and T are both large (see Bai and Li
(2012) for the cross-sectional independent case). Second, we build on our new theoretical results
for FA to develop testing procedures for the number of latent factors in a short panel which rely on
neither sphericity nor Gaussianity nor cross-sectional independence, thereby extending tests based
on eigenvalues, as in Onatski (2009), to small T , and as in FGS, to non-spherical errors, thanks
to an FA device. Here, we deliver a feasible asymptotic distributional theory even if FA residuals are not consistent estimates of the true errors under fixed T.
2. Raponi, Robotti and Zaffaroni (2020) develop tests of beta-pricing models and a two-pass methodology to estimate the ex-post risk premia (Shanken (1992)) associated with observable factors (see Kleibergen and Zhan (2023) for robust-identification inference based on a continuous updating generalized method of moments). Kim and Skoulakis (2018) deal with the error-in-variables problem of the two-pass methodology with small T by regression calibration under sphericity and a block-dependence structure.
We further derive the Asymptotically
Uniformly Most Powerful Invariant (AUMPI) property of the FA likelihood ratio (LR) test statistic
in the non-Gaussian case under inequality restrictions on the DGP parameters, and cover inference
with weak factors. The AUMPI property is rare and sought-after in testing procedures (see Engle
(1984) for a discussion and Romano, Shaikh and Wolf (2010) for a survey of optimality approaches
in testing problems), and often holds only under restrictive assumptions such as Gaussianity. We
show that the AUMPI property can hold even if the asymptotic distribution of the LR statistic is
driven by a weighted average of independent chi-square variates in our context instead of a single
chi-square variate. We achieve that by providing novel sufficient conditions so that the ratio of the
density under local alternative hypotheses and the null hypothesis satisfies the Monotone Likeli-
hood Ratio (MLR) property. Hence, the FA theory gathered in the below consists in a body of new
results including their proofs, and differs completely from PCA theory. (ii) On the empirical side,
we apply our FA methodology to panels of monthly U.S. stock returns with large cross-sectional
and small time-series dimensions, and investigate how the number of driving factors changes over
time and particular periods. Furthermore, month after month, we provide a novel separation3 of
the risk coming from the systematic part and the risk coming from the idiosyncratic part of returns
in short subperiods of bear vs. bull market based on the selected number of factors. We observe
an uptrend in the estimated paths of total and idiosyncratic volatilities (see also Campbell et al.
(2023)) while the systematic risk explains a large part of the cross-sectional total variance in bear
markets but is not driven by a single latent factor. We also investigate whether standard observed
factors span the estimated latent factors using rank tests suited to our fixed T setting. Observed factors struggle to span the latent factors, with a discrepancy between the dimensions of the two factor spaces that decreases over time.
The outline of the paper is as follows. In Section 2, we consider a linear latent factor model
and introduce test statistics on the number of latent factors based on FA. Section 3 presents a
feasible asymptotic distributional theory for inference in short panels under a block-dependence
3. Such a decomposition with PCA estimates is invalid without sphericity because of the inconsistency of F̂.
structure to allow for weak dependence in the cross-section. Section 4 discusses three special cases,
i.e., Gaussian errors (yielding the classical chi-square test of FA theory in the cross-sectionally
homoskedastic case), settings where the asymptotic distribution under Gaussian errors still holds
for the test statistics, and spherical errors. Section 5 is dedicated to local asymptotic power and
AUMPI tests. We provide our empirical application in Section 6 and our concluding remarks in
Section 7. Appendices A and B gather the regularity assumptions and proofs of the main theoretical
results. Appendix C gives a Monte Carlo assessment of size and power and selection procedure
for the number of factors for the LR test. We place all omitted proofs and additional analyses in
Appendices D-F in Online Appendix (OA). Appendix G collects the maximum value of k as a
function of T . Besides, we gather all explicit formulas not listed in the core text but useful for
coding in an online “Supplementary Materials for Coding” (SMC) attached to the replication files.
We also put there other numerical checks and additional Monte Carlo results to assess the impact of non-sphericity on the eigenvalue spacing test of FGS and the constrained LR test, as well as of using the classical chi-square test, on the size and the selection procedure for the number of factors.
yi = µ + F βi + εi,  i = 1, ..., n,   (1)
where yi = (yi,1, ..., yi,T)′ and εi = (εi,1, ..., εi,T)′ are T-dimensional vectors of observed data and unobserved error terms for individual i. The k-dimensional vectors βi = (βi,1, ..., βi,k)′ are latent individual effects, while µ and F are a T × 1 vector and a T × k matrix of unknown parameters. The number of latent factors k is an unknown integer smaller than T. In matrix notation, model (1) reads Y = µ1n′ + Fβ′ + ε, where Y and ε are T × n matrices, β is the n × k matrix with rows βi′, and 1n is an n-dimensional vector of ones.
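For concreteness, here is a minimal simulation sketch of model (1) in Python (our illustration, not part of the paper's replication files); the dimensions, error distributions, and scale choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, k = 20, 2000, 2                        # short panel: fixed small T, large n

mu = rng.normal(size=(T, 1))                 # T x 1 intercept vector
F = rng.normal(size=(T, k))                  # T x k latent factor values (fixed parameters)
beta = rng.normal(size=(n, k))               # n x k loadings (incidental parameters)
beta -= beta.mean(axis=0)                    # recentered loadings, as in model (1)

# Cross-sectionally heteroskedastic, non-Gaussian errors: diagonal V_eps with
# time-varying entries, unit-specific scales sigma_ii, Student-t innovations
V_eps_diag = rng.uniform(0.5, 1.5, size=T)
sigma_ii = rng.uniform(0.5, 2.0, size=n)
w = rng.standard_t(df=8, size=(T, n)) / np.sqrt(8 / 6)   # standardized t(8) errors
eps = np.sqrt(V_eps_diag)[:, None] * w * np.sqrt(sigma_ii)[None, :]

Y = mu @ np.ones((1, n)) + F @ beta.T + eps  # T x n panel: Y = mu 1_n' + F beta' + eps
```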
Matrix Vε is the limit cross-sectional average of the (possibly heterogeneous) unconditional variance-covariance matrices of the errors. The diagonality condition in Assumption 1 is standard in FA (in the more restrictive formulation involving i.i.d. data).
In our empirics with a large cross-sectional panel of returns for n assets over a short time span
with T periods, vectors yi and εi stack the monthly returns and the idiosyncratic errors of stock i.
Any row vector ft′ := (ft,1, ..., ft,k) of matrix F yields the latent factor values in a given month
t, and vector βi collects the factor loadings of stock i. In this finance application, we assume the
No-Arbitrage (NA) principle to hold, so that the entries µt in the intercept vector in Equation (1)
account for the (possibly time-varying) risk-free rate and (possibly non-zero) cross-sectional mean
of stock betas.4 Thus, the linear FA model (1) yields yi,t = µt + ft′βi + εi,t, which is the standard
formulation in asset pricing. We cover the Capital Asset Pricing Model (CAPM) when the single
latent factor is the excess return of the market portfolio. Assumption 1 allows for serial dependence
in idiosyncratic errors in the form of martingale difference sequences, like individual GARCH and
Stochastic Volatility (SV) processes, as well as weak cross-sectional dependence (see Assumption
2 below). It also accommodates common time-varying components in idiosyncratic volatilities
by allowing different entries along the diagonal of Vε ; see Renault, Van Der Heijden and Werker
(2023) for arbitrage pricing in such settings.5
4. Under NA, the intercept term in the asset return model yi = µi + F̃β̃i + εi is µi = rf + 1T ν′β̃i, where rf is the T-dimensional vector whose entries collect the (possibly time-varying) risk-free rates, ν = (ν1, ..., νk)′ is a k-dimensional vector of parameters, and 1T is a T-dimensional vector of ones (see e.g. Gagliardini, Ossola and Scaillet (2016)). We can absorb the term 1T ν′β̃i into the systematic part to get yi = rf + Fβ̃i + εi with F = F̃ + 1T ν′. It holds irrespective of the latent factors being tradable or not. If the factors are tradable, we further have ν = 0 from the NA restriction. Akin to the standard formulation of FA, we recenter the latent effects by subtracting their mean µ̃β̃ = (1/n) Σ_{i=1}^{n} β̃i, to get model (1) with βi = β̃i − µ̃β̃ and µ = rf + Fµ̃β̃.
5. When there is a common random component in idiosyncratic volatilities, we have Vε = plim_{n→∞} (1/n) εε′ by a suitable version of the Law of Large Numbers (LLN) conditional on the sigma-field generated by this common component. With fixed T, we treat the sample realizations of the common component in idiosyncratic volatilities as unknown time fixed effects (the diagonal elements of matrix Vε), which yields time-heterogeneous distributions for the errors. This is how the unconditional expectation in Assumption 1 has to be understood.
This paper focuses mainly on testing hypotheses on the number of latent factors k when T
is fixed and n → ∞. The fixed T perspective makes FA especially well-suited for applications
with short panels. Indeed, we work conditionally on the realizations of the latent factors F and
treat their values as parameters to estimate. In comparison with the standard small n and large
T framework in traditional asset pricing (e.g. Shanken (1992) with observable factors), here factors and loadings are interchanged in the sense that the βi and F play the roles of the “factors" and the “factor loadings" in FA. We depart from classical FA since the βi are not considered as random effects with a Gaussian distribution but rather as fixed effects, namely incidental parameters.6 Moreover, in Assumption 1, we neither assume Gaussianity nor impose sphericity of the
covariance matrix of the error terms. Besides, we accommodate weak cross-sectional dependence
and ARCH effects in idiosyncratic errors (see Section 3). Hence, the FA estimators defined below
correspond to maximizers of a Gaussian pseudo likelihood. By-products of our analysis are the
feasible asymptotic distributions of FA estimators of F and Vε in more general settings than in the
available literature (e.g. Anderson and Amemiya (1988)), which we present in Appendix E.
The test statistic we consider for conducting inference on the number of latent factors k is a function of the elements of the symmetric matrix
Ŝ = V̂ε^{−1/2} MF̂,V̂ε (V̂y − V̂ε) MF̂,V̂ε′ V̂ε^{−1/2},   (2)
where V̂y = (1/n) ỸỸ′ is the sample (cross-sectional) variance matrix (the n columns of Ỹ are the yi − ȳ, and ȳ = (1/n) Σ_{i=1}^{n} yi is the vector of cross-sectional means), MF,V := IT − F(F′V^{−1}F)^{−1}F′V^{−1} is the Generalized Least Squares (GLS) projection matrix orthogonal to F for variance V, and F̂ and V̂ε are the FA estimators computed under the assumption that there are k latent factors. In the following, we use the same notation for the matrix-to-vector diag operator and the vector-to-matrix diag operator. Hence, diag(A) for a matrix A denotes the vector in which we stack the diagonal elements of matrix A, and diag(a) for a vector a denotes a diagonal matrix with the elements of a on the diagonal.
6. Chamberlain (1992) studies semiparametrically efficient estimation in panel models with fixed effects and short T using moment restrictions from instrumental variables. Our approach does not rely on the availability of valid instruments.
From Anderson (2003) Chapter 14, the FA estimators F̂, V̂ε maximize a Gaussian pseudo likelihood (Appendix E.1) and meet the first order conditions:7
V̂y V̂ε^{−1} F̂ = F̂ (Ik + F̂′V̂ε^{−1}F̂), with F̂′V̂ε^{−1}F̂ diagonal, and diag(V̂ε) = diag(V̂y − F̂F̂′).
The number of degrees of freedom is df = (1/2)((T − k)² − T − k).8 It is required that df ≥ 0 for estimation, and we need df > 0 to test the null hypothesis of k latent factors (see Proposition 2 (a) below).
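As a quick illustration of this counting rule (a sketch of ours; Appendix G tabulates the same information), the degrees of freedom and the largest testable k follow directly:

```python
def df_fa(T: int, k: int) -> int:
    """Degrees of freedom df = ((T - k)^2 - T - k) / 2 for testing H0(k)."""
    return ((T - k) ** 2 - T - k) // 2

def k_max(T: int) -> int:
    """Largest k with df > 0, i.e., the maximal testable number of factors."""
    return max(k for k in range(T) if df_fa(T, k) > 0)

print(df_fa(20, 2))   # 151
print(k_max(20))      # 14, consistent with the empirical application where T = 20
```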
Statistic Ŝ in Equation (2) checks if the difference between the sample variance-covariance V̂y
and diagonal matrix V̂ε is a symmetric matrix of reduced rank k, with range spanned by the range
of F̂ . The probability limit of Ŝ is nil under the null hypothesis of k latent factors. We get further
insights from the next result.
Proposition 1 Under Assumption 1, (a) the eigenvalues of matrix Ŝ are: γ̂j, for j = k + 1, ..., T, and 0, with multiplicity k, where the 1 + γ̂j, for j = k + 1, ..., T, are the T − k smallest eigenvalues of V̂y V̂ε^{−1}; (b) the squared Frobenius norm is ∥Ŝ∥² = Σ_{j=k+1}^{T} γ̂j²; (c) diag(Ŝ) = 0; and (d) we get
Ŝ = V̂ε^{−1/2} (1/n) ε̂ε̂′ V̂ε^{−1/2} − V̂ε^{−1/2} MF̂,V̂ε V̂ε^{1/2},   (3)
where ε̂ = MF̂,V̂ε Ỹ is the matrix of GLS residuals.
From Proposition 1 (d), we can interpret matrix Ŝ in terms of scaled cross-sectional averages of squares and cross-products of GLS residuals. In (3), we subtract V̂ε^{−1/2} MF̂,V̂ε V̂ε^{1/2} and not the identity because the residuals are orthogonal to F̂ by construction. From Proposition 1 (c), the diagonal elements of matrix Ŝ vanish. Those elements are not informative for inference on the number of factors, and can be ignored when constructing the test statistics. This finding is natural because we expect that only the out-of-diagonal elements of (1/n) ε̂ε̂′, i.e., the cross-sectional averages of cross-products of residuals for two different dates, are useful to check for omitted factors.9
We now introduce the classical FA likelihood ratio (LR) statistic to test the null hypothesis H0(k) of k latent factors:
LR(k) := −n Σ_{j=k+1}^{T} log(1 + γ̂j).   (4)
The LR statistic in (4) only uses the information contained in the eigenvalues of matrix Ŝ. Moreover, LR(k) = (n/2)∥Ŝ∥² + op(1).10 This follows from a second-order expansion of the log function and the use of Σ_{j=k+1}^{T} γ̂j = 0, from Proposition 1 (a) and (c) (see also Anderson (2003)), and √n γ̂j = Op(1), for j = k + 1, ..., T, from Propositions 1 and 2. Next, we establish feasibility of the asymptotic distribution of the LR statistic with n → ∞ and T fixed under a block-dependence structure.
9. The test in Connor and Korajczyk (1993) is built on cross-sectional averages of squared residuals, akin to the diagonal terms of Ŝ, but obtained by PCA instead of FA. However, their test statistic involves the difference of such cross-sectional averages for two consecutive dates, and relies on error sphericity.
10. Alternatively, we can use the squared norm statistic T(k) = n∥Ŝ∥². In what follows, we focus on the LR test statistic since both deliver similar results in Monte Carlo experiments and in our empirics.
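As an illustration of how (4) can be computed in practice, the sketch below (ours) uses scikit-learn's FactorAnalysis as an off-the-shelf Gaussian (pseudo) maximum-likelihood FA fit, whose normalization may differ in details from the paper's estimator, and reads the γ̂j off the eigenvalues of V̂y V̂ε^{−1} as in Proposition 1 (a).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def lr_statistic(Y: np.ndarray, k: int):
    """LR(k) of equation (4) for a T x n panel Y, under a k-factor null."""
    T, n = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)        # subtract cross-sectional mean
    Vy = (Yc @ Yc.T) / n                          # sample variance matrix V_y-hat

    fa = FactorAnalysis(n_components=k).fit(Yc.T) # Gaussian (pseudo) ML factor analysis
    V_eps_inv = np.diag(1.0 / fa.noise_variance_) # inverse of the diagonal V_eps-hat

    # The T - k smallest eigenvalues of Vy Veps^{-1} are 1 + gamma_j (Proposition 1 (a))
    evals = np.sort(np.linalg.eigvals(Vy @ V_eps_inv).real)
    gammas = evals[: T - k] - 1.0
    return -n * np.log1p(gammas).sum(), gammas
```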
Ṽβ = (1/n) β′β, and Ṽε = E[(1/n) εε′]. This normalization of the factor values is sample dependent, i.e., F = F(n). We skip the index n for the purpose of easing notation. Then, under our assumptions, we have Vy := plim_{n→∞} V̂y = FF′ + Vε, with F′Vε^{−1}F = diag(γ1, ..., γk).11 In particular, we have Vy Vε^{−1} Fj = (1 + γj) Fj, i.e., the columns of F are the eigenvectors of matrix Vy Vε^{−1} associated with the eigenvalues 1 + γj, j = 1, ..., k.12
We use a block-dependence structure to allow for weak cross-sectional dependence in errors.
Assumption 2 (a) The errors are such that ε = Vε^{1/2} W Σ^{1/2}, where W = [w1 : · · · : wn] is a T × n random matrix of standardized error terms wi,t that are independent across i and uncorrelated across t, and Σ = (σi,j) is a positive-definite symmetric n × n matrix such that lim_{n→∞} (1/n) Σ_{i=1}^{n} σii = 1. (b) Matrix Σ is block diagonal with Jn blocks of size bm,n = Bm,n n, for m = 1, ..., Jn, where Jn → ∞ as n → ∞, and Im denotes the set of indices in block m. (c) There exist constants δ ∈ [0, 1] and C > 0 such that max_{i∈Im} Σ_{j∈Im} |σi,j| ≤ C bm,n^{δ}. (d) The block sizes bm,n and block number Jn are such that n^{2δ} Σ_{m=1}^{Jn} Bm,n^{2(1+δ)} = o(1).
The block-dependence structure in Assumption 2 is satisfied, for instance, when there are unobserved industry-specific factors independent among industries and over time, as in Ang, Liu, and Schwarz (2020). In empirical applications, blocks in Σ can match industrial sectors (Fan, Furger, and Xiu (2016), Gagliardini, Ossola, and Scaillet (2016)). As already remarked, the diagonal elements of Vε are the sample realizations of the common component driving the variance of the error terms at times t = 1, ..., T; see e.g. Barigozzi and Hallin (2016), Renault, Van Der Heijden and Werker (2023) for theory and empirical evidence pointing to variance factors. A sphericity assumption cannot accommodate such a common time-varying component.
11. The standardizations β̄ = 0 and Ṽβ = Ik of the factor loadings wash out the incidental parameter problem (Neyman and Scott (1948); see Lancaster (2000) for a review) since the individual loadings do not appear in Vy. This explains why we are able to get consistent estimators F̂ and V̂ε for large n and fixed T.
12. The remaining eigenvalue is equal to 1, with multiplicity T − k. We have Fj = √γj Vε^{1/2} Uj, where the Uj are the orthonormal eigenvectors of Vε^{−1/2} Vy Vε^{−1/2} for the k largest eigenvalues 1 + γj.
Assumption 2 (a) is coherent with Assumption 1. Indeed, Ṽε = Vε^{1/2} (1/n) Σ_{i,j=1}^{n} σi,j E[wi wj′] Vε^{1/2} = (1/n) Σ_{i=1}^{n} σii Vε is diagonal. Hence, Ṽε is a scalar multiple of Vε, and converges to Vε under the normalization lim_{n→∞} (1/n) Σ_{i=1}^{n} σii = 1. That normalization is without loss of generality by rescaling of the parameters. Assumption 2 (c) builds on Bickel and Levina (2008), and δ < 1 holds under sparsity, vanishing correlations or mixing dependence within blocks. With blocks of equal size, Assumption 2 (d) holds for Jn = n^{ᾱ} and ᾱ > 2δ/(2δ + 1). Having δ < 1 helps relax this condition on block granularity; however, it is not strictly necessary because we allow the value δ = 1.
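For simulation purposes, Assumption 2 can be mimicked with, e.g., equicorrelated blocks; the following sketch (ours, with hypothetical block sizes and correlation) draws errors as ε = Vε^{1/2} W Σ^{1/2}.

```python
import numpy as np
from scipy.linalg import block_diag, sqrtm

rng = np.random.default_rng(1)
T, n, b, rho = 20, 2000, 25, 0.3        # n/b blocks of size b, within-block correlation rho

# One equicorrelated block and its matrix square root, replicated on the diagonal
one_block = (1 - rho) * np.eye(b) + rho * np.ones((b, b))
Sigma_half = block_diag(*([np.real(sqrtm(one_block))] * (n // b)))

V_eps_diag = rng.uniform(0.5, 1.5, size=T)      # diagonal V_eps (time fixed effects)
W = rng.standard_normal((T, n))                 # standardized errors, independent across i
eps = np.sqrt(V_eps_diag)[:, None] * (W @ Sigma_half)   # eps = V^{1/2} W Sigma^{1/2}
```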
In the proof of Proposition 2 below, we establish an asymptotic expansion for the LR test statistic under the null hypothesis of k latent factors, by deriving an asymptotic expansion for n∥Ŝ∥². For a (T − k) × (T − k) symmetric matrix Z = (zi,j) and p = (1/2)(T − k)(T − k + 1), let us define the p-dimensional vector vech(Z) = ((1/√2) z11, ..., (1/√2) zT−k,T−k, {zi,j}_{i<j})′, where the pairs of indices (i, j) with i < j are ranked as (1, 2), (1, 3), ..., (1, T − k), (2, 3), ..., (T − k − 1, T − k).13 Moreover, let us define the p × T matrix X = [vech(G′E1,1G) : · · · : vech(G′ET,TG)], where Et,t denotes the T × T matrix with entry 1 in position (t, t) and 0 elsewhere, and G is a T × (T − k) matrix such that F′Vε^{−1}G = 0 and G′Vε^{−1}G = IT−k. Matrix G is unique up to post-multiplication by an orthogonal matrix, and the columns of X span the linear space {vech(G′DG) : D diagonal}.
Then, the asymptotic expansion of n∥Ŝ∥² under H0(k) is:
(n/2)∥Ŝ∥² = vech(Zn^∗)′ MX vech(Zn^∗) + op(1),   (5)
where Zn^∗ := G′Vε^{−1} Zn Vε^{−1} G, Zn := √n ((1/n) εε′ − Ṽε), and matrix MX := Ip − X(X′X)^{−1}X′ is idempotent of rank p − T = df.14 The full-rank condition for matrix X corresponds to the local identification condition in Assumption A.5 (see Lemma 6), analogously as in linear regression.
13. This definition of the half-vectorization operator for symmetric matrices differs from the usual one by the ordering of the elements and the rescaling of the diagonal elements. It is more convenient for our purposes (see the proof of Lemma 10). For instance, it holds that (1/2)∥A∥² = vech(A)′vech(A), for a symmetric matrix A.
14. We can extend results like (5) to test statistics that are generic functions of the eigenvalues of matrix Ŝ by using Weyl's inequalities (see e.g. Bernstein (2009)), and develop test statistics along the lines of FGS.
The expansion in (5) does not depend on the diagonal elements of Zn, since vech(G′Vε^{−1} diag(Zn) Vε^{−1} G) is spanned by the columns of X, and thus is annihilated by the projection matrix MX. Such an irrelevance of the diagonal elements of Zn is an implication of Proposition 1 (c).
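The operator vech(·), the matrix X, and the projector MX translate directly into code; in this sketch of ours, G is any input matrix satisfying F′Vε^{−1}G = 0 and G′Vε^{−1}G = I, and the assertion checks the identity of footnote 13.

```python
import numpy as np

def vech(Z: np.ndarray) -> np.ndarray:
    """Half-vectorization: diagonal entries first, scaled by 1/sqrt(2),
    then the upper off-diagonal entries ranked row by row."""
    d = Z.shape[0]
    off = [Z[i, j] for i in range(d) for j in range(i + 1, d)]
    return np.concatenate([np.diag(Z) / np.sqrt(2.0), np.array(off)])

def M_X(G: np.ndarray) -> np.ndarray:
    """X = [vech(G' E_11 G) : ... : vech(G' E_TT G)] and M_X = I_p - X (X'X)^{-1} X'."""
    T = G.shape[0]
    # G' E_tt G equals the outer product of the t-th row of G with itself
    X = np.column_stack([vech(np.outer(G[t], G[t])) for t in range(T)])
    p = X.shape[0]
    return np.eye(p) - X @ np.linalg.solve(X.T @ X, X.T)

# Footnote 13 sanity check: (1/2) ||A||_F^2 = vech(A)' vech(A) for symmetric A
A = np.random.default_rng(2).standard_normal((5, 5)); A = A + A.T
assert np.isclose(0.5 * np.linalg.norm(A) ** 2, vech(A) @ vech(A))
```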
We now establish the distributional convergence Zn^∗ ⇒ Z^∗ as n → ∞ and T is fixed, where Z^∗ is a Gaussian symmetric matrix variate. By the block structure in Assumption 2 (b), we can write Zn^∗ as a sum of independent zero-mean terms: Zn^∗ = (1/√n) Σ_{m=1}^{Jn} zm,n^∗, where the variables in the triangular array zm,n^∗ = Σ_{i∈Im} G′Vε^{−1} (εi εi′ − E[εi εi′]) Vε^{−1} G are independent across m and such that E[zm,n^∗] = 0. In Appendix B, we invoke the CLT for independent heterogeneous variables applied to vech(Zn^∗) = (1/√n) Σ_{m=1}^{Jn} vech(zm,n^∗) and use Assumptions 2 (c) and (d) to check the Liapunov condition. We get Zn^∗ ⇒ Z^∗, where vech(Z^∗) ∼ N(0, ΩZ^∗) and ΩZ^∗ = lim_{n→∞} (1/n) Σ_{m=1}^{Jn} V[vech(zm,n^∗)]. Then, the asymptotic expansion in (5) yields the asymptotic distribution of the LR test statistic defined in (4) under the null hypothesis, given in Proposition 2 (a) below.
In Proposition 2 (b) hereafter, we use Ĝ = V̂ε^{1/2} Q̂, where Q̂ is the T × (T − k) matrix of standardized eigenvectors of Ŝ corresponding to its T − k non-zero eigenvalues ranked in decreasing order (see Proposition 1 (a)),15 and X̂ the matrix obtained by replacing G with Ĝ in X.
Proposition 2 Let Assumptions 1-2 and A.1-A.7 hold. As n → ∞ and T is fixed, under the null hypothesis H0(k) of k latent factors, (a) LR(k) ⇒ Σ_{j=1}^{df} µj χj²(1), where the χj²(1) are independent chi-square variables with one degree of freedom, and the µj are the df non-zero eigenvalues of matrix MX ΩZ^∗ MX. (b) µ̂j →_p µj, where the µ̂j, j = 1, ..., df, are the non-zero eigenvalues of matrix MX̂ Ω̂Z^∗ MX̂, and Ω̂Z^∗ = (1/n) Σ_{m=1}^{Jn} vech(ẑm,n^∗) vech(ẑm,n^∗)′ with ẑm,n^∗ = Σ_{i∈Im} Ĝ′V̂ε^{−1} ε̂i ε̂i′ V̂ε^{−1} Ĝ and ε̂i = MF̂,V̂ε(yi − ȳ). Under the alternative hypothesis H1(k) of more than k latent factors, (c) LR(k) ≥ Cn, w.p.a. 1, for a constant C > 0, and µ̂j = Op(n Σ_{m=1}^{Jn} Bm,n²) = op(n).
Proposition 2 (a) shows that we depart from classical FA theory since we have convergence to a weighted sum of chi-square variates instead of a single chi-square variate (see the discussion in the next section).
15. Q̂ is the T × (T − k) matrix of standardized eigenvectors of V̂ε^{−1/2} V̂y V̂ε^{−1/2} corresponding to its T − k smallest eigenvalues ranked in decreasing order.
In Proposition 2 (b), matrix Ĝ is consistent for G up to a rotation. The eigenvalues µ̂j are unaffected by such a rotation because eigenvalues are invariant under pre- and post-multiplication by an orthogonal matrix and its transpose.16 With fixed T, the GLS residuals ε̂i are asymptotically close to MF,Vε εi and not to the true errors εi. However, this does not impede the consistency of the eigenvalues µ̂j, since G′Vε^{−1} MF,Vε = G′Vε^{−1}. When we apply the CLT, centering is done implicitly since MX vech(G′Vε^{−1} εi εi′ Vε^{−1} G) = MX vech(G′Vε^{−1} (εi εi′ − E[εi εi′]) Vε^{−1} G). Indeed, vech(G′Vε^{−1} E[εi εi′] Vε^{−1} G) is spanned by the columns of X since E[εi εi′] = σii Vε is diagonal. We can consistently estimate the critical values of the statistics by simulating a large number of draws from Σ_{j=1}^{df} µ̂j χj²(1). Hence, even if FA residuals are not consistent estimates of the true errors in short panels, we are still able to supply a feasible asymptotic distributional theory for our empirical applications under a block-dependence structure. Proposition 2 (c) gives test consistency against global alternatives.
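A sketch (ours) of this simulation step: given the estimated weights µ̂j of Proposition 2 (b), we tabulate the null distribution of LR(k) and obtain critical values or p-values.

```python
import numpy as np

def lr_null_draws(mu_hat: np.ndarray, n_draws: int = 100_000, seed: int = 0):
    """Draws from sum_j mu_hat_j * chi2_j(1) with independent chi-square(1) variates."""
    rng = np.random.default_rng(seed)
    return (rng.standard_normal((n_draws, mu_hat.size)) ** 2) @ mu_hat

def lr_pvalue(lr_obs: float, mu_hat: np.ndarray) -> float:
    """Simulated p-value of the observed LR(k) under the weighted chi-square null."""
    return float(np.mean(lr_null_draws(mu_hat) >= lr_obs))

# Example with hypothetical weights: 5% critical value
crit = np.quantile(lr_null_draws(np.linspace(0.5, 2.0, 10)), 0.95)
```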
distribution with df degrees of freedom in the cross-sectionally homoskedastic case, i.e., σii = 1 for all assets i. We cannot expect that this distributional result applies to the Gaussian framework in full generality, since, even in such a case, our setting corresponds to a pseudo model (because the σii may be heterogeneous across i, and the βi are treated as fixed effects, namely incidental parameters, instead of Gaussian random effects). Under the normality assumption for the error terms, we have εi^∗ := G′Vε^{−1} εi ∼ N(0, σii IT−k), independently across i. Thus, by the Liapunov CLT, the distributional limit of (1/√q) Zn^∗ = √(n/q) ((1/n) ε^∗(ε^∗)′ − (1/n) E[ε^∗(ε^∗)′]) is in the Gaussian Orthogonal Ensemble (GOE) for dimension T − k (see e.g. Tao (2012)), i.e., (1/√q) vech(Z^∗) ∼ N(0, Ip), where q := lim_{n→∞} (1/n) Σ_{i=1}^{n} σii². Then, from (5) we get LR(k) ⇒ qχ²(df), i.e., convergence to a scaled chi-square variate qχ²(df). In the cross-sectionally homoskedastic case, we have q = 1, yielding the classical χ²(df)
result. On the contrary, cross-sectional heterogeneity in the unconditional idiosyncratic variances
yields q > 1 and a deviation from classical FA theory even in the Gaussian case. In the Gaussian
case, unobserved heterogeneity across asset idiosyncratic variances leads to an oversized LR test
if we use critical values from the chi-square table without proper scaling. In our Monte Carlo experiments with non-Gaussian errors, we get a size distortion of around 80 percentage points when T = 24 for a nominal size of 5%, and thus an average estimated number of factors above 3 instead of 2.
In this subsection, we investigate sufficient conditions for the convergence of the LR statistic to a scaled chi-square variate in special cases beyond Gaussianity of errors. For this purpose, let us recall that the asymptotic expansion of LR(k) in (5) only involves the out-of-diagonal elements of Zn. Under independent Gaussian errors (Section 4.1), by the Liapunov CLT, we have Zn ⇒ Z, where the Zt,s ∼ N(0, qVε,tt Vε,ss), for t > s, are mutually independent, and the Vε,tt are the diagonal elements of matrix Vε. We deduce that any setting featuring the same joint asymptotic distribution for the out-of-diagonal elements of random matrix Zn leads to an
asymptotic distribution for the LR statistic similar to the Gaussian case.
Proposition 3 Let Assumptions 1-2, A.1-A.6 hold with (a) lim_{n→∞} (1/n) Σ_{i=1}^{n} E[εi,t εi,s εi,r εi,p] = qVε,tt Vε,ss, when t = r > s = p, for a constant q > 0, and = 0 in all other cases with t > s and r > p, and (b) let κ = lim_{n→∞} (1/n) Σ_{m=1}^{Jn} Σ_{i≠j∈Im} σij² as in Assumption A.3 (b). Then, LR(k) ⇒ q̄χ²(df) under H0(k) for q̄ := q + κ.
Conditions (a) and (b) in Proposition 3 generalize the correctness of the scaled chi-square test
beyond Gaussianity and error independence across time and assets. Under Assumption 2, Condition (a) is satisfied if the standardized error terms wi,t are conditionally homoskedastic martingale difference sequences. However, Condition (a) excludes empirically relevant cases such as ARCH processes for the wi,t because, in that case, E[εi,t² εi,s²]/(Vε,tt Vε,ss) depends on the lag t − s. Hence, serial correlation in squared idiosyncratic errors is responsible for the deviation of the LR test from the scaled
chi-square asymptotic distribution. This setting is covered by the general results in Proposition
2. Anderson and Amemiya (1988) establish the asymptotic distribution of FA estimates assuming
that the error terms are i.i.d. across sample units and deploy an assumption analogous to Condition (a) above in their Corollary 2. The i.i.d. assumption in our case implies σii = 1 for all i, which results in a cross-sectionally homoskedastic setting.17 That setting is unrealistic in our application, as it would imply that the idiosyncratic variance is the same for all assets. Our results show
that establishing the asymptotic distribution of the test statistics, especially the AUMPI property of the LR test (see Section 5), in a general setting with non-Gaussian errors, heterogeneous idiosyncratic variances and ARCH effects, is challenging, but still possible.
17. If the σii are treated as i.i.d. random effects independent of the errors, and we exclude cross-sectional correlation of errors to simplify, we recover the i.i.d. condition of the data. However, the random σii yield a stochastic common factor across time that breaks the condition in Corollary 2 of Anderson and Amemiya (1988).
4.3 Spherical errors
If errors are spherical, i.e., matrix Vε = σ̄²IT is a multiple of the identity with unknown parameter σ̄² > 0, then the asymptotic distribution of LR(k) corresponds to a special case of Proposition 2 (a). If sphericity is imposed in the estimation procedure, i.e., V̂ε becomes V̂ε,c = σ̂²IT, the constrained FA estimator boils down to the Principal Component Analysis (PCA) estimator; see Anderson and Rubin (1956) Section 7.3. Let F̂c denote the FA estimator under the sphericity constraint. Then, F̂c is the matrix of eigenvectors of matrix V̂y standardized such that F̂c′F̂c = diag(δ̂1 − σ̂², ..., δ̂k − σ̂²), and σ̂² = (1/(T − k)) Σ_{j=k+1}^{T} δ̂j, where δ̂j denotes the jth largest eigenvalue of matrix V̂y. The matrix Ŝ under the sphericity constraint becomes Ŝc = (1/σ̂²) MF̂c (V̂y − σ̂²IT) MF̂c = (1/σ̂²) (1/n) ε̂c ε̂c′ − MF̂c, where MF̂c = IT − F̂c(F̂c′F̂c)^{−1}F̂c′ and ε̂c = MF̂c Ỹ is the matrix of OLS residuals. Then, the constrained LR statistic becomes LRc(k) := −n Σ_{j=k+1}^{T} log(1 + (δ̂j − σ̂²)/σ̂²) and reduces to the LR statistic invoked by Onatski (2023) in his discussion of FGS.18 Under spherical errors,
Onatski (2023) shows LRc(k) ⇒ (1/2)(Tr[(Z^∗)²] − (1/(T − k))[Tr(Z^∗)]²) = vech(Z^∗)′ Mx vech(Z^∗), where Mx = Ip − x(x′x)^{−1}x′ with x = vech(IT−k), Z^∗ = (1/σ̄⁴) G′ZG, Z is the distributional limit of Zn, and G is a T × (T − k) matrix such that F′G = 0 and G′G = σ̄²IT−k. Hence the asymptotic distribution of the constrained LR statistic under sphericity is Σ_{j=1}^{p−1} µj χj²(1), where the χj²(1) are independent and the µj are the non-zero eigenvalues of matrix Mx ΩZ^∗ Mx, with ΩZ^∗ = V[vech(Z^∗)]. Under Gaussian errors, it simplifies to LRc(k) ⇒ qχ²(p − 1) (see Section 4.1). In a PCA setting when sphericity fails to hold, a massive over-rejection by over 80 percentage points for a nominal size of 5% is again observed in our Monte Carlo experiments. This is true both for the constrained LR test and for the eigenvalue spacing test of FGS. We get an average estimated number of factors often above 10 instead of 2 when T = 24.
18. We have γ̂j = δ̂j/σ̂² − 1, and LRc(k) = (n/(2σ̂⁴)) Σ_{j=k+1}^{T} (δ̂j − σ̂²)² + op(1), i.e., the constrained LR statistic is asymptotically equivalent to the sum of squared deviations of the T − k smallest eigenvalues from their mean. Besides, by similar results, we have that the eigenvalue spacing statistic √n S(k), with S(k) := γ̂k+1 − γ̂T, corresponds to the statistic considered in FGS divided by σ̂², and its asymptotic distribution under sphericity coincides with that obtained by FGS.
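Under the sphericity constraint, everything reduces to an eigendecomposition of V̂y; a short sketch of ours:

```python
import numpy as np

def lr_constrained(Y: np.ndarray, k: int) -> float:
    """Constrained LR statistic LRc(k) from PCA, for a T x n panel Y."""
    T, n = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Vy = (Yc @ Yc.T) / n
    delta = np.sort(np.linalg.eigvalsh(Vy))[::-1]   # eigenvalues in decreasing order
    sigma2 = delta[k:].mean()                       # sigma^2-hat: mean of the T-k smallest
    return -n * np.sum(np.log1p((delta[k:] - sigma2) / sigma2))
```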
5 Local asymptotic power
In this section, we study the asymptotic power of the test statistics against local alternatives in
which we have k (strong) factors plus a weak factor. Specifically, under H1,loc(k), we have √n γk+1 → ck+1 as n → ∞, with ck+1 > 0. The (drifting) DGP is Y = µ1n′ + Fβ′ + Fk+1 βloc′ + ε, where βloc is the loading vector of the (k + 1)th factor, and the factor vector is normalized such that Fk+1 = √γk+1 ρk+1, with ρk+1′ Vε^{−1} ρk+1 = 1 and F′Vε^{−1}ρk+1 = 0. Thus, we can write ρk+1 = Gξk+1 for a (T − k)-dimensional vector ξk+1 with unit norm. The scalar ck+1 and the vector ξk+1 yield the (normalized) strength and the direction of the local alternative.
We derive an asymptotic expansion of n∥Ŝ∥² under H1,loc(k) using similar arguments as in the proof of Proposition 2 (a) (see the proof of Proposition 4 in Appendix B for the derivation):
(n/2)∥Ŝ∥² = vech(Zn,loc^∗)′ MX vech(Zn,loc^∗) + op(1),   (6)
where Zn,loc^∗ = Zn^∗ + ck+1 ξk+1 ξk+1′. From the CLT, we have Zn,loc^∗ ⇒ Zloc^∗, where Zloc^∗ = Z^∗ + ck+1 ξk+1 ξk+1′. Matrix variate Zloc^∗ is a non-central symmetric Gaussian matrix. The non-zero mean depends in general on both ck+1 and ξk+1, while the variances and covariances of the elements of Zloc^∗ are the same as those of Z^∗. Then, we deduce from (6) that the asymptotic distribution of the LR(k) statistic under the local alternative hypothesis is a weighted average of df mutually independent non-central chi-square variables, with non-centrality parameters depending on vech(∆) := MX vech(ck+1 ξk+1 ξk+1′).
Proposition 4 Let Assumptions 1-2, A.1-A.6 hold. Under the local alternative hypothesis H1,loc(k), we have, as n → ∞ and T is fixed, LR(k) ⇒ Σ_{j=1}^{df} µj χj²(1, λj²), where λj² = µj^{−1} [vj′ vech(∆)]², and the µj and vj are the non-zero eigenvalues and the associated standardized eigenvectors of matrix MX ΩZ^∗ MX.
The non-centrality term vech(∆) drives the asymptotic local power of the statistics. When this vector is null, the asymptotic local power is zero. Indeed, for some local alternatives the (k + 1)th weak factor can be absorbed in the diagonal variance matrix Vε of the error terms. More precisely, in Appendix E.3 ii), we show that Vy + (ck+1/√n) ρk+1 ρk+1′ = F^∗(F^∗)′ + Vε^∗ + (1/√n) G∆G′ + o(1/√n), for some T × k matrix F^∗ and diagonal matrix Vε^∗, which yields asymptotically a k-factor model when ∆ = 0. We have λ² := Σ_{j=1}^{df} µj λj² = (1/2)∥∆∥², i.e., half the squared Frobenius norm of the matrix measuring the local distance from the k-factor specification. It follows that the asymptotic local power of the LR statistic is non-null as long as λ² > 0, i.e., it has non-trivial asymptotic power against any proper local alternative. In our Monte Carlo experiments in Appendix C, we find that the LR statistic has size close to the nominal value, and power against global as well as local alternatives, with a time dimension as small as T = 6.
Under normality of errors, or more generally under the conditions of Proposition 3, using that matrix (1/√q̄) Z^∗ is in the GOE for dimension T − k, i.e., vech(Z^∗) ∼ N(0, q̄Ip), we have LR(k) ⇒ q̄χ²(df, λ²/q̄) from (6). The local power is a function solely of the squared Euclidean norm of the vector vech(∆) measuring the local distance from the k-factor specification, divided by q̄.
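In this Gaussian-like case, the asymptotic local power is available in closed form; a sketch of ours using scipy:

```python
from scipy.stats import chi2, ncx2

def local_power(df: int, lam2: float, q_bar: float = 1.0, alpha: float = 0.05) -> float:
    """P( q_bar * chi2(df, lam2/q_bar) > q_bar * chi2_{1-alpha}(df) );
    the scale q_bar cancels in the critical value."""
    crit = chi2.ppf(1 - alpha, df)
    return float(ncx2.sf(crit, df, lam2 / q_bar))

print(local_power(df=9, lam2=20.0))   # power increases with the non-centrality lam2
```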
In this subsection, we investigate the asymptotic local optimality of the LR statistic for testing hypotheses on the number of latent factors. In our framework with composite null and alternative hypotheses and a multi-dimensional parameter, we cannot expect in general to establish Uniformly Most Powerful (UMP) tests. Instead, we can establish an optimality property by restricting the class of tests to invariant tests (e.g. Lehmann and Romano (2005)). We focus on statistics with test functions φ written on the elements of matrix Ŝ. To eliminate the asymptotic redundancy in the elements of Ŝ, we actually consider the test class C = {φ : φ = φ(Ŵ)} with Ŵ := √n D̂′ vech(Ŝ^∗), where Ŝ^∗ = Ĝ′V̂ε^{−1/2} Ŝ V̂ε^{−1/2} Ĝ and D̂ is a p × df full-rank matrix such that MX̂ = D̂D̂′ and D̂′D̂ = Idf. We have that Ŝ^∗ = diag(γ̂k+1, ..., γ̂T), since Ŝ has the eigendecomposition V̂ε^{−1/2} Ĝ diag(γ̂k+1, ..., γ̂T) Ĝ′ V̂ε^{−1/2} from Proposition 1 (a). The diagonal matrix Ŝ^∗ contains the information in Ŝ beyond the orthogonality of its rows and columns to V̂ε^{−1/2} F̂. The vector Ŵ contains the information in √n vech(Ŝ^∗) beyond orthogonality to X̂.19
Matrices Ĝ and D̂ are both consistent up to post-multiplication by an orthogonal matrix. This point yields a group of orthogonal transformations under which we require the test statistics to be invariant.20 In Appendix E.6, we show that the maximal invariant under this group is provided by Ŵ′Ŵ = n vech(Ŝ^∗)′ MX̂ vech(Ŝ^∗). Since √n vech(Ŝ^∗) belongs to the range of matrix MX̂, we have Ŵ′Ŵ = (n/2)∥Ŝ^∗∥² = (n/2)∥Ŝ∥². Therefore, the invariant tests are functions of the squared norm of Ŝ, which is asymptotically equivalent to the LR statistic (up to the factor 1/2).
In the Gaussian case, or more generally under the conditions of Proposition 3, the LR statistic follows asymptotically a scaled non-central chi-square distribution with df degrees of freedom and non-centrality parameter λ²/q̄ = Σ_{j=1}^{df} λj², as shown in the previous subsection. Thus, we can simplify the null and alternative hypotheses of our testing problem asymptotically and locally to a one-sided test with null hypothesis H0(k) : λ² = 0 vs. alternative hypothesis H1,loc(k) : λ² > 0. The scaling constant q̄ > 0 plays no role in the power analysis. It means that the LR test is an AUMPI test (Lehmann and Romano (2005) Chapters 3 and 13). Indeed, the density g(z; df, λ²) of the χ²(df, λ²) distribution is Totally Positive of order 2 (TP2) in z and λ² (Eaton (1987), Example A.1, p. 468); see Miravete (2011) for a review of applications of TP2 in economics. A density which is TP2 in z and λ² has the Monotone Likelihood Ratio (MLR) property (Eaton (1987), p. 467). Since g(z; df, λ²)/g(z; df, 0) is an increasing function of z, it gives the AUMPI property.
In the general case with df > 1, when neither Gaussianity nor the conditions of Proposition 3 apply, we cannot use the same reasoning, since the density f(z; λ1, ..., λdf) of Σ_{j=1}^{df} µj χj²(1, λj²) is not, in general, TP2 in z and λ². Instead, our strategy to get a new AUMPI result uses a power series representation of the density of Σ_{j=1}^{df} µj χj²(1, λj²) in terms of central chi-square densities from Kotz, Johnson, and Boyd (1967). Under the sufficient condition (7) in Proposition 5, the density ratio f(z; λ1, ..., λdf)/f(z; 0, ..., 0) is monotone increasing in z.
19. From Proposition 1 (c), we have 0 = diag(Ŝ) = 2V̂ε^{−1} X̂′ vech(Ŝ^∗). Therefore, vech(Ŝ^∗) lies in the orthogonal complement of the range of X̂ in sample.
20. Here, we do not deal with invariance to data transformations but rather with invariance to the parameterization of Ĝ and D̂. However, if we consider tests based on the elements of vector Ŵ, this difference is immaterial.
Proposition 5 Let Assumptions 1-2, A.1-A.6 hold. (a) Let us assume that, for any DGP in the subset H̄1,loc(k) ⊂ H1,loc(k) of the local alternative hypothesis, we have for any integer m ≥ 3:
Σ_{j>l≥0, j+l=m} [(j − l) Γ(df/2)² / (Γ(df/2 + j) Γ(df/2 + l))] [cj(λ1, ..., λdf) cl(0, ..., 0) − cl(λ1, ..., λdf) cj(0, ..., 0)] ≥ 0,   (7)
where Γ(·) is the Gamma function, cj(λ1, ..., λdf) := E[Q(λ1, ..., λdf)^{j}]/j! for Q(λ1, ..., λdf) = (1/2) Σ_{j=1}^{df} (√νj Xj + √(1 − νj) λj)², νj = 1 − µ1/µj with the µj ranked in increasing order, and Xj ∼ N(0, 1) mutually independent. Then, the statistic LR(k) yields an AUMPI test against H̄1,loc(k). (b) Suppose that either λ1² + (1 − ν2)λ2² ≥ ν2 and (1 − ν2)λ2² ≥ (1/2)ν2, when df = 2, or
1{i = 0}λ1² + Σ_{j=2}^{df−1} ρj^{i} (1 − νj)λj² + (1 − νdf)λdf² ≥ (νdf/(i + 1)) (df − 2 − Σ_{j=2}^{df−1} ρj^{i+1}),   (8)
for all i ≥ 0, where ρj := νj/νdf, when df ≥ 3. Then, Inequalities (7) hold for any m ≥ 3.
Conditions (7) involve polynomial inequalities in the parameters λj of the alternative hypothesis and the parameters νj of the weights of the non-central chi-square distributions, j = 1, ..., df. It is challenging to establish an explicit characterization of the λj and νj equivalent to Inequalities (7), unless df = 1.21 By deploying a novel characterization of the cj(λ1, ..., λdf) in terms of a recurrence relation (Lemma 2), we establish explicit sufficient conditions in part (b) of Proposition 5. Inequalities (8) are linear in the λj², and define a non-empty convex domain in the (λ1², ..., λdf²) space that does not contain the origin λ1 = ... = λdf = 0 (unless the DGP is such that ν2 = ... = νdf, in which case the RHS of (8) is nil for all i and thus any λj² meet the inequalities).
21. Inequalities (7) with df = 1 are easily proved to hold. In such a case, we can use the asymptotic distribution of a scaled chi-square variable and its MLR property.
Proposition 5 (b)
implies that, for a given value of df, the MLR property holds if λj ≥ λ for all j, uniformly for νj ≤ ν̄, where λ > 0 is a constant that depends on ν̄ < 1. Vanishing values of the νj correspond to homogeneous weights µj, i.e., to the scaled non-central chi-square distribution with df degrees of freedom. Hence, the AUMPI property in Proposition 5 holds in neighborhoods of DGPs that match the conditions of Proposition 3 (e.g. Gaussian errors) for alternative hypotheses that are sufficiently separated from the null hypothesis. Besides, Proposition 5 shows that the Gaussian case is not the only design delivering an AUMPI test. Further, in the SMC, we establish a new analytical representation of the coefficients cj(λ1, ..., λdf) in terms of matrix product iterations. That analytical representation allows us to check numerically the validity of Inequalities (7) for given df, λj, νj, and m = 1, ..., M, for a large bound M (see Appendix F). In Appendix F, when Inequalities (8) are met, we always conclude to the MLR property in the numerical checks, as predicted by the theory of Proposition 5. There, we also provide numerical evidence that the domain of validity of the MLR property is relevant for our empirical application. The sufficient conditions (7) and (8) in Proposition 5, yielding the monotone property of density ratios, have potentially broad application outside the current setting to show AUMPI properties, since other test statistics share an asymptotic distribution characterized by a positive definite quadratic form in normal vectors.
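Because Inequalities (8) are linear in the λj², they are straightforward to screen numerically; the sketch below (ours, based on our reading of display (8), with the check truncated at a finite i_max) covers the df ≥ 3 case.

```python
import numpy as np

def check_inequalities_8(lam2: np.ndarray, nu: np.ndarray, i_max: int = 500) -> bool:
    """Screen (8) for i = 0, ..., i_max; lam2 and nu hold the lambda_j^2 and nu_j,
    ordered consistently with the mu_j ranked in increasing order (df >= 3)."""
    df = lam2.size
    rho = nu / nu[-1]                                   # rho_j = nu_j / nu_df
    for i in range(i_max + 1):
        lhs = ((lam2[0] if i == 0 else 0.0)
               + np.sum(rho[1:-1] ** i * (1 - nu[1:-1]) * lam2[1:-1])
               + (1 - nu[-1]) * lam2[-1])
        rhs = nu[-1] / (i + 1) * (df - 2 - np.sum(rho[1:-1] ** (i + 1)))
        if lhs < rhs:
            return False
    return True
```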
6 Empirical application
In this section, we test hypotheses about the number of latent factors driving stock returns in short subperiods of the Center for Research in Security Prices (CRSP) panel. Then, we decompose
the cross-sectional variance into systematic and idiosyncratic components. We also check whether
there is spanning between the estimated latent factors and standard observed factors.
6.1 Testing for the number of latent factors
We consider monthly returns of U.S. common stocks trading on the NYSE, AMEX or NASDAQ
between January 1963 and December 2021, and having a non-missing Standard Industrial Classification (SIC) code. We partition subperiods into bull and bear market phases according to the
classification methodology of Lunde and Timmermann (2004).22 We implement the tests using
a rolling window of T = 20 months, moving forward 12 months each time (adjacent windows
overlap by 8 months), thereby ensuring that we can test up to 14 latent factors in each subperiod.
The size of the cross-section n ranges from 1768 to 6142, and the median is 3680. We only con-
sider stocks with available returns over the whole subperiod, so that our panels are balanced. In
each subperiod, we sequentially test H0 (k) vs H1 (k), for k = 0, . . . , kmax , where kmax = 14 is
the largest nonnegative integer such that df > 0 (see Table 3 in OA). We compute the variance-covariance estimator Ω̂Z^∗ using a block structure implied by the partitioning of stocks by the first
two digits of their SIC code. The number of blocks ranges from 61 to 87 over the sample, and
the number of stocks per block ranges from 1 to 641. The median number of blocks is 76 and
the median number of stocks per block is 21. We display the p-values of the statistic LR(k) over
time for each subperiod in the upper panel of Figure 1, stopping at the smallest k such that H0 (k)
is not rejected at level αn = 10/nmax , where nmax is the largest cross-sectional sample size over
all subperiods, so that αn = 0.16% in our data. If no such k is found then p-values are displayed
up to kmax . The n-dependent size adjustment controls for the over-rejection problem induced by
sequential testing (see Section 6.2 below). Overall, the results point to a higher number of latent
factors during bear market phases compared to bull market phases and a decrease of the number
of factors over time.23
22. We fix their parameter values λ1 = λ2 = 0.2 for the classification based on the nominal S&P 500 index. Bear periods are close to NBER recessions.
23. We also investigate the stability of the factor structure by dividing each window of 20 months into two overlapping subperiods of 16 months (overlap of 12 months) and by estimating canonical correlations between the betas in each subperiod (see SMC). We find that the fraction of the number of latent factors which are common factors is 1 in 70% of the windows. The fraction is between 0.8 and 1 in 25% of the windows. It is between 0.5 and 0.8 in the remaining periods.
It remains true for the three-month recession periods 1987/09-1987/11 and
2020/01-2020/03, which represent only a fraction of their respective subperiods, although there are "bull" market periods in which we find a similar number of latent factors. In particular, our results based on a fixed T and large n approach contradict the view of a single-factor model during market downturns due to estimated correlations between equities approaching 1. This finding is consistent with the presence of risk factors, such as tail risk or liquidity risk, only showing up in distress periods. A rise in the estimated k often happens towards the end of recession periods. This is consistent with the methodology of Lunde and Timmermann (2004) being early in detecting bear periods (early warning system). The average estimated number of factors is around 7, close to the 4 to 6 factors found by PCA in Bai and Ng (2006) on large time spans of individual stocks.24
Building on the results in Pötscher (1983), we can obtain a consistent estimator of the number of latent factors in each subperiod by letting the asymptotic size α go to zero as n → ∞ in the sequential testing procedure. We let k̂ be defined as the smallest nonnegative integer k satisfying pval(k) > αn, where pval(k) is the p-value from testing H0(k), and αn is a sequence in [0, 1] with αn → 0. In practice, we take αn = 10/nmax.25 If no such k is found after sequentially testing H0(k), for k = 0, ..., kmax, at level αn, then we take k̂ = kmax + 1. The Monte Carlo results show that such a selection procedure works well. We use the estimate k̂ at each subperiod to decompose
the path of the cross-sectional variance of stock returns into its systematic and idiosyncratic parts: V̂y,tt = F̂t′F̂t + V̂ε,tt, where F̂ and V̂ε are the FA estimates obtained by extracting k̂ latent factors. The condition (FA1) ensures that the decomposition holds for any t.
24. With fixed T, the selection procedure of Zaffaroni (2019), being by construction more conservative than a (multiple) testing procedure (see the discussion on p. 508 of Gagliardini, Scaillet, and Ossola (2019)), yields a smaller number of factors. Imposing cross-sectional independence (resp., Gaussianity and cross-sectional independence) for the LR test gives most of the time an increase by 1 or 2 (resp., 1 or 3). We have an average increase of 3 under sphericity.
25. This choice satisfies the theoretical rule log αn/n → 0 given in Pötscher (1983).
Such a decomposition is invariant to the choice of normalization for the latent factors. If we look at time averages over a subperiod, we get the decomposition \bar{V̂}_y = \bar{F̂′F̂} + \bar{V̂}_ε, where the bar denotes averaging V̂y,tt = F̂t′F̂t + V̂ε,tt over t. In the lower panels of Figure 1, the blue dots correspond to the square roots of those quantities for the volatilities, while the ratios R̂² = \bar{F̂′F̂}/\bar{V̂}_y and the R̂² under a single-factor model in the last two panels give measures of goodness-of-fit.26 We can observe an uptrend in total
and idiosyncratic volatilities, while the systematic volatility appears to remain stable over time
even if the number of factors has overall decreased over time.27 As a result, R̂2 is lower on average
after the year 2000, indicating a noisier environment. During the 2007-2008 financial crisis, we can
observe a rise in systematic volatility, causing R̂2 to reach 59% during that period. In bear markets,
R̂2 is often higher. It means that over a bear subperiod, the systematic risk explains a large part of
the cross-sectional total variance even if it is not driven by a single factor, as reported in Section 6.1. The lowest panel in Figure 1 also signals that the R̂² under the constraint of a single-factor model can be far below the one given by the multifactor model. It also means that the idiosyncratic volatility is overestimated if we use a single latent factor only. The plots of the equal-weighted market
and firm volatilities used as measures of total and idiosyncratic volatility from a single observed
factor (CAPM) decomposition in Campbell et al. (2023) show similar patterns as our panels in
Figure 1.28 Section 4 of Campbell et al. (2023) discusses economic forces (firm fundamentals and
investor sentiments) driving the observed time-series variation in average idiosyncratic volatility.
26. We do not plot the whole paths date t by date t, but only averages, for readability. If we sum over time instead of averaging the estimated variances, we get a quantity similar to an integrated volatility (see e.g. Barndorff-Nielsen and Shephard (2002), Andersen, Bollerslev, Diebold, and Labys (2003), and references in Aït-Sahalia and Jacod (2014)), and R̂² is the ratio of such quantities.
27. Cross-sectional independence increases the estimated systematic risk on average by 0.6% and decreases the estimated idiosyncratic volatility on average by 0.5%, so that the estimated R² is inflated on average by 4% per month. We get the same magnitude under cross-sectional independence and Gaussianity. Under sphericity, PCA estimates give an increase of 1%, a decrease of 1.3%, and an increase of 12.8% for the same quantities.
28. As in Campbell et al. (2023), we have also performed the estimation on value-weighted returns, and we confirm that the results are qualitatively similar.
6.3 Spanning with observed factors
As discussed in Bai and Ng (2006), we get economic interpretation of latent factors with observed
factors when we have spanning between the latent factors and the observed factors to be used as
proxies in asset pricing (Shanken (1992)). When n and T are large, Bai and Ng (2006) exploit
the asymptotic normality of the empirical canonical correlations between the two sets of factors
to investigate spanning under a symmetric role of the two sets. When T is fixed, we suggest the
following strategy based on testing for the rank of a matrix. Let us consider k O ≥ k empirical
factors that are excess returns of portfolios,29 and let F̂ O denote the T × k O matrix of their values
with row t given by the transpose of fˆtO = n1 ni=1 (yi,t − rf,t )zi,t , where n1 zi,t is a k O × 1 vector of
P
time-varying portfolio weights (long or short positions) based on stocks characteristics. Let matrix
F O with rows ftO = lim n1 ni=1 E[(yi,t − rf,t )zi,t ] be the corresponding large-n population limit.
P
n→∞
The notation F̂ O makes clear that the sample average of weighted excess returns is an estimate of
the population values F O . We need to take this into account in the asymptotic analysis of the rank
test statistics when n → ∞. From the factor model under NA, yi,t = rf,t + ft′ β̃i + εi,t (see footnote
4), and assuming cross-sectional non-correlation of idiosyncratic errors and portfolio weights, we
get F O = F Φ′ , where Φ = lim n1 ni=1 E[zi,t ]β̃i′ is assumed independent of t, t = 1, ..., T . Hence,
P
n→∞
the range of F O is a subset of the range of F , namely the latent factors span the observed factors
(in the population limit sense) by construction. Moreover, Rank(F O ) ≤ k. We can test the null
hypothesis that F and F O span the same linear spaces, namely matrices F and F 0 have the same
range. Such a null hypothesis is equivalent to the rank condition: Rank(F O ) = k.
We build on the rank testing literature; see e.g. Cragg and Donald (1996), Robin and Smith
(RS, 2000), Kleibergen and Paap (KP, 2006), Al-Sadoon (2017).30 We use in particular the RS
29
If k O < k, empirical factors cannot span the latent space by construction. The condition k O ≥ k eases discussion
but is not needed for the rank tests.
30
Ahn, Horenstein and Wang (2018) use that technology in a fixed-n large-T setting, and find that ranks of beta
matrices estimated from either portfolios, or individual stocks, excess returns are often substantially smaller than
the (potentially large) number k O of observed factors. The explanation in large economies is that the portfolio beta
26
and KP statistics. For those tests, the null hypothesis is that a given matrix has a reduced rank
r against the alternative hypothesis that the rank is greater than r. Hence, to test for spanning
by the empirical factors, we consider the null hypothesis H0,sp (r) : Rank(F O ) = r against the
alternative hypothesis H1,sp (r) : Rank(F O ) > r, for any integer r < k.31 We use the asymptotic
expansion F̂ O = F̃ O + √1n ΨF O ,n , where F̃ O = F Φ′n with Φn = n1 ni=1 E[zi,t ]β̃i′ , and the rows
P
of matrix ΨF O ,n are given by ΨF,n,t = √1n ni=1 (ηi,t ft + εi,t zi,t ) with ηi,t := (zi,t − E[zi,t ])β̃i′ .
P
Under H0,sp (r), we assume that Φn has the same null space as Φ, in particular Φn has rank r, for n
large enough.32 We assume the CLT vec(ΨF O ,n ) ⇒ N (0, ΩΨ ). Further, we use the Singular Value
Decomposition (SVD) of matrix F̂ O = Û Ŝ V̂ ′ . Then, the RS and KP statistics are the quadratic
forms SRS = nvec(Ŝ22 )′ vec(Ŝ22 ) and SKP = nvec(Ŝ22 )′ Ω̂−1
S vec(Ŝ22 ), where Ŝ22 is the lower-
right (T − r) × (k O − r) block of matrix Ŝ. Here, Ω̂S = (V̂kO −r ⊗ ÛT −r )′ Ω̂Ψ (V̂kO −r ⊗ ÛT −r ), where
ÛT −r and V̂kO −r are the T ×(T −r) and k O ×(k O −r) matrices in the block forms Û = [Ûr : ÛT −r ]
and V̂ = [V̂r : V̂kO −r ]. In the SMC, we design a consistent estimator Ω̂Ψ of ΩΨ building on a block
structure for the characteristics akin to Assumption 2 and a stationarity condition. The definitions
of the test statistics SRS and SKP are equivalent to those in the original RS and KP papers. The
P −r)(kO −r) S 2
asymptotic distributions under H0,sp (r) are SRS ⇒ (T j=1 δj χj (1) and SKP ⇒ χ2 [(T −
r)(k O − r)], where the δjS are the eigenvalues of matrix ΩS = (VkO −r ⊗ UT −r )′ ΩΨ (VkO −r ⊗ UT −r ),
assumed non-singular.
We build the empirical matrix F̂ O with the time-varying portfolio weights of the Fama-French
five-factor model (Fama and French (2015)) plus the momentum factor (Carhart (1997)), i.e., k O =
6. In the two panels of Figure 2, we can observe that the rank tests point most of the time at a low
reduced rank r either 1 or 2, with only occasionally 3 or 4, for the matrix F̂ O . Observed factors
struggle spanning latent factors since their associated linear space is of a dimension smaller than
the one of the latent factor space. The discrepancy between the dimensions of the two factor
matrices coincide with Φ, and thus they cannot have a rank above the (potentially small) number k of latent factors.
31
Spanning holds if we can reject H0,sp (r) for any r < k.
32
Under Rank(F ) = k, we have Rank(F O ) = Rank(Φ). Hence, under H0,sp (r), matrix Φ has reduced rank r.
27
spaces has decreased over time. According to the KP statistic, the rank deficiency of F̂ O is often
less pronounced in bear markets indicating less redundancy between the observed factors.
7 Concluding remarks
In this paper, we develop a new theory of latent Factor Analysis in short panels beyond the Gaus-
sian and i.i.d. cases. We establish the AUMPI property of the LR statistic for testing hypotheses
on the number of latent factors. Our results for short subperiods of the CRSP panel of US stock
returns contradict the comprehension of a single factor during market downturns. In bear markets,
systematic risk driven by a latent multifactor structure explains a large part of the cross-sectional
variance, and is not spanned by traditional empirical factors with a discrepancy between the dimen-
sions of the two factor spaces decreasing over time. The estimated paths of total and idiosyncratic
volatilities month after month feature an uptrend through time.
References
Ahn, S., and Horenstein, A., 2013. Eigenvalue ratio test for the number of factors. Econometrica
81 (3), 1203-1227.
Ahn, S., Horenstein, A., and Wang, N., 2018. Beta matrix and common factors in stock returns.
Journal of Financial and Quantitative Analysis 53 (3), 1417-1440.
Aigner, D., Hsiao, C., Kapteyn A., and Wansbeek, T., 1984. Latent variable models in economet-
rics, in Handbook of Econometrics, Volume II, Z. Griliches and M.D. Intriligator Eds., 1321-1393.
Aït-Sahalia, Y., and Jacod, J., 2014. High-frequency financial econometrics. Princeton University
Press.
Al-Sadoon, M., 2017. A unifying theory of tests of rank. Journal of Econometrics 199 (1), 49-62.
Ang, A., Liu, J., and Schwarz, K., 2020. Using stocks or portfolios in tests of factor models.
Journal of Financial and Quantitative Analysis 55 (3), 709-750.
28
Andersen, T., Bollerslev, T., Diebold, F., and Labys, P., 2003. Modeling and forecasting realized
volatility. Econometrica 71 (2), 579-625.
Anderson, T. W., 2003. An introduction to multivariate statistical analysis. Wiley.
Anderson, T. W., 1963. Asymptotic theory for Principal Components Analysis. Annals of Mathe-
matical Statistics 34, 122-148.
Anderson, T. W., and Rubin, H., 1956. Statistical inference in factor analysis. Proceedings of the
Third Berkeley Symposium in Mathematical Statistics and Probability 5, 11-150.
Anderson, T. W. and Amemiya, Y., 1988. The asymptotic normal distribution of estimators in
factor analysis under general conditions. Annals of Statistics 16 (2), 759-771.
Ando, T., and Bai, J., 2015. Asset pricing with a general multifactor structure. Journal of Financial
Econometrics 13 (3), 556-604.
Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica 71 (1),
135-171.
Bai, J., 2009. Panel data models with interactive effects. Econometrica 77 (4), 1229-1279.
Bai, J., and Li, K., 2012. Statistical analysis of factor models of high dimension. Annals of
Statistics 40 (1), 436-465.
Bai, J., and Li, K., 2016. Maximum likelihood estimation and inference for approximate factor
models of high dimension. Review of Economics and Statistics 98 (2), 298-309.
Bai, J., and Ng, S., 2002. Determining the number of factors in approximate factor models. Econo-
metrica 70 (1), 191-221.
Bai, J., and Ng, S., 2006. Evaluating latent and observed factors in macroeconomics and finance.
Journal of Econometrics 131 (1), 507-537.
Barigozzi, M., and Hallin, M., 2016. Generalized dynamic factor models and volatilities: recover-
ing the market volatility shocks. Econometrics Journal 19 (1), 33-60.
Barndorff-Nielsen, O., and Shephard, N., 2002. Econometric analysis of realized volatility and its
use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B, 64
(2), 253-280.
29
Bell, E. T., 1934. Exponential polynomials. Annals of Mathematics 35 (2), 258-277.
Bernstein, D., 2009. Matrix mathematics: Theory, facts and formulas. Princeton University Press.
Bickel, P. J., and Levina, E., 2008. Covariance regularization by thresholding. Annals of Statistics
36 (6), 2577-2604.
Campbell, J., Lettau, M., Malkiel, B., and Xu, Y., 2023. Idiosyncratic equity risk two decades
later. Critical Finance Review, 12, 203-223.
Caner, M., and Han, X., 2014. Selecting the correct number of factors in approximate factor
models: the large panel case with group bridge estimator. Journal of Business and Economic
Statistics 32 (3), 359-374.
Carhart, M., 1997. On persistence in mutual fund performance. Journal of Finance. 52 (1), 57-82.
Chamberlain, G., 1992. Efficiency bounds for semi-parametric regression. Econometrica 60 (3),
567-596.
Cochrane, J., 2005. Asset pricing. Princeton University Press.
Connor, G., and Korajczyk, R., 1986. Performance measurement with the arbitrage pricing theory:
A new framework for analysis. Journal of Financial Economics 15 (3), 373-394.
Connor, G., and Korajczyk, R., 1993. A test for the number of factors in an approximate factor
model. Journal of Finance 48 (4), 1263-1291.
Cragg, J., and Donald, S., 1996. On the asymptotic properties of LDU-based tests of the rank of a
matrix. Journal of the American Statistical Association 91 (435), 1301-1309.
Eaton, M., 1987. Multivariate statistics. A vector space approach. Institute of Mathematical
Statistics Lecture notes-Monograph series, Vol. 53.
Engle, R., 1984. Wald, likelihood ratio, and Lagrange multiplier tests in econometrics, in Hand-
book of Econometrics, Volume II, Z. Griliches and M.D. Intriligator Eds., 775-826.
Fama, E., and French, K., 2015. A five-factor asset pricing model. Journal of Financial Economics
116 (1), 1-22.
Fan, J., Furger, A., and Xiu, D., 2016. Incorporating global industrial classification standard into
portfolio allocation: A simple factor-based large covariance matrix estimator with high-frequency
30
data. Journal of Business and Economic Statistics 34 (4), 489-503.
Fortin, A.-P., Gagliardini, P., and Scaillet, O., 2023. Eigenvalue tests for the number of latent
factors in short panels. Journal of Financial Econometrics, nbad024.
Freyberger, J., 2018. Non-parametric panel data models with interactive fixed effects. Review of
Economic Studies 85 (3), 1824-1851.
Gagliardini, P., Ossola, E., and Scaillet, O., 2016. Time-varying risk premium in large cross-
sectional equity datasets. Econometrica 84 (3), 985-1046.
Gagliardini, P., Ossola, E., and Scaillet, O., 2019. A diagnostic criterion for approximate factor
structure. Journal of Econometrics 21 (2), 503-521.
Gagliardini, P., Ossola, E., and Scaillet, O., 2020. Estimation of large dimensional conditional
factor models in finance, in Handbook of Econometrics, Volume 7A, S. Durlauf, L. Hansen, J.
Heckman, and R. Matzkin Eds., 219-282.
Jöreskog, K., 1970. A general method for analysis of covariance structures. Biometrika 57 (2),
239-251.
Kapetanios, G., 2010. A testing procedure for determining the number of factors in approximate
factor models with large datasets. Journal of Business and Economic Statistics 28 (3), 397-409.
Kim, S., and Skoulakis, G., 2018. Ex-post risk premia estimation and asset pricing tests using large
cross-sections: the regression-calibration approach. Journal of Econometrics 204 (2), 159-188.
Kleibergen, F., and Paap, R., 2006. Generalized reduced rank tests using the singular value de-
composition. Journal of Econometrics 133 (1), 97-126.
Kleibergen, F., and Zhan, Z., 2023. Identification-robust inference for risk premia in short panels.
Working paper, University of Amsterdam.
Kotz, S., Johnson, N., and Boyd, D., 1967. Series representations of distributions of quadratic
forms in Normal variables II. Non-central case. Annals of Mathematical Statistics 38 (3), 838-848.
Lancaster, T., 2000. The incidental parameter problem since 1948. Journal of Econometrics 95
(2), 391-413.
Lehmann, E., and Romano, D., 2005. Testing statistical hypotheses, Springer Texts in Statistics.
31
Lunde, A., and Timmermann, A., 2004. Duration dependence in stock prices: An analysis of bull
and bear markets. Journal of Business and Economic Statistics 22 (3), 253-273.
Magnus, J., and Neudecker, H., 2007. Matrix differential calculus, with applications in statistics
and econometrics. Wiley.
Miravete, E., 2011. Convolution and composition of totally positive random variables in eco-
nomics. Journal of Mathematical Economics 47 (4), 479-490.
Moon, H.R., and Weidner, M., 2015. Linear regression for panel with unknown number of factors
as interactive fixed effects. Econometrica 83 (4), 1543-1579.
Neyman, J., and Scott, E., 1948. Consistent estimation from partially consistent observations.
Econometrica 16(1), 1-32.
Onatski, A., 2009. Testing hypotheses about the number of factors in large factor models. Econo-
metrica 77 (5), 1447-1479.
Onatski, A., 2010. Determining the number of factors from empirical distribution of eigenvalues.
Review of Economics and Statistics 92 (4), 1004-1016.
Onatski, A. 2023. Comment on “Eigenvalue tests for the number of latent factors in short panels"
by A.-P. Fortin, P. Gagliardini and O. Scaillet, Journal of Financial Econometrics, nbad28.
Pesaran, M.H., 2006. Estimation and inference in large heterogeneous panels with a multifactor
error structure. Econometrica 74 (4), 967-1012.
Pötscher, B., 1983. Order estimation in ARMA-models by Lagrangian multiplier tests, Annals of
Statistics 11 (3), 872-885.
Raponi, V., Robotti, C., and Zaffaroni, P., 2020. Testing beta-pricing models using large cross-
sections. Review of Financial Studies 33 (6), 2796-2842.
Renault, E., Van Der Heijden, T., and Werker, B., 2023. Arbitrage pricing theory for idiosyncratic
variance factors. Journal of Financial Econometrics 21 (5), 1403-1442.
Robin, J.-M., and Smith, R., 2000. Tests of rank. Econometric Theory 16 (2), 151-175.
Romano, J., Shaikh, A., and Wolf, M., 2010. Hypothesis testing in econometrics. Annual Review
of Economics 2, 75-104.
32
Shanken, J. 1992. On the estimation of beta pricing models. Review of Financial Studies 5 (1),
1-33.
Stock, J., and Watson, M., 2002. Forecasting using principal components from a large number of
predictors. Journal of the American Statistical Association 97 (460), 1167-1179.
Tao, T. 2012. Topics in random matrix theory. Graduate Studies in Mathematics, Volume 132,
American Mathematical Society.
White, H., 1982. Maximum likelihood estimation of misspecified models. Econometrica 50 (1),
1-25.
Zaffaroni, P., 2019. Factor models for conditional asset pricing. Working Paper.
Appendix
A Regularity assumptions
In this appendix, we list and comment the additional assumptions used to derive the large sample
properties of the estimators and test statistics. We often denote by C > 0 a generic constant, and we
use δj (A) to denote the jth largest eigenvalue of a symmetric matrix A. Set Θ is a compact subset of
{θ = (vec(F )′ , diag(Vε )′ )′ ∈ Rr : Vε is diagonal and positive definite, F ′ Vε−1 F is diagonal, with
diagonal elements ranked in decreasing order} with r = (T +1)k, and function L0 (θ) = − 12 log |Σ(θ)|−
1
2
T r (Vy Σ(θ)−1 ) is the population FA criterion, where Σ(θ) = F F ′ + Vε and Vy = plim V̂y . Fur-
n→∞
ther, θ0 = (vec(F0 )′ , diag(Vε0 )′ )′ denotes the vector of true parameter values under H0 (k) and is
an interior point of set Θ.
Assumption A.1 The non-zero eigenvalues of Vy Vε−1 − IT are distinct, i.e., γ1 > ... > γk > 0.
Assumption A.2 The loadings are normalized such that β̄ = n1 ni=1 βi = 0 and Ṽβ := n1 ni=1 βi βi′
P P
Assumption A.6 (a) The T (T2+1) × T (T2+1) symmetric matrix D = lim Dn exists, where Dn =
n→∞
1
Pn 2 ′ ′
n i=1 σii V [vech(wi wi )]. (b) We have δT (T +1)/2 (V [vech(wi wi )]) ≥ c, for all i ∈ S̄, where
S̄ ⊂ {1, ..., n} with n1 ni=1 1i∈S̄ ≥ 1 − 21C̄ , for constants C̄, c̄ > 0, such that σii ≤ C̄. (c) We have
P
P n P
lim κn = κ for a constant κ ≥ 0, where κn := n1 Jm=1 σ 2
i̸=j∈Im ij .
n→∞
Assumption A.7 Under the alternative hypothesis H1 (k), (a) function L0 (θ) has a unique maxi-
mizer θ∗ = (vec(F ∗ )′ , diag(Vε∗ )′ )′ over Θ, and (b) we have Vy ̸= F ∗ (F ∗ )′ + Vε∗ .
Assumption A.1 removes the rotational indeterminacy (up to sign) that occurs when identify-
ing the columns of matrix F as eigenvectors of Vy Vε−1 . Assumptions A.2 and A.3 require uniform
bounds on factor loadings as well as on covariances and higher-order moments of the idiosyncratic
errors. Assumption A.4 implies global identification in the FA model (see Lemma 4). Assump-
tions A.2-A.4 yield consistency of FA estimators (see proof of Lemma 5). Assumption A.5 is the
local identification condition in the FA model (see Lemma 6). We use Assumption A.6 together
with Assumption A.3 to invoke a CLT based on a multivariate Lyapunov condition (see proof of
Lemma 1) to establish the asymptotic distribution of the LR statistic. To ease the verification of
the Lyapunov condition, we bound a fourth-order moment of squared errors, which explains why
we require finite eight-order moments in Assumption A.3. We could relax this condition at the
expense of a more sophisticated proof of Lemma 1. The mild Assumption A.6 (b) requires that
the smallest eigenvalue of V [vech(wi wi′ )] is bounded away from 0 for all assets i up to a small
fraction. In Assumption A.6 (c), in order to have κn bounded, we need either mixing dependence
in idiosyncratic errors within blocks, i.e., |σi,j | ≤ Cρ|i−j| for i, j ∈ Im and 0 ≤ ρ < 1, or van-
ishing correlations, i.e., |σi,j | ≤ Cb−s̄
m,n for all i ̸= j ∈ Im and a constant s̄ ≥ 1/2, with blocks
of equal size. In Assumption A.7, part (a) defines the pseudo-true parameter value (White (1982))
under the alternative hypothesis, and part (b) is used to establish the consistency of the LR test
34
under global alternative hypotheses (see proof of Proposition 2). Finally, Assumption A.8 is used
to apply a Lyapunov CLT (see proof of Lemma 7) when deriving the asymptotic normality of the
FA estimators.
the P̂j are the orthogonal projection matrices onto the eigenspaces for the T − k smallest eigen-
values. Then, Part (a) follows. Part (b) is a consequence of the squared Frobenius norm of a
symmetric matrix being equal to the sum of its squared eigenvalues. For part (c), let PF̂ ,V̂ε =
IT − MF̂ ,V̂ε and note that F̂ F̂ ′ = PF̂ ,V̂ε (V̂y − V̂ε ) + (V̂y − V̂ε )PF̂′ ,V̂ − PF̂ ,V̂ε (V̂y − V̂ε )PF̂′ ,V̂
ε ε
= V̂y − V̂ε − MF̂ ,V̂ε (V̂y − V̂ε )MF̂′ ,V̂ , where the first equality is because the three terms on the
ε
RHS are all equal to F̂ F̂ ′ by (FA2). The conclusion follows from (FA1) and V̂ε being diagonal. Fi-
nally, part (d) follows since n1 ε̂ε̂′ = MF̂ ,V̂ε V̂y MF̂′ ,V̂ , V̂ε MF̂′ ,V̂ = MF̂ ,V̂ε V̂ε , and MF̂ ,V̂ε is idempotent,
ε ε
−1/2 −1/2 −1/2 1/2
which implies V̂ε MF̂ ,V̂ε V̂ε MF̂′ ,V̂ V̂ε = V̂ε MF̂ ,V̂ε V̂ε .
ε
Let us now show the asymptotic expansion (5). Since ĜĜ′ V̂ε−1 = MF̂ ,V̂ε , we have Ŝ =
35
−1/2 −1/2
V̂ε ĜŜ ∗ Ĝ′ V̂ε , where Ŝ ∗ = Ĝ′ V̂ε−1 (V̂y − V̂ε )V̂ε−1 Ĝ. Therefore, from Proposition 1 (c) we
′
get 0 = diag(Ŝ) = V̂ε−1 diag(ĜŜ ∗ Ĝ′ ) = 2V̂ε−1 X̂ vech(Ŝ ∗ ), i.e., the condition diag(Ŝ) = 0 is
equivalent to vech(Ŝ ∗ ) being in the orthogonal complement of the range of X̂. It follows from the
local identification assumption A.5 that vech(Ŝ ∗ ) = MX̂ vech(Ŝ ∗ ).33 Next, we have
MX̂ vech(Ŝ ∗ ) = MX̂ vech(Ĝ′ V̂ε−1 (V̂y − V̂ε )V̂ε−1 Ĝ) = MX̂ vech(Ĝ′ V̂ε−1 (V̂y − Ṽε )V̂ε−1 Ĝ), (B.1)
because the kernel of MX̂ is {vech(Ĝ′ DĜ) : D diagonal}. Besides, we have the expansion
√
nvech(Ĝ′ V̂ε−1 (V̂y − Ṽε )V̂ε−1 Ĝ) = vech(Ẑn∗ ) + op (1), where Ẑn∗ = Ĝ′ V̂ε−1 Zn V̂ε−1 Ĝ. It is be-
′ 1 1 1 ′ ′ ′
√ 1 ′
cause V̂y − Ṽε = F F + n Ψy + op ( n ), with Ψy = n (εβF + F β ε ) + n n εε − Ṽε (see
√ √ √
Equation (E.2) and Lemma 5 in Appendix E.2), and Ĝ′ V̂ε−1 F = Ĝ′ V̂ε−1 MF̂ ,V̂ε F = Op ( √1n ) by the
root-n consistency of FA estimators (see Appendix E.5.1). Using ∥Ŝ∥2 = ∥Ŝ ∗ ∥2 , it follows that
n
∥Ŝ∥2 = nvech(Ŝ ∗ )′ vech(Ŝ ∗ ) = nvech(Ŝ ∗ )′ MX̂ vech(Ŝ ∗ ) = vech(Ẑn∗ )′ MX̂ vech(Ẑn∗ ) + op (1).
2
(B.2)
From MF̂ ,V̂ε = MF,Vε + op (1), we have ĜÔ = G + op (1) for some (possibly data-dependent) (T −
k)×(T −k) orthogonal matrix Ô. Since vech(Ẑn∗ )′ MX̂ vech(Ẑn∗ ) is invariant to post-multiplication
of Ĝ by an orthogonal matrix (see Proposition 9 in Appendix E.6), from (B.2) we get n2 ∥Ŝ∥ =
vech(Zn∗ )′ MX vech(Zn∗ ) + op (1), which is the asymptotic expansion (5).
Let us now establish the asymptotic normality of vech(Zn∗ ). For any integer m, we let Am
denote the unique m2 × m(m+1)
2
matrix satisfying vec(S) = Am vech(S) for any m × m symmetric
matrix S.34 Matrix Am satisfies A′m Am = 2I m(m+1) , Am A′m = Im2 + Km,m , and Km,m Am =
2
Am , where Km,m is the commutation matrix (see also Magnus, Neudecker (2007) Theorem 12
33
Assumption A.5 is equivalent to X having full column rank by Lemma 6 in Appendix E.4. Besides, from Propo-
sition 9 in Appendix E.6 and the fact that ĜÔ = G + op (1) for some rotation matrix Ô (see below), we have
R̂ X̂ = X + op (1) from some orthogonal matrix R̂ . Hence, X̂ is invertible with probability approaching 1, and MX̂
is well-defined.
34
√ √
The explicit form for Am is Am = 2(e1 ⊗ e1 ) : · · · : 2(em ⊗ em ) : {ei ⊗ ej + ej ⊗ ei }i<j , with ei being
the ith unit vector of dimension m.
36
−1/2 −1/2
in Chapter 2.8). Then, we have vech(Zn∗ ) = R′ vech(Zn ), where Zn = Vε Zn Vε ,R =
−1/2
1 ′
A (Q
2 T
⊗ Q)AT −k , and Q = Vε G. Matrix R satisfies R′ R = Ip . The next lemma establishes
the asymptotic normality of vech(Zn ).
−1/2
Lemma 1 (a) Under Assumptions 1-2, A.2, A.6 (a)-(b), we have Ωn vech(Zn ) ⇒ N (0, I T (T +1) )
P n P 2
as n → ∞ and T is fixed, where Ωn = Dn + κn I T (T +1) , and κn = n1 Jm=1 σ 2
i̸=j∈Im ij . If
2
additionally Assumption A.6 (c) holds, then vech(Zn ) ⇒ N (0, Ω), with Ω := D + κI T (T +1) .
2
Lemma 1 yields the asymptotic normality of vech(Zn∗ ), namely vech(Zn∗ ) ⇒ N (0, ΩZ ∗ ), with
ΩZ ∗ = R′ ΩR. Part (a) then follows from expansion (5) and the standard result on the distribution
of idempotent quadratic forms of Gaussian vectors.
∗ ′ −1
(ỹi ỹi′ ) V̂ε−1 Ĝ with ỹi = yi − ȳ, since ε̂i = MF̂ ,V̂ε ỹi and
P
(b) We have ẑm,n = i∈Im Ĝ V̂ε
in the proof of Lemma 1 using Assumption 2 (d). Additionally, by Assumption A.6, we have
37
Ωn = Ω + o(1). Thus, Ω̃n = Ω + op (1). Now, from the proof of part (a) we have ĜÔ = G + op (1)
for some (T − k) × (T − k) orthogonal matrix Ô. Then, by Proposition 9 (e) in Appendix E.6,
we have R̂MX̂ R̂ −1 = RMX + op (1), for a p dimensional orthogonal matrix R̂ ≡ R(Ô). We
conclude that R̂MX̂ Ω̃Z ∗ MX̂ R̂ −1 is a consistent estimator of MX ΩZ ∗ MX as n → ∞ and T is
fixed. Part (b) then follows from the continuity of eigenvalues for symmetric matrices, and their
invariance under pre- and post-multiplication by an orthogonal matrix and its transpose.
p p p
(c) Under H1 (k) and Assumption A.7 (a), we have F̂ → F ∗ and V̂ε → Vε∗ . Then, Ŝ → S ∗
with S ∗ = (Vε∗ )−1/2 MF ∗ ,Vε∗ (Vy − Vε∗ )MF′ ∗ ,Vε∗ (Vε∗ )−1/2 ̸= 0. Indeed, if S ∗ were the null matrix,
then we would have MF ∗ ,Vε∗ (Vy − Vε∗ )MF′ ∗ ,Vε∗ = 0, which implies Vy − Vε∗ = PF ∗ ,Vε∗ (Vy − Vε∗ ) +
(Vy − Vε∗ )PF′ ∗ ,Vε∗ − PF ∗ ,Vε∗ (Vy − Vε∗ )PF′ ∗ ,Vε∗ , with PF ∗ ,Vε∗ = IT − MF ∗ ,Vε∗ . From the probability
limits of Equation (FA2) for pseudo values, we have PF ∗ ,Vε∗ (Vy − Vε∗ ) = (Vy − Vε∗ )PF′ ∗ ,Vε∗ =
PF ∗ ,Vε∗ (Vy − Vε∗ )PF′ ∗ ,Vε∗ = F ∗ (F ∗ )′ (see proof of Proposition 1 (c)). Thus Vy = F ∗ (F ∗ )′ + Vε∗ ,
in contradiction with Assumption A.7 (b). Thus, n∥Ŝ∥2 ≥ Cn, w.p.a. 1, for a constant C > 0.
∗
) = vech(Ĝ′ V̂ε−1 ( i∈Im ỹi ỹi′ )V̂ε−1 Ĝ) and the conditions on Θ, we get
P
Moreover, using vech(ẑm,n
∗
)∥ ≤ C i∈Im ∥ỹi ∥2 . Then, from Assumptions A.2 and A.3, E[∥MX̂ Ω̂Z ∗ MX̂ ∥] ≤
P
∥vech(ẑm,n
C n1 Jm=1 bm,n = O(n Jm=1 ). Moreover, Jm=1
Pn 2 Pn 2
Pn 2
Bm,n Bm,n = o(1). Indeed, Assumption 2
δ
(d) implies Bm,n ≤ cn− δ+1 uniformly in m, for any c > 0 and n large enough, and hence
PJn δ PJ
2 − δ+1
m=1 Bm,n ≤ c, for any c > 0 and n large. Part (c) follows from the
n
m=1 Bm,n = cn
38
βloc ] = Ik+1 and Lemma 5 (a) in Appendix D, we get V̂y − Ṽε = F F ′ + √1n Ψy,loc + Ry , where
√
′ 1 ′ ′ ′ 1 ′
Ψy,loc = ck+1 ρk+1 ρk+1 + √ (εβF + F β ε ) + n εε − Ṽε , (B.3)
n n
′ ′
and Ry = n1 (εβloc Fk+1 + Fk+1 βloc ε′ ) + [Fk+1 Fk+1
′
− n−1/2 ck+1 ρk+1 ρ′k+1 ] + op ( √1n ). Using Fk+1 =
√ √ √
γk+1 ρk+1 and nγk+1 = ck+1 + o(1), we get Ry = op (1/ n). Subsituting the expansion for
V̂y − Ṽε into (B.1), and repeating the arguments leading to expansion (5) yields expansion (6).
∗ ′
From Lemma 1, we get vech(Zn,loc ) ⇒ N (ck+1 vech(ξk+1 ξk+1 ), ΩZ ∗ ) as n → ∞. The result then
follows from the standard result on the distribution of idempotent quadratic forms of non-central
Gaussian vectors.
Proof of Proposition 5: The proof of part (a) is in three steps. (i) The testing problem asymp-
totically simplifies to the null hypothesis H0 : λ1 = ... = λdf = 0 vs. the alternative hypothesis
H1 : ∃λj > 0, j = 1, ..., df . Let us define λ0 = (0, ..., 0)′ for the null hypothesis and pick a
given vector λ1 = (λ1 , ..., λdf )′ in the alternative hypothesis, and consider the test of λ0 versus
λ1 (simple hypothesis). By Neyman-Pearson Lemma, the most powerful test for λ0 versus λ1
rejects the null hypothesis when f (z; λ1 , ..., λdf )/f (z; 0, ..., 0) is large, i.e., the test function is
n o
f (z;λ1 ,...,λdf )
ϕ(z) = 1 f (z;0,...,0) ≥ C for a constant C > 0 set to ensure the correct asymptotic size.
f (z;λ1 ,...,λdf )
(ii) Let us now show that the density ratio f (z;0,...,0)
is an increasing function of z. To show
Pdf
this, we can rely on an expansion of the density of j=1 µj χ2 (1, λ2j ) in terms of central chi-square
densities (Kotz, Johnson, and Boyd (1967) Equations (144) and (151)):
∞
X
f (z; λ1 , ..., λdf ) = c̄k (λ1 , ..., λdf )g(z; df + 2k, 0), (B.4)
k=0
Pdf 2
where the coefficients c̄k (λ1 , ..., λdf ) = Ae− j=1 λj /2 E[Q(λ1 , ..., λdf )k ]/k! involve moments of
df 2
X 1/2
the quadratic form Q(λ1 , ..., λdf ) = (1/2) νj Xj + λj (1 − νj )1/2 of the mutually inde-
j=1
Qdf −1/2 1
pendent variables Xj ∼ N (0, 1), A = j=1 µj , and νj = 1 − µj
minℓ µℓ . Without loss
of generality for checking the monotonicity, we have rescaled the density so that minj µj =
P∞
f (z;λ1 ,...,λdf ) k=0 c̄k (λ1 ,...,λdf )g(z;df +2k,0)
1. Then, from (B.4), we get the ratio: f (z;0,...,0)
= P ∞ . By dividing
k=0 c̄k (0,...,0)g(z;df +2k,0)
39
both the numerator and the denominator by the central chi-square density g(z; df, 0), we get
Pdf P∞ Pdf
f (z;λ1 ,...,λdf ) λ2j /2 k=0 ck (λ1 ,...,λdf )ψk (z) λ2j /2
f (z;0,...,0)
= e− j=1 P∞ =: e− j=1 Ψ(z; λ1 , ..., λdf ), where ψk (z) :=
k=0 ck (0,...,0)ψk (z)
Γ( df ) k
g(z; df + 2k, 0)/g(z; df, 0) = 2
2k Γ( df +k)
z is the ratio of central chi-square distributions with
2
df + 2k and df degrees of freedom, and ck (λ1 , ..., λdf ) = E[Q(λ1 , ..., λdf )k ]/k!. We use the
Pdf
λ2j /2
short notation ck (λ) := ck (λ1 , ..., λdf ) and ck (0) := ck (0, ..., 0). The factor e− j=1 does
not impact on the monotonicity of the density ratio. We take the derivative of Ψ(z; λ1 , ..., λdf )
( ∞
P∞
k=1 ck (λ)ψk (z))(1+ k=1 ck (0)ψk (z))
′
P
with respect to argument z and get ∂z Ψ(z; λ1 , ..., λdf ) = P∞ 2 −
( k=0 ck (0)ψk (z))
(1+ ∞
P∞
k=1 ck (λ)ψk (z))( k=1 ck (0)ψk (z))
′
P
P∞ 2 . The sign is given by the difference of the numerators, which
P∞ ( k=0 ck (0)ψk (z)) ′
is k=1 [ck (λ) − ck (0)]ψk (z) + k,l=1,k̸=l ck (λ)cl (0)[ψk′ (z)ψl (z) − ψk (z)ψl′ (z)] = ∞
P∞ P
k=1 [ck (λ) −
ck (0)]ψk′ (z) + ∞ ′ ′ ′
P
k,l=1,k>l [ck (λ)cl (0) − cl (λ)ck (0)][ψk (z)ψl (z) − ψk (z)ψl (z)]. We use ψk (z) =
Γ( d2 )k Γ( d )2
2 Γ( d2 +k)
k z k−1 and ψk′ (z)ψl (z) − ψk (z)ψl′ (z) = (k − l) 2k+l Γ( d +k)Γ(
2
d
+l)
z k+l−1 for k > l and z ≥ 0.
2 2
The difference of the numerators in the derivative of the density ratio becomes:
d d
1 Γ( 2 ) 1 2Γ( 2 )
P∞ 1 Γ( d2 )
[c
2 Γ( d +1) 1
(λ) − c 1 (0)] + [c
22 Γ( d +2) 2
(λ) − c 2 (0)]z + m=3 2m m Γ( d +m) [cm (λ) − cm (0)]
2 2 2
(k−l)Γ( d2 )2
P∞ 1
m−1 m−1
P
+ k>l≥1,k+l=m Γ( d +k)Γ( d +l) [ck (λ)cl (0) − cl (λ)ck (0)] z = m=1 2m κm z , with κm :=
2 2
d
P Γ( 2 )2
k>l≥0,k+l=m (k−l) Γ( d +k)Γ( d +l) [ck (λ)cl (0)−cl (λ)ck (0)]. A direct calculation shows that κ1 , κ2 ≥
2 2
0. Hence, a sufficient condition for monotonicity of the density ratio is κm ≥ 0, for all m ≥ 3, i.e.,
Inequalities (7). Thus, the test rejects for large values of the argument, i.e., ϕ(z) = 1{z ≥ C̄},
where the constant C̄ is determined by fixing the asymptotic size under the null hypothesis.
(iii) Since the test function ϕ does not depend on λ1 , it is AUMPI in the class of hypothesis
tests based on the LR statistic (or the squared norm statistic). It yields part (a).
Let us now turn to the proof of part (b). From the definition of the κm coefficients written as
(j−l)Γ( df )2 c (λ)
κm = j>l≥0,j+l=m Γ( df +j)Γ(2df +l) cj (0)cl (0)[ cjj (0) − ccll(λ)
P
(0)
], it is sufficient to get κm ≥ 0, for all m,
2 2
cj (λ)
that sequence cj (0)
, for j = 0, 1, ..., is increasing. To prove that, we link the coefficients cj (λ) to
the complete exponential Bell’s polynomials (Bell (1934)) and establish the following recurrence.
1
Pl 1 Pdf i 2
Lemma 2 We have cl+1 (λ) = l+1 i=0 2 j=1 νj νj + (i + 1)(1 − νj )λj cl−i (λ), for l ≥ 0.
cl (λ) c̃l (λ) −l −l
We use cl (0)
= γl
, where we obtain the sequences γl := cl (0)νdf and c̃l (λ) := cl (λ)νdf by
40
−l 1
Pl 1 Pdf −1 i+1
standardization with νdf . From Lemma 2, we have γl+1 = l+1 i=0 2 1 + j=2 ρj γl−i with
P h i
1
P l 1 df i i+1 2
γ0 = 1, and c̃l+1 (λ) = l+1 i=0 2 j=1 ρj ρj + νdf (1 − νj )λj c̃l−i (λ) with c̃0 (λ) = 1 (note
that ρ1 = 0 and ρdf = 1). To prove that sequence c̃lγ(λ)
l
is increasing, the next lemma provides a
sufficient condition from "separation" of the coefficients that define the recursive relations.
Pdf −1
1 i
Lemma 3 Let (ai ) be a real sequence, and let bi = 2
1+ j=2 ρj , for i ≥ 1, where 0 ≤ ρj ≤
1. Let sequences (gl ) and (cl ) be defined recursively by gl+1 = 1l (b1 gl + b2 gl−1 + ... + bl ) and
cl+1 = 1l (a1 cl + a2 cl−1 + ... + al ), with g1 = c1 = 1. Suppose that ai ≥ max{ df2−1 , 1}, for all i
(separation condition). Then, sequence ( gcll ) is increasing.
We apply Lemma 3 to sequences c̃l (λ) and γl . We detail the case df ≥ 3 (for df = 2 the
h i
analysis is simpler). The separation condition 12 df i i+1 2 df −1
P
j=1 jρ ρj + νdf
(1 − ν j )λj ≥ 2
, for i = 0,
Pdf −1 Pdf −1 i
yields λ21 + df 2
P
j=2 (1 − νj )λj ≥ νdf df − 2 − j=2 ρj , and, for i ≥ 1, it yields j=2 ρj (1 −
νdf −1 i+1
νj )λ2j + (1 − νdf )λ2df ≥ i+1 df − 2 − df
P
j=2 ρj . Inequalities (8) follow.
41
1/2 1/2 2
errors by εi,t = ht hi,t zi,t , where hi,t = ci + αi hi,t−1 zi,t−1 , with zi,t ∼ IIN (0, 1) mutually
independent of zt . We use the constraint ci = σii (1 − αi ) with uniform draws for the idiosyncratic
i.i.d. 1/2 ci
variances V [εi,t ] = σii ∼ U [1, 4], so that V [εi,t /ht ] = 1−αi
= σii . Such a setting allows
1/2
for cross-sectional heterogeneity in the variances of the scaled εi,t /ht . The ARCH parameters
i.i.d.
are uniform draws αi ∼ U [0.2, 0.5] with an upper boundary of the interval ensuring existence
of fourth-order moments. We generate 5, 000 panels of returns of size n × T for each of the 100
draws of the T × k factor matrix F and common ARCH process ht , t = 1, ..., T , in order to keep
the factor values constant within repetitions, but also to study the potential heterogeneity of size
and power results across different factor paths. The factor betas βi , idiosyncratic variances σii , and
individual ARCH parameters αi are the same across all repetitions in all designs of the section.
We use three different cross-sectional sizes n = 500, 1000, 5000, and three values of time-series
dimension T = 6, 12, 24. The variance matrix Ω̂Z̄ ∗ is computed using the parametric structure of
Lemma 8. We get the T − 1 estimated parameters by least squares, as detailed in OA Section E.5.3
i). The p-values are computed over 5, 000 draws.
We provide the size and power results in % in Table 1. Size of LR(2) is close to its nominal
level 5%, with size distortions smaller than 1%, except for the case T = 24 and n = 500. The
impact of the factor values on size is small for T above 6. The labels global power and local power
refer to κ̄ = 0 and κ̄ = 1/2, and power computation is not size adjusted. The global power is
equal to 100%, while the local power ranges from 80% to 85% for T = 6, and is equal to 100% for
T = 12 and T = 24. The approximate constancy of local power w.r.t. n, for large n, is coherent
with theory implying convergence to asymptotic local power. In the last panel of Table 1, we
provide the average of the estimated number k̂LR of factors, obtained by sequential testing with
LR(k), for k = 0, . . . , kmax , with kmax = 2, 7, 17 for T = 6, 12, 24 (see Table 3 of OA). We follow
the procedure described in Section 6.2, with size αn = 10/n. If we reject for all k = 0, . . . , kmax ,
then the estimated number of factors is set to k̂LR = kmax + 1. For all sample sizes T = 6, 12, 24,
the average estimated number of factors is very close to the true number 2. We can conclude that
our selection procedure for the number of factors works well in our simulations.
42
Figure 1: The upper panel displays the p-values for the statistic LR(k) for the subperiods from January
1963 to December 2021, stopping at the smallest k such that H0 (k) is not rejected at level αn = 10/nmax .
If no such k is found then p-values are displayed up to kmax . We use rolling windows of T = 20 months
moving forward by 12 months each time. The first bar of p-values covers the whole 20 months. Other bars
cover the last 12 months of the 20 months subperiod. We flag bear market phases with grey shaded vertical
1/2 1/2
bars. The five lower panels display V̂ y for total cross-sectional volatility, F̂ ′ F̂ for systematic
1/2
volatility, V̂ ε for idiosyncratic volatility, as well as R̂2 and R̂2 under a single-factor model.
43
Figure 2: The upper and lower panels display the p-values for the RS and KP statistics for the subperiods
from January 1963 to December 2021, for the rank test of the null hypothesis H0,sp (r) that F O has rank r
against the alternative hypothesis of rank larger than r, for any integer r ≤ k − 1. The empirical matrix F̂ O
is computed with the time-varying portfolio weights of the Fama-French five-factor model plus momentum.
We stop at the smallest r such that H0,sp (r) is not rejected at level αn = 10/n. If no such r is found then
p-values are displayed up to k − 1. The red horizontal segments give k̂ − 1, i.e., the estimated number of
latent factors obtained from Figure 1 minus 1. We flag bear market phases with grey shaded vertical bars,
and use the same rolling windows as in Figure 1.
44
Size (%) Global Power (%) Local Power (%) k̂LR
T 6 12 24 6 12 24 6 12 24 6 12 24
n = 500 6.0 5.2 6.7 100 100 100 80 100 100 2.0 2.0 2.1
(2.8) (0.3) (0.4) (0.1) (0.0) (0.0) (20.5) (0.0) (0.0) (0.1) (0.1) (0.2)
n = 1000 5.6 4.9 5.5 100 100 100 81 100 100 2.0 2.0 2.0
(2.3) (0.3) (0.3) (0.0) (0.0) (0.0) (21.1) (0.0) (0.0) (0.0) (0.0) (0.1)
n = 5000 5.3 5.0 4.9 100 100 100 85 100 100 2.0 2.0 2.0
(0.9) (0.3) (0.3) (0.0) (0.0) (0.0) (20.4) (0.0) (0.0) (0.0) (0.0) (0.1)
Table 1: For each sample size combination (n, T ), we provide the average size and power in %
for the statistic LR(2) (first three panels), and the average of the estimated number k̂LR of factors
obtained by sequential testing (last panel). Nominal size is 5% for the first three panels, and
αn = 10/n for the last panel. Global power refers to the global alternative κ̄ = 0, and local power
refers to the local alternative κ̄ = 0.5. In parentheses, we report the standard deviations for size,
power, and k̂LR across 100 different draws of the factor path.
45
ONLINE APPENDIX
Latent Factor Analysis in Short Panels
Alain-Philippe Fortin, Patrick Gagliardini, and Olivier Scaillet
We prove Lemmas 1-3 of the paper in Section D. We provide additional theory in Appendix E,
namely the characterization of the pseudo likelihood and the PML estimator (E.1), the conditions
for global identification and consistency (E.2), the asymptotic expansions for the FA estimators
(E.3), the local analysis of the first-order conditions of FA estimators (E.4), the asymptotic normal-
ity of FA estimators (E.5), the definition of invariant tests (E.6), and proofs of additional lemmas
(E.7). We give numerical checks of Inequalities (7) of Proposition 5 in Appendix F. Finally, we
collect the maximum value of k as a function of T in Appendix G.
with (Zn )ts = √1n i,j wi,t wj,s σij = √1n Jm=1
P P n ts ts
P
ζm,n , t ̸= s, with ζm,n = i∈Im wi,t wi,s σii +
PJn
√1
P P
i,j∈Im wi,t wj,s σij + i,j∈Im wi,t wj,s σij , t ̸= s, so that vech(Zn ) = n m=1 vech(ζm,n ), where
i<j i>j
ts
ζm,n is the T × T matrix having element ζm,n in position (t, s). Hence, vech(Zn ) is the row sum
of a triangular array {vech(ζm,n )}1≤m≤n of independent centered random vectors. Let Ωm,n :=
tt
)2 ] = i∈Im (E[wi,t 4
] − 1)σii2 +
P
V [vech(ζm,n )]. Using Assumption 2 (a), we compute (i) E[(ζm,n
2 i,j∈Im σij2 ; (ii) E[(ζm,n
ts
)2 ] = 2 2 2 2 tt ss
P P P
i∈Im E[w i,t wi,s ]σii + i,j∈Im σij , t ̸= s; (iii) E[ζm,n ζm,n ] =
i̸=j i̸=j
2 2 2 tt rp 2 2 ts rp
P P
i∈Im E[wi,t wi,s −1]σii , t ̸= s; (iv) E[ζm,n ζm,n ] = i∈Im E[wi,t wi,r wip ]σii , r ̸= p; (v) E[ζm,n ζm,n ]
P 2 1
PJn
= i∈Im E[wi,t wi,s wi,r wi,p ]σii , t ̸= s, r ̸= p. It follows that V [vech(Zn )] = n m=1 Ωm,n =
Dn +κn I T (T +1) = Ωn . The eigenvalues of Dn are bounded away from 0 under Assumption A.6 (b),
2
because for any unit vector ξ ∈ RT (T +1)/2 , we have ξ ′ Dn ξ ≥ n1 ni=1 1i∈S̄ σii2 ξ ′ V [vech(wi wi′ )]ξ ≥
P
46
Pn Pn 2 2
c n1 2 1
≥ c 1 − C̄ n1 ni=1 (1 − 1i∈S̄ ) ≥ 4c , for all n.
P
i=1 1i∈S̄ σii ≥ c 1 − n i=1 (1 − 1i∈S̄ )σii
−1/2
We use the multivariate Lyapunov condition ∥Ωn ∥4 n12 Jm=1 E[∥vech(ζm,n )∥4 ] → 0 to invoke
Pn
2
a CLT. Since ∥A−1/2 ∥4 ≤ δ2k(A) and ∥x∥4 ≤ k kj=1 x4j , for any k × k positive semi-definite matrix
P
k
Assumptions A.6 (a)-(c), Ωn → Ω follows from the Slutsky theorem, and Ω is positive definite.
1 1 dj Ψ(0)
Proof of Lemma 2: We have cj (λ) = j!
E[Qj ] =
where Ψ(u) := E[exp(uQ)] =
j! duj
√
exp[ψ(u)] is the Moment Generating Function (MGF) of Q = 12 df
p
1 − νj λj )2 with
P
j=1 ( ν j Xj +
u √
Xj ∼ i.i.d.N (0, 1). By the independence of variables Xj , we get Ψ(u) = df
Q
j=1 E[exp( 2 ( ν j Xj +
√ 1 (1−νj )u 2
λ
1 − νj λj )2 ] where E[exp( u2 ( ν j Xj + 1 − νj λj )2 ] = (1 − νj u)−1/2 e 2 1−νj u j , for u < 1/νj .
p p
P h (1−νj )u 2
i
Thus we get the log MGF ψ(u) = 12 df j=1 − log(1 − ν j u) + λ
1−νj u j
, for u < 1/νdf . Its lth
order derivative evaluated at u = 0 is
df
(l − 1)! X l−1
ψ (l) (0) = νj + l(1 − νj )λ2j ,
νj l ≥ 0. (D.1)
2 j=1
By using the Faa di Bruno formula for the derivatives of a composite function, we have
dl ′′
dul
eψ(u) = eψ(u) Bl (ψ ′ (u), ψ (u), ..., ψ (l) (u)), where Bl is the lth complete exponential Bell’s poly-
′′
nomial (Bell (1934)). Hence, Ψ(l) (0) = Bl (ψ ′ (0), ψ (0), ..., ψ (l) (0)). The complete Bell’s polyno-
mials satisfy the recurrence relation Bl+1 (x1 , x2 , ..., xl+1 ) = li=0 il Bl−i (x1 , ..., xl−i )xi+1 . Thus,
P
Ψ(l+1) (0) = li=0 il Ψ(l−i) (0)ψ (i+1) (0). After standardization with the factorial term, and using
P
47
Gci := ci+1 − ci ≥ 0 for all i. For this purpose, from the recursive relation defining ci+1 we have:
1
a1 (ci−1 + Gci−1 ) + a2 (ci−2 + Gci−2 ) + · · · + ai−1 (c1 + Gc1 ) + ai
ci+1 =
i
1
(a1 − 1)Gci−1 + (a2 − 1)Gci−2 + · · · + (ai−1 − 1)Gc1 + (ai − 1)
=
i
1 1
+ Gci−1 + Gci−2 + · · · + Gc1 + 1 + (a1 ci−1 + a2 ci−2 + · · · + ai−1 ) .
i i
The second term in the RHS is equal to 1i ci . Using a1 ci−1 + a2 ci−2 + · · · + ai−1 = (i − 1)ci ,
i−1
the third term in the RHS is equal to i i
c.
Thus, by bringing these two terms in the LHS, we
1
get Gci = i
(a1 − 1)Gci−1 + (a2 − 1)Gci−2 + · · · + (ai−1 − 1)Gc1 + (ai − 1) , for all i ≥ 2, with
Gc1 = a1 − 1. Since ai ≥ 1 for all i, we get Gci ≥ 0 for all i ≥ 1 by an induction argument .
(ii) We now strengthen the result in step (i) and show that Hic := ci+1 − ci ζ+i−1
i
≥ 0 for all i,
with ζ = max{ df2−1 , 1}. Similarly as in step (i), we have
1
(a1 − ζ)Gci−1 + (a2 − ζ)Gci−2 + · · · + (ai−1 − ζ)Gc1 + (ai − ζ)
ci+1 =
i
ζ 1
+ Gci−1 + Gci−2 + · · · + Gc1 + 1 + (a1 ci−1 + a2 ci−2 + · · · + ai−1 ) ,
i i
where the second term in the RHS equals ζi ci , and the third term equals i−1 c . Thus, we get
i i
Hic = 1i (a1 − ζ)Gci−1 + (a2 − ζ)Gci−2 + · · · + (ai−1 − ζ)Gc1 + (ai − ζ) , for all i. By step (i),
we have Gci ≥ 0 for i ≥ 1. Using the separation condition ai ≥ ζ for all i, we get Hic ≥ 0 for all i.
(iii) We show that Hig := gi+1 − gi ζ+i−1
i
≤ 0 for all i ≥ 1. For df = 2 this statement follows
1 2i−1
with ζ = 1 since gi+1 = (g
2i i
+ gi−1 + ... + 1) = 2i
gi
and hence (gi ) is decreasing. Let us now
consider the case df ≥ 3 with ζ = df2−1 . As above we have Hig = 1i il=1 (bl − ζ)Ggi−l , where
P
P −1 l Pdf −1
Ggi := gi+1 − gi . We plug in bl − ζ = 21 df 1
j=2 (ρj − 1) = 2
l−1
j=2 (ρj − 1)(1 + ρj + ... + ρj ) =
1
Pdf −1 Pl k−1
2 j=2 (ρj − 1) k=1 ρj . Thus, we get:
df −1 i X l df −1 i i
1 X X 1 X X X
Hig = (ρj − 1) ρk−1
j Gg
i−l = (ρj − 1) ρ k−1
j Ggi−l
2i j=2 l=1 k=1
2i j=2 k=1 l=k
df −1 i df −1
1 X X
k−1 1 X
(ρj − 1) gi + ρj gi−1 + ... + ρi−1
= (ρj − 1) ρj gi−k+1 = j ≤ 0.
2i j=2 k=1
2i j=2
48
ci+1 ζ+i−1 gi+1 ζ+i−1
(iv) The inequalities established in steps (ii) and (iii) imply ci
≥ i
and gi
≤ i
for
ci+1 gi+1 ci+1 ci
all i. Then, we get ci
≥ gi
, that is equivalent to gi+1
≥ gi
, for all i, because the sequences ci
and gi are strictly positive. The conclusion follows.
E Additional theory
The FA estimator is the PML estimator based on the Gaussian likelihood function obtained from
the pseudo model yi = µ + F βi + εi with βi ∼ N (0, Ik ) and εi ∼ N (0, Vε ) mutually independent
and i.i.d. across i = 1, ..., n. Then, yi ∼ N (µ, Σ(θ)) under this pseudo model, where Σ(θ) :=
F F ′ + Vε and θ := (vec(F )′ , diag(Vε )′ )′ ∈ Rr with r = (k + 1)T . It yields the pseudo log-
Pn ′ −1
1
likelihood function L̂(θ, µ) = − 12 log |Σ(θ)| − 2n 1
i=1 (yi − µ) Σ(θ) (yi − µ) = − 2 log |Σ(θ)| −
1
T r V̂y Σ(θ)−1 − 12 (ȳ − µ)′ Σ(θ)−1 (ȳ − µ), up to constants, where ȳ = n1 ni=1 yi and V̂y =
P
2
1
Pn ′
n i=1 (yi − ȳ)(yi − ȳ) . We concentrate out parameter µ to get its estimator µ̂ = ȳ. Then,
1 1
L̂(θ) := − log |Σ(θ)| − T r V̂y Σ(θ)−1 , (E.1)
2 2
subject to the normalization restriction that F ′ Vε−1 F is a diagonal matrix, with diagonal elements
ranked in decreasing order.35
The population criterion L0 (θ) is defined in Appendix A, with Vy = Vy0 = Σ(θ0 ) = F0 F0′ + Vε0 .
35
If the risk-free rate vector is considered observable, we can rewrite the model as ỹi = F β̃i + εi = µ + F βi + εi ,
where ỹi = yi −rf is the vector of excess returns and µ = F µβ̃ . It corresponds to a constrained model with parameters
θ and µβ̃ . The maximization of the corresponding Gaussian pseudo likelihood function leads to a constrained FA
estimator, that we do not consider in this paper since it does not match a standard FA formulation.
49
Lemma 4 The following conditions are equivalent: a) the true value θ0 is the unique maximizer
of L0 (θ) for θ ∈ Θ; b) Σ(θ) = Σ(θ0 ), θ ∈ Θ ⇒ θ = θ0 , up to sign changes in the columns of F .
They yield the global identification in the FA model.
In Lemma 4, condition a) is the standard identification condition for a M-estimator with pop-
ulation criterion L0 (θ). Condition (b) is the global identification condition based on the variance
matrix as in Anderson and Rubin (1956). Condition (b) corresponds to our Assumption A.4.
Let us now establish the consistency of the FA estimators in our setting. Write V̂y = n1 ni=1 (εi −
P
ε̄)(εi −ε̄)′ +F [ n1 ni=1 (βi −β̄)(βi −β̄)′ ]F ′ +F [ n1 ni=1 (βi −β̄)(εi −ε̄)′ ]+[ n1 ni=1 (εi −ε̄)(βi −β̄)′ ]F ′ ,
P P P
where ε̄ = n1 ni=1 εi and β̄ = n1 ni=1 βi . Under the normalization in Assumption A.1 we have:
P P
′
1 1 1
V̂y = εε′ − ε̄ε̄′ + F F ′ + F εβ + εβ F ′ . (E.2)
n n n
1
Lemma 5 Under Assumptions 1, 2, and A.2, A.3, as n → ∞, we have: (a) ε̄ = op ( n1/4 ), (b)
p p
1
n
εε′ → Vε0 , and (c) n1 εβ → 0.
p
From Equation (E.2) and Lemma 5, we have V̂y → Vy0 . Thus, L̂(θ) converges in probability to
L0 (θ) as n → ∞, uniformly over Θ compact. From standard results on M-estimators, we get
consistency of θ̂. Moreover, from ȳ = µ + ε̄, we get the consistency of µ̂.
Proposition 6 Under Assumptions 1, 2, and A.2-A.4, the FA estimators F̂ , V̂ε and µ̂ are consistent
as n → ∞ and T is fixed.
Anderson and Rubin (1956) establish consistency in Theorem 12.1 (see beginning of the proof,
page 145) within a Gaussian ML framework. Anderson and Amemiya (1988) provide a version of
this result in their Theorem 1 for generic distribution of the data, dispensing for compacity of the
parameter set but using a more restrictive identification condition.
50
E.3 Asymptotic expansions of estimators V̂ε and F̂
The FA estimators V̂ε and F̂ are consistent M-estimators under nonlinear constraints, and admit
expansions at first order for fixed T and n → ∞, namely V̂ε = Ṽε + √1n Ψε + op ( √1n ) and F̂j = Fj +
√1 ΨF + op ( √1n ) (see Appendix E.5.1). The next proposition (new to the literature) characterizes
n j
the diagonal random matrix Ψε and the random vectors ΨFj by using conditions (FA1) and (FA2)
in Section 2 (see proof at the end of the section).
Proposition 7 Under Assumptions 1, 2, and A.1-A.4, A.6, we have (a) for j = 1, ..., k
1 1
Pk 1
Pk γl
where Rj := P
2γj Fj ,Vε
+ γj
MF,Vε + ℓ=1,ℓ̸=j γj −γℓ PFℓ ,Vε and Λj := − ℓ=1,ℓ̸=j γj −γℓ PFℓ ,Vε and
PFj ,Vε = Fj (Fj′ Vε−1 Fj )−1 Fj′ Vε−1 = 1
F F ′ V −1
γj j j ε
is the GLS orthogonal projection onto Fj . Further,
(b) the diagonal matrix Ψε is such that:
′
diag MF,Vε (Ψy − Ψε )MF,Vε
= 0. (E.4)
Equation (E.3) yields the asymptotic expansion of the eigenvectors by accounting for esti-
mation errors of matrix V̂y V̂ε−1 (first term) and of the normalization constraint (second term). To
′
interpret Equation (E.4), we can observe that the matrix MF,Vε (Ψy −Ψε )MF,Vε
yields the first-order
√
term in the asymptotic expansion of nŜ (up to the left- and right-multiplication by diagonal ma-
−1/2
trix Vε ). Thus, Equation (E.4) is implied by the property that the diagonal terms of matrix Ŝ
are equal to zero as stated in Proposition 1 (c).
′ ′
Let us now give the explicit expression of Ψε . By using MF,Vε Ψy MF,V ε
= MF,Vε Zn MF,V ε
, we
′
can rewrite Equation (E.4) as diag MF,Vε (Zn − Ψε )MF,V ε
= 0. Now, since Ψε is diagonal, we
′ ⊙2 ⊙2
have diag MF,Vε Ψε MF,V ε
= MF,V ε
diag(Ψε ), where MF,V ε
= MF,Vε ⊙ MF,Vε . Thus, we get:
⊙2 ′
MF,Vε
diag(Ψε ) = diag(MF,Vε Zn MF,Vε
). (E.5)
To have a unique solution for vector diag(Ψε ), we need the non-singularity of the T × T matrix
⊙2
MF,Vε
. It is the local identification condition in the FA model stated in Assumption A.5. Let us
51
P −k
write G = [g1 : · · · : gT −k ]. Then, we have MF,Vε = GG′ Vε−1 = Tj=1 gj (Vε−1 gj )′ , and so we get
PT −k h PT −k i
⊙2 −1 ′ −1 ′ ′
the Hadamard product MF,V ε
= [g (V
i,j=1 i ε gi ) ] ⊙ [g j (V ε gj ) ] = (g
i,j=1 i ⊙ g j )(g i ⊙ gj )
Vε−2 = 2 (X ′ X) Vε−2 .36 Hence, we can state the local identification condition in Assumption A.5
as a full-rank condition for matrix X, analogously as in linear regression (Lemma 6). In Lemma
6 in Appendix E.4 i), we also show equivalence with invertibility of the bordered Hessian, i.e., the
Hessian of the Lagrangian function in a constrained M-estimation.
Under Assumption A.5, we get from Equation (E.5):
⊙2 −1 ′
where TF,Vε (V ) := diag [MF,Vε
] diag(MF,Vε V MF,Vε
) , for any matrix V . Mapping TF,Vε (·) is
′
linear and such that TF,Vε (V ) = V , for a diagonal matrix V . We have diag(MF,Vε Zn MF,Vε
) =
diag (GZn∗ G′ ) = 2X ′ vech (Zn∗ ),37 and so
−1
diag(Ψε ) = Vε2 (X ′ X) X ′ vech (Zn∗ ) . (E.7)
Anderson and Rubin (1956), Theorem 12.1, show that the FA estimator is asymptotically nor-
√
mal if n(V̂y − Vy ) is asymptotically normal. They use a linearization of the first-order conditions
similar as the one of Proposition 7. Their Equation (12.16) corresponds to our Equation (E.4).
However, they only provide an implicit characterization of the ΨFj and not an explicit expression
for Ψε and ΨFj in terms of asymptotically Gaussian random matrices like Zn as we do. These key
developments pave the way to establishing the asymptotic distributions of estimators F̂ and V̂ε in
general settings, that we cover in Appendix E.5.
Proof of Proposition 7: From (E.2) and Lemma 5 we have V̂y = Ṽy + √1n Ψy + op ( √1n ), where
√
Ṽy = F F ′ + Ṽε and Ψy = √1n (εβF ′ + F β ′ ε′ ) + n n1 εε′ − Ṽε . Let us substitute this expansion
36
Let us recall the following property of the Hadamard product: (ab′ ) ⊙ (cd′ ) = (a ⊙ c)(b ⊙ d)′ for conformable
h i
vectors a, b, c, d. The last equalitiy because X ′ = √12 g1 ⊙ g1 : · · · : √12 gT −k ⊙ gT −k : {gi ⊙ gj }i<j (see be-
ginning of the proof of Proposition 2 (a)) .
37
We have diag(GAG′ ) = 2X ′ vech(A) for any T × T symmetric matrix A; see beginning of the proof of Propo-
sition 2 (a).
52
for V̂y into (FA2) and rearrange to obtain F̂ Γ̂ − F F ′ V̂ε−1 F̂ = √1 Ψy V̂ −1 F̂
n ε + (Ṽε V̂ε−1 − IT )F̂ +
op ( √1n ), where Γ̂ = F̂ ′ V̂ε−1 F̂ = diag(γ̂1 , . . . , γ̂k ). From V̂ε = Ṽε + √1 Ψε
n
+ op ( √1n ), we have
Ṽε V̂ε−1 − IT = − √1n Ψε V̂ε−1 + op ( √1n ). Substituting into the above equation and right multiplying
both sides by (F ′ V̂ε−1 F̂ )−1 gives F̂ D̂ − F = √1 (Ψy
n
− Ψε )V̂ε−1 F̂ (F ′ V̂ε−1 F̂ )−1 + op ( √1n ), where
D̂ := Γ̂(F ′ V̂ε−1 F̂ )−1 . By the root-n convergence of the FA estimates (see Section E.5.1), we get
1 1
F̂ D̂ − F = √ (Ψy − Ψε )Vε−1 F Γ−1 + op ( √ ), (E.8)
n n
and D̂ = Ik + Op ( √1n ), where Γ = diag(γ1 , ..., γk ). We can push the expansion by plugging
into (E.8) the expansion of D̂. We have F ′ V̂ε−1 F̂ = [Ik − (F̂ − F )′ V̂ε−1 F̂ Γ̂−1 ]Γ̂, so that D̂ =
[Ik − (F̂ − F )′ V̂ε−1 F̂ Γ̂−1 ]−1 = Ik + (F̂ − F )′ Vε−1 F Γ−1 + op ( √1n ). By plugging into (E.8), we get:
1 1
F̂ − F + F [(F̂ − F )′ Vε−1 F Γ−1 ] = √ (Ψy − Ψε )Vε−1 F Γ−1 + op ( √ ). (E.9)
n n
By multiplying both sides with MF,Vε , we get MF,Vε (F̂ − F ) = √1 MF,V (Ψy
n ε − Ψε )Vε−1 F Γ−1 +
op ( √1n ). Then, F̂ − F = √1 MF,V (Ψy
n ε − Ψε )Vε−1 F Γ−1 + √1 F A
n
+ op ( √1n ), where A is a random
k × k matrix to be determined next. By plugging into (E.9), we get F (A + A′ ) = PF,Vε (Ψy −
Ψε )Vε−1 F Γ−1 + op ( √1n ). By multiplying both sides by 12 Γ−1 F ′ Vε−1 and using F ′ Vε−1 PF,Vε =
F ′ Vε−1 , we get the symmetric part of matrix A, i.e., 21 (A + A′ ) = 21 Γ−1 F ′ Vε−1 (Ψy − Ψε )Vε−1 F Γ−1
(we include higher-order terms in the remainder op ( √1n )). Thus, F̂ − F = √1 ΨF
n
+ op ( √1n ), where
1
ΨF = MF,Vε (Ψy − Ψε )Vε−1 F Γ−1 + PF,Vε (Ψy − Ψε )Vε−1 F Γ−1 + F Ã, (E.10)
2
and à = 21 (A − A′ ) is an antisymmetric k × k random matrix. To find the antisymmetric matrix
à = (ãℓ,j ), we use that F̂ ′ V̂ε−1 F̂ is diagonal. Plugging the expansions of the FA estimates, for the
√
term at order 1/ n we get that the out-of-diagonal elements of matrix Ψ′F Vε−1 F + F ′ Vε−1 ΨF −
F ′ Vε−1 Ψε Vε−1 F = 21 Γ−1 F ′ Vε−1 (Ψy − Ψε )Vε−1 F + 21 F ′ Vε−1 (Ψy − Ψε )Vε−1 F Γ−1 + ΓÃ − ÃΓ −
F ′ Vε−1 Ψε Vε−1 F are nil. Setting the (ℓ, j) element of this matrix equal to 0, we get ãℓ,j = −ãj,ℓ =
h i
1 1 1 1 ′ −1 −1 ′ −1 −1
(
γj −γℓ 2 γj
+ γℓ
)F V
ℓ ε (Ψy − Ψ )V
ε ε F j − F V
ℓ ε Ψ V
ε ε F j , for j ̸= ℓ. Then, from Equation
53
Pk 1 −1
Pk γℓ −1
ℓ=1:ℓ̸=j γj −γℓ PFℓ ,Vε (Ψy − Ψε )Vε Fj − ℓ=1:ℓ̸=j γj −γℓ PFℓ ,Vε Ψε Vε Fj , where we use PF,Vε =
Pk
ℓ=1 PFℓ ,Vε . Part (a) follows.
Let us now prove part (b). The asymptotic expansion of condition (FA1) yields:
k
!
X
diag(Ψy ) = diag (Fj Ψ′Fj + ΨFj Fj′ ) + Ψε . (E.11)
j=1
From part (a) and the definition of PFj ,Vε we have kj=1 ΨFj Fj′ = 21 kj=1 PFj ,Vε (Ψy − Ψε )PF′ j ,Vε +
P P
γj γ γ
′
PFℓ ,Vε (Ψy − Ψε )PF′ j ,Vε − kℓ̸=j γjℓ−γj ℓ PFℓ ,Vε Ψε PF′ j ,Vε =: N1 +
P P
MF,Vε (Ψy − Ψε )PF,V ε
+ ℓ̸=j γj −γ ℓ
Pk P
N2 + N3 + N4 , where PF,Vε = j=1 PFj ,Vε = IT − MF,Vε and ℓ̸=j denotes the double sum
over j, ℓ = 1, ..., k such that ℓ ̸= j. Matrix N1 is symmetric and it contributes 2N1 to the RHS
of (E.11). Instead, matrix N4 is antisymmetric (it can be seen by interchanging indices j and ℓ
in the summation) and it does not contribute to the RHS of (E.11). For matrix N3 we have N3 +
γj
N3′ = ℓ̸=j γj −γ PFℓ ,Vε (Ψy − Ψε )PF′ j ,Vε + ℓ̸=j γℓγ−γ PFℓ ,Vε (Ψy − Ψε )PF′ j ,Vε = ℓ̸=j PFℓ ,Vε (Ψy −
P P ℓ
P
ℓ j
Ψε )PF′ j ,Vε = ℓ,j PFℓ ,Vε (Ψy −Ψε )PF′ j ,Vε − j PFj ,Vε (Ψy −Ψε )PF′ j ,Vε = PF,Vε (Ψy −Ψε )PF,V ′
P P
ε
−2N1 ,
where we have interchanged ℓ and j in the first equality when writing N3′ . Thus, we get:
k
X
(Fj Ψ′Fj + ΨFj Fj′ ) = MF,Vε (Ψy − Ψε )PF,V
′
ε
′
+ PF,Vε (Ψy − Ψε )MF,Vε
′
+ PF,Vε (Ψy − Ψε )PF,Vε
j=1
′
= (Ψy − Ψε ) − MF,Vε (Ψy − Ψε )MF,Vε
. (E.12)
Consider the criterion L(θ) = − 12 log |Σ(θ)| − 12 T r (Vy Σ(θ)), where Vy is a p.d. matrix in a neigh-
bourhood of Vy0 . In our Assumptions, θ0 is an interior point of Θ. Let θ∗ = (vec(F ∗ )′ , diag(Vε∗ )′ )′
denote the maximizer of L(θ) subject to θ ∈ Θ. According to Anderson (2003), the first-order
conditions (FOC) for the maximization of L(θ) are: (a) diag(Vy ) = diag(F ∗ (F ∗ )′ + Vε∗ ) and (b)
F ∗ is the matrix of eigenvectors of Vy (Vε∗ )−1 associated to the k largest eigenvalues 1 + γj∗ for
j = 1, ..., k, normalized such that (F ∗ )′ (Vε∗ )−1 F ∗ = diag(γ1∗ , ..., γk∗ ).
54
i) Local identification
Let Vy = Vy0 . The true values F0 and Vε0 solve the FOC. Let F = F0 + ϵΨϵF and Vε = Vε0 + ϵΨϵVε ,
where ϵ is a small scalar and ΨϵF , ΨϵVε are deterministic conformable matrices, be in a neighbour-
hood of F0 and Vε0 and solve the FOC up to terms O(ϵ2 ). The model is locally identified if, and
only if, it implies ΨϵVε = 0 and ΨϵF = 0.
Lemma 6 Under Assumption 1, the following four conditions are equivalent: (a) Matrix MF⊙2
0 ,Vε
0
is non-singular, (b) Matrix X is full-rank, (c) Matrix Φ⊙2 is non-singular, where Φ := Vε0 −
2
F0 (F0′ (Vε0 )−1 F0 )−1 F0′ , (d) Matrix B0′ J0 B0 is non-singular, where J0 := − ∂ ∂θ∂θ
L0 (θ0 )
′ and B0 is any
∂g(θ0 )
full-rank r × (r − 12 k(k − 1)) matrix such that ∂θ′
B0 = 0, for g(θ) = {[F ′ Vε−1 F ]i,j }i<j the
1
2
k(k − 1) dimensional vector of the constraints. They yield the local identification of our model.
In Lemma 6, condition (a) corresponds to Assumption A.5 and is equivalent to condition (b)
that X is full-rank. Condition (c) is used in Theorem 5.9 of Anderson and Rubin (1956) to show
local identification. Condition (d) involves the second-order partial derivatives of the population
criterion function. While the Hessian matrix J0 itself is singular because of the rotational invari-
ance of the model to latent factors, the second-order partial derivatives matrix along parameter
directions, which are in the tangent plan to the contraint set, is non-singular. Condition (d) is
equivalent to invertibility of the bordered Hessian.
Now, let Vy = Vy0 + ϵΨϵy be in a neighbourhood of Vy0 . Let F ∗ = F0 + ϵΨϵF + O(ϵ2 ) and
Vε∗ = Vε0 +ϵΨϵVε +O(ϵ2 ) be the solutions of the FOC. Consider Vy −Σ∗ , where Σ∗ = F ∗ (F ∗ )′ +Vε∗ ,
i.e., the difference between variance Vy and its k-factor approximation with population FA. We
want to find the first-order development of Vy − Σ∗ for small ϵ. From the FOC, we have that the
diagonal of such symmetric matrix is null, but not necessarily the out-of-diagonal elements.
55
From the arguments in the proof of Proposition 7, Equations (E.11) and (E.12), we get:
ΨϵF F0′ + F0 (ΨϵF )′ = Ψϵy − ΨϵVε − MF0 ,Vε0 (Ψϵy − ΨϵVε )MF′ 0 ,Vε0 , (E.13)
−1
diag(ΨϵVε ) = (Vε0 )2 (X ′ X) X ′ vech G′0 (Vε0 )−1 Ψϵy (Vε0 )−1 G0 .
(E.15)
Now, using Equation (E.13), we get Vy − Σ∗ = ϵ Ψϵy − F0 (ΨϵF )′ − ΨϵF F0′ − ΨϵVε + O(ϵ2 )
= ϵMF0 ,Vε0 (Ψϵy −ΨϵVε )MF′ 0 ,Vε0 +O(ϵ2 ) = ϵG0 ∆∗ G′0 +O(ϵ2 ), where ∆∗ := G′0 (Vε0 )−1 Ψϵy (Vε0 )−1 G0 −
G′0 (Vε0 )−1 ΨϵVε (Vε0 )−1 G0 . Using that vech(G′0 diag(a)G0 ) = Xa, and Equation (E.15), the vector-
ized form of matrix ∆∗ is: vech(∆∗ ) = vech G′0 (Vε0 )−1 Ψϵy (Vε0 )−1 G0 −X(Vε0 )−2 diag(ΨϵVε ) =
MX vech G′0 (Vε0 )−1 Ψϵy (Vε0 )−1 G0 . Thus, we have shown that, at first order in ϵ, the difference
between Vy = Vy0 + ϵΨϵy and the FA k-factor approximation Σ∗ is ϵG0 ∆∗ G′0 , with vech(∆∗ ) =
MX vech G′0 (Vε0 )−1 Ψϵy (Vε0 )−1 G0 . It shows that the small perturbation ϵΨϵy around Vy0 keeps
the DGP within the k-factor specification (at first order) if, and only if, we have that vector
vech G′0 (Vε0 )−1 Ψϵy (Vε0 )−1 G0 is spanned by the columns of X.
since $F_0' V_{\varepsilon 0}^{-1} G_0 = 0$ and $G_0' V_{\varepsilon 0}^{-1} G_0 = I_{T-k}$. Thus, $\mathrm{vech}(\Delta^*) = M_X\,\mathrm{vech}(\xi_G\xi_G')$. Hence, it is only the component of $\mathrm{vech}(\xi_G\xi_G')$ that is orthogonal to the range of $X$ which generates a local deviation from a $k$-factor specification, through the multiplication by the projection matrix $M_X$. This clarifies the role of the projector in the local power. On the contrary, the component spanned by the columns of $X$ can be “absorbed” in the $k$-factor specification by a redefinition of the factor $F$ and the variance $V_\varepsilon$ through $F^*$ and $V_\varepsilon^*$.
E.5 Feasible asymptotic normality of the FA estimators
We first establish the asymptotic expansion of $\hat\theta$ along the lines of pseudo maximum likelihood estimators (White (1982)). The sample criterion is $\hat L(\theta)$ given in Equation (E.1), where $\theta = (\mathrm{vec}(F)', \mathrm{diag}(V_\varepsilon)')'$ is subject to the nonlinear vector constraint $g(\theta) := \{[F' V_\varepsilon^{-1} F]_{i,j}\}_{i<j} = 0$, i.e., matrix $F' V_\varepsilon^{-1} F$ is diagonal. By standard methods for constrained M-estimators, we consider the FOC of the Lagrangian function: $\frac{\partial\hat L(\hat\theta)}{\partial\theta} - \frac{\partial g(\hat\theta)'}{\partial\theta}\hat\lambda_L = 0$ and $g(\hat\theta) = 0$, where $\hat\lambda_L$ is the $\frac{1}{2}k(k-1)$-dimensional vector of estimated Lagrange multipliers. Define the vector $\tilde\theta := \big(\mathrm{vec}(F_0)', \mathrm{diag}(\tilde V_\varepsilon)'\big)'$, which also satisfies the constraint $g(\tilde\theta) = 0$ by the in-sample factor normalization. We apply the mean value theorem to the FOC around $\tilde\theta$ and get:
$$\hat J(\bar\theta)\,\sqrt{n}(\hat\theta - \tilde\theta) + A(\hat\theta)\,\sqrt{n}\,\hat\lambda_L = \sqrt{n}\,\frac{\partial\hat L(\tilde\theta)}{\partial\theta}, \qquad (E.16)$$
$$A(\bar\theta)'\,\sqrt{n}(\hat\theta - \tilde\theta) = 0, \qquad (E.17)$$
where $\hat J(\theta) := -\frac{\partial^2\hat L(\theta)}{\partial\theta\partial\theta'}$ is the $r \times r$ Hessian matrix, $A(\theta) := \frac{\partial g(\theta)'}{\partial\theta}$ is the $r \times \frac{1}{2}k(k-1)$-dimensional gradient matrix of the constraint function, and $\bar\theta$ is a mean value vector between $\hat\theta$ and $\tilde\theta$ componentwise. Matrix $A(\theta)$ is full rank for $\theta$ in a neighbourhood of $\theta_0$. For any $\theta$, define the $r \times (r - \frac{1}{2}k(k-1))$ matrix $B(\theta)$ with orthonormal columns that span the orthogonal complement of the range of $A(\theta)$. Matrix function $B(\theta)$ is continuous in $\theta$ in a neighbourhood of $\theta_0$.³⁸ Then, by multiplying Equation (E.16) by $B(\hat\theta)'$ to get rid of the Lagrange multiplier vector, using the identity $I_r = A(\theta)(A(\theta)'A(\theta))^{-1}A(\theta)' + B(\theta)B(\theta)'$ for $\theta = \bar\theta$ and Equation (E.17), we get $[B(\hat\theta)'\hat J(\bar\theta)B(\bar\theta)]\,B(\bar\theta)'\sqrt{n}(\hat\theta - \tilde\theta) = B(\hat\theta)'\sqrt{n}\,\frac{\partial\hat L(\tilde\theta)}{\partial\theta}$. By the uniform convergence of $\hat J(\theta)$ to $J(\theta) := -\frac{\partial^2 L_0(\theta)}{\partial\theta\partial\theta'}$, and the consistency of the FA estimator $\hat\theta$ (Section E.2), matrix $B(\hat\theta)'\hat J(\bar\theta)B(\bar\theta)$ converges to $B_0'J_0B_0$, where $J_0 := J(\theta_0)$ and $B_0 := B(\theta_0)$. Matrix $B_0'J_0B_0$ is invertible under the local identification Assumption A.5 (see Lemma 6, condition (d)). Then, $B(\bar\theta)'\sqrt{n}(\hat\theta - \tilde\theta) = [B(\hat\theta)'\hat J(\bar\theta)B(\bar\theta)]^{-1}B(\hat\theta)'\sqrt{n}\,\frac{\partial\hat L(\tilde\theta)}{\partial\theta}$ w.p.a. 1. By using again $I_r = A(\bar\theta)(A(\bar\theta)'A(\bar\theta))^{-1}A(\bar\theta)' + B(\bar\theta)B(\bar\theta)'$ and Equation (E.17), we get $\sqrt{n}(\hat\theta - \tilde\theta) = B(\bar\theta)[B(\hat\theta)'\hat J(\bar\theta)B(\bar\theta)]^{-1}B(\hat\theta)'\sqrt{n}\,\frac{\partial\hat L(\tilde\theta)}{\partial\theta}$. The distributional results established below imply $\sqrt{n}\,\frac{\partial\hat L(\tilde\theta)}{\partial\theta} = O_p(1)$. Thus, we get $\sqrt{n}$-consistency:
$$\sqrt{n}(\hat\theta - \tilde\theta) = B_0(B_0'J_0B_0)^{-1}B_0'\,\sqrt{n}\,\frac{\partial\hat L(\tilde\theta)}{\partial\theta} + o_p(1). \qquad (E.18)$$
³⁸ Matrix $B(\theta)$ is uniquely defined up to rotation and sign changes in its columns. We can pick a unique representer such that matrix $B(\theta)$ is locally continuous, e.g., by taking $B(\theta) = \tilde B(\theta)[\tilde B(\theta)'\tilde B(\theta)]^{-1/2}$, where matrix $\tilde B(\theta)$ consists of the first $r - \frac{1}{2}k(k-1)$ columns of $I_r - A(\theta)[A(\theta)'A(\theta)]^{-1}A(\theta)'$, if those columns are linearly independent.
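Footnote 38's construction of $B(\theta)$ is directly implementable; a sketch (our illustration), assuming only that the selected columns of the projector are linearly independent:

```python
import numpy as np

def build_B(A):
    """Orthonormal basis B of the orthogonal complement of range(A),
    following footnote 38: take the first r - dim(g) columns of
    I_r - A (A'A)^{-1} A' and orthonormalize them symmetrically."""
    r, q = A.shape                                       # q = k(k-1)/2 constraints
    P = np.eye(r) - A @ np.linalg.solve(A.T @ A, A.T)    # projector on range(A) orthogonal
    B_tilde = P[:, : r - q]
    # symmetric orthonormalization: B = B_tilde (B_tilde' B_tilde)^{-1/2}
    w, V = np.linalg.eigh(B_tilde.T @ B_tilde)
    return B_tilde @ V @ np.diag(w ** -0.5) @ V.T
```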
Let us now find the score $\frac{\partial\hat L(\theta)}{\partial\theta}$. We have $\frac{\partial\hat L(\theta)}{\partial\theta} = \Big(\frac{\partial\mathrm{vec}(\Sigma(\theta))}{\partial\theta'}\Big)'\mathrm{vec}\Big(\frac{\partial\hat L(\theta)}{\partial\Sigma}\Big)$, where $\mathrm{vec}\Big(\frac{\partial\hat L(\theta)}{\partial\Sigma}\Big) = \frac{1}{2}\big(\Sigma(\theta)^{-1}\otimes\Sigma(\theta)^{-1}\big)\mathrm{vec}\big(\hat V_y - \Sigma(\theta)\big)$. Moreover, by using $\mathrm{vec}(\Sigma(\theta)) = \sum_{j=1}^k F_j\otimes F_j + [e_1\otimes e_1 : \cdots : e_T\otimes e_T]\,\mathrm{diag}(V_\varepsilon)$, where $e_t$ is the $t$-th column of $I_T$, we get: $\frac{\partial\mathrm{vec}(\Sigma(\theta))}{\partial\theta'} = [(I_T\otimes F_1) + (F_1\otimes I_T) : \cdots : (I_T\otimes F_k) + (F_k\otimes I_T) : e_1\otimes e_1 : \cdots : e_T\otimes e_T]$. Thus, we get: $\sqrt{n}\,\frac{\partial\hat L(\tilde\theta)}{\partial\theta} = \frac{1}{2}\Big(\frac{\partial\mathrm{vec}(\Sigma(\tilde\theta))}{\partial\theta'}\Big)'\big(\tilde V_y^{-1}\otimes\tilde V_y^{-1}\big)\sqrt{n}\,\mathrm{vec}\big(\hat V_y - \tilde V_y\big)$. From Equation (E.2) and Lemma 5 we have $\hat V_y = \tilde V_y + \frac{1}{\sqrt n}(Z_n + W_nF' + FW_n') + o_p(\frac{1}{\sqrt n})$, where $W_n := \frac{1}{\sqrt n}\,\varepsilon\beta$. Thus, $\sqrt{n}\,\frac{\partial\hat L(\tilde\theta)}{\partial\theta} = \frac{1}{2}\Big(\frac{\partial\mathrm{vec}(\Sigma(\theta_0))}{\partial\theta'}\Big)'\big(V_y^{-1}\otimes V_y^{-1}\big)\mathrm{vec}\big(W_nF' + FW_n' + Z_n\big) + o_p(1)$ and, from Equation (E.18), we get:
$$\sqrt{n}(\hat\theta - \tilde\theta) = B_0(B_0'J_0B_0)^{-1}B_0'\,\frac{1}{2}\Big(\frac{\partial\mathrm{vec}(\Sigma(\theta_0))}{\partial\theta'}\Big)'\big(V_y^{-1}\otimes V_y^{-1}\big)\mathrm{vec}\big(W_nF' + FW_n' + Z_n\big) + o_p(1). \qquad (E.19)$$
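The block structure of $\frac{\partial\mathrm{vec}(\Sigma(\theta))}{\partial\theta'}$ translates into a few Kronecker products; a minimal sketch (our illustration):

```python
import numpy as np

def dvec_sigma_dtheta(F, T):
    """Jacobian dvec(Sigma)/dtheta' for theta = (vec(F)', diag(Ve)')',
    with blocks (I_T kron F_j) + (F_j kron I_T) for each factor j and
    columns e_t kron e_t for each idiosyncratic variance."""
    k = F.shape[1]
    I = np.eye(T)
    cols = []
    for j in range(k):
        Fj = F[:, [j]]                                   # T x 1 column
        cols.append(np.kron(I, Fj) + np.kron(Fj, I))     # T^2 x T block
    for t in range(T):
        e = I[:, [t]]
        cols.append(np.kron(e, e))                       # T^2 x 1 column
    return np.hstack(cols)                               # T^2 x (Tk + T)
```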
In this subsection, we establish the asymptotic normality of estimators $\hat F$ and $\hat V_\varepsilon$. From Lemma 1, as $n\to\infty$ and $T$ is fixed, we have the Gaussian distributional limit $Z_n \Rightarrow Z$ with $\mathrm{vech}(Z) \sim N(0, \Omega_Z)$, where the asymptotic variance $\Omega_Z$ is related to the asymptotic variance $\Omega$ of $\mathcal{Z} := V_\varepsilon^{-1/2}ZV_\varepsilon^{-1/2}$ through $\mathrm{Cov}(Z_{ts}, Z_{rp}) = V_{\varepsilon,tt}^{1/2}V_{\varepsilon,ss}^{1/2}V_{\varepsilon,rr}^{1/2}V_{\varepsilon,pp}^{1/2}\,\mathrm{Cov}(\mathcal{Z}_{ts}, \mathcal{Z}_{rp})$. Moreover, $Z_n^* \Rightarrow Z^* = G'V_\varepsilon^{-1}ZV_\varepsilon^{-1}G$ and $\bar Z_n := Z_n - T_{F,V_\varepsilon}(Z_n) \Rightarrow \bar Z$, where $\bar Z = Z - T_{F,V_\varepsilon}(Z) = Z - V_\varepsilon^2\,\mathrm{diag}\big((X'X)^{-1}X'\mathrm{vech}(Z^*)\big)$ (see (E.7)). The distributional limit of $W_n$ is given next.
Lemma 7 Under Assumptions 1, 2 and A.2, A.3, A.8, as n → ∞, (a) we have Wn ⇒ W̄ , where
vec(W̄ ) ∼ N (0, ΩW ) with ΩW = Qβ ⊗ Vε , and (b) if additionally E[wi,t wi,r wi,s ] = 0, for all t, r, s
and i, then Z and W̄ are independent.
We get the following proposition from Lemmas 1 and 7 (see proof at the end of the section).
Proposition 8 Under Assumptions 1-2 and A.1-A.6, A.8, as $n\to\infty$ and $T$ is fixed, for $j = 1, \ldots, k$:
$$\sqrt{n}\,\mathrm{diag}(\hat V_\varepsilon - \tilde V_\varepsilon) \Rightarrow V_\varepsilon^2(X'X)^{-1}X'\mathrm{vech}(Z^*), \qquad (E.20)$$
$$\sqrt{n}(\hat F_j - F_j) \Rightarrow R_j(\bar WF' + F\bar W' + \bar Z)V_\varepsilon^{-1}F_j + \Lambda_j\big\{[V_\varepsilon(X'X)^{-1}X'\mathrm{vech}(Z^*)]\odot F_j\big\}, \qquad (E.21)$$
$$\sqrt{n}\,(\hat F\hat D - F)_j \Rightarrow \frac{1}{\gamma_j}(\bar WF' + F\bar W' + \bar Z)V_\varepsilon^{-1}F_j, \qquad (E.22)$$
where the deterministic matrices $R_j$ and $\Lambda_j$ are defined in Proposition 7, and $\hat D := \hat\Gamma(\hat F'\hat V_\varepsilon^{-1}\hat F)^{-1}$ with $\hat\Gamma := \mathrm{diag}(\hat\gamma_1, \ldots, \hat\gamma_k)$.
The joint asymptotic Gaussian distribution of the FA estimators involves the Gaussian matrices $Z^*$, $\bar Z$ and $\bar W$, the former two being symmetric. The asymptotic distribution of $\hat V_\varepsilon$ involves recentering around $\tilde V_\varepsilon = \frac{1}{n}\sum_{i=1}^n E[\varepsilon_i\varepsilon_i']$, i.e., the finite-sample average cross-moments of the errors, and not $V_\varepsilon$. For the asymptotic distribution of any functional that depends on $F$ up to one-to-one transformations of its columns, we can use the Gaussian law of (E.22) involving $\bar W$ and $\bar Z$ only. The asymptotic expansions (E.20)-(E.21) characterize explicitly the matrices $C_1(\theta)$ and $C_2(\theta)$ that appear in Theorem 2 of Anderson and Amemiya (1988). Their derivation is based on an asymptotic normality argument treating $\hat\theta$ as an M-estimator, see Section C.2. However, neither the asymptotic variance nor a feasible CLT are given in Anderson and Amemiya (1988). Hence, we cannot use their results for our empirics.
To further compare our Proposition 8 with Theorem 2 in Anderson and Amemiya (1988), let $\bar Z = Z - T_{F,V_\varepsilon}(Z) = \check Z - T_{F,V_\varepsilon}(\check Z)$, where $\check Z := Z - \mathrm{diag}(Z)$ is the symmetric matrix of the off-diagonal elements of $Z$ with zeros on the diagonal.³⁹ Hence, the zero-mean Gaussian matrix $\bar Z$ only involves the off-diagonal elements of $Z$. Moreover, since $V_\varepsilon^2(X'X)^{-1}X'\mathrm{vech}(\Delta_n^*) = V_\varepsilon^2\,\mathrm{diag}(V_\varepsilon^{-1}\Delta_nV_\varepsilon^{-1}) = \mathrm{diag}(\Delta_n)$ for a diagonal matrix $\Delta_n$ and $\Delta_n^* := G'V_\varepsilon^{-1}\Delta_nV_\varepsilon^{-1}G$, we can write the asymptotic expansion of $\hat V_\varepsilon$ as $\sqrt{n}\,\mathrm{diag}(\hat V_\varepsilon - \tilde V_\varepsilon) = V_\varepsilon^2(X'X)^{-1}X'\mathrm{vech}(\check Z_n^*) + \mathrm{diag}(Z_n) + o_p(1)$, where $\check Z_n^* = G'V_\varepsilon^{-1}\check Z_nV_\varepsilon^{-1}G$ and $\check Z_n := Z_n - \mathrm{diag}(Z_n)$. Thus, we get: $\sqrt{n}\,\mathrm{diag}(\hat V_\varepsilon - \tilde V_\varepsilon) \Rightarrow V_\varepsilon^2(X'X)^{-1}X'\mathrm{vech}(\check Z^*) + \mathrm{diag}(Z)$, where $\check Z^* = G'V_\varepsilon^{-1}\check ZV_\varepsilon^{-1}G$. Hence, the asymptotic distribution of the FA estimators depends on the diagonal elements of $Z$ via the term $\mathrm{diag}(Z)$ in the asymptotic distribution of $\hat V_\varepsilon$. In Theorem 2 of Anderson and Amemiya (1988), this term does not appear because, in their results, the asymptotic distribution of $\hat V_\varepsilon$ is centered around $\mathrm{diag}(\frac{1}{n}\varepsilon\varepsilon')$ instead of $\tilde V_\varepsilon$. Our recentering around $\tilde V_\varepsilon$ avoids a random bias term.
³⁹ Here, $\mathrm{diag}(Z)$ is the diagonal matrix with the same diagonal elements as $Z$.
Finally, by applying the CLT to (E.19), the asymptotic distribution of vector $\hat\theta$ is:
$$\sqrt{n}(\hat\theta - \tilde\theta) \Rightarrow B_0(B_0'J_0B_0)^{-1}B_0'\,\frac{1}{2}\Big(\frac{\partial\mathrm{vec}(\Sigma(\theta_0))}{\partial\theta'}\Big)'\big(V_y^{-1}\otimes V_y^{-1}\big)\mathrm{vec}\big(\bar WF' + F\bar W' + Z\big). \qquad (E.23)$$
The Gaussian asymptotic distribution in (E.23) matches those in (E.20) and (E.21) written for the components, and its asymptotic variance yields the “sandwich formula”. The result in (E.23) is analogous to Theorem 2 in Anderson and Amemiya (1988), up to a different factor normalization and recentering of the variance estimator.
Proof of Proposition 8: From (E.7), we have the asymptotic expansion: $\sqrt{n}\,\mathrm{diag}(\hat V_\varepsilon - \tilde V_\varepsilon) = \mathrm{diag}(\Psi_\varepsilon) + o_p(1) = V_\varepsilon^2(X'X)^{-1}X'\mathrm{vech}(Z_n^*) + o_p(1)$. Moreover, from Proposition 7 (a) and using $\Psi_y - \Psi_\varepsilon = W_nF' + FW_n' + \bar Z_n$, we have: $\sqrt{n}(\hat F_j - F_j) = R_j(\Psi_y - \Psi_\varepsilon)V_\varepsilon^{-1}F_j + \Lambda_j\Psi_\varepsilon V_\varepsilon^{-1}F_j + o_p(1) = R_j(W_nF' + FW_n' + \bar Z_n)V_\varepsilon^{-1}F_j + \Lambda_j[\mathrm{diag}(\Psi_\varepsilon)\odot(V_\varepsilon^{-1}F_j)] + o_p(1) = R_j(W_nF' + FW_n' + \bar Z_n)V_\varepsilon^{-1}F_j + \Lambda_j\{[V_\varepsilon(X'X)^{-1}X'\mathrm{vech}(Z_n^*)]\odot F_j\} + o_p(1)$. Lemmas 1 and 7 yield (E.20)-(E.21), together with (E.22) from (E.8), since $\Psi_y - \Psi_\varepsilon \Rightarrow \bar WF' + F\bar W' + \bar Z$.
Assumption 3 The standardized error processes $w_{i,t}$ in Assumption 2 are (a) stationary martingale difference sequences (mds), and (b) such that $E[w_{i,t}^2 w_{i,r} w_{i,s}] = 0$, for $t > r > s$.
Assumption 3 holds, e.g., for conditionally homoskedastic mds, and for ARCH processes (see below). Let $\mathcal{Z} := V_\varepsilon^{-1/2}ZV_\varepsilon^{-1/2}$. Then, using Lemma 1, under Assumptions 2 and 3, we have $V[\mathcal{Z}_{t,t}] = \psi(0) + 2\kappa$, $V[\mathcal{Z}_{t,s}] = \psi(t-s) + q + \kappa$ and $\mathrm{Cov}(\mathcal{Z}_{t,t}, \mathcal{Z}_{s,s}) = \psi(t-s)$, where $\psi(t-s) := \lim_{n\to\infty}\frac{1}{n}\sum_i\mathrm{Cov}(w_{i,t}^2, w_{i,s}^2)\,\sigma_{ii}^2$. The quantity $\psi(t-s)$ depends on the difference $t-s$ only, by stationarity. The other covariance terms between elements of $\mathcal{Z}$ vanish. Then, we have $\Omega = [\psi(0) - 2q]D(0) + \sum_{h=1}^{T-1}\psi(h)D(h) + (q+\kappa)I_{T(T+1)/2}$, where $D(0) = \sum_{t=1}^T\mathrm{vech}(E_{t,t})\mathrm{vech}(E_{t,t})'$ and $D(h) = \tilde D(h) + \bar D(h)$, with $\tilde D(h) = \sum_{t=1}^{T-h}[\mathrm{vech}(E_{t,t})\mathrm{vech}(E_{t+h,t+h})' + \mathrm{vech}(E_{t+h,t+h})\mathrm{vech}(E_{t,t})']$ and $\bar D(h) = \sum_{t=1}^{T-h}\mathrm{vech}(E_{t,t+h} + E_{t+h,t})\mathrm{vech}(E_{t,t+h} + E_{t+h,t})'$ for $h = 1, \ldots, T-1$, and where $E_{t,s}$ denotes the $T\times T$ matrix with entry 1 in position $(t,s)$ and 0 elsewhere. Hence, with $Z = V_\varepsilon^{1/2}\mathcal{Z}V_\varepsilon^{1/2}$, we get a parametrization $\Omega_Z(V_\varepsilon, \vartheta)$ for $V[\mathrm{vech}(Z)]$ with $\vartheta = (q+\kappa, \psi(0)-2q, \psi(1), \ldots, \psi(T-1))'$. Then, we obtain a parametric structure for $M_X\Omega_{Z^*}M_X = M_XR'\Omega RM_X$.
Hence, the parametric structure $M_X\Omega_{Z^*}M_X(V_\varepsilon, G, \tilde\vartheta)$ depends linearly on the vector $\tilde\vartheta$ that stacks the $T-1$ parameters $\psi(h) + q + \kappa$, for $h = 1, \ldots, T-1$. It does not involve the parameter $\psi(0)$, i.e., the quartic moment of the errors, because the asymptotic expansion of the LR statistic does not involve the diagonal terms of $\mathcal{Z}$. Moreover, the unknown parameters appear through the linear combinations $\psi(h) + q + \kappa$, which are the scaled variances of the off-diagonal elements of $\mathcal{Z}$. We can estimate the unknown parameters in $\tilde\vartheta$ by least squares applied to (E.24), using the nonparametric estimator $M_{\hat X}\hat\Omega_{Z^*}M_{\hat X}$ defined in Proposition 2, after half-vectorization and after replacing $V_\varepsilon$ and $G$ by their FA estimates. It yields a consistent estimator of $M_X\Omega_{Z^*}M_X$ incorporating the restrictions implied by Assumption 3.
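The matrices $D(0)$, $\tilde D(h)$ and $\bar D(h)$ are straightforward to assemble. The sketch below (our illustration) assumes the weighted half-vectorization with $\|\mathrm{vech}(S)\|^2 = \frac{1}{2}\|S\|^2$ used in Appendix B, i.e., diagonal entries scaled by $1/\sqrt{2}$.

```python
import numpy as np

def vech_w(S):
    """Weighted half-vectorization with ||vech(S)||^2 = ||S||^2 / 2
    (diagonal entries scaled by 1/sqrt(2)), the convention assumed in Appendix B."""
    m = S.shape[0]
    return np.array([S[t, s] / np.sqrt(2.0) if t == s else S[t, s]
                     for t in range(m) for s in range(t, m)])

def build_D(T):
    """Matrices D(0) and D(h) = Dtilde(h) + Dbar(h), h = 1, ..., T-1."""
    def E(t, s):
        M = np.zeros((T, T)); M[t, s] = 1.0
        return M
    D0 = sum(np.outer(vech_w(E(t, t)), vech_w(E(t, t))) for t in range(T))
    D = []
    for h in range(1, T):
        Dt = sum(np.outer(vech_w(E(t, t)), vech_w(E(t + h, t + h)))
                 + np.outer(vech_w(E(t + h, t + h)), vech_w(E(t, t)))
                 for t in range(T - h))
        Db = sum(np.outer(vech_w(E(t, t + h) + E(t + h, t)),
                          vech_w(E(t, t + h) + E(t + h, t))) for t in range(T - h))
        D.append(Dt + Db)
    return D0, D

def omega_of(T, vtheta):
    """Omega = [psi(0)-2q] D(0) + sum_h psi(h) D(h) + (q+kappa) I, with
    vtheta = (q+kappa, psi(0)-2q, psi(1), ..., psi(T-1))."""
    D0, D = build_D(T)
    p = T * (T + 1) // 2
    return (vtheta[1] * D0 + sum(vtheta[2 + h] * D[h] for h in range(T - 1))
            + vtheta[0] * np.eye(p))
```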
To get a feasible CLT for the FA estimates, we need to estimate the additional parameters $\psi(0) - 2q$ and $q + \kappa$. We consider the matrix $\hat\Omega_{Z^*}$ from Proposition 2, which involves fourth-order moments of the residuals.
Lemma 9 Under Assumptions 1-3 and A.1-A.6, and $\sqrt{n}\sum_{m=1}^{J_n}B_{m,n}^2 = o(1)$, up to pre- and post-multiplication by an orthogonal matrix and its transpose, we have $\hat\Omega_{Z^*} = R'\tilde\Xi_nR + o_p(1)$, where
$$\tilde\Xi_n = [\psi_n(0) - 2q_n]D(0) + \sum_{h=1}^{T-1}\psi_n(h)D(h) + (q_n + \kappa_n)I_{T(T+1)/2} + (q_n + \xi_n)\mathrm{vech}(I_T)\mathrm{vech}(I_T)'$$
and $\xi_n := \frac{1}{n}\sum_{m=1}^{J_n}\sum_{i\neq j\in I_m}\sigma_{ii}\sigma_{jj}$.
With blocks of equal size, the condition $\sqrt{n}\sum_{m=1}^{J_n}B_{m,n}^2 = o(1)$ holds if $J_n = n^{\bar\alpha}$ and $\bar\alpha > 1/2$.
Now, we have the relation $3D(0) + \sum_{h=1}^{T-1}D(h) - \mathrm{vech}(I_T)\mathrm{vech}(I_T)' = I_{T(T+1)/2}$, which implies $3R'D(0)R + \sum_{h=1}^{T-1}R'D(h)R - \mathrm{vech}(I_{T-k})\mathrm{vech}(I_{T-k})' = I_p$. Hence, matrix
$$R'\tilde\Xi_nR = [\psi_n(0) + q_n + 3\kappa_n]R'D(0)R + \sum_{h=1}^{T-1}[\psi_n(h) + q_n + \kappa_n]R'D(h)R + (\xi_n - \kappa_n)\mathrm{vech}(I_{T-k})\mathrm{vech}(I_{T-k})' \qquad (E.25)$$
depends on $T+1$ linear combinations of the elements of $\vartheta_n = (q_n + \kappa_n, \psi_n(0) - 2q_n, \psi_n(1), \ldots, \psi_n(T-1))'$ and $\xi_n - \kappa_n$. Thus, the linear system (E.25) is rank-deficient for identifying $\vartheta_n$. Moreover, in Assumption A.3 (b), $\kappa_n$ is defined as a double sum over squared covariances scaled by $n$, and is assumed to converge to a constant $\kappa$. Such a convergence is difficult to assume for $\xi_n$, since $\xi_n$ is a double sum over products of two variances scaled by $n$.
We apply half-vectorization to (E.25), replace the LHS by its consistent estimate $\hat\Xi$, and plug the FA estimates into the RHS. From Lemma 9, least squares estimation on such a linear regression yields consistent estimates of the linear combinations $\psi(0) + q + 3\kappa$ and $\psi(h) + q + \kappa$ for $h = 1, \ldots, T-1$. Consistency of those parameters applies independently of whether $\xi_n - \kappa_n$ converges as $n\to\infty$ or not.⁴⁰ In order to identify the components of $\vartheta$, we need an additional condition. We use the assumption $\psi(T-1) = 0$. That condition is implied by serial uncorrelation in the squared standardized errors after lag $T-1$, which is empirically relevant in our application with monthly returns data. Then, the parameter $q + \kappa$ is estimated by $\widehat{\psi_n(T-1) + q_n + \kappa_n}$, and by difference we get the estimators of $\psi(0) - 2q$ and $\psi(h)$, for $h = 1, \ldots, T-2$.
⁴⁰ To see this, write the half-vectorization of the RHS of (E.25) as $\chi\eta_n$, where $\chi$ is the $\frac{p(p+1)}{2}\times(T+1)$ matrix
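The least squares step can be coded in a few lines. The sketch below is our illustration; it assumes the inputs Xi_hat (the consistent estimate of $R'\tilde\Xi_nR$), R, and the list of $D(h)$ matrices from the previous sketch are available, together with its vech_w routine.

```python
import numpy as np

def vech_w(S):
    """Weighted half-vectorization, ||vech(S)||^2 = ||S||^2 / 2 (see above)."""
    m = S.shape[0]
    return np.array([S[t, s] / np.sqrt(2.0) if t == s else S[t, s]
                     for t in range(m) for s in range(t, m)])

def fit_theta_tilde(Xi_hat, R, D_list):
    """LS fit of the T+1 coefficients in (E.25): regress vech(Xi_hat) on the
    vech's of R'D(0)R, R'D(h)R (h = 1..T-1) and vech(I_{T-k}) vech(I_{T-k})'."""
    p = R.shape[1]                                    # p = (T-k)(T-k+1)/2
    Tk = int(round((np.sqrt(8 * p + 1) - 1) / 2))     # recover T - k from p
    v = vech_w(np.eye(Tk))
    X = np.column_stack([vech_w(R.T @ D @ R) for D in D_list]
                        + [vech_w(np.outer(v, v))])
    coef, *_ = np.linalg.lstsq(X, vech_w(Xi_hat), rcond=None)
    return coef   # [psi(0)+q+3k, psi(1)+q+k, ..., psi(T-1)+q+k, xi-k]
```

Under the identifying restriction $\psi(T-1) = 0$, the second-to-last coefficient estimates $q + \kappa$, and the remaining parameters follow by differencing, as described above.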
Let us now discuss the case of ARCH errors. Suppose the $w_{i,t}$ follow independent ARCH(1) processes with Gaussian innovations that are independent across assets, i.e., $w_{i,t} = h_{i,t}^{1/2}z_{i,t}$, $z_{i,t}\sim IIN(0,1)$, $h_{i,t} = c_i + \alpha_iw_{i,t-1}^2$ with $c_i = 1 - \alpha_i$. Then $E[w_{i,t}] = 0$, $E[w_{i,t}^2] = 1$, $\eta_i := V[w_{i,t}^2] = \frac{2}{1 - 3\alpha_i^2}$, and $\mathrm{Cov}(w_{i,t}^2, w_{i,t-h}^2) = \eta_i\alpha_i^h$. Moreover, $E[w_{i,t}w_{i,r}w_{i,s}w_{i,p}] = 0$ if one index among $t, r, s, p$ is different from all the others. Indeed, without loss of generality, suppose $t$ is different from $s, p, r$. By the law of iterated expectation: $E[\varepsilon_{i,t}\varepsilon_{i,s}\varepsilon_{i,p}\varepsilon_{i,r}] = E[E[\varepsilon_{i,t}|\{z_{i,\tau}^2\}_{\tau=-\infty}^{\infty}, \{z_{i,\tau}\}_{\tau\neq t}]\varepsilon_{i,s}\varepsilon_{i,p}\varepsilon_{i,r}] = E[h_{i,t}^{1/2}E[z_{i,t}|z_{i,t}^2]\varepsilon_{i,s}\varepsilon_{i,p}\varepsilon_{i,r}] = 0$. Then, Assumption 3 holds. The explicit formula of $\Omega$ involves $\psi(h) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\frac{2\alpha_i^h}{1 - 3\alpha_i^2}\sigma_{ii}^2$, for $h = 0, 1, \ldots, T-1$. Hence, setting $\psi(T-1) = 0$ is a mild assumption for identification purposes since $\alpha_i^{T-1}$ is small. If $\alpha_i = 0$ for all $i$, i.e., no ARCH effects, we have $\psi(0) = 2q$ and $\psi(h) = 0$ for $h > 0$, so that $\Omega = (q + \kappa)I_{T(T+1)/2}$.
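A quick simulation check of these ARCH(1) moment formulas (our illustration; the burn-in length and seed are arbitrary):

```python
import numpy as np

def simulate_arch1(n, T, alpha, burn=200, seed=0):
    """Simulate standardized ARCH(1) errors w_{i,t} = h_{i,t}^{1/2} z_{i,t},
    z_{i,t} ~ IIN(0,1), h_{i,t} = (1 - alpha) + alpha * w_{i,t-1}^2,
    so that E[w_{i,t}] = 0 and E[w_{i,t}^2] = 1."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, T + burn))
    w = np.zeros((n, T + burn))
    h = np.ones(n)
    for t in range(T + burn):
        w[:, t] = np.sqrt(h) * z[:, t]
        h = (1.0 - alpha) + alpha * w[:, t] ** 2
    return w[:, burn:]

# check Cov(w_t^2, w_{t-1}^2) ~ eta * alpha, with eta = 2 / (1 - 3 alpha^2)
w = simulate_arch1(200_000, 20, 0.2)
print(np.cov(w[:, 5] ** 2, w[:, 4] ** 2)[0, 1])   # about 2/(1-0.12)*0.2 = 0.455
```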
$T_1 > k$ and $T_2 \geq k$ are needed because we estimate residuals and betas in the first and the second sub-intervals, namely $\hat\varepsilon_{1,i} = M_{\hat F_1,\hat V_{1,\varepsilon}}(y_{1,i} - \bar y_1)$ and $\hat\beta_i = (\hat F_2'\hat V_{2,\varepsilon}^{-1}\hat F_2)^{-1}\hat F_2'\hat V_{2,\varepsilon}^{-1}(y_{2,i} - \bar y_2)$. Here, $\hat F_j$ and $\hat V_{j,\varepsilon}$ for $j = 1, 2$ are deduced from the FA estimates in the full period of $T$ observations. Define $\hat\Psi_\beta = \frac{1}{n}\sum_m\sum_{i,j\in I_m}(\hat\beta_i\hat\beta_j')\otimes(\hat\varepsilon_{1,i}\hat\varepsilon_{1,j}')$. By using $\hat\varepsilon_{1,i} = (M_{\hat F_1,\hat V_{1,\varepsilon}}F_1)\beta_i + M_{\hat F_1,\hat V_{1,\varepsilon}}(\varepsilon_{1,i} - \bar\varepsilon_1)$, $M_{\hat F_1,\hat V_{1,\varepsilon}}F_1 = O_p(\frac{1}{\sqrt n})$ and $\frac{1}{n^2}\sum_m b_{m,n}^2 = \sum_m B_{m,n}^2 = o(1)$, we get $\hat\Psi_\beta = (I_k\otimes M_{\hat F_1,\hat V_{1,\varepsilon}})\big\{\frac{1}{n}\sum_m\sum_{i,j\in I_m}(\hat\beta_i\hat\beta_j')\otimes[(\varepsilon_{1,i} - \bar\varepsilon_1)(\varepsilon_{1,j} - \bar\varepsilon_1)']\big\}(I_k\otimes M_{\hat F_1,\hat V_{1,\varepsilon}}') + o_p(1)$. Now, we use $\hat\beta_i = (\hat F_2'\hat V_{2,\varepsilon}^{-1}\hat F_2)^{-1}\hat F_2'\hat V_{2,\varepsilon}^{-1}F_2\beta_i + (\hat F_2'\hat V_{2,\varepsilon}^{-1}\hat F_2)^{-1}\hat F_2'\hat V_{2,\varepsilon}^{-1}(\varepsilon_{2,i} - \bar\varepsilon_2)$, and $\bar\varepsilon_1 = o_p(n^{-1/4})$, $\bar\varepsilon_2 = o_p(n^{-1/4})$ from Lemma 5 (a), as well as the mds condition in Assumption 3. We get $\hat\Psi_\beta = \hat\Psi_{\beta,1} + \hat\Psi_{\beta,2} + o_p(1)$, where $\hat\Psi_{\beta,1} = (I_k\otimes M_{F_1,V_{1,\varepsilon}})\big\{\frac{1}{n}\sum_m\sum_{i,j\in I_m}(\beta_i\beta_j')\otimes(\varepsilon_{1,i}\varepsilon_{1,j}')\big\}(I_k\otimes M_{F_1,V_{1,\varepsilon}}')$ and $\hat\Psi_{\beta,2} = \big\{[(F_2'V_{2,\varepsilon}^{-1}F_2)^{-1}F_2'V_{2,\varepsilon}^{-1}]\otimes M_{F_1,V_{1,\varepsilon}}\big\}\big\{\frac{1}{n}\sum_m\sum_{i,j\in I_m}(\varepsilon_{2,i}\varepsilon_{2,j}')\otimes(\varepsilon_{1,i}\varepsilon_{1,j}')\big\}\big\{[(F_2'V_{2,\varepsilon}^{-1}F_2)^{-1}F_2'V_{2,\varepsilon}^{-1}]'\otimes M_{F_1,V_{1,\varepsilon}}'\big\}$. We use $\frac{1}{n}\sum_m\sum_{i,j\in I_m}(\beta_i\beta_j')\otimes(\varepsilon_{1,i}\varepsilon_{1,j}') = Q_\beta\otimes V_{1,\varepsilon} + o_p(1)$, and $\frac{1}{n}\sum_m\sum_{i,j\in I_m}(\varepsilon_{2,i}\varepsilon_{2,j}')\otimes(\varepsilon_{1,i}\varepsilon_{1,j}') = \Omega_{21} + o_p(1)$, where $\Omega_{21}$ is the sub-block of matrix $\Omega_Z$ that is the asymptotic variance of $\frac{1}{\sqrt n}\sum_{i=1}^n\varepsilon_{2,i}\otimes\varepsilon_{1,i} \Rightarrow N(0, \Omega_{21})$. Then, $\hat\Psi_\beta = Q_\beta\otimes(M_{F_1,V_{1,\varepsilon}}V_{1,\varepsilon}) + \big\{[(F_2'V_{2,\varepsilon}^{-1}F_2)^{-1}F_2'V_{2,\varepsilon}^{-1}]\otimes M_{F_1,V_{1,\varepsilon}}\big\}\Omega_{21}\big\{[(F_2'V_{2,\varepsilon}^{-1}F_2)^{-1}F_2'V_{2,\varepsilon}^{-1}]'\otimes M_{F_1,V_{1,\varepsilon}}'\big\} + o_p(1)$. Thus, we get a consistent estimator of $Q_\beta\otimes(V_{1,\varepsilon}^{-1/2}M_{F_1,V_{1,\varepsilon}}V_{1,\varepsilon}^{1/2})$ by subtracting from $\hat\Psi_\beta$ a consistent estimator of the second term on the RHS (bias term),⁴² and then by pre- and post-multiplying by $(I_k\otimes\hat V_{1,\varepsilon}^{-1/2})$. To get a consistent estimator of $Q_\beta$, we apply a linear transformation that amounts to computing the trace of the second term of a Kronecker product, and divide by $Tr(V_{1,\varepsilon}^{-1/2}M_{F_1,V_{1,\varepsilon}}V_{1,\varepsilon}^{1/2}) = T_1 - k$. Thus:
$$\hat Q_\beta = \frac{1}{n(T_1-k)}\sum_m\sum_{i,j\in I_m}(\hat\beta_i\hat\beta_j')(\hat\varepsilon_{1,j}'\hat V_{1,\varepsilon}^{-1}\hat\varepsilon_{1,i}) - \frac{1}{T_1-k}\sum_{j=1}^{T_1}(I_k\otimes e_j')\big\{[(\hat F_2'\hat V_{2,\varepsilon}^{-1}\hat F_2)^{-1}\hat F_2'\hat V_{2,\varepsilon}^{-1}]\otimes[\hat V_{1,\varepsilon}^{-1/2}M_{\hat F_1,\hat V_{1,\varepsilon}}]\big\}\hat\Omega_{21}\big\{[\hat V_{2,\varepsilon}^{-1}\hat F_2(\hat F_2'\hat V_{2,\varepsilon}^{-1}\hat F_2)^{-1}]\otimes[M_{\hat F_1,\hat V_{1,\varepsilon}}'\hat V_{1,\varepsilon}^{-1/2}]\big\}(I_k\otimes e_j),$$
where the $e_j$ are $T_1$-dimensional unit vectors, and $\hat\Omega_{21}$ is obtained from Subsection E.5.3 i). If the estimate $\hat Q_\beta$ is not positive definite, we regularize it by deleting the negative eigenvalues.
⁴² Sample splitting makes the estimation of the bias easier, but we can avoid such a splitting at the expense of a more complicated debiasing procedure.
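The final regularization step, deleting negative eigenvalues, can be sketched as follows (our illustration):

```python
import numpy as np

def regularize_psd(Q):
    """Regularize a symmetric estimate by deleting its negative eigenvalues,
    as done for the estimate of Q_beta when it is not positive definite."""
    w, V = np.linalg.eigh((Q + Q.T) / 2)     # symmetrize for numerical safety
    return (V * np.maximum(w, 0.0)) @ V.T
```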
iii) Joint feasible CLT
To get a feasible CLT for the FA estimators from (E.20)-(E.21), we need the joint distribution of the Gaussian matrix variates $Z$ and $W$. Under the condition of Lemma 7 (b), the estimates of the asymptotic variances of $\mathrm{vech}(Z)$ and $\mathrm{vec}(W)$ are enough, since these vectors are independent. Otherwise, to estimate the covariance $\mathrm{Cov}(\mathrm{vech}(Z), \mathrm{vec}(W))$, we need to extend the approaches of the previous subsections.
In this subsection, we particularize the asymptotic distributions of the FA estimators for three
special cases along the lines of Section 4, plus a fourth special case that allows us to further discuss
the link with Anderson and Amemiya (1988).
i) Gaussian errors
When the errors admit a Gaussian distribution $\varepsilon_i \overset{ind}{\sim} N(0, \sigma_{ii}V_\varepsilon)$ with diagonal $V_\varepsilon$, the matrix $\frac{1}{\sqrt q}V_\varepsilon^{-1/2}ZV_\varepsilon^{-1/2}$ is in the GOE for dimension $T$, i.e., $\frac{1}{\sqrt q}\mathrm{vech}(V_\varepsilon^{-1/2}ZV_\varepsilon^{-1/2}) \sim N(0, I_{T(T+1)/2})$, where $q = \lim_{n\to\infty}\frac{1}{n}\sum_i\sigma_{ii}^2$. Moreover, $\mathrm{vec}(W) \sim N(0, Q_\beta\otimes V_\varepsilon)$, where $Q_\beta = \lim_{n\to\infty}\frac{1}{n}\sum_i\sigma_{ii}\beta_i\beta_i'$, mutually independent of $Z$ because of the symmetry of the Gaussian distribution.
ii) Quasi GOE errors
As an extension of the previous case, let us suppose here that the errors meet Assumption 2 and Conditions (a) and (b) in Proposition 3, plus additionally (c) $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^nV(\varepsilon_{i,t}^2) = \eta V_{\varepsilon,tt}^2$, for a constant $\eta > 0$, and (d) $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^nE[\varepsilon_{i,t}^2\varepsilon_{i,r}\varepsilon_{i,p}] = 0$ for $r \neq p$. This setting allows, e.g., for conditionally homoskedastic mds processes in the errors, but excludes ARCH effects. Then, the arguments in Lemma 1 imply $\mathrm{vech}(V_\varepsilon^{-1/2}ZV_\varepsilon^{-1/2}) \sim N(0, \Omega)$ with
$$\Omega = \begin{pmatrix}(\eta/2 + \kappa)I_T & 0 \\ 0 & (q + \kappa)I_{\frac{1}{2}T(T-1)}\end{pmatrix}.$$
The distribution of $V_\varepsilon^{-1/2}ZV_\varepsilon^{-1/2}$ is similar to the (scaled) GOE holding in the Gaussian case, up to the variances of the diagonal and of the off-diagonal elements being different when $\eta \neq 2q$. Hence, contrasting with the test statistics, the asymptotic distributions of the FA estimates differ in cases i) and ii) beyond scaling factors. It is because the asymptotic distributions of the FA estimates involve the diagonal elements of $Z$ as well.
iii) Spherical errors
Let us consider the case $\varepsilon_i \overset{ind}{\sim} (0, \sigma_{ii}V_\varepsilon)$ where $V_\varepsilon = \bar\sigma^2I_T$, with independent components across time and the normalization $\lim_{n\to\infty}\frac{1}{n}\sum_i\sigma_{ii} = 1$. By repeating the arguments of Section E.3 for the constrained FA estimators (see Section 4.3), we get $Tr(M_F(\Psi_y - \Psi_\varepsilon)M_F) = 0$ instead of equation (E.4). It yields the asymptotic expansions $\sqrt n(\hat\sigma^2 - \tilde\sigma^2) = \frac{1}{T-k}Tr(M_FZ_n) + o_p(1) = \frac{\bar\sigma^2}{T-k}Tr(Z_n^*) + o_p(1)$, and $\sqrt n(\hat F_j - F_j) = \frac{1}{\bar\sigma^2}R_j(\Psi_y - \Psi_\varepsilon)F_j - \frac{1}{\bar\sigma^2}\Lambda_j\Psi_\varepsilon F_j + o_p(1) = \frac{1}{\bar\sigma^2}R_j(W_nF' + FW_n' + \bar Z_n)F_j + o_p(1)$, where we use $\Psi_y - \Psi_\varepsilon = W_nF' + FW_n' + \bar Z_n$, $\Psi_\varepsilon = \frac{1}{T-k}Tr(M_FZ_n)I_T$ and $\Lambda_jF_j = 0$, and $\bar Z_n = Z_n - \frac{1}{T-k}Tr(M_FZ_n)I_T$. Moreover, by sphericity, we have $R_j = \frac{1}{2\gamma_j}P_{F_j} + \frac{1}{\gamma_j}M_F + \sum_{\ell=1,\ell\neq j}^k\frac{1}{\gamma_j - \gamma_\ell}P_{F_\ell}$. Thus, we get $\sqrt n(\hat\sigma^2 - \tilde\sigma^2) \Rightarrow \frac{\bar\sigma^2}{T-k}Tr(Z^*)$ and $\sqrt n(\hat F_j - F_j) \Rightarrow \frac{1}{\bar\sigma^2}R_j(WF' + FW' + \bar Z)F_j$.⁴³ The Gaussian matrix $Z$ is such that $Z_{tt} \sim N(0, \eta)$ and $Z_{t,s} \sim N(0, q)$ for $t \neq s$, mutually independent, where $\eta = \lim_{n\to\infty}\frac{1}{n}\sum_iV[\varepsilon_{i,t}^2]$, and $\mathrm{vec}(W) \sim N(0, Q_\beta\otimes I_T)$. Variables $Z$ and $W$ are independent if $E[\varepsilon_{i,t}^3] = 0$. FGS, Section 4.3.1, explain how we can estimate $q$ and $\eta$ by solving a system of two linear equations based on estimated moments of $\hat\varepsilon_{i,t}$.
⁴³ The asymptotic distribution of estimator $\hat\sigma^2$ coincides with that derived in FGS with perturbation theory methods. The asymptotic distribution of the factor estimates slightly differs from that given in FGS, Section 5.1, because of the different factor normalization adopted by FA compared to PCA, even under sphericity.
iv) Cross-sectionally homoskedastic errors and link with Anderson and Amemiya (1988)
Let us now make the link with the distributional results in Anderson and Amemiya (1988). In our setting, the conditions analogous to those in their Corollary 2 would be: (a) random effects for the loadings that are i.i.d. with $E[\beta_i] = 0$, $V[\beta_i] = I_k$, (b) error terms that are i.i.d. $\varepsilon_i \sim (0, V_\varepsilon)$ with $V_\varepsilon = \mathrm{diag}(V_{\varepsilon,11}, \ldots, V_{\varepsilon,TT})$ such that $E[\varepsilon_{i,t}\varepsilon_{i,r}\varepsilon_{i,s}\varepsilon_{i,p}] = V_{\varepsilon,tt}V_{\varepsilon,ss}$, for $t = r > s = p$, and $= 0$, otherwise, and (c) $\beta_i$ and $\varepsilon_i$ mutually independent. Thus, $\sigma_{ii} = 1$ for all $i$, i.e., the errors are cross-sectionally homoskedastic. Under the aforementioned Conditions (a)-(c), the Gaussian distributional limits $Z$ and $W$ are such that $V[Z_{tt}] = \eta_tV_{\varepsilon,tt}^2$, for $\eta_t := V[\varepsilon_{i,t}^2]/V_{\varepsilon,tt}^2$, $V[Z_{ts}] = V_{\varepsilon,tt}V_{\varepsilon,ss}$, for $t \neq s$, all covariances among different elements of $Z$ vanish, and $V[\mathrm{vec}(W)] = I_k\otimes V_\varepsilon$. Equations (E.20)-(E.21) yield the asymptotic distributions of the FA estimates. In particular, they do not depend on the distribution of the $\beta_i$. Moreover, the distribution of the off-diagonal elements of $Z$ does not depend on the distribution of the errors, while, for the diagonal terms, we have $\eta_t = 2$ for Gaussian errors. As remarked in Section E.5.2, if the asymptotic distribution of estimator $\hat V_\varepsilon$ is centered around the realized matrix $\frac{1}{n}\sum_i\varepsilon_i\varepsilon_i'$ instead of its expected value, that distribution involves the off-diagonal elements of $Z$ and the elements of $W$. Hence, in that case, the asymptotic distribution of the FA estimates is the same whether the errors are Gaussian or not, and depends on $F$ and $V_\varepsilon$ only, as found in Anderson and Amemiya (1988).
In this subsection, we consider the transformation $O$ that maps matrix $\hat G$ into $\hat GO$, where $O$ is an orthogonal matrix in $\mathbb{R}^{(T-k)\times(T-k)}$, and the transformation $O_D$ that maps matrix $\hat D$ into $\hat DO_D$, where $O_D$ is an orthogonal matrix in $\mathbb{R}^{df\times df}$. These transformations are induced by the freedom in choosing the orthonormal bases spanning the orthogonal complements of $\hat F$ and $\hat X$. We show that they imply a group of orthogonal transformations on the vector $\hat W = \sqrt n\,\hat D'\mathrm{vech}(\hat S^*)$, with $\hat S^* = \hat G'\hat V_\varepsilon^{-1}(\hat V_y - \hat V_\varepsilon)\hat V_\varepsilon^{-1}\hat G$, and establish the maximal invariant.
Under the transformation $O$, matrix $\hat S^*$ is mapped into $O^{-1}\hat S^*O$. This transformation is mirrored by a linear mapping at the level of the half-vectorized form $\mathrm{vech}(\hat S^*)$. In fact, this mapping is norm-preserving, since $\|\mathrm{vech}(S)\|^2 = \frac{1}{2}\|S\|^2$ and $\|O^{-1}SO\| = \|S\|$ for any conformable symmetric matrix $S$ and orthogonal matrix $O$. This mapping is characterized in the next lemma.
Lemma 10 For any symmetric matrix $S$ and orthogonal matrix $O$ in $\mathbb{R}^{m\times m}$, we have $\mathrm{vech}(O^{-1}SO) = R(O)\mathrm{vech}(S)$, where $R(O) = \frac{1}{2}A_m'(O'\otimes O')A_m$ is an orthogonal matrix, and $A_m$ is the duplication matrix defined in Appendix B. Transformations $R(O)$ with orthogonal $O$ have the structure of a group: (a) $R(I_m) = I_{\frac{1}{2}m(m+1)}$, (b) $R(O_1)R(O_2) = R(O_2O_1)$, and (c) $[R(O)]^{-1} = R(O^{-1})$.
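Lemma 10 is easy to verify numerically. The sketch below (our illustration) builds $A_m$ under the weighted half-vectorization convention $\|\mathrm{vech}(S)\|^2 = \frac{1}{2}\|S\|^2$, so that properties (i)-(iii) used in the proof hold, and checks the mapping and the orthogonality of $R(O)$.

```python
import numpy as np

def build_Am(m):
    """Matrix A_m with vec(S) = A_m vech(S) under the weighted half-vectorization
    ||vech(S)||^2 = ||S||^2 / 2, so that A_m' A_m = 2 I (properties (i)-(iii))."""
    p = m * (m + 1) // 2
    A = np.zeros((m * m, p))
    col = 0
    for t in range(m):
        for s in range(t, m):
            if t == s:
                A[t * m + t, col] = np.sqrt(2.0)   # diagonal entries scaled in vech
            else:
                A[t * m + s, col] = 1.0
                A[s * m + t, col] = 1.0
            col += 1
    return A

def R_of(O, A):
    return 0.5 * A.T @ np.kron(O.T, O.T) @ A

# numerical check of Lemma 10 for m = 4
rng = np.random.default_rng(1)
m = 4
A = build_Am(m)
S = rng.standard_normal((m, m)); S = (S + S.T) / 2
O, _ = np.linalg.qr(rng.standard_normal((m, m)))
vech = lambda M: 0.5 * A.T @ M.reshape(-1)         # vech(S) = A' vec(S) / 2
assert np.allclose(R_of(O, A) @ vech(S), vech(O.T @ S @ O))
assert np.allclose(R_of(O, A).T @ R_of(O, A), np.eye(A.shape[1]))  # orthogonality
```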
With this lemma, we can give the transformation rules under $O$ for a set of relevant statistics in the next proposition. We denote generically with $\tilde{\cdot}$ a quantity computed with $\hat GO$ instead of $\hat G$.
Proposition 9 Under the transformation $O$: (a) $\mathrm{vech}(\tilde S^*) = R(O)\mathrm{vech}(\hat S^*)$, (b) $\tilde X = R(O)X$, (c) $I_p - \tilde X(\tilde X'\tilde X)^{-1}\tilde X' = R(O)[I_p - X(X'X)^{-1}X']R(O)^{-1}$, (d) $\tilde R = RR(O)^{-1}$, (e) $\tilde R(I_p - \tilde X(\tilde X'\tilde X)^{-1}\tilde X') = R(I_p - X(X'X)^{-1}X')R(O)^{-1}$.
From Proposition 9 (c), under the transformation $O$, matrix $\hat D$ is mapped into $R(O)\hat D$. Combining with the transformation $O_D$, we have $\tilde D = R(O)\hat DO_D$. Thus, using Proposition 9 (a), under $O$ and $O_D$, vector $\hat W$ is mapped into $\tilde W = \sqrt n\,\tilde D'\mathrm{vech}(\tilde S^*) = O_D'\hat W$. Thus, statistic $\hat W$ is invariant under $O$, while $O_D$ operates as the group of orthogonal transformations. The maximal invariant under this group of transformations is the squared norm $\|\hat W\|^2 = \hat W'\hat W$.
Proof of Proposition 9: With $\tilde S^* = O^{-1}\hat S^*O$, part (a) follows from Lemma 10. Let $\tilde G = GO$. Then, for any diagonal matrix $\Delta$, on the one hand, we have $\mathrm{vech}(\tilde G'\Delta\tilde G) = \tilde X\mathrm{diag}(\Delta)$, and on the other hand, we have $\mathrm{vech}(\tilde G'\Delta\tilde G) = \mathrm{vech}(O^{-1}G'\Delta GO) = R(O)\mathrm{vech}(G'\Delta G) = R(O)X\mathrm{diag}(\Delta)$. By equating the two expressions for any diagonal matrix $\Delta$, part (b) follows. Statement (c) is a consequence thereof and of $R(O)$ being orthogonal. Moreover, with $\tilde Q = QO$ and using $\mathrm{vech}(\tilde Q'Z\tilde Q) = \mathrm{vech}(O^{-1}Q'ZQO) = R(O)R'\mathrm{vech}(Z)$, we deduce part (d). Statement (e) is a consequence of (c) and (d).
Proof of Lemma 4: The equivalence of conditions (a) and (b) is a consequence of the fact that the function $\mathcal{L}(A) = -\frac{1}{2}\log|A| - \frac{1}{2}Tr(V_y^0A^{-1})$, where $A$ is a p.d. matrix, is uniquely maximized for $A = V_y^0$ (see Magnus and Neudecker (2007), p. 410), and $L_0(\theta) = \mathcal{L}(\Sigma(\theta))$.
Proof of Lemma 5: (a) From Assumption 2, we have $E[\bar\varepsilon] = 0$ and $V[\bar\varepsilon] = V\big[\frac{1}{n}\sum_{i,k=1}^ns_{i,k}V_\varepsilon^{1/2}w_k\big] = V_\varepsilon^{1/2}\frac{1}{n^2}\sum_{i,j,k,l=1}^ns_{i,k}s_{j,l}E[w_kw_l']V_\varepsilon^{1/2} = \big(\frac{1}{n^2}\sum_{i,j}\sigma_{i,j}\big)V_\varepsilon$, where the $s_{i,k}$ are the elements of $\Sigma^{1/2}$. Now, $\frac{1}{n^2}\sum_{i,j=1}^n\sigma_{i,j} \leq C\frac{1}{n^2}\sum_{m=1}^{J_n}b_{m,n}^{1+\delta} = O\big(n^{\delta-1}\sum_{m=1}^{J_n}B_{m,n}^{1+\delta}\big) = O\big(n^{\delta-1}J_n^{1/2}\big(\sum_{m=1}^{J_n}B_{m,n}^{2(1+\delta)}\big)^{1/2}\big) = o(n^{-1}J_n^{1/2}) = o(n^{-1/2})$ from the Cauchy-Schwarz inequality and Assumptions 2 (c) and (d). Part (a) follows. To prove part (b), we use $E[\frac{1}{n}\varepsilon\varepsilon'] \to V_\varepsilon^0$ and $V[\mathrm{vech}((V_\varepsilon^0)^{-1/2}(\frac{1}{n}\varepsilon\varepsilon')(V_\varepsilon^0)^{-1/2})] = \frac{1}{n}\Omega_n$ from the proof of Lemma 1, and $\frac{1}{n}\Omega_n = o(1)$ by Assumption A.3. Finally, to show part (c), write $\frac{1}{n}\sum_{i=1}^n\varepsilon_i\beta_i' = (V_\varepsilon^0)^{1/2}\frac{1}{n}\sum_{i,j=1}^ns_{i,j}w_j\beta_i'$. Then, $E[\frac{1}{n}\sum_{i=1}^n\varepsilon_i\beta_i'] = 0$, while the variance of $\mathrm{vec}(\frac{1}{n}\sum_{i=1}^n\varepsilon_i\beta_i')$ vanishes asymptotically since $V[\mathrm{vec}(\frac{1}{n}\sum_{i,j=1}^ns_{i,j}w_j\beta_i')] = \frac{1}{n^2}\sum_{i,j,m,l=1}^ns_{i,j}s_{m,l}(\beta_i\beta_l')\otimes E[w_jw_m'] = \frac{1}{n^2}\sum_{i,l=1}^n\sigma_{i,l}(\beta_i\beta_l')\otimes I_T = o(1)$ under Assumptions 2 and A.2.
Proof of Lemma 6: From the arguments in the proof of Proposition 7 with $\Psi_y = 0$, the solution of the FOC is such that $\Psi_{F,j}^\epsilon = (\Lambda_j^0 - R_j^0)\Psi_{V_\varepsilon}^\epsilon(V_{\varepsilon 0})^{-1}F_j$ for $j = 1, \ldots, k$, and $\mathrm{diag}(M_{F_0,V_{\varepsilon 0}}\Psi_{V_\varepsilon}^\epsilon M_{F_0,V_{\varepsilon 0}}') = 0$. Since $\Psi_{V_\varepsilon}^\epsilon$ is diagonal, the latter equation yields $M_{F_0,V_{\varepsilon 0}}^{\odot 2}\mathrm{diag}(\Psi_{V_\varepsilon}^\epsilon) = 0$. Under condition (a) of Lemma 6, we get $\Psi_{V_\varepsilon}^\epsilon = 0$, which in turn implies $\Psi_F^\epsilon = 0$. Thus, condition (a) is sufficient for local identification. It is also necessary to get uniqueness of the solution $\Psi_{V_\varepsilon}^\epsilon = 0$. Moreover, conditions (a) and (b) of Lemma 6 are equivalent, as shown in Appendix E.3. Further, conditions (a) and (c) are equivalent since $\Phi^{\odot 2} = M_{F_0,V_{\varepsilon 0}}^{\odot 2}(V_{\varepsilon 0})^2$. Finally, let us show that condition (d) of Lemma 6 is both sufficient and necessary for local identification. The FOC for the Lagrangian problem are $\frac{\partial L_0(\theta)}{\partial\theta} - \frac{\partial g(\theta)'}{\partial\theta}\lambda_L = 0$ and $g(\theta) = 0$, where $\lambda_L$ is the Lagrange multiplier vector. By expanding at first order around $\theta_0$ and $\lambda_0 = 0$, we get
$$H_0\begin{pmatrix}\theta - \theta_0 \\ \lambda\end{pmatrix} = 0, \quad \text{where } H_0 := \begin{pmatrix}J_0 & A_0 \\ A_0' & 0\end{pmatrix}, \text{ with } A_0 = \frac{\partial g(\theta_0)'}{\partial\theta},$$
is the bordered Hessian. The parameters are locally identified if, and only if, $H_0$ is invertible. The latter condition is equivalent to $B_0'J_0B_0$ being invertible.⁴⁴
Proof of Lemma 7: By Assumption 2, $\mathrm{vec}(W_n) = (I_k\otimes V_\varepsilon^{1/2})\frac{1}{\sqrt n}\sum_{m=1}^{J_n}x_{m,n}$, where the $x_{m,n} := \sum_{i,j\in I_m}s_{i,j}(\beta_i\otimes w_j)$ are independent across $m$. Now, we apply the Liapunov CLT to show $\frac{1}{\sqrt n}\sum_{m=1}^{J_n}x_{m,n} \Rightarrow N(0, Q_\beta\otimes I_T)$. We have $E[x_{m,n}] = 0$ and $E[x_{m,n}x_{m,n}'] = \sum_{i,j\in I_m}\sigma_{i,j}\beta_i\beta_j'\otimes I_T$ and, by Assumption A.8, $\Omega_{W,n} := \frac{1}{n}\sum_{m=1}^{J_n}E[x_{m,n}x_{m,n}']$ converges to the positive definite matrix $Q_\beta\otimes I_T$. Let us now check the multivariate Liapunov condition $\|\Omega_{W,n}^{-1/2}\|^4\frac{1}{n^2}\sum_{m=1}^{J_n}E[\|x_{m,n}\|^4] = o(1)$. Since $\|\Omega_{W,n}^{-1/2}\| = O_p(1)$, it suffices to prove $\frac{1}{n^2}\sum_{m=1}^{J_n}E[(x_{m,n}^{p,t})^4] = o(1)$, for any $p = 1, \ldots, k$
where $\Omega = D + \kappa I_{T(T+1)/2} = [\psi(0) - 2q]D(0) + \sum_{h=1}^{T-1}\psi(h)[\tilde D(h) + \bar D(h)] + (q+\kappa)I_{T(T+1)/2}$. Then, since the columns of $R$ are orthonormal, we get $M_X\Omega_{Z^*}M_X = [\psi(0) - 2q]M_XR'D(0)RM_X + \sum_{h=1}^{T-1}\psi(h)M_XR'\tilde D(h)RM_X + \sum_{h=1}^{T-1}\psi(h)M_XR'\bar D(h)RM_X + (q+\kappa)M_X$. Now, we show that the first two terms in this sum are nil. We have $G'E_{t,t}G = Q'V_\varepsilon^{1/2}E_{t,t}V_\varepsilon^{1/2}Q = V_{\varepsilon,tt}Q'E_{t,t}Q$ and thus $\mathrm{vech}(G'E_{t,t}G) = V_{\varepsilon,tt}\mathrm{vech}(Q'E_{t,t}Q) = V_{\varepsilon,tt}R'\mathrm{vech}(E_{t,t})$ (see the proof of Proposition 2). Hence, the kernel of matrix $M_X$ is spanned by the vectors $R'\mathrm{vech}(E_{t,t})$, for $t = 1, \ldots, T$. We deduce that $M_XR'D(0) = 0$ and $M_XR'\tilde D(h)RM_X = 0$. Furthermore, from $I_{T(T+1)/2} = 2\sum_{t=1}^T\mathrm{vech}(E_{t,t})\mathrm{vech}(E_{t,t})' + \sum_{t<s}\mathrm{vech}(E_{t,s} + E_{s,t})\mathrm{vech}(E_{t,s} + E_{s,t})' = 2D(0) + \sum_{h=1}^{T-1}\bar D(h)$, we get $M_X = M_XR'I_{T(T+1)/2}RM_X = \sum_{h=1}^{T-1}M_XR'\bar D(h)RM_X$. The conclusion follows.
The non-zero contributions to the term in the curly brackets come from the combinations with $a = b = c = d$, $a = b \neq c = d$, $a = c \neq b = d$ and $a = d \neq b = c$, yielding: $\sum_{a,b,c,d}\sigma_{a,b}\sigma_{c,d}E[(w_a\otimes w_b)(w_c\otimes w_d)'] = \sum_a\sigma_{a,a}^2E[(w_aw_a')\otimes(w_aw_a')] + (\sum_{a\neq c}\sigma_{a,a}\sigma_{c,c})\mathrm{vec}(I_T)\mathrm{vec}(I_T)' + (\sum_{a\neq b}\sigma_{a,b}^2)(I_{T^2} + K_{T,T}) = \sum_a[\sigma_{a,a}^2V(w_a\otimes w_a)] + (\sum_a\sigma_{a,a})^2\mathrm{vec}(I_T)\mathrm{vec}(I_T)' + (\sum_{a\neq b}\sigma_{a,b}^2)(I_{T^2} + K_{T,T})$. Then, using $w_a\otimes w_a = A_T\mathrm{vech}(w_aw_a')$, we get $\frac{1}{4}A_T'\big\{\sum_{a,b,c,d\in I_m}\sigma_{a,b}\sigma_{c,d}E[(w_a\otimes w_b)(w_c\otimes w_d)']\big\}A_T = \sum_a[\sigma_{a,a}^2V(\mathrm{vech}(w_aw_a'))] + (\sum_a\sigma_{a,a})^2\mathrm{vech}(I_T)\mathrm{vech}(I_T)' + (\sum_{a\neq b}\sigma_{a,b}^2)I_{T(T+1)/2}$. Then, since $\frac{1}{n}\sum_{i=1}^n\sigma_{i,i}^2V[\mathrm{vech}(w_iw_i')] = D_n$, where matrix $D_n$ is defined in Assumption A.6, we get $\hat\Omega_{Z^*} = R'\tilde\Xi_nR + o_p(1)$, where $\tilde\Xi_n = D_n + (q_n + \xi_n)\mathrm{vech}(I_T)\mathrm{vech}(I_T)' + \kappa_nI_{T(T+1)/2}$. Moreover, under Assumption 3, and singling out parameter $q_n$ along the diagonal, we have $D_n = [\psi_n(0) - 2q_n]D(0) + \sum_{h=1}^{T-1}\psi_n(h)[\tilde D(h) + \bar D(h)] + q_nI_{T(T+1)/2}$. The conclusion follows.
Proof of Lemma 10: We use $\mathrm{vec}(S) = A_m\mathrm{vech}(S)$, where the $m^2\times\frac{1}{2}m(m+1)$ matrix $A_m$ is such that: (i) $A_m'A_m = 2I_{\frac{1}{2}m(m+1)}$, (ii) $K_{m,m}A_m = A_m$, where $K_{m,m}$ is the commutation matrix for order $m$, and (iii) $A_mA_m' = I_{m^2} + K_{m,m}$ (see the proof of Proposition 2 and also Theorem 12 in Magnus and Neudecker (2007), Chapter 2.8). Then, $\mathrm{vech}(S) = \frac{1}{2}A_m'\mathrm{vec}(S)$ by property (i), and $\mathrm{vech}(O^{-1}SO) = \frac{1}{2}A_m'\mathrm{vec}(O^{-1}SO) = \frac{1}{2}A_m'(O'\otimes O')\mathrm{vec}(S) = \frac{1}{2}A_m'(O'\otimes O')A_m\mathrm{vech}(S)$, for all symmetric matrices $S$. It follows that $R(O) = \frac{1}{2}A_m'(O'\otimes O')A_m$. Moreover, by properties (i)-(iii), we have (a) $R(I_m) = I_{\frac{1}{2}m(m+1)}$, (b) $R(O_1)R(O_2) = \frac{1}{4}A_m'(O_1'\otimes O_1')A_mA_m'(O_2'\otimes O_2')A_m = \frac{1}{4}A_m'(O_1'\otimes O_1')(I_{m^2} + K_{m,m})(O_2'\otimes O_2')A_m = \frac{1}{4}A_m'(O_1'O_2'\otimes O_1'O_2')(I_{m^2} + K_{m,m})A_m = \frac{1}{2}A_m'[(O_2O_1)'\otimes(O_2O_1)']A_m = R(O_2O_1)$, and thus (c) $[R(O)]^{-1} = R(O^{-1})$.
F.1 Calibration of ν̄, λ and λ̄
To calibrate the bounds $\bar\nu$, $\lambda$ and $\bar\lambda$ with realistic values, we run the following numerical experiment. For $T = 20$ and $k = 7$, we simulate 10,000 draws of a random $T\times k$ matrix $\tilde F$ such that $\mathrm{vec}(\tilde F) \sim N(0, I_{Tk})$ and set $F = V_\varepsilon^{1/2}U\Gamma^{1/2}$, $U = \tilde F(\tilde F'\tilde F)^{-1/2}$, $G = V_\varepsilon^{1/2}Q$, $Q = \tilde Q(\tilde Q'\tilde Q)^{-1/2}$, where $\tilde Q$ collects the first $T-k$ columns of $I_T - UU'$, for $V_\varepsilon = \mathrm{diag}(V_{\varepsilon,11}, \ldots, V_{\varepsilon,TT})$, with $V_{\varepsilon,tt} = 1.5$ for $t = 1, \ldots, 10$, and $V_{\varepsilon,tt} = 0.5$ for $t = 11, \ldots, 20$, and $\Gamma = T\,\mathrm{diag}(4, 3.5, 3, 2.5, 2, 1.5, 1)$, $c_{k+1} = 10T$, and $\xi_{k+1} = e_1$. With these choices, the “signal-to-noise” ratios $\frac{1}{T}F_j'V_\varepsilon^{-1}F_j$ for the seven factors $j = 1, \ldots, 7$ are $4, 3.5, 3, 2.5, 2, 1.5, 1$, and the “signal-to-noise” ratio for the weak factor is $\frac{1}{T}F_{k+1}'V_\varepsilon^{-1}F_{k+1} = 10n^{-1/2}$. Moreover, the errors follow the ARCH model of Section E.5.3 (i) with ARCH parameters either (a) $\alpha_i = 0.2$ for all $i$, or (b) $\alpha_i = 0.5$ for all $i$, and $q = 4$ and $\kappa = 0$ (cross-sectional independence). The choices $\alpha_i = 0.2, 0.5$ both meet the condition $3\alpha_i^2 < 1$ ensuring the existence of fourth-order moments. Moreover, with $q - 1 = 3$, we have a cross-sectional variance of the $\sigma_{ii}$ that is three times larger than the mean (normalized to 1). For each draw, we compute the $df = 71$ non-zero eigenvalues and associated eigenvectors of $\Omega_{\bar Z^*}$, and the values of the parameters $\nu_j$ and $\lambda_j$. In our simulations (a) with $\alpha_i = 0.2$, the draws of $\max_{j=1,\ldots,df}\nu_j$ range between 0.21 and 0.30, with 95% quantile equal to 0.28, while the 5% and 95% quantiles of the $\lambda_j$ are 0.13 and 7.65. Instead, (b) with $\alpha_i = 0.5$, the $\max_{j=1,\ldots,df}\nu_j$ range between 0.70 and 0.79, with 95% quantile equal to 0.77, and the 5% and 95% quantiles of the $\lambda_j$ are 0.12 and 6.64. To get further insights into the choice of the parameters $\bar\nu$, $\lambda$, $\bar\lambda$, we also consider the values implied by the FA estimates in our empirical analysis. Here, when testing for the last retained $k$ in a given subperiod, the median across subperiods of $\max_{j=1,\ldots,df}\nu_j$ is 0.76, and smaller than about 0.90 in most subperiods. Similarly, assuming $c_{k+1} = 10T$ and $\xi_{k+1} = e_1$ as above, the median values of the smallest and the largest estimated $\lambda_j$ are 0.0024 and 5.84. Inspired by these findings, we set $\bar\lambda = 7$, and consider $\bar\nu = 0.2, 0.7, 0.9, 0.99$, and $\lambda = 0, 0.1, 0.5, 1$, to get realistic settings with different degrees of dissimilarity from the case with serially uncorrelated squared errors (increasing with $\bar\nu$), and separation of the alternative hypothesis from the null hypothesis (increasing with $\lambda$).
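The construction of $F$ and $G$ in one calibration draw can be sketched as follows (our illustration; it covers the design matrices only, not the computation of the $\nu_j$ and $\lambda_j$):

```python
import numpy as np

def sqrtm_psd(M):
    """Symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def draw_design(T=20, k=7, seed=0):
    """One draw of the F.1 calibration design: random orthonormal U from a
    Gaussian T x k matrix, F = Ve^{1/2} U Gamma^{1/2}, and G = Ve^{1/2} Q with
    Q an orthonormal basis built from the first T-k columns of I_T - U U'."""
    rng = np.random.default_rng(seed)
    ve = np.r_[1.5 * np.ones(10), 0.5 * np.ones(10)]            # diag(Ve)
    Gamma = T * np.array([4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0])
    Ft = rng.standard_normal((T, k))
    U = Ft @ np.linalg.inv(sqrtm_psd(Ft.T @ Ft))                # U = Ftilde (Ftilde'Ftilde)^{-1/2}
    F = np.sqrt(ve)[:, None] * U * np.sqrt(Gamma)
    Qt = (np.eye(T) - U @ U.T)[:, : T - k]
    Q = Qt @ np.linalg.inv(sqrtm_psd(Qt.T @ Qt))
    G = np.sqrt(ve)[:, None] * Q
    return F, G, np.diag(ve)
```

By construction, $\frac{1}{T}F_j'V_\varepsilon^{-1}F_j = \Gamma_{jj}/T$, which reproduces the stated signal-to-noise ratios $4, 3.5, \ldots, 1$.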
F.2 Results with Monte Carlo draws
In Table 2, the entries are nil for $\bar\nu$ sufficiently small and $\lambda$ sufficiently large, suggesting that the AUMPI property holds for those cases that are closer to the setting with uncorrelated squared errors and sufficiently separated from the null hypothesis. Violations of Inequalities (7) concern $df = 2, 3, 4, 5$.⁴⁷ Let us focus on the setting with $\bar\nu = 0.7$ and $\lambda = 0.1$. We find 3752 violations of Inequalities (7) out of $10^8$ simulations, all occurring for $df = 2$, except 65 for $df = 3$. For those draws violating Inequalities (7) for $df = 2$, a closer inspection shows that (a) they feature values $\nu_2$ close to the upper bound $\bar\nu = 0.7$, and values of $\lambda_2$ close to the lower bound $\lambda = 0.1$, and (b) several of them yield non-monotone density ratios $\frac{f(z;\lambda_1,\lambda_2)}{f(z;0,0)}$, with the non-monotonicity region corresponding to large values of $z$. As an illustration, let us take the density ratio for $df = 2$ with $\nu_2 = 0.666$, $\lambda_1 = 1.372$, and $\lambda_2 = 0.130$. Here, the eigenvalues of the variance-covariance matrix are $\mu_1 = 1$ (by normalization) and $\mu_2 = (1 - \nu_2)^{-1} = 2.994$, and the non-centrality parameter $\lambda_2$ is small. The quantiles of the asymptotic distribution under the null hypothesis for asymptotic size $\alpha = 20\%, 10\%, 5\%, 1\%, 0.1\%$ are 9.3, 12.8, 16.2, 24.5, 36.5. Non-monotonicity applies for $z \geq 16$. The optimal rejection regions $\big\{\frac{f(z;\lambda_1,\lambda_2)}{f(z;0,0)} \geq C\big\}$ correspond to those of the LR test $\{z \geq \tilde C\}$, e.g., for asymptotic levels such as $\alpha = 20\%$, but not for $\alpha = 5\%$ or smaller. Indeed, in the latter cases, because of the non-monotonicity of the density ratio, the optimal rejection regions are finite intervals in the argument $z$. With $\bar\nu = 0.7$, we do not find violations with $\lambda = 0.5$ or larger.
⁴⁷ A given number of simulated draws becomes increasingly sparse when considering larger values of $df$, which makes the exploration of the parameter space more challenging in those cases. However, unreported theoretical considerations show, via an asymptotic approximation, that the monotone likelihood ratio property holds for $df \to \infty$, since the limiting distribution is then Gaussian. This finding resonates with the absence of violations in Table 2 for the larger values of $df$.
Table 2: Numerical check of Inequalities (7) by Monte Carlo. We display the cumulative frequency of violations in $h$ of Inequalities (7), for $m = 3, \ldots, 16$, over $10^8$ random draws of the parameters $\lambda_j \sim Unif[\lambda, \bar\lambda]$ and $\nu_j \sim Unif[0, \bar\nu]$, for $\bar\lambda = 7$, and different combinations of the bounds $\lambda$, $\bar\nu$, and degrees of freedom $df = 2, \ldots, 12$ (table columns). An entry 0.000 corresponds to fewer than 100 cases out of the $10^8$ draws.
G Maximum value of k as a function of T
In Table 3, we report the maximal values of the number of latent factors $k$ such that $df \geq 0$, or $df > 0$.
T        1   2   3   4   5   6   7   8   9  10  11  12
df ≥ 0   0   0   1   1   2   3   3   4   5   6   6   7
df > 0  NA   0   0   1   2   2   3   4   5   5   6   7

T       13  14  15  16  17  18  19  20  21  22  23  24
df ≥ 0   8   9  10  10  11  12  13  14  15  15  16  17
df > 0   8   9   9  10  11  12  13  14  14  15  16  17

Table 3: Maximum value of k. We give the maximum admissible value $k$ of latent factors so that the order conditions $df \geq 0$ and $df > 0$ are met, with $df = \frac{1}{2}[(T-k)^2 - T - k]$, for different values of the sample size $T = 1, \ldots, 24$. Condition $df \geq 0$ is required for FA estimation, and condition $df > 0$ is required for testing the number of latent factors.
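The entries of Table 3 follow from a one-line search over $k$; a minimal sketch:

```python
def max_k(T, strict=False):
    """Largest k such that df = ((T - k)**2 - T - k) / 2 meets the order
    condition df >= 0 (FA estimation) or df > 0 (testing), as in Table 3.
    Returns None (the NA entry) when no k qualifies."""
    best = None
    for k in range(T + 1):
        df = ((T - k) ** 2 - T - k) / 2
        cond = df > 0 if strict else df >= 0
        if cond:
            best = k
    return best

# reproduce the two rows of Table 3
print([max_k(T) for T in range(1, 25)])               # df >= 0
print([max_k(T, strict=True) for T in range(1, 25)])  # df > 0
```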