1 • FEBRUARY 2019

Dividend Dynamics, Learning, and Expected Stock Index Returns

We present a latent variable model of dividends that predicts, out-of-sample, 39.5%
to 41.3% of the variation in annual dividend growth rates between 1975 and 2016.
Further, when learning about dividend dynamics is incorporated into a long-run
risks model, the model predicts, out-of-sample, 25.3% to 27.1% of the variation in
annual stock index returns over the same time horizon, with learning contributing
approximately half of the predictability in returns. These findings support the view
that investors’ aversion to long-run risks and their learning about these risks are
important in determining stock index prices and expected returns.

THE AVERAGE RETURN ON EQUITIES has been substantially higher than the aver-
age return on risk-free bonds over long periods of time. For instance, between
1946 and 2016, the S&P500 earned 66 basis points more per month than 30-day
T-bills (i.e., over 7% annualized). Over the years, many dynamic equilibrium
asset pricing models have been proposed in an attempt to understand why risks
in equities require such a large premium and why risk-free rates are so low.
A common feature in most of these models is that the risk premium on equi-
ties does not remain constant over time, but rather varies in a systematic and
stochastic manner. Given a large number of studies find evidence of such pre-
dictable variation in the equity premium, Lettau and Ludvigson (2001, p. 842)
conclude that “it is now widely accepted that excess returns are predictable by
variables such as price-to-dividend ratios.”1

402 The Journal of FinanceR

Goyal and Welch (2008) argue, however, that while variables such as price-to-
dividend ratios are successful in predicting stock index returns in-sample, they
fail to predict returns out-of-sample. The difference between in-sample and
out-of-sample prediction comes down to the assumption made on investors’ in-
formation set. Traditional dynamic equilibrium asset pricing models assume
that, although investors’ beliefs about investment opportunities and economic
conditions change over time and drive the variation in stock index prices and
expected returns, these investors nevertheless have complete knowledge of the
parameters describing the economy. For example, these models assume that
investors know the true model and model parameters governing consumption
and dividend dynamics. This assumption has been only a matter of analyt-
ical convenience, and as Hansen (2007, p. 2) asks, “how can we burden the
investors with some of the specification problems that challenge the econome-
trician.” Motivated by this insight, a recent but growing literature focuses on
the role of learning in asset pricing models. Timmermann (1993) and Lewellen
and Shanken (2002) demonstrate via simulations that parameter uncertainty
can lead to excess predictability and volatility in stock returns. Johannes,
Lochstoer, and Mou (2016) propose a Markov-switching model for consumption
dynamics and show that learning about the consumption process is reflected in
asset prices. Croce, Lettau, and Ludvigson (2014) further show that a long-run
risks model that features bounded rationality and limited information can gen-
erate a downward-sloping equity term structure. Collin-Dufresne, Johannes,
and Lochstoer (2016) provide a theoretical model in which parameter learning
can be a source of long-run risks under Bayesian learning.2 We add to this
literature.
The main contributions of our paper are as follows. We present a model for
aggregate dividends of the stock index, based on simple economic intuition,
which explains large variation in annual dividend growth rates out-of-sample.
We show that, when learning about dividend dynamics is incorporated into a
long-run risks model, the model predicts large variation in annual stock index
returns out-of-sample. This not only addresses the Goyal and Welch (2008)
critique and significantly revises upward the degree of return predictability
relative to the existing literature, but also lends support to the view that both
investors’ aversion to long-run risks and their learning about these risks play
important roles in determining asset prices and expected returns.3,4
To study the effect of learning about dividend dynamics on stock index prices
and expected returns, we first need a dividend model that is able to realistically

and Koijen (2010), Chen, Da, and Zhao (2013), Kelly and Pruitt (2013), van Binsbergen et al.
(2013), Li, Ng, and Swaminathan (2013), Da, Jagannathan, and Shen (2014), and Martin (2017).
2 Instead of learning, an alternative approach that researchers have used is to introduce pref-

erence shocks. See, for example, Albuquerque, Eichenbaum, and Rebelo (2015).
3 Our paper is also consistent with the argument in Lettau and Van Nieuwerburgh (2008)
that steady-state economic fundamentals, or in our interpretation, investors’ beliefs about these
fundamentals, vary over time and that such variation is critical in determining asset prices and
expected returns.
4 Following existing literature, we adopt the stock index as a proxy for the market portfolio.

capture how investors form expectations about future dividends. Inspired by

Lintner (1956) and Campbell and Shiller (1988b), we develop a model of divi-
dend growth rates that adds information in corporate payout policy to the latent
variable model used in Cochrane (2008), van Binsbergen and Koijen (2010), and
others. Our model predicts 42.4% to 46.4% of the variation in annual dividend
growth rates between 1946 and 2016 in-sample and 39.5% to 41.3% of the variation
in annual dividend growth rates between 1975 and 2016 out-of-sample.
Based on these results, we comfortably reject the null that expected dividend
growth rates are constant and show that the superior performance of our divi-
dend model over alternative models in predicting annual dividend growth rates
is statistically significant and economically meaningful.
We further document that uncertainty about parameters in our dividend
model, especially parameters surrounding the persistent latent variable, is
high and resolves slowly. In particular, such uncertainty remains substantial
even at the end of our 71-year sample, which suggests that learning about
dividend dynamics is a difficult and slow process. Moreover, when our dividend
model is estimated at each point in time based on data available at the time,
model parameter estimates fluctuate over time, some significantly, as more
data become available. In other words, if investors estimate dividend dynamics
using our model, we expect their beliefs about the parameters governing the
dividend process to vary significantly over time. We show that these changes
in investors’ beliefs can have large effects on their expectations of future div-
idends. Thus, through this channel, changes in investors’ beliefs about the
parameters governing the dividend process can contribute significantly to the
variation in stock prices and expected returns.
We next provide evidence that investors behave as if they learn about divi-
dend dynamics and price stocks using our model. First, we define stock yields
as discount rates that equate the present value of expected future dividends to
the current prices of the stock index. Based on the log-linearized present value
relationship of Campbell and Shiller (1988a), we specify stock yields as a func-
tion of price-to-dividend ratios and long-run dividend growth expectations.5
We show that, assuming investors learn about dividend dynamics, these stock
yields explain 18.7% of the variation in annual stock index returns between
1975 and 2016. In comparison, stock yields, assuming full information, predict
a statistically significantly lower 13.0% of the same variation over the same
horizon. Next, we embed our dividend model into a dynamic equilibrium asset
pricing model that features Epstein and Zin (1989) preferences, which capture
preferences for the early resolution of uncertainty, and consumption dynamics
similar to the long-run risks model of Bansal and Yaron (2004). We refer to this
model as our long-run risks model. We find that, assuming learning, our long-
run risks model predicts 25.3% to 27.1% of the variation in annual stock index
5 See Jagannathan, McGrattan, and Scherbina (2001) for the dynamic version of the Gordon

(1959) growth model that gives an expression for stock yield in levels. When expected dividend
growth rates vary over time, according to the present value relationship, we show that stock yield,
that is, the long-run expected return on stocks, is the current dividend yield plus a weighted
average of expected future one-period dividend growth rates.

returns between 1975 and 2016. Learning accounts for approximately half of
the predictability in returns. Both the model’s forecasting performance and
the incremental contribution of learning to this performance are statistically
significant and economically meaningful.
Our results suggest that, aside from a common persistent component in
consumption and dividend growth rates, the assumption that investors hold
Epstein and Zin (1989) preferences with early resolution of uncertainty, which
is a critical component of any long-run risks model, is essential to the model’s
strong performance in predicting annual stock index returns.6 More specifically,
we find that, by replacing Epstein and Zin (1989) preferences with constant
relative risk aversion (CRRA) preferences, the R2 associated with predicting
annual stock index returns between 1975 and 2016 drops from 13.3% to 11.8%
assuming full information and from at least 25.3% to at most 15.1% after
incorporating learning into the model. This substantial deterioration in fore-
casting performance supports the view that the assumption of early resolution
of uncertainty, as modeled by Epstein and Zin (1989) preferences, is potentially
important for building an asset pricing model consistent with investor behavior.
We follow Cogley and Sargent (2008), Piazzesi and Schneider (2010), and
Johannes, Lochstoer, and Mou (2016), and define learning based on the antic-
ipated utility of Kreps (1998). Under anticipated utility, agents update using
Bayes’s law but optimize myopically in that they do not take into account uncer-
tainty associated with learning in their decision-making process. Anticipated
utility thus assumes that agents form expectations not knowing that their
beliefs will continue to evolve over time as the model continues updating.7
The rest of this paper is organized as follows. In Section I, we introduce our
dividend model and evaluate its performance in capturing dividend dynamics.
In Section II, we show that investors’ beliefs about dividend model parameters
can vary significantly over time as a result of Kreps’s learning about dividend
dynamics. In Section III, we show that learning accounts for a significant
fraction of the variation in both long-run and short-run expected stock index
returns. In Section IV, we first discuss how an asset pricing model’s perfor-
mance in predicting stock index returns can be used to evaluate the model. We
demonstrate that, between 1975 and 2016, a model that incorporates Kreps’s
learning into a long-run risks model predicts 25.3% to 27.1% of the variation in
annual stock index returns, and we explain why this finding provides insights
into investor preferences and the role of learning in investor behavior. Finally,
in Section V, we conclude.

6 Alternatively, as Hansen and Sargent (2010) and Bidder and Dew-Becker (2016) show, if

investors are averse to ambiguity, they may behave as if a common persistent component exists
even if the actual consumption and dividend processes are not persistent.
7 Collin-Dufresne, Johannes, and Lochstoer (2016) provide the theoretical foundation for study-

ing uncertainty about model parameters as a priced risk factor.


I. The Dividend Model

In this section, we present a model of dividend growth rates that extends the
latent variable model of Cochrane (2008), van Binsbergen and Koijen (2010),
and others by incorporating information in corporate payout policy into the
model. The inclusion of corporate payout policy in explaining dividend dynamics
is inspired by Campbell and Shiller (1988b), who show that cyclically adjusted
price-to-earnings (CAPE) ratios, defined as the log ratio between real
prices and real earnings averaged over the past decade, can predict future
growth rates in dividends.
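We begin with the latent variable model used in Cochrane (2008), van Binsbergen
and Koijen (2010), and others. Let $D_t$ be the nominal dividend of the
stock index, $d_t = \log(D_t)$, and $\Delta d_{t+1} = d_{t+1} - d_t$ be the log dividend growth rate.
The model is given as

$$
\begin{aligned}
\Delta d_{t+1} - \mu_d &= x_t + \sigma_d \epsilon_{d,t+1}, \\
x_{t+1} &= \rho x_t + \sigma_x \epsilon_{x,t+1}, \\
\begin{pmatrix} \epsilon_{d,t+1} \\ \epsilon_{x,t+1} \end{pmatrix} &\sim \text{i.i.d. } N\left(0, \begin{pmatrix} 1 & \lambda_{dx} \\ \lambda_{dx} & 1 \end{pmatrix}\right),
\end{aligned} \tag{1}
$$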

where time $t$ is defined in years to control for potential seasonality in dividend
payments. Following van Binsbergen and Koijen (2010), we fit our model
to the nominal dividend process. As shown in Jagannathan, McGrattan, and
Scherbina (2000) and Boudoukh et al. (2007), equity issuance and repurchases
tend to be more sporadic and random compared to cash dividends. For this reason,
we focus on modeling the cash dividend process.8 In (1), expected dividend
growth rates are a function of the latent variable $x_t$, the unconditional mean
$\mu_d$ of dividend growth rates, and the persistence coefficient $\rho$ on the latent
variable $x_t$:

$$E_t[\Delta d_{t+s+1}] = \mu_d + \rho^s x_t, \quad \forall s \geq 0. \tag{2}$$

Before we add corporate payout policy into this model, we first recall the
dividend model used in Campbell and Shiller (1988b). Let $p_t$ denote the log
nominal price of the stock index, $e_t$ log nominal earnings, and $\pi_t$ the log
consumer price index, and, following Campbell and Shiller (1988b), consider
the following vector-autoregression for annual nominal dividend growth rates,

8 A firm’s investment opportunity set includes repurchasing its own shares, which all else equal

would lead to an increase in the future earnings of the remaining shares, just as investment in
any other productive assets would. This is another reason we choose to focus on cash dividends in
our study.

Table I
Campbell and Shiller (1988b) Betas for Predicting Dividend Growth Rates
This table reports coefficients from predicting dividend growth rates using the
vector-autoregression specification in Campbell and Shiller (1988b). Statistics are based on
nonoverlapping annual data between 1946 and 2016. Reported in parentheses are Newey and West (1987)
standard errors that account for up to 10 years of serial correlation. Estimates significant at the
90%, 95%, and 99% confidence levels are indicated using *, **, and ***.

β10        β11        β12        β13
0.045      0.425***   0.184***   −0.217***
(0.054)    (0.061)    (0.064)    (0.077)

β0         β1         β2
−0.037     0.455***   0.147***
(0.024)    (0.070)    (0.052)

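log price-to-dividend ratios, and CAPE ratios:

$$
\begin{pmatrix} \Delta d_{t+1} \\ p_{t+1} - d_{t+1} \\ p_{t+1} - \bar{e}_{t+1} \end{pmatrix}
= \begin{pmatrix} \beta_{10} \\ \beta_{20} \\ \beta_{30} \end{pmatrix}
+ \begin{pmatrix} \beta_{11} & \beta_{12} & \beta_{13} \\ \beta_{21} & \beta_{22} & \beta_{23} \\ \beta_{31} & \beta_{32} & \beta_{33} \end{pmatrix}
\begin{pmatrix} \Delta d_t \\ p_t - d_t \\ p_t - \bar{e}_t \end{pmatrix}
+ \begin{pmatrix} \sigma_d \epsilon_{d,t+1} \\ \sigma_{(p-d)} \epsilon_{(p-d),t+1} \\ \sigma_{(p-\bar{e})} \epsilon_{(p-\bar{e}),t+1} \end{pmatrix},
$$

$$
\begin{pmatrix} \epsilon_{d,t+1} \\ \epsilon_{(p-d),t+1} \\ \epsilon_{(p-\bar{e}),t+1} \end{pmatrix}
\sim \text{i.i.d. } N\left(0, \begin{pmatrix} 1 & \lambda_{12} & \lambda_{13} \\ \lambda_{12} & 1 & \lambda_{23} \\ \lambda_{13} & \lambda_{23} & 1 \end{pmatrix}\right), \tag{3}
$$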

where, as in Campbell and Shiller (1988b), the CAPE ratio is defined as

$$p_t - \bar{e}_t = p_t - \left( \pi_t + \frac{1}{10} \sum_{s=1}^{10} \left( e_{t-s+1} - \pi_{t-s+1} \right) \right). \tag{4}$$

Estimates of $\beta_{10}$, $\beta_{11}$, $\beta_{12}$, and $\beta_{13}$ in (3) based on data between 1946 and 2016
are reported in the first row of Table I. We see that both price-to-dividend
ratios and CAPE ratios significantly affect future dividends, but in opposite
directions. Specifically, increases in price-to-dividend ratios predict increases in
future dividend growth rates, but increases in CAPE ratios predict decreases in
future dividend growth rates. Further, we note from Table I that $\beta_{12} + \beta_{13} = 0$
cannot be statistically rejected. For this reason, we restrict $\beta_{13} = -\beta_{12}$, so that
$\beta_{12}(p_t - d_t) + \beta_{13}(p_t - \bar{e}_t) = \beta_{12}(\bar{e}_t - d_t)$, and
rewrite (3) as

$$\Delta d_{t+1} = \beta_0 + \beta_1 \Delta d_t + \beta_2 (\bar{e}_t - d_t) + \sigma_d \epsilon_{d,t+1}, \quad \epsilon_{d,t+1} \sim \text{i.i.d. } N(0, 1). \tag{5}$$

The stock index price $p_t$ does not appear in (5). Instead, future dividend growth
rates are a function of some measure of retention ratios, that is, $\bar{e}_t - d_t$. Estimated
coefficients from (5) are in the second row of Table I. We see that the
estimate of $\beta_2$ is significant, suggesting that expected dividend growth rates

respond to corporate payout policy. High earnings relative to dividends imply

that firms have been retaining earnings in the past and thus are expected to
pay more dividends in the future.
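We extend (1) based on this insight that corporate payout policy contains
information about future dividends. Define $\Delta e_{t+1} = e_{t+1} - e_t$ as the log nominal
earnings growth rate and $q_t = e_t - d_t$ as the log earnings-to-dividend ratio, that
is, the retention ratio. We specify our dividend model as follows:

$$
\begin{aligned}
\Delta d_{t+1} - \mu_d &= x_t + \phi (q_t - \mu_q) + \sigma_d \epsilon_{d,t+1}, \\
x_{t+1} &= \rho x_t + \sigma_x \epsilon_{x,t+1}, \\
q_{t+1} - \mu_q &= \theta (q_t - \mu_q) + \sigma_q \epsilon_{q,t+1}, \\
\begin{pmatrix} \epsilon_{d,t+1} \\ \epsilon_{x,t+1} \\ \epsilon_{q,t+1} \end{pmatrix} &\sim \text{i.i.d. } N\left(0, \begin{pmatrix} 1 & \lambda_{dx} & \lambda_{dq} \\ \lambda_{dx} & 1 & \lambda_{xq} \\ \lambda_{dq} & \lambda_{xq} & 1 \end{pmatrix}\right).
\end{aligned} \tag{6}
$$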

In our model, future dividend growth rates are a linear combination of three
components. First, they consist of the latent variable $x_t$, which follows a stationary
AR[1] process. Second, they are affected by changes in retention ratios. In
particular, we expect firms to pay more future dividends if they have more retained
earnings. Third, they contain white noise $\epsilon_{d,t}$. For convenience, we model
retention ratios as an AR[1] process. Assuming that this process is stationary
implies that dividend and earnings growth rates have the same unconditional
mean $\mu_d$. In (6), expected dividend growth rates are

$$E_t[\Delta d_{t+s+1}] = \mu_d + \rho^s x_t + \phi \theta^s (q_t - \mu_q), \quad \forall s \geq 0. \tag{7}$$

This means that, in addition to the latent variable $x_t$ and retention ratios,
expected dividend growth rates are a function of the unconditional mean $\mu_d$ of
dividend growth rates, the unconditional mean $\mu_q$ and persistence $\theta$ of retention
ratios, the persistence $\rho$ of the latent variable $x_t$, and the coefficient $\phi$ that
connects corporate payout policy to dividend dynamics. The earnings process
is not modeled explicitly in (6). However, because earnings growth rates are,
by definition, a function of dividend growth rates and retention ratios, that is,

$$\Delta e_{t+1} = (q_{t+1} - q_t) + \Delta d_{t+1}, \tag{8}$$

and because both dividend growth rates and retention ratios are modeled in
(6), we can solve for earnings growth rates using

$$\Delta e_{t+1} = \mu_d + x_t + (\theta + \phi - 1)(q_t - \mu_q) + \sigma_e \epsilon_{e,t+1}, \quad \epsilon_{e,t+1} \sim \text{i.i.d. } N(0, 1), \tag{9}$$

where $\sigma_e = \sqrt{\sigma_d^2 + \sigma_q^2 + 2 \sigma_d \sigma_q \lambda_{dq}}$ and $\epsilon_{e,t+1} = (\sigma_d \epsilon_{d,t+1} + \sigma_q \epsilon_{q,t+1}) / \sigma_e$.
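As an illustrative check on (8) and (9), the sketch below simulates the dividend model in (6) and confirms numerically that earnings growth computed from the identity in (8) coincides with the closed form in (9). It is a minimal sketch rather than the estimation code behind the paper: the function name, the sample length, the random seed, the zero cross-correlations, and the choice to start $x_t$ and $q_t$ at their unconditional means are all assumptions made for illustration. The parameter values are the point estimates reported in Table II.

```python
import numpy as np

def simulate_dividend_model(T, mu_d, phi, sigma_d, rho, sigma_x,
                            mu_q, theta, sigma_q, seed=0):
    """Simulate the dividend model in (6), assuming uncorrelated shocks."""
    rng = np.random.default_rng(seed)
    eps_d, eps_x, eps_q = rng.standard_normal((3, T))
    x = np.zeros(T + 1)        # latent variable x_t, started at its mean (assumption)
    q = np.full(T + 1, mu_q)   # retention ratio q_t, started at its mean (assumption)
    dd = np.zeros(T)           # dd[t] is dividend growth from t to t+1
    for t in range(T):
        dd[t] = mu_d + x[t] + phi * (q[t] - mu_q) + sigma_d * eps_d[t]
        x[t + 1] = rho * x[t] + sigma_x * eps_x[t]
        q[t + 1] = mu_q + theta * (q[t] - mu_q) + sigma_q * eps_q[t]
    return dd, x, q, eps_d, eps_q

# Point estimates from Table II
params = dict(mu_d=0.064, phi=0.140, sigma_d=0.015, rho=0.469,
              sigma_x=0.048, mu_q=0.729, theta=0.370, sigma_q=0.251)
dd, x, q, eps_d, eps_q = simulate_dividend_model(200, **params)

# Earnings growth from the identity in (8)
de_identity = (q[1:] - q[:-1]) + dd

# Earnings growth from the closed form in (9), using
# sigma_e * eps_e = sigma_d * eps_d + sigma_q * eps_q
de_closed = (params["mu_d"] + x[:-1]
             + (params["theta"] + params["phi"] - 1.0) * (q[:-1] - params["mu_q"])
             + params["sigma_d"] * eps_d + params["sigma_q"] * eps_q)

max_gap = float(np.max(np.abs(de_identity - de_closed)))
print(f"largest gap between (8) and (9): {max_gap:.2e}")
```

The two series should agree up to floating-point rounding, since (9) is only an algebraic rearrangement of (6) and (8).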

To forecast macroeconomic variables, a Markov-switching model is commonly
used. It is conceivable that the Markov-switching model that describes
consumption dynamics in Johannes, Lochstoer, and Mou (2016) can be applied
to dividend growth rates:

$$
\begin{aligned}
\Delta d_{t+1} &= \mu_d(s_t) + \sigma_d(s_t) \epsilon_{d,t+1}, \quad s_t \in \{1, 2, 3\}, \\
p(s_{t+1} = i \mid s_t = j) &= \phi_{ij}, \\
\phi_{ij} &\in [0, 1] \quad \forall i, j \in \{1, 2, 3\}, \\
\sum_{i=1}^{3} \phi_{ij} &= 1 \quad \forall j \in \{1, 2, 3\},
\end{aligned} \tag{10}
$$

where $s_t$ is the underlying state of the economy, $p(s_{t+1} = i \mid s_t = j)$ is the probability
that the economy transitions from state $j \in \{1, 2, 3\}$ to state $i \in \{1, 2, 3\}$,
and $\mu_d(s_t)$ and $\sigma_d(s_t)$ are the mean and volatility of dividend growth rates in
a particular state. A key feature of this model that is not present in dividend
models discussed so far is that it is able to incorporate, albeit in a restricted
manner, both regime changes and stochastic volatility. We employ (10) as
another baseline to compare against our dividend model.
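To make the mechanics of (10) concrete, the following is a minimal simulation sketch, not the estimation procedure of Johannes, Lochstoer, and Mou (2016). The function name and the three-state means, volatilities, and transition probabilities are hypothetical values chosen only for illustration. States are indexed 0, 1, 2 in the code rather than 1, 2, 3, and the transition matrix follows the convention in (10) that each column sums to one.

```python
import numpy as np

def simulate_markov_switching(T, mu, sigma, P, s0=0, seed=0):
    """Simulate (10). P[i, j] = p(s_{t+1} = i | s_t = j), so columns sum to one."""
    P = np.asarray(P, dtype=float)
    if not np.allclose(P.sum(axis=0), 1.0):
        raise ValueError("each column of P must sum to one")
    rng = np.random.default_rng(seed)
    states = np.empty(T + 1, dtype=int)
    states[0] = s0
    dd = np.empty(T)  # dd[t] is dividend growth from t to t+1
    for t in range(T):
        j = states[t]
        # Mean and volatility of growth depend on the current state
        dd[t] = mu[j] + sigma[j] * rng.standard_normal()
        # Draw next state from column j of the transition matrix
        states[t + 1] = rng.choice(len(mu), p=P[:, j])
    return dd, states

# Hypothetical parameter values, for illustration only
mu = np.array([-0.05, 0.05, 0.10])
sigma = np.array([0.10, 0.04, 0.06])
P = np.array([[0.6, 0.1, 0.1],
              [0.3, 0.8, 0.3],
              [0.1, 0.1, 0.6]])

dd, states = simulate_markov_switching(50, mu, sigma, P)
```

Because the volatility is tied to the state, the simulated series exhibits both regime changes and time-varying volatility, which is the restricted form of stochastic volatility described above.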

A. Estimation and Results

Due to the lack of reliable historical earnings data on the CRSP value-
weighted market index, we use the S&P500 index as a proxy for the market
portfolio. That is, throughout this study, data on prices, dividends, and earn-
ings correspond to the S&P500 index. These data can be found on Prof. Robert
Shiller’s website (http://www.econ.yale.edu/shiller/data.htm).
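We estimate the model parameters

$$\Theta = \{\mu_d, \phi, \sigma_d, \rho, \sigma_x, \mu_q, \theta, \sigma_q\} \tag{11}$$

using maximum-likelihood. For parameter reduction, we assume in our model
that cross-correlations $\lambda(\cdot, \cdot)$ of different shocks to the dividends are zero,
that is, $\lambda(\epsilon_{d,t+1}, \epsilon_{q,t+1}) = 0$, $\lambda(\epsilon_{d,t+1}, \epsilon_{x,t+1}) = 0$, and $\lambda(\epsilon_{x,t+1}, \epsilon_{q,t+1}) = 0$. The
log-likelihood function $l(\cdot)$ is then separable, and maximizing it is equivalent to

$$
\begin{aligned}
\max_{\Theta} \; l(\Delta d_1, \ldots, \Delta d_T, q_0, \ldots, q_T \mid \Theta) &= \max_{\{\mu_q, \theta, \sigma_q\}} l_1 + \max_{\{\mu_d, \phi, \sigma_d, \rho, \sigma_x\}} l_2, \\
l_1 &= l(q_0, \ldots, q_T \mid \{\mu_q, \theta, \sigma_q\}), \\
l_2 &= l(\Delta d_1 \mid q_0, \{\mu_d, \phi, \sigma_d, \rho, \sigma_x\}) \\
&\quad + \sum_{t=1}^{T-1} l(\Delta d_{t+1} \mid q_0, \ldots, q_t, \Delta d_1, \ldots, \Delta d_t, \{\mu_d, \phi, \sigma_d, \rho, \sigma_x\}).
\end{aligned} \tag{12}
$$

Thus, we can estimate $\{\mu_q, \theta, \sigma_q\}$ from the AR[1] process of retention ratios by
maximizing $l_1$ using least squares, and $\{\mu_d, \phi, \sigma_d, \rho, \sigma_x\}$ from the rest of the
dividend model by maximizing $l_2$ using the Kalman filter (Hamilton (1994)).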
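The two-step estimation in (12) can be sketched as follows. This is a minimal illustration under stated assumptions, not the filter described in Appendix A: the function names are hypothetical, the shocks are taken to be uncorrelated as assumed above, and the latent variable is given a stationary prior $x_0 \sim N(0, \sigma_x^2 / (1 - \rho^2))$, which is our own initialization choice. The first function fits the AR[1] for retention ratios by least squares; the second evaluates $l_2$ with a scalar Kalman filter, treating $\Delta d_{t+1} - \mu_d - \phi (q_t - \mu_q)$ as a noisy observation of $x_t$. The data are simulated from (6) at the Table II point estimates purely to exercise the code.

```python
import numpy as np

def fit_ar1(z):
    """Least-squares fit of z[t+1] - mu = theta * (z[t] - mu) + sigma * eps."""
    z = np.asarray(z, dtype=float)
    X = np.column_stack([np.ones(len(z) - 1), z[:-1]])
    coef, *_ = np.linalg.lstsq(X, z[1:], rcond=None)
    intercept, theta = coef
    resid = z[1:] - X @ coef
    return intercept / (1.0 - theta), theta, resid.std()  # mu, theta, sigma

def kalman_loglik(dd, q, mu_q, mu_d, phi, sigma_d, rho, sigma_x):
    """Gaussian log-likelihood l2. dd[t] is growth from t to t+1; len(q) == len(dd) + 1."""
    y = dd - mu_d - phi * (q[:-1] - mu_q)         # y[t] = x_t + sigma_d * eps_d
    m, V = 0.0, sigma_x**2 / (1.0 - rho**2)       # stationary prior for x_0 (assumption)
    loglik = 0.0
    for obs in y:
        F = V + sigma_d**2                        # variance of the forecast error
        v = obs - m                               # forecast error
        loglik += -0.5 * (np.log(2.0 * np.pi * F) + v**2 / F)
        K = V / F                                 # Kalman gain
        m, V = m + K * v, V * (1.0 - K)           # update belief about x_t
        m, V = rho * m, rho**2 * V + sigma_x**2   # predict x_{t+1}
    return loglik

# Simulated data from (6) at the Table II point estimates, for illustration only
rng = np.random.default_rng(1)
mu_d, phi, sigma_d, rho, sigma_x = 0.064, 0.140, 0.015, 0.469, 0.048
mu_q, theta, sigma_q = 0.729, 0.370, 0.251
T, x = 2000, 0.0
q, dd = np.full(T + 1, mu_q), np.zeros(T)
for t in range(T):
    dd[t] = mu_d + x + phi * (q[t] - mu_q) + sigma_d * rng.standard_normal()
    x = rho * x + sigma_x * rng.standard_normal()
    q[t + 1] = mu_q + theta * (q[t] - mu_q) + sigma_q * rng.standard_normal()

mu_q_hat, theta_hat, sigma_q_hat = fit_ar1(q)                 # step 1: maximize l1
ll_true = kalman_loglik(dd, q, mu_q, mu_d, phi, sigma_d, rho, sigma_x)
ll_wrong = kalman_loglik(dd, q, mu_q, mu_d, phi, sigma_d, -rho, sigma_x)
```

In practice, the second step would pass `kalman_loglik` (with a sign flip) to a numerical optimizer over $\{\mu_d, \phi, \sigma_d, \rho, \sigma_x\}$; the comparison of `ll_true` and `ll_wrong` above only illustrates that the likelihood discriminates between parameter values.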

Table II
Dividend Model Parameter Estimates
This table reports estimated parameters from our dividend model. Dividends are based on
nonoverlapping annual data since 1946. Reported in parentheses are bootstrap-simulated standard errors.

μd         φ          σd
0.064      0.140      0.015
(0.016)    (0.021)    (0.017)

ρ          σx
0.469      0.048
(0.168)    (0.011)

μq         θ          σq
0.729      0.370      0.251
(0.065)    (0.120)    (0.026)

Appendix A describes the Kalman filter. Table II reports model parameter estimates
based on nonoverlapping annual data between 1946 and 2016.9 Standard
errors of parameter estimates are based on bootstrap simulation, as described
in Appendix B. Previous studies suggest the presence of
a regime shift in dividend dynamics before and after World War II. Fama and
French (1988) note that dividends are more smoothed in the postwar period,
and Chen, Da, and Priestley (2012) argue that the lack of predictability in
dividend growth rates by price-to-dividend ratios in the postwar period is attributable
to this dividend-smoothing behavior. We therefore limit our sample
to the postwar period between 1946 and 2016. Consistent with our intuition,
the coefficient $\phi$, which connects corporate payout policy to dividend dynamics,
is estimated to be positive and significant. That is, high retention ratios imply
high future dividend growth rates. The annual persistence $\theta$ of retention ratios
is estimated to be 0.370. The latent variable $x_t$ is more persistent, with $\rho$ estimated at 0.469. We
therefore find a moderate to high level of persistence in dividend growth rates
between 1946 and 2016 based on estimates from our model.
In the first column of Table III, we report our dividend model’s performance
in predicting annual dividend growth rates. Between 1946 and 2016, our model
predicts 46.4% of the variation in annual dividend growth rates, which is a sig-
nificant improvement over the baseline models. Given that these statistics are
in-sample, we know that at least part of this improved forecasting performance
comes from adding more parameters to existing models and thus is mechanical.
Hence, to address the concern that our model overfits the data, we also assess
our model based on how it predicts annual dividend growth rates out-of-sample.
That is, instead of estimating model parameters based on the full data sample,
we predict dividend growth rates at each point in time using model parameters

9 All annual statistics reported are based on year-end data, that is, from January to December.

When we replicate our tests using overlapping annual data, the findings are very similar.

Table III
Dividend Growth Rates and Expected Growth Rates
Panel A reports $R^2$s for predicting dividend growth rates using our dividend model, the latent
variable model in van Binsbergen and Koijen (2010), the VAR model in Campbell and Shiller (1988b),
or the Markov-switching model in Johannes, Lochstoer, and Mou (2016). The first column reports
in-sample $R^2$s. The second and third columns report out-of-sample $R^2$s and the corresponding
bootstrap-simulated p-values. Panel B reports incremental $R^2$s for predicting dividend growth
rates using our model over one of the baseline models. Dividends are estimated based on
nonoverlapping annual data since 1946. Out-of-sample statistics are based on nonoverlapping annual data
between 1975 and 2016.

Panel A. In-Sample and Out-of-Sample R2

                                        In-Sample    Out-of-Sample
                                        R2           R2_O       p-Value
Our Model                               0.464        0.413      0.000
van Binsbergen and Koijen (2010)        0.174        0.161      0.008
Campbell and Shiller (1988b)            0.278        0.256      0.001
Johannes, Lochstoer, and Mou (2016)     0.137        −0.042     1.000

Panel B. Incremental R2

                                        R2_I         p-Value
van Binsbergen and Koijen (2010)        0.301        0.000
Campbell and Shiller (1988b)            0.212        0.002
Johannes, Lochstoer, and Mou (2016)     0.437        0.000

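estimated based on data available at the time. The performance of model $M_i$ is then
evaluated using out-of-sample $R^2$s as in Goyal and Welch (2008),

$$R^2_O(M_i) = 1 - \frac{\sum_{t=T_0}^{T-1} \left( \Delta d_{t+1} - E_t[\Delta d_{t+1} \mid M_i] \right)^2}{\sum_{t=T_0}^{T-1} \left( \Delta d_{t+1} - \hat{\mu}_{d,t} \right)^2}, \tag{13}$$

where $\hat{\mu}_{d,t}$ is the average of dividend growth rates up to time $t$:

$$\hat{\mu}_{d,t} = \frac{1}{t} \sum_{s=0}^{t-1} \Delta d_{s+1}. \tag{14}$$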

We use time 0 to denote the start of the data sample, time $T_0$ to denote the end
of the training period, and time $T$ to denote the end of the data sample. We
use the period prior to 1975 as the training period, and hence out-of-sample
prediction corresponds to the 42-year period between 1975 and 2016. In the second
and third columns of Table III, we report out-of-sample $R^2$s for predicted
annual dividend growth rates and the corresponding bootstrap-simulated
p-values. The results show that our model predicts 41.3% of the variation
in annual dividend growth rates between 1975 and 2016 out-of-sample, which

is an economically meaningful improvement over the 16.1%, 25.6%, and −4.2%

from the baseline models.
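The out-of-sample $R^2$ in (13), with the expanding-window historical mean in (14) as the benchmark, can be computed along the following lines. This is a minimal sketch: the function name, the array layout (entry `t` holds the growth rate from $t$ to $t+1$ and the forecast made at time $t$), and the toy numbers are assumptions for illustration and do not reproduce the paper's data.

```python
import numpy as np

def out_of_sample_r2(dd, forecasts, T0):
    """R^2_O in (13). dd[t] is growth from t to t+1; forecasts[t] is E_t of dd[t]."""
    dd = np.asarray(dd, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    T = len(dd)
    # Benchmark (14): mean of the t growth rates realized up to time t, for t = T0..T-1
    bench = np.array([dd[:t].mean() for t in range(T0, T)])
    sse_model = np.sum((dd[T0:] - forecasts[T0:]) ** 2)
    sse_bench = np.sum((dd[T0:] - bench) ** 2)
    return 1.0 - sse_model / sse_bench

# Toy growth rates, for illustration only
dd = np.array([0.01, 0.03, 0.02, 0.05, 0.04, 0.06])
T0 = 3

# A forecaster that uses only the historical mean scores zero by construction
mean_forecasts = np.array([0.0] + [dd[:t].mean() for t in range(1, len(dd))])
r2_mean = out_of_sample_r2(dd, mean_forecasts, T0)

# A forecaster with perfect foresight scores one
r2_perfect = out_of_sample_r2(dd, dd, T0)
```

A negative value, such as the −4.2% reported for the Markov-switching baseline, means that the model's forecasts have a larger sum of squared errors than the historical mean over the evaluation period.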
We next show that the differences in performance between our model and the
baseline models in predicting dividend growth rates are statistically significant.
For two models $M_i$ and $M_j$, we define the incremental $R^2$ of $M_i$ over $M_j$ as

$$R^2_I(M_i, M_j) = 1 - \frac{\sum_{t=T_0}^{T-1} \left( \Delta d_{t+1} - E_t[\Delta d_{t+1} \mid M_i] \right)^2}{\sum_{t=T_0}^{T-1} \left( \Delta d_{t+1} - E_t[\Delta d_{t+1} \mid M_j] \right)^2}. \tag{15}$$

We report the results in Table III. If the incremental R2 is significantly positive,

this would suggest that our dividend model is an improvement over the baseline
models in predicting annual dividend growth rates. Taken as a whole, we note
that the differences in forecasting performance between our model and the
baseline models are significant.
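Because the out-of-sample $R^2$s of two models share the same benchmark sum of squares, the definitions of the out-of-sample and incremental $R^2$ imply $R^2_I(M_i, M_j) = 1 - (1 - R^2_O(M_i)) / (1 - R^2_O(M_j))$. The short sketch below applies this identity to the rounded out-of-sample $R^2$s in Panel A of Table III and compares the result with Panel B. The function name is hypothetical, and small discrepancies are to be expected because the inputs are rounded to three decimals.

```python
def incremental_r2_from_oos(r2_i, r2_j):
    """Incremental R^2 of model i over model j implied by their out-of-sample R^2s."""
    return 1.0 - (1.0 - r2_i) / (1.0 - r2_j)

r2_ours = 0.413  # out-of-sample R^2 of our model, Table III, Panel A

# (out-of-sample R^2 from Panel A, incremental R^2 reported in Panel B)
baselines = {
    "van Binsbergen and Koijen (2010)": (0.161, 0.301),
    "Campbell and Shiller (1988b)": (0.256, 0.212),
    "Johannes, Lochstoer, and Mou (2016)": (-0.042, 0.437),
}

implied = {name: incremental_r2_from_oos(r2_ours, r2_j)
           for name, (r2_j, _) in baselines.items()}
gaps = {name: abs(implied[name] - reported)
        for name, (_, reported) in baselines.items()}

for name in baselines:
    print(f"{name}: implied {implied[name]:.3f}, gap to Panel B {gaps[name]:.4f}")
```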

B. Inflation and Real Rates

In a standard neoclassical asset pricing model, real dividend growth rates,
not nominal rates, are of interest to investors in forming their investment
decisions. To convert nominal dividend growth rates into real rates, we need
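to specify a process for inflation. We model inflation as a stationary AR[1]
process:10

$$\Delta \pi_{t+1} - \mu_\pi = \eta (\Delta \pi_t - \mu_\pi) + \sigma_\pi \epsilon_{\pi,t+1}, \quad \epsilon_{\pi,t+1} \sim \text{i.i.d. } N(0, 1). \tag{16}$$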
Table IV reports parameter estimates of the inflation model based on nonover-
lapping annual data between 1946 and 2016. We find a moderate level of persis-
tence in inflation rates. Based on the reported R2 for predicting inflation rates,
namely, 44.8% in-sample between 1946 and 2016 and 54.0% out-of-sample be-
tween 1975 and 2016, we conclude that the AR[1] model does a reasonable job
describing the inflation process.
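Under an AR[1] process with mean $\mu_\pi$ and persistence $\eta$, the expected inflation rate $s + 1$ periods ahead is $\mu_\pi$ plus $\eta^{s+1}$ times the current deviation of inflation from $\mu_\pi$. A minimal sketch of the implied forecast path follows. The function name and the current inflation rate of 6% are hypothetical; the values of $\mu_\pi$ and $\eta$ are the point estimates from Table IV.

```python
import numpy as np

def expected_inflation(pi_t, mu_pi, eta, max_horizon):
    """Expected inflation at t+s+1 for s = 0..max_horizon-1 under the AR[1] model."""
    s = np.arange(max_horizon)
    return mu_pi + eta ** (s + 1) * (pi_t - mu_pi)

# Hypothetical current inflation of 6%; mu_pi and eta from Table IV
path = expected_inflation(pi_t=0.06, mu_pi=0.036, eta=0.557, max_horizon=10)
print(np.round(path, 4))
```

With $0 < \eta < 1$, the forecast decays geometrically toward the unconditional mean, so a shock to inflation has a persistent but fading effect on expected inflation.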
For parameter reduction, we assume that cross-correlations of different
shocks to inflation and the dividend process, that is, $\lambda_{d\pi} = \lambda(\epsilon_{d,t+1}, \epsilon_{\pi,t+1})$,
$\lambda_{x\pi} = \lambda(\epsilon_{x,t+1}, \epsilon_{\pi,t+1})$, and $\lambda_{q\pi} = \lambda(\epsilon_{q,t+1}, \epsilon_{\pi,t+1})$, are zero. This assumption also
implies that estimating the inflation model separately from dividend dynamics
using least squares is equivalent to joint maximum-likelihood estimation.
Given this inflation model, we can derive the expression for expected real dividend
growth rates based on expected nominal rates and inflation as

$$E_t[\Delta \tilde{d}_{t+s+1}] = (\mu_d - \mu_\pi) + \rho^s x_t + \phi \theta^s (q_t - \mu_q) - \eta^{s+1} (\Delta \pi_t - \mu_\pi), \quad \forall s \geq 0, \tag{17}$$

where $\Delta \tilde{d}_t = \Delta d_t - \Delta \pi_t$ denotes the real dividend growth rate.11 To provide a
more intuitive sense of how various types of shocks to real dividend growth
10 In Figure 9, we plot the serial autocorrelation function (ACF) and serial partial autocorrelation
function (PACF) for inflation rates, which shows that AR[1] is the most appropriate autoregressive–
moving-average (ARMA) model for inflation.
11 Throughout, ∼ on top of a variable indicates that the variable is defined in real terms.

Table IV
Inflation Model Parameter Estimates and Inflation Predictability
Panel A reports estimated parameters from our inflation model based on nonoverlapping data
between 1946 and 2016. Reported in parentheses are bootstrap-simulated standard errors. Panel
B reports $R^2$s for predicting inflation rates using our inflation model. The first column reports the
in-sample $R^2$. The second and third columns report the out-of-sample $R^2$ and the corresponding
bootstrap-simulated p-value. In-sample (out-of-sample) statistics are based on nonoverlapping
annual data between 1946 and 2016 (1975 and 2016).

Panel A. Inflation Model Parameter Estimates

μπ         η          σπ
0.036      0.557      0.027
(0.014)    (0.111)    (0.018)

Panel B. Inflation and Expected Inflation

                In-Sample    Out-of-Sample
                R2           R2_O       p-Value
Our Model       0.448        0.540      0.000

rates at a given point in time affect investors’ expectations of real dividends
going forward, we consider a one-unit change to shocks to the real dividend
process, that is, $\epsilon_{d,t}$, $\epsilon_{x,t}$, $\epsilon_{q,t}$, and $\epsilon_{\pi,t}$, and show how such a change affects both
immediate real dividend growth rates and expected real dividend growth rates
up to 10 years into the future. We plot these impulse response functions in
Figure 1. We see that $\epsilon_{d,t}$ affects dividend growth rates instantly but its effect
does not persist, whereas $\epsilon_{x,t}$, $\epsilon_{q,t}$, and $\epsilon_{\pi,t}$ affect dividend growth rates with
a one-period lag but their effects are persistent over time. Figure 1 further
shows the negative correspondence between expected inflation and expected
real dividend growth rates. This is a result of fitting our dividend model to the
nominal dividend process and extracting real dividend growth rate expectations
by subtracting expected inflation rates from the nominal dividend process. The
assumption underlying this choice is that firms do not adjust for inflation in
paying dividends to investors; therefore, we expect real rates to fall as inflation
expectations rise. This prediction is supported in the data and is consistent
with the economic rationale that expected inflation is negatively related to
expected growth in real activity.12
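The responses of expected real dividend growth to each shock follow directly from (17). A one-unit shock to $\epsilon_{x,t}$ moves $x_t$ by $\sigma_x$, a one-unit shock to $\epsilon_{q,t}$ moves $q_t$ by $\sigma_q$, and a one-unit shock to $\epsilon_{\pi,t}$ moves inflation by $\sigma_\pi$, whereas $\epsilon_{d,t}$ does not enter (17) at all. The sketch below tabulates these responses for horizons $s = 0, \ldots, 9$. It is derived from (17) for illustration and is not the code used to produce Figure 1; the function name is hypothetical, and the parameter values are the point estimates in Tables II and IV.

```python
import numpy as np

def expected_growth_responses(horizons, phi, rho, sigma_x,
                              theta, sigma_q, eta, sigma_pi):
    """Change in expected real dividend growth at t+s+1, per (17), after a unit shock at t."""
    s = np.arange(horizons)
    return {
        "eps_d": np.zeros(horizons),              # transitory: no effect on expectations
        "eps_x": rho ** s * sigma_x,              # decays at rate rho
        "eps_q": phi * theta ** s * sigma_q,      # decays at rate theta
        "eps_pi": -(eta ** (s + 1)) * sigma_pi,   # negative: higher expected inflation
    }

# Point estimates from Tables II and IV
responses = expected_growth_responses(
    horizons=10, phi=0.140, rho=0.469, sigma_x=0.048,
    theta=0.370, sigma_q=0.251, eta=0.557, sigma_pi=0.027,
)
for name, path in responses.items():
    print(name, np.round(path, 4))
```

The negative sign on the inflation response reflects the assumption stated above that nominal dividends are not adjusted for inflation, so higher expected inflation lowers expected real dividend growth one for one.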
In Table V, we report the in-sample and out-of-sample R2 s for predicting real,
rather than nominal, annual dividend growth rates using either our model or
one of the baseline models. We find that our model also outperforms the baseline
models in forecasting real annual dividend growth rates. In particular, our
model predicts 42.4% of the variation in real annual dividend growth rates
between 1946 and 2016 in-sample and 39.5% of the variation in real rates
between 1975 and 2016 out-of-sample.
12 See, among others, Fama (1981) and Piazzesi et al. (2006).

Figure 1. Impulse response functions of dividend shocks. This figure plots the immediate
change to real annual dividend growth rates and expected real dividend growth rates over the next
10 years as a result of a unit change in shocks to the dividend process, that is, $\epsilon_{d,t}$, $\epsilon_{x,t}$, $\epsilon_{q,t}$, and
$\epsilon_{\pi,t}$.

II. Parameter Uncertainty and Learning

The difference between in-sample and out-of-sample prediction is the as-
sumption made on investors’ information set. The model parameters reported
in Table II are estimated using data up to 2016, so that they reflect investors’
knowledge of dividend dynamics at the end of 2016. If investors were to esti-
mate our dividend model at an earlier date, they would have estimated a set
of parameter values different from those reported in Table II. This is a result
of investors’ knowledge of dividend dynamics evolving as more data become

Table V
Dividend Growth Rates and Expected Growth Rates (Real Rates)
Panel A reports R2 s for predicting (real) dividend growth rates using our dividend model, the la-
tent variable model in van Binsbergen and Koijen (2010), the VAR model in Campbell and Shiller
(1988b), or the Markov-switching model in Johannes, Lochstoer, and Mou (2016). The first column
reports in-sample R2 s. The second and third columns report out-of-sample R2 s and the correspond-
ing bootstrap-simulated p-values. Panel B reports incremental R2 s for predicting dividend growth
rates using our model over one of the baseline models. Dividends are estimated based on nonover-
lapping annual data since 1946. Out-of-sample statistics are based on nonoverlapping annual data
between 1975 and 2016.

Panel A. In-Sample and Out-of-Sample R2

                                        In-Sample      Out-of-Sample
                                           R2          R2O      p-Value

Our Model                                 0.424       0.395      0.000
van Binsbergen and Koijen (2010)          0.160       0.146      0.012
Campbell and Shiller (1988b)              0.259       0.234      0.001
Johannes, Lochstoer, and Mou (2016)       0.172      −0.058      1.000

Panel B. Incremental R2

                                           R2I        p-Value

van Binsbergen and Koijen (2010)          0.292       0.000
Campbell and Shiller (1988b)              0.210       0.002
Johannes, Lochstoer, and Mou (2016)       0.428       0.000

available. We refer to investors estimating model parameters at each point

in time based on data available at the time as learning. In this section, we
show how learning affects investors’ beliefs about the parameters governing
the dividend process, assuming that investors behave as if they learn about
dividend dynamics using our model. We then report evidence supporting the
view that learning about dividend dynamics can have significant asset pricing
implications.
To reflect investors’ information set, hereafter, we assume that investors
have access to earnings information six months after the fiscal quarter-end
or year-end. The choice of six months is reasonably conservative and is based
on Securities and Exchange Commission (SEC) rules in place since 1934 that
require public companies to file 10-Q reports no later than 45 days after a fiscal
quarter-end and 10-K reports no later than 90 days after a fiscal year-end.13
In Figure 2, we plot model parameters estimated based on nonoverlapping
annual data up to time τ , for τ between 1975 and 2016. Several observations

13 In 2002, these rules were updated to require that large firms file 10-Q reports no later than

40 days after a fiscal quarter-end and 10-K reports no later than 60 days after a fiscal year-end.
However, in our research, we find that a small percentage of firms miss these deadlines.

Figure 2. Evolution of dividend model parameter estimates over time. This figure plots
estimates of the eight parameters in our dividend model, assuming that these parameters are
estimated based on data up to time τ for τ between 1975 and 2016. The shaded regions are
recessions. A year is in recession if any of its months corresponds to NBER recession dates. (Color
figure can be viewed at wileyonlinelibrary.com)

can be made from Figure 2. First, there is a gradual upward drift in investors’
beliefs about the unconditional mean μq of retention ratios. This suggests that
firms have been paying a smaller fraction of earnings as cash dividends in
recent decades. Second, there is a gradual downward drift in investors’ beliefs
about φ, which connects corporate payout policy to dividend dynamics. This

Table VI
Speed of Learning about Dividend Model Parameters
This table reports the speed of learning for the eight parameters in our dividend model. Speed
of learning is defined as one minus the inverse ratio between the bootstrap-simulated standard
errors assuming that parameters are estimated based on data between 1946 and 2016 and the
bootstrap-simulated standard errors assuming that parameters are estimated based on 10 more
years of data.

μd       φ        σd       ρ        σx       μq       θ        σq
0.078    0.076    0.067    0.036    0.041    0.061    0.068    0.080

means that dividends have become smoother over time. The decline in the
impact of retained earnings on future dividends is consistent with a decrease in
investment opportunities and more of retained earnings being used for share
repurchases. Third, a sharp drop in investors’ beliefs about the persistence θ of
retention ratios toward the end of our data sample is due to the abnormally low
earnings reported around the time of the 2009 recession and the strong stock
market recovery that followed. Changes in the volatility of shocks to dividends
and retention ratios are products of these trends.
Figure 2 shows that the persistence ρ of the latent variable xt appears to be
the hardest parameter to learn and least stable parameter over time. Investor
beliefs about ρ fluctuate significantly over the sample period. For example,
investor beliefs about ρ drop sharply three times during our sample. The first
is at the start of what is sometimes referred to as the Dot-Com bubble. The
second is around the time of the 2001 recession. The third is around the time of
the 2009 recession. This is a standard feature of a latent variable model. That
is, when a large and unexpected shock hits, in our context either in the form
of a recession or what is sometimes referred to as a bubble, our model assigns
some positive probability to the shock belonging to the persistent process, and
revises ρ downward.
We infer from the standard errors reported in Table II that learning about
dividend dynamics is a slow process. In particular, even with 71 years of data,
there is still significant uncertainty surrounding the estimates of some model
parameters. For example, the 90% confidence interval for ρ is between 0.193
and 0.745. To quantify the speed of learning about a parameter in our dividend
model, we follow Johannes, Lochstoer, and Mou (2016) and construct a measure
that is one minus the inverse ratio between the bootstrap-simulated standard
error assuming that the parameter is estimated based on data up to 2016 and
the bootstrap-simulated standard error assuming that the parameter is esti-
mated based on 10 more years of data, that is, if the parameters were estimated
in 2026. This measure thus indicates by how much an estimated parameter’s standard
error should decrease if the same exercise were to be done in 10 years’ time.
Therefore, the closer this measure is to zero, the more difficult it is for investors to
learn about that parameter. In Table VI, we report this measure for each of the
eight model parameters. Overall, 10 additional years of data would decrease
the standard errors of the parameter estimates by between approximately 3%

and 8%. Further, consistent with results in Figure 2 and in Table I, reducing
uncertainty about ρ is the most difficult among these parameters.
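As a rough illustration of the speed-of-learning measure, the sketch below computes one minus the ratio of a future standard error to a current one. The benchmark in which standard errors shrink with the square root of the sample size is an assumption made here for illustration; it is not the bootstrap procedure used for Table VI, and the function name is hypothetical.

```python
import math


def speed_of_learning(se_now: float, se_future: float) -> float:
    """One minus the ratio of the future standard error to the current one."""
    return 1.0 - se_future / se_now


# Illustrative benchmark (an assumption, not the paper's bootstrap): for a
# simple sample mean, the standard error scales with 1 / sqrt(sample size).
years_now, years_future = 71, 81
se_now = 1.0 / math.sqrt(years_now)
se_future = 1.0 / math.sqrt(years_future)

benchmark = speed_of_learning(se_now, se_future)
print(f"benchmark speed of learning: {benchmark:.3f}")
```

Under this square-root benchmark, 10 more years on top of 71 would reduce a standard error by roughly 6%, which lies inside the 3% to 8% range reported in Table VI.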
We next show that learning about dividend dynamics can have significant
asset pricing implications through its impact on expected dividend growth
rates over the long run. To see this, consider the log-linearized present-value
relationship in Campbell and Shiller (1988a):

$$p_t - d_t = \frac{\kappa_0}{1-\kappa_1} + \sum_{s=0}^{\infty} \kappa_1^s \left( E_t[\Delta d_{t+s+1}] - E_t[R_{t+s+1}] \right), \tag{18}$$

where κ0 and κ1 are log linearizing constants and Rt+1 is the stock index’s
log return.14 The expression is a mathematical identity that connects price-
to-dividend ratios, expected dividend growth rates, and discount rates, that
is, expected returns. We define stock yields as discount rates that equate the
present value of expected future dividends to the current price of the stock
index. Thus, rearranging (18), we can write stock yields as

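$$\begin{aligned}
sy_t &\equiv (1-\kappa_1) \sum_{s=0}^{\infty} \kappa_1^s E_t[R_{t+s+1}] \\
&= \kappa_0 - (1-\kappa_1)(p_t - d_t) + (1-\kappa_1) \sum_{s=0}^{\infty} \kappa_1^s E_t[\Delta d_{t+s+1}].
\end{aligned} \tag{19}$$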

We define long-run dividend growth expectations as

$$\partial_t \equiv (1-\kappa_1) \sum_{s=0}^{\infty} \kappa_1^s E_t[\Delta d_{t+s+1}]. \tag{20}$$

Given that price-to-dividend ratios are observed, there is a one-to-one mapping

between long-run dividend growth expectations and stock yields. Further, long-
run dividend growth expectations are specific to the dividend model and its
parameters. For example, using our dividend model, we can rewrite expected
long-run dividend growth rates as:

$$\partial_t = (1-\kappa_1) \sum_{s=0}^{\infty} \kappa_1^s \left( \mu_d + \rho^s x_t + \phi\theta^s (q_t - \mu_q) \right). \tag{21}$$

If investors use a different dividend model, their expectations of long-run div-

idend growth rates will also be different. For example, if we assume that div-
idend growth rates follow a white noise process centered around μd, we can
rewrite (21) as ∂t = μd. Further, because long-run dividend growth expecta-
tions are functions of dividend model parameters, they are also affected by

14 To solve for $\kappa_0 = \log(1 + \exp(p - d)) - \kappa_1 (p - d)$ and $\kappa_1 = \frac{\exp(p - d)}{1 + \exp(p - d)}$, we set the unconditional
mean of the log price-to-dividend ratio $p - d$ to 3.474.

Figure 3. Expected long-run dividend growth rates. This figure plots long-run dividend
growth expectations, computed using our dividend model, for the period between 1975 and 2016.
Dividends are estimated based on nonoverlapping annual data since 1946. Assuming full informa-
tion, parameters are estimated once based on the full data sample. Assuming learning, parameters
are estimated at each point in time based on the data available at the time. The shaded regions are
recessions. A year is in recession if any of its months correspond to NBER recession dates. (Color
figure can be viewed at wileyonlinelibrary.com)

whether these parameters are estimated once based on the full data or at each
point in time based on the data available at the time. The first case corresponds
to investors having complete knowledge of the parameters describing the div-
idend process. The second case corresponds to investors having to learn about
dividend dynamics. In Figure 3, we plot our model’s long-run dividend growth
expectations for these two cases. The figure shows that learning can have a
considerable effect on investors’ long-run dividend growth expectations.
In Figure 4, we plot stock yields for the learning and the full information
cases, which are computed by substituting (21) into (19):
$$sy_t = \kappa_0 - (1-\kappa_1)(p_t - d_t) + \mu_d + (1-\kappa_1) \left( \frac{1}{1-\kappa_1\rho} x_t + \frac{\phi}{1-\kappa_1\theta} (q_t - \mu_q) \right). \tag{22}$$

We also plot price-to-dividend ratios in Figure 4, and scale price-to-dividend

ratios to allow for easy comparison to stock yields. We note that, under full
information, there is almost no noticeable difference between the time series
of price-to-dividend ratios and stock yields. This suggests that, assuming that
investors do not learn, the variation in long-run dividend growth expectations
is minimal relative to the variation in price-to-dividend ratios, so the latter
dominates the variation in stock yields, as stock yields are a linear combination
of these two components. Assuming learning takes place, however, we find
significant differences between the time series of price-to-dividend ratios and
stock yields.
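The mapping from dividend model parameters to long-run dividend growth expectations and stock yields can be sketched numerically. The snippet below computes the log-linearization constants from the mean log price-to-dividend ratio of 3.474 and evaluates the closed form obtained when the geometric sums over s are collapsed. All parameter and state values are hypothetical placeholders, not the estimates in Table II, and the function names are illustrative.

```python
import math

# Mean log price-to-dividend ratio stated in the text.
PD_BAR = 3.474
kappa1 = math.exp(PD_BAR) / (1.0 + math.exp(PD_BAR))
kappa0 = math.log(1.0 + math.exp(PD_BAR)) - kappa1 * PD_BAR


def long_run_growth(mu_d, rho, phi, theta, x_t, q_gap):
    """Closed form of the long-run growth expectation.

    The sums of (kappa1 * rho)^s and (kappa1 * theta)^s collapse to
    1 / (1 - kappa1 * rho) and 1 / (1 - kappa1 * theta).
    """
    return mu_d + (1.0 - kappa1) * (
        x_t / (1.0 - kappa1 * rho) + phi * q_gap / (1.0 - kappa1 * theta)
    )


def stock_yield(pd_t, growth):
    """sy_t = kappa0 - (1 - kappa1)(p_t - d_t) + long-run growth expectation."""
    return kappa0 - (1.0 - kappa1) * pd_t + growth


# Hypothetical parameter and state values, for illustration only.
params = dict(mu_d=0.06, rho=0.5, phi=0.1, theta=0.3)
x_t, q_gap, pd_t = 0.02, -0.10, 3.6

g_closed = long_run_growth(x_t=x_t, q_gap=q_gap, **params)

# Truncated version of the infinite discounted sum, as a numerical check.
g_sum = (1.0 - kappa1) * sum(
    kappa1 ** s
    * (params["mu_d"] + params["rho"] ** s * x_t
       + params["phi"] * params["theta"] ** s * q_gap)
    for s in range(2000)
)

sy = stock_yield(pd_t, g_closed)
print(f"kappa0={kappa0:.4f} kappa1={kappa1:.4f} growth={g_closed:.4f} sy={sy:.4f}")
```

Re-running the same functions with parameters re-estimated at each date, rather than once on the full sample, is what separates the learning case from the full information case.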

Figure 4. Stock yields. This figure plots stock yields syt , computed using our dividend model,
and log price-to-dividend ratios (scaled) for the period between 1975 and 2016. Dividends are
estimated based on nonoverlapping annual data since 1946. Assuming full information, parameters
are estimated once based on the full data sample. Assuming learning, parameters are estimated
at each point in time based on the data available at the time. The shaded regions are recessions.
A year is in recession if any of its months correspond to NBER recession dates. (Color figure can
be viewed at wileyonlinelibrary.com)

III. Learning about Dividends and the Time-Variation in Discount Rates

So far, we have focused on how learning affects the econometrician. We now
examine the asset pricing implications of assuming that investors have to learn
about dividend dynamics in a manner similar to learning by the econometri-
cian. While the econometrician does not price assets, investors do. So, assuming
that investors behave as if they learn about dividend dynamics using our model,
we expect such behavior to affect stock index prices and returns. This assump-
tion is not unreasonable. Because our dividend model outperforms alternative
models in forecasting dividends, it is natural to assume that investors behave
as if they use a model that is more similar to ours in their investment decisions,
at least among the choices examined. In this section, we present evidence that
is consistent with this assumption.
First, we show that, assuming learning, stock yields predict annual stock
index returns. To establish a baseline, note that if we assume dividend growth
rates follow a white noise process centered around μd, stock yields can be
simplified to

$$sy_t = \kappa_0 - (1-\kappa_1)(p_t - d_t) + \mu_d. \tag{23}$$

That is, under the white noise assumption, stock yields are just scaled price-
to-dividend ratios. We therefore regress future annual stock index returns on
price-to-dividend ratios based on nonoverlapping annual data between 1975

Table VII
Stock Index Returns and Stock Yields
This table reports coefficient estimates and R2 s from regressing future stock index returns on log
price-to-dividend ratios and stock yields, computed using our dividend model, the latent variable
model in van Binsbergen and Koijen (2010) (vBK), the VAR model in Campbell and Shiller (1988b)
(CS), or the Markov-switching model in Johannes, Lochstoer, and Mou (2016) (JLM) and assuming
investors either learn (i.e., syt (L)), or do not learn (i.e., syt (F)), about dividends. Dividends are
estimated based on nonoverlapping annual data since 1946. Regressions are based on nonoverlap-
ping annual data between 1975 and 2016. Reported in parentheses are Newey and West (1987)
standard errors that account for up to 10 years of serial correlation. Estimates significant at the
90%, 95%, and 99% confidence levels are indicated using *, **, and ***.

                                Our Model                                        Baseline Model
                                                                          vBK         CS          JLM
             (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)

pt − dt    −0.130***                            0.014
           (0.035)                             (0.078)
syt (L)                4.399***                4.748**     5.423***    3.379***    4.160***    1.965*
                       (0.775)                 (2.137)     (2.024)     (0.850)     (1.216)     (0.843)
syt (F)                            4.097***                −1.282
                                   (1.036)                 (2.100)
R2          0.136       0.187       0.130       0.187       0.190       0.114       0.114       0.054

and 2016. The results, reported in the first column of Table VII, show that
between 1975 and 2016, price-to-dividend ratios predict 13.6% of the variation
in annual stock index returns.
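A predictive regression of this kind, with Newey and West (1987) standard errors, can be sketched as follows. The data are synthetic and the function `ols_newey_west` is a hypothetical helper written for this illustration; it uses the Bartlett kernel without any small-sample adjustment, which may differ in detail from the implementation behind Table VII.

```python
import numpy as np


def ols_newey_west(y, x, lags):
    """OLS of y on a constant and x, with Newey-West (Bartlett-kernel) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

    # Long-run covariance of the moment vector x_t * e_t.
    h = X * resid[:, None]
    S = h.T @ h
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)   # Bartlett weights
        gamma = h[lag:].T @ h[:-lag]        # sum of h_t h_{t-lag}'
        S += weight * (gamma + gamma.T)

    bread = np.linalg.inv(X.T @ X)
    cov = bread @ S @ bread
    return beta, np.sqrt(np.diag(cov)), r2


# Synthetic annual data (42 observations), for illustration only.
rng = np.random.default_rng(0)
n_years = 42
predictor = rng.normal(0.05, 0.02, size=n_years)
future_return = 0.02 + 2.0 * predictor + rng.normal(0.0, 0.15, size=n_years)

beta, se, r2 = ols_newey_west(future_return, predictor, lags=10)
print("intercept, slope:", beta, "NW s.e.:", se, "R2:", round(float(r2), 3))
```

With annual data, `lags=10` corresponds to allowing for up to 10 years of serial correlation, as in the table notes.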
We next regress future annual stock index returns on stock yields in (22),
assuming learning. We report the results in the second column of Table VII. We
see that the R2 from this regression is 18.7%. We note that the only difference
between this regression and the baseline regression is the assumption on the
dividend process. That is, here, we assume that investors behave as if they learn
about dividend dynamics using our model, whereas in the baseline regression,
we assume that expected dividend growth rates are constant. This means that
we can attribute the increase in R2 from 13.6% to 18.7% to our incorporating
learning about dividend dynamics into the model. To emphasize the importance
of learning, we regress future annual stock index returns on stock yields in
(22), assuming full information. The results are reported in the third column
of Table VII. We find that stock yields under full information perform roughly
as well as price-to-dividend ratios in predicting annual stock index returns.
This is consistent with the results in Figure 4, which show that there is very
little difference between the time series of price-to-dividend ratios and stock
yields, assuming full information. To show that the superior predictive power
of stock yields, assuming learning, is significant, we run bivariate regressions
of future annual stock index returns on both stock yields, assuming learning,
and either price-to-dividend ratios or stock yields, assuming full information.
The results, reported in the fourth and fifth columns of Table VII, show that
stock yields, assuming learning, significantly dominate both price-to-dividend

Table VIII
Stock Index Returns and Shocks to Dividend Expectations
This table reports coefficient estimates and R2 s from regressing future stock index returns on con-
temporaneous shocks to long-run dividend growth rate expectations, computed using our dividend
model, the latent variable model in van Binsbergen and Koijen (2010) (vBK), the VAR model in
Campbell and Shiller (1988b) (CS), or the Markov-switching model in Johannes, Lochstoer, and
Mou (2016) and assuming investors either learn (i.e., ∂t+1 (L)), or do not learn (i.e., ∂t+1 (F)),
about dividends. Dividends are estimated based on nonoverlapping annual data since 1946. Regres-
sions are based on nonoverlapping annual data between 1975 and 2016. Reported in parentheses
are Newey and West (1987) standard errors that account for up to 10 years of serial correlations.
Estimates significant at the 90%, 95%, and 99% confidence levels are indicated using *, **, and
***.

                    Our Model                    Baseline Model
                                          vBK         CS          JLM
               (1)         (2)         (3)         (4)         (5)

∂t+1 (L)     8.324***                −1.800       0.229      −4.183**
             (2.702)                 (4.147)     (6.781)     (1.787)
∂t+1 (F)                −4.203
R2            0.106       0.004       0.002       0.000       0.030

ratios and stock yields, assuming full information, in predicting annual stock
index returns.
It is worth noting that, for learning to be relevant in our context, in-
vestors must behave as if they are learning about dividend dynamics us-
ing our model. To illustrate this point, we regress stock index returns over
the next year on stock yields, assuming instead that investors behave as if
they learn about dividend dynamics using one of the three baseline mod-
els. The results, reported in the sixth to eighth columns of Table VII, show
that stock yields, assuming learning based on one of the baseline models, perform
no better than price-to-dividend ratios in predicting annual stock index returns.
We can also demonstrate the relevance of our dividend model by showing
that stock index prices respond more strongly to contemporaneous changes in long-run
dividend growth expectations computed using our model than to those computed using the alternative models. That
is, if investors behave as if they price the stock index using our model, then
all else equal, we expect that when dividend expectations rise according to our
model, so should prices, and vice versa. We regress annual stock index returns
on contemporaneous changes in long-run dividend expectations, assuming
that investors behave as if they learn using our dividend model. We report
the regression results in the first column of Table VIII. The results confirm
that increases in expectations about future dividends are accompanied by
more positive stock index returns, and vice versa. In fact, contemporaneous
changes to expected dividends account for a statistically significant 10.6%
of the variation in annual stock index returns. For comparison, we also run regressions of
annual stock index returns on contemporaneous changes in long-run dividend

expectations, based either on our model under full information, or one of the
alternative dividend models, but assuming learning about dividends. The
results are reported in the second to fifth columns of Table VIII. We note that,
under any of the other cases considered, the relationship between annual
stock index returns and contemporaneous changes to expected dividends is
Taken as a whole, our findings suggest that the absence of a relationship
between dividend expectations and stock index pricing documented in existing
literature may be due to a failure to simultaneously account for corporate
payout policy and the role of learning in pricing.15

IV. Learning about Dividends in a Dynamic Equilibrium Model

Although long-run discount rates can be uniquely pinned down based on
price-to-dividend ratios and expectations of long-run dividend growth rates,
the present value relationship cannot fully capture how discount rates over
short horizons vary over time. In other words, the variation in expected long-
run returns and the variation in expected returns over the short run are not
necessarily perfectly correlated with each other. In this section, we search for a
dynamic equilibrium asset pricing model that is able to quantitatively capture
the possible role of learning in determining short-run expected returns. That
is, a model that, after incorporating parameter uncertainty, is able to show
strong performance in predicting short-horizon stock index returns that are
consistent with the data.
Below, we first argue that an asset pricing model’s performance in pre-
dicting stock index returns can be used to assess that model. We incorpo-
rate learning into a long-run risks model and show that 25.3% to 27.1% of
the variation in annual stock index returns can be predicted using such a
model.

A. Return Predictability and Assessing Asset Pricing Models

The criterion we propose to assess an asset pricing model is the deviation be-
tween the candidate model’s expected returns on a given asset and the expected
returns of the true model. The true model here is defined as the asset pricing
model that best describes the behavior of the marginal investor who prices the
given asset, in a frictionless and efficient market. Let Mi be a candidate model,
$M_0$ be the unobserved true asset pricing model, $R_t$ be the log return of that
asset, $E_t[R_{t+1} | M_i]$ be the expectation of that asset's return over the next
time period held by investors endowed with $M_i$, and $E_t[R_{t+1} | M_0]$ be the expected return under the true
model. The following definition characterizes a better asset pricing model, that is, a
candidate model that is closer to the true model, as the model that minimizes
the mean squared difference between its expected returns and the expected
returns of the true model.

15 See, for example, Cochrane (2008).


DEFINITION 1: A candidate asset pricing model $M_i$ is a better approximation of
the true asset pricing model ($M_0$) than model $M_j$ if and only if

$$E\left[ \left( E_t[R_{t+1} | M_0] - E_t[R_{t+1} | M_i] \right)^2 \right] < E\left[ \left( E_t[R_{t+1} | M_0] - E_t[R_{t+1} | M_j] \right)^2 \right].$$

A clear inconvenience of this definition is that the true asset pricing model
M0 is never observable, and thus Et [Rt+1 |M0 ] is unobservable. To address
this issue, we note that, assuming markets are frictionless and efficient and
investors form rational expectations, the error term $\epsilon_{t+1} = R_{t+1} - E_t[R_{t+1} | M_0]$
is orthogonal to any information that is time-t measurable. This leads to the
following proposition.
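PROPOSITION 1: A candidate asset pricing model $M_i$ is a better approximation
of the true asset pricing model ($M_0$) than model $M_j$ if and only if

$$1 - \frac{E\left[ \left( R_{t+1} - E_t[R_{t+1} | M_i] \right)^2 \right]}{E\left[ \left( R_{t+1} - E[R_{t+1}] \right)^2 \right]} > 1 - \frac{E\left[ \left( R_{t+1} - E_t[R_{t+1} | M_j] \right)^2 \right]}{E\left[ \left( R_{t+1} - E[R_{t+1}] \right)^2 \right]}.$$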
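The step from Definition 1 to Proposition 1 can be outlined informally as follows. This is only a sketch under the orthogonality assumption stated above, together with the assumption that each candidate model's forecast is time-t measurable; it is not a substitute for the formal proof.

```latex
% \epsilon_{t+1} = R_{t+1} - E_t[R_{t+1} | M_0] is the forecast error under the true model.
\begin{align*}
R_{t+1} - E_t[R_{t+1} \mid M_i]
  &= \epsilon_{t+1} + \bigl( E_t[R_{t+1} \mid M_0] - E_t[R_{t+1} \mid M_i] \bigr), \\
E\bigl[ (R_{t+1} - E_t[R_{t+1} \mid M_i])^2 \bigr]
  &= E[\epsilon_{t+1}^2]
   + E\bigl[ (E_t[R_{t+1} \mid M_0] - E_t[R_{t+1} \mid M_i])^2 \bigr].
\end{align*}
% The cross term vanishes because \epsilon_{t+1} is orthogonal to time-t information.
```

Because $E[\epsilon_{t+1}^2]$ and the denominator $E[(R_{t+1} - E[R_{t+1}])^2]$ do not depend on the candidate model, ranking models by their distance from the true model's expected returns is equivalent to ranking them by the ratio in the proposition.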

Proofs are in Appendix C. In other words, if we define the out-of-sample $R^2$

$$R^2(M_i) = 1 - \frac{\sum_{t=T_0}^{T-1} \left( R_{t+1} - E_t[R_{t+1} | M_i] \right)^2}{\sum_{t=T_0}^{T-1} \left( R_{t+1} - \hat{\mu}_{r,t} \right)^2}, \tag{24}$$

where $\hat{\mu}_{r,t} = \frac{1}{t} \sum_{s=0}^{t-1} R_{s+1}$ is the average return of an asset up to time t, as the
performance of a candidate model $M_i$ in predicting asset returns over the next
time period, then assuming that we have a sufficiently long data sample, we
can use the out-of-sample $R^2$ to assess how close the candidate model is to the
true model. The asset that we use to evaluate models in this paper is the stock
index.
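The out-of-sample statistic described here compares a model's squared forecast errors with those of the expanding historical mean. A minimal sketch is given below; the function name, the indexing convention, and the toy return series are assumptions made for illustration.

```python
from typing import Sequence


def out_of_sample_r2(returns: Sequence[float],
                     forecasts: Sequence[float],
                     t0: int) -> float:
    """Out-of-sample R^2 against the expanding historical mean.

    returns[s]   holds R_{s+1}, for s = 0, ..., T-1.
    forecasts[t] holds the model's time-t forecast of R_{t+1}.
    t0           is the first forecast date; it must be at least 1 so
                 that the historical mean is defined.
    """
    if t0 < 1 or len(returns) != len(forecasts):
        raise ValueError("need t0 >= 1 and equally long inputs")
    sse_model = 0.0
    sse_mean = 0.0
    for t in range(t0, len(returns)):
        hist_mean = sum(returns[:t]) / t  # mean of R_1, ..., R_t, known at time t
        sse_model += (returns[t] - forecasts[t]) ** 2
        sse_mean += (returns[t] - hist_mean) ** 2
    return 1.0 - sse_model / sse_mean


# Toy data, for illustration only.
toy_returns = [0.10, -0.05, 0.20, 0.03, -0.12, 0.15]
perfect = list(toy_returns)
naive = [0.0] + [sum(toy_returns[:t]) / t for t in range(1, len(toy_returns))]

print(out_of_sample_r2(toy_returns, perfect, t0=2))
print(out_of_sample_r2(toy_returns, naive, t0=2))
```

A forecast that always equals the historical mean scores zero, so a positive value indicates that the candidate model improves on the no-predictability benchmark using only information available at each forecast date.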

B. The Long-Run Risks Model

We propose a long-run risks model that combines our dividend model, Epstein
and Zin (1989) investor preferences, and persistent consumption growth rates
similar to those in Bansal and Yaron (2004). We show that such a model predicts
25.3% to 27.1% of the variation in annual stock index returns.
Epstein and Zin (1989) preferences are among the most widely used expres-
sions for investor preferences in the literature. Investor preferences are defined
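recursively as

$$U_t = \left[ (1-\delta) \tilde{C}_t^{\frac{1-\alpha}{\zeta}} + \delta \left( E_t\left[ U_{t+1}^{1-\alpha} \right] \right)^{\frac{1}{\zeta}} \right]^{\frac{\zeta}{1-\alpha}}, \quad \zeta = \frac{1-\alpha}{1-\frac{1}{\psi}}, \tag{25}$$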

where C̃t is real consumption, ψ is the elasticity of intertemporal substitution

(EIS), and α is the coefficient of risk aversion. We note that the representative

agent prefers early resolution of uncertainty if ζ < 0 and late resolution of un-
certainty if ζ > 0.16 The log of the intertemporal marginal rate of substitution
(IMRS) is given by

$$m_{t+1} = \zeta \log(\delta) - \frac{\zeta}{\psi} \Delta\tilde{c}_{t+1} + (\zeta - 1) \tilde{R}_{t+1}, \tag{26}$$

where $\tilde{c} = \log(\tilde{C})$ and $\tilde{R}_{t+1}$ denotes the real return of the representative agent’s
wealth portfolio. For calibration, we set ψ = 1.5 to be consistent with prefer-
ences for the early resolution of uncertainty, and we set α = 5 and δ = 0.975, all
of which are within the range of parameter choices commonly made in existing
literature. Following much of the recent literature, we calibrate our long-run
risks model at the quarterly frequency.
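The calibration can be checked with a few lines of arithmetic. The sketch below computes ζ from the stated values of α and ψ and evaluates the log IMRS in its standard Epstein and Zin form; the function names and the consumption growth and wealth return inputs are hypothetical.

```python
import math

ALPHA = 5.0    # coefficient of risk aversion (from the calibration in the text)
PSI = 1.5      # elasticity of intertemporal substitution
DELTA = 0.975  # time discount factor


def ez_zeta(alpha: float, psi: float) -> float:
    """zeta = (1 - alpha) / (1 - 1 / psi)."""
    return (1.0 - alpha) / (1.0 - 1.0 / psi)


def log_imrs(dc: float, r_wealth: float,
             alpha: float = ALPHA, psi: float = PSI, delta: float = DELTA) -> float:
    """Standard Epstein-Zin log IMRS: zeta*log(delta) - (zeta/psi)*dc + (zeta-1)*r_wealth."""
    zeta = ez_zeta(alpha, psi)
    return zeta * math.log(delta) - (zeta / psi) * dc + (zeta - 1.0) * r_wealth


zeta = ez_zeta(ALPHA, PSI)
prefers_early_resolution = zeta < 0

# Hypothetical inputs: 0.5% consumption growth, 1.5% real return on wealth.
m_example = log_imrs(dc=0.005, r_wealth=0.015)
print(zeta, prefers_early_resolution, m_example)
```

With α = 5 and ψ = 1.5, ζ equals −12, so the calibrated agent prefers early resolution of uncertainty.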
Similar to Bansal and Yaron (2004), we assume that consumption and divi-
dend growth rates carry the same persistent latent component xt . That is, we
describe real consumption growth rates as

$$\Delta\tilde{c}_{t+1} - (\mu_d - \mu_\pi) = \frac{1}{\gamma} \left( x_t + \sigma_d \epsilon_{c,t+1} \right). \tag{27}$$

Following Bansal and Yaron (2004), we set the unconditional mean of con-
sumption growth rates equal to that of dividend growth rates. The parameter
γ is the leverage of the equity market. A common criticism of the long-run
risks model is that it requires a small but highly persistent component in consumption
and dividend growth rates that is not clearly supported by data.17
This criticism serves as the reason we expect learning to be important in this
Unfortunately, we cannot adopt the Bansal and Yaron (2004) model in its
exact form because our dividend model does not feature stochastic volatility,
which is a key component of Bansal and Yaron (2004). However, our long-run
risks model still needs the additional degree of freedom from a second latent
variable to simultaneously capture the time series of dividends and price-to-
dividend ratios in the data. So, instead of stochastic volatility, our long-run
risks model assumes stochastic correlation between shocks to consumption
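and shocks to dividend and earnings processes. That is, we assume that the
correlations between shocks $\epsilon_{c,t+1}$ to real consumption growth rates and shocks
$\epsilon_{d,t+1}$ and $\epsilon_{e,t+1}$ to dividend and earnings growth rates are equal, that is, $\lambda_t = \lambda(\epsilon_{c,t+1}, \epsilon_{d,t+1}) = \lambda(\epsilon_{c,t+1}, \epsilon_{e,t+1})$, and follow an AR[1] process centered around
zero:

$$\lambda_{t+1} = \omega\lambda_t + \sigma_\lambda \epsilon_{\lambda,t+1}, \quad \epsilon_{\lambda,t+1} \sim \text{i.i.d. } N(0, 1). \tag{28}$$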

16 Equivalently, if α > 1, then the representative agent prefers early resolution of uncertainty if

ψ > 1 and late resolution of uncertainty if ψ < 1.

17 See Beeler and Campbell (2012) and Jagannathan and Marakani (2015).

Then, the correlation between consumption and retention ratios is:

$$\lambda\left( \epsilon_{c,t+1}, \epsilon_{q,t+1} \right) = \frac{\sqrt{\sigma_d^2 + \sigma_q^2} - \sigma_d}{\sigma_q} \lambda_t. \tag{29}$$
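We set other cross-correlations of shocks that we cannot identify to zero. To
summarize, the correlation matrix of shocks to consumption, dividends, and
retention ratios is given as

$$\begin{pmatrix} \epsilon_{c,t+1} \\ \epsilon_{d,t+1} \\ \epsilon_{x,t+1} \\ \epsilon_{\lambda,t+1} \\ \epsilon_{q,t+1} \\ \epsilon_{\pi,t+1} \end{pmatrix} \sim \text{i.i.d. } N\left( 0, \begin{pmatrix}
1 & \lambda_t & 0 & 0 & \frac{\sqrt{\sigma_d^2 + \sigma_q^2} - \sigma_d}{\sigma_q} \lambda_t & 0 \\
\lambda_t & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
\frac{\sqrt{\sigma_d^2 + \sigma_q^2} - \sigma_d}{\sigma_q} \lambda_t & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \right). \tag{30}$$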

C. Estimation and Results

We solve our long-run risk model in Appendix D. In solving this model, we
closely follow the steps in Bansal and Yaron (2004). The model consists of four
state variables: latent variables xt and λt , the retention ratios, and the inflation
rates. We can solve for the price-to-dividend ratio as a linear function of these
four state variables:

$$p_t - d_t = A_{d,0} + A_{d,1} x_t + A_{d,2} \lambda_t + A_{d,3} (q_t - \mu_q) + A_{d,4} (\pi_t - \mu_\pi). \tag{31}$$

The expected stock return over the next period is then

Et [Rt+1 ] = Ar,0 + Ar,1 xt + Ar,2 λt + Ar,4 (πt − μπ ). (32)

The coefficients Ad,· and Ar,· , derived in Appendix D, are functions of the
parameters that describe investor preferences and the joint processes of con-
sumption and dividends. We note that, substituting (31) into (32), we can avoid
estimating the latent variable λt directly from macroeconomic data and instead
write expected future returns as a function of the price-to-dividend ratios and
the three other state variables:

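$$\begin{aligned}
E_t[R_{t+1}] &= A_0 + A_1 x_t + A_2 (p_t - d_t) + A_3 q_t + A_4 \pi_t, \\
A_0 &= \frac{A_{r,0} A_{d,2} - A_{d,0} A_{r,2}}{A_{d,2}} + \frac{A_{d,3} A_{r,2}}{A_{d,2}} \mu_q + \frac{A_{r,2} A_{d,4} - A_{r,4} A_{d,2}}{A_{d,2}} \mu_\pi, \\
A_1 &= \frac{A_{r,1} A_{d,2} - A_{r,2} A_{d,1}}{A_{d,2}}, \quad A_2 = \frac{A_{r,2}}{A_{d,2}}, \quad A_3 = -\frac{A_{r,2} A_{d,3}}{A_{d,2}}, \\
A_4 &= \frac{A_{r,4} A_{d,2} - A_{r,2} A_{d,4}}{A_{d,2}}.
\end{aligned} \tag{33}$$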
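The substitution that eliminates the latent correlation can be verified numerically. In the sketch below, the structural coefficients and state values are arbitrary placeholders (the actual coefficients are derived in the appendix and depend on preferences and the estimated processes); the check is only that the reduced-form expression reproduces the structural expected return.

```python
# Hypothetical structural coefficients, for illustration only.
Ad = {0: 3.4, 1: 5.0, 2: -2.0, 3: 0.8, 4: -1.5}   # price-to-dividend loadings
Ar = {0: 0.07, 1: 0.6, 2: 0.3, 4: -0.2}           # expected-return loadings
mu_q, mu_pi = 0.5, 0.03


def reduced_form(Ad, Ar, mu_q, mu_pi):
    """Coefficients of expected returns on (1, x, p - d, q, pi)."""
    A0 = ((Ar[0] * Ad[2] - Ad[0] * Ar[2]) / Ad[2]
          + Ad[3] * Ar[2] / Ad[2] * mu_q
          + (Ar[2] * Ad[4] - Ar[4] * Ad[2]) / Ad[2] * mu_pi)
    A1 = (Ar[1] * Ad[2] - Ar[2] * Ad[1]) / Ad[2]
    A2 = Ar[2] / Ad[2]
    A3 = -Ar[2] * Ad[3] / Ad[2]
    A4 = (Ar[4] * Ad[2] - Ar[2] * Ad[4]) / Ad[2]
    return A0, A1, A2, A3, A4


# Hypothetical state values.
x, lam, q, pi = 0.01, 0.2, 0.55, 0.04

# Price-to-dividend ratio as a linear function of the four state variables.
pd_ratio = Ad[0] + Ad[1] * x + Ad[2] * lam + Ad[3] * (q - mu_q) + Ad[4] * (pi - mu_pi)
# Expected return written in terms of the latent correlation.
er_structural = Ar[0] + Ar[1] * x + Ar[2] * lam + Ar[4] * (pi - mu_pi)

# Expected return written in terms of the observable price-to-dividend ratio.
A0, A1, A2, A3, A4 = reduced_form(Ad, Ar, mu_q, mu_pi)
er_reduced = A0 + A1 * x + A2 * pd_ratio + A3 * q + A4 * pi

# Backing the latent correlation out of the price-to-dividend ratio.
lam_implied = (pd_ratio - Ad[0] - Ad[1] * x
               - Ad[3] * (q - mu_q) - Ad[4] * (pi - mu_pi)) / Ad[2]

print(er_structural, er_reduced, lam_implied)
```

The two expected returns agree, and the implied correlation recovers the value used to build the price-to-dividend ratio, provided the loading on the latent correlation in the price-to-dividend equation is nonzero.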

Price-to-dividend ratios, earnings-to-dividend ratios, and inflation rates are

directly observable. Aside from those in investors’ preferences, all but three
parameters in our long-run risks model, as well as the latent variable xt , ap-
pear in either (6) or (16) and thus can be estimated from dividend dynamics.
We follow Kreps’s learning and use the estimated parameters and state vari-
ables as if they were their true values. There is more than one way to estimate
the remaining three parameters, that is, ω, σλ , and γ , which are not part of
dividends or preferences. One approach would be to estimate them from consumption
data. However, as Savov (2011) discusses, consumption is measured
with significant noise, with the right measure of consumption itself still an
open question. Accordingly, we choose to estimate them from price-to-dividend
ratios. Specifically, we fix a set of parameters ω, σλ , and γ . Then, for each time
t, we substitute in price-to-dividend ratios, retention ratios, inflation rates, and
dividend model estimates to back out λt following

$$\lambda_t = \frac{(p_t - d_t) - A_{d,0} - A_{d,1} x_t - A_{d,3} (q_t - \mu_q) - A_{d,4} (\pi_t - \mu_\pi)}{A_{d,2}}. \tag{34}$$
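From (28), we know that the latent variable $\lambda_t$ should be an AR[1] process with
an unconditional distribution of $N\left(0, \frac{\sigma_\lambda}{\sqrt{1-\omega^2}}\right)$. So, we can choose the parameters
$\omega$, $\sigma_\lambda$, and $\gamma$ to best fit these distributional characteristics of $\lambda_t$. In other words,
we solve for $\omega$, $\sigma_\lambda$, and $\gamma$ using the generalized method of moments (GMM), fitting
the three parameters to the three moments:

$$\begin{aligned}
E[\lambda_t] &= 0, \\
E\left[ \lambda_t^2 - E[\lambda_t]^2 \right] - \frac{\sigma_\lambda^2}{1-\omega^2} &= 0, \\
E\left[ (\lambda_t - E[\lambda_t]) \left( (\lambda_{t+1} - \omega\lambda_t) - E[\lambda_{t+1} - \omega\lambda_t] \right) \right] &= 0.
\end{aligned} \tag{35}$$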

Under the assumption that our long-run risks model holds, exactly three in-
dependent moment conditions, as in (35), are required to identify the three
parameters ω, σλ , and γ not a part of dividend dynamics. Our choice of moment
conditions is standard. First, we choose the three parameters so that the sam-
ple mean of the latent variable λt is set to zero. Second, the sample variance
of the latent variable λt is set to equal the variance specified in our model.
Third, the sample first-order serial covariance of the latent variable λt is set to
match the covariance specified in our model. Standard errors of the parameter
estimates are based on bootstrap simulation, as described in Appendix B.
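The three moment conditions can be illustrated on a simulated AR[1] series. The sketch below is a simplification: it takes the series of λt as given, whereas in the estimation described above λt is itself backed out from price-to-dividend ratios and therefore changes with the candidate parameters. The function names, sample size, and parameter values are assumptions made for illustration.

```python
import random


def ar1_moment_conditions(lam, omega, sigma_lam):
    """Sample analogues of the mean, variance, and serial-covariance moments."""
    n = len(lam)
    mean = sum(lam) / n
    variance = sum((v - mean) ** 2 for v in lam) / n

    lagged = lam[:-1]
    innov = [lam[t + 1] - omega * lam[t] for t in range(n - 1)]
    lagged_mean = sum(lagged) / (n - 1)
    innov_mean = sum(innov) / (n - 1)
    cov = sum((l - lagged_mean) * (e - innov_mean)
              for l, e in zip(lagged, innov)) / (n - 1)

    m1 = mean
    m2 = variance - sigma_lam ** 2 / (1.0 - omega ** 2)
    m3 = cov
    return m1, m2, m3


def simulate_ar1(n, omega, sigma_lam, seed=0):
    """Simulate a zero-mean Gaussian AR[1], started from its stationary distribution."""
    rng = random.Random(seed)
    lam = [rng.gauss(0.0, sigma_lam / (1.0 - omega ** 2) ** 0.5)]
    for _ in range(n - 1):
        lam.append(omega * lam[-1] + sigma_lam * rng.gauss(0.0, 1.0))
    return lam


# Hypothetical parameter values, for illustration only.
lam_series = simulate_ar1(20000, omega=0.8, sigma_lam=0.1)
at_truth = ar1_moment_conditions(lam_series, omega=0.8, sigma_lam=0.1)
at_wrong = ar1_moment_conditions(lam_series, omega=0.5, sigma_lam=0.1)
print("moments at true parameters: ", at_truth)
print("moments at wrong parameters:", at_wrong)
```

At the parameters used to generate the data, all three sample moments are close to zero, while a misspecified persistence leaves the variance and serial-covariance moments visibly away from zero. This is the sense in which three moments identify three parameters.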
Our choice to estimate ω, σλ , and γ from price-to-dividend ratios is consistent
with the existing literature on learning from prices.18 Still, the fact that our
model features consumption but our estimation of the model does not is a
drawback of our approach. Including clean consumption data in our estimation,
if such data were available, would mean having extra independent observations
for estimating state variables and parameters. However, as simulation results

18 For example, the literature on rational expectations equilibrium models.


in Table VI suggest, the gain in efficiency as a result of having extra data may
be rather limited.19
To focus on the role of learning about dividends on asset pricing and differ-
entiate ourselves from Johannes, Lochstoer, and Mou (2016), we first run (35)
by fitting the three moments of the latent variable λt based on the entire data
sample between 1946 and 2015. Here, learning is restricted to parameters in
the dividend model, that is, we focus just on learning about dividends. How-
ever, because we use the entire data sample, a forward-looking bias may be
introduced. To mitigate this concern, we also run (35) at each point in time us-
ing only the data available at the time. In this case, learning is for parameters
both in and beyond the dividend model, that is, full learning.
In Figure 5, we plot parameters in our long-run risks model not in the div-
idend process, estimated based on nonoverlapping annual data up to time τ ,
for τ between 1975 and 2016. In the same figure, we also plot coefficients A·
that relate price-to-dividend ratios and state variables to expected returns in
(33), assuming full information, learning about dividends, or full learning. We
see that the coefficients A· fluctuate significantly over time. In particular, as
learning is introduced, these coefficients become additional model-specific state
variables in determining stock index expected returns. This observation is con-
sistent with the findings of Collin-Dufresne, Johannes, and Lochstoer (2016).
Not surprisingly, variation in the coefficients A· is greater under full learning
than learning about dividends.
In Figure 6, we plot the evolution of the four state variables in our long-run
risks model, the latent variables xt and λt , the earnings-to-dividend ratio, and
the inflation rate, as well as expected excess stock index returns and the risk-
free rate, over time assuming full information, learning about dividends, or full
learning. We find that, consistent with data, most of the variation in expected
stock index returns is attributable to variation in expected excess returns. Not
surprisingly, learning increases the volatility of both expected excess returns
and the risk-free rate. Interestingly, Figure 6 suggests that, around the time
of the 2001 recession, expected one-year excess returns are negative. This is a
result of model-implied correlations between shocks to dividends and consump-
tion being highly negative. That is, we find that the stock index temporarily
serves as a hedge for consumption during this period as an equilibrium outcome
of our model.
We examine how our long-run risks model, assuming either learning about
dividends or full learning, performs in predicting annual stock index returns.
We measure forecasting performance using the out-of-sample R2,

$$
R^2_O(L) = 1 - \frac{\sum_{t=T_0}^{T-1} \big(R_{t+1} - E_t[R_{t+1}|L]\big)^2}{\sum_{t=T_0}^{T-1} \big(R_{t+1} - \hat{\mu}_{r,t}\big)^2}, \tag{36}
$$

19 Also, we need high-frequency consumption data to reasonably fit a model with time-varying

correlations in dividends and consumption shocks.

Figure 5. Evolution of long-run risks model parameter and coefficient estimates over
time. This figure plots estimates of the parameters in our long-run risks model, aside from those
in the dividend process, and coefficients A· that relate price-to-dividend ratios, the latent variable
xt , the retention ratio, and the inflation rate to expected returns, assuming that these parameters
are estimated based on data up to time τ for τ between 1975 and 2016. The shaded regions are
recessions. A year is in recession if any of its months correspond to NBER recession dates. (Color
figure can be viewed at wileyonlinelibrary.com)

Figure 6. Evolution of long-run risks model state variables, expected excess stock index
returns, and risk-free rate over time. This figure plots estimates of the state variables for our
long-run risks model, as well as expected excess returns and the risk-free rate from our model,
assuming full information, learning about dividends, or learning about all parameters in our long-
run risks model (i.e., full learning), between 1975 and 2016. The shaded regions are recessions. A
year is in recession if any of its months correspond to NBER recession dates. (Color figure can be
viewed at wileyonlinelibrary.com)

where L stands for learning. We use data since 1946 as the training period
and compute the out-of-sample R2 using nonoverlapping annual data between
1975 and 2016. In the first row of Table IX, we report out-of-sample R2 s for
predicting annual stock index returns using our learning models. We find that,
between 1975 and 2016, our learning models predict 25.3% to 27.1% of the
variation in annual stock index returns out-of-sample.
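
As a rough illustration of how the statistic in (36) is computed, the sketch below compares a model's squared forecast errors with those of a historical-mean benchmark built only from past returns. The function names and the sample inputs are hypothetical; the model forecasts themselves would come from (32) and are taken as given here.

```python
def expanding_mean_forecasts(returns, first_forecast_index):
    """Historical-mean forecast of returns[t] that uses returns[:t] only."""
    return [sum(returns[:t]) / t
            for t in range(first_forecast_index, len(returns))]


def out_of_sample_r2(realized, model_forecast, benchmark_forecast):
    """One minus the ratio of model to benchmark sums of squared errors."""
    sse_model = sum((r - f) ** 2 for r, f in zip(realized, model_forecast))
    sse_bench = sum((r - m) ** 2 for r, m in zip(realized, benchmark_forecast))
    return 1.0 - sse_model / sse_bench
```

A positive value means the model's forecasts have a smaller sum of squared errors than the historical mean over the evaluation period; a value of zero means the two perform equally well.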
To better quantify the incremental contribution of learning to the model’s
performance in predicting annual stock index returns, we compute expected
returns in (32) using dividend model parameters estimated based on the en-
tire sample between 1975 and 2016, that is, our full-information model. We
also report out-of-sample R2 s for predicting stock index returns using our full-
information model in the first and second columns of Table IX. As we move from

Table IX
Stock Index Returns and Epstein and Zin (1989) Expected Returns
This table reports out-of-sample R2 s for predicting stock index returns using our long-run risks
model, assuming that investors have full information, learn about dividends, or learn about all pa-
rameters in our long-run risks model (i.e., full learning), and the corresponding bootstrap-simulated
p-values. Also reported are incremental out-of-sample R2 s for predicting stock index returns as-
suming learning over predicting returns assuming full information. Dividends are estimated based
on nonoverlapping annual data since 1946. Statistics are based on nonoverlapping annual data
between 1976 and 2015.


                            R2      p-Value   R2I     p-Value

Full information            0.133   0.017
Learning about dividends    0.253   0.001     0.138   0.015
Full learning               0.271   0.000     0.159   0.009

learning to full information, the model's R2 decreases from at least 25.3% to
13.3%. So, learning accounts for approximately half of the return predictability
documented. To examine the significance of this difference, in Table IX,
we report the incremental R2 of our learning models over our full-information
model:

$$
R^2_I(L, F) = 1 - \frac{\sum_{t=T_0}^{T-1} \big(R_{t+1} - E_t[R_{t+1}|L]\big)^2}{\sum_{t=T_0}^{T-1} \big(R_{t+1} - E_t[R_{t+1}|F]\big)^2}. \tag{37}
$$

The results show that there is a statistically significant gain in forecasting

performance from modeling investors’ learning about dividend dynamics.20 In
addition, performance for predicting annual stock index returns is slightly
better for full learning than learning about dividends. However, we do not have
enough statistical power to conclude that learning about parameters beyond
the dividend model plays a statistically significant role.
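
The incremental R2 can be checked directly against the out-of-sample R2 values in Table IX, because both statistics share the same historical-mean benchmark. The short sketch below applies the relation stated in footnote 20; the function name `incremental_r2` is hypothetical.

```python
def incremental_r2(r2_learning, r2_full_info):
    """Incremental R2 implied by two out-of-sample R2s with a common benchmark."""
    return 1.0 - (1.0 - r2_learning) / (1.0 - r2_full_info)


# Table IX inputs: R2 of 0.133 (full information), 0.253 (learning about
# dividends), and 0.271 (full learning).
print(round(incremental_r2(0.253, 0.133), 3))  # 0.138
print(round(incremental_r2(0.271, 0.133), 3))  # 0.159
```

Both results agree with the incremental R2 values of 0.138 and 0.159 reported in Table IX.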
For additional details on how our learning models' forecasting performance
evolves over time, at each time τ, we follow Goyal and Welch (2008) and define
the cumulative sum of squared errors difference (SSED) between annual stock
index returns predicted using our learning models and using the historical
mean of returns as

$$
D_\tau(L) = \sum_{t=T_0}^{\tau-1} \big(R_{t+1} - E_t[R_{t+1}|L]\big)^2 - \sum_{t=T_0}^{\tau-1} \big(R_{t+1} - \hat{\mu}_{r,\tau}\big)^2. \tag{38}
$$

20 We note that R2I(L, F), R2(L), and the out-of-sample R2 of our full-information model, that is,
R2(F), are related through the following equation:

$$
R^2_I(L, F) = 1 - \frac{1 - R^2(L)}{1 - R^2(F)}.
$$

Figure 7. Cumulative sum of squared errors difference. Panel A plots the cumulative sum
of squared errors difference (SSED) of our long-run risks model, assuming learning about divi-
dends, in predicting stock index returns. Panel B plots the SSED of our long-run risks model,
assuming learning about all parameters in our long-run risks model (i.e., full learning). Dividends
are estimated based on nonoverlapping annual data since 1946. Statistics are based on nonover-
lapping annual data between 1975 and 2016. The shaded regions are recessions. A year is in
recession if any of its months correspond to NBER recession dates. (Color figure can be viewed at

Table X
Long-Run Risks Model and Empirical Proxies of Expected Returns
This table reports out-of-sample R2 s for predicting stock index returns using our long-run risks
model, assuming learning, over the proxies for expected returns in Kelly and Pruitt (2013) or Li,
Ng, and Swaminathan (2013) and the corresponding bootstrap-simulated p-values. Dividends are
estimated based on nonoverlapping annual data since 1946. Statistics are based on nonoverlapping
annual data between 1981 and 2009 for Kelly and Pruitt (2013) and between 1995 and 2013 for Li,
Ng, and Swaminathan (2013).


                                          Learning about Dividends    Full Learning
                                          R2I      p-Value            R2I      p-Value

Kelly and Pruitt (2013) (1975 to 2009)    0.140    0.045              0.152    0.036
Li, Ng, and Swaminathan (1995 to 2013)    0.083    0.230              0.069    0.276

The SSEDs for our learning models are plotted in Figure 7. If the forecast-
ing performance of our learning model is stable and robust over time, we
should observe a steady and consistent decline in SSED. If, instead, fore-
casting performance is especially poor in a certain subperiod of the data,
we should see a significant reversal in SSED during that subperiod. A flat
SSED would suggest that our model neither improves nor worsens forecast-
ing performance relative to the historical mean. We note that our model's
forecasting performance is positive over the majority of the sample period.
Overall, as shown in Figure 7, most of the forecasting performance can be
attributed to the early three-fourths of the sample, while performance is
relatively flat during the most recent one-fourth.
To see the incremental contribution of learning to SSED over time, in
Figure 8, we plot the incremental SSED, which is given as the difference in
SSED between our learning model and our full-information model:

$$
D_\tau(L) - D_\tau(F) = \sum_{t=T_0}^{\tau-1} \big(R_{t+1} - E_t[R_{t+1}|L]\big)^2 - \sum_{t=T_0}^{\tau-1} \big(R_{t+1} - E_t[R_{t+1}|F]\big)^2. \tag{39}
$$

Similar to what we observe above, the incremental gain in forecasting per-
formance from learning is large and reasonably consistent, but is concentrated
mostly in the early three-fourths of the sample.
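
The cumulative differences plotted in Figures 7 and 8 can be sketched with a single running sum. The helper below is a hypothetical illustration: passing the historical-mean forecasts as `forecast_b` gives a path of the kind defined in (38), while passing the full-information forecasts gives the incremental SSED in (39).

```python
def cumulative_ssed(realized, forecast_a, forecast_b):
    """Running sum of squared errors of forecast_a minus those of forecast_b."""
    path = []
    total = 0.0
    for r, a, b in zip(realized, forecast_a, forecast_b):
        # A negative increment means forecast_a was closer to the realized
        # return than forecast_b in that period.
        total += (r - a) ** 2 - (r - b) ** 2
        path.append(total)
    return path
```

Under this sign convention, a declining path indicates that the first set of forecasts is accumulating smaller squared errors than the second.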

C.1. Long-Run Risks Model and Other Return Forecasts

Goyal and Welch (2008) document that empirical forecasts of stock index
returns overwhelmingly lack out-of-sample predictive power. However, Kelly
and Pruitt (2013) and Li, Ng, and Swaminathan (2013), among others, over-
come the Goyal and Welch (2008) critique and find out-of-sample return

Figure 8. Incremental gain in cumulative sum of squared errors difference from learn-
ing. Panel A plots the incremental gain in the cumulative sum of squared errors difference (SSED)
of our long-run risks model, assuming learning about dividends versus full information. Panel
B plots the incremental gain in SSED of our long-run risks model, assuming learning about all
parameters in our long-run risks model (i.e., full learning), versus full information. Dividends
are estimated based on nonoverlapping annual data since 1946. Statistics are based on nonover-
lapping annual data between 1976 and 2015. The shaded regions are recessions. A year is in
recession if any of its months correspond to NBER recession dates. (Color figure can be viewed at

Figure 9. Autocorrelation function and partial autocorrelation function of inflation

rate. This figure plots the autocorrelation function (Panel A) and the partial autocorrelation
function (Panel B) of inflation rates, up to 10 years lag. Correlations are estimated based on data
between 1946 and 2016. (Color figure can be viewed at wileyonlinelibrary.com)

predictability. We compare the out-of-sample forecasting performance of our

long-run risks model, assuming either learning about dividends or full learn-
ing, to these more successful empirical proxies of stock index returns in the
existing literature.
To avoid the concern of selection bias in our results, we compare our learning
models to these empirical proxies of returns based on the sample period used
by the corresponding original authors. For Kelly and Pruitt (2013), this is
between 1975 and 2009. For Li, Ng, and Swaminathan (2013), this is between
1995 and 2013. To evaluate performance, in Table X, we report incremental
R2 s for predicting stock index returns using our learning models over these
alternative expected return proxies for the selected periods. The results show
that our learning models outperform these empirical proxies of expected stock
index returns.21

C.2. Recession versus Expansion

Figure 7 suggests that the period around the 2001 recession plays an es-
pecially important role in the return predictability results. Figure 7 further
suggests that the learning models' performance is positive in most of the other
recessions as well. To shed more light on how return predictability differs be-
tween expansions and recessions, we divide our 1975 to 2016 sample into

21 However, when compared to Li, Ng, and Swaminathan (2013), the outperformance is not

statistically significant due to the shortened data sample.

Table XI
Return Predictability during Expansions versus Recessions
This table reports out-of-sample R-square values for predicting stock index returns using our long-
run risks model, assuming that investors have full information, learn about dividends, or learn
about all parameters in our long-run risks model, that is, full learning, and the corresponding
bootstrap-simulated p-values. Also reported are incremental out-of-sample R-square values for
predicting stock index returns assuming learning over assuming full information. Statistics are
based on nonoverlapping annual data between 1975 and 2016 and are separately reported for
expansions versus recessions. A year is in recession if any of its months overlap with NBER
recession dates.

                            Expansion                              Recession
                                              Incremental                            Incremental
                            R2      p-Value   R2I     p-Value      R2      p-Value   R2I     p-Value

Full information            0.132   0.029                          0.138   0.455
Learning about dividends    0.196   0.007     0.074   0.109        0.516   0.085     0.438   0.128
Full learning               0.191   0.008     0.068   0.024        0.641   0.037     0.584   0.056

expansions versus recessions and separately report the forecasting perfor-

mance results. We classify a year as in a recession if any of its months corre-
spond to the NBER recession dates. There are six such years in our 42-year data
sample. Table XI reports the results. We find that the forecasting performance
of our learning models is much stronger during recessions than expansions, but
performance during expansions is nevertheless robust, with an R2 of 19.1% to
19.6% assuming learning versus 13.2% assuming full information. The finding
that predictability is strongest during market downturns is not surprising and
is consistent with the existing literature. For example, based on four centuries
of stock market data, Golez and Koudijs (2018) find that most of the predictabil-
ity of future stock returns using price-to-dividend ratios stems from recessions.
However, in both expansions and recessions, our learning model outperforms
our full-information model, suggesting that learning plays an important role
regardless of economic conditions.

C.3. The Role of Epstein and Zin (1989) Preferences

To emphasize that Epstein and Zin (1989) preferences are critical to our
return predictability results, we build a model in which we replace Epstein
and Zin (1989) preferences with CRRA preferences:

Ut = δt , (40)

where α and δ are set to best match the unconditional means of stock index
returns and the risk-free rate in our sample. While estimates of the parameters

Table XII
Stock Index Returns and CRRA Expected Returns
This table reports out-of-sample R2 s for predicting stock index returns using our CRRA model, as-
suming that investors have full information, learn about dividends, or learn about all parameters
in our CRRA model (i.e., full learning) and the corresponding bootstrap-simulated p-values. Also
reported are incremental out-of-sample R2 s for predicting stock index returns assuming learning
over predicting returns assuming full information. Dividends are estimated based on nonoverlap-
ping annual data since 1946. Statistics are based on nonoverlapping annual data between 1975
and 2016.


                            R2      p-Value   R2I     p-Value

Full information            0.118   0.026
Learning about dividends    0.144   0.013     0.030   0.276
Full learning               0.151   0.011     0.037   0.219

in the dividend model do not change with preferences, the three remaining
parameters, ω, σλ, and γ, need to be re-estimated. We estimate the parameters ω,
σλ, and γ of our model using GMM by fitting the same set of moments in
(35) under CRRA preferences and the chosen preference parameters. We then
derive expected stock index returns under CRRA preferences. In Table XII, we
report R2 s for predicting annual stock index returns using the CRRA model
assuming learning about dividends, full learning, or full information. We see
that, assuming learning, R2 s for predicting annual stock index returns decrease
from at least 25.3% for Epstein and Zin (1989) preferences to at most 15.1%
for CRRA preferences, and the lack of the incremental contribution of learning
to R2 accounts for most of this reduction. It is clear from these results that
modeling investor behavior using CRRA preferences cannot fully capture the
effect of learning on expected stock index returns.

V. Conclusion
In this paper, we develop a time-series model of dividend growth rates that
is inspired by both the latent variable model of Cochrane (2008), van Bins-
bergen and Koijen (2010), and others and the vector-autoregressive model of
Campbell and Shiller (1988b). The model shows strong performance in pre-
dicting annual dividend growth rates. We find that some parameters of our
dividend model are difficult to estimate with precision in a finite sample. As a
consequence, learning about dividend model parameters significantly changes
investor beliefs about future dividends and the nature of the long-run risks in
the economy.
We show how to evaluate the economic and statistical significance of learn-
ing about parameters in the dividend process in determining asset prices and
returns. We argue that a better asset pricing model should forecast returns
better. We find that a long-run risks model that incorporates learning about
dividend dynamics is surprisingly successful in forecasting stock index returns.

In particular, our long-run risks model featuring Epstein and Zin (1989) pref-
erences and persistent shocks to dividends and consumption under learning
explains 25.3% to 27.1% of the variation in annual stock index returns, while
shutting down learning reduces the R2 to 13.3%. This decrease in R2 is statis-
tically significant and economically meaningful. We also show that we cannot
replicate our learning results under CRRA preferences. Our findings highlight
the joint importance of investors' aversion to long-run risks and their learning
about these risks for understanding asset prices.

Initial submission: September 30, 2015; Accepted: November 13, 2017

Editors: Bruno Biais, Michael R. Roberts, and Kenneth J. Singleton

Appendix A: Estimation of Parameters in Our Dividend Model

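We estimate parameters of the following system of equations that jointly
describe the dividend, earnings (i.e., retention ratio), and the inflation process.
See (6), (9), and (16):

$$
\begin{aligned}
\Delta d_{t+1} - \mu_d &= x_t + \phi\,(q_t - \mu_q) + \sigma_d\,\epsilon_{d,t+1},\\
x_{t+1} &= \rho x_t + \sigma_x\,\epsilon_{x,t+1},\\
q_{t+1} - \mu_q &= \theta\,(q_t - \mu_q) + \sigma_q\,\epsilon_{q,t+1},\\
\pi_{t+1} - \mu_\pi &= \eta\,(\pi_t - \mu_\pi) + \sigma_\pi\,\epsilon_{\pi,t+1},\\
\begin{pmatrix} \epsilon_{d,t+1}\\ \epsilon_{x,t+1}\\ \epsilon_{q,t+1}\\ \epsilon_{\pi,t+1} \end{pmatrix}
&\sim \text{i.i.d. } N\left(0,
\begin{pmatrix}
1 & \lambda_{dx} & \lambda_{dq} & \lambda_{d\pi}\\
\lambda_{dx} & 1 & \lambda_{xq} & \lambda_{x\pi}\\
\lambda_{dq} & \lambda_{xq} & 1 & \lambda_{q\pi}\\
\lambda_{d\pi} & \lambda_{x\pi} & \lambda_{q\pi} & 1
\end{pmatrix}\right).
\end{aligned}
\tag{A1}
$$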

To estimate parameters in the third equation of (A1), we run an autoregression
on retention ratios:

$$
q_{t+1} - \mu_q = \theta\,(q_t - \mu_q) + \sigma_q\,\epsilon_{q,t+1}, \qquad \epsilon_{q,t+1} \sim \text{i.i.d. } N(0, 1). \tag{A2}
$$

To estimate parameters in the fourth equation of (A1), we run an autoregression
on inflation rates:

$$
\pi_{t+1} - \mu_\pi = \eta\,(\pi_t - \mu_\pi) + \sigma_\pi\,\epsilon_{\pi,t+1}, \qquad \epsilon_{\pi,t+1} \sim \text{i.i.d. } N(0, 1). \tag{A3}
$$
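
The autoregressions for retention ratios and inflation rates have the same form, so one routine can serve both. The sketch below assumes ordinary least squares, which the text does not specify, and uses a hypothetical function name `fit_ar1`. The innovation volatility is computed without a degrees-of-freedom adjustment, which is also an assumption.

```python
import math


def fit_ar1(series):
    """OLS estimates (mu, coef, sigma) for s[t+1] - mu = coef*(s[t] - mu) + sigma*eps."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    coef = sxy / sxx                      # persistence (theta or eta)
    intercept = y_bar - coef * x_bar
    mu = intercept / (1.0 - coef)         # implied unconditional mean
    resid = [b - intercept - coef * a for a, b in zip(x, y)]
    sigma = math.sqrt(sum(e * e for e in resid) / n)
    return mu, coef, sigma
```

Applied to the retention ratio series, the three returned values would play the roles of μq, θ, and σq; applied to inflation, they would play the roles of μπ, η, and σπ.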

For the remaining parameters in the first and second equations of (A1), we
note that dividend growth rates and contemporaneous earnings are related, as
shocks to dividends also impact contemporaneous earnings in (9), and vice

Table AI
Stock Index Returns, Long-Run Risks Expected Returns,
and Stock Yields
Panel A reports estimated coefficients from regressing future stock index returns on expected
returns from our long-run risks model, assuming that investors have full information, learn about
dividends, or learn about all parameters in our long-run risks model (i.e., full learning). Panel B
reports estimated coefficients from regressing expected returns from our long-run risks model
on stock yields. Long-run risks model expected returns are computed, assuming that investors
have full information, learn about dividends, or learn about all parameters in our long-run risks
model (i.e., full learning). Stock yields are computed assuming investors learn (i.e., syt (L)) or do
not learn (i.e., syt (F)) about dividends. Dividends are based on nonoverlapping annual data since
1946. Reported in parentheses are Newey and West (1987) standard errors that account for up to
10 years of serial correlations. Estimates significant at the 90%, 95%, and 99% confidence levels
are indicated using *, **, and ***.

Panel A. Stock Index Returns and Expected Returns

            Full Information   Learning about Dividends   Full Learning

Constant    0.016              0.003                      0.008
            (0.034)            (0.013)                    (0.015)
Slope       0.902***           1.203***                   1.174***
            (0.246)            (0.083)                    (0.140)
R2          0.135              0.254                      0.262

Panel B. Stock Index Expected Returns and Stock Yields

            Full Information   Learning about Dividends   Full Learning

syt (L)                        3.614***                   3.895***
                               (0.772)                    (0.670)
syt (F)     4.410***
R2          0.972              0.708                      0.661

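versa. So, we estimate the process of dividends and earnings through the
following system of equations:

$$
\begin{aligned}
\Delta d_{t+1} - \mu_d &= y_{t+1} + b_1\,(\Delta e_{t+1} - \mu_d) + b_2\,(q_t - \mu_q) + \nu_{d,t+1},\\
y_{t+1} &= b_3\,y_t + \nu_{y,t+1},\\
\begin{pmatrix} \nu_{d,t+1}\\ \nu_{y,t+1} \end{pmatrix}
&\sim \text{i.i.d. } N\left(0, \begin{pmatrix} \varsigma_d^2 & 0\\ 0 & \varsigma_y^2 \end{pmatrix}\right).
\end{aligned}
\tag{A4}
$$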

To apply the Kalman filter, let ŷt|s denote the time-s expectation of the latent
variable yt and Pt|s denote the variance of yt conditioning on information at
time s. Set initial conditions $\hat{y}_{0|0} = 0$ and $P_{0|0} = \varsigma_y^2/(1 - b_3^2)$. We can then iterate the

Table AII
Dividend Growth Rates and Expected Growth Rates (Quarterly,
Semiannual, and Biannual Rates)
Panel A reports R2 s for predicting dividend growth rates (quarterly, semiannual, or biannual
rates), computed using our dividend model, the latent variable model in van Binsbergen and
Koijen (2010), the VAR model in Campbell and Shiller (1988b), or the Markov-switching model in
Johannes, Lochstoer, and Mou (2016). The first column reports in-sample R-square values. The
second and third columns report out-of-sample R-square values and the corresponding bootstrap-
simulated p-values. Panel B reports incremental R2 s for predicting dividend growth rates using our
model over one of the baseline models. Dividends are estimated based on nonoverlapping annual
data since 1930. Out-of-sample statistics are based on nonoverlapping quarterly, semiannual, or
biannual data between 1975 and 2016.

Panel A. Out-of-Sample R2

                                        Quarterly          Semiannual         Biannual
                                        R2O     p-Value    R2O     p-Value    R2O     p-Value

Our model                               0.502   0.000      0.515   0.000      0.379   0.003
van Binsbergen and Koijen (2010)        0.337   0.000      0.267   0.000      0.151   0.080
Campbell and Shiller (1988b)            0.328   0.000      0.298   0.000      0.262   0.017
Johannes, Lochstoer, and Mou (2016)     0.075   0.079      0.001   0.855      −0.034  1.000

Panel B. Incremental R2

                                        Quarterly          Semiannual         Biannual
                                        R2I     p-Value    R2I     p-Value    R2I     p-Value

van Binsbergen and Koijen (2010)        0.212   0.002      0.311   0.000      0.268   0.015
Campbell and Shiller (1988b)            0.222   0.002      0.281   0.000      0.158   0.073
Johannes, Lochstoer, and Mou (2016)     0.435   0.000      0.495   0.000      0.399   0.002

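following system of equations:

$$
\begin{aligned}
\hat{y}_{t+1|t} &= b_3\,\hat{y}_{t|t}, \qquad P_{t+1|t} = b_3^2\,P_{t|t} + \varsigma_y^2,\\
\epsilon_{t+1} &= \Delta d_{t+1} - \mu_d - a_1 - \hat{y}_{t+1|t} - b_1\,(\Delta e_{t+1} - \mu_d) - b_2\,(q_t - \mu_q),\\
\hat{y}_{t+1|t+1} &= \hat{y}_{t+1|t} + \frac{P_{t+1|t}}{P_{t+1|t} + \varsigma_d^2}\,\epsilon_{t+1}, \qquad
P_{t+1|t+1} = P_{t+1|t} - \frac{P_{t+1|t}^2}{P_{t+1|t} + \varsigma_d^2}.
\end{aligned}
\tag{A5}
$$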

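At each time τ, to estimate parameters in (A4), we maximize the log-likelihood

$$
l = -\sum_{t=0}^{\tau-1} \left[ \log\big(P_{t+1|t} + \varsigma_d^2\big) + \frac{\epsilon_{t+1}^2}{P_{t+1|t} + \varsigma_d^2} \right]. \tag{A6}
$$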

Throughout, we apply the Kalman filter using nonoverlapping annual data.
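
The recursion in (A5) and the objective in (A6) can be sketched for a scalar state as follows. This is a minimal illustration under stated assumptions, not the estimation code used in the paper: the function name `kalman_loglik` is hypothetical, `obs` is assumed to already be dividend growth net of its mean and of the earnings and retention ratio terms, and `var_d` and `var_y` denote the variances ςd² and ςy². Maximizing the returned value over the parameters would require a numerical optimizer, which is not shown.

```python
import math


def kalman_loglik(obs, b3, var_d, var_y):
    """Filter a scalar AR[1] state observed with noise; return (loglik, states).

    obs[t] is the part of the measurement equation equal to state plus noise.
    The log-likelihood omits constants, following the form of (A6).
    """
    y_hat = 0.0                        # initial state estimate
    p = var_y / (1.0 - b3 ** 2)        # unconditional state variance
    loglik = 0.0
    filtered = []
    for w in obs:
        # Prediction step.
        y_pred = b3 * y_hat
        p_pred = b3 ** 2 * p + var_y
        # Innovation and its variance.
        innov = w - y_pred
        s = p_pred + var_d
        # Update step.
        y_hat = y_pred + (p_pred / s) * innov
        p = p_pred - p_pred ** 2 / s
        loglik -= math.log(s) + innov ** 2 / s
        filtered.append(y_hat)
    return loglik, filtered
```

The filtered states returned alongside the log-likelihood correspond to the sequence of updated estimates of the latent variable.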

To then map parameters in (A4) to those in the first and second equations of
(A1), we substitute $\Delta e_{t+1} = q_{t+1} - q_t + \Delta d_{t+1}$ and (A2) into (A4) and rearrange

Table AIII
Stock Index Returns and Stock Yields (Quarterly, Semiannual, and
Biannual Rates)
This table reports coefficient estimates and R2 s from regressing future stock index returns (quar-
terly, semiannual, or biannual returns) on log price-to-dividend ratios and stock yields computed
using our dividend model, the latent variable model in van Binsbergen and Koijen (2010) (vBK),
the VAR model in Campbell and Shiller (1988b) (CS), or the Markov-switching model in Johannes,
Lochstoer, and Mou (2016) (JLM) and assuming investors either learn (i.e., syt (L)) or do not learn
(i.e., syt (F)) about dividends. Dividends are estimated based on nonoverlapping annual data since
1930. Regressions are based on nonoverlapping quarterly, semiannual, or biannual data between
1975 and 2016. Reported in parentheses are Newey and West (1987) standard errors that account
for up to 10 years of serial correlation. Estimates significant at the 90%, 95%, and 99% confidence
levels are indicated using *, **, and ***.

Panel A. Quarterly Rates

                    Our Model                 Baseline Model
        pt − dt     syt (L)     syt (F)       vBK         CS          JLM

        −0.033***   0.993***    1.101***      0.857***    1.183***    0.384
        (0.009)     (0.201)     (0.284)       (0.232)     (0.327)     (0.239)
R2      0.032       0.035       0.032         0.026       0.029       0.007

Panel B. Semiannual Rates

                    Our Model                 Baseline Model
        pt − dt     syt (L)     syt (F)       vBK         CS          JLM

        −0.065***   1.894***    2.137***      1.686***    2.215***    0.912
        (0.016)     (0.409)     (0.531)       (0.415)     (0.615)     (0.480)
R2      0.069       0.081       0.072         0.058       0.061       0.022

Panel C. Biannual Rates

                    Our Model                 Baseline Model
        pt − dt     syt (L)     syt (F)       vBK         CS          JLM

        −0.281***   9.312***    9.311***      7.786***    9.063***    4.868**
        (0.073)     (2.216)     (2.341)       (1.986)     (2.663)     (1.884)
R2      0.258       0.299       0.263         0.231       0.208       0.120

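(A4) to get

$$
\begin{aligned}
\Delta d_{t+1} - \mu_d &= x_t + \frac{b_1(\theta - 1) + b_2}{1 - b_1}\,(q_t - \mu_q) + \frac{b_1}{1 - b_1}\,\sigma_q\,\epsilon_{q,t+1} + \frac{1}{1 - b_1}\,\nu_{d,t+1},\\
x_{t+1} &= b_3\,x_t + \nu_{x,t+1}, \qquad x_t = \frac{1}{1 - b_1}\,y_t,\\
\begin{pmatrix} \nu_{d,t+1}\\ \nu_{x,t+1} \end{pmatrix}
&\sim \text{i.i.d. } N\left(0, \begin{pmatrix} \varsigma_d^2 & 0\\ 0 & \left(\frac{1}{1 - b_1}\,\varsigma_y\right)^2 \end{pmatrix}\right).
\end{aligned}
\tag{A7}
$$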

Table AIV
Stock Index Returns and Epstein and Zin (1989) Expected Returns
(Quarterly, Semiannual, and Biannual Rates)
This table reports out-of-sample R2 s for predicting stock index returns (quarterly, semiannual, or
biannual returns) using our long-run risks model, assuming that investors have full information,
learn about dividends, or learn about all parameters in our long-run risks model (i.e., full learning),
and the corresponding bootstrap-simulated p-values. Also reported are incremental out-of-sample
R2 s for predicting stock index returns assuming learning over predicting returns assuming full
information. Dividends are estimated based on nonoverlapping annual data since 1930. Statistics
are based on nonoverlapping quarterly, semiannual, or biannual data between 1975 and 2016.

Panel A. Quarterly Rates

                            R2      p-Value   R2I     p-Value

Full information            0.034   0.017
Learning about dividends    0.051   0.003     0.018   0.079
Full learning               0.053   0.003     0.020   0.067

Panel B. Semiannual Rates

                            R2      p-Value   R2I     p-Value

Full information            0.070   0.015
Learning about dividends    0.155   0.000     0.092   0.005
Full learning               0.166   0.000     0.103   0.003

Panel C. Biannual Rates

                            R2      p-Value   R2I     p-Value

Full information            0.203   0.039
Learning about dividends    0.329   0.006     0.159   0.072
Full learning               0.405   0.002     0.253   0.019

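Thus, the mapping is

$$
\begin{aligned}
\phi &= \frac{b_1(\theta - 1) + b_2}{1 - b_1}, \qquad \rho = b_3,\\
\sigma_x &= \frac{1}{1 - b_1}\,\varsigma_y, \qquad
\sigma_d = \sqrt{\left(\frac{b_1}{1 - b_1}\right)^2 \sigma_q^2 + \left(\frac{1}{1 - b_1}\right)^2 \varsigma_d^2}.
\end{aligned}
\tag{A8}
$$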
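
The mapping in (A8) is a direct change of variables, which the following sketch spells out. The function name `map_parameters` and the argument names are hypothetical, and the sketch assumes that σd is the square root of the two scaled variance terms.

```python
import math


def map_parameters(b1, b2, b3, theta, sigma_q, varsigma_d, varsigma_y):
    """Map the parameters of (A4) into (phi, rho, sigma_x, sigma_d) of (A1)."""
    scale = 1.0 / (1.0 - b1)
    phi = (b1 * (theta - 1.0) + b2) * scale
    rho = b3
    sigma_x = scale * varsigma_y
    # Dividend shock volatility combines the retention ratio shock and the
    # measurement shock, both scaled by 1 / (1 - b1).
    sigma_d = math.sqrt((b1 * scale * sigma_q) ** 2 + (scale * varsigma_d) ** 2)
    return phi, rho, sigma_x, sigma_d
```

The mapping is undefined when b1 equals one, so an estimation routine would need to keep b1 away from that value.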

Appendix B: Bootstrap Simulation

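Each simulation is based on 100,000 iterations. First, we simulate innova-
tions to dividend growth rates and retention ratios:

$$
\begin{pmatrix} \epsilon_{d,t+1}\\ \epsilon_{x,t+1}\\ \epsilon_{q,t+1} \end{pmatrix}
\sim \text{i.i.d. } N\left(0, \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}\right). \tag{B1}
$$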

Dividend model parameters used in the simulations are reported in Table II,
which are estimated based on the full sample between 1946 and 2015. In our
simulations, we use these estimates as if they were the true parameter values.
Based on these innovations, we can simulate the latent variable xt and retention
ratios iteratively as

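$$
\begin{aligned}
x_{t+1} &= \rho x_t + \sigma_x\,\epsilon_{x,t+1},\\
q_{t+1} - \mu_q &= \theta\,(q_t - \mu_q) + \sigma_q\,\epsilon_{q,t+1}.
\end{aligned}
\tag{B2}
$$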

Given the simulated time series of the latent variable xt and retention ratios,
we can simulate dividend growth rates iteratively according to

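$$
\begin{aligned}
\Delta d_{t+1} - \mu_d &= x_t + \phi\,(q_t - \mu_q) + \sigma_d\,\epsilon_{d,t+1},\\
\Delta e_{t+1} &= \Delta q_{t+1} + \Delta d_{t+1}.
\end{aligned}
\tag{B3}
$$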

To simulate price-to-dividend ratios, we use (31), which is derived from our

long-run risks model.
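
One simulated path of the latent variable, retention ratios, and dividend growth rates can be sketched as below. The function name `simulate_dividends`, the starting values (x at zero and q at its mean), and the seed handling are assumptions made for illustration; the parameter values would come from Table II and are not reproduced here. Price-to-dividend ratios, which require (31), are not simulated.

```python
import random


def simulate_dividends(n, rho, theta, phi, mu_d, mu_q,
                       sigma_d, sigma_x, sigma_q, seed=0):
    """Simulate n periods of dividend growth and retention ratios."""
    rng = random.Random(seed)
    x, q = 0.0, mu_q                  # assumed starting values
    growth, ratios = [], []
    for _ in range(n):
        # Independent standard normal innovations, as in (B1).
        eps_d, eps_x, eps_q = (rng.gauss(0.0, 1.0) for _ in range(3))
        # (B3): dividend growth depends on the time-t values of x and q.
        growth.append(mu_d + x + phi * (q - mu_q) + sigma_d * eps_d)
        # (B2): advance the latent variable and the retention ratio.
        x = rho * x + sigma_x * eps_x
        q = mu_q + theta * (q - mu_q) + sigma_q * eps_q
        ratios.append(q)
    return growth, ratios
```

Repeating the call with different seeds would produce the independent samples on which bootstrap statistics could be computed.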

Appendix C: Proof of Proposition 1

Let M0 be the true asset pricing model and let Mi and Mj be two candidate
models. Define $\epsilon_{t+1} = R_{t+1} - E_t[R_{t+1}|M_0]$. We can write

$$
\begin{aligned}
E\big[(E_t[R_{t+1}|M_0] - E_t[R_{t+1}|M_i])^2\big]
&= E\big[(R_{t+1} - E_t[R_{t+1}|M_i])^2\big] + E\big[\epsilon_{t+1}^2\big] - 2\,E\big[(R_{t+1} - E_t[R_{t+1}|M_i])\,\epsilon_{t+1}\big]\\
&= E\big[(R_{t+1} - E_t[R_{t+1}|M_i])^2\big] + E\big[\epsilon_{t+1}^2\big] + 2\,E\big[E_t[R_{t+1}|M_i]\,\epsilon_{t+1}\big] - 2\,E\big[R_{t+1}\,\epsilon_{t+1}\big]\\
&= E\big[(R_{t+1} - E_t[R_{t+1}|M_i])^2\big] + E\big[\epsilon_{t+1}^2\big] - 2\,E\big[R_{t+1}\,\epsilon_{t+1}\big].
\end{aligned}
\tag{C1}
$$

The last equality assumes frictionless and efficient markets and investors
that have rational expectations. As a result, the marginal investor's invest-
ment decisions are based on all information available, and therefore, $\epsilon_{t+1}$ is
orthogonal to any variable that is time-t measurable. Because $E[\epsilon_{t+1}^2]$ and
$E[R_{t+1}\,\epsilon_{t+1}]$ are independent of the model Mi,

$$
\begin{aligned}
& E\big[(E_t[R_{t+1}|M_0] - E_t[R_{t+1}|M_i])^2\big] < E\big[(E_t[R_{t+1}|M_0] - E_t[R_{t+1}|M_j])^2\big]\\
\Leftrightarrow\ & E\big[(R_{t+1} - E_t[R_{t+1}|M_i])^2\big] < E\big[(R_{t+1} - E_t[R_{t+1}|M_j])^2\big]\\
\Leftrightarrow\ & 1 - \frac{E\big[(R_{t+1} - E_t[R_{t+1}|M_i])^2\big]}{E\big[(R_{t+1} - E[R_{t+1}])^2\big]} > 1 - \frac{E\big[(R_{t+1} - E_t[R_{t+1}|M_j])^2\big]}{E\big[(R_{t+1} - E[R_{t+1}])^2\big]}.
\end{aligned}
\tag{C2}
$$

Appendix D: Derivation of Price-Dividend Ratios and Expected Returns in Long-Run Risks Model
We derive the price-to-dividend ratios and expected returns implied by our
long-run risks model, which features the dividend dynamics in (3), consumption
dynamics in (27), and investor preferences in (25). Our model differs from
Bansal and Yaron (2004), as discussed in the body of the paper. Nevertheless,
we can still use the methodology in Bansal and Yaron (2004) to solve for prices
and expected returns in our model. The log stochastic discount factor is given by

$$
m_{t+1} = \zeta \log(\delta) - \frac{\zeta}{\psi}\,\Delta\tilde{c}_{t+1} + (\zeta - 1)\,\tilde{R}^c_{t+1}. \tag{D1}
$$
Let zc,t be the log wealth-to-consumption ratio. By first-order Taylor series
approximation, the log real return of the representative agent's wealth portfolio
can be written as

$$
\tilde{R}^c_{t+1} = g_0 + g_1 z_{c,t+1} - z_{c,t} + \Delta\tilde{c}_{t+1}. \tag{D2}
$$

The log-linearizing constants are

$$
g_0 = \log\big(1 + \exp(\bar{z}_c)\big) - g_1\,\bar{z}_c \quad \text{and} \quad g_1 = \frac{\exp(\bar{z}_c)}{1 + \exp(\bar{z}_c)}.
$$
Assume that the log wealth-to-consumption ratio is of the form
zc,t = Ac,0 + Ac,1 xt . (D3)
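Let μc = μd − μπ and σc = σd/γ. Then, we can write

$$
\begin{aligned}
E_t\big[m_{t+1} + \tilde{R}^c_{t+1}\big] &= \zeta \log(\delta) + \left(\zeta - \frac{\zeta}{\psi}\right)\left(\mu_c + \frac{1}{\gamma}\,x_t\right) + \zeta g_0 + \zeta\,(g_1 - 1)\,A_{c,0} + \zeta\,(g_1\rho - 1)\,A_{c,1}\,x_t,\\
\mathrm{var}_t\big(m_{t+1} + \tilde{R}^c_{t+1}\big) &= \zeta^2\left(1 - \frac{1}{\psi}\right)^2 \sigma_c^2 + \zeta^2\,(g_1 A_{c,1})^2\,\sigma_x^2.
\end{aligned}
\tag{D4}
$$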
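Since $E_t[\exp(m_{t+1} + \tilde{R}^c_{t+1})] = 1$, we can solve for coefficients Ac,0, Ac,1:

$$
\begin{aligned}
A_{c,0} &= \frac{\log(\delta) + \left(1 - \frac{1}{\psi}\right)\mu_c + g_0 + \frac{1}{2}\,\zeta\left(1 - \frac{1}{\psi}\right)^2 \sigma_c^2 + \frac{1}{2}\,\zeta\,(g_1 A_{c,1})^2\,\sigma_x^2}{1 - g_1},\\
A_{c,1} &= \frac{1 - \frac{1}{\psi}}{1 - g_1\rho}.
\end{aligned}
\tag{D5}
$$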

Next, let zd,t be the log price-to-dividend ratio of the stock index and $\tilde{R}_{t+1}$ be
the log real stock index return. Then, by first-order Taylor series approximation,
we can write

$$
\tilde{R}_{t+1} = \kappa_0 + \kappa_1 z_{d,t+1} - z_{d,t} + \Delta\tilde{d}_{t+1}, \tag{D6}
$$

where $\Delta\tilde{d}_{t+1}$ is the real dividend growth rate.

Assume that the log price-to-dividend ratio is of the form

zd,t = Ad,0 + Ad,1 xt + Ad,2 λt + Ad,3 (qt − μq ) + Ad,4 (πt − μπ ). (D7)

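Then note that

$$
\begin{aligned}
E_t\big[m_{t+1} + \tilde{R}_{t+1}\big] &= \zeta \log(\delta) + (\zeta - 1)(g_1 - 1)\,A_{c,0} + (\zeta - 1)(g_1\rho - 1)\,A_{c,1}\,x_t\\
&\quad + \left(\zeta - \frac{\zeta}{\psi} - 1\right)\left(\mu_c + \frac{1}{\gamma}\,x_t\right) + (\zeta - 1)\,g_0 + \kappa_0 + (\kappa_1 - 1)\,A_{d,0}\\
&\quad + (\kappa_1\rho - 1)\,A_{d,1}\,x_t + (\kappa_1\omega - 1)\,A_{d,2}\,\lambda_t + (\kappa_1\theta - 1)\,A_{d,3}\,(q_t - \mu_q)\\
&\quad + (\kappa_1\eta - 1)\,A_{d,4}\,(\pi_t - \mu_\pi) + \mu_c + x_t + \phi\,(q_t - \mu_q) - \eta\,(\pi_t - \mu_\pi),\\
\mathrm{var}_t\big(m_{t+1} + \tilde{R}_{t+1}\big) &= \left(\zeta - 1 - \frac{\zeta}{\psi}\right)^2 \sigma_c^2 + \sigma_d^2 + \big((\zeta - 1)\,g_1 A_{c,1} + \kappa_1 A_{d,1}\big)^2 \sigma_x^2 + (\kappa_1 A_{d,2})^2\,\sigma_\lambda^2\\
&\quad + (\kappa_1 A_{d,3})^2\,\sigma_q^2 + (\kappa_1 A_{d,4})^2\,\sigma_\pi^2 + 2\left(\zeta - 1 - \frac{\zeta}{\psi}\right)\sigma_c\,\sigma_d\,\lambda_t\\
&\quad + 2\left(\zeta - 1 - \frac{\zeta}{\psi}\right)(\kappa_1 A_{d,3})\left(\sqrt{\sigma_d^2 + \sigma_q^2} - \sigma_d\right)\sigma_c\,\lambda_t.
\end{aligned}
\tag{D8}
$$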

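Based on $E_t[\exp(m_{t+1} + \tilde{R}_{t+1})] = 1$, we can solve for $A_{d,0}$, $A_{d,1}$, $A_{d,2}$, $A_{d,3}$, and
$A_{d,4}$:
$$
\begin{aligned}
A_{d,0} &= \frac{1}{1 - \kappa_1}\Big[\zeta \log(\delta) + (\zeta - 1) g_0 + (\zeta - 1)(g_1 - 1) A_{c,0} + \left(\zeta - \frac{\zeta}{\psi} - 1\right)\mu_c + \kappa_0 + \mu_c \\
&\qquad + \tfrac{1}{2}\sigma_d^2 + \tfrac{1}{2}\left((\zeta - 1) g_1 A_{c,1} + \kappa_1 A_{d,1}\right)^2 \sigma_x^2 + \tfrac{1}{2}\left(\kappa_1 A_{d,2}\right)^2 \sigma_\lambda^2 \\
&\qquad + \tfrac{1}{2}\left(\kappa_1 A_{d,3}\right)^2 \sigma_q^2 + \tfrac{1}{2}\left(\kappa_1 A_{d,4}\right)^2 \sigma_\pi^2 + \tfrac{1}{2}\left(\zeta - 1 - \frac{\zeta}{\psi}\right)^2 \sigma_c^2\Big], \\
A_{d,1} &= \frac{\left(\zeta - 1 - \frac{\zeta}{\psi}\right)\frac{1}{\gamma} + (\zeta - 1)(g_1 \rho - 1) A_{c,1} + 1}{1 - \kappa_1 \rho}, \\
A_{d,2} &= \frac{\left(\zeta - 1 - \frac{\zeta}{\psi}\right)\left(\left(\kappa_1 A_{d,3}\right)\left(\sqrt{\sigma_d^2 + \sigma_q^2} - \sigma_d\right) + \sigma_d\right)\sigma_c}{1 - \kappa_1 \omega}, \\
A_{d,3} &= \frac{\phi}{1 - \kappa_1 \theta}, \qquad A_{d,4} = \frac{-\eta}{1 - \kappa_1 \eta}.
\end{aligned} \tag{D9}
$$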

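Substituting the expression for $z_{d,t}$ into $\tilde{R}_{t+1} = \kappa_0 + \kappa_1 z_{d,t+1} - z_{d,t} + \tilde{d}_{t+1}$
leads to
$$
E_t[\tilde{R}_{t+1}] = A_{r,0} + A_{r,1} x_t + A_{r,2} \lambda_t + A_{r,3} (q_t - \mu_q) + A_{r,4} (\pi_t - \mu_\pi), \tag{D10}
$$
where
$$
\begin{aligned}
A_{r,0} &= \kappa_0 - (1 - \kappa_1) A_{d,0} + \mu_d, \qquad A_{r,1} = 1 - (1 - \kappa_1 \rho) A_{d,1}, \\
A_{r,2} &= -(1 - \kappa_1 \omega) A_{d,2}, \qquad A_{r,3} = \phi - (1 - \kappa_1 \theta) A_{d,3}, \qquad A_{r,4} = -\eta - (1 - \kappa_1 \eta) A_{d,4}.
\end{aligned}
$$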

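The expected real return over the next $\tau$ periods is
$$
\begin{aligned}
\sum_{s=0}^{\tau-1} E_t\left[\tilde{R}_{t+s+1}\right] &= \tau A_{r,0} + A_{r,1}\left(\sum_{s=0}^{\tau-1} \rho^s\right) x_t + A_{r,2}\left(\sum_{s=0}^{\tau-1} \omega^s\right) \lambda_t \\
&\quad + A_{r,3}\left(\sum_{s=0}^{\tau-1} \theta^s\right)(q_t - \mu_q) + A_{r,4}\left(\sum_{s=0}^{\tau-1} \eta^s\right)(\pi_t - \mu_\pi).
\end{aligned} \tag{D12}
$$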

For nominal returns, add expected inflation based on the AR[1] model.
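
The multi-period expression in (D12) is a set of geometric sums applied to the one-period coefficients in (D10). The Python sketch below evaluates it in closed form. The function names and all numerical values (the coefficients, the persistence parameters ρ, ω, θ, η, and the current state) are hypothetical placeholders rather than estimates from the paper.

```python
def geometric_sum(phi, tau):
    """Return the sum of phi**s for s = 0, ..., tau - 1."""
    if phi == 1.0:
        return float(tau)
    return (1.0 - phi ** tau) / (1.0 - phi)

def cumulative_expected_return(tau, a_r, persistence, state):
    """Evaluate (D12).

    a_r         : (A_r0, A_r1, A_r2, A_r3, A_r4)
    persistence : (rho, omega, theta, eta), the AR(1) coefficients of the states
    state       : (x_t, lambda_t, q_t - mu_q, pi_t - mu_pi)
    """
    total = tau * a_r[0]
    for coef, phi, value in zip(a_r[1:], persistence, state):
        total += coef * geometric_sum(phi, tau) * value
    return total

# Hypothetical inputs, for illustration only.
a_r = (0.06, 0.5, -0.02, 0.01, -0.01)
persistence = (0.8, 0.6, 0.9, 0.7)
state = (0.01, 0.3, -0.05, 0.01)

one_year = cumulative_expected_return(1, a_r, persistence, state)
five_year = cumulative_expected_return(5, a_r, persistence, state)
print(f"1-period: {one_year:.6f}  5-period: {five_year:.6f}")
```

With τ = 1 every geometric sum equals one, so the function reduces to the one-period expected return in (D10).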

Appendix E: Estimation of ω, σλ , and γ

The three parameters of interest in our GMM estimation are the AR[1]
coefficient ω, the volatility σλ, which must be strictly positive, and the leverage γ.
The three moment conditions are described in (35). First, note that we can use
the third moment condition in (35) to write ω as

$$
\omega = \frac{\mu\left((\lambda_{t+1} - \mu(\lambda_{t+1}))(\lambda_t - \mu(\lambda_t))\right)}{\mu(\lambda_t^2) - \mu(\lambda_t)^2}, \tag{E1}
$$

where μ(·) is the sample mean function. Substituting in (E1), we can then use
the second moment condition in (35) to write σλ as
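$$
\begin{aligned}
\sigma_\lambda &= \sqrt{\left(1 - \omega^2\right)\left(\mu(\lambda_t^2) - \mu(\lambda_t)^2\right)} \\
&= \sqrt{\left(1 - \left(\frac{\mu\left((\lambda_{t+1} - \mu(\lambda_{t+1}))(\lambda_t - \mu(\lambda_t))\right)}{\mu(\lambda_t^2) - \mu(\lambda_t)^2}\right)^2\right)\left(\mu(\lambda_t^2) - \mu(\lambda_t)^2\right)}.
\end{aligned} \tag{E2}
$$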
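
The sample-moment expressions in (E1) and (E2) can be sketched in Python as follows. The helper `ar1_moments` and the simulated series are hypothetical, and the series is generated from an AR(1) process with assumed values ω = 0.6 and σλ = 0.1 only to show that the estimators recover known parameters. In the paper the λ series comes from the model in (34), which is not reproduced here. The square-root form of (E2) uses the fact that a stationary AR(1) process has variance σλ²/(1 − ω²).

```python
import math
import random

def ar1_moments(lam):
    """Return (omega, sigma_lambda) from the sample moments in (E1) and (E2)."""
    current, following = lam[:-1], lam[1:]

    def mean(values):
        return sum(values) / len(values)

    m_cur, m_next = mean(current), mean(following)
    # Numerator of (E1): sample autocovariance at lag one.
    autocov = mean([(b - m_next) * (a - m_cur) for a, b in zip(current, following)])
    # Denominator of (E1): sample variance of lambda_t.
    variance = mean([a * a for a in current]) - m_cur ** 2
    omega = autocov / variance
    sigma_lambda = math.sqrt((1.0 - omega ** 2) * variance)
    return omega, sigma_lambda

# Simulated AR(1) series with assumed parameters, for illustration only.
rng = random.Random(0)
lam = [0.0]
for _ in range(20000):
    lam.append(0.6 * lam[-1] + 0.1 * rng.gauss(0.0, 1.0))

omega_hat, sigma_hat = ar1_moments(lam)
print(f"omega_hat={omega_hat:.3f}  sigma_hat={sigma_hat:.3f}")
```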

So for a given γ, we can solve for the corresponding equilibrium ω and σλ,
denoted as ω∗ and σλ∗, by finding the ω and σλ that are fixed points of the
system of expressions in (34), (E1), and (E2). In other words, at these
fixed points, the second and third moment conditions in (35) are
automatically satisfied.
We can then solve for γ based on the first moment condition in (35), while
satisfying the equilibrium ω = ω∗ and σλ = σλ∗. We can do this numerically by
maximizing $l = -\mu(\lambda_t)^2$.
