W 31899
Ashish Arora
Sharon Belenzon
Larisa C. Cioaca
Lia Sheer
Hansen Zhang
We gratefully acknowledge support from The Henry Crown Institute of Business Research, the
Fuqua School of Business, Qualcomm, and the Sloan Foundation. The views expressed herein are
those of the authors and do not necessarily reflect the views of the National Bureau of Economic
Research.
At least one co-author has disclosed additional relationships of potential relevance for this
research. Further information is available online at http://www.nber.org/papers/w31899
NBER working papers are circulated for discussion and comment purposes. They have not been
peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies
official NBER publications.
© 2023 by Ashish Arora, Sharon Belenzon, Larisa C. Cioaca, Lia Sheer, and Hansen Zhang. All
rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without
explicit permission provided that full credit, including © notice, is given to the source.
The Effect of Public Science on Corporate R&D
Ashish Arora, Sharon Belenzon, Larisa C. Cioaca, Lia Sheer, and Hansen Zhang
NBER Working Paper No. 31899
November 2023
JEL No. O3
ABSTRACT
We study the relationships between corporate R&D and three components of public science:
knowledge, human capital, and invention. We identify the relationships through firm-specific
exposure to changes in federal agency R&D budgets that are driven by the political composition
of congressional appropriations subcommittees. Our results indicate that R&D by established
firms, which account for more than three-quarters of business R&D, is affected by scientific
knowledge produced by universities only when the latter is embodied in inventions or PhD
scientists. Human capital trained by universities fosters innovation in firms. However, inventions
from universities and public research institutes substitute for corporate inventions and reduce the
demand for internal research by corporations, perhaps reflecting downstream competition from
startups that commercialize university inventions. Moreover, abstract knowledge advances per se
elicit little or no response. Our findings question the belief that public science represents a non-
rival public good that feeds into corporate R&D through knowledge spillovers.
Larisa C. Cioaca
Fuqua School of Business
Duke University
100 Fuqua Drive
Durham, NC 27708
larisa.cioaca@duke.edu
Figure 1: Trends in University Science, 1981-2016
Notes: This figure presents trends in university science over time, including U.S. science and engineering
journal publications authored by university researchers (left axis), U.S. hard science PhD dissertations (left
axis), and USPTO patents assigned to universities (right axis). All measures are normalized by their 1981
values. Publication data for 1981-1995 are from Appendix Table 5-44 of Science and Engineering Indicators
1998 (National Science Board, 1998). Publication data for 1995-2003 are from Appendix Table 5-42 of
Science and Engineering Indicators 2010 (National Science Board, 2010). Publication data for 2003-2016 are
from Appendix Table 5-41 of the Science and Engineering Indicators 2018 (National Science Board, 2018).
Dissertation data are from ProQuest Dissertations & Theses Global, while patent data are from PatentsView.
R&D and three dimensions of public science: knowledge, human capital, and inventions.
Corporate innovations can arise from inventions generated internally or acquired externally,
particularly from universities. Scientific knowledge, both from internal research and public
knowledge from universities, lowers the cost of internal invention.2 Human capital is an input
into both internal research and invention.
A firm’s response to increased public science depends on three main factors. First, public
knowledge can complement or substitute for internal research in reducing the marginal cost
of internal inventions. Second, an increase in the supply of human capital reduces the cost
of internal research and invention. Third, public inventions can substitute for internal inven-
tions as inputs into the firm’s innovations, thereby reducing the effective cost of innovation
to the firm. Public inventions can also fuel market entry by startups, reducing the payoff to
2
In a recent example of harnessing public knowledge to lower the cost of internal invention, Swiss pharma-
ceutical company Roche set up the Institute of Human Biology in May 2023 to enable its internal researchers
to collaborate with academic researchers on exploratory research, bioengineering, and translational projects
using organoids (Roche, 2023, May 4). This foundational research will not lead directly to the invention of
new drugs, but will instead provide useful scientific knowledge that reduces the cost of invention (replacing
animal models with organoids may better predict human responses to candidate drugs).
the focal firm’s innovations. The effect of public science on the marginal returns to internal
research and invention depends on the nature of these relationships, as noted in Section 3.
Our empirical analysis includes all publicly traded companies headquartered in the United
States that had at least one year of reported R&D expenditures, at least one granted patent,
and at least three years of consecutive financial records in Compustat between 1980 and
2015. We measure corporate R&D using company patents, scientific publications by cor-
porate scientists, the employment of scientists profiled in the American Men & Women of
Science (hereafter, “AMWS scientists”), and R&D expenditures. Measuring the relevance of
public science to a focal firm’s innovative activity is crucial to our analysis. We use a firm’s
previous publishing across OECD natural science subfields to identify relevant public knowl-
edge. To identify relevant human capital, we use the SPECTER deep learning algorithm
to measure the textual similarity between PhD dissertations and the focal firm’s patents
(Cohan, Feldman, Beltagy, Downey, & Weld, 2020).3 We use a firm’s previous patenting
across technology subclasses to identify public inventions relevant to the firm.4
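As a rough illustration of how a textual-similarity score between a dissertation and a patent could be computed once each document has been mapped to an embedding vector, the sketch below uses cosine similarity. The choice of cosine similarity, the function name, and the toy three-dimensional vectors are assumptions made for illustration only; real SPECTER embeddings are much higher-dimensional, and the procedure actually used is described in the paper's Online Appendix A.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors.

    Assumption: similarity is measured by the cosine of the angle between
    the two document embeddings. Returns 0.0 if either vector is all zeros.
    """
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings for one dissertation abstract and one patent abstract.
dissertation_vec = [0.2, 0.7, 0.1]
patent_vec = [0.25, 0.6, 0.05]
similarity = cosine_similarity(dissertation_vec, patent_vec)
```

A firm-level relevance measure could then aggregate such scores over the firm's patents, though the aggregation rule is not specified here and would be a further assumption.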
Estimating the effect of public science on corporate R&D suffers from a classical endo-
geneity problem: technological shocks that affect public science can also affect corporate
R&D, leading to biased OLS estimates. Federal funding may offer a source of exogenous
variation in the public science relevant to a firm. We exploit changes in federal funding that
are driven by political rather than technological forces. Specifically, we use the federal agency
R&D budgets that are predicted by the political composition of the relevant congressional
appropriations subcommittees. Firms differ in the share of their publications published in
various subfields. Subfields differ in the extent to which their publications are funded by
different federal agencies. The combination reflects the extent to which firms are exposed
to R&D funding shocks from different agencies. To arrive at a firm-specific instrumental
variable for relevant public knowledge, we create a many-to-many crosswalk from OECD
natural science subfields to publications, and from publications to R&D funding by federal
agencies. We use a similar approach to develop firm-specific exogenous variation in human
capital and public inventions.
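The exposure logic in the preceding paragraph can be sketched as a two-step weighting: a firm's publication shares across subfields are combined with each subfield's funding shares across agencies, and the result is applied to predicted agency budget changes. The sketch below is a minimal illustration under that reading, not the authors' implementation; the function name, the subfield and agency labels, and all numbers are hypothetical.

```python
def firm_exposure(firm_subfield_share, subfield_agency_share, agency_budget_change):
    """Firm-specific exposure to predicted agency R&D budget changes.

    firm_subfield_share:   {subfield: share of the firm's past publications}
    subfield_agency_share: {subfield: {agency: share of subfield publications
                            funded by that agency}}
    agency_budget_change:  {agency: predicted change in the agency's R&D budget}

    All three inputs are hypothetical stand-ins for the crosswalk in the text.
    """
    exposure = 0.0
    for subfield, firm_share in firm_subfield_share.items():
        agency_shares = subfield_agency_share.get(subfield, {})
        for agency, funding_share in agency_shares.items():
            change = agency_budget_change.get(agency, 0.0)
            exposure += firm_share * funding_share * change
    return exposure


# Illustrative (made-up) inputs for a single firm.
firm_shares = {"chemistry": 0.75, "physics": 0.25}
funding_shares = {
    "chemistry": {"agency_A": 0.5, "agency_B": 0.5},
    "physics": {"agency_C": 1.0},
}
budget_changes = {"agency_A": 0.25, "agency_B": 0.0, "agency_C": -0.5}

example_exposure = firm_exposure(firm_shares, funding_shares, budget_changes)
```

Because the firm shares are measured from earlier publishing and the budget changes are those predicted by subcommittee composition, variation in this exposure across firms comes from predetermined weights rather than from contemporaneous firm choices.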
We present three main results. First, we find that abstract public knowledge per se—
publications in scientific journals—has little effect on the various components of corporate
R&D. This means that corporate innovation is largely unresponsive to “pure” knowledge
3
Recent work has used machine learning to establish connections between patents (e.g., Kelly, Papaniko-
laou, Seru, & Taddy, 2021), between patents and research grants (e.g., Myers & Lanahan, 2022), and to
classify publications into fields (e.g., Angrist, Azoulay, Ellison, Hill, & Lu, 2020).
4
We get similar results if dissertations are matched to firms using OECD natural science subfields, or
if we use non-corporate publications cited by patents and a firm’s previous patenting across technology
subclasses to measure relevant public inventions. See subsection 6.8 for details.
spillovers.
Second, public invention reduces corporate R&D. An increase in relevant university
patents of one standard deviation reduces corporate patents by about 51%, corporate pub-
lications by approximately 33%, and the employment of AMWS scientists by about 8%.
Further, we find that an increase in public invention reduces the firm’s profits, suggesting
that, on balance, public inventions compete with corporate inventions more than they serve
as inputs into corporate innovation.
Third, we find a positive effect of human capital on corporate R&D. An increase of
one standard deviation in PhD dissertations that are textually similar to a focal firm’s
patents increases firm patents by approximately 53%, publications by approximately 22%,
and the employment of AMWS scientists by approximately 9%. Higher human capital from
universities also increases firm profits, consistent with a reduction in the cost of invention
when relevant human capital becomes more abundant.
These effects vary across firms and industries. In particular, firms on the technology
frontier appear to respond less to public invention as compared to followers and to benefit
more from human capital. Similarly, public science appears to stimulate corporate research
in life sciences to a greater extent than in other industries.
Taken together, our findings indicate that the public science that matters for corporate
innovation—the science developed into patented inventions and embodied in the human cap-
ital of people—is both excludable and rivalrous. Thus, the expansion of public science may
not lead to the sustained productivity growth that standard models of economic growth
would predict. Our results also point to the importance of the growing technology com-
mercialization activities of universities. Indeed, between 1980 and 2021, the share of basic
research in the R&D performed by U.S. universities declined from 67% to 62%, while the
share of applied research and development correspondingly grew from 33% to 38%, even as
their R&D expenditures grew more than tenfold (in nominal terms) from around $6 billion
to nearly $90 billion (National Center for Science and Engineering Statistics, 2023a).
We make two main contributions. First, we contribute to the literature that examines
the effect of public science on corporate R&D, as briefly discussed in Section 2. We fo-
cus on established firms, rather than individual researchers, industries, regions, or national
economies, the focus of prior studies. Our simple framework delineates how different compo-
nents of public science, namely publications, patents, and people, affect upstream scientific
research and downstream technology development in corporations. Our findings suggest that
university research is most relevant for corporate innovation not as abstract, non-rivalrous
ideas, but rather as embodied, market-supplied inputs. Incumbent corporations appear to
have a limited ability to absorb and use abstract ideas produced by universities. It is only
when those ideas are developed into inventions that they become relevant to firms, reducing
the demand for internal invention by incumbent corporations and hence also reducing the
demand for internal research. In clarifying the relationship between university research and
corporate R&D, our findings also point to an important implication of university technol-
ogy commercialization activities for R&D in incumbent firms. In particular, the expansion
of university research, particularly more applied research, may spur additional competition
from startups, with corresponding changes in corporate R&D.
Second, we make a data contribution by using funding acknowledgments and other bib-
liometric and textual linkages to connect federal agency funding to publications, PhD disser-
tations, and patents. We build on Babina, He, Howell, Perlman, and Staudt (2023), Myers
and Lanahan (2022), and Azoulay, Ding, and Stuart (2009) by linking university publica-
tions, PhD dissertations, and patents with federal funding, and using exogenous changes in
agency R&D funding to estimate their impact on corporate R&D. To our knowledge, we are
the first to indirectly link federal funding to public knowledge, human capital, and public
invention that is relevant to a given firm’s R&D, even if not directly used by the firm. We
exploit differences in the political composition of congressional appropriations subcommit-
tees as a source of exogenous variation in agency R&D funding. This enables us to analyze
the joint effect of the three components of public science on both upstream and downstream
corporate R&D without the potential bias induced by how firms select the public science to
use in innovation.
The paper proceeds as follows. Section 2 places this study in the related literature.
Section 3 presents the conceptual framework that guides our empirical investigation. Section
4 discusses and summarizes the data, Section 5 outlines the econometric specifications, and
Section 6 presents the results. Section 7 concludes and suggests directions for future work.
2 Related Literature
A voluminous literature has explored how public science affects corporate R&D through
knowledge and training spillovers or the acquisition of university inventions. Early influen-
tial studies have surveyed industrial research managers on the perceived importance of public
science to corporate innovation. These include the Yale survey on appropriability and tech-
nological opportunity (Klevorick, Levin, Nelson, & Winter, 1995; Nelson, 1986; Rosenberg &
Nelson, 1994), the pioneering surveys by Mansfield (1991, 1995, 1998), the Carnegie Mellon
survey on industrial R&D (Cohen et al., 2002), and the EU Community Innovation Survey
(Beise & Stahl, 1999; Laursen & Salter, 2004; Tether & Tajar, 2008). These studies suggest
that scientific research from universities is of limited direct value for corporate R&D. However,
because these studies lack firm-specific measures of the stock of relevant public science,
they do not directly address how public science affects corporate R&D.5
Other studies use citations to the non-patent literature (NPL) to measure the use of
science in corporate invention (e.g., Fleming, Greene, Li, Marx, & Yao, 2019; McMillan,
Narin, & Deeds, 2000; Narin, Hamilton, & Olivastro, 1997). These studies show that patent
citations to scientific papers have increased over time, particularly for patents in the life
sciences, and for patents by startups. Most of the science cited is government-funded and
produced by universities, federal laboratories, and other public research institutions, though
AT&T, IBM, DuPont, and Merck also figure prominently. However, though these studies
show that inventions have become closer to science, how public science affects corporate
R&D remains unclear. We find that public science affects corporate R&D only when the
knowledge is developed by universities into patents or embodied in people (PhD graduates).
Several recent studies estimate the effect of public funding for research on patented inven-
tion (Azoulay, Graff Zivin, Li, & Sampat, 2019; Myers & Lanahan, 2022), on the composition
and intensity of corporate R&D (Mulligan, Lenihan, Doran, & Roper, 2022; Scandura, 2016),
and on academic entrepreneurship (Babina et al., 2023). Myers and Lanahan (2022) exploit
windfall grant funding resulting from non-competitive grant matching policies that vary
across states and over time. They find that for every patent produced by grant recipients of
the Department of Energy, three additional patents are produced by non-recipients. Babina
et al. (2023) use windfall changes in agency funding to estimate the effect on university en-
trepreneurship, publishing, and patenting. We map agency R&D to public science relevant
to a given firm to estimate how the different components of public science affect corporate
R&D. We exploit differences in the political composition of congressional appropriations
subcommittees as a source of exogenous variation in agency R&D funding, and in turn, as a
source of exogenous variation in public science.
Our results also add to Azoulay et al. (2019), who analyze the effect of National Institutes
of Health (NIH) grant funding for research and trace the impact on patenting by pharma-
ceutical and biotechnology firms during 1980-2012. They find that an increase of $10 million
in NIH grant funding for a research area leads to 2.3 additional private patents, suggesting
that public research encourages private innovation in the life sciences.6 Our heterogeneity
5
Over the past several decades, researchers have also investigated “additionality”—whether government
spending crowds out or stimulates additional private R&D investments—at various levels of aggregation,
including industries (e.g., Mamuneas & Nadiri, 1996), firms (e.g., Einiö, 2014; Lichtenberg, 1984; Moretti,
Steinwender, & Van Reenen, 2021; Wallsten, 2000) and individuals (e.g., Goolsbee, 1998). Perhaps not sur-
prisingly, given the diversity of approaches and levels of analysis, these studies have produced conflicting
results (see reviews by David, Hall, & Toole, 2000; Dimos & Pugh, 2016). Previous studies have also docu-
mented substantial heterogeneity in response to government subsidies by firm size (González, Jaumandreu,
& Pazó, 2005) and R&D intensity (Szücs, 2020).
6
More than half of the patents resulting from NIH research grants are for diseases different from those
analysis similarly reveals that public knowledge provides some encouragement for corporate
innovation in the life sciences, but that outside this unique setting, public knowledge appears
to have little effect on patenting and publishing by incumbent firms. Our findings therefore
caution against generalizing from the life sciences to other sectors.
Another strand of the literature focuses on the localization of spillovers from universities
(e.g., Belenzon & Schankerman, 2013; Hausman, 2022; Tartari & Stern, 2021; Valero &
Van Reenen, 2019). Tartari and Stern (2021) examine the effect of university funding on
local startups at the zip code level. Consistent with our findings, they document a positive
effect on local entrepreneurship from increases in funding for universities, but not for national
laboratories. A possible explanation is that, unlike national laboratories, universities also
embody knowledge in human capital used by new ventures. In other words, it is likely
that human capital from universities is the source of new startups. Similarly, Hausman
(2022) studies the effect of university innovation on local industrial agglomeration at the
county-by-industry level. She documents higher growth in employment, wages, and corporate
patenting after the passage of the Bayh-Dole Act in industries more closely related to the
local university’s technological strengths. Consistent with Tartari and Stern (2021), she
finds that this growth is primarily driven by new ventures in university-linked industries.
However, neither study analyzes the effect on incumbent firms. Indeed, incumbent R&D and
profitability depend on whether startups commercializing university discoveries supply their
innovations to incumbents or compete with them. Our results suggest that the competition
effect is the dominant effect.
Overall, our paper differs from prior literature in a couple of important ways. First, we
study the effects of three distinct components of public science—knowledge, human capital,
and invention—on both upstream corporate R&D (scientific research or “R”) and downstream
corporate R&D (technology development or “D”). Second, we make progress on data and
identification at the firm level rather than at the industry, zip code, or individual researcher
level. For each firm, we measure the potentially relevant public knowledge, human capital,
and public invention based on: (i) the textual similarity between publications, dissertations,
and patents; (ii) the classification of patents and publications in various CPC subclasses
and OECD subfields, respectively; and (iii) non-patent literature citations from patents to
publications. We also match renowned scientists profiled in the American Men & Women
of Science directories to thousands of R&D-performing, publicly traded, American firms
and their subsidiaries over three-and-a-half decades. This allows us to measure corporate
initially funded, indicating the presence of knowledge spillovers. This highlights the importance of linking
science to innovation without assuming that science affects innovation only in a narrowly defined intended
area. We implement this approach when we measure the public science that is potentially relevant to the
firm, and not just that which is actually used by the firm.
investment in research for firms that do not publish scientific papers.
3 Conceptual Framework
We adapt the framework from Arora et al. (2021a) to focus on the effect of public science on
internal research and invention. Public science has at least three components: knowledge dis-
closed in scientific publications, trained human capital (Pavitt, 1991), and inventions based
on public knowledge (Fabrizio & Di Minin, 2008). These potentially differ in how they affect
internal research and invention by incumbent corporations. For instance, public knowledge
may complement internal research or substitute for it. Inventions based on public knowl-
edge substitute for internal inventions, and may even compete with the firm’s innovations.
Human capital, on the other hand, tends to increase internal research and invention.
3.1 Setup
A firm’s product market profit, Π(d), depends on its innovations—the number of inventions
it introduces into the market—d. These inventions may be acquired from outside the firm
or internally generated. Internal inventions are produced at a unit cost w(k)ϕ(r, u), where r
is internal research and u is the stock of public knowledge that is relevant to the firm. The
term w(k) represents the wage of inventors and is assumed to fall as more human capital, k,
is available to the firm. The term ϕ represents the inverse of invention productivity and is
assumed to decrease with r at a diminishing rate. We also assume that ϕ decreases with u.7
The relationship between public knowledge and internal research in reducing the unit cost
of internal invention is important for how the stock of public knowledge relates to invest-
ments in internal research.8 Public knowledge may complement internal research because
performing internal research provides the absorptive capacity to use the knowledge.9
We assume that the cost of internal research is given by $\gamma(k)\frac{1}{2}r^2$, which also depends on
k, the supply of relevant human capital. In other words, increasing the number of trained
PhD scientists produced by universities reduces the firm’s cost of both internal research and
internal invention.
7
The cost function reflects a simple linear production function $d = \lambda(r, u)\,n$, where $n$ is the number
of inventors the firm employs and $\lambda(r, u)$ is the productivity of the inventors. Thus, the cost of internal
inventions is simply $w(k)\,n$, so that $\phi = \frac{1}{\lambda(r, u)}$.
8
Complementarity exists if $-\frac{\partial^2 \phi}{\partial r \partial u} > 0$ and substitutability exists if $-\frac{\partial^2 \phi}{\partial r \partial u} < 0$.
9
There is a large literature on absorptive capacity that argues firms must invest in internal research
to benefit from public knowledge (e.g., Cohen & Levinthal, 1990; Rosenberg, 1990). Baruffaldi and Poege
(2020) show that firms are more likely to cite papers presented at conferences where the firm’s scientists also
participated.
Inventions by university researchers (henceforth, “public inventions”) can either be inputs
to the firm’s own innovation or compete with the firm’s innovations in the marketplace. For
example, university spinoffs and startups could be acquired by the firm or instead compete
with it in the marketplace, either directly or after being acquired by rivals (OECD, 2003).
To model public inventions as inputs to the firm’s own innovation, we assume that the
firm’s innovation, $d$, is the sum of those derived from internal inventions, $d_1$, and those
derived from public inventions, $d_2$. We assume that the firm can acquire public inventions
at an increasing marginal cost represented by $a_0 d_2 + \frac{1}{2} a_1 d_2^2$.10
To model public inventions that compete with the firm’s innovations, we allow the focal
firm’s product market profits to also depend on public inventions. Specifically, we assume
$$\Pi(d, \tilde{d}) = b_0 + b_1 d - \frac{1}{2} b_{11} d^2 - b_2 \tilde{d} - \frac{1}{2} b_{22} \tilde{d}^2 + b_{12} d \tilde{d},$$
where $\tilde{d}$ stands for public inventions that compete with the firm’s innovation. We assume that
$\Pi(d, \tilde{d})$ increases with $d$, decreases with $\tilde{d}$, and is concave. Importantly, we assume that
the firm takes the number of competing public inventions as given. Note that the marginal return
to innovation (gross of the costs) is simply $b_1 - b_{11} d + b_{12} \tilde{d}$, which increases with $\tilde{d}$ if
$b_{12} \ge 0$ and decreases with $\tilde{d}$ otherwise.
We say that public inventions and internal inventions are strategic complements if b12 ≥ 0
and strategic substitutes otherwise.
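To make the role of the cross term concrete, the following sketch codes the quadratic profit function and its marginal return to innovation, then checks the analytic marginal return against a numerical derivative. The parameter values are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
def profit(d, d_tilde, b0, b1, b11, b2, b22, b12):
    """Quadratic product market profit as a function of own innovation d
    and competing public inventions d_tilde."""
    return (b0 + b1 * d - 0.5 * b11 * d ** 2
            - b2 * d_tilde - 0.5 * b22 * d_tilde ** 2
            + b12 * d * d_tilde)


def marginal_return(d, d_tilde, b1, b11, b12):
    """Analytic derivative of profit with respect to d (gross of costs)."""
    return b1 - b11 * d + b12 * d_tilde


def numerical_marginal_return(d, d_tilde, params, step=1e-4):
    """Central finite-difference approximation of the same derivative."""
    upper = profit(d + step, d_tilde, **params)
    lower = profit(d - step, d_tilde, **params)
    return (upper - lower) / (2.0 * step)


# Hypothetical parameters with b12 < 0 (the strategic substitutes case).
PARAMS = dict(b0=1.0, b1=10.0, b11=2.0, b2=1.0, b22=0.5, b12=-0.5)

analytic = marginal_return(2.0, 3.0, PARAMS["b1"], PARAMS["b11"], PARAMS["b12"])
numeric = numerical_marginal_return(2.0, 3.0, PARAMS)

# With b12 < 0, more competing public inventions lower the marginal return.
lower_competition = marginal_return(2.0, 1.0, PARAMS["b1"], PARAMS["b11"], PARAMS["b12"])
higher_competition = marginal_return(2.0, 4.0, PARAMS["b1"], PARAMS["b11"], PARAMS["b12"])
```

Flipping the sign of `b12` in `PARAMS` reverses the comparison, which is the strategic complements case.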
An increase in relevant public knowledge increases the value of the firm, $v$, by reducing the
cost of internal invention. Formally, applying the envelope theorem, $\frac{\partial v}{\partial u} = -d_1 \frac{\partial \phi}{\partial u} > 0$. If
internal research complements public knowledge (i.e., $-\frac{\partial^2 \phi}{\partial r \partial u} > 0$), then an increase in public
knowledge will also increase internal research. If they are substitutes, then there are two
opposing effects. Substitutability reduces the marginal return to internal research. However,
a reduction in the cost of internal invention due to public knowledge increases the scale of
internal invention, thereby increasing the marginal return on internal research.
10
For simplicity, the total cost of public inventions acquired by the firm is assumed to be $a_0 d_2 + \frac{1}{2} a_1 d_2^2$.
The assumption of a rising marginal cost of public invention implies that the firm has market power, perhaps
due to its location or the specific inventions it can commercialize. The results are similar if the firm is a price
taker and has an increasing cost of internal invention, except that an increase in demand for invention would
leave internal invention and research unchanged but decrease invention sourced from the public sector.
Figure 2: Conceptual Framework
Notes: This figure presents our basic conceptual framework (Panel A). The firm’s innovation, $d$, is the
sum of internal inventions, $d_1$, and external inventions, $d_2$. The “demand” for innovation is represented by
$\Pi'(d_1 + d_2, \tilde{d})$. The “supply” of public inventions is represented by $a_0 + a_1 d_2$, while the “supply” of internal
inventions is represented by $w(k)\phi(r, u)$, where $w(k)$ is the wage of inventors, $k$ is human capital, $r$ is internal
research, $u$ is public knowledge, and $\gamma(k)\frac{1}{2}r^2$ is the cost of $r$. Comparative statics for increases in public
knowledge (Panel B), human capital (Panel C), and public invention (Panels D, E, and F) are also included.
The effect on internal invention follows a similar logic. The direct effect of an increase
in public knowledge is to reduce ϕ in the cost of internal invention, as shown in Panel B. As
long as the marginal cost of internal invention decreases, overall innovation increases because
the increase in internal invention is only partly at the expense of external invention.
As with public knowledge, an increase in human capital supply increases firm value. Formally,
$\frac{\partial v}{\partial k} = -d_1 \phi \frac{\partial w}{\partial k} - \frac{1}{2} r^2 \frac{\partial \gamma}{\partial k} > 0$. An increase in the supply of human capital reduces the cost of
internal invention and research, as shown in Panel C. Since external invention substitutes
for internal invention, the former will fall.
Insofar as public inventions are inputs to the firm’s own innovation, they increase firm value
but decrease internal invention and research. An increase in public invention can be modeled
as a reduction in $a_0$, as shown in Panel D, in which case $-\frac{\partial v}{\partial a_0} = d_2 > 0$. However, a reduction
in the marginal cost of external invention will decrease internal invention, which will, in turn,
decrease internal research. Intuitively, an increase in the supply of an input increases the
firm’s value. However, it will decrease the demand for substitute inputs.
Conversely, an increase in public sector inventions that compete with the firm’s innovations,
$\tilde{d}$, will decrease firm value. Formally, $\frac{\partial v}{\partial \tilde{d}} = -b_2 - b_{22} \tilde{d} + b_{12} d \le 0$ because $\Pi$ was
assumed to fall with $\tilde{d}$, as shown in Panel E. Indeed, $b_{12} \le 0$ is sufficient for this result (if
$b_2$ and $b_{22}$ are both positive). If $b_{12} < 0$, then an increase in $\tilde{d}$ will reduce $d_1$ and hence also
will reduce $r$. Conversely, if $b_{12} > 0$, an increase in $\tilde{d}$ will increase $d_1$ and hence also will
increase r. In other words, one has to examine the pattern of relationships with value as
well as internal invention and research to assess how public inventions relate to corporate
innovation. Table 1 summarizes the predictions of our basic conceptual framework.
Even if the fruits of public science are available to all, they may not benefit all firms equally.
It is plausible that for leading firms, which require “frontier” innovations, sourcing public
inventions that match their needs is more difficult. By contrast, for follower firms trying
to “catch up” to the technology frontier, public inventions may be more plentiful. If so,
frontier firms would rely to a greater extent on internal inventions and also invest more
Table 1: The Predicted Effect of Public Science on Firm Value and Innovation
in internal research compared to follower firms.11 This suggests that frontier firms may
also respond differently to public science than followers. Public knowledge may substitute
for internal research for followers but may complement internal research in frontier firms.
Insofar as human capital reduces the cost of internal research, frontier firms would be more
responsive to increases in human capital. On the other hand, followers may respond more
to an expansion in the supply of public inventions.
4 Data
We combine data from several sources: (i) scientific publications by corporations, univer-
sities, federal laboratories, and other public research institutions, acknowledgments of fed-
eral grants by these publications, and citations by patents to publications from Dimensions
(Digital Science, 2022); (ii) scientists profiled in the American Men & Women of Science
11
Frontier firms may have a higher demand for inventions, may face a lower effective supply of public
inventions, or internal research and public science may be strategic complements. These issues are explored
in our empirical analysis.
directories; (iii) PhD dissertations from ProQuest Dissertations & Theses Global; and (iv)
firm financial information from S&P’s Compustat North America. We complement these
data with scientific publication information from Clarivate’s Web of Science, patent data
from U.S. Patent and Trademark Office’s PatentsView and the European Patent Office’s
PATSTAT, federal procurement contract data from the Federal Procurement Data System,
and federal grant data from the Treasury DATA Act Broker (see Arora et al., 2021a; Arora,
Belenzon, & Sheer, 2021b; Belenzon & Cioaca, 2021).
Corporate innovation and public science are multi-dimensional. Our measures capture
both corporate innovation inputs (R&D expenditures and AMWS scientists) and outputs
(publications and patents). Moreover, they capture upstream corporate science (publications
and AMWS scientists) and downstream corporate invention (patents). As well, we measure
three components of relevant public science: knowledge, human capital, and invention. The
construction of the main variables used in our econometric analyses is summarized below
and detailed in Online Appendix A.
Table 2: Cross Tabulation of Measures of Upstream Corporate Research
The index $o$ denotes OECD subfields. $\text{Publications}_{o,t}$ is the number of non-corporate
publications published in year $t$ in subfield $o$. $\text{Precohort share of publications}_{i,o}$ is firm $i$'s share
of publications in subfield $o$ during the previous (lagged) 5-year time cohort, obtained by
dividing the number of firm publications published in subfield $o$ by the total number of firm
publications in the time cohort. We generate a stock measure of Public knowledge using a
perpetual inventory method with a 15% depreciation rate.
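As an illustration of the perpetual inventory method, the following minimal Python sketch applies the recursion given in the notes to Table 3, stock_t = flow_t + (1 − δ) stock_{t−1} with δ = 0.15. The function name, the zero initial stock, and the example flows are assumptions made for illustration and are not taken from the paper.

```python
def perpetual_inventory_stock(flows, depreciation=0.15):
    """Return the stock series implied by a list of annual flows.

    Applies stock_t = flow_t + (1 - depreciation) * stock_{t-1}.
    The stock before the first observed year is assumed to be zero
    (an assumption of this sketch, not stated in the paper).
    """
    stocks = []
    previous_stock = 0.0
    for flow in flows:
        previous_stock = flow + (1.0 - depreciation) * previous_stock
        stocks.append(previous_stock)
    return stocks


# Hypothetical flow of 100 publications per year for three years.
example_stock = perpetual_inventory_stock([100, 100, 100])
print(example_stock)
```

With a constant flow, the stock rises each year by a shrinking increment, because 15% of the accumulated stock depreciates annually.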
12
As of 2022, the Dimensions dataset combined 131.5 million cited and citing publications, 6.3 million
research grants with related funding organizations, as well as 149.7 million cited and citing patents.
4.3 Human Capital: PhD Dissertations
We measure human capital using PhD dissertations sourced from ProQuest Dissertations &
Theses Global (hereafter, PQDT), recognized by the U.S. Library of Congress as the official
repository for dissertations, and containing more than 5 million dissertations and theses
from universities around the world between 1900 and 2021. We exclude “soft science” PhD
dissertations from our data.13 We also discard PhD dissertations from non-U.S. universities
and all master’s degree theses. We end up with 771,023 U.S. PhD dissertations awarded
between 1985 and 2016 in 394 “hard science” research fields.
PhD dissertations are not typically cited by publications or patents. Therefore, we as-
sess the relevance of trained human capital to corporate innovation based on the textual
similarity between the abstracts of dissertations and the abstracts of company patents. We
calculate that similarity using SPECTER, a deep learning algorithm that considers both
the content and the context of scientific texts. In brief, SPECTER uses a transformer-based
neural network to process natural language texts. Online Appendix A provides a detailed
description of how we implement SPECTER in our variable construction.
Our firm-time cohort measure of relevant Human capital is the weighted sum of PhD
dissertations, using the textual similarity to patents as weights:
$$\text{Human capital}_{i,t} = \sum_{d \in D} \text{Maximum textual similarity}_{d,i,t} \tag{2}$$
$D$ is the set of PhD dissertations in the top 1,000 most similar dissertations for one or more of
the patents granted to firm $i$ during the 5-year time cohort $t$. $\text{Maximum textual similarity}_{d,i,t}$
is the maximum textual similarity score between the abstract of dissertation $d$ and the
abstracts of all patents granted to firm $i$ during the 5-year time cohort $t$.14
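The aggregation in Equation (2) can be sketched as follows. This is a minimal illustration that assumes the pairwise similarity scores between a firm's patents and the dissertations have already been computed (for example, from SPECTER embeddings). The function name, the nested-dictionary layout, and the toy scores are hypothetical.

```python
def human_capital(similarity, top_k=1000):
    """Sum of maximum similarity scores for one firm and time cohort.

    similarity maps patent_id -> {dissertation_id: score} for all patents
    granted to the firm in the cohort (a hypothetical data layout).
    """
    # Step 1: build D, the dissertations ranked in the top_k for at least one patent.
    selected = set()
    for scores in similarity.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected.update(ranked[:top_k])

    # Step 2: for each dissertation in D, take its maximum score across
    # all of the firm's patents, then sum these maxima.
    total = 0.0
    for dissertation in selected:
        total += max(
            scores[dissertation]
            for scores in similarity.values()
            if dissertation in scores
        )
    return total


# Toy example with two patents, three dissertations, and top_k = 2.
toy_scores = {
    "p1": {"d1": 0.9, "d2": 0.5, "d3": 0.1},
    "p2": {"d1": 0.2, "d2": 0.8, "d3": 0.7},
}
print(human_capital(toy_scores, top_k=2))
```

In the toy example, all three dissertations enter D because each ranks in the top two for at least one patent, and each contributes its highest score across the two patents.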
A subset of PhD dissertations are published in scientific journals and (subsequently) cited
by patents. We construct a complementary firm-year measure, Human capital, cited, as the
13
Doing so is not straightforward because the variable that describes dissertations’ research fields,
“classterms,” lists 308,862 different combinations of terms. We manually create a list of 1,027 disambiguated
terms, then drop dissertations in such research fields as “literature,” “history,” and “social sciences.”
14
Our text-based measure captures the human capital that is potentially relevant to a firm’s inventions
without requiring “actual use” (e.g., NPL citations or employment history). For example, Arifur Rahman
earned his PhD in Electrical Engineering from MIT in December 2000. His dissertation on interconnect
technologies for integrated circuits was published in early 2001 in ProQuest Dissertations & Theses Global
(document ID 304757014). SPECTER ranked Rahman’s dissertation in the top 1,000 most similar disser-
tations for five of Lattice’s patents granted in 2000, five granted in 2001, and another five granted in 2002.
While none of these contemporaneous patents cited the dissertation, our measure nevertheless identified a
link between Rahman and Lattice. Indeed, Rahman was subsequently hired by Lattice as a technical staff
member in 2001. He went on to produce a number of semiconductor patents for Lattice (with filing dates
starting in 2002) and subsequent corporate employers, including Intel, Altera, and Xilinx.
weighted sum of published PhD dissertations cited by patents in various patent subclasses,
as detailed in Appendix A.15 The weights are the focal firm’s shares of patents across patent
subclasses during the previous 5-year time cohort. We construct a third measure, Human
capital, OECD, by first classifying PhD dissertations into OECD natural science subfields.
We then use the focal firm’s previous patenting across technology subclasses that rely on
science from various OECD subfields to identify relevant human capital. We validate the
logic behind our measures of firm-relevant human capital with three case examples included
in Appendix C. We report results using the alternative measures in Section 6.8. Our findings
are not sensitive to the specific approach used for measuring firm-relevant human capital.
The index $s$ denotes patent subclasses, identified using the first four digits of the current CPC
classification from the U.S. Patent & Trademark Office (USPTO). $\text{University patents}_{s,t}$ is
the count of patents granted to universities in subclass $s$ in year $t$. $\text{Precohort share of patents}_{i,s}$
is firm $i$'s share of patents in subclass $s$ during the previous 5-year time cohort,
obtained by dividing the number of firm patents granted in subclass $s$ by the total number
of firm patents in that time period.
In robustness checks, we use a broader measure of the supply of relevant public invention
using publications that lead to inventions, as detailed in Appendix A. We construct Public
invention, broad as the stock of non-corporate publications that are cited by patents in
various patent subclasses, weighted by the share of the focal firm’s patents across patent
15
Continuing with the previous example, Arifur Rahman’s dissertation was published under the title
Interconnect Limits on Gigascale Integration (GSI) in the 21st Century (DOI 10.1109/5.915376) in 2001.
This publication was subsequently cited by more than one hundred patents granted between 2004 and 2021,
including patents assigned to IBM, Seagate Technologies, and Texas Instruments. Similar to our primary
measure, our alternative measure captures the relevance of Rahman’s human capital, at graduation, not only
to his eventual employer, Lattice, but also to other firms that innovate in semiconductors.
subclasses during the previous 5-year time cohort. Because some publications are not directly
cited by patents, yet still reflect external inventions that are potentially relevant to firms,
we also construct another measure Public invention, SPECTER using the textual similarity
between the abstracts of non-corporate publications and the abstracts of corporate patents.
Textual similarity is assessed using the SPECTER algorithm. We report results using these
alternative measures in Section 6.8. Our findings are not sensitive to the specific approach
used for measuring firm-relevant public invention. Table 3 summarizes the main variables
used in the econometric analyses.
Table 3: Main Variables
Publications                 Scientific publications that have at least one author affiliated with the focal firm
AMWS scientists              Scientists profiled in AMWS that are employed by the focal firm
B. Independent variables
Public knowledge             Stock of non-corporate publications published in various OECD natural sciences subfields
Human capital                PhD dissertations, based on the textual similarity between abstracts of dissertations and abstracts of firm patents
Public invention             Stock of university patents granted by the USPTO in various CPC subclasses
Human capital, cited         Published PhD dissertations cited by patents in various CPC subclasses
Human capital, OECD          PhD dissertations mapped to various OECD subfields, based on the importance of the OECD subfields to patenting in various CPC subclasses
Public invention, broad      Stock of non-corporate publications cited by patents in various CPC subclasses
Public invention, SPECTER    Stock of non-corporate publications, based on the textual similarity between non-corporate publications and firm patents
Notes: This table summarizes the main variables used in the econometric analyses. Stock measures
are constructed using a perpetual inventory method with a 15% depreciation rate. For example,
$(\text{Public knowledge, stock})_t = (\text{Public knowledge})_t + (1 - \delta)(\text{Public knowledge, stock})_{t-1}$, where $\delta = 0.15$.
We omit the term “stock” from variable names to simplify notation.
dissertations or the published versions of PhD dissertations) and public invention (whether
measured by university patents or non-corporate publications cited by patents). Large firms,
in particular, face more abundant relevant public science than small firms. Consistent with
the idea that trained human capital and public invention are co-produced in universities,
62% of firms with above median human capital also have above median public invention, as
shown in Appendix Table C18.
Table 4: Summary Statistics for Main Variables
5 Econometric Framework
We turn to the empirical investigation of the theoretical predictions from Table 1.
$$\ln(Y_{i,t}) = \alpha_0 + \alpha_1 \ln(X_{i,t-1}) + Z'_{i,t-1}\omega + \eta_i + \tau_t + \epsilon_{i,t} \tag{4}$$
We use multiple dependent and independent variables (see Appendix A for details on variable
construction). $Y_{i,t}$ represents corporate innovation inputs (R&D expenditures and AMWS
scientists) and outputs (Publications and Patents), for firm $i$ in year $t$. $X_{i,t-1}$ represents the
Public knowledge (stock), Human capital, and Public invention (stock) relevant to firm $i$'s
innovation in the lagged year or time cohort. The vector $Z$ includes time-varying controls,
such as $\ln(\text{Sales})_{t-1}$ for the R&D expenditures equation and $\ln(\text{R\&D stock})_{t-1}$ for the
patents, publications, and AMWS scientists equations (where we also add an unreported
indicator variable equal to 1 for firms without R&D expenditures prior to the focal year).
In all specifications, we account for a possible direct federal funding effect by including
$\ln(\text{Awards to focal firm})_{t-1}$, the lagged stock of federal grant and procurement dollars
awarded to the focal firm and its subsidiaries. In the 2SLS specifications, we also include
indicator variables equal to 1 for firms with zero-valued instruments in the prior year and a
control for lagged Agency exposure.18 The vectors η and τ are firm and year fixed effects,
respectively, and ϵ is an iid error term. When calculating natural logarithms, we add $1
to variables measured in millions of dollars (e.g., Sales, R&D stock ) and one unit to count
variables (e.g., patents, publications, AMWS scientists). Standard errors are clustered at
the firm level.
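The logarithmic transformation described above can be sketched as follows. This is a minimal illustration in which the function names are hypothetical. The offset of 0.000001 for dollar variables reflects adding $1 to a variable measured in millions of dollars.

```python
import math


def log_count(value):
    """Log transform for count variables (patents, publications, scientists): ln(1 + x)."""
    return math.log(value + 1.0)


def log_dollars_in_millions(value_in_millions):
    """Log transform for dollar variables measured in millions: add $1, i.e. 0.000001 million."""
    return math.log(value_in_millions + 0.000001)


# A firm-year with zero publications maps to 0 rather than being undefined.
print(log_count(0))
# Hypothetical sales of 250 million dollars.
print(log_dollars_in_millions(250.0))
```

Both transformations keep zero-valued observations in the estimation sample, which is why the text later checks robustness with a Poisson specification.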
Our coefficient of interest is α1 . We expect the effect of public science on corporate
innovation to vary by upstream and downstream R&D and by the specific component of
public science. We also examine heterogeneity in effects by firm proximity to the technology
frontier and by main industry.
One concern with our econometric framework pertains to our ln(1+x) transformation,
which we implement to handle positively skewed count data with zeros (e.g., firms have zero
publication flows in some years). We address this concern using the two-stage control function
Poisson regression approach described in Lin and Wooldridge (2019) and implemented in
Bellet, De Neve, and Ward (2023). We estimate standard errors for the coefficient
estimates by bootstrapping. The results are similar to those from our main specifications.
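$$\ln(\text{Tobin's } Q)_{i,t} = \beta_0 \frac{\text{R\&D stock}_{i,t-1}}{\text{Assets}_{i,t-1}} + \beta_1 \ln(\text{Public knowledge})_{i,t-1} + \beta_2 \ln(\text{Human capital})_{i,t-1} + \beta_3 \ln(\text{Public invention})_{i,t-1} + Z'_{i,t-1}\omega + \eta_i + \tau_t + \epsilon_{i,t} \tag{5}$$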
Tobin’s Q is market value divided by book value of assets. The other elements of the
specification are as previously described. Our coefficients of interest are β1 , β2 , and β3
on the lagged firm-relevant public knowledge (stock), human capital, and public invention
(stock), respectively.
18
$\text{Agency exposure}_{i,t} = \sum_{s \in S} \sum_{a \in A} \text{Reliance on public knowledge}_{s,a} \times \text{Precohort share of patents}_{i,s}$
captures the weights used to calculate the instrument for public invention at the firm-year level.
5.3 Identification
A key econometric challenge is how to deal with the endogeneity of public science. We address
it in an instrumental variable framework that uses the R&D budgets of federal agencies to
predict firm-relevant public science. We construct a Bartik-style shift-share instrument for
each component of public science. The “shift” represents federal financial support across
OECD subfields (in the case of public knowledge), dissertation advisors (in the case of
human capital), and patent subclasses (in the case of public invention). As multiple agencies
provide such financial support, the shift for each subfield, advisor, and subclass is calculated
as the weighted sum of financial support from each federal agency, where the weights capture
how much of that agency’s R&D budget is directed to that subfield, advisor, and subclass,
respectively. The firm-specific “exposure share” is based on the firm’s publishing across
OECD subfields (public knowledge), the textual similarity of PhD dissertations to the firm’s
patents (human capital), and the distribution of the firm’s patents across subclasses (public
invention) in the pre-period.
A key identifying assumption is that federal agency funding for R&D is unrelated to
technology and demand-side factors that also drive corporate innovation. To ensure that
our results are not affected by potential violations of this assumption, we use two different
approaches when building our instruments for public knowledge, human capital, and public
invention. The first approach uses agency R&D budgets to construct the “shift.” The
second (and preferred) approach adds another step: we use two measures of the political
composition of congressional appropriations subcommittees to predict agency R&D budgets,
then use these predicted agency R&D budgets to construct the “shift.” This second approach
leverages the powerful and persistent roles of congressional appropriations subcommittees in
federal budgeting (Davis, Dempster, & Wildavsky, 1966).
Another important identifying assumption is that firm “exposure shares” are unrelated to
the same underlying factors that drive federal agency R&D budgets. For instance, if larger
firms are more exposed to federal agencies that receive more R&D funding, instrumenting
for public science with agency R&D funding may still lead to biased results. We examine the
severity of this concern by estimating the relationship between firm size and federal R&D
funding. We find a positive correlation between firm R&D stock and agency R&D funding,
so we control for the lagged firm R&D stock or annual sales in all relevant specifications.
Our results are qualitatively similar when we do not include the control for size.
5.3.1 Federal Funding for Public Science
The U.S. government is a substantial funder of public science. As shown in Appendix Table
A5, federal agencies’ R&D budgets have increased from $104.6 billion per year in the 1980s
to $156.1 billion per year in the 2010s (American Association for the Advancement of Sci-
ence, 2021). The Dimensions dataset connects more than 4.6 million publications to their
funding organizations, including federal agencies. These linkages are based on funding ac-
knowledgments provided by the authors at publication and on administrative data collected
from major funders, such as the National Science Foundation and the National Institutes
of Health. We use the publications-to-grants and grants-to-federal agencies crosswalks from
Dimensions, the hierarchical structure of federal agencies from the Global Research Iden-
tifier Database (GRID), and the PhD students-advisors crosswalk from PQDT to create
instrumental variables for our various measures of public science.
In the simplest approach, we link federal funding for R&D with each of the three com-
ponents of public science, then calculate Bartik-style shift-share instruments using firms’
differential exposure to the common federal funding shocks. We report results using these
instrumental variables in our robustness checks.
In our preferred approach, we address the concern that federal funding for R&D may
reflect technological or demand shocks that also affect the R&D decisions of firms. Prior
research suggests that political partisanship can influence federal budgets (Davis et al., 1966;
Epp, Lovett, & Baumgartner, 2014). Because we need a source of agency-level variation in
R&D funding, we focus on the political composition of congressional appropriations sub-
committees. For each of the 12 main federal agencies (plus an “Other” category for smaller
agencies), we identify which U.S. House and U.S. Senate subcommittees are responsible for
reviewing their budget request to Congress, hearing testimony from government officials and
other witnesses, and drafting the spending plan for each fiscal year. Appendix Table A6
summarizes the mapping between agencies and subcommittees.
For each subcommittee, we collect two pieces of information. The first measures how
dominant the majority party is in the subcommittee. The variable Majority party share is
the ratio of the number of members from the majority political party in the chamber over
the total number of members in the subcommittee. The second measures the ideological
orientation of the subcommittee. The variable Democratness is the ratio of the number of
Democrats over the total number of members in the subcommittee. We use these variables to
predict the R&D budget, then use the predicted R&D budget in constructing our Bartik-style
shift-share instruments at the firm-year level.
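The two subcommittee variables defined above can be computed as in the following minimal sketch. The function name, the party labels "D" and "R", and the example roster are hypothetical and serve only to illustrate the two ratios.

```python
def subcommittee_composition(member_parties, chamber_majority_party):
    """Return (majority_party_share, democratness) for one subcommittee.

    member_parties is a list of party labels, one per subcommittee member
    (hypothetical labels "D" and "R"); chamber_majority_party is the label
    of the party holding the majority in the chamber.
    """
    total_members = len(member_parties)
    majority_members = sum(1 for party in member_parties if party == chamber_majority_party)
    democrats = sum(1 for party in member_parties if party == "D")
    return majority_members / total_members, democrats / total_members


# Hypothetical 12-member subcommittee in a chamber with a Republican majority.
roster = ["R"] * 7 + ["D"] * 5
majority_share, democratness = subcommittee_composition(roster, "R")
print(majority_share, democratness)
```

When Democrats hold the chamber majority and every member belongs to one of the two parties, the two ratios coincide. The variables carry separate information only through changes in chamber control and subcommittee balance over time.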
The ideas behind this approach are as follows. When committees are more balanced, the
majority party may have to engage in more give-and-take with the minority party. One way
is to fund more of the minority party’s priorities, which would result in bigger budgets. In
addition, each member of the majority party may also have more bargaining power when the
majority is small, leading to additional spending to benefit their constituents. In either case,
we would expect agency R&D budgets to decrease when the majority party shares in the
relevant subcommittees increase. Moreover, the ideological bent of the majority party may
matter as well. Insofar as Republicans in the U.S. promote spending cuts while Democrats
favor a larger federal government (Epp et al., 2014; Tavares, 2004), we would expect agency
R&D budgets to increase when the share of subcommittee members who are Democrats
increases. Appendix Table A7 shows that the political composition of congressional appro-
priations subcommittees predicts the R&D budgets of federal agencies in the anticipated
directions. However, the political composition should be orthogonal to technological or de-
mand shocks that also affect the R&D decisions of firms. If so, it is a source of exogenous
variation in agency R&D budgets.
Our preferred instrument for Public knowledge is the predicted federal funding for public
knowledge published in each OECD subfield, weighted by the focal firm’s shares of publica-
tions in each OECD subfield during the previous 5-year time cohort, as follows:
$$\text{Predicted R\&D budget - public knowledge}_{i,t} = \sum_{o \in O} \text{Precohort share of publications}_{i,o} \left( \sum_{a \in A} \widehat{\text{R\&D budget}}_{a,t} \times \text{Reliance on agency}_{o,a} \right) \tag{6}$$
$O$ denotes OECD subfields. $\text{Precohort share of publications}_{i,o}$ is firm $i$'s share of publications
in subfield $o$ during the previous 5-year time cohort, obtained by dividing the number of
firm publications published in subfield $o$ by the total number of firm publications. $A$ is the set
of 12 main federal agencies, plus an “Other” category for smaller agencies. $\widehat{\text{R\&D budget}}_{a,t}$ is
the R&D budget predicted by Majority party share and Democratness for agency $a$ in year $t$.
$\text{Reliance on agency}_{o,a}$ is a share obtained by dividing the number of publications published
in subfield $o$ over 1980-2015 and funded by agency $a$ by the total number of publications
published in subfield $o$ over 1980-2015.
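To make the structure of the shift-share instrument in Equation (6) concrete, the following minimal sketch computes it for one firm and year. The function name, the dictionary layout, and all numbers are hypothetical. The predicted agency budgets are taken as given here rather than estimated from the subcommittee variables.

```python
def shift_share_instrument(firm_shares, reliance_on_agency, predicted_budget):
    """Firm-level instrument: sum over subfields of share times shift.

    firm_shares:        {subfield: firm's precohort share of publications}
    reliance_on_agency: {subfield: {agency: share of subfield publications funded by agency}}
    predicted_budget:   {agency: predicted R&D budget in year t}
    """
    instrument = 0.0
    for subfield, share in firm_shares.items():
        # The "shift": predicted federal support flowing to this subfield.
        shift = sum(
            predicted_budget[agency] * reliance
            for agency, reliance in reliance_on_agency.get(subfield, {}).items()
        )
        # The "share": the firm's exposure to this subfield.
        instrument += share * shift
    return instrument


# Hypothetical inputs (budgets in billions of dollars).
firm_shares = {"physics": 0.75, "chemistry": 0.25}
reliance = {
    "physics": {"NSF": 0.5, "Other": 0.25},
    "chemistry": {"NSF": 0.5, "NIH": 0.25},
}
budgets = {"NSF": 8.0, "NIH": 40.0, "Other": 16.0}
print(shift_share_instrument(firm_shares, reliance, budgets))
```

In this example the shift equals 8 for physics and 14 for chemistry, so the instrument is 0.75 × 8 + 0.25 × 14 = 9.5. Two firms facing the same agency budgets thus receive different instrument values only because their precohort publication shares differ.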
Our preferred instrument for Public invention is the predicted federal funding for publications
that are relevant to university patents in each patent subclass, weighted by the focal firm’s
shares of patents across CPC subclasses during the previous 5-year time cohort, as follows:
$$\text{Predicted R\&D budget - public invention}_{i,t} = \sum_{s \in S} \text{Precohort share of patents}_{i,s} \left( \sum_{a \in A} \widehat{\text{R\&D budget}}_{a,t} \times \text{Reliance on agency}_{s,a} \right) \tag{7}$$
$S$, $\text{Precohort share of patents}_{i,s}$, $A$, and $\widehat{\text{R\&D budget}}_{a,t}$ are as previously defined.
$\text{Reliance on agency}_{s,a}$ is a share obtained by dividing the number of citations from university patents
granted in subclass $s$ over 1980-2020 to non-corporate publications published over 1980-2015
and funded by agency $a$ by the total number of citations from university patents granted in
subclass $s$ over 1980-2020 to all non-corporate publications published over 1980-2015.
We construct an analogous instrument for Human capital. Unlike the previous
two instruments, we link each dissertation to a federal agency through the funding the PhD
dissertation advisors received from each agency over the six-year period prior to the grant
of the degree, and link each dissertation to a firm using the textual similarity to the firm’s
patents. Specifically, we match advisors to researchers in the Dimensions dataset using
each dissertation advisor’s name, school affiliation, and years of publishing activity and
retrieve from Dimensions (i) the scientific publications authored by the advisors during the
6-year period preceding the PhD dissertation defense and (ii) the grant amounts and funding
organizations for these publications. In our PhD dissertation dataset, 1,310,774 dissertations
have advisor information, producing 1,472,326 dissertation-advisor pairs (some dissertations
have more than one advisor). We assume that federal funding received by the advisor(s)
of a PhD student during the 6-year duration of the PhD program affects the direction and
content of the dissertation.
Our preferred instrument for Human capital is the predicted federal funding for each
dissertation’s advisors, weighted by the maximum textual similarity between the dissertation
and a focal firm’s patents granted in a 5-year time cohort, as follows:
$$\text{Predicted R\&D budget - human capital}_{i,t} = \sum_{d \in D} \text{Maximum textual similarity}_{d,i,t} \left( \sum_{a \in A} \widehat{\text{R\&D budget}}_{d,a} \times \text{Share of agency}_{d,a} \right) \tag{8}$$
$D$ is the set of PhD dissertations in the top 1,000 most similar dissertations for one or more
of the patents granted to firm $i$ during time cohort $t$. $\text{Maximum textual similarity}_{d,i,t}$
and $A$ are as previously defined. $\widehat{\text{R\&D budget}}_{d,a}$ is the R&D budget predicted by Majority
party share and Democratness for agency $a$ at the beginning of the PhD program (i.e., five
years prior to dissertation $d$'s defense year). $\text{Share of agency}_{d,a}$ is obtained by dividing the
funding amount (in dollars) from agency $a$ to the publications of the advisor(s) of dissertation $d$
during the 6-year period ending in dissertation $d$'s defense year by the total funding amount
(in dollars) from agency $a$ to any publication published over the same period.
6 Estimation Results
6.1 Patents Equation
Table 5 presents the results using patents—our measure of corporate invention—as the de-
pendent variable. Columns 1, 3, and 5 present OLS estimates for Public invention, Human
capital, and Public knowledge, respectively. The coefficients are positive and statistically
different from zero (p-values < 0.001). However, common shocks can affect both public sci-
ence and corporate R&D, leading to biased OLS estimates. We address this concern in a
2SLS framework by instrumenting for Public invention using Predicted R&D budget - public
invention, for Human capital using Predicted R&D budget - human capital, and for Public
knowledge using Predicted R&D budget - public knowledge. The first stage results reported
in Appendix Table A10 confirm that all components of public science are positively related
to their respective instrumental variables (p-values < 0.001, F statistics > 104.7, see Lee,
McCrary, Moreira, and Porter (2022)).
The 2SLS coefficient estimate on public invention becomes negative (Column 2, p-value
< 0.001), while the estimate on human capital becomes even larger (Column 4, p-value
< 0.001). Importantly, the negative effect of public invention and the positive effect of
human capital persist when they are jointly estimated on the entire sample (Column 7) or a
subsample of publishing firms (Column 8).19 At the sample means, a one standard deviation
increase in relevant public invention decreases company patents by 51%, while a one standard
deviation increase in relevant human capital increases patents by 53% (Column 7).20
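The magnitudes above follow from the arithmetic reported in footnote 20, which the following minimal sketch reproduces. The function name is hypothetical. The coefficient magnitudes, means, and standard deviations are those given in the footnote, with the public invention coefficient entered as negative to match the sign of the 2SLS estimate.

```python
def one_sd_effect(coefficient, sd_x, mean_x, mean_y, offset_x=1.0, offset_y=1.0):
    """Change in y implied by a one standard deviation increase in x.

    For a ln(y + offset_y) on ln(x + offset_x) specification evaluated at the
    sample means: sd_x * coefficient * (mean_y + offset_y) / (mean_x + offset_x).
    """
    return sd_x * coefficient * (mean_y + offset_y) / (mean_x + offset_x)


mean_patents = 28.30

# Public invention: mean 266.11, standard deviation 511.89, coefficient -0.256.
invention_effect = one_sd_effect(-0.256, 511.89, 266.11, mean_patents)
# Human capital: mean 6,412.67, standard deviation 9,708.77, coefficient 0.338.
human_capital_effect = one_sd_effect(0.338, 9708.77, 6412.67, mean_patents)

print(round(invention_effect, 2), round(100 * invention_effect / mean_patents))
print(round(human_capital_effect, 2), round(100 * human_capital_effect / mean_patents))
```

Dividing each change in patents by the mean patent flow of 28.30 yields the percentage effects of roughly 51% and 53% quoted in the text.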
19
These results are robust to dropping the controls for $\text{Agency exposure}_{t}$ and $\ln(\text{Awards to focal firm})_{t-1}$.
They are also robust to using an inverse hyperbolic sine transformation of the dependent variable.
20
Average values for patent flow, public invention stock, and human capital are 28.30, 266.11, and 6,412.67,
respectively. The standard deviation for public invention is 511.89 and for human capital is 9,708.77. The
marginal effect of a one standard deviation increase in public invention is a decrease in firm patents of
511.89 × 0.256(28.30 + 1)/(266.11 + 1) = 14.37. The marginal effect of a one standard deviation increase in
relevant human capital is an increase in firm patents of 9,708.77 × 0.338(28.30 + 1)/(6,412.67 + 1) = 14.99.
Conversely, the 2SLS estimate on public knowledge is small when estimated alone (Column 6,
p-value < 0.05) and becomes statistically indistinguishable from zero when estimated
jointly (Columns 7 and 8). In light of our theoretical predictions from Table 1, finding no ef-
fect of public knowledge on corporate patents suggests that public knowledge does not lower
the cost of invention. In turn, this also implies that public knowledge does not complement
internal research in lowering the cost of invention.
To address concerns with our ln(1+x) transformation of the dependent variable, we im-
plement the two-stage control function (CF) instrumental variable (IV) Poisson regression
approach of Lin and Wooldridge (2019). We correct the standard errors in the second stage
using panel bootstrapping with 100 replications and report results in Column 9. Our results
are not sensitive to our preferred data transformation.
0.001). These results persist when we jointly estimate them (Column 7), restrict the sample
to publishing firms (Column 8), or use Poisson estimation (Column 9).21 At the sample
means, a one standard deviation increase in university invention decreases company publi-
cations by 33%, while a one standard deviation increase in relevant human capital increases
publications by 22% (Column 7).22
Similar to the results for patents, the estimated effect of public knowledge on publications
is not statistically different from zero (Columns 6-9), suggesting that knowledge that is not
embodied in either people or inventions has little effect on corporate research as well.23
21
These results are also robust to dropping the controls for $\text{Agency exposure}_{t}$ and $\ln(\text{Awards to focal firm})_{t-1}$,
and to using an inverse hyperbolic sine transformation of the dependent variable.
22
Average values for publication flow, public invention stock, and human capital are 15.72, 266.11, and
6,412.67, respectively. The standard deviations for public invention and human capital are 511.89 and
9,708.77, respectively. The marginal effect of a one standard deviation increase in public invention is a
decrease in firm publications of 511.89 × 0.162(15.72 + 1)/(266.11 + 1) = 5.19. The marginal effect of a one
standard deviation increase in human capital is an increase in firm publications of
9,708.77 × 0.139(15.72 + 1)/(6,412.67 + 1) = 3.52.
23
The consistency with the zero effect on patenting is gratifying. Even if public knowledge did not directly
affect the marginal return to corporate research, if it increased patenting by the firm, it would indirectly
increase the marginal return to research.
6.3 AMWS Scientists Equation
Table 7 presents the estimation results using firm employment of AMWS scientists—our
second measure of corporate internal research—as the dependent variable. The patterns are
very similar to those obtained using publications. Taken together, the results in Columns
5-9 indicate that relevant public knowledge has very little effect on company employment
of renowned scientists. We find a negative effect for public invention (Column 7, p-value <
0.05) and a positive effect for human capital (Column 7, p-value < 0.001).
Evaluated at the sample means, the 2SLS estimates in Column 7 indicate that a one
standard deviation increase in relevant public invention decreases employment of AMWS
scientists by 8%, while a one standard deviation increase in relevant human capital increases
employment of AMWS scientists by 9%.24
Table 7: Main Effect of Relevant Public Science on Company Employment of AMWS Sci-
entists
Notes: This table presents estimation results for corporate employment of AMWS scientists. The sample
in Column 8 is restricted to publishing firms. Standard errors (in parentheses) are robust to arbitrary
heteroskedasticity and allow for serial correlation through clustering by firms. Column 9 reports estimates
from a two-stage control function (CF) instrumental variable (IV) Poisson regression (Lin & Wooldridge,
2019). Standard errors are estimated by panel bootstrapping with 100 replications.
24
Average values for the number of AMWS scientists employed, public invention stock, and human capital
are 4.80, 266.11, and 6,412.67, respectively. The standard deviations for public invention and human capital
are 511.89 and 9,708.77, respectively. The marginal effect of a one standard deviation increase in public
invention is a decrease in AMWS scientists employed of 511.89 × 0.033(4.80 + 1)/(266.11 + 1) = 0.37. The
marginal effect of a one standard deviation increase in human capital is an increase in AMWS scientists
employed of 9,708.77 × 0.047(4.80 + 1)/(6,412.67 + 1) = 0.41.
6.4 R&D Expenditures Equation
Table 8 presents the estimation results for company R&D expenditures. Consistent with the
previous three tables, the 2SLS estimates show a negative, though only marginally significant,
effect of public invention (Column 7, p-value = 0.066), a positive and significant effect of
human capital (Column 7, p-value < 0.01), and no effect of public knowledge (Columns 6-8).
Evaluated at the sample means, the 2SLS estimates in Column 7 indicate that a one standard
deviation increase in relevant public invention decreases company R&D expenditures by 38%,
while a one standard deviation increase in relevant human capital increases them by 33%.25
Notes: This table presents estimation results for corporate R&D expenditures. The sample in Column 8 is
restricted to publishing firms. Standard errors (in parentheses) are robust to arbitrary heteroskedasticity
and allow for serial correlation through clustering by firms.
In summary, our key findings thus far are that (1) public invention, as measured by the
stock of university patents, has a negative effect on corporate innovation, whereas (2) human
capital, as measured by trained PhD scientists, has a positive effect, and (3) abstract public
knowledge, not embodied in either people or inventions, has no effect.
25
Average values for R&D expenditures, public invention stock, and human capital are 142.59, 287.74,
and 6,931.84, respectively. The standard deviations for public invention and human capital are 527.80 and
10,118.23, respectively. The marginal effect of a one standard deviation increase in public invention is a
decrease in R&D expenditures of 527.80 × 0.203(142.59 + 0.000001)/(287.74 + 1) = 54.02. The marginal
effect of a one standard deviation increase in human capital is an increase in R&D expenditures of
10,118.23 × 0.225(142.59 + 0.000001)/(6,931.84 + 1) = 46.82.
6.5 Heterogeneous Effects: Frontier Firms vs. Follower Firms
Frontier firms may differ from followers in the type of inventions they produce, the value they
derive from inventions, or both. To capture a firm’s proximity to the technology frontier,
we first count its annual flow of novel patents, where patent novelty is based on unique IPC
combinations. Then, we create Tech frontier as an indicator variable equal to 1 for firm
years with novel patents in the top decile compared to other sample firms in that year, and
0 otherwise.26 We interact this indicator variable with our measures of Public invention and
Human capital, respectively, and report second-stage 2SLS results in Table 9.27
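The construction of Tech frontier described above can be sketched as follows. This is only an illustration, not the authors' code: the input layout (tuples of firm, year, and novel-patent count), the function name `tech_frontier_flags`, the cutoff rule (the top tenth of firms in each year, with ties at the cutoff included), and the choice never to flag a firm-year with zero novel patents are all assumptions, since the text does not say how ties or zeros are handled.

```python
from collections import defaultdict


def tech_frontier_flags(rows):
    """Flag firm-years whose flow of novel patents is in the top decile that year.

    rows: iterable of (firm, year, novel_patents) tuples (hypothetical layout).
    Returns a dict mapping (firm, year) -> 1 or 0.
    """
    rows = list(rows)
    counts_by_year = defaultdict(list)
    for _firm, year, count in rows:
        counts_by_year[year].append(count)

    # Assumed cutoff rule: the smallest count among the top tenth of firms
    # in a year (at least one firm), so ties at the cutoff are all flagged.
    cutoffs = {}
    for year, counts in counts_by_year.items():
        ordered = sorted(counts)
        k = max(1, len(ordered) // 10)
        cutoffs[year] = ordered[len(ordered) - k]

    # Assumption: a firm-year with no novel patents is never on the frontier.
    return {
        (firm, year): int(count > 0 and count >= cutoffs[year])
        for firm, year, count in rows
    }
```

The same pattern would apply to the High ability indicator, with average patent value in place of the novel-patent count.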
The coefficient estimates on the interaction terms show substantial heterogeneity in the
effect of public science on internal research and invention based on firm proximity to the
technology frontier. While Tables 5, 6, and 7 show that, on average, firms respond to an
increase in relevant public invention by withdrawing from patenting, publishing, and hiring
of AMWS scientists, firms operating on the technology frontier do so to a lesser extent.
Similarly, though both frontier firms and followers increase their patenting, publishing, and
hiring in response to an increase in the supply of relevant human capital, frontier firms do
so to a greater extent. We find similar results when we measure proximity to the technology
frontier using patents that are first to be granted in a new CPC main group or subgroup
(see Appendix Table B14).
To further explore these results, we capture a firm’s ability to derive value from inventions
using the average patent value from Kogan, Papanikolaou, Seru, and Stoffman (2017)
normalized by market value. The indicator variable High ability equals 1 for firm years with
average patent values in the top decile compared to other sample firms in that year and
0 otherwise. Table 10 reports the second stage of 2SLS estimation using the same instrumental
variables as before. Unlike the results for firm proximity to the technology frontier,
the coefficient estimates on the interaction terms are no longer significantly different from
zero across specifications. A firm’s ability to derive private value from inventions does not
condition its response to relevant public science. In other words, the impact of public science
on corporate innovation is more likely to be influenced by technological leadership than by
an advantage in product markets.
Our results are consistent with the view that firms on the technology frontier may have
more productive internal research or that these firms operate in technologies where public
invention is less plentiful but human capital is in abundant supply. In either case, frontier
firms would have a larger scale of internal research and invention. In turn, frontier firms
would be more responsive to increases in human capital but less responsive to public invention.
26
Appendix Table C19 shows the results of a mean comparison test of frontier firms versus followers.
Frontier firms appear to have higher stocks of public knowledge and human capital than followers, but lower
stocks of public invention.
27
We use Predicted R&D budget - public invention, Predicted R&D budget - human capital, and their
interactions with Tech frontier as instrumental variables for Public invention, Human capital, and their
interactions with Tech frontier, respectively.
Table 9: Variation by Firm Proximity to the Technology Frontier: Unique IPC Combinations
Notes: This table presents the second stage of 2SLS estimation for the effect of public invention and human
capital on corporate patents, publications, and AMWS scientists when considering firm proximity to the
technology frontier. To measure this proximity, we first count each firm’s annual flow of novel patents,
where patent novelty is based on unique IPC combinations. Then, we create the variable Tech frontier as an
indicator equal to 1 for firm years with a flow of novel patents in the top decile compared to other sample firms
in that year, and 0 otherwise. Standard errors (in parentheses) are robust to arbitrary heteroskedasticity
and allow for serial correlation through clustering by firms.
Table 10: Variation by Firm Ability to Derive Value from Inventions: Patent Value
Notes: This table presents the second stage of 2SLS estimation for the effect of public invention and human
capital on corporate patents, publications, and AMWS scientists when considering firm ability to derive value
from inventions. To measure this ability, we first calculate the average patent value from Kogan et al. (2017),
normalized by market value, for each firm year. Then, we create the variable High ability as an indicator
equal to 1 for firm years with an average patent value in the top decile compared to other sample firms in
that year, and 0 otherwise. Standard errors (in parentheses) are robust to arbitrary heteroskedasticity and
allow for serial correlation through clustering by firms.
Our analysis reveals that the positive effect of human capital on firm patents and publications
is robust across all industries. The negative effect of public invention is robust across all
industries except Life sciences. In Life sciences, public knowledge complements internal
research in reducing the cost of inventing, while external and internal inventions are strategic
complements. This is consistent with incumbent firms collaborating with universities and
investing in or acquiring startups to complete downstream development and commercialize
the resulting products (Arora, Fosfuri, & Gambardella, 2001; Azoulay et al., 2019).
Table 11: Variation by Main Industry
heterogeneity in this effect by firm proximity to the technology frontier, we find that frontier
firms are better positioned to compete with university-backed startups in the product market
compared to followers (Columns 4 and 5).
Our results also indicate that increases in public knowledge reduce, not increase, value
for incumbent firms. While we leave for future work a careful examination of this negative
effect on market value, a potential direction would build on the idea that public knowledge is
available for all firms to exploit. If the average incumbent firm is poorly positioned to exploit
that knowledge relative to university-backed startups, the negative effect may be attributed
to rent-dissipating competition between incumbents and startups in the technology market.
That is, our results suggest that, insofar as public knowledge creates value, it is captured by
startups and other private firms, at the expense of incumbent public firms.
Table 12: Firm Value Equation
allows us to assess the extent to which our findings are sensitive to potential violations of
the key identifying assumptions. Second, we examine the impact of alternative measures of
public invention and human capital on our main results. This allows us to assess the extent
to which our findings are dependent on the specific measures used in our analysis. Third,
in universities, PhD training happens as a part of the research process. We separate public
invention from human capital to determine whether the two factors indeed have independent
effects on corporate innovation outcomes. Fourth, we use measures of high-quality corporate
innovation as dependent variables to test whether the effects of public invention and human
capital are consistent across different measures of corporate innovation.
We explore the sensitivity of our main results by using the R&D budgets of federal agencies,
instead of the political composition-predicted R&D budgets, in constructing alternative
instrumental variables. Appendix Tables A9 and B11 present the first and second stages
of 2SLS estimation. We find that the negative (positive) effect of public invention (human
capital) on corporate innovation persists.
We check the robustness of our main results by using alternative measures of public invention
and human capital. Public invention, broad is the stock of non-corporate publications that
are cited by patents in various CPC subclasses, weighted by the firm’s lagged patenting
shares across CPC subclasses. Human capital, cited is a firm-year measure of published
PhD dissertations cited by patents in various CPC subclasses, weighted by the firm’s lagged
patenting shares across CPC subclasses. Human capital, OECD is a firm-year measure of
PhD dissertations in various OECD natural science subfields, weighted by the reliance of
CPC subclasses on science published in various OECD subfields and by the firm’s lagged
patenting shares across CPC subclasses. Details about the construction of these measures
are included in Appendix A.
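Each of these alternative measures shares one step: class-level quantities are weighted by the firm's lagged patenting shares across CPC subclasses. A minimal sketch of that weighting step follows. The exact formula is given in Appendix A and is not reproduced in this section, so the simple share-weighted sum, the function name `firm_relevant_stock`, and the dictionary layout are assumptions made for illustration.

```python
def firm_relevant_stock(lagged_shares, class_stocks):
    """Weight class-level stocks by a firm's lagged patenting shares.

    lagged_shares: dict mapping CPC subclass -> the firm's lagged share of
                   patents in that subclass (assumed to sum to 1).
    class_stocks:  dict mapping CPC subclass -> stock of the public-science
                   measure in that subclass for the focal year.
    Subclasses in which the firm did not patent receive zero weight.
    """
    return sum(
        share * class_stocks.get(cpc, 0.0)
        for cpc, share in lagged_shares.items()
    )
```

For example, a firm with three-quarters of its lagged patents in one subclass and one-quarter in another is exposed mostly to public science relevant to the first subclass, regardless of how large the stock is in subclasses where it does not patent.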
Table 13 presents the second stage of 2SLS estimation. We find results consistent with
Tables 5 and 6. As one might expect, the use of a broad measure of public invention
reduces the power of the instrument, resulting in noisier but qualitatively similar results.
But regardless of how we measure relevant public invention and human capital, the former
has a negative and significant effect (statistically and economically) on company patents and
publications (p-values < 0.001), while the latter has a positive and significant effect (p-values
< 0.001).
Table 13: Alternative Measures of Relevant Public Invention and Human Capital
We explore the sensitivity of our main results to separating public invention from human
capital in different ways. We construct alternative measures of Public invention, broad by
including only patent-cited publications from federal laboratories, excluding any published
PhD dissertations, excluding publications coauthored by PhD students, and excluding publications
coauthored by the advisors of PhD students, respectively. We report results from
the second stage of 2SLS estimation in Appendix Table B12. We find that the effects of
public invention and human capital on corporate innovation are similar to those reported
earlier. For instance, the elasticity of corporate patenting with respect to public invention
ranges from -0.174 to -0.264, similar to -0.256, the elasticity reported in Column 7 of Table
5. The elasticity with respect to human capital ranges from 0.309 to 0.326, which is very
close to the comparable estimate of 0.338 reported in Column 7 of Table 5. The patterns
are similar for corporate publications and the employment of AMWS scientists.
We also account for differences in PhD production intensity across different scientific fields
in a three-step process. First, for each of the 15 scientific fields (not including humanities
and social sciences) included in the Dimensions Units of Assessment (UOA) classification
system, we calculate the ratio between (i) the total funding amount received by publications
of PhD students and their advisors published in the field and (ii) the total funding amount
received by all publications published in the field. Second, we categorize fields with above
(below) median ratios as having high (low) student-advisor funding. Third, we construct
alternative measures of Public invention, broad by including only publications from fields
with high or low student-advisor funding ratios, respectively. As shown in Appendix Table
B12, the effects of public invention and human capital on corporate R&D are not sensitive
to these permutations.
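The first two steps of this procedure can be sketched as below. The field names and funding amounts used in any call are hypothetical, and the sketch leaves a field whose ratio equals the median unclassified, which is an assumption: the text does not say how such a field is treated.

```python
from statistics import median


def split_fields_by_funding_ratio(student_advisor_funding, total_funding):
    """Compute student-advisor funding ratios by field and split at the median.

    student_advisor_funding: dict field -> funding received by publications of
                             PhD students and their advisors in the field.
    total_funding:           dict field -> funding received by all publications
                             in the field.
    Returns (ratios, high_fields, low_fields).
    """
    ratios = {
        field: student_advisor_funding.get(field, 0.0) / total
        for field, total in total_funding.items()
        if total > 0
    }
    cutoff = median(ratios.values())
    high_fields = {f for f, r in ratios.items() if r > cutoff}
    low_fields = {f for f, r in ratios.items() if r < cutoff}
    return ratios, high_fields, low_fields
```

The third step would then rebuild Public invention, broad twice, once from publications in `high_fields` and once from publications in `low_fields`.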
rate innovation.29 The small net effect, however, conceals the diverse ways in which public
and private R&D investments interact.
Indeed, firms’ response to the increase in public science depends on their proximity to
the technology frontier. Frontier firms tend to continue investing in internal research and
invention, even in the presence of abundant public science. This is consistent with the
observed surge in corporate scientific research in such emerging technology fields as artificial
intelligence and quantum computing. The disparity in response may arise because frontier
firms enjoy greater marginal returns from using internal research and invention than other
firms, or because they operate in technologies where public invention is less abundant but
human capital is nevertheless abundantly supplied. Consequently, frontier firms may benefit
more from public knowledge and skilled PhDs to fuel their internal research and inventions
than followers. On the other hand, firms operating in technologies with more abundant
public invention would also tend to cut back on internal research and development. Such
firms would naturally benefit less from expansion in the supply of human capital or public
knowledge but would be very responsive to changes in public invention.
Our findings also relate to the growing literature on economic growth and productivity
slowdown. The sluggish growth in productivity over the last three decades or more in the
face of sustained growth in scientific output has puzzled observers. Our findings point to
a possible reason. Romer (1990) and Jones (2022) stress that the non-rivalrous nature of
ideas is a potent source of increasing returns and productivity growth. It should follow that
the most powerful sources of increasing returns are ideas that are broadly usable, and whose
production is publicly funded so that they can be placed in the public domain, available to
all. This is the basic argument underlying the case for public support for scientific research
in universities.
Yet the history of technical progress teaches us that abstract ideas are also difficult to
use. Ideas have to be tailored for specific uses and, frequently, have to be embodied in people
and artifacts before they can be absorbed by firms. However, such embodiment also makes
ideas less potent sources of increasing returns, turning non-rival ideas into rival inputs whose
use by rivals is easier to restrict. Our findings suggest that firms, especially those not on
the technological frontier, lack the absorptive capacity to use externally supplied
ideas unless they are embodied in human capital or inventions. The limit on growth is not
the creation of useful ideas but rather the rate at which those ideas can be embodied in
human capital and inventions, and then allocated to firms to convert them into innovations.
In other words, productivity growth may have slowed down because the potential users
(private corporations) lack the absorptive capacity to understand and use those ideas.
29
Our estimates suggest that the rise in public science between 1986 and 2015 led to an average annual
decrease in corporate patents of 1.5% and in corporate publications of 1.1%. Between 1986 and 2015,
the stock of university patents relevant to our sample of firms increased by 660.85 (from 21.65 to 682.50),
while human capital increased by 4,048.33 (from 5,896.36 to 9,944.69). Using the coefficients from Column
7 in Table 5, we estimate that the increase in university inventions decreased firm patents by 660.85 ×
0.256(28.30 + 1)/(266.11 + 1) = 18.56, while the increase in human capital increased firm patents by 4,048.33 ×
0.338(28.30 + 1)/(6,412.67 + 1) = 6.25. The net effect was a decrease in firm patents of 12.31, which represents
a 1.5% decrease per year relative to the average annual patent flow of 28.30. Using the coefficients from
Column 7 in Table 6, we estimate that the increase in university inventions decreased firm publications by
660.85 × 0.162(15.72 + 1)/(266.11 + 1) = 6.70, while the increase in human capital increased firm publications
by 4,048.33 × 0.139(15.72 + 1)/(6,412.67 + 1) = 1.47. The net effect was a decrease in firm publications of
5.23, which represents a 1.1% decrease per year relative to the average annual publication flow of 15.72.
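The back-of-the-envelope calculation in footnote 29 can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted in that footnote. The helper name `level_effect` is hypothetical, and dividing the cumulative change by 29 years (2015 minus 1986) is an assumption about how the per-year figures were obtained, since the footnote does not state the divisor.

```python
def level_effect(delta_x, coefficient, mean_y, mean_x):
    """Change in the outcome level implied by a change delta_x in the regressor,
    following the footnote's pattern: delta_x * coef * (mean_y + 1) / (mean_x + 1)."""
    return delta_x * coefficient * (mean_y + 1.0) / (mean_x + 1.0)


# Changes in the two stocks between 1986 and 2015, as quoted in the footnote.
d_invention = 682.50 - 21.65      # university patent stock
d_human = 9944.69 - 5896.36       # human capital
years = 2015 - 1986               # assumed divisor for the per-year figures

# Patents: coefficients from Column 7 of Table 5, mean annual flow 28.30.
patent_loss = level_effect(d_invention, 0.256, 28.30, 266.11)
patent_gain = level_effect(d_human, 0.338, 28.30, 6412.67)
patent_net = patent_loss - patent_gain
patent_pct_per_year = 100.0 * patent_net / 28.30 / years

# Publications: coefficients from Column 7 of Table 6, mean annual flow 15.72.
pub_loss = level_effect(d_invention, 0.162, 15.72, 266.11)
pub_gain = level_effect(d_human, 0.139, 15.72, 6412.67)
pub_net = pub_loss - pub_gain
pub_pct_per_year = 100.0 * pub_net / 15.72 / years

print(round(patent_net, 2), round(patent_pct_per_year, 1))
print(round(pub_net, 2), round(pub_pct_per_year, 1))
```

Under these assumptions the script recovers the footnote's net decreases of about 12.31 patents and 5.23 publications, and the per-year figures of roughly 1.5% and 1.1%.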
The loss of absorptive capacity is partly related to the growing specialization and division
of innovative labor in the U.S. economy. Not only do universities and public research
institutes produce the bulk of scientific knowledge, but over the past three decades, publicly
funded inventions and startups have grown in importance as sources of innovation. Concomitantly,
many incumbent firms have substantially withdrawn from performing upstream
scientific research. This withdrawal may have reduced their absorptive capacity, that is, their
ability to understand and use scientific advances produced by public science. If so, the division
of innovative labor between universities and firms, wherein the former produce knowledge and
the latter apply the knowledge to invent, appears to work much better for frontier firms.
Non-frontier firms instead require universities or startups to convert ideas into inventions.
The growing specialization involving universities, startups, and incumbents may therefore pose
a challenge to maintaining a diverse and vibrant innovation ecosystem. The expansion of public
science may widen the gap between frontier firms and followers, with ramifications for product
market competition, as well as for the rate and direction of technical progress.
References
Aldridge, T. T., & Audretsch, D. (2017). The Bayh-Dole Act and scientist entrepreneur-
ship. In Universities and the entrepreneurial ecosystem (pp. 57–66). Edward Elgar
Publishing.
American Association for the Advancement of Science. (2021). Historical Trends in Federal
R&D. (Available at https://www.aaas.org/programs/r-d-budget-and-policy/
historical-trends-federal-rd. Accessed December 6, 2021.)
Angrist, J., Azoulay, P., Ellison, G., Hill, R., & Lu, S. F. (2020). Inside job or deep impact?
Extramural citations and the influence of economic scholarship. Journal of Economic
Literature, 58 (1), 3–52.
Arora, A., Belenzon, S., & Patacconi, A. (2018). The decline of science in corporate R&D.
Strategic Management Journal, 39 (1), 3–32.
Arora, A., Belenzon, S., & Sheer, L. (2021a). Knowledge spillovers and corporate investment
in scientific research. American Economic Review, 111 (3), 871–98.
Arora, A., Belenzon, S., & Sheer, L. (2021b). Matching patents to Compustat firms, 1980–2015:
Dynamic reassignment, name changes, and ownership structures. Research Policy,
50 (5), 104217.
Arora, A., Fosfuri, A., & Gambardella, A. (2001). Markets for technology and their impli-
cations for corporate strategy. Industrial and Corporate Change, 10 (2), 419–451.
Azoulay, P., Ding, W., & Stuart, T. (2009). The impact of academic patenting on the rate,
quality and direction of (public) research output. The Journal of Industrial Economics,
57 (4), 637–676.
Azoulay, P., Graff Zivin, J. S., Li, D., & Sampat, B. N. (2019). Public R&D investments and
private-sector patenting: Evidence from NIH funding rules. The Review of Economic
Studies, 86 (1), 117–152.
Babina, T., He, A. X., Howell, S. T., Perlman, E. R., & Staudt, J. (2023). Cutting the innovation
engine: How federal funding shocks affect university patenting, entrepreneurship,
and publications. The Quarterly Journal of Economics.
Baruffaldi, S., & Poege, F. (2020). A firm scientific community: Industry participation
and knowledge diffusion. Max Planck Institute for Innovation & Competition Research
Paper (20-10).
Beise, M., & Stahl, H. (1999). Public research and industrial innovations in Germany.
Research Policy, 28 (4), 397–422.
Belenzon, S., & Cioaca, L. C. (2021). Guaranteed markets and corporate scientific research.
National Bureau of Economic Research Working Paper (w28644).
Belenzon, S., & Schankerman, M. (2013). Spreading the word: Geography, policy, and
knowledge spillovers. Review of Economics and Statistics, 95 (3), 884–903.
Bellet, C. S., De Neve, J.-E., & Ward, G. (2023). Does employee happiness have an impact
on productivity? Management Science.
Bloomberg Government. (2023). Your guide to navigating the federal budget process. (Available
at https://about.bgov.com/brief/your-guide-to-navigating-the-federal-budget-process/.
Accessed September 30, 2023.)
Cohan, A., Feldman, S., Beltagy, I., Downey, D., & Weld, D. S. (2020). SPECTER:
Document-level representation learning using citation-informed transformers. In Proceedings
of the 58th annual meeting of the association for computational linguistics
(pp. 2270–2282).
Cohen, W. M., & Levinthal, D. A. (1990). Absorptive capacity: A new perspective on
learning and innovation. Administrative Science Quarterly, 128–152.
Cohen, W. M., Nelson, R. R., & Walsh, J. P. (2002). Links and impacts: The influence of
public research on industrial R&D. Management Science, 48 (1), 1–23.
David, P. A., Hall, B. H., & Toole, A. A. (2000). Is public R&D a complement or substitute
for private R&D? A review of the econometric evidence. Research Policy, 29 (4-5),
497–529.
Davis, O. A., Dempster, M. A. H., & Wildavsky, A. (1966). A theory of the budgetary
process. American Political Science Review, 60 (3), 529–547.
Delron, J.-M., Guellec, D., Wu, C., & Liu, J. (2022). Building a corpus of patents-articles siblings.
(Available at https://conference.nber.org/conf_papers/f176403.slides.pdf.
Accessed March 2023.)
Digital Science. (2022). The data in Dimensions. (Retrieved from https://www.dimensions
.ai/dimensions-data/ on April 12, 2022.)
Dimos, C., & Pugh, G. (2016). The effectiveness of R&D subsidies: A meta-regression
analysis of the evaluation literature. Research Policy, 45 (4), 797–815.
Einiö, E. (2014). R&D subsidies and company performance: Evidence from geographic
variation in government funding based on the ERDF population-density rule. Review
of Economics and Statistics, 96 (4), 710–728.
Epp, D. A., Lovett, J., & Baumgartner, F. R. (2014). Partisan priorities and public budgeting.
Political Research Quarterly, 67 (4), 864–878.
Fabrizio, K. R., & Di Minin, A. (2008). Commercializing the laboratory: Faculty patenting
and the open science environment. Research Policy, 37 (5), 914–931.
Fleming, L., Greene, H., Li, G., Marx, M., & Yao, D. (2019). Government-funded research
increasingly fuels innovation. Science, 364 (6446), 1139–1141.
González, X., Jaumandreu, J., & Pazó, C. (2005). Barriers to innovation and subsidy
effectiveness. RAND Journal of Economics, 930–950.
Goolsbee, A. (1998). Does government R&D policy mainly benefit scientists and engineers?
American Economic Review, 88 (2), 298–302.
Hartmann, P., & Henkel, J. (2020). The rise of corporate science in AI: Data as a strategic
resource. Academy of Management Discoveries, 6 (3), 359–381.
Hausman, N. (2022). University innovation and local economic growth. Review of Economics
and Statistics, 104 (4), 718–735.
Hernandez, D., & King, R. (2016). Universities’ AI talent poached by tech giants. (Retrieved
from https://www.wsj.com/articles/universities-ai-talent-poached-by-tech
-giants-1479999601.)
Jones, C. I. (2022). The past and future of economic growth: A semi-endogenous perspective.
Annual Review of Economics, 14, 125–152.
Kelly, B., Papanikolaou, D., Seru, A., & Taddy, M. (2021). Measuring technological innovation
over the long run. American Economic Review: Insights, 3 (3), 303–320.
Kim, S. D., & Moser, P. (2021). Women in science. Lessons from the baby boom (Tech. Rep.).
National Bureau of Economic Research.
Klevorick, A. K., Levin, R. C., Nelson, R. R., & Winter, S. G. (1995). On the sources
and significance of interindustry differences in technological opportunities. Research
Policy, 24 (2), 185–205.
Kogan, L., Papanikolaou, D., Seru, A., & Stoffman, N. (2017). Technological innovation,
resource allocation, and growth. The Quarterly Journal of Economics, 132 (2), 665–
712.
Laursen, K., & Salter, A. (2004). Searching high and low: What types of firms use universities
as a source of innovation? Research Policy, 33 (8), 1201–1215.
Lee, D. S., McCrary, J., Moreira, M. J., & Porter, J. (2022). Valid t-ratio inference for IV.
American Economic Review, 112 (10), 3260–3290.
Lichtenberg, F. R. (1984). The relationship between federal contract R&D and company
R&D. American Economic Review, 74 (2), 73–78.
Lin, W., & Wooldridge, J. M. (2019). Testing and correcting for endogeneity in nonlinear
unobserved effects models. In Panel data econometrics (pp. 21–43). Elsevier.
Mamuneas, T. P., & Nadiri, M. I. (1996). Public R&D policies and cost behavior of the US
manufacturing industries. Journal of Public Economics, 63 (1), 57–81.
Mansfield, E. (1991). Academic research and industrial innovation. Research Policy, 20 (1),
1–12.
Mansfield, E. (1995). Academic research underlying industrial innovations: Sources,
characteristics, and financing. The Review of Economics and Statistics, 55–65.
Mansfield, E. (1998). Academic research and industrial innovation: An update of empirical
findings. Research Policy, 26 (7-8), 773–776.
McMillan, G. S., Narin, F., & Deeds, D. L. (2000). An analysis of the critical role of public
science in innovation: The case of biotechnology. Research Policy, 29 (1), 1–8.
Moretti, E., Steinwender, C., & Van Reenen, J. (2021). The intellectual spoils of war?
Defense R&D, productivity, and international spillovers. National Bureau of Economic
Research Working Paper (w26483).
Mowery, D. C. (2009). Plus ca change: Industrial R&D in the “third industrial revolution”.
Industrial and Corporate Change, 18 (1), 1–50.
Mulligan, K., Lenihan, H., Doran, J., & Roper, S. (2022). Harnessing the science base:
Results from a national programme using publicly-funded research centres to reshape
firms’ R&D. Research Policy, 51 (4), 104468.
Myers, K. R., & Lanahan, L. (2022). Estimating spillovers from publicly funded R&D:
Evidence from the US Department of Energy. American Economic Review, 112 (7),
2393–2423.
Narin, F., Hamilton, K. S., & Olivastro, D. (1997). The increasing linkage between US
technology and public science. Research Policy, 26 (3), 317–330.
National Center for Science and Engineering Statistics. (2023a). National patterns of R&D
resources: 2020–21 data update (Tech. Rep. No. NSF 23-321). National Science Foundation.
Retrieved from https://ncses.nsf.gov/pubs/nsf23321
National Center for Science and Engineering Statistics. (2023b). Science and engineering
indicators 2022 (Tech. Rep.). National Science Foundation. Retrieved from https://
ncses.nsf.gov/pubs/nsb20225/data#
National Science Board. (1998). Science and engineering indicators 1998 (Tech.
Rep. No. NSB-1998-1). National Science Foundation. Retrieved from
https://wayback.archive-it.org/5902/20150627201913/http://www.nsf.gov/
statistics/seind98/
National Science Board. (2010). Science and engineering indicators 2010 (Tech.
Rep. No. NSB-2010-1). National Science Foundation. Retrieved from
https://wayback.archive-it.org/5902/20160210151754/http://www.nsf.gov/
statistics/seind10/
National Science Board. (2018). Science and engineering indicators 2018 (Tech. Rep. No.
NSB-2018-1). National Science Foundation. Retrieved from https://www.nsf.gov/
statistics/2018/nsb20181/
Nelson, R. R. (1986). Institutions supporting technical advance in industry. The American
Economic Review, 76 (2), 186–189.
OECD. (2003). Turning science into business: Patenting and licensing at public research
organizations. Retrieved from https://www.oecd-ilibrary.org/content/publication/9789264100244-en
doi: https://doi.org/10.1787/9789264100244-en
Pavitt, K. (1991). What makes basic research economically useful? Research Policy, 20 (2),
109–119.
Roche. (2023, May 4). Roche launches Institute of Human Biology to accelerate breakthroughs
in R&D by unlocking the potential of human model systems. Retrieved from
https://www.roche.com/media/releases/med-cor-2023-05-04
Romer, P. M. (1990). Endogenous technological change. Journal of Political Economy, 98 (5,
Part 2), S71–S102.
Rosenberg, N. (1990). Why do firms do basic research (with their own money)? Research
Policy, 19 (2), 165–174.
Rosenberg, N., & Nelson, R. R. (1994). American universities and technical advance in
industry. Research Policy, 23 (3), 323–348.
Scandura, A. (2016). University-industry collaboration and firms’ R&D effort. Research
Policy, 45 (9), 1907–1922.
Schartinger, D., Rammer, C., Fischer, M. M., & Fröhlich, J. (2002). Knowledge interactions
between universities and industry in Austria: Sectoral patterns and determinants.
Research Policy, 31 (3), 303–328.
Sinha, A., Shen, Z., Song, Y., Ma, H., Eide, D., Hsu, B.-J., & Wang, K. (2015). An overview
of Microsoft Academic Service (MAS) and applications. In Proceedings of the 24th
international conference on world wide web (pp. 243–246).
Szücs, F. (2020). Do research subsidies crowd out private R&D of large firms? Evidence
from European framework programmes. Research Policy, 49 (3), 103923.
Tartari, V., & Stern, S. (2021). More than an ivory tower: The impact of research institutions
on the quantity and quality of entrepreneurship (Tech. Rep. No. 28846). National
Bureau of Economic Research.
Tavares, J. (2004). Does right or left matter? Cabinets, credibility and fiscal adjustments.
Journal of Public Economics, 88 (12), 2447–2468.
Tether, B. S., & Tajar, A. (2008). Beyond industry–university links: Sourcing knowledge for
innovation from consultants, private research organisations and the public science-base.
Research Policy, 37 (6-7), 1079–1095.
Valero, A., & Van Reenen, J. (2019). The economic impact of universities: Evidence from
across the globe. Economics of Education Review, 68, 53–67.
Wallsten, S. J. (2000). The effects of government-industry R&D programs on private R&D:
The case of the Small Business Innovation Research program. The RAND Journal of
Economics, 82–100.
Wang, K., Shen, Z., Huang, C., Wu, C.-H., Eide, D., Dong, Y., . . . Rogahn, R. (2019). A
review of Microsoft Academic Services for science of science studies. Frontiers in Big
Data, 2, 45.
Appendix A Data and Variable Construction
A.1 Main Data Sources
We combined data from three main sources: Dimensions, American Men & Women of Science,
and ProQuest Dissertations & Theses Global.
A.1.1 Dimensions
Digital Science’s Dimensions project (July 31, 2021) provides data on scientific publications,
grants, patents, and citations. The data include links between funding organizations, the
grants they awarded, and the resulting publications, as well as links between patents and
publications in the form of non-patent literature citations tracking the use of published
research in invention.30 The dataset is extensive, containing over 131.5 million publications
from various sources, including 107,000 journals and 62 pre-print servers, 42.8 million patent-
to-publication citations, and 6.3 million grants totaling $2.3 trillion from 656 funding agencies
globally.
Table A1: Construction of AMWS Dataset
The AMWS directory provides full employment histories for the scientists profiled. Once
we linked a scientist to a firm in our sample, we extracted the start and end year of the
affiliation. We aggregated this information to the firm-year level by counting the number of
AMWS scientists employed by a focal firm each year.
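The aggregation to the firm-year level can be sketched as follows. The tuple layout and the function name `scientists_by_firm_year` are hypothetical. Treating both the start and end year as years of employment, and counting a scientist once per firm-year even when two recorded spells overlap, are assumptions not stated in the text.

```python
from collections import Counter


def scientists_by_firm_year(affiliations):
    """Count AMWS scientists employed by each firm in each year.

    affiliations: iterable of (scientist_id, firm, start_year, end_year)
                  tuples, with both endpoints treated as inclusive.
    Returns a Counter mapping (firm, year) -> number of distinct scientists.
    """
    seen = set()
    counts = Counter()
    for scientist, firm, start_year, end_year in affiliations:
        for year in range(start_year, end_year + 1):
            key = (scientist, firm, year)
            if key not in seen:  # avoid double counting overlapping spells
                seen.add(key)
                counts[(firm, year)] += 1
    return counts
```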
similarity between the abstracts of dissertations and the abstracts of corporate patents, rather
than relying on citation data (unlike publications, dissertations are not cited by patents).
We also matched each dissertation with its published version from Dimensions, if available.
Since a dissertation often undergoes significant revisions before being published as
a scientific publication, we compared the dissertation’s abstract with the abstracts of all
publications published by the same author within a decade from PhD graduation to identify
the most similar publication to a focal dissertation. We then constructed a second measure
of firm-relevant human capital, which allowed us to use patent citations to published
dissertations to infer relevance to corporate R&D.
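The matching of a dissertation to its published version can be illustrated with the sketch below. The similarity measure is a plain bag-of-words cosine, chosen only to keep the example self-contained; it is a stand-in and not the text-similarity method used in the paper. The record layout, the function names, and the reading of "within a decade" as the graduation year through ten years later are also assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts (stand-in text representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_published_version(dissertation, publications, window=10):
    """Return the same-author publication most similar to the dissertation.

    dissertation: dict with 'author', 'year', 'abstract'.
    publications: list of dicts with 'id', 'author', 'year', 'abstract'.
    Only publications dated from the graduation year to `window` years
    later are considered. Returns None when there is no candidate.
    """
    target = bag_of_words(dissertation["abstract"])
    candidates = [
        p for p in publications
        if p["author"] == dissertation["author"]
        and dissertation["year"] <= p["year"] <= dissertation["year"] + window
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: cosine(target, bag_of_words(p["abstract"])))
```

In practice one would likely also require a minimum similarity before accepting a match; the text does not mention a threshold, so none is imposed here.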
Moreover, we classified PhD dissertations into research fields. We then used the reliance
of patenting subclasses on knowledge published across research fields to construct our
third measure of firm-relevant human capital. PQDT provides a list of one or more
non-standardized subject terms for each dissertation (e.g., “organic chemistry” or “health care;
public health; and laboratories”). We manually created a list of 1,027 disambiguated subjects
and discarded dissertations with a “soft science” subject, such as “literature,” “history,” and
“social sciences.” We also discarded PhD dissertations from non-U.S. universities as well as
all master’s degree theses. We ended up with a dataset of 771,023 U.S. PhD dissertations
awarded between 1985 and 2016 in 394 “hard science” subjects. We manually assigned these
subjects to the 25 OECD natural science subfields. Table A2 displays the resulting crosswalk
for the most common subject terms. We then classified dissertations into one or more OECD
subfields, which allowed us to capture the multidisciplinary nature of many dissertations.
We also faced the challenge of matching each PhD advisor to a researcher from Dimen-
sions, which was necessary to construct instrumental variables for human capital. Instances
of common names led to multiple ambiguous matches. To overcome this challenge, we re-
stricted potential advisor matches using data on the PhD candidate’s institutional affiliation
and a 6-year time window that ended in the defense year. This allowed us to identify all
the publications authored by each PhD advisor, along with the funding linkages between
these publications and federal agencies. We used this information to construct instrumental
variables for firm-relevant human capital.
Table A2: Crosswalk Between OECD Subfields and PhD Dissertation Subject Terms
OECD natural science subfield                   Number of PhD dissertations   Most common subject term
1.01 Mathematics                                41,106                        Mathematics
1.02 Computer and information sciences          48,120                        Computer science
1.03 Physical sciences and astronomy            41,254                        Optics
1.04 Chemical sciences                          83,023                        Chemistry
1.05 Earth and related environmental sciences   24,932                        Geology
1.06 Biological sciences                        155,694                       Molecular biology
1.07 Other natural sciences                     1                             Natural sciences
2.01 Civil engineering                          16,205                        Civil engineering
2.02 Electrical eng, electronic eng             46,092                        Electrical engineering
2.03 Mechanical engineering                     31,027                        Mechanical engineering
2.04 Chemical engineering                       19,101                        Chemical engineering
2.05 Materials engineering                      28,199                        Materials science
2.06 Medical engineering                        6,431                         Biomedical engineering
2.07 Environmental engineering                  31,526                        Ecology
2.08 Environmental biotechnology                0                             N/A
2.09 Industrial biotechnology                   29                            Tissue engineering
2.10 Nano-technology                            2,053                         Nanotechnology
2.11 Other engineering and technologies         17,615                        Industrial engineering
3.01 Basic medical research                     46,247                        Pharmacology
3.02 Clinical medicine                          41,111                        Neurology
3.03 Health sciences                            44,546                        Public health
4.01 Agriculture, forestry, fisheries           21,334                        Botany
4.02 Animal and dairy science                   8,143                         Animals
4.03 Veterinary science                         3,670                         Veterinary services
4.05 Other agricultural science                 4,785                         Food science
Notes: This table showcases the OECD natural science subfields that have been linked with ProQuest
dissertations. It highlights the subject term most commonly used between 1980 and 2015 for each OECD
subfield.
The index $o$ denotes OECD subfields. $\text{Publications}_{o,t}$ is the number of non-corporate
publications published in year $t$ in subfield $o$. $\text{Precohort share of publications}_{i,o}$
is firm $i$'s lagged share of publications in subfield $o$. We calculated a stock measure of
Public knowledge using a perpetual inventory method with a 15% depreciation rate.
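The perpetual inventory calculation can be illustrated with a short sketch. It assumes the standard recursion, stock_t = (1 - 0.15) × stock_{t-1} + flow_t, and a zero initial stock; the exact initialization used in the paper is not stated here, so both the function name and the starting value are assumptions.

```python
def perpetual_inventory(flows, depreciation=0.15, initial_stock=0.0):
    """Turn a list of annual flows into a depreciated stock series."""
    stocks = []
    stock = initial_stock
    for flow in flows:
        # Last year's stock depreciates, then this year's flow is added.
        stock = (1.0 - depreciation) * stock + flow
        stocks.append(stock)
    return stocks

# Hypothetical constant flow of 100 for three years.
stocks = perpetual_inventory([100.0, 100.0, 100.0])
```

With a constant flow, the stock rises toward flow / depreciation, which is why the choice of initial stock matters mostly in the early years of a series.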
Table A3: OECD Subfields and Dissertation/Publication Counts
ing forward citations. The out-performance is likely due to the fact that SPECTER
is trained on scientific literature and patents, whereas other models are trained only
on patents or general texts, such as the content of Wikipedia. Unlike vector represen-
tations used by term frequency-inverse document frequency (TF-IDF) algorithms, we
embedded each dissertation abstract and each patent abstract into a densely bounded
vector (dense meaning not having any missing values in the vector, and bounded mean-
ing using values that can only fall between 0 and 1). Each word in the abstract was
converted into a vector of 768 values between 0 and 1 (a 1 by 768 vector). Each ab-
stract was converted into a matrix of size 768 by the number of words in the abstract.
We condensed the matrix into a vector with 768 rows and one column using a mean
pooling approach (averaging across rows).
2. Using the vectors from the previous step, we calculated the cosine similarity for each
dissertation-patent pair (0.77 million PhD dissertations and 1.35 million patents). Due
to the large number of abstract pairs, we used a high-performance computing (HPC)
cluster to distribute the task over 40 NVIDIA A100-40GB GPUs, with each GPU
running for more than six days continuously. The system processed and ranked over 1
trillion pairs of abstracts.32
3. For each corporate patent granted in year t, we identified the top 1,000 most similar
dissertations defended in years [t − 1, t + 1]. Sample firms do not necessarily patent every
year. To avoid recording zero relevant human capital in years without patents, we used the
5-year time cohort as our relevant period. We identified all the dissertations that were similar
(i.e., in the top 1,000) to the patents granted to a focal firm during each 5-year time
cohort. Because a PhD dissertation could be similar to multiple corporate patents, we
calculated the maximum textual similarity score between the dissertation and all the
patents granted to the focal firm during the 5-year time cohort.
4. Human capital was constructed as the weighted sum of PhD dissertations, using the
maximum similarity scores between dissertations and patents granted to the focal firm
as weights:
\[
\text{PhD dissertations}_{i,t} = \sum_{d \in D} \text{Maximum textual similarity}_{d,i,t} \tag{10}
\]
$D$ is the set of PhD dissertations in the top 1,000 most similar dissertations for one or
more of the patents granted to firm $i$ during the 5-year time cohort $t$.
$\text{Maximum textual similarity}_{d,i,t}$ is the maximum textual similarity score between the
abstract of dissertation $d$ and the abstracts of all patents granted to firm $i$ during the
5-year time cohort $t$.
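The steps above (mean pooling, cosine similarity, top-k retention, and the similarity-weighted sum in equation 10) can be sketched on toy vectors. This is only an illustration under simplifying assumptions: the embeddings are supplied as NumPy arrays rather than produced by SPECTER, the [t − 1, t + 1] year window is ignored, and the function names and `top_k` parameter are hypothetical.

```python
import numpy as np

def mean_pool(token_vectors):
    """Average token vectors (n_tokens x dim) into one abstract vector."""
    return np.asarray(token_vectors, dtype=float).mean(axis=0)

def cosine_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def human_capital(patent_vecs, dissertation_vecs, top_k):
    """Sum of maximum similarities over dissertations retained in any top-k list."""
    sims = cosine_matrix(patent_vecs, dissertation_vecs)  # (n_patents, n_dissertations)
    # For each patent, flag its top_k most similar dissertations.
    top_idx = np.argsort(-sims, axis=1)[:, :top_k]
    in_top = np.zeros_like(sims, dtype=bool)
    np.put_along_axis(in_top, top_idx, True, axis=1)
    retained = in_top.any(axis=0)
    # Maximum similarity between each dissertation and all of the firm's patents.
    max_sim = sims.max(axis=0)
    return float(max_sim[retained].sum())

# Toy example: one firm with two patents, three candidate dissertations.
pooled = mean_pool([[0.0, 2.0], [2.0, 4.0]])
patents = np.array([[1.0, 0.0], [0.0, 1.0]])
dissertations = np.array([[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
value = human_capital(patents, dissertations, top_k=1)
```

At the scale reported in the text, the full similarity matrix would not fit in memory, so a real pipeline would process the pairs in batches; the dense matrix here is only for exposition.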
32. The computing needs included significant storage (more than 10 TBs) and memory resources (more
than 4,800 GBs of RAM). The total computational time for the similarity task was approximately 60 days.
Table A4: Performance Comparison for Deep Learning Models
Percentile of similarity   Doc2Vec   PatentBERT   BERT for Patents   PatentSBERTa   SPECTER
Top 1%                     61.28%    77.44%       43.60%             86.28%         87.81%
Top 3%                     76.22%    88.11%       76.83%             94.51%         94.51%
Top 5%                     81.10%    93.29%       86.59%             95.43%         97.56%
Notes: This table from Delron, Guellec, Wu, and Liu (2022) compares the performance of several deep
learning models in identifying patent-paper pairs (i.e., the scientific publication that expresses the same
technical content as a given patent). SPECTER was more accurate compared to other models. For a given
set of patents, the Doc2Vec, PatentBERT, BERT for Patents, PatentSBERTa, and SPECTER models were
used to identify the most similar publications. The outputs were then compared to the ground truth (correct
pairs of publications) for each patent. SPECTER was able to rank the correct publication pair among the
top 1% most similar publications for 87.81% of patents, which is a relatively high accuracy rate. Conversely,
BERT for Patents was only able to achieve this for 43.60% of patents. This suggests that SPECTER is the
more effective model for identifying the most similar publications to a given patent.
The index $s$ denotes patent subclasses. $\text{University patents}_{s,t}$ is the count of patents
granted to universities in subclass $s$ in year $t$. $\text{Precohort share of patents}_{i,s}$ is
firm $i$'s lagged share of patents in subclass $s$.
33. Other organization types are company, healthcare, nonprofit, facility, other, government, and archive.
For more information on GRID, see https://www.grid.ac/.
A.3 Details on the Instrumental Variables
A.3.1 Data Sources on Agency R&D Budgets
We used data on federal R&D budgets from the “Total R&D by Agency, 1976-2020” series
compiled by the American Association for the Advancement of Science (AAAS, 2021). Total
R&D includes basic research, applied research, development, construction of R&D facilities,
and major capital equipment for R&D. Each year, federal agencies are required to report
their R&D budgets to the White House Office of Management and Budget (OMB). AAAS
compiles these data, along with historical data published by OMB and survey data published
by the National Science Foundation’s National Center for Science and Engineering Statistics,
into a data series of R&D budgets by agency, character, and discipline.
Table A5 summarizes the R&D budgets by agency and decade. It demonstrates the
significant variation in R&D budgets between agencies and over time. Some agencies are
significant funders of R&D (e.g., Defense, Health and Human Services), while others are
not (e.g., Environmental Protection Agency, Department of Homeland Security). More
importantly, the composition of federal R&D investments has changed over time. Defense-
related R&D has dropped from 58% of all federal R&D budgets in the 1980s to only 49%
in the 2010s. Conversely, human health-related R&D has increased from 11% of all federal
R&D budgets in the 1980s to 23% in the 2010s. We exploit these differences and changes to
“shock” the public science relevant to firms.
A.3.2 Agency R&D Budgets
We construct a Bartik-style shift-share instrument for each component of public science.
Our instrument R&D budget - public knowledge combines “shifts” to the federal funding
for public knowledge published in each OECD natural science subfield with firm-specific
“exposure shares” based on the firm’s publishing across OECD subfields in the previous
5-year time cohort. The following procedure explains its construction:
1. We used data from AAAS to identify the value of the R&D budget appropriated by
Congress to each of the 12 main federal agencies (plus an “Other” category for smaller
agencies) in each year.34
2. We used the connections between federal agencies, grants, and publications from Di-
mensions to identify the federal agencies that funded each non-corporate publication.
3. For each OECD subfield, we calculated its reliance on funding from each federal agency
by dividing (i) the number of publications published in the focal subfield over 1980-
2015 and funded by a focal agency by (ii) the total number of publications published
in the same subfield over 1980-2015.
4. We calculated each firm’s shares of publications across OECD subfields by dividing (i)
the number of firm publications in each subfield-time cohort by (ii) the total number
of firm publications in the same time cohort.
5. We combined the shifts and exposure shares to calculate our first instrument:
\[
\text{R\&D budget - public knowledge}_{i,t} = \sum_{o \in O} \text{Precohort share of publications}_{i,o}
\left( \sum_{a \in A} \text{R\&D budget}_{a,t} \times \text{Reliance on agency}_{o,a} \right) \tag{12}
\]
$O$ denotes OECD subfields. $\text{Precohort share of publications}_{i,o}$ is firm $i$'s share of
publications in subfield $o$ during the previous 5-year time cohort. $A$ is the set of 12
main federal agencies, plus an “Other” category for smaller agencies. $\text{R\&D budget}_{a,t}$ is
the R&D budget of agency $a$ in year $t$. $\text{Reliance on agency}_{o,a}$ is a share obtained by
dividing the number of publications published in subfield $o$ over 1980-2015 and funded
by agency $a$ by the total number of publications published in subfield $o$ over 1980-2015.
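The construction in steps 3 through 5 can be sketched with small arrays. The sketch below is illustrative only: the array layouts, function names, and all numbers are assumptions, and the budgets refer to a single year t.

```python
import numpy as np

def reliance_on_agency(funded_counts, total_counts):
    """Share of each subfield's publications funded by each agency.

    funded_counts: (n_subfields, n_agencies) publication counts by funder.
    total_counts:  (n_subfields,) total publications per subfield.
    """
    funded = np.asarray(funded_counts, dtype=float)
    totals = np.asarray(total_counts, dtype=float)
    return funded / totals[:, None]

def shift_share(shares, budgets, reliance):
    """Firm-level instrument for one year.

    shares:   (n_firms, n_subfields) precohort publication shares.
    budgets:  (n_agencies,) agency R&D budgets in year t.
    reliance: (n_subfields, n_agencies) reliance shares.
    """
    subfield_shift = reliance @ budgets  # inner sum over agencies
    return shares @ subfield_shift       # outer sum over subfields

# Hypothetical data: two subfields, two agencies, two firms.
reliance = reliance_on_agency([[30, 10], [0, 50]], [100, 100])
budgets = np.array([1000.0, 200.0])
shares = np.array([[0.5, 0.5], [1.0, 0.0]])
instrument = shift_share(shares, budgets, reliance)
```

Because a publication can acknowledge more than one agency, the rows of the reliance matrix need not sum to one.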
34. The “Total R&D by Agency, 1976-2020” table includes “budget authority in millions of constant FY 2020
dollars.” The constant-dollar conversions used OMB’s chained price index, which can be found in historical
table 10.1 available at https://www.whitehouse.gov/omb/historical-tables/.
Our instrument R&D budget - public invention combines “shifts” to the federal funding for
knowledge cited by university patents granted in each subclass with firm-specific “exposure
shares” based on the firm’s patenting across subclasses in the previous 5-year time cohort.
Its construction broadly parallels that of the instrumental variable for Public knowledge,
with two updates. First, to connect federal funding for science to public invention, we used
the non-patent literature (NPL) citations and funding linkages from Dimensions to identify
the federal agencies that funded each non-corporate publication cited by a university patent.
Second, for each patent subclass, we calculated its reliance on public science funded by each
federal agency. Our first instrument for Public invention was calculated as:
\[
\text{R\&D budget - public invention}_{i,t} = \sum_{s \in S} \text{Precohort share of patents}_{i,s}
\left( \sum_{a \in A} \text{R\&D budget}_{a,t} \times \text{Reliance on agency}_{s,a} \right) \tag{13}
\]
The index $s$ denotes patent subclasses. $\text{Precohort share of patents}_{i,s}$ is firm $i$'s
share of patents in subclass $s$ during the previous 5-year time cohort, obtained by dividing the
number of firm patents granted in subclass $s$ by the total number of firm patents in that time
period. $A$ and $\text{R\&D budget}_{a,t}$ are as previously defined. $\text{Reliance on agency}_{s,a}$
is a share obtained by dividing the number of citations from university patents granted in
subclass $s$ over 1980-2020 to non-corporate publications published over 1980-2015 and funded by
agency $a$ by the total number of citations from university patents granted in subclass $s$ over
1980-2020 to all non-corporate publications published over 1980-2015.
Our instrument R&D budget - human capital combines “shifts” to the federal funding for
PhD dissertation advisors with the “exposure shares” of the similarity scores between the
abstracts of dissertations and the abstracts of firm patents, as follows:
\[
\text{R\&D budget - human capital}_{i,t} = \sum_{d \in D} \text{Maximum textual similarity}_{d,i,t}
\left( \sum_{a \in A} \text{R\&D budget}_{d,a} \times \text{Share of agency}_{d,a} \right) \tag{14}
\]
$D$ is the set of PhD dissertations in the top 1,000 most similar dissertations for one or more
of the patents granted to firm $i$ during the time cohort $t$.
$\text{Maximum textual similarity}_{d,i,t}$ is the maximum textual similarity score between the
abstract of dissertation $d$ and the abstracts of all patents granted to firm $i$ during the
5-year time cohort $t$. $A$ is as previously defined. $\text{R\&D budget}_{d,a}$ is the R&D budget
for agency $a$ at the beginning of the PhD program (that is, five years before the year of defense
of dissertation $d$). $\text{Share of agency}_{d,a}$ is obtained by dividing the funding amount
(in $) from agency $a$ to the publications of the advisor(s) of dissertation $d$ during the 6-year
period ending in dissertation $d$'s defense year by the total funding amount (in $) from agency $a$
to any publication published during the 6-year period ending in the defense year of
dissertation $d$.
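A minimal sketch of the dissertation-level calculation follows. The data layout (dictionaries keyed by agency and year), the function name, and all numbers are hypothetical; the only elements taken from the text are the five-year offset between program start and defense and the similarity weighting.

```python
def human_capital_instrument(dissertations, budgets, program_length=5):
    """Similarity-weighted sum of advisor funding exposure for one firm-cohort.

    dissertations: list of dicts with keys "max_similarity", "defense_year",
                   and "agency_shares" (dict mapping agency -> share).
    budgets:       dict mapping (agency, year) -> R&D budget.
    """
    total = 0.0
    for d in dissertations:
        # The budget is dated to the start of the PhD program.
        start_year = d["defense_year"] - program_length
        exposure = sum(
            budgets[(agency, start_year)] * share
            for agency, share in d["agency_shares"].items()
        )
        total += d["max_similarity"] * exposure
    return total

# Hypothetical inputs.
budgets = {("NSF", 1995): 100.0, ("NIH", 1995): 400.0, ("NSF", 1996): 110.0}
dissertations = [
    {"max_similarity": 0.9, "defense_year": 2000,
     "agency_shares": {"NSF": 0.02, "NIH": 0.01}},
    {"max_similarity": 0.5, "defense_year": 2001,
     "agency_shares": {"NSF": 0.1}},
]
value = human_capital_instrument(dissertations, budgets)
```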
The U.S. House Appropriations Committee and its counterpart, the U.S. Senate Ap-
propriations Committee, play a pivotal role in the legislative process, being responsible for
passing appropriations bills that regulate the discretionary spending of federal agencies. Each
committee is organized into subcommittees, and each subcommittee is charged with devel-
oping one regular annual appropriations bill that allocates funding for various agencies and
activities that fall under its jurisdiction (Bloomberg Government, 2023). Importantly, the
jurisdiction of each U.S. House appropriations subcommittee mirrors that of a corresponding
U.S. Senate appropriations subcommittee. This pairing of subcommittees between the two
chambers of Congress ensures symmetry and coordination in the appropriations process.
The composition and names of congressional appropriations subcommittees are not static
over time, reflecting the evolving priorities and structure of the federal government. For ex-
ample, the Homeland Security subcommittee was established in 2003 to oversee the newly
created Department of Homeland Security, itself the result of combining all or part of 22
different federal departments and agencies. Since 2007, the U.S. House Appropriations Com-
mittee and the U.S. Senate Appropriations Committee have each included 12 subcommittees:
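1. Agriculture, Rural Development, Food and Drug Administration, and Related Agencies;
2. Commerce, Justice, Science, and Related Agencies;
3. Defense;
4. Energy and Water Development, and Related Agencies;
5. Financial Services and General Government;
6. Homeland Security;
7. Interior, Environment, and Related Agencies;
8. Labor, Health and Human Services, Education, and Related Agencies;
9. Legislative Branch;
10. Military Construction, Veterans Affairs, and Related Agencies;
11. State, Foreign Operations, and Related Programs;
12. Transportation, Housing and Urban Development, and Related Agencies.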
A.3.4 Data Sources on the Political Composition of Subcommittees
Given the absence of a comprehensive data source about historical congressional appropri-
ations subcommittees, we manually collected data from a variety of sources. We compiled
information on the jurisdiction and membership roster of each subcommittee from the 95th
Congress (1977-1978) through the 114th Congress (2015-2016).
We used the jurisdiction information to identify which subcommittees are responsible
for which federal agencies. Table A6 summarizes the mapping between the 12 pairs of
appropriations subcommittees and our 12 main federal agencies. The catch-all category of
“Others” was mapped directly to the U.S. House and U.S. Senate Appropriations Committees.
Subcommittees USDA DoC DoD DoE HHS DHS DoT VA DoI EPA NASA NSF
1. Agriculture, Rural Development, 100%
Food and Drug Administration
2. Commerce, Justice, Science 100% 75% 75%
3. Defense 100% 25%
4. Energy and Water Development 100% 25%
5. Financial Services and General 25%
Government
6. Homeland Security 100%
7. Interior, Environment 75% 100%
8. Labor, Health and Human Ser- 100%
vices, Education
9. Legislative Branch
10. Military Construction, Veterans 100%
Affairs
11. State, Foreign Operations
12. Transportation, Housing and Ur- 100%
ban Development
Total 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
Notes: This table maps the 12 appropriations subcommittees in the U.S. House and the U.S. Senate to 12
main federal agencies based on the jurisdictions of the subcommittees. The reported percentages represent
the weights applied when calculating the two measures of political composition, Majority party share and
Democratness, at the agency level.
We used the membership rosters to extract two pieces of information for each subcom-
mittee (as well as the overall U.S. House and U.S. Senate Appropriations Committees):
1. Majority party share: To quantify how dominant the majority party was in the
subcommittee, we calculated the ratio of (i) the number of members from the majority
political party in the chamber and (ii) the total number of members in the subcom-
mittee.
For agencies that fall under the jurisdiction of a single pair of appropriations subcommit-
tees (e.g., Health and Human Services, which is overseen by the subcommittees on Labor,
Health and Human Services, Education, and Related Agencies), we calculated a simple av-
erage Majority party share across the pair of subcommittees to arrive at an agency-year
measure. We did the same for the Democratness measure.
For agencies that fall under the jurisdiction of two pairs of appropriations subcommit-
tees (e.g., Department of the Interior, which is overseen by both the subcommittees on
Energy and Water Development and the subcommittees on Interior, Environment, and Re-
lated Agencies), we calculated a weighted average Majority party share across the relevant
subcommittees, using the percentages reported in Table A6 as weights.35 We did the same
for the Democratness measure.
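The aggregation from subcommittee rosters to an agency-year measure can be sketched as follows. The rosters, the helper names, and the numerical values are hypothetical; the 25%/75% weights for the Department of the Interior are meant to mirror the kind of weights reported in Table A6 and should be read as illustrative.

```python
def majority_party_share(members, majority_party):
    """Fraction of subcommittee members who belong to the chamber's majority party."""
    return sum(1 for party in members if party == majority_party) / len(members)

def agency_measure(pair_values, weights):
    """Weighted average across subcommittee pairs of the House-Senate simple average.

    pair_values: dict mapping subcommittee -> (house_value, senate_value).
    weights:     dict mapping subcommittee -> weight (weights sum to 1).
    """
    return sum(
        weight * (pair_values[name][0] + pair_values[name][1]) / 2.0
        for name, weight in weights.items()
    )

# Hypothetical House roster for one subcommittee ("D" is the majority party).
house_interior = majority_party_share(["D", "D", "D", "R", "R"], "D")

# Hypothetical subcommittee values and illustrative weights for one agency.
pair_values = {"Energy and Water": (0.6, 0.5), "Interior": (house_interior, 0.6)}
weights = {"Energy and Water": 0.25, "Interior": 0.75}
interior_dept = agency_measure(pair_values, weights)
```

An agency overseen by a single pair of subcommittees is the special case with one weight equal to 1, which reduces to the simple House-Senate average.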
Importantly for our identification strategy, the political composition of subcommittees
predicts the R&D budgets of federal agencies. As shown in Table A7, the relationship
between lagged Majority party share and R&D budget is negative and significant (Columns
2 and 4, p-values < 0.05). The relationship between lagged Democratness and R&D budget
is positive, though imprecisely estimated (Columns 3 and 4).
of public science. Instead of using the actual R&D budget of each agency in equations 12, 13,
and 14, we first predict the agency R&D budget using the specification reported in Column
4 of Table A7, then use this predicted agency R&D budget to construct our instrumental
variables. Table A8 provides summary statistics for all the instrumental variables used in
the econometric analyses.
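The two-step logic can be sketched as an ordinary least squares prediction whose fitted values replace the actual budgets. This is only a sketch under strong assumptions: the actual specification in Column 4 of Table A7 is not reproduced here and may include fixed effects or other controls, and the function name and data are hypothetical.

```python
import numpy as np

def predict_budgets(budgets, majority_share_lag, democratness_lag):
    """Fitted values from an OLS regression of budgets on the two lagged measures."""
    y = np.asarray(budgets, dtype=float)
    # Design matrix with an intercept (assumed; the paper's controls are not shown).
    X = np.column_stack([np.ones_like(y), majority_share_lag, democratness_lag])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Hypothetical agency-year observations generated from an exact linear rule.
majority = np.array([0.55, 0.60, 0.65, 0.58])
democratness = np.array([0.40, 0.60, 0.50, 0.45])
actual = 100.0 - 50.0 * majority + 20.0 * democratness
predicted = predict_budgets(actual, majority, democratness)
```

The predicted series would then stand in for the agency R&D budget term when the instruments are constructed.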
Table A9: Instrumental Variable Estimation (First Stage)
Notes: This table displays first-stage OLS regression results for instrumental variables based on the R&D
budgets of federal agencies. Standard errors (in parentheses) are robust to arbitrary heteroskedasticity and
allow for serial correlation through clustering by firms.
by the same individual within a 10-year window after PhD graduation. Our assump-
tion was that the top most similar publication corresponds to the published version of
the student’s dissertation.
2. For each patent subclass and year, we counted the number of PhD dissertations de-
fended that year whose published versions were cited by patents granted in the focal
subclass between 1980 and 2020.
3. We used the same firm patenting shares across CPC subclasses as previously described.
Table A10: Preferred Instrumental Variable Estimation (First Stage)
Notes: This table displays first-stage OLS regression results for instrumental variables based on predicted
R&D budgets of federal agencies, where two measures of the political composition of congressional appro-
priations subcommittees, Majority party share and Democratness, are first used to predict R&D budgets.
Standard errors (in parentheses) are robust to arbitrary heteroskedasticity and allow for serial correlation
through clustering by firms.
1. We classified PhD dissertations into one or more research fields using a manual cross-
walk between dissertations’ subject terms and the 25 OECD natural science subfields
(see Table A2 for examples of the most commonly used subject term for each subfield).
2. We counted the number of PhD dissertations defended each year in each OECD natural
science subfield.
3. For each patent subclass and time cohort, we calculated its reliance on knowledge pub-
lished across OECD natural science subfields by dividing (i) the number of citations
from patents granted in the focal subclass during the time cohort to publications pub-
lished in the focal OECD subfield by (ii) the total number of citations from patents
granted in the focal subclass during the time cohort to publications published in any
subfield. For instance, if there were 100 NPL citations from subclass C01C (Ammonia;
Cyanogen) in time cohort $t$ and 50 of those citations were to OECD subfield 2.04
Chemical engineering, then the $\text{CPC-OECD share}_{o,s,t}$ for subclass C01C and OECD
subfield 2.04 in time cohort $t$ was 0.5.
4. We used the same firm patenting shares across CPC subclasses as previously described.
5. We calculated our firm-year measure of PhD dissertations, OECD as follows:
\[
\text{PhD dissertations, OECD}_{i,t} = \sum_{s \in S} \text{Precohort share of patents}_{i,s}
\left( \sum_{o \in O} \text{PhD dissertations}_{o,t} \times \text{CPC-OECD share}_{o,s,t} \right) \tag{16}
\]
The index $s$ denotes patent subclasses. $\text{Precohort share of patents}_{i,s}$ is firm $i$'s
share of patents in subclass $s$ in the previous time cohort. The index $o$ denotes OECD natural
science subfields. $\text{PhD dissertations}_{o,t}$ is the number of PhD dissertations defended
in year $t$ in OECD natural science subfield $o$. $\text{CPC-OECD share}_{o,s,t}$ is the relative
importance of OECD subfield $o$ to patent subclass $s$ in time cohort $t$.
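The share calculation and the resulting firm-level measure can be sketched with small arrays. The array layouts, function names, and numbers are assumptions; the first row reproduces the C01C example from the text (100 NPL citations, 50 of them to one subfield), and assigning zero shares to subclasses without NPL citations is also an assumption.

```python
import numpy as np

def cpc_oecd_share(citation_counts):
    """Row-normalize NPL citation counts (n_subclasses x n_subfields) for one cohort."""
    counts = np.asarray(citation_counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Subclasses with no NPL citations receive zero shares (an assumption).
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def phd_dissertations_oecd(patent_shares, dissertations, share):
    """Firm-level measure: patent shares times subclass-relevant dissertation counts.

    patent_shares: (n_firms, n_subclasses) precohort patent shares.
    dissertations: (n_subfields,) dissertations defended in year t.
    share:         (n_subclasses, n_subfields) CPC-OECD shares.
    """
    relevant = share @ dissertations  # inner sum over subfields
    return patent_shares @ relevant   # outer sum over subclasses

# Row 0 mimics subclass C01C; row 1 is a subclass with no NPL citations.
share = cpc_oecd_share([[50, 50], [0, 0]])
dissertations = np.array([200.0, 400.0])
patent_shares = np.array([[1.0, 0.0], [0.5, 0.5]])
measure = phd_dissertations_oecd(patent_shares, dissertations, share)
```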
The index $s$ denotes patent subclasses. $\text{Publications cited by patents}_{s,t}$ is the number
of non-corporate publications published in year $t$ and cited by at least one patent
(whether a corporate patent or a non-corporate patent) granted in subclass $s$ during
1980-2020. $\text{Precohort share of patents}_{i,s}$ is firm $i$'s share of patents in subclass $s$
during the previous 5-year time cohort.
2. Using the vectors obtained from the previous step, we computed the cosine similarity
between each corporate patent and the publications published within a window of
plus or minus one year from the year the patent was granted. We then ranked the
publication abstracts in descending order of the cosine similarity score, keeping only
the top 10,000 most similar publications for each patent abstract.
3. We determined the pool of publications relevant to each firm’s patents granted within
a specific time cohort. This list included all publications that were retained in the top
1,000 most similar publications for one or more of the firm’s patents granted in the
time cohort. We used the firm-cohort level rather than the firm-year level because not
all firms in our analysis sample are granted patents every year, which would result in
many instances where the pool of relevant public invention would be zero.
4. For each publication in the pool, we identified its maximum similarity score with the
firm’s patents granted in the time cohort.
$P$ is the set of publications in the top 1,000 most similar publications for one or more
of the patents granted to firm $i$ during time cohort $t$. $\text{Maximum similarity}_{p,i,t}$ is the
maximum textual similarity score between the abstract of publication $p$ and all the
abstracts of the patents granted to firm $i$ during time cohort $t$.
Appendix B Robustness Checks
We performed a variety of checks to test the robustness of the effect of public science on
corporate patents, publications, and AMWS scientists.
our dataset, searching for publications authored by the student within 10 years of their
graduation that closely resemble the dissertation abstract, as outlined in Section ??. We
infer that these publications are published versions of the dissertation and remove them from
the construction of our Public invention, broad measure and its corresponding instrumental
variable. As a result of this procedure, 6,199 publications are excluded from the estimation
sample.
Next, we exclude publications that have a PhD student as a coauthor (Column 3). We
identify these publications by comparing the list of authors with our list of students. Pub-
lications with at least one coauthor who is a PhD student during the publication year are
removed from the construction of our Public invention, broad measure and its correspond-
ing instrumental variable. This procedure excluded 74,397 publications from the estimation
sample.
In Column 4, we exclude publications that are authored by the advisors of PhD students.
We identify these publications by comparing the list of authors with our list of advisors.
Publications with at least one coauthor who is a PhD advisor (at any point in time) are
removed from the construction of our Public invention, broad measure and its corresponding
instrumental variable. This procedure excluded 391,007 publications from the estimation
sample.
Next, we seek to account for differences in the intensity of human capital production
across different scientific fields (Columns 5 and 6). We separate fields that have high (i.e.,
above median) ratios of funding received by PhD students and their advisors to total funding
received from those that have low (i.e., below median) ratios. Publications from high ratio
fields are dropped from the construction of our Public invention, broad measure in Column
5, while publications from low ratio fields are similarly dropped in Column 6.
The coefficient estimates on public invention and human capital remain consistent across
all specifications.
Table B12: Separating Public Invention From Human Capital
B.3 High-Quality Corporate Innovation
We report robustness checks using different measures of high-quality corporate innovation in
Table B13. In Columns 1 and 2, the dependent variables are “home-run patents” (i.e., patents in
the top 5% of their cohort in terms of citations received) and “breakthrough patents” (patents
in the top 1% of their cohort in terms of citations received), respectively. In Columns 3 and 4,
the dependent variables are corporate publications coauthored with AMWS scientists and
corporate publications cited by AMWS scientists, respectively. In Column 5, we use only firms’
employment of AMWS scientists who have won major awards. Reassuringly, the coefficient
estimates remain similar to those presented in Tables 5, 6, and 7.
B.4 Alternative Measure of Frontier Firms
Notes: This table presents the second stage of 2SLS estimation for the effect of public invention and human
capital on corporate patents, publications, and AMWS scientists when considering firm proximity to the
technology frontier. To measure this proximity, we first count each firm’s annual flow of novel patents, where
patent novelty is based on patents that are first to be granted in a new CPC main group or subgroup. Then,
we create the variable Tech frontier as an indicator equal to 1 for firm years with a flow of novel patents in
the top decile compared to other sample firms in that year, and 0 otherwise. Standard errors (in parentheses)
are robust to arbitrary heteroskedasticity and allow for serial correlation through clustering by firms.
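The Tech frontier indicator described in the notes can be sketched as a within-year top-decile flag. The cutoff rule (at or above the 90th percentile of that year's flows, with a strictly positive flow) is an assumption about how ties and zero flows are handled, and the function name and data are hypothetical.

```python
from collections import defaultdict

import numpy as np

def tech_frontier(novel_patent_flows):
    """Flag firm-years whose flow of novel patents is in the top decile that year.

    novel_patent_flows: dict mapping (firm, year) -> count of novel patents.
    Returns a dict mapping (firm, year) -> 1 or 0.
    """
    by_year = defaultdict(list)
    for (firm, year), flow in novel_patent_flows.items():
        by_year[year].append(flow)
    # 90th percentile of flows across sample firms, computed separately by year.
    cutoffs = {year: np.quantile(flows, 0.9) for year, flows in by_year.items()}
    return {
        key: int(flow > 0 and flow >= cutoffs[key[1]])
        for key, flow in novel_patent_flows.items()
    }

# Hypothetical year with ten firms whose flows run from 0 to 9.
flows = {(f"firm{i}", 2000): i for i in range(10)}
frontier = tech_frontier(flows)
```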
Appendix C Additional Descriptive Statistics and Case Examples
Notes: This table provides summary statistics by main industry for our analysis sample. Industry classifi-
cation is based on a firm’s primary SIC4 code.
Table C16: Summary Statistics by Main Industry (Cont.)
Table C18: Cross Tabulation of Measures of Human Capital and Public Invention
Table C19: Mean Comparison Tests: Frontier Firms Versus Follower Firms
Table C20: Mean Comparison Tests: Federal Lab Publications Versus Other Publications
C. Chemistry
Funding per pub. ($) 22,186.6 4.0 33,904.5 34,321.2 11,717.9 8,909.5
Patent citations per pub. 0.1 1.8 0.6 0.4 0.4 0.3
Publication citations per pub. 7.8 4.1 21.1 10.6 13.3 5.9
Authors per pub. 0.7 2.7 4.6 1.4 4.0 0.7
C.1 Examples of Relevant Human Capital
We validate the logic behind two of our measures of firm-relevant human capital with three
case examples, as summarized in Tables C21 and C22 and detailed below.
Our primary measure of relevant human capital relies on the textual similarity between
the abstracts of dissertations and the abstracts of firm patents. Specifically, a PhD graduate
is relevant to a firm’s R&D if his/her dissertation defended in year t is in the top 1,000 most
similar dissertations to one or more of the firm’s patents granted in years [t − 1, t + 1].36 For
each PhD graduate from Column 1, we list the top 3 firms (Column 2) and up to 3 most
similar patents per firm (Column 3) based on the textual similarity between abstracts.
Our complementary measure, Human capital, cited, relies on non-patent literature cita-
tions from patents in various CPC subclasses to the published version of the dissertation.37
For each PhD graduate-firm pair, we list up to three patents that cite the published version
of the dissertation (Column 4). We also list whether the PhD graduate worked for the firm,
and during which years (Column 5).
Table C21: Examples of Relevant Human Capital
1983 to 1988. During this time, he collaborated with IBM to create a prototype university
computing system called Andrew, on which Dr. Nichols’s research was built. Dr. Nichols
conducted an extensive comparison of his model’s findings with real-world experiments on
the AFS system. He delved into various factors that could impact the system’s performance,
such as network latency, processing speed, and hard drive access time. Through his research,
he provided valuable insights into the functioning of the AFS system, demonstrating that
its performance is primarily limited by the processing power of the involved computers.
Furthermore, he found that AFS could handle a diverse range of tasks without becoming
overwhelmed.
Table C22: Examples of Textually Similar Patents
(1) (2)
Patent number Patent title
4737931 Memory control device
4843542 Virtual memory cache for use in multi-processing systems
4974173 Small-scale workspace representations indicating activities by other users
4779187 Method and operating system for executing programs in a multi-mode microprocessor
4967378 Method and system for displaying a monochrome bitmap on a color display
4974159 Method of transferring control in a multitasking computer system
4719569 Arbitrator for allocating access to data processing resources
4884266 Variable speed local area network
4937734 High speed bus with virtual memory data transfer and rerun cycle capability
5596668 Single mode optical transmission fiber, and method of making the fiber
5847690 Integrated liquid crystal display and digitizer having a black matrix layer adapted for sensing screen touch location
5858052 Manufacture of fluoride glass fiber with phosphate coatings
5629246 Method for forming fluorine-doped glass having low concentrations of free fluorine
5837564 Method for optimal crystallization to obtain high electrical performance from chalcogenides
5906771 Manufacturing process for high-purity phosphors having utility in field emission displays
5629418 Preparation of titanyl fluorophthalocyanines
5714301 Spacing a donor and a receiver for color transfer
5916946 Organic/inorganic composite and photographic product containing such a composite
6016516 Remote procedure processing device used by at least two linked computer systems
6176425 Information management system supporting multiple electronics tags
6340931 Network printer document interface using electronics tags
6012052 Methods and apparatus for building resource transition probability models for use in pre-fetching resources, editing resource link topology, building resource link topology templates, and collaborative filtering
6172972 Multi-packet transport structure and method for sending network data over satellite network
6338079 Method and system for providing a group of parallel resources as a proxy for a single shared resource
6023509 Digital signature purpose encoding
6173315 Using shared data to automatically communicate conference status information within a computer conference
6343067 Method and apparatus for failure and recovery in a computer network
Notes: This table lists the titles of the textually similar patents from Table C21.
Using the SPECTER algorithm, we found that Dr. Nichols’s dissertation is textually
similar to several corporate patents granted in 1988-1990, as listed in Tables C21 and C22.
For example, the 1989 patent titled “Virtual memory cache for use in multi-processing systems”
(USPTO patent number 4843542) assigned to Xerox and Dr. Nichols’s dissertation are
both closely related to multi-processing systems, with a shared focus on improving
performance and efficiency. The patent introduced a virtual memory cache
to that end. Meanwhile, Dr. Nichols’s dissertation explored the
use of multiple processors in a network of workstations to optimize processing power and
resource sharing. Overall, both the patent and the dissertation highlight the importance of
performance and efficiency in multi-processing systems.
Dr. Nichols’s dissertation was published in 1988 in ACM Transactions on Computer
Systems under the title “Scale and performance in a distributed file system.” This publication
has been cited by 442 patents, including patents assigned to Sun Microsystems, Lucent
Technologies, IBM, Xerox, EMC, Unisys Corporation, Microsoft, Oracle, NetApp, Hewlett
Packard, Google, and AT&T, among others. For example, the 1993 patent titled “Method for
delegating access rights through executable access control program without delegating access
rights not in a specification to any intermediary nor comprising server security” (USPTO
patent number 5649099) assigned to Xerox outlines a method that allows users to securely
delegate specific access rights to others, even without complete trust. By utilizing rules
called access control programs, the system maintains controlled and secure shared access.
As a result, the system can decide whether to grant or deny a request.
Dr. Nichols was employed at Xerox PARC from 1990 to 1996.38 Later, he joined Mi-
crosoft in 2003 (where he was still employed as of 2023). Although Dr. Nichols did not
publish any scientific articles after completing his PhD degree, he contributed to numer-
ous patents at both Xerox and Microsoft, many of which were related to his dissertation.
While working at Xerox, Dr. Nichols played a significant role in designing and implement-
ing the Tapestry system, which facilitates automatic filtering of electronic messages based
on human feedback. He also co-led the Jupiter project, aimed at supporting collaboration
through the concept of “network places.” Some of his notable inventions at Xerox include
“Method for controlling real-time presentation of audio/visual data on a computer system”
(USPTO patent number 5692213), filed in 1995. Some of Dr. Nichols’s notable inventions
at Microsoft include “Method and system for resolving conflicts operations in a collaborative
editing environment” (USPTO patent number 7792788), filed in 2005, and “Deployment,
maintenance, and configuration of complex hardware and software systems” (USPTO patent
number 7676806), also filed in 2005.
Example 2: Dr. Siddharth Ramachandran
This example demonstrates that Dr. Siddharth Ramachandran’s published dissertation
has significantly influenced the field of optical fiber and photonics devices, with multiple firms
citing his work in their patents, including Lucent Technologies. Moreover, his expertise led
him to work for renowned institutions like Bell Labs and OFS Labs, further contributing to
advancements in the industry through his published research and patented inventions.
Dr. Ramachandran earned his PhD in electrical and computer engineering from the Uni-
versity of Illinois Urbana-Champaign in 1998. His dissertation, titled “Photoinduced optical
integrated circuits and bulk photonic devices in chalcogenide glasses,” delves into the unique
properties of chalcogenide glasses that make them valuable for technological applications.
Dr. Ramachandran explores how exposure to light can alter the structure of these glasses
and enable energy transfer to rare earth elements, potentially benefiting lasers and communi-
cation devices. Additionally, he investigates methods to enhance the stability and longevity
of the glass through heating processes and determines optimal operating conditions.
Using the SPECTER algorithm, we found that Dr. Ramachandran’s dissertation is
textually similar to several corporate patents granted between 1997 and 1999, as listed in
Tables C21 and C22. For example, the 1997 patent titled “Single mode optical transmission
fiber, and method of making the fiber” (USPTO patent number 5596668) assigned to Lucent
Technologies is linked to Dr. Ramachandran’s dissertation through a shared focus
on optical technologies, material properties, light interaction, and potential applications in
telecommunications. First, both concentrate on optical technologies: the dissertation examines
the properties of chalcogenide glasses for use in photonic, laser, and communication devices,
while the patent concerns single-mode optical transmission fibers, which are essential
components of optical communication systems. Second, both study specific materials and
their properties: the dissertation investigates the unique characteristics of chalcogenide
glasses, and the patent addresses the manufacturing process of single-mode optical
transmission fibers. Third, both investigate how materials interact with light: the
dissertation examines how exposure to light can alter the structure of chalcogenide glasses
and enable energy transfer to rare earth elements, and the patent describes an optical fiber
that can efficiently transmit light signals over long distances. Finally, both relate to
telecommunications: optical fibers are a critical technology in modern telecommunication
systems, while chalcogenide glasses have potential applications in communication devices
and could be incorporated into future optical technologies.
38
https://www.linkedin.com/in/david-nichols-6829331/
Dr. Ramachandran’s dissertation was published in Applied Physics Letters in 1999 under the
title “Low-loss photoinduced waveguides in rapidly thermally annealed films of chalcogenide
glasses.” This publication has been cited by eight patents, including one assigned to Lucent
Technologies and titled “Mesa geometry semiconductor light emitter having chalcogenide
dielectric coating” (USPTO patent number 6463088). Dr. Ramachandran’s research has
had a significant impact on the development of various inventions at Lucent Technologies,
including light-emitting diodes (LEDs), laser diodes, and optoelectronic devices. LEDs are
utilized in a broad range of applications, such as displays, indicator lights, and general light-
ing. The mesa geometry semiconductor light emitter with a chalcogenide dielectric coating
has the potential to enhance the performance of LEDs, making them more energy-efficient
and durable. Laser diodes are used in many applications, including data communications,
optical storage, sensing, and medical equipment. The patented technology can improve the
efficiency and output power of laser diodes, leading to better overall performance. Further-
more, the semiconductor light emitter outlined in the patent can be integrated into a variety
of optoelectronic devices, such as photodetectors, optical modulators, or optical amplifiers,
ultimately improving their performance.
Dr. Ramachandran’s professional career spans over a decade of optical fiber and pho-
tonics device research. He began his work as a member of the technical staff at Bell Labs,
a division of Lucent Technologies, in 1998 and continued until 2001. He then joined OFS
Labs, a world-renowned institution in optical research and product development, where he
worked from 2001 to 2009. Throughout his career, Dr. Ramachandran has authored nu-
merous research articles on these subjects, including “Photoinduced index-tapered channel
waveguides in chalcogenide glasses for guided mode-size conversion” published in 1998, “Spa-
tially and spectrally resolved imaging of modal content in large-mode-area fibers” in 2008,
and “Generation and propagation of radially polarized beams in optical fibers” in 2009.
These publications reflect his expertise in optical fiber and photonics device research and
demonstrate his contribution to the field.
Dr. Ramachandran has made significant contributions to multiple patented inventions
during his tenure at Bell Labs and OFS Labs. For example, in 2007 he filed a patent ti-
tled “Visible continuum generation utilizing a hybrid optical source” on behalf of OFS Labs.
This invention focuses on generating a visible light continuum by employing a hybrid optical
source, which has potential applications in fields such as microscopy, imaging, and optical
communications. In 2009, he filed another patent on behalf of OFS Labs called “Systems
and techniques for generating Bessel beams.” This patent involves the development of systems
and methods for producing Bessel beams, a type of non-diffracting light beam that
maintains its intensity profile over a long distance, making it highly useful in applications
like optical trapping and laser machining. Both patented inventions are closely related to
Dr. Ramachandran’s doctoral studies. By building upon his doctoral research, Dr. Ra-
machandran has continued to contribute to the advancement of optical fiber and photonics
devices.
Example 3: Dr. Dirk Balfanz
This example shows that Dr. Dirk Balfanz’s dissertation on access control for ad-hoc
collaboration has been highly relevant and influential in the field of information technology, as
demonstrated by numerous citations in patents from such major tech companies as Microsoft
and Xerox. Dr. Balfanz’s expertise led him to work for both Xerox PARC and Google.
In 2001, Dr. Dirk Balfanz obtained his PhD in computer science from Princeton Uni-
versity. His dissertation titled “Access control for ad-hoc collaboration” explored various
approaches to managing access control during interactions with unknown or untrusted par-
ties. Dr. Balfanz demonstrated that it is possible to protect resources while allowing ad-hoc
collaborations to take place, even though it may seem counterintuitive. His research has
paved the way for refining access control logic, enhancing user-computer interaction mod-
els, and aiding programmers in securely dividing applications. The ultimate objective is
to address the security challenges in our increasingly interconnected world, where ad-hoc
collaborations are becoming more prevalent.
Dr. Balfanz acknowledged the substantial contributions of his collaborators at Microsoft
Research and Xerox PARC. While at Microsoft Research, Dan Simon conceived the Win-
dowBox idea, and Paul England offered guidance on Windows programming. At Xerox
PARC, Drew Dean collaborated with Dr. Balfanz on the Placeless access control logic, while
Doug Terry, Jim Thornton, and Mike Spreitzer provided further assistance. Ian Goldberg’s
expertise was critical in porting SSLeay to the PalmPilot, enhancing Copilot, and sharing
programming tips for the Pilot. Bob Relyea from Netscape provided assistance with specific
PKCS#11 details. Lastly, Andrew Appel’s role in establishing the decidability proof for the
logic in Chapter 2 of the dissertation was pivotal.
Using the SPECTER algorithm, we found that Dr. Balfanz’s dissertation is textually similar
to various corporate patents granted between 2000 and 2002, as listed in Tables C21 and C22.
For example, the patent “Remote procedure processing device used by at least two linked
computer systems” granted to Xerox (USPTO patent number 6016516) and Dr. Balfanz’s
dissertation are connected within the broader context of information technology, with a
focus on distributed systems and collaboration. Both address challenges in these areas, with
the patent aimed at improving the ease and efficiency of combining and executing remote
procedures, and the dissertation emphasizing the importance of security and access control in
ad-hoc collaborations. Ultimately, both works contribute to enhancing the user experience in
collaborative settings by facilitating seamless interactions with remote resources and ensuring
secure access to shared information.
Dr. Balfanz’s dissertation was published in the 2002 Proceedings of the ACM Confer-
ence on Computer-Supported Cooperative Work under the title “Using speakeasy for ad hoc
peer-to-peer collaboration.” This publication has been cited by 23 patents, including those
from Microsoft. Two such patents are 8719847, titled “Management and marketplace for
distributed home devices,” and 8782527, titled “Collaborative phone-based file exchange.”
Patent 8719847 deals with managing and integrating distributed home devices in a net-
worked environment. The patent addresses challenges in securely connecting and managing
devices, enabling users to share and access resources in a controlled manner. Dr. Balfanz’s
dissertation, which emphasizes the importance of secure access control in ad-hoc collabo-
rations, is related to this patent in the sense that both works explore the need for secure
and controlled sharing of resources in networked environments. Patent 8782527 focuses on
facilitating secure and efficient file exchange between phones in a collaborative setting. The
patent describes methods and systems for securely sharing files among collaborating parties
using mobile devices. Dr. Balfanz’s dissertation is related to this patent as both works share
a focus on the importance of secure collaboration and access control when sharing resources,
such as files, among multiple users.
Dr. Balfanz worked as a research staff member at Xerox from 2001 to 2007. After
his tenure at Xerox, he joined Google in 2007 as a software engineer, where he focused on
security, privacy, and abuse prevention.39
Although Dr. Balfanz has published papers like “Security Keys: Practical Cryptographic
Second Factors for the Modern Web” in 2016 and “Origin-Bound Certificates: A Fresh Approach
to Strong Client Authentication for the Web” in 2012, his primary contributions
have come in the form of patented inventions. He contributed to numerous inventions
during his time at Xerox PARC, including “Apparatus and methods for providing
secured communication” (USPTO patent number 7392387) filed in 2007, and “Systems and
methods for authenticating communications in a network” filed in 2004. Additionally, he
has contributed to inventions at Google, including “System and method for authenticating
to a participating website using locally stored credentials” filed in 2012, and “Methods and
systems of adding a user account to a device” filed in 2014.
39
https://www.linkedin.com/in/dirk-balfanz-7885852/