The Economics of Artificial Intelligence
.com/journals/.
4. We utilized data from the Historical Patent Data Files. The complete (unfiltered) data sets from which we derived our data set are available here: https://www.uspto.gov/learning-and-resources/electronic-data-products/historical-patent-data-files.
The Impact of Artificial Intelligence on Innovation 131
Table 4.4
Patent data summary statistics

                          Mean    Std. dev.   Min.    Max.
Application year          2003    6.68        1982    2014
Patent year               2007    6.98        1990    2014
Symbolic systems          .29     .45         0       1
Learning systems          .28     .45         0       1
Robotics                  .41     .49         0       1
Artificial intelligence   .04     .19         0       1
Computer science          .77     .42         0       1
Other applications        .23     .42         0       1
US domestic firms         .59     .49         0       1
International firms       .41     .49         0       1
Org. type academic        .07     .26         0       1
Org. type private         .91     .29         0       1
Observations              13,615
search on patents, with the search terms being the same keywords used to identify academic publications in AI.5 This provides an additional 8,640 AI patents. We then allocate each patent to an AI field by associating the relevant search term with one of the overarching fields. For example, a patent found through the search term “neural network” is classified as a “learning” patent. Some patents found through this search method duplicate those identified by the USPC search, that is, their USPC class is 706 or 901. We drop those duplicates. Together these two subsets create a sample of 13,615 unique AI patents. Summary statistics are provided in table 4.4.
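The keyword step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: only the mapping of “neural network” to the learning field and the USPC classes 706 and 901 come from the text, while the other keywords, the `(patent_id, uspc_class, text)` record layout, and the first-match rule are hypothetical.

```python
# Minimal sketch of the keyword-based patent search with USPC de-duplication.
# Only "neural network" -> "learning" and classes 706/901 come from the text;
# the remaining keywords and the record layout are hypothetical.

AI_USPC_CLASSES = {"706", "901"}

KEYWORD_FIELD = {
    "neural network": "learning",   # example given in the text
    "expert system": "symbolic",    # hypothetical
    "robot": "robotics",            # hypothetical
}


def field_for(text):
    """Return the field of the first keyword found in the text, or None."""
    lowered = text.lower()
    for keyword, field in KEYWORD_FIELD.items():
        if keyword in lowered:
            return field
    return None


def keyword_subset(patents):
    """Classify keyword hits, dropping patents already in USPC 706 or 901.

    `patents` is an iterable of (patent_id, uspc_class, text) tuples.
    """
    subset = {}
    for patent_id, uspc_class, text in patents:
        if uspc_class in AI_USPC_CLASSES:
            continue  # duplicate of the USPC-based subset
        field = field_for(text)
        if field is not None:
            subset[patent_id] = field
    return subset


patents = [
    ("A", "706", "A neural network for speech recognition"),
    ("B", "382", "Neural network image classifier"),
    ("C", "700", "Robot arm controller"),
    ("D", "345", "Display panel driver"),
]
print(keyword_subset(patents))  # {'B': 'learning', 'C': 'robotics'}
```

Because the keyword subset excludes USPC 706 and 901 by construction, it can be appended to the USPC subset without double counting.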
In contrast to the distribution of learning systems, symbolic systems, and robotics in the publication data, the three fields are more evenly distributed in the patent data: 3,832 (28 percent) learning system patents, 3,930 (29 percent) symbolic system patents, and 5,524 (41 percent) robotics patents. The remaining patents are broadly classified only as AI.
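The percentages can be checked directly against the counts and the sample size of 13,615. This short Python check rounds to whole percents; the robotics count rounds to 41 percent, consistent with the .41 mean in table 4.4, and the three counts leave 329 patents (about 2 percent) outside the three named fields.

```python
# Recompute the field shares from the patent counts reported in the text.
TOTAL_PATENTS = 13_615
FIELD_COUNTS = {"learning": 3_832, "symbolic": 3_930, "robotics": 5_524}


def percent_shares(counts, total):
    """Share of `total` held by each field, rounded to whole percents."""
    return {field: round(100 * n / total) for field, n in counts.items()}


shares = percent_shares(FIELD_COUNTS, TOTAL_PATENTS)
remaining = TOTAL_PATENTS - sum(FIELD_COUNTS.values())
print(shares)     # {'learning': 28, 'symbolic': 29, 'robotics': 41}
print(remaining)  # 329
```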
Using data sets ancillary to the USPTO Historical Masterfile, we are able to integrate variables of interest related to organization type, location, and application space. For example, patent assignment data track ownership of patents across time. Our interest in this analysis relates to upstream innovative work, and for this reason we capture the initial patent assignee by organization for each patent in our sample. These data enable the creation of indicator variables for organization type and location. We create an indicator for academic organization type by searching the name of the assignee for
words relating to academic institutions, for example, “university,” “college,” or “institution.” We do the same for private-sector organizations, searching for “corp.,” “business,” “inc.,” or “co.,” to name a few. We also search for equivalent words or abbreviations used in other languages, for example, “S.p.A.” Only 7 percent of the sample is assigned to academic organizations, while 91 percent is assigned to private entities. The remaining patents are assigned to government entities, for example, the US Department of Defense.
5. We utilized data from the Document ID Dataset that is complementary to patent assignment data available on the USPTO website. The complete (unfiltered) data sets from which we derived our data set are available here: https://www.uspto.gov/learning-and-resources/electronic-data-products/patent-assignment-dataset.
132 Iain M. Cockburn, Rebecca Henderson, and Scott Stern
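A minimal Python sketch of this name-based classification follows. The term lists hold only the examples quoted above (the text notes its lists are longer), the rule that a term must not touch a letter on either side is an assumption added so that “co.” does not fire inside a word such as “Mexico.”, and names matching neither list fall into a residual category, which the text identifies with government assignees.

```python
import re

# Example terms quoted in the text; the full lists used for the sample are longer.
ACADEMIC_TERMS = ["university", "college", "institution"]
PRIVATE_TERMS = ["corp.", "business", "inc.", "co.", "s.p.a."]


def _matches(name, terms):
    """True if any term appears in the name without a letter directly on either side."""
    lowered = name.lower()
    for term in terms:
        pattern = r"(?<![a-z])" + re.escape(term) + r"(?![a-z])"
        if re.search(pattern, lowered):
            return True
    return False


def org_type(assignee):
    """Classify an assignee name as academic, private, or other (residual)."""
    if _matches(assignee, ACADEMIC_TERMS):
        return "academic"
    if _matches(assignee, PRIVATE_TERMS):
        return "private"
    return "other"  # e.g., government assignees


for name in ["Stanford University", "Acme Corp.", "Fiat S.p.A.", "US Department of Defense"]:
    print(name, "->", org_type(name))
```

Checking the academic list first means a name containing both kinds of terms is counted as academic; that ordering is also an assumption.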
Similarly, we create indicator variables for patents assigned to US firms and international firms, based on the country of the assignee. The international firm data can also be identified more narrowly by specific country (e.g., Canada) or region (e.g., European Union). Fifty-nine percent of our patent sample is assigned to US domestic firms, while 41 percent is assigned to international firms. After the United States, firms from Asian nations other than China account for 28 percent of patents in the sample. Firms from Canada are assigned 1.2 percent of the patents, and firms from China 0.4 percent.
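The location indicators can be sketched the same way. In this hypothetical Python version, assignee countries arrive as two-letter codes and a small illustrative lookup groups them into the regions the text mentions; the code table is an assumption, not the USPTO's. Because every assignee is either domestic or international, the two indicator means must sum to one, as the .59 and .41 in table 4.4 do.

```python
from collections import Counter

# Illustrative country-to-region lookup (hypothetical; extend as needed).
REGION = {
    "US": "United States",
    "CA": "Canada",
    "CN": "China",
    "JP": "Asia (non-China)",
    "KR": "Asia (non-China)",
    "DE": "European Union",
}


def location_indicators(country):
    """Complementary 0/1 indicators for a US versus an international assignee."""
    is_us = country == "US"
    return {"us_domestic": int(is_us), "international": int(not is_us)}


def region_shares(countries):
    """Fraction of patents whose assignee falls in each region."""
    counts = Counter(REGION.get(c, "Other") for c in countries)
    return {region: k / len(countries) for region, k in counts.items()}


print(location_indicators("CA"))  # {'us_domestic': 0, 'international': 1}
print(region_shares(["US", "US", "JP", "CA"]))
# {'United States': 0.5, 'Asia (non-China)': 0.25, 'Canada': 0.25}
```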
Additionally, the USPTO data include the NBER classification and subclassification for each patent (Hall, Jaffe, and Trajtenberg 2001; Marco, Carley, et al. 2015). These subclassifications provide some granular detail about the application sector for which the patent is intended. We create indicator variables for NBER subclassifications related to chemicals (NBER subclasses 11, 12, 13, 14, 15, and 19), communications (21), computer hardware and software (22), computer peripherals (23), data and storage (24), business software (25), medical fields (31, 32, 33, and 39), electronics fields (41, 42, 43, 44, 45, 46, and 49), automotive fields (53, 54, and 55), mechanical fields (51, 52, and 59), and other fields (all remaining subclasses). The vast majority of these patents (71 percent) are in NBER subclass 22, computer hardware and software. Summary statistics of the distribution of patents across application sectors are provided in table 4.5.
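The subclass-to-sector mapping can be written as a lookup table. This Python sketch uses the groupings listed above. The “All computer science” grouping (subclasses 21 to 25) is an inference, not a stated definition: in table 4.5 its mean of .773 equals the sum of the communications, hardware and software, peripherals, data and storage, and business software rows. The sector means also sum to one, so the categories behave as mutually exclusive and exhaustive.

```python
# NBER subclass groupings as listed in the text.
SECTORS = {
    "Chemicals": {11, 12, 13, 14, 15, 19},
    "Communications": {21},
    "Computer hardware and software": {22},
    "Computer peripherals": {23},
    "Data and storage": {24},
    "Business software": {25},
    "Medical": {31, 32, 33, 39},
    "Electronics": {41, 42, 43, 44, 45, 46, 49},
    "Automotive": {53, 54, 55},
    "Mechanical": {51, 52, 59},
}
# Inferred from table 4.5: .773 = .044 + .710 + .004 + .008 + .007.
COMPUTER_SCIENCE = {21, 22, 23, 24, 25}


def sector_indicators(subclass):
    """0/1 sector indicators for one patent's NBER subclass."""
    flags = {name: int(subclass in codes) for name, codes in SECTORS.items()}
    flags["Other"] = int(not any(flags.values()))  # all remaining subclasses
    flags["All computer science"] = int(subclass in COMPUTER_SCIENCE)
    return flags


# Mean shares reported in table 4.5, used as a consistency check.
TABLE_4_5_MEANS = {
    "Chemicals": 0.007, "Communications": 0.044,
    "Computer hardware and software": 0.710, "Computer peripherals": 0.004,
    "Data and storage": 0.008, "Business software": 0.007, "Medical": 0.020,
    "Electronics": 0.073, "Automotive": 0.023, "Mechanical": 0.075, "Other": 0.029,
}
cs_share = sum(TABLE_4_5_MEANS[name] for name in SECTORS if SECTORS[name] <= COMPUTER_SCIENCE)
print(round(cs_share, 3), round(sum(TABLE_4_5_MEANS.values()), 3))  # 0.773 1.0
```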
Table 4.5
Distribution of patents across application sectors

                                  Mean    Std. dev.
Chemicals                         .007    .08
Communications                    .044    .20
Computer hardware and software    .710    .45
Computer peripherals              .004    .06
Data and storage                  .008    .09
Business software                 .007    .09
All computer science              .773    .42
Medical                           .020    .14
Electronics                       .073    .26
Automotive                        .023    .15
Mechanical                        .075    .26
Other                             .029    .16
Observations                      13,615
4.6 Deep Learning as a GPT: An Exploratory Empirical Analysis
These data allow us to begin examining the claim that the technologies
of deep learning may be the nucleus of a general purpose invention for the
method of invention.
We begin in figures 4.1A and 4.1B with a simple description of the evolution over time of the three main fields identified in the corpus of patents and papers. The first insight is that the overall field of AI has experienced sharp growth since 1990. While there are only a small handful of papers (fewer than one hundred per year) at the beginning of the period, each of the three fields now generates more than one thousand papers per year. At the same time, there is a striking divergence in activity across fields: all three start from a similar base, but deep learning publications rise steadily relative to robotics and symbolic systems, particularly after 2009. Interestingly, at least through the end of 2014, the patenting patterns of the three fields are more similar, with robotics patenting continuing to hold a lead over learning and symbolic systems. However, learning-oriented patenting does seem to accelerate in the last few years of the sample, so there may be a relative shift toward learning that will become visible over time as publication and examination lags work their way through.
Fig. 4.1A Publications by AI field over time
Fig. 4.1B Patents by AI field over time
Within the publication data, there are striking variations across geographies. Figure 4.2A shows the overall growth in learning publications for the United States versus the rest of the world, and figure 4.2B maps the fraction of publications within each geography that are learning related. In the United States, the learning share is far more variable. Prior to 2000, the United States has a roughly equivalent share of learning-related publications, but it then falls significantly behind, catching up again only around 2013. This is consistent with the suggestion in qualitative histories of AI that learning research has had a “faddish” quality in the United States, with the additional insight that the rest of the world (notably Canada) seems to have taken advantage of this inconsistent focus in the United States to develop capabilities and comparative advantage in this field.
Fig. 4.2A Academic institution publication fraction by AI field
Fig. 4.2B Fraction of learning publications by US versus world
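The quantity plotted in figure 4.2B is a within-geography share. A minimal Python sketch, assuming each publication record carries a year, a US versus rest-of-world flag, and one of the three field labels (a hypothetical layout, shown with toy records rather than chapter data):

```python
from collections import defaultdict


def learning_fraction(pubs):
    """Share of publications that are learning-related, by (year, geography).

    `pubs` is an iterable of (year, is_us, field) tuples (hypothetical layout).
    """
    totals = defaultdict(int)
    learning = defaultdict(int)
    for year, is_us, field in pubs:
        key = (year, "US" if is_us else "ROW")
        totals[key] += 1
        if field == "learning":
            learning[key] += 1
    return {key: learning[key] / n for key, n in totals.items()}


toy_pubs = [  # illustrative records, not data from the chapter
    (2005, True, "learning"), (2005, True, "robotics"),
    (2005, False, "learning"), (2005, False, "symbolic"),
    (2005, False, "learning"), (2005, False, "learning"),
]
print(learning_fraction(toy_pubs))  # {(2005, 'US'): 0.5, (2005, 'ROW'): 0.75}
```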
With these broad patterns in mind, we turn to our key empirical exercise: whether, late in the first decade of the twenty-first century, deep learning shifted more toward “application-oriented” research than either robotics or symbolic systems. We begin in figure 4.3 with a simple graph that examines the number of publications over time (across all three fields) in computer science journals versus application-oriented outlets. While there has actually been a stagnation (even a small decline) in the overall number of AI publications in computer science journals, there has been a dramatic increase in the number of AI-related publications in application-oriented outlets. By the end of 2015, we estimate that nearly two-thirds of all publications in AI were in fields beyond computer science.
In figure 4.4 we then look at this division by field. Several patterns are worthy of note. First, as earlier, we can see the relative growth through 2009 of publications in learning versus the two other fields. Second, consistent with more qualitative accounts of the fields, we see the stagnation of symbolic systems research relative to robotics and learning. Third, after 2009 there is a significant increase in application publications in both robotics and learning, but the learning boost is both steeper and more long-lived. Over the course of just seven years, learning-oriented application publications more than double in number, and now represent just under 50 percent of all AI publications.6
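Footnote 6 describes a simple linear extrapolation of the partial 2015 data. Assuming publications accrue at a constant monthly rate, nine observed months scale to a full year by 12/9 = 4/3; a minimal Python version, using exact fractions so the multiplier is visible:

```python
from fractions import Fraction


def annualize(count, months_observed=9):
    """Scale a partial-year count linearly to twelve months (assumes a constant monthly rate)."""
    return count * Fraction(12, months_observed)


print(annualize(900))   # 1200
print(Fraction(12, 9))  # 4/3
```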
6. The precise number of publications for 2015 is estimated from the experience of the first nine months (the Web of Science data run through September 30, 2015). We apply a linear multiplier for the remaining three months (i.e., multiplying each category's count by 4/3).
These patterns are, if anything, even more striking if one disaggregates them by the geographic origin of the publication. In figure 4.5, we chart rates of publication in computer science versus applications for the United States as compared to the rest of the world. The striking upward swing in AI application papers that begins in 2009 turns out to be overwhelmingly driven by publications from outside the United States, though US researchers begin to catch up at an accelerating pace toward the final few years of the sample.
Fig. 4.3 Publications in computer science versus application journals
Fig. 4.4 Publications in computer science versus application journals by AI field
Fig. 4.5 Learning publications in computer science versus applications by United States versus ROW
Finally, we look at how publications have varied across application sectors over time. In table 4.6, we examine the number of publications by application field in each of the three areas of AI across two three-year cohorts (2004–2006 and 2013–2015). There are a number of patterns of interest. First and most important, in a range of application fields including medicine, radiology, and economics, there is a large increase in learning-oriented publications relative to robotics and symbolic systems. A number of other sectors, including neuroscience and biology, see large increases in both learning-oriented research and other AI fields. There are also some more basic fields, such as mathematics, that have experienced a relative decline in publications (indeed, learning-oriented publications in mathematics experienced a small absolute decline, a striking difference relative to most other fields in the sample). Overall, though it would be useful to identify more precisely the type of research that is being conducted and what is happening at the level of particular subfields, these results are consistent with our broader hypothesis that, alongside the overall growth of AI, learning-oriented research may represent a general purpose technology that is now beginning to be exploited far more systematically across a wide range of application sectors. (See table 4.7.)
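One way to make “a large increase in learning-oriented publications relative to robotics and symbolic systems” concrete is to compare growth ratios between the two cohorts. The Python sketch below divides learning's growth ratio from 2004–2006 to 2013–2015 by the pooled growth ratio of the other two fields. This metric is an illustrative choice, not one stated in the chapter, and the counts are hypothetical, not values from table 4.6.

```python
def relative_growth(counts, field="learning"):
    """Growth ratio of `field` divided by the pooled growth ratio of all other fields.

    `counts` maps each AI field to (early_cohort_count, late_cohort_count).
    """
    early, late = counts[field]
    other_early = sum(early_n for f, (early_n, _) in counts.items() if f != field)
    other_late = sum(late_n for f, (_, late_n) in counts.items() if f != field)
    return (late / early) / (other_late / other_early)


# Hypothetical counts for one application sector (2004-2006, 2013-2015).
sector = {"learning": (10, 40), "robotics": (20, 30), "symbolic": (20, 10)}
print(relative_growth(sector))  # 4.0
```

A value above 1 means learning-oriented publications in that sector grew faster than the other two fields combined; a value below 1, as for mathematics in the text, would indicate relative decline.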
Together, these preliminary findings provide some direct empirical evidence for at least one of our hypotheses: learning-oriented AI seems to have some of the signature hallmarks of a general purpose technology. Bibliometric indicators of innovation show that it is developing rapidly and is being applied in many sectors, and these application sectors themselves include some of the most technologically dynamic parts of the economy.
Table 4.6
Publications across sectors by AI field, 2004–2006 versus 2013–2015
Sectors: Comp. Sci., Telecom., Radiology, Energy, Neuro., Materials, Math, Chemistry, Medicine, Physics, Economics