IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 43, NO. 6, DECEMBER 1996
Single Event Upset at Ground Level
Eugene Normand, Member, IEEE
Boeing Defense & Space Group, Seattle, WA 98124-2499
Abstract
Ground level upsets have been observed in computer systems containing large amounts of random access memory (RAM). Atmospheric neutrons are most likely the major cause of the upsets based on measured data using the Weapons Neutron Research (WNR) neutron beam.

I. INTRODUCTION
Several years after single event upset (SEU) was discovered in space in 1975, J. Ziegler [1] noted the potential for microelectronics on the ground to be susceptible to SEU from cosmic ray secondaries, primarily neutrons. Ziegler's work was prompted by the work of T. May and M. Woods [2] in uncovering errors in RAM chips due to upsets caused by the alpha particles released by U and Th contaminants within the chip packaging material. The alpha problem was regarded seriously and chip vendors took specific actions to reduce it to tolerable levels, mainly by reducing the alpha particle flux emitted by packaging and processing materials to generally < 0.01 α/cm²-hr [3].

Unfortunately, the potential for cosmic rays causing SEU on the ground received little attention, and has received almost no public recognition on the part of chip vendors. Very recently, IBM revealed that beginning in 1979, they undertook a very large proprietary effort to understand the phenomenon of upsets at ground level. This 15-year effort involved many different disciplines and activities: field testing of memories, accelerated testing using cyclotron beams, detailed model development on all levels, environmental monitoring and coordination with device designers [4]. In contrast to the lack of recognition of the key role played by cosmic radiation for ground level upsets, the importance of this mechanism was recognized by people dealing with avionics, i.e., electronics in aircraft, relatively early in the open literature. Avionics SEU by the atmospheric neutrons was first predicted in 1984 [5] and later rigorously demonstrated to occur in flight in 1992 [6].

II. GROUND LEVEL NEUTRON FLUX
The neutron environment at ground level can be defined in terms of the models for the atmospheric neutron flux at higher altitudes which are mainly based on neutrons in the energy range of 1<E<10 MeV [7]. A number of studies have [...] of the energy spectrum of the [...] doesn't change with altitude or [...] its absolute magnitude does vary with the location and altitude around the earth [...]. Limited data from a sophisticated ground-based detector system made at 100, 5000 and 10,000 feet above sea level indicate that the 10-100 MeV flux falls off approximately linearly with altitude [8].

Very few measurements of the neutron spectrum at ground level have been made, especially over the entire energy range. One set of the most recent terrestrial spectral measurements, made in Japan [9], was normalized to obtain the neutron spectrum expected in the US, based on scaling airplane spectral measurements made over Japan [9] and [...]. These spectra show that the ground spectrum is roughly 1/300 of that at 40,000 ft.

III. SINGLE EVENT UPSETS AT GROUND LEVEL
There is considerable evidence of upsets on the ground, but it has been largely kept proprietary or else it has been in the hands of computer systems engineers who do not understand its meaning or implications. In the following paragraphs we will present various examples of this kind of data, including reference to the very recently revealed vast storehouse of data obtained by IBM over a 15-year period via a well-coordinated proprietary effort. In addition, five specific examples will be cited, one from a very large computer system that was taken off line for testing, two from the error log/maintenance history of a collection of large computers, one from a biomedical device utilizing SRAMs that has been implanted in hundreds of patients and one from the system soft error FIT rate (failures in time, i.e., failures per 10^9 device hours) testing performed by RAM vendors.

In addition, we believe that there are extensive collections of other data that provide evidence of these upsets, e.g. in the error and/or maintenance logs of large computer systems. In particular, the error logs of computer systems located in high altitude cities, such as in the Rocky Mountain region, are expected to reveal many such upsets. Although at present such records have not yet been made public, we hope that with the publication of this work, other SEU workers will work cooperatively with computer systems people within their organizations to uncover and reveal the large compilations of errors that exist. These errors have been detected, corrected and logged by the dedicated software and hardware within those computer systems, so the computer systems engineers are satisfied that their systems are well protected. However, in addition, the EDAC (error detection and correction) systems that work so effectively in protecting the large computer systems can also reveal the mystery of those upsets to SEU researchers who understand the mechanisms causing the errors.
0018-9499/96$05.00 © 1996 IEEE
III.A EARLY IBM STUDY
An early study showed that when a large number of memories was monitored for single event upset at three locations of varying altitude (5000 feet, sea level and in a mine), the upset rate decreased with decreasing elevation, indicating that atmospheric neutrons are the likely cause [11]. This study has been recently published in a much updated format [12, 13] that carefully separates out the upsets caused by alpha particles emitted by trace elements in the device package from those caused by the atmospheric neutrons. Using the atmospheric upset rate component at three locations within the US, the variation with altitude is the same as the atmospheric neutron flux variation with altitude [12, 13]. The very recently issued special edition of the IBM Journal of Research and Development (entirely devoted to the subject of ground level upsets) has a great deal of additional information on the many similar proprietary tests that IBM performed. The results of most of those tests are, however, presented in a relative or normalized format. In those instances in which we can infer absolute error rates, that data will be utilized (see discussion of FIT rates and Table 2 below).
III.B UPSET RATE IN FERMILAB COMPUTER SYSTEM
The computer system ACPMAPS at Fermilab is a very large system of individual computers, which when joined together, contains about 160 Gbits of DRAM memory [14]. The ACPMAPS is housed in a computer building far removed from the very high energy Fermilab accelerators. It contains 156 Gbits of 4 Mbit fast page-mode DRAM, guarded by parity but not protected by EDAC. In production it consistently experiences single bit errors on an almost daily basis. When the entire system was taken off-line for testing, it routinely gave an upset rate of 2.5 upset/day or 7E-13 upset/bit-hr.

It did not appear that these errors were being caused by alphas in the packaging material. First, the rate observed was 5-10 times larger than that which could be inferred from the results of the manufacturers' non-accelerated failure tests, and more than 500 times larger than the FIT rate based on extrapolating from accelerated failure tests with an alpha source. Second, the chip vendor indicated that, based on lab tests with alpha sources, almost all alpha-induced upsets in these DRAMs occur when a "page miss" (a change in the row address) causes 4K bits of data to move from the DRAM cells to a small on-chip SRAM page. The window of vulnerability occurs when the long lines to the DRAM cells are active, so the error rate should be proportional to the rate of page misses (plus refreshes). Contrary to this, Fermilab found that the 2.5 upset/day rate was independent of the rate of page misses, which was varied by over a factor of ten. Finally, as May and Woods showed [2], the alpha induced upset rate is extremely sensitive to critical charge, Qc, the charge that has to be deposited in a device to "flip" a logic state, e.g., 0→1 [1] (a factor of > 100 reduction in the rate for a doubling of the Qc value), whereas with neutrons and the recoils they produce, it is much more gradual. The Fermilab system contains DRAMs from two different manufacturers (and therefore, almost certainly, with different Qc values) and yet these showed no significant difference in upset rate. Other large computer systems with different DRAMs, including workstation clustered "computer farms" at Fermilab, also exhibit about the same upset/bit-hour rate as observed for ACPMAPS. The observed upset rate in the DRAMs of the ACPMAPS is much more consistent with the SEUs being caused by the atmospheric neutrons rather than packaging material alphas as will be shown below.

III.C UPSET RATES IN LARGE COMPUTER SYSTEMS
An increasing number of off-the-shelf computers, in the workstation and larger classes, are being designed to incorporate EDAC to protect the RAM from errors. One such model is the Nite Hawk computer. Each Nite Hawk has approximately 1 Gbit of DRAM memory, apportioned between global and local memory. Many of these computers have been used in a local systems integration laboratory, where the computer vendor also has the job of performing monthly maintenance on the machines. An informal assessment by the computer maintenance people is that on the average, each machine shows one upset (parity error) per month, with some having two errors and some none. Using the average value of one error per month (defined as 624 hours), this is equivalent to a ground level upset rate of 1.6×10^-12 upset/bit-hr.

A more accurate measure of the error rate was obtained based on a small number of errors from the error logs, acquired over a few-month period of time. The SYSERR logs of five Nite Hawk 5800 computers were checked; four are simulation computers and the fifth is a development computer. The logs for the four simulation computers covered about 4 months, while that for the developmental computer covered seven months. The four simulation computers experienced 0, 1, 2 and 3 errors respectively; two of the six errors were in global memory and four in local memory. The amount of total memory available in these four computers varies. All have 64 Mbytes of global memory, two have 160 Mbytes of local memory and two have 256 Mbytes of local memory. Thus two machines have available 1.8 Gbits and two have 2.6 Gbits of memory. At present on average the memory usage on the simulation computers is estimated to be approximately 50%. This leads to an upset rate of 2.5E-12 upset/bit-hr.

The developmental computer appears to be run on a more consistent basis. Its error log covered a time period of ~30 weeks. This computer has 64 Mbytes in global memory and 64 Mbytes in local memory for a total of 1 Gbit. For this machine an 80% usage factor for the total memory was estimated, and over that time period 2 errors (one in global memory and one in local memory) were encountered. The error rate for the developmental computer is thus 1.7E-12 upset/bit-hr. A more representative error rate was obtained by averaging the error rates of all five of the Nite Hawk computers and this works out to 2.3E-12 upset/(bit-hr).
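The per-bit rates quoted for the Fermilab system and for the informal Nite Hawk estimate follow from dividing the number of upsets by the product of observation time and memory size. The sketch below reproduces that arithmetic; the function name and the choice of 1 Gbit = 1e9 bits are illustrative assumptions, not taken from the paper.

```python
def upset_rate_per_bit_hour(upsets, hours, bits, usage=1.0):
    """Upsets per bit-hour; `usage` is the fraction of memory actually in use."""
    return upsets / (hours * bits * usage)


GBIT = 1e9  # assumption: decimal gigabit

# Fermilab ACPMAPS: 2.5 upsets per day in 156 Gbit of DRAM
fermilab = upset_rate_per_bit_hour(2.5, 24.0, 156 * GBIT)

# Nite Hawk informal estimate: 1 upset per 624-hour month in about 1 Gbit
nite_hawk = upset_rate_per_bit_hour(1, 624.0, 1 * GBIT)

print(f"Fermilab:  {fermilab:.1e} upset/bit-hr")
print(f"Nite Hawk: {nite_hawk:.1e} upset/bit-hr")
```

Under these assumptions the two results land close to the rounded values of 7E-13 and 1.6E-12 upset/bit-hr given in the text.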
A second independent source of upset data is the Cray YMP-8, located about ten miles away from the Nite Hawk computers. The main memory of the YMP-8 consists of 32 modules, each with 256 Mbits of SRAM, for a total of ~8.2 Gbits of SRAM. Each module comprises one thousand 256K x 1 SRAMs. The system is protected by a standard EDAC system known as SECDED, single error correct, double error detect. Upsets are found only during the read operation. SECDED is implemented by having Hamming code generated on every write operation. On every read operation the 72 bit word (64 bits comprising the word, 8 extra Hamming code bits) is again checked by the error detection circuit. If a single bit is off, the bit is corrected and the error logged; if a double bit error is found, no correction is attempted, but it is logged and flagged and the entire module is replaced. The new Cray Triton T-94 system uses a double error correct, multiple error detect (DECMED) system employing 12 check bits so that double bit errors can be corrected. It uses 2Mx2 SRAMs to comprise the memory in its modules in very compact memory stacks.
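The SECDED behaviour described above (check bits generated on write, single errors corrected and double errors only flagged on read) can be illustrated with a textbook extended Hamming code. The sketch below is not the Cray implementation, whose bit layout is not given here; the function names and the placement of the overall parity bit at index 0 are assumptions made for illustration. For 64 data bits the construction yields 7 Hamming check bits plus 1 overall parity bit, i.e. a 72 bit word.

```python
def secded_encode(data_bits):
    """Encode a list of 0/1 data bits as an extended Hamming (SECDED) codeword."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with 2**r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)  # index 0: overall parity; indices 1..n: Hamming positions
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two, so this is a data position
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity  # makes the parity over all positions with bit p set even
    code[0] = sum(code) % 2  # makes the parity of the whole word even
    return code


def secded_decode(code):
    """Return (status, word); status is 'ok', 'corrected' or 'double'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        return "ok", code
    if overall == 1 and syndrome <= n:
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself flipped
        return "corrected", code
    return "double", code  # detected but not correctable


word = [(i * 5 + 1) % 3 % 2 for i in range(64)]  # arbitrary 64 bit test pattern
stored = secded_encode(word)

single = list(stored)
single[37] ^= 1  # one upset
status_single, repaired = secded_decode(single)

double = list(stored)
double[5] ^= 1   # two upsets in the same word
double[60] ^= 1
status_double, _ = secded_decode(double)

print(len(stored), status_single, repaired == stored, status_double)
```

A logged "corrected" event in such a scheme corresponds to one of the single bit errors counted in the error logs, while a "double" event corresponds to the flagged, uncorrected case.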
We were able to gain access to the system error logs for this Cray YMP-8 covering a period of 22 months (May 1992 - February 1994). Over that time period, 30 out of the 32 modules experienced one or more parity errors. During the first 16 months the parity errors were logged and date stamped, but this changed in August 1993, after which the errors were logged but without a date stamp. To extract individual upset data required careful interpretation of the error logs. This was made easier through the assistance of the systems engineer, but it also required several assumptions to be made in order to interpret the data in a consistent manner. Two examples of errors that were not counted as random errors are illustrative: 1) errors in "flaky" RAM chips (defined as a chip that had the same error at the same location on 2 or more days over the 22 months) and 2) the large number of single bit errors that occurred on Oct. 11, 1992 in four modules during preventative maintenance (PM) because the PM-induced errors were registered in the modules in which the EDAC diagnostic wasn't turned off.

The parity error data was converted into a distribution of parity errors per module. This distribution function is shown in Fig. 1 and is normalized to the errors occurring on an annual basis. Since it is set up as a cumulative distribution, we see that 50% of the modules will have 1.8-2 or more errors per module-year. This is consistent with the mean number of errors which is 2.3 error per module-yr. The figure also shows the theoretical cumulative probability function for a Poisson distribution based on a mean rate of 2.3 error/module-yr. The generally good agreement between the Poisson distribution and the actual upset data indicates that the source of most of the errors is random, such as SEUs produced by the atmospheric neutrons. It also indicates that the high error rates (> 8 error/module-yr) in two of the modules may be due to more than random error.

The distribution also shows that the 10% most error-prone modules will be experiencing at least 6 error/module-yr. The utilization factor of the main memory is about 80% which has to be used to obtain a meaningful bit error rate that can be compared to the rate in other systems. Using the mean upset rate of 2.3 error/module-yr (133 total upsets), this converts to a bit error rate of 1.3E-12 upset/bit-hr.

In addition to the main memory, the Cray also has a secondary bulk storage memory system called the Solid State Device (SSD). The SSD consists of a total of 32 Gbits of DRAM, in this case all in the form of 4Mx1 [...] too is protected by EDAC. The error logs from the SSD were studied for the same 22-month period of time and it was found that the average number of errors was 2.7/month for a total of 60 errors. The utilization factor for the SSD is lower than for the main memory, with a value of 20% being a rough estimate. Therefore the 2.7 error/month converts to a bit error rate of 6E-13 upset/bit-hr. The DRAMs also exhibited double bit errors, a total of 17 or ~28% of all the errors. However 10 of the 17 double bit errors occurred during only two of the months. The number of single bit errors for those two months was high but similar to the number of single bit errors during several other months, and the number of errors in the main memory during those two months was about average. Thus, although it is unclear why so many double bit errors occurred during those two months, it appears that the memory usage in the SSD may have been much higher than usual during those periods.

Figure 1 The Cumulative Distribution Function for Ground Level Errors (Error/Module-Year) in the Main Memory of the CRAY YMP-8 (horizontal axis: Errors/Module-Yr, 0-10)

III.D UPSET RATES FROM FIT RATE TESTS BY RAM VENDORS
RAM manufacturers typically perform two types of quality control tests at their facilities in which they record the bit error rate, the rate being given in FIT units: a) system soft error rate (SSER), by monitoring 1000 parts for 1000 hours, and b) accelerated SER (ASER) obtained by using a radiation source. Historically, RAM vendors have used alpha sources to conduct their accelerated tests. The use of alpha sources for these tests goes back to the early problem of alpha contaminants in the chip packaging causing upsets [2], and it was standardized in terms of a test procedure [16]. However, use of the alpha source does not provide an accurate indication of the ground level upset rate as Lage [17] directly showed by comparing SSER and alpha-source ASER rates. Three other types of ASER testing have been proposed and used: a) proton beams to simulate neutron-induced upset (IBM, [18]), b) the WNR neutron spallation source at Los Alamos (TI [19] and Boeing [20]) and c) a 14 MeV neutron generator (used in conjunction with a calculational method by Boeing [21]).
Examination of SSER FIT rates provides an excellent method of inferring the ground level upset rate. Unfortunately, few such measurements have been published. Those that are available are listed in Table 1 which contains test data conducted by Motorola [16] and by IBM [17]. The upset rates are presented in terms of the FIT rate as well as the per bit rate. All of the Motorola data, which are for various types of Motorola SRAMs, are from SSER tests, and this data has error bars to indicate the poor statistics involved (typically very few errors, e.g., < 5, are measured). The IBM data includes measurements from both SSER (field) and ASER (proton beam) tests on 1M and 4M DRAMs. Two factors are to be noted with the IBM upset data: each of the averaged FIT rates for a DRAM is from a different vendor, and there are no error bars indicative of the upset statistics. We note a wide variation in the upset rate among the various DRAM devices and significant differences between the SSER and ASER results which is not typical of the measurements in many of their other tests. Nevertheless, taken as a composite, the ground level RAM upset rates listed in Table 1 are relatively consistent, mainly in the range of 1-2 E-12 upset/bit-hr, and are therefore similar to the ground level upset rates measured in the large computer systems discussed above.
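Because FIT rates are quoted per device and the comparisons in this section are per bit, a conversion between the two is needed, and the same arithmetic shows why SSER tests of 1000 parts for 1000 hours record so few errors. The sketch below assumes 1 Mbit = 1e6 bits; the helper names and the example device are hypothetical.

```python
def fit_to_rate_per_bit_hour(fit, bits):
    """Convert a device FIT rate (failures per 1e9 device-hours) to upsets per bit-hour."""
    return fit * 1e-9 / bits


def rate_per_bit_hour_to_fit(rate, bits):
    """Convert upsets per bit-hour to a device FIT rate."""
    return rate * bits * 1e9


def expected_errors(fit, parts, hours):
    """Mean number of errors expected in a field test of `parts` devices for `hours` each."""
    return fit * parts * hours / 1e9


MBIT = 1e6  # assumption: decimal megabit

# A hypothetical 1 Mbit RAM at the two ends of the 1-2 E-12 upset/bit-hr range
fit_low = rate_per_bit_hour_to_fit(1e-12, 1 * MBIT)
fit_high = rate_per_bit_hour_to_fit(2e-12, 1 * MBIT)

# Errors expected in an SSER test (1000 parts, 1000 hours) at the higher FIT rate
sser_errors = expected_errors(fit_high, parts=1000, hours=1000)

print(f"{fit_low:.0f} to {fit_high:.0f} FIT, about {sser_errors:.1f} errors per SSER test")
```

With only a couple of errors expected per test at these rates, the wide error bars on individual SSER measurements are unsurprising.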
IV. ANALYSIS
In summary, five different sources of ground level upset rates in RAM devices have been discussed. These are tabulated in Table 2. The upset rates agree with one another within less than an order of magnitude, and a rate in the range of 1-2 E-12 upset/bit-hr appears to be about average. Thus the simple average value of 1.5 E-12 upset/bit-hr represents the entire range of rates, 0.3-2.3 E-12 upset/bit-hr, for both DRAMs and SRAMs, from the diverse sources of data.
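As a cross-check on the range quoted above, the Cray YMP-8 log totals from Section III.C can be converted to per-bit rates with the stated utilization factors. This is a sketch under two assumptions that are not in the paper: a month is taken as 730 hours and 1 Gbit as 1e9 bits. The variable names are illustrative.

```python
HOURS_PER_MONTH = 730.0  # assumption: average calendar month
GBIT = 1e9               # assumption: decimal gigabit


def bit_error_rate(errors, months, gbits, utilization):
    """Upsets per bit-hour, counting only the fraction of memory in use."""
    bit_hours = months * HOURS_PER_MONTH * gbits * GBIT * utilization
    return errors / bit_hours


# Main memory: 133 upsets in 22 months, ~8.2 Gbit of SRAM, ~80% utilization
main_memory = bit_error_rate(133, 22, 8.2, 0.80)

# SSD: 60 errors in 22 months, 32 Gbit of DRAM, ~20% utilization
ssd = bit_error_rate(60, 22, 32, 0.20)

for name, rate in (("main memory", main_memory), ("SSD", ssd)):
    in_range = 0.3e-12 <= rate <= 2.3e-12
    print(f"{name}: {rate:.2e} upset/bit-hr, within 0.3-2.3 E-12: {in_range}")
```

Both results fall inside the 0.3-2.3 E-12 upset/bit-hr range and round to the 1.3E-12 and 6E-13 values reported for the two memories.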
Our hypothesis is that the great majority of these upsets are caused by the atmospheric neutrons, i.e., the cosmic ray secondaries at ground level. To demonstrate this, we will tabulate SEU measurements made on both SRAMs and DRAMs that were tested in the WNR neutron beam at the Los Alamos National Laboratory. As we have previously shown [22], the WNR neutron spectrum is essentially identical to that of the atmospheric neutrons. One hour in the WNR beam is equivalent to 2-3 E5 hours (beam intensity varies from year to year) at 40,000 ft, or alternatively, 6-9 E7 hours at ground level (the neutron flux at ground level is taken as 1/300 of that at 40,000 ft).
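The beam-time equivalence stated above is a two-step scaling: beam hours to hours at 40,000 ft, then a factor of 300 to ground level. The short sketch below works through it and also expresses the result in years; the constant and function names are illustrative, and 8766 hours per year is an assumption.

```python
ALTITUDE_TO_GROUND_FACTOR = 300  # flux at 40,000 ft taken as 300 times the ground level flux
HOURS_PER_YEAR = 8766.0          # assumption: 365.25 days


def equivalent_exposure(beam_hours, altitude_hours_per_beam_hour):
    """Return (hours at 40,000 ft, hours at ground level) matching the beam exposure."""
    at_altitude = beam_hours * altitude_hours_per_beam_hour
    at_ground = at_altitude * ALTITUDE_TO_GROUND_FACTOR
    return at_altitude, at_ground


low = equivalent_exposure(1.0, 2e5)   # lower end of the quoted beam intensity
high = equivalent_exposure(1.0, 3e5)  # upper end of the quoted beam intensity

for label, (alt_h, ground_h) in (("low", low), ("high", high)):
    print(f"{label}: {alt_h:.0e} h at 40,000 ft, {ground_h:.0e} h "
          f"(~{ground_h / HOURS_PER_YEAR:.0f} years) at ground level")
```

One beam hour therefore stands in for several thousand years of ground level exposure of a single device, which is what makes accelerated testing practical.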
Table 1 Ground Level Soft Error Rates Measured by RAM Vendors
10,300
IBM 1M
D/A
IBM 1M
D/F
Mot 256K S/F
2*
2"
3
1M
S/F
2
Mot 4M
S/F
4
Mot
3300 2500-
4100
325 230-420
500 4505601
2070' 133028001
5750 4500-
3.3E-12
3.1E-13 2E-12
2.1E-12
1.5E-12
8900$
† D=DRAM, S=SRAM, A indicates accelerated testing using proton beam, F indicates field testing (1000 parts, 1000 hrs)
* In this case each device type was from a different vendor
$ Each of these individually measured FIT rates had an uncertainty of about a factor of 4 (2-0.5) based on the small number of upsets and the probabilistic treatment of its confidence level
Table 3 contains the WNR SEU cross section measurements on three DRAMs and six SRAMs, one of which has been previously published [22]. The WNR SEU response of the six SRAMs on a per bit basis shows a fairly wide variation. However, when the Cypress parts, the only RAMs that exhibited multiple bit upset (a few percent of the single errors), are removed, the variation narrows significantly. It narrows even further if the only 4M SRAM, the MCM6246, which is notably less sensitive, is also removed. Among the three 4M DRAMs, there is also some variation, with the Oki part being notably less sensitive than the other two. None of the nine RAMs exhibited neutron-induced single event latchup, as was to be expected [20].
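The WNR-scaled ground level rates in Table 3 are the measured cross sections multiplied by the ground level neutron flux above 10 MeV, 19.3 n/cm²-hr. The sketch below repeats that scaling for the nine parts; the pairing of part names with cross sections follows the order of the table columns and should be read as an assumption, as should the variable names.

```python
GROUND_FLUX = 19.3  # n/cm^2-hr above 10 MeV at ground level

# Measured WNR SEU cross sections in cm^2/bit (Table 3)
cross_sections = {
    "TC514400-80": 1.2e-13,
    "MSM514400-80": 2.2e-14,
    "TMS44100": 9.3e-14,
    "IDT71256": 6.5e-14,
    "HM65656": 1.9e-13,
    "MCM6206": 1.4e-13,
    "MCM6246": 1.25e-14,
    "CY7C195": 5.7e-13,
    "CY7C199": 5.2e-13,
}

# Ground level SEU rate in upset/bit-hr = cross section x flux
scaled_rates = {part: sigma * GROUND_FLUX for part, sigma in cross_sections.items()}

mean_sigma = sum(cross_sections.values()) / len(cross_sections)
mean_rate = sum(scaled_rates.values()) / len(scaled_rates)

for part, rate in scaled_rates.items():
    print(f"{part:14s} {rate:.1e} upset/bit-hr")
print(f"simple average: {mean_sigma:.2e} cm^2/bit, {mean_rate:.2e} upset/bit-hr")
```

The simple averages come out close to the 1.93E-13 cm²/bit and 3.72E-12 upset/bit-hr entries in the table, and dropping the two Cypress parts pulls the average rate down substantially.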
Column 4 of Table 3 contains the WNR SEU cross section (upsets/fluence > 10 MeV), column 5 the scaled SEU rate at ground level (based on a flux of 19.3 n/cm²-hr on the ground) and column 6 the ground level SEU rate calculated using the burst generation rate (BGR) method [20]. The scaled neutron-induced SEU rates are in the same range of 0.5-2 E-12 upset/bit-hr as those actually measured on the ground as tabulated in Table 2. Thus in making the comparison between the measured bit error rates from computer error logs, field SER data, etc., summarized in Table 2, these error rates directly correlate with the neutron-induced upset rate tabulated in Table 3. A direct comparison of the field upset rates and the rates scaled from [...] is shown in Table 4.
As indicated, use of the WNR beam to measure RAM SEU rates is one of several accelerated SER methods, probably the best one because this neutron beam is so similar to the actual atmospheric neutron spectrum. The IBM method uses a beam of 150 MeV protons to simulate the atmospheric neutrons [18], and they apply an empirically derived factor [...] 17 to convert the measured SEU [...] the ground level SEU rate (factor varies with [...]). [...] is very similar to the use of the [...] Table 3, in which the conversion [...] hourly neutron flux at ground level > 10 MeV, 19.3 n/cm²-hr, that converts the WNR SEU cross section [...] level SEU rate. Because of the limited [...] [20], we use another method which [...] cross section data via the BGR [...] as an efficient alternative to using [...]. [...] effectiveness of this approach has been [...] 21 for a few RAMs. By comparing columns 4 and 5 of Table 3 we provide further evidence of the effectiveness of the method. Nevertheless, this BGR [...] augmented by measuring the SEU cross section [...] neutrons to normalize the BGR parameters, [...] to the atmospheric neutron spectrum.

Having demonstrated that the atmospheric neutrons are primarily responsible for the ground level upsets, there are a number of impacts that this cause-effect relationship has that extend beyond the SEU community. Some of these impacts are summarized in Table 5 and include: a) improving the reliability of large computer systems, b) applying error mitigation techniques to RAMs used in biomedical, commercial and industrial products, c) imposing realistic reliability standards on microelectronics to encourage the development and use of low FIT-rate chips, and d) utilizing the appropriate and available accelerated SER techniques/tests to measure ground level FIT rates.

The diversity of applications in item b) is extremely broad. Biomedical devices tend to be expensive, but due to the urgency of health considerations, additional costs for EDAC or SEU-immune chips can be readily absorbed and passed on. Industrial products might focus on process control applications for which some additional costs might also be warranted to protect against RAM errors. In contrast, commercial products tend to be highly cost competitive, and so the extra costs of error mitigation techniques might be hardest to justify. However, in some instances, such as those related to financial transactions and “smart” cards, or the use of microelectronics-based automobile systems, the vital importance of dealing with such ground level errors, which are to be expected if no mitigation techniques are used, may be much more apparent. Each product may use << 1 Mbit of RAM, but because millions of units are expected to be sold, the total number of bit-hours in operation may still be large.

V. CONCLUSIONS
Thousands of single event upsets are occurring every year on the ground, yet few in the SEU community are aware of them. These upsets have been recorded mainly in large computer systems equipped with EDAC to detect, correct and log in these errors. We have examined a few such error logs from large computers, as well as other sources of ground level upset data. All of this data is consistent with the atmospheric neutrons being the main cause of the upsets. It is also the same conclusion reached years ago by the IBM team that investigated this topic privately [4]. We demonstrated the correlation by comparison with the neutron-induced SEU rate
System and Location
Basis for Ground Rate
Table 3 [...] Cross Sections in WNR Beam

RAM            Vendor    Size/Type*  Meas'd WNR SEU   Gr'nd Level SEU    Calculated Gr'nd
                                     X-section,       Rate, Up/bit-hr,   SEU Rate, Up/bit-hr,
                                     cm²/bit          WNR-Scaled         BGR Method
TC514400-80    Toshiba   4M/D        1.2E-13          2.3E-12            2.1E-12
MSM514400-80   Oki       4M/D        2.2E-14          4.3E-13            N/A
TMS44100 [22]  TI        4M/D        9.3E-14          1.8E-12            2.3E-12
IDT71256       IDT       256K/S      6.5E-14          1.3E-12            2.3E-12
HM65656        Matra     256K/S      1.9E-13          3.7E-12            1.2E-12
MCM6206        Motorola  256K/S      1.4E-13          2.7E-12            7E-13
MCM6246        Motorola  4M/S        1.25E-14         2.4E-13            3.4E-13
CY7C195 ¶      Cypress   256K/S      5.7E-13          1.1E-11            8.4E-12
CY7C199 ¶      Cypress   256K/S      5.2E-13          1E-11              "
Simple Average                       1.93E-13         3.72E-12           2.48E-12
                                     (for 9 RAMs)                        (7 unique RAMs)

Weibull Fit Heavy Ion Parameters Used in BGR Calculation % (in the order listed in the column):
4.7E-7, 0.85, 18.3, 1.13 #
3.8E-7, 1.98, 11.46, 2.24 $
2.7E-6, 2.64, 3005, 0.636
5.3E-8, 1.1, 5.45, 6.88
[?].98E-6, 1.02, 33.7, 1.08

% Weibull parameters (see [...]) are in following order: σ0 (per bit), L0, W and S; BGR method (see [20]) assumed t = 2 µm and C = 0.5 in all cases; Weibull parameters are from following related RAMs: # TC514100Z-10, $ HM65656 engineering samples [23], @ MCM6226, and † CY7C185
¶ These parts exhibited multiple bit upsets during the WNR testing.
based on measurements with the WNR neutron beam. We have not focused on any one specific DRAM or SRAM, but rather on a representative sampling of RAMs to show that the correlation applies to both SRAMs and DRAMs, and applies fairly well regardless of which commercially available RAM is used (however this is not true for those RAMs specifically designed to have a low SEU sensitivity, e.g. the IBM LUNA-C and -E DRAMs [26]).

An upset rate in the range of 1-2 E-12 upset/bit-hr was shown to be representative of the rate that most SRAMs and DRAMs in actual field applications are experiencing, although there were a few with lower rates (see Table 2). The upset rate of 1-2 E-12 upset/bit-hr leads to FIT rates of 1000-2000 FITs for a 1 Mbit RAM, which is just at the limit of 2000 FITs for soft errors given in the STACK specification for integrated circuits [25]. Thus we would expect that most RAMs of larger memory capacity than 1M (e.g., 4M, 16M, etc.) would not meet the STACK limit in actual field applications. RAM tests using an alpha source may yield a rate lower than this limit, but this study, and that by Lage [17], show that this is an erroneous test. The atmospheric neutrons are the cause of most of the upsets on the ground, and the alpha particles do not simulate the neutron interactions with the RAMs, they only simulate alphas emitted from the chip package.

It should be noted that gaining access to error logs may not always be very easy. There is the case of one supercomputer manufacturer who, through a very stringent purchase agreement, precludes any owner of the supercomputer from divulging error information about that computer system. In the case of workstations, which today have on the order of 1 Gbit or more of DRAM, EDAC (ECC, error correcting code, in their nomenclature) is still incorporated, but the correctable errors are no longer logged. The exact reason for eliminating the error logging is unclear (100% confidence in the ECC, increased speed, lower cost, etc.), but it will have an impact. On some of the older workstations that had much smaller memory capacities, the errors were in fact logged, but because of the smaller memory size, the errors occurred much less frequently. Systems administrators familiar with these older workstations can recall seeing the occurrence of single bit errors. Based on the data we presented, the 1 Gbit workstations should experience 1-2 errors per month, depending on how much of the memory is being used on a daily basis. However, since memory requirements have expanded so dramatically over the last few years and are still continuing to do so, the number of errors is likely to continue to increase at a rapid rate. However, without the error logs, there will be no way to track this expected trend in increasing errors.

It has been suggested that it is the thermal neutron portion (E ~ 0.025 eV) of the atmospheric neutron spectrum, rather than the high energy portion (E > 10 MeV), which is mainly responsible for the upsets [27]. In this case the mechanism is that of the thermal neutrons interacting with the B10 fraction of the boron in the borophosphosilicate glass (BPSG) within the glassivation layer over the die that produces alpha particles. The energy deposition by the alphas leads to the upsets [27]. A very similar mechanism was investigated earlier with respect to the B10 content of boron dopants in microelectronics [28]. That analysis found that both the 1.5 MeV alpha and the 0.8 MeV Li recoil produced by thermal neutron interactions with B10 can deposit energy leading to
Table 4 Direct Comparison of RAM SEU Rates at Ground Level, From Field Measurements and Scaled from Measured SEU
Cross Sections in WNR Neutron Beam
Network Fermilab, Batavia,
Improve reliability of [...]
    Reduce RAM sensitivity through techniques known to the SEU community [EDAC, SEU-immune SRAMs, use of other memories less susceptible to SEU (e.g., EEPROMs, flash EEPROMs, etc.)]

[...] biomedical, industrial and commercial products
    Utilize existing expertise and methods to reduce possibility of RAM upsets at ground level in devices having widespread use (thousands-millions of individual products). In commercial products use of low SEU-sensitive RAMs is generally precluded because of increased cost. Example of LUNA-E and C (EDAC) DRAMs developed by IBM to have low SEU rates. However, to be competitive in their THINKPAD laptop computer, IBM uses non-IBM DRAMs [24] because they are cheaper.

Impose realistic [...] microelectronics industry to develop low FIT-rate designs
    As RAM devices continue to increase in memory capacity, microelectronics will no longer meet standards set by its own industry, e.g., 2000 FITs (per device) in STACK Spec [...] 12.1 [25]. They meet it now because the same standard provides for only an α source test, and α's are not the real cause of the errors. Once atmospheric neutrons are recognized as main cause of errors, they will not be able to meet the maximum allowed FIT rate for RAMs > 1 Mbit.

Utilize available accelerated SER techniques/tests
    Effective SEU testing techniques can be applied to RAMs to quickly determine their ground level sensitivity to upset (FIT rates). These test techniques are a much better and quicker way to provide feedback on the susceptibility of specific new RAM design features than the existing [...]
upsets [28]. In that case, even for the most sensitive RAM
tested with thermal neutrons, the upset cross section, in
cm2/bit, was about three orders of magnitude smaller than
that from the WNR beam (Table 3). Furthermore, ground
thermal neutron fluxes are greatly influenced by the
effects of topography, soil water content and surrounding
man-made materials [29]. For a very simple air/material
geometry, the thermal neutron flux at the interface varies by a
factor of 5 depending on the material [29]. This implies that large
variations in the thermal flux are possible just due to the
material/geometry configuration surrounding a particular
computer. In contrast, the measured ground level upset rates
in Table 2 show much less variation. Thus for a number of
reasons, including complete uncertainty of the BPSG content
of commercial SRAMs and DRAMs, large variation of the
ground level thermal neutron flux from location to location,
and old measurements showing a much lower upset cross
section, we believe that the contribution of thermal neutrons
to the ground level upset rate is small.
It has also been suggested that other cosmic ray secondary
particles, protons and pions, may be responsible for the
ground level upset rates [30]. These particles may contribute
to some portion of the ground level upset rate, but the
correlation above, between the measured ground level bit
error rate (from error logs, RAM SSER FIT rates, etc.) and
the WNR SEU rate measurements, indicates that the
atmospheric neutrons are the dominant cause. We expect
that additional examinations of other sources of ground level
errors will further verify this contention. Such studies might
show the effects of latitude and altitude on ground level rates,
e.g., similar to the variation of the atmospheric neutron flux
with latitude and altitude [8], and of variations in the SEU
response of different RAMs, such as that seen in Table 3.
Furthermore, such examinations will hopefully lead to
expanded cooperation between the SEU community and the
designers of microelectronics, computer systems and the
diversity of commercial electronic products that use
significant quantities of RAM, in terms of accounting for the
effects of SEU in those products.
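
The comparison between field error rates and beam-derived rates can be illustrated with a short calculation. The sketch below is not the author's procedure. It assumes the simplest possible scaling, in which the upset rate is the product of a per-bit cross section, a neutron flux and the number of bits, and it uses the standard definition of 1 FIT as one failure per 10^9 device-hours. The function names and all numerical values are hypothetical placeholders, not data from this paper.

```python
# Minimal sketch, assuming upset rate = cross section x flux x bits.
# All names and numbers here are illustrative assumptions, not values
# taken from the paper's tables.

DEVICE_HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def upset_rate_per_hour(cross_section_cm2_per_bit, flux_n_per_cm2_hr, n_bits):
    """Upsets per device-hour from a per-bit cross section and a flux."""
    return cross_section_cm2_per_bit * flux_n_per_cm2_hr * n_bits


def rate_to_fit(upsets_per_device_hour):
    """Convert upsets per device-hour to FIT."""
    return upsets_per_device_hour * DEVICE_HOURS_PER_FIT


def fit_to_hours_between_upsets(fit):
    """Mean device-hours between upsets for a given FIT rate."""
    return DEVICE_HOURS_PER_FIT / fit


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the arithmetic.
    sigma = 1e-13        # cm^2/bit, placeholder cross section
    flux = 20.0          # n/cm^2/hr, placeholder ground level flux
    bits = 2 ** 20       # a 1 Mbit device

    fit = rate_to_fit(upset_rate_per_hour(sigma, flux, bits))
    print(f"FIT per device: {fit:.1f}")
    print(f"Mean hours between upsets: {fit_to_hours_between_upsets(fit):.3g}")
```

Because the rate is linear in the number of bits in this simple model, the per-device FIT rate grows in proportion to memory capacity for a fixed per-bit cross section.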
ACKNOWLEDGMENT
The assistance provided by the following people is gratefully
acknowledged with respect to information and hardware they
furnished and useful discussions they participated in: S. R.
Allen, W. M. Kearns, T. A. Krogel, P. P. Majewski, D. L.
Oberg, S. W. Snow and J. L. Wert of the Boeing Defense &
Space Group, S. A. Wender of LANL, G. Eddy of Cray
Research Inc., T. Corbiere of Matra MHS, J. F. Ziegler of
IBM and M. Fischler of Fermilab.

REFERENCES
1. J. F. Ziegler and W. A. Lanford, “Effect of Cosmic Rays on
Computer Memories”, Science, 206, 776 (1979)
2. T. C. May and M. H. Woods, “A New Physical
Mechanism for Soft Errors in Dynamic Memories”,
Proceedings 16th Int’l Reliability Physics Symposium, p. 33,
April, 1978
3. A. Hasnain and A. Ditali, “Building-In Reliability: Soft
Errors - A Case Study”, Proceedings 30th Int’l Reliability
Physics Symposium, p. 276, April, 1992
4. J. F. Ziegler et al, “IBM Experiments in Soft Fails in
Computer Electronics (1979-1984)”, IBM J. Res. Develop. 40,
3 (1996)
5. R. Silberberg, C. H. Tsao and J. R. Letaw, “Neutron
Generated Single Event Upset in the Atmosphere”, IEEE
Trans. Nucl. Sci., NS-31, 1066 and 1183, Dec. 1984
6. A. Taber and E. Normand, “Investigation and
Characterization of SEU Effects and Hardening Strategies in
Avionics”, IBM Report 92-L75-020-2, August, 1992,
republished as DNA-Report DNA-TR-94-123, DNA, Feb,
1995
7. O. C. Allkofer and P. K. Grieder, Physics Data: Cosmic
Rays on Earth, Fachinformationszentrum Energie, Physik,
Mathematik GmbH, Karlsruhe, 1984
8. E. Normand and T. J. Baker, “Altitude and Latitude
Variations in Avionics SEU and Atmospheric Neutron Flux”,
IEEE Trans. Nucl. Sci., 40, 1484 (1993)
9. T. Nakamura et al, “Altitude Variation of Cosmic Ray
Neutrons”, Health Physics, 53, 509 (1987)
10. J. Hewitt et al, “Ames Collaborative Study of Cosmic Ray
Neutrons: Mid-Latitude Flights”, Health Physics, 34, 375
(1978)
11. T. O’Gorman, “An Experiment to Determine the Effect of
Cosmic Rays on a FET Computer Memory”, paper presented
at the Fourth Single Event Effects Symposium, Los Angeles,
1985
12. T. O’Gorman, “The Effect of Cosmic Rays on the Soft
Error Rate of a DRAM at Ground Level”, IEEE Trans.
Electron Devices, 41, 553 (1994)
13. T. J. O’Gorman et al, “Field Testing for Cosmic Ray Soft
Errors in Semiconductor Memories”, IBM J. Res. Develop.
40, 41 (1996)
14. M. Fischler, personal communication
15. F. Gardic et al, “Analysis of Local and Global Transient
Effects in a CMOS SRAM”, IEEE Trans. Nucl. Sci., 43, 899
(1996)
16. “Package Induced Soft-Error Test Procedure”, MIL STD
883D, Method 1032.1
17. C. Lage et al, “Soft Error Rate and Stored Charge
Requirements in Advanced High Density SRAMs”, IEDM
Tech. Digest, 821 (1993)
18. J. F. Ziegler et al, “Accelerated Testing for Cosmic
Soft-Error Rate”, IBM J. Res. Develop. 40, 51 (1996)
19. W. R. McKee et al, “Cosmic Ray Neutron Induced Upsets
as a Major Contributor to the Soft Error Rate of Current and
Future Generation of DRAMs”, paper presented at 1996
Proceedings of International Reliability Physics Symposium,
April, 1996
20. E. Normand, “Single-Event Effects in Avionics”, IEEE
Trans. Nucl. Sci., 43, 461, 1996
21. E. Normand, D. L. Oberg, J. L. Wert, T. J. Baker and C.
M. Castaneda, “Considerations in Single Event Upset Testing
with Energetic Neutrons”, paper presented at the Eighth
Single Event Effects Symposium, Los Angeles, April, 1992
22. E. Normand, D. L. Oberg, J. L. Wert, J. D. Ness, P. P.
Majewski, S. A. Wender and A. Gavron, “Single Event Upset
and Charge Collection Measurements Using High Energy
Neutrons and Protons”, IEEE Trans. Nucl. Sci., 41, 2203,
1994
23. R. Ecoffet, M. LaBrunee, S. Duzellier and D. Falguere,
“Heavy Ion Test Results on Memories”, 1992 IEEE Radiation
Effects Data Workshop, p. 27
24. M. Martignano and R. Harboe-Sorensen, “IBM
THINKPAD Radiation Testing and Recovery During
Euromir Missions”, IEEE Trans. Nucl. Sci., 42, 2004, 1995
25. “General Requirements for Integrated Circuits”,
Specification 0001, Issue 12.1, STACK International, Milton
Keynes, UK, Sept. 1993
26. P. Calvel, P. Lamothe, C. Barillot, R. Ecoffet, S.
Duzellier and E. G. Stassinopoulos, “Space Radiation
Evaluation of 16 Mbit DRAMs for Mass Memory
Applications”, IEEE Trans. Nucl. Sci., 41, 2267, 1994
27. R. Baumann, T. Hossain, S. Murata and H. Kitagawa,
“Boron Compounds as a Dominant Source of Alpha Particles
in Semiconductor Devices”, Proceedings 1995 Int’l Reliability
Physics Symposium, p. 297, April, 1995
28. T. R. Oldham, S. Murrill, and C. T. Self, “Single Event
Upset of VLSI Memory Circuits Induced by Thermal
Neutrons”, Radiation Effects, Research and Engineering,
Vol. 5, No. 1, p. 6, 1986
29. K. O’Brien, H. Sandmeier, G. E. Hansen and J. E.
Campbell, “Cosmic Ray Induced Neutron Background
Sources and Fluxes for Geometries of Air Over Water,
Ground, Iron and Aluminum”, J. Geophys. Res., 83, 114
(1978)
30. J. F. Dicello et al, “An Estimate of Error Rates in
Integrated Circuits at Aircraft Altitudes and at Sea Level”,
Nucl. Inst. and Methods, B40, 1295 (1989)
31. J. R. Letaw and E. Normand, “Guidelines for Predicting
Single Event Upsets in Neutron Environments”, IEEE Trans.
Nucl. Sci., NS-38, 1500, 1991