jffs2 PDF
David Woodhouse
Red Hat, Inc.
dwmw2@cambridge.redhat.com
This paper will give an overview of the restrictions imposed by flash technology and hence the design aims of JFFS, and the implementation of both JFFS and the improvements made in version 2, including compression and more efficient garbage collection.

1 Introduction

1.1 Flash

Aside from the difference in erase block sizes, NAND flash chips also have other differences from NOR chips. They are further divided into “pages” which are typically 512 bytes in size, each of which has an extra 16 bytes of “out of band” storage space, intended to be used for metadata or error correction codes. NAND flash is written by loading the required data into an internal buffer one byte at a time, then issuing a write command. While NOR flash allows bits to be cleared individually until there are none left to be cleared, NAND flash allows only ten such write cycles to each page before leakage causes the contents to become undefined until the next erase of the block in which the page resides.
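The write semantics described above can be illustrated with a small model. This is a minimal sketch, assuming the usual convention that erased flash reads as all ones and that programming can only clear bits; the names `NandPage`, `program` and `erase` are hypothetical and do not correspond to any real driver interface. The page size, out-of-band size and write-cycle limit are the figures quoted in the text.

```python
PAGE_SIZE = 512    # data bytes per NAND page (figure from the text)
OOB_SIZE = 16      # "out of band" bytes per page (figure from the text)
MAX_WRITES = 10    # write cycles per page between erases (figure from the text)


class NandPage:
    """Toy model of one NAND page; not a real driver interface."""

    def __init__(self):
        self.erase()

    def erase(self):
        # Erasing sets every bit to 1 and resets the write-cycle count.
        # On a real chip the whole erase block is erased, not a single page.
        self.data = bytearray(b"\xff" * PAGE_SIZE)
        self.oob = bytearray(b"\xff" * OOB_SIZE)
        self.writes = 0

    def program(self, offset, payload):
        # Programming can only clear bits, so the result is old AND new.
        if self.writes >= MAX_WRITES:
            raise RuntimeError("page contents undefined until next erase")
        if offset < 0 or offset + len(payload) > PAGE_SIZE:
            raise ValueError("write outside page")
        for i, byte in enumerate(payload):
            self.data[offset + i] &= byte
        self.writes += 1


page = NandPage()
page.program(0, b"\x0f")   # 0xff & 0x0f -> 0x0f
page.program(0, b"\xf3")   # 0x0f & 0xf3 -> 0x03: bits are only ever cleared
print(hex(page.data[0]))
```

The second write cannot restore the bits cleared by the first; only an erase can. This is why a flash file system must write changed data to a fresh location rather than updating it in place.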
To provide wear levelling and reliable operation, sectors of the emulated block device are stored in varying locations on the physical medium, and a “Translation Layer” is used to keep track of the current location of each sector in the emulated block device. This translation layer is effectively a form of journalling file system.

The most common such translation layer is a component of the PCMCIA specification, the “Flash Translation Layer” [FTL]. More recently, a variant designed for use with NAND flash chips has been in widespread use in the popular DiskOnChip devices produced by M-Systems.

Unfortunately, both FTL and the newer NFTL are encumbered by patents, not only in the United States but also, unusually, in much of Europe and Australia. M-Systems have granted a licence for FTL to be used on all PCMCIA devices, and allow NFTL to be used only on DiskOnChip devices.

Linux supports both of these translation layers, but their use is deprecated and intended for backwards compatibility only. Not only are there patent issues, but the practice of using a form of journalling file system to emulate a block device, on which a “standard” journalling file system is then used, is unnecessarily inefficient.

A far more efficient use of flash technology would be permitted by the use of a file system designed specifically for use on such devices, with no extra layers of

The original JFFS is a purely log-structured file system [LFS]. Nodes containing data and metadata are stored on the flash chips sequentially, progressing strictly linearly through the storage space available.

In JFFS v1, there is only one type of node in the log: a structure known as struct jffs_raw_inode. Each such node is associated with a single inode. It starts with a common header containing the inode number of the inode to which it belongs and all the current file system metadata for that inode, and may also carry a variable amount of data.

There is a total ordering between all the nodes belonging to any individual inode, which is maintained by storing a version number in each node. Each node is written with a version higher than all previous nodes belonging to the same inode. The version is an unsigned 32-bit field, allowing for 4 milliard (about 4 × 10^9) nodes to be written for each inode during the life span of the file system. Because the limited lifetime of flash chips means this number is extremely unlikely to be reached, this limitation is deemed to be acceptable.

Similarly, the inode number is stored in a 32-bit field, and inode numbers are never reused. The same logic applies to the acceptability of this limitation, especially as it is possible to remove this restriction without breaking backwards compatibility of JFFS file systems, if it becomes necessary.

In addition to the normal inode metadata such as
uid, gid, mtime, atime, ctime etc., each JFFS v1 raw node also contains the name of the inode to which it belongs and the inode number of the parent inode.1

1 The lack of distinction between directory entries and inodes means that the original JFFS cannot support hard links.

Each node may also contain an amount of data, and if data are present the node will also record the offset in the file at which these data should appear. For reasons which are discussed later, there is a restriction on the maximum size of physical nodes, so large files will have many nodes associated with them, each node carrying data for a different range within the file.

Nodes containing data for a range in the inode which is also covered by a later node are said to be obsoleted, as are nodes which contain no data, where the metadata they contain has been outdated by a later node. Space taken by obsoleted nodes is referred to as “dirty space”.

Special inodes such as character or block devices and symbolic links which have extra information associated with them represent this information (the device numbers or symlink target string) in the data part of the JFFS node, in the same manner as regular files represent their data, with the exception that there may be only one non-obsolete node for each such special inode at any time. Because symbolic links and especially device nodes have small amounts of such data, and because the data in these inodes are always required all at once rather than by reading separate ranges, it is simpler to ensure that the data are not fragmented into many different nodes on the flash.

Inode deletion is performed by setting a deleted flag in the inode metadata. All later nodes associated with the deleted inode are marked with the same flag, and when the last file handle referring to the deleted inode is closed, all its nodes become obsolete.

2.2 Operation

The entire medium is scanned at mount time, each node being read and interpreted. The data stored in the raw nodes provide sufficient information to rebuild the entire directory hierarchy and a complete map for each inode of the physical location on the medium of each range of data.

JFFS v1 stores all this information at all times while the file system is mounted. Each directory lookup can be satisfied immediately from data structures held in-core, and file reads can be performed by reading immediately from the appropriate locations on the medium into the supplied buffer.

Metadata changes such as ownership or permissions changes are performed by simply writing a new node to the end of the log recording the appropriate new metadata. File writes are similar, differing only in that the node written will have data associated with it.

2.3 Garbage Collection

The principles of operation so far are extremely simple. The JFFS code happily writes out new jffs_raw_inode structures to the medium to mark each change made to the filesystem... until, that is, it runs out of space.

At that point, the system needs to begin to reclaim the dirty space which contains old nodes which have been obsoleted by subsequent writes.

The oldest node in the log is known as the head, and new nodes are added to the tail of the log. In a clean filesystem on which garbage collection has never been triggered, the head of the log will be at the very beginning of the flash. As the tail approaches the end of the flash, garbage collection will be triggered to make space.

Garbage collection will happen either in the context of a kernel thread which attempts to make space before it is actually required, or in the context of a user process which finds insufficient free space on the medium to perform a requested write. In either case, garbage collection will only continue if there is dirty space which can be reclaimed. If there is not enough dirty space to ensure that garbage collection will improve the situation, the kernel thread will sleep, and writes will fail with −ENOSPC errors.

The goal of the garbage collection code is to erase the first flash block in the log. At each pass, the node at the head of the log is examined. If the node is obsolete, it is skipped and the head moves on to
the next node.2 If the node is still valid, it must be rendered obsolete. The garbage collection code does so by writing out a new data or metadata node to the tail of the log.

The new node written will contain the currently valid data for at least the range covered by the original node. If there is sufficient free space, the garbage collection code may write a larger node than the one being obsoleted, in order to improve storage efficiency by merging many small nodes into fewer, larger nodes.

If the node being obsoleted is already partially obsoleted by later nodes which cover only part of the same range of data, some of the data written to the new node will obviously differ from the data contained in the original.

In this way, the garbage collection code progresses the head of the log through the flash until a complete erase block is rendered obsolete, at which point it is erased and becomes available for reuse by the tail of the log.

2.4 Housekeeping

nario to one and a half times the size of the flash sectors in use.

In fact, the above is only an approximation: it ignores the fact that a name is stored with each node on the flash, and that renaming a file to a longer name will cause all nodes belonging to that file to grow when they are garbage collected.3

The precise amount of space which is required in order to ensure that garbage collection can continue is not formally proven and may not even be bounded with the current algorithms.

Empirical results show that a value of four flash sectors seems to be sufficient, while the previous default of two flash sectors would occasionally lead to the tail of the log reaching the head and complete deadlock of the file system.

2.5 Evolution

The original version of JFFS was used by Axis in their embedded devices in a relatively limited fashion, on the 2.0 version of the Linux kernel.
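As a rough illustration of the JFFS v1 garbage collection pass described in section 2.3, the following sketch models the log as a queue of versioned nodes. It is a simplification under stated assumptions, not the actual JFFS code: the `Node` and `Log` classes are hypothetical, a node counts as obsolete only when a newer node of the same inode starts at the same offset and is at least as long, and erase blocks are not modelled at all.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    ino: int        # inode number the node belongs to
    version: int    # per-inode version; higher means newer
    offset: int     # start of the data range within the file
    data: bytes


class Log:
    """Toy JFFS v1-style log: head on the left, tail on the right."""

    def __init__(self):
        self.nodes = deque()
        self.next_version = {}   # ino -> next version number to hand out

    def write(self, ino, offset, data):
        # Every new node gets a version higher than all earlier nodes
        # of the same inode, and is appended at the tail.
        version = self.next_version.get(ino, 1)
        self.next_version[ino] = version + 1
        self.nodes.append(Node(ino, version, offset, data))

    def is_obsolete(self, node):
        # Simplified rule (an assumption of this sketch): a newer node of
        # the same inode covers the same starting offset and length.
        return any(
            n.ino == node.ino
            and n.offset == node.offset
            and n.version > node.version
            and len(n.data) >= len(node.data)
            for n in self.nodes
        )

    def gc_pass(self):
        # Examine the head: skip it if obsolete, otherwise make it
        # obsolete by rewriting its contents at the tail.
        head = self.nodes[0]
        if not self.is_obsolete(head):
            self.write(head.ino, head.offset, head.data)
        self.nodes.popleft()


log = Log()
log.write(ino=1, offset=0, data=b"old")
log.write(ino=2, offset=0, data=b"keep")
log.write(ino=1, offset=0, data=b"new")   # obsoletes the first node
log.gc_pass()   # head (inode 1, version 1) is obsolete: skipped
log.gc_pass()   # head (inode 2, version 1) is valid: copied to the tail
print([(n.ino, n.version, n.data) for n in log.nodes])
```

Note that rewriting a valid node consumes space at the tail before any space is freed at the head, which is why some free space must always be held in reserve for garbage collection to make progress.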
For JFFS2, where blocks can be garbage collected out of order, it was necessary to ensure that old data could never “show through” the holes caused by truncation and subsequent extension of a file. For this reason, it was decided that there should be no holes in the proper sense, that is, no complete absence of information for the range of bytes in question. Instead, upon receiving a request to write to an offset greater than the current size of a file, or a request to truncate to a larger size, JFFS2 inserts a data node with the previously-mentioned compression type JFFS2_COMPR_ZERO, meaning that no actual data are contained within the node, and the entire range represented by the node should be set to zero upon being read.

In the case where a file contains a very large hole, it is preferable to represent that hole by only a single physical node on the medium, rather than a “hole” node for each page in the range affected. Therefore, such hole nodes are a special case of data node: the only type of data node which may cover a range of more than one page.

XIP functionality in JFFS2 is not currently planned because it is fairly difficult to implement and because the potential benefits of XIP are not clearly sufficient to justify the effort required to do so.

For obvious reasons, XIP and compression are mutually exclusive: if data are compressed, they cannot be used directly in place. Given a prototype platform with sufficient quantities of both RAM and flash that neither XIP nor compression is required, and the desire to save money on the hardware, a choice can be made between halving the amount of RAM and using XIP, or halving the amount of flash and using compression.

Choosing the latter option will generally give a greater cost saving than the former, because flash is more expensive than RAM. The operating system is able to be more flexible in its use of the available RAM, discarding file buffers during periods of high memory pressure. Furthermore, because write operations to flash chips are so slow, compressing the data may actually be faster for many workloads.
The main problem with XIP, however, is the interaction with memory management hardware. Firstly, for all known memory management units, each page of data must be exactly page-aligned on the flash chip in order for it to be mapped into processes' address space, which makes such a file system even more wasteful of space than the mere absence of compression already implies. Secondly, while write or erase commands are being given to a flash chip, it may return status words on all read cycles; therefore all currently valid mappings of the pages of the chip would have to be found and invalidated for the duration of the operation. These two limitations make a writable filesystem with XIP functionality extremely difficult to implement, and it is unlikely that JFFS2 could support XIP without fundamental changes to its design.

A read-only XIP filesystem would be a more reasonable request, and an entirely separate file system providing this functionality, based on the existing ROMFS file system, is likely to be developed at some time in the near future.

4.2 Garbage Collection Space Requirements

A major annoyance for users is the amount of space currently required to be left spare for garbage collection. It is hoped that a formal proof of the amount of required space can be produced, but in the meantime a very conservative approach is taken: five full erase blocks must be available for writing before new writes will be permitted from user space.

It should be possible to reduce this figure significantly, hopefully to a single block for NOR flash and to two or three blocks in the case of NAND flash, where extra space should always be available to copy away data from bad blocks.

The approach to this problem in JFFS1 was to evaluate and attempt to prove an upper bound on the amount of space required. This appeared to fail because there appeared to be no such upper bound. For JFFS2, it is suspected that a more useful approach may be to define a reasonable upper bound, such as a single erase block, and to modify the code to make it true.
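The conservative reserve policy of this section can be sketched as a simple admission check. This is an illustration only: the names `RESERVED_BLOCKS` and `may_write` are hypothetical, and the rule that garbage collection itself may dip into the reserve is an assumption of this sketch rather than a description of the actual JFFS2 accounting.

```python
RESERVED_BLOCKS = 5   # figure quoted in the text for the current code


def may_write(free_blocks, from_gc=False, reserved=RESERVED_BLOCKS):
    """Return True if a write may proceed.

    Assumption: writes from user space need `reserved` completely free
    erase blocks, while garbage collection may use the reserve itself,
    since making progress there is the purpose of the reserve.
    """
    if from_gc:
        return free_blocks > 0
    return free_blocks >= reserved


print(may_write(4))                 # user write with four free blocks
print(may_write(4, from_gc=True))   # garbage collector with four free blocks
print(may_write(1, reserved=1))     # the hoped-for single-block reserve on NOR
```

Under this model, lowering the reserve from five blocks to one returns four erase blocks of capacity to the user, which is the motivation for bounding the space garbage collection can need.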
4.1 Improved Fault Tolerance

The frequency of bugs being reported has reached a fairly stable low level, and the majority of recent problems reported with JFFS2 have actually turned out to be errors in the physical flash drivers or with other parts of kernel code, although sometimes this has highlighted an area where JFFS2 should be more fault-tolerant.

Both versions of JFFS are now in active use in a reasonable number of embedded systems, and JFFS2 has been included as a fundamental part of the “Familiar” distribution of Linux for the Compaq iPAQ handheld computer, replacing the read-only CRAMFS filesystem which was previously used on those devices.

The existence of a fully-functional writable file system for this class of device is an exciting development, and was absolutely essential to the progress of the Familiar distribution, allowing files to be overwritten individually without having to reset the device and use the bootloader to program a complete replacement CRAMFS.

Commercial support for JFFS2 is available from Red Hat, Inc., for customers wishing to use it in production systems with full backup from the developers.

JFFS was merged into the Linux kernel prior to the 2.4.0 release. The current JFFS2 code is also, at the time of writing, in Alan Cox’s 2.4-ac kernels. The latest code for the 2.4 version of each, and for the 2.2 version of JFFS v1, is available from the Linux-MTD CVS repository. Instructions for accessing this, along with links to snapshot tarballs for the firewall-challenged, are available from:

http://www.linux-mtd.infradead.org/

The original web site for JFFS and the current code for the 2.0 kernels, along with a link to the jffs-dev mailing list which is used for discussion of both JFFS and JFFS2, is at:

http://developer.axis.com/software/jffs/

At the time of writing, a web site specific to JFFS2 is intended to appear “shortly” at:

http://sources.redhat.com/jffs2/
References

[FTL] Intel Corporation, Understanding the Flash Translation Layer (FTL) Specification, (1998).
http://developer.intel.com/design/flcomp/applnots/297816.htm

[LFS] Mendel Rosenblum and John K. Ousterhout, The Design and Implementation of a Log-Structured File System, ACM Transactions on Computer Systems 10(1) (1992) pp. 26–52.
ftp://ftp.cag.lfs.mit.edu/dm/papers/rosenblum:lfs.ps.gz

[eCos] Red Hat, Inc., eCos: Embedded Configurable Operating System.
http://sources.redhat.com/ecos/

6 Acknowledgements

The author would like to thank Björn Wesen and the staff of Axis Communications AB for designing the original JFFS and releasing it under the GNU General Public License, and in particular for then answering a stream of silly questions about it.

The author is also grateful to Red Hat, Inc., who for some reason took it upon themselves to actually pay him for playing with this stuff.

Also deserving of a special mention is Vipin Malik, who has done a wonderful job of testing JFFS and JFFS2, often managing to break the latter when it