Chapter 4 - Data Representation & Computer File Systems
At least, these all seem different to us. However, ALL types of information stored in a
computer are stored internally in the same simple format: a sequence of 0's and 1's.
Computers work with a binary number system that consists of only two digits: zero and one.
Inside the computer, a binary digit is represented by an electrical pulse: one means a pulse of
electricity and zero means no pulse. All data entered into a computer is first converted into
the binary number system. One digit in the binary number system is called a bit, and a
combination of eight bits is called a byte. A byte is the basic unit used to represent
alphabetic, numeric and alphanumeric data.
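As a minimal sketch of the idea above, the following Python snippet prints the eight bits of
each byte that stores a character. It assumes the ASCII encoding, which the text does not name,
and the helper name to_bits is hypothetical.

```python
def to_bits(text):
    """Return the 8-bit pattern of each byte in an ASCII string."""
    data = text.encode("ascii")  # assumption: ASCII, one byte per character
    return [format(byte, "08b") for byte in data]


print(to_bits("A"))   # ['01000001']
print(to_bits("Hi"))  # ['01001000', '01101001']
```

The letter "A" has the code 65, which is 64 + 1, so only the bits for 64 and 1 are set.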
Intro to Computers: Chapter 4: Data Representation & Computer File System Page 1
3) Graphics: Graphics that are displayed on a computer screen consist of pixels: the tiny
"dots" of color that collectively "paint" a graphic image on a computer screen. The pixels
are organized into many rows on the screen. In one common configuration, each row is 640
pixels long, and there are 480 such rows. Another configuration (and the one used on the
screens in the lab) is 800 pixels per row with 600 rows, which is referred to as a "resolution
of 800x600." Each pixel has two properties: its location on the screen and its color.
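The two resolutions mentioned above can be compared with a short calculation. This is only a
sketch: the function names are hypothetical, and storing the rows one after another (so that a
pixel's location maps to a single position) is an assumption, not something the text specifies.

```python
def pixel_count(width, height):
    """Total number of pixels on a screen with the given resolution."""
    return width * height


def pixel_index(x, y, width):
    """Position of the pixel in column x, row y, assuming rows are stored one after another."""
    return y * width + x


print(pixel_count(640, 480))    # 307200
print(pixel_count(800, 600))    # 480000
print(pixel_index(0, 1, 800))   # 800 (first pixel of the second row)
```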
In most programming languages, all basic data types are built-in. In addition, many languages
also provide a set of composite data types. Opinions vary as to whether a built-in type that is
not basic should be considered "primitive". The actual range of primitive data types that is
available is dependent upon the specific programming language that is being used.
o 20.0005
o 99.9
4. Boolean, logical values true and false.
A Boolean type, typically denoted "bool" or "boolean", is typically a logical type that can
be either "true" or "false". Although only one bit is necessary to accommodate the value set
"true" and "false", programming languages typically implement boolean types as one or
more bytes.
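The gap between "one bit is enough" and "one or more bytes in practice" can be illustrated in
Python. This sketch uses the standard-size mode of Python's struct module, where a C-style
boolean occupies one byte; the variable names are hypothetical, and other languages may use
different sizes.

```python
import struct

flags = [True, False, True, True, False, False, False, True]

# One byte per boolean: the usual representation
one_byte_each = struct.pack("<8?", *flags)

# One bit per boolean: all eight flags squeezed into a single byte
bit_packed = bytes([sum(1 << i for i, flag in enumerate(flags) if flag)])

print(len(one_byte_each))  # 8
print(len(bit_packed))     # 1
```

Languages usually accept the wasted space because memory is addressed in whole bytes, so a
one-byte boolean is simpler and faster to read and write than a single bit.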
Information is stored in computers in the form of bits. Bits, also called binary digits, are
the 0’s and 1’s, with 0 representing an OFF state and 1 representing an ON state. Stored bits
are usually retrieved from the computer's memory for manipulation by the processor.
A single bit alone cannot represent a number, letter or special character. To represent
information, bits are combined into groups of eight. A group of eight bits is called a byte.
Each byte can be used to represent a number, letter or special character.
Binary Numbers
Normally we write numbers using digits 0 to 9. This is called base 10. However, any positive
integer (whole number) can be easily represented by a sequence of 0's and 1's. Numbers in this
form are said to be in base 2 and they are called binary numbers. Base 10 numbers use a
positional system based on powers of 10 to indicate their value. The number 123 is really 1
hundred + 2 tens + 3 ones. The value of each position is determined by ever-higher powers of
10, read from right to left. Base 2 works the same way, just with powers of 2. The number
101 in base 2 is really 1 four + 0 twos + 1 one (which equals 5 in base 10).
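The positional rule above can be written out as a short Python function. This is a sketch for
illustration; the name binary_to_decimal is hypothetical, and Python's built-in int and bin
functions do the same conversions directly.

```python
def binary_to_decimal(bits):
    """Add up the power of 2 for every position that holds a 1."""
    value = 0
    # The rightmost digit is position 0 (worth 2**0 = 1), the next is worth 2, then 4, ...
    for position, digit in enumerate(reversed(bits)):
        value += int(digit) * 2 ** position
    return value


print(binary_to_decimal("101"))  # 5
print(int("101", 2))             # 5, using the built-in conversion
print(bin(5))                    # 0b101
```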
NB. A computer file must have a file name, which usually includes an extension that indicates
the content of the file.
The way information is grouped into a file is entirely up to how it is designed. This has led to a
plethora of more or less standardized file structures for all imaginable purposes, from the
simplest to the most complex. Most computer files are used by computer programs which
create, modify or delete the files for their own use on an as-needed basis. The programmers
who create the programs decide what files are needed, how they are to be used and (often) their
names. In some cases, computer programs manipulate files that are made visible to the
computer user. For example, in a word-processing program, the user manipulates document files that the
user personally names. Although the content of the document file is arranged in a format that
the word-processing program understands, the user is able to choose the name and location of
the file and provide the bulk of the information (such as words and text) that will be stored in
the file.
Many applications pack all their data files into a single file called an archive file, using
internal markers to distinguish the different types of information contained within. The
benefits of an archive file are fewer files for easier transfer, reduced storage usage, or
simply a way to organize outdated files. The archive file must often be unpacked before its
contents can be used again.
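As one possible illustration, Python's standard zipfile module can pack several files into a
single archive. The ZIP format is only an example of an archive format, and the function name
and file names below are hypothetical.

```python
import os
import tempfile
import zipfile


def pack_and_list(files):
    """Pack a dict of {name: text} into one zip archive and return the names stored in it."""
    with tempfile.TemporaryDirectory() as folder:
        archive_path = os.path.join(folder, "bundle.zip")
        # Write every entry into a single compressed archive file
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for name, text in files.items():
                archive.writestr(name, text)
        # Reopen the archive and list what it contains
        with zipfile.ZipFile(archive_path) as archive:
            return archive.namelist()


print(pack_and_list({"notes.txt": "hello", "data.csv": "a,b\n1,2\n"}))
# ['notes.txt', 'data.csv']
```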
4.7. File Operations
The most basic operations that programs can perform on a file are:
Create a new file
Change the access permissions and attributes/characteristics of a file
o Access permissions: rights that determine how users can use the file
o File attributes: metadata associated with computer files that define file
  system behavior. Each attribute can have one of two states: set or cleared.
Open a file, which makes the file contents available to the program
Read data from a file
Write data to a file
Close a file, terminating the association between it and the program
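The operations listed above can be tried out in a few lines of Python. This is a sketch only:
the file name is hypothetical, the file lives in a temporary folder that is deleted afterwards,
and the exact effect of changing permissions depends on the operating system.

```python
import os
import stat
import tempfile


def demo_file_operations():
    """Create, write, change permissions, open, read and close a small file."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "example.txt")

        handle = open(path, "w")        # create a new file and open it for writing
        handle.write("first line\n")    # write data to the file
        handle.close()                  # close: end the link between file and program

        # Change access permissions: the owner may read and write the file
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

        handle = open(path, "r")        # open the file again, this time for reading
        contents = handle.read()        # read data from the file
        handle.close()
        return contents


print(repr(demo_file_operations()))  # 'first line\n'
```

In everyday Python code the open and close steps are usually combined in a "with" statement,
which closes the file automatically even if an error occurs.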
Most computers have at least one file system. Some computers allow the use of several
different file systems.
A file system implements a type of data store used to store, retrieve and update a set of files.
Without a file system, information placed in a storage area would be one large body of
information with no way to tell where one piece of information stops and the next begins. File
systems may use a data storage device such as a hard disk or CD-ROM and involve
maintaining the physical location of the files, or they may be virtual and exist only as an access
method for virtual data or for data over a network (e.g. NFS).
The file system manages access to both the content of files and the metadata about those files.
It is responsible for arranging storage space; reliability, efficiency, and tuning with regard to
the physical storage medium are important design considerations.
space allocation and subsequent incremental allocations as the file grows. As files are deleted,
the space they were allocated is eventually considered available for use by other files. This
creates alternating used and unused areas of various sizes. This is free space fragmentation.
When a file is created and there is no area of contiguous space available for its initial
allocation, the space must be assigned in fragments. When a file is modified such that it
becomes larger, it may exceed the space initially allocated to it; another allocation must then
be assigned elsewhere, and the file becomes fragmented.
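The following Python sketch imitates this behavior on a tiny "disk" of eight blocks. It is a
simplified model under stated assumptions, not how any particular file system works: the disk
is a plain list, each file simply takes the first free blocks found, and all function names are
hypothetical.

```python
def allocate(disk, name, blocks_needed):
    """Give `name` the first free blocks found, even if they are not next to each other."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < blocks_needed:
        raise ValueError("not enough free space")
    chosen = free[:blocks_needed]
    for i in chosen:
        disk[i] = name
    return chosen


def delete(disk, name):
    """Mark every block owned by `name` as free again."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def fragments(disk, name):
    """Count the separate contiguous runs of blocks owned by `name`."""
    runs = 0
    previous = None
    for owner in disk:
        if owner == name and previous != name:
            runs += 1
        previous = owner
    return runs


disk = [None] * 8
allocate(disk, "A", 2)        # blocks 0-1
allocate(disk, "B", 2)        # blocks 2-3
allocate(disk, "C", 2)        # blocks 4-5
delete(disk, "B")             # leaves a hole at blocks 2-3
allocate(disk, "D", 4)        # needs 4 blocks, gets 2, 3, 6 and 7
print(disk)                   # ['A', 'A', 'D', 'D', 'C', 'C', 'D', 'D']
print(fragments(disk, "D"))   # 2
```

File D ends up in two separate pieces because the hole left by file B was too small to hold it.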
Methods for encrypting file data are sometimes included in the file system. This is very
effective since there is no need for file system utilities to know the encryption seed to
effectively manage the data. The risks of relying on encryption include the fact that an attacker
can copy the data and use brute force to decrypt the data. Losing the seed means losing the
data.
In the event of an operating system failure or "soft" power failure, special routines in the file
system must be invoked similar to when an individual program fails. The file system must also
be able to correct damaged structures. These may occur as a result of an operating system
failure for which the OS was unable to notify the file system, power failure or reset. The file
system must also record events to allow analysis of systemic issues as well as problems with
specific files or directories.
the data it specifies the size of a memory buffer and the file system transfers data from the
media to the buffer. Sometimes a runtime library routine may allow the user program to define
a record based on a library call specifying a length. When the user program reads the data the
library retrieves data via the file system and returns a record.
Some file systems allow the specification of a fixed record length, which is used for all
writes and reads. This facilitates updating records.
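Why a fixed record length makes updates easy can be seen in a small Python sketch: record
number n always starts at n times the record length, so it can be found and overwritten without
touching its neighbors. The record layout (a 4-byte number plus a 10-byte name) and the
function names are assumptions made for this example, and an in-memory buffer stands in for a
real file.

```python
import io
import struct

# Hypothetical layout: 4-byte integer id followed by a 10-byte name = 14 bytes per record
RECORD = struct.Struct("<i10s")


def write_record(stream, index, record_id, name):
    """Store a record at its fixed position: index * record length."""
    stream.seek(index * RECORD.size)
    stream.write(RECORD.pack(record_id, name.encode("ascii")))


def read_record(stream, index):
    """Jump straight to the record and decode it."""
    stream.seek(index * RECORD.size)
    record_id, raw_name = RECORD.unpack(stream.read(RECORD.size))
    return record_id, raw_name.rstrip(b"\x00").decode("ascii")


storage = io.BytesIO()                    # stands in for a file on disk
write_record(storage, 0, 1, "Ann")
write_record(storage, 1, 2, "Bob")
write_record(storage, 1, 2, "Robert")     # update record 1 in place
print(read_record(storage, 1))            # (2, 'Robert')
print(read_record(storage, 0))            # (1, 'Ann')
```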
An identification for each record, also known as a key, makes for a more sophisticated file
system. The user program can read, write and update records without regard to their
location. This requires complicated management of blocks of media, usually separating key
blocks and data blocks. Very efficient algorithms can be developed with a pyramid structure for
locating records.
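The separation of keys from data can be sketched in Python as follows. Note the substitution:
a Python dictionary plays the role of the key blocks here, whereas the text describes a pyramid
(tree-like) structure. The class name KeyedFile and the sample keys are hypothetical.

```python
class KeyedFile:
    """Toy keyed store: an index of keys kept apart from the stored records."""

    def __init__(self):
        self.data_blocks = []   # stands in for the data blocks
        self.key_index = {}     # stands in for the key blocks: key -> position of the record

    def write(self, key, value):
        if key in self.key_index:
            # Known key: update the existing record where it already is
            self.data_blocks[self.key_index[key]] = value
        else:
            # New key: store the record and remember its position
            self.key_index[key] = len(self.data_blocks)
            self.data_blocks.append(value)

    def read(self, key):
        return self.data_blocks[self.key_index[key]]


store = KeyedFile()
store.write("S001", "Ann")
store.write("S002", "Bob")
store.write("S001", "Anne")   # update by key; the program never sees the location
print(store.read("S001"))     # Anne
print(store.data_blocks)      # ['Anne', 'Bob']
```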
Transactional file systems
Some programs need to update multiple files "all at once". For example, a software installation
may write program binaries, libraries, and configuration files. If the software installation fails,
the program may be unusable. Transactional file systems create temporary files that keep
records of the current transactions. The transaction files are used to update the master files.
Transaction processing introduces the isolation guarantee, which states that operations within a
transaction are hidden from other threads on the system until the transaction commits, and that
interfering operations on the system will be properly serialized with the transaction.
Transactions also provide the atomicity guarantee, that operations inside of a transaction are
either all committed, or the transaction can be aborted and the system discards all of its partial
results. This means that if there is a crash or power failure, after recovery, the stored state will
be consistent. Either the software will be completely installed or the failed installation will be
completely rolled back, but an unusable partial install will not be left on the system.
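The "all or nothing" idea can be imitated for a single file in ordinary Python. This is not a
transactional file system, only a common application-level pattern that assumes the operating
system can swap one file for another in a single step: the new contents go into a temporary
file first, and the real file is replaced only when writing has finished. The function name
atomic_write is hypothetical.

```python
import os
import tempfile


def atomic_write(path, text):
    """Replace the file at `path` with `text`, or leave the old file untouched on failure."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=folder)   # temporary file in the same folder
    try:
        with os.fdopen(fd, "w") as temp_file:
            temp_file.write(text)
            temp_file.flush()
            os.fsync(temp_file.fileno())           # ask the OS to put the data on disk
        os.replace(temp_path, path)                # "commit": swap in the finished file
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)                   # "abort": discard the partial result
        raise
```

A program reading the file sees either the complete old version or the complete new version,
never a half-written one. Updating several files together, as in the software installation
example, needs support from the file system itself.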
Ensuring consistency across multiple file system operations is difficult, if not impossible,
without file system transactions. File locking can be used as a concurrency control mechanism
for individual files, but it typically does not protect the directory structure or file metadata. For
instance, file locking cannot prevent race conditions on symbolic links. File locking also
cannot automatically roll back a failed operation, such as a software upgrade; this requires
atomicity.
Journaling file systems are one technique used to introduce transaction-level consistency to file
system structures. Journal transactions are not exposed to programs as part of the OS API; they
are only used internally to ensure consistency at the granularity of a single system call.
Data backup systems typically do not provide support for direct backup of data stored in a
transactional manner, which makes recovery of reliable and consistent data sets difficult. Most
backup software simply notes what files have changed since a certain time, regardless of the
transactional state shared across multiple files in the overall dataset. As a workaround, some
database systems simply produce an archived state file containing all data up to that point, and
the backup software only backs that up and does not interact directly with the active
transactional databases at all. Recovery requires separate recreation of the database from the
state file, after the file has been restored by the backup software.
b) Location transparency: A consistent name space exists encompassing local as well as
remote files. The name of a file does not give its location.
c) Concurrency transparency: All clients have the same view of the state of the file system.
This means that if one process is modifying a file, any other processes on the same system
or remote systems that are accessing the files will see the modifications in a coherent
manner.
d) Failure transparency: The client and client programs should operate correctly after a server
failure.
e) Heterogeneity: File service should be provided across different hardware and operating
system platforms.
f) Scalability: The file system should work well in small environments (1 machine, a dozen
machines) and also scale gracefully to huge ones (hundreds through tens of thousands of
systems).
g) Replication transparency: To support scalability, we may wish to replicate files across
multiple servers. Clients should be unaware of this.
h) Migration transparency: Files should be able to move around without the client's
knowledge.