
CSC 202 – INTRODUCTION TO FILE PROCESSING

Data Processing
Data processing is the act of handling or manipulating data in specified ways so as to give
meaning to the data or to transform it into information. It is the process through which facts
and figures are collected, assigned meaning, communicated to others and retained for future
use.

Data: The word "data" is the plural of datum, which means a fact, observation, assumption or
occurrence. More precisely, data are representations of facts pertaining to people, things, ideas
and events. Data are represented by symbols such as letters of the alphabet, numerals or
other special symbols.

Information can be defined as "data that has been transformed into a meaningful and useful
form for specific purposes". There is no hard and fast rule for determining when data becomes
information.

Data Processing Activities


Data processing consists of a series of activities which are necessary to transform data into
information. The activities can broadly be classified into: Collection (Originating, Measuring
and Recording), Conversion (Coding, Classifying, Verifying, Transforming), Manipulation
(Sorting, Calculating, Summarizing, Comparing), Storage (Storing, Retrieving), Communication
and Reproduction.

(a) Collection
Data originates naturally in the form of events, transactions, observations, measurements,
interviews, etc. This data is then recorded in some usable form. Observable data may be reported
as a narration or in table form. Data may be initially recorded on a paper source document and
then converted into a machine-usable form for processing, in what is termed data capturing.
Alternatively, it may be recorded by a direct input device in a paperless, machine-readable
form using an on-line medium or a direct data capturing machine.

Activity:
Study and record the cost of upkeep for twenty students during the Harmattan semester of the
2014/2015 academic session. Use two students (one male and one female) at each level (100 and
200) in any five departments of KWASU; in all, you will be sampling twenty (20) students.
The fields to be used are: Matriculation Number, Surname, Other names, Sex, State,
Department, Level, Session, Semester, Feeding, Transport, Housing, Clothing, Recharge card,
and Books.
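The record layout described in the activity can be sketched as a CSV file; the field names follow the activity, while the sample values below are invented for illustration:

```python
import csv
import io

# Field names from the activity; the sample record is hypothetical.
fields = ["MatricNo", "Surname", "OtherNames", "Sex", "State", "Department",
          "Level", "Session", "Semester", "Feeding", "Transport", "Housing",
          "Clothing", "RechargeCard", "Books"]

sample = ["19/47CS001", "Ade", "Tunde", "M", "Ogun", "Computer Science",
          "100", "2014/2015", "Harmattan", "15000", "5000", "20000",
          "8000", "3000", "10000"]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(fields)   # header row names the fields
writer.writerow(sample)   # one captured record
print(buf.getvalue())
```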

(b) Conversion
Once the data is collected, it is converted from its source documents to a form that is more
suitable for processing. The data is first codified by assigning identification codes. A code
comprises numbers, letters, special characters, or a combination of these. For example, an
employee may be allotted a code indicating his category, such as class A. Codifying data is
useful when the data requires classification. To classify means to categorize, i.e., data with
similar characteristics is placed in similar categories or groups. For example, one may like to
arrange accounts data according to account number or date, so that a balance sheet can easily
be prepared. After classification, the data is verified or checked to ensure its accuracy before
processing starts. After verification, the data is transcribed from one data medium to another.
For example, where data processing is done using a computer, the data may be transferred
from source documents to a machine-sensible form on magnetic tape or disk.

(c) Manipulation
Once data is collected and converted, it is ready for the manipulation function which converts
data into information. Manipulation consists of the following activities:

Sorting
It involves the arrangement of data items in a desired sequence. Usually, it is easier to work with
data if it is arranged in a logical sequence. Most often, the data are arranged in alphabetical
sequence. Sometimes sorting itself will transform data into information. For example, a simple
act of sorting the names in alphabetical order gives meaning to a telephone directory. The
directory would be practically worthless without sorting. Business data processing makes
extensive use of sorting. Virtually all the records in business files are maintained in some
logical sequence. Numeric sorting is common in computer-based processing systems because it
is usually faster than alphabetical sorting.

Calculating
Arithmetic manipulation of data is called calculating. Items of recorded data can be added to one
another, subtracted, divided or multiplied to create new data. Calculation is an integral part of
data processing. For example, in calculating an employee's pay, the hours worked multiplied by
the hourly wage rate gives the gross pay. Based on total earnings, income-tax deductions are
computed and subtracted from gross pay to arrive at net pay.
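The gross-to-net calculation above can be sketched as follows; the hours, wage rate and flat 10% tax rate are invented for illustration:

```python
# Minimal sketch of the payroll calculation described above.
# The wage rate and flat tax rate are assumptions, not a real tax table.
def net_pay(hours_worked, hourly_rate, tax_rate=0.10):
    gross = hours_worked * hourly_rate   # calculating: hours x rate gives gross pay
    tax = gross * tax_rate               # deduction computed from total earnings
    return gross - tax                   # subtract to arrive at net pay

print(net_pay(40, 25.0))  # 40 hours at 25.0 per hour, 10% tax -> 900.0
```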

Summarizing
To summarize is to condense or reduce masses of data to a more usable and concise form. For
example, you may summarize a lecture attended in a class by writing small notes in one or two
pages. When the data involved are numbers, you summarize by counting or accumulating the
totals of the data in a classification or by selecting strategic data from the mass of data being
processed. For example, the summarizing activity may provide a general manager with sales-
totals by major product line, the sales manager with sales totals by individual salesman as well
as by the product line and a salesman with sales data by customer as well as by product line.
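Accumulating totals within a classification, as described above, can be sketched with a small grouping loop; the product lines and sales figures below are invented:

```python
from collections import defaultdict

# Sketch of summarizing sales data by product line (invented figures).
sales = [("soap", 120), ("soap", 80), ("oil", 200), ("oil", 50), ("rice", 30)]

totals = defaultdict(int)
for product_line, amount in sales:
    totals[product_line] += amount   # accumulate totals per classification

print(dict(totals))  # a condensed view of the mass of data
```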

Comparing
To compare data is to perform an evaluation in relation to some known measure. For example,
business managers compare data to discover how well their companies are doing. They may
compare current sales figures with those for last year to analyze the performance of the
company in the current month.

(d) Managing the Output Results


Once data has been captured and manipulated, the following activities may be carried out:

Storing
To store is to hold data for continued or later use. Storage is essential for any organized method
of processing and re-using data. The storage mechanisms of data processing systems are filing
cabinets in a manual system, and electronic devices such as magnetic disks and magnetic tapes
in the case of a computer-based system. The storing activity involves keeping data and
information in an organized manner so as to facilitate the retrieval activity.

Retrieving
To retrieve means to recover, or find again, stored data or information. Thus data, whether in
file cabinets or in computers, can be recalled for further processing. Retrieval and comparison of
old data gives meaning to current information.

(e) Communication
Communication is the process of sharing information. Unless the information is made available
to the users who need it, it is worthless. Thus, communication involves the transfer of data and
information produced by the data processing system to the prospective users of such
information or to another data processing system. As a result, reports and documents are
prepared and delivered to the users. In electronic data processing, results are communicated
through display units or terminals.

(f) Reproduction
To reproduce is to copy or duplicate data or information. This reproduction activity may be done
by hand or by machine.

The Data Processing Cycle


The data processing activities described above are common to all data processing systems from
manual to electronic systems. These activities can be grouped in four functional categories, viz.,
data input, data processing, data output and storage, constituting what is known as a data
processing cycle.

(i) Input
The term input refers to the activities required to record data and to make it available for
processing. The input can also include the steps necessary to check, verify and validate data
contents.

(ii) Processing
The term processing denotes the actual data manipulation techniques such as classifying,
sorting, calculating, summarizing, comparing, etc. that convert data into information.

(iii) Output
It is a communication function which transmits the information, generated after processing of
data, to persons who need the information. Sometimes output also includes decoding activity
which converts the electronically generated information into human-readable form.

(iv) Storage
It involves the filing/keeping of data and information for future use.

Computer Processing Operations

(a) Input/output operations


A computer can accept data (input) from and supply processed data (output) to a wide range of
input/output devices. These devices such as keyboards, display screens, and printers make
human-machine communication possible.

(b) Calculation and text manipulation Operations


Computer circuits perform calculations on numbers. They are also capable of manipulating
alphanumeric and other symbols used in text with equal efficiency.

(c) Logic/Comparison Operations


A computer also possesses the ability to perform logic operations. For example, if we compare
two items represented by the symbols A and B, there are only three possible outcomes.
A is less than B (A<B); A is equal to B (A=B); or A is greater than B (A>B).
A computer can perform such comparisons and, depending on the result, follow a
predetermined path to complete its work. This ability to compare is an important property of
computers.

(d) Storage and Retrieval Operations


Both data and program instructions are stored internally in a computer. Once they are stored in
the internal memory, they can be called up quickly or retrieved, for further use.

Data Processing System

The activity of data processing can be viewed as a "system". A system can be defined as "a
group of interrelated components that seeks the attainment of a common goal by accepting
inputs and producing outputs in an organized process". For example, a production system
accepts raw material as input and produces finished goods as output. Similarly, a data
processing system can be viewed as a system that uses data as input and processes this data
to produce information as output. There are many kinds of data processing systems. A manual
data processing system is one that utilizes tools like pens and filing cabinets. A mechanical
data processing system uses devices such as typewriters, calculating machines and book-
keeping machines. Finally, electronic data processing uses computers to automatically
process data.

Data Organization/Hierarchy

Data can be arranged in a variety of ways, but a hierarchical approach to organization is
generally recommended.

*Field
A field is a data item in a computer file. Its length may be fixed or variable. If all individuals have
three-digit employee numbers, a three-digit field is required to store the data, so the field is
fixed. In contrast, since a customer's name varies considerably from one customer to
another, a variable amount of space must be available to store this element; this is called
a variable field.

*Record
A record is a collection of related data items or fields. Each record normally corresponds to a
specific unit of information. For example, various fields in a student record may include student
number, student's name, level and department. This is the data used to produce the student's
report. Each record contains all the data for a given student. The related items are grouped
together to form a record.

*File
A collection of related records is called a file. A file contains all the related records for an
application. Files are stored on some medium, such as a floppy disk, magnetic tape, magnetic
disk, flash drive or memory card.

*Database
A collection of related files is called a database. A database contains all the related files for a
particular application.
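The field-record-file-database hierarchy above can be sketched with plain data structures; all names and values below are invented:

```python
# Sketch of the data hierarchy: fields make a record, records make a
# file, and related files make a database. Values are hypothetical.
record = {"student_number": "1345cs022",   # a field holds one data item
          "name": "A. Bello",
          "level": 200,
          "department": "Computer Science"}

students_file = [record]                   # a file: a collection of records
courses_file = [{"code": "CSC 202", "title": "File Processing"}]

database = {"students": students_file,     # a database: related files
            "courses": courses_file}

print(database["students"][0]["level"])    # drill down through the hierarchy
```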

Fixed and Variable Length Records

Fixed Length Records


In this case, all the records in a file have the same number of bytes. Such a file is called a flat
file. If all the records are expected to contain essentially the same quantity of data, then fixed
length records are used.

Variable Length Records


In this case, records vary in length. Use of variable length records conserves storage space
when the quantity of information, of various records in a file, differs significantly.
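The difference can be sketched with Python's `struct` module; the field widths chosen here are assumptions for illustration:

```python
import struct

# Sketch: a fixed-length record packs every field into the same number
# of bytes, so all records in the file are equal in size.
FMT = "3s20sB"                # 3-byte number, 20-byte name, 1-byte level code
rec = struct.pack(FMT, b"007", b"Bello".ljust(20), 2)
print(struct.calcsize(FMT))   # every fixed-length record occupies 24 bytes

# A variable-length record stores only what it needs, e.g. one delimited line.
var_rec = "007|Bello|200\n"
print(len(var_rec))           # size depends on the field contents
```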
Logical Versus Physical Record

A logical record contains all the data related to a single entity. It may be a payroll record for an
employee or a record of marks secured by a student in a particular examination. E.g. The record
of student (1345cs022).

A physical record refers to a record whose data fields are stored physically next to one
another. It is also the amount of data that is treated as a single unit by the input-output device.
i.e. the unit of transfer between disk and primary storage. Portions of the same logical record
may be located in different physical records or several logical records may be located in one
physical record. Generally, a physical record consists of more than one logical record.

Type of Data Management Files

*Master Files- Master files are permanent files kept up-to-date by applying the transactions that
occur during a particular operation. They generally contain two basic types of data:
data of a more or less permanent nature, and data which will change every time
transactions are applied to the file.

*Transaction Files- A transaction file accumulates records at arbitrary points in time, mainly in
the course of updating master files. It is usually emptied after use and re-accumulated when
required. Transaction files contain details of all transactions that have occurred in the last
period. A period may be a day, a week, a month or more. For example, a sales
transaction file may contain details of all sales made that day. Once the data has been
processed it can be discarded (although backup copies may be kept for a while).

*Security Files- These are backup files for master or transaction files. They are not used in the
ordinary course of processing; they are used for replacement or reconciliation.

Information Retrieval Methods


The file processing activities involved in data retrieval may be broadly classified as follows:

Searching: A query about a particular item of data may require looking through a master file to
find the appropriate record or records. The corresponding SQL statement is
SELECT * FROM table WHERE criteria.

Selecting: This has to do with highlighting and displaying records in particular categories.

Sorting: Arranging outputs in a particular order, ascending or descending.

Summarizing: Producing totals of selected items.
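The four retrieval activities can be sketched on an in-memory "master file"; the account records below are invented:

```python
# Sketch of searching, selecting, sorting and summarizing on invented records.
master = [
    {"acct": 103, "name": "Ade", "balance": 500},
    {"acct": 101, "name": "Bisi", "balance": 1500},
    {"acct": 102, "name": "Chidi", "balance": 700},
]

# Searching: find the record matching a criterion (like SELECT ... WHERE acct=102)
found = next(r for r in master if r["acct"] == 102)

# Selecting: display all records in a particular category
big = [r for r in master if r["balance"] > 600]

# Sorting: arrange output in ascending order of the key field
ordered = sorted(master, key=lambda r: r["acct"])

# Summarizing: total of a selected item
total = sum(r["balance"] for r in master)

print(found["name"], len(big), ordered[0]["acct"], total)
```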

BASIC FILE CONCEPT.
A file is the basic unit of storage that enables a computer to distinguish one set of
information from another. The file is the central element in most applications. Before data
can be processed by a Computer-Based Information System (CBIS), it must be
systematically organized. The most common method is to arrange data into fields,
records, files and databases. Files can be considered the framework around which
data processing revolves. File processing is the process of creating, storing and accessing the
content of files.

Logical Components of File


The logical components deal with the real-world objects the data represent. These are the
field, the record and the file. However, in today's information systems, files most often exist as
parts of a database, an organized collection of interrelated data.

a. Field
A field is the basic element of data. An individual field contains a single value, such
as an employee’s last name, a date, or the value of a sensor reading. It is characterized
by its length and data type (e.g., ASCII, string, decimal). Depending on the file design,
fields may be fixed length or variable length. In the latter case, the field often consists of
two or three subfields: the actual value to be stored, the name of the field, and, in some
cases, the length of the field. In other cases of variable-length fields, the length of the
field is indicated by the use of special demarcation symbols between fields.
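The two variable-length field encodings mentioned above can be sketched as follows; the field values are invented:

```python
import struct

# (a) Length subfield: a 1-byte length precedes the actual value, so the
# reader knows how many bytes to take.
def encode_length_prefixed(value: bytes) -> bytes:
    return struct.pack("B", len(value)) + value

blob = encode_length_prefixed(b"Ogun")
length = blob[0]
print(blob[1:1 + length])     # decode by reading the length subfield first

# (b) Demarcation symbol: fields separated by a special delimiter character.
record = b"Ogun|Kwara|Lagos"
print(record.split(b"|")[1])  # decode by splitting on the delimiter
```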

b. Record
A record is a collection of related fields that can be treated as a unit by some application
program. For example, an employee record would contain such fields as name,
identification number, job designation, date of employment, and so on. Again, depending
on design, records may be of fixed length or variable length. A record will be of variable
length if some of its fields are of variable length or if the number of fields may vary. In the
latter case, each field is usually accompanied by a fieldname. In either case, the entire
record usually includes a length field.

c. File
A file is a collection of related records. The file is treated as a single entity by
users and applications and may be referenced by name. Files have names and may be
created and deleted. Access control restrictions usually apply at the file level. That is, in
a shared system, users and programs are granted or denied access to entire files. In
some more sophisticated systems, such controls are enforced at the record or even the
field level.

Naming Files

Files provide a way to store information and read it back later. This must be done
in such a way as to shield the user from the details of how and where the information is
stored, and of how the disks actually work. When a process creates a file, it gives the file a
name. When the process terminates, the file continues to exist, and can be accessed by
other processes using its name.
The exact rules for file naming vary somewhat from system to system, but most
operating systems allow strings of one to eight letters as legal filenames. The file name
is chosen by the person creating it, usually to reflect its contents. There are few
constraints on the format of the filename: it can comprise the letters A-Z, the numbers 0-9
and the special characters $ # & + @ ! ( ) - { } ' ` _ ~ as well as space. The only symbols
that cannot be used to identify a file are * | < > \ ^ = ? / [ ] ' ; , plus control characters. One
point to note when choosing a file name is that different operating systems have different
rules, which can present problems when files are moved from one computer to another. For
example, Microsoft Windows is case insensitive, so files like MYEBOOKS, myebooks and
MyEbooks are all the same to Microsoft Windows.
However, under the UNIX operating system, all three would be different files because, in this
instance, file names are case sensitive.

Naming Convention
Usually a file name has two parts with a period (".") separating them. The part on the left
side of the period is called the main name, while the part on the right side is
called the extension. A good example of a filename is "course.doc": the main name is
course while the extension is doc. The file extension differentiates between different types
of files. We can have files with the same name but different extensions, so we
generally refer to a file by its name along with its extension, which together form the complete
file name.
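Splitting a complete file name into main name and extension can be sketched with the standard library:

```python
import os.path

# Split the complete file name "course.doc" from the example above
# into its main name and its extension.
main, ext = os.path.splitext("course.doc")
print(main)  # course
print(ext)   # .doc
```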

File Name Extension


A filename extension is a suffix to the name of a computer file, applied to indicate
the encoding convention or file format of its contents. In some operating systems (for
example UNIX) it is optional, while in some others (such as DOS) it is a requirement. The file
extension shows the type of file and the application that the operating system will use in
opening it. Some operating systems limit the length of the extension (such as DOS and
OS/2, to three characters) while others (such as UNIX) do not. Some operating systems
(for example RISC OS) do not use file extensions.
The following tables show examples of some common filename extensions, their content
and applications wherein they are used:

Table 1: Filename Extension of Textual files

Extension       Content                                            Application

.html           Hypertext Markup Language, the code of simple      Internet browsers such as
                web pages. It consists of a plain text file with   Internet Explorer, Mozilla
                embedded formatting instructions.                  Firefox and Opera.
.pdf            Portable Document Format, a document               Adobe Acrobat.
                presentation format.
.rtf            Rich Text Format, a document format that can       Any word processing
                be shared between different word processors.       application.
.txt            A plain and simple text file.                      Any word processing
                                                                   application.
.doc, .dot,     Word processing files created with popular         Microsoft Word (.doc),
.abw, .lwp      packages.                                          Microsoft Word template
                                                                   (.dot), AbiWord (.abw) and
                                                                   Lotus WordPro (.lwp).

Table 2: Filename Extension of Image Files

Extension       Content                                            Application

.gif            Graphics Interchange Format, the most common       Lview.
                graphics format, though not the most economical.
.jpg, .jpeg     Joint Photographic Experts Group, a 24-bit         Lview.
                graphics format.
.mpg, .mpeg     Moving Picture Experts Group, a standard           Windows Media Player,
                Internet movie platform.                           QuickTime.
.mov            QuickTime Movie, the Apple Macintosh native        Windows Media Player,
                movie platform.                                    QuickTime.

Table 3: Filename Extension of Sound Files

Extension       Content                                            Application

.mp3            Audio files on both PC and Macintosh.              Windows Media Player.
.wav            Audio files on PC.                                 Real Player.
.ra             Real Audio, a proprietary system for delivering
                and playing streaming audio on the web.
.aiff           Audio files on Macintosh.

Table 4: Filename Extension of Utility type programme

Extension       Content                                            Application

.ppt            A presentation file for slide shows.               Microsoft PowerPoint.
.xls, .123      Spreadsheet files.                                 Microsoft Excel, Lotus 1-2-3.
.mdb            A database file.                                   Microsoft Access.

Table 5: Filename Extension of other types of files

Extension       Content                                            Application

.dll            Dynamic Link Library, a compiled set of            A compiled system file that
                procedures and/or drivers called by another        should not be moved or
                program.                                           altered.
.exe            A DOS/Windows program.                             Downloaded and launched in
                                                                   its own temporary directory.
.zip, .sit,     Popular compression formats for the PC,            WinZip, ZipIt.
.tar            Macintosh and UNIX respectively.

File Attributes
The particular information kept for each file varies from operating system to operating
system. No matter what operating system one might be using, files always have certain
attributes or characteristics. Different file attributes are discussed as follows.

a. File Name
The symbolic file name is the only information kept in human-readable form. As is obvious, a file
name helps users to differentiate between various files.

b. File Type
A file type is required by systems that support different types of files. As discussed earlier, the
file type is part of the complete file name. We might have two different files, say "csc202.doc"
and "csc202.txt". The file type is therefore an important attribute which helps in differentiating
between files based on their types. File types indicate which application should be used to open
a particular file.

c. Location
This is a pointer to the device, and to the location of the file on that device. As is clear from the
attribute name, it specifies where the file is stored.

d. Size
The size attribute keeps track of the current size of a file in bytes, words or blocks. The size of a
file is measured in bytes. A floppy disk holds about 1.44 MB; a Zip disk holds 100 MB or 250 MB;
a CD holds about 800 MB; a DVD holds about 4.7 GB.

e. Protection
The protection attribute of a file keeps track of the access-control information that controls who
can do reading, writing, executing, and so on.

f. Usage Count
This value indicates the number of processes that are currently using (have opened) a particular
file.

g. Time, Date and Process Identification


This information may be kept for creation, last modification, and last use. Data provided by
these attributes is often helpful for protection and usage monitoring. Each process has its own
identification number, which contains information about the file hierarchy.

Attribute Values of File.


Files have attributes which vary considerably from system to system. The table below shows
some of the possibilities, but others also exist.

Table 6: Fields and various attribute values

FIELD MEANING
Protection Who can access the file and in what way?
Password Password needed to access the file
Creator Identity of the person who created the file
Owner Current owner
Read-only flag 0 for read/write, 1 for read only
Hidden flag 0 for normal, 1 for do not display in listing
System flag 0 for normal file, 1 for system file
Archive flag 0 for has been backed up, 1 for needs to be backed up
ASCII/binary file 0 for ASCII file, 1 for binary file
Random Access file 0 for sequential access only, 1 for random access.
Temporary flag 0 for normal, 1 for delete on process exit
Lock flags 0 for unlocked, nonzero for locked
Record length Number of bytes in a record
Key position Offset of the key within each record
Key length Number of bytes in the key field
Creation time Date and time file was created
Time of last access Date and time file was last accessed
Time of last change Date and Time file was last changed
Current size Number of bytes in the file
Maximum size Maximum size to which the file may grow
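A few of the attributes above can be read for a real file through `os.stat()`; which attributes exist varies by operating system, and the temporary file here is created only for the demonstration:

```python
import os
import tempfile
import time

# Create a throwaway file so we have something to inspect.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

st = os.stat(path)
print(st.st_size)                 # current size: number of bytes in the file
print(time.ctime(st.st_mtime))    # time of last change
print(oct(st.st_mode & 0o777))    # protection: who can access the file
os.unlink(path)                   # clean up the demonstration file
```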

- The first four attributes relate to the file's protection and tell who may access it and who
may not. All kinds of schemes are possible; in some systems the user must present a
password to access a file, in which case the password must be one of the attributes.

- The flags are bits or short fields that control or enable some specific properties. Hidden
files, for example, do not appear in listings of the files. The archive flag is a bit that keeps
track of whether the file has been backed up. The backup program clears it, and the
operating system sets it whenever a file is changed. In this way, the backup program can
tell which files need backing up. The temporary flag allows a file to be marked for
automatic deletion when the process that created it terminates.

- The record length, key position, and key length fields are only present in files whose
records can be looked up using a key. They provide the information required to find the
keys. The various times keep track of when the file was created, most recently accessed
and most recently modified. These are useful for a variety of purposes. For example, a
source file that has been modified after the creation of the corresponding object file
needs to be recompiled. These fields provide the necessary information.

- The current size tells how big the file is at present. Some mainframe operating systems
require the maximum size to be specified when the file is created, to let the operating
system reserve the maximum amount of storage in advance. Minicomputers and
personal computer systems are clever enough to do without this item.

A major disadvantage of file processing is that it leads to duplication of data. Applications
are developed independently in a file processing system, leading to unplanned duplicate files.
Duplication is wasteful, as it requires additional storage space, and changes in one file must be
made manually in all files. This also results in loss of data integrity. It is also possible that the
same data item may have different names in different files, or that the same name may be used
for different data items in different files.

FILE ORGANISATION AND ACCESS METHOD
File organisation refers to the logical structuring of the records as determined by the way
in which they are accessed. File organisation refers to the structure of a file in terms of its
components and how they are mapped onto the backing store. Data files are organized so as to
facilitate access to records and to ensure their efficient storage. A trade-off between these two
requirements generally exists: if rapid access is required, more storage must be expended to
make it possible (for example, by providing indexes to the data records). Access to a record for
reading it (and sometimes updating it) is the essential operation on data. Any given file
organization supports one or more file access methods. Organisation is thus closely related to
but conceptually distinct from access methods. An access method is any algorithm used for the
storage and retrieval of records from a data file, as determined by the structural characteristics
of the file on which it is used.

File Organisation Criteria


In choosing a file organisation, several criteria are important:
- Short access time
- Ease of update
- Economy of storage
- Simple maintenance
- Reliability
The relative priority of these criteria will depend on the applications that will use the file. For
example, if a file is only to be processed in batch mode, with all of the records accessed every
time, then rapid access for retrieval of a single record is of minimal concern. A file stored on
CD-ROM will never be updated, and so ease of update is not an issue. These criteria may
conflict. For example, for economy of storage, there should be minimum redundancy in the data.

FILE ORGANISATION METHODS


The number of alternative file organisations is unmanageably large, but we will consider four
fundamental organisations. Most structures used in actual systems either fall into one of these
categories or can be implemented with a combination of these organisations:
- The serial or pile file
- The sequential file
- The indexed sequential file
- The direct or hashed file
I. The Serial File
These are files of unordered records. Data are collected in the order in which they arrive. When
records are received, they are stored in the next available storage position. The purpose of the
serial file is simply to accumulate the mass of data and save it. Records may have different
fields, or similar fields in different orders. Because there is no structure to the serial file, record
access is by exhaustive search. That is, if we wish to find a record that contains a particular field
with a particular value, it is necessary to examine each record in the pile until the desired record
is found or the entire file has been searched. If we wish to find all records that contain a
particular field, or contain that field with a particular value, then the entire file must be searched.
The serial file allows quick insertion, since there is no particular ordering: records can easily be
appended to the end of the file, and the file is easy to update. Serial files can be used as
temporary files to store transaction data; records on tape are one example. Beyond these limited
uses, however, this type of file is unsuitable for most applications.
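Record access in a serial (pile) file can be sketched as an exhaustive search; the records below are invented:

```python
# Sketch of access to a serial (pile) file: examine each record in
# arrival order until the desired value is found.
pile = [{"id": 7, "v": "a"}, {"id": 3, "v": "b"}, {"id": 9, "v": "c"}]

def find(pile, key, value):
    for record in pile:              # no ordering to exploit
        if record.get(key) == value:
            return record
    return None                      # entire file searched, record not found

pile.append({"id": 1, "v": "d"})     # quick insertion: append at the end
print(find(pile, "id", 9)["v"])
```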

II. The Sequential File


The most common form of file structure is the sequential file. In this file organisation, a fixed
format is used for records. All records are of the same length, consisting of the same number of
fixed-length fields in a particular order. Insertion is done by batch update. A temporary
unsorted file (transaction file) is used to hold the records to be inserted; it is then merged into
the sorted file periodically. One particular field, usually the first field in each record, referred to as
the key field uniquely identifies the record; thus key values for different records are always
different. Further, the records are stored in key sequence: alphabetical order for a text key, and
numerical order for a numerical key.
Sequential files are typically used in batch applications and are generally optimum for
such applications if they involve the processing of all the records (e.g., a billing or payroll
application). The sequential file organisation is the only one that is easily stored on tape as well
as on disk. For interactive applications that involve queries and/or updates of individual records, the
sequential file provides poor performance. Additions to the file also present problems. Typically,
a sequential file is stored in simple sequential ordering of the records within blocks. That is, the
physical organisation of the file on tape or disk directly matches the logical organisation of the
file. In this case, the usual procedure is to place new records in a separate pile file, called a log
file or transaction file.

Periodically, a batch update is performed that merges the log file with the master file to
produce a new file in correct key sequence.
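The periodic batch update can be sketched as a merge of two key-ordered files. This is an illustrative sketch only; records are simplified to (key, data) tuples and the sample data is invented.

```python
# Merge a sorted master file with a sorted transaction (log) file,
# producing a new master file in correct key sequence.
def merge_master(master, log):
    result, i, j = [], 0, 0
    while i < len(master) and j < len(log):
        if master[i][0] <= log[j][0]:   # take the smaller key first
            result.append(master[i]); i += 1
        else:
            result.append(log[j]); j += 1
    result.extend(master[i:])           # append whichever file has records left
    result.extend(log[j:])
    return result

master = [(10, "A"), (30, "C"), (50, "E")]
log = [(20, "B"), (60, "F")]            # records collected since the last merge
```

Each file is read once in order, which is why the merge suits batch processing on tape as well as disk.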

III. The Indexed Sequential File


A popular approach to overcoming the disadvantages of the sequential file is the indexed
sequential file. These are sorted data files with an associated index. The index indicates the
block containing the record with a given value of the key field, much as the index at the
end of a book gives easy access to the contents, and it supports random access to the
file. The indexed sequential file maintains the key characteristic of the sequential file:
records are organised in sequence based on a key field. Two features are added:
 an index to the file to support random access, and
 an overflow file.
The index provides a lookup capability to quickly reach the vicinity of a desired record.
The overflow file is similar to the log file used with a sequential file but is integrated so that a
record in the overflow file is located by following a pointer from its predecessor record. In the
simplest indexed sequential structure, a single level of indexing is used. The index in this case is
a simple sequential file. Each record in the index file consists of two fields: a key field, which is
the same as the key field in the main file, and a pointer into the main file. To find a specific
record, the index is searched to find the highest key value that is equal to or precedes the
desired key value. The search continues in the main file at the location indicated by the pointer.
Additions to the file are handled in the following manner: Each record in the main file
contains an additional field not visible to the application, which is a pointer to the overflow file.
When a new record is to be inserted into the file, it is added to the overflow file. The record in
the main file that immediately precedes the new record in logical sequence is updated to contain
a pointer to the new record in the overflow file. If the immediately preceding record is itself in the
overflow file, then the pointer in that record is updated. As with the sequential file, the indexed
sequential file is occasionally merged with the overflow file in batch mode.
The indexed sequential file greatly reduces the time required to access a single record,
without sacrificing the sequential nature of the file. To process the entire file sequentially, the
records of the main file are processed in sequence until a pointer to the overflow file is found,
then accessing continues in the overflow file until a null pointer is encountered, at which time
accessing of the main file is resumed where it left off. It allows sequential processing and

individual record retrieval through the index. The disadvantage is that creating and maintaining
the index table adds overhead.
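The single-level index lookup described above can be sketched as follows. This is an assumption-laden illustration: the index entries, block contents, and keys are invented, and `bisect` is used to find the highest index key that is equal to or precedes the search key.

```python
import bisect

# Single-level indexed sequential lookup: the index maps the lowest key in
# each block to that block's number; find the highest index key <= the
# search key, then scan only that block sequentially.
index = [(10, 0), (40, 1), (70, 2)]                # (key, block number)
blocks = {
    0: [(10, "A"), (20, "B"), (30, "C")],
    1: [(40, "D"), (50, "E"), (60, "F")],
    2: [(70, "G"), (80, "H")],
}

def lookup(key):
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1       # highest key <= search key
    if pos < 0:
        return None                                # key precedes whole file
    for k, data in blocks[index[pos][1]]:          # short sequential scan
        if k == key:
            return data
    return None
```

Only one block is scanned per lookup, which is the access-time saving the index provides over a pure sequential search.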

IV. The Direct or Hashed File


The direct or hashed file exploits the capability found on disks to access directly any block of
a known address. As with sequential and indexed sequential files, a key field is required in each
record. However, there is no concept of sequential ordering here. A unique address (location) is
produced for each record by applying a formula or algorithm (a hash function) to the key
field, a technique known as key transformation. Direct files are often used where very rapid access
i.e. fast response is required, where fixed length records are used, and where records are
always accessed one at a time. Examples are directories, pricing tables, schedules, and name
lists.
Records in a direct file are not stored physically one after the other; rather, they are stored
on a disk at a particular address or location that determines their position. The file allows
programs to read and write records rapidly in no particular order. Direct access is based on
the disk model, since a disk allows random access to any file block. The advantage of this
method is that it is very efficient where very rapid access is required, e.g. for queries on a
database, and it supports random access. The disadvantage is that, with no index or
sequential ordering, searching for records by anything other than the key field requires
examining the whole file.
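Key transformation can be sketched with division-remainder hashing, one common choice. The table size, keys, and bucket layout below are illustrative assumptions, not part of the original notes.

```python
# A direct (hashed) file sketch: the hash function maps the key field
# straight to a bucket (block) address, so no index or ordering is needed.
NUM_BUCKETS = 7                           # illustrative table size

def bucket_for(key):
    return key % NUM_BUCKETS              # key transformation

buckets = {i: [] for i in range(NUM_BUCKETS)}

def insert(key, data):
    buckets[bucket_for(key)].append((key, data))

def fetch(key):
    for k, data in buckets[bucket_for(key)]:   # only one bucket examined
        if k == key:
            return data
    return None

insert(1001, "Ada")
insert(1008, "Bayo")                      # 1001 and 1008 collide (both mod 7 == 0)
```

Note that two keys can hash to the same bucket, so a fetch may still scan a short chain within the bucket; real systems pair the hash with an overflow-handling scheme.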

FILE MANAGEMENT SYSTEM, OPERATIONS AND DIRECTORIES.


I. FILE MANAGEMENT SYSTEM (FMS)
Files are normally organized into directories to ease their use. When multiple users
have access to files, it may be desirable to control by whom and in what ways files may
be accessed. File management is the processes concerned with the overall
management of files. A file management system is that set of system software that
provides services to users and applications in the use of files.
The file management system is the subsystem of an operating system that
manages the data storage organisation on secondary storage, and provides services to
processes related to their access. In this sense, it interfaces the application programs
with the low-level media-I/O (e.g. disk I/O) subsystem, freeing the application
programmers from having to deal with low-level intricacies and allowing them to
implement I/O using convenient data-organisational abstractions such as files and
records. On the other hand, the FMS services often are the only ways through which
applications can access the data stored in the files, thus achieving an encapsulation of
the data themselves which can be usefully exploited for the purposes of data protection,
maintenance and control. Typically, the only way that a user or application may access
files is through the file management system. This relieves the user or programmer of the
necessity of developing special-purpose software for each application and provides the
system with a consistent, well-defined means of controlling its most important asset.

Objectives of File Management System


The following are the objectives of file management:
 Data Management. An FMS should provide data management services to
applications through convenient abstractions, simplifying the common operations
involved in data access and modification and making them device-independent.
 Generality with respect to storage devices. The FMS data abstractions and
access methods should remain unchanged irrespective of the devices involved in
data storage.
 Validity. An FMS should guarantee that at any given moment the stored data
reflect the operations performed on them, regardless of the time delays involved in
actually performing those operations. Appropriate access synchronization
mechanism should be used to enforce validity when multiple accesses from
independent processes are possible.
 Protection. Illegal or potentially dangerous operations on the data should be
controlled by the FMS by enforcing a well-defined data protection policy.
 Concurrency. In multiprogramming systems, concurrent access to the data
should be allowed with minimal differences with respect to single-process access,
save for access synchronization enforcement.
 Performance. The above functionality should be offered while achieving a good
compromise between data access speed and data transfer rate.

Functions of File Management
With respect to meeting user requirements, the extent of such requirements
depends on the variety of applications and the environment in which the computer
system will be used. Therefore, a file management system should ensure that:

 Each user should be able to create, delete, read, write, and modify files.
 Each user may have controlled access to other users’ files.
 Each user may control what types of accesses are allowed to the user’s files.
 Each user should be able to restructure the user’s files in a form appropriate to the
problem.
 Each user should be able to move data between files.
 Each user should be able to back up and recover the user’s files in case of
damage.
 Each user should be able to access his or her files by name rather than by
numeric identifier.

File System Architecture

User Program
    |
Access Methods: Pile | Sequential | Indexed Sequential | Indexed | Hashed
    |
Logical I/O
    |
Basic I/O Supervisor
    |
Basic File System
    |
Disk Device Driver | Tape Device Driver

(Software layers of the file system, from highest to lowest.)

 Device Drivers
At the lowest level, device drivers communicate directly with peripheral devices.
Drivers are special software programs that operate specific devices that can be either crucial or
optional to the functioning of the computer. Drivers help operate keyboards, printers, DVD
drives, etc.

The device driver is responsible for starting I/O operations on a device and processing the
completion of an I/O request. The typical devices controlled are disk and tape drives. Device
drivers are usually considered to be part of the operating system.

 Basic File System


The next level is referred to as the basic file system. This is the primary interface
with the environment outside of the computer system. It deals with blocks of data that
are exchanged with disk or tape systems. Thus, it is concerned with the placement of
those blocks on the secondary storage device and on the buffering of those blocks in
main memory. The basic file system is often considered part of the operating system.

 Basic I/O Supervisor


The basic I/O supervisor is responsible for all file I/O initiation and termination. The
basic I/O supervisor selects the device on which file I/O is to be performed, based on the
particular file selected. It is also concerned with scheduling disk and tape accesses to
optimize performance. I/O buffers are assigned and secondary memory is allocated at
this level. The basic I/O supervisor is part of the operating system.

 Logical I/O
Logical I/O enables users and applications to access records. Logical I/O provides a
general-purpose record I/O capability and maintains basic data about files. The level of
the file system closest to the user is often termed the access method. It provides a
standard interface between applications and the file systems and devices that hold the
data. Different access methods reflect different file structures and different ways of
accessing and processing the data.

II. OPERATIONS SUPPORTED BY FILE MANAGEMENT SYSTEM


Users and applications make use of files. The following are operations supported by file
management system.
 Retrieve _All
This operation retrieves all the records of a file. This will be required for an application
that must process all of the information in the file at one time. For example, an
application that produces a summary of the information in the file would need to retrieve
all records. This operation is often equated with the term sequential processing, because
all of the records are accessed in sequence.

 Retrieve _One
This requires the retrieval of just a single record. Interactive, transaction-oriented
applications need this operation.
 Retrieve _Next
This requires the retrieval of the record that is “next” in some logical sequence to the
most recently retrieved record. Some interactive applications, such as filling in forms,
may require such an operation. A program that is performing a search may also use this
operation.

 Retrieve _Previous
This is similar to Retrieve_Next, but in this case the record that is “previous” to the
currently accessed record is retrieved.
 Insert _One
Insert a new record into the file. It may be necessary that the new record fit into a
particular position to preserve a sequencing of the file.

 Delete_One
Delete an existing record. Certain linkages or other data structures may need to be
updated to preserve the sequencing of the file.
 Update_One
This operation retrieves a record, updates one or more of its fields, and rewrites the
updated record back into the file. If the length of the record has changed, the update
operation is generally more difficult than if the length is preserved.
 Retrieve_Few
This retrieves a number of records. For example, an application or user may wish to
retrieve all records that satisfy a certain set of criteria. The nature of the operations that
are most commonly performed on a file will influence the way the file is organized as
previously discussed.
III. FILE DIRECTORIES
Concept of File Directory
To keep track of files, the file system normally provides directories, which, in many
systems are themselves files. The structure of the directories and the relationship among
them are the main areas where file systems tend to differ.
Associated with any file management system and collection of files is a file directory.
The directory contains information about the files, including attributes, location,
and ownership. Much of this information, especially that concerning storage, is
managed by the operating system. The directory is itself a file, accessible by various file
management routines.

The Contents of File Directory


From the user’s point of view, the directory provides a mapping between file
names, known to users and applications, and the files themselves. Thus, each file entry
includes the name of the file. An important category of information about each file
concerns its storage, including its location and size. In shared systems, it is also
important to provide information that is used to control access to the file. Typically, one
user is the owner of the file and may grant certain access privileges to other users.
Usage information is needed to manage the current use of the file and to record the
history of its usage.

File Directory Structure


The following are the schemes for defining the logical structure of a directory.
 Single-Level Directory
 Two-Level Directory
 Tree-Structured Directory
 Acyclic Graph Directory

1. Single-Level Directory
In a single-level directory system, all the files are placed in one directory. This is very
common on single-user operating systems. A single-level directory has significant

limitations when the number of files increases or when there is more than one user.
Since all files are in the same directory, they must have unique names. If there are two
users who call their data file “CSC202note.doc”, then the unique-name rule is violated.

[Figure: a single-level directory, with one directory entry for each file]

2. Two-Level Directory
In the two-level directory system, the system maintains a master block that has one
entry for each user. This master block contains the addresses of the directory of the
users. This structure effectively isolates one user from another. This design eliminates
name conflicts among users and this is an advantage because users are completely
independent, but a disadvantage when the users want to cooperate on some task and
access files of other users. Some systems simply do not allow local files to be accessed
by other users. It is also unsatisfactory for users with many files because it is quite
common for users to want to group their files together in a logical way.

[Figure: a two-level directory, with a master directory pointing to each user's directory]

3. Tree-Structured Directories
In a tree-structured directory, directories themselves are considered as files. This
leads to the possibility of having sub-directories that can contain files and sub-
subdirectories. An important issue in a tree-structured directory structure is how to
handle the deletion of a directory. If a directory is empty, its entry in its containing
directory can simply be deleted. However, suppose the directory to be deleted is not
empty, but contains several files or sub-directories then it becomes a bit problematic.
Some systems will not delete a directory unless it is empty. Thus, to delete a directory,
someone must first delete all the files in that directory. If there are any subdirectories,
this procedure must be applied recursively to them so that they can be deleted too.

4. Acyclic-Graph Directories
The acyclic directory structure is an extension of the tree-structured directory structure.
Unlike in the tree-structured directory where files and directories are owned by one
particular user, the acyclic structure takes away this prohibition and thus a directory or
file under directory can be owned by several users.

PATH NAMES
When a file system is organized as a directory tree, some way is needed for specifying
the filenames. Any file in the system can be located by following a path from the root or
master directory down various branches until the file is reached. The series of directory
names, culminating (ending) in the file name itself, constitutes a pathname for the file.
Two different methods commonly used are:
- Absolute Path name
- Relative Path name
 Absolute Path Name
With this path name, each file is given a path consisting of the path from the root
directory to the file. As an example, the file in the lower left hand corner of the Figure
below has the pathname User_B/Word/Unit_A/ABC. The slash is used to delimit names
in the sequence. The name of the master directory is implicit, because all paths start at
that directory. Note that it is perfectly acceptable to have several files with the same file
name, as long as they have unique pathnames, which is equivalent to saying that the

same file name may be used in different directories. In this example, there is another file
in the system with the file name ABC, but that has the pathname User_B/Draw/ABC.

Note that absolute file names always start at the root directory and are unique. In UNIX
the file components of the path are separated by /. In MS-DOS the separator is \. In
MULTICS it is >.

 Relative Path Name


Although the pathname facilitates the selection of file names, it would be awkward for
a user to have to spell out the entire pathname every time a reference is made to a file.
Typically, an interactive user or a process has associated with it a current directory, often
referred to as the working directory or current directory. Files are then referenced

relative to the working directory. For example, if the working directory for user B is
“Word,” then the pathname Unit_A/ABC is sufficient to identify the file in the lower left-
hand corner of the above figure.

Operations on Files and Directories


The operating system provides systems calls to create, write, read, reposition,
truncate and delete files. The following are various operations that can take place on
file:
a. Creating a File
When creating a file, a space in the file system must be found for the file and then an
entry for the new file must be made in the directory. The directory entry records the
name of the file and the location in the file system.
b. Opening a File
Before using a file, a process must open it. The purpose of the OPEN call is to allow the
system to fetch the list of secondary storage disk addresses into main memory for rapid
access of file.
c. Closing a File
When all the accesses are finished, the file should be closed to free up internal table
space. Many systems encourage this by imposing a maximum number of open files on
processes.
d. Writing a File
To write a file, a system call is made specifying both the name of the file and the
information to be written to the file. Given the name of the file, the system searches the
directory to find the location of the file. The directory entry will need to store a pointer to
the current block of the file (usually the beginning of the file). The write pointer must be
updated after each write so that successive writes store a sequence of blocks to
the file.

e. Reading a File
To read a file, a system call is made that specifies the name of the file and where (in
memory) the next block of the file should be put. Again, the directory is searched for the

associated directory entry, and the directory will need a pointer to the next block to be
read. Once the block is read, the pointer is updated.

f. Deleting a File
To delete a file, the directory is searched for the named file. Having found the associated
directory entry, the space allocated to the file is released (so it can be reused by other
files) and the directory entry is invalidated.

g. Renaming a File
It frequently happens that a user needs to change the name of an existing file. This system
call makes that possible. It is not always strictly necessary, because the file can always
be copied to a new file with the new name, and the old file then deleted.

h. Appending a File
This call is a restricted form of WRITE call. It can only add data to the end of the file.
Systems that provide a minimal set of system calls do not generally have APPEND.

i. List a Directory
We need to list the files in a directory and the contents of the directory entry for each file
in the list.
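The operations above can be sketched with Python's file calls, which wrap the underlying system calls (OPEN, WRITE, APPEND, and so on). The file names below are invented for the example, and a temporary scratch directory is used so the sketch is self-contained.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()                       # scratch directory
path = os.path.join(workdir, "demo.txt")

with open(path, "w") as f:                         # create + open + write
    f.write("first line\n")

with open(path, "a") as f:                         # append: data added at the end only
    f.write("second line\n")

with open(path) as f:                              # open + read
    contents = f.read()

new_path = os.path.join(workdir, "renamed.txt")
os.rename(path, new_path)                          # rename in place
listing = os.listdir(workdir)                      # list the directory
os.remove(new_path)                                # delete: space is freed for reuse
```

The `with` blocks close each file automatically, freeing the internal table space the notes mention; rename here avoids the copy-then-delete workaround the text describes.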

FILE ALLOCATION.
The main purpose of a computer system is to execute programs. Those programs
together with the data they access must be in main memory during execution. Since main
memory is usually too small to accommodate all the data and programs permanently, the
computer system must provide secondary storage to back up main memory. Most modern
computer systems use disk as the primary storage medium for information (both programs and
data). We now discuss how files are allocated to disk storage. In allocating disk
space, the following issues are involved:
- When a new file is created, is the maximum space required for that file allocated all at
once?
- Should the space allocated to a file be one or more contiguous units/portions? A portion
is a contiguous set of allocated blocks. The size of a portion can range from a single
block to the entire file. What should the size of portion allocated for a file be?

- What sort of data structure or table is used to keep track of the portions assigned to a
file? An example of such a structure is a file allocation table (FAT)

Pre-allocation versus Dynamic Allocation.


A pre-allocation policy requires that the maximum size of a file be declared at the time of
the file creation request. In a number of cases, such as program compilations, production of
summary data files, or the transfer of a file from another system over a communication network,
this value (size of a file) can be reliably estimated.
However, for many applications, it is difficult if not impossible to estimate reliably the
maximum potential size of a file. In those cases, users and application programmers would tend
to overestimate file size so as not to run out of space. This is wasteful from the point of view of
secondary storage allocation. Dynamic allocation, by contrast, allocates space to a file in
portions as needed.

Portion Size.
The second issue as listed above is that of the size of the portion allocated to a file. At one
extreme, a portion large enough to hold the entire file is allocated. At the other extreme, space
on the disk is allocated one block at a time as the need arises. In choosing a portion size, there
is a tradeoff between efficiency from the point of view of a single file versus overall system
efficiency. A list of some items to be considered in the tradeoff is:
- Contiguity of space increases performance, especially for Retrieve_Next operations, and
greatly for transactions running in a transaction-oriented operating system
- Having a large number of small portions increases the size of tables needed to manage
the allocation information.
- Having fixed-size portions (for example, blocks) simplifies the reallocation of space.
- Having small fixed-size portions minimizes waste of unused storage due to over-
allocation.

File Allocation Methods


The main issue here is how to allocate space to files so that disk space is utilised
effectively and file can be accessed quickly. Three major methods of allocating disk space are;
contiguous, linked/chained and indexed.
1. Contiguous Allocation:-With contiguous allocation, a single contiguous set of
blocks is allocated to a file at the time of file creation. This is a pre-allocation strategy,

using variable-size portions. The file allocation table needs just a single entry for each
file, showing the starting block and the length of the file. Contiguous allocation is the best
from the point of view of the individual sequential file. Multiple blocks can be read in at a
time to improve I/O performance for sequential processing. It is also easy to retrieve a
single block. Note that, with pre-allocation, it is necessary to declare the size of the file at
the time of creation. Fragmentation is a problem in this situation. Compaction of the free
space is needful from time to time.

Contiguous File Allocation.

Contiguous File Allocation (After Compaction)

2. Linked/Chained Allocation: In Chained allocation, allocation is on an individual
block basis. Each block contains a pointer to the next block in the chain. Again, the file
allocation table needs just a single entry for each file, showing the starting block and the
length of the file. Although pre-allocation is possible, it is more common simply to allocate
blocks as needed. Any free block can be added to a chain. There is no external
fragmentation in this case because only one block at a time is needed. To select an
individual block of a file requires tracing through the chain to the desired block. One
consequence of linking/chaining is that there is no accommodation of the principle of
locality.
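Chained allocation can be sketched like this. The block numbers are invented, and using -1 as the end-of-chain marker is an assumption of the sketch.

```python
# Chained allocation: the FAT gives only the starting block; each block
# stores a pointer to the next, so reaching logical block i means tracing
# the chain i times. -1 marks the end of the chain.
next_block = {4: 7, 7: 2, 2: -1}                   # disk block -> next in chain
fat = {"FileC": 4}                                 # file -> starting block

def block_of(name, logical_block):
    block = fat[name]
    for _ in range(logical_block):                 # follow the chain
        block = next_block[block]
        if block == -1:
            raise IndexError("block outside file")
    return block
```

Compare this with contiguous allocation: here finding block i costs i pointer hops rather than one addition, which is the loss of locality the text refers to.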

3. Indexed Allocation:-Indexed allocation addresses many of the problems of


contiguous and linked/chained allocation. In this case, the file allocation table contains a
separate one-level index for each file; the index has one entry for each portion allocated to the
file. Typically, the file indexes are not physically stored as part of the file allocation table. Rather,
the file index for a file is kept in a separate block and the entry for the file in the file allocation
table points to that block.
Allocation by blocks eliminates external fragmentation, whereas allocation by variable-size
portions improves locality (the blocks of a file are kept physically close together). In either case, file
consolidation may be done from time to time. File consolidation reduces the size of the index in
the case of variable-size portions, but not in the case of block allocation. Indexed allocation
supports both sequential and direct access to the file and thus is the most popular form of file
allocation.
File Allocation Table (FAT): is a table that shows the starting position of files in blocks as well
as the length of files.
Compaction: is a process of shuffling the memory content to place all free spaces together in
one large block.

BLOCKING OF RECORD.
File is a body of stored data or information in an electronic format. Almost all information
stored on computers is in the form of files. Files reside on mass storage devices such as hard
disks, optical disks, magnetic tapes, and floppy disks. When the Central Processing Unit (CPU)
of a computer needs data from a file, or needs to write data to a file, it temporarily stores the file
in its main memory, or Random Access Memory (RAM), while it works on the data.
A file consists of a collection of blocks and the operating system is responsible for
allocating blocks to files.
Records are the logical unit of access of a structured file, whereas blocks are the unit of I/O with
secondary storage. For I/O to be performed, records must be organized as blocks. On most
systems, blocks are of fixed length. This simplifies I/O, buffer allocation in main memory, and
the organisation of blocks on secondary storage. The larger the block, the more records that
are passed in one I/O operation. If a file is being processed or searched sequentially, this is an
advantage, because the number of I/O operations is reduced by using larger blocks, thus
speeding up processing. On the other hand, if records are being accessed randomly and no
particular locality of reference is observed, then larger blocks result in the unnecessary transfer
of unused records.
However, we can say that the I/O transfer time is reduced by using larger blocks, but a
competing concern is that larger blocks require larger I/O buffers, making buffer management
more difficult. (A buffer is a temporary storage area for data being manipulated or processed.)

Methods of Blocking
There are three methods of blocking namely; fixed, variable-length spanned and variable-length
unspanned.

a. Fixed Blocking
It uses fixed-length records, and an integral number of records are stored in a block. There may
be unused space at the end of each block; this is referred to as internal fragmentation. (By
contrast, file fragmentation is a condition in which files are broken apart on disk into small,
physically separated segments.)

b. Variable-Length Spanned Blocking


Variable-length records are used and are packed into blocks with no unused space. Thus, some
records must span two blocks, with the continuation indicated by a pointer to the successor
block.

c. Variable-Length Unspanned Blocking


Variable-length records are used, but spanning is not employed. There is wasted space in most
blocks because of the inability to use the remainder of a block if the next record is larger than
the remaining unused space.
NB: Variable-length spanned blocking is efficient for storage and does not limit the size of
records. However, this technique is difficult to implement. Records that span two blocks require
two I/O operations, and files are difficult to update, regardless of the organisation. Variable-
length unspanned blocking results in wasted space.
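The fixed-blocking arithmetic can be sketched with a small calculation; the block and record sizes below are illustrative.

```python
# Fixed blocking: an integral number of fixed-length records fits per block,
# and the leftover bytes in each block are internal fragmentation.
BLOCK_SIZE = 512
RECORD_SIZE = 120

records_per_block = BLOCK_SIZE // RECORD_SIZE              # how many fit
wasted_per_block = BLOCK_SIZE - records_per_block * RECORD_SIZE  # internal fragmentation

def blocks_needed(num_records):
    # ceiling division: the last block may be only partly filled
    return -(-num_records // records_per_block)
```

With these sizes, four 120-byte records fit in each 512-byte block and 32 bytes per block are wasted, showing the internal fragmentation that variable-length spanned blocking avoids.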

File Space Management
Files are normally stored on disk, so management of disk space is a major concern to the
file system designers. To keep track of free disk space, the system maintains a free space list.
The free space list records all disk blocks that are free – those that are not allocated to some file
or directory.
To create a file, the system searches the free space and allocates that space to the new file.
This space is then removed from the free-space list. When a file is deleted, its disk space is
added to the free-space list.

Techniques Used in Space Management
There are four different techniques for managing free space. They are:
 Bit tables
 Chained free portion
 Indexing
 Free block list

i. Bit Tables
This method uses a vector containing one bit for each block on the disk. Each entry of a 0
corresponds to a free block, and each 1 corresponds to a block in use. For example, for the disk
layout shown below, a vector of length 35 is needed and would have the following value:
00111000011111000011111111111011000

A bit table has the advantage that it is relatively easy to find one or a contiguous group of free
blocks. Thus, a bit table works well with any of the file allocation methods outlined. Another
advantage is that it is as small as possible. However, it can still be sizeable. The amount of
memory (in bytes) required for a block bitmap is

    disk size in bytes / (8 x file system block size)

Thus, for a 16-Gbyte disk with 512-byte blocks, the bit table occupies about 4 Mbytes
(16 x 2^30 / (8 x 512) = 2^22 bytes).
Accordingly, most file systems that use bit tables maintain auxiliary data structures that
summarise the contents of subranges of the bit table. For example, the table could be divided
logically into a number of equal-size sub-ranges. A summary table could include, for each sub-
range, the number of free blocks and the maximum-sized contiguous number of free blocks.
When the file system needs a number of contiguous blocks, it can scan the summary table to
find an appropriate sub-range and then search that sub-range.
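A bit table can be sketched directly from the 35-block vector given above, together with the bitmap-size formula. This is an illustrative sketch; the linear `first_free` scan stands in for the sub-range summary tables a real system would use.

```python
# Free-space bit table: bit i is 0 when block i is free, 1 when in use,
# matching the 35-bit vector from the text.
bit_table = [int(b) for b in "00111000011111000011111111111011000"]

def first_free():
    for i, bit in enumerate(bit_table):
        if bit == 0:
            return i                       # index of the first free block
    return None                            # disk full

def bitmap_bytes(disk_bytes, block_size):
    # one bit per block: disk size / (8 * block size)
    return disk_bytes // (8 * block_size)
```

The size formula confirms the worked figure in the text: a 16-Gbyte disk with 512-byte blocks needs a 4-Mbyte bitmap.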

ii. Linked List/Chained Free Portions


The free portions may be chained together by using a pointer and length value in each free
portion. This method has negligible space overhead because there is no need for a disk
allocation table, merely for a pointer to the beginning of the chain and the length of the first
portion. This method is suited to all of the file allocation methods. If allocation is a block at a
time, simply choose the free block at the head of the chain and adjust the first pointer or length
value. The headers from the portions are fetched one at a time to determine the next suitable
free portion in the chain. This method has its own problems. After some use, the disk will
become quite fragmented and many portions will be a single block long. Also, note that every
time you allocate a block, you need to read the block first to recover the pointer to the new first
free block before writing data to that block. If many individual blocks need to be allocated at one
time for a file operation, this greatly slows file creation. Similarly, deleting highly fragmented
files is very time consuming.

iii. Indexing
The indexing approach treats free space as a file and uses an index table as described under
file allocation. There is one entry in the table for every free portion on the disk. This approach
provides efficient support for all of the file allocation methods.

iv. Free Block List


In this method, each block is assigned a number sequentially and the list of the numbers of all
free blocks is maintained in a reserved portion of the disk. Although the free block list is too
large to be stored in main memory, there are two effective techniques for storing a small part of
the list in main memory.

a. The list can be treated as a push-down stack (LIFO) with the first few thousand elements of
the stack kept in main memory. When a new block is allocated, it is popped from the top of the
stack, which is in main memory. Similarly, when a block is de-allocated, it is pushed onto the
stack. There has to be a transfer between disk and main memory when the in-memory portion of
the stack becomes either full or empty. Thus, this technique gives almost zero-time access most
of the time.

b. The list can be treated as a FIFO queue, with a few thousand entries from both the head and
the tail of the queue in main memory. A block is allocated by taking the first entry from the head
of the queue and de-allocated by adding it to the end of the tail of the queue. There only has to
be a transfer between disk and main memory when either the in-memory portion of the head of
the queue becomes empty or the in-memory portion of the tail of the queue becomes full.

File System Performance


Access to disk is much slower than access to memory. Reading a memory word typically takes
a few hundred nanoseconds at most. Reading a disk block takes tens of milliseconds, a factor of
100,000 slower. As a result of this difference in access time, many file systems have been
designed to reduce the number of disk accesses needed.

Methods of Improving Performance

a. Block Caching
The most common technique used to reduce disk accesses is the block cache. (Cache means
to hide.) In this context, a cache is a collection of blocks that logically belong to the disk, but are
being kept in memory for performance reasons. Various algorithms can be used to manage the
cache, but a common one is to check all read requests to see if the needed block is in the
cache. If it is, the read request can be satisfied without a disk access. If the block is not in the
cache, it is first read into the cache, and then copied to wherever it is needed. Subsequent
requests for the same block can be satisfied from the cache.
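
A minimal sketch of such a block cache, using least-recently-used replacement as one possible management algorithm (the disk is simulated by a dictionary, and the capacity and block contents are invented):

```python
# Sketch of a block cache: read requests are checked against an
# in-memory cache first; only misses go to the (simulated) disk.
from collections import OrderedDict

class BlockCache:
    def __init__(self, disk, capacity=3):
        self.disk = disk                   # dict: block number -> data
        self.cache = OrderedDict()
        self.capacity = capacity
        self.disk_reads = 0                # counts real disk accesses

    def read(self, block):
        if block in self.cache:            # hit: no disk access needed
            self.cache.move_to_end(block)
            return self.cache[block]
        self.disk_reads += 1               # miss: read block from disk
        data = self.disk[block]
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False) # evict least recently used
        return data

disk = {n: "data%d" % n for n in range(10)}
c = BlockCache(disk)
c.read(1); c.read(2); c.read(1)   # second read of block 1 is a hit
```

After the three reads above, only two disk accesses have occurred; the repeated request was satisfied from the cache.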

b. Reduction in Disk Motion


Another important technique is to reduce the amount of disk arm motion by putting blocks that
are likely to be accessed in sequence close to each other, preferably in the same cylinder.
When an output file is written, the file system has to allocate the blocks one at a time, as they are
needed. If the free blocks are recorded in a bit map, and the whole bit map is in the main
memory, it is easy enough to choose a free block as close as possible to the previous block.

File System Reliability
Destruction of a file system is often a far greater disaster than destruction of a computer.
If a computer is destroyed by fire, lightning surges, or a cup of coffee poured onto the keyboard, it
is annoying and will cost money, but generally a replacement can be purchased with a minimum
of fuss. Inexpensive personal computers can even be replaced within a few hours. If a
computer file system is irrevocably lost, whether due to hardware, software or any other means,
restoring all the information will be difficult, time consuming, and in many cases, impossible. For
the people whose programs, documents, customer files, tax records, databases, marketing
plans, or other data are gone forever, the consequences can be catastrophic. While the file
system cannot offer any protection against any physical destruction of the equipment and
media, it can help protect the information.

Techniques to Improve Reliability

a. Bad Block Management


Disks often have bad blocks. Floppy disks are generally perfect when they leave the factory, but
they can develop bad blocks during use. Hard disks frequently have bad blocks from the start: it
is just too expensive to manufacture them completely free of all defects. In fact, most hard disk
manufacturers supply with each drive a list of the bad blocks their tests have discovered. Two
solutions to the bad block problems are used, one hardware and the other software.
 The hardware solution is to dedicate a sector on the disk to the bad block list. When the
controller is first initialized, it reads the bad block list and picks a spare block (or track) to
replace the defective ones, recording the mapping in the bad block list.

 Software solution requires the user or file system to carefully construct a file containing
all the bad blocks. This technique removes them from the free list, so they will never
occur in data files. As long as the bad block file is never read or written, no problem will
arise. Care has to be taken during disk backups to avoid reading this file.
b. Backups
Even with a clever strategy for dealing with bad blocks, it is important to back up files frequently.
After all, automatically switching to a spare track after a crucial data block has been ruined is not
easy. Backup technique is as simple as it sounds. It involves keeping another copy of the data

on some other machine or device so that the copy could be used in case of a system failure.
There are two types of backup techniques, namely full dump and incremental dump.

- Full dump simply refers to making a backup copy of the whole disk on another disk or
machine. It is pretty obvious that the process of full dumping is time consuming as well as
memory consuming.
- Incremental dump has some advantages over full dump. The simplest form of
incremental dumping is to make a full dump periodically (say monthly or weekly) and to
make a daily dump of only those files that have been modified since the last full dump. A
better scheme is to dump only those files that have changed since they were last dumped.
Such a scheme of data backup is time efficient as well as memory efficient. To implement
this method, a list of dump times for each file must be kept on disk.
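
The selection step of an incremental dump can be sketched as follows (file names and timestamps are invented; a real system would read modification times from the file system):

```python
# Sketch of incremental-dump selection: given each file's last
# modification time and the time of the last dump, pick only the
# files that have changed since then.

def incremental_dump(mtimes, last_dump):
    """mtimes: dict mapping file name -> modification timestamp.
    Returns the names of files modified after last_dump."""
    return sorted(name for name, t in mtimes.items()
                  if t > last_dump)

mtimes = {"report.doc": 100, "notes.txt": 250, "data.csv": 300}
to_dump = incremental_dump(mtimes, last_dump=200)
# Only the two files modified after t=200 need to be backed up.
```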

File System Security


Security has many facets. Two of the more important ones relate to data loss and intrusion.
Some of the common causes of data loss are:
- Natural phenomena such as fire, flood, earthquakes, wars, riots or attacks by rodents
- Hardware or software errors: CPU malfunctions, unreadable disks or tapes,
telecommunication errors, program bugs
- Human errors: incorrect data entry, wrong tape or disk mounted, wrong program run, lost
disk or tape.
Most of these can be dealt with by maintaining adequate backups, preferably far away from the
original data.

System Protection Levels


To protect the system, we must take security measures at four levels:
 Physical: The site or sites containing the computer systems must be physically secured
against armed or surreptitious entry by intruders.
 Human: Users must be screened carefully to reduce the chances of authorising a user
who then gives access to an intruder (in exchange for a bribe, for example).
 Network: Much computer data in modern systems travels over private leased lines,
shared lines such as the Internet, or dial-up lines. Intercepting these data could be just
as harmful as breaking into a computer; and interruption of communications could
constitute a remote denial-of-service attack, diminishing users' use of and trust in the
system.
 Operating system: The system must protect itself from accidental or purposeful security
breaches.
Security at the first two levels must be maintained if operating-system security is to be ensured.
A weakness at a high level of security (physical or human) allows circumvention of strict low-
level (operating system) security measures.
Furthermore, the system hardware must provide protection to allow the implementation of
security features. Most contemporary operating systems are now designed to provide security
features.

A. Intrusion/Categories of Intruders
Intrusion is a set of actions that attempt to compromise the integrity, confidentiality, or
availability of any resource on a computing platform.
Categories of Intruders
- Casual prying by non-technical users. Many people have terminals to timesharing
systems on their desks, and human nature being what it is, some of them will read other
people’s electronic mail and other files if no barriers are placed in the way.
- Snooping by insiders. Students, system programmers, operators, and other technical
personnel often consider it to be a personal challenge to break the security of a local
computer system. They are often highly skilled and are willing to devote a substantial
amount of time to the effort.
- Determined attempt to make money. Some bank programmers have attempted to
manipulate banking systems to steal from the bank. Schemes vary from changing software
to truncate rather than round off interest (keeping the fraction of money for themselves),
to siphoning off accounts not used for years, to blackmail (“pay me or I will destroy all the
bank’s records”).
- Commercial or military espionage. Espionage refers to a serious and well-funded attempt
by a competitor or a foreign country to steal programs, trade secrets, patents, technology,
circuit designs, marketing plans, and so forth. Often this attempt will involve wiretapping
or even erecting antennas at the computer to pick up its electromagnetic radiation.
The amount of effort that one puts into security and protection clearly depends on who the
enemy is thought to be. Absolute protection of the system from malicious abuse is not possible,
but the cost to the perpetrator can be made sufficiently high to deter most, if not all,
unauthorised attempts to access the information residing in the system.

Intrusion Detection
Intrusion detection strives to detect attempted or successful intrusions into computer systems
and to initiate appropriate responses to the intrusions. Intrusion can be detected through:

- Auditing and Logging. A common method of intrusion detection is audit-trail processing,
in which security-relevant events are logged to an audit trail and then matched against
attack signatures (in signature-based detection) or analyzed for anomalous behavior (in
anomaly detection).
- System-Call Monitoring is a more recent and speculative form of anomaly detection.
This approach monitors process system calls to detect in real time when a process is
deviating from its expected system-call behavior.

User Authentication
A major security problem for operating systems is authentication. The protection system
depends on the ability to identify the programs and processes currently executing, which in turn
depends on the ability to identify each user of the system. The process of identifying users when
they log on is called user authentication. How do we determine whether a user's identity is
authentic? Generally, authentication is based on one or more of three items:
- User possession (a key or card)
- User knowledge (a user identifier and password)
- User attributes (fingerprint, retina pattern, or signature).

1. Passwords
The most common approach to authenticating a user identity is the use of passwords. When
a user identifies herself by user ID or account name, she is asked for a password. If the user-
supplied password matches the password stored in the system, the system assumes that the
user is legitimate. Passwords are often used to protect objects in the computer system, in the
absence of more complete protection schemes. Different passwords may be associated with
different access rights. For example, different passwords may be used for reading files,
appending files, and updating files.
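
A sketch of password storage and checking using a salted hash, so that the system never keeps the plaintext password. This uses only Python standard-library calls; the password shown is an invented example.

```python
# Sketch of salted password hashing and verification.
import hashlib, hmac, os

def make_record(password):
    """Store (salt, hash) -- never the plaintext password itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(password, record):
    """Recompute the hash with the stored salt and compare."""
    salt, stored = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored)

record = make_record("s3cret")
ok = verify("s3cret", record)    # correct password matches
bad = verify("guess", record)    # wrong password does not
```

The salt ensures two users with the same password get different stored hashes, and the constant-time comparison avoids leaking information through timing.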
Password Vulnerabilities
Passwords are extremely common because they are easy to understand and use.
Unfortunately, passwords can often be guessed, accidentally exposed, sniffed, or illegally
transferred from an authorized user to an unauthorised one. There are two common ways to
guess a password.
- One way is for the intruder (either human or program) to know the user or to have
information about the user.
- The other way is to use brute force: trying, by enumeration, all possible combinations of
letters, numbers, and punctuation until the password is found.
In addition to being guessed, passwords can be exposed as a result of visual or electronic
monitoring. An intruder can look over the shoulder of a user (shoulder surfing) when the user
is logging in and can learn the password easily by watching the keystrokes. Alternatively,
anyone with access to the network on which a computer resides could seamlessly add a
network monitor, allowing her to watch all data being transferred on the network (sniffing),
including user IDs and passwords. Encrypting the data stream containing the password solves
this problem. Exposure is a particularly severe problem if the password is written down where it
can be read or lost.

2. Biometrics
There are many other variations to the use of passwords for authentication. Palm or hand-
readers are commonly used to secure physical access—for example, access to a data center.
These readers match stored parameters against what is being read from hand-reader pads. The
parameters can include a temperature map, as well as finger length, finger width, and line
patterns. These devices are currently too large and expensive to be used for normal computer
authentication. Fingerprint readers have become accurate and cost-effective and should
become more common in the future. These devices read your finger's ridge patterns and
convert them into a sequence of numbers. Over time, they can store a set of sequences to
adjust for the location of the finger on the reading pad and other factors. Software can then scan
a finger on the pad and compare its features with these stored sequences to determine if the
finger on the pad is the same as the stored one.

B. Program Threats
When a program written by one user may be used by another, misuse and unexpected behavior
may result. Some common methods by which users gain access to the programs of others are:
Trojan horses, Trap doors, Stack and buffer overflow.

i. Trojan Horse
Many systems have mechanisms for allowing programs written by users to be executed by other
users. If these programs are executed in a domain that provides the access rights of the
executing user, the other users may misuse these rights. A text-editor program, for example,
may include code to search the file to be edited for certain keywords. If any are found, the entire
file may be copied to a special area accessible to the creator of the text editor. A code segment
that misuses its environment is called a Trojan horse.

ii. Trap Door


The designer of a program or system might leave a hole in the software that only he/she is
capable of using. This type of security breach is called trap door. Programmers have been
arrested for embezzling from banks by including rounding errors in their code and having the
occasional half-cent credited to their accounts. This account crediting can add up to a large
amount of money, considering the number of transactions that a large bank executes. Trap
doors pose a difficult problem because, to detect them, we have to analyze all the source code
for all components of a system. Given that software systems may consist of millions of lines of
code, this analysis is not done frequently, and frequently it is not done at all.

iii. Stack and Buffer Overflow


This is the most common way for an attacker outside of the system, on a network or dial-up
connection, to gain unauthorized access to the target system. An authorized user of the system
may also use this exploit for privilege escalation, to gain privileges beyond those allowed for
that user. Essentially, the attack exploits a bug in a program. The bug can be a simple case of
poor programming, in which the programmer neglected to code bounds checking on an input
field. The buffer-overflow attack is especially
malicious, as it can be run within a system and can travel over allowed communications
channels. Such attacks can occur within protocols that are expected to be used to communicate
with the machine, and they can therefore be hard to detect and prevent. They can even bypass

the security added by firewalls. One solution to this problem is for the CPU to have a feature
that disallows execution of code in a stack section of memory.

C. System Threats
Most operating systems provide a means by which processes can give birth to other processes.
In such an environment, it is possible to create a situation where operating system resources
and user files are misused. The two most common methods for achieving this misuse are
worms and viruses.

i. Worms
A worm is a process that uses the spawn (giving birth/replicating) mechanism to ravage system
performance. The worm spawns copies of itself, using up system resources and perhaps locking
out all other processes. On computer networks, worms are particularly potent, since they may
reproduce themselves among systems and thus shut down the entire network.

ii. Viruses
Like worms, viruses are designed to spread into other programs and can wreak havoc in a
system by modifying or destroying files and causing system crashes and program malfunctions.
Whereas a worm is structured as a complete, standalone program, a virus is a fragment of code
embedded in a legitimate program. Viruses are a major problem for computer users, especially
users of microcomputer systems. Viruses are usually spread when users download viral
programs from public bulletin boards or exchange disks containing an infection. In recent years,
a common form of virus transmission has been via the exchange of Microsoft Office files, such
as Microsoft Word documents. Most commercial antivirus packages are effective against only
particular known viruses. They work by searching all the programs on a system for the specific
pattern of instructions known to make up the virus. When they find a known pattern, they
remove the instructions, disinfecting the program. These commercial packages have catalogs of
thousands of viruses for which they search. The best protection against computer viruses is
prevention, or the practice of safe computing. Purchasing unopened software from vendors
and avoiding free or pirated copies from public sources or disk exchange is the safest route to
preventing infection. Another defense is to avoid opening any e-mail attachments from
unknown users.
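
The signature-search idea described above can be sketched as follows. The "virus" signature and file contents here are invented byte strings, purely for illustration.

```python
# Sketch of signature-based virus scanning: look for a known byte
# pattern inside each "program" (byte strings stand in for files).

SIGNATURES = {"demo-virus": b"\xde\xad\xbe\xef"}   # invented pattern

def scan(programs):
    """programs: dict name -> bytes. Returns (name, virus) matches."""
    infected = []
    for name, content in programs.items():
        for virus, pattern in SIGNATURES.items():
            if pattern in content:
                infected.append((name, virus))
    return infected

files = {"clean.exe": b"\x00\x01\x02",
         "bad.exe": b"\x90\x90\xde\xad\xbe\xef\x90"}
found = scan(files)   # only the file containing the pattern is flagged
```

This also shows the scheme's limitation noted above: only viruses whose patterns are already in the catalog can be detected.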

iii. Denial of Service
The last attack category, denial of service, is aimed not at gaining information or stealing
resources but rather at disrupting legitimate use of a system or facility. An intruder could delete
all the files on a system, for example. It involves launching an attack that prevents legitimate
use of system resources.

File Protection
There are three most popular implementations of file protection:
- File Naming
It depends upon the inability of a user to access a file he cannot name. This can be
implemented by allowing users to see only the files they have created. But since most file
systems allow only a limited number of characters for filenames, there is no guarantee that two
users will not use the same filenames.

- Password Protection
This scheme associates a password with each file. If a user does not know the password
associated with a file, then he cannot access it. This is a very effective way of protecting files,
but a user who owns many files, and who constantly changes the passwords to make sure that
nobody accesses these files, will need some systematic way of keeping track of the
passwords.

- Access Control
An access list is associated with each file or directory. The access list contains information on
the type of users and accesses that they can have on a directory or file. An example is the
following access list associated to a UNIX file or directory:
drwxrwxrwx

The d indicates that this is an access list for a directory; the first rwx indicates that it can be
read, written, and executed by the owner of the file; the second rwx is the access information for
users belonging to the same group as the owner (somewhere on the system is a list of users
belonging to the same group as the owner); and the last rwx is for all other users. Each rwx can
be changed, for example to r-- indicating that it can only be read, -w- for write-only, or --x for
execute only.
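
A short sketch that decodes a permission string like the drwxrwxrwx example above (an illustration only; real systems store these as permission bits, not display strings):

```python
# Sketch: decode a UNIX-style permission string into per-class rights.

def parse_mode(mode):
    kind = "directory" if mode[0] == "d" else "file"
    classes = {}
    for who, bits in zip(("owner", "group", "other"),
                         (mode[1:4], mode[4:7], mode[7:10])):
        classes[who] = {"read": bits[0] == "r",
                        "write": bits[1] == "w",
                        "execute": bits[2] == "x"}
    return kind, classes

kind, perms = parse_mode("drwxr--r--")
# a directory: owner has full access; group and others may only read
```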

File Characteristics

File Hit Rate: the term used to describe the rate of processing master files in terms of active
records. It is defined as the proportion of records updated or referenced on each update run in
relation to the total number of records in the master file. For instance, if 1,000 records out of
10,000 are processed, then the hit rate is 10%.
Volatility: This is the frequency with which records are added to or deleted from the file. We can
have volatile files or static files.
Size: This is the amount of data stored in the file. It may be expressed in bytes, kilobytes or
megabytes.
Growth: Files often grow steadily in size as new records are added.

File Processing Techniques

- Batch processing
- Online processing
- Interactive processing
   - Real-time processing
   - Multi-user processing
   - Multi-tasking processing

-Batch processing: Transactions are accumulated into batches of suitable sizes, and then each
batch is sorted and processed through a sequence of stages known as a run. Each batch is
identified by a batch number which is recorded on a batch control slip. This slip also
contains control information (e.g. number of items in each batch) and individual hardware
requirements. The identification and specifications are done using a special language called Job
Control Language (JCL). JCL makes it possible to specify names for jobs, the files to be used by
each, the peripherals required, job priority, etc. The weakness of batch processing is that it
delays output and requires physical transportation of data or manual intervention in the course
of processing; it has minimal application in modern computing.

-Online processing: involves direct connection of the data source to the computer, either by
using a wired or wireless connection. The computer, the data source (another computer, consoles
or any other input devices) and the output devices are said to be on-line when they can interact
automatically.

-Interactive processing: This involves hands-on transactions, which is referred to as
“conversational mode processing”. The software prompts the user for information, the user is
expected to respond promptly, and the response is processed immediately. The resultant
outcome is determined by the sequence of requests by the computer and the answers supplied by
the users. This technique receives and processes data at random intervals, hence time lags
are not critical. Examples of interactive processing are:

i. Real-time processing: Transactions are said to be real-time when processing is done
as events occur and the master files are updated immediately. Examples of real-time processing
are airline seat reservation, online banking and recharging a GSM account.

ii. Multi-user processing. This has provision for a number of users to use the same computer
at a time. A special multi-user operating system (Windows, Linux, etc.) is required to control
the resources such that delays in response to user requests are not noticeable.

iii. Multi-tasking processing. This technique facilitates the running of two or more tasks
(programs) concurrently on the computer. The technique requires a multi-tasking operating
system that will allow high-speed switching between different tasks while affording access to
multiple sources of information.

Management of Files in Windows


There are three ways of managing files in Windows operating systems:
- From within a Program
- By using My Computer and
- By using Windows Explorer.

a. Managing Files from within a Program


When you choose “File” ->“Save As” from within a program such as Microsoft Word, a dialogue
box appears with three important features:
- “Save in” (near the top of the box). A dropdown box that brings up your computer’s
directory structure, to allow you to choose where to save your file.
- “File name” (near the bottom of the box). Allows you to type in a name for your file.

- “Save as type” (at the bottom of the box). A dropdown box that allows you to choose a
format (type) for your file. The default file format will appear with the default file extension.

Save As Dialog box

These three options also appear when you choose “File” ->“Open” from within Microsoft Word,
but they have slightly different names. Other programs will have the same three options, which
again might have slightly different names.

b. Using My Computer

Double-clicking on the My Computer icon, which is located in the upper left-hand corner of your
desktop, will open a window labeled “My Computer”. From within this window, you can open,
move, copy, and delete files; you can also create, move, copy, and delete folders. Double-
clicking on any folder icon also opens the contents of that folder.
At the “top” level of the directory structure are the drives, differentiated by letters:
- A:\ is your floppy disk drive
- C:\ is your hard disk
- D:\ is your Zip, CD, or DVD drive
- F:\ is probably your flash drive

Go to “View” at the top of the window to change the way files and folders are displayed within
the window. There are four ways to view files and folders:
- Large icons
- Small icons
- List – Choose this when you want to work with several files or folders at a time.
- Details – This is a good mode to work in when you want to see when the file was created,
its size, and other important information.

My Computer Dialog box

Tool Bar Panel


The toolbar has several buttons that enable you to work with files and folders:

Up – Choosing “Up” enables you to navigate through the computer’s directory structure quickly.
Clicking on this button will change the contents of the current window, taking you “up” in the
directory structure until you get to the highest level, at which only the drives are shown.
Cut – When you single-click on a file or folder to select it, it will be highlighted in the window.
Choosing “Cut” will delete the file or folder from its current location and copy it to the clipboard
so that it can be pasted elsewhere.
Copy – Choosing “Copy” will copy a selected file or folder into the clipboard so that it can be
pasted elsewhere, but will not remove the file or folder from its current location.
Paste – Choosing “Paste” will paste a file or folder that is stored in the clipboard into the current
location.
Undo – Choosing “Undo” allows you to undo an action that you have just performed. This is
particularly useful when you have just deleted something you didn’t mean to delete.
Delete – Choosing “Delete” will delete a selected file or folder without copying it to the clipboard.

Properties – Choosing “Properties” will bring up a box that gives you information about a
particular file or folder.
To create a new folder in the current window, you can do one of two things:
 Go to “File”-> “New”-> “Folder.”
A new folder appears in the current window, and the folder name is highlighted that will allow
you to name it.
 Right-click anywhere in the current window (not on an icon or filename) and choose
“New”-> “Folder.”
Right-clicking on a selected file or folder will allow you to do several useful things, among which
are the following:
- Rename a file or folder by choosing “Rename.” A blinking cursor will appear in the file or
folder name.
- Create a desktop shortcut by choosing “Send To”-> “Desktop as Shortcut.”
- Copy the file or folder to a floppy disk by choosing “Send To”-> “3 ½ Floppy (A:).”
- Cut, copy, paste, or print a file.

c. Using Windows Explorer


In Windows Explorer, the entire directory structure is available at all times in the left-
hand pane. In this respect it differs from My Computer. Another difference between Windows
Explorer and My Computer is that Windows Explorer allows you to drag-and-drop files and
folders with the mouse.
In the left-hand pane, drives, directories, and subdirectories are visible. To expand your view of
the contents of a drive or directory, click on the + sign next to the directory name. To collapse
your view of the contents of a drive or directory, click on the – sign next to the directory name.
To see the contents of a drive or directory, click once on it (i.e., select it). In the right hand pane,
the contents of the selected drive or directory are then displayed. The right hand pane functions
just like the windows in My Computer.
In the first example below, the drive C:\ is selected, and its contents are shown in the right-hand
pane, while in the next example, the drive C:\ has been expanded, and the directory
“Documents” has been selected. Its contents are displayed in the right hand pane:

My Documents Dialog Box

Working with More than One File


To select two or more separate files, hold down the “Ctrl” key and click on each filename. To
select a contiguous group of files in a list, click on the first filename, then hold down the “Shift”
key and click on the last filename. All files in between will also be selected. You can then
perform cut, copy, and delete functions on all the selected files.
Locating Lost Files
Use the “Find File” facility of your operating system by going to “Start” -> “Find” -> “Files or
Folders.” A box will appear that will allow you to search for a file by name, by part of its name
(use * as a wildcard character), by location, by date, by type, or by other criteria.
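
The * wildcard works like the patterns supported by Python's standard fnmatch module, which can be used to sketch the matching (the file names below are invented):

```python
# Sketch of wildcard matching as used in "Find Files or Folders":
# * matches any run of characters in a file name.
import fnmatch

names = ["report2004.doc", "report2005.doc", "notes.txt"]
matches = fnmatch.filter(names, "report*.doc")
# keeps only the names matching the pattern, in their original order
```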

Tips for Management of Electronic Files


It is very important to keep the files on your computer organized and up-to-date. Just as with
paper files, the goal of computer file management is to ensure that you can find what you’re
looking for, even if you’re looking for it years after its creation. The following file management
tips will be of help in keeping your files accessible:
 Organise by file types. Make applications easier to find by creating a folder called
Program Files on your drive and keeping all your applications there. For instance, the
executables for Word, PowerPoint, Simply Accounting and WinZip would all reside in the
Program Files folder.
 One place for all. Place all documents in the My Documents folder and nowhere else.
So whether it’s a spreadsheet, a letter or a PowerPoint presentation, it goes here. This
will make it easier to find things and to run backups.
 Create folders in My Documents. These are the drawers of your computer’s filing
cabinet, so to speak. Use plain language to name your folders; you don’t want to be
looking at this list of folders in the future and wondering what “TFK” or whatever other
interesting abbreviation you invented means.
 Nest folders within folders. Create other folders within these main folders as need
arises. For instance, a folder called “Invoices” might contain folders called “2004”, “2005”
and “2006”. A folder named for a client might include the folders “customer data” and
“correspondence”. The goal is to have every file in a folder rather than having a bunch of
orphan files listed.
 Follow file naming conventions. Do not use spaces in file names. If you must break a
convention, at least be consistent about it.
 Be specific. Give files logical, specific names and include dates in file names if possible.
The goal when naming files is to be able to tell what the file is about without having to
open it and look. So if the document is a letter to a customer reminding him that payment

is overdue, call it something like “overdue081206” rather than something like “letter”. How
will you know who the letter is to without opening it?
 File as you go. The best time to file a document is when you first create it. So get in the
habit of using the “Save As” dialogue box to file your document as well as name it, putting
it in the right place in the first place.
 Order your files for your convenience. If there are folders or files that you use a lot,
force them to the top of the file list by renaming them with AA at the beginning of the
filename.
 Cull your files regularly. Sometimes what’s old is obvious as in the example of the
folder named “Invoices” above. If it’s not, keep your folders uncluttered by clearing out
the old files. Do NOT delete business related files unless you are absolutely certain that
you will never need the file again. Instead, in your main collection of folders in My
Documents, create a folder called “Old” or “Inactive” and move old files into it when you
come across them.
 Back up your files regularly. Whether you’re copying your files onto another drive or
onto tape, it’s important to set up and follow a regular back up regime. If you follow these
file management tips consistently, even if you don’t know where something is, you know
where it should be.

Sorting and Search Algorithms

Different types of algorithms exist for file sorting and searching:
I. Sorting Algorithm
In computer science and mathematics, a sorting algorithm is a prescribed set of well-defined
rules or instructions that puts elements of a list in a certain order. The most-used orders are
numerical order and alphabetical order. Efficient sorting is important to optimizing the use of
other algorithms (such as search and merge algorithms) that require sorted lists to work
correctly. More formally, the output must satisfy two conditions:
1. The output is in non-decreasing order (each element is no smaller than the previous element
according to the desired total order);
2. The output is a permutation, or reordering, of the input.
Some popular Sorting Algorithms are as follows:

 Bubble Sort
This is a sorting algorithm that continuously steps through a list, swapping adjacent items until
they appear in the correct order. Bubble sort is a straightforward and simple method of sorting
data. The algorithm starts at the beginning of the data set. It compares the first two elements
and, if the first is greater than the second, swaps them. It continues doing this for each pair of
adjacent elements to the end of the data set, repeating the pass until no swaps are needed. It
is practical only for small lists.
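For illustration, the pass-and-swap behaviour described above could be sketched in Python as
follows (an illustrative sketch, not part of the original notes):

```python
def bubble_sort(items):
    """Repeatedly step through the list, swapping adjacent
    out-of-order pairs, until a full pass makes no swaps."""
    data = list(items)                 # work on a copy
    n = len(data)
    for end in range(n - 1, 0, -1):    # each pass bubbles the largest item to position 'end'
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:                # no swaps means the list is already sorted
            break
    return data
```

For example, `bubble_sort([5, 1, 4, 2, 8])` returns `[1, 2, 4, 5, 8]`.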
 Insertion Sort
This is a simple sorting algorithm that is relatively efficient for small lists and mostly-sorted lists,
and often used as part of more sophisticated algorithms. It works by taking elements from the
list one by one and inserting them in their correct position into a new sorted list.
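The take-and-insert process described above might look like this in Python (a sketch that sorts
in place within a copy of the list, rather than building a separate new list):

```python
def insertion_sort(items):
    """Take elements one by one and insert each into its correct
    position within the already-sorted prefix of the list."""
    data = list(items)                    # work on a copy
    for j in range(1, len(data)):
        key = data[j]                     # element to insert
        i = j - 1
        while i >= 0 and data[i] > key:   # shift larger items one slot right
            data[i + 1] = data[i]
            i -= 1
        data[i + 1] = key                 # drop the element into its slot
    return data
```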
 Merge Sort
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It
starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the
first should come after the second. It then merges each of the resulting lists of two into lists of
four, then merges those lists of four, and so on; until at last two lists are merged into the final
sorted list. Of the algorithms described here, this is the first that scales well to very large lists.
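The notes describe a bottom-up merge of progressively longer runs; a common top-down
recursive variant of the same merging idea can be sketched in Python as:

```python
def merge_sort(items):
    """Split the list in half, sort each half recursively,
    then merge the two sorted halves into one sorted list."""
    if len(items) <= 1:
        return list(items)                 # a list of 0 or 1 items is already sorted
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge: repeatedly take the smaller front element of the two sorted lists.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])                # append whichever half has items left
    merged.extend(right[j:])
    return merged
```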
 Heap Sort
Heap sort works by determining the largest (or smallest) element of the list, placing that at the
end (or beginning) of the list, then continuing with the rest of the list, but accomplishes this task
efficiently by using a data structure called a heap, a special type of binary tree. Once the data
list has been made into a heap, the root node is guaranteed to be the largest
(or smallest) element. When it is removed and placed at the end of the list, the heap is
rearranged so the largest element remaining moves to the root.
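Python's standard heapq module implements a binary min-heap, so a sketch using it repeatedly
removes the smallest remaining element (the mirror image of the max-heap description above),
rather than hand-coding the heap rearrangement:

```python
import heapq

def heap_sort(items):
    """Build a min-heap from the data, then repeatedly pop the
    smallest element; the pops come out in sorted order."""
    heap = list(items)
    heapq.heapify(heap)        # rearrange the list into heap order, O(n)
    # each heappop removes the root (smallest) and re-heapifies, O(log n)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```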
 Quick Sort
Quick sort is a divide and conquer algorithm which relies on a partition operation: to partition an
array, we choose an element, called a pivot, move all smaller elements before the pivot, and
move all greater elements after it. This can be done efficiently in linear time and in-place. We
then recursively sort the lesser and greater sub-lists. Efficient implementations of quick sort
(with in-place partitioning) are typically unstable sorts and somewhat complex, but are among
the fastest sorting algorithms in practice. Because of its modest space usage, quick sort is one
of the most popular sorting algorithms, available in many standard libraries. The most complex
issue in quick sort is choosing a good pivot element; consistently poor choices of pivots can
result in drastically slower performance.
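As a simple illustration of the partition-and-recurse idea, here is a Python sketch that partitions
into new lists; note that it is not the in-place variant the text describes as most efficient:

```python
def quick_sort(items):
    """Partition around a pivot, then recursively sort the
    smaller and greater sub-lists."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]                 # middle element as pivot
    smaller = [x for x in items if x < pivot]      # elements before the pivot
    equal   = [x for x in items if x == pivot]     # the pivot (and duplicates)
    greater = [x for x in items if x > pivot]      # elements after the pivot
    return quick_sort(smaller) + equal + quick_sort(greater)
```

Choosing the middle element as pivot is one of many heuristics; a consistently bad pivot (for
example, always the first element of an already-sorted list) degrades performance, as noted above.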
II. Search Algorithm
 Linear search: processes the records of a file in their order of occurrence until it either
locates the desired record or has processed all the records.
- If the records in the file are ordered, the number of comparisons depends on the position
of the desired record: the minimum is 1 and the maximum is n.
- Moving to the next record is done simply by incrementing the address of the current
record by the record size.
{
Linear search algorithm
FOR i := 1 TO n DO
IF key(sought) = key(i) THEN terminate successfully
END FOR
Terminate unsuccessfully
}
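The pseudocode above corresponds to the following Python sketch (0-based indexing, with a
hypothetical key function that defaults to the record itself):

```python
def linear_search(records, sought_key, key=lambda r: r):
    """Scan records in order; return the index of the first record
    whose key equals sought_key, or -1 if no record matches."""
    for i, record in enumerate(records):
        if key(record) == sought_key:
            return i        # terminate successfully
    return -1               # terminate unsuccessfully
```

For example, `linear_search([10, 20, 30], 20)` returns `1`, and searching for a missing key
returns `-1` after n comparisons.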

 Binary search: for an ordered file, this searching technique reduces the number of
comparisons by comparing the sought key with the key of the middle record of the file. The
upper or the lower half of the file is then eliminated, depending on whether the sought key is
less than or greater than the middle key.

IF the key(sought) < key(middle) THEN eliminate the upper portion including the middle key
ELSE IF the key(sought) > key(middle) THEN eliminate the lower portion including the
middle key.
The procedure will continue until the desired record is found or it is determined that the record is
not available in the file

{
Binary search algorithm
Lower := 1
Upper := n
WHILE Lower <= Upper DO
Middle := (Lower + Upper) DIV 2
IF key(sought) = key(Middle) THEN terminate successfully
ELSE IF key(sought) > key(Middle) THEN Lower := Middle + 1
ELSE Upper := Middle - 1
END IF
END WHILE
Terminate unsuccessfully
}
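The halving procedure above can be sketched in Python (0-based indexing) as:

```python
def binary_search(sorted_keys, sought):
    """Repeatedly halve the search interval of a sorted list;
    return the index of sought, or -1 if it is not present."""
    lower, upper = 0, len(sorted_keys) - 1
    while lower <= upper:
        middle = (lower + upper) // 2
        if sorted_keys[middle] == sought:
            return middle                  # terminate successfully
        elif sought > sorted_keys[middle]:
            lower = middle + 1             # eliminate the lower half
        else:
            upper = middle - 1             # eliminate the upper half
    return -1                              # terminate unsuccessfully
```

For example, `binary_search([1, 3, 5, 7, 9], 7)` returns `3` after only two comparisons,
whereas a linear search would need four.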
