Physical File Basics
PHYSICAL FILES
CHAPTER OBJECTIVES
Upon completion of this chapter, you should be able to
o Explain the hierarchy of data: file, records, and fields.
o Describe the differences between an alphanumeric
(character) field, zoned decimal field, and packed
decimal field.
o Explain the EBCDIC (Extended Binary Coded Decimal
Interchange Code) representation for storing data.
o Explain and demonstrate how Data Description
Specifications (DDS) are used to describe files.
o Describe the differences between arrival sequence and
keyed sequence access paths.
HIERARCHY OF DATA
Programming languages such as COBOL/400 provide the
computer with the instructions to perform specific tasks.
Normally, these instructions include accessing data. Data are
unorganized raw facts.
To process data in an organized way, data areas are set aside
in the computer's memory for fields, records, and files. A field
is a group of storage positions reserved for a specific data
item. For example, if a retail organization wanted to store
data about its employees, it would probably want to
include the following data items: employee number, store
number, employee name, department number, hourly rate,
hours worked, sales, and so on. Each data item describes one
specific element of the employee and is stored as a field.
Suppose an employee's number is 864955834. Enough
Figure 1.2. Group of related fields that are stored as one employee
record.
from the Employee Pay file. Each record represents data for
one employee of the organization. A payroll file, accounts
receivable file, inventory file, and sales file are examples of
commonly used files in business applications.
Figure 1.3. Group of related employee records that are stored in one
employee file.
INTERNAL BINARY
REPRESENTATION OF DATA
The two computer codes commonly used for internal binary
representation are EBCDIC and ASCII. Microcomputers and
computers from non-IBM vendors generally use ASCII. ASCII stands
for American Standard Code for Information Interchange and
is pronounced ass-key.
IBM midrange and mainframe computers use the EBCDIC
coding system, which is the only code illustrated in this
book. EBCDIC, pronounced eb-ce-dick, stands for Extended
Binary Coded Decimal Interchange Code.
Figure 1.4. EBCDIC codes for letters, numbers, character blank, and
corresponding binary and hexadecimal codes.
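As a rough illustration of the kind of mapping such a table contains, the following Python sketch prints the EBCDIC hexadecimal and binary codes for a few characters. It assumes that Python's built-in cp037 codec (EBCDIC, US/Canada) is representative of the code page in use; the helper name ebcdic_codes is invented for this example.

```python
# Sketch: look up EBCDIC codes with Python's built-in "cp037" codec.
# Assumption: code page 037 (EBCDIC US/Canada) matches the system's code page
# for letters, digits, and blank; some special characters vary by code page.

def ebcdic_codes(ch):
    """Return (hex, binary) strings for the EBCDIC code of one character."""
    byte = ch.encode("cp037")[0]
    return f"{byte:02X}", f"{byte:08b}"

for ch in ["A", "J", "5", " "]:
    hex_code, bits = ebcdic_codes(ch)
    # Show the byte as two 4-bit halves, as in a code table
    print(repr(ch), hex_code, bits[:4], bits[4:])
```

Under this assumption the letter A is hexadecimal C1, the digit 5 is F5, and the blank is 40.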
DATA TYPES
There are several different formats used to internally store
data within the AS/400 and iSeries Servers. The method used
to represent data internally depends upon the type of
processing to be performed on the data. The method used to
define numeric data also affects the program's efficiency.
Fields can be classified as one of the following types:
o Zoned-decimal
o Packed-decimal
o Binary
o Floating-point
o Date
o Time
o Timestamp
o Hexadecimal
ALPHANUMERIC (CHARACTER)
FIELDS
An alphanumeric field or character field is a field that
contains any combination of letters, digits, and special
characters such as $, %, @, or &. Simply, an alphanumeric
field is a field that contains any printable characters. In Figure
1.7, the letter A in the Type column identifies alphanumeric
fields. Thus, the first name, middle initial, and last name
fields are defined as alphanumeric or character fields.
In this format a single position of storage, or byte, is used to
store one character of data. Data defined as an alphanumeric
(character) field cannot be used in arithmetic operations,
even though the field may contain only numeric digits.
Each byte of an alphanumeric field is divided into two
portions. The high-order, 4-bit zone portion and the low-
Zone    Digit
1101    0001
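The split of a byte into a zone portion and a digit portion can be sketched in Python as follows. The bit pattern 1101 0001 shown above is hexadecimal D1, which is the letter J if the cp037 EBCDIC code page is assumed; split_byte is a hypothetical helper written for this illustration.

```python
def split_byte(byte):
    """Split one byte into its high-order zone and low-order digit portions."""
    zone = byte >> 4       # high-order 4 bits
    digit = byte & 0x0F    # low-order 4 bits
    return zone, digit

# Assumption: cp037 is the EBCDIC code page, so "J" is stored as 0xD1
byte = "J".encode("cp037")[0]
zone, digit = split_byte(byte)
print(f"{zone:04b} {digit:04b}")   # prints "1101 0001"
```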
NUMERIC FIELDS
A numeric field is a field that contains the numeric digits 0
through 9 only. There are two considerations when defining
numeric fields:
1. If a field is to be used in an arithmetic operation, it must
be defined as a numeric field.
2. Fields such as employee number, ZIP code, and part
number will probably not be used in arithmetic
operations but contain numeric digits only. Thus, these
fields could be defined as character fields or numeric
fields. However, for data integrity in a database
application it is recommended that all fields that contain
numeric digits be defined as numeric fields regardless of
how they are processed.
There are two primary methods for defining numeric data:
zoned decimal and packed decimal. We discuss these data
types next.
Zone    Digit
1111    0101
For zoned decimal fields, each byte stores one digit, where
the zone portion is equivalent to "all bits on" and the digit
portion is the binary equivalent of one of the decimal
numbers 0 to 9. The zone portion of the rightmost byte of
zoned decimal fields contains the sign for the field.
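A minimal Python sketch of this layout is shown below. It assumes that a zone of F (1111) marks a positive value and a zone of D (1101) in the rightmost byte marks a negative value; to_zoned is a hypothetical helper, not a system routine.

```python
def to_zoned(value, digits):
    """Encode an integer as zoned decimal: one byte per digit.

    Assumption: zone F (all bits on) is used for every digit and for a
    positive sign; zone D in the rightmost byte marks a negative value.
    """
    text = f"{abs(value):0{digits}d}"
    if len(text) > digits:
        raise ValueError("value does not fit in the field")
    data = bytearray(0xF0 | int(d) for d in text)
    if value < 0:
        # Replace the zone of the rightmost byte with the negative sign
        data[-1] = 0xD0 | (data[-1] & 0x0F)
    return bytes(data)

print(to_zoned(59759, 5).hex().upper())   # F5F9F7F5F9
print(to_zoned(-125, 5).hex().upper())    # F0F0F1F2D5
```

Note that only the zone of the last byte changes for a negative value; the digit portions are unaffected.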
PROGRAMMING TIP
IBM midrange and mainframe computers execute arithmetic
operations in packed decimal format. Therefore, it is recommended
that numeric fields being used in arithmetic operations be defined as
packed decimal format. This saves CPU time when the arithmetic
operations are executed.
PROGRAMMING TIP
When defining a packed field, it is recommended that the field
be defined with an odd number of digits. This eliminates
the need for the operating system to zero-fill the high-order
half of the leftmost byte and allows the field to contain a
larger value without requiring additional storage.
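The reasoning behind this tip can be sketched in Python. Packed decimal stores two digits per byte and uses the low-order half of the rightmost byte for the sign, so a field with an even number of digits leaves half a byte unused. The sketch assumes sign values of F for positive and D for negative; to_packed and packed_bytes are hypothetical helpers.

```python
def to_packed(value, digits):
    """Encode an integer as packed decimal: two digits per byte, with the
    sign in the low-order half of the rightmost byte.

    Assumption: sign F means positive and sign D means negative.
    """
    text = f"{abs(value):0{digits}d}"
    if len(text) > digits:
        raise ValueError("value does not fit in the field")
    if digits % 2 == 0:
        text = "0" + text   # even digit count: zero-fill the unused half byte
    sign = "D" if value < 0 else "F"
    return bytes.fromhex(text + sign)

def packed_bytes(digits):
    """Bytes needed to store a packed field with the given number of digits."""
    return digits // 2 + 1

print(to_packed(59759, 5).hex().upper())   # 59759F
print(packed_bytes(4), packed_bytes(5))    # 3 3
```

A four-digit field and a five-digit field both occupy three bytes, so the odd length gains a digit at no storage cost.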
CONVERTING PACKED
DECIMAL FORMAT TO ZONED
DECIMAL FORMAT
When converting a packed decimal field to a zoned decimal
field, compute the number of bytes needed for the zoned
decimal field as follows:
1. Multiply the number of bytes in the packed decimal field
by 2.
2. Subtract 1 from the result of the multiplication.
Example 1
Convert a three-byte packed decimal field containing the
number 59759 to zoned decimal format.
1. Multiply 3 by 2 (3 x 2 = 6).
2. Subtract 1 (6 - 1 = 5).
Result: Five bytes are needed for the zoned decimal field.
Example 2
Convert a four-byte packed field containing the number
9407541 to zoned decimal format.
1. Multiply 4 by 2 (4 x 2 = 8).
2. Subtract 1 (8 - 1 = 7).
Result: Seven bytes are needed for the zoned decimal field.
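The rule works because each packed byte holds two half bytes and one half byte of the field is reserved for the sign, so n packed bytes hold 2n - 1 digits. The two examples can be checked with a short Python sketch; zoned_length is a hypothetical name used only here.

```python
def zoned_length(packed_size):
    """Bytes needed for the zoned field that receives a packed field."""
    return packed_size * 2 - 1

# The two examples from the text: (packed bytes, value stored)
for size, value in [(3, 59759), (4, 9407541)]:
    needed = zoned_length(size)
    # Sanity check: the value's digits must fit in the zoned field
    assert len(str(value)) <= needed
    print(f"{size}-byte packed field -> {needed}-byte zoned field")
```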
If data are entered in zoned decimal format and an arithmetic
operation such as addition is specified, the computer must
first pack the field, execute the add operation, and then
unpack the field again. These conversions are performed
automatically by the operating system and take additional
processing time. Thus, it is recommended that numeric fields
that are used in arithmetic operations be defined as packed
decimal fields.
To print or display numeric fields in a readable form, they
must be in zoned decimal format. Since packed decimal data
are not in a readable form, fields specified as output fields to
a print file should never be defined in the packed decimal
format.
When moving a packed field to a zoned decimal field the
computer automatically unpacks the sending field into the
receiving field. In this way, packed fields stored on disk are
converted to zoned decimal format so they can be printed in
readable form.
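The unpack step can be sketched in Python as follows. This is only an illustration of the idea, not the system's actual routine; it assumes a packed sign of D means negative and any other sign means positive, and it uses the cp037 codec to show that the zoned result is readable as EBCDIC characters.

```python
def unpack(packed):
    """Convert packed decimal bytes to zoned decimal bytes.

    Assumption: sign D means negative; any other sign value is positive.
    """
    halves = packed.hex().upper()
    digits, sign = halves[:-1], halves[-1]
    # Each digit gets its own byte with the zone set to F
    zoned = bytearray(0xF0 | int(d) for d in digits)
    if sign == "D":
        zoned[-1] = 0xD0 | (zoned[-1] & 0x0F)   # carry the sign into the zone
    return bytes(zoned)

zoned = unpack(bytes.fromhex("59759F"))
print(zoned.hex().upper())       # F5F9F7F5F9
print(zoned.decode("cp037"))     # 59759
```

The three-byte packed value becomes a five-byte zoned value whose bytes are the EBCDIC codes for the printable digits.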
Let us consider the data record in Figure 1.11 to demonstrate
how data are represented internally in character, zoned
decimal, and packed decimal formats.
Figure 1.11. Sample data for one record in the Employee Pay file
EMPPAYPF.
DATE FORMAT
A date field is a field that contains a valid date. The letter L
is used to identify a field as a date field. Date fields have a
predetermined size of ten bytes and a predetermined format
based on the internal format used in the application.
Therefore, no length is specified for date fields.
The default internal format for date fields is *ISO. ISO stands
for International Organization for Standardization. When defined
as an *ISO data type, the date field is a ten-byte field
using the format yyyy-mm-dd.
The *ISO default internal format can be overridden by the
definition specification keyword DATFMT. For example, the
format of a date field can be changed to the *USA (IBM USA
Standard) by specifying the DATFMT *USA keyword. By using
this format, the date is internally stored in the format
mm/dd/yyyy.
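The two layouts can be compared with a small Python sketch. The strftime patterns below are an assumption chosen to mirror the yyyy-mm-dd and mm/dd/yyyy layouts described above; format_date and the FORMATS table are invented for this illustration and are not part of DDS.

```python
from datetime import date

# Assumption: these patterns mirror the *ISO and *USA layouts in the text
FORMATS = {"*ISO": "%Y-%m-%d", "*USA": "%m/%d/%Y"}

def format_date(d, datfmt="*ISO"):
    """Render a date in the ten-character layout of the given format."""
    return d.strftime(FORMATS[datfmt])

d = date(2014, 3, 9)
print(format_date(d))            # 2014-03-09
print(format_date(d, "*USA"))    # 03/09/2014
```

Both results are ten characters long, which matches the ten-byte size given for date fields.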
DATA FILES
ACCESS PATHS
Physical and logical files use access paths to access records
in a file. The access path is the method used by the
operating system to retrieve input records and write output
records. Records can be read from or written to a file based
on (1) an arrival sequence (nonkeyed) access path or (2) a
keyed sequence access path. The access path for physical and
logical files is stored with the actual file object.
When new records are added to a physical file they are added
to the end of the file. This means that the order of the
records in the physical file is in no particular sequence.
When records are read from an arrival sequence file, they are
accessed sequentially. A specific record can be accessed only
after first accessing all records that physically precede it.
That is, for a program to read the 910th record in an arrival
sequence file, it must first read past the first 909 records.
Suppose the Employee Pay file contained only the ten records
shown in Figure 1.13 and that the file is stored in arrival
sequence. Thus, employee 827392161 was the first record
added to the file, employee 228725876 was the second
record added to the file, and so on, with employee 235235658
being the tenth record added to the file.
Figure 1.13 shows the order in which the records are stored in
the file. Since the file is defined as arrival sequence, the
records received by the program are in the same sequence
as they are physically stored in the file, that is, first-in,
first-out.
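The difference between the two access paths can be sketched in Python. The first, second, and tenth employee numbers come from the text; the seven numbers in between are made-up placeholders, and read_nth is a hypothetical helper. The sketch models a keyed sequence access path simply as a list of record positions ordered by key value, which is an assumption made for illustration.

```python
# Records in arrival order. Positions 3 to 9 hold hypothetical placeholder keys.
arrival = [827392161, 228725876] + [100000000 + i for i in range(3, 10)] + [235235658]

def read_nth(records, n):
    """Arrival sequence: reach record n by reading past records 1 to n-1."""
    reads = 0
    for position, record in enumerate(records, start=1):
        reads += 1
        if position == n:
            return record, reads
    raise IndexError("fewer records than requested")

record, reads = read_nth(arrival, 10)
print(record, reads)        # 235235658 10

# Keyed sequence (modelled here as positions sorted by key value):
# records are delivered in key order, not in arrival order.
key_index = sorted(range(len(arrival)), key=lambda pos: arrival[pos])
keyed_order = [arrival[pos] for pos in key_index]
print(keyed_order[-1])      # 827392161
```

The first record to arrive has the highest key, so it is read first in arrival sequence but last in keyed sequence.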
Once the source DDS are entered, the Create Physical File
(CRTPF) command is used to compile the source DDS into a
file object. Remember, as with all other objects, the compiled
physical file is stored in a library. After the file is compiled
into a file object, data can be entered or modified in the file.
Since PF is designated as the Type when the source is
entered, the system knows what type of object to create
when the DDS are compiled. The system stores the file
description and record and field attributes with the file object.
DATA DESCRIPTION
SPECIFICATIONS (DDS) FOR AN
ARRIVAL SEQUENCE FILE
Data Description Specifications such as those in Figure
1.17 define record formats in physical and logical files. This
includes defining the characteristics of fields including field
names, size of fields, the type of data to be stored in the
fields, valid values that fields can contain, and so on. In
Line 7.00: The Employee Number, called EMPLOYEENO, is left-justified
in the Name field (positions 19-28). The rules for
forming field names are the same as those for record names
explained on line 6 above.
The 9 right-justified in the Length field (positions 30-34)
indicates that the field contains nine digits. The 0 right-justified
in the Decimal Positions field (positions 36-37)
indicates that the field has zero decimal positions.
Remember, the number of decimal positions must be entered
for all numeric fields even if the number of decimal positions
is zero. When an entry is specified in the Decimal Positions
field, the field is automatically defined as numeric. The
default is that if the decimal positions entry is omitted, that
is, columns 36-37 are blank, the field is defined as
a character (alphanumeric) field.
The letter S in the Data Type field (position 35) indicates that
the field is signed zoned-decimal numeric. Thus, the field will
occupy nine bytes of storage in the record.
ALIAS Keyword: Physical files allow a data field to have an
alternative (ALIAS) name of up to 30 characters. When
the ALIAS keyword is used, the ALIAS name,
EP_EMPLOYEE_NUMBER in this instance, is copied to the program
instead of the field name EMPLOYEENO in the Name field
(positions 19-28).
If the Data Type field (position 35) is blank and the Decimal
Positions field (positions 36-37) is blank, the default field
type is A for alphanumeric (character).
If the Data Type field (position 35) is blank and the Decimal
Positions field (positions 36-37) is not blank, that is, it
contains a valid number, then the default field type
is P for packed decimal.
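These two default rules can be restated as a small Python function. It is only a sketch of the rules as described in this section; default_data_type is a hypothetical name, and the arguments stand for the raw contents of position 35 and positions 36-37.

```python
def default_data_type(data_type, decimal_positions):
    """Resolve the field type for a physical file field.

    data_type: entry in position 35 (blank or empty if omitted).
    decimal_positions: entry in positions 36-37 (blank or empty if omitted).
    """
    if data_type.strip():
        return data_type.strip()      # an explicit entry always wins
    if decimal_positions.strip():
        return "P"                    # numeric with no type: packed decimal
    return "A"                        # nothing specified: alphanumeric

print(default_data_type(" ", "  "))   # A
print(default_data_type(" ", " 0"))   # P
print(default_data_type("S", " 0"))   # S
```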
PROGRAMMING TIP
We recommend that a data type be specified for all fields. This will
help novice software developers in debugging the DDS since there
will be no confusion as to what the data type really is. Also,
specifying the data type provides instant documentation.
Line 8.00: The Store Number field, STORENO, is defined as a 4-byte
zoned-decimal field with zero decimal positions. The 4 in
the Length field (positions 30-34) indicates four digits in
length and the S in the Data Type field (position 35) indicates
that the data type is zoned decimal numeric. Thus, the field
will occupy four bytes in storage. The 0 in the Decimal
Positions field (positions 36-37) indicates zero decimal
positions.
The ALIAS name of EP_STORE_NUMBER is assigned to this
field.
Line 9.00: The First Name field, FIRSTNAME, is defined as a 15-byte
alphanumeric (character) field (A in Data Type field,
position 35). An alias name of EP_FIRST_NAME is assigned using
the ALIAS keyword in positions 45-72.
Line 10.00: The Middle Initial field, MIDDLEINIT, is defined as a
1-byte alphanumeric field (A in Data Type field, position 35).
An alternative name of EP_MIDDLE_INITIAL is assigned using the
ALIAS keyword.
Line 11.00: The Last Name field, LASTNAME, is defined as a 15-byte
alphanumeric field (A in Data Type field, position 35). An
alternative name of EP_LAST_NAME is assigned using
the ALIAS keyword.
high-order key or primary sort field and the last key field
listed is the low-order sort field.
To write programs that access the Employee Pay file, you
need to refer to the DDS of the physical file EMPPAYPF so you
can use the correct field names in describing the output
report file. The ALIAS names are assigned to the fields in the
DDS to standardize names for the fields in this file and are
used for all programs that access this file as an externally
described file. The prefix EP- (Employee Pay) is used to
indicate that these fields are part of the Employee Pay File.
PRINTING OR DISPLAYING
RECORDS WITH THE COPY FILE
(CPYF) COMMAND
Another method that can be used to display the records of a
file is the Copy File (CPYF) command. One advantage of
this command is that each record is displayed with both the
character and hexadecimal representations. To execute the
CPYF command
1. Enter CPYF on any command line.
In the original DDS for the Employee Pay File in Figure 1.23,
the company allows for sales of up to $99,999, that is, up to
but not including one hundred thousand dollars. Therefore, the
sales field in the physical file is five digits long with zero
decimal places. This is adequate until changes occur. Perhaps
inflation causes the company to raise the price of its products,
which raises sales to $100,000 or more, or the company decides
to start selling a different line of products that sell for much
more than the old product line. When this happens, a
software developer must change the sales field to allow for at
least six digits.
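The capacity limit behind this change is simple arithmetic: a field of n digits with zero decimal places can hold values up to 10 to the power n, minus 1. A short Python sketch, using the hypothetical helpers max_value and fits, shows why five digits stop being enough at $100,000.

```python
def max_value(digits):
    """Largest whole number that fits in a field with the given digit count."""
    return 10 ** digits - 1

def fits(value, digits):
    """True if the value can be stored without losing high-order digits."""
    return abs(value) <= max_value(digits)

print(max_value(5))        # 99999
print(fits(100000, 5))     # False
print(fits(100000, 6))     # True
```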
FILE(COBOL2DLIB/EMPPAYPF)
SRCFILE(COBOL2DLIB/QDDSSRC)
SRCMBR(EMPPAYPF)
DLTDEPLF(*YES)
END-OF-CHAPTER AIDS
CHAPTER SUMMARY
The most commonly used data types on iSeries Servers are
o Alphanumeric (Character) (A)
o Numeric: Signed Zoned Decimal (S)
o Numeric: Packed Decimal (P)
KEY TERMS
Access path
ALIAS
Arrival sequence
Copy File (CPYF)
Data Description Specifications (DDS)
Data File Utility (DFU)
DB2/400
Display File Field Description (DSPFFD)
Display Physical File Member (DSPPFM)
Externally described files
Index
Key field
Key file
Keyed sequence
Logical file
Physical file (PF)
Pointer
Program-described files
Relational database model
CHAPTER SELF-TEST
TRUE-FALSE QUESTIONS
FILL-IN-THE-BLANKS
1. When created, a keyed file establishes two files on disk:
the ___ file and a separate ___ file.
2. The ___ field uniquely identifies each record in a keyed
file.
3. The key file or index contains two fields: the ___ field(s)
and a ___ to the physical data.
14. In relational databases, records are often displayed
as ___ in a table.
PROGRAMMING ASSIGNMENTS
1. Using the following problem definition:
   1. Create the DDS for the Customer Transaction File.
   2. Use DFU to enter sample test data into the file.
Record Description Layout for customer transaction file
Field Description     Type   Size   COBOL Field-name
First Name                          CT-FIRST-NAME
Middle Initial                      CT-MIDDLE-INITIA
Last Name                    15     CT-LAST-NAME
                             10     CT-DATE-OF-TRAN
Transaction Amount           7,2    CT-TRANSACTION
Field Description     Type   Size   COBOL Field-name
                             5,0    PS-EMPLOYEE-NUMBER
                             20     PS-EMPLOYEE-LAST-NA
                             12     PS-EMPLOYEE-FIRST-NA
Territory Number             2,0    PS-TERRITORY-NUMBER
Office Number                2,0    PS-OFFICE-NUMBER
Annual Salary                7,0    PS-ANNUAL-SALARY
                             9,0    PS-SOCIAL-SECURITY-NU
Field Description     Type   Size   COBOL Field-name
                             15     CA-CUSTOMER-FIRST-NA
                                    CA-CUSTOMER-MIDDLE-
                             20     CA-CUSTOMER-LAST-NA
Street Address               20     CA-STREET-ADDRESS
City                         15     CA-CITY
State                               CA-STATE
Zip Code                     5,0    CA-ZIP-CODE
Field Description     Type   Size   COBOL Field-name
                             5,0    CU-ACCOUNT-NU
First Name                   10     CU-FIRST-NAME
Last Name                    15     CU-LAST-NAME
Street Address               20     CU-STREET-ADD
                             5,0    CU-HOURS-OF-E
Gas Used                     5,0    CU-GAS-USED
Electricity Bill             5,2    CU-ELECTRICITY