INLS 623 - Database Systems II - File Structures, Indexing, and Hashing
Primary Memory
Random Access Memory (RAM)
Secondary Memory
Disk (Hard Disk)
Tape
Solid State Devices (SSD)
DVD/Blu-ray
Secondary Storage
The blocking factor bfr for a file is the (average) number of file
records stored in a disk block.
A file can have fixed-length records or variable-length records.
FILES OF RECORDS (CONTD.)
File records can be unspanned or spanned
Unspanned: no record can span two blocks
Spanned: a record can be stored in more than one block
The physical disk blocks that are allocated to hold the
records of a file can be contiguous, linked, or indexed.
In a file of fixed-length records, all records have the same
format. Usually, unspanned blocking is used with such files.
Files of variable-length records require additional
information to be stored in each record, such as separator
characters and field types.
Usually spanned blocking is used with such files.
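The blocking factor defined earlier can be worked out directly for the fixed-length, unspanned case. The following is a minimal sketch; the block size, record size, and record count are hypothetical values chosen for illustration, not figures from the slides.

```python
import math

def blocking_factor(block_size: int, record_size: int) -> int:
    """Records per block for fixed-length, unspanned records: floor(B / R)."""
    return block_size // record_size

def blocks_needed(num_records: int, bfr: int) -> int:
    """Number of blocks required to hold the file: ceil(r / bfr)."""
    return math.ceil(num_records / bfr)

# Hypothetical values (assumptions for illustration only).
B = 512      # block size in bytes
R = 100      # record size in bytes
r = 30000    # number of records in the file

bfr = blocking_factor(B, R)   # whole records that fit in one block
unused = B - bfr * R          # bytes wasted per block when unspanned
b = blocks_needed(r, bfr)     # blocks in the file

print(bfr, unused, b)
```

With unspanned blocking the leftover bytes in each block stay empty; spanned blocking would use them at the cost of records crossing block boundaries.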
Unordered Files
• Also called a heap or a pile file.
• New records are inserted at the end of the file.
• Deletion can be done by marking a record as invalid
– Later compaction can be done to recover space.
• A linear search through the file records is
necessary to find a record, since the records are
unordered.
– This requires reading and searching half the file blocks
on average (all of them if the record is absent), and is hence quite expensive.
• Record insertion is quite efficient.
• Reading the records in order of a particular field
requires sorting the file records after reading.
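The heap file operations listed above can be sketched in memory as follows. This is a toy model under stated assumptions: the `HeapFile` class, its method names, and the use of `None` as a deletion marker are all hypothetical, and a Python list of lists stands in for disk blocks.

```python
class HeapFile:
    """Toy unordered (heap) file; each inner list models one disk block."""

    def __init__(self, bfr):
        self.bfr = bfr          # blocking factor: record slots per block
        self.blocks = [[]]

    def insert(self, record):
        # New records always go at the end of the file.
        if len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, key):
        """Linear scan; returns (record or None, number of blocks read)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for rec in block:
                if rec is not None and rec["key"] == key:
                    return rec, blocks_read
        return None, len(self.blocks)

    def delete(self, key):
        # Mark the slot as invalid; the space is not reclaimed yet.
        for block in self.blocks:
            for slot, rec in enumerate(block):
                if rec is not None and rec["key"] == key:
                    block[slot] = None
                    return True
        return False

    def compact(self):
        # Rewrite the file without the invalid slots to recover space.
        live = [rec for block in self.blocks for rec in block if rec is not None]
        self.blocks = [[]]
        for rec in live:
            self.insert(rec)


hf = HeapFile(bfr=2)
for k in [42, 7, 19, 3, 88]:
    hf.insert({"key": k})

rec, reads = hf.search(3)      # must scan blocks from the start of the file
hf.delete(7)
blocks_before = len(hf.blocks)
hf.compact()
blocks_after = len(hf.blocks)
print(reads, blocks_before, blocks_after)
```

Insertion touches only the last block, while search cost grows with the position of the record in the file.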
Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an ordering field (e.g., SSN)
• Insertion is expensive: records must be inserted in the correct order.
– It is common to keep a separate unordered overflow (or transaction)
file for new records to improve insertion efficiency; this is
periodically merged with the main ordered file.
• A binary search can be used to search for a record on its ordering field
value.
– This requires reading about log2(b) of the file's b blocks on
average, an improvement over the b/2 blocks of a linear search.
• Reading the records in order of the ordering field is quite efficient.
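The binary search over an ordered file can be sketched at block granularity. This is a minimal illustration, assuming non-empty blocks, unique integer keys, and a hypothetical function name `binary_search_blocks`; each loop iteration stands for one block read.

```python
def binary_search_blocks(blocks, key):
    """Binary search over sorted blocks; returns (found, blocks_read)."""
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]        # models reading one block from disk
        reads += 1
        if key < block[0]:
            hi = mid - 1           # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1           # key can only be in a later block
        else:
            return key in block, reads
    return False, reads


# Hypothetical file: 1024 sorted keys, 4 records per block, so 256 blocks.
keys = list(range(1024))
bfr = 4
blocks = [keys[i:i + bfr] for i in range(0, len(keys), bfr)]

found, reads = binary_search_blocks(blocks, 777)
linear_reads = 777 // bfr + 1      # blocks a scan from the start would read
print(found, reads, linear_reads)
```

The search halves the candidate range on every read, so the number of block reads is bounded by roughly log2 of the block count rather than growing linearly with it.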
HOW DOES A DATABASE MANIPULATE DATA ON DISK?
ITEMS TABLE
How does MySQL know where to find and return the data
for this query?
Array
Hashtable/Dictionary/Associative Array
Tuple
Graphs
Trees
Object
ARRAY: DATA STRUCTURES
Clustered Index
Unclustered/Non-clustered Index
CLUSTERED INDEX
Owners
Owner_ID (PK)
name
age
Cars
Car_ID(PK)
Owner_ID (FK)
type
CLUSTERED EXAMPLE
Owners
OwnerID | name | age
1       | J    | 42
2       | K    | 35
Cars
CarID | OwnerID | type
1     | 1       | Ford
2     | 1       | Mustang
If we update one of the values of a clustered index, the database has to re-sort the rows
- This involves deleting and inserting, which is a performance hit!
Typically, clustered indexes are on PK and FK because those values are rarely
updated
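The cost of changing a clustered key can be sketched with a toy in-memory model. This is not how any particular database engine stores rows; the `ClusteredTable` class and its methods are hypothetical, and a sorted Python list stands in for rows kept in physical key order.

```python
import bisect

class ClusteredTable:
    """Toy model: rows are kept physically sorted by the clustering key."""

    def __init__(self):
        self.keys = []   # clustering key values, always sorted
        self.rows = []   # row data, in the same order as self.keys

    def insert(self, key, row):
        # Find the sorted position, then shift later rows to make room.
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.rows.insert(pos, row)

    def delete(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos == len(self.keys) or self.keys[pos] != key:
            raise KeyError(key)
        self.keys.pop(pos)
        return self.rows.pop(pos)

    def update_key(self, old_key, new_key):
        # Changing the clustering key moves the row: delete, then re-insert.
        row = self.delete(old_key)
        self.insert(new_key, row)


owners = ClusteredTable()
owners.insert(1, {"name": "J", "age": 42})
owners.insert(2, {"name": "K", "age": 35})

owners.update_key(1, 3)            # owner J moves after owner K
print(owners.keys, [r["name"] for r in owners.rows])
```

Updating a non-key column such as `age` would change the row in place, whereas changing the key forces the row to move, which is why stable columns make better clustering keys.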
UNCLUSTERED/NONCLUSTERED INDEX