
Example Problems:

1. Here are a few example problems related to blocking factors, along with solutions to help you
understand the concept better:

### Problem 1: Basic Blocking Factor Calculation

*Problem:*

You have a file where each record is 200 bytes long. The storage block size is 1 KB (1024 bytes).
What is the blocking factor for this file?

*Solution:*

The blocking factor is calculated by dividing the block size by the record size.

\[
\text{Blocking Factor} = \frac{\text{Block Size}}{\text{Record Size}} = \frac{1024 \text{ bytes}}{200 \text{ bytes}} = 5.12
\]

Since only whole records can be stored in a block, the blocking factor is rounded down to 5. This
means that 5 records can fit into one block.
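
The same calculation can be expressed as a short sketch (in Python; the function name and values below are illustrative, not part of the problem statement):

```python
# Blocking factor = floor(block size / record size), since only whole records fit in a block.
def blocking_factor(block_size_bytes: int, record_size_bytes: int) -> int:
    return block_size_bytes // record_size_bytes

print(blocking_factor(1024, 200))  # -> 5 records per block
```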

---

### Problem 2: Identifying Records in a Specific Block

*Problem:*

Given the blocking factor calculated as 5 in Problem 1, determine which records would be stored
in Block 3.

*Solution:*

To find the records in Block 3:

1. The first block (Block 1) would contain records 1 to 5.

2. The second block (Block 2) would contain records 6 to 10.


3. The third block (Block 3) would therefore contain records 11 to 15.

So, records 11 through 15 would be stored in Block 3.
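
A small sketch of the same mapping, assuming records and blocks are numbered from 1 and the blocking factor is fixed (the helper name is illustrative):

```python
# Block k (1-based) holds records (k - 1) * bfr + 1 through k * bfr.
def records_in_block(block_number: int, bfr: int) -> range:
    first = (block_number - 1) * bfr + 1
    return range(first, first + bfr)

print(list(records_in_block(3, 5)))  # -> [11, 12, 13, 14, 15]
```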

---

### Problem 3: Adjusting Blocking Factor with Different Block Sizes

*Problem:*

If the block size is increased to 2 KB (2048 bytes) while keeping the record size the same at 200
bytes, what would be the new blocking factor?

*Solution:*

The new blocking factor is calculated similarly:

\[
\text{Blocking Factor} = \frac{\text{Block Size}}{\text{Record Size}} = \frac{2048 \text{ bytes}}{200 \text{ bytes}} = 10.24
\]

Rounding down, the new blocking factor is 10. This means that with a block size of 2 KB, each
block can store 10 records.

---

### Problem 4: Effect of Variable Record Sizes

*Problem:*

Assume you have a file with variable record sizes, ranging between 150 bytes and 250 bytes. If the
block size is 1.5 KB (1536 bytes), how would you calculate an approximate blocking factor?

*Solution:*

For variable record sizes, an average record size can be used to calculate an approximate blocking
factor.

1. First, calculate the average record size:

\[

\text{Average Record Size} = \frac{150 + 250}{2} = 200 \text{ bytes}

\]

2. Then, calculate the blocking factor using the average record size:

\[
\text{Blocking Factor} = \frac{\text{Block Size}}{\text{Average Record Size}} = \frac{1536 \text{ bytes}}{200 \text{ bytes}} = 7.68
\]

Rounding down, the approximate blocking factor would be 7.
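
Because the average only approximates real data, a quick simulation can show how close the estimate is in practice. The sketch below uses randomly generated record sizes purely for illustration; the data and the greedy packing strategy are assumptions, not part of the problem:

```python
import random

block_size = 1536
random.seed(0)
# Simulated variable-length records between 150 and 250 bytes (illustrative data only).
records = [random.randint(150, 250) for _ in range(1000)]

avg_size = sum(records) / len(records)
estimate = int(block_size // avg_size)   # average-based estimate, about 7 records per block

# Greedy packing: count how many whole records actually fit in each block.
per_block, used, count = [], 0, 0
for size in records:
    if used + size > block_size:
        per_block.append(count)
        used, count = 0, 0
    used += size
    count += 1
per_block.append(count)

print(estimate, round(sum(per_block) / len(per_block), 2))  # estimate vs. observed average
```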

---

### Problem 5: Impact of Changing Record Sizes on Storage Efficiency

*Problem:*

Suppose you decrease the record size from 200 bytes to 180 bytes, keeping the block size constant
at 1 KB (1024 bytes). What is the impact on the blocking factor, and how does this affect storage
efficiency?

*Solution:*

First, calculate the new blocking factor:

\[
\text{Blocking Factor} = \frac{\text{Block Size}}{\text{Record Size}} = \frac{1024 \text{ bytes}}{180 \text{ bytes}} \approx 5.69
\]

Rounding down, the new blocking factor is 5.


*Impact on Storage Efficiency:*

- *Before (with 200-byte records):* 5 records per block with 24 bytes left unused (1024 - (5 * 200)).

- *After (with 180-byte records):* 5 records per block with 124 bytes left unused (1024 - (5 * 180)).

Reducing the record size lowers the storage efficiency in this case: the blocking factor stays at 5,
while the unused space per block grows from 24 bytes to 124 bytes.
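
A short sketch of the efficiency comparison, assuming the same fixed-size records and 1 KB blocks (the helper name is illustrative):

```python
# Returns (blocking factor, unused bytes per block, fraction of the block used).
def block_utilization(block_size: int, record_size: int):
    bfr = block_size // record_size
    unused = block_size - bfr * record_size
    return bfr, unused, bfr * record_size / block_size

print(block_utilization(1024, 200))  # -> (5, 24, 0.9765625)   ~97.7% of each block used
print(block_utilization(1024, 180))  # -> (5, 124, 0.87890625) ~87.9% of each block used
```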

2. Assume an employee file with 60,000 records, each of size 200 bytes, ordered by its key field
employee_number (of length 18 bits). The employee file is stored in hard disk blocks of
size 2048 bytes. [5][CO01][BTL4]

(a) Calculate the average number of block accesses required to locate any employee record
directly in the file, for the given employee_number.

(b) If an index file is created for this employee file, how much improvement will be obtained in the
search for the same record (assume length of pointer information is 12 bits)?

Solution :

Let's break down the problem and calculate the required values step by step:

### Part (a): Average Number of Block Accesses to Locate a Record

1. *File Characteristics*:

- Number of records: \( n = 60,000 \)

- Size of each record: 200 bytes

- Block size: 2048 bytes

- Key field size (employee number): 18 bits (2.25 bytes)

2. *Number of records per block*:

\[
\text{Records per block} = \frac{\text{Block size}}{\text{Record size}} = \frac{2048 \text{ bytes}}{200 \text{ bytes}} = 10.24 \approx 10 \text{ records/block}
\]

Since only full records can be stored, each block will store 10 records.

3. *Total number of blocks required*:

\[
\text{Total blocks} = \frac{\text{Total records}}{\text{Records per block}} = \frac{60,000}{10} = 6,000 \text{ blocks}
\]

4. *Average number of block accesses*:

If the record is located by a linear scan of the blocks and any record is equally likely to be
requested, then on average half of the blocks must be accessed:

\[

\text{Average block accesses} = \frac{\text{Total blocks}}{2} = \frac{6,000}{2} = 3,000

\]

### Part (b): Improvement Using an Index File

1. *Index File Characteristics*:

- Pointer size: 12 bits (1.5 bytes)

- Key size: 18 bits (2.25 bytes)

- Total index entry size: \( 2.25 \text{ bytes} + 1.5 \text{ bytes} = 3.75 \text{ bytes} \)

2. *Number of index entries per block*:

\[
\text{Index entries per block} = \frac{\text{Block size}}{\text{Index entry size}} = \frac{2048 \text{ bytes}}{3.75 \text{ bytes}} \approx 546 \text{ entries/block}
\]

3. *Total number of index blocks*:

\[

\text{Total index entries} = \text{Total blocks} = 6,000 \text{ entries}

\]

\[
\text{Total index blocks} = \frac{6,000 \text{ entries}}{546 \text{ entries/block}} \approx 11 \text{ blocks}
\]

4. *Average number of block accesses using index*:

For a two-level index (the 11 first-level blocks are themselves indexed by a single top-level block), the search involves accessing:

- One block of the top-level index.

- One block of the first-level index.

- One block of the actual data file.

Therefore, the total number of block accesses is:

\[

\text{Total block accesses with index} = 1 + 1 + 1 = 3 \text{ block accesses}

\]

### Improvement Calculation:

The improvement in block accesses is the ratio of block accesses without the index to block
accesses with the index:

\[
\text{Improvement} = \frac{\text{Block accesses without index}}{\text{Block accesses with index}} = \frac{3,000}{3} = 1,000 \text{ times}
\]

Thus, using an index file provides a significant improvement, reducing the average number of
block accesses by a factor of 1,000.
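
The whole calculation for parts (a) and (b) can be checked with a short script. This is a sketch under the assumptions made above (linear scan without the index, and a search path costed at 3 block accesses with the two-level index); the variable names are illustrative:

```python
import math

block_size = 2048            # bytes
record_size = 200            # bytes
num_records = 60_000
key_size = 18 / 8            # 18 bits = 2.25 bytes
pointer_size = 12 / 8        # 12 bits = 1.5 bytes

# Part (a): linear scan of the data blocks, half of them on average.
records_per_block = block_size // record_size                 # 10
data_blocks = math.ceil(num_records / records_per_block)      # 6,000
avg_accesses_no_index = data_blocks / 2                       # 3,000

# Part (b): index with one entry per data block.
entry_size = key_size + pointer_size                          # 3.75 bytes
entries_per_block = int(block_size // entry_size)             # 546
index_blocks = math.ceil(data_blocks / entries_per_block)     # 11
accesses_with_index = 3                                       # two index levels + 1 data block

print(avg_accesses_no_index / accesses_with_index)            # -> 1000.0
```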

3. Assume a folding hash function defined as follows: “For the given key ‘X’, the hash value H(X) =
(a + b + c) mod M, where a, b and c are the three parts of given key ‘X’”.

Analyze the above hash function and infer how it folds the key 123456789 into a hash table
of ten spaces (0 to 9).

Solution:

To analyze the given folding hash function and apply it to the key 123456789, let's break it down
into steps:

### Step 1: Divide the Key into Parts

The key 123456789 needs to be divided into three parts. The key length is 9 digits, so we can
divide it evenly into three parts:

- Part 1 (a) = 123

- Part 2 (b) = 456

- Part 3 (c) = 789

### Step 2: Calculate the Hash Value

The hash value is computed as the sum of the three parts modulo the size of the hash table (which
is 10 in this case).

So, the formula is:

\[

H(X) = (a + b + c) \mod M

\]

Substituting the values:

\[

H(123456789) = (123 + 456 + 789) \mod 10

\]

### Step 3: Perform the Calculation

First, compute the sum of the parts:

\[

123 + 456 + 789 = 1368

\]

Next, find the modulo with the size of the hash table (10):
\[

1368 \mod 10 = 8

\]

### Conclusion

The hash value H(123456789) for the given key, when folded and mapped into a hash table of 10
spaces, is 8. This means the key 123456789 will be placed in position 8 of the hash table.
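
A minimal sketch of the folding hash described above, assuming the key is split into three equal-length groups of digits (the function name and the equal three-way split are assumptions for this example):

```python
# Folding hash: split the key's digits into three parts, sum them, take modulo M.
def folding_hash(key: int, m: int = 10) -> int:
    digits = str(key)
    part_len = len(digits) // 3
    a = int(digits[:part_len])
    b = int(digits[part_len:2 * part_len])
    c = int(digits[2 * part_len:])
    return (a + b + c) % m

print(folding_hash(123456789))  # -> (123 + 456 + 789) % 10 = 1368 % 10 = 8
```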

Indexing

4. Example 1. Suppose that we have an ordered file with r = 30,000 records stored on a disk with block size B = 1024 bytes. File records are of fixed size and are unspanned, with record length R = 100 bytes. The blocking factor for the file would be bfr = ⎣(B/R)⎦ = ⎣(1024/100)⎦ = 10 records per block. The number of blocks needed for the file is b = ⎡(r/bfr)⎤ = ⎡(30000/10)⎤ = 3000 blocks. A binary search on the data file would need approximately ⎡(log2 b)⎤ = ⎡(log2 3000)⎤ = 12 block accesses.

Now suppose that the ordering key field of the file is V = 9 bytes long, a block pointer is P = 6 bytes long, and we have constructed a primary index for the file. The size of each index entry is Ri = (9 + 6) = 15 bytes, so the blocking factor for the index is bfri = ⎣(B/Ri)⎦ = ⎣(1024/15)⎦ = 68 entries per block. The total number of index entries ri is equal to the number of blocks in the data file, which is 3000. The number of index blocks is hence bi = ⎡(ri/bfri)⎤ = ⎡(3000/68)⎤ = 45 blocks. A binary search on the index file would need ⎡(log2 bi)⎤ = ⎡(log2 45)⎤ = 6 block accesses. To search for a record using the index, we need one additional block access to the data file, for a total of 6 + 1 = 7 block accesses, an improvement over binary search on the data file, which required 12 disk block accesses.
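
The numbers in Example 1 can be reproduced with a short script; this is a sketch using the same symbols as the text (B, R, r, V, P), with the floor and ceiling written as integer division and math.ceil:

```python
import math

B, R, r = 1024, 100, 30_000         # block size, record size, number of records
V, P = 9, 6                         # ordering key size and block pointer size (bytes)

bfr = B // R                                     # 10 records per block
b = math.ceil(r / bfr)                           # 3,000 data blocks
binary_search_data = math.ceil(math.log2(b))     # 12 block accesses on the data file

Ri = V + P                                       # 15-byte index entry
bfri = B // Ri                                   # 68 entries per block
ri = b                                           # primary index: one entry per data block
bi = math.ceil(ri / bfri)                        # 45 index blocks
search_via_index = math.ceil(math.log2(bi)) + 1  # 6 + 1 = 7 block accesses

print(binary_search_data, search_via_index)      # -> 12 7
```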

A major problem with a primary index—as with any ordered file—is insertion and deletion of
records. With a primary index, the problem is compounded because if we attempt to insert a
record in its correct position in the data file, we must not only move records to make space for the
new record but also change some index entries, since moving records will change the anchor
records of some blocks. Using an unordered overflow file, as discussed in Section 17.7, can reduce
this problem. Another possibility is to use a linked list of overflow records for each block in the
data file. This is similar to the method of dealing with overflow records described with hashing in
Section 17.8.2. Records within each block and its overflow linked list can be sorted to improve
retrieval time. Record deletion is handled using deletion markers.

5. Example 2. Consider the file of Example 1 with r = 30,000 fixed-length records of size R = 100
bytes stored on a disk with block size B = 1024 bytes. The file has b = 3000 blocks, as calculated in
Example 1. Suppose we want to search for a record with a specific value for the secondary key—a
nonordering key field of the file that is V = 9 bytes long. Without the secondary index, to do a
linear search on the file would require b/2 = 3000/2 = 1500 block accesses on the average.
Suppose that we construct a secondary index on that nonordering key field of the file. As in Example 1, a block pointer is P = 6 bytes long, so each index entry is Ri = (9 + 6) = 15 bytes, and the blocking factor for the index is bfri = ⎣(B/Ri)⎦ = ⎣(1024/15)⎦ = 68 entries per block. In a dense secondary index such as this, the total number of index entries ri is equal to the number of records in the data file, which is 30,000. The number of blocks needed for the index is hence bi = ⎡(ri/bfri)⎤ = ⎡(30000/68)⎤ = 442 blocks.

A binary search on this secondary index needs ⎡(log2bi)⎤ = ⎡(log2442)⎤ = 9 block accesses. To
search for a record using the index, we need an additional block access to the data file for a total
of 9 + 1 = 10 block accesses—a vast improvement over the 1500 block accesses needed on the
average for a linear search, but slightly worse than the 7 block accesses required for the primary
index. This difference arises because the primary index is nondense and hence much smaller, only
45 blocks long.
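
The same style of check for the dense secondary index of Example 2 (a sketch with the same symbols as the text; the names are illustrative):

```python
import math

B, R, r = 1024, 100, 30_000
V, P = 9, 6

linear_search = math.ceil(r / (B // R)) / 2          # 3,000 / 2 = 1,500 block accesses

Ri = V + P                                           # 15-byte index entry
bfri = B // Ri                                       # 68 entries per block
ri = r                                               # dense index: one entry per record
bi = math.ceil(ri / bfri)                            # 442 index blocks
search_via_index = math.ceil(math.log2(bi)) + 1      # 9 + 1 = 10 block accesses

print(linear_search, bi, search_via_index)           # -> 1500.0 442 10
```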

We can also create a secondary index on a nonkey, nonordering field of a file. In this case,
numerous records in the data file can have the same value for the indexing field. There are several
options for implementing such an index:

■ Option 1 is to include duplicate index entries with the same K(i) value—one for each record. This
would be a dense index.

■ Option 2 is to have variable-length records for the index entries, with a repeating field for the
pointer. We keep a list of pointers in the index entry for K(i)—one pointer to each block that
contains a record whose indexing field value equals K(i). In either option 1 or option 2, the binary
search algorithm on the index must be modified appropriately to account for a variable number of
index entries per index key value.

■ Option 3, which is more commonly used, is to keep the index entries themselves at a fixed
length and have a single entry for each index field value, but to create an extra level of indirection
to handle the multiple pointers. In this nondense scheme, the pointer P(i) in the index entry points to
a disk block, which contains a set of record pointers; each record pointer in that disk block points
to one of the data file records with value K(i) for the indexing field. If some value K(i) occurs in too
many records, so that their record pointers cannot fit in a single disk block, a cluster or linked list
of blocks is used. This technique is illustrated in Figure 18.5. Retrieval via the index requires one or
more additional block accesses because of the extra level, but the algorithms for searching the
index and (more importantly) for inserting new records in the data file are straightforward. In
addition, retrievals on complex selection conditions may be handled by referring to the record
pointers, without having to retrieve many unnecessary records from the data file (see Exercise 18.23).

6. Example 3. Suppose that the dense secondary index of Example 2 is converted into a multilevel index. We calculated the index blocking factor bfri = 68 index entries per block, which is also the fan-out fo for the multilevel index; the number of first-level blocks b1 = 442 blocks was also calculated. The number of second-level blocks will be b2 = ⎡(b1/fo)⎤ = ⎡(442/68)⎤ = 7 blocks, and the number of third-level blocks will be b3 = ⎡(b2/fo)⎤ = ⎡(7/68)⎤ = 1 block. Hence, the third level is the top level of the index, and t = 3. To access a record by searching the multilevel index, we must access one block at each level plus one block from the data file, so we need t + 1 = 3 + 1 = 4 block accesses.

Compare this to Example 2, where 10 block accesses were needed when a single-level index and
binary search were used. Notice that we could also have a multilevel primary index, which would
be nondense. Exercise 18.18(c) illustrates this case, where we must access the data block from the
file before we can determine whether the record being searched for is in the file. For a dense
index, this can be determined by accessing the first index level (without having to access a data
block), since there is an index entry for every record in the file.
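
A short sketch of the multilevel calculation in Example 3, which keeps adding index levels until the top level fits in one block (the loop is an illustration of the repeated ceiling division in the text):

```python
import math

B, V, P = 1024, 9, 6
fo = B // (V + P)              # blocking factor = fan-out fo = 68
r = 30_000                     # dense secondary index: one entry per record

levels = [math.ceil(r / fo)]   # first level: 442 blocks
while levels[-1] > 1:          # add levels until the top level is a single block
    levels.append(math.ceil(levels[-1] / fo))

t = len(levels)
print(levels, t + 1)           # -> [442, 7, 1]  and 4 block accesses (t levels + 1 data block)
```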
