ECE 530 High Performance VLSI Homework 1

ECE 530
HIGH PERFORMANCE VLSI

HOMEWORK 1
Ankita Amar Desai

A20427081
The main objective of this homework is to understand system-level power reduction using Wattch. This
homework assumes a clear understanding of the Lab 1 tutorial.
Sorting Algorithms
A)
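The following are the five algorithms developed in this homework. They are numbered 3 to 7 because bubble sort (1) and quick sort (2) complete the set of seven algorithms compared below.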
3. Insertion Sort:- Insertion sort sorts one element at a time, much as a person sorts items by hand. It works well
for small sets of elements, but on large inputs it is slower than heap sort, shell sort, quick sort and merge sort,
because its worst-case running time grows as O(n²).
• Consider one element at a time.
• Use a for loop that runs over all the elements in the array.
• Compare each element with the elements before it and shift it left (swapping as necessary) until it is in
order, then repeat for the next element until the whole array is sorted.
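A minimal C sketch of the insertion sort described above, assuming an array of `int`. The function name `insertion_sort` is illustrative and is not taken from the homework code.

```c
#include <stddef.h>

/* Insertion sort: grow a sorted prefix a[0..i) one element at a time. */
void insertion_sort(int a[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];              /* next element to insert */
        size_t j = i;
        /* shift larger elements of the sorted prefix one slot to the right */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;                  /* drop the element into its slot */
    }
}
```

On an already sorted array the inner `while` loop never runs, which is the "best case" that shell sort relies on.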
4. Merge sort:- Based on the divide-and-conquer strategy.
• Divide the array into two halves.
• Sort each half (recursively, using merge sort).
• Merge the two sorted halves back into one sorted array.
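A sketch of the three steps above as a top-down recursive merge sort in C. It assumes an `int` array and a temporary buffer of the same length; the names `merge_sort`, `merge_sort_rec` and `merge` are hypothetical, not the homework's own code.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid) and a[mid..hi) via scratch buffer tmp. */
static void merge(int a[], int tmp[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof a[0]);
}

static void merge_sort_rec(int a[], int tmp[], size_t lo, size_t hi)
{
    if (hi - lo < 2) return;                 /* 0 or 1 element: already sorted */
    size_t mid = lo + (hi - lo) / 2;         /* divide into two halves */
    merge_sort_rec(a, tmp, lo, mid);         /* sort each half */
    merge_sort_rec(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);              /* combine the sorted halves */
}

/* Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int merge_sort(int a[], size_t n)
{
    if (n < 2) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    merge_sort_rec(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```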
5. Heap Sort:- Heap sort starts by building a heap from the given elements and then repeatedly removes the heap's
largest element, placing it at the end of the partially sorted array.
• It requires two arrays (or two regions of one array): one to hold the heap and the other to hold the sorted elements.
• Extract the largest element from the heap.
• Place it in the next open position from the end of the partially sorted array.
• Repeat until the heap is empty and the sorted array is full.
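A C sketch of the steps above, assuming the common in-place variant in which the heap and the sorted elements are two regions of one `int` array: the heap shrinks from the right while the sorted region grows. The names `heap_sort`, `sift_down` and `swap_int` are illustrative.

```c
#include <stddef.h>

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Restore the max-heap property for the subtree rooted at i (heap size n). */
static void sift_down(int a[], size_t n, size_t i)
{
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) return;
        swap_int(&a[i], &a[largest]);
        i = largest;
    }
}

void heap_sort(int a[], size_t n)
{
    if (n < 2) return;
    /* 1. build a max-heap, starting from the last internal node */
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, n, i);
    /* 2. move the largest element to the next open slot from the end */
    for (size_t end = n - 1; end > 0; end--) {
        swap_int(&a[0], &a[end]);
        sift_down(a, end, 0);        /* heap shrinks by one element */
    }
}
```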
6. Shell Sort:- Based on an increment (gap) sequence: elements that are one increment apart are sorted with respect
to each other. The increment size is reduced after each pass until it reaches 1. With an increment size of 1, the sort
is a basic insertion sort, but by this time the data is already nearly sorted, which is insertion sort's best case.
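A C sketch of shell sort as described above. The report does not say which increment sequence was used, so this assumes Shell's original halving sequence (n/2, n/4, ..., 1); the function name `shell_sort` is illustrative.

```c
#include <stddef.h>

/* Shell sort with the halving gap sequence n/2, n/4, ..., 1 (an assumption). */
void shell_sort(int a[], size_t n)
{
    for (size_t gap = n / 2; gap > 0; gap /= 2) {
        /* gapped insertion sort: elements gap apart form sorted chains */
        for (size_t i = gap; i < n; i++) {
            int key = a[i];
            size_t j = i;
            while (j >= gap && a[j - gap] > key) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = key;
        }
    }
}
```

The last pass (gap = 1) is exactly the insertion sort from item 3, applied to data the earlier passes have already brought close to order.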
7. Selection sort:- Selection sort first finds the smallest element in the array and exchanges it with the element in
the first position, then finds the second smallest element and exchanges it with the element in the second position,
and continues in this way until the entire array is sorted.
• Find the minimum element of the unsorted part of the array.
• Swap it with the first element of the unsorted part.
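A C sketch of selection sort following the description above (smallest remaining element swapped into the next position), assuming an `int` array; `selection_sort` is an illustrative name.

```c
#include <stddef.h>

void selection_sort(int a[], size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;                      /* index of smallest unsorted element */
        for (size_t j = i + 1; j < n; j++)
            if (a[j] < a[min]) min = j;
        if (min != i) {                      /* swap it into position i */
            int t = a[i];
            a[i] = a[min];
            a[min] = t;
        }
    }
}
```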
B)
1. Bubble Sort Algorithm
2. Quick Sort Algorithm
3. Insertion Sort Algorithm
4. Merge Sort Algorithm
5. Heap Sort Algorithm
6. Shell Sort Algorithm
7. Selection Sort Algorithm
C) Power estimation results
Sr. No.   Sorting Algorithm            Average Power
1.        Bubble Sort Algorithm        16.9769
2.        Quick Sort Algorithm         15.2572
3.        Insertion Sort Algorithm     15.9635
4.        Merge Sort Algorithm         15.3038
5.        Heap Sort Algorithm          15.5334
6.        Shell Sort Algorithm         15.3153
7.        Selection Sort Algorithm     16.5436

D) From the table above, quick sort requires the least power (15.2572) and bubble sort the most (16.9769) of the
seven sorting techniques. This suggests that power consumption depends on factors such as the number of loops
and the number of function calls. As the number of loops is reduced and the for loops are implemented in
parallel, the power consumption is expected to decrease.
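To make the comparison above concrete, the small C sketch below stores the seven measured values from the table, picks out the lowest- and highest-power algorithms, and computes the gap between them. The names `alg_name`, `alg_power`, `power_extremes` and `max_saving_percent` are hypothetical helpers, not part of the homework.

```c
#include <stddef.h>

/* Average power per algorithm, copied from the table in part C. */
static const char *const alg_name[] = {
    "Bubble", "Quick", "Insertion", "Merge", "Heap", "Shell", "Selection"
};
static const double alg_power[] = {
    16.9769, 15.2572, 15.9635, 15.3038, 15.5334, 15.3153, 16.5436
};
#define N_ALGS (sizeof alg_power / sizeof alg_power[0])

/* Find the indices of the lowest- and highest-power algorithms. */
void power_extremes(size_t *lo, size_t *hi)
{
    *lo = *hi = 0;
    for (size_t i = 1; i < N_ALGS; i++) {
        if (alg_power[i] < alg_power[*lo]) *lo = i;
        if (alg_power[i] > alg_power[*hi]) *hi = i;
    }
}

/* Power saved by the lowest-power algorithm, as a percentage of the highest. */
double max_saving_percent(void)
{
    size_t lo, hi;
    power_extremes(&lo, &hi);
    return 100.0 * (alg_power[hi] - alg_power[lo]) / alg_power[hi];
}
```

With these values quick sort is the minimum and bubble sort the maximum, and the spread works out to roughly 10% of bubble sort's average power.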
Slack Loop Unrolling

The following two tasks make up the second part of this homework:
Task 1) Use a software transformation technique (for example, loop unrolling) to modify ‘slack.c’ with the default
cache configuration.
Task 2) Use a software transformation technique (for example, loop unrolling) to modify ‘slack.c’ with your own cache
configuration, set by modifying the ‘my_config’ file. Find the optimal level-1 and level-2 cache parameters in terms
of power.
The default values of the level-1 instruction cache, level-1 data cache and level-2 caches are as follows:
Level-1 data cache: -cache:dl1 dl1:128:32:4:l
Level-1 instruction cache: -cache:il1 il1:512:32:1:l
Level-2 data cache: -cache:dl2 ul2:1024:64:4:l
Level-2 instruction cache: -cache:il2 dl2 (level-2 instruction accesses go to the same unified ul2 cache)
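Each cache specification above follows SimpleScalar sim-outorder's `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, so its capacity is sets × block size × associativity. The C sketch below parses such a string and computes the size; `struct cache_spec`, `parse_cache_spec` and `cache_bytes` are hypothetical names introduced for illustration.

```c
#include <stdio.h>

/* One cache spec string: <name>:<nsets>:<bsize>:<assoc>:<repl> */
struct cache_spec {
    char name[16];
    unsigned nsets, bsize, assoc;
    char repl;                      /* 'l' = LRU, 'f' = FIFO, 'r' = random */
};

/* Returns 1 on success, 0 if the string does not match the format
   (e.g. a bare cache name such as "dl2"). */
int parse_cache_spec(const char *s, struct cache_spec *c)
{
    return sscanf(s, "%15[^:]:%u:%u:%u:%c",
                  c->name, &c->nsets, &c->bsize, &c->assoc, &c->repl) == 5;
}

/* Capacity in bytes = number of sets x block size x associativity. */
unsigned long cache_bytes(const struct cache_spec *c)
{
    return (unsigned long)c->nsets * c->bsize * c->assoc;
}
```

Applied to the defaults, dl1 (128 × 32 B × 4) and il1 (512 × 32 B × 1) are both 16 KB, and the unified ul2 (1024 × 64 B × 4) is 256 KB.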
A)
Task 1)
Without changing the default cache configuration given above, we modify the given slack.c file using the software
transformation technique known as loop unrolling and measure the power consumption for this case. Figure 8 shows
the modifications made to the ‘slack.c’ file to obtain the minimum power.
Procedure to be followed:-
1. Study the two reference files provided, ‘roll.c’ and ‘unroll.c’.
2. The given slack code uses rolled loops, as in ‘roll.c’.
3. We need to convert the rolled loops into unrolled loops, as in ‘unroll.c’.
4. The slack code contains many functions.
5. Unrolling needs to be applied to only one of these functions. If it is applied to most of the functions, the
power increases, which is not what we want.
6. We must also make sure that the ‘output.txt’ file shows the same results as it did without
unrolling.
7. Here I applied the unrolling technique to the last loop in the code, because only this change gave the same
results in the ‘output.txt’ file as before.
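The actual change to slack.c appears only in Fig. 8, so the C sketch below is a generic illustration of the rolled-to-unrolled conversion on a hypothetical summation loop (`sum_rolled` and `sum_unrolled4` are not from slack.c). Unrolling by 4 executes one loop test and branch per four elements instead of one per element, and a cleanup loop handles the leftover `n % 4` elements.

```c
#include <stddef.h>

/* Rolled loop: one add plus one loop test/branch per element. */
long sum_rolled(const int a[], size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: one loop test/branch per four elements. */
long sum_unrolled4(const int a[], size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)              /* remaining n % 4 elements */
        s += a[i];
    return s;
}
```

Both functions must return the same result for every `n`, which mirrors step 6's check that ‘output.txt’ is unchanged after unrolling.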
Fig 8. Changes made in the slack.c file
Fig 9. Config file with all default cache configuration

Fig 10. Output.txt file for unrolling applied in the slack.c file
Task 2)
Here we change the default cache configuration discussed earlier to the combination that gives the minimum power.
Finding it requires trial and error, but the Lab 1 tutorial can also be used as a guide. I found that setting all
four caches to a size of 4 KB gives the minimum power.
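For the trial-and-error search, it helps to list every cache geometry with a given total size before simulating each one. The C sketch below does this for an L1 data cache. The block sizes (16, 32, 64 B) and associativities (1, 2, 4, 8) it searches are assumptions, as is the rule that the number of sets must be a power of two (SimpleScalar's cache module enforces this); `list_geometries` is a hypothetical helper. It produces only candidate option lines, not power figures, and each candidate still has to be run through Wattch.

```c
#include <stdio.h>

/* Return 1 if x is a non-zero power of two. */
static int is_pow2(unsigned long x) { return x != 0 && (x & (x - 1)) == 0; }

/* Print candidate "-cache:dl1" option lines of the given total size to out
   (if out is non-NULL) and return how many candidates there are. */
int list_geometries(unsigned long size_bytes, FILE *out)
{
    static const unsigned bsizes[] = {16, 32, 64};   /* assumed block sizes */
    static const unsigned assocs[] = {1, 2, 4, 8};   /* assumed associativities */
    int count = 0;
    for (size_t b = 0; b < sizeof bsizes / sizeof bsizes[0]; b++) {
        for (size_t a = 0; a < sizeof assocs / sizeof assocs[0]; a++) {
            unsigned long way_bytes = (unsigned long)bsizes[b] * assocs[a];
            if (size_bytes % way_bytes != 0) continue;
            unsigned long nsets = size_bytes / way_bytes;
            if (!is_pow2(nsets)) continue;           /* sets must be a power of two */
            if (out != NULL)
                fprintf(out, "-cache:dl1 dl1:%lu:%u:%u:l\n",
                        nsets, bsizes[b], assocs[a]);
            count++;
        }
    }
    return count;
}
```

For 4 KB this yields twelve candidates, one of which is the 64-set, 16 B, 4-way geometry used in this homework.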
Procedure to be followed:-
1. Make the cache configuration changes in the ‘my_config’ file.
2. I chose the same configuration for all four caches: 64 sets, a block size of 16 B and an associativity of 4,
which gives 64 × 16 B × 4 = 4 KB per cache. After making these changes in the ‘my_config’ file, compile the slack
file with the unrolling technique applied.
3. Next, view the sim log file.
Fig 10. Config file with cache size of 4KB
B) Output results
Fig 11 shows that by keeping the default cache configuration values we get average power of 14.6365.
Fig 11. Simlog file

C) Power Estimation Results

Keeping D2 and I1 constant (varying dl1)
Cache                                 Config 1          Config 2          Config 3
dl1                                   dl1:128:32:4:l    dl1:512:32:1:l    dl1:1024:64:4:l
dl2                                   ul2:1024:64:4:l   ul2:1024:64:4:l   ul2:1024:64:4:l
il1                                   il2:512:32:1:l    il2:512:32:1:l    il2:512:32:1:l
Average power for slack               14.79             16.95             18.99
Average power for slack with unroll   14.69             16.83             18.86

Keeping D1 and D2 constant (varying il1)
Cache                                 Config 1          Config 2          Config 3
dl1                                   dl1:128:32:4:l    dl1:128:32:4:l    dl1:128:32:4:l
dl2                                   ul2:1024:64:4:l   ul2:1024:64:4:l   ul2:1024:64:4:l
il1                                   dl1:128:32:4:l    il2:512:32:1:l    il2:1024:64:4:l
Average power for slack               10.847            14.79             17.61
Average power for slack with unroll

Keeping D1 and I1 constant (varying dl2)
Cache                                 Config 1          Config 2          Config 3
dl1                                   dl1:128:32:4:l    dl1:128:32:4:l    dl1:128:32:4:l
dl2                                   ul2:128:32:4:l    ul2:512:32:1:l    ul2:1024:64:4:l
il1                                   dl1:512:32:1:l    il2:512:32:1:l    il2:512:32:1:l
Average power for slack               14.31             14.46             14.79
Average power for slack with unroll
D) From these tables, setting D1, L1, L2 and D2 to 128:32:4 gives the least power: 10.847 for slack without
unrolling and 10.842 for slack with unrolling. The cache configuration values are chosen independently of the
software transformation applied, and the minimum power is achieved when the modified cache configuration and the
software transformation are combined.
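The relative size of the two effects can be checked from the "Keeping D2 and I1 constant" table. The C sketch below copies its two power rows and computes percentage reductions; `rolled`, `unrolled` and `percent_reduction` are hypothetical names. Unrolling saves roughly 0.7% in each column, while moving dl1 from 1024:64:4 to 128:32:4 reduces the power of the rolled slack code by about 22%, which is consistent with the minimum being reached only when both changes are combined.

```c
#include <stddef.h>

/* Average power from the "Keeping D2 and I1 constant" table;
   the three columns are the three dl1 configurations. */
static const double rolled[]   = {14.79, 16.95, 18.99};
static const double unrolled[] = {14.69, 16.83, 18.86};

/* Percentage reduction when going from power `before` to power `after`. */
double percent_reduction(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

For example, `percent_reduction(rolled[0], unrolled[0])` gives the unrolling saving for the 128:32:4 dl1 column, and `percent_reduction(rolled[2], rolled[0])` gives the saving from the cache change alone.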
Conclusion:- Five sorting algorithms were implemented, their power requirements were analysed, and the reasons for
the differences in their power consumption were examined. The effect of cache configuration on power was also
studied.
