Radix Sort Algorithm


NITHIN RAJU CHANDY


Rutgers University
nithin.chandy@rutgers.edu

Radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping the keys by the individual digits that share the same position and value. Unlike other sorting methods, radix sort considers the structure of the keys, and it never directly compares one whole item to another whole item. For sorting fixed-size integers, radix sort can be more efficient than comparison-based sorts. This paper offers an in-depth study of radix sort and its various types. The complexity of radix sort is discussed, with pseudocode and analysis for the different types. The algorithms analyzed are LSD radix sort and MSD radix sort. We also briefly discuss a new variation of radix sort known as Forward radix sort.
General Terms: Design, Algorithms, Performance, Pseudocode, Complexity
Additional Key Words and Phrases: Most Significant Digits, Least Significant Digits

1. INTRODUCTION
Sorting algorithms are among the most basic and fundamental algorithms that everyone, be it a computer science student or a computer scientist, should know. The sorting problem has attracted a great deal of research, perhaps because of the complexity of solving it efficiently despite its simple, familiar statement, and it has been studied extensively over the last few decades. Many sorting algorithms are available, and each of them is suited to different applications.
Radix sort is a distinctive sorting algorithm that is very efficient at sorting fixed-size integer keys such as phone numbers and SSNs. A computer algorithm for radix sort was invented in 1954 at MIT by Harold H. Seward.
For readers unfamiliar with the term [1]: strictly speaking, the radix is the base of a number system (10 for decimal numbers), but in the context of radix sort the word is commonly used for a single digit position of a key. In the decimal system, such a position holds one decimal digit. For example, the number 2560 has four digits, or four radices: 2, 5, 6 and 0.
Radix sort gets its name from these radices: the method sorts the input values according to one radix, then according to the next one, and so on. It is therefore a multi-pass sort, and the number of passes equals the number of radices in the input values. For example, if each radix is taken to be one byte (base 256), sorting standard 32-bit integers takes 4 passes. For this reason, radix sort is sometimes also called byte sort.
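To make the pass count concrete, the following minimal Python sketch splits a non-negative integer into its digits for a given base, least significant digit first. The function name `radix_digits` is our own (hypothetical). With base 10, 2560 yields four digits; with base 256 (one byte per digit), a 32-bit unsigned integer yields exactly four digits, hence four passes.

```python
def radix_digits(x: int, base: int, count: int) -> list[int]:
    """Return the `count` lowest base-`base` digits of non-negative x, least significant first."""
    digits = []
    for _ in range(count):
        digits.append(x % base)  # current lowest digit
        x //= base               # drop that digit
    return digits


# Decimal: 2560 has the four digits 2, 5, 6, 0 (listed here from the 1s position up).
print(radix_digits(2560, 10, 4))          # [0, 6, 5, 2]

# Bytes: a 32-bit value has four base-256 digits, so byte-wise radix sort needs 4 passes.
print(radix_digits(0x12345678, 256, 4))   # [120, 86, 52, 18]
```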


Advantages of Radix sort are

- Radix sort can compete in efficiency with Quicksort when sorting fixed-size integers (it is reported to be up to about 3 times faster).
- [2] Radix sort is well suited for sorting according to keys with multiple parts when the parts of the key have a small value range, e.g. sorting a text file according to the characters in given columns (as the Unix or MS-DOS sort command does).
- LSD radix sort is said to have an average computational complexity of O(n*k) and a space complexity of O(n+k), where n is the number of keys and k is the average key length.
- LSD (Least Significant Digit) radix sort is *stable by default. MSD (Most Significant Digit) radix sort is not stable by default, but it can be implemented as a stable algorithm by using a memory buffer.

Like any other sorting algorithm, Radix sort does have disadvantages.

- It is not very efficient for very long keys, because the total sorting time is proportional both to the key length and to the number of items to sort.
- It requires an unconventional compare routine.
- It requires fixed-size keys and some standard way of breaking the keys into pieces.
In the following sections, we will analyze the various types of radix sorts.

2. RADIX SORT
Because most computers represent data internally as binary digits, keys can easily be broken into fixed-size groups of bits, which makes this algorithm easy to run. Radix sort is a clever and intuitive little sorting algorithm. Radix sorts fall into two classes: least significant digit (LSD) radix sorts and most significant digit (MSD) radix sorts. LSD radix sorts process the integer representations starting from the least significant digit and move towards the most significant digit, while MSD radix sorts work the other way around.
Radix sort divides each item into equal-sized pieces and looks at only one piece of an item at a time. For example, for a string, the pieces could be the characters of the string, taken from right to left.
* Stable sort: equal elements appear in the output sequence in the same order as they do in the input sequence.


Similarly, for integers the pieces could be the digits from lowest-order to highest-order (first the 1s position, then the 10s position, then the 100s position, and so on).

If we have a list of items to sort, radix sort looks at the same piece (radix) of all items in the list, then checks the pieces in the next position, and so on, until it has traversed all the pieces.
For instance, if the items to be sorted are strings, it looks at the rightmost character of all strings, then at the next character to the left in all strings, and so on, finally looking at the leftmost characters of all strings. Integers are checked in a similar manner, digit by digit.
Radix sort usually keeps one bucket for each possible value that a piece of an item can take. Each bucket can be a linked list or an array.
Consider sorting the following ten numbers:
73, 56, 48, 26, 53, 78, 22, 44, 21, 75

During the first pass, the 1s position of each item will be checked and these numbers will be put
in corresponding buckets as in Fig. 1.
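Buckets (by 1s digit)
0: (empty)
1: 21
2: 22
3: 73, 53
4: 44
5: 75
6: 56, 26
7: (empty)
8: 48, 78
9: (empty)
Fig. 1. After Round 1 of Radix sort algorithm

Concatenating the buckets in order gives 21, 22, 73, 53, 44, 75, 56, 26, 48, 78, which is the input to the second pass.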


The second pass results can be seen in Fig. 2.


Buckets (by 10s digit)
0: (empty)
1: (empty)
2: 21, 22, 26
3: (empty)
4: 44, 48
5: 53, 56
6: (empty)
7: 73, 75, 78
8: (empty)
9: (empty)
Fig. 2. After Round 2 of Radix sort algorithm

After 2 rounds, the numbers are in sorted order across the buckets. Concatenating the buckets gives the sorted list:
21, 22, 26, 44, 48, 53, 56, 73, 75, 78

Radix sort repeats the following two steps for each piece of an item:

- Into the buckets: go through each item and place it into the bucket that corresponds to the value of its current piece.

- Concatenate the buckets: once all items have been distributed, the buckets are concatenated, in bucket order, into a single list. After the pass on the last (most significant) piece, this list is the sorted list.
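The two steps can be sketched in Python by replaying the ten-number example from Figs. 1 and 2. This is a minimal illustration that assumes non-negative base-10 integers with a known number of digits; each bucket is a plain Python list, and `lsd_bucket_sort` is a hypothetical name.

```python
def lsd_bucket_sort(items, num_digits, base=10):
    """LSD radix sort using one list-bucket per digit value (non-negative ints only)."""
    for pos in range(num_digits):
        divisor = base ** pos
        buckets = [[] for _ in range(base)]
        # Into the buckets: appending keeps the current order inside each bucket (stability).
        for x in items:
            buckets[(x // divisor) % base].append(x)
        # Concatenate the buckets in bucket order.
        items = [x for bucket in buckets for x in bucket]
        print(f"after round {pos + 1}: {items}")
    return items


data = [73, 56, 48, 26, 53, 78, 22, 44, 21, 75]
lsd_bucket_sort(data, num_digits=2)
# after round 1: [21, 22, 73, 53, 44, 75, 56, 26, 48, 78]
# after round 2: [21, 22, 26, 44, 48, 53, 56, 73, 75, 78]
```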

2.1. Pseudocode for Radix Sort


Pseudocode for radix sort is given below. A stable [3] counting sort is used in radix sort as a
subroutine to sort pieces of each item.


Code Segment 1: Radix sort


Input : An array A[1..n] of n keys
Output : The array A in sorted order
Radix-Sort(A, d)
    // Each key in A[1..n] is a d-digit integer
    // (Digits are numbered 1 to d from right to left)
    for i = 1 to d do
        Use a stable sorting algorithm to sort A on digit i.
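A direct Python rendering of Code Segment 1 is sketched below. It relies on the fact that Python's built-in `sorted` is guaranteed to be stable, so it can play the role of "a stable sorting algorithm" on each digit; a practical implementation would use counting sort at that step instead. Base 10 and non-negative keys are assumed, and `radix_sort` is a hypothetical name.

```python
def radix_sort(a, d, base=10):
    """Sort non-negative integers of at most d base-`base` digits, following Radix-Sort(A, d)."""
    for i in range(1, d + 1):  # digit 1 is the rightmost (least significant) digit
        # sorted() is guaranteed stable, which is exactly what the pseudocode requires.
        a = sorted(a, key=lambda x: (x // base ** (i - 1)) % base)
    return a


print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
# [329, 355, 436, 457, 657, 720, 839]
```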

Code Segment 2: Counting Sort (A,B,k) subroutine in radix sort


// A [1 .. n] -- Input array to be sorted. (Each value in A is an integer in the range 1 through k.)
// B [1 .. n] -- Sorted output array.
// C [1 .. k] -- Array of counters.
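Code Segment 2 lists only the arrays A, B and C. The sketch below fills in a body with the textbook stable counting sort over keys in 1..k; it is our illustration, not necessarily the author's exact pseudocode. The optional `key` parameter is a hypothetical addition that lets the same routine sort whole records on one digit, as radix sort needs.

```python
def counting_sort(a, k, key=lambda x: x):
    """Stable counting sort of `a`; key(x) must be an integer in 1..k.

    Returns a new list B (the sorted output); C holds the counters.
    """
    c = [0] * (k + 1)               # C[1..k]; index 0 is unused
    for x in a:                     # count occurrences of each key
        c[key(x)] += 1
    for j in range(2, k + 1):       # now C[j] = number of elements with key <= j
        c[j] += c[j - 1]
    b = [None] * len(a)
    for x in reversed(a):           # right-to-left scan keeps equal keys in input order
        c[key(x)] -= 1
        b[c[key(x)]] = x            # 0-based output position of x
    return b


records = [(3, "a"), (1, "b"), (3, "c"), (2, "d")]
print(counting_sort(records, k=3, key=lambda r: r[0]))
# [(1, 'b'), (2, 'd'), (3, 'a'), (3, 'c')]  -- (3, 'a') stays ahead of (3, 'c')
```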

2.2. LSD Radix Sort (Bottom-up Radixsort)


LSD radix sort starts sorting from the least significant digit and moves the processing towards the most significant digit. The integer representations that are processed by sorting algorithms are often called "keys," which can exist all by themselves or be associated with other data. LSD radix sorts typically produce the following sorting order: short keys come before longer keys, and keys of the same length are sorted lexicographically. This coincides with the normal order of integer representations, such as the sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.


[4] LSD radix sort works as follows:

1. Sort by the least significant digit of the key, usually with counting sort or bucket sort. The sort must be stable: if two keys differ in the digit being sorted on, key-indexed (counting) sort puts them in the proper relative order; if they agree on that digit, stability keeps them in the relative order established by the previous passes.
2. Repeat for the remaining digits, working up to the most significant digit. The end result is sorted.
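For fixed-length strings, the LSD procedure can be sketched with key-indexed counting as the stable per-character sort. This sketch assumes every string has the same length `width` and contains only characters with code points below 256; `lsd_string_sort` is a hypothetical name.

```python
def lsd_string_sort(strings, width, radix=256):
    """LSD radix sort for strings that all have length `width` (8-bit characters)."""
    a = list(strings)
    for d in range(width - 1, -1, -1):          # rightmost character first
        count = [0] * (radix + 1)
        for s in a:                             # frequency counts, shifted by one slot
            count[ord(s[d]) + 1] += 1
        for r in range(radix):                  # counts -> starting index of each character
            count[r + 1] += count[r]
        aux = [None] * len(a)
        for s in a:                             # distribute; left-to-right scan is stable
            aux[count[ord(s[d])]] = s
            count[ord(s[d])] += 1
        a = aux                                 # result of this pass feeds the next one
    return a


plates = ["4PGC938", "2IYE230", "3CIO720", "1ICK750", "1OHV845", "4JZY524", "1ICK750"]
print(lsd_string_sort(plates, width=7))
# ['1ICK750', '1ICK750', '1OHV845', '2IYE230', '3CIO720', '4JZY524', '4PGC938']
```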

2.3. MSD Radix Sort (Top-down radix sort)


[5] A most significant digit (MSD) radix sort can be used to sort keys in *lexicographic order.
Unlike a least significant digit (LSD) radix sort, a most significant digit radix sort does not
necessarily preserve the original order of duplicate keys.
An MSD radix sort starts processing the keys from the most significant digit, leftmost digit, to
the least significant digit, rightmost digit. This sequence is opposite that of least significant digit
(LSD) radix sorts. An MSD radix sort stops rearranging the position of a key when the
processing reaches a unique prefix of the key.
Some MSD radix sorts use one level of buckets in which to group the keys (e.g. counting sort and
pigeonhole sort). A postman's sort / postal sort is a kind of MSD radix sort.
MSD Radix sort works as follows:
1. If the radix is R, the first pass works as follows:
   - Create R buckets.
   - In bucket M, store all items whose most significant digit (in base-R representation) is M.
   - Reorder the array by concatenating the contents of all buckets.
2. In the second pass, each bucket is sorted separately. All items in the same bucket have the same most significant digit, so each bucket is sorted (by creating sub-buckets of the bucket) based on the second most significant digit.
3. The process continues recursively on the following digits until a bucket holds at most one item or all digits have been examined.
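The recursive bucketing above can be sketched in Python for non-negative integers with a known maximum number of digits. This is a minimal illustration under that assumption (values must lie in [0, base**num_digits)); `msd_radix_sort` is a hypothetical name. Because each bucket is a separate list (a memory buffer), this particular version happens to be stable, which the text notes is not guaranteed for MSD radix sort in general.

```python
def msd_radix_sort(items, base=10, num_digits=2):
    """MSD radix sort of ints in [0, base**num_digits), bucketing from the leftmost digit."""
    def sort_bucket(bucket, pos):
        # pos counts down from the most significant digit (num_digits - 1) to 0.
        if len(bucket) <= 1 or pos < 0:
            return bucket                        # a unique prefix (or no digits left): stop
        divisor = base ** pos
        sub = [[] for _ in range(base)]          # R sub-buckets for this digit
        for x in bucket:
            sub[(x // divisor) % base].append(x)
        # Sort each sub-bucket on the next digit, then concatenate in bucket order.
        return [x for b in sub for x in sort_bucket(b, pos - 1)]

    return sort_bucket(list(items), num_digits - 1)


print(msd_radix_sort([73, 56, 48, 26, 53, 78, 22, 44, 21, 75]))
# [21, 22, 26, 44, 48, 53, 56, 73, 75, 78]
```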

* In mathematics, the lexicographic or lexicographical order is a generalization of the way the alphabetical order of
words is based on the alphabetical order of their component letters.


Disadvantages of MSD radix sort are:

- It is very slow for small subfiles, and the recursion produces a huge number of small subfiles, which can lead to poor performance.
- It is cache-inefficient, as it accesses memory in an essentially random pattern.
- It requires extra space for counters.
- It requires either extra space for a temporary array or a complicated in-place key-indexed counting scheme.
2.4. Analysis
Radix sort is not a comparison-based sort, so the $\Omega(n \log n)$ lower bound for comparison sorting does not apply to it. Not surprisingly, the running time of radix sort depends on the stable sorting algorithm chosen to sort the digits. If we use counting sort as the stable algorithm, the running time becomes $O(d(n + k))$, where n is the number of keys, d is the number of digits in each key, and k is the number of possible values of a digit (the base), because radix sort applies counting sort once for each of the d digit positions. Its space requirement is the same as for counting sort: two arrays of size n and an array of size k, i.e. $O(n + k)$.
The running time of radix sort also depends on the base k in which the integers are represented. Larger bases result in slower counting sorts but fewer of them, since the elements have fewer digits. Smaller bases result in faster counting sorts, but the elements have more digits, so more counting sorts are needed.
The MIT course notes [6] show that the running time is minimized, asymptotically, by choosing $k = n$; the complexity is then $O(n \log_n u)$, where n is the number of integers and the integers lie in the range $0$ to $u - 1$.
[6] The maximum number of digits in an element is $\log_k u$ for base k. To minimize the running time, we want to minimize $O((n + k)\log_k u)$. When $k = n$, the running time of radix sort is $O(n \log_n u)$. If $u = n^{O(1)}$, the running time of radix sort becomes $O(n)$, giving a linear-time sorting algorithm whenever the range of the integers we are sorting is polynomial in the number of integers we are sorting.
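The base trade-off can be explored numerically with the cost model $(n + k) \cdot d$, where d is the number of base-k digits needed for keys in [0, u). The sketch below evaluates only this back-of-the-envelope model, not measured running times; the function names `num_digits` and `radix_cost` are hypothetical.

```python
def num_digits(u, k):
    """Smallest d with k**d >= u, i.e. the base-k digits needed for keys in [0, u)."""
    d, span = 1, k
    while span < u:
        span *= k
        d += 1
    return d


def radix_cost(n, u, k):
    """Cost model (n + k) * d: d counting-sort passes over n keys, each with k counters."""
    return (n + k) * num_digits(u, k)


n, u = 10**6, 2**32   # one million 32-bit keys
for k in (2, 16, 256, 2**16, n):
    print(f"base k={k:>9,}: d={num_digits(u, k):>2} passes, cost={radix_cost(n, u, k):>12,}")
```

For one million 32-bit keys, the model drops from about 32 million steps for base 2 (32 passes) to about 4 million for base 256 (4 passes), and bases near n need only 2 passes. The smallest value in this run comes from k = 2^16 rather than exactly k = n, a reminder that k = n is an asymptotic choice that ignores constant factors.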
Is Radix Sort smarter than Quicksort?
If every digit is a fixed log2(n) bits wide (that is, base n) and the keys are not too long, radix sort can outperform quicksort asymptotically. On the other hand, radix sort is cache-inefficient, whereas quicksort is cache-efficient. Also, radix sort uses counting sort as a subroutine, and counting sort needs extra space to sort numbers. I have performed a timing analysis of both sorts, and my observations can be found in the Results section.


2.5. Forward radix sort


[7] Forward radix sort is a new radix sorting method, introduced by researchers at Lund University, that combines the advantages of LSD and MSD radixsort. By adding a simple preprocessing step to this algorithm, a string sorting problem can be reduced to an integer sorting problem in optimal asymptotic time; the authors also obtain improved time bounds for integer sorting and string sorting.
The main strength of LSD radixsort is that it inspects a complete horizontal strip at a time; the
main weakness is that it inspects all characters of the input. MSD radixsort only inspects the
distinguishing prefixes of the strings, but it does not make efficient use of the buckets. Forward
radixsort starts with the most significant digit, performs bucketing only once for each horizontal
strip, and inspects only the significant characters. More details about this can be found in [7].
2.5.1 Analysis
Forward radixsort runs in $O(S + n + m S_{max})$ time, where n is the number of strings, m is the number of buckets (the alphabet size), S is the total number of characters in the distinguishing prefixes of the strings, and $S_{max}$ is the length of the longest distinguishing prefix. The first two terms come from the fact that the algorithm inspects each distinguishing character once and each string at least once. The last term comes from the fact that the algorithm runs in $S_{max}$ passes and visits m buckets in each pass. The worst-case running time is also bounded by $O(S + n + m^2)$.
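To make S and S_max concrete, the sketch below computes the distinguishing prefix of each string, taken here to be the shortest prefix that is not a prefix of any other input string (or the whole string if no such prefix exists). It uses Python's comparison-based `sorted` only as a convenient way to find each string's nearest neighbours; it illustrates the quantities in the bound and is not an implementation of forward radixsort. The function name is hypothetical.

```python
def distinguishing_prefix_lengths(strings):
    """Length of each string's distinguishing prefix (shortest prefix shared with no other string)."""
    def lcp(a, b):
        i = 0
        while i < min(len(a), len(b)) and a[i] == b[i]:
            i += 1
        return i

    order = sorted(range(len(strings)), key=lambda i: strings[i])
    lengths = [0] * len(strings)
    for pos, i in enumerate(order):
        s = strings[i]
        # The longest common prefix with any other string is reached at a sorted neighbour.
        best = 0
        if pos > 0:
            best = max(best, lcp(s, strings[order[pos - 1]]))
        if pos + 1 < len(order):
            best = max(best, lcp(s, strings[order[pos + 1]]))
        lengths[i] = min(len(s), best + 1)
    return lengths


words = ["apple", "apply", "banana", "band", "cat"]
lengths = distinguishing_prefix_lengths(words)
print(lengths, sum(lengths), max(lengths))  # [5, 5, 4, 4, 1] 19 5
```

Here S = 19 of the 23 input characters must be inspected, and S_max = 5.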

2.6. APPLICATIONS
2.6.1 In Parallel computing
The MSD radix sort algorithm is widely used in parallel computing, as each of the subdivisions can be sorted independently of the rest. This means that as many of the available processors as needed can be engaged while memory usage is still managed effectively.
In this case, each bucket is passed to the next available processor. A single processor would be used at the start (for the most significant digit). By the second or third digit, all available processors would likely be engaged. If all keys had the same value, there would be only a single bin containing all the elements, and no parallelism would be available. For random inputs, all bins would be nearly equally populated, and a large amount of parallelism would be available.
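The bucket-per-processor scheme can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`: one MSD pass runs on a single thread, and then each bucket is handed to the next free worker. For brevity, each worker finishes its bucket with a sequential LSD radix sort on the remaining digits (a hybrid we chose, not the paper's). The names `parallel_msd_sort` and `_lsd_lower_digits` are hypothetical, values are assumed to lie in [0, base**num_digits), and in CPython threads will not speed up this pure-Python, CPU-bound work because of the global interpreter lock, so treat this as a structural sketch only.

```python
from concurrent.futures import ThreadPoolExecutor


def _lsd_lower_digits(bucket, digits, base):
    """Sequential LSD radix sort on the lowest `digits` digits of each value."""
    for pos in range(digits):
        divisor = base ** pos
        sub = [[] for _ in range(base)]
        for x in bucket:
            sub[(x // divisor) % base].append(x)
        bucket = [x for b in sub for x in b]
    return bucket


def parallel_msd_sort(items, base=10, num_digits=4, workers=4):
    """One MSD pass, then the independent buckets are sorted concurrently."""
    top = base ** (num_digits - 1)
    buckets = [[] for _ in range(base)]
    for x in items:                  # first pass (most significant digit) on one thread
        buckets[x // top].append(x)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Buckets share no data, so each can go to the next available worker.
        done = pool.map(lambda b: _lsd_lower_digits(b, num_digits - 1, base), buckets)
        return [x for b in done for x in b]


print(parallel_msd_sort([4821, 17, 9999, 305, 4821, 0, 7310, 2048]))
# [0, 17, 305, 2048, 4821, 4821, 7310, 9999]
```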
2.6.2. Computational molecular biology
Radix sort plays an essential role in computational molecular biology, which makes use of parallel computing.
2.6.3. Data compression
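Data compression is the art of reducing the number of bits needed to store or transmit data. Some compression methods depend on sorting; for example, the Burrows-Wheeler transform used by bzip2 sorts all rotations of a block of input, a task that string radix sorts can perform.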


2.6.4. Plagiarism detection


Detecting plagiarism in computer programs through manual inspection is very error-prone, since a plagiarist can easily change the way a program looks without altering its control flow. In addition, manual detection is very time-consuming. Radix sort can act as a redundancy detector and can detect duplicates. For example, given a string of N characters, the longest repeated substring can be found by sorting all suffixes of the string and comparing adjacent suffixes.
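The longest-repeated-substring example can be sketched as follows: form all suffixes of the text, sort them with a simple MSD radix sort on characters, and compare neighbours, since two suffixes that share a long prefix end up adjacent after sorting. This sketch assumes short inputs: it materialises every suffix (quadratic memory) and recurses once per character position. The names `msd_sort_strings` and `longest_repeated_substring` are hypothetical, and `sorted(buckets)` only orders the distinct bucket labels (characters), not the strings themselves.

```python
def msd_sort_strings(strs, d=0):
    """MSD radix sort of strings: bucket on character d, then recurse into each bucket."""
    if len(strs) <= 1:
        return strs
    result = [s for s in strs if len(s) == d]     # strings that end here sort first
    buckets = {}
    for s in strs:
        if len(s) > d:
            buckets.setdefault(s[d], []).append(s)
    for ch in sorted(buckets):                    # visit non-empty buckets in character order
        result += msd_sort_strings(buckets[ch], d + 1)
    return result


def longest_repeated_substring(text):
    """Sort all suffixes; the answer is the longest common prefix of two adjacent suffixes."""
    suffixes = msd_sort_strings([text[i:] for i in range(len(text))])
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        if n > len(best):
            best = a[:n]
    return best


print(longest_repeated_substring("to be or not to be"))  # to be
```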

2.7. RESULTS


I performed a few tests to check whether radix sort performs better than quicksort for fixed-length data. The tests were performed on fixed-length strings, using four datasets of different sizes.
Table 1. Timing analysis: Radix Sort vs Quick Sort

Number of fixed-size strings    Radix Sort time (s)    Quick Sort time (s)
10,000                          0.139                  0.36
100,000                         1.142                  2.23
1,000,000                       10.429                 16.12
10,000,000                      94.087                 64.13

Fig 3. Radix Sort vs Quick Sort (running time in seconds of Radix Sort and Quicksort for 10,000 to 10,000,000 fixed-size strings)


As Figure 3 shows, radix sort is faster than quicksort for the three smaller datasets, but quicksort is faster for the largest one (64.13 s versus 94.087 s for 10,000,000 strings).

3. CONCLUSION
In this paper, we looked at radix sort and its several classifications. Radix sort is not an optimal algorithm in general; it is efficient only in special cases. A lot of research is going on into improving radix sort, considering its usage in parallel computing.
In comparison with quicksort, radix sort is useful only for fixed-length integer keys and scales poorly. The time complexity of radix sort is O(k*n), whereas that of quicksort is O(n log n); however, k grows with the key length, while the constant factor in quicksort's bound is a true constant and is very small. The higher memory requirements of radix sort lead to more cache misses and more page faults as the input size grows.
The latest efficient version of radix sort, Forward radix sort, which combines LSD and MSD radix sort, can be further optimized to turn it into an optimal algorithm.

ACKNOWLEDGEMENTS
I would like to thank all the authors whose articles have been referenced in this paper. Without their hard work, this paper would not have been possible.

REFERENCES
[1] http://www.cs.umanitoba.ca/~chrisib/teaching/comp2140/notes/003e_radixSort.pdf
[2] http://www.cs.tut.fi/~tie20106/material/lecture6_4.pdf
[3] http://www.albany.edu/~csi503/pdfs/handout_9.1.pdf
[4] https://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf
[5] http://en.wikipedia.org/wiki/Radix_sort
[6] http://courses.csail.mit.edu/6.006/spring11/exams/notes2-1.pdf
[7] https://www.nada.kth.se/~snilsson/publications/Efficient-radix-sort/text.pdf
[8] https://www.cs.usfca.edu/~galles/visualization/RadixSort.html
[9] http://www.dcs.gla.ac.uk/~pat/52233/slides/RadixSort1x1.pdf
