Radix Sort Algorithm
Radix sort is a non-comparative integer sorting algorithm that sorts data by grouping keys
according to the individual digits that share the same position and value. Unlike other sorting
methods, radix sort exploits the structure of the keys and never directly compares one whole
item to another. For fixed-size integers, radix sort can be considerably more efficient than
comparison-based sorts. This paper offers an in-depth study of radix sort and its various types.
The complexity of radix sort is discussed, with pseudocode and analysis for the different types.
The algorithms analyzed are LSD radix sort and MSD radix sort. We also briefly discuss a
newer variation of radix sort known as Forward radix sort.
General Terms: Design, Algorithms, Performance, Pseudocode, Complexity
Additional Key Words and Phrases: Most Significant Digits, Least Significant Digits
1. INTRODUCTION
Sorting algorithms are among the most fundamental algorithms, ones that everyone should
know, whether a computer science student or a computer scientist. The sorting problem has
attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite
its simple, familiar statement. It has been studied extensively over the last few decades. There
are many sorting algorithms available, and each of them is suited to different applications.
Radix sort is a unique sorting algorithm which is very efficient in sorting fixed-size integers like
phone numbers, SSNs, etc. A computer algorithm for radix sort was invented in 1954 at MIT by
Harold H. Seward.
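For those of you who don't know what a radix is: [1] a radix can be considered as a position in a
number. (Strictly speaking, the radix is the base of the number system, such as 10 for decimal;
this paper also uses the word for a single digit position.) In the decimal system, a radix is just a
digit in a decimal number. For example, the number 2560 has four digits, or four radices: 2, 5, 6
and 0.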
The Radix Sort gets its name from those radices, because the method first sorts the input values
according to their first radix, then according to the second one, and so on. It is therefore a
multi-pass sort, and the number of passes equals the number of radices in the input values. For
example, you'll need 4 passes to sort standard 32-bit integers when each radix is one byte
(base 256). Radix Sort is also often called Byte Sort.
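As a small illustration of the pass count, the sketch below splits a 32-bit integer into its four one-byte digits, least significant first. The function name `byte_digits` and its parameters are hypothetical and are not taken from the referenced material.

```python
def byte_digits(value, num_bytes=4):
    """Return the base-256 digits of a non-negative integer, least significant first."""
    # Shift the wanted byte down to the lowest position, then mask off the rest.
    return [(value >> (8 * i)) & 0xFF for i in range(num_bytes)]


digits = byte_digits(0x12345678)
print([hex(d) for d in digits])  # prints ['0x78', '0x56', '0x34', '0x12']
```

Each of the four digits would drive one pass of a byte-wise radix sort.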
- Radix sort is able to compete in efficiency with Quicksort in sorting fixed-size integers (about
3 times faster in some cases).
- [2] Radix sort is well suited for sorting according to keys with multiple parts when the parts
of the key have a small value range, e.g. sorting a text file according to the characters in
given columns (Unix or MS-DOS sort).
- LSD radix sort has a time complexity of O(n*k) and a space complexity of O(n+k), where n is
the number of keys and k is the average key length.
- LSD (Least Significant Digit) radix sort is *stable by default. MSD (Most Significant Digit)
radix sort is not stable by default but can be implemented as a stable algorithm with the use
of a memory buffer.
Like any other sorting algorithm, Radix sort does have disadvantages.
- It is not that efficient for very long keys, because the total sorting time is proportional to key
length and to the number of items to sort.
- It requires an unconventional compare routine.
- It requires fixed size keys, and some standard way of breaking the keys into pieces.
In the following sections, we will analyze the various types of radix sorts.
2. RADIX SORT
Most computers internally represent data as binary digits, so keys can readily be broken into
digits and this algorithm is easy to run. Radix sort is a clever and intuitive little sorting
algorithm. There are two classifications of radix sorts: least significant digit (LSD) radix sorts
and most significant digit (MSD) radix sorts. LSD radix sorts process the integer representations
starting from the least significant digit and move towards the most significant digit, while MSD
radix sorts work the other way around.
Radix sort basically divides items into equal sized pieces and it looks at only a piece of an item at
a time. For example, if you consider a string, the pieces could be the characters in the string
from right to left.
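[Figure: the characters of a string are examined from the rightmost character (first) to the leftmost character (last).]
* Stable sort: Equal elements appear in the output sequence in the same order as they do in the input sequence.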
Similarly for integers, the pieces could be the digits from lowest-order to highest-order (first the
1s position, then the 10s position, then 100s position and so on).
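[Figure: the digits of an integer are examined from the lowest-order digit (first) to the highest-order digit (last).]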
If we have a list of items that needs to be sorted, radix sort looks at the same piece (radix) of
all items in the list, then checks the pieces in the next position, and so on, until it has
traversed all the pieces.
For instance, if the items to be sorted are strings, it looks at the rightmost character of all
strings, then the next character to that in all strings, and so on, finally looking at the leftmost
characters of all strings. Integers are also checked in a similar manner.
Radix sort usually keeps one bucket for each possible value that a piece of an item could take.
A bucket could be a linked list or an array.
Consider this example: 73, 56, 48, 26, 53, 78, 22, 44, 21, 75
During the first pass, the 1s position of each item will be checked and these numbers will be put
in corresponding buckets as in Fig. 1.
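Buckets (1s position)
0:
1: 21
2: 22
3: 73, 53
4: 44
5: 75
6: 56, 26
7:
8: 48, 78
9:
Fig. 1. After Round 1 of Radix sort algorithm
Buckets (10s position)
0:
1:
2: 21, 22, 26
3:
4: 44, 48
5: 53, 56
6:
7: 73, 75, 78
8:
9:
Fig. 2. After Round 2 of Radix sort algorithm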
After 2 rounds, the numbers are in sorted order in the buckets:
21, 22, 26, 44, 48, 53, 56, 73, 75, 78
Radix sort repeats the following two steps for each piece of an item:
- Into the buckets: Go through each item and place it into the corresponding bucket based on
the value of the current piece.
- Concatenate the buckets: Once all the items are grouped, the buckets are merged into one
single list. After the final pass, this list is the sorted list.
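The two steps above can be sketched in Python as follows. This is a minimal illustration that assumes non-negative integer keys and uses Python lists as buckets; the function name `radix_sort_buckets` is hypothetical.

```python
def radix_sort_buckets(items, base=10):
    """Sort non-negative integers with a bucket-based LSD radix sort."""
    result = list(items)
    if not result:
        return result
    largest = max(result)
    place = 1  # 1s position, then 10s, then 100s, ...
    while place <= largest:
        # Step 1: into the buckets, keyed by the digit at the current position.
        buckets = [[] for _ in range(base)]
        for item in result:
            buckets[(item // place) % base].append(item)
        # Step 2: concatenate the buckets in order 0 .. base-1.
        result = [item for bucket in buckets for item in bucket]
        place *= base
    return result


print(radix_sort_buckets([73, 56, 48, 26, 53, 78, 22, 44, 21, 75]))
# prints [21, 22, 26, 44, 48, 53, 56, 73, 75, 78]
```

Appending to each bucket in input order is what keeps every pass stable, which the later passes rely on.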
[4] LSD radix sort works as follows:
1. Sort by the least significant digit of the key, usually with CountingSort or BucketSort. The sort
must be stable. If two strings differ on the character being examined, key-indexed sort puts them
in proper relative order. If two strings agree on that character, stability keeps them in proper
relative order.
2. Repeat for the rest of the digits, working up to the MSD. The end result will be sorted.
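A sketch of these two steps for strings, using key-indexed counting as the stable per-character sort. It assumes every string has exactly `width` characters and that all character codes are below `alphabet_size`; the function name `lsd_string_sort` and both parameters are illustrative choices, not names from the cited notes.

```python
def lsd_string_sort(strings, width, alphabet_size=256):
    """LSD radix sort for strings that all have exactly `width` characters."""
    items = list(strings)
    for pos in range(width - 1, -1, -1):  # rightmost character first
        # Count how often each character code occurs at this position.
        counts = [0] * (alphabet_size + 1)
        for s in items:
            counts[ord(s[pos]) + 1] += 1
        # Prefix sums turn the counts into starting indices.
        for code in range(alphabet_size):
            counts[code + 1] += counts[code]
        # Stable distribution: strings with equal characters keep their order.
        output = [None] * len(items)
        for s in items:
            code = ord(s[pos])
            output[counts[code]] = s
            counts[code] += 1
        items = output
    return items


print(lsd_string_sort(["dab", "cab", "add", "bad", "ace"], 3))
# prints ['ace', 'add', 'bad', 'cab', 'dab']
```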
A most significant digit (MSD) radix sort can be used to sort keys in *lexicographic order.
Unlike a least significant digit (LSD) radix sort, a most significant digit radix sort does not
necessarily preserve the original order of duplicate keys.
An MSD radix sort starts processing the keys from the most significant digit, leftmost digit, to
the least significant digit, rightmost digit. This sequence is opposite that of least significant digit
(LSD) radix sorts. An MSD radix sort stops rearranging the position of a key when the
processing reaches a unique prefix of the key.
Some MSD radix sorts use one level of buckets in which to group the keys (e.g. counting sort and
pigeonhole sort). A postman's sort / postal sort is a kind of MSD radix sort.
MSD Radix sort works as follows:
1. If the radix is R, the first pass works as follows:
   - Create R buckets.
   - In bucket M, store all items whose most significant digit (in base-R representation) is M.
   - Reorder the array by concatenating the contents of all buckets.
2. In the second pass, we sort each of the buckets separately.
   - All items in the same bucket have the same most significant digit.
   - Thus, we sort each bucket (by creating sub-buckets of the bucket) based on the second
     most significant digit.
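The passes described above can be sketched recursively. The sketch assumes non-negative integers with at most `num_digits` digits in the chosen base, and simply continues the sub-bucketing past the second digit until the digits run out; `msd_radix_sort` and its parameters are hypothetical names.

```python
def msd_radix_sort(items, num_digits, base=10):
    """Sort non-negative integers that have at most `num_digits` digits in `base`."""

    def sort_by_digit(group, digit_index):
        # Stop when the group cannot be split further or the digits are exhausted.
        if len(group) <= 1 or digit_index < 0:
            return group
        place = base ** digit_index
        buckets = [[] for _ in range(base)]
        for item in group:
            buckets[(item // place) % base].append(item)
        # Sort each bucket on the next digit, then concatenate the results.
        result = []
        for bucket in buckets:
            result.extend(sort_by_digit(bucket, digit_index - 1))
        return result

    return sort_by_digit(list(items), num_digits - 1)


print(msd_radix_sort([73, 56, 48, 26, 53, 78, 22, 44, 21, 75], num_digits=2))
# prints [21, 22, 26, 44, 48, 53, 56, 73, 75, 78]
```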
* In mathematics, the lexicographic or lexicographical order is a generalization of the way the alphabetical order of
words is based on the alphabetical order of their component letters.
The maximum number of digits in an element will be log_k u for some base k, where u is the
size of the range of the integers being sorted. To minimize running time, we will want to
minimize O((n + k) log_k u). When k = n, the running time of radix sort would be
O(n log_n u). If u = n^O(1), the running time of radix sort turns out to be O(n), giving us a
linear time sorting algorithm if the range of integers we're sorting is polynomial in the number
of integers we're sorting.
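The substitution steps in this argument can be written out explicitly. The constant c below is an assumed name for the exponent hidden in n^O(1); it does not appear in the original text.

```latex
\begin{align*}
T(n) &= O\big((n + k)\,\log_k u\big) \\
     &= O\big(2n \log_n u\big) = O(n \log_n u)
        && \text{(choosing } k = n\text{)} \\
     &= O(n \cdot c) = O(n)
        && \text{(if } u \le n^{c}\text{, so that } \log_n u \le c\text{)}
\end{align*}
```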
Is Radix Sort smarter than Quicksort?
If every digit has a fixed length of log_2(n) bits, then radix sort can outperform Quicksort.
At the same time, radix sort is cache inefficient, whereas Quicksort is cache efficient. Also,
radix sort uses counting sort as a subroutine, and counting sort takes extra space to sort
numbers. I have performed a timing analysis on both these sorts, and my observations can be
found under the Results section.
2.6. APPLICATIONS
2.6.1. In Parallel computing
MSD radix sort algorithm is widely used in parallel computing, as each of the subdivisions can
be sorted independently of the rest. That means we can engage as many of the available
processors while managing memory usage effectively.
In this case, each bucket is passed to the next available processor. A single processor would be
used at the start (the most significant digit). By the second or third digit, all available processors
would likely be engaged. If all keys were of the same value, then there would be only a single bin
with all elements in it, and no parallelism would be available. For random inputs all bins would
be near equally populated and a large amount of parallelism opportunity would be available.
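The idea of handing each bucket to the next available processor can be sketched as below. This is only a structural illustration under several assumptions: it uses a thread pool from Python's standard library, it uses the built-in `sorted` as a stand-in for a per-bucket radix sort, and it requires every key to have at most `num_digits` digits. In CPython, threads do not speed up CPU-bound work, so a real implementation would use separate processes or a lower-level language. The name `parallel_msd_sort` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_msd_sort(items, num_digits, base=10, max_workers=4):
    """Partition on the most significant digit, then sort the buckets concurrently."""
    place = base ** (num_digits - 1)
    buckets = [[] for _ in range(base)]
    for item in items:  # the first pass runs on a single worker
        buckets[(item // place) % base].append(item)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Buckets are independent, so each one can go to the next free worker.
        # `sorted` stands in for a recursive per-bucket radix sort.
        sorted_buckets = list(pool.map(sorted, buckets))
    # Concatenating the buckets in digit order yields the sorted list.
    return [item for bucket in sorted_buckets for item in bucket]


print(parallel_msd_sort([73, 56, 48, 26, 53, 78, 22, 44, 21, 75], num_digits=2))
# prints [21, 22, 26, 44, 48, 53, 56, 73, 75, 78]
```

If all keys share the same leading digit, only one bucket is non-empty and the pool has nothing to distribute, which matches the worst case described above.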
2.6.2. Computational molecular biology
Radix sort plays an essential role in computational molecular biology, which makes use of
parallel computing.
2.6.3. Data compression
Data compression is the art of reducing the number of bits needed to store or transmit data.
n            Radix Sort   Quicksort
10,000       0.139        0.36
100,000      1.142        2.23
1,000,000    10.429       16.12
10,000,000   94.087       64.13
[Fig. 3: Running time of Radix Sort and Quicksort for n = 10,000 to 10,000,000]
As you can see in Figure 3, radix sort performs better than quicksort for most values of n, but
quicksort outperforms radix sort for the largest value of n tested.
8. CONCLUSION
In this paper, we looked at radix sort and its several classifications. Radix sort is not an
optimal algorithm in general and is efficient only in special cases. A lot of research is going on
to improve radix sort, considering its usage in parallel computing.
In comparison with Quicksort, radix sort is useful only for fixed-length integer keys and
scales poorly. The time complexity of radix sort is O(k*n), whereas for quicksort it is
k*n log n, where the constant factor k in quicksort is actually a constant and is very small. The
larger memory requirements of radix sort lead to more cache misses and more page faults as
the input size grows.
The latest efficient version of radix sort, i.e. Forward radix sort, which is a combination of LSD
and MSD radix sort, can be further optimized to turn it into an optimal algorithm.
ACKNOWLEDGEMENTS
I would like to thank all the authors whose articles have been referenced in creating this paper.
Without their hard work, this paper would not have been possible.
REFERENCES
[1] http://www.cs.umanitoba.ca/~chrisib/teaching/comp2140/notes/003e_radixSort.pdf
[2] http://www.cs.tut.fi/~tie20106/material/lecture6_4.pdf
[3] http://www.albany.edu/~csi503/pdfs/handout_9.1.pdf
[4] https://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf
[5] http://en.wikipedia.org/wiki/Radix_sort
[6] http://courses.csail.mit.edu/6.006/spring11/exams/notes2-1.pdf
[7] https://www.nada.kth.se/~snilsson/publications/Efficient-radix-sort/text.pdf
[8] https://www.cs.usfca.edu/~galles/visualization/RadixSort.html
[9] http://www.dcs.gla.ac.uk/~pat/52233/slides/RadixSort1x1.pdf