Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Next Article in Journal
Multiagent Hierarchical Cognition Difference Policy for Multiagent Cooperation
Next Article in Special Issue
A Lower Bound for the Query Phase of Contraction Hierarchies and Hub Labels and a Provably Optimal Instance-Based Schema
Previous Article in Journal
On the Descriptive Complexity of Color Coding
Previous Article in Special Issue
On Computational Hardness of Multidimensional Subtraction Games
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Lexicographic Unranking of Combinations Revisited

CNRS, Laboratoire de Paris 6—lip6umr 7606, Sorbonne Université, F-75005 Paris, France
*
Author to whom correspondence should be addressed.
Algorithms 2021, 14(3), 97; https://doi.org/10.3390/a14030097
Submission received: 30 November 2020 / Revised: 15 March 2021 / Accepted: 16 March 2021 / Published: 19 March 2021
(This article belongs to the Special Issue Selected Algorithmic Papers From CSR 2020)

Abstract

:
In the context of combinatorial sampling, the so-called “unranking method” can be seen as a link between a total order over the objects and an effective way to construct an object of given rank. The most classical order used in this context is the lexicographic order, which corresponds to the familiar word ordering in the dictionary. In this article, we propose a comparative study of four algorithms dedicated to the lexicographic unranking of combinations, including three algorithms that were introduced decades ago. We start the paper with the introduction of our new algorithm using a new strategy of computations based on the classical factorial numeral system (or factoradics). Then, we present, in a high level, the three other algorithms. For each case, we analyze its time complexity on average, within a uniform framework, and describe its strengths and weaknesses. For about 20 years, such algorithms have been implemented using big integer arithmetic rather than bounded integer arithmetic which makes the cost of computing some coefficients higher than previously stated. We propose improvements for all implementations, which take this fact into account, and we give a detailed complexity analysis, which is validated by an experimental analysis. Finally, we show that, even if the algorithms are based on different strategies, all are doing very similar computations. Lastly, we extend our approach to the unranking of other classical combinatorial objects such as families counted by multinomial coefficients and k-permutations.

1. Introduction

One of the most fundamental combinatorial objects is called a combination. A combination of k objects among n is a subset of cardinality k of a set containing n objects. In many enumerating problems, it appears either as the main combinatorial structure or as a core building block because of its simplicity and counting characteristics.
In the 1960s while resolving some optimization problems related to scheduling, Lehmer rediscovered an important property linking natural numbers with a mixed radius numeral system based on combinations. This relation gave him the possibility to exhibit some greedy approach for a ranking algorithm that transforms (bijectively) a combination into an integer. This numeral system is now commonly called the “combinatorial number system” or “combinadics”. It is often used for the reverse of Lehmer’s problem: generating the uth combination (for a given order on the set of combinations). For efficiency reasons, this approach can be substituted to exhaustive generation once the latter is no longer possible due to the combinatorial explosion of the number of objects when their size increases. In the context of combinations, the explosion appears quickly (we recall that 2 n n ( 2 π n ) 1 2 4 n ). This generation strategy of a single element is classically called an unranking method. It appears as the fundamental problem in combinatorial generation such as in [1] and in optimization [2]. It is also used as a basic building block in scheduling problems [3], as well as, e.g., in software testing [4]. In other contexts, it appears as the core problem for the generation of complex structures: we can for example cite phylogenetics [5] and bioinformatics [6].
Prior to speaking of unranking one must specify a total order over the objects under consideration. The one that is frequently used is the lexicographic order; since it is humanly easy to handle, it has been extensively studied. However, as Ruskey mentioned [7] (p. 59), lexicographic generation is usually not the most efficient, thus particular care must be taken while unranking for this order. Knuth dedicated a section to the lexicographic generation of combinatorial objects, especially combination [8], relating it to the special case of Gray codes. Usually, while describing an unranking method, the dual approach for ranking, i.e., taking a built object and computing its rank, is also studied. The above reference to Knuth presents such algorithms. Other combinatorial objects are also studied in Ruskey’s book about combinatorial generation [7] and in Skiena’s book dedicated to the practical implementation of such algorithms [9]. For another ad hoc approach focused on the efficiency aspects of the ranking problem, one can refer to the work of Ryabko [10].
The classical approach for the construction of combinatorial structures presenting a recursive decomposition schema consists in taking advantage of this decomposition in order to build a bigger object from smaller ones. The method is detailed in the famous book by Nijenhuis and Wilf [11]. There, the authors were mostly interested in exhaustive generation and uniform random sampling, but some ideas about the decomposition schema are also applicable in the unranking context. The method has been applied generically to decomposable objects in the sense of analytic combinatorics, first in the context of recursive generation [12], and then in the context of unranking approaches [13].
Aside from such generic approaches there exist several ad hoc algorithms. The paper by Kokosinski [14] presents a comparative study of several of them. The complexity analysis of these algorithms have been settled to be linear in n on average over all possible combinations when k is ranging from 0 to n.
However, to the best of our knowledge, these complexity analyses are only counting the number of calls to the function that computes a binomial coefficient, assuming that all the necessary coefficients have been pre-computed and stored (this pre-computation step is not included in the complexity analysis). From this fact, two questions arise. First, is this complexity analysis relevant? That is, does it faithfully reflect the actual runtime of the algorithms and can we afford the pre-computation of that many binomial coefficients? Second, among the different existing algorithms, which one performs best in practice?
Using exact computations, it is actually not a problem to deal with combinations over sets of several thousands of objects. In this context, using a table filled with all possible binomial coefficients that might be needed is impractical. Most classical computer algebra systems (CAS) can unrank combinations in a reasonable time though, which suggests that there are better approaches. In Table 1, we present some of our experimental results. We detail everything in Section 4, but as a foretaste here we give some key points. In Section 2, we introduce a new unranking algorithm. The first column gives the typical run time of our C implementation of this algorithm. In the other columns, we give a rough performance comparison of four different CAS. For each one, we compare the average run time in milliseconds to unrank a combination first by using the native algorithm of the CAS (“their algo.”) and second by implementing our new algorithm in the high-level programming language of the CAS. We observe a great diversity of run times in Table 1. Unfortunately, Maple, Mathematica, and MATLAB (to deal with integers with arbitrary precision in MATLAB, we use symbolic computations by calling the function sym) are all closed source so we cannot take a look at their implementation to understand these differences. However, a detailed study of the different algorithms from the literature and some practical considerations presented in Section 4 provide some insights on this question. Sagemath is a special case to us. Since version 9.1, our algorithm has been implemented (in Python 3) and is used as the native algorithm.

Combinatorial Context

Throughout the paper, we represent combinations as follows.
Definition 1.
Let n and k be two integers with 0 k n . We represent a combination of k elements among n elements denoted by { 0 , 1 , , n 1 } as a finite increasingly sorted sequence containing k distinct elements.
For example, let n and k be, respectively, 5 and 3. The finite sequences ( 0 , 1 , 2 ) and ( 0 , 2 , 4 ) are combinations of k elements among n, but ( 0 , 2 , 1 ) and ( 0 , 1 , 2 , 3 ) are not. Other representations are sometimes used, notably by using a two-letter alphabet, but we stick to the one given in Definition 1 in this paper.
There are several possible orders for comparing combinations. In the following, we restrict our attention to orders comparing combinations of the same length, i.e., the same number of elements.
Definition 2.
Let A = ( a 0 , a 1 , , a k 1 ) and B = ( b 0 , b 1 , , b k 1 ) be two distinct combinations of k elements among n.
  • In the lexicographic order, we say that A is smaller than B if and only if both combinations have the same (possibly empty) prefix such that ( a 0 , , a p 1 ) = ( b 0 , , b p 1 ) and if in addition a p < b p .
  • In the co-lexicographic order, we say that A is smaller than B if and only if the finite sequence ( a k 1 , , a 0 ) is smaller than ( b k 1 , , b 0 ) for the lexicographic order.
  • If A is smaller than B for a given order, then, for the reverse order, B is smaller that A.
Definition 3.
Let 0 k n be two integers and let A be a combination of k elements among n. For a given order, the rank u of A belongs to { 0 , 1 , n k 1 } and is such that A is the uth smallest combination.
With these definitions in mind, we can enter the core of the paper organized as follows. We first give the presentation and a first complexity analysis of our new algorithm solving the lexicographic combination unranking problem in Section 2. Section 3 is dedicated to the survey of three classical algorithms for the latter problem. The first one is based on the so-called recursive method and the two other algorithms are based on combinadics, which is a specific numeral system. It seems that it is the first time that both are compared and the reason one is better is explained. We also propose a new method for their analysis based on a generating function approach. Obviously, we obtain the same results as in the literature, but we manage also to obtain better asymptotic bounds than in the literature. In Section 4, we then compare their efficiency experimentally (the implementation and the exhaustive material used for repeating the experiments are all available at http://github.com/Kerl13/combination_unranking (accessed on 15 March 2021)) and propose a way to improve the computation of the binomial coefficients used in all four algorithms. Surprisingly, once the improvements have been implemented in all algorithms, we observe deep similarities in their computations, which is reflected by their observed run times. Finally, we extend our approach to solve the problem of unranking structures enumerated by multinomial coefficients and also objects counted by the k-permutations of n (also called arrangements).

2. Unranking through Factoradics: A New Strategy

The classical methods to unrank combinations are relying on the combinatorial number system introduced in 1887, by E. Pascal [15] and later by D. H. Lehmer (detailed in the book [16] (p. 27)). We survey these classical algorithms in Section 3.2. In this section, we first present a new strategy based on another number system called factoradics. To the best of our knowledge, the latter has never been used for the unranking of combinations.

2.1. Link between the Factorial Number System and Permutations

We recall that the factorial number system, or factoradics, is a mixed radix numeral system in which the representation of integers relies on the use of factorial numbers. Its definition belongs to the folklore but already appeared in 1888 in [17].
Definition 4.
Let u be a positive integer and let n be the unique integer satisfying ( n 1 ) ! u < n ! . Then, there exists a unique sequence of integers ( f ) { 0 , , n 1 } , with 0 f for all ℓ, such that:
u = f 0 · 0 ! + f 1 · 1 ! + + f n 2 · ( n 2 ) ! + f n 1 · ( n 1 ) !
The finite sequence ( f 0 , f 1 , , f n 1 ) is called the factoradic decomposition (or factoradic) of u (note that f 0 = 0 for all u).
Taking the number u = 2021 as an example, we obtain the following decomposition: 2021 = 0 × 0 ! + 1 × 1 ! + 2 × 2 ! + 0 × 3 ! + 4 × 4 ! + 4 × 5 ! + 2 × 6 ! . Thus, its factoradic is ( 0 , 1 , 2 , 0 , 4 , 4 , 2 ) .
Definition 5.
Let n be a positive integer. A permutation of size n is an ordering of the elements of the set { 0 , 1 , , n 1 } .
We represent a permutation of size n as a finite sequence of length n indicating the order of its elements. For example, the sequence ( 2 , 4 , 0 , 3 , 1 ) is a permutation of size 5.
The factorial number system is particularly suitable to define a one-to-one correspondence between integers and permutations and thus can be used as an unranking method for permutations. The algorithm implemented in the function UnrankingPermutation in Algorithm 1 is a straightforward adaptation of the Fisher–Yates random sampler for permutations [18].
Fact 1.
For all 0 u < n ! ,UnrankingPermutation ( n , u ) returns the uth permutation in lexicographic order among the n ! permutations of n elements.
Algorithm 1 Unranking a permutation.
1: function Extract(F,n,k) 
1: function UnrankingPermutation(n,u) 2: P ← [0,1,…,n − 1]
2: Ffactoradic(u)3: L ← [0,…,0]k components
3. while length ( F ) < n do4. for i from 0 to k 1 do
4:   append ( F , 0 ) 5:   L [ i ] P [ F [ n 1 i ] ]
5: return E XTRACT ( F , n , n ) 6:   remove ( P , F [ n 1 i ] )    
7: return L
factoradic ( u ) : computes the factoradic of u;
length ( F ) : computes the number of components in F;   
append ( F , i ) : appends the element i at the end of F;
remove ( F , i ) : removes from F the element at index i.
Since the factoradic (with 8 components) of 2021 is ( 0 , 1 , 2 , 0 , 4 , 4 , 2 , 0 ) , the permutation (of size 8) of rank 2021 is ( 0 , 3 , 6 , 7 , 1 , 5 , 4 , 2 ) . To obtain this permutation, we read the factoradic from right to left, and extract iteratively from the list ( 0 , 1 , , 8 1 ) the element whose index is the coefficient read in the factoradic. This goes on until the list is empty and we reach the leftmost component of the factoradic. Thus, in our example, we start by extracting the element of index 0, which is 0. Then, the list P becomes ( 1 , 2 , , 7 ) and we extract the element of index 2, which is 3. Then, P becomes ( 1 , 2 , 4 , 5 , 6 , 7 ) and we extract the fourth element, which is 6, and so on.
Note that, for the sake of clarity, we present the function Extract using a list for P, but a better data structure must be used to achieve good performance. Good candidates are dynamic balanced trees, as presented in [19], or multisets with elements of weight 1 or 0, as presented in the Appendix of [20], since both provide logarithmic access and removal. Unfortunately, it seems that there is no algorithm based on some swap operation giving an in-place shuffle to unrank permutations in the lexicographic order; put differently: Durstenfeld’s algorithm [21] cannot be easily adapted for the lexicographic order.

2.2. Combinations Unranking through Factoradics

We start with the definition of our algorithm to unrank combinations via the use of factoradics. The basic ideas driving our algorithm are the following:
  • We define a bijection between the combinations of k elements among n and a subset of the permutations of n elements.  
  • We transform the combination rank u into the rank u of the appropriate permutation.  
  • We build (the prefix of) the permutation of rank u by using Algorithm 1.
Definition 6.
Let n and k be two integers with 0 k n . We define P n , k as the function which maps the combination ( 0 , 1 , , k 1 ) to the size-n permutation ( 0 , 1 , , k 1 , d k , , d n 1 ) where the integers d i are such that d k < d k + 1 < < d n 1 and { 0 , , k 1 , d k , , d n 1 } = { 0 , 1 , , n 1 } .
For instance, by definition, for n = 5 and k = 3 , the permutations associated with the combinations ( 0 , 1 , 2 ) and ( 0 , 2 , 4 ) are, respectively, ( 0 , 1 , 2 , 3 , 4 ) and ( 0 , 2 , 4 , 1 , 3 ) . Note that the function P n , k returns the smallest size-n permutation for the lexicographic order whose prefix is the given combination.
Proposition 1.
For all integers 0 k n , the function P n , k is a bijection from the set of ( n , k ) combinations to the set of size-n permutations whose prefix of length k and suffix of length n k are both increasingly sorted.
Remark that the permutation (of size 5) ( 0 , 1 , 2 , 3 , 4 ) is the permutation associated with combinations ( 0 , 1 ) and ( 0 , 1 , 2 ) by, respectively, P 2 , 5 and P 3 , 5 . In fact, there are exactly six combinations associated with the latter permutation, but for different values of k. The proof of Proposition 1 is straightforward.
Fact 2.
For any integer m 0 , the number of sequences ( f i ) 0 i < n satisfying n k f n k f n k + 1 f n 1 m is given by n m k .
In fact, we get this result using a classical cardinality argument: a sequence of integers of the form x 0 = 0 x 1 x 2 x k x k + 1 = corresponds to a weak composition (consider the differences x i + 1 x i ) of the integer into k + 1 terms. The number of such compositions is given by + k k . Hence, the number of sequences matching the description of Fact 2 is given by n k m + k k .
We now exhibit how to transform a combination rank into its corresponding permutation rank.
Lemma 1.
For any given 0 k n , the factoradic decompositions of the ranks of the permutations obtained as the image by P n , k of some combination of k elements among n are all the finite sequences of the form ( 0 , , 0 , f n k , , f n 1 ) with n k f n k f n k + 1 f n 1 0 .
Proof. 
Let u be an integer whose factoradic is ( 0 , , 0 , f n k , f n 1 ) as in the lemma. Due to the constraint n k f n k f n k + 1 f n 1 0 , the permutation corresponding to u has for prefix of length k the sequence ( f n 1 , f n 2 + 1 , f n 3 + 2 , , f n k + k 1 ) which is increasingly sorted. The rest of the permutation (the suffix of length n k ) corresponds to the increasing sequence of elements that have not been taken yet. Thus, the result corresponds to a combination via P n , k .
Fact 2 completes the proof. □
Thus, to convert the combination rank u into its corresponding permutation rank u , it is sufficient to find the uth sequence satisfying Lemma 1 in co-lexicographic order. This is presented in Algorithm 2 where the RankConversion function implements the conversion from u to u and the UnrankingCombination function implements the whole unranking procedure.
The key to the rank conversion is also Fact 2. As a consequence, we get that the first n 1 k 1 such sequences (in co-lexicographic order) have f n 1 = 0 and that the n 1 k following have f n 1 1 .
Once again, we opt for a simple presentation here where the rank conversion and the unranking part of the algorithm are clearly separated. There is much room for improvement here: for instance, note that, at the end of the function RankConversion, a factoradic decomposition is transformed into the integer it represents, but then at the beginning of UnrankingPermutation this integer will be decomposed again in factoradics. In fact, instead of storing m into F on Line 8 we could directly compute the ith component of the combination as m + i by using the remark at the beginning of the proof of Lemma 1. In Section 4.2, we provide a more efficient way to implement this algorithm including this particular optimization and other improvements.
Proposition 2.
The functionUnrankingCombination ( n , k , u ) computes the uth combination of k elements among n in lexicographic order.
Proof. 
The algorithm, and thus its proof, relies heavily on Fact 2. The key to prove the correctness of the RankConversion function is the following loop invariant:
  • The values of f n 1 j for all 0 j < i have been computed and stored in F.
  • The value of f n i 1 (which has not been determined yet, as we enter the loop) is at least m.
  • The variable u holds the rank of the sequence ( f j ) 0 j < n i (note that j < i ) among all sequences satisfying the condition of Fact 2 with k = k i and n = n i .
With this invariant at hand, the combinatorial argument behind the condition u < n m i 1 k i 1 in the algorithm becomes more apparent; that is, the binomial coefficient counts the number of such sequences ending with m. Hence, if u < n m i 1 k i 1 , then f n 1 i = m and we move to the evaluation of the next coefficient ( f n i 2 ); otherwise, we try the next possible value for m. □
Algorithm 2 Unranking a combination.
  1: function RankConversion(n,k,u)
2: F ← [0,…,0]        ▷ n components in F
3: i ← 0
4: m ← 0
5: while i < k do
1: function UnrankingCombination(n,k,u)
2: u′ ← RankConversion(n,k,u)
3: pUnrankingPermutation(n,u′)
4: return the first k elements of p
6:  bbinomial(n − 1 − mi,k − 1 − i)
7:  if b > u then
8:   F[n − 1 − i] ← m
9:   ii + 1
10:  else
11:   uub
12:   mm + 1
13:       ▷F is the factoradic decomposition
14: return composition(F)
binomial(n,k) computes the value of n k ;
composition(F): computes the integer whose factoradic is F.
The usual way to evaluate the efficiency of such an algorithm is to count the number of times the binomial function is called (see, e.g., the book [7] (p. 66) or the papers [22,23]). During the conversion from the rank of the combination to the one of the associated permutation, the coefficients are obtained via trials (in the factoradic notation) for f n 1 to f n k , remarking that through our bijection P n , k the latter sequence is weakly increasing. Thus, the worst cases are obtained when the value f n k is as large as possible, that is n k . In these cases, the number of calls to binomial is n.
To study the average number of calls to binomial, when u describes the whole range from 0 to n k 1 , we introduce the following cumulative sequence.
Lemma 2.
Let u n , k be the cumulative numbers of calls to binomial while unranking all possible combinations u from 0 to n k 1 . The sequence satisfies: u n , 0 = 0 and u n , n + i = 0 for all n and i > 0 , and otherwise:
u n , k = n k + u n 1 , k 1 + u n 1 , k .
In Table 2, we give the first values of u n , k , when n is less than 9. The obtained sequence is stored under the reference OEIS A127717 (throughout this paper, a reference OEIS A⋯ points to Sloane’s Online Encyclopedia of Integer Sequences www.oeis.org (accessed on 15 March 2021)). The bijection between both structures is direct, and thus we provide new information about this sequence in the following.
Proof. 
The recurrence can be observed by unrolling the first iteration of the while loop. In the first iteration of the loop, a binomial coefficient b is always computed (regardless the value of k and n) which accounts for the term n k in the recurrence relation. Then, for all the ranks u such that u < b , we choose f n 1 = 0 and increment i, so that the rest of the execution corresponds to unranking a combination of k 1 elements among n 1 . This is accounted for by u n 1 , k 1 . Conversely, for all ranks u such that u b , the value of m is incremented and the rest of the execution corresponds to unranking a combination of k elements among n 1 , which is accounted for by u n 1 , k .  □
We turn to bivariate generating functions to encode the sequence ( u n , k ) as a power series and express the above recurrence as a simple equation satisfied by this function, which can be solved explicitly. The reader can refer to the two books of Flajolet and Sedgewick [24,25] for a global presentation of such method.
Theorem 3.
Let U ( z , y ) be the generating functions associated with ( u n , k ) . Then,
U ( z , y ) = 1 1 z z y 1 1 z z y 1 1 z ; thus u n , k = n k k k + 1 ( n + 1 ) .
Proof. 
The first step of the proof consists in exhibiting the ordinary generating function associated with U ( z , y ) . To obtain a functional equation satisfied by U, we start from the result presented in Lemma 2. The extreme cases are u n , 0 = 0 and u n , n + i = 0 for all n and i > 0 . The recursive equation is u n , k = n k + u n 1 , k 1 + u n 1 , k .
We remark the constant n k in the equation. We thus need the bivariate generating function of binomial coefficients. Let us denote it by B ( z , y ) , which is equal to
B ( z , y ) = n 0 k = 0 n n k z n y k = 1 1 z z y .
To take into account the extreme cases, we must remove the terms corresponding to k = 0 :
B ˜ ( z , y ) = 1 1 z z y 1 1 z .
By summing both sides of the recursive relation and by taking care of the extreme cases, we get:
n 1 k = 1 n u n , k z n y k = B ˜ ( z , y ) + n 1 k = 1 n u n 1 , k 1 z n y k + n 1 k = 1 n u n 1 , k z n y k U ( z , y ) = B ˜ ( z , y ) + z y U ( z , y ) + z U ( z , y ) .
We thus deduce
U ( z , y ) = 1 1 z z y 1 1 z z y 1 1 z .
The second step in the proof consists in extracting the coefficient u n , k . We rewrite U ( z , y ) as:
U ( z , y ) = 1 1 z ( 1 + y ) 1 1 z ( 1 + y ) 1 1 z = r 0 z r ( 1 + y ) r · r 0 z r ( 1 + y ) r r 0 z r = r 0 z r ( 1 + y ) r · 1 1 + r 1 z r ( 1 + y ) r 1 .
By extraction, the coefficient in front of z n is:
[ z n ] U ( z , y ) = = 0 n 1 ( 1 + y ) ( 1 + y ) n 1 = = 0 n 1 ( 1 + y ) n ( 1 + y ) = n ( 1 + y ) n ( 1 + y ) n 1 y .
The latter result corresponds to the distribution of the costs when k varies from 0 to n. We can then exactly extract the coefficient of z n y k :
[ z n y k ] U ( z , y ) = n · n k n k + 1 = n k k k + 1 ( n + 1 ) .
 □
Corollary 1.
To unrank a combination of k elements among n, the function UnrankingCombination ( n , k , · ) needs on average u n , k / n k calls to the function binomial. For n being large and k being of the form α n for 0 < α < 1 , this average number of calls is
u n , k n k = n k = α n n + 1 1 α + O 1 n .
The result is direct by using Theorem 3. Since we have the exact value of u n , k , the mean value can easily be computed in other cases such as k = o ( n ) or k = n o ( n ) .

3. Classical Unranking Algorithms

This section is dedicated to the presentation of a survey of the usual approaches to unrank combinations in the lexicographic order. The motivation behind this section is threefold. First, the classical algorithms were developed in the 1970s and 1980s and it is a hard task to get access to the papers we mention. Second, although they have been analyzed according to the number of calls to the binomial coefficient computations, we present here a standardization of the analysis using generating functions such as in the previous section. Finally, as shown in Section 4 and in the conclusion of the paper, a detailed analysis of all the possible approaches is necessary to well understand the behaviors of the computations.

3.1. Unranking through the Recursive Method

We are dealing with a combinatorial structure here, combinations, that is well understood in the combinatorial sense. Thus, when trying to develop an unranking algorithm, the first idea consists in developing one based on the classical recursive generation method presented in [11]. This type of algorithm is based on a recursive decomposition of the structure into smaller parts. Here, this idea is to use the following fact: a combination of k elements among { 0 , 1 , , n 1 } either contains n 1 or does not. In the first case, the rest of the combination can be seen as a combination of k 1 elements among n 1 and in the second case the combination is a combination of k elements among n 1 . From a counting point of view, this translates into the well-known identity n k = n 1 k 1 + n 1 k and, from an unranking point of view, this translates into Algorithm 3.
Algorithm 3 Recursive Unranking.
1: function UnrankingRecursive(n,k,u)
2: LRecGeneration(n,k,u)
3: L′ ← [0,…,0]         ▷ k components
4: for i from 0 to k − 1 do
5:  L′[i] ← n − 1 − L[k − 1 − i]
6: return L

1: function RecGeneration(n,k,u)
2: if k = 0 then
3:  return []
4: if n = k then
5:  return [0,1,2,…,k − 1]
6: bbinomial(n − 1,k − 1)
7: if u < b then
8:  RRecGeneration(n − 1,k − 1,u)
9:  append(R,n − 1)
10:  return R
11: else
12:  return RecGeneration(n−1,k,ub)

Remark 1.
An alternative choice would have been to test whether b < n 1 k on Line 6. This corresponds to putting first the combinations that do not contain n 1 and then those that contain n 1 . In this case, the unranking order is different but the performance is similar.
Proposition 3.
The functionRecGeneration(n,k,u) computes the uth combination of k elements among n for the reverse co-lexicographic order.
Corollary 2.
The functionUnrankingRecursive(n,k,u) computes the uth combinations of k elements among n for the lexicographic order.
Proof. 
The proposition is proved by induction and the corollary is a direct observation given in [26] (p. 47). □
Here, again, we are interested in the average number of calls to the binomial function, when u describes the whole range of integers from 0 to n k 1 . Ruskey justified such a measure by supposing the table of all binomial coefficients pre-computed, thus each call is equivalent. In Section 4, we discuss this measure. We introduce the sequence (to simplify the notations, we use several times the notations u n , k and U ( z ) for distinct sequences and series) u n , k equal to the cumulative number of calls to binomial for the whole range of possible values for u.
Lemma 3.
Let u n , k be the cumulative number of calls to binomial for the recursive unranking while unranking all possible u from 0 to n k 1 . The sequence satisfies: u n , 0 = 0 and u n , n + i = 0 for all n and i 0 , and otherwise
u n , k = n k + u n 1 , k 1 + u n 1 , k .
Proof. 
If 0 < k < n , calling RecGeneration(n,k,u) incurs one call to binomial and a recursive call. The cumulative cost of the first call to binomial is n k , the cumulative cost of the recursive calls for u < n 1 k 1 is u n 1 , k 1 and the cumulative costs of the recursive calls for u n 1 k 1 is u n 1 , k . □
In the following Table 3, we give the first values of u n , k , when n is less than 9.
Theorem 4.
Let U ( z , y ) be the ordinary generating function associated with ( u n , k ) , such that U ( z , y ) = n 0 k = 0 n u n , k y k z n . Then,
U ( z , y ) = 1 1 z z y 1 1 z z y 1 1 z z y 1 z y , thus , u n , k = n k k n + 1 k + 1 1 n k + 1 .
The proof of Theorem 4 is very similar to the one of Theorem 3.
Proof. 
The first step of the proof consists in exhibiting the ordinary generating function associated with U ( z , y ) . To obtain the equation for U, we start from the result presented in Lemma 3. The extreme cases are u n , 0 = 0 and u n , n + i = 0 for all n and i 0 . The recursive equation is u n , k = n k + u n 1 , k 1 + u n 1 , k .
Again, the numbers n k appear in the equation, thus we need the bivariate generating function of binomial coefficients. Let us denote it by B ( z , y ) . It satisfies:
B ( z , y ) = n 0 k = 0 n n k z n y k = 1 1 z z y .
To follow the extreme cases, we must remove the first column k = 0 and the diagonal k = n :
B ˜ ( z , y ) = 1 1 z z y 1 1 z z y 1 z y .
By summing the recursive equation and taking care of the extreme cases, we get:
n 1 k = 1 n u n , k z n y k = B ˜ ( z , y ) + n 1 k = 1 n u n 1 , k 1 z n y k + n 1 k = 1 n u n 1 , k z n y k U ( z , y ) = B ˜ ( z , y ) + z y U ( z , y ) + z U ( z , y ) .
We thus deduce
U ( z , y ) = 1 1 z z y 1 1 z z y 1 1 z z y 1 z y .
The second step of the proof consists in extracting the coefficient u n , k .
U ( z , y ) = 1 1 z ( 1 + y ) 1 1 z ( 1 + y ) 1 1 z z y 1 z y = r 0 z r ( 1 + y ) r · r 0 z r ( 1 + y ) r r 0 z r r 1 z r y r = r 0 z r ( 1 + y ) r · 1 1 + r 1 z r ( 1 + y ) r 1 y r .
By extraction the coefficient in front of z n is:
[ z n ] U ( z , y ) = = 0 n 1 ( 1 + y ) ( 1 + y ) n 1 y n = = 0 n 1 ( 1 + y ) n ( 1 + y ) y n ( 1 + y ) = n ( 1 + y ) n ( 1 + y ) n 1 y y ( 1 + y ) n + y n + 1 .
The latter result corresponds to the distribution of the costs when k varies from 0 to n. We can then extract the coefficient of z n y k :
[ z n y k ] U ( z , y ) = n · n k n k + 1 n k 1 = n k n n k k + 1 k n k + 1
which completes the proof. □
This sequence u n , k is a shifted version of the sequence stored under the reference OEIS A059797. We thus can complete its properties in the OEIS using our results.
Due to the values of the extreme cases when k = 0 and k = n and the symmetry in the recurrence, we obviously obtain that u n , k = u n , n k , reflecting the symmetry of the binomial coefficients.
Corollary 3.
The functionUnrankingRecursive(n,k,·) needs on average u n , k / n k calls to the function binomial. For n being large and k being of the form α · n for 0 < α < 1 , we get:
u n , k n k = n k = α n n + 2 1 α ( 1 α ) + O 1 n .
Note that, for this algorithm too, the average complexity is only below the worst-case complexity by a constant (when k α n ).
Naturally, to be able to handle large values of n and k, a tail-recursive variant of this algorithm, or an iterative version, should be preferred over the straightforward implementation. The recursive approach is a drawback for some programming languages that do not handle recursion efficiently (due to the depth of the stack). Naturally, other strategies have been suggested in the literature.

3.2. Unranking through Combinadics

In 1887, E. Pascal [15] and later D. H. Lehmer (detailed in the book [16] (p. 27)) presented an interesting way to decompose a natural number, in what we call today a mixed radix numeral system. In their case, it is the so-called combinatorial number system, or combinadics. The decomposition relies on binomial coefficients.
Fact 5.
Let n k be two positive integers. For all integers u, with 0 u < n k , there exists a unique sequence 0 c 1 < c 2 < < c k < n such that (we extend the definition of binomial coefficients with n k = 0 when k > n )
u = c 1 1 + c 2 2 + + c k 1 k 1 + c k k .
The finite sequence ( c 1 , , c k ) is called the combinadic of u.
For example, when n = 5 and k = 3 , the number 8 is represented as 1 1 + 3 2 + 4 3 , thus the combinadic of 8 is ( 1 , 3 , 4 ) . In Table 4, we present for various values of u the combinadic of u and the combination of rank u for n = 6 and k = 2 . Here, we observe that the exhibited ranking is co-lexicographic and that the combination of rank u can be deduced from the combinadic of u by reversing it and applying the transformation x n 1 x to each of its components.
In 2004, using this representation, McCaffrey exhibited, in the MSDN article [27], an algorithm to build the uth element (in lexicographic order) of the combinations of k elements among n. However, in fact, this algorithm was already published by Kreher and Stinson [26] (p. 47) and can also be seen as an extension of the work of Lehmer. This algorithm is interesting as it corresponds to the previous implementation used in the mathematics software system Sagemath [28] (the previous unranking algorithm from Sagemath is stored in the Software Heritage Archive swh:1:cnt:c60366bc03936eede6509b23307321faf1035e23;lines=539-605). In the beginning of 2020, we replaced the Sagemath implementation by the algorithm presented in Section 2 (the new unranking algorithm from Sagemath is stored in the Software Heritage Archive swh:1:cnt:b2a68056554dbf90fa55e76820f348d9d55019e3;lines=539-653 (accessed on 15 March 2021)).
The algorithm simply performs the combinadic decomposition of u and then applies the aforementioned transformation. The idea to get the combinadic of an integer 0 u < n k is the following: c k is obtained as the maximum integer such that u c k k , and then the remaining part u c k k is smaller than n 1 k 1 so it can be decomposed recursively into a combinadic with k 1 components smaller than n 1 . McCaffrey’s algorithm is described in Algorithm 4.
Algorithm 4 Unranking a combination.
1: function UnrankingViaCombinadic( n , k , u )
2.  L [ 0 , , 0 ]                     ▹k components
3.  u binomial ( n , k ) 1 u
4.  v n
5. for i from 0 to k 1 do
6.   v v 1
7.   b binomial ( v , k i )
8.  while u < b do
9.    v v 1
10.    b binomial ( v , k i )
11.   u u b
12.   L [ i ] n 1 v
13. return L
As noted while explaining Table 4, we work with the reverse of the rank u (see Line 3 in the algorithm) in order to unrank combinations in lexicographic order. The presented algorithm is also close to Er’s algorithm [23] whose representation for combinations is distinct, but the computations are analogous; furthermore, in his paper, Theorem 2 corresponds exactly to the combinadic decomposition.
The function UnrankingViaCombinadic ( n , k , u ) computes the combinations of k elements among n of rank u in lexicographic order, although the core of the algorithm is reverse co-lexicographic. The correctness of the algorithm is stated in [26].
Again, we express the complexity of this algorithm as its number of calls to the binomial function. First note that, the values n and k being given, the worst cases are obtained when v gets as small as possible at the end of the loop, thus for all u whose combinadic satisfies c 1 = 0 . Hence, the worst case complexity is n 1 . Again, we complete this analysis, by computing the average complexity of the algorithm. To reach this goal, we again introduce the sequence u n , k computing the cumulative number of calls to binomial when u ranges from 0 to n k 1 .
Lemma 4.
Let u n , k be the cumulative numbers of calls to binomial while unranking all possible u from 0 to n k 1 . The sequence satisfies: u n , 0 = 1 and u n , n + i = 0 for all n and i > 0 , and otherwise
u n , k = n k + u n 1 , k 1 + u n 1 , k .
Table 5 presents the sequence given in Lemma 4. The difference with the two sequences studied before lies in the extreme cases. This sequence is a shifted version of the sequence OEIS A264751. Both combinatorial objects can be put in bijection, and thus some conjectures stated there are solved in the following.
Proof. 
Note that n c k calls to binomial are necessary to determine c k , then c k c k 1 calls to determine c k 1 , …, and finally c 2 c 1 to determine c 1 . Hence, the total number of binomial coefficients evaluations necessary to compute the combinadic of u is n c 1 (and thus only depends on c 1 ). Besides, for a given j 0 , the number of finite sequences c 1 = j < c 2 < c 3 < < c k < n is equal to the number of sequences 0 c 2 < c 3 < < c k < n j 1 by the change of variable c i c i j 1 . Hence, this number is equal to n 1 j k 1 . In addition, there is a first call to binomial at the beginning of the algorithm to reverse the rank, regardless of the value of u. We thus obtain:
u n , k = n k + j = 0 n k ( n j ) · n j 1 k 1 .
Using the latter equation, the recursive equation is directly proved by induction. □
Note that in this case the cumulative numbers are not symmetrical u n , k u n , n k . In fact, the computation of the combinadics is not symmetrical.
Theorem 6.
Let U ( z , y ) be the generating function associated with ( u n , k ) . Then,
U ( z , y ) = 1 1 z z y 1 1 z z y z 1 z ; thus u n , k = n k n + 1 n k k + 1 .
The proof is the same as for Theorem 3. The values for u n , k are a bit different due to the extreme cases u n , 0 .
Corollary 4.
The average number of calls to binomial in Algorithm 4 for n being large and k being of the form α n for 0 < α < 1 is
u n , k n k = n k = α n n + 2 1 α + O 1 n .
In the literature, another algorithm based on combinadics is given in [22]. We provide a pseudo-code equivalent of the original Fortran algorithm in Algorithm 5. Note that the algorithm does not handle the case k = 1 , which should thus be treated separately. There, in the computation of the combinadic for a given rank, the coefficients are computed from the smallest one, c 1 , to the second largest one, c k 1 , and finally the value for c k is directly deduced with no need for further trials. In this algorithm, the variable L contains the combinadic of u (not its reverse). We note two differences with Algorithm 4. First, the last coefficient ( c k ) is directly computed without “trying” the different possible values as for the previous coefficients (see Line 13 ). Second, it uses a combinatorial argument to find the value of the c i coefficients that is the complementary of the argument used in the previous algorithm: the number of sequences 0 c 1 < c 2 < < c k < n with c 1 j is equal to i = 0 j n i 1 k 1 , hence the accumulation performed in the r variable. The fact that the same combinatorial argument can be used in two different ways here has to be put in parallel with Remark 1 at the beginning of this section.
The first point mentioned above is an improvement over Algorithm 4, but in fact this second algorithm is penalized by the extra addition that it has to perform and then undo on Line 12 to find each c i . With this approach, it is mandatory to compute the accumulation of binomial coefficients in r until it becomes greater than u to know when to exit the loop. It appears in our experimentations in the next section that this has a noticeable impact on performance.
Algorithm 5 Unranking a combination (alternative algorithm).
1: function UnrankingViaCombinadic2( n , k , u )
2:  L [ 0 , , 0 ]                     ▹k components
3.  r 0
4. for i from 0 to k 2 do
5.  if i 0 then L [ i ] L [ i 1 ]
6:  else L [ i ] 1
7:  while true do
8:    L [ i ] L [ i ] + 1
9:    b binomial ( n L [ i ] 1 , k i 1 )
10:    r r + b
11:   if r > u then exit the loop
12:   r r b
13:  L [ k 1 ] L [ k 2 ] + u r + 1
14: return L
UnrankingViaCombinadic2(n,k,u) is a lexicographic unranking for combinations. This algorithm was presented by Buckles and Lybanon [22], and its correctness is presented in [26]. Finally, note it is approximately the implementation in Matlab [29] whose code was also presented by Ruskey ([7], p. 65). The latter approach also does trials to find the last coefficient of the combination instead of computing it directly on Line 13.
Lemma 5.
Let u n , k be the cumulative numbers of calls to binomial while unranking all possible u from 0 to n k 1 . For all n, the sequence satisfies u n , k = 0 when k = 1 , 2 or k > n , and otherwise
u n , k = n k + u n 1 , k 1 + u n 1 , k .
The result is proved in an analogous way as Lemma 4, summing over c k 1 instead of c 1 . In Table 6, we compute the first values of ( u n , k ) . We note the first values are smaller than the previous ones, and Theorem 7 gives their asymptotic behavior.
Theorem 7.
Let U ( z , y ) be the generating functions associated with ( u n , k ) . Then,
U ( z , y ) = z 2 y 2 ( 1 z ) 2 ( 1 z z y ) 2 ; thus u n , k = n k k 1 k + 1 ( n + 1 ) .
Corollary 5.
The average number of calls to binomial in Algorithm 5 for n being large and k being of the form α n for 0 < α < 1 is
u n , k n k = n k = α n n + 1 2 α + O 1 n .
The improvement for the efficiency of the algorithm is seen only on the second order in comparison to the one of Corollary 4 on average. Let us conclude this part with an interesting remark.
Remark 2.
In our Algorithm 3, while unranking k elements among n, we combinatorially see n k as first n 1 k 1 and otherwise at n 1 k . This is the same approach as in Algorithm 5. However, as explained after Algorithm 3, we could have first been interested in n 1 k and then in n 1 k 1 . Then, the algorithm would have been in a reverse co-lexicographic fashion, and it would follow exactly the same approach as Algorithm 4.

4. Improving Efficiency and Realistic Complexity Analysis

Finally, we show that the complexity model used above is no longer adequate. Although the approximation that computing one binomial coefficient has a constant cost seems to have been sufficient in the past (due the limitation of the implementations to 32-bit integers; see the work of McCaffrey [27], who seems to be the first to introduce combination unranking using big integer computations), this is no longer a valid model as big integers are now more widely used. We discuss the impact of big integer arithmetic and some optimizations that significantly improve the performance of all algorithms.
From the two last sections, we know the different algorithms that are mostly used in practice. However, having in mind the results exhibited in Table 1, implementations may vary a lot. Obviously, using a specialized library for big integer arithmetic such as the GMP C library is the best way to carry out fast operations, but the distinct behaviors observed in the table suggests that deeper differences exists between the various implementations than just the choice of the library.
For all algorithms, we prove that the average number of calls to the binomial function is equivalent to n (when k grows linearly). A first question to investigate is about this complexity measure: Is it reasonable? An obvious approach consists in comparing these results with the actual run time of the algorithms.

4.1. First Experiments to Visualize the Time Complexity in a Real Context

In the two next plots in Figure 1, we present the average time needed for the computations of 500 combinations for each pair ( n , k = n / 2 ) where n ranges from 250 to 10,000 with a step size of 250. The choice k = n / 2 was made because it corresponds to the worst complexity cases when k varies from 0 to n. To conduct our experiments, we implemented all algorithms in C using the classical GMP library for big integers arithmetic (the implementation and the exhaustive material used for repeating the experiments are available at http://github.com/Kerl13/combination_unranking (accessed on 15 March 2021)). The experiments were run on a standard laptop PC with an I7-8665U CPU, 32Gb RAM running Ubuntu Linux 2020. Moreover, except for Table 1, which required running different computer algebra systems and thus the X server, the experiments were run in the console while the X server and other time-consuming services (wifi, sound, etc.) were off. On the leftmost plot, we present the average time for the unranking of a combination for n ranging from 250 to 10,000 by steps of 250 too. Note that all the curves are merged: all algorithms seem to need almost exactly the same time to unrank a combination. In the rightmost plot, we present a version with memoization (memoization is an optimization technique consisting in storing the results of expensive function calls and returning the cached values when the function is called again with the same arguments; this technique is sometimes called “tabling”) for the algorithms: we first run a pre-computation step storing all binomial coefficients that will be used. The second step consists in unranking the combinations by using the pre-calculated values for the binomial coefficients. In our plot, we present only the time used for the second step. We remark that the recursive Algorithm 3 is not as efficient as the others. Below, we show that this is due to its recursive nature and that this can be easily avoided. As explained above, Algorithm 5 should be less efficient than Algorithm 4, which is indeed apparent in the plots. Our Algorithms 2 and 4 are the most efficient when used in this setup.
While the number of evaluations of binomial coefficients is linear in n, it is clear that it is not the case for the time complexity of the algorithms (see the leftmost plot). However, once the binomial coefficients have been pre-computed, running the four algorithms without computing binomial coefficients is closer to a linear function but they still are not really linear.
While memoizing all binomial coefficients when n is of order of some hundreds is possible, it is no longer the case when n is of the order of several thousands (for the experiments in Figure 1, we only computed the necessary binomial coefficients using a lazy approach). Such use cases did not occur when the methods (e.g., [22,23]) were derived. However, now that big integers are more widely used, it becomes reasonable to unrank combinations of large size and it is thus necessary to take the cost of arithmetic operations into account. The first use of big integers for combination unranking appears to date back to 2004 [27].

4.2. Improving the Implementations of the Algorithms

Before going further with the experiments, we propose an improvement in the computation of the binomial coefficient, which is applicable to all algorithms presented in this paper.
In all of the presented algorithms, a binomial coefficient is computed at each step of the generation. There are various ways to implement those. One possibility is to compute the two products n · ( n 1 ) · ( n 2 ) ( n k + 1 ) and k ! separately using a divide and conquer strategy as described in [30] (Section 15.3) and then to compute the division.
In the unranking algorithms, instead of doing the “full” evaluation of the binomial coefficient at each step, it is possible to reuse the computations from the previous step and obtain the value of the new coefficient by a constant number of multiplications or divisions by a small integer. This is possible because in all algorithms the parameters of the binomial coefficients vary only by ± 1 from one step to the next. For instance, in Algorithm 2, just before incrementing i on Line 9, the coefficient for the next iteration can be obtained by multiplying b by k 1 i and dividing it by n 1 m i . Similarly, in the other branch of the if, just before incrementing m on Line 12, the value of the coefficient for the next iteration can be obtained by multiplying b by n m k and dividing it by n 1 m i . In the end, only the first binomial coefficient is computed from scratch and all others are obtained as described above. This lowers the amortized cost of computing one coefficient to Θ ( 1 ) multiplication of a big integer by a small integer. This optimization is applicable to all the algorithms presented in this paper.
In Algorithm 6, we propose an optimized version of our Algorithm 2 based on the above remarks. It also includes some other enhancements. First, instead of computing the permutation rank of the combination and then unranking the combination as a permutation, it is possible to process the coefficients of the factoradic decomposition on the fly to extract the right value from the set { 0 , 1 , 2 , , n 1 } . Second, we can note that we are in a special use-case of the Extract function where values are extracted in increasing order. Hence, there is no need to explicitly store the set of remaining values (not yet in the combination) to get access to its mth value: it is m + i where i is the number of already extracted values. Finally, the last component of the combination can be deduced without computing more binomial coefficients (Line 15) in order to leave the loop earlier.
Before going on with the efficiency comparisons, we use the remarks related to the binomial coefficient computation to improve Algorithms 3–5. Furthermore, to make the comparison fair, we used a variant of Algorithm 3 for the recursive approach in which the array storing the result is allocated with the right size at the beginning of the execution, as in the other algorithms. In addition, in order not to penalize it due to its recursive nature, the order of some instructions have been changed so that it is tail-recursive. The optimized version is given in Algorithm 7.
In UnrankTR, the variable i represents the position in L of the next value to be computed and the variable m represents the next candidate to be the value of L [ i ] . The invariant satisfied by b is that UnrankTR is always called with b = n · n 1 k 1 .
Algorithm 6 Unranking a combination with optimization.
1: function OptimizedUnrankingCombination( n , k , u )
2:  L [ 0 , , 0 ]                         ▹k components
3:  b binomial ( n 1 , k 1 ) · n
4:  m 0 ; i 0 ;
5: while i < k 1 do          ▹ Invariant: b = n m i 1 k 1 i · ( n m i )
6:   b b / ( n m i )
7:  if b > u then
8:    L [ i ] m + i
9:    b b · ( k i 1 )
10:    i i + 1
11:  else
12:    u u b
13:    b b · ( n m k )
14:    m m + 1
15: if k > 0 then L [ k 1 ] n + u b
16: return L
We propose a second time efficiency comparison for some algorithms with their optimizations in Figure 2. Comparing this plot with the leftmost one of Figure 1, we note that, when n = 10,000 , the algorithms run approximately 45 times faster. Again, we note that Algorithm 5 is still less efficient than the others, which all seem to be equivalent.
Algorithm 7 Recursive method with optimizations.
1: function OptimizedUnranking-
     Recursive(n,k,u)
2: L ← [0,…,0]       ▷ k components 
3: b ← binomial(n,k)
4: UnrankTR(L,0,0,n,k,u,b)
5: return L

1: function UnrankTR(L,i,m,n,k,u,b)
2: if k = 0 then do nothing
3: else if k = n then
4:  for j from 0 to k − 1 do
5:   L[i + j] ← m + j
6: else
7:  bb/n
8:  if u < b then
9:   L[i] ← m
10:   b ← (k − 1) · b
11:   UnrankTR(L,i + 1,m + 1,n − 1,k − 1,u,b)
12:  else
13:   uub
14:   b ← (nk) · b
15:   UnrankTR(L,i,m + 1,n − 1,k,u,b)
To understand the behavior of the curves of the above plot, we introduce another way to analyze the time complexity in Figure 3. Now, we take n = 10,000 and we let k range from 250 to 9750 with an iteration step of 250. For each step, we present an average value for 500 tests. Again (in the leftmost plot), the dashed lines correspond to the first version of each algorithm and the solid lines to the optimized versions. In the rightmost plot, we only focus on the optimized versions of the four algorithms. We remark that the worst time complexity is obtained when k = n / 2 . In addition, Algorithm 6 and the tail-recursive Algorithms 4 and 7 behave in almost the same way. Whereas Algorithm 4 is a little bit more efficient when k < n / 2 , the two other are a bit more efficient for the second half range.
Thus, the curves for Algorithms 4, 6, and 7 (once optimized) are hard to distinguish. There is a surprising explanation for this. In fact, it can be shown that all three algorithms perform the exact same arithmetic operations on big integers except for a few terms at then end of their execution, due to their different base cases. In the case of the recursive and factoradic-based algorithms, the similarity goes further. Algorithm 7, being tail-recursive, it can be automatically translated into the imperative style (the conversion of a tail-recursive algorithm into its imperative version is called “tail-call” optimization and is implemented by most compilers for languages with recursion) and the result of the automatic translation is an algorithm which is almost identical to Algorithm 2. In the case of Algorithm 4 (once optimized), although the computations are the same, they are used to obtain the values of the c i coefficients in a different order, which makes the parallel less obvious.
As a result, the only fundamental differences between all these algorithms are their base cases. Since they are all based on a different combinatorial point of view, their base cases have been characterized slightly differently, but, as we can see by comparing the results of their complexity analyses (establishing how many calls to binomial are done), this only impacts the second order of their complexity.
Besides, for all algorithms, a significant speed-up can be achieved by re-using the value of the most recently computed binomial coefficient.

4.3. Realistic Complexity Analysis

We now propose a more precise complexity analysis based on a more realistic cost model. Recall that we are dealing with big integers. More precisely for n and k being given, the ranks as well as the binomial coefficients computed during the generation can have up to L n , k = 1 + log 2 n k bits. Using Stirling’s approximation, we get that
L n , k n k = α n n α log 2 1 α + ( 1 α ) log 2 1 1 α .
Besides, the cost of the multiplication of a big integer with O ( n ) bits with a smaller integer of O ( ln n ) bits can be bounded by O ( n ln n M ( ln n ) ) where M ( x ) is the cost of multiplying two x-bits integers. This can be achieved by writing the big integer in base 2 log 2 n = n and performing the multiplication using the naive “textbook” algorithm in this base. The first term n ln n counts the number of operations done in base n and the second term M ( ln ) is the cost of one single multiplication in this base.
A rough upper bound for M ( x ) is x 2 , obtained by using the naive multiplication algorithm. Actually, the naive algorithm is often used in practice for small values of x since the asymptotically more efficient algorithms only become better above a given threshold. For our use-case, n is likely to fit in a machine word in practice and thus the naive algorithm must be used. Hence, the upper bound of O ( n ln n ) for the cost of the multiplication of a small integer by a big integer should faithfully reflect the actual runtime of our implementations, although it is theoretically not optimal. A tighter bound of O ( n ( ln ln n ) ( ln ln ln n ) ) can be obtained by using the Schönhage–Strassen algorithm, although it is not advisable in practice.
In addition to the cost of the multiplications discussed above, a linear number of comparisons and additions are performed. The cost of one such operation is linear in n and thus negligible compared to the cost of the multiplications. Finally, the first binomial coefficient must be computed from scratch which can be done at negligible cost compared to n 2 ln n M ( ln n ) .
By combining the above discussion with the results from the previous sections, we get the average bit-complexity of all algorithms when k grows linearly with n, as presented in Theorem 8.
Theorem 8.
For all the optimized algorithms of the present paper, there exist a constant c > 0 such that for all n large enough and k = α n for 0 < α < n , the bit-complexity of the algorithm is bounded by:
c · n 2 ln n · α log 2 1 α + ( 1 α ) log 2 1 1 α .
In Figure 4, we display the time complexity of our algorithm (in green) with the graph of the function α C · α ln 1 α + ( 1 α ) ln 1 1 α where the constant C was chosen so that the maximum values of both curves coincide.
This validates experimentally our complexity results. Besides, we checked by profiling the optimized algorithms that most of the run time is spent in the arithmetic operations, which also confirms the validity of our complexity model.

5. Extensions of the Algorithmic Context

5.1. Objects Counted by Multinomial Coefficients

Let n and m be two positive integers and let K = ( k 1 , k 2 , , k m ) be a finite sequence of non-negative integers whose sum equals n. The multinomial coefficient n k 1 , , k m counts the number of ways of depositing n distinct objects into m distinct buckets such that there are k i objects in the ith bucket. It can also be interpreted as combination with repetitions as follows. Consider we have a (large enough) pool of m kinds of different objects and we must pick a finite sequence of n objects from this pool such that k 1 of them are of the first kind, k 2 of the second kind, and so on.
Note that, when m = 2 , the multinomial coefficient corresponds to a binomial coefficient. Informally, Proposition 1 (related to combinations) states that combinations are in one-to-one correspondence with permutations containing two increasing runs. We have an analogous interpretation here. The ranks from 0 to n k 1 , , k m 1 are in one-to-one correspondence with size-n permutations composed of m increasing runs. We now exhibit this link more formally.
Proposition 4.
Let n and m be two positive integers and let K = ( k 1 , k 2 , , k m ) be a sequence of non-negative integers such that i k i = n . Each object enumerated by n k 1 , , k m is represented by a finite sequence ( 1 , 2 , k m ) where for all 1 i m , i is an increasing finite sequence of length k i . Furthermore, the union over i of all elements of i is exactly { 0 , 1 , , n 1 } .
Our unranking algorithm relies on the classical formula expressing the multinomial coefficient as a product of binomial coefficients.
n k 1 , , k m = k m k m · k m + k m 1 k m 1 k m + + k 2 k 2 · k m + + k 1 k 1 .
Algorithm 8 Unranking a combination with repetitions.
1: function UnrankingCombinationWithRepetitions( n , K = ( k 1 , , k m ) , u )
2:  F [ 0 , , 0 ]                     ▷n components in M
3.  u u
4:  n k m
5: for i from m 1 downto 1 do
6:   n n + k i
7:   b binomial ( n , k i )
8:   ( u , u ) division ( u , b )
9:   F factoradic ( R ANK C ONVERSION ( n , k i , u ) )
10:  for j from 0 to k i 1 do
11:    F [ n k i + j ] F [ n k i + j ]
12:  r composition ( F )
13: return U NRANKING P ERMUTATION , r )
division ( s , t ) : returns the pair ( q , r ) corresponding respectively to the quotient and the
remainder of
        the integer division of s by t.
Proposition 5.
The function UnrankingCombinationWithRepetitions ( n , ( k 1 , , k m ) , u ) computes the uth object counted by n k 1 , , k m in lexicographic order.
The core of the algorithm consists in computing the rank of the permutation, written in factoradics, associated with the combination with repetitions we are interested in. It remains then to unrank a permutation.
Proof. 
Based on Proposition 4, the core of the algorithm computes the rank of a permutation containing m increasing runs respectively of lengths k 1 , k 2 , , k m . Determining the contribution of each run in the factoradic decomposition of the permutation rank is done in the external loop starting in Line 5, from k m down to k 1 . The correctness of our algorithm relies on the following loop invariant:
  • The values of f j for all 0 j < k m + + k i + 1 have been computed and stored in F.
  • The values of f k m + + k i + 1 , , f k m + k i + 1 + k i 1 (which has not been determined yet, as we enter the loop) are equal to the factoradics of the rank u mod k m + + k i k i in the combinations of k i elements among k m + + k i possible elements.
  • The variable u holds the rank of the runs that must be still unranked.
 □
Once the factoradic F of the run under consideration has been computed (Line 10), it remains to update F according to the k i last components of F .

5.2. Objects Counted by k-Permutations

Let n and k be two positive integer. A k-permutation is given by a combination of k elements among n elements, and an ordering of these k elements. The number of k-permutation over a set of n elements is given by:
k ! n k = n 1 , , 1 , n k .
Proposition 6.
The function UnrankingKPermutation ( n , k , u ) returns the result of a call to UnrankingCombinationWithRepetitions ( n , ( 1 , , 1 , n k ) , u ) . Thus, it computes the uth k-permutation among n elements in lexicographic order.
Proof. 
The facts that both combinatorial classes have the same cardinality and that the algorithm for unranking combination with repetitions is lexicographic induce the correctness of the function. □

6. Conclusions

In the present article, we first exhibit, in Section 2, a new algorithm based on the classical factoradic numeral system for the lexicographic unranking of combinations. We also give a first step complexity analysis of the algorithm.
Section 3 is dedicated to the survey of three classical algorithms. The first one is the algorithm based on the so-called recursive method. The other two algorithms are based on combinadics, which is a numeral system particularly suited to representing combinations. To the best of our knowledge, it is the first time that both algorithms are compared. During the survey, we extend our complexity analysis based on a generating function approach, as presented in Section 2, in order to obtain a uniform presentation of the analysis of all four algorithms. Obviously, we thus reprove their average complexity results but with more details. Then, in Section 4, we compare their efficiency experimentally and propose a way to improve all these algorithms based on classical formulas of the binomial coefficient.
As a surprising result, we find that all the usual algorithms share a very similar core, doing almost the same computations in order to reconstruct the combination under consideration.
One interesting remark is that understanding in detail the core computations that are necessary to unrank combinations, it is possible to significantly improve all algorithms. This understanding, joined with a detailed and realistic theoretical complexity analysis, leads to a prediction of the run time of the algorithm that closely matches the actual run time of their implementations.
However, due to details that are neglected in practice, we realize in Table 1 that some improvements are still necessary in various computer algebra systems in order to get the most efficient implementations possible for the unranking of combinatorial objects.
Finally, in Section 5, we extend our approach to solve the problem of unranking structures enumerated by multinomial coefficients and objects counted by the k-permutations of n elements (also called arrangements).

Author Contributions

The authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: http://github.com/Kerl13/combination_unranking (accessed on 15 March 2021).

Acknowledgments

The authors thank C. Donnot and Y. Herida for having conducted the preliminary experiments about our algorithm. Furthermore, the authors thank the referees for their careful reading of the paper, their comments and suggested improvements.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shablya, Y.; Kruchinin, D.; Kruchinin, V. Method for Developing Combinatorial Generation Algorithms Based on AND/OR Trees and Its Application. Mathematics 2020, 8, 962. [Google Scholar] [CrossRef]
  2. Grebennik, I.; Lytvynenko, O. Developing software for solving some combinatorial generation and optimization problems. In Proceedings of the 7th International Conference on Application of Information and Communication Technology and Statistics in Economy and Education, Sofia, Bulgaria, 3–4 November 2017; pp. 135–143. [Google Scholar]
  3. Tamada, Y.; Imoto, S.; Miyano, S. Parallel Algorithm for Learning Optimal Bayesian Network Structure. J. Mach. Learn. Res. 2011, 12, 2437–2459. [Google Scholar]
  4. Myers, A.F. k-out-of-n:G System Reliability With Imperfect Fault Coverage. IEEE Trans. Reliab. 2007, 56, 464–473. [Google Scholar] [CrossRef]
  5. Bodini, O.; Genitrini, A.; Naima, M. Ranked Schröder Trees. In Proceedings of the Sixteenth Workshop on Analytic Algorithmics and Combinatorics, ANALCO 2019, San Diego, CA, USA, 6 January 2019; pp. 13–26. [Google Scholar] [CrossRef] [Green Version]
  6. Ali, N.; Shamoon, A.; Yadav, N.; Sharma, T. Peptide Combination Generator: A Tool for Generating Peptide Combinations. ACS Omega 2020, 5, 5781–5783. [Google Scholar] [CrossRef] [PubMed]
  7. Ruskey, F. Combinatorial Generation; 2003; (unpublished); Available online: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.5967 (accessed on 15 March 2021).
  8. Knuth, D.E. The Art of Computer Programming, Volume 4A, Combinatorial Algorithms; Addison-Wesley Professional: Boston, MA, USA, 2011. [Google Scholar]
  9. Skiena, S. The Algorithm Design Manual, Third Edition; Texts in Computer Science; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar] [CrossRef]
  10. Ryabko, B.Y. Fast enumeration of combinatorial objects. Discret. Math. Appl. 1998, 8, 163–182. [Google Scholar] [CrossRef] [Green Version]
  11. Nijenhuis, A.; Wilf, H.S. Combinatorial Algorithms; Computer Science and Applied Mathematics; Academic Press: New York, NY, USA, 1975. [Google Scholar]
  12. Flajolet, P.; Zimmermann, P.; Van Cutsem, B. A calculus for the random generation of labelled combinatorial structures. Theor. Comput. Sci. 1994, 132, 1–35. [Google Scholar] [CrossRef]
  13. Martínez, C.; Molinero, X. A generic approach for the unranking of labeled combinatorial classes. Random Struct. Algorithms 2001, 19, 472–497. [Google Scholar] [CrossRef] [Green Version]
  14. Kokosinski, Z. Algorithms for Unranking Combinations and their Applications. In Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, Washington, DC, USA, 19–21 October 1995; Hamza, M.H., Ed.; IASTED/ACTA Press: Calgary, AB, Canada, 1995; pp. 216–224. [Google Scholar]
  15. Pascal, E. Sopra una formula numerica. G. Di Mat. 1887, 25, 45–49. [Google Scholar]
  16. Beckenbach, E.F.; Pólya, G. Applied Combinatorial Mathematics; R.E. Krieger Publishing Company: Malabar, FL, USA, 1981. [Google Scholar]
  17. Laisant, C.A. Sur la numération factorielle, application aux permutations. Bull. Soc. Math. Fr. 1888, 16, 176–183. [Google Scholar] [CrossRef] [Green Version]
  18. Fisher, R.A.; Yates, F. Statistical Tables for Biological, Agricultural and Medical Research; Oliver & Boyd: London, UK, 1948. [Google Scholar]
  19. Bonet, B. Efficient algorithms to rank and unrank permutations in lexicographic order. In AAAI Workshop on Search in AI and Robotics—Technical Report; The AAAI Press: Menlo Park, CA, USA, 2008; pp. 142–151. [Google Scholar]
  20. Bodini, O.; Genitrini, A.; Peschanski, F. A Quantitative Study of Pure Parallel Processes. Electron. J. Comb. 2016, 23, 39. [Google Scholar] [CrossRef]
  21. Durstenfeld, R. Algorithm 235: Random Permutation. Commun. ACM 1964, 7, 420. [Google Scholar] [CrossRef]
  22. Buckles, B.P.; Lybanon, M. Algorithm 515: Generation of a Vector from the Lexicographical Index [G6]. ACM Trans. Math. Softw. 1977, 3, 180–182. [Google Scholar] [CrossRef]
  23. Er, M.C. Lexicographic ordering, ranking and unranking of combinations. Int. J. Comput. Math. 1985, 17, 277–283. [Google Scholar] [CrossRef]
  24. Flajolet, P.; Sedgewick, R. Analytic Combinatorics; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
  25. Flajolet, P.; Sedgewick, R. An Introduction to the Analysis of Algorithms, 2nd ed.; Addison Wesley: Boston, MA, USA, 2013. [Google Scholar]
  26. Kreher, D.L.; Stinson, D.R. Combinatorial Algorithms: Generation, Enumeration, and Search; CRC Press: Boca Raton, FL, USA, 1999. [Google Scholar]
  27. McCaffrey, J. Generating the mth Lexicographical Element of a Mathematical Combination. MSDN. 2004. Available online: http://docs.microsoft.com/en-us/previous-versions/visualstudio/aa289166(v=vs.70) (accessed on 15 March 2021).
  28. The Sage Developers. SageMath, the Sage Mathematics Software System (Version 9.2). 2020. Available online: https://www.sagemath.org/ (accessed on 15 March 2021).
  29. Butler, B. Function kSubsetLexUnrank, MATLAB Central File Exchange. 2015. Available online: http://fr.mathworks.com/matlabcentral/fileexchange/53976-combinatorial-numbering-rank-and-unrank (accessed on 15 March 2021).
  30. Bostan, A.; Chyzak, F.; Giusti, M.; Lebreton, R.; Lecerf, G.; Salvy, B.; Schost, E. Algorithmes Efficaces en Calcul Formel, 10th ed.; Printed by CreateSpace: Palaiseau, France, 2017; 686p. [Google Scholar]
Figure 1. Time (in ms) for unranking a combination, with n = 250…10,000 and k = n / 2 .
Figure 1. Time (in ms) for unranking a combination, with n = 250…10,000 and k = n / 2 .
Algorithms 14 00097 g001
Figure 2. Time (in ms) for unranking a combination with the optimized algorithms.
Figure 2. Time (in ms) for unranking a combination with the optimized algorithms.
Algorithms 14 00097 g002
Figure 3. Time (in ms) for unranking a combination, with n = 10,000 and k = 25…9975.
Figure 3. Time (in ms) for unranking a combination, with n = 10,000 and k = 25…9975.
Algorithms 14 00097 g003
Figure 4. Merge of Algorithm 2 and its theoretical complexity.
Figure 4. Merge of Algorithm 2 and its theoretical complexity.
Algorithms 14 00097 g004
Table 1. Average time (in ms) for the unranking of a combination of k elements among n (too long means that the average time is larger than 30,000 ms).
Table 1. Average time (in ms) for the unranking of a combination of k elements among n (too long means that the average time is larger than 30,000 ms).
Time in ms
Sample SagemathMapleMathematicaMatlab
Our Algo.v. 9.0v. 9.2v. 2020.0v. 12.1.1.0v. R2020b
Implem. in CTheir Algo.New Algo.Their Algo.Our Algo.Their Algo.Our Algo.Their Algo.Our Algo.
n = 1000 0.054642.60452.967278.22.120.441764.31453996.63041.2
k = 100
n = 1000 0.060528.89032.47846142.960.346083.95473520.63380.0
k = 500
n = 3000 0.1749615.89688.7929118013.25.913111.82311,8469315.2
k = 300
n = 3000 0.2752496.35898.0500613019.24.962413.06711,0879879.4
k = 1500
n = 10,0001.2554191.0331.665too long65.121.90639.935too longtoo long
k = 1000
n = 10,0002.38492245.629.027too long97.929.91646.452too longtoo long
k = 5000
Table 2. First values of u n , k for n = 1 8 and k = 0 n (Algorithm 2).
Table 2. First values of u n , k for n = 1 8 and k = 0 n (Algorithm 2).
k012345678
n
101
2032
30683
401020154
50154045245
60217010584356
7028112210224140487
8036168378504420216638
Table 3. First values of u n , k for n = 1 8 and k = 0 n (Algorithm 3).
Table 3. First values of u n , k for n = 1 8 and k = 0 n (Algorithm 3).
k012345678
n
100
2020
30550
4091690
50143535140
6020649064200
7027105189189105270
8035160350448350160350
Table 4. Combinadics and their combinations for n = 6 and k = 2 .
Table 4. Combinadics and their combinations for n = 6 and k = 2 .
Rank, uReversed Rank, u = 6 2 1 u Combinadic of u Combination of Rank u
014(4, 5)(0, 1)
113(3, 5)(0, 2)
212(2, 5)(0, 3)
311(1, 5)(0, 4)
410(0, 5)(0, 5)
59(3, 4)(1, 2)
68(2, 4)(1, 3)
77(1, 4)(1, 4)
86(0, 4)(1, 5)
95(2, 3)(2, 3)
104(1, 3)(2, 4)
113(0, 3)(2, 5)
122(1, 2)(3, 4)
131(0, 2)(3, 5)
140(0, 1)(4, 5)
Table 5. First values of u n , k for n = 1 8 and k = 0 n (Algorithm 4).
Table 5. First values of u n , k for n = 1 8 and k = 0 n (Algorithm 4).
k012345678
n
112
2153
319114
411426195
51205055296
61278512599417
7135133245259161558
8144196434574476244719
Table 6. First values of u n , k for n = 1 8 and k = 0 n (Algorithm 5).
Table 6. First values of u n , k for n = 1 8 and k = 0 n (Algorithm 5).
k012345678
n
100
2001
30042
40010103
5002030184
600357063285
70056140168112406
80084252378336180547
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Genitrini, A.; Pépin, M. Lexicographic Unranking of Combinations Revisited. Algorithms 2021, 14, 97. https://doi.org/10.3390/a14030097

AMA Style

Genitrini A, Pépin M. Lexicographic Unranking of Combinations Revisited. Algorithms. 2021; 14(3):97. https://doi.org/10.3390/a14030097

Chicago/Turabian Style

Genitrini, Antoine, and Martin Pépin. 2021. "Lexicographic Unranking of Combinations Revisited" Algorithms 14, no. 3: 97. https://doi.org/10.3390/a14030097

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop