0000-0003-1072-8559 \reportnumALG-24-007 \reporturlhttps://reports.cicirello.org/24/007/ALG-24-007.pdf \ACMF.2.1; G.3; G.4; I.1.2 \MSC68Q25; 68Q87; 68W40; 60-04
On the Average Runtime of an Open Source Binomial Random Variate Generation Algorithm
Abstract
The BTPE algorithm (Binomial, Triangle, Parallelogram, Exponential) of Kachitvichyanukul and Schmeiser is one of the faster and more widely utilized algorithms for generating binomial random variates. Cicirello’s open source Java library, , includes an implementation of BTPE as well as a variety of other random number related utilities. In this report, I explore the average case runtime of the BTPE algorithm when generating random values from binomial distribution . Beginning with Kachitvichyanukul and Schmeiser’s formula for the expected number of acceptance-rejection sampling iterations, I analyze the limit behavior as approaches infinity, and show that the average runtime of BTPE converges to a constant. I instrument the open source Java implementation from the library to experimentally validate the analysis.
keywords:
binomial; BTPE; inverse transform; Java; open source; random variate; runtime analysis1 Introduction
In this report, I explore the average runtime behavior of binomial random variate generation within the open source Java library [5]. The library provides a variety of random number related enhancements beyond what is provided by the Java API itself. The core functionality of is provided through a hierarchy of wrapper classes. This set of ’s wrapper classes directly corresponds to the hierarchy of random number generator interfaces introduced in Java 17 [19]. In some cases, overrides Java’s random number generators with faster algorithms, such as for random integers subject to a bound. In other cases, adds functionality, such as additional distributions, i.e., the binomial, Cauchy, among others [5]. The initial motivation of the library was to provide a source of enhanced and more efficient randomness for other libraries, such as JavaPermutationTools [3] and Chips-n-Salsa [4].
Among the randomness enhancements of the library is support for generating binomial random variates, which are important to many applications [18, 17, 9, 22, 15, 2, 12, 21, 14]. There are many algorithms for generating binomial random variates [10, 11, 16, 1, 14, 13]. The [5] library implements the BTPE Algorithm (Binomial, Triangle, Parallelogram, Exponential) [10], and falls back upon the inverse transform [10, 13] for cases that cannot be handled by BTPE.
The runtime of many algorithms for generating random values from binomial distribution grows with some function of and . For example, the runtime of the inverse transform method is [10, 13]. One technique many binomial random variate algorithms utilize is acceptance–rejection sampling [8], and the iterations of such sampling is often what leads to such runtimes. BTPE does use acceptance–rejection sampling, but seems to require relatively few iterations. At the time of introduction, Kachitvichyanukul and Schmeiser determined the expected number of iterations as a function of and [10] (see Section 2.1). However, the aim of my present report is to provide simpler insight into the average case runtime than is provided by Kachitvichyanukul and Schmeiser’s formula.
In this report, I explore the limit of Kachitvichyanukul and Schmeiser’s formula as approaches infinity (Section 3), for two cases of , including the minimum supported by BTPE (i.e., ), which maximizes the expected number of rejection sampling iterations, and , which is when the expected number of rejection sampling iterations is minimized. Both cases lead to average runtimes that are constants in the limit for large . In Section 4, I validate the findings experimentally using the library, and utilizing my prior empirical study [6]. I discuss the findings and offer conclusions in Section 5, including the main result that the average runtime of BTPE is . I now begin in Section 2 with some required background.
2 Preliminaries
2.1 Rejection Sampling Iterations of BTPE
The BTPE [10] algorithm separates the binomial distribution into four parts, using triangular functions in the middle portions, and exponential functions in the tails, and it uses acceptance–rejection sampling [8]. For complete details of BTPE, which are beyond the scope of this paper, I refer the reader to the article that introduced it [10].
To generate a random value from binomial distribution , each acceptance–rejection iteration of BTPE generates two random values from , i.e., uniformly distributed over the interval . When they introduced BTPE, Kachitvichyanukul and Schmeiser determined that the expected number of iterations, of BTPE is [10]:
(1) |
and since each iteration generates two random uniform values from , the expected number of uniform variates, , required by BTPE is thus:
(2) |
Equations 1 and 2 involve , , and , which in turn depend upon a few others, all of which are functions of and . Kachitvichyanukul and Schmeiser [10] define these as follows:
(3) | |||
(4) | |||
(5) | |||
(6) | |||
(7) | |||
(8) | |||
(9) | |||
(10) | |||
(11) | |||
(12) | |||
(13) | |||
(14) | |||
(15) | |||
(16) | |||
(17) | |||
(18) |
Also note that BTPE is only relevant when: . Kachitvichyanukul and Schmeiser recommend using the inverse transform method [13] as a fallback for cases when BTPE is not relevant. This is what is done in the implementation in the library [5].
The expected number of acceptance–rejection sampling iterations, expressed in Equation 2, is maximized when is minimal. Thus, the expected number of rejection sampling iterations is maximized for the cases and . Due to symmetry, these two cases lead to the same expected number of rejection sampling iterations. The nearer is to , the fewer iterations BTPE requires on average. Table 1 shows how the expected number of required uniform variates changes as increases for these two cases of . It includes the minimum supported by BTPE (e.g., ), and then considers at increasing powers of two to show how the number of uniform variates used by BTPE to generate one binomial random variate varies as increases rapidly.
from Equation 2 | ||
---|---|---|
3.996 | 3.996 | |
3.837 | 3.599 | |
3.792 | 3.337 | |
3.790 | 2.985 | |
3.793 | 2.632 | |
3.796 | 2.420 | |
3.797 | 2.317 | |
3.798 | 2.307 | |
3.798 | 2.276 | |
3.799 | 2.300 | |
3.799 | 2.299 | |
3.799 | 2.300 | |
3.799 | 2.302 | |
3.799 | 2.310 | |
3.799 | 2.310 | |
3.799 | 2.313 | |
3.799 | 2.314 |
2.2 Stirling’s Formula
While computing the limiting behavior of BTPE, we will encounter some factorials. Stirling’s formula [7] will be useful in the analysis. Stirling’s formula is as follows:
(19) |
where
(20) |
A consequence of using Stirling’s formula is that it will introduce terms involving , either generally for or in other cases for specific values of . The following observations will be useful:
(21) | |||
(22) | |||
(23) | |||
(24) | |||
(25) | |||
(26) |
3 Limit Analysis
Let’s now proceed to compute the limit:
(27) |
for two cases, when and when in Sections 3.1 and 3.2, respectively.
3.1 Case 1: Maximum Rejection Sampling Iterations
Consider the limit from Equation 27 when , which is the lowest (in terms of ) supported by BTPE. Begin by computing the values of the various constants from Section 2.1 in terms of and as follows:
(28) | |||
(29) | |||
(30) | |||
(31) | |||
(32) | |||
(33) |
From Equation (20), find the following, which we need later:
(34) | |||
(35) | |||
(36) |
We will also need the following limit:
(37) |
We will also need the limits of and as approaches infinity, as follows:
(38) |
and
(39) |
We finally can compute the limit of Equation (27):
(40) |
Thus, the expected number of uniform random variates required by BTPE to generate one binomial random variate converges to approximately 3.801 as grows large (approximately 1.90 rejection sampling iterations on average) for the case of minimum supported . Observe in Table 1 that BTPE approaches this limit case rapidly.
3.2 Case 2: Central Case
Now consider the limit from Equation 27 for the case of .
Let us begin by computing some of the constants from Section 2.1 for this case. When computing , assume that is even, without loss of generality:
(41) | |||
(42) | |||
(43) | |||
(44) |
Now substitute these, and other Section 2.1 definitions, into Equation 27 and simplify:
(45) |
The next-to-last step above utilizes the fact that when .
Before continuing to simplify Equation 45, compute the following limits. First:
(46) |
Now, compute the limit . Since involves a floor, compute bounds on this limit. Although we will see that the upper and lower bounds are equal, and thus via the squeeze theorem is also equal to our target limit.
(47) |
The next limit that we need is as follows:
(48) |
Now consider the following:
(49) |
Now continue simplifying Equation 48, keeping Equation 47 in mind as well as that as follows:
(50) |
Thus, when the expected number of uniform random variates required by BTPE to generate one binomial random variate converges to approximately 2.319 as grows large (approximately 1.16 rejection sampling iterations on average). Observe in Table 1, that this limit case is reached quickly.
4 Experimental Validation
In earlier work, I used the open source Java library to empirically explore the behavior of my BTPE implementation [6]. I wrapped an instance of Java’s SplittableRandom class, which implements the splitmix [20] pseudorandom number generator (PRNG). The purpose was to instrument the PRNG in order to count the number of uniform random variates, , generated (i.e., calls to the nextDouble() method) while generating a binomial random variate. I then supply this wrapped PRNG instance as the source of randomness for ’s implementation of BTPE.
That prior paper [6] considered many combinations of and . Here I focus on the same values of . But, I only focus on two specific cases of , the two cases utilized earlier in the paper: from Section 3.1 and from Section 3.2. For each binomial distribution , I generate 10,000 binomial random variates, and I compute the average number of uniform variates per binomial, with 95% confidence intervals. I compare the experimental mean to the prediction of the number of uniform variates from Equation 2, and test significance with a -test. The experiments used OpenJDK 17 on a Windows 10 PC with a 3.4 GHz AMD A10-5700 CPU and 8 GB RAM. I used 3.1.1. Both the source code for the experiments (https://github.com/cicirello/btpe-iterations), as well as for (https://github.com/cicirello/rho-mu) is maintained on GitHub. The API documentation for the library is available on the web: https://rho-mu.cicirello.org/.
predicted | mean | -value | predicted | mean | -value | |
---|---|---|---|---|---|---|
3.837 | 0.76 | 3.599 | 0.46 | |||
3.792 | 0.80 | 3.337 | 0.89 | |||
3.790 | 0.45 | 2.985 | 0.27 | |||
3.793 | 0.27 | 2.632 | 0.66 | |||
3.796 | 0.99 | 2.420 | 0.41 | |||
3.797 | 0.42 | 2.317 | 0.11 | |||
3.798 | 0.00 | 2.307 | 0.85 | |||
3.798 | 0.72 | 2.276 | 0.98 | |||
3.799 | 0.70 | 2.300 | 0.13 | |||
3.799 | 0.97 | 2.299 | 0.55 | |||
3.799 | 0.73 | 2.300 | 0.08 | |||
3.799 | 0.27 | 2.302 | 0.65 | |||
3.799 | 0.32 | 2.310 | 0.29 | |||
3.799 | 0.46 | 2.310 | 0.73 | |||
3.799 | 0.80 | 2.313 | 0.91 | |||
3.799 | 0.74 | 2.314 | 0.58 |
Table 2 shows the results. The raw and processed data are available on GitHub (https://github.com/cicirello/btpe-iterations). The empirical results confirm the analytical prediction of Equation (2). Observe that there is no significant difference between the analytical prediction and the empirically computed means. T-test -values are above 0.05 in almost all cases (well above in most cases). There is only a single case, , in Table 2 where a -test -value is less than 0.05, seemingly suggesting that the implementation behaves differently than theory predicts in this case. However, this case is explainable by random chance. There are 32 cases represented in Table 2, so this one case is approximately 3% of the cases tested. My more in depth empirical study explored a much larger number of cases experimentally, finding no significant difference between predicted and observed behavior in all but 4% of cases [6]. At significance level 0.05, we should expect false-positives in approximately 5% of cases.
5 Discussion and Conclusions
In this report, I analyzed the runtime behavior of the BTPE algorithm for binomial random variate generation, and observed the following concerning average performance:
-
•
The average number of acceptance–rejection sampling iterations, , and average number of uniform random numbers generated while generating a binomial random variate, , are as follows:
(52) (53) At least one iteration is necessary (the above lower bound), and the above upper bounds derive from the case of minimum supported for , which is the case requiring the maximum number of rejection sampling iterations on average (Table 1).
-
•
For large and the case of , which maximizes the expected number of rejection sampling iterations, we find that the expected number of uniform random numbers required to generate one binomial random variate is a constant:
(54) -
•
For large and the case of , the case that minimizes the expected number of rejection sampling iterations, we find that the expected number of uniform random numbers required to generate one binomial random variate is also a constant:
(55)
Thus, the average runtime of BTPE is , while the average runtime of many of the alternative algorithms for generating binomial random variates is worse than a constant. For example, the average runtime of the inverse transform method is [10, 13].
In this report, I also experimentally validated my limit analysis of Kachitvichyanukul and Schmeiser’s formula, using my implementation of BTPE in the Java library [5]. This validates: (a) Kachitvichyanukul and Schmeiser’s formula for the expected number of rejection sampling iterations, (b) my analysis of that formula, and (c) that ’s implementation of BTPE behaves as theory predicts. One limitation of this study is that it focuses on average case performance. It is possible that BTPE may exhibit many more rejection sampling iterations on occasion. The maximum number of uniform random variates generated while generating a single binomial during the course of my previous experimentation [6] was 38 (19 rejection sampling iterations). Such instances were very rare in that study, which generated approximately 2.88 million binomial random variates.
References
- Ahrens and Dieter [1974] J. H. Ahrens and U. Dieter. Computer methods for sampling from gamma, beta, poisson and bionomial distributions. Computing, 12(3):223–246, 1974. 10.1007/BF02293108.
- Arshad et al. [2019] Habiba Arshad, Muhammad Attique Khan, Muhammad Sharif, Mussarat Yasmin, and Muhammad Younus Javed. Multi-level features fusion and selection for human gait recognition: an optimized framework of bayesian model and binomial distribution. International Journal of Machine Learning and Cybernetics, 10(12):3601–3618, 2019. 10.1007/s13042-019-00947-0.
- Cicirello [2018] Vincent A. Cicirello. JavaPermutationTools: A java library of permutation distance metrics. Journal of Open Source Software, 3(31):950, November 2018. 10.21105/joss.00950.
- Cicirello [2020] Vincent A. Cicirello. Chips-n-salsa: A java library of customizable, hybridizable, iterative, parallel, stochastic, and self-adaptive local search algorithms. Journal of Open Source Software, 5(52):2448, August 2020. 10.21105/joss.02448.
- Cicirello [2022] Vincent A. Cicirello. : A java library of randomization enhancements and other math utilities. Journal of Open Source Software, 7(76):4663, August 2022. 10.21105/joss.04663.
- Cicirello [2023] Vincent A. Cicirello. An analysis of an open source binomial random variate generation algorithm. Engineering Proceedings, 56(1):86, October 2023. 10.3390/ASEC2023-15349.
- Cormen et al. [2022] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, Cambridge, MA, 4th edition, 2022.
- Flury [1990] Bernard D. Flury. Acceptance–rejection sampling made easy. SIAM Review, 32(3):474–476, 1990. 10.1137/1032082.
- García-García et al. [2022] Jaime Israel García-García, Nicolás Alonso Fernández Coronado, Elizabeth H. Arredondo, and Isaac Alejandro Imilpán Rivera. The binomial distribution: Historical origin and evolution of its problem situations. Mathematics, 10(15):2680, 2022. 10.3390/math10152680.
- Kachitvichyanukul and Schmeiser [1988] Voratas Kachitvichyanukul and Bruce W. Schmeiser. Binomial random variate generation. Communications of the ACM, 31(2):216–222, 1988. 10.1145/42372.42381.
- Kachitvichyanukul and Schmeiser [1989] Voratas Kachitvichyanukul and Bruce W. Schmeiser. Algorithm 678: Btpec: Sampling from the binomial distribution. ACM Transactions on Mathematical Software, 15(4):394–397, 1989. 10.1145/76909.76916.
- Khan and Olivier [2019] Manzoor Khan and Jake Olivier. Regression to the mean for the bivariate binomial distribution. Statistics in Medicine, 38(13):2391–2412, 2019. 10.1002/sim.8115.
- Knuth [1998] Donald E. Knuth. The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Addison Wesley, 3rd edition, 1998.
- Kuhl [2017] Michael E. Kuhl. History of random variate generation. In Proceedings of the 2017 Winter Simulation Conference, pages 231–242. IEEE Press, 2017. 10.1109/WSC.2017.8247791.
- Naimi and Whitcomb [2020] Ashley I Naimi and Brian W Whitcomb. Estimating risk ratios and risk differences using regression. American Journal of Epidemiology, 189(6):508–510, 2020. 10.1093/aje/kwaa044.
- Relles [1972] Daniel A. Relles. A simple algorithm for generating binomial random variables when n is large. Journal of the American Statistical Association, 67(339):612–613, 1972. 10.1080/01621459.1972.10481259.
- Shah et al. [2022] Sayed Qaiser Ali Shah, Farrukh Zeeshan Khan, and Muneer Ahmad. Mitigating tcp syn flooding based edos attack in cloud computing environment using binomial distribution in sdn. Computer Communications, 182:198–211, 2022. 10.1016/j.comcom.2021.11.008.
- Singh et al. [2022] Sunny Singh, Muskaan Chawla, Devendra Prasad, Divya Anand, Abdullah Alharbi, and Wael Alosaimi. An improved binomial distribution-based trust management algorithm for remote patient monitoring in wbans. Sustainability, 14(4):2141, 2022. 10.3390/su14042141.
- Steele [2017] Guy Steele. Jep 356: Enhanced pseudo-random number generators. JEP 356, OpenJDK, 2017. URL https://openjdk.org/jeps/356.
- Steele et al. [2014] Guy L. Steele, Doug Lea, and Christine H. Flood. Fast splittable pseudorandom number generators. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, pages 453–472, New York, NY, USA, 2014. Association for Computing Machinery. 10.1145/2660193.2660195.
- Wang and Pei [2019] Guantao Wang and Jingjing Pei. Macro risk: A versatile and universal strategy for measuring the overall safety of hazardous industrial installations in china. International Journal of Environmental Research and Public Health, 16(10):1680, 2019. 10.3390/ijerph16101680.
- Zhang and Lin [2020] Xiaoxian Zhang and Zhifen Lin. Hormesis-induced gap between the guidelines and reality in ecological risk assessment. Chemosphere, 243:125348, 2020. 10.1016/j.chemosphere.2019.125348.