Revisiting Monte Carlo Strength Evaluation

Martin Stanek
Department of Computer Science
Faculty of Mathematics, Physics and Informatics
Comenius University
martin.stanek@fmph.uniba.sk
Abstract

The Monte Carlo method, proposed by Dell’Amico and Filippone, estimates a password’s rank within a probabilistic model for password generation, i.e., it determines the password’s strength according to this model. We propose several ideas to improve the precision or speed of the estimation. Through experimental tests, we demonstrate that improved sampling can yield slightly better precision. Moreover, additional precomputation results in faster estimations with a modest increase in memory usage.
Keywords: Password, Monte Carlo, Strength Evaluation

1 Introduction

Passwords remain a frequently used authentication method, despite numerous initiatives, technologies, and implementations aiming for passwordless authentication. Although the popularity of methods such as Windows Hello, Passkey, and WebAuthn has increased, the security of passwords continues to be a significant topic in many application areas.

Evaluating the strength of a password is useful for providing users with feedback on their chosen passwords. This feedback can assist users in selecting stronger passwords. Often, the strength is calculated as a password’s rank, i.e., how many passwords will be generated by some chosen algorithm until our password is produced. There are various tools that calculate the strength of a password, for example zxcvbn [8], or the password scorer tool in the PCFG cracker [5].

Dell’Amico and Filippone proposed a Monte Carlo algorithm that estimates a password’s rank within a probabilistic model [2]. The algorithm works for any probabilistic password generation model, and the authors proved that the estimated results converge to the actual ranks.

The Monte Carlo estimator is also used to evaluate and compare different probabilistic models for password generation. The original paper compares $n$-gram models [4], the PCFG model using probabilistic context-free grammar [7], and the Backoff model [3]. A recent example of using the Monte Carlo estimator is the evaluation of a password guessing method that employs a random forest [6].

Our contribution.

We propose three ideas for improving the precision or speed of the Monte Carlo estimator. The first idea is to interpolate a password’s rank within the sampled interval to which it belongs, according to its probability. The second idea aims to reduce probability overlap in sampled passwords. Both these ideas, presented in Section 3.2, seek to improve the estimator’s precision. The estimation speed for a password, originally based on binary search, can be enhanced with some additional data computed in advance (the third idea, see Section 3.3). All ideas have been tested experimentally to assess their merit. The results are presented in Section 4. Our experiments demonstrate that improved sampling can yield slightly better precision. However, the effect of interpolation on precision is inconclusive, and we cannot rely on this technique to improve precision.

We utilize the reference implementation of the Monte Carlo estimator, which was published by one of the authors of the original paper on GitHub [1], and we employ the RockYou dataset for our experiments. Given that our focus lies on the estimator itself, the choice of dataset is relatively unimportant.

2 How the Monte Carlo Estimator works

We mostly follow [2] in this section. Let $\Gamma$ be the set of all allowed passwords. A probabilistic password model aims to capture how humans select passwords, assigning higher probabilities to more frequently chosen passwords and lower probabilities to less common ones. Let $p(\alpha)$ denote the probability assigned to password $\alpha$ by the model, such that $\sum_{\alpha\in\Gamma} p(\alpha) = 1$. Different models yield different probability distributions.

When the model is used for an attack, it enumerates passwords in descending order of probability. Therefore, the strength of a password $\alpha$ is the number of passwords with a higher probability:

$$S_p(\alpha) = |\{\beta \in \Gamma;\; p(\beta) > p(\alpha)\}|.$$
Remark.

In this context, the authors do not address the possibility that the model may assign identical probabilities to multiple passwords, resulting in a non-monotonic $p$. The definition of $S_p(\alpha)$ assigns all passwords that share the same probability the lowest rank in their group. This approach can be considered prudent from a security standpoint.
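
The definition and the tie-handling remark can be illustrated on a tiny example. The following is a minimal sketch, assuming a hypothetical four-password model stored as a dictionary (the names `toy_model` and `exact_strength` are illustrative and do not come from [1] or [2]):

```python
def exact_strength(probabilities, password):
    """Count the passwords with strictly higher probability than `password`."""
    p_alpha = probabilities[password]
    return sum(1 for p in probabilities.values() if p > p_alpha)


# Hypothetical toy model over four passwords (probabilities sum to 1).
toy_model = {"123456": 0.5, "password": 0.2, "qwerty": 0.2, "zx9!Kq": 0.1}

for pw in toy_model:
    # The two passwords with probability 0.2 share the lowest rank of their group.
    print(pw, exact_strength(toy_model, pw))
```

For a realistic model the set of passwords is far too large to enumerate like this, which is what motivates the estimator described next.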

Computing the exact value of $S_p(\alpha)$, for a random $\alpha$, has prohibitively large time complexity. The Monte Carlo estimator uses sampling and approximation to provide efficient and sufficiently accurate estimation. It relies on two properties of the underlying model:

  • The model allows for efficiently computing $p(\alpha)$ for any password $\alpha$.

  • There is an efficient sampling method that generates a password according to the model’s distribution.

Precomputation.

The estimator generates a sample $\Theta$ of $n$ passwords (sampling with replacement). The sample $\Theta = \{\beta_1, \ldots, \beta_n\}$ is sorted by descending probability, i.e., $p(\beta_1) \geq \ldots \geq p(\beta_n)$. The cumulative ranks of sampled passwords are calculated as follows:

$$c_i = \frac{1}{n} \sum_{j=1}^{i} \frac{1}{p(\beta_j)} \quad \text{for } i = 1, \ldots, n.$$

The estimator needs to store the probabilities. The cumulative ranks can be easily recomputed. However, both these arrays are usually significantly smaller than the representation of the model, see Section 3.
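
The precomputation can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code from [1]: the Zipf-like `model_probs` array stands in for a trained model, and `precompute` is a hypothetical helper name.

```python
import numpy as np


def precompute(sample_probs):
    """Return the sample sorted by descending probability and the ranks c_i."""
    probs = np.sort(np.asarray(sample_probs, dtype=np.float64))[::-1]
    n = len(probs)
    # c_i = (1/n) * sum_{j <= i} 1 / p(beta_j)
    cum_ranks = np.cumsum(1.0 / probs) / n
    return probs, cum_ranks


rng = np.random.default_rng(0)
# Assumed stand-in for a model: 1000 passwords with Zipf-like probabilities.
model_probs = 1.0 / np.arange(1, 1001)
model_probs /= model_probs.sum()
# Sample n = 10,000 passwords with replacement, according to the model.
sample_idx = rng.choice(len(model_probs), size=10_000, p=model_probs)
probs, cum_ranks = precompute(model_probs[sample_idx])
```

Only the probabilities of the sampled passwords are needed here; the passwords themselves do not have to be kept.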

Remark.

The implementation [1] uses negative $\log_2$ probabilities, i.e., scaling $p(\beta_j)$ to $-\log_2 p(\beta_j)$.

Estimation.

In order to estimate $S_p(\alpha)$ for some password $\alpha$, the probability $p(\alpha)$ is computed first. Then binary search is used to find the largest index $j$ such that $p(\beta_j) > p(\alpha)$. The resulting estimated rank of $\alpha$ is $S_p(\alpha) \approx c_j$. Hence, the time complexity of the estimator is $O(\log n)$.
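
The estimation step can be sketched as follows, working with negative $\log_2$ probabilities as in the remark above. This is a minimal sketch that follows the strict inequality in the text; the function name `estimate_rank`, the convention of returning 0 when no sampled password is more probable, and the four-element sample are assumptions made for illustration.

```python
import numpy as np


def estimate_rank(neg_logs, cum_ranks, neg_log_alpha):
    """Return c_j for the largest j with p(beta_j) > p(alpha), or 0 if none."""
    # neg_logs is ascending, so p(beta_j) > p(alpha) <=> neg_logs[j] < neg_log_alpha.
    # side="left" counts the entries strictly smaller than neg_log_alpha.
    j = np.searchsorted(neg_logs, neg_log_alpha, side="left")
    return float(cum_ranks[j - 1]) if j > 0 else 0.0


# Hypothetical sample of four probabilities, sorted in descending order.
probs = np.array([0.5, 0.25, 0.125, 0.125])
neg_logs = -np.log2(probs)                        # ascending: 1, 2, 3, 3
cum_ranks = np.cumsum(1.0 / probs) / len(probs)   # 0.5, 1.5, 3.5, 5.5

print(estimate_rank(neg_logs, cum_ranks, 2.5))
```

Note that a password with the same probability as a sampled one (here a negative log of 3) receives the rank of the last strictly more probable entry, in line with the definition of $S_p(\alpha)$.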

3 Areas for improvement

3.1 Memory requirement

The RockYou dataset contains more than 14 million unique passwords. The more passwords are used to train a model, the better and more precise results we can expect, such as in our case for password strength estimation. However, there is a point beyond which additional training data provide only negligible improvement, while further increasing the model’s size. Notably, even the set of 10,000 most frequent passwords generates models of substantial size: 3.17 MB for 4-gram, 7.45 MB for 5-gram, 43.5 MB for Backoff, and 0.99 MB for PCFG. An attempt to use up to 10% of the RockYou dataset for training leads to unacceptable model sizes, with the Backoff model being the largest, as shown in Figure 1.

Figure 1: Size of the model reflecting the number of passwords in a training dataset. The graph on the right excludes the Backoff model to show other three models more clearly.

The model defines how passwords are represented, generated, and how their probabilities are calculated. Since these methods are specific for each model, we do not aim to improve the model size. However, the Monte Carlo estimator utilizes additional arrays, where probabilities and ranks of sampled passwords are precomputed. The original paper [2] experiments with various sample sizes up to 100,000 (having “relative error 1%”), but mostly uses the default sample size of 10,000. The default sample size requires 160 kB of memory (real numbers are represented as the numpy.float64 datatype), and it is dominated by the memory required for any model trained on a dataset of reasonable length.
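
The 160 kB figure is consistent with two float64 arrays of 10,000 entries each. The short check below assumes exactly that layout (one array of probabilities, one of cumulative ranks); the variable names are illustrative.

```python
import numpy as np

n = 10_000
probabilities = np.zeros(n, dtype=np.float64)      # 8 bytes per entry
cumulative_ranks = np.zeros(n, dtype=np.float64)   # 8 bytes per entry

total_bytes = probabilities.nbytes + cumulative_ranks.nbytes
print(total_bytes / 1000, "kB")  # 160.0 kB
```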

3.2 Precision

The estimator assigns the same rank $c_j$ to any password $\alpha$ for which the probability falls within the range $p(\beta_j) > p(\alpha) \geq p(\beta_{j+1})$. Intuitively, passwords with distinct probabilities should not get the same numeric estimate. Certainly, this is not an issue when the password strength is presented on a reduced scale using descriptive characteristics like weak, medium, strong, very strong, or using a traffic lights metaphor (red, amber, green).

Idea 1.

Interpolate rank values within intervals using an appropriate function. The most basic approach, without additional parameters, is linear interpolation. This has no impact on memory complexity and a negligible impact on time complexity. Figure 2 shows a graph of password ranks, on a logarithmic scale, for various models and the sample size of 10,000. It appears that linear interpolation on the logarithmic scale should perform well for these models.

Figure 2: Password ranks corresponding to the position in the sample
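
One possible reading of Idea 1 is sketched below: the logarithm of the rank is interpolated linearly between $c_j$ and $c_{j+1}$ as the negative $\log_2$ probability moves from that of $\beta_j$ to that of $\beta_{j+1}$. The exact interpolation formula is an assumption, as are the function name and the sample data.

```python
import numpy as np


def estimate_rank_interpolated(neg_logs, cum_ranks, x):
    """Interpolate log2(rank) linearly between the two neighbouring samples."""
    n = len(neg_logs)
    j = int(np.searchsorted(neg_logs, x, side="left"))  # entries with neg_log < x
    if j == 0:
        return 0.0                      # more probable than every sample
    if j == n:
        return float(cum_ranks[-1])     # less probable than every sample
    lo, hi = neg_logs[j - 1], neg_logs[j]   # lo < x <= hi, hence hi > lo
    w = (x - lo) / (hi - lo)
    log_rank = (1 - w) * np.log2(cum_ranks[j - 1]) + w * np.log2(cum_ranks[j])
    return float(2.0 ** log_rank)


# Hypothetical sample of four probabilities, sorted in descending order.
probs = np.array([0.5, 0.25, 0.125, 0.125])
neg_logs = -np.log2(probs)                        # 1, 2, 3, 3
cum_ranks = np.cumsum(1.0 / probs) / len(probs)   # 0.5, 1.5, 3.5, 5.5

print(estimate_rank_interpolated(neg_logs, cum_ranks, 1.5))
```

Under this reading the interpolated estimate is never below the original estimate $c_j$, so it can only move the result upwards within the interval.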

The precision of the estimator depends on the sample size. More specifically, it depends on the number of unique probabilities in the set $P = \{p(\beta_1), \ldots, p(\beta_n)\}$. We define the overlap of $\Theta$ as the fraction of probability values that are already in the set, and therefore do not contribute to the estimator’s precision: $1 - |P|/n$. Table 1 shows the average overlap for different models and sample sizes. Surprising differences in overlap are observed among different models. An expected increase in overlap is observed with an increasing sample size, since the overlap depends substantially on password probability distribution, given by the model from which the passwords are generated. On the other hand, a larger training dataset results in greater diversity of passwords, leading to slightly lower overlap.
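
The overlap can be computed directly from the sampled probabilities. The sketch below assumes that two probabilities count as identical when their floating-point values are exactly equal; the function name `overlap` is illustrative.

```python
import numpy as np


def overlap(sample_probs):
    """Overlap of a sample: 1 - |P| / n, where P is the set of distinct values."""
    sample_probs = np.asarray(sample_probs)
    return 1.0 - len(np.unique(sample_probs)) / len(sample_probs)


# Five sampled probabilities, three of them distinct.
print(overlap([0.5, 0.25, 0.25, 0.125, 0.125]))
```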

Idea 2.

The estimator will sample random passwords for $\Theta$ until it gets $n$ unique probabilities. It compresses the sample by discarding duplicate probabilities in a way that preserves the cumulative sum of the entry with the largest index. Hence, the rank calculation remains intact, and the overlap of the resulting $\Theta$ is 0. Since the sampling is done in the precomputation phase, it does not impact the estimation time or memory complexity in any way.
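
A possible implementation of Idea 2 is sketched below. It assumes that the cumulative ranks are computed over all drawn passwords before the duplicates are dropped, which is how we read “preserves the cumulative sum of the entry with the largest index”. The callback `draw_probability` and the Zipf-like stand-in model are hypothetical.

```python
import numpy as np


def sample_unique(draw_probability, n):
    """Draw until n distinct probabilities are seen, then compress duplicates."""
    drawn, distinct = [], set()
    while len(distinct) < n:
        p = draw_probability()
        drawn.append(p)
        distinct.add(p)
    probs = np.sort(np.array(drawn))[::-1]       # descending probability
    cum = np.cumsum(1.0 / probs) / len(probs)    # ranks over ALL drawn passwords
    # In every run of equal probabilities keep only the entry with the largest index.
    last_of_run = np.append(probs[1:] != probs[:-1], True)
    return probs[last_of_run], cum[last_of_run]


rng = np.random.default_rng(0)
# Assumed stand-in for a model: 1000 passwords with Zipf-like probabilities.
model_probs = 1.0 / np.arange(1, 1001)
model_probs /= model_probs.sum()
draw = lambda: model_probs[rng.choice(len(model_probs), p=model_probs)]
probs_u, cum_u = sample_unique(draw, n=200)
```

The returned arrays have exactly `n` entries, so the estimation step is unchanged and only the precomputation becomes longer.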

training set   sample size   4-gram   5-gram   Backoff   PCFG
500,000        10,000        13.6%    16.5%    20.4%     44.6%
500,000        30,000        20.6%    25.8%    31.6%     60.8%
500,000        50,000        24.8%    30.5%    37.3%     67.1%
1,000,000      10,000        12.4%    14.7%    17.2%     43.1%
1,000,000      30,000        19.1%    22.4%    27.6%     58.5%
1,000,000      50,000        22.4%    26.7%    33.2%     64.8%
Table 1: Overlap percentage for different models and sample sizes. Models are trained on 500,000 and 1,000,000 passwords using the most frequent passwords from the RockYou dataset. Every number is an average of 3 experiments.

Table 2 shows how many passwords must be sampled using a trained model to achieve the target size of the sample with distinct probabilities.

target         sampled passwords
sample size    4-gram   5-gram   Backoff   PCFG
10,000         11,689   12,239   12,894    23,483
30,000         38,795   42,032   47,178    122,865
50,000         68,358   75,953   90,123    258,489
Table 2: Average number of sampled passwords required to achieve the desired sample size with distinct probabilities. Models are trained on 500,000 passwords using the most frequent passwords from the RockYou dataset. Every number is an average of 10 experiments, rounded to the nearest integer.

3.3 Estimation speed

The binary search employed in the original estimator is fast enough for assessing individual passwords. However, when the estimator is used to evaluate or compare different models and their variants, the ranks of a large number of passwords need to be estimated. An optimization can be relevant in these scenarios.

Idea 3.

Divide the interval of possible probability values $p(\beta_i)$, in our case expressed as negative $\log_2$ values, into $t$ intervals (bins): $[0, \tau_1)$, $[\tau_1, \tau_2)$, …, $[\tau_{t-1}, \infty)$, where $0 < \tau_1 < \ldots < \tau_{t-1}$. For each interval, we calculate minimal and maximal look-up indices that narrow the interval for binary search (we use $\tau_0 = 0$ in the following equations):

$$\text{LU}_{\text{min}}(i) = \max\{1 \leq j \leq n \mid -\log_2 p(\beta_j) \geq \tau_{i-1}\},$$
$$\text{LU}_{\text{max}}(i) = \min\{1 \leq j \leq n \mid -\log_2 p(\beta_j) < \tau_{i-1}\}, \quad \text{for } 1 \leq i \leq t.$$

The estimator is adapted accordingly. Given a password $\alpha$, we calculate an appropriate interval such that $-\log_2 p(\alpha) \in [\tau_{i-1}, \tau_i)$. Then, the binary search is performed within the set of indices $\{\text{LU}_{\text{min}}(i), \ldots, \text{LU}_{\text{max}}(i)\}$, instead of the full set $\{1, \ldots, n\}$. We expect to narrow the interval for the binary search substantially, so the benefit of fewer comparisons will be measurable. Trivially, the precision of the estimator remains unchanged.

The price paid is the cost of computing the $\text{LU}_{\text{min}}$ and $\text{LU}_{\text{max}}$ arrays, which is a simple one-time precomputation, and the small amount of memory needed to store these arrays in the estimator. For example, 100 intervals “cost” approximately 7.8 kB, even with a wasteful representation using Python’s int objects for stored indices and lists for the arrays.
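
The look-up idea can be sketched for bins of unit width. The sketch stores one array of half-open index bounds per bin boundary instead of separate $\text{LU}_{\text{min}}$ and $\text{LU}_{\text{max}}$ arrays, so it is an assumed variant rather than a transcription of the definitions above; the function names and the synthetic sample are hypothetical.

```python
import numpy as np


def build_lookup(neg_logs, bins=100):
    """bounds[i] = number of sampled values below i, for unit-width bins."""
    edges = np.arange(bins + 1)
    bounds = np.searchsorted(neg_logs, edges, side="left")
    bounds[-1] = len(neg_logs)          # the last bin is unbounded above
    return bounds


def estimate_rank_binned(neg_logs, cum_ranks, bounds, x):
    """Binary search restricted to the slice that belongs to the bin of x."""
    t = len(bounds) - 1
    i = int(min(x, t - 1))              # bin index; also handles x = inf
    lo, hi = bounds[i], bounds[i + 1]
    j = lo + np.searchsorted(neg_logs[lo:hi], x, side="left")
    return float(cum_ranks[j - 1]) if j > 0 else 0.0


rng = np.random.default_rng(1)
# Synthetic sample: 10,000 negative log2 probabilities, sorted ascending.
neg_logs = np.sort(rng.uniform(0.0, 120.0, size=10_000))
cum_ranks = np.cumsum(2.0 ** neg_logs) / len(neg_logs)
bounds = build_lookup(neg_logs)

print(estimate_rank_binned(neg_logs, cum_ranks, bounds, 42.5))
```

Because every value before `bounds[i]` is below the bin and every value from `bounds[i + 1]` on is above it, the restricted search returns the same index as a search over the whole array.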

4 Experiments

We implement the ideas presented in the previous section and present the results of our experiments.

4.1 Precision

The ideas aimed at improving precision apply to the Monte Carlo estimator, regardless of the underlying model. We do not attempt to modify the models. For example, if password -1-1-1-1 is assigned inf (Python’s float('inf') value) as its negative $\log_2$ probability in the PCFG model, because the pattern is outside of the trained grammar, we do not try to “fix this”. Moreover, we do not compare the performance of the models to each other.

We assess the impact of our ideas on the real ranks of passwords generated by the models. Similarly to the original paper [2], we generate all passwords up to some probability threshold. The rank of a password is its position in the list sorted by the probabilities assigned by the model to the passwords.

The first experiment uses the PCFG model trained on 10 million passwords from the RockYou dataset. The threshold for password generation was set at 20, i.e., all passwords with probability at least $2^{-20}$ were generated; there were 91,693 passwords in this dataset (let us denote it $T$). We consider various combinations of proposed ideas:

  • original – a reference implementation of the estimator [1];

  • interpolation – interpolate rank calculation within the interval between two adjacent probabilities (Idea 1);

  • sampling – improved sampling with $n$ unique probabilities (Idea 2);

  • all – a combination of interpolation and sampling.

Let $\text{rr}(\alpha)$ denote the real rank of password $\alpha \in T$, and let $\text{er}(\alpha)$ denote the rank estimated by a particular variant of the estimator. The weighted error of the estimator on the password set $T$ is calculated as follows:

$$\sum_{\alpha \in T} p(\alpha)\, |\text{er}(\alpha) - \text{rr}(\alpha)|.$$

The weighted error assumes that the estimators are used to assess passwords chosen by humans, following the original distribution. We also consider a simple error for completeness:

$$\frac{1}{|T|} \sum_{\alpha \in T} |\text{er}(\alpha) - \text{rr}(\alpha)|.$$

variant         weighted error   simple error
original        16.54            101.11
interpolation   15.33            90.63
sampling        11.79            70.63
all             10.86            63.10
Table 3: Weighted and simple errors of various estimator variants. Every number is an average of 100 experiments.
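
Both error measures are straightforward to compute once the real and estimated ranks are available as arrays. The following sketch uses made-up values for three passwords purely to show the arithmetic; the function names are illustrative.

```python
import numpy as np


def weighted_error(p, er, rr):
    """Sum over T of p(alpha) * |er(alpha) - rr(alpha)|."""
    p, er, rr = (np.asarray(a, dtype=np.float64) for a in (p, er, rr))
    return float(np.sum(p * np.abs(er - rr)))


def simple_error(er, rr):
    """Mean over T of |er(alpha) - rr(alpha)|."""
    er, rr = np.asarray(er, dtype=np.float64), np.asarray(rr, dtype=np.float64)
    return float(np.mean(np.abs(er - rr)))


# Made-up probabilities, real ranks and estimated ranks for three passwords.
p = [0.5, 0.3, 0.2]
rr = [0, 1, 2]
er = [0.0, 1.5, 4.0]

print(weighted_error(p, er, rr), simple_error(er, rr))
```

The weighted error discounts mistakes on improbable passwords, which is why it is much smaller than the simple error in Table 3.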

Table 3 shows the results of our experiment. We performed 100 experiments. A word of warning: the reported errors are sensitive to the particular password distribution sampled into $\Theta$. Unsurprisingly, the sampling (Idea 2) helps to reduce estimation errors in general. The situation with interpolation (Idea 1) is mixed, with a substantial fraction of experiments showing worse statistics. The reason is that the interpolation makes the error worse when passwords in $\Theta$ already “overshoot” their true ranks. Taking the same rank without interpolation compensates for this. Therefore, interpolation cannot be recommended for improving the precision of the estimator. On the other hand, it helps with the “same rank” problem, when different passwords are assigned the same rank by the estimator.

Figure 3 compares visually the original variant with the “all” variant. It illustrates the simple difference of calculated rank and estimated rank. It also shows the relative error of the estimators. As expected, based on the convergence proof in [2], the relative error is rather small in both cases.

Figure 3: The simple difference error (on the left) and the relative error (on the right) of the original and the "all" strategy. Both graphs display results for 50,000 of the most probable passwords from the PCFG model. The numbers are the average values from 100 experiments.

4.2 Estimation speed

We tested two configurations: the first one with 100 intervals (bins), and the second with 1,000 intervals. Negative log probabilities are divided into fixed intervals $[0, 1)$, $[1, 2)$, …, $[99, \infty)$ for the first case, and into $[0, 0.1)$, $[0.1, 0.2)$, …, $[99.9, \infty)$ for the second case. Both configurations were tested with four different sizes of $\Theta$. Table 4 shows the relative speed of different variants with respect to the baseline, which is the original algorithm with $|\Theta| = 10000$; values below 1.00 indicate faster estimation than the baseline. The results confirm a moderate speed-up for 100 intervals and a substantial speed-up for 1,000 intervals.

               Estimation performance
sample size    original   100 bins   1000 bins
10,000         1.00       0.92       0.37
30,000         1.08       1.00       0.39
50,000         1.13       1.04       0.40
100,000        1.20       1.10       0.40
Table 4: Average relative estimation performance, where the baseline 1.00 is the estimation performance of the original binary search for the sample size 10,000. The experiment uses $10^6$ passwords randomly generated by the PCFG model. Every number is an average of 10 experiments, rounded to two decimal places.

5 Additional observation and conclusion

Since passwords in $\Theta$ are generated according to their probability, with a sufficiently large sample size, we expect that for some $k$, the top-$k$ most probable passwords will be in the correct order at the beginning of $\Theta$. Therefore, simply reporting the order of these top-$k$ passwords by the estimator can be beneficial to the precision. Figure 4 illustrates this phenomenon for the PCFG model and the sample size of 10,000, where approximately the top 180 passwords have the exact rank as their position in $\Theta$. However, further down the precision quickly deteriorates.

An interesting question is if we can improve the estimator’s precision by compensating for unusually large or small jumps (differences) between adjacent probabilities in the sampled passwords.

Figure 4: Comparison of the original estimate of a password’s rank and the estimate according to the position in sampled passwords (denoted as fixed).

An area outside this paper that deserves further focus is the precision of the estimator for low-probability passwords. The estimator’s precision worsens for passwords with high ranks. A potential approach might use different or additional sampling methods that focus on less probable passwords, so that we can cover this part of the probability space better.

References