Probabilistic/Randomized Algorithms

Ming-Hwa Wang, Ph.D.
COEN 279/AMTH 377 Design and Analysis of Algorithms
Department of Computer Engineering
Santa Clara University

Probabilistic or Randomized Algorithms
At least once during the algorithm, a random number is used to make a decision instead of spending time to work out which alternative is best.
The worst-case running time of a randomized algorithm is almost always the same as the worst-case running time of the non-randomized algorithm. A good randomized algorithm has no bad inputs, only bad random numbers.
The random numbers are important: we obtain an expected running time, where we now average over all possible random numbers instead of over all possible inputs. Equivalently, it is the mean time it would take to solve the same instance over and over again. An expected running time bound is somewhat stronger than an average-case bound, but weaker than the corresponding worst-case bound.
A randomized algorithm may run quickly but occasionally make an error. The probability of error can, however, be made negligibly small, and in many cases any purported solution can be verified efficiently for correctness.
A randomized algorithm may give probabilistic answers that are not necessarily exact. The same algorithm may behave differently when it is applied twice to the same instance: its execution time, and even the result obtained, may vary considerably from one use to the next. If the algorithm gets stuck (e.g., a core dump), simply restart it on the same instance for a fresh chance of success. If there is more than one correct answer, several different ones may be obtained by running the probabilistic algorithm more than once.

Numerical Probabilistic Algorithms
For certain real-life problems, computing an exact solution is not possible even in principle, e.g., because of uncertainties in the experimental data, or because digital computers handle only binary values. For other problems, a precise answer exists but it would take too long to figure it out exactly. Numerical algorithms yield a confidence interval, and the expected precision improves as the time available to the algorithm increases. The error is usually inversely proportional to the square root of the amount of work performed.
Buffon's Needle: throw a needle at random on a floor made of planks of constant width. If the needle is exactly half as long as the planks are wide, and if the width of the cracks between the planks is zero, the probability that the needle falls across a crack is $1/\pi$. More generally, the probability that a randomly thrown needle falls across a crack is $2\lambda/(\omega\pi)$, where $\lambda$ is the needle length and $\omega$ is the plank width ($\lambda \le \omega$). With enough throws, the resulting estimate of $\pi$ will be between $\pi - \varepsilon$ and $\pi + \varepsilon$ with probability at least $p$ (the desired reliability).
Numerical integration (Monte Carlo integration): deterministic integration algorithms are easily fooled, and they are very expensive when evaluating a multiple integral. Hybrid techniques that are partly systematic and partly probabilistic are called quasi Monte Carlo integration.

Random Number Generators
True randomness is virtually impossible to achieve on a computer, so pseudorandom numbers are used instead. What is really needed is a sequence of numbers that appear to be independent.
If the modulus M of the linear congruential generator $x_{i+1} = A x_i \bmod M$ is chosen to be a large 31-bit prime, the period should be long enough for most applications, e.g., $M = 2^{31} - 1 = 2{,}147{,}483{,}647$ and $A = 48{,}271$.
With a fixed seed the same sequence occurs every time, which makes debugging easy; for real runs, input the seed (e.g., from the system clock). Usually a random real number in the open interval (0, 1) is wanted, which can be obtained by dividing by M.
Multiplication overflow prevention: let $Q = \lfloor M / A \rfloor = 44{,}488$ and $R = M \bmod A = 3{,}399$. Then
$$x_{i+1} = A x_i \bmod M = A(x_i \bmod Q) - R\lfloor x_i / Q \rfloor + M\,\delta(x_i),$$
where $\delta(x_i) = \lfloor x_i / Q \rfloor - \lfloor A x_i / M \rfloor$ is 1 iff the remaining terms evaluate to less than zero, and 0 otherwise. Since $M = AQ + R$, i.e. $AQ - M = -R$, this follows from
$$
\begin{aligned}
x_{i+1} = A x_i \bmod M &= A x_i - M\lfloor A x_i / M \rfloor \\
&= A x_i - M\lfloor x_i / Q \rfloor + M\lfloor x_i / Q \rfloor - M\lfloor A x_i / M \rfloor \\
&= A x_i - M\lfloor x_i / Q \rfloor + M\,\delta(x_i) \\
&= A\bigl(Q\lfloor x_i / Q \rfloor + x_i \bmod Q\bigr) - M\lfloor x_i / Q \rfloor + M\,\delta(x_i) \\
&= (AQ - M)\lfloor x_i / Q \rfloor + A(x_i \bmod Q) + M\,\delta(x_i) \\
&= A(x_i \bmod Q) - R\lfloor x_i / Q \rfloor + M\,\delta(x_i).
\end{aligned}
$$
Because $R < Q$, both $A(x_i \bmod Q)$ and $R\lfloor x_i / Q \rfloor$ are less than M, so every intermediate value fits in a 32-bit signed integer.
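A minimal Python sketch of the generator with $M = 2^{31} - 1$ and $A = 48{,}271$, using the Q/R decomposition above. Python integers never overflow, so the decomposition is not actually required in Python; the sketch only mirrors what a 32-bit implementation would compute and can be checked against the direct product. The names `next_state` and `LCG` are hypothetical.

```python
M = 2_147_483_647   # 2**31 - 1, a prime modulus
A = 48_271          # multiplier
Q = M // A          # 44_488
R = M % A           # 3_399


def next_state(x: int) -> int:
    """One step x -> A*x mod M computed without forming the full product A*x.

    Both A*(x % Q) and R*(x // Q) stay below M, mirroring what a 32-bit
    signed implementation would compute.
    """
    t = A * (x % Q) - R * (x // Q)
    return t if t >= 0 else t + M   # adding M is the delta(x) = 1 case


class LCG:
    """Hypothetical wrapper: x_{i+1} = A * x_i mod M with seed 1 <= x_0 < M."""

    def __init__(self, seed: int = 1) -> None:
        if not 1 <= seed < M:
            raise ValueError("seed must satisfy 1 <= seed < M")
        self.state = seed

    def random_int(self) -> int:
        self.state = next_state(self.state)
        return self.state

    def random_real(self) -> float:
        """A pseudorandom real in the open interval (0, 1)."""
        return self.random_int() / M
```

Seeding with 1 makes the first value 48,271, since $A \cdot 1 \bmod M = A$. These parameters are the same as those of C++'s `std::minstd_rand`, which gives an independent reference sequence to compare against.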
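Buffon's needle experiment described earlier can be simulated directly. The sketch below assumes plank width 2 and needle length 1 (so the crossing probability is $1/\pi$) and estimates $\pi$ as throws divided by crossings. To avoid using $\pi$ inside the simulation, the needle's direction is drawn by rejection sampling from the unit quarter disc; `buffon_estimate_pi` is a hypothetical name.

```python
import math
import random


def buffon_estimate_pi(throws: int, rng: random.Random) -> float:
    """Estimate pi by throwing a needle whose length is half the plank width.

    Units (an assumption for the sketch): plank width 2, needle length 1,
    so P(needle crosses a crack) = 2*1 / (2*pi) = 1/pi.
    """
    hits = 0
    for _ in range(throws):
        # Distance from the needle's centre to the nearest crack: uniform on [0, 1].
        y = rng.random()
        # Random needle direction, drawn by rejection from the unit quarter disc
        # so that the simulation itself never uses the value of pi.
        while True:
            u, v = rng.random(), rng.random()
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = v / math.sqrt(r2)   # sine of the angle between needle and cracks
        # The needle crosses a crack when its half-length projection reaches it.
        if y <= 0.5 * sin_theta:
            hits += 1
    if hits == 0:
        raise ValueError("no crossings observed; throw more needles")
    return throws / hits   # hits / throws estimates 1/pi
```

Quadrupling the number of throws only halves the typical error, which is the $1/\sqrt{\text{work}}$ behavior mentioned above.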
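For Monte Carlo integration, a plain sketch (assuming, for simplicity, that the domain is the unit cube $[0,1]^d$) averages the integrand at uniform random points and also reports the standard error, whose $1/\sqrt{n}$ decay does not depend on the dimension. `mc_integrate` is a hypothetical name, and no quasi Monte Carlo refinement is attempted.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def mc_integrate(f: Callable[[Sequence[float]], float], dim: int,
                 samples: int, rng: random.Random) -> Tuple[float, float]:
    """Plain Monte Carlo estimate of the integral of f over [0, 1]^dim.

    Returns (estimate, standard_error); the standard error shrinks like
    1/sqrt(samples) for any dimension.
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(samples):
        point = [rng.random() for _ in range(dim)]   # uniform random point in the cube
        y = f(point)
        total += y
        total_sq += y * y
    mean = total / samples                           # the cube has volume 1
    variance = max(total_sq / samples - mean * mean, 0.0)
    return mean, math.sqrt(variance / samples)
```

For example, `mc_integrate(lambda p: p[0] * p[1] * p[2], 3, 100_000, rng)` estimates $\int_{[0,1]^3} xyz \, dx\,dy\,dz = 1/8$.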
The linear congruential generator: $x_{i+1} = A x_i \bmod M$, where $x_0$ is the seed and $1 \le x_0 < M$. If M is prime, $x_i$ is never 0. After at most $M - 1$ numbers the sequence repeats, so the best possible period is $M - 1$; some choices of A give a period shorter than $M - 1$.

Skip Lists
Every $2^i$th node has a pointer to the node $2^i$ ahead of it. The total number of pointers has only doubled, but now at most $\lceil \lg N \rceil$ nodes are examined during a search. The search consists of either advancing to a new node or dropping to a lower pointer in the same node.
A level-k node is a node that has k pointers; the ith pointer in any level-k node ($k \ge i$) points to the next node with at least i levels. Roughly half the nodes are level-1 nodes, roughly a quarter are level-2 nodes, and, in general, approximately $1/2^i$ of the nodes are level i. We choose the level of each node randomly.
Find: start at the highest pointer at the header and traverse along this level until we find that the next node is larger than the one we are looking for (or nil). When this occurs, go to the next lower level and continue the strategy. When progress is stopped at level 1, either we are in front of the node we are looking for, or it is not in the list.
Insert: proceed as in a Find, and keep track of each point where we switch to a lower level. The new node, whose level is determined randomly, is then spliced into the list. The expected cost is $O(\lg N)$.
Skip lists need an estimate of the number of elements that will be in the list to determine the number of levels. Nodes of different levels need different type declarations.

Probabilistic Counting
Counting twice as far, up to $2^{n+1} - 2$ with an n-bit register: initialize the register to 0. Each time tick is called, flip a fair coin; if it comes up heads, add 1 to the register, otherwise do nothing. When count is called, return twice the value stored in the register.
Counting exponentially farther, from 0 to $2^{2^n - 1} - 1$: keep in the register an estimate c of the logarithm of the actual number of ticks, and let count return $2^c - 1$. This keeps the relative error, instead of the absolute error, under control.

Monte Carlo Algorithms
Monte Carlo algorithms give the exact answer with high probability whatever the instance considered, although sometimes they provide a wrong answer. Generally you cannot tell whether the answer is correct, but you can reduce the error probability arbitrarily by allowing the algorithm more time (amplifying the stochastic advantage). A Monte Carlo algorithm is p-correct if it returns a correct answer with probability at least p ($0 < p < 1$), whatever the instance considered. p may depend on the instance size but not on the instance itself.

Verifying Matrix Multiplication
The straightforward matrix multiplication algorithm takes $\Theta(n^3)$ time; Strassen's algorithm takes $O(n^{2.81})$, and its asymptotically faster descendants about $O(n^{2.37})$.
Let $D = AB - C$, $S \subseteq \{1, 2, \ldots, n\}$, and let $S(D)$ denote the vector of length n obtained by adding pointwise the rows of D indexed by the elements of S. If $AB = C$, then $S(D)$ is always 0. Otherwise, let i be an integer such that the ith row of D contains at least one nonzero element; for S chosen uniformly at random, the probability that $S(D) \ne 0$ is at least one-half. (Pair each S with the set obtained by adding or removing i: the two sums differ by row i of D, which is nonzero, so at least one of each pair is nonzero.)
Let X be a binary vector of length n such that $X_j = 1$ if $j \in S$ and $X_j = 0$ otherwise. Then $S(D) = XD$, so we want to verify whether $XAB = XC$, where computing $(XA)B$ takes only $\Theta(n^2)$ time. Getting the answer false just once allows you to conclude that $AB \ne C$. The probability that k successive calls each return the wrong answer is at most $2^{-k}$, so the repeated test is $(1 - 2^{-k})$-correct. Alternatively, a Monte Carlo algorithm can be given an explicit upper bound $\varepsilon$ on the tolerable error probability, which here gives a running time in $\Theta(n^2 \lg \varepsilon^{-1})$.

Primality Testing
Trial division takes $O(2^{d/2})$ time to test whether a d-digit (binary) number is prime.
Randomized polynomial-time algorithm: if the algorithm declares that the number is not prime, then it is certainly not prime. If the algorithm declares that the number is prime, then with high probability, but not with 100% certainty, the number is prime.
Fermat's Lesser Theorem: if P is prime and $0 < A < P$, then $A^{P-1} \equiv 1 \pmod{P}$.
Pick $1 < A < N - 1$ at random. If $A^{N-1} \equiv 1 \pmod{N}$, declare that N is probably prime; otherwise declare that N is definitely not prime.
False witnesses of primality: Carmichael numbers are not prime but satisfy $A^{N-1} \equiv 1 \pmod{N}$ for all $0 < A < N$ that are relatively prime to N.
If P is prime and $0 < X < P$, the only solutions to $X^2 \equiv 1 \pmod{P}$ are $X = 1$ and $X = P - 1$.

Las Vegas Algorithms
Las Vegas algorithms make probabilistic choices to help guide them more quickly to a correct solution; they never return a wrong answer. There are two main categories of Las Vegas algorithms: those that take longer to solve a problem when unfortunate choices are made (e.g., Quicksort), and those that allow themselves to go into a dead end and admit that they cannot find a solution in this run of the algorithm.
A Las Vegas algorithm has the Robin Hood effect: with high probability, instances that took a long time deterministically are now solved much faster, but instances on which the deterministic algorithm was particularly good are slowed down to average.
Let p(x) be the probability that a run of the algorithm LV succeeds on instance x. If every run took the same time, the expected number of runs until success would be $1/p(x)$. However, a correct analysis must consider separately the expected time taken by LV(x) in case of success, s(x), and in case of failure, f(x):
$$t(x) = s(x) + \frac{1 - p(x)}{p(x)}\, f(x),$$
which follows from $t(x) = p(x)\,s(x) + \bigl(1 - p(x)\bigr)\bigl(f(x) + t(x)\bigr)$.

The Eight Queens Problem
Combine backtracking with a probabilistic algorithm: first place a number of queens on the board in a random way, and then use backtracking to try to add the remaining queens without reconsidering the positions of the queens that were placed randomly. The more queens we place randomly, the smaller the average time needed by the subsequent backtracking stage, whether it fails or succeeds, but the greater the probability of failure.
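The skip list Find and Insert procedures described above can be sketched in Python as follows. The number of levels is derived from an estimate of the list size, as the notes suggest, and each node's level is chosen by repeated fair coin flips. Class and method names are hypothetical, and a single node class with a variable-length pointer list replaces the per-level type declarations a statically typed language would need.

```python
import math
import random
from typing import Iterator, List, Optional


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key, level: int) -> None:
        self.key = key
        # forward[i] is the pointer at level i + 1: the next node with at least i + 1 levels
        self.forward: List[Optional["_Node"]] = [None] * level


class SkipList:
    def __init__(self, expected_size: int = 1024, rng: Optional[random.Random] = None) -> None:
        # The number of levels comes from an estimate of the list size (about lg N).
        self.max_level = max(1, math.ceil(math.log2(max(2, expected_size))))
        self.header = _Node(None, self.max_level)
        self.level = 1                      # highest level currently in use
        self.rng = rng if rng is not None else random.Random()

    def _random_level(self) -> int:
        # Keep flipping a fair coin: level i is chosen with probability about 1/2**i.
        level = 1
        while level < self.max_level and self.rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key) -> List[_Node]:
        # Find: start at the highest pointer in the header, advance while the next
        # key is smaller, and drop one level when the next node is too large or nil.
        update = [self.header] * self.max_level
        node = self.header
        for i in range(self.level - 1, -1, -1):
            nxt = node.forward[i]
            while nxt is not None and nxt.key < key:
                node, nxt = nxt, nxt.forward[i]
            update[i] = node                # where the search switched to a lower level
        return update

    def find(self, key) -> bool:
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def insert(self, key) -> None:
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return                          # key already present
        level = self._random_level()
        self.level = max(self.level, level) # header pointers cover any new levels
        node = _Node(key, level)
        for i in range(level):              # splice the new node into each level
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def __iter__(self) -> Iterator:
        node = self.header.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```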
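Both probabilistic counters can be sketched as small classes. The notes do not say how tick updates the logarithmic register; the sketch assumes the usual Morris-style rule of incrementing c with probability $2^{-c}$, which makes $2^c - 1$ an unbiased estimate of the number of ticks. Register saturation models the n-bit limit; class names are hypothetical.

```python
import random


class CoinCounter:
    """Counts up to 2**(n+1) - 2 with an n-bit register by counting only the heads."""

    def __init__(self, n_bits: int, rng: random.Random) -> None:
        self.limit = 2 ** n_bits - 1      # largest value an n-bit register holds
        self.c = 0
        self.rng = rng

    def tick(self) -> None:
        # Flip a fair coin; on heads add 1 to the register (saturating at the limit).
        if self.rng.random() < 0.5 and self.c < self.limit:
            self.c += 1

    def count(self) -> int:
        return 2 * self.c


class LogCounter:
    """Keeps c, an estimate of lg(ticks); count() returns 2**c - 1, at most 2**(2**n - 1) - 1."""

    def __init__(self, n_bits: int, rng: random.Random) -> None:
        self.limit = 2 ** n_bits - 1
        self.c = 0
        self.rng = rng

    def tick(self) -> None:
        # Assumed Morris-style rule: increment c with probability 2**-c, which
        # makes 2**c - 1 an unbiased estimate of the number of ticks.
        if self.c < self.limit and self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def count(self) -> int:
        return 2 ** self.c - 1
```

A single `LogCounter` can be far off in absolute terms, but its error stays proportional to the count, which is the relative-error control the notes describe.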
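The matrix-product verification above (often called Freivalds' technique) fits in a few lines. The sketch assumes square integer matrices stored as lists of rows; `row_times` and `verify_product` are hypothetical names. Each trial draws a random binary vector X and compares $(XA)B$ with $XC$ in $\Theta(n^2)$ time.

```python
import random
from typing import List

Matrix = List[List[int]]


def row_times(x: List[int], m: Matrix) -> List[int]:
    """Row vector times square matrix in Theta(n^2) time."""
    n = len(m)
    out = [0] * n
    for xi, row in zip(x, m):
        if xi:                              # X is binary, so this adds the selected rows
            for j in range(n):
                out[j] += xi * row[j]
    return out


def verify_product(a: Matrix, b: Matrix, c: Matrix, k: int, rng: random.Random) -> bool:
    """Monte Carlo check of AB == C. A False answer is always right; a True
    answer is wrong with probability at most 2**-k."""
    n = len(a)
    for _ in range(k):
        x = [rng.randint(0, 1) for _ in range(n)]   # X_j = 1 iff j is in the random set S
        if row_times(row_times(x, a), b) != row_times(x, c):
            return False                            # XAB != XC, so AB != C for certain
        # Otherwise this trial saw S(D) = 0; try a fresh S.
    return True
```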
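The two facts above can be combined into one randomized test: compute $A^{N-1} \bmod N$ by repeated squaring, and if any squaring produces 1 from a value other than 1 or $N-1$, N cannot be prime. The sketch below is one such Fermat test strengthened by the square-root check, with hypothetical names `_witness` and `is_probably_prime`; a 0 return from `_witness` is used as a "composite proved" flag, which is safe because $A^i \bmod N$ is never 0 when N is prime and $1 < A < N - 1$.

```python
import random


def _witness(a: int, i: int, n: int) -> int:
    """Return a**i mod n, or 0 if a nontrivial square root of 1 mod n turns up."""
    if i == 0:
        return 1
    x = _witness(a, i // 2, n)
    if x == 0:
        return 0                    # compositeness already proved deeper down
    y = (x * x) % n
    if y == 1 and x != 1 and x != n - 1:
        return 0                    # X^2 = 1 (mod n) with X != 1, n - 1: n is not prime
    if i % 2 == 1:
        y = (y * a) % n
    return y


def is_probably_prime(n: int, trials: int, rng: random.Random) -> bool:
    """False means certainly composite; True means prime with high probability."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    for _ in range(trials):
        a = rng.randint(2, n - 2)   # pick 1 < A < N - 1 at random
        if _witness(a, n - 1, n) != 1:
            return False            # definitely not prime
    return True                     # probably prime
```

A plain Fermat test would be fooled by every base coprime to a Carmichael number such as 561; the square-root check is what catches such numbers here.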
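The Eight Queens Las Vegas strategy can be sketched as one run that places the first `random_rows` queens at random safe squares, row by row, and then backtracks over the remaining rows without moving them; a run that hits a dead end reports failure and is simply repeated. Representing a board as a list of column positions per row, and all function names, are assumptions of this sketch.

```python
import random
from typing import List, Optional


def _safe(cols: List[int], col: int) -> bool:
    """Can a queen go in the next row at this column?"""
    row = len(cols)
    return all(c != col and abs(c - col) != row - r for r, c in enumerate(cols))


def _backtrack(cols: List[int], n: int) -> bool:
    if len(cols) == n:
        return True
    for col in range(n):
        if _safe(cols, col):
            cols.append(col)
            if _backtrack(cols, n):
                return True
            cols.pop()
    return False


def queens_lv(n: int, random_rows: int, rng: random.Random) -> Optional[List[int]]:
    """One Las Vegas run; returns column positions by row, or None on failure."""
    cols: List[int] = []
    for _ in range(random_rows):
        free = [c for c in range(n) if _safe(cols, c)]
        if not free:
            return None                         # dead end: admit failure for this run
        cols.append(rng.choice(free))
    # The randomly placed queens are never reconsidered by the backtracking stage.
    return cols if _backtrack(cols, n) else None


def solve_queens(n: int, random_rows: int, rng: random.Random) -> List[int]:
    while True:                                 # repeat runs until one succeeds
        solution = queens_lv(n, random_rows, rng)
        if solution is not None:
            return solution
```

With `random_rows = 0` this is plain backtracking, and with `random_rows = n` it is a purely random placement; values in between trade a cheaper backtracking stage against a higher failure rate per run.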
The number of queens placed at random is thus the fine-tuning knob of this algorithm.

Probabilistic Quickselect and Quicksort

Universal Hashing
Las Vegas hashing allows us to retain the efficiency of hashing on average, without arbitrarily favoring some programs at the expense of others. Choose the hash function randomly at the beginning of each compilation, and again whenever rehashing becomes necessary; this ensures that collision lists remain reasonably well balanced with high probability.
Universal hashing: let $U = \{0, 1, \ldots, a-1\}$ be the universe of potential indexes for the associative table, and let $B = \{0, 1, \ldots, N-1\}$ be the set of indexes in the hash table. Let H be a set of functions from U to B, and let $h: U \to B$ be a function chosen randomly from H. H is a universal$_2$ class of hash functions if, for any two distinct x and y in U, the probability that $h(x) = h(y)$ is at most $1/N$.
Let p be a prime number at least as large as a, and let i and j be two integers with $1 \le i < p$ and $0 \le j < p$. Then $h_{ij}(x) = ((ix + j) \bmod p) \bmod N$, and the class $H = \{h_{ij}\}$ is universal$_2$.

Factorizing Large Integers
The factorization problem consists of finding the unique decomposition of n into a product of prime factors. Splitting consists of finding one nontrivial divisor of n, provided n is composite. Factorizing reduces to splitting and primality testing.
An integer is k-smooth if all its prime divisors are among the k smallest prime numbers. k-smooth integers can be factorized efficiently by trial division if k is small. A hard composite number is the product of two primes of roughly equal size.
Let n be a composite integer, and let a and b be distinct integers between 1 and $n - 1$ such that $a + b \ne n$. If $a^2 \bmod n = b^2 \bmod n$, then $\gcd(a + b, n)$ is a nontrivial divisor of n. (Indeed, n divides $(a - b)(a + b)$ but divides neither factor.)
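The $h_{ij}$ family can be sketched directly, together with a small chained table that draws a fresh random $h_{ij}$ whenever it rehashes. The prime $p = 2^{31} - 1$, the doubling rehash policy at load factor 1, and all names (`h_ij`, `random_universal_hash`, `ChainedTable`) are assumptions of this sketch; keys are assumed to lie in $0 \le key < p$.

```python
import random
from typing import Callable, List


def h_ij(i: int, j: int, p: int, n_buckets: int, x: int) -> int:
    """The member h_ij(x) = ((i*x + j) % p) % N of the universal_2 class."""
    return ((i * x + j) % p) % n_buckets


def random_universal_hash(p: int, n_buckets: int, rng: random.Random) -> Callable[[int], int]:
    """Choose h_ij at random: 1 <= i < p and 0 <= j < p (p must be a prime >= a)."""
    i = rng.randrange(1, p)
    j = rng.randrange(0, p)
    return lambda x: h_ij(i, j, p, n_buckets, x)


class ChainedTable:
    """Hypothetical chained hash table that redraws its hash function on every rehash."""

    P = 2_147_483_647   # a prime; keys are assumed to lie in 0 <= key < P

    def __init__(self, rng: random.Random, n_buckets: int = 8) -> None:
        self.rng = rng
        self._reset(n_buckets)

    def _reset(self, n_buckets: int) -> None:
        self.n_buckets = n_buckets
        self.h = random_universal_hash(self.P, n_buckets, self.rng)   # fresh random choice
        self.table: List[List[int]] = [[] for _ in range(n_buckets)]
        self.size = 0

    def insert(self, key: int) -> None:
        if not 0 <= key < self.P:
            raise ValueError("key outside the universe U")
        chain = self.table[self.h(key)]
        if key in chain:
            return
        chain.append(key)
        self.size += 1
        if self.size > self.n_buckets:          # load factor above 1: rehash
            old_keys = [k for bucket in self.table for k in bucket]
            self._reset(2 * self.n_buckets)     # bigger table and a new random h
            for k in old_keys:
                self.insert(k)

    def contains(self, key: int) -> bool:
        return key in self.table[self.h(key)]
```

Because a new function is drawn at random, no fixed set of keys is bad for the table; only an unlucky draw can be, and its effect does not persist past the next rehash.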
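The splitting rule based on $a^2 \equiv b^2 \pmod n$ can be checked with a short sketch. The notes do not say how a and b are found, so the hypothetical `random_split` below uses a deliberately naive search: draw random a until $a^2 \bmod n$ happens to be a perfect square $b^2$. This only works for small n; practical methods such as Dixon's random squares algorithm build the congruence from k-smooth values instead.

```python
import math
import random
from typing import Optional


def split_from_squares(n: int, a: int, b: int) -> Optional[int]:
    """If a^2 = b^2 (mod n) with a != b and a + b != n, return gcd(a + b, n)."""
    if not (1 <= a < n and 1 <= b < n) or a == b or a + b == n:
        return None
    if (a * a - b * b) % n != 0:
        return None
    return math.gcd(a + b, n)       # a nontrivial divisor of n by the argument above


def random_split(n: int, tries: int, rng: random.Random) -> Optional[int]:
    """Toy splitter for small composite n: look for a^2 mod n that is a perfect square."""
    for _ in range(tries):
        a = rng.randrange(1, n)
        r = a * a % n
        b = math.isqrt(r)
        if b * b == r:
            d = split_from_squares(n, a, b)
            if d is not None:
                return d
    return None
```

For instance, with $n = 91$: $10^2 = 100 \equiv 9 = 3^2 \pmod{91}$, and $\gcd(10 + 3, 91) = 13$.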