
Conditional probability

From Wikipedia, the free encyclopedia



Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability
is written P(A|B), and is read "the (conditional) probability of A, given B" or "the probability of A under the condition B". When
in a random experiment the event B is known to have occurred, the possible outcomes of the experiment are reduced to B, and
hence the probability of the occurrence of A is changed from the unconditional probability into the conditional probability given
B.

Joint probability is the probability of two events in conjunction. That is, it is the probability of both events together. The joint
probability of A and B is written P(A ∩ B) or P(A, B).

Marginal probability is then the unconditional probability P(A) of the event A; that is, the probability of A, regardless of
whether event B did or did not occur. If B can be thought of as the event of a random variable X having a given outcome, the
marginal probability of A can be obtained by summing (or integrating, more generally) the joint probabilities over all outcomes
for X. For example, if there are two possible outcomes for X with corresponding events B and B′, this means that
P(A) = P(A ∩ B) + P(A ∩ B′). This is called marginalization.

In these definitions, note that there need not be a causal or temporal relation between A and B. A may precede B or vice versa or
they may happen at the same time. A may cause B or vice versa or they may have no causal relation at all. Notice, however, that
causal and temporal relations are informal notions, not belonging to the probabilistic framework. They may apply in some
examples, depending on the interpretation given to events.

Conditioning of probabilities, i.e. updating them to take account of (possibly new) information, may be achieved through Bayes'
theorem. In such conditioning, the probability of A given only initial information I, P(A|I), is known as the prior probability. The
updated conditional probability of A, given I and the outcome of the event B, is known as the posterior probability, P(A|B,I).

Introduction

Consider the simple scenario of rolling two fair six-sided dice, labelled die 1 and die 2. Define the following three events (not
assumed to occur simultaneously):

A: Die 1 lands on 3.
B: Die 2 lands on 1.
C: The dice sum to 8.

The prior probability of each event describes how likely the outcome is before the dice are rolled, without any knowledge of the
roll's outcome. For example, die 1 is equally likely to fall on each of its 6 sides, so P(A) = 1/6. Similarly P(B) = 1/6. Likewise, of
the 6 × 6 = 36 possible ways that a pair of dice can land, just 5 result in a sum of 8 (namely 2 and 6, 3 and 5, 4 and 4, 5 and 3, 6
and 2), so P(C) = 5/36.

Some of these events can both occur at the same time; for example events A and C can happen at the same time, in the case where
die 1 lands on 3 and die 2 lands on 5. This is the only one of the 36 outcomes where both A and C occur, so its probability is 1/36.
The probability of both A and C occurring is called the joint probability of A and C and is written P(A ∩ C), so
P(A ∩ C) = 1/36. On the other hand, if die 2 lands on 1, the dice cannot sum to 8, so P(B ∩ C) = 0.

Now suppose we roll the dice and cover up die 2, so we can only see die 1, and observe that die 1 landed on 3. Given this partial
information, the probability that the dice sum to 8 is no longer 5/36; instead it is 1/6, since die 2 must land on 5 to achieve this
result. This is called the conditional probability, because it is the probability of C under the condition that A is observed, and is
written P(C | A), which is read "the probability of C given A." Similarly, P(C | B) = 0, since if we observe die 2 landed on 1, we
already know the dice can't sum to 8, regardless of what the other die landed on.
On the other hand, if we roll the dice and cover up die 2, and observe die 1, this has no impact on the probability of event B,
which only depends on die 2. We say events A and B are statistically independent or just independent, and in this case

P(B | A) = P(B) = 1/6.

In other words, the probability of B occurring after observing that die 1 landed on 3 is the same as before we observed die 1.

Intersection events and conditional events are related by the formula:

P(C | A) = P(C ∩ A) / P(A).

In this example, we have:

P(C | A) = (1/36) / (1/6) = 1/6.

As noted above, P(B | A) = P(B), so by this formula:

P(B ∩ A) / P(A) = P(B).

On multiplying across by P(A),

P(B ∩ A) = P(B) P(A).

In other words, if two events are independent, their joint probability is the product of the prior probabilities of each event
occurring by itself.
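
Because the sample space here has only 36 equally likely outcomes, every probability in this section can be checked by brute-force enumeration. The following Python sketch (illustrative only; the helper names are my own) computes the prior, joint, and conditional probabilities discussed above.

```python
from itertools import product
from fractions import Fraction

# Enumerate all 36 equally likely outcomes (die 1, die 2).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond(event, given):
    """Conditional probability P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

A = lambda o: o[0] == 3          # die 1 lands on 3
B = lambda o: o[1] == 1          # die 2 lands on 1
C = lambda o: o[0] + o[1] == 8   # the dice sum to 8

print(prob(A), prob(B), prob(C))        # 1/6 1/6 5/36
print(prob(lambda o: A(o) and C(o)))    # joint P(A and C) = 1/36
print(cond(C, A))                        # P(C | A) = 1/6
print(cond(C, B))                        # P(C | B) = 0
print(cond(B, A), prob(B))               # independence: P(B | A) = P(B) = 1/6
```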

Definition

Given a probability space (Ω, F, P) and two events A, B ∈ F with P(B) > 0, the conditional probability of A given B is defined by

P(A | B) = P(A ∩ B) / P(B).

If P(B) = 0 then P(A | B) is undefined (see Borel–Kolmogorov paradox for an explanation). However it is possible to define a
conditional probability with respect to a σ-algebra of such events (such as those arising from a continuous random variable).

For example, if X and Y are non-degenerate and jointly continuous random variables with density ƒX,Y(x, y) then, if B has positive
measure,

P(X ∈ A | Y ∈ B) = ∫_B ∫_A ƒX,Y(x, y) dx dy / ∫_B ∫_Ω ƒX,Y(x, y) dx dy.

The case where B has zero measure can only be dealt with directly in the case that B = {y0}, representing a single point, in which
case

P(X ∈ A | Y = y0) = ∫_A ƒX,Y(x, y0) dx / ∫_Ω ƒX,Y(x, y0) dx.

If A has measure zero then the conditional probability is zero. An indication of why the more general case of zero measure cannot
be dealt with in a similar way can be seen by noting that the limit, as all δyi approach zero, of

P(X ∈ A | Y ∈ ∪i [yi, yi + δyi]) ≈ Σi ∫_A ƒX,Y(x, yi) δyi dx / Σi ∫_Ω ƒX,Y(x, yi) δyi dx,

depends on their relationship as they approach zero. See conditional expectation for more information.
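
As a concrete numerical sketch of the single-point case (an illustration of my own, not from the article), take X and Y to be jointly continuous with an explicit density, here a standard bivariate normal with correlation ρ = 0.5, and approximate the integrals on a grid:

```python
import numpy as np

# Joint density of a bivariate normal: zero means, unit variances, correlation rho.
rho = 0.5
def f_xy(x, y):
    norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
    expo = -(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2))
    return norm * np.exp(expo)

# P(X in A | Y = y0) = integral over A of f(x, y0) dx / integral over R of f(x, y0) dx
y0 = 1.0
x = np.linspace(-10, 10, 20001)
fx = f_xy(x, y0)

A = x > 0.5                                    # the event A = {X > 0.5}
p_cond = np.trapz(fx[A], x[A]) / np.trapz(fx, x)
print(p_cond)   # approx 0.5, since X given Y = 1 is N(0.5, 0.75) for this density
```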

Derivation

The following derivation is taken from Grinstead and Snell's Introduction to Probability.[1]

Let Ω be a sample space with elementary events {ω} and probability measure P. Suppose the event E ⊆ Ω has occurred and an
altered probability P({ω} | E) is to be assigned to the elementary events {ω} to reflect the fact that E has occurred. (In the
following we will omit the curly brackets.)

For all ω ∉ E we want to make sure that the intuitive result P(ω | E) = 0 is true.

Also, without further information provided, we can be certain that the relative magnitude of probabilities is conserved: for all
ω1, ω2 ∈ E,

P(ω1 | E) / P(ω2 | E) = P(ω1) / P(ω2).

This requirement leads us to state:

P(ω | E) = α P(ω) for all ω ∈ E,

where α is a positive real constant or scaling factor chosen to reflect the above requirement.

Since we know E has occurred, we can state P(E) > 0, and normalization of the new probabilities requires:

1 = Σ_{ω ∈ E} P(ω | E) = Σ_{ω ∈ E} α P(ω) = α P(E).

Hence

α = 1 / P(E), so that P(ω | E) = P(ω) / P(E) for all ω ∈ E.

For another event F this leads to:

P(F | E) = Σ_{ω ∈ F ∩ E} P(ω | E) = P(F ∩ E) / P(E).

Results

From the definition

P(A ∩ B) = P(A | B) P(B),

the partition rule

P(A) = Σn P(A | Bn) P(Bn), where the events {Bn} form a countable partition of the sample space,

may be derived. From here the expanded form of Bayes' theorem can be found:

P(Bi | A) = P(A | Bi) P(Bi) / Σn P(A | Bn) P(Bn).
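
A minimal numeric sketch of these two identities (the partition and the numbers are arbitrary choices of my own, used only to show the arithmetic):

```python
# Priors P(B_n) for a three-event partition, and likelihoods P(A | B_n).
priors = [0.5, 0.3, 0.2]        # must sum to 1
likelihoods = [0.9, 0.4, 0.1]   # P(A | B1), P(A | B2), P(A | B3)

# Partition rule: P(A) = sum over n of P(A | B_n) P(B_n)
p_a = sum(l * p for l, p in zip(likelihoods, priors))

# Expanded Bayes' theorem: P(B1 | A) = P(A | B1) P(B1) / P(A)
p_b1_given_a = likelihoods[0] * priors[0] / p_a

print(p_a)            # 0.59
print(p_b1_given_a)   # approx 0.763
```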

Statistical independence

Two random events A and B are statistically independent if and only if

P(A ∩ B) = P(A) P(B).

Thus, if A and B are independent, then their joint probability can be expressed as a simple product of their individual
probabilities.

Equivalently, for two independent events A and B with non-zero probabilities,

P(A | B) = P(A)

and

P(B | A) = P(B).

In other words, if A and B are independent, then the conditional probability of A, given B is simply the individual probability of A
alone; likewise, the probability of B given A is simply the probability of B alone.

The conditional probability fallacy

The conditional probability fallacy is the assumption that P(A|B) is approximately equal to P(B|A). The mathematician John Allen
Paulos discusses this in his book Innumeracy,[2] where he points out that it is a mistake often made even by doctors, lawyers, and
other highly educated non-statisticians. It can be overcome by describing the data in actual numbers rather than probabilities.

The relation between P(A|B) and P(B|A) is given by Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B).

In other words, one can only assume that P(A|B) is approximately equal to P(B|A) if the prior probabilities P(A) and P(B) are also
approximately equal.

An example

In the following constructed but realistic situation, the difference between P(A|B) and P(B|A) may be surprising, but is at the same
time obvious.

In order to identify individuals having a serious disease in an early curable form, one may consider screening a large group of
people. While the benefits are obvious, an argument against such screenings is the disturbance caused by false positive screening
results: If a person not having the disease is incorrectly found to have it by the initial test, they will most likely be distressed, and
even if they subsequently take a more careful test and are told they are well, their lives may still be affected negatively. If they
undertake unnecessary treatment for the disease, they may be harmed by the treatment's side effects and costs.

The magnitude of this problem is best understood in terms of conditional probabilities.

Suppose 1% of the group suffer from the disease, and the rest are well. Choosing an individual at random,

P(ill) = 1% = 0.01 and P(well) = 99% = 0.99.

Suppose that when the screening test is applied to a person not having the disease, there is a 1% chance of getting a false positive
result (and hence 99% chance of getting a true negative result), i.e.

P(positive | well) = 1%, and P(negative | well) = 99%.

Finally, suppose that when the test is applied to a person having the disease, there is a 1% chance of a false negative result (and
99% chance of getting a true positive result), i.e.

P(negative | ill) = 1% and P(positive | ill) = 99%.

Calculations

The fraction of individuals in the whole group who are well and test negative (true negative):

P(well ∩ negative) = P(well) × P(negative | well) = 0.99 × 0.99 = 0.9801.

The fraction of individuals in the whole group who are ill and test positive (true positive):

P(ill ∩ positive) = P(ill) × P(positive | ill) = 0.01 × 0.99 = 0.0099.

The fraction of individuals in the whole group who have false positive results:

P(well ∩ positive) = P(well) × P(positive | well) = 0.99 × 0.01 = 0.0099.

The fraction of individuals in the whole group who have false negative results:

P(ill ∩ negative) = P(ill) × P(negative | ill) = 0.01 × 0.01 = 0.0001.

Furthermore, the fraction of individuals in the whole group who test positive:

P(positive) = P(ill ∩ positive) + P(well ∩ positive) = 0.0099 + 0.0099 = 0.0198.

Finally, the probability that an individual actually has the disease, given that the test result is positive:

P(ill | positive) = P(ill ∩ positive) / P(positive) = 0.0099 / 0.0198 = 0.5.
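
These fractions are straightforward to reproduce; the short Python sketch below (illustrative, with variable names of my own choosing) recomputes them from the three assumed 1% figures.

```python
# Assumed rates from the example above.
p_ill = 0.01
p_well = 1 - p_ill
p_pos_given_well = 0.01    # false positive rate
p_neg_given_ill = 0.01     # false negative rate

# Whole-group (joint) fractions.
true_negative  = p_well * (1 - p_pos_given_well)   # approx 0.9801
true_positive  = p_ill * (1 - p_neg_given_ill)     # approx 0.0099
false_positive = p_well * p_pos_given_well         # approx 0.0099
false_negative = p_ill * p_neg_given_ill           # approx 0.0001

# Total fraction testing positive, and the posterior P(ill | positive).
p_positive = true_positive + false_positive        # approx 0.0198
p_ill_given_positive = true_positive / p_positive  # 0.5

print(p_positive, p_ill_given_positive)
```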

Conclusion

In this example, it should be easy to appreciate the difference between the conditional probabilities P(positive | ill), which with the
assumed probabilities is 99%, and P(ill | positive), which is 50%: the first is the probability that an individual who has the disease
tests positive; the second is the probability that an individual who tests positive actually has the disease. With the numbers chosen
here, the last result is likely to be deemed unacceptable: half the people testing positive are actually false positives.

Second type of conditional probability fallacy

Another type of fallacy is interpreting conditional probabilities of events (or of a series of events) as (unconditional) probabilities, or
seeing them as being of the same order of magnitude. A conditional probability of an event and its (total) probability are linked
with each other through the formula of total probability, but without additional information one of them says little about the
other. The fallacy of viewing P(A|B) as P(A), or as being close to P(A), is often associated with some forms of statistical bias, but it
can be subtle.

Here is an example: one of the conditions for the legendary wild-west hero Wyatt Earp to have become a legend was that he
survived all the duels he fought. Indeed, it is reported that he was never wounded, not even scratched by a bullet. The probability
of this happening is very small, contributing to his fame, because events of very small probability attract attention. However, the
point is that the degree of attention depends very much on the observer. Somebody impressed by a specific event (here seeing a
"hero") is prone to view the effects of randomness differently from others who are less impressed.

In general it does not make much sense to ask, after observing a remarkable series of events, "What is the probability of
this?"; this is a conditional probability based upon that observation. The distinction between conditional and unconditional
probabilities can be intricate if the observer who asks "What is the probability?" is himself or herself an outcome of a random
selection. The name "Wyatt Earp effect" was coined in the article "Der Wyatt Earp Effekt" (in German), which shows through several
examples its subtlety and impact in various scientific domains.

Conditioning on a random variable

There is also a concept of the conditional probability of an event given a discrete random variable. Such a conditional probability
is a random variable in its own right.

Suppose X is a random variable that can be equal either to 0 or to 1. As above, one may speak of the conditional probability of
any event A given the event X = 0, and also of the conditional probability of A given the event X = 1. The former is denoted P(A|
X = 0) and the latter P(A|X = 1). Now define a new random variable Y, whose value is P(A|X = 0) if X = 0 and P(A|X = 1) if X = 1.
That is,

Y = P(A | X = 0) if X = 0, and Y = P(A | X = 1) if X = 1.

This new random variable Y is said to be the conditional probability of the event A given the discrete random variable X, written P(A | X).
According to the "law of total probability", the expected value of Y is just the marginal (or "unconditional") probability of A.
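
A tiny worked instance (my own, not from the article): roll one fair die, let X be 1 if the roll is even and 0 if it is odd, and let A be the event that the roll exceeds 3. The sketch below computes Y = P(A | X) for each value of X and checks that E[Y] = P(A).

```python
from fractions import Fraction

die = range(1, 7)                       # one fair six-sided die
A = lambda r: r > 3                     # event A: the roll exceeds 3
X = lambda r: 1 if r % 2 == 0 else 0    # discrete random variable X

def prob(pred):
    return Fraction(sum(1 for r in die if pred(r)), 6)

# Conditional probabilities P(A | X = x) for x = 0, 1.
p_a_given_x = {
    x: prob(lambda r: A(r) and X(r) == x) / prob(lambda r: X(r) == x)
    for x in (0, 1)
}
print(p_a_given_x)   # {0: Fraction(1, 3), 1: Fraction(2, 3)}

# E[Y] = sum over x of P(A | X = x) P(X = x) equals the unconditional P(A).
e_y = sum(p_a_given_x[x] * prob(lambda r: X(r) == x) for x in (0, 1))
print(e_y, prob(A))  # both print 1/2
```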

More generally still, it is possible to speak of the conditional probability of an event given a sigma-algebra. See conditional
expectation.

_______________________________________________________________________________________________________

Queueing theory

Queueing theory is the mathematical study of waiting lines, or queues. The theory enables mathematical analysis of several
related processes, including arriving at the (back of the) queue, waiting in the queue (essentially a storage process), and being
served at the front of the queue. The theory permits the derivation and calculation of several performance measures including the
average waiting time in the queue or the system, the expected number waiting or receiving service, and the probability of
encountering the system in certain states, such as empty, full, having an available server or having to wait a certain time to be
served.

Queueing theory has applications in diverse fields, [1] including telecommunications[2], traffic engineering, computing[3] and the
design of factories, shops, offices and hospitals.[4]

Overview

The word queue comes, via French, from the Latin cauda, meaning tail. The spelling "queueing" over "queuing" is typically
encountered in the academic research field. In fact, one of the flagship journals of the profession is named "Queueing Systems".

Queueing theory is generally considered a branch of operations research because the results are often used when making business
decisions about the resources needed to provide service. It is applicable in a wide variety of situations that may be encountered in
business, commerce, industry, healthcare,[5] public service and engineering. Applications are frequently encountered in customer
service situations as well as transport and telecommunication. Queueing theory is directly applicable to intelligent transportation
systems, call centers, PABXs, networks, telecommunications, server queueing, mainframe computer queueing of
telecommunications terminals, advanced telecommunications systems, and traffic flow.

Notation for describing the characteristics of a queueing model was first suggested by David G. Kendall in 1953. Kendall's
notation introduced an A/B/C queueing notation that can be found in all standard modern works on queueing theory, for example,
Tijms.[6]

The A/B/C notation designates a queueing system having A as interarrival time distribution, B as service time distribution, and C
as number of servers. For example, "G/D/1" would indicate a General (may be anything) arrival process, a Deterministic
(constant time) service process and a single server. More details on this notation are given in the article about queueing models.

History

Agner Krarup Erlang, a Danish engineer who worked for the Copenhagen Telephone Exchange, published the first paper on
queueing theory in 1909.[7]

David G. Kendall introduced an A/B/C queueing notation in 1953. Important work on queueing theory used in modern packet
switching networks was performed in the early 1960s by Leonard Kleinrock.

Application to telephony

The public switched telephone network (PSTN) is designed to accommodate the offered traffic intensity with only a small loss.
The performance of loss systems is quantified by their grade of service, driven by the assumption that if sufficient capacity is not
available, the call is refused and lost.[8] Alternatively, overflow systems make use of alternative routes to divert calls via different
paths — even these systems have a finite traffic carrying capacity. [8]
However, the use of queueing in PSTNs allows the systems to queue their customers' requests until free resources become
available. This means that if traffic intensity levels exceed available capacity, customers' calls are not lost; customers instead wait
until they can be served.[9] This method is used in queueing customers for the next available operator.

A queueing discipline determines the manner in which the exchange handles calls from customers.[9] It defines the way they will
be served, the order in which they are served, and the way in which resources are divided among the customers.[9][10] Here are
details of four queueing disciplines (a brief sketch contrasting the first two follows the list):

First in first out
    This principle states that customers are served one at a time and that the customer that has been waiting the longest is
    served first.[10]
Last in first out
    This principle also serves customers one at a time; however, the customer with the shortest waiting time will be served
    first.[10] Also known as a stack.
Processor sharing
    Customers are served equally. Network capacity is shared between customers and they all effectively experience the
    same delay.[10]
Priority
    Customers with high priority are served first.[10]
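
As a small illustration of how the first two disciplines differ (a simplified sketch with made-up customers, not part of the original article), the following Python fragment serves the same three waiting customers under FIFO and under LIFO:

```python
from collections import deque

# Customers already waiting, in order of arrival (earliest first).
arrivals = ["customer 1", "customer 2", "customer 3"]

# First in, first out: serve from the front of the queue.
fifo = deque(arrivals)
fifo_order = [fifo.popleft() for _ in range(len(arrivals))]

# Last in, first out: serve from the top of the stack.
lifo = list(arrivals)
lifo_order = [lifo.pop() for _ in range(len(arrivals))]

print(fifo_order)  # ['customer 1', 'customer 2', 'customer 3']
print(lifo_order)  # ['customer 3', 'customer 2', 'customer 1']
```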

Queueing is handled by control processes within exchanges, which can be modelled using state equations. [9][10] Queueing systems
use a particular form of state equations known as a Markov chain that models the system in each state.[9] Incoming traffic to these
systems is modelled via a Poisson distribution and is subject to Erlang’s queueing theory assumptions viz. [8]

 Pure-chance traffic – Call arrivals and departures are random and independent events. [8]
 Statistical equilibrium – Probabilities within the system do not change.[8]
 Full availability – All incoming traffic can be routed to any other customer within the network. [8]
 Congestion is cleared as soon as servers are free.[8]

Classic queueing theory involves complex calculations to determine waiting time, service time, server utilization and other
metrics that are used to measure queueing performance. [9][10]
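
As one standard example of such calculations (a minimal sketch using the classical M/M/1 single-server formulas, which are not derived in this article; the numeric rates are arbitrary choices of my own):

```python
# Classical M/M/1 steady-state formulas (Poisson arrivals, exponential service, one server).
lam = 0.8   # arrival rate (customers per unit time)
mu = 1.0    # service rate (customers per unit time); stability requires lam < mu

rho = lam / mu            # server utilisation
L = rho / (1 - rho)       # mean number of customers in the system
W = 1 / (mu - lam)        # mean time spent in the system (waiting + service)
Wq = rho / (mu - lam)     # mean time spent waiting in the queue
p0 = 1 - rho              # probability the system is empty

print(rho, L, W, Wq, p0)  # approximately 0.8, 4.0, 5.0, 4.0, 0.2
```

Little's law (L = λW) can be checked directly from these numbers: 0.8 × 5.0 = 4.0.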

Queueing networks

Networks of queues are systems which contain an arbitrary, but finite, number m of queues. Customers, sometimes of different
classes,[11] travel through the network and are served at the nodes. The state of a network can be described by a vector
(k1, k2, …, km), where ki is the number of customers at queue i. In open networks, customers can join and leave the system,
whereas in closed networks the total number of customers within the system remains fixed.

The first significant result in the area was Jackson networks, for which an efficient product form equilibrium distribution exists.

Role of Poisson process, exponential distributions

A useful queueing model represents a real-life system with sufficient accuracy and is analytically tractable. A queueing model
based on the Poisson process and its companion exponential probability distribution often meets these two requirements. A
Poisson process models random events (such as a customer arrival, a request for action from a web server, or the completion of
the actions requested of a web server) as emanating from a memoryless process. That is, the length of the time interval from the
current time to the occurrence of the next event does not depend upon the time of occurrence of the last event. In the Poisson
probability distribution, the observer records the number of events that occur in a time interval of fixed length. In the (negative)
exponential probability distribution, the observer records the length of the time interval between consecutive events. In both, the
underlying physical process is memoryless.
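
A short simulation (my own illustration, not from the article) makes the connection concrete: drawing exponential inter-arrival times and counting how many events land in fixed-length windows yields Poisson-distributed counts.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 2.0          # average number of events per unit time
horizon = 10000.0   # total simulated time

# Exponential inter-arrival times define a memoryless (Poisson) arrival process.
gaps = rng.exponential(scale=1.0 / rate, size=int(rate * horizon * 2))
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times < horizon]

# Count the events falling in each unit-length window.
counts = np.bincount(arrival_times.astype(int), minlength=int(horizon))

# For a Poisson distribution the mean and variance of the window counts
# should both be close to `rate`.
print(counts.mean(), counts.var())
```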

Models based on the Poisson process often respond to inputs from the environment in a manner that mimics the response of the
system being modeled to those same inputs. The analytically tractable models that result yield both information about the system
being modeled and the form of their solution. Even a queueing model based on the Poisson process that does a relatively poor job
of mimicking detailed system performance can be useful. The fact that such models often give "worst-case" scenario evaluations
appeals to system designers who prefer to include a safety factor in their designs. Also, the form of the solution of models based
on the Poisson process often provides insight into the form of the solution to a queueing problem whose detailed behavior is
poorly mimicked. As a result, queueing systems are frequently modeled as Poisson processes through the use of the exponential
distribution.

Limitations of queueing theory

The assumptions of classical queueing theory may be too restrictive to be able to model real-world situations exactly. The
complexity of production lines with product-specific characteristics cannot be handled with those models. Therefore specialized
tools have been developed to simulate, analyze, visualize and optimize time dynamic queueing line behavior.

For example, the mathematical models often assume infinite numbers of customers, infinite queue capacity, or no bounds on
inter-arrival or service times, when it is quite apparent that these bounds must exist in reality. Often, although the bounds do
exist, they can be safely ignored because the differences between the real world and theory are not statistically significant, as the
probability that such boundary situations might occur is remote compared to the expected normal situation. Furthermore, several
studies show the robustness of queueing models outside their assumptions. In other cases the theoretical solution may either
prove intractable or insufficiently informative to be useful.

Alternative means of analysis have thus been devised in order to provide some insight into problems that do not fall under the
scope of queueing theory, although they are often scenario-specific because they generally consist of computer simulations or
analysis of experimental data. See network traffic simulation.
