Kullback-Leibler divergence (Kullback 1951) is an information-based measure of disparity among probability distributions. Given distributions P and Q defined over X, with Q absolutely continuous with respect to P, the Kullback-Leibler divergence of Q from P is the P-expectation of \(-\log_2(Q/P)\). So, \(D_{KL}(P, Q) = -\int_X \log_2\big(Q(x)/P(x)\big)\, dP\). This quantity can be seen as the difference between the cross-entropy of Q relative to P, \(H(P, Q) = -\int_X \log_2\big(Q(x)\big)\, dP\), and the self-entropy (Shannon 1948) of P, \(H(P) = H(P, P) = -\int_X \log_2\big(P(x)\big)\, dP\). Since \(H(P, Q)\) is the P-expectation of the number of bits of information needed to identify points in X when they are encoded according to Q, \(D_{KL}(P, Q) = H(P, Q) - H(P)\) is the expected difference, from the perspective of P, between the information encoded in Q and the information encoded in P.
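As a concrete illustration, the sketch below checks the decomposition \(D_{KL}(P, Q) = H(P, Q) - H(P)\) numerically for two small discrete distributions over a three-point space; the particular values chosen for P and Q are made up for the example and are not drawn from the original entry.

```python
import numpy as np

# Hypothetical discrete distributions P and Q over a three-point space X
# (values chosen only for illustration).
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])

# Kullback-Leibler divergence of Q from P, in bits:
# D_KL(P, Q) = -sum_x P(x) * log2(Q(x) / P(x))
d_kl = -np.sum(P * np.log2(Q / P))

# Cross-entropy of Q relative to P, and self-entropy of P, both in bits.
cross_entropy = -np.sum(P * np.log2(Q))   # H(P, Q)
self_entropy = -np.sum(P * np.log2(P))    # H(P) = H(P, P)

# The decomposition D_KL(P, Q) = H(P, Q) - H(P) holds (up to rounding).
assert np.isclose(d_kl, cross_entropy - self_entropy)
print(round(d_kl, 4))  # approximately 0.0365 bits for these values
```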
\(D_{KL}\) has a number of features that make it plausible as a measure of probabilistic divergence. Here are some of its...