In this writing, I consider the KL divergence with measure theoretic approach.
Suppose \(P\) and \(Q\) are probability measures on a measurable space \((\Omega, \mathcal{F})\), and \(P\ll Q\). That is, \(P\) is absolutely continuous with respect to \(Q\). (Or, equivalently, \(Q\) dominates \(P\))
KL divergence is defined as below:
\[\begin{aligned} \mathcal{D}_{KL}(P||Q)&:=E_P\left[\log \frac{dP}{dQ}\right] \\ &= \int_{\Omega}\log \frac{dP}{dQ} dP \end{aligned}\]
If we have density functions for \(P\) and \(Q\) with respect to Lebesgue measure:
\[\begin{aligned} \frac{dP}{d\lambda}&=f \\ \frac{dQ}{d\lambda} &=g \end{aligned}\]
then the KL divergence can be expressed in the familiar form.
\[\begin{aligned} \mathcal{D}_{KL}(P||Q) &= \int_{\Omega}\log \frac{dP}{dQ} dP \\ &= \int_{\Omega}\log \frac{\frac{dP}{d\lambda}}{\frac{dQ}{d\lambda}} \frac{dP}{d\lambda}d\lambda \\ &= \int_{\Omega} \left(\log \frac{f}{g}\right) f d\lambda \end{aligned} \]
Just to be clear, \(dP=P(d\omega)\), \(dQ=Q(d\omega)\), and \(d\lambda=\lambda(d\omega)\).
Since the Lebesgue measure is defined over a real interval, for the last expression, \(\Omega=\mathbb{R}\cap[\text{some interval}]\).
'통계 > 확률론 및 수리통계학' 카테고리의 다른 글
Measure theoretic description of change-of-variables (0) | 2025.01.01 |
---|---|
Metric vs distance measure (0) | 2024.01.26 |
Is a pdf a probability measure? (0) | 2024.01.24 |
Covariance matrix의 다양한 이름들 & Autocorrelation matrix (0) | 2024.01.17 |
확률 P와 기대값의 부등식 (0) | 2023.12.25 |