Measure-theoretic Treatment of Statistics
posted: 01-Aug-2025 & updated: 03-Aug-2025
\[% \newcommand{\algA}{\algk{A}} \newcommand{\algC}{\algk{C}} \newcommand{\bigtimes}{\times} \newcommand{\compl}[1]{\tilde{#1}} \newcommand{\complexes}{\mathbb{C}} \newcommand{\dom}{\mathop{\bf dom {}}} \newcommand{\ereals}{\reals\cup\{-\infty,\infty\}} \newcommand{\field}{\mathbb{F}} \newcommand{\integers}{\mathbb{Z}} \newcommand{\lbdseqk}[1]{\seqk{\lambda}{#1}} \newcommand{\meas}[3]{({#1}, {#2}, {#3})} \newcommand{\measu}[2]{({#1}, {#2})} \newcommand{\meast}[3]{\left({#1}, {#2}, {#3}\right)} \newcommand{\naturals}{\mathbb{N}} \newcommand{\nuseqk}[1]{\seqk{\nu}{#1}} \newcommand{\pair}[2]{\langle {#1}, {#2}\rangle} \newcommand{\rationals}{\mathbb{Q}} \newcommand{\reals}{\mathbb{R}} \newcommand{\seq}[1]{\left\langle{#1}\right\rangle} \newcommand{\powerset}{\mathcal{P}} \newcommand{\pprealk}[1]{\reals_{++}^{#1}} \newcommand{\ppreals}{\mathbb{R}_{++}} \newcommand{\prealk}[1]{\reals_{+}^{#1}} \newcommand{\preals}{\mathbb{R}_+} \newcommand{\tXJ}{\topos{X}{J}} % \newcommand{\relint}{\mathop{\bf relint {}}} \newcommand{\boundary}{\mathop{\bf bd {}}} \newcommand{\subsetset}[1]{\mathcal{#1}} \newcommand{\Tr}{\mathcal{\bf Tr}} \newcommand{\symset}[1]{\mathbf{S}^{#1}} \newcommand{\possemidefset}[1]{\mathbf{S}_+^{#1}} \newcommand{\posdefset}[1]{\mathbf{S}_{++}^{#1}} \newcommand{\ones}{\mathbf{1}} \newcommand{\Prob}{\mathop{\bf Prob {}}} \newcommand{\prob}[1]{\Prob\left\{#1\right\}} \newcommand{\Expect}{\mathop{\bf E {}}} \newcommand{\Var}{\mathop{\bf Var{}}} \newcommand{\Mod}[1]{\;(\text{mod}\;#1)} \newcommand{\ball}[2]{B(#1,#2)} \newcommand{\generates}[1]{\langle {#1} \rangle} \newcommand{\isomorph}{\approx} \newcommand{\isomorph}{\approx} \newcommand{\nullspace}{\mathcalfont{N}} \newcommand{\range}{\mathcalfont{R}} \newcommand{\diag}{\mathop{\bf diag {}}} \newcommand{\rank}{\mathop{\bf rank {}}} \newcommand{\Ker}{\mathop{\mathrm{Ker} {}}} \newcommand{\Map}{\mathop{\mathrm{Map} {}}} \newcommand{\End}{\mathop{\mathrm{End} {}}} \newcommand{\Img}{\mathop{\mathrm{Im} {}}} 
\newcommand{\Aut}{\mathop{\mathrm{Aut} {}}} \newcommand{\Gal}{\mathop{\mathrm{Gal} {}}} \newcommand{\Irr}{\mathop{\mathrm{Irr} {}}} \newcommand{\arginf}{\mathop{\mathrm{arginf}}} \newcommand{\argsup}{\mathop{\mathrm{argsup}}} \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\ev}{\mathop{\mathrm{ev} {}}} \newcommand{\affinehull}{\mathop{\bf aff {}}} \newcommand{\cvxhull}{\mathop{\bf Conv {}}} \newcommand{\epi}{\mathop{\bf epi {}}} \newcommand{\injhomeo}{\hookrightarrow} \newcommand{\perm}[1]{\text{Perm}(#1)} \newcommand{\aut}[1]{\text{Aut}(#1)} \newcommand{\ideal}[1]{\mathfrak{#1}} \newcommand{\bigset}[2]{\left\{#1\left|{#2}\right.\right\}} \newcommand{\bigsetl}[2]{\left\{\left.{#1}\right|{#2}\right\}} \newcommand{\primefield}[1]{\field_{#1}} \newcommand{\dimext}[2]{[#1:{#2}]} \newcommand{\restrict}[2]{#1|{#2}} \newcommand{\algclosure}[1]{#1^\mathrm{a}} \newcommand{\finitefield}[2]{\field_{#1^{#2}}} \newcommand{\frobmap}[2]{\varphi_{#1,{#2}}} % %\newcommand{\algfontmode}{} % %\ifdefined\algfontmode %\newcommand\mathalgfont[1]{\mathcal{#1}} %\newcommand\mathcalfont[1]{\mathscr{#1}} %\else \newcommand\mathalgfont[1]{\mathscr{#1}} \newcommand\mathcalfont[1]{\mathcal{#1}} %\fi % %\def\DeltaSirDir{yes} %\newcommand\sdirletter[2]{\ifthenelse{\equal{\DeltaSirDir}{yes}}{\ensuremath{\Delta #1}}{\ensuremath{#2}}} \newcommand{\sdirletter}[2]{\Delta #1} \newcommand{\sdirlbd}{\sdirletter{\lambda}{\Delta \lambda}} \newcommand{\sdir}{\sdirletter{x}{v}} \newcommand{\seqk}[2]{#1^{(#2)}} \newcommand{\seqscr}[3]{\seq{#1}_{#2}^{#3}} \newcommand{\xseqk}[1]{\seqk{x}{#1}} \newcommand{\sdirk}[1]{\seqk{\sdir}{#1}} \newcommand{\sdiry}{\sdirletter{y}{\Delta y}} \newcommand{\slen}{t} \newcommand{\slenk}[1]{\seqk{\slen}{#1}} \newcommand{\ntsdir}{\sdir_\mathrm{nt}} \newcommand{\pdsdir}{\sdir_\mathrm{pd}} \newcommand{\sdirnu}{\sdirletter{\nu}{w}} \newcommand{\pdsdirnu}{\sdirnu_\mathrm{pd}} \newcommand{\pdsdiry}{\sdiry_\mathrm{pd}} \newcommand\pdsdirlbd{\sdirlbd_\mathrm{pd}} % 
\newcommand{\normal}{\mathcalfont{N}} % \newcommand{\algk}[1]{\mathalgfont{#1}} \newcommand{\collk}[1]{\mathcalfont{#1}} \newcommand{\classk}[1]{\collk{#1}} \newcommand{\indexedcol}[1]{\{#1\}} \newcommand{\rel}{\mathbf{R}} \newcommand{\relxy}[2]{#1\;\rel\;{#2}} \newcommand{\innerp}[2]{\langle{#1},{#2}\rangle} \newcommand{\innerpt}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\closure}[1]{\overline{#1}} \newcommand{\support}{\mathbf{support}} \newcommand{\set}[2]{\{#1|#2\}} \newcommand{\metrics}[2]{\langle {#1}, {#2}\rangle} \newcommand{\interior}[1]{#1^\circ} \newcommand{\topol}[1]{\mathfrak{#1}} \newcommand{\topos}[2]{\langle {#1}, \topol{#2}\rangle} % topological space % \newcommand{\alg}{\algk{A}} \newcommand{\algB}{\algk{B}} \newcommand{\algF}{\algk{F}} \newcommand{\algR}{\algk{R}} \newcommand{\algX}{\algk{X}} \newcommand{\algY}{\algk{Y}} % \newcommand\coll{\collk{C}} \newcommand\collB{\collk{B}} \newcommand\collF{\collk{F}} \newcommand\collG{\collk{G}} \newcommand{\tJ}{\topol{J}} \newcommand{\tS}{\topol{S}} \newcommand\openconv{\collk{U}} % \newenvironment{my-matrix}[1]{\begin{bmatrix}}{\end{bmatrix}} \newcommand{\colvectwo}[2]{\begin{my-matrix}{c}{#1}\\{#2}\end{my-matrix}} \newcommand{\colvecthree}[3]{\begin{my-matrix}{c}{#1}\\{#2}\\{#3}\end{my-matrix}} \newcommand{\rowvecthree}[3]{\begin{bmatrix}{#1}&{#2}&{#3}\end{bmatrix}} \newcommand{\mattwotwo}[4]{\begin{bmatrix}{#1}&{#2}\\{#3}&{#4}\end{bmatrix}} % \newcommand\optfdk[2]{#1^\mathrm{#2}} \newcommand\tildeoptfdk[2]{\tilde{#1}^\mathrm{#2}} \newcommand\fobj{\optfdk{f}{obj}} \newcommand\fie{\optfdk{f}{ie}} \newcommand\feq{\optfdk{f}{eq}} \newcommand\tildefobj{\tildeoptfdk{f}{obj}} \newcommand\tildefie{\tildeoptfdk{f}{ie}} \newcommand\tildefeq{\tildeoptfdk{f}{eq}} \newcommand\xdomain{\mathcalfont{X}} \newcommand\xobj{\optfdk{\xdomain}{obj}} \newcommand\xie{\optfdk{\xdomain}{ie}} \newcommand\xeq{\optfdk{\xdomain}{eq}} \newcommand\optdomain{\mathcalfont{D}} \newcommand\optfeasset{\mathcalfont{F}} % 
\newcommand{\bigpropercone}{\mathcalfont{K}} % \newcommand{\prescript}[3]{\;^{#1}{#3}} % %\]Introduction
Preamble
Notations
-
sets of numbers
- $\naturals$ - set of natural numbers
- $\integers$ - set of integers
- $\integers_+$ - set of nonnegative integers
- $\rationals$ - set of rational numbers
- $\reals$ - set of real numbers
- $\preals$ - set of nonnegative real numbers
- $\ppreals$ - set of positive real numbers
- $\complexes$ - set of complex numbers
-
sequences $\seq{x_i}$ and the like
- finite $\seq{x_i}_{i=1}^n$, infinite $\seq{x_i}_{i=1}^\infty$ - use $\seq{x_i}$ whenever unambiguously understood
- similarly for other operations, e.g., $\sum x_i$, $\prod x_i$, $\cup A_i$, $\cap A_i$, $\bigtimes A_i$
- similarly for integrals, e.g., $\int f$ for $\int_{-\infty}^\infty f$
-
sets
- $\compl{A}$ - complement of $A$
- $A\sim B$ - $A\cap \compl{B}$
- $A\Delta B$ - $(A\cap \compl{B}) \cup (\compl{A} \cap B)$
- $\powerset(A)$ - set of all subsets of $A$
-
sets in metric vector spaces
- $\closure{A}$ - closure of set $A$
- $\interior{A}$ - interior of set $A$
- $\relint A$ - relative interior of set $A$
- $\boundary A$ - boundary of set $A$
-
set algebra
- $\sigma(\subsetset{A})$ - $\sigma$-algebra generated by $\subsetset{A}$, i.e., smallest $\sigma$-algebra containing $\subsetset{A}$
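For intuition, $\sigma(\subsetset{A})$ can be computed by brute force when $\Omega$ is finite: repeatedly close the class under complements and unions until nothing new appears. A minimal sketch (the `generated_sigma_algebra` helper is hypothetical and illustrative only; real $\sigma$-algebras are rarely enumerable):

```python
def generated_sigma_algebra(omega, collection):
    """Brute-force sigma(A) on a *finite* omega: start from the given
    sets, then close under complement and pairwise union until a fixed
    point (on a finite space this gives closure under countable unions)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(a) for a in collection}
    while True:
        bigger = set(sigma)
        bigger |= {omega - a for a in sigma}              # complements
        bigger |= {a | b for a in sigma for b in sigma}   # unions
        if bigger == sigma:
            return sigma
        sigma = bigger

sa = generated_sigma_algebra({1, 2, 3}, [{1}])
print(sorted(sorted(s) for s in sa))  # [[], [1], [1, 2, 3], [2, 3]]
```

Note the smallest $\sigma$-algebra containing $\{1\}$ must also contain its complement $\{2,3\}$, and nothing else beyond $\emptyset$ and $\Omega$.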
-
norms in $\reals^n$
- $\|x\|_p$ ($p\geq1$) - $p$-norm of $x\in\reals^n$, i.e., $(|x_1|^p + \cdots + |x_n|^p)^{1/p}$
- e.g., $\|x\|_2$ - Euclidean norm
-
matrices and vectors
- $a_{i}$ - $i$-th entry of vector $a$
- $A_{ij}$ - entry of matrix $A$ at position $(i,j)$, i.e., entry in $i$-th row and $j$-th column
- $\Tr(A)$ - trace of $A \in\reals^{n\times n}$, i.e., $A_{1,1}+ \cdots + A_{n,n}$
-
symmetric, positive definite, and positive semi-definite matrices
- $\symset{n}\subset \reals^{n\times n}$ - set of symmetric matrices
- $\possemidefset{n}\subset \symset{n}$ - set of positive semi-definite matrices; $A\succeq0 \Leftrightarrow A \in \possemidefset{n}$
- $\posdefset{n}\subset \symset{n}$ - set of positive definite matrices; $A\succ0 \Leftrightarrow A \in \posdefset{n}$
-
sometimes,
use Python script-like notations
(with serious abuse of mathematical notations)
-
use $f:\reals\to\reals$ as if it were $f:\reals^n \to \reals^n$,
e.g.,
$$
\exp(x) = (\exp(x_1), \ldots, \exp(x_n)) \quad \mbox{for } x\in\reals^n
$$
and
$$
\log(x) = (\log(x_1), \ldots, \log(x_n)) \quad \mbox{for } x\in\ppreals^n
$$
which corresponds to Python code `numpy.exp(x)` or `numpy.log(x)` where `x` is an instance of `numpy.ndarray`, i.e., a `numpy` array
-
use $\sum x$ to mean $\ones^T x$ for $x\in\reals^n$,
i.e.
$$
\sum x = x_1 + \cdots + x_n
$$
which corresponds to Python code `x.sum()` where `x` is a `numpy` array
-
use $x/y$ for $x,y\in\reals^n$ to mean
$$
\rowvecthree{x_1/y_1}{\cdots}{x_n/y_n}^T
$$
which corresponds to Python code `x / y` where `x` and `y` are $1$-d `numpy` arrays
-
use $X/Y$ for $X,Y\in\reals^{m\times n}$ to mean
$$
\begin{my-matrix}{cccc}
X_{1,1}/Y_{1,1} & X_{1,2}/Y_{1,2} & \cdots & X_{1,n}/Y_{1,n}
\\
X_{2,1}/Y_{2,1} & X_{2,2}/Y_{2,2} & \cdots & X_{2,n}/Y_{2,n}
\\
\vdots & \vdots & \ddots & \vdots
\\
X_{m,1}/Y_{m,1} & X_{m,2}/Y_{m,2} & \cdots & X_{m,n}/Y_{m,n}
\end{my-matrix}
$$
which corresponds to Python code `X / Y` where `X` and `Y` are $2$-d `numpy` arrays
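These correspondences can be checked directly; a minimal `numpy` sketch with illustrative values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# exp and log apply elementwise, matching the scalar-function abuse of notation
print(np.exp(x))   # (e^1, e^2, e^3)
print(np.log(y))

# sum x = 1^T x
print(x.sum())     # 6.0

# x / y is elementwise division for 1-d arrays ...
print(x / y)       # [0.5 0.5 0.5]

# ... and likewise, entry by entry, for 2-d arrays
X = np.array([[1.0, 4.0], [9.0, 16.0]])
Y = np.array([[1.0, 2.0], [3.0, 4.0]])
print(X / Y)       # [[1. 2.] [3. 4.]]
```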
Some definitions
Some conventions
-
(for some subjects) use following conventions
- $0\cdot \infty = \infty \cdot 0 = 0$
- $(\forall x\in\ppreals)(x\cdot \infty = \infty \cdot x = \infty)$
- $\infty \cdot \infty = \infty$
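These conventions differ from IEEE floating-point arithmetic, where `0 * inf` is `nan`; a tiny hypothetical helper `xmul` encodes them for $[0,\infty]$:

```python
import math

def xmul(a, b):
    """Multiplication on [0, inf] with the measure-theoretic convention
    0 * inf = inf * 0 = 0 (hypothetical helper; IEEE gives nan here)."""
    if a == 0 or b == 0:
        return 0.0
    return a * b  # covers x * inf = inf for x > 0, and inf * inf = inf

print(xmul(0.0, math.inf))  # 0.0, whereas 0.0 * math.inf is nan
print(xmul(2.0, math.inf))  # inf
```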
Measure-theoretic Treatment of Probabilities
Probability Measure
Measurable functions
- denote $n$-dimensional Borel sets by $\algR^n$
- for two measurable spaces, $\measu{\Omega}{\algF}$ and $\measu{\Omega'}{\algF'}$, function, $f:\Omega \to \Omega'$ with $$ \left( \forall A' \in \algF' \right) \left( f^{-1}(A') \in \algF \right) $$ said to be measurable with respect to $\algF/\algF'$ (thus, the measurable functions defined earlier can be said to be measurable with respect to $\collk{B}/\algR$)
-
when $\Omega=\reals^n$ in $\measu{\Omega}{\algF}$,
$\algF$ is assumed to be $\algR^n$,
and sometimes drop $\algR^n$
- thus, e.g., we say $f:\Omega\to\reals^n$ is measurable with respect to $\algF$ (instead of $\algF/\algR^n$)
- measurable function, $f:\reals^n\to\reals^m$ (i.e., measurable with respect to $\algR^n/\algR^m$), called Borel function
- $f:\Omega\to\reals^n$ is measurable with respect to $\algF/\algR^n$ if and only if every component, $f_i:\Omega\to\reals$, is measurable with respect to $\algF/\algR$
Probability (measure) spaces
-
set function, $P:\algk{F}\to[0,1]$, defined on algebra, $\algk{F}$, of set $\Omega$,
satisfying following properties,
called probability measure
(note the resemblance with measure spaces defined earlier)
- $(\forall A\in\algk{F})(0\leq P(A)\leq 1)$
- $P(\emptyset) = 0,\ P(\Omega) = 1$
- $(\forall \mbox{ disjoint } \seq{A_n} \subset \algk{F} )(P\left(\bigcup A_n\right) = \sum P(A_n))$
- for $\sigma$-algebra, $\algk{F}$, $\meas{\Omega}{\algk{F}}{P}$, called probability measure space or probability space
- set $A\in\algk{F}$ with $P(A)=1$, called a support of $P$
Dynkin's $\pi$-$\lambda$ theorem
-
class, $\subsetset{P}$, of subsets of $\Omega$ closed under finite intersection,
called $\pi$-system, i.e.,
- $(\forall A,B\in \subsetset{P})(A\cap B\in\subsetset{P})$
-
class, $\subsetset{L}$, of subsets of $\Omega$ containing $\Omega$
closed under complements and countable disjoint unions
called $\lambda$-system
- $\Omega \in \subsetset{L}$
- $(\forall A\in \subsetset{L})(\compl{A}\in\subsetset{L})$
- $(\forall \mbox{ disjoint }\seq{A_n})(\bigcup A_n \in \subsetset{L})$
- class that is both $\pi$-system and $\lambda$-system is $\sigma$-algebra
- Dynkin's $\pi$-$\lambda$ theorem - for $\pi$-system, $\subsetset{P}$, and $\lambda$-system, $\subsetset{L}$, with $\subsetset{P} \subset \subsetset{L}$, $$ \sigma(\subsetset{P}) \subset \subsetset{L} $$
- for $\pi$-system, $\algk{P}$, two probability measures, $P_1$ and $P_2$, on $\sigma(\algk{P})$, agreeing on $\algk{P}$, agree on $\sigma(\algk{P})$
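The defining closure properties are easy to verify mechanically on a finite class; a sketch with assumed helper names `is_pi_system` and `is_lambda_system` (pairwise checks suffice on a finite class, by induction):

```python
def is_pi_system(omega, c):
    """closed under pairwise (hence finite) intersection"""
    return all(a & b in c for a in c for b in c)

def is_lambda_system(omega, c):
    """contains omega, closed under complement and disjoint union"""
    return (omega in c
            and all(omega - a in c for a in c)
            and all(a | b in c for a in c for b in c if not (a & b)))

omega = frozenset({1, 2, 3, 4})
# the sigma-algebra generated by {1,2} is both a pi-system and a lambda-system
c = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), omega}
print(is_pi_system(omega, c), is_lambda_system(omega, c))  # True True
```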
Limits of Events
- for $\seq{A_n}$ converging to $A$ $$ \lim P(A_n) = P(A) $$
Probabilistic independence
- given probability space, $\meas{\Omega}{\algk{F}}{P}$
- $A,B\in\algk{F}$ with $$ P(A\cap B) = P(A) P(B) $$ said to be independent
- indexed collection, $\seq{A_\lambda}$, with $$ \left( \forall n\in\naturals, \mbox{ distinct } \lambda_1, \ldots, \lambda_n \in \Lambda \right) \left( P\left(\bigcap_{i=1}^n A_{\lambda_i}\right) = \prod_{i=1}^n P(A_{\lambda_i}) \right) $$ said to be independent
Independence of classes of events
- indexed collection, $\seq{\subsetset{A}_\lambda}$, of classes of events (i.e., subsets) with $$ \left( \forall A_\lambda \in \subsetset{A}_\lambda \right) \left( \seq{A_\lambda} \mbox{ are independent} \right) $$ said to be independent
- for independent indexed collection, $\seq{\subsetset{A}_\lambda}$, with every $\subsetset{A}_\lambda$ being $\pi$-system, $\seq{\sigma(\subsetset{A}_\lambda)}$ are independent
- for independent (countable) collection of events, $\seq{\seq{A_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{A_{ni}}_{i=1}^\infty)$ are independent
Borel-Cantelli lemmas
-
for sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ converging $$ P(\limsup A_n) = 0 $$
-
for independent sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ diverging $$ P(\limsup A_n)=1 $$
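Both lemmas can be seen in a seeded simulation: with $P(A_n)=1/n^2$ the sum converges and only a handful of events occur; with $P(A_n)=1/n$ the sum diverges and occurrences keep accumulating (roughly like $\log N$). The probabilities below are illustrative choices:

```python
import random

random.seed(0)
N = 100_000

# first lemma: sum 1/n^2 < infinity, so a.s. only finitely many A_n occur;
# simulated occurrences cluster at small n
occurred = [n for n in range(1, N + 1) if random.random() < 1.0 / n**2]
print(occurred)

# second lemma: independent A_n with P(A_n) = 1/n (divergent sum) occur
# infinitely often a.s.; the count over 1..N grows like log N
count_div = sum(1 for n in range(1, N + 1) if random.random() < 1.0 / n)
print(count_div)
```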
Tail events and Kolmogorov's zero-one law
- for sequence of events, $\seq{A_n}$ $$ \algk{T} = \bigcap_{n=1}^\infty \sigma\left(\seq{A_i}_{i=n}^\infty\right) $$ called tail $\sigma$-algebra associated with $\seq{A_n}$; its elements are called tail events
- Kolmogorov's zero-one law - for independent sequence of events, $\seq{A_n}$, every event in tail $\sigma$-algebra has probability measure either $0$ or $1$
Product probability spaces
-
for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$,
want to find product measure, $\pi$,
such that
$$
\left(
\forall A\in \algX, B\in\algY
\right)
\left(
\pi(A\times B) = \mu(A)\nu(B)
\right)
$$
- e.g., if both $\mu$ and $\nu$ are Lebesgue measure on $\reals$, $\pi$ will be Lebesgue measure on $\reals^2$
- $A\times B$ for $A\in\algX$ and $B\in\algY$ is measurable rectangle
-
$\sigma$-algebra generated by measurable rectangles
denoted by
$$
\algX \times \algY
$$
- thus, not Cartesian product in usual sense
- generally much larger than class of measurable rectangles
Sections of measurable subsets and functions
- for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
-
sections of measurable subsets
- $\set{y\in Y}{(x,y)\in E}$ is section of $E$ determined by $x$
- $\set{x\in X}{(x,y)\in E}$ is section of $E$ determined by $y$
-
sections of measurable functions
- for measurable function, $f$, with respect to $\algX\times \algY$
- $f(x,\cdot)$ is section of $f$ determined by $x$
- $f(\cdot,y)$ is section of $f$ determined by $y$
-
sections of measurable subsets are measurable
- $\left( \forall x\in X, E\in \algX \times \algY \right) \left( \set{y\in Y}{(x,y)\in E} \in \algY \right)$
- $\left( \forall y\in Y, E\in \algX \times \algY \right) \left( \set{x\in X}{(x,y)\in E} \in \algX \right)$
-
sections of measurable functions are measurable
- $f(x,\cdot)$ is measurable with respect to $\algY$ for every $x\in X$
- $f(\cdot,y)$ is measurable with respect to $\algX$ for every $y\in Y$
Product measure
- for two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
-
two functions defined below for every $E\in\algX\times\algY$ are $\sigma$-finite measures
- $\pi'(E) = \int_X \nu\set{y\in Y}{(x,y)\in E} d\mu$
- $\pi''(E) = \int_Y \mu\set{x\in X}{(x,y)\in E} d\nu$
- for every measurable rectangle, $A\times B$, with $A\in\algX$ and $B\in\algY$ $$ \pi'(A\times B) = \pi''(A\times B) = \mu(A) \nu(B) $$
- (use conventions stated earlier for extended real values)
- indeed, $\pi'(E)=\pi''(E)$ for every $E\in\algX\times\algY$; let $\pi=\pi'=\pi''$
-
$\pi$ is
- called product measure and denoted by $\mu\times \nu$
- $\sigma$-finite measure
- only measure such that $\pi(A\times B) =\mu(A) \nu(B)$ for every measurable rectangle
Fubini's theorem
-
suppose two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
- define
- $X_0 = \set{x\in X}{\int_Y |f(x,y)|d\nu < \infty}\subset X$
- $Y_0 = \set{y\in Y}{\int_X |f(x,y)|d\mu < \infty}\subset Y$
- Fubini's theorem - for nonnegative measurable function, $f$, following are measurable with respect to $\algX$ and $\algY$ respectively $$ g(x) = \int_Y f(x,y)d\nu,\ \ h(y) = \int_X f(x,y)d\mu $$ and following holds $$ \int_{X\times Y} f(x,y) d\pi = \int_X \left(\int_Y f(x,y) d\nu\right)d\mu = \int_Y \left(\int_X f(x,y) d\mu\right)d\nu $$
-
for $f$, (not necessarily nonnegative) integrable function with respect to $\pi$
- $\mu(X\sim X_0) = 0$, $\nu(Y\sim Y_0)=0$
- $g$ and $h$ are finite measurable on $X_0$ and $Y_0$ respectively
- (above) equalities of double integral holds
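A numerical illustration of the equal iterated integrals, using the midpoint rule on $[0,1]^2$ with Lebesgue measure and the illustrative integrand $f(x,y)=xe^y$, whose closed form is $(1/2)(e-1)$:

```python
import math

n = 400
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]  # midpoints of a uniform grid

def f(x, y):
    return x * math.exp(y)

# integrate dy first then dx, and dx first then dy
int_y_then_x = sum(sum(f(x, y) * h for y in pts) * h for x in pts)
int_x_then_y = sum(sum(f(x, y) * h for x in pts) * h for y in pts)
exact = 0.5 * (math.e - 1.0)  # (int_0^1 x dx)(int_0^1 e^y dy)
print(int_y_then_x, int_x_then_y, exact)
```

Both orders of integration agree to within discretization error, as Fubini's theorem requires for this integrable $f$.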
Random Variables
Random variables
- for probability space, $\meas{\Omega}{\algk{F}}{P}$,
- measurable function (with respect to $\algF/\algR$), $X:\Omega \to \reals$, called random variable
-
measurable function (with respect to $\algF/\algR^n$), $X:\Omega \to \reals^n$,
called random vector
- when expressing $X(\omega)=(X_1(\omega), \ldots, X_n(\omega))$, $X$ is measurable if and only if every $X_i$ is measurable
- thus, $n$-dimensional random vector is simply $n$-tuple of random variables
-
smallest $\sigma$-algebra with respect to which $X$ is measurable,
called $\sigma$-algebra generated by $X$
and denoted by $\sigma(X)$
- $\sigma(X)$ consists exactly of sets, $\set{\omega\in \Omega}{X(\omega)\in H}$, for $H\in\algR^n$
- random variable, $Y$, is measurable with respect to $\sigma(X)$ if and only if there exists measurable function, $f:\reals^n\to\reals$, such that $Y(\omega) = f(X(\omega))$ for all $\omega$, i.e., $Y=f\circ X$
Probability distributions for random variables
- probability measure on $\reals$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR $$ called distribution or law of random variable, $X$
- function, $F:\reals\to[0,1]$, defined by $$ F(x) = \mu(-\infty, x] = P(X\leq x) $$ called distribution function or cumulative distribution function (CDF) of $X$
- Borel set, $S$, with $P(S)=1$, called support
- random variable, its distribution, and its distribution function, said to be discrete when it has countable support
Probability distribution of mappings of random variables
- for measurable $g:\reals\to\reals$, $$ \left( \forall A\in\algR \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu (g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$
Probability density for random variables
- Borel function, $f: \reals\to\preals$, satisfying $$ \left( \forall A \in \algR \right) \left( \mu(A) = P(X\in A) = \int_A f(x) dx \right) $$ called density or probability density function (PDF) of random variable
- above is equivalent to $$ \left( \forall a < b \in \reals \right) \left( \int_a^b f(x) dx = P(a<X\leq b) = F(b) - F(a) \right) $$
-
(refer to the statement above)
- note, though, $F$ need not differentiate to $f$ everywhere; $f$ is only required to integrate properly
- if $F$ does differentiate to $f$ and $f$ is continuous, fundamental theorem of calculus implies $f$ indeed is density for $F$
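The defining relation $\int_a^b f = F(b)-F(a)$ can be checked numerically for the standard normal, whose CDF has the closed form $\Phi(x)=\tfrac12(1+\mathrm{erf}(x/\sqrt2))$; a midpoint-rule sketch:

```python
import math

def f(x):
    """standard normal density"""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def F(x):
    """standard normal CDF via the error function"""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

a, b, n = -1.0, 2.0, 2000
h = (b - a) / n
# midpoint rule for int_a^b f(x) dx
integral = sum(f(a + (i + 0.5) * h) for i in range(n)) * h
print(integral, F(b) - F(a))  # both approximately 0.8186
```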
Probability distribution for random vectors
- (similarly to random variables) probability measure on $\reals^n$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR^n $$ called distribution or law of random vector, $X$
- function, $F:\reals^n\to[0,1]$, defined by $$ F(x) = \mu(S_x) = P(X\preceq x) $$ where $$ S_x = \set{y\in \reals^n}{y\preceq x} = \set{y\in \reals^n}{y_i\leq x_i \mbox{ for } i=1,\ldots,n} $$ called distribution function or cumulative distribution function (CDF) of $X$
- (similarly to random variables) random vector, its distribution, its distribution function, said to be discrete when has countable support
Marginal distribution for random vectors
- (similarly to random variables) for measurable $g:\reals^n\to\reals^m$ $$ \left( \forall A\in\algR^{m} \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu(g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$
- for $g_i:\reals^n\to\reals$ with $g_i(x) = x_i$ $$ \left( \forall A\in\algR \right) \left( \prob{g_i(X)\in A} = \prob{X_i \in A} \right) $$
- measure, $\mu_i$, defined by $\mu_i(A) = \prob{X_i\in A}$, called ($i$-th) marginal distribution of $X$
- for $\mu$ having density function, $f:\reals^n\to\preals$, density function of marginal distribution is $$ f_i(x_i) = \int_{\reals^{n-1}} f(x_1,\ldots,x_n) \, dx_{-i} $$ where $x_{-i} = (x_1,\ldots,x_{i-1}, x_{i+1}, \ldots, x_n)$, i.e., integration is over all coordinates except $x_i$
Independence of random variables
- random variables, $X_1$, $\ldots$, $X_n$, with independent $\sigma$-algebras generated by them, said to be independent
-
(refer to the section on independence of classes of events above)
- because $\sigma(X_i) = X_i^{-1}(\algR)=\set{X_i^{-1}(H)}{H\in\algR}$, independent if and only if $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(X_1\in H_1,\ldots, X_n\in H_n\right) = \prod P\left(X_i\in H_i\right) \right) $$ i.e., $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right) $$
Equivalent statements of independence of random variables
-
for random variables, $X_1$, $\ldots$, $X_n$,
having $\mu$ and $F:\reals^n\to[0,1]$ as their distribution and CDF,
with each $X_i$ having $\mu_i$ and $F_i:\reals\to[0,1]$ as its distribution and CDF,
following statements are equivalent
- $X_1,\ldots,X_n \mbox{ are independent}$
- $\left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
- $\left( \forall H_1,\ldots,H_n \in \algR \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
- $\left( \forall x\in \reals^n \right) \left( P(X_1\leq x_1,\ldots, X_n\leq x_n) = \prod P(X_i \leq x_i) \right)$
- $\left( \forall x \in \reals^n \right) \left( F(x) = \prod F_i(x_i) \right)$
- $\mu = \mu_1 \times \cdots \times \mu_n$
- $\left( \forall x \in \reals^n \right) \left( f(x) = \prod f_i(x_i) \right)$
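For discrete distributions, the equivalence between independence and the product-measure factorization can be checked exhaustively; a sketch with illustrative marginal laws `px` and `py`:

```python
from itertools import product, combinations

# marginal laws of two discrete random variables (illustrative supports)
px = {0: 0.2, 1: 0.8}
py = {0: 0.5, 1: 0.3, 2: 0.2}
# independence <=> joint law is the product measure mu_1 x mu_2
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

def subsets(keys):
    keys = list(keys)
    return [set(s) for r in range(len(keys) + 1) for s in combinations(keys, r)]

# P(X in H1, Y in H2) = P(X in H1) P(Y in H2) for every pair of Borel
# (here: arbitrary) subsets of the supports
ok = all(
    abs(sum(joint[x, y] for x in H1 for y in H2)
        - sum(px[x] for x in H1) * sum(py[y] for y in H2)) < 1e-12
    for H1 in subsets(px) for H2 in subsets(py)
)
print(ok)  # True
```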
Independence of random variables with separate $\sigma$-algebra
- given probability space, $\meas{\Omega}{\algk{F}}{P}$
- random variables, $X_1$, $\ldots$, $X_n$, each of which is measurable with respect to each of $n$ independent $\sigma$-algebras, $\algk{G}_1\subset \algF$, $\ldots$, $\algk{G}_n\subset \algF$ respectively, are independent
Independence of random vectors
-
for random vectors, $X_1:\Omega\to\reals^{d_1}$, $\ldots$, $X_n:\Omega\to\reals^{d_n}$,
having $\mu$ and $F:\reals^{d_1}\times\cdots\times\reals^{d_n}\to[0,1]$ as their distribution and CDF,
with each $X_i$ having $\mu_i$ and $F_i:\reals^{d_i}\to[0,1]$ as its distribution and CDF,
following statements are equivalent
- $X_1,\ldots,X_n \mbox{ are independent}$
- $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
- $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
- $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( P(X_1\preceq x_1,\ldots, X_n\preceq x_n) = \prod P(X_i \preceq x_i) \right)$
- $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( F(x_1,\ldots,x_n) = \prod F_i(x_i) \right)$
- $\mu = \mu_1 \times \cdots \times \mu_n$
- $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( f(x_1,\ldots,x_n) = \prod f_i(x_i) \right)$
Independence of infinite collection of random vectors
- infinite collection of random vectors for which every finite subcollection is independent, said to be independent
- for independent (countable) collection of random vectors, $\seq{\seq{X_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{X_{ni}}_{i=1}^\infty)$ are independent
Probability evaluation for two independent random vectors
Sequence of random variables
Expected values
-
$\Expect X$ is
- always defined for nonnegative $X$
-
for general case
- $X$ has an expected value if either $\Expect X^+<\infty$ or $\Expect X^-<\infty$ (or both), in which case, $\Expect X =\Expect X^+ - \Expect X^-$
- $X$ is integrable if and only if $\Expect |X| <\infty$
-
limits
- if $\seq{X_n}$ is dominated by integrable random variable or they are uniformly integrable, $\Expect X_n$ converges to $\Expect X$ if $X_n$ converges to $X$ in probability
Markov and Chebyshev's inequalities
Jensen's, Hölder's, and Lyapunov's inequalities
- note Hölder's inequality implies Lyapunov's inequality
Maximal inequalities
- define $S_n = \sum X_i$
Moments
- if $\Expect |X|^n<\infty$, $\Expect |X|^k<\infty$ for $k<n$
- $\Expect X^n$ defined only when $\Expect|X|^n<\infty$
Moment generating functions
- $n$-th derivative of $M$ with respect to $s$ is $M^{(n)}(s) = \frac{d^n}{ds^n} M(s) = \Expect \left(X^ne^{sX}\right) = \int x^ne^{sx} d\mu$
- thus, $n$-th derivative of $M$ with respect to $s$ at $s=0$ is $n$-th moment of $X$ $$ M^{(n)}(0) = \Expect X^n $$
- for independent random variables, $\seq{X_i}_{i=1}^n$, moment generating function of $\sum X_i$ is $$ \prod M_i(s) $$
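The moment-extraction property $M^{(n)}(0)=\Expect X^n$ can be checked numerically with finite differences for a small discrete distribution (illustrative pmf):

```python
import math

pmf = {0: 0.5, 1: 0.3, 2: 0.2}  # illustrative discrete distribution

def M(s):
    """moment generating function E[exp(sX)]"""
    return sum(p * math.exp(s * x) for x, p in pmf.items())

h = 1e-4
m1 = (M(h) - M(-h)) / (2.0 * h)               # central difference ~ M'(0)
m2 = (M(h) - 2.0 * M(0.0) + M(-h)) / (h * h)  # second difference ~ M''(0)
ex = sum(x * p for x, p in pmf.items())        # E[X]   = 0.7
ex2 = sum(x * x * p for x, p in pmf.items())   # E[X^2] = 1.1
print(m1, ex, m2, ex2)
```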
Convergence of Random Variables
Convergences of random variables
- indeed, if above equation holds for $A=(-\infty, x)$, it holds for many other subsets
Relations of different types of convergences of random variables
Necessary and sufficient conditions for convergence of probability
\[\seq{X_n}\ \mbox{ converge in probability}\]
if and only if
\[\left( \forall \epsilon>0 \right) \left( \prob{|X_n-X|>\epsilon\mbox{ i.o.}} = \prob{\limsup |X_n-X| > \epsilon } = 0 \right)\]
if and only if
\[\left( \forall \mbox{ subsequence }\seq{X_{n_k}} \right) \left( \exists \mbox{ its subsequence }\seq{X_{n_{k_l}}} \mbox{ converging to } X \mbox{ with probability } 1 \right)\]
Necessary and sufficient conditions for convergence in distribution
\[X_n\Rightarrow X, \mbox{ i.e., $X_n$ converge in distribution}\]
if and only if
\[F_n\Rightarrow F, \mbox{ i.e., $F_n$ converge weakly}\]
if and only if
\[\left( \forall A = (-\infty, x] \mbox{ with } \mu(\boundary A) = 0 \right) \left( \lim \mu_n(A) = \mu(A) \right)\]
if and only if
\[\left( \forall x \mbox{ with } \prob{X=x} = 0 \right) \left( \lim \prob{X_n\leq x} = \prob{X\leq x} \right)\]
Strong law of large numbers
- define $S_n = \sum_{i=1}^n X_i$
- strong law of large numbers also called Kolmogorov's law
Weak law of large numbers
- define $S_n = \sum_{i=1}^n X_i$
- because convergence with probability $1$ implies convergence in probability, strong law of large numbers implies weak law of large numbers
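A seeded simulation illustrating the law of large numbers for i.i.d. Uniform$(0,1)$ samples, where $S_n/n$ should settle near $\Expect X = 1/2$:

```python
import random

random.seed(1)

# sample mean S_n / n of n i.i.d. Uniform(0,1) draws
n = 200_000
s = sum(random.random() for _ in range(n))
print(s / n)  # close to 0.5 (standard error about 0.0007 here)
```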
Normal distributions
- assume probability space, $\meas{\Omega}{\algF}{P}$
- note $\Expect X=c$ and $\Var X=\sigma^2$
- called standard normal distribution when $c=0$ and $\sigma=1$
Multivariate normal distributions
- assume probability space, $\meas{\Omega}{\algF}{P}$
- note that $\Expect X=c$ and covariance matrix is $\Sigma$
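A seeded `numpy` sketch (illustrative $c$ and $\Sigma$): sample a multivariate normal and check that the empirical mean and covariance recover the parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])                    # mean vector
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])   # covariance matrix

# draw samples and compare empirical moments with the parameters
X = rng.multivariate_normal(c, Sigma, size=200_000)
print(X.mean(axis=0))           # approximately c
print(np.cov(X, rowvar=False))  # approximately Sigma
```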
Lindeberg-Lévy theorem
- define $S_n = \sum_{i=1}^n X_i$
Limit theorems in $\reals^n$
- $\lim \int f d\mu_n = \int f d\mu$ for every bounded continuous $f$
- $\limsup \mu_n(C) \leq \mu(C)$ for every closed $C$
- $\liminf \mu_n(G) \geq \mu(G)$ for every open $G$
- $\lim \mu_n(A) = \mu(A)$ for every $\mu$-continuity set $A$
Central limit theorem
- assume probability space, $\meas{\Omega}{\algF}{P}$ and define $S_n = \sum_{i=1}^n X_i$
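A seeded illustration of the central limit theorem: standardized sums of i.i.d. Uniform$(0,1)$ draws, $(S_n - n/2)/\sqrt{n/12}$, should be approximately standard normal, so the empirical CDF should match $\Phi$ at a few reference points:

```python
import math
import random

random.seed(2)

def standardized_sum(n):
    """(S_n - n E X) / sqrt(n Var X) for n i.i.d. Uniform(0,1) draws"""
    s = sum(random.random() for _ in range(n))
    return (s - n / 2.0) / math.sqrt(n / 12.0)

samples = [standardized_sum(100) for _ in range(20_000)]
frac0 = sum(1 for z in samples if z <= 0.0) / len(samples)  # ~ Phi(0) = 0.5
frac1 = sum(1 for z in samples if z <= 1.0) / len(samples)  # ~ Phi(1) = 0.8413
print(frac0, frac1)
```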
Convergence of random series
- for independent $\seq{X_n}$, probability of $\sum X_n$ converging is either $0$ or $1$
- below, the two cases are characterized in terms of distributions of individual $X_n$
- define truncated version of $X_n$ by $X_n^{(c)}$, i.e., $X_n I_{|X_n|\leq c}$
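The zero-one dichotomy can be illustrated with the random-signs series $\sum \varepsilon_n/n$, $P(\varepsilon_n=\pm1)=1/2$: the variances $\sum 1/n^2$ are summable, so the series converges with probability $1$, and (seeded) partial sums visibly settle down:

```python
import random

random.seed(3)

def partial_sums(N):
    """partial sums of sum_n eps_n / n with random signs eps_n = +-1"""
    s, out = 0.0, []
    for n in range(1, N + 1):
        s += random.choice((-1.0, 1.0)) / n
        out.append(s)
    return out

ps = partial_sums(100_000)
# oscillation over the last 1000 partial sums is deterministically below
# sum_{n>99000} 1/n, about 0.01, consistent with a.s. convergence
print(ps[-1], max(abs(a - ps[-1]) for a in ps[-1000:]))
```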