20 minute read

posted: 01-Aug-2025 & updated: 03-Aug-2025

\[% \newcommand{\algA}{\algk{A}} \newcommand{\algC}{\algk{C}} \newcommand{\bigtimes}{\times} \newcommand{\compl}[1]{\tilde{#1}} \newcommand{\complexes}{\mathbb{C}} \newcommand{\dom}{\mathop{\bf dom {}}} \newcommand{\ereals}{\reals\cup\{-\infty,\infty\}} \newcommand{\field}{\mathbb{F}} \newcommand{\integers}{\mathbb{Z}} \newcommand{\lbdseqk}[1]{\seqk{\lambda}{#1}} \newcommand{\meas}[3]{({#1}, {#2}, {#3})} \newcommand{\measu}[2]{({#1}, {#2})} \newcommand{\meast}[3]{\left({#1}, {#2}, {#3}\right)} \newcommand{\naturals}{\mathbb{N}} \newcommand{\nuseqk}[1]{\seqk{\nu}{#1}} \newcommand{\pair}[2]{\langle {#1}, {#2}\rangle} \newcommand{\rationals}{\mathbb{Q}} \newcommand{\reals}{\mathbb{R}} \newcommand{\seq}[1]{\left\langle{#1}\right\rangle} \newcommand{\powerset}{\mathcal{P}} \newcommand{\pprealk}[1]{\reals_{++}^{#1}} \newcommand{\ppreals}{\mathbb{R}_{++}} \newcommand{\prealk}[1]{\reals_{+}^{#1}} \newcommand{\preals}{\mathbb{R}_+} \newcommand{\tXJ}{\topos{X}{J}} % \newcommand{\relint}{\mathop{\bf relint {}}} \newcommand{\boundary}{\mathop{\bf bd {}}} \newcommand{\subsetset}[1]{\mathcal{#1}} \newcommand{\Tr}{\mathcal{\bf Tr}} \newcommand{\symset}[1]{\mathbf{S}^{#1}} \newcommand{\possemidefset}[1]{\mathbf{S}_+^{#1}} \newcommand{\posdefset}[1]{\mathbf{S}_{++}^{#1}} \newcommand{\ones}{\mathbf{1}} \newcommand{\Prob}{\mathop{\bf Prob {}}} \newcommand{\prob}[1]{\Prob\left\{#1\right\}} \newcommand{\Expect}{\mathop{\bf E {}}} \newcommand{\Var}{\mathop{\bf Var{}}} \newcommand{\Mod}[1]{\;(\text{mod}\;#1)} \newcommand{\ball}[2]{B(#1,#2)} \newcommand{\generates}[1]{\langle {#1} \rangle} \newcommand{\isomorph}{\approx} \newcommand{\isomorph}{\approx} \newcommand{\nullspace}{\mathcalfont{N}} \newcommand{\range}{\mathcalfont{R}} \newcommand{\diag}{\mathop{\bf diag {}}} \newcommand{\rank}{\mathop{\bf rank {}}} \newcommand{\Ker}{\mathop{\mathrm{Ker} {}}} \newcommand{\Map}{\mathop{\mathrm{Map} {}}} \newcommand{\End}{\mathop{\mathrm{End} {}}} \newcommand{\Img}{\mathop{\mathrm{Im} {}}} \newcommand{\Aut}{\mathop{\mathrm{Aut} {}}} \newcommand{\Gal}{\mathop{\mathrm{Gal} {}}} \newcommand{\Irr}{\mathop{\mathrm{Irr} {}}} \newcommand{\arginf}{\mathop{\mathrm{arginf}}} \newcommand{\argsup}{\mathop{\mathrm{argsup}}} \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\ev}{\mathop{\mathrm{ev} {}}} \newcommand{\affinehull}{\mathop{\bf aff {}}} \newcommand{\cvxhull}{\mathop{\bf Conv {}}} \newcommand{\epi}{\mathop{\bf epi {}}} \newcommand{\injhomeo}{\hookrightarrow} \newcommand{\perm}[1]{\text{Perm}(#1)} \newcommand{\aut}[1]{\text{Aut}(#1)} \newcommand{\ideal}[1]{\mathfrak{#1}} \newcommand{\bigset}[2]{\left\{#1\left|{#2}\right.\right\}} \newcommand{\bigsetl}[2]{\left\{\left.{#1}\right|{#2}\right\}} \newcommand{\primefield}[1]{\field_{#1}} \newcommand{\dimext}[2]{[#1:{#2}]} \newcommand{\restrict}[2]{#1|{#2}} \newcommand{\algclosure}[1]{#1^\mathrm{a}} \newcommand{\finitefield}[2]{\field_{#1^{#2}}} \newcommand{\frobmap}[2]{\varphi_{#1,{#2}}} % %\newcommand{\algfontmode}{} % %\ifdefined\algfontmode %\newcommand\mathalgfont[1]{\mathcal{#1}} %\newcommand\mathcalfont[1]{\mathscr{#1}} %\else \newcommand\mathalgfont[1]{\mathscr{#1}} \newcommand\mathcalfont[1]{\mathcal{#1}} %\fi % %\def\DeltaSirDir{yes} %\newcommand\sdirletter[2]{\ifthenelse{\equal{\DeltaSirDir}{yes}}{\ensuremath{\Delta #1}}{\ensuremath{#2}}} \newcommand{\sdirletter}[2]{\Delta #1} \newcommand{\sdirlbd}{\sdirletter{\lambda}{\Delta \lambda}} \newcommand{\sdir}{\sdirletter{x}{v}} \newcommand{\seqk}[2]{#1^{(#2)}} \newcommand{\seqscr}[3]{\seq{#1}_{#2}^{#3}} 
\newcommand{\xseqk}[1]{\seqk{x}{#1}} \newcommand{\sdirk}[1]{\seqk{\sdir}{#1}} \newcommand{\sdiry}{\sdirletter{y}{\Delta y}} \newcommand{\slen}{t} \newcommand{\slenk}[1]{\seqk{\slen}{#1}} \newcommand{\ntsdir}{\sdir_\mathrm{nt}} \newcommand{\pdsdir}{\sdir_\mathrm{pd}} \newcommand{\sdirnu}{\sdirletter{\nu}{w}} \newcommand{\pdsdirnu}{\sdirnu_\mathrm{pd}} \newcommand{\pdsdiry}{\sdiry_\mathrm{pd}} \newcommand\pdsdirlbd{\sdirlbd_\mathrm{pd}} % \newcommand{\normal}{\mathcalfont{N}} % \newcommand{\algk}[1]{\mathalgfont{#1}} \newcommand{\collk}[1]{\mathcalfont{#1}} \newcommand{\classk}[1]{\collk{#1}} \newcommand{\indexedcol}[1]{\{#1\}} \newcommand{\rel}{\mathbf{R}} \newcommand{\relxy}[2]{#1\;\rel\;{#2}} \newcommand{\innerp}[2]{\langle{#1},{#2}\rangle} \newcommand{\innerpt}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\closure}[1]{\overline{#1}} \newcommand{\support}{\mathbf{support}} \newcommand{\set}[2]{\{#1|#2\}} \newcommand{\metrics}[2]{\langle {#1}, {#2}\rangle} \newcommand{\interior}[1]{#1^\circ} \newcommand{\topol}[1]{\mathfrak{#1}} \newcommand{\topos}[2]{\langle {#1}, \topol{#2}\rangle} % topological space % \newcommand{\alg}{\algk{A}} \newcommand{\algB}{\algk{B}} \newcommand{\algF}{\algk{F}} \newcommand{\algR}{\algk{R}} \newcommand{\algX}{\algk{X}} \newcommand{\algY}{\algk{Y}} % \newcommand\coll{\collk{C}} \newcommand\collB{\collk{B}} \newcommand\collF{\collk{F}} \newcommand\collG{\collk{G}} \newcommand{\tJ}{\topol{J}} \newcommand{\tS}{\topol{S}} \newcommand\openconv{\collk{U}} % \newenvironment{my-matrix}[1]{\begin{bmatrix}}{\end{bmatrix}} \newcommand{\colvectwo}[2]{\begin{my-matrix}{c}{#1}\\{#2}\end{my-matrix}} \newcommand{\colvecthree}[3]{\begin{my-matrix}{c}{#1}\\{#2}\\{#3}\end{my-matrix}} \newcommand{\rowvecthree}[3]{\begin{bmatrix}{#1}&{#2}&{#3}\end{bmatrix}} \newcommand{\mattwotwo}[4]{\begin{bmatrix}{#1}&{#2}\\{#3}&{#4}\end{bmatrix}} % \newcommand\optfdk[2]{#1^\mathrm{#2}} \newcommand\tildeoptfdk[2]{\tilde{#1}^\mathrm{#2}} \newcommand\fobj{\optfdk{f}{obj}} \newcommand\fie{\optfdk{f}{ie}} \newcommand\feq{\optfdk{f}{eq}} \newcommand\tildefobj{\tildeoptfdk{f}{obj}} \newcommand\tildefie{\tildeoptfdk{f}{ie}} \newcommand\tildefeq{\tildeoptfdk{f}{eq}} \newcommand\xdomain{\mathcalfont{X}} \newcommand\xobj{\optfdk{\xdomain}{obj}} \newcommand\xie{\optfdk{\xdomain}{ie}} \newcommand\xeq{\optfdk{\xdomain}{eq}} \newcommand\optdomain{\mathcalfont{D}} \newcommand\optfeasset{\mathcalfont{F}} % \newcommand{\bigpropercone}{\mathcalfont{K}} % \newcommand{\prescript}[3]{\;^{#1}{#3}} % %\]

Introduction

Preamble

Notations

  • sets of numbers
    • $\naturals$ - set of natural numbers
    • $\integers$ - set of integers
    • $\integers_+$ - set of nonnegative integers
    • $\rationals$ - set of rational numbers
    • $\reals$ - set of real numbers
    • $\preals$ - set of nonnegative real numbers
    • $\ppreals$ - set of positive real numbers
    • $\complexes$ - set of complex numbers
  • sequences $\seq{x_i}$ and the like
    • finite $\seq{x_i}_{i=1}^n$, infinite $\seq{x_i}_{i=1}^\infty$ - use $\seq{x_i}$ whenever unambiguously understood
    • similarly for other operations, e.g., $\sum x_i$, $\prod x_i$, $\cup A_i$, $\cap A_i$, $\bigtimes A_i$
    • similarly for integrals, e.g., $\int f$ for $\int_{-\infty}^\infty f$
  • sets
    • $\compl{A}$ - complement of $A$
    • $A\sim B$ - $A\cap \compl{B}$
    • $A\Delta B$ - $(A\cap \compl{B}) \cup (\compl{A} \cap B)$
    • $\powerset(A)$ - set of all subsets of $A$
  • sets in metric vector spaces
    • $\closure{A}$ - closure of set $A$
    • $\interior{A}$ - interior of set $A$
    • $\relint A$ - relative interior of set $A$
    • $\boundary A$ - boundary of set $A$
  • set algebra
    • $\sigma(\subsetset{A})$ - $\sigma$-algebra generated by $\subsetset{A}$, i.e., smallest $\sigma$-algebra containing $\subsetset{A}$
  • norms in $\reals^n$
    • $\|x\|_p$ ($p\geq1$) - $p$-norm of $x\in\reals^n$, i.e., $(|x_1|^p + \cdots + |x_n|^p)^{1/p}$
    • e.g., $\|x\|_2$ - Euclidean norm
  • matrices and vectors
    • $a_{i}$ - $i$-th entry of vector $a$
    • $A_{ij}$ - entry of matrix $A$ at position $(i,j)$, i.e., entry in $i$-th row and $j$-th column
    • $\Tr(A)$ - trace of $A \in\reals^{n\times n}$, i.e., $A_{1,1}+ \cdots + A_{n,n}$
  • symmetric, positive definite, and positive semi-definite matrices
    • $\symset{n}\subset \reals^{n\times n}$ - set of symmetric matrices
    • $\possemidefset{n}\subset \symset{n}$ - set of positive semi-definite matrices; $A\succeq0 \Leftrightarrow A \in \possemidefset{n}$
    • $\posdefset{n}\subset \symset{n}$ - set of positive definite matrices; $A\succ0 \Leftrightarrow A \in \posdefset{n}$
  • sometimes, use Python script-like notations (with serious abuse of mathematical notations)
    • use $f:\reals\to\reals$ as if it were $f:\reals^n \to \reals^n$, e.g., $$ \exp(x) = (\exp(x_1), \ldots, \exp(x_n)) \quad \mbox{for } x\in\reals^n $$ and $$ \log(x) = (\log(x_1), \ldots, \log(x_n)) \quad \mbox{for } x\in\ppreals^n $$ which corresponds to Python code numpy.exp(x) or numpy.log(x) where x is instance of numpy.ndarray, i.e., numpy array
    • use $\sum x$ to mean $\ones^T x$ for $x\in\reals^n$, i.e. $$ \sum x = x_1 + \cdots + x_n $$ which corresponds to Python code x.sum() where x is numpy array
    • use $x/y$ for $x,y\in\reals^n$ to mean $$ \rowvecthree{x_1/y_1}{\cdots}{x_n/y_n}^T $$ which corresponds to Python code x / y where x and y are $1$-d numpy arrays
    • use $X/Y$ for $X,Y\in\reals^{m\times n}$ to mean $$ \begin{my-matrix}{cccc} X_{1,1}/Y_{1,1} & X_{1,2}/Y_{1,2} & \cdots & X_{1,n}/Y_{1,n} \\ X_{2,1}/Y_{2,1} & X_{2,2}/Y_{2,2} & \cdots & X_{2,n}/Y_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m,1}/Y_{m,1} & X_{m,2}/Y_{m,2} & \cdots & X_{m,n}/Y_{m,n} \end{my-matrix} $$ which corresponds to Python code X / Y where X and Y are $2$-d numpy arrays
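
For concreteness, here is a minimal numpy sketch of these conventions (array values are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(np.exp(x))   # elementwise exp: (e^{x_1}, ..., e^{x_n})
print(np.log(x))   # elementwise log, requires x in R_{++}^n
print(x.sum())     # sum x = 1^T x = 6.0
print(x / y)       # elementwise division (x_1/y_1, ..., x_n/y_n)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[2.0, 4.0], [6.0, 8.0]])
print(X / Y)       # elementwise division of matrices
```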

Some definitions

statement $P_n$, said to happen infinitely often or i.o. if $$ \left( \forall N\in\naturals \right) \left( \exists n > N \right) \left( P_n \right) $$
statement $P(x)$, said to happen almost everywhere or a.e. or almost surely or a.s. (depending on context) associated with measure space $\meas{X}{\algB}{\mu}$ if $$ \mu \set{x}{\sim P(x)} = 0 $$ or, equivalently when $\mu$ is probability measure, $$ \mu \set{x}{P(x)} = 1 $$

Some conventions

  • (for some subjects) use following conventions
    • $0\cdot \infty = \infty \cdot 0 = 0$
    • $(\forall x\in\ppreals)(x\cdot \infty = \infty \cdot x = \infty)$
    • $\infty \cdot \infty = \infty$

Measure-theoretic Treatment of Probabilities

Probability Measure

Measurable functions

  • denote $n$-dimensional Borel sets by $\algR^n$
  • for two measurable spaces, $\measu{\Omega}{\algF}$ and $\measu{\Omega'}{\algF'}$, function, $f:\Omega \to \Omega'$ with $$ \left( \forall A' \in \algF' \right) \left( f^{-1}(A') \in \algF \right) $$ said to be measurable with respect to $\algF/\algF'$ (thus, measurable functions defined earlier can be said to be measurable with respect to $\collk{B}/\algR$)
  • when $\Omega=\reals^n$ in $\measu{\Omega}{\algF}$, $\algF$ is assumed to be $\algR^n$, and sometimes drop $\algR^n$
    • thus, e.g., we say $f:\Omega\to\reals^n$ is measurable with respect to $\algF$ (instead of $\algF/\algR^n$)
  • measurable function, $f:\reals^n\to\reals^m$ (i.e., measurable with respect to $\algR^n/\algR^m$), called Borel function
  • $f:\Omega\to\reals^n$ is measurable with respect to $\algF/\algR^n$ if and only if every component, $f_i:\Omega\to\reals$, is measurable with respect to $\algF/\algR$

Probability (measure) spaces

  • set function, $P:\algk{F}\to[0,1]$, defined on algebra, $\algk{F}$, of set $\Omega$, satisfying following properties, called probability measure (note resemblance with general measure spaces)
    • $(\forall A\in\algk{F})(0\leq P(A)\leq 1)$
    • $P(\emptyset) = 0,\ P(\Omega) = 1$
    • $(\forall \mbox{ disjoint } \seq{A_n} \subset \algk{F} \mbox{ with } \bigcup A_n \in \algk{F})(P\left(\bigcup A_n\right) = \sum P(A_n))$
  • for $\sigma$-algebra, $\algk{F}$, $\meas{\Omega}{\algk{F}}{P}$, called probability measure space or probability space
  • set $A\in\algk{F}$ with $P(A)=1$, called a support of $P$
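
As a toy illustration (not part of the formal development), a short Python check of these properties on a finite space, with the power set as $\algk{F}$ and arbitrary point masses:

```python
import itertools

# hypothetical point masses on Omega = {0, 1, 2, 3}, summing to 1
omega = [0, 1, 2, 3]
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

def P(A):
    return sum(p[w] for w in A)

# F = power set of Omega (an algebra, here also a sigma-algebra)
F = [frozenset(s) for r in range(5) for s in itertools.combinations(omega, r)]

assert P(frozenset()) == 0 and abs(P(frozenset(omega)) - 1) < 1e-12
assert all(0 <= P(A) <= 1 for A in F)
for A, B in itertools.combinations(F, 2):   # additivity on disjoint sets
    if not (A & B):
        assert abs(P(A | B) - (P(A) + P(B))) < 1e-12
print("all probability-measure axioms hold on this toy space")
```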

Dynkin's $\pi$-$\lambda$ theorem

  • class, $\subsetset{P}$, of subsets of $\Omega$ closed under finite intersection, called $\pi$-system, i.e.,
    • $(\forall A,B\in \subsetset{P})(A\cap B\in\subsetset{P})$
  • class, $\subsetset{L}$, of subsets of $\Omega$ containing $\Omega$, closed under complements and countable disjoint unions, called $\lambda$-system, i.e.,
    • $\Omega \in \subsetset{L}$
    • $(\forall A\in \subsetset{L})(\compl{A}\in\subsetset{L})$
    • $(\forall \mbox{ disjoint }\seq{A_n}\subset\subsetset{L})(\bigcup A_n \in \subsetset{L})$
  • class that is both $\pi$-system and $\lambda$-system is $\sigma$-algebra
  • Dynkin's $\pi$-$\lambda$ theorem - for $\pi$-system, $\subsetset{P}$, and $\lambda$-system, $\subsetset{L}$, with $\subsetset{P} \subset \subsetset{L}$, $$ \sigma(\subsetset{P}) \subset \subsetset{L} $$
  • for $\pi$-system, $\algk{P}$, two probability measures, $P_1$ and $P_2$, on $\sigma(\algk{P})$, agreeing on $\algk{P}$, agree on $\sigma(\algk{P})$

Limits of Events

  • for sequence of subsets, $\seq{A_n}$, $$ P(\liminf A_n) \leq \liminf P(A_n) \leq \limsup P(A_n) \leq P(\limsup A_n) $$
  • for $\seq{A_n}$ converging to $A$ $$ \lim P(A_n) = P(A) $$
  • for sequence of independent $\pi$-systems, $\seq{\algA_n}$, $\seq{\sigma(\algA_n)}$ is independent

Probabilistic independence

  • given probability space, $\meas{\Omega}{\algk{F}}{P}$
  • $A,B\in\algk{F}$ with $$ P(A\cap B) = P(A) P(B) $$ said to be independent
  • indexed collection, $\seq{A_\lambda}$, with $$ \left( \forall n\in\naturals, \mbox{ distinct } \lambda_1, \ldots, \lambda_n \in \Lambda \right) \left( P\left(\bigcap_{i=1}^n A_{\lambda_i}\right) = \prod_{i=1}^n P(A_{\lambda_i}) \right) $$ said to be independent

Independence of classes of events

  • indexed collection, $\seq{\subsetset{A}_\lambda}$, of classes of events (i.e., subsets) with $$ \left( \forall A_\lambda \in \subsetset{A}_\lambda \right) \left( \seq{A_\lambda} \mbox{ are independent} \right) $$ said to be independent
  • for independent indexed collection, $\seq{\subsetset{A}_\lambda}$, with every $\subsetset{A}_\lambda$ being $\pi$-system, $\seq{\sigma(\subsetset{A}_\lambda)}$ are independent
  • for independent (countable) collection of events, $\seq{\seq{A_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{A_{ni}}_{i=1}^\infty)$ are independent

Borel-Cantelli lemmas

  • for sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ converging $$ P(\limsup A_n) = 0 $$
  • for independent sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ diverging $$ P(\limsup A_n)=1 $$
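
Both lemmas can be illustrated by simulation; a sketch with independent events, $A_n$, where $P(A_n)=1/n^2$ (summable) versus $P(A_n)=1/n$ (divergent):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
n = np.arange(1, N + 1)

occ_sq = rng.random(N) < 1.0 / n**2   # P(A_n) = 1/n^2: sum converges
occ_hm = rng.random(N) < 1.0 / n      # P(A_n) = 1/n:   sum diverges

# first case: a handful of occurrences, all at small n; second: occurrences keep coming
print("1/n^2:", occ_sq.sum(), "occurrences, last at n =", n[occ_sq].max())
print("1/n:  ", occ_hm.sum(), "occurrences, last at n =", n[occ_hm].max())
```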

Tail events and Kolmogorov's zero-one law

  • for sequence of events, $\seq{A_n}$ $$ \algk{T} = \bigcap_{n=1}^\infty \sigma\left(\seq{A_i}_{i=n}^\infty\right) $$ called tail $\sigma$-algebra associated with $\seq{A_n}$; its elements are called tail events
  • Kolmogorov's zero-one law - for independent sequence of events, $\seq{A_n}$, every event in tail $\sigma$-algebra has probability measure either $0$ or $1$

Product probability spaces

  • for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$, want to find product measure, $\pi$, such that $$ \left( \forall A\in \algX, B\in\algY \right) \left( \pi(A\times B) = \mu(A)\nu(B) \right) $$
    • e.g., if both $\mu$ and $\nu$ are Lebesgue measure on $\reals$, $\pi$ will be Lebesgue measure on $\reals^2$
  • $A\times B$ for $A\in\algX$ and $B\in\algY$ is measurable rectangle
  • $\sigma$-algebra generated by measurable rectangles denoted by $$ \algX \times \algY $$
    • thus, not Cartesian product in usual sense
    • generally much larger than class of measurable rectangles

Sections of measurable subsets and functions

  • for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
  • sections of measurable subsets - for $E\subset X\times Y$
    • $\set{y\in Y}{(x,y)\in E}$ is section of $E$ determined by $x$
    • $\set{x\in X}{(x,y)\in E}$ is section of $E$ determined by $y$
  • sections of measurable functions - for measurable function, $f$, with respect to $\algX\times \algY$
    • $f(x,\cdot)$ is section of $f$ determined by $x$
    • $f(\cdot,y)$ is section of $f$ determined by $y$
  • sections of measurable subsets are measurable
    • $\left( \forall x\in X, E\in \algX \times \algY \right) \left( \set{y\in Y}{(x,y)\in E} \in \algY \right)$
    • $\left( \forall y\in Y, E\in \algX \times \algY \right) \left( \set{x\in X}{(x,y)\in E} \in \algX \right)$
  • sections of measurable functions are measurable
    • $f(x,\cdot)$ is measurable with respect to $\algY$ for every $x\in X$
    • $f(\cdot,y)$ is measurable with respect to $\algX$ for every $y\in Y$

Product measure

  • for two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
  • two functions defined below for every $E\in\algX\times\algY$ are $\sigma$-finite measures
    • $\pi'(E) = \int_X \nu\set{y\in Y}{(x,y)\in E} d\mu$
    • $\pi''(E) = \int_Y \mu\set{x\in X}{(x,y)\in E} d\nu$
  • for every measurable rectangle, $A\times B$, with $A\in\algX$ and $B\in\algY$ $$ \pi'(A\times B) = \pi''(A\times B) = \mu(A) \nu(B) $$
  • (use conventions stated above for extended real values)
  • indeed, $\pi'(E)=\pi''(E)$ for every $E\in\algX\times\algY$; let $\pi=\pi'=\pi''$
  • $\pi$ is
    • called product measure and denoted by $\mu\times \nu$
    • $\sigma$-finite measure
    • only measure such that $\pi(A\times B) =\mu(A) \nu(B)$ for every measurable rectangle

Fubini's theorem

  • suppose two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$, and function, $f$, measurable with respect to $\algX\times\algY$ - define
    • $X_0 = \set{x\in X}{\int_Y |f(x,y)|d\nu < \infty}\subset X$
    • $Y_0 = \set{y\in Y}{\int_X |f(x,y)|d\mu < \infty}\subset Y$
  • Fubini's theorem - for nonnegative measurable function, $f$, following are measurable with respect to $\algX$ and $\algY$ respectively $$ g(x) = \int_Y f(x,y)d\nu,\ \ h(y) = \int_X f(x,y)d\mu $$ and following holds $$ \int_{X\times Y} f(x,y) d\pi = \int_X \left(\int_Y f(x,y) d\nu\right)d\mu = \int_Y \left(\int_X f(x,y) d\mu\right)d\nu $$
  • for $f$, (not necessarily nonnegative) integrable function with respect to $\pi$
    • $\mu(X\sim X_0) = 0$, $\nu(Y\sim Y_0)=0$
    • $g$ and $h$ are finite and measurable on $X_0$ and $Y_0$ respectively
    • (above) equalities of double integral hold
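
A numeric sanity check of the iterated-integral identity, using scipy with an arbitrarily chosen integrand:

```python
import numpy as np
from scipy.integrate import dblquad, quad

# f(x, y) = x e^{-x-y} on [0, inf) x [0, inf), both measures Lebesgue
f = lambda x, y: x * np.exp(-x - y)

double, _ = dblquad(lambda y, x: f(x, y), 0, np.inf, 0, np.inf)
iter_xy, _ = quad(lambda x: quad(lambda y: f(x, y), 0, np.inf)[0], 0, np.inf)
iter_yx, _ = quad(lambda y: quad(lambda x: f(x, y), 0, np.inf)[0], 0, np.inf)

print(double, iter_xy, iter_yx)   # all approximately 1.0
```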

Random Variables

Random variables

  • for probability space, $\meas{\Omega}{\algk{F}}{P}$,
  • measurable function (with respect to $\algF/\algR$), $X:\Omega \to \reals$, called random variable
  • measurable function (with respect to $\algF/\algR^n$), $X:\Omega \to \reals^n$, called random vector
    • when expressing $X(\omega)=(X_1(\omega), \ldots, X_n(\omega))$, $X$ is measurable if and only if every $X_i$ is measurable
    • thus, $n$-dimensional random vector is simply $n$-tuple of random variables
  • smallest $\sigma$-algebra with respect to which $X$ is measurable, called $\sigma$-algebra generated by $X$ and denoted by $\sigma(X)$
    • $\sigma(X)$ consists exactly of sets, $\set{\omega\in \Omega}{X(\omega)\in H}$, for $H\in\algR^n$
    • random variable, $Y$, is measurable with respect to $\sigma(X)$ if and only if exists measurable function, $f:\reals^n\to\reals$ such that $Y(\omega) = f(X(\omega))$ for all $\omega$, i.e., $Y=f\circ X$

Probability distributions for random variables

  • probability measure on $\reals$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR $$ called distribution or law of random variable, $X$
  • function, $F:\reals\to[0,1]$, defined by $$ F(x) = \mu(-\infty, x] = P(X\leq x) $$ called distribution function or cumulative distribution function (CDF) of $X$
  • Borel set, $S$, with $\mu(S)=1$, called support
  • random variable, its distribution, its distribution function, said to be discrete when $\mu$ has countable support

Probability distribution of mappings of random variables

  • for measurable $g:\reals\to\reals$, $$ \left( \forall A\in\algR \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu (g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$

Probability density for random variables

  • Borel function, $f: \reals\to\preals$, satisfying $$ \left( \forall A \in \algR \right) \left( \mu(A) = P(X\in A) = \int_A f(x) dx \right) $$ called density or probability density function (PDF) of random variable
  • above is equivalent to $$ \left( \forall a < b \in \reals \right) \left( \int_a^b f(x) dx = P(a<X\leq b) = F(b) - F(a) \right) $$
  • relation between $F$ and $f$ (see also numeric check below)
    • note, though, $F$ does not need to differentiate to $f$ everywhere; only $f$ required to integrate properly
    • if $F$ does differentiate to $f$ and $f$ is continuous, fundamental theorem of calculus implies $f$ indeed is density for $F$
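
A small numeric check of this relation for the standard normal, where $F$ is smooth and differentiates to $f$ everywhere:

```python
import numpy as np
from scipy import stats

# central difference of the CDF should reproduce the PDF where f is continuous
x = np.linspace(-3.0, 3.0, 601)
h = 1e-5
dFdx = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)
print(np.max(np.abs(dFdx - stats.norm.pdf(x))))   # ~1e-11
```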

Probability distribution for random vectors

  • (similarly to random variables) probability measure on $\reals^n$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR^n $$ called distribution or law of random vector, $X$
  • function, $F:\reals^n\to[0,1]$, defined by $$ F(x) = \mu(S_x) = P(X\preceq x) $$ where $$ S_x = \set{y\in \reals^n}{y\preceq x} = \set{y\in \reals^n}{y_i\leq x_i,\ i=1,\ldots,n} $$ called distribution function or cumulative distribution function (CDF) of $X$
  • (similarly to random variables) random vector, its distribution, its distribution function, said to be discrete when has countable support

Marginal distribution for random vectors

  • (similarly to random variables) for measurable $g:\reals^n\to\reals^m$ $$ \left( \forall A\in\algR^{m} \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu(g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$
  • for $g_i:\reals^n\to\reals$ with $g_i(x) = x_i$ $$ \left( \forall A\in\algR \right) \left( \prob{g_i(X)\in A} = \prob{X_i \in A} \right) $$
  • measure, $\mu_i$, defined by $\mu_i(A) = \prob{X_i\in A}$, called ($i$-th) marginal distribution of $X$
  • for $\mu$ having density function, $f:\reals^n\to\preals$, density function of ($i$-th) marginal distribution is $$ f_i(x_i) = \int_{\reals^{n-1}} f(x) \, dx_{-i} $$ where $x_{-i} = (x_1,\ldots,x_{i-1}, x_{i+1}, \ldots, x_n)$, i.e., all coordinates but $x_i$ integrated out (see the numeric sketch below)
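
A numeric sketch of this marginalization, with a joint density built (for illustration) from independent standard normal and exponential factors:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# hypothetical joint density on R^2: f(x1, x2) = phi(x1) * e^{-x2} (x2 >= 0)
f = lambda x1, x2: stats.norm.pdf(x1) * stats.expon.pdf(x2)

x1 = 0.7
f1, _ = quad(lambda x2: f(x1, x2), 0, np.inf)   # integrate out x2
print(f1, stats.norm.pdf(x1))                   # first marginal density at x1 = 0.7
```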

Independence of random variables

  • random variables, $X_1$, $\ldots$, $X_n$, with independent $\sigma$-algebras generated by them, said to be independent
  • (refer to independence of classes of events above)
    • because $\sigma(X_i) = X_i^{-1}(\algR)=\set{X_i^{-1}(H)}{H\in\algR}$, independent if and only if $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(X_1\in H_1,\ldots, X_n\in H_n\right) = \prod P\left(X_i\in H_i\right) \right) $$ i.e., $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right) $$

Equivalent statements of independence of random variables

  • for random variables, $X_1$, $\ldots$, $X_n$, having $\mu$ and $F:\reals^n\to[0,1]$ as their distribution and CDF, with each $X_i$ having $\mu_i$ and $F_i:\reals\to[0,1]$ as its distribution and CDF, following statements are equivalent (see also empirical check below)
    • $X_1,\ldots,X_n \mbox{ are independent}$
    • $\left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
    • $\left( \forall H_1,\ldots,H_n \in \algR \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
    • $\left( \forall x\in \reals^n \right) \left( P(X_1\leq x_1,\ldots, X_n\leq x_n) = \prod P(X_i \leq x_i) \right)$
    • $\left( \forall x \in \reals^n \right) \left( F(x) = \prod F_i(x_i) \right)$
    • $\mu = \mu_1 \times \cdots \times \mu_n$
    • $\left( \forall x \in \reals^n \right) \left( f(x) = \prod f_i(x_i) \right)$ (when joint density $f$ and marginal densities $f_i$ exist)
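
A quick empirical check of the CDF factorization on independent samples (distribution choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200_000
X = rng.standard_normal(m)
Y = rng.exponential(size=m)   # independent of X by construction

x0, y0 = 0.5, 1.0
F_joint = np.mean((X <= x0) & (Y <= y0))        # empirical F(x0, y0)
F_prod = np.mean(X <= x0) * np.mean(Y <= y0)    # empirical F_1(x0) F_2(y0)
print(F_joint, F_prod)                          # close, up to sampling error
```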

Independence of random variables with separate $\sigma$-algebra

  • given probability space, $\meas{\Omega}{\algk{F}}{P}$
  • random variables, $X_1$, $\ldots$, $X_n$, each of which is measurable with respect to each of $n$ independent $\sigma$-algebras, $\algk{G}_1\subset \algF$, $\ldots$, $\algk{G}_n\subset \algF$, respectively, are independent

Independence of random vectors

  • for random vectors, $X_1:\Omega\to\reals^{d_1}$, $\ldots$, $X_n:\Omega\to\reals^{d_n}$, having $\mu$ and $F:\reals^{d_1}\times\cdots\times\reals^{d_n}\to[0,1]$ as their distribution and CDF, with each $X_i$ having $\mu_i$ and $F_i:\reals^{d_i}\to[0,1]$ as its distribution and CDF, following statements are equivalent
    • $X_1,\ldots,X_n \mbox{ are independent}$
    • $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
    • $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
    • $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( P(X_1\preceq x_1,\ldots, X_n\preceq x_n) = \prod P(X_i \preceq x_i) \right)$
    • $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( F(x_1,\ldots,x_n) = \prod F_i(x_i) \right)$
    • $\mu = \mu_1 \times \cdots \times \mu_n$
    • $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( f(x_1,\ldots,x_n) = \prod f_i(x_i) \right)$

Independence of infinite collection of random vectors

  • infinite collection of random vectors for which every finite subcollection is independent, said to be independent
  • for independent (countable) collection of random vectors, $\seq{\seq{X_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{X_{ni}}_{i=1}^\infty)$ are independent

Probability evaluation for two independent random vectors

for independent random vectors, $X$ and $Y$, with distributions, $\mu$ and $\nu$, in $\reals^n$ and $\reals^m$ respectively $$ \left( \forall B\in\algR^{n+m} \right) \left( \prob{(X,Y)\in B} = \int_{\reals^n} \prob{(x,Y)\in B} d\mu \right) $$ and $$ \left( \forall A\in\algR^{n}, B\in\algR^{n+m} \right) \left( \prob{X\in A, (X,Y)\in B} = \int_{A} \prob{(x,Y)\in B} d\mu \right) $$

Sequence of random variables

for sequence of probability measures on $\algR$, $\seq{\mu_n}$, exists probability space, $\meas{\Omega}{\algF}{P}$, and sequence of independent random variables in $\reals$, $\seq{X_n}$, such that each $X_n$ has $\mu_n$ as distribution

Expected values

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, integral of $X$ with respect to measure, $P$ $$ \Expect X = \int X dP = \int_\Omega X(\omega) dP $$ called expected value of $X$
  • $\Expect X$ is
    • always defined for nonnegative $X$
    • for general $X$, defined if either $\Expect X^+<\infty$ or $\Expect X^-<\infty$ (or both), in which case $\Expect X =\Expect X^+ - \Expect X^-$
  • $X$ is integrable if and only if $\Expect |X| <\infty$
  • limits
    • if $X_n$ converges to $X$ in probability and $\seq{X_n}$ is dominated by integrable random variable or uniformly integrable, $\Expect X_n$ converges to $\Expect X$

Markov and Chebyshev's inequalities

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, and $\alpha>0$ $$ \prob{X\geq \alpha} \leq \frac{1}{\alpha} \int_{X\geq \alpha} X d P \leq \frac{1}{\alpha} \Expect X $$ for nonnegative $X$, hence $$ \prob{|X|\geq \alpha} \leq \frac{1}{\alpha^n} \int_{|X|\geq \alpha} |X|^n d P \leq \frac{1}{\alpha^n} \Expect |X|^n $$ for general $X$
as special case of Markov inequality, $$ \prob{|X-\Expect X|\geq \alpha} \leq \frac{1}{\alpha^2} \int_{|X-\Expect X|\geq \alpha} (X-\Expect X)^2 d P \leq \frac{1}{\alpha^2} \Var X $$ for general $X$
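
An empirical sketch of both bounds for an exponential random variable, for which $\Expect X = \Var X = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(size=1_000_000)   # nonnegative, E X = 1, Var X = 1

alpha = 3.0
print(np.mean(X >= alpha), "<=", 1 / alpha)                 # Markov bound
print(np.mean(np.abs(X - 1) >= alpha), "<=", 1 / alpha**2)  # Chebyshev bound
```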

Jensen's, Hölder's, and Lyapunov's inequalities

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, and convex function, $\varphi$ $$ \varphi\left(\Expect X\right) \leq \Expect \varphi(X) $$
for two random variables, $X$ and $Y$, on $\meas{\Omega}{\algF}{P}$, and $p,q\in(1,\infty)$ with $1/p+1/q=1$ $$ \Expect |XY| \leq \left(\Expect |X|^p\right)^{1/p} \left(\Expect |Y|^q\right)^{1/q} $$
for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, and $0<\alpha<\beta$ $$ \left(\Expect |X|^\alpha\right)^{1/\alpha} \leq \left(\Expect |X|^\beta\right)^{1/\beta} $$
  • note Hölder's inequality implies Lyapunov's inequality

Maximal inequalities

if $A\in\algk{T}=\bigcap_{n=1}^\infty \sigma(X_n, X_{n+1},\ldots)$ for independent $\seq{X_n}$, $$ \prob{A} = 0 \vee \prob{A} = 1 $$ (Kolmogorov's zero-one law for random variables)

– define $S_n = \sum_{i=1}^n X_i$

for independent $\seq{X_i}_{i=1}^n$ with $\Expect X_i =0$ and $\Var X_i<\infty$ and $\alpha>0$ $$ \prob{\max_{i\leq n} |S_i| \geq \alpha} \leq \frac{1}{\alpha^2}\Var S_n $$ (Kolmogorov's maximal inequality)
for independent $\seq{X_i}_{i=1}^n$ and $\alpha>0$ $$ \prob{\max_{i\leq n} |S_i| \geq 3\alpha} \leq 3 \max_{i\leq n} \prob{|S_i|\geq\alpha} $$ (Etemadi's inequality)

Moments

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, and $k\in\naturals$ $$ \Expect X^k = \int x^k d\mu = \int x^k dF(x) $$ called $k$-th moment of $X$ or $\mu$ or $F$, and $$ \Expect |X|^k = \int |x|^k d\mu = \int |x|^k dF(x) $$ called $k$-th absolute moment of $X$ or $\mu$ or $F$
  • if $\Expect |X|^n<\infty$, $\Expect |X|^k<\infty$ for $k<n$
  • $\Expect X^n$ defined only when $\Expect|X|^n<\infty$

Moment generating functions

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, $M:\complexes \to \complexes$ defined by $$ M(s) = \Expect \left( e^{sX} \right) = \int e^{sx} d\mu = \int e^{sx} dF(x) $$ called moment generating function of $X$
  • $n$-th derivative of $M$ with respect to $s$ is $M^{(n)}(s) = \frac{d^n}{ds^n} M(s) = \Expect \left(X^ne^{sX}\right) = \int x^n e^{sx} d\mu$
  • thus, $n$-th derivative of $M$ with respect to $s$ at $s=0$ is $n$-th moment of $X$ $$ M^{(n)}(0) = \Expect X^n $$
  • for independent random variables, $\seq{X_i}_{i=1}^n$, with moment generating functions, $M_i$, moment generating function of $\sum X_i$ is $$ \prod M_i(s) $$
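
A symbolic sketch with sympy for $X\sim\mathrm{Exp}(1)$, whose moment generating function is $M(s)=1/(1-s)$ for $s<1$, so $\Expect X^n = n!$:

```python
import sympy as sp

s = sp.symbols('s')
M = 1 / (1 - s)          # MGF of Exp(1), valid for s < 1

# M^(n)(0) recovers the n-th moment: E X^n = n!
for n in range(1, 5):
    print(n, sp.diff(M, s, n).subs(s, 0))   # 1, 2, 6, 24

# for independent X_1, X_2 ~ Exp(1), MGF of X_1 + X_2 is the product M(s)^2
print(sp.diff(M**2, s, 1).subs(s, 0))       # E(X_1 + X_2) = 2
```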

Convergence of Random Variables

Convergences of random variables

random variables, $\seq{X_n}$, with $$ \prob{\lim X_n = X} = P(\set{\omega \in \Omega}{\lim X_n(\omega) = X(\omega)}) = 1 $$ said to converge to $X$ with probability $1$ and denoted by $X_n\to X$ a.s.
random variables, $\seq{X_n}$, with $$ \left( \forall \epsilon>0 \right) \left( \lim \prob{|X_n-X|>\epsilon} = 0 \right) $$ said to converge to $X$ in probability
distribution functions, $\seq{F_n}$, with $$ \left( \forall x \mbox{ at which } F \mbox{ is continuous} \right) \left( \lim F_n(x) = F(x) \right) $$ said to converge weakly to distribution function, $F$, and denoted by $F_n \Rightarrow F$
When $F_n\Rightarrow F$, associated random variables, $\seq{X_n}$, said to converge in distribution to $X$, associated with $F$, and denoted by $X_n \Rightarrow X$
for measures on $\measu{\reals}{\algR}$, $\seq{\mu_n}$, associated with distribution functions, $\seq{F_n}$, respectively, and measure on $\measu{\reals}{\algR}$, $\mu$, associated with distribution function, $F$, we denote $$ \mu_n \Rightarrow \mu $$ if $$ \left( \forall A = (-\infty, x] \mbox{ with } \mu\{x\} = 0 \right) \left( \lim \mu_n(A) = \mu(A) \right) $$
  • indeed, if above equation holds for all such $A$, it holds for many other subsets

Relations of different types of convergences of random variables

convergence with probability $1$ implies convergence in probability, which implies $X_n\Rightarrow X$, i.e., $$ \begin{eqnarray*} && X_n \to X \mbox{ a.s., i.e., } X_n \mbox{ converge to } X \mbox{ with probability $1$} \\ &\Rightarrow& X_n \mbox{ converge to } X \mbox{ in probability} \\ &\Rightarrow& X_n \Rightarrow X \mbox{, i.e., } X_n \mbox{ converge to } X \mbox{ in distribution} \end{eqnarray*} $$

Necessary and sufficient conditions for convergence with probability $1$ and in probability

\[X_n \to X \mbox{ a.s., i.e., } X_n \mbox{ converge to } X \mbox{ with probability } 1\]

if and only if

\[\left( \forall \epsilon>0 \right) \left( \prob{|X_n-X|>\epsilon\mbox{ i.o.}} = \prob{\limsup\, \{|X_n-X| > \epsilon\} } = 0 \right)\]

while

\[X_n \mbox{ converge to } X \mbox{ in probability}\]

if and only if

\[\left( \forall \mbox{ subsequence }\seq{X_{n_k}} \right) \left( \exists \mbox{ its subsequence }\seq{X_{n_{k_l}}} \mbox{ converging to } X \mbox{ with probability } 1 \right)\]

Necessary and sufficient conditions for convergence in distribution

\[X_n\Rightarrow X, \mbox{ i.e., $X_n$ converge in distribution}\]

if and only if

\[F_n\Rightarrow F, \mbox{ i.e., $F_n$ converge weakly}\]

if and only if

\[\left( \forall A = (-\infty, x] \mbox{ with } \mu\{x\} = 0 \right) \left( \lim \mu_n(A) = \mu(A) \right)\]

if and only if

\[\left( \forall x \mbox{ with } \prob{X=x} = 0 \right) \left( \lim \prob{X_n\leq x} = \prob{X\leq x} \right)\]

Strong law of large numbers

– define $S_n = \sum_{i=1}^n X_i$

for sequence of independent and identically distributed (i.i.d.) random variables with finite mean, $\seq{X_n}$ $$ \frac{1}{n} S_n \to \Expect X_1 $$ with probability $1$
  • strong law of large numbers also called Kolmogorov's law
for sequence of independent and identically distributed (i.i.d.) random variables with $\Expect X_1^- < \infty$ and $\Expect X_1^+ = \infty$ (hence, $\Expect X_1 = \infty$) $$ \frac{1}{n} S_n \to \infty $$ with probability $1$

Weak law of large numbers

– define $S_n = \sum_{i=1}^n X_i$

for sequence of independent and identically distributed (i.i.d.) random variables with finite mean, $\seq{X_n}$ $$ \frac{1}{n} S_n \to \Expect X_1 $$ in probability
  • because convergence with probability $1$ implies convergence in probability (see above), strong law of large numbers implies weak law of large numbers
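
A simulation sketch of the law of large numbers for i.i.d. $\mathrm{Exp}(1)$ samples, for which $\Expect X_1 = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(size=1_000_000)   # i.i.d., E X_1 = 1

running_mean = np.cumsum(X) / np.arange(1, X.size + 1)
for n in (100, 10_000, 1_000_000):
    print(n, running_mean[n - 1])     # S_n / n approaches 1
```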

Normal distributions

– assume probability space, $\meas{\Omega}{\algF}{P}$

Random variable, $X:\Omega\to\reals$, with $$ \left( \forall A\in\algR \right) \left( \prob{X\in A} = \frac{1}{\sqrt{2\pi}\sigma} \int_A e^{-(x-c)^2/2\sigma^2} dx \right) $$ for some $\sigma>0$ and $c\in\reals$, said to have normal distribution and denoted by $X \sim \normal(c,\sigma^2)$
  • note $\Expect X=c$ and $\Var X=\sigma^2$
  • called standard normal distribution when $c=0$ and $\sigma=1$

Multivariate normal distributions

– assume probability space, $\meas{\Omega}{\algF}{P}$

Random vector, $X:\Omega\to\reals^n$, with $$ \left( \forall A\in\algR^n \right) \left( \prob{X\in A} = \frac{1}{\sqrt{(2\pi)^n}\sqrt{\det \Sigma}} \int_A e^{-(x-c)^T\Sigma^{-1}(x-c)/2} dx \right) $$ for some $\Sigma\in\posdefset{n}$ and $c\in\reals^n$, said to have ($n$-dimensional) normal distribution, and denoted by $X \sim \normal(c,\Sigma)$
  • note that $\Expect X=c$ and covariance matrix is $\Sigma$

Lindeberg-Lévy theorem

– define $S_n = \sum_{i=1}^n X_i$

for independent random variables, $\seq{X_n}$, having same distribution with expected value, $c$, and same variance, $\sigma^2<\infty$, ${(S_n - nc)}/{\sigma\sqrt{n}}$ converges in distribution to standard normal, i.e., $$ \frac{S_n - nc}{\sigma\sqrt{n}} \Rightarrow N $$ where $N$ is random variable with standard normal distribution
  • implies $$ S_n / n \Rightarrow c $$
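
A simulation sketch comparing standardized sums of i.i.d. $\mathrm{Exp}(1)$ variables (so $c=\sigma=1$) against standard normal quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 500, 10_000
S = rng.exponential(size=(reps, n)).sum(axis=1)     # c = 1, sigma = 1

Z = (S - n) / np.sqrt(n)                            # (S_n - nc) / (sigma sqrt(n))
for q in (0.1, 0.5, 0.9):
    print(q, np.quantile(Z, q), stats.norm.ppf(q))  # empirical vs normal quantiles
```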

Limit theorems in $\reals^n$

each of following statements is equivalent to weak convergence of measures, $\seq{\mu_n}$, to $\mu$, on measurable space, $\measu{\reals^k}{\algR^k}$ (portmanteau theorem)
  • $\lim \int f d\mu_n = \int f d\mu$ for every bounded continuous $f$
  • $\limsup \mu_n(C) \leq \mu(C)$ for every closed $C$
  • $\liminf \mu_n(G) \geq \mu(G)$ for every open $G$
  • $\lim \mu_n(A) = \mu(A)$ for every $\mu$-continuity set $A$, i.e., $A$ with $\mu(\boundary A) = 0$
for $k$-dimensional random vectors, $\seq{X_n}$, and random vector, $Y$, $X_n\Rightarrow Y$, i.e., $X_n$ converge to $Y$ in distribution, if and only if $$ \left( \forall z\in \reals^k \right) \left( z^T X_n \Rightarrow z^T Y \right) $$ (Cramér-Wold device)

Central limit theorem

– assume probability space, $\meas{\Omega}{\algF}{P}$, and define $S_n = \sum_{i=1}^n X_i$

for independent random vectors, $\seq{X_n}$, having same distribution with $\Expect X_n = c\in\reals^k$ and positive definite covariance matrix, $\Sigma\in\posdefset{k}$, i.e., $\Expect(X_n-c)(X_n-c)^T = \Sigma$, where $\Sigma_{ii} < \infty$ (hence $\Sigma \prec M I_k$ for some $M\in\ppreals$ due to Cauchy-Schwarz inequality), $$ (S_n -nc)/\sqrt{n} \mbox{ converges in distribution to } Y $$ where $Y \sim \normal(0,\Sigma)$

Convergence of random series

  • for independent $\seq{X_n}$, probability of $\sum X_n$ converging is either $0$ or $1$
  • below characterize two cases in terms of distributions of individual $X_n$
for independent $\seq{X_n}$ with $\Expect X_n=0$ and $\sum \Var X_n < \infty$ $$ \sum X_n \mbox{ converges with probability $1$} $$
for independent $\seq{X_n}$, $\sum X_n$ converges with probability $1$ if and only if it converges in probability

– define truncated version of $X_n$ by $X_n^{(c)}$, i.e., $X_n I_{|X_n|\leq c}$

for independent $\seq{X_n}$, $\sum X_n$ converges with probability $1$ if all of $$ \sum \prob{|X_n|>c}, \quad \sum \Expect(X_n^{(c)}), \quad \sum \Var(X_n^{(c)}) $$ converge for some $c>0$ (Kolmogorov's three-series theorem)
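
A simulation sketch with $X_n = \pm 1/n$ (equal probability): $\Expect X_n = 0$ and $\sum \Var X_n = \sum 1/n^2 < \infty$, so partial sums should settle:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
signs = rng.choice([-1.0, 1.0], size=N)

partial = np.cumsum(signs / np.arange(1, N + 1))   # partial sums of sum X_n
print(partial[999], partial[99_999], partial[-1])  # nearly identical: series converges
```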
