20 minute read

posted: 01-Aug-2025 & updated: 03-Aug-2025

\[% \newcommand{\algA}{\algk{A}} \newcommand{\algC}{\algk{C}} \newcommand{\bigtimes}{\times} \newcommand{\compl}[1]{\tilde{#1}} \newcommand{\complexes}{\mathbb{C}} \newcommand{\dom}{\mathop{\bf dom {}}} \newcommand{\ereals}{\reals\cup\{-\infty,\infty\}} \newcommand{\field}{\mathbb{F}} \newcommand{\integers}{\mathbb{Z}} \newcommand{\lbdseqk}[1]{\seqk{\lambda}{#1}} \newcommand{\meas}[3]{({#1}, {#2}, {#3})} \newcommand{\measu}[2]{({#1}, {#2})} \newcommand{\meast}[3]{\left({#1}, {#2}, {#3}\right)} \newcommand{\naturals}{\mathbb{N}} \newcommand{\nuseqk}[1]{\seqk{\nu}{#1}} \newcommand{\pair}[2]{\langle {#1}, {#2}\rangle} \newcommand{\rationals}{\mathbb{Q}} \newcommand{\reals}{\mathbb{R}} \newcommand{\seq}[1]{\left\langle{#1}\right\rangle} \newcommand{\powerset}{\mathcal{P}} \newcommand{\pprealk}[1]{\reals_{++}^{#1}} \newcommand{\ppreals}{\mathbb{R}_{++}} \newcommand{\prealk}[1]{\reals_{+}^{#1}} \newcommand{\preals}{\mathbb{R}_+} \newcommand{\tXJ}{\topos{X}{J}} % \newcommand{\relint}{\mathop{\bf relint {}}} \newcommand{\boundary}{\mathop{\bf bd {}}} \newcommand{\subsetset}[1]{\mathcal{#1}} \newcommand{\Tr}{\mathcal{\bf Tr}} \newcommand{\symset}[1]{\mathbf{S}^{#1}} \newcommand{\possemidefset}[1]{\mathbf{S}_+^{#1}} \newcommand{\posdefset}[1]{\mathbf{S}_{++}^{#1}} \newcommand{\ones}{\mathbf{1}} \newcommand{\Prob}{\mathop{\bf Prob {}}} \newcommand{\prob}[1]{\Prob\left\{#1\right\}} \newcommand{\Expect}{\mathop{\bf E {}}} \newcommand{\Var}{\mathop{\bf Var{}}} \newcommand{\Mod}[1]{\;(\text{mod}\;#1)} \newcommand{\ball}[2]{B(#1,#2)} \newcommand{\generates}[1]{\langle {#1} \rangle} \newcommand{\isomorph}{\approx} \newcommand{\isomorph}{\approx} \newcommand{\nullspace}{\mathcalfont{N}} \newcommand{\range}{\mathcalfont{R}} \newcommand{\diag}{\mathop{\bf diag {}}} \newcommand{\rank}{\mathop{\bf rank {}}} \newcommand{\Ker}{\mathop{\mathrm{Ker} {}}} \newcommand{\Map}{\mathop{\mathrm{Map} {}}} \newcommand{\End}{\mathop{\mathrm{End} {}}} \newcommand{\Img}{\mathop{\mathrm{Im} {}}} \newcommand{\Aut}{\mathop{\mathrm{Aut} {}}} \newcommand{\Gal}{\mathop{\mathrm{Gal} {}}} \newcommand{\Irr}{\mathop{\mathrm{Irr} {}}} \newcommand{\arginf}{\mathop{\mathrm{arginf}}} \newcommand{\argsup}{\mathop{\mathrm{argsup}}} \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\ev}{\mathop{\mathrm{ev} {}}} \newcommand{\affinehull}{\mathop{\bf aff {}}} \newcommand{\cvxhull}{\mathop{\bf Conv {}}} \newcommand{\epi}{\mathop{\bf epi {}}} \newcommand{\injhomeo}{\hookrightarrow} \newcommand{\perm}[1]{\text{Perm}(#1)} \newcommand{\aut}[1]{\text{Aut}(#1)} \newcommand{\ideal}[1]{\mathfrak{#1}} \newcommand{\bigset}[2]{\left\{#1\left|{#2}\right.\right\}} \newcommand{\bigsetl}[2]{\left\{\left.{#1}\right|{#2}\right\}} \newcommand{\primefield}[1]{\field_{#1}} \newcommand{\dimext}[2]{[#1:{#2}]} \newcommand{\restrict}[2]{#1|{#2}} \newcommand{\algclosure}[1]{#1^\mathrm{a}} \newcommand{\finitefield}[2]{\field_{#1^{#2}}} \newcommand{\frobmap}[2]{\varphi_{#1,{#2}}} % %\newcommand{\algfontmode}{} % %\ifdefined\algfontmode %\newcommand\mathalgfont[1]{\mathcal{#1}} %\newcommand\mathcalfont[1]{\mathscr{#1}} %\else \newcommand\mathalgfont[1]{\mathscr{#1}} \newcommand\mathcalfont[1]{\mathcal{#1}} %\fi % %\def\DeltaSirDir{yes} %\newcommand\sdirletter[2]{\ifthenelse{\equal{\DeltaSirDir}{yes}}{\ensuremath{\Delta #1}}{\ensuremath{#2}}} \newcommand{\sdirletter}[2]{\Delta #1} \newcommand{\sdirlbd}{\sdirletter{\lambda}{\Delta \lambda}} \newcommand{\sdir}{\sdirletter{x}{v}} \newcommand{\seqk}[2]{#1^{(#2)}} \newcommand{\seqscr}[3]{\seq{#1}_{#2}^{#3}} 
\newcommand{\xseqk}[1]{\seqk{x}{#1}} \newcommand{\sdirk}[1]{\seqk{\sdir}{#1}} \newcommand{\sdiry}{\sdirletter{y}{\Delta y}} \newcommand{\slen}{t} \newcommand{\slenk}[1]{\seqk{\slen}{#1}} \newcommand{\ntsdir}{\sdir_\mathrm{nt}} \newcommand{\pdsdir}{\sdir_\mathrm{pd}} \newcommand{\sdirnu}{\sdirletter{\nu}{w}} \newcommand{\pdsdirnu}{\sdirnu_\mathrm{pd}} \newcommand{\pdsdiry}{\sdiry_\mathrm{pd}} \newcommand\pdsdirlbd{\sdirlbd_\mathrm{pd}} % \newcommand{\normal}{\mathcalfont{N}} % \newcommand{\algk}[1]{\mathalgfont{#1}} \newcommand{\collk}[1]{\mathcalfont{#1}} \newcommand{\classk}[1]{\collk{#1}} \newcommand{\indexedcol}[1]{\{#1\}} \newcommand{\rel}{\mathbf{R}} \newcommand{\relxy}[2]{#1\;\rel\;{#2}} \newcommand{\innerp}[2]{\langle{#1},{#2}\rangle} \newcommand{\innerpt}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\closure}[1]{\overline{#1}} \newcommand{\support}{\mathbf{support}} \newcommand{\set}[2]{\{#1|#2\}} \newcommand{\metrics}[2]{\langle {#1}, {#2}\rangle} \newcommand{\interior}[1]{#1^\circ} \newcommand{\topol}[1]{\mathfrak{#1}} \newcommand{\topos}[2]{\langle {#1}, \topol{#2}\rangle} % topological space % \newcommand{\alg}{\algk{A}} \newcommand{\algB}{\algk{B}} \newcommand{\algF}{\algk{F}} \newcommand{\algR}{\algk{R}} \newcommand{\algX}{\algk{X}} \newcommand{\algY}{\algk{Y}} % \newcommand\coll{\collk{C}} \newcommand\collB{\collk{B}} \newcommand\collF{\collk{F}} \newcommand\collG{\collk{G}} \newcommand{\tJ}{\topol{J}} \newcommand{\tS}{\topol{S}} \newcommand\openconv{\collk{U}} % \newenvironment{my-matrix}[1]{\begin{bmatrix}}{\end{bmatrix}} \newcommand{\colvectwo}[2]{\begin{my-matrix}{c}{#1}\\{#2}\end{my-matrix}} \newcommand{\colvecthree}[3]{\begin{my-matrix}{c}{#1}\\{#2}\\{#3}\end{my-matrix}} \newcommand{\rowvecthree}[3]{\begin{bmatrix}{#1}&{#2}&{#3}\end{bmatrix}} \newcommand{\mattwotwo}[4]{\begin{bmatrix}{#1}&{#2}\\{#3}&{#4}\end{bmatrix}} % \newcommand\optfdk[2]{#1^\mathrm{#2}} \newcommand\tildeoptfdk[2]{\tilde{#1}^\mathrm{#2}} \newcommand\fobj{\optfdk{f}{obj}} \newcommand\fie{\optfdk{f}{ie}} \newcommand\feq{\optfdk{f}{eq}} \newcommand\tildefobj{\tildeoptfdk{f}{obj}} \newcommand\tildefie{\tildeoptfdk{f}{ie}} \newcommand\tildefeq{\tildeoptfdk{f}{eq}} \newcommand\xdomain{\mathcalfont{X}} \newcommand\xobj{\optfdk{\xdomain}{obj}} \newcommand\xie{\optfdk{\xdomain}{ie}} \newcommand\xeq{\optfdk{\xdomain}{eq}} \newcommand\optdomain{\mathcalfont{D}} \newcommand\optfeasset{\mathcalfont{F}} % \newcommand{\bigpropercone}{\mathcalfont{K}} % \newcommand{\prescript}[3]{\;^{#1}{#3}} % %\]

Introduction

Preamble

Notations

  • sets of numbers
    • $\naturals$ - set of natural numbers
    • $\integers$ - set of integers
    • $\integers_+$ - set of nonnegative integers
    • $\rationals$ - set of rational numbers
    • $\reals$ - set of real numbers
    • $\preals$ - set of nonnegative real numbers
    • $\ppreals$ - set of positive real numbers
    • $\complexes$ - set of complex numbers
  • sequences $\seq{x_i}$ and the like
    • finite $\seq{x_i}_{i=1}^n$, infinite $\seq{x_i}_{i=1}^\infty$ - use $\seq{x_i}$ whenever unambiguously understood
    • similarly for other operations, e.g., $\sum x_i$, $\prod x_i$, $\cup A_i$, $\cap A_i$, $\bigtimes A_i$
    • similarly for integrals, e.g., $\int f$ for $\int_{-\infty}^\infty f$
  • sets
    • $\compl{A}$ - complement of $A$
    • $A\sim B$ - $A\cap \compl{B}$
    • $A\Delta B$ - $(A\cap \compl{B}) \cup (\compl{A} \cap B)$
    • $\powerset(A)$ - set of all subsets of $A$
  • sets in metric vector spaces
    • $\closure{A}$ - closure of set $A$
    • $\interior{A}$ - interior of set $A$
    • $\relint A$ - relative interior of set $A$
    • $\boundary A$ - boundary of set $A$
  • set algebra
    • $\sigma(\subsetset{A})$ - $\sigma$-algebra generated by $\subsetset{A}$, i.e., smallest $\sigma$-algebra containing $\subsetset{A}$
  • norms in $\reals^n$
    • $\|x\|_p$ ($p\geq1$) - $p$-norm of $x\in\reals^n$, i.e., $(|x_1|^p + \cdots + |x_n|^p)^{1/p}$
    • e.g., $\|x\|_2$ - Euclidean norm
  • matrices and vectors
    • $a_{i}$ - $i$-th entry of vector $a$
    • $A_{ij}$ - entry of matrix $A$ at position $(i,j)$, i.e., entry in $i$-th row and $j$-th column
    • $\Tr(A)$ - trace of $A \in\reals^{n\times n}$, i.e., $A_{1,1}+ \cdots + A_{n,n}$
  • symmetric, positive definite, and positive semi-definite matrices
    • $\symset{n}\subset \reals^{n\times n}$ - set of symmetric matrices
    • $\possemidefset{n}\subset \symset{n}$ - set of positive semi-definite matrices; $A\succeq0 \Leftrightarrow A \in \possemidefset{n}$
    • $\posdefset{n}\subset \symset{n}$ - set of positive definite matrices; $A\succ0 \Leftrightarrow A \in \posdefset{n}$
  • sometimes, use Python script-like notations (with serious abuse of mathematical notations)
    • use $f:\reals\to\reals$ as if it were $f:\reals^n \to \reals^n$, e.g., $$ \exp(x) = (\exp(x_1), \ldots, \exp(x_n)) \quad \mbox{for } x\in\reals^n $$ and $$ \log(x) = (\log(x_1), \ldots, \log(x_n)) \quad \mbox{for } x\in\ppreals^n $$ which corresponds to Python code numpy.exp(x) or numpy.log(x) where x is instance of numpy.ndarray, i.e., numpy array
    • use $\sum x$ to mean $\ones^T x$ for $x\in\reals^n$, i.e. $$ \sum x = x_1 + \cdots + x_n $$ which corresponds to Python code x.sum() where x is numpy array
    • use $x/y$ for $x,y\in\reals^n$ to mean $$ \rowvecthree{x_1/y_1}{\cdots}{x_n/y_n}^T $$ which corresponds to Python code x / y where x and y are $1$-d numpy arrays
    • use $X/Y$ for $X,Y\in\reals^{m\times n}$ to mean $$ \begin{my-matrix}{cccc} X_{1,1}/Y_{1,1} & X_{1,2}/Y_{1,2} & \cdots & X_{1,n}/Y_{1,n} \\ X_{2,1}/Y_{2,1} & X_{2,2}/Y_{2,2} & \cdots & X_{2,n}/Y_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m,1}/Y_{m,1} & X_{m,2}/Y_{m,2} & \cdots & X_{m,n}/Y_{m,n} \end{my-matrix} $$ which corresponds to Python code X / Y where X and Y are $2$-d numpy arrays
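
For concreteness, here is a minimal numpy sketch of these conventions (array values are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(np.exp(x))   # elementwise exp: (e^{x_1}, ..., e^{x_n})
print(np.log(x))   # elementwise log, requires x in R_{++}^n
print(x.sum())     # sum x = 1^T x = 6.0
print(x / y)       # elementwise division (x_1/y_1, ..., x_n/y_n)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[2.0, 4.0], [6.0, 8.0]])
print(X / Y)       # elementwise division of matrices
```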

Some definitions

statement $P_n$, said to happen infinitely often or i.o. if $$ \left( \forall N\in\naturals \right) \left( \exists n > N \right) \left( P_n \right) $$
statement $P(x)$, said to happen almost everywhere or a.e. or almost surely or a.s. (depending on context) associated with measure space $\meas{X}{\algB}{\mu}$ if $$ \mu \set{x}{\sim P(x)} = 0 $$ or, equivalently when $\mu$ is probability measure, $$ \mu \set{x}{P(x)} = 1 $$

Some conventions

  • (for some subjects) use following conventions
    • $0\cdot \infty = \infty \cdot 0 = 0$
    • $(\forall x\in\ppreals)(x\cdot \infty = \infty \cdot x = \infty)$
    • $\infty \cdot \infty = \infty$

Measure-theoretic Treatment of Probabilities

Probability Measure

Measurable functions

  • denote $n$-dimensional Borel sets by $\algR^n$
  • for two measurable spaces, $\measu{\Omega}{\algF}$ and $\measu{\Omega'}{\algF'}$, function, $f:\Omega \to \Omega'$ with $$ \left( \forall A' \in \algF' \right) \left( f^{-1}(A') \in \algF \right) $$ said to be measurable with respect to $\algF/\algF'$ (thus, measurable functions defined earlier can be said to be measurable with respect to $\collk{B}/\algR$)
  • when $\Omega=\reals^n$ in $\measu{\Omega}{\algF}$, $\algF$ is assumed to be $\algR^n$, and sometimes drop $\algR^n$
    • thus, e.g., we say $f:\Omega\to\reals^n$ is measurable with respect to $\algF$ (instead of $\algF/\algR^n$)
  • measurable function, $f:\reals^n\to\reals^m$ (i.e., measurable with respect to $\algR^n/\algR^m$), called Borel function
  • $f:\Omega\to\reals^n$ is measurable with respect to $\algF/\algR^n$ if and only if every component, $f_i:\Omega\to\reals$, is measurable with respect to $\algF/\algR$

Probability (measure) spaces

  • set function, $P:\algk{F}\to[0,1]$, defined on algebra, $\algk{F}$, of set $\Omega$, satisfying following properties, called probability measure (note resemblance with general measure spaces)
    • $(\forall A\in\algk{F})(0\leq P(A)\leq 1)$
    • $P(\emptyset) = 0,\ P(\Omega) = 1$
    • $(\forall \mbox{ disjoint } \seq{A_n} \subset \algk{F} \mbox{ with } \bigcup A_n \in \algk{F})(P\left(\bigcup A_n\right) = \sum P(A_n))$
  • for $\sigma$-algebra, $\algk{F}$, $\meas{\Omega}{\algk{F}}{P}$, called probability measure space or probability space
  • set $A\in\algk{F}$ with $P(A)=1$, called a support of $P$
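
As a toy illustration (not part of the formal development), a short Python check of these properties on a finite space, with the power set as $\algk{F}$ and arbitrary point masses:

```python
import itertools

# hypothetical point masses on Omega = {0, 1, 2, 3}, summing to 1
omega = [0, 1, 2, 3]
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

def P(A):
    return sum(p[w] for w in A)

# F = power set of Omega (an algebra, here also a sigma-algebra)
F = [frozenset(s) for r in range(5) for s in itertools.combinations(omega, r)]

assert P(frozenset()) == 0 and abs(P(frozenset(omega)) - 1) < 1e-12
assert all(0 <= P(A) <= 1 for A in F)
for A, B in itertools.combinations(F, 2):   # additivity on disjoint sets
    if not (A & B):
        assert abs(P(A | B) - (P(A) + P(B))) < 1e-12
print("all probability-measure axioms hold on this toy space")
```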

Dynkin's $\pi$-$\lambda$ theorem

  • class, $\subsetset{P}$, of subsets of $\Omega$ closed under finite intersection, called $\pi$-system, i.e.,
    • $(\forall A,B\in \subsetset{P})(A\cap B\in\subsetset{P})$
  • class, $\subsetset{L}$, of subsets of $\Omega$ containing $\Omega$, closed under complements and countable disjoint unions, called $\lambda$-system, i.e.,
    • $\Omega \in \subsetset{L}$
    • $(\forall A\in \subsetset{L})(\compl{A}\in\subsetset{L})$
    • $(\forall \mbox{ disjoint }\seq{A_n}\subset\subsetset{L})(\bigcup A_n \in \subsetset{L})$
  • class that is both $\pi$-system and $\lambda$-system is $\sigma$-algebra
  • Dynkin's $\pi$-$\lambda$ theorem - for $\pi$-system, $\subsetset{P}$, and $\lambda$-system, $\subsetset{L}$, with $\subsetset{P} \subset \subsetset{L}$, $$ \sigma(\subsetset{P}) \subset \subsetset{L} $$
  • for $\pi$-system, $\algk{P}$, two probability measures, $P_1$ and $P_2$, on $\sigma(\algk{P})$, agreeing on $\algk{P}$, agree on $\sigma(\algk{P})$

Limits of Events

  • for sequence of subsets, $\seq{A_n}$, $$ P(\liminf A_n) \leq \liminf P(A_n) \leq \limsup P(A_n) \leq P(\limsup A_n) $$
  • for $\seq{A_n}$ converging to $A$ $$ \lim P(A_n) = P(A) $$
  • for sequence of independent $\pi$-systems, $\seq{\algA_n}$, $\seq{\sigma(\algA_n)}$ is independent

Probabilistic independence

  • given probability space, $\meas{\Omega}{\algk{F}}{P}$
  • $A,B\in\algk{F}$ with $$ P(A\cap B) = P(A) P(B) $$ said to be independent
  • indexed collection, $\seq{A_\lambda}$, with $$ \left( \forall n\in\naturals, \mbox{ distinct } \lambda_1, \ldots, \lambda_n \in \Lambda \right) \left( P\left(\bigcap_{i=1}^n A_{\lambda_i}\right) = \prod_{i=1}^n P(A_{\lambda_i}) \right) $$ said to be independent

Independence of classes of events

  • indexed collection, $\seq{\subsetset{A}_\lambda}$, of classes of events (i.e., subsets) with $$ \left( \forall A_\lambda \in \subsetset{A}_\lambda \right) \left( \seq{A_\lambda} \mbox{ are independent} \right) $$ said to be independent
  • for independent indexed collection, $\seq{\subsetset{A}_\lambda}$, with every $\subsetset{A}_\lambda$ being $\pi$-system, $\seq{\sigma(\subsetset{A}_\lambda)}$ are independent
  • for independent (countable) collection of events, $\seq{\seq{A_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{A_{ni}}_{i=1}^\infty)$ are independent

Borel-Cantelli lemmas

  • for sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ converging $$ P(\limsup A_n) = 0 $$
  • for independent sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ diverging $$ P(\limsup A_n)=1 $$
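
Both lemmas can be illustrated by simulation; a sketch with independent events, $A_n$, where $P(A_n)=1/n^2$ (summable) versus $P(A_n)=1/n$ (divergent):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
n = np.arange(1, N + 1)

occ_sq = rng.random(N) < 1.0 / n**2   # P(A_n) = 1/n^2: sum converges
occ_hm = rng.random(N) < 1.0 / n      # P(A_n) = 1/n:   sum diverges

# first case: a handful of occurrences, all at small n; second: occurrences keep coming
print("1/n^2:", occ_sq.sum(), "occurrences, last at n =", n[occ_sq].max())
print("1/n:  ", occ_hm.sum(), "occurrences, last at n =", n[occ_hm].max())
```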

Tail events and Kolmogorov's zero-one law

  • for sequence of events, $\seq{A_n}$ $$ \algk{T} = \bigcap_{n=1}^\infty \sigma\left(\seq{A_i}_{i=n}^\infty\right) $$ called tail $\sigma$-algebra associated with $\seq{A_n}$; its elements are called tail events
  • Kolmogorov's zero-one law - for independent sequence of events, $\seq{A_n}$, every event in tail $\sigma$-algebra has probability measure either $0$ or $1$

Product probability spaces

  • for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$, want to find product measure, $\pi$, such that $$ \left( \forall A\in \algX, B\in\algY \right) \left( \pi(A\times B) = \mu(A)\nu(B) \right) $$
    • e.g., if both $\mu$ and $\nu$ are Lebesgue measure on $\reals$, $\pi$ will be Lebesgue measure on $\reals^2$
  • $A\times B$ for $A\in\algX$ and $B\in\algY$ is measurable rectangle
  • $\sigma$-algebra generated by measurable rectangles denoted by $$ \algX \times \algY $$
    • thus, not Cartesian product in usual sense
    • generally much larger than class of measurable rectangles

Sections of measurable subsets and functions

  • for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
  • sections of measurable subsets - for $E\subset X\times Y$
    • $\set{y\in Y}{(x,y)\in E}$ is section of $E$ determined by $x$
    • $\set{x\in X}{(x,y)\in E}$ is section of $E$ determined by $y$
  • sections of measurable functions - for measurable function, $f$, with respect to $\algX\times \algY$
    • $f(x,\cdot)$ is section of $f$ determined by $x$
    • $f(\cdot,y)$ is section of $f$ determined by $y$
  • sections of measurable subsets are measurable
    • $\left( \forall x\in X, E\in \algX \times \algY \right) \left( \set{y\in Y}{(x,y)\in E} \in \algY \right)$
    • $\left( \forall y\in Y, E\in \algX \times \algY \right) \left( \set{x\in X}{(x,y)\in E} \in \algX \right)$
  • sections of measurable functions are measurable
    • $f(x,\cdot)$ is measurable with respect to $\algY$ for every $x\in X$
    • $f(\cdot,y)$ is measurable with respect to $\algX$ for every $y\in Y$

Product measure

  • for two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
  • two functions defined below for every $E\in\algX\times\algY$ are $\sigma$-finite measures
    • $\pi'(E) = \int_X \nu\set{y\in Y}{(x,y)\in E} d\mu$
    • $\pi''(E) = \int_Y \mu\set{x\in X}{(x,y)\in E} d\nu$
  • for every measurable rectangle, $A\times B$, with $A\in\algX$ and $B\in\algY$ $$ \pi'(A\times B) = \pi''(A\times B) = \mu(A) \nu(B) $$
  • (use conventions stated above for extended real values)
  • indeed, $\pi'(E)=\pi''(E)$ for every $E\in\algX\times\algY$; let $\pi=\pi'=\pi''$
  • $\pi$ is
    • called product measure and denoted by $\mu\times \nu$
    • $\sigma$-finite measure
    • only measure such that $\pi(A\times B) =\mu(A) \nu(B)$ for every measurable rectangle

Fubini's theorem

  • suppose two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$, and function, $f$, measurable with respect to $\algX\times\algY$ - define
    • $X_0 = \set{x\in X}{\int_Y |f(x,y)|d\nu < \infty}\subset X$
    • $Y_0 = \set{y\in Y}{\int_X |f(x,y)|d\mu < \infty}\subset Y$
  • Fubini's theorem - for nonnegative measurable function, $f$, following are measurable with respect to $\algX$ and $\algY$ respectively $$ g(x) = \int_Y f(x,y)d\nu,\ \ h(y) = \int_X f(x,y)d\mu $$ and following holds $$ \int_{X\times Y} f(x,y) d\pi = \int_X \left(\int_Y f(x,y) d\nu\right)d\mu = \int_Y \left(\int_X f(x,y) d\mu\right)d\nu $$
  • for $f$, (not necessarily nonnegative) integrable function with respect to $\pi$
    • $\mu(X\sim X_0) = 0$, $\nu(Y\sim Y_0)=0$
    • $g$ and $h$ are finite and measurable on $X_0$ and $Y_0$ respectively
    • (above) equalities of double integral hold
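
A numeric sanity check of the iterated-integral identity, using scipy with an arbitrarily chosen integrand:

```python
import numpy as np
from scipy.integrate import dblquad, quad

# f(x, y) = x e^{-x-y} on [0, inf) x [0, inf), both measures Lebesgue
f = lambda x, y: x * np.exp(-x - y)

double, _ = dblquad(lambda y, x: f(x, y), 0, np.inf, 0, np.inf)
iter_xy, _ = quad(lambda x: quad(lambda y: f(x, y), 0, np.inf)[0], 0, np.inf)
iter_yx, _ = quad(lambda y: quad(lambda x: f(x, y), 0, np.inf)[0], 0, np.inf)

print(double, iter_xy, iter_yx)   # all approximately 1.0
```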

Random Variables

Random variables

  • for probability space, $\meas{\Omega}{\algk{F}}{P}$,
  • measurable function (with respect to $\algF/\algR$), $X:\Omega \to \reals$, called random variable
  • measurable function (with respect to $\algF/\algR^n$), $X:\Omega \to \reals^n$, called random vector
    • when expressing $X(\omega)=(X_1(\omega), \ldots, X_n(\omega))$, $X$ is measurable if and only if every $X_i$ is measurable
    • thus, $n$-dimensional random vector is simply $n$-tuple of random variables
  • smallest $\sigma$-algebra with respect to which $X$ is measurable, called $\sigma$-algebra generated by $X$ and denoted by $\sigma(X)$
    • $\sigma(X)$ consists exactly of sets, $\set{\omega\in \Omega}{X(\omega)\in H}$, for $H\in\algR^n$
    • random variable, $Y$, is measurable with respect to $\sigma(X)$ if and only if exists measurable function, $f:\reals^n\to\reals$ such that $Y(\omega) = f(X(\omega))$ for all $\omega$, i.e., $Y=f\circ X$

Probability distributions for random variables

  • probability measure on $\reals$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR $$ called distribution or law of random variable, $X$
  • function, $F:\reals\to[0,1]$, defined by $$ F(x) = \mu(-\infty, x] = P(X\leq x) $$ called distribution function or cumulative distribution function (CDF) of $X$
  • Borel set, $S$, with $\mu(S)=1$, called support
  • random variable, its distribution, its distribution function, said to be discrete when $\mu$ has countable support

Probability distribution of mappings of random variables

  • for measurable $g:\reals\to\reals$, $$ \left( \forall A\in\algR \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu (g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$

Probability density for random variables

  • Borel function, $f: \reals\to\preals$, satisfying $$ \left( \forall A \in \algR \right) \left( \mu(A) = P(X\in A) = \int_A f(x) dx \right) $$ called density or probability density function (PDF) of random variable
  • above is equivalent to $$ \left( \forall a < b \in \reals \right) \left( \int_a^b f(x) dx = P(a<X\leq b) = F(b) - F(a) \right) $$
  • relation between $F$ and $f$ (see also numeric check below)
    • note, though, $F$ does not need to differentiate to $f$ everywhere; only $f$ required to integrate properly
    • if $F$ does differentiate to $f$ and $f$ is continuous, fundamental theorem of calculus implies $f$ indeed is density for $F$
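
A small numeric check of this relation for the standard normal, where $F$ is smooth and differentiates to $f$ everywhere:

```python
import numpy as np
from scipy import stats

# central difference of the CDF should reproduce the PDF where f is continuous
x = np.linspace(-3.0, 3.0, 601)
h = 1e-5
dFdx = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)
print(np.max(np.abs(dFdx - stats.norm.pdf(x))))   # ~1e-11
```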

Probability distribution for random vectors

  • (similarly to random variables) probability measure on $\reals^n$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR^n $$ called distribution or law of random vector, $X$
  • function, $F:\reals^n\to[0,1]$, defined by $$ F(x) = \mu(S_x) = P(X\preceq x) $$ where $$ S_x = \set{y\in \reals^n}{y\preceq x} = \set{y\in \reals^n}{y_i\leq x_i,\ i=1,\ldots,n} $$ called distribution function or cumulative distribution function (CDF) of $X$
  • (similarly to random variables) random vector, its distribution, its distribution function, said to be discrete when has countable support

Marginal distribution for random vectors

  • (similarly to random variables) for measurable $g:\reals^n\to\reals^m$ $$ \left( \forall A\in\algR^{m} \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu(g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$
  • for $g_i:\reals^n\to\reals$ with $g_i(x) = x_i$ $$ \left( \forall A\in\algR \right) \left( \prob{g_i(X)\in A} = \prob{X_i \in A} \right) $$
  • measure, $\mu_i$, defined by $\mu_i(A) = \prob{X_i\in A}$, called ($i$-th) marginal distribution of $X$
  • for $\mu$ having density function, $f:\reals^n\to\preals$, density function of ($i$-th) marginal distribution is $$ f_i(x_i) = \int_{\reals^{n-1}} f(x) \, dx_{-i} $$ where $x_{-i} = (x_1,\ldots,x_{i-1}, x_{i+1}, \ldots, x_n)$, i.e., all coordinates but $x_i$ integrated out (see the numeric sketch below)
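
A numeric sketch of this marginalization, with a joint density built (for illustration) from independent standard normal and exponential factors:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# hypothetical joint density on R^2: f(x1, x2) = phi(x1) * e^{-x2} (x2 >= 0)
f = lambda x1, x2: stats.norm.pdf(x1) * stats.expon.pdf(x2)

x1 = 0.7
f1, _ = quad(lambda x2: f(x1, x2), 0, np.inf)   # integrate out x2
print(f1, stats.norm.pdf(x1))                   # first marginal density at x1 = 0.7
```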

Independence of random variables

  • random variables, $X_1$, $\ldots$, $X_n$, with independent $\sigma$-algebras generated by them, said to be independent
  • (refer to independence of classes of events above)
    • because $\sigma(X_i) = X_i^{-1}(\algR)=\set{X_i^{-1}(H)}{H\in\algR}$, independent if and only if $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(X_1\in H_1,\ldots, X_n\in H_n\right) = \prod P\left(X_i\in H_i\right) \right) $$ i.e., $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right) $$

Equivalent statements of independence of random variables

  • for random variables, $X_1$, $\ldots$, $X_n$, having $\mu$ and $F:\reals^n\to[0,1]$ as their distribution and CDF, with each $X_i$ having $\mu_i$ and $F_i:\reals\to[0,1]$ as its distribution and CDF, following statements are equivalent (see also empirical check below)
    • $X_1,\ldots,X_n \mbox{ are independent}$
    • $\left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
    • $\left( \forall H_1,\ldots,H_n \in \algR \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
    • $\left( \forall x\in \reals^n \right) \left( P(X_1\leq x_1,\ldots, X_n\leq x_n) = \prod P(X_i \leq x_i) \right)$
    • $\left( \forall x \in \reals^n \right) \left( F(x) = \prod F_i(x_i) \right)$
    • $\mu = \mu_1 \times \cdots \times \mu_n$
    • $\left( \forall x \in \reals^n \right) \left( f(x) = \prod f_i(x_i) \right)$ (when joint density $f$ and marginal densities $f_i$ exist)
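
A quick empirical check of the CDF factorization on independent samples (distribution choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200_000
X = rng.standard_normal(m)
Y = rng.exponential(size=m)   # independent of X by construction

x0, y0 = 0.5, 1.0
F_joint = np.mean((X <= x0) & (Y <= y0))        # empirical F(x0, y0)
F_prod = np.mean(X <= x0) * np.mean(Y <= y0)    # empirical F_1(x0) F_2(y0)
print(F_joint, F_prod)                          # close, up to sampling error
```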

Independence of random variables with separate $\sigma$-algebra

  • given probability space, $\meas{\Omega}{\algk{F}}{P}$
  • random variables, $X_1$, $\ldots$, $X_n$, each of which is measurable with respect to each of $n$ independent $\sigma$-algebras, $\algk{G}_1\subset \algF$, $\ldots$, $\algk{G}_n\subset \algF$, respectively, are independent

Independence of random vectors

  • for random vectors, $X_1:\Omega\to\reals^{d_1}$, $\ldots$, $X_n:\Omega\to\reals^{d_n}$, having $\mu$ and $F:\reals^{d_1}\times\cdots\times\reals^{d_n}\to[0,1]$ as their distribution and CDF, with each $X_i$ having $\mu_i$ and $F_i:\reals^{d_i}\to[0,1]$ as its distribution and CDF, following statements are equivalent
    • $X_1,\ldots,X_n \mbox{ are independent}$
    • $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
    • $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
    • $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( P(X_1\preceq x_1,\ldots, X_n\preceq x_n) = \prod P(X_i \preceq x_i) \right)$
    • $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( F(x_1,\ldots,x_n) = \prod F_i(x_i) \right)$
    • $\mu = \mu_1 \times \cdots \times \mu_n$
    • $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( f(x_1,\ldots,x_n) = \prod f_i(x_i) \right)$

Independence of infinite collection of random vectors

  • infinite collection of random vectors for which every finite subcollection is independent, said to be independent
  • for independent (countable) collection of random vectors, $\seq{\seq{X_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{X_{ni}}_{i=1}^\infty)$ are independent

Probability evaluation for two independent random vectors

for independent random vectors, $X$ and $Y$, with distributions, $\mu$ and $\nu$, in $\reals^n$ and $\reals^m$ respectively $$ \left( \forall B\in\algR^{n+m} \right) \left( \prob{(X,Y)\in B} = \int_{\reals^n} \prob{(x,Y)\in B} d\mu \right) $$ and $$ \left( \forall A\in\algR^{n}, B\in\algR^{n+m} \right) \left( \prob{X\in A, (X,Y)\in B} = \int_{A} \prob{(x,Y)\in B} d\mu \right) $$

Sequence of random variables

for sequence of probability measures on $\algR$, $\seq{\mu_n}$, exists probability space, $\meas{\Omega}{\algF}{P}$, and sequence of independent random variables in $\reals$, $\seq{X_n}$, such that each $X_n$ has $\mu_n$ as distribution

Expected values

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, integral of $X$ with respect to measure, $P$ $$ \Expect X = \int X dP = \int_\Omega X(\omega) dP $$ called expected value of $X$
  • $\Expect X$ is
    • always defined for nonnegative $X$
    • for general $X$, defined if either $\Expect X^+<\infty$ or $\Expect X^-<\infty$ (or both), in which case $\Expect X =\Expect X^+ - \Expect X^-$
  • $X$ is integrable if and only if $\Expect |X| <\infty$
  • limits
    • if $X_n$ converges to $X$ in probability and $\seq{X_n}$ is dominated by integrable random variable or uniformly integrable, $\Expect X_n$ converges to $\Expect X$

Markov and Chebyshev's inequalities

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, and $\alpha>0$ $$ \prob{X\geq \alpha} \leq \frac{1}{\alpha} \int_{X\geq \alpha} X d P \leq \frac{1}{\alpha} \Expect X $$ for nonnegative $X$, hence $$ \prob{|X|\geq \alpha} \leq \frac{1}{\alpha^n} \int_{|X|\geq \alpha} |X|^n d P \leq \frac{1}{\alpha^n} \Expect |X|^n $$ for general $X$
as special case of Markov inequality, $$ \prob{|X-\Expect X|\geq \alpha} \leq \frac{1}{\alpha^2} \int_{|X-\Expect X|\geq \alpha} (X-\Expect X)^2 d P \leq \frac{1}{\alpha^2} \Var X $$ for general $X$
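
An empirical sketch of both bounds for an exponential random variable, for which $\Expect X = \Var X = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(size=1_000_000)   # nonnegative, E X = 1, Var X = 1

alpha = 3.0
print(np.mean(X >= alpha), "<=", 1 / alpha)                 # Markov bound
print(np.mean(np.abs(X - 1) >= alpha), "<=", 1 / alpha**2)  # Chebyshev bound
```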

Jensen's, Hölder's, and Lyapunov's inequalities

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, and convex function, $\varphi$ $$ \varphi\left(\Expect X\right) \leq \Expect \varphi(X) $$
for two random variables, $X$ and $Y$, on $\meas{\Omega}{\algF}{P}$, and $p,q\in(1,\infty)$ with $1/p+1/q=1$ $$ \Expect |XY| \leq \left(\Expect |X|^p\right)^{1/p} \left(\Expect |Y|^q\right)^{1/q} $$
for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, and $0<\alpha<\beta$ $$ \left(\Expect |X|^\alpha\right)^{1/\alpha} \leq \left(\Expect |X|^\beta\right)^{1/\beta} $$
  • note Hölder's inequality implies Lyapunov's inequality

Maximal inequalities

if $A\in\algk{T}=\bigcap_{n=1}^\infty \sigma(X_n, X_{n+1},\ldots)$ for independent $\seq{X_n}$, $$ \prob{A} = 0 \vee \prob{A} = 1 $$ (Kolmogorov's zero-one law for random variables)

– define $S_n = \sum_{i=1}^n X_i$

for independent $\seq{X_i}_{i=1}^n$ with $\Expect X_i =0$ and $\Var X_i<\infty$ and $\alpha>0$ $$ \prob{\max_{i\leq n} |S_i| \geq \alpha} \leq \frac{1}{\alpha^2}\Var S_n $$ (Kolmogorov's maximal inequality)
for independent $\seq{X_i}_{i=1}^n$ and $\alpha>0$ $$ \prob{\max_{i\leq n} |S_i| \geq 3\alpha} \leq 3 \max_{i\leq n} \prob{|S_i|\geq\alpha} $$ (Etemadi's inequality)

Moments

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, and $k\in\naturals$ $$ \Expect X^k = \int x^k d\mu = \int x^k dF(x) $$ called $k$-th moment of $X$ or $\mu$ or $F$, and $$ \Expect |X|^k = \int |x|^k d\mu = \int |x|^k dF(x) $$ called $k$-th absolute moment of $X$ or $\mu$ or $F$
  • if $\Expect |X|^n<\infty$, $\Expect |X|^k<\infty$ for $k<n$
  • $\Expect X^n$ defined only when $\Expect|X|^n<\infty$

Moment generating functions

for random variable, $X$, on $\meas{\Omega}{\algF}{P}$, $M:\complexes \to \complexes$ defined by $$ M(s) = \Expect \left( e^{sX} \right) = \int e^{sx} d\mu = \int e^{sx} dF(x) $$ called moment generating function of $X$
  • $n$-th derivative of $M$ with respect to $s$ is $M^{(n)}(s) = \frac{d^n}{ds^n} M(s) = \Expect \left(X^ne^{sX}\right) = \int x^n e^{sx} d\mu$
  • thus, $n$-th derivative of $M$ with respect to $s$ at $s=0$ is $n$-th moment of $X$ $$ M^{(n)}(0) = \Expect X^n $$
  • for independent random variables, $\seq{X_i}_{i=1}^n$, with moment generating functions, $M_i$, moment generating function of $\sum X_i$ is $$ \prod M_i(s) $$
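
A symbolic sketch with sympy for $X\sim\mathrm{Exp}(1)$, whose moment generating function is $M(s)=1/(1-s)$ for $s<1$, so $\Expect X^n = n!$:

```python
import sympy as sp

s = sp.symbols('s')
M = 1 / (1 - s)          # MGF of Exp(1), valid for s < 1

# M^(n)(0) recovers the n-th moment: E X^n = n!
for n in range(1, 5):
    print(n, sp.diff(M, s, n).subs(s, 0))   # 1, 2, 6, 24

# for independent X_1, X_2 ~ Exp(1), MGF of X_1 + X_2 is the product M(s)^2
print(sp.diff(M**2, s, 1).subs(s, 0))       # E(X_1 + X_2) = 2
```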

Convergence of Random Variables

Convergences of random variables

random variables, $\seq{X_n}$, with $$ \prob{\lim X_n = X} = P(\set{\omega \in \Omega}{\lim X_n(\omega) = X(\omega)}) = 1 $$ said to converge to $X$ with probability $1$ and denoted by $X_n\to X$ a.s.
random variables, $\seq{X_n}$, with $$ \left( \forall \epsilon>0 \right) \left( \lim \prob{|X_n-X|>\epsilon} = 0 \right) $$ said to converge to $X$ in probability
distribution functions, $\seq{F_n}$, with $$ \left( \forall x \mbox{ at which } F \mbox{ is continuous} \right) \left( \lim F_n(x) = F(x) \right) $$ said to converge weakly to distribution function, $F$, and denoted by $F_n \Rightarrow F$
When $F_n\Rightarrow F$, associated random variables, $\seq{X_n}$, said to converge in distribution to $X$, associated with $F$, and denoted by $X_n \Rightarrow X$
for measures on $\measu{\reals}{\algR}$, $\seq{\mu_n}$, associated with distribution functions, $\seq{F_n}$, respectively, and measure on $\measu{\reals}{\algR}$, $\mu$, associated with distribution function, $F$, we denote $$ \mu_n \Rightarrow \mu $$ if $$ \left( \forall A = (-\infty, x] \mbox{ with } \mu\{x\} = 0 \right) \left( \lim \mu_n(A) = \mu(A) \right) $$
  • indeed, if above equation holds for all such $A$, it holds for many other subsets

Relations of different types of convergences of random variables

convergence with probability $1$ implies convergence in probability, which implies $X_n\Rightarrow X$, i.e., $$ \begin{eqnarray*} && X_n \to X \mbox{ a.s., i.e., } X_n \mbox{ converge to } X \mbox{ with probability $1$} \\ &\Rightarrow& X_n \mbox{ converge to } X \mbox{ in probability} \\ &\Rightarrow& X_n \Rightarrow X \mbox{, i.e., } X_n \mbox{ converge to } X \mbox{ in distribution} \end{eqnarray*} $$

Necessary and sufficient conditions for convergence with probability $1$ and in probability

\[X_n \to X \mbox{ a.s., i.e., } X_n \mbox{ converge to } X \mbox{ with probability } 1\]

if and only if

\[\left( \forall \epsilon>0 \right) \left( \prob{|X_n-X|>\epsilon\mbox{ i.o.}} = \prob{\limsup\, \{|X_n-X| > \epsilon\} } = 0 \right)\]

while

\[X_n \mbox{ converge to } X \mbox{ in probability}\]

if and only if

\[\left( \forall \mbox{ subsequence }\seq{X_{n_k}} \right) \left( \exists \mbox{ its subsequence }\seq{X_{n_{k_l}}} \mbox{ converging to } X \mbox{ with probability } 1 \right)\]

Necessary and sufficient conditions for convergence in distribution

\[X_n\Rightarrow X, \mbox{ i.e., $X_n$ converge in distribution}\]

if and only if

\[F_n\Rightarrow F, \mbox{ i.e., $F_n$ converge weakly}\]

if and only if

\[\left( \forall A = (-\infty, x] \mbox{ with } \mu\{x\} = 0 \right) \left( \lim \mu_n(A) = \mu(A) \right)\]

if and only if

\[\left( \forall x \mbox{ with } \prob{X=x} = 0 \right) \left( \lim \prob{X_n\leq x} = \prob{X\leq x} \right)\]

Strong law of large numbers

– define $S_n = \sum_{i=1}^n X_i$

for sequence of independent and identically distributed (i.i.d.) random variables with finite mean, $\seq{X_n}$ $$ \frac{1}{n} S_n \to \Expect X_1 $$ with probability $1$
  • strong law of large numbers also called Kolmogorov's law
for sequence of independent and identically distributed (i.i.d.) random variables with $\Expect X_1^- < \infty$ and $\Expect X_1^+ = \infty$ (hence, $\Expect X_1 = \infty$) $$ \frac{1}{n} S_n \to \infty $$ with probability $1$

Weak law of large numbers

– define $S_n = \sum_{i=1}^n X_i$

for sequence of independent and identically distributed (i.i.d.) random variables with finite mean, $\seq{X_n}$ $$ \frac{1}{n} S_n \to \Expect X_1 $$ in probability
  • because convergence with probability $1$ implies convergence in probability (see above), strong law of large numbers implies weak law of large numbers
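
A simulation sketch of the law of large numbers for i.i.d. $\mathrm{Exp}(1)$ samples, for which $\Expect X_1 = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(size=1_000_000)   # i.i.d., E X_1 = 1

running_mean = np.cumsum(X) / np.arange(1, X.size + 1)
for n in (100, 10_000, 1_000_000):
    print(n, running_mean[n - 1])     # S_n / n approaches 1
```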

Normal distributions

– assume probability space, $\meas{\Omega}{\algF}{P}$

Random variable, $X:\Omega\to\reals$, with $$ \left( \forall A\in\algR \right) \left( \prob{X\in A} = \frac{1}{\sqrt{2\pi}\sigma} \int_A e^{-(x-c)^2/2\sigma^2} dx \right) $$ for some $\sigma>0$ and $c\in\reals$, said to have normal distribution and denoted by $X \sim \normal(c,\sigma^2)$
  • note $\Expect X=c$ and $\Var X=\sigma^2$
  • called standard normal distribution when $c=0$ and $\sigma=1$

Multivariate normal distributions

– assume probability space, $\meas{\Omega}{\algF}{P}$

Random vector, $X:\Omega\to\reals^n$, with $$ \left( \forall A\in\algR^n \right) \left( \prob{X\in A} = \frac{1}{\sqrt{(2\pi)^n}\sqrt{\det \Sigma}} \int_A e^{-(x-c)^T\Sigma^{-1}(x-c)/2} dx \right) $$ for some $\Sigma\in\posdefset{n}$ and $c\in\reals^n$, said to have ($n$-dimensional) normal distribution, and denoted by $X \sim \normal(c,\Sigma)$
  • note that $\Expect X=c$ and covariance matrix is $\Sigma$

Lindeberg-Lévy theorem

– define $S_n = \sum_{i=1}^n X_i$

for independent random variables, $\seq{X_n}$, having same distribution with expected value, $c$, and same variance, $\sigma^2<\infty$, ${(S_n - nc)}/{\sigma\sqrt{n}}$ converges in distribution to standard normal, i.e., $$ \frac{S_n - nc}{\sigma\sqrt{n}} \Rightarrow N $$ where $N$ is random variable with standard normal distribution
  • implies $$ S_n / n \Rightarrow c $$
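
A simulation sketch comparing standardized sums of i.i.d. $\mathrm{Exp}(1)$ variables (so $c=\sigma=1$) against standard normal quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 500, 10_000
S = rng.exponential(size=(reps, n)).sum(axis=1)     # c = 1, sigma = 1

Z = (S - n) / np.sqrt(n)                            # (S_n - nc) / (sigma sqrt(n))
for q in (0.1, 0.5, 0.9):
    print(q, np.quantile(Z, q), stats.norm.ppf(q))  # empirical vs normal quantiles
```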

Limit theorems in $\reals^n$

each of following statements is equivalent to weak convergence of measures, $\seq{\mu_n}$, to $\mu$, on measurable space, $\measu{\reals^k}{\algR^k}$ (portmanteau theorem)
  • $\lim \int f d\mu_n = \int f d\mu$ for every bounded continuous $f$
  • $\limsup \mu_n(C) \leq \mu(C)$ for every closed $C$
  • $\liminf \mu_n(G) \geq \mu(G)$ for every open $G$
  • $\lim \mu_n(A) = \mu(A)$ for every $\mu$-continuity set $A$, i.e., $A$ with $\mu(\boundary A) = 0$
for $k$-dimensional random vectors, $\seq{X_n}$, and random vector, $Y$, $X_n\Rightarrow Y$, i.e., $X_n$ converge to $Y$ in distribution, if and only if $$ \left( \forall z\in \reals^k \right) \left( z^T X_n \Rightarrow z^T Y \right) $$ (Cramér-Wold device)

Central limit theorem

– assume probability space, $\meas{\Omega}{\algF}{P}$, and define $S_n = \sum_{i=1}^n X_i$

for independent random vectors, $\seq{X_n}$, having same distribution with $\Expect X_n = c\in\reals^k$ and positive definite covariance matrix, $\Sigma\in\posdefset{k}$, i.e., $\Expect(X_n-c)(X_n-c)^T = \Sigma$, where $\Sigma_{ii} < \infty$ (hence $\Sigma \prec M I_k$ for some $M\in\ppreals$ due to Cauchy-Schwarz inequality), $$ (S_n -nc)/\sqrt{n} \mbox{ converges in distribution to } Y $$ where $Y \sim \normal(0,\Sigma)$

Convergence of random series

  • for independent $\seq{X_n}$, probability of $\sum X_n$ converging is either $0$ or $1$
  • below characterize two cases in terms of distributions of individual $X_n$
for independent $\seq{X_n}$ with $\Expect X_n=0$ and $\sum \Var X_n < \infty$ $$ \sum X_n \mbox{ converges with probability $1$} $$
for independent $\seq{X_n}$, $\sum X_n$ converges with probability $1$ if and only if it converges in probability

– define truncated version of $X_n$ by $X_n^{(c)}$, i.e., $X_n I_{|X_n|\leq c}$

for independent $\seq{X_n}$, $\sum X_n$ converges with probability $1$ if all of $$ \sum \prob{|X_n|>c}, \quad \sum \Expect(X_n^{(c)}), \quad \sum \Var(X_n^{(c)}) $$ converge for some $c>0$ (Kolmogorov's three-series theorem)
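
A simulation sketch with $X_n = \pm 1/n$ (equal probability): $\Expect X_n = 0$ and $\sum \Var X_n = \sum 1/n^2 < \infty$, so partial sums should settle:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
signs = rng.choice([-1.0, 1.0], size=N)

partial = np.cumsum(signs / np.arange(1, N + 1))   # partial sums of sum X_n
print(partial[999], partial[99_999], partial[-1])  # nearly identical: series converges
```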
