All Math Topics in All the Multiverses
posted: 02-Aug-2025 & updated: 03-Aug-2025
\[% \newcommand{\algA}{\algk{A}} \newcommand{\algC}{\algk{C}} \newcommand{\bigtimes}{\times} \newcommand{\compl}[1]{\tilde{#1}} \newcommand{\complexes}{\mathbb{C}} \newcommand{\dom}{\mathop{\bf dom {}}} \newcommand{\ereals}{\reals\cup\{-\infty,\infty\}} \newcommand{\field}{\mathbb{F}} \newcommand{\integers}{\mathbb{Z}} \newcommand{\lbdseqk}[1]{\seqk{\lambda}{#1}} \newcommand{\meas}[3]{({#1}, {#2}, {#3})} \newcommand{\measu}[2]{({#1}, {#2})} \newcommand{\meast}[3]{\left({#1}, {#2}, {#3}\right)} \newcommand{\naturals}{\mathbb{N}} \newcommand{\nuseqk}[1]{\seqk{\nu}{#1}} \newcommand{\pair}[2]{\langle {#1}, {#2}\rangle} \newcommand{\rationals}{\mathbb{Q}} \newcommand{\reals}{\mathbb{R}} \newcommand{\seq}[1]{\left\langle{#1}\right\rangle} \newcommand{\powerset}{\mathcal{P}} \newcommand{\pprealk}[1]{\reals_{++}^{#1}} \newcommand{\ppreals}{\mathbb{R}_{++}} \newcommand{\prealk}[1]{\reals_{+}^{#1}} \newcommand{\preals}{\mathbb{R}_+} \newcommand{\tXJ}{\topos{X}{J}} % \newcommand{\relint}{\mathop{\bf relint {}}} \newcommand{\boundary}{\mathop{\bf bd {}}} \newcommand{\subsetset}[1]{\mathcal{#1}} \newcommand{\Tr}{\mathcal{\bf Tr}} \newcommand{\symset}[1]{\mathbf{S}^{#1}} \newcommand{\possemidefset}[1]{\mathbf{S}_+^{#1}} \newcommand{\posdefset}[1]{\mathbf{S}_{++}^{#1}} \newcommand{\ones}{\mathbf{1}} \newcommand{\Prob}{\mathop{\bf Prob {}}} \newcommand{\prob}[1]{\Prob\left\{#1\right\}} \newcommand{\Expect}{\mathop{\bf E {}}} \newcommand{\Var}{\mathop{\bf Var{}}} \newcommand{\Mod}[1]{\;(\text{mod}\;#1)} \newcommand{\ball}[2]{B(#1,#2)} \newcommand{\generates}[1]{\langle {#1} \rangle} \newcommand{\isomorph}{\approx} \newcommand{\isomorph}{\approx} \newcommand{\nullspace}{\mathcalfont{N}} \newcommand{\range}{\mathcalfont{R}} \newcommand{\diag}{\mathop{\bf diag {}}} \newcommand{\rank}{\mathop{\bf rank {}}} \newcommand{\Ker}{\mathop{\mathrm{Ker} {}}} \newcommand{\Map}{\mathop{\mathrm{Map} {}}} \newcommand{\End}{\mathop{\mathrm{End} {}}} \newcommand{\Img}{\mathop{\mathrm{Im} {}}} \newcommand{\Aut}{\mathop{\mathrm{Aut} {}}} \newcommand{\Gal}{\mathop{\mathrm{Gal} {}}} \newcommand{\Irr}{\mathop{\mathrm{Irr} {}}} \newcommand{\arginf}{\mathop{\mathrm{arginf}}} \newcommand{\argsup}{\mathop{\mathrm{argsup}}} \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\ev}{\mathop{\mathrm{ev} {}}} \newcommand{\affinehull}{\mathop{\bf aff {}}} \newcommand{\cvxhull}{\mathop{\bf Conv {}}} \newcommand{\epi}{\mathop{\bf epi {}}} \newcommand{\injhomeo}{\hookrightarrow} \newcommand{\perm}[1]{\text{Perm}(#1)} \newcommand{\aut}[1]{\text{Aut}(#1)} \newcommand{\ideal}[1]{\mathfrak{#1}} \newcommand{\bigset}[2]{\left\{#1\left|{#2}\right.\right\}} \newcommand{\bigsetl}[2]{\left\{\left.{#1}\right|{#2}\right\}} \newcommand{\primefield}[1]{\field_{#1}} \newcommand{\dimext}[2]{[#1:{#2}]} \newcommand{\restrict}[2]{#1|{#2}} \newcommand{\algclosure}[1]{#1^\mathrm{a}} \newcommand{\finitefield}[2]{\field_{#1^{#2}}} \newcommand{\frobmap}[2]{\varphi_{#1,{#2}}} % %\newcommand{\algfontmode}{} % %\ifdefined\algfontmode %\newcommand\mathalgfont[1]{\mathcal{#1}} %\newcommand\mathcalfont[1]{\mathscr{#1}} %\else \newcommand\mathalgfont[1]{\mathscr{#1}} \newcommand\mathcalfont[1]{\mathcal{#1}} %\fi % %\def\DeltaSirDir{yes} %\newcommand\sdirletter[2]{\ifthenelse{\equal{\DeltaSirDir}{yes}}{\ensuremath{\Delta #1}}{\ensuremath{#2}}} \newcommand{\sdirletter}[2]{\Delta #1} \newcommand{\sdirlbd}{\sdirletter{\lambda}{\Delta \lambda}} \newcommand{\sdir}{\sdirletter{x}{v}} \newcommand{\seqk}[2]{#1^{(#2)}} \newcommand{\seqscr}[3]{\seq{#1}_{#2}^{#3}} 
\newcommand{\xseqk}[1]{\seqk{x}{#1}} \newcommand{\sdirk}[1]{\seqk{\sdir}{#1}} \newcommand{\sdiry}{\sdirletter{y}{\Delta y}} \newcommand{\slen}{t} \newcommand{\slenk}[1]{\seqk{\slen}{#1}} \newcommand{\ntsdir}{\sdir_\mathrm{nt}} \newcommand{\pdsdir}{\sdir_\mathrm{pd}} \newcommand{\sdirnu}{\sdirletter{\nu}{w}} \newcommand{\pdsdirnu}{\sdirnu_\mathrm{pd}} \newcommand{\pdsdiry}{\sdiry_\mathrm{pd}} \newcommand\pdsdirlbd{\sdirlbd_\mathrm{pd}} % \newcommand{\normal}{\mathcalfont{N}} % \newcommand{\algk}[1]{\mathalgfont{#1}} \newcommand{\collk}[1]{\mathcalfont{#1}} \newcommand{\classk}[1]{\collk{#1}} \newcommand{\indexedcol}[1]{\{#1\}} \newcommand{\rel}{\mathbf{R}} \newcommand{\relxy}[2]{#1\;\rel\;{#2}} \newcommand{\innerp}[2]{\langle{#1},{#2}\rangle} \newcommand{\innerpt}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\closure}[1]{\overline{#1}} \newcommand{\support}{\mathbf{support}} \newcommand{\set}[2]{\{#1|#2\}} \newcommand{\metrics}[2]{\langle {#1}, {#2}\rangle} \newcommand{\interior}[1]{#1^\circ} \newcommand{\topol}[1]{\mathfrak{#1}} \newcommand{\topos}[2]{\langle {#1}, \topol{#2}\rangle} % topological space % \newcommand{\alg}{\algk{A}} \newcommand{\algB}{\algk{B}} \newcommand{\algF}{\algk{F}} \newcommand{\algR}{\algk{R}} \newcommand{\algX}{\algk{X}} \newcommand{\algY}{\algk{Y}} % \newcommand\coll{\collk{C}} \newcommand\collB{\collk{B}} \newcommand\collF{\collk{F}} \newcommand\collG{\collk{G}} \newcommand{\tJ}{\topol{J}} \newcommand{\tS}{\topol{S}} \newcommand\openconv{\collk{U}} % \newenvironment{my-matrix}[1]{\begin{bmatrix}}{\end{bmatrix}} \newcommand{\colvectwo}[2]{\begin{my-matrix}{c}{#1}\\{#2}\end{my-matrix}} \newcommand{\colvecthree}[3]{\begin{my-matrix}{c}{#1}\\{#2}\\{#3}\end{my-matrix}} \newcommand{\rowvecthree}[3]{\begin{bmatrix}{#1}&{#2}&{#3}\end{bmatrix}} \newcommand{\mattwotwo}[4]{\begin{bmatrix}{#1}&{#2}\\{#3}&{#4}\end{bmatrix}} % \newcommand\optfdk[2]{#1^\mathrm{#2}} \newcommand\tildeoptfdk[2]{\tilde{#1}^\mathrm{#2}} \newcommand\fobj{\optfdk{f}{obj}} \newcommand\fie{\optfdk{f}{ie}} \newcommand\feq{\optfdk{f}{eq}} \newcommand\tildefobj{\tildeoptfdk{f}{obj}} \newcommand\tildefie{\tildeoptfdk{f}{ie}} \newcommand\tildefeq{\tildeoptfdk{f}{eq}} \newcommand\xdomain{\mathcalfont{X}} \newcommand\xobj{\optfdk{\xdomain}{obj}} \newcommand\xie{\optfdk{\xdomain}{ie}} \newcommand\xeq{\optfdk{\xdomain}{eq}} \newcommand\optdomain{\mathcalfont{D}} \newcommand\optfeasset{\mathcalfont{F}} % \newcommand{\bigpropercone}{\mathcalfont{K}} % \newcommand{\prescript}[3]{\;^{#1}{#3}} % %\]
Introduction
Preamble
Notations
-
sets of numbers
- $\naturals$ - set of natural numbers
- $\integers$ - set of integers
- $\integers_+$ - set of nonnegative integers
- $\rationals$ - set of rational numbers
- $\reals$ - set of real numbers
- $\preals$ - set of nonnegative real numbers
- $\ppreals$ - set of positive real numbers
- $\complexes$ - set of complex numbers
-
sequences $\seq{x_i}$ and the like
- finite $\seq{x_i}_{i=1}^n$, infinite $\seq{x_i}_{i=1}^\infty$ - use $\seq{x_i}$ whenever unambiguously understood
- similarly for other operations, e.g., $\sum x_i$, $\prod x_i$, $\cup A_i$, $\cap A_i$, $\bigtimes A_i$
- similarly for integrals, e.g., $\int f$ for $\int_{-\infty}^\infty f$
-
sets
- $\compl{A}$ - complement of $A$
- $A\sim B$ - $A\cap \compl{B}$
- $A\Delta B$ - $(A\cap \compl{B}) \cup (\compl{A} \cap B)$
- $\powerset(A)$ - set of all subsets of $A$
-
sets in metric vector spaces
- $\closure{A}$ - closure of set $A$
- $\interior{A}$ - interior of set $A$
- $\relint A$ - relative interior of set $A$
- $\boundary A$ - boundary of set $A$
-
set algebra
- $\sigma(\subsetset{A})$ - $\sigma$-algebra generated by $\subsetset{A}$, i.e., smallest $\sigma$-algebra containing $\subsetset{A}$
-
norms in $\reals^n$
- $\|x\|_p$ ($p\geq1$) - $p$-norm of $x\in\reals^n$, i.e., $(|x_1|^p + \cdots + |x_n|^p)^{1/p}$
- e.g., $\|x\|_2$ - Euclidean norm
-
matrices and vectors
- $a_{i}$ - $i$-th entry of vector $a$
- $A_{ij}$ - entry of matrix $A$ at position $(i,j)$, i.e., entry in $i$-th row and $j$-th column
- $\Tr(A)$ - trace of $A \in\reals^{n\times n}$, i.e., $A_{1,1}+ \cdots + A_{n,n}$
-
symmetric, positive definite, and positive semi-definite matrices
- $\symset{n}\subset \reals^{n\times n}$ - set of symmetric matrices
- $\possemidefset{n}\subset \symset{n}$ - set of positive semi-definite matrices; $A\succeq0 \Leftrightarrow A \in \possemidefset{n}$
- $\posdefset{n}\subset \symset{n}$ - set of positive definite matrices; $A\succ0 \Leftrightarrow A \in \posdefset{n}$
-
sometimes,
use Python script-like notations
(with serious abuse of mathematical notations)
-
use $f:\reals\to\reals$ as if it were $f:\reals^n \to \reals^n$,
e.g.,
$$
\exp(x) = (\exp(x_1), \ldots, \exp(x_n)) \quad \mbox{for } x\in\reals^n
$$
and
$$
\log(x) = (\log(x_1), \ldots, \log(x_n)) \quad \mbox{for } x\in\ppreals^n
$$
which corresponds to Python code `numpy.exp(x)` or `numpy.log(x)` where `x` is instance of `numpy.ndarray`, i.e., `numpy` array
-
use $\sum x$ to mean $\ones^T x$ for $x\in\reals^n$,
i.e.
$$
\sum x = x_1 + \cdots + x_n
$$
which corresponds to Python code `x.sum()` where `x` is `numpy` array
-
use $x/y$ for $x,y\in\reals^n$ to mean
$$
\rowvecthree{x_1/y_1}{\cdots}{x_n/y_n}^T
$$
which corresponds to Python code `x / y` where `x` and `y` are $1$-d `numpy` arrays
-
use $X/Y$ for $X,Y\in\reals^{m\times n}$ to mean
$$
\begin{my-matrix}{cccc}
X_{1,1}/Y_{1,1} & X_{1,2}/Y_{1,2} & \cdots & X_{1,n}/Y_{1,n}
\\
X_{2,1}/Y_{2,1} & X_{2,2}/Y_{2,2} & \cdots & X_{2,n}/Y_{2,n}
\\
\vdots & \vdots & \ddots & \vdots
\\
X_{m,1}/Y_{m,1} & X_{m,2}/Y_{m,2} & \cdots & X_{m,n}/Y_{m,n}
\end{my-matrix}
$$
which corresponds to Python code `X / Y` where `X` and `Y` are $2$-d `numpy` arrays
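For concreteness, here is a minimal `numpy` sketch of the abused notations above (array values are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

np.exp(x)    # elementwise exp: (e^1, e^2, e^3)
np.log(x)    # elementwise log; requires positive entries
x.sum()      # 1 + 2 + 3 = 6, i.e., "sum x" equals ones^T x
x / y        # elementwise division: (1/4, 2/5, 3/6)

X = np.arange(1.0, 7.0).reshape(2, 3)  # 2-d array
Y = np.full((2, 3), 2.0)               # 2-d array of 2's
X / Y        # elementwise division of 2-d arrays
```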
Some definitions
Some conventions
-
(for some subjects) use following conventions
- $0\cdot \infty = \infty \cdot 0 = 0$
- $(\forall x\in\ppreals)(x\cdot \infty = \infty \cdot x = \infty)$
- $\infty \cdot \infty = \infty$
Math Stories
Dualities
-
duality
- “very pervasive and important concept in (modern) mathematics”
- “important general theme having manifestations in almost every area of mathematics”
-
dualities appear in many places in mathematics, e.g.
- dual of normed space is space of bounded linear functionals on the space
- dual cones and dual norms are defined
- can define dual generalized inequalities using dual cones
- can find necessary and sufficient conditions for $K$-convexity using dual generalized inequalities
- duality can be observed even in fundamental theorem for Galois theory, i.e., $G(K/E) \leftrightarrow E$ & $H \leftrightarrow K^H$
- exist dualities in continuous / discrete functions in time domain and continuous / discrete functions in frequency domain, i.e., as in Fourier Transformation
- However, never fascinated more than , e.g.,
Algebra
Inequalities
Jensen's inequality
- strictly convex function: for any $x\neq y$ and $0< \alpha <1$ $$ \alpha f(x) + (1-\alpha) f(y) > f(\alpha x + (1-\alpha) y) $$
- convex function: for any $x, y$ and $0< \alpha <1$ $$ \alpha f(x) + (1-\alpha) f(y) \geq f(\alpha x + (1-\alpha) y) $$
- Jensen's inequality: for convex $f$ and $\alpha_i>0$ with $\alpha_1+\cdots+\alpha_n=1$ $$ \alpha_1 f(x_1) + \cdots + \alpha_n f(x_n) \geq f(\alpha_1 x_1 + \cdots + \alpha_n x_n) $$ if $f$ is strictly convex, equality holds if and only if $x_1=\cdots=x_n$
Jensen's inequality - for random variables
- discrete random variable interpretation of Jensen's inequality in summation form - assume $\Prob(X=x_i) = \alpha_i$, then $$ \Expect f(X) = \alpha_1 f(x_1) + \cdots + \alpha_n f(x_n) \geq f(\alpha_1 x_1 + \cdots + \alpha_n x_n) = f\left(\Expect X\right) $$
- true for any random variable $X$ for which $\Expect X$ and $\Expect f(X)$ exist
Proof for $n=3$
- for any $x,y,z$ and $\alpha,\beta,\gamma>0$ with $\alpha + \beta + \gamma = 1$ $$ \begin{eqnarray*} \alpha f(x) + \beta f(y) + \gamma f(z) &=& (\alpha+\beta)\left(\frac{\alpha}{\alpha+\beta} f(x) + \frac{\beta}{\alpha + \beta} f(y)\right) + \gamma f(z) \\ &\geq& (\alpha+\beta)f\left(\frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha + \beta} y\right) + \gamma f(z) \\ &\geq& f\left((\alpha+\beta)\left(\frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha + \beta} y\right) + \gamma z \right) \\ &=& f(\alpha x + \beta y + \gamma z ) \end{eqnarray*} $$
Proof for all $n$
-
use mathematical induction
- assume that Jensen's inequality holds for $1\leq n\leq m$
- for distinct $x_i$ and $\alpha_i>0$ ($1\leq i\leq m+1$) with $\alpha_1 + \cdots + \alpha_{m+1} = 1$ $$ \begin{eqnarray*} \sum^{m+1}_{i=1} \alpha_i f(x_i) &=& \left(\sum^m_{j=1} \alpha_j\right) \sum^m_{i=1} \left(\frac{\alpha_i}{\sum^m_{j=1} \alpha_j} f(x_i)\right) + \alpha_{m+1} f(x_{m+1}) \\ &\geq& \left(\sum^m_{j=1} \alpha_j\right) f\left(\sum^m_{i=1} \left(\frac{\alpha_i}{\sum^m_{j=1} \alpha_j} x_i\right)\right) + \alpha_{m+1} f(x_{m+1}) \\ &=& \left(\sum^m_{j=1} \alpha_j\right) f\left(\frac{1}{\sum^m_{j=1} \alpha_j}\sum^m_{i=1} {\alpha_i}{} x_i\right) + \alpha_{m+1} f(x_{m+1}) \\ &\geq& f\left( \sum^m_{i=1} \alpha_i x_i + \alpha_{m+1} x_{m+1}\right) = f\left( \sum^{m+1}_{i=1} \alpha_i x_i \right) \end{eqnarray*} $$
1st and 2nd order conditions for convexity
- 1st order condition (assuming differentiable $f:\reals\to\reals$) - $f$ is strictly convex if and only if for any $x\neq y$ $$ f(y) > f(x) + f'(x)(y-x) $$
-
2nd order condition (assuming twice-differentiable $f:\reals\to\reals$)
- if $f''(x)>0$, $f$ is strictly convex
- $f$ is convex if and only if for any $x$ $$ f''(x) \geq 0 $$
Jensen's inequality examples
- $f(x)=x^2$ is strictly convex $$ \frac{a^2 + b^2}{2} \geq \left(\frac{a+b}{2}\right)^2 $$
- $f(x)=x^4$ is strictly convex $$ \frac{a^4 + b^4}{2} \geq \left(\frac{a+b}{2}\right)^4 $$
- $f(x)=\exp(x)$ is strictly convex $$ \frac{\exp(a) + \exp(b)}{2} \geq \exp\left(\frac{a+b}{2}\right) $$
- equality holds if and only if $a=b$ for all inequalities
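A quick numeric sanity check of the three examples above (the values of $a$ and $b$ are arbitrary):

```python
import math

a, b = 1.3, 2.7
m = (a + b) / 2  # midpoint, i.e., alpha = 1/2
assert (a**2 + b**2) / 2 >= m**2                        # f(x) = x^2
assert (a**4 + b**4) / 2 >= m**4                        # f(x) = x^4
assert (math.exp(a) + math.exp(b)) / 2 >= math.exp(m)   # f(x) = exp(x)
```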
1st and 2nd order conditions for convexity - vector version
- 1st order condition (assuming differentiable $f:\reals^n\to\reals$) - $f$ is strictly convex if and only if for any $x\neq y$ $$ f(y) > f(x) + \nabla f(x)^T (y-x) $$ where $\nabla f(x) \in\reals^{n}$ with $\nabla f(x)_{i} = \partial f(x) / \partial x_i$
-
2nd order condition (assuming twice-differentiable $f:\reals^n\to\reals$)
- if $\nabla^2 f(x) \succ 0$, $f$ is strictly convex
- $f$ is convex if and only if for any $x$ $$ \nabla^2 f(x)\succeq 0 $$
Jensen's inequality examples - vector version
- assume $f:\reals^n\to\reals$
-
$f(x)=\|x\|_2 = \sqrt{\sum x_i^2}$ is convex (though, like every norm, not strictly convex: it is linear along rays)
$$
(\|a\|_2 + 2\|b\|_2 )/3
\geq
\left\|(a+2b)/3\right\|_2
$$
- equality holds if and only if $a$ and $b$ are nonnegatively proportional, e.g., $a=b\in\reals^n$
-
$f(x)=\|x\|_p = \left(\sum |x_i|^p\right)^{1/p}$ ($p>1$) is convex
$$
\frac{1}{k}
\left(\sum_{i=1}^k\|x^{(i)}\|_p \right)
\geq
\left\|\frac{1}{k}\sum_{i=1}^k x^{(i)}\right\|_p
$$
- equality holds if and only if $x^{(1)},\ldots,x^{(k)}\in\reals^n$ are nonnegatively proportional, e.g., all equal
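A numeric check of the first vector example with `numpy` (random vectors, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
b = rng.standard_normal(5)
# convexity of the 2-norm with weights 1/3 and 2/3
lhs = (np.linalg.norm(a) + 2 * np.linalg.norm(b)) / 3
rhs = np.linalg.norm((a + 2 * b) / 3)
assert lhs >= rhs
```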
AM $\geq$ GM
-
for all $a,b>0$
$$
\frac{a + b}{2} \geq \sqrt{ab}
$$
- equality holds if and only if $a=b$
- the most general form below also holds: for $a_i>0$ and $\alpha_i>0$ with $\alpha_1+\cdots+\alpha_n=1$ $$ \alpha_1 a_1 + \cdots + \alpha_n a_n \geq a_1^{\alpha_1}\cdots a_n^{\alpha_n} $$
- let's prove these incrementally (first for rational $\alpha_i$, then for real $\alpha_i$)
Proof of AM $\geq$ GM - simplest case
- use fact that $x^2\geq0$ for any $x\in\reals$
-
for any $a,b>0$
$$
\begin{eqnarray*}
&&
(\sqrt{a}-\sqrt{b})^2 \geq 0
\\
&\Leftrightarrow&
a - 2\sqrt{ab} + b \geq 0
\\
&\Leftrightarrow&
a + b \geq 2\sqrt{ab}
\\
&\Leftrightarrow&
\frac{a + b}{2} \geq \sqrt{ab}
\end{eqnarray*}
$$
- equality holds if and only if $a=b$
Proof of AM $\geq$ GM - when $n=4$ and $n=8$
-
for any $a,b,c,d>0$
$$
\frac{a+b+c+d}{4}
\geq
\frac{2\sqrt{ab} + 2\sqrt{cd}}{4}
=
\frac{\sqrt{ab} + \sqrt{cd}}{2}
\geq
\sqrt{\sqrt{ab} \sqrt{cd}}
=
\sqrt[4]{abcd}
$$
- equality holds if and only if $a=b$ and $c=d$ and $ab=cd$ if and only if $a=b=c=d$
-
likewise, for $a_1,\ldots,a_8>0$
$$
\begin{eqnarray*}
\frac{a_1+\cdots+a_8}{8}
&\geq&
\frac{\sqrt{a_1a_2} + \sqrt{a_3a_4} + \sqrt{a_5a_6} + \sqrt{a_7a_8}}{4}
\\
&\geq&
\sqrt[4]{\sqrt{a_1a_2} \sqrt{a_3a_4} \sqrt{a_5a_6} \sqrt{a_7a_8}}
\\
&=&
\sqrt[8]{a_1\cdots a_8}
\end{eqnarray*}
$$
- equality holds if and only if $a_1=\cdots=a_8$
Proof of AM $\geq$ GM - when $n=2^m$
-
generalized to cases $n=2^m$
$$
\left(\sum_{i=1}^{2^m} a_i\right) / 2^m\geq \left({\prod_{i=1}^{2^m} a_i}\right)^{1/2^m}
$$
- equality holds if and only if $a_1=\cdots=a_{2^m}$
- can be proved by mathematical induction
Proof of AM $\geq$ GM - when $n=3$
-
proof for $n=3$
$$
\begin{eqnarray*}
&&
\frac{a+b+c}{3} = \frac{a + b + c + (a+b+c)/3}{4}
\geq \sqrt[4]{abc(a+b+c)/3}
\\
&\Rightarrow&
\left(\frac{a+b+c}{3}\right)^4 \geq {abc(a+b+c)/3}
\\
&\Leftrightarrow&
\left(\frac{a+b+c}{3}\right)^3 \geq abc
\\
&\Leftrightarrow&
\frac{a+b+c}{3} \geq \sqrt[3]{abc}
\end{eqnarray*}
$$
- equality holds if and only if $a=b=c=(a+b+c)/3$ if and only if $a=b=c$
Proof of AM $\geq$ GM - for all integers
- for any integer $n\neq 2^m$
-
for $m$ such that $2^m>n$
$$
\begin{eqnarray*}
&&
\frac{a_1+\cdots+a_n}{n} = \frac{a_1 + \cdots + a_n + (2^m-n) (a_1+\cdots+a_n) /n}{2^m}
\\
&&
\geq
\sqrt[2^m]{a_1\cdots a_n \cdot ((a_1 + \cdots + a_n)/n)^{2^m-n}}
\\
&\Leftrightarrow&
\left(\frac{a_1+\cdots+a_n}{n}\right)^{2^m}
\geq
{a_1\cdots a_n \cdot \left(\frac{a_1 + \cdots + a_n}{n}\right)^{2^m-n}}
\\
&\Leftrightarrow&
\left(\frac{a_1+\cdots+a_n}{n}\right)^{n}
\geq
{a_1\cdots a_n}
\\
&\Leftrightarrow&
\frac{a_1+\cdots+a_n}{n}
\geq
\sqrt[n]{a_1\cdots a_n}
\end{eqnarray*}
$$
- equality holds if and only if $a_1=\cdots=a_n$
Proof of AM $\geq$ GM - rational $\alpha_i$
- given $n$ positive rational $\alpha_i$, we can find $n$ natural numbers $q_i$ such that $$ \alpha_i = \frac{q_i}{ N} $$ where $q_1+\cdots+q_n=N$
-
for any $n$ positive $a_i\in\reals$ and $n$ positive $\alpha_i\in\rationals$ with $\alpha_1+\cdots+\alpha_n=1$
$$
\alpha_1 a_1 + \cdots + \alpha_n a_n
= \frac{q_1 a_1 + \cdots + q_n a_n}{N}
\geq \sqrt[N]{a_1^{q_1}\cdots a_n^{q_n}}
= a_1^{\alpha_1}\cdots a_n^{\alpha_n}
$$
- equality holds if and only if $a_1=\cdots=a_n$
Proof of AM $\geq$ GM - real $\alpha_i$
- exist $n$ rational sequences $\{ \beta_{i,1}, \beta_{i,2}, \ldots\}$ ($1\leq i\leq n$) such that $$ \begin{eqnarray*} && \beta_{1,j}+\cdots+\beta_{n,j}=1 \ \forall \ j\geq1 \\ && \lim_{j\to\infty} \beta_{i,j} = \alpha_i \ \forall \ 1\leq i\leq n \end{eqnarray*} $$
- for all $j$ $$ \beta_{1,j} a_1 + \cdots + \beta_{n,j} a_n \geq a_1^{\beta_{1,j}}\cdots a_n^{\beta_{n,j}} $$ hence $$ \begin{eqnarray*} && \lim_{j\to\infty} \left(\beta_{1,j} a_1 + \cdots + \beta_{n,j} a_n \right) \geq \lim_{j\to\infty} a_1^{\beta_{1,j}}\cdots a_n^{\beta_{n,j}} \\ &\Leftrightarrow& \alpha_1 a_1 + \cdots + \alpha_n a_n \geq a_1^{\alpha_1}\cdots a_n^{\alpha_n} \end{eqnarray*} $$
- cannot prove equality condition from above proof method
Proof of AM $\geq$ GM using Jensen's inequality
- $(-\log)$ is strictly convex function because $$ \frac{d^2}{dx^2} \left(-\log(x)\right) = \frac{d}{dx} \left(-\frac{1}{x} \right) = \frac{1}{x^2} > 0 $$
- Jensen's inequality implies for $a_i >0$, $\alpha_i >0$ with $\sum \alpha_i = 1$ $$ \begin{eqnarray*} -\log\left(\prod a_i^{\alpha_i}\right) = -\sum \log\left( a_i^{\alpha_i}\right) = \sum \alpha_i (-\log(a_i)) \geq -\log \left(\sum \alpha_i a_i\right) \end{eqnarray*} $$
- $(-\log)$ strictly monotonically decreases, hence $\prod a_i^{\alpha_i} \leq \sum \alpha_i a_i$, which proves $$ \alpha_1 a_1 + \cdots + \alpha_n a_n \geq a_1^{\alpha_1}\cdots a_n^{\alpha_n} $$ with equality if and only if all $a_i$ are equal (by the equality condition of Jensen's inequality)
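A numeric check of the weighted AM $\geq$ GM inequality just proved, with randomly drawn positive $a_i$ and weights $\alpha_i$ (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.uniform(0.1, 10.0, size=4)   # positive a_i
alpha = rng.uniform(size=4)
alpha /= alpha.sum()                 # weights summing to 1
am = float(np.dot(alpha, a))         # weighted arithmetic mean
gm = float(np.prod(a ** alpha))      # weighted geometric mean
assert am >= gm
```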
Cauchy-Schwarz inequality
-
middle school proof
$$
\begin{eqnarray*}
&&\sum (t a_i + b_i)^2 \geq 0 \ \forall\ t \in \reals
\\
&\Leftrightarrow&
t^2 \sum a_i^2 + 2t \sum a_ib_i + \sum b_i^2 \geq 0 \ \forall\ t \in \reals
\\
&\Leftrightarrow&
\Delta = \left(\sum a_ib_i \right)^2 - \sum a_i^2 \sum b_i^2 \leq 0
\end{eqnarray*}
$$
- equality holds if and only if $\exists t\in\reals$, $t a_i + b_i=0$ for all $1\leq i\leq n$
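A numeric sanity check of the inequality and its equality condition (random data; illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(6)
b = rng.standard_normal(6)
assert np.dot(a, b)**2 <= np.dot(a, a) * np.dot(b, b)

# equality when b is proportional to a (take t with t*a + b = 0)
c = -3.0 * a
assert np.isclose(np.dot(a, c)**2, np.dot(a, a) * np.dot(c, c))
```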
Cauchy-Schwarz inequality - another proof
-
$x^2\geq0$ for any $x\in\reals$, hence
$$
\begin{eqnarray*}
&&
\sum_i \sum_j (a_ib_j - a_jb_i)^2 \geq0
\\
&\Leftrightarrow&
\sum_i \sum_j (a_i^2b_j^2 - 2a_ia_jb_ib_j + a_j^2b_i^2) \geq0
\\
&\Leftrightarrow&
\sum_i \sum_j a_i^2b_j^2 + \sum_i \sum_j a_j^2b_i^2 -2 \sum_i \sum_j a_ia_jb_ib_j \geq 0
\\
&\Leftrightarrow&
2 \sum_i a_i^2 \sum_j b_j^2 - 2 \sum_i a_ib_i \sum_j a_jb_j \geq 0
\\
&\Leftrightarrow&
\sum_i a_i^2 \sum_j b_j^2 - \left(\sum_i a_ib_i\right)^2 \geq0
\end{eqnarray*}
$$
- equality holds if and only if $a_ib_j=a_jb_i$ for all $1\leq i,j\leq n$
Cauchy-Schwarz inequality - still another proof
- for any $x,y\in\reals$ and $\alpha,\beta>0$ with $\alpha + \beta = 1$ $$ \begin{eqnarray*} && (\alpha x - \beta y)^2 = \alpha^2 x^2 + \beta^2 y^2 - 2\alpha \beta xy \\ && = \alpha(1-\beta) x^2 + (1-\alpha)\beta y^2 - 2\alpha \beta xy \geq 0 \\ &\Leftrightarrow& \alpha x^2 + \beta y^2 \geq \alpha \beta x^2 + \alpha \beta y^2 + 2\alpha \beta xy = \alpha \beta (x+y)^2 \\ &\Leftrightarrow& x^2 / \alpha + y^2 / \beta \geq (x+y)^2 \end{eqnarray*} $$
- plug in $x=a_i$, $y=b_i$, $\alpha = A/(A+B)$, $\beta=B/(A+B)$ where $A = \sqrt{\sum a_i^2}$, $B = \sqrt{\sum b_i^2}$ $$ \begin{eqnarray*} && \sum (a_i^2 / \alpha + b_i^2 / \beta) \geq \sum (a_i+b_i)^2 \Leftrightarrow (A+B)^2 \geq A^2 + B^2 + 2 \sum a_i b_i \\ &\Leftrightarrow& AB \geq \sum a_i b_i \Leftrightarrow A^2B^2 \geq \left(\sum a_i b_i\right)^2 \Leftrightarrow {\sum a_i^2}{\sum b_i^2} \geq \left(\sum a_i b_i \right)^2 \end{eqnarray*} $$
Cauchy-Schwarz inequality - proof using determinant
-
almost the same proof as first one - but using $2$-by-$2$ matrix determinant
$$
\begin{eqnarray*}
&&\sum (x a_i + y b_i )^2 \geq 0 \ \forall\ x,y \in \reals
\\
&\Leftrightarrow&
x^2 \sum a_i^2 + 2xy \sum a_ib_i + y^2\sum b_i^2 \geq 0 \ \forall \ x, y \in \reals
\\
&\Leftrightarrow&
\begin{my-matrix}{cc}
x & y
\end{my-matrix}
\begin{my-matrix}{cc}
\sum a_i^2 & \sum a_ib_i
\\
\sum a_ib_i & \sum b_i^2
\end{my-matrix}
\begin{my-matrix}{c}
x \\ y
\end{my-matrix}
\geq 0
\ \forall \ x, y \in \reals
\\
\\
&\Leftrightarrow&
\left|
\begin{array}{cc}
\sum a_i^2 & \sum a_ib_i
\\
\sum a_ib_i & \sum b_i^2
\end{array}
\right|
\geq 0
\Leftrightarrow
\sum a_i^2 \sum b_i^2 - \left(\sum a_ib_i \right)^2 \geq0
\end{eqnarray*}
$$
- equality holds if and only if $$ \left( \exists (x,y)\neq(0,0) \right) \left( xa_i + yb_i=0\ \ \forall 1\leq i\leq n \right) $$
- allows beautiful generalization of Cauchy-Schwarz inequality
Cauchy-Schwarz inequality - generalization
- want to say something like $\sum_{i=1}^n (x a_i + y b_i + z c_i + w d_i + \cdots)^2$
-
run out of letters - use double subscripts
$$
\begin{eqnarray*}
&&
\sum_{i=1}^n (x_1 A_{1,i} + x_2 A_{2,i} + \cdots + x_m A_{m,i})^2 \geq 0 \ \forall\ x_i \in \reals
\\
&\Leftrightarrow&
\sum_{i=1}^n (x^T a_i)^2
=
\sum_{i=1}^n x^T a_ia_i^T x
=
x^T \left(\sum_{i=1}^n a_ia_i^T\right) x \geq 0 \ \forall\ x \in \reals^m
\\
&\Rightarrow&
\left|
\begin{array}{cccc}
\sum_{i=1}^n A_{1,i}^2 & \sum_{i=1}^n A_{1,i} A_{2,i} & \cdots & \sum_{i=1}^n A_{1,i} A_{m,i}
\\
\sum_{i=1}^n A_{1,i}A_{2,i} & \sum_{i=1}^n A_{2,i}^2 & \cdots & \sum_{i=1}^n A_{2,i} A_{m,i}
\\
\vdots & \vdots & \ddots & \vdots
\\
\sum_{i=1}^n A_{1,i}A_{m,i} & \sum_{i=1}^n A_{2,i}A_{m,i} & \cdots & \sum_{i=1}^n A_{m,i}^2
\end{array}
\right|
\geq 0
\end{eqnarray*}
$$
- where $a_i = \begin{my-matrix}{ccc} A_{1,i} &\cdots & A_{m,i}\end{my-matrix}^T \in\reals^m$
- equality holds if and only if $\exists x\neq0\in\reals^m$, $x^Ta_i =0$ for all $1\leq i\leq n$
Cauchy-Schwarz inequality - three series of variables
-
let $m=3$
$$
\begin{eqnarray*}
&&
\begin{my-matrix}{ccc}
\sum a_{i}^2 & \sum a_{i} b_{i} & \sum a_{i} c_{i}
\\
\sum a_{i}b_{i} & \sum b_{i}^2 & \sum b_{i} c_{i}
\\
\sum a_{i}c_{i} & \sum b_{i}c_{i} & \sum c_{i}^2
\end{my-matrix}
\succeq 0
\\
&\Rightarrow&
\sum a_i^2 \sum b_i^2 \sum c_i^2 + 2 \sum a_ib_i \sum b_ic_i \sum c_ia_i
\\
&&
\geq \sum a_i^2 \left(\sum b_i c_i\right)^2 + \sum b_i^2 \left(\sum a_i c_i\right)^2 + \sum c_i^2 \left(\sum a_i b_i\right)^2
\end{eqnarray*}
$$
- equality holds if and only if $\exists (x,y,z)\neq(0,0,0)$, $xa_i + yb_i + zc_i=0$ for all $1\leq i\leq n$
-
questions for you
- what does this mean?
- any real-world applications?
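As a partial answer to the first question: the $m=3$ inequality says the $3$-by-$3$ Gram matrix of the series $a$, $b$, $c$ is positive semi-definite, so its determinant is nonnegative. A numeric check (random data; illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 8))    # rows are the series a, b, c
G = A @ A.T                        # Gram matrix: entries sum a_i b_i, etc.
assert np.linalg.det(G) >= -1e-12  # PSD, hence nonnegative determinant
# expanding det(G) >= 0 yields exactly the displayed inequality
```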
Cauchy-Schwarz inequality - extensions
- note that all these can be further generalized as in the generalization above
Number Theory - Queen of Mathematics
Integers
-
integers ($\integers$)
-
$\ldots -2, -1, 0, 1, 2, \ldots$
- first defined by Bertrand Russell
-
algebraic structure - commutative ring
- addition, multiplication defined, but division not defined
- addition, multiplication are associative
- multiplication distributive over addition
- addition, multiplication are commutative
-
natural numbers ($\naturals$)
- $1, 2, \ldots$
Division and prime numbers
- divisors for $n\in\naturals$ $$ \set{d\in\naturals}{ d \mbox{ divides } n} $$
-
prime numbers
- $p>1$ is prime if $1$ and $p$ are its only divisors
Fundamental theorem of arithmetic
Elementary quantities
- greatest common divisor (gcd) (of $a$ and $b$) $$ \gcd(a,b) = \max \set{d}{d\mbox{ divides both }a \mbox{ and } b} $$
- least common multiple (lcm) (of $a$ and $b$) $$ \mbox{lcm}(a,b) = \min \set{m}{\mbox{both } a \mbox{ and } b \mbox{ divide }m} $$
- $a$ and $b$ coprime, relatively prime, mutually prime $\Leftrightarrow$ $\gcd(a,b)=1$
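These quantities are available directly in Python, e.g.:

```python
import math

assert math.gcd(12, 18) == 6
assert 12 * 18 // math.gcd(12, 18) == 36  # lcm(a, b) = a*b / gcd(a, b)
assert math.gcd(9, 28) == 1               # 9 and 28 are coprime
```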
Are there infinite number of prime numbers?
- yes!
-
proof
- assume there only exist finite number of prime numbers, e.g., $p_1 < p_2 < \cdots <p_n$
- but then, $p_1 \cdot p_2 \cdots p_n + 1$ is divisible by none of $p_1,\ldots,p_n$, hence it is either itself a prime greater than $p_n$ or has a prime factor not among $p_1,\ldots,p_n$ - contradiction either way
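Note that $p_1 \cdots p_n + 1$ need not itself be prime; the point is that its prime factors are new. A small numeric illustration:

```python
primes = [2, 3, 5, 7, 11, 13]
N = 1
for p in primes:
    N *= p
N += 1                                   # N = 30031
assert all(N % p != 0 for p in primes)   # no listed prime divides N
assert N == 59 * 509                     # its prime factors are new primes
```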
Integers modulo $n$
-
$a\equiv b\Mod{n}$ and $c\equiv d\Mod{n}$ imply
- $a+c\equiv b+d \Mod{n}$
- $ac\equiv bd \Mod{n}$
Euler's theorem
- e.g., $\varphi(12) = \varphi(2^2\cdot 3^1) = 1\cdot2^1\cdot 2\cdot3^0 = 4$, $\varphi(10) = \varphi(2^1\cdot5^1) = 1\cdot2^0\cdot 4\cdot 5^0 =4$
- e.g., $5^4 \equiv 1 \Mod{12}$ whereas $4^4 \equiv 4 \neq 1 \Mod{12}$
- Euler's theorem underlies RSA cryptosystem, which is pervasively used in internet communication
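A quick check of the examples above using Python's built-in modular exponentiation:

```python
assert pow(5, 4, 12) == 1   # phi(12) = 4 and gcd(5, 12) = 1
assert pow(4, 4, 12) == 4   # 4**4 = 4 (mod 12): Euler needs gcd(a, n) = 1
assert pow(7, 4, 10) == 1   # phi(10) = 4 and gcd(7, 10) = 1
```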
Abstract Algebra
Why Abstract Algebra?
Why abstract algebra?
- it's fun!
- can understand intrinsic structures of algebraic objects
-
allow us to solve extremely practical problems
(depending on your definition of practicality)
- e.g., can prove why root formulas (in radicals) for polynomials of degree $n\geq 5$ do not exist
-
prepare us for pursuing further math topics such as
- differential geometry
- algebraic geometry
- analysis
- representation theory
- algebraic number theory
Some history
- by the way, historically, it is often the case that an application of an idea is presented before the idea is extracted and presented in its own right
- e.g., Galois used “quotient group” only implicitly in his 1830s investigation, and it had to wait until 1889 to be explicitly presented as “abstract quotient group” by Hölder
Groups
Monoids
- when $(\forall x, y, z \in S)((xy)z = x(yz))$, composition is said to be associative
- $e\in S$ such that $(\forall x\in S)(ex = xe = x)$, called unit element - always unique: for any two unit elements $e$ and $f$, $e = ef = f$, hence $e=f$
- monoid $M$ with $\left( \forall x, y \in M \right) \left( xy = yx \right)$, called commutative or abelian monoid
- subset $H\subset M$ which has the unit element $e$ and is itself monoid, called submonoid
Groups
- for $x\in G$, $y\in G$ with $xy=yx=e$, called inverse of $x$
- group derived from commutative monoid, called abelian group or commutative group
- group $G$ with $|G|<\infty$, called finite group
- (similarly as submonoid) $H\subset G$ that has unit element and is itself group, called subgroup
- subgroup consisting only of unit element, called trivial
Cyclic groups, generators, and direct products
Homomorphism and isomorphism
- group homomorphism $f:G\to G'$ is defined similarly as monoid-homomorphism
- homomorphism $f:G\to G'$ for which exists $g:G'\to G$ such that $f\circ g:G'\to G'$ and $g\circ f:G\to G$ are identity mappings, called isomorphism, sometimes denoted by $G\isomorph G'$
- homomorphism of $G$ into itself, called endomorphism
- isomorphism of $G$ onto itself, called automorphism
- set of all automorphisms of $G$ is itself group, denoted by $\aut{G}$
Kernel, image, and embedding of homomorphism
- for group-homomorphism $f:G\to G'$, $f(G)\subset G'$ is subgroup of $G'$
- homomorphism whose kernel is trivial is injective, often denoted by special arrow $$ f:G \injhomeo G' $$
- surjective homomorphism whose kernel is trivial is isomorphism
- for group $G$, its generators $S$, and another group $G'$, map $f:S\to G'$ has at most one extension to homomorphism of $G$ into $G'$
Orthogonal subgroups
Cosets of groups
- for $a\in G$, $x\mapsto ax$ induces bijection of $H$ onto $aH$, hence all left cosets have same cardinality
- $aH \cap bH \neq \emptyset$ for $a,b\in G$ implies $aH=bH$
- hence, $G$ is disjoint union of left cosets of $H$
- same statements can be made for right cosets
Indices and orders of groups
hence, if $(G:1)<\infty$, both $(G:H)$ and $(H:1)$ divide $(G:1)$
Normal subgroup
- set of cosets $\set{xH}{x\in G}$ with law of composition defined by $(xH)(yH) = (xy)H,$ forms group with unit element $H$, denoted by $G/H$, called factor group of $G$ by $H$, read $G$ modulo $H$ or $G$ mod $H$
- $x \mapsto xH$ induces homomorphism of $G$ onto $\set{xH}{x\in G}$, called canonical map, kernel of which is $H$
- kernel of (every) homomorphism of $G$ is normal subgroup of $G$
- for family of normal subgroups of $G$, $\seq{N_\lambda}$, $\bigcap N_\lambda$ is also normal subgroup
- every subgroup of abelian group is normal
- factor group of abelian group is abelian
- factor group of cyclic group is cyclic
Normalizers and centralizers
- e.g., $A \mapsto \det A$ of multiplicative group of nonsingular matrices in $\reals^{n\times n}$ into $\reals\sim\{0\}$ is homomorphism, kernel of which called special linear group, and (of course) is normal
Normalizers and congruence
- subgroup $H\subset G$ of group $G$ is normal subgroup of its normalizer $N_H$
- subgroup $K\subset G$ with $H\subset K$ where $H$ is normal in $K$ is contained in $N_H$
- for subgroup $K\subset N_H$, $KH$ is group and $H$ is normal in $KH$
- normalizer of $H$ is largest subgroup of $G$ in which $H$ is normal
Exact sequences of homomorphisms
- for normal subgroup $H\subset G$ of group $G$, sequence $H \overset{j}{\to} G \overset{\varphi}{\to} G/H$ is exact where $j$ is inclusion and $\varphi$ is canonical map
- $0 \overset{}{\to} G' \overset{f}{\to} G \overset{g}{\to} G'' \overset{}{\to} 0$ is exact if and only if $f$ injective, $g$ surjective, and $\Img f = \Ker g$
- if $H=\Ker g$ above, $0 \overset{}{\to} H \overset{}{\to} G \overset{}{\to} G/H \overset{}{\to} 0$ is exact
- more precisely, exists commutative diagram as in the figure, in which vertical mappings are isomorphisms and rows are exact
Canonical homomorphism examples
all homomorphisms described below called canonical
-
for two groups $G$ & $G'$ and homomorphism $f:G\to G'$ whose kernel is $H$,
exists unique homomorphism $f_*: G/H \to G'$ with
$$
f=f_*\circ \varphi
$$
where $\varphi:G\to G/H$ is canonical map,
and $f_*$ is injective
- $f_*$ can be defined by $xH\mapsto f(x)$
- $f_*$ said to be induced by $f$
- $f_*$ induces isomorphism $\lambda: G/H \to \Img f$
- below sequence summarizes above statements $$ G \overset{\varphi}{\to} G/H \overset{\lambda}{\to} \Img f \overset{j}{\to} G $$ where $j$ is inclusion
-
for group $G$,
subgroup $H\subset G$,
and
homomorphism $f:G\to G'$ whose kernel contains $H$,
intersection of all normal subgroups containing $H$, $N$,
which is the smallest normal subgroup containing $H$,
is contained in $\Ker f$,
i.e.,
$N\subset \Ker f$,
and exists unique homomorphism, $f_*:G/N\to G'$
such that
$$
f = f_* \circ \varphi
$$
where $\varphi:G\to G/N$ is canonical map
- $f_*$ can be defined by $xN\mapsto f(x)$
- $f_*$ said to be induced by $f$
- for normal subgroups of $G$, $H$ and $K$, with $K\subset H$, $xK \mapsto xH$ induces homomorphism of $G/K$ onto $G/H$, whose kernel is $\set{xK}{x\in H}$, thus canonical isomorphism $$ (G/K)/(H/K) \isomorph G/H $$ this can be shown in the figure where rows are exact
- for subgroup $H\subset G$ and $K\subset G$ with $H$ contained in normalizer of $K$, $H\cap K$ is normal subgroup of $H$, $HK=KH$ is subgroup of $G$, exists surjective homomorphism $$ H \to HK / K $$ with $x \mapsto xK$, whose kernel is $H\cap K$, hence canonical isomorphism $$ H/(H\cap K) \isomorph HK/K $$
- for group homomorphism $f:G\to G'$, normal subgroup of $G'$, $H'$, $$ H=f^{-1}(H')\subset G $$ as shown in the figure, $H$ is normal in $G$ and kernel of homomorphism $$ G \overset{f}{\to} G'\overset{\varphi}{\to} G'/H' $$ is $H$ where $\varphi$ is canonical map, hence we have injective homomorphism $$ \bar{f}:G/H \to G'/H' $$ again called canonical homomorphism, giving commutative diagram in the figure; if $f$ is surjective, $\bar{f}$ is isomorphism
Towers
- said to be normal if every $G_{i+1}$ is normal in $G_i$
- said to be abelian if normal and every factor group $G_i/G_{i+1}$ is abelian
- said to be cyclic if normal and every factor group $G_i/G_{i+1}$ is cyclic
- normal if $G'_i$ form normal tower
- abelian if $G'_i$ form abelian tower
- cyclic if $G'_i$ form cyclic tower
Refinement of towers and solvability of groups
- abelian tower of finite group admits cyclic refinement
- finite solvable group admits cyclic tower, whose last element is trivial subgroup
Commutators and commutator subgroups
- $G^C$ is normal in $G$
- $G/G^C$ is commutative
- $G^C$ is contained in kernel of every homomorphism of $G$ into commutative group
- commutator group is at the heart of solvability and non-solvability problems!
Simple groups
Butterfly lemma
- indeed $$ (U\cap V)/((u\cap V)(U\cap v)) \isomorph\ u(U\cap V) / u(U\cap v) \isomorph\ (U\cap V)v / (u\cap V)v $$
Equivalent towers
Schreier and Jordan-Hölder theorems
Cyclic groups
Properties of cyclic groups
- infinite cyclic group has exactly two generators; if $a$ is one, $a^{-1}$ is the other
- for cyclic group $G$ of order $n$ and generator $x$, set of generators of $G$ is $$ \set{x^m}{m \mbox{ is relatively prime to }n} $$
- for cyclic group $G$ and two generators $a$ and $b$, exists automorphism of $G$ mapping $a$ onto $b$; conversely, every automorphism maps $a$ to some generator
- for cyclic group $G$ of order $n$ and $d\in\naturals$ dividing $n$, exists unique subgroup of order $d$
- for cyclic groups $G_1$ and $G_2$ of orders $n$ and $m$ respectively with $n$ and $m$ relatively prime, $G_1\times G_2$ is cyclic group
- for non-cyclic finite abelian group $G$, exists subgroup isomorphic to $C\times C$ with $C$ cyclic with prime order
Symmetric groups and permutations
Operations of group on set
- $S$, called $G$-set
- denote $\pi(x)$ for $x\in G$ by $\pi_x$, hence homomorphism denoted by $x\mapsto \pi_x$
- obtain mapping from such operation, $G\times S \to S$, with $(x,s)\mapsto \pi_x(s)$
-
often abbreviate $\pi_x(s)$ by $xs$, with which the following two properties satisfied
- $\left( \forall x,y\in G, s\in S \right) \left( x(ys) = (xy)s \right)$
- $\left( \forall s\in S \right) \left( es = s \right)$
- conversely, for mapping $G\times S\to S$ with $(x,s)\mapsto xs$ satisfying above two properties, $s\mapsto xs$ is permutation for $x\in G$, hence $x\mapsto \pi_x$ is homomorphism of $G$ into $\perm{S}$
- thus, operation of $G$ on $S$ can be defined as mapping $G\times S\to S$ satisfying above two properties
Conjugation
- $\gamma_x$, called inner automorphism of $G$
- kernel of conjugation is center of $G$
- to avoid confusion, instead of writing $xy$ for $\gamma_x(y)$, write $$ \gamma_x(y) = xyx^{-1} = \prescript{x}{}{y} \mbox{ and } \gamma_{x^{-1}}(y) = x^{-1}yx = {y}^x $$
- for subset $A\subset G$, map $(x,A) \mapsto xAx^{-1}$ is operation of $G$ on set of subsets of $G$
- similarly for subgroups of $G$
- two subsets of $G$, $A$ and $B$ with $B= x A x^{-1}$ for some $x\in G$, said to be conjugate
Translation
-
for subgroup $H\subset G$,
$T_x(H) = xH$ is left coset
- denote set of left cosets also by $G/H$ even if $H$ is not normal
- denote set of right cosets also by $H\backslash G$
-
examples of translation
-
$G=GL(V)$, group of linear automorphism of vector space with field $F$,
for which, map $(A,v)\mapsto Av$ for $A\in G$ and $v\in V$
defines operation of $G$ on $V$
- $G$ is subgroup of group of permutations, $\perm{V}$
- for $V=F^n$, $G$ is group of nonsingular $n$-by-$n$ matrices
Isotropy
- for conjugation operation of group $G$, $G_s$ is normalizer of $s\in G$
- isotropy groups are conjugate, e.g., for $s,s'\in S$ and $y\in G$ with $ys=s'$, $$ G_{s'} = yG_s y^{-1} $$
- by definition, kernel of operation of $G$ on $S$ is $$ K = \bigcap_{s\in S} G_s \subset G $$
- operation with trivial kernel, said to be faithful
- $s\in S$ with $G_s = G$, called fixed point
Orbits of operation
- for $x,y\in G$ in same coset of $G_s$, $xs = ys$, i.e. $\left( \exists z\in G \right) \left( x,y \in zG_s \right) \Leftrightarrow xs = ys$
- hence, mapping $G/G_s \to S$ with $xG_s \mapsto xs$ is well-defined morphism of $G$-sets, thus induces bijection of $G/G_s$ onto orbit $Gs$
Orbit decomposition and class formula
- orbits are disjoint $$ S = \coprod_{\lambda \in \Lambda} Gs_\lambda $$ where $s_\lambda$ are elements of distinct orbits
Sylow subgroups
- number of fixed points of $H$ is congruent to size of $S$ modulo $p$, i.e. $$ \mbox{\# fixed points of }H \equiv |S| \Mod{p} $$
- if $H$ has exactly one fixed point, $|S| \equiv 1\Mod{p}$
- if $p$ divides $|S|$, $|S| \equiv 0\Mod{p}$
Sylow subgroups and solvability
- now can prove following
Rings
Rings
- $A$ is commutative group with respect to addition - unit element denoted by $0$
- $A$ is monoid with respect to multiplication - unit element denoted by $1$
- multiplication is distributive over addition, i.e. $$ \left( \forall x, y, z \in A \right) \left( (x+y)z = xz + yz \mbox{ \& } z(x+y) = zx + zy \right) $$
- do not assume $1\neq 0$
-
can prove, e.g.,
- $\left( \forall x \in A \right) \left( 0x = 0 \right)$ because $0x + x = 0x + 1x = (0+1)x = 1x = x$
- if $1=0$, $A=\{0\}$ because $x = 1x = 0x = 0$
- $\left( \forall x,y\in A \right) \left( (-x)y = -(xy) \right)$ because $xy + (-x)y = (x+(-x))y = 0y = 0$
More on ring
Fields
General distributivity
- general distributivity - for ring $A$, $\seq{x_i}_{i=1}^n\subset A$ and $\seq{y_i}_{i=1}^n\subset A$ $$ \left( \sum x_i \right) \left( \sum y_j \right) = \sum_i \sum_j x_iy_j $$
Ring examples
-
for set $S$ and ring $A$,
set of all mappings of $S$ into $A$ $\Map(S,A)$
whose addition and multiplication are defined as below,
is ring
$$
\begin{eqnarray*}
&
\left(
\forall f,g\in \Map(S,A)
\right)
\left(
\forall x\in S
\right)
\left(
(f+g)(x) = f(x)+g(x)
\right)
&
\\
&
\left(
\forall f,g\in \Map(S,A)
\right)
\left(
\forall x\in S
\right)
\left(
(fg)(x) = f(x)g(x)
\right)
&
\end{eqnarray*}
$$
- additive and multiplicative unit elements of $\Map(S,A)$ are constant maps whose values are additive and multiplicative unit elements of $A$ respectively
- $\Map(S,A)$ is commutative if and only if $A$ is commutative
- for set $S$, $\Map(S,\reals)$ is commutative ring
-
for abelian group $M$,
set $\End(M)$ of group homomorphisms of $M$ into itself
is ring with pointwise addition and mapping composition as multiplication
- additive and multiplicative unit elements of $\End(M)$ are constant map whose value is the unit element of $M$ and identity mapping respectively
- not commutative in general
- for ring $A$, set $A[X]$ of polynomials over $A$ is ring
-
for field $K$,
$K^{n\times n}$,
i.e.,
set of $n$-by-$n$ matrices with components in $K$,
is ring
- $\left(K^{n\times n}\right)^\ast$, i.e., multiplicative group of units of $K^{n\times n}$, consists of non-singular matrices, i.e., those whose determinants are nonzero
Group ring
- $\sum_{xy=z} a_xb_y$ above defines what is called convolution product
Convolution product
- one may restrict this definition to functions which are $0$ except at finite number of elements
-
for $f,g\in L^1(\reals)$, can define convolution product $f\ast g$ by
$$
(f\ast g) (x) = \int_{\reals} f(x-y)g(y)dy
$$
- satisfies all axioms of ring except that there is no unit element
- commutative (essentially because $\reals$ is commutative)
- more generally, for locally compact group $G$ with Haar measure $\mu$, can define convolution product by $$ (f\ast g) (x) = \int_{G} f(xy^{-1})g(y)d\mu(y) $$
Ideals of ring
- for ring $A$, $(0)$ and $A$ itself are ideals
- $a$, said to be generator of $\ideal{a}=Aa$ (over $A$)
Principal rings
- $\integers$ (set of integers) is principal ring
- $k[X]$ (ring of polynomials) for field $k$ is principal ring
-
ring of algebraic integers in number field $K$
is not necessarily principal
- let $\ideal{p}$ be prime ideal, let $R_\ideal{p}$ be ring of all elements $a/b$ with $a,b\in R$ and $b\not\in\ideal{p}$, then $R_\ideal{p}$ is principal, with one prime ideal $\ideal{m}_\ideal{p}$ consisting of all elements $a/b$ as above but with $a\in\ideal{p}$
-
let $A$
be set of entire functions on complex plane,
then $A$ is commutative ring,
and every finitely generated ideal is principal
- given discrete set of complex numbers $\{z_i\}$ and nonnegative integers $\{m_i\}$, exists entire function $f$ having zeros at $z_i$ of multiplicity $m_i$ and no other zeros
- every principal ideal is of form $Af$ for some such $f$
- group of units $A^\ast$ in $A$ consists of functions having no zeros
Ideals as both additive and multiplicative monoids
-
ideals form additive monoid
- for left ideals $\ideal{a}$, $\ideal{b}$, $\ideal{c}$ of ring $A$, $\ideal{a}+\ideal{b}$ is left ideal, $(\ideal{a}+\ideal{b})+\ideal{c} =\ideal{a}+(\ideal{b}+\ideal{c})$, hence form additive monoid with $(0)$ as the unit element
- similarly for right ideals & two-sided ideals
-
ideals form multiplicative monoid
- for left ideals $\ideal{a}$, $\ideal{b}$, $\ideal{c}$ of ring $A$, define $\ideal{a}\ideal{b}$ as $$ \ideal{a}\ideal{b} = \bigcup_{n=1}^\infty \bigsetl{\sum_{i=1}^n x_i y_i}{x_i \in \ideal{a},y_i\in \ideal{b}} $$ then $\ideal{a}\ideal{b}$ is also left ideal, $(\ideal{a}\ideal{b})\ideal{c} =\ideal{a}(\ideal{b}\ideal{c})$, hence form multiplicative monoid with $A$ itself as the unit element; for this reason, this unit element $A$, i.e., the ring itself, often written as $(1)$
- similarly for right ideals & two-sided ideals
- ideal multiplication is also distributive over addition
- however, set of ideals does not form ring (because the additive monoid is not group)
Generators of ideal
- above equal to smallest ideal containing $a_1,\ldots,a_n$, i.e., intersection of all ideals containing $a_1,\ldots,a_n$ $$ \bigcap_{\ideal{a}\ni a_1,\ldots, a_n} \ideal{a} $$ - just like set ($\sigma$-)algebras in set theory
Entire rings
Ring-homomorphism
- kernel of ring-homomorphism $f:A\to B$ is ideal of $A$
- conversely, for ideal $\ideal{a}$, can construct factor ring $A/\ideal{a}$
- simply say “homomorphism” if reference to ring is clear
Factor ring and canonical map
-
for ring $A$ and ideal $\ideal{a}$
- for subset $S\subset \ideal{a}$, write $S \equiv 0 \Mod{\ideal{a}}$
- for $x,y\in A$, if $x-y\in\ideal{a}$, write $x \equiv y \Mod{\ideal{a}}$
- if $\ideal{a} = (a)$ for $a\in A$, for $x,y\in A$, if $x-y\in\ideal{a}$, write $x \equiv y \Mod{a}$
Factor ring induced ring-homeomorphism
- canonical ring map $f:A\to A/\ideal{a}$ is universal in category of homomorphisms whose kernel contains $\ideal{a}$
Prime ideal and maximal ideal
- equivalently, ideal $\ideal{p}\neq A$ is prime if and only if $\left( \forall x,y \in A \right) \left( xy \in \ideal{p} \Rightarrow x \in \ideal{p} \mbox{ or } y \in \ideal{p} \right)$
- every maximal ideal is prime
- every ideal is contained in some maximal ideal
- ideal $\{0\}$ is prime if and only if $A$ is entire
- ideal $\ideal{m}$ is maximal if and only if $A/\ideal{m}$ is field
- inverse image of prime ideal under homomorphism of commutative rings is prime
Embedding of ring
- indeed, for bijective ring-homomorphism $f:A\to B$, exists set-theoretic inverse $g:B\to A$ of $f$, which is ring-homomorphism
Characteristic of ring
-
for ring $A$,
consider ring-homomorphism
$$
\lambda:\integers \to A
$$
such that
$$
\lambda(n) = ne
$$
where $e$ is multiplicative unit element of $A$
- kernel of $\lambda$ is ideal $(n)$ for some $n\geq0$, i.e., ideal generated by some nonnegative integer $n$
- hence, canonical injective ring-homomorphism $\integers/n\integers \to A$, which is ring-isomorphism between $\integers/n\integers$ and subring of $A$
- when $n\integers$ is prime ideal, exist two cases; either $n=0$ or $n=p$ for prime number $p$
Prime fields and prime rings
- field $K$ has characteristic $0$ or $p$ for prime number $p$
-
$K$ contains as subfield (isomorphic image of)
- $\rationals$ if characteristic is $0$
- $\primefield{p}$ if characteristic is $p$
$\integers/n\integers$
- $\integers$ is ring
- every ideal of $\integers$ is principal, i.e., either $\{0\}$ or $n\integers$ for some $n\in\naturals$
-
ideal of $\integers$ is prime if and only if it is $\{0\}$ or $p\integers$ for some prime number $p\in\naturals$
- $p\integers$ is maximal ideal
- $\integers/p\integers$ for prime $p$ is field and denoted by $\primefield{p}$
Euler phi-function
Chinese remainder theorem
Isomorphism of endomorphisms of cyclic groups
- for cyclic group $A$ of order $n$, ring isomorphism $$ \integers/n\integers \isomorph \End(A) $$
- group isomorphism $$ (\integers/n\integers)^\ast \isomorph \Aut(A) $$
- e.g., for group of $n$-th roots of unity in $\complexes$, all automorphisms are given by $$ \xi \mapsto \xi^k $$ for $k\in(\integers/n\integers)^\ast$
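A small sanity check of this correspondence for $n=12$, modeling the cyclic group additively as $\integers/12\integers$ (an illustrative sketch):

```python
from math import gcd

n = 12
units = [k for k in range(1, n) if gcd(k, n) == 1]
assert units == [1, 5, 7, 11]   # (Z/12Z)^*, i.e., phi(12) = 4 elements
for k in units:
    # x -> k*x (mod n) permutes Z/nZ, hence is an automorphism
    assert sorted(k * x % n for x in range(n)) == list(range(n))
```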
Irreducibility and factorial rings
Greatest common divisor
Polynomials
Why (ring of) polynomials?
- lays groundwork for polynomials in general
-
need polynomials over arbitrary rings for diverse purposes
- polynomials over finite field which cannot be identified with polynomial functions in that field
- polynomials with integer coefficients; reduce them mod $p$ for prime $p$
- polynomials over arbitrary commutative rings
- rings of polynomial differential operators for algebraic geometry & analysis
- e.g., ring learning with errors (RLWE) for cryptographic algorithms
Ring of polynomials
- exist many ways to define polynomials over commutative ring; here's one
- for every $a\in A$, define function which has value $a$ on $X^r$, and value $0$ for every other element of $S$, by $aX^r$
- then, a polynomial can be uniquely written as $$ f(X) = a_0X^0 + \cdots + a_nX^n $$ for some $n\in\integers_+$, $a_i\in A$
- $a_i$, called coefficients of $f$
Polynomial functions
- hence, for $x\in B$, subring $A[x]$ of $B$ generated by $x$ over $A$ is ring of all polynomial values $f(x)$ for $f\in A[X]$
- in particular, $X$ is variable over $A$
Polynomial examples
-
consider $\alpha=\sqrt{2}$ and $\bigset{a+b\alpha}{a,b\in\integers}$,
subring $\integers[\alpha]\subset \reals$
generated by $\alpha$ over $\integers$.
- $\alpha$ is not transcendental because $f(\alpha)=0$ for $f(X)=X^2-2$
- hence evaluation map of $\integers[X]$ into $\integers[\alpha]$ has nonzero kernel, hence is not injective, hence not isomorphism
- indeed $$ \integers[\alpha] = \bigset{a+b\alpha}{a,b\in\integers} $$
-
consider $\primefield{p}$ for prime number $p$
- $f(X) = X^p - X\in \primefield{p}[X]$ is not zero polynomial, but because $x^{p-1} \equiv 1$ for every nonzero $x\in\primefield{p}$ by Euler's theorem, $x^p\equiv x$ for every $x\in\primefield{p}$, thus for polynomial function, $f_{\primefield{p}}$, $f_{\primefield{p}}(x)=0$ for every $x$ in $\primefield{p}$
- i.e., non-zero polynomial induces zero polynomial function
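A one-line check of this phenomenon for $p=5$:

```python
p = 5
# f(X) = X^p - X is a nonzero polynomial over F_p, yet the induced
# polynomial function vanishes at every point of F_p
assert all((x**p - x) % p == 0 for x in range(p))
```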
Reduction map
- for homomorphism $\varphi:A\to B$ of commutative rings, exists associated homomorphism of polynomial rings $A[X]\to B[X]$ such that $$ f(X) = \sum a_i X^i \mapsto \sum \varphi(a_i) X^i = (\varphi f)(X) $$
- e.g., for complex conjugation $\varphi: \complexes \to \complexes$, homomorphism of $\complexes[X]$ into itself can be obtained by reduction map $f \mapsto \varphi f$, which is complex conjugation of polynomials with complex coefficients
Basic properties of polynomials in one variable
Constant, monic, and irreducible polynomials
Roots or zeros of polynomials
Induction of zero functions
Reduced polynomials and uniqueness
- for field $k$ with $q$ elements, polynomial in $n$ variables over $k$ can be expressed as $$ f(X_1,\ldots,X_n) = \sum a_i X_1^{\nu_{i,1}} \cdots X_n^{\nu_{i,n}} $$ for finite sequences $\seqscr{a_i}{i=1}{m}$ and $\seqscr{\nu_{i,1}}{i=1}{m}, \ldots, \seqscr{\nu_{i,n}}{i=1}{m}$ where $a_i\in k$ and $\nu_{i,j} \geq 0$
- because $X_i^q=X_i$ for any $X_i$, any $\nu_{i,j}\geq q$ can be (repeatedly) replaced by $\nu_{i,j}-(q-1)$, hence $f$ can be rewritten as $$ f(X_1,\ldots,X_n) = \sum a_i X_1^{\mu_{i,1}} \cdots X_n^{\mu_{i,n}} $$ where $0\leq \mu_{i,j} < q$ for all $i,j$
Multiplicative subgroups and $n$-th roots of unity
Algebraic closedness
- e.g., complex numbers are algebraically closed
- every field is contained in some algebraically closed field
-
for algebraically closed field $k$
- (of course) every irreducible polynomial in $k[X]$ is of degree $1$
- unique factorization of polynomial of positive degree can be written in form $$ f(X) = c \prod_{i=1}^{r} (X-\alpha_i)^{m_i} $$ with nonzero $c\in k$, distinct roots, $\alpha_1,\ldots,\alpha_r \in k$, and $m_1,\ldots,m_r \in \naturals$
Derivatives of polynomials
- for $f,g\in A[X]$ with commutative ring $A$, and $a\in A$ $$ (f+g)' = f' + g' \quad \mbox{\&} \quad (fg)' = f'g + fg' \quad \mbox{\&} \quad (af)' = af' $$
Multiple roots and multiplicity
- nonzero polynomial $f(X)\in k[X]$ in one variable over field $k$ having $a\in k$ as root can be written of form $$ f(X) = (X-a)^m g(X) $$ with some polynomial $g(X)\in k[X]$ relatively prime to $(X-a)$ (hence, $g(a)\neq0$)
Frobenius endomorphism
- homomorphism of $K$ into itself $x\mapsto x^p$ has trivial kernel, hence injective
- hence, iterating $r\geq 1$ times yields endomorphism, $x\mapsto x^{p^r}$
Roots with multiplicity $p^r$ in fields having characteristic $p$
-
for field $K$ having characteristic $p$
- $p | {p \choose \nu}$ for all $0< \nu < p$ because $p$ is prime, hence, for every $a,b\in K$ $$ (a+b)^p = a^p + b^p $$
- applying this recursively $r$ times yields $$ (a+b)^{p^r} = (a^p + b^p)^{p^{r-1}} = (a^{p^2} + b^{p^2})^{p^{r-2}} = \cdots = a^{p^r} + b^{p^r} $$ hence $$ (X-a)^{p^r} = X^{p^r} - a^{p^r} $$
- if $a,c\in K$ satisfy $a^{p^r} = c$ $$ X^{p^r} - c = X^{p^r} - a^{p^r} = (X-a)^{p^r} $$ hence, polynomial $X^{p^r}-c$ has precisely one root $a$ of multiplicity $p^r$!
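A small check of the iterated freshman's dream identity in characteristic $p$ (here $p=3$, $r=2$):

```python
p, r = 3, 2
q = p**r   # p^r = 9
# (a + b)^(p^r) = a^(p^r) + b^(p^r) holds in F_p
for a in range(p):
    for b in range(p):
        assert (a + b)**q % p == (a**q + b**q) % p
```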
Algebraic Extension
Algebraic extension
-
will show
- for polynomial over field, always exists some extension of that field where the polynomial has root
- existence of algebraic closure for every field
Extension of field
- can view $E$ as vector space over $F$
- if dimension of the vector space is finite, extension called finite extension of $F$
- if infinite, called infinite extension of $F$
Algebraic over field
- for algebraic $\alpha\neq0$, can always find such equation as above with $a_0\neq0$
-
equivalent statements to $\alpha$ being algebraic over $F$
- exists homomorphism $\varphi: F[X] \to E$ such that $$ \left(\forall x\in F\right) \left(\varphi(x) = x\right) \mbox{ \& } \varphi(X) = \alpha \mbox{ \& } \Ker \varphi \neq \{0\} $$
- exists evaluation homomorphism $\ev_\alpha: F[X] \to E$ with nonzero kernel
- in which case, $\Ker \varphi$ is principal ideal, hence generated by single element, thus exists nonzero $p(X) \in F[X]$ (with normalized leading coefficient being $1$) so that $$ F[X] / (p(X)) \isomorph F[\alpha] $$
- $F[\alpha]$ entire, hence $p(X)$ irreducible
Algebraic extensions
- converse is not true, e.g., subfield of complex numbers consisting of algebraic numbers over $\rationals$ is infinite extension of $\rationals$
Dimension of extensions
- if $\seqscr{x_i}{i\in I}{}$ is basis for $F$ over $k$, and $\seqscr{y_j}{j\in J}{}$ is basis for $E$ over $F$, $\seqscr{x_iy_j}{(i,j)\in I\times J}{}$ is basis for $E$ over $k$
Generation of field extensions
- $k(\alpha_1,\ldots, \alpha_n)$ consists of all quotients $f(\alpha_1,\ldots,\alpha_n)/g(\alpha_1,\ldots, \alpha_n)$ where $f,g\in k[X_1,\ldots,X_n]$ and $g(\alpha_1,\ldots, \alpha_n)\neq0$, i.e. $$ k(\alpha_1,\ldots,\alpha_n) = \bigset{f(\alpha_1,\ldots, \alpha_n)/g(\alpha_1,\ldots,\alpha_n)}{f,g\in k[X_1,\ldots,X_n],\ g(\alpha_1,\ldots,\alpha_n)\neq0} $$
- any field extension $E$ over $k$ is union of smallest subfields containing $\alpha_1,\ldots, \alpha_n$ where $\alpha_1,\ldots, \alpha_n$ range over finite set of elements of $E$, i.e. $$ E = \bigcup_{n\in\naturals} \bigcup_{\alpha_1, \ldots, \alpha_n \in E} k(\alpha_1,\ldots,\alpha_n) $$
Tower of fields
Algebraicness of finitely generated subfields
- indeed, $\Irr(\alpha,k,X)$ has a fortiori coefficients in $F$
- assume tower of fields $$ k \subset k(\alpha_1) \subset k(\alpha_1, \alpha_2) \subset \cdots \subset k(\alpha_1,\ldots, \alpha_n) $$ where $\alpha_i$ is algebraic over $k$
- then, $\alpha_{i+1}$ is algebraic over $k(\alpha_1,\ldots,\alpha_i)$
Compositum of subfields and lifting
- cannot define compositum if $E$ and $F$ are not embedded in common field $L$
- could define compositum of set of subfields of $L$ as smallest subfield containing subfields in the set
Lifting
- often draw diagram as in the figure
Finite generation of compositum
- refer to diagram in the figure
Distinguished classes
- for tower of fields $k\subset F\subset E$, extension $k\subset E$ is in $\classk{C}$ if and only if both $k\subset F$ and $F\subset E$ are in $\classk{C}$
- if $k\subset E$ is in $\classk{C}$, $F$ is any extension of $k$, and both $E$ and $F$ are subfields of common field, then $F\subset EF$ is in $\classk{C}$
- if $k\subset F$ and $k\subset E$ are in $\classk{C}$ and both $E$ and $F$ are subfields of common field, $k\subset EF$ is in $\classk{C}$
Both algebraic and finite extensions are distinguished
- true that finitely generated extensions form distinguished class (not necessarily algebraic extensions or finite extensions)
Field embedding and embedding extension
- assuming $F$, $E$, $\sigma$, and $\tau$ as above, if $\alpha\in E$ is root of $f\in F[X]$, then $\alpha^\tau$ is root of $f^\sigma$ for if $f(X) = \sum_{i=0}^n a_i X^i$, then $f(\alpha) = \sum_{i=0}^n a_i \alpha^i = 0$, and $0 = f(\alpha)^\tau = \sum_{i=0}^n (a_i^\tau ) (\alpha^\tau)^i = \sum_{i=0}^n a_i^\sigma (\alpha^\tau)^i = f^\sigma(\alpha^\tau)$
Embedding of field extensions
Existence of roots of irreducible polynomial
- assume $p(X) \in k[X]$ irreducible polynomial and consider canonical map, which is ring homomorphism $$ \sigma: k[X] \to k[X] / (p(X)) $$
-
consider $\Ker \restrict{\sigma}{k}$
- every kernel of ring homeomorphism is ideal, hence if nonzero $a \in \Ker \restrict{\sigma}{k}$, $1\in \Ker \restrict{\sigma}{k}$ because $a^{-1} \in \Ker \restrict{\sigma}{k}$, but $1\not\in (p(X))$
- thus, $\Ker \restrict{\sigma}{k} = \{0\}$, hence $p^\sigma\neq0$
- now for $\alpha = X^\sigma$ $$ p^\sigma(\alpha) = p^\sigma(X^\sigma) = (p(X))^\sigma = 0 $$
- thus, $\alpha$ is algebraic over $k^\sigma$, i.e., $\alpha \in k[X]^\sigma$ is root of $p^\sigma$ in $k^\sigma(\alpha)$
Existence of algebraically closed algebraic field extensions
Isomorphism between algebraically closed algebraic extensions
- thus, algebraically closed algebraic extension is determined up to isomorphism
Algebraic closure
-
examples
- complex conjugation is automorphism of $\complexes$ (the only continuous automorphism of $\complexes$ other than the identity)
- subfield of $\complexes$ consisting of all numbers which are algebraic over $\rationals$ is algebraic closure of $\rationals$, i.e., $\algclosure{\rationals}$
- $\algclosure{\rationals} \neq \complexes$
- $\algclosure{\reals} = \complexes$
- $\algclosure{\rationals}$ is countable
Splitting fields
- for field, $k$, every $f\in k[X]$ has splitting field in $\algclosure{k}$
Splitting fields for family of polynomials
- in most applications, deal with finite $\Lambda$
- becoming increasingly important to consider infinite algebraic extensions
- various proofs would not be simpler if we restricted ourselves to finite cases
Normal extensions
- every embedding of $K$ into $\algclosure{k}$ over $k$ induces automorphism
- $K$ is splitting field of family of polynomials in $k[X]$
- every irreducible polynomial of $k[X]$ which has root in $K$ splits into linear factors in $K$
-
not true that class of normal extensions is distinguished
- e.g., below tower of fields is tower of normal extensions $$ \rationals \subset \rationals(\sqrt{2}) \subset \rationals(\sqrt[4]{2}) $$
- but, extension $\rationals \subset \rationals(\sqrt[4]{2})$ is not normal because complex roots of $X^4-2$ are not in $\rationals(\sqrt[4]{2})$
Retention of normality of extensions
Separable degree of field extensions
-
for field, $F$, and its algebraic extension, $E$
- let $L$ be algebraically closed field and assume embedding, $\sigma:F\to L$
- let $L'$ be another algebraically closed field and assume another embedding, $\tau:F\to L'$ - assume as before that $L'$ is algebraic closure of $F^\tau$
- then exists isomorphism, $\lambda:L\to L'$, extending $\tau\circ \sigma^{-1}$ on $F^\sigma$
- let $S_\sigma$ & $S_\tau$ be sets of embedding extensions of $\sigma$ and $\tau$ to $E$ in $L$ and $L'$ respectively
- then $\lambda$ induces map from $S_\sigma$ into $S_\tau$ with $\tilde{\sigma} \mapsto \lambda \circ \tilde{\sigma}$ and $\lambda^{-1}$ induces inverse map from $S_\tau$ into $S_\sigma$, hence exists bijection between $S_\sigma$ and $S_\tau$, hence have same cardinality
Multiplicativity of and upper bound on separable degree of field extensions
- i.e., separable degree is at most equal to degree (i.e., dimension) of field extension
Finite separable field extensions
Arbitrary separable field extensions
Separable closure and conjugates
- smallest normal extension of $k$ containing $E$ is compositum of all conjugates of $E$ in $\algclosure{E}$
- $\alpha^{\sigma_1}, \ldots, \alpha^{\sigma_r}$ are simply distinct roots of $\Irr(\alpha, k, X)$
- smallest normal extension of $k$ containing one of these conjugates is simply $k(\alpha^{\sigma_1}, \ldots, \alpha^{\sigma_r})$
Primitive element theorem
Finite fields
Automorphisms of finite fields
- Frobenius map, $\frobmap{p}{n}$, is ring homomorphism with $\Ker \frobmap{p}{n} = \{0\}$ since $\finitefield{p}{n}$ is field, thus is injective, and surjective because $\finitefield{p}{n}$ is finite
- thus, is automorphism of $\finitefield{p}{n}$ leaving $\primefield{p}$ fixed
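- a minimal computational sketch (assumption: $\finitefield{2}{2}$ represented as pairs $(c_0,c_1)=c_0+c_1\alpha$ with $\alpha^2=\alpha+1$, as in the earlier sketch; the Frobenius map $x\mapsto x^2$ permutes the field and fixes exactly $\primefield{2}$):

```python
# Minimal check: the Frobenius map x -> x^2 is a field automorphism of F_4
# fixing the prime field {0, 1}. F_4 elements are pairs (c0, c1) = c0 + c1*a
# with a^2 = a + 1.
def mul(x, y):  # multiplication in F_4
    c0, c1, c2 = x[0] * y[0], x[0] * y[1] + x[1] * y[0], x[1] * y[1]
    return ((c0 + c2) % 2, (c1 + c2) % 2)

F4 = [(0, 0), (1, 0), (0, 1), (1, 1)]
frob = {x: mul(x, x) for x in F4}        # x -> x^2; a bijection of F_4
print(frob)                              # swaps a and a + 1, fixes 0 and 1
print([x for x in F4 if frob[x] == x])   # fixed field: {(0,0), (1,0)} = {0, 1} = F_2
```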
Galois Theory
What we will do to appreciate Galois theory
-
study
- group of automorphisms of finite (and infinite) Galois extension (at length)
- give examples, e.g., cyclotomic extensions, abelian extensions, (even) non-abelian ones
- leading into study of matrix representation of Galois group & classifications
-
have
tools to prove
- fundamental theorem of algebra
- insolvability of quintic polynomials
-
mention unsolved problems
- given finite group, does there exist Galois extension of $\rationals$ having this group as Galois group? (inverse Galois problem)
Fixed fields
-
$K^G$ is subfield of $K$ because for every $x,y\in K^G$
- $0^\sigma = 0 \Rightarrow 0\in K^G$
- $(x+y)^\sigma = x^\sigma + y^\sigma = x + y \Rightarrow x+y \in K^G$
- $(-x)^\sigma = - x^\sigma = - x \Rightarrow -x \in K^G$
- $1^\sigma = 1 \Rightarrow 1\in K^G$
- $(xy)^\sigma = x^\sigma y^\sigma = xy \Rightarrow xy\in K^G$
- $(x^{-1})^\sigma = (x^\sigma)^{-1} = x^{-1} \Rightarrow x^{-1} \in K^G$
- $0,1\in K^G$, hence $K^G$ contains prime field
Galois extensions and Galois groups
Fundamental theorem for Galois theory
- map $H \mapsto K^H$ induces bijection between set of subgroups of $G(K/k)$ & set of intermediate fields
- subgroup, $H$, of $G(K/k)$, is normal if and only if $K^H/k$ is Galois
- for normal subgroup, $H$, $\sigma\mapsto \restrict{\sigma}{K^H}$ induces isomorphism between $G(K/k)/H$ and $G(K^H/k)$
- shall prove step by step
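- as illustration, a minimal computational sketch (not from the original notes; assumption: $K=\rationals(\sqrt2,\sqrt3)$, whose Galois group over $\rationals$ is the Klein four-group) enumerating subgroups and their fixed fields:

```python
# Minimal sketch (assumptions: K = Q(sqrt2, sqrt3) is Galois over Q with group
# G = Z/2 x Z/2; an automorphism is a sign pair (s, t) acting by sqrt2 -> s*sqrt2
# and sqrt3 -> t*sqrt3). Enumerates all subgroups H and reports which of the
# generators sqrt2, sqrt3, sqrt6 = sqrt2*sqrt3 the fixed field K^H contains.
from itertools import combinations, product

G = list(product([1, -1], repeat=2))  # four automorphisms; identity is (1, 1)

def is_subgroup(H):
    return (1, 1) in H and all((a[0] * b[0], a[1] * b[1]) in H for a in H for b in H)

subgroups = [set(H) for r in (1, 2, 4) for H in combinations(G, r) if is_subgroup(H)]

# how an automorphism (s, t) scales each generator
action = {"sqrt2": lambda s, t: s, "sqrt3": lambda s, t: t, "sqrt6": lambda s, t: s * t}

for H in subgroups:
    fixed = [g for g, a in action.items() if all(a(s, t) == 1 for (s, t) in H)]
    print(f"|H| = {len(H)}, K^H generated over Q by {fixed or ['nothing: K^H = Q']}")
# five subgroups <-> five intermediate fields: K, Q(sqrt2), Q(sqrt3), Q(sqrt6), Q
```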
Galois subgroups association with intermediate fields
- $K/F$ is Galois & $K^{G(K/F)} = F$, hence, $K^G = k$
- map $$ F \mapsto G(K/F) $$ induces injection from set of intermediate fields into set of subgroups of $G$
- $F/k$ is normal extension if and only if $G(K/F)$ is normal subgroup of $G(K/k)$
- if $F/k$ is normal extension, map, $\sigma \mapsto \restrict{\sigma}{F}$, induces homomorphism of $G(K/k)$ onto $G(F/k)$ of which $G(K/F)$ is kernel, thus $$ G(F/k) \isomorph G(K/k)/G(K/F) $$
Proof for fundamental theorem for Galois theory
- finally, we prove fundamental theorem for Galois theory
-
assume $K/k$ is finite Galois extension
and $H$ is subgroup of $G(K/k)$
- $K^H$ is intermediate field, hence $K/K^H$ is Galois with $G(K/K^H) = H$, thus, every $H$ arises as Galois group over its fixed field
- thus, $H\mapsto K^H$ defines map, $\sigma$, from set of all subgroups of $G(K/k)$ into set of intermediate fields
- $\sigma$ is injective since for any two subgroups, $H$ and $H'$, of $G(K/k)$, if $K^H=K^{H'}$, then $H=G(K/K^H)=G(K/K^{H'})=H'$
- $\sigma$ is surjective since for every intermediate field, $F$, implies $K/F$ is Galois, $G(K/F)$ is subgroup of $G(K/k)$, and $K^{G(K/F)}=F$, thus, $\sigma(G(K/F)) = K^{G(K/F)}= F$
- therefore, $\sigma$ is bijection between set of all subgroups of $G(K/k)$ and set of intermediate fields
- since separable extensions are distinguished, $K^H/k$ is separable, thus $K^H/k$ is Galois if and only if $G(K/K^H) = H$ is normal
- lastly, if $K^H/k$ is Galois, $G(K^H/k) \isomorph G(K/k) / H$
Abelian and cyclic Galois extensions and groups
- if $K/k$ is abelian, $F/k$ is Galois and abelian
- if $K/k$ is cyclic, $F/k$ is Galois and cyclic
Theorems and corollaries about Galois extensions
- $KF / F$ and $K/(K\cap F)$ are Galois extensions
- map $$ \sigma \mapsto \restrict{\sigma}{K} $$ induces isomorphism between $G(KF / F)$ and $G(K/(K\cap F))$
- $K_1K_2/k$ is Galois extension
- map $$ \sigma \mapsto (\restrict{\sigma}{K_1}, \restrict{\sigma}{K_2}) $$ of $G(K_1K_2/k)$ into $G(K_1/k) \times G(K_2/k)$ is injective; if $K_1\cap K_2=k$, map is isomorphism
- $K_1\cdots K_n/k$ is Galois extension
- map $$ \sigma \mapsto (\restrict{\sigma}{K_1}, \ldots, \restrict{\sigma}{K_n}) $$ induces isomorphism of $G(K_1\cdots K_n/k)$ onto $G(K_1/k) \times \cdots \times G(K_n/k)$
- $K_1/k, \ldots, K_n/k$ are Galois extensions
- $G(K_i/k)=G_i$ for $i=1,\ldots,n$
- $K_{i+1}\cap(K_1\cdots K_i) = k$ for $i=1,\ldots,n-1$
- $K=K_1\cdots K_n$
- for two abelian Galois extensions, $K/k$ and $L/k$, $KL/k$ is abelian Galois extension
- for abelian Galois extension, $K/k$, and any extension, $E/k$, $KE/E$ is abelian Galois extension
- for abelian Galois extension, $K/k$, and intermediate field, $E$, both $K/E$ and $E/k$ are abelian Galois extensions
Solvable and radical extensions
- root of unity, or
- root of $X^n-a$ with $a\in E_i$, and $n$ prime to characteristic, or
- root of $X^p-X-a$ with $a\in E_i$ if $p$ is positive characteristic
Applications of Galois theory
Real Analysis
Set Theory
Some principles
Some definitions for functions
- terms, map and function, used interchangeably
- $X$ and $Y$, called domain of $f$ and codomain of $f$ respectively
- $\set{f(x)}{x\in X}$, called range of $f$
- for $Z\subset Y$, $f^{-1}(Z) = \set{x\in X}{f(x)\in Z}\subset X$, called preimage or inverse image of $Z$ under $f$
- for $y\in Y$, $f^{-1}(\{y\})$, called fiber of $f$ over $y$
- $f$, called injective or injection or one-to-one if $\left( \forall x\neq v \in X \right) \left( f(x) \neq f(v) \right)$
- $f$, called surjective or surjection or onto if $\left( \forall y \in Y \right) \left( \exists x \in X \right) (y=f(x))$
- $f$, called bijective or bijection if $f$ is both injective and surjective, in which case, $X$ and $Y$, said to be in one-to-one correspondence or bijective correspondence
- $g:Y\to X$, called left inverse if $g\circ f$ is identity function
- $h:Y\to X$, called right inverse if $f\circ h$ is identity function
Some properties of functions
- $f$ is injective if and only if $f$ has left inverse
- $f$ is surjective if and only if $f$ has right inverse
- hence, $f$ is bijective if and only if $f$ has both left and right inverse because if $g$ and $h$ are left and right inverses respectively, $g = g \circ (f\circ h) = (g\circ f)\circ h = h$
- if $|X|=|Y|<\infty$, $f$ is injective if and only if $f$ is surjective if and only if $f$ is bijective
Countability of sets
- set $A$ is countable if range of some function whose domain is $\naturals$
- $\naturals$, $\integers$, $\rationals$: countable
- $\reals$: not countable
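- a minimal sketch of countability of $\rationals$ (assumption, not from the original notes: the Calkin-Wilf sequence, which enumerates every positive rational exactly once):

```python
# Minimal sketch: the Calkin-Wilf successor x -> 1/(2*floor(x) - x + 1) walks
# through all positive rationals without repetition, so Q+ is the range of a
# function with domain N, i.e., countable.
from fractions import Fraction
from math import floor

x = Fraction(1, 1)
seen = []
for _ in range(10):
    seen.append(x)
    x = 1 / (2 * floor(x) - x + 1)  # successor in the Calkin-Wilf enumeration
print([str(r) for r in seen])  # 1, 1/2, 2, 1/3, 3/2, 2/3, 3, 1/4, 4/3, 3/5
```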
Limit sets
-
for sequence, $\seq{A_n}$, of subsets of $X$
- limit superior or limsup of $\seq{A_n}$, defined by $$ \limsup \seq{A_n} = \bigcap_{n=1}^\infty \bigcup_{m=n}^\infty A_m $$
- limit inferior or liminf of $\seq{A_n}$, defined by $$ \liminf \seq{A_n} = \bigcup_{n=1}^\infty \bigcap_{m=n}^\infty A_m $$
- always $$ \liminf \seq{A_n} \subset \limsup \seq{A_n} $$
- when $\liminf \seq{A_n} = \limsup \seq{A_n}$, sequence, $\seq{A_n}$, said to converge to it, denote $$ \lim \seq{A_n} = \liminf \seq{A_n} = \limsup \seq{A_n} = A $$
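- a minimal sketch of these definitions (truncated-horizon approximation; the alternating example below stabilizes, so the truncated values match the true ones):

```python
# Minimal sketch (assumption: the alternating sequence A_n = {0} for even n and
# {1} for odd n, truncated at horizon N): limsup = {0, 1}, liminf = empty set.
N = 50
A = [{0} if n % 2 == 0 else {1} for n in range(N)]

def tail_union(n):         # union over m >= n of A_m
    return set().union(*A[n:])

def tail_intersection(n):  # intersection over m >= n of A_m
    out = set(A[n])
    for s in A[n + 1:]:
        out &= s
    return out

limsup = set.intersection(*[tail_union(n) for n in range(N // 2)])
liminf = set().union(*[tail_intersection(n) for n in range(N // 2)])
print(limsup, liminf)  # {0, 1} set()
```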
Algebras of sets
-
collection $\alg$ of subsets of $X$ called algebra or Boolean algebra if
$$
(\forall A, B \in \alg) (A\cup B\in\alg)
\mbox{ and }
(\forall A \in \alg) (\compl{A}\in\alg)
$$
- $(\forall A_1, \ldots, A_n \in \alg)(\cup_{i=1}^n A_i \in \alg)$
- $(\forall A_1, \ldots, A_n \in \alg)(\cap_{i=1}^n A_i \in \alg)$
-
algebra $\alg$ called $\sigma$-algebra or Borel field if
- every union of a countable collection of sets in $\alg$ is in $\alg$, i.e., $$ (\forall \seq{A_i})(\cup_{i=1}^\infty A_i \in \alg) $$
- given sequence of sets in algebra $\alg$, $\seq{A_i}$, exists disjoint sequence, $\seq{B_i}$ such that $$ B_i \subset A_i \mbox{ and } \bigcup_{i=1}^\infty B_i = \bigcup_{i=1}^\infty A_i $$
Algebras generated by subsets
-
algebra generated by collection of subsets of $X$, $\coll$, can be found by
$$
\alg =
\bigcap \set{\algk{B}}{\algk{B} \in \collF}
$$
where $\collF$ is family of all algebras containing $\coll$
- smallest algebra $\alg$ containing $\coll$, i.e., $$ (\forall \algk{B} \in \collF)(\alg \subset \algk{B}) $$
-
$\sigma$-algebra generated by collection of subsets of $X$, $\coll$, can be found by
$$
\alg=
\bigcap \set{\algk{B}}{\algk{B} \in \collG}
$$
where $\collG$ is family of all $\sigma$-algebras containing $\coll$
- smallest $\sigma$-algebra $\alg$ containing $\coll$, i.e., $$ (\forall \algk{B} \in \collG)(\alg \subset \algk{B}) $$
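- a minimal sketch for finite $X$ (where generated algebra and generated $\sigma$-algebra coincide), closing a collection under complement and union until it stabilizes:

```python
# Minimal sketch (assumption: X finite): the smallest algebra containing C is
# obtained by repeatedly adding complements and pairwise unions; intersections
# come for free via De Morgan.
def generated_algebra(X, C):
    X = frozenset(X)
    alg = {frozenset(s) for s in C} | {frozenset(), X}
    while True:
        new = {X - s for s in alg} | {s | t for s in alg for t in alg}
        if new <= alg:
            return alg
        alg |= new

X = {1, 2, 3, 4}
A = generated_algebra(X, [{1}, {1, 2}])
print(sorted(sorted(s) for s in A))
# atoms are {1}, {2}, {3,4}; the generated algebra has 2^3 = 8 members
```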
Relation
- $x$ said to stand in relation $\rel$ to $y$, denoted by $\relxy{x}{y}$
- $\rel$ said to be relation on $X$ if $\relxy{x}{y}$ $\Rightarrow$ $x\in X$ and $y\in X$
-
$\rel$ is
- transitive if $\relxy{x}{y}$ and $\relxy{y}{z}$ $\Rightarrow$ $\relxy{x}{z}$
- symmetric if $\relxy{x}{y}$ $\Rightarrow$ $\relxy{y}{x}$
- reflexive if $\relxy{x}{x}$
- antisymmetric if $\relxy{x}{y}$ and $\relxy{y}{x}$ $\Rightarrow$ $x=y$
-
$\rel$ is
- equivalence relation if transitive, symmetric, and reflexive, e.g., congruence modulo $n$
- partial ordering if transitive and antisymmetric, e.g., “$\subset$''
-
linear (or simple) ordering if transitive, antisymmetric, and $\relxy{x}{y}$ or $\relxy{y}{x}$ for all $x,y\in X$
- e.g., “$\geq$'' linearly orders $\reals$ while “$\subset$'' does not linearly order $\powerset(X)$
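- a minimal sketch checking these properties for a finite relation given as a set of ordered pairs:

```python
# Minimal sketch (assumption: X finite, relation R as a set of pairs); tests
# the four properties from the definitions above.
def properties(X, R):
    return {
        "transitive":    all((x, z) in R for (x, y1) in R for (y2, z) in R if y1 == y2),
        "symmetric":     all((y, x) in R for (x, y) in R),
        "reflexive":     all((x, x) in R for x in X),
        "antisymmetric": all(x == y for (x, y) in R if (y, x) in R),
    }

X = {1, 2, 3}
divides = {(x, y) for x in X for y in X if y % x == 0}
print(properties(X, divides))  # transitive, reflexive, antisymmetric: a partial ordering
```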
Ordering
-
given partial order, $\prec$, $a$ is
- a first/smallest/least element if $x \neq a \Rightarrow a\prec x$
- a last/largest/greatest element if $x \neq a \Rightarrow x\prec a$
- a minimal element if $x \neq a \Rightarrow x \not\prec a$
- a maximal element if $x \neq a \Rightarrow a \not\prec x$
-
partial ordering $\prec$ is
- strict partial ordering if $x\not\prec x$
- reflexive partial ordering if $x\prec x$
-
strict linear ordering $<$ is
- well ordering for $X$ if every nonempty subset of $X$ contains a first element
Axiom of choice and equivalent principles
- also called multiplicative axiom - name preferred by Bertrand Russell for reason noted below
- no problem when $\coll$ is finite
- need axiom of choice when $\coll$ is not finite
Infinite direct product
- for $z=\seq{x_\lambda}\in\bigtimes X_\lambda$, $x_\lambda$ called $\lambda$-th coordinate of $z$
- if one of $X_\lambda$ is empty, $\bigtimes X_\lambda$ is empty
- axiom of choice is equivalent to converse, i.e., if none of $X_\lambda$ is empty, $\bigtimes X_\lambda$ is not empty
- this is why Bertrand Russell preferred multiplicative axiom over axiom of choice as name of the axiom
Real Number System
Field axioms
-
field axioms - for every $x,y,z\in\field$
- $(x+y)+z= x+(y+z)$ - additive associativity
- $(\exists 0\in\field)(\forall x\in\field)(x+0=x)$ - additive identity
- $(\forall x\in\field)(\exists w\in\field)(x+w=0)$ - additive inverse
- $x+y= y+x$ - additive commutativity
- $(xy)z= x(yz)$ - multiplicative associativity
- $(\exists 1\neq0\in\field)(\forall x\in\field)(x\cdot 1=x)$ - multiplicative identity
- $(\forall x\neq0\in\field)(\exists w\in\field)(xw=1)$ - multiplicative inverse
- $x(y+z) = xy + xz$ - distributivity
- $xy= yx$ - multiplicative commutativity
-
system (set with $+$ and $\cdot$) satisfying field axioms called field
- e.g., field of integers modulo $p$ where $p$ is prime, $\primefield{p}$
Axioms of order
-
axioms of order - subset, $\field_{++}\subset \field$, of positive (real) numbers satisfies
- $x,y\in \field_{++} \Rightarrow x+y\in \field_{++}$
- $x,y\in \field_{++} \Rightarrow xy\in \field_{++}$
- $x\in \field_{++} \Rightarrow -x\not\in \field_{++}$
- $x\in \field \Rightarrow x=0\lor x\in \field_{++} \lor -x \in \field_{++}$
-
system satisfying field axioms & axioms of order called ordered field
- e.g., set of real numbers ($\reals$), set of rational numbers ($\rationals$)
Axiom of completeness
-
completeness axiom
- every nonempty set $S$ of real numbers which has an upper bound has a least upper bound, i.e., $$ \set{l}{(\forall x\in S)(x\leq l)} $$ has least element.
- use $\sup S$ and $\inf S$ for least upper bound and greatest lower bound (when exist)
-
ordered field satisfying completeness axiom called complete ordered field
- e.g., $\reals$ (with $+$ and $\cdot$)
-
axiom of Archimedes
- given any $x\in\reals$, there is an integer $n$ such that $x<n$
-
corollary
- given any $x<y \in \reals$, exists $r\in\rationals$ such that $x < r < y$
Sequences of $\reals$
-
sequence of $\reals$ denoted by $\seq{x_i}_{i=1}^\infty$ or $\seq{x_i}$
- mapping from $\naturals$ to $\reals$
-
limit of $\seq{x_n}$ denoted by $\lim_{n\to\infty} x_n$ or $\lim x_n$ - defined by $a\in\reals$
$$
(\forall \epsilon>0)(\exists N\in\naturals) (n \geq N \Rightarrow |x_n-a|<\epsilon)
$$
- $\lim x_n$ unique if exists
- $\seq{x_n}$ called Cauchy sequence if $$ (\forall \epsilon>0)(\exists N\in\naturals) (n,m \geq N \Rightarrow |x_n-x_m|<\epsilon) $$
-
Cauchy criterion - characterizing complete metric space (including $\reals$)
- sequence converges if and only if Cauchy sequence
Other limits
- cluster point of $\seq{x_n}$ - defined by $c\in\reals$ $$ (\forall \epsilon>0, N\in\naturals)(\exists n>N)(|x_n-c|<\epsilon) $$
- limit superior or limsup of $\seq{x_n}$ $$ \limsup x_n = \inf_n \sup_{k>n} x_k $$
- limit inferior or liminf of $\seq{x_n}$ $$ \liminf x_n = \sup_n \inf_{k>n} x_k $$
- $\liminf x_n \leq \limsup x_n$
- $\seq{x_n}$ converges if and only if $\liminf x_n = \limsup x_n$ (=$\lim x_n$)
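- a minimal numeric sketch of these definitions (truncated tails; $x_n=(-1)^n(1+1/n)$):

```python
# Minimal numeric sketch (assumption: x_n = (-1)^n (1 + 1/n), which oscillates,
# with tails truncated at a finite horizon): approximates limsup = 1 and
# liminf = -1; the sequence does not converge since the two differ.
N = 2000
x = [(-1) ** n * (1 + 1 / n) for n in range(1, N)]

limsup = min(max(x[n:]) for n in range(N // 2))  # inf_n sup_{k > n} x_k, truncated
liminf = max(min(x[n:]) for n in range(N // 2))  # sup_n inf_{k > n} x_k, truncated
print(limsup, liminf)  # approximately 1.001 and -1.001
```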
Open and closed sets
-
$O$ called open if
$$
(\forall x\in O)(\exists \delta>0)(\forall y\in\reals)(|y-x|<\delta\Rightarrow y\in O)
$$
- intersection of finite collection of open sets is open
- union of any collection of open sets is open
- $\closure{E}$ called closure of $E$ if $$ (\forall x \in \closure{E} \ \&\ \delta>0)(\exists y\in E)(|x-y|<\delta) $$
-
$F$ called closed if
$$
F = \closure{F}
$$
- union of finite collection of closed sets is closed
- intersection of any collection of closed sets is closed
Open and closed sets - facts
- every open set is union of countable collection of disjoint open intervals
-
(Lindelöf) any collection $\coll$ of open sets has a countable subcollection $\seq{O_i}$ such that
$$
\bigcup_{O\in\coll} O = \bigcup_{i} O_i
$$
- equivalently, any collection $\collk{F}$ of closed sets has a countable subcollection $\seq{F_i}$ such that $$ \bigcap_{F\in\collk{F}} F = \bigcap_{i} F_i $$
Covering and Heine-Borel theorem
-
collection $\coll$ of sets called covering of $A$ if
$$
A \subset \bigcup_{O\in\coll} O
$$
- $\coll$ said to cover $A$
- $\coll$ called open covering if every $O\in\coll$ is open
- $\coll$ called finite covering if $\coll$ is finite
- Heine-Borel theorem - for any closed and bounded set, every open covering has finite subcovering
-
corollary
- any collection $\coll$ of closed sets, at least one of which is bounded, such that every finite subcollection has nonempty intersection, has nonempty intersection.
Continuous functions
- $f$ (with domain $D$) called continuous at $x$ if $$ (\forall\epsilon >0)(\exists \delta>0)(\forall y\in D)(|y-x|<\delta \Rightarrow |f(y)-f(x)|<\epsilon) $$
- $f$ called continuous on $A\subset D$ if $f$ is continuous at every point in $A$
- $f$ called uniformly continuous on $A\subset D$ if $$ (\forall\epsilon >0)(\exists \delta>0)(\forall x,y\in A)(|x-y|<\delta \Rightarrow |f(x)-f(y)|<\epsilon) $$
Continuous functions - facts
- $f$ is continuous if and only if for every open set $O$ (in co-domain), $f^{-1}(O)$ is open
- $f$ continuous on closed and bounded set is uniformly continuous
- extreme value theorem - $f$ continuous on closed and bounded set, $F$, is bounded on $F$ and assumes its maximum and minimum on $F$ $$ (\exists x_1, x_2 \in F)(\forall x\in F)(f(x_1) \leq f(x) \leq f(x_2)) $$
- intermediate value theorem - for $f$ continuous on $[a,b]$ with $f(a) \leq f(b)$, $$ (\forall d)(f(a) \leq d \leq f(b))(\exists c\in[a,b])(f(c) = d) $$
Borel sets and Borel $\sigma$-algebra
-
Borel set
- any set that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement
-
Borel algebra or Borel $\sigma$-algebra
- smallest $\sigma$-algebra containing all open sets
-
also
- smallest $\sigma$-algebra containing all closed sets
- smallest $\sigma$-algebra containing all open intervals (due to statement above)
Various Borel sets
-
countable union of closed sets (in $\reals$),
called an $F_\sigma$ ($F$ for closed & $\sigma$ for sum)
- thus, every countable set, every closed set, every open interval, every open set, is an $F_\sigma$ (note $(a,b)=\bigcup_{n=1}^\infty [a+1/n,b-1/n]$)
- countable union of sets in $F_\sigma$ again is an $F_\sigma$
-
countable intersection of open sets
called a $G_\delta$ ($G$ for open (Gebiet) & $\delta$ for Durchschnitt - intersection in German)
- complement of $F_\sigma$ is a $G_\delta$ and vice versa
- $F_\sigma$ and $G_\delta$ are simple types of Borel sets
- countable intersection of $F_\sigma$'s is $F_{\sigma\delta}$, countable union of $F_{\sigma\delta}$'s is $F_{\sigma\delta\sigma}$, countable intersection of $F_{\sigma\delta\sigma}$'s is $F_{\sigma\delta\sigma\delta}$, etc., & likewise for $G_{\delta \sigma \ldots}$
- below are all classes of Borel sets, but not every Borel set belongs to one of these classes $$ F_{\sigma}, F_{\sigma\delta}, F_{\sigma\delta\sigma}, F_{\sigma\delta\sigma\delta}, \ldots, G_{\delta}, G_{\delta\sigma}, G_{\delta\sigma\delta}, G_{\delta\sigma\delta\sigma}, \ldots, $$
Lebesgue Measure
Riemann integral
-
Riemann integral
- partition induced by sequence $\seq{x_i}_{i=1}^n$ with $a=x_1<\cdots<x_n=b$
-
lower and upper sums
- $L(f,\seq{x_i}) = \sum_{i=1}^{n-1} \inf_{x\in[x_i,x_{i+1}]} f(x) (x_{i+1}-x_{i})$
- $U(f,\seq{x_i}) = \sum_{i=1}^{n-1} \sup_{x\in[x_i,x_{i+1}]} f(x) (x_{i+1}-x_{i})$
- always holds: $L(f,\seq{x_i}) \leq U(f,\seq{y_i})$, hence $$ \sup_{\seq{x_i}} L(f,\seq{x_i}) \leq \inf_{\seq{x_i}} U(f,\seq{x_i}) $$
- Riemann integrable if $$ \sup_{\seq{x_i}} L(f,\seq{x_i}) = \inf_{\seq{x_i}} U(f,\seq{x_i}) $$
- every continuous function is Riemann integrable
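- a minimal sketch of lower and upper sums on a uniform partition (assumption: $f$ monotone, so cell infima/suprema are endpoint values):

```python
# Minimal sketch: lower and upper Riemann sums for a monotone f on a uniform
# partition of [a, b]; for f(x) = x^2 both sides squeeze to 1/3 as n grows.
def riemann_sums(f, a, b, n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = sum(min(f(xs[i]), f(xs[i + 1])) * (xs[i + 1] - xs[i]) for i in range(n))
    upper = sum(max(f(xs[i]), f(xs[i + 1])) * (xs[i + 1] - xs[i]) for i in range(n))
    return lower, upper

print(riemann_sums(lambda x: x * x, 0.0, 1.0, 1000))
# -> (0.33283..., 0.33383...): L <= 1/3 <= U, gap of order 1/n
```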
Motivation - want measure better than Riemann integrable
- consider indicator (or characteristic) function $\chi_\rationals:[0,1] \to [0,1]$ $$ \chi_\rationals(x) = \left\{\begin{array}{ll} 1 &\mbox{if } x \in \rationals \\ 0 &\mbox{if } x \not\in \rationals \end{array}\right. $$
- not Riemann integrable: $\sup_{\seq{x_i}} L(f,\seq{x_i}) = 0 \neq 1 = \inf_{\seq{x_i}} U(f,\seq{x_i})$
-
however, there are far more irrational numbers than rational numbers, hence
- want to have some integral $\int$ such that, e.g., $$ \int_{[0,1]} \chi_\rationals(x) dx = 0 \mbox{ and } \int_{[0,1]} (1-\chi_\rationals(x)) dx = 1 $$
Properties of desirable measure
-
want some measure $\mu:\subsetset{M}\to\preals=\set{x\in\reals}{x\geq0}$
- defined for every subset of $\reals$, i.e., $\subsetset{M} = \powerset(\reals)$
- equals length for open interval $$ \mu(a,b) = b-a $$
- countable additivity: for disjoint $\seq{E_i}_{i=1}^\infty$ $$ \mu(\cup E_i) = \sum \mu(E_i) $$
- translation invariant $$ \mu(E+x) = \mu(E) \mbox{ for } x\in\reals $$
- no such measure exists
- not known whether measure with first three properties exists
-
want to find translation invariant countably additive measure
- hence, give up on first property
Race won by Henri Lebesgue in 1902!
- mathematicians in 19th century struggled to solve this problem
- race won by French mathematician, Henri Léon Lebesgue in 1902!
-
Lebesgue integral covers much wider range of functions
- indeed, $\chi_\rationals$ is Lebesgue integrable $$ \int_{[0,1]} \chi_\rationals(x) dx = 0 \mbox{ and } \int_{[0,1]} (1-\chi_\rationals(x)) dx = 1 $$
Outer measure
- for $E\subset\reals$, define outer measure $\mu^\ast:\powerset(\reals)\to\preals$ $$ \mu^\ast E = \inf_{\seq{I_i}} \left\{\left.\sum l(I_i) \right| E\subset \cup I_i\right\} $$ where $I_i=(a_i,b_i)$ and $l(I_i) = b_i-a_i$
- outer measure of open interval is length $$ \mu^\ast(a_i,b_i) = b_i-a_i $$
- countable subadditivity $$ \mu^\ast\left(\cup E_i\right) \leq \sum \mu^\ast E_i $$
-
corollaries
- $\mu^\ast E = 0$ if $E$ is countable
- $[0,1]$ not countable
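- a minimal sketch of the first corollary (cover the $i$-th rational in an enumeration by an interval of length $\epsilon/2^{i+1}$, so total length $\leq\epsilon$):

```python
# Minimal sketch: a (truncated) countable cover of the rationals in [0, 1] by
# open intervals of total length <= eps, illustrating mu*(countable set) = 0.
from fractions import Fraction

def rationals01(n_max):          # rationals in [0,1] with denominator <= n_max
    seen = []
    for q in range(1, n_max + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.append(r)
    return seen

eps = 0.01
qs = rationals01(6)              # finite truncation of the full enumeration
cover = [(float(r) - eps / 2 ** (i + 2), float(r) + eps / 2 ** (i + 2))
         for i, r in enumerate(qs)]
print(sum(b - a for a, b in cover))  # total length < eps, for any eps > 0
```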
Measurable sets
- $E\subset\reals$ called measurable if for every $A\subset\reals$ $$ \mu^\ast A = \mu^\ast (E\cap A) + \mu^\ast (\compl{E}\cap {A}) $$
- if $\mu^\ast E =0$, then $E$ measurable
- every open interval $(a,b)$ with $a\geq -\infty$ and $b\leq \infty$ is measurable
- disjoint countable union of measurable sets is measurable, i.e., $\cup E_i$ is measurable
- collection of measurable sets is $\sigma$-algebra
Borel algebra is measurable
-
note
- every open set is disjoint countable union of open intervals
- disjoint countable union of measurable sets is measurable
- open intervals are measurable
- hence, every open set is measurable
-
also
- collection of measurable sets is $\sigma$-algebra
- every open set is Borel set and Borel sets form $\sigma$-algebra
- hence, Borel sets are measurable
- specifically, Borel algebra (smallest $\sigma$-algebra containing all open sets) is measurable
Lebesgue measure
- restriction of $\mu^\ast$ to collection $\subsetset{M}$ of measurable sets called Lebesgue measure $$ \mu:\subsetset{M}\to\preals $$
- countable subadditivity - for $\seq{E_n}$ $$ \mu (\cup E_n) \leq \sum \mu E_n $$
- countable additivity - for disjoint $\seq{E_n}$ $$ \mu (\cup E_n) = \sum \mu E_n $$
- for decreasing sequence of measurable sets, $\seq{E_n}$, i.e., $(\forall n\in\naturals)(E_{n+1} \subset E_n)$, with $\mu E_1 < \infty$ $$ \mu\left( \bigcap E_n \right) = \lim \mu E_n $$
(Lebesgue) measurable sets are nice ones!
- following statements are equivalent $$ \begin{eqnarray*} &-& E \mbox{ is measurable} \\ &-& (\forall \epsilon >0) (\exists \mbox{ open } O\supset E) (\mu^\ast(O\sim E)<\epsilon) \\ &-& (\forall \epsilon >0) (\exists \mbox{ closed } F\subset E) (\mu^\ast(E\sim F)<\epsilon) \\ &-& (\exists G_\delta \supset E) (\mu^\ast(G_\delta\sim E)=0) \\ &-& (\exists F_\sigma \subset E) (\mu^\ast(E\sim F_\sigma)=0) \end{eqnarray*} $$
- if $\mu^\ast E$ is finite, above statements are equivalent to $$ (\forall \epsilon>0) \left(\exists U = \bigcup_{i=1}^n (a_i,b_i) \right) (\mu^\ast (U\Delta E) < \epsilon) $$
Lebesgue measure resolves problem in motivation
- let $$ E_1 = \set{x\in[0,1]}{x\in\rationals},\ E_2 = \set{x\in[0,1]}{x\not\in\rationals} $$
- $\mu^\ast E_1=0$ because $E_1$ is countable, hence measurable and $$ \mu E_1 = \mu^\ast E_1 = 0 $$
- $\sigma$-algebra structure implies $E_2 = [0, 1] \cap \compl{E_1}$ is measurable
- countable additivity implies $\mu E_1 + \mu E_2 = \mu[0,1] = 1$, hence $$ \mu E_2 = 1 $$
Lebesgue Measurable Functions
Lebesgue measurable functions
-
for $f:X\to\reals\cup\{-\infty, \infty\}$,
i.e., extended real-valued function, the following are equivalent
- for every $a\in\reals$, $\set{x\in{X}}{f(x) < a}$ is measurable
- for every $a\in\reals$, $\set{x\in{X}}{f(x) \leq a}$ is measurable
- for every $a\in\reals$, $\set{x\in{X}}{f(x) > a}$ is measurable
- for every $a\in\reals$, $\set{x\in{X}}{f(x) \geq a}$ is measurable
-
if so,
- for every $a\in\reals\cup\{-\infty, \infty\}$, $\set{x\in{X}}{f(x) = a}$ is measurable
-
extended real-valued function, $f$, called (Lebesgue) measurable function if
- domain is measurable
- any one of above four statements holds
Properties of Lebesgue measurable functions
-
for real-valued measurable functions, $f$ and $g$, and $c\in\reals$
- $f+c$, $cf$, $f+g$, $fg$ are measurable
-
for every extended real-valued measurable function sequence, $\seq{f_n}$
- $\sup f_n$, $\limsup f_n$ are measurable
- hence, $\inf f_n$, $\liminf f_n$ are measurable
- thus, if $\lim f_n$ exists, it is measurable
Almost everywhere - a.e.
-
statement, $P(x)$, said to hold almost everywhere or a.e. if
$$
\mu \set{x}{\sim P(x)} = 0
$$
- e.g., $f$ said to be equal to $g$ a.e. if $\mu\set{x}{f(x)\neq g(x)}=0$
- e.g., $\seq{f_n}$ said to converge to $f$ a.e. if $$ (\exists E \mbox{ with } \mu E=0)(\forall x \not\in E)(\lim f_n (x) = f(x)) $$
-
facts
- if $f$ is measurable and $f=g$ a.e., then $g$ is measurable
- if measurable extended real-valued $f$ defined on $[a,b]$ with $f(x) \in\reals$ a.e., then for every $\epsilon>0$, exist step function $g$ and continuous function $h$ such that $$ \mu\set{x}{|f-g| \geq \epsilon} < \epsilon,\ \mu\set{x}{|f-h| \geq \epsilon} < \epsilon $$
Characteristic \& simple functions
-
for any $A\subset\reals$, $\chi_A$ called characteristic function if
$$
\chi_A(x) = \left\{\begin{array}{ll}
1 & x\in A\\
0 & x\not\in A\\
\end{array}\right.
$$
- $\chi_A$ is measurable if and only if $A$ is measurable
- measurable $\varphi$ called simple if for some distinct $\seq{a_i}_{i=1}^n$ $$ \varphi(x) = \sum_{i=1}^n a_i \chi_{A_i}(x) $$ where $A_i = \set{x}{\varphi(x) = a_i}$
Littlewood's three principles
- let $M(E)$ with measurable set, $E$, denote set of measurable functions defined on $E$
-
every (measurable) set of finite measure is nearly finite union of intervals, e.g.,
- $E$ with $\mu^\ast E<\infty$ is measurable if and only if $$ (\forall \epsilon>0) (\exists \{I_i: \mbox{open\ interval}\}_{i=1}^n) (\mu^\ast(E \Delta (\cup_{i=1}^n I_i)) < \epsilon) $$
-
every (measurable) function is nearly continuous, e.g.,
- (Lusin's theorem) $$ (\forall f \in M[a,b])(\forall \epsilon >0)(\exists g \in C[a,b]) (\mu\set{x}{f(x)\neq g(x)}< \epsilon) $$
- every convergent (measurable) function sequence is nearly uniformly convergent, e.g., $$ \begin{eqnarray*} && (\forall \mbox{ measurable }\seq{f_n} \mbox{ converging to } f \mbox { a.e. on } E \mbox{ with } \mu E<\infty) \\ && (\forall \epsilon>0 \mbox{ and } \delta>0) (\exists A\subset E \mbox{ with } \mu(A)<\delta \mbox{ and } N\in\naturals) \\ && (\forall n > N, x\in E\sim A)(|f_n(x)-f(x)|<\epsilon) \end{eqnarray*} $$
Egoroff's theorem
- Egoroff's theorem - provides stronger version of third principle above $$ \begin{eqnarray*} && (\forall \mbox{ measurable }\seq{f_n} \mbox{ converging to } f \mbox { a.e. on } E \mbox{ with } \mu E<\infty) \\ && (\forall \epsilon>0) (\exists A\subset E \mbox{ with } \mu(A)<\epsilon) (f_n \mbox{ uniformly converges to } f \mbox{ on } E\sim A ) \end{eqnarray*} $$
Lebesgue Integral
Integral of simple functions
- canonical representation of simple function $$ \varphi(x) = \sum_{i=1}^n a_i \chi_{A_i}(x) $$ where $a_i$ are distinct and $A_i=\set{x}{\varphi(x)=a_i}$ - note $A_i$ are disjoint
- when $\mu\set{x}{\varphi(x)\neq0}< \infty$ and $\varphi = \sum_{i=1}^n a_i \chi_{A_i}$ is canonical representation, define integral of $\varphi$ by $$ \int \varphi = \int \varphi (x) dx= \sum_{i=1}^n a_i \mu A_i $$
- when $E$ is measurable, define $$ \int_E \varphi = \int \varphi \chi_E $$
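- a minimal sketch of this definition (assumption: each $A_i$ is a finite disjoint union of intervals, so $\mu A_i$ is just the total length):

```python
# Minimal sketch: integral of a simple function from its canonical
# representation, integral = sum a_i * mu(A_i).
def measure(intervals):              # mu of a finite disjoint union of intervals
    return sum(b - a for a, b in intervals)

def integral_simple(pairs):          # pairs: list of (a_i, A_i)
    return sum(a * measure(A) for a, A in pairs)

# phi = 3 on [0, 1/4) U [1/2, 3/4), and -1 on [1/4, 1/2)
phi = [(3, [(0.0, 0.25), (0.5, 0.75)]), (-1, [(0.25, 0.5)])]
print(integral_simple(phi))  # 3*0.5 + (-1)*0.25 = 1.25
```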
Properties of integral of simple functions
- for simple functions $\varphi$ and $\psi$ that vanish outside set of finite measure, i.e., $\mu\set{x}{\varphi(x)\neq0}<\infty$, $\mu\set{x}{\psi(x)\neq0}<\infty$, and for every $a,b\in\reals$ $$ \int (a\varphi + b\psi) = a \int\varphi + b \int\psi $$
- thus, even for simple function, $\varphi = \sum_{i=1}^n a_i \chi_{A_i}$, that vanishes outside set of finite measure, not necessarily in canonical representation, $$ \int \varphi = \sum_{i=1}^n a_i \mu A_i $$
- if $\varphi \geq \psi$ a.e. $$ \int \varphi \geq \int \psi $$
Lebesgue integral of bounded functions
-
for bounded function, $f$, and measurable set, $E$, with $\mu E<\infty$,
$$
\sup_{\varphi:\ \mathrm{simple},\ \varphi \leq f} \int_E \varphi
\leq
\inf_{\psi:\ \mathrm{simple},\ f \leq \psi} \int_E \psi
$$
- if $f$ is defined on $E$, $f$ is measurable function if and only if $$ \sup_{\varphi:\ \mathrm{simple},\ \varphi \leq f} \int_E \varphi = \inf_{\psi:\ \mathrm{simple},\ f \leq \psi} \int_E \psi $$
- for bounded measurable function, $f$, defined on measurable set, $E$, with $\mu E < \infty$, define (Lebesgue) integral of $f$ over $E$ $$ \int_E f(x) dx = \sup_{\varphi:\ \mathrm{simple},\ \varphi \leq f} \int_E \varphi = \inf_{\psi:\ \mathrm{simple},\ f \leq \psi} \int_E \psi $$
Properties of Lebesgue integral of bounded functions
-
for bounded measurable functions, $f$ and $g$, defined on $E$ with finite measure
- for every $a,b\in\reals$ $$ \int_E (af+bg) = a \int_E f + b\int_E g $$
- if $f\leq g$ a.e. $$ \int_E f \leq \int_E g $$
- for disjoint measurable sets, $A,B\subset E$, $$ \int_{A\cup B} f = \int_A f + \int_B f $$
- hence, $$ \left|\int_E f \right| \leq \int_E |f| \mbox{ \& } f=g \mbox{ a.e. } \Rightarrow \int_E f = \int_E g $$
Lebesgue integral of bounded functions over finite interval
- if bounded function, $f$, defined on $[a,b]$ is Riemann integrable, then $f$ is measurable and $$ \int_{[a,b]} f = R \int_a^b f(x) dx $$ where $R\int$ denotes Riemann integral
- bounded function, $f$, defined on $[a,b]$ is Riemann integrable if and only if set of points where $f$ is discontinuous has measure zero
- (bounded convergence theorem) for sequence of measurable functions, $\seq{f_n}$, defined on measurable $E$ with finite measure, and $M>0$, if $|f_n|<M$ for every $n$ and $f(x) = \lim f_n(x)$ for every $x\in E$ $$ \int_E f = \lim \int_E f_n $$
Lebesgue integral of nonnegative functions
- for nonnegative measurable function, $f$, defined on measurable set, $E$, define $$ \int_E f = \sup_{h:\ \mathrm{bounded\ measurable\ function},\ \mu\set{x}{h(x)\neq0}<\infty,\ h\leq f} \int_E h $$
-
for nonnegative measurable functions, $f$ and $g$
- for every $a,b\geq0$ $$ \int_E (af + bg) = a\int_E f + b\int_E g $$
- if $f\geq g$ a.e. $$ \int_E f \geq \int_E g $$
-
thus,
- for every $c>0$ $$ \int_E cf = c\int_E f $$
Fatou's lemma and monotone convergence theorem for Lebesgue integral
-
Fatou's lemma -
for nonnegative measurable function sequence, $\seq{f_n}$,
with $\lim f_n = f$ a.e. on measurable set, $E$
$$
\int_E f \leq \liminf \int_E f_n
$$
- note $\lim f_n$ is measurable (see above), hence $f$ is measurable
- monotone convergence theorem - for nonnegative increasing measurable function sequence, $\seq{f_n}$, with $\lim f_n = f$ a.e. on measurable set, $E$ $$ \int_E f = \lim \int_E f_n $$
- for nonnegative measurable function, $f$, and sequence of disjoint measurable sets, $\seq{E_i}$, $$ \int_{\cup E_i} f = \sum \int_{E_i} f $$
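-
standard example (not in the original notes) showing that Fatou's inequality can be strict
- $f_n = n\chi_{(0,1/n)}$ on $[0,1]$: $\lim f_n = 0$ a.e., yet $$ \int_{[0,1]} \lim f_n = 0 < 1 = \liminf \int_{[0,1]} f_n $$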
Lebesgue integrability of nonnegative functions
- nonnegative measurable function, $f$, said to be integrable over measurable set, $E$, if $$ \int_E f < \infty $$
- for nonnegative measurable functions, $f$ and $g$, if $f$ is integrable on measurable set, $E$, and $g\leq f$ a.e. on $E$, then $g$ is integrable and $$ \int_E (f-g) = \int_E f - \int_E g $$
- for nonnegative integrable function, $f$, defined on measurable set, $E$, and every $\epsilon>0$, exists $\delta >0$ such that for every measurable set $A\subset E$ with $\mu A< \delta$ (then $f$ is integrable on $A$, of course), $$ \int_A f < \epsilon $$
Lebesgue integral
- for (any) function, $f$, define $f^+$ and $f^-$ such that for every $x$ $$ \begin{eqnarray*} f^+(x) &=& \max\{f(x), 0\} \\ f^-(x) &=& \max\{-f(x), 0\} \end{eqnarray*} $$
- note $f = f^+ - f^-,\ |f| = f^+ + f^-,\ f^- = (-f)^+$
- measurable function, $f$, said to be (Lebesgue) integrable over measurable set, $E$, if (nonnegative measurable) functions, $f^+$ and $f^-$, are integrable, in which case define $$ \int_E f = \int_E f^+ - \int_E f^- $$
Properties of Lebesgue integral
-
for $f$ and $g$ integrable on measurable set, $E$, and $a,b\in\reals$
- $af+bg$ is integrable and $$ \int_E (af+bg) = a \int_E f + b\int_E g $$
- if $f\geq g$ a.e. on $E$, $$ \int_E f \geq \int_E g $$
- for disjoint measurable sets, $A,B\subset E$ $$ \int_{A\cup B} f = \int_A f + \int_B f $$
Lebesgue convergence theorem (for Lebesgue integral)
- Lebesgue convergence theorem - for measurable $g$ integrable on measurable set, $E$, and measurable sequence $\seq{f_n}$ converging to $f$ with $|f_n|\leq g$ a.e. on $E$: $f$ is measurable, every $f_n$ is integrable, and $$ \int_E f = \lim \int_E f_n $$
Generalization of Lebesgue convergence theorem (for Lebesgue integral)
- generalization of Lebesgue convergence theorem - for sequence of functions, $\seq{g_n}$, integrable on measurable set, $E$, converging to integrable $g$ a.e. on $E$, and sequence of measurable functions, $\seq{f_n}$, converging to $f$ a.e. on $E$ with $|f_n|\leq g_n$ a.e. on $E$, if $$ \int_E g = \lim \int_E g_n $$ then $f$ is integrable and $$ \int_E f = \lim \int_E f_n $$
Comments on convergence theorems
- Fatou's lemma, monotone convergence theorem, and Lebesgue convergence theorem all state that, under suitable conditions, we can say something about $$ \int \lim f_n $$ in terms of $$ \lim \int f_n $$
- Fatou's lemma requires weaker condition than Lebesgue convergence theorem, i.e., only requires “bounded below'' whereas Lebesgue convergence theorem also requires “bounded above'' $$ \int \lim f_n \leq \liminf \int f_n $$
-
monotone convergence theorem is somewhat between the two;
- advantage - applicable even when $f$ not integrable
- Fatou's lemma and monotone convergence theorem very close in sense that can be derived from each other using only facts of positivity and linearity of integral
Convergence in measure
- $\seq{f_n}$ of measurable functions said to converge $f$ in measure if $$ (\forall \epsilon>0) (\exists N\in\naturals) (\forall n > N) (\mu\set{x}{|f_n-f|>\epsilon} < \epsilon) $$
- thus, Littlewood's third principle above implies $$ (\forall \seq{f_n} \mbox{ converging to } f \mbox { a.e. on } E \mbox{ with } \mu E<\infty) (f_n \mbox{ converge in measure to }f) $$
-
however, the converse is not true, i.e.,
exists $\seq{f_n}$ converging in measure to $f$ that does not converge to $f$ a.e.
- e.g., typewriter sequence on $[0,1]$: for $n = 2^k + j$ with $0\leq j<2^k$, $f_n = \chi_{[j2^{-k}, (j+1)2^{-k}]}$ converges in measure to $0$, but $\seq{f_n(x)}$ converges for no $x$
- Fatou's lemma, monotone convergence theorem, and Lebesgue convergence theorem remain valid even when “convergence a.e.'' replaced by “convergence in measure''
Conditions for convergence in measure
Space Overview
Diagrams for relations among various spaces
-
note from the figure
- need metric to even speak of completeness (not purely topological notion)
- metric spaces can be induced from normed spaces
Classical Banach Spaces
Normed linear space
- $X$ called linear space if $$ (\forall x, y \in X, a, b \in \reals)(ax + by \in X) $$
-
linear space, $X$, called normed space with associated norm $\|\cdot\|: X \to \preals$ if
- $$ (\forall x\in X)(\|x\|=0 \Rightarrow x \equiv 0) $$
- $$ (\forall x \in X, a \in \reals)(\|ax\| = |a|\|x\|) $$
- subadditivity $$ (\forall x,y\in X)(\|x+y\| \leq \|x\| + \|y\|) $$
$L^p$ spaces
- $L^p = L^p[0,1]$ denotes space of (Lebesgue) measurable functions such that $$ \int_{[0,1]} |f|^p < \infty $$
- define $\|\cdot\|:L^p\to\preals$ $$ \|f\| = \|f\|_p = \left(\int_{[0,1]} |f|^p\right)^{1/p} $$
-
$L^p$ are linear normed spaces with norm $\|\cdot\|_p$ when $p\geq 1$ because
- $|f(x)+g(x)|^p \leq 2^p(|f(x)|^p + |g(x)|^p)$ implies $(\forall f, g\in L^p)(f+g \in L^p)$
- $|a f(x)|^p = |a|^p|f(x)|^p$ implies $(\forall f\in L^p, a \in \reals)(af \in L^p)$
- $\|f\|=0\Rightarrow f=0\mbox{ a.e.}$
- $\|a f\| = |a|\|f\|$
- $\|f+g\|\leq \|f\|+\|g\|$ (Minkowski inequality)
$L^\infty$ space
- $L^\infty = L^\infty[0,1]$ denotes space of measurable functions bounded a.e.
-
$L^\infty$ is linear normed space with norm
$$
\|f\| = \|f\|_\infty = \mathrm{ess\ sup} |f|
= \inf_{g: g=f \ \mathrm{a.e}} \sup_{x\in[0,1]} |g(x)|
$$
- thus $$ \|f\|_\infty = \inf\set{M}{\mu\set{x}{|f(x)|>M}=0} $$
Inequalities in $L^p$ spaces
-
Minkowski inequality - for $p\in[1,\infty]$
$$
(\forall f,g\in L^p)(\|f+g\|_p \leq \|f\|_p + \|g\|_p)
$$
- if $p\in(1,\infty)$, equality holds if and only if $(\exists a,b\geq 0 \mbox{ with } ab\neq0)(af = bg \mbox{ a.e.})$
- Minkowski inequality for $0<p<1$: $$ (\forall f,g\in L^p)(f,g\geq0 \mbox{ a.e.} \Rightarrow \|f+g\|_p \geq \|f\|_p + \|g\|_p) $$
-
Hölder's inequality - for $p,q\in[1,\infty]$ with $1/p+1/q=1$
$$
(\forall f\in L^p, g\in L^q)
\left(fg \in L^1 \mbox{ and } \int_{[0,1]} |fg| \leq \|f\|_p \|g\|_q\right)
$$
- equality holds if and only if $(\exists a,b\geq 0 \mbox{ with } ab\neq0)(a|f|^p = b|g|^q \mbox{ a.e.})$
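- a minimal numeric sketch of Hölder's inequality (Riemann-style discretization of $[0,1]$; $p=3$, $q=3/2$; sample $f$, $g$ chosen arbitrarily):

```python
# Minimal numeric sketch: check int |fg| <= ||f||_p * ||g||_q on a uniform grid
# of [0, 1] for conjugate exponents p = 3, q = 3/2 (1/p + 1/q = 1).
N = 100_000
p, q = 3.0, 1.5
xs = [(i + 0.5) / N for i in range(N)]
f = [x ** 0.5 for x in xs]
g = [1.0 + x for x in xs]

def lp_norm(h, r):
    return (sum(abs(v) ** r for v in h) / N) ** (1 / r)

lhs = sum(abs(a * b) for a, b in zip(f, g)) / N
print(lhs, "<=", lp_norm(f, p) * lp_norm(g, q))  # ~1.067 <= ~1.116
```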
Convergence and completeness in normed linear spaces
-
$\seq{f_n}$ in normed linear space
- said to converge to $f$, i.e., $\lim f_n =f$ or $f_n \to f$, if $$ (\forall \epsilon>0)(\exists N\in\naturals)(\forall n> N)(\|f_n-f\|<\epsilon) $$
- called Cauchy sequence if $$ (\forall \epsilon>0)(\exists N\in\naturals)(\forall n,m> N)(\|f_n-f_m\|<\epsilon) $$
- called summable if sequence of partial sums $\sum^n_{i=1} f_i$ converges
- called absolutely summable if $\sum^\infty_{i=1} \|f_i\|$ converges
- normed linear space called complete if every Cauchy sequence converges
- normed linear space is complete if and only if every absolutely summable series is summable
Banach space
- complete normed linear space called Banach space
- (Riesz-Fischer) $L^p$ spaces are complete, hence Banach spaces
- convergence in $L^p$ called convergence in mean of order $p$
- convergence in $L^\infty$ amounts to uniform convergence outside set of measure zero (nearly uniform convergence)
Approximation in $L^p$
- $\Delta=\seq{d_i}_{i=0}^n$ with $0=d_0<d_1<\cdots<d_n=1$ called subdivision of $[0,1]$ (with $\Delta_i = [d_{i-1},d_{i}]$)
- $\varphi_{f,\Delta}$ for $f\in L^p$ called step function if $$ \varphi_{f,\Delta}(x) = \frac{1}{d_i-d_{i-1}}\int_{d_{i-1}}^{d_i} f(t)dt \mbox{ for } x\in[d_{i-1},d_i) $$
-
for $f\in L^p$ ($1\leq p< \infty$) and every $\epsilon>0$, exist step function, $\varphi_{f,\Delta}$, and continuous function, $\psi$, such that
$$
\|\varphi_{f,\Delta}-f\|<\epsilon
\mbox{ and }
\|\psi-f\|<\epsilon
$$
- $L^p$ version of Littlewood's second principle (above)
- for $f\in L^p$, $\varphi_{f,\Delta}\to f$ as $\max \Delta_i\to0$, i.e., $$ (\forall \epsilon>0)(\exists \delta>0)(\max \Delta_i < \delta \Rightarrow \|\varphi_{f,\Delta}-f\|_p < \epsilon) $$
Bounded linear functionals on $L^p$
- $F:X\to\reals$ for normed linear space $X$ called linear functional if $$ (\forall f, g \in X, a,b \in\reals)(F(af+bg)=aF(f)+bF(g)) $$
- linear functional, $F$, said to be bounded if $$ (\exists M)(\forall f\in X)(|F(f)|\leq M\|f\|) $$
- smallest such constant called norm of $F$, i.e., $$ \|F\| = \sup_{f\in X, f\neq0} {|F(f)|}/{\|f\|} $$
Riesz representation theorem
- for every $g\in L^q$ ($1\leq p\leq \infty$), following defines a bounded linear functional in $L^p$ $$ F(f) = \int fg $$ where $\|F\|=\|g\|_q$
- Riesz representation theorem - for every bounded linear functional in $L^p$, $F$, ($1\leq p<\infty$), there exists $g\in L^q$ such that $$ F(f) = \int fg $$ where $\|F\|=\|g\|_q$
- in each case, $L^q$ is dual of $L^p$ (dual defined later)
Metric Spaces
Metric spaces
-
$\metrics{X}{\rho}$ with nonempty set, $X$, and metric $\rho: X\times X\to\preals$ called metric space
if for every $x,y,z \in X$
- $\rho(x,y)=0 \Leftrightarrow x=y$
- $\rho(x,y)=\rho(y,x)$
- $\rho(x,y) \leq \rho(x,z) + \rho(z,y)$ (triangle inequality)
-
examples of metric spaces
- $\metrics{\reals}{|\cdot|}$, $\metrics{\reals^n}{\|\cdot\|_p}$ with $1\leq p\leq \infty$
- for $x\in X$ and $r>0$, $S_{x,r} = \set{y}{\rho(y,x)<r}$ called ball
- for $E\subset X$, diameter of $E$ defined by $\sup\set{\rho(x,y)}{x,y \in E}$
- $\rho$ called pseudometric if requirement $\rho(x,y)=0 \Rightarrow x=y$ dropped
- $\rho$ called extended metric if $\rho: X\times X \to\preals\cup\{\infty\}$
Cartesian product
- for two metric spaces $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$, metric space $\metrics{X\times Y}{\tau}$ with $\tau:X\times Y\to\preals$ such that $$ \tau((x_1,y_1),(x_2,y_2)) = (\rho(x_1,x_2)^2 + \sigma(y_1,y_2)^2)^{1/2} $$ called Cartesian product metric space
-
$\tau$ satisfies all properties required by metric
- e.g., $\reals^{n} \times \reals^{m} = \reals^{n+m}$
Open sets - metric spaces
-
$O \subset X$ said to be open if
$$
(\forall x\in O)(\exists \delta>0)(\forall y\in X)(\rho(y,x)<\delta \Rightarrow y\in O)
$$
- $X$ and $\emptyset$ are open
- intersection of finite collection of open sets is open
- union of any collection of open sets is open
Closed sets - metric spaces
-
$x\in X$ called point of closure of $E\subset X$ if
$$
(\forall \epsilon>0)(\exists y\in E)(\rho(y,x) < \epsilon)
$$
- $\closure{E}$ denotes set of points of closure of $E$; called closure of $E$
- $E\subset \closure{E}$
-
$F \subset X$ said to be closed if
$$
F = \closure{F}
$$
- $X$ and $\emptyset$ are closed
- union of finite collection of closed sets is closed
- intersection of any collection of closed sets is closed
- complement of closed set is open
- complement of open set is closed
Dense sets and separability - metric spaces
- $D\subset X$ said to be dense if $$ \closure{D} = X $$
- $X$ is said to be separable if exists countable dense subset, i.e., $$ (\exists D\subset X)(D \mbox{ countable} \ \& \ \closure{D}=X) $$
- $X$ is separable if and only if exists countable collection of open sets $\seq{O_i}$ such that for all open $O\subset X$ $$ O = \bigcup_{O_i\subset O} O_i $$
Continuous functions - metric spaces
- $f:X\to Y$ for metric spaces $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$ called mapping or function from $X$ into $Y$
- $f$ said to be onto if $$ f(X)=Y $$
- $f$ said to be continuous at $x\in X$ if $$ (\forall \epsilon>0)(\exists \delta>0)(\forall y\in X)(\rho(y,x)<\delta \Rightarrow \sigma(f(y),f(x))<\epsilon) $$
- $f$ said to be continuous if $f$ is continuous at every $x\in X$
- $f$ is continuous if and only if for every open $O\subset Y$, $f^{-1}(O)$ is open
- if $f:X\to Y$ and $g:Y\to Z$ are continuous, $g\circ f:X\to Z$ is continuous
Homeomorphism
-
one-to-one mapping of $X$ onto $Y$ (or equivalently, one-to-one correspondence between $X$ and $Y$), $f$,
said to be homeomorphism if
- both $f$ and $f^{-1}$ are continuous
- $X$ and $Y$ said to be homeomorphic if exists homeomorphism
- topology is study of properties unaltered by homeomorphisms and such properties called topological
- one-to-one correspondece $X$ and $Y$ is homeomorphism if and only if it maps open sets in $X$ to open sets in $Y$ and vice versa
-
every property defined by means of open sets (or equivalently, closed sets)
and/or continuous functions
is topological
- e.g., if $f$ is continuous on $X$ and $h:X\to Y$ is homeomorphism, then $f\circ h^{-1}$ is continuous function on $Y$
Isometry
- homeomorphism preserving distance called isometry, i.e., $$ (\forall x,y \in X)(\sigma(h(x),h(y)) = \rho(x,y)) $$
- $X$ and $Y$ said to be isometric if exists isometry
- (from abstract point of view) two isometric spaces are exactly same; it's nothing but relabeling of points
-
two metrics, $\rho$ and $\sigma$ on $X$, said to be equivalent
if identity mapping of $\metrics{X}{\rho}$ onto $\metrics{X}{\sigma}$
is homeomorphism
- hence, two metrics are equivalent if and only if set in one metric is open whenever open in the other metric
Convergence - metric spaces
-
$\seq{x_n}$ defined for metric space, $X$
-
said to converge to $x$, i.e., $\lim x_n =x$ or $x_n \to x$, if
$$
(\forall \epsilon>0)(\exists N\in\naturals)(\forall n> N)(\rho(x_n,x)<\epsilon)
$$
- equivalently, every ball about $x$ contains all but finitely many points of $\seq{x_n}$
-
said to have cluster point, $x$, if
$$
(\forall \epsilon>0, N\in\naturals)(\exists n> N)(\rho(x_n,x)<\epsilon)
$$
- equivalently, every ball about $x$ contains $x_n$ for infinitely many indices $n$
-
limit of convergent sequence is cluster point
- converse not true
Completeness - metric spaces
- $\seq{x_n}$ of metric space, $X$, called Cauchy sequence if $$ (\forall \epsilon>0)(\exists N\in\naturals)(\forall n,m> N)(\rho(x_n,x_m)<\epsilon) $$
- every convergent sequence is Cauchy sequence
-
$X$ said to be complete if every Cauchy sequence converges
- e.g., $\metrics{\reals}{\rho}$ with $\rho(x,y)=|x-y|$
- for incomplete $\metrics{X}{\rho}$, exists complete $X^\ast$ where $X$ is isometrically embedded in $X^\ast$ as dense set
- if $X$ contained in complete $Y$, $X^\ast$ is isometric with $\closure{X}$ in $Y$
Uniform continuity - metric spaces
-
$f:X\to Y$ for metric spaces $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$
said to be uniformly continuous if
$$
(\forall \epsilon>0)(\exists \delta)(\forall x,y \in X)(\rho(x,y) < \delta \Rightarrow \sigma(f(x),f(y))<\epsilon)
$$
-
example of continuous, but not uniformly continuous function
- $h:[0,1)\to\preals$ with $h(x)=x/(1-x)$
- $h$ maps Cauchy sequence $\seq{1-1/n}_{n=1}^\infty$ in $[0,1)$ to $\seq{n-1}_{n=1}^\infty$ in $\preals$, which is not Cauchy sequence
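- a minimal numeric sketch of this example:

```python
# Minimal numeric sketch: h(x) = x/(1-x) on [0,1) maps the Cauchy sequence
# x_n = 1 - 1/n to h(x_n) = n - 1, which is unbounded, hence not Cauchy;
# continuity alone does not preserve Cauchy sequences.
def h(x):
    return x / (1.0 - x)

xs = [1 - 1 / n for n in range(1, 8)]
print([round(h(x), 6) for x in xs])  # 0, 1, 2, 3, 4, 5, 6
```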
-
uniform homeomorphism
- homeomorphism $f$ between $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$ with both $f$ and $f^{-1}$ uniformly continuous called uniform homeomorphism
Uniform homeomorphism
-
uniform homeomorphism $f$ between $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$
maps every Cauchy sequence $\seq{x_n}$ in $X$ to Cauchy sequence $\seq{f(x_n)}$ in $Y$
- being Cauchy sequence, hence being complete, preserved by uniform homeomorphism
- being uniformly continuous also preserved by uniform homeomorphism
- each of three properties (being Cauchy sequence, being complete, being uniformly continuous) called uniform property
-
uniform properties are not topological properties, e.g., $h(x)=x/(1-x)$ above
- is homeomorphism between incomplete space $[0,1)$ and complete space $\preals$
- maps Cauchy sequence $\seq{1-1/n}_{n=1}^\infty$ in $[0,1)$ to $\seq{n-1}_{n=1}^\infty$ in $\preals$, which is not Cauchy sequence
- composition with $h$ maps uniformly continuous function $\sin$ on $\preals$ to non-uniformly continuous function $\sin(x/(1-x))$ on $[0,1)$
Uniform equivalence
- two metrics, $\rho$ and $\sigma$ on $X$, said to be uniformly equivalent if identity mapping of $\metrics{X}{\rho}$ onto $\metrics{X}{\sigma}$ is uniform homeomorphism, i.e., $$ (\forall \epsilon>0) (\exists \delta>0) (\forall x,y \in X) (\rho(x,y)<\delta \Rightarrow \sigma(x,y)<\epsilon \ \&\ \sigma(x,y)<\delta \Rightarrow \rho(x,y)<\epsilon) $$
-
example of uniform equivalence on $X\times Y$
- any two of below metrics are uniformly equivalent on $X\times Y$ $$ \begin{eqnarray*} &&\tau((x_1,y_1),(x_2,y_2)) = (\rho(x_1,x_2)^2 + \sigma(y_1,y_2)^2)^{1/2} \\ &&\rho_1((x_1,y_1),(x_2,y_2)) = \rho(x_1,x_2) + \sigma(y_1,y_2) \\ &&\rho_\infty((x_1,y_1),(x_2,y_2)) = \max\{\rho(x_1,x_2), \sigma(y_1,y_2)\} \end{eqnarray*} $$
- for $\metrics{X}{\rho}$ and complete $\metrics{Y}{\sigma}$ and $f:X\to Y$ uniformly continuous on $E\subset X$ into $Y$, exists unique continuous extension $g$ of $f$ on $\closure{E}$, which is uniformly continuous
Subspaces
-
for metric space, $\metrics{X}{\rho}$,
metric space $\metrics{S}{\rho_S}$ with $S\subset X$ and $\rho_S$ being restriction of $\rho$ to $S$,
called subspace of $\metrics{X}{\rho}$
-
e.g. (with standard Euclidean distance)
- $\rationals$ is subspace of $\reals$
- $\bigsetl{(x,y)\in\reals^2}{y=0}$ is subspace of $\reals^2$, which is isometric to $\reals$
-
for metric space, $X$, and its subspace, $S$,
- for $E\subset S$, closure of $E$ relative to $S$ is $\closure{E}\cap S$
- $A\subset S$ is closed relative to $S$ if and only if $(\exists \mbox{ closed } F)(A = F\cap S)$
- $A\subset S$ is open relative to $S$ if and only if $(\exists \mbox{ open } O)(A = O\cap S)$
-
also
- every subspace of separable metric space is separable
- every complete subset of metric space is closed
- every closed subset of complete metric space is complete
Compact metric spaces
-
motivation - want metric spaces where
- conclusions of Heine-Borel theorem are valid
- many properties of $[0,1]$ are true, e.g., Bolzano-Weierstrass property
-
e.g.,
- bounded closed set in $\reals$ has property that every open covering contains finite subcovering
- metric space $X$ called compact metric space if every open covering of $X$, $\collk{U}$, contains finite open covering of $X$, i.e., $$ (\forall \mbox{ open covering of $X$}, \collk{U})(\exists \{O_1,\ldots,O_n\} \subset \collk{U}) (X = \cup O_i) $$
-
$A\subset X$ called compact if
compact as subspace of $X$
- i.e., every open covering of $A$ contains finite open covering of $A$
Compact metric spaces - alternative definition
- collection, $\collk{F}$, of sets in $X$ said to have finite intersection property if every finite subcollection of $\collk{F}$ has nonempty intersection
-
if rephrase definition of compact metric spaces in terms of closed instead of open
- $X$ is called compact metric space if every collection of closed sets with empty intersection contains finite subcollection with empty intersection
- thus, $X$ is compact if and only if every collection of closed sets with finite intersection property has nonempty intersection
Bolzano-Weierstrass property and sequential compactness
-
metric space said to
- have Bolzano-Weierstrass property if every sequence has cluster point
- $X$ said to be sequentially compact if every sequence has convergent subsequence
- $X$ has Bolzano-Weierstrass property if and only if sequentially compact
Compact metric spaces - properties
-
following three statements about metric space are equivalent
(not true for general topological spaces)
- being compact
- having Bolzano-Weierstrass property
- being sequentially compact
-
compact metric spaces have properties corresponding to some of those of complete metric spaces
(compare with statements above)
- every compact subset of metric space is closed and bounded
- every closed subset of compact metric space is compact
- (shown in what follows)
Necessary condition for compactness
- compact metric space is sequentially compact
- equivalently, compact metric space has Bolzano-Weierstrass property
Necessary conditions for sequentially compactness
- every continuous real-valued function on sequentially compact space is bounded and assumes its maximum and minimum
- sequentially compact space is totally bounded
- every open covering of sequentially compact space has Lebesgue number
Sufficient conditions for compactness
- metric space that is totally bounded and has Lebesgue number for every covering is compact
Borel-Lebesgue theorem
-
conditions above imply the following equivalent statements
- $X$ is compact
- $X$ has Bolzano-Weierstrass property
- $X$ is sequentially compact
- above called Borel-Lebesgue theorem
-
hence, can drop “sequentially'' in every statement above, i.e.,
- every continuous real-valued function on compact space is bounded and assumes its maximum and minimum
- compact space is totally bounded
- every open covering of compact space has Lebesgue number
Compact metric spaces - other facts
- closed subset of compact space is compact
-
compact subset of metric space is closed and bounded
- hence, Heine-Borel theorem implies
- subset of $\reals$ is compact if and only if closed and bounded
- metric space is compact if and only if it is complete and totally bounded
-
thus, compactness can be viewed as absolute type of closedness
- exactly same comment appears later for general topological spaces
- continuous image of compact set is compact
- continuous mapping of compact metric space into metric space is uniformly continuous
Diagrams for relations among metric spaces
- the figure shows relations among the metric-space properties stated above
Baire category
- dig (more) deeply into certain aspects of complete metric spaces, namely, Baire theory of category
-
subset $E$ in metric space where $\sim (\closure{E})$ is dense,
said to be nowhere dense
- equivalently, $\closure{E}$ contains no nonempty open set
- union of countable collection of nowhere dense sets, said to be of first category or meager
- set not of first category, said to be of second category or nonmeager
- complement of set of first category, called residual or co-meager
Baire category theorem
-
Baire theorem -
for complete metric space, $X$,
and countable collection of dense open subsets, $\seq{O_k}\subset X$,
the intersection of the collection
$$
\bigcap O_k
$$
is dense
- locally compact space version of Baire theorem appears later
- Baire category theorem - no nonempty open subset of complete metric space is of first category, i.e., union of countable collection of nowhere dense subsets
- Baire category theorem is unusual in that uniform property, i.e., completeness of metric space, implies purely topological conclusion
Second category everywhere
- metric or topological spaces with property that no nonempty open subset (of themselves) is of first category, said to be of second category everywhere (with respect to themselves)
- Baire category theorem says complete metric space is of second category everywhere
-
locally compact Hausdorff spaces are of second category everywhere, too
(locally compact Hausdorff spaces defined later)
- for these spaces, though, many of results of category theory follow directly from local compactness
Sets of first category
-
collection of sets with following properties, called a $\sigma$-ideal of sets
- countable union of sets in the collection is, again, in the collection
- subset of any in the collection is, again, in the collection
-
both of below collections are $\sigma$-ideal of sets
- sets of first category in topological space
- measure zero sets in complete measure space
-
sets of first category regarded as “small'' sets
- such sets in complete metric spaces have no interior points
- interestingly, set of first category in $[0,1]$ can have Lebesgue measure $1$, in which case its complement is residual set of measure zero
Some facts of category theory
- for open set, $O$, and closed set, $F$, $\closure{O}\sim O$ and $F\sim \interior{F}$ are nowhere dense
- closed set of first category in complete metric space is nowhere dense
- subset of complete metric space is residual if and only if contains dense $G_\delta$, hence subset of complete metric space is of first category if and only if contained in $F_\sigma$ whose complement is dense
- for countable collection of closed sets, $\seq{F_n}$, $O = \bigcup \interior{F_n}$ is open; if $\bigcup F_n$ is complete metric space, $O$ is dense
- some applications of category theory to analysis seem almost too good to be believed; here's one:
- uniform boundedness principle - for family, $\collF$, of real-valued continuous functions on complete metric space, $X$, with property that $(\forall x\in X)(\exists M_x\in\reals)(\forall f\in\collF)(|f(x)|\leq M_x)$ $$ (\exists \mbox{ open }O, M\in\reals)(\forall x\in O, f\in\collF)(|f(x)|\leq M) $$
Topological Spaces
Motivation for topological spaces
-
want to have something like
- notion of open set is fundamental
- other notions defined in terms of open sets
- more general than metric spaces
-
why not stick to metric spaces?
-
certain notions have natural meaning
not consistent with topological concepts
derived from metric spaces
- e.g., weak topologies in Banach spaces
Topological spaces
-
$\topos{X}{J}$ with nonempty set $X$ of points and family $\tJ$ of subsets,
which we call open, having the following properties
called
topological spaces
- $\emptyset, X\in\tJ$
- $O_1, O_2 \in\tJ \Rightarrow O_1 \cap O_2 \in\tJ$
- $O_\alpha \in \tJ \Rightarrow \cup_\alpha O_\alpha \in \tJ$
- family, $\tJ$, is called topology
-
for $X$, always exist two topologies defined on $X$
- trivial topology having only $\emptyset$ and $X$
- discrete topology for which every subset of $X$ is an open set
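- a minimal sketch checking these axioms for candidate topologies on finite $X$ (pairwise union suffices for closure under arbitrary unions when the family is finite):

```python
# Minimal sketch (assumption: X finite, a candidate topology T as a list of
# subsets); checks: contains empty set and X, closed under finite intersection,
# closed under union.
def is_topology(X, T):
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    if frozenset() not in T or X not in T:
        return False
    closed_cap = all(a & b in T for a in T for b in T)
    closed_cup = all(a | b in T for a in T for b in T)
    return closed_cap and closed_cup

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # True: nested open sets
print(is_topology(X, [set(), {1}, {2}, X]))     # False: {1} | {2} = {1, 2} missing
```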
Topological spaces associated with metric spaces
-
can associate topological space, $\topos{X}{J}$, to any metric space $\metrics{X}{\rho}$
where $\tJ$ is family of open sets in $\metrics{X}{\rho}$
- because properties in definition of topological space satisfied by open sets in metric space
-
$\topos{X}{J}$ associated with metric space, $\metrics{X}{\rho}$, said to be metrizable
- $\rho$ called metric for $\tXJ$
-
distinction between metric space and associated topological space is essential
- because different metric spaces can induce same topological space
- in this case, these metric spaces are equivalent
- formally, metric space and topological space are couples, $\metrics{X}{\rho}$ and $\tXJ$, respectively
Some definitions for topological spaces
- subset, $F\subset X$, whose complement, $\compl{F}$, is open called closed
-
intersection of all closed sets containing $E\subset X$ called closure of $E$ denoted by $\closure{E}$
- $\closure{E}$ is smallest closed set containing $E$
- $x\in X$ called point of closure of $E\subset X$ if every open set containing $x$ meets $E$, i.e., has nonempty intersection with $E$
- union of all open sets contained in $E\subset X$ is called interior of $E$ denoted by $\interior{E}$
- $x\in X$ called interior point of $E$ if exists open set, $O$, with $x\in O\subset E$
Some properties of topological spaces
- $\emptyset$, $X$ are closed
- union of finite collection of closed sets is closed
- intersection of any collection of closed sets is closed
- $E\subset \closure{E}$, $\closure{\closure{E}} = \closure{E}$, $\closure{A\cup B} = \closure{A} \cup \closure{B}$
- $F$ closed if and only if $\closure{F}=F$
- $\closure{E}$ is set of points of closure of $E$
- $\interior{E}\subset E$, $\interior{(\interior{E})} = \interior{E}$, $\interior{(A\cap B)} = \interior{A} \cap \interior{B}$
- $\interior{E}$ is set of interior points of $E$
- $\interior{(\compl{E})} = \compl{\closure{E}}$
Subspace and convergence of topological spaces
-
for subset of $\topos{X}{J}$, $A$,
define topology \tS\ for $A$
with $\tS = \set{A\cap O}{O \in \tJ}$
- $\tS$ called topology inherited from \tJ
- $\topos{A}{S}$ called subspace of $\topos{X}{J}$
-
$\seq{x_n}$ said to converge to $x\in X$ if
$$
(\forall O \in \tJ \mbox{ containing } x)(\exists N\in\naturals)(\forall n>N)(x_n \in O)
$$
- denoted by $$ \lim x_n = x $$
- $\seq{x_n}$ said to have $x\in X$ as cluster point if $$ (\forall O \in\tJ\mbox{ containing } x, N\in\naturals)(\exists n>N)(x_n \in O) $$
-
if $\seq{x_n}$ has subsequence converging to $x\in X$, then $x$ is cluster point of $\seq{x_n}$
- converse is not true for arbitrary topological space
Continuity in topological spaces
- mapping $f:X\to Y$ with $\topos{X}{J}$, $\topos{Y}{S}$ said to be continuous if $$ (\forall O\in \tS)(f^{-1}(O) \in \tJ) $$
- $f:X \to Y$ said to be continuous at $x\in X$ if $$ (\forall O\in\tS\mbox{ containing } f(x))(\exists U\in\tJ\mbox{ containing } x)(f(U)\subset O) $$
- $f$ is continuous if and only if $f$ is continuous at every $x\in X$
- for continuous $f$ on $\topos{X}{J}$, restriction, $\restrict{f}{A}$, to $A\subset X$ is continuous
- for $A$ with $A=A_1 \cup A_2$ where both $A_1$ and $A_2$ are either open or closed, $f:A\to Y$ with both restrictions, $\restrict{f}{A_1}$ and $\restrict{f}{A_2}$, continuous, is continuous
Homeomorphism for topological spaces
- one-to-one continuous function of $X$ onto $Y$, $f$, with continuous inverse function, $f^{-1}$, called homeomorphism between $\topos{X}{J}$ and $\topos{Y}{S}$
- $\topos{X}{J}$ and $\topos{Y}{S}$ said to be homeomorphic if exists homeomorphism between them
- homeomorphic spaces are indistinguishable, homeomorphism amounting to relabeling of points (from abstract point of view)
-
thus, below roles are same
- role that homeomorphism plays for topological spaces
- role that isometry plays for metric spaces
- role that isomorphism plays for algebraic systems
Stronger and weaker topologies
-
for two topologies, $\tJ$ and $\tS$ for same $X$ with $\tS\supset\tJ$
- $\tS$ said to be stronger or finer than $\tJ$
- $\tJ$ said to be weaker or coarser than $\tS$
- $\tS$ is stronger than $\tJ$ if and only if identity mapping of $\topos{X}{S}$ to $\topos{X}{J}$ is continuous
- for two topologies, $\tJ$ and $\tS$ for same $X$, $\tJ\cap\tS$ also topology
- for any collection of topologies, $\{\tJ_\alpha\}$ for same $X$, $\cap_\alpha \tJ_\alpha$ is topology
-
for nonempty set, $X$, and any collection of subsets of $X$, $\coll$
- exists weakest topology containing \coll, i.e., weakest topology where all subsets in $\coll$ are open
- it is intersection of all topologies containing $\coll$
Bases for topological spaces
- collection $\collB$ of open sets of $\tXJ$ called a base for topology, $\tJ$, of $X$ if $$ (\forall O\in \tJ, x\in O)(\exists B\in\collB)(x\in B\subset O) $$
-
collection $\collB_x$ of open sets of $\tXJ$ containing $x$ called a base at $x$
if
$$
(\forall O\in\tJ \mbox{ containing }x)(\exists B\in\collB_x)(x\in B\subset O)
$$
- elements of $\collB_x$ often called neighborhoods of $x$
- when no base given, neighborhood of $x$ is an open set containing $x$
- thus, collection, $\collB$, of open sets is a base if and only if it contains a base at every $x\in X$
-
for topological space that is also metric space
- all balls form a base
- balls centered at $x$ form a base at $x$
Characterization of topological spaces in terms of bases
- definition of open sets in terms of base - when $\collB$ is base of $\tXJ$ $$ (O\in\tJ) \Leftrightarrow (\forall x\in O)(\exists B\in\collB)(x\in B\subset O) $$
-
often, convenient to specify topology for $X$ by
- specifying a base of open sets, $\collB$, and
- using above criterion to define open sets
-
collection of subsets of $X$, $\collB$, is base for some topology if and only if
$$
\begin{eqnarray*}
&(\forall x\in X)(\exists B\in\collB)(x\in B)&
\\
&\mbox{and}&
\\
&(\forall x\in X, B_1, B_2 \in \collB \mbox{ with } x\in B_1\cap B_2)
(\exists B_3\in \collB)(x\in B_3 \subset B_1\cap B_2)&
\end{eqnarray*}
$$
- condition for collection to be base for some topology
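-
a minimal brute-force sketch of the criterion above (ad hoc names, finite $X$ only):
\begin{verbatim}
def is_base_for_some_topology(X, B):
    # the two conditions of the criterion, checked exhaustively
    B = set(map(frozenset, B))
    covers = all(any(x in b for b in B) for x in X)           # first condition
    refines = all(any(x in b3 and b3 <= b1 & b2 for b3 in B)
                  for b1 in B for b2 in B for x in b1 & b2)   # second condition
    return covers and refines

X = {1, 2, 3}
print(is_base_for_some_topology(X, [{1}, {2, 3}]))     # True
print(is_base_for_some_topology(X, [{1, 2}, {2, 3}]))  # False: fails at x = 2
\end{verbatim}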
Subbases for topological spaces
-
for $\tXJ$, collection of open sets, $\coll$, called a subbase for topology $\tJ$
if
$$
(\forall O\in \tJ, x\in O)(\exists \seq{C_i}_{i=1}^n\subset\coll)(x\in \cap C_i \subset O)
$$
- sometimes convenient to define topology in terms of subbase
- for subbase for $\tJ$, $\coll$, collection of finite intersections of sets from $\coll$ forms base for $\tJ$
- any collection of subsets of $X$ is subbase for weakest topology where sets of the collection are open
Axioms of countability
-
topological space said to satisfy first axiom of countability
if
exists countable base at every point
- every metric space satisfies first axiom of countability because for every $x\in X$, set of balls centered at $x$ with rational radii forms base at $x$
-
topological space said to satisfy second axiom of countability
if
exists countable base for the space
- metric space satisfies second axiom of countability if and only if separable (refer to page~ for definition of separability)
Topological spaces - facts
-
given base, $\collB$, for $\tXJ$
- $x \in \closure{E}$ if and only if $(\forall B\in\collB \mbox{ with } x\in B)(B\cap E \neq \emptyset)$
-
given base at $x$ for $\tXJ$, $\collB_x$, and base at $y$ for $\topos{Y}{S}$, $\topol{C}_y$
- $f:X\to Y$ continuous at $x$ if and only if $(\forall C\in\topol{C}_y)(\exists B\in\collB_x)(B\subset f^{-1}(C))$
-
if $\tXJ$ satisfies first axiom of countability
- $x \in \closure{E}$ if and only if $(\exists \seq{x_n} \mbox{ from } E)(\lim x_n = x)$
- $x$ cluster point of $\seq{x_n}$ if and only if exists its subsequence converging to $x$
- $\tXJ$ said to be Lindelöf space or have Lindelöf property if every open covering of $X$ has countable subcover
- second axiom of countability implies Lindelöf property
Separation axioms
-
why separation axioms
- properties of topological spaces are (in general) quite different from those of metric spaces
- often convenient assume additional conditions true in metric spaces
-
separation axioms
-
$T_1$ - Tychonoff spaces
- $(\forall x \neq y \in X)(\exists \mbox{ open }O\subset X)(y \in O, x \not\in O)$
-
$T_2$ - Hausdorff spaces
- $(\forall x \neq y \in X)(\exists \mbox{ open }O_1, O_2\subset X \mbox{ with } O_1\cap O_2=\emptyset)(x \in O_1, y \in O_2)$
-
$T_3$ - regular spaces
- $T_1$ & $(\forall \mbox{ closed } F \subset X, x \not\in F) (\exists \mbox{ open }O_1, O_2\subset X \mbox{ with } O_1\cap O_2=\emptyset) (x \in O_1, F \subset O_2)$
-
$T_4$ - normal spaces
- $T_1$ & $(\forall \mbox{ disjoint closed } F_1, F_2 \subset X) (\exists \mbox{ open }O_1, O_2\subset X \mbox{ with } O_1\cap O_2=\emptyset) (F_1 \subset O_1, F_2 \subset O_2)$
Separation axioms - facts
-
necessary and sufficient condition for $T_1$
- topological space satisfies $T_1$ if and only if every singleton, $\{x\}$, is closed
-
important consequences of normality, $T_4$
- Urysohn's lemma - for normal topological space, $X$ $$ (\forall \mbox{ disjoint closed } A, B \subset X) (\exists f\in C(X,[0,1])) (f(A) = \{0\}, f(B) = \{1\}) $$
- Tietze's extension theorem - for normal topological space, $X$ $$ (\forall \mbox{ closed } A \subset X, f\in C(A,\reals)) (\exists g \in C(X,\reals)) (\forall x \in A) (g(x) = f(x)) $$
- Urysohn metrization theorem - normal topological space satisfying second axiom of countability is metrizable
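- for metric space, Urysohn's lemma has explicit witness (standard formula, added here for illustration): for disjoint closed $A, B$, with $\rho(x,A)=\inf_{a\in A}\rho(x,a)$ $$ f(x) = \frac{\rho(x,A)}{\rho(x,A)+\rho(x,B)} \in C(X,[0,1]), \quad f(A)=\{0\},\ f(B)=\{1\} $$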
Weak topology generated by functions
-
given any set of points, $X$, and any collection of functions of $X$ into $\reals$, $\collk{F}$,
exists weakest topology on $X$ such that
all functions in $\collk{F}$ are continuous
- it is weakest topology containing - refer to page~ $$ \coll\ = \bigset{f^{-1}(O)}{f\in\collk{F},\ O\subset \reals \mbox{ open}} $$
- called weak topology generated by $\collk{F}$
Complete regularity
-
for $\tXJ$ and continuous function collection $\collk{F}$,
weak topology generated by $\collk{F}$ is weaker than $\tJ$
- however, if $$ (\forall \mbox{ closed } F\subset X, x \not\in F)(\exists f\in\collk{F})(f(F)=\{0\}, f(x)=1) $$ then, weak topology generated by $\collk{F}$ coincides with $\tJ$
- if condition satisfied by $\collk{F} = C(X,\reals)$, $X$ said to be completely regular provided $X$ satisfies $T_1$ (Tychonoff space)
- every normal topological ($T_4$) space is completely regular (Urysohn's lemma)
- every completely regular space is regular space ($T_3$)
- complete regularity sometimes called $T_{3\frac{1}{2}}$
Diagrams for separation axioms for topological spaces
- the figure shows $T_4 \Rightarrow T_{3\frac{1}{2}} \Rightarrow T_3 \Rightarrow T_2 \Rightarrow T_1$
- every metric space is normal space
Topological spaces of interest
-
very general topological spaces quite bizarre
- do not seem to be much needed in analysis
-
only topological spaces (Royden) found useful for analysis are
- metrizable topological spaces
- locally compact Hausdorff spaces
- topological vector spaces
- all above are completely regular
- algebraic geometry, however, uses Zariski topology on affine or projective space, topology giving us compact $T_1$ space which is not Hausdorff
Connectedness
-
topological space, $X$, said to be connected if not exist two nonempty disjoint open sets, $O_1$ and $O_2$,
such that $O_1\cup O_2 = X$
- such pair, $(O_1, O_2)$, if exist, called separation of $X$
- pair of disjoint nonempty closed sets, $(F_1,F_2)$, with $F_1\cup F_2=X$ is also separation of $X$ - because they are also open
- $X$ is connected if and only if only subsets that are both closed and open are $\emptyset$ and $X$
-
subset $E\subset X$ said to be connected
if connected in topology inherited from $\tXJ$
- thus, $E$ is connected if not exist two open sets, $O_1$ and $O_2$, such that $E\cap O_1\neq\emptyset$, $E\cap O_2\neq\emptyset$, $E\subset O_1\cup O_2$, and $E\cap O_1\cap O_2 = \emptyset$
Properties of connected space, component, and local connectedness
- continuous image of connected space is connected
- (generalized version of) intermediate value theorem - for $f:X\to\reals$ where $X$ is connected $$ (\forall x, y \in X, c\in \reals \mbox{ with } f(x) < c < f(y))(\exists z \in X)(f(z)=c) $$
- subset of $\reals$ is connected if and only if is either interval or singleton
-
for $x\in X$, union of all connected sets containing $x$ is called component of $x$
- component is connected and closed
- two components containing same point coincide
- thus, $X$ is disjoint union of components
-
$X$ said to be locally connected if exists base for $X$ consisting of connected sets
- components of locally connected space are open
- space can be connected, but not locally connected
Product topological spaces
- for $\tXJ$ and $\topos{Y}{S}$, topology on $X\times Y$ taking as a base the following $$ \set{O_1 \times O_2}{O_1 \in \tJ, O_2 \in \topol{S}} $$
-
called product topology for $X\times Y$
- for metric spaces, $X$ and $Y$, product topology is topology induced by product metric
- for indexed family with index set, $\collk{A}$, $\topos{X_\alpha}{\tJ_\alpha}$, product topology on $\bigtimes_{\alpha\in\collk{A}} X_{\alpha}$ defined by taking as a base the following $\bigsetl{\bigtimes O_\alpha}{O_\alpha\in \tJ_\alpha,\ O_\alpha = X_\alpha \mbox{ except for finitely many }\alpha}$
-
$\pi_\alpha: \bigtimes X_{\alpha} \to X_\alpha$ with $\pi_\alpha(x) = x_\alpha$,
i.e., $\alpha$-th coordinate, called projection
- every $\pi_\alpha$ continuous
- product topology is weakest topology on $\bigtimes X_\alpha$ making every $\pi_\alpha$ continuous
- if $(\forall \alpha\in\collk{A})(X_\alpha=X)$, $\bigtimes X_{\alpha}$ denoted by $X^\collk{A}$
Product topology with countable index set
-
for countable $\collk{A}$
-
$\bigtimes X_\alpha$ denoted by $X^\omega$ or $X^\naturals$
$\because$ only # elements of $\collk{A}$ important
- e.g., $\mbox{\bf 2}^\omega$ is Cantor set when denoting discrete topology with two elements by $\mbox{\bf 2}$
- if $X$ is metrizable, $X^\omega$ is metrizable
- $\naturals^\omega = \naturals^\naturals$ is topological space homeomorphic to $\reals\sim\rationals$ when denoting countable set with discrete topology also by $\naturals$
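- one standard metric behind metrizability of $X^\omega$ above (a sketch, assuming $\rho$ is metric for $X$): $$ \rho_\omega(x,y) = \sum_{n=1}^\infty 2^{-n} \frac{\rho(x_n,y_n)}{1+\rho(x_n,y_n)} $$ induces product topology on $X^\omega$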
Product topologies induced by set and continuous functions
- for $I=[0,1]$, $I^\collk{A}$ called cube
- $I^\omega$ is metrizable, and called Hilbert cube
-
for any set $X$ and any collection of $f:X\to[0,1]$, $\collk{F}$
with $(\forall x\neq y\in X)(\exists f\in\collk{F})(f(x)\neq f(y))$
-
can define one-to-one mapping of \collk{F}\ into $I^X$
with $f(x)$ as $x$-th coordinate of $f$
- $\pi_x: \collk{F} \to I$ (mapping of $\collk{F}$ into $I$) with $\pi_x(f) = f(x)$
- topology that $\collk{F}$ inherits as subspace of $I^X$ called topology of pointwise convergence (because $\pi_x$ is projection, hence continuous)
-
can define one-to-one mapping of $X$ into $I^\collk{F}$
with $f(x)$ as $f$-th coordinate of $x$
- topology of $X$ as subspace of $I^\collk{F}$ is weak topology generated by \collk{F}
-
if every $f\in\collk{F}$ is continuous,
- mapping of $X$ into $I^\collk{F}$ is continuous
- if for every closed $F\subset X$ and for each $x\not\in F$, exists $f\in\collk{F}$ such that $f(x)=1$ and $f(F)=\{0\}$, then $X$ is homeomorphic to its image in $I^\collk{F}$
Compact and Locally Compact Spaces
Compact spaces
-
compactness for metric spaces (page~)
can be generalized to topological spaces
- things are very much similar to those of metric spaces
- for subset $K\subset X$, collection of open sets, $\openconv$, whose union contains $K$, called open covering of $K$
- topological space, $X$, said to be compact if every open covering of $X$ contains finite subcovering
-
$K\subset X$ said to be compact if compact as subspace of $X$
- or equivalently, $K$ is compact if every covering of $K$ by open sets of $X$ has finite subcovering
- thus, Heine-Borel (page~) says every closed and bounded subset of $\reals$ is compact
- $\collk{F}\subset\powerset(X)$ such that every finite subcollection has nonempty intersection, said to have finite intersection property
- thus, topological space compact if and only if every collection of closed sets with finite intersection property has nonempty intersection
Compact spaces - facts
-
compactness can be viewed as absolute type of closedness because
- closed subset of compact space is compact
- compact subset of Hausdorff space is closed
- refer to page~ for exactly the same comments for metric spaces
- thus, every compact set of $\reals$ is closed and bounded
- continuous image of compact set is compact
- one-to-one continuous mapping of compact space onto Hausdorff space is homeomorphism
Refinement of open covering
- for open covering of $X$, $\openconv$, open covering of $X$ every element of which is subset of element of $\openconv$, called refinement of $\openconv$ or said to refine $\openconv$
- $X$ is compact if and only if every open covering has finite refinement
- any two open covers, $\openconv$ and $\collk{V}$, have common refinement, i.e., $$ \set{U\cap V}{U\in\openconv, V\in\collk{V}} $$
Countable compactness and Lindelöf
- topological space for which every open covering has countable subcovering said to be Lindelöf
- topological space for which every countable open covering has finite subcovering said to be countably compact space
- thus, topological space is compact if and only if both Lindelöf and countably compact
- every second countable space is Lindelöf
- thus, countable compactness coincides with compactness if second countable (i.e., satisfying second axiom of countability)
- continuous image of countably compact space is countably compact
Bolzano-Weierstrass property and sequential compactness
- topological space, $X$, said to have Bolzano-Weierstrass property if every sequence, $\seq{x_n}$, in $X$ has at least one cluster point, i.e., $$ (\forall \seq{x_n}) (\exists x\in X) (\forall \mbox{ open } O\ni x, N\in\naturals) (\exists n>N) (x_n \in O) $$
- topological space has Bolzano-Weierstrass property if and only if countably compact
- topological space said to be sequentially compact if every sequence has converging subsequence
- sequentially compact space is countably compact
- thus, Lindelöf coincides with compactness if sequentially compact
- countably compact and first countable (i.e., satisfying first axiom of countability) space is sequentially compact
Diagrams for relations among topological spaces
- the figure shows relations among topological spaces stated on preceding pages
Real-valued functions on topological spaces
- continuous real-valued function on countably compact space is bounded and assumes maximum and minimum
- $f:X\to\reals$ with topological space, $X$, called upper semicontinuous if $\set{x\in X}{f(x)<\alpha}$ is open for every $\alpha \in \reals$
- stronger statement - upper semicontinuous real-valued function on countably compact space is bounded (from above) and assumes maximum
- Dini - for sequence of upper semicontinuous real-valued functions on countably compact space, $\seq{f_n}$, with property that $\seq{f_n(x)}$ decreases monotonically to zero for every $x\in X$, $\seq{f_n}$ converges to zero uniformly
Products of compact spaces
- Tychonoff theorem - (probably) most important theorem in general topology
- most applications in analysis need only special case of product of (closed) intervals, but this special case does not seem to be easier to prove than general case, i.e., Tychonoff theorem
-
lemmas needed to prove Tychonoff theorem
- for collection of subsets of $X$ with finite intersection property, $\collk{A}$, exists collection $\collk{B}\supset\collk{A}$ with finite intersection property that is maximal with respect to this property, i.e., no collection with finite intersection property properly contains $\collk{B}$
- for collection, $\collk{B}$, of subsets of $X$ that is maximal with respect to finite intersection property, each intersection of finite number of sets in $\collk{B}$ is again in $\collk{B}$ and each set that meets each set in $\collk{B}$ is itself in $\collk{B}$
- Tychonoff theorem - product space $\bigtimes X_\alpha$ is compact for indexed family of compact topological spaces, $\seq{X_\alpha}$
Locally compact spaces
- topological space, $X$, with $$ (\forall x\in X)(\exists \mbox{ open }O\subset X)(x\in O, \closure{O} \mbox{ is compact}) $$ called locally compact
- topological space is locally compact if and only if set of all open sets with compact closures forms base for the topological space
-
every compact space is locally compact
-
but converse is not true
- e.g., Euclidean spaces $\reals^n$ are locally compact, but not compact
Locally compact Hausdorff spaces
- locally compact Hausdorff spaces constitute one of most important classes of topological spaces
- so useful is combination of Hausdorff separation axiom in connection with compactness that French usage (following Bourbaki) reserves term ‘compact space' for those compact and Hausdorff, using term ‘quasi-compact' for those not necessarily Hausdorff!
- following slides are devoted to establishing some of their basic properties
Support and subordinateness
- for function, $f$, on topological space, closure of $\set{x}{f(x)\neq0}$, called support of $f$, i.e., $$ \support f = \closure{\set{x}{f(x)\neq0}} $$
- given covering $\indexedcol{O_\lambda}$ of $X$, collection $\indexedcol{\varphi_\alpha}$ with $\varphi_\alpha:X\to\reals$ satisfying $$ (\forall \varphi_\alpha)(\exists O_\lambda)(\support \varphi_\alpha \subset O_\lambda) $$ said to be subordinate to $\indexedcol{O_\lambda}$
Some properties of locally compact Hausdorff spaces
-
for compact subset, $K$, of locally compact Hausdorff space, $X$
- exists open subset with compact closure, $O\subset X$, containing $K$
- exists continuous nonnegative function, $f$, on $X$, with $$ (\forall x\in K)(f(x)=1) \mbox{ and } (\forall x\not\in O)(f(x)=0) $$ if $K$ is $G_\delta$, may take $f<1$ in $\compl{K}$
- for open covering, $\indexedcol{O_\lambda}$, for compact subset, $K$, of locally compact Hausdorff space, exists $\seq{\varphi_i}_{i=1}^n \subset C(X,\preals)$ subordinate to $\indexedcol{O_\lambda}$ such that $$ (\forall x \in K)(\varphi_1(x)+\cdots+\varphi_n(x) =1) $$
Local compactness and second Baire category
-
for locally compact Hausdorff space, $X$,
and countable collection, $\seq{O_k}$, of dense open subsets of $X$,
the intersection of the collection
$$
\bigcap O_k
$$
is dense
- analogue of Baire theorem for complete metric spaces (refer to page~ for Baire theorem)
- thus, every locally compact Hausdorff space is of second category with respect to itself
Local compactness, Hausdorffness, and denseness
- for countable union, $\bigcup F_n$, of closed sets containing open subset, $O$, in locally compact space, union of interiors, $\bigcup \interior{F_n}$, is open set dense in $O$
- dense subset of Hausdorff space, $X$, which is locally compact in its subspace topology, is open subset of $X$
- subset, $Y$, of locally compact Hausdorff space is locally compact in its subspace topology if and only if $Y$ is relatively open subset of $\closure{Y}$
Alexandroff one-point compactification
-
for locally compact Hausdorff space, $X$,
can form $X^\ast$ by adding single point $\omega\not\in X$ to $X$
and take set in $X^\ast$ to be open
if it is either open in $X$ or complement of compact subset in $X$,
then
- $X^\ast$ is compact Hausdorff space
- identity mapping of $X$ into $X^\ast$ is homeomorphism between $X$ and $X^\ast\sim\{\omega\}$
- $X^\ast$ called Alexandroff one-point compactification of $X$
- $\omega$ often referred to as infinity in $X^\ast$
- continuous mapping, $f$, from topological space to topological space, for which inverse image of every compact set is compact, said to be proper
- proper maps from locally compact Hausdorff space, $X$, into locally compact Hausdorff space, $Y$, are precisely those continuous maps of $X$ into $Y$ that can be extended to continuous maps, $f^\ast$, of $X^\ast$ into $Y^\ast$ by taking point at infinity in $X^\ast$ to point at infinity in $Y^\ast$
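- standard example (not in the slide, for illustration): for $X=\reals^n$, $X^\ast$ is homeomorphic to sphere $S^n$, e.g., via $$ x \mapsto \left(\frac{2x}{1+\|x\|^2},\ \frac{\|x\|^2-1}{\|x\|^2+1}\right), \qquad \omega \mapsto (0,\ldots,0,1) $$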
Manifolds
- connected Hausdorff space with each point having neighborhood homeomorphic to ball in $\reals^n$ called $n$-dimensional manifold
- sometimes say manifold is connected Hausdorff space that is locally Euclidean
- thus, manifold has all local properties of Euclidean space; particularly locally compact and locally connected
- neighborhood homeomorphic to ball called coordinate neighborhood or coordinate ball
- pair $\pair{U}{\varphi}$ with coordinate ball, $U$, and homeomorphism, $\varphi$, from $U$ onto ball in $\reals^n$, called coordinate chart; $\varphi$ called coordinate map
- coordinate (in $\reals^n$) of point, $x\in U$, under $\varphi$ said to be coordinate of $x$ in the chart
Equivalent properties for manifolds
-
for manifold, $M$, the following are equivalent
- $M$ is paracompact
- $M$ is $\sigma$-compact
- $M$ is Lindelöf
- every open cover of $M$ has star-finite open refinement
- exists sequence of open subsets of $M$, $\seq{O_n}$, with $\closure{O_n}$ compact, $\closure{O_n}\subset O_{n+1}$, and $M=\bigcup O_n$
- exists proper continuous map, $\varphi:M\to [0,\infty)$
- $M$ is second countable
Banach Spaces
Vector spaces
- set $X$ with $+:X\times X\to X$, $\cdot: \reals \times X\to X$ satisfying the following properties called vector space or linear space or linear vector space over $\reals$ $$ \begin{eqnarray*} \mbox{for all } x,y,z\in X \mbox{ and } \lambda, \mu \in \reals && \\ x+y= y+x && \mbox{- additive commutativity} \\ (x+y)+z= x+(y+z) && \mbox{- additive associativity} \\ (\exists 0\in X)\ x+0=x && \mbox{- additive identity} \\ \lambda(x+y) = \lambda x + \lambda y && \mbox{- distributivity over vector addition} \\ (\lambda+\mu)x = \lambda x + \mu x && \mbox{- distributivity over scalar addition} \\ \lambda(\mu x)= (\lambda \mu)x && \mbox{- associativity of scalar multiplication} \\ 0\cdot x = 0\in X&& \\ 1\cdot x = x&& \end{eqnarray*} $$
Norm and Banach spaces
- $\|\cdot\|:X\to\preals$ with vector space, $X$, called norm if $$ \begin{eqnarray*} \mbox{for all } x,y\in X \mbox{ and } \alpha \in \reals& \\ \|x\| = 0 \Leftrightarrow x=0 && \mbox{- positive definiteness / positiveness / point-separating} \\ \|x+y\|\leq \|x\| + \|y\| && \mbox{- triangle inequality / subadditivity} \\ \|\alpha x\| = |\alpha| \|x\| && \mbox{- absolute homogeneity} \end{eqnarray*} $$
-
normed vector space that is complete metric space with metric induced by norm,
i.e., $\rho:X\times X \to \preals$ with $\rho(x,y)=\|x-y\|$,
called Banach space
- can be said to be class of spaces endowed with both topological and algebraic structure
-
examples include
- $L^p$ with $1\leq p\leq \infty$ (page~),
- $C(X)=C(X,\reals)$, i.e., space of all continuous real-valued functions on compact space, $X$
Properties of vector spaces
- normed vector space is complete if and only if every absolutely summable sequence is summable
Subspaces of vector spaces
- nonempty subset, $S$, of vector space, $X$, with $(\forall x,y\in S,\ \lambda,\mu\in\reals)(\lambda x + \mu y\in S)$, called subspace or linear manifold
- intersection of any family of linear manifolds is linear manifold
- hence, for $A\subset X$, exists smallest linear manifold containing $A$, often denoted by $\{A\}$
- if $S$ is closed as subset of $X$, called closed linear manifold
-
some definitions
- $A+x$ defined by $\set{y+x}{y\in A}$, called translate of $A$ by $x$
- $\lambda A$ defined by $\set{\lambda x}{x \in A}$
- $A+B$ defined by $\set{x+y}{x \in A, y\in B}$
Linear operators on vector spaces
- mapping of vector space, $X$, to another (possibly same) vector space called linear mapping, or linear operator, or linear transformation if $$ (\forall x,y \in X, \alpha, \beta \in \reals) (A(\alpha x + \beta y) = \alpha (Ax) + \beta (Ay)) $$
- linear operator called bounded if $$ (\exists M) (\forall x \in X) (\|Ax\|\leq M \|x\|) $$
-
least such bound called norm of linear operator, i.e.,
$$
M
= \sup_{x\in X, x\neq 0} \|Ax\|/\|x\|
$$
- linearity implies $$ M = \sup_{x\in X, \|x\|= 1} \|Ax\| = \sup_{x\in X, \|x\|\leq 1} \|Ax\| $$
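-
numerical sketch of operator norm for $X=\reals^3$, $Y=\reals^4$ with Euclidean norms, where bounded linear operators are matrices and norm equals largest singular value (random data ad hoc):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))          # linear operator R^3 -> R^4

exact = np.linalg.norm(A, 2)             # largest singular value = ||A||
xs = rng.standard_normal((3, 100_000))
xs /= np.linalg.norm(xs, axis=0)         # random unit vectors in R^3
sampled = np.linalg.norm(A @ xs, axis=0).max()   # sup ||Ax|| over samples

print(sampled, "<=", exact)              # sampled sup approaches ||A||
\end{verbatim}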
Isomorphism and isometrical isomorphism
- bounded linear operator from $X$ to $Y$ called isomorphism if exists bounded inverse linear operator, i.e., $$ (\exists A:X\to Y, B:Y\to X)(AB \mbox{ and } BA \mbox{ are identity}) $$
- isomorphism between two normed vector spaces that preserves norms called isometrical isomorphism
- from abstract point of view, isometrically isomorphic spaces are identical, i.e., isometrical isomorphism merely amounts to element renaming
Properties of linear operators on vector spaces
-
for linear operators, point continuity $\Rightarrow$ boundedness $\Rightarrow$ uniform continuity,
i.e.,
- bounded linear operator is uniformly continuous
- linear operator continuous at one point is bounded
- space of all bounded linear operators from {normed vector space} to {Banach space} is {Banach space}
Linear functionals on vector spaces
- linear operator from vector space, $X$, to $\reals$ called linear functional, i.e., $f:X\to\reals$ such that for all $x,y\in X$ and $\alpha, \beta \in \reals$ $$ f(\alpha x + \beta y) = \alpha f(x) + \beta f(y) $$
- want to extend linear functional from subspace to whole vector space while preserving properties of functional
Hahn-Banach theorem
- Hahn-Banach theorem - for vector space, $X$, and function, $p:X \to \reals$, with $$ (\forall x,y\in X, \alpha \geq0) (p(x+y)\leq p(x) + p(y) \mbox{ and } p(\alpha x) = \alpha p(x)) $$ and for subspace of $X$, $S$, and linear functional, $f:S\to\reals$, with $$ (\forall s \in S) (f(s) \leq p(s)) $$ exists linear functional, $F:X\to\reals$, such that $$ (\forall s \in S) ( F(s) = f(s)) \mbox{ and } (\forall x \in X) (F(x) \leq p(x)) $$
- corollary - for normed vector space, $X$, and $x\in X$, exists bounded linear functional, $f:X\to\reals$, such that $$ f(x) = \|f\|\|x\| $$
Dual spaces of normed spaces
- space of bounded linear functionals on normed space, $X$, called dual or conjugate of $X$, denoted by $X^\ast$
- every dual is Banach space (refer to page~)
-
dual of $L^p$ is (isometrically isomorphic to) $L^q$, $1/p+1/q=1$, for $1\leq p<\infty$
- exists natural representation of bounded linear functional on $L^p$ by $L^q$ (by Riesz representation theorem on page~)
- not every bounded linear functional on $L^\infty$ has natural representation
Natural isomorphism
-
define linear mapping of normed space, $X$, to $X^{\ast\ast}$ (i.e., dual of dual of $X$),
$\varphi:X\to X^{\ast\ast}$ such that for $x\in X$,
$(\forall f\in X^{\ast})((\varphi (x))(f) = f(x))$
- then, $\|\varphi(x)\| = \sup_{\|g\|=1, g\in X^\ast} g(x) \leq \sup_{\|g\|=1, g\in X^\ast} \|g\|\|x\| = \|x\|$
- by corollary on page~, exists $f\in X^\ast$ with $\|f\|=1$ and $f(x)=\|x\|$, thus $\|\varphi(x)\| = \sup_{\|g\|=1, g\in X^\ast} g(x) \geq f(x) = \|x\|$
- thus, $\|\varphi(x)\| = \|x\|$, hence $\varphi$ is isometrical isomorphism of $X$ onto subspace $\varphi(X)$ of $X^{\ast\ast}$
- $\varphi$ called natural isomorphism of $X$ into $X^{\ast\ast}$
- $X$ said to be reflexive if $\varphi(X)=X^{\ast\ast}$
- thus, $L^p$ with $1< p<\infty$ is reflexive, but $L^1$ and $L^\infty$ are not
- note $X$ may be isometric with $X^{\ast\ast}$ without being reflexive
Completeness of natural isomorphism
- for natural isomorphism, $\varphi$
-
$X^{\ast\ast}$ is complete, hence Banach space
- because space of bounded linear operators from normed space into Banach space, $\reals$, is Banach space (refer to page~)
- thus, closure of $\varphi(X)$ in $X^{\ast\ast}$, $\closure{\varphi(X)}$, complete (refer to page~)
- therefore, every normed vector space ($X$) is isometrically isomorphic to dense subspace ($\varphi(X)$) of Banach space ($\closure{\varphi(X)}$)
Hahn-Banach theorem - complex version
- Bohnenblust and Sobczyk - for complex vector space, $X$, and function, $p:X \to \reals$, with $$ ( \forall x,y\in X, \alpha \in\complexes ) ( p(x+y)\leq p(x) + p(y) \mbox{ and } p(\alpha x) = |\alpha| p(x) ) $$ and for subspace of $X$, $S$, and (complex) linear functional, $f:S\to\complexes$, with $$ ( \forall s \in S ) ( |f(s)| \leq p(s) ) $$ exists linear functional, $F:X\to\complexes$, such that $$ ( \forall s \in S ) ( F(s) = f(s) ) $$ and $$ ( \forall x \in X ) ( |F(x)| \leq p(x) ) $$
Open mapping on topological spaces
- mapping from topological space to topological space under which image of every open set is open, called open mapping
- hence, one-to-one continuous open mapping of one space onto another is homeomorphism
- (will show) continuous linear transformation of Banach space onto another Banach space is always open mapping
- (will) use above to provide criteria for continuity of linear transformation
Closed graph theorem (on Banach spaces)
-
every continuous linear transformation of Banach space onto Banach space is open mapping
- in particular, if the mapping is one-to-one, it is isomorphism
- for linear vector space, $X$, complete in two norms, $\|\cdot\|_A$ and $\|\cdot\|_B$, with $C\in\reals$ such that $(\forall x\in X)(\|x\|_A \leq C \|x\|_B)$, two norms are equivalent, i.e., $(\exists C'\in\reals)(\forall x\in X)(\|x\|_B \leq C' \|x\|_A)$
-
closed graph theorem - linear transformation, $A$, from Banach space, $X$, to Banach space, $Y$,
with property that
“if $\seq{x_n}$ converges in $X$ to $x\in X$ and $\seq{Ax_n}$ converges in $Y$ to $y\in Y$,
then $y=Ax$''
is continuous
- equivalent to say, if graph $\set{(x,Ax)}{x\in X}\subset X\times Y$ is closed, $A$ is continuous
Principle of uniform boundedness (on Banach spaces)
- principle of uniform boundedness - for family of bounded linear operators, $\collk{F}$, from Banach space, $X$, to normed space, $Y$, with $$ ( \forall x \in X ) ( \exists M_x ) ( \forall T \in \collk{F} ) ( \|Tx\| \leq M_x ) $$ operators in $\collk{F}$ are uniformly bounded, i.e., $$ ( \exists M ) ( \forall T \in \collk{F} ) ( \|T\| \leq M ) $$
Topological vector spaces
- just as notion of metric spaces generalized to notion of topological spaces
- notion of normed linear space generalized to notion of topological vector spaces
- linear vector space, $X$, with topology, $\tJ$, equipped with continuous addition, $+:X\times X\to X$, and continuous multiplication by scalars, $\cdot:\reals\times X\to X$, called topological vector space
Translation invariance of topological vector spaces
-
for topological vector space,
translation by $x\in X$ is homeomorphism (due to continuity of addition)
- hence, $x+O$ of open set $O$ is open
- every topology with this property said to be translation invariant
- for translation invariant topology, $\tJ$, on $X$, and base, $\collB$, for $\tJ$ at $0$, set $$ \set{x+U}{U\in \collB} $$ forms a base for $\tJ$ at $x$
- hence, sufficient to give a base at $0$ to determine translation invariant topology
- base at $0$ often called local base
Sufficient and necessary conditions for topological vector spaces
- for topological vector space, $X$, can find base, $\collB$, satisfying following properties $$ \begin{eqnarray*} && (\forall U, V \in \collB)(\exists W\in \collB)(W\subset U\cap V) \\ && (\forall U \in \collB, x\in U)(\exists V\in \collB)(x+V\subset U) \\ && (\forall U \in \collB)(\exists V\in \collB)(V + V \subset U) \\ && (\forall U \in \collB, x\in X)(\exists \alpha\in \reals)(x\in \alpha U) \\ && (\forall U \in \collB, \alpha\in\reals \mbox{ with } 0<|\alpha|\leq 1)(\alpha U\subset U,\ \alpha U\in \collB) \end{eqnarray*} $$
-
conversely, for collection, $\collB$, of subsets containing $0$
satisfying above properties,
exists topology for $X$ making $X$ topological vector space
with $\collB$ as base at $0$
- this topology is Hausdorff if and only if $$ \bigcap_{U\in \collB} U = \{0\} $$
- for normed linear space, can take $\collB$ to be set of balls centered at $0$; then $\collB$ satisfies above properties, hence can form topological vector space
Topological isomorphism
- in topological vector space, can compare neighborhoods at one point with neighborhoods of another point by translation
- for mapping, $f$, from topological vector space, $X$, to topological vector space, $Y$, such that $$ \begin{eqnarray*} && (\forall \mbox{ open } O\subset Y \mbox{ with }0\in O) (\exists \mbox{ open } U\subset X \mbox{ with }0\in U) \\ && (\forall x\in X) (f(x+U) \subset f(x) + O) \end{eqnarray*} $$ said to be uniformly continuous
- linear transformation, $f$, is uniformly continuous if continuous at one point
-
continuous one-to-one mapping, $\varphi$, from $X$ onto $Y$ with continuous $\varphi^{-1}$
called (topological) isomorphism
- from abstract point of view, isomorphic spaces are same
- Tychonoff - finite-dimensional Hausdorff topological vector space is topologically isomorphic to $\reals^n$ for some $n$
Weak topologies
-
for vector space, $X$, and collection of linear functionals, $\collF$,
weakest topology on $X$
in which each functional in $\collF$ is continuous,
called weak topology generated by $\collF$
- translation invariant
- base at $0$ given by sets $$ \set{x\in X}{\forall f \in\collk{G}, |f(x)|<\epsilon} $$ for all finite $\collk{G}\subset\collF$ and $\epsilon>0$
- base satisfies properties on page~, hence (above) weak topology makes $X$ topological vector space
- for normed vector space, $X$, and collection of continuous functionals, $\collF$, i.e., $\collF\subset X^\ast$, weak topology generated by $\collF$ weaker than (fewer open sets) norm topology of $X$
- metric topology generated by norm called strong topology of $X$
- weak topology generated by $X^\ast$ called weak topology of $X$
Strongly and weakly open and closed sets
- open and closed sets of strong topology called strongly open and strongly closed
- open and closed sets of weak topology called weakly open and weakly closed
- weakly closed set is strongly closed, but converse not true
- however, these coincide for linear manifolds, i.e., linear manifold is weakly closed if and only if strongly closed
- every strongly convergent sequence (or net) is weakly convergent
Weak$^\ast$ topologies
- for normed space, weak topology of $X^\ast$ is weakest topology for which all functionals in $X^{\ast\ast}$ are continuous
- turns out that weak topology of $X^\ast$ is less useful than weak topology generated by $X$, i.e., that generated by $\varphi(X)$ where $\varphi$ is the natural embedding of $X$ into $X^{\ast\ast}$ (refer to page~)
-
(above) weak topology generated by $\varphi(X)$
called weak$^\ast$ topology for $X^\ast$
- even weaker than weak topology of $X^\ast$
- thus, weak$^\ast$ closed subset of $X^\ast$ is weakly closed, and weak convergence implies weak$^\ast$ convergence
- base at $0$ for weak$^\ast$ topology given by sets $$ \set{f}{\forall x\in A, |f(x)|<\epsilon} $$ for all finite $A\subset X$ and $\epsilon>0$
- when $X$ is reflexive, weak and weak$^\ast$ topologies coincide
- Alaoglu - unit ball $S^\ast = \set{f\in X^\ast}{\|f\|\leq1}$ is compact in weak$^\ast$ topology
Convex sets
- for vector space, $X$ and $x,y\in X$ $$ \set{\lambda x + (1-\lambda)y}{\lambda \in [0,1]} \subset X $$ called segment joining $x$ and $y$
- set $K\subset X$ said to be convex or convex set if every segment joining any two points in $K$ is in $K$, i.e., $(\forall x,y\in K)(\mbox{segment joining }x,y\subset K)$
- every $\lambda x + (1-\lambda)y$ for $0<\lambda<1$ called interior point of segment
- point, $x$, in $K\subset X$ such that intersection with $K$ of every line going through $x$ contains open interval about $x$, said to be internal point, i.e., $$ (\forall y\in X)(\exists \epsilon>0)(\forall |\lambda|<\epsilon)(x+\lambda y\in K) $$
- convex set examples - linear manifold & ball, ellipsoid in normed space
Properties of convex sets
- for convex sets, $K_1$ and $K_2$, following are also convex sets $$ K_1 \cap K_2,\ \lambda K_1,\ K_1 + K_2 $$
-
for linear operators from vector space, $X$, to vector space, $Y$
- image of convex set (or linear manifold) in $X$ is convex set (or linear manifold) in $Y$,
- inverse image of convex set (or linear manifold) in $Y$ is convex set (or linear manifold) in $X$
- closure of convex set in topological vector space is convex set
Support functions and separated convex sets
- for subset, $K$, of vector space, $X$, $p:X\to \preals$ with $p(x) = \inf\set{\lambda>0}{\lambda^{-1}x \in K}$ called support function of $K$
-
for convex set $K\subset X$ containing $0$ as internal point
- $(\forall x\in X,\lambda\geq0)(p(\lambda x) = \lambda p(x))$
- $(\forall x,y\in X)(p(x+y)\leq p(x)+p(y))$
- $\set{x\in X}{p(x) < 1} \subset K \subset \set{x\in X}{p(x)\leq 1}$
- two convex sets, $K_1$ and $K_2$ such that exists linear functional, $f$, and $\alpha\in\reals$ with $(\forall x\in K_1)(f(x) \leq \alpha)$ and $(\forall x\in K_2)(f(x) \geq \alpha)$, said to be separated
- for two disjoint convex sets in vector space with at least one of them having internal point, exists nonzero linear functional that separates two sets
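-
numerical sketch of support function $p$ above (bisection on $\lambda$; assumes $K$ convex with $0$ internal, so membership along ray is monotone; names ad hoc):
\begin{verbatim}
import numpy as np

def gauge(x, in_K, hi=1e6, iters=100):
    # p(x) = inf{ lam > 0 : x / lam in K }, found by bisection
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if in_K(x / mid):
            hi = mid                     # x / mid in K: shrink from above
        else:
            lo = mid
    return hi

ball2 = lambda y: np.linalg.norm(y) <= 2.0   # K = closed ball of radius 2
print(gauge(np.array([3.0, 4.0]), ball2))    # ~2.5 = ||(3, 4)|| / 2
\end{verbatim}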
Local convexity
- topological vector space with base for topology consisting of convex sets, said to be locally convex
- for family of convex sets, $\collk{N}$, in vector space, following conditions are sufficient for translates of sets in $\collk{N}$ to form base for topology making the space locally convex topological vector space $$ \begin{eqnarray*} & (\forall N\in\collk{N})(x\in N \Rightarrow x \mbox{ is internal}) & \\ & (\forall N_1, N_2\in\collk{N})(\exists N_3\in\collk{N})(N_3 \subset N_1 \cap N_2) & \\ & (\forall N \in\collk{N}, \alpha\in\reals \mbox{ with } 0<|\alpha|<1)(\alpha N \in \collk{N}) & \end{eqnarray*} $$
- conversely, for every locally convex topological vector space, exists base at $0$ satisfying above conditions
-
follows that
- weak topology on vector space generated by linear functionals is locally convex
- normed vector space is locally convex topological vector space
Facts regarding local convexity
- for closed convex subset, $F$, of locally convex topological vector space and point, $x$, not in $F$, exists continuous linear functional, $f$, such that $$ f(x) < \inf_{y\in F} f(y) $$
-
corollaries
- convex set in locally convex topological vector space is strongly closed if and only if weakly closed
- for distinct points, $x$ and $y$, in locally convex Hausdorff vector space, exists continuous linear functional, $f$, such that $f(x)\neq f(y)$
Extreme points and supporting sets of convex sets
- point in convex set in vector space that is not interior point of any line segment lying in the set, called extreme point
- thus, $x$ is extreme point of convex set, $K$, if and only if $x=\lambda y + (1-\lambda) z$ with $0<\lambda<1$ and $y,z\in K$ implies $y=z=x$
- closed and convex subset, $S$, of convex set, $K$, with property that for every interior point of line segment in $K$ belonging to $S$, entire line segment belongs to $S$, called supporting set of $K$
- for closed and convex set, $K$, set of points at which a continuous linear functional assumes its maximum on $K$, is supporting set of $K$
Convex hull and closed convex hull
- for set $E$ in vector space, intersection of all convex sets containing set, $E$, called convex hull of $E$, which is convex set
- for set $E$ in vector space, intersection of all closed convex sets containing set, $E$, called closed convex hull of $E$, which is closed convex set
- Krein-Milman theorem - compact convex set in locally convex topological vector space is closed convex hull of its extreme points
Hilbert spaces
- Banach space, $H$, with function $\innerp{\cdot}{\cdot}:H\times H\to\reals$ satisfying following properties, called Hilbert space $$ \begin{eqnarray*} &&(\forall x,y,z\in H, \alpha, \beta \in \reals)(\innerp{\alpha x + \beta y}{z}=\alpha\innerp{x}{z} + \beta\innerp{y}{z}) \\ &&(\forall x,y\in H)(\innerp{x}{y} = \innerp{y}{x}) \\ &&(\forall x\in H)(\innerp{x}{x} = \|x\|^2) \end{eqnarray*} $$
-
$\innerp{x}{y}$ called inner product
for $x,y\in H$
- examples - $\innerp{x}{y} = x^T y = \sum x_i y_i$ for $\reals^n$, $\innerp{x}{y} = \int x(t)y(t) dt$ for $L^2$
-
Schwarz or Cauchy-Schwarz or Cauchy-Buniakowsky-Schwarz inequality -
$$
\|x\|\|y\| \geq \left|\innerp{x}{y}\right|
$$
-
hence,
- linear functional defined by $f(x)=\innerp{x}{y}$ bounded by $\|y\|$
- $\innerp{x}{y}$ is continuous function from $H\times H$ to $\reals$
Inner product in Hilbert spaces
- $x$ and $y$ in $H$ with $\innerp{x}{y}=0$ said to be orthogonal denoted by $x\perp y$
- set $S$ of which any two elements orthogonal called orthogonal system
- orthogonal system called orthonormal if every element has unit norm
- any two elements of orthonormal system are $\sqrt{2}$ apart, hence if $H$ separable, every orthonormal system in $H$ must be countable
- shall deal only with separable Hilbert spaces
Fourier coefficients
- assume orthonormal system expressed as sequence, $\seq{\varphi_n}$ - may be finite or infinite
- for $x\in H$ $$ a_n = \innerp{x}{\varphi_n} $$ called Fourier coefficients
- for $n\in\naturals$, we have $$ \|x\|^2 \geq \sum^n_{i=1} a_i^2 $$ $$ \begin{eqnarray*} \left\| x-\sum_{i=1}^n a_i \varphi_i \right\|^2 &=& \innerpt{x-\sum a_i \varphi_i}{x-\sum a_i \varphi_i}{} \\ &=& \innerpt{x}{x} - 2 \innerpt{x}{\sum a_i \varphi_i}{} + \innerpt{\sum a_i \varphi_i}{\sum a_i \varphi_i}{} \\ &=& \|x\|^2 - 2 \sum a_i \innerpt{x}{\varphi_i} + \sum a_i^2 \|\varphi_i\|^2 = \|x\|^2 - \sum a_i^2 \geq 0 \end{eqnarray*} $$
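-
numerical sketch of Fourier coefficients and the bound above for the orthonormal system $\varphi_n(t)=\sqrt{2}\cos(n\pi t)$ in $L^2[0,1]$ (grid quadrature; sample $x$ ad hoc):
\begin{verbatim}
import numpy as np

t = np.linspace(0.0, 1.0, 200_000, endpoint=False)
integ = lambda f: f.mean()     # crude quadrature on uniform grid over [0, 1]

x = t * (1.0 - t)              # a sample element of L^2[0, 1]
a = np.array([integ(x * np.sqrt(2.0) * np.cos(n * np.pi * t))
              for n in range(1, 60)])        # a_n = <x, phi_n>

print((a ** 2).sum(), "<=", integ(x ** 2))   # sum a_n^2 <= ||x||^2
\end{verbatim}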
Fourier coefficients of limit of $x$
- Bessel's inequality - for $x\in H$, its Fourier coefficients, $\seq{a_n}$ $$ \sum_{n=1}^\infty a_n^2 \leq \|x\|^2 $$
- then, $\seq{z_n}$ defined by following is Cauchy sequence $z_n = \sum_{i=1}^n a_i \varphi_i$
- completeness (of Hilbert space) implies $\seq{z_n}$ converges - let $y=\lim z_n$ $$ y=\lim z_n = \sum_{i=1}^\infty a_i \varphi_i $$
- continuity of inner product implies $\innerp{y}{\varphi_n} = \lim_m \innerp{z_m}{\varphi_n} = a_n$, i.e., Fourier coefficients of $y\in H$ are $a_n$, i.e.,
- $y$ has same Fourier coefficients as $x$
Complete orthonormal system
- orthonormal system, $\seq{\varphi_n}_{n=1}^\infty$, of Hilbert spaces, $H$, is said to be complete if $$ (\forall x\in H, n\in\naturals)(\innerp{x}{\varphi_n}=0) \Rightarrow x=0 $$
- orthonormal system is complete if and only if maximal, i.e., $$ \seq{\varphi_n} \mbox{ is complete} \Leftrightarrow ( (\exists \mbox{ orthonormal }R\subset H)(\forall n\in\naturals)(\varphi_n \in R) \Rightarrow R = \seq{\varphi_n} ) $$
- Hausdorff maximal principle implies existence of maximal orthonormal system, hence following statement
- for separable Hilbert space, $H$, every orthonormal system is countable and exists complete orthonormal system; for any such system, $\seq{\varphi_n}$, and $x\in H$ $$ x = \sum a_n \varphi_n $$ with $a_n = \innerp{x}{\varphi_n}$, and $\|x\|^2 = \sum a_n^2$
Dimensions of Hilbert spaces
- every complete orthonormal system of separable Hilbert space has same number of elements, i.e., has same cardinality
- hence, every separable Hilbert space has either finite or countably infinite complete orthonormal system
-
this number called dimension of separable Hilbert space
- for Hilbert space with countably infinite complete orthonormal system, we say, $\dim H = \aleph_0$
Isomorphism and isometry between Hilbert spaces
- isomorphism, $\Phi$, of Hilbert space onto another Hilbert space is linear mapping with property, $\innerp{\Phi x}{\Phi y} = \innerp{x}{y}$
- hence, every isomorphism between Hilbert spaces is isometry
- every $n$-dimensional Hilbert space is isomorphic to $\reals^n$
- every $\aleph_0$-dimensional Hilbert space is isomorphic to $l^2$, which again is isomorphic to $L^2$
- $L^2[0,1]$ is separable and $\seq{\cos (n\pi t)}$ is infinite orthogonal system
- every bounded linear functional, $f$, on Hilbert space, $H$, has unique $y$ such that $$ (\forall x\in H)(f(x)=\innerp{x}{y}) $$ and $\|f\|=\|y\|$
Measure and Integration
Purpose of integration theory
-
purpose of “measure and integration'' slides
- abstract (out) most important properties of Lebesgue measure and Lebesgue integration
- provide certain axioms that Lebesgue measure satisfies
- base our integration theory on these axioms
- hence, our theory valid for every system satisfying the axioms
Measurable space, measure, and measure space
- family of subsets containing $\emptyset$ closed under countable union and complement, called $\sigma$-algebra
- mapping of sets to extended real numbers, called set function
-
$\measu{X}{\algk{B}}$ with set, $X$, and $\sigma$-algebra of $X$, $\algk{B}$,
called measurable space
- $A\in\algk{B}$, said to be measurable (with respect to \algk{B})
- nonnegative set function, $\mu$, defined on $\algk{B}$ satisfying $\mu(\emptyset)=0$ and for every disjoint, $\seq{E_n}_{n=1}^\infty\subset \algk{B}$, $$ \mu\left(\bigcup E_n\right) = \sum \mu E_n $$ called measure on measurable space, $\measu{X}{\algk{B}}$
- measurable space, $\measu{X}{\algk{B}}$, equipped with measure, $\mu$, called measure space and denoted by $\meas{X}{\algk{B}}{\mu}$
Measure space examples
- $\meas{\reals}{\subsetset{M}}{\mu}$ with Lebesgue measurable sets, $\subsetset{M}$, and Lebesgue measure, $\mu$
- $\meast{[0,1]}{\set{A\in\subsetset{M}}{A\subset[0,1]}}{\mu}$ with Lebesgue measurable sets, $\subsetset{M}$, and Lebesgue measure, $\mu$
- $\meas{\reals}{\algB}{\mu}$ with class of Borel sets, $\algB$, and Lebesgue measure, $\mu$
- $\meas{\reals}{\powerset(\reals)}{\mu_C}$ with set of all subsets of $\reals$, $\powerset(\reals)$, and counting measure, $\mu_C$
-
interesting (and bizarre) example
- $\meas{X}{\collk{A}}{\mu_B}$ with any uncountable set, $X$, family, $\collk{A}$, of sets that are either countable or complements of countable sets, and measure, $\mu_B$, such that $\mu_B A =0$ for countable $A\in\collk{A}$ and $\mu_B B=1$ for uncountable $B\in\collk{A}$
More properties of measures
- for $A,B\in\algB$ with $A\subset B$ $$ \mu A \leq \mu B $$
- for $\seq{E_n}\subset \algB$ with $\mu E_1 < \infty$ and $E_{n+1} \subset E_n$ $$ \mu\left(\bigcap E_n\right) = \lim \mu E_n $$
- for $\seq{E_n}\subset \algB$ $$ \mu\left(\bigcup E_n\right) \leq \sum \mu E_n $$
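- hypothesis $\mu E_1 < \infty$ above is needed; standard counterexample (assumed here, not in the slide) with Lebesgue measure: $E_n = [n,\infty)$ gives $$ \mu\left(\bigcap E_n\right) = \mu(\emptyset) = 0 \neq \infty = \lim \mu E_n $$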
Finite and $\sigma$-finite measures
- measure, $\mu$, with $\mu(X)<\infty$, called finite
-
measure, $\mu$, with $X=\bigcup X_n$ for some $\seq{X_n}$ and $\mu(X_n)<\infty$,
called $\sigma$-finite
- always can take $\seq{X_n}$ with disjoint $X_n$
- Lebesgue measure on $[0,1]$ is finite
- Lebesgue measure on $\reals$ is $\sigma$-finite
- counting measure on uncountable set is not $\sigma$-finite
Sets of finite and $\sigma$-finite measure
- set, $E\in \algB$, with $\mu E<\infty$, said to be of finite measure
- set that is countable union of measurable sets of finite measure, said to be of $\sigma$-finite measure
- measurable set contained in set of $\sigma$-finite measure, is of $\sigma$-finite measure
- countable union of sets of $\sigma$-finite measure, is of $\sigma$-finite measure
- when $\mu$ is $\sigma$-finite, every measurable set is of $\sigma$-finite measure
Semifinite measures
- roughly speaking, nearly all familiar properties of Lebesgue measure and Lebesgue integration hold for arbitrary $\sigma$-finite measure
- many treatments of abstract measure theory limit themselves to $\sigma$-finite measures
- many parts of general theory, however, do not require assumption of $\sigma$-finiteness
- undesirable to have development unnecessarily restrictive
- measure, $\mu$, for which every measurable set of infinite measure contains measurable sets of arbitrarily large finite measure, said to be semifinite
- every $\sigma$-finite measure is semifinite measure while measure, $\mu_B$, on page~ is not
Complete measure spaces
-
measure space, $\meas{X}{\algB}{\mu}$, for which $\algB$ contains all subsets of sets of measure zero,
said to be complete,
i.e.,
$$
(\forall B\in\algB \mbox{ with } \mu B=0)
(A \subset B \Rightarrow A \in \algB)
$$
- e.g., Lebesgue measure is complete, but Lebesgue measure restricted to $\sigma$-algebra of Borel sets is not
- every measure space can be completed by addition of subsets of sets of measure zero
-
for $\meas{X}{\algB}{\mu}$, can find complete measure space $\meas{X}{\algB_0}{\mu_0}$
such that
$$
\begin{eqnarray*}
&-&
\algB \subset \algB_0
\\
&-&
E \in\algB \Rightarrow \mu E = \mu_0 E
\\
&-&
E \in\algB_0 \Leftrightarrow E = A \cup B
\mbox{ where } B,C\in\algB, \mu C = 0, A\subset C
\end{eqnarray*}
$$
- $\meas{X}{\algB_0}{\mu_0}$ called completion of $\meas{X}{\algB}{\mu}$
Local measurability and saturatedness
- for $\meas{X}{\algB}{\mu}$, $E\subset X$ for which $(\forall B\in\algB \mbox{ with }\mu B < \infty)(E\cap B\in\algB)$, said to be locally measurable
- collection, $\algC$, of all locally measurable sets is $\sigma$-algebra containing $\algB$
- measure for which every locally measurable set is measurable, said to be saturated
- every $\sigma$-finite measure is saturated
-
measure can be extended to saturated measure,
but (unlike completion)
extension is not unique
- can take $\algC$ as $\sigma$-algebra for locally measurable sets, but measure can be extended to $\algC$ in more than one way
Measurable functions
- concept and properties of measurable functions in abstract measurable space almost identical with those of Lebesgue measurable functions (page~)
- theorems and facts are essentially same as those of Lebesgue measurable functions
- assume measurable space, $\measu{X}{\algB}$
-
for $f:X\to\ereals$, following are equivalent
- $(\forall a\in\reals) (\set{x\in X}{f(x) < a}\in\algB)$
- $(\forall a\in\reals) (\set{x\in X}{f(x) \leq a}\in\algB)$
- $(\forall a\in\reals) (\set{x\in X}{f(x) > a}\in\algB)$
- $(\forall a\in\reals) (\set{x\in X}{f(x) \geq a}\in\algB)$
- $f:X\to\ereals$ for which any one of above four statements holds, called measurable or measurable with respect to \algB
Properties of measurable functions
-
for measurable functions, $f$ and $g$, and $c\in\reals$
- $f+c$, $cf$, $f+g$, $fg$, $f\vee g$ are measurable
-
for every measurable function sequence, $\seq{f_n}$
- $\sup f_n$, $\limsup f_n$, $\inf f_n$, $\liminf f_n$ are measurable
- thus, $\lim f_n$ is measurable if exists
Simple functions and other properties
- $\varphi$ called simple function if for distinct $\seq{c_i}_{i=1}^n$ and measurable sets, $\seq{E_i}_{i=1}^n$ $$ \varphi(x) = \sum_{i=1}^n c_i \chi_{E_i}(x) $$
-
for nonnegative measurable function, $f$,
exists nondecreasing sequence of simple functions, $\seq{\varphi_n}$,
i.e., $\varphi_{n+1}\geq \varphi_n$
such that for every point in $X$
$$
f = \lim \varphi_n
$$
- for $f$ defined on $\sigma$-finite measure space, we may choose $\seq{\varphi_n}$ so that every $\varphi_n$ vanishes outside set of finite measure
- for complete measure, $\mu$, $f$ measurable and $f=g$ a.e. imply measurability of $g$
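-
one standard construction behind the existence statement above (a sketch; dyadic truncation evaluated on a grid, example $f$ ad hoc):
\begin{verbatim}
import numpy as np

def phi(f_vals, n):
    # phi_n = min( floor(2^n f) / 2^n, n ): simple, nondecreasing in n
    return np.minimum(np.floor(2.0 ** n * f_vals) / 2.0 ** n, n)

x = np.linspace(0.0, 5.0, 11)
f = np.exp(x)                            # nonnegative measurable f
for n in (1, 4, 8, 12):
    err = np.abs(phi(f, n) - np.minimum(f, n)).max()
    print(n, err)                        # error <= 2^-n below level n
\end{verbatim}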
Define measurable function by ordinate sets
- $\set{x}{f(x)<\alpha}$ sometimes called ordinate set, which is nondecreasing in $\alpha$
- below says when given nondecreasing ordinate sets, we can find $f$ satisfying $$ \set{x}{f(x)<\alpha} \subset B_\alpha \subset \set{x}{f(x)\leq\alpha} $$
- for nondecreasing function, $h:D\to\algB$, for dense set of real numbers, $D$, i.e., $B_\alpha \subset B_\beta$ for all $\alpha<\beta$ where $B_\alpha = h(\alpha)$, exists unique measurable function, $f:X\to\ereals$ such that $f\leq \alpha$ on $B_\alpha$ and $f\geq \alpha$ on $X\sim B_\alpha$
- can relax some conditions and make it a.e. version as below
-
for function, $h:D\to\algB$, for dense set of real numbers, $D$,
such that $\mu(B_\alpha\sim B_\beta)=0$ for all $\alpha < \beta$ where $B_\alpha = h(\alpha)$,
exists measurable function, $f:X\to\ereals$
such that $f\leq \alpha$ a.e. on $B_\alpha$ and $f\geq \alpha$ a.e. on $X\sim B_\alpha$
- if $g$ has the same property, $f=g$ a.e.
Integration
- many definitions and proofs of Lebesgue integral depend only on properties of Lebesgue measure which are also true for arbitrary measure in abstract measure space (page~)
-
integral of nonnegative simple function, $\varphi(x) = \sum_{i=1}^n c_i \chi_{E_i}(x)$,
on measurable set, $E$, defined by
$$
\int_E \varphi d\mu= \sum_{i=1}^n c_i \mu (E_i \cap E)
$$
- independent of representation of $\varphi$
- for $a,b\in\ppreals$ and nonnegative simple functions, $\varphi$ and $\psi$ $$ \int (a\varphi + b\psi) = a \int\varphi + b \int\psi $$
Integral of bounded functions
- for bounded function, $f$, identically zero outside measurable set of finite measure $$ \sup_{\varphi:\ \mathrm{simple},\ \varphi \leq f} \int \varphi = \inf_{\psi:\ \mathrm{simple},\ f \leq \psi} \int \psi $$ if and only if $f=g$ a.e. for measurable function, $g$
- but, $f=g$ a.e. for measurable function, $g$, \iaoi\ $f$ is measurable with respect to completion of $\mu$, $\bar{\mu}$
- natural class of functions to consider for integration theory are those measurable \wrt\ completion of $\mu$
- thus, shall either assume $\mu$ is complete measure or define integral with respect to $\mu$ to be integral with respect to completion of $\mu$ depending on context unless otherwise specified
Difficulty of general integral of nonnegative functions
-
for Lebesgue integral of nonnegative functions
(page~)
- first define integral for bounded measurable functions
- define integral of nonnegative function, $f$ as supremum of integrals of all bounded measurable functions, $h\leq f$, vanishing outside measurable set of finite measure
-
unfortunately, this does not work when measure is not semifinite
- e.g., if $\algB=\{\emptyset,X\}$ with $\mu \emptyset = 0$ and $\mu X = \infty$, we want $\int 1 d\mu=\infty$, but only bounded measurable function vanishing outside measurable set of finite measure is $h\equiv0$, hence supremum gives $\int 1 d\mu = 0$
- to avoid this difficulty, we define integral of nonnegative measurable function directly in terms of integrals of nonnegative simple functions
Integral of nonnegative functions
- for nonnegative measurable function, $f:X\to\reals\cup\{\infty\}$, on measure space, $\meas{X}{\algB}{\mu}$, define integral of nonnegative extended real-valued measurable function $$ \int f d\mu = \sup_{\varphi:\ \mathrm{simple\ function},\ 0\leq \varphi\leq f} \int \varphi d\mu $$
-
however,
definition of integral of nonnegative extended real-valued measurable function
can be awkward to apply because
- taking supremum over large collection of simple functions
- not clear from definition that $\int(f+g) = \int f + \int g$
- thus, first establish some convergence theorems, and determine value of $\int f$ as limit of $\int \varphi_n$ for increasing sequence, $\seq{\varphi_n}$, of simple functions converging to $f$
Fatou's lemma and monotone convergence theorem
- Fatou's lemma - for nonnegative measurable function sequence, $\seq{f_n}$, with $\lim f_n = f$ a.e. on measurable set, $E$ $$ \int_E f \leq \liminf \int_E f_n $$
- monotone convergence theorem - for nonnegative measurable function sequence, $\seq{f_n}$, with $f_n\leq f$ for all $n$ and with $\lim f_n = f$ a.e. $$ \int_E f = \lim \int_E f_n $$
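a numeric sketch of monotone convergence using the standard dyadic simple approximations (the choice $f(x)=x^2$ on $[0,1]$ with Lebesgue measure, and the grid quadrature, are assumptions for illustration):
```python
import numpy as np

# phi_n(x) = min(floor(2^n f(x)) / 2^n, n) is a nondecreasing sequence of
# simple functions converging pointwise to f; their integrals rise to int f
x = np.linspace(0.0, 1.0, 200001)
f = x ** 2
prev = -np.inf
for n in range(1, 8):
    phi_n = np.minimum(np.floor(2 ** n * f) / 2 ** n, n)  # simple, phi_n <= f
    int_phi_n = phi_n.mean()      # grid approximation of int_0^1 phi_n dx
    assert int_phi_n >= prev      # integrals are nondecreasing
    prev = int_phi_n
print(prev)                       # approaches int_0^1 x^2 dx = 1/3
```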
Integrability of nonnegative functions
-
for nonnegative measurable functions, $f$ and $g$, and $a,b\in\preals$
$$
\int (af + bg) = a\int f + b\int g
\mbox{ \& }
\int f \geq 0
$$
- equality holds if and only if $f=0$ a.e.
- monotone convergence theorem together with above yields, for nonnegative measurable function sequence, $\seq{f_n}$ $$ \int \sum f_n = \sum \int f_n $$
- measurable nonnegative function, $f$, with $$ \int_E fd\mu <\infty $$ said to be integrable (over measurable set, $E$, \wrt\ $\mu$)
Integral
- arbitrary function, $f$, for which both $f^+$ and $f^-$ are integrable, said to be integrable
- in this case, define integral $$ \int_E f = \int_E f^+ - \int_E f^- $$
Properties of integral
-
for $f$ and $g$ integrable on measurable set, $E$, and $a,b\in\reals$
- $af+bg$ is integrable and $$ \int_E (af+bg) = a \int_E f + b\int_E g $$
- if $|h|\leq |f|$ and $h$ is measurable, then $h$ is integrable
- if $f\geq g$ a.e. $$ \int f \geq \int g $$
Lebesgue convergence theorem
- Lebesgue convergence theorem - for $g$ integrable over $E$ and sequence of measurable functions, $\seq{f_n}$, with $\lim f_n(x) = f(x)$ a.e. on $E$, if $$ |f_n(x)|\leq g(x) $$ then $$ \int_E f = \lim \int_E f_n $$
Setwise convergence of sequence of measures
- preceding convergence theorems assume fixed measure, $\mu$
- can generalize by allowing measure to vary
- given measurable space, $\measu{X}{\algB}$, sequence of set functions, $\seq{\mu_n}$, defined on $\algB$, satisfying $$ (\forall E\in\algB) (\lim \mu_n E = \mu E) $$ for some set function, $\mu$, defined on $\algB$, said to converge setwise to $\mu$
General convergence theorems
- generalization of Fatou's lemma - for measurable space, $\measu{X}{\algB}$, sequence of measures, $\seq{\mu_n}$, defined on $\algB$, converging setwise to $\mu$, defined on $\algB$, and sequence of nonnegative functions, $\seq{f_n}$, each measurable with respect to $\mu_n$, converging pointwise to function, $f$, measurable with respect to $\mu$ (compare with Fatou's lemma on page~) $$ \int f d\mu \leq \liminf\int f_n d\mu_n $$
- generalization of Lebesgue convergence theorem - for measurable space, $\measu{X}{\algB}$, sequence of measures, $\seq{\mu_n}$, defined on $\algB$, converging setwise to $\mu$, defined on $\algB$, and sequences of functions, $\seq{f_n}$ and $\seq{g_n}$, with $|f_n|\leq g_n$, each of $f_n$ and $g_n$ measurable with respect to $\mu_n$, converging pointwise to $f$ and $g$, measurable with respect to $\mu$, respectively, such that (compare with Lebesgue convergence theorem on page~) $$ \lim \int g_n d\mu_n = \int g d\mu < \infty $$ satisfy $$ \lim \int f_n d\mu_n = \int f d\mu $$
$L^p$ spaces
-
for complete measure space, $\meas{X}{\algB}{\mu}$
- space of measurable functions on $X$ with $\int |f|^p < \infty$, for which element equivalence is defined by being equal a.e., called $L^p$ space denoted by $L^p(\mu)$
- space of essentially bounded measurable functions, called $L^\infty$ space denoted by $L^\infty(\mu)$
-
norms
- for $p\in[1,\infty)$ $$ \|f\|_p=\left( \int |f|^p d\mu \right)^{1/p} $$
- for $p=\infty$ $$ \|f\|_\infty = \mathrm{ess\ sup} |f| = \inf \bigsetl{\sup_{x\in X}|g(x)|}{\mbox{measurable }g \mbox{ with } g=f \mbox{ a.e.}} $$
- for $p\in[1,\infty]$, spaces, $L^p(\mu)$, are Banach spaces
Hölder's inequality and Littlewood's second principle
- Hölder's inequality - for $p,q\in[1,\infty]$ with $1/p+1/q=1$, $f\in L^p(\mu)$ and $g\in L^q(\mu)$ satisfy $fg \in L^1(\mu)$ and $$ \|fg\|_1 = \int |fg| d\mu \leq \|f\|_p\|g\|_q $$
- complete measure space version of Littlewood's second principle - for $p\in[1,\infty)$ $$ \begin{eqnarray*} && (\forall f\in L^p(\mu), \epsilon>0) \\ && (\exists \mbox{ simple function } \varphi \mbox{ vanishing outside set of finite measure}) \\ && \ \ \ \ \ \ \ (\|f-\varphi\|_p < \epsilon) \end{eqnarray*} $$
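a quick numeric check of Hölder's inequality above, on a finite measure space with counting measure (the random data are an assumption for illustration):
```python
import numpy as np

# check ||fg||_1 <= ||f||_p ||g||_q on X = {1,...,50} with counting measure
rng = np.random.default_rng(0)
f, g = rng.normal(size=50), rng.normal(size=50)
for p in (1.5, 2.0, 4.0):
    q = p / (p - 1.0)             # conjugate exponent, 1/p + 1/q = 1
    lhs = np.abs(f * g).sum()     # ||fg||_1
    rhs = (np.abs(f) ** p).sum() ** (1 / p) * (np.abs(g) ** q).sum() ** (1 / q)
    assert lhs <= rhs + 1e-12
```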
Riesz representation theorem
- Riesz representation theorem - for $p\in[1,\infty)$ and bounded linear functional, $F$, on $L^p(\mu)$ and $\sigma$-finite measure, $\mu$, exists unique $g\in L^q(\mu)$ where $1/p+1/q=1$ such that $$ F(f) = \int fg d\mu $$ and $\|F\| = \|g\|_q$
- if $p\in(1,\infty)$, Riesz representation theorem holds without assumption of $\sigma$-finiteness of measure
Measure and Outer Measure
General measures
- consider some ways of defining measures on $\sigma$-algebra
-
recall that for Lebesgue measure
- define measure for open intervals
- define outer measure
- define notion of measurable sets
- finally derive Lebesgue measure
-
one can do similar things in general, e.g.,
- derive measure from outer measure
- derive outer measure from measure defined on algebra of sets
Outer measure
-
set function, $\mu^\ast:\powerset(X)\to[0,\infty]$,
for space $X$, having following properties,
called outer measure
- $\mu^\ast \emptyset = 0$
- $A\subset B \Rightarrow \mu^\ast A \leq \mu^\ast B$ (monotonicity)
- $E \subset \bigcup_{n=1}^\infty E_n \Rightarrow \mu^\ast E \leq \sum_{n=1}^\infty \mu^\ast E_n$ (countable subadditivity)
- $\mu^\ast$ with $\mu^\ast X<\infty$ called finite
- set $E\subset X$ satisfying following property, said to be measurable \wrt\ $\mu^\ast$ $$ (\forall A\subset X) (\mu^\ast(A) =\mu^\ast(A\cap E) + \mu^\ast(A\cap \compl{E})) $$
- class, $\algB$, of $\mu^\ast$-measurable sets is $\sigma$-algebra
- restriction of $\mu^\ast$ to $\algB$ is complete measure on $\algB$
Extension to measure from measure on an algebra
-
set function, $\mu:\alg\to[0,\infty]$, defined on algebra, $\alg$,
having following properties,
called measure on an algebra
- $\mu(\emptyset) = 0$
- $\left( \forall \mbox{ disjoint } \seq{A_n} \subset \alg \mbox{ with } \bigcup A_n \in \alg \right) \left( \mu\left(\bigcup A_n\right) = \sum \mu A_n \right)$
- measure on an algebra, $\alg$, is measure if and only if $\alg$ is $\sigma$-algebra
-
can extend measure on an algebra to measure defined on $\sigma$-algebra, $\algB$, containing $\alg$,
by
- constructing outer measure $\mu^\ast$ from $\mu$
- deriving desired extension $\bar{\mu}$ induced by $\mu^\ast$
- process of constructing $\mu^\ast$ from $\mu$ is similar to constructing Lebesgue outer measure from lengths of intervals
Outer measure constructed from measure on an algebra
- given measure, $\mu$, on an algebra, $\alg$
- define set function, $\mu^\ast:\powerset(X)\to[0,\infty]$, by $$ \mu^\ast E = \inf_{\seq{A_n}\subset \alg,\ E\subset \bigcup A_n} \sum \mu A_n $$
- $\mu^\ast$ called outer measure induced by $\mu$
- then
- for $A\in\alg$ and $\seq{A_n}\subset\alg$ with $A\subset \bigcup A_n$, $\mu A\leq \sum \mu A_n$
- hence, $(\forall A\in\alg)(\mu^\ast A = \mu A)$
- $\mu^\ast$ is outer measure
- every $A\in\alg$ is measurable with respect to $\mu^\ast$
Regular outer measure
-
for algebra, $\alg$
- $\alg_\sigma$ denote sets that are countable unions of sets of $\alg$
- $\alg_{\sigma \delta}$ denote sets that are countable intersections of sets of $\alg_\sigma$
- given measure, $\mu$, on an algebra, $\alg$ and outer measure, $\mu^\ast$ induced by $\mu$, for every $E\subset X$ and every $\epsilon>0$, exists $A\in\alg_\sigma$ and $B\in\alg_{\sigma \delta}$ with $E\subset A$ and $E\subset B$ $$ \mu^\ast A \leq \mu^\ast E + \epsilon \mbox{ and } \mu^\ast E = \mu^\ast B $$
- outer measure, $\mu^\ast$, with below property, said to be regular $$ (\forall E\subset X, \epsilon>0) (\exists \mbox{ $\mu^\ast$-measurable set }A \mbox{ with } E\subset A) (\mu^\ast A \leq \mu^\ast E + \epsilon) $$
- every outer measure induced by measure on an algebra is regular outer measure
Carathéodory theorem
- given measure, $\mu$, on an algebra, $\alg$ and outer measure, $\mu^\ast$ induced by $\mu$
-
$E\subset X$ is $\mu^\ast$-measurable
if and only if
exist $A\in\alg_{\sigma\delta}$ and $B\subset X$ with $\mu^\ast B=0$
such that
$$
E=A\sim B
$$
- for $B\subset X$ with $\mu^\ast B=0$, exists $C\in\alg_{\sigma\delta}$ with $\mu^\ast C=0$ such that $B\subset C$
-
Carathéodory theorem -
restriction, $\bar{\mu}$, of $\mu^\ast$ to $\mu^\ast$-measurable sets
is extension of $\mu$ to $\sigma$-algebra containing $\alg$
- if $\mu$ is finite or $\sigma$-finite, so is $\bar{\mu}$ respectively
- if $\mu$ is $\sigma$-finite, $\bar{\mu}$ is only measure on smallest $\sigma$-algebra containing $\alg$ which is extension of $\mu$
Product measures
- for countable disjoint collection of measurable rectangles, $\seq{(A_n \times B_n)}$, whose union is measurable rectangle, $A\times B$ $$ \lambda(A\times B) = \sum \lambda(A_n \times B_n) $$
- for $x\in X$ and $E\in \algk{R}_{\sigma\delta}$ $$ E_x = \set{y}{\langle x,y\rangle \in E} $$ is measurable subset of $Y$
- for $E\in\algk{R}_{\sigma\delta}$ with $\mu \times \nu(E)<\infty$, function, $g$, defined by $$ g(x) = \nu E_x $$ is measurable function of $x$ and $$ \int g d\mu = \mu \times \nu(E) $$
- XXX
Carathéodory outer measures
- set, $X$, of points and set, $\Gamma$, of real-valued functions on $X$
- two sets for which exist $a>b$ such that function, $\varphi$, greater than $a$ on one set and less than $b$ on the other set, said to be separated by function, $\varphi$
- outer measure, $\mu^\ast$, with $(\forall A,B\subset X \mbox{ separated by } f\in\Gamma) (\mu^\ast(A\cup B) = \mu^\ast A + \mu^\ast B)$, called Carathéodory outer measure with respect to $\Gamma$
- outer measure, $\mu^\ast$, on metric space, $\metrics{X}{\rho}$, for which $\mu^\ast(A\cup B)=\mu^\ast A + \mu^\ast B$ for $A,B\subset X$ with $\rho(A,B)>0$, called Carathéodory outer measure for $X$ or metric outer measure
- for Carathéodory outer measure, $\mu^\ast$, with respect to $\Gamma$, every function in $\Gamma$ is $\mu^\ast$-measurable
- for Carathéodory outer measure, $\mu^\ast$, for metric space, $\metrics{X}{\rho}$, every closed set (hence every Borel set) is measurable with respect to $\mu^\ast$
Measure-theoretic Treatment of Probabilities
Probability Measure
Measurable functions
- denote $n$-dimensional Borel sets by $\algR^n$
- for two measurable spaces, $\measu{\Omega}{\algF}$ and $\measu{\Omega'}{\algF'}$, function, $f:\Omega \to \Omega'$ with $$ \left( \forall A' \in \algF' \right) \left( f^{-1}(A') \in \algF \right) $$ said to be measurable with respect to $\algF/\algF'$ (thus, measurable functions defined on page~ and page~ can be said to be measurable with respect to $\collk{B}/\algR$)
-
when $\Omega=\reals^n$ in $\measu{\Omega}{\algF}$,
$\algF$ is assumed to be $\algR^n$,
and sometimes drop $\algR^n$
- thus, e.g., we say $f:\Omega\to\reals^n$ is measurable with respect to $\algF$ (instead of $\algF/\algR^n$)
- measurable function, $f:\reals^n\to\reals^m$ (i.e., measurable with respect to $\algR^n/\algR^m$), called Borel functions
- $f:\Omega\to\reals^n$ is measurable with respect to $\algF/\algR^n$ if and only if every component, $f_i:\Omega\to\reals$, is measurable with respect to $\algF/\algR$
Probability (measure) spaces
-
set function, $P:\algk{F}\to[0,1]$, defined on algebra, $\algk{F}$, of set $\Omega$,
satisfying following properties,
called probability measure
(refer to page~ for resemblance with measure spaces)
- $(\forall A\in\algk{F})(0\leq P(A)\leq 1)$
- $P(\emptyset) = 0,\ P(\Omega) = 1$
- $(\forall \mbox{ disjoint } \seq{A_n} \subset \algk{F} )(P\left(\bigcup A_n\right) = \sum P(A_n))$
- for $\sigma$-algebra, $\algk{F}$, $\meas{\Omega}{\algk{F}}{P}$, called probability measure space or probability space
- set $A\in\algk{F}$ with $P(A)=1$, called a support of $P$
Dynkin's $\pi$-$\lambda$ theorem
-
class, $\subsetset{P}$, of subsets of $\Omega$ closed under finite intersection,
called $\pi$-system, i.e.,
- $(\forall A,B\in \subsetset{P})(A\cap B\in\subsetset{P})$
-
class, $\subsetset{L}$, of subsets of $\Omega$ containing $\Omega$
closed under complements and countable disjoint unions,
called $\lambda$-system, i.e.,
- $\Omega \in \subsetset{L}$
- $(\forall A\in \subsetset{L})(\compl{A}\in\subsetset{L})$
- $(\forall \mbox{ disjoint }\seq{A_n}\subset\subsetset{L})(\bigcup A_n \in \subsetset{L})$
- class that is both $\pi$-system and $\lambda$-system is $\sigma$-algebra
- Dynkin's $\pi$-$\lambda$ theorem - for $\pi$-system, $\subsetset{P}$, and $\lambda$-system, $\subsetset{L}$, with $\subsetset{P} \subset \subsetset{L}$, $$ \sigma(\subsetset{P}) \subset \subsetset{L} $$
- for $\pi$-system, $\algk{P}$, two probability measures, $P_1$ and $P_2$, on $\sigma(\algk{P})$, agreeing on $\algk{P}$, agree on $\sigma(\algk{P})$
Limits of Events
- for $\seq{A_n}$ converging to $A$ $$ \lim P(A_n) = P(A) $$
Probabilistic independence
- given probability space, $\meas{\Omega}{\algk{F}}{P}$
- $A,B\in\algk{F}$ with $$ P(A\cap B) = P(A) P(B) $$ said to be independent
- indexed collection, $\seq{A_\lambda}$, with $$ \left( \forall n\in\naturals, \mbox{ distinct } \lambda_1, \ldots, \lambda_n \in \Lambda \right) \left( P\left(\bigcap_{i=1}^n A_{\lambda_i}\right) = \prod_{i=1}^n P(A_{\lambda_i}) \right) $$ said to be independent
Independence of classes of events
- indexed collection, $\seq{\subsetset{A}_\lambda}$, of classes of events (i.e., subsets) with $$ \left( \forall A_\lambda \in \subsetset{A}_\lambda \right) \left( \seq{A_\lambda} \mbox{ are independent} \right) $$ said to be independent
- for independent indexed collection, \seq{\subsetset{A}_\lambda}, with every $\subsetset{A}_\lambda$ being $\pi$-system, \seq{\sigma(\subsetset{A}_\lambda)} are independent
- for independent (countable) collection of events, $\seq{\seq{A_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{A_{ni}}_{i=1}^\infty)$ are independent
Borel-Cantelli lemmas
-
for sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ converging $$ P(\limsup A_n) = 0 $$
-
for independent sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ diverging $$ P(\limsup A_n)=1 $$
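a Monte Carlo sketch of both lemmas for independent events $A_n$ with $P(A_n)=p_n$ (the choices $p_n = 1/n^2$ and $p_n = 1/n$ are assumptions for illustration):
```python
import numpy as np

# the number of A_n occurring stays bounded in mean when sum p_n < infty
# (first lemma) and grows like log N when p_n = 1/n (second lemma, with
# independence, so A_n occur infinitely often)
rng = np.random.default_rng(1)
paths = 200
for N in (2000, 20000):
    n = np.arange(1, N + 1)
    for p_n, label in ((1.0 / n ** 2, "1/n^2"), (1.0 / n, "1/n")):
        counts = (rng.random((paths, N)) < p_n).sum(axis=1)
        print(label, N, counts.mean())   # ~1.64 for 1/n^2; ~log N for 1/n
```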
Tail events and Kolmogorov's zero-one law
- for sequence of events, $\seq{A_n}$ $$ \algk{T} = \bigcap_{n=1}^\infty \sigma\left(\seq{A_i}_{i=n}^\infty\right) $$ called tail $\sigma$-algebra associated with \seq{A_n}; its elements are called tail events
- Kolmogorov's zero-one law - for independent sequence of events, $\seq{A_n}$, every event in tail $\sigma$-algebra has probability measure either $0$ or $1$
Product probability spaces
-
for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$,
want to find product measure, $\pi$,
such that
$$
\left(
\forall A\in \algX, B\in\algY
\right)
\left(
\pi(A\times B) = \mu(A)\nu(B)
\right)
$$
- e.g., if both $\mu$ and $\nu$ are Lebesgue measure on $\reals$, $\pi$ will be Lebesgue measure on $\reals^2$
- $A\times B$ for $A\in\algX$ and $B\in\algY$ is measurable rectangle
-
$\sigma$-algebra generated by measurable rectangles
denoted by
$$
\algX \times \algY
$$
- thus, not Cartesian product in usual sense
- generally much larger than class of measurable rectangles
Sections of measurable subsets and functions
- for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
-
sections of measurable subsets
- $\set{y\in Y}{(x,y)\in E}$ is section of $E$ determined by $x$
- $\set{x\in X}{(x,y)\in E}$ is section of $E$ determined by $y$
-
sections of measurable functions
- for measurable function, $f$, with respect to $\algX\times \algY$
- $f(x,\cdot)$ is section of $f$ determined by $x$
- $f(\cdot,y)$ is section of $f$ determined by $y$
-
sections of measurable subsets are measurable
- $\left( \forall x\in X, E\in \algX \times \algY \right) \left( \set{y\in Y}{(x,y)\in E} \in \algY \right)$
- $\left( \forall y\in Y, E\in \algX \times \algY \right) \left( \set{x\in X}{(x,y)\in E} \in \algX \right)$
-
sections of measurable functions are measurable
- $f(x,\cdot)$ is measurable with respect to $\algY$ for every $x\in X$
- $f(\cdot,y)$ is measurable with respect to $\algX$ for every $y\in Y$
Product measure
- for two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
-
two functions defined below for every $E\in\algX\times\algY$ are $\sigma$-finite measures
- $\pi'(E) = \int_X \nu\set{y\in Y}{(x,y)\in E} d\mu$
- $\pi''(E) = \int_Y \mu\set{x\in X}{(x,y)\in E} d\nu$
- for every measurable rectangle, $A\times B$, with $A\in\algX$ and $B\in\algY$ $$ \pi'(A\times B) = \pi''(A\times B) = \mu(A) \nu(B) $$
- (use conventions in page~ for extended real values)
- indeed, $\pi'(E)=\pi''(E)$ for every $E\in\algX\times\algY$; let $\pi=\pi'=\pi''$
-
$\pi$ is
- called product measure and denoted by $\mu\times \nu$
- $\sigma$-finite measure
- only measure such that $\pi(A\times B) =\mu(A) \nu(B)$ for every measurable rectangle
Fubini's theorem
-
suppose two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
- define
- $X_0 = \set{x\in X}{\int_Y |f(x,y)|d\nu < \infty}\subset X$
- $Y_0 = \set{y\in Y}{\int_X |f(x,y)|d\mu < \infty}\subset Y$
- Fubini's theorem - for nonnegative measurable function, $f$, following are measurable with respect to $\algX$ and $\algY$ respectively $$ g(x) = \int_Y f(x,y)d\nu,\ \ h(y) = \int_X f(x,y)d\mu $$ and following holds $$ \int_{X\times Y} f(x,y) d\pi = \int_X \left(\int_Y f(x,y) d\nu\right)d\mu = \int_Y \left(\int_X f(x,y) d\mu\right)d\nu $$
-
for $f$, (not necessarily nonnegative) integrable function with respect to $\pi$
- $\mu(X\sim X_0) = 0$, $\nu(Y\sim Y_0)=0$
- $g$ and $h$ are finite and measurable on $X_0$ and $Y_0$ respectively
- (above) equalities of double integral holds
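a numeric sketch of the equality of iterated integrals on $[0,1]\times[0,1]$ with Lebesgue measure ($f(x,y)=e^{xy}$ and the grid-average quadrature are assumptions for illustration):
```python
import numpy as np

# iterated integrals of f(x,y) = exp(x*y) over [0,1]^2 agree
x = np.linspace(0.0, 1.0, 2001)
y = np.linspace(0.0, 1.0, 2001)
F = np.exp(np.outer(x, y))               # F[i, j] = f(x_i, y_j)
int_dy_then_dx = F.mean(axis=1).mean()   # int_X ( int_Y f dnu ) dmu
int_dx_then_dy = F.mean(axis=0).mean()   # int_Y ( int_X f dmu ) dnu
print(int_dy_then_dx, int_dx_then_dy)    # both ~ 1.3179
```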
Random Variables
Random variables
- for probability space, $\meas{\Omega}{\algk{F}}{P}$,
- measurable function (with respect to $\algF/\algR$), $X:\Omega \to \reals$, called random variable
-
measurable function (with respect to $\algF/\algR^n$), $X:\Omega \to \reals^n$,
called random vector
- when expressing $X(\omega)=(X_1(\omega), \ldots, X_n(\omega))$, $X$ is measurable if and only if every $X_i$ is measurable
- thus, $n$-dimensional random vector is simply $n$-tuple of random variables
-
smallest $\sigma$-algebra with respect to which $X$ is measurable,
called $\sigma$-algebra generated by $X$
and denoted by $\sigma(X)$
- $\sigma(X)$ consists exactly of sets, $\set{\omega\in \Omega}{X(\omega)\in H}$, for $H\in\algR^n$
- random variable, $Y$, is measurable with respect to $\sigma(X)$ if and only if exists measurable function, $f:\reals^n\to\reals$ such that $Y(\omega) = f(X(\omega))$ for all $\omega$, i.e., $Y=f\circ X$
Probability distributions for random variables
- probability measure on $\reals$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR $$ called distribution or law of random variable, $X$
- function, $F:\reals\to[0,1]$, defined by $$ F(x) = \mu(-\infty, x] = P(X\leq x) $$ called distribution function or cumulative distribution function (CDF) of $X$
- Borel set, $S$, with $\mu(S) = P(X\in S)=1$, called support
- random variable, its distribution, and its distribution function, said to be discrete when support is countable
Probability distribution of mappings of random variables
- for measurable $g:\reals\to\reals$, $$ \left( \forall A\in\algR \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu (g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$
Probability density for random variables
- Borel function, $f: \reals\to\preals$, satisfying $$ \left( \forall A \in \algR \right) \left( \mu(A) = P(X\in A) = \int_A f(x) dx \right) $$ called density or probability density function (PDF) of random variable
- above is equivalent to $$ \left( \forall a < b \in \reals \right) \left( \int_a^b f(x) dx = P(a<X\leq b) = F(b) - F(a) \right) $$
-
(refer to statement on page~)
- note, though, $F$ does not need to differentiate to $f$ everywhere; only $f$ required to integrate properly
- if $F$ does differentiate to $f$ and $f$ is continuous, fundamental theorem of calculus implies $f$ indeed is density for $F$
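a quick numeric check that the density integrates to CDF differences, $\int_a^b f = F(b)-F(a)$, for the exponential distribution (an assumed example; scipy is assumed available):
```python
from scipy import integrate, stats

# int_a^b f(x) dx = F(b) - F(a) for the exponential distribution
a, b = 0.3, 1.7
val, _ = integrate.quad(stats.expon.pdf, a, b)
assert abs(val - (stats.expon.cdf(b) - stats.expon.cdf(a))) < 1e-10
```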
Probability distribution for random vectors
- (similarly to random variables) probability measure on $\reals^n$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR^n $$ called distribution or law of random vector, $X$
- function, $F:\reals^n\to[0,1]$, defined by $$ F(x) = \mu(S_x) = P(X\preceq x) $$ where $$ S_x = \set{y\in \reals^n}{y\preceq x} $$ called distribution function or cumulative distribution function (CDF) of $X$
- (similarly to random variables) random vector, its distribution, and its distribution function, said to be discrete when support is countable
Marginal distribution for random vectors
- (similarly to random variables) for measurable $g:\reals^n\to\reals^m$ $$ \left( \forall A\in\algR^{m} \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu(g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$
- for $g_i:\reals^n\to\reals$ with $g_i(x) = x_i$ $$ \left( \forall A\in\algR \right) \left( \prob{g_i(X)\in A} = \prob{X_i \in A} \right) $$
- measure, $\mu_i$, defined by $\mu_i(A) = \prob{X_i\in A}$, called ($i$-th) marginal distribution of $X$
- for $\mu$ having density function, $f:\reals^n\to\preals$, density function of ($i$-th) marginal distribution is $$ f_i(x_i) = \int_{\reals^{n-1}} f(x_1,\ldots,x_n)\, dx_{-i} $$ where $x_{-i} = (x_1,\ldots,x_{i-1}, x_{i+1}, \ldots, x_n)$
Independence of random variables
- random variables, $X_1,\ldots,X_n$, with independent $\sigma$-algebras generated by them, said to be independent
-
(refer to page~ for
independence of collections of subsets)
- because $\sigma(X_i) = X_i^{-1}(\algR)=\set{X_i^{-1}(H)}{H\in\algR}$, independent if and only if $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(X_1\in H_1,\ldots, X_n\in H_n\right) = \prod P\left(X_i\in H_i\right) \right) $$ i.e., $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right) $$
Equivalent statements of independence of random variables
-
for random variables, $X_1,\ldots,X_n$,
having $\mu$ and $F:\reals^n\to[0,1]$ as their distribution and CDF,
with each $X_i$ having $\mu_i$ and $F_i:\reals\to[0,1]$ as its distribution and CDF,
following statements are equivalent
- $X_1,\ldots,X_n \mbox{ are independent}$
- $\left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
- $\left( \forall H_1,\ldots,H_n \in \algR \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
- $\left( \forall x\in \reals^n \right) \left( P(X_1\leq x_1,\ldots, X_n\leq x_n) = \prod P(X_i \leq x_i) \right)$
- $\left( \forall x \in \reals^n \right) \left( F(x) = \prod F_i(x_i) \right)$
- $\mu = \mu_1 \times \cdots \times \mu_n$
- (when $\mu$ has density $f$ and each $\mu_i$ has density $f_i$) $\left( \forall x \in \reals^n \right) \left( f(x) = \prod f_i(x_i) \right)$
Independence of random variables with separate $\sigma$-algebra
- given probability space, $\meas{\Omega}{\algk{F}}{P}$
- random variables, $X_1,\ldots,X_n$, each of which is measurable with respect to each of $n$ independent $\sigma$-algebras, $\algk{G}_1\subset \algF$, \ldots, $\algk{G}_n\subset \algF$ respectively, are independent
Independence of random vectors
-
for random vectors, $X_1:\Omega\to\reals^{d_1}$, \ldots, $X_n:\Omega\to\reals^{d_n}$,
having $\mu$ and $F:\reals^{d_1}\times\cdots\times\reals^{d_n}\to[0,1]$ as their distribution and CDF,
with each $X_i$ having $\mu_i$ and $F_i:\reals^{d_i}\to[0,1]$ as its distribution and CDF,
following statements are equivalent
- $X_1,\ldots,X_n \mbox{ are independent}$
- $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
- $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
- $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( P(X_1\preceq x_1,\ldots, X_n\preceq x_n) = \prod P(X_i \preceq x_i) \right)$
- $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( F(x_1,\ldots,x_n) = \prod F_i(x_i) \right)$
- $\mu = \mu_1 \times \cdots \times \mu_n$
- (when $\mu$ has density $f$ and each $\mu_i$ has density $f_i$) $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( f(x_1,\ldots,x_n) = \prod f_i(x_i) \right)$
Independence of infinite collection of random vectors
- infinite collection of random vectors for which every finite subcollection is independent, said to be independent
- for independent (countable) collection of random vectors, $\seq{\seq{X_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{X_{ni}}_{i=1}^\infty)$ are independent
Probability evaluation for two independent random vectors
Sequence of random variables
Expected values
-
$\Expect X$ is
- always defined for nonnegative $X$
-
for general case
- $X$ has an expected value if either $\Expect X^+<\infty$ or $\Expect X^-<\infty$ or both, in which case, $\Expect X =\Expect X^+ - \Expect X^-$
- $X$ is integrable if and only if $\Expect |X| <\infty$
-
limits
- if $\seq{X_n}$ is dominated by integrable random variable or they are uniformly integrable, $\Expect X_n$ converges to $\Expect X$ if $X_n$ converges to $X$ in probability
Markov and Chebyshev's inequalities
Jensen's, Hölder's, and Lyapunov's inequalities
- note Hölder's inequality implies Lyapunov's inequality
Maximal inequalities
- define $S_n = \sum_{i=1}^n X_i$
Moments
- if $\Expect |X|^n<\infty$, $\Expect |X|^k<\infty$ for $k<n$
- $\Expect X^n$ defined only when $\Expect|X|^n<\infty$
Moment generating functions
- $n$-th derivative of $M$ with respect to $s$ is $M^{(n)}(s) = \frac{d^n}{ds^n} M(s) = \Expect \left(X^ne^{sX}\right) = \int x^ne^{sx} d\mu$
- thus, $n$-th derivative of $M$ with respect to $s$ at $s=0$ is $n$-th moment of $X$ $$ M^{(n)}(0) = \Expect X^n $$
- for independent random variables, $\seq{X_i}_{i=1}^n$, moment generating function of $\sum X_i$ $$ \prod M_i(s) $$
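a numeric sketch of $M^{(n)}(0) = \Expect X^n$ for the exponential distribution with rate $2$ (an assumed example), using central finite differences and a Monte Carlo comparison:
```python
import numpy as np

# exponential distribution with rate 2: M(s) = 2/(2-s) for s < 2
M = lambda s: 2.0 / (2.0 - s)
h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)                 # ~ M'(0)  = E X   = 1/2
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2     # ~ M''(0) = E X^2 = 1/2
rng = np.random.default_rng(2)
x = rng.exponential(scale=0.5, size=1_000_000)
print(m1, x.mean())                           # both ~ 0.5
print(m2, (x ** 2).mean())                    # both ~ 0.5
```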
Convergence of Random Variables
Convergences of random variables
- indeed, if above equation holds for $A=(-\infty, x)$, it holds for many other subsets
Relations of different types of convergences of random variables
Necessary and sufficient conditions for convergence in probability
- $X_n$ converge to $X$ with probability $1$ if and only if
\[\left( \forall \epsilon>0 \right) \left( \prob{|X_n-X|>\epsilon\mbox{ i.o.}} = \prob{\limsup \{|X_n-X| > \epsilon\} } = 0 \right)\]
- $X_n$ converge to $X$ in probability if and only if
\[\left( \forall \mbox{ subsequence }\seq{X_{n_k}} \right) \left( \exists \mbox{ its subsequence }\seq{X_{n_{k_l}}} \mbox{ converging to } X \mbox{ with probability } 1 \right)\]
Necessary and sufficient conditions for convergence in distribution
\[X_n\Rightarrow X, \mbox{\ie, $X_n$ converge in distribution}\]
if and only if
\[F_n\Rightarrow F, \mbox{\ie, $F_n$ converge weakly}\]
if and only if
\[\left( \forall A = (-\infty, x] \mbox{ with } \mu\{x\} = 0 \right) \left( \lim \mu_n(A) = \mu(A) \right)\]
if and only if
\[\left( \forall x \mbox{ with } \prob{X=x} = 0 \right) \left( \lim \prob{X_n\leq x} = \prob{X\leq x} \right)\]
Strong law of large numbers
- define $S_n = \sum_{i=1}^n X_i$
- strong law of large numbers also called Kolmogorov's law
Weak law of large numbers
- define $S_n = \sum_{i=1}^n X_i$
- because convergence with probability $1$ implies convergence in probability (), strong law of large numbers implies weak law of large numbers
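a Monte Carlo sketch of the law of large numbers along a single sample path (i.i.d. uniform draws on $[0,1]$, mean $1/2$, are an assumed example):
```python
import numpy as np

# running means S_n / n along one path of i.i.d. U[0,1] draws
rng = np.random.default_rng(3)
x = rng.random(1_000_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1000, 100_000, 1_000_000):
    print(n, running_mean[n - 1])   # approaches E X = 1/2
```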
Normal distributions
- assume probability space, $\meas{\Omega}{\algF}{P}$
- note $\Expect X=c$ and $\Var X=\sigma^2$
- called standard normal distribution when $c=0$ and $\sigma=1$
Multivariate normal distributions
- assume probability space, $\meas{\Omega}{\algF}{P}$
- note that $\Expect X=c$ and covariance matrix is $\Sigma$
Lindeberg-Lévy theorem
- define $S_n = \sum_{i=1}^n X_i$
Limit theorems in $\reals^n$
- $\lim \int f d\mu_n = \int f d\mu$ for every bounded continuous $f$
- $\limsup \mu_n(C) \leq \mu(C)$ for every closed $C$
- $\liminf \mu_n(G) \geq \mu(G)$ for every open $G$
- $\lim \mu_n(A) = \mu(A)$ for every $\mu$-continuity set $A$
Central limit theorem
- assume probability space, $\meas{\Omega}{\algF}{P}$, and define $S_n = \sum_{i=1}^n X_i$
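a Monte Carlo sketch of the central limit theorem (i.i.d. $U[0,1]$ draws are an assumed example; scipy is assumed available for the normal CDF):
```python
import numpy as np
from scipy import stats

# standardized sums of i.i.d. U[0,1] draws vs the standard normal CDF
rng = np.random.default_rng(4)
n, reps = 200, 20_000
c, sigma = 0.5, np.sqrt(1.0 / 12.0)           # mean and std of U[0,1]
s = rng.random((reps, n)).sum(axis=1)         # reps independent copies of S_n
z = (s - n * c) / (sigma * np.sqrt(n))
for x in (-1.0, 0.0, 1.5):
    print(x, (z <= x).mean(), stats.norm.cdf(x))  # empirical vs Phi(x)
```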
Convergence of random series
- for independent $\seq{X_n}$, probability of $\sum X_n$ converging is either $0$ or $1$
- below characterize two cases in terms of distributions of individual $X_n$ -- XXX: diagram
- define truncated version of $X_n$ by $X_n^{(c)}$, i.e., $X_n I_{|X_n|\leq c}$
Convex Optimization
Convex Sets
Lines and line segmenets
Affine sets
Relative interiors and boundaries
Convex sets
- convex hull (of course) is convex set
Cones
- convex cone (of course) is convex set
- examples of convex cones: $\prealk{n}$, $\pprealk{n}$, $\possemidefset{n}$, and $\posdefset{n}$
Hyperplanes and half spaces
- hyperplanes and half spaces are convex sets
Euclidean balls and ellipsoids
- Euclidean balls and ellipsoids are convex sets
Norm balls and norm cones
- norm balls and norm cones are convex sets
Polyhedra
Convexity preserving set operations
-
intersection preserves convexity
- for (any) collection of convex sets, $\coll$, $$ \bigcap_{C\in\coll} C $$ is convex set
-
scalar scaling preserves convexity
- for convex set $C$ $$ \alpha C $$ is convex set for any $\alpha\in\reals$
-
sum preserves convexity
- for convex sets $C$ and $D$ $$ C+D $$ is convex set
-
direct product preserves convexity
- for convex sets $C$ and $D$ $$ C\times D $$ is convex set
-
projection preserves convexity
- for convex set $C\subset A \times B$ $$ \set{x\in A}{(\exists y)((x,y)\in C)} $$ is convex
-
image and inverse image by affine function preserve convexity
- for affine function $f:A\to B$ and convex sets $C\subset A$ and $D\subset B$ $$ f(C) \;\& \; f^{-1}(D) $$ are convex
-
image and inverse image by linear-fractional function preserve convexity
- for convex sets $C\subset \reals^n, D\subset \reals^m$ and linear-fractional function, $g:\reals^n\to\reals^m$, i.e., function defined by $g(x) = (Ax+b)/(c^Tx+d)$ for $A\in\reals^{m\times n}$, $b\in\reals^m$, $c\in\reals^n$, and $d\in\reals$ $$ g(C) \ \& \ g^{-1}(D) $$ are convex
Proper cones and generalized inequalities
- solid, i.e., $\interior{K}\neq \emptyset$
- pointed, i.e., $x\in K$ and $-x\in K$ imply $x=0$
- examples of proper cones: $\prealk{n}$ and $\possemidefset{n}$
- (nonstrict) generalized inequality $$ x \preceq_K y \Leftrightarrow y - x\in K $$
- strict generalized inequality $$ x \prec_K y \Leftrightarrow y - x\in \interior{K} $$
- $\preceq_K$ and $\prec_K$ are partial orderings
Convex sets induced by generalized inequalities
- for affine function $f:\reals^n\to\symset{m}$, i.e., $f(x)=A_0 + A_1 x_1 + \cdots + A_n x_n$ for some $A_0,\ldots,A_n\in\symset{m}$, $f^{-1}(\possemidefset{m})$ is convex (by ), i.e., $$ \set{x\in\reals^n}{A_0 + A_1 x_1 + \cdots + A_n x_n \succeq 0} \subset \reals^n $$ is convex
- can negate each matrix $A_i$ and have same results, hence $$ \set{x\in\reals^n}{A_0 + A_1 x_1 + \cdots + A_n x_n \preceq 0} \subset \reals^n $$ is (also) convex
Separating and supporting hyperplanes
Dual cones
- the figure illustrates $x \in K^\ast$ while $z\not\in K^\ast$
Dual norms
-
examples
- dual cone of subspace $V\subset \reals^n$ is orthogonal complement of $V$, $V^\perp$, where $V^\perp=\set{y}{\forall v\in V,v^Ty = 0}$
- $\prealk{n}$ and $\possemidefset{n}$ are self-dual
- dual of norm cone is norm cone associated with dual norm, i.e., if $K=\set{(x,t)\in\reals^{n} \times \reals}{\|x\|\leq t}$ $$ K^\ast=\set{(y,u)\in\reals^{n} \times \reals}{\|y\|_\ast\leq u} $$
Properties of dual cones
- $K^\ast$ is closed and convex
- $K_1\subset K_2 \Rightarrow K_2^\ast \subset K_1^\ast$
- if $\interior{K} \neq \emptyset$, $K^\ast$ is pointed
- if $\closure{K}$ is pointed, $\interior{(K^\ast)} \neq \emptyset$
- $K^{\ast\ast}=(K^\ast)^\ast$ is closure of convex hull of $K$
- if $K$ is closed and convex, $K^{\ast\ast} = K$
- dual of proper cone is proper cone
- for proper cone $K$, $K^{\ast\ast}=K$
Dual generalized inequalities
- $x\preceq_K y$ if and only if $(\forall \lambda \succeq_{K^\ast} 0)(\lambda^T x \leq \lambda^T y)$
- $x\prec_K y$ if and only if $(\forall \lambda \succeq_{K^\ast} 0 \mbox{ with } \lambda\neq0)(\lambda^T x < \lambda^T y)$
- $x\preceq_{K^\ast} y$ if and only if $(\forall \lambda \succeq_{K} 0)(\lambda^T x \leq \lambda^T y)$
- $x\prec_{K^\ast} y$ if and only if $(\forall \lambda \succeq_{K} 0 \mbox{ with } \lambda\neq0)(\lambda^T x < \lambda^T y)$
Theorem of alternative for linear strict generalized inequalities
Convex Functions
Convex functions
- function $f:\reals^n\to\reals$ the domain of which is convex and which satisfies $$ \left( \forall x,y\in \dom f, 0\leq \theta \leq 1 \right) \left( f(\theta x + (1-\theta) y) \leq \theta f(x) + (1-\theta) f(y) \right) $$ said to be convex
- function $f:\reals^n\to\reals$ the domain of which is convex and which satisfies $$ \left( \forall \mbox{ distinct } x,y\in \dom f, 0< \theta < 1 \right) \left( f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y) \right) $$ said to be strictly convex
- function $f:\reals^n\to\reals$ the domain of which is convex where $-f$ is convex, said to be concave
- function $f:\reals^n\to\reals$ the domain of which is convex where $-f$ is strictly convex, said to be strictly concave
Extended real-value extensions of convex functions
-
using extended real-value extensions of convex functions,
can drop ``$\dom f$'' in equations,
e.g.,
- $f$ is convex if and only if its extended-value extension $\tilde{f}$ satisfies $$ \left( \forall x,y\in \reals^n, 0\leq \theta \leq 1 \right) \left( \tilde{f}(\theta x + (1-\theta) y) \leq \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y) \right) $$
- $f$ is strictly convex if and only if its extended-value extension $\tilde{f}$ satisfies $$ \left( \forall \mbox{ distinct } x,y\in \reals^n, 0< \theta < 1 \right) \left( \tilde{f}(\theta x + (1-\theta) y) < \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y) \right) $$
First-order condition for convexity
- differentiable $f$ is convex if and only if $\dom f$ is convex and $$ \left( \forall x,y\in \dom f \right) \left( f(y) \geq f(x) + \nabla f(x) ^T (y-x) \right) $$ (numeric check at end of this section)
- differentiable $f$ is strictly convex if and only if $\dom f$ is convex and $$ \left( \forall \mbox{ distinct } x,y\in \dom f \right) \left( f(y) > f(x) + \nabla f(x) ^T (y-x) \right) $$
-
implies
that
for convex function $f$
- first-order Taylor approximation is global underestimator
-
can derive
global information
from
local information
- e.g., if $\nabla f(x)=0$, $x$ is global minimizer
- explains remarkable properties of convex functions and convex optimization problems
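a minimal numeric check of the first-order condition above (the random convex quadratic $f(x)=x^TPx+q^Tx$ with $P\succeq0$ is an assumed example):
```python
import numpy as np

# first-order Taylor expansion of a convex quadratic never exceeds f(y)
rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
P = A @ A.T                                   # positive semidefinite
q = rng.normal(size=4)
f = lambda x: x @ P @ x + q @ x
grad = lambda x: 2 * P @ x + q
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9   # global underestimator
```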
Second-order condition for convexity
- for twice differentiable $f$: if $\dom f$ is convex and $$ \left( \forall x\in \dom f \right) \left( \nabla^2 f(x) \succ 0 \right) $$ $f$ is strictly convex
Convex function examples
- assume function $f:\reals^n\to\reals$ and $\dom f =\reals^n$ unless specified otherwise
- affine function, i.e., $f(x)=a^Tx +b$ for some $a\in\reals^n$ and $b\in\reals$, is convex
-
quadratic functions
- if $f(x) = x^T Px + q^Tx$
for some $P\in\symset{n}$ and $q\in\reals^n$
- $f$ is convex if and only if $P\succeq0$
- $f$ is strictly convex if and only if $P\succ0$
- exponential function, i.e., $f(x) = \exp(a^Tx+b)$ for some $a\in\reals^n$ and $b\in\reals$, is convex
- power, i.e., $f(x) = x^a$ for some $a\geq1$, is convex on $\ppreals$
- power of absolute value, i.e., $f(x) = |x|^a$ for some $a\geq1$, is convex on $\reals$
- logarithm function, i.e., $f(x) = \log x$, is concave on $\ppreals$
- negative entropy, i.e., $$ f(x) = \left\{\begin{array}{ll} x\log x & \mbox{if } x >0 \\ 0 &\mbox{if } x=0 \end{array}\right. $$ is convex on $\preals$
- norm as function is convex (by definition of norms, i.e., triangle inequality & absolute homogeneity)
- max function, i.e., $f(x)=\max\{x_1,\ldots,x_n\}$, is convex
- quadratic-over-linear function, $f(x,y) = x^2/y$, is convex on $\reals\times \ppreals$
- log-sum-exp, $f(x) = \log(\exp(x_1)+\cdots+\exp(x_n))$, is convex
- geometric mean, $f(x) = (\prod_{i=1}^n x_i )^{1/n}$, is concave on $\pprealk{n}$
- log-determinant, $f(X) = \log \det X$, is concave on $\posdefset{n}$
Sublevel sets and superlevel sets
- every sublevel set of convex function is convex
- and every superlevel set of concave function is convex
-
note, however, converse is not true
- e.g., every sublevel set of $\log$ is convex, but $\log$ is concave
Epigraphs and hypographs
- function is convex if and only if its epigraph is convex
- function is concave if and only if its hypograph is convex
Convexity preserving function operations
-
nonnegative weighted sum preserves convexity
- for convex functions $f_1,\ldots,f_n$ and nonnegative weights $w_1,\ldots, w_n$ $$ w_1 f_1 + \cdots + w_n f_n $$ is convex
-
nonnegative weighted integration preserves convexity
- for measurable set $Y$, $w:Y\to\preals$, and $f:X \times Y\to\reals$ where $f(x,y)$ is convex in $x$ for every $y\in Y$ and measurable in $y$ for every $x\in X$ $$ \int_Y w(y) f(x,y) dy $$ is convex
-
pointwise maximum preserves convexity
- for convex functions $f_1,\ldots,f_n$ $$ \max\{f_1, \ldots, f_n\} $$ is convex
-
pointwise supremum preserves convexity
- for indexed family of convex functions $\indexedcol{f_\lambda}_{\lambda\in\Lambda}$ $$ \sup_{\lambda \in \Lambda} f_\lambda $$ is convex (one way to see this is $\epi \sup_\lambda f_\lambda = \bigcap_\lambda \epi f_\lambda$)
-
composition
-
suppose $g:\reals^n\to\reals^k$, $h:\reals^k\to\reals$, and $f=h\circ g$
- $f$ convex if $h$ convex & nondecreasing in each argument, and $g_i$ convex
- $f$ convex if $h$ convex & nonincreasing in each argument, and $g_i$ concave
- $f$ concave if $h$ concave & nondecreasing in each argument, and $g_i$ concave
- $f$ concave if $h$ concave & nonincreasing in each argument, and $g_i$ convex
-
minimization
- for function $f(x,y)$ convex in $(x,y)$ and convex set $C$ $$ \inf_{y\in C} f(x,y) $$ is convex provided it is bounded below where domain is $\set{x}{(\exists y\in C)((x,y) \in \dom f)}$
-
perspective of convex function preserves convexity
- for convex function $f:X\to\reals$, function $g:X\times \reals \to \reals$ defined by $$ g(x,t) = tf(x/t) $$ with $\dom g = \set{(x,t)}{x/t \in \dom f, t>0}$ is convex
Convex functions examples
-
piecewise-linear function is convex, i.e.
- $\max\{a_1^Tx+b_1,\ldots,a_m^T x + b_m\}$ for some $a_i\in\reals^n$ and $b_i\in\reals$ is convex
-
sum of $k$ largest components is convex, i.e.
- $x_{[1]} + \cdots + x_{[k]}$ where $x_{[i]}$ denotes $i$-th largest component, is convex (since $f(x) = \max\set{x_{i_1}+\cdots+x_{i_k}}{1\leq i_1< i_2<\cdots < i_k\leq n}$)
-
support function of set, i.e.,
- $\sup\set{x^Ty}{y\in A}$ for $A\subset\reals^n$ is convex
-
distance (when measured by arbitrary norm) to farthest point of set
- $\sup\set{\|x-y\|}{y\in A}$ for $A\subset\reals^n$ is convex
-
least-squares cost as function of weights
-
$\inf_{x\in\reals^n} \sum^n_{i=1} w_i(a_i^Tx - b_i)^2$ for some $a_i\in\reals^n$ and $b_i\in\reals$
is concave in $w$
- note that above function equals $\sum_{i=1}^n w_i b_i^2 - \left(\sum_{i=1}^n w_i b_i a_i\right)^T \left( \sum_{j=1}^n w_ja_ja_j^T\right)^{-1} \left(\sum_{i=1}^n w_i b_i a_i\right)$ (when inverse exists), but from this expression concavity is not clear
-
maximum eigenvalue of symmetric matrix
- $\lambda_\mathrm{max}(F(x)) = \sup\set{y^TF(x)y}{\|y\|_2 \leq 1}$ where $F:\reals^n\to \symset{m}$ is linear function in $x$
-
norm of matrix
- $\sup\set{u^TG(x)v}{\|u\|_2 \leq 1, \|v\|_2\leq1}$ where $G:\reals^n\to \reals^{m\times n}$ is linear function in $x$
-
distance (when measured by arbitrary norm) to convex set
- for convex set $C$, $\inf\set{\|x-y\|}{y\in C}$
-
infimum of convex function
subject to linear constraint
- for convex function $h$, $\inf\set{h(y)}{Ay=x}$ is convex (since it is $\inf_y (h(y) + I_{Ay=x}(x,y))$)
-
perspective of Euclidean norm squared
- map $(x,t) \mapsto x^Tx /t$ induces convex function in $(x,t)$ for $t>0$
-
perspective of negative log
- map $(x,t) \mapsto -t \log(x/t)$ induces convex function in $(x,t) \in \pprealk{2}$
-
perspective of convex function
- for convex function $f:\reals^m\to\reals$, function $g:\reals^n\to\reals$ defined by $$ g(x) = (c^T x + d) f((Ax+b)/(c^T x + d)) $$ for some $A\in\reals^{m\times n}$, $b\in\reals^m$, $c\in\reals^n$, and $d\in\reals$ with $\dom g = \set{x}{(Ax+b)/(c^Tx + d)\in \dom f, c^T x + d >0}$ is convex
Conjugate functions
- conjugate function is convex for any function $f$ because it is supremum of linear (hence convex) functions (in $x$) ()
Conjugate function examples
-
strictly convex quadratic function
- for $f:\reals^n \to \preals$ defined by $f(x) = x^TQx/2$ where $Q\in \posdefset{n}$, $$ f^\ast(x)= \sup_y(y^Tx - y^TQy/2) = (y^Tx - y^TQy/2)|_{y=Q^{-1}x} = x^TQ^{-1}x/2 $$ which is also strictly convex quadratic function
-
log-determinant
- for function $f:\posdefset{n} \to \reals$ defined by $f(X) = \log \det X^{-1}$ $$ f^\ast(X) = \sup_{Y\in\posdefset{n}} (\Tr XY + \log \det Y) = \log\det (-X)^{-1} - n $$ where $\dom f^\ast = -\posdefset{n}$
-
indicator function
- for indicator function $I_A:\reals^n\to\{0,\infty\}$ with $A\subset \reals^n$ $$ I_A^\ast(x) = \sup_y (y^Tx - I_A(y)) = \sup \set{y^Tx}{y\in A} $$ which is support function of $A$
-
log-sum-exp function
- for function $f: \reals^n \to \reals$ defined by $f(x) = \log(\sum_{i=1}^n \exp(x_i))$ $$ f^\ast(x) = \sum_{i=1}^n x_i \log x_i + I_{x\succeq 0, \ones^T x = 1}(x) $$
-
norm
- for norm function $f:\reals^n\to\preals$ defined by $f(x)=\|x\|$ $$ f^\ast(x) = \sup_y( {y^Tx - \|y\|}) = I_{\|x\|_\ast\leq1}(x) $$
-
norm squared
- for function $f: \reals^n \to \preals$ defined by $f(x) = \|x\|^2/2$ $$ f^\ast(x) = \|x\|_\ast^2/2 $$
-
differentiable convex function
- for differentiable convex function $f:\reals^n\to\reals$ $$ f^\ast(x)= (y^\ast)^T \nabla f(y^\ast) - f(y^\ast) $$ where $y^\ast = \argsup_y (x^Ty-f(y))$
-
sum of independent functions
- for function $f:\reals^n\times \reals^m \to \reals$ defined by $f(x,y) = f_1(x) + f_2(y)$ where $f_1:\reals^n\to\reals$ and $f_2:\reals^m\to\reals$ $$ f^\ast(x,y) = f_1^\ast(x) + f_2^\ast(y) $$
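a numeric check of the first conjugate example above (the random positive definite $Q$ is an assumption, and scipy's generic optimizer stands in for the supremum):
```python
import numpy as np
from scipy.optimize import minimize

# f(y) = y^T Q y / 2 with Q > 0; compare a numerically computed
# sup_y (x^T y - f(y)) against the closed form x^T Q^{-1} x / 2
rng = np.random.default_rng(6)
A = rng.normal(size=(5, 5))
Q = A @ A.T + 5 * np.eye(5)                   # positive definite
x = rng.normal(size=5)
neg = lambda y: -(x @ y - 0.5 * y @ Q @ y)    # -(x^T y - f(y))
res = minimize(neg, np.zeros(5))
print(-res.fun, 0.5 * x @ np.linalg.solve(Q, x))  # agree to solver tolerance
```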
Convex functions \wrt\ generalized inequalities
- function $f$ satisfying $$ \left( \forall x,y \in \dom f, 0\leq \theta\leq 1 \right) \left( f(\theta x + (1-\theta) y) \preceq_K \theta f(x) + (1-\theta) f(y) \right) $$ called $K$-convex
- function $f$ satisfying $$ \left( \forall x\neq y \in \dom f, 0< \theta< 1 \right) \left( f(\theta x + (1-\theta) y) \prec_K \theta f(x) + (1-\theta) f(y) \right) $$ called strictly $K$-convex
- function $f$ is $K$-convex if and only if for every $w\succeq_{K^\ast}0$, $w^Tf$ is convex
- function $f$ is strictly $K$-convex if and only if for every nonzero $w\succeq_{K^\ast}0$, $w^Tf$ is strictly convex
Matrix convexity
-
examples of matrix convexity
- function of $\reals^{n\times m}$ into $\possemidefset{n}$ defined by $X\mapsto XX^T$ is matrix convex
- function of $\posdefset{n}$ into itself defined by $X\mapsto X^p$ is matrix convex for $1\leq p\leq 2$ or $-1\leq p \leq0$, and matrix concave for $0\leq p\leq1$
- function of $\symset{n}$ into $\posdefset{n}$ defined by $X\mapsto \exp(X)$ is not matrix convex
- quadratic matrix function of $\reals^{m\times n}$ into $\symset{n}$ defined by $X\mapsto X^TAX + B^TX + X^TB + C$ for $A\in\symset{m}$, $B\in\reals^{m\times n}$, and $C\in\symset{n}$ is matrix convex when $A\succeq0$
Convex Optimization Problems
Optimization problems
- $\fobj$, $\fie$, and $\feq$ are objective function, inequality constraint function, \& equality constraint function
- $\fie(x) \preceq 0$ and $\feq(x) = 0$ are inequality constraints and equality constraints
- $\optdomain = \xobj \cap \xie \cap \xeq$ is domain of optimization problem
- $\optfeasset =\set{x\in \optdomain}{\fie(x) \preceq0, \feq(x)=0}$, called feasible set; $x\in\optdomain$ said to be feasible if $x\in\optfeasset$; optimization problem said to be feasible if $\optfeasset\neq \emptyset$
- $p^\ast = \inf\set{\fobj(x)}{x\in\optfeasset}$, called optimal value of optimization problem
- if optimization problem is infeasible, $p^\ast = \infty$ (following convention that infimum of empty set is $\infty$)
- if $p^\ast=-\infty$, optimization problem said to be unbounded
Global and local optimalities
- $x\in \optfeasset$ with $\fobj(x) = p^\ast$, called (global) optimal point
- $X_\mathrm{opt} = \set{x\in \optfeasset}{\fobj(x)=p^\ast}$, called optimal set
- when $X_\mathrm{opt} \neq \emptyset$, we say optimal value is attained or achieved and optimization problem is solvable
- optimization problem is not solvable if $p^\ast = \infty$ or $p^\ast = -\infty$ (converse is not true)
Equivalent optimization problems
-
below two optimization problems are equivalent
- $$ \begin{array}{ll} \mbox{minimize} & -x-y \\ \mbox{subject to} & 2x+y \leq1 \\ & x+2y \leq1 \end{array} $$
- $$ \begin{array}{ll} \mbox{minimize} & -2u-v/3 \\ \mbox{subject to} & 4u+v/3 \leq1 \\ & 2u+2v/3 \leq1 \end{array} $$
- since if $(x^\ast, y^\ast)$ solves first, $(u,v)=(x^\ast/2, 3y^\ast)$ solves second, and if $(u^\ast, v^\ast)$ solves second, $(x,y)=(2u^\ast, v^\ast/3)$ solves first
Change of variables
- given function $\phi:\mathcalfont{Z} \to \xdomain$, optimization problem in can be rewritten as $$ \begin{array}{ll} \mbox{minimize} & \fobj(\phi(z)) \\ \mbox{subject to} & \fie(\phi(z)) \preceq 0 \\ & \feq(\phi(z)) =0 \end{array} $$ where $z\in\mathcalfont{Z}$ is optimization variable
- if $\phi$ is injective and $\optdomain \subset \phi(\mathcalfont{Z})$, above optimization problem and optimization problem in are equivalent
- two optimization problems said to be related by change of variable or substitution of variable $x=\phi(z)$
Convex optimization
- when $\xdomain= \reals^n$, optimization problem can be formulated as $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq 0 \\ & Ax = b \end{array} $$ for some $A\in\reals^{p\times n}$ and $b\in\reals^p$
-
domain of convex optimization problem is convex
- since domains of $\fobj$, $\fie$, and $\feq$ are convex (by definition of convex functions) and intersection of convex sets is convex
-
feasible set of convex optimization problem
is convex
- since sublevel sets of convex functions are convex, solution set of affine equality constraints is affine set (hence convex), and intersection of convex sets is convex
Optimality conditions for convex optimization problems
- $x\in\optdomain$ is optimal if and only if $x\in\optfeasset$ and $$ \left( \forall y \in \optfeasset \right) \left( \nabla \fobj(x)^T(y-x) \geq0 \right) $$
- for unconstrained problems, $x\in\optdomain$ is optimal if and only if $$ \nabla \fobj(x)=0 $$
Optimality conditions for some convex optimization problems
-
unconstrained convex quadratic optimization
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x) = (1/2)x^TPx + q^Tx
\end{array}
$$
where $\xobj=\reals^n$ and $P\in\possemidefset{n}$
-
$x$ is optimal if and only if
$$
\nabla \fobj(x) = Px + q = 0
$$
exist three cases
- if $P\in\posdefset{n}$, exists unique optimum $x^\ast = -P^{-1}q$
- if $q\in\range(P)$, $X_\mathrm{opt}=-P^\dagger q + \nullspace(P)$
- if $q\not\in\range(P)$, $p^\ast = -\infty$
-
analytic centering
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x) = - \sum_{i=1}^m \log (b_i-a_i^Tx)
\end{array}
$$
where $\xobj = \set{x\in\reals^n}{Ax \prec b}$
-
$x$ is optimal if and only if
$$
\nabla \fobj(x) = \sum_{i=1}^m \frac{1}{b_i-a_i^Tx}a_i = 0
$$
exist three cases
- exists unique optimum, which happens if and only if $\set{x}{Ax \prec b}$ is nonempty and bounded
- exist infinitely many optima, in which case, $X_\mathrm{opt}$ is affine set
- exists no optimum, which happens if and only if $\fobj$ is unbounded below
-
convex optimization problem with equality constraints only
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x)
\\
\mbox{subject to}
& Ax =b
\end{array}
$$
where $\xdomain=\reals^n$
- $x$ is optimal if and only if $$ \nabla \fobj(x) \perp \nullspace(A) $$ or equivalently, exists $\nu\in\reals^p$ such that $$ \nabla \fobj(x) = A^T\nu $$
Linear programming
- can transform above LP into standard form LP $$ \begin{array}{ll} \mbox{minimize} & \tilde{c}^T\tilde{x} \\ \mbox{subject to} & \tilde{A}\tilde{x} = \tilde{b} \\ & \tilde{x} \succeq0 \end{array} $$
LP examples
-
diet problem
-
find amounts of $n$ different foods to minimize purchase cost
while satisfying nutritional requirements
- assume exist $n$ foods and $m$ nutrients; $c_i$ is cost of food $i$, $A_{ji}$ is amount of nutrient $j$ contained in unit quantity of food $i$, $b_j$ is required amount of nutrient $j$
- diet problem can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax \succeq b \\ & x\succeq0 \end{array} $$
-
Chebyshev center of polyhedron
- find largest Euclidean ball contained in polyhedron
- assume polyhedron is $\set{x\in\reals^n}{a_i^Tx \leq b_i, i=1,\ldots, m}$
- problem of finding Chebyshev center of polyhedron can be formulated as LP $$ \begin{array}{ll} \mbox{maximize} & r \\ \mbox{subject to} & a_i^T x + r\|a_i\|_2 \leq b_i,\quad i=1,\ldots,m \end{array} $$ where optimization variables are $x\in\reals^n$ and $r\in\reals$ (numeric sketch after these examples)
-
piecewise-linear minimization
- minimize maximum of affine functions
- assume $m$ affine functions $a_i^Tx + b_i$
- piecewise-linear minimization problem can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & t \\ \mbox{subject to} & a_i^Tx + b_ i \leq t,\quad i=1,\ldots,m \end{array} $$
-
linear-fractional program
$$
\begin{array}{ll}
\mbox{minimize} &
(
c^T x + d
)
/
(
e^T x + f
)
\\
\mbox{subject to} &
Gx \preceq h
\\ &
Ax = b
\end{array}
$$
- if feasible set is nonempty, can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & c^T y + dz \\ \mbox{subject to} & Gy - hz \preceq0 \\ & Ay-bz = 0 \\ & e^Ty + fz = 1 \\ & z\geq0 \end{array} $$
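a minimal numeric sketch of the Chebyshev-center LP above, using scipy's `linprog` (assumed available); the unit square is an assumed example polyhedron:
```python
import numpy as np
from scipy.optimize import linprog

# Chebyshev center of the unit square {x : 0 <= x_i <= 1}:
# maximize r subject to a_i^T x + r ||a_i||_2 <= b_i, variables z = (x, r)
a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
A_ub = np.hstack([a, np.linalg.norm(a, axis=1, keepdims=True)])
c = np.array([0.0, 0.0, -1.0])                # maximize r <=> minimize -r
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None), (None, None), (0, None)])
print(res.x)                                  # center ~ (0.5, 0.5), radius ~ 0.5
```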
Quadratic programming
- when $P=0$, QP reduces to LP, hence LP is specialization of QP
QP examples
-
least-squares (LS) problems
- LS can be formulated as QP $$ \begin{array}{ll} \mbox{minimize} & \|Ax-b\|_2^2 \end{array} $$
-
distance between two polyhedra
- assume two polyhedra $\set{x\in\reals^n}{Ax\preceq b, Cx =d}$ and $\set{x\in\reals^n}{\tilde{A}x\preceq \tilde{b}, \tilde{C}x =\tilde{d}}$
- problem of finding distance between two polyhedra can be formulated as QP $$ \begin{array}{ll} \mbox{minimize} & \|x-y\|_2^2 \\ \mbox{subject to} & Ax\preceq b, \quad Cx =d \\ & \tilde{A}y\preceq \tilde{b}, \quad \tilde{C}y =\tilde{d} \end{array} $$
Quadratically constrained quadratic programming
- when $P_i=0$ for $i=1,\ldots,m$, QCQP reduces to QP, hence QP is specialization of QCQP
Second-order cone programming
- when $b_i=0$, SOCP reduces to QCQP, hence QCQP is specialization of SOCP
SOCP examples
-
robust linear program
-
minimize $c^T x$
while satisfying
$\tilde{a}_i^T x \leq b_i$
for every $\tilde{a}_i \in \set{a_i+P_iu}{\|u\|_2\leq1}$
where $P_i\in\symset{n}$
- can be formulated as SOCP $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & a_i^T x + \|P_i^T x\|_2 \leq b_i \end{array} $$ (numeric sketch after these examples)
-
linear program with random constraints
-
minimize $c^T x$
while satisfying
$\tilde{a}_i^T x \leq b_i$
with probability no less than $\eta$
where $\tilde{a}_i \sim \normal(a_i,\Sigma_i)$
- can be formulated as SOCP $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & a_i^T x + \Phi^{-1}(\eta)\|\Sigma_i^{1/2} x\|_2 \leq b_i \end{array} $$
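a sketch of the robust-LP SOCP above in cvxpy (assumed installed); the problem data are randomly generated assumptions, and a norm bound on $x$ is added only to keep the random instance bounded:
```python
import numpy as np
import cvxpy as cp

# minimize c^T x subject to a_i^T x + ||P_i^T x||_2 <= b_i
rng = np.random.default_rng(7)
m, n = 6, 3
a = rng.normal(size=(m, n))
P = 0.1 * rng.normal(size=(m, n, n))
b = np.abs(rng.normal(size=m)) + 1.0          # keeps x = 0 feasible
c = rng.normal(size=n)

x = cp.Variable(n)
constraints = [a[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
constraints.append(cp.norm(x, 2) <= 10.0)     # rules out unboundedness
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print(prob.status, x.value)
```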
Geometric programming
Geometric programming in convex form
- geometric program in is not convex optimization problem (as stated)
- however, can be transformed to equivalent convex optimization problem by change of variables and transformation of functions
Convex optimization with generalized inequalities
- problem in reduces to convex optimization problem in when $q=1$ and $K_1=\prealk{m}$, hence convex optimization is specialization of convex optimization with generalized inequalities
- like convex optimization
Conic programming
- can transform above CP to standard form CP $$ \begin{array}{ll} \mbox{minimize} & \tildefobj(X) \\ \mbox{subject to} & \tildefeq (X) = 0 \\ & X \succeq_{K} 0 \end{array} $$
- cone program is one of simplest convex optimization problems with generalized inequalities
Semidefinite programming
- above inequality, called linear matrix inequality (LMI)
- can transform SDP to standard form SDP $$ \begin{array}{ll} \mbox{minimize} & \Tr (CX) \\ \mbox{subject to} & \Tr (A_iX) = b_i\quad i=1,\ldots,p \\ & X \succeq 0 \end{array} $$ where $\xdomain=\possemidefset{n}$ and $C,A_1,\ldots,A_p\in\symset{n}$ and $b_i\in\reals$
SDP examples
- LP
-
SOCP
- SOCP in is equivalent to $$ \begin{array}{ll} \mbox{minimize} & f^T x \\ \mbox{subject to} & Fx = g \\ & \begin{my-matrix}{cc} c_i^Tx + d_i & x^TA_i^T + b_i^T \\ A_ix + b_i & (c_i^Tx + d_i)I_{n_i} \end{my-matrix} \succeq 0 \quad i=1,\ldots,m \end{array} $$ which can be transformed to SDP in , thus, SDP reduces to SOCP
- hence, SOCP is specialization of SDP
Determinant maximization problems
- if $l=1$, $C_1=\cdots=C_n=0$, $D=1$, max-det problem reduces to SDP, hence SDP is specialization of max-det problem
Diagrams for containment of convex optimization problems
- the figure shows containment relations among convex optimization problems
- vertical lines ending with filled circles indicate existence of direct reductions, i.e., optimization problem transformations to special cases
Duality
Lagrangian
- $\lambda$, called Lagrange multiplier associated with inequality constraints $\fie(x)\preceq0$
- $\lambda_i$, called Lagrange multiplier associated with $i$-th inequality constraint $\fie_i(x)\leq0$
- $\nu$, called Lagrange multiplier associated with equality constraints $\feq(x)=0$
- $\nu_i$, called Lagrange multiplier associated with $i$-th equality constraint $\feq_i(x)=0$
- $\lambda$ and $\nu$, called dual variables or Lagrange multiplier vectors associated with the optimization problem
Lagrange dual functions
-
$g$ is (always) concave function (even when optimization problem is not convex)
- since it is pointwise infimum of affine (hence concave) functions
- $g(\lambda,\nu)$ provides lower bound for optimal value of associated optimization problem, i.e., $$ g(\lambda,\nu) \leq p^\ast $$ for every $\lambda\succeq0$
- $(\lambda,\nu) \in \set{(\lambda,\nu)}{\lambda\succeq0, g(\lambda,\nu)>-\infty}$, said to be dual feasible
Dual function examples
-
LS solution of linear equations
$$
\lssollineqs{primal}
$$
- Lagrangian - $L(x,\nu) = x^T x + \nu^T(Ax-b)$
- Lagrange dual function $$ \lssollineqs{dual fcn} $$
-
standard form LP
$$
\begin{array}{ll}
\mbox{minimize} &
c^Tx
\\
\mbox{subject to} &
Ax = b
\\ &
x\succeq 0
\end{array}
$$
- Lagrangian - $L(x,\lambda,\nu) = c^T x - \lambda^T x + \nu^T(Ax-b)$
-
Lagrange dual function
$$
g(\lambda,\nu) = \left\{\begin{array}{ll}
-b^T\nu & A^T\nu - \lambda + c = 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{(A^T\nu + c,\nu)}{A^T\nu +c \succeq0}$ (numeric check of duality at end of these examples)
-
maximum cut, sometimes called max-cut, problem, which is NP-hard
$$
\begin{array}{ll}
\mbox{minimize} &
x^T W x
\\
\mbox{subject to} &
x_i^2 = 1
\end{array}
$$
where $W\in\symset{n}$
- Lagrangian - $L(x,\nu) = x^T(W+\diag(\nu))x - \ones^T\nu$
-
Lagrange dual function
$$
g(\nu) = \left\{\begin{array}{ll}
-\ones^T\nu
& W + \diag(\nu) \succeq 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{\nu}{W+\diag(\nu)\succeq0}$
-
some trivial problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(x)
\\
\mbox{subject to} &
x=0
\end{array}
$$
- Lagrangian - $L(x,\nu) =f(x)+\nu^Tx$
-
Lagrange dual function
$$
g(\nu) = \inf_{x\in\reals^n} (f(x)+\nu^Tx)
= -\sup_{x\in\reals^n} ((-\nu)^Tx-f(x))
= - f^\ast(-\nu)
$$
- hence, set of dual feasible points is $-\dom f^\ast$, and for every $f:\reals^n\to\reals$ and $\nu\in\reals^n$ $$ -f^\ast(-\nu) \leq f(0) $$
-
minimization with linear inequality and equality constraints
$$
\begin{array}{ll}
\mbox{minimize} &
f(x)
\\
\mbox{subject to} &
Ax\preceq b
\\ &
Cx= d
\end{array}
$$
- Lagrangian - $L(x,\lambda, \nu) = f(x) + \lambda^T(Ax-b) + \nu^T(Cx-d)$
-
Lagrange dual function
$$
g(\lambda,\nu) = -b^T\lambda - d^T\nu - f^\ast(-A^T \lambda - C^T\nu)
$$
- hence, set of dual feasible points is $\set{(\lambda,\nu)}{-A^T\lambda - C^T\nu \in \dom f^\ast, \lambda\succeq 0}$
-
equality constrained norm minimization
$$
\begin{array}{ll}
\mbox{minimize} &
\|x\|
\\
\mbox{subject to} &
Ax = b
\end{array}
$$
- Lagrangian - $L(x,\nu) = \|x\| + \nu^T(Ax-b)$
-
Lagrange dual function
$$
g(\nu) = -b^T\nu -\sup_{x\in\reals^n} ((-A^T\nu)^Tx - \|x\|)
= \left\{\begin{array}{ll}
-b^T \nu&\|A^T\nu\|_\ast\leq1
\\
- \infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{\nu}{\|A^T\nu\|_\ast \leq1}$
-
entropy maximization
$$
\entmax{primal}
$$
where domain of objective function is $\pprealk{n}$
- Lagrangian - $L(x,\lambda,\nu) = \sum_{i=1}^n x_i\log x_i + \lambda^T(Ax-b) + \nu(\ones^Tx-1)$
- Lagrange dual function $$ g(\lambda,\nu) = \entmax{dual fcn} $$ obtained using $f^\ast(y) = \sum_{i=1}^n \exp(y_i-1)$ where $a_i$ is $i$-th column vector of $A$
-
minimum volume covering ellipsoid
$$
\minvolcovering{primal}
$$
where domain of objective function is $\posdefset{n}$
- Lagrangian - $L(X,\lambda) = -\log \det X + \sum_{i=1}^m \lambda_i(a_i^T X a_i - 1)$
- Lagrange dual function $$ g(\lambda) = \minvolcovering{dual fcn} $$ obtained using $f^\ast(Y) = -\log\det(-Y) - n$
Best lower bound
- for every $(\lambda,\nu)$ with $\lambda\succeq 0$, Lagrange dual function $g(\lambda,\nu)$ (in ) provides lower bound for optimal value $p^\ast$ of optimization problem in
-
natural question to ask is
- how good is the lower bound?
- what is best lower bound we can achieve?
- these questions lead to definition of Lagrange dual problem
Lagrange dual problems
- original problem in , (sometimes) called primal problem
- domain is $\reals^m\times \reals^p$
- dual feasibility defined on page~, i.e., $(\lambda,\nu)$ satisfying $\lambda \succeq 0$ and $g(\lambda,\nu) > -\infty$, indeed means feasibility for Lagrange dual problem
- $d^\ast = \sup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0}$, called dual optimal value
- $(\lambda^\ast,\nu^\ast) = \argsup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0}$, said to be dual optimal or called optimal Lagrange multipliers (if exists)
- Lagrange dual problem in is convex optimization problem (even when original problem is not) since it maximizes (always) concave function $g(\lambda,\nu)$
Making dual constraints explicit dual problems
- (our specific) way we define Lagrange dual function in as function $g$ from $\reals^m \times \reals^p$ into $\reals\cup\{-\infty\}$, i.e., $\dom g = \reals^m\times\reals^p$
- however, in many cases, feasible set $\set{(\lambda,\nu)}{\lambda \succeq 0,\; g(\lambda,\nu) > -\infty}$ is proper subset of $\reals^m\times\reals^p$
- can make this implicit feasibility condition explicit by adding it as constraint (as shown in following examples)
Lagrange dual problems associated with LPs
-
standard form LP
- primal problem $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax = b \\ & x\succeq 0 \end{array} $$
-
Lagrange dual problem
$$
\begin{array}{ll}
\mbox{maximize} &
g(\lambda,\nu) = \left\{\begin{array}{ll}
-b^T\nu & A^T\nu - \lambda + c = 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
\\
\mbox{subject to} &
\lambda \succeq 0
\end{array}
$$
(refer to page~
for Lagrange dual function)
- can make dual feasibility explicit by adding it to constraints as mentioned on page~ $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu \\ \mbox{subject to} & \lambda \succeq 0 \\ & A^T\nu - \lambda + c = 0 \end{array} $$
- can further simplify problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu \\ \mbox{subject to} & A^T\nu + c \succeq 0 \end{array} $$
- last problem is inequality form LP
- all three problems are equivalent, but not identical
- will, however, with abuse of terminology, refer to all three problems as Lagrange dual problem
-
inequality form LP
- primal problem $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax \preceq b \end{array} $$
- Lagrangian $$ L(x,\lambda) = c^Tx + \lambda^T(Ax-b) $$
- Lagrange dual function $$ g(\lambda) = -b^T\lambda + \inf_{x\in\reals^n} (c+A^T\lambda)^T x = \left\{\begin{array}{ll} -b^T\lambda & A^T\lambda + c =0 \\ -\infty & \mbox{otherwise} \end{array}\right. $$
-
Lagrange dual problem
$$
\begin{array}{ll}
\mbox{maximize} &
g(\lambda)
= \left\{\begin{array}{ll}
-b^T\lambda & A^T\lambda + c =0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
\\
\mbox{subject to} &
\lambda \succeq 0
\end{array}
$$
- can make dual feasibility explicit by adding it to constraints as mentioned on page~ $$ \begin{array}{ll} \mbox{maximize} & -b^T\lambda \\ \mbox{subject to} & A^T\lambda + c = 0 \\ & \lambda \succeq 0 \end{array} $$
- dual problem is standard form LP
- thus, dual of standard form LP is inequality form LP and vice versa
- also, for both cases, dual of dual is same as primal problem
Lagrange dual problem of equality constrained optimization problem
- equality constrained optimization problem $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & Ax = b \end{array} $$
- dual function $$ \begin{eqnarray*} g(\nu) & = & \inf_{x\in\dom \fobj} (\fobj(x) + \nu^T(Ax-b)) = -b^T\nu - \sup_{x\in\dom \fobj}(-\nu^TAx -\fobj(x)) \\ & = & -b^T\nu - {\fobj}^\ast(-A^T\nu) \end{eqnarray*} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu - {\fobj}^\ast(-A^T\nu) \end{array} $$
Lagrange dual problem associated with equality constrained quadratic program
-
strictly convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = x^TPx + q^T x + r
\\
\mbox{subject to} &
Ax=b
\end{array}
$$
where $P\in\posdefset{n}$
- conjugate function of objective function $$ {\fobj}^\ast(y) = (y-q)^TP^{-1}(y-q)/4 - r = y^TP^{-1}y/4 -q^TP^{-1}y/2 + q^TP^{-1}q/4 -r $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -\nu^T (AP^{-1}A^T)\nu /4 -(b + A P^{-1} q/2)^T\nu - q^TP^{-1}q/4 +r \end{array} $$
Lagrange dual problems associated with nonconvex quadratic problems
-
primal problem
$$
\noncvxquadprob{primal}
$$
where $A\in\symset{n}$, $A\not\in\possemidefset{n}$, and $b\in\reals^n$
- since $A\not\succeq 0$, not convex optimization problem
- sometimes called trust region problem, arising when minimizing second-order approximation of function over bounded region
- Lagrange dual function $$ g(\lambda) = \noncvxquadprob{dual fcn} $$ where $(A+\lambda I)^\dagger$ is pseudo-inverse of $A+\lambda I$
-
Lagrange dual problem
$$
\noncvxquadprob{dual}
$$
where optimization variable is $\lambda \in\reals$
- note we do not need constraint $\lambda \geq0$ since it is implied by $A+\lambda I \succeq 0$
- though not obvious from its appearance, it is (of course) convex optimization problem (by definition of Lagrange dual function, i.e., )
- can be expressed as $$ \begin{array}{ll} \mbox{maximize} & -\sum_{i=1}^n (q_i^Tb)^2/(\lambda_i + \lambda) - \lambda \\ \mbox{subject to} & \lambda \geq - \lambda_\mathrm{min}(A) \end{array} $$ where $\lambda_i$ and $q_i$ are eigenvalues and corresponding orthonormal eigenvectors of $A$; when $\lambda_i + \lambda=0$ for some $i$, we interpret $(q_i^Tb)^2/0$ as $0$ if $q_i^Tb=0$ and $\infty$ otherwise
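- below is minimal numerical sketch of this scalar dual (NumPy assumed available); since the macro above hides the exact primal form, the sketch assumes the primal is minimize $x^TAx + 2b^Tx$ subject to $x^Tx\leq 1$, which is consistent with the dual expression above

```python
# Sketch: the scalar trust-region dual, assuming the primal form
# minimize x^T A x + 2 b^T x  s.t.  x^T x <= 1  (an assumption here).
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric, indefinite
b = rng.standard_normal(n)

lam_i, Q = np.linalg.eigh(A)          # eigenvalues, orthonormal eigenvectors
qb = Q.T @ b

def g(lam):
    # dual objective: -sum (q_i^T b)^2 / (lam_i + lam) - lam
    return -np.sum(qb ** 2 / (lam_i + lam)) - lam

# crude grid maximization over lam > -lam_min(A)
grid = np.linspace(-lam_i[0] + 1e-6, -lam_i[0] + 10, 10000)
d_grid = max(g(l) for l in grid)

# crude primal upper bound by sampling the unit ball
xs = rng.standard_normal((100000, n))
xs /= np.maximum(1.0, np.linalg.norm(xs, axis=1))[:, None]
p_ub = np.min(np.einsum('ij,jk,ik->i', xs, A, xs) + 2 * xs @ b)

print(d_grid, p_ub)     # d_grid <= p* <= p_ub (close, by strong duality)
```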
Weak duality
- since $g(\lambda,\nu)\leq p^\ast$ for every $\lambda\succeq 0$, we have $$ d^\ast = \sup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0} \leq p^\ast $$
- $d^\ast$ is best lower bound for primal problem that can be obtained from Lagrange dual function (by definition)
-
weak duality holds even when $d^\ast$ or/and $p^\ast$ are not finite, e.g.
- if primal problem is unbounded below so that $p^\ast=-\infty$, must have $d^\ast = -\infty$, i.e., dual problem is infeasible
- conversely, if dual problem is unbounded above so that $d^\ast = \infty$, must have $p^\ast=\infty$, i.e., primal problem is infeasible
Optimal duality gap
-
dual optimal value sometimes used as lower bound for optimal value of problem which is difficult to solve
-
for example,
dual problem
of max-cut problem (on page~),
which is NP-hard,
is
$$
\begin{array}{ll}
\mbox{maximize} &
-\ones^T \nu
\\
\mbox{subject to} &
W + \diag(\nu) \succeq 0
\end{array}
$$
where optimization variable is $\nu\in\reals^n$
- the dual problem can be solved very efficiently using polynomial time algorithms while primal problem cannot be solved unless $n$ is very small
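- a minimal sketch of this bound (CVXPY assumed available; $n$ kept tiny so brute-force comparison is possible)

```python
# Sketch: SDP dual lower bound for max-cut vs. brute-force primal value.
import numpy as np
import cvxpy as cp
from itertools import product

rng = np.random.default_rng(2)
n = 8
W = rng.standard_normal((n, n)); W = (W + W.T) / 2

# dual: maximize -1^T nu  s.t.  W + diag(nu) >= 0
nu = cp.Variable(n)
prob = cp.Problem(cp.Maximize(-cp.sum(nu)), [W + cp.diag(nu) >> 0])
prob.solve()

# brute force over {-1,1}^n, feasible only because n is tiny here
p_star = min(np.array(s) @ W @ np.array(s) for s in product([-1, 1], repeat=n))
print(prob.value, p_star)    # prob.value <= p_star
```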
Strong duality
-
strong duality does not hold in general
- if it always held, max-cut problem, which is NP-hard, could be solved in polynomial time, which would be one of biggest breakthroughs in field of theoretical computer science
- may mean some of strongest cryptography methods, e.g., homomorphic encryption, could be broken
Slater's theorem
- exist many conditions which guarantee strong duality, called constraint qualifications; one of them is Slater's condition
- such condition, called Slater's condition
- such point, (sometimes) said to be strictly feasible
Strong duality for LS solution of linear equations
- primal problem $$ \lssollineqs{primal} $$
- dual problem $$ \lssollineqs{dual} $$ (refer to page~ for Lagrange dual function)
-
“dual is always feasible''
and
“primal is feasible $\Rightarrow$ Slater's condition holds'',
thus
Slater's theorem ()
implies,
exist only three cases
- $(d^\ast = p^\ast \in \reals)$ or $(d^\ast \in \reals\:\&\: p^\ast = \infty)$ or $(d^\ast = p^\ast = \infty)$
- if primal is infeasible, then $b\not\in\range(A)$, thus exists $z$ such that $A^Tz=0$ and $b^Tz \neq0$, and line $\set{tz}{t\in\reals}$ makes dual problem unbounded above, hence $d^\ast=\infty$
- hence, strong duality always holds, i.e., $(d^\ast= p^\ast \in \reals)$ or $(d^\ast = p^\ast = \infty)$
Strong duality for LP
- every LP either is infeasible or satisfies Slater's condition
-
dual of LP is LP,
hence, Slater's theorem ()
implies
- if primal is feasible, either $(d^\ast=p^\ast= -\infty)$ or $(d^\ast=p^\ast\in\reals)$
- if dual is feasible, either $(d^\ast=p^\ast= \infty)$ or $(d^\ast=p^\ast\in\reals)$
-
only other case left is $(d^\ast=-\infty\;\&\;p^\ast= \infty)$
- indeed, this pathological case can happen
Strong duality for entropy maximization
- primal problem $$ \entmax{primal} $$
- dual problem (refer to page~ for Lagrange dual function) $$ \entmax{dual} $$
- dual problem is feasible, hence, Slater's theorem () implies, if exists $x\succ 0$ with $Ax \preceq b$ and $\ones^T x =1$, strong duality holds, and indeed $d^\ast=p^\ast\in\reals$
- by the way, can simplify dual problem by maximizing dual objective function over $\nu$ $$ \entmax{simplied dual} $$ which is geometric program in convex form () with nonnegativity constraint
Strong duality for minimum volume covering ellipsoid
- primal problem $$ \minvolcovering{primal} $$ where $\optdomain=\posdefset{n}$
- dual problem $$ \minvolcovering{dual} $$ (refer to page~ for Lagrange dual function)
- $X=\alpha I$ with large enough $\alpha>0$ satisfies primal's constraints, hence Slater's condition always holds, thus, strong duality always holds, i.e., $(d^\ast = p^\ast \in \reals)$ or $(d^\ast = p^\ast = -\infty)$
- in fact, $\range(a_1,\ldots,a_m) = \reals^n$ if and only if $d^\ast=p^\ast\in\reals$
Strong duality for trust region nonconvex quadratic problems
- one of rare occasions in which strong duality holds for nonconvex problems
- primal problem $$ \noncvxquadprob{primal} $$ where $A\in\symset{n}$, $A\not\in\possemidefset{n}$, and $b\in\reals^n$
- Lagrange dual problem (page~) $$ \noncvxquadprob{dual} $$
- strong duality always holds and $d^\ast=p^\ast\in\reals$ (since dual problem is feasible - large enough $\lambda$ satisfies dual constraints)
- in fact, exists stronger result - strong duality holds for optimization problem with quadratic objective and one quadratic inequality constraint, provided Slater's condition holds
Matrix games using mixed strategies
-
matrix game - consider game with two players $A$ and $B$
- player $A$ makes choice $1\leq a\leq n$, player $B$ makes choice $1\leq b\leq m$, then player $A$ makes payment of $P_{ab}$ to player $B$
- matrix $P\in\reals^{n\times m}$, called payoff matrix
- player $A$ tries to pay as little as possible & player $B$ tries to receive as much as possible
- players use randomized or mixed strategies, i.e., each player makes choice randomly and independently of other player's choice according to probability distributions $$ \Prob(a=i) = u_i\quad 1\leq i\leq n \qquad \Prob(b=j) = v_j\quad 1\leq j\leq m $$
- expected payoff (from player $A$ to player $B$) $$ \sum_i \sum_j u_iv_jP_{ij} = u^TPv $$
-
assume player $A$'s strategy is known to player $B$
- player $B$ will choose $v$ to maximize $u^TPv$ $$ \sup\set{u^TPv}{v\succeq 0,\; \ones^Tv=1} = \max_{1\leq j\leq m} (P^Tu)_j $$
- player $A$ (assuming that player $B$ will employ above strategy to maximize payment) will choose $u$ to minimize payment $$ \begin{array}{ll} \mbox{minimize} & \max_{1\leq j\leq m} (P^Tu)_j \\ \mbox{subject to} & u\succeq 0\quad \ones^Tu=1 \end{array} $$
-
assume player $B$'s strategy is known to player $A$
- then player $B$ will do same to maximize payment (assuming that player $A$ will employ such strategy to minimize payment) $$ \begin{array}{ll} \mbox{maximize} & \min_{1\leq i\leq n} (Pv)_i \\ \mbox{subject to} & v\succeq 0\quad \ones^Tv=1 \end{array} $$
Strong duality for matrix games using mixed strategies
- in matrix game, can guess that in first problem player $B$ has advantage over player $A$ because $A$'s strategy is exposed to $B$ (and vice versa in second problem), hence optimal value of first problem is greater than that of second problem
- surprisingly, neither player has advantage over the other, i.e., optimal values of two problems are same - will show this
- first observe both problems are (convex) piecewise-linear optimization problems
-
formulate first problem as LP
$$
\begin{array}{ll}
\mbox{minimize} &
t
\\
\mbox{subject to} &
u\succeq 0 \quad \ones^T u =1 \quad P^T u \preceq t\ones
\end{array}
$$
- Lagrangian $$ L(u,t,\lambda_1, \lambda_2,\nu) = \nu + (1-\ones^T\lambda_1)t + (P\lambda_1 - \nu \ones - \lambda_2)^Tu $$
- Lagrange dual function $$ g(\lambda_1, \lambda_2,\nu) = \left\{\begin{array}{ll} \nu & \ones^T\lambda_1 = 1 \;\&\; P\lambda_1 - \nu \ones = \lambda_2 \\ -\infty & \mbox{otherwise} \end{array}\right. $$
- Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & \nu \\ \mbox{subject to} & \ones^T\lambda_1 = 1 \quad P\lambda_1 - \nu \ones = \lambda_2 \\ & \lambda_1 \succeq 0 \quad \lambda_2 \succeq 0 \end{array} $$
- eliminating $\lambda_2$ gives below Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & \nu \\ \mbox{subject to} & \lambda_1 \succeq 0 \quad \ones^T\lambda_1 = 1 \quad P\lambda_1 \succeq \nu \ones \end{array} $$ which is equivalent to second problem in matrix game
- weak duality confirms that “player who knows other player's strategy has advantage or is on par''
- moreover, primal problem satisfies Slater's condition, hence strong duality always holds, and dual is feasible, hence $d^\ast=p^\ast\in\reals$, i.e., regardless of who knows other player's strategy, no player has advantage
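- a minimal sketch verifying this numerically (SciPy assumed available; payoff matrix made up): both LPs are solved and their optimal values agree up to solver tolerance

```python
# Sketch: both mixed-strategy matrix-game LPs have the same optimal value.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, m = 4, 5
P = rng.standard_normal((n, m))

# player A: minimize t  s.t.  P^T u <= t 1, 1^T u = 1, u >= 0; vars (u, t)
res_A = linprog(np.r_[np.zeros(n), 1.0],
                A_ub=np.c_[P.T, -np.ones(m)], b_ub=np.zeros(m),
                A_eq=np.r_[np.ones(n), 0.0][None, :], b_eq=[1.0],
                bounds=[(0, None)] * n + [(None, None)])

# player B: maximize s  s.t.  P v >= s 1, 1^T v = 1, v >= 0; vars (v, s)
res_B = linprog(np.r_[np.zeros(m), -1.0],
                A_ub=np.c_[-P, np.ones(n)], b_ub=np.zeros(n),
                A_eq=np.r_[np.ones(m), 0.0][None, :], b_eq=[1.0],
                bounds=[(0, None)] * m + [(None, None)])

print(res_A.fun, -res_B.fun)    # equal: neither player has an advantage
```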
Geometric interpretation of duality
- assume (not necessarily convex) optimization problem in
- define graph $$ G = \set{(\fie(x), \feq(x), \fobj(x))}{x\in\optdomain} \subset \reals^m \times \reals^p \times \reals $$
- for every $\lambda\succeq 0$ and $\nu$ $$ \begin{eqnarray*} p^\ast &=& \inf\set{t}{(u,v,t) \in G,\ u\preceq 0,\ v = 0} \\ & \geq & \inf\set{t+\lambda^Tu + \nu^T v}{(u,v,t) \in G,\ u\preceq 0,\ v = 0} \\ & \geq & \inf\set{t+\lambda^Tu + \nu^T v}{(u,v,t) \in G} = g(\lambda,\nu) \end{eqnarray*} $$ where second inequality comes from $\set{(u,v,t)}{(u,v,t) \in G,\ u\preceq 0,\ v = 0} \subset G$
- above establishes weak duality using graph
- last equality implies that for every $(u,v,t)\in G$ $$ (\lambda, \nu, 1)^T (u,v,t) \geq g(\lambda,\nu) $$ hence if $g(\lambda,\nu) > -\infty$, $(\lambda, \nu, 1)$ and $g(\lambda,\nu)$ define nonvertical supporting hyperplane for $G$ - nonvertical because third component is nonzero
- the figure shows $G$ as area inside closed curve contained in $\reals^m\times\reals^p\times\reals$ with $m=1$ and $p=0$, together with primal optimal value $p^\ast$ and supporting hyperplane $\lambda u + t = g(\lambda)$
- the figure shows three hyperplanes determined by three values for $\lambda$, one of which $\lambda^\ast$ is optimal solution for dual problem
Epigraph interpretation of duality
- define extended graph over $G$ - sort of epigraph of $G$ $$ \begin{eqnarray*} H &=& G + \preals^m \times \{0\} \times \preals \\ & = & \set{(u, v, t)}{x\in\optdomain, \fie(x) \preceq u, \feq(x) = v, \fobj(x)\leq t } \end{eqnarray*} $$
- if $\lambda\succeq 0$, $g(\lambda,\nu) = \inf\set{(\lambda,\nu,1)^T(u,v,t)}{(u,v,t) \in H}$, thus $$ (\lambda,\nu,1)^T (u,v,t) \geq g(\lambda,\nu) $$ defines nonvertical supporting hyperplane for $H$
- now $p^\ast = \inf\set{t}{(0,0,t)\in H}$, hence $(0,0,p^\ast) \in \boundary H$, hence $$ p^\ast =(\lambda,\nu,1)^T (0,0,p^\ast) \geq g(\lambda,\nu) $$
- once again establishes weak duality
- the figure shows epigraph interpretation
Proof of strong duality under constraint qualification
- now we show proof of strong duality - this is one of rare cases where proof is shown in main slides instead of “selected proofs'' section (unlike, e.g., Galois theory), since (I hope) it will give you some good intuition about why strong duality holds for (most) convex optimization problems
- assume Slater's condition holds, i.e., $\fobj$ and $\fie$ are convex, $\feq$ is affine, and exists $x\in\optdomain$ such that $\fie(x) \prec 0$ and $\feq(x) = 0$
- further assume $\optdomain$ has interior (hence $\relint \optdomain = \interior{\optdomain}$) and $\rank A=p$
- assume $p^\ast\in\reals$ - since exists feasible $x$, the other possibility is $p^\ast = -\infty$, but then, $d^\ast = -\infty$, hence strong duality holds
- $H$ is convex
- now define $$ B = \set{(0,0,s)\in\reals^m\times\reals^p\times\reals}{s<p^\ast} $$
- then $B\cap H=\emptyset$, hence exists separating hyperplane, i.e., exist $(\tilde{\lambda}, \tilde{\nu}, \mu)\neq 0$ and $\alpha$ such that $$ \begin{eqnarray*} (u,v,t) \in H &\Rightarrow& \tilde{\lambda}^T u + \tilde{\nu}^T v + \mu t \geq \alpha \\ (u,v,t) \in B &\Rightarrow& \tilde{\lambda}^T u + \tilde{\nu}^T v + \mu t \leq \alpha \end{eqnarray*} $$
-
then $\tilde{\lambda} \succeq 0$ & $\mu\geq0$ - assume $\mu>0$
- can prove case $\mu=0$, too, but it is kind of tedious; plus, whole purpose is to provide good intuition, so will not do it here
- above second inequality implies $\mu p^\ast \leq \alpha$, and first inequality implies for every $x\in\optdomain$ $$ \mu L(x,\tilde{\lambda}/\mu, \tilde{\nu}/\mu) = \tilde{\lambda}^T \fie(x) + \tilde{\nu}^T \feq(x) + \mu \fobj(x) \geq \alpha \geq \mu p^\ast $$ thus, $$ g(\tilde{\lambda}/\mu, \tilde{\nu}/\mu) \geq p^\ast $$
- finally, weak duality implies $$ g(\lambda,\nu) = p^\ast $$ where $\lambda = \tilde{\lambda}/\mu$ & $\nu = \tilde{\nu}/\mu$
Max-min characterization of weak and strong dualities
- note $$ \begin{eqnarray*} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) &=& \sup_{\lambda\succeq 0, \nu} \left( \fobj(x) + \lambda^T \fie(x) + \nu^T \feq(x) \right) \\ & = & \left\{\begin{array}{ll} \fobj(x) & x\in\optfeasset \\ \infty & \mbox{otherwise} \end{array}\right. \end{eqnarray*} $$
- thus $p^\ast = \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu)$ whereas $d^\ast = \sup_{\lambda\succeq 0,\nu} \inf_{x\in\optdomain} L(x,\lambda,\nu)$
- weak duality means $$ \sup_{\lambda\succeq 0, \nu} \inf_{x\in\optdomain} L(x,\lambda,\nu) \leq \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) $$
- strong duality means $$ \sup_{\lambda\succeq 0, \nu} \inf_{x\in\optdomain} L(x,\lambda,\nu) = \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) $$
Max-min inequality
- indeed, inequality $$ \sup_{y\in Y} \inf_{x\in X} f(x,y) \leq \inf_{x\in X} \sup_{y\in Y} f(x,y) $$ holds for every $f:X\times Y\to\reals$, i.e., in general case
- equality happens, e.g., when $X=\optdomain$, $Y=\prealk{m} \times \reals^p$, and $f$ is Lagrangian of optimization problem (in ) for which strong duality holds
Saddle-points
- if assumption in holds, $x^\ast$ minimizes $f(x,y^\ast)$ over $X$ and $y^\ast$ maximizes $f(x^\ast,y)$ over $Y$ $$ \sup_{y\in Y} f(x^\ast,y) = f(x^\ast,y^\ast) = \inf_{x\in X} f(x,y^\ast) $$
Saddle-point interpretation of strong duality
- for primal optimum $x^\ast$ and dual optimum $(\lambda^\ast,\nu^\ast)$ $$ g(\lambda^\ast,\nu^\ast) \leq L(x^\ast, \lambda^\ast, \nu^\ast) \leq \fobj(x^\ast) $$
-
if strong duality holds,
for every $x\in\optdomain$, $\lambda\succeq 0$, and $\nu$
$$
L(x^\ast,\lambda,\nu)
\leq
\fobj(x^\ast) = L(x^\ast,\lambda^\ast,\nu^\ast) = g(\lambda^\ast,\nu^\ast)
\leq
L(x,\lambda^\ast, \nu^\ast)
$$
- thus $x^\ast$ and $(\lambda^\ast,\nu^\ast)$ form saddle-point of Lagrangian
-
conversely, if $\tilde{x}$ and $(\tilde{\lambda},\tilde{\nu})$ are saddle-point of Lagrangian,
i.e.,
for every $x\in\optdomain$, $\lambda\succeq 0$, and $\nu$
$$
L(\tilde{x}, {\lambda},{\nu})
\leq
L(\tilde{x}, \tilde{\lambda},\tilde{\nu})
\leq
L({x}, \tilde{\lambda},\tilde{\nu})
$$
- hence $g(\tilde{\lambda},\tilde{\nu}) = \inf_{x\in\optdomain} L(x,\tilde{\lambda},\tilde{\nu}) = L(\tilde{x}, \tilde{\lambda},\tilde{\nu}) = \sup_{\lambda\succeq 0, \nu} L(\tilde{x},{\lambda},{\nu}) = \fobj(\tilde{x})$, thus $g(\lambda^\ast,\nu^\ast) \leq g(\tilde{\lambda}, \tilde{\nu})$ & $\fobj(\tilde{x}) \leq \fobj(x^\ast)$
- thus $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ are primal and dual optimal
Game interpretation
- assume two players play zero-sum game with payment function $f:X\times Y\to \reals$ where player $A$ pays player $B$ amount equal to $f(x,y)$ when player $A$ chooses $x$ and player $B$ chooses $y$
- player $A$ will try to minimize $f(x,y)$ and player $B$ will try to maximize $f(x,y)$
-
assume player $A$ chooses first
then player $B$ chooses after learning opponent's choice
- if player $A$ chooses $x$, player $B$ will choose $\argsup_{y\in Y} f(x,y)$
- knowing that, player $A$ will first choose $\arginf_{x\in X} \sup_{y\in Y} f(x,y)$
- hence payment will be $\inf_{x\in X} \sup_{y\in Y} f(x,y)$
- if player $B$ makes her choice first, opposite happens, i.e., payment will be $\sup_{y\in Y} \inf_{x\in X} f(x,y)$
- max-min inequality of says $$ \sup_{y\in Y} \inf_{x\in X} f(x,y) \leq \inf_{x\in X} \sup_{y\in Y} f(x,y) $$ i.e., whoever chooses later has advantage (or at least is on par), which is same phenomenon as in matrix games using mixed strategies on page~
- saddle-point for $f$ (and $X$ and $Y$), $(x^\ast,y^\ast)$, called solution of game - $x^\ast$ is optimal choice for player $A$ and $y^\ast$ is optimal choice for player $B$
Game interpretation for weak and strong dualities
- assume payment function in zero-sum game on page~ is Lagrangian of optimization problem in
- assume that $X=\xdomain$ and $Y=\prealk{m} \times \reals^p$
- if player $A$ chooses first, knowing that player $B$ will choose $\argsup_{(\lambda,\nu)\in Y}L(x,\lambda,\nu)$, she will choose $x^\ast = \arginf_{x\in\xdomain} \sup_{(\lambda,\nu)\in Y}L(x,\lambda,\nu)$
- likewise, player $B$ will choose $(\lambda^\ast,\nu^\ast) = \argsup_{(\lambda,\nu)\in Y} \inf_{x\in\xdomain} L(x,\lambda,\nu)$
- optimal duality gap $p^\ast - d^\ast$ equals advantage of player who goes second
- if strong duality holds, $(x^\ast, \lambda^\ast, \nu^\ast)$ is solution of game, in which case no one has advantage
Certificate of suboptimality
- dual feasible point $(\lambda,\nu)$ provides certificate for degree of suboptimality of current solution
- assume $x$ is feasible solution, then $$ \fobj(x) - p^\ast \leq \fobj(x) - g(\lambda,\nu) $$ guarantees that $\fobj(x)$ is no further than $\epsilon = \fobj(x) - g(\lambda,\nu)$ from optimal value $p^\ast$ (even though we do not know optimal solution)
- for this reason, $(\lambda,\nu)$, called certificate of suboptimality
- $x$ is $\epsilon$-suboptimal for primal problem and $(\lambda,\nu)$ is $\epsilon$-suboptimal for dual problem
- strong duality means we can find certificates with arbitrarily small $\epsilon$
Complementary slackness
- assume strong duality holds for optimization problem in and assume $x^\ast$ is primal optimum and $(\lambda^\ast,\nu^\ast)$ is dual optimum, then $$ \fobj(x^\ast) = L(x^\ast,\lambda^\ast,\nu^\ast) = \fobj(x^\ast) + {\lambda^\ast}^T \fie(x^\ast) + {\nu^\ast}^T \feq(x^\ast) $$
- $\feq(x^\ast)=0$ implies ${\lambda^\ast}^T \fie(x^\ast)=0$
- then $\lambda^\ast \succeq 0$ and $\fie(x^\ast) \preceq 0$ imply $$ \lambda_i^\ast \fie_i(x^\ast) = 0 \quad i=1,\ldots,m $$
KKT optimality conditions
KKT necessary for optimality with strong duality
- when strong duality holds, KKT optimality conditions are necessary for primal and dual optimality
- or equivalently
- primal and dual optimality with strong duality imply KKT optimality conditions
KKT and convexity sufficient for optimality with strong duality
- assume convex optimization problem where $\fobj$, $\fie$, and $\feq$ are all differentiable and ${x}\in\optdomain$ and $({\lambda}, {\nu})\in\reals^m\times\reals^p$ satisfying KKT conditions, i.e. $$ \fie({x}) \preceq 0, \; \feq({x}) = 0 , \; {\lambda} \succeq 0 , \; {\lambda}^T \fie({x}) = 0 , \; \nabla_x L({x}, {\lambda},{\nu}) = 0 $$
- since $L(x,\lambda,\nu)$ is convex in $x$ for $\lambda\succeq 0$, i.e., each of $\fobj(x)$, $\lambda^T \fie(x)$, and $\nu^T \feq(x)$ is convex, vanishing gradient implies $x$ achieves infimum of Lagrangian, hence $$ g(\lambda,\nu) = L(x,\lambda,\nu) = \fobj(x) + \lambda^T \fie(x) + \nu^T \feq(x) = \fobj(x) $$ where last equality uses $\lambda^T \fie(x) = 0$ and $\feq(x)=0$
- thus, strong duality holds, i.e., $x$ and $(\lambda,\nu)$ are primal and dual optimal solutions with zero duality gap
- for convex optimization problem, KKT optimality conditions are sufficient for primal and dual optimality with strong duality
- or equivalently
- KKT optimality conditions and convexity imply primal and dual optimality and strong duality
-
together with
implies
that
for convex optimization problem
- KKT optimality conditions are necessary and sufficient for primal and dual optimality with strong duality
Solving primal problems via dual problems
- when strong duality holds, can retrieve primal optimum from dual optimum since primal optimal solution is minimizer of $$ L(x,\lambda^\ast,\nu^\ast) $$ where $(\lambda^\ast, \nu^\ast)$ is dual optimum (provided minimizer is unique and primal feasible)
-
example - entropy maximization
($\optdomain = \pprealk{n}$)
- primal problem -
- dual problem -
- provided dual optimum $(\lambda^\ast,\nu^\ast)$, primal optimum is $$ x^\ast = \argmin_{x\in\optdomain} \left( \sum x_i \log x_i + {\lambda^\ast}^T (Ax-b) + \nu^\ast(\ones^Tx -1) \right) $$
- $\nabla_x L(x,\lambda^\ast,\nu^\ast) = \log x + A^T \lambda^\ast + (1+\nu^\ast)\ones$, hence $$ x^\ast = \exp(-(A^T \lambda^\ast + (1+\nu^\ast)\ones)) $$
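- a minimal numerical spot-check of this closed-form minimizer (NumPy and SciPy assumed available; data and multipliers made up)

```python
# Sketch: for any fixed (lambda >= 0, nu), the entropy-max Lagrangian is
# minimized by x = exp(-(A^T lambda + (1 + nu) 1)); compare with a solver.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
m, n = 3, 5
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
lam, nu = rng.random(m), rng.standard_normal()

def L(x):
    return x @ np.log(x) + lam @ (A @ x - b) + nu * (x.sum() - 1)

x_closed = np.exp(-(A.T @ lam + (1 + nu)))
res = minimize(L, np.ones(n) / n, bounds=[(1e-9, None)] * n)
print(np.abs(res.x - x_closed).max())    # ~ 0
```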
Perturbed optimization problems
- original problem in with perturbed constraints $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq u \\ & \feq(x) =v \end{array} $$ where $u\in\reals^m$ and $v\in\reals^p$
- define $p^\ast(u,v)$ as optimal value of above perturbed problem, i.e. $$ p^\ast(u,v) = \inf\set{\fobj(x)}{x\in\optdomain, \fie(x) \preceq u, \feq(x) = v} $$ which is convex when problem is convex optimization problem - note $p^\ast(0,0)=p^\ast$
- assume strong duality holds and dual optimum $(\lambda^\ast,\nu^\ast)$ exists, then for every feasible $x$ for perturbed problem $$ p^\ast(0,0)=g(\lambda^\ast,\nu^\ast) \leq \fobj(x) + {\lambda^\ast}^T \fie(x) + {\nu^\ast}^T \feq(x) \leq \fobj(x) + {\lambda^\ast}^T u + {\nu^\ast}^T v $$ thus $$ p^\ast(0,0)\leq p^\ast(u,v) + {\lambda^\ast}^T u + {\nu^\ast}^T v $$ hence $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$
- the figure shows this for optimization problem with one inequality constraint and no equality constraint
Global sensitivity analysis via perturbed problems
- recall $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$
-
interpretations
- if $\lambda^\ast_i$ is large, when $i$-th inequality constraint is tightened, optimal value increases a lot
- if $\lambda^\ast_i$ is small, when $i$-th inequality constraint is relaxed, optimal value decreases not a lot
- if $|\nu^\ast_i|$ is large, reducing $v_i$ when $\nu^\ast_i>0$ or increasing $v_i$ when $\nu^\ast_i<0$ increases optimal value a lot
- if $|\nu^\ast_i|$ is small, increasing $v_i$ when $\nu^\ast_i>0$ or decreasing $v_i$ when $\nu^\ast_i<0$ decreases optimal value not a lot
- note these are only lower bounds - local behavior explored next
Local sensitivity analysis via perturbed problems
-
assume $p^\ast(u,v)$ is differentiable with respect to $u$ and $v$,
i.e., $\nabla_{(u,v)} p^\ast(u,v)$ exists
- then $$ \frac{\partial}{\partial u_i} p^\ast (0,0) = \lim_{h\to 0^+} \frac{p^\ast(he_i,0) - p^\ast(0,0)}{h} \geq \lim_{h\to 0^+} \frac{-{\lambda^\ast}^T (he_i) }{h} = -\lambda_i^\ast $$ and $$ \frac{\partial}{\partial u_i} p^\ast (0,0) = \lim_{h\to 0^-} \frac{p^\ast(he_i,0) - p^\ast(0,0)}{h} \leq \lim_{h\to 0^-} \frac{-{\lambda^\ast}^T (he_i) }{h} = -\lambda_i^\ast $$
- obtain same result for $v_i$, hence $$ \nabla_u\; p^\ast (0,0) = -\lambda^\ast \quad \nabla_v\; p^\ast (0,0) = -\nu^\ast $$
- so larger $\lambda_i^\ast$ or $|\nu_i^\ast|$ means larger change in optimal value of perturbed problem when $u_i$ or $v_i$ changes a bit, and vice versa; quantitatively, $\lambda_i^\ast$ and $\nu_i^\ast$ provide exact rate and direction
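- a minimal sketch of local sensitivity for small LP (CVXPY assumed available; data made up): finite difference of $p^\ast(u)$ matches $-\lambda_i^\ast$

```python
# Sketch: dp*/du_i = -lambda_i^* for a small LP, vs. a finite difference.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
m, n = 4, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.random(m)   # strictly feasible (Slater)
c = -A.T @ rng.random(m)                         # keeps the LP bounded below

def solve(u):
    x = cp.Variable(n)
    cons = [A @ x <= b + u]
    prob = cp.Problem(cp.Minimize(c @ x), cons)
    prob.solve()
    return prob.value, cons[0].dual_value

p0, lam = solve(np.zeros(m))
h, i = 1e-6, 0
p_h, _ = solve(h * np.eye(m)[i])
print((p_h - p0) / h, -lam[i])                   # approximately equal
```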
Different dual problems for equivalent optimization problems - 1
-
introducing new variables and equality constraints
for unconstrained problems
-
unconstrained optimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(Ax+b)
\end{array}
$$
- Lagrange dual function is constant, $g = p^\ast$, hence strong duality trivially holds, which, however, does not provide useful information
-
reformulate as equivalent optimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(y)
\\
\mbox{subject to} &
Ax+b = y
\end{array}
$$
- Lagrangian - $L(x,y,\nu) = f(y) + \nu^T(Ax+b-y)$
- Lagrange dual function - $g(\nu) = -I(A^T\nu = 0) + b^T\nu - f^\ast(\nu)$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T\nu - f^\ast(\nu) \\ \mbox{subject to} & A^T \nu = 0 \end{array} $$
-
examples
-
unconstrained geometric problem
$$
\begin{array}{ll}
\mbox{minimize} &
\log\left(
\sum_{i=1}^m \exp(a_i^Tx + b_i)
\right)
\end{array}
$$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \log\left( \sum_{i=1}^m \exp(y_i) \right) \\ \mbox{subject to} & Ax + b =y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T \nu - \sum_{i=1}^m \nu_i \log \nu_i \\ \mbox{subject to} & \ones^T \nu = 1 \\ & A^T \nu = 0 \\ & \nu \succeq 0 \end{array} $$ which is entropy maximization problem
-
norm minimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
\|Ax-b\|
\end{array}
$$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \|y\| \\ \mbox{subject to} & Ax - b = y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T \nu \\ \mbox{subject to} & \|\nu\|_\ast \leq 1 \\ & A^T \nu =0 \end{array} $$
Different dual problems for equivalent optimization problems - 2
-
introducing new variables and equality constraints
for constrained problems
- inequality constrained optimization problem $$ \begin{array}{ll} \mbox{minimize} & f_0(A_0x+b_0) \\ \mbox{subject to} & f_i(A_ix+b_i) \leq 0\quad i=1,\ldots,m \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & f_0(y_0) \\ \mbox{subject to} & f_i(y_i) \leq 0\quad i=1,\ldots,m \\ & A_i x + b_i = y_i\quad i=0,\ldots,m \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m \nu_i^T b_i - f_0^\ast(\nu_0) - \sum_{i=1}^m \lambda_i f_i^\ast(\nu_i/\lambda_i) \\ \mbox{subject to} & \sum_{i=0}^m A_i^T \nu_i = 0 \\ & \lambda \succeq 0 \end{array} $$
-
examples
-
inequality constrained geometric program
$$
\begin{array}{ll}
\mbox{minimize} &
\log\left(\sum \exp(A_0x + b_0)\right)
\\
\mbox{subject to} &
\log\left(\sum \exp(A_ix + b_i)\right)\leq 0\quad i=1,\ldots,m
\end{array}
$$
where $A_i\in\reals^{K_i\times n}$
and $\exp(z) := (\exp(z_1),\ldots,\exp(z_k))\in\reals^k$
and $\sum z := \sum_{i=1}^k z_i\in\reals$
for $z\in\reals^k$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \log\left(\sum \exp(y_0)\right) \\ \mbox{subject to} & \log\left(\sum \exp(y_i)\right)\leq 0\quad i=1,\ldots,m \\ & A_i x + b_i = y_i \quad i=0,\ldots,m \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m b_i^T \nu_i - \nu_0^T\log(\nu_0) - \sum_{i=1}^m \nu_i^T\log(\nu_i/\lambda_i) \\ \mbox{subject to} & \nu_i \succeq 0\quad i=0,\ldots,m \\ & \ones^T \nu_0 = 1,\; \ones^T\nu_i=\lambda_i\quad i=1,\ldots,m \\ & \lambda_i\geq 0 \quad i=1,\ldots,m \\ & \sum_{i=0}^m A_i^T\nu_i = 0 \end{array} $$ where $\log(z) := (\log(z_1),\ldots,\log(z_k))\in\reals^k$ for $z\in\pprealk{k}$
- simplified dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m b_i^T \nu_i - \nu_0^T\log(\nu_0) - \sum_{i=1}^m \nu_i^T\log(\nu_i/\ones^T\nu_i) \\ \mbox{subject to} & \nu_i \succeq 0\quad i=0,\ldots,m \\ & \ones^T \nu_0 = 1 \\ & \sum_{i=0}^m A_i^T\nu_i = 0 \end{array} $$
Different dual problems for equivalent optimization problems - 3
-
transforming objectives
- norm minimization problem $$ \begin{array}{ll} \mbox{minimize} & \|Ax - b\| \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & (1/2)\|y\|^2 \\ \mbox{subject to} & Ax - b = y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & -(1/2)\|\nu\|_\ast^2 + b^T\nu \\ \mbox{subject to} & A^T\nu = 0 \end{array} $$
Different dual problems for equivalent optimization problems - 4
-
making constraints implicit
- LP with box constraints $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & Ax = b,\; l \preceq x \preceq u \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu - \lambda_1^Tu + \lambda_2^Tl \\ \mbox{subject to} & A^T\nu + \lambda_1 - \lambda_2 + c = 0,\; \lambda_1 \succeq 0,\; \lambda_2 \succeq 0 \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & c^T x + I ( l\preceq x \preceq u) \\ \mbox{subject to} & Ax = b \end{array} $$
- dual optimization problem for reformulated primal problem $$ \begin{array}{ll} \mbox{maximize} & -b^T \nu - u^T(A^T\nu + c)^- + l^T(A^T\nu + c)^+ \end{array} $$
Theorems of Alternatives
Weak alternatives
- can prove using duality of optimization problems
-
consider primal and dual problems
- primal problem $$ \begin{array}{ll} \mbox{minimize} & 0 \\ \mbox{subject to} & \fie(x) \preceq 0 \\ & \feq(x) =0 \end{array} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & g(\lambda,\nu) \\ \mbox{subject to} & \lambda \succeq 0 \end{array} $$ where $$ g(\lambda,\nu) = \inf_{x\in\optdomain} \left( \lambda^T \fie(x) + \nu^T \feq(x) \right) $$
- then $p^\ast,\; d^\ast \in \{0,\infty\}$
- now assume first system is feasible, then $p^\ast = 0$, hence weak duality implies $d^\ast\leq0$, thus there exist no $\lambda$ and $\nu$ such that $\lambda\succeq 0$ and $g(\lambda,\nu) > 0$, i.e., second system is infeasible - note if such point existed, $g$ could be made arbitrarily large: if $\tilde{\lambda}\succeq 0$ and $\tilde{\nu}$ satisfy $g(\tilde{\lambda},\tilde{\nu})>0$, then $g(\alpha\tilde{\lambda}, \alpha\tilde{\nu}) = \alpha g(\tilde{\lambda}, \tilde{\nu}) \to \infty$ as $\alpha\to\infty$
- assume second system is feasible, then $g(\lambda,\nu)$ can be arbitrarily large for above reasons, thus $d^\ast = \infty$, hence weak duality implies $p^\ast = \infty$, which implies first system is infeasible
- therefore two systems are weak alternatives; at most one of them is feasible
- (actually, not hard to prove it without using weak duality)
Weak alternatives with strict inequalities
Strong alternatives
Strong alternatives with strict inequalities
-
proof -
consider convex optimization problem and its dual
- primal problem $$ \begin{array}{ll} \mbox{minimize} & s \\ \mbox{subject to} & \fie(x) - s \ones \preceq 0 \\ & \feq(x) =0 \end{array} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & g(\lambda,\nu) \\ \mbox{subject to} & \lambda \succeq 0 \quad \ones^T \lambda = 1 \end{array} $$ where $g(\lambda,\nu) = \inf_{x\in\optdomain} \left( \lambda^T \fie(x) + \nu^T \feq(x) \right)$
- first observe Slater's condition holds for primal problem since by hypothesis of , exists $y\in\relint \optdomain$ with $\feq(y)=0$, hence $(y,\fie(y))\in\xie\times \reals$ is primal feasible satisifying Slater's condition
- hence Slater's theorem () implies $d^\ast=p^\ast$
- assume first system is feasible, then primal problem is strictly feasible and $d^\ast = p^\ast<0$, hence second system infeasible since otherwise feasible point for second system is feasible point of dual problem, hence $d^\ast\geq0$
- assume first system is infeasible, then $d^\ast = p^\ast\geq0$, hence Slater's theorem () implies exists dual optimal $(\lambda^\ast,\nu^\ast)$ (whether or not $d^\ast=\infty$), hence $(\lambda^\ast,\nu^\ast)$ is feasible point for second system of
- therefore two systems are strong alternatives; each is feasible if and only if the other is infeasible
Strong alternatives for linear inequalities
- dual function of feasibility problem for $Ax\preceq b$ is $$ g(\lambda) = \inf_{x\in\reals^n} \lambda^T(Ax-b) = \left\{ \begin{array}{ll} -b^T \lambda & A^T\lambda = 0 \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- hence alternative system is $\lambda\succeq0,\;b^T\lambda <0,\; A^T\lambda=0$
- thus implies below systems are strong alternatives $$ Ax \preceq b \quad\&\quad \lambda\succeq0 \quad b^T\lambda <0 \quad A^T\lambda=0 $$
- similarly, implies below systems are strong alternatives $$ Ax \prec b \quad\&\quad \lambda\succeq0 \quad \lambda \neq 0 \quad b^T\lambda \leq 0 \quad A^T\lambda=0 $$
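- a minimal sketch (SciPy assumed available): when $Ax\preceq b$ is infeasible, certificate $\lambda$ can be found by auxiliary LP with normalization $\ones^T\lambda=1$

```python
# Sketch: a certificate lambda >= 0, A^T lambda = 0, b^T lambda < 0
# for an obviously infeasible system x <= -1, -x <= -1.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0], [-1.0]])
b = np.array([-1.0, -1.0])

# minimize b^T lam  s.t.  A^T lam = 0, 1^T lam = 1, lam >= 0
res = linprog(b,
              A_eq=np.vstack([A.T, np.ones((1, 2))]), b_eq=[0.0, 1.0],
              bounds=[(0, None)] * 2)
print(res.x, b @ res.x)    # lam = (0.5, 0.5), b^T lam = -1 < 0
```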
Farkas' lemma
- will prove using LP and its dual
- consider LP $\left(\mbox{minimize}\; c^T x \quad \mbox{subject to}\; Ax \preceq 0\right)$
- dual function is $g(y) = \inf_{x\in\reals^n} \left(c^Tx + y^TAx \right) = \left\{ \begin{array}{ll} 0 & A^Ty + c= 0 \\ -\infty & \mbox{otherwise} \end{array} \right.$
- hence dual problem is $\left( \mbox{maximize} \; 0 \quad \mbox{subject to} \; A^T y + c = 0 , \; y \succeq 0 \right)$
- assume first system is feasible, then homogeneity of primal problem implies $p^\ast = -\infty$, thus $d^\ast = -\infty$, i.e., dual is infeasible, hence second system is infeasible
- assume first system is infeasible, since primal is always feasible, $p^\ast=0$, hence strong duality implies $d^\ast =0$, thus second system is feasible
Convex Optimization with Generalized Inequalities
Optimization problems with generalized inequalities
- every terminology and associated notation is same as for optimization problem in , such as objective & inequality & equality constraint functions, domain of optimization problem $\optdomain$, feasible set $\optfeasset$, optimal value $p^\ast$
- note that when $K_i=\preals$ (hence $\bigpropercone=\prealk{m}$), above optimization problem coincides with that in , i.e., optimization problems with generalized inequalities subsume (normal) optimization problems
Lagrangian for generalized inequalities
Lagrange dual functions for generalized inequalities
- $g$ is concave function
- $g(\lambda,\nu)$ is lower bound for optimal value of associated optimization problem i.e., $$ g(\lambda,\nu) \leq p^\ast $$ for every $\lambda\succeq_\bigpropercone^\ast0$ where $\bigpropercone^\ast$ denotes dual cone of $\bigpropercone$, i.e., $\bigpropercone^\ast = \bigtimes K_i^\ast$ where $K_i^\ast\subset\reals^{k_i}$ is dual cone of $K_i\subset\reals^{k_i}$
- $(\lambda,\nu)$ with $\lambda\succeq_{\bigpropercone^\ast} 0$ and $g(\lambda,\nu)>-\infty$ said to be dual feasible
Lagrange dual problems for generalized inequalities
Slater's theorem for generalized inequalities
Duality for SDP
- (inequality form) SDP $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & x_1F_1 + \cdots + x_nF_n + G \preceq 0 \end{array} $$ where $F_1,\ldots,F_n,G\in\symset{k}$ and $\bigpropercone = \possemidefset{k}$
- Lagrangian $$ L(x,Z) = c^Tx + (x_1F_1 + \cdots + x_nF_n + G) \bullet Z = \sum x_i(F_i\bullet Z + c_i) + G \bullet Z $$ where $X\bullet Y = \Tr XY$ for $X,Y\in\symset{k}$
- Lagrange dual function $$ g(Z) = \inf_{x\in\reals^n} L(x,Z) = \left\{ \begin{array}{ll} G \bullet Z & F_i\bullet Z + c_i= 0\quad i=1,\ldots,n \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & G\bullet Z \\ \mbox{subject to} & F_i \bullet Z + c_i = 0\quad i=1,\ldots,n \\ & Z \succeq 0 \end{array} $$ where we use fact that $\possemidefset{k}$ is self-dual, i.e., $\bigpropercone^\ast = \bigpropercone$
- Slater's theorem () implies if primal problem is strictly feasible, i.e., exists $x\in\reals^n$ such that $\sum x_iF_i + G\prec 0$, strong duality holds
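- a minimal sketch of this primal-dual pair (CVXPY assumed available; data constructed so that both Slater's condition and dual feasibility hold)

```python
# Sketch: inequality form SDP and its dual agree under Slater's condition.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
k, n = 4, 2
Fs = []
for _ in range(n):
    M = rng.standard_normal((k, k))
    Fs.append((M + M.T) / 2)
G = -np.eye(k)                                   # x = 0 strictly feasible
Z0 = rng.standard_normal((k, k)); Z0 = Z0 @ Z0.T + np.eye(k)
c = np.array([-np.trace(F @ Z0) for F in Fs])    # Z0 dual feasible => p* finite

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(c @ x),
                    [x[0] * Fs[0] + x[1] * Fs[1] + G << 0])
primal.solve()

Z = cp.Variable((k, k), PSD=True)
dual = cp.Problem(cp.Maximize(cp.trace(G @ Z)),
                  [cp.trace(Fs[i] @ Z) + c[i] == 0 for i in range(n)])
dual.solve()
print(primal.value, dual.value)                  # equal, by Slater's theorem
```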
KKT optimality conditions for generalized inequalities
KKT conditions and optimalities for generalized inequalities
-
for every optimization problem with generalized inequalities
(),
every statement for normal optimization problem
(),
regarding relations among
KKT conditions,
optimality,
primal and dual optimality,
and
strong duality,
is exactly the same
-
for every optimization problem with generalized inequalities
()
- if strong duality holds, primal and dual optimal points satisfy KKT optimality conditions in (same as )
- if optimization problem is convex and primal and dual solutions satisfy KKT optimality conditions in , the solutions are optimal with strong duality (same as )
- therefore, for convex optimization problem, KKT optimality conditions are necessary and sufficient for primal and dual optimality with strong duality
-
for every optimization problem with generalized inequalities
()
Perturbation and sensitivity analysis for generalized inequalities
- original problem in with perturbed constraints $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq_\bigpropercone u \\ & \feq(x) =v \end{array} $$ where $u\in\reals^m$ and $v\in\reals^p$
- define $$ p^\ast(u,v) = \inf\set{\fobj(x)}{x\in\optdomain, \fie(x) \preceq_\bigpropercone u, \feq(x) = v} $$ which is convex when problem is convex optimization problem - note $p^\ast(0,0)=p^\ast$
- as for normal optimization problem case (page~), if strong duality holds and dual optimum $(\lambda^\ast,\nu^\ast)$ exists, $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$ and $$ \nabla_u\; p^\ast (0,0) = -\lambda^\ast \quad \nabla_v\; p^\ast (0,0) = -\nu^\ast $$
Sensitivity analysis for SDP
- assume inequality form SDP and its dual problem on page~ and page~
-
consider perturbed SDP
$$
\begin{array}{ll}
\mbox{minimize} &
c^Tx
\\
\mbox{subject to} &
x_1F_1 + \cdots + x_nF_n + G \preceq U
\end{array}
$$
for some $U\in\symset{k}$
- define $p^\ast:\symset{k} \to \reals$ such that $p^\ast(U)$ is optimal value of above problem
- assume $x^\ast\in\reals^n$ and $Z^\ast\in\possemidefset{k}$ are primal and dual optimum with zero duality gap
- then $$ p^\ast(U) \geq p^\ast - Z^\ast \bullet U $$
- if $\nabla_U p^\ast$ exists at $U=0$ $$ \nabla_U p^\ast(0) = - Z^\ast $$
Weak alternatives for generalized inequalities
Strong alternatives for generalized inequalities
Strong alternatives for SDP
-
for $F_1,\ldots,F_n,G\in\symset{k}$, $x\in\reals^n$, and $Z\in\symset{k}$
- below systems are strong alternatives $$ x_1F_1 + \cdots + x_nF_n + G \prec 0 $$ and $$ Z \succeq 0 \quad Z\neq 0 \quad G\bullet Z \geq 0 \quad F_i \bullet Z = 0\;i=1,\ldots,n $$
- if $\sum v_i F_i \succeq 0 \Rightarrow \sum v_i F_i = 0$, below systems are strong alternatives $$ x_1F_1 + \cdots + x_nF_n + G \preceq 0 $$ and $$ Z \succeq 0 \quad G\bullet Z > 0 \quad F_i \bullet Z = 0\;i=1,\ldots,n $$
Unconstrained Minimization
Unconstrained minimization
- consider unconstrained convex optimization problem, i.e., $m=p=0$ in $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \end{array} $$ where domain of optimization problem is $\optdomain\ = \xobj \subset \reals^n$
-
assume
- $\fobj$ is twice-differentiable (hence by definition $\xobj$ is open)
- optimal solution $x^\ast$ exists, i.e., $p^\ast = \inf_{x\in\optdomain} \fobj(x) = \fobj(x^\ast)$
- implies $x^\ast$ is optimal solution if and only if $$ \nabla \fobj(x^\ast) = 0 $$
- can solve above equation directly for few cases, but usually must resort to iterative method, i.e., find sequence of points $\xseqk{0}, \xseqk{1}, \ldots \in \xobj$ such that $\lim_{k\to\infty} \fobj(\xseqk{k}) = p^\ast$
Requirements for iterative methods
-
requirements for iterative methods
- initial point $\xseqk{0}$ should be in domain of optimization problem, i.e. $$ \xseqk{0} \in \xobj\ $$
- sublevel set of $\fobj(\xseqk{0})$ $$ S = \bigset{x\in\xobj}{\fobj(x) \leq \fobj(\xseqk{0})} $$ should be closed
-
e.g.
- sublevel set of $\fobj(\xseqk{0})$ is closed for all $\xseqk{0}\in\xobj$ if $\fobj$ is closed, i.e., all its sublevel sets are closed
- $\fobj$ is closed if $\xobj = \reals^n$ and $\fobj$ is continuous
- $\fobj$ is closed if $\fobj$ is continuous, $\xobj$ is open, and $\fobj(x) \to \infty$ as $x \to \boundary \xobj$
Unconstrained minimization examples
-
convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) =
(1/2) x^TP x +q^Tx
\end{array}
$$
where $P\in\possemidefset{n}$ and $q\in\reals^n$
-
solution obtained by solving
$$
\nabla \fobj(x^\ast) = P x^\ast + q = 0
$$
- if solution exists, $x^\ast = - P^\dagger q$ (thus $p^\ast>-\infty$)
- otherwise, problem is unbounded below, i.e., $p^\ast = -\infty$
- ability to analytically solve quadratic minimization problem is basis for Newton's method, powerful method for unconstrained minimization
-
least-squares (LS) is special case of convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
(1/2) \|Ax-b\|_2^2
= (1/2) x^T (A^TA) x - b^TAx + (1/2)\|b\|_2^2
\end{array}
$$
- optimal solution always exists, can be obtained via normal equations $$ A^T Ax^\ast = A^Tb $$
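- a quick numerical check of normal equations (NumPy assumed available)

```python
# Sketch: the normal equations reproduce numpy's least-squares solution.
import numpy as np

rng = np.random.default_rng(7)
A, b = rng.standard_normal((8, 3)), rng.standard_normal(8)
x_ne = np.linalg.solve(A.T @ A, A.T @ b)         # normal equations
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.abs(x_ne - x_ls).max())                 # ~ 0
```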
-
unconstrained GP
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) =
\log\left(
\sum \exp (Ax+b)
\right)
\end{array}
$$
for $A\in\reals^{m\times n}$ and $b\in\reals^m$
- solution obtained by solving $$ \nabla \fobj(x^\ast) = \frac{A^T \exp(Ax^\ast+b)}{\sum \exp(Ax^\ast+b)} = 0 $$
- need to resort to iterative method - since $\xobj = \reals^n$ and $\fobj$ is continuous, $\fobj$ is closed, hence every point in $\reals^n$ can be initial point
-
analytic center of linear inequalities
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = - \sum\log(b-Ax)
\end{array}
$$
where $\xobj = \set{x\in\reals^n}{b-Ax \succ 0}$
- need to resort to iterative method - since $\xobj$ is open, $\fobj$ is continuous, and $\fobj(x) \to \infty$ as $x\to\boundary \xobj$, $\fobj$ is closed, hence every point in $\xobj$ can be initial point
- $\fobj$, called logarithmic barrier for inequalities $Ax\preceq b$
-
analytic center of LMI
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = - \log\det F(x) = \log\det F(x)^{-1}
\end{array}
$$
where $F:\reals^n\to \symset{k}$ is defined by
$$
F(x) = x_1F_1 + \cdots + x_nF_n
$$
where $F_i\in \symset{k}$
and $\xobj = \set{x\in\reals^n}{F(x)\succ 0}$
- need to resort to iterative method - since $\xobj$ is open, $\fobj$ is continuous, and $\fobj(x) \to \infty$ as $x\to\boundary \xobj$, $\fobj$ is closed, hence every point in $\xobj$ can be initial point
- $\fobj$, called logarithmic barrier for LMI
Strong convexity and implications
- function $\fobj$ is strongly convex on $S$ if $$ \left( \exists m >0 \right) \left( \forall x \in S \right) \left( \nabla^2 \fobj(x) \succeq mI \right) $$
-
strong convexity implies for every $x,y\in S$
$$
\fobj(y) \geq \fobj(x) + \nabla \fobj(x)^T (y-x) + ({m}/{2}) \|y-x\|_2^2
$$
- which implies gradient provides optimality certificate and tells us how far current point is from optimum, i.e. $$ \fobj(x) - p^\ast \leq ({1}/{2m}) \|\nabla \fobj(x)\|_2^2 \quad \|x-x^\ast\|_2 \leq ({2}/{m}) \|\nabla \fobj(x)\|_2 $$
- strong convexity inequality above implies sublevel sets contained in $S$ are bounded, hence continuous function $\nabla^2 \fobj(x)$ is also bounded on $S$, i.e., $\left( \exists M >0 \right) \left( \forall x \in S \right) \left( \nabla^2 \fobj(x) \preceq M I \right)$, then $$ \fobj(x) - p^\ast \geq \frac{1}{2M} \|\nabla \fobj(x)\|_2^2 $$
Iterative methods
Line search methods
- Require: $\fobj$, descent direction $\sdirk{k}$, $\alpha\in(0,0.5)$, $\beta\in(0,1)$
- $\slen:=1$
- while $\fobj(\xseqk{k} + \slen \sdirk{k}) > \fobj(\xseqk{k}) + \alpha \slen \nabla \fobj(\xseqk{k})^T \sdirk{k}$ do
- $\slen := \beta \slen$
- end while
Gradient descent method
- Require: $\fobj$, initial point $x\in \dom \fobj$
- repeat
- search direction - $\sdir := - \nabla \fobj(x)$
- do line search to choose $\slen>0$
- update - $x := x + \slen \sdir$
- until stopping criterion satisfied
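- below is minimal sketch combining above backtracking line search and gradient descent (NumPy assumed available), applied to unconstrained GP objective from earlier example

```python
# Sketch: gradient descent with backtracking on f(x) = log sum exp(Ax + b).
import numpy as np

def f(x, A, b):
    z = A @ x + b
    return np.log(np.sum(np.exp(z)))

def grad_f(x, A, b):
    z = A @ x + b
    w = np.exp(z - z.max()); w /= w.sum()        # softmax, stabilized
    return A.T @ w

def gradient_descent(A, b, x, alpha=0.25, beta=0.5, tol=1e-8):
    while True:
        g = grad_f(x, A, b)
        if np.linalg.norm(g) < tol:
            return x
        d, t = -g, 1.0
        while f(x + t * d, A, b) > f(x, A, b) + alpha * t * (g @ d):
            t *= beta                            # backtracking line search
        x = x + t * d

rng = np.random.default_rng(8)
A, b = rng.standard_normal((10, 4)), rng.standard_normal(10)
print(f(gradient_descent(A, b, np.zeros(4)), A, b))
```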
Summary of gradient descent method
- gradient method often exhibits approximately linear convergence, i.e., error $\fobj(\xseqk{k})-p^\ast$ converges to zero approximately as geometric series
- choice of backtracking parameters $\alpha$ and $\beta$ has noticeable but not dramatic effect on convergence
- exact line search sometimes improves convergence of gradient method, but not by large, hence mostly not worth implementation
- convergence rate depends greatly on condition number of Hessian or sublevel sets - when condition number is large, gradient method can be useless
Newton's method - motivation
- second-order Taylor expansion of $\fobj$ - $\fobj(x + \sdir) \approx \hat{\fobj}(\sdir) = \fobj(x) + \nabla \fobj(x)^T \sdir + \frac{1}{2} \sdir^T \nabla^2 \fobj(x) \sdir$
- minimum of Taylor expansion achieved when $\nabla \hat{\fobj}(\sdir) = \nabla \fobj(x) + \nabla^2 \fobj(x) \sdir = 0$
- solution called Newton step $$ \sdir_\mathrm{nt}(x) = - \nabla^2 \fobj(x)^{-1} \nabla \fobj(x) $$ assuming $\nabla^2\fobj(x)\succ0$
- thus Newton step minimizes local quadratic approximation of function
- difference between current value and minimum of quadratic approximation $$ \fobj(x) - \hat{\fobj}(\sdir_\mathrm{nt}(x)) = \frac{1}{2} \sdir_\mathrm{nt}^T \nabla^2 \fobj(x) \sdir_\mathrm{nt} = \frac{1}{2} \lambda(x)^2 $$
- Newton decrement $$ \lambda(x) = \sqrt{\sdir_\mathrm{nt}(x)^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}(x)} = \sqrt{\nabla \fobj(x)^T \nabla^2 \fobj(x)^{-1} \nabla \fobj(x)} $$
Newton's method
- Require: $\fobj$, initial point $x\in \dom \fobj$, tolerance $\epsilon>0$
- loop
- compute Newton step and decrement $$ \sdir_\mathrm{nt}(x) := -\nabla^2 \fobj(x)^{-1} \nabla \fobj(x) \quad \lambda(x)^2 := \nabla \fobj(x)^T \nabla^2 \fobj(x)^{-1} \nabla \fobj(x) $$
- stopping criterion - quit if $\lambda(x)^2/2 < \epsilon$
- do line search to choose $\slen>0$
- update - $x := x + \slen \sdir_\mathrm{nt}$
- end loop
- Newton step is descent direction since $$ \left. \left( \frac{d}{dt}\fobj(x+t\sdir_\mathrm{nt}) \right) \right|_{t=0} = \nabla \fobj(x) ^T \sdir_\mathrm{nt} = - \lambda(x)^2 <0 $$
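- same example with Newton's method (NumPy assumed available); stopping criterion uses $\lambda(x)^2/2$

```python
# Sketch: Newton's method with backtracking on f(x) = log sum exp(Ax + b).
import numpy as np

def fgh(x, A, b):
    z = A @ x + b
    w = np.exp(z - z.max()); w /= w.sum()
    f = np.log(np.sum(np.exp(z)))
    g = A.T @ w
    H = A.T @ (np.diag(w) - np.outer(w, w)) @ A  # Hessian of log-sum-exp
    return f, g, H

def newton(A, b, x, alpha=0.25, beta=0.5, eps=1e-10):
    while True:
        f, g, H = fgh(x, A, b)
        dnt = -np.linalg.solve(H, g)             # Newton step
        lam2 = -g @ dnt                          # Newton decrement squared
        if lam2 / 2 < eps:
            return x
        t = 1.0
        while fgh(x + t * dnt, A, b)[0] > f + alpha * t * (g @ dnt):
            t *= beta
        x = x + t * dnt

rng = np.random.default_rng(9)
A, b = rng.standard_normal((10, 4)), rng.standard_normal(10)
print(newton(A, b, np.zeros(4)))
```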
Assumptions for convergence analysis of Newton's method
-
assumptions
- strong convexity and boundedness of Hessian on sublevel set $$ \left( \exists\; m, M > 0 \right) \left( \forall x \in S \right) \left( mI \preceq \nabla^2 \fobj(x) \preceq MI \right) $$
- Lipschitz continuity of Hessian on sublevel set $$ \left( \exists L > 0 \right) \left( \forall x,y\in S \right) \left( \|\nabla^2 \fobj(x)- \nabla^2\fobj(y)\|_2 \leq L \|x-y\|_2 \right) $$
-
Lipschitz continuity constant $L$ plays critical role
in performance of Newton's method
- intuition says Newton's method works well for functions whose quadratic approximations do not change fast, i.e., when $L$ is small
Convergence analysis of Newton's method
- damped Newton phase - if $\|\nabla \fobj(\xseqk{k})\|_2 \geq \eta$, $$ \fobj(\xseqk{k+1}) - \fobj(\xseqk{k}) \leq - \gamma $$
- quadratic convergence phase - if $\|\nabla \fobj(\xseqk{k})\|_2 < \eta$, backtracking line search selects step length $\slenk{k}=1$ $$ \frac{L}{2m^2} \|\nabla \fobj(\xseqk{k+1})\|_2 \leq \left( \frac{L}{2m^2} \|\nabla \fobj(\xseqk{k})\|_2 \right)^2 $$
Summary of Newton's method
- Newton's method is affine invariant, hence performance is independent of condition number unlike gradient method
- once entering quadratic convergence phase, Newton's method converges extremely fast
- performance not much dependent on choice of algorithm parameters
- big disadvantage is computational cost for evaluating search direction, i.e., solving linear system
Self-concordance
Why self-concordance?
- convergence analysis of Newton's method depends on assumptions about function characteristics, e.g., $m,M, L > 0$ for strong convexity, boundedness, and Lipschitz continuity of Hessian, i.e. $$ m I \preceq \nabla^2 f(x) \preceq M I \quad \|\nabla^2 f(x)- \nabla^2f(y)\| \leq L \|x-y\| $$
-
self-concordance, discovered by Nesterov and Nemirovski (who gave it the name), plays important role for reasons such as
- convergence analysis does not depend on any function-characterizing parameters
- many barrier functions used for interior-point methods, important class of optimization algorithms, are self-concordant
- property of self-concordance is affine invariant
Self-concordance preserving operations
Self-concordant function examples
- negative logarithm - $f:\ppreals \to \reals$ with $$ f(x)=-\log x $$ is self-concordant since $$ |f'''(x)| / f''(x)^{3/2} = \left(2/x^3\right) / \left((1/x^2)^{3/2}\right) = 2 $$
- negative entropy plus negative logarithm - $f:\ppreals \to \reals$ with $$ f(x)=x\log x-\log x $$ is self-concordant since $$ |f'''(x)| / f''(x)^{3/2} = (x+2)/{(x+1)^{3/2}} \leq 2 $$
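- the two scalar computations above can be spot-checked symbolically (SymPy assumed available)

```python
# Sketch: symbolic check of |f'''| / f''^(3/2) for the two examples above.
import sympy as sp

x = sp.symbols('x', positive=True)
for f in (-sp.log(x), x * sp.log(x) - sp.log(x)):
    ratio = sp.simplify(sp.Abs(sp.diff(f, x, 3)) / sp.diff(f, x, 2) ** sp.Rational(3, 2))
    print(ratio)    # 2, and an expression equivalent to (x+2)/(x+1)**(3/2)
```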
- log barrier for linear inequalities - for $A\in\reals^{m\times n}$ and $b\in\reals^m$ $$ f(x) = - \sum \log(b-Ax) $$ with $\dom f = \set{x\in\reals^n}{b-Ax \succ 0}$ is self-concordant by , i.e., $f$ is affine transformation of sum of self-concordant functions
- log-determinant - $f:\posdefset{n}\to\reals$ with $$ f(X) = \log\det X^{-1} = - \log\det X $$ is self-concordant: for every $X\in \posdefset{n}$ and $V\in\symset{n}$, function $g:\reals\to\reals$ defined by $g(t) = - \log\det(X+tV)$ with $\dom g = \set{t\in\reals}{X+tV\succ 0}$ satisfies $$ \begin{eqnarray*} g(t) &=& - \log \det (X^{1/2} (I + tX^{-1/2} V X^{-1/2})X^{1/2}) \\ &=& -\log\det X - \log\det(I+tX^{-1/2}VX^{-1/2}) \\ &=& -\log\det X - \sum \log (1+t\lambda_i(X,V)) \end{eqnarray*} $$ where $\lambda_i(X,V)$ is $i$-th eigenvalue of $X^{-1/2}VX^{-1/2}$, hence $g$ is self-concordant by , i.e., $g$ is affine transformation of sum of self-concordant functions
- log of concave quadratic - $f:X\to\reals$ with $$ f(x) = -\log(-x^TPx - q^Tx - r) $$ where $P\in\possemidefset{n}$ and $X=\set{x\in\reals^n}{x^TPx + q^Tx + r<0}$ is self-concordant
-
function $f:X\to\reals$
with
$$
f(x) = -\log(-g(x)) - \log x
$$
where $\dom f = \set{x\in\dom g \cap \ppreals}{g(x)<0}$
and
function $h:H\to\reals$
$$
h(x) = -\log(-g(x)-ax^2-bx-c) - \log x
$$
where $a\geq0$ and $\dom h = \set{x\in\dom g \cap \ppreals}{g(x)+ax^2+bx+c<0}$
are self-concordant
if $g$ is one of below
- $g(x) = -x^p$ for $0<p\leq 1$
- $g(x) = -\log x$
- $g(x) = x \log x$
- $g(x) = x^p$ for $-1\leq p\leq 0$
- $g(x) = (ax+b)^2/x$ for $a,b\in\reals$
- function $f:X\to\reals$ with $X = \set{(x,y)}{\|x\|_2 < y}\subset \reals^n \times \ppreals$ defined by $$ f(x,y) = -\log(y^2-x^Tx) $$ is self-concordant - can be proved using
- function $f:X\to\reals$ with $X = \set{(x,y)}{|x|^p < y}\subset \reals \times \ppreals$ defined by $$ f(x,y) = -2\log y - \log(y^{2/p}- x^2) $$ where $p\geq1$ is self-concordant - can be proved using
- function $f:X\to\reals$ with $X = \set{(x,y)}{\exp(x) < y}\subset \reals \times \ppreals$ defined by $$ f(x,y) = -\log y - \log(\log y - x) $$ is self-concordant - can be proved using
Properties of self-concordant functions
- note $$ \lambda(x) = \sup_{v\neq 0} \left(v^T \nabla \fobj(x) / \left( v^T \nabla^2 \fobj(x) v \right)^{1/2} \right) $$
Stopping criteria for self-concordant objective functions
- recall $\lambda(x)^2$ provides approximate optimality certificate, i.e., assuming $\fobj$ is well approximated by quadratic function around $x$ $$ \fobj(x) - p^\ast \lessapprox \lambda(x)^2/2 $$
- however, strict convexity together with self-concordance provides proven bound $$ \fobj(x) - p^\ast \leq \lambda(x)^2 $$ for $\lambda(x) \leq 0.68$
- hence can use following stopping criterion for guaranteed bound $$ \lambda(x)^2 \leq \epsilon \quad \Rightarrow \quad \fobj(x) - p^\ast \leq \epsilon $$ for $\epsilon \leq 0.68^2$
Convergence analysis of Newton's method for self-concordant functions
- damped Newton phase - if $\lambda(\xseqk{k})>\eta$ $$ \fobj(\xseqk{k+1}) - \fobj(\xseqk{k}) \leq - \gamma $$
- quadratic convergence phase - if $\lambda(\xseqk{k})\leq\eta$ backtracking line search selects step length $\slenk{k}=1$ $$ 2\lambda(\xseqk{k+1}) \leq \left(2\lambda(\xseqk{k})\right)^2 $$
Equality Constrained Minimization
Equality constrained minimization
- consider equality constrained convex optimization problem, i.e., convex optimization problem with no inequality constraints ($m=0$) $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & Ax = b \end{array} $$ where $A\in\reals^{p\times n}$ and domain of optimization problem is $\optdomain\ = \xobj \subset \reals^n$
- assume
- $\rank A = p<n$, i.e., rows of $A$ are linearly independent
- $\fobj$ is twice-differentiable (hence by definition $\xobj$ is open)
- optimal solution $x^\ast$ exists, i.e., $p^\ast = \inf_{x\in\optfeasset} \fobj(x) = \fobj(x^\ast)$ and $Ax^\ast = b$
Solving KKT for equality constrained minimization
- $x^\ast\in\xobj$ is optimal solution if and only if exists $\nu^\ast\in\reals^p$ satisfying KKT optimality conditions, i.e., $$ \begin{eqnarray*} Ax^\ast = b &&\mbox{\define{primal feasibility equations}} \\ \nabla \fobj(x^\ast) + A^T\nu^\ast = 0 &&\mbox{\define{dual feasibility equations}} \end{eqnarray*} $$
- solving equality constrained problem is equivalent to solving KKT equations
- handful of problem types can be solved analytically
- using unconstrained minimization methods
- can eliminate equality constraints and apply unconstrained minimization methods
- can solve dual problem using unconstrained minimization methods and retrieve primal solution
- will discuss Newton's method directly handling equality constraints
- preserving problem structure such as sparsity
Equality constrained convex quadratic minimization
- equality constrained convex quadratic minimization problem $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) = (1/2)x^T P x + q^Tx \\ \mbox{subject to} & Ax = b \end{array} $$ where $P\in\possemidefset{n}$ and $A\in\reals^{p\times n}$
- important since basis for extension of Newton's method to equality constrained problems
- KKT system $$ Ax^\ast = b \; \& \; Px^\ast + q + A^T\nu^\ast = 0 \; \Leftrightarrow \; \underbrace{ \mattwotwo{P}{A^T}{A}{0} }_{\mbox{\define{KKT matrix}}} \colvectwo{x^\ast}{\nu^\ast} = \colvectwo{-q}{b} $$
- exist primal and dual optimum $(x^\ast,\nu^\ast)$ if and only if KKT system has solution; otherwise, problem is unbounded below
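A minimal numpy sketch of this: form the KKT matrix, solve the linear system, and check both optimality conditions (all data randomly generated for illustration only):

```python
# Solve the equality constrained convex QP
#   minimize (1/2) x^T P x + q^T x   subject to   A x = b
# by solving its KKT system.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)          # positive definite, so KKT matrix is nonsingular
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))  # full row rank with probability one
b = rng.standard_normal(p)

KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

assert np.allclose(A @ x_star, b)                      # primal feasibility
assert np.allclose(P @ x_star + q + A.T @ nu_star, 0)  # dual feasibility
```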
Eliminating equality constraints
-
can solve equality constrained convex optimization
by
- eliminating equality constraints and
- using optimization method for solving unconstrained optimization
- note $$ \optfeasset = \set{x}{Ax=b} = \set{Fz + x_0}{z\in\reals^{n-p}} $$ for some $F\in\reals^{n\times(n-p)}$ where $\range(F) = \nullspace(A)$
- thus original problem equivalent to $$ \begin{array}{ll} \mbox{minimize} & \fobj(Fz + x_0) \end{array} $$
- if $z^\ast$ is optimal solution, $x^\ast = Fz^\ast + x_0$
- optimal dual can be retrieved by $$ \nu^\ast = - (AA^T)^{-1} A\nabla \fobj(x^\ast) $$
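For the quadratic problem above the elimination is fully explicit; a sketch using scipy's `null_space` for $F$ (random data again, nothing here is from the original notes):

```python
# Eliminate A x = b via x = F z + x0 and solve the reduced unconstrained QP;
# for f0(x) = (1/2) x^T P x + q^T x the reduced problem is
#   minimize (1/2) z^T (F^T P F) z + (P x0 + q)^T F z.
import numpy as np
from scipy.linalg import lstsq, null_space

rng = np.random.default_rng(1)
n, p = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

F = null_space(A)                   # columns span nullspace(A), shape (n, n-p)
x0 = lstsq(A, b)[0]                 # any particular solution of A x = b

z_star = np.linalg.solve(F.T @ P @ F, -F.T @ (P @ x0 + q))
x_star = F @ z_star + x0
nu_star = -np.linalg.solve(A @ A.T, A @ (P @ x_star + q))  # optimal dual retrieval

assert np.allclose(A @ x_star, b)
```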
Solving dual problems
- Lagrange dual function of equality constrained problem $$ \begin{eqnarray*} g(\nu) & = & \inf_{x\in\optdomain} \left( \fobj(x) + \nu^T(Ax-b) \right) = -b^T\nu - \sup_{x\in\optdomain} \left((-A^T\nu)^Tx -\fobj(x)\right) \\ & = & -b^T \nu - {\fobj}^\ast(-A^T\nu) \end{eqnarray*} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -b^T \nu - {\fobj}^\ast(-A^T\nu) \end{array} $$
- by assumption, strong duality holds, hence if $\nu^\ast$ is dual optimum $$ g(\nu^\ast) = p^\ast $$
- if dual objective is twice-differentiable, can solve dual problem using unconstrained minimization methods
- primal optimum can be retrieved from dual optimum by minimizing Lagrangian $L(x,\nu^\ast)$ over $x$
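For the equality constrained QP with $P\succ 0$ the conjugate is explicit, ${\fobj}^\ast(y) = (1/2)(y-q)^TP^{-1}(y-q)$, so the dual is an unconstrained concave quadratic in $\nu$; a sketch under that assumption:

```python
# Dual approach for  minimize (1/2) x^T P x + q^T x  s.t.  A x = b  with P > 0:
# maximize g(nu) = -b^T nu - f0*(-A^T nu), then minimize the Lagrangian over x.
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

Pinv = np.linalg.inv(P)
# grad g(nu) = 0  <=>  (A P^{-1} A^T) nu = -(b + A P^{-1} q)
nu_star = np.linalg.solve(A @ Pinv @ A.T, -(b + A @ Pinv @ q))
x_star = Pinv @ (-q - A.T @ nu_star)   # argmin_x L(x, nu_star)

assert np.allclose(A @ x_star, b)      # retrieved primal point is feasible
```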
Newton's method with equality constraints
- finally discuss Newton's method which directly handles equality constraints
- similar to Newton's method for unconstrained minimization
- initial point, however, should be feasible, i.e., $\xseqk{0}\in\xobj$ and $A\xseqk{0} = b$
- Newton step tailored for equality constrained problem
Newton step via second-order approximation
- solve original problem approximately by solving $$ \begin{array}{ll} \mbox{minimize} & \hat{\fobj}(x+\sdir) = \fobj(x) + \nabla \fobj(x)^T \sdir + (1/2) \sdir^T \nabla^2 \fobj(x) \sdir \\ \mbox{subject to} & A(x+\sdir) = b \end{array} $$ where $x\in\optfeasset$
- Newton step for equality constrained minimization problem, defined by solution of KKT system for above convex quadratic minimization problem $$ \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0} \colvectwo{\sdir_\mathrm{nt}}{w} = \colvectwo{-\nabla \fobj(x)}{0} $$ well defined only when KKT matrix is nonsingular
Newton step via solving linearized KKT optimality conditions
- recall KKT optimality conditions for equality constrained convex optimization problem $$ Ax^\ast = b \quad \& \quad \nabla \fobj(x^\ast) + A^T\nu^\ast = 0 $$
- linearize KKT conditions $$ \begin{eqnarray*} && A(x+\sdir) = b \quad \& \quad \nabla \fobj(x) + \nabla^2 \fobj(x) \sdir + A^Tw = 0 \\ &\Leftrightarrow& A\sdir = 0 \quad \& \quad \nabla^2 \fobj(x) \sdir + A^Tw = - \nabla \fobj(x) \end{eqnarray*} $$ where $x\in\optfeasset$
- Newton step defined by above equations is equivalent to that obtained by second-order approximation
Newton decrement for equality constrained minimization
- Newton decrement for equality constrained problem is defined by $$ \lambda(x) = \left(\sdir_\mathrm{nt}^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}\right)^{1/2} $$
- same expression as that for unconstrained minimization, but is different since Newton step $\sdir_\mathrm{nt}$ is different from that for unconstrained minimization, i.e., $\sdir_\mathrm{nt} \neq -\nabla^2 \fobj(x)^{-1} \nabla \fobj(x)$ in general
- however, as before, $$ \fobj(x) - \inf_{\sdir\in\reals^n}\set{\hat{\fobj}(x+\sdir)}{A(x+\sdir)=b} = \lambda(x)^2/2 $$ and $$ \left. \left( \frac{d}{dt}\fobj(x+t\sdir_\mathrm{nt}) \right) \right|_{t=0} = \nabla \fobj(x) ^T \sdir_\mathrm{nt} = - \lambda(x)^2 <0 $$
Feasible Newton's method for equality constrained minimization
- Require: $\fobj$, initial point $x\in \dom \fobj$ with $Ax=b$, tolerance $\epsilon>0$
- loop
- compute Newton step and decrement $\ntsdir(x)$ \& $\lambda(x)$
- stopping criterion - quit if $\lambda(x)^2/2 < \epsilon$
- do line search on $\fobj$ to choose $\slen>0$
- update - $x := x + \slen \ntsdir$
- end loop
- assumes KKT matrix is nonsingular at every step
- is feasible descent method since all iterates are feasible with $\fobj(\xseqk{k+1}) <\fobj(\xseqk{k})$
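A minimal numpy sketch of the loop above; `f0`, `grad`, `hess` are assumed callables, with `f0` returning `np.inf` outside $\dom \fobj$ so that backtracking respects the domain. For example, analytic centering with $\fobj(x) = -\sum \log x_i$ fits with `grad(x) = -1/x` and `hess(x) = np.diag(1/x**2)`.

```python
# Feasible Newton's method for  minimize f0(x)  subject to  A x = b;
# x must be strictly feasible initially (x in dom f0, A x = b).
import numpy as np

def feasible_newton(f0, grad, hess, A, b, x, eps=1e-8, alpha=0.1, beta=0.5):
    p = A.shape[0]
    while True:
        g, H = grad(x), hess(x)
        KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
        sol = np.linalg.solve(KKT, np.concatenate([-g, np.zeros(p)]))
        dx = sol[:x.size]            # Newton step (A dx = 0, so feasibility is kept)
        lam_sq = dx @ H @ dx         # Newton decrement squared, lambda(x)^2
        if lam_sq / 2 < eps:         # stopping criterion
            return x
        t = 1.0                      # backtracking line search on f0; g @ dx = -lam_sq
        while f0(x + t * dx) > f0(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
```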
Assumptions for convergence analysis of feasible Newton's method for equality constrained minimization
- feasibility of initial point - $\xseqk{0}\in\dom \fobj \;\&\; A\xseqk{0}=b$
- sublevel set $S = \set{x\in \dom \fobj}{\fobj(x) \leq \fobj(\xseqk{0}),\; Ax=b}$ is closed
- boundedness of Hessian on $S$ $$ \left( \exists M > 0 \right) \left( \forall x\in S \right) \left( \nabla^2 \fobj(x) \preceq M I \right) $$
- boundedness of KKT matrix on $S$ - corresponds to strong convexity assumption in unconstrained minimization $$ \left( \exists K >0 \right) \left( \forall x \in S \right) \left( \left\| \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0}^{-1} \right\|_2 \leq K \right) $$
- Lipschitz continuity of Hessian on $S$ $$ \left( \exists L > 0 \right) \left( \forall x,y\in S \right) \left( \left\|\nabla^2 \fobj(x) - \nabla^2 \fobj(y)\right\|_2 \leq L \|x-y\|_2 \right) $$
Convergence analysis of feasible Newton's method for equality constrained minimization
- convergence analysis of Newton's method for equality constrained minimization can be done by analyzing unconstrained minimization after eliminating equality constraints
- thus, yields exactly same results as for unconstrained minimization (with different parameter values), i.e.,
- consists of damped Newton phase and quadratic convergence phase
- # iterations required to achieve $\fobj(\xseqk{k})-p^\ast \leq \epsilon$ is $$ \left(\fobj(\xseqk{0})-p^\ast\right)/\gamma + \log_2 \log_2 (\epsilon_0/\epsilon) $$
- # iterations required to achieve $\fobj(\xseqk{k})-p^\ast \leq \epsilon$ for self-concordant functions is also same as for unconstrained minimization $$ \left(\fobj(\xseqk{0}) - p^\ast\right)/{\gamma} + \log_2 \log_2 (1 / \epsilon) $$ where $\gamma = \alpha \beta (1-2\alpha)^2 / (20-8\alpha)$
Newton step at infeasible points
- only assume that $x\in\dom \fobj$ (hence, can be infeasible)
- (as before) linearize KKT conditions $$ \begin{eqnarray*} && A(x+\ntsdir) = b \quad \& \quad \nabla \fobj(x) + \nabla^2 \fobj(x) \ntsdir + A^Tw = 0 \\ &\Leftrightarrow& A\ntsdir = b - Ax \quad \& \quad \nabla^2 \fobj(x) \ntsdir + A^Tw = - \nabla \fobj(x) \\ &\Leftrightarrow& \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0} \colvectwo{\ntsdir}{w} = - \colvectwo{\nabla \fobj(x)}{Ax-b} \end{eqnarray*} $$
- same as feasible Newton step except second component on RHS of KKT system
Interpretation as primal-dual Newton step
- update both primal and dual variables $x$ and $\nu$
- define $r:\reals^n\times\reals^p\to\reals^n\times\reals^p$ by $$ r(x,\nu) = (r_\mathrm{dual}(x,\nu),r_\mathrm{pri}(x,\nu)) $$ where $$ \begin{eqnarray*} \mbox{\define{dual residual}} & - & r_\mathrm{dual}(x,\nu) = \nabla \fobj(x) + A^T\nu \\ \mbox{\define{primal residual}} & - & r_\mathrm{pri}(x,\nu) = Ax-b \end{eqnarray*} $$
Equivalence of infeasible Newton step to primal-dual Newton step
- linearize $r$ to obtain primal-dual Newton step, i.e. $$ \begin{eqnarray*} && r(x,\nu) + D_{x,\nu} r(x,\nu) \colvectwo{\pdsdir}{\pdsdirnu} = 0 \\ &\Leftrightarrow& \mattwotwo{\nabla^2f(x)}{A^T}{A}{0} \colvectwo{\pdsdir}{\pdsdirnu} = - \colvectwo{\nabla f(x) + A^T\nu}{Ax-b} \end{eqnarray*} $$
- letting $\nu^+= \nu + \pdsdirnu$ gives $$ \mattwotwo{\nabla^2f(x)}{A^T}{A}{0} \colvectwo{\pdsdir}{\nu^+} = - \colvectwo{\nabla f(x)}{Ax-b} $$
- equivalent to infeasible Newton step
- reveals that current value of dual variable not needed
Residual norm reduction property
- infeasible Newton step is not descent direction (unlike feasible Newton step) since $$ \begin{eqnarray*} \left. \left( \frac{d}{dt}\fobj(x+t\pdsdir) \right) \right|_{t=0} &=& \nabla \fobj(x) ^T \pdsdir \\ &=& - \pdsdir^T \left(\nabla^2 \fobj(x) \pdsdir + A^Tw \right) = - \pdsdir^T \nabla^2 \fobj(x) \pdsdir + (Ax-b)^Tw \end{eqnarray*} $$ which is not necessarily negative
- however, norm of residual decreases in infeasible Newton direction since $Dr(y)\pdsdiry = -r(y)$ $$ \begin{eqnarray*} \left. \left( \frac{d}{dt} \|r(y+t\pdsdiry)\|_2^2 \right) \right|_{t=0} & = & 2 r(y)^T Dr(y) \pdsdiry = - 2 \|r(y)\|_2^2 \\ \Leftrightarrow \quad \left. \left( \frac{d}{dt} \|r(y+t\pdsdiry)\|_2 \right) \right|_{t=0} & = & \frac{-2\|r(y)\|_2^2}{2\|r(y)\|_2} = - \|r(y)\|_2 \end{eqnarray*} $$ where $y=(x,\nu)$ and $\pdsdiry = (\pdsdir, \pdsdirnu)$
- can use $r(\xseqk{k},\nuseqk{k})$ to measure optimization progress for infeasible Newton's method
Full and damped step feasibility property
- assume step length is $t$ at some iteration, then $$ r_\mathrm{pri}(x^+,\nu^+) = Ax^+-b = A(x + t \pdsdir) - b = (1-t) r_\mathrm{pri}(x,\nu) $$
- hence for $l>k$ $$ \seqk{r}{l} = \left( \prod_{i=k}^{l-1} (1-\seqk{t}{i}) \right) \seqk{r}{k} $$
- primal residual reduced by factor $1-\seqk{t}{k}$ at step $k$
- iterates become (and stay) feasible once full step length ($t=1$) is taken
Infeasible Newton's method for equality constrained minimization
- Require: $\fobj$, initial point $x\in \dom \fobj$ \& $\nu$, tolerance $\epsilon_\mathrm{pri}>0$ \& $\epsilon_\mathrm{dual}>0$
- repeat
- compute primal-dual Newton step $\pdsdir(x)$ \& $\pdsdirnu(x)$
- do line search on $r(x,\nu)$ to choose $\slen>0$
- update - $x := x + \slen \pdsdir$ \& $\nu := \nu + \slen \pdsdirnu$
- until $\|r_\mathrm{dual}(x,\nu)\| \leq \epsilon_\mathrm{dual}$ \& $\|Ax-b\| \leq \epsilon_\mathrm{pri}$
- note similarity and difference of feasible \& infeasible Newton's methods
- line search done not on $\fobj$, but on primal-dual residual $r(x,\nu)$
- stopping criterion depends on $r(x,\nu)$, not on Newton decrement $\lambda(x)^2$
- primal and dual feasibility checked separately - here norm in $\|Ax-b\|$ can be any norm, e.g., $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$, depending on specific application
Line search methods for infeasible Newton's method
- line search method for infeasible Newton's method is backtracking line search with $\fobj$ replaced by $\|r(x,\nu)\|_2$
- but has special form - see below (a numpy sketch combining the algorithm above with this line search follows the pseudocode)
- Require: \pdsdir, \pdsdirnu, $\alpha\in(0,0.5)$, $\beta\in(0,1)$
- $\slen:=1$
- while $\|r(x +\slen\pdsdir, \nu + \slen\pdsdirnu)\|_2 > (1-\alpha \slen)\|r(x,\nu)\|_2$ do
- $\slen := \beta \slen$
- end while
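Putting the algorithm and the residual line search together, a minimal numpy sketch (one tolerance `eps` instead of separate $\epsilon_\mathrm{pri}$ \& $\epsilon_\mathrm{dual}$, and $\dom \fobj = \reals^n$ assumed so no domain checks are needed):

```python
# Infeasible-start Newton's method for  minimize f0(x)  subject to  A x = b;
# iterates need not satisfy A x = b, and backtracking is on ||r(x, nu)||_2.
import numpy as np

def infeasible_newton(grad, hess, A, b, x, nu, eps=1e-8, alpha=0.1, beta=0.5):
    def rnorm(x, nu):                 # ||(r_dual, r_pri)||_2
        return np.linalg.norm(np.concatenate([grad(x) + A.T @ nu, A @ x - b]))
    n, p = x.size, A.shape[0]
    while True:
        KKT = np.block([[hess(x), A.T], [A, np.zeros((p, p))]])
        rhs = -np.concatenate([grad(x) + A.T @ nu, A @ x - b])
        sol = np.linalg.solve(KKT, rhs)
        dx, dnu = sol[:n], sol[n:]
        t = 1.0                       # backtracking on the residual norm
        while rnorm(x + t * dx, nu + t * dnu) > (1 - alpha * t) * rnorm(x, nu):
            t *= beta
        x, nu = x + t * dx, nu + t * dnu
        if rnorm(x, nu) <= eps:       # combines primal and dual criteria
            return x, nu
```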
Pros and cons of infeasible Newton's method
- pros
- do not need to find feasible point separately
- if step length is one at any iteration, following steps coincide with feasible Newton's method - could switch to feasible Newton's method
- cons
- exists no clear way to detect infeasibility - primal residual may decrease slowly (phase I method in interior-point methods resolves this problem)
- convergence of infeasible Newton's method can be very slow (until feasibility is achieved)
Assumptions for convergence analysis of infeasible Newton's method for equality constrained minimization
- sublevel set $S = \bigset{(x,\nu)\in \dom \fobj\times \reals^p}{ \|r(x,\nu)\|_2 \leq \|r(\xseqk{0},\nuseqk{0})\|_2 }$ is closed, which always holds if $\|r\|_2$ is closed function
- boundedness of KKT matrix on $S$ $$ \left( \exists K >0 \right) \left( \forall (x,\nu) \in S \right) \left( \left\| Dr(x,\nu)^{-1} \right\|_2 = \left\| \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0}^{-1} \right\|_2 \leq K \right) $$
- Lipschitz continuity of Hessian on $S$ $$ \left( \exists L > 0 \right) \left( \forall (x,\nu), (y,\mu)\in S \right) \left( \left\|Dr(x,\nu) - Dr(y,\mu)\right\|_2 \leq L \|(x,\nu) - (y,\mu)\|_2 \right) $$
- above assumptions imply $\set{x\in\dom \fobj}{Ax=b}\neq\emptyset$ and exist optimal point $(x^\ast,\nu^\ast)$
Convergence analysis of infeasible Newton's method for equality constrained minimization
- very similar to that for Newton's method for unconstrained minimization
- consists of two phases - like unconstrained minimization or feasible equality constrained Newton's method
- damped Newton phase - if $\|r(\xseqk{k},\nuseqk{k})\|_2> 1/(K^2L)$ $$ \|r(\xseqk{k+1},\nuseqk{k+1})\|_2 \leq \|r(\xseqk{k},\nuseqk{k})\|_2 - \alpha \beta / K^2L $$
- quadratic convergence phase - if $\|r(\xseqk{k},\nuseqk{k})\|_2 \leq 1/(K^2L)$ $$ \left( K^2L \|r(\xseqk{k},\nuseqk{k})\|_2 / 2 \right) \leq \left( K^2L \|r(\xseqk{k-1},\nuseqk{k-1})\|_2 / 2 \right)^2 \leq \cdots \leq (1/2)^{2^k} $$
- # iterations of infeasible Newton's method required to satisfy $\|r(\xseqk{k},\nuseqk{k})\|_2\leq\epsilon$ $$ \|r(\xseqk{0},\nuseqk{0})\| /(\alpha \beta / K^2L) + \log_2 \log_2 (\epsilon_0/\epsilon) \quad \mbox{where}\; \epsilon_0 = 2/(K^2L) $$
- $(\xseqk{k},\nuseqk{k})$ converges to $(x^\ast,\nu^\ast)$
Barrier Interior-point Methods
Interior-point methods
- want to solve inequality constrained minimization problem
- interior-point methods solve convex optimization problem or its KKT optimality conditions by applying Newton's method to sequence of
- equality constrained problems or
- modified versions of KKT optimality conditions
- discuss interior-point barrier method \& interior-point primal-dual method
- hierarchy of convex optimization algorithms
- simplest - linear equality constrained quadratic program - can solve analytically
- Newton's method - solve linear equality constrained convex optimization problem by solving sequence of linear equality constrained quadratic programs
- interior-point methods - solve linear equality \& convex inequality constrained problem by solving sequence of linear equality constrained convex optimization problems
Indicator function barriers
- approximate general convex inequality constrained problem as linear equality constrained problem
- make inequality constraints implicit in objective function $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) + \sum I_-(\fie(x)) \\ \mbox{subject to} & Ax=b \end{array} $$ where $I_-:\reals\to \reals$ is indicator function for nonpositive real numbers, i.e. $$ I_{-}(u) = \left\{\begin{array}{ll} 0 & u\leq 0 \\ \infty & u> 0 \end{array}\right. $$
Logarithmic barriers
- approximate indicator function by logarithmic function $$ \hat{I}_-(u) = -(1/t) \log(-u), \quad \dom \hat{I}_- = -\ppreals $$ for $t>0$ to obtain $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) + \sum -(1/t) \log(-\fie(x)) \\ \mbox{subject to} & Ax=b \end{array} $$
- objective function is convex due to composition rule for convexity preservation, and differentiable
- hence, can use Newton's method to solve it
- function $\phi$ defined by $$ \phi(x) = - \sum \log(-\fie(x)) $$ with $\dom \phi = \set{x\in\xdomain}{\fie(x) \prec 0}$ called logarithmic barrier or log barrier
- solve sequence of log barrier problems as we increase $t$
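For linear inequalities $Ax\preceq b$ the barrier and its derivatives are simple enough to write out directly; a numpy sketch (the formulas follow from $\phi(x) = -\sum\log(b_i - a_i^Tx)$):

```python
# Log barrier for A x <= b: value, gradient, and Hessian.
#   phi(x) = -sum_i log(b_i - a_i^T x)
#   grad   =  A^T d            with d_i = 1/(b_i - a_i^T x)
#   hess   =  A^T diag(d)^2 A
import numpy as np

def log_barrier(A, b, x):
    s = b - A @ x                      # slacks; requires A x < b componentwise
    if np.any(s <= 0):
        return np.inf, None, None      # x outside dom phi
    d = 1.0 / s
    return -np.log(s).sum(), A.T @ d, A.T @ np.diag(d ** 2) @ A
```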
Central path
- optimization problem $$ \begin{array}{ll} \mbox{minimize} & t \fobj(x) + \phi(x) \\ \mbox{subject to} & Ax = b \end{array} $$ with $t>0$ where $$ \phi(x) = - \sum \log(-\fie(x)) $$
- solution of above problem, called central point, denoted by $x^\ast(t)$, set of central points, called central path
- intuition says $x^\ast(t)$ will converge to $x^\ast$ as $t\to\infty$
- KKT conditions imply $$ Ax^\ast(t) = b \quad \fie(x^\ast(t)) \prec 0 $$ and exists $\nu^\ast(t)$ such that $$ \begin{eqnarray*} 0 &=& t \nabla \fobj(x^\ast(t)) + \nabla \phi(x^\ast(t)) + t A^T \nu^\ast(t) \\ &=& t\nabla \fobj(x^\ast(t)) - \sum \frac{1}{\fie_i(x^\ast(t))} \nabla\fie_i(x^\ast(t)) + t A^T \nu^\ast(t) \end{eqnarray*} $$
- thus if we let $\lambda_i^\ast(t) = -1/(t\fie_i(x^\ast(t)))$, $x^\ast(t)$ minimizes $$ L(x,\lambda^\ast(t),\nu^\ast(t)) = \fobj(x) + {\lambda^\ast(t)}^T \fie(x) + {\nu^\ast(t)}^T (Ax-b) $$ where $L$ is Lagrangian of original problem
- hence, dual function $g(\lambda^\ast(t),\nu^\ast(t))$ is finite and $$ \begin{eqnarray*} g(\lambda^\ast(t), \nu^\ast(t)) &=& \inf_{x\in\xdomain} L(x,\lambda^\ast(t),\nu^\ast(t)) = L(x^\ast(t),\lambda^\ast(t),\nu^\ast(t)) \\ & = & \fobj(x^\ast(t)) + {\lambda^\ast(t)}^T \fie(x^\ast(t)) + {\nu^\ast(t)}^T (Ax^\ast(t)-b) = \fobj(x^\ast(t)) - m/t \end{eqnarray*} $$ and $$ \fobj(x^\ast(t)) - p^\ast \leq \fobj(x^\ast(t)) - g(\lambda^\ast(t), \nu^\ast(t)) = m/t $$
- that is, $x^\ast(t)$ is no more than $m/t$-suboptimal
- which confirms our intuition that $x^\ast(t)\to x^\ast$ as $t\to\infty$
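- tiny worked example - for $\mbox{minimize } x$ subject to $-x\leq 0$ (so $m=1$, $p^\ast=0$), central point minimizes $tx-\log x$ over $\ppreals$, giving $x^\ast(t)=1/t$ and $\fobj(x^\ast(t))-p^\ast = 1/t = m/t$ exactly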
Central path interpretation via KKT conditions
- previous arguments imply that $x$ is central point, i.e., $x=x^\ast(t)$ for some $t>0$ if and only if exist $\lambda$ and $\nu$ such that $$ \begin{eqnarray*} Ax=b \quad \fie({x}) &\preceq& 0 \quad \mbox{- primal feasibility} \\ \lambda &\succeq& 0 \quad \mbox{- dual feasibility} \\ - \lambda_i \fie_i({x}) &=& 1/t \quad \mbox{- complementary $1/t$-slackness} \\ \nabla_x L(x,\lambda,\nu) &=& 0 \quad \mbox{- vanishing gradient of Lagrangian} \end{eqnarray*} $$ called centrality conditions
- only difference between centrality conditions and KKT conditions is complementary $1/t$-slackness
- note that I've just made up term “complementary $1/t$-slackness'' - you won't be able to find this terminology in any literature
- for large $t$, $\lambda^\ast(t)$ & $\nu^\ast(t)$ very closely satisfy (true) complementary slackness
Central path interpretation via force field
- assume exist no equality constraints
- interpret $\phi$ as potential energy by some force field, e.g., electrical field and $t\fobj$ as potential energy by some other force field, e.g., gravity
- then
- force by first force field (in $n$-dimensional space), which we call barrier force, is $$ - \nabla \phi(x) = \sum \frac{1}{\fie_i(x)} \nabla \fie_i(x) $$
- force by second force field, which we call objective force, is $$ - \nabla (t\fobj(x)) = -t \nabla \fobj(x) $$
- $x^\ast(t)$ is point where two forces exactly balance each other
- as $x$ approaches boundary, barrier force pushes $x$ away from boundary harder
- as $t$ increases, objective force pushes $x$ harder toward point where objective potential energy is minimized
Equality constrained problem using log barrier
- central point $x^\ast(t)$ is $m/t$-suboptimal point guaranteed by optimality certificate $g(\lambda^\ast(t),\nu^\ast(t))$
- hence solving below problem provides solution with $\epsilon$-suboptimality $$ \begin{array}{ll} \mbox{minimize} & (m/\epsilon) \fobj(x) + \phi(x) \\ \mbox{subject to} & Ax=b \end{array} $$
- but works only for small problems since for large $m/\epsilon$, objective function is ill-behaved
Barrier methods
- Require: strictly feasible $x$, $t>0$, $\mu>1$, tolerance $\epsilon>0$
- loop
- centering step - find $x^\ast(t)$ by minimizing $t\fobj + \phi$ subject to $Ax=b$ starting at $x$
- (optionally) compute $\lambda^\ast(t)$ \& $\nu^\ast(t)$
- stopping criterion - quit if $m/t<\epsilon$
- increase $t$ - $t := \mu t$
- update $x$ - $x := x^\ast(t)$
- end loop
- barrier method, also called path-following method, solves sequence of equality constrained optimization problems with log barrier
- when first proposed by Fiacco and McCormick in 1960s, it was called sequential unconstrained minimization technique (SUMT)
- centering step also called outer iteration
- each iteration of algorithm used to solve equality constrained problem called inner iteration
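The outer loop is simple enough to sketch directly; `centering(t, x)` stands for any solver of the centering problem (e.g., feasible Newton's method above) warm-started at $x$ - an assumed callable, not a fixed API:

```python
# Outer loop of the barrier method; centering(t, x) returns (an approximation
# of) x*(t), e.g., Newton's method on  minimize t*f0 + phi  s.t.  A x = b.
def barrier_method(centering, x, m, t=1.0, mu=10.0, eps=1e-6):
    while True:
        x = centering(t, x)   # centering step (outer iteration)
        if m / t < eps:       # central point x*(t) is m/t-suboptimal
            return x
        t *= mu               # increase t
```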
Accuracy in centering in barrier method
- accuracy of centering
- only goal of centering is getting close to $x^\ast$, hence exact computation of $x^\ast(t)$ not critical as long as approximations of $x^\ast(t)$ go to $x^\ast$
- while cannot calculate $g(\lambda,\nu)$ in this case, below provides dual feasible point when Newton step $\ntsdir$ for centering problem is small, i.e., for nearly centered $x$ $$ \tilde{\lambda}_i = -\frac{1}{t\fie_i(x)} \left( 1 - \frac{\nabla \fie_i(x)^T \ntsdir}{\fie_i(x)} \right) $$
Choices of parameters of barrier method
- choice of $\mu$
- $\mu$ determines aggressiveness of $t$-update
- larger $\mu$, fewer outer iterations, but more inner iterations
- smaller $\mu$, more outer iterations, but fewer inner iterations
- values from $10$ to $20$ for $\mu$ seem to work well
- candidates for choice of initial $t$ - choose $\seqk{t}{0}$ such that
- $$ m / \seqk{t}{0} \approx \fobj(\xseqk{0}) - p^\ast $$
- central path condition maximally satisfied $$ \seqk{t}{0} = \arginf_{t} \inf_{\tilde{\nu}} \left\| t \nabla \fobj(\xseqk{0}) + \nabla \phi(\xseqk{0}) + A^T \tilde{\nu} \right\| $$
Convergence analysis of barrier method
- assuming $t\fobj + \phi$ can be minimized by Newton's method for $t = \seqk{t}{0}$, $\mu\seqk{t}{0}$, $\mu^2\seqk{t}{0}$, $\ldots$
- at $k$'th step, duality gap achieved is $m/(\mu^k\seqk{t}{0})$
- # centering steps required to achieve accuracy of $\epsilon$ is $$ \left\lceil \frac{\log \left(m/\epsilon \seqk{t}{0}\right)}{\log \mu} \right\rceil $$ plus one (initial centering step)
- for convergence of centering
- for feasible centering problem, $t\fobj + \phi$ should satisfy conditions for feasible Newton's method above, i.e., initial sublevel set is closed, associated inverse KKT matrix is bounded \& Hessian satisfies Lipschitz condition
- for infeasible centering problem, $t\fobj + \phi$ should satisfy conditions for infeasible Newton's method above
Primal-dual Interior-point Methods
Primal-dual \& barrier interior-point methods
- in primal-dual interior-point methods
- both primal and dual variables are updated at each iteration
- search directions are obtained from Newton's method, applied to modified KKT equations, i.e., optimality conditions for logarithmic barrier centering problem
- primal-dual search directions are similar to, but not quite the same as, search directions arising in barrier methods
- primal and dual iterates are not necessarily feasible
- primal-dual interior-point methods
- often more efficient than barrier methods especially when high accuracy is required - can exhibit better than linear convergence
- (customized versions) outperform barrier method for several basic problem classes, such as LP, QP, SOCP, GP, SDP
- can work for feasible, but not strictly feasible problems
- still active research topic, but show great promise
Modified KKT conditions and central points
- modified KKT conditions (for general convex optimization problem) expressed as $r_t(x,\lambda,\nu) = 0$ where $$ r_t(x,\lambda,\nu) = \colvecthree {\nabla \fobj(x) + D\fie(x)^T\lambda + A^T\nu} {-\diag(\lambda)\fie(x) - (1/t) \ones} {Ax-b} $$ with $$ \begin{eqnarray*} \mbox{\define{dual residual}} &-& r_\mathrm{dual}(x,\lambda,\nu) = {\nabla \fobj(x) + D\fie(x)^T\lambda + A^T\nu} \\ \mbox{\define{centrality residual}} &-& r_\mathrm{cent}(x,\lambda,\nu) = {-\diag(\lambda)\fie(x) - (1/t) \ones} \\ \mbox{\define{primal residual}} &-& r_\mathrm{pri}(x,\lambda,\nu) = {Ax-b} \end{eqnarray*} $$
-
if $x$, $\lambda$, $\nu$ satisfy $r_t(x,\lambda,\nu)=0$ (and $\fie(x) \prec 0$),
then
- $x=x^\ast(t)$, $\lambda=\lambda^\ast(t)$, $\nu=\nu^\ast(t)$
- $x$ is primal feasible and $\lambda$ & $\nu$ are dual feasible with duality gap $m/t$
Primal-dual search direction
- assume current (primal-dual) point $y=(x,\lambda,\nu)$ and Newton step $\sdiry = (\sdir, \sdirlbd, \sdirnu)$
- as before, linearize equation to obtain Newton step, i.e. $$ r_t(y+\sdiry) \approx r_t(y) + Dr_t(y) \sdiry = 0 \quad \Leftrightarrow \quad \sdiry = -Dr_t(y)^{-1} r_t(y) $$ hence $$ \begin{my-matrix}{ccc} \nabla^2 f(x) + \sum \lambda_i \nabla^2 \fie_i(x) & D\fie(x)^T & A^T \\ -\diag(\lambda) D\fie(x) & -\diag(\fie(x)) & 0 \\ A & 0 & 0 \end{my-matrix} \colvecthree{\sdir}{\sdirlbd}{\sdirnu} = - \colvecthree {r_\mathrm{dual}} {r_\mathrm{cent}} {r_\mathrm{pri}} $$
- above equation determines primal-dual search direction $\pdsdiry = (\pdsdir, \pdsdirlbd, \pdsdirnu)$
Surrogate duality gap
- iterates $\xseqk{k}$, $\lbdseqk{k}$, and $\nuseqk{k}$ of primal-dual interior-point method are not necessarily feasible
- hence, cannot easily evaluate duality gap $\seqk{\eta}{k}$ as for barrier method
- define surrogate duality gap for $\fie(x) \prec 0$ and $\lambda\succeq0$ as $$ \hat{\eta}(x,\lambda) = - \fie(x)^T \lambda $$
- $\hat{\eta}$ would be duality gap if $x$ were primal feasible and $\lambda$ & $\nu$ were dual feasible
- value $t$ corresponding to surrogate duality gap $\hat{\eta}$ is $m/\hat{\eta}$
Primal-dual interior-point method
- Require: initial point $x$ with $\fie(x)\prec0$, $\lambda \succ 0$, $\mu > 1$, $\epsilon_\mathrm{pri}>0$, $\epsilon_\mathrm{dual}>0$, $\epsilon>0$
- repeat
- set $t := \mu m /\hat{\eta}$
- compute primal-dual search direction $\pdsdiry = (\pdsdir, \pdsdirlbd, \pdsdirnu)$
- do line search to choose $s>0$
- update - $x := x + s \pdsdir$, $\lambda := \lambda + s \pdsdirlbd$, $\nu := \nu + s \pdsdirnu$
- until $\|r_\mathrm{pri}(x,\lambda,\nu)\|_2\leq \epsilon_\mathrm{pri}$, $\|r_\mathrm{dual}(x,\lambda,\nu)\|_2\leq \epsilon_\mathrm{dual}$, $\hat{\eta} \leq \epsilon$
- common to choose small $\epsilon_\mathrm{pri}$, $\epsilon_\mathrm{dual}$, & $\epsilon$ since primal-dual method often shows faster than linear convergence
Line search for primal-dual interior-point method
- line search is standard backtracking line search on $\|r(x,\lambda,\nu)\|_2$, similar to that for infeasible Newton's method, except making sure that $\fie(x) \prec 0$ and $\lambda\succ0$
- note initial $s$ below is $0.99$ times largest $s\in[0,1]$ that keeps $\lambda + s\pdsdirlbd \succeq 0$
- Require: \pdsdir, \pdsdirlbd, \pdsdirnu, $\alpha\in(0.01,0.1)$, $\beta\in(0.3,0.8)$
- $s := 0.99\sup\set{s\in[0,1]}{\lambda + s \pdsdirlbd \succeq 0} = 0.99\min\{1,\min\set{-\lambda_i/\pdsdirlbd_i}{\pdsdirlbd_i < 0}\}$
- while $\fie (x +s\pdsdir) \not \prec 0$ do
- $s := \beta s$
- end while
- while $\|r(x +s\pdsdir, \lambda + s\pdsdirlbd, \nu + s\pdsdirnu)\|_2 > (1-\alpha s)\|r(x,\lambda,\nu)\|_2$ do
- $s := \beta s$
- end while
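As a concrete instance, a numpy sketch of the method and line search above for the inequality-form LP $\mbox{minimize } c^Tx$ subject to $Gx \preceq h$ (no equality constraints, so $\nu$ and $r_\mathrm{pri}$ drop out, and the Hessian block of the linearized KKT matrix vanishes); an illustrative sketch, not production code:

```python
# Primal-dual interior-point sketch for the LP  minimize c^T x  s.t.  G x <= h.
# Requires a strictly feasible x (G x < h) and lam > 0.
import numpy as np

def pd_interior_point(c, G, h, x, lam, mu=10.0, eps=1e-8, alpha=0.05, beta=0.5):
    m, n = G.shape

    def residual(x, lam, t):
        f = G @ x - h                                 # constraint values, f < 0
        return np.concatenate([c + G.T @ lam,         # r_dual
                               -lam * f - 1.0 / t])   # r_cent

    while True:
        f = G @ x - h
        eta = -f @ lam                                # surrogate duality gap
        if eta <= eps and np.linalg.norm(c + G.T @ lam) <= eps:
            return x, lam
        t = mu * m / eta                              # t := mu * m / eta_hat
        r = residual(x, lam, t)
        # linearized modified KKT system (Hessian block is zero for an LP)
        KKT = np.block([[np.zeros((n, n)), G.T],
                        [-lam[:, None] * G, -np.diag(f)]])
        dx, dlam = np.split(np.linalg.solve(KKT, -r), [n])
        # start from 0.99 times the largest s in [0,1] keeping lam + s*dlam >= 0
        neg = dlam < 0
        s = 0.99 * min(1.0, (-lam[neg] / dlam[neg]).min() if neg.any() else 1.0)
        while np.any(G @ (x + s * dx) >= h):          # keep f(x) strictly negative
            s *= beta
        while (np.linalg.norm(residual(x + s * dx, lam + s * dlam, t))
               > (1 - alpha * s) * np.linalg.norm(r)):
            s *= beta
        x, lam = x + s * dx, lam + s * dlam
```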