All Math Topics in All the Multiverses
posted: 02-Aug-2025 & updated: 03-Aug-2025
\[% \newcommand{\algA}{\algk{A}} \newcommand{\algC}{\algk{C}} \newcommand{\bigtimes}{\times} \newcommand{\compl}[1]{\tilde{#1}} \newcommand{\complexes}{\mathbb{C}} \newcommand{\dom}{\mathop{\bf dom {}}} \newcommand{\ereals}{\reals\cup\{-\infty,\infty\}} \newcommand{\field}{\mathbb{F}} \newcommand{\integers}{\mathbb{Z}} \newcommand{\lbdseqk}[1]{\seqk{\lambda}{#1}} \newcommand{\meas}[3]{({#1}, {#2}, {#3})} \newcommand{\measu}[2]{({#1}, {#2})} \newcommand{\meast}[3]{\left({#1}, {#2}, {#3}\right)} \newcommand{\naturals}{\mathbb{N}} \newcommand{\nuseqk}[1]{\seqk{\nu}{#1}} \newcommand{\pair}[2]{\langle {#1}, {#2}\rangle} \newcommand{\rationals}{\mathbb{Q}} \newcommand{\reals}{\mathbb{R}} \newcommand{\seq}[1]{\left\langle{#1}\right\rangle} \newcommand{\powerset}{\mathcal{P}} \newcommand{\pprealk}[1]{\reals_{++}^{#1}} \newcommand{\ppreals}{\mathbb{R}_{++}} \newcommand{\prealk}[1]{\reals_{+}^{#1}} \newcommand{\preals}{\mathbb{R}_+} \newcommand{\tXJ}{\topos{X}{J}} % \newcommand{\relint}{\mathop{\bf relint {}}} \newcommand{\boundary}{\mathop{\bf bd {}}} \newcommand{\subsetset}[1]{\mathcal{#1}} \newcommand{\Tr}{\mathcal{\bf Tr}} \newcommand{\symset}[1]{\mathbf{S}^{#1}} \newcommand{\possemidefset}[1]{\mathbf{S}_+^{#1}} \newcommand{\posdefset}[1]{\mathbf{S}_{++}^{#1}} \newcommand{\ones}{\mathbf{1}} \newcommand{\Prob}{\mathop{\bf Prob {}}} \newcommand{\prob}[1]{\Prob\left\{#1\right\}} \newcommand{\Expect}{\mathop{\bf E {}}} \newcommand{\Var}{\mathop{\bf Var{}}} \newcommand{\Mod}[1]{\;(\text{mod}\;#1)} \newcommand{\ball}[2]{B(#1,#2)} \newcommand{\generates}[1]{\langle {#1} \rangle} \newcommand{\isomorph}{\approx} \newcommand{\isomorph}{\approx} \newcommand{\nullspace}{\mathcalfont{N}} \newcommand{\range}{\mathcalfont{R}} \newcommand{\diag}{\mathop{\bf diag {}}} \newcommand{\rank}{\mathop{\bf rank {}}} \newcommand{\Ker}{\mathop{\mathrm{Ker} {}}} \newcommand{\Map}{\mathop{\mathrm{Map} {}}} \newcommand{\End}{\mathop{\mathrm{End} {}}} \newcommand{\Img}{\mathop{\mathrm{Im} {}}} \newcommand{\Aut}{\mathop{\mathrm{Aut} {}}} \newcommand{\Gal}{\mathop{\mathrm{Gal} {}}} \newcommand{\Irr}{\mathop{\mathrm{Irr} {}}} \newcommand{\arginf}{\mathop{\mathrm{arginf}}} \newcommand{\argsup}{\mathop{\mathrm{argsup}}} \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\ev}{\mathop{\mathrm{ev} {}}} \newcommand{\affinehull}{\mathop{\bf aff {}}} \newcommand{\cvxhull}{\mathop{\bf Conv {}}} \newcommand{\epi}{\mathop{\bf epi {}}} \newcommand{\injhomeo}{\hookrightarrow} \newcommand{\perm}[1]{\text{Perm}(#1)} \newcommand{\aut}[1]{\text{Aut}(#1)} \newcommand{\ideal}[1]{\mathfrak{#1}} \newcommand{\bigset}[2]{\left\{#1\left|{#2}\right.\right\}} \newcommand{\bigsetl}[2]{\left\{\left.{#1}\right|{#2}\right\}} \newcommand{\primefield}[1]{\field_{#1}} \newcommand{\dimext}[2]{[#1:{#2}]} \newcommand{\restrict}[2]{#1|{#2}} \newcommand{\algclosure}[1]{#1^\mathrm{a}} \newcommand{\finitefield}[2]{\field_{#1^{#2}}} \newcommand{\frobmap}[2]{\varphi_{#1,{#2}}} % %\newcommand{\algfontmode}{} % %\ifdefined\algfontmode %\newcommand\mathalgfont[1]{\mathcal{#1}} %\newcommand\mathcalfont[1]{\mathscr{#1}} %\else \newcommand\mathalgfont[1]{\mathscr{#1}} \newcommand\mathcalfont[1]{\mathcal{#1}} %\fi % %\def\DeltaSirDir{yes} %\newcommand\sdirletter[2]{\ifthenelse{\equal{\DeltaSirDir}{yes}}{\ensuremath{\Delta #1}}{\ensuremath{#2}}} \newcommand{\sdirletter}[2]{\Delta #1} \newcommand{\sdirlbd}{\sdirletter{\lambda}{\Delta \lambda}} \newcommand{\sdir}{\sdirletter{x}{v}} \newcommand{\seqk}[2]{#1^{(#2)}} \newcommand{\seqscr}[3]{\seq{#1}_{#2}^{#3}} 
\newcommand{\xseqk}[1]{\seqk{x}{#1}} \newcommand{\sdirk}[1]{\seqk{\sdir}{#1}} \newcommand{\sdiry}{\sdirletter{y}{\Delta y}} \newcommand{\slen}{t} \newcommand{\slenk}[1]{\seqk{\slen}{#1}} \newcommand{\ntsdir}{\sdir_\mathrm{nt}} \newcommand{\pdsdir}{\sdir_\mathrm{pd}} \newcommand{\sdirnu}{\sdirletter{\nu}{w}} \newcommand{\pdsdirnu}{\sdirnu_\mathrm{pd}} \newcommand{\pdsdiry}{\sdiry_\mathrm{pd}} \newcommand\pdsdirlbd{\sdirlbd_\mathrm{pd}} % \newcommand{\normal}{\mathcalfont{N}} % \newcommand{\algk}[1]{\mathalgfont{#1}} \newcommand{\collk}[1]{\mathcalfont{#1}} \newcommand{\classk}[1]{\collk{#1}} \newcommand{\indexedcol}[1]{\{#1\}} \newcommand{\rel}{\mathbf{R}} \newcommand{\relxy}[2]{#1\;\rel\;{#2}} \newcommand{\innerp}[2]{\langle{#1},{#2}\rangle} \newcommand{\innerpt}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\closure}[1]{\overline{#1}} \newcommand{\support}{\mathbf{support}} \newcommand{\set}[2]{\{#1|#2\}} \newcommand{\metrics}[2]{\langle {#1}, {#2}\rangle} \newcommand{\interior}[1]{#1^\circ} \newcommand{\topol}[1]{\mathfrak{#1}} \newcommand{\topos}[2]{\langle {#1}, \topol{#2}\rangle} % topological space % \newcommand{\alg}{\algk{A}} \newcommand{\algB}{\algk{B}} \newcommand{\algF}{\algk{F}} \newcommand{\algR}{\algk{R}} \newcommand{\algX}{\algk{X}} \newcommand{\algY}{\algk{Y}} % \newcommand\coll{\collk{C}} \newcommand\collB{\collk{B}} \newcommand\collF{\collk{F}} \newcommand\collG{\collk{G}} \newcommand{\tJ}{\topol{J}} \newcommand{\tS}{\topol{S}} \newcommand\openconv{\collk{U}} % \newenvironment{my-matrix}[1]{\begin{bmatrix}}{\end{bmatrix}} \newcommand{\colvectwo}[2]{\begin{my-matrix}{c}{#1}\\{#2}\end{my-matrix}} \newcommand{\colvecthree}[3]{\begin{my-matrix}{c}{#1}\\{#2}\\{#3}\end{my-matrix}} \newcommand{\rowvecthree}[3]{\begin{bmatrix}{#1}&{#2}&{#3}\end{bmatrix}} \newcommand{\mattwotwo}[4]{\begin{bmatrix}{#1}&{#2}\\{#3}&{#4}\end{bmatrix}} % \newcommand\optfdk[2]{#1^\mathrm{#2}} \newcommand\tildeoptfdk[2]{\tilde{#1}^\mathrm{#2}} \newcommand\fobj{\optfdk{f}{obj}} \newcommand\fie{\optfdk{f}{ie}} \newcommand\feq{\optfdk{f}{eq}} \newcommand\tildefobj{\tildeoptfdk{f}{obj}} \newcommand\tildefie{\tildeoptfdk{f}{ie}} \newcommand\tildefeq{\tildeoptfdk{f}{eq}} \newcommand\xdomain{\mathcalfont{X}} \newcommand\xobj{\optfdk{\xdomain}{obj}} \newcommand\xie{\optfdk{\xdomain}{ie}} \newcommand\xeq{\optfdk{\xdomain}{eq}} \newcommand\optdomain{\mathcalfont{D}} \newcommand\optfeasset{\mathcalfont{F}} % \newcommand{\bigpropercone}{\mathcalfont{K}} % \newcommand{\prescript}[3]{\;^{#1}{#3}} % %\]
Introduction
Preamble
Notations
-
sets of numbers
- $\naturals$ - set of natural numbers
- $\integers$ - set of integers
- $\integers_+$ - set of nonnegative integers
- $\rationals$ - set of rational numbers
- $\reals$ - set of real numbers
- $\preals$ - set of nonnegative real numbers
- $\ppreals$ - set of positive real numbers
- $\complexes$ - set of complex numbers
-
sequences $\seq{x_i}$ and the like
- finite $\seq{x_i}_{i=1}^n$, infinite $\seq{x_i}_{i=1}^\infty$ - use $\seq{x_i}$ whenever unambiguously understood
- similarly for other operations, e.g., $\sum x_i$, $\prod x_i$, $\cup A_i$, $\cap A_i$, $\bigtimes A_i$
- similarly for integrals, e.g., $\int f$ for $\int_{-\infty}^\infty f$
-
sets
- $\compl{A}$ - complement of $A$
- $A\sim B$ - $A\cap \compl{B}$
- $A\Delta B$ - $(A\cap \compl{B}) \cup (\compl{A} \cap B)$
- $\powerset(A)$ - set of all subsets of $A$
-
sets in metric vector spaces
- $\closure{A}$ - closure of set $A$
- $\interior{A}$ - interior of set $A$
- $\relint A$ - relative interior of set $A$
- $\boundary A$ - boundary of set $A$
-
set algebra
- $\sigma(\subsetset{A})$ - $\sigma$-algebra generated by $\subsetset{A}$, i.e., smallest $\sigma$-algebra containing $\subsetset{A}$
-
norms in $\reals^n$
- $\|x\|_p$ ($p\geq1$) - $p$-norm of $x\in\reals^n$, i.e., $(|x_1|^p + \cdots + |x_n|^p)^{1/p}$
- e.g., $\|x\|_2$ - Euclidean norm
-
matrices and vectors
- $a_{i}$ - $i$-th entry of vector $a$
- $A_{ij}$ - entry of matrix $A$ at position $(i,j)$, i.e., entry in $i$-th row and $j$-th column
- $\Tr(A)$ - trace of $A \in\reals^{n\times n}$, i.e., $A_{1,1}+ \cdots + A_{n,n}$
-
symmetric, positive definite, and positive semi-definite matrices
- $\symset{n}\subset \reals^{n\times n}$ - set of symmetric matrices
- $\possemidefset{n}\subset \symset{n}$ - set of positive semi-definite matrices; $A\succeq0 \Leftrightarrow A \in \possemidefset{n}$
- $\posdefset{n}\subset \symset{n}$ - set of positive definite matrices; $A\succ0 \Leftrightarrow A \in \posdefset{n}$
-
sometimes,
use Python script-like notations
(with serious abuse of mathematical notations)
-
use $f:\reals\to\reals$ as if it were $f:\reals^n \to \reals^n$,
e.g.,
$$
\exp(x) = (\exp(x_1), \ldots, \exp(x_n)) \quad \mbox{for } x\in\reals^n
$$
and
$$
\log(x) = (\log(x_1), \ldots, \log(x_n)) \quad \mbox{for } x\in\ppreals^n
$$
which corresponds to Python code `numpy.exp(x)` or `numpy.log(x)` where `x` is instance of `numpy.ndarray`, i.e., `numpy` array
-
use $\sum x$ to mean $\ones^T x$ for $x\in\reals^n$,
i.e.
$$
\sum x = x_1 + \cdots + x_n
$$
which corresponds to Python code `x.sum()` where `x` is `numpy` array
-
use $x/y$ for $x,y\in\reals^n$ to mean
$$
\rowvecthree{x_1/y_1}{\cdots}{x_n/y_n}^T
$$
which corresponds to Python code `x / y` where `x` and `y` are $1$-d `numpy` arrays
-
use $X/Y$ for $X,Y\in\reals^{m\times n}$ to mean
$$
\begin{my-matrix}{cccc}
X_{1,1}/Y_{1,1} & X_{1,2}/Y_{1,2} & \cdots & X_{1,n}/Y_{1,n}
\\
X_{2,1}/Y_{2,1} & X_{2,2}/Y_{2,2} & \cdots & X_{2,n}/Y_{2,n}
\\
\vdots & \vdots & \ddots & \vdots
\\
X_{m,1}/Y_{m,1} & X_{m,2}/Y_{m,2} & \cdots & X_{m,n}/Y_{m,n}
\end{my-matrix}
$$
which corresponds to Python code `X / Y` where `X` and `Y` are $2$-d `numpy` arrays
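For concreteness, here is a minimal `numpy` sketch of the abused notations above (array values are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

np.exp(x)    # elementwise exp: (e^1, e^2, e^3)
np.log(x)    # elementwise log; requires positive entries
x.sum()      # 1 + 2 + 3 = 6, i.e., "sum x" equals ones^T x
x / y        # elementwise division: (1/4, 2/5, 3/6)

X = np.arange(1.0, 7.0).reshape(2, 3)  # 2-d array
Y = np.full((2, 3), 2.0)               # 2-d array of 2's
X / Y        # elementwise division of 2-d arrays
```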
Some definitions
Some conventions
-
(for some subjects) use following conventions
- $0\cdot \infty = \infty \cdot 0 = 0$
- $(\forall x\in\ppreals)(x\cdot \infty = \infty \cdot x = \infty)$
- $\infty \cdot \infty = \infty$
Math Stories
Dualities
-
duality
- “very pervasive and important concept in (modern) mathematics”
- “important general theme having manifestations in almost every area of mathematics”
-
dualities appear in many places in mathematics, e.g.
- dual of normed space is space of bounded linear functionals on the space
- dual cones and dual norms are defined
- can define dual generalized inequalities using dual cones
- can find necessary and sufficient conditions for $K$-convexity using dual generalized inequalities
- duality can be observed even in fundamental theorem for Galois theory, i.e., $G(K/E) \leftrightarrow E$ & $H \leftrightarrow K^H$
- exist dualities in continuous / discrete functions in time domain and continuous / discrete functions in frequency domain, i.e., as in Fourier Transformation
- However, never fascinated more than , e.g.,
Algebra
Inequalities
Jensen's inequality
- strictly convex function: for any $x\neq y$ and $0< \alpha <1$ $$ \alpha f(x) + (1-\alpha) f(y) > f(\alpha x + (1-\alpha) y) $$
- convex function: for any $x, y$ and $0< \alpha <1$ $$ \alpha f(x) + (1-\alpha) f(y) \geq f(\alpha x + (1-\alpha) y) $$
- Jensen's inequality: for convex $f$ and $\alpha_i>0$ with $\alpha_1+\cdots+\alpha_n=1$ $$ \alpha_1 f(x_1) + \cdots + \alpha_n f(x_n) \geq f(\alpha_1 x_1 + \cdots + \alpha_n x_n) $$ if $f$ is strictly convex, equality holds if and only if $x_1=\cdots=x_n$
Jensen's inequality - for random variables
- discrete random variable interpretation of Jensen's inequality in summation form - assume $\Prob(X=x_i) = \alpha_i$, then $$ \Expect f(X) = \alpha_1 f(x_1) + \cdots + \alpha_n f(x_n) \geq f(\alpha_1 x_1 + \cdots + \alpha_n x_n) = f\left(\Expect X\right) $$
- true for any random variable $X$ for which $\Expect X$ and $\Expect f(X)$ exist
Proof for $n=3$
- for any $x,y,z$ and $\alpha,\beta,\gamma>0$ with $\alpha + \beta + \gamma = 1$ $$ \begin{eqnarray*} \alpha f(x) + \beta f(y) + \gamma f(z) &=& (\alpha+\beta)\left(\frac{\alpha}{\alpha+\beta} f(x) + \frac{\beta}{\alpha + \beta} f(y)\right) + \gamma f(z) \\ &\geq& (\alpha+\beta)f\left(\frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha + \beta} y\right) + \gamma f(z) \\ &\geq& f\left((\alpha+\beta)\left(\frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha + \beta} y\right) + \gamma z \right) \\ &=& f(\alpha x + \beta y + \gamma z ) \end{eqnarray*} $$
Proof for all $n$
-
use mathematical induction
- assume that Jensen's inequality holds for $1\leq n\leq m$
- for distinct $x_i$ and $\alpha_i>0$ ($1\leq i\leq m+1$) with $\alpha_1 + \cdots + \alpha_{m+1} = 1$ $$ \begin{eqnarray*} \sum^{m+1}_{i=1} \alpha_i f(x_i) &=& \left(\sum^m_{j=1} \alpha_j\right) \sum^m_{i=1} \left(\frac{\alpha_i}{\sum^m_{j=1} \alpha_j} f(x_i)\right) + \alpha_{m+1} f(x_{m+1}) \\ &\geq& \left(\sum^m_{j=1} \alpha_j\right) f\left(\sum^m_{i=1} \left(\frac{\alpha_i}{\sum^m_{j=1} \alpha_j} x_i\right)\right) + \alpha_{m+1} f(x_{m+1}) \\ &=& \left(\sum^m_{j=1} \alpha_j\right) f\left(\frac{1}{\sum^m_{j=1} \alpha_j}\sum^m_{i=1} {\alpha_i}{} x_i\right) + \alpha_{m+1} f(x_{m+1}) \\ &\geq& f\left( \sum^m_{i=1} \alpha_i x_i + \alpha_{m+1} x_{m+1}\right) = f\left( \sum^{m+1}_{i=1} \alpha_i x_i \right) \end{eqnarray*} $$
1st and 2nd order conditions for convexity
- 1st order condition (assuming differentiable $f:\reals\to\reals$) - $f$ is strictly convex if and only if for any $x\neq y$ $$ f(y) > f(x) + f'(x)(y-x) $$
-
2nd order condition (assuming twice-differentiable $f:\reals\to\reals$)
- if $f''(x)>0$, $f$ is strictly convex
- $f$ is convex if and only if for any $x$ $$ f''(x) \geq 0 $$
Jensen's inequality examples
- $f(x)=x^2$ is strictly convex $$ \frac{a^2 + b^2}{2} \geq \left(\frac{a+b}{2}\right)^2 $$
- $f(x)=x^4$ is strictly convex $$ \frac{a^4 + b^4}{2} \geq \left(\frac{a+b}{2}\right)^4 $$
- $f(x)=\exp(x)$ is strictly convex $$ \frac{\exp(a) + \exp(b)}{2} \geq \exp\left(\frac{a+b}{2}\right) $$
- equality holds if and only if $a=b$ for all inequalities
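A quick numeric sanity check of the three examples above (the values of $a$ and $b$ are arbitrary):

```python
import math

a, b = 1.3, 2.7
m = (a + b) / 2  # midpoint, i.e., alpha = 1/2
assert (a**2 + b**2) / 2 >= m**2                        # f(x) = x^2
assert (a**4 + b**4) / 2 >= m**4                        # f(x) = x^4
assert (math.exp(a) + math.exp(b)) / 2 >= math.exp(m)   # f(x) = exp(x)
```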
1st and 2nd order conditions for convexity - vector version
- 1st order condition (assuming differentiable $f:\reals^n\to\reals$) - $f$ is strictly convex if and only if for any $x\neq y$ $$ f(y) > f(x) + \nabla f(x)^T (y-x) $$ where $\nabla f(x) \in\reals^{n}$ with $\nabla f(x)_{i} = \partial f(x) / \partial x_i$
-
2nd order condition (assuming twice-differentiable $f:\reals^n\to\reals$)
- if $\nabla^2 f(x) \succ 0$, $f$ is strictly convex
- $f$ is convex if and only if for any $x$ $$ \nabla^2 f(x)\succeq 0 $$
Jensen's inequality examples - vector version
- assume $f:\reals^n\to\reals$
-
$f(x)=\|x\|_2 = \sqrt{\sum x_i^2}$ is convex (though, like every norm, not strictly convex: it is linear along rays)
$$
(\|a\|_2 + 2\|b\|_2 )/3
\geq
\left\|(a+2b)/3\right\|_2
$$
- equality holds if and only if $a$ and $b$ are nonnegatively proportional, e.g., $a=b\in\reals^n$
-
$f(x)=\|x\|_p = \left(\sum |x_i|^p\right)^{1/p}$ ($p>1$) is convex
$$
\frac{1}{k}
\left(\sum_{i=1}^k\|x^{(i)}\|_p \right)
\geq
\left\|\frac{1}{k}\sum_{i=1}^k x^{(i)}\right\|_p
$$
- equality holds if and only if $x^{(1)},\ldots,x^{(k)}\in\reals^n$ are nonnegatively proportional, e.g., all equal
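A numeric check of the first vector example with `numpy` (random vectors, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
b = rng.standard_normal(5)
# convexity of the 2-norm with weights 1/3 and 2/3
lhs = (np.linalg.norm(a) + 2 * np.linalg.norm(b)) / 3
rhs = np.linalg.norm((a + 2 * b) / 3)
assert lhs >= rhs
```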
AM $\geq$ GM
-
for all $a,b>0$
$$
\frac{a + b}{2} \geq \sqrt{ab}
$$
- equality holds if and only if $a=b$
- the most general form below also holds: for $a_i>0$ and $\alpha_i>0$ with $\alpha_1+\cdots+\alpha_n=1$ $$ \alpha_1 a_1 + \cdots + \alpha_n a_n \geq a_1^{\alpha_1}\cdots a_n^{\alpha_n} $$
- let's prove these incrementally (first for rational $\alpha_i$, then for real $\alpha_i$)
Proof of AM $\geq$ GM - simplest case
- use fact that $x^2\geq0$ for any $x\in\reals$
-
for any $a,b>0$
$$
\begin{eqnarray*}
&&
(\sqrt{a}-\sqrt{b})^2 \geq 0
\\
&\Leftrightarrow&
a - 2\sqrt{ab} + b \geq 0
\\
&\Leftrightarrow&
a + b \geq 2\sqrt{ab}
\\
&\Leftrightarrow&
\frac{a + b}{2} \geq \sqrt{ab}
\end{eqnarray*}
$$
- equality holds if and only if $a=b$
Proof of AM $\geq$ GM - when $n=4$ and $n=8$
-
for any $a,b,c,d>0$
$$
\frac{a+b+c+d}{4}
\geq
\frac{2\sqrt{ab} + 2\sqrt{cd}}{4}
=
\frac{\sqrt{ab} + \sqrt{cd}}{2}
\geq
\sqrt{\sqrt{ab} \sqrt{cd}}
=
\sqrt[4]{abcd}
$$
- equality holds if and only if $a=b$ and $c=d$ and $ab=cd$ if and only if $a=b=c=d$
-
likewise, for $a_1,\ldots,a_8>0$
$$
\begin{eqnarray*}
\frac{a_1+\cdots+a_8}{8}
&\geq&
\frac{\sqrt{a_1a_2} + \sqrt{a_3a_4} + \sqrt{a_5a_6} + \sqrt{a_7a_8}}{4}
\\
&\geq&
\sqrt[4]{\sqrt{a_1a_2} \sqrt{a_3a_4} \sqrt{a_5a_6} \sqrt{a_7a_8}}
\\
&=&
\sqrt[8]{a_1\cdots a_8}
\end{eqnarray*}
$$
- equality holds if and only if $a_1=\cdots=a_8$
Proof of AM $\geq$ GM - when $n=2^m$
-
generalized to cases $n=2^m$
$$
\left(\sum_{i=1}^{2^m} a_i\right) / 2^m\geq \left({\prod_{i=1}^{2^m} a_i}\right)^{1/2^m}
$$
- equality holds if and only if $a_1=\cdots=a_{2^m}$
- can be proved by mathematical induction
Proof of AM $\geq$ GM - when $n=3$
-
proof for $n=3$
$$
\begin{eqnarray*}
&&
\frac{a+b+c}{3} = \frac{a + b + c + (a+b+c)/3}{4}
\geq \sqrt[4]{abc(a+b+c)/3}
\\
&\Rightarrow&
\left(\frac{a+b+c}{3}\right)^4 \geq {abc(a+b+c)/3}
\\
&\Leftrightarrow&
\left(\frac{a+b+c}{3}\right)^3 \geq abc
\\
&\Leftrightarrow&
\frac{a+b+c}{3} \geq \sqrt[3]{abc}
\end{eqnarray*}
$$
- equality holds if and only if $a=b=c=(a+b+c)/3$ if and only if $a=b=c$
Proof of AM $\geq$ GM - for all integers
- for any integer $n\neq 2^m$
-
for $m$ such that $2^m>n$
$$
\begin{eqnarray*}
&&
\frac{a_1+\cdots+a_n}{n} = \frac{a_1 + \cdots + a_n + (2^m-n) (a_1+\cdots+a_n) /n}{2^m}
\\
&&
\geq
\sqrt[2^m]{a_1\cdots a_n \cdot ((a_1 + \cdots + a_n)/n)^{2^m-n}}
\\
&\Leftrightarrow&
\left(\frac{a_1+\cdots+a_n}{n}\right)^{2^m}
\geq
{a_1\cdots a_n \cdot \left(\frac{a_1 + \cdots + a_n}{n}\right)^{2^m-n}}
\\
&\Leftrightarrow&
\left(\frac{a_1+\cdots+a_n}{n}\right)^{n}
\geq
{a_1\cdots a_n}
\\
&\Leftrightarrow&
\frac{a_1+\cdots+a_n}{n}
\geq
\sqrt[n]{a_1\cdots a_n}
\end{eqnarray*}
$$
- equality holds if and only if $a_1=\cdots=a_n$
Proof of AM $\geq$ GM - rational $\alpha_i$
- given $n$ positive rational $\alpha_i$, we can find $n$ natural numbers $q_i$ such that $$ \alpha_i = \frac{q_i}{ N} $$ where $q_1+\cdots+q_n=N$
-
for any $n$ positive $a_i\in\reals$ and $n$ positive $\alpha_i\in\rationals$ with $\alpha_1+\cdots+\alpha_n=1$
$$
\alpha_1 a_1 + \cdots + \alpha_n a_n
= \frac{q_1 a_1 + \cdots + q_n a_n}{N}
\geq \sqrt[N]{a_1^{q_1}\cdots a_n^{q_n}}
= a_1^{\alpha_1}\cdots a_n^{\alpha_n}
$$
- equality holds if and only if $a_1=\cdots=a_n$
Proof of AM $\geq$ GM - real $\alpha_i$
- exist $n$ rational sequences $\{ \beta_{i,1}, \beta_{i,2}, \ldots\}$ ($1\leq i\leq n$) such that $$ \begin{eqnarray*} && \beta_{1,j}+\cdots+\beta_{n,j}=1 \ \forall \ j\geq1 \\ && \lim_{j\to\infty} \beta_{i,j} = \alpha_i \ \forall \ 1\leq i\leq n \end{eqnarray*} $$
- for all $j$ $$ \beta_{1,j} a_1 + \cdots + \beta_{n,j} a_n \geq a_1^{\beta_{1,j}}\cdots a_n^{\beta_{n,j}} $$ hence $$ \begin{eqnarray*} && \lim_{j\to\infty} \left(\beta_{1,j} a_1 + \cdots + \beta_{n,j} a_n \right) \geq \lim_{j\to\infty} a_1^{\beta_{1,j}}\cdots a_n^{\beta_{n,j}} \\ &\Leftrightarrow& \alpha_1 a_1 + \cdots + \alpha_n a_n \geq a_1^{\alpha_1}\cdots a_n^{\alpha_n} \end{eqnarray*} $$
- cannot prove equality condition from above proof method
Proof of AM $\geq$ GM using Jensen's inequality
- $(-\log)$ is strictly convex function because $$ \frac{d^2}{dx^2} \left(-\log(x)\right) = \frac{d}{dx} \left(-\frac{1}{x} \right) = \frac{1}{x^2} > 0 $$
- Jensen's inequality implies for $a_i >0$, $\alpha_i >0$ with $\sum \alpha_i = 1$ $$ \begin{eqnarray*} -\log\left(\prod a_i^{\alpha_i}\right) = -\sum \log\left( a_i^{\alpha_i}\right) = \sum \alpha_i (-\log(a_i)) \geq -\log \left(\sum \alpha_i a_i\right) \end{eqnarray*} $$
- $(-\log)$ strictly monotonically decreases, hence $\prod a_i^{\alpha_i} \leq \sum \alpha_i a_i$, which proves $$ \alpha_1 a_1 + \cdots + \alpha_n a_n \geq a_1^{\alpha_1}\cdots a_n^{\alpha_n} $$ with equality if and only if all $a_i$ are equal (by the equality condition of Jensen's inequality)
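A numeric check of the weighted AM $\geq$ GM inequality just proved, with randomly drawn positive $a_i$ and weights $\alpha_i$ (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.uniform(0.1, 10.0, size=4)   # positive a_i
alpha = rng.uniform(size=4)
alpha /= alpha.sum()                 # weights summing to 1
am = float(np.dot(alpha, a))         # weighted arithmetic mean
gm = float(np.prod(a ** alpha))      # weighted geometric mean
assert am >= gm
```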
Cauchy-Schwarz inequality
-
middle school proof
$$
\begin{eqnarray*}
&&\sum (t a_i + b_i)^2 \geq 0 \ \forall\ t \in \reals
\\
&\Leftrightarrow&
t^2 \sum a_i^2 + 2t \sum a_ib_i + \sum b_i^2 \geq 0 \ \forall\ t \in \reals
\\
&\Leftrightarrow&
\Delta = \left(\sum a_ib_i \right)^2 - \sum a_i^2 \sum b_i^2 \leq 0
\end{eqnarray*}
$$
- equality holds if and only if $\exists t\in\reals$, $t a_i + b_i=0$ for all $1\leq i\leq n$
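A numeric sanity check of the inequality and its equality condition (random data; illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(6)
b = rng.standard_normal(6)
assert np.dot(a, b)**2 <= np.dot(a, a) * np.dot(b, b)

# equality when b is proportional to a (take t with t*a + b = 0)
c = -3.0 * a
assert np.isclose(np.dot(a, c)**2, np.dot(a, a) * np.dot(c, c))
```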
Cauchy-Schwarz inequality - another proof
-
$x^2\geq0$ for any $x\in\reals$, hence
$$
\begin{eqnarray*}
&&
\sum_i \sum_j (a_ib_j - a_jb_i)^2 \geq0
\\
&\Leftrightarrow&
\sum_i \sum_j (a_i^2b_j^2 - 2a_ia_jb_ib_j + a_j^2b_i^2) \geq0
\\
&\Leftrightarrow&
\sum_i \sum_j a_i^2b_j^2 + \sum_i \sum_j a_j^2b_i^2 -2 \sum_i \sum_j a_ia_jb_ib_j \geq 0
\\
&\Leftrightarrow&
2 \sum_i a_i^2 \sum_j b_j^2 - 2 \sum_i a_ib_i \sum_j a_jb_j \geq 0
\\
&\Leftrightarrow&
\sum_i a_i^2 \sum_j b_j^2 - \left(\sum_i a_ib_i\right)^2 \geq0
\end{eqnarray*}
$$
- equality holds if and only if $a_ib_j=a_jb_i$ for all $1\leq i,j\leq n$
Cauchy-Schwarz inequality - still another proof
- for any $x,y\in\reals$ and $\alpha,\beta>0$ with $\alpha + \beta = 1$ $$ \begin{eqnarray*} && (\alpha x - \beta y)^2 = \alpha^2 x^2 + \beta^2 y^2 - 2\alpha \beta xy \\ && = \alpha(1-\beta) x^2 + (1-\alpha)\beta y^2 - 2\alpha \beta xy \geq 0 \\ &\Leftrightarrow& \alpha x^2 + \beta y^2 \geq \alpha \beta x^2 + \alpha \beta y^2 + 2\alpha \beta xy = \alpha \beta (x+y)^2 \\ &\Leftrightarrow& x^2 / \alpha + y^2 / \beta \geq (x+y)^2 \end{eqnarray*} $$
- plug in $x=a_i$, $y=b_i$, $\alpha = A/(A+B)$, $\beta=B/(A+B)$ where $A = \sqrt{\sum a_i^2}$, $B = \sqrt{\sum b_i^2}$ $$ \begin{eqnarray*} && \sum (a_i^2 / \alpha + b_i^2 / \beta) \geq \sum (a_i+b_i)^2 \Leftrightarrow (A+B)^2 \geq A^2 + B^2 + 2 \sum a_i b_i \\ &\Leftrightarrow& AB \geq \sum a_i b_i \Leftrightarrow A^2B^2 \geq \left(\sum a_i b_i\right)^2 \Leftrightarrow {\sum a_i^2}{\sum b_i^2} \geq \left(\sum a_i b_i \right)^2 \end{eqnarray*} $$
Cauchy-Schwarz inequality - proof using determinant
-
almost the same proof as first one - but using $2$-by-$2$ matrix determinant
$$
\begin{eqnarray*}
&&\sum (x a_i + y b_i )^2 \geq 0 \ \forall\ x,y \in \reals
\\
&\Leftrightarrow&
x^2 \sum a_i^2 + 2xy \sum a_ib_i + y^2\sum b_i^2 \geq 0 \ \forall \ x, y \in \reals
\\
&\Leftrightarrow&
\begin{my-matrix}{cc}
x & y
\end{my-matrix}
\begin{my-matrix}{cc}
\sum a_i^2 & \sum a_ib_i
\\
\sum a_ib_i & \sum b_i^2
\end{my-matrix}
\begin{my-matrix}{c}
x \\ y
\end{my-matrix}
\geq 0
\ \forall \ x, y \in \reals
\\
\\
&\Leftrightarrow&
\left|
\begin{array}{cc}
\sum a_i^2 & \sum a_ib_i
\\
\sum a_ib_i & \sum b_i^2
\end{array}
\right|
\geq 0
\Leftrightarrow
\sum a_i^2 \sum b_i^2 - \left(\sum a_ib_i \right)^2 \geq0
\end{eqnarray*}
$$
- equality holds if and only if $$ \left( \exists (x,y)\neq(0,0) \right) \left( xa_i + yb_i=0\ \ \forall 1\leq i\leq n \right) $$
- allows beautiful generalization of Cauchy-Schwarz inequality
Cauchy-Schwarz inequality - generalization
- want to say something like $\sum_{i=1}^n (x a_i + y b_i + z c_i + w d_i + \cdots)^2$
-
run out of letters - use double subscripts
$$
\begin{eqnarray*}
&&
\sum_{i=1}^n (x_1 A_{1,i} + x_2 A_{2,i} + \cdots + x_m A_{m,i})^2 \geq 0 \ \forall\ x_i \in \reals
\\
&\Leftrightarrow&
\sum_{i=1}^n (x^T a_i)^2
=
\sum_{i=1}^n x^T a_ia_i^T x
=
x^T \left(\sum_{i=1}^n a_ia_i^T\right) x \geq 0 \ \forall\ x \in \reals^m
\\
&\Rightarrow&
\left|
\begin{array}{cccc}
\sum_{i=1}^n A_{1,i}^2 & \sum_{i=1}^n A_{1,i} A_{2,i} & \cdots & \sum_{i=1}^n A_{1,i} A_{m,i}
\\
\sum_{i=1}^n A_{1,i}A_{2,i} & \sum_{i=1}^n A_{2,i}^2 & \cdots & \sum_{i=1}^n A_{2,i} A_{m,i}
\\
\vdots & \vdots & \ddots & \vdots
\\
\sum_{i=1}^n A_{1,i}A_{m,i} & \sum_{i=1}^n A_{2,i}A_{m,i} & \cdots & \sum_{i=1}^n A_{m,i}^2
\end{array}
\right|
\geq 0
\end{eqnarray*}
$$
- where $a_i = \begin{my-matrix}{ccc} A_{1,i} &\cdots & A_{m,i}\end{my-matrix}^T \in\reals^m$
- equality holds if and only if $\exists x\neq0\in\reals^m$, $x^Ta_i =0$ for all $1\leq i\leq n$
Cauchy-Schwarz inequality - three series of variables
-
let $m=3$
$$
\begin{eqnarray*}
&&
\begin{my-matrix}{ccc}
\sum a_{i}^2 & \sum a_{i} b_{i} & \sum a_{i} c_{i}
\\
\sum a_{i}b_{i} & \sum b_{i}^2 & \sum b_{i} c_{i}
\\
\sum a_{i}c_{i} & \sum b_{i}c_{i} & \sum c_{i}^2
\end{my-matrix}
\succeq 0
\\
&\Rightarrow&
\sum a_i^2 \sum b_i^2 \sum c_i^2 + 2 \sum a_ib_i \sum b_ic_i \sum c_ia_i
\\
&&
\geq \sum a_i^2 \left(\sum b_i c_i\right)^2 + \sum b_i^2 \left(\sum a_i c_i\right)^2 + \sum c_i^2 \left(\sum a_i b_i\right)^2
\end{eqnarray*}
$$
- equality holds if and only if $\exists (x,y,z)\neq(0,0,0)$, $xa_i + yb_i + zc_i=0$ for all $1\leq i\leq n$
-
questions for you
- what does this mean?
- any real-world applications?
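As a partial answer to the first question: the $m=3$ inequality says the $3$-by-$3$ Gram matrix of the series $a$, $b$, $c$ is positive semi-definite, so its determinant is nonnegative. A numeric check (random data; illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 8))    # rows are the series a, b, c
G = A @ A.T                        # Gram matrix: entries sum a_i b_i, etc.
assert np.linalg.det(G) >= -1e-12  # PSD, hence nonnegative determinant
# expanding det(G) >= 0 yields exactly the displayed inequality
```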
Cauchy-Schwarz inequality - extensions
- note that all these can be further generalized as in the generalization above
Number Theory - Queen of Mathematics
Integers
-
integers ($\integers$)
-
$\ldots -2, -1, 0, 1, 2, \ldots$
- first defined by Bertrand Russell
-
algebraic structure - commutative ring
- addition, multiplication defined, but division not defined
- addition, multiplication are associative
- multiplication distributive over addition
- addition, multiplication are commutative
-
natural numbers ($\naturals$)
- $1, 2, \ldots$
Division and prime numbers
- divisors for $n\in\naturals$ $$ \set{d\in\naturals}{ d \mbox{ divides } n} $$
-
prime numbers
- $p>1$ is prime if $1$ and $p$ are its only divisors
Fundamental theorem of arithmetic
Elementary quantities
- greatest common divisor (gcd) (of $a$ and $b$) $$ \gcd(a,b) = \max \set{d}{d\mbox{ divides both }a \mbox{ and } b} $$
- least common multiple (lcm) (of $a$ and $b$) $$ \mbox{lcm}(a,b) = \min \set{m}{\mbox{both } a \mbox{ and } b \mbox{ divide }m} $$
- $a$ and $b$ coprime, relatively prime, mutually prime $\Leftrightarrow$ $\gcd(a,b)=1$
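These quantities are available directly in Python, e.g.:

```python
import math

assert math.gcd(12, 18) == 6
assert 12 * 18 // math.gcd(12, 18) == 36  # lcm(a, b) = a*b / gcd(a, b)
assert math.gcd(9, 28) == 1               # 9 and 28 are coprime
```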
Are there infinite number of prime numbers?
- yes!
-
proof
- assume there only exist finite number of prime numbers, e.g., $p_1 < p_2 < \cdots <p_n$
- but then, $p_1 \cdot p_2 \cdots p_n + 1$ is divisible by none of $p_1,\ldots,p_n$, hence it is either itself a prime greater than $p_n$ or has a prime factor not among $p_1,\ldots,p_n$ - contradiction either way
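Note that $p_1 \cdots p_n + 1$ need not itself be prime; the point is that its prime factors are new. A small numeric illustration:

```python
primes = [2, 3, 5, 7, 11, 13]
N = 1
for p in primes:
    N *= p
N += 1                                   # N = 30031
assert all(N % p != 0 for p in primes)   # no listed prime divides N
assert N == 59 * 509                     # its prime factors are new primes
```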
Integers modulo $n$
-
$a\equiv b\Mod{n}$ and $c\equiv d\Mod{n}$ imply
- $a+c\equiv b+d \Mod{n}$
- $ac\equiv bd \Mod{n}$
Euler's theorem
- e.g., $\varphi(12) = \varphi(2^2\cdot 3^1) = 1\cdot2^1\cdot 2\cdot3^0 = 4$, $\varphi(10) = \varphi(2^1\cdot5^1) = 1\cdot2^0\cdot 4\cdot 5^0 =4$
- e.g., $5^4 \equiv 1 \Mod{12}$ whereas $4^4 \equiv 4 \neq 1 \Mod{12}$
- Euler's theorem underlies RSA cryptosystem, which is pervasively used in internet communication
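A quick check of the examples above using Python's built-in modular exponentiation:

```python
assert pow(5, 4, 12) == 1   # phi(12) = 4 and gcd(5, 12) = 1
assert pow(4, 4, 12) == 4   # 4**4 = 4 (mod 12): Euler needs gcd(a, n) = 1
assert pow(7, 4, 10) == 1   # phi(10) = 4 and gcd(7, 10) = 1
```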
Abstract Algebra
Why Abstract Algebra?
Why abstract algebra?
- it's fun!
- can understand intrinsic structures of algebraic objects
-
allow us to solve extremely practical problems
(depending on your definition of practicality)
- e.g., can prove why root formulas (in radicals) for polynomials of degree $n\geq 5$ do not exist
-
prepare us for pursuing further math topics such as
- differential geometry
- algebraic geometry
- analysis
- representation theory
- algebraic number theory
Some history
- by the way, historically, it is often the case that an application of an idea is presented before the idea is extracted and presented in its own right
- e.g., Galois used “quotient group” only implicitly in his 1830s investigation, and it had to wait until 1889 to be explicitly presented as “abstract quotient group” by Hölder
Groups
Monoids
- when $(\forall x, y, z \in S)((xy)z = x(yz))$, composition is said to be associative
- $e\in S$ such that $(\forall x\in S)(ex = xe = x)$, called unit element - always unique: for any two unit elements $e$ and $f$, $e = ef = f$, hence $e=f$
- monoid $M$ with $\left( \forall x, y \in M \right) \left( xy = yx \right)$, called commutative or abelian monoid
- subset $H\subset M$ which has the unit element $e$ and is itself monoid, called submonoid
Groups
- for $x\in G$, $y\in G$ with $xy=yx=e$, called inverse of $x$
- group derived from commutative monoid, called abelian group or commutative group
- group $G$ with $|G|<\infty$, called finite group
- (similarly as submonoid) $H\subset G$ that has unit element and is itself group, called subgroup
- subgroup consisting only of unit element, called trivial
Cyclic groups, generators, and direct products
Homomorphism and isomorphism
- group homomorphism $f:G\to G'$ is defined similarly as monoid-homomorphism
- homomorphism $f:G\to G'$ for which exists $g:G'\to G$ such that $f\circ g:G'\to G'$ and $g\circ f:G\to G$ are identity mappings, called isomorphism, sometimes denoted by $G\isomorph G'$
- homomorphism of $G$ into itself, called endomorphism
- isomorphism of $G$ onto itself, called automorphism
- set of all automorphisms of $G$ is itself group, denoted by $\aut{G}$
Kernel, image, and embedding of homomorphism
- for group-homomorphism $f:G\to G'$, $f(G)\subset G'$ is subgroup of $G'$
- homomorphism whose kernel is trivial is injective, often denoted by special arrow $$ f:G \injhomeo G' $$
- surjective homomorphism whose kernel is trivial is isomorphism
- for group $G$, its generators $S$, and another group $G'$, map $f:S\to G'$ has at most one extension to homomorphism of $G$ into $G'$
Orthogonal subgroups
Cosets of groups
- for $a\in G$, $x\mapsto ax$ induces bijection of $H$ onto $aH$, hence all left cosets have same cardinality
- $aH \cap bH \neq \emptyset$ for $a,b\in G$ implies $aH=bH$
- hence, $G$ is disjoint union of left cosets of $H$
- same statements can be made for right cosets
Indices and orders of groups
hence, if $(G:1)<\infty$, both $(G:H)$ and $(H:1)$ divide $(G:1)$
Normal subgroup
- set of cosets $\set{xH}{x\in G}$ with law of composition defined by $(xH)(yH) = (xy)H,$ forms group with unit element $H$, denoted by $G/H$, called factor group of $G$ by $H$, read $G$ modulo $H$ or $G$ mod $H$
- $x \mapsto xH$ induces homomorphism of $G$ onto $\set{xH}{x\in G}$, called canonical map, kernel of which is $H$
- kernel of (every) homomorphism of $G$ is normal subgroup of $G$
- for family of normal subgroups of $G$, $\seq{N_\lambda}$, $\bigcap N_\lambda$ is also normal subgroup
- every subgroup of abelian group is normal
- factor group of abelian group is abelian
- factor group of cyclic group is cyclic
Normalizers and centralizers
- e.g., $A \mapsto \det A$ of multiplicative group of nonsingular matrices in $\reals^{n\times n}$ into $\reals\sim\{0\}$ is homomorphism, kernel of which called special linear group, and (of course) is normal
Normalizers and congruence
- subgroup $H\subset G$ of group $G$ is normal subgroup of its normalizer $N_H$
- subgroup $K\subset G$ with $H\subset K$ where $H$ is normal in $K$ is contained in $N_H$
- for subgroup $K\subset N_H$, $KH$ is group and $H$ is normal in $KH$
- normalizer of $H$ is largest subgroup of $G$ in which $H$ is normal
Exact sequences of homomorphisms
- for normal subgroup $H\subset G$ of group $G$, sequence $H \overset{j}{\to} G \overset{\varphi}{\to} G/H$ is exact where $j$ is inclusion and $\varphi$ is canonical map
- $0 \overset{}{\to} G' \overset{f}{\to} G \overset{g}{\to} G'' \overset{}{\to} 0$ is exact if and only if $f$ injective, $g$ surjective, and $\Img f = \Ker g$
- if $H=\Ker g$ above, $0 \overset{}{\to} H \overset{}{\to} G \overset{}{\to} G/H \overset{}{\to} 0$ is exact
- more precisely, exists commutative diagram as in the figure, in which vertical mappings are isomorphisms and rows are exact
Canonical homomorphism examples
all homomorphisms described below called canonical
-
for two groups $G$ & $G'$ and homomorphism $f:G\to G'$ whose kernel is $H$,
exists unique homomorphism $f_*: G/H \to G'$ with
$$
f=f_*\circ \varphi
$$
where $\varphi:G\to G/H$ is canonical map,
and $f_*$ is injective
- $f_*$ can be defined by $xH\mapsto f(x)$
- $f_*$ said to be induced by $f$
- $f_*$ induces isomorphism $\lambda: G/H \to \Img f$
- below sequence summarizes above statements $$ G \overset{\varphi}{\to} G/H \overset{\lambda}{\to} \Img f \overset{j}{\to} G $$ where $j$ is inclusion
-
for group $G$,
subgroup $H\subset G$,
and
homomorphism $f:G\to G'$ whose kernel contains $H$,
intersection of all normal subgroups containing $H$, $N$,
which is the smallest normal subgroup containing $H$,
is contained in $\Ker f$,
i.e.,
$N\subset \Ker f$,
and exists unique homomorphism, $f_*:G/N\to G'$
such that
$$
f = f_* \circ \varphi
$$
where $\varphi:G\to G/N$ is canonical map
- $f_*$ can be defined by $xN\mapsto f(x)$
- $f_*$ said to be induced by $f$
- for normal subgroups of $G$, $H$ and $K$, with $K\subset H$, $xK \mapsto xH$ induces homomorphism of $G/K$ onto $G/H$, whose kernel is $\set{xK}{x\in H}$, thus canonical isomorphism $$ (G/K)/(H/K) \isomorph G/H $$ this can be shown in the figure where rows are exact
- for subgroup $H\subset G$ and $K\subset G$ with $H$ contained in normalizer of $K$, $H\cap K$ is normal subgroup of $H$, $HK=KH$ is subgroup of $G$, exists surjective homomorphism $$ H \to HK / K $$ with $x \mapsto xK$, whose kernel is $H\cap K$, hence canonical isomorphism $$ H/(H\cap K) \isomorph HK/K $$
- for group homomorphism $f:G\to G'$, normal subgroup of $G'$, $H'$, $$ H=f^{-1}(H')\subset G $$ as shown in the figure, $H$ is normal in $G$ and kernel of homomorphism $$ G \overset{f}{\to} G'\overset{\varphi}{\to} G'/H' $$ is $H$ where $\varphi$ is canonical map, hence we have injective homomorphism $$ \bar{f}:G/H \to G'/H' $$ again called canonical homomorphism, giving commutative diagram in the figure; if $f$ is surjective, $\bar{f}$ is isomorphism
Towers
- said to be normal if every $G_{i+1}$ is normal in $G_i$
- said to be abelian if normal and every factor group $G_i/G_{i+1}$ is abelian
- said to be cyclic if normal and every factor group $G_i/G_{i+1}$ is cyclic
- normal if $G'_i$ form normal tower
- abelian if $G'_i$ form abelian tower
- cyclic if $G'_i$ form cyclic tower
Refinement of towers and solvability of groups
- abelian tower of finite group admits cyclic refinement
- finite solvable group admits cyclic tower, whose last element is trivial subgroup
Commutators and commutator subgroups
- $G^C$ is normal in $G$
- $G/G^C$ is commutative
- $G^C$ is contained in kernel of every homomorphism of $G$ into commutative group
- commutator group is at the heart of solvability and non-solvability problems!
Simple groups
Butterfly lemma
- indeed $$ (U\cap V)/((u\cap V)(U\cap v)) \isomorph\ u(U\cap V) / u(U\cap v) \isomorph\ (U\cap V)v / (u\cap V)v $$
Equivalent towers
Schreier and Jordan-Hölder theorems
Cyclic groups
Properties of cyclic groups
- infinite cyclic group has exactly two generators; if $a$ is one, $a^{-1}$ is the other
- for cyclic group $G$ of order $n$ and generator $x$, set of generators of $G$ is $$ \set{x^m}{m \mbox{ is relatively prime to }n} $$
- for cyclic group $G$ and two generators $a$ and $b$, exists automorphism of $G$ mapping $a$ onto $b$; conversely, every automorphism maps $a$ to some generator
- for cyclic group $G$ of order $n$ and $d\in\naturals$ dividing $n$, exists unique subgroup of order $d$
- for cyclic groups $G_1$ and $G_2$ of orders $n$ and $m$ respectively with $n$ and $m$ relatively prime, $G_1\times G_2$ is cyclic group
- for non-cyclic finite abelian group $G$, exists subgroup isomorphic to $C\times C$ with $C$ cyclic with prime order
Symmetric groups and permutations
Operations of group on set
- $S$, called $G$-set
- denote $\pi(x)$ for $x\in G$ by $\pi_x$, hence homomorphism denoted by $x\mapsto \pi_x$
- obtain mapping from such operation, $G\times S \to S$, with $(x,s)\mapsto \pi_x(s)$
-
often abbreviate $\pi_x(s)$ by $xs$, with which the following two properties satisfied
- $\left( \forall x,y\in G, s\in S \right) \left( x(ys) = (xy)s \right)$
- $\left( \forall s\in S \right) \left( es = s \right)$
- conversely, for mapping $G\times S\to S$ with $(x,s)\mapsto xs$ satisfying above two properties, $s\mapsto xs$ is permutation for $x\in G$, hence $x\mapsto \pi_x$ is homomorphism of $G$ into $\perm{S}$
- thus, operation of $G$ on $S$ can be defined as mapping $G\times S\to S$ satisfying above two properties
Conjugation
- $\gamma_x$, called inner automorphism of $G$
- kernel of conjugation is center of $G$
- to avoid confusion, instead of writing $xy$ for $\gamma_x(y)$, write $$ \gamma_x(y) = xyx^{-1} = \prescript{x}{}{y} \mbox{ and } \gamma_{x^{-1}}(y) = x^{-1}yx = {y}^x $$
- for subset $A\subset G$, map $(x,A) \mapsto xAx^{-1}$ is operation of $G$ on set of subsets of $G$
- similarly for subgroups of $G$
- two subsets of $G$, $A$ and $B$ with $B= x A x^{-1}$ for some $x\in G$, said to be conjugate
Translation
-
for subgroup $H\subset G$,
$T_x(H) = xH$ is left coset
- denote set of left cosets also by $G/H$ even if $H$ is not normal
- denote set of right cosets also by $H\backslash G$
-
examples of translation
-
$G=GL(V)$, group of linear automorphism of vector space with field $F$,
for which, map $(A,v)\mapsto Av$ for $A\in G$ and $v\in V$
defines operation of $G$ on $V$
- $G$ is subgroup of group of permutations, $\perm{V}$
- for $V=F^n$, $G$ is group of nonsingular $n$-by-$n$ matrices
Isotropy
- for conjugation operation of group $G$, $G_s$ is normalizer of $s\in G$
- isotropy groups are conjugate, e.g., for $s,s'\in S$ and $y\in G$ with $ys=s'$, $$ G_{s'} = yG_s y^{-1} $$
- by definition, kernel of operation of $G$ on $S$ is $$ K = \bigcap_{s\in S} G_s \subset G $$
- operation with trivial kernel, said to be faithful
- $s\in S$ with $G_s = G$, called fixed point
Orbits of operation
- for $x,y\in G$ in same coset of $G_s$, $xs = ys$, i.e. $\left( \exists z\in G \right) \left( x,y \in zG_s \right) \Leftrightarrow xs = ys$
- hence, mapping $G/G_s \to S$ with $xG_s \mapsto xs$ is well-defined morphism of $G$-sets, thus induces bijection of $G/G_s$ onto orbit $Gs$
Orbit decomposition and class formula
- orbits are disjoint $$ S = \coprod_{\lambda \in \Lambda} Gs_\lambda $$ where $s_\lambda$ are elements of distinct orbits
Sylow subgroups
- number of fixed points of $H$ is congruent to size of $S$ modulo $p$, i.e. $$ \mbox{\# fixed points of }H \equiv |S| \Mod{p} $$
- if $H$ has exactly one fixed point, $|S| \equiv 1\Mod{p}$
- if $p$ divides $|S|$, $|S| \equiv 0\Mod{p}$
Sylow subgroups and solvability
- now can prove following
Rings
Rings
- $A$ is commutative group with respect to addition - unit element denoted by $0$
- $A$ is monoid with respect to multiplication - unit element denoted by $1$
- multiplication is distributive over addition, i.e. $$ \left( \forall x, y, z \in A \right) \left( (x+y)z = xz + yz \mbox{ \& } z(x+y) = zx + zy \right) $$
- do not assume $1\neq 0$
-
can prove, e.g.,
- $\left( \forall x \in A \right) \left( 0x = 0 \right)$ because $0x + x = 0x + 1x = (0+1)x = 1x = x$
- if $1=0$, $A=\{0\}$ because $x = 1x = 0x = 0$
- $\left( \forall x,y\in A \right) \left( (-x)y = -(xy) \right)$ because $xy + (-x)y = (x+(-x))y = 0y = 0$
More on ring
Fields
General distributivity
- general distributivity - for ring $A$, $\seq{x_i}_{i=1}^n\subset A$ and $\seq{y_i}_{i=1}^n\subset A$ $$ \left( \sum x_i \right) \left( \sum y_j \right) = \sum_i \sum_j x_iy_j $$
Ring examples
-
for set $S$ and ring $A$,
set of all mappings of $S$ into $A$ $\Map(S,A)$
whose addition and multiplication are defined as below,
is ring
$$
\begin{eqnarray*}
&
\left(
\forall f,g\in \Map(S,A)
\right)
\left(
\forall x\in S
\right)
\left(
(f+g)(x) = f(x)+g(x)
\right)
&
\\
&
\left(
\forall f,g\in \Map(S,A)
\right)
\left(
\forall x\in S
\right)
\left(
(fg)(x) = f(x)g(x)
\right)
&
\end{eqnarray*}
$$
- additive and multiplicative unit elements of $\Map(S,A)$ are constant maps whose values are additive and multiplicative unit elements of $A$ respectively
- $\Map(S,A)$ is commutative if and only if $A$ is commutative
- for set $S$, $\Map(S,\reals)$ is commutative ring
-
for abelian group $M$,
set $\End(M)$ of group homomorphisms of $M$ into itself
is ring with pointwise addition and mapping composition as multiplication
- additive and multiplicative unit elements of $\End(M)$ are constant map whose value is the unit element of $M$ and identity mapping respectively
- not commutative in general
- for ring $A$, set $A[X]$ of polynomials over $A$ is ring
-
for field $K$,
$K^{n\times n}$,
i.e.,
set of $n$-by-$n$ matrices with components in $K$,
is ring
- $\left(K^{n\times n}\right)^\ast$, i.e., multiplicative group of units of $K^{n\times n}$, consists of non-singular matrices, i.e., those whose determinants are nonzero
Group ring
- $\sum_{xy=z} a_xb_y$ above defines what is called convolution product
Convolution product
- one may restrict this definition to functions which are $0$ except at finite number of elements
-
for $f,g\in L^1(\reals)$, can define convolution product $f\ast g$ by
$$
(f\ast g) (x) = \int_{\reals} f(x-y)g(y)dy
$$
- satisfies all axioms of ring except that there is no unit element
- commutative (essentially because $\reals$ is commutative)
- more generally, for locally compact group $G$ with Haar measure $\mu$, can define convolution product by $$ (f\ast g) (x) = \int_{G} f(xy^{-1})g(y)d\mu(y) $$
Ideals of ring
- for ring $A$, $(0)$ and $A$ itself are ideals
- $a$, said to be generator of $\ideal{a}=Aa$ (over $A$)
Principal rings
- $\integers$ (set of integers) is principal ring
- $k[X]$ (ring of polynomials) for field $k$ is principal ring
-
ring of algebraic integers in number field $K$
is not necessarily principal
- let $\ideal{p}$ be prime ideal, let $R_\ideal{p}$ be ring of all elements $a/b$ with $a,b\in R$ and $b\not\in\ideal{p}$, then $R_\ideal{p}$ is principal, with one prime ideal $\ideal{m}_\ideal{p}$ consisting of all elements $a/b$ as above but with $a\in\ideal{p}$
-
let $A$
be set of entire functions on complex plane,
then $A$ is commutative ring,
and every finitely generated ideal is principal
- given discrete set of complex numbers $\{z_i\}$ and nonnegative integers $\{m_i\}$, exists entire function $f$ having zeros at $z_i$ of multiplicity $m_i$ and no other zeros
- every principal ideal is of form $Af$ for some such $f$
- group of units $A^\ast$ in $A$ consists of functions having no zeros
Ideals as both additive and multiplicative monoids
-
ideals form additive monoid
- for left ideals $\ideal{a}$, $\ideal{b}$, $\ideal{c}$ of ring $A$, $\ideal{a}+\ideal{b}$ is left ideal, $(\ideal{a}+\ideal{b})+\ideal{c} =\ideal{a}+(\ideal{b}+\ideal{c})$, hence form additive monoid with $(0)$ as the unit element
- similarly for right ideals & two-sided ideals
-
ideals form multiplicative monoid
- for left ideals $\ideal{a}$, $\ideal{b}$, $\ideal{c}$ of ring $A$, define $\ideal{a}\ideal{b}$ as $$ \ideal{a}\ideal{b} = \bigcup_{n=1}^\infty \bigsetl{\sum_{i=1}^n x_i y_i}{x_i \in \ideal{a},y_i\in \ideal{b}} $$ then $\ideal{a}\ideal{b}$ is also left ideal, $(\ideal{a}\ideal{b})\ideal{c} =\ideal{a}(\ideal{b}\ideal{c})$, hence form multiplicative monoid with $A$ itself as the unit element; for this reason, this unit element $A$, i.e., the ring itself, often written as $(1)$
- similarly for right ideals & two-sided ideals
- ideal multiplication is also distributive over addition
- however, set of ideals does not form ring (because the additive monoid is not group)
Generators of ideal
- above equal to smallest ideal containing $a_1,\ldots,a_n$, i.e., intersection of all ideals containing $a_1,\ldots,a_n$ $$ \bigcap_{\ideal{a}\ni a_1,\ldots, a_n} \ideal{a} $$ - just like set ($\sigma$-)algebras in set theory
Entire rings
Ring-homomorphism
- kernel of ring-homomorphism $f:A\to B$ is ideal of $A$
- conversely, for ideal $\ideal{a}$, can construct factor ring $A/\ideal{a}$
- simply say “homomorphism” if reference to ring is clear
Factor ring and canonical map
-
for ring $A$ and ideal $\ideal{a}$
- for subset $S\subset \ideal{a}$, write $S \equiv 0 \Mod{\ideal{a}}$
- for $x,y\in A$, if $x-y\in\ideal{a}$, write $x \equiv y \Mod{\ideal{a}}$
- if $\ideal{a} = (a)$ for $a\in A$, for $x,y\in A$, if $x-y\in\ideal{a}$, write $x \equiv y \Mod{a}$
Factor ring induced ring-homeomorphism
- canonical ring map $f:A\to A/\ideal{a}$ is universal in category of homomorphisms whose kernel contains $\ideal{a}$
Prime ideal and maximal ideal
- equivalently, ideal $\ideal{p}\neq A$ is prime if and only if $\left( \forall x,y \in A \right) \left( xy \in \ideal{p} \Rightarrow x \in \ideal{p} \mbox{ or } y \in \ideal{p} \right)$
- every maximal ideal is prime
- every ideal is contained in some maximal ideal
- ideal $\{0\}$ is prime if and only if $A$ is entire
- ideal $\ideal{m}$ is maximal if and only if $A/\ideal{m}$ is field
- inverse image of prime ideal under homomorphism of commutative rings is prime
Embedding of ring
- indeed, for bijective ring-homomorphism $f:A\to B$, exists set-theoretic inverse $g:B\to A$ of $f$, which is ring-homomorphism
Characteristic of ring
-
for ring $A$,
consider ring-homomorphism
$$
\lambda:\integers \to A
$$
such that
$$
\lambda(n) = ne
$$
where $e$ is multiplicative unit element of $A$
- kernel of $\lambda$ is ideal $(n)$ for some $n\geq0$, i.e., ideal generated by some nonnegative integer $n$
- hence, canonical injective ring-homomorphism $\integers/n\integers \to A$, which is ring-isomorphism between $\integers/n\integers$ and subring of $A$
- when $n\integers$ is prime ideal, exist two cases; either $n=0$ or $n=p$ for prime number $p$
Prime fields and prime rings
- field $K$ has characteristic $0$ or $p$ for prime number $p$
-
$K$ contains as subfield (isomorphic image of)
- $\rationals$ if characteristic is $0$
- $\primefield{p}$ if characteristic is $p$
$\integers/n\integers$
- $\integers$ is ring
- every ideal of $\integers$ is principal, i.e., either $\{0\}$ or $n\integers$ for some $n\in\naturals$
-
ideal of $\integers$ is prime if and only if it is $\{0\}$ or $p\integers$ for some prime number $p\in\naturals$
- $p\integers$ is maximal ideal
- $\integers/p\integers$ for prime $p$ is field and denoted by $\primefield{p}$
Euler phi-function
Chinese remainder theorem
Isomorphism of endomorphisms of cyclic groups
- for cyclic group $A$ of order $n$, ring isomorphism $$ \integers/n\integers \isomorph \End(A) $$
- group isomorphism $$ (\integers/n\integers)^\ast \isomorph \Aut(A) $$
- e.g., for group of $n$-th roots of unity in $\complexes$, all automorphisms are given by $$ \xi \mapsto \xi^k $$ for $k\in(\integers/n\integers)^\ast$
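A small sanity check of this correspondence for $n=12$, modeling the cyclic group additively as $\integers/12\integers$ (an illustrative sketch):

```python
from math import gcd

n = 12
units = [k for k in range(1, n) if gcd(k, n) == 1]
assert units == [1, 5, 7, 11]   # (Z/12Z)^*, i.e., phi(12) = 4 elements
for k in units:
    # x -> k*x (mod n) permutes Z/nZ, hence is an automorphism
    assert sorted(k * x % n for x in range(n)) == list(range(n))
```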
Irreducibility and factorial rings
Greatest common divisor
Polynomials
Why (ring of) polynomials?
- lays groundwork for polynomials in general
-
need polynomials over arbitrary rings for diverse purposes
- polynomials over finite field which cannot be identified with polynomial functions in that field
- polynomials with integer coefficients; reduce them mod $p$ for prime $p$
- polynomials over arbitrary commutative rings
- rings of polynomial differential operators for algebraic geometry & analysis
- e.g., ring learning with errors (RLWE) for cryptographic algorithms
Ring of polynomials
- exist many ways to define polynomials over commutative ring; here's one
- for every $a\in A$, define function which has value $a$ on $X^r$, and value $0$ for every other element of $S$, by $aX^r$
- then, a polynomial can be uniquely written as $$ f(X) = a_0X^0 + \cdots + a_nX^n $$ for some $n\in\integers_+$, $a_i\in A$
- $a_i$, called coefficients of $f$
Polynomial functions
- hence, for $x\in B$, subring $A[x]$ of $B$ generated by $x$ over $A$ is ring of all polynomial values $f(x)$ for $f\in A[X]$
- in particular, $X$ is variable over $A$
Polynomial examples
-
consider $\alpha=\sqrt{2}$ and $\bigset{a+b\alpha}{a,b\in\integers}$,
subring $\integers[\alpha]\subset \reals$
generated by $\alpha$ over $\integers$.
- $\alpha$ is not transcendental because $f(\alpha)=0$ for $f(X)=X^2-2$
- hence evaluation map of $\integers[X]$ into $\integers[\alpha]$ has nonzero kernel, hence is not injective, hence not isomorphism
- indeed $$ \integers[\alpha] = \bigset{a+b\alpha}{a,b\in\integers} $$
-
consider $\primefield{p}$ for prime number $p$
- $f(X) = X^p - X\in \primefield{p}[X]$ is not zero polynomial, but because $x^{p-1} \equiv 1$ for every nonzero $x\in\primefield{p}$ by Euler's theorem, $x^p\equiv x$ for every $x\in\primefield{p}$, thus for polynomial function, $f_{\primefield{p}}$, $f_{\primefield{p}}(x)=0$ for every $x$ in $\primefield{p}$
- i.e., non-zero polynomial induces zero polynomial function
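A one-line check of this phenomenon for $p=5$:

```python
p = 5
# f(X) = X^p - X is a nonzero polynomial over F_p, yet the induced
# polynomial function vanishes at every point of F_p
assert all((x**p - x) % p == 0 for x in range(p))
```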
Reduction map
- for homomorphism $\varphi:A\to B$ of commutative rings, exists associated homomorphism of polynomial rings $A[X]\to B[X]$ such that $$ f(X) = \sum a_i X^i \mapsto \sum \varphi(a_i) X^i = (\varphi f)(X) $$
- e.g., for complex conjugation $\varphi: \complexes \to \complexes$, homomorphism of $\complexes[X]$ into itself can be obtained by reduction map $f \mapsto \varphi f$, which is complex conjugation of polynomials with complex coefficients
Basic properties of polynomials in one variable
Constant, monic, and irreducible polynomials
Roots or zeros of polynomials
Induction of zero functions
Reduced polynomials and uniqueness
- for field $k$ with $q$ elements, polynomial in $n$ variables over $k$ can be expressed as $$ f(X_1,\ldots,X_n) = \sum a_i X_1^{\nu_{i,1}} \cdots X_n^{\nu_{i,n}} $$ for finite sequences $\seqscr{a_i}{i=1}{m}$ and $\seqscr{\nu_{i,1}}{i=1}{m}, \ldots, \seqscr{\nu_{i,n}}{i=1}{m}$ where $a_i\in k$ and $\nu_{i,j} \geq 0$
- because $X_i^q=X_i$ for any $X_i$, any $\nu_{i,j}\geq q$ can be (repeatedly) replaced by $\nu_{i,j}-(q-1)$, hence $f$ can be rewritten as $$ f(X_1,\ldots,X_n) = \sum a_i X_1^{\mu_{i,1}} \cdots X_n^{\mu_{i,n}} $$ where $0\leq \mu_{i,j} < q$ for all $i,j$
Multiplicative subgroups and $n$-th roots of unity
Algebraic closedness
- e.g., complex numbers are algebraically closed
- every field is contained in some algebraically closed field
-
for algebraically closed field $k$
- (of course) every irreducible polynomial in $k[X]$ is of degree $1$
- unique factorization of polynomial of positive degree can be written in form $$ f(X) = c \prod_{i=1}^{r} (X-\alpha_i)^{m_i} $$ with nonzero $c\in k$, distinct roots, $\alpha_1,\ldots,\alpha_r \in k$, and $m_1,\ldots,m_r \in \naturals$
Derivatives of polynomials
- for $f,g\in A[X]$ with commutative ring $A$, and $a\in A$ $$ (f+g)' = f' + g' \quad \mbox{\&} \quad (fg)' = f'g + fg' \quad \mbox{\&} \quad (af)' = af' $$
Multiple roots and multiplicity
- nonzero polynomial $f(X)\in k[X]$ in one variable over field $k$ having $a\in k$ as root can be written of form $$ f(X) = (X-a)^m g(X) $$ with some polynomial $g(X)\in k[X]$ relatively prime to $(X-a)$ (hence, $g(a)\neq0$)
Frobenius endomorphism
- homomorphism of $K$ into itself $x\mapsto x^p$ has trivial kernel, hence injective
- hence, iterating $r\geq 1$ times yields endomorphism, $x\mapsto x^{p^r}$
Roots with multiplicity $p^r$ in fields having characteristic $p$
-
for field $K$ having characteristic $p$
- $p | {p \choose \nu}$ for all $0< \nu < p$ because $p$ is prime, hence, for every $a,b\in K$ $$ (a+b)^p = a^p + b^p $$
- applying this recursively $r$ times yields $$ (a+b)^{p^r} = (a^p + b^p)^{p^{r-1}} = (a^{p^2} + b^{p^2})^{p^{r-2}} = \cdots = a^{p^r} + b^{p^r} $$ hence $$ (X-a)^{p^r} = X^{p^r} - a^{p^r} $$
- if $a,c\in K$ satisfy $a^{p^r} = c$ $$ X^{p^r} - c = X^{p^r} - a^{p^r} = (X-a)^{p^r} $$ hence, polynomial $X^{p^r}-c$ has precisely one root $a$ of multiplicity $p^r$!
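A small check of the iterated freshman's dream identity in characteristic $p$ (here $p=3$, $r=2$):

```python
p, r = 3, 2
q = p**r   # p^r = 9
# (a + b)^(p^r) = a^(p^r) + b^(p^r) holds in F_p
for a in range(p):
    for b in range(p):
        assert (a + b)**q % p == (a**q + b**q) % p
```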
Algebraic Extension
Algebraic extension
-
will show
- for polynomial over field, always exists some extension of that field where the polynomial has root
- existence of algebraic closure for every field
Extension of field
- can view $E$ as vector space over $F$
- if dimension of the vector space is finite, extension called finite extension of $F$
- if infinite, called infinite extension of $F$
Algebraic over field
- for algebraic $\alpha\neq0$, can always find such equation as above with $a_0\neq0$
-
equivalent statements to $\alpha$ being algebraic over $F$
- exists homomorphism $\varphi: F[X] \to E$ such that $$ \left(\forall x\in F\right) \left(\varphi(x) = x\right) \mbox{ \& } \varphi(X) = \alpha \mbox{ \& } \Ker \varphi \neq \{0\} $$
- exists evaluation homomorphism $\ev_\alpha: F[X] \to E$ with nonzero kernel
- in which case, $\Ker \varphi$ is principal ideal, hence generated by single element, thus exists nonzero $p(X) \in F[X]$ (with normalized leading coefficient being $1$) so that $$ F[X] / (p(X)) \isomorph F[\alpha] $$
- $F[\alpha]$ entire, hence $p(X)$ irreducible
Algebraic extensions
- converse is not true, e.g., subfield of complex numbers consisting of algebraic numbers over $\rationals$ is infinite extension of $\rationals$
Dimension of extensions
- if $\seqscr{x_i}{i\in I}{}$ is basis for $F$ over $k$, and $\seqscr{y_j}{j\in J}{}$ is basis for $E$ over $F$, $\seqscr{x_iy_j}{(i,j)\in I\times J}{}$ is basis for $E$ over $k$
Generation of field extensions
- $k(\alpha_1,\ldots, \alpha_n)$ consists of all quotients $f(\alpha_1,\ldots,\alpha_n)/g(\alpha_1,\ldots, \alpha_n)$ where $f,g\in k[X_1,\ldots,X_n]$ and $g(\alpha_1,\ldots, \alpha_n)\neq0$, i.e. $$ k(\alpha_1,\ldots,\alpha_n) = \bigset{f(\alpha_1,\ldots, \alpha_n)/g(\alpha_1,\ldots,\alpha_n)}{f,g\in k[X_1,\ldots,X_n],\ g(\alpha_1,\ldots,\alpha_n)\neq0} $$
- any field extension $E$ over $k$ is union of smallest subfields containing $\alpha_1,\ldots, \alpha_n$ where $\alpha_1,\ldots, \alpha_n$ range over finite set of elements of $E$, i.e. $$ E = \bigcup_{n\in\naturals} \bigcup_{\alpha_1, \ldots, \alpha_n \in E} k(\alpha_1,\ldots,\alpha_n) $$
Tower of fields
Algebraicness of finitely generated subfields
- indeed, $\Irr(\alpha,k,X)$ has a fortiori coefficients in $F$
- assume tower of fields $$ k \subset k(\alpha_1) \subset k(\alpha_1, \alpha_2) \subset \cdots \subset k(\alpha_1,\ldots, \alpha_n) $$ where $\alpha_i$ is algebraic over $k$
- then, $\alpha_{i+1}$ is algebraic over $k(\alpha_1,\ldots,\alpha_i)$
Compositum of subfields and lifting
- cannot define compositum if $E$ and $F$ are not embedded in common field $L$
- could define compositum of set of subfields of $L$ as smallest subfield containing subfields in the set
Lifting
- often draw diagram as in the figure
Finite generation of compositum
- refer to diagram in the figure
Distinguished classes
- for tower of fields $k\subset F\subset E$, extension $k\subset E$ is in $\classk{C}$ if and only if both $k\subset F$ and $F\subset E$ are in $\classk{C}$
- if $k\subset E$ is in $\classk{C}$, $F$ is any extension of $k$, and both $E$ and $F$ are subfields of common field, then $F\subset EF$ is in $\classk{C}$
- if $k\subset F$ and $k\subset E$ are in $\classk{C}$ and both $E$ and $F$ are subfields of common field, $k\subset EF$ is in $\classk{C}$
Both algebraic and finite extensions are distinguished
- true that finitely generated extensions form distinguished class (not necessarily algebraic extensions or finite extensions)
Field embedding and embedding extension
- assuming $F$, $E$, $\sigma$, and $\tau$ as above, if $\alpha\in E$ is root of $f\in F[X]$, then $\alpha^\tau$ is root of $f^\sigma$ for if $f(X) = \sum_{i=0}^n a_i X^i$, then $f(\alpha) = \sum_{i=0}^n a_i \alpha^i = 0$, and $0 = f(\alpha)^\tau = \sum_{i=0}^n (a_i^\tau ) (\alpha^\tau)^i = \sum_{i=0}^n a_i^\sigma (\alpha^\tau)^i = f^\sigma(\alpha^\tau)$
Embedding of field extensions
Existence of roots of irreducible polynomial
- assume $p(X) \in k[X]$ irreducible polynomial and consider canonical map, which is ring homomorphism $$ \sigma: k[X] \to k[X] / (p(X)) $$
-
consider $\Ker \restrict{\sigma}{k}$
- every kernel of ring homeomorphism is ideal, hence if nonzero $a \in \Ker \restrict{\sigma}{k}$, $1\in \Ker \restrict{\sigma}{k}$ because $a^{-1} \in \Ker \restrict{\sigma}{k}$, but $1\not\in (p(X))$
- thus, $\Ker \restrict{\sigma}{k} = \{0\}$, hence $p^\sigma\neq0$
- now for $\alpha = X^\sigma$ $$ p^\sigma(\alpha) = p^\sigma(X^\sigma) = (p(X))^\sigma = 0 $$
- thus, $\alpha$ is algebraic over $k^\sigma$, i.e., $\alpha \in k[X]^\sigma$ is root of $p^\sigma$ in $k^\sigma(\alpha)$
Existence of algebraically closed algebraic field extensions
Isomorphism between algebraically closed algebraic extensions
- thus, algebraically closed algebraic extension is determined up to isomorphism
Algebraic closure
-
examples
- complex conjugation is automorphism of $\complexes$ (the only continuous automorphism of $\complexes$ other than the identity)
- subfield of $\complexes$ consisting of all numbers which are algebraic over $\rationals$ is algebraic closure of $\rationals$, i.e., $\algclosure{\rationals}$
- $\algclosure{\rationals} \neq \complexes$
- $\algclosure{\reals} = \complexes$
- $\algclosure{\rationals}$ is countable
Splitting fields
- for field, $k$, every $f\in k[X]$ has splitting field in $\algclosure{k}$
Splitting fields for family of polynomials
- in most applications, deal with finite $\Lambda$
- becoming increasingly important to consider infinite algebraic extensions
- various proofs would not be simpler if we restricted ourselves to finite cases
Normal extensions
- every embedding of $K$ into $\algclosure{k}$ over $k$ induces automorphism
- $K$ is splitting field of family of polynomials in $k[X]$
- every irreducible polynomial of $k[X]$ which has root in $K$ splits into linear factors in $K$
-
not true that class of normal extensions is distinguished
- e.g., below tower of fields is tower of normal extensions $$ \rationals \subset \rationals(\sqrt{2}) \subset \rationals(\sqrt[4]{2}) $$
- but, extension $\rationals \subset \rationals(\sqrt[4]{2})$ is not normal because complex roots of $X^4-2$ are not in $\rationals(\sqrt[4]{2})$
Retention of normality of extensions
Separable degree of field extensions
-
for field, $F$, and its algebraic extension, $E$
- let $L$ be algebraically closed field and assume embedding, $\sigma:F\to L$
- let $L'$ be another algebraically closed field and assume another embedding, $\tau:F\to L'$ - assume as before that $L'$ is algebraic closure of $F^\tau$
- then exists isomorphism, $\lambda:L\to L'$, extending $\tau\circ \sigma^{-1}$ on $F^\sigma$
- let $S_\sigma$ & $S_\tau$ be sets of embedding extensions of $\sigma$ and $\tau$ to $E$ in $L$ and $L'$ respectively
- then $\lambda$ induces map from $S_\sigma$ into $S_\tau$ with $\tilde{\sigma} \mapsto \lambda \circ \tilde{\sigma}$ and $\lambda^{-1}$ induces inverse map from $S_\tau$ into $S_\sigma$, hence exists bijection between $S_\sigma$ and $S_\tau$, hence have same cardinality
Multiplicativity of and upper bound on separable degree of field extensions
- i.e., separable degree is at most equal to degree (i.e., dimension) of field extension
Finite separable field extensions
Arbitrary separable field extensions
Separable closure and conjugates
- smallest normal extension of $k$ containing $E$ is compositum of all conjugates of $E$ in $\algclosure{E}$
- $\alpha^{\sigma_1}, \ldots, \alpha^{\sigma_r}$ are simply distinct roots of $\Irr(\alpha, k, X)$
- smallest normal extension of $k$ containing one of these conjugates is simply $k(\alpha^{\sigma_1}, \ldots, \alpha^{\sigma_r})$
Primitive element theorem
Finite fields
Automorphisms of finite fields
- Frobenius map, $\frobmap{p}{n}$, is ring homomorphism with $\Ker \frobmap{p}{n} = \{0\}$ since $\finitefield{p}{n}$ is field, thus is injective, and surjective because $\finitefield{p}{n}$ is finite
- thus, is automorphism of $\finitefield{p}{n}$ leaving $\primefield{p}$ fixed
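- a minimal computational sketch (assumption: $\finitefield{2}{2}$ represented as pairs $(c_0,c_1)=c_0+c_1\alpha$ with $\alpha^2=\alpha+1$, as in the earlier sketch; the Frobenius map $x\mapsto x^2$ permutes the field and fixes exactly $\primefield{2}$):

```python
# Minimal check: the Frobenius map x -> x^2 is a field automorphism of F_4
# fixing the prime field {0, 1}. F_4 elements are pairs (c0, c1) = c0 + c1*a
# with a^2 = a + 1.
def mul(x, y):  # multiplication in F_4
    c0, c1, c2 = x[0] * y[0], x[0] * y[1] + x[1] * y[0], x[1] * y[1]
    return ((c0 + c2) % 2, (c1 + c2) % 2)

F4 = [(0, 0), (1, 0), (0, 1), (1, 1)]
frob = {x: mul(x, x) for x in F4}        # x -> x^2; a bijection of F_4
print(frob)                              # swaps a and a + 1, fixes 0 and 1
print([x for x in F4 if frob[x] == x])   # fixed field: {(0,0), (1,0)} = {0, 1} = F_2
```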
Galois Theory
What we will do to appreciate Galois theory
-
study
- group of automorphisms of finite (and infinite) Galois extension (at length)
- give examples, e.g., cyclotomic extensions, abelian extensions, (even) non-abelian ones
- leading into study of matrix representation of Galois group & classifications
-
have
tools to prove
- fundamental theorem of algebra
- insolvability of quintic polynomials
-
mention unsolved problems
- given finite group, does there exist Galois extension of $\rationals$ having this group as Galois group? (inverse Galois problem)
Fixed fields
-
$K^G$ is subfield of $K$ because for every $x,y\in K^G$
- $0^\sigma = 0 \Rightarrow 0\in K^G$
- $(x+y)^\sigma = x^\sigma + y^\sigma = x + y \Rightarrow x+y \in K^G$
- $(-x)^\sigma = - x^\sigma = - x \Rightarrow -x \in K^G$
- $1^\sigma = 1 \Rightarrow 1\in K^G$
- $(xy)^\sigma = x^\sigma y^\sigma = xy \Rightarrow xy\in K^G$
- $(x^{-1})^\sigma = (x^\sigma)^{-1} = x^{-1} \Rightarrow x^{-1} \in K^G$
- $0,1\in K^G$, hence $K^G$ contains prime field
Galois extensions and Galois groups
Fundamental theorem for Galois theory
- map $H \mapsto K^H$ induces bijection between set of subgroups of $G(K/k)$ & set of intermediate fields
- subgroup, $H$, of $G(K/k)$, is normal if and only if $K^H/k$ is Galois
- for normal subgroup, $H$, $\sigma\mapsto \restrict{\sigma}{K^H}$ induces isomorphism between $G(K/k)/H$ and $G(K^H/k)$
- shall prove step by step
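- as illustration, a minimal computational sketch (not from the original notes; assumption: $K=\rationals(\sqrt2,\sqrt3)$, whose Galois group over $\rationals$ is the Klein four-group) enumerating subgroups and their fixed fields:

```python
# Minimal sketch (assumptions: K = Q(sqrt2, sqrt3) is Galois over Q with group
# G = Z/2 x Z/2; an automorphism is a sign pair (s, t) acting by sqrt2 -> s*sqrt2
# and sqrt3 -> t*sqrt3). Enumerates all subgroups H and reports which of the
# generators sqrt2, sqrt3, sqrt6 = sqrt2*sqrt3 the fixed field K^H contains.
from itertools import combinations, product

G = list(product([1, -1], repeat=2))  # four automorphisms; identity is (1, 1)

def is_subgroup(H):
    return (1, 1) in H and all((a[0] * b[0], a[1] * b[1]) in H for a in H for b in H)

subgroups = [set(H) for r in (1, 2, 4) for H in combinations(G, r) if is_subgroup(H)]

# how an automorphism (s, t) scales each generator
action = {"sqrt2": lambda s, t: s, "sqrt3": lambda s, t: t, "sqrt6": lambda s, t: s * t}

for H in subgroups:
    fixed = [g for g, a in action.items() if all(a(s, t) == 1 for (s, t) in H)]
    print(f"|H| = {len(H)}, K^H generated over Q by {fixed or ['nothing: K^H = Q']}")
# five subgroups <-> five intermediate fields: K, Q(sqrt2), Q(sqrt3), Q(sqrt6), Q
```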
Galois subgroups association with intermediate fields
- $K/F$ is Galois & $K^{G(K/F)} = F$, hence, $K^G = k$
- map $$ F \mapsto G(K/F) $$ induces injection from set of intermediate fields into set of subgroups of $G$
- $F/k$ is normal extension if and only if $G(K/F)$ is normal subgroup of $G(K/k)$
- if $F/k$ is normal extension, map, $\sigma \mapsto \restrict{\sigma}{F}$, induces homomorphism of $G(K/k)$ onto $G(F/k)$ of which $G(K/F)$ is kernel, thus $$ G(F/k) \isomorph G(K/k)/G(K/F) $$
Proof for fundamental theorem for Galois theory
- finally, we prove fundamental theorem for Galois theory
-
assume $K/k$ is finite Galois extension
and $H$ is subgroup of $G(K/k)$
- $K^H$ is intermediate field, hence $K/K^H$ is Galois with $G(K/K^H) = H$, thus, every $H$ arises as Galois group over its fixed field
- thus, $H\mapsto K^H$ defines map, $\sigma$, from set of all subgroups of $G(K/k)$ into set of intermediate fields
- $\sigma$ is injective since for any two subgroups, $H$ and $H'$, of $G(K/k)$, if $K^H=K^{H'}$, then $H=G(K/K^H)=G(K/K^{H'})=H'$
- $\sigma$ is surjective since for every intermediate field, $F$, implies $K/F$ is Galois, $G(K/F)$ is subgroup of $G(K/k)$, and $K^{G(K/F)}=F$, thus, $\sigma(G(K/F)) = K^{G(K/F)}= F$
- therefore, $\sigma$ is bijection between set of all subgroups of $G(K/k)$ and set of intermediate fields
- since separable extensions are distinguished, $K^H/k$ is separable, thus $K^H/k$ is Galois if and only if $G(K/K^H) = H$ is normal
- lastly, if $K^H/k$ is Galois, $G(K^H/k) \isomorph G(K/k) / H$
Abelian and cyclic Galois extensions and groups
- if $K/k$ is abelian, $F/k$ is Galois and abelian
- if $K/k$ is cyclic, $F/k$ is Galois and cyclic
Theorems and corollaries about Galois extensions
- $KF / F$ and $K/(K\cap F)$ are Galois extensions
- map $$ \sigma \mapsto \restrict{\sigma}{K} $$ induces isomorphism between $G(KF / F)$ and $G(K/(K\cap F))$
- $K_1K_2/k$ is Galois extension
- map $$ \sigma \mapsto (\restrict{\sigma}{K_1}, \restrict{\sigma}{K_2}) $$ of $G(K_1K_2/k)$ into $G(K_1/k) \times G(K_2/k)$ is injective; if $K_1\cap K_2=k$, map is isomorphism
- $K_1\cdots K_n/k$ is Galois extension
- map $$ \sigma \mapsto (\restrict{\sigma}{K_1}, \ldots, \restrict{\sigma}{K_n}) $$ induces isomorphism of $G(K_1\cdots K_n/k)$ onto $G(K_1/k) \times \cdots \times G(K_n/k)$
- $K_1/k, \ldots, K_n/k$ are Galois extensions
- $G(K_i/k)=G_i$ for $i=1,\ldots,n$
- $K_{i+1}\cap(K_1\cdots K_i) = k$ for $i=1,\ldots,n-1$
- $K=K_1\cdots K_n$
- for two abelian Galois extensions, $K/k$ and $L/k$, $KL/k$ is abelian Galois extension
- for abelian Galois extension, $K/k$, and any extension, $E/k$, $KE/E$ is abelian Galois extension
- for abelian Galois extension, $K/k$, and intermediate field, $E$, both $K/E$ and $E/k$ are abelian Galois extensions
Solvable and radical extensions
- root of unity, or
- root of $X^n-a$ with $a\in E_i$, and $n$ prime to characteristic, or
- root of $X^p-X-a$ with $a\in E_i$ if $p$ is positive characteristic
Applications of Galois theory
Real Analysis
Set Theory
Some principles
Some definitions for functions
- terms, map and function, used interchangeably
- $X$ and $Y$, called domain of $f$ and codomain of $f$ respectively
- $\set{f(x)}{x\in X}$, called range of $f$
- for $Z\subset Y$, $f^{-1}(Z) = \set{x\in X}{f(x)\in Z}\subset X$, called preimage or inverse image of $Z$ under $f$
- for $y\in Y$, $f^{-1}(\{y\})$, called fiber of $f$ over $y$
- $f$, called injective or injection or one-to-one if $\left( \forall x\neq v \in X \right) \left( f(x) \neq f(v) \right)$
- $f$, called surjective or surjection or onto if $\left( \forall y \in Y \right) \left( \exists x \in X \right) (y=f(x))$
- $f$, called bijective or bijection if $f$ is both injective and surjective, in which case, $X$ and $Y$, said to be in one-to-one correspondence or bijective correspondence
- $g:Y\to X$, called left inverse if $g\circ f$ is identity function
- $h:Y\to X$, called right inverse if $f\circ h$ is identity function
Some properties of functions
- $f$ is injective if and only if $f$ has left inverse
- $f$ is surjective if and only if $f$ has right inverse
- hence, $f$ is bijective if and only if $f$ has both left and right inverse because if $g$ and $h$ are left and right inverses respectively, $g = g \circ (f\circ h) = (g\circ f)\circ h = h$
- if $|X|=|Y|<\infty$, $f$ is injective if and only if $f$ is surjective if and only if $f$ is bijective
Countability of sets
- set $A$ is countable if range of some function whose domain is $\naturals$
- $\naturals$, $\integers$, $\rationals$: countable
- $\reals$: not countable
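- a minimal sketch of countability of $\rationals$ (assumption, not from the original notes: the Calkin-Wilf sequence, which enumerates every positive rational exactly once):

```python
# Minimal sketch: the Calkin-Wilf successor x -> 1/(2*floor(x) - x + 1) walks
# through all positive rationals without repetition, so Q+ is the range of a
# function with domain N, i.e., countable.
from fractions import Fraction
from math import floor

x = Fraction(1, 1)
seen = []
for _ in range(10):
    seen.append(x)
    x = 1 / (2 * floor(x) - x + 1)  # successor in the Calkin-Wilf enumeration
print([str(r) for r in seen])  # 1, 1/2, 2, 1/3, 3/2, 2/3, 3, 1/4, 4/3, 3/5
```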
Limit sets
-
for sequence, $\seq{A_n}$, of subsets of $X$
- limit superior or limsup of $\seq{A_n}$, defined by $$ \limsup \seq{A_n} = \bigcap_{n=1}^\infty \bigcup_{m=n}^\infty A_m $$
- limit inferior or liminf of $\seq{A_n}$, defined by $$ \liminf \seq{A_n} = \bigcup_{n=1}^\infty \bigcap_{m=n}^\infty A_m $$
- always $$ \liminf \seq{A_n} \subset \limsup \seq{A_n} $$
- when $\liminf \seq{A_n} = \limsup \seq{A_n}$, sequence, $\seq{A_n}$, said to converge to it, denote $$ \lim \seq{A_n} = \liminf \seq{A_n} = \limsup \seq{A_n} = A $$
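- a minimal sketch of these definitions (truncated-horizon approximation; the alternating example below stabilizes, so the truncated values match the true ones):

```python
# Minimal sketch (assumption: the alternating sequence A_n = {0} for even n and
# {1} for odd n, truncated at horizon N): limsup = {0, 1}, liminf = empty set.
N = 50
A = [{0} if n % 2 == 0 else {1} for n in range(N)]

def tail_union(n):         # union over m >= n of A_m
    return set().union(*A[n:])

def tail_intersection(n):  # intersection over m >= n of A_m
    out = set(A[n])
    for s in A[n + 1:]:
        out &= s
    return out

limsup = set.intersection(*[tail_union(n) for n in range(N // 2)])
liminf = set().union(*[tail_intersection(n) for n in range(N // 2)])
print(limsup, liminf)  # {0, 1} set()
```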
Algebras of sets
-
collection $\alg$ of subsets of $X$ called algebra or Boolean algebra if
$$
(\forall A, B \in \alg) (A\cup B\in\alg)
\mbox{ and }
(\forall A \in \alg) (\compl{A}\in\alg)
$$
- $(\forall A_1, \ldots, A_n \in \alg)(\cup_{i=1}^n A_i \in \alg)$
- $(\forall A_1, \ldots, A_n \in \alg)(\cap_{i=1}^n A_i \in \alg)$
-
algebra $\alg$ called $\sigma$-algebra or Borel field if
- every union of a countable collection of sets in $\alg$ is in $\alg$, i.e., $$ (\forall \seq{A_i})(\cup_{i=1}^\infty A_i \in \alg) $$
- given sequence of sets in algebra $\alg$, $\seq{A_i}$, exists disjoint sequence, $\seq{B_i}$ such that $$ B_i \subset A_i \mbox{ and } \bigcup_{i=1}^\infty B_i = \bigcup_{i=1}^\infty A_i $$
Algebras generated by subsets
-
algebra generated by collection of subsets of $X$, $\coll$, can be found by
$$
\alg =
\bigcap \set{\algk{B}}{\algk{B} \in \collF}
$$
where $\collF$ is family of all algebras containing $\coll$
- smallest algebra $\alg$ containing $\coll$, i.e., $$ (\forall \algk{B} \in \collF)(\alg \subset \algk{B}) $$
-
$\sigma$-algebra generated by collection of subsets of $X$, $\coll$, can be found by
$$
\alg=
\bigcap \set{\algk{B}}{\algk{B} \in \collG}
$$
where $\collG$ is family of all $\sigma$-algebras containing $\coll$
- smallest $\sigma$-algebra $\alg$ containing $\coll$, i.e., $$ (\forall \algk{B} \in \collG)(\alg \subset \algk{B}) $$
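- a minimal sketch for finite $X$ (where generated algebra and generated $\sigma$-algebra coincide), closing a collection under complement and union until it stabilizes:

```python
# Minimal sketch (assumption: X finite): the smallest algebra containing C is
# obtained by repeatedly adding complements and pairwise unions; intersections
# come for free via De Morgan.
def generated_algebra(X, C):
    X = frozenset(X)
    alg = {frozenset(s) for s in C} | {frozenset(), X}
    while True:
        new = {X - s for s in alg} | {s | t for s in alg for t in alg}
        if new <= alg:
            return alg
        alg |= new

X = {1, 2, 3, 4}
A = generated_algebra(X, [{1}, {1, 2}])
print(sorted(sorted(s) for s in A))
# atoms are {1}, {2}, {3,4}; the generated algebra has 2^3 = 8 members
```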
Relation
- $x$ said to stand in relation $\rel$ to $y$, denoted by $\relxy{x}{y}$
- $\rel$ said to be relation on $X$ if $\relxy{x}{y}$ $\Rightarrow$ $x\in X$ and $y\in X$
-
$\rel$ is
- transitive if $\relxy{x}{y}$ and $\relxy{y}{z}$ $\Rightarrow$ $\relxy{x}{z}$
- symmetric if $\relxy{x}{y}$ $\Rightarrow$ $\relxy{y}{x}$
- reflexive if $\relxy{x}{x}$
- antisymmetric if $\relxy{x}{y}$ and $\relxy{y}{x}$ $\Rightarrow$ $x=y$
-
$\rel$ is
- equivalence relation if transitive, symmetric, and reflexive, e.g., congruence modulo $n$
- partial ordering if transitive and antisymmetric, e.g., “$\subset$''
-
linear (or simple) ordering if transitive, antisymmetric, and $\relxy{x}{y}$ or $\relxy{y}{x}$ for all $x,y\in X$
- e.g., “$\geq$'' linearly orders $\reals$ while “$\subset$'' does not linearly order $\powerset(X)$
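- a minimal sketch checking these properties for a finite relation given as a set of ordered pairs:

```python
# Minimal sketch (assumption: X finite, relation R as a set of pairs); tests
# the four properties from the definitions above.
def properties(X, R):
    return {
        "transitive":    all((x, z) in R for (x, y1) in R for (y2, z) in R if y1 == y2),
        "symmetric":     all((y, x) in R for (x, y) in R),
        "reflexive":     all((x, x) in R for x in X),
        "antisymmetric": all(x == y for (x, y) in R if (y, x) in R),
    }

X = {1, 2, 3}
divides = {(x, y) for x in X for y in X if y % x == 0}
print(properties(X, divides))  # transitive, reflexive, antisymmetric: a partial ordering
```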
Ordering
-
given partial order, $\prec$, $a$ is
- a first/smallest/least element if $x \neq a \Rightarrow a\prec x$
- a last/largest/greatest element if $x \neq a \Rightarrow x\prec a$
- a minimal element if $x \neq a \Rightarrow x \not\prec a$
- a maximal element if $x \neq a \Rightarrow a \not\prec x$
-
partial ordering $\prec$ is
- strict partial ordering if $x\not\prec x$
- reflexive partial ordering if $x\prec x$
-
strict linear ordering $<$ is
- well ordering for $X$ if every nonempty subset of $X$ contains a first element
Axiom of choice and equivalent principles
- also called multiplicative axiom - name preferred by Bertrand Russell for reason noted below
- no problem when $\coll$ is finite
- need axiom of choice when $\coll$ is not finite
Infinite direct product
- for $z=\seq{x_\lambda}\in\bigtimes X_\lambda$, $x_\lambda$ called $\lambda$-th coordinate of $z$
- if one of $X_\lambda$ is empty, $\bigtimes X_\lambda$ is empty
- axiom of choice is equivalent to converse, i.e., if none of $X_\lambda$ is empty, $\bigtimes X_\lambda$ is not empty
- this is why Bertrand Russell preferred multiplicative axiom over axiom of choice as name of the axiom
Real Number System
Field axioms
-
field axioms - for every $x,y,z\in\field$
- $(x+y)+z= x+(y+z)$ - additive associativity
- $(\exists 0\in\field)(\forall x\in\field)(x+0=x)$ - additive identity
- $(\forall x\in\field)(\exists w\in\field)(x+w=0)$ - additive inverse
- $x+y= y+x$ - additive commutativity
- $(xy)z= x(yz)$ - multiplicative associativity
- $(\exists 1\neq0\in\field)(\forall x\in\field)(x\cdot 1=x)$ - multiplicative identity
- $(\forall x\neq0\in\field)(\exists w\in\field)(xw=1)$ - multiplicative inverse
- $x(y+z) = xy + xz$ - distributivity
- $xy= yx$ - multiplicative commutativity
-
system (set with $+$ and $\cdot$) satisfying field axioms called field
- e.g., field of integers modulo $p$ where $p$ is prime, $\primefield{p}$
Axioms of order
-
axioms of order - subset, $\field_{++}\subset \field$, of positive (real) numbers satisfies
- $x,y\in \field_{++} \Rightarrow x+y\in \field_{++}$
- $x,y\in \field_{++} \Rightarrow xy\in \field_{++}$
- $x\in \field_{++} \Rightarrow -x\not\in \field_{++}$
- $x\in \field \Rightarrow x=0\lor x\in \field_{++} \lor -x \in \field_{++}$
-
system satisfying field axioms & axioms of order called ordered field
- e.g., set of real numbers ($\reals$), set of rational numbers ($\rationals$)
Axiom of completeness
-
completeness axiom
- every nonempty set $S$ of real numbers which has an upper bound has a least upper bound, i.e., $$ \set{l}{(\forall x\in S)(x\leq l)} $$ has least element.
- use $\sup S$ and $\inf S$ for least upper bound and greatest lower bound (when exist)
-
ordered field satisfying completeness axiom called complete ordered field
- e.g., $\reals$ (with $+$ and $\cdot$)
-
axiom of Archimedes
- given any $x\in\reals$, there is an integer $n$ such that $x<n$
-
corollary
- given any $x<y \in \reals$, exists $r\in\rationals$ such that $x < r < y$
Sequences of $\reals$
-
sequence of $\reals$ denoted by $\seq{x_i}_{i=1}^\infty$ or $\seq{x_i}$
- mapping from $\naturals$ to $\reals$
-
limit of $\seq{x_n}$ denoted by $\lim_{n\to\infty} x_n$ or $\lim x_n$ - defined by $a\in\reals$
$$
(\forall \epsilon>0)(\exists N\in\naturals) (n \geq N \Rightarrow |x_n-a|<\epsilon)
$$
- $\lim x_n$ unique if exists
- $\seq{x_n}$ called Cauchy sequence if $$ (\forall \epsilon>0)(\exists N\in\naturals) (n,m \geq N \Rightarrow |x_n-x_m|<\epsilon) $$
-
Cauchy criterion - characterizing complete metric space (including $\reals$)
- sequence converges if and only if Cauchy sequence
Other limits
- cluster point of $\seq{x_n}$ - defined by $c\in\reals$ $$ (\forall \epsilon>0, N\in\naturals)(\exists n>N)(|x_n-c|<\epsilon) $$
- limit superior or limsup of $\seq{x_n}$ $$ \limsup x_n = \inf_n \sup_{k>n} x_k $$
- limit inferior or liminf of $\seq{x_n}$ $$ \liminf x_n = \sup_n \inf_{k>n} x_k $$
- $\liminf x_n \leq \limsup x_n$
- $\seq{x_n}$ converges if and only if $\liminf x_n = \limsup x_n$ (=$\lim x_n$)
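- a minimal numeric sketch of these definitions (truncated tails; $x_n=(-1)^n(1+1/n)$):

```python
# Minimal numeric sketch (assumption: x_n = (-1)^n (1 + 1/n), which oscillates,
# with tails truncated at a finite horizon): approximates limsup = 1 and
# liminf = -1; the sequence does not converge since the two differ.
N = 2000
x = [(-1) ** n * (1 + 1 / n) for n in range(1, N)]

limsup = min(max(x[n:]) for n in range(N // 2))  # inf_n sup_{k > n} x_k, truncated
liminf = max(min(x[n:]) for n in range(N // 2))  # sup_n inf_{k > n} x_k, truncated
print(limsup, liminf)  # approximately 1.001 and -1.001
```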
Open and closed sets
-
$O$ called open if
$$
(\forall x\in O)(\exists \delta>0)(\forall y\in\reals)(|y-x|<\delta\Rightarrow y\in O)
$$
- intersection of finite collection of open sets is open
- union of any collection of open sets is open
- $\closure{E}$ called closure of $E$ if $$ (\forall x \in \closure{E} \ \&\ \delta>0)(\exists y\in E)(|x-y|<\delta) $$
-
$F$ called closed if
$$
F = \closure{F}
$$
- union of finite collection of closed sets is closed
- intersection of any collection of closed sets is closed
Open and closed sets - facts
- every open set is union of countable collection of disjoint open intervals
-
(Lindelöf) any collection $\coll$ of open sets has a countable subcollection $\seq{O_i}$ such that
$$
\bigcup_{O\in\coll} O = \bigcup_{i} O_i
$$
- equivalently, any collection $\collk{F}$ of closed sets has a countable subcollection $\seq{F_i}$ such that $$ \bigcap_{F\in\collk{F}} F = \bigcap_{i} F_i $$
Covering and Heine-Borel theorem
-
collection $\coll$ of sets called covering of $A$ if
$$
A \subset \bigcup_{O\in\coll} O
$$
- $\coll$ said to cover $A$
- $\coll$ called open covering if every $O\in\coll$ is open
- $\coll$ called finite covering if $\coll$ is finite
- Heine-Borel theorem - for any closed and bounded set, every open covering has finite subcovering
-
corollary
- any collection $\coll$ of closed sets, at least one of which is bounded, such that every finite subcollection has nonempty intersection, has nonempty intersection.
Continuous functions
- $f$ (with domain $D$) called continuous at $x$ if $$ (\forall\epsilon >0)(\exists \delta>0)(\forall y\in D)(|y-x|<\delta \Rightarrow |f(y)-f(x)|<\epsilon) $$
- $f$ called continuous on $A\subset D$ if $f$ is continuous at every point in $A$
- $f$ called uniformly continuous on $A\subset D$ if $$ (\forall\epsilon >0)(\exists \delta>0)(\forall x,y\in A)(|x-y|<\delta \Rightarrow |f(x)-f(y)|<\epsilon) $$
Continuous functions - facts
- $f$ is continuous if and only if for every open set $O$ (in co-domain), $f^{-1}(O)$ is open
- $f$ continuous on closed and bounded set is uniformly continuous
- extreme value theorem - $f$ continuous on closed and bounded set, $F$, is bounded on $F$ and assumes its maximum and minimum on $F$ $$ (\exists x_1, x_2 \in F)(\forall x\in F)(f(x_1) \leq f(x) \leq f(x_2)) $$
- intermediate value theorem - for $f$ continuous on $[a,b]$ with $f(a) \leq f(b)$, $$ (\forall d)(f(a) \leq d \leq f(b))(\exists c\in[a,b])(f(c) = d) $$
Borel sets and Borel $\sigma$-algebra
-
Borel set
- any set that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement
-
Borel algebra or Borel $\sigma$-algebra
- smallest $\sigma$-algebra containing all open sets
-
also
- smallest $\sigma$-algebra containing all closed sets
- smallest $\sigma$-algebra containing all open intervals (due to statement above)
Various Borel sets
-
countable union of closed sets (in $\reals$),
called an $F_\sigma$ ($F$ for closed & $\sigma$ for sum)
- thus, every countable set, every closed set, every open interval, every open set, is an $F_\sigma$ (note $(a,b)=\bigcup_{n=1}^\infty [a+1/n,b-1/n]$)
- countable union of sets in $F_\sigma$ again is an $F_\sigma$
-
countable intersection of open sets
called a $G_\delta$ ($G$ for open (Gebiet) & $\delta$ for Durchschnitt - intersection in German)
- complement of $F_\sigma$ is a $G_\delta$ and vice versa
- $F_\sigma$ and $G_\delta$ are simple types of Borel sets
- countable intersection of $F_\sigma$'s is $F_{\sigma\delta}$, countable union of $F_{\sigma\delta}$'s is $F_{\sigma\delta\sigma}$, countable intersection of $F_{\sigma\delta\sigma}$'s is $F_{\sigma\delta\sigma\delta}$, etc., & likewise for $G_{\delta \sigma \ldots}$
- below are all classes of Borel sets, but not every Borel set belongs to one of these classes $$ F_{\sigma}, F_{\sigma\delta}, F_{\sigma\delta\sigma}, F_{\sigma\delta\sigma\delta}, \ldots, G_{\delta}, G_{\delta\sigma}, G_{\delta\sigma\delta}, G_{\delta\sigma\delta\sigma}, \ldots, $$
Lebesgue Measure
Riemann integral
-
Riemann integral
- partition induced by sequence $\seq{x_i}_{i=1}^n$ with $a=x_1<\cdots<x_n=b$
-
lower and upper sums
- $L(f,\seq{x_i}) = \sum_{i=1}^{n-1} \inf_{x\in[x_i,x_{i+1}]} f(x) (x_{i+1}-x_{i})$
- $U(f,\seq{x_i}) = \sum_{i=1}^{n-1} \sup_{x\in[x_i,x_{i+1}]} f(x) (x_{i+1}-x_{i})$
- always holds: $L(f,\seq{x_i}) \leq U(f,\seq{y_i})$, hence $$ \sup_{\seq{x_i}} L(f,\seq{x_i}) \leq \inf_{\seq{x_i}} U(f,\seq{x_i}) $$
- Riemann integrable if $$ \sup_{\seq{x_i}} L(f,\seq{x_i}) = \inf_{\seq{x_i}} U(f,\seq{x_i}) $$
- every continuous function is Riemann integrable
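- a minimal sketch of lower and upper sums on a uniform partition (assumption: $f$ monotone, so cell infima/suprema are endpoint values):

```python
# Minimal sketch: lower and upper Riemann sums for a monotone f on a uniform
# partition of [a, b]; for f(x) = x^2 both sides squeeze to 1/3 as n grows.
def riemann_sums(f, a, b, n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = sum(min(f(xs[i]), f(xs[i + 1])) * (xs[i + 1] - xs[i]) for i in range(n))
    upper = sum(max(f(xs[i]), f(xs[i + 1])) * (xs[i + 1] - xs[i]) for i in range(n))
    return lower, upper

print(riemann_sums(lambda x: x * x, 0.0, 1.0, 1000))
# -> (0.33283..., 0.33383...): L <= 1/3 <= U, gap of order 1/n
```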
Motivation - want measure better than Riemann integrable
- consider indicator (or characteristic) function $\chi_\rationals:[0,1] \to [0,1]$ $$ \chi_\rationals(x) = \left\{\begin{array}{ll} 1 &\mbox{if } x \in \rationals \\ 0 &\mbox{if } x \not\in \rationals \end{array}\right. $$
- not Riemann integrable: $\sup_{\seq{x_i}} L(f,\seq{x_i}) = 0 \neq 1 = \inf_{\seq{x_i}} U(f,\seq{x_i})$
-
however, there are far more irrational numbers than rational numbers, hence
- want to have some integral $\int$ such that, e.g., $$ \int_{[0,1]} \chi_\rationals(x) dx = 0 \mbox{ and } \int_{[0,1]} (1-\chi_\rationals(x)) dx = 1 $$
Properties of desirable measure
-
want some measure $\mu:\subsetset{M}\to\preals=\set{x\in\reals}{x\geq0}$
- defined for every subset of $\reals$, i.e., $\subsetset{M} = \powerset(\reals)$
- equals length for open interval $$ \mu(a,b) = b-a $$
- countable additivity: for disjoint $\seq{E_i}_{i=1}^\infty$ $$ \mu(\cup E_i) = \sum \mu(E_i) $$
- translation invariant $$ \mu(E+x) = \mu(E) \mbox{ for } x\in\reals $$
- no such measure exists
- not known whether measure with first three properties exists
-
want to find translation invariant countably additive measure
- hence, give up on first property
Race won by Henri Lebesgue in 1902!
- mathematicians in 19th century struggled to solve this problem
- race won by French mathematician, Henri Léon Lebesgue in 1902!
-
Lebesgue integral covers much wider range of functions
- indeed, $\chi_\rationals$ is Lebesgue integrable $$ \int_{[0,1]} \chi_\rationals(x) dx = 0 \mbox{ and } \int_{[0,1]} (1-\chi_\rationals(x)) dx = 1 $$
Outer measure
- for $E\subset\reals$, define outer measure $\mu^\ast:\powerset(\reals)\to\preals$ $$ \mu^\ast E = \inf_{\seq{I_i}} \left\{\left.\sum l(I_i) \right| E\subset \cup I_i\right\} $$ where $I_i=(a_i,b_i)$ and $l(I_i) = b_i-a_i$
- outer measure of open interval is length $$ \mu^\ast(a_i,b_i) = b_i-a_i $$
- countable subadditivity $$ \mu^\ast\left(\cup E_i\right) \leq \sum \mu^\ast E_i $$
-
corollaries
- $\mu^\ast E = 0$ if $E$ is countable
- $[0,1]$ not countable
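- a minimal sketch of the first corollary (cover the $i$-th rational in an enumeration by an interval of length $\epsilon/2^{i+1}$, so total length $\leq\epsilon$):

```python
# Minimal sketch: a (truncated) countable cover of the rationals in [0, 1] by
# open intervals of total length <= eps, illustrating mu*(countable set) = 0.
from fractions import Fraction

def rationals01(n_max):          # rationals in [0,1] with denominator <= n_max
    seen = []
    for q in range(1, n_max + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.append(r)
    return seen

eps = 0.01
qs = rationals01(6)              # finite truncation of the full enumeration
cover = [(float(r) - eps / 2 ** (i + 2), float(r) + eps / 2 ** (i + 2))
         for i, r in enumerate(qs)]
print(sum(b - a for a, b in cover))  # total length < eps, for any eps > 0
```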
Measurable sets
- $E\subset\reals$ called measurable if for every $A\subset\reals$ $$ \mu^\ast A = \mu^\ast (E\cap A) + \mu^\ast (\compl{E}\cap {A}) $$
- if $\mu^\ast E =0$, then $E$ measurable
- every open interval $(a,b)$ with $a\geq -\infty$ and $b\leq \infty$ is measurable
- disjoint countable union of measurable sets is measurable, i.e., $\cup E_i$ is measurable
- collection of measurable sets is $\sigma$-algebra
Borel algebra is measurable
-
note
- every open set is disjoint countable union of open intervals
- disjoint countable union of measurable sets is measurable
- open intervals are measurable
- hence, every open set is measurable
-
also
- collection of measurable sets is $\sigma$-algebra
- every open set is Borel set and Borel sets form $\sigma$-algebra
- hence, Borel sets are measurable
- specifically, Borel algebra (smallest $\sigma$-algebra containing all open sets) is measurable
Lebesgue measure
- restriction of $\mu^\ast$ to collection $\subsetset{M}$ of measurable sets called Lebesgue measure $$ \mu:\subsetset{M}\to\preals $$
- countable subadditivity - for $\seq{E_n}$ $$ \mu (\cup E_n) \leq \sum \mu E_n $$
- countable additivity - for disjoint $\seq{E_n}$ $$ \mu (\cup E_n) = \sum \mu E_n $$
- for decreasing sequence of measurable sets, $\seq{E_n}$, i.e., $(\forall n\in\naturals)(E_{n+1} \subset E_n)$, with $\mu E_1 < \infty$ $$ \mu\left( \bigcap E_n \right) = \lim \mu E_n $$
(Lebesgue) measurable sets are nice ones!
- following statements are equivalent $$ \begin{eqnarray*} &-& E \mbox{ is measurable} \\ &-& (\forall \epsilon >0) (\exists \mbox{ open } O\supset E) (\mu^\ast(O\sim E)<\epsilon) \\ &-& (\forall \epsilon >0) (\exists \mbox{ closed } F\subset E) (\mu^\ast(E\sim F)<\epsilon) \\ &-& (\exists G_\delta \supset E) (\mu^\ast(G_\delta\sim E)=0) \\ &-& (\exists F_\sigma \subset E) (\mu^\ast(E\sim F_\sigma)=0) \end{eqnarray*} $$
- if $\mu^\ast E$ is finite, above statements are equivalent to $$ (\forall \epsilon>0) \left(\exists U = \bigcup_{i=1}^n (a_i,b_i) \right) (\mu^\ast (U\Delta E) < \epsilon) $$
Lebesgue measure resolves problem in motivation
- let $$ E_1 = \set{x\in[0,1]}{x\in\rationals},\ E_2 = \set{x\in[0,1]}{x\not\in\rationals} $$
- $\mu^\ast E_1=0$ because $E_1$ is countable, hence measurable and $$ \mu E_1 = \mu^\ast E_1 = 0 $$
- $\sigma$-algebra structure implies $E_2 = [0, 1] \cap \compl{E_1}$ is measurable
- countable additivity implies $\mu E_1 + \mu E_2 = \mu[0,1] = 1$, hence $$ \mu E_2 = 1 $$
Lebesgue Measurable Functions
Lebesgue measurable functions
-
for $f:X\to\reals\cup\{-\infty, \infty\}$,
i.e., extended real-valued function, the following are equivalent
- for every $a\in\reals$, $\set{x\in{X}}{f(x) < a}$ is measurable
- for every $a\in\reals$, $\set{x\in{X}}{f(x) \leq a}$ is measurable
- for every $a\in\reals$, $\set{x\in{X}}{f(x) > a}$ is measurable
- for every $a\in\reals$, $\set{x\in{X}}{f(x) \geq a}$ is measurable
-
if so,
- for every $a\in\reals\cup\{-\infty, \infty\}$, $\set{x\in{X}}{f(x) = a}$ is measurable
-
extended real-valued function, $f$, called (Lebesgue) measurable function if
- domain is measurable
- any one of above four statements holds
Properties of Lebesgue measurable functions
-
for real-valued measurable functions, $f$ and $g$, and $c\in\reals$
- $f+c$, $cf$, $f+g$, $fg$ are measurable
-
for every extended real-valued measurable function sequence, $\seq{f_n}$
- $\sup f_n$, $\limsup f_n$ are measurable
- hence, $\inf f_n$, $\liminf f_n$ are measurable
- thus, if $\lim f_n$ exists, it is measurable
Almost everywhere - a.e.
-
statement, $P(x)$, said to hold almost everywhere or a.e. if
$$
\mu \set{x}{\sim P(x)} = 0
$$
- e.g., $f$ said to be equal to $g$ a.e. if $\mu\set{x}{f(x)\neq g(x)}=0$
- e.g., $\seq{f_n}$ said to converge to $f$ a.e. if $$ (\exists E \mbox{ with } \mu E=0)(\forall x \not\in E)(\lim f_n (x) = f(x)) $$
-
facts
- if $f$ is measurable and $f=g$ a.e., then $g$ is measurable
- if measurable extended real-valued $f$ defined on $[a,b]$ with $f(x) \in\reals$ a.e., then for every $\epsilon>0$, exist step function $g$ and continuous function $h$ such that $$ \mu\set{x}{|f-g| \geq \epsilon} < \epsilon,\ \mu\set{x}{|f-h| \geq \epsilon} < \epsilon $$
Characteristic \& simple functions
-
for any $A\subset\reals$, $\chi_A$ called characteristic function if
$$
\chi_A(x) = \left\{\begin{array}{ll}
1 & x\in A\\
0 & x\not\in A\\
\end{array}\right.
$$
- $\chi_A$ is measurable if and only if $A$ is measurable
- measurable $\varphi$ called simple if for some distinct $\seq{a_i}_{i=1}^n$ $$ \varphi(x) = \sum_{i=1}^n a_i \chi_{A_i}(x) $$ where $A_i = \set{x}{\varphi(x) = a_i}$
Littlewood's three principles
- let $M(E)$ with measurable set, $E$, denote set of measurable functions defined on $E$
-
every (measurable) set of finite measure is nearly finite union of intervals, e.g.,
- $E$ with $\mu^\ast E<\infty$ is measurable if and only if $$ (\forall \epsilon>0) (\exists \{I_i: \mbox{open\ interval}\}_{i=1}^n) (\mu^\ast(E \Delta (\cup_{i=1}^n I_i)) < \epsilon) $$
-
every (measurable) function is nearly continuous, e.g.,
- (Lusin's theorem) $$ (\forall f \in M[a,b])(\forall \epsilon >0)(\exists g \in C[a,b]) (\mu\set{x}{f(x)\neq g(x)}< \epsilon) $$
- every convergent (measurable) function sequence is nearly uniformly convergent, e.g., $$ \begin{eqnarray*} && (\forall \mbox{ measurable }\seq{f_n} \mbox{ converging to } f \mbox { a.e. on } E \mbox{ with } \mu E<\infty) \\ && (\forall \epsilon>0 \mbox{ and } \delta>0) (\exists A\subset E \mbox{ with } \mu(A)<\delta \mbox{ and } N\in\naturals) \\ && (\forall n > N, x\in E\sim A)(|f_n(x)-f(x)|<\epsilon) \end{eqnarray*} $$
Egoroff's theorem
- Egoroff's theorem - provides stronger version of third principle above $$ \begin{eqnarray*} && (\forall \mbox{ measurable }\seq{f_n} \mbox{ converging to } f \mbox { a.e. on } E \mbox{ with } \mu E<\infty) \\ && (\forall \epsilon>0) (\exists A\subset E \mbox{ with } \mu(A)<\epsilon) (f_n \mbox{ uniformly converges to } f \mbox{ on } E\sim A ) \end{eqnarray*} $$
Lebesgue Integral
Integral of simple functions
- canonical representation of simple function $$ \varphi(x) = \sum_{i=1}^n a_i \chi_{A_i}(x) $$ where $a_i$ are distinct and $A_i=\set{x}{\varphi(x)=a_i}$ - note $A_i$ are disjoint
- when $\mu\set{x}{\varphi(x)\neq0}< \infty$ and $\varphi = \sum_{i=1}^n a_i \chi_{A_i}$ is canonical representation, define integral of $\varphi$ by $$ \int \varphi = \int \varphi (x) dx= \sum_{i=1}^n a_i \mu A_i $$
- when $E$ is measurable, define $$ \int_E \varphi = \int \varphi \chi_E $$
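- a minimal sketch of this definition (assumption: each $A_i$ is a finite disjoint union of intervals, so $\mu A_i$ is just the total length):

```python
# Minimal sketch: integral of a simple function from its canonical
# representation, integral = sum a_i * mu(A_i).
def measure(intervals):              # mu of a finite disjoint union of intervals
    return sum(b - a for a, b in intervals)

def integral_simple(pairs):          # pairs: list of (a_i, A_i)
    return sum(a * measure(A) for a, A in pairs)

# phi = 3 on [0, 1/4) U [1/2, 3/4), and -1 on [1/4, 1/2)
phi = [(3, [(0.0, 0.25), (0.5, 0.75)]), (-1, [(0.25, 0.5)])]
print(integral_simple(phi))  # 3*0.5 + (-1)*0.25 = 1.25
```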
Properties of integral of simple functions
- for simple functions $\varphi$ and $\psi$ that vanish outside set of finite measure, i.e., $\mu\set{x}{\varphi(x)\neq0}<\infty$, $\mu\set{x}{\psi(x)\neq0}<\infty$, and for every $a,b\in\reals$ $$ \int (a\varphi + b\psi) = a \int\varphi + b \int\psi $$
- thus, even for simple function, $\varphi = \sum_{i=1}^n a_i \chi_{A_i}$, that vanishes outside set of finite measure, not necessarily in canonical representation, $$ \int \varphi = \sum_{i=1}^n a_i \mu A_i $$
- if $\varphi \geq \psi$ a.e. $$ \int \varphi \geq \int \psi $$
Lebesgue integral of bounded functions
-
for bounded function, $f$, and measurable set, $E$, with $\mu E<\infty$,
$$
\sup_{\varphi:\ \mathrm{simple},\ \varphi \leq f} \int_E \varphi
\leq
\inf_{\psi:\ \mathrm{simple},\ f \leq \psi} \int_E \psi
$$
- if $f$ is defined on $E$, $f$ is measurable function if and only if $$ \sup_{\varphi:\ \mathrm{simple},\ \varphi \leq f} \int_E \varphi = \inf_{\psi:\ \mathrm{simple},\ f \leq \psi} \int_E \psi $$
- for bounded measurable function, $f$, defined on measurable set, $E$, with $\mu E < \infty$, define (Lebesgue) integral of $f$ over $E$ $$ \int_E f(x) dx = \sup_{\varphi:\ \mathrm{simple},\ \varphi \leq f} \int_E \varphi = \inf_{\psi:\ \mathrm{simple},\ f \leq \psi} \int_E \psi $$
Properties of Lebesgue integral of bounded functions
-
for bounded measurable functions, $f$ and $g$, defined on $E$ with finite measure
- for every $a,b\in\reals$ $$ \int_E (af+bg) = a \int_E f + b\int_E g $$
- if $f\leq g$ a.e. $$ \int_E f \leq \int_E g $$
- for disjoint measurable sets, $A,B\subset E$, $$ \int_{A\cup B} f = \int_A f + \int_B f $$
- hence, $$ \left|\int_E f \right| \leq \int_E |f| \mbox{ \& } f=g \mbox{ a.e. } \Rightarrow \int_E f = \int_E g $$
Lebesgue integral of bounded functions over finite interval
- if bounded function, $f$, defined on $[a,b]$ is Riemann integrable, then $f$ is measurable and $$ \int_{[a,b]} f = R \int_a^b f(x) dx $$ where $R\int$ denotes Riemann integral
- bounded function, $f$, defined on $[a,b]$ is Riemann integrable if and only if set of points where $f$ is discontinuous has measure zero
- (bounded convergence theorem) for sequence of measurable functions, $\seq{f_n}$, defined on measurable $E$ with finite measure, and $M>0$, if $|f_n|<M$ for every $n$ and $f(x) = \lim f_n(x)$ for every $x\in E$ $$ \int_E f = \lim \int_E f_n $$
Lebesgue integral of nonnegative functions
- for nonnegative measurable function, $f$, defined on measurable set, $E$, define $$ \int_E f = \sup_{h:\ \mathrm{bounded\ measurable\ function},\ \mu\set{x}{h(x)\neq0}<\infty,\ h\leq f} \int_E h $$
-
for nonnegative measurable functions, $f$ and $g$
- for every $a,b\geq0$ $$ \int_E (af + bg) = a\int_E f + b\int_E g $$
- if $f\geq g$ a.e. $$ \int_E f \geq \int_E g $$
-
thus,
- for every $c>0$ $$ \int_E cf = c\int_E f $$
Fatou's lemma and monotone convergence theorem for Lebesgue integral
-
Fatou's lemma -
for nonnegative measurable function sequence, $\seq{f_n}$,
with $\lim f_n = f$ a.e. on measurable set, $E$
$$
\int_E f \leq \liminf \int_E f_n
$$
- note $\lim f_n$ is measurable (see above), hence $f$ is measurable
- monotone convergence theorem - for nonnegative increasing measurable function sequence, $\seq{f_n}$, with $\lim f_n = f$ a.e. on measurable set, $E$ $$ \int_E f = \lim \int_E f_n $$
- for nonnegative measurable function, $f$, and sequence of disjoint measurable sets, $\seq{E_i}$, $$ \int_{\cup E_i} f = \sum \int_{E_i} f $$
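-
standard example (not in the original notes) showing that Fatou's inequality can be strict
- $f_n = n\chi_{(0,1/n)}$ on $[0,1]$: $\lim f_n = 0$ a.e., yet $$ \int_{[0,1]} \lim f_n = 0 < 1 = \liminf \int_{[0,1]} f_n $$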
Lebesgue integrability of nonnegative functions
- nonnegative measurable function, $f$, said to be integrable over measurable set, $E$, if $$ \int_E f < \infty $$
- for nonnegative measurable functions, $f$ and $g$, if $f$ is integrable on measurable set, $E$, and $g\leq f$ a.e. on $E$, then $g$ is integrable and $$ \int_E (f-g) = \int_E f - \int_E g $$
- for nonnegative integrable function, $f$, defined on measurable set, $E$, and every $\epsilon>0$, exists $\delta >0$ such that for every measurable set $A\subset E$ with $\mu A< \delta$ (then $f$ is integrable on $A$, of course), $$ \int_A f < \epsilon $$
Lebesgue integral
- for (any) function, $f$, define $f^+$ and $f^-$ such that for every $x$ $$ \begin{eqnarray*} f^+(x) &=& \max\{f(x), 0\} \\ f^-(x) &=& \max\{-f(x), 0\} \end{eqnarray*} $$
- note $f = f^+ - f^-,\ |f| = f^+ + f^-,\ f^- = (-f)^+$
- measurable function, $f$, said to be (Lebesgue) integrable over measurable set, $E$, if (nonnegative measurable) functions, $f^+$ and $f^-$, are integrable, in which case define $$ \int_E f = \int_E f^+ - \int_E f^- $$
Properties of Lebesgue integral
-
for $f$ and $g$ integrable on measurable set, $E$, and $a,b\in\reals$
- $af+bg$ is integrable and $$ \int_E (af+bg) = a \int_E f + b\int_E g $$
- if $f\geq g$ a.e. on $E$, $$ \int_E f \geq \int_E g $$
- for disjoint measurable sets, $A,B\subset E$ $$ \int_{A\cup B} f = \int_A f + \int_B f $$
Lebesgue convergence theorem (for Lebesgue integral)
- Lebesgue convergence theorem - for measurable $g$ integrable on measurable set, $E$, and measurable sequence $\seq{f_n}$ converging to $f$ with $|f_n|\leq g$ a.e. on $E$: $f$ is measurable, every $f_n$ is integrable, and $$ \int_E f = \lim \int_E f_n $$
Generalization of Lebesgue convergence theorem (for Lebesgue integral)
- generalization of Lebesgue convergence theorem - for sequence of functions, $\seq{g_n}$, integrable on measurable set, $E$, converging to integrable $g$ a.e. on $E$, and sequence of measurable functions, $\seq{f_n}$, converging to $f$ a.e. on $E$ with $|f_n|\leq g_n$ a.e. on $E$, if $$ \int_E g = \lim \int_E g_n $$ then $f$ is integrable and $$ \int_E f = \lim \int_E f_n $$
Comments on convergence theorems
- Fatou's lemma, monotone convergence theorem, and Lebesgue convergence theorem all state that, under suitable conditions, we can say something about $$ \int \lim f_n $$ in terms of $$ \lim \int f_n $$
- Fatou's lemma requires weaker condition than Lebesgue convergence theorem, i.e., only requires “bounded below'' whereas Lebesgue convergence theorem also requires “bounded above'' $$ \int \lim f_n \leq \liminf \int f_n $$
-
monotone convergence theorem is somewhat between the two;
- advantage - applicable even when $f$ not integrable
- Fatou's lemma and monotone convergence theorem very close in sense that can be derived from each other using only facts of positivity and linearity of integral
Convergence in measure
- $\seq{f_n}$ of measurable functions said to converge $f$ in measure if $$ (\forall \epsilon>0) (\exists N\in\naturals) (\forall n > N) (\mu\set{x}{|f_n-f|>\epsilon} < \epsilon) $$
- thus, Littlewood's third principle above implies $$ (\forall \seq{f_n} \mbox{ converging to } f \mbox { a.e. on } E \mbox{ with } \mu E<\infty) (f_n \mbox{ converge in measure to }f) $$
-
however, the converse is not true, i.e.,
exists $\seq{f_n}$ converging in measure to $f$ that does not converge to $f$ a.e.
- e.g., typewriter sequence on $[0,1]$: for $n = 2^k + j$ with $0\leq j<2^k$, $f_n = \chi_{[j2^{-k}, (j+1)2^{-k}]}$ converges in measure to $0$, but $\seq{f_n(x)}$ converges for no $x$
- Fatou's lemma, monotone convergence theorem, and Lebesgue convergence theorem remain valid even when “convergence a.e.'' replaced by “convergence in measure''
Conditions for convergence in measure
Space Overview
Diagrams for relations among various spaces
-
note from the figure
- need metric to even speak of completeness (not purely topological notion)
- metric spaces can be induced from normed spaces
Classical Banach Spaces
Normed linear space
- $X$ called linear space if $$ (\forall x, y \in X, a, b \in \reals)(ax + by \in X) $$
-
linear space, $X$, called normed space with associated norm $\|\cdot\|: X \to \preals$ if
- $$ (\forall x\in X)(\|x\|=0 \Rightarrow x \equiv 0) $$
- $$ (\forall x \in X, a \in \reals)(\|ax\| = |a|\|x\|) $$
- subadditivity $$ (\forall x,y\in X)(\|x+y\| \leq \|x\| + \|y\|) $$
$L^p$ spaces
- $L^p = L^p[0,1]$ denotes space of (Lebesgue) measurable functions such that $$ \int_{[0,1]} |f|^p < \infty $$
- define $\|\cdot\|:L^p\to\preals$ $$ \|f\| = \|f\|_p = \left(\int_{[0,1]} |f|^p\right)^{1/p} $$
-
$L^p$ are linear normed spaces with norm $\|\cdot\|_p$ when $p\geq 1$ because
- $|f(x)+g(x)|^p \leq 2^p(|f(x)|^p + |g(x)|^p)$ implies $(\forall f, g\in L^p)(f+g \in L^p)$
- $|a f(x)|^p = |a|^p|f(x)|^p$ implies $(\forall f\in L^p, a \in \reals)(af \in L^p)$
- $\|f\|=0\Rightarrow f=0\mbox{ a.e.}$
- $\|a f\| = |a|\|f\|$
- $\|f+g\|\leq \|f\|+\|g\|$ (Minkowski inequality)
$L^\infty$ space
- $L^\infty = L^\infty[0,1]$ denotes space of measurable functions bounded a.e.
-
$L^\infty$ is linear normed space with norm
$$
\|f\| = \|f\|_\infty = \mathrm{ess\ sup} |f|
= \inf_{g: g=f \ \mathrm{a.e}} \sup_{x\in[0,1]} |g(x)|
$$
- thus $$ \|f\|_\infty = \inf\set{M}{\mu\set{x}{|f(x)|>M}=0} $$
Inequalities in $L^p$ spaces
-
Minkowski inequality - for $p\in[1,\infty]$
$$
(\forall f,g\in L^p)(\|f+g\|_p \leq \|f\|_p + \|g\|_p)
$$
- if $p\in(1,\infty)$, equality holds if and only if $(\exists a,b\geq 0 \mbox{ with } ab\neq0)(af = bg \mbox{ a.e.})$
- Minkowski inequality for $0<p<1$: $$ (\forall f,g\in L^p)(f,g\geq0 \mbox{ a.e.} \Rightarrow \|f+g\|_p \geq \|f\|_p + \|g\|_p) $$
-
Hölder's inequality - for $p,q\in[1,\infty]$ with $1/p+1/q=1$
$$
(\forall f\in L^p, g\in L^q)
\left(fg \in L^1 \mbox{ and } \int_{[0,1]} |fg| \leq \|f\|_p \|g\|_q\right)
$$
- equality holds if and only if $(\exists a,b\geq 0 \mbox{ with } ab\neq0)(a|f|^p = b|g|^q \mbox{ a.e.})$
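- a minimal numeric sketch of Hölder's inequality (Riemann-style discretization of $[0,1]$; $p=3$, $q=3/2$; sample $f$, $g$ chosen arbitrarily):

```python
# Minimal numeric sketch: check int |fg| <= ||f||_p * ||g||_q on a uniform grid
# of [0, 1] for conjugate exponents p = 3, q = 3/2 (1/p + 1/q = 1).
N = 100_000
p, q = 3.0, 1.5
xs = [(i + 0.5) / N for i in range(N)]
f = [x ** 0.5 for x in xs]
g = [1.0 + x for x in xs]

def lp_norm(h, r):
    return (sum(abs(v) ** r for v in h) / N) ** (1 / r)

lhs = sum(abs(a * b) for a, b in zip(f, g)) / N
print(lhs, "<=", lp_norm(f, p) * lp_norm(g, q))  # ~1.067 <= ~1.116
```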
Convergence and completeness in normed linear spaces
-
$\seq{f_n}$ in normed linear space
- said to converge to $f$, i.e., $\lim f_n =f$ or $f_n \to f$, if $$ (\forall \epsilon>0)(\exists N\in\naturals)(\forall n> N)(\|f_n-f\|<\epsilon) $$
- called Cauchy sequence if $$ (\forall \epsilon>0)(\exists N\in\naturals)(\forall n,m> N)(\|f_n-f_m\|<\epsilon) $$
- called summable if sequence of partial sums $\sum^n_{i=1} f_i$ converges
- called absolutely summable if $\sum^\infty_{i=1} \|f_i\|$ converges
- normed linear space called complete if every Cauchy sequence converges
- normed linear space is complete if and only if every absolutely summable series is summable
Banach space
- complete normed linear space called Banach space
- (Riesz-Fischer) $L^p$ spaces are complete, hence Banach spaces
- convergence in $L^p$ called convergence in mean of order $p$
- convergence in $L^\infty$ amounts to uniform convergence outside set of measure zero (nearly uniform convergence)
Approximation in $L^p$
- $\Delta=\seq{d_i}_{i=0}^n$ with $0=d_0<d_1<\cdots<d_n=1$ called subdivision of $[0,1]$ (with $\Delta_i = [d_{i-1},d_{i}]$)
- $\varphi_{f,\Delta}$ for $f\in L^p$ called step function if $$ \varphi_{f,\Delta}(x) = \frac{1}{d_i-d_{i-1}}\int_{d_{i-1}}^{d_i} f(t)dt \mbox{ for } x\in[d_{i-1},d_i) $$
-
for $f\in L^p$ ($1\leq p< \infty$) and every $\epsilon>0$, exist step function, $\varphi_{f,\Delta}$, and continuous function, $\psi$, such that
$$
\|\varphi_{f,\Delta}-f\|<\epsilon
\mbox{ and }
\|\psi-f\|<\epsilon
$$
- $L^p$ version of Littlewood's second principle (above)
- for $f\in L^p$, $\varphi_{f,\Delta}\to f$ as $\max \Delta_i\to0$, i.e., $$ (\forall \epsilon>0)(\exists \delta>0)(\max \Delta_i < \delta \Rightarrow \|\varphi_{f,\Delta}-f\|_p < \epsilon) $$
Bounded linear functionals on $L^p$
- $F:X\to\reals$ for normed linear space $X$ called linear functional if $$ (\forall f, g \in X, a,b \in\reals)(F(af+bg)=aF(f)+bF(g)) $$
- linear functional, $F$, said to be bounded if $$ (\exists M)(\forall f\in X)(|F(f)|\leq M\|f\|) $$
- smallest such constant called norm of $F$, i.e., $$ \|F\| = \sup_{f\in X, f\neq0} {|F(f)|}/{\|f\|} $$
Riesz representation theorem
- for every $g\in L^q$ ($1\leq p\leq \infty$), following defines a bounded linear functional in $L^p$ $$ F(f) = \int fg $$ where $\|F\|=\|g\|_q$
- Riesz representation theorem - for every bounded linear functional in $L^p$, $F$, ($1\leq p<\infty$), there exists $g\in L^q$ such that $$ F(f) = \int fg $$ where $\|F\|=\|g\|_q$
- in each case, $L^q$ is dual of $L^p$ (dual defined later)
Metric Spaces
Metric spaces
-
$\metrics{X}{\rho}$ with nonempty set, $X$, and metric $\rho: X\times X\to\preals$ called metric space
if for every $x,y,z \in X$
- $\rho(x,y)=0 \Leftrightarrow x=y$
- $\rho(x,y)=\rho(y,x)$
- $\rho(x,y) \leq \rho(x,z) + \rho(z,y)$ (triangle inequality)
-
examples of metric spaces
- $\metrics{\reals}{|\cdot|}$, $\metrics{\reals^n}{\|\cdot\|_p}$ with $1\leq p\leq \infty$
- for $x\in X$ and $r>0$, $S_{x,r} = \set{y}{\rho(y,x)<r}$ called ball
- for $E\subset X$, diameter of $E$ defined by $\sup\set{\rho(x,y)}{x,y \in E}$
- $\rho$ called pseudometric if requirement $\rho(x,y)=0 \Rightarrow x=y$ dropped
- $\rho$ called extended metric if $\rho: X\times X \to\preals\cup\{\infty\}$
Cartesian product
- for two metric spaces $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$, metric space $\metrics{X\times Y}{\tau}$ with $\tau:X\times Y\to\preals$ such that $$ \tau((x_1,y_1),(x_2,y_2)) = (\rho(x_1,x_2)^2 + \sigma(y_1,y_2)^2)^{1/2} $$ called Cartesian product metric space
-
$\tau$ satisfies all properties required by metric
- e.g., $\reals^{n} \times \reals^{m} = \reals^{n+m}$
Open sets - metric spaces
-
$O \subset X$ said to be open if
$$
(\forall x\in O)(\exists \delta>0)(\forall y\in X)(\rho(y,x)<\delta \Rightarrow y\in O)
$$
- $X$ and $\emptyset$ are open
- intersection of finite collection of open sets is open
- union of any collection of open sets is open
Closed sets - metric spaces
-
$x\in X$ called point of closure of $E\subset X$ if
$$
(\forall \epsilon>0)(\exists y\in E)(\rho(y,x) < \epsilon)
$$
- $\closure{E}$ denotes set of points of closure of $E$; called closure of $E$
- $E\subset \closure{E}$
-
$F \subset X$ said to be closed if
$$
F = \closure{F}
$$
- $X$ and $\emptyset$ are closed
- union of finite collection of closed sets is closed
- intersection of any collection of closed sets is closed
- complement of closed set is open
- complement of open set is closed
Dense sets and separability - metric spaces
- $D\subset X$ said to be dense if $$ \closure{D} = X $$
- $X$ is said to be separable if exists countable dense subset, i.e., $$ (\exists D\subset X)(D \mbox{ countable} \ \& \ \closure{D}=X) $$
- $X$ is separable if and only if exists countable collection of open sets $\seq{O_i}$ such that for all open $O\subset X$ $$ O = \bigcup_{O_i\subset O} O_i $$
Continuous functions - metric spaces
- $f:X\to Y$ for metric spaces $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$ called mapping or function from $X$ into $Y$
- $f$ said to be onto if $$ f(X)=Y $$
- $f$ said to be continuous at $x\in X$ if $$ (\forall \epsilon>0)(\exists \delta>0)(\forall y\in X)(\rho(y,x)<\delta \Rightarrow \sigma(f(y),f(x))<\epsilon) $$
- $f$ said to be continuous if $f$ is continuous at every $x\in X$
- $f$ is continuous if and only if for every open $O\subset Y$, $f^{-1}(O)$ is open
- if $f:X\to Y$ and $g:Y\to Z$ are continuous, $g\circ f:X\to Z$ is continuous
Homeomorphism
-
one-to-one mapping of $X$ onto $Y$ (or equivalently, one-to-one correspondence between $X$ and $Y$), $f$,
said to be homeomorphism if
- both $f$ and $f^{-1}$ are continuous
- $X$ and $Y$ said to be homeomorphic if exists homeomorphism
- topology is study of properties unaltered by homeomorphisms and such properties called topological
- one-to-one correspondece $X$ and $Y$ is homeomorphism if and only if it maps open sets in $X$ to open sets in $Y$ and vice versa
-
every property defined by means of open sets (or equivalently, closed sets)
and/or continuous functions
is topological
- e.g., if $f$ is continuous on $X$ and $h:X\to Y$ is homeomorphism, then $f\circ h^{-1}$ is continuous function on $Y$
Isometry
- homeomorphism preserving distance called isometry, i.e., $$ (\forall x,y \in X)(\sigma(h(x),h(y)) = \rho(x,y)) $$
- $X$ and $Y$ said to be isometric if exists isometry
- (from abstract point of view) two isometric spaces are exactly same; it's nothing but relabeling of points
-
two metrics, $\rho$ and $\sigma$ on $X$, said to be equivalent
if identity mapping of $\metrics{X}{\rho}$ onto $\metrics{X}{\sigma}$
is homeomorphism
- hence, two metrics are equivalent if and only if set in one metric is open whenever open in the other metric
Convergence - metric spaces
-
$\seq{x_n}$ defined for metric space, $X$
-
said to converge to $x$, i.e., $\lim x_n =x$ or $x_n \to x$, if
$$
(\forall \epsilon>0)(\exists N\in\naturals)(\forall n> N)(\rho(x_n,x)<\epsilon)
$$
- equivalently, every ball about $x$ contains all but finitely many points of $\seq{x_n}$
-
said to have cluster point, $x$, if
$$
(\forall \epsilon>0, N\in\naturals)(\exists n> N)(\rho(x_n,x)<\epsilon)
$$
- equivalently, every ball about $x$ contains $x_n$ for infinitely many indices $n$
-
limit of convergent sequence is cluster point
- converse not true
Completeness - metric spaces
- $\seq{x_n}$ of metric space, $X$, called Cauchy sequence if $$ (\forall \epsilon>0)(\exists N\in\naturals)(\forall n,m> N)(\rho(x_n,x_m)<\epsilon) $$
- every convergent sequence is Cauchy sequence
-
$X$ said to be complete if every Cauchy sequence converges
- e.g., $\metrics{\reals}{\rho}$ with $\rho(x,y)=|x-y|$
- for incomplete $\metrics{X}{\rho}$, exists complete $X^\ast$ where $X$ is isometrically embedded in $X^\ast$ as dense set
- if $X$ contained in complete $Y$, $X^\ast$ is isometric with $\closure{X}$ in $Y$
Uniform continuity - metric spaces
-
$f:X\to Y$ for metric spaces $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$
said to be uniformly continuous if
$$
(\forall \epsilon>0)(\exists \delta)(\forall x,y \in X)(\rho(x,y) < \delta \Rightarrow \sigma(f(x),f(y))<\epsilon)
$$
-
example of continuous, but not uniformly continuous function
- $h:[0,1)\to\preals$ with $h(x)=x/(1-x)$
- $h$ maps Cauchy sequence $\seq{1-1/n}_{n=1}^\infty$ in $[0,1)$ to $\seq{n-1}_{n=1}^\infty$ in $\preals$, which is not Cauchy sequence
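- a minimal numeric sketch of this example:

```python
# Minimal numeric sketch: h(x) = x/(1-x) on [0,1) maps the Cauchy sequence
# x_n = 1 - 1/n to h(x_n) = n - 1, which is unbounded, hence not Cauchy;
# continuity alone does not preserve Cauchy sequences.
def h(x):
    return x / (1.0 - x)

xs = [1 - 1 / n for n in range(1, 8)]
print([round(h(x), 6) for x in xs])  # 0, 1, 2, 3, 4, 5, 6
```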
-
uniform homeomorphism
- homeomorphism $f$ between $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$ with both $f$ and $f^{-1}$ uniformly continuous called uniform homeomorphism
Uniform homeomorphism
-
uniform homeomorphism $f$ between $\metrics{X}{\rho}$ and $\metrics{Y}{\sigma}$
maps every Cauchy sequence $\seq{x_n}$ in $X$ to Cauchy sequence $\seq{f(x_n)}$ in $Y$
- being Cauchy sequence, hence being complete, preserved by uniform homeomorphism
- being uniformly continuous also preserved by uniform homeomorphism
- each of three properties (being Cauchy sequence, being complete, being uniformly continuous) called uniform property
-
uniform properties are not topological properties, e.g., $h(x)=x/(1-x)$ above
- is homeomorphism between incomplete space $[0,1)$ and complete space $\preals$
- maps Cauchy sequence $\seq{1-1/n}_{n=1}^\infty$ in $[0,1)$ to $\seq{n-1}_{n=1}^\infty$ in $\preals$, which is not Cauchy sequence
- composition with $h$ maps uniformly continuous function $\sin$ on $\preals$ to non-uniformly continuous function $\sin(x/(1-x))$ on $[0,1)$
Uniform equivalence
- two metrics, $\rho$ and $\sigma$ on $X$, said to be uniformly equivalent if identity mapping of $\metrics{X}{\rho}$ onto $\metrics{X}{\sigma}$ is uniform homeomorphism, i.e., $$ (\forall \epsilon>0) (\exists \delta>0) (\forall x,y \in X) (\rho(x,y)<\delta \Rightarrow \sigma(x,y)<\epsilon \ \&\ \sigma(x,y)<\delta \Rightarrow \rho(x,y)<\epsilon) $$
-
example of uniform equivalence on $X\times Y$
- any two of below metrics are uniformly equivalent on $X\times Y$ $$ \begin{eqnarray*} &&\tau((x_1,y_1),(x_2,y_2)) = (\rho(x_1,x_2)^2 + \sigma(y_1,y_2)^2)^{1/2} \\ &&\rho_1((x_1,y_1),(x_2,y_2)) = \rho(x_1,x_2) + \sigma(y_1,y_2) \\ &&\rho_\infty((x_1,y_1),(x_2,y_2)) = \max\{\rho(x_1,x_2), \sigma(y_1,y_2)\} \end{eqnarray*} $$
- for $\metrics{X}{\rho}$ and complete $\metrics{Y}{\sigma}$ and $f:X\to Y$ uniformly continuous on $E\subset X$ into $Y$, exists unique continuous extension $g$ of $f$ on $\closure{E}$, which is uniformly continuous
Subspaces
-
for metric space, $\metrics{X}{\rho}$,
metric space $\metrics{S}{\rho_S}$ with $S\subset X$ and $\rho_S$ being restriction of $\rho$ to $S$,
called subspace of $\metrics{X}{\rho}$
-
e.g. (with standard Euclidean distance)
- $\rationals$ is subspace of $\reals$
- $\bigsetl{(x,y)\in\reals^2}{y=0}$ is subspace of $\reals^2$, which is isometric to $\reals$
-
for metric space, $X$, and its subspace, $S$,
- for $E\subset S$, closure of $E$ relative to $S$ is $\closure{E}\cap S$
- $A\subset S$ is closed relative to $S$ if and only if $(\exists \mbox{ closed } F)(A = F\cap S)$
- $A\subset S$ is open relative to $S$ if and only if $(\exists \mbox{ open } O)(A = O\cap S)$
-
also
- every subspace of separable metric space is separable
- every complete subset of metric space is closed
- every closed subset of complete metric space is complete
Compact metric spaces
-
motivation - want metric spaces where
- conclusions of Heine-Borel theorem are valid
- many properties of $[0,1]$ are true, e.g., Bolzano-Weierstrass property
-
e.g.,
- bounded closed set in $\reals$ has property that every open covering contains finite subcovering
- metric space $X$ called compact metric space if every open covering of $X$, $\collk{U}$, contains finite open covering of $X$, i.e., $$ (\forall \mbox{ open covering of $X$}, \collk{U})(\exists \{O_1,\ldots,O_n\} \subset \collk{U}) (X = \cup O_i) $$
-
$A\subset X$ called compact if
compact as subspace of $X$
- i.e., every open covering of $A$ contains finite open covering of $A$
Compact metric spaces - alternative definition
- collection, $\collk{F}$, of sets in $X$ said to have finite intersection property if every finite subcollection of $\collk{F}$ has nonempty intersection
-
if rephrase definition of compact metric spaces in terms of closed instead of open
- $X$ is called compact metric space if every collection of closed sets with empty intersection contains finite subcollection with empty intersection
- thus, $X$ is compact if and only if every collection of closed sets with finite intersection property has nonempty intersection
Bolzano-Weierstrass property and sequential compactness
-
metric space said to
- have Bolzano-Weierstrass property if every sequence has cluster point
- $X$ said to be sequentially compact if every sequence has convergent subsequence
- $X$ has Bolzano-Weierstrass property if and only if sequentially compact
Compact metric spaces - properties
-
following three statements about metric space are equivalent
(not true for general topological spaces)
- being compact
- having Bolzano-Weierstrass property
- being sequentially compact
-
compact metric spaces have properties corresponding to some of those of complete metric spaces
(compare with statements above)
- every compact subset of metric space is closed and bounded
- every closed subset of compact metric space is compact
- (shown in what follows)
Necessary condition for compactness
- compact metric space is sequentially compact
- equivalently, compact metric space has Bolzano-Weierstrass property
Necessary conditions for sequentially compactness
- every continuous real-valued function on sequentially compact space is bounded and assumes its maximum and minimum
- sequentially compact space is totally bounded
- every open covering of sequentially compact space has Lebesgue number
Sufficient conditions for compactness
- metric space that is totally bounded and has Lebesgue number for every covering is compact
Borel-Lebesgue theorem
-
conditions above imply the following equivalent statements
- $X$ is compact
- $X$ has Bolzano-Weierstrass property
- $X$ is sequentially compact
- above called Borel-Lebesgue theorem
-
hence, can drop “sequentially'' in every statement above, i.e.,
- every continuous real-valued function on compact space is bounded and assumes its maximum and minimum
- compact space is totally bounded
- every open covering of compact space has Lebesgue number
Compact metric spaces - other facts
- closed subset of compact space is compact
-
compact subset of metric space is closed and bounded
- hence, Heine-Borel theorem implies
- subset of $\reals$ is compact if and only if closed and bounded
- metric space is compact if and only if it is complete and totally bounded
-
thus, compactness can be viewed as absolute type of closedness
- exactly same comment appears later for general topological spaces
- continuous image of compact set is compact
- continuous mapping of compact metric space into metric space is uniformly continuous
Diagrams for relations among metric spaces
- the figure shows relations among the metric-space properties stated above
Baire category
- dig (more) deeply into certain aspects of complete metric spaces, namely, Baire theory of category
-
subset $E$ in metric space where $\sim (\closure{E})$ is dense,
said to be nowhere dense
- equivalently, $\closure{E}$ contains no nonempty open set
- union of countable collection of nowhere dense sets, said to be of first category or meager
- set not of first category, said to be of second category or nonmeager
- complement of set of first category, called residual or co-meager
Baire category theorem
-
Baire theorem -
for complete metric space, $X$,
and countable collection of dense open subsets, $\seq{O_k}\subset X$,
the intersection of the collection
$$
\bigcap O_k
$$
is dense
- locally compact space version of Baire theorem appears later
- Baire category theorem - no nonempty open subset of complete metric space is of first category, i.e., union of countable collection of nowhere dense subsets
- Baire category theorem is unusual in that uniform property, i.e., completeness of metric space, implies purely topological conclusion
Second category everywhere
- metric or topological spaces with property that no nonempty open subset (of themselves) is of first category, said to be of second category everywhere (with respect to themselves)
- Baire category theorem says complete metric space is of second category everywhere
-
locally compact Hausdorff spaces are of second category everywhere, too
(locally compact Hausdorff spaces defined later)
- for these spaces, though, many of results of category theory follow directly from local compactness
Sets of first category
-
collection of sets with following properties, called a $\sigma$-ideal of sets
- countable union of sets in the collection is, again, in the collection
- subset of any in the collection is, again, in the collection
-
both of below collections are $\sigma$-ideal of sets
- sets of first category in topological space
- measure zero sets in complete measure space
-
sets of first category regarded as “small'' sets
- such sets in complete metric spaces have no interior points
- interestingly, set of first category in $[0,1]$ can have Lebesgue measure $1$, in which case its complement is residual set of measure zero
Some facts of category theory
- for open set, $O$, and closed set, $F$, $\closure{O}\sim O$ and $F\sim \interior{F}$ are nowhere dense
- closed set of first category in complete metric space is nowhere dense
- subset of complete metric space is residual if and only if contains dense $G_\delta$, hence subset of complete metric space is of first category if and only if contained in $F_\sigma$ whose complement is dense
- for countable collection of closed sets, $\seq{F_n}$, $O = \bigcup \interior{F_n}$ is open; if $\bigcup F_n$ is complete metric space, $O$ is dense
- some applications of category theory to analysis seem almost too good to be believed; here's one:
- uniform boundedness principle - for family, $\collF$, of real-valued continuous functions on complete metric space, $X$, with property that $(\forall x\in X)(\exists M_x\in\reals)(\forall f\in\collF)(|f(x)|\leq M_x)$ $$ (\exists \mbox{ open }O, M\in\reals)(\forall x\in O, f\in\collF)(|f(x)|\leq M) $$
Topological Spaces
Motivation for topological spaces
-
want to have something like
- notion of open set is fundamental
- other notions defined in terms of open sets
- more general than metric spaces
-
why not stick to metric spaces?
-
certain notions have natural meaning
not consistent with topological concepts
derived from metric spaces
- e.g., weak topologies in Banach spaces
Topological spaces
-
$\topos{X}{J}$ with nonempty set $X$ of points and family $\tJ$ of subsets,
which we call open, having the following properties
called
topological spaces
- $\emptyset, X\in\tJ$
- $O_1, O_2 \in\tJ \Rightarrow O_1 \cap O_2 \in\tJ$
- $O_\alpha \in \tJ \Rightarrow \cup_\alpha O_\alpha \in \tJ$
- family, $\tJ$, is called topology
-
for $X$, always exist two topologies defined on $X$
- trivial topology having only $\emptyset$ and $X$
- discrete topology for which every subset of $X$ is an open set
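- a minimal sketch checking these axioms for candidate topologies on finite $X$ (pairwise union suffices for closure under arbitrary unions when the family is finite):

```python
# Minimal sketch (assumption: X finite, a candidate topology T as a list of
# subsets); checks: contains empty set and X, closed under finite intersection,
# closed under union.
def is_topology(X, T):
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    if frozenset() not in T or X not in T:
        return False
    closed_cap = all(a & b in T for a in T for b in T)
    closed_cup = all(a | b in T for a in T for b in T)
    return closed_cap and closed_cup

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # True: nested open sets
print(is_topology(X, [set(), {1}, {2}, X]))     # False: {1} | {2} = {1, 2} missing
```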
Topological spaces associated with metric spaces
-
can associate topological space, $\topos{X}{J}$, to any metric space $\metrics{X}{\rho}$
where $\tJ$ is family of open sets in $\metrics{X}{\rho}$
- because properties in definition of topological space satisfied by open sets in metric space
-
$\topos{X}{J}$ associated with metric space, $\metrics{X}{\rho}$, said to be metrizable
- $\rho$ called metric for $\tXJ$
-
distinction between metric space and associated topological space is essential
- because different metric spaces can induce same topological space
- in this case, these metric spaces are equivalent
- formally, metric space and topological space are couples, $\metrics{X}{\rho}$ and $\tXJ$, respectively
Some definitions for topological spaces
- subset, $F\subset X$, whose complement, $\compl{F}$, is open called closed
-
intersection of all closed sets containing $E\subset X$ called closure of $E$ denoted by $\closure{E}$
- $\closure{E}$ is smallest closed set containing $E$
- $x\in X$ called point of closure of $E\subset X$ if every open set containing $x$ meets $E$, i.e., has nonempty intersection with $E$
- union of all open sets contained in $E\subset X$ is called interior of $E$ denoted by $\interior{E}$
- $x\in X$ called interior point of $E$ if exists open set, $O$, with $x\in O\subset E$
Some properties of topological spaces
- $\emptyset$, $X$ are closed
- union of finite collection of closed sets is closed
- intersection of any collection of closed sets is closed
- $E\subset \closure{E}$, $\closure{\closure{E}} = \closure{E}$, $\closure{A\cup B} = \closure{A} \cup \closure{B}$
- $F$ closed if and only if $\closure{F}=F$
- $\closure{E}$ is set of points of closure of $E$
- $\interior{E}\subset E$, $\interior{(\interior{E})} = \interior{E}$, $\interior{(A\cap B)} = \interior{A} \cap \interior{B}$
- $\interior{E}$ is set of interior points of $E$
- $\interior{(\compl{E})} = \compl{\closure{E}}$
Subspace and convergence of topological spaces
-
for subset of $\topos{X}{J}$, $A$,
define topology \tS\ for $A$
with $\tS = \set{A\cap O}{O \in \tJ}$
- $\tS$ called topology inherited from \tJ
- $\topos{A}{S}$ called subspace of $\topos{X}{J}$
-
$\seq{x_n}$ said to converge to $x\in X$ if
$$
(\forall O \in \tJ \mbox{ containing } x)(\exists N\in\naturals)(\forall n>N)(x_n \in O)
$$
- denoted by $$ \lim x_n = x $$
- $\seq{x_n}$ said to have $x\in X$ as cluster point if $$ (\forall O \in\tJ\mbox{ containing } x, N\in\naturals)(\exists n>N)(x_n \in O) $$
-
if $\seq{x_n}$ has subsequence converging to $x\in X$, then $x$ is cluster point of $\seq{x_n}$
- converse is not true for arbitrary topological space
Continuity in topological spaces
- mapping $f:X\to Y$ with $\topos{X}{J}$, $\topos{Y}{S}$ said to be continuous if $$ (\forall O\in \tS)(f^{-1}(O) \in \tJ) $$
- $f:X \to Y$ said to be continuous at $x\in X$ if $$ (\forall O\in\tS\mbox{ containing } f(x))(\exists U\in\tJ\mbox{ containing } x)(f(U)\subset O) $$
- $f$ is continuous if and only if $f$ is continuous at every $x\in X$
- for continuous $f$ on $\topos{X}{J}$, restriction, $\restrict{f}{A}$, to $A\subset X$ is continuous
- for $A$ with $A=A_1 \cup A_2$ where both $A_1$ and $A_2$ are either open or closed, $f:A\to Y$ with both restrictions, $\restrict{f}{A_1}$ and $\restrict{f}{A_2}$, continuous, is continuous
Homeomorphism for topological spaces
- one-to-one continuous function of $X$ onto $Y$, $f$, with continuous inverse function, $f^{-1}$, called homeomorphism between $\topos{X}{J}$ and $\topos{Y}{S}$
- $\topos{X}{J}$ and $\topos{Y}{S}$ said to be homeomorphic if exists homeomorphism between them
- homeomorphic spaces are indistinguishable, homeomorphism amounting to relabeling of points (from abstract point of view)
-
thus, below roles are same
- role that homeomorphism plays for topological spaces
- role that isometry plays for metric spaces
- role that isomorphism plays for algebraic systems
Stronger and weaker topologies
-
for two topologies, $\tJ$ and $\tS$ for same $X$ with $\tS\supset\tJ$
- $\tS$ said to be stronger or finer than $\tJ$
- $\tJ$ said to be weaker or coarser than $\tS$
- $\tS$ is stronger than $\tJ$ if and only if identity mapping of $\topos{X}{S}$ to $\topos{X}{J}$ is continuous
- for two topologies, $\tJ$ and $\tS$ for same $X$, $\tJ\cap\tS$ also topology
- for any collection of topologies, $\{\tJ_\alpha\}$ for same $X$, $\cap_\alpha \tJ_\alpha$ is topology
-
for nonempty set, $X$, and any collection of subsets of $X$, $\coll$
- exists weakest topology containing \coll, i.e., weakest topology where all subsets in $\coll$ are open
- it is intersection of all topologies containing $\coll$
Bases for topological spaces
- collection $\collB$ of open sets of $\tXJ$ called a base for topology, $\tJ$, of $X$ if $$ (\forall O\in \tJ, x\in O)(\exists B\in\collB)(x\in B\subset O) $$
-
collection $\collB_x$ of open sets of $\tXJ$ containing $x$ called a base at $x$
if
$$
(\forall O\in\tJ \mbox{ containing }x)(\exists B\in\collB_x)(x\in B\subset O)
$$
- elements of $\collB_x$ often called neighborhoods of $x$
- when no base given, neighborhood of $x$ is an open set containing $x$
- thus, collection, $\collB$, of open sets is a base if and only if it contains a base at every $x\in X$
-
for topological space that is also metric space
- all balls form a base
- balls centered at $x$ form a base at $x$
Characterization of topological spaces in terms of bases
- definition of open sets in terms of base - when $\collB$ is base of $\tXJ$ $$ (O\in\tJ) \Leftrightarrow (\forall x\in O)(\exists B\in\collB)(x\in B\subset O) $$
-
often, convenient to specify topology for $X$ by
- specifying a base of open sets, $\collB$, and
- using above criterion to define open sets
-
collection of subsets of $X$, $\collB$, is base for some topology if and only if
$$
\begin{eqnarray*}
&(\forall x\in X)(\exists B\in\collB)(x\in B)&
\\
&\mbox{and}&
\\
&(\forall x\in X, B_1, B_2 \in \collB \mbox{ with } x\in B_1\cap B_2)
(\exists B_3\in \collB)(x\in B_3 \subset B_1\cap B_2)&
\end{eqnarray*}
$$
- condition for collection to be base for some topology
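-
a minimal brute-force sketch of the criterion above (ad hoc names, finite $X$ only):
\begin{verbatim}
def is_base_for_some_topology(X, B):
    # the two conditions of the criterion, checked exhaustively
    B = set(map(frozenset, B))
    covers = all(any(x in b for b in B) for x in X)           # first condition
    refines = all(any(x in b3 and b3 <= b1 & b2 for b3 in B)
                  for b1 in B for b2 in B for x in b1 & b2)   # second condition
    return covers and refines

X = {1, 2, 3}
print(is_base_for_some_topology(X, [{1}, {2, 3}]))     # True
print(is_base_for_some_topology(X, [{1, 2}, {2, 3}]))  # False: fails at x = 2
\end{verbatim}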
Subbases for topological spaces
-
for $\tXJ$, collection of open sets, $\coll$, called a subbase for topology $\tJ$
if
$$
(\forall O\in \tJ, x\in O)(\exists \seq{C_i}_{i=1}^n\subset\coll)(x\in \cap C_i \subset O)
$$
- sometimes convenient to define topology in terms of subbase
- for subbase for $\tJ$, $\coll$, collection of finite intersections of sets from $\coll$ forms base for $\tJ$
- any collection of subsets of $X$ is subbase for weakest topology where sets of the collection are open
Axioms of countability
-
topological space said to satisfy first axiom of countability
if
exists countable base at every point
- every metric space satisfies first axiom of countability because for every $x\in X$, set of balls centered at $x$ with rational radii forms base at $x$
-
topological space said to satisfy second axiom of countability
if
exists countable base for the space
- metric space satisfies second axiom of countability if and only if separable (refer to page~ for definition of separability)
Topological spaces - facts
-
given base, $\collB$, for $\tXJ$
- $x \in \closure{E}$ if and only if $(\forall B\in\collB \mbox{ with } x\in B)(B\cap E \neq \emptyset)$
-
given base at $x$ for $\tXJ$, $\collB_x$, and base at $y$ for $\topos{Y}{S}$, $\topol{C}_y$
- $f:X\to Y$ continuous at $x$ if and only if $(\forall C\in\topol{C}_y)(\exists B\in\collB_x)(B\subset f^{-1}(C))$
-
if $\tXJ$ satisfies first axiom of countability
- $x \in \closure{E}$ if and only if $(\exists \seq{x_n} \mbox{ from } E)(\lim x_n = x)$
- $x$ cluster point of $\seq{x_n}$ if and only if exists its subsequence converging to $x$
- $\tXJ$ said to be Lindelöf space or have Lindelöf property if every open covering of $X$ has countable subcover
- second axiom of countability implies Lindelöf property
Separation axioms
-
why separation axioms
- properties of topological spaces are (in general) quite different from those of metric spaces
- often convenient assume additional conditions true in metric spaces
-
separation axioms
-
$T_1$ - Tychonoff spaces
- $(\forall x \neq y \in X)(\exists \mbox{ open }O\subset X)(y \in O, x \not\in O)$
-
$T_2$ - Hausdorff spaces
- $(\forall x \neq y \in X)(\exists \mbox{ open }O_1, O_2\subset X \mbox{ with } O_1\cap O_2=\emptyset)(x \in O_1, y \in O_2)$
-
$T_3$ - regular spaces
- $T_1$ & $(\forall \mbox{ closed } F \subset X, x \not\in F) (\exists \mbox{ open }O_1, O_2\subset X \mbox{ with } O_1\cap O_2=\emptyset) (x \in O_1, F \subset O_2)$
-
$T_4$ - normal spaces
- $T_1$ & $(\forall \mbox{ disjoint closed } F_1, F_2 \subset X) (\exists \mbox{ open }O_1, O_2\subset X \mbox{ with } O_1\cap O_2=\emptyset) (F_1 \subset O_1, F_2 \subset O_2)$
Separation axioms - facts
-
necessary and sufficient condition for $T_1$
- topological space satisfies $T_1$ if and only if every singleton, $\{x\}$, is closed
-
important consequences of normality, $T_4$
- Urysohn's lemma - for normal topological space, $X$ $$ (\forall \mbox{ disjoint closed } A, B \subset X) (\exists f\in C(X,[0,1])) (f(A) = \{0\}, f(B) = \{1\}) $$
- Tietze's extension theorem - for normal topological space, $X$ $$ (\forall \mbox{ closed } A \subset X, f\in C(A,\reals)) (\exists g \in C(X,\reals)) (\forall x \in A) (g(x) = f(x)) $$
- Urysohn metrization theorem - normal topological space satisfying second axiom of countability is metrizable
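- for metric space, Urysohn's lemma has explicit witness (standard formula, added here for illustration): for disjoint closed $A, B$, with $\rho(x,A)=\inf_{a\in A}\rho(x,a)$ $$ f(x) = \frac{\rho(x,A)}{\rho(x,A)+\rho(x,B)} \in C(X,[0,1]), \quad f(A)=\{0\},\ f(B)=\{1\} $$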
Weak topology generated by functions
-
given any set of points, $X$, and any collection of functions of $X$ into $\reals$, $\collk{F}$,
exists weakest topology on $X$ such that
all functions in $\collk{F}$ are continuous
- it is weakest topology containing - refer to page~ $$ \coll\ = \bigset{f^{-1}(O)}{f\in\collk{F},\ O\subset \reals \mbox{ open}} $$
- called weak topology generated by $\collk{F}$
Complete regularity
-
for $\tXJ$ and continuous function collection $\collk{F}$,
weak topology generated by $\collk{F}$ is weaker than $\tJ$
- however, if $$ (\forall \mbox{ closed } F\subset X, x \not\in F)(\exists f\in\collk{F})(f(F)=\{0\}, f(x)=1) $$ then, weak topology generated by $\collk{F}$ coincides with $\tJ$
- if condition satisfied by $\collk{F} = C(X,\reals)$, $X$ said to be completely regular provided $X$ satisfies $T_1$ (Tychonoff space)
- every normal topological ($T_4$) space is completely regular (Urysohn's lemma)
- every completely regular space is regular space ($T_3$)
- complete regularity sometimes called $T_{3\frac{1}{2}}$
Diagrams for separation axioms for topological spaces
- the figure shows $T_4 \Rightarrow T_{3\frac{1}{2}} \Rightarrow T_3 \Rightarrow T_2 \Rightarrow T_1$
- every metric space is normal space
Topological spaces of interest
-
very general topological spaces quite bizarre
- do not seem to be much needed in analysis
-
only topological spaces (Royden) found useful for analysis are
- metrizable topological spaces
- locally compact Hausdorff spaces
- topological vector spaces
- all above are completely regular
- algebraic geometry, however, uses Zariski topology on affine or projective space, topology giving us compact $T_1$ space which is not Hausdorff
Connectedness
-
topological space, $X$, said to be connected if not exist two nonempty disjoint open sets, $O_1$ and $O_2$,
such that $O_1\cup O_2 = X$
- such pair, $(O_1, O_2)$, if exist, called separation of $X$
- pair of disjoint nonempty closed sets, $(F_1,F_2)$, with $F_1\cup F_2=X$ is also separation of $X$ - because they are also open
- $X$ is connected if and only if only subsets that are both closed and open are $\emptyset$ and $X$
-
subset $E\subset X$ said to be connected
if connected in topology inherited from $\tXJ$
- thus, $E$ is connected if not exist two open sets, $O_1$ and $O_2$, such that $E\cap O_1\neq\emptyset$, $E\cap O_2\neq\emptyset$, $E\subset O_1\cup O_2$, and $E\cap O_1\cap O_2 = \emptyset$
Properties of connected space, component, and local connectedness
- continuous image of connected space is connected
- (generalized version of) intermediate value theorem - for $f:X\to\reals$ where $X$ is connected $$ (\forall x, y \in X, c\in \reals \mbox{ with } f(x) < c < f(y))(\exists z \in X)(f(z)=c) $$
- subset of $\reals$ is connected if and only if is either interval or singleton
-
for $x\in X$, union of all connected sets containing $x$ is called component of $x$
- component is connected and closed
- two components containing same point coincide
- thus, $X$ is disjoint union of components
-
$X$ said to be locally connected if exists base for $X$ consisting of connected sets
- components of locally connected space are open
- space can be connected, but not locally connected
Product topological spaces
- for $\tXJ$ and $\topos{Y}{S}$, topology on $X\times Y$ taking as a base the following $$ \set{O_1 \times O_2}{O_1 \in \tJ, O_2 \in \topol{S}} $$
-
called product topology for $X\times Y$
- for metric spaces, $X$ and $Y$, product topology is topology induced by product metric
- for indexed family with index set, $\collk{A}$, $\topos{X_\alpha}{\tJ_\alpha}$, product topology on $\bigtimes_{\alpha\in\collk{A}} X_{\alpha}$ defined by taking as a base the following $\bigsetl{\bigtimes O_\alpha}{O_\alpha\in \tJ_\alpha,\ O_\alpha = X_\alpha \mbox{ except for finitely many }\alpha}$
-
$\pi_\alpha: \bigtimes X_{\alpha} \to X_\alpha$ with $\pi_\alpha(x) = x_\alpha$,
i.e., $\alpha$-th coordinate, called projection
- every $\pi_\alpha$ continuous
- product topology is weakest topology on $\bigtimes X_\alpha$ making every $\pi_\alpha$ continuous
- if $(\forall \alpha\in\collk{A})(X_\alpha=X)$, $\bigtimes X_{\alpha}$ denoted by $X^\collk{A}$
Product topology with countable index set
-
for countable $\collk{A}$
-
$\bigtimes X_\alpha$ denoted by $X^\omega$ or $X^\naturals$
$\because$ only # elements of $\collk{A}$ important
- e.g., $\mbox{\bf 2}^\omega$ is Cantor set when denoting discrete topology with two elements by $\mbox{\bf 2}$
- if $X$ is metrizable, $X^\omega$ is metrizable
- $\naturals^\omega = \naturals^\naturals$ is topological space homeomorphic to $\reals\sim\rationals$ when denoting countable set with discrete topology also by $\naturals$
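- one standard metric behind metrizability of $X^\omega$ above (a sketch, assuming $\rho$ is metric for $X$): $$ \rho_\omega(x,y) = \sum_{n=1}^\infty 2^{-n} \frac{\rho(x_n,y_n)}{1+\rho(x_n,y_n)} $$ induces product topology on $X^\omega$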
Product topologies induced by set and continuous functions
- for $I=[0,1]$, $I^\collk{A}$ called cube
- $I^\omega$ is metrizable, and called Hilbert cube
-
for any set $X$ and any collection of $f:X\to[0,1]$, $\collk{F}$
with $(\forall x\neq y\in X)(\exists f\in\collk{F})(f(x)\neq f(y))$
-
can define one-to-one mapping of \collk{F}\ into $I^X$
with $f(x)$ as $x$-th coordinate of $f$
- $\pi_x: \collk{F} \to I$ (mapping of $\collk{F}$ into $I$) with $\pi_x(f) = f(x)$
- topology that $\collk{F}$ inherits as subspace of $I^X$ called topology of pointwise convergence (because $\pi_x$ is projection, hence continuous)
-
can define one-to-one mapping of $X$ into $I^\collk{F}$
with $f(x)$ as $f$-th coordinate of $x$
- topology of $X$ as subspace of $I^\collk{F}$ is weak topology generated by \collk{F}
-
if every $f\in\collk{F}$ is continuous,
- mapping of $X$ into $I^\collk{F}$ is continuous
- if for every closed $F\subset X$ and for each $x\not\in F$, exists $f\in\collk{F}$ such that $f(x)=1$ and $f(F)=\{0\}$, then $X$ is homeomorphic to its image in $I^\collk{F}$
Compact and Locally Compact Spaces
Compact spaces
-
compactness for metric spaces (page~)
can be generalized to topological spaces
- things are very much similar to those of metric spaces
- for subset $K\subset X$, collection of open sets, $\openconv$, whose union contains $K$, called open covering of $K$
- topological space, $X$, said to be compact if every open covering of $X$ contains finite subcovering
-
$K\subset X$ said to be compact if compact as subspace of $X$
- or equivalently, $K$ is compact if every covering of $K$ by open sets of $X$ has finite subcovering
- thus, Heine-Borel (page~) says every closed and bounded subset of $\reals$ is compact
- $\collk{F}\subset\powerset(X)$ such that every finite subcollection has nonempty intersection, said to have finite intersection property
- thus, topological space compact if and only if every collection of closed sets with finite intersection property has nonempty intersection
Compact spaces - facts
-
compactness can be viewed as absolute type of closedness because
- closed subset of compact space is compact
- compact subset of Hausdorff space is closed
- refer to page~ for exactly the same comments for metric spaces
- thus, every compact set of $\reals$ is closed and bounded
- continuous image of compact set is compact
- one-to-one continuous mapping of compact space onto Hausdorff space is homeomorphism
Refinement of open covering
- for open covering of $X$, $\openconv$, open covering of $X$ every element of which is subset of element of $\openconv$, called refinement of $\openconv$ or said to refine $\openconv$
- $X$ is compact if and only if every open covering has finite refinement
- any two open covers, $\openconv$ and $\collk{V}$, have common refinement, i.e., $$ \set{U\cap V}{U\in\openconv, V\in\collk{V}} $$
Countable compactness and Lindelöf
- topological space for which every open covering has countable subcovering said to be Lindelöf
- topological space for which every countable open covering has finite subcovering said to be countably compact space
- thus, topological space is compact if and only if both Lindelöf and countably compact
- every second countable space is Lindelöf
- thus, countable compactness coincides with compactness if second countable (i.e., satisfying second axiom of countability)
- continuous image of countably compact space is countably compact
Bolzano-Weierstrass property and sequential compactness
- topological space, $X$, said to have Bolzano-Weierstrass property if every sequence, $\seq{x_n}$, in $X$ has at least one cluster point, i.e., $$ (\forall \seq{x_n}) (\exists x\in X) (\forall \mbox{ open } O\ni x, N\in\naturals) (\exists n>N) (x_n \in O) $$
- topological space has Bolzano-Weierstrass property if and only if countably compact
- topological space said to be sequentially compact if every sequence has converging subsequence
- sequentially compact space is countably compact
- thus, Lindelöf coincides with compactness if sequentially compact
- countably compact and first countable (i.e., satisfying first axiom of countability) space is sequentially compact
Diagrams for relations among topological spaces
- the figure shows relations among topological spaces stated on preceding pages
Real-valued functions on topological spaces
- continuous real-valued function on countably compact space is bounded and assumes maximum and minimum
- $f:X\to\reals$ with topological space, $X$, called upper semicontinuous if $\set{x\in X}{f(x)<\alpha}$ is open for every $\alpha \in \reals$
- stronger statement - upper semicontinuous real-valued function on countably compact space is bounded (from above) and assumes maximum
- Dini - for sequence of upper semicontinuous real-valued functions on countably compact space, $\seq{f_n}$, with property that $\seq{f_n(x)}$ decreases monotonically to zero for every $x\in X$, $\seq{f_n}$ converges to zero uniformly
Products of compact spaces
- Tychonoff theorem - (probably) most important theorem in general topology
- most applications in analysis need only special case of product of (closed) intervals, but this special case does not seem to be easier to prove than general case, i.e., Tychonoff theorem
-
lemmas needed to prove Tychonoff theorem
- for collection of subsets of $X$ with finite intersection property, $\collk{A}$, exists collection $\collk{B}\supset\collk{A}$ with finite intersection property that is maximal with respect to this property, i.e., no collection with finite intersection property properly contains $\collk{B}$
- for collection, $\collk{B}$, of subsets of $X$ that is maximal with respect to finite intersection property, each intersection of finite number of sets in $\collk{B}$ is again in $\collk{B}$ and each set that meets each set in $\collk{B}$ is itself in $\collk{B}$
- Tychonoff theorem - product space $\bigtimes X_\alpha$ is compact for indexed family of compact topological spaces, $\seq{X_\alpha}$
Locally compact spaces
- topological space, $X$, with $$ (\forall x\in X)(\exists \mbox{ open }O\subset X)(x\in O, \closure{O} \mbox{ is compact}) $$ called locally compact
- topological space is locally compact if and only if set of all open sets with compact closures forms base for the topological space
-
every compact space is locally compact
-
but converse is not true
- e.g., Euclidean spaces $\reals^n$ are locally compact, but not compact
Locally compact Hausdorff spaces
- locally compact Hausdorff spaces constitute one of most important classes of topological spaces
- so useful is combination of Hausdorff separation axiom in connection with compactness that French usage (following Bourbaki) reserves term ‘compact space' for those compact and Hausdorff, using term ‘quasi-compact' for those not necessarily Hausdorff!
- following slides are devoted to establishing some of their basic properties
Support and subordinateness
- for function, $f$, on topological space, closure of $\set{x}{f(x)\neq0}$, called support of $f$, i.e., $$ \support f = \closure{\set{x}{f(x)\neq0}} $$
- given covering $\indexedcol{O_\lambda}$ of $X$, collection $\indexedcol{\varphi_\alpha}$ with $\varphi_\alpha:X\to\reals$ satisfying $$ (\forall \varphi_\alpha)(\exists O_\lambda)(\support \varphi_\alpha \subset O_\lambda) $$ said to be subordinate to $\indexedcol{O_\lambda}$
Some properties of locally compact Hausdorff spaces
-
for compact subset, $K$, of locally compact Hausdorff space, $X$
- exists open subset with compact closure, $O\subset X$, containing $K$
- exists continuous nonnegative function, $f$, on $X$, with $$ (\forall x\in K)(f(x)=1) \mbox{ and } (\forall x\not\in O)(f(x)=0) $$ if $K$ is $G_\delta$, may take $f<1$ in $\compl{K}$
- for open covering, $\indexedcol{O_\lambda}$, for compact subset, $K$, of locally compact Hausdorff space, exists $\seq{\varphi_i}_{i=1}^n \subset C(X,\preals)$ subordinate to $\indexedcol{O_\lambda}$ such that $$ (\forall x \in K)(\varphi_1(x)+\cdots+\varphi_n(x) =1) $$
Local compactness and second Baire category
-
for locally compact Hausdorff space, $X$,
and countable collection, $\seq{O_k}$, of dense open subsets of $X$,
the intersection of the collection
$$
\bigcap O_k
$$
is dense
- analogue of Baire theorem for complete metric spaces (refer to page~ for Baire theorem)
- thus, every locally compact Hausdorff space is of second category with respect to itself
Local compactness, Hausdorffness, and denseness
- for countable union, $\bigcup F_n$, of closed sets containing open subset, $O$, in locally compact space, union of interiors, $\bigcup \interior{F_n}$, is open set dense in $O$
- dense subset of Hausdorff space, $X$, which is locally compact in its subspace topology, is open subset of $X$
- subset, $Y$, of locally compact Hausdorff space is locally compact in its subspace topology if and only if $Y$ is relatively open subset of $\closure{Y}$
Alexandroff one-point compactification
-
for locally compact Hausdorff space, $X$,
can form $X^\ast$ by adding single point $\omega\not\in X$ to $X$
and take set in $X^\ast$ to be open
if it is either open in $X$ or complement of compact subset in $X$,
then
- $X^\ast$ is compact Hausdorff space
- identity mapping of $X$ into $X^\ast$ is homeomorphism between $X$ and $X^\ast\sim\{\omega\}$
- $X^\ast$ called Alexandroff one-point compactification of $X$
- $\omega$ often referred to as infinity in $X^\ast$
- continuous mapping, $f$, from topological space to topological space, for which inverse image of every compact set is compact, said to be proper
- proper maps from locally compact Hausdorff space, $X$, into locally compact Hausdorff space, $Y$, are precisely those continuous maps of $X$ into $Y$ that can be extended to continuous maps, $f^\ast$, of $X^\ast$ into $Y^\ast$ by taking point at infinity in $X^\ast$ to point at infinity in $Y^\ast$
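- standard example (not in the slide, for illustration): for $X=\reals^n$, $X^\ast$ is homeomorphic to sphere $S^n$, e.g., via $$ x \mapsto \left(\frac{2x}{1+\|x\|^2},\ \frac{\|x\|^2-1}{\|x\|^2+1}\right), \qquad \omega \mapsto (0,\ldots,0,1) $$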
Manifolds
- connected Hausdorff space with each point having neighborhood homeomorphic to ball in $\reals^n$ called $n$-dimensional manifold
- sometimes say manifold is connected Hausdorff space that is locally Euclidean
- thus, manifold has all local properties of Euclidean space; particularly locally compact and locally connected
- neighborhood homeomorphic to ball called coordinate neighborhood or coordinate ball
- pair $\pair{U}{\varphi}$ with coordinate ball, $U$, and homeomorphism, $\varphi$, from $U$ onto ball in $\reals^n$, called coordinate chart; $\varphi$ called coordinate map
- coordinate (in $\reals^n$) of point, $x\in U$, under $\varphi$ said to be coordinate of $x$ in the chart
Equivalent properties for manifolds
-
for manifold, $M$, the following are equivalent
- $M$ is paracompact
- $M$ is $\sigma$-compact
- $M$ is Lindelöf
- every open cover of $M$ has star-finite open refinement
- exists sequence of open subsets of $M$, $\seq{O_n}$, with $\closure{O_n}$ compact, $\closure{O_n}\subset O_{n+1}$, and $M=\bigcup O_n$
- exists proper continuous map, $\varphi:M\to [0,\infty)$
- $M$ is second countable
Banach Spaces
Vector spaces
- set $X$ with $+:X\times X\to X$, $\cdot: \reals \times X\to X$ satisfying the following properties called vector space or linear space or linear vector space over $\reals$ $$ \begin{eqnarray*} \mbox{for all } x,y,z\in X \mbox{ and } \lambda, \mu \in \reals && \\ x+y= y+x && \mbox{- additive commutativity} \\ (x+y)+z= x+(y+z) && \mbox{- additive associativity} \\ (\exists 0\in X)\ x+0=x && \mbox{- additive identity} \\ \lambda(x+y) = \lambda x + \lambda y && \mbox{- distributivity over vector addition} \\ (\lambda+\mu)x = \lambda x + \mu x && \mbox{- distributivity over scalar addition} \\ \lambda(\mu x)= (\lambda \mu)x && \mbox{- associativity of scalar multiplication} \\ 0\cdot x = 0\in X&& \\ 1\cdot x = x&& \end{eqnarray*} $$
Norm and Banach spaces
- $\|\cdot\|:X\to\preals$ with vector space, $X$, called norm if $$ \begin{eqnarray*} \mbox{for all } x,y\in X \mbox{ and } \alpha \in \reals& \\ \|x\| = 0 \Leftrightarrow x=0 && \mbox{- positive definiteness / positiveness / point-separating} \\ \|x+y\|\leq \|x\| + \|y\| && \mbox{- triangle inequality / subadditivity} \\ \|\alpha x\| = |\alpha| \|x\| && \mbox{- absolute homogeneity} \end{eqnarray*} $$
-
normed vector space that is complete metric space with metric induced by norm,
i.e., $\rho:X\times X \to \preals$ with $\rho(x,y)=\|x-y\|$,
called Banach space
- can be said to be class of spaces endowed with both topological and algebraic structure
-
examples include
- $L^p$ with $1\leq p\leq \infty$ (page~),
- $C(X)=C(X,\reals)$, i.e., space of all continuous real-valued functions on compact space, $X$
Properties of vector spaces
- normed vector space is complete if and only if every absolutely summable sequence is summable
Subspaces of vector spaces
- nonempty subset, $S$, of vector space, $X$, with $(\forall x,y\in S,\ \lambda,\mu\in\reals)(\lambda x + \mu y\in S)$, called subspace or linear manifold
- intersection of any family of linear manifolds is linear manifold
- hence, for $A\subset X$, exists smallest linear manifold containing $A$, often denoted by $\{A\}$
- if $S$ is closed as subset of $X$, called closed linear manifold
-
some definitions
- $A+x$ defined by $\set{y+x}{y\in A}$, called translate of $A$ by $x$
- $\lambda A$ defined by $\set{\lambda x}{x \in A}$
- $A+B$ defined by $\set{x+y}{x \in A, y\in B}$
Linear operators on vector spaces
- mapping of vector space, $X$, to another (possibly same) vector space called linear mapping, or linear operator, or linear transformation if $$ (\forall x,y \in X, \alpha, \beta \in \reals) (A(\alpha x + \beta y) = \alpha (Ax) + \beta (Ay)) $$
- linear operator called bounded if $$ (\exists M) (\forall x \in X) (\|Ax\|\leq M \|x\|) $$
-
least such bound called norm of linear operator, i.e.,
$$
M
= \sup_{x\in X, x\neq 0} \|Ax\|/\|x\|
$$
- linearity implies $$ M = \sup_{x\in X, \|x\|= 1} \|Ax\| = \sup_{x\in X, \|x\|\leq 1} \|Ax\| $$
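-
numerical sketch of operator norm for $X=\reals^3$, $Y=\reals^4$ with Euclidean norms, where bounded linear operators are matrices and norm equals largest singular value (random data ad hoc):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))          # linear operator R^3 -> R^4

exact = np.linalg.norm(A, 2)             # largest singular value = ||A||
xs = rng.standard_normal((3, 100_000))
xs /= np.linalg.norm(xs, axis=0)         # random unit vectors in R^3
sampled = np.linalg.norm(A @ xs, axis=0).max()   # sup ||Ax|| over samples

print(sampled, "<=", exact)              # sampled sup approaches ||A||
\end{verbatim}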
Isomorphism and isometrical isomorphism
- bounded linear operator from $X$ to $Y$ called isomorphism if exists bounded inverse linear operator, i.e., $$ (\exists A:X\to Y, B:Y\to X)(AB \mbox{ and } BA \mbox{ are identity}) $$
- isomorphism between two normed vector spaces that preserves norms called isometrical isomorphism
- from abstract point of view, isometrically isomorphic spaces are identical, i.e., isometrical isomorphism merely amounts to element renaming
Properties of linear operators on vector spaces
-
for linear operators, point continuity $\Rightarrow$ boundedness $\Rightarrow$ uniform continuity,
i.e.,
- bounded linear operator is uniformly continuous
- linear operator continuous at one point is bounded
- space of all bounded linear operators from {normed vector space} to {Banach space} is {Banach space}
Linear functionals on vector spaces
- linear operator from vector space, $X$, to $\reals$ called linear functional, i.e., $f:X\to\reals$ such that for all $x,y\in X$ and $\alpha, \beta \in \reals$ $$ f(\alpha x + \beta y) = \alpha f(x) + \beta f(y) $$
- want to extend linear functional from subspace to whole vector space while preserving properties of functional
Hahn-Banach theorem
- Hahn-Banach theorem - for vector space, $X$, and function, $p:X \to \reals$, with $$ (\forall x,y\in X, \alpha \geq0) (p(x+y)\leq p(x) + p(y) \mbox{ and } p(\alpha x) = \alpha p(x)) $$ and for subspace of $X$, $S$, and linear functional, $f:S\to\reals$, with $$ (\forall s \in S) (f(s) \leq p(s)) $$ exists linear functional, $F:X\to\reals$, such that $$ (\forall s \in S) ( F(s) = f(s)) \mbox{ and } (\forall x \in X) (F(x) \leq p(x)) $$
- corollary - for normed vector space, $X$, and $x\in X$, exists bounded linear functional, $f:X\to\reals$, such that $$ f(x) = \|f\|\|x\| $$
Dual spaces of normed spaces
- space of bounded linear functionals on normed space, $X$, called dual or conjugate of $X$, denoted by $X^\ast$
- every dual is Banach space (refer to page~)
-
dual of $L^p$ is (isometrically isomorphic to) $L^q$, $1/p+1/q=1$, for $1\leq p<\infty$
- exists natural representation of bounded linear functional on $L^p$ by $L^q$ (by Riesz representation theorem on page~)
- not every bounded linear functional on $L^\infty$ has natural representation
Natural isomorphism
-
define linear mapping of normed space, $X$, to $X^{\ast\ast}$ (i.e., dual of dual of $X$),
$\varphi:X\to X^{\ast\ast}$ such that for $x\in X$,
$(\forall f\in X^{\ast})((\varphi (x))(f) = f(x))$
- then, $\|\varphi(x)\| = \sup_{\|g\|=1, g\in X^\ast} g(x) \leq \sup_{\|g\|=1, g\in X^\ast} \|g\|\|x\| = \|x\|$
- by corollary on page~, exists $f\in X^\ast$ with $\|f\|=1$ and $f(x)=\|x\|$, thus $\|\varphi(x)\| = \sup_{\|g\|=1, g\in X^\ast} g(x) \geq f(x) = \|x\|$
- thus, $\|\varphi(x)\| = \|x\|$, hence $\varphi$ is isometrical isomorphism of $X$ onto subspace $\varphi(X)$ of $X^{\ast\ast}$
- $\varphi$ called natural isomorphism of $X$ into $X^{\ast\ast}$
- $X$ said to be reflexive if $\varphi(X)=X^{\ast\ast}$
- thus, $L^p$ with $1< p<\infty$ is reflexive, but $L^1$ and $L^\infty$ are not
- note $X$ may be isometric with $X^{\ast\ast}$ without being reflexive
Completeness of natural isomorphism
- for natural isomorphism, $\varphi$
-
$X^{\ast\ast}$ is complete, hence Banach space
- because space of bounded linear operators from normed space into Banach space, $\reals$, is Banach space (refer to page~)
- thus, closure of $\varphi(X)$ in $X^{\ast\ast}$, $\closure{\varphi(X)}$, complete (refer to page~)
- therefore, every normed vector space ($X$) is isometrically isomorphic to dense subspace ($\varphi(X)$) of Banach space ($\closure{\varphi(X)}$)
Hahn-Banach theorem - complex version
- Bohnenblust and Sobczyk - for complex vector space, $X$, and function, $p:X \to \reals$, with $$ ( \forall x,y\in X, \alpha \in\complexes ) ( p(x+y)\leq p(x) + p(y) \mbox{ and } p(\alpha x) = |\alpha| p(x) ) $$ and for subspace of $X$, $S$, and (complex) linear functional, $f:S\to\complexes$, with $$ ( \forall s \in S ) ( |f(s)| \leq p(s) ) $$ exists linear functional, $F:X\to\complexes$, such that $$ ( \forall s \in S ) ( F(s) = f(s) ) $$ and $$ ( \forall x \in X ) ( |F(x)| \leq p(x) ) $$
Open mapping on topological spaces
- mapping from topological space to topological space under which image of every open set is open, called open mapping
- hence, one-to-one continuous open mapping of one space onto another is homeomorphism
- (will show) continuous linear transformation of Banach space onto another Banach space is always open mapping
- (will) use above to provide criteria for continuity of linear transformation
Closed graph theorem (on Banach spaces)
-
every continuous linear transformation of Banach space onto Banach space is open mapping
- in particular, if the mapping is one-to-one, it is isomorphism
- for linear vector space, $X$, complete in two norms, $\|\cdot\|_A$ and $\|\cdot\|_B$, with $C\in\reals$ such that $(\forall x\in X)(\|x\|_A \leq C \|x\|_B)$, two norms are equivalent, i.e., $(\exists C'\in\reals)(\forall x\in X)(\|x\|_B \leq C' \|x\|_A)$
-
closed graph theorem - linear transformation, $A$, from Banach space, $X$, to Banach space, $Y$,
with property that
“if $\seq{x_n}$ converges in $X$ to $x\in X$ and $\seq{Ax_n}$ converges in $Y$ to $y\in Y$,
then $y=Ax$''
is continuous
- equivalent to say, if graph $\set{(x,Ax)}{x\in X}\subset X\times Y$ is closed, $A$ is continuous
Principle of uniform boundedness (on Banach spaces)
- principle of uniform boundedness - for family of bounded linear operators, $\collk{F}$, from Banach space, $X$, to normed space, $Y$, with $$ ( \forall x \in X ) ( \exists M_x ) ( \forall T \in \collk{F} ) ( \|Tx\| \leq M_x ) $$ operators in $\collk{F}$ are uniformly bounded, i.e., $$ ( \exists M ) ( \forall T \in \collk{F} ) ( \|T\| \leq M ) $$
Topological vector spaces
- just as notion of metric spaces generalized to notion of topological spaces
- notion of normed linear space generalized to notion of topological vector spaces
- linear vector space, $X$, with topology, $\tJ$, equipped with continuous addition, $+:X\times X\to X$, and continuous multiplication by scalars, $\cdot:\reals\times X\to X$, called topological vector space
Translation invariance of topological vector spaces
-
for topological vector space,
translation by $x\in X$ is homeomorphism (due to continuity of addition)
- hence, $x+O$ of open set $O$ is open
- every topology with this property said to be translation invariant
- for translation invariant topology, $\tJ$, on $X$, and base, $\collB$, for $\tJ$ at $0$, set $$ \set{x+U}{U\in \collB} $$ forms a base for $\tJ$ at $x$
- hence, sufficient to give a base at $0$ to determine translation invariant topology
- base at $0$ often called local base
Sufficient and necessary conditions for topological vector spaces
- for topological vector space, $X$, can find base, $\collB$, satisfying following properties $$ \begin{eqnarray*} && (\forall U, V \in \collB)(\exists W\in \collB)(W\subset U\cap V) \\ && (\forall U \in \collB, x\in U)(\exists V\in \collB)(x+V\subset U) \\ && (\forall U \in \collB)(\exists V\in \collB)(V + V \subset U) \\ && (\forall U \in \collB, x\in X)(\exists \alpha\in \reals)(x\in \alpha U) \\ && (\forall U \in \collB, \alpha\in\reals \mbox{ with } 0<|\alpha|\leq 1)(\alpha U\subset U,\ \alpha U\in \collB) \end{eqnarray*} $$
-
conversely, for collection, $\collB$, of subsets containing $0$
satisfying above properties,
exists topology for $X$ making $X$ topological vector space
with $\collB$ as base at $0$
- this topology is Hausdorff if and only if $$ \bigcap_{U\in \collB} U = \{0\} $$
- for normed linear space, can take $\collB$ to be set of balls centered at $0$; then $\collB$ satisfies above properties, hence can form topological vector space
Topological isomorphism
- in topological vector space, can compare neighborhoods at one point with neighborhoods of another point by translation
- for mapping, $f$, from topological vector space, $X$, to topological vector space, $Y$, such that $$ \begin{eqnarray*} && (\forall \mbox{ open } O\subset Y \mbox{ with }0\in O) (\exists \mbox{ open } U\subset X \mbox{ with }0\in U) \\ && (\forall x\in X) (f(x+U) \subset f(x) + O) \end{eqnarray*} $$ said to be uniformly continuous
- linear transformation, $f$, is uniformly continuous if continuous at one point
-
continuous one-to-one mapping, $\varphi$, from $X$ onto $Y$ with continuous $\varphi^{-1}$
called (topological) isomorphism
- from abstract point of view, isomorphic spaces are same
- Tychonoff - finite-dimensional Hausdorff topological vector space is topologically isomorphic to $\reals^n$ for some $n$
Weak topologies
-
for vector space, $X$, and collection of linear functionals, $\collF$,
weakest topology on $X$
in which each functional in $\collF$ is continuous,
called weak topology generated by $\collF$
- translation invariant
- base at $0$ given by sets $$ \set{x\in X}{\forall f \in\collk{G}, |f(x)|<\epsilon} $$ for all finite $\collk{G}\subset\collF$ and $\epsilon>0$
- base satisfies properties on page~, hence (above) weak topology makes $X$ topological vector space
- for normed vector space, $X$, and collection of continuous functionals, $\collF$, i.e., $\collF\subset X^\ast$, weak topology generated by $\collF$ weaker than (fewer open sets) norm topology of $X$
- metric topology generated by norm called strong topology of $X$
- weak topology generated by $X^\ast$ called weak topology of $X$
Strongly and weakly open and closed sets
- open and closed sets of strong topology called strongly open and strongly closed
- open and closed sets of weak topology called weakly open and weakly closed
- weakly closed set is strongly closed, but converse not true
- however, these coincide for linear manifolds, i.e., linear manifold is weakly closed if and only if strongly closed
- every strongly convergent sequence (or net) is weakly convergent
Weak$^\ast$ topologies
- for normed space, weak topology of $X^\ast$ is weakest topology for which all functionals in $X^{\ast\ast}$ are continuous
- turns out that weak topology of $X^\ast$ is less useful than weak topology generated by $X$, i.e., that generated by $\varphi(X)$ where $\varphi$ is the natural embedding of $X$ into $X^{\ast\ast}$ (refer to page~)
-
(above) weak topology generated by $\varphi(X)$
called weak$^\ast$ topology for $X^\ast$
- even weaker than weak topology of $X^\ast$
- thus, weak$^\ast$ closed subset of $X^\ast$ is weakly closed, and weak convergence implies weak$^\ast$ convergence
- base at $0$ for weak$^\ast$ topology given by sets $$ \set{f}{\forall x\in A, |f(x)|<\epsilon} $$ for all finite $A\subset X$ and $\epsilon>0$
- when $X$ is reflexive, weak and weak$^\ast$ topologies coincide
- Alaoglu - unit ball $S^\ast = \set{f\in X^\ast}{\|f\|\leq1}$ is compact in weak$^\ast$ topology
Convex sets
- for vector space, $X$ and $x,y\in X$ $$ \set{\lambda x + (1-\lambda)y}{\lambda \in [0,1]} \subset X $$ called segment joining $x$ and $y$
- set $K\subset X$ said to be convex or convex set if every segment joining any two points in $K$ is in $K$, i.e., $(\forall x,y\in K)(\mbox{segment joining }x,y\subset K)$
- every $\lambda x + (1-\lambda)y$ for $0<\lambda<1$ called interior point of segment
- point, $x$, in $K\subset X$ such that intersection with $K$ of every line going through $x$ contains open interval about $x$, said to be internal point, i.e., $$ (\forall y\in X)(\exists \epsilon>0)(\forall |\lambda|<\epsilon)(x+\lambda y\in K) $$
- convex set examples - linear manifold & ball, ellipsoid in normed space
Properties of convex sets
- for convex sets, $K_1$ and $K_2$, following are also convex sets $$ K_1 \cap K_2,\ \lambda K_1,\ K_1 + K_2 $$
-
for linear operators from vector space, $X$, to vector space, $Y$
- image of convex set (or linear manifold) in $X$ is convex set (or linear manifold) in $Y$,
- inverse image of convex set (or linear manifold) in $Y$ is convex set (or linear manifold) in $X$
- closure of convex set in topological vector space is convex set
Support functions and separated convex sets
- for subset, $K$, of vector space, $X$, $p:X\to \preals$ with $p(x) = \inf\set{\lambda>0}{\lambda^{-1}x \in K}$ called support function of $K$
-
for convex set $K\subset X$ containing $0$ as internal point
- $(\forall x\in X,\lambda\geq0)(p(\lambda x) = \lambda p(x))$
- $(\forall x,y\in X)(p(x+y)\leq p(x)+p(y))$
- $\set{x\in X}{p(x) < 1} \subset K \subset \set{x\in X}{p(x)\leq 1}$
- two convex sets, $K_1$ and $K_2$ such that exists linear functional, $f$, and $\alpha\in\reals$ with $(\forall x\in K_1)(f(x) \leq \alpha)$ and $(\forall x\in K_2)(f(x) \geq \alpha)$, said to be separated
- for two disjoint convex sets in vector space with at least one of them having internal point, exists nonzero linear functional that separates two sets
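-
numerical sketch of support function $p$ above (bisection on $\lambda$; assumes $K$ convex with $0$ internal, so membership along ray is monotone; names ad hoc):
\begin{verbatim}
import numpy as np

def gauge(x, in_K, hi=1e6, iters=100):
    # p(x) = inf{ lam > 0 : x / lam in K }, found by bisection
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if in_K(x / mid):
            hi = mid                     # x / mid in K: shrink from above
        else:
            lo = mid
    return hi

ball2 = lambda y: np.linalg.norm(y) <= 2.0   # K = closed ball of radius 2
print(gauge(np.array([3.0, 4.0]), ball2))    # ~2.5 = ||(3, 4)|| / 2
\end{verbatim}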
Local convexity
- topological vector space with base for topology consisting of convex sets, said to be locally convex
- for family of convex sets, $\collk{N}$, in vector space, following conditions are sufficient for translates of sets in $\collk{N}$ to form base for topology making the space locally convex topological vector space $$ \begin{eqnarray*} & (\forall N\in\collk{N})(x\in N \Rightarrow x \mbox{ is internal}) & \\ & (\forall N_1, N_2\in\collk{N})(\exists N_3\in\collk{N})(N_3 \subset N_1 \cap N_2) & \\ & (\forall N \in\collk{N}, \alpha\in\reals \mbox{ with } 0<|\alpha|<1)(\alpha N \in \collk{N}) & \end{eqnarray*} $$
- conversely, for every locally convex topological vector space, exists base at $0$ satisfying above conditions
-
follows that
- weak topology on vector space generated by linear functionals is locally convex
- normed vector space is locally convex topological vector space
Facts regarding local convexity
- for closed convex subset, $F$, of locally convex topological vector space and point, $x$, not in $F$, exists continuous linear functional, $f$, such that $$ f(x) < \inf_{y\in F} f(y) $$
-
corollaries
- convex set in locally convex topological vector space is strongly closed if and only if weakly closed
- for distinct points, $x$ and $y$, in locally convex Hausdorff vector space, exists continuous linear functional, $f$, such that $f(x)\neq f(y)$
Extreme points and supporting sets of convex sets
- point in convex set in vector space that is not interior point of any line segment lying in the set, called extreme point
- thus, $x$ is extreme point of convex set, $K$, if and only if $x=\lambda y + (1-\lambda) z$ with $0<\lambda<1$ and $y,z\in K$ implies $y=z=x$
- closed and convex subset, $S$, of convex set, $K$, with property that for every interior point of line segment in $K$ belonging to $S$, entire line segment belongs to $S$, called supporting set of $K$
- for closed and convex set, $K$, set of points at which a continuous linear functional assumes its maximum on $K$, is supporting set of $K$
Convex hull and closed convex hull
- for set $E$ in vector space, intersection of all convex sets containing set, $E$, called convex hull of $E$, which is convex set
- for set $E$ in vector space, intersection of all closed convex sets containing set, $E$, called closed convex hull of $E$, which is closed convex set
- Krein-Milman theorem - compact convex set in locally convex topological vector space is closed convex hull of its extreme points
Hilbert spaces
- Banach space, $H$, with function $\innerp{\cdot}{\cdot}:H\times H\to\reals$ satisfying following properties, called Hilbert space $$ \begin{eqnarray*} &&(\forall x,y,z\in H, \alpha, \beta \in \reals)(\innerp{\alpha x + \beta y}{z}=\alpha\innerp{x}{z} + \beta\innerp{y}{z}) \\ &&(\forall x,y\in H)(\innerp{x}{y} = \innerp{y}{x}) \\ &&(\forall x\in H)(\innerp{x}{x} = \|x\|^2) \end{eqnarray*} $$
-
$\innerp{x}{y}$ called inner product
for $x,y\in H$
- examples - $\innerp{x}{y} = x^T y = \sum x_i y_i$ for $\reals^n$, $\innerp{x}{y} = \int x(t)y(t) dt$ for $L^2$
-
Schwarz or Cauchy-Schwarz or Cauchy-Buniakowsky-Schwarz inequality -
$$
\|x\|\|y\| \geq \left|\innerp{x}{y}\right|
$$
-
hence,
- linear functional defined by $f(x)=\innerp{x}{y}$ bounded by $\|y\|$
- $\innerp{x}{y}$ is continuous function from $H\times H$ to $\reals$
Inner product in Hilbert spaces
- $x$ and $y$ in $H$ with $\innerp{x}{y}=0$ said to be orthogonal denoted by $x\perp y$
- set $S$ of which any two elements orthogonal called orthogonal system
- orthogonal system called orthonormal if every element has unit norm
- any two elements of orthonormal system are $\sqrt{2}$ apart, hence if $H$ separable, every orthonormal system in $H$ must be countable
- shall deal only with separable Hilbert spaces
Fourier coefficients
- assume orthonormal system expressed as sequence, $\seq{\varphi_n}$ - may be finite or infinite
- for $x\in H$ $$ a_n = \innerp{x}{\varphi_n} $$ called Fourier coefficients
- for $n\in\naturals$, we have $$ \|x\|^2 \geq \sum^n_{i=1} a_i^2 $$ $$ \begin{eqnarray*} \left\| x-\sum_{i=1}^n a_i \varphi_i \right\|^2 &=& \innerpt{x-\sum a_i \varphi_i}{x-\sum a_i \varphi_i}{} \\ &=& \innerpt{x}{x} - 2 \innerpt{x}{\sum a_i \varphi_i}{} + \innerpt{\sum a_i \varphi_i}{\sum a_i \varphi_i}{} \\ &=& \|x\|^2 - 2 \sum a_i \innerpt{x}{\varphi_i} + \sum a_i^2 \|\varphi_i\|^2 = \|x\|^2 - \sum a_i^2 \geq 0 \end{eqnarray*} $$
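-
numerical sketch of Fourier coefficients and the bound above for the orthonormal system $\varphi_n(t)=\sqrt{2}\cos(n\pi t)$ in $L^2[0,1]$ (grid quadrature; sample $x$ ad hoc):
\begin{verbatim}
import numpy as np

t = np.linspace(0.0, 1.0, 200_000, endpoint=False)
integ = lambda f: f.mean()     # crude quadrature on uniform grid over [0, 1]

x = t * (1.0 - t)              # a sample element of L^2[0, 1]
a = np.array([integ(x * np.sqrt(2.0) * np.cos(n * np.pi * t))
              for n in range(1, 60)])        # a_n = <x, phi_n>

print((a ** 2).sum(), "<=", integ(x ** 2))   # sum a_n^2 <= ||x||^2
\end{verbatim}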
Fourier coefficients of limit of $x$
- Bessel's inequality - for $x\in H$, its Fourier coefficients, $\seq{a_n}$ $$ \sum_{n=1}^\infty a_n^2 \leq \|x\|^2 $$
- then, $\seq{z_n}$ defined by following is Cauchy sequence $z_n = \sum_{i=1}^n a_i \varphi_i$
- completeness (of Hilbert space) implies $\seq{z_n}$ converges - let $y=\lim z_n$ $$ y=\lim z_n = \sum_{i=1}^\infty a_i \varphi_i $$
- continuity of inner product implies $\innerp{y}{\varphi_n} = \lim_m \innerp{z_m}{\varphi_n} = a_n$, i.e., Fourier coefficients of $y\in H$ are $a_n$, i.e.,
- $y$ has same Fourier coefficients as $x$
Complete orthonormal system
- orthonormal system, $\seq{\varphi_n}_{n=1}^\infty$, of Hilbert spaces, $H$, is said to be complete if $$ (\forall x\in H, n\in\naturals)(\innerp{x}{\varphi_n}=0) \Rightarrow x=0 $$
- orthonormal system is complete if and only if maximal, i.e., $$ \seq{\varphi_n} \mbox{ is complete} \Leftrightarrow ( (\exists \mbox{ orthonormal }R\subset H)(\forall n\in\naturals)(\varphi_n \in R) \Rightarrow R = \seq{\varphi_n} ) $$
- Hausdorff maximal principle implies existence of maximal orthonormal system, hence following statement
- for separable Hilbert space, $H$, every orthonormal system is countable and exists complete orthonormal system; for any such system, $\seq{\varphi_n}$, and $x\in H$ $$ x = \sum a_n \varphi_n $$ with $a_n = \innerp{x}{\varphi_n}$, and $\|x\|^2 = \sum a_n^2$
Dimensions of Hilbert spaces
- every complete orthonormal system of separable Hilbert space has same number of elements, i.e., has same cardinality
- hence, every separable Hilbert space has either finite or countably infinite complete orthonormal system
-
this number called dimension of separable Hilbert space
- for Hilbert space with countably infinite complete orthonormal system, we say, $\dim H = \aleph_0$
Isomorphism and isometry between Hilbert spaces
- isomorphism, $\Phi$, of Hilbert space onto another Hilbert space is linear mapping with property, $\innerp{\Phi x}{\Phi y} = \innerp{x}{y}$
- hence, every isomorphism between Hilbert spaces is isometry
- every $n$-dimensional Hilbert space is isomorphic to $\reals^n$
- every $\aleph_0$-dimensional Hilbert space is isomorphic to $l^2$, which again is isomorphic to $L^2$
- $L^2[0,1]$ is separable and $\seq{\cos (n\pi t)}$ is infinite orthogonal system
- every bounded linear functional, $f$, on Hilbert space, $H$, has unique $y$ such that $$ (\forall x\in H)(f(x)=\innerp{x}{y}) $$ and $\|f\|=\|y\|$
Measure and Integration
Purpose of integration theory
-
purpose of “measure and integration'' slides
- abstract (out) most important properties of Lebesgue measure and Lebesgue integration
- provide certain axioms that Lebesgue measure satisfies
- base our integration theory on these axioms
- hence, our theory valid for every system satisfying the axioms
Measurable space, measure, and measure space
- family of subsets containing $\emptyset$ closed under countable union and complement, called $\sigma$-algebra
- mapping of sets to extended real numbers, called set function
-
$\measu{X}{\algk{B}}$ with set, $X$, and $\sigma$-algebra of $X$, $\algk{B}$,
called measurable space
- $A\in\algk{B}$, said to be measurable (with respect to \algk{B})
- nonnegative set function, $\mu$, defined on $\algk{B}$ satisfying $\mu(\emptyset)=0$ and for every disjoint, $\seq{E_n}_{n=1}^\infty\subset \algk{B}$, $$ \mu\left(\bigcup E_n\right) = \sum \mu E_n $$ called measure on measurable space, $\measu{X}{\algk{B}}$
- measurable space, $\measu{X}{\algk{B}}$, equipped with measure, $\mu$, called measure space and denoted by $\meas{X}{\algk{B}}{\mu}$
Measure space examples
- $\meas{\reals}{\subsetset{M}}{\mu}$ with Lebesgue measurable sets, $\subsetset{M}$, and Lebesgue measure, $\mu$
- $\meast{[0,1]}{\set{A\in\subsetset{M}}{A\subset[0,1]}}{\mu}$ with Lebesgue measurable sets, $\subsetset{M}$, and Lebesgue measure, $\mu$
- $\meas{\reals}{\algB}{\mu}$ with class of Borel sets, $\algB$, and Lebesgue measure, $\mu$
- $\meas{\reals}{\powerset(\reals)}{\mu_C}$ with set of all subsets of $\reals$, $\powerset(\reals)$, and counting measure, $\mu_C$
-
interesting (and bizarre) example
- $\meas{X}{\collk{A}}{\mu_B}$ with any uncountable set, $X$, family, $\collk{A}$, of sets that are either countable or complements of countable sets, and measure, $\mu_B$, such that $\mu_B A =0$ for countable $A\in\collk{A}$ and $\mu_B B=1$ for uncountable $B\in\collk{A}$
More properties of measures
- for $A,B\in\algB$ with $A\subset B$ $$ \mu A \leq \mu B $$
- for $\seq{E_n}\subset \algB$ with $\mu E_1 < \infty$ and $E_{n+1} \subset E_n$ $$ \mu\left(\bigcap E_n\right) = \lim \mu E_n $$
- for $\seq{E_n}\subset \algB$ $$ \mu\left(\bigcup E_n\right) \leq \sum \mu E_n $$
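- hypothesis $\mu E_1 < \infty$ above is needed; standard counterexample (assumed here, not in the slide) with Lebesgue measure: $E_n = [n,\infty)$ gives $$ \mu\left(\bigcap E_n\right) = \mu(\emptyset) = 0 \neq \infty = \lim \mu E_n $$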
Finite and $\sigma$-finite measures
- measure, $\mu$, with $\mu(X)<\infty$, called finite
-
measure, $\mu$, with $X=\bigcup X_n$ for some $\seq{X_n}$ and $\mu(X_n)<\infty$,
called $\sigma$-finite
- always can take $\seq{X_n}$ with disjoint $X_n$
- Lebesgue measure on $[0,1]$ is finite
- Lebesgue measure on $\reals$ is $\sigma$-finite
- counting measure on uncountable set is not $\sigma$-finite
Sets of finite and $\sigma$-finite measure
- set, $E\in \algB$, with $\mu E<\infty$, said to be of finite measure
- set that is countable union of measurable sets of finite measure, said to be of $\sigma$-finite measure
- measurable set contained in set of $\sigma$-finite measure, is of $\sigma$-finite measure
- countable union of sets of $\sigma$-finite measure, is of $\sigma$-finite measure
- when $\mu$ is $\sigma$-finite, every measurable set is of $\sigma$-finite measure
Semifinite measures
- roughly speaking, nearly all familiar properties of Lebesgue measure and Lebesgue integration hold for arbitrary $\sigma$-finite measure
- many treatments of abstract measure theory limit themselves to $\sigma$-finite measures
- many parts of general theory, however, do not require assumption of $\sigma$-finiteness
- undesirable to have development unnecessarily restrictive
- measure, $\mu$, for which every measurable set of infinite measure contains measurable sets of arbitrarily large finite measure, said to be semifinite
- every $\sigma$-finite measure is semifinite measure while measure, $\mu_B$, on page~ is not
Complete measure spaces
-
measure space, $\meas{X}{\algB}{\mu}$, for which $\algB$ contains all subsets of sets of measure zero,
said to be complete,
i.e.,
$$
(\forall B\in\algB \mbox{ with } \mu B=0)
(A \subset B \Rightarrow A \in \algB)
$$
- e.g., Lebesgue measure is complete, but Lebesgue measure restricted to $\sigma$-algebra of Borel sets is not
- every measure space can be completed by addition of subsets of sets of measure zero
-
for $\meas{X}{\algB}{\mu}$, can find complete measure space $\meas{X}{\algB_0}{\mu_0}$
such that
$$
\begin{eqnarray*}
&-&
\algB \subset \algB_0
\\
&-&
E \in\algB \Rightarrow \mu E = \mu_0 E
\\
&-&
E \in\algB_0 \Leftrightarrow E = A \cup B
\mbox{ where } B,C\in\algB, \mu C = 0, A\subset C
\end{eqnarray*}
$$
- $\meas{X}{\algB_0}{\mu_0}$ called completion of $\meas{X}{\algB}{\mu}$
Local measurability and saturatedness
- for $\meas{X}{\algB}{\mu}$, $E\subset X$ for which $(\forall B\in\algB \mbox{ with }\mu B < \infty)(E\cap B\in\algB)$, said to be locally measurable
- collection, $\algC$, of all locally measurable sets is $\sigma$-algebra containing $\algB$
- measure for which every locally measurable set is measurable, said to be saturated
- every $\sigma$-finite measure is saturated
-
measure can be extended to saturated measure,
but (unlike completion)
extension is not unique
- can take $\algC$ as $\sigma$-algebra for locally measurable sets, but measure can be extended to $\algC$ in more than one way
Measurable functions
- concept and properties of measurable functions in abstract measurable space almost identical with those of Lebesgue measurable functions (page~)
- theorems and facts are essentially same as those of Lebesgue measurable functions
- assume measurable space, $\measu{X}{\algB}$
-
for $f:X\to\ereals$, following are equivalent
- $(\forall a\in\reals) (\set{x\in X}{f(x) < a}\in\algB)$
- $(\forall a\in\reals) (\set{x\in X}{f(x) \leq a}\in\algB)$
- $(\forall a\in\reals) (\set{x\in X}{f(x) > a}\in\algB)$
- $(\forall a\in\reals) (\set{x\in X}{f(x) \geq a}\in\algB)$
- $f:X\to\ereals$ for which any one of above four statements holds, called measurable or measurable with respect to \algB
Properties of measurable functions
-
for measurable functions, $f$ and $g$, and $c\in\reals$
- $f+c$, $cf$, $f+g$, $fg$, $f\vee g$ are measurable
-
for every measurable function sequence, $\seq{f_n}$
- $\sup f_n$, $\limsup f_n$, $\inf f_n$, $\liminf f_n$ are measurable
- thus, $\lim f_n$ is measurable if exists
Simple functions and other properties
- $\varphi$ called simple function if for distinct $\seq{c_i}_{i=1}^n$ and measurable sets, $\seq{E_i}_{i=1}^n$ $$ \varphi(x) = \sum_{i=1}^n c_i \chi_{E_i}(x) $$
-
for nonnegative measurable function, $f$,
exists nondecreasing sequence of simple functions, $\seq{\varphi_n}$,
i.e., $\varphi_{n+1}\geq \varphi_n$
such that for every point in $X$
$$
f = \lim \varphi_n
$$
- for $f$ defined on $\sigma$-finite measure space, we may choose $\seq{\varphi_n}$ so that every $\varphi_n$ vanishes outside set of finite measure
- for complete measure, $\mu$, $f$ measurable and $f=g$ a.e. imply measurability of $g$
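-
one standard construction behind the existence statement above (a sketch; dyadic truncation evaluated on a grid, example $f$ ad hoc):
\begin{verbatim}
import numpy as np

def phi(f_vals, n):
    # phi_n = min( floor(2^n f) / 2^n, n ): simple, nondecreasing in n
    return np.minimum(np.floor(2.0 ** n * f_vals) / 2.0 ** n, n)

x = np.linspace(0.0, 5.0, 11)
f = np.exp(x)                            # nonnegative measurable f
for n in (1, 4, 8, 12):
    err = np.abs(phi(f, n) - np.minimum(f, n)).max()
    print(n, err)                        # error <= 2^-n below level n
\end{verbatim}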
Define measurable function by ordinate sets
- $\set{x}{f(x)<\alpha}$ sometimes called ordinate set, which is nondecreasing in $\alpha$
- below says when given nondecreasing ordinate sets, we can find $f$ satisfying $$ \set{x}{f(x)<\alpha} \subset B_\alpha \subset \set{x}{f(x)\leq\alpha} $$
- for nondecreasing function, $h:D\to\algB$, for dense set of real numbers, $D$, i.e., $B_\alpha \subset B_\beta$ for all $\alpha<\beta$ where $B_\alpha = h(\alpha)$, exists unique measurable function, $f:X\to\ereals$ such that $f\leq \alpha$ on $B_\alpha$ and $f\geq \alpha$ on $X\sim B_\alpha$
- can relax some conditions and make it a.e. version as below
-
for function, $h:D\to\algB$, for dense set of real numbers, $D$,
such that $\mu(B_\alpha\sim B_\beta)=0$ for all $\alpha < \beta$ where $B_\alpha = h(\alpha)$,
exists measurable function, $f:X\to\ereals$
such that $f\leq \alpha$ a.e. on $B_\alpha$ and $f\geq \alpha$ a.e. on $X\sim B_\alpha$
- if $g$ has the same property, $f=g$ a.e.
Integration
- many definitions and proofs of Lebesgue integral depend only on properties of Lebesgue measure which are also true for arbitrary measure in abstract measure space (page~)
-
integral of nonnegative simple function, $\varphi(x) = \sum_{i=1}^n c_i \chi_{E_i}(x)$,
on measurable set, $E$, defined by
$$
\int_E \varphi d\mu= \sum_{i=1}^n c_i \mu (E_i \cap E)
$$
- independent of representation of $\varphi$
- for $a,b\in\ppreals$ and nonnegative simple functions, $\varphi$ and $\psi$ $$ \int (a\varphi + b\psi) = a \int\varphi + b \int\psi $$
Integral of bounded functions
- for bounded function, $f$, identically zero outside measurable set of finite measure $$ \sup_{\varphi:\ \mathrm{simple},\ \varphi \leq f} \int \varphi = \inf_{\psi:\ \mathrm{simple},\ f \leq \psi} \int \psi $$ if and only if $f=g$ a.e. for measurable function, $g$
- but, $f=g$ a.e. for measurable function, $g$, \iaoi\ $f$ is measurable with respect to completion of $\mu$, $\bar{\mu}$
- natural class of functions to consider for integration theory are those measurable \wrt\ completion of $\mu$
- thus, shall either assume $\mu$ is complete measure or define integral with respect to $\mu$ to be integral with respect to completion of $\mu$ depending on context unless otherwise specified
Difficulty of general integral of nonnegative functions
-
for Lebesgue integral of nonnegative functions
(page~)
- first define integral for bounded measurable functions
- define integral of nonnegative function, $f$ as supremum of integrals of all bounded measurable functions, $h\leq f$, vanishing outside measurable set of finite measure
-
unfortunately, this does not work when measure is not semifinite
- e.g., if $\algB=\{\emptyset,X\}$ with $\mu \emptyset = 0$ and $\mu X = \infty$, we want $\int 1 d\mu=\infty$, but only bounded measurable function vanishing outside measurable set of finite measure is $h\equiv0$, hence supremum gives $\int 1 d\mu = 0$
- to avoid this difficulty, we define integral of nonnegative measurable function directly in terms of integrals of nonnegative simple functions
Integral of nonnegative functions
- for nonnegative measurable function, $f:X\to\reals\cup\{\infty\}$, on measure space, $\meas{X}{\algB}{\mu}$, define integral of nonnegative extended real-valued measurable function $$ \int f d\mu = \sup_{\varphi:\ \mathrm{simple\ function},\ 0\leq \varphi\leq f} \int \varphi d\mu $$
-
however,
definition of integral of nonnegative extended real-valued measurable function
can be awkward to apply because
- taking supremum over large collection of simple functions
- not clear from definition that $\int(f+g) = \int f + \int g$
- thus, first establish some convergence theorems, and determine value of $\int f$ as limit of $\int \varphi_n$ for increasing sequence, $\seq{\varphi_n}$, of simple functions converging to $f$
Fatou's lemma and monotone convergence theorem
- Fatou's lemma - for nonnegative measurable function sequence, $\seq{f_n}$, with $\lim f_n = f$ a.e. on measurable set, $E$ $$ \int_E f \leq \liminf \int_E f_n $$
- monotone convergence theorem - for nonnegative measurable function sequence, $\seq{f_n}$, with $f_n\leq f$ for all $n$ and with $\lim f_n = f$ a.e. $$ \int_E f = \lim \int_E f_n $$
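a numeric sketch of monotone convergence using the standard dyadic simple approximations (the choice $f(x)=x^2$ on $[0,1]$ with Lebesgue measure, and the grid quadrature, are assumptions for illustration):
```python
import numpy as np

# phi_n(x) = min(floor(2^n f(x)) / 2^n, n) is a nondecreasing sequence of
# simple functions converging pointwise to f; their integrals rise to int f
x = np.linspace(0.0, 1.0, 200001)
f = x ** 2
prev = -np.inf
for n in range(1, 8):
    phi_n = np.minimum(np.floor(2 ** n * f) / 2 ** n, n)  # simple, phi_n <= f
    int_phi_n = phi_n.mean()      # grid approximation of int_0^1 phi_n dx
    assert int_phi_n >= prev      # integrals are nondecreasing
    prev = int_phi_n
print(prev)                       # approaches int_0^1 x^2 dx = 1/3
```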
Integrability of nonnegative functions
-
for nonnegative measurable functions, $f$ and $g$, and $a,b\in\preals$
$$
\int (af + bg) = a\int f + b\int g
\mbox{ \& }
\int f \geq 0
$$
- equality holds if and only if $f=0$ a.e.
- monotone convergence theorem together with above yields, for nonnegative measurable function sequence, $\seq{f_n}$ $$ \int \sum f_n = \sum \int f_n $$
- measurable nonnegative function, $f$, with $$ \int_E fd\mu <\infty $$ said to be integrable (over measurable set, $E$, \wrt\ $\mu$)
Integral
- arbitrary function, $f$, for which both $f^+$ and $f^-$ are integrable, said to be integrable
- in this case, define integral $$ \int_E f = \int_E f^+ - \int_E f^- $$
Properties of integral
-
for $f$ and $g$ integrable on measurable set, $E$, and $a,b\in\reals$
- $af+bg$ is integrable and $$ \int_E (af+bg) = a \int_E f + b\int_E g $$
- if $|h|\leq |f|$ and $h$ is measurable, then $h$ is integrable
- if $f\geq g$ a.e. $$ \int f \geq \int g $$
Lebesgue convergence theorem
- Lebesgue convergence theorem - for $g$ integrable over $E$ and sequence of measurable functions, $\seq{f_n}$, with $\lim f_n(x) = f(x)$ a.e. on $E$, if $$ |f_n(x)|\leq g(x) $$ then $$ \int_E f = \lim \int_E f_n $$
Setwise convergence of sequence of measures
- preceding convergence theorems assume fixed measure, $\mu$
- can generalize by allowing measure to vary
- given measurable space, $\measu{X}{\algB}$, sequence of set functions, $\seq{\mu_n}$, defined on $\algB$, satisfying $$ (\forall E\in\algB) (\lim \mu_n E = \mu E) $$ for some set function, $\mu$, defined on $\algB$, said to converge setwise to $\mu$
General convergence theorems
- generalization of Fatou's lemma - for measurable space, $\measu{X}{\algB}$, sequence of measures, $\seq{\mu_n}$, defined on $\algB$, converging setwise to $\mu$, defined on $\algB$, and sequence of nonnegative functions, $\seq{f_n}$, each measurable with respect to $\mu_n$, converging pointwise to function, $f$, measurable with respect to $\mu$ (compare with Fatou's lemma on page~) $$ \int f d\mu \leq \liminf\int f_n d\mu_n $$
- generalization of Lebesgue convergence theorem - for measurable space, $\measu{X}{\algB}$, sequence of measures, $\seq{\mu_n}$, defined on $\algB$, converging setwise to $\mu$, defined on $\algB$, and sequences of functions, $\seq{f_n}$ and $\seq{g_n}$, with $|f_n|\leq g_n$, each of $f_n$ and $g_n$ measurable with respect to $\mu_n$, converging pointwise to $f$ and $g$, measurable with respect to $\mu$, respectively, such that (compare with Lebesgue convergence theorem on page~) $$ \lim \int g_n d\mu_n = \int g d\mu < \infty $$ satisfy $$ \lim \int f_n d\mu_n = \int f d\mu $$
$L^p$ spaces
-
for complete measure space, $\meas{X}{\algB}{\mu}$
- space of measurable functions on $X$ with $\int |f|^p < \infty$, for which element equivalence is defined by being equal a.e., called $L^p$ space denoted by $L^p(\mu)$
- space of essentially bounded measurable functions, called $L^\infty$ space denoted by $L^\infty(\mu)$
-
norms
- for $p\in[1,\infty)$ $$ \|f\|_p=\left( \int |f|^p d\mu \right)^{1/p} $$
- for $p=\infty$ $$ \|f\|_\infty = \mathrm{ess\ sup} |f| = \inf \bigsetl{\sup_{x\in X}|g(x)|}{\mbox{measurable }g \mbox{ with } g=f \mbox{ a.e.}} $$
- for $p\in[1,\infty]$, spaces, $L^p(\mu)$, are Banach spaces
Hölder's inequality and Littlewood's second principle
- Hölder's inequality - for $p,q\in[1,\infty]$ with $1/p+1/q=1$, $f\in L^p(\mu)$ and $g\in L^q(\mu)$ satisfy $fg \in L^1(\mu)$ and $$ \|fg\|_1 = \int |fg| d\mu \leq \|f\|_p\|g\|_q $$
- complete measure space version of Littlewood's second principle - for $p\in[1,\infty)$ $$ \begin{eqnarray*} && (\forall f\in L^p(\mu), \epsilon>0) \\ && (\exists \mbox{ simple function } \varphi \mbox{ vanishing outside set of finite measure}) \\ && \ \ \ \ \ \ \ (\|f-\varphi\|_p < \epsilon) \end{eqnarray*} $$
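a quick numeric check of Hölder's inequality above, on a finite measure space with counting measure (the random data are an assumption for illustration):
```python
import numpy as np

# check ||fg||_1 <= ||f||_p ||g||_q on X = {1,...,50} with counting measure
rng = np.random.default_rng(0)
f, g = rng.normal(size=50), rng.normal(size=50)
for p in (1.5, 2.0, 4.0):
    q = p / (p - 1.0)             # conjugate exponent, 1/p + 1/q = 1
    lhs = np.abs(f * g).sum()     # ||fg||_1
    rhs = (np.abs(f) ** p).sum() ** (1 / p) * (np.abs(g) ** q).sum() ** (1 / q)
    assert lhs <= rhs + 1e-12
```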
Riesz representation theorem
- Riesz representation theorem - for $p\in[1,\infty)$ and bounded linear functional, $F$, on $L^p(\mu)$ and $\sigma$-finite measure, $\mu$, exists unique $g\in L^q(\mu)$ where $1/p+1/q=1$ such that $$ F(f) = \int fg d\mu $$ and $\|F\| = \|g\|_q$
- if $p\in(1,\infty)$, Riesz representation theorem holds without assumption of $\sigma$-finiteness of measure
Measure and Outer Measure
General measures
- consider some ways of defining measures on $\sigma$-algebra
-
recall that for Lebesgue measure
- define measure for open intervals
- define outer measure
- define notion of measurable sets
- finally derive Lebesgue measure
-
one can do similar things in general, e.g.,
- derive measure from outer measure
- derive outer measure from measure defined on algebra of sets
Outer measure
-
set function, $\mu^\ast:\powerset(X)\to[0,\infty]$,
for space $X$, having following properties,
called outer measure
- $\mu^\ast \emptyset = 0$
- $A\subset B \Rightarrow \mu^\ast A \leq \mu^\ast B$ (monotonicity)
- $E \subset \bigcup_{n=1}^\infty E_n \Rightarrow \mu^\ast E \leq \sum_{n=1}^\infty \mu^\ast E_n$ (countable subadditivity)
- $\mu^\ast$ with $\mu^\ast X<\infty$ called finite
- set $E\subset X$ satisfying following property, said to be measurable \wrt\ $\mu^\ast$ $$ (\forall A\subset X) (\mu^\ast(A) =\mu^\ast(A\cap E) + \mu^\ast(A\cap \compl{E})) $$
- class, $\algB$, of $\mu^\ast$-measurable sets is $\sigma$-algebra
- restriction of $\mu^\ast$ to $\algB$ is complete measure on $\algB$
Extension to measure from measure on an algebra
-
set function, $\mu:\alg\to[0,\infty]$, defined on algebra, $\alg$,
having following properties,
called measure on an algebra
- $\mu(\emptyset) = 0$
- $\left( \forall \mbox{ disjoint } \seq{A_n} \subset \alg \mbox{ with } \bigcup A_n \in \alg \right) \left( \mu\left(\bigcup A_n\right) = \sum \mu A_n \right)$
- measure on an algebra, $\alg$, is measure if and only if $\alg$ is $\sigma$-algebra
-
can extend measure on an algebra to measure defined on $\sigma$-algebra, $\algB$, containing $\alg$,
by
- constructing outer measure $\mu^\ast$ from $\mu$
- deriving desired extension $\bar{\mu}$ induced by $\mu^\ast$
- process of constructing $\mu^\ast$ from $\mu$ is similar to constructing Lebesgue outer measure from lengths of intervals
Outer measure constructed from measure on an algebra
- given measure, $\mu$, on an algebra, $\alg$
- define set function, $\mu^\ast:\powerset(X)\to[0,\infty]$, by $$ \mu^\ast E = \inf_{\seq{A_n}\subset \alg,\ E\subset \bigcup A_n} \sum \mu A_n $$
- $\mu^\ast$ called outer measure induced by $\mu$
- then
- for $A\in\alg$ and $\seq{A_n}\subset\alg$ with $A\subset \bigcup A_n$, $\mu A\leq \sum \mu A_n$
- hence, $(\forall A\in\alg)(\mu^\ast A = \mu A)$
- $\mu^\ast$ is outer measure
- every $A\in\alg$ is measurable with respect to $\mu^\ast$
Regular outer measure
-
for algebra, $\alg$
- $\alg_\sigma$ denote sets that are countable unions of sets of $\alg$
- $\alg_{\sigma \delta}$ denote sets that are countable intersections of sets of $\alg_\sigma$
- given measure, $\mu$, on an algebra, $\alg$ and outer measure, $\mu^\ast$ induced by $\mu$, for every $E\subset X$ and every $\epsilon>0$, exists $A\in\alg_\sigma$ and $B\in\alg_{\sigma \delta}$ with $E\subset A$ and $E\subset B$ $$ \mu^\ast A \leq \mu^\ast E + \epsilon \mbox{ and } \mu^\ast E = \mu^\ast B $$
- outer measure, $\mu^\ast$, with below property, said to be regular $$ (\forall E\subset X, \epsilon>0) (\exists \mbox{ $\mu^\ast$-measurable set }A \mbox{ with } E\subset A) (\mu^\ast A \leq \mu^\ast E + \epsilon) $$
- every outer measure induced by measure on an algebra is regular outer measure
Carathéodory theorem
- given measure, $\mu$, on an algebra, $\alg$ and outer measure, $\mu^\ast$ induced by $\mu$
-
$E\subset X$ is $\mu^\ast$-measurable
if and only if
exist $A\in\alg_{\sigma\delta}$ and $B\subset X$ with $\mu^\ast B=0$
such that
$$
E=A\sim B
$$
- for $B\subset X$ with $\mu^\ast B=0$, exists $C\in\alg_{\sigma\delta}$ with $\mu^\ast C=0$ such that $B\subset C$
-
Carathéodory theorem -
restriction, $\bar{\mu}$, of $\mu^\ast$ to $\mu^\ast$-measurable sets
is extension of $\mu$ to $\sigma$-algebra containing $\alg$
- if $\mu$ is finite or $\sigma$-finite, so is $\bar{\mu}$ respectively
- if $\mu$ is $\sigma$-finite, $\bar{\mu}$ is only measure on smallest $\sigma$-algebra containing $\alg$ which is extension of $\mu$
Product measures
- for countable disjoint collection of measurable rectangles, $\seq{(A_n \times B_n)}$, whose union is measurable rectangle, $A\times B$ $$ \lambda(A\times B) = \sum \lambda(A_n \times B_n) $$
- for $x\in X$ and $E\in \algk{R}_{\sigma\delta}$ $$ E_x = \set{y}{\langle x,y\rangle \in E} $$ is measurable subset of $Y$
- for $E\in\algk{R}_{\sigma\delta}$ with $\mu \times \nu(E)<\infty$, function, $g$, defined by $$ g(x) = \nu E_x $$ is measurable function of $x$ and $$ \int g d\mu = \mu \times \nu(E) $$
- XXX
Carathéodory outer measures
- set, $X$, of points and set, $\Gamma$, of real-valued functions on $X$
- two sets for which exist $a>b$ such that function, $\varphi$, greater than $a$ on one set and less than $b$ on the other set, said to be separated by function, $\varphi$
- outer measure, $\mu^\ast$, with $(\forall A,B\subset X \mbox{ separated by } f\in\Gamma) (\mu^\ast(A\cup B) = \mu^\ast A + \mu^\ast B)$, called Carathéodory outer measure with respect to $\Gamma$
- outer measure, $\mu^\ast$, on metric space, $\metrics{X}{\rho}$, for which $\mu^\ast(A\cup B)=\mu^\ast A + \mu^\ast B$ for $A,B\subset X$ with $\rho(A,B)>0$, called Carathéodory outer measure for $X$ or metric outer measure
- for Carathéodory outer measure, $\mu^\ast$, with respect to $\Gamma$, every function in $\Gamma$ is $\mu^\ast$-measurable
- for Carathéodory outer measure, $\mu^\ast$, for metric space, $\metrics{X}{\rho}$, every closed set (hence every Borel set) is measurable with respect to $\mu^\ast$
Measure-theoretic Treatment of Probabilities
Probability Measure
Measurable functions
- denote $n$-dimensional Borel sets by $\algR^n$
- for two measurable spaces, $\measu{\Omega}{\algF}$ and $\measu{\Omega'}{\algF'}$, function, $f:\Omega \to \Omega'$ with $$ \left( \forall A' \in \algF' \right) \left( f^{-1}(A') \in \algF \right) $$ said to be measurable with respect to $\algF/\algF'$ (thus, measurable functions defined on page~ and page~ can be said to be measurable with respect to $\collk{B}/\algR$)
-
when $\Omega=\reals^n$ in $\measu{\Omega}{\algF}$,
$\algF$ is assumed to be $\algR^n$,
and sometimes drop $\algR^n$
- thus, e.g., we say $f:\Omega\to\reals^n$ is measurable with respect to $\algF$ (instead of $\algF/\algR^n$)
- measurable function, $f:\reals^n\to\reals^m$ (i.e., measurable with respect to $\algR^n/\algR^m$), called Borel functions
- $f:\Omega\to\reals^n$ is measurable with respect to $\algF/\algR^n$ if and only if every component, $f_i:\Omega\to\reals$, is measurable with respect to $\algF/\algR$
Probability (measure) spaces
-
set function, $P:\algk{F}\to[0,1]$, defined on algebra, $\algk{F}$, of set $\Omega$,
satisfying following properties,
called probability measure
(refer to page~ for resemblance with measure spaces)
- $(\forall A\in\algk{F})(0\leq P(A)\leq 1)$
- $P(\emptyset) = 0,\ P(\Omega) = 1$
- $(\forall \mbox{ disjoint } \seq{A_n} \subset \algk{F} )(P\left(\bigcup A_n\right) = \sum P(A_n))$
- for $\sigma$-algebra, $\algk{F}$, $\meas{\Omega}{\algk{F}}{P}$, called probability measure space or probability space
- set $A\in\algk{F}$ with $P(A)=1$, called a support of $P$
Dynkin's $\pi$-$\lambda$ theorem
-
class, $\subsetset{P}$, of subsets of $\Omega$ closed under finite intersection,
called $\pi$-system, i.e.,
- $(\forall A,B\in \subsetset{P})(A\cap B\in\subsetset{P})$
-
class, $\subsetset{L}$, of subsets of $\Omega$ containing $\Omega$
closed under complements and countable disjoint unions,
called $\lambda$-system, i.e.,
- $\Omega \in \subsetset{L}$
- $(\forall A\in \subsetset{L})(\compl{A}\in\subsetset{L})$
- $(\forall \mbox{ disjoint }\seq{A_n}\subset\subsetset{L})(\bigcup A_n \in \subsetset{L})$
- class that is both $\pi$-system and $\lambda$-system is $\sigma$-algebra
- Dynkin's $\pi$-$\lambda$ theorem - for $\pi$-system, $\subsetset{P}$, and $\lambda$-system, $\subsetset{L}$, with $\subsetset{P} \subset \subsetset{L}$, $$ \sigma(\subsetset{P}) \subset \subsetset{L} $$
- for $\pi$-system, $\algk{P}$, two probability measures, $P_1$ and $P_2$, on $\sigma(\algk{P})$, agreeing on $\algk{P}$, agree on $\sigma(\algk{P})$
Limits of Events
- for $\seq{A_n}$ converging to $A$ $$ \lim P(A_n) = P(A) $$
Probabilistic independence
- given probability space, $\meas{\Omega}{\algk{F}}{P}$
- $A,B\in\algk{F}$ with $$ P(A\cap B) = P(A) P(B) $$ said to be independent
- indexed collection, $\seq{A_\lambda}$, with $$ \left( \forall n\in\naturals, \mbox{ distinct } \lambda_1, \ldots, \lambda_n \in \Lambda \right) \left( P\left(\bigcap_{i=1}^n A_{\lambda_i}\right) = \prod_{i=1}^n P(A_{\lambda_i}) \right) $$ said to be independent
Independence of classes of events
- indexed collection, $\seq{\subsetset{A}_\lambda}$, of classes of events (i.e., subsets) with $$ \left( \forall A_\lambda \in \subsetset{A}_\lambda \right) \left( \seq{A_\lambda} \mbox{ are independent} \right) $$ said to be independent
- for independent indexed collection, \seq{\subsetset{A}_\lambda}, with every $\subsetset{A}_\lambda$ being $\pi$-system, \seq{\sigma(\subsetset{A}_\lambda)} are independent
- for independent (countable) collection of events, $\seq{\seq{A_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{A_{ni}}_{i=1}^\infty)$ are independent
Borel-Cantelli lemmas
-
for sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ converging $$ P(\limsup A_n) = 0 $$
-
for independent sequence of events, $\seq{A_n}$, with $\sum P(A_n)$ diverging $$ P(\limsup A_n)=1 $$
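a Monte Carlo sketch of both lemmas for independent events $A_n$ with $P(A_n)=p_n$ (the choices $p_n = 1/n^2$ and $p_n = 1/n$ are assumptions for illustration):
```python
import numpy as np

# the number of A_n occurring stays bounded in mean when sum p_n < infty
# (first lemma) and grows like log N when p_n = 1/n (second lemma, with
# independence, so A_n occur infinitely often)
rng = np.random.default_rng(1)
paths = 200
for N in (2000, 20000):
    n = np.arange(1, N + 1)
    for p_n, label in ((1.0 / n ** 2, "1/n^2"), (1.0 / n, "1/n")):
        counts = (rng.random((paths, N)) < p_n).sum(axis=1)
        print(label, N, counts.mean())   # ~1.64 for 1/n^2; ~log N for 1/n
```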
Tail events and Kolmogorov's zero-one law
- for sequence of events, $\seq{A_n}$ $$ \algk{T} = \bigcap_{n=1}^\infty \sigma\left(\seq{A_i}_{i=n}^\infty\right) $$ called tail $\sigma$-algebra associated with \seq{A_n}; its elements are called tail events
- Kolmogorov's zero-one law - for independent sequence of events, $\seq{A_n}$, every event in tail $\sigma$-algebra has probability measure either $0$ or $1$
Product probability spaces
-
for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$,
want to find product measure, $\pi$,
such that
$$
\left(
\forall A\in \algX, B\in\algY
\right)
\left(
\pi(A\times B) = \mu(A)\nu(B)
\right)
$$
- e.g., if both $\mu$ and $\nu$ are Lebesgue measure on $\reals$, $\pi$ will be Lebesgue measure on $\reals^2$
- $A\times B$ for $A\in\algX$ and $B\in\algY$ is measurable rectangle
-
$\sigma$-algebra generated by measurable rectangles
denoted by
$$
\algX \times \algY
$$
- thus, not Cartesian product in usual sense
- generally much larger than class of measurable rectangles
Sections of measurable subsets and functions
- for two measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
-
sections of measurable subsets
- $\set{y\in Y}{(x,y)\in E}$ is section of $E$ determined by $x$
- $\set{x\in X}{(x,y)\in E}$ is section of $E$ determined by $y$
-
sections of measurable functions
- for measurable function, $f$, with respect to $\algX\times \algY$
- $f(x,\cdot)$ is section of $f$ determined by $x$
- $f(\cdot,y)$ is section of $f$ determined by $y$
-
sections of measurable subsets are measurable
- $\left( \forall x\in X, E\in \algX \times \algY \right) \left( \set{y\in Y}{(x,y)\in E} \in \algY \right)$
- $\left( \forall y\in Y, E\in \algX \times \algY \right) \left( \set{x\in X}{(x,y)\in E} \in \algX \right)$
-
sections of measurable functions are measurable
- $f(x,\cdot)$ is measurable with respect to $\algY$ for every $x\in X$
- $f(\cdot,y)$ is measurable with respect to $\algX$ for every $y\in Y$
Product measure
- for two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
-
two functions defined below for every $E\in\algX\times\algY$ are $\sigma$-finite measures
- $\pi'(E) = \int_X \nu\set{y\in Y}{(x,y)\in E} d\mu$
- $\pi''(E) = \int_Y \mu\set{x\in X}{(x,y)\in E} d\nu$
- for every measurable rectangle, $A\times B$, with $A\in\algX$ and $B\in\algY$ $$ \pi'(A\times B) = \pi''(A\times B) = \mu(A) \nu(B) $$
- (use conventions in page~ for extended real values)
- indeed, $\pi'(E)=\pi''(E)$ for every $E\in\algX\times\algY$; let $\pi=\pi'=\pi''$
-
$\pi$ is
- called product measure and denoted by $\mu\times \nu$
- $\sigma$-finite measure
- only measure such that $\pi(A\times B) =\mu(A) \nu(B)$ for every measurable rectangle
Fubini's theorem
-
suppose two $\sigma$-finite measure spaces, $\meas{X}{\algX}{\mu}$ and $\meas{Y}{\algY}{\nu}$
- define
- $X_0 = \set{x\in X}{\int_Y |f(x,y)|d\nu < \infty}\subset X$
- $Y_0 = \set{y\in Y}{\int_X |f(x,y)|d\mu < \infty}\subset Y$
- Fubini's theorem - for nonnegative measurable function, $f$, following are measurable with respect to $\algX$ and $\algY$ respectively $$ g(x) = \int_Y f(x,y)d\nu,\ \ h(y) = \int_X f(x,y)d\mu $$ and following holds $$ \int_{X\times Y} f(x,y) d\pi = \int_X \left(\int_Y f(x,y) d\nu\right)d\mu = \int_Y \left(\int_X f(x,y) d\mu\right)d\nu $$
-
for $f$, (not necessarily nonnegative) integrable function with respect to $\pi$
- $\mu(X\sim X_0) = 0$, $\nu(Y\sim Y_0)=0$
- $g$ and $h$ are finite and measurable on $X_0$ and $Y_0$ respectively
- (above) equalities of double integral holds
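a numeric sketch of the equality of iterated integrals on $[0,1]\times[0,1]$ with Lebesgue measure ($f(x,y)=e^{xy}$ and the grid-average quadrature are assumptions for illustration):
```python
import numpy as np

# iterated integrals of f(x,y) = exp(x*y) over [0,1]^2 agree
x = np.linspace(0.0, 1.0, 2001)
y = np.linspace(0.0, 1.0, 2001)
F = np.exp(np.outer(x, y))               # F[i, j] = f(x_i, y_j)
int_dy_then_dx = F.mean(axis=1).mean()   # int_X ( int_Y f dnu ) dmu
int_dx_then_dy = F.mean(axis=0).mean()   # int_Y ( int_X f dmu ) dnu
print(int_dy_then_dx, int_dx_then_dy)    # both ~ 1.3179
```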
Random Variables
Random variables
- for probability space, $\meas{\Omega}{\algk{F}}{P}$,
- measurable function (with respect to $\algF/\algR$), $X:\Omega \to \reals$, called random variable
-
measurable function (with respect to $\algF/\algR^n$), $X:\Omega \to \reals^n$,
called random vector
- when expressing $X(\omega)=(X_1(\omega), \ldots, X_n(\omega))$, $X$ is measurable if and only if every $X_i$ is measurable
- thus, $n$-dimensional random vector is simply $n$-tuple of random variables
-
smallest $\sigma$-algebra with respect to which $X$ is measurable,
called $\sigma$-algebra generated by $X$
and denoted by $\sigma(X)$
- $\sigma(X)$ consists exactly of sets, $\set{\omega\in \Omega}{X(\omega)\in H}$, for $H\in\algR^n$
- random variable, $Y$, is measurable with respect to $\sigma(X)$ if and only if exists measurable function, $f:\reals^n\to\reals$ such that $Y(\omega) = f(X(\omega))$ for all $\omega$, i.e., $Y=f\circ X$
Probability distributions for random variables
- probability measure on $\reals$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR $$ called distribution or law of random variable, $X$
- function, $F:\reals\to[0,1]$, defined by $$ F(x) = \mu(-\infty, x] = P(X\leq x) $$ called distribution function or cumulative distribution function (CDF) of $X$
- Borel set, $S$, with $\mu(S) = P(X\in S)=1$, called support
- random variable, its distribution, and its distribution function, said to be discrete when support is countable
Probability distribution of mappings of random variables
- for measurable $g:\reals\to\reals$, $$ \left( \forall A\in\algR \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu (g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$
Probability density for random variables
- Borel function, $f: \reals\to\preals$, satisfying $$ \left( \forall A \in \algR \right) \left( \mu(A) = P(X\in A) = \int_A f(x) dx \right) $$ called density or probability density function (PDF) of random variable
- above is equivalent to $$ \left( \forall a < b \in \reals \right) \left( \int_a^b f(x) dx = P(a<X\leq b) = F(b) - F(a) \right) $$
-
(refer to statement on page~)
- note, though, $F$ does not need to differentiate to $f$ everywhere; only $f$ required to integrate properly
- if $F$ does differentiate to $f$ and $f$ is continuous, fundamental theorem of calculus implies $f$ indeed is density for $F$
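a quick numeric check that the density integrates to CDF differences, $\int_a^b f = F(b)-F(a)$, for the exponential distribution (an assumed example; scipy is assumed available):
```python
from scipy import integrate, stats

# int_a^b f(x) dx = F(b) - F(a) for the exponential distribution
a, b = 0.3, 1.7
val, _ = integrate.quad(stats.expon.pdf, a, b)
assert abs(val - (stats.expon.cdf(b) - stats.expon.cdf(a))) < 1e-10
```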
Probability distribution for random vectors
- (similarly to random variables) probability measure on $\reals^n$, $\mu = PX^{-1}$, i.e., $$ \mu(A) = P(X\in A) \mbox{ for } A \in \algR^n $$ called distribution or law of random vector, $X$
- function, $F:\reals^n\to[0,1]$, defined by $$ F(x) = \mu(S_x) = P(X\preceq x) $$ where $$ S_x = \set{y\in \reals^n}{y\preceq x} $$ called distribution function or cumulative distribution function (CDF) of $X$
- (similarly to random variables) random vector, its distribution, and its distribution function, said to be discrete when support is countable
Marginal distribution for random vectors
- (similarly to random variables) for measurable $g:\reals^n\to\reals^m$ $$ \left( \forall A\in\algR^{m} \right) \left( \prob{g(X)\in A} = \prob{X \in g^{-1}(A)} = \mu(g^{-1}(A)) \right) $$ hence, $g(X)$ has distribution of $\mu g^{-1}$
- for $g_i:\reals^n\to\reals$ with $g_i(x) = x_i$ $$ \left( \forall A\in\algR \right) \left( \prob{g_i(X)\in A} = \prob{X_i \in A} \right) $$
- measure, $\mu_i$, defined by $\mu_i(A) = \prob{X_i\in A}$, called ($i$-th) marginal distribution of $X$
- for $\mu$ having density function, $f:\reals^n\to\preals$, density function of ($i$-th) marginal distribution is $$ f_i(x_i) = \int_{\reals^{n-1}} f(x_1,\ldots,x_n)\, dx_{-i} $$ where $x_{-i} = (x_1,\ldots,x_{i-1}, x_{i+1}, \ldots, x_n)$
Independence of random variables
- random variables, $X_1,\ldots,X_n$, with independent $\sigma$-algebras generated by them, said to be independent
-
(refer to page~ for
independence of collections of subsets)
- because $\sigma(X_i) = X_i^{-1}(\algR)=\set{X_i^{-1}(H)}{H\in\algR}$, independent if and only if $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(X_1\in H_1,\ldots, X_n\in H_n\right) = \prod P\left(X_i\in H_i\right) \right) $$ i.e., $$ \left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right) $$
Equivalent statements of independence of random variables
-
for random variables, $X_1,\ldots,X_n$,
having $\mu$ and $F:\reals^n\to[0,1]$ as their distribution and CDF,
with each $X_i$ having $\mu_i$ and $F_i:\reals\to[0,1]$ as its distribution and CDF,
following statements are equivalent
- $X_1,\ldots,X_n \mbox{ are independent}$
- $\left( \forall H_1, \ldots, H_n\in \algR \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
- $\left( \forall H_1,\ldots,H_n \in \algR \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
- $\left( \forall x\in \reals^n \right) \left( P(X_1\leq x_1,\ldots, X_n\leq x_n) = \prod P(X_i \leq x_i) \right)$
- $\left( \forall x \in \reals^n \right) \left( F(x) = \prod F_i(x_i) \right)$
- $\mu = \mu_1 \times \cdots \times \mu_n$
- (when $\mu$ has density $f$ and each $\mu_i$ has density $f_i$) $\left( \forall x \in \reals^n \right) \left( f(x) = \prod f_i(x_i) \right)$
Independence of random variables with separate $\sigma$-algebra
- given probability space, $\meas{\Omega}{\algk{F}}{P}$
- random variables, $X_1,\ldots,X_n$, each of which is measurable with respect to each of $n$ independent $\sigma$-algebras, $\algk{G}_1\subset \algF$, \ldots, $\algk{G}_n\subset \algF$ respectively, are independent
Independence of random vectors
-
for random vectors, $X_1:\Omega\to\reals^{d_1}$, \ldots, $X_n:\Omega\to\reals^{d_n}$,
having $\mu$ and $F:\reals^{d_1}\times\cdots\times\reals^{d_n}\to[0,1]$ as their distribution and CDF,
with each $X_i$ having $\mu_i$ and $F_i:\reals^{d_i}\to[0,1]$ as its distribution and CDF,
following statements are equivalent
- $X_1,\ldots,X_n \mbox{ are independent}$
- $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P\left(\bigcap X_i^{-1}(H_i)\right) = \prod P\left(X_i^{-1}(H_i)\right) \right)$
- $\left( \forall H_1\in\algR^{d_1}, \ldots, H_n\in \algR^{d_n} \right) \left( P(X_1\in H_1,\ldots, X_n\in H_n) = \prod P(X_i \in H_i) \right)$
- $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( P(X_1\preceq x_1,\ldots, X_n\preceq x_n) = \prod P(X_i \preceq x_i) \right)$
- $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( F(x_1,\ldots,x_n) = \prod F_i(x_i) \right)$
- $\mu = \mu_1 \times \cdots \times \mu_n$
- (when $\mu$ has density $f$ and each $\mu_i$ has density $f_i$) $\left( \forall x_1\in \reals^{d_1},\ldots,x_n\in\reals^{d_n} \right) \left( f(x_1,\ldots,x_n) = \prod f_i(x_i) \right)$
Independence of infinite collection of random vectors
- infinite collection of random vectors for which every finite subcollection is independent, said to be independent
- for independent (countable) collection of random vectors, $\seq{\seq{X_{ni}}_{i=1}^\infty}_{n=1}^\infty$, $\seq{\algk{F}_n}_{n=1}^\infty$ with $\algk{F}_n = \sigma(\seq{X_{ni}}_{i=1}^\infty)$ are independent
Probability evaluation for two independent random vectors
Sequence of random variables
Expected values
-
$\Expect X$ is
- always defined for nonnegative $X$
-
for general case
- $X$ has an expected value if either $\Expect X^+<\infty$ or $\Expect X^-<\infty$ or both, in which case, $\Expect X =\Expect X^+ - \Expect X^-$
- $X$ is integrable if and only if $\Expect |X| <\infty$
-
limits
- if $\seq{X_n}$ is dominated by integrable random variable or they are uniformly integrable, $\Expect X_n$ converges to $\Expect X$ if $X_n$ converges to $X$ in probability
Markov and Chebyshev's inequalities
Jensen's, Hölder's, and Lyapunov's inequalities
- note Hölder's inequality implies Lyapunov's inequality
Maximal inequalities
- define $S_n = \sum_{i=1}^n X_i$
Moments
- if $\Expect |X|^n<\infty$, $\Expect |X|^k<\infty$ for $k<n$
- $\Expect X^n$ defined only when $\Expect|X|^n<\infty$
Moment generating functions
- $n$-th derivative of $M$ with respect to $s$ is $M^{(n)}(s) = \frac{d^n}{ds^n} M(s) = \Expect \left(X^ne^{sX}\right) = \int x^ne^{sx} d\mu$
- thus, $n$-th derivative of $M$ with respect to $s$ at $s=0$ is $n$-th moment of $X$ $$ M^{(n)}(0) = \Expect X^n $$
- for independent random variables, $\seq{X_i}_{i=1}^n$, moment generating function of $\sum X_i$ $$ \prod M_i(s) $$
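a numeric sketch of $M^{(n)}(0) = \Expect X^n$ for the exponential distribution with rate $2$ (an assumed example), using central finite differences and a Monte Carlo comparison:
```python
import numpy as np

# exponential distribution with rate 2: M(s) = 2/(2-s) for s < 2
M = lambda s: 2.0 / (2.0 - s)
h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)                 # ~ M'(0)  = E X   = 1/2
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2     # ~ M''(0) = E X^2 = 1/2
rng = np.random.default_rng(2)
x = rng.exponential(scale=0.5, size=1_000_000)
print(m1, x.mean())                           # both ~ 0.5
print(m2, (x ** 2).mean())                    # both ~ 0.5
```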
Convergence of Random Variables
Convergences of random variables
- indeed, if above equation holds for $A=(-\infty, x)$, it holds for many other subsets
Relations of different types of convergences of random variables
Necessary and sufficient conditions for convergence in probability
- $X_n$ converge to $X$ with probability $1$ if and only if
\[\left( \forall \epsilon>0 \right) \left( \prob{|X_n-X|>\epsilon\mbox{ i.o.}} = \prob{\limsup \{|X_n-X| > \epsilon\} } = 0 \right)\]
- $X_n$ converge to $X$ in probability if and only if
\[\left( \forall \mbox{ subsequence }\seq{X_{n_k}} \right) \left( \exists \mbox{ its subsequence }\seq{X_{n_{k_l}}} \mbox{ converging to } X \mbox{ with probability } 1 \right)\]
Necessary and sufficient conditions for convergence in distribution
\[X_n\Rightarrow X, \mbox{\ie, $X_n$ converge in distribution}\]
if and only if
\[F_n\Rightarrow F, \mbox{\ie, $F_n$ converge weakly}\]
if and only if
\[\left( \forall A = (-\infty, x] \mbox{ with } \mu\{x\} = 0 \right) \left( \lim \mu_n(A) = \mu(A) \right)\]
if and only if
\[\left( \forall x \mbox{ with } \prob{X=x} = 0 \right) \left( \lim \prob{X_n\leq x} = \prob{X\leq x} \right)\]
Strong law of large numbers
- define $S_n = \sum_{i=1}^n X_i$
- strong law of large numbers also called Kolmogorov's law
Weak law of large numbers
- define $S_n = \sum_{i=1}^n X_i$
- because convergence with probability $1$ implies convergence in probability (), strong law of large numbers implies weak law of large numbers
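a Monte Carlo sketch of the law of large numbers along a single sample path (i.i.d. uniform draws on $[0,1]$, mean $1/2$, are an assumed example):
```python
import numpy as np

# running means S_n / n along one path of i.i.d. U[0,1] draws
rng = np.random.default_rng(3)
x = rng.random(1_000_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1000, 100_000, 1_000_000):
    print(n, running_mean[n - 1])   # approaches E X = 1/2
```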
Normal distributions
- assume probability space, $\meas{\Omega}{\algF}{P}$
- note $\Expect X=c$ and $\Var X=\sigma^2$
- called standard normal distribution when $c=0$ and $\sigma=1$
Multivariate normal distributions
- assume probability space, $\meas{\Omega}{\algF}{P}$
- note that $\Expect X=c$ and covariance matrix is $\Sigma$
Lindeberg-Lévy theorem
- define $S_n = \sum_{i=1}^n X_i$
Limit theorems in $\reals^n$
- $\lim \int f d\mu_n = \int f d\mu$ for every bounded continuous $f$
- $\limsup \mu_n(C) \leq \mu(C)$ for every closed $C$
- $\liminf \mu_n(G) \geq \mu(G)$ for every open $G$
- $\lim \mu_n(A) = \mu(A)$ for every $\mu$-continuity set $A$
Central limit theorem
- assume probability space, $\meas{\Omega}{\algF}{P}$, and define $S_n = \sum_{i=1}^n X_i$
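a Monte Carlo sketch of the central limit theorem (i.i.d. $U[0,1]$ draws are an assumed example; scipy is assumed available for the normal CDF):
```python
import numpy as np
from scipy import stats

# standardized sums of i.i.d. U[0,1] draws vs the standard normal CDF
rng = np.random.default_rng(4)
n, reps = 200, 20_000
c, sigma = 0.5, np.sqrt(1.0 / 12.0)           # mean and std of U[0,1]
s = rng.random((reps, n)).sum(axis=1)         # reps independent copies of S_n
z = (s - n * c) / (sigma * np.sqrt(n))
for x in (-1.0, 0.0, 1.5):
    print(x, (z <= x).mean(), stats.norm.cdf(x))  # empirical vs Phi(x)
```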
Convergence of random series
- for independent $\seq{X_n}$, probability of $\sum X_n$ converging is either $0$ or $1$
- below characterize two cases in terms of distributions of individual $X_n$ -- XXX: diagram
- define truncated version of $X_n$ by $X_n^{(c)}$, i.e., $X_n I_{|X_n|\leq c}$
Convex Optimization
Convex Sets
Lines and line segmenets
Affine sets
Relative interiors and boundaries
Convex sets
- convex hull (of course) is convex set
Cones
- convex cone (of course) is convex set
- examples of convex cones: $\prealk{n}$, $\pprealk{n}$, $\possemidefset{n}$, and $\posdefset{n}$
Hyperplanes and half spaces
- hyperplanes and half spaces are convex sets
Euclidean balls and ellipsoids
- Euclidean balls and ellipsoids are convex sets
Norm balls and norm cones
- norm balls and norm cones are convex sets
Polyhedra
Convexity preserving set operations
-
intersection preserves convexity
- for (any) collection of convex sets, $\coll$, $$ \bigcap_{C\in\coll} C $$ is convex set
-
scalar scaling preserves convexity
- for convex set $C$ $$ \alpha C $$ is convex set for any $\alpha\in\reals$
-
sum preserves convexity
- for convex sets $C$ and $D$ $$ C+D $$ is convex set
-
direct product preserves convexity
- for convex sets $C$ and $D$ $$ C\times D $$ is convex set
-
projection preserves convexity
- for convex set $C\subset A \times B$ $$ \set{x\in A}{(\exists y)((x,y)\in C)} $$ is convex
-
image and inverse image by affine function preserve convexity
- for affine function $f:A\to B$ and convex sets $C\subset A$ and $D\subset B$ $$ f(C) \;\& \; f^{-1}(D) $$ are convex
-
image and inverse image by linear-fractional function preserve convexity
- for convex sets $C\subset \reals^n, D\subset \reals^m$ and linear-fractional function, $g:\reals^n\to\reals^m$, i.e., function defined by $g(x) = (Ax+b)/(c^Tx+d)$ for $A\in\reals^{m\times n}$, $b\in\reals^m$, $c\in\reals^n$, and $d\in\reals$ $$ g(C) \ \& \ g^{-1}(D) $$ are convex
Proper cones and generalized inequalities
- solid, i.e., $\interior{K}\neq \emptyset$
- pointed, i.e., $x\in K$ and $-x\in K$ imply $x=0$
- examples of proper cones: $\prealk{n}$ and $\possemidefset{n}$
- (nonstrict) generalized inequality $$ x \preceq_K y \Leftrightarrow y - x\in K $$
- strict generalized inequality $$ x \prec_K y \Leftrightarrow y - x\in \interior{K} $$
- $\preceq_K$ and $\prec_K$ are partial orderings
Convex sets induced by generalized inequalities
- for affine function $f:\reals^n\to\symset{m}$, i.e., $f(x)=A_0 + A_1 x_1 + \cdots + A_n x_n$ for some $A_0,\ldots,A_n\in\symset{m}$, $f^{-1}(\possemidefset{m})$ is convex (by ), i.e., $$ \set{x\in\reals^n}{A_0 + A_1 x_1 + \cdots + A_n x_n \succeq 0} \subset \reals^n $$ is convex
- can negate each matrix $A_i$ and have same results, hence $$ \set{x\in\reals^n}{A_0 + A_1 x_1 + \cdots + A_n x_n \preceq 0} \subset \reals^n $$ is (also) convex
Separating and supporting hyperplanes
Dual cones
- the figure illustrates $x \in K^\ast$ while $z\not\in K^\ast$
Dual norms
-
examples
- dual cone of subspace $V\subset \reals^n$ is orthogonal complement of $V$, $V^\perp$, where $V^\perp=\set{y}{\forall v\in V,v^Ty = 0}$
- $\prealk{n}$ and $\possemidefset{n}$ are self-dual
- dual of norm cone is norm cone associated with dual norm, i.e., if $K=\set{(x,t)\in\reals^{n} \times \reals}{\|x\|\leq t}$ $$ K^\ast=\set{(y,u)\in\reals^{n} \times \reals}{\|y\|_\ast\leq u} $$
Properties of dual cones
- $K^\ast$ is closed and convex
- $K_1\subset K_2 \Rightarrow K_2^\ast \subset K_1^\ast$
- if $\interior{K} \neq \emptyset$, $K^\ast$ is pointed
- if $\closure{K}$ is pointed, $\interior{(K^\ast)} \neq \emptyset$
- $K^{\ast\ast}=(K^\ast)^\ast$ is closure of convex hull of $K$
- if $K$ is closed and convex, $K^{\ast\ast} = K$
- dual of proper cone is proper cone
- for proper cone $K$, $K^{\ast\ast}=K$
Dual generalized inequalities
- $x\preceq_K y$ if and only if $(\forall \lambda \succeq_{K^\ast} 0)(\lambda^T x \leq \lambda^T y)$
- $x\prec_K y$ if and only if $(\forall \lambda \succeq_{K^\ast} 0 \mbox{ with } \lambda\neq0)(\lambda^T x < \lambda^T y)$
- $x\preceq_{K^\ast} y$ if and only if $(\forall \lambda \succeq_{K} 0)(\lambda^T x \leq \lambda^T y)$
- $x\prec_{K^\ast} y$ if and only if $(\forall \lambda \succeq_{K} 0 \mbox{ with } \lambda\neq0)(\lambda^T x < \lambda^T y)$
Theorem of alternative for linear strict generalized inequalities
Convex Functions
Convex functions
- function $f:\reals^n\to\reals$ the domain of which is convex and which satisfies $$ \left( \forall x,y\in \dom f, 0\leq \theta \leq 1 \right) \left( f(\theta x + (1-\theta) y) \leq \theta f(x) + (1-\theta) f(y) \right) $$ said to be convex
- function $f:\reals^n\to\reals$ the domain of which is convex and which satisfies $$ \left( \forall \mbox{ distinct } x,y\in \dom f, 0< \theta < 1 \right) \left( f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y) \right) $$ said to be strictly convex
- function $f:\reals^n\to\reals$ the domain of which is convex where $-f$ is convex, said to be concave
- function $f:\reals^n\to\reals$ the domain of which is convex where $-f$ is strictly convex, said to be strictly concave
Extended real-value extensions of convex functions
-
using extended real-value extensions of convex functions,
can drop ``$\dom f$'' in equations,
e.g.,
- $f$ is convex if and only if its extended-value extension $\tilde{f}$ satisfies $$ \left( \forall x,y\in \reals^n, 0\leq \theta \leq 1 \right) \left( \tilde{f}(\theta x + (1-\theta) y) \leq \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y) \right) $$
- $f$ is strictly convex if and only if its extended-value extension $\tilde{f}$ satisfies $$ \left( \forall \mbox{ distinct } x,y\in \reals^n, 0< \theta < 1 \right) \left( \tilde{f}(\theta x + (1-\theta) y) < \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y) \right) $$
First-order condition for convexity
- differentiable $f$ is convex if and only if $\dom f$ is convex and $$ \left( \forall x,y\in \dom f \right) \left( f(y) \geq f(x) + \nabla f(x) ^T (y-x) \right) $$ (numeric check at end of this section)
- differentiable $f$ is strictly convex if and only if $\dom f$ is convex and $$ \left( \forall \mbox{ distinct } x,y\in \dom f \right) \left( f(y) > f(x) + \nabla f(x) ^T (y-x) \right) $$
-
implies
that
for convex function $f$
- first-order Taylor approximation is global underestimator
-
can derive
global information
from
local information
- e.g., if $\nabla f(x)=0$, $x$ is global minimizer
- explains remarkable properties of convex functions and convex optimization problems
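a minimal numeric check of the first-order condition above (the random convex quadratic $f(x)=x^TPx+q^Tx$ with $P\succeq0$ is an assumed example):
```python
import numpy as np

# first-order Taylor expansion of a convex quadratic never exceeds f(y)
rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
P = A @ A.T                                   # positive semidefinite
q = rng.normal(size=4)
f = lambda x: x @ P @ x + q @ x
grad = lambda x: 2 * P @ x + q
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9   # global underestimator
```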
Second-order condition for convexity
- for twice differentiable $f$: if $\dom f$ is convex and $$ \left( \forall x\in \dom f \right) \left( \nabla^2 f(x) \succ 0 \right) $$ $f$ is strictly convex
Convex function examples
- assume function $f:\reals^n\to\reals$ and $\dom f =\reals^n$ unless specified otherwise
- affine function, i.e., $f(x)=a^Tx +b$ for some $a\in\reals^n$ and $b\in\reals$, is convex
-
quadratic functions
- if $f(x) = x^T Px + q^Tx$
for some $P\in\symset{n}$ and $q\in\reals^n$
- $f$ is convex if and only if $P\succeq0$
- $f$ is strictly convex if and only if $P\succ0$
- exponential function, i.e., $f(x) = \exp(a^Tx+b)$ for some $a\in\reals^n$ and $b\in\reals$, is convex
- power, i.e., $f(x) = x^a$ for some $a\geq1$, is convex on $\ppreals$
- power of absolute value, i.e., $f(x) = |x|^a$ for some $a\geq1$, is convex on $\reals$
- logarithm function, i.e., $f(x) = \log x$, is concave on $\ppreals$
- negative entropy, i.e., $$ f(x) = \left\{\begin{array}{ll} x\log x & \mbox{if } x >0 \\ 0 &\mbox{if } x=0 \end{array}\right. $$ is convex on $\preals$
- norm as function is convex (by definition of norms, i.e., triangle inequality & absolute homogeneity)
- max function, i.e., $f(x)=\max\{x_1,\ldots,x_n\}$, is convex
- quadratic-over-linear function, $f(x,y) = x^2/y$, is convex on $\reals\times \ppreals$
- log-sum-exp, $f(x) = \log(\exp(x_1)+\cdots+\exp(x_n))$, is convex
- geometric mean, $f(x) = (\prod_{i=1}^n x_i )^{1/n}$, is concave on $\pprealk{n}$
- log-determinant, $f(X) = \log \det X$, is concave on $\posdefset{n}$
Sublevel sets and superlevel sets
- every sublevel set of convex function is convex
- and every superlevel set of concave function is convex
-
note, however, converse is not true
- e.g., every sublevel set of $\log$ is convex, but $\log$ is concave
Epigraphs and hypographs
- function is convex if and only if its epigraph is convex
- function is concave if and only if its hypograph is convex
Convexity preserving function operations
-
nonnegative weighted sum preserves convexity
- for convex functions $f_1,\ldots,f_n$ and nonnegative weights $w_1,\ldots, w_n$ $$ w_1 f_1 + \cdots + w_n f_n $$ is convex
-
nonnegative weighted integration preserves convexity
- for measurable set $Y$, $w:Y\to\preals$, and $f:X \times Y\to\reals$ where $f(x,y)$ is convex in $x$ for every $y\in Y$ and measurable in $y$ for every $x\in X$ $$ \int_Y w(y) f(x,y) dy $$ is convex
-
pointwise maximum preserves convexity
- for convex functions $f_1,\ldots,f_n$ $$ \max\{f_1, \ldots, f_n\} $$ is convex
-
pointwise supremum preserves convexity
- for indexed family of convex functions $\indexedcol{f_\lambda}_{\lambda\in\Lambda}$ $$ \sup_{\lambda \in \Lambda} f_\lambda $$ is convex (one way to see this is $\epi \sup_\lambda f_\lambda = \bigcap_\lambda \epi f_\lambda$)
-
composition
-
suppose $g:\reals^n\to\reals^k$, $h:\reals^k\to\reals$, and $f=h\circ g$
- $f$ convex if $h$ convex & nondecreasing in each argument, and $g_i$ convex
- $f$ convex if $h$ convex & nonincreasing in each argument, and $g_i$ concave
- $f$ concave if $h$ concave & nondecreasing in each argument, and $g_i$ concave
- $f$ concave if $h$ concave & nonincreasing in each argument, and $g_i$ convex
-
minimization
- for function $f(x,y)$ convex in $(x,y)$ and convex set $C$ $$ \inf_{y\in C} f(x,y) $$ is convex provided it is bounded below where domain is $\set{x}{(\exists y\in C)((x,y) \in \dom f)}$
-
perspective of convex function preserves convexity
- for convex function $f:X\to\reals$, function $g:X\times \reals \to \reals$ defined by $$ g(x,t) = tf(x/t) $$ with $\dom g = \set{(x,t)}{x/t \in \dom f, t>0}$ is convex
Convex functions examples
-
piecewise-linear function is convex, i.e.
- $\max\{a_1^Tx+b_1,\ldots,a_m^T x + b_m\}$ for some $a_i\in\reals^n$ and $b_i\in\reals$ is convex
-
sum of $k$ largest components is convex, i.e.
- $x_{[1]} + \cdots + x_{[k]}$ where $x_{[i]}$ denotes $i$-th largest component, is convex (since $f(x) = \max\set{x_{i_1}+\cdots+x_{i_k}}{1\leq i_1< i_2<\cdots < i_k\leq n}$)
-
support function of set, i.e.,
- $\sup\set{x^Ty}{y\in A}$ for $A\subset\reals^n$ is convex
-
distance (when measured by arbitrary norm) to farthest point of set
- $\sup\set{\|x-y\|}{y\in A}$ for $A\subset\reals^n$ is convex
-
least-squares cost as function of weights
-
$\inf_{x\in\reals^n} \sum^n_{i=1} w_i(a_i^Tx - b_i)^2$ for some $a_i\in\reals^n$ and $b_i\in\reals$
is concave in $w$
- note that above function equals $\sum_{i=1}^n w_i b_i^2 - \left(\sum_{i=1}^n w_i b_i a_i\right)^T \left( \sum_{j=1}^n w_ja_ja_j^T\right)^{-1} \left(\sum_{i=1}^n w_i b_i a_i\right)$ (when inverse exists), but from this expression concavity is not clear
-
maximum eigenvalue of symmetric matrix
- $\lambda_\mathrm{max}(F(x)) = \sup\set{y^TF(x)y}{\|y\|_2 \leq 1}$ where $F:\reals^n\to \symset{m}$ is linear function in $x$
-
norm of matrix
- $\sup\set{u^TG(x)v}{\|u\|_2 \leq 1, \|v\|_2\leq1}$ where $G:\reals^n\to \reals^{m\times n}$ is linear function in $x$
-
distance (when measured by arbitrary norm) to convex set
- for convex set $C$, $\inf\set{\|x-y\|}{y\in C}$
-
infimum of convex function
subject to linear constraint
- for convex function $h$, $\inf\set{h(y)}{Ay=x}$ is convex (since it is $\inf_y (h(y) + I_{Ay=x}(x,y))$)
-
perspective of Euclidean norm squared
- map $(x,t) \mapsto x^Tx /t$ induces convex function in $(x,t)$ for $t>0$
-
perspective of negative log
- map $(x,t) \mapsto -t \log(x/t)$ induces convex function in $(x,t) \in \pprealk{2}$
-
perspective of convex function
- for convex function $f:\reals^m\to\reals$, function $g:\reals^n\to\reals$ defined by $$ g(x) = (c^T x + d) f((Ax+b)/(c^T x + d)) $$ for some $A\in\reals^{m\times n}$, $b\in\reals^m$, $c\in\reals^n$, and $d\in\reals$ with $\dom g = \set{x}{(Ax+b)/(c^Tx + d)\in \dom f, c^T x + d >0}$ is convex
Conjugate functions
- conjugate function is convex for any function $f$ because it is supremum of linear (hence convex) functions (in $x$) ()
Conjugate function examples
-
strictly convex quadratic function
- for $f:\reals^n \to \preals$ defined by $f(x) = x^TQx/2$ where $Q\in \posdefset{n}$, $$ f^\ast(x)= \sup_y(y^Tx - y^TQy/2) = (y^Tx - y^TQy/2)|_{y=Q^{-1}x} = x^TQ^{-1}x/2 $$ which is also strictly convex quadratic function
-
log-determinant
- for function $f:\posdefset{n} \to \reals$ defined by $f(X) = \log \det X^{-1}$ $$ f^\ast(X) = \sup_{Y\in\posdefset{n}} (\Tr XY + \log \det Y) = \log\det (-X)^{-1} - n $$ where $\dom f^\ast = -\posdefset{n}$
-
indicator function
- for indicator function $I_A:\reals^n\to\{0,\infty\}$ with $A\subset \reals^n$ $$ I_A^\ast(x) = \sup_y (y^Tx - I_A(y)) = \sup \set{y^Tx}{y\in A} $$ which is support function of $A$
-
log-sum-exp function
- for function $f: \reals^n \to \reals$ defined by $f(x) = \log(\sum_{i=1}^n \exp(x_i))$ $$ f^\ast(x) = \sum_{i=1}^n x_i \log x_i + I_{x\succeq 0, \ones^T x = 1}(x) $$
-
norm
- for norm function $f:\reals^n\to\preals$ defined by $f(x)=\|x\|$ $$ f^\ast(x) = \sup_y( {y^Tx - \|y\|}) = I_{\|x\|_\ast\leq1}(x) $$
-
norm squared
- for function $f: \reals^n \to \preals$ defined by $f(x) = \|x\|^2/2$ $$ f^\ast(x) = \|x\|_\ast^2/2 $$
-
differentiable convex function
- for differentiable convex function $f:\reals^n\to\reals$ $$ f^\ast(x)= (y^\ast)^T \nabla f(y^\ast) - f(y^\ast) $$ where $y^\ast = \argsup_y (x^Ty-f(y))$
-
sum of independent functions
- for function $f:\reals^n\times \reals^m \to \reals$ defined by $f(x,y) = f_1(x) + f_2(y)$ where $f_1:\reals^n\to\reals$ and $f_2:\reals^m\to\reals$ $$ f^\ast(x,y) = f_1^\ast(x) + f_2^\ast(y) $$
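a numeric check of the first conjugate example above (the random positive definite $Q$ is an assumption, and scipy's generic optimizer stands in for the supremum):
```python
import numpy as np
from scipy.optimize import minimize

# f(y) = y^T Q y / 2 with Q > 0; compare a numerically computed
# sup_y (x^T y - f(y)) against the closed form x^T Q^{-1} x / 2
rng = np.random.default_rng(6)
A = rng.normal(size=(5, 5))
Q = A @ A.T + 5 * np.eye(5)                   # positive definite
x = rng.normal(size=5)
neg = lambda y: -(x @ y - 0.5 * y @ Q @ y)    # -(x^T y - f(y))
res = minimize(neg, np.zeros(5))
print(-res.fun, 0.5 * x @ np.linalg.solve(Q, x))  # agree to solver tolerance
```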
Convex functions \wrt\ generalized inequalities
- function $f$ satisfying $$ \left( \forall x,y \in \dom f, 0\leq \theta\leq 1 \right) \left( f(\theta x + (1-\theta) y) \preceq_K \theta f(x) + (1-\theta) f(y) \right) $$ called $K$-convex
- function $f$ satisfying $$ \left( \forall x\neq y \in \dom f, 0< \theta< 1 \right) \left( f(\theta x + (1-\theta) y) \prec_K \theta f(x) + (1-\theta) f(y) \right) $$ called strictly $K$-convex
- function $f$ is $K$-convex if and only if for every $w\succeq_{K^\ast}0$, $w^Tf$ is convex
- function $f$ is strictly $K$-convex if and only if for every nonzero $w\succeq_{K^\ast}0$, $w^Tf$ is strictly convex
Matrix convexity
-
examples of matrix convexity
- function of $\reals^{n\times m}$ into $\possemidefset{n}$ defined by $X\mapsto XX^T$ is matrix convex
- function of $\posdefset{n}$ into itself defined by $X\mapsto X^p$ is matrix convex for $1\leq p\leq 2$ or $-1\leq p \leq0$, and matrix concave for $0\leq p\leq1$
- function of $\symset{n}$ into $\posdefset{n}$ defined by $X\mapsto \exp(X)$ is not matrix convex
- quadratic matrix function of $\reals^{m\times n}$ into $\symset{n}$ defined by $X\mapsto X^TAX + B^TX + X^TB + C$ for $A\in\symset{m}$, $B\in\reals^{m\times n}$, and $C\in\symset{n}$ is matrix convex when $A\succeq0$
Convex Optimization Problems
Optimization problems
- $\fobj$, $\fie$, and $\feq$ are objective function, inequality constraint function, \& equality constraint function
- $\fie(x) \preceq 0$ and $\feq(x) = 0$ are inequality constraints and equality constraints
- $\optdomain = \xobj \cap \xie \cap \xeq$ is domain of optimization problem
- $\optfeasset =\set{x\in \optdomain}{\fie(x) \preceq0, \feq(x)=0}$, called feasible set; $x\in\optdomain$ said to be feasible if $x\in\optfeasset$; optimization problem said to be feasible if $\optfeasset\neq \emptyset$
- $p^\ast = \inf\set{\fobj(x)}{x\in\optfeasset}$, called optimal value of optimization problem
- if optimization problem is infeasible, $p^\ast = \infty$ (following convention that infimum of empty set is $\infty$)
- if $p^\ast=-\infty$, optimization problem said to be unbounded
Global and local optimalities
- $x\in \optfeasset$ with $\fobj(x) = p^\ast$, called (global) optimal point
- $X_\mathrm{opt} = \set{x\in \optfeasset}{\fobj(x)=p^\ast}$, called optimal set
- when $X_\mathrm{opt} \neq \emptyset$, we say optimal value is attained or achieved and optimization problem is solvable
- optimization problem is not solvable if $p^\ast = \infty$ or $p^\ast = -\infty$ (converse is not true)
Equivalent optimization problems
-
below two optimization problems are equivalent
- $$ \begin{array}{ll} \mbox{minimize} & -x-y \\ \mbox{subject to} & 2x+y \leq1 \\ & x+2y \leq1 \end{array} $$
- $$ \begin{array}{ll} \mbox{minimize} & -2u-v/3 \\ \mbox{subject to} & 4u+v/3 \leq1 \\ & 2u+2v/3 \leq1 \end{array} $$
- since if $(x^\ast, y^\ast)$ solves first, $(u,v)=(x^\ast/2, 3y^\ast)$ solves second, and if $(u^\ast, v^\ast)$ solves second, $(x,y)=(2u^\ast, v^\ast/3)$ solves first
Change of variables
- given function $\phi:\mathcalfont{Z} \to \xdomain$, optimization problem in can be rewritten as $$ \begin{array}{ll} \mbox{minimize} & \fobj(\phi(z)) \\ \mbox{subject to} & \fie(\phi(z)) \preceq 0 \\ & \feq(\phi(z)) =0 \end{array} $$ where $z\in\mathcalfont{Z}$ is optimization variable
- if $\phi$ is injective and $\optdomain \subset \phi(\mathcalfont{Z})$, above optimization problem and optimization problem in are equivalent
- two optimization problems said to be related by change of variable or substitution of variable $x=\phi(z)$
Convex optimization
- when $\xdomain= \reals^n$, optimization problem can be formulated as $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq 0 \\ & Ax = b \end{array} $$ for some $A\in\reals^{p\times n}$ and $b\in\reals^p$
-
domain of convex optimization problem is convex
- since domains of $\fobj$, $\fie$, and $\feq$ are convex (by definition of convex functions) and intersection of convex sets is convex
-
feasible set of convex optimization problem
is convex
- since sublevel sets of convex functions are convex, solution set of affine equality constraints is affine set (hence convex), and intersection of convex sets is convex
Optimality conditions for convex optimization problems
- $x\in\optdomain$ is optimal if and only if $x\in\optfeasset$ and $$ \left( \forall y \in \optfeasset \right) \left( \nabla \fobj(x)^T(y-x) \geq0 \right) $$
- for unconstrained problems, $x\in\optdomain$ is optimal if and only if $$ \nabla \fobj(x)=0 $$
Optimality conditions for some convex optimization problems
-
unconstrained convex quadratic optimization
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x) = (1/2)x^TPx + q^Tx
\end{array}
$$
where $\xobj=\reals^n$ and $P\in\possemidefset{n}$
-
$x$ is optimal if and only if
$$
\nabla \fobj(x) = Px + q = 0
$$
exist three cases
- if $P\in\posdefset{n}$, exists unique optimum $x^\ast = -P^{-1}q$
- if $q\in\range(P)$, $X_\mathrm{opt}=-P^\dagger q + \nullspace(P)$
- if $q\not\in\range(P)$, $p^\ast = -\infty$
-
analytic centering
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x) = - \sum_{i=1}^m \log (b_i-a_i^Tx)
\end{array}
$$
where $\xobj = \set{x\in\reals^n}{Ax \prec b}$
-
$x$ is optimal if and only if
$$
\nabla \fobj(x) = \sum_{i=1}^m \frac{1}{b_i-a_i^Tx}a_i = 0
$$
exist three cases
- exists unique optimum, which happens if and only if $\set{x}{Ax \prec b}$ is nonempty and bounded
- exist infinitely many optima, in which case, $X_\mathrm{opt}$ is affine set
- exists no optimum, which happens if and only if $\fobj$ is unbounded below
-
convex optimization problem with equality constraints only
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x)
\\
\mbox{subject to}
& Ax =b
\end{array}
$$
where $\xdomain=\reals^n$
- $x$ is optimal if and only if $$ \nabla \fobj(x) \perp \nullspace(A) $$ or equivalently, exists $\nu\in\reals^p$ such that $$ \nabla \fobj(x) = A^T\nu $$
Linear programming
- can transform above LP into standard form LP $$ \begin{array}{ll} \mbox{minimize} & \tilde{c}^T\tilde{x} \\ \mbox{subject to} & \tilde{A}\tilde{x} = \tilde{b} \\ & \tilde{x} \succeq0 \end{array} $$
LP examples
-
diet problem
-
find amounts of $n$ different foods to minimize purchase cost
while satisfying nutritional requirements
- assume exist $n$ foods and $m$ nutrients; $c_i$ is cost of food $i$, $A_{ji}$ is amount of nutrient $j$ contained in unit quantity of food $i$, $b_j$ is required amount of nutrient $j$
- diet problem can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax \succeq b \\ & x\succeq0 \end{array} $$
-
Chebyshev center of polyhedron
- find largest Euclidean ball contained in polyhedron
- assume polyhedron is $\set{x\in\reals^n}{a_i^Tx \leq b_i, i=1,\ldots, m}$
- problem of finding Chebyshev center of polyhedron can be formulated as LP $$ \begin{array}{ll} \mbox{maximize} & r \\ \mbox{subject to} & a_i^T x + r\|a_i\|_2 \leq b_i,\quad i=1,\ldots,m \end{array} $$ where optimization variables are $x\in\reals^n$ and $r\in\reals$ (numeric sketch after these examples)
-
piecewise-linear minimization
- minimize maximum of affine functions
- assume $m$ affine functions $a_i^Tx + b_i$
- piecewise-linear minimization problem can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & t \\ \mbox{subject to} & a_i^Tx + b_ i \leq t,\quad i=1,\ldots,m \end{array} $$
-
linear-fractional program
$$
\begin{array}{ll}
\mbox{minimize} &
(
c^T x + d
)
/
(
e^T x + f
)
\\
\mbox{subject to} &
Gx \preceq h
\\ &
Ax = b
\end{array}
$$
- if feasible set is nonempty, can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & c^T y + dz \\ \mbox{subject to} & Gy - hz \preceq0 \\ & Ay-bz = 0 \\ & e^Ty + fz = 1 \\ & z\geq0 \end{array} $$
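a minimal numeric sketch of the Chebyshev-center LP above, using scipy's `linprog` (assumed available); the unit square is an assumed example polyhedron:
```python
import numpy as np
from scipy.optimize import linprog

# Chebyshev center of the unit square {x : 0 <= x_i <= 1}:
# maximize r subject to a_i^T x + r ||a_i||_2 <= b_i, variables z = (x, r)
a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
A_ub = np.hstack([a, np.linalg.norm(a, axis=1, keepdims=True)])
c = np.array([0.0, 0.0, -1.0])                # maximize r <=> minimize -r
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None), (None, None), (0, None)])
print(res.x)                                  # center ~ (0.5, 0.5), radius ~ 0.5
```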
Quadratic programming
- when $P=0$, QP reduces to LP, hence LP is specialization of QP
QP examples
-
least-squares (LS) problems
- LS can be formulated as QP $$ \begin{array}{ll} \mbox{minimize} & \|Ax-b\|_2^2 \end{array} $$
-
distance between two polyhedra
- assume two polyhedra $\set{x\in\reals^n}{Ax\preceq b, Cx =d}$ and $\set{x\in\reals^n}{\tilde{A}x\preceq \tilde{b}, \tilde{C}x =\tilde{d}}$
- problem of finding distance between two polyhedra can be formulated as QP $$ \begin{array}{ll} \mbox{minimize} & \|x-y\|_2^2 \\ \mbox{subject to} & Ax\preceq b, \quad Cx =d \\ & \tilde{A}y\preceq \tilde{b}, \quad \tilde{C}y =\tilde{d} \end{array} $$
Quadratically constrained quadratic programming
- when $P_i=0$ for $i=1,\ldots,m$, QCQP reduces to QP, hence QP is specialization of QCQP
Second-order cone programming
- when $b_i=0$, SOCP reduces to QCQP, hence QCQP is specialization of SOCP
SOCP examples
-
robust linear program
-
minimize $c^T x$
while satisfying
$\tilde{a}_i^T x \leq b_i$
for every $\tilde{a}_i \in \set{a_i+P_iu}{\|u\|_2\leq1}$
where $P_i\in\symset{n}$
- can be formulated as SOCP $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & a_i^T x + \|P_i^T x\|_2 \leq b_i \end{array} $$ (numeric sketch after these examples)
-
linear program with random constraints
-
minimize $c^T x$
while satisfying
$\tilde{a}_i^T x \leq b_i$
with probability no less than $\eta$
where $\tilde{a}_i \sim \normal(a_i,\Sigma_i)$
- can be formulated as SOCP $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & a_i^T x + \Phi^{-1}(\eta)\|\Sigma_i^{1/2} x\|_2 \leq b_i \end{array} $$
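a sketch of the robust-LP SOCP above in cvxpy (assumed installed); the problem data are randomly generated assumptions, and a norm bound on $x$ is added only to keep the random instance bounded:
```python
import numpy as np
import cvxpy as cp

# minimize c^T x subject to a_i^T x + ||P_i^T x||_2 <= b_i
rng = np.random.default_rng(7)
m, n = 6, 3
a = rng.normal(size=(m, n))
P = 0.1 * rng.normal(size=(m, n, n))
b = np.abs(rng.normal(size=m)) + 1.0          # keeps x = 0 feasible
c = rng.normal(size=n)

x = cp.Variable(n)
constraints = [a[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
constraints.append(cp.norm(x, 2) <= 10.0)     # rules out unboundedness
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print(prob.status, x.value)
```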
Geometric programming
Geometric programming in convex form
- geometric program in is not convex optimization problem (as stated)
- however, can be transformed to equivalent convex optimization problem by change of variables and transformation of functions
Convex optimization with generalized inequalities
- problem in reduces to convex optimization problem in when $q=1$ and $K_1=\prealk{m}$, hence convex optimization is specialization of convex optimization with generalized inequalities
- like convex optimization
Conic programming
- can transform above CP to standard form CP $$ \begin{array}{ll} \mbox{minimize} & \tildefobj(X) \\ \mbox{subject to} & \tildefeq (X) = 0 \\ & X \succeq_{K} 0 \end{array} $$
- cone program is one of simplest convex optimization problems with generalized inequalities
Semidefinite programming
- above inequality, called linear matrix inequality (LMI)
- can transform SDP to standard form SDP $$ \begin{array}{ll} \mbox{minimize} & \Tr (CX) \\ \mbox{subject to} & \Tr (A_iX) = b_i\quad i=1,\ldots,p \\ & X \succeq 0 \end{array} $$ where $\xdomain=\possemidefset{n}$ and $C,A_1,\ldots,A_p\in\symset{n}$ and $b_i\in\reals$
SDP examples
- LP
-
SOCP
- SOCP in is equivalent to $$ \begin{array}{ll} \mbox{minimize} & f^T x \\ \mbox{subject to} & Fx = g \\ & \begin{my-matrix}{cc} c_i^Tx + d_i & x^TA_i^T + b_i^T \\ A_ix + b_i & (c_i^Tx + d_i)I_{n_i} \end{my-matrix} \succeq 0 \quad i=1,\ldots,m \end{array} $$ which can be transformed to SDP in , thus, SDP reduces to SOCP
- hence, SOCP is specialization of SDP
Determinant maximization problems
- if $l=1$, $C_1=\cdots=C_n=0$, $D=1$, max-det problem reduces to SDP, hence SDP is specialization of max-det problem
Diagrams for containment of convex optimization problems
- the figure shows containment relations among convex optimization problems
- vertical lines ending with filled circles indicate existence of direct reductions, i.e., optimization problem transformations to special cases
Duality
Lagrangian
- $\lambda$, called Lagrange multiplier associated with inequality constraints $\fie(x)\preceq0$
- $\lambda_i$, called Lagrange multiplier associated with $i$-th inequality constraint $\fie_i(x)\leq0$
- $\nu$, called Lagrange multiplier associated with equality constraints $\feq(x)=0$
- $\nu_i$, called Lagrange multiplier associated with $i$-th equality constraint $\feq_i(x)=0$
- $\lambda$ and $\nu$, called dual variables or Lagrange multiplier vectors associated with the optimization problem
Lagrange dual functions
-
$g$ is (always) concave function (even when optimization problem is not convex)
- since it is pointwise infimum of affine (hence concave) functions
- $g(\lambda,\nu)$ provides lower bound for optimal value of associated optimization problem, i.e., $$ g(\lambda,\nu) \leq p^\ast $$ for every $\lambda\succeq0$
- $(\lambda,\nu) \in \set{(\lambda,\nu)}{\lambda\succeq0, g(\lambda,\nu)>-\infty}$, said to be dual feasible
Dual function examples
-
LS solution of linear equations
$$
\lssollineqs{primal}
$$
- Lagrangian - $L(x,\nu) = x^T x + \nu^T(Ax-b)$
- Lagrange dual function $$ \lssollineqs{dual fcn} $$
-
standard form LP
$$
\begin{array}{ll}
\mbox{minimize} &
c^Tx
\\
\mbox{subject to} &
Ax = b
\\ &
x\succeq 0
\end{array}
$$
- Lagrangian - $L(x,\lambda,\nu) = c^T x - \lambda^T x + \nu^T(Ax-b)$
-
Lagrange dual function
$$
g(\lambda,\nu) = \left\{\begin{array}{ll}
-b^T\nu & A^T\nu - \lambda + c = 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{(A^T\nu + c,\nu)}{A^T\nu +c \succeq0}$ (numeric check of duality at end of these examples)
-
maximum cut, sometimes called max-cut, problem, which is NP-hard
$$
\begin{array}{ll}
\mbox{minimize} &
x^T W x
\\
\mbox{subject to} &
x_i^2 = 1
\end{array}
$$
where $W\in\symset{n}$
- Lagrangian - $L(x,\nu) = x^T(W+\diag(\nu))x - \ones^T\nu$
-
Lagrange dual function
$$
g(\nu) = \left\{\begin{array}{ll}
-\ones^T\nu
& W + \diag(\nu) \succeq 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{\nu}{W+\diag(\nu)\succeq0}$
-
some trivial problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(x)
\\
\mbox{subject to} &
x=0
\end{array}
$$
- Lagrangian - $L(x,\nu) =f(x)+\nu^Tx$
-
Lagrange dual function
$$
g(\nu) = \inf_{x\in\reals^n} (f(x)+\nu^Tx)
= -\sup_{x\in\reals^n} ((-\nu)^Tx-f(x))
= - f^\ast(-\nu)
$$
- hence, set of dual feasible points is $-\dom f^\ast$, and for every $f:\reals^n\to\reals$ and $\nu\in\reals^n$ $$ -f^\ast(-\nu) \leq f(0) $$
-
minimization with linear inequality and equality constraints
$$
\begin{array}{ll}
\mbox{minimize} &
f(x)
\\
\mbox{subject to} &
Ax\preceq b
\\ &
Cx= d
\end{array}
$$
- Lagrangian - $L(x,\lambda, \nu) = f(x) + \lambda^T(Ax-b) + \nu^T(Cx-d)$
-
Lagrange dual function
$$
g(\lambda,\nu) = -b^T\lambda - d^T\nu - f^\ast(-A^T \lambda - C^T\nu)
$$
- hence, set of dual feasible points is $\set{(\lambda,\nu)}{-A^T\lambda - C^T\nu \in \dom f^\ast, \lambda\succeq 0}$
-
equality constrained norm minimization
$$
\begin{array}{ll}
\mbox{minimize} &
\|x\|
\\
\mbox{subject to} &
Ax = b
\end{array}
$$
- Lagrangian - $L(x,\nu) = \|x\| + \nu^T(Ax-b)$
-
Lagrange dual function
$$
g(\nu) = -b^T\nu -\sup_{x\in\reals^n} ((-A^T\nu)^Tx - \|x\|)
= \left\{\begin{array}{ll}
-b^T \nu&\|A^T\nu\|_\ast\leq1
\\
- \infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{\nu}{\|A^T\nu\|_\ast \leq1}$
-
entropy maximization
$$
\entmax{primal}
$$
where domain of objective function is $\pprealk{n}$
- Lagrangian - $L(x,\lambda,\nu) = \sum_{i=1}^n x_i\log x_i + \lambda^T(Ax-b) + \nu(\ones^Tx-1)$
- Lagrange dual function $$ g(\lambda,\nu) = \entmax{dual fcn} $$ obtained using $f^\ast(y) = \sum_{i=1}^n \exp(y_i-1)$ where $a_i$ is $i$-th column vector of $A$
-
minimum volume covering ellipsoid
$$
\minvolcovering{primal}
$$
where domain of objective function is $\posdefset{n}$
- Lagrangian - $L(X,\lambda) = -\log \det X + \sum_{i=1}^m \lambda_i(a_i^T X a_i - 1)$
- Lagrange dual function $$ g(\lambda) = \minvolcovering{dual fcn} $$ obtained using $f^\ast(Y) = -\log\det(-Y) - n$
Best lower bound
- for every $(\lambda,\nu)$ with $\lambda\succeq 0$, Lagrange dual function $g(\lambda,\nu)$ (in ) provides lower bound for optimal value $p^\ast$ of optimization problem in
-
natural question to ask is
- how good is the lower bound?
- what is best lower bound we can achieve?
- these questions lead to definition of Lagrange dual problem
Lagrange dual problems
- original problem in , (sometimes) called primal problem
- domain is $\reals^m\times \reals^p$
- dual feasibility defined on page~, i.e., $(\lambda,\nu)$ satisfying $\lambda \succeq 0$ and $g(\lambda,\nu) > -\infty$, indeed means feasibility for Lagrange dual problem
- $d^\ast = \sup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0}$, called dual optimal value
- $(\lambda^\ast,\nu^\ast) = \argsup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0}$, said to be dual optimal or called optimal Lagrange multipliers (if exists)
- Lagrange dual problem in is convex optimization problem (even when original problem is not) since it maximizes (always) concave function $g(\lambda,\nu)$
Making dual constraints explicit dual problems
- (our specific) way we define Lagrange dual function in as function $g$ from $\reals^m \times \reals^p$ into $\reals\cup\{-\infty\}$, i.e., $\dom g = \reals^m\times\reals^p$
- however, in many cases, feasible set $\set{(\lambda,\nu)}{\lambda \succeq 0,\; g(\lambda,\nu) > -\infty}$ is proper subset of $\reals^m\times\reals^p$
- can make this implicit feasibility condition explicit by adding it as constraint (as shown in following examples)
Lagrange dual problems associated with LPs
-
standard form LP
- primal problem $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax = b \\ & x\succeq 0 \end{array} $$
-
Lagrange dual problem
$$
\begin{array}{ll}
\mbox{maximize} &
g(\lambda,\nu) = \left\{\begin{array}{ll}
-b^T\nu & A^T\nu - \lambda + c = 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
\\
\mbox{subject to} &
\lambda \succeq 0
\end{array}
$$
(refer to page~
for Lagrange dual function)
- can make dual feasibility explicit by adding it to constraints as mentioned on page~ $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu \\ \mbox{subject to} & \lambda \succeq 0 \\ & A^T\nu - \lambda + c = 0 \end{array} $$
- can further simplify problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu \\ \mbox{subject to} & A^T\nu + c \succeq 0 \end{array} $$
- last problem is inequality form LP
- all three problems are equivalent, but not identical
- will, however, with abuse of terminology, refer to all three problems as Lagrange dual problem
-
inequality form LP
- primal problem $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax \preceq b \end{array} $$
- Lagrangian $$ L(x,\lambda) = c^Tx + \lambda^T(Ax-b) $$
- Lagrange dual function $$ g(\lambda) = -b^T\lambda + \inf_{x\in\reals^n} (c+A^T\lambda)^T x = \left\{\begin{array}{ll} -b^T\lambda & A^T\lambda + c =0 \\ -\infty & \mbox{otherwise} \end{array}\right. $$
-
Lagrange dual problem
$$
\begin{array}{ll}
\mbox{maximize} &
g(\lambda)
= \left\{\begin{array}{ll}
-b^T\lambda & A^T\lambda + c =0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
\\
\mbox{subject to} &
\lambda \succeq 0
\end{array}
$$
- can make dual feasibility explicit by adding it to constraints as mentioned on page~ $$ \begin{array}{ll} \mbox{maximize} & -b^T\lambda \\ \mbox{subject to} & A^T\lambda + c = 0 \\ & \lambda \succeq 0 \end{array} $$
- dual problem is standard form LP
- thus, dual of standard form LP is inequality form LP and vice versa
- also, for both cases, dual of dual is same as primal problem
Lagrange dual problem of equality constrained optimization problem
- equality constrained optimization problem $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & Ax = b \end{array} $$
- dual function $$ \begin{eqnarray*} g(\nu) & = & \inf_{x\in\dom \fobj} (\fobj(x) + \nu^T(Ax-b)) = -b^T\nu - \sup_{x\in\dom \fobj}(-\nu^TAx -\fobj(x)) \\ & = & -b^T\nu - {\fobj}^\ast(-A^T\nu) \end{eqnarray*} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu - {\fobj}^\ast(-A^T\nu) \end{array} $$
Lagrange dual problem associated with equality constrained quadratic program
-
strictly convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = x^TPx + q^T x + r
\\
\mbox{subject to} &
Ax=b
\end{array}
$$
where $P\in\posdefset{n}$
- conjugate function of objective function $$ {\fobj}^\ast(y) = (y-q)^TP^{-1}(y-q)/4 - r = y^TP^{-1}y/4 -q^TP^{-1}y/2 + q^TP^{-1}q/4 -r $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -\nu^T (AP^{-1}A^T)\nu /4 -(b + A P^{-1} q/2)^T\nu - q^TP^{-1}q/4 +r \end{array} $$
Lagrange dual problems associated with nonconvex quadratic problems
-
primal problem
$$
\noncvxquadprob{primal}
$$
where $A\in\symset{n}$, $A\not\in\possemidefset{n}$, and $b\in\reals^n$
- since $A\not\succeq 0$, not convex optimization problem
- sometimes called trust region problem, arising when minimizing second-order approximation of function over bounded region
- Lagrange dual function $$ g(\lambda) = \noncvxquadprob{dual fcn} $$ where $(A+\lambda I)^\dagger$ is pseudo-inverse of $A+\lambda I$
-
Lagrange dual problem
$$
\noncvxquadprob{dual}
$$
where optimization variable is $\lambda \in\reals$
- note we do not need constraint $\lambda \geq0$ since it is implied by $A+\lambda I \succeq 0$
- though not obvious from its appearance, it is (of course) convex optimization problem (by definition of Lagrange dual function, i.e., )
- can be expressed as $$ \begin{array}{ll} \mbox{maximize} & -\sum_{i=1}^n (q_i^Tb)^2/(\lambda_i + \lambda) - \lambda \\ \mbox{subject to} & \lambda \geq - \lambda_\mathrm{min}(A) \end{array} $$ where $\lambda_i$ and $q_i$ are eigenvalues and corresponding orthonormal eigenvectors of $A$; when $\lambda_i + \lambda=0$ for some $i$, we interpret $(q_i^Tb)^2/0$ as $0$ if $q_i^Tb=0$ and $\infty$ otherwise
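- below is minimal numerical sketch of this scalar dual (NumPy assumed available); since the macro above hides the exact primal form, the sketch assumes the primal is minimize $x^TAx + 2b^Tx$ subject to $x^Tx\leq 1$, which is consistent with the dual expression above

```python
# Sketch: the scalar trust-region dual, assuming the primal form
# minimize x^T A x + 2 b^T x  s.t.  x^T x <= 1  (an assumption here).
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric, indefinite
b = rng.standard_normal(n)

lam_i, Q = np.linalg.eigh(A)          # eigenvalues, orthonormal eigenvectors
qb = Q.T @ b

def g(lam):
    # dual objective: -sum (q_i^T b)^2 / (lam_i + lam) - lam
    return -np.sum(qb ** 2 / (lam_i + lam)) - lam

# crude grid maximization over lam > -lam_min(A)
grid = np.linspace(-lam_i[0] + 1e-6, -lam_i[0] + 10, 10000)
d_grid = max(g(l) for l in grid)

# crude primal upper bound by sampling the unit ball
xs = rng.standard_normal((100000, n))
xs /= np.maximum(1.0, np.linalg.norm(xs, axis=1))[:, None]
p_ub = np.min(np.einsum('ij,jk,ik->i', xs, A, xs) + 2 * xs @ b)

print(d_grid, p_ub)     # d_grid <= p* <= p_ub (close, by strong duality)
```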
Weak duality
- since $g(\lambda,\nu)\leq p^\ast$ for every $\lambda\succeq 0$, we have $$ d^\ast = \sup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0} \leq p^\ast $$
- $d^\ast$ is best lower bound for primal problem that can be obtained from Lagrange dual function (by definition)
-
weak duality holds even when $d^\ast$ or/and $p^\ast$ are not finite, e.g.
- if primal problem is unbounded below so that $p^\ast=-\infty$, must have $d^\ast = -\infty$, i.e., dual problem is infeasible
- conversely, if dual problem is unbounded above so that $d^\ast = \infty$, must have $p^\ast=\infty$, i.e., primal problem is infeasible
Optimal duality gap
-
dual optimal value sometimes used as lower bound for optimal value of problem which is difficult to solve
-
for example,
dual problem
of max-cut problem (on page~),
which is NP-hard,
is
$$
\begin{array}{ll}
\mbox{maximize} &
-\ones^T \nu
\\
\mbox{subject to} &
W + \diag(\nu) \succeq 0
\end{array}
$$
where optimization variable is $\nu\in\reals^n$
- the dual problem can be solved very efficiently using polynomial time algorithms while primal problem cannot be solved unless $n$ is very small
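- a minimal sketch of this bound (CVXPY assumed available; $n$ kept tiny so brute-force comparison is possible)

```python
# Sketch: SDP dual lower bound for max-cut vs. brute-force primal value.
import numpy as np
import cvxpy as cp
from itertools import product

rng = np.random.default_rng(2)
n = 8
W = rng.standard_normal((n, n)); W = (W + W.T) / 2

# dual: maximize -1^T nu  s.t.  W + diag(nu) >= 0
nu = cp.Variable(n)
prob = cp.Problem(cp.Maximize(-cp.sum(nu)), [W + cp.diag(nu) >> 0])
prob.solve()

# brute force over {-1,1}^n, feasible only because n is tiny here
p_star = min(np.array(s) @ W @ np.array(s) for s in product([-1, 1], repeat=n))
print(prob.value, p_star)    # prob.value <= p_star
```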
Strong duality
-
strong duality does not hold in general
- if it always held, max-cut problem, which is NP-hard, could be solved in polynomial time, which would be one of biggest breakthroughs in field of theoretical computer science
- may mean some of strongest cryptography methods, e.g., homomorphic encryption, could be broken
Slater's theorem
- exist many conditions which guarantee strong duality, called constraint qualifications; one of them is Slater's condition
- such condition, called Slater's condition
- such point, (sometimes) said to be strictly feasible
Strong duality for LS solution of linear equations
- primal problem $$ \lssollineqs{primal} $$
- dual problem $$ \lssollineqs{dual} $$ (refer to page~ for Lagrange dual function)
-
“dual is always feasible''
and
“primal is feasible $\Rightarrow$ Slater's condition holds'',
thus
Slater's theorem ()
implies,
exist only three cases
- $(d^\ast = p^\ast \in \reals)$ or $(d^\ast \in \reals\:\&\: p^\ast = \infty)$ or $(d^\ast = p^\ast = \infty)$
- if primal is infeasible, then $b\not\in\range(A)$, thus exists $z$ such that $A^Tz=0$ and $b^Tz \neq0$, and line $\set{tz}{t\in\reals}$ makes dual problem unbounded above, hence $d^\ast=\infty$
- hence, strong duality always holds, i.e., $(d^\ast= p^\ast \in \reals)$ or $(d^\ast = p^\ast = \infty)$
Strong duality for LP
- every LP either is infeasible or satisfies Slater's condition
-
dual of LP is LP,
hence, Slater's theorem ()
implies
- if primal is feasible, either $(d^\ast=p^\ast= -\infty)$ or $(d^\ast=p^\ast\in\reals)$
- if dual is feasible, either $(d^\ast=p^\ast= \infty)$ or $(d^\ast=p^\ast\in\reals)$
-
only other case left is $(d^\ast=-\infty\;\&\;p^\ast= \infty)$
- indeed, this pathological case can happen
Strong duality for entropy maximization
- primal problem $$ \entmax{primal} $$
- dual problem (refer to page~ for Lagrange dual function) $$ \entmax{dual} $$
- dual problem is feasible, hence, Slater's theorem () implies, if exists $x\succ 0$ with $Ax \preceq b$ and $\ones^T x =1$, strong duality holds, and indeed $d^\ast=p^\ast\in\reals$
- by the way, can simplify dual problem by maximizing dual objective function over $\nu$ $$ \entmax{simplied dual} $$ which is geometric program in convex form () with nonnegativity constraint
Strong duality for minimum volume covering ellipsoid
- primal problem $$ \minvolcovering{primal} $$ where $\optdomain=\posdefset{n}$
- dual problem $$ \minvolcovering{dual} $$ (refer to page~ for Lagrange dual function)
- $X=\alpha I$ with large enough $\alpha>0$ satisfies primal's constraints, hence Slater's condition always holds, thus, strong duality always holds, i.e., $(d^\ast = p^\ast \in \reals)$ or $(d^\ast = p^\ast = -\infty)$
- in fact, $\range(a_1,\ldots,a_m) = \reals^n$ if and only if $d^\ast=p^\ast\in\reals$
Strong duality for trust region nonconvex quadratic problems
- one of rare occasions in which strong duality holds for nonconvex problems
- primal problem $$ \noncvxquadprob{primal} $$ where $A\in\symset{n}$, $A\not\in\possemidefset{n}$, and $b\in\reals^n$
- Lagrange dual problem (page~) $$ \noncvxquadprob{dual} $$
- strong duality always holds and $d^\ast=p^\ast\in\reals$ (since dual problem is feasible - large enough $\lambda$ satisfies dual constraints)
- in fact, exists stronger result - strong duality holds for optimization problem with quadratic objective and one quadratic inequality constraint, provided Slater's condition holds
Matrix games using mixed strategies
-
matrix game - consider game with two players $A$ and $B$
- player $A$ makes choice $1\leq a\leq n$, player $B$ makes choice $1\leq b\leq m$, then player $A$ makes payment of $P_{ab}$ to player $B$
- matrix $P\in\reals^{n\times m}$, called payoff matrix
- player $A$ tries to pay as little as possible & player $B$ tries to receive as much as possible
- players use randomized or mixed strategies, i.e., each player makes choice randomly and independently of other player's choice according to probability distributions $$ \Prob(a=i) = u_i\quad 1\leq i\leq n \qquad \Prob(b=j) = v_j\quad 1\leq j\leq m $$
- expected payoff (from player $A$ to player $B$) $$ \sum_i \sum_j u_iv_jP_{ij} = u^TPv $$
-
assume player $A$'s strategy is known to player $B$
- player $B$ will choose $v$ to maximize $u^TPv$ $$ \sup\set{u^TPv}{v\succeq 0,\; \ones^Tv=1} = \max_{1\leq j\leq m} (P^Tu)_j $$
- player $A$ (assuming that player $B$ will employ above strategy to maximize payment) will choose $u$ to minimize payment $$ \begin{array}{ll} \mbox{minimize} & \max_{1\leq j\leq m} (P^Tu)_j \\ \mbox{subject to} & u\succeq 0\quad \ones^Tu=1 \end{array} $$
-
assume player $B$'s strategy is known to player $A$
- then player $B$ will do same to maximize payment (assuming that player $A$ will employ such strategy to minimize payment) $$ \begin{array}{ll} \mbox{maximize} & \min_{1\leq i\leq n} (Pv)_i \\ \mbox{subject to} & v\succeq 0\quad \ones^Tv=1 \end{array} $$
Strong duality for matrix games using mixed strategies
- in matrix game, can guess that in first problem player $B$ has advantage over player $A$ because $A$'s strategy is exposed to $B$ (and vice versa in second problem), hence optimal value of first problem is greater than that of second problem
- surprisingly, neither player has advantage over the other, i.e., optimal values of two problems are same - will show this
- first observe both problems are (convex) piecewise-linear optimization problems
-
formulate first problem as LP
$$
\begin{array}{ll}
\mbox{minimize} &
t
\\
\mbox{subject to} &
u\succeq 0 \quad \ones^T u =1 \quad P^T u \preceq t\ones
\end{array}
$$
- Lagrangian $$ L(u,t,\lambda_1, \lambda_2,\nu) = \nu + (1-\ones^T\lambda_1)t + (P\lambda_1 - \nu \ones - \lambda_2)^Tu $$
- Lagrange dual function $$ g(\lambda_1, \lambda_2,\nu) = \left\{\begin{array}{ll} \nu & \ones^T\lambda_1 = 1 \;\&\; P\lambda_1 - \nu \ones = \lambda_2 \\ -\infty & \mbox{otherwise} \end{array}\right. $$
- Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & \nu \\ \mbox{subject to} & \ones^T\lambda_1 = 1 \quad P\lambda_1 - \nu \ones = \lambda_2 \\ & \lambda_1 \succeq 0 \quad \lambda_2 \succeq 0 \end{array} $$
- eliminating $\lambda_2$ gives below Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & \nu \\ \mbox{subject to} & \lambda_1 \succeq 0 \quad \ones^T\lambda_1 = 1 \quad P\lambda_1 \succeq \nu \ones \end{array} $$ which is equivalent to second problem in matrix game
- weak duality confirms that “player who knows other player's strategy has advantage or is on par''
- moreover, primal problem satisfies Slater's condition, hence strong duality always holds, and dual is feasible, hence $d^\ast=p^\ast\in\reals$, i.e., regardless of who knows other player's strategy, no player has advantage
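- a minimal sketch verifying this numerically (SciPy assumed available; payoff matrix made up): both LPs are solved and their optimal values agree up to solver tolerance

```python
# Sketch: both mixed-strategy matrix-game LPs have the same optimal value.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, m = 4, 5
P = rng.standard_normal((n, m))

# player A: minimize t  s.t.  P^T u <= t 1, 1^T u = 1, u >= 0; vars (u, t)
res_A = linprog(np.r_[np.zeros(n), 1.0],
                A_ub=np.c_[P.T, -np.ones(m)], b_ub=np.zeros(m),
                A_eq=np.r_[np.ones(n), 0.0][None, :], b_eq=[1.0],
                bounds=[(0, None)] * n + [(None, None)])

# player B: maximize s  s.t.  P v >= s 1, 1^T v = 1, v >= 0; vars (v, s)
res_B = linprog(np.r_[np.zeros(m), -1.0],
                A_ub=np.c_[-P, np.ones(n)], b_ub=np.zeros(n),
                A_eq=np.r_[np.ones(m), 0.0][None, :], b_eq=[1.0],
                bounds=[(0, None)] * m + [(None, None)])

print(res_A.fun, -res_B.fun)    # equal: neither player has an advantage
```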
Geometric interpretation of duality
- assume (not necessarily convex) optimization problem in
- define graph $$ G = \set{(\fie(x), \feq(x), \fobj(x))}{x\in\optdomain} \subset \reals^m \times \reals^p \times \reals $$
- for every $\lambda\succeq 0$ and $\nu$ $$ \begin{eqnarray*} p^\ast &=& \inf\set{t}{(u,v,t) \in G,\ u\preceq 0,\ v = 0} \\ & \geq & \inf\set{t+\lambda^Tu + \nu^T v}{(u,v,t) \in G,\ u\preceq 0,\ v = 0} \\ & \geq & \inf\set{t+\lambda^Tu + \nu^T v}{(u,v,t) \in G} = g(\lambda,\nu) \end{eqnarray*} $$ where second inequality comes from $\set{(u,v,t)}{(u,v,t) \in G,\ u\preceq 0,\ v = 0} \subset G$
- above establishes weak duality using graph
- last equality implies that for every $(u,v,t)\in G$ $$ (\lambda, \nu, 1)^T (u,v,t) \geq g(\lambda,\nu) $$ hence if $g(\lambda,\nu) > -\infty$, $(\lambda, \nu, 1)$ and $g(\lambda,\nu)$ define nonvertical supporting hyperplane for $G$ - nonvertical because third component is nonzero
- the figure shows $G$ as area inside closed curve contained in $\reals^m\times\reals^p\times\reals$ with $m=1$ and $p=0$, together with primal optimal value $p^\ast$ and supporting hyperplane $\lambda u + t = g(\lambda)$
- the figure shows three hyperplanes determined by three values for $\lambda$, one of which $\lambda^\ast$ is optimal solution for dual problem
Epigraph interpretation of duality
- define extended graph over $G$ - sort of epigraph of $G$ $$ \begin{eqnarray*} H &=& G + \preals^m \times \{0\} \times \preals \\ & = & \set{(u, v, t)}{x\in\optdomain, \fie(x) \preceq u, \feq(x) = v, \fobj(x)\leq t } \end{eqnarray*} $$
- if $\lambda\succeq 0$, $g(\lambda,\nu) = \inf\set{(\lambda,\nu,1)^T(u,v,t)}{(u,v,t) \in H}$, thus $$ (\lambda,\nu,1)^T (u,v,t) \geq g(\lambda,\nu) $$ defines nonvertical supporting hyperplane for $H$
- now $p^\ast = \inf\set{t}{(0,0,t)\in H}$, hence $(0,0,p^\ast) \in \boundary H$, hence $$ p^\ast =(\lambda,\nu,1)^T (0,0,p^\ast) \geq g(\lambda,\nu) $$
- once again establishes weak duality
- the figure shows epigraph interpretation
Proof of strong duality under constraint qualification
- now we show proof of strong duality - this is one of rare cases where proof is shown in main slides instead of “selected proofs'' section (unlike, e.g., Galois theory), since (I hope) it will give you some good intuition about why strong duality holds for (most) convex optimization problems
- assume Slater's condition holds, i.e., $\fobj$ and $\fie$ are convex, $\feq$ is affine, and exists $x\in\optdomain$ such that $\fie(x) \prec 0$ and $\feq(x) = 0$
- further assume $\optdomain$ has interior (hence $\relint \optdomain = \interior{\optdomain}$) and $\rank A=p$
- assume $p^\ast\in\reals$ - since exists feasible $x$, the other possibility is $p^\ast = -\infty$, but then, $d^\ast = -\infty$, hence strong duality holds
- $H$ is convex
- now define $$ B = \set{(0,0,s)\in\reals^m\times\reals^p\times\reals}{s<p^\ast} $$
- then $B\cap H=\emptyset$, hence exists separating hyperplane, i.e., exist $(\tilde{\lambda}, \tilde{\nu}, \mu)\neq 0$ and $\alpha$ such that $$ \begin{eqnarray*} (u,v,t) \in H &\Rightarrow& \tilde{\lambda}^T u + \tilde{\nu}^T v + \mu t \geq \alpha \\ (u,v,t) \in B &\Rightarrow& \tilde{\lambda}^T u + \tilde{\nu}^T v + \mu t \leq \alpha \end{eqnarray*} $$
-
then $\tilde{\lambda} \succeq 0$ & $\mu\geq0$ - assume $\mu>0$
- can prove case $\mu=0$, too, but it is kind of tedious; plus, whole purpose is to provide good intuition, so will not do it here
- above second inequality implies $\mu p^\ast \leq \alpha$, and first inequality implies for every $x\in\optdomain$ $$ \mu L(x,\tilde{\lambda}/\mu, \tilde{\nu}/\mu) = \tilde{\lambda}^T \fie(x) + \tilde{\nu}^T \feq(x) + \mu \fobj(x) \geq \alpha \geq \mu p^\ast $$ thus, $$ g(\tilde{\lambda}/\mu, \tilde{\nu}/\mu) \geq p^\ast $$
- finally, weak duality implies $$ g(\lambda,\nu) = p^\ast $$ where $\lambda = \tilde{\lambda}/\mu$ & $\nu = \tilde{\nu}/\mu$
Max-min characterization of weak and strong dualities
- note $$ \begin{eqnarray*} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) &=& \sup_{\lambda\succeq 0, \nu} \left( \fobj(x) + \lambda^T \fie(x) + \nu^T \feq(x) \right) \\ & = & \left\{\begin{array}{ll} \fobj(x) & x\in\optfeasset \\ \infty & \mbox{otherwise} \end{array}\right. \end{eqnarray*} $$
- thus $p^\ast = \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu)$ whereas $d^\ast = \sup_{\lambda\succeq 0,\nu} \inf_{x\in\optdomain} L(x,\lambda,\nu)$
- weak duality means $$ \sup_{\lambda\succeq 0, \nu} \inf_{x\in\optdomain} L(x,\lambda,\nu) \leq \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) $$
- strong duality means $$ \sup_{\lambda\succeq 0, \nu} \inf_{x\in\optdomain} L(x,\lambda,\nu) = \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) $$
Max-min inequality
- indeed, inequality $$ \sup_{y\in Y} \inf_{x\in X} f(x,y) \leq \inf_{x\in X} \sup_{y\in Y} f(x,y) $$ holds for every $f:X\times Y\to\reals$, i.e., in general case
- equality happens, e.g., when $X=\optdomain$, $Y=\prealk{m} \times \reals^p$, and $f$ is Lagrangian of optimization problem (in ) for which strong duality holds
Saddle-points
- if assumption in holds, $x^\ast$ minimizes $f(x,y^\ast)$ over $X$ and $y^\ast$ maximizes $f(x^\ast,y)$ over $Y$ $$ \sup_{y\in Y} f(x^\ast,y) = f(x^\ast,y^\ast) = \inf_{x\in X} f(x,y^\ast) $$
Saddle-point interpretation of strong duality
- for primal optimum $x^\ast$ and dual optimum $(\lambda^\ast,\nu^\ast)$ $$ g(\lambda^\ast,\nu^\ast) \leq L(x^\ast, \lambda^\ast, \nu^\ast) \leq \fobj(x^\ast) $$
-
if strong duality holds,
for every $x\in\optdomain$, $\lambda\succeq 0$, and $\nu$
$$
L(x^\ast,\lambda,\nu)
\leq
\fobj(x^\ast) = L(x^\ast,\lambda^\ast,\nu^\ast) = g(\lambda^\ast,\nu^\ast)
\leq
L(x,\lambda^\ast, \nu^\ast)
$$
- thus $x^\ast$ and $(\lambda^\ast,\nu^\ast)$ form saddle-point of Lagrangian
-
conversely, if $\tilde{x}$ and $(\tilde{\lambda},\tilde{\nu})$ are saddle-point of Lagrangian,
i.e.,
for every $x\in\optdomain$, $\lambda\succeq 0$, and $\nu$
$$
L(\tilde{x}, {\lambda},{\nu})
\leq
L(\tilde{x}, \tilde{\lambda},\tilde{\nu})
\leq
L({x}, \tilde{\lambda},\tilde{\nu})
$$
- hence $g(\tilde{\lambda},\tilde{\nu}) = \inf_{x\in\optdomain} L(x,\tilde{\lambda},\tilde{\nu}) = L(\tilde{x}, \tilde{\lambda},\tilde{\nu}) = \sup_{\lambda\succeq 0, \nu} L(\tilde{x},{\lambda},{\nu}) = \fobj(\tilde{x})$, thus $g(\lambda^\ast,\nu^\ast) \leq g(\tilde{\lambda}, \tilde{\nu})$ & $\fobj(\tilde{x}) \leq \fobj(x^\ast)$
- thus $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ are primal and dual optimal
Game interpretation
- assume two players play zero-sum game with payment function $f:X\times Y\to \reals$ where player $A$ pays player $B$ amount equal to $f(x,y)$ when player $A$ chooses $x$ and player $B$ chooses $y$
- player $A$ will try to minimize $f(x,y)$ and player $B$ will try to maximize $f(x,y)$
-
assume player $A$ chooses first
then player $B$ chooses after learning opponent's choice
- if player $A$ chooses $x$, player $B$ will choose $\argsup_{y\in Y} f(x,y)$
- knowing that, player $A$ will first choose $\arginf_{x\in X} \sup_{y\in Y} f(x,y)$
- hence payment will be $\inf_{x\in X} \sup_{y\in Y} f(x,y)$
- if player $B$ makes her choice first, opposite happens, i.e., payment will be $\sup_{y\in Y} \inf_{x\in X} f(x,y)$
- max-min inequality of says $$ \sup_{y\in Y} \inf_{x\in X} f(x,y) \leq \inf_{x\in X} \sup_{y\in Y} f(x,y) $$ i.e., whoever chooses later has advantage (or at least is on par), which is same phenomenon as in matrix games using mixed strategies on page~
- saddle-point for $f$ (and $X$ and $Y$), $(x^\ast,y^\ast)$, called solution of game - $x^\ast$ is optimal choice for player $A$ and $y^\ast$ is optimal choice for player $B$
Game interpretation for weak and strong dualities
- assume payment function in zero-sum game on page~ is Lagrangian of optimization problem in
- assume that $X=\xdomain$ and $Y=\prealk{m} \times \reals^p$
- if player $A$ chooses first, knowing that player $B$ will choose $\argsup_{(\lambda,\nu)\in Y}L(x,\lambda,\nu)$, she will choose $x^\ast = \arginf_{x\in\xdomain} \sup_{(\lambda,\nu)\in Y}L(x,\lambda,\nu)$
- likewise, player $B$ will choose $(\lambda^\ast,\nu^\ast) = \argsup_{(\lambda,\nu)\in Y} \inf_{x\in\xdomain} L(x,\lambda,\nu)$
- optimal duality gap $p^\ast - d^\ast$ equals advantage of player who goes second
- if strong duality holds, $(x^\ast, \lambda^\ast, \nu^\ast)$ is solution of game, in which case no one has advantage
Certificate of suboptimality
- dual feasible point $(\lambda,\nu)$ provides certificate for degree of suboptimality of current solution
- assume $x$ is feasible solution, then $$ \fobj(x) - p^\ast \leq \fobj(x) - g(\lambda,\nu) $$ guarantees that $\fobj(x)$ is no further than $\epsilon = \fobj(x) - g(\lambda,\nu)$ from optimal value $p^\ast$ (even though we do not know optimal solution)
- for this reason, $(\lambda,\nu)$, called certificate of suboptimality
- $x$ is $\epsilon$-suboptimal for primal problem and $(\lambda,\nu)$ is $\epsilon$-suboptimal for dual problem
- strong duality means we can find certificates with arbitrarily small $\epsilon$
Complementary slackness
- assume strong duality holds for optimization problem in and assume $x^\ast$ is primal optimum and $(\lambda^\ast,\nu^\ast)$ is dual optimum, then $$ \fobj(x^\ast) = L(x^\ast,\lambda^\ast,\nu^\ast) = \fobj(x^\ast) + {\lambda^\ast}^T \fie(x^\ast) + {\nu^\ast}^T \feq(x^\ast) $$
- $\feq(x^\ast)=0$ implies ${\lambda^\ast}^T \fie(x^\ast)=0$
- then $\lambda^\ast \succeq 0$ and $\fie(x^\ast) \preceq 0$ imply $$ \lambda_i^\ast \fie_i(x^\ast) = 0 \quad i=1,\ldots,m $$
KKT optimality conditions
KKT necessary for optimality with strong duality
- when strong duality holds, KKT optimality conditions are necessary for primal and dual optimality
- or equivalently
- primal and dual optimality with strong duality imply KKT optimality conditions
KKT and convexity sufficient for optimality with strong duality
- assume convex optimization problem where $\fobj$, $\fie$, and $\feq$ are all differentiable and ${x}\in\optdomain$ and $({\lambda}, {\nu})\in\reals^m\times\reals^p$ satisfying KKT conditions, i.e. $$ \fie({x}) \preceq 0, \; \feq({x}) = 0 , \; {\lambda} \succeq 0 , \; {\lambda}^T \fie({x}) = 0 , \; \nabla_x L({x}, {\lambda},{\nu}) = 0 $$
- since $L(x,\lambda,\nu)$ is convex in $x$ for $\lambda\succeq 0$, i.e., each of $\fobj(x)$, $\lambda^T \fie(x)$, and $\nu^T \feq(x)$ is convex, vanishing gradient implies $x$ achieves infimum of Lagrangian, hence $$ g(\lambda,\nu) = L(x,\lambda,\nu) = \fobj(x) + \lambda^T \fie(x) + \nu^T \feq(x) = \fobj(x) $$ where last equality uses $\lambda^T \fie(x) = 0$ and $\feq(x)=0$
- thus, strong duality holds, i.e., $x$ and $(\lambda,\nu)$ are primal and dual optimal solutions with zero duality gap
- for convex optimization problem, KKT optimality conditions are sufficient for primal and dual optimality with strong duality
- or equivalently
- KKT optimality conditions and convexity imply primal and dual optimality and strong duality
-
together with
implies
that
for convex optimization problem
- KKT optimality conditions are necessary and sufficient for primal and dual optimality with strong duality
Solving primal problems via dual problems
- when strong duality holds, can retrieve primal optimum from dual optimum since primal optimal solution is minimizer of $$ L(x,\lambda^\ast,\nu^\ast) $$ where $(\lambda^\ast, \nu^\ast)$ is dual optimum (provided minimizer is unique and primal feasible)
-
example - entropy maximization
($\optdomain = \pprealk{n}$)
- primal problem -
- dual problem -
- provided dual optimum $(\lambda^\ast,\nu^\ast)$, primal optimum is $$ x^\ast = \argmin_{x\in\optdomain} \left( \sum x_i \log x_i + {\lambda^\ast}^T (Ax-b) + \nu^\ast(\ones^Tx -1) \right) $$
- $\nabla_x L(x,\lambda^\ast,\nu^\ast) = \log x + A^T \lambda^\ast + (1+\nu^\ast)\ones$, hence $$ x^\ast = \exp(-(A^T \lambda^\ast + (1+\nu^\ast)\ones)) $$
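- a minimal numerical spot-check of this closed-form minimizer (NumPy and SciPy assumed available; data and multipliers made up)

```python
# Sketch: for any fixed (lambda >= 0, nu), the entropy-max Lagrangian is
# minimized by x = exp(-(A^T lambda + (1 + nu) 1)); compare with a solver.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
m, n = 3, 5
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
lam, nu = rng.random(m), rng.standard_normal()

def L(x):
    return x @ np.log(x) + lam @ (A @ x - b) + nu * (x.sum() - 1)

x_closed = np.exp(-(A.T @ lam + (1 + nu)))
res = minimize(L, np.ones(n) / n, bounds=[(1e-9, None)] * n)
print(np.abs(res.x - x_closed).max())    # ~ 0
```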
Perturbed optimization problems
- original problem in with perturbed constraints $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq u \\ & \feq(x) =v \end{array} $$ where $u\in\reals^m$ and $v\in\reals^p$
- define $p^\ast(u,v)$ as optimal value of above perturbed problem, i.e. $$ p^\ast(u,v) = \inf\set{\fobj(x)}{x\in\optdomain, \fie(x) \preceq u, \feq(x) = v} $$ which is convex when problem is convex optimization problem - note $p^\ast(0,0)=p^\ast$
- assume strong duality holds and dual optimum $(\lambda^\ast,\nu^\ast)$ exists, then for every feasible $x$ for perturbed problem $$ p^\ast(0,0)=g(\lambda^\ast,\nu^\ast) \leq \fobj(x) + {\lambda^\ast}^T \fie(x) + {\nu^\ast}^T \feq(x) \leq \fobj(x) + {\lambda^\ast}^T u + {\nu^\ast}^T v $$ thus $$ p^\ast(0,0)\leq p^\ast(u,v) + {\lambda^\ast}^T u + {\nu^\ast}^T v $$ hence $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$
- the figure shows this for optimization problem with one inequality constraint and no equality constraint
Global sensitivity analysis via perturbed problems
- recall $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$
-
interpretations
- if $\lambda^\ast_i$ is large, when $i$-th inequality constraint is tightened, optimal value increases a lot
- if $\lambda^\ast_i$ is small, when $i$-th inequality constraint is relaxed, optimal value decreases not a lot
- if $|\nu^\ast_i|$ is large, reducing $v_i$ when $\nu^\ast_i>0$ or increasing $v_i$ when $\nu^\ast_i<0$ increases optimal value a lot
- if $|\nu^\ast_i|$ is small, increasing $v_i$ when $\nu^\ast_i>0$ or decreasing $v_i$ when $\nu^\ast_i<0$ decreases optimal value not a lot
- note these are only lower bounds - local behavior explored next
Local sensitivity analysis via perturbed problems
-
assume $p^\ast(u,v)$ is differentiable with respect to $u$ and $v$,
i.e., $\nabla_{(u,v)} p^\ast(u,v)$ exists
- then $$ \frac{\partial}{\partial u_i} p^\ast (0,0) = \lim_{h\to 0^+} \frac{p^\ast(he_i,0) - p^\ast(0,0)}{h} \geq \lim_{h\to 0^+} \frac{-{\lambda^\ast}^T (he_i) }{h} = -\lambda_i^\ast $$ and $$ \frac{\partial}{\partial u_i} p^\ast (0,0) = \lim_{h\to 0^-} \frac{p^\ast(he_i,0) - p^\ast(0,0)}{h} \leq \lim_{h\to 0^-} \frac{-{\lambda^\ast}^T (he_i) }{h} = -\lambda_i^\ast $$
- obtain same result for $v_i$, hence $$ \nabla_u\; p^\ast (0,0) = -\lambda^\ast \quad \nabla_v\; p^\ast (0,0) = -\nu^\ast $$
- so larger $\lambda_i^\ast$ or $|\nu_i^\ast|$ means larger change in optimal value of perturbed problem when $u_i$ or $v_i$ changes a bit, and vice versa; quantitatively, $\lambda_i^\ast$ and $\nu_i^\ast$ provide exact rate and direction
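- a minimal sketch of local sensitivity for small LP (CVXPY assumed available; data made up): finite difference of $p^\ast(u)$ matches $-\lambda_i^\ast$

```python
# Sketch: dp*/du_i = -lambda_i^* for a small LP, vs. a finite difference.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
m, n = 4, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.random(m)   # strictly feasible (Slater)
c = -A.T @ rng.random(m)                         # keeps the LP bounded below

def solve(u):
    x = cp.Variable(n)
    cons = [A @ x <= b + u]
    prob = cp.Problem(cp.Minimize(c @ x), cons)
    prob.solve()
    return prob.value, cons[0].dual_value

p0, lam = solve(np.zeros(m))
h, i = 1e-6, 0
p_h, _ = solve(h * np.eye(m)[i])
print((p_h - p0) / h, -lam[i])                   # approximately equal
```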
Different dual problems for equivalent optimization problems - 1
-
introducing new variables and equality constraints
for unconstrained problems
-
unconstrained optimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(Ax+b)
\end{array}
$$
- Lagrange dual function is constant, $g = p^\ast$, hence strong duality trivially holds, which, however, does not provide useful information
-
reformulate as equivalent optimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(y)
\\
\mbox{subject to} &
Ax+b = y
\end{array}
$$
- Lagrangian - $L(x,y,\nu) = f(y) + \nu^T(Ax+b-y)$
- Lagrange dual function - $g(\nu) = -I(A^T\nu = 0) + b^T\nu - f^\ast(\nu)$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T\nu - f^\ast(\nu) \\ \mbox{subject to} & A^T \nu = 0 \end{array} $$
-
examples
-
unconstrained geometric problem
$$
\begin{array}{ll}
\mbox{minimize} &
\log\left(
\sum_{i=1}^m \exp(a_i^Tx + b_i)
\right)
\end{array}
$$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \log\left( \sum_{i=1}^m \exp(y_i) \right) \\ \mbox{subject to} & Ax + b =y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T \nu - \sum_{i=1}^m \nu_i \log \nu_i \\ \mbox{subject to} & \ones^T \nu = 1 \\ & A^T \nu = 0 \\ & \nu \succeq 0 \end{array} $$ which is entropy maximization problem
-
norm minimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
\|Ax-b\|
\end{array}
$$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \|y\| \\ \mbox{subject to} & Ax - b = y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T \nu \\ \mbox{subject to} & \|\nu\|_\ast \leq 1 \\ & A^T \nu =0 \end{array} $$
Different dual problems for equivalent optimization problems - 2
-
introducing new variables and equality constraints
for constrained problems
- inequality constrained optimization problem $$ \begin{array}{ll} \mbox{minimize} & f_0(A_0x+b_0) \\ \mbox{subject to} & f_i(A_ix+b_i) \leq 0\quad i=1,\ldots,m \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & f_0(y_0) \\ \mbox{subject to} & f_i(y_i) \leq 0\quad i=1,\ldots,m \\ & A_i x + b_i = y_i\quad i=0,\ldots,m \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m \nu_i^T b_i - f_0^\ast(\nu_0) - \sum_{i=1}^m \lambda_i f_i^\ast(\nu_i/\lambda_i) \\ \mbox{subject to} & \sum_{i=0}^m A_i^T \nu_i = 0 \\ & \lambda \succeq 0 \end{array} $$
-
examples
-
inequality constrained geometric program
$$
\begin{array}{ll}
\mbox{minimize} &
\log\left(\sum \exp(A_0x + b_0)\right)
\\
\mbox{subject to} &
\log\left(\sum \exp(A_ix + b_i)\right)\leq 0\quad i=1,\ldots,m
\end{array}
$$
where $A_i\in\reals^{K_i\times n}$
and $\exp(z) := (\exp(z_1),\ldots,\exp(z_k))\in\reals^k$
and $\sum z := \sum_{i=1}^k z_i\in\reals$
for $z\in\reals^k$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \log\left(\sum \exp(y_0)\right) \\ \mbox{subject to} & \log\left(\sum \exp(y_i)\right)\leq 0\quad i=1,\ldots,m \\ & A_i x + b_i = y_i \quad i=0,\ldots,m \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m b_i^T \nu_i - \nu_0^T\log(\nu_0) - \sum_{i=1}^m \nu_i^T\log(\nu_i/\lambda_i) \\ \mbox{subject to} & \nu_i \succeq 0\quad i=0,\ldots,m \\ & \ones^T \nu_0 = 1,\; \ones^T\nu_i=\lambda_i\quad i=1,\ldots,m \\ & \lambda_i\geq 0 \quad i=1,\ldots,m \\ & \sum_{i=0}^m A_i^T\nu_i = 0 \end{array} $$ where $\log(z) := (\log(z_1),\ldots,\log(z_k))\in\reals^k$ for $z\in\pprealk{k}$
- simplified dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m b_i^T \nu_i - \nu_0^T\log(\nu_0) - \sum_{i=1}^m \nu_i^T\log(\nu_i/\ones^T\nu_i) \\ \mbox{subject to} & \nu_i \succeq 0\quad i=0,\ldots,m \\ & \ones^T \nu_0 = 1 \\ & \sum_{i=0}^m A_i^T\nu_i = 0 \end{array} $$
Different dual problems for equivalent optimization problems - 3
-
transforming objectives
- norm minimization problem $$ \begin{array}{ll} \mbox{minimize} & \|Ax - b\| \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & (1/2)\|y\|^2 \\ \mbox{subject to} & Ax - b = y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & -(1/2)\|\nu\|_\ast^2 + b^T\nu \\ \mbox{subject to} & A^T\nu = 0 \end{array} $$
Different dual problems for equivalent optimization problems - 4
-
making constraints implicit
- LP with box constraints $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & Ax = b,\; l \preceq x \preceq u \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu - \lambda_1^Tu + \lambda_2^Tl \\ \mbox{subject to} & A^T\nu + \lambda_1 - \lambda_2 + c = 0,\; \lambda_1 \succeq 0,\; \lambda_2 \succeq 0 \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & c^T x + I ( l\preceq x \preceq u) \\ \mbox{subject to} & Ax = b \end{array} $$
- dual optimization problem for reformulated primal problem $$ \begin{array}{ll} \mbox{maximize} & -b^T \nu - u^T(A^T\nu + c)^- + l^T(A^T\nu + c)^+ \end{array} $$
Theorems of Alternatives
Weak alternatives
- can prove using duality of optimization problems
-
consider primal and dual problems
- primal problem $$ \begin{array}{ll} \mbox{minimize} & 0 \\ \mbox{subject to} & \fie(x) \preceq 0 \\ & \feq(x) =0 \end{array} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & g(\lambda,\nu) \\ \mbox{subject to} & \lambda \succeq 0 \end{array} $$ where $$ g(\lambda,\nu) = \inf_{x\in\optdomain} \left( \lambda^T \fie(x) + \nu^T \feq(x) \right) $$
- then $p^\ast,\; d^\ast \in \{0,\infty\}$
- now assume first system is feasible, then $p^\ast = 0$, hence weak duality implies $d^\ast\leq0$, thus there exist no $\lambda$ and $\nu$ such that $\lambda\succeq 0$ and $g(\lambda,\nu) > 0$, i.e., second system is infeasible - note if such point existed, $g$ could be made arbitrarily large: if $\tilde{\lambda}\succeq 0$ and $\tilde{\nu}$ satisfy $g(\tilde{\lambda},\tilde{\nu})>0$, then $g(\alpha\tilde{\lambda}, \alpha\tilde{\nu}) = \alpha g(\tilde{\lambda}, \tilde{\nu}) \to \infty$ as $\alpha\to\infty$
- assume second system is feasible, then $g(\lambda,\nu)$ can be arbitrarily large for above reasons, thus $d^\ast = \infty$, hence weak duality implies $p^\ast = \infty$, which implies first system is infeasible
- therefore two systems are weak alternatives; at most one of them is feasible
- (actually, not hard to prove it without using weak duality)
Weak alternatives with strict inequalities
Strong alternatives
Strong alternatives with strict inequalities
-
proof -
consider convex optimization problem and its dual
- primal problem $$ \begin{array}{ll} \mbox{minimize} & s \\ \mbox{subject to} & \fie(x) - s \ones \preceq 0 \\ & \feq(x) =0 \end{array} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & g(\lambda,\nu) \\ \mbox{subject to} & \lambda \succeq 0 \quad \ones^T \lambda = 1 \end{array} $$ where $g(\lambda,\nu) = \inf_{x\in\optdomain} \left( \lambda^T \fie(x) + \nu^T \feq(x) \right)$
- first observe Slater's condition holds for primal problem since by hypothesis of , exists $y\in\relint \optdomain$ with $\feq(y)=0$, hence $(y,\fie(y))\in\xie\times \reals$ is primal feasible satisifying Slater's condition
- hence Slater's theorem () implies $d^\ast=p^\ast$
- assume first system is feasible, then primal problem is strictly feasible and $d^\ast = p^\ast<0$, hence second system infeasible since otherwise feasible point for second system is feasible point of dual problem, hence $d^\ast\geq0$
- assume first system is infeasible, then $d^\ast = p^\ast\geq0$, hence Slater's theorem () implies exists dual optimal $(\lambda^\ast,\nu^\ast)$ (whether or not $d^\ast=\infty$), hence $(\lambda^\ast,\nu^\ast)$ is feasible point for second system of
- therefore two systems are strong alternatives; each is feasible if and only if the other is infeasible
Strong alternatives for linear inequalities
- dual function of feasibility problem for $Ax\preceq b$ is $$ g(\lambda) = \inf_{x\in\reals^n} \lambda^T(Ax-b) = \left\{ \begin{array}{ll} -b^T \lambda & A^T\lambda = 0 \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- hence alternative system is $\lambda\succeq0,\;b^T\lambda <0,\; A^T\lambda=0$
- thus implies below systems are strong alternatives $$ Ax \preceq b \quad\&\quad \lambda\succeq0 \quad b^T\lambda <0 \quad A^T\lambda=0 $$
- similarly, implies below systems are strong alternatives $$ Ax \prec b \quad\&\quad \lambda\succeq0 \quad \lambda \neq 0 \quad b^T\lambda \leq 0 \quad A^T\lambda=0 $$
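- a minimal sketch (SciPy assumed available): when $Ax\preceq b$ is infeasible, certificate $\lambda$ can be found by auxiliary LP with normalization $\ones^T\lambda=1$

```python
# Sketch: a certificate lambda >= 0, A^T lambda = 0, b^T lambda < 0
# for an obviously infeasible system x <= -1, -x <= -1.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0], [-1.0]])
b = np.array([-1.0, -1.0])

# minimize b^T lam  s.t.  A^T lam = 0, 1^T lam = 1, lam >= 0
res = linprog(b,
              A_eq=np.vstack([A.T, np.ones((1, 2))]), b_eq=[0.0, 1.0],
              bounds=[(0, None)] * 2)
print(res.x, b @ res.x)    # lam = (0.5, 0.5), b^T lam = -1 < 0
```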
Farkas' lemma
- will prove using LP and its dual
- consider LP $\left(\mbox{minimize}\; c^T x \quad \mbox{subject to}\; Ax \preceq 0\right)$
- dual function is $g(y) = \inf_{x\in\reals^n} \left(c^Tx + y^TAx \right) = \left\{ \begin{array}{ll} 0 & A^Ty + c= 0 \\ -\infty & \mbox{otherwise} \end{array} \right.$
- hence dual problem is $\left( \mbox{maximize} \; 0 \quad \mbox{subject to} \; A^T y + c = 0 , \; y \succeq 0 \right)$
- assume first system is feasible, then homogeneity of primal problem implies $p^\ast = -\infty$, thus $d^\ast = -\infty$, i.e., dual is infeasible, hence second system is infeasible
- assume first system is infeasible, since primal is always feasible, $p^\ast=0$, hence strong duality implies $d^\ast =0$, thus second system is feasible
Convex Optimization with Generalized Inequalities
Optimization problems with generalized inequalities
- every terminology and associated notation is same as for optimization problem in , such as objective & inequality & equality constraint functions, domain of optimization problem $\optdomain$, feasible set $\optfeasset$, optimal value $p^\ast$
- note that when $K_i=\preals$ (hence $\bigpropercone=\prealk{m}$), above optimization problem coincides with that in , i.e., optimization problems with generalized inequalities subsume (normal) optimization problems
Lagrangian for generalized inequalities
Lagrange dual functions for generalized inequalities
- $g$ is concave function
- $g(\lambda,\nu)$ is lower bound for optimal value of associated optimization problem i.e., $$ g(\lambda,\nu) \leq p^\ast $$ for every $\lambda\succeq_\bigpropercone^\ast0$ where $\bigpropercone^\ast$ denotes dual cone of $\bigpropercone$, i.e., $\bigpropercone^\ast = \bigtimes K_i^\ast$ where $K_i^\ast\subset\reals^{k_i}$ is dual cone of $K_i\subset\reals^{k_i}$
- $(\lambda,\nu)$ with $\lambda\succeq_{\bigpropercone^\ast} 0$ and $g(\lambda,\nu)>-\infty$ said to be dual feasible
Lagrange dual problems for generalized inequalities
Slater's theorem for generalized inequalities
Duality for SDP
- (inequality form) SDP $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & x_1F_1 + \cdots + x_nF_n + G \preceq 0 \end{array} $$ where $F_1,\ldots,F_n,G\in\symset{k}$ and $\bigpropercone = \possemidefset{k}$
- Lagrangian $$ L(x,Z) = c^Tx + (x_1F_1 + \cdots + x_nF_n + G) \bullet Z = \sum x_i(F_i\bullet Z + c_i) + G \bullet Z $$ where $X\bullet Y = \Tr XY$ for $X,Y\in\symset{k}$
- Lagrange dual function $$ g(Z) = \inf_{x\in\reals^n} L(x,Z) = \left\{ \begin{array}{ll} G \bullet Z & F_i\bullet Z + c_i= 0\quad i=1,\ldots,n \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & G\bullet Z \\ \mbox{subject to} & F_i \bullet Z + c_i = 0\quad i=1,\ldots,n \\ & Z \succeq 0 \end{array} $$ where we use fact that $\possemidefset{k}$ is self-dual, i.e., $\bigpropercone^\ast = \bigpropercone$
- Slater's theorem () implies if primal problem is strictly feasible, i.e., exists $x\in\reals^n$ such that $\sum x_iF_i + G\prec 0$, strong duality holds
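- a minimal sketch of this primal-dual pair (CVXPY assumed available; data constructed so that both Slater's condition and dual feasibility hold)

```python
# Sketch: inequality form SDP and its dual agree under Slater's condition.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
k, n = 4, 2
Fs = []
for _ in range(n):
    M = rng.standard_normal((k, k))
    Fs.append((M + M.T) / 2)
G = -np.eye(k)                                   # x = 0 strictly feasible
Z0 = rng.standard_normal((k, k)); Z0 = Z0 @ Z0.T + np.eye(k)
c = np.array([-np.trace(F @ Z0) for F in Fs])    # Z0 dual feasible => p* finite

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(c @ x),
                    [x[0] * Fs[0] + x[1] * Fs[1] + G << 0])
primal.solve()

Z = cp.Variable((k, k), PSD=True)
dual = cp.Problem(cp.Maximize(cp.trace(G @ Z)),
                  [cp.trace(Fs[i] @ Z) + c[i] == 0 for i in range(n)])
dual.solve()
print(primal.value, dual.value)                  # equal, by Slater's theorem
```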
KKT optimality conditions for generalized inequalities
KKT conditions and optimalities for generalized inequalities
-
for every optimization problem with generalized inequalities
(),
every statement for normal optimization problem
(),
regarding relations among
KKT conditions,
optimality,
primal and dual optimality,
and
strong duality,
is exactly the same
-
for every optimization problem with generalized inequalities
()
- if strong duality holds, primal and dual optimal points satisfy KKT optimality conditions in (same as )
- if optimization problem is convex and primal and dual solutions satisfy KKT optimality conditions in , the solutions are optimal with strong duality (same as )
- therefore, for convex optimization problem, KKT optimality conditions are necessary and sufficient for primal and dual optimality with strong duality
-
for every optimization problem with generalized inequalities
()
Perturbation and sensitivity analysis for generalized inequalities
- original problem in with perturbed constraints $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq_\bigpropercone u \\ & \feq(x) =v \end{array} $$ where $u\in\reals^m$ and $v\in\reals^p$
- define $$ p^\ast(u,v) = \inf\set{\fobj(x)}{x\in\optdomain, \fie(x) \preceq_\bigpropercone u, \feq(x) = v} $$ which is convex when problem is convex optimization problem - note $p^\ast(0,0)=p^\ast$
- as for normal optimization problem case (page~), if strong duality holds and dual optimum $(\lambda^\ast,\nu^\ast)$ exists, $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$ and $$ \nabla_u\; p^\ast (0,0) = -\lambda^\ast \quad \nabla_v\; p^\ast (0,0) = -\nu^\ast $$
Sensitivity analysis for SDP
- assume inequality form SDP and its dual problem on page~ and page~
-
consider perturbed SDP
$$
\begin{array}{ll}
\mbox{minimize} &
c^Tx
\\
\mbox{subject to} &
x_1F_1 + \cdots + x_nF_n + G \preceq U
\end{array}
$$
for some $U\in\symset{k}$
- define $p^\ast:\symset{k} \to \reals$ such that $p^\ast(U)$ is optimal value of above problem
- assume $x^\ast\in\reals^n$ and $Z^\ast\in\possemidefset{k}$ are primal and dual optimum with zero duality gap
- then $$ p^\ast(U) \geq p^\ast - Z^\ast \bullet U $$
- if $\nabla_U p^\ast$ exists at $U=0$ $$ \nabla_U p^\ast(0) = - Z^\ast $$
Weak alternatives for generalized inequalities
Strong alternatives for generalized inequalities
Strong alternatives for SDP
-
for $F_1,\ldots,F_n,G\in\symset{k}$, $x\in\reals^n$, and $Z\in\symset{k}$
- below systems are strong alternatives $$ x_1F_1 + \cdots + x_nF_n + G \prec 0 $$ and $$ Z \succeq 0 \quad Z\neq 0 \quad G\bullet Z \geq 0 \quad F_i \bullet Z = 0\;i=1,\ldots,n $$
- if $\sum v_i F_i \succeq 0 \Rightarrow \sum v_i F_i = 0$, below systems are strong alternatives $$ x_1F_1 + \cdots + x_nF_n + G \preceq 0 $$ and $$ Z \succeq 0 \quad G\bullet Z > 0 \quad F_i \bullet Z = 0\;i=1,\ldots,n $$
Unconstrained Minimization
Unconstrained minimization
- consider unconstrained convex optimization problem, i.e., $m=p=0$ in $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \end{array} $$ where domain of optimization problem is $\optdomain\ = \xobj \subset \reals^n$
-
assume
- $\fobj$ is twice-differentiable (hence by definition $\xobj$ is open)
- optimal solution $x^\ast$ exists, i.e., $p^\ast = \inf_{x\in\optdomain} \fobj(x) = \fobj(x^\ast)$
- implies $x^\ast$ is optimal solution if and only if $$ \nabla \fobj(x^\ast) = 0 $$
- can solve above equation directly for few cases, but usually must resort to iterative method, i.e., find sequence of points $\xseqk{0}, \xseqk{1}, \ldots \in \xobj$ such that $\lim_{k\to\infty} \fobj(\xseqk{k}) = p^\ast$
Requirements for iterative methods
-
requirements for iterative methods
- initial point $\xseqk{0}$ should be in domain of optimization problem, i.e. $$ \xseqk{0} \in \xobj\ $$
- sublevel set of $\fobj(\xseqk{0})$ $$ S = \bigset{x\in\xobj}{\fobj(x) \leq \fobj(\xseqk{0})} $$ should be closed
-
e.g.
- sublevel set of $\fobj(\xseqk{0})$ is closed for all $\xseqk{0}\in\xobj$ if $\fobj$ is closed, i.e., all its sublevel sets are closed
- $\fobj$ is closed if $\xobj = \reals^n$ and $\fobj$ is continuous
- $\fobj$ is closed if $\fobj$ is continuous, $\xobj$ is open, and $\fobj(x) \to \infty$ as $x \to \boundary \xobj$
Unconstrained minimization examples
-
convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) =
(1/2) x^TP x +q^Tx
\end{array}
$$
where $P\in\possemidefset{n}$ and $q\in\reals^n$
-
solution obtained by solving
$$
\nabla \fobj(x^\ast) = P x^\ast + q = 0
$$
- if solution exists, $x^\ast = - P^\dagger q$ (thus $p^\ast>-\infty$)
- otherwise, problem is unbounded below, i.e., $p^\ast = -\infty$
- ability to analytically solve quadratic minimization problem is basis for Newton's method, powerful method for unconstrained minimization
-
least-squares (LS) is special case of convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
(1/2) \|Ax-b\|_2^2
= (1/2) x^T (A^TA) x - b^TAx + (1/2)\|b\|_2^2
\end{array}
$$
- optimal solution always exists, can be obtained via normal equations $$ A^T Ax^\ast = A^Tb $$
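- a quick numerical check of normal equations (NumPy assumed available)

```python
# Sketch: the normal equations reproduce numpy's least-squares solution.
import numpy as np

rng = np.random.default_rng(7)
A, b = rng.standard_normal((8, 3)), rng.standard_normal(8)
x_ne = np.linalg.solve(A.T @ A, A.T @ b)         # normal equations
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.abs(x_ne - x_ls).max())                 # ~ 0
```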
-
unconstrained GP
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) =
\log\left(
\sum \exp (Ax+b)
\right)
\end{array}
$$
for $A\in\reals^{m\times n}$ and $b\in\reals^m$
- solution obtained by solving $$ \nabla \fobj(x^\ast) = \frac{A^T \exp(Ax^\ast+b)}{\sum \exp(Ax^\ast+b)} = 0 $$
- need to resort to iterative method - since $\xobj = \reals^n$ and $\fobj$ is continuous, $\fobj$ is closed, hence every point in $\reals^n$ can be initial point
-
analytic center of linear inequalities
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = - \sum\log(b-Ax)
\end{array}
$$
where $\xobj = \set{x\in\reals^n}{b-Ax \succ 0}$
- need to resort to iterative method - since $\xobj$ is open, $\fobj$ is continuous, and $\fobj(x) \to \infty$ as $x\to\boundary \xobj$, $\fobj$ is closed, hence every point in $\xobj$ can be initial point
- $\fobj$, called logarithmic barrier for inequalities $Ax\preceq b$
-
analytic center of LMI
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = - \log\det F(x) = \log\det F(x)^{-1}
\end{array}
$$
where $F:\reals^n\to \symset{k}$ is defined by
$$
F(x) = x_1F_1 + \cdots + x_nF_n
$$
where $F_i\in \symset{k}$
and $\xobj = \set{x\in\reals^n}{F(x)\succ 0}$
- need to resort to iterative method - since $\xobj$ is open, $\fobj$ is continuous, and $\fobj(x) \to \infty$ as $x\to\boundary \xobj$, $\fobj$ is closed, hence every point in $\xobj$ can be initial point
- $\fobj$, called logarithmic barrier for LMI
Strong convexity and implications
- function $\fobj$ is strongly convex on $S$ if $$ \left( \exists m >0 \right) \left( \forall x \in S \right) \left( \nabla^2 \fobj(x) \succeq mI \right) $$
-
strong convexity implies for every $x,y\in S$
$$
\fobj(y) \geq \fobj(x) + \nabla \fobj(x)^T (y-x) + ({m}/{2}) \|y-x\|_2^2
$$
- which implies gradient provides optimality certificate and tells us how far current point is from optimum, i.e. $$ \fobj(x) - p^\ast \leq ({1}/{2m}) \|\nabla \fobj(x)\|_2^2 \quad \|x-x^\ast\|_2 \leq ({2}/{m}) \|\nabla \fobj(x)\|_2 $$
- strong convexity inequality above implies sublevel sets contained in $S$ are bounded, hence continuous function $\nabla^2 \fobj(x)$ is also bounded on $S$, i.e., $\left( \exists M >0 \right) \left( \forall x \in S \right) \left( \nabla^2 \fobj(x) \preceq M I \right)$, then $$ \fobj(x) - p^\ast \geq \frac{1}{2M} \|\nabla \fobj(x)\|_2^2 $$
Iterative methods
Line search methods
- Require: $\fobj$, descent direction $\sdirk{k}$, $\alpha\in(0,0.5)$, $\beta\in(0,1)$
- $\slen:=1$
- while $\fobj(\xseqk{k} + \slen \sdirk{k}) > \fobj(\xseqk{k}) + \alpha \slen \nabla \fobj(\xseqk{k})^T \sdirk{k}$ do
- $\slen := \beta \slen$
- end while
Gradient descent method
- Require: $\fobj$, initial point $x\in \dom \fobj$
- repeat
- search direction - $\sdir := - \nabla \fobj(x)$
- do line search to choose $\slen>0$
- update - $x := x + \slen \sdir$
- until stopping criterion satisfied
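- below is minimal sketch combining above backtracking line search and gradient descent (NumPy assumed available), applied to unconstrained GP objective from earlier example

```python
# Sketch: gradient descent with backtracking on f(x) = log sum exp(Ax + b).
import numpy as np

def f(x, A, b):
    z = A @ x + b
    return np.log(np.sum(np.exp(z)))

def grad_f(x, A, b):
    z = A @ x + b
    w = np.exp(z - z.max()); w /= w.sum()        # softmax, stabilized
    return A.T @ w

def gradient_descent(A, b, x, alpha=0.25, beta=0.5, tol=1e-8):
    while True:
        g = grad_f(x, A, b)
        if np.linalg.norm(g) < tol:
            return x
        d, t = -g, 1.0
        while f(x + t * d, A, b) > f(x, A, b) + alpha * t * (g @ d):
            t *= beta                            # backtracking line search
        x = x + t * d

rng = np.random.default_rng(8)
A, b = rng.standard_normal((10, 4)), rng.standard_normal(10)
print(f(gradient_descent(A, b, np.zeros(4)), A, b))
```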
Summary of gradient descent method
- gradient method often exhibits approximately linear convergence, i.e., error $\fobj(\xseqk{k})-p^\ast$ converges to zero approximately as geometric series
- choice of backtracking parameters $\alpha$ and $\beta$ has noticeable but not dramatic effect on convergence
- exact line search sometimes improves convergence of gradient method, but not by large, hence mostly not worth implementation
- convergence rate depends greatly on condition number of Hessian or sublevel sets - when condition number is large, gradient method can be useless
Newton's method - motivation
- second-order Taylor expansion of $\fobj$ - $\fobj(x + \sdir) \approx \hat{\fobj}(\sdir) = \fobj(x) + \nabla \fobj(x)^T \sdir + \frac{1}{2} \sdir^T \nabla^2 \fobj(x) \sdir$
- minimum of Taylor expansion achieved when $\nabla \hat{\fobj}(\sdir) = \nabla \fobj(x) + \nabla^2 \fobj(x) \sdir = 0$
- solution called Newton step $$ \sdir_\mathrm{nt}(x) = - \nabla^2 \fobj(x)^{-1} \nabla \fobj(x) $$ assuming $\nabla^2\fobj(x)\succ0$
- thus Newton step minimizes local quadratic approximation of function
- difference between current value and minimum of quadratic approximation $$ \fobj(x) - \hat{\fobj}(\sdir_\mathrm{nt}(x)) = \frac{1}{2} \sdir_\mathrm{nt}^T \nabla^2 \fobj(x) \sdir_\mathrm{nt} = \frac{1}{2} \lambda(x)^2 $$
- Newton decrement $$ \lambda(x) = \sqrt{\sdir_\mathrm{nt}(x)^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}(x)} = \sqrt{\nabla \fobj(x)^T \nabla^2 \fobj(x)^{-1} \nabla \fobj(x)} $$
Newton's method
- Require: $\fobj$, initial point $x\in \dom \fobj$, tolerance $\epsilon>0$
- loop
- compute Newton step and decrement $$ \sdir_\mathrm{nt}(x) := -\nabla^2 \fobj(x)^{-1} \nabla \fobj(x) \quad \lambda(x)^2 := \nabla \fobj(x)^T \nabla^2 \fobj(x)^{-1} \nabla \fobj(x) $$
- stopping criterion - quit if $\lambda(x)^2/2 < \epsilon$
- do line search to choose $\slen>0$
- update - $x := x + \slen \sdir_\mathrm{nt}$
- end loop
- Newton step is descent direction since $$ \left. \left( \frac{d}{dt}\fobj(x+t\sdir_\mathrm{nt}) \right) \right|_{t=0} = \nabla \fobj(x) ^T \sdir_\mathrm{nt} = - \lambda(x)^2 <0 $$
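- same example with Newton's method (NumPy assumed available); stopping criterion uses $\lambda(x)^2/2$

```python
# Sketch: Newton's method with backtracking on f(x) = log sum exp(Ax + b).
import numpy as np

def fgh(x, A, b):
    z = A @ x + b
    w = np.exp(z - z.max()); w /= w.sum()
    f = np.log(np.sum(np.exp(z)))
    g = A.T @ w
    H = A.T @ (np.diag(w) - np.outer(w, w)) @ A  # Hessian of log-sum-exp
    return f, g, H

def newton(A, b, x, alpha=0.25, beta=0.5, eps=1e-10):
    while True:
        f, g, H = fgh(x, A, b)
        dnt = -np.linalg.solve(H, g)             # Newton step
        lam2 = -g @ dnt                          # Newton decrement squared
        if lam2 / 2 < eps:
            return x
        t = 1.0
        while fgh(x + t * dnt, A, b)[0] > f + alpha * t * (g @ dnt):
            t *= beta
        x = x + t * dnt

rng = np.random.default_rng(9)
A, b = rng.standard_normal((10, 4)), rng.standard_normal(10)
print(newton(A, b, np.zeros(4)))
```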
Assumptions for convergence analysis of Newton's method
-
assumptions
- strong convexity and boundedness of Hessian on sublevel set $$ \left( \exists\; m, M > 0 \right) \left( \forall x \in S \right) \left( mI \preceq \nabla^2 \fobj(x) \preceq MI \right) $$
- Lipschitz continuity of Hessian on sublevel set $$ \left( \exists L > 0 \right) \left( \forall x,y\in S \right) \left( \|\nabla^2 \fobj(x)- \nabla^2\fobj(y)\|_2 \leq L \|x-y\|_2 \right) $$
-
Lipschitz continuity constant $L$ plays critical role
in performance of Newton's method
- intuition says Newton's method works well for functions whose quadratic approximations do not change fast, i.e., when $L$ is small
Convergence analysis of Newton's method
- damped Newton phase - if $\|\nabla \fobj(\xseqk{k})\|_2 \geq \eta$, $$ \fobj(\xseqk{k+1}) - \fobj(\xseqk{k}) \leq - \gamma $$
- quadratic convergence phase - if $\|\nabla \fobj(\xseqk{k})\|_2 < \eta$, backtracking line search selects step length $\slenk{k}=1$ $$ \frac{L}{2m^2} \|\nabla \fobj(\xseqk{k+1})\|_2 \leq \left( \frac{L}{2m^2} \|\nabla \fobj(\xseqk{k})\|_2 \right)^2 $$
Summary of Newton's method
- Newton's method is affine invariant, hence performance is independent of condition number unlike gradient method
- once entering quadratic convergence phase, Newton's method converges extremely fast
- performance not much dependent on choice of algorithm parameters
- big disadvantage is computational cost for evaluating search direction, i.e., solving linear system
Self-concordance
Why self-concordance?
- convergence analysis of Newton's method depends on assumptions about function characteristics, e.g., $m,M, L > 0$ for strong convexity, boundedness, and Lipschitz continuity of Hessian, i.e. $$ m I \preceq \nabla^2 f(x) \preceq M I \quad \|\nabla^2 f(x)- \nabla^2f(y)\| \leq L \|x-y\| $$
-
self-concordance, discovered by Nesterov and Nemirovski (who gave it the name), plays important role for reasons such as
- convergence analysis does not depend on any function-characterizing parameters
- many barrier functions used for interior-point methods, important class of optimization algorithms, are self-concordant
- property of self-concordance is affine invariant
Self-concordance preserving operations
Self-concordant function examples
- negative logarithm - $f:\ppreals \to \reals$ with $$ f(x)=-\log x $$ is self-concordant since $$ |f'''(x)| / f''(x)^{3/2} = \left(2/x^3\right) / \left((1/x^2)^{3/2}\right) = 2 $$
- negative entropy plus negative logarithm - $f:\ppreals \to \reals$ with $$ f(x)=x\log x-\log x $$ is self-concordant since $$ |f'''(x)| / f''(x)^{3/2} = (x+2)/{(x+1)^{3/2}} \leq 2 $$
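- the two scalar computations above can be spot-checked symbolically (SymPy assumed available)

```python
# Sketch: symbolic check of |f'''| / f''^(3/2) for the two examples above.
import sympy as sp

x = sp.symbols('x', positive=True)
for f in (-sp.log(x), x * sp.log(x) - sp.log(x)):
    ratio = sp.simplify(sp.Abs(sp.diff(f, x, 3)) / sp.diff(f, x, 2) ** sp.Rational(3, 2))
    print(ratio)    # 2, and an expression equivalent to (x+2)/(x+1)**(3/2)
```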
- log barrier for linear inequalities - for $A\in\reals^{m\times n}$ and $b\in\reals^m$ $$ f(x) = - \sum \log(b-Ax) $$ with $\dom f = \set{x\in\reals^n}{b-Ax \succ 0}$ is self-concordant by , i.e., $f$ is affine transformation of sum of self-concordant functions
- log-determinant - $f:\posdefset{n}\to\reals$ with $$ f(X) = \log\det X^{-1} = - \log\det X $$ is self-concordant: for every $X\in \posdefset{n}$ and $V\in\symset{n}$, function $g:\reals\to\reals$ defined by $g(t) = - \log\det(X+tV)$ with $\dom g = \set{t\in\reals}{X+tV\succ 0}$ satisfies $$ \begin{eqnarray*} g(t) &=& - \log \det (X^{1/2} (I + tX^{-1/2} V X^{-1/2})X^{1/2}) \\ &=& -\log\det X - \log\det(I+tX^{-1/2}VX^{-1/2}) \\ &=& -\log\det X - \sum \log (1+t\lambda_i(X,V)) \end{eqnarray*} $$ where $\lambda_i(X,V)$ is $i$-th eigenvalue of $X^{-1/2}VX^{-1/2}$, hence $g$ is self-concordant by , i.e., $g$ is affine transformation of sum of self-concordant functions
- log of concave quadratic - $f:X\to\reals$ with $$ f(x) = -\log(-x^TPx - q^Tx - r) $$ where $P\in\possemidefset{n}$ and $X=\set{x\in\reals^n}{x^TPx + q^Tx + r<0}$ is self-concordant
-
function $f:X\to\reals$
with
$$
f(x) = -\log(-g(x)) - \log x
$$
where $\dom f = \set{x\in\dom g \cap \ppreals}{g(x)<0}$
and
function $h:H\to\reals$
$$
h(x) = -\log(-g(x)-ax^2-bx-c) - \log x
$$
where $a\geq0$ and $\dom h = \set{x\in\dom g \cap \ppreals}{g(x)+ax^2+bx+c<0}$
are self-concordant
if $g$ is one of below
- $g(x) = -x^p$ for $0<p\leq 1$
- $g(x) = -\log x$
- $g(x) = x \log x$
- $g(x) = x^p$ for $-1\leq p\leq 0$
- $g(x) = (ax+b)^2/x$ for $a,b\in\reals$
- function $f:X\to\reals$ with $X = \set{(x,y)}{\|x\|_2 < y}\subset \reals^n \times \ppreals$ defined by $$ f(x,y) = -\log(y^2-x^Tx) $$ is self-concordant - can be proved using
- function $f:X\to\reals$ with $X = \set{(x,y)}{|x|^p < y}\subset \reals \times \ppreals$ defined by $$ f(x,y) = -2\log y - \log(y^{2/p}- x^2) $$ where $p\geq1$ is self-concordant - can be proved using
- function $f:X\to\reals$ with $X = \set{(x,y)}{\exp(x) < y}\subset \reals \times \ppreals$ defined by $$ f(x,y) = -\log y - \log(\log y - x) $$ is self-concordant - can be proved using
Properties of self-concordant functions
- note $$ \lambda(x) = \sup_{v\neq 0} \left(v^T \nabla \fobj(x) / \left( v^T \nabla^2 \fobj(x) v \right)^{1/2} \right) $$
Stopping criteria for self-concordant objective functions
- recall $\lambda(x)^2$ provides approximate optimality certificate, i.e., assuming $\fobj$ is well approximated by quadratic function around $x$ $$ \fobj(x) - p^\ast \lessapprox \lambda(x)^2/2 $$
- however, strict convexity together with self-concordance provides proven bound $$ \fobj(x) - p^\ast \leq \lambda(x)^2 $$ for $\lambda(x) \leq 0.68$
- hence can use following stopping criterion for guaranteed bound $$ \lambda(x)^2 \leq \epsilon \quad \Rightarrow \quad \fobj(x) - p^\ast \leq \epsilon $$ for $\epsilon \leq 0.68^2$
Convergence analysis of Newton's method for self-concordant functions
- damped Newton phase - if $\lambda(\xseqk{k})>\eta$ $$ \fobj(\xseqk{k+1}) - \fobj(\xseqk{k}) \leq - \gamma $$
- quadratic convergence phase - if $\lambda(\xseqk{k})\leq\eta$ backtracking line search selects step length $\slenk{k}=1$ $$ 2\lambda(\xseqk{k+1}) \leq \left(2\lambda(\xseqk{k})\right)^2 $$
Equality Constrained Minimization
Equality constrained minimization
- consider equality constrained convex optimization problem, i.e., convex optimization problem with no inequality constraints ($m=0$) $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & Ax = b \end{array} $$ where $A\in\reals^{p\times n}$ and domain of optimization problem is $\optdomain\ = \xobj \subset \reals^n$
- assume
- $\rank A = p<n$, i.e., rows of $A$ are linearly independent
- $\fobj$ is twice-differentiable (hence by definition $\xobj$ is open)
- optimal solution $x^\ast$ exists, i.e., $p^\ast = \inf_{x\in\optfeasset} \fobj(x) = \fobj(x^\ast)$ and $Ax^\ast = b$
Solving KKT for equality constrained minimization
- $x^\ast\in\xobj$ is optimal solution if and only if exists $\nu^\ast\in\reals^p$ satisfying KKT optimality conditions, i.e., $$ \begin{eqnarray*} Ax^\ast = b &&\mbox{\define{primal feasibility equations}} \\ \nabla \fobj(x^\ast) + A^T\nu^\ast = 0 &&\mbox{\define{dual feasibility equations}} \end{eqnarray*} $$
- solving equality constrained problem is equivalent to solving KKT equations
- handful of problem types can be solved analytically
- using unconstrained minimization methods
- can eliminate equality constraints and apply unconstrained minimization methods
- can solve dual problem using unconstrained minimization methods and retrieve primal solution
- will discuss Newton's method directly handling equality constraints
- preserving problem structure such as sparsity
Equality constrained convex quadratic minimization
- equality constrained convex quadratic minimization problem $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) = (1/2)x^T P x + q^Tx \\ \mbox{subject to} & Ax = b \end{array} $$ where $P\in\possemidefset{n}$ and $A\in\reals^{p\times n}$
- important since basis for extension of Newton's method to equality constrained problems
- KKT system $$ Ax^\ast = b \; \& \; Px^\ast + q + A^T\nu^\ast = 0 \; \Leftrightarrow \; \underbrace{ \mattwotwo{P}{A^T}{A}{0} }_{\mbox{\define{KKT matrix}}} \colvectwo{x^\ast}{\nu^\ast} = \colvectwo{-q}{b} $$
- exist primal and dual optimum $(x^\ast,\nu^\ast)$ if and only if KKT system has solution; otherwise, problem is unbounded below
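A minimal numpy sketch of this: form the KKT matrix, solve the linear system, and check both optimality conditions (all data randomly generated for illustration only):

```python
# Solve the equality constrained convex QP
#   minimize (1/2) x^T P x + q^T x   subject to   A x = b
# by solving its KKT system.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)          # positive definite, so KKT matrix is nonsingular
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))  # full row rank with probability one
b = rng.standard_normal(p)

KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

assert np.allclose(A @ x_star, b)                      # primal feasibility
assert np.allclose(P @ x_star + q + A.T @ nu_star, 0)  # dual feasibility
```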
Eliminating equality constraints
-
can solve equality constrained convex optimization
by
- eliminating equality constraints and
- using optimization method for solving unconstrained optimization
- note $$ \optfeasset = \set{x}{Ax=b} = \set{Fz + x_0}{z\in\reals^{n-p}} $$ for some $F\in\reals^{n\times(n-p)}$ where $\range(F) = \nullspace(A)$
- thus original problem equivalent to $$ \begin{array}{ll} \mbox{minimize} & \fobj(Fz + x_0) \end{array} $$
- if $z^\ast$ is optimal solution, $x^\ast = Fz^\ast + x_0$
- optimal dual can be retrieved by $$ \nu^\ast = - (AA^T)^{-1} A\nabla \fobj(x^\ast) $$
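For the quadratic problem above the elimination is fully explicit; a sketch using scipy's `null_space` for $F$ (random data again, nothing here is from the original notes):

```python
# Eliminate A x = b via x = F z + x0 and solve the reduced unconstrained QP;
# for f0(x) = (1/2) x^T P x + q^T x the reduced problem is
#   minimize (1/2) z^T (F^T P F) z + (P x0 + q)^T F z.
import numpy as np
from scipy.linalg import lstsq, null_space

rng = np.random.default_rng(1)
n, p = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

F = null_space(A)                   # columns span nullspace(A), shape (n, n-p)
x0 = lstsq(A, b)[0]                 # any particular solution of A x = b

z_star = np.linalg.solve(F.T @ P @ F, -F.T @ (P @ x0 + q))
x_star = F @ z_star + x0
nu_star = -np.linalg.solve(A @ A.T, A @ (P @ x_star + q))  # optimal dual retrieval

assert np.allclose(A @ x_star, b)
```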
Solving dual problems
- Lagrange dual function of equality constrained problem $$ \begin{eqnarray*} g(\nu) & = & \inf_{x\in\optdomain} \left( \fobj(x) + \nu^T(Ax-b) \right) = -b^T\nu - \sup_{x\in\optdomain} \left((-A^T\nu)^Tx -\fobj(x)\right) \\ & = & -b^T \nu - {\fobj}^\ast(-A^T\nu) \end{eqnarray*} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -b^T \nu - {\fobj}^\ast(-A^T\nu) \end{array} $$
- by assumption, strong duality holds, hence if $\nu^\ast$ is dual optimum $$ g(\nu^\ast) = p^\ast $$
- if dual objective is twice-differentiable, can solve dual problem using unconstrained minimization methods
- primal optimum can be retrieved from dual optimum by minimizing Lagrangian $L(x,\nu^\ast)$ over $x$
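For the equality constrained QP with $P\succ 0$ the conjugate is explicit, ${\fobj}^\ast(y) = (1/2)(y-q)^TP^{-1}(y-q)$, so the dual is an unconstrained concave quadratic in $\nu$; a sketch under that assumption:

```python
# Dual approach for  minimize (1/2) x^T P x + q^T x  s.t.  A x = b  with P > 0:
# maximize g(nu) = -b^T nu - f0*(-A^T nu), then minimize the Lagrangian over x.
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

Pinv = np.linalg.inv(P)
# grad g(nu) = 0  <=>  (A P^{-1} A^T) nu = -(b + A P^{-1} q)
nu_star = np.linalg.solve(A @ Pinv @ A.T, -(b + A @ Pinv @ q))
x_star = Pinv @ (-q - A.T @ nu_star)   # argmin_x L(x, nu_star)

assert np.allclose(A @ x_star, b)      # retrieved primal point is feasible
```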
Newton's method with equality constraints
- finally discuss Newton's method which directly handles equality constraints
- similar to Newton's method for unconstrained minimization
- initial point, however, should be feasible, i.e., $\xseqk{0}\in\xobj$ and $A\xseqk{0} = b$
- Newton step tailored for equality constrained problem
Newton step via second-order approximation
- solve original problem approximately by solving $$ \begin{array}{ll} \mbox{minimize} & \hat{\fobj}(x+\sdir) = \fobj(x) + \nabla \fobj(x)^T \sdir + (1/2) \sdir^T \nabla^2 \fobj(x) \sdir \\ \mbox{subject to} & A(x+\sdir) = b \end{array} $$ where $x\in\optfeasset$
- Newton step for equality constrained minimization problem, defined by solution of KKT system for above convex quadratic minimization problem $$ \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0} \colvectwo{\sdir_\mathrm{nt}}{w} = \colvectwo{-\nabla \fobj(x)}{0} $$ well defined only when KKT matrix is nonsingular
Newton step via solving linearized KKT optimality conditions
- recall KKT optimality conditions for equality constrained convex optimization problem $$ Ax^\ast = b \quad \& \quad \nabla \fobj(x^\ast) + A^T\nu^\ast = 0 $$
- linearize KKT conditions $$ \begin{eqnarray*} && A(x+\sdir) = b \quad \& \quad \nabla \fobj(x) + \nabla^2 \fobj(x) \sdir + A^Tw = 0 \\ &\Leftrightarrow& A\sdir = 0 \quad \& \quad \nabla^2 \fobj(x) \sdir + A^Tw = - \nabla \fobj(x) \end{eqnarray*} $$ where $x\in\optfeasset$
- Newton step defined by above equations is equivalent to that obtained by second-order approximation
Newton decrement for equality constrained minimization
- Newton decrement for equality constrained problem is defined by $$ \lambda(x) = \left(\sdir_\mathrm{nt}^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}\right)^{1/2} $$
- same expression as that for unconstrained minimization, but is different since Newton step $\sdir_\mathrm{nt}$ is different from that for unconstrained minimization, i.e., $\sdir_\mathrm{nt} \neq -\nabla^2 \fobj(x)^{-1} \nabla \fobj(x)$ in general
- however, as before, $$ \fobj(x) - \inf_{\sdir\in\reals^n}\set{\hat{\fobj}(x+\sdir)}{A(x+\sdir)=b} = \lambda(x)^2/2 $$ and $$ \left. \left( \frac{d}{dt}\fobj(x+t\sdir_\mathrm{nt}) \right) \right|_{t=0} = \nabla \fobj(x) ^T \sdir_\mathrm{nt} = - \lambda(x)^2 <0 $$
Feasible Newton's method for equality constrained minimization
- Require: $\fobj$, initial point $x\in \dom \fobj$ with $Ax=b$, tolerance $\epsilon>0$
- loop
- compute Newton step and decrement $\ntsdir(x)$ \& $\lambda(x)$
- stopping criterion - quit if $\lambda(x)^2/2 < \epsilon$
- do line search on $\fobj$ to choose $\slen>0$
- update - $x := x + \slen \ntsdir$
- end loop
- assumes KKT matrix is nonsingular at every step
- is feasible descent method since all iterates are feasible with $\fobj(\xseqk{k+1}) <\fobj(\xseqk{k})$
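A minimal numpy sketch of the loop above; `f0`, `grad`, `hess` are assumed callables, with `f0` returning `np.inf` outside $\dom \fobj$ so that backtracking respects the domain. For example, analytic centering with $\fobj(x) = -\sum \log x_i$ fits with `grad(x) = -1/x` and `hess(x) = np.diag(1/x**2)`.

```python
# Feasible Newton's method for  minimize f0(x)  subject to  A x = b;
# x must be strictly feasible initially (x in dom f0, A x = b).
import numpy as np

def feasible_newton(f0, grad, hess, A, b, x, eps=1e-8, alpha=0.1, beta=0.5):
    p = A.shape[0]
    while True:
        g, H = grad(x), hess(x)
        KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
        sol = np.linalg.solve(KKT, np.concatenate([-g, np.zeros(p)]))
        dx = sol[:x.size]            # Newton step (A dx = 0, so feasibility is kept)
        lam_sq = dx @ H @ dx         # Newton decrement squared, lambda(x)^2
        if lam_sq / 2 < eps:         # stopping criterion
            return x
        t = 1.0                      # backtracking line search on f0; g @ dx = -lam_sq
        while f0(x + t * dx) > f0(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
```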
Assumptions for convergence analysis of feasible Newton's method for equality constrained minimization
- feasibility of initial point - $\xseqk{0}\in\dom \fobj \;\&\; A\xseqk{0}=b$
- sublevel set $S = \set{x\in \dom \fobj}{\fobj(x) \leq \fobj(\xseqk{0}),\; Ax=b}$ is closed
- boundedness of Hessian on $S$ $$ \left( \exists M > 0 \right) \left( \forall x\in S \right) \left( \nabla^2 \fobj(x) \preceq M I \right) $$
- boundedness of KKT matrix on $S$ - corresponds to strong convexity assumption in unconstrained minimization $$ \left( \exists K >0 \right) \left( \forall x \in S \right) \left( \left\| \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0}^{-1} \right\|_2 \leq K \right) $$
- Lipschitz continuity of Hessian on $S$ $$ \left( \exists L > 0 \right) \left( \forall x,y\in S \right) \left( \left\|\nabla^2 \fobj(x) - \nabla^2 \fobj(y)\right\|_2 \leq L \|x-y\|_2 \right) $$
Convergence analysis of feasible Newton's method for equality constrained minimization
- convergence analysis of Newton's method for equality constrained minimization can be done by analyzing unconstrained minimization after eliminating equality constraints
- thus, yields exactly same results as for unconstrained minimization (with different parameter values), i.e.,
- consists of damped Newton phase and quadratic convergence phase
- # iterations required to achieve $\fobj(\xseqk{k})-p^\ast \leq \epsilon$ is $$ \left(\fobj(\xseqk{0})-p^\ast\right)/\gamma + \log_2 \log_2 (\epsilon_0/\epsilon) $$
- # iterations required to achieve $\fobj(\xseqk{k})-p^\ast \leq \epsilon$ for self-concordant functions is also same as for unconstrained minimization $$ \left(\fobj(\xseqk{0}) - p^\ast\right)/{\gamma} + \log_2 \log_2 (1 / \epsilon) $$ where $\gamma = \alpha \beta (1-2\alpha)^2 / (20-8\alpha)$
Newton step at infeasible points
- only assume that $x\in\dom \fobj$ (hence, can be infeasible)
- (as before) linearize KKT conditions $$ \begin{eqnarray*} && A(x+\ntsdir) = b \quad \& \quad \nabla \fobj(x) + \nabla^2 \fobj(x) \ntsdir + A^Tw = 0 \\ &\Leftrightarrow& A\ntsdir = b - Ax \quad \& \quad \nabla^2 \fobj(x) \ntsdir + A^Tw = - \nabla \fobj(x) \\ &\Leftrightarrow& \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0} \colvectwo{\ntsdir}{w} = - \colvectwo{\nabla \fobj(x)}{Ax-b} \end{eqnarray*} $$
- same as feasible Newton step except second component on RHS of KKT system
Interpretation as primal-dual Newton step
- update both primal and dual variables $x$ and $\nu$
- define $r:\reals^n\times\reals^p\to\reals^n\times\reals^p$ by $$ r(x,\nu) = (r_\mathrm{dual}(x,\nu),r_\mathrm{pri}(x,\nu)) $$ where $$ \begin{eqnarray*} \mbox{\define{dual residual}} & - & r_\mathrm{dual}(x,\nu) = \nabla \fobj(x) + A^T\nu \\ \mbox{\define{primal residual}} & - & r_\mathrm{pri}(x,\nu) = Ax-b \end{eqnarray*} $$
Equivalence of infeasible Newton step to primal-dual Newton step
- linearize $r$ to obtain primal-dual Newton step, i.e. $$ \begin{eqnarray*} && r(x,\nu) + D_{x,\nu} r(x,\nu) \colvectwo{\pdsdir}{\pdsdirnu} = 0 \\ &\Leftrightarrow& \mattwotwo{\nabla^2f(x)}{A^T}{A}{0} \colvectwo{\pdsdir}{\pdsdirnu} = - \colvectwo{\nabla f(x) + A^T\nu}{Ax-b} \end{eqnarray*} $$
- letting $\nu^+= \nu + \pdsdirnu$ gives $$ \mattwotwo{\nabla^2f(x)}{A^T}{A}{0} \colvectwo{\pdsdir}{\nu^+} = - \colvectwo{\nabla f(x)}{Ax-b} $$
- equivalent to infeasible Newton step
- reveals that current value of dual variable not needed
Residual norm reduction property
- infeasible Newton step is not descent direction (unlike feasible Newton step) since $$ \begin{eqnarray*} \left. \left( \frac{d}{dt}\fobj(x+t\pdsdir) \right) \right|_{t=0} &=& \nabla \fobj(x) ^T \pdsdir \\ &=& - \pdsdir^T \left(\nabla^2 \fobj(x) \pdsdir + A^Tw \right) = - \pdsdir^T \nabla^2 \fobj(x) \pdsdir + (Ax-b)^Tw \end{eqnarray*} $$ which is not necessarily negative
- however, norm of residual decreases in infeasible Newton direction since $Dr(y)\pdsdiry = -r(y)$ $$ \begin{eqnarray*} \left. \left( \frac{d}{dt} \|r(y+t\pdsdiry)\|_2^2 \right) \right|_{t=0} & = & 2 r(y)^T Dr(y) \pdsdiry = - 2 \|r(y)\|_2^2 \\ \Leftrightarrow \quad \left. \left( \frac{d}{dt} \|r(y+t\pdsdiry)\|_2 \right) \right|_{t=0} & = & \frac{-2\|r(y)\|_2^2}{2\|r(y)\|_2} = - \|r(y)\|_2 \end{eqnarray*} $$ where $y=(x,\nu)$ and $\pdsdiry = (\pdsdir, \pdsdirnu)$
- can use $r(\xseqk{k},\nuseqk{k})$ to measure optimization progress for infeasible Newton's method
Full and damped step feasibility property
- assume step length is $t$ at some iteration, then $$ r_\mathrm{pri}(x^+,\nu^+) = Ax^+-b = A(x + t \pdsdir) - b = (1-t) r_\mathrm{pri}(x,\nu) $$
- hence for $l>k$ $$ \seqk{r}{l} = \left( \prod_{i=k}^{l-1} (1-\seqk{t}{i}) \right) \seqk{r}{k} $$
- primal residual reduced by factor $1-\seqk{t}{k}$ at step $k$
- iterates become (and stay) feasible once full step length ($t=1$) is taken
Infeasible Newton's method for equality constrained minimization
- Require: $\fobj$, initial point $x\in \dom \fobj$ \& $\nu$, tolerance $\epsilon_\mathrm{pri}>0$ \& $\epsilon_\mathrm{dual}>0$
- repeat
- compute primal-dual Newton step $\pdsdir(x)$ \& $\pdsdirnu(x)$
- do line search on $r(x,\nu)$ to choose $\slen>0$
- update - $x := x + \slen \pdsdir$ \& $\nu := \nu + \slen \pdsdirnu$
- until $\|r_\mathrm{dual}(x,\nu)\| \leq \epsilon_\mathrm{dual}$ \& $\|Ax-b\| \leq \epsilon_\mathrm{pri}$
- note similarity and difference of feasible \& infeasible Newton's methods
- line search done not on $\fobj$, but on primal-dual residual $r(x,\nu)$
- stopping criterion depends on $r(x,\nu)$, not on Newton decrement $\lambda(x)^2$
- primal and dual feasibility checked separately - here norm in $\|Ax-b\|$ can be any norm, e.g., $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$, depending on specific application
Line search methods for infeasible Newton's method
- line search method for infeasible Newton's method is backtracking line search with $\fobj$ replaced by $\|r(x,\nu)\|_2$
- but has special form - see below (a numpy sketch combining the algorithm above with this line search follows the pseudocode)
- Require: \pdsdir, \pdsdirnu, $\alpha\in(0,0.5)$, $\beta\in(0,1)$
- $\slen:=1$
- while $\|r(x +\slen\pdsdir, \nu + \slen\pdsdirnu)\|_2 > (1-\alpha \slen)\|r(x,\nu)\|_2$ do
- $\slen := \beta \slen$
- end while
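Putting the algorithm and the residual line search together, a minimal numpy sketch (one tolerance `eps` instead of separate $\epsilon_\mathrm{pri}$ \& $\epsilon_\mathrm{dual}$, and $\dom \fobj = \reals^n$ assumed so no domain checks are needed):

```python
# Infeasible-start Newton's method for  minimize f0(x)  subject to  A x = b;
# iterates need not satisfy A x = b, and backtracking is on ||r(x, nu)||_2.
import numpy as np

def infeasible_newton(grad, hess, A, b, x, nu, eps=1e-8, alpha=0.1, beta=0.5):
    def rnorm(x, nu):                 # ||(r_dual, r_pri)||_2
        return np.linalg.norm(np.concatenate([grad(x) + A.T @ nu, A @ x - b]))
    n, p = x.size, A.shape[0]
    while True:
        KKT = np.block([[hess(x), A.T], [A, np.zeros((p, p))]])
        rhs = -np.concatenate([grad(x) + A.T @ nu, A @ x - b])
        sol = np.linalg.solve(KKT, rhs)
        dx, dnu = sol[:n], sol[n:]
        t = 1.0                       # backtracking on the residual norm
        while rnorm(x + t * dx, nu + t * dnu) > (1 - alpha * t) * rnorm(x, nu):
            t *= beta
        x, nu = x + t * dx, nu + t * dnu
        if rnorm(x, nu) <= eps:       # combines primal and dual criteria
            return x, nu
```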
Pros and cons of infeasible Newton's method
- pros
- do not need to find feasible point separately
- if step length is one at any iteration, following steps coincide with feasible Newton's method - could switch to feasible Newton's method
- cons
- exists no clear way to detect infeasibility - primal residual may decrease slowly (phase I method in interior-point methods resolves this problem)
- convergence of infeasible Newton's method can be very slow (until feasibility is achieved)
Assumptions for convergence analysis of infeasible Newton's method for equality constrained minimization
- sublevel set $S = \bigset{(x,\nu)\in \dom \fobj\times \reals^p}{ \|r(x,\nu)\|_2 \leq \|r(\xseqk{0},\nuseqk{0})\|_2 }$ is closed, which always holds if $\|r\|_2$ is closed function
- boundedness of KKT matrix on $S$ $$ \left( \exists K >0 \right) \left( \forall (x,\nu) \in S \right) \left( \left\| Dr(x,\nu)^{-1} \right\|_2 = \left\| \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0}^{-1} \right\|_2 \leq K \right) $$
- Lipschitz continuity of Hessian on $S$ $$ \left( \exists L > 0 \right) \left( \forall (x,\nu), (y,\mu)\in S \right) \left( \left\|Dr(x,\nu) - Dr(y,\mu)\right\|_2 \leq L \|(x,\nu) - (y,\mu)\|_2 \right) $$
- above assumptions imply $\set{x\in\dom \fobj}{Ax=b}\neq\emptyset$ and exist optimal point $(x^\ast,\nu^\ast)$
Convergence analysis of infeasible Newton's method for equality constrained minimization
- very similar to that for Newton's method for unconstrained minimization
- consists of two phases - like unconstrained minimization or feasible equality constrained Newton's method
- damped Newton phase - if $\|r(\xseqk{k},\nuseqk{k})\|_2> 1/(K^2L)$ $$ \|r(\xseqk{k+1},\nuseqk{k+1})\|_2 \leq \|r(\xseqk{k},\nuseqk{k})\|_2 - \alpha \beta / K^2L $$
- quadratic convergence phase - if $\|r(\xseqk{k},\nuseqk{k})\|_2 \leq 1/(K^2L)$ $$ \left( K^2L \|r(\xseqk{k},\nuseqk{k})\|_2 / 2 \right) \leq \left( K^2L \|r(\xseqk{k-1},\nuseqk{k-1})\|_2 / 2 \right)^2 \leq \cdots \leq (1/2)^{2^k} $$
- # iterations of infeasible Newton's method required to satisfy $\|r(\xseqk{k},\nuseqk{k})\|_2\leq\epsilon$ $$ \|r(\xseqk{0},\nuseqk{0})\| /(\alpha \beta / K^2L) + \log_2 \log_2 (\epsilon_0/\epsilon) \quad \mbox{where}\; \epsilon_0 = 2/(K^2L) $$
- $(\xseqk{k},\nuseqk{k})$ converges to $(x^\ast,\nu^\ast)$
Barrier Interior-point Methods
Interior-point methods
- want to solve inequality constrained minimization problem
- interior-point methods solve convex optimization problem or its KKT optimality conditions by applying Newton's method to sequence of
- equality constrained problems or
- modified versions of KKT optimality conditions
- discuss interior-point barrier method \& interior-point primal-dual method
- hierarchy of convex optimization algorithms
- simplest - linear equality constrained quadratic program - can solve analytically
- Newton's method - solve linear equality constrained convex optimization problem by solving sequence of linear equality constrained quadratic programs
- interior-point methods - solve linear equality \& convex inequality constrained problem by solving sequence of linear equality constrained convex optimization problems
Indicator function barriers
- approximate general convex inequality constrained problem as linear equality constrained problem
- make inequality constraints implicit in objective function $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) + \sum I_-(\fie(x)) \\ \mbox{subject to} & Ax=b \end{array} $$ where $I_-:\reals\to \reals$ is indicator function for nonpositive real numbers, i.e. $$ I_{-}(u) = \left\{\begin{array}{ll} 0 & u\leq 0 \\ \infty & u> 0 \end{array}\right. $$
Logarithmic barriers
- approximate indicator function by logarithmic function $$ \hat{I}_-(u) = -(1/t) \log(-u), \quad \dom \hat{I}_- = -\ppreals $$ for $t>0$ to obtain $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) + \sum -(1/t) \log(-\fie(x)) \\ \mbox{subject to} & Ax=b \end{array} $$
- objective function is convex due to composition rule for convexity preservation, and differentiable
- hence, can use Newton's method to solve it
- function $\phi$ defined by $$ \phi(x) = - \sum \log(-\fie(x)) $$ with $\dom \phi = \set{x\in\xdomain}{\fie(x) \prec 0}$ called logarithmic barrier or log barrier
- solve sequence of log barrier problems as we increase $t$
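For linear inequalities $Ax\preceq b$ the barrier and its derivatives are simple enough to write out directly; a numpy sketch (the formulas follow from $\phi(x) = -\sum\log(b_i - a_i^Tx)$):

```python
# Log barrier for A x <= b: value, gradient, and Hessian.
#   phi(x) = -sum_i log(b_i - a_i^T x)
#   grad   =  A^T d            with d_i = 1/(b_i - a_i^T x)
#   hess   =  A^T diag(d)^2 A
import numpy as np

def log_barrier(A, b, x):
    s = b - A @ x                      # slacks; requires A x < b componentwise
    if np.any(s <= 0):
        return np.inf, None, None      # x outside dom phi
    d = 1.0 / s
    return -np.log(s).sum(), A.T @ d, A.T @ np.diag(d ** 2) @ A
```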
Central path
- optimization problem $$ \begin{array}{ll} \mbox{minimize} & t \fobj(x) + \phi(x) \\ \mbox{subject to} & Ax = b \end{array} $$ with $t>0$ where $$ \phi(x) = - \sum \log(-\fie(x)) $$
- solution of above problem, called central point, denoted by $x^\ast(t)$, set of central points, called central path
- intuition says $x^\ast(t)$ will converge to $x^\ast$ as $t\to\infty$
- KKT conditions imply $$ Ax^\ast(t) = b \quad \fie(x^\ast(t)) \prec 0 $$ and exists $\nu^\ast(t)$ such that $$ \begin{eqnarray*} 0 &=& t \nabla \fobj(x^\ast(t)) + \nabla \phi(x^\ast(t)) + t A^T \nu^\ast(t) \\ &=& t\nabla \fobj(x^\ast(t)) - \sum \frac{1}{\fie_i(x^\ast(t))} \nabla\fie_i(x^\ast(t)) + t A^T \nu^\ast(t) \end{eqnarray*} $$
- thus if we let $\lambda_i^\ast(t) = -1/(t\fie_i(x^\ast(t)))$, $x^\ast(t)$ minimizes $$ L(x,\lambda^\ast(t),\nu^\ast(t)) = \fobj(x) + {\lambda^\ast(t)}^T \fie(x) + {\nu^\ast(t)}^T (Ax-b) $$ where $L$ is Lagrangian of original problem
- hence, dual function $g(\lambda^\ast(t),\nu^\ast(t))$ is finite and $$ \begin{eqnarray*} g(\lambda^\ast(t), \nu^\ast(t)) &=& \inf_{x\in\xdomain} L(x,\lambda^\ast(t),\nu^\ast(t)) = L(x^\ast(t),\lambda^\ast(t),\nu^\ast(t)) \\ & = & \fobj(x^\ast(t)) + {\lambda^\ast(t)}^T \fie(x^\ast(t)) + {\nu^\ast(t)}^T (Ax^\ast(t)-b) = \fobj(x^\ast(t)) - m/t \end{eqnarray*} $$ and $$ \fobj(x^\ast(t)) - p^\ast \leq \fobj(x^\ast(t)) - g(\lambda^\ast(t), \nu^\ast(t)) = m/t $$
- that is, $x^\ast(t)$ is no more than $m/t$-suboptimal
- which confirms our intuition that $x^\ast(t)\to x^\ast$ as $t\to\infty$
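- tiny worked example - for $\mbox{minimize } x$ subject to $-x\leq 0$ (so $m=1$, $p^\ast=0$), central point minimizes $tx-\log x$ over $\ppreals$, giving $x^\ast(t)=1/t$ and $\fobj(x^\ast(t))-p^\ast = 1/t = m/t$ exactly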
Central path interpretation via KKT conditions
- previous arguments imply that $x$ is central point, i.e., $x=x^\ast(t)$ for some $t>0$ if and only if exist $\lambda$ and $\nu$ such that $$ \begin{eqnarray*} Ax=b \quad \fie({x}) &\preceq& 0 \quad \mbox{- primal feasibility} \\ \lambda &\succeq& 0 \quad \mbox{- dual feasibility} \\ - \lambda_i \fie_i({x}) &=& 1/t \quad \mbox{- complementary $1/t$-slackness} \\ \nabla_x L(x,\lambda,\nu) &=& 0 \quad \mbox{- vanishing gradient of Lagrangian} \end{eqnarray*} $$ called centrality conditions
- only difference between centrality conditions and KKT conditions is complementary $1/t$-slackness
- note that I've just made up term “complementary $1/t$-slackness'' - you won't be able to find this terminology in any literature
- for large $t$, $\lambda^\ast(t)$ & $\nu^\ast(t)$ very closely satisfy (true) complementary slackness
Central path interpretation via force field
- assume exist no equality constraints
- interpret $\phi$ as potential energy by some force field, e.g., electrical field and $t\fobj$ as potential energy by some other force field, e.g., gravity
- then
- force by first force field (in $n$-dimensional space), which we call barrier force, is $$ - \nabla \phi(x) = \sum \frac{1}{\fie_i(x)} \nabla \fie_i(x) $$
- force by second force field, which we call objective force, is $$ - \nabla (t\fobj(x)) = -t \nabla \fobj(x) $$
- $x^\ast(t)$ is point where two forces exactly balance each other
- as $x$ approaches boundary, barrier force pushes $x$ away from boundary harder
- as $t$ increases, objective force pushes $x$ harder toward point where objective potential energy is minimized
Equality constrained problem using log barrier
- central point $x^\ast(t)$ is $m/t$-suboptimal point guaranteed by optimality certificate $g(\lambda^\ast(t),\nu^\ast(t))$
- hence solving below problem provides solution with $\epsilon$-suboptimality $$ \begin{array}{ll} \mbox{minimize} & (m/\epsilon) \fobj(x) + \phi(x) \\ \mbox{subject to} & Ax=b \end{array} $$
- but works only for small problems since for large $m/\epsilon$, objective function is ill-behaved
Barrier methods
- Require: strictly feasible $x$, $t>0$, $\mu>1$, tolerance $\epsilon>0$
- loop
- centering step - find $x^\ast(t)$ by minimizing $t\fobj + \phi$ subject to $Ax=b$ starting at $x$
- (optionally) compute $\lambda^\ast(t)$ \& $\nu^\ast(t)$
- stopping criterion - quit if $m/t<\epsilon$
- increase $t$ - $t := \mu t$
- update $x$ - $x := x^\ast(t)$
- end loop
- barrier method, also called path-following method, solves sequence of equality constrained optimization problems with log barrier
- when first proposed by Fiacco and McCormick in 1960s, it was called sequential unconstrained minimization technique (SUMT)
- centering step also called outer iteration
- each iteration of algorithm used to solve equality constrained problem called inner iteration
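The outer loop is simple enough to sketch directly; `centering(t, x)` stands for any solver of the centering problem (e.g., feasible Newton's method above) warm-started at $x$ - an assumed callable, not a fixed API:

```python
# Outer loop of the barrier method; centering(t, x) returns (an approximation
# of) x*(t), e.g., Newton's method on  minimize t*f0 + phi  s.t.  A x = b.
def barrier_method(centering, x, m, t=1.0, mu=10.0, eps=1e-6):
    while True:
        x = centering(t, x)   # centering step (outer iteration)
        if m / t < eps:       # central point x*(t) is m/t-suboptimal
            return x
        t *= mu               # increase t
```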
Accuracy in centering in barrier method
- accuracy of centering
- only goal of centering is getting close to $x^\ast$, hence exact computation of $x^\ast(t)$ not critical as long as approximations of $x^\ast(t)$ go to $x^\ast$
- while cannot calculate $g(\lambda,\nu)$ in this case, below provides dual feasible point when Newton step $\ntsdir$ for centering problem is small, i.e., for nearly centered $x$ $$ \tilde{\lambda}_i = -\frac{1}{t\fie_i(x)} \left( 1 - \frac{\nabla \fie_i(x)^T \ntsdir}{\fie_i(x)} \right) $$
Choices of parameters of barrier method
- choice of $\mu$
- $\mu$ determines aggressiveness of $t$-update
- larger $\mu$, fewer outer iterations, but more inner iterations
- smaller $\mu$, more outer iterations, but fewer inner iterations
- values from $10$ to $20$ for $\mu$ seem to work well
- candidates for choice of initial $t$ - choose $\seqk{t}{0}$ such that
- $$ m / \seqk{t}{0} \approx \fobj(\xseqk{0}) - p^\ast $$
- central path condition maximally satisfied $$ \seqk{t}{0} = \arginf_{t} \inf_{\tilde{\nu}} \left\| t \nabla \fobj(\xseqk{0}) + \nabla \phi(\xseqk{0}) + A^T \tilde{\nu} \right\| $$
Convergence analysis of barrier method
- assuming $t\fobj + \phi$ can be minimized by Newton's method for $t = \seqk{t}{0}$, $\mu\seqk{t}{0}$, $\mu^2\seqk{t}{0}$, $\ldots$
- at $k$'th step, duality gap achieved is $m/(\mu^k\seqk{t}{0})$
- # centering steps required to achieve accuracy of $\epsilon$ is $$ \left\lceil \frac{\log \left(m/\epsilon \seqk{t}{0}\right)}{\log \mu} \right\rceil $$ plus one (initial centering step)
- for convergence of centering
- for feasible centering problem, $t\fobj + \phi$ should satisfy conditions for feasible Newton's method above, i.e., initial sublevel set is closed, associated inverse KKT matrix is bounded \& Hessian satisfies Lipschitz condition
- for infeasible centering problem, $t\fobj + \phi$ should satisfy conditions for infeasible Newton's method above
Primal-dual Interior-point Methods
Primal-dual \& barrier interior-point methods
- in primal-dual interior-point methods
- both primal and dual variables are updated at each iteration
- search directions are obtained from Newton's method, applied to modified KKT equations, i.e., optimality conditions for logarithmic barrier centering problem
- primal-dual search directions are similar to, but not quite the same as, search directions arising in barrier methods
- primal and dual iterates are not necessarily feasible
- primal-dual interior-point methods
- often more efficient than barrier methods especially when high accuracy is required - can exhibit better than linear convergence
- (customized versions) outperform barrier method for several basic problem classes, such as LP, QP, SOCP, GP, SDP
- can work for feasible, but not strictly feasible problems
- still active research topic, but show great promise
Modified KKT conditions and central points
- modified KKT conditions (for general convex optimization problem) expressed as $r_t(x,\lambda,\nu) = 0$ where $$ r_t(x,\lambda,\nu) = \colvecthree {\nabla \fobj(x) + D\fie(x)^T\lambda + A^T\nu} {-\diag(\lambda)\fie(x) - (1/t) \ones} {Ax-b} $$ with $$ \begin{eqnarray*} \mbox{\define{dual residual}} &-& r_\mathrm{dual}(x,\lambda,\nu) = {\nabla \fobj(x) + D\fie(x)^T\lambda + A^T\nu} \\ \mbox{\define{centrality residual}} &-& r_\mathrm{cent}(x,\lambda,\nu) = {-\diag(\lambda)\fie(x) - (1/t) \ones} \\ \mbox{\define{primal residual}} &-& r_\mathrm{pri}(x,\lambda,\nu) = {Ax-b} \end{eqnarray*} $$
-
if $x$, $\lambda$, $\nu$ satisfy $r_t(x,\lambda,\nu)=0$ (and $\fie(x) \prec 0$),
then
- $x=x^\ast(t)$, $\lambda=\lambda^\ast(t)$, $\nu=\nu^\ast(t)$
- $x$ is primal feasible and $\lambda$ & $\nu$ are dual feasible with duality gap $m/t$
Primal-dual search direction
- assume current (primal-dual) point $y=(x,\lambda,\nu)$ and Newton step $\sdiry = (\sdir, \sdirlbd, \sdirnu)$
- as before, linearize equation to obtain Newton step, i.e. $$ r_t(y+\sdiry) \approx r_t(y) + Dr_t(y) \sdiry = 0 \quad \Leftrightarrow \quad \sdiry = -Dr_t(y)^{-1} r_t(y) $$ hence $$ \begin{my-matrix}{ccc} \nabla^2 f(x) + \sum \lambda_i \nabla^2 \fie_i(x) & D\fie(x)^T & A^T \\ -\diag(\lambda) D\fie(x) & -\diag(\fie(x)) & 0 \\ A & 0 & 0 \end{my-matrix} \colvecthree{\sdir}{\sdirlbd}{\sdirnu} = - \colvecthree {r_\mathrm{dual}} {r_\mathrm{cent}} {r_\mathrm{pri}} $$
- above equation determines primal-dual search direction $\pdsdiry = (\pdsdir, \pdsdirlbd, \pdsdirnu)$
Surrogate duality gap
- iterates $\xseqk{k}$, $\lbdseqk{k}$, and $\nuseqk{k}$ of primal-dual interior-point method are not necessarily feasible
- hence, cannot easily evaluate duality gap $\seqk{\eta}{k}$ as for barrier method
- define surrogate duality gap for $\fie(x) \prec 0$ and $\lambda\succeq0$ as $$ \hat{\eta}(x,\lambda) = - \fie(x)^T \lambda $$
- $\hat{\eta}$ would be duality gap if $x$ were primal feasible and $\lambda$ & $\nu$ were dual feasible
- value $t$ corresponding to surrogate duality gap $\hat{\eta}$ is $m/\hat{\eta}$
Primal-dual interior-point method
- Require: initial point $x$ with $\fie(x)\prec0$, $\lambda \succ 0$, $\mu > 1$, $\epsilon_\mathrm{pri}>0$, $\epsilon_\mathrm{dual}>0$, $\epsilon>0$
- repeat
- set $t := \mu m /\hat{\eta}$
- compute primal-dual search direction $\pdsdiry = (\pdsdir, \pdsdirlbd, \pdsdirnu)$
- do line search to choose $s>0$
- update - $x := x + s \pdsdir$, $\lambda := \lambda + s \pdsdirlbd$, $\nu := \nu + s \pdsdirnu$
- until $\|r_\mathrm{pri}(x,\lambda,\nu)\|_2\leq \epsilon_\mathrm{pri}$, $\|r_\mathrm{dual}(x,\lambda,\nu)\|_2\leq \epsilon_\mathrm{dual}$, $\hat{\eta} \leq \epsilon$
- common to choose small $\epsilon_\mathrm{pri}$, $\epsilon_\mathrm{dual}$, & $\epsilon$ since primal-dual method often shows faster than linear convergence
Line search for primal-dual interior-point method
- line search is standard backtracking line search on $\|r(x,\lambda,\nu)\|_2$, similar to that for infeasible Newton's method, except making sure that $\fie(x) \prec 0$ and $\lambda\succ0$
- note initial $s$ below is $0.99$ times largest $s\in[0,1]$ that keeps $\lambda + s\pdsdirlbd \succeq 0$
- Require: \pdsdir, \pdsdirlbd, \pdsdirnu, $\alpha\in(0.01,0.1)$, $\beta\in(0.3,0.8)$
- $s := 0.99\sup\set{s\in[0,1]}{\lambda + s \pdsdirlbd \succeq 0} = 0.99\min\{1,\min\set{-\lambda_i/\pdsdirlbd_i}{\pdsdirlbd_i < 0}\}$
- while $\fie (x +s\pdsdir) \not \prec 0$ do
- $s := \beta s$
- end while
- while $\|r(x +s\pdsdir, \lambda + s\pdsdirlbd, \nu + s\pdsdirnu)\|_2 > (1-\alpha s)\|r(x,\lambda,\nu)\|_2$ do
- $s := \beta s$
- end while
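As a concrete instance, a numpy sketch of the method and line search above for the inequality-form LP $\mbox{minimize } c^Tx$ subject to $Gx \preceq h$ (no equality constraints, so $\nu$ and $r_\mathrm{pri}$ drop out, and the Hessian block of the linearized KKT matrix vanishes); an illustrative sketch, not production code:

```python
# Primal-dual interior-point sketch for the LP  minimize c^T x  s.t.  G x <= h.
# Requires a strictly feasible x (G x < h) and lam > 0.
import numpy as np

def pd_interior_point(c, G, h, x, lam, mu=10.0, eps=1e-8, alpha=0.05, beta=0.5):
    m, n = G.shape

    def residual(x, lam, t):
        f = G @ x - h                                 # constraint values, f < 0
        return np.concatenate([c + G.T @ lam,         # r_dual
                               -lam * f - 1.0 / t])   # r_cent

    while True:
        f = G @ x - h
        eta = -f @ lam                                # surrogate duality gap
        if eta <= eps and np.linalg.norm(c + G.T @ lam) <= eps:
            return x, lam
        t = mu * m / eta                              # t := mu * m / eta_hat
        r = residual(x, lam, t)
        # linearized modified KKT system (Hessian block is zero for an LP)
        KKT = np.block([[np.zeros((n, n)), G.T],
                        [-lam[:, None] * G, -np.diag(f)]])
        dx, dlam = np.split(np.linalg.solve(KKT, -r), [n])
        # start from 0.99 times the largest s in [0,1] keeping lam + s*dlam >= 0
        neg = dlam < 0
        s = 0.99 * min(1.0, (-lam[neg] / dlam[neg]).min() if neg.any() else 1.0)
        while np.any(G @ (x + s * dx) >= h):          # keep f(x) strictly negative
            s *= beta
        while (np.linalg.norm(residual(x + s * dx, lam + s * dlam, t))
               > (1 - alpha * s) * np.linalg.norm(r)):
            s *= beta
        x, lam = x + s * dx, lam + s * dlam
```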