Convex Optimization
posted: 01-Aug-2025 & updated: 03-Aug-2025
\[% \newcommand{\algA}{\algk{A}} \newcommand{\algC}{\algk{C}} \newcommand{\bigtimes}{\times} \newcommand{\compl}[1]{\tilde{#1}} \newcommand{\complexes}{\mathbb{C}} \newcommand{\dom}{\mathop{\bf dom {}}} \newcommand{\ereals}{\reals\cup\{-\infty,\infty\}} \newcommand{\field}{\mathbb{F}} \newcommand{\integers}{\mathbb{Z}} \newcommand{\lbdseqk}[1]{\seqk{\lambda}{#1}} \newcommand{\meas}[3]{({#1}, {#2}, {#3})} \newcommand{\measu}[2]{({#1}, {#2})} \newcommand{\meast}[3]{\left({#1}, {#2}, {#3}\right)} \newcommand{\naturals}{\mathbb{N}} \newcommand{\nuseqk}[1]{\seqk{\nu}{#1}} \newcommand{\pair}[2]{\langle {#1}, {#2}\rangle} \newcommand{\rationals}{\mathbb{Q}} \newcommand{\reals}{\mathbb{R}} \newcommand{\seq}[1]{\left\langle{#1}\right\rangle} \newcommand{\powerset}{\mathcal{P}} \newcommand{\pprealk}[1]{\reals_{++}^{#1}} \newcommand{\ppreals}{\mathbb{R}_{++}} \newcommand{\prealk}[1]{\reals_{+}^{#1}} \newcommand{\preals}{\mathbb{R}_+} \newcommand{\tXJ}{\topos{X}{J}} % \newcommand{\relint}{\mathop{\bf relint {}}} \newcommand{\boundary}{\mathop{\bf bd {}}} \newcommand{\subsetset}[1]{\mathcal{#1}} \newcommand{\Tr}{\mathcal{\bf Tr}} \newcommand{\symset}[1]{\mathbf{S}^{#1}} \newcommand{\possemidefset}[1]{\mathbf{S}_+^{#1}} \newcommand{\posdefset}[1]{\mathbf{S}_{++}^{#1}} \newcommand{\ones}{\mathbf{1}} \newcommand{\Prob}{\mathop{\bf Prob {}}} \newcommand{\prob}[1]{\Prob\left\{#1\right\}} \newcommand{\Expect}{\mathop{\bf E {}}} \newcommand{\Var}{\mathop{\bf Var{}}} \newcommand{\Mod}[1]{\;(\text{mod}\;#1)} \newcommand{\ball}[2]{B(#1,#2)} \newcommand{\generates}[1]{\langle {#1} \rangle} \newcommand{\isomorph}{\approx} \newcommand{\isomorph}{\approx} \newcommand{\nullspace}{\mathcalfont{N}} \newcommand{\range}{\mathcalfont{R}} \newcommand{\diag}{\mathop{\bf diag {}}} \newcommand{\rank}{\mathop{\bf rank {}}} \newcommand{\Ker}{\mathop{\mathrm{Ker} {}}} \newcommand{\Map}{\mathop{\mathrm{Map} {}}} \newcommand{\End}{\mathop{\mathrm{End} {}}} \newcommand{\Img}{\mathop{\mathrm{Im} {}}} 
\newcommand{\Aut}{\mathop{\mathrm{Aut} {}}} \newcommand{\Gal}{\mathop{\mathrm{Gal} {}}} \newcommand{\Irr}{\mathop{\mathrm{Irr} {}}} \newcommand{\arginf}{\mathop{\mathrm{arginf}}} \newcommand{\argsup}{\mathop{\mathrm{argsup}}} \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\ev}{\mathop{\mathrm{ev} {}}} \newcommand{\affinehull}{\mathop{\bf aff {}}} \newcommand{\cvxhull}{\mathop{\bf Conv {}}} \newcommand{\epi}{\mathop{\bf epi {}}} \newcommand{\injhomeo}{\hookrightarrow} \newcommand{\perm}[1]{\text{Perm}(#1)} \newcommand{\aut}[1]{\text{Aut}(#1)} \newcommand{\ideal}[1]{\mathfrak{#1}} \newcommand{\bigset}[2]{\left\{#1\left|{#2}\right.\right\}} \newcommand{\bigsetl}[2]{\left\{\left.{#1}\right|{#2}\right\}} \newcommand{\primefield}[1]{\field_{#1}} \newcommand{\dimext}[2]{[#1:{#2}]} \newcommand{\restrict}[2]{#1|{#2}} \newcommand{\algclosure}[1]{#1^\mathrm{a}} \newcommand{\finitefield}[2]{\field_{#1^{#2}}} \newcommand{\frobmap}[2]{\varphi_{#1,{#2}}} % %\newcommand{\algfontmode}{} % %\ifdefined\algfontmode %\newcommand\mathalgfont[1]{\mathcal{#1}} %\newcommand\mathcalfont[1]{\mathscr{#1}} %\else \newcommand\mathalgfont[1]{\mathscr{#1}} \newcommand\mathcalfont[1]{\mathcal{#1}} %\fi % %\def\DeltaSirDir{yes} %\newcommand\sdirletter[2]{\ifthenelse{\equal{\DeltaSirDir}{yes}}{\ensuremath{\Delta #1}}{\ensuremath{#2}}} \newcommand{\sdirletter}[2]{\Delta #1} \newcommand{\sdirlbd}{\sdirletter{\lambda}{\Delta \lambda}} \newcommand{\sdir}{\sdirletter{x}{v}} \newcommand{\seqk}[2]{#1^{(#2)}} \newcommand{\seqscr}[3]{\seq{#1}_{#2}^{#3}} \newcommand{\xseqk}[1]{\seqk{x}{#1}} \newcommand{\sdirk}[1]{\seqk{\sdir}{#1}} \newcommand{\sdiry}{\sdirletter{y}{\Delta y}} \newcommand{\slen}{t} \newcommand{\slenk}[1]{\seqk{\slen}{#1}} \newcommand{\ntsdir}{\sdir_\mathrm{nt}} \newcommand{\pdsdir}{\sdir_\mathrm{pd}} \newcommand{\sdirnu}{\sdirletter{\nu}{w}} \newcommand{\pdsdirnu}{\sdirnu_\mathrm{pd}} \newcommand{\pdsdiry}{\sdiry_\mathrm{pd}} \newcommand\pdsdirlbd{\sdirlbd_\mathrm{pd}} % 
\newcommand{\normal}{\mathcalfont{N}} % \newcommand{\algk}[1]{\mathalgfont{#1}} \newcommand{\collk}[1]{\mathcalfont{#1}} \newcommand{\classk}[1]{\collk{#1}} \newcommand{\indexedcol}[1]{\{#1\}} \newcommand{\rel}{\mathbf{R}} \newcommand{\relxy}[2]{#1\;\rel\;{#2}} \newcommand{\innerp}[2]{\langle{#1},{#2}\rangle} \newcommand{\innerpt}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\closure}[1]{\overline{#1}} \newcommand{\support}{\mathbf{support}} \newcommand{\set}[2]{\{#1|#2\}} \newcommand{\metrics}[2]{\langle {#1}, {#2}\rangle} \newcommand{\interior}[1]{#1^\circ} \newcommand{\topol}[1]{\mathfrak{#1}} \newcommand{\topos}[2]{\langle {#1}, \topol{#2}\rangle} % topological space % \newcommand{\alg}{\algk{A}} \newcommand{\algB}{\algk{B}} \newcommand{\algF}{\algk{F}} \newcommand{\algR}{\algk{R}} \newcommand{\algX}{\algk{X}} \newcommand{\algY}{\algk{Y}} % \newcommand\coll{\collk{C}} \newcommand\collB{\collk{B}} \newcommand\collF{\collk{F}} \newcommand\collG{\collk{G}} \newcommand{\tJ}{\topol{J}} \newcommand{\tS}{\topol{S}} \newcommand\openconv{\collk{U}} % \newenvironment{my-matrix}[1]{\begin{bmatrix}}{\end{bmatrix}} \newcommand{\colvectwo}[2]{\begin{my-matrix}{c}{#1}\\{#2}\end{my-matrix}} \newcommand{\colvecthree}[3]{\begin{my-matrix}{c}{#1}\\{#2}\\{#3}\end{my-matrix}} \newcommand{\rowvecthree}[3]{\begin{bmatrix}{#1}&{#2}&{#3}\end{bmatrix}} \newcommand{\mattwotwo}[4]{\begin{bmatrix}{#1}&{#2}\\{#3}&{#4}\end{bmatrix}} % \newcommand\optfdk[2]{#1^\mathrm{#2}} \newcommand\tildeoptfdk[2]{\tilde{#1}^\mathrm{#2}} \newcommand\fobj{\optfdk{f}{obj}} \newcommand\fie{\optfdk{f}{ie}} \newcommand\feq{\optfdk{f}{eq}} \newcommand\tildefobj{\tildeoptfdk{f}{obj}} \newcommand\tildefie{\tildeoptfdk{f}{ie}} \newcommand\tildefeq{\tildeoptfdk{f}{eq}} \newcommand\xdomain{\mathcalfont{X}} \newcommand\xobj{\optfdk{\xdomain}{obj}} \newcommand\xie{\optfdk{\xdomain}{ie}} \newcommand\xeq{\optfdk{\xdomain}{eq}} \newcommand\optdomain{\mathcalfont{D}} \newcommand\optfeasset{\mathcalfont{F}} % 
\newcommand{\bigpropercone}{\mathcalfont{K}} % \newcommand{\prescript}[3]{\;^{#1}{#3}} % %\]
Introduction
Preamble
Notations
-
sets of numbers
- $\naturals$ - set of natural numbers
- $\integers$ - set of integers
- $\integers_+$ - set of nonnegative integers
- $\rationals$ - set of rational numbers
- $\reals$ - set of real numbers
- $\preals$ - set of nonnegative real numbers
- $\ppreals$ - set of positive real numbers
- $\complexes$ - set of complex numbers
-
sequences $\seq{x_i}$ and the like
- finite $\seq{x_i}_{i=1}^n$, infinite $\seq{x_i}_{i=1}^\infty$ - use $\seq{x_i}$ whenever unambiguously understood
- similarly for other operations, e.g., $\sum x_i$, $\prod x_i$, $\cup A_i$, $\cap A_i$, $\bigtimes A_i$
- similarly for integrals, e.g., $\int f$ for $\int_{-\infty}^\infty f$
-
sets
- $\compl{A}$ - complement of $A$
- $A\sim B$ - $A\cap \compl{B}$
- $A\Delta B$ - $(A\cap \compl{B}) \cup (\compl{A} \cap B)$
- $\powerset(A)$ - set of all subsets of $A$
-
sets in metric vector spaces
- $\closure{A}$ - closure of set $A$
- $\interior{A}$ - interior of set $A$
- $\relint A$ - relative interior of set $A$
- $\boundary A$ - boundary of set $A$
-
set algebra
- $\sigma(\subsetset{A})$ - $\sigma$-algebra generated by $\subsetset{A}$, i.e., smallest $\sigma$-algebra containing $\subsetset{A}$
-
norms in $\reals^n$
- $\|x\|_p$ ($p\geq1$) - $p$-norm of $x\in\reals^n$, i.e., $(|x_1|^p + \cdots + |x_n|^p)^{1/p}$
- e.g., $\|x\|_2$ - Euclidean norm
-
matrices and vectors
- $a_{i}$ - $i$-th entry of vector $a$
- $A_{ij}$ - entry of matrix $A$ at position $(i,j)$, i.e., entry in $i$-th row and $j$-th column
- $\Tr(A)$ - trace of $A \in\reals^{n\times n}$, i.e., $A_{1,1}+ \cdots + A_{n,n}$
-
symmetric, positive definite, and positive semi-definite matrices
- $\symset{n}\subset \reals^{n\times n}$ - set of symmetric matrices
- $\possemidefset{n}\subset \symset{n}$ - set of positive semi-definite matrices; $A\succeq0 \Leftrightarrow A \in \possemidefset{n}$
- $\posdefset{n}\subset \symset{n}$ - set of positive definite matrices; $A\succ0 \Leftrightarrow A \in \posdefset{n}$
-
sometimes,
use Python script-like notations
(with serious abuse of mathematical notations)
-
use $f:\reals\to\reals$ as if it were $f:\reals^n \to \reals^n$,
e.g.,
$$
\exp(x) = (\exp(x_1), \ldots, \exp(x_n)) \quad \mbox{for } x\in\reals^n
$$
and
$$
\log(x) = (\log(x_1), \ldots, \log(x_n)) \quad \mbox{for } x\in\ppreals^n
$$
which corresponds to Python code `numpy.exp(x)` or `numpy.log(x)` where `x` is an instance of `numpy.ndarray`, i.e., `numpy` array
-
use $\sum x$ to mean $\ones^T x$ for $x\in\reals^n$,
i.e.
$$
\sum x = x_1 + \cdots + x_n
$$
which corresponds to Python code `x.sum()` where `x` is `numpy` array
-
use $x/y$ for $x,y\in\reals^n$ to mean
$$
\rowvecthree{x_1/y_1}{\cdots}{x_n/y_n}^T
$$
which corresponds to Python code `x / y` where `x` and `y` are $1$-d `numpy` arrays
-
use $X/Y$ for $X,Y\in\reals^{m\times n}$ to mean
$$
\begin{my-matrix}{cccc}
X_{1,1}/Y_{1,1} & X_{1,2}/Y_{1,2} & \cdots & X_{1,n}/Y_{1,n}
\\
X_{2,1}/Y_{2,1} & X_{2,2}/Y_{2,2} & \cdots & X_{2,n}/Y_{2,n}
\\
\vdots & \vdots & \ddots & \vdots
\\
X_{m,1}/Y_{m,1} & X_{m,2}/Y_{m,2} & \cdots & X_{m,n}/Y_{m,n}
\end{my-matrix}
$$
which corresponds to Python code `X / Y` where `X` and `Y` are $2$-d `numpy` arrays
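The conventions above can be illustrated with a short `numpy` snippet (the example values are arbitrary):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 4.0])

# elementwise application of scalar functions: exp(x) = (exp(x_1), ..., exp(x_n))
np.exp(x)
np.log(y)          # defined since y is in R_{++}^n

# sum x = 1^T x = x_1 + ... + x_n
x.sum()

# elementwise division of 1-d arrays: (x_1/y_1, ..., x_n/y_n)
x / y

# elementwise division of 2-d arrays: entry (i, j) is X_{ij} / Y_{ij}
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[2.0, 2.0], [2.0, 2.0]])
X / Y
```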
Some definitions
Some conventions
-
(for some subjects) use following conventions
- $0\cdot \infty = \infty \cdot 0 = 0$
- $(\forall x\in\ppreals)(x\cdot \infty = \infty \cdot x = \infty)$
- $\infty \cdot \infty = \infty$
Convex Optimization
Convex Sets
Lines and line segments
Affine sets
Relative interiors and boundaries
Convex sets
- convex hull (of course) is convex set
Cones
- convex cone (of course) is convex set
- examples of convex cones: $\prealk{n}$, $\pprealk{n}$, $\possemidefset{n}$, and $\posdefset{n}$
Hyperplanes and half spaces
- hyperplanes and half spaces are convex sets
Euclidean balls and ellipsoids
- Euclidean balls and ellipsoids are convex sets
Norm balls and norm cones
- norm balls and norm cones are convex sets
Polyhedra
Convexity preserving set operations
-
intersection preserves convexity
- for (any) collection of convex sets, $\coll$, $$ \bigcap_{C\in\coll} C $$ is convex set
-
scalar scaling preserves convexity
- for convex set $C$ $$ \alpha C $$ is convex set for any $\alpha\in\reals$
-
sum preserves convexity
- for convex sets $C$ and $D$ $$ C+D $$ is convex set
-
direct product preserves convexity
- for convex sets $C$ and $D$ $$ C\times D $$ is convex set
-
projection preserves convexity
- for convex set $C\subset A \times B$ $$ \set{x\in A}{(\exists y)((x,y)\in C)} $$ is convex
-
image and inverse image by affine function preserve convexity
- for affine function $f:A\to B$ and convex sets $C\subset A$ and $D\subset B$ $$ f(C) \;\& \; f^{-1}(D) $$ are convex
-
image and inverse image by linear-fractional function preserve convexity
- for convex sets $C\subset \reals^n, D\subset \reals^m$ and linear-fractional function, $g:\reals^n\to\reals^m$, i.e., function defined by $g(x) = (Ax+b)/(c^Tx+d)$ for $A\in\reals^{m\times n}$, $b\in\reals^m$, $c\in\reals^n$, and $d\in\reals$ $$ g(C) \ \& \ g^{-1}(D) $$ are convex
Proper cones and generalized inequalities
- cone $K$ said to be proper if it is convex, closed, solid, and pointed
- solid, i.e., $\interior{K}\neq \emptyset$
- pointed, i.e., $x\in K$ and $-x\in K$ imply $x=0$
- examples of proper cones: $\prealk{n}$ and $\possemidefset{n}$
- (nonstrict) generalized inequality $$ x \preceq_K y \Leftrightarrow y - x\in K $$
- strict generalized inequality $$ x \prec_K y \Leftrightarrow y - x\in \interior{K} $$
- $\preceq_K$ and $\prec_K$ are partial orderings
Convex sets induced by generalized inequalities
- for affine function $f:\reals^n\to\symset{m}$, i.e., $f(x)=A_0 + A_1 x_1 + \cdots + A_n x_n$ for some $A_0,\ldots,A_n\in\symset{m}$, $f^{-1}(\possemidefset{m})$ is convex (since inverse image of convex set under affine function is convex), i.e., $$ \set{x\in\reals^n}{A_0 + A_1 x_1 + \cdots + A_n x_n \succeq 0} \subset \reals^n $$ is convex
- can negate each matrix $A_i$ and have same results, hence $$ \set{x\in\reals^n}{A_0 + A_1 x_1 + \cdots + A_n x_n \preceq 0} \subset \reals^n $$ is (also) convex
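A small numerical illustration of the convexity of an LMI solution set, with arbitrarily chosen symmetric data: if $x$ and $y$ both satisfy $A_0 + A_1x_1 + A_2x_2 \succeq 0$, every convex combination of them does too, which can be spot-checked via eigenvalues.

```python
import numpy as np

# arbitrary symmetric data for the affine map f(x) = A0 + x1*A1 + x2*A2
A0 = np.array([[2.0, 0.0], [0.0, 2.0]])
A1 = np.array([[1.0, 0.0], [0.0, -1.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])

def f(x):
    return A0 + x[0] * A1 + x[1] * A2

def is_psd(M, tol=1e-9):
    # a symmetric matrix is PSD iff its smallest eigenvalue is >= 0
    return np.linalg.eigvalsh(M).min() >= -tol

x = np.array([0.5, 0.5])
y = np.array([-0.5, 1.0])
assert is_psd(f(x)) and is_psd(f(y))

# every convex combination stays feasible (f is affine, PSD cone is convex)
for theta in np.linspace(0, 1, 11):
    assert is_psd(f(theta * x + (1 - theta) * y))
```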
Separating and supporting hyperplanes
Dual cones
- the figure illustrates $x \in K^\ast$ while $z\not\in K^\ast$
Dual norms
-
examples
- dual cone of subspace $V\subset \reals^n$ is orthogonal complement of $V$, $V^\perp$, where $V^\perp=\set{y}{\forall v\in V,v^Ty = 0}$
- $\prealk{n}$ and $\possemidefset{n}$ are self-dual
- dual of norm cone is norm cone associated with dual norm, i.e., if $K=\set{(x,t)\in\reals^{n} \times \reals}{\|x\|\leq t}$ $$ K^\ast=\set{(y,u)\in\reals^{n} \times \reals}{\|y\|_\ast\leq u} $$
Properties of dual cones
- $K^\ast$ is closed and convex
- $K_1\subset K_2 \Rightarrow K_2^\ast \subset K_1^\ast$
- if $\interior{K} \neq \emptyset$, $K^\ast$ is pointed
- if $\closure{K}$ is pointed, $\interior{(K^\ast)} \neq \emptyset$
- $K^{\ast\ast}=(K^\ast)^\ast$ is closure of convex hull of $K$
- if $K$ is closed and convex, $K^{\ast\ast} = K$
- dual of proper cone is proper cone
- for proper cone $K$, $K^{\ast\ast}=K$
Dual generalized inequalities
- $x\preceq_K y$ if and only if $(\forall \lambda \succeq_{K^\ast} 0)(\lambda^T x \leq \lambda^T y)$
- $x\prec_K y$ if and only if $(\forall \lambda \succeq_{K^\ast} 0 \mbox{ with } \lambda\neq0)(\lambda^T x < \lambda^T y)$
- $x\preceq_{K^\ast} y$ if and only if $(\forall \lambda \succeq_{K} 0)(\lambda^T x \leq \lambda^T y)$
- $x\prec_{K^\ast} y$ if and only if $(\forall \lambda \succeq_{K} 0 \mbox{ with } \lambda\neq0)(\lambda^T x < \lambda^T y)$
Theorem of alternative for linear strict generalized inequalities
Convex Functions
Convex functions
- function $f:\reals^n\to\reals$ the domain of which is convex and which satisfies $$ \left( \forall x,y\in \dom f, 0\leq \theta \leq 1 \right) \left( f(\theta x + (1-\theta) y) \leq \theta f(x) + (1-\theta) f(y) \right) $$ said to be convex
- function $f:\reals^n\to\reals$ the domain of which is convex and which satisfies $$ \left( \forall \mbox{ distinct } x,y\in \dom f, 0< \theta < 1 \right) \left( f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y) \right) $$ said to be strictly convex
- function $f:\reals^n\to\reals$ the domain of which is convex where $-f$ is convex, said to be concave
- function $f:\reals^n\to\reals$ the domain of which is convex where $-f$ is strictly convex, said to be strictly concave
Extended real-value extensions of convex functions
-
using extended real-value extensions of convex functions,
can drop “$\dom f$” in equations,
e.g.,
- $f$ is convex if and only if its extended-value extension $\tilde{f}$ satisfies $$ \left( \forall x,y\in \reals^n, 0\leq \theta \leq 1 \right) \left( \tilde{f}(\theta x + (1-\theta) y) \leq \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y) \right) $$
- $f$ is strictly convex if and only if its extended-value extension $\tilde{f}$ satisfies $$ \left( \forall \mbox{ distinct } x,y\in \dom f, 0< \theta < 1 \right) \left( \tilde{f}(\theta x + (1-\theta) y) < \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y) \right) $$
First-order condition for convexity
- convex if and only if $\dom f$ is convex and $$ \left( \forall x,y\in \dom f \right) \left( f(y) \geq f(x) + \nabla f(x) ^T (y-x) \right) $$
- strictly convex if and only if $\dom f$ is convex and $$ \left( \forall \mbox{ distinct } x,y\in \dom f \right) \left( f(y) > f(x) + \nabla f(x) ^T (y-x) \right) $$
-
implies
that
for convex function $f$
- first-order Taylor approximation is global underestimator
-
can derive
global information
from
local information
- e.g., if $\nabla f(x)=0$, $x$ is global minimizer
- explains remarkable properties of convex functions and convex optimization problems
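The underestimator property can be checked numerically; a minimal sketch with an arbitrary smooth convex function ($f(x)=e^x$), verifying that the tangent line at any point lies below the graph everywhere, and that a stationary point of a convex function is a global minimizer:

```python
import numpy as np

# convex f(x) = exp(x); its tangent line at any x0 underestimates f globally
f = np.exp
df = np.exp  # derivative of exp is exp

x0 = 0.7
ys = np.linspace(-5, 5, 201)
tangent = f(x0) + df(x0) * (ys - x0)   # first-order Taylor approximation at x0
assert np.all(f(ys) >= tangent - 1e-12)

# at a stationary point of a convex function the tangent is horizontal,
# so f(y) >= f(x*) for all y: local information gives global minimality
g = lambda y: (y - 1.0) ** 2           # gradient vanishes at y = 1
assert np.all(g(ys) >= g(1.0))
```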
Second-order condition for convexity
- $f$ is convex if and only if $\dom f$ is convex and $$ \left( \forall x\in \dom f \right) \left( \nabla^2 f(x) \succeq 0 \right) $$
- if $\dom f$ is convex and $$ \left( \forall x\in \dom f \right) \left( \nabla^2 f(x) \succ 0 \right) $$ $f$ is strictly convex (converse is not true, e.g., $f(x)=x^4$)
Convex function examples
- assume function $f:\reals^n\to\reals$ and $\dom f =\reals^n$ unless specified otherwise
- affine function, i.e., $f(x)=a^Tx +b$ for some $a\in\reals^n$ and $b\in\reals$, is convex
-
quadratic functions
- if $f(x) = x^T Px + q^Tx$
for some $P\in\symset{n}$ and $q\in\reals^n$
- $f$ is convex if and only if $P\succeq0$
- $f$ is strictly convex if and only if $P\succ0$
- exponential function, i.e., $f(x) = \exp(a^Tx+b)$ for some $a\in\reals^n$ and $b\in\reals$, is convex
- power, i.e., $f(x) = x^a$ for some $a\geq1$, is convex on $\ppreals$
- power of absolute value, i.e., $f(x) = |x|^a$ for some $a\geq1$, is convex on $\reals$
- logarithm function, i.e., $f(x) = \log x$, is concave on $\ppreals$
- negative entropy, i.e., $$ f(x) = \left\{\begin{array}{ll} x\log x & \mbox{if } x >0 \\ 0 &\mbox{if } x=0 \end{array}\right. $$ is convex on $\preals$
- norm as function is convex (by definition of norms, i.e., triangle inequality & absolute homogeneity)
- max function, i.e., $f(x)=\max\{x_1,\ldots,x_n\}$, is convex
- quadratic-over-linear function, $f(x,y) = x^2/y$, is convex on $\reals\times \ppreals$
- log-sum-exp, $f(x) = \log(\exp(x_1)+\cdots+\exp(x_n))$, is convex
- geometric mean, $f(x) = (\prod_{i=1}^n x_i )^{1/n}$, is concave on $\pprealk{n}$
- log-determinant, $f(X) = \log \det X$, is concave on $\posdefset{n}$
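One entry above (log-sum-exp) can be spot-checked numerically by testing the defining inequality on a few point pairs; this is only evidence for the claim, not a proof.

```python
import numpy as np

def logsumexp(x):
    # numerically stable log(exp(x_1) + ... + exp(x_n))
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

pairs = [(np.array([0.0, 1.0, -2.0]), np.array([3.0, -1.0, 0.5])),
         (np.array([10.0, 10.0, 10.0]), np.array([-10.0, 0.0, 10.0]))]
for x, y in pairs:
    for theta in np.linspace(0, 1, 11):
        lhs = logsumexp(theta * x + (1 - theta) * y)
        rhs = theta * logsumexp(x) + (1 - theta) * logsumexp(y)
        assert lhs <= rhs + 1e-12   # defining inequality of convexity
```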
Sublevel sets and superlevel sets
- every sublevel set of convex function is convex
- and every superlevel set of concave function is convex
-
note, however, converse is not true
- e.g., every sublevel set of $\log$ is convex, but $\log$ is concave
Epigraphs and hypographs
- function is convex if and only if its epigraph is convex
- function is concave if and only if its hypograph is convex
Convexity preserving function operations
-
nonnegative weighted sum preserves convexity
- for convex functions $f_1,\ldots,f_n$ and nonnegative weights $w_1,\ldots, w_n$ $$ w_1 f_1 + \cdots + w_n f_n $$ is convex
-
nonnegative weighted integration preserves convexity
- for measurable set $Y$, $w:Y\to\preals$, and $f:X \times Y\to\reals$ where $f(x,y)$ is convex in $x$ for every $y\in Y$ and measurable in $y$ for every $x\in X$ $$ \int_Y w(y) f(x,y) dy $$ is convex
-
pointwise maximum preserves convexity
- for convex functions $f_1,\ldots,f_n$ $$ \max\{f_1, \ldots, f_n\} $$ is convex
-
pointwise supremum preserves convexity
- for indexed family of convex functions $\indexedcol{f_\lambda}_{\lambda\in\Lambda}$ $$ \sup_{\lambda \in \Lambda} f_\lambda $$ is convex (one way to see this is $\epi \sup_\lambda f_\lambda = \bigcap_\lambda \epi f_\lambda$)
-
composition
-
suppose $g:\reals^n\to\reals^k$, $h:\reals^k\to\reals$, and $f=h\circ g$
- $f$ convex if $h$ convex & nondecreasing in each argument, and $g_i$ convex
- $f$ convex if $h$ convex & nonincreasing in each argument, and $g_i$ concave
- $f$ concave if $h$ concave & nondecreasing in each argument, and $g_i$ concave
- $f$ concave if $h$ concave & nonincreasing in each argument, and $g_i$ convex
-
minimization
- for function $f(x,y)$ convex in $(x,y)$ and convex set $C$ $$ \inf_{y\in C} f(x,y) $$ is convex provided it is bounded below where domain is $\set{x}{(\exists y\in C)((x,y) \in \dom f)}$
-
perspective of convex function preserves convexity
- for convex function $f:X\to\reals$, function $g:X\times \reals \to \reals$ defined by $$ g(x,t) = tf(x/t) $$ with $\dom g = \set{(x,t)}{x/t \in \dom f, t>0}$ is convex
Convex functions examples
-
piecewise-linear function is convex, i.e.
- $\max\{a_1^Tx+b_1,\ldots,a_m^T x + b_m\}$ for some $a_i\in\reals^n$ and $b_i\in\reals$ is convex
-
sum of $k$ largest components is convex, i.e.
- $x_{[1]} + \cdots + x_{[k]}$ where $x_{[i]}$ denotes $i$-th largest component, is convex (since $f(x) = \max\set{x_{i_1}+\cdots+x_{i_r}}{1\leq i_1< i_2<\cdots < i_r\leq n}$)
-
support function of set, i.e.,
- $\sup\set{x^Ty}{y\in A}$ for $A\subset\reals^n$ is convex
-
distance (when measured by arbitrary norm) to farthest point of set
- $\sup\set{\|x-y\|}{y\in A}$ for $A\subset\reals^n$ is convex
-
least-squares cost as function of weights
-
$\inf_{x\in\reals^n} \sum^n_{i=1} w_i(a_i^Tx - b_i)^2$ for some $a_i\in\reals^n$ and $b_i\in\reals$
is concave
- note that above function equals $\sum_{i=1}^n w_i b_i^2 - \sum_{i=1}^n w_i^2 b_i^2 a_i^T \left( \sum_{j=1}^n w_ja_ja_j^T\right)^{-1} a_i$, from which concavity is far from obvious
-
maximum eigenvalue of symmetric matrix
- $\lambda_\mathrm{max}(F(x)) = \sup\set{y^TF(x)y}{\|y\|_2 = 1}$, where $F:\reals^n\to \symset{m}$ is affine function of $x$, is convex (pointwise supremum of functions affine in $x$)
-
norm of matrix
- $\sup\set{u^TG(x)v}{\|u\|_2 \leq 1, \|v\|_2\leq1}$ where $G:\reals^n\to \reals^{m\times n}$ is linear function in $x$
-
distance (when measured by arbitrary norm) to convex set
- for convex set $C$, $\inf\set{\|x-y\|}{y\in C}$ is convex
-
infimum of convex function
subject to linear constraint
- for convex function $h$, $\inf\set{h(y)}{Ay=x}$ is convex (since it is $\inf_y (h(y) + I_{Ay=x}(x,y))$)
-
perspective of Euclidean norm squared
- map $(x,t) \mapsto x^Tx /t$ induces convex function in $(x,t)$ for $t>0$
-
perspective of negative log
- map $(x,t) \mapsto -t \log(x/t)$ induces convex function in $(x,t) \in \pprealk{2}$
-
perspective of convex function
- for convex function $f:\reals^n\to\reals$, function $g:\reals^n\to\reals$ defined by $$ g(x) = (c^T x + d) f((Ax+b)/(c^T x + d)) $$ for some $A\in\reals^{m\times n}$, $b\in\reals^m$, $c\in\reals^n$, and $d\in\reals$ with $\dom g = \set{x}{(Ax+b)/(c^Tx + d)\in \dom f, c^T x + d >0}$ is convex
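The least-squares-cost example above can be spot-checked numerically; a sketch with arbitrary fixed data, evaluating $f(w)=\inf_x \sum_i w_i(a_i^Tx-b_i)^2$ via the weighted normal equations and testing midpoint concavity:

```python
import numpy as np

# arbitrary fixed data; rows of A are a_i^T
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
b = np.array([1.0, 2.0, 0.5, -1.0])

def ls_cost(w):
    # f(w) = inf_x sum_i w_i (a_i^T x - b_i)^2, via the weighted normal equations
    W = np.diag(w)
    x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
    r = A @ x - b
    return float(w @ r**2)

w1 = np.array([1.0, 2.0, 0.5, 1.0])
w2 = np.array([0.3, 1.0, 2.0, 0.7])
mid = ls_cost((w1 + w2) / 2)
# midpoint concavity: f((w1+w2)/2) >= (f(w1) + f(w2)) / 2
assert mid >= (ls_cost(w1) + ls_cost(w2)) / 2 - 1e-12
```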
Conjugate functions
- conjugate function is convex for any function $f$ because it is supremum of affine (hence convex) functions of $x$
Conjugate function examples
-
strictly convex quadratic function
- for $f:\reals^n \to \preals$ defined by $f(x) = x^TQx/2$ where $Q\in \posdefset{n}$, $$ f^\ast(x)= \sup_y(y^Tx - y^TQy/2) = (y^Tx - y^TQy/2)|_{y=Q^{-1}x} = x^TQ^{-1}x/2 $$ which is also strictly convex quadratic function
-
log-determinant
- for function $f:\posdefset{n} \to \reals$ defined by $f(X) = \log \det X^{-1}$ $$ f^\ast(X) = \sup_{Y\in\posdefset{n}} (\Tr XY + \log \det Y) = \log\det (-X)^{-1} - n $$ where $\dom f^\ast = -\posdefset{n}$
-
indicator function
- for indicator function $I_A:\reals^n\to\{0,\infty\}$ with $A\subset \reals^n$ $$ I_A^\ast(x) = \sup_y (y^Tx - I_A(y)) = \sup \set{y^Tx}{y\in A} $$ which is support function of $A$
-
log-sum-exp function
- for function $f: \reals^n \to \reals$ defined by $f(x) = \log(\sum_{i=1}^n \exp(x_i))$ $$ f^\ast(x) = \sum_{i=1}^n x_i \log x_i + I_{x\succeq 0, \ones^T x = 1}(x) $$
-
norm
- for norm function $f:\reals^n\to\preals$ defined by $f(x)=\|x\|$ $$ f^\ast(x) = \sup_y( {y^Tx - \|y\|}) = I_{\|x\|_\ast\leq1}(x) $$
-
norm squared
- for function $f: \reals^n \to \preals$ defined by $f(x) = \|x\|^2/2$ $$ f^\ast(x) = \|x\|_\ast^2/2 $$
-
differentiable convex function
- for differentiable convex function $f:\reals^n\to\reals$ $$ f^\ast(x)= (y^\ast)^T \nabla f(y^\ast) - f(y^\ast) $$ where $y^\ast = \argsup_y (x^Ty-f(y))$
-
sum of independent functions
- for function $f:\reals^n\times \reals^m \to \reals$ defined by $f(x,y) = f_1(x) + f_2(y)$ where $f_1:\reals^n\to\reals$ and $f_2:\reals^m\to\reals$ $$ f^\ast(x,y) = f_1^\ast(x) + f_2^\ast(y) $$
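The strictly convex quadratic entry above can be verified numerically; a sketch with an arbitrary positive definite $Q$, checking that the supremum defining $f^\ast$ is attained at $y^\ast=Q^{-1}x$ with value $x^TQ^{-1}x/2$:

```python
import numpy as np

# f(y) = y^T Q y / 2 with Q positive definite (arbitrary choice)
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([1.0, -2.0])

# the supremum of y^T x - y^T Q y / 2 is attained at y* = Q^{-1} x
y_star = np.linalg.solve(Q, x)
f_conj = y_star @ x - y_star @ Q @ y_star / 2
assert np.isclose(f_conj, x @ np.linalg.solve(Q, x) / 2)  # equals x^T Q^{-1} x / 2

# random candidates never beat the maximizer
rng = np.random.default_rng(0)
for y in rng.normal(size=(100, 2)):
    assert y @ x - y @ Q @ y / 2 <= f_conj + 1e-12
```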
Convex functions with respect to generalized inequalities
- function $f$ satisfying $$ \left( \forall x,y \in \dom f, 0\leq \theta\leq 1 \right) \left( f(\theta x + (1-\theta) y) \preceq_K \theta f(x) + (1-\theta) f(y) \right) $$ called $K$-convex
- function $f$ satisfying $$ \left( \forall x\neq y \in \dom f, 0< \theta< 1 \right) \left( f(\theta x + (1-\theta) y) \prec_K \theta f(x) + (1-\theta) f(y) \right) $$ called strictly $K$-convex
- function $f$ is $K$-convex if and only if for every $w\succeq_{K^\ast}0$, $w^Tf$ is convex
- function $f$ is strictly $K$-convex if and only if for every nonzero $w\succeq_{K^\ast}0$, $w^Tf$ is strictly convex
Matrix convexity
-
examples of matrix convexity
- function of $\reals^{n\times m}$ into $\possemidefset{n}$ defined by $X\mapsto XX^T$ is matrix convex
- function of $\posdefset{n}$ into itself defined by $X\mapsto X^p$ is matrix convex for $1\leq p\leq 2$ or $-1\leq p \leq0$, and matrix concave for $0\leq p\leq1$
- function of $\symset{n}$ into $\posdefset{n}$ defined by $X\mapsto \exp(X)$ is not matrix convex
- quadratic matrix function of $\reals^{m\times n}$ into $\symset{n}$ defined by $X\mapsto X^TAX + B^TX + X^TB + C$ for $A\in\symset{m}$, $B\in\reals^{m\times n}$, and $C\in\symset{n}$ is matrix convex when $A\succeq0$
Convex Optimization Problems
Optimization problems
- $\fobj$, $\fie$, and $\feq$ are objective function, inequality constraint function, and equality constraint function
- $\fie(x) \preceq 0$ and $\feq(x) = 0$ are inequality constraints and equality constraints
- $\optdomain = \xobj \cap \xie \cap \xeq$ is domain of optimization problem
- $\optfeasset =\set{x\in \optdomain}{\fie(x) \preceq0, \feq(x)=0}$, called feasible set; $x\in\optdomain$ said to be feasible if $x\in\optfeasset$; optimization problem said to be feasible if $\optfeasset\neq \emptyset$
- $p^\ast = \inf\set{\fobj(x)}{x\in\optfeasset}$, called optimal value of optimization problem
- if optimization problem is infeasible, $p^\ast = \infty$ (following convention that infimum of empty set is $\infty$)
- if $p^\ast=-\infty$, optimization problem said to be unbounded
Global and local optimalities
- $x\in \optfeasset$ with $\fobj(x) = p^\ast$, called (global) optimal point
- $X_\mathrm{opt} = \set{x\in \optfeasset}{\fobj(x)=p^\ast}$, called optimal set
- when $X_\mathrm{opt} \neq \emptyset$, we say optimal value is attained or achieved and optimization problem is solvable
- optimization problem is not solvable if $p^\ast = \infty$ or $p^\ast = -\infty$ (converse is not true)
Equivalent optimization problems
-
below two optimization problems are equivalent
- $$ \begin{array}{ll} \mbox{minimize} & -x-y \\ \mbox{subject to} & 2x+y \leq1 \\ & x+2y \leq1 \end{array} $$
- $$ \begin{array}{ll} \mbox{minimize} & -2u-v/3 \\ \mbox{subject to} & 4u+v/3 \leq1 \\ & 2u+2v/3 \leq1 \end{array} $$
- since if $(x^\ast, y^\ast)$ solves first, $(u,v)=(x^\ast/2, 3y^\ast)$ solves second, and if $(u^\ast, v^\ast)$ solves second, $(x,y)=(2u^\ast, v^\ast/3)$ solves first
Change of variables
- given function $\phi:\mathcalfont{Z} \to \xdomain$, original optimization problem can be rewritten as $$ \begin{array}{ll} \mbox{minimize} & \fobj(\phi(z)) \\ \mbox{subject to} & \fie(\phi(z)) \preceq 0 \\ & \feq(\phi(z)) =0 \end{array} $$ where $z\in\mathcalfont{Z}$ is optimization variable
- if $\phi$ is injective and $\optdomain \subset \phi(\mathcalfont{Z})$, above optimization problem and original optimization problem are equivalent
- two optimization problems said to be related by change of variable or substitution of variable $x=\phi(z)$
Convex optimization
- when $\xdomain= \reals^n$, optimization problem can be formulated as $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq 0 \\ & Ax = b \end{array} $$ for some $A\in\reals^{p\times n}$ and $b\in\reals^p$
-
domain of convex optimization problem is convex
- since domains of $\fobj$, $\fie$, and $\feq$ are convex (by definition of convex functions) and intersection of convex sets is convex
-
feasible set of convex optimization problem
is convex
- since feasible set is intersection of problem domain (convex), sublevel sets of convex inequality constraint functions (convex), and solution set of affine equality constraints, which is affine set (hence convex), and intersection preserves convexity
Optimality conditions for convex optimization problems
- $x\in\optdomain$ is optimal if and only if $x\in\optfeasset$ and $$ \left( \forall y \in \optfeasset \right) \left( \nabla \fobj(x)^T(y-x) \geq0 \right) $$
- for unconstrained problems, $x\in\optdomain$ is optimal if and only if $$ \nabla \fobj(x)=0 $$
Optimality conditions for some convex optimization problems
-
unconstrained convex quadratic optimization
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x) = (1/2)x^TPx + q^Tx
\end{array}
$$
where $\xobj=\reals^n$ and $P\in\possemidefset{n}$
-
$x$ is optimal if and only if
$$
\nabla \fobj(x) = Px + q = 0
$$
exist three cases
- if $P\in\posdefset{n}$, exists unique optimum $x^\ast = -P^{-1}q$
- if $q\in\range(P)$, $X_\mathrm{opt}=-P^\dagger q + \nullspace(P)$
- if $q\not\in\range(P)$, $p^\ast = -\infty$
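The first case can be sketched numerically with arbitrary positive definite data: the unique optimum is found by solving the linear system $Px + q = 0$, and nearby points have larger objective value.

```python
import numpy as np

# minimize (1/2) x^T P x + q^T x with P positive definite:
# unique optimum x* = -P^{-1} q, found by solving P x + q = 0
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, -1.0])

x_star = np.linalg.solve(P, -q)
assert np.allclose(P @ x_star + q, 0)  # gradient vanishes at x*

def fobj(x):
    return 0.5 * x @ P @ x + q @ x

# nearby points have strictly larger objective value
for dx in [np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([0.05, 0.05])]:
    assert fobj(x_star + dx) > fobj(x_star)
```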
-
analytic centering
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x) = - \sum_{i=1}^m \log (b_i-a_i^Tx)
\end{array}
$$
where $\xobj = \set{x\in\reals^n}{Ax \prec b}$
-
$x$ is optimal if and only if
$$
\nabla \fobj(x) = \sum_{i=1}^m \frac{1}{b_i-a_i^Tx}a_i = 0
$$
exist three cases
- exists unique optimum, which happens if and only if $\xobj=\set{x}{Ax \prec b}$ is nonempty and bounded
- exist infinitely many optima, in which case, $X_\mathrm{opt}$ is affine set
- exists no optimum, which happens if and only if $\fobj$ is unbounded below
-
convex optimization problem with equality constraints only
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x)
\\
\mbox{subject to}
& Ax =b
\end{array}
$$
where $\xdomain=\reals^n$
- $x$ is optimal if and only if $$ \nabla \fobj(x) \perp \nullspace(A) $$ or equivalently, exists $\nu\in\reals^p$ such that $$ \nabla \fobj(x) = A^T\nu $$
Linear programming
- can transform above LP into standard form LP $$ \begin{array}{ll} \mbox{minimize} & \tilde{c}^T\tilde{x} \\ \mbox{subject to} & \tilde{A}\tilde{x} = \tilde{b} \\ & \tilde{x} \succeq0 \end{array} $$
LP examples
-
diet problem
-
find amount of $n$ different food to minimize purchase cost
while satisfying nutrition requirements
- assume exist $n$ foods and $m$ nutrients, $c_i$ is cost of food $i$, $A_{ji}$ is amount of nutrient $j$ contained in unit quantity of food $i$, and $b_j$ is required amount of nutrient $j$
- diet problem can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax \succeq b \\ & x\succeq0 \end{array} $$
-
Chebyshev center of polyhedron
- find largest Euclidean ball contained in polyhedron
- assume polyhedron is $\set{x\in\reals^n}{a_i^Tx \leq b_i, i=1,\ldots, m}$
- problem of finding Chebyshev center of polyhedron can be formulated as LP $$ \begin{array}{ll} \mbox{maximize} & r \\ \mbox{subject to} & a_i^T x + r\|a_i\|_2 \leq b_i \end{array} $$ where optimization variables are $x\in\reals^n$ and $r\in\reals$
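The Chebyshev-center LP above can be sketched with `scipy.optimize.linprog` (the polyhedron here, the unit square, is an arbitrary choice): the variables are $(x, r)$ and each row of the inequality matrix is $(a_i, \|a_i\|_2)$.

```python
import numpy as np
from scipy.optimize import linprog

# polyhedron {x : a_i^T x <= b_i}: the unit square [0, 1]^2
a = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])

# variables z = (x_1, x_2, r); maximize r  <=>  minimize -r
c = np.array([0.0, 0.0, -1.0])
A_ub = np.hstack([a, np.linalg.norm(a, axis=1, keepdims=True)])
res = linprog(c, A_ub=A_ub, b_ub=b, bounds=[(None, None)] * 2 + [(0, None)])

x_c, r = res.x[:2], res.x[2]
# largest inscribed ball in the unit square: center (1/2, 1/2), radius 1/2
assert np.allclose(x_c, [0.5, 0.5], atol=1e-6) and np.isclose(r, 0.5, atol=1e-6)
```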
-
piecewise-linear minimization
- minimize maximum of affine functions
- assume $m$ affine functions $a_i^Tx + b_i$
- piecewise-linear minimization problem can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & t \\ \mbox{subject to} & a_i^Tx + b_i \leq t,\quad i=1,\ldots,m \end{array} $$
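The epigraph reformulation above can be run through `scipy.optimize.linprog` on a tiny arbitrary instance, minimizing $\max(x+1,\,-x+1)$ over $x\in\reals$ (optimal value $1$ at $x=0$):

```python
import numpy as np
from scipy.optimize import linprog

# minimize t subject to a_i x + b_i <= t; variables z = (x, t)
c = np.array([0.0, 1.0])
A_ub = np.array([[ 1.0, -1.0],    #  x + 1 <= t
                 [-1.0, -1.0]])   # -x + 1 <= t
b_ub = np.array([-1.0, -1.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
assert np.isclose(res.fun, 1.0, atol=1e-8)  # optimal value t* = 1
```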
-
linear-fractional program
$$
\begin{array}{ll}
\mbox{minimize} &
(
c^T x + d
)
/
(
e^T x + f
)
\\
\mbox{subject to} &
Gx \preceq h
\\ &
Ax = b
\end{array}
$$
- if feasible set is nonempty, can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & c^T y + dz \\ \mbox{subject to} & Gy - hz \preceq0 \\ & Ay-bz = 0 \\ & e^Ty + fz = 1 \\ & z\geq0 \end{array} $$
Quadratic programming
- when $P=0$, QP reduces to LP, hence LP is specialization of QP
QP examples
-
least-squares (LS) problems
- LS can be formulated as QP $$ \begin{array}{ll} \mbox{minimize} & \|Ax-b\|_2^2 \end{array} $$
-
distance between two polyhedra
- assume two polyhedra $\set{x\in\reals^n}{Ax\preceq b, Cx =d}$ and $\set{x\in\reals^n}{\tilde{A}x\preceq \tilde{b}, \tilde{C}x =\tilde{d}}$
- problem of finding distance between two polyhedra can be formulated as QP $$ \begin{array}{ll} \mbox{minimize} & \|x-y\|_2^2 \\ \mbox{subject to} & Ax\preceq b, \quad Cx =d \\ & \tilde{A}y\preceq \tilde{b}, \quad \tilde{C}y =\tilde{d} \end{array} $$
Quadratically constrained quadratic programming
- when $P_i=0$ for $i=1,\ldots,m$, QCQP reduces to QP, hence QP is specialization of QCQP
Second-order cone programming
- when $c_i=0$, SOCP reduces to QCQP (by squaring both sides of each constraint), hence QCQP is specialization of SOCP
SOCP examples
-
robust linear program
-
minimize $c^T x$
while satisfying
$\tilde{a}_i^T x \leq b_i$
for every $\tilde{a}_i \in \set{a_i+P_iu}{\|u\|_2\leq1}$
where $P_i\in\symset{n}$
- can be formulated as SOCP $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & a_i^T x + \|P_i^T x\|_2 \leq b_i, \quad i=1,\ldots,m \end{array} $$
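The identity behind this reformulation, $\sup_{\|u\|_2\leq 1}(a_i+P_iu)^Tx = a_i^Tx + \|P_i^Tx\|_2$ (attained at $u^\ast = P_i^Tx/\|P_i^Tx\|_2$ by Cauchy-Schwarz), can be spot-checked on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a, x = rng.normal(size=n), rng.normal(size=n)
P = rng.normal(size=(n, n))

# closed form for the worst case over the uncertainty set {a + P u : ||u|| <= 1}
worst = a @ x + np.linalg.norm(P.T @ x)
u_star = P.T @ x / np.linalg.norm(P.T @ x)   # the maximizing perturbation

# random points of the unit ball never beat the closed form
samples = rng.normal(size=(1000, n))
samples /= np.maximum(np.linalg.norm(samples, axis=1, keepdims=True), 1.0)
sampled_max = np.max((a + samples @ P.T) @ x)
```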
-
linear program with random constraints
-
minimize $c^T x$
while satisfying
$\tilde{a}_i^T x \leq b_i$
with probability no less than $\eta$
where $\tilde{a}_i \sim \normal(a_i,\Sigma_i)$
- can be formulated as SOCP $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & a_i^T x + \Phi^{-1}(\eta)\|\Sigma_i^{1/2} x\|_2 \leq b_i, \quad i=1,\ldots,m \end{array} $$
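The chance-constraint reformulation can be validated by Monte Carlo on made-up data: fix $x$, set $b_i$ so the SOC constraint is exactly active, and check the empirical probability is about $\eta$ (here $\tilde{a}^Tx \sim \normal(a^Tx,\, x^T\Sigma x)$, so the probability is $\eta$ exactly):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
eta = 0.95
a = np.array([1.0, 2.0, -1.0])
Sigma = np.diag([0.1, 0.2, 0.05])
x = np.array([0.5, 0.3, 1.0])

sig = np.sqrt(x @ Sigma @ x)            # ||Sigma^{1/2} x||_2
b = a @ x + norm.ppf(eta) * sig         # smallest b satisfying the SOC constraint

# estimate P(a_tilde^T x <= b) by sampling a_tilde ~ N(a, Sigma)
samples = rng.multivariate_normal(a, Sigma, size=200_000)
p_hat = np.mean(samples @ x <= b)
```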
Geometric programming
Geometric programming in convex form
- geometric program in is not convex optimization problem as stated
- however, can be transformed to equivalent convex optimization problem by change of variables and transformation of objective and constraint functions
Convex optimization with generalized inequalities
- problem in reduces to convex optimization problem in when $q=1$ and $K_1=\prealk{m}$, hence convex optimization is specialization of convex optimization with generalized inequalities
- like convex optimization
Conic programming
- can transform above CP to standard form CP $$ \begin{array}{ll} \mbox{minimize} & \tildefobj(X) \\ \mbox{subject to} & \tildefeq (X) = 0 \\ & X \succeq_{K} 0 \end{array} $$
- cone program is one of simplest convex optimization problems with generalized inequalities
Semidefinite programming
- above inequality, called linear matrix inequality (LMI)
- can transform SDP to standard form SDP $$ \begin{array}{ll} \mbox{minimize} & \Tr (CX) \\ \mbox{subject to} & \Tr (A_iX) = b_i\quad i=1,\ldots,p \\ & X \succeq 0 \end{array} $$ where $\xdomain=\possemidefset{n}$ and $C,A_1,\ldots,A_p\in\symset{n}$ and $b_i\in\reals$
SDP examples
- LP
-
SOCP
- SOCP in is equivalent to $$ \begin{array}{ll} \mbox{minimize} & f^T x \\ \mbox{subject to} & Fx = g \\ & \begin{my-matrix}{cc} c_i^Tx + d_i & x^TA_i^T + b_i^T \\ A_ix + b_i & (c_i^Tx + d_i)I_{n_i} \end{my-matrix} \succeq 0 \quad i=1,\ldots,m \end{array} $$ which can be transformed to SDP in , thus, SDP reduces to SOCP
- hence, SOCP is specialization of SDP
Determinant maximization problems
- if $l=1$, $C_1=\cdots=C_n=0$, $D=1$, max-det problem reduces to SDP, hence SDP is specialization of max-det problem
Diagrams for containment of convex optimization problems
- the figure shows containment relations among convex optimization problems
- vertical lines ending with filled circles indicate existence of direct reductions, i.e., optimization problem transformations to special cases
Duality
Lagrangian
- $\lambda$, called Lagrange multiplier associated with inequality constraints $\fie(x)\preceq0$
- $\lambda_i$, called Lagrange multiplier associated with $i$-th inequality constraint $\fie_i(x)\leq0$
- $\nu$, called Lagrange multiplier associated with equality constraints $\feq(x)=0$
- $\nu_i$, called Lagrange multiplier associated with $i$-th equality constraint $\feq_i(x)=0$
- $\lambda$ and $\nu$, called dual variables or Lagrange multiplier vectors associated with the optimization problem
Lagrange dual functions
-
$g$ is (always) concave function (even when optimization problem is not convex)
- since $g$ is pointwise infimum of affine (hence concave) functions of $(\lambda,\nu)$, it is concave
- $g(\lambda,\nu)$ provides lower bound for optimal value of associated optimization problem, i.e., $$ g(\lambda,\nu) \leq p^\ast $$ for every $\lambda\succeq0$
- $(\lambda,\nu) \in \set{(\lambda,\nu)}{\lambda\succeq0, g(\lambda,\nu)>-\infty}$, said to be dual feasible
Dual function examples
-
LS solution of linear equations
$$
\lssollineqs{primal}
$$
- Lagrangian - $L(x,\nu) = x^T x + \nu^T(Ax-b)$
- Lagrange dual function $$ \lssollineqs{dual fcn} $$
-
standard form LP
$$
\begin{array}{ll}
\mbox{minimize} &
c^Tx
\\
\mbox{subject to} &
Ax = b
\\ &
x\succeq 0
\end{array}
$$
- Lagrangian - $L(x,\lambda,\nu) = c^T x - \lambda^T x + \nu^T(Ax-b)$
-
Lagrange dual function
$$
g(\lambda,\nu) = \left\{\begin{array}{ll}
-b^T\nu & A^T\nu - \lambda + c = 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{(A^T\nu + c,\nu)}{A^T\nu +c \succeq0}$
-
maximum cut, sometimes called max-cut, problem, which is NP-hard
$$
\begin{array}{ll}
\mbox{minimize} &
x^T W x
\\
\mbox{subject to} &
x_i^2 = 1, \quad i=1,\ldots,n
\end{array}
$$
where $W\in\symset{n}$
- Lagrangian - $L(x,\nu) = x^T(W+\diag(\nu))x - \ones^T\nu$
-
Lagrange dual function
$$
g(\nu) = \left\{\begin{array}{ll}
-\ones^T\nu
& W + \diag(\nu) \succeq 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{\nu}{W+\diag(\nu)\succeq0}$
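Weak duality here can be spot-checked by brute force for small $n$: the sketch below builds one dual feasible point (the choice $\nu=-\lambda_{\min}(W)\ones$ always makes $W+\diag(\nu)$ positive semidefinite) and compares $-\ones^T\nu$ with the exact primal optimum over all sign vectors (random symmetric $W$, numpy assumed available):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                           # random symmetric matrix

# one easy dual feasible point: nu = -lambda_min(W) * 1
nu = -np.linalg.eigvalsh(W)[0] * np.ones(n)
lower_bound = -np.sum(nu)                   # g(nu) = -1^T nu

# brute-force primal optimum over all 2^n sign vectors
p_star = min(np.array(s) @ W @ np.array(s)
             for s in product([-1.0, 1.0], repeat=n))
```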
-
some trivial problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(x)
\\
\mbox{subject to} &
x=0
\end{array}
$$
- Lagrangian - $L(x,\nu) =f(x)+\nu^Tx$
-
Lagrange dual function
$$
g(\nu) = \inf_{x\in\reals^n} (f(x)+\nu^Tx)
= -\sup_{x\in\reals^n} ((-\nu)^Tx-f(x))
= - f^\ast(-\nu)
$$
- hence, set of dual feasible points is $-\dom f^\ast$, and for every $f:\reals^n\to\reals$ and $\nu\in\reals^n$ $$ -f^\ast(-\nu) \leq f(0) $$
-
minimization with linear inequality and equality constraints
$$
\begin{array}{ll}
\mbox{minimize} &
f(x)
\\
\mbox{subject to} &
Ax\preceq b
\\ &
Cx= d
\end{array}
$$
- Lagrangian - $L(x,\lambda, \nu) = f(x) + \lambda^T(Ax-b) + \nu^T(Cx-d)$
-
Lagrange dual function
$$
g(\lambda,\nu) = -b^T\lambda - d^T\nu - f^\ast(-A^T \lambda - C^T\nu)
$$
- hence, set of dual feasible points is $\set{(\lambda,\nu)}{-A^T\lambda - C^T\nu \in \dom f^\ast, \lambda\succeq 0}$
-
equality constrained norm minimization
$$
\begin{array}{ll}
\mbox{minimize} &
\|x\|
\\
\mbox{subject to} &
Ax = b
\end{array}
$$
- Lagrangian - $L(x,\nu) = \|x\| + \nu^T(Ax-b)$
-
Lagrange dual function
$$
g(\nu) = -b^T\nu -\sup_{x\in\reals^n} ((-A^T\nu)^Tx - \|x\|)
= \left\{\begin{array}{ll}
-b^T \nu&\|A^T\nu\|_\ast\leq1
\\
- \infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{\nu}{\|A^T\nu\|_\ast \leq1}$
-
entropy maximization
$$
\entmax{primal}
$$
where domain of objective function is $\pprealk{n}$
- Lagrangian - $L(x,\lambda,\nu) = \sum_{i=1}^n x_i\log x_i + \lambda^T(Ax-b) + \nu(\ones^Tx-1)$
- Lagrange dual function $$ g(\lambda,\nu) = \entmax{dual fcn} $$ obtained using $f^\ast(y) = \sum_{i=1}^n \exp(y_i-1)$ where $a_i$ is $i$-th column vector of $A$
-
minimum volume covering ellipsoid
$$
\minvolcovering{primal}
$$
where domain of objective function is $\posdefset{n}$
- Lagrangian - $L(X,\lambda) = -\log \det X + \sum_{i=1}^m \lambda_i(a_i^T X a_i - 1)$
- Lagrange dual function $$ g(\lambda) = \minvolcovering{dual fcn} $$ obtained using $f^\ast(Y) = -\log\det(-Y) - n$
Best lower bound
- for every $(\lambda,\nu)$ with $\lambda\succeq 0$, Lagrange dual function $g(\lambda,\nu)$ (in ) provides lower bound for optimal value $p^\ast$ for optimization problem in
-
natural question to ask is
- how good is the lower bound?
- what is best lower bound we can achieve?
- these questions lead to definition of Lagrange dual problem
Lagrange dual problems
- original problem in , (sometimes) called primal problem
- domain is $\reals^m\times \reals^p$
- dual feasibility defined in page~, i.e., $(\lambda,\nu)$ satisfying $\lambda \succeq 0$ and $g(\lambda,\nu) > -\infty$, indeed means feasibility for Lagrange dual problem
- $d^\ast = \sup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0}$, called dual optimal value
- $(\lambda^\ast,\nu^\ast) = \argsup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0}$, said to be dual optimal or called optimal Lagrange multipliers (if exists)
- Lagrange dual problem in is convex optimization (even though original problem is not) since $g(\lambda,\nu)$ is always concave and is maximized
Making dual constraints explicit dual problems
- (our specific) way we define Lagrange dual function in as function $g$ of $\reals^m \times \reals^p$ into $\reals\cup\{-\infty\}$ means $\dom g = \reals^m\times\reals^p$
- however, in many cases, feasible set $\set{(\lambda,\nu)}{\lambda \succeq 0,\; g(\lambda,\nu) > -\infty}$ is proper subset of $\reals^m\times\reals^p$
- can make this implicit feasibility condition explicit by adding it as constraint (as shown in following examples)
Lagrange dual problems associated with LPs
-
standard form LP
- primal problem $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax = b \\ & x\succeq 0 \end{array} $$
-
Lagrange dual problem
$$
\begin{array}{ll}
\mbox{maximize} &
g(\lambda,\nu) = \left\{\begin{array}{ll}
-b^T\nu & A^T\nu - \lambda + c = 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
\\
\mbox{subject to} &
\lambda \succeq 0
\end{array}
$$
(refer to page~
for Lagrange dual function)
- can make dual feasibility explicit by adding it to constraints as mentioned on page~ $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu \\ \mbox{subject to} & \lambda \succeq 0 \\ & A^T\nu - \lambda + c = 0 \end{array} $$
- can further simplify problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu \\ \mbox{subject to} & A^T\nu + c \succeq 0 \end{array} $$
- last problem is inequality form LP
- all three problems are equivalent, but not the same problem
- will, however, with abuse of terminology, refer to all three problems as Lagrange dual problem
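The two problems can be solved numerically on a tiny made-up instance; as the later discussion of LP strong duality predicts, the optimal values coincide:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# standard form primal: minimize c^T x s.t. Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b)

# simplified dual: maximize -b^T nu s.t. A^T nu + c >= 0,
# passed to linprog as: minimize b^T nu s.t. -A^T nu <= c, nu free
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)])

p_star = primal.fun      # attained at x = (1, 0)
d_star = -dual.fun       # attained at nu = -1
```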
-
inequality form LP
- primal problem $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax \preceq b \end{array} $$
- Lagrangian $$ L(x,\lambda) = c^Tx + \lambda^T(Ax-b) $$
- Lagrange dual function $$ g(\lambda) = -b^T\lambda + \inf_{x\in\reals^n} (c+A^T\lambda)^T x = \left\{\begin{array}{ll} -b^T\lambda & A^T\lambda + c =0 \\ -\infty & \mbox{otherwise} \end{array}\right. $$
-
Lagrange dual problem
$$
\begin{array}{ll}
\mbox{maximize} &
g(\lambda)
= \left\{\begin{array}{ll}
-b^T\lambda & A^T\lambda + c =0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
\\
\mbox{subject to} &
\lambda \succeq 0
\end{array}
$$
- can make dual feasibility explicit by adding it to constraints as mentioned on page~ $$ \begin{array}{ll} \mbox{maximize} & -b^T\lambda \\ \mbox{subject to} & A^T\lambda + c = 0 \\ & \lambda \succeq 0 \end{array} $$
- dual problem is standard form LP
- thus, dual of standard form LP is inequality form LP and vice versa
- also, for both cases, dual of dual is same as primal problem
Lagrange dual problem of equality constrained optimization problem
- equality constrained optimization problem $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & Ax = b \end{array} $$
- dual function $$ \begin{eqnarray*} g(\nu) & = & \inf_{x\in\dom \fobj} (\fobj(x) + \nu^T(Ax-b)) = -b^T\nu - \sup_{x\in\dom \fobj}(-\nu^TAx -\fobj(x)) \\ & = & -b^T\nu - {\fobj}^\ast(-A^T\nu) \end{eqnarray*} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu - {\fobj}^\ast(-A^T\nu) \end{array} $$
Lagrange dual problem associated with equality constrained quadratic program
-
strictly convex quadratic problem, i.e., $P\in\posdefset{n}$
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = x^TPx + q^T x + r
\\
\mbox{subject to} &
Ax=b
\end{array}
$$
- conjugate function of objective function $$ {\fobj}^\ast(y) = (y-q)^TP^{-1}(y-q)/4 - r = y^TP^{-1}y/4 -q^TP^{-1}y/2 + q^TP^{-1}q/4 -r $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -\nu^T (AP^{-1}A^T)\nu /4 -(b + A P^{-1} q/2)^T\nu - q^TP^{-1}q/4 +r \end{array} $$
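The closed-form dual can be checked against the primal KKT system on random data; $P\succ 0$ and $A$ full row rank make both solves well posed, and the multiplier from the KKT system should match the dual maximizer:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 2
P = np.eye(n) + 0.1 * np.ones((n, n))      # P > 0: strictly convex objective
q, b = rng.normal(size=n), rng.normal(size=p)
A = rng.normal(size=(p, n))
r = 1.0

# primal optimum from the KKT system: 2 P x + q + A^T nu = 0, A x = b
K = np.block([[2 * P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x, nu = sol[:n], sol[n:]
p_star = x @ P @ x + q @ x + r

# dual: maximize -nu^T M nu/4 - w^T nu - q^T P^{-1} q/4 + r,
# with M = A P^{-1} A^T and w = b + A P^{-1} q / 2
Pinv = np.linalg.inv(P)
M = A @ Pinv @ A.T
w = b + A @ Pinv @ q / 2
nu_star = -2 * np.linalg.solve(M, w)       # stationarity: -M nu/2 - w = 0
d_star = -nu_star @ M @ nu_star / 4 - w @ nu_star - q @ Pinv @ q / 4 + r
```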
Lagrange dual problems associated with nonconvex quadratic problems
-
primal problem
$$
\noncvxquadprob{primal}
$$
where $A\in\symset{n}$, $A\not\in\possemidefset{n}$, and $b\in\reals^n$
- since $A\not\succeq 0$, not convex optimization problem
- sometimes called trust region problem, arising when minimizing second-order approximation of function over bounded region
- Lagrange dual function $$ g(\lambda) = \noncvxquadprob{dual fcn} $$ where $(A+\lambda I)^\dagger$ is pseudo-inverse of $A+\lambda I$
-
Lagrange dual problem
$$
\noncvxquadprob{dual}
$$
where optimization variable is $\lambda \in\reals$
- note we do not need constraint $\lambda \geq0$ since it is implied by $A+\lambda I \succeq 0$
- though not obvious from its appearance, it is (of course) convex optimization problem (by definition of Lagrange dual function, which is always concave)
- can be expressed as $$ \begin{array}{ll} \mbox{maximize} & -\sum_{i=1}^n (q_i^Tb)^2/(\lambda_i + \lambda) - \lambda \\ \mbox{subject to} & \lambda \geq - \lambda_\mathrm{min}(A) \end{array} $$ where $\lambda_i$ and $q_i$ are eigenvalues and corresponding orthonormal eigenvectors of $A$; when $\lambda_i + \lambda=0$ for some $i$, we interpret $(q_i^Tb)^2/0$ as $0$ if $q_i^Tb=0$ and $\infty$ otherwise
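Strong duality (established later for this problem) can be observed numerically on a small indefinite instance; the primal is assumed here to be the standard trust region form, minimize $x^TAx+2b^Tx$ subject to $x^Tx\leq1$, which matches the eigenvalue expression above:

```python
import numpy as np
from scipy.optimize import brentq

# assumed primal form: minimize x^T A x + 2 b^T x  s.t.  x^T x <= 1
A = np.array([[1.0, 0.0], [0.0, -2.0]])      # indefinite, so not convex
b = np.array([0.5, 0.3])
eigvals, Q = np.linalg.eigh(A)               # ascending eigenvalues
c = Q.T @ b                                  # c_i = q_i^T b

# both the boundary primal solution and the dual maximizer satisfy
# sum_i c_i^2 / (lambda_i + lam)^2 = 1 with lam > -lambda_min(A)
phi = lambda lam: np.sum(c**2 / (eigvals + lam) ** 2) - 1.0
lam = brentq(phi, -eigvals[0] + 1e-9, -eigvals[0] + 100.0)

x = -np.linalg.solve(A + lam * np.eye(2), b)  # primal solution, ||x|| = 1
p_star = x @ A @ x + 2 * b @ x
d_star = -np.sum(c**2 / (eigvals + lam)) - lam
```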
Weak duality
- since $g(\lambda,\nu)\leq p^\ast$ for every $\lambda\succeq 0$, we have $$ d^\ast = \sup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0} \leq p^\ast $$
- $d^\ast$ is best lower bound for primal problem that can be obtained from Lagrange dual function (by definition)
-
weak duality holds even when $d^\ast$ and/or $p^\ast$ are not finite, e.g.
- if primal problem is unbounded below so that $p^\ast=-\infty$, must have $d^\ast = -\infty$, i.e., dual problem is infeasible
- conversely, if dual problem is unbounded above so that $d^\ast = \infty$, must have $p^\ast=\infty$, i.e., primal problem is infeasible
Optimal duality gap
-
dual optimal value $d^\ast$ sometimes used as lower bound for optimal value of problem which is difficult to solve
-
for example,
dual problem
of max-cut problem (on page~),
which is NP-hard,
is
$$
\begin{array}{ll}
\mbox{minimize} &
-\ones^T \nu
\\
\mbox{subject to} &
W + \diag(\nu) \succeq 0
\end{array}
$$
where optimization variable is $\nu\in\reals^n$
- the dual problem can be solved very efficiently using polynomial-time algorithms while primal problem cannot be solved unless $n$ is very small
Strong duality
-
strong duality does not hold in general
- if it always held, max-cut problem, which is NP-hard, could be solved in polynomial time, which would be one of biggest breakthroughs in field of theoretical computer science
- may mean some of strongest cryptography methods, e.g., homomorphic encryption, can be broken
Slater's theorem
- exist many conditions which guarantee strong duality, which are called constraint qualifications - one of them is Slater's condition
- such condition, called Slater's condition
- such point, (sometimes) said to be strictly feasible
Strong duality for LS solution of linear equations
- primal problem $$ \lssollineqs{primal} $$
- dual problem $$ \lssollineqs{dual} $$ (refer to page~ for Lagrange dual function)
-
“dual is always feasible''
and
“primal is feasible $\Rightarrow$ Slater's condition holds'',
thus
Slater's theorem ()
implies,
exist only three cases
- $(d^\ast = p^\ast \in \reals)$ or $(d^\ast \in \reals\:\&\: p^\ast = \infty)$ or $(d^\ast = p^\ast = \infty)$
- if primal is infeasible, i.e., $b\not\in\range(A)$, then exists $z$ such that $A^Tz=0$ and $b^Tz \neq0$, and line $\set{tz}{t\in\reals}$ makes dual problem unbounded above, hence $d^\ast=\infty$
- hence, strong duality always holds, i.e., $(d^\ast= p^\ast \in \reals)$ or $(d^\ast = p^\ast = \infty)$
Strong duality for LP
- every LP either is infeasible or satisfies Slater's condition
-
dual of LP is LP,
hence, Slater's theorem ()
implies
- if primal is feasible, either $(d^\ast=p^\ast= -\infty)$ or $(d^\ast=p^\ast\in\reals)$
- if dual is feasible, either $(d^\ast=p^\ast= \infty)$ or $(d^\ast=p^\ast\in\reals)$
-
only other case left is $(d^\ast=-\infty\;\&\;p^\ast= \infty)$
- indeed, this pathological case can happen
Strong duality for entropy maximization
- primal problem $$ \entmax{primal} $$
- dual problem (refer to page~ for Lagrange dual function) $$ \entmax{dual} $$
- dual problem is feasible, hence, Slater's theorem () implies, if exists $x\succ 0$ with $Ax \preceq b$ and $\ones^T x =1$, strong duality holds, and indeed $d^\ast=p^\ast\in\reals$
- by the way, can simplify dual problem by maximizing dual objective function over $\nu$ $$ \entmax{simplied dual} $$ which is geometric program in convex form () with nonnegativity constraint
Strong duality for minimum volume covering ellipsoid
- primal problem $$ \minvolcovering{primal} $$ where $\optdomain=\posdefset{n}$
- dual problem $$ \minvolcovering{dual} $$ (refer to page~ for Lagrange dual function)
- $X=\alpha I$ with small enough $\alpha>0$ strictly satisfies primal's constraints, hence Slater's condition always holds, thus, strong duality always holds, i.e., $(d^\ast = p^\ast \in \reals)$ or $(d^\ast = p^\ast = -\infty)$
- in fact, $\range(a_1,\ldots,a_m) = \reals^n$ if and only if $d^\ast=p^\ast\in\reals$
Strong duality for trust region nonconvex quadratic problems
- one of rare occasions in which strong duality obtains for nonconvex problems
- primal problem $$ \noncvxquadprob{primal} $$ where $A\in\symset{n}$, $A\not\in\possemidefset{n}$, and $b\in\reals^n$
- Lagrange dual problem (page~) $$ \noncvxquadprob{dual} $$
- strong duality always holds and $d^\ast=p^\ast\in\reals$ (since dual problem is feasible - large enough $\lambda$ satisfies dual constraints)
- in fact, exists stronger result - strong duality holds for optimization problem with quadratic objective and one quadratic inequality constraint, provided Slater's condition holds
Matrix games using mixed strategies
-
matrix game - consider game with two players $A$ and $B$
- player $A$ makes choice $1\leq a\leq n$, player $B$ makes choice $1\leq b\leq m$, then player $A$ makes payment of $P_{ab}$ to player $B$
- matrix $P\in\reals^{n\times m}$, called payoff matrix
- player $A$ tries to pay as little as possible & player $B$ tries to receive as much as possible
- players use randomized or mixed strategies, i.e., each player makes choice randomly and independently of other player's choice according to probability distributions $$ \Prob(a=i) = u_i,\; 1\leq i\leq n \quad \Prob(b=j) = v_j,\; 1\leq j\leq m $$
- expected payoff (from player $A$ to player $B$) $$ \sum_i \sum_j u_iv_jP_{ij} = u^TPv $$
-
assume player $A$'s strategy is known to player $B$
- player $B$ will choose $v$ to maximize $u^TPv$ $$ \sup\set{u^TPv}{v\succeq 0,\; \ones^Tv=1} = \max_{1\leq j\leq m} (P^Tu)_j $$
- player $A$ (assuming that player $B$ will employ above strategy to maximize payment) will choose $u$ to minimize payment $$ \begin{array}{ll} \mbox{minimize} & \max_{1\leq j\leq m} (P^Tu)_j \\ \mbox{subject to} & u\succeq 0\quad \ones^Tu=1 \end{array} $$
-
assume player $B$'s strategy is known to player $A$
- then player $B$ will do same to maximize payment (assuming that player $A$ will employ such strategy to minimize payment) $$ \begin{array}{ll} \mbox{maximize} & \min_{1\leq i\leq n} (Pv)_i \\ \mbox{subject to} & v\succeq 0\quad \ones^Tv=1 \end{array} $$
Strong duality for matrix games using mixed strategies
- in matrix game, can guess that in first problem, player $B$ has advantage over player $A$ since $A$'s strategy is exposed to $B$, and vice versa in second problem, hence optimal value of first problem is no smaller than that of second problem
- surprisingly, no one has advantage over the other, i.e., optimal values of two problems are same - will show this
- first observe both problems are (convex) piecewise-linear optimization problems
-
formulate first problem as LP
$$
\begin{array}{ll}
\mbox{minimize} &
t
\\
\mbox{subject to} &
u\succeq 0 \quad \ones^T u =1 \quad P^T u \preceq t\ones
\end{array}
$$
- Lagrangian $$ L(u,t,\lambda_1, \lambda_2,\nu) = \nu + (1-\ones^T\lambda_1)t + (P\lambda_1 - \nu \ones - \lambda_2)^Tu $$
- Lagrange dual function $$ g(\lambda_1, \lambda_2,\nu) = \left\{\begin{array}{ll} \nu & \ones^T\lambda_1 = 1 \;\&\; P\lambda_1 - \nu \ones = \lambda_2 \\ -\infty & \mbox{otherwise} \end{array}\right. $$
- Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & \nu \\ \mbox{subject to} & \ones^T\lambda_1 = 1 \quad P\lambda_1 - \nu \ones = \lambda_2 \\ & \lambda_1 \succeq 0 \quad \lambda_2 \succeq 0 \end{array} $$
- eliminating $\lambda_2$ gives below Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & \nu \\ \mbox{subject to} & \lambda_1 \succeq 0 \quad \ones^T\lambda_1 = 1 \quad P\lambda_1 \succeq \nu \ones \end{array} $$ which is equivalent to second problem in matrix game
- weak duality confirms “player who knows other player's strategy has advantage or is at least on par''
- moreover, primal problem satisfies Slater's condition, hence strong duality always holds, and dual is feasible, hence $d^\ast=p^\ast\in\reals$, i.e., regardless of who knows other player's strategy, no player has advantage
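Both players' LPs can be handed to `linprog` for a made-up payoff matrix (matching pennies), where the common optimal value is $0$:

```python
import numpy as np
from scipy.optimize import linprog

P = np.array([[1.0, -1.0], [-1.0, 1.0]])    # matching pennies payoff
n, m = P.shape

# player A: minimize t s.t. u >= 0, 1^T u = 1, P^T u <= t 1
res_a = linprog(
    c=[0.0] * n + [1.0],
    A_ub=np.hstack([P.T, -np.ones((m, 1))]), b_ub=np.zeros(m),
    A_eq=[[1.0] * n + [0.0]], b_eq=[1.0],
    bounds=[(0, None)] * n + [(None, None)],
)

# player B: maximize s s.t. v >= 0, 1^T v = 1, P v >= s 1
res_b = linprog(
    c=[0.0] * m + [-1.0],
    A_ub=np.hstack([-P, np.ones((n, 1))]), b_ub=np.zeros(n),
    A_eq=[[1.0] * m + [0.0]], b_eq=[1.0],
    bounds=[(0, None)] * m + [(None, None)],
)

value_a = res_a.fun        # game value seen by player A
value_b = -res_b.fun       # game value seen by player B
```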
Geometric interpretation of duality
- assume (not necessarily convex) optimization problem in
- define graph $$ G = \set{(\fie(x), \feq(x), \fobj(x))}{x\in\optdomain} \subset \reals^m \times \reals^p \times \reals $$
- for every $\lambda\succeq 0$ and $\nu$ $$ \begin{eqnarray*} p^\ast &=& \inf\set{t}{(u,v,t) \in G, u\preceq 0, v = 0} \\ & \geq & \inf\set{t+\lambda^Tu + \nu^T v}{(u,v,t) \in G, u\preceq 0, v = 0} \\ & \geq & \inf\set{t+\lambda^Tu + \nu^T v}{(u,v,t) \in G} = g(\lambda,\nu) \end{eqnarray*} $$ where second inequality comes from $\set{(u,v,t)}{(u,v,t) \in G, u\preceq 0, v = 0} \subset G$
- above establishes weak duality using graph
- last equality implies that for every $(u,v,t)\in G$ $$ (\lambda, \nu, 1)^T (u,v,t) \geq g(\lambda,\nu) $$ hence if $g(\lambda,\nu) > -\infty$, $(\lambda, \nu, 1)$ and $g(\lambda,\nu)$ define nonvertical supporting hyperplane for $G$ - nonvertical because third component is nonzero
- the figure shows $G$ as area inside closed curve contained in $\reals^m\times\reals^p\times\reals$ where $m=1$ and $p=0$, together with primal optimal value $p^\ast$ and supporting hyperplane $\lambda u + t = g(\lambda)$
- the figure shows three hyperplanes determined by three values for $\lambda$, one of which $\lambda^\ast$ is optimal solution for dual problem
Epigraph interpretation of duality
- define extended graph over $G$ - sort of epigraph of $G$ $$ \begin{eqnarray*} H &=& G + \preals^m \times \{0\} \times \preals \\ & = & \set{(u, v, t)}{x\in\optdomain, \fie(x) \preceq u, \feq(x) = v, \fobj(x)\leq t } \end{eqnarray*} $$
- if $\lambda\succeq 0$, $g(\lambda,\nu) = \inf\set{(\lambda,\nu,1)^T(u,v,t)}{(u,v,t) \in H}$, thus $$ (\lambda,\nu,1)^T (u,v,t) \geq g(\lambda,\nu) $$ defines nonvertical supporting hyperplane for $H$
- now $p^\ast = \inf\set{t}{(0,0,t)\in H}$, hence $(0,0,p^\ast) \in \boundary H$, hence $$ p^\ast =(\lambda,\nu,1)^T (0,0,p^\ast) \geq g(\lambda,\nu) $$
- once again establishes weak duality
- the figure shows epigraph interpretation
Proof of strong duality under constraint qualification
- now we show proof of strong duality - this is one of rare cases where proof is shown in main slides instead of “selected proofs'' section (as with Galois theory) since (I hope) it will give you some good intuition about why strong duality holds for (most) convex optimization problems
- assume Slater's condition holds, i.e., $\fobj$ and $\fie$ are convex, $\feq$ is affine, and exists $x\in\optdomain$ such that $\fie(x) \prec 0$ and $\feq(x) = 0$
- further assume $\optdomain$ has interior (hence, $\relint \optdomain = \interior{\optdomain}$) and $\rank A=p$
- assume $p^\ast\in\reals$ - since exists feasible $x$, the other possibility is $p^\ast = -\infty$, but then, $d^\ast = -\infty$, hence strong duality holds
- $H$ is convex
- now define $$ B = \set{(0,0,s)\in\reals^m\times\reals^p\times\reals}{s<p^\ast} $$
- then $B\cap H=\emptyset$, hence separating hyperplane theorem implies exist $(\tilde{\lambda}, \tilde{\nu}, \mu)\neq 0$ and $\alpha$ such that $$ \begin{eqnarray*} (u,v,t) \in H &\Rightarrow& \tilde{\lambda}^T u + \tilde{\nu}^T v + \mu t \geq \alpha \\ (u,v,t) \in B &\Rightarrow& \tilde{\lambda}^T u + \tilde{\nu}^T v + \mu t \leq \alpha \end{eqnarray*} $$
-
then $\tilde{\lambda} \succeq 0$ & $\mu\geq0$ - assume $\mu>0$
- can prove case $\mu=0$ too, but it is kind of tedious, plus, whole purpose is to provide good intuition, so will not do it here
- above second inequality implies $\mu p^\ast \leq \alpha$, and first inequality implies for every $x\in\optdomain$ $$ \mu L(x,\tilde{\lambda}/\mu, \tilde{\nu}/\mu) = \tilde{\lambda}^T \fie(x) + \tilde{\nu}^T \feq(x) + \mu \fobj(x) \geq \alpha \geq \mu p^\ast $$ thus, $$ g(\tilde{\lambda}/\mu, \tilde{\nu}/\mu) \geq p^\ast $$
- finally, weak duality implies $$ g(\lambda,\nu) = p^\ast $$ where $\lambda = \tilde{\lambda}/\mu$ & $\nu = \tilde{\nu}/\mu$
Max-min characterization of weak and strong dualities
- note $$ \begin{eqnarray*} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) &=& \sup_{\lambda\succeq 0, \nu} \left( \fobj(x) + \lambda^T \fie(x) + \nu^T \feq(x) \right) \\ & = & \left\{\begin{array}{ll} \fobj(x) & x\in\optfeasset \\ \infty & \mbox{otherwise} \end{array}\right. \end{eqnarray*} $$
- thus $p^\ast = \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu)$ whereas $d^\ast = \sup_{\lambda\succeq 0,\nu} \inf_{x\in\optdomain} L(x,\lambda,\nu)$
- weak duality means $$ \sup_{\lambda\succeq 0, \nu} \inf_{x\in\optdomain} L(x,\lambda,\nu) \leq \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) $$
- strong duality means $$ \sup_{\lambda\succeq 0, \nu} \inf_{x\in\optdomain} L(x,\lambda,\nu) = \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) $$
Max-min inequality
- indeed, max-min inequality $\sup_{y\in Y} \inf_{x\in X} f(x,y) \leq \inf_{x\in X} \sup_{y\in Y} f(x,y)$ holds for general $f:X\times Y\to\reals$, $X$, and $Y$
- equality holds, e.g., when $X=\optdomain$, $Y=\prealk{m} \times \reals^p$, and $f$ is Lagrangian of optimization problem (in ) for which strong duality holds
Saddle-points
- if assumption in holds, $x^\ast$ minimizes $f(x,y^\ast)$ over $X$ and $y^\ast$ maximizes $f(x^\ast,y)$ over $Y$ $$ \sup_{y\in Y} f(x^\ast,y) = f(x^\ast,y^\ast) = \inf_{x\in X} f(x,y^\ast) $$
Saddle-point interpretation of strong duality
- for primal optimum $x^\ast$ and dual optimum $(\lambda^\ast,\nu^\ast)$ $$ g(\lambda^\ast,\nu^\ast) \leq L(x^\ast, \lambda^\ast, \nu^\ast) \leq \fobj(x^\ast) $$
-
if strong duality holds,
for every $x\in\optdomain$, $\lambda\succeq 0$, and $\nu$
$$
L(x^\ast,\lambda,\nu)
\leq
\fobj(x^\ast) = L(x^\ast,\lambda^\ast,\nu^\ast) = g(\lambda^\ast,\nu^\ast)
\leq
L(x,\lambda^\ast, \nu^\ast)
$$
- thus $x^\ast$ and $(\lambda^\ast,\nu^\ast)$ form saddle-point of Lagrangian
-
conversely, if $\tilde{x}$ and $(\tilde{\lambda},\tilde{\nu})$ are saddle-point of Lagrangian,
i.e.,
for every $x\in\optdomain$, $\lambda\succeq 0$, and $\nu$
$$
L(\tilde{x}, {\lambda},{\nu})
\leq
L(\tilde{x}, \tilde{\lambda},\tilde{\nu})
\leq
L({x}, \tilde{\lambda},\tilde{\nu})
$$
- hence $g(\tilde{\lambda},\tilde{\nu}) = \inf_{x\in\optdomain} L(x,\tilde{\lambda},\tilde{\nu}) = L(\tilde{x}, \tilde{\lambda},\tilde{\nu}) = \sup_{\lambda\succeq 0, \nu} L(\tilde{x},{\lambda},{\nu}) = \fobj(\tilde{x})$, thus $g(\lambda^\ast,\nu^\ast) \leq g(\tilde{\lambda}, \tilde{\nu})$ & $\fobj(\tilde{x}) \leq \fobj(x^\ast)$
- thus $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ are primal and dual optimal
Game interpretation
- assume two players play zero-sum game with payment function $f:X\times Y\to \reals$ where player $A$ pays player $B$ amount equal to $f(x,y)$ when player $A$ chooses $x$ and player $B$ chooses $y$
- player $A$ will try to minimize $f(x,y)$ and player $B$ will try to maximize $f(x,y)$
-
assume player $A$ chooses first
then player $B$ chooses after learning opponent's choice
- if player $A$ chooses $x$, player $B$ will choose $\argsup_{y\in Y} f(x,y)$
- knowing that, player $A$ will first choose $\arginf_{x\in X} \sup_{y\in Y} f(x,y)$
- hence payment will be $\inf_{x\in X} \sup_{y\in Y} f(x,y)$
- if player $B$ makes her choice first, opposite happens, i.e., payment will be $\sup_{y\in Y} \inf_{x\in X} f(x,y)$
- max-min inequality says $$ \sup_{y\in Y} \inf_{x\in X} f(x,y) \leq \inf_{x\in X} \sup_{y\in Y} f(x,y) $$ i.e., whoever chooses later has advantage, which is similar to, or rather same as, matrix games using mixed strategies on page~
- saddle-point for $f$ (and $X$ and $Y$), $(x^\ast,y^\ast)$, called solution of game - $x^\ast$ is optimal choice for player $A$ and $y^\ast$ is optimal choice for player $B$
Game interpretation for weak and strong dualities
- assume payment function in zero-sum game on page~ is Lagrangian of optimization problem in
- assume that $X=\xdomain$ and $Y=\prealk{m} \times \reals^p$
- if player $A$ chooses first, knowing that player $B$ will choose $\argsup_{(\lambda,\nu)\in Y}L(x,\lambda,\nu)$, she will choose $x^\ast = \arginf_{x\in\xdomain} \sup_{(\lambda,\nu)\in Y}L(x,\lambda,\nu)$
- likewise, player $B$ will choose $(\lambda^\ast,\nu^\ast) = \argsup_{(\lambda,\nu)\in Y} \inf_{x\in\xdomain} L(x,\lambda,\nu)$
- optimal duality gap $p^\ast - d^\ast$ equals advantage of player who goes second
- if strong duality holds, $(x^\ast, \lambda^\ast, \nu^\ast)$ is solution of game, in which case no one has advantage
Certificate of suboptimality
- dual feasible point $(\lambda,\nu)$ provides bound on degree of suboptimality of current solution
- assume $x$ is feasible solution, then $$ \fobj(x) - p^\ast \leq \fobj(x) - g(\lambda,\nu) $$ guarantees that $\fobj(x)$ is no further than $\epsilon = \fobj(x) - g(\lambda,\nu)$ from optimal value $p^\ast$ (even though we do not know optimal value)
- for this reason, $(\lambda,\nu)$, called certificate of suboptimality
- $x$ is $\epsilon$-suboptimal for primal problem and $(\lambda,\nu)$ is $\epsilon$-suboptimal for dual problem
- strong duality means we can find certificates of arbitrarily small suboptimality
Complementary slackness
- assume strong duality holds for optimization problem in and assume $x^\ast$ is primal optimum and $(\lambda^\ast,\nu^\ast)$ is dual optimum, then $$ \fobj(x^\ast) = L(x^\ast,\lambda^\ast,\nu^\ast) = \fobj(x^\ast) + {\lambda^\ast}^T \fie(x^\ast) + {\nu^\ast}^T \feq(x^\ast) $$
- $\feq(x^\ast)=0$ implies ${\lambda^\ast}^T \fie(x^\ast)=0$
- then $\lambda^\ast \succeq 0$ and $\fie(x^\ast) \preceq 0$ imply $$ \lambda_i^\ast \fie_i(x^\ast) = 0 \quad i=1,\ldots,m $$
KKT optimality conditions
KKT necessary for optimality with strong duality
- when strong duality holds, KKT optimality conditions are necessary for primal and dual optimality
- or equivalently
- primal and dual optimality with strong duality imply KKT optimality conditions
KKT and convexity sufficient for optimality with strong duality
- assume convex optimization problem where $\fobj$, $\fie$, and $\feq$ are all differentiable and ${x}\in\optdomain$ and $({\lambda}, {\nu})\in\reals^m\times\reals^p$ satisfying KKT conditions, i.e. $$ \fie({x}) \preceq 0, \; \feq({x}) = 0 , \; {\lambda} \succeq 0 , \; {\lambda}^T \fie({x}) = 0 , \; \nabla_x L({x}, {\lambda},{\nu}) = 0 $$
- since $L(x,\lambda,\nu)$ is convex in $x$ for $\lambda\succeq 0$, i.e., each of $\fobj(x)$, $\lambda^T \fie(x)$, and $\nu^T \feq(x)$ is convex in $x$, vanishing gradient implies $x$ achieves infimum of Lagrangian, hence $$ g(\lambda,\nu) = L(x,\lambda,\nu) = \fobj(x) + \lambda^T \fie(x) + \nu^T \feq(x) = \fobj(x) $$ where last equality uses $\lambda^T \fie(x) = 0$ and $\feq(x)=0$
- thus, strong duality holds, i.e., $x$ and $(\lambda,\nu)$ are primal and dual optimal solutions with zero duality gap
- for convex optimization problem, KKT optimality conditions are sufficient for primal and dual optimality with strong duality
- or equivalently
- KKT optimality conditions and convexity imply primal and dual optimality and strong duality
-
together with
implies
that
for convex optimization problem
- KKT optimality conditions are necessary and sufficient for primal and dual optimality with strong duality
Solving primal problems via dual problems
- when strong duality holds, can retrieve primal optimum from dual optimum since primal optimal solution is minimizer of $$ L(x,\lambda^\ast,\nu^\ast) $$ where $(\lambda^\ast, \nu^\ast)$ is dual optimum (provided minimizer exists and is unique)
-
example - entropy maximization
($\optdomain = \pprealk{n}$)
- primal problem - $$ \begin{array}{ll} \mbox{minimize} & \sum_{i=1}^n x_i \log x_i \\ \mbox{subject to} & Ax \preceq b \\ & \ones^T x = 1 \end{array} $$
- dual problem - $$ \begin{array}{ll} \mbox{maximize} & -b^T\lambda - \nu - e^{-\nu-1} \sum_{i=1}^n \exp(-(A^T\lambda)_i) \\ \mbox{subject to} & \lambda \succeq 0 \end{array} $$
- given dual optimum $(\lambda^\ast,\nu^\ast)$, primal optimum is $$ x^\ast = \argmin_{x\in\optdomain} \left( \sum x_i \log x_i + {\lambda^\ast}^T (Ax-b) + \nu^\ast(\ones^Tx -1) \right) $$
- $\nabla_x L(x,\lambda^\ast,\nu^\ast) = \log x + A^T \lambda^\ast + (1+\nu^\ast)\ones$, hence $$ x^\ast = \exp(-(A^T \lambda^\ast + (1+\nu^\ast)\ones)) $$
Perturbed optimization problems
- original problem in with perturbed constraints $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq u \\ & \feq(x) =v \end{array} $$ where $u\in\reals^m$ and $v\in\reals^p$
- define $p^\ast(u,v)$ as optimal value of above perturbed problem, i.e. $$ p^\ast(u,v) = \inf\set{\fobj(x)}{x\in\optdomain, \fie(x) \preceq u, \feq(x) = v} $$ which is convex when problem is convex optimization problem - note $p^\ast(0,0)=p^\ast$
- assume $(\lambda^\ast,\nu^\ast)$ is dual optimum and strong duality holds; then for every feasible $x$ for perturbed problem $$ p^\ast(0,0)=g(\lambda^\ast,\nu^\ast) \leq \fobj(x) + {\lambda^\ast}^T \fie(x) + {\nu^\ast}^T \feq(x) \leq \fobj(x) + {\lambda^\ast}^T u + {\nu^\ast}^T v $$ thus $$ p^\ast(0,0)\leq p^\ast(u,v) + {\lambda^\ast}^T u + {\nu^\ast}^T v $$ hence $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$
- the figure shows this for optimization problem with one inequality constraint and no equality constraint
Global sensitivity analysis via perturbed problems
- recall $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$
-
interpretations
- if $\lambda^\ast_i$ is large, tightening $i$-th inequality constraint increases optimal value a lot
- if $\lambda^\ast_i$ is small, relaxing $i$-th inequality constraint does not decrease optimal value much
- if $|\nu^\ast_i|$ is large, reducing $v_i$ when $\nu^\ast_i>0$ or increasing $v_i$ when $\nu^\ast_i<0$ increases optimal value a lot
- if $|\nu^\ast_i|$ is small, increasing $v_i$ when $\nu^\ast_i>0$ or decreasing $v_i$ when $\nu^\ast_i<0$ does not decrease optimal value much
- this gives only lower bounds - will explore local behavior next
Local sensitivity analysis via perturbed problems
-
assume $p^\ast(u,v)$ is differentiable with respect to $u$ and $v$,
i.e., $\nabla_{(u,v)} p^\ast(u,v)$ exists
- then $$ \frac{\partial}{\partial u_i} p^\ast (0,0) = \lim_{h\to 0^+} \frac{p^\ast(he_i,0) - p^\ast(0,0)}{h} \geq \lim_{h\to 0^+} \frac{-{\lambda^\ast}^T (he_i) }{h} = -\lambda_i^\ast $$ and $$ \frac{\partial}{\partial u_i} p^\ast (0,0) = \lim_{h\to 0^-} \frac{p^\ast(he_i,0) - p^\ast(0,0)}{h} \leq \lim_{h\to 0^-} \frac{-{\lambda^\ast}^T (he_i) }{h} = -\lambda_i^\ast $$
- obtain same result for $v_i$, hence $$ \nabla_u\; p^\ast (0,0) = -\lambda^\ast \quad \nabla_v\; p^\ast (0,0) = -\nu^\ast $$
- so larger $\lambda^\ast_i$ or $|\nu^\ast_i|$ means larger change in optimal value of perturbed problem when $u_i$ or $v_i$ changes a bit, and vice versa; quantitatively, $\lambda^\ast_i$ and $\nu^\ast_i$ provide exact rate and direction
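the local sensitivity result $\nabla_u p^\ast(0,0) = -\lambda^\ast$ can be checked numerically; a minimal sketch on a toy one-variable problem of my own (not from the text), minimize $x^2$ subject to $1-x\leq u$, where KKT at $u=0$ gives $x^\ast=1$ and $\lambda^\ast=2$:

```python
# toy problem (illustrative): minimize x^2 subject to 1 - x <= u
# perturbed optimum is x*(u) = max(1 - u, 0), so p*(u) = max(1 - u, 0)^2;
# KKT at u = 0: 2 x* - lambda* = 0 with x* = 1, hence lambda* = 2

def p_star(u):
    x = max(1.0 - u, 0.0)  # project unconstrained optimum 0 onto feasible set
    return x * x

lam_star = 2.0

# finite-difference slope of p* at u = 0 should be close to -lambda*
h = 1e-6
fd_slope = (p_star(h) - p_star(0.0)) / h
print(fd_slope)
```

the forward difference returns roughly $-2+h$, matching $-\lambda^\ast$ up to the step size.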
Different dual problems for equivalent optimization problems - 1
-
introducing new variables and equality constraints
for unconstrained problems
-
unconstrained optimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(Ax+b)
\end{array}
$$
- Lagrange dual function is constant, $g = p^\ast$, hence strong duality trivially holds, which, however, does not provide useful information
-
reformulate as equivalent optimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(y)
\\
\mbox{subject to} &
Ax+b = y
\end{array}
$$
- Lagrangian - $L(x,y,\nu) = f(y) + \nu^T(Ax+b-y)$
- Lagrange dual function - $$ g(\nu) = \left\{ \begin{array}{ll} b^T\nu - f^\ast(\nu) & A^T\nu = 0 \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T\nu - f^\ast(\nu) \\ \mbox{subject to} & A^T \nu = 0 \end{array} $$
-
examples
-
unconstrained geometric problem
$$
\begin{array}{ll}
\mbox{minimize} &
\log\left(
\sum_{i=1}^m \exp(a_i^Tx + b_i)
\right)
\end{array}
$$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \log\left( \sum_{i=1}^m \exp(y_i) \right) \\ \mbox{subject to} & Ax + b =y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T \nu - \sum_{i=1}^m \nu_i \log \nu_i \\ \mbox{subject to} & \ones^T \nu = 1 \\ & A^T \nu = 0 \\ & \nu \succeq 0 \end{array} $$ which is entropy maximization problem
-
norm minimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
\|Ax-b\|
\end{array}
$$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \|y\| \\ \mbox{subject to} & Ax - b = y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T \nu \\ \mbox{subject to} & \|\nu\|_\ast \leq 1 \\ & A^T \nu =0 \end{array} $$
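for the Euclidean norm (whose dual norm is again the Euclidean norm), zero duality gap for this pair is easy to verify numerically: the normalized least-squares residual is dual feasible and attains the primal optimal value; a small sketch with made-up $A$, $b$:

```python
import numpy as np

# minimize ||Ax - b||_2; dual: maximize b^T nu s.t. ||nu||_2 <= 1, A^T nu = 0
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])  # chosen so b is not in range(A)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_star             # optimal residual
p_star = np.linalg.norm(r)     # primal optimal value

nu = r / p_star                # candidate dual point
# dual feasibility: A^T nu = 0 (normal equations) and ||nu||_2 = 1
print(np.linalg.norm(A.T @ nu))  # ~0
print(b @ nu - p_star)           # ~0: zero duality gap
```

since $A^Tr=0$, we get $b^T\nu = (Ax^\ast + r)^Tr/\|r\|_2 = \|r\|_2 = p^\ast$.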
Different dual problems for equivalent optimization problems - 2
-
introducing new variables and equality constraints
for constrained problems
- inequality constrained optimization problem $$ \begin{array}{ll} \mbox{minimize} & f_0(A_0x+b_0) \\ \mbox{subject to} & f_i(A_ix+b_i) \leq 0\quad i=1,\ldots,m \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & f_0(y_0) \\ \mbox{subject to} & f_i(y_i) \leq 0\quad i=1,\ldots,m \\ & A_i x + b_i = y_i\quad i=0,\ldots,m \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m \nu_i^T b_i - f_0^\ast(\nu_0) - \sum_{i=1}^m \lambda_i f_i^\ast(\nu_i/\lambda_i) \\ \mbox{subject to} & \sum_{i=0}^m A_i^T \nu_i = 0 \\ & \lambda \succeq 0 \end{array} $$
-
examples
-
inequality constrained geometric program
$$
\begin{array}{ll}
\mbox{minimize} &
\log\left(\sum \exp(A_0x + b_0)\right)
\\
\mbox{subject to} &
\log\left(\sum \exp(A_ix + b_i)\right)\leq 0\quad i=1,\ldots,m
\end{array}
$$
where $A_i\in\reals^{K_i\times n}$
and $\exp(z) := (\exp(z_1),\ldots,\exp(z_k))\in\reals^k$
and $\sum z := \sum_{i=1}^k z_i\in\reals$
for $z\in\reals^k$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \log\left(\sum \exp(y_0)\right) \\ \mbox{subject to} & \log\left(\sum \exp(y_i)\right)\leq 0\quad i=1,\ldots,m \\ & A_i x + b_i = y_i \quad i=0,\ldots,m \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m b_i^T \nu_i - \nu_0^T\log(\nu_0) - \sum_{i=1}^m \nu_i^T\log(\nu_i/\lambda_i) \\ \mbox{subject to} & \nu_i \succeq 0\quad i=0,\ldots,m \\ & \ones^T \nu_0 = 1,\; \ones^T\nu_i=\lambda_i\quad i=1,\ldots,m \\ & \lambda_i\geq 0 \quad i=1,\ldots,m \\ & \sum_{i=0}^m A_i^T\nu_i = 0 \end{array} $$ where $\log(z) := (\log(z_1),\ldots,\log(z_k))\in\reals^k$ for $z\in\pprealk{k}$
- simplified dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m b_i^T \nu_i - \nu_0^T\log(\nu_0) - \sum_{i=1}^m \nu_i^T\log(\nu_i/\ones^T\nu_i) \\ \mbox{subject to} & \nu_i \succeq 0\quad i=0,\ldots,m \\ & \ones^T \nu_0 = 1 \\ & \sum_{i=0}^m A_i^T\nu_i = 0 \end{array} $$
Different dual problems for equivalent optimization problems - 3
-
transforming objectives
- norm minimization problem $$ \begin{array}{ll} \mbox{minimize} & \|Ax - b\| \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & (1/2)\|y\|^2 \\ \mbox{subject to} & Ax - b = y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & -(1/2)\|\nu\|_\ast^2 + b^T\nu \\ \mbox{subject to} & A^T\nu = 0 \end{array} $$
Different dual problems for equivalent optimization problems - 4
-
making constraints implicit
- LP with box constraints $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & Ax = b,\; l \preceq x \preceq u \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu - \lambda_1^Tu + \lambda_2^Tl \\ \mbox{subject to} & A^T\nu + \lambda_1 - \lambda_2 + c = 0,\; \lambda_1 \succeq 0,\; \lambda_2 \succeq 0 \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & c^T x + I ( l\preceq x \preceq u) \\ \mbox{subject to} & Ax = b \end{array} $$
- dual optimization problem for reformulated primal problem $$ \begin{array}{ll} \mbox{maximize} & -b^T \nu - u^T(A^T\nu + c)^- + l^T(A^T\nu + c)^+ \end{array} $$
Theorems of Alternatives
Weak alternatives
- can prove using duality of optimization problems
-
consider primal and dual problems
- primal problem $$ \begin{array}{ll} \mbox{minimize} & 0 \\ \mbox{subject to} & \fie(x) \preceq 0 \\ & \feq(x) =0 \end{array} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & g(\lambda,\nu) \\ \mbox{subject to} & \lambda \succeq 0 \end{array} $$ where $$ g(\lambda,\nu) = \inf_{x\in\optdomain} \left( \lambda^T \fie(x) + \nu^T \feq(x) \right) $$
- then $p^\ast,\; d^\ast \in \{0,\infty\}$
- now assume first system of \theoremname~\ref{theorem:weak alternatives of two systems}\ is feasible; then $p^\ast = 0$, and weak duality implies $d^\ast=0$, thus there exist no $\lambda$ and $\nu$ such that $\lambda\succeq 0$ and $g(\lambda,\nu) > 0$, i.e., second system is infeasible - otherwise $g$ could be made arbitrarily large, contradicting $d^\ast=0$: if $\tilde{\lambda}\succeq 0$ and $\tilde{\nu}$ satisfy $g(\tilde{\lambda},\tilde{\nu})>0$, then $g(\alpha\tilde{\lambda}, \alpha\tilde{\nu}) = \alpha g(\tilde{\lambda}, \tilde{\nu}) \to \infty$ as $\alpha\to\infty$
- assume second system is feasible, then $g(\lambda,\nu)$ can be arbitrarily large for above reasons, thus $d^\ast = \infty$, hence weak duality implies $p^\ast = \infty$, which implies first system is infeasible
- therefore two systems are weak alternatives; at most one of them is feasible
- (actually, not hard to prove it without using weak duality)
Weak alternatives with strict inequalities
Strong alternatives
Strong alternatives with strict inequalities
-
proof -
consider convex optimization problem and its dual
- primal problem $$ \begin{array}{ll} \mbox{minimize} & s \\ \mbox{subject to} & \fie(x) - s \ones \preceq 0 \\ & \feq(x) =0 \end{array} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & g(\lambda,\nu) \\ \mbox{subject to} & \lambda \succeq 0 \quad \ones^T \lambda = 1 \end{array} $$ where $g(\lambda,\nu) = \inf_{x\in\optdomain} \left( \lambda^T \fie(x) + \nu^T \feq(x) \right)$
- first observe Slater's condition holds for primal problem since by hypothesis of , exists $y\in\relint \optdomain$ with $\feq(y)=0$, hence $(y,s)$ with $s > \max_i \fie_i(y)$ is primal feasible, satisfying Slater's condition
- hence Slater's theorem () implies $d^\ast=p^\ast$
- assume first system is feasible, then primal problem is strictly feasible and $d^\ast = p^\ast<0$, hence second system infeasible since otherwise feasible point for second system is feasible point of dual problem, hence $d^\ast\geq0$
- assume first system is infeasible, then $d^\ast = p^\ast\geq0$, hence Slater's theorem () implies exists dual optimal $(\lambda^\ast,\nu^\ast)$ (whether or not $d^\ast=\infty$), hence $(\lambda^\ast,\nu^\ast)$ is feasible point for second system of
- therefore two systems are strong alternatives; each is feasible if and only if the other is infeasible
Strong alternatives for linear inequalities
- dual function of feasibility problem for $Ax\preceq b$ is $$ g(\lambda) = \inf_{x\in\reals^n} \lambda^T(Ax-b) = \left\{ \begin{array}{ll} -b^T \lambda & A^T\lambda = 0 \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- hence alternative system is $\lambda\succeq0,\;b^T\lambda <0,\; A^T\lambda=0$
- thus implies below systems are strong alternatives $$ Ax \preceq b \quad\&\quad \lambda\succeq0 \quad b^T\lambda <0 \quad A^T\lambda=0 $$
- similarly, for strict inequalities, alternative system is $\lambda\succeq0,\;\lambda\neq0,\;b^T\lambda \leq 0,\; A^T\lambda=0$, and implies below systems are strong alternatives $$ Ax \prec b \quad\&\quad \lambda\succeq0 \quad \lambda \neq 0 \quad b^T\lambda \leq 0 \quad A^T\lambda=0 $$
Farkas' lemma
- will prove using LP and its dual
- consider LP $\left(\mbox{minimize}\; c^T x \quad \mbox{subject to}\; Ax \preceq 0\right)$
- dual function is $g(y) = \inf_{x\in\reals^n} \left(c^Tx + y^TAx \right) = \left\{ \begin{array}{ll} 0 & A^Ty + c= 0 \\ -\infty & \mbox{otherwise} \end{array} \right.$
- hence dual problem is $\left( \mbox{maximize} \; 0 \quad \mbox{subject to} \; A^T y + c = 0 , \; y \succeq 0 \right)$
- assume first system is feasible, then homogeneity of primal problem implies $p^\ast = -\infty$, thus $d^\ast=-\infty$, i.e., dual is infeasible, hence second system is infeasible
- assume first system is infeasible, since primal is always feasible, $p^\ast=0$, hence strong duality implies $d^\ast =0$, thus second system is feasible
Convex Optimization with Generalized Inequalities
Optimization problems with generalized inequalities
- every terminology and associated notation is same as for optimization problem in , such as objective & inequality & equality constraint functions, domain of optimization problem $\optdomain$, feasible set $\optfeasset$, optimal value $p^\ast$
- note that when $K_i=\preals$ (hence $\bigpropercone=\prealk{m}$), above optimization problem coincides with that in , i.e., optimization problems with generalized inequalities subsume (normal) optimization problems
Lagrangian for generalized inequalities
Lagrange dual functions for generalized inequalities
- $g$ is concave function
- $g(\lambda,\nu)$ is lower bound for optimal value of associated optimization problem, i.e., $$ g(\lambda,\nu) \leq p^\ast $$ for every $\lambda\succeq_{\bigpropercone^\ast}0$ where $\bigpropercone^\ast$ denotes dual cone of $\bigpropercone$, i.e., $\bigpropercone^\ast = \bigtimes K_i^\ast$ where $K_i^\ast\subset\reals^{k_i}$ is dual cone of $K_i\subset\reals^{k_i}$
- $(\lambda,\nu)$ with $\lambda\succeq_{\bigpropercone^\ast} 0$ and $g(\lambda,\nu)>-\infty$ said to be dual feasible
Lagrange dual problems for generalized inequalities
Slater's theorem for generalized inequalities
Duality for SDP
- (inequality form) SDP $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & x_1F_1 + \cdots + x_nF_n + G \preceq 0 \end{array} $$ where $F_1,\ldots,F_n,G\in\symset{k}$ and $\bigpropercone = \possemidefset{k}$
- Lagrangian $$ L(x,Z) = c^Tx + (x_1F_1 + \cdots + x_nF_n + G) \bullet Z = \sum x_i(F_i\bullet Z + c_i) + G \bullet Z $$ where $X\bullet Y = \Tr XY$ for $X,Y\in\symset{k}$
- Lagrange dual function $$ g(Z) = \inf_{x\in\reals^n} L(x,Z) = \left\{ \begin{array}{ll} G \bullet Z & F_i\bullet Z + c_i= 0\quad i=1,\ldots,n \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & G\bullet Z \\ \mbox{subject to} & F_i \bullet Z + c_i = 0\quad i=1,\ldots,n \\ & Z \succeq 0 \end{array} $$ using fact that $\possemidefset{k}$ is self-dual, i.e., $\bigpropercone^\ast = \bigpropercone$
- Slater's theorem () implies if primal problem is strictly feasible, i.e., exists $x\in\reals^n$ such that $\sum x_iF_i + G\prec 0$, strong duality holds
KKT optimality conditions for generalized inequalities
KKT conditions and optimalities for generalized inequalities
-
for every optimization problem with generalized inequalities
(),
every statement for normal optimization problem
(),
regarding relations among
KKT conditions,
optimality,
primal and dual optimality,
and
strong duality,
is exactly the same
-
for every optimization problem with generalized inequalities
()
- if strong duality holds, primal and dual optimal points satisfy KKT optimality conditions in (same as )
- if optimization problem is convex and primal and dual solutions satisfy KKT optimality conditions in , the solutions are optimal with strong duality (same as )
- therefore, for convex optimization problem, KKT optimality conditions are necessary and sufficient for primal and dual optimality with strong duality
-
for every optimization problem with generalized inequalities
()
Perturbation and sensitivity analysis for generalized inequalities
- original problem in with perturbed constraints $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq_\bigpropercone u \\ & \feq(x) =v \end{array} $$ where $u\in\reals^m$ and $v\in\reals^p$
- define $p^\ast(u,v) = \inf\set{\fobj(x)}{x\in\optdomain, \fie(x) \preceq_\bigpropercone u, \feq(x) = v}$, which is convex when problem is convex optimization problem - note $p^\ast(0,0)=p^\ast$
- as for normal optimization problem case (page~), if $(\lambda^\ast,\nu^\ast)$ is dual optimum and strong duality holds, $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$ and $$ \nabla_u\; p^\ast (0,0) = -\lambda^\ast \quad \nabla_v\; p^\ast (0,0) = -\nu^\ast $$
Sensitivity analysis for SDP
- assume inequality form SDP and its dual problem on page~ and page~
-
consider perturbed SDP
$$
\begin{array}{ll}
\mbox{minimize} &
c^Tx
\\
\mbox{subject to} &
x_1F_1 + \cdots + x_nF_n + G \preceq U
\end{array}
$$
for some $U\in\symset{k}$
- define $p^\ast:\symset{k} \to \reals$ such that $p^\ast(U)$ is optimal value of above problem
- assume $x^\ast\in\reals^n$ and $Z^\ast\in\possemidefset{k}$ are primal and dual optimum with zero duality gap
- then $$ p^\ast(U) \geq p^\ast - Z^\ast \bullet U $$
- if $\nabla_U p^\ast$ exists at $U=0$ $$ \nabla_U p^\ast(0) = - Z^\ast $$
Weak alternatives for generalized inequalities
Strong alternatives for generalized inequalities
Strong alternatives for SDP
-
for $F_1,\ldots,F_n,G\in\symset{k}$, $x\in\reals^n$, and $Z\in\symset{k}$
- below systems are strong alternatives $$ x_1F_1 + \cdots + x_nF_n + G \prec 0 $$ and $$ Z \succeq 0 \quad Z\neq 0 \quad G\bullet Z \geq 0 \quad F_i \bullet Z = 0\;i=1,\ldots,n $$
- if $\sum v_i F_i \succeq 0 \Rightarrow \sum v_i F_i = 0$, below systems are strong alternatives $$ x_1F_1 + \cdots + x_nF_n + G \preceq 0 $$ and $$ Z \succeq 0 \quad G\bullet Z > 0 \quad F_i \bullet Z = 0\;i=1,\ldots,n $$
Unconstrained Minimization
Unconstrained minimization
- consider unconstrained convex optimization problem, i.e., $m=p=0$ in $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \end{array} $$ where domain of optimization problem is $\optdomain\ = \xobj \subset \reals^n$
-
assume
- $\fobj$ is twice-differentiable (hence by definition $\xobj$ is open)
- optimal solution $x^\ast$ exists, i.e., $p^\ast = \inf_{x\in\optdomain} \fobj(x) = \fobj(x^\ast)$
- implies $x^\ast$ is optimal solution if and only if $$ \nabla \fobj(x^\ast) = 0 $$
- can solve above equation directly for few cases, but usually must resort to iterative methods, i.e., find sequence of points $\xseqk{0}, \xseqk{1}, \ldots \in \xobj$ such that $\lim_{k\to\infty} \fobj(\xseqk{k}) = p^\ast$
Requirements for iterative methods
-
requirements for iterative methods
- initial point $\xseqk{0}$ should be in domain of optimization problem, i.e. $$ \xseqk{0} \in \xobj\ $$
- sublevel set of $\fobj(\xseqk{0})$ $$ S = \bigset{x\in\xobj}{\fobj(x) \leq \fobj(\xseqk{0})} $$ should be closed
-
e.g.
- sublevel set of $\fobj(\xseqk{0})$ is closed for all $\xseqk{0}\in\xobj$ if $\fobj$ is closed, i.e., all its sublevel sets are closed
- $\fobj$ is closed if $\xobj = \reals^n$ and $\fobj$ is continuous
- $\fobj$ is closed if $\fobj$ is continuous, $\xobj$ is open, and $\fobj(x) \to \infty$ as $x \to \boundary \xobj$
Unconstrained minimization examples
-
convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) =
(1/2) x^TP x +q^Tx
\end{array}
$$
where $P\in\possemidefset{n}$ and $q\in\reals^n$
-
solution obtained by solving
$$
\nabla \fobj(x^\ast) = P x^\ast + q = 0
$$
- if solution exists, $x^\ast = - P^\dagger q$ (thus $p^\ast>-\infty$)
- otherwise, problem is unbounded below, i.e., $p^\ast = -\infty$
- ability to analytically solve quadratic minimization problem is basis for Newton's method, a powerful method for unconstrained minimization
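the analytic solution $x^\ast = -P^\dagger q$ can be sketched numerically on a made-up instance with singular $P$ (the data below is my own illustration), where the gradient still vanishes provided $q\in\range(P)$:

```python
import numpy as np

# singular P (rank 2 in R^3); q chosen inside range(P) so a minimizer exists
P = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
q = np.array([1.0, -2.0, 0.0])  # third component 0, so q is in range(P)

x_star = -np.linalg.pinv(P) @ q  # pseudoinverse solution
grad = P @ x_star + q            # gradient of (1/2) x^T P x + q^T x
print(grad)                      # ~[0, 0, 0]
```

with a nonzero third component of $q$, the equation $Px+q=0$ has no solution and the problem is unbounded below.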
-
least-squares (LS) is special case of convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
(1/2) \|Ax-b\|_2^2
= (1/2) x^T (A^TA) x - b^TAx + (1/2)\|b\|_2^2
\end{array}
$$
- optimal solution always exists, can be obtained via normal equations $$ A^TAx^\ast = A^Tb $$
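a quick numerical check of the normal equations on arbitrary made-up data, against the library least-squares solver:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# solve the normal equations A^T A x = A^T b directly
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# compare with the library least-squares solver
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_ne, x_ls))  # True
```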
-
unconstrained GP
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) =
\log\left(
\sum \exp (Ax+b)
\right)
\end{array}
$$
for $A\in\reals^{m\times n}$ and $b\in\reals^m$
- solution obtained by solving $$ \nabla \fobj(x^\ast) = \frac{A^T \exp(Ax^\ast+b)}{\sum \exp(Ax^\ast+b)} = 0 $$
- need to resort to iterative method - since $\xobj = \reals^n$ and $\fobj$ is continuous, $\fobj$ is closed, hence every point in $\reals^n$ can be initial point
-
analytic center of linear inequalities
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = - \sum\log(b-Ax)
\end{array}
$$
where $\xobj = \set{x\in\reals^n}{b-Ax \succ 0}$
- need to resort to iterative method - since $\xobj$ is open, $\fobj$ is continuous, and $\fobj(x) \to \infty$ as $x\to\boundary \xobj$, $\fobj$ is closed, hence every point in $\xobj$ can be initial point
- $\fobj$ is called logarithmic barrier for inequalities $Ax\preceq b$
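the barrier's derivatives have a convenient closed form, $\nabla\fobj(x) = A^Td$ and $\nabla^2\fobj(x) = A^T\diag(d)^2A$ with $d_i = 1/(b-Ax)_i$; a small sketch (with made-up strictly feasible data) checking the gradient against finite differences:

```python
import numpy as np

# made-up inequalities b - Ax > 0 defining a bounded triangle
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])

def f(x):
    return -np.sum(np.log(b - A @ x))

def grad(x):
    d = 1.0 / (b - A @ x)          # d_i = 1 / (b - Ax)_i
    return A.T @ d

def hess(x):
    d = 1.0 / (b - A @ x)
    return A.T @ np.diag(d**2) @ A

x = np.array([0.2, -0.3])          # strictly feasible point
g = grad(x)
H = hess(x)

# gradient agrees with central finite differences
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.max(np.abs(g - fd)))      # small
```

since $A$ has full column rank here, $H$ is positive definite, consistent with strict convexity of the barrier.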
-
analytic center of LMI
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = - \log\det F(x) = \log\det F(x)^{-1}
\end{array}
$$
where $F:\reals^n\to \symset{k}$ is defined by
$$
F(x) = x_1F_1 + \cdots + x_nF_n
$$
where $F_i\in \symset{k}$
and $\xobj = \set{x\in\reals^n}{F(x)\succ 0}$
- need to resort to iterative method - since $\xobj$ is open, $\fobj$ is continuous, and $\fobj(x) \to \infty$ as $x\to\boundary \xobj$, $\fobj$ is closed, hence every point in $\xobj$ can be initial point
- $\fobj$ is called logarithmic barrier for LMI
Strong convexity and implications
- function $\fobj$ is strongly convex on $S$ $$ \left( \exists m >0 \right) \left( \forall x \in S \right) \left( \nabla^2 \fobj(x) \succeq mI \right) $$
-
strong convexity implies for every $x,y\in S$
$$
\fobj(y) \geq \fobj(x) + \nabla \fobj(x)^T (y-x) + ({m}/{2}) \|y-x\|_2^2
$$
- which implies gradient provides optimality certificate and tells us how far current point is from optimum, i.e. $$ \fobj(x) - p^\ast \leq ({1}/{2m}) \|\nabla \fobj(x)\|_2^2 \quad \|x-x^\ast\|_2 \leq ({2}/{m}) \|\nabla \fobj(x)\|_2 $$
- the strong convexity inequality implies sublevel sets contained in $S$ are bounded, hence $S$ itself is bounded, so continuous function $\nabla^2 \fobj(x)$ is bounded on $S$, i.e., $\left( \exists M >0 \right) \left( \forall x \in S \right) \left( \nabla^2 \fobj(x) \preceq M I \right)$, then $$ \fobj(x) - p^\ast \geq \frac{1}{2M} \|\nabla \fobj(x)\|_2^2 $$
Iterative methods
Line search methods
- Require: $\fobj$, $\sdirk{k}$, $\alpha\in(0,0.5)$, $\beta\in(0,1)$
- $\slen:=1$
- while $\fobj(\xseqk{k} + \slen \sdirk{k}) > \fobj(\xseqk{k}) + \alpha \slen \nabla \fobj(\xseqk{k})^T \sdirk{k}$ do
- $\slen := \beta \slen$
- end while
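the backtracking loop above translates directly to code; a minimal sketch (the objective and starting point are my own illustration):

```python
import numpy as np

def backtracking(f, grad_fx, x, direction, alpha=0.25, beta=0.5):
    """Backtracking line search: shrink t until the sufficient-decrease
    condition f(x + t d) <= f(x) + alpha * t * grad(x)^T d holds."""
    t = 1.0
    fx = f(x)
    slope = grad_fx @ direction  # negative for a descent direction
    while f(x + t * direction) > fx + alpha * t * slope:
        t *= beta
    return t

# example: f(x) = x1^2 + 4 x2^2, steepest-descent direction at x0
f = lambda x: x[0]**2 + 4.0 * x[1]**2
g = lambda x: np.array([2.0 * x[0], 8.0 * x[1]])
x0 = np.array([1.0, 1.0])
d = -g(x0)
t = backtracking(f, g(x0), x0, d)
print(f(x0 + t * d) < f(x0))  # True: accepted step decreases f
```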
Gradient descent method
- Require: $\fobj$, initial point $x\in \dom \fobj$
- repeat
- search direction - $\sdir := - \nabla \fobj(x)$
- do line search to choose $\slen>0$
- update - $x := x + \slen \sdir$
- until stopping criterion satisfied
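the loop above, with backtracking line search inlined and a small-gradient stopping criterion, can be sketched on a made-up poorly conditioned quadratic:

```python
import numpy as np

def gradient_descent(f, grad, x, tol=1e-8, alpha=0.25, beta=0.5, max_iter=10_000):
    for _ in range(max_iter):
        d = -grad(x)                      # search direction: negative gradient
        if np.linalg.norm(d) < tol:       # stopping criterion
            break
        # backtracking line search
        t, fx, slope = 1.0, f(x), grad(x) @ d
        while f(x + t * d) > fx + alpha * t * slope:
            t *= beta
        x = x + t * d                     # update
    return x

# quadratic with condition number 25: convergence is linear but slowed
f = lambda x: 0.5 * (x[0]**2 + 25.0 * x[1]**2)
grad = lambda x: np.array([x[0], 25.0 * x[1]])
x_star = gradient_descent(f, grad, np.array([1.0, 1.0]))
print(np.linalg.norm(x_star))  # ~0: converged to the minimizer
```

increasing the condition number further makes the iterate zigzag and the linear rate degrade, matching the summary below.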
Summary of gradient descent method
- gradient method often exhibits approximately linear convergence, i.e., error $\fobj(\xseqk{k})-p^\ast$ converges to zero approximately as geometric series
- choice of backtracking parameters $\alpha$ and $\beta$ has noticeable but not dramatic effect on convergence
- exact line search sometimes improves convergence of gradient method, but not by much, hence usually not worth implementing
- convergence rate depends greatly on condition number of Hessian or sublevel sets - when condition number is large, gradient method can be useless
Newton's method - motivation
- second-order Taylor approximation of $\fobj$ - $\hat{\fobj}(\sdir) = \fobj(x) + \nabla \fobj(x)^T \sdir + \frac{1}{2} \sdir^T \nabla^2 \fobj(x) \sdir \approx \fobj(x + \sdir)$
- minimum of Taylor approximation achieved when $\nabla \hat{\fobj}(\sdir) = \nabla \fobj(x) + \nabla^2 \fobj(x) \sdir = 0$
- solution called Newton step $$ \sdir_\mathrm{nt}(x) = - \nabla^2 \fobj(x)^{-1} \nabla \fobj(x) $$ assuming $\nabla^2\fobj(x)\succ0$
- thus Newton step minimizes local quadratic approximation of function
- difference between current value and minimum of quadratic approximation $$ \fobj(x) - \hat{\fobj}(\sdir_\mathrm{nt}(x)) = \frac{1}{2} \sdir_\mathrm{nt}(x)^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}(x) = \frac{1}{2} \lambda(x)^2 $$
- Newton decrement $$ \lambda(x) = \sqrt{\sdir_\mathrm{nt}(x)^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}(x)} = \sqrt{\nabla \fobj(x)^T \nabla^2 \fobj(x)^{-1} \nabla \fobj(x)} $$
Newton's method
- Require: $\fobj$, initial point $x\in \dom \fobj$, tolerance $\epsilon>0$
- loop
- compute Newton step and decrement $$ \sdir_\mathrm{nt}(x) := -\nabla^2 \fobj(x)^{-1} \nabla \fobj(x) \quad \lambda(x)^2 := \nabla \fobj(x)^T \nabla^2 \fobj(x)^{-1} \nabla \fobj(x) $$
- stopping criterion - quit if $\lambda(x)^2/2 < \epsilon$
- do line search to choose $\slen>0$
- update - $x := x + \slen \sdir_\mathrm{nt}$
- end loop
- Newton step is descent direction since $$ \left. \left( \frac{d}{dx}\fobj(x+t\sdir_\mathrm{nt}) \right) \right|_{t=0} = \nabla \fobj(x) ^T \sdir_\mathrm{nt} = - \lambda(x)^2 <0 $$
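the algorithm above can be sketched on a small log-sum-exp (unconstrained GP) objective of my own choosing; the quadratic phase makes it converge in a handful of iterations:

```python
import numpy as np

# f(x) = log sum exp(Ax), a smooth strictly convex function on R^2
A = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 0.0]])

def f(x):
    return np.log(np.sum(np.exp(A @ x)))

def grad_hess(x):
    z = np.exp(A @ x)
    w = z / z.sum()                       # softmax weights
    g = A.T @ w
    H = A.T @ (np.diag(w) - np.outer(w, w)) @ A
    return g, H

x, eps = np.array([1.0, 1.0]), 1e-10
for _ in range(50):
    g, H = grad_hess(x)
    d = -np.linalg.solve(H, g)            # Newton step
    lam2 = -g @ d                         # Newton decrement squared
    if lam2 / 2 < eps:                    # stopping criterion
        break
    # backtracking line search
    t, fx = 1.0, f(x)
    while f(x + t * d) > fx - 0.25 * t * lam2:
        t *= 0.5
    x = x + t * d
print(lam2 / 2 < eps)  # True: converged
```

near the optimum the full step $t=1$ is accepted and the decrement shrinks quadratically, as the convergence analysis below predicts.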
Assumptions for convergence analysis of Newton's method
-
assumptions
- strong convexity and boundedness of Hessian on sublevel set $$ \left( \exists\; m, M > 0 \right) \left( \forall x \in S \right) \left( mI \preceq \nabla^2 \fobj(x) \preceq MI \right) $$
- Lipschitz continuity of Hessian on sublevel set $$ \left( \exists L > 0 \right) \left( \forall x,y\in S \right) \left( \|\nabla^2 \fobj(x)- \nabla^2\fobj(y)\|_2 \leq L \|x-y\|_2 \right) $$
-
Lipschitz continuity constant $L$ plays critical role
in performance of Newton's method
- intuition says Newton's method works well for functions whose quadratic approximations do not change fast, i.e., when $L$ is small
Convergence analysis of Newton's method
- damped Newton phase - if $\|\nabla \fobj(\xseqk{k})\|_2 \geq \eta$, $$ \fobj(\xseqk{k+1}) - \fobj(\xseqk{k}) \leq - \gamma $$
- quadratic convergence phase - if $\|\nabla \fobj(\xseqk{k})\|_2 < \eta$, backtracking line search selects step length $\slenk{k}=1$ $$ \frac{L}{2m^2} \|\nabla \fobj(\xseqk{k+1})\|_2 \leq \left( \frac{L}{2m^2} \|\nabla \fobj(\xseqk{k})\|_2 \right)^2 $$
Summary of Newton's method
- Newton's method is affine invariant, hence performance is independent of condition number unlike gradient method
- once entering quadratic convergence phase, Newton's method converges extremely fast
- performance not much dependent on choice of algorithm parameters
- big disadvantage is computational cost for evaluating search direction, i.e., solving linear system
Self-concordance
Why self-concordance?
- convergence analysis of Newton's method depends on assumptions about function characteristics, e.g., constants $m,M, L > 0$ for strong convexity, boundedness of Hessian, and Lipschitz continuity of Hessian, i.e. $$ m I \preceq \nabla^2 f(x) \preceq M I \quad \|\nabla^2 f(x)- \nabla^2f(y)\| \leq L \|x-y\| $$
-
self-concordance
discovered by Nesterov and Nemirovski
(who gave name self-concordance)
plays important role
for reasons
such as
- convergence analysis does not depend on any function-characterizing parameters
- many barrier functions used in interior-point methods, an important class of optimization algorithms, are self-concordant
- property of self-concordance is affine invariant
Self-concordance preserving operations
Self-concordant function examples
- negative logarithm - $f:\ppreals \to \reals$ with $$ f(x)=-\log x $$ is self-concordant since $$ |f'''(x)| / f''(x)^{3/2} = \left(2/x^3\right) / \left((1/x^2)^{3/2}\right) = 2 $$
- negative entropy plus negative logarithm - $f:\ppreals \to \reals$ with $$ f(x)=x\log x-\log x $$ is self-concordant since $$ |f'''(x)| / f''(x)^{3/2} = (x+2)/{(x+1)^{3/2}} \leq 2 $$
- log barrier for linear inequalities - for $A\in\reals^{m\times n}$ and $b\in\reals^m$ $$ f(x) = - \sum \log(b-Ax) $$ with $\dom f = \set{x\in\reals^n}{b-Ax \succ 0}$ is self-concordant by , i.e., $f$ is affine transformation of sum of self-concordant functions
- log-determinant - $f:\posdefset{n}\to\reals$ with $$ f(X) = \log\det X^{-1} = - \log\det X $$ is self-concordant since for every $X\in \posdefset{n}$ and $V\in\symset{n}$, function $g:\reals\to\reals$ defined by $g(t) = - \log\det(X+tV)$ where $\dom g = \set{t\in\reals}{X+tV\succ 0}$ satisfies $$ \begin{eqnarray*} g(t) &=& - \log \det (X^{1/2} (I + tX^{-1/2} V X^{-1/2})X^{1/2}) \\ &=& -\log\det X - \log\det(I+tX^{-1/2}VX^{-1/2}) \\ &=& -\log\det X - \sum \log (1+t\lambda_i(X,V)) \end{eqnarray*} $$ where $\lambda_i(X,V)$ is $i$-th eigenvalue of $X^{-1/2}VX^{-1/2}$, hence $g$ is self-concordant by , i.e., $g$ is affine transformation of sum of self-concordant functions
- log of concave quadratic - $f:X\to\reals$ with $$ f(x) = -\log(-x^TPx - q^Tx - r) $$ where $P\in\possemidefset{n}$ and $X=\set{x\in\reals^n}{x^TPx + q^Tx + r<0}$
-
function $f:X\to\reals$
with
$$
f(x) = -\log(-g(x)) - \log x
$$
where $\dom f = \set{x\in\dom g \cap \ppreals}{g(x)<0}$
and
function $h:H\to\reals$
$$
h(x) = -\log(-g(x)-ax^2-bx-c) - \log x
$$
where $a\geq0$ and $\dom h = \set{x\in\dom g \cap \ppreals}{g(x)+ax^2+bx+c<0}$
are self-concordant
if $g$ is one of below
- $g(x) = -x^p$ for $0<p\leq 1$
- $g(x) = -\log x$
- $g(x) = x \log x$
- $g(x) = x^p$ for $-1\leq p\leq 0$
- $g(x) = (ax+b)^2/x$ for $a,b\in\reals$
- function $f:X\to\reals$ with $X = \set{(x,y)}{\|x\|_2 < y}\subset \reals^n \times \ppreals$ defined by $$ f(x,y) = -\log(y^2-x^Tx) $$ is self-concordant - can be proved using
- function $f:X\to\reals$ with $X = \set{(x,y)}{|x|^p < y}\subset \reals \times \ppreals$ defined by $$ f(x,y) = -2\log y - \log(y^{2/p}- x^2) $$ where $p\geq1$ is self-concordant - can be proved using
- function $f:X\to\reals$ with $X = \set{(x,y)}{\exp(x) < y}\subset \reals \times \ppreals$ defined by $$ f(x,y) = -\log y - \log(\log y - x) $$ is self-concordant - can be proved using
Properties of self-concordant functions
- note $$ \lambda(x) = \sup_{v\neq 0} \left(v^T \nabla \fobj(x) / \left( v^T \nabla^2 \fobj(x) v \right)^{1/2} \right) $$
Stopping criteria for self-concordant objective functions
- recall $\lambda(x)^2$ provides approximate optimality certificate, (page~) i.e., assuming $\fobj$ is well approximated by quadratic function around $x$ $$ \fobj(x) - p^\ast \lessapprox \lambda(x)^2/2 $$
- however, strict convexity together with self-concordance provides proven bound (by ) $$ \fobj(x) - p^\ast \leq \lambda(x)^2 $$ for $\lambda(x) \leq 0.68$
- hence can use following stopping criterion for guaranteed bound $$ \lambda(x)^2 \leq \epsilon \quad \Rightarrow \quad \fobj(x) - p^\ast \leq \epsilon $$ for $\epsilon \leq 0.68^2$
Convergence analysis of Newton's method for self-concordant functions
- damped Newton phase - if $\lambda(\xseqk{k})>\eta$ $$ \fobj(\xseqk{k+1}) - \fobj(\xseqk{k}) \leq - \gamma $$
- quadratic convergence phase - if $\lambda(\xseqk{k})\leq\eta$ backtracking line search selects step length $\slenk{k}=1$ $$ 2\lambda(\xseqk{k+1}) \leq \left(2\lambda(\xseqk{k})\right)^2 $$
Equality Constrained Minimization
Equality constrained minimization
- consider equality constrained convex optimization problem, i.e., $m=0$ in $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & Ax = b \end{array} $$ where $A\in\reals^{p\times n}$ and domain of optimization problem is $\optdomain\ = \xobj \subset \reals^n$
-
assume
- $\rank A = p<n$, i.e., rows of $A$ are linearly independent
- $\fobj$ is twice-differentiable (hence by definition $\xobj$ is open)
- optimal solution $x^\ast$ exists, i.e., $p^\ast = \inf_{x\in\optfeasset} \fobj(x) = \fobj(x^\ast)$ and $Ax^\ast = b$
Solving KKT for equality constrained minimization
- implies $x^\ast\in\xobj$ is optimal solution if and only if there exists $\nu^\ast\in\reals^p$ satisfying KKT optimality conditions, i.e., $$ \begin{eqnarray*} Ax^\ast = b &&\mbox{\define{primal feasibility equations}} \\ \nabla \fobj(x^\ast) + A^T\nu^\ast = 0 &&\mbox{\define{dual feasibility equations}} \end{eqnarray*} $$
- solving equality constrained problem is equivalent to solving KKT equations
- handful of problem types can be solved analytically
- using unconstrained minimization methods
- can eliminate equality constraints and apply unconstrained minimization methods
- solving dual problem using unconstrained minimization methods and retrieving primal solution
- will discuss Newton's method directly handling equality constraints
- preserving problem structure such as sparsity
Equality constrained convex quadratic minimization
- equality constrained convex quadratic minimization problem $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) = (1/2)x^T P x + q^Tx \\ \mbox{subject to} & Ax = b \end{array} $$ where $P\in\possemidefset{n}$ and $A\in\reals^{p\times n}$
- important since basis for extension of Newton's method to equality constrained problems
- KKT system $$ Ax^\ast = b \; \& \; Px^\ast + q + A^T\nu^\ast = 0 \; \Leftrightarrow \; \underbrace{ \mattwotwo{P}{A^T}{A}{0} }_{\mbox{\define{KKT matrix}}} \colvectwo{x^\ast}{\nu^\ast} = \colvectwo{-q}{b} $$
- exist primal and dual optimum $(x^\ast,\nu^\ast)$ if and only if KKT system has solution; otherwise, problem is unbounded below
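- the KKT system above amounts to one linear solve; a minimal numpy sketch (the helper name `solve_eq_qp` and the toy problem are made up for illustration)

```python
import numpy as np

def solve_eq_qp(P, q, A, b):
    """Solve: minimize (1/2) x^T P x + q^T x  subject to  Ax = b
    by forming and solving the KKT system
        [P  A^T] [x ]   [-q]
        [A   0 ] [nu] = [ b].
    Assumes the KKT matrix is nonsingular."""
    n, p = P.shape[0], A.shape[0]
    K = np.block([[P, A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-q, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]          # x*, nu*

# toy example: minimize (1/2)||x||^2 subject to x1 + x2 = 1
P = np.eye(2); q = np.zeros(2)
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x, nu = solve_eq_qp(P, q, A, b)      # x = (0.5, 0.5), nu = -0.5
```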
Eliminating equality constraints
- can solve equality constrained convex optimization by
- eliminating equality constraints and
- using optimization method for solving unconstrained optimization
- note $$ \optfeasset = \set{x}{Ax=b} = \set{Fz + x_0}{z\in\reals^{n-p}} $$ for some $F\in\reals^{n\times(n-p)}$ where $\range(F) = \nullspace(A)$
- thus original problem equivalent to $$ \begin{array}{ll} \mbox{minimize} & \fobj(Fz + x_0) \end{array} $$
- if $z^\ast$ is optimal solution, $x^\ast = Fz^\ast + x_0$
- optimal dual can be retrieved by $$ \nu^\ast = - (AA^T)^{-1} A\nabla \fobj(x^\ast) $$
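- a numpy sketch of the elimination approach (the quadratic objective $(1/2)\|x\|^2$ is a made-up toy example); $F$ is taken from the SVD of $A$

```python
import numpy as np

A = np.array([[1.0, 1.0]]); b = np.array([1.0])
p = A.shape[0]

# columns of F span null(A): right singular vectors beyond rank(A)
_, _, Vt = np.linalg.svd(A)
F = Vt[p:].T
x0 = np.linalg.lstsq(A, b, rcond=None)[0]   # any particular solution of Ax = b

# reduced problem for f0(x) = (1/2)||x||^2: minimize (1/2)||F z + x0||^2 over z,
# i.e., solve the normal equations F^T F z = -F^T x0
z = np.linalg.solve(F.T @ F, -F.T @ x0)
x_star = F @ z + x0
# dual retrieval: nu* = -(A A^T)^{-1} A grad f0(x*), with grad f0(x) = x here
nu_star = -np.linalg.solve(A @ A.T, A @ x_star)
```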
Solving dual problems
- Lagrange dual function of equality constrained problem $$ \begin{eqnarray*} g(\nu) & = & \inf_{x\in\optdomain} \left( \fobj(x) + \nu^T(Ax-b) \right) = -b^T\nu - \sup_{x\in\optdomain} \left((-A^T\nu)^Tx -\fobj(x)\right) \\ & = & -b^T \nu - {\fobj}^\ast(-A^T\nu) \end{eqnarray*} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -b^T \nu - {\fobj}^\ast(-A^T\nu) \end{array} $$
- by assumption, strong duality holds, hence if $\nu^\ast$ is dual optimum $$ g(\nu^\ast) = p^\ast $$
- if dual objective is twice-differentiable, can solve dual problem using unconstrained minimization methods
- primal optimum can be retrieved from dual optimum using method described above
Newton's method with equality constraints
- finally discuss Newton's method which directly handles equality constraints
- similar to Newton's method for unconstrained minimization
- initial point, however, should be feasible, i.e., $\xseqk{0}\in\xobj$ and $A\xseqk{0} = b$
- Newton step tailored for equality constrained problem
Newton step via second-order approximation
- solve original problem approximately by solving $$ \begin{array}{ll} \mbox{minimize} & \hat{\fobj}(x+\sdir) = \fobj(x) + \nabla \fobj(x)^T \sdir + (1/2) \sdir^T \nabla^2 \fobj(x) \sdir \\ \mbox{subject to} & A(x+\sdir) = b \end{array} $$ where $x\in\optfeasset$
- Newton step for equality constrained minimization problem, defined by solution of KKT system for above convex quadratic minimization problem $$ \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0} \colvectwo{\sdir_\mathrm{nt}}{w} = \colvectwo{-\nabla \fobj(x)}{0} $$ well defined only when KKT matrix is nonsingular
Newton step via solving linearized KKT optimality conditions
- recall KKT optimality conditions for equality constrained convex optimization problem $$ Ax^\ast = b \quad \& \quad \nabla \fobj(x^\ast) + A^T\nu^\ast = 0 $$
- linearize KKT conditions $$ \begin{eqnarray*} && A(x+\sdir) = b \quad \& \quad \nabla \fobj(x) + \nabla^2 \fobj(x) \sdir + A^Tw = 0 \\ &\Leftrightarrow& A\sdir = 0 \quad \& \quad \nabla^2 \fobj(x) \sdir + A^Tw = - \nabla \fobj(x) \end{eqnarray*} $$ where $x\in\optfeasset$
- Newton step defined by above equations is equivalent to that obtained by second-order approximation
Newton decrement for equality constrained minimization
- Newton decrement for equality constrained problem is defined by $$ \lambda(x) = \left(\sdir_\mathrm{nt}^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}\right)^{1/2} $$
- same expression as that for unconstrained minimization, but is different since Newton step $\sdir_\mathrm{nt}$ is different from that for unconstrained minimization, i.e., $\sdir_\mathrm{nt} \neq -\nabla^2 \fobj(x)^{-1} \nabla \fobj(x)$ in general
- however, as before, $$ \fobj(x) - \inf_{\sdir\in\reals^n}\set{\hat{\fobj}(x+\sdir)}{A(x+\sdir)=b} = \lambda(x)^2/2 $$ and $$ \left. \left( \frac{d}{dt}\fobj(x+t\sdir_\mathrm{nt}) \right) \right|_{t=0} = \nabla \fobj(x) ^T \sdir_\mathrm{nt} = - \lambda(x)^2 <0 $$
Feasible Newton's method for equality constrained minimization
- Require: $\fobj$, initial point $x\in \dom \fobj$ with $Ax=b$, tolerance $\epsilon>0$
- loop
- compute Newton step and decrement $\ntsdir(x)$ \& $\lambda(x)$
- stopping criterion - quit if $\lambda(x)^2/2 < \epsilon$
- do line search on $\fobj$ to choose $\slen>0$
- update - $x := x + \slen \ntsdir$
- end loop
- assumes KKT matrix is nonsingular for every step
- is feasible descent method since all iterates are feasible with $\fobj(\xseqk{k+1}) <\fobj(\xseqk{k})$
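- a minimal numpy sketch of the method above (function names and the toy quadratic are made up; a production version would factor the KKT matrix rather than call a dense solve)

```python
import numpy as np

def newton_eq(f, grad, hess, A, x0, eps=1e-8, alpha=0.25, beta=0.5):
    """Feasible Newton's method for: minimize f(x) subject to Ax = b.
    x0 must be feasible; since every Newton step satisfies A dx = 0,
    all iterates remain feasible."""
    x = x0.copy()
    n, p = A.shape[1], A.shape[0]
    while True:
        g, H = grad(x), hess(x)
        # KKT system: [H A^T; A 0] [dx; w] = [-g; 0]
        K = np.block([[H, A.T], [A, np.zeros((p, p))]])
        dx = np.linalg.solve(K, np.concatenate([-g, np.zeros(p)]))[:n]
        lam2 = dx @ H @ dx               # Newton decrement squared
        if lam2 / 2 < eps:               # stopping criterion
            return x
        t = 1.0                          # backtracking line search on f
        while f(x + t * dx) > f(x) - alpha * t * lam2:
            t *= beta
        x = x + t * dx

# toy problem: minimize (1/2)(x1^2 + 2 x2^2) subject to x1 + x2 = 1
f = lambda x: 0.5 * (x[0]**2 + 2 * x[1]**2)
grad = lambda x: np.array([x[0], 2 * x[1]])
hess = lambda x: np.diag([1.0, 2.0])
x = newton_eq(f, grad, hess, np.array([[1.0, 1.0]]), np.array([0.5, 0.5]))
```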
Assumptions for convergence analysis of feasible Newton's method for equality constrained minimization
- feasibility of initial point - $\xseqk{0}\in\dom \fobj \;\&\; A\xseqk{0}=b$
- sublevel set $S = \set{x\in \dom \fobj}{\fobj(x) \leq \fobj(\xseqk{0}),\; Ax=b}$ is closed
- boundedness of Hessian on $S$ $$ \left( \exists M > 0 \right) \left( \forall x\in S \right) \left( \nabla^2 \fobj(x) \preceq M I \right) $$
- boundedness of inverse of KKT matrix on $S$ - corresponds to strong convexity assumption in unconstrained minimization $$ \left( \exists K >0 \right) \left( \forall x \in S \right) \left( \left\| \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0}^{-1} \right\|_2 \leq K \right) $$
- Lipschitz continuity of Hessian on $S$ $$ \left( \exists L > 0 \right) \left( \forall x,y\in S \right) \left( \left\|\nabla^2 \fobj(x) - \nabla^2 \fobj(y)\right\|_2 \leq L \|x-y\|_2 \right) $$
Convergence analysis of feasible Newton's method for equality constrained minimization
- convergence analysis of Newton's method for equality constrained minimization can be done by analyzing unconstrained minimization after eliminating equality constraints
- thus, yields exactly same results as for unconstrained minimization (with different parameter values), i.e.,
- consists of damped Newton phase and quadratic convergence phase
- # iterations required to achieve $\fobj(\xseqk{k})-p^\ast \leq \epsilon$ is $$ \left(\fobj(\xseqk{0})-p^\ast\right)/\gamma + \log_2 \log_2 (\epsilon_0/\epsilon) $$
- # iterations required to achieve $\fobj(\xseqk{k})-p^\ast \leq \epsilon$ for self-concordant functions is also same as for unconstrained minimization $$ \left(\fobj(\xseqk{0}) - p^\ast\right)/{\gamma} + \log_2 \log_2 (1 / \epsilon) $$ where $\gamma = \alpha \beta (1-2\alpha)^2 / (20-8\alpha)$
Newton step at infeasible points
- only assume that $x\in\dom \fobj$ (hence, can be infeasible)
- (as before) linearize KKT conditions $$ \begin{eqnarray*} && A(x+\ntsdir) = b \quad \& \quad \nabla \fobj(x) + \nabla^2 \fobj(x) \ntsdir + A^Tw = 0 \\ &\Leftrightarrow& A\ntsdir = b - Ax \quad \& \quad \nabla^2 \fobj(x) \ntsdir + A^Tw = - \nabla \fobj(x) \\ &\Leftrightarrow& \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0} \colvectwo{\ntsdir}{w} = - \colvectwo{\nabla \fobj(x)}{Ax-b} \end{eqnarray*} $$
- same as feasible Newton step except second component on RHS of KKT system
Interpretation as primal-dual Newton step
- update both primal and dual variables $x$ and $\nu$
- define $r:\reals^n\times\reals^p\to\reals^n\times\reals^p$ by $$ r(x,\nu) = (r_\mathrm{dual}(x,\nu),r_\mathrm{pri}(x,\nu)) $$ where $$ \begin{eqnarray*} \mbox{\define{dual residual}} & - & r_\mathrm{dual}(x,\nu) = \nabla \fobj(x) + A^T\nu \\ \mbox{\define{primal residual}} & - & r_\mathrm{pri}(x,\nu) = Ax-b \end{eqnarray*} $$
Equivalence of infeasible Newton step to primal-dual Newton step
- linearize $r$ to obtain primal-dual Newton step, i.e. $$ \begin{eqnarray*} && r(x,\nu) + D_{x,\nu} r(x,\nu) \colvectwo{\pdsdir}{\pdsdirnu} = 0 \\ &\Leftrightarrow& \mattwotwo{\nabla^2f(x)}{A^T}{A}{0} \colvectwo{\pdsdir}{\pdsdirnu} = - \colvectwo{\nabla f(x) + A^T\nu}{Ax-b} \end{eqnarray*} $$
- letting $\nu^+= \nu + \pdsdirnu$ gives $$ \mattwotwo{\nabla^2f(x)}{A^T}{A}{0} \colvectwo{\pdsdir}{\nu^+} = - \colvectwo{\nabla f(x)}{Ax-b} $$
- equivalent to infeasible Newton step
- reveals that current value of dual variable not needed
Residual norm reduction property
- infeasible Newton step is not descent direction (unlike feasible Newton step) since $$ \begin{eqnarray*} \left. \left( \frac{d}{dt}\fobj(x+t\pdsdir) \right) \right|_{t=0} &=& \nabla \fobj(x) ^T \pdsdir \\ &=& - \pdsdir^T \left(\nabla^2 \fobj(x) \pdsdir + A^Tw \right) = - \pdsdir^T \nabla^2 \fobj(x) \pdsdir + (Ax-b)^Tw \end{eqnarray*} $$ which is not necessarily negative
- however, norm of residual decreases in infeasible Newton direction $$ \begin{eqnarray*} \left. \left( \frac{d}{dt} \|r(y+t\pdsdiry)\|_2^2 \right) \right|_{t=0} & = & - 2 r(y)^T r(y) = - 2 \|r(y)\|_2^2 \\ \Leftrightarrow \quad \left. \left( \frac{d}{dt} \|r(y+t\pdsdiry)\|_2 \right) \right|_{t=0} & = & \frac{-2\|r(y)\|_2^2}{2\|r(y)\|_2} = - \|r(y)\|_2 \end{eqnarray*} $$ where $y=(x,\nu)$ and $\pdsdiry = (\pdsdir, \pdsdirnu)$
- can use $r(\xseqk{k},\nuseqk{k})$ to measure optimization progress for infeasible Newton's method
Full and damped step feasibility property
- assume step length is $t$ at some iteration, then $$ r_\mathrm{pri}(x^+,\nu^+) = Ax^+-b = A(x + t \pdsdir) - b = (1-t) r_\mathrm{pri}(x,\nu) $$
- hence for $l>k$ $$ \seqk{r}{l} = \left( \prod_{i=k}^{l-1} (1-\seqk{t}{i}) \right) \seqk{r}{k} $$
- primal residual reduced by $1-\seqk{t}{k}$ at step $k$
- iterates become (and remain) feasible once full step length ($t=1$) is taken
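- the $(1-t)$ scaling of the primal residual can be checked numerically (the data below are made up for illustration)

```python
import numpy as np

A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x = np.array([0.0, 0.0])            # infeasible point: A x - b = -1
dx = np.array([2/3, 1/3])           # any step with A dx = b - A x, as the
                                    # infeasible Newton step satisfies
t = 0.25                            # damped step length
r_before = A @ x - b
r_after = A @ (x + t * dx) - b      # equals (1 - t) * r_before
```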
Infeasible Newton's method for equality constrained minimization
- Require: $\fobj$, initial point $x\in \dom \fobj$ \& $\nu$, tolerance $\epsilon_\mathrm{pri}>0$ \& $\epsilon_\mathrm{dual}>0$
- repeat
- compute Newton step $\pdsdir(x)$ \& $\pdsdirnu(x)$
- do line search on $r(x,\nu)$ to choose $\slen>0$
- update - $x := x + \slen \pdsdir$ \& $\nu := \nu + \slen \pdsdirnu$
- until $\|r_\mathrm{dual}(x,\nu)\| \leq \epsilon_\mathrm{dual}$ \& $\|Ax-b\| \leq \epsilon_\mathrm{pri}$
- note similarity to and difference from feasible Newton's method
- line search done not on $\fobj$, but on primal-dual residuals $r(x,\nu)$
- stopping criteria depend on $r(x,\nu)$, not on Newton decrement $\lambda(x)^2$
- primal and dual feasibility checked separately - here norm in $\|Ax-b\|$ can be any norm, e.g., $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$, depending on specific application
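- a minimal numpy sketch of the algorithm above (all names and the toy quadratic are made up; the line search is on $\|r\|_2$ as described)

```python
import numpy as np

def infeasible_newton(grad, hess, A, b, x0, nu0, eps=1e-10,
                      alpha=0.25, beta=0.5, max_iter=50):
    """Infeasible-start Newton's method: x0 need only lie in dom f."""
    x, nu = x0.copy(), nu0.copy()
    n, p = A.shape[1], A.shape[0]
    res = lambda x, nu: np.concatenate([grad(x) + A.T @ nu, A @ x - b])
    for _ in range(max_iter):
        r = res(x, nu)
        if np.linalg.norm(r) <= eps:
            break
        K = np.block([[hess(x), A.T], [A, np.zeros((p, p))]])
        d = np.linalg.solve(K, -r)
        dx, dnu = d[:n], d[n:]
        s = 1.0                      # backtracking on the residual norm
        while np.linalg.norm(res(x + s * dx, nu + s * dnu)) > \
                (1 - alpha * s) * np.linalg.norm(r):
            s *= beta
        x, nu = x + s * dx, nu + s * dnu
    return x, nu

# toy problem: minimize (1/2)(x1^2 + 2 x2^2) subject to x1 + x2 = 1,
# started at the infeasible point x = 0
grad = lambda x: np.array([x[0], 2 * x[1]])
hess = lambda x: np.diag([1.0, 2.0])
A, b = np.array([[1.0, 1.0]]), np.array([1.0])
x, nu = infeasible_newton(grad, hess, A, b, np.zeros(2), np.zeros(1))
```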
Line search methods for infeasible Newton's method
- line search method for infeasible Newton's method is same backtracking scheme with $\fobj$ replaced by $\|r(x,\nu)\|_2$, which takes following special form
- Require: \pdsdir, \pdsdirnu, $\alpha\in(0,0.5)$, $\beta\in(0,1)$
- $\slen:=1$
- while $\|r(x +\slen\pdsdir, \nu + \slen\pdsdirnu)\|_2 > (1-\alpha \slen)\|r(x,\nu)\|_2$ do
- $\slen := \beta \slen$
- end while
Pros and cons of infeasible Newton's method
- pros
- do not need to find feasible point separately
- if step length is one at any iteration, following steps coincide with feasible Newton's method - could switch to feasible Newton's method
- cons
- exists no clear way to detect infeasibility - primal residual decreases slowly (phase I method in interior-point methods resolves this problem)
- convergence of infeasible Newton's method can be very slow (until feasibility is achieved)
Assumptions for convergence analysis of infeasible Newton's method for equality constrained minimization
- sublevel set $S = \bigset{(x,\nu)\in \dom \fobj\times \reals^p}{ \|r(x,\nu)\|_2 \leq \|r(\xseqk{0},\nuseqk{0})\|_2 }$ is closed, which holds whenever $\|r\|_2$ is closed function
- boundedness of KKT matrix on $S$ $$ \left( \exists K >0 \right) \left( \forall x \in S \right) \left( \left\| Dr(x,\nu)^{-1} \right\|_2 = \left\| \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0}^{-1} \right\|_2 \leq K \right) $$
- Lipschitz continuity of $Dr$ on $S$ $$ \left( \exists L > 0 \right) \left( \forall (x,\nu), (y,\mu)\in S \right) \left( \left\|Dr(x,\nu) - Dr(y,\mu)\right\|_2 \leq L \|(x,\nu) - (y,\mu)\|_2 \right) $$
- above assumptions imply $\set{x\in\dom \fobj}{Ax=b}\neq\emptyset$ and optimal point $(x^\ast,\nu^\ast)$ exists
Convergence analysis of infeasible Newton's method for equality constrained minimization
- very similar to that for Newton's method for unconstrained minimization
- consists of two phases - like Newton's method for unconstrained minimization or feasible Newton's method for equality constrained minimization
- damped Newton phase - if $\|r(\xseqk{k},\nuseqk{k})\|_2> 1/(K^2L)$ $$ \|r(\xseqk{k+1},\nuseqk{k+1})\|_2 \leq \|r(\xseqk{k},\nuseqk{k})\|_2 - \alpha \beta / K^2L $$
- quadratic convergence phase - if $\|r(\xseqk{k},\nuseqk{k})\|_2 \leq 1/(K^2L)$ $$ \left( K^2L \|r(\xseqk{k},\nuseqk{k})\|_2 / 2 \right) \leq \left( K^2L \|r(\xseqk{k-1},\nuseqk{k-1})\|_2 / 2 \right)^2 \leq \cdots \leq (1/2)^{2^k} $$
- # iterations of infeasible Newton's method required to satisfy $\|r(\xseqk{k},\nuseqk{k})\|_2\leq\epsilon$ $$ \|r(\xseqk{0},\nuseqk{0})\| /(\alpha \beta / K^2L) + \log_2 \log_2 (\epsilon_0/\epsilon) \quad \mbox{where}\; \epsilon_0 = 2/(K^2L) $$
- $(\xseqk{k},\nuseqk{k})$ converges to $(x^\ast,\nu^\ast)$
Barrier Interior-point Methods
Interior-point methods
- want to solve inequality constrained minimization problem
- interior-point methods solve convex optimization problem or its KKT optimality conditions by applying Newton's method to sequence of
- equality constrained problems or
- modified versions of KKT optimality conditions
- discuss interior-point barrier method & interior-point primal-dual method
- hierarchy of convex optimization algorithms
- simplest - linear equality constrained quadratic program - can solve analytically
- Newton's method - solve linear equality constrained convex optimization problem by solving sequence of linear equality constrained quadratic programs
- interior-point methods - solve linear equality \& convex inequality constrained problem by solving sequence of linear equality constrained convex optimization problems
Indicator function barriers
- approximate general convex inequality constrained problem as linear equality constrained problem
- make inequality constraints implicit in objective function $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) + \sum I_-(\fie(x)) \\ \mbox{subject to} & Ax=b \end{array} $$ where $I_-:\reals\to \reals$ is indicator function for nonpositive real numbers, i.e. $$ I_{-}(u) = \left\{\begin{array}{ll} 0 & u\leq 0 \\ \infty & u> 0 \end{array}\right. $$
Logarithmic barriers
- approximate indicator function by logarithmic function $$ \hat{I}_-(u) = -(1/t) \log(-u) \quad \dom \hat{I}_- = -\ppreals $$ for $t>0$ to obtain $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) + \sum -(1/t) \log(-\fie(x)) \\ \mbox{subject to} & Ax=b \end{array} $$
- objective function is convex due to composition rule for convexity preservation, and differentiable
- hence, can use Newton's method to solve it
- function $\phi$ defined by $$ \phi(x) = - \sum \log(-\fie(x)) $$ with $\dom \phi = \set{x\in\xdomain}{\fie(x) \prec 0}$ is called logarithmic barrier or log barrier
- solve sequence of log barrier problems as we increase $t$
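- for affine constraints $c_i^Tx \le d_i$ (i.e., $\fie_i(x) = c_i^Tx - d_i$), the barrier, its gradient, and its Hessian have simple closed forms; a numpy sketch (the function name and box example are made up)

```python
import numpy as np

def log_barrier(C, d, x):
    """phi(x) = -sum_i log(d_i - c_i^T x) for constraints Cx <= d, with
    gradient sum_i c_i/(d_i - c_i^T x)
    and Hessian sum_i c_i c_i^T/(d_i - c_i^T x)^2."""
    slack = d - C @ x
    if np.any(slack <= 0):
        return np.inf, None, None        # x outside dom phi
    phi = -np.sum(np.log(slack))
    g = C.T @ (1.0 / slack)
    H = C.T @ ((1.0 / slack**2)[:, None] * C)
    return phi, g, H

# box 0 <= x <= 1 in one dimension, evaluated at its analytic center x = 1/2,
# where the barrier gradient vanishes
C = np.array([[1.0], [-1.0]]); d = np.array([1.0, 0.0])
phi, g, H = log_barrier(C, d, np.array([0.5]))
```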
Central path
- optimization problem $$ \begin{array}{ll} \mbox{minimize} & t \fobj(x) + \phi(x) \\ \mbox{subject to} & Ax = b \end{array} $$ with $t>0$ where $$ \phi(x) = - \sum \log(-\fie(x)) $$
- solution of above problem, called central point, denoted by $x^\ast(t)$, set of central points, called central path
- intuition says $x^\ast(t)$ will converge to $x^\ast$ as $t\to\infty$
- KKT conditions imply $$ Ax^\ast(t) = b \quad \fie(x^\ast(t)) \prec 0 $$ and exists $\nu^\ast(t)$ such that $$ \begin{eqnarray*} 0 &=& t \nabla \fobj(x^\ast(t)) + \nabla \phi(x^\ast(t)) + t A^T \nu^\ast(t) \\ &=& t\nabla \fobj(x^\ast(t)) - \sum \frac{1}{\fie_i(x^\ast(t))} \nabla\fie_i(x^\ast(t)) + t A^T \nu^\ast(t) \end{eqnarray*} $$
- thus if we let $\lambda_i^\ast(t) = -1/(t\fie_i(x^\ast(t)))$, $x^\ast(t)$ minimizes $$ L(x,\lambda^\ast(t),\nu^\ast(t)) = \fobj(x) + {\lambda^\ast(t)}^T \fie(x) + {\nu^\ast(t)}^T (Ax-b) $$ where $L$ is Lagrangian of original problem
- hence, dual function $g(\lambda^\ast(t),\nu^\ast(t))$ is finite and $$ \begin{eqnarray*} g(\lambda^\ast(t), \nu^\ast(t)) &=& \inf_{x\in\xdomain} L(x,\lambda^\ast(t),\nu^\ast(t)) = L(x^\ast(t),\lambda^\ast(t),\nu^\ast(t)) \\ & = & \fobj(x^\ast(t)) + {\lambda^\ast(t)}^T \fie(x^\ast(t)) + {\nu^\ast(t)}^T (Ax^\ast(t)-b) = \fobj(x^\ast(t)) - m/t \end{eqnarray*} $$ and $$ \fobj(x^\ast(t)) - p^\ast \leq \fobj(x^\ast(t)) - g(\lambda^\ast(t), \nu^\ast(t)) = m/t $$
- that is, $x^\ast(t)$ is no more than $m/t$-suboptimal
- which confirms our intuition that $x^\ast(t)\to x^\ast$ as $t\to\infty$
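- the $m/t$ bound can be checked on a tiny example: for minimize $x$ subject to $0\le x\le 1$ (so $m=2$, $p^\ast=0$), the central point has a closed form (this worked example is made up for illustration)

```python
import numpy as np

# x*(t) minimizes t*x - log(x) - log(1 - x); setting the derivative
# t - 1/x + 1/(1 - x) to zero gives t x^2 - (t + 2) x + 1 = 0,
# whose root in (0, 1) is:
def central_point(t):
    return ((t + 2) - np.sqrt((t + 2)**2 - 4 * t)) / (2 * t)

for t in [1.0, 10.0, 100.0]:
    x = central_point(t)
    assert 0 < x < 1              # strictly feasible
    assert x - 0.0 <= 2.0 / t     # f0(x*(t)) - p* <= m/t
```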
Central path interpretation via KKT conditions
- previous arguments imply that $x$ is central point, i.e., $x=x^\ast(t)$ for some $t>0$ if and only if exist $\lambda$ and $\nu$ such that $$ \begin{eqnarray*} Ax=b \quad \fie({x}) &\preceq& 0 \quad \mbox{- primal feasibility} \\ \lambda &\succeq& 0 \quad \mbox{- dual feasibility} \\ - \lambda_i \fie_i({x}) &=& 1/t \quad \mbox{- complementary $1/t$-slackness} \\ \nabla_x L(x,\lambda,\nu) &=& 0 \quad \mbox{- vanishing gradient of Lagrangian} \end{eqnarray*} $$ called centrality conditions
- only difference between centrality conditions and KKT conditions is complementary $1/t$-slackness
- note that I've just made up term “complementary $1/t$-slackness'' - you won't find this terminology in any literature
- for large $t$, $\lambda^\ast(t)$ & $\nu^\ast(t)$ very closely satisfy (true) complementary slackness
Central path interpretation via force field
- assume exist no equality constraints
- interpret $\phi$ as potential energy by some force field, e.g., electrical field and $t\fobj$ as potential energy by some other force field, e.g., gravity
- then
- force by first force field (in $n$-dimensional space), which we call barrier force, is $$ - \nabla \phi(x) = \sum \frac{1}{\fie_i(x)} \nabla \fie_i(x) $$
- force by second force field, which we call objective force, is $$ - \nabla (t\fobj(x)) = -t \nabla \fobj(x) $$
- $x^\ast(t)$ is point where two forces exactly balance each other
- as $x$ approaches boundary, barrier force pushes $x$ harder away from boundary
- as $t$ increases, objective force pushes $x$ harder toward point where objective potential energy is minimized
Equality constrained problem using log barrier
- central point $x^\ast(t)$ is $m/t$-suboptimal point guaranteed by optimality certificate $g(\lambda^\ast(t),\nu^\ast(t))$
- hence solving below problem provides solution with $\epsilon$-suboptimality $$ \begin{array}{ll} \mbox{minimize} & (m/\epsilon) \fobj(x) + \phi(x) \\ \mbox{subject to} & Ax=b \end{array} $$
- but works only for small problems since for large $m/\epsilon$, objective function is ill-behaved
Barrier methods
- Require: strictly feasible $x$, $t>0$, $\mu>1$, tolerance $\epsilon>0$
- repeat
- centering step - find $x^\ast(t)$ by minimizing $t\fobj + \phi$ subject to $Ax=b$ starting at $x$
- update $x$ - $x := x^\ast(t)$
- (optionally) compute $\lambda^\ast(t)$ \& $\nu^\ast(t)$
- stopping criterion - quit if $m/t<\epsilon$
- increase $t$ - $t := \mu t$
- barrier method, also called path-following method, solves sequence of equality constrained optimization problems with log barrier
- when first proposed by Fiacco and McCormick in 1960s, it was called sequential unconstrained minimization technique (SUMT)
- centering step also called outer iteration
- each Newton iteration within centering step called inner iteration
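- a minimal numpy sketch of the barrier method for the inequality-only case $Cx \preceq d$ with linear objective (no equality constraints, plain Newton centering; all names and the toy LP are made up)

```python
import numpy as np

def centering(c, C, d, x, t, eps=1e-9, alpha=0.25, beta=0.5):
    """Newton's method on psi(x) = t*c^T x + phi(x), phi the log barrier of Cx <= d."""
    psi = lambda x: t * (c @ x) - np.sum(np.log(d - C @ x))
    while True:
        slack = d - C @ x
        g = t * c + C.T @ (1.0 / slack)
        H = C.T @ ((1.0 / slack**2)[:, None] * C)
        dx = np.linalg.solve(H, -g)
        lam2 = -g @ dx                       # Newton decrement squared
        if lam2 / 2 < eps:
            return x
        s = 1.0                              # keep iterates strictly feasible
        while np.any(d - C @ (x + s * dx) <= 0):
            s *= beta
        while psi(x + s * dx) > psi(x) - alpha * s * lam2:
            s *= beta
        x = x + s * dx

def barrier_method(c, C, d, x0, t0=1.0, mu=10.0, eps=1e-6):
    x, t, m = x0.copy(), t0, C.shape[0]
    while m / t >= eps:                      # duality gap bound is m/t
        x = centering(c, C, d, x, t)         # outer (centering) iteration
        t *= mu                              # increase t
    return x

# toy LP: minimize x subject to 0 <= x <= 1  (optimum x* = 0)
c = np.array([1.0])
C = np.array([[1.0], [-1.0]]); d = np.array([1.0, 0.0])
x = barrier_method(c, C, d, np.array([0.5]))
```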
Accuracy in centering in barrier method
- accuracy of centering
- only goal of centering is getting close to $x^\ast$, hence exact calculation of $x^\ast(t)$ not critical as long as approximations of $x^\ast(t)$ go to $x^\ast$
- while cannot calculate $g(\lambda,\nu)$ in this case, below provides dual feasible point when Newton step $\ntsdir$ for centering problem is small, i.e., for nearly centered $x$ $$ \tilde{\lambda}_i = -\frac{1}{t\fie_i(x)} \left( 1 - \frac{\nabla \fie_i(x)^T \ntsdir}{\fie_i(x)} \right) $$
Choices of parameters of barrier method
- choice of $\mu$
- $\mu$ determines aggressiveness of $t$-update
- larger $\mu$, fewer outer iterations, but more inner iterations
- smaller $\mu$, more outer iterations, but fewer inner iterations
- values from $10$ to $20$ for $\mu$ seem to work well
- candidates for choice of initial $t$
- choose $\seqk{t}{0}$ such that
- $$ m / \seqk{t}{0} \approx \fobj(\xseqk{0}) - p^\ast $$
- make central path condition maximally satisfied $$ \seqk{t}{0} = \arginf_{t} \inf_{\tilde{\nu}} \left\| t \nabla \fobj(\xseqk{0}) + \nabla \phi(\xseqk{0}) + A^T \tilde{\nu} \right\| $$
Convergence analysis of barrier method
- assuming $t\fobj + \phi$ can be minimized by Newton's method for $t = \seqk{t}{0}, \mu\seqk{t}{0}, \mu^2\seqk{t}{0}, \ldots$
- at $k$'th step, duality gap achieved is $m/\mu^k\seqk{t}{0}$
- # centering steps required to achieve accuracy of $\epsilon$ is $$ \left\lceil \frac{\log \left(m/(\epsilon \seqk{t}{0})\right)}{\log \mu} \right\rceil $$ plus one (initial centering step)
- for convergence of centering
- for feasible centering problem, $t\fobj + \phi$ should satisfy conditions for convergence of feasible Newton's method, i.e., initial sublevel set is closed, associated inverse KKT matrix is bounded \& Hessian satisfies Lipschitz condition
- for infeasible centering problem, $t\fobj + \phi$ should satisfy corresponding conditions for convergence of infeasible Newton's method
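- the centering-step count above is easy to evaluate; for instance (the numbers below are made up), $m=100$, $\seqk{t}{0}=1$, $\mu=20$, $\epsilon=10^{-6}$ gives $\lceil\log(10^8)/\log 20\rceil = 7$ centering steps plus the initial one

```python
import math

def num_centering_steps(m, t0, mu, eps):
    """ceil(log(m/(eps*t0)) / log(mu)): centering steps after the initial
    one needed so the duality gap bound m/t drops below eps."""
    return math.ceil(math.log(m / (eps * t0)) / math.log(mu))

n = num_centering_steps(m=100, t0=1.0, mu=20.0, eps=1e-6)
```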
Primal-dual Interior-point Methods
Primal-dual \& barrier interior-point methods
- in primal-dual interior-point methods
- both primal and dual variables are updated at each iteration
- search directions are obtained from Newton's method, applied to modified KKT equations, i.e., optimality conditions for logarithmic barrier centering problem
- primal-dual search directions are similar to, but not quite the same as, search directions arising in barrier methods
- primal and dual iterates are not necessarily feasible
- primal-dual interior-point methods
- often more efficient than barrier methods especially when high accuracy is required - can exhibit better than linear convergence
- (customized versions) outperform barrier method for several basic problems, such as LP, QP, SOCP, GP, SDP
- can work for feasible, but not strictly feasible problems
- still active research topic, but show great promise
Modified KKT conditions and central points
- modified KKT conditions (for convex optimization problem) expressed as $r_t(x,\lambda,\nu)=0$ where $$ r_t(x,\lambda,\nu) = \colvecthree {\nabla \fobj(x) + D\fie(x)^T\lambda + A^T\nu} {-\diag(\lambda)\fie(x) - (1/t) \ones} {Ax-b} $$ and $$ \begin{eqnarray*} \mbox{\define{dual residual}} &-& r_\mathrm{dual}(x,\lambda,\nu) = {\nabla \fobj(x) + D\fie(x)^T\lambda + A^T\nu} \\ \mbox{\define{centrality residual}} &-& r_\mathrm{cent}(x,\lambda,\nu) = {-\diag(\lambda)\fie(x) - (1/t) \ones} \\ \mbox{\define{primal residual}} &-& r_\mathrm{pri}(x,\lambda,\nu) = {Ax-b} \end{eqnarray*} $$
- if $x$, $\lambda$, $\nu$ satisfy $r_t(x,\lambda,\nu)=0$ (and $\fie(x) \prec 0$), then
- $x=x^\ast(t)$, $\lambda=\lambda^\ast(t)$, $\nu=\nu^\ast(t)$
- $x$ is primal feasible and $\lambda$ & $\nu$ are dual feasible with duality gap $m/t$
Primal-dual search direction
- assume current (primal-dual) point $y=(x,\lambda,\nu)$ and Newton step $\sdiry =( \sdir, \sdirlbd, \sdirnu)$
- as before, linearize equation to obtain Newton step, i.e., $$ r_t(y+\sdiry) \approx r_t(y) + Dr_t(y) \sdiry = 0 \quad \Leftrightarrow \quad \sdiry = -Dr_t(y)^{-1} r_t(y) $$ hence $$ \begin{my-matrix}{ccc} \nabla^2 f(x) + \sum \lambda_i \nabla^2 \fie_i(x) & D\fie(x)^T & A^T \\ -\diag(\lambda) D\fie(x) & -\diag(\fie(x)) & 0 \\ A & 0 & 0 \end{my-matrix} \colvecthree{\sdir}{\sdirlbd}{\sdirnu} = - \colvecthree {r_\mathrm{dual}} {r_\mathrm{cent}} {r_\mathrm{pri}} $$
- above equation determines primal-dual search direction $\pdsdiry = (\pdsdir, \pdsdirlbd, \pdsdirnu)$
Surrogate duality gap
- iterates $\xseqk{k}$, $\lbdseqk{k}$, and $\nuseqk{k}$ of primal-dual interior-point method are not necessarily feasible
- hence, cannot easily evaluate duality gap $\seqk{\eta}{k}$ as for barrier method
- define surrogate duality gap for $\fie(x) \prec 0$ and $\lambda\succeq0$ as $$ \hat{\eta}(x,\lambda) = - \fie(x)^T \lambda $$
- $\hat{\eta}$ would be duality gap if $x$ were primal feasible and $\lambda$ & $\nu$ were dual feasible
- value $t$ corresponding to surrogate duality gap $\hat{\eta}$ is $m/\hat{\eta}$
Primal-dual interior-point method
- Require: initial point $x$ with $\fie(x)\prec0$, $\lambda \succ 0$, $\mu > 1$, $\epsilon_\mathrm{pri}>0$, $\epsilon_\mathrm{dual}>0$, $\epsilon>0$
- repeat
- set $t := \mu m /\hat{\eta}$
- compute primal-dual search direction $\pdsdiry = (\pdsdir, \pdsdirlbd, \pdsdirnu)$
- do line search to choose $s>0$
- update - $x := x + s \pdsdir$, $\lambda := \lambda + s \pdsdirlbd$, $\nu := \nu + s \pdsdirnu$
- until $\|r_\mathrm{pri}(x,\lambda,\nu)\|_2\leq \epsilon_\mathrm{pri}$, $\|r_\mathrm{dual}(x,\lambda,\nu)\|_2\leq \epsilon_\mathrm{dual}$, $\hat{\eta} \leq \epsilon$
- common to choose small $\epsilon_\mathrm{pri}$, $\epsilon_\mathrm{dual}$, & $\epsilon$ since primal-dual method often shows faster than linear convergence
Line search for primal-dual interior-point method
- line search is standard backtracking line search on $\|r(x,\lambda,\nu)\|_2$, similar to that for infeasible Newton's method, except making sure that $\fie(x) \prec 0$ and $\lambda\succ0$
- note initial $s$ below is $0.99$ times largest $s$ that keeps $\lambda + s\pdsdirlbd$ nonnegative
- Require: \pdsdir, \pdsdirlbd, \pdsdirnu, $\alpha\in(0.01,0.1)$, $\beta\in(0.3,0.8)$
- $s:= 0.99\sup\set{s\in[0,1]}{\lambda + s \pdsdirlbd \succeq 0} = 0.99\min\{1,\min\set{-\lambda_i/{\pdsdirlbd}_i}{{\pdsdirlbd}_i < 0}\}$
- while $\fie (x +s\pdsdir) \not \prec 0$ do
- $s := \beta s$
- end while
- while $\|r(x +s\pdsdir, \lambda + s\pdsdirlbd, \nu + s\pdsdirnu)\|_2 > (1-\alpha s)\|r(x,\lambda,\nu)\|_2$ do
- $s := \beta s$
- end while
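- the initialization of $s$ in the first line of the line search can be sketched as (the helper name is made up)

```python
import numpy as np

def initial_step(lam, dlam):
    """0.99 * sup{ s in [0, 1] : lam + s*dlam >= 0 }: the largest step,
    pulled back by factor 0.99, that keeps the dual variable positive."""
    neg = dlam < 0
    if not np.any(neg):
        return 0.99                 # no component of lam decreases
    return 0.99 * min(1.0, np.min(-lam[neg] / dlam[neg]))

s = initial_step(np.array([1.0, 2.0]), np.array([-2.0, 1.0]))   # 0.99 * 0.5
```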