Convex Optimization
posted: 01-Aug-2025 & updated: 03-Aug-2025
\[% \newcommand{\algA}{\algk{A}} \newcommand{\algC}{\algk{C}} \newcommand{\bigtimes}{\times} \newcommand{\compl}[1]{\tilde{#1}} \newcommand{\complexes}{\mathbb{C}} \newcommand{\dom}{\mathop{\bf dom {}}} \newcommand{\ereals}{\reals\cup\{-\infty,\infty\}} \newcommand{\field}{\mathbb{F}} \newcommand{\integers}{\mathbb{Z}} \newcommand{\lbdseqk}[1]{\seqk{\lambda}{#1}} \newcommand{\meas}[3]{({#1}, {#2}, {#3})} \newcommand{\measu}[2]{({#1}, {#2})} \newcommand{\meast}[3]{\left({#1}, {#2}, {#3}\right)} \newcommand{\naturals}{\mathbb{N}} \newcommand{\nuseqk}[1]{\seqk{\nu}{#1}} \newcommand{\pair}[2]{\langle {#1}, {#2}\rangle} \newcommand{\rationals}{\mathbb{Q}} \newcommand{\reals}{\mathbb{R}} \newcommand{\seq}[1]{\left\langle{#1}\right\rangle} \newcommand{\powerset}{\mathcal{P}} \newcommand{\pprealk}[1]{\reals_{++}^{#1}} \newcommand{\ppreals}{\mathbb{R}_{++}} \newcommand{\prealk}[1]{\reals_{+}^{#1}} \newcommand{\preals}{\mathbb{R}_+} \newcommand{\tXJ}{\topos{X}{J}} % \newcommand{\relint}{\mathop{\bf relint {}}} \newcommand{\boundary}{\mathop{\bf bd {}}} \newcommand{\subsetset}[1]{\mathcal{#1}} \newcommand{\Tr}{\mathcal{\bf Tr}} \newcommand{\symset}[1]{\mathbf{S}^{#1}} \newcommand{\possemidefset}[1]{\mathbf{S}_+^{#1}} \newcommand{\posdefset}[1]{\mathbf{S}_{++}^{#1}} \newcommand{\ones}{\mathbf{1}} \newcommand{\Prob}{\mathop{\bf Prob {}}} \newcommand{\prob}[1]{\Prob\left\{#1\right\}} \newcommand{\Expect}{\mathop{\bf E {}}} \newcommand{\Var}{\mathop{\bf Var{}}} \newcommand{\Mod}[1]{\;(\text{mod}\;#1)} \newcommand{\ball}[2]{B(#1,#2)} \newcommand{\generates}[1]{\langle {#1} \rangle} \newcommand{\isomorph}{\approx} \newcommand{\isomorph}{\approx} \newcommand{\nullspace}{\mathcalfont{N}} \newcommand{\range}{\mathcalfont{R}} \newcommand{\diag}{\mathop{\bf diag {}}} \newcommand{\rank}{\mathop{\bf rank {}}} \newcommand{\Ker}{\mathop{\mathrm{Ker} {}}} \newcommand{\Map}{\mathop{\mathrm{Map} {}}} \newcommand{\End}{\mathop{\mathrm{End} {}}} \newcommand{\Img}{\mathop{\mathrm{Im} {}}} 
\newcommand{\Aut}{\mathop{\mathrm{Aut} {}}} \newcommand{\Gal}{\mathop{\mathrm{Gal} {}}} \newcommand{\Irr}{\mathop{\mathrm{Irr} {}}} \newcommand{\arginf}{\mathop{\mathrm{arginf}}} \newcommand{\argsup}{\mathop{\mathrm{argsup}}} \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\ev}{\mathop{\mathrm{ev} {}}} \newcommand{\affinehull}{\mathop{\bf aff {}}} \newcommand{\cvxhull}{\mathop{\bf Conv {}}} \newcommand{\epi}{\mathop{\bf epi {}}} \newcommand{\injhomeo}{\hookrightarrow} \newcommand{\perm}[1]{\text{Perm}(#1)} \newcommand{\aut}[1]{\text{Aut}(#1)} \newcommand{\ideal}[1]{\mathfrak{#1}} \newcommand{\bigset}[2]{\left\{#1\left|{#2}\right.\right\}} \newcommand{\bigsetl}[2]{\left\{\left.{#1}\right|{#2}\right\}} \newcommand{\primefield}[1]{\field_{#1}} \newcommand{\dimext}[2]{[#1:{#2}]} \newcommand{\restrict}[2]{#1|{#2}} \newcommand{\algclosure}[1]{#1^\mathrm{a}} \newcommand{\finitefield}[2]{\field_{#1^{#2}}} \newcommand{\frobmap}[2]{\varphi_{#1,{#2}}} % %\newcommand{\algfontmode}{} % %\ifdefined\algfontmode %\newcommand\mathalgfont[1]{\mathcal{#1}} %\newcommand\mathcalfont[1]{\mathscr{#1}} %\else \newcommand\mathalgfont[1]{\mathscr{#1}} \newcommand\mathcalfont[1]{\mathcal{#1}} %\fi % %\def\DeltaSirDir{yes} %\newcommand\sdirletter[2]{\ifthenelse{\equal{\DeltaSirDir}{yes}}{\ensuremath{\Delta #1}}{\ensuremath{#2}}} \newcommand{\sdirletter}[2]{\Delta #1} \newcommand{\sdirlbd}{\sdirletter{\lambda}{\Delta \lambda}} \newcommand{\sdir}{\sdirletter{x}{v}} \newcommand{\seqk}[2]{#1^{(#2)}} \newcommand{\seqscr}[3]{\seq{#1}_{#2}^{#3}} \newcommand{\xseqk}[1]{\seqk{x}{#1}} \newcommand{\sdirk}[1]{\seqk{\sdir}{#1}} \newcommand{\sdiry}{\sdirletter{y}{\Delta y}} \newcommand{\slen}{t} \newcommand{\slenk}[1]{\seqk{\slen}{#1}} \newcommand{\ntsdir}{\sdir_\mathrm{nt}} \newcommand{\pdsdir}{\sdir_\mathrm{pd}} \newcommand{\sdirnu}{\sdirletter{\nu}{w}} \newcommand{\pdsdirnu}{\sdirnu_\mathrm{pd}} \newcommand{\pdsdiry}{\sdiry_\mathrm{pd}} \newcommand\pdsdirlbd{\sdirlbd_\mathrm{pd}} % 
\newcommand{\normal}{\mathcalfont{N}} % \newcommand{\algk}[1]{\mathalgfont{#1}} \newcommand{\collk}[1]{\mathcalfont{#1}} \newcommand{\classk}[1]{\collk{#1}} \newcommand{\indexedcol}[1]{\{#1\}} \newcommand{\rel}{\mathbf{R}} \newcommand{\relxy}[2]{#1\;\rel\;{#2}} \newcommand{\innerp}[2]{\langle{#1},{#2}\rangle} \newcommand{\innerpt}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\closure}[1]{\overline{#1}} \newcommand{\support}{\mathbf{support}} \newcommand{\set}[2]{\{#1|#2\}} \newcommand{\metrics}[2]{\langle {#1}, {#2}\rangle} \newcommand{\interior}[1]{#1^\circ} \newcommand{\topol}[1]{\mathfrak{#1}} \newcommand{\topos}[2]{\langle {#1}, \topol{#2}\rangle} % topological space % \newcommand{\alg}{\algk{A}} \newcommand{\algB}{\algk{B}} \newcommand{\algF}{\algk{F}} \newcommand{\algR}{\algk{R}} \newcommand{\algX}{\algk{X}} \newcommand{\algY}{\algk{Y}} % \newcommand\coll{\collk{C}} \newcommand\collB{\collk{B}} \newcommand\collF{\collk{F}} \newcommand\collG{\collk{G}} \newcommand{\tJ}{\topol{J}} \newcommand{\tS}{\topol{S}} \newcommand\openconv{\collk{U}} % \newenvironment{my-matrix}[1]{\begin{bmatrix}}{\end{bmatrix}} \newcommand{\colvectwo}[2]{\begin{my-matrix}{c}{#1}\\{#2}\end{my-matrix}} \newcommand{\colvecthree}[3]{\begin{my-matrix}{c}{#1}\\{#2}\\{#3}\end{my-matrix}} \newcommand{\rowvecthree}[3]{\begin{bmatrix}{#1}&{#2}&{#3}\end{bmatrix}} \newcommand{\mattwotwo}[4]{\begin{bmatrix}{#1}&{#2}\\{#3}&{#4}\end{bmatrix}} % \newcommand\optfdk[2]{#1^\mathrm{#2}} \newcommand\tildeoptfdk[2]{\tilde{#1}^\mathrm{#2}} \newcommand\fobj{\optfdk{f}{obj}} \newcommand\fie{\optfdk{f}{ie}} \newcommand\feq{\optfdk{f}{eq}} \newcommand\tildefobj{\tildeoptfdk{f}{obj}} \newcommand\tildefie{\tildeoptfdk{f}{ie}} \newcommand\tildefeq{\tildeoptfdk{f}{eq}} \newcommand\xdomain{\mathcalfont{X}} \newcommand\xobj{\optfdk{\xdomain}{obj}} \newcommand\xie{\optfdk{\xdomain}{ie}} \newcommand\xeq{\optfdk{\xdomain}{eq}} \newcommand\optdomain{\mathcalfont{D}} \newcommand\optfeasset{\mathcalfont{F}} % 
\newcommand{\bigpropercone}{\mathcalfont{K}} % \newcommand{\prescript}[3]{\;^{#1}{#3}} % %\]
Introduction
Preamble
Notations
-
sets of numbers
- $\naturals$ - set of natural numbers
- $\integers$ - set of integers
- $\integers_+$ - set of nonnegative integers
- $\rationals$ - set of rational numbers
- $\reals$ - set of real numbers
- $\preals$ - set of nonnegative real numbers
- $\ppreals$ - set of positive real numbers
- $\complexes$ - set of complex numbers
-
sequences $\seq{x_i}$ and the like
- finite $\seq{x_i}_{i=1}^n$, infinite $\seq{x_i}_{i=1}^\infty$ - use $\seq{x_i}$ whenever unambiguously understood
- similarly for other operations, e.g., $\sum x_i$, $\prod x_i$, $\cup A_i$, $\cap A_i$, $\bigtimes A_i$
- similarly for integrals, e.g., $\int f$ for $\int_{-\infty}^\infty f$
-
sets
- $\compl{A}$ - complement of $A$
- $A\sim B$ - $A\cap \compl{B}$
- $A\Delta B$ - $(A\cap \compl{B}) \cup (\compl{A} \cap B)$
- $\powerset(A)$ - set of all subsets of $A$
-
sets in metric vector spaces
- $\closure{A}$ - closure of set $A$
- $\interior{A}$ - interior of set $A$
- $\relint A$ - relative interior of set $A$
- $\boundary A$ - boundary of set $A$
-
set algebra
- $\sigma(\subsetset{A})$ - $\sigma$-algebra generated by $\subsetset{A}$, i.e., smallest $\sigma$-algebra containing $\subsetset{A}$
-
norms in $\reals^n$
- $\|x\|_p$ ($p\geq1$) - $p$-norm of $x\in\reals^n$, i.e., $(|x_1|^p + \cdots + |x_n|^p)^{1/p}$
- e.g., $\|x\|_2$ - Euclidean norm
-
matrices and vectors
- $a_{i}$ - $i$-th entry of vector $a$
- $A_{ij}$ - entry of matrix $A$ at position $(i,j)$, i.e., entry in $i$-th row and $j$-th column
- $\Tr(A)$ - trace of $A \in\reals^{n\times n}$, i.e., $A_{1,1}+ \cdots + A_{n,n}$
-
symmetric, positive definite, and positive semi-definite matrices
- $\symset{n}\subset \reals^{n\times n}$ - set of symmetric matrices
- $\possemidefset{n}\subset \symset{n}$ - set of positive semi-definite matrices; $A\succeq0 \Leftrightarrow A \in \possemidefset{n}$
- $\posdefset{n}\subset \symset{n}$ - set of positive definite matrices; $A\succ0 \Leftrightarrow A \in \posdefset{n}$
-
sometimes,
use Python script-like notations
(with serious abuse of mathematical notations)
-
use $f:\reals\to\reals$ as if it were $f:\reals^n \to \reals^n$,
e.g.,
$$
\exp(x) = (\exp(x_1), \ldots, \exp(x_n)) \quad \mbox{for } x\in\reals^n
$$
and
$$
\log(x) = (\log(x_1), \ldots, \log(x_n)) \quad \mbox{for } x\in\ppreals^n
$$
which corresponds to Python code `numpy.exp(x)` or `numpy.log(x)` where `x` is an instance of `numpy.ndarray`, i.e., `numpy` array
-
use $\sum x$ to mean $\ones^T x$ for $x\in\reals^n$,
i.e.
$$
\sum x = x_1 + \cdots + x_n
$$
which corresponds to Python code `x.sum()` where `x` is `numpy` array
-
use $x/y$ for $x,y\in\reals^n$ to mean
$$
\rowvecthree{x_1/y_1}{\cdots}{x_n/y_n}^T
$$
which corresponds to Python code `x / y` where `x` and `y` are $1$-d `numpy` arrays
-
use $X/Y$ for $X,Y\in\reals^{m\times n}$ to mean
$$
\begin{my-matrix}{cccc}
X_{1,1}/Y_{1,1} & X_{1,2}/Y_{1,2} & \cdots & X_{1,n}/Y_{1,n}
\\
X_{2,1}/Y_{2,1} & X_{2,2}/Y_{2,2} & \cdots & X_{2,n}/Y_{2,n}
\\
\vdots & \vdots & \ddots & \vdots
\\
X_{m,1}/Y_{m,1} & X_{m,2}/Y_{m,2} & \cdots & X_{m,n}/Y_{m,n}
\end{my-matrix}
$$
which corresponds to Python code `X / Y` where `X` and `Y` are $2$-d `numpy` arrays
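The conventions above can be illustrated with a short `numpy` snippet (the example values are arbitrary):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 4.0])

# elementwise application of scalar functions: exp(x) = (exp(x_1), ..., exp(x_n))
np.exp(x)
np.log(y)          # defined since y is in R_{++}^n

# sum x = 1^T x = x_1 + ... + x_n
x.sum()

# elementwise division of 1-d arrays: (x_1/y_1, ..., x_n/y_n)
x / y

# elementwise division of 2-d arrays: entry (i, j) is X_{ij} / Y_{ij}
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[2.0, 2.0], [2.0, 2.0]])
X / Y
```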
Some definitions
Some conventions
-
(for some subjects) use following conventions
- $0\cdot \infty = \infty \cdot 0 = 0$
- $(\forall x\in\ppreals)(x\cdot \infty = \infty \cdot x = \infty)$
- $\infty \cdot \infty = \infty$
Convex Optimization
Convex Sets
Lines and line segments
Affine sets
Relative interiors and boundaries
Convex sets
- convex hull (of course) is convex set
Cones
- convex cone (of course) is convex set
- examples of convex cones: $\prealk{n}$, $\pprealk{n}$, $\possemidefset{n}$, and $\posdefset{n}$
Hyperplanes and half spaces
- hyperplanes and half spaces are convex sets
Euclidean balls and ellipsoids
- Euclidean balls and ellipsoids are convex sets
Norm balls and norm cones
- norm balls and norm cones are convex sets
Polyhedra
Convexity preserving set operations
-
intersection preserves convexity
- for (any) collection of convex sets, $\coll$, $$ \bigcap_{C\in\coll} C $$ is convex set
-
scalar scaling preserves convexity
- for convex set $C$ $$ \alpha C $$ is convex set for any $\alpha\in\reals$
-
sum preserves convexity
- for convex sets $C$ and $D$ $$ C+D $$ is convex set
-
direct product preserves convexity
- for convex sets $C$ and $D$ $$ C\times D $$ is convex set
-
projection preserves convexity
- for convex set $C\subset A \times B$ $$ \set{x\in A}{(\exists y)((x,y)\in C)} $$ is convex
-
image and inverse image by affine function preserve convexity
- for affine function $f:A\to B$ and convex sets $C\subset A$ and $D\subset B$ $$ f(C) \;\& \; f^{-1}(D) $$ are convex
-
image and inverse image by linear-fractional function preserve convexity
- for convex sets $C\subset \reals^n, D\subset \reals^m$ and linear-fractional function, $g:\reals^n\to\reals^m$, i.e., function defined by $g(x) = (Ax+b)/(c^Tx+d)$ for $A\in\reals^{m\times n}$, $b\in\reals^m$, $c\in\reals^n$, and $d\in\reals$ $$ g(C) \ \& \ g^{-1}(D) $$ are convex
Proper cones and generalized inequalities
- cone $K$ said to be proper if it is convex, closed, solid, and pointed
- solid, i.e., $\interior{K}\neq \emptyset$
- pointed, i.e., $x\in K$ and $-x\in K$ imply $x=0$
- examples of proper cones: $\prealk{n}$ and $\possemidefset{n}$
- (nonstrict) generalized inequality $$ x \preceq_K y \Leftrightarrow y - x\in K $$
- strict generalized inequality $$ x \prec_K y \Leftrightarrow y - x\in \interior{K} $$
- $\preceq_K$ and $\prec_K$ are partial orderings
Convex sets induced by generalized inequalities
- for affine function $f:\reals^n\to\symset{m}$, i.e., $f(x)=A_0 + A_1 x_1 + \cdots + A_n x_n$ for some $A_0,\ldots,A_n\in\symset{m}$, $f^{-1}(\possemidefset{m})$ is convex (since inverse image of convex set under affine function is convex), i.e., $$ \set{x\in\reals^n}{A_0 + A_1 x_1 + \cdots + A_n x_n \succeq 0} \subset \reals^n $$ is convex
- can negate each matrix $A_i$ and have same results, hence $$ \set{x\in\reals^n}{A_0 + A_1 x_1 + \cdots + A_n x_n \preceq 0} \subset \reals^n $$ is (also) convex
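A small numerical illustration of the convexity of an LMI solution set, with arbitrarily chosen symmetric data: if $x$ and $y$ both satisfy $A_0 + A_1x_1 + A_2x_2 \succeq 0$, every convex combination of them does too, which can be spot-checked via eigenvalues.

```python
import numpy as np

# arbitrary symmetric data for the affine map f(x) = A0 + x1*A1 + x2*A2
A0 = np.array([[2.0, 0.0], [0.0, 2.0]])
A1 = np.array([[1.0, 0.0], [0.0, -1.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])

def f(x):
    return A0 + x[0] * A1 + x[1] * A2

def is_psd(M, tol=1e-9):
    # a symmetric matrix is PSD iff its smallest eigenvalue is >= 0
    return np.linalg.eigvalsh(M).min() >= -tol

x = np.array([0.5, 0.5])
y = np.array([-0.5, 1.0])
assert is_psd(f(x)) and is_psd(f(y))

# every convex combination stays feasible (f is affine, PSD cone is convex)
for theta in np.linspace(0, 1, 11):
    assert is_psd(f(theta * x + (1 - theta) * y))
```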
Separating and supporting hyperplanes
Dual cones
- the figure illustrates $x \in K^\ast$ while $z\not\in K^\ast$
Dual norms
-
examples
- dual cone of subspace $V\subset \reals^n$ is orthogonal complement of $V$, $V^\perp$, where $V^\perp=\set{y}{\forall v\in V,v^Ty = 0}$
- $\prealk{n}$ and $\possemidefset{n}$ are self-dual
- dual of norm cone is norm cone associated with dual norm, i.e., if $K=\set{(x,t)\in\reals^{n} \times \reals}{\|x\|\leq t}$ $$ K^\ast=\set{(y,u)\in\reals^{n} \times \reals}{\|y\|_\ast\leq u} $$
Properties of dual cones
- $K^\ast$ is closed and convex
- $K_1\subset K_2 \Rightarrow K_2^\ast \subset K_1^\ast$
- if $\interior{K} \neq \emptyset$, $K^\ast$ is pointed
- if $\closure{K}$ is pointed, $\interior{(K^\ast)} \neq \emptyset$
- $K^{\ast\ast}=(K^\ast)^\ast$ is closure of convex hull of $K$
- if $K$ is closed and convex, $K^{\ast\ast} = K$
- dual of proper cone is proper cone
- for proper cone $K$, $K^{\ast\ast}=K$
Dual generalized inequalities
- $x\preceq_K y$ if and only if $(\forall \lambda \succeq_{K^\ast} 0)(\lambda^T x \leq \lambda^T y)$
- $x\prec_K y$ if and only if $(\forall \lambda \succeq_{K^\ast} 0 \mbox{ with } \lambda\neq0)(\lambda^T x < \lambda^T y)$
- $x\preceq_{K^\ast} y$ if and only if $(\forall \lambda \succeq_{K} 0)(\lambda^T x \leq \lambda^T y)$
- $x\prec_{K^\ast} y$ if and only if $(\forall \lambda \succeq_{K} 0 \mbox{ with } \lambda\neq0)(\lambda^T x < \lambda^T y)$
Theorem of alternative for linear strict generalized inequalities
Convex Functions
Convex functions
- function $f:\reals^n\to\reals$ the domain of which is convex and which satisfies $$ \left( \forall x,y\in \dom f, 0\leq \theta \leq 1 \right) \left( f(\theta x + (1-\theta) y) \leq \theta f(x) + (1-\theta) f(y) \right) $$ said to be convex
- function $f:\reals^n\to\reals$ the domain of which is convex and which satisfies $$ \left( \forall \mbox{ distinct } x,y\in \dom f, 0< \theta < 1 \right) \left( f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y) \right) $$ said to be strictly convex
- function $f:\reals^n\to\reals$ the domain of which is convex where $-f$ is convex, said to be concave
- function $f:\reals^n\to\reals$ the domain of which is convex where $-f$ is strictly convex, said to be strictly concave
Extended real-value extensions of convex functions
-
using extended real-value extensions of convex functions,
can drop “$\dom f$” in equations,
e.g.,
- $f$ is convex if and only if its extended-value extension $\tilde{f}$ satisfies $$ \left( \forall x,y\in \reals^n, 0\leq \theta \leq 1 \right) \left( \tilde{f}(\theta x + (1-\theta) y) \leq \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y) \right) $$
- $f$ is strictly convex if and only if its extended-value extension $\tilde{f}$ satisfies $$ \left( \forall \mbox{ distinct } x,y\in \dom f, 0< \theta < 1 \right) \left( \tilde{f}(\theta x + (1-\theta) y) < \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y) \right) $$
First-order condition for convexity
- convex if and only if $\dom f$ is convex and $$ \left( \forall x,y\in \dom f \right) \left( f(y) \geq f(x) + \nabla f(x) ^T (y-x) \right) $$
- strictly convex if and only if $\dom f$ is convex and $$ \left( \forall \mbox{ distinct } x,y\in \dom f \right) \left( f(y) > f(x) + \nabla f(x) ^T (y-x) \right) $$
-
implies
that
for convex function $f$
- first-order Taylor approximation is global underestimator
-
can derive
global information
from
local information
- e.g., if $\nabla f(x)=0$, $x$ is global minimizer
- explains remarkable properties of convex functions and convex optimization problems
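The underestimator property can be checked numerically; a minimal sketch with an arbitrary smooth convex function ($f(x)=e^x$), verifying that the tangent line at any point lies below the graph everywhere, and that a stationary point of a convex function is a global minimizer:

```python
import numpy as np

# convex f(x) = exp(x); its tangent line at any x0 underestimates f globally
f = np.exp
df = np.exp  # derivative of exp is exp

x0 = 0.7
ys = np.linspace(-5, 5, 201)
tangent = f(x0) + df(x0) * (ys - x0)   # first-order Taylor approximation at x0
assert np.all(f(ys) >= tangent - 1e-12)

# at a stationary point of a convex function the tangent is horizontal,
# so f(y) >= f(x*) for all y: local information gives global minimality
g = lambda y: (y - 1.0) ** 2           # gradient vanishes at y = 1
assert np.all(g(ys) >= g(1.0))
```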
Second-order condition for convexity
- $f$ is convex if and only if $\dom f$ is convex and $$ \left( \forall x\in \dom f \right) \left( \nabla^2 f(x) \succeq 0 \right) $$
- if $\dom f$ is convex and $$ \left( \forall x\in \dom f \right) \left( \nabla^2 f(x) \succ 0 \right) $$ $f$ is strictly convex (converse is not true, e.g., $f(x)=x^4$)
Convex function examples
- assume function $f:\reals^n\to\reals$ and $\dom f =\reals^n$ unless specified otherwise
- affine function, i.e., $f(x)=a^Tx +b$ for some $a\in\reals^n$ and $b\in\reals$, is convex
-
quadratic functions
- if $f(x) = x^T Px + q^Tx$
for some $P\in\symset{n}$ and $q\in\reals^n$
- $f$ is convex if and only if $P\succeq0$
- $f$ is strictly convex if and only if $P\succ0$
- exponential function, i.e., $f(x) = \exp(a^Tx+b)$ for some $a\in\reals^n$ and $b\in\reals$, is convex
- power, i.e., $f(x) = x^a$ for some $a\geq1$, is convex on $\ppreals$
- power of absolute value, i.e., $f(x) = |x|^a$ for some $a\geq1$, is convex on $\reals$
- logarithm function, i.e., $f(x) = \log x$, is concave on $\ppreals$
- negative entropy, i.e., $$ f(x) = \left\{\begin{array}{ll} x\log x & \mbox{if } x >0 \\ 0 &\mbox{if } x=0 \end{array}\right. $$ is convex on $\preals$
- norm as function is convex (by definition of norms, i.e., triangle inequality & absolute homogeneity)
- max function, i.e., $f(x)=\max\{x_1,\ldots,x_n\}$, is convex
- quadratic-over-linear function, $f(x,y) = x^2/y$, is convex on $\reals\times \ppreals$
- log-sum-exp, $f(x) = \log(\exp(x_1)+\cdots+\exp(x_n))$, is convex
- geometric mean, $f(x) = (\prod_{i=1}^n x_i )^{1/n}$, is concave on $\pprealk{n}$
- log-determinant, $f(X) = \log \det X$, is concave on $\posdefset{n}$
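One entry above (log-sum-exp) can be spot-checked numerically by testing the defining inequality on a few point pairs; this is only evidence for the claim, not a proof.

```python
import numpy as np

def logsumexp(x):
    # numerically stable log(exp(x_1) + ... + exp(x_n))
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

pairs = [(np.array([0.0, 1.0, -2.0]), np.array([3.0, -1.0, 0.5])),
         (np.array([10.0, 10.0, 10.0]), np.array([-10.0, 0.0, 10.0]))]
for x, y in pairs:
    for theta in np.linspace(0, 1, 11):
        lhs = logsumexp(theta * x + (1 - theta) * y)
        rhs = theta * logsumexp(x) + (1 - theta) * logsumexp(y)
        assert lhs <= rhs + 1e-12   # defining inequality of convexity
```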
Sublevel sets and superlevel sets
- every sublevel set of convex function is convex
- and every superlevel set of concave function is convex
-
note, however, converse is not true
- e.g., every sublevel set of $\log$ is convex, but $\log$ is concave
Epigraphs and hypographs
- function is convex if and only if its epigraph is convex
- function is concave if and only if its hypograph is convex
Convexity preserving function operations
-
nonnegative weighted sum preserves convexity
- for convex functions $f_1,\ldots,f_n$ and nonnegative weights $w_1,\ldots, w_n$ $$ w_1 f_1 + \cdots + w_n f_n $$ is convex
-
nonnegative weighted integration preserves convexity
- for measurable set $Y$, $w:Y\to\preals$, and $f:X \times Y\to\reals$ where $f(x,y)$ is convex in $x$ for every $y\in Y$ and measurable in $y$ for every $x\in X$ $$ \int_Y w(y) f(x,y) dy $$ is convex
-
pointwise maximum preserves convexity
- for convex functions $f_1,\ldots,f_n$ $$ \max\{f_1, \ldots, f_n\} $$ is convex
-
pointwise supremum preserves convexity
- for indexed family of convex functions $\indexedcol{f_\lambda}_{\lambda\in\Lambda}$ $$ \sup_{\lambda \in \Lambda} f_\lambda $$ is convex (one way to see this is $\epi \sup_\lambda f_\lambda = \bigcap_\lambda \epi f_\lambda$)
-
composition
-
suppose $g:\reals^n\to\reals^k$, $h:\reals^k\to\reals$, and $f=h\circ g$
- $f$ convex if $h$ convex & nondecreasing in each argument, and $g_i$ convex
- $f$ convex if $h$ convex & nonincreasing in each argument, and $g_i$ concave
- $f$ concave if $h$ concave & nondecreasing in each argument, and $g_i$ concave
- $f$ concave if $h$ concave & nonincreasing in each argument, and $g_i$ convex
-
minimization
- for function $f(x,y)$ convex in $(x,y)$ and convex set $C$ $$ \inf_{y\in C} f(x,y) $$ is convex provided it is bounded below where domain is $\set{x}{(\exists y\in C)((x,y) \in \dom f)}$
-
perspective of convex function preserves convexity
- for convex function $f:X\to\reals$, function $g:X\times \reals \to \reals$ defined by $$ g(x,t) = tf(x/t) $$ with $\dom g = \set{(x,t)}{x/t \in \dom f, t>0}$ is convex
Convex functions examples
-
piecewise-linear function is convex, i.e.
- $\max\{a_1^Tx+b_1,\ldots,a_m^T x + b_m\}$ for some $a_i\in\reals^n$ and $b_i\in\reals$ is convex
-
sum of $k$ largest components is convex, i.e.
- $x_{[1]} + \cdots + x_{[k]}$ where $x_{[i]}$ denotes $i$-th largest component, is convex (since $f(x) = \max\set{x_{i_1}+\cdots+x_{i_r}}{1\leq i_1< i_2<\cdots < i_r\leq n}$)
-
support function of set, i.e.,
- $\sup\set{x^Ty}{y\in A}$ for $A\subset\reals^n$ is convex
-
distance (when measured by arbitrary norm) to farthest point of set
- $\sup\set{\|x-y\|}{y\in A}$ for $A\subset\reals^n$ is convex
-
least-squares cost as function of weights
-
$\inf_{x\in\reals^n} \sum^n_{i=1} w_i(a_i^Tx - b_i)^2$ for some $a_i\in\reals^n$ and $b_i\in\reals$
is concave
- note that above function equals $\sum_{i=1}^n w_i b_i^2 - \sum_{i=1}^n w_i^2 b_i^2 a_i^T \left( \sum_{j=1}^n w_ja_ja_j^T\right)^{-1} a_i$, from which concavity is far from obvious
-
maximum eigenvalue of symmetric matrix
- $\lambda_\mathrm{max}(F(x)) = \sup\set{y^TF(x)y}{\|y\|_2 = 1}$, where $F:\reals^n\to \symset{m}$ is affine function of $x$, is convex (pointwise supremum of functions affine in $x$)
-
norm of matrix
- $\sup\set{u^TG(x)v}{\|u\|_2 \leq 1, \|v\|_2\leq1}$ where $G:\reals^n\to \reals^{m\times n}$ is linear function in $x$
-
distance (when measured by arbitrary norm) to convex set
- for convex set $C$, $\inf\set{\|x-y\|}{y\in C}$ is convex
-
infimum of convex function
subject to linear constraint
- for convex function $h$, $\inf\set{h(y)}{Ay=x}$ is convex (since it is $\inf_y (h(y) + I_{Ay=x}(x,y))$)
-
perspective of Euclidean norm squared
- map $(x,t) \mapsto x^Tx /t$ induces convex function in $(x,t)$ for $t>0$
-
perspective of negative log
- map $(x,t) \mapsto -t \log(x/t)$ induces convex function in $(x,t) \in \pprealk{2}$
-
perspective of convex function
- for convex function $f:\reals^n\to\reals$, function $g:\reals^n\to\reals$ defined by $$ g(x) = (c^T x + d) f((Ax+b)/(c^T x + d)) $$ for some $A\in\reals^{m\times n}$, $b\in\reals^m$, $c\in\reals^n$, and $d\in\reals$ with $\dom g = \set{x}{(Ax+b)/(c^Tx + d)\in \dom f, c^T x + d >0}$ is convex
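The least-squares-cost example above can be spot-checked numerically; a sketch with arbitrary fixed data, evaluating $f(w)=\inf_x \sum_i w_i(a_i^Tx-b_i)^2$ via the weighted normal equations and testing midpoint concavity:

```python
import numpy as np

# arbitrary fixed data; rows of A are a_i^T
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
b = np.array([1.0, 2.0, 0.5, -1.0])

def ls_cost(w):
    # f(w) = inf_x sum_i w_i (a_i^T x - b_i)^2, via the weighted normal equations
    W = np.diag(w)
    x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
    r = A @ x - b
    return float(w @ r**2)

w1 = np.array([1.0, 2.0, 0.5, 1.0])
w2 = np.array([0.3, 1.0, 2.0, 0.7])
mid = ls_cost((w1 + w2) / 2)
# midpoint concavity: f((w1+w2)/2) >= (f(w1) + f(w2)) / 2
assert mid >= (ls_cost(w1) + ls_cost(w2)) / 2 - 1e-12
```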
Conjugate functions
- conjugate function is convex for any function $f$ because it is supremum of affine (hence convex) functions of $x$
Conjugate function examples
-
strictly convex quadratic function
- for $f:\reals^n \to \preals$ defined by $f(x) = x^TQx/2$ where $Q\in \posdefset{n}$, $$ f^\ast(x)= \sup_y(y^Tx - y^TQy/2) = (y^Tx - y^TQy/2)|_{y=Q^{-1}x} = x^TQ^{-1}x/2 $$ which is also strictly convex quadratic function
-
log-determinant
- for function $f:\posdefset{n} \to \reals$ defined by $f(X) = \log \det X^{-1}$ $$ f^\ast(X) = \sup_{Y\in\posdefset{n}} (\Tr XY + \log \det Y) = \log\det (-X)^{-1} - n $$ where $\dom f^\ast = -\posdefset{n}$
-
indicator function
- for indicator function $I_A:\reals^n\to\{0,\infty\}$ with $A\subset \reals^n$ $$ I_A^\ast(x) = \sup_y (y^Tx - I_A(y)) = \sup \set{y^Tx}{y\in A} $$ which is support function of $A$
-
log-sum-exp function
- for function $f: \reals^n \to \reals$ defined by $f(x) = \log(\sum_{i=1}^n \exp(x_i))$ $$ f^\ast(x) = \sum_{i=1}^n x_i \log x_i + I_{x\succeq 0, \ones^T x = 1}(x) $$
-
norm
- for norm function $f:\reals^n\to\preals$ defined by $f(x)=\|x\|$ $$ f^\ast(x) = \sup_y( {y^Tx - \|y\|}) = I_{\|x\|_\ast\leq1}(x) $$
-
norm squared
- for function $f: \reals^n \to \preals$ defined by $f(x) = \|x\|^2/2$ $$ f^\ast(x) = \|x\|_\ast^2/2 $$
-
differentiable convex function
- for differentiable convex function $f:\reals^n\to\reals$ $$ f^\ast(x)= (y^\ast)^T \nabla f(y^\ast) - f(y^\ast) $$ where $y^\ast = \argsup_y (x^Ty-f(y))$
-
sum of independent functions
- for function $f:\reals^n\times \reals^m \to \reals$ defined by $f(x,y) = f_1(x) + f_2(y)$ where $f_1:\reals^n\to\reals$ and $f_2:\reals^m\to\reals$ $$ f^\ast(x,y) = f_1^\ast(x) + f_2^\ast(y) $$
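The strictly convex quadratic entry above can be verified numerically; a sketch with an arbitrary positive definite $Q$, checking that the supremum defining $f^\ast$ is attained at $y^\ast=Q^{-1}x$ with value $x^TQ^{-1}x/2$:

```python
import numpy as np

# f(y) = y^T Q y / 2 with Q positive definite (arbitrary choice)
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([1.0, -2.0])

# the supremum of y^T x - y^T Q y / 2 is attained at y* = Q^{-1} x
y_star = np.linalg.solve(Q, x)
f_conj = y_star @ x - y_star @ Q @ y_star / 2
assert np.isclose(f_conj, x @ np.linalg.solve(Q, x) / 2)  # equals x^T Q^{-1} x / 2

# random candidates never beat the maximizer
rng = np.random.default_rng(0)
for y in rng.normal(size=(100, 2)):
    assert y @ x - y @ Q @ y / 2 <= f_conj + 1e-12
```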
Convex functions with respect to generalized inequalities
- function $f$ satisfying $$ \left( \forall x,y \in \dom f, 0\leq \theta\leq 1 \right) \left( f(\theta x + (1-\theta) y) \preceq_K \theta f(x) + (1-\theta) f(y) \right) $$ called $K$-convex
- function $f$ satisfying $$ \left( \forall x\neq y \in \dom f, 0< \theta< 1 \right) \left( f(\theta x + (1-\theta) y) \prec_K \theta f(x) + (1-\theta) f(y) \right) $$ called strictly $K$-convex
- function $f$ is $K$-convex if and only if for every $w\succeq_{K^\ast}0$, $w^Tf$ is convex
- function $f$ is strictly $K$-convex if and only if for every nonzero $w\succeq_{K^\ast}0$, $w^Tf$ is strictly convex
Matrix convexity
-
examples of matrix convexity
- function of $\reals^{n\times m}$ into $\possemidefset{n}$ defined by $X\mapsto XX^T$ is matrix convex
- function of $\posdefset{n}$ into itself defined by $X\mapsto X^p$ is matrix convex for $1\leq p\leq 2$ or $-1\leq p \leq0$, and matrix concave for $0\leq p\leq1$
- function of $\symset{n}$ into $\posdefset{n}$ defined by $X\mapsto \exp(X)$ is not matrix convex
- quadratic matrix function of $\reals^{m\times n}$ into $\symset{n}$ defined by $X\mapsto X^TAX + B^TX + X^TB + C$ for $A\in\symset{m}$, $B\in\reals^{m\times n}$, and $C\in\symset{n}$ is matrix convex when $A\succeq0$
Convex Optimization Problems
Optimization problems
- $\fobj$, $\fie$, and $\feq$ are objective function, inequality constraint function, and equality constraint function
- $\fie(x) \preceq 0$ and $\feq(x) = 0$ are inequality constraints and equality constraints
- $\optdomain = \xobj \cap \xie \cap \xeq$ is domain of optimization problem
- $\optfeasset =\set{x\in \optdomain}{\fie(x) \preceq0, \feq(x)=0}$, called feasible set; $x\in\optdomain$ said to be feasible if $x\in\optfeasset$; optimization problem said to be feasible if $\optfeasset\neq \emptyset$
- $p^\ast = \inf\set{\fobj(x)}{x\in\optfeasset}$, called optimal value of optimization problem
- if optimization problem is infeasible, $p^\ast = \infty$ (following convention that infimum of empty set is $\infty$)
- if $p^\ast=-\infty$, optimization problem said to be unbounded
Global and local optimalities
- $x\in \optfeasset$ with $\fobj(x) = p^\ast$, called (global) optimal point
- $X_\mathrm{opt} = \set{x\in \optfeasset}{\fobj(x)=p^\ast}$, called optimal set
- when $X_\mathrm{opt} \neq \emptyset$, we say optimal value is attained or achieved and optimization problem is solvable
- optimization problem is not solvable if $p^\ast = \infty$ or $p^\ast = -\infty$ (converse is not true)
Equivalent optimization problems
-
below two optimization problems are equivalent
- $$ \begin{array}{ll} \mbox{minimize} & -x-y \\ \mbox{subject to} & 2x+y \leq1 \\ & x+2y \leq1 \end{array} $$
- $$ \begin{array}{ll} \mbox{minimize} & -2u-v/3 \\ \mbox{subject to} & 4u+v/3 \leq1 \\ & 2u+2v/3 \leq1 \end{array} $$
- since if $(x^\ast, y^\ast)$ solves first, $(u,v)=(x^\ast/2, 3y^\ast)$ solves second, and if $(u^\ast, v^\ast)$ solves second, $(x,y)=(2u^\ast, v^\ast/3)$ solves first
Change of variables
- given function $\phi:\mathcalfont{Z} \to \xdomain$, original optimization problem can be rewritten as $$ \begin{array}{ll} \mbox{minimize} & \fobj(\phi(z)) \\ \mbox{subject to} & \fie(\phi(z)) \preceq 0 \\ & \feq(\phi(z)) =0 \end{array} $$ where $z\in\mathcalfont{Z}$ is optimization variable
- if $\phi$ is injective and $\optdomain \subset \phi(\mathcalfont{Z})$, above optimization problem and original optimization problem are equivalent
- two optimization problems said to be related by change of variable or substitution of variable $x=\phi(z)$
Convex optimization
- when $\xdomain= \reals^n$, optimization problem can be formulated as $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq 0 \\ & Ax = b \end{array} $$ for some $A\in\reals^{p\times n}$ and $b\in\reals^p$
-
domain of convex optimization problem is convex
- since domains of $\fobj$, $\fie$, and $\feq$ are convex (by definition of convex functions) and intersection of convex sets is convex
-
feasible set of convex optimization problem
is convex
- since feasible set is intersection of problem domain (convex), sublevel sets of convex inequality constraint functions (convex), and solution set of affine equality constraints, which is affine set (hence convex), and intersection preserves convexity
Optimality conditions for convex optimization problems
- $x\in\optdomain$ is optimal if and only if $x\in\optfeasset$ and $$ \left( \forall y \in \optfeasset \right) \left( \nabla \fobj(x)^T(y-x) \geq0 \right) $$
- for unconstrained problems, $x\in\optdomain$ is optimal if and only if $$ \nabla \fobj(x)=0 $$
Optimality conditions for some convex optimization problems
-
unconstrained convex quadratic optimization
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x) = (1/2)x^TPx + q^Tx
\end{array}
$$
where $\xobj=\reals^n$ and $P\in\possemidefset{n}$
-
$x$ is optimal if and only if
$$
\nabla \fobj(x) = Px + q = 0
$$
exist three cases
- if $P\in\posdefset{n}$, exists unique optimum $x^\ast = -P^{-1}q$
- if $q\in\range(P)$, $X_\mathrm{opt}=-P^\dagger q + \nullspace(P)$
- if $q\not\in\range(P)$, $p^\ast = -\infty$
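The first case can be sketched numerically with arbitrary positive definite data: the unique optimum is found by solving the linear system $Px + q = 0$, and nearby points have larger objective value.

```python
import numpy as np

# minimize (1/2) x^T P x + q^T x with P positive definite:
# unique optimum x* = -P^{-1} q, found by solving P x + q = 0
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, -1.0])

x_star = np.linalg.solve(P, -q)
assert np.allclose(P @ x_star + q, 0)  # gradient vanishes at x*

def fobj(x):
    return 0.5 * x @ P @ x + q @ x

# nearby points have strictly larger objective value
for dx in [np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([0.05, 0.05])]:
    assert fobj(x_star + dx) > fobj(x_star)
```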
-
analytic centering
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x) = - \sum_{i=1}^m \log (b_i-a_i^Tx)
\end{array}
$$
where $\xobj = \set{x\in\reals^n}{Ax \prec b}$
-
$x$ is optimal if and only if
$$
\nabla \fobj(x) = \sum_{i=1}^m \frac{1}{b_i-a_i^Tx}a_i = 0
$$
exist three cases
- exists unique optimum, which happens if and only if $\xobj=\set{x}{Ax \prec b}$ is nonempty and bounded
- exist infinitely many optima, in which case, $X_\mathrm{opt}$ is affine set
- exists no optimum, which happens if and only if $\fobj$ is unbounded below
-
convex optimization problem with equality constraints only
$$
\begin{array}{ll}
\mbox{minimize}
& \fobj(x)
\\
\mbox{subject to}
& Ax =b
\end{array}
$$
where $\xdomain=\reals^n$
- $x$ is optimal if and only if $$ \nabla \fobj(x) \perp \nullspace(A) $$ or equivalently, exists $\nu\in\reals^p$ such that $$ \nabla \fobj(x) = A^T\nu $$
Linear programming
- can transform above LP into standard form LP $$ \begin{array}{ll} \mbox{minimize} & \tilde{c}^T\tilde{x} \\ \mbox{subject to} & \tilde{A}\tilde{x} = \tilde{b} \\ & \tilde{x} \succeq0 \end{array} $$
LP examples
-
diet problem
-
find amount of $n$ different food to minimize purchase cost
while satisfying nutrition requirements
- assume exist $n$ foods and $m$ nutrients, $c_i$ is cost of food $i$, $A_{ji}$ is amount of nutrient $j$ contained in unit quantity of food $i$, and $b_j$ is required amount of nutrient $j$
- diet problem can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax \succeq b \\ & x\succeq0 \end{array} $$
-
Chebyshev center of polyhedron
- find largest Euclidean ball contained in polyhedron
- assume polyhedron is $\set{x\in\reals^n}{a_i^Tx \leq b_i, i=1,\ldots, m}$
- problem of finding Chebyshev center of polyhedron can be formulated as LP $$ \begin{array}{ll} \mbox{maximize} & r \\ \mbox{subject to} & a_i^T x + r\|a_i\|_2 \leq b_i \end{array} $$ where optimization variables are $x\in\reals^n$ and $r\in\reals$
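The Chebyshev-center LP above can be sketched with `scipy.optimize.linprog` (the polyhedron here, the unit square, is an arbitrary choice): the variables are $(x, r)$ and each row of the inequality matrix is $(a_i, \|a_i\|_2)$.

```python
import numpy as np
from scipy.optimize import linprog

# polyhedron {x : a_i^T x <= b_i}: the unit square [0, 1]^2
a = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])

# variables z = (x_1, x_2, r); maximize r  <=>  minimize -r
c = np.array([0.0, 0.0, -1.0])
A_ub = np.hstack([a, np.linalg.norm(a, axis=1, keepdims=True)])
res = linprog(c, A_ub=A_ub, b_ub=b, bounds=[(None, None)] * 2 + [(0, None)])

x_c, r = res.x[:2], res.x[2]
# largest inscribed ball in the unit square: center (1/2, 1/2), radius 1/2
assert np.allclose(x_c, [0.5, 0.5], atol=1e-6) and np.isclose(r, 0.5, atol=1e-6)
```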
-
piecewise-linear minimization
- minimize maximum of affine functions
- assume $m$ affine functions $a_i^Tx + b_i$
- piecewise-linear minimization problem can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & t \\ \mbox{subject to} & a_i^Tx + b_i \leq t,\quad i=1,\ldots,m \end{array} $$
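The epigraph reformulation above can be run through `scipy.optimize.linprog` on a tiny arbitrary instance, minimizing $\max(x+1,\,-x+1)$ over $x\in\reals$ (optimal value $1$ at $x=0$):

```python
import numpy as np
from scipy.optimize import linprog

# minimize t subject to a_i x + b_i <= t; variables z = (x, t)
c = np.array([0.0, 1.0])
A_ub = np.array([[ 1.0, -1.0],    #  x + 1 <= t
                 [-1.0, -1.0]])   # -x + 1 <= t
b_ub = np.array([-1.0, -1.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
assert np.isclose(res.fun, 1.0, atol=1e-8)  # optimal value t* = 1
```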
-
linear-fractional program
$$
\begin{array}{ll}
\mbox{minimize} &
(
c^T x + d
)
/
(
e^T x + f
)
\\
\mbox{subject to} &
Gx \preceq h
\\ &
Ax = b
\end{array}
$$
- if feasible set is nonempty, can be formulated as LP $$ \begin{array}{ll} \mbox{minimize} & c^T y + dz \\ \mbox{subject to} & Gy - hz \preceq0 \\ & Ay-bz = 0 \\ & e^Ty + fz = 1 \\ & z\geq0 \end{array} $$
Quadratic programming
- when $P=0$, QP reduces to LP, hence LP is specialization of QP
QP examples
-
least-squares (LS) problems
- LS can be formulated as QP $$ \begin{array}{ll} \mbox{minimize} & \|Ax-b\|_2^2 \end{array} $$
-
distance between two polyhedra
- assume two polyhedra $\set{x\in\reals^n}{Ax\preceq b, Cx =d}$ and $\set{x\in\reals^n}{\tilde{A}x\preceq \tilde{b}, \tilde{C}x =\tilde{d}}$
- problem of finding distance between two polyhedra can be formulated as QP $$ \begin{array}{ll} \mbox{minimize} & \|x-y\|_2^2 \\ \mbox{subject to} & Ax\preceq b, \quad Cx =d \\ & \tilde{A}y\preceq \tilde{b}, \quad \tilde{C}y =\tilde{d} \end{array} $$
Quadratically constrained quadratic programming
- when $P_i=0$ for $i=1,\ldots,m$, QCQP reduces to QP, hence QP is specialization of QCQP
Second-order cone programming
- when $c_i=0$, SOCP reduces to QCQP (by squaring both sides of each constraint), hence QCQP is specialization of SOCP
SOCP examples
-
robust linear program
-
minimize $c^T x$
while satisfying
$\tilde{a}_i^T x \leq b_i$
for every $\tilde{a}_i \in \set{a_i+P_iu}{\|u\|_2\leq1}$
where $P_i\in\symset{n}$
- can be formulated as SOCP $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & a_i^T x + \|P_i^T x\|_2 \leq b_i, \quad i=1,\ldots,m \end{array} $$
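The identity behind this reformulation, $\sup_{\|u\|_2\leq 1}(a_i+P_iu)^Tx = a_i^Tx + \|P_i^Tx\|_2$ (attained at $u^\ast = P_i^Tx/\|P_i^Tx\|_2$ by Cauchy-Schwarz), can be spot-checked on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a, x = rng.normal(size=n), rng.normal(size=n)
P = rng.normal(size=(n, n))

# closed form for the worst case over the uncertainty set {a + P u : ||u|| <= 1}
worst = a @ x + np.linalg.norm(P.T @ x)
u_star = P.T @ x / np.linalg.norm(P.T @ x)   # the maximizing perturbation

# random points of the unit ball never beat the closed form
samples = rng.normal(size=(1000, n))
samples /= np.maximum(np.linalg.norm(samples, axis=1, keepdims=True), 1.0)
sampled_max = np.max((a + samples @ P.T) @ x)
```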
-
linear program with random constraints
-
minimize $c^T x$
while satisfying
$\tilde{a}_i^T x \leq b_i$
with probability no less than $\eta$
where $\tilde{a}_i \sim \normal(a_i,\Sigma_i)$
- can be formulated as SOCP $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & a_i^T x + \Phi^{-1}(\eta)\|\Sigma_i^{1/2} x\|_2 \leq b_i, \quad i=1,\ldots,m \end{array} $$
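The chance-constraint reformulation can be validated by Monte Carlo on made-up data: fix $x$, set $b_i$ so the SOC constraint is exactly active, and check the empirical probability is about $\eta$ (here $\tilde{a}^Tx \sim \normal(a^Tx,\, x^T\Sigma x)$, so the probability is $\eta$ exactly):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
eta = 0.95
a = np.array([1.0, 2.0, -1.0])
Sigma = np.diag([0.1, 0.2, 0.05])
x = np.array([0.5, 0.3, 1.0])

sig = np.sqrt(x @ Sigma @ x)            # ||Sigma^{1/2} x||_2
b = a @ x + norm.ppf(eta) * sig         # smallest b satisfying the SOC constraint

# estimate P(a_tilde^T x <= b) by sampling a_tilde ~ N(a, Sigma)
samples = rng.multivariate_normal(a, Sigma, size=200_000)
p_hat = np.mean(samples @ x <= b)
```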
Geometric programming
Geometric programming in convex form
- geometric program in is not convex optimization problem as stated
- however, can be transformed to equivalent convex optimization problem by change of variables and transformation of objective and constraint functions
Convex optimization with generalized inequalities
- problem in reduces to convex optimization problem in when $q=1$ and $K_1=\prealk{m}$, hence convex optimization is specialization of convex optimization with generalized inequalities
- like convex optimization
Conic programming
- can transform above CP to standard form CP $$ \begin{array}{ll} \mbox{minimize} & \tildefobj(X) \\ \mbox{subject to} & \tildefeq (X) = 0 \\ & X \succeq_{K} 0 \end{array} $$
- cone program is one of simplest convex optimization problems with generalized inequalities
Semidefinite programming
- above inequality, called linear matrix inequality (LMI)
- can transform SDP to standard form SDP $$ \begin{array}{ll} \mbox{minimize} & \Tr (CX) \\ \mbox{subject to} & \Tr (A_iX) = b_i\quad i=1,\ldots,p \\ & X \succeq 0 \end{array} $$ where $\xdomain=\possemidefset{n}$ and $C,A_1,\ldots,A_p\in\symset{n}$ and $b_i\in\reals$
SDP examples
- LP
-
SOCP
- SOCP in is equivalent to $$ \begin{array}{ll} \mbox{minimize} & f^T x \\ \mbox{subject to} & Fx = g \\ & \begin{my-matrix}{cc} c_i^Tx + d_i & x^TA_i^T + b_i^T \\ A_ix + b_i & (c_i^Tx + d_i)I_{n_i} \end{my-matrix} \succeq 0 \quad i=1,\ldots,m \end{array} $$ which can be transformed to SDP in , thus, SDP reduces to SOCP
- hence, SOCP is specialization of SDP
Determinant maximization problems
- if $l=1$, $C_1=\cdots=C_n=0$, $D=1$, max-det problem reduces to SDP, hence SDP is specialization of max-det problem
Diagrams for containment of convex optimization problems
- the figure shows containment relations among convex optimization problems
- vertical lines ending with filled circles indicate existence of direct reductions, i.e., optimization problem transformations to special cases
Duality
Lagrangian
- $\lambda$, called Lagrange multiplier associated with inequality constraints $\fie(x)\preceq0$
- $\lambda_i$, called Lagrange multiplier associated with $i$-th inequality constraint $\fie_i(x)\leq0$
- $\nu$, called Lagrange multiplier associated with equality constraints $\feq(x)=0$
- $\nu_i$, called Lagrange multiplier associated with $i$-th equality constraint $\feq_i(x)=0$
- $\lambda$ and $\nu$, called dual variables or Lagrange multiplier vectors associated with the optimization problem
Lagrange dual functions
-
$g$ is (always) concave function (even when optimization problem is not convex)
- since $g$ is pointwise infimum of affine (hence concave) functions of $(\lambda,\nu)$, it is concave
- $g(\lambda,\nu)$ provides lower bound for optimal value of associated optimization problem, i.e., $$ g(\lambda,\nu) \leq p^\ast $$ for every $\lambda\succeq0$
- $(\lambda,\nu) \in \set{(\lambda,\nu)}{\lambda\succeq0, g(\lambda,\nu)>-\infty}$, said to be dual feasible
Dual function examples
-
LS solution of linear equations
$$
\lssollineqs{primal}
$$
- Lagrangian - $L(x,\nu) = x^T x + \nu^T(Ax-b)$
- Lagrange dual function $$ \lssollineqs{dual fcn} $$
-
standard form LP
$$
\begin{array}{ll}
\mbox{minimize} &
c^Tx
\\
\mbox{subject to} &
Ax = b
\\ &
x\succeq 0
\end{array}
$$
- Lagrangian - $L(x,\lambda,\nu) = c^T x - \lambda^T x + \nu^T(Ax-b)$
-
Lagrange dual function
$$
g(\lambda,\nu) = \left\{\begin{array}{ll}
-b^T\nu & A^T\nu - \lambda + c = 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{(A^T\nu + c,\nu)}{A^T\nu +c \succeq0}$
-
maximum cut, sometimes called max-cut, problem, which is NP-hard
$$
\begin{array}{ll}
\mbox{minimize} &
x^T W x
\\
\mbox{subject to} &
x_i^2 = 1, \quad i=1,\ldots,n
\end{array}
$$
where $W\in\symset{n}$
- Lagrangian - $L(x,\nu) = x^T(W+\diag(\nu))x - \ones^T\nu$
-
Lagrange dual function
$$
g(\nu) = \left\{\begin{array}{ll}
-\ones^T\nu
& W + \diag(\nu) \succeq 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{\nu}{W+\diag(\nu)\succeq0}$
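Weak duality here can be spot-checked by brute force for small $n$: the sketch below builds one dual feasible point (the choice $\nu=-\lambda_{\min}(W)\ones$ always makes $W+\diag(\nu)$ positive semidefinite) and compares $-\ones^T\nu$ with the exact primal optimum over all sign vectors (random symmetric $W$, numpy assumed available):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                           # random symmetric matrix

# one easy dual feasible point: nu = -lambda_min(W) * 1
nu = -np.linalg.eigvalsh(W)[0] * np.ones(n)
lower_bound = -np.sum(nu)                   # g(nu) = -1^T nu

# brute-force primal optimum over all 2^n sign vectors
p_star = min(np.array(s) @ W @ np.array(s)
             for s in product([-1.0, 1.0], repeat=n))
```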
-
some trivial problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(x)
\\
\mbox{subject to} &
x=0
\end{array}
$$
- Lagrangian - $L(x,\nu) =f(x)+\nu^Tx$
-
Lagrange dual function
$$
g(\nu) = \inf_{x\in\reals^n} (f(x)+\nu^Tx)
= -\sup_{x\in\reals^n} ((-\nu)^Tx-f(x))
= - f^\ast(-\nu)
$$
- hence, set of dual feasible points is $-\dom f^\ast$, and for every $f:\reals^n\to\reals$ and $\nu\in\reals^n$ $$ -f^\ast(-\nu) \leq f(0) $$
-
minimization with linear inequality and equality constraints
$$
\begin{array}{ll}
\mbox{minimize} &
f(x)
\\
\mbox{subject to} &
Ax\preceq b
\\ &
Cx= d
\end{array}
$$
- Lagrangian - $L(x,\lambda, \nu) = f(x) + \lambda^T(Ax-b) + \nu^T(Cx-d)$
-
Lagrange dual function
$$
g(\lambda,\nu) = -b^T\lambda - d^T\nu - f^\ast(-A^T \lambda - C^T\nu)
$$
- hence, set of dual feasible points is $\set{(\lambda,\nu)}{-A^T\lambda - C^T\nu \in \dom f^\ast, \lambda\succeq 0}$
-
equality constrained norm minimization
$$
\begin{array}{ll}
\mbox{minimize} &
\|x\|
\\
\mbox{subject to} &
Ax = b
\end{array}
$$
- Lagrangian - $L(x,\nu) = \|x\| + \nu^T(Ax-b)$
-
Lagrange dual function
$$
g(\nu) = -b^T\nu -\sup_{x\in\reals^n} ((-A^T\nu)^Tx - \|x\|)
= \left\{\begin{array}{ll}
-b^T \nu&\|A^T\nu\|_\ast\leq1
\\
- \infty & \mbox{otherwise}
\end{array}\right.
$$
- hence, set of dual feasible points is $\set{\nu}{\|A^T\nu\|_\ast \leq1}$
-
entropy maximization
$$
\entmax{primal}
$$
where domain of objective function is $\pprealk{n}$
- Lagrangian - $L(x,\lambda,\nu) = \sum_{i=1}^n x_i\log x_i + \lambda^T(Ax-b) + \nu(\ones^Tx-1)$
- Lagrange dual function $$ g(\lambda,\nu) = \entmax{dual fcn} $$ obtained using $f^\ast(y) = \sum_{i=1}^n \exp(y_i-1)$ where $a_i$ is $i$-th column vector of $A$
-
minimum volume covering ellipsoid
$$
\minvolcovering{primal}
$$
where domain of objective function is $\posdefset{n}$
- Lagrangian - $L(X,\lambda) = -\log \det X + \sum_{i=1}^m \lambda_i(a_i^T X a_i - 1)$
- Lagrange dual function $$ g(\lambda) = \minvolcovering{dual fcn} $$ obtained using $f^\ast(Y) = -\log\det(-Y) - n$
Best lower bound
- for every $(\lambda,\nu)$ with $\lambda\succeq 0$, Lagrange dual function $g(\lambda,\nu)$ (in ) provides lower bound for optimal value $p^\ast$ for optimization problem in
-
natural question to ask is
- how good is the lower bound?
- what is best lower bound we can achieve?
- these questions lead to definition of Lagrange dual problem
Lagrange dual problems
- original problem in , (sometimes) called primal problem
- domain is $\reals^m\times \reals^p$
- dual feasibility defined in page~, i.e., $(\lambda,\nu)$ satisfying $\lambda \succeq 0$ and $g(\lambda,\nu) > -\infty$, indeed means feasibility for Lagrange dual problem
- $d^\ast = \sup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0}$, called dual optimal value
- $(\lambda^\ast,\nu^\ast) = \argsup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0}$, said to be dual optimal or called optimal Lagrange multipliers (if exists)
- Lagrange dual problem in is convex optimization (even though original problem is not) since $g(\lambda,\nu)$ is always concave and is maximized
Making dual constraints explicit dual problems
- (our specific) way we define Lagrange dual function in as function $g$ of $\reals^m \times \reals^p$ into $\reals\cup\{-\infty\}$ means $\dom g = \reals^m\times\reals^p$
- however, in many cases, feasible set $\set{(\lambda,\nu)}{\lambda \succeq 0,\; g(\lambda,\nu) > -\infty}$ is proper subset of $\reals^m\times\reals^p$
- can make this implicit feasibility condition explicit by adding it as constraint (as shown in following examples)
Lagrange dual problems associated with LPs
-
standard form LP
- primal problem $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax = b \\ & x\succeq 0 \end{array} $$
-
Lagrange dual problem
$$
\begin{array}{ll}
\mbox{maximize} &
g(\lambda,\nu) = \left\{\begin{array}{ll}
-b^T\nu & A^T\nu - \lambda + c = 0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
\\
\mbox{subject to} &
\lambda \succeq 0
\end{array}
$$
(refer to page~
for Lagrange dual function)
- can make dual feasibility explicit by adding it to constraints as mentioned on page~ $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu \\ \mbox{subject to} & \lambda \succeq 0 \\ & A^T\nu - \lambda + c = 0 \end{array} $$
- can further simplify problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu \\ \mbox{subject to} & A^T\nu + c \succeq 0 \end{array} $$
- last problem is inequality form LP
- all three problems are equivalent, but not the same problem
- will, however, with abuse of terminology, refer to all three problems as Lagrange dual problem
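The two problems can be solved numerically on a tiny made-up instance; as the later discussion of LP strong duality predicts, the optimal values coincide:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# standard form primal: minimize c^T x s.t. Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b)

# simplified dual: maximize -b^T nu s.t. A^T nu + c >= 0,
# passed to linprog as: minimize b^T nu s.t. -A^T nu <= c, nu free
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)])

p_star = primal.fun      # attained at x = (1, 0)
d_star = -dual.fun       # attained at nu = -1
```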
-
inequality form LP
- primal problem $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax \preceq b \end{array} $$
- Lagrangian $$ L(x,\lambda) = c^Tx + \lambda^T(Ax-b) $$
- Lagrange dual function $$ g(\lambda) = -b^T\lambda + \inf_{x\in\reals^n} (c+A^T\lambda)^T x = \left\{\begin{array}{ll} -b^T\lambda & A^T\lambda + c =0 \\ -\infty & \mbox{otherwise} \end{array}\right. $$
-
Lagrange dual problem
$$
\begin{array}{ll}
\mbox{maximize} &
g(\lambda)
= \left\{\begin{array}{ll}
-b^T\lambda & A^T\lambda + c =0
\\
-\infty & \mbox{otherwise}
\end{array}\right.
\\
\mbox{subject to} &
\lambda \succeq 0
\end{array}
$$
- can make dual feasibility explicit by adding it to constraints as mentioned on page~ $$ \begin{array}{ll} \mbox{maximize} & -b^T\lambda \\ \mbox{subject to} & A^T\lambda + c = 0 \\ & \lambda \succeq 0 \end{array} $$
- dual problem is standard form LP
- thus, dual of standard form LP is inequality form LP and vice versa
- also, for both cases, dual of dual is same as primal problem
Lagrange dual problem of equality constrained optimization problem
- equality constrained optimization problem $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & Ax = b \end{array} $$
- dual function $$ \begin{eqnarray*} g(\nu) & = & \inf_{x\in\dom \fobj} (\fobj(x) + \nu^T(Ax-b)) = -b^T\nu - \sup_{x\in\dom \fobj}(-\nu^TAx -\fobj(x)) \\ & = & -b^T\nu - {\fobj}^\ast(-A^T\nu) \end{eqnarray*} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu - {\fobj}^\ast(-A^T\nu) \end{array} $$
Lagrange dual problem associated with equality constrained quadratic program
-
strictly convex quadratic problem, i.e., $P\in\posdefset{n}$
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = x^TPx + q^T x + r
\\
\mbox{subject to} &
Ax=b
\end{array}
$$
- conjugate function of objective function $$ {\fobj}^\ast(y) = (y-q)^TP^{-1}(y-q)/4 - r = y^TP^{-1}y/4 -q^TP^{-1}y/2 + q^TP^{-1}q/4 -r $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -\nu^T (AP^{-1}A^T)\nu /4 -(b + A P^{-1} q/2)^T\nu - q^TP^{-1}q/4 +r \end{array} $$
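The closed-form dual can be checked against the primal KKT system on random data; $P\succ 0$ and $A$ full row rank make both solves well posed, and the multiplier from the KKT system should match the dual maximizer:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 2
P = np.eye(n) + 0.1 * np.ones((n, n))      # P > 0: strictly convex objective
q, b = rng.normal(size=n), rng.normal(size=p)
A = rng.normal(size=(p, n))
r = 1.0

# primal optimum from the KKT system: 2 P x + q + A^T nu = 0, A x = b
K = np.block([[2 * P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x, nu = sol[:n], sol[n:]
p_star = x @ P @ x + q @ x + r

# dual: maximize -nu^T M nu/4 - w^T nu - q^T P^{-1} q/4 + r,
# with M = A P^{-1} A^T and w = b + A P^{-1} q / 2
Pinv = np.linalg.inv(P)
M = A @ Pinv @ A.T
w = b + A @ Pinv @ q / 2
nu_star = -2 * np.linalg.solve(M, w)       # stationarity: -M nu/2 - w = 0
d_star = -nu_star @ M @ nu_star / 4 - w @ nu_star - q @ Pinv @ q / 4 + r
```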
Lagrange dual problems associated with nonconvex quadratic problems
-
primal problem
$$
\noncvxquadprob{primal}
$$
where $A\in\symset{n}$, $A\not\in\possemidefset{n}$, and $b\in\reals^n$
- since $A\not\succeq 0$, not convex optimization problem
- sometimes called trust region problem, arising when minimizing second-order approximation of function over bounded region
- Lagrange dual function $$ g(\lambda) = \noncvxquadprob{dual fcn} $$ where $(A+\lambda I)^\dagger$ is pseudo-inverse of $A+\lambda I$
-
Lagrange dual problem
$$
\noncvxquadprob{dual}
$$
where optimization variable is $\lambda \in\reals$
- note we do not need constraint $\lambda \geq0$ since it is implied by $A+\lambda I \succeq 0$
- though not obvious from its appearance, it is (of course) convex optimization problem (by definition of Lagrange dual function, which is always concave)
- can be expressed as $$ \begin{array}{ll} \mbox{maximize} & -\sum_{i=1}^n (q_i^Tb)^2/(\lambda_i + \lambda) - \lambda \\ \mbox{subject to} & \lambda \geq - \lambda_\mathrm{min}(A) \end{array} $$ where $\lambda_i$ and $q_i$ are eigenvalues and corresponding orthonormal eigenvectors of $A$; when $\lambda_i + \lambda=0$ for some $i$, we interpret $(q_i^Tb)^2/0$ as $0$ if $q_i^Tb=0$ and $\infty$ otherwise
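Strong duality (established later for this problem) can be observed numerically on a small indefinite instance; the primal is assumed here to be the standard trust region form, minimize $x^TAx+2b^Tx$ subject to $x^Tx\leq1$, which matches the eigenvalue expression above:

```python
import numpy as np
from scipy.optimize import brentq

# assumed primal form: minimize x^T A x + 2 b^T x  s.t.  x^T x <= 1
A = np.array([[1.0, 0.0], [0.0, -2.0]])      # indefinite, so not convex
b = np.array([0.5, 0.3])
eigvals, Q = np.linalg.eigh(A)               # ascending eigenvalues
c = Q.T @ b                                  # c_i = q_i^T b

# both the boundary primal solution and the dual maximizer satisfy
# sum_i c_i^2 / (lambda_i + lam)^2 = 1 with lam > -lambda_min(A)
phi = lambda lam: np.sum(c**2 / (eigvals + lam) ** 2) - 1.0
lam = brentq(phi, -eigvals[0] + 1e-9, -eigvals[0] + 100.0)

x = -np.linalg.solve(A + lam * np.eye(2), b)  # primal solution, ||x|| = 1
p_star = x @ A @ x + 2 * b @ x
d_star = -np.sum(c**2 / (eigvals + lam)) - lam
```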
Weak duality
- since $g(\lambda,\nu)\leq p^\ast$ for every $\lambda\succeq 0$, we have $$ d^\ast = \sup\set{g(\lambda,\nu)}{\lambda\in\reals^m,\:\nu\in\reals^p,\:\lambda\succeq 0} \leq p^\ast $$
- $d^\ast$ is best lower bound for primal problem that can be obtained from Lagrange dual function (by definition)
-
weak duality holds even when $d^\ast$ and/or $p^\ast$ are not finite, e.g.
- if primal problem is unbounded below so that $p^\ast=-\infty$, must have $d^\ast = -\infty$, i.e., dual problem is infeasible
- conversely, if dual problem is unbounded above so that $d^\ast = \infty$, must have $p^\ast=\infty$, i.e., primal problem is infeasible
Optimal duality gap
-
dual optimal value $d^\ast$ sometimes used as lower bound for optimal value of problem which is difficult to solve
-
for example,
dual problem
of max-cut problem (on page~),
which is NP-hard,
is
$$
\begin{array}{ll}
\mbox{minimize} &
-\ones^T \nu
\\
\mbox{subject to} &
W + \diag(\nu) \succeq 0
\end{array}
$$
where optimization variable is $\nu\in\reals^n$
- the dual problem can be solved very efficiently using polynomial-time algorithms while primal problem cannot be solved unless $n$ is very small
Strong duality
-
strong duality does not hold in general
- if it always held, max-cut problem, which is NP-hard, could be solved in polynomial time, which would be one of biggest breakthroughs in field of theoretical computer science
- may mean some of strongest cryptography methods, e.g., homomorphic encryption, can be broken
Slater's theorem
- exist many conditions which guarantee strong duality, which are called constraint qualifications - one of them is Slater's condition
- such condition, called Slater's condition
- such point, (sometimes) said to be strictly feasible
Strong duality for LS solution of linear equations
- primal problem $$ \lssollineqs{primal} $$
- dual problem $$ \lssollineqs{dual} $$ (refer to page~ for Lagrange dual function)
-
“dual is always feasible''
and
“primal is feasible $\Rightarrow$ Slater's condition holds'',
thus
Slater's theorem ()
implies,
exist only three cases
- $(d^\ast = p^\ast \in \reals)$ or $(d^\ast \in \reals\:\&\: p^\ast = \infty)$ or $(d^\ast = p^\ast = \infty)$
- if primal is infeasible, i.e., $b\not\in\range(A)$, then exists $z$ such that $A^Tz=0$ and $b^Tz \neq0$, and line $\set{tz}{t\in\reals}$ makes dual problem unbounded above, hence $d^\ast=\infty$
- hence, strong duality always holds, i.e., $(d^\ast= p^\ast \in \reals)$ or $(d^\ast = p^\ast = \infty)$
Strong duality for LP
- every LP either is infeasible or satisfies Slater's condition
-
dual of LP is LP,
hence, Slater's theorem ()
implies
- if primal is feasible, either $(d^\ast=p^\ast= -\infty)$ or $(d^\ast=p^\ast\in\reals)$
- if dual is feasible, either $(d^\ast=p^\ast= \infty)$ or $(d^\ast=p^\ast\in\reals)$
-
only other case left is $(d^\ast=-\infty\;\&\;p^\ast= \infty)$
- indeed, this pathological case can happen
Strong duality for entropy maximization
- primal problem $$ \entmax{primal} $$
- dual problem (refer to page~ for Lagrange dual function) $$ \entmax{dual} $$
- dual problem is feasible, hence, Slater's theorem () implies, if exists $x\succ 0$ with $Ax \preceq b$ and $\ones^T x =1$, strong duality holds, and indeed $d^\ast=p^\ast\in\reals$
- by the way, can simplify dual problem by maximizing dual objective function over $\nu$ $$ \entmax{simplied dual} $$ which is geometric program in convex form () with nonnegativity constraint
Strong duality for minimum volume covering ellipsoid
- primal problem $$ \minvolcovering{primal} $$ where $\optdomain=\posdefset{n}$
- dual problem $$ \minvolcovering{dual} $$ (refer to page~ for Lagrange dual function)
- $X=\alpha I$ with small enough $\alpha>0$ strictly satisfies primal's constraints, hence Slater's condition always holds, thus, strong duality always holds, i.e., $(d^\ast = p^\ast \in \reals)$ or $(d^\ast = p^\ast = -\infty)$
- in fact, $\range(a_1,\ldots,a_m) = \reals^n$ if and only if $d^\ast=p^\ast\in\reals$
Strong duality for trust region nonconvex quadratic problems
- one of rare occasions in which strong duality obtains for nonconvex problems
- primal problem $$ \noncvxquadprob{primal} $$ where $A\in\symset{n}$, $A\not\in\possemidefset{n}$, and $b\in\reals^n$
- Lagrange dual problem (page~) $$ \noncvxquadprob{dual} $$
- strong duality always holds and $d^\ast=p^\ast\in\reals$ (since dual problem is feasible - large enough $\lambda$ satisfies dual constraints)
- in fact, exists stronger result - strong duality holds for optimization problem with quadratic objective and one quadratic inequality constraint, provided Slater's condition holds
Matrix games using mixed strategies
-
matrix game - consider game with two players $A$ and $B$
- player $A$ makes choice $1\leq a\leq n$, player $B$ makes choice $1\leq b\leq m$, then player $A$ makes payment of $P_{ab}$ to player $B$
- matrix $P\in\reals^{n\times m}$, called payoff matrix
- player $A$ tries to pay as little as possible & player $B$ tries to receive as much as possible
- players use randomized or mixed strategies, i.e., each player makes choice randomly and independently of other player's choice according to probability distributions $$ \Prob(a=i) = u_i,\; 1\leq i\leq n \quad \Prob(b=j) = v_j,\; 1\leq j\leq m $$
- expected payoff (from player $A$ to player $B$) $$ \sum_i \sum_j u_iv_jP_{ij} = u^TPv $$
-
assume player $A$'s strategy is known to player $B$
- player $B$ will choose $v$ to maximize $u^TPv$ $$ \sup\set{u^TPv}{v\succeq 0,\; \ones^Tv=1} = \max_{1\leq j\leq m} (P^Tu)_j $$
- player $A$ (assuming that player $B$ will employ above strategy to maximize payment) will choose $u$ to minimize payment $$ \begin{array}{ll} \mbox{minimize} & \max_{1\leq j\leq m} (P^Tu)_j \\ \mbox{subject to} & u\succeq 0\quad \ones^Tu=1 \end{array} $$
-
assume player $B$'s strategy is known to player $A$
- then player $B$ will do same to maximize payment (assuming that player $A$ will employ such strategy to minimize payment) $$ \begin{array}{ll} \mbox{maximize} & \min_{1\leq i\leq n} (Pv)_i \\ \mbox{subject to} & v\succeq 0\quad \ones^Tv=1 \end{array} $$
Strong duality for matrix games using mixed strategies
- in matrix game, can guess that in first problem, player $B$ has advantage over player $A$ since $A$'s strategy is exposed to $B$, and vice versa in second problem, hence optimal value of first problem is no smaller than that of second problem
- surprisingly, no one has advantage over the other, i.e., optimal values of two problems are same - will show this
- first observe both problems are (convex) piecewise-linear optimization problems
-
formulate first problem as LP
$$
\begin{array}{ll}
\mbox{minimize} &
t
\\
\mbox{subject to} &
u\succeq 0 \quad \ones^T u =1 \quad P^T u \preceq t\ones
\end{array}
$$
- Lagrangian $$ L(u,t,\lambda_1, \lambda_2,\nu) = \nu + (1-\ones^T\lambda_1)t + (P\lambda_1 - \nu \ones - \lambda_2)^Tu $$
- Lagrange dual function $$ g(\lambda_1, \lambda_2,\nu) = \left\{\begin{array}{ll} \nu & \ones^T\lambda_1 = 1 \;\&\; P\lambda_1 - \nu \ones = \lambda_2 \\ -\infty & \mbox{otherwise} \end{array}\right. $$
- Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & \nu \\ \mbox{subject to} & \ones^T\lambda_1 = 1 \quad P\lambda_1 - \nu \ones = \lambda_2 \\ & \lambda_1 \succeq 0 \quad \lambda_2 \succeq 0 \end{array} $$
- eliminating $\lambda_2$ gives below Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & \nu \\ \mbox{subject to} & \lambda_1 \succeq 0 \quad \ones^T\lambda_1 = 1 \quad P\lambda_1 \succeq \nu \ones \end{array} $$ which is equivalent to second problem in matrix game
- weak duality confirms “player who knows other player's strategy has advantage or is at least on par''
- moreover, primal problem satisfies Slater's condition, hence strong duality always holds, and dual is feasible, hence $d^\ast=p^\ast\in\reals$, i.e., regardless of who knows other player's strategy, no player has advantage
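Both players' LPs can be handed to `linprog` for a made-up payoff matrix (matching pennies), where the common optimal value is $0$:

```python
import numpy as np
from scipy.optimize import linprog

P = np.array([[1.0, -1.0], [-1.0, 1.0]])    # matching pennies payoff
n, m = P.shape

# player A: minimize t s.t. u >= 0, 1^T u = 1, P^T u <= t 1
res_a = linprog(
    c=[0.0] * n + [1.0],
    A_ub=np.hstack([P.T, -np.ones((m, 1))]), b_ub=np.zeros(m),
    A_eq=[[1.0] * n + [0.0]], b_eq=[1.0],
    bounds=[(0, None)] * n + [(None, None)],
)

# player B: maximize s s.t. v >= 0, 1^T v = 1, P v >= s 1
res_b = linprog(
    c=[0.0] * m + [-1.0],
    A_ub=np.hstack([-P, np.ones((n, 1))]), b_ub=np.zeros(n),
    A_eq=[[1.0] * m + [0.0]], b_eq=[1.0],
    bounds=[(0, None)] * m + [(None, None)],
)

value_a = res_a.fun        # game value seen by player A
value_b = -res_b.fun       # game value seen by player B
```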
Geometric interpretation of duality
- assume (not necessarily convex) optimization problem in
- define graph $$ G = \set{(\fie(x), \feq(x), \fobj(x))}{x\in\optdomain} \subset \reals^m \times \reals^p \times \reals $$
- for every $\lambda\succeq 0$ and $\nu$ $$ \begin{eqnarray*} p^\ast &=& \inf\set{t}{(u,v,t) \in G, u\preceq 0, v = 0} \\ & \geq & \inf\set{t+\lambda^Tu + \nu^T v}{(u,v,t) \in G, u\preceq 0, v = 0} \\ & \geq & \inf\set{t+\lambda^Tu + \nu^T v}{(u,v,t) \in G} = g(\lambda,\nu) \end{eqnarray*} $$ where second inequality comes from $\set{(u,v,t)}{(u,v,t) \in G, u\preceq 0, v = 0} \subset G$
- above establishes weak duality using graph
- last equality implies that for every $(u,v,t)\in G$ $$ (\lambda, \nu, 1)^T (u,v,t) \geq g(\lambda,\nu) $$ hence if $g(\lambda,\nu) > -\infty$, $(\lambda, \nu, 1)$ and $g(\lambda,\nu)$ define nonvertical supporting hyperplane for $G$ - nonvertical because third component is nonzero
- the figure shows $G$ as area inside closed curve contained in $\reals^m\times\reals^p\times\reals$ where $m=1$ and $p=0$, together with primal optimal value $p^\ast$ and supporting hyperplane $\lambda u + t = g(\lambda)$
- the figure shows three hyperplanes determined by three values for $\lambda$, one of which $\lambda^\ast$ is optimal solution for dual problem
Epigraph interpretation of duality
- define extended graph over $G$ - sort of epigraph of $G$ $$ \begin{eqnarray*} H &=& G + \preals^m \times \{0\} \times \preals \\ & = & \set{(u, v, t)}{x\in\optdomain, \fie(x) \preceq u, \feq(x) = v, \fobj(x)\leq t } \end{eqnarray*} $$
- if $\lambda\succeq 0$, $g(\lambda,\nu) = \inf\set{(\lambda,\nu,1)^T(u,v,t)}{(u,v,t) \in H}$, thus $$ (\lambda,\nu,1)^T (u,v,t) \geq g(\lambda,\nu) $$ defines nonvertical supporting hyperplane for $H$
- now $p^\ast = \inf\set{t}{(0,0,t)\in H}$, hence $(0,0,p^\ast) \in \boundary H$, hence $$ p^\ast =(\lambda,\nu,1)^T (0,0,p^\ast) \geq g(\lambda,\nu) $$
- once again establishes weak duality
- the figure shows epigraph interpretation
Proof of strong duality under constraint qualification
- now we show proof of strong duality - this is one of rare cases where proof is shown in main slides instead of “selected proofs'' section (as with Galois theory) since (I hope) it will give you some good intuition about why strong duality holds for (most) convex optimization problems
- assume Slater's condition holds, i.e., $\fobj$ and $\fie$ are convex, $\feq$ is affine, and exists $x\in\optdomain$ such that $\fie(x) \prec 0$ and $\feq(x) = 0$
- further assume $\optdomain$ has interior (hence, $\relint \optdomain = \interior{\optdomain}$) and $\rank A=p$
- assume $p^\ast\in\reals$ - since exists feasible $x$, the other possibility is $p^\ast = -\infty$, but then, $d^\ast = -\infty$, hence strong duality holds
- $H$ is convex
- now define $$ B = \set{(0,0,s)\in\reals^m\times\reals^p\times\reals}{s<p^\ast} $$
- then $B\cap H=\emptyset$, hence separating hyperplane theorem implies exist $(\tilde{\lambda}, \tilde{\nu}, \mu)\neq 0$ and $\alpha$ such that $$ \begin{eqnarray*} (u,v,t) \in H &\Rightarrow& \tilde{\lambda}^T u + \tilde{\nu}^T v + \mu t \geq \alpha \\ (u,v,t) \in B &\Rightarrow& \tilde{\lambda}^T u + \tilde{\nu}^T v + \mu t \leq \alpha \end{eqnarray*} $$
-
then $\tilde{\lambda} \succeq 0$ & $\mu\geq0$ - assume $\mu>0$
- can prove case $\mu=0$ too, but it is kind of tedious, plus, whole purpose is to provide good intuition, so will not do it here
- above second inequality implies $\mu p^\ast \leq \alpha$, and first inequality implies for every $x\in\optdomain$ $$ \mu L(x,\tilde{\lambda}/\mu, \tilde{\nu}/\mu) = \tilde{\lambda}^T \fie(x) + \tilde{\nu}^T \feq(x) + \mu \fobj(x) \geq \alpha \geq \mu p^\ast $$ thus, $$ g(\tilde{\lambda}/\mu, \tilde{\nu}/\mu) \geq p^\ast $$
- finally, weak duality implies $$ g(\lambda,\nu) = p^\ast $$ where $\lambda = \tilde{\lambda}/\mu$ & $\nu = \tilde{\nu}/\mu$
Max-min characterization of weak and strong dualities
- note $$ \begin{eqnarray*} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) &=& \sup_{\lambda\succeq 0, \nu} \left( \fobj(x) + \lambda^T \fie(x) + \nu^T \feq(x) \right) \\ & = & \left\{\begin{array}{ll} \fobj(x) & x\in\optfeasset \\ \infty & \mbox{otherwise} \end{array}\right. \end{eqnarray*} $$
- thus $p^\ast = \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu)$ whereas $d^\ast = \sup_{\lambda\succeq 0,\nu} \inf_{x\in\optdomain} L(x,\lambda,\nu)$
- weak duality means $$ \sup_{\lambda\succeq 0, \nu} \inf_{x\in\optdomain} L(x,\lambda,\nu) \leq \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) $$
- strong duality means $$ \sup_{\lambda\succeq 0, \nu} \inf_{x\in\optdomain} L(x,\lambda,\nu) = \inf_{x\in\optdomain} \sup_{\lambda\succeq 0, \nu} L(x,\lambda,\nu) $$
Max-min inequality
- indeed, max-min inequality $\sup_{y\in Y} \inf_{x\in X} f(x,y) \leq \inf_{x\in X} \sup_{y\in Y} f(x,y)$ holds for general $f:X\times Y\to\reals$, $X$, and $Y$
- equality holds, e.g., when $X=\optdomain$, $Y=\prealk{m} \times \reals^p$, and $f$ is Lagrangian of optimization problem (in ) for which strong duality holds
Saddle-points
- if assumption in holds, $x^\ast$ minimizes $f(x,y^\ast)$ over $X$ and $y^\ast$ maximizes $f(x^\ast,y)$ over $Y$ $$ \sup_{y\in Y} f(x^\ast,y) = f(x^\ast,y^\ast) = \inf_{x\in X} f(x,y^\ast) $$
Saddle-point interpretation of strong duality
- for primal optimum $x^\ast$ and dual optimum $(\lambda^\ast,\nu^\ast)$ $$ g(\lambda^\ast,\nu^\ast) \leq L(x^\ast, \lambda^\ast, \nu^\ast) \leq \fobj(x^\ast) $$
-
if strong duality holds,
for every $x\in\optdomain$, $\lambda\succeq 0$, and $\nu$
$$
L(x^\ast,\lambda,\nu)
\leq
\fobj(x^\ast) = L(x^\ast,\lambda^\ast,\nu^\ast) = g(\lambda^\ast,\nu^\ast)
\leq
L(x,\lambda^\ast, \nu^\ast)
$$
- thus $x^\ast$ and $(\lambda^\ast,\nu^\ast)$ form saddle-point of Lagrangian
-
conversely, if $\tilde{x}$ and $(\tilde{\lambda},\tilde{\nu})$ are saddle-point of Lagrangian,
i.e.,
for every $x\in\optdomain$, $\lambda\succeq 0$, and $\nu$
$$
L(\tilde{x}, {\lambda},{\nu})
\leq
L(\tilde{x}, \tilde{\lambda},\tilde{\nu})
\leq
L({x}, \tilde{\lambda},\tilde{\nu})
$$
- hence $g(\tilde{\lambda},\tilde{\nu}) = \inf_{x\in\optdomain} L(x,\tilde{\lambda},\tilde{\nu}) = L(\tilde{x}, \tilde{\lambda},\tilde{\nu}) = \sup_{\lambda\succeq 0, \nu} L(\tilde{x},{\lambda},{\nu}) = \fobj(\tilde{x})$, thus $g(\lambda^\ast,\nu^\ast) \leq g(\tilde{\lambda}, \tilde{\nu})$ & $\fobj(\tilde{x}) \leq \fobj(x^\ast)$
- thus $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ are primal and dual optimal
Game interpretation
- assume two players play zero-sum game with payment function $f:X\times Y\to \reals$ where player $A$ pays player $B$ amount equal to $f(x,y)$ when player $A$ chooses $x$ and player $B$ chooses $y$
- player $A$ will try to minimize $f(x,y)$ and player $B$ will try to maximize $f(x,y)$
-
assume player $A$ chooses first
then player $B$ chooses after learning opponent's choice
- if player $A$ chooses $x$, player $B$ will choose $\argsup_{y\in Y} f(x,y)$
- knowing that, player $A$ will first choose $\arginf_{x\in X} \sup_{y\in Y} f(x,y)$
- hence payment will be $\inf_{x\in X} \sup_{y\in Y} f(x,y)$
- if player $B$ makes her choice first, opposite happens, i.e., payment will be $\sup_{y\in Y} \inf_{x\in X} f(x,y)$
- max-min inequality says $$ \sup_{y\in Y} \inf_{x\in X} f(x,y) \leq \inf_{x\in X} \sup_{y\in Y} f(x,y) $$ i.e., whoever chooses later has advantage, which is similar to, or rather same as, matrix games using mixed strategies on page~
- saddle-point for $f$ (and $X$ and $Y$), $(x^\ast,y^\ast)$, called solution of game - $x^\ast$ is optimal choice for player $A$ and $y^\ast$ is optimal choice for player $B$
Game interpretation for weak and strong dualities
- assume payment function in zero-sum game on page~ is Lagrangian of optimization problem in
- assume that $X=\xdomain$ and $Y=\prealk{m} \times \reals^p$
- if player $A$ chooses first, knowing that player $B$ will choose $\argsup_{(\lambda,\nu)\in Y}L(x,\lambda,\nu)$, she will choose $x^\ast = \arginf_{x\in\xdomain} \sup_{(\lambda,\nu)\in Y}L(x,\lambda,\nu)$
- likewise, player $B$ will choose $(\lambda^\ast,\nu^\ast) = \argsup_{(\lambda,\nu)\in Y} \inf_{x\in\xdomain} L(x,\lambda,\nu)$
- optimal duality gap $p^\ast - d^\ast$ equals advantage of player who goes second
- if strong duality holds, $(x^\ast, \lambda^\ast, \nu^\ast)$ is solution of game, in which case no one has advantage
Certificate of suboptimality
- dual feasible point $(\lambda,\nu)$ provides bound on degree of suboptimality of current solution
- assume $x$ is feasible solution, then $$ \fobj(x) - p^\ast \leq \fobj(x) - g(\lambda,\nu) $$ guarantees that $\fobj(x)$ is no further than $\epsilon = \fobj(x) - g(\lambda,\nu)$ from optimal value $p^\ast$ (even though we do not know optimal value)
- for this reason, $(\lambda,\nu)$, called certificate of suboptimality
- $x$ is $\epsilon$-suboptimal for primal problem and $(\lambda,\nu)$ is $\epsilon$-suboptimal for dual problem
- strong duality means we can find certificates of arbitrarily small suboptimality
Complementary slackness
- assume strong duality holds for optimization problem in and assume $x^\ast$ is primal optimum and $(\lambda^\ast,\nu^\ast)$ is dual optimum, then $$ \fobj(x^\ast) = L(x^\ast,\lambda^\ast,\nu^\ast) = \fobj(x^\ast) + {\lambda^\ast}^T \fie(x^\ast) + {\nu^\ast}^T \feq(x^\ast) $$
- $\feq(x^\ast)=0$ implies ${\lambda^\ast}^T \fie(x^\ast)=0$
- then $\lambda^\ast \succeq 0$ and $\fie(x^\ast) \preceq 0$ imply $$ \lambda_i^\ast \fie_i(x^\ast) = 0 \quad i=1,\ldots,m $$
KKT optimality conditions
KKT necessary for optimality with strong duality
- when strong duality holds, KKT optimality conditions are necessary for primal and dual optimality
- or equivalently
- primal and dual optimality with strong duality imply KKT optimality conditions
KKT and convexity sufficient for optimality with strong duality
- assume convex optimization problem where $\fobj$, $\fie$, and $\feq$ are all differentiable and ${x}\in\optdomain$ and $({\lambda}, {\nu})\in\reals^m\times\reals^p$ satisfying KKT conditions, i.e. $$ \fie({x}) \preceq 0, \; \feq({x}) = 0 , \; {\lambda} \succeq 0 , \; {\lambda}^T \fie({x}) = 0 , \; \nabla_x L({x}, {\lambda},{\nu}) = 0 $$
- since $L(x,\lambda,\nu)$ is convex in $x$ for $\lambda\succeq 0$, i.e., each of $\fobj(x)$, $\lambda^T \fie(x)$, and $\nu^T \feq(x)$ is convex in $x$, vanishing gradient implies $x$ achieves infimum of Lagrangian, hence $$ g(\lambda,\nu) = L(x,\lambda,\nu) = \fobj(x) + \lambda^T \fie(x) + \nu^T \feq(x) = \fobj(x) $$ where last equality uses $\lambda^T \fie(x) = 0$ and $\feq(x)=0$
- thus, strong duality holds, i.e., $x$ and $(\lambda,\nu)$ are primal and dual optimal solutions with zero duality gap
- for convex optimization problem, KKT optimality conditions are sufficient for primal and dual optimality with strong duality
- or equivalently
- KKT optimality conditions and convexity imply primal and dual optimality and strong duality
-
together with
implies
that
for convex optimization problem
- KKT optimality conditions are necessary and sufficient for primal and dual optimality with strong duality
Solving primal problems via dual problems
- when strong duality holds, can retrieve primal optimum from dual optimum since primal optimal solution is minimizer of $$ L(x,\lambda^\ast,\nu^\ast) $$ where $(\lambda^\ast, \nu^\ast)$ is dual optimum (provided minimizer exists and is unique)
-
example - entropy maximization
($\optdomain = \pprealk{n}$)
- primal problem - $$ \begin{array}{ll} \mbox{minimize} & \sum_{i=1}^n x_i \log x_i \\ \mbox{subject to} & Ax \preceq b \\ & \ones^T x = 1 \end{array} $$
- dual problem - $$ \begin{array}{ll} \mbox{maximize} & -b^T\lambda - \nu - e^{-\nu-1} \sum_{i=1}^n \exp(-(A^T\lambda)_i) \\ \mbox{subject to} & \lambda \succeq 0 \end{array} $$
- given dual optimum $(\lambda^\ast,\nu^\ast)$, primal optimum is $$ x^\ast = \argmin_{x\in\optdomain} \left( \sum x_i \log x_i + {\lambda^\ast}^T (Ax-b) + \nu^\ast(\ones^Tx -1) \right) $$
- $\nabla_x L(x,\lambda^\ast,\nu^\ast) = \log x + A^T \lambda^\ast + (1+\nu^\ast)\ones$, hence $$ x^\ast = \exp(-(A^T \lambda^\ast + (1+\nu^\ast)\ones)) $$
Perturbed optimization problems
- original problem in with perturbed constraints $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq u \\ & \feq(x) =v \end{array} $$ where $u\in\reals^m$ and $v\in\reals^p$
- define $p^\ast(u,v)$ as optimal value of above perturbed problem, i.e. $$ p^\ast(u,v) = \inf\set{\fobj(x)}{x\in\optdomain, \fie(x) \preceq u, \feq(x) = v} $$ which is convex when problem is convex optimization problem - note $p^\ast(0,0)=p^\ast$
- assume $(\lambda^\ast,\nu^\ast)$ is dual optimum and strong duality holds; then for every feasible $x$ for perturbed problem $$ p^\ast(0,0)=g(\lambda^\ast,\nu^\ast) \leq \fobj(x) + {\lambda^\ast}^T \fie(x) + {\nu^\ast}^T \feq(x) \leq \fobj(x) + {\lambda^\ast}^T u + {\nu^\ast}^T v $$ thus $$ p^\ast(0,0)\leq p^\ast(u,v) + {\lambda^\ast}^T u + {\nu^\ast}^T v $$ hence $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$
- the figure shows this for optimization problem with one inequality constraint and no equality constraint
Global sensitivity analysis via perturbed problems
- recall $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$
-
interpretations
- if $\lambda^\ast_i$ is large, tightening $i$-th inequality constraint increases optimal value a lot
- if $\lambda^\ast_i$ is small, relaxing $i$-th inequality constraint does not decrease optimal value much
- if $|\nu^\ast_i|$ is large, reducing $v_i$ when $\nu^\ast_i>0$ or increasing $v_i$ when $\nu^\ast_i<0$ increases optimal value a lot
- if $|\nu^\ast_i|$ is small, increasing $v_i$ when $\nu^\ast_i>0$ or decreasing $v_i$ when $\nu^\ast_i<0$ does not decrease optimal value much
- this gives only lower bounds - will explore local behavior next
Local sensitivity analysis via perturbed problems
-
assume $p^\ast(u,v)$ is differentiable with respect to $u$ and $v$,
i.e., $\nabla_{(u,v)} p^\ast(u,v)$ exists
- then $$ \frac{\partial}{\partial u_i} p^\ast (0,0) = \lim_{h\to 0^+} \frac{p^\ast(he_i,0) - p^\ast(0,0)}{h} \geq \lim_{h\to 0^+} \frac{-{\lambda^\ast}^T (he_i) }{h} = -\lambda_i^\ast $$ and $$ \frac{\partial}{\partial u_i} p^\ast (0,0) = \lim_{h\to 0^-} \frac{p^\ast(he_i,0) - p^\ast(0,0)}{h} \leq \lim_{h\to 0^-} \frac{-{\lambda^\ast}^T (he_i) }{h} = -\lambda_i^\ast $$
- obtain same result for $v_i$, hence $$ \nabla_u\; p^\ast (0,0) = -\lambda^\ast \quad \nabla_v\; p^\ast (0,0) = -\nu^\ast $$
- so larger $\lambda^\ast_i$ or $|\nu^\ast_i|$ means larger change in optimal value of perturbed problem when $u_i$ or $v_i$ changes a bit, and vice versa; quantitatively, $\lambda^\ast_i$ and $\nu^\ast_i$ provide exact rate and direction
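the local sensitivity result $\nabla_u p^\ast(0,0) = -\lambda^\ast$ can be checked numerically; a minimal sketch on a toy one-variable problem of my own (not from the text), minimize $x^2$ subject to $1-x\leq u$, where KKT at $u=0$ gives $x^\ast=1$ and $\lambda^\ast=2$:

```python
# toy problem (illustrative): minimize x^2 subject to 1 - x <= u
# perturbed optimum is x*(u) = max(1 - u, 0), so p*(u) = max(1 - u, 0)^2;
# KKT at u = 0: 2 x* - lambda* = 0 with x* = 1, hence lambda* = 2

def p_star(u):
    x = max(1.0 - u, 0.0)  # project unconstrained optimum 0 onto feasible set
    return x * x

lam_star = 2.0

# finite-difference slope of p* at u = 0 should be close to -lambda*
h = 1e-6
fd_slope = (p_star(h) - p_star(0.0)) / h
print(fd_slope)
```

the forward difference returns roughly $-2+h$, matching $-\lambda^\ast$ up to the step size.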
Different dual problems for equivalent optimization problems - 1
-
introducing new variables and equality constraints
for unconstrained problems
-
unconstrained optimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(Ax+b)
\end{array}
$$
- Lagrange dual function is constant, $g = p^\ast$, hence strong duality trivially holds, which, however, does not provide useful information
-
reformulate as equivalent optimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
f(y)
\\
\mbox{subject to} &
Ax+b = y
\end{array}
$$
- Lagrangian - $L(x,y,\nu) = f(y) + \nu^T(Ax+b-y)$
- Lagrange dual function - $$ g(\nu) = \left\{ \begin{array}{ll} b^T\nu - f^\ast(\nu) & A^T\nu = 0 \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T\nu - f^\ast(\nu) \\ \mbox{subject to} & A^T \nu = 0 \end{array} $$
-
examples
-
unconstrained geometric problem
$$
\begin{array}{ll}
\mbox{minimize} &
\log\left(
\sum_{i=1}^m \exp(a_i^Tx + b_i)
\right)
\end{array}
$$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \log\left( \sum_{i=1}^m \exp(y_i) \right) \\ \mbox{subject to} & Ax + b =y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T \nu - \sum_{i=1}^m \nu_i \log \nu_i \\ \mbox{subject to} & \ones^T \nu = 1 \\ & A^T \nu = 0 \\ & \nu \succeq 0 \end{array} $$ which is entropy maximization problem
-
norm minimization problem
$$
\begin{array}{ll}
\mbox{minimize} &
\|Ax-b\|
\end{array}
$$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \|y\| \\ \mbox{subject to} & Ax - b = y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & b^T \nu \\ \mbox{subject to} & \|\nu\|_\ast \leq 1 \\ & A^T \nu =0 \end{array} $$
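for the Euclidean norm (whose dual norm is again the Euclidean norm), zero duality gap for this pair is easy to verify numerically: the normalized least-squares residual is dual feasible and attains the primal optimal value; a small sketch with made-up $A$, $b$:

```python
import numpy as np

# minimize ||Ax - b||_2; dual: maximize b^T nu s.t. ||nu||_2 <= 1, A^T nu = 0
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])  # chosen so b is not in range(A)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_star             # optimal residual
p_star = np.linalg.norm(r)     # primal optimal value

nu = r / p_star                # candidate dual point
# dual feasibility: A^T nu = 0 (normal equations) and ||nu||_2 = 1
print(np.linalg.norm(A.T @ nu))  # ~0
print(b @ nu - p_star)           # ~0: zero duality gap
```

since $A^Tr=0$, we get $b^T\nu = (Ax^\ast + r)^Tr/\|r\|_2 = \|r\|_2 = p^\ast$.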
Different dual problems for equivalent optimization problems - 2
-
introducing new variables and equality constraints
for constrained problems
- inequality constrained optimization problem $$ \begin{array}{ll} \mbox{minimize} & f_0(A_0x+b_0) \\ \mbox{subject to} & f_i(A_ix+b_i) \leq 0\quad i=1,\ldots,m \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & f_0(y_0) \\ \mbox{subject to} & f_i(y_i) \leq 0\quad i=1,\ldots,m \\ & A_i x + b_i = y_i\quad i=0,\ldots,m \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m \nu_i^T b_i - f_0^\ast(\nu_0) - \sum_{i=1}^m \lambda_i f_i^\ast(\nu_i/\lambda_i) \\ \mbox{subject to} & \sum_{i=0}^m A_i^T \nu_i = 0 \\ & \lambda \succeq 0 \end{array} $$
-
examples
-
inequality constrained geometric program
$$
\begin{array}{ll}
\mbox{minimize} &
\log\left(\sum \exp(A_0x + b_0)\right)
\\
\mbox{subject to} &
\log\left(\sum \exp(A_ix + b_i)\right)\leq 0\quad i=1,\ldots,m
\end{array}
$$
where $A_i\in\reals^{K_i\times n}$
and $\exp(z) := (\exp(z_1),\ldots,\exp(z_k))\in\reals^k$
and $\sum z := \sum_{i=1}^k z_i\in\reals$
for $z\in\reals^k$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & \log\left(\sum \exp(y_0)\right) \\ \mbox{subject to} & \log\left(\sum \exp(y_i)\right)\leq 0\quad i=1,\ldots,m \\ & A_i x + b_i = y_i \quad i=0,\ldots,m \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m b_i^T \nu_i - \nu_0^T\log(\nu_0) - \sum_{i=1}^m \nu_i^T\log(\nu_i/\lambda_i) \\ \mbox{subject to} & \nu_i \succeq 0\quad i=0,\ldots,m \\ & \ones^T \nu_0 = 1,\; \ones^T\nu_i=\lambda_i\quad i=1,\ldots,m \\ & \lambda_i\geq 0 \quad i=1,\ldots,m \\ & \sum_{i=0}^m A_i^T\nu_i = 0 \end{array} $$ where $\log(z) := (\log(z_1),\ldots,\log(z_k))\in\reals^k$ for $z\in\pprealk{k}$
- simplified dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & \sum_{i=0}^m b_i^T \nu_i - \nu_0^T\log(\nu_0) - \sum_{i=1}^m \nu_i^T\log(\nu_i/\ones^T\nu_i) \\ \mbox{subject to} & \nu_i \succeq 0\quad i=0,\ldots,m \\ & \ones^T \nu_0 = 1 \\ & \sum_{i=0}^m A_i^T\nu_i = 0 \end{array} $$
Different dual problems for equivalent optimization problems - 3
-
transforming objectives
- norm minimization problem $$ \begin{array}{ll} \mbox{minimize} & \|Ax - b\| \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & (1/2)\|y\|^2 \\ \mbox{subject to} & Ax - b = y \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & -(1/2)\|\nu\|_\ast^2 + b^T\nu \\ \mbox{subject to} & A^T\nu = 0 \end{array} $$
Different dual problems for equivalent optimization problems - 4
-
making constraints implicit
- LP with box constraints $$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & Ax = b,\; l \preceq x \preceq u \end{array} $$
- dual optimization problem $$ \begin{array}{ll} \mbox{maximize} & -b^T\nu - \lambda_1^Tu + \lambda_2^Tl \\ \mbox{subject to} & A^T\nu + \lambda_1 - \lambda_2 + c = 0,\; \lambda_1 \succeq 0,\; \lambda_2 \succeq 0 \end{array} $$
- reformulation $$ \begin{array}{ll} \mbox{minimize} & c^T x + I ( l\preceq x \preceq u) \\ \mbox{subject to} & Ax = b \end{array} $$
- dual optimization problem for reformulated primal problem $$ \begin{array}{ll} \mbox{maximize} & -b^T \nu - u^T(A^T\nu + c)^- + l^T(A^T\nu + c)^+ \end{array} $$
Theorems of Alternatives
Weak alternatives
- can prove using duality of optimization problems
-
consider primal and dual problems
- primal problem $$ \begin{array}{ll} \mbox{minimize} & 0 \\ \mbox{subject to} & \fie(x) \preceq 0 \\ & \feq(x) =0 \end{array} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & g(\lambda,\nu) \\ \mbox{subject to} & \lambda \succeq 0 \end{array} $$ where $$ g(\lambda,\nu) = \inf_{x\in\optdomain} \left( \lambda^T \fie(x) + \nu^T \feq(x) \right) $$
- then $p^\ast,\; d^\ast \in \{0,\infty\}$
- now assume first system of \theoremname~\ref{theorem:weak alternatives of two systems}\ is feasible; then $p^\ast = 0$, and weak duality implies $d^\ast=0$, thus there exist no $\lambda$ and $\nu$ such that $\lambda\succeq 0$ and $g(\lambda,\nu) > 0$, i.e., second system is infeasible - otherwise $g$ could be made arbitrarily large, contradicting $d^\ast=0$: if $\tilde{\lambda}\succeq 0$ and $\tilde{\nu}$ satisfy $g(\tilde{\lambda},\tilde{\nu})>0$, then $g(\alpha\tilde{\lambda}, \alpha\tilde{\nu}) = \alpha g(\tilde{\lambda}, \tilde{\nu}) \to \infty$ as $\alpha\to\infty$
- assume second system is feasible, then $g(\lambda,\nu)$ can be arbitrarily large for above reasons, thus $d^\ast = \infty$, hence weak duality implies $p^\ast = \infty$, which implies first system is infeasible
- therefore two systems are weak alternatives; at most one of them is feasible
- (actually, not hard to prove it without using weak duality)
Weak alternatives with strict inequalities
Strong alternatives
Strong alternatives with strict inequalities
-
proof -
consider convex optimization problem and its dual
- primal problem $$ \begin{array}{ll} \mbox{minimize} & s \\ \mbox{subject to} & \fie(x) - s \ones \preceq 0 \\ & \feq(x) =0 \end{array} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & g(\lambda,\nu) \\ \mbox{subject to} & \lambda \succeq 0 \quad \ones^T \lambda = 1 \end{array} $$ where $g(\lambda,\nu) = \inf_{x\in\optdomain} \left( \lambda^T \fie(x) + \nu^T \feq(x) \right)$
- first observe Slater's condition holds for primal problem since by hypothesis of , exists $y\in\relint \optdomain$ with $\feq(y)=0$, hence $(y,s)$ with $s > \max_i \fie_i(y)$ is primal feasible, satisfying Slater's condition
- hence Slater's theorem () implies $d^\ast=p^\ast$
- assume first system is feasible, then primal problem is strictly feasible and $d^\ast = p^\ast<0$, hence second system infeasible since otherwise feasible point for second system is feasible point of dual problem, hence $d^\ast\geq0$
- assume first system is infeasible, then $d^\ast = p^\ast\geq0$, hence Slater's theorem () implies exists dual optimal $(\lambda^\ast,\nu^\ast)$ (whether or not $d^\ast=\infty$), hence $(\lambda^\ast,\nu^\ast)$ is feasible point for second system of
- therefore two systems are strong alternatives; each is feasible if and only if the other is infeasible
Strong alternatives for linear inequalities
- dual function of feasibility problem for $Ax\preceq b$ is $$ g(\lambda) = \inf_{x\in\reals^n} \lambda^T(Ax-b) = \left\{ \begin{array}{ll} -b^T \lambda & A^T\lambda = 0 \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- hence alternative system is $\lambda\succeq0,\;b^T\lambda <0,\; A^T\lambda=0$
- thus implies below systems are strong alternatives $$ Ax \preceq b \quad\&\quad \lambda\succeq0 \quad b^T\lambda <0 \quad A^T\lambda=0 $$
- similarly, for strict inequalities, alternative system is $\lambda\succeq0,\;\lambda\neq0,\;b^T\lambda \leq 0,\; A^T\lambda=0$, and implies below systems are strong alternatives $$ Ax \prec b \quad\&\quad \lambda\succeq0 \quad \lambda \neq 0 \quad b^T\lambda \leq 0 \quad A^T\lambda=0 $$
Farkas' lemma
- will prove using LP and its dual
- consider LP $\left(\mbox{minimize}\; c^T x \quad \mbox{subject to}\; Ax \preceq 0\right)$
- dual function is $g(y) = \inf_{x\in\reals^n} \left(c^Tx + y^TAx \right) = \left\{ \begin{array}{ll} 0 & A^Ty + c= 0 \\ -\infty & \mbox{otherwise} \end{array} \right.$
- hence dual problem is $\left( \mbox{maximize} \; 0 \quad \mbox{subject to} \; A^T y + c = 0 , \; y \succeq 0 \right)$
- assume first system is feasible, then homogeneity of primal problem implies $p^\ast = -\infty$, thus $d^\ast=-\infty$, i.e., dual is infeasible, hence second system is infeasible
- assume first system is infeasible, since primal is always feasible, $p^\ast=0$, hence strong duality implies $d^\ast =0$, thus second system is feasible
Convex Optimization with Generalized Inequalities
Optimization problems with generalized inequalities
- every terminology and associated notation is same as for optimization problem in , such as objective & inequality & equality constraint functions, domain of optimization problem $\optdomain$, feasible set $\optfeasset$, optimal value $p^\ast$
- note that when $K_i=\preals$ (hence $\bigpropercone=\prealk{m}$), above optimization problem coincides with that in , i.e., optimization problems with generalized inequalities subsume (normal) optimization problems
Lagrangian for generalized inequalities
Lagrange dual functions for generalized inequalities
- $g$ is concave function
- $g(\lambda,\nu)$ is lower bound for optimal value of associated optimization problem, i.e., $$ g(\lambda,\nu) \leq p^\ast $$ for every $\lambda\succeq_{\bigpropercone^\ast}0$ where $\bigpropercone^\ast$ denotes dual cone of $\bigpropercone$, i.e., $\bigpropercone^\ast = \bigtimes K_i^\ast$ where $K_i^\ast\subset\reals^{k_i}$ is dual cone of $K_i\subset\reals^{k_i}$
- $(\lambda,\nu)$ with $\lambda\succeq_{\bigpropercone^\ast} 0$ and $g(\lambda,\nu)>-\infty$ said to be dual feasible
Lagrange dual problems for generalized inequalities
Slater's theorem for generalized inequalities
Duality for SDP
- (inequality form) SDP $$ \begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & x_1F_1 + \cdots + x_nF_n + G \preceq 0 \end{array} $$ where $F_1,\ldots,F_n,G\in\symset{k}$ and $\bigpropercone = \possemidefset{k}$
- Lagrangian $$ L(x,Z) = c^Tx + (x_1F_1 + \cdots + x_nF_n + G) \bullet Z = \sum x_i(F_i\bullet Z + c_i) + G \bullet Z $$ where $X\bullet Y = \Tr XY$ for $X,Y\in\symset{k}$
- Lagrange dual function $$ g(Z) = \inf_{x\in\reals^n} L(x,Z) = \left\{ \begin{array}{ll} G \bullet Z & F_i\bullet Z + c_i= 0\quad i=1,\ldots,n \\ -\infty & \mbox{otherwise} \end{array} \right. $$
- Lagrange dual problem $$ \begin{array}{ll} \mbox{maximize} & G\bullet Z \\ \mbox{subject to} & F_i \bullet Z + c_i = 0\quad i=1,\ldots,n \\ & Z \succeq 0 \end{array} $$ using fact that $\possemidefset{k}$ is self-dual, i.e., $\bigpropercone^\ast = \bigpropercone$
- Slater's theorem () implies if primal problem is strictly feasible, i.e., exists $x\in\reals^n$ such that $\sum x_iF_i + G\prec 0$, strong duality holds
KKT optimality conditions for generalized inequalities
KKT conditions and optimalities for generalized inequalities
-
for every optimization problem with generalized inequalities
(),
every statement for normal optimization problem
(),
regarding relations among
KKT conditions,
optimality,
primal and dual optimality,
and
strong duality,
is exactly the same
-
for every optimization problem with generalized inequalities
()
- if strong duality holds, primal and dual optimal points satisfy KKT optimality conditions in (same as )
- if optimization problem is convex and primal and dual solutions satisfy KKT optimality conditions in , the solutions are optimal with strong duality (same as )
- therefore, for convex optimization problem, KKT optimality conditions are necessary and sufficient for primal and dual optimality with strong duality
-
for every optimization problem with generalized inequalities
()
Perturbation and sensitivity analysis for generalized inequalities
- original problem in with perturbed constraints $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & \fie(x) \preceq_\bigpropercone u \\ & \feq(x) =v \end{array} $$ where $u\in\reals^m$ and $v\in\reals^p$
- define $p^\ast(u,v) = \inf\set{\fobj(x)}{x\in\optdomain, \fie(x) \preceq_\bigpropercone u, \feq(x) = v}$, which is convex when problem is convex optimization problem - note $p^\ast(0,0)=p^\ast$
- as for normal optimization problem case (page~), if $(\lambda^\ast,\nu^\ast)$ is dual optimum and strong duality holds, $$ p^\ast(u,v)\geq p^\ast(0,0) - {\lambda^\ast}^T u - {\nu^\ast}^T v $$ and $$ \nabla_u\; p^\ast (0,0) = -\lambda^\ast \quad \nabla_v\; p^\ast (0,0) = -\nu^\ast $$
Sensitivity analysis for SDP
- assume inequality form SDP and its dual problem on page~ and page~
-
consider perturbed SDP
$$
\begin{array}{ll}
\mbox{minimize} &
c^Tx
\\
\mbox{subject to} &
x_1F_1 + \cdots + x_nF_n + G \preceq U
\end{array}
$$
for some $U\in\symset{k}$
- define $p^\ast:\symset{k} \to \reals$ such that $p^\ast(U)$ is optimal value of above problem
- assume $x^\ast\in\reals^n$ and $Z^\ast\in\possemidefset{k}$ are primal and dual optimum with zero duality gap
- then $$ p^\ast(U) \geq p^\ast - Z^\ast \bullet U $$
- if $\nabla_U p^\ast$ exists at $U=0$ $$ \nabla_U p^\ast(0) = - Z^\ast $$
Weak alternatives for generalized inequalities
Strong alternatives for generalized inequalities
Strong alternatives for SDP
-
for $F_1,\ldots,F_n,G\in\symset{k}$, $x\in\reals^n$, and $Z\in\symset{k}$
- below systems are strong alternatives $$ x_1F_1 + \cdots + x_nF_n + G \prec 0 $$ and $$ Z \succeq 0 \quad Z\neq 0 \quad G\bullet Z \geq 0 \quad F_i \bullet Z = 0\;i=1,\ldots,n $$
- if $\sum v_i F_i \succeq 0 \Rightarrow \sum v_i F_i = 0$, below systems are strong alternatives $$ x_1F_1 + \cdots + x_nF_n + G \preceq 0 $$ and $$ Z \succeq 0 \quad G\bullet Z > 0 \quad F_i \bullet Z = 0\;i=1,\ldots,n $$
Unconstrained Minimization
Unconstrained minimization
- consider unconstrained convex optimization problem, i.e., $m=p=0$ in $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \end{array} $$ where domain of optimization problem is $\optdomain\ = \xobj \subset \reals^n$
-
assume
- $\fobj$ is twice-differentiable (hence by definition $\xobj$ is open)
- optimal solution $x^\ast$ exists, i.e., $p^\ast = \inf_{x\in\optdomain} \fobj(x) = \fobj(x^\ast)$
- implies $x^\ast$ is optimal solution if and only if $$ \nabla \fobj(x^\ast) = 0 $$
- can solve above equation directly for few cases, but usually must resort to iterative methods, i.e., find sequence of points $\xseqk{0}, \xseqk{1}, \ldots \in \xobj$ such that $\lim_{k\to\infty} \fobj(\xseqk{k}) = p^\ast$
Requirements for iterative methods
-
requirements for iterative methods
- initial point $\xseqk{0}$ should be in domain of optimization problem, i.e. $$ \xseqk{0} \in \xobj\ $$
- sublevel set of $\fobj(\xseqk{0})$ $$ S = \bigset{x\in\xobj}{\fobj(x) \leq \fobj(\xseqk{0})} $$ should be closed
-
e.g.
- sublevel set of $\fobj(\xseqk{0})$ is closed for all $\xseqk{0}\in\xobj$ if $\fobj$ is closed, i.e., all its sublevel sets are closed
- $\fobj$ is closed if $\xobj = \reals^n$ and $\fobj$ is continuous
- $\fobj$ is closed if $\fobj$ is continuous, $\xobj$ is open, and $\fobj(x) \to \infty$ as $x \to \boundary \xobj$
Unconstrained minimization examples
-
convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) =
(1/2) x^TP x +q^Tx
\end{array}
$$
where $P\in\possemidefset{n}$ and $q\in\reals^n$
-
solution obtained by solving
$$
\nabla \fobj(x^\ast) = P x^\ast + q = 0
$$
- if solution exists, $x^\ast = - P^\dagger q$ (thus $p^\ast>-\infty$)
- otherwise, problem is unbounded below, i.e., $p^\ast = -\infty$
- ability to analytically solve quadratic minimization problem is basis for Newton's method, a powerful method for unconstrained minimization
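the analytic solution $x^\ast = -P^\dagger q$ can be sketched numerically on a made-up instance with singular $P$ (the data below is my own illustration), where the gradient still vanishes provided $q\in\range(P)$:

```python
import numpy as np

# singular P (rank 2 in R^3); q chosen inside range(P) so a minimizer exists
P = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
q = np.array([1.0, -2.0, 0.0])  # third component 0, so q is in range(P)

x_star = -np.linalg.pinv(P) @ q  # pseudoinverse solution
grad = P @ x_star + q            # gradient of (1/2) x^T P x + q^T x
print(grad)                      # ~[0, 0, 0]
```

with a nonzero third component of $q$, the equation $Px+q=0$ has no solution and the problem is unbounded below.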
-
least-squares (LS) is special case of convex quadratic problem
$$
\begin{array}{ll}
\mbox{minimize} &
(1/2) \|Ax-b\|_2^2
= (1/2) x^T (A^TA) x - b^TAx + (1/2)\|b\|_2^2
\end{array}
$$
- optimal solution always exists, can be obtained via normal equations $$ A^TAx^\ast = A^Tb $$
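a quick numerical check of the normal equations on arbitrary made-up data, against the library least-squares solver:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# solve the normal equations A^T A x = A^T b directly
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# compare with the library least-squares solver
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_ne, x_ls))  # True
```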
-
unconstrained GP
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) =
\log\left(
\sum \exp (Ax+b)
\right)
\end{array}
$$
for $A\in\reals^{m\times n}$ and $b\in\reals^m$
- solution obtained by solving $$ \nabla \fobj(x^\ast) = \frac{A^T \exp(Ax^\ast+b)}{\sum \exp(Ax^\ast+b)} = 0 $$
- need to resort to iterative method - since $\xobj = \reals^n$ and $\fobj$ is continuous, $\fobj$ is closed, hence every point in $\reals^n$ can be initial point
-
analytic center of linear inequalities
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = - \sum\log(b-Ax)
\end{array}
$$
where $\xobj = \set{x\in\reals^n}{b-Ax \succ 0}$
- need to resort to iterative method - since $\xobj$ is open, $\fobj$ is continuous, and $\fobj(x) \to \infty$ as $x\to\boundary \xobj$, $\fobj$ is closed, hence every point in $\xobj$ can be initial point
- $\fobj$ is called logarithmic barrier for inequalities $Ax\preceq b$
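the barrier's derivatives have a convenient closed form, $\nabla\fobj(x) = A^Td$ and $\nabla^2\fobj(x) = A^T\diag(d)^2A$ with $d_i = 1/(b-Ax)_i$; a small sketch (with made-up strictly feasible data) checking the gradient against finite differences:

```python
import numpy as np

# made-up inequalities b - Ax > 0 defining a bounded triangle
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])

def f(x):
    return -np.sum(np.log(b - A @ x))

def grad(x):
    d = 1.0 / (b - A @ x)          # d_i = 1 / (b - Ax)_i
    return A.T @ d

def hess(x):
    d = 1.0 / (b - A @ x)
    return A.T @ np.diag(d**2) @ A

x = np.array([0.2, -0.3])          # strictly feasible point
g = grad(x)
H = hess(x)

# gradient agrees with central finite differences
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.max(np.abs(g - fd)))      # small
```

since $A$ has full column rank here, $H$ is positive definite, consistent with strict convexity of the barrier.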
-
analytic center of LMI
$$
\begin{array}{ll}
\mbox{minimize} &
\fobj(x) = - \log\det F(x) = \log\det F(x)^{-1}
\end{array}
$$
where $F:\reals^n\to \symset{k}$ is defined by
$$
F(x) = x_1F_1 + \cdots + x_nF_n
$$
where $F_i\in \symset{k}$
and $\xobj = \set{x\in\reals^n}{F(x)\succ 0}$
- need to resort to iterative method - since $\xobj$ is open, $\fobj$ is continuous, and $\fobj(x) \to \infty$ as $x\to\boundary \xobj$, $\fobj$ is closed, hence every point in $\xobj$ can be initial point
- $\fobj$ is called logarithmic barrier for LMI
Strong convexity and implications
- function $\fobj$ is strongly convex on $S$ $$ \left( \exists m >0 \right) \left( \forall x \in S \right) \left( \nabla^2 \fobj(x) \succeq mI \right) $$
-
strong convexity implies for every $x,y\in S$
$$
\fobj(y) \geq \fobj(x) + \nabla \fobj(x)^T (y-x) + ({m}/{2}) \|y-x\|_2^2
$$
- which implies gradient provides optimality certificate and tells us how far current point is from optimum, i.e. $$ \fobj(x) - p^\ast \leq ({1}/{2m}) \|\nabla \fobj(x)\|_2^2 \quad \|x-x^\ast\|_2 \leq ({2}/{m}) \|\nabla \fobj(x)\|_2 $$
- the strong convexity inequality implies sublevel sets contained in $S$ are bounded, hence $S$ itself is bounded, so continuous function $\nabla^2 \fobj(x)$ is bounded on $S$, i.e., $\left( \exists M >0 \right) \left( \forall x \in S \right) \left( \nabla^2 \fobj(x) \preceq M I \right)$, then $$ \fobj(x) - p^\ast \geq \frac{1}{2M} \|\nabla \fobj(x)\|_2^2 $$
Iterative methods
Line search methods
- Require: $\fobj$, $\sdirk{k}$, $\alpha\in(0,0.5)$, $\beta\in(0,1)$
- $\slen:=1$
- while $\fobj(\xseqk{k} + \slen \sdirk{k}) > \fobj(\xseqk{k}) + \alpha \slen \nabla \fobj(\xseqk{k})^T \sdirk{k}$ do
- $\slen := \beta \slen$
- end while
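the backtracking loop above translates directly to code; a minimal sketch (the objective and starting point are my own illustration):

```python
import numpy as np

def backtracking(f, grad_fx, x, direction, alpha=0.25, beta=0.5):
    """Backtracking line search: shrink t until the sufficient-decrease
    condition f(x + t d) <= f(x) + alpha * t * grad(x)^T d holds."""
    t = 1.0
    fx = f(x)
    slope = grad_fx @ direction  # negative for a descent direction
    while f(x + t * direction) > fx + alpha * t * slope:
        t *= beta
    return t

# example: f(x) = x1^2 + 4 x2^2, steepest-descent direction at x0
f = lambda x: x[0]**2 + 4.0 * x[1]**2
g = lambda x: np.array([2.0 * x[0], 8.0 * x[1]])
x0 = np.array([1.0, 1.0])
d = -g(x0)
t = backtracking(f, g(x0), x0, d)
print(f(x0 + t * d) < f(x0))  # True: accepted step decreases f
```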
Gradient descent method
- Require: $\fobj$, initial point $x\in \dom \fobj$
- repeat
- search direction - $\sdir := - \nabla \fobj(x)$
- do line search to choose $\slen>0$
- update - $x := x + \slen \sdir$
- until stopping criterion satisfied
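the loop above, with backtracking line search inlined and a small-gradient stopping criterion, can be sketched on a made-up poorly conditioned quadratic:

```python
import numpy as np

def gradient_descent(f, grad, x, tol=1e-8, alpha=0.25, beta=0.5, max_iter=10_000):
    for _ in range(max_iter):
        d = -grad(x)                      # search direction: negative gradient
        if np.linalg.norm(d) < tol:       # stopping criterion
            break
        # backtracking line search
        t, fx, slope = 1.0, f(x), grad(x) @ d
        while f(x + t * d) > fx + alpha * t * slope:
            t *= beta
        x = x + t * d                     # update
    return x

# quadratic with condition number 25: convergence is linear but slowed
f = lambda x: 0.5 * (x[0]**2 + 25.0 * x[1]**2)
grad = lambda x: np.array([x[0], 25.0 * x[1]])
x_star = gradient_descent(f, grad, np.array([1.0, 1.0]))
print(np.linalg.norm(x_star))  # ~0: converged to the minimizer
```

increasing the condition number further makes the iterate zigzag and the linear rate degrade, matching the summary below.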
Summary of gradient descent method
- gradient method often exhibits approximately linear convergence, i.e., error $\fobj(\xseqk{k})-p^\ast$ converges to zero approximately as geometric series
- choice of backtracking parameters $\alpha$ and $\beta$ has noticeable but not dramatic effect on convergence
- exact line search sometimes improves convergence of gradient method, but not by much, hence usually not worth implementing
- convergence rate depends greatly on condition number of Hessian or sublevel sets - when condition number is large, gradient method can be useless
Newton's method - motivation
- second-order Taylor approximation of $\fobj$ - $\hat{\fobj}(\sdir) = \fobj(x) + \nabla \fobj(x)^T \sdir + \frac{1}{2} \sdir^T \nabla^2 \fobj(x) \sdir \approx \fobj(x + \sdir)$
- minimum of Taylor approximation achieved when $\nabla \hat{\fobj}(\sdir) = \nabla \fobj(x) + \nabla^2 \fobj(x) \sdir = 0$
- solution called Newton step $$ \sdir_\mathrm{nt}(x) = - \nabla^2 \fobj(x)^{-1} \nabla \fobj(x) $$ assuming $\nabla^2\fobj(x)\succ0$
- thus Newton step minimizes local quadratic approximation of function
- difference between current value and minimum of quadratic approximation $$ \fobj(x) - \hat{\fobj}(\sdir_\mathrm{nt}(x)) = \frac{1}{2} \sdir_\mathrm{nt}(x)^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}(x) = \frac{1}{2} \lambda(x)^2 $$
- Newton decrement $$ \lambda(x) = \sqrt{\sdir_\mathrm{nt}(x)^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}(x)} = \sqrt{\nabla \fobj(x)^T \nabla^2 \fobj(x)^{-1} \nabla \fobj(x)} $$
Newton's method
- Require: $\fobj$, initial point $x\in \dom \fobj$, tolerance $\epsilon>0$
- loop
- compute Newton step and decrement $$ \sdir_\mathrm{nt}(x) := -\nabla^2 \fobj(x)^{-1} \nabla \fobj(x) \quad \lambda(x)^2 := \nabla \fobj(x)^T \nabla^2 \fobj(x)^{-1} \nabla \fobj(x) $$
- stopping criterion - quit if $\lambda(x)^2/2 < \epsilon$
- do line search to choose $\slen>0$
- update - $x := x + \slen \sdir_\mathrm{nt}$
- end loop
- Newton step is descent direction since $$ \left. \left( \frac{d}{dx}\fobj(x+t\sdir_\mathrm{nt}) \right) \right|_{t=0} = \nabla \fobj(x) ^T \sdir_\mathrm{nt} = - \lambda(x)^2 <0 $$
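the algorithm above can be sketched on a small log-sum-exp (unconstrained GP) objective of my own choosing; the quadratic phase makes it converge in a handful of iterations:

```python
import numpy as np

# f(x) = log sum exp(Ax), a smooth strictly convex function on R^2
A = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 0.0]])

def f(x):
    return np.log(np.sum(np.exp(A @ x)))

def grad_hess(x):
    z = np.exp(A @ x)
    w = z / z.sum()                       # softmax weights
    g = A.T @ w
    H = A.T @ (np.diag(w) - np.outer(w, w)) @ A
    return g, H

x, eps = np.array([1.0, 1.0]), 1e-10
for _ in range(50):
    g, H = grad_hess(x)
    d = -np.linalg.solve(H, g)            # Newton step
    lam2 = -g @ d                         # Newton decrement squared
    if lam2 / 2 < eps:                    # stopping criterion
        break
    # backtracking line search
    t, fx = 1.0, f(x)
    while f(x + t * d) > fx - 0.25 * t * lam2:
        t *= 0.5
    x = x + t * d
print(lam2 / 2 < eps)  # True: converged
```

near the optimum the full step $t=1$ is accepted and the decrement shrinks quadratically, as the convergence analysis below predicts.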
Assumptions for convergence analysis of Newton's method
-
assumptions
- strong convexity and boundedness of Hessian on sublevel set $$ \left( \exists\; m, M > 0 \right) \left( \forall x \in S \right) \left( mI \preceq \nabla^2 \fobj(x) \preceq MI \right) $$
- Lipschitz continuity of Hessian on sublevel set $$ \left( \exists L > 0 \right) \left( \forall x,y\in S \right) \left( \|\nabla^2 \fobj(x)- \nabla^2\fobj(y)\|_2 \leq L \|x-y\|_2 \right) $$
-
Lipschitz continuity constant $L$ plays critical role
in performance of Newton's method
- intuition says Newton's method works well for functions whose quadratic approximations do not change fast, i.e., when $L$ is small
Convergence analysis of Newton's method
- damped Newton phase - if $\|\nabla \fobj(\xseqk{k})\|_2 \geq \eta$, $$ \fobj(\xseqk{k+1}) - \fobj(\xseqk{k}) \leq - \gamma $$
- quadratic convergence phase - if $\|\nabla \fobj(\xseqk{k})\|_2 < \eta$, backtracking line search selects step length $\slenk{k}=1$ $$ \frac{L}{2m^2} \|\nabla \fobj(\xseqk{k+1})\|_2 \leq \left( \frac{L}{2m^2} \|\nabla \fobj(\xseqk{k})\|_2 \right)^2 $$
Summary of Newton's method
- Newton's method is affine invariant, hence performance is independent of condition number unlike gradient method
- once entering quadratic convergence phase, Newton's method converges extremely fast
- performance not much dependent on choice of algorithm parameters
- big disadvantage is computational cost for evaluating search direction, i.e., solving linear system
Self-concordance
Why self-concordance?
- convergence analysis of Newton's method depends on assumptions about function characteristics, e.g., constants $m,M, L > 0$ for strong convexity, boundedness of Hessian, and Lipschitz continuity of Hessian, i.e. $$ m I \preceq \nabla^2 f(x) \preceq M I \quad \|\nabla^2 f(x)- \nabla^2f(y)\| \leq L \|x-y\| $$
-
self-concordance
discovered by Nesterov and Nemirovski
(who gave name self-concordance)
plays important role
for reasons
such as
- convergence analysis does not depend on any function-characterizing parameters
- many barrier functions used in interior-point methods, an important class of optimization algorithms, are self-concordant
- property of self-concordance is affine invariant
Self-concordance preserving operations
Self-concordant function examples
- negative logarithm - $f:\ppreals \to \reals$ with $$ f(x)=-\log x $$ is self-concordant since $$ |f'''(x)| / f''(x)^{3/2} = \left(2/x^3\right) / \left((1/x^2)^{3/2}\right) = 2 $$
- negative entropy plus negative logarithm - $f:\ppreals \to \reals$ with $$ f(x)=x\log x-\log x $$ is self-concordant since $$ |f'''(x)| / f''(x)^{3/2} = (x+2)/{(x+1)^{3/2}} \leq 2 $$
- log barrier for linear inequalities - for $A\in\reals^{m\times n}$ and $b\in\reals^m$ $$ f(x) = - \sum \log(b-Ax) $$ with $\dom f = \set{x\in\reals^n}{b-Ax \succ 0}$ is self-concordant by , i.e., $f$ is affine transformation of sum of self-concordant functions
- log-determinant - $f:\posdefset{n}\to\reals$ with $$ f(X) = \log\det X^{-1} = - \log\det X $$ is self-concordant since for every $X\in \posdefset{n}$ and $V\in\symset{n}$, function $g:\reals\to\reals$ defined by $g(t) = - \log\det(X+tV)$ where $\dom g = \set{t\in\reals}{X+tV\succ 0}$ satisfies $$ \begin{eqnarray*} g(t) &=& - \log \det (X^{1/2} (I + tX^{-1/2} V X^{-1/2})X^{1/2}) \\ &=& -\log\det X - \log\det(I+tX^{-1/2}VX^{-1/2}) \\ &=& -\log\det X - \sum \log (1+t\lambda_i(X,V)) \end{eqnarray*} $$ where $\lambda_i(X,V)$ is $i$-th eigenvalue of $X^{-1/2}VX^{-1/2}$, hence $g$ is self-concordant by , i.e., $g$ is affine transformation of sum of self-concordant functions
- log of concave quadratic - $f:X\to\reals$ with $$ f(x) = -\log(-x^TPx - q^Tx - r) $$ where $P\in\possemidefset{n}$ and $X=\set{x\in\reals^n}{x^TPx + q^Tx + r<0}$
-
function $f:X\to\reals$
with
$$
f(x) = -\log(-g(x)) - \log x
$$
where $\dom f = \set{x\in\dom g \cap \ppreals}{g(x)<0}$
and
function $h:H\to\reals$
$$
h(x) = -\log(-g(x)-ax^2-bx-c) - \log x
$$
where $a\geq0$ and $\dom h = \set{x\in\dom g \cap \ppreals}{g(x)+ax^2+bx+c<0}$
are self-concordant
if $g$ is one of below
- $g(x) = -x^p$ for $0<p\leq 1$
- $g(x) = -\log x$
- $g(x) = x \log x$
- $g(x) = x^p$ for $-1\leq p\leq 0$
- $g(x) = (ax+b)^2/x$ for $a,b\in\reals$
- function $f:X\to\reals$ with $X = \set{(x,y)}{\|x\|_2 < y}\subset \reals^n \times \ppreals$ defined by $$ f(x,y) = -\log(y^2-x^Tx) $$ is self-concordant - can be proved using
- function $f:X\to\reals$ with $X = \set{(x,y)}{|x|^p < y}\subset \reals \times \ppreals$ defined by $$ f(x,y) = -2\log y - \log(y^{2/p}- x^2) $$ where $p\geq1$ is self-concordant - can be proved using
- function $f:X\to\reals$ with $X = \set{(x,y)}{\exp(x) < y}\subset \reals \times \ppreals$ defined by $$ f(x,y) = -\log y - \log(\log y - x) $$ is self-concordant - can be proved using
Properties of self-concordant functions
- note $$ \lambda(x) = \sup_{v\neq 0} \left(v^T \nabla \fobj(x) / \left( v^T \nabla^2 \fobj(x) v \right)^{1/2} \right) $$
Stopping criteria for self-concordant objective functions
- recall $\lambda(x)^2$ provides approximate optimality certificate, (page~) i.e., assuming $\fobj$ is well approximated by quadratic function around $x$ $$ \fobj(x) - p^\ast \lessapprox \lambda(x)^2/2 $$
- however, strict convexity together with self-concordance provides proven bound (by ) $$ \fobj(x) - p^\ast \leq \lambda(x)^2 $$ for $\lambda(x) \leq 0.68$
- hence can use following stopping criterion for guaranteed bound $$ \lambda(x)^2 \leq \epsilon \quad \Rightarrow \quad \fobj(x) - p^\ast \leq \epsilon $$ for $\epsilon \leq 0.68^2$
Convergence analysis of Newton's method for self-concordant functions
- damped Newton phase - if $\lambda(\xseqk{k})>\eta$ $$ \fobj(\xseqk{k+1}) - \fobj(\xseqk{k}) \leq - \gamma $$
- quadratic convergence phase - if $\lambda(\xseqk{k})\leq\eta$ backtracking line search selects step length $\slenk{k}=1$ $$ 2\lambda(\xseqk{k+1}) \leq \left(2\lambda(\xseqk{k})\right)^2 $$
Equality Constrained Minimization
Equality constrained minimization
- consider equality constrained convex optimization problem, i.e., $m=0$ in $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) \\ \mbox{subject to} & Ax = b \end{array} $$ where $A\in\reals^{p\times n}$ and domain of optimization problem is $\optdomain\ = \xobj \subset \reals^n$
-
assume
- $\rank A = p<n$, i.e., rows of $A$ are linearly independent
- $\fobj$ is twice-differentiable (hence by definition $\xobj$ is open)
- optimal solution $x^\ast$ exists, i.e., $p^\ast = \inf_{x\in\optfeasset} \fobj(x) = \fobj(x^\ast)$ and $Ax^\ast = b$
Solving KKT for equality constrained minimization
- implies $x^\ast\in\xobj$ is optimal solution if and only if there exists $\nu^\ast\in\reals^p$ satisfying KKT optimality conditions, i.e., $$ \begin{eqnarray*} Ax^\ast = b &&\mbox{\define{primal feasibility equations}} \\ \nabla \fobj(x^\ast) + A^T\nu^\ast = 0 &&\mbox{\define{dual feasibility equations}} \end{eqnarray*} $$
- solving equality constrained problem is equivalent to solving KKT equations
- handful of problem types can be solved analytically
- using unconstrained minimization methods
- can eliminate equality constraints and apply unconstrained minimization methods
- solving dual problem using unconstrained minimization methods and retrieving primal solution
- will discuss Newton's method directly handling equality constraints
- preserving problem structure such as sparsity
Equality constrained convex quadratic minimization
- equality constrained convex quadratic minimization problem $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) = (1/2)x^T P x + q^Tx \\ \mbox{subject to} & Ax = b \end{array} $$ where $P\in\possemidefset{n}$ and $A\in\reals^{p\times n}$
- important since basis for extension of Newton's method to equality constrained problems
- KKT system $$ Ax^\ast = b \; \& \; Px^\ast + q + A^T\nu^\ast = 0 \; \Leftrightarrow \; \underbrace{ \mattwotwo{P}{A^T}{A}{0} }_{\mbox{\define{KKT matrix}}} \colvectwo{x^\ast}{\nu^\ast} = \colvectwo{-q}{b} $$
- exist primal and dual optimum $(x^\ast,\nu^\ast)$ if and only if KKT system has solution; otherwise, problem is unbounded below
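- the KKT system above amounts to one linear solve; a minimal numpy sketch (the helper name `solve_eq_qp` and the toy problem are made up for illustration)

```python
import numpy as np

def solve_eq_qp(P, q, A, b):
    """Solve: minimize (1/2) x^T P x + q^T x  subject to  Ax = b
    by forming and solving the KKT system
        [P  A^T] [x ]   [-q]
        [A   0 ] [nu] = [ b].
    Assumes the KKT matrix is nonsingular."""
    n, p = P.shape[0], A.shape[0]
    K = np.block([[P, A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-q, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]          # x*, nu*

# toy example: minimize (1/2)||x||^2 subject to x1 + x2 = 1
P = np.eye(2); q = np.zeros(2)
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x, nu = solve_eq_qp(P, q, A, b)      # x = (0.5, 0.5), nu = -0.5
```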
Eliminating equality constraints
- can solve equality constrained convex optimization by
- eliminating equality constraints and
- using optimization method for solving unconstrained optimization
- note $$ \optfeasset = \set{x}{Ax=b} = \set{Fz + x_0}{z\in\reals^{n-p}} $$ for some $F\in\reals^{n\times(n-p)}$ where $\range(F) = \nullspace(A)$
- thus original problem equivalent to $$ \begin{array}{ll} \mbox{minimize} & \fobj(Fz + x_0) \end{array} $$
- if $z^\ast$ is optimal solution, $x^\ast = Fz^\ast + x_0$
- optimal dual can be retrieved by $$ \nu^\ast = - (AA^T)^{-1} A\nabla \fobj(x^\ast) $$
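- a numpy sketch of the elimination approach (the quadratic objective $(1/2)\|x\|^2$ is a made-up toy example); $F$ is taken from the SVD of $A$

```python
import numpy as np

A = np.array([[1.0, 1.0]]); b = np.array([1.0])
p = A.shape[0]

# columns of F span null(A): right singular vectors beyond rank(A)
_, _, Vt = np.linalg.svd(A)
F = Vt[p:].T
x0 = np.linalg.lstsq(A, b, rcond=None)[0]   # any particular solution of Ax = b

# reduced problem for f0(x) = (1/2)||x||^2: minimize (1/2)||F z + x0||^2 over z,
# i.e., solve the normal equations F^T F z = -F^T x0
z = np.linalg.solve(F.T @ F, -F.T @ x0)
x_star = F @ z + x0
# dual retrieval: nu* = -(A A^T)^{-1} A grad f0(x*), with grad f0(x) = x here
nu_star = -np.linalg.solve(A @ A.T, A @ x_star)
```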
Solving dual problems
- Lagrange dual function of equality constrained problem $$ \begin{eqnarray*} g(\nu) & = & \inf_{x\in\optdomain} \left( \fobj(x) + \nu^T(Ax-b) \right) = -b^T\nu - \sup_{x\in\optdomain} \left((-A^T\nu)^Tx -\fobj(x)\right) \\ & = & -b^T \nu - {\fobj}^\ast(-A^T\nu) \end{eqnarray*} $$
- dual problem $$ \begin{array}{ll} \mbox{maximize} & -b^T \nu - {\fobj}^\ast(-A^T\nu) \end{array} $$
- by assumption, strong duality holds, hence if $\nu^\ast$ is dual optimum $$ g(\nu^\ast) = p^\ast $$
- if dual objective is twice-differentiable, can solve dual problem using unconstrained minimization methods
- primal optimum can be retrieved from dual optimum using method described above
Newton's method with equality constraints
- finally discuss Newton's method which directly handles equality constraints
- similar to Newton's method for unconstrained minimization
- initial point, however, should be feasible, i.e., $\xseqk{0}\in\xobj$ and $A\xseqk{0} = b$
- Newton step tailored for equality constrained problem
Newton step via second-order approximation
- solve original problem approximately by solving $$ \begin{array}{ll} \mbox{minimize} & \hat{\fobj}(x+\sdir) = \fobj(x) + \nabla \fobj(x)^T \sdir + (1/2) \sdir^T \nabla^2 \fobj(x) \sdir \\ \mbox{subject to} & A(x+\sdir) = b \end{array} $$ where $x\in\optfeasset$
- Newton step for equality constrained minimization problem, defined by solution of KKT system for above convex quadratic minimization problem $$ \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0} \colvectwo{\sdir_\mathrm{nt}}{w} = \colvectwo{-\nabla \fobj(x)}{0} $$ well defined only when KKT matrix is nonsingular
Newton step via solving linearized KKT optimality conditions
- recall KKT optimality conditions for equality constrained convex optimization problem $$ Ax^\ast = b \quad \& \quad \nabla \fobj(x^\ast) + A^T\nu^\ast = 0 $$
- linearize KKT conditions $$ \begin{eqnarray*} && A(x+\sdir) = b \quad \& \quad \nabla \fobj(x) + \nabla^2 \fobj(x) \sdir + A^Tw = 0 \\ &\Leftrightarrow& A\sdir = 0 \quad \& \quad \nabla^2 \fobj(x) \sdir + A^Tw = - \nabla \fobj(x) \end{eqnarray*} $$ where $x\in\optfeasset$
- Newton step defined by above equations is equivalent to that obtained by second-order approximation
Newton decrement for equality constrained minimization
- Newton decrement for equality constrained problem is defined by $$ \lambda(x) = \left(\sdir_\mathrm{nt}^T \nabla^2 \fobj(x) \sdir_\mathrm{nt}\right)^{1/2} $$
- same expression as that for unconstrained minimization, but is different since Newton step $\sdir_\mathrm{nt}$ is different from that for unconstrained minimization, i.e., $\sdir_\mathrm{nt} \neq -\nabla^2 \fobj(x)^{-1} \nabla \fobj(x)$ in general
- however, as before, $$ \fobj(x) - \inf_{\sdir\in\reals^n}\set{\hat{\fobj}(x+\sdir)}{A(x+\sdir)=b} = \lambda(x)^2/2 $$ and $$ \left. \left( \frac{d}{dt}\fobj(x+t\sdir_\mathrm{nt}) \right) \right|_{t=0} = \nabla \fobj(x) ^T \sdir_\mathrm{nt} = - \lambda(x)^2 <0 $$
Feasible Newton's method for equality constrained minimization
- Require: $\fobj$, initial point $x\in \dom \fobj$ with $Ax=b$, tolerance $\epsilon>0$
- loop
- compute Newton step and decrement $\ntsdir(x)$ \& $\lambda(x)$
- stopping criterion - quit if $\lambda(x)^2/2 < \epsilon$
- do line search on $\fobj$ to choose $\slen>0$
- update - $x := x + \slen \ntsdir$
- end loop
- assumes KKT matrix is nonsingular for every step
- is feasible descent method since all iterates are feasible with $\fobj(\xseqk{k+1}) <\fobj(\xseqk{k})$
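- a minimal numpy sketch of the method above (function names and the toy quadratic are made up; a production version would factor the KKT matrix rather than call a dense solve)

```python
import numpy as np

def newton_eq(f, grad, hess, A, x0, eps=1e-8, alpha=0.25, beta=0.5):
    """Feasible Newton's method for: minimize f(x) subject to Ax = b.
    x0 must be feasible; since every Newton step satisfies A dx = 0,
    all iterates remain feasible."""
    x = x0.copy()
    n, p = A.shape[1], A.shape[0]
    while True:
        g, H = grad(x), hess(x)
        # KKT system: [H A^T; A 0] [dx; w] = [-g; 0]
        K = np.block([[H, A.T], [A, np.zeros((p, p))]])
        dx = np.linalg.solve(K, np.concatenate([-g, np.zeros(p)]))[:n]
        lam2 = dx @ H @ dx               # Newton decrement squared
        if lam2 / 2 < eps:               # stopping criterion
            return x
        t = 1.0                          # backtracking line search on f
        while f(x + t * dx) > f(x) - alpha * t * lam2:
            t *= beta
        x = x + t * dx

# toy problem: minimize (1/2)(x1^2 + 2 x2^2) subject to x1 + x2 = 1
f = lambda x: 0.5 * (x[0]**2 + 2 * x[1]**2)
grad = lambda x: np.array([x[0], 2 * x[1]])
hess = lambda x: np.diag([1.0, 2.0])
x = newton_eq(f, grad, hess, np.array([[1.0, 1.0]]), np.array([0.5, 0.5]))
```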
Assumptions for convergence analysis of feasible Newton's method for equality constrained minimization
- feasibility of initial point - $\xseqk{0}\in\dom \fobj \;\&\; A\xseqk{0}=b$
- sublevel set $S = \set{x\in \dom \fobj}{\fobj(x) \leq \fobj(\xseqk{0}),\; Ax=b}$ is closed
- boundedness of Hessian on $S$ $$ \left( \exists M > 0 \right) \left( \forall x\in S \right) \left( \nabla^2 \fobj(x) \preceq M I \right) $$
- boundedness of inverse of KKT matrix on $S$ - corresponds to strong convexity assumption in unconstrained minimization $$ \left( \exists K >0 \right) \left( \forall x \in S \right) \left( \left\| \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0}^{-1} \right\|_2 \leq K \right) $$
- Lipschitz continuity of Hessian on $S$ $$ \left( \exists L > 0 \right) \left( \forall x,y\in S \right) \left( \left\|\nabla^2 \fobj(x) - \nabla^2 \fobj(y)\right\|_2 \leq L \|x-y\|_2 \right) $$
Convergence analysis of feasible Newton's method for equality constrained minimization
- convergence analysis of Newton's method for equality constrained minimization can be done by analyzing unconstrained minimization after eliminating equality constraints
- thus, yields exactly same results as for unconstrained minimization (with different parameter values), i.e.,
- consists of damped Newton phase and quadratic convergence phase
- # iterations required to achieve $\fobj(\xseqk{k})-p^\ast \leq \epsilon$ is $$ \left(\fobj(\xseqk{0})-p^\ast\right)/\gamma + \log_2 \log_2 (\epsilon_0/\epsilon) $$
- # iterations required to achieve $\fobj(\xseqk{k})-p^\ast \leq \epsilon$ for self-concordant functions is also same as for unconstrained minimization $$ \left(\fobj(\xseqk{0}) - p^\ast\right)/{\gamma} + \log_2 \log_2 (1 / \epsilon) $$ where $\gamma = \alpha \beta (1-2\alpha)^2 / (20-8\alpha)$
Newton step at infeasible points
- only assume that $x\in\dom \fobj$ (hence, can be infeasible)
- (as before) linearize KKT conditions $$ \begin{eqnarray*} && A(x+\ntsdir) = b \quad \& \quad \nabla \fobj(x) + \nabla^2 \fobj(x) \ntsdir + A^Tw = 0 \\ &\Leftrightarrow& A\ntsdir = b - Ax \quad \& \quad \nabla^2 \fobj(x) \ntsdir + A^Tw = - \nabla \fobj(x) \\ &\Leftrightarrow& \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0} \colvectwo{\ntsdir}{w} = - \colvectwo{\nabla \fobj(x)}{Ax-b} \end{eqnarray*} $$
- same as feasible Newton step except second component on RHS of KKT system
Interpretation as primal-dual Newton step
- update both primal and dual variables $x$ and $\nu$
- define $r:\reals^n\times\reals^p\to\reals^n\times\reals^p$ by $$ r(x,\nu) = (r_\mathrm{dual}(x,\nu),r_\mathrm{pri}(x,\nu)) $$ where $$ \begin{eqnarray*} \mbox{\define{dual residual}} & - & r_\mathrm{dual}(x,\nu) = \nabla \fobj(x) + A^T\nu \\ \mbox{\define{primal residual}} & - & r_\mathrm{pri}(x,\nu) = Ax-b \end{eqnarray*} $$
Equivalence of infeasible Newton step to primal-dual Newton step
- linearize $r$ to obtain primal-dual Newton step, i.e. $$ \begin{eqnarray*} && r(x,\nu) + D_{x,\nu} r(x,\nu) \colvectwo{\pdsdir}{\pdsdirnu} = 0 \\ &\Leftrightarrow& \mattwotwo{\nabla^2f(x)}{A^T}{A}{0} \colvectwo{\pdsdir}{\pdsdirnu} = - \colvectwo{\nabla f(x) + A^T\nu}{Ax-b} \end{eqnarray*} $$
- letting $\nu^+= \nu + \pdsdirnu$ gives $$ \mattwotwo{\nabla^2f(x)}{A^T}{A}{0} \colvectwo{\pdsdir}{\nu^+} = - \colvectwo{\nabla f(x)}{Ax-b} $$
- equivalent to infeasible Newton step
- reveals that current value of dual variable not needed
Residual norm reduction property
- infeasible Newton step is not descent direction (unlike feasible Newton step) since $$ \begin{eqnarray*} \left. \left( \frac{d}{dt}\fobj(x+t\pdsdir) \right) \right|_{t=0} &=& \nabla \fobj(x) ^T \pdsdir \\ &=& - \pdsdir^T \left(\nabla^2 \fobj(x) \pdsdir + A^Tw \right) = - \pdsdir^T \nabla^2 \fobj(x) \pdsdir + (Ax-b)^Tw \end{eqnarray*} $$ which is not necessarily negative
- however, norm of residual decreases in infeasible Newton direction $$ \begin{eqnarray*} \left. \left( \frac{d}{dt} \|r(y+t\pdsdiry)\|_2^2 \right) \right|_{t=0} & = & - 2 r(y)^T r(y) = - 2 \|r(y)\|_2^2 \\ \Leftrightarrow \quad \left. \left( \frac{d}{dt} \|r(y+t\pdsdiry)\|_2 \right) \right|_{t=0} & = & \frac{-2\|r(y)\|_2^2}{2\|r(y)\|_2} = - \|r(y)\|_2 \end{eqnarray*} $$ where $y=(x,\nu)$ and $\pdsdiry = (\pdsdir, \pdsdirnu)$
- can use $r(\xseqk{k},\nuseqk{k})$ to measure optimization progress for infeasible Newton's method
Full and damped step feasibility property
- assume step length is $t$ at some iteration, then $$ r_\mathrm{pri}(x^+,\nu^+) = Ax^+-b = A(x + t \pdsdir) - b = (1-t) r_\mathrm{pri}(x,\nu) $$
- hence for $l>k$ $$ \seqk{r}{l} = \left( \prod_{i=k}^{l-1} (1-\seqk{t}{i}) \right) \seqk{r}{k} $$
- primal residual reduced by $1-\seqk{t}{k}$ at step $k$
- iterates become (and remain) feasible once full step length ($t=1$) is taken
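- the $(1-t)$ scaling of the primal residual can be checked numerically (the data below are made up for illustration)

```python
import numpy as np

A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x = np.array([0.0, 0.0])            # infeasible point: A x - b = -1
dx = np.array([2/3, 1/3])           # any step with A dx = b - A x, as the
                                    # infeasible Newton step satisfies
t = 0.25                            # damped step length
r_before = A @ x - b
r_after = A @ (x + t * dx) - b      # equals (1 - t) * r_before
```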
Infeasible Newton's method for equality constrained minimization
- Require: $\fobj$, initial point $x\in \dom \fobj$ \& $\nu$, tolerance $\epsilon_\mathrm{pri}>0$ \& $\epsilon_\mathrm{dual}>0$
- repeat
- compute Newton step $\pdsdir(x)$ \& $\pdsdirnu(x)$
- do line search on $r(x,\nu)$ to choose $\slen>0$
- update - $x := x + \slen \pdsdir$ \& $\nu := \nu + \slen \pdsdirnu$
- until $\|r_\mathrm{dual}(x,\nu)\| \leq \epsilon_\mathrm{dual}$ \& $\|Ax-b\| \leq \epsilon_\mathrm{pri}$
- note similarity to and difference from feasible Newton's method
- line search done not on $\fobj$, but on primal-dual residuals $r(x,\nu)$
- stopping criteria depend on $r(x,\nu)$, not on Newton decrement $\lambda(x)^2$
- primal and dual feasibility checked separately - here norm in $\|Ax-b\|$ can be any norm, e.g., $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$, depending on specific application
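- a minimal numpy sketch of the algorithm above (all names and the toy quadratic are made up; the line search is on $\|r\|_2$ as described)

```python
import numpy as np

def infeasible_newton(grad, hess, A, b, x0, nu0, eps=1e-10,
                      alpha=0.25, beta=0.5, max_iter=50):
    """Infeasible-start Newton's method: x0 need only lie in dom f."""
    x, nu = x0.copy(), nu0.copy()
    n, p = A.shape[1], A.shape[0]
    res = lambda x, nu: np.concatenate([grad(x) + A.T @ nu, A @ x - b])
    for _ in range(max_iter):
        r = res(x, nu)
        if np.linalg.norm(r) <= eps:
            break
        K = np.block([[hess(x), A.T], [A, np.zeros((p, p))]])
        d = np.linalg.solve(K, -r)
        dx, dnu = d[:n], d[n:]
        s = 1.0                      # backtracking on the residual norm
        while np.linalg.norm(res(x + s * dx, nu + s * dnu)) > \
                (1 - alpha * s) * np.linalg.norm(r):
            s *= beta
        x, nu = x + s * dx, nu + s * dnu
    return x, nu

# toy problem: minimize (1/2)(x1^2 + 2 x2^2) subject to x1 + x2 = 1,
# started at the infeasible point x = 0
grad = lambda x: np.array([x[0], 2 * x[1]])
hess = lambda x: np.diag([1.0, 2.0])
A, b = np.array([[1.0, 1.0]]), np.array([1.0])
x, nu = infeasible_newton(grad, hess, A, b, np.zeros(2), np.zeros(1))
```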
Line search methods for infeasible Newton's method
- line search method for infeasible Newton's method is same backtracking scheme with $\fobj$ replaced by $\|r(x,\nu)\|_2$, which takes following special form
- Require: \pdsdir, \pdsdirnu, $\alpha\in(0,0.5)$, $\beta\in(0,1)$
- $\slen:=1$
- while $\|r(x +\slen\pdsdir, \nu + \slen\pdsdirnu)\|_2 > (1-\alpha \slen)\|r(x,\nu)\|_2$ do
- $\slen := \beta \slen$
- end while
Pros and cons of infeasible Newton's method
- pros
- do not need to find feasible point separately
- if step length is one at any iteration, following steps coincide with feasible Newton's method - could switch to feasible Newton's method
- cons
- exists no clear way to detect infeasibility - primal residual decreases slowly (phase I method in interior-point methods resolves this problem)
- convergence of infeasible Newton's method can be very slow (until feasibility is achieved)
Assumptions for convergence analysis of infeasible Newton's method for equality constrained minimization
- sublevel set $S = \bigset{(x,\nu)\in \dom \fobj\times \reals^p}{ \|r(x,\nu)\|_2 \leq \|r(\xseqk{0},\nuseqk{0})\|_2 }$ is closed, which holds whenever $\|r\|_2$ is closed function
- boundedness of KKT matrix on $S$ $$ \left( \exists K >0 \right) \left( \forall x \in S \right) \left( \left\| Dr(x,\nu)^{-1} \right\|_2 = \left\| \mattwotwo{\nabla^2 \fobj(x)}{A^T}{A}{0}^{-1} \right\|_2 \leq K \right) $$
- Lipschitz continuity of $Dr$ on $S$ $$ \left( \exists L > 0 \right) \left( \forall (x,\nu), (y,\mu)\in S \right) \left( \left\|Dr(x,\nu) - Dr(y,\mu)\right\|_2 \leq L \|(x,\nu) - (y,\mu)\|_2 \right) $$
- above assumptions imply $\set{x\in\dom \fobj}{Ax=b}\neq\emptyset$ and optimal point $(x^\ast,\nu^\ast)$ exists
Convergence analysis of infeasible Newton's method for equality constrained minimization
- very similar to that for Newton's method for unconstrained minimization
- consists of two phases - like Newton's method for unconstrained minimization or feasible Newton's method for equality constrained minimization
- damped Newton phase - if $\|r(\xseqk{k},\nuseqk{k})\|_2> 1/(K^2L)$ $$ \|r(\xseqk{k+1},\nuseqk{k+1})\|_2 \leq \|r(\xseqk{k},\nuseqk{k})\|_2 - \alpha \beta / K^2L $$
- quadratic convergence phase - if $\|r(\xseqk{k},\nuseqk{k})\|_2 \leq 1/(K^2L)$ $$ \left( K^2L \|r(\xseqk{k},\nuseqk{k})\|_2 / 2 \right) \leq \left( K^2L \|r(\xseqk{k-1},\nuseqk{k-1})\|_2 / 2 \right)^2 \leq \cdots \leq (1/2)^{2^k} $$
- # iterations of infeasible Newton's method required to satisfy $\|r(\xseqk{k},\nuseqk{k})\|_2\leq\epsilon$ $$ \|r(\xseqk{0},\nuseqk{0})\| /(\alpha \beta / K^2L) + \log_2 \log_2 (\epsilon_0/\epsilon) \quad \mbox{where}\; \epsilon_0 = 2/(K^2L) $$
- $(\xseqk{k},\nuseqk{k})$ converges to $(x^\ast,\nu^\ast)$
Barrier Interior-point Methods
Interior-point methods
- want to solve inequality constrained minimization problem
- interior-point methods solve convex optimization problem or its KKT optimality conditions by applying Newton's method to sequence of
- equality constrained problems or
- modified versions of KKT optimality conditions
- discuss interior-point barrier method & interior-point primal-dual method
- hierarchy of convex optimization algorithms
- simplest - linear equality constrained quadratic program - can solve analytically
- Newton's method - solve linear equality constrained convex optimization problem by solving sequence of linear equality constrained quadratic programs
- interior-point methods - solve linear equality \& convex inequality constrained problem by solving sequence of linear equality constrained convex optimization problems
Indicator function barriers
- approximate general convex inequality constrained problem as linear equality constrained problem
- make inequality constraints implicit in objective function $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) + \sum I_-(\fie(x)) \\ \mbox{subject to} & Ax=b \end{array} $$ where $I_-:\reals\to \reals$ is indicator function for nonpositive real numbers, i.e. $$ I_{-}(u) = \left\{\begin{array}{ll} 0 & u\leq 0 \\ \infty & u> 0 \end{array}\right. $$
Logarithmic barriers
- approximate indicator function by logarithmic function $$ \hat{I}_-(u) = -(1/t) \log(-u) \quad \dom \hat{I}_- = -\ppreals $$ for $t>0$ to obtain $$ \begin{array}{ll} \mbox{minimize} & \fobj(x) + \sum -(1/t) \log(-\fie(x)) \\ \mbox{subject to} & Ax=b \end{array} $$
- objective function is convex due to composition rule for convexity preservation, and differentiable
- hence, can use Newton's method to solve it
- function $\phi$ defined by $$ \phi(x) = - \sum \log(-\fie(x)) $$ with $\dom \phi = \set{x\in\xdomain}{\fie(x) \prec 0}$ is called logarithmic barrier or log barrier
- solve sequence of log barrier problems as we increase $t$
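- for affine constraints $c_i^Tx \le d_i$ (i.e., $\fie_i(x) = c_i^Tx - d_i$), the barrier, its gradient, and its Hessian have simple closed forms; a numpy sketch (the function name and box example are made up)

```python
import numpy as np

def log_barrier(C, d, x):
    """phi(x) = -sum_i log(d_i - c_i^T x) for constraints Cx <= d, with
    gradient sum_i c_i/(d_i - c_i^T x)
    and Hessian sum_i c_i c_i^T/(d_i - c_i^T x)^2."""
    slack = d - C @ x
    if np.any(slack <= 0):
        return np.inf, None, None        # x outside dom phi
    phi = -np.sum(np.log(slack))
    g = C.T @ (1.0 / slack)
    H = C.T @ ((1.0 / slack**2)[:, None] * C)
    return phi, g, H

# box 0 <= x <= 1 in one dimension, evaluated at its analytic center x = 1/2,
# where the barrier gradient vanishes
C = np.array([[1.0], [-1.0]]); d = np.array([1.0, 0.0])
phi, g, H = log_barrier(C, d, np.array([0.5]))
```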
Central path
- optimization problem $$ \begin{array}{ll} \mbox{minimize} & t \fobj(x) + \phi(x) \\ \mbox{subject to} & Ax = b \end{array} $$ with $t>0$ where $$ \phi(x) = - \sum \log(-\fie(x)) $$
- solution of above problem, called central point, denoted by $x^\ast(t)$, set of central points, called central path
- intuition says $x^\ast(t)$ will converge to $x^\ast$ as $t\to\infty$
- KKT conditions imply $$ Ax^\ast(t) = b \quad \fie(x^\ast(t)) \prec 0 $$ and exists $\nu^\ast(t)$ such that $$ \begin{eqnarray*} 0 &=& t \nabla \fobj(x^\ast(t)) + \nabla \phi(x^\ast(t)) + t A^T \nu^\ast(t) \\ &=& t\nabla \fobj(x^\ast(t)) - \sum \frac{1}{\fie_i(x^\ast(t))} \nabla\fie_i(x^\ast(t)) + t A^T \nu^\ast(t) \end{eqnarray*} $$
- thus if we let $\lambda_i^\ast(t) = -1/(t\fie_i(x^\ast(t)))$, $x^\ast(t)$ minimizes $$ L(x,\lambda^\ast(t),\nu^\ast(t)) = \fobj(x) + {\lambda^\ast(t)}^T \fie(x) + {\nu^\ast(t)}^T (Ax-b) $$ where $L$ is Lagrangian of original problem
- hence, dual function $g(\lambda^\ast(t),\nu^\ast(t))$ is finite and $$ \begin{eqnarray*} g(\lambda^\ast(t), \nu^\ast(t)) &=& \inf_{x\in\xdomain} L(x,\lambda^\ast(t),\nu^\ast(t)) = L(x^\ast(t),\lambda^\ast(t),\nu^\ast(t)) \\ & = & \fobj(x^\ast(t)) + {\lambda^\ast(t)}^T \fie(x^\ast(t)) + {\nu^\ast(t)}^T (Ax^\ast(t)-b) = \fobj(x^\ast(t)) - m/t \end{eqnarray*} $$ and $$ \fobj(x^\ast(t)) - p^\ast \leq \fobj(x^\ast(t)) - g(\lambda^\ast(t), \nu^\ast(t)) = m/t $$
- that is, $x^\ast(t)$ is no more than $m/t$-suboptimal
- which confirms our intuition that $x^\ast(t)\to x^\ast$ as $t\to\infty$
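- the $m/t$ bound can be checked on a tiny example: for minimize $x$ subject to $0\le x\le 1$ (so $m=2$, $p^\ast=0$), the central point has a closed form (this worked example is made up for illustration)

```python
import numpy as np

# x*(t) minimizes t*x - log(x) - log(1 - x); setting the derivative
# t - 1/x + 1/(1 - x) to zero gives t x^2 - (t + 2) x + 1 = 0,
# whose root in (0, 1) is:
def central_point(t):
    return ((t + 2) - np.sqrt((t + 2)**2 - 4 * t)) / (2 * t)

for t in [1.0, 10.0, 100.0]:
    x = central_point(t)
    assert 0 < x < 1              # strictly feasible
    assert x - 0.0 <= 2.0 / t     # f0(x*(t)) - p* <= m/t
```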
Central path interpretation via KKT conditions
- previous arguments imply that $x$ is central point, i.e., $x=x^\ast(t)$ for some $t>0$ if and only if exist $\lambda$ and $\nu$ such that $$ \begin{eqnarray*} Ax=b \quad \fie({x}) &\preceq& 0 \quad \mbox{- primal feasibility} \\ \lambda &\succeq& 0 \quad \mbox{- dual feasibility} \\ - \lambda_i \fie_i({x}) &=& 1/t \quad \mbox{- complementary $1/t$-slackness} \\ \nabla_x L(x,\lambda,\nu) &=& 0 \quad \mbox{- vanishing gradient of Lagrangian} \end{eqnarray*} $$ called centrality conditions
- only difference between centrality conditions and KKT conditions is complementary $1/t$-slackness
- note that I've just made up term “complementary $1/t$-slackness'' - you won't find this terminology in any literature
- for large $t$, $\lambda^\ast(t)$ & $\nu^\ast(t)$ very closely satisfy (true) complementary slackness
Central path interpretation via force field
- assume exist no equality constraints
- interpret $\phi$ as potential energy by some force field, e.g., electrical field and $t\fobj$ as potential energy by some other force field, e.g., gravity
- then
- force by first force field (in $n$-dimensional space), which we call barrier force, is $$ - \nabla \phi(x) = \sum \frac{1}{\fie_i(x)} \nabla \fie_i(x) $$
- force by second force field, which we call objective force, is $$ - \nabla (t\fobj(x)) = -t \nabla \fobj(x) $$
- $x^\ast(t)$ is point where two forces exactly balance each other
- as $x$ approaches boundary, barrier force pushes $x$ harder away from boundary
- as $t$ increases, objective force pushes $x$ harder toward point where objective potential energy is minimized
Equality constrained problem using log barrier
- central point $x^\ast(t)$ is $m/t$-suboptimal point guaranteed by optimality certificate $g(\lambda^\ast(t),\nu^\ast(t))$
- hence solving below problem provides solution with $\epsilon$-suboptimality $$ \begin{array}{ll} \mbox{minimize} & (m/\epsilon) \fobj(x) + \phi(x) \\ \mbox{subject to} & Ax=b \end{array} $$
- but works only for small problems since for large $m/\epsilon$, objective function is ill-behaved
Barrier methods
- Require: strictly feasible $x$, $t>0$, $\mu>1$, tolerance $\epsilon>0$
- repeat
- centering step - find $x^\ast(t)$ by minimizing $t\fobj + \phi$ subject to $Ax=b$ starting at $x$
- update $x$ - $x := x^\ast(t)$
- (optionally) compute $\lambda^\ast(t)$ \& $\nu^\ast(t)$
- stopping criterion - quit if $m/t<\epsilon$
- increase $t$ - $t := \mu t$
- barrier method, also called path-following method, solves sequence of equality constrained optimization problems with log barrier
- when first proposed by Fiacco and McCormick in 1960s, it was called sequential unconstrained minimization technique (SUMT)
- centering step also called outer iteration
- each Newton iteration within centering step called inner iteration
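- a minimal numpy sketch of the barrier method for the inequality-only case $Cx \preceq d$ with linear objective (no equality constraints, plain Newton centering; all names and the toy LP are made up)

```python
import numpy as np

def centering(c, C, d, x, t, eps=1e-9, alpha=0.25, beta=0.5):
    """Newton's method on psi(x) = t*c^T x + phi(x), phi the log barrier of Cx <= d."""
    psi = lambda x: t * (c @ x) - np.sum(np.log(d - C @ x))
    while True:
        slack = d - C @ x
        g = t * c + C.T @ (1.0 / slack)
        H = C.T @ ((1.0 / slack**2)[:, None] * C)
        dx = np.linalg.solve(H, -g)
        lam2 = -g @ dx                       # Newton decrement squared
        if lam2 / 2 < eps:
            return x
        s = 1.0                              # keep iterates strictly feasible
        while np.any(d - C @ (x + s * dx) <= 0):
            s *= beta
        while psi(x + s * dx) > psi(x) - alpha * s * lam2:
            s *= beta
        x = x + s * dx

def barrier_method(c, C, d, x0, t0=1.0, mu=10.0, eps=1e-6):
    x, t, m = x0.copy(), t0, C.shape[0]
    while m / t >= eps:                      # duality gap bound is m/t
        x = centering(c, C, d, x, t)         # outer (centering) iteration
        t *= mu                              # increase t
    return x

# toy LP: minimize x subject to 0 <= x <= 1  (optimum x* = 0)
c = np.array([1.0])
C = np.array([[1.0], [-1.0]]); d = np.array([1.0, 0.0])
x = barrier_method(c, C, d, np.array([0.5]))
```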
Accuracy in centering in barrier method
- accuracy of centering
- only goal of centering is getting close to $x^\ast$, hence exact calculation of $x^\ast(t)$ not critical as long as approximations of $x^\ast(t)$ go to $x^\ast$
- while cannot calculate $g(\lambda,\nu)$ in this case, below provides dual feasible point when Newton step $\ntsdir$ for centering problem is small, i.e., for nearly centered $x$ $$ \tilde{\lambda}_i = -\frac{1}{t\fie_i(x)} \left( 1 - \frac{\nabla \fie_i(x)^T \ntsdir}{\fie_i(x)} \right) $$
Choices of parameters of barrier method
- choice of $\mu$
- $\mu$ determines aggressiveness of $t$-update
- larger $\mu$, fewer outer iterations, but more inner iterations
- smaller $\mu$, more outer iterations, but fewer inner iterations
- values from $10$ to $20$ for $\mu$ seem to work well
- candidates for choice of initial $t$
- choose $\seqk{t}{0}$ such that
- $$ m / \seqk{t}{0} \approx \fobj(\xseqk{0}) - p^\ast $$
- make central path condition maximally satisfied $$ \seqk{t}{0} = \arginf_{t} \inf_{\tilde{\nu}} \left\| t \nabla \fobj(\xseqk{0}) + \nabla \phi(\xseqk{0}) + A^T \tilde{\nu} \right\| $$
Convergence analysis of barrier method
- assuming $t\fobj + \phi$ can be minimized by Newton's method for $t = \seqk{t}{0}, \mu\seqk{t}{0}, \mu^2\seqk{t}{0}, \ldots$
- at $k$'th step, duality gap achieved is $m/\mu^k\seqk{t}{0}$
- # centering steps required to achieve accuracy of $\epsilon$ is $$ \left\lceil \frac{\log \left(m/(\epsilon \seqk{t}{0})\right)}{\log \mu} \right\rceil $$ plus one (initial centering step)
- for convergence of centering
- for feasible centering problem, $t\fobj + \phi$ should satisfy conditions for convergence of feasible Newton's method, i.e., initial sublevel set is closed, associated inverse KKT matrix is bounded \& Hessian satisfies Lipschitz condition
- for infeasible centering problem, $t\fobj + \phi$ should satisfy corresponding conditions for convergence of infeasible Newton's method
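- the centering-step count above is easy to evaluate; for instance (the numbers below are made up), $m=100$, $\seqk{t}{0}=1$, $\mu=20$, $\epsilon=10^{-6}$ gives $\lceil\log(10^8)/\log 20\rceil = 7$ centering steps plus the initial one

```python
import math

def num_centering_steps(m, t0, mu, eps):
    """ceil(log(m/(eps*t0)) / log(mu)): centering steps after the initial
    one needed so the duality gap bound m/t drops below eps."""
    return math.ceil(math.log(m / (eps * t0)) / math.log(mu))

n = num_centering_steps(m=100, t0=1.0, mu=20.0, eps=1e-6)
```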
Primal-dual Interior-point Methods
Primal-dual \& barrier interior-point methods
- in primal-dual interior-point methods
- both primal and dual variables are updated at each iteration
- search directions are obtained from Newton's method, applied to modified KKT equations, i.e., optimality conditions for logarithmic barrier centering problem
- primal-dual search directions are similar to, but not quite the same as, search directions arising in barrier methods
- primal and dual iterates are not necessarily feasible
- primal-dual interior-point methods
- often more efficient than barrier methods especially when high accuracy is required - can exhibit better than linear convergence
- (customized versions) outperform barrier method for several basic problems, such as LP, QP, SOCP, GP, SDP
- can work for feasible, but not strictly feasible problems
- still active research topic, but show great promise
Modified KKT conditions and central points
- modified KKT conditions (for convex optimization problem) expressed as $r_t(x,\lambda,\nu)=0$ where $$ r_t(x,\lambda,\nu) = \colvecthree {\nabla \fobj(x) + D\fie(x)^T\lambda + A^T\nu} {-\diag(\lambda)\fie(x) - (1/t) \ones} {Ax-b} $$ and $$ \begin{eqnarray*} \mbox{\define{dual residual}} &-& r_\mathrm{dual}(x,\lambda,\nu) = {\nabla \fobj(x) + D\fie(x)^T\lambda + A^T\nu} \\ \mbox{\define{centrality residual}} &-& r_\mathrm{cent}(x,\lambda,\nu) = {-\diag(\lambda)\fie(x) - (1/t) \ones} \\ \mbox{\define{primal residual}} &-& r_\mathrm{pri}(x,\lambda,\nu) = {Ax-b} \end{eqnarray*} $$
- if $x$, $\lambda$, $\nu$ satisfy $r_t(x,\lambda,\nu)=0$ (and $\fie(x) \prec 0$), then
- $x=x^\ast(t)$, $\lambda=\lambda^\ast(t)$, $\nu=\nu^\ast(t)$
- $x$ is primal feasible and $\lambda$ & $\nu$ are dual feasible with duality gap $m/t$
Primal-dual search direction
- assume current (primal-dual) point $y=(x,\lambda,\nu)$ and Newton step $\sdiry =( \sdir, \sdirlbd, \sdirnu)$
- as before, linearize equation to obtain Newton step, i.e., $$ r_t(y+\sdiry) \approx r_t(y) + Dr_t(y) \sdiry = 0 \quad \Leftrightarrow \quad \sdiry = -Dr_t(y)^{-1} r_t(y) $$ hence $$ \begin{my-matrix}{ccc} \nabla^2 f(x) + \sum \lambda_i \nabla^2 \fie_i(x) & D\fie(x)^T & A^T \\ -\diag(\lambda) D\fie(x) & -\diag(\fie(x)) & 0 \\ A & 0 & 0 \end{my-matrix} \colvecthree{\sdir}{\sdirlbd}{\sdirnu} = - \colvecthree {r_\mathrm{dual}} {r_\mathrm{cent}} {r_\mathrm{pri}} $$
- above equation determines primal-dual search direction $\pdsdiry = (\pdsdir, \pdsdirlbd, \pdsdirnu)$
Surrogate duality gap
- iterates $\xseqk{k}$, $\lbdseqk{k}$, and $\nuseqk{k}$ of primal-dual interior-point method are not necessarily feasible
- hence, cannot easily evaluate duality gap $\seqk{\eta}{k}$ as for barrier method
- define surrogate duality gap for $\fie(x) \prec 0$ and $\lambda\succeq0$ as $$ \hat{\eta}(x,\lambda) = - \fie(x)^T \lambda $$
- $\hat{\eta}$ would be duality gap if $x$ were primal feasible and $\lambda$ & $\nu$ were dual feasible
- value $t$ corresponding to surrogate duality gap $\hat{\eta}$ is $m/\hat{\eta}$
Primal-dual interior-point method
- Require: initial point $x$ with $\fie(x)\prec0$, $\lambda \succ 0$, $\mu > 1$, $\epsilon_\mathrm{pri}>0$, $\epsilon_\mathrm{dual}>0$, $\epsilon>0$
- repeat
- set $t := \mu m /\hat{\eta}$
- compute primal-dual search direction $\pdsdiry = (\pdsdir, \pdsdirlbd, \pdsdirnu)$
- do line search to choose $s>0$
- update - $x := x + s \pdsdir$, $\lambda := \lambda + s \pdsdirlbd$, $\nu := \nu + s \pdsdirnu$
- until $\|r_\mathrm{pri}(x,\lambda,\nu)\|_2\leq \epsilon_\mathrm{pri}$, $\|r_\mathrm{dual}(x,\lambda,\nu)\|_2\leq \epsilon_\mathrm{dual}$, $\hat{\eta} \leq \epsilon$
- common to choose small $\epsilon_\mathrm{pri}$, $\epsilon_\mathrm{dual}$, & $\epsilon$ since primal-dual method often shows faster than linear convergence
Line search for primal-dual interior-point method
- line search is standard backtracking line search on $\|r(x,\lambda,\nu)\|_2$, similar to that for infeasible Newton's method, except making sure that $\fie(x) \prec 0$ and $\lambda\succ0$
- note initial $s$ below is $0.99$ times largest $s$ that keeps $\lambda + s\pdsdirlbd$ nonnegative
- Require: \pdsdir, \pdsdirlbd, \pdsdirnu, $\alpha\in(0.01,0.1)$, $\beta\in(0.3,0.8)$
- $s:= 0.99\sup\set{s\in[0,1]}{\lambda + s \pdsdirlbd \succeq 0} = 0.99\min\{1,\min\set{-\lambda_i/{\pdsdirlbd}_i}{{\pdsdirlbd}_i < 0}\}$
- while $\fie (x +s\pdsdir) \not \prec 0$ do
- $s := \beta s$
- end while
- while $\|r(x +s\pdsdir, \lambda + s\pdsdirlbd, \nu + s\pdsdirnu)\|_2 > (1-\alpha s)\|r(x,\lambda,\nu)\|_2$ do
- $s := \beta s$
- end while
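- the initialization of $s$ in the first line of the line search can be sketched as (the helper name is made up)

```python
import numpy as np

def initial_step(lam, dlam):
    """0.99 * sup{ s in [0, 1] : lam + s*dlam >= 0 }: the largest step,
    pulled back by factor 0.99, that keeps the dual variable positive."""
    neg = dlam < 0
    if not np.any(neg):
        return 0.99                 # no component of lam decreases
    return 0.99 * min(1.0, np.min(-lam[neg] / dlam[neg]))

s = initial_step(np.array([1.0, 2.0]), np.array([-2.0, 1.0]))   # 0.99 * 0.5
```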