CHAPTER 5: Duality Flashcards
(35 cards)
Recall the method of Lagrange multipliers, which is used to find the critical points of a function subject to equality constraints. For example, to find the minimum of f(x1, x2) = x1^2 given that x1 and x2 satisfy x1 + x2 = 1, you define a Lagrangian function
L = f(x1, x2) − λ(x1 + x2 − 1),
where λ is the Lagrange multiplier. The minimum of f (with the constraint) is one of the critical points of L, where ∂L/∂x_i = 0 and ∂L/∂λ = 0.
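The worked example above can be reproduced symbolically; below is a minimal sketch using sympy (the variable names are ours, not part of the original):

```python
# Equality-constrained example: minimise f(x1, x2) = x1**2 subject to x1 + x2 = 1.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2
L = f - lam*(x1 + x2 - 1)          # Lagrangian with multiplier lam

# Critical points: all partial derivatives of L vanish.
sols = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam], dict=True)
print(sols)   # the unique critical point: x1 = 0, x2 = 1, lam = 0
```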
You will see that the definitions of the Lagrangian functions we introduce below are very similar to the one above, except that we now will be dealing with inequality constraints in general. It turns out that it is easier to see the key features of a dual problem from a more general point of view, without the details related to LPPs. We therefore consider the following more general problem:
max z = f(x),
subject to g(x) ≤ 0, where g(x) = (g1(x), …, gm(x))^T ∈ R^m, x ∈ R^n.
We let x∗ be the feasible optimal solution for the problem, i.e., f(x∗) = max f(x) and g(x∗) ≤ 0, and A denotes the feasible region, i.e.,
A = {x ∈ Rn : g(x) ≤0}.
Note the constraint xj ≥ 0 has been included in the above problem, because it can be written as −xj ≤ 0 and we can take −xj as one of the components of g(x). Remark. In terms of the feasible region A, the problem can be written as max_{x∈A} f(x), where the constraints are not shown explicitly, but are implicitly included in the domain over which we search for the maximum.
DEF 5.1
Lagrangian function, Lagrangian dual function
The Lagrangian function L(x,y) associated with the maximisation problem in Eq. 5.1 is defined as:
L(x, y) = f(x) − y^T g(x)
= f(x) − ∑_{i=1}^m y_i g_i(x),
where y ∈ Rm, x ∈ Rn and y ≥ 0.
y is the vector of the dual variables.
(y is a vector of new variables that we introduce.)
For each constraint there is a corresponding dual variable.
The Lagrangian dual function is defined as:
v(y) = max _{x∈Rn} L(x,y), for y≥0.
Motivation for the Lagrangian function
To solve the problem
max z = f(x),
subject to g(x) ≤ 0, where g(x) = (g1(x), …, gm(x))^T ∈ R^m, x ∈ R^n,
one may try to combine the constraint and the cost function into a single function, and somehow enforce the constraint directly. One way is to replace the problem by
max_{x∈R^n} l(x), with l(x) = f(x) + P(g(x))
(without constraints),
where l(x) is the penalized cost function and P(·) is the penalty function.
- P(u) takes very large negative values when u > 0, and is nearly 0 when u ≤ 0.
- P(g(x)) makes a very large negative contribution when g(x) > 0 (in any component).
- So the maximum of l(x) is never found at an x where g(x) > 0.
- Thus the solution is the same as if g(x) ≤ 0 were imposed.
Since l(x) ≈ f(x) when g(x) ≤ 0, the solution would be essentially the same as the original constrained problem.
The Lagrangian function L(x, y) may be regarded as a poor, but much simpler, approximation of l(x), with P(g(x)) = −y^T g(x) for some y ≥ 0.
[Diagram: the ideal penalty P(u), nearly 0 for u ≤ 0 and dropping steeply for u > 0, together with its Lagrangian approximation −yu, a dashed straight line decreasing through the origin.]
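The penalty idea can be tried numerically. The sketch below assumes a small hypothetical instance, f(x) = 4 + 2x − x^2 with the single constraint g(x) = x − 2 ≤ 0, and a steep piecewise-linear penalty; none of these choices are prescribed by the text.

```python
# Penalty-method sketch on a hypothetical one-dimensional instance.
import numpy as np

f = lambda x: 4 + 2*x - x**2
g = lambda x: x - 2                      # constraint g(x) <= 0
M = 1e6                                  # steepness of the penalty

def P(u):
    # nearly 0 for u <= 0, very large negative for u > 0
    return -M * np.maximum(u, 0.0)

xs = np.linspace(-5, 5, 100001)
l = f(xs) + P(g(xs))                     # penalised cost function
x_pen = xs[np.argmax(l)]                 # unconstrained maximiser of l

# The penalty drives the maximiser into the feasible region g(x) <= 0.
print(x_pen)                             # close to the constrained maximiser x = 1
assert g(x_pen) <= 0
```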
The Lagrangian dual function is the optimal value of the penalized cost function. Though the Lagrangian function appears to be a poor choice of penalized cost function if we were using the penalty method, it has the advantage of being simple while retaining some key properties of the penalty method.
PROPOSITION 5.1
*Usefulness of the dual problem: relating v(y) and f(x).
v(y) ≥ f(x*) ≥ f(x) for any y ≥ 0 and any feasible solution x of
max z = f(x),
subject to g(x) ≤ 0, where g(x) = (g1(x), …, gm(x))^T ∈ R^m, x ∈ R^n
(i.e., x ∈ A).
* v(y) for any y ≥ 0 is an upper bound for f(x), so it provides useful information when the exact maximum of f(x) is difficult to find.
Proof
v(y) ≥ f(x*) ≥ f(x) for y ≥ 0 and x being a feasible solution of
max z = f(x),
subject to g(x) ≤ 0, where g(x) = (g1(x), …, gm(x))^T ∈ R^m, x ∈ R^n.
1) f(x*) ≥ f(x) follows from the definition of x*.
2) Note v(y) = max_{x∈R^n} L(x, y) = max_{x∈R^n} [f(x) − y^T g(x)].
Since x* is feasible, we have g(x*) ≤ 0, hence −y^T g(x*) ≥ 0 for y ≥ 0.
Thus v(y) = max_{x∈R^n} L(x, y) ≥ L(x*, y) = f(x*) − y^T g(x*) ≥ f(x*)
(*using the definition of L).
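The bound in Proposition 5.1 can be spot-checked numerically. Below is a sketch on a hypothetical one-dimensional instance (max 4 + 2x − x^2 subject to x − 2 ≤ 0), where v(y) has a closed form because L(x, y) is a concave quadratic in x:

```python
# Spot-check of weak duality v(y) >= f(x) on a hypothetical instance.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 4 + 2*x - x**2

def v(y):
    # v(y) = max_x [f(x) - y*(x - 2)]; the concave quadratic peaks at x = (2 - y)/2
    x = (2 - y) / 2
    return f(x) - y*(x - 2)

for _ in range(100):
    y = rng.uniform(0, 10)               # any y >= 0
    x = rng.uniform(-10, 2)              # any feasible x, i.e. g(x) = x - 2 <= 0
    assert v(y) >= f(x) - 1e-9           # v(y) is an upper bound on f over A
```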
Def 5.2
Can we find the minimum of v(y), and hence obtain the best (smallest) upper bound for f(x)?
To answer this question, we need to solve the following problem: min v(y) subject to y ≥ 0. This problem is the dual problem of the problem
max z = f(x),
subject to g(x) ≤ 0, where g(x) = (g1(x), …, gm(x))^T ∈ R^m, x ∈ R^n.
The dual problem of the primal problem
max z = f(x),
subject to g(x) ≤ 0, where g(x) = (g1(x), …, gm(x))^T ∈ R^m, x ∈ R^n
is: min v(y), subject to y ≥ 0,
where v(y) is the Lagrangian dual function, i.e.,
v(y) = max_{x∈R^n} L(x, y), where L(x, y) = f(x) − y^T g(x),
and y is the vector of the dual variables
NOTES 5.2
the dual problem of the problem
max z = f(x),
subject to g(x) ≤ 0, where g(x) = (g1(x), …, gm(x))^T ∈ R^m, x ∈ R^n:
The Lagrangian dual function v(y) is always convex (or concave, if the primal problem is a minimisation problem), so the dual problem is a convex minimisation problem (respectively, a concave maximisation problem), which makes it easier to solve if we can write the problem down explicitly.
EXAMPLE 5.1
The primal problem is defined as:
max f(x) = 4+2x−x^2, s. t.
x ≤ 2.
Find the Lagrangian dual function of the problem.
Lagrangian function:
L(x, y) = f(x) − y g(x) = 4 + 2x − x^2 − y(x − 2).
For fixed y, ∂L/∂x = 2 − 2x − y and ∂^2L/∂x^2 = −2 < 0, so setting ∂L/∂x = 0 gives a maximum at x = (2 − y)/2. Hence
v(y) = max_{x∈R} L(x, y) = 4 + 2y + (2 − y)^2/4 = 5 + y + y^2/4, for y ≥ 0.
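The algebra for Example 5.1 can be verified with sympy; a minimal sketch:

```python
# Verifying Example 5.1: max f(x) = 4 + 2x - x**2 subject to x <= 2.
import sympy as sp

x, y = sp.symbols('x y', real=True)
L = 4 + 2*x - x**2 - y*(x - 2)           # Lagrangian, with g(x) = x - 2

xstar = sp.solve(sp.diff(L, x), x)[0]    # stationary point x = (2 - y)/2
v = sp.expand(L.subs(x, xstar))          # dual function v(y) = y**2/4 + y + 5
print(v)
assert v.subs(y, 0) == 5                 # min over y >= 0 is at y = 0, v_min = 5
```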
Example 5.2. The primal problem is defined as: max f(x) = 4 + 2x−x^2, s. t.
x(3−x) ≤ 0.
Find the Lagrangian dual function of the problem.
Lagrangian function
L(x,y)=4+2x-x^2-yx(3-x)
we have
∂L/∂x = 2 − 2x − 3y + 2xy,
∂^2L/∂x^2 = −2 + 2y
If y = 1 then ∂L/∂x = −1 < 0, so L is a decreasing linear function of x and max L(x, y) = +∞ for y = 1. If y ≠ 1, letting ∂L/∂x = 0 gives the stationary point x = (3y − 2)/(2(y − 1)), which corresponds to a minimum if y > 1, since then ∂^2L/∂x^2 > 0; hence v(y) = max L(x, y) = +∞ for y > 1.
The stationary point corresponds to a maximum when y < 1:
v(y) = L(x, y)|_{x=(3y−2)/(2(y−1))} = 4 − (3y − 2)^2/(4(y − 1)),
which is defined (finite) for 0 ≤ y < 1.
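The result for Example 5.2 can likewise be checked symbolically. Note that the derivative of v has two zeros; only y = 2/3 lies in [0, 1), where v is finite.

```python
# Verifying Example 5.2: max f(x) = 4 + 2x - x**2 subject to x*(3 - x) <= 0.
import sympy as sp

x, y = sp.symbols('x y', real=True)
L = 4 + 2*x - x**2 - y*x*(3 - x)         # Lagrangian, with g(x) = x*(3 - x)

xstar = sp.solve(sp.diff(L, x), x)[0]    # (3y - 2)/(2(y - 1)), valid for y != 1
v = sp.simplify(L.subs(x, xstar))        # v(y) = 4 - (3y - 2)**2/(4(y - 1))
ystar = sp.solve(sp.diff(v, y), y)       # stationary points 2/3 and 4/3 (order may vary)
print(ystar)
print(v.subs(y, sp.Rational(2, 3)))      # v_min = v(2/3) = 4
```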
The dual function and the primal function
We discuss briefly the relation between the dual function and the primal function in this section.
PROP 5.2 WEAK DUALITY CONDITION
Let y* be the feasible optimal solution for the dual problem, i.e., v(y*) = min_{y≥0} v(y) and y* ≥ 0. Then v(y*) ≥ f(x*).
Proof: It follows from Proposition 5.1: v(y) ≥ f(x*) for any y ≥ 0, so in particular v(y*) ≥ f(x*).
DUALITY GAP
Remark. The weak duality condition is satisfied by any primal-dual pair. In general, v(y*) ≠ f(x*), and the difference v(y*) − f(x*) is called the duality gap.
DEF 5.3 STRONG DUALITY
We say the strong duality condition holds if v(y*) = f(x*), i.e., if the optimal costs for the primal and dual problems are the same.
EXAMPLE 5.3
It can be checked that the strong duality condition holds in both Example 5.1 and Example 5.2.
For Example 5.1, we find x* = 1 with f_max = 5, and y* = 0 with v_min = 5. For the primal problem in Example 5.2, the constraint is equivalent to x ≤ 0 or x ≥ 3. Therefore we find f_max = f(0) = 4 with x* = 0. Letting dv/dy = 0, we find y* = 2/3 and v_min = v(y*) = 4.
THEOREM 5.1 COMPLEMENTARY SLACKNESS
If the strong duality condition holds, then y*_i g_i(x*) = 0 for all i. These equations are called the complementary slackness conditions.
PROOF
From the proof of Proposition 5.1, we have
v(y) = max_{x∈R^n} L(x, y) ≥ L(x*, y) = f(x*) − y^T g(x*) ≥ f(x*),
where the last step follows from the fact that −y^T g(x*) ≥ 0, since y ≥ 0 and g(x*) ≤ 0 due to the feasibility of x*. The above expression is true for any y ≥ 0, so it is also true for y*. Hence
v(y*) ≥ f(x*) − y*^T g(x*) ≥ f(x*).
By strong duality, we have f(x*) = v(y*). This means
f(x*) ≥ f(x*) − y*^T g(x*) ≥ f(x*).
Thus y*^T g(x*) = 0. We also have y*_i g_i(x*) ≤ 0 for each i, since y*_i ≥ 0 and g_i(x*) ≤ 0. Therefore, for y*^T g(x*) = 0, we need y*_i g_i(x*) = 0 for all i.
Remark. Consider a maximisation primal problem, and assume the strong duality condition holds. We can show that x = x* is a maximum of L(x, y*) (as a function of x). To see this, we note
L(x, y*) = f(x) − y*^T g(x) ≤ max_{x∈R^n} L(x, y*) = v(y*) = f(x*).
On the other hand, L(x*, y*) = f(x*), since y*^T g(x*) = 0. Therefore we have L(x, y*) ≤ L(x*, y*), which means x* is a maximum of L(x, y*).
As a result, the gradient of L(x, y*) with respect to x must be zero at x = x*, i.e.,
∂L(x, y*)/∂x_i |_{x=x*} = 0 for i = 1, 2, …, n.
Substituting in the expression for L(x, y*), we have
∂f(x)/∂x_i |_{x=x*} − ∑_{j=1}^m y*_j ∂g_j(x)/∂x_i |_{x=x*} = 0.
THEOREM 5.2 KKT CONDITIONS
This narrows down where the optimum should be.
WE WILL APPLY THIS IN THE EXAM.
Let x* and y* be the optimal solutions for the primal and dual problems, respectively.
If the strong duality condition holds, then x* and y* satisfy the following Karush-Kuhn-Tucker (KKT) conditions:
∂L(x, y*)/∂x_i |_{x=x*} ≡ ∂f(x)/∂x_i |_{x=x*} − ∑_{j=1}^m y*_j ∂g_j(x)/∂x_i |_{x=x*} = 0 (i = 1, 2, …, n),
g_j(x*) ≤ 0,
y*_j ≥ 0,
y*_j g_j(x*) = 0,
(j = 1, 2, …, m).
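For a concrete illustration, the KKT system of Example 5.2 can be solved by computer. Note that the KKT conditions narrow down the candidates but may admit more than one point; comparing f at the surviving candidates picks out the optimum.

```python
# Solving the KKT conditions of Example 5.2 symbolically (a sketch):
# stationarity 2 - 2x - 3y + 2xy = 0, slackness y*x*(3-x) = 0,
# primal feasibility x*(3-x) <= 0, dual feasibility y >= 0.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = 4 + 2*x - x**2
stationarity = 2 - 2*x - 3*y + 2*x*y     # dL/dx = 0
slackness = y * x * (3 - x)              # y * g(x) = 0

candidates = sp.solve([stationarity, slackness], [x, y], dict=True)
# Keep only candidates satisfying primal and dual feasibility.
kkt = [s for s in candidates if s[x]*(3 - s[x]) <= 0 and s[y] >= 0]

# More than one KKT point may survive; the maximiser has the largest f.
best = max(kkt, key=lambda s: f.subs(x, s[x]))
print(best)                              # x* = 0, y* = 2/3
```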
NOTES
The last condition in the KKT conditions is the complementary slackness condition. The first equation should be familiar to you, because the same condition is used in the method of Lagrange multipliers to find the critical points of a function subject to (equality) constraints.
The above theorem shows that we can find x* and y* by solving the equations in the above KKT conditions. It forms the basis of a whole suite of new methods that complement the simplex methods. We will expand a bit more on this later.
Duality for a minimisation primal problem
DEF 5.4 DUALITY FOR MINIMISATION PROBLEM
For a minimisation problem min f(x) subject to g(x) ≤ 0, the Lagrangian function is defined as
L(x, y) = f(x) + y^T g(x)
with y ≥ 0.
The second term is always non-positive, so that L(x, y) provides a lower bound for f(x). The Lagrangian dual function is defined as v(y) = min_{x∈R^n} L(x, y) and the dual problem is: max v(y), with y ≥ 0
In this case f(x) ≥ f(x*) ≥ v(y), and the weak duality condition is f(x*) ≥ v(y*). The negative sign in the first of the KKT conditions is changed to +; the rest remain the same.
MEMORIZE KKT
If the definitions appear difficult to memorize, it may help to remember that the Lagrangian functions are penalized cost functions; the added terms penalize the cost when the constraints are not satisfied. For minimisation problems, the added term is +y^T g(x) exactly because that is penalizing: it is positive when g > 0 (i.e., when the constraint is not satisfied), so that L would be bigger than f (hence penalized).
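A minimisation-case sketch, on a hypothetical instance min x^2 subject to 1 − x ≤ 0 (i.e., x ≥ 1), shows the + sign at work:

```python
# Minimisation-case duality: min x**2 subject to g(x) = 1 - x <= 0.
import sympy as sp

x, y = sp.symbols('x y', real=True)
L = x**2 + y*(1 - x)                     # note the + sign for a min problem

xstar = sp.solve(sp.diff(L, x), x)[0]    # x = y/2 (a minimum: L is convex in x)
v = sp.expand(L.subs(x, xstar))          # v(y) = y - y**2/4, a lower bound on f
ystar = sp.solve(sp.diff(v, y), y)[0]    # maximise v over y >= 0: y* = 2
print(ystar, v.subs(y, ystar))           # y* = 2, v(y*) = 1 = f(1): strong duality
```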
The dual problems of LPPs
When the primal problem is an LPP, the above general definition leads to a rather simple expression for the dual problem, which is consistent with the dietician-salesman example we described before. In this sub-section, the LPPs are given in their original formulation, where only decision variables are present, unless specified otherwise explicitly.
EXAMPLE 5.4
(The dual of a maximisation LPP with ≤ constraints)
Find the dual problem for the following
primal LPP:
max z = c^Tx, subject to Ax ≤ b, x ≥ 0,
where b ∈ R^m and x ∈ R^n
Here x does NOT include the slack variables or artificial variables. We assume
there are n decision variables x1, x2, …, xn and m constraints.
The constraints are Ax ≤ b, i.e., Ax − b ≤ 0, and −x ≤ 0, and the cost function f(x) = c^Tx.
Introduce dual variables y ≥ 0 corresponding to the constraints Ax − b ≤ 0, and w ≥ 0 corresponding to −x ≤ 0, respectively. The Lagrangian function can then be written as
L(x, y, w) = c^Tx − y^T(Ax − b) − w^T(−x)
= y^T b + (c^T + w^T − y^T A)x.
The Lagrangian dual function is
v(y, w) = max_{x∈R^n} L(x, y, w) = y^T b + max_{x∈R^n} [(c^T + w^T − y^T A)x].
The maximum over x is +∞ unless c^T + w^T − y^T A = 0^T, i.e., A^T y = c + w. Since w ≥ 0, v(y, w) is finite exactly when A^T y ≥ c, and then v = b^T y. The dual problem is therefore
min b^T y, subject to A^T y ≥ c, y ≥ 0.
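Strong duality for LPPs can be checked numerically; the sketch below uses scipy.optimize.linprog on hypothetical data A, b, c (linprog minimises, hence the sign flips):

```python
# Numerical check of LPP duality on hypothetical data (not from the text).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])

# Primal: max c^T x s.t. Ax <= b, x >= 0.  linprog minimises, so negate c.
primal = linprog(-c, A_ub=A, b_ub=b)     # x >= 0 is linprog's default bound

# Dual: min b^T y s.t. A^T y >= c, y >= 0, rewritten as -A^T y <= -c.
dual = linprog(b, A_ub=-A.T, b_ub=-c)

print(-primal.fun, dual.fun)             # equal optimal values: strong duality
```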
PROPOSITION 5.3 dual of the dual LPP
The dual of the dual LPP is the same as the original primal LPP.