Optimization lectures: Lecture 2(b) Descents Methods

M.Sc. Molecular Modelling, Department of Crystallography, Birkbeck College

Molecular Interactions Unit: Energy Minimization


Oliver Smart
(c) O.S. Smart 1996, all rights reserved

file last modified Thur 5 Mar 1996

n.b., need modern version of netscape to read this document correctly.


Lecture 2 (b) - Descent Methods

We have seen how to perform a univariant optimization where we minimize the a function of single variable f(x). Now we will see how to adapt this for multivariant optimization where we want to minimize a function of many variables f(X) (where X is a vector:

  1. Start at some given point X1
  2. Assign i = 1.
  3. Choose a search direction Di
  4. Use a line minimization method (e.g., Golden Section) to minimize f(Xi + Di) by varying the scalar .
  5. Replace Xi+1 = Xi + minDi
  6. Have we converged? (An important question)
    • YES: Output result and Stop.
    • NO: go back to step 3.
This is a general description of a descent method.


The question is how do we choose the search directions Di?

(a) Method of Alternating Variables (Dumb!)

Simply take the unit vector along i, so Di = .

This can be an awful choice even for a 2D function:

As can be seen the step size gets smaller and smaller as the procedure progress the minimum is never reached. The method is in general non-convergent on a quadratic surface.

(b) Steepest Descents

Instead of choosing the set of unit vectors choose to minimize along the direction of the greatest slope, minus the gradient or the force:

where: It is convenient to think of the gradient indicating the direction that water would flow down a hill on a map. Steepest descents shows reasonable convergence initially: but progress gets slow as the minimum is approached. Fletcher shows that the method is convergent on a quadratic surface but requires an infinite number of steps!

In practice steepest descents is normally used without line searches. Instead some initial path length is choosen and a step of this length is taken down the unit vector in the direction of the force. If the step produces an increase in energy then the step length has got to large and it is halved for the next step. But if the step produces a decrease then it may be far to small and the step length is increased by 20%. This is called crude steepest descents and you will use it in the mini-project the pseudo-code is:

  1. Start at some given point X1
  2. Assign i = 1, = 0.001.
  3. Find potential energy PE = f(Xi) and force vector F = -(Xi)
  4. is PE < PEold?
    • YES: = 1.2
    • NO : = 0.5

  5. Di= Fi
  6. Replace Xi+1 = Xi + Di
  7. Store PEold =PE
  8. Have we converged? Is the rms force |Fi| < 0.001 ?
    • YES: Output result and Stop.
    • NO: go back to step 3.

(c) Conjugate Gradients

This begs the question can a better set of search directions be choosen? Not surprisingly the answer is yes.

The conjugate gradients method chooses search directions using information gained from previous searches. This is in contrast to steepest descents in which all previous information is ignored and the search direction is set to the force vector Fi. The search directions for Fletcher-Reeves Conjugate gradients are given:

Thus the search direction for step i depends on the the direction for the previous step (i-1) as well as the current force vector.

The routine can be given in pseudo-code:

  1. Start at some given point X1
  2. Assign i = 1.
  3. Find potential energy PE(Xi) and force vector F = -(Xi)
  4. Choose a search direction:
    is remainder[(i-1)/ND] = 0 ?
    • yes: Di = Fi
    • no: Di = Fi + Di-1
  5. Use a line minimization method to minimize PE(Xi + Di) by varying the scalar .
  6. Replace Xi+1 = Xi + minDi
  7. Have we converged:
    rms force |Fi| < pre-set limit ?
    • YES: Output result and Stop.
    • NO: go back to step 3.
The basis of the method is that for an quadratic surface the Fletcher-Powell search directions are conjugate to the Hessian, H:
DitHDj = 0
when i is not equal to j
This is important because for a quadratic surface conjugate search directions result in the following relations:
Di.Fj = 0
Fi.Fj = 0
when i is not equal to j
To see what this basically means consider that we do a line search along a particular direction thus ensuring the gradient along the direction is zero. If the next direction is conjugate to the first the subsequent search will not affect the gradient along the original direction. Thus by looking along n (independent) directions for an n-dimensional space we get n directions for which the gradient is zero i.e., the gradient is zero in directions. For a n-dimensional space quadratic surface a conjugate gradients method will converge in n or less steps.

  • Conjugate gradients is a reasonably robust technique which is widely used in biomolecular applications.
  • It has an advantage over quasi-Newtonian tecniques in that it only requires the storage of 3 vectors of length n instead of an n by n matrix.
  • Clever - directions are conjugate to the Hessian but we never calculate it.
  • In practice have non-quadratic surfaces so after n steps normal to set search direction back to steepest descents - hence step 4's question above.

  • On to next section: Lecture 3(a) Newtonian methods