Global Convergence Properties of the HS Conjugate Gradient Method

It is well known that global convergence has not been established for the Hestenes-Stiefel (HS) conjugate gradient method under the traditional line search conditions. In this paper, under some suitable conditions and using a modified Armijo line search, global convergence results are established for the HS method. Preliminary numerical results on a set of large-scale problems show that the computational efficiency of the HS method is encouraging.

Mathematics Subject Classification: 90C06; 90C30; 65K05


Introduction
Consider the unconstrained optimization problem

min f(x), x ∈ R^n,   (1.1)

where f : R^n → R is continuously differentiable. For (1.1), a line search method usually takes the iterative formula

x_{k+1} = x_k + α_k d_k,   (1.2)

where x_k is the current iterate, α_k > 0 is a steplength and d_k is a search direction. Different choices of d_k and α_k determine different line search methods ([23,25,26]). We denote f(x_k) by f_k, ∇f(x_k) by g_k, and ∇f(x_{k+1}) by g_{k+1}; ||·|| denotes the Euclidean norm of vectors, and we define y_k = g_{k+1} − g_k.
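As a minimal illustration of the iteration x_{k+1} = x_k + α_k d_k, the following sketch applies it to a small quadratic, using the steepest-descent direction and the exact one-dimensional minimizer as the steplength. The matrix A and vector b are hypothetical test data, not taken from the paper.

```python
import numpy as np

# Hypothetical test problem: f(x) = 0.5 x^T A x - b^T x, with A positive definite.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b          # g(x) = grad f(x)

x = np.zeros(2)
for k in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:   # stop when the gradient (numerically) vanishes
        break
    d = -g                          # search direction d_k
    alpha = (g @ g) / (d @ A @ d)   # exact minimizer of f along d_k (quadratic case)
    x = x + alpha * d               # x_{k+1} = x_k + alpha_k d_k
```

For this well-conditioned quadratic the iterates approach the unique minimizer x* = A^{-1} b, zigzagging mildly as the text below describes.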
A method is called the steepest descent method if d_k = −g_k is taken as the search direction at every iteration; it has wide applications in solving large-scale minimization problems ([23,24,28]). One drawback of the method is that it often exhibits a zigzag phenomenon on practical problems, which makes the algorithm converge to an optimal solution very slowly, or even fail to converge ([16,18]).
If we take d_k = −H_k g_k as the search direction at each iteration, where H_k is an n × n matrix approximating [∇²f(x_k)]^{−1}, then the corresponding method is called a Newton-like method ([16,18,28]); this class includes the Newton method, quasi-Newton methods, variable metric methods, etc. Many papers have applied such methods to optimization problems ([4,5,8,19]). However, a Newton-like method needs to store and compute the matrix H_k at each iteration, which adds to the cost of storage and computation. Accordingly, this class of methods is often unsuitable for large-scale optimization problems.
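A minimal sketch of the Newton-like direction d_k = −H_k g_k, on the same kind of hypothetical quadratic test function as above. Here H_k is the exact inverse Hessian, so a single unit step reaches the minimizer; for general f, quasi-Newton methods would instead build H_k from gradient differences.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # Hessian of the hypothetical quadratic
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b

x = np.array([5.0, -3.0])                # arbitrary starting point
H = np.linalg.inv(A)                     # H_k approximating [grad^2 f(x_k)]^{-1} (exact here)
x = x - H @ grad(x)                      # Newton-like step d_k = -H_k g_k with unit steplength
```

The explicit inverse illustrates the storage cost the paragraph mentions: H_k is a dense n × n matrix, which is exactly what makes this class unattractive for large n.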
Due to its simplicity and very low memory requirement, the conjugate gradient method is a powerful line search method for solving large-scale optimization problems. In fact, the CG method is not among the fastest or most robust optimization algorithms for nonlinear problems available today, but it remains very popular with engineers and mathematicians interested in solving large problems (cf. [1,15,17,27]). The conjugate gradient method is designed to solve the unconstrained optimization problem (1.1). More explicitly, it is an algorithm for finding the nearest local minimum of a function of n variables, presupposing that the gradient of the function can be computed. We consider only the case where the method is implemented without regular restarts. The iterative formula of the conjugate gradient method is given by (1.2), where α_k is a steplength computed by carrying out a line search, and d_k is the search direction defined by

d_{k+1} = −g_{k+1} + β_k d_k,  d_0 = −g_0,   (1.3)

where β_k is a scalar and g(x) denotes ∇f(x). If f is a strictly convex quadratic function, namely

f(x) = (1/2) x^T H x + b^T x + c,

where H is a positive definite matrix, and if α_k is the exact one-dimensional minimizer along the direction d_k, i.e.,

α_k = argmin_{α ≥ 0} f(x_k + α d_k),

then (1.2)-(1.3) is called the linear conjugate gradient method. Otherwise, (1.2)-(1.3) is called the nonlinear conjugate gradient method.
Conjugate gradient methods differ in their choice of the scalar parameter β_k. Several choices for β_k have been proposed in the literature, giving rise to distinct conjugate gradient methods. The best known are the Hestenes-Stiefel (HS) method [11], the Fletcher-Reeves (FR) method [9], the Polak-Ribière-Polyak (PRP) method [20,22], the Conjugate Descent (CD) method [8], the Liu-Storey (LS) method [14], the Dai-Yuan (DY) method [6], and the Hager-Zhang (HZ) method [12]. The update parameters of the first six of these methods are, respectively,

β_k^{HS} = g_{k+1}^T y_k / (d_k^T y_k),  β_k^{FR} = ||g_{k+1}||² / ||g_k||²,  β_k^{PRP} = g_{k+1}^T y_k / ||g_k||²,
β_k^{CD} = −||g_{k+1}||² / (g_k^T d_k),  β_k^{LS} = −g_{k+1}^T y_k / (g_k^T d_k),  β_k^{DY} = ||g_{k+1}||² / (d_k^T y_k).

The convergence behavior of these formulas under various line search conditions has been studied by many authors over the years. The FR method with an exact line search was proved globally convergent on general functions by Zoutendijk [29]. However, the PRP method and the HS method with the exact line search are not globally convergent; see Powell's counterexample [21]. The Armijo conditions have often been used in existing convergence analyses and implementations of the conjugate gradient method; namely: let s > 0 be a constant, ρ ∈ (0, 1) and μ ∈ (0, 1), and choose α_k to be the largest α in {s, sρ, sρ², ...} such that

f(x_k + α d_k) − f(x_k) ≤ μ α g_k^T d_k.

A drawback of the Armijo line search is the choice of the initial step size s. If s is too large, the procedure needs many more function evaluations; if s is too small, the efficiency of the related algorithm decreases. Thereby, we should choose an adequate initial step size s at each iteration so that the step size α_k can be found easily.
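To make these two ingredients concrete, the following sketch implements the standard backtracking Armijo rule and the HS update β_k = g_{k+1}^T y_k / (d_k^T y_k), and runs them on a hypothetical quadratic. The parameter values s = 1, ρ = 0.75, μ = 0.25 and the steepest-descent restart safeguard are illustrative choices of mine, not the paper's new rule for selecting s.

```python
import numpy as np

def armijo(f, x, fx, g, d, s=1.0, rho=0.75, mu=0.25):
    """Largest alpha in {s, s*rho, s*rho^2, ...} with
    f(x + alpha*d) - f(x) <= mu * alpha * g^T d."""
    alpha, gtd = s, g @ d
    while f(x + alpha * d) > fx + mu * alpha * gtd:
        alpha *= rho
    return alpha

def beta_hs(g_new, g_old, d):
    """Hestenes-Stiefel parameter: beta_k = g_{k+1}^T y_k / (d_k^T y_k)."""
    y = g_new - g_old
    return (g_new @ y) / (d @ y)

# Hypothetical strictly convex quadratic test problem.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.zeros(2)
g = grad(x)
d = -g                                   # d_0 = -g_0
for k in range(500):
    if np.linalg.norm(g) < 1e-8:
        break
    if g @ d >= 0:                       # safeguard: restart with steepest descent
        d = -g                           # if d_k is not a descent direction
    alpha = armijo(f, x, f(x), g, d)
    x_new = x + alpha * d
    g_new = grad(x_new)
    d = -g_new + beta_hs(g_new, g, d) * d
    x, g = x_new, g_new
```

The restart safeguard is needed precisely because, as the text notes, the HS update with an inexact line search is not guaranteed to produce descent directions.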
In addition, the sufficient descent condition

g_k^T d_k ≤ −c ||g_k||²  for some constant c > 0,   (1.8)

has often been used in the literature to analyze the global convergence of conjugate gradient methods with inexact line searches. For instance, Al-Baali [1], Touati-Ahmed and Storey [3], Hu and Storey [13], and Gilbert and Nocedal [10] analyzed the global convergence of algorithms related to the Fletcher-Reeves method with the strong Wolfe line search; their convergence analyses used the sufficient descent condition (1.8). As for algorithms related to the PRP method, Gilbert and Nocedal [10] investigated a wide range of choices of β_k that result in globally convergent methods. In order for the sufficient descent condition to hold, they modified the strong Wolfe line search into a two-stage line search: the first stage finds a point using the strong Wolfe line search, and in the second stage, if the sufficient descent condition does not hold at that point, further line search iterations proceed until a new point satisfying the sufficient descent condition is found. They hinted that the sufficient descent condition may be crucial for conjugate gradient methods.
In this paper we propose a new Armijo line search in which an appropriate initial step size s is defined and varies at each iteration. The new Armijo line search enables us to find the step size α_k easily at each iteration and guarantees the global convergence of the original HS conjugate gradient method under some mild conditions. The global convergence and the linear convergence rate are analyzed, and numerical results show that the HS method with the new Armijo line search is more effective than other similar methods in solving large-scale minimization problems.

New Armijo line search
We first make the following assumptions.

Assumption A. The objective function f(x) is continuously differentiable and bounded below on R^n.

Assumption B. The gradient g(x) = ∇f(x) is Lipschitz continuous; that is, there exists a constant L > 0 such that ||g(x) − g(y)|| ≤ L ||x − y|| for all x, y ∈ R^n.

Algorithm and Convergent properties
In this subsection, we reintroduce the convergence properties of the HS method. First, we give the following algorithm.
Step 0: Given x_0 ∈ R^n, set d_0 = −g_0 and k := 0.
Step 1: If g_k = 0, then stop; otherwise go to Step 2.
Step 2: Set x_{k+1} = x_k + α_k d_k, where d_k is computed with the HS formula and α_k is defined by the new Armijo line search.
Step 3: Set k := k + 1 and go to Step 1.

Some simple properties of the above algorithm are given as follows.
Assume that (A) and (B) hold and that the HS method with the new Armijo line search generates an infinite sequence {x_k}. By condition (B), the Cauchy-Schwarz inequality and the HS formula, we obtain a bound on the search directions. Assume that (A) and (B) hold; then the new Armijo line search is well defined.
On the one hand, taking the limit as α → 0, the Armijo condition holds for all sufficiently small α. On the other hand, by Lemma 2, we can prove that the new Armijo-type line search is well defined when α ∈ [0, α_k]. The proof is completed.

Global convergence
Assume that (A) and (B) hold, the HS method with the new Armijo line search generates an infinite sequence {x_k}, and there exist m_0 > 0 and M_0 > 0 such that m_0 ≤ L_k ≤ M_0. Then the following estimate holds. For k = 0 it holds directly. For k > 0, it follows from the procedure of the new Armijo line search, the Cauchy-Schwarz inequality and the HS formula. The proof is completed.

Assume that (A) and (B) hold, the HS method with the new Armijo line search generates an infinite sequence {x_k}, and there exist m_0 > 0 and M_0 > 0 such that m_0 ≤ L_k ≤ M_0. Then

lim_{k→∞} ||g_k|| = 0.   (15)

Let η_0 = inf_k {α_k}.
If η_0 > 0, then the line search gives a uniform decrease of f at each step; since f is bounded below by (A), the decreases are summable, and thus lim_{k→∞} ||g_k|| = 0.
In the following, we prove that η_0 > 0. To the contrary, assume that η_0 = 0. Then there exists an infinite subset K ⊆ {0, 1, 2, ...} such that lim_{k∈K, k→∞} α_k = 0. By Lemma 3 we obtain a lower bound on the accepted steps. Therefore, there is a k such that the backtracking procedure has reduced the step at least once. Let α = α_k/ρ; then at least one of the two inequalities (17) and (18) does not hold. If (17) does not hold, then, using the mean value theorem on the left-hand side of the resulting inequality, there exists an intermediate point at which, by (B), the Cauchy-Schwarz inequality, (19) and Lemma 1, we obtain a contradiction with (16) via Lemma 3. If (18) does not hold, then, by using the Cauchy-Schwarz inequality on the left-hand side of the resulting inequality and combining it with Lemma 3, we again obtain a contradiction with (16). This shows that η_0 > 0. The whole proof is completed.

Linear Convergence Rate
In this section we prove that the HS method with the new Armijo-modified line search has a linear convergence rate under some mild conditions.
We further assume:

Assumption C. The sequence {x_k} generated by the HS method with the new Armijo-type line search converges to x*, ∇²f(x*) is a symmetric positive definite matrix, and f(x) is twice continuously differentiable in a neighborhood of x*.

Assume that Assumption (C) holds. Then there exist m, M and ε_0 with 0 < m ≤ M and ε ≤ ε_0 such that the standard local bounds hold; by (23) and (22) and the Cauchy-Schwarz inequality, the corresponding gradient bounds follow as well. Its proof can be seen in the literature (e.g. [29]).
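The bounds asserted here are not legible in this extraction; under Assumption C they are standardly of the following form, where 0 < m ≤ M bound the eigenvalues of ∇²f(x) on the ball ||x − x*|| ≤ ε_0. This is a reconstruction of the usual lemma, so the exact constants in (22)-(23) may differ.

```latex
\frac{m}{2}\,\|x - x^{*}\|^{2} \;\le\; f(x) - f(x^{*}) \;\le\; \frac{M}{2}\,\|x - x^{*}\|^{2},
\qquad
m\,\|x - x^{*}\| \;\le\; \|g(x)\| \;\le\; M\,\|x - x^{*}\|,
\quad \text{for } \|x - x^{*}\| \le \varepsilon \le \varepsilon_{0}.
```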
Assume that Assumption (C) holds, the HS method with the new Armijo-type line search generates an infinite sequence {x_k}, and there exist m_0 > 0 and M_0 > 0 such that m_0 ≤ L_k ≤ M_0. Then {x_k} converges to x* at least R-linearly.
Its proof can be seen in the literature (e.g. [26]).

Numerical Reports
In this section, we conduct numerical experiments to show the efficiency of the new Armijo-modified line search used in the HS method.
The Lipschitz constant L of g(x) is usually not known a priori in practical computation and needs to be estimated. In the sequel, we discuss this problem and present some approaches for estimating L. In a recent paper [24], some approaches for estimating L were proposed; if k ≥ 1, then we set L_k according to the formulas below. In fact, if L is a Lipschitz constant, then any constant greater than L is also a Lipschitz constant, which makes it easy to find a large Lipschitz constant. However, a very large Lipschitz constant possibly leads to a very small step size and makes the HS method with the new Armijo-modified line search converge very slowly. Thereby, we should seek Lipschitz constants that are as small as possible in practical computation.
In the k-th iteration we take the Lipschitz constants L_k as in (34), (35) and (36), respectively, with L_0 > 0 and M_0 a large positive number.
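The estimation formulas (34)-(36) themselves are lost in this extraction; a common secant-type estimate of this kind, shown here only as a sketch, bounds L locally by ||y_{k−1}|| / ||s_{k−1}|| with a floor of L_0. The function name and safeguards are my own, not the paper's.

```python
import numpy as np

def lipschitz_estimate(x_prev, x_curr, g_prev, g_curr, L0=1.0):
    """Secant-based local Lipschitz estimate L_k = max(L0, ||y|| / ||s||),
    where s = x_k - x_{k-1} and y = g_k - g_{k-1}.  A sketch of the kind of
    estimate discussed here, not the paper's exact formulas (34)-(36)."""
    s = x_curr - x_prev
    y = g_curr - g_prev
    ns = np.linalg.norm(s)
    if ns == 0.0:               # safeguard: no displacement, fall back to the floor
        return L0
    return max(L0, np.linalg.norm(y) / ns)
```

For a quadratic with gradient g(x) = Ax − b this estimate never exceeds ||A||_2, the true Lipschitz constant, which matches the advice in the text to keep the estimated constant as small as possible.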
Assume that (H1) and (H2) hold, the HS method with the new Armijo-modified line search generates an infinite sequence {x_k}, and L_k is evaluated by (34), (35) or (36). Then there exist m_0 > 0 and M_0 > 0 such that m_0 ≤ L_k ≤ M_0. Obviously L_k ≥ L_0, so we can take m_0 = L_0. For (34) and (35) the upper bound follows directly; for (36), by using the Cauchy-Schwarz inequality, the upper bound follows as well. By letting M_0 = max(L_0, L, M_0), we complete the proof.
HS1, HS2 and HS3 denote the HS method with the new Armijo-modified line search corresponding to the estimations (34)-(36), respectively. HS denotes the original HS method with the strong Wolfe line search. PRP+ denotes the PRP method with β_k = max{0, β_k^{PRP}} and the strong Wolfe line search.
Birgin and Martínez developed a family of scaled conjugate gradient algorithms, called the spectral conjugate gradient (SCG) method [3]. Numerical experiments showed that some special SCG methods were effective. In one SCG method, the initial choice of α at the k-th iteration was given by (38). We chose 15 test problems (Problems 21-35) with dimension n = 10 000 and initial points from the literature [18] to implement the HS method with the new Armijo-modified line search. We set the parameters μ = 0.25, ρ = 0.75, c = 0.75 and L_0 = 1 in the numerical experiments. Five conjugate gradient algorithms (HS1, HS2, HS3, HS and PRP+) are compared in numerical performance.
The stopping criterion is ||g_k|| ≤ 10^{-8}, and the numerical results are given in Table 1.
In Table 1, CPU denotes the total CPU time (in seconds) for solving all 15 test problems. A pair of numbers means the number of iterations and the number of function evaluations. It can be seen from Table 1 that the HS method with the new Armijo-modified line search is effective for solving some large-scale problems. In particular, method HS1 seems to be the best of the five algorithms, because it uses the fewest iterations and function evaluations when the algorithms reach the same precision. This suggests that the estimating formula (34) may be more reasonable than the other formulas, and motivates us to guess that a suitable Lipschitz constant should be chosen in an appropriate interval. It can also be seen from Table 1 that the HS methods with the new line search are superior to the HS and PRP+ conjugate gradient methods. Moreover, the HS method may fail in some cases if inadequate parameters are chosen. Although the PRP+ conjugate gradient method is globally convergent, its numerical performance is not better than that of the HS method in many situations.
Numerical experiments show that the new line search proposed in this paper is effective for the HS method in practical computation. The reason is that the new line search uses a Lipschitz constant estimation to define an adequate initial step size s_k, from which a suitable step size α_k for the HS method is easily found; this reduces the function evaluations at each iteration and improves the efficiency of the HS method.
The initial choice of step size (38) may also be reasonable for the SCG method in practical computation. All these facts show that choosing an adequate initial step size at each iteration is very important for line search methods, especially for conjugate gradient methods.

Conclusion
In this paper, a new form of Armijo-modified line search has been proposed that guarantees the global convergence of the HS conjugate gradient method for minimizing functions with Lipschitz continuous derivatives. It requires estimating the local Lipschitz constant of the derivative of the objective function in practical computation. The global convergence and linear convergence rate of the HS method with the new Armijo-modified line search were analyzed under some mild conditions. Numerical results showed that the HS method with the new Armijo-modified line search is effective and superior to the HS conjugate gradient method with the strong Wolfe line search. For further research, we should not only find more techniques for estimating parameters but also carry out more numerical experiments.