The Complexity of Optimizing Over a Simplex, Hypercube or Sphere: A Short Survey

We consider the computational complexity of optimizing various classes of continuous functions over a simplex, hypercube or sphere. These relatively simple optimization problems have many applications. We review known approximation results as well as negative (inapproximability) results from the recent literature.


Introduction
Consider the generic global optimization problem:

f̲ := min {f(x) : x ∈ K},    (1)

for some continuous, computable f : K → IR and compact convex set K ⊂ IR^n; similarly, let f̄ := max {f(x) : x ∈ K}.
In this short survey we will consider the computational complexity of computing or approximating f̲ (or f̄) in the case where K is one of the following three sets:
• the standard (or unit) simplex: ∆_n := {x ∈ IR^n : Σ_{i=1}^n x_i = 1, x ≥ 0};
• the unit hypercube [0, 1]^n;
• the unit sphere: S_n := {x ∈ IR^n : ‖x‖ = 1}.
Problem (1) has a surprising number of applications for these choices of K, and we only mention a few. For the simplex, and quadratic f , the applications include finding maximum stable sets in graphs, portfolio optimization, testing matrix copositivity, game theory, and population dynamics problems (see the review paper by Bomze [4] and the references therein). A recent application is the estimation of crossing numbers in certain classes of graphs [13].
The following example involves the optimization of a general non-polynomial f over the simplex; it occurs in multivariate interpolation and in finite element methods (see [10]).

Example 1.1 Given is a finite set of interpolation points Θ ⊂ ∆_n. Denote the fundamental Lagrange polynomial associated with an interpolation point θ ∈ Θ by l_θ. In other words, for x ∈ Θ:

l_θ(x) = 1 if x = θ, and l_θ(x) = 0 otherwise.

For a given g : IR^n → IR, the associated Lagrange interpolant of g with respect to Θ is:

L_Θ(g) := Σ_{θ∈Θ} g(θ) l_θ.

Note that L_Θ(g) interpolates g at the points in Θ.
The associated Lebesgue constant is defined as:

Λ_Θ := max_{x ∈ ∆_n} Σ_{θ∈Θ} |l_θ(x)|.

The Lebesgue constant is important in bounding the error of approximation, since one can show that

‖g − L_Θ(g)‖ ≤ (1 + Λ_Θ) ‖g − p*‖,

where the norm is the supremum norm on ∆_n, and p* is the best possible polynomial approximation of g of the same degree as L_Θ(g). Thus, to compute the Lebesgue constant for Θ, we should maximize f(x) = Σ_{θ∈Θ} |l_θ(x)| over the simplex ∆_n.

For K = [0, 1]^n and quadratic f, the examples include the maximum cut problem in graphs (see below). For general f, they include many engineering design problems where simple upper and lower bounds on the variables are given, and no other constraints are present. These problems are sometimes referred to as 'box constrained global optimization problems'.
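Returning to Example 1.1: the Lebesgue constant can be estimated numerically by maximizing the Lebesgue function on a fine grid. The one-dimensional Python sketch below (∆_1 identified with the interval [0, 1]) uses illustrative node sets and grid resolution that are not taken from the survey.

```python
# Estimate a Lebesgue constant numerically: maximize the Lebesgue function
# sum_theta |l_theta(x)| over a fine grid on [0, 1] (the simplex Delta_1,
# identified with an interval). Node sets and grid size are illustrative.

def lagrange_basis(nodes, j, x):
    """Fundamental Lagrange polynomial l_j: 1 at nodes[j], 0 at the other nodes."""
    val = 1.0
    for k, t in enumerate(nodes):
        if k != j:
            val *= (x - t) / (nodes[j] - t)
    return val

def lebesgue_constant(nodes, grid_size=2001):
    """Approximate max over [0, 1] of the Lebesgue function for the given nodes."""
    best = 0.0
    for i in range(grid_size):
        x = i / (grid_size - 1)
        best = max(best, sum(abs(lagrange_basis(nodes, j, x)) for j in range(len(nodes))))
    return best

print(lebesgue_constant([0.0, 1.0]))                 # close to 1.0: linear interpolation
print(lebesgue_constant([i / 4 for i in range(5)]))  # larger for 5 equispaced nodes
```

For two nodes the Lebesgue function is identically 1, which gives a quick sanity check on the implementation.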
For K = S_n (the sphere), problem (1) becomes a minimal eigenvalue problem for a quadratic form f(x) = x^T Qx, by the Rayleigh-Ritz theorem. For general quadratic functions it contains the trust region problem, which appears in many nonlinear programming algorithms as a sub-problem. For general forms (homogeneous polynomials), it contains the classical problem of deciding whether a given form is positive semidefinite.
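The Rayleigh-Ritz identity min {x^T Qx : ‖x‖ = 1} = λ_min(Q) can be checked numerically in small dimension; the 2×2 symmetric matrix below is an arbitrary example.

```python
import math

# Check the Rayleigh-Ritz identity min_{||x||=1} x^T Q x = lambda_min(Q)
# for a 2x2 symmetric matrix Q = [[a, b], [b, c]] (an arbitrary example),
# parametrizing the unit circle as x = (cos t, sin t).

a, b, c = 2.0, 1.0, -1.0
lam_min = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b ** 2)  # closed form in 2D

# Minimize the quadratic form over a fine sample of the unit circle.
best = float("inf")
for i in range(100000):
    t = 2 * math.pi * i / 100000
    x0, x1 = math.cos(t), math.sin(t)
    best = min(best, a * x0 * x0 + 2 * b * x0 * x1 + c * x1 * x1)

print(lam_min, best)  # the two values agree to sampling accuracy
```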

Notions of approximation
Most of the optimization problems we will consider will be NP-hard, and we will therefore be interested in approximating the optimal values as well as possible in polynomial time.
One has to be careful when defining the notion of an approximation to an optimal solution of problem (1). The reason is that we usually do not know the range f̄ − f̲ of function values on K in advance. If f̄ − f̲ is small compared to a given ε > 0, then it is not satisfactory to only compute some x ∈ K with the property that f(x) − f̲ < ε. It is therefore better to find an x ∈ K such that f(x) − f̲ < ε(f̄ − f̲), since we then know that f(x) belongs to the fraction ε of lowest function values.
Definition 2.1 ((1 − ε)-approximation) A value ψ_ε is called a (1 − ε)-approximation of f̲ for a given ε ∈ (0, 1] if

|ψ_ε − f̲| ≤ ε(f̄ − f̲).    (2)

If we replace condition (2) by the condition

|ψ_ε − f̲| ≤ ε|f̲|,    (3)

then we speak of a (1 − ε)-approximation of f̲ in the weak sense.
The following definitions are essentially from De Klerk, Laurent and Parrilo [12], and are consistent with the corresponding definitions in combinatorial optimization. In particular, a (1 − ε)-approximation algorithm is said to run in polynomial time if its running time is polynomial in n and in the bit size required to represent f. For example, if f is a polynomial, the bit size required to represent f may be taken as the sum of the bit sizes of its coefficients.

Definition 2.3 (PTAS/FPTAS) If, for a given function class F, problem (1) has a polynomial time (1 − ε)-approximation algorithm for each ε ∈ (0, 1], we say that problem (1) allows a polynomial time approximation scheme (PTAS) for the function class F. In case of a strongly polynomial time (1 − ε)-approximation algorithm for each ε ∈ (0, 1], we speak of a fully polynomial time approximation scheme (FPTAS).
These definitions can be adapted in an obvious way for maximization problems, or if the approximations are in the weak sense of (3).

Inapproximability results
We first review negative approximation results for problem (1). We will see that, in a well-defined sense, optimization over the hypercube is much harder than over the simplex, while the complexity of optimization over a sphere is somewhere in between.

The case of the simplex
If K = ∆_n, then computing f̲ is an NP-hard problem, already for quadratic polynomials, as it contains the maximum stable set problem as a special case. Indeed, let G be a graph with adjacency matrix A and let I denote the identity matrix; then the maximum size α(G) of a stable set in G can be expressed as

1/α(G) = min {x^T (A + I) x : x ∈ ∆_n},

by a theorem of Motzkin and Straus [14]. Moreover, this problem cannot have a FPTAS, unless all problems in NP can be solved in randomized polynomial time; this is due to the following inapproximability result for the maximum stable set problem by Håstad [8].

Theorem 3.1 (Håstad [8]) Unless all problems in NP can be solved in randomized polynomial time, α(G) cannot be approximated in polynomial time within a factor |V|^{1−ε}, for any ε > 0, where V is the vertex set of G.
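The Motzkin-Straus identity can be illustrated on a small graph. The sketch below uses the 5-cycle C_5 (for which α = 2) and a brute-force search over a regular simplex grid; both choices are purely illustrative, and this is a numeric check rather than a general algorithm.

```python
# Illustrate the Motzkin-Straus identity 1/alpha(G) = min over Delta_n of
# x^T (A + I) x, on the 5-cycle C5, for which alpha(C5) = 2.

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

def q(x):
    """Evaluate x^T (A + I) x."""
    return sum((A[i][j] + (i == j)) * x[i] * x[j] for i in range(n) for j in range(n))

# Uniform weights on the stable set {0, 2} attain the minimum 1/alpha = 1/2.
print(q([0.5, 0.0, 0.5, 0.0, 0.0]))  # 0.5

def grid_points(k, m):
    """Integer vectors of length k with nonnegative entries summing to m."""
    if k == 1:
        yield (m,)
    else:
        for c in range(m + 1):
            for rest in grid_points(k - 1, m - c):
                yield (c,) + rest

# Brute force over the regular grid Delta(5, 20) confirms the minimum value 1/2.
print(min(q([c / 20 for c in pt]) for pt in grid_points(n, 20)))  # 0.5
```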

The case of the hypercube
If K = [0, 1]^n and f is quadratic, then problem (1) contains the maximum cut problem in graphs as a special case. Indeed, for a graph G = (V, E) with Laplacian matrix L, the size of the maximum cut is given by (see [7,15]):

MAXCUT(G) = (1/4) max {(2x − e)^T L (2x − e) : x ∈ [0, 1]^n},    (4)

where e is the all-ones vector. The maximum cut problem does not allow a PTAS (unless P = NP); indeed, Håstad showed that it is NP-hard to approximate the maximum cut within a ratio better than 16/17.
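The quadratic formulation can be verified by brute force on a small instance. Since the form is convex (L is positive semidefinite), the maximum over [0, 1]^n is attained at a vertex of the hypercube, so it suffices to enumerate {0, 1}^n. The 5-cycle below is an arbitrary test case.

```python
from itertools import product

# Check on the 5-cycle that the maximum cut equals
# (1/4) max_{x in {0,1}^n} (2x - e)^T L (2x - e), where L = D - A is the Laplacian.

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
deg = [0] * n
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1
    deg[i] += 1
    deg[j] += 1
L = [[(deg[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

best_quad = 0
best_direct = 0
for x in product([0, 1], repeat=n):
    z = [2 * xi - 1 for xi in x]                     # z = 2x - e, a +/-1 vector
    quad = sum(L[i][j] * z[i] * z[j] for i in range(n) for j in range(n)) / 4
    cut = sum(1 for i, j in edges if x[i] != x[j])   # direct cut count
    best_quad = max(best_quad, quad)
    best_direct = max(best_direct, cut)

print(best_quad, best_direct)  # both equal 4, the maximum cut of the 5-cycle
```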
It follows that problem (1) does not allow a PTAS for any class of functions that includes the quadratic polynomials if K = [0, 1]^n. A related negative result is due to Bellare and Rogaway [3], who proved that if P ≠ NP and ε ∈ (0, 1/3), there is no polynomial time (1 − ε)-approximation algorithm in the weak sense for the problem of minimizing a polynomial of total degree d ≥ 2 over all sets of the form K = {x ∈ [0, 1]^n | Ax ≤ b}.

The case of the sphere
Nesterov [16] showed that maximizing a cubic form (homogeneous polynomial) on the unit sphere is an NP-hard problem, using a reduction from the maximum stable set problem.
The reduction of [16] indeed involves maximizing a (square-free) form of degree 3 in variables x and y over the unit sphere. Note that the number of variables is polynomial in |V|, since the x variables correspond to the vertices of G, and the y variables correspond to the edges of the complement of G.
In view of the inapproximability result for the maximum stable set problem in Theorem 3.1, we have the following corollary: there is no FPTAS for maximizing a cubic form over the unit sphere, unless all problems in NP can be solved in randomized polynomial time.

Approximation results

The case of the simplex
We now consider the complexity of (approximately) solving problem (1) for K = ∆ n .

Easy cases
Problem (1) can be solved in polynomial time, or allows a FPTAS, for the following classes of functions:
• f is concave; in this case the global minimum of f is attained at one of the n extreme points of ∆_n;
• f is convex and self-concordant with polynomial time computable gradient and Hessian; in this case the theory of interior point algorithms of Nesterov and Nemirovski [17] provides a FPTAS for problem (1). This remains true if K is the unit hypercube or the unit ball.
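The first easy case can be demonstrated directly: for concave f, it suffices to evaluate f at the n vertices e_i of ∆_n. The concave function below is an arbitrary example.

```python
# For concave f, the minimum over Delta_n is attained at a vertex e_i, so
# n function evaluations suffice. f(x) = -sum(x_i^2) is an arbitrary concave
# example; its minimum over Delta_n is -1, attained at every vertex.

def f(x):
    return -sum(xi * xi for xi in x)

n = 4
vertices = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
vertex_min = min(f(v) for v in vertices)
print(vertex_min)          # -1.0
print(f([1.0 / n] * n))    # -0.25: interior points have larger values
```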

Results for polynomial f
Bomze and De Klerk [5] showed that, for K = ∆_n and f quadratic, problem (1) allows a PTAS. One of the PTAS algorithms that they considered is particularly simple: it evaluates f on the regular grid

∆(n, m) := {x ∈ ∆_n : mx ∈ Z^n}

for any m ≥ 1. This PTAS result was extended to polynomials of fixed degree d by De Klerk, Laurent, and Parrilo [12]; the quality of the approximation obtained from the grid ∆(n, r) is governed by a constant w_r(d) that satisfies lim_{r→∞} w_r(d) = 1. Earlier, related results were obtained by Faybusovich [6].

There also exist more sophisticated (and practical) PTAS algorithms for minimizing forms of fixed degree over the simplex that employ linear or semidefinite programming. For example, the authors of [12] consider bounds f_(r) defined by requiring that a suitable polynomial multiple of f(x) − f_(r) has nonnegative coefficients (r = 0, 1, . . .). Note these bounds may be computed using linear programming, and this computation may be performed in polynomial time when r and d are fixed. Moreover, it is shown in [12] that the bounds f_(r) converge to f̲ at a rate governed by the same constant w_r(d). Thus the values f_(r) also yield a PTAS for this problem class.
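A minimal sketch of the grid algorithm for a quadratic over ∆_2; the matrix Q is an arbitrary example (for f(x) = x_1^2 + 2 x_2^2 the exact minimum over ∆_2 is 2/3, attained at x = (2/3, 1/3)).

```python
from fractions import Fraction

# Evaluate an example quadratic f(x) = x^T Q x on the regular grid
# Delta(2, m) = {x in Delta_2 : m*x integral} and take the minimum.
# Q is an arbitrary example; the exact minimum of f over Delta_2 is 2/3.

Q = [[1, 0], [0, 2]]

def f(x):
    return sum(Q[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

def grid_min(m):
    """Minimum of f over the grid Delta(2, m), in exact rational arithmetic."""
    return min(f((Fraction(k, m), Fraction(m - k, m))) for k in range(m + 1))

for m in (1, 2, 3, 10):
    print(m, grid_min(m))
# the grid for m = 3 already contains the exact minimizer (2/3, 1/3)
```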

Results for non-polynomial f
Recently, De Klerk, Elabwabi, and Den Hertog [11] derived approximation results for (not necessarily polynomial) functions that meet a Lipschitz condition of given order.
Once again, the underlying algorithm is simply the evaluation of f on a suitable regular grid.
Before defining this class of functions, recall that the modulus of continuity of f on a compact convex set K is defined by

ω(f, δ) := max {|f(x) − f(y)| : x, y ∈ K, ‖x − y‖ ≤ δ}.

We now define the class Lip_L(α) of functions that meet the Lipschitz condition of given order α > 0 with respect to a given constant L > 0:

Lip_L(α) := {f : ω(f, δ) ≤ L δ^α for all δ > 0}.

This condition is also called a Hölder continuity condition. (Some authors reserve the term 'Lipschitz' for the case α = 1.) The main error bound of [11] (Theorem 4.3) implies a PTAS in the weak sense for minimizing computable functions from the class Lip_L(α) over ∆_n, for fixed L and α. Note that Theorem 4.3 does not imply Corollary 4.2. De Klerk, Elabwabi, and Den Hertog [11] also identified a further class of functions that allow a PTAS. This class includes the polynomials of fixed degree, and is defined in terms of suitable bounds on the higher order derivatives of f. The interested reader is referred to [11] for more details.
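As an illustration of grid evaluation for a non-polynomial objective, the sketch below minimizes a Hölder continuous (α = 1/2) function over ∆_2 on successively finer grids; the function is an arbitrary example, not taken from [11].

```python
import math

# Grid minimization of a non-polynomial, Hoelder continuous (alpha = 1/2)
# function over Delta_2: f(x) = sqrt(|x_1 - x_2|), an arbitrary example
# whose exact minimum is 0, attained at x = (1/2, 1/2).

def f(x):
    return math.sqrt(abs(x[0] - x[1]))

def grid_min(m):
    """Minimum of f over the grid Delta(2, m)."""
    return min(f((k / m, (m - k) / m)) for k in range(m + 1))

for m in (1, 2, 5, 51):
    print(m, grid_min(m))
# grids with even m contain the minimizer exactly; odd m only approach it
```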
It is still an open question to completely classify the classes of functions that allow a PTAS.

The case of the hypercube
For the maximum cut problem there is a celebrated polynomial time 0.878-approximation algorithm due to Goemans and Williamson [7], who suggested the following semidefinite programming (SDP) relaxation of the maximum cut problem (4):

OPT_SDP := (1/4) max {trace(LX) : diag(X) = e, X ⪰ 0},    (9)

where e is the all-ones vector, and X ⪰ 0 means that X is a symmetric positive semidefinite matrix. Goemans and Williamson [7] also devised a randomized rounding scheme that uses the optimal solution of (9) to generate cuts in the graph. Their algorithm produces a cut of expected cardinality at least 0.878 · OPT_SDP ≥ 0.878 · |maximum cut|.
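The rounding step can be sketched as follows. Solving the SDP (9) requires a dedicated solver, so the sketch assumes the unit vectors v_i (a Gram factorization of the optimal X) are already available; for illustration we hand-pick vectors for a 4-cycle rather than compute an actual SDP solution.

```python
import math
import random

# Goemans-Williamson style hyperplane rounding, sketched on a 4-cycle.
# In the real algorithm the unit vectors v_i come from a Cholesky-type
# factorization of the optimal SDP matrix X; here we hand-pick antipodal
# vectors purely to illustrate the rounding step.

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
vectors = [(1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]  # illustrative, not an SDP solution

def round_cut(vectors, rng):
    """Split the vertices by the sign of <v_i, r> for a random direction r."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    r = (math.cos(angle), math.sin(angle))
    side = [v[0] * r[0] + v[1] * r[1] >= 0 for v in vectors]
    return sum(1 for i, j in edges if side[i] != side[j])

rng = random.Random(0)
best = max(round_cut(vectors, rng) for _ in range(50))
print(best)  # 4, the maximum cut of the 4-cycle
```

With these antipodal vectors almost every random hyperplane separates each edge's endpoints, so the rounding recovers the maximum cut.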
Related approximation results for quadratic optimization over a hypercube were given by Nesterov [15] and Nesterov et al. [18]. In particular, Nesterov [15] showed that one may compute a (2/π)-approximation (in the weak sense) of the maximum of a convex quadratic function over a hypercube in polynomial time, again via semidefinite relaxation and randomized rounding.

Notice that the objective function in the maximum cut problem (4) is convex quadratic, since the Laplacian matrix of a graph is always positive semidefinite. Thus the theorem by Nesterov covers a larger class of problems than maximum cut, but the constant 2/π ≈ 0.6366 is significantly lower than the 0.878 obtained by Goemans and Williamson.

The case of the sphere
The complexity of optimization over the sphere is still relatively poorly understood, compared to the simplex or hypercube. To be more precise, there still is a big gap between approximation and inapproximability results.
Easy case: quadratic optimization over the unit ball

Consider the problem of minimizing a quadratic function f(x) = x^T Bx + 2a^T x + α over the unit ball ‖x‖ ≤ 1. By the S-procedure of Yakubovich (see e.g. [19] and the references therein) this problem may be rewritten as a semidefinite program: its optimal value equals

max λ  subject to
[ B + μI      a      ]
[ a^T     α − λ − μ  ]  ⪰ 0,    μ ≥ 0,

where the maximization is over λ ∈ IR and μ ≥ 0, and the constraint is on the symmetric (n + 1) × (n + 1) matrix with the indicated blocks.
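A quick numeric sanity check of the ball-constrained problem, for arbitrary example data B = I, a = (1, 0), α = 0 (here the unconstrained minimizer −a lies on the unit circle, so the constrained minimum is −1):

```python
import math

# Minimize f(x) = x^T B x + 2 a^T x + alpha over the unit disk for an
# arbitrary 2x2 example: B = I, a = (1, 0), alpha = 0. The unconstrained
# minimizer x = -a = (-1, 0) lies on the unit circle, so the constrained
# minimum is f(-1, 0) = -1.

B = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, 0.0]
alpha = 0.0

def f(x):
    quad = sum(B[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return quad + 2 * sum(a[i] * x[i] for i in range(2)) + alpha

# Sample the closed unit disk in polar coordinates.
best = float("inf")
for ri in range(101):
    r = ri / 100
    for ti in range(360):
        t = 2 * math.pi * ti / 360
        best = min(best, f((r * math.cos(t), r * math.sin(t))))

print(best)  # close to the exact minimum -1
```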

Results for polynomial objective functions
The result in Corollary 4.2 implies that there exists a PTAS for minimizing even forms of fixed degree on the unit sphere. (Recall that a form is called even if all its exponents are even.) Recently, Barvinok [2] proved another partial result: one may derive a randomized PTAS for maximizing a form on the sphere for a special class of forms, called (δ, N)-focused forms, which can be written as a nonnegative linear combination (with nonnegative scalars α_I) of products of linear forms from a suitably 'focused' family; the reader is referred to [2] for the precise definition. Barvinok showed that, for such forms, the maximum of f over S_n ∩ L approximates the maximum of f over S_n to the required relative accuracy with probability at least 2/3 for a random k-dimensional subspace L ⊂ IR^n, where k is a suitable fixed dimension. One may then solve the problem max {f(x) : x ∈ S_n ∩ L} in polynomial time using techniques from computational algebraic geometry, since L is of fixed dimension, and therefore the number of variables in the resulting optimization problem is in fact fixed. It is still an open question whether this result can be extended to all forms of fixed degree.

Conclusion and discussion
Approximation algorithms have been studied extensively for combinatorial optimization problems, but have not received the same attention for NP-hard continuous optimization problems. Indeed, most of the results described in this survey were obtained in the last decade.
There is also not much computational experience yet with approximation algorithms for nonlinear programming. The only significant exception so far is for semidefinite programming relaxations for quadratic optimization on the simplex or the hypercube.
It is therefore the hope of this author that this relatively young research area will attract both theoretically and computationally minded researchers.