## Abstract

Bayesian optimization is a metamodel-based global optimization approach that can balance between exploration and exploitation. It has been widely used to solve single-objective optimization problems. In engineering design, making trade-offs between multiple conflicting objectives is common. In this work, a multi-objective Bayesian optimization approach is proposed to obtain the Pareto solutions. A novel acquisition function is proposed to determine the next sample point, which helps improve the diversity and convergence of the Pareto solutions. The proposed approach is compared with some state-of-the-art metamodel-based multi-objective optimization approaches with four numerical examples and one engineering case. The results show that the proposed approach can obtain satisfactory Pareto solutions with significantly reduced computational cost.

## 1 Introduction

In simulation-based design optimization, the objective functions are usually expensive to be evaluated [1]. In order to reduce the number of function evaluations during the search, metamodels or surrogates, including Kriging models [2,3], radial basis functions [4], artificial neural networks [5], and polynomial-based approximations [6], are commonly used to assist design optimization. Particularly, Kriging is a metamodeling approach that supports sequential sampling to reduce the overall number of samples and has been extensively applied for design optimization under uncertainty [7–12].

Recently, a new metamodel-assisted global optimization approach, Bayesian optimization (BO), attracted research attentions. The uniqueness of BO is that the sequential sampling strategies are designed to strike a balance between exploration and exploitation for robust global optimization under uncertainty. In addition to the surrogate model as an approximation of the objective function, an acquisition function is also constructed in the same solution space to guide the sampling procedure in BO. The new sample is taken to maximize the acquisition function for each iteration, instead of searching directly based on the objective surrogate. Commonly used acquisition functions include expected improvement (EI) [13,14], probability of improvement (PI) [15,16], and lower confidence bound (LCB) [17,18], where uncertainty associated with the predicted objective values from the surrogates is considered. BO has been studied for a wide range of applications, such as robotics [19,20], combinatorial optimization [21], machine learning [22], simulation-based design optimization [23,24], and others.

Most of the existing BO studies focused on single-objective optimization. In many real-world problems, multiple objectives need to be considered and are likely to be conflicting with each other. The purpose of multi-objective optimization is to find a set of optimum solutions in the Pareto sense so that there is no optimum solution that is superior to the other with respect to all objectives [25,26]. Heuristic algorithms, such as multi-objective particle swarm optimization and multi-objective genetic algorithm, are commonly used to obtain the Pareto solutions of multi-objective optimization problems because of their robustness in convergence [27–29]. Since these population-based algorithms require a large number of function evaluations in order to converge to Pareto solutions [26,30,31], a common strategy to improve the efficiency is using metamodels to replace the objectives [32–37]. In the metamodel-assisted multi-objective optimization approaches, usually one metamodel is constructed separately for each objective. The cost of function evaluation for the surrogates is much lower than that for the original objective functions.

Very limited work has been done on how to formulate BO for efficient multi-objective optimization. The research focus of multi-objective BO (MOBO) is how to design acquisition functions for solving multi-objective problems efficiently. In the existing work, the EI function has been extended to different forms in MOBO. A straightforward extension is to define multi-objective EI as the expected improvement based on the joint probability of improvement from individual objectives with the assumption of independence between objectives [38–41]. Knowles [38] proposed the ParEGO approach, which is an extension of the single-objective efficient global optimization or EI approach. Ponweiser et al. [40] introduced a new EI-based multi-objective optimization, which utilizes the S-metric or hypervolume contribution to decide which sample point to evaluate next. Keane [42] combined the design of experiment methods with Kriging models to enable the parallel evolution of multi-objective Pareto sets. Feliot et al. [43] proposed a Bayesian multi-objective optimization (BMOO) approach that handles multiple objectives and non-linear constraints simultaneously, in which a one-step look-ahead Bayesian strategy is used. Forrester et al. [44] defined the parental improvement based on the Euclidean distance. Bautista [45] defined it as the minimum improvement among all objectives. Pandita et al. [46] provided a reformulation of expected improvement over the dominated hypervolume to solve stochastic multi-objective optimization problems. Calandra et al. [47] proposed to exploit the metamodels to improve the estimation of the final Pareto front. However, in these approaches, the calculation of the expected improvement needs to compute multi-dimensional integrals, which usually relies on Monte Carlo sampling and is computationally expensive. To solve this problem, Zhan et al. [48] proposed the EI matrix (EIM) for multi-objective problems, where the elements in the EIM are the single-objective EIs for each sample point with respect to each objective, and an aggregated scalar value based on all elements in the EIM is used to guide the sampling.

In this paper, a new MOBO approach is proposed to solve multi-objective optimization problems with a new acquisition function. In the proposed approach, two metrics of the relative hyperarea difference (RHD) and overall spread (OS) [49,50] are modified to better measure the convergence and diversity of Pareto solutions. The new acquisition function is defined to simultaneously improve the convergence and diversity of Pareto solutions, in which the modified RHD and OS are combined, and the LCB functions are employed to consider the metamodel uncertainty.

The rest of this paper is organized as follows: in Sec. 2, the background of multi-objective optimization and BO is introduced. In Sec. 3, the proposed MOBO approach is presented. In Sec. 4, the performance and efficiency of the proposed MOBO approach are demonstrated using four numerical examples and an engineering case. The proposed approach is also compared with existing metamodel-based approaches. The advantages of the proposed approach are analyzed and summarized. Section 5 concludes the paper.

## 2 Background

### 2.1 Multi-Objective Optimization.

*F*(

**) is a vector of**

*x**M*objective functions among which at least two objective functions are in conflict,

**= (**

*x**x*

_{1},

*x*

_{2}, …

*x*

_{N})

^{T}is the design variable vector, and

*x*_{lb}and

*x*_{ub}are the lower and upper bounds, respectively.

**= (**

*g**g*

_{1},

*g*

_{2}, …

*g*

_{J}) are the constraints. Since there are trade-offs among those objective functions, the problem in Eq. (1) generally has a set of optimum solutions in the Pareto sense. That is, there is no optimum that is superior to the other in terms of all objectives [25]. These solutions are termed as Pareto set or Pareto frontier.

### 2.2 Kriging Model.

Kriging or Gaussian process regression model is usually used as the surrogate in BO. It provides the predictions of not only the mean value at an unobserved design location but also the associated uncertainty as variance.

*f*(

**) as a realization of a stochastic process**

*x**Y*(

**) [51,52]**

*x**μ*and Z(

**) are the mean and error term of the Gaussian process, respectively. In this paper, the simple Kriging is used and**

*x**μ*is a constant; Z(

**) follows a normal distribution with zero mean and nonzero covariance. The Gaussian spatial correlation function between the values of function**

*x**f*at two locations

**and**

*x***′ is given by**

*x**σ*

^{2}is the standard deviation (STD) that determines the overall magnitude of the variance, and

*θ*

_{1},…,

*θ*

_{u}are the roughness parameters. To obtain the hyperparameters

*σ*

^{2}and

*θ*

_{1},…,

*θ*

_{u}, the maximum likelihood estimate is applied.

**= {**

*X*

*x*_{1},

*x*_{2},…,

*x*_{n}} and their responses

*f*(

**) = {**

*X**f*(

*x*_{1}),

*f*(

*x*_{2}),…,

*f*(

*x*_{n})}, the predicted mean value and the prediction error (mean squared error) at an unobserved location

*x*_{p}can be calculated as

*μ*,

*J*_{n}is an

*n*-dimensional vector with all elements as ones,

*R*_{D}is the covariance matrix with elements

*R*_{D}(

*i*,

*j*) =

*ρ*(

*x*_{i},

*x*_{j}),

*x*_{i},

*x*_{j}∈

**, and**

*X*

*r*_{D}(

*x*_{p}) is the vector of correlations between

*x*_{p}and sample points, i.e.,

*r*

_{Di}(

*x*_{p}) =

*ρ*(

*x*_{p},

*x*_{i}),

*x*_{i}∈

**. More details of Kriging models can be found in Refs. [51,53].**

*X*### 2.3 Bayesian Optimization.

Bayesian optimization is a metamodel-based approach for global optimization under uncertainty with sequential sampling. The most used metamodel to approximate the unknown objective function *f*(** x**) is based on Kriging. An acquisition function is defined to make a trade-off between exploration (sampling in the most uncertain region) and exploitation (sampling in the region with the best predictions) in the sampling or searching process. Commonly used acquisition functions include EI, PI, and LCB functions. Here, the LCB function is introduced. More details about the EI and PI functions can be found in Ref. [24].

**= {**

*X*

*x*_{1},

*x*_{2},…,

*x*_{n}} and their responses

*f*(

**) = {**

*X**f*(

*x*_{1}),

*f*(

*x*_{2}),…,

*f*(

*x*_{n})}, the LCB acquisition function is expressed as [17,18]

*s*(

*x*) is the square root of the mean squared error, and

*k*dictates the exploitation–exploration balance. A new sample point is selected by minimizing the LCB function. With the new sample point, the Kriging model is updated. The iteration continues until a stop criterion is met. The advantage of using LCB as the acquisition function is that there is no need to calculate the integrals as in the EI and PI functions. The calculation of high-dimensional integrals is expensive for multi-objective problems.

## 3 The Proposed MOBO Formulation

In the proposed multi-objective Bayesian optimization formulation, two metrics for the quality of Pareto set, the RHD and OS, are modified and used to define the new acquisition function of MOBO.

### 3.1 Modified Quality Metrics.

The RHD and OS are usually used to measure the quality of Pareto frontiers [49,50]. As illustrated in Fig. 1, RHD and OS represent the convergence and diversity of the obtained Pareto frontier, respectively. The RHD is the area of the polygon formed by the Pareto frontier and an ideal solution that is best in all objectives. The smaller the value of RHD is, the higher convergence of the Pareto frontier is. The OS is the bounding box of the frontier. The larger the value of OS is, the higher diversity of the Pareto frontier is.

*P*= {

*a*,

*b*,

*c*,

*d*} is the current Pareto set, the RHD and OS are calculated as

*p*

_{good}is an ideal and hypothetically good solution, whereas

*p*

_{bad}. is a known and extremely bad solution that is much worse than the frontier, and HA(

*p*

_{bad},

*p*

_{good}) represents the area of bounding box formed by

*p*

_{good}and

*p*

_{bad}.

The main disadvantages of the RHD and OS are (1) the extremely good and bad solutions *p*_{good} and *p*_{bad} need to be defined before the calculation of the two metrics and (2) when a new Pareto solution is found between two Pareto solutions, the value of RHD for a Pareto front will increase, instead of monotonically decreasing. This contradicts to the original intention that when two Pareto frontiers are compared, the one with a smaller RHD is supposed to be better.

To overcome the above two shortcomings in calcuating RHD and OS and using them to measure the improvement, the modified hyperarea difference (MHD) and modified overall spread (MOS) are proposed here, as shown in Fig. 2. The shaded areas in Figs. 2(a) and 2(b) represent the values of the MHD and MOS, respectively. MHD and MOS are not ratios as in the original RHD and OS. Thus, the calculations do not rely on the hypothetically good and bad solutions. The areas are calculated based on the existing frontier. Furthermore, the value of MHD will decrease monotonically when a new Pareto point is found between two existing Pareto solutions. As illustrated in Fig. 3(a), the area for MHD monotonically reduces as new Pareto solutions (e.g., point e) are found and included in the Pareto set, which is different from RHD as in Fig. 3(b). Therefore, when MHD is used in the acquisition function, it encourages finding more Pareto solutions, which will provide designers more choices.

The effects of new Pareto solutions on the new metrics are further illustrated with a simple example in Fig. 4. Based on the four Pareto solutions in Fig. 4(a), MHD and MOS for the two-objective minimization problem are calculated to be 3.0 and 6.25, respectively. When a new Pareto solution outside the range of the existing ones is inserted, either at the upper left or lower right of the region as shown in Fig. 4(b), the diversity is improved with a larger MOS value. As shown in Fig. 4(c), a new solution that dominates some existing solutions will improve the convergence with a smaller MHD value. Also shown in Fig. 4(d), a new solution that does not dominate any existing solution will improve the convergence and reduce the MHD value. Therefore, a new Pareto solution can always improve either diversity (increasing MOS) or convergence (reducing MHD).

### 3.2 The Proposed Acquisition Function.

*M*is the number of objectives that are approximated by Kriging models, and $f^i(x)$ and

*s*

_{i}(

**) are the predicted mean and standard deviation of the Kriging model for the**

*x**i*th objective at

**, respectively. Here, the coefficients**

*x**k*

_{i}’s are chosen to be 1. Larger coefficient values encourage the search in the uncertain regions for exploration, whereas smaller ones encourage exploitation. The effect of the LCB functions on searching the solutions is illustrated in Fig. 5. Support that there are four candidate solutions $xi(i=1,2,3,4)$. The predicted means at

*x*_{1}and

*x*_{2}are almost the same, but the predicted variance at

*x*_{1}is larger. Hence,

*x*_{1}has a higher priority to be sampled than

*x*_{2}.

*x*_{3}and

*x*_{4}have the same prediction variance, while

*x*_{4}has a better predicted mean. Therefore,

*x*_{4}has a higher priority to be sampled than

*x*_{3}. For other cases where sample points cannot be directly compared, the acquisition function based on the MHD and MOS is used to compare their priorities with a balance between exploitation and exploration.

*I*

_{MHD}(

**) and**

*x**I*

_{MOS}(

**) represent the relative improvements in MHD and MOS, respectively, as**

*x*

*D*_{n}represents existing

*n*sample points, $MHD(Dn)$ and MOS(

*D*_{n}) represent the MHD and MOS based on

*D*_{n}, respectively, whereas $MHD({Dn,x})$ and MOS(

*D*_{n}) represent the updated MHD and MOS when

**is added in the sample set. In calculating $MHD({Dn,x})$ and $MOS(Dn)$, the LCB functions are used as the objectives. By maximizing the acquisition function in Eq. (11), a new sample point will be selected to update the Kriging models.**

*x*The basic requirement of multi-objective optimization is to obtain a sufficient number of Pareto solutions with good convergence and diversity. Since a new Pareto solution always brings an improvement of either MHD or MOS, our acquisition function is defined to simultaneously and rapidly improve both the convergence and diversity of the Pareto solutions.

*a*(

**) is positive when**

*x*

*f*_{LCB}(

**) is in the non-dominated region Ω. Otherwise, the value of**

*x**a*(

**) is equal to zero and does not provide the guidance in sequential sampling. When there are already a large number of Pareto solutions in the current sample set, the non-dominated region will be very small. This will make it difficult to find the optimal solution of the acquisition function. To resolve this efficiency issue, the acquisition function is further modified to**

*x*

*X*_{j}’s (

*j*= 1, 2, …,

*K*) represent the current Pareto solutions. $\Vert fLCB(x)\u2212f(Xj)\Vert 2$ denotes the Euclidean distance between two vectors,

*f*_{LCB}(

**) and the original objectives corresponding to the existing Pareto solution**

*x*

*X*_{j}, in the output space. The additional component in Eq. (13) can provide a guidance to help find the optimal solution of the acquisition function when the non-dominated region becomes smaller as the search proceeds. It should be noted that the additional component in Eq. (13) will not change the optimal solution of the acquisition function.

### 3.3 The New MOBO Procedure.

The proposed MOBO approach begins with the initial sampling. The basic requirement is that the initial sample points are distributed uniformly in the whole design space to provide each region of the design space the equal opportunity to be detected. Hence, Latin hypercube sampling (LHS) [54,55] can be used to generate the initial sample set.

The MOBO procedure is summarized in Fig. 7, which includes the following steps:

*Step 1*: Generate an initial sample set by LHS and obtain the true responses from the original objectives.*Step 2*: Construct the initial Kriging models based on the initial sample points, one model for each objective. The design and analysis of computer experiments toolbox [56] is used to construct the Kriging models.*Step 3*: Sort the sample points to obtain the current Pareto solutions. Here, a fast non-dominated sorting approach is used [57]. In this study, the sorting approach is simplified in two ways to save the sorting time: (1) the simplified sorting approach only divides the sample points into dominated and non-dominated solutions, while the original approach classifies the sample points into multiple dominance levels and (2) the sample points that have been classified as dominated solutions are not sorted in the subsequent iterations.*Steps 4 and 5*: The next sample point is determined by maximizing the proposed acquisition function. The new sample point is used to update the Kriging models and the acquisition function.*Step 6*: If the stop criterion is not met, go to*Step 3*. Otherwise stop and output the Pareto solutions.

The optimization algorithm terminates when the following two conditions are met simultaneously:

When the difference between the current quality metrics and the ones in the previous iteration is less than a specific value (e.g., 1% in this study), or the number of iterations reaches a predetermined maximum level, which is set as 200 in this study.

When the optimization procedure finds at least

*k*solutions so that a sufficient number of Pareto solutions give the decision maker enough choices. In this study,*k*is set as 20.

## 4 Examples and Results

In this section, several commonly used numerical examples (ZDT1, ZDT2, ZDT3, and FON) adapted from Refs. [32,58] and an engineering case adapted from Ref. [59] are used to demonstrate the applicability and efficiency of the proposed MOBO approach.

Since the EIM [48] and BMOO approaches [43] are the most relevant work, the proposed approach is compared with them through the test examples. For the EIM approaches, three multi-objective criteria were developed by combining the elements in the EIM into scalar functions in three different ways: (1) Euclidean distance-based EIM (EIMe), (2) maximin distance-based EIM (EIMm), and (3) hypervolume-based EIM (EIMh). For all of the approaches, the same initial sampling strategy is used and the initial sampling size is set as 30. The same stop criterion described in Sec. 3.3 is also used for all approaches.

### 4.1 Numerical Examples.

Each numerical example is solved 30 times with the same approach to account for the influence of randomness. The formulations of the four examples are listed in Table 1. For ZDT1, ZDT2, and ZDT3, the Kriging model is only constructed for *f*_{2}(*x*) because *f*_{1}(*x*) is very simple. For the FON problem, two Kriging models are constructed for the two objectives respectively.

Cases | Formulation |
---|---|

ZDT1 | $minimizef1(x)=x1f2(x)=g(x)\xd7h(x)whereg(x)=1+9n\u22121\u2211i=2nxih(x)=1\u2212f1(x)/g(x)0\u2264xi\u22641,i=1,\u2026,n$ |

ZDT2 | $minimizef1(x)=x1f2(x)=g(x)\xd7h(x)whereg(x)=1+9n\u22121\u2211i=2nxih(x)=1\u2212f1(x)/g(x)0\u2264xi\u22641,i=1,\u2026,n$ |

ZDT3 | $minimizef1(x)=x1f2(x)=g(x)\xd7h(x)whereg(x)=1+9n\u22121\u2211i=2nxih(x)=1\u2212f1(x)/g(x)\u2212(f1(x)/g(x))sin(10\pi f1)0\u2264xi\u22641,i=1,\u2026,n$ |

FON | $minimizef1(x)=1\u2212exp(\u2212\u2211i=13(xi\u221213)2)f2(x)=1\u2212exp(\u2212\u2211i=13(xi+13)2)\u22124\u2264xi\u22644,i=1,\u2026,3$ |

Cases | Formulation |
---|---|

ZDT1 | $minimizef1(x)=x1f2(x)=g(x)\xd7h(x)whereg(x)=1+9n\u22121\u2211i=2nxih(x)=1\u2212f1(x)/g(x)0\u2264xi\u22641,i=1,\u2026,n$ |

ZDT2 | $minimizef1(x)=x1f2(x)=g(x)\xd7h(x)whereg(x)=1+9n\u22121\u2211i=2nxih(x)=1\u2212f1(x)/g(x)0\u2264xi\u22641,i=1,\u2026,n$ |

ZDT3 | $minimizef1(x)=x1f2(x)=g(x)\xd7h(x)whereg(x)=1+9n\u22121\u2211i=2nxih(x)=1\u2212f1(x)/g(x)\u2212(f1(x)/g(x))sin(10\pi f1)0\u2264xi\u22641,i=1,\u2026,n$ |

FON | $minimizef1(x)=1\u2212exp(\u2212\u2211i=13(xi\u221213)2)f2(x)=1\u2212exp(\u2212\u2211i=13(xi+13)2)\u22124\u2264xi\u22644,i=1,\u2026,3$ |

*n*= 3), to present a detailed comparison of the different approaches. Figure 8 shows a typical set of Pareto frontiers obtained from one of the 30 runs for different approaches. The true Pareto optimal frontier for the ZDT1 problem can be analytically expressed as [60]

As shown in Fig. 8, the Pareto frontiers from the EIM’s, BMOO, and the proposed MOBO approaches are in good agreement with the true Pareto frontier. This indicates that these metamodel-based approaches can obtain satisfactory Pareto solutions.

The comparisons of the quality metrics and computational efficiency for different approaches are summarized in Table 2. In Table 2, FC denotes the number of original function evaluations. The value ranges, means, and STDs of the MHD, MOS, and FC from 30 runs are listed. The box charts of the MHD, MOS, and FC obtained by different approaches for the ZDT1 problem are also shown in Fig. 9.

EIMe | EIMm | EIMh | BMOO | Proposed approach | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | |

MHD | [0.35, 0.38] | 0.37 | 0.01 | [0.34, 0.37] | 0.36 | 0.01 | [0.36, 0.37] | 0.36 | 0.00 | [0.36, 0.43] | 0.38 | 0.02 | [0.36, 0.37] | 0.36 | 0.00 |

MOS | [1.00, 1.56] | 1.13 | 0.15 | [1.01, 1.46] | 1.11 | 0.11 | [1.00, 1.57] | 1.11 | 0.16 | [0.95, 2.17] | 1.25 | 0.34 | [0.99, 1.01] | 1.01 | 0.00 |

FC | [53, 131] | 89.93 | 18.63 | [60, 150] | 91.20 | 19.54 | [67, 207] | 110.00 | 33.12 | [61, 175] | 105.27 | 28.45 | [50, 80] | 52.73 | 6.60 |

EIMe | EIMm | EIMh | BMOO | Proposed approach | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | |

MHD | [0.35, 0.38] | 0.37 | 0.01 | [0.34, 0.37] | 0.36 | 0.01 | [0.36, 0.37] | 0.36 | 0.00 | [0.36, 0.43] | 0.38 | 0.02 | [0.36, 0.37] | 0.36 | 0.00 |

MOS | [1.00, 1.56] | 1.13 | 0.15 | [1.01, 1.46] | 1.11 | 0.11 | [1.00, 1.57] | 1.11 | 0.16 | [0.95, 2.17] | 1.25 | 0.34 | [0.99, 1.01] | 1.01 | 0.00 |

FC | [53, 131] | 89.93 | 18.63 | [60, 150] | 91.20 | 19.54 | [67, 207] | 110.00 | 33.12 | [61, 175] | 105.27 | 28.45 | [50, 80] | 52.73 | 6.60 |

From Table 2 and Fig. 9, it can be seen that the means of quality metrics of Pareto solutions obtained by all four approaches are close to the ideal quality metrics. The proposed approach however shows the best stability with the minimum standard deviations. In terms of computational cost, the number of function evaluations in the proposed approach is significantly lower than those of other approaches. The proposed approach also has the smallest standard deviation of FC. The results show that the proposed approach is more efficient and stable than the other three approaches. To further compare the efficiency of different approaches to find a sufficient number of Pareto solutions, we counted the number of sample points (including initial sample points) required for different approaches to find 10, 15, and 20 Pareto solutions. The results are shown in Table 3. The proposed approach always has the lowest average number of samples. Obtaining the enough number of Pareto solutions is important for multi-objective optimization, because the designer can have more options to meet the design requirements from a larger number of Pareto solutions.

The number of obtained pareto solutions | EIMe | EIMm | EIMh | BMOO | Proposed approach |
---|---|---|---|---|---|

10 | 46.13 | 44.07 | 49.97 | 48.44 | 40.60 |

15 | 72.07 | 76.30 | 81.10 | 78.81 | 46.80 |

20 | 89.77 | 91.17 | 109.87 | 104.93 | 52.73 |

The number of obtained pareto solutions | EIMe | EIMm | EIMh | BMOO | Proposed approach |
---|---|---|---|---|---|

10 | 46.13 | 44.07 | 49.97 | 48.44 | 40.60 |

15 | 72.07 | 76.30 | 81.10 | 78.81 | 46.80 |

20 | 89.77 | 91.17 | 109.87 | 104.93 | 52.73 |

The comparison results for the quality of Pareto solutions and computational efforts for all examples are summarized in Table 4. It is seen that the EIMe, EIMm, EIMh, and BMOO approaches, and the proposed one can obtain satisfactory optimal solutions for the ZDT2 problem with *n* = 3, as well as ZDT3 and FON problems. In terms of computational efficiency, the proposed approach has the lowest computational cost. In addition, for all of the three metrics, the proposed approach has the minimum standard deviations, which indicates that the proposed approach has the best stability for these problems. For the three high-dimensional examples (ZDT2 problems with *n* = 6, 8, and 10), searching for Pareto solutions is more difficult. The “/” symbol in Table 4 indicates that it is difficult for the optimization method to reach the termination conditions when the maximum number of iterations 200 is reached. More iterations are required to find at least 20 Pareto solutions with converged quality metrics. For ZDT2 problems with *n* = 6 and 8, the proposed approach shows the best efficiency and stability. When the maximum number of iterations is reached, it is difficult for the EIMh and BMOO approaches to meet the termination condition for the ZDT2 problems with *n* = 6 and 8. The EIMm approach cannot meet the termination condition for the ZDT2 problem with *n* = 8. For the ZDT2 problem with *n* = 10, all five approaches failed to meet the termination conditions. The reason is that the number of samples required to construct a Kriging model with good approximation grows exponentially as the dimension of the problem increases [24]. As a result, the prediction accuracy of the Kriging model will decrease as the dimension increases when the number of samples does not grow quick enough.

EIMe | EIMm | EIMh | BMOO | Proposed approach | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

30 runs | Mean | STD | 30 runs | Mean | STD | 30 runs | Mean | STD | 30 runs | Mean | STD | 30 runs | Mean | STD | ||

ZDT2 (n = 3) | MHD | [0.69, 0.70] | 0.70 | 0.00 | [0.69, 0.70] | 0.70 | 0.00 | [0.69, 0.70] | 0.70 | 0.00 | [0.66, 0.73] | 0.71 | 0.02 | [0.69, 0.69] | 0.69 | 0.00 |

MOS | [0.70, 1.45] | 1.06 | 0.10 | [1.00, 1.33] | 1.05 | 0.06 | [0.99, 1.53] | 1.06 | 0.12 | [0.88, 1.93] | 1.02 | 0.18 | [1.00, 1.00] | 1.00 | 0.00 | |

FC | [51, 229] | 69.6 | 35.41 | [51, 69] | 56.03 | 4.53 | [63, 230] | 198.43 | 38.5 | [65, 115] | 79.33 | 10.84 | [50, 52] | 50.03 | 0.48 | |

ZDT3 | MHD | [0.43, 0.84] | 0.75 | 0.07 | [0.71, 0.84] | 0.78 | 0.03 | [0.69, 0.86] | 0.77 | 0.03 | [0.52, 0.93] | 0.77 | 0.06 | [0.69, 0.91] | 0.76 | 0.03 |

MOS | [1.11, 3.09] | 1.81 | 0.41 | [1.46, 4.12] | 1.82 | 0.54 | [1.51, 3.88] | 2.10 | 0.52 | [1.14, 4.15] | 2.08 | 0.55 | [1.39, 1.74] | 1.54 | 0.07 | |

FC | [106, 229] | 183.13 | 35.31 | [115, 229] | 181.87 | 29.12 | [127, 229] | 195.1 | 31.71 | [91, 215] | 168.43 | 28.69 | [84, 179] | 150.5 | 28.13 | |

FON | MHD | [0.46, 0.70] | 0.66 | 0.05 | [0.51, 0.70] | 0.67 | 0.04 | [0.53, 0.70] | 0.67 | 0.04 | [0.57, 0.70] | 0.67 | 0.03 | [0.66, 0.70] | 0.69 | 0.01 |

MOS | [0.62, 0.96] | 0.92 | 0.07 | [0.76, 0.96] | 0.93 | 0.04 | [0.76, 0.96] | 0.93 | 0.05 | [0.76, 0.96] | 0.94 | 0.04 | [0.93, 0.96] | 0.96 | 0.01 | |

FC | [73, 179] | 115.6 | 34.99 | [68, 230] | 126.13 | 53.52 | [80, 230] | 158.1 | 54.56 | [80, 230] | 158.1 | 54.56 | [63, 103] | 79 | 9.56 | |

ZDT2 (n = 6) | MHD | [0.69, 0.71] | 0.70 | 0.01 | [0.65, 0.72] | 0.70 | 0.01 | / | / | / | / | / | / | [0.68, 0.69] | 0.69 | 0.00 |

MOS | [1.00, 1.87] | 1.18 | 0.20 | [0.97, 1.98] | 1.14 | 0.21 | / | / | / | / | / | / | [0.99, 1.05] | 1.01 | 0.02 | |

FC | [53, 93] | 66.12 | 9.50 | [50, 73] | 57.97 | 4.79 | / | / | / | / | / | / | [50, 56] | 51.66 | 1.49 | |

ZDT2 (n = 8) | MHD | [0.67, 0.72] | 0.71 | 0.01 | / | / | / | / | / | / | / | / | / | [0.68, 0.70] | 0.69 | 0.01 |

MOS | [1.00, 2.09] | 1.29 | 0.33 | / | / | / | / | / | / | / | / | / | [0.99, 1.05] | 1.01 | 0.02 | |

FC | [53, 118] | 77.19 | 18.59 | / | / | / | / | / | / | / | / | / | [50, 60] | 54.8 | 2.23 | |

ZDT2 (n = 10) | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | |

/ | / | / | / | / | / | / | / | / | / | / | / | / | / | / | ||

/ | / | / | / | / | / | / | / | / | / | / | / | / | / | / |

EIMe | EIMm | EIMh | BMOO | Proposed approach | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

30 runs | Mean | STD | 30 runs | Mean | STD | 30 runs | Mean | STD | 30 runs | Mean | STD | 30 runs | Mean | STD | ||

ZDT2 (n = 3) | MHD | [0.69, 0.70] | 0.70 | 0.00 | [0.69, 0.70] | 0.70 | 0.00 | [0.69, 0.70] | 0.70 | 0.00 | [0.66, 0.73] | 0.71 | 0.02 | [0.69, 0.69] | 0.69 | 0.00 |

MOS | [0.70, 1.45] | 1.06 | 0.10 | [1.00, 1.33] | 1.05 | 0.06 | [0.99, 1.53] | 1.06 | 0.12 | [0.88, 1.93] | 1.02 | 0.18 | [1.00, 1.00] | 1.00 | 0.00 | |

FC | [51, 229] | 69.6 | 35.41 | [51, 69] | 56.03 | 4.53 | [63, 230] | 198.43 | 38.5 | [65, 115] | 79.33 | 10.84 | [50, 52] | 50.03 | 0.48 | |

ZDT3 | MHD | [0.43, 0.84] | 0.75 | 0.07 | [0.71, 0.84] | 0.78 | 0.03 | [0.69, 0.86] | 0.77 | 0.03 | [0.52, 0.93] | 0.77 | 0.06 | [0.69, 0.91] | 0.76 | 0.03 |

MOS | [1.11, 3.09] | 1.81 | 0.41 | [1.46, 4.12] | 1.82 | 0.54 | [1.51, 3.88] | 2.10 | 0.52 | [1.14, 4.15] | 2.08 | 0.55 | [1.39, 1.74] | 1.54 | 0.07 | |

FC | [106, 229] | 183.13 | 35.31 | [115, 229] | 181.87 | 29.12 | [127, 229] | 195.1 | 31.71 | [91, 215] | 168.43 | 28.69 | [84, 179] | 150.5 | 28.13 | |

FON | MHD | [0.46, 0.70] | 0.66 | 0.05 | [0.51, 0.70] | 0.67 | 0.04 | [0.53, 0.70] | 0.67 | 0.04 | [0.57, 0.70] | 0.67 | 0.03 | [0.66, 0.70] | 0.69 | 0.01 |

MOS | [0.62, 0.96] | 0.92 | 0.07 | [0.76, 0.96] | 0.93 | 0.04 | [0.76, 0.96] | 0.93 | 0.05 | [0.76, 0.96] | 0.94 | 0.04 | [0.93, 0.96] | 0.96 | 0.01 | |

FC | [73, 179] | 115.6 | 34.99 | [68, 230] | 126.13 | 53.52 | [80, 230] | 158.1 | 54.56 | [80, 230] | 158.1 | 54.56 | [63, 103] | 79 | 9.56 | |

ZDT2 (n = 6) | MHD | [0.69, 0.71] | 0.70 | 0.01 | [0.65, 0.72] | 0.70 | 0.01 | / | / | / | / | / | / | [0.68, 0.69] | 0.69 | 0.00 |

MOS | [1.00, 1.87] | 1.18 | 0.20 | [0.97, 1.98] | 1.14 | 0.21 | / | / | / | / | / | / | [0.99, 1.05] | 1.01 | 0.02 | |

FC | [53, 93] | 66.12 | 9.50 | [50, 73] | 57.97 | 4.79 | / | / | / | / | / | / | [50, 56] | 51.66 | 1.49 | |

ZDT2 (n = 8) | MHD | [0.67, 0.72] | 0.71 | 0.01 | / | / | / | / | / | / | / | / | / | [0.68, 0.70] | 0.69 | 0.01 |

MOS | [1.00, 2.09] | 1.29 | 0.33 | / | / | / | / | / | / | / | / | / | [0.99, 1.05] | 1.01 | 0.02 | |

FC | [53, 118] | 77.19 | 18.59 | / | / | / | / | / | / | / | / | / | [50, 60] | 54.8 | 2.23 | |

ZDT2 (n = 10) | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | |

/ | / | / | / | / | / | / | / | / | / | / | / | / | / | / | ||

/ | / | / | / | / | / | / | / | / | / | / | / | / | / | / |

### 4.2 An Engineering Example.

In this section, the proposed approach is applied to the design optimization of a torque arm. As shown in Fig. 10, the torque arm is fixed to the hole at the left end. Two forces, *P*_{1} = 8.0 kN and *P*_{2} = 4.0 kN, are placed at the center of the right end. Young’s modulus and Poisson’s ratio of the material are 200 GPa and 0.3, respectively.

*α*,

*b*

_{1},

*D*

_{1},

*h*,

*t*

_{1}, and

*t*

_{2}. The design optimization of the torque arm can be expressed as

*V*is the total volume of the torque arm, and $max_d$ is the maximum displacement. In the simulation-based optimization process, the maximum displacement and stress of the torque arm are predicted by two Kriging metamodels. The penalty function [61] is used to handle constraints in this example. For unknown or computationally expensive constraints, surrogates of constraints can also be constructed to quickly identify the infeasible solutions [23].

Each of the five approaches is run 10 times for this problem. The Pareto frontiers obtained by these different approaches are plotted in Fig. 11. As shown in Fig. 11, the Pareto frontiers obtained by the different approaches are in good consistency. The diversity of the proposed approach is better than the other four approaches. The Pareto solutions obtained by the EIMe and EIMm approaches cluster in some regions and are not as diverse or distributed as the ones obtained by the EIMh, BMOO, and the proposed approach. Obviously, a more uniformly distributed solution set is more beneficial for the designer. The design variable values of five Pareto solutions, labeled with numbers in Fig. 11, and the corresponding responses are listed in Table 5.

No. | Design variables | Volume | Maximum displacement |
---|---|---|---|

1 | (3.75, 27.25, 90.03, 23.32, 12.05, 8.03) | 444.25 | 1.59 |

2 | (4.18, 25.00, 90.00, 29.29, 12.00, 8.00) | 470.97 | 1.11 |

3 | (4.50, 31.78, 90.00, 30.00, 12.00, 8.00) | 523.64 | 0.83 |

4 | (4.50, 35.00, 90.00, 29.99, 22.00, 13.00) | 650.46 | 0.64 |

5 | (4.50, 35.00, 120.00, 30.00, 22.00, 13.00) | 799.36 | 0.634 |

No. | Design variables | Volume | Maximum displacement |
---|---|---|---|

1 | (3.75, 27.25, 90.03, 23.32, 12.05, 8.03) | 444.25 | 1.59 |

2 | (4.18, 25.00, 90.00, 29.29, 12.00, 8.00) | 470.97 | 1.11 |

3 | (4.50, 31.78, 90.00, 30.00, 12.00, 8.00) | 523.64 | 0.83 |

4 | (4.50, 35.00, 90.00, 29.99, 22.00, 13.00) | 650.46 | 0.64 |

5 | (4.50, 35.00, 120.00, 30.00, 22.00, 13.00) | 799.36 | 0.634 |

The detailed comparisons of the performances are summarized in Table 6. It can be seen in Table 6 that the proposed approach has better efficiency and diversity than the other approaches. For the two quality metrics and one efficiency metric, the proposed approach has the smallest standard deviations, which shows the best stability.

EIMe | EIMm | EIMh | BMOO | Proposed approach | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | |

MHD | [48.17, 63.85] | 55.22 | 4.76 | [50.13, 64.25] | 57.81 | 4.5 | [34.96, 46.95] | 47.82 | 3.67 | [43.47, 55.77] | 47.82 | 3.67 | [44.39, 52.26] | 47.64 | 2.64 |

MOS | [96.98, 294.83] | 170.84 | 58.7 | [99.83, 188.77] | 155.81 | 25.52 | [111.10, 235.32] | 185.88 | 39.08 | [110.92, 310.45] | 270.44 | 19.21 | [314.30, 356.84] | 330.74 | 15.48 |

FC | [94, 167] | 122.2 | 20.34 | [97, 139] | 114.5 | 15.58 | [56, 71] | 64.1 | 4.72 | [62, 89] | 68.13 | 6.54 | [60, 69] | 63.3 | 2.41 |

EIMe | EIMm | EIMh | BMOO | Proposed approach | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | Range | Mean | STD | |

MHD | [48.17, 63.85] | 55.22 | 4.76 | [50.13, 64.25] | 57.81 | 4.5 | [34.96, 46.95] | 47.82 | 3.67 | [43.47, 55.77] | 47.82 | 3.67 | [44.39, 52.26] | 47.64 | 2.64 |

MOS | [96.98, 294.83] | 170.84 | 58.7 | [99.83, 188.77] | 155.81 | 25.52 | [111.10, 235.32] | 185.88 | 39.08 | [110.92, 310.45] | 270.44 | 19.21 | [314.30, 356.84] | 330.74 | 15.48 |

FC | [94, 167] | 122.2 | 20.34 | [97, 139] | 114.5 | 15.58 | [56, 71] | 64.1 | 4.72 | [62, 89] | 68.13 | 6.54 | [60, 69] | 63.3 | 2.41 |

## 5 Conclusion

In this paper, a new acquisition function is proposed for multi-objective Bayesian optimization, where the improvement of the hyperarea and overall spread are used to guide the sequential sampling to search the Pareto solutions. Two modified metrics of hyperarea and overall spread of the Pareto frontier are proposed and used in the new acquisition function, which encourages searching for a large number of Pareto solutions and provides a balance between convergence and diversity of the Pareto frontier. The LCB functions are used to incorporate uncertainty in surrogate modeling and find the exploration–exploitation balance, while keeping the computational efficiency.

The proposed approach is demonstrated with four numerical examples and an engineering case. Its performance is compared with other four expected improvement matrix approaches. The results show that the proposed approach is computationally more efficient with fewer numbers of function evaluations to obtain the same number of Pareto solutions. The new approach is also stable and robust with the minimum level of variations.

While computational efficiency is the consideration of using the LCB functions, the main computational cost still remains at constructing the Kriging models. Scalability has been a major issue for Kriging as the dimension of searching space increases. The computational efficiency of the current scheme can be further improved. As the number of sample points increases, the computational time for sorting the non-dominated solutions and calculating MHD and MOS becomes longer. More efficient data structures and algorithms can be developed. In addition, a constant exploitation–exploration coefficient is used in our LCB functions. A dynamically adaptive coefficient value can have a better balance between the exploitation and exploration. In future work, we will also examine the efficiency and accuracy of the proposed approach for optimization with more than two objectives.

## Acknowledgment

This research has been supported in part by the National Natural Science Foundation of China (Grant Nos. 51775203, 51805179, and 51721092) and the Fundamental Research Funds for the Central Universities at HUST (Grant No. 2016YXMS272). The support of China Scholarship Council is also appreciated.