## Abstract

Establishing fast and accurate structure-to-property relationships is an important component in the design and discovery of advanced materials. Physics-based simulation models like the finite element method (FEM) are often used to predict deformation, stress, and strain fields as a function of material microstructure in material and structural systems. Such models may be computationally expensive and time intensive if the underlying physics of the system is complex. This limits their application to solve inverse design problems and identify structures that maximize performance. In such scenarios, surrogate models are employed to make the forward mapping computationally efficient to evaluate. However, the high dimensionality of the input microstructure and the output field of interest often renders such surrogate models inefficient, especially when dealing with sparse data. Deep convolutional neural network (CNN) based surrogate models have shown great promise in handling such high-dimensional problems. In this paper, we consider a single ellipsoidal void structure under a uniaxial tensile load, represented by a linear elastic, high-dimensional, and expensive-to-query FEM model. We consider two deep CNN architectures, a modified convolutional autoencoder framework with a fully connected bottleneck and a UNet CNN, and compare their accuracy in predicting the von Mises stress field for any given input void shape in the FEM model. Additionally, a sensitivity analysis study is performed using the two approaches, where the variation in the prediction accuracy on unseen test data is studied through numerical experiments by varying the number of training samples from 20 to 100.

## 1 Introduction

During the metal forming process, large plastic deformation in a material usually leads to ductile fracture [1]. At the microscopic level, ductile fracture occurs through void nucleation [2,3], void growth [4,5], and void coalescence [6–8]. Void geometry [9,10] plays a key role in influencing ductile fracture behavior, and it is important to know how void geometry influences local stress fields under applied load. This information can be utilized in an inverse design framework to obtain an optimal void geometry that helps avoid or delay ductile fracture. Physics-based models simulating complex ductile fracture behavior are often very expensive to run. The resulting lack of data renders the inverse design process inefficient. Surrogate models [11–19] are good alternatives that build a fast yet accurate input–output relationship and have been successfully used in several engineering problems [20–28]. Most existing surrogate modeling methods, such as Gaussian process regression and polynomial chaos expansions [29–32], are data efficient in the low-dimensional regime. They have shown immense promise in modeling datasets that stem from computationally expensive solvers under a finite budget. However, when dealing with high-dimensional image-like input and output field data, they need more data to achieve satisfactory accuracy, and their efficiency is compromised because training becomes slow. Moreover, field data usually have underlying spatial correlations that are lost when the data are vectorized. One way to overcome the challenges associated with high-dimensional image data is to apply dimensionality reduction techniques [33,34], which compress the data into a low-dimensional space while preserving the maximum amount of information.
Some popular dimensionality reduction methods used for surrogate model building include principal component analysis [35], proper orthogonal decomposition [36], gradient-free Bayesian methods [37], variational autoencoders [38], manifold learning methods [39], and gradient information-based nonlinear dimension reduction [40]. However, this approach requires the combined training of three different models: a first model $\varphi_x$ that maps the high-dimensional input data $X$ into low-dimensional data $X^*$, a second model $\varphi_y$ that maps the high-dimensional output data $Y$ into low-dimensional data $Y^*$, and finally a surrogate model $f_{xy}$ that builds a fast and accurate relationship between $X^*$ and $Y^*$ such that $Y=\varphi_y^{-1}(f_{xy}(\varphi_x(X)))$. This approach is effective, but combining the three modeling tasks into a single training process can be cumbersome. Convolutional neural networks (CNNs) are useful in the sense that they enable a direct image-to-image mapping $\psi$ such that $Y=\psi(X)$. They are very effective in identifying important features and detecting the spatial correlations typically present in image-like field data [41–48].
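To make the reduce-then-regress pipeline above concrete, a minimal sketch is given below, assuming truncated principal component analysis for both $\varphi_x$ and $\varphi_y$ and a linear least-squares fit for $f_{xy}$; the data sizes, latent dimension, and linear surrogate are illustrative assumptions, not the models used later in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional paired data: 50 samples of 256-dim vectorized fields.
X = rng.normal(size=(50, 256))
Y = 0.1 * (X @ rng.normal(size=(256, 256)))  # synthetic linear response

def fit_pca(A, k):
    """Return the mean and the k leading principal directions of A."""
    mean = A.mean(axis=0)
    _, _, Vt = np.linalg.svd(A - mean, full_matrices=False)
    return mean, Vt[:k].T  # columns are principal directions

k = 8  # latent dimension (illustrative)
mx, Px = fit_pca(X, k)  # encoder phi_x
my, Py = fit_pca(Y, k)  # encoder phi_y

X_star = (X - mx) @ Px  # low-dimensional inputs X*
Y_star = (Y - my) @ Py  # low-dimensional outputs Y*

# Surrogate f_xy: linear least squares between the two latent spaces.
W, *_ = np.linalg.lstsq(X_star, Y_star, rcond=None)

def predict(x_new):
    """Y = phi_y^{-1}(f_xy(phi_x(X))): encode, regress, decode."""
    return (((x_new - mx) @ Px) @ W) @ Py.T + my

Y_hat = predict(X)
```

The three pieces ($\varphi_x$, $\varphi_y$, $f_{xy}$) are fit separately here, which is exactly the coordination burden the CNN-based direct mapping $\psi$ avoids.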

To predict stress fields corresponding to different input geometry maps using machine learning (ML), the stresses are often interpolated onto regular grids of pixels or voxels. Convolutional operations can then be applied on both the input and output maps to infer the output field of interest for unseen geometries. The literature contains several studies of efficient simulation-data-based surrogate models, such as Refs. [49–51]. While different flavors of CNNs are used in these, it is noteworthy that the ML models typically require large amounts of data. For instance, Ref. [50] utilizes 15,000 input–output data pairs in training, and Ref. [49] utilizes up to 100,000 data points. While this setup is possible in simulations (albeit computationally costly), synthesizing such a large amount of experimental data is often prohibitively expensive. Alternatively, neural networks can also be used to map a few selected summary statistics, such as maximum stress values and the volume of high-stress regions, from geometric parameters and applied force values; this potentially requires fewer data points but loses the full-field information in the reconstructions.

In this paper, a single ellipsoidal void representative volume element (RVE) model in a linear elastic material domain is considered. As an approximation, void growth, nucleation, and coalescence are not considered in the simulation model. The ellipsoidal void in the RVE model can assume different shapes depending on its aspect ratio and angle of orientation. The objective is to map the stress field efficiently for varying void shapes. We consider a modified convolutional autoencoder (CAE) network and a UNet CNN as deep learning surrogate models to predict the von Mises stress field maps as a function of the input shape of the ellipsoidal void. Neural architecture search and hyperparameter tuning are key to obtaining optimally trained networks, especially in the limited data regime. For that purpose, the hyperparameters of the CAE network are asynchronously optimized using deephyper [52], while the hyperparameters of the UNet CNN are optimized using a Monte Carlo sampling-based algorithm.

Both the CAE and the UNet are extensively studied architectures, most often in the big-data regime. Similarly, hyperparameter optimization using deephyper is also largely done with an abundance of data. In this work, we operate in an extremely limited data regime, where the number of training points is varied between roughly 20 and 100. Careful data curation, careful network specification, and meticulous sensitivity studies are necessary to achieve satisfactory accuracy in these situations. The key impacts of this work are highlighted here: (a) the two methods implemented and tested are intended for the low data regime, (b) the extensive hyperparameter search capability of the deephyper framework and the simplicity of the UNet lend themselves to low data regime problems without the need for sophisticated intervention for the engineering problems of interest, and (c) this work carries out an exhaustive study of the performance of the models across data sizes to further analyze and understand their generalized applicability.

The rest of the paper is organized as follows: first, we describe the engineering problem in Sec. 2, next we provide an overview of the different modeling methods used for image-based surrogate modeling in Sec. 3, followed by the numerical results on the engineering problem of high-dimensional elastic stress field prediction in Sec. 4. We summarize our conclusions from this work in Sec. 5.

## 2 Problem Description

A representative void problem is set up to investigate the methods proposed in the paper. The parent shape of the void is ellipsoidal, and the domain is cuboidal (refer to Fig. 1). The centers of the void and the domain coincide. The behavior of the material in the domain is assumed to be linear elastic. Simulation cases with different void geometries can be generated by varying the void axial lengths $R_x$, $R_y$, $R_z$ and the void axial rotations $\theta_x$, $\theta_y$, and $\theta_z$. The bounds on the axis lengths of the void are determined by enforcing the far-field stress conditions to avoid any edge effects [53–59]. Based on the presence of feature rotation, two categories of datasets are generated, namely the non-rotated dataset and the rotated dataset. For the non-rotated dataset, the axis lengths $R_x$, $R_y$, and $R_z$ are varied within their respective bounds. For the rotated dataset, the axial rotation $\theta_y$ and the axis lengths $R_x$, $R_y$, and $R_z$ are varied within their respective bounds. Tables 1 and 2 specify the ranges of the variable parameters for the non-rotated and rotated dataset cases, respectively.

Table 1: Parameter ranges for the non-rotated dataset

| Parameter | Lower bound | Upper bound |
|---|---|---|
| $R_x$ | $0.05 \times L_x$ | $0.1 \times L_x$ |
| $R_y$ | $0.05 \times L_y$ | $0.1 \times L_y$ |
| $R_z$ | $0.05 \times L_z$ | $0.1 \times L_z$ |


Table 2: Parameter ranges for the rotated dataset

| Parameter | Lower bound | Upper bound |
|---|---|---|
| $R_x$ | $0.05 \times L_x$ | $0.1 \times L_x$ |
| $R_y$ | $0.05 \times L_y$ | $0.1 \times L_y$ |
| $R_z$ | $0.05 \times L_z$ | $0.1 \times L_z$ |
| $\theta_y$ | $5^\circ$ | $85^\circ$ |


A uniformly distributed load is applied on the top and bottom faces of the cuboidal domain. The load and material properties for the simulation are given in Table 3. The simulations were conducted through ANSYS APDL scripts, and Python scripts were leveraged for automation. 250 data points were generated with randomly sampled ellipsoidal axis lengths and rotation angles for each of the rotated and non-rotated cases. Nodal von Mises stress values are queried from each simulation run as outputs. To enable faster development cycles for the surrogate mapping, the data generated initially as three-dimensional tensors are converted to two-dimensional tensors by extracting the data in the $X$–$Z$ plane at $Y=0$, as shown in Fig. 2. These two-dimensional tensors are on regular grids of shape $128\times128$, intended to be used with convolutional neural networks. Representative results from the simulations for the two data categories are presented in Figs. 3 and 4, respectively. The datasets generated in the two categories serve as the test bed for the downstream surrogate model development and analysis.
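The random sampling of void geometries described above can be sketched as follows; the unit domain lengths and the use of independent uniform sampling are assumptions for illustration, with bounds mirroring Tables 1 and 2.

```python
import random

random.seed(42)
Lx = Ly = Lz = 1.0  # assumed unit cuboidal domain dimensions

def sample_void(rotated=False):
    """Draw one void geometry within the bounds of Tables 1 and 2."""
    params = {
        "Rx": random.uniform(0.05 * Lx, 0.1 * Lx),
        "Ry": random.uniform(0.05 * Ly, 0.1 * Ly),
        "Rz": random.uniform(0.05 * Lz, 0.1 * Lz),
    }
    if rotated:
        # Axial rotation about Y, in degrees, for the rotated dataset.
        params["theta_y"] = random.uniform(5.0, 85.0)
    return params

# 250 samples per dataset category, as in the study.
non_rotated = [sample_void() for _ in range(250)]
rotated = [sample_void(rotated=True) for _ in range(250)]
```

Each sampled dictionary would then parameterize one ANSYS APDL run; the downstream automation and stress extraction are not shown here.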

## 3 Methodology

The two deep learning architectures considered in this investigation belong to the class of CNNs that are popularly used for direct image-to-image mapping tasks. However, in most of the applications, these frameworks are trained with an enormous amount of data points compared to our problem. In the following subsections, we give a summary of the two CNN architectures and discuss how the architectures and the associated hyperparameters are optimized.

### 3.1 Convolutional Autoencoder Architecture.

A standard autoencoder (AE) [60,61] is a combination of two detachable neural network architectures: an encoder and a decoder. The trained encoder maps the high-dimensional input data to a low-dimensional representation in the latent space. The trained decoder, on the other hand, maps the latent space back to the original high-dimensional input data. Thus, the true decoder output is the same as the encoder input. For multilayer AEs, the encoder becomes progressively narrower, and the decoder network gets wider until it matches the original input dimension. In the case of convolutional autoencoders, the input images are first subjected to the encoder containing the convolutional layers, followed by fully connected (dense) bottleneck layers. The decoder mirrors this architecture by stacking dense layers followed by deconvolution layers. A representative CAE architecture is shown in Fig. 5. For the purpose of this work, the original convolutional autoencoder has been modified by replacing the input image with a different target image as the decoder output. Thus, instead of self-supervised training (i.e., where the input and output are the same image), we perform a supervised mapping from the 2D images of the ellipsoidal void RVE to the corresponding von Mises stress field. In our CAE architecture, the encoder and decoder each consist of approximately 5 million trainable parameters. The trainable parameters consist of the weights and biases in both the fully connected layers and the convolutional layers.
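The contraction of the CAE encoder toward the dense bottleneck can be illustrated by tracking the feature-map size through a stack of stride-2 convolutions on the $128\times128$ input; the kernel size, filter schedule, and latent dimension below are illustrative assumptions, not the tuned architecture.

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    """Spatial size after one strided convolution with padding."""
    return (size + 2 * pad - kernel) // stride + 1

size, filters = 128, 1  # 128x128 single-channel geometry map
encoder = []
for f in (16, 32, 64, 128):  # assumed filter schedule
    size = conv_out(size)    # each stride-2 layer halves the resolution
    filters = f
    encoder.append((size, filters))

flat = size * size * filters  # flattened size before the dense bottleneck
latent = 64                   # assumed latent space dimension
# encoder: 128x128x1 -> 64x64x16 -> ... -> 8x8x128 -> flatten -> dense(latent)
# decoder mirrors this: dense(flat) -> reshape -> transposed convolutions
```

The decoder simply reverses this bookkeeping, which is why the dense bottleneck dimension and the maximum dense-layer width are natural hyperparameters to expose to the deephyper search.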

### 3.2 UNet Architecture.

The UNet CNN architecture was initially proposed to perform accurate medical image segmentation [62] and has subsequently found application in various image-to-image translation tasks [63,64]. The standard UNet architecture is very similar to the CAE architecture in that it contains a series of contracting layers followed by a set of expanding layers. The distinct feature of the UNet is the presence of skip connections propagating context information from the contracting layers to the expanding layers, which potentially enhances the resolution of the target output. A schematic of the UNet architecture is shown in Fig. 6. Unlike the CAE architecture, the encoder and decoder networks in the UNet architecture are separated by a convolutional bottleneck layer instead of a dense bottleneck layer. The UNet architecture consists of about 20 million trainable parameters in total.
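The skip connection itself is simply a channel-wise concatenation of a saved encoder feature map with the matching upsampled decoder feature map; the shape sketch below illustrates this (the resolution and channel counts are assumed for illustration).

```python
import numpy as np

# Assumed feature maps at one resolution level (batch, H, W, channels).
encoder_feat = np.zeros((1, 32, 32, 64))  # saved during the contracting path
decoder_feat = np.zeros((1, 32, 32, 64))  # after upsampling in the expanding path

# The skip connection concatenates along the channel axis, so the next
# convolution sees both the decoder's semantic context and the encoder's
# fine spatial detail at the same resolution.
merged = np.concatenate([encoder_feat, decoder_feat], axis=-1)
```

This doubling of channels at each expansion level is what lets the UNet recover sharp, localized features such as stress concentrations around the void.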

### 3.3 Hyperparameter Optimization.

During the training of the above-mentioned architectures, the weights of the encoder and decoder networks, as well as the bottleneck layers, are updated via back-propagation such that the predicted stress field matches the corresponding ground-truth stress field as closely as possible. In addition to the network weights optimized via supervised training, we also have two classes of tunable hyperparameters that are not involved in the back-propagation-based training, namely the network-independent hyperparameters and the network-based architecture hyperparameters, which can potentially help in maximizing the prediction accuracy.

For the CAE architecture-based hyperparameter optimization, we utilize the automated search platform deephyper [52]. We use five hyperparameters in our optimization: the initial learning rate, decay rate, and training batch size are the network-independent hyperparameters, whereas the maximum width of the dense layers and the latent space dimension are the network-based hyperparameters. Using the centralized Bayesian optimization algorithm for model searches, we deploy deephyper on each CAE core training run with a particular configuration. The optimal configuration is selected among $100$ sampled configurations based on the validation accuracy. This is particularly useful in our problem with a small number of training points, where the models are prone to sub-optimal training. In addition, one can also obtain epistemic and aleatoric uncertainties using the ensemble of models obtained from deephyper. However, this analysis is not considered in this body of work and is reserved for a later study. The search is conducted on the NVIDIA A100 GPUs available at the Argonne Laboratory Computing Resource Center.

In the case of the UNet architecture-based hyperparameter optimization, the network-independent hyperparameters considered are the learning rate, decay rate, and batch size, while the network-based hyperparameters considered are the number of convolutional layers, the number of filters for each convolutional layer, and the type of normalization applied in each layer. We performed a Monte Carlo sampling-based optimization by generating $1000$ space-filling random samples of hyperparameter configurations and selecting the optimal configuration based on the validation accuracy. The hyperparameter search is conducted on an NVIDIA RTX A6000 GPU on a local computer.
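The Monte Carlo sampling-based search can be sketched as follows; the sampling ranges are illustrative, and the scoring function is a placeholder for what, in the study, would be the validation accuracy of a UNet trained with each configuration.

```python
import random

random.seed(0)

def sample_config():
    """Draw one hyperparameter configuration (ranges are illustrative)."""
    return {
        "learning_rate": 10 ** random.uniform(-4, -2),  # log-uniform
        "decay_rate": random.uniform(0.9, 0.999),
        "batch_size": random.choice([4, 8, 16]),
        "n_conv_layers": random.randint(3, 6),
        "normalization": random.choice(["batch", "layer", "none"]),
    }

def validation_score(cfg):
    """Placeholder objective: in practice, train a UNet with cfg and
    return its accuracy on the held-out validation set."""
    return -abs(cfg["learning_rate"] - 1e-3)  # dummy stand-in

# 1000 random configurations; keep the one with the best validation score.
configs = [sample_config() for _ in range(1000)]
best = max(configs, key=validation_score)
```

Unlike the Bayesian search used for the CAE, this approach evaluates configurations independently, which makes it trivially parallel but blind to information from earlier evaluations.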

## 4 Results

We have developed in-house CNN architectures using the TensorFlow [65] package for implementing machine learning applications in Python. These differ from publicly and commercially available CAE and UNet architectures, since our implementation involves customized loss functions that incorporate the void geometry. The performance of both deep CNN models on the direct geometry-to-stress mapping is first demonstrated for a total training data budget of 100. Then we demonstrate how the performance changes with varying training data sizes. Across the following comparison studies, the training, validation, and testing data splits are consistent between the CAE and the UNet.

### 4.1 Training With 100 Data Points.

#### 4.1.1 Convolutional Autoencoder-Based Prediction Results.

This section highlights and discusses the results obtained from the CAE-based training on the non-rotated and rotated datasets. Ninety data points are used for training the CAE model, while 10 and 150 data points are used for validation and testing, respectively. An unweighted root mean squared error (RMSE) is used as the loss function for training the CAE architecture. Figures 7 and 8 depict the best-case and worst-case test data predictions of the CAE for non-rotated cases, respectively. For the best prediction performance, it can be seen that the 2D ellipse has a higher aspect ratio, thereby minimizing the stress concentrations on the edges perpendicular to the applied load. On the contrary, the worst prediction is the one with the lowest aspect ratio, where the stress concentrations are significantly higher on the edges perpendicular to the applied load. Figure 9 compares the true and CAE-predicted stress metrics across the training, validation, and test data for the non-rotated case. It can be seen from the figure that as the prediction percentile increases from the $50$th percentile (median) to the $100$th percentile (maximum), the scatter of the predictions increases. The CAE model is able to capture the mean stress fields better than the higher percentiles. The reduced scatter in the average predicted stress versus true stress is expected, because the average aggregates values across the entire stress field. However, the maximum stress is a singular value on the field, where the predicted values are expected to have larger deviations from the target stress values. This can be attributed to the data distribution being biased toward moderate stress values (thus facilitating better optimization in those regions compared to extreme values) and to the usage of an unweighted loss function for the CAE training.

Analogous plots for the rotated dataset are given in Figs. 10–12. Similar to the non-rotated dataset case, Fig. 12 shows an increasing scatter trend as the percentile measure of the stress output increases. This behavior can be attributed to the use of the unweighted RMSE metric explained earlier. When the best-case test data prediction (shown in Fig. 10) is inspected further, it can be seen that the rotation is minimal. On the other hand, the worst-case test data prediction (shown in Fig. 11) is an ellipse that has been rotated to a recognizable extent. These two shapes represent the opposite ends of the spectrum in terms of geometric complexity and the stress variations generated.

#### 4.1.2 UNet-Based Prediction Results.

This section showcases the predictive performance of the trained UNet model. Ninety training data points and 10 validation data points were used for the training. A weighted RMSE is used as the loss function for training the UNet architecture. After training, von Mises stress field maps are generated using the UNet model for 150 unseen input ellipsoidal void model geometries. Figure 13 displays the stress map prediction for the test case with the highest prediction accuracy among the entire test dataset, while Fig. 14 displays the prediction for the test case with the lowest accuracy. It is noted that the test dataset was not used during the training of the UNet architecture. Figure 15 compares the UNet predictions of different statistical measures of the von Mises stress field with the corresponding ground truth values for the training, validation, and testing datasets. It is seen from Fig. 15 that the predicted and true average and median stress values show very good correlations for the train, validation, and test datasets. However, for the high-percentile stress metrics, the correlations are weaker across all three datasets. This suggests that the trained model has good predictive power for capturing stress values in the low-stress regions, and that the predictive performance worsens when capturing stress values in higher-stress regions. This behavior is very similar to the CAE-based results demonstrated in the previous section.
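A weighted RMSE of the kind used for the UNet training can be formed by letting high-stress pixels contribute more to the loss; the specific weighting below (proportional to the true stress magnitude) is an assumption for illustration, since the exact weighting scheme is not specified here.

```python
import numpy as np

def weighted_rmse(y_true, y_pred, eps=1e-8):
    """RMSE with per-pixel weights that emphasize high-stress regions.

    The weights (assumed here to scale with the true stress magnitude)
    penalize errors near stress concentrations more heavily than errors
    in the far-field, low-stress background.
    """
    w = y_true / (y_true.max() + eps)
    return np.sqrt(np.sum(w * (y_true - y_pred) ** 2) / np.sum(w))

# Toy 128x128 stress fields standing in for FEM output and a prediction.
rng = np.random.default_rng(1)
truth = rng.uniform(0.0, 300.0, size=(128, 128))
pred = truth + rng.normal(0.0, 5.0, size=(128, 128))
loss = weighted_rmse(truth, pred)
```

An unweighted RMSE (as used for the CAE) is recovered by setting all weights to one, which explains the weaker high-percentile correlations observed for that model.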

A similar analysis is done for the rotated dataset cases. Figure 16 displays the stress map prediction for the test case with the highest prediction accuracy among the entire test dataset, while Fig. 17 displays the prediction for the test case with the lowest accuracy. Figure 18 compares the UNet predictions of different statistical measures of the von Mises stress field with the corresponding ground truth values for the training, validation, and testing datasets.

### 4.2 Sensitivity Analysis on Training Data Size.

In this section, we study how the prediction accuracy on the test dataset varies with the training data size in the small data regime for the non-rotated and rotated dataset cases. To compare the performance of the CAE-based and UNet-based predictions on test data, the normalized RMSE for the median, 90th percentile, 99th percentile, and maximum stress metrics using both architectures is plotted in Figs. 19(a) and 19(b). It is observed that only for the maximum stress case does the normalized RMSE on the unseen test data predictions show signs of slight improvement with an increase in the number of training data. For all the other stress statistics, the error is found to be almost insensitive to the variation in the training data size. Thus, overall, the prediction performance of the CAE and the UNet models, even at a small data size of $20$, is satisfactory for the moderate stress regions. From Fig. 19(a), in the non-rotated dataset case, it is observed that the performance of the two architectures is similar for the median and 90th percentile stress predictions. However, for the 99th percentile and maximum stress prediction cases, the UNet predictions are found to be more accurate than those of the CAE. Similar conclusions can be made in general for the rotated dataset case from Fig. 19(b). The enhanced performance of the UNet compared to the CAE can be attributed to two main reasons. First, the skip connections in the UNet architecture help transfer important semantic information from the input RVE to the output stress field, which helps in capturing the localized high-stress regions. Second, for the UNet training, a weighted mean squared error loss is used as the loss function (as opposed to the unweighted loss for the CAE case), which essentially puts more importance on capturing the higher-stress regions accurately.
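The percentile stress statistics and normalized RMSE compared in Fig. 19 can be computed as in the sketch below; normalizing by the range of the true values is an assumption, and the synthetic fields merely stand in for FEM outputs and surrogate predictions.

```python
import numpy as np

def percentile_metrics(fields, qs=(50, 90, 99, 100)):
    """Per-sample percentile statistics of a batch of stress fields."""
    flat = fields.reshape(fields.shape[0], -1)
    return {q: np.percentile(flat, q, axis=1) for q in qs}

def nrmse(true_vals, pred_vals):
    """RMSE normalized by the range of the true values (assumed scheme)."""
    rmse = np.sqrt(np.mean((true_vals - pred_vals) ** 2))
    return rmse / (true_vals.max() - true_vals.min())

# Synthetic stand-ins: 150 test fields of shape 128x128 and noisy predictions.
rng = np.random.default_rng(2)
true_fields = rng.uniform(0.0, 300.0, size=(150, 128, 128))
pred_fields = true_fields + rng.normal(0.0, 10.0, size=(150, 128, 128))

t, p = percentile_metrics(true_fields), percentile_metrics(pred_fields)
errors = {q: nrmse(t[q], p[q]) for q in t}
```

Repeating this computation for each training-set size and each architecture yields the curves plotted in Figs. 19(a) and 19(b).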

## 5 Conclusions

In this study, we used deep CNNs to construct an efficient and accurate mapping between the input void shapes and the corresponding von Mises stress field maps. We considered a CAE-based CNN and a UNet-based CNN to build the surrogate map in a low-data regime, considering training datasets of only up to 100 points. This is the novelty of this work: while the majority of ML-based stress field reconstructions do not focus on the low data regime (for instance, see Refs. [49,50]), our framework is tailored and tested solely for situations where data are sparse. While we have only demonstrated our work on relatively inexpensive numerical simulation datasets, the prospective impacts on emulating manufacturing processes with experimental data or sophisticated simulations are considerable.

We also performed a sensitivity study that demonstrated how the test prediction accuracy evolves with changes in the training data size and compared the two deep learning methods. We thus show how deep neural networks can provide efficient structure-to-performance maps and thereby enable efficient inverse design, especially when data are sparse. It is noted that the FEM simulation model considered is a simple linear elastic simulation. However, the prediction results obtained with sparse data indicate that deep neural networks can be efficiently used to build surrogates of more sophisticated and expensive simulation models that capture the physics of the system better.

## Acknowledgment

This material is based upon work supported by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under the Advanced Materials and Manufacturing Technologies Office, Award Number DE-AC02-06CH11357. The views expressed herein do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Work at Argonne National Laboratory was supported by the U.S. Department of Energy, Office of High Energy Physics. Argonne, a U.S. Department of Energy Office of Science Laboratory, is operated by UChicago Argonne LLC under Contract No. DE-AC02-06CH11357. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains, a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. Part of the analysis here was carried out on Swing, a GPU system at the Laboratory Computing Resource Center (LCRC) of Argonne National Laboratory. We would also like to thank Dr. Aymeric Moinet, at General Electric Aerospace Research, Niskayuna, NY, for his insights into the ellipsoidal void problem.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.

## References

*Learning in Graphical Models*

*Gaussian Processes for Machine Learning*