Abstract

Low-fidelity engineering-level dynamic models are commonly employed while designing uncrewed aircraft flight controllers due to their rapid development and cost-effectiveness. However, during adverse conditions, or complex path-following missions, the uncertainties in low-fidelity models often result in suboptimal controller performance. Aircraft system identification techniques offer alternative methods for finding higher fidelity dynamic models but can be restrictive in flight test requirements and procedures. This challenge is exacerbated when there is no pilot onboard. This work introduces data-driven machine learning (ML) to enhance the fidelity of aircraft dynamic models, overcoming the limitations of conventional system identification. A large dataset from twelve previous flights is utilized within an ML framework to create a long short-term memory (LSTM) model for the aircraft's lateral-directional dynamics. A deep reinforcement learning (RL)-based flight controller is developed using a randomized dynamic domain created using the LSTM and physics-based models to quantify the impact of LSTM dynamic model improvements on controller performance. The RL controller performance is compared to other modern controller techniques in four actual flight tests in the presence of exogenous disturbances and noise, assessing its tracking capabilities and its ability to reject disturbances. The RL controller with a randomized dynamic domain outperforms an RL controller trained using only the engineering-level dynamic model, a linear quadratic regulator controller, and an L1 adaptive controller. Notably, it demonstrated up to 72% improvements in lateral tracking when the aircraft had to follow challenging paths and during intentional adverse onboard conditions.

1 Introduction

Autonomous aircraft flight control has made many possibilities available to humans, ranging from reducing the workload on human pilots to allowing fully autonomous uncrewed flight missions. As the complexity of missions increases, the dependency on the capabilities of flight control systems grows. The symbiotic relationship between the flight controller and the dynamic model of the aircraft motivates the development of higher-fidelity aircraft models that support advanced flight controllers.

High-fidelity aircraft modeling techniques, such as wind tunnel tests and computational fluid dynamics analysis, are commonly employed for high-sensitivity applications, such as transport aircraft, fighters, and business jets. However, for small uncrewed aircraft systems (UAS), the utilization of these high-fidelity modeling methods can be cost-prohibitive.

For UASs, an alternative solution to reduce modeling costs is to rely on conceptual-level low-fidelity dynamic models. This includes dynamic models developed using relatively simple theoretical methods [1,2]. Such models have been successfully implemented for designing modern flight controllers, even sophisticated deep reinforcement learning (RL) controllers. However, inherent uncertainties in the low-fidelity models adversely impact flight controller performance. The uncertainties in the low-fidelity models can result in predicting the aircraft motion with incorrect magnitudes, incorrect time delays, or even incorrect motion trends [3]. The growing complexity of UAS missions demands better dynamic models and flight controllers.

A commonly used method to enhance aircraft dynamic models' fidelity is system identification using flight test data. System identification has been studied for many years, and a wide range of analytical methods exists, as presented in the references cited here [4–6]. This technique allows enhancing aircraft simulation models based on data observed in the actual flight environment. The method can be used even for improving the simulation of complex flight conditions such as rotorcraft operation in ship air wake gusts [7].

Commonly, for modeling the base flight dynamics of a fixed-wing aircraft, system identification techniques use relatively short segments of flight consisting of specially designed input maneuvers. System identification methodology, as presented in standard textbooks, calls for performing the designed flight maneuvers in low wind conditions (e.g., early morning) to reduce the impact of wind disturbances on modeling the aircraft flight dynamics [4]. In system identification experiments and flight maneuvers, inputs are tailored to specific frequencies, amplitudes, and shapes (e.g., singlet, doublet, multistep) to excite aircraft states or dynamic modes effectively. Such careful design ensures mode excitation and prevents coupling between dynamic modes. However, these requirements make the conventional system identification approach restrictive because commonly collected flight test data cannot be utilized for system identification purposes, necessitating gathering specific flight data.

The rapid advancements in data-driven machine learning (ML) techniques have provided a new opportunity to enhance the fidelity of aircraft dynamic models using commonly collected flight test data, overcoming the limitations of the conventional system identification approach. In this work, we use data that was already available from multiple previous flights to improve the fidelity of the UAS model. We use data-driven ML techniques, incorporating a long short-term memory (LSTM) recurrent neural network (RNN) architecture. LSTM RNNs have memory elements that are not present in standard neural networks. These memory elements allow LSTM models to make predictions based on information in the previous sequence of input data, instead of only making predictions based on information from a single previous time-step.

Aircraft system identification using neural networks has been done in previous works such as [4,8,9], where the end goal was to obtain aircraft stability and control derivatives. However, in our previous experience, getting consistent stability and control derivatives for a small UAS was challenging [10]. In this work, our objective is not to obtain stability and control derivatives. Instead, we aim to utilize neural networks for modeling the input–output relationships, mimicking the dynamic behavior of the aircraft, as in Refs. [11,12].

We develop a model that takes aircraft states and controls at previous time-step(s) as inputs and uses them to predict the next time-step states. In Ref. [11], a small feed-forward (memoryless) neural network is used along with a moving data window concept to perform online modeling of aircraft translational acceleration. In Ref. [12], RNNs are used to model (i.e., predict) aircraft rotational rates and translational velocities at different Mach numbers and altitudes. However, in the reference, training is performed based on simulation data due to the difficulty of obtaining a large set of flight test data that covers the entire flight envelope. We train an LSTM RNN using actual flight test data to capture the actual flight dynamics. Unlike an online modeling approach, we focus on offline modeling using a large set of data to generate a model that works over a larger portion of the flight envelope.

Using LSTM environment models for training RL agents has been explored in nonaerospace model-based RL applications such as developing recommender systems [13] and a robotic assistant for elderly people [14]. Ref. [15] developed a (non-LSTM) neural network world model and used it for training a simplified discrete “controller” for aircraft air-to-air engagement scenarios. However, aside from being a non-LSTM model, the scope of the dynamic model is limited to position and velocity modeling and does not account for aircraft forces and moments. Also, the developed controller outputs are limited to general commands of turn left, turn right, speed up, slow down, etc., not actual servo deflection commands. To the best of our knowledge, using an LSTM RNN as the model for training RL-based aircraft inner-loop controllers has not been done in previous research.

Deep RL has been the subject of much recent research in robotics and uncrewed systems [16–19]. This work expands on the recent application of RL techniques to fixed-wing aircraft flight control. In flight control, RL techniques have mainly focused on rotary wing applications and simulation-based validation. Examples of the application of RL to rotary wing flight control include [20,21], which were performed in simulations, and [22,23], which include actual flight tests. For fixed-wing aircraft, different RL algorithms have been applied recently to flight control, but the works were only validated in simulations. Notable examples are Refs. [24,25], which use an actor-critic design framework, Ref. [26], which uses the soft actor-critic algorithm, Ref. [27], which uses the twin-delayed deep deterministic policy gradient (TD3) algorithm, and Ref. [28], which uses the normalized advantage function Q-learning algorithm. Recent applications of RL algorithms to fixed-wing UASs include [29] for active stall protection and [30] for perched landing.

A few recent works include the application of deep RL to fixed-wing UAS inner-loop flight control while including actual flight test validations [31–35]. The deep deterministic policy gradient algorithm is used in Refs. [31,32], while the proximal policy optimization (PPO) algorithm is used in Refs. [33,34]. References [31–34] develop flight controllers for longitudinal motion pitch angle and airspeed tracking, and controller training is done using low-fidelity physics-based aircraft dynamic models. Very recently, Ref. [35] used the soft actor-critic algorithm for attitude (roll and pitch) control of a fixed-wing UAS, where the controller was trained on an aircraft model developed using wind tunnel testing and computational fluid dynamics.

The gap in the controller's performance between simulation and the actual environment is often referred to as the “reality gap” [36] and is an active research area in the field of robotics and AI/ML. In our previous studies [33,34], we utilized an approach called domain randomization [37–39] to improve the generalization performance of the control policy and showed that the RL-based controller developed in simulation was robust and verifiable in the actual environment. In domain randomization, some environment parameters are randomized in simulation during controller development, exposing the controller to possible variations. For instance, Refs. [33,34] used estimated uncertainties in model parameters, control delays, sensor noises, and wind disturbances to enrich the training environment. The concept is closely related to robust control design such as H∞. However, in robust control design, only the worst-case variation is considered. Accurate a priori knowledge of the worst-case variation is difficult to obtain, so an overestimated upper bound is often used in favor of safety and at the expense of performance. In contrast, domain randomization does not focus only on the worst case but considers the whole range of parameter values and attempts to introduce enough variability in the simulator that there is significant overlap with the real-world variations. Although quantifying the true reality gap is intractable, its upper bound can be estimated from the simulated data and is reduced with an increasing number of samples from the distributions [40].

Deep RL-based controller development, in general, yields a model that is large, complex, and opaque. Therefore, it is difficult to establish safety guarantees for such controllers using existing analytical methods. However, probabilistic guarantees can be established through methods such as formal verification [41–43], which checks the correctness of the model outputs using a simplified model. Reference [31] proposed a monitoring algorithm based on formal verification that automatically switches from a primary controller (deep RL) to a secondary controller (LQR) if a predicted state, given the primary controller output, falls outside the safe zone for the secondary controller. However, the proposed formal verification method relies heavily on the accuracy of a linear-time-invariant (LTI) dynamic model and requires a separate secondary controller that runs in parallel. In addition, the controller switching can result in unintended outcomes such as oscillations. An alternative method is safe reinforcement learning (safe RL) (e.g., Ref. [44]), which attempts to acquire a safe policy out of the box by factoring safety objectives into the training process.

In this work, we introduce data-driven ML methods to improve the fidelity of an aircraft dynamic model. We use a bank of collected flight data in an ML framework, overcoming conventional system identification restrictions. The work also contributes to designing a verifiable deep RL-based lateral-directional flight controller. We follow the approach of safe RL through rigorous training and testing of the controller to ensure safe control. To that end, the controller synthesis step utilizes both the developed LSTM-based and physics-based dynamic models to form a randomized and physics-informed RL training environment to improve robustness toward modeling uncertainties. A novel RL reward function is used that includes safety components, such as constraints on control rates, which has been shown to be effective in improving the stability and robustness of the controller. In addition, an LSTM layer is added to the controller architecture to enhance the controller's adaptive performance. Several challenging actual flight tests are conducted to assess the controller performance and the improvements in the aircraft dynamic model. The flight tests are conducted on different days with different wind conditions and in the presence of sensor noise, demonstrating the ability of the controller to control the aircraft in the presence of exogenous disturbances.

2 Testbed Aircraft: The SkyHunter

The testbed aircraft used in this work is the SkyHunter UAS presented in Fig. 1. The SkyHunter is a fixed-wing UAS featuring a twin tailboom design and a single pusher electric motor. The UAS has a wingspan of 1.8 m, a length of 1.4 m, and a weight of 4 kg. The SkyHunter has elevator, aileron, and rudder control surfaces.

Fig. 1
The SkyHunter UAS

3 Physics-Based Dynamic Model

Engineering-level physics-based methods were used to obtain initial flight dynamic models of the SkyHunter UAS. Two dynamic models were developed. The first is an LTI decoupled dynamic model of the lateral-directional motion. It has the form
(1) $\dot{X} = AX + BU$

where A and B are matrices consisting of the linear model coefficients, X is the state vector containing the perturbed lateral-directional states (β, ϕ, P, and R), U is the control vector containing the perturbed controls (δa and δr), and $\dot{X}$ is the time rate of change of the states.
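As a concrete illustration of how Eq. (1) can be used in simulation, the sketch below propagates the lateral-directional states with a simple forward-Euler step. The A and B values, the 20 Hz step size, and the state ordering are placeholders for illustration, not the identified SkyHunter coefficients.

```python
import numpy as np

# Minimal sketch: forward-Euler propagation of the LTI model in Eq. (1).
# A and B below are zero placeholders, not the identified SkyHunter matrices.
dt = 0.05                      # assumed 20 Hz time-step
A = np.zeros((4, 4))           # lateral-directional state matrix (placeholder)
B = np.zeros((4, 2))           # control matrix (placeholder)

def step_lti(x, u):
    """Propagate perturbed states X = [beta, phi, P, R] one time-step."""
    x_dot = A @ x + B @ u      # Eq. (1)
    return x + dt * x_dot      # forward-Euler integration

x = np.array([0.0, 0.1, 0.0, 0.0])   # small roll-angle perturbation (rad)
u = np.array([0.0, 0.0])             # aileron and rudder at trim
for _ in range(120):                 # 6 s of simulated flight
    x = step_lti(x, u)
```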

The second model is a full six-degree-of-freedom (6DoF) model of the aircraft, which uses the nonlinear translational and rotational aircraft equations of motion
(2) $m(\dot{V} + \omega \times V) = F_{A+P+G}$
(3) $I\dot{\omega} + \omega \times (I\omega) = M_{A+P}$

where m is the mass of the aircraft, I is the moment of inertia matrix, $F_{A+P+G}$ is the vector of aerodynamic, propulsive, and gravitational forces, $M_{A+P}$ is the vector of aerodynamic and propulsive moments, and V and ω are the translational and rotational velocity vectors, respectively. All quantities are expressed in the aircraft body coordinate system. The models were developed using geometric and mass measurements from the airframe. Aerodynamic forces and moments are modeled based on stability and control derivatives estimated using the Advanced Aircraft Analysis (AAA) software [2]. AAA is an aircraft design software widely used by the aerospace industry and academia for decades [45–48]. AAA is based on the analytical methods presented in Refs. [49,50] and the U.S. Air Force Stability and Control DATCOM [51]. The quality of the AAA physics-based model of the SkyHunter was further improved using actual flight data and the tuning methods presented in Ref. [52]. The improved SkyHunter physics-based model was used in developing guidance and control algorithms, which were tested in actual flight tests and outperformed widely used open-source autopilot software (e.g., Pixhawk) [34,53–55]. Further details on the SkyHunter model are published in Ref. [3], along with an analysis of the validity of the physics-based 6DoF model under different flight conditions, including loss of control and stall testing scenarios. In this work, we use machine learning techniques to improve the fidelity of the aircraft dynamic model.

4 Data-Driven Modeling of Aircraft Dynamics Using Machine Learning

Machine learning provides a way to use data for modeling different processes. We use a bank of data from 12 flight tests to improve the lateral-directional dynamics model of the SkyHunter. These 12 flight tests amount to a total of 218,085 data points (182 min of flight time) which are used for training, validating and testing the developed ML models. The explored model architectures, the used flight test data, the training setup, and the modeling results are presented in this section.

4.1 Model Architecture.

In this work, our goal is to develop a model for the lateral-directional motion. The model takes previous state and control values as inputs and outputs predictions of future states. For the lateral-directional motion, the aircraft states of interest are the sideslip angle (β), the roll angle (ϕ), the roll rate (P), and the yaw rate (R). The control inputs of interest are the aileron and rudder deflections (δa and δr, respectively). Thus, the developed model has the inputs and outputs shown in Fig. 2(a).

Fig. 2
Linear and MLP models inputs-outputs and computations flow. Information from only one previous time step is passed to the model. (a) Inputs & outputs and (b) computations flow.
We experiment with different ML model architectures and present the results obtained using each of these architectures. We start with a linear model, in which the model outputs (Ŷ) are a linear combination of the model inputs (X), as shown in the following equation:
(4) $\hat{Y} = WX$

Here, the model inputs are the states and controls at the current time-step, $X = [\beta_k, \phi_k, P_k, R_k, \delta_{a,k}, \delta_{r,k}]^T$, and the model outputs are the states at the next time-step, $\hat{Y} = [\beta_{k+1}, \phi_{k+1}, P_{k+1}, R_{k+1}]^T$. W represents the weights matrix. We do not include a bias term in the linear model.

Then, we experiment with a multilayer perceptron (MLP) neural network. The MLP network has the same inputs and outputs as the linear model but is capable of capturing highly nonlinear relationships. We use ReLU activation functions and include two hidden layers with 64 neurons each. Thus, the MLP network outputs are calculated using the following equations:
(5) $h_1 = \mathrm{ReLU}(W_1 X + b_1)$
(6) $h_2 = \mathrm{ReLU}(W_2 h_1 + b_2)$
(7) $\hat{Y} = W_3 h_2 + b_3$

where W and b represent the weights and biases in each layer, and h represents the hidden layer outputs. The subscripts indicate layer numbers.

We also develop RNN models. RNNs are of particular interest in modeling sequential data since, in addition to taking in the previous time-step inputs, RNNs have internal state elements. Using these state elements, RNNs can make predictions based on trends in the previous time steps, not just based on information from one previous time-step. The flow of information between time steps in an RNN is presented in Fig. 3. This can be compared to the flow of information in a standard MLP model, presented in Fig. 2(b). In the MLP, information from only one previous time-step is passed to the model to predict the next time-step states. In the RNN, information flows from previous time steps through the hidden layer. For an RNN with a single hidden layer, the calculations in the hidden layer are as shown in Eq. (8). The flow of information from the previous time-step hidden layer output ($h_{k-1}$) to the current time-step hidden layer output ($h_k$) is implemented in the term $W_{hh} h_{k-1}$
(8) $h_k = f(W_{xh} X_k + W_{hh} h_{k-1} + b_h)$

where f(·) is the hidden layer activation function.
Fig. 3
Computations flow in an RNN: Information from previous time steps is passed to the model

A popular variant of RNN models, known as the long short-term memory (LSTM) RNN, is used in this work. The LSTM structure is designed to have “gates,” which are intended to manage which information is kept or forgotten from the observed sequential input data. Detailed information about the mathematics of the LSTM model is available in Ref. [56]. We develop two LSTM models. The first LSTM model directly predicts the next time-step outputs (like the linear and MLP models). The second LSTM model (referred to as ResLSTM) uses a framework that predicts the residuals; i.e., instead of predicting the next time-step outputs, the model predicts how much the next step changes from the current step. The LSTM and ResLSTM networks used in this work have one hidden layer with 32 units.
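To make the two LSTM variants concrete, the sketch below builds one-step predictors in Keras with a single 32-unit LSTM layer, matching the layer size stated above. The history window length, the input column ordering, and the way the ResLSTM adds the predicted residual back to the current states are assumptions for illustration rather than the exact implementation.

```python
import tensorflow as tf

SEQ_LEN = 32        # assumed length of the input history window
N_IN, N_OUT = 6, 4  # inputs: [beta, phi, P, R, delta_a, delta_r]; outputs: next [beta, phi, P, R]

# LSTM model: directly predicts the next time-step states.
lstm_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, N_IN)),
    tf.keras.layers.LSTM(32),        # one hidden layer with 32 units
    tf.keras.layers.Dense(N_OUT),
])

# ResLSTM variant: predicts the change from the current step and adds it back.
seq_in = tf.keras.layers.Input(shape=(SEQ_LEN, N_IN))
residual = tf.keras.layers.Dense(N_OUT)(tf.keras.layers.LSTM(32)(seq_in))
current_states = seq_in[:, -1, :N_OUT]          # last observed [beta, phi, P, R]
res_lstm_model = tf.keras.Model(seq_in, current_states + residual)
```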

4.2 Flight Data Used For Modeling.

A collection of data from 12 flight tests is used in this work. The flight data covers flight from the takeoff ascent to cruise flight to landing descent. This does not follow the standard aircraft system identification procedure where the flight data needs to be collected from specifically designed maneuvers. Instead, we use a bank of flight data already collected from normal aircraft operation from different phases of flight. We propose using flight data without specifically performing maneuvers for separately exciting the aileron and rudder controls and without separately exciting each of the lateral-directional modes (Dutch-roll, roll mode, and spiral mode).

For training, we use data from nine flight tests. These data are directly used for training the ML models weights and biases. Data from two flight tests are used as the validation set. The validation set is used to evaluate when the model training should be stopped to avoid overfitting. Training is stopped when the loss (the mean squared error) evaluated on the validation set does not improve for two consecutive training epochs. One flight test is used as the test dataset. This test dataset is used to evaluate model performance on data which was not used to train the model. Table 1 and Fig. 4 present the distribution of data in the training, validation and test datasets. The number of data points in the training, validation, and test datasets are 162,754, 38,117, and 17,214 points, respectively, which is equivalent to 135, 31, and 14 min of flight, respectively, given the 20 Hz sampling rate.

Fig. 4
Training, validation, and test datasets distributions
Table 1

Training, validation, and test data sets statistics

Statistic | Dataset | Sideslip angle β (deg) | Roll angle ϕ (deg) | Roll rate P (deg/s) | Yaw rate R (deg/s) | Aileron δa (deg) | Rudder δr (deg)
Mean | Train | −0.12 | −8.32 | −0.01 | −3.87 | −0.13 | 0.51
Mean | Validation | −0.21 | −9.32 | 0.01 | −4.62 | −0.29 | 0.64
Mean | Test | 0.02 | −7.87 | −0.04 | −4.17 | −0.04 | 0.50
Std. | Train | 1.01 | 16.80 | 13.48 | 8.67 | 0.78 | 1.13
Std. | Validation | 0.67 | 14.47 | 9.88 | 6.97 | 0.57 | 0.93
Std. | Test | 0.79 | 15.11 | 11.07 | 7.27 | 0.50 | 0.79
Min. | Train | −10.95 | −73.23 | −199.40 | −60.14 | −17.59 | −6.97
Min. | Validation | −5.53 | −55.91 | −66.87 | −41.49 | −4.50 | −5.40
Min. | Test | −5.85 | −54.28 | −127.17 | −34.96 | −5.54 | −3.00
Max. | Train | 12.00 | 139.19 | 190.91 | 53.40 | 14.27 | 13.72
Max. | Validation | 4.27 | 58.15 | 73.48 | 25.84 | 3.82 | 7.80
Max. | Test | 6.75 | 67.41 | 104.92 | 39.75 | 6.08 | 2.63

Data from each flight test were inspected before training to check their quality. The trim aileron and rudder values were identified in each flight and subtracted from the recorded aileron and rudder deflections. In several flights, a bias was identified and removed from the sideslip angle estimations recorded in the flight data. The bias in sideslip angles was related to errors in the aileron and rudder trim settings. Correcting these trim errors and rerunning the sideslip angle estimation Kalman filter offline helped correct the bias in the sideslip angles. All angular values and angular rates were converted to radians and radians per second before training. Table 1 and Fig. 4 present the flight data after the trim and bias corrections.
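The sketch below summarizes this per-flight preprocessing (trim removal, sideslip bias correction, and unit conversion). The data layout, column names, and the simple mean-subtraction stand-in for the offline bias correction are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def preprocess_flight(df: pd.DataFrame, trim_aileron_deg: float, trim_rudder_deg: float) -> pd.DataFrame:
    """Sketch of the preprocessing applied to each flight log (columns assumed in degrees)."""
    out = df.copy()
    # Subtract the identified trim settings so controls are perturbations about trim.
    out["delta_a"] -= trim_aileron_deg
    out["delta_r"] -= trim_rudder_deg
    # Simplified stand-in for the sideslip bias correction (the actual correction
    # reran the estimation Kalman filter offline with corrected trims).
    out["beta"] -= out["beta"].mean()
    # Convert all angles and angular rates to radians and rad/s before training.
    for col in ["beta", "phi", "P", "R", "delta_a", "delta_r"]:
        out[col] = np.deg2rad(out[col])
    return out
```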

4.3 Model Training Setup.

Each model was trained to minimize the mean-squared error, or the loss
(9) $\mathcal{L} = \frac{1}{N n_o} \sum_{i=1}^{N} \sum_{k=1}^{n_o} \left(Y_{k,i} - \hat{Y}_{k,i}\right)^2$

where Y and Ŷ are the flight data measurements and model predictions, respectively. $n_o$ is the number of model outputs; for the case of four model outputs (β, ϕ, P, and R), $n_o = 4$. N is the number of data points. k and i are used to sum over the model outputs and the data points, respectively. Training was done using TensorFlow [57] using procedures similar to Ref. [58]. The batch size used during training is 32. Training stops if the validation loss does not improve in two successive epochs. The Adam algorithm [59] is used to perform the training.
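A minimal Keras training sketch consistent with this setup (MSE loss, Adam, batch size of 32, early stopping after two epochs without validation improvement) is shown below; the dummy arrays stand in for the preprocessed flight-data windows and targets.

```python
import numpy as np
import tensorflow as tf

# Dummy stand-ins for the prepared training and validation arrays.
x_train, y_train = np.zeros((1000, 32, 6), np.float32), np.zeros((1000, 4), np.float32)
x_val, y_val = np.zeros((200, 32, 6), np.float32), np.zeros((200, 4), np.float32)

model = tf.keras.Sequential([          # any candidate architecture; LSTM shown
    tf.keras.layers.Input(shape=(32, 6)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(4),
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")   # Eq. (9)

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=32,
          epochs=100,                  # upper bound; early stopping ends sooner
          callbacks=[early_stop])
```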

4.4 Modeling Results.

The performance of the developed models is evaluated using the mean absolute error (MAE) metric, which is a standard metric in ML work [60]. In this section, we mathematically evaluate the performance of each of the developed models. The MAE in predicting a single time-step in the future is presented in Fig. 5 for each of the model outputs (β, ϕ, P, and R). The figure presents the performance over the training, validation, and test datasets. The LSTM and ResLSTM models have better performance metrics on all three datasets for the sideslip angle, β. The linear and ResLSTM models have better performance on all three datasets for the roll angle, ϕ. All four models have similar performance for the roll rate, P. For the yaw rate, R, the LSTM and ResLSTM models have better performance on the training and testing data, and are slightly better on the validation data.

Fig. 5
Single step prediction MAEs

We desire to use the developed models for training a lateral-directional aircraft controller using reinforcement learning. For this, the model needs to have good performance in simulating several time steps in the future, not just one time-step. Therefore, we evaluate the performance of the models on 6 s of simulation, which is the duration of the RL training episodes. In these simulations, the simulation outputs at one time-step are used as inputs in the following time-step in a looping manner. The aileron and rudder controls are the only two variables obtained from the flight data at each time-step since these are autopilot or human pilot commands and should not be predicted by an aircraft model. When evaluating the performance of the models, it became clear that feeding the models with zero sideslip angle yielded better results in the 6 s simulations. This may be due to errors in the sideslip angle estimations available in the flight data and used for ML model training. Therefore, for the remaining results, we feed zero as the sideslip angle to the models and we evaluate model performances for the three outputs: ϕ, P, and R.
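The sketch below illustrates this 6-s closed-loop evaluation: at each step the model's own predictions are fed back as inputs, the aileron and rudder commands are taken from the flight log, and zero is fed back as the sideslip angle. Function and variable names, shapes, and the 20 Hz step are assumptions for illustration.

```python
import numpy as np

def rollout_6s(model, seq0, controls_log, dt=0.05):
    """Closed-loop 6 s simulation sketch. seq0: initial window (SEQ_LEN, 6);
    controls_log: logged [delta_a, delta_r] per step; returns predicted [phi, P, R]."""
    seq = seq0.copy()
    predictions = []
    for k in range(int(6.0 / dt)):                          # 120 steps at 20 Hz
        y = model.predict(seq[np.newaxis], verbose=0)[0]    # next [beta, phi, P, R]
        predictions.append(y[1:])                           # evaluate phi, P, R only
        y_fb = y.copy()
        y_fb[0] = 0.0                                       # feed zero sideslip back
        next_row = np.concatenate([y_fb, controls_log[k]])  # controls from flight data
        seq = np.vstack([seq[1:], next_row])                # slide the window forward
    return np.array(predictions)
```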

For small UAS, there are two approaches to obtain aircraft airflow angles (angle of attack, α, and sideslip angle, β): (a) through estimation, or (b) through measurement. Measurement using a 5-hole pitot tube or other practical approaches suffers from static pressure inaccuracy due to the static ports' location. This issue is resolved in large aircraft by distributing the location of static ports. Since UAS are small, air-stream interaction with the body can cause large errors in the measurement of airflow angles. Additionally, the cost of a 5-hole pitot tube or an α/β-vane system can be more than one order of magnitude higher than the UAS cost. The second approach, obtaining the airflow angles through estimation, is less expensive, but finding the correct covariance matrix, if an extended Kalman filter is used, or dealing with bias errors makes it challenging to obtain good estimations. Additionally, it is difficult to know the ground truth airflow angles to evaluate the estimation accuracy. A good example demonstrating the challenge of estimating the sideslip angle is evident in this study: employing the extended Kalman filter's estimated sideslip angles to model the lateral-directional flight dynamics resulted in inferior outcomes compared to scenarios where the sideslip angle was not utilized.

We compare the prediction performance of the four learned models (Linear, MLP, LSTM, and ResLSTM), the two physics models (LTI and 6DOF), and two baseline models (baseline and “zero”). The baseline model simply predicts that the next time-step states are equal to the current time-step states. The “zero” model predicts zero for all the states at all time steps. This model is of interest in straight line flight sections where the aircraft states are around zero.

Classification algorithms are used to classify the flight data into turning and straight line flight similar to the approach in Ref. [3]. Using the classification algorithms, 44 turning flight segments were obtained from the first validation flight, 34 segments were obtained from the second validation flight, and 43 segments were obtained from the test flight. Figure 6 presents the average MAE for these 6 s segments of turning flight. To obtain a perspective of how large the MAEs are, we normalize the MAEs by the standard deviations of roll angle, roll rate, and yaw rate calculated on the training data, σTrain (presented in Table 1). These normalized percentage values are shown on the right axes of the plots in Fig. 6. The learned models have improved MAE compared to the physics-based models in most comparisons. The LSTM model had improved MAE in all comparisons, except for the roll angle prediction in the test flight (but it had improved MAE in the two validation flights). The LSTM model had improvements of up to 45.8% and 23.4% over the physics-based models on the validation and test data, respectively. Table 2 shows the percentage improvement obtained by the LSTM model compared to the physics-based models.

Fig. 6
Average MAE of several 6 s segments of turning flight
Table 2

Improvements in MAE achieved by the LSTM model over the physics-based models (measured in % of σTrain)

Variable | Val. 1 (%) | Val. 2 (%) | Test (%)
P | 13.0 | 4.5 | 20.0
R | 35.8 | 45.8 | 23.4
ϕ | 21.6 | 12.0 | −12.1

Similar analysis was performed for straight line flight segments. Using the classification algorithms, 21 straight flight segments were obtained from the first validation flight, 20 segments were obtained from the second validation flight, and 21 segments were obtained from the test flight. Overall, in the analyzed straight line flights, the rotation rates predictions performance was comparable across the different learned and physics-based models. For the roll angle, ϕ, the learned models had larger errors than the physics-based models in the validation flights. However, for the test flight, the LSTM, linear, and MLP models had lower roll angle MAEs compared to the physics-based models.

Given the improved performance of the LSTM model compared to the physics-based models, the LSTM model is selected for training the lateral-directional controller using reinforcement learning. The LSTM model also had improved performance over the ResLSTM model in roll angle predictions and improved performance over the linear and MLP models in yaw rate predictions, as seen in Fig. 6.

A sample of the prediction performance of the LSTM and physics-based models on a 30 s flight portion from the test flight is presented in Fig. 7. The improved performance of the LSTM model over the physics-based models can be seen in the roll rate (P) and yaw rate (R) modeling. In this test flight, the LSTM model had some error in modeling the roll angle (ϕ), but it followed the correct trends observed in the flight data.

Fig. 7
Test flight 30 s simulation

Another sample of the prediction performance of the LSTM and physics-based models on a 30-s flight portion from the first validation flight is presented in Fig. 8. The improved performance of the LSTM model compared to the physics-based models can be seen in all three model outputs: the roll rate (P), yaw rate (R), and roll angle (ϕ). Using the LSTM model for training and then testing a flight controller, in the rest of this work, provides a way to evaluate the practical use of the LSTM model.

Fig. 8
First validation flight 30 s simulation

5 Controller Development Using Reinforcement Learning

Reinforcement learning enables a control policy π(a|s) to learn a sequential mapping from the state s to the optimal control a by directly interacting with an environment (Env) and using the feedback received as a form of reward r for its control decisions. The environment is formalized as a finite-horizon discounted Markov decision process (MDP). An MDP M is defined by a tuple (S, A, P, R, ρ0, γ, T), where S is the set of states, A is the set of actions, P: S × A × S → ℝ≥0 is the state transition probability distribution, R: S × A → ℝ is the reward function, ρ0: S → ℝ≥0 is the initial state distribution, γ ∈ (0, 1] is the discount factor, and T is the horizon of each episode of interactions. The algorithm used in this work optimizes a stochastic policy πθ: S × A → ℝ≥0. Let η(π) denote its expected total discounted reward: $\eta(\pi) = \mathbb{E}_\tau\left[\sum_{t=0}^{T} \gamma^t R(s_t, a_t)\right]$, where τ = (s0, a0, …) denotes the whole trajectory, s0 ~ ρ0(s0), at ~ π(at|st), and st+1 ~ P(st+1|st, at).

For the safety of the aircraft, controller training is performed in simulation environments. Two different control policies, π1 and π2, are developed for comparison, each trained using a different simulation environment. The training of π1 is performed using a deterministic environment (Env1), where an LTI-based dynamic model is used for the state transition. In contrast, the training of π2 uses a stochastic environment (Env2) utilizing a domain randomization approach to improve the generalization performance of the control policy [36–39]. The stochastic environment (Env2) makes use of both the LTI- and LSTM-based dynamic models, where each model is randomly selected with uniform probability at the beginning of each training episode. The details of the LTI- and LSTM-based dynamic models can be found in Secs. 3 and 4, respectively.
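A minimal sketch of the randomized environment Env2 is shown below: at each episode reset, either the LTI-based or the LSTM-based dynamic model is drawn with equal probability and used for the state transitions of that episode. The reset/step interface and model wrappers are assumptions for illustration.

```python
import numpy as np

class RandomizedLateralEnv:
    """Sketch of Env2: uniform random choice between the LTI- and LSTM-based
    dynamic models at the start of each training episode."""

    def __init__(self, lti_dynamics, lstm_dynamics):
        self.models = [lti_dynamics, lstm_dynamics]
        self.active = None

    def reset(self):
        self.active = self.models[np.random.randint(2)]  # pick a model with p = 0.5
        return self.active.reset()                        # randomized initial state

    def step(self, action):
        return self.active.step(action)                   # (next_state, reward, done)
```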

A state-of-the-art model-free RL (MFRL) algorithm, PPO [61], is used to train the RL-based controller. An MFRL algorithm attempts to learn the optimal policy without making any assumptions about the environment model, and PPO has comparatively better stability and more monotonic policy improvement characteristics than other MFRL counterparts. PPO adopts an actor-critic architecture [62], which requires the training of two neural networks (NNs): an actor/policy network πθ(a|s) and a critic or state-value network Vν(s), parameterized by NN weights θ and ν, respectively. As with any policy-gradient (PG) method [63], PPO directly optimizes the policy with respect to its expected return η(π). The return (Gt) is defined as the sum of future rewards discounted by a constant discount factor γ. The true value of η(π) is unknown. Therefore, the state-value network Vν(s) is trained alongside the policy and outputs $\hat{V}_\nu^\pi(s)$, an estimate of η(π). This estimated value tells us how good it is to be in the input state s under the policy π and is defined by
(10) $\hat{V}_\nu^\pi(s_t) = \mathbb{E}_\tau\left[G_t\right] = \mathbb{E}_\tau\left[\sum_{k=t}^{T} \gamma^{k-t} R(s_k, a_k)\right]$

where τ = (a_{t+1}, s_{t+1}, …, s_{T−1}, a_T) denotes the trajectory from time-step t, s0 ~ ρ0(s0) is the initial state sampled from the initial state distribution ρ0, at ~ π(at|st) is the action taken at time-step t, R(s, a) is the reward function given the state-action pair (s, a), and Gt is the collected reward (return) from the trajectory. Using the value estimation in Eq. (10) as the policy objective may suffer from high variance. Therefore, to reduce the variance, the advantage estimate $\hat{A}^\pi_\theta(s_t)$ is used instead as the policy objective. The advantage function indicates the reward difference that can be obtained by taking a particular action at the given state
(11) $\hat{A}^\pi_\theta(s_t) = G_t - \hat{V}_\nu^\pi(s_t)$
Proximal policy optimization also prevents large policy parameter updates by constraining the probability ratio r(θ) within a small interval around 1, thereby constraining changes in the policy distribution for stable learning. The policy objective to be maximized is
(12) $J(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}^\pi_\theta(s_t),\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}^\pi_\theta(s_t)\right)\right]$
(13) $r_t(\theta) = \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t|s_t)}$

The critic is updated by the minimization of the following mean-squared loss function:
(14) $L(\nu) = \mathbb{E}_t\left[\left(\hat{V}_\nu^\pi(s_t) - G_t\right)^2\right]$
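The following sketch computes the clipped surrogate objective of Eqs. (12)-(13) and the critic loss of Eq. (14) for a batch of collected transitions; the tensor names and shapes are assumptions for illustration.

```python
import tensorflow as tf

def ppo_losses(new_logp, old_logp, advantages, values, returns, eps=0.2):
    """Clipped PPO policy objective (Eqs. (12)-(13)) and critic loss (Eq. (14))."""
    ratio = tf.exp(new_logp - old_logp)                         # r(theta), Eq. (13)
    clipped = tf.clip_by_value(ratio, 1.0 - eps, 1.0 + eps)
    policy_objective = tf.reduce_mean(
        tf.minimum(ratio * advantages, clipped * advantages))   # Eq. (12), maximized
    critic_loss = tf.reduce_mean(tf.square(values - returns))   # Eq. (14), minimized
    return policy_objective, critic_loss
```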

5.1 Neural Network Architecture.

The policy and the critic are represented by two LSTM-based neural networks (NNs) with weights θ and ν, respectively. Both NNs are based on the same architecture and are composed of one input layer, one LSTM layer, two feed-forward (FF) hidden layers, and one output layer, as shown in Fig. 9. The LSTM layer comprises one hidden layer of 128 LSTM memory cells. Each hidden layer is a fully connected layer of 128 hidden units (neurons) with tanh activation. Training hyper-parameters used in this work are presented in Table 3. The LSTM layer for the input ϕe is shown unrolled in time-step t in Fig. 9. The layer uses an input sequence where the states from the current time-step t through time-step t−N are stacked together.
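A minimal Keras sketch of this actor/critic architecture is given below: a stacked state-history input, one 128-cell LSTM layer, fully connected 128-unit tanh layers, and a linear output layer (two action means for the actor, one value for the critic). The exact layer wiring and input formatting are assumptions based on the description above.

```python
import tensorflow as tf

def build_network(seq_len=32, n_states=4, n_out=2):
    """Shared actor/critic architecture sketch (Fig. 9)."""
    inputs = tf.keras.layers.Input(shape=(seq_len, n_states))   # stacked state history
    x = tf.keras.layers.LSTM(128)(inputs)                       # 128 LSTM memory cells
    x = tf.keras.layers.Dense(128, activation="tanh")(x)        # FF hidden layer 1
    x = tf.keras.layers.Dense(128, activation="tanh")(x)        # FF hidden layer 2
    outputs = tf.keras.layers.Dense(n_out)(x)                   # linear output layer
    return tf.keras.Model(inputs, outputs)

actor = build_network(n_out=2)    # outputs the action mean [delta_a, delta_r]
critic = build_network(n_out=1)   # outputs the state-value estimate
```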

Fig. 9
Network architecture used for actor and critic representation
Table 3

Policy training hyper-parameters

Hyper-parameter | Value
LSTM hidden-layers | 1
FF hidden-layers | 3
Neurons | 128
Activation | tanh
State sequence length, N | 32
Discount factor, γ | 0.99
Clip param, ε | 0.2
Batch size | 5120
Learning rate | 1 × 10−3
# of epochs | 10
Episode length, T | 128
# of episodes per batch, B | 40
Step time-period, dt | 0.05

5.2 Network Input-Output.

The network input (a.k.a. the observation vector), s, consists of the lateral states roll angle ϕ, roll rate P, and yaw rate R, along with the roll error ϕe calculated by taking the difference between the commanded roll (ϕcmd) and the roll (ϕ). The observation vector is defined by
(15) $s = [\phi, P, R, \phi_e]^T$

The network output (out) for the critic is the state-value estimate ($\hat{V}_\nu^\pi(s_t)$), while it is the action a for the policy. PPO uses stochastic actions as policy outputs, drawn from a Gaussian probability distribution defined by a mean vector μa and a diagonal covariance matrix Σa, where both μa and Σa are trainable parameters. The stochasticity of the action is initially high at the beginning of the training (high variance), which helps the policy explore. As the policy converges, the action gradually becomes deterministic (variance → 0) at the final stage of training. This deterministic action (zero variance) is used during evaluation and testing. The policy output (action) vector mean is given by
(16) $\mu_a = [\delta_a, \delta_r]^T$

where δa ∈ [−1, 1] and δr ∈ [−1, 1] are the normalized aileron and rudder deflections from their respective trims. These control setpoints are appropriately scaled and shifted to match the aircraft's control constraints.
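A small sketch of this scaling and shifting step is shown below; the deflection limits and trim values are illustrative placeholders, not the SkyHunter's actual control constraints.

```python
import numpy as np

AIL_MAX_DEG, RUD_MAX_DEG = 20.0, 20.0   # assumed maximum deflections from trim

def scale_action(action, ail_trim_deg=0.0, rud_trim_deg=0.0):
    """Map normalized policy outputs in [-1, 1] to servo deflection commands (deg)."""
    a = np.clip(action, -1.0, 1.0)
    delta_a = ail_trim_deg + a[0] * AIL_MAX_DEG
    delta_r = rud_trim_deg + a[1] * RUD_MAX_DEG
    return delta_a, delta_r
```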

5.3 Reward Function.

The elements used in designing the reward function R(s, a) can be divided into five groups based on their functions, listed below (a minimal sketch of this reward structure follows the list):

  • Group 1 is used to improve the tracking performance and consists of a weighted L2 cost/penalty for the roll tracking error (ϕe).

  • Group 2 is weighted at about half the magnitude of Group 1 and consists of an L2 cost/penalty for the nonzero perturbed control values output by the policy, to keep them close to their respective trim values.

  • Group 3 regulates the roll rate (P) and yaw rate (R) with L2 penalty weights about one order of magnitude smaller than Group 1.

  • Group 4 regulates the control rates (δ̇a and δ̇r) with L2 penalty weights about two orders of magnitude smaller than Group 1.

  • Group 5 limits the control rates (δ̇a and δ̇r) within their maximum values (γδ̇a and γδ̇r). The initial L2 penalty weights are set at about one order of magnitude smaller than Group 1 and then increased by one order of magnitude at the final stage of the training to further reduce the control rates.
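The sketch below encodes the five penalty groups as a negative weighted sum. The weights only respect the relative orders of magnitude stated above and are placeholders, not the values used in training; the rate-limit parameters are likewise assumed.

```python
# Placeholder weights that only respect the relative magnitudes described above.
W1, W2, W3, W4, W5 = 1.0, 0.5, 0.1, 0.01, 0.1
RATE_LIMIT_A = RATE_LIMIT_R = 1.0   # assumed maximum normalized control rates

def reward(phi_err, delta_a, delta_r, p, r, dda, ddr):
    g1 = W1 * phi_err**2                                    # Group 1: roll tracking error
    g2 = W2 * (delta_a**2 + delta_r**2)                     # Group 2: keep controls near trim
    g3 = W3 * (p**2 + r**2)                                 # Group 3: regulate roll/yaw rates
    g4 = W4 * (dda**2 + ddr**2)                             # Group 4: regulate control rates
    g5 = W5 * (max(0.0, abs(dda) - RATE_LIMIT_A)**2         # Group 5: penalize control rates
               + max(0.0, abs(ddr) - RATE_LIMIT_R)**2)      #          beyond their limits
    return -(g1 + g2 + g3 + g4 + g5)                        # reward = negative total penalty
```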

Algorithm 1

Policy training pseudo-code

Input:
  LTI- and LSTM-based dynamic models (DMs) developed in Secs. 3 and 4, respectively.
  Maximum # of episodes, k_max.
Output:
  Optimized policy network weights.
Initialization:
  Initialize critic and policy network parameters ν and θ, respectively.
for k = 0, 1, 2, …, k_max do
  1. For the state transition, if π1 then pick Env1,
     else if π2 then pick Env2.
  2. Collect a set of B = 40 trajectories of length T = 128 into a dataset
     D_k = {τ_i}, i = 1, …, B, on policy π_{θ_k} = π(·|θ_k), where
     τ_i = {(s_t^i, a_t^i, r_t^i, s_{t+1}^i)}, t = 0, …, T, contains the data
     collected along the i-th trajectory.
  for i = 0, 1, 2, …, N do
    3. Compute the total discounted reward G_t in Eq. (10).
    4. Estimate the advantages using Eq. (11).
  end
  5. Update the policy by maximizing Eq. (12).
  6. Update the critic by minimizing Eq. (14).
  7. Break if converged.
end

The complete policy training pseudo-code is summarized in Algorithm 1. The training starts with randomly initializing the policy and critic network weights. The state vector (Eq. (15)) is randomly initialized according to Table 4 before the start of each training episode to create the initial observation vector. The ranges for the observation vector are matched with the initial conditions seen in conducted flight tests. The policy is then rolled out in the environment in a Monte Carlo (MC) fashion to collect B = 40 trajectories. Each collected trajectory contains a sequence of (state, action, reward) tuples of a complete roll-out episode of length T = 128 time-steps and is stored in a memory buffer of size 5120 (= 40 × 128). PPO uses a full buffer for the network update and refreshes the memory buffer with new trajectories after each update (on-policy update). Before each network update, the buffer is divided into sequences for the LSTM layer input, each sequence with a length of N = 32. The update step uses the Adam optimizer [59], a state-of-the-art stochastic gradient descent algorithm. After each update step, the policy parameter θ is moved toward the direction of higher η(π) suggested by the gradient of the policy objective, ∇θJ(θ). The training is stopped once the desired performance is reached.
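The sketch below shows the buffer-to-sequence step described above, dividing the on-policy rollout buffer (40 episodes of 128 steps) into length-32 sequences for the LSTM layer input; the array layout is an assumption for illustration.

```python
import numpy as np

def to_lstm_sequences(buffer_states, seq_len=32):
    """Divide the rollout buffer into fixed-length sequences for the LSTM input.
    buffer_states: array of shape (episodes, T, state_dim), e.g., (40, 128, 4)."""
    episodes, T, state_dim = buffer_states.shape
    n_seq = episodes * (T // seq_len)                  # 40 x (128 / 32) = 160 sequences
    return buffer_states.reshape(n_seq, seq_len, state_dim)
```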

Table 4

Ranges for initial observation vector

Variable | Initial condition
ϕ | ±30 deg
P | ±50 deg/s
R | ±20 deg/s
ϕe | ±10 deg

Figure 10 shows the sum of the reward function values of each complete episode (total episodic reward) of B = 40 episodes after each update step during the policy training process. The training stops (convergence) after a total of 311 update steps (about 1.6 × 10⁶ time-steps) for both π1 and π2. A higher mean and variance in reward values for π2 are observed because of the use of dynamic randomization. The algorithm applies the increased penalty weights of Group 5 after 310 update steps. The total training time of the controller was about 33 min on a laptop computer with a 12-core i7-9750H CPU and an RTX 2070 Max-Q GPU.

Fig. 10
Total episodic reward during policy training: (a) for policy π1 training and (b) for policy π2 training

6 Flight Test Results

The performance of the developed controllers is validated in actual flight test experiments. Four flight tests are performed in which the controller's performance is evaluated in different scenarios and compared to controllers developed using modern and adaptive control techniques. The flights were performed on different days with different wind conditions demonstrating the ability of the controller to handle disturbances. The flights were also subject to sensor noise inherently present in the flight sensors.

In the first flight test (Flight 1 in Table 5), the aircraft was commanded to fly multiple laps around a rectangular path. This flight was conducted in 7.8 ft/s wind conditions coming from the West with gusts up to 10.7 ft/s. Thus, the wind speed was up to 21% of the commanded cruise speed of 50 ft/s. The laps were performed with (a) the neural network controller trained using the LSTM model and dynamic randomization (NN π2), (b) a linear quadratic regulator (LQR) controller developed using modern control techniques, and (c) an L1 adaptive controller. The LQR and L1 controllers were developed using the LTI-based dynamic model described in Sec. 3 and were manually tuned for the target aircraft in flight tests. The 2D trajectory tracking performance of the aircraft using the different controllers is presented in Fig. 11. Flight using the neural network controller had better trajectory tracking than the other two controllers. Table 5 compares the root-mean-square error in roll angle tracking, and the results show that the neural network controller had a smaller root-mean-square error compared to the LQR and L1 controllers. The neural network controller yielded 30% and 60% improvements in the maximum tracking error at the east leg, compared to the LQR and L1 controllers, respectively.

Fig. 11
Actual flight 1: 2D trajectory using different controllers and different rudder effectiveness. Wind: 7.8 → 10.7 ft/s W. (a) δr effectiveness: 100%, (b) δr effectiveness: 50%, and (c) δr effectiveness: 0%.
Table 5

Analysis of flight performance

Flight | Controller | ηδr (%) | ϕ RMSE (deg) | Max. error (ft.)
1 | LQR | 100 | 3.03 | 71 East
1 | NN π2 | 100 | 2.53 | 50 East
1 | L1 | 100 | 5.50 | 128 East
1 | LQR | 50 | 2.91 | 69 East
1 | NN π2 | 50 | 2.46 | 45 East
1 | L1 | 50 | 6.47 | 147 East
1 | LQR | 0 | 2.90 | 69 East
1 | NN π2 | 0 | 2.60 | 45 East
1 | L1 | 0 | 6.20 | 158 East
2 | LQR | 100 | 3.40 | 113 East
2 | NN π2 | 100 | 3.45 | 97 East
2 | NN π1 (LTI only) | 100 | Failed | Failed
3 (a) | NN π2 | 100 | 3.72 | 254 South
3 (a) | LQR | 100 | 6.70 | 470 South
4 | LQR | 100 | 3.80 | 45 West
4 | NN π2 | 100 | 4.30 | 66 West

(a) Flight 3 has a triangular flight path while the other flights have a rectangular flight path. This contributes to the different 2D tracking errors presented for Flight 3.

The controller's performance was also evaluated under the adverse condition of degraded rudder control surface effectiveness. To artificially emulate rudder effectiveness failure, the aircraft rudder commands generated by the controllers were multiplied by an effectiveness factor (ηδr) before being sent to the rudder servos. The controllers were tested under 50% rudder effectiveness and 0% rudder effectiveness (i.e., the rudder is not working). The flight trajectory tracking performance under these failure cases is presented in Fig. 11. Flight using the neural network controller had improved tracking over the other two controllers. Table 5 presents the maximum East error at the east flight leg for all three controllers under the different rudder degradation settings. Flight using the neural network controller yielded 36% and 72% improvements in the maximum tracking errors at the east leg compared to the LQR and L1 controllers, respectively. Rudder degradation did not have an adverse effect on the trajectory and roll angle tracking performance of the neural network controller, as seen in Table 5. The neural network controller again had the lowest roll angle tracking error in the rudder degradation flights.
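For clarity, the rudder-effectiveness fault injection described above amounts to scaling the commanded deflection before it reaches the servo, as in the short sketch below (function and variable names are illustrative).

```python
def apply_rudder_degradation(rudder_cmd_deg, eta_dr):
    """Emulate degraded rudder effectiveness: eta_dr = 1.0 (healthy),
    0.5 (50% effectiveness), or 0.0 (rudder inoperative)."""
    return eta_dr * rudder_cmd_deg
```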

In the second flight test (Flight 2 in Table 5), the aircraft was again commanded to fly around a rectangular path. This second flight was conducted in 7.3 → 11 ft/s wind conditions coming from the North. In this flight, a comparison was made between (a) the neural network controller trained using the LSTM model and dynamic randomization (NN π2), (b) the neural network controller trained using the LTI model only (NN π1), and (c) the LQR controller. Figure 12 shows the last 15 s of flight using the controller trained using the LSTM model with dynamic randomization (NN π2), after which control was switched to the controller trained using the LTI model only (NN π1). The controller trained using only the LTI model could not safely control the aircraft, causing it to go into unstable behavior and roll almost 360 deg before the human pilot took back control of the aircraft. Comparisons between the roll angle and trajectory tracking performance of the NN π2 and LQR controllers are presented in Table 5 and Fig. 13, where the neural network controller is seen to have better or similar performance compared to the LQR controller.

Fig. 12
Actual flight 2: comparison between (NN π1) and (NN π2) flight performance. Wind: 7.3 → 11 ft/s N.
Fig. 13
Actual flight 2: 2D trajectory. Wind: 7.3 → 11 ft/s N.

The performance of the developed controllers was evaluated in a third, more challenging flight scenario (Flight 3 in Table 5). The aircraft was commanded to fly around a triangular path as shown in Fig. 14, where the triangle has angles of about 90, 25, and 65 deg. This flight scenario requires the aircraft to perform large changes in heading and drastic maneuvers. A comparison was made between (a) the LQR controller and (b) the neural network controller trained using dynamic randomization and the LSTM model (NN π2). Flight using the neural network controller had significantly better trajectory tracking, as presented in Fig. 14. For example, at the South-West angle of the triangle, flight using the neural network controller had significantly improved tracking performance. As presented in Table 5, the maximum error from the South leg is 254 ft for the neural network controller, which is 54% of the error for the LQR controller (470 ft). This third flight was conducted in 8.8 ft/s wind conditions coming from the North-East with gusts up to 13.2 ft/s. Thus, the wind speed was up to 26% of the aircraft's 50 ft/s commanded cruise speed. The neural network controller, which does not use the sideslip angle (β) as an input, also showed coordinated turn performance during sharp turns comparable to that of the well-tuned LQR controller, which uses the sideslip angle as an input.

Fig. 14
Actual flight 3: 2D trajectories using different controllers in a more challenging scenario. Wind: 8.8 → 13.2 ft/s NE.

The fourth flight test (Flight 4 in Table 5) aimed to assess the capability of the flight controllers in executing complex collision avoidance maneuvers in the presence of wind. During this test, the aircraft navigated through the southern segment of its flight path, successfully avoiding a virtual obstacle. The comparison was made between the flight performance of (a) the LQR-based controller and (b) the neural network controller (NN π2). This flight was conducted in 8.8 → 11 ft/s East wind conditions, corresponding to a wind speed up to 24% of the aircraft's 45 ft/s commanded cruise speed. Obstacle avoidance was done based on the approach in Ref. [64]. The aircraft under the command of the neural network controller could successfully avoid the obstacle and follow the desired flight path, as presented in Fig. 15. The neural network controller had similar roll angle and trajectory tracking performance to the LQR-based controller, as seen in Table 5 and Fig. 15.

Fig. 15
Actual flight 4: 2D trajectories using different controllers in an obstacle avoidance scenario. Wind: 8.8 → 11 ft/s E.

All four flight tests performed using the neural network controller π2 were subjected to different wind disturbance conditions, and the controller was subject to sensor noise. The flights were conducted on different days, with different wind and gust magnitudes and different wind directions. Table 6 summarizes the wind conditions of the four flights. The wind conditions reached up to 26% of the aircraft cruise speed. The neural network controller π2 successfully controlled the aircraft in these conditions and showed better or similar performance to the LQR-based and L1 adaptive controllers.

Table 6

Wind conditions for each flight

Flight | Cruise speed, VC (ft/s) | Wind direction | Wind → gust (ft/s) | % of VC
1 | 50 | W | 7.8 → 10.7 | 21%
2 | 50 | N | 7.3 → 11.0 | 22%
3 | 50 | NE | 8.8 → 13.2 | 26%
4 | 45 | E | 8.8 → 11.0 | 24%

As presented, flight using the neural network controller trained using the LSTM model and dynamic randomization (NN π2) showed significantly improved trajectory tracking performance compared to the LQR and L1 controllers (as seen in Flight 1, even for the rudder degradation cases, and in Flight 3 in the challenging scenario requiring a large change in heading). The neural network controller trained using the LTI model only (NN π1) was unsuccessful, causing the aircraft to go into a 360-deg roll. This shows that using the LSTM model and dynamic randomization can practically yield a successful flight controller with improved performance over controllers developed using modern and adaptive control techniques.

7 Conclusions

In this work, data-driven machine learning techniques are used to improve the dynamic model of an uncrewed aircraft using a bank of available flight test data. A recurrent neural network with an LSTM architecture is shown to have improved modeling over dynamic models developed using physics-based methods. Unlike restrictive classical system identification methods, the ML LSTM method provides the required methodology and framework to use any portion of flight test data to improve the fidelity of an aircraft dynamic model. Lateral-directional RL-based controllers are developed using the PPO deep RL algorithm. The RL-based controller performance, stability, and robustness are improved using a training environment that utilizes both the LSTM and physics-based dynamic models in a random fashion and a reward function that uses terms to regulate the rates of control surfaces along with other states. The developed controller is tested in different flight test scenarios and is compared to controllers designed using modern (LQR) and adaptive (L1) control techniques. Assessing the controller's performance in benign flight test conditions or simple path-following missions is insufficient. Several complex paths and intentional adverse onboard conditions in the presence of exogenous disturbances are used to quantify the improvement in aircraft dynamics model fidelity. The flight performance using the RL-based controller is observed to be significantly better than that of the LQR and L1 controllers, even during rudder degradation flight tests and in challenging flight scenarios requiring large changes in heading.

Acknowledgment

Much appreciation is given to collaborators from the KU Flight Research Lab, especially Justin Clough, Megan Carlson, and Alex Zugazagoitia, for their assistance in flight test support and execution.

Funding Data

  • National Aeronautics and Space Administration (NASA) and Armstrong Flight Research Center (Project No. 18CDA067 L; Funder ID: 10.13039/100007346).

  • Federal Aviation Administration (FAA) (No. 908-1003025; Funder ID: 10.13039/100006282).

Data Availability Statement

The datasets generated for and supporting the findings of this article are available from the corresponding author upon reasonable request.

References

1. Drela, M., and Youngren, H., 2020, Athena Vortex Lattice, Massachusetts Institute of Technology, Cambridge, MA, accessed July 1, 2024, https://web.mit.edu/drela/Public/web/avl/
2. Design, Analysis and Research Corporation, 2018, Advanced Aircraft Analysis, Lawrence, KS, accessed July 1, 2024, https://www.darcorp.com/advanced-aircraft-analysis-software/
3. Benyamen, H., Mays, B. S., Chowdhury, M., Keshmiri, S., and Ewing, M. S., 2023, "Analysis of Aircraft Simulation Validity in Different Flight Conditions," 2023 International Conference on Unmanned Aircraft Systems (ICUAS), Warsaw, Poland, June 6–9, pp. 129–136. DOI: 10.1109/ICUAS57906.2023.10156586
4. Jategaonkar, R. V., 2015, Flight Vehicle System Identification: A Time-Domain Methodology, 2nd ed., American Institute of Aeronautics and Astronautics, Reston, VA.
5. Morelli, E. A., and Klein, V., 2016, Aircraft System Identification: Theory and Practice, Vol. 2, Sunflyte Enterprises, Williamsburg, VA.
6. Tischler, M. B., and Remple, R. K., 2012, Aircraft and Rotorcraft System Identification: Engineering Methods With Flight Test Examples, 2nd ed., American Institute of Aeronautics and Astronautics, Reston, VA.
7. Sparbanie, S. M., 2008, "Modeling and Identification of Unsteady Airwake Disturbances on Rotorcraft," M.S. thesis, The Pennsylvania State University, University Park, PA. https://etda.libraries.psu.edu/catalog/9116
8. Hageman, J. J., Smith, M. S., and Stachowiak, S., 2003, "Integration of Online Parameter Identification and Neural Network for In-Flight Adaptive Control," NASA, Report No. NASA/TM-2003-212028. https://ntrs.nasa.gov/api/citations/20040000923/downloads/20040000923.pdf
9. Dhayalan, R., Saderla, S., and Ghosh, A. K., 2018, "Parameter Estimation of UAV From Flight Data Using Neural Network," Aircr. Eng. Aerosp. Technol., 90(2), pp. 302–311. DOI: 10.1108/AEAT-03-2016-0050
10. Benyamen, H., 2019, "Stability and Control Derivatives Identification for an Unmanned Aerial Vehicle With Low Cost Sensors Using an Extended Kalman Filter Algorithm," M.S. thesis, University of Kansas, Lawrence, KS.
11. Garcia, G., and Keshmiri, S., 2012, "Online Artificial Neural Network Model Based Nonlinear Model Predictive Controller for the Meridian UAS," AIAA Paper No. 2012-4984. DOI: 10.2514/6.2012-4984
12. Roudbari, A., and Saghafi, F., 2016, "Generalization of ANN-Based Aircraft Dynamics Identification Techniques Into the Entire Flight Envelope," IEEE Trans. Aerosp. Electron. Syst., 52(4), pp. 1866–1880. DOI: 10.1109/TAES.2016.140693
13. Bai, X., Guan, J., and Wang, H., 2019, A Model-Based Reinforcement Learning With Adversarial Training for Online Recommendation, Curran Associates Inc., Red Hook, NY, pp. 10735–10746.
14. Chalvatzaki, G., Papageorgiou, X. S., Maragos, P., and Tzafestas, C. S., 2019, "Learn to Adapt to Human Walking: A Model-Based Reinforcement Learning Approach for a Robotic Assistant Rollator," IEEE Rob. Autom. Lett., 4(4), pp. 3774–3781. DOI: 10.1109/LRA.2019.2929996
15. Soleyman, S., Chen, Y., Fadaie, J., Hung, F., Khosla, D., Moffit, S., Roach, S., and Tullock, C., 2022, "Predictive Modeling of Aircraft Dynamics Using Neural Networks," SAE Int. J. Aerosp., 15(2), pp. 159–170. DOI: 10.4271/01-15-02-0010
16. Wu, P., Escontrela, A., Hafner, D., Abbeel, P., and Goldberg, K., 2023, "DayDreamer: World Models for Physical Robot Learning," 6th Conference on Robot Learning (CoRL), Auckland, New Zealand, Dec. 14–18, pp. 2226–2240. https://proceedings.mlr.press/v205/wu23c/wu23c.pdf
17. Zhu, X., Liang, Y., Sun, H., Wang, X., and Ren, B., 2022, "Robot Obstacle Avoidance System Using Deep Reinforcement Learning," Ind. Robot, 49(2), pp. 301–310. DOI: 10.1108/IR-06-2021-0127
18. Kim, M., Seo, J., Lee, M., and Choi, J., 2021, "Vision-Based Uncertainty-Aware Lane Keeping Strategy Using Deep Reinforcement Learning," ASME J. Dyn. Syst., Meas., Control, 143(8), p. 084503. DOI: 10.1115/1.4050396
19. Manring, L. H., and Mann, B. P., 2022, "Modeling and Reinforcement Learning Control of an Autonomous Vehicle to Get Unstuck From a Ditch," ASME J. Auton. Veh. Syst., 2(1), p. 011003. DOI: 10.1115/1.4054499
20. Bekar, C., Yuksek, B., and Inalhan, G., 2020, "High Fidelity Progressive Reinforcement Learning for Agile Maneuvering UAVs," AIAA Paper No. 2020-0898. DOI: 10.2514/6.2020-0898
21. Torres, E., Xu, L., and Sardarmehni, T., 2022, "Using Actor-Critic Reinforcement Learning for Control and Flight Formation of Quadrotors," ASME Paper No. IMECE2022-97224. DOI: 10.1115/IMECE2022-97224
22. Koch, W., Mancuso, R., West, R., and Bestavros, A., 2019, "Reinforcement Learning for UAV Attitude Control," ACM Trans. Cyber-Phys. Syst., 3(2), pp. 1–21. DOI: 10.1145/3301273
23. Lee, B., Saj, V., Benedict, M., and Kalathil, D., 2021, "A Deep Reinforcement Learning Control Strategy for Vision-Based Ship Landing of Vertical Flight Aircraft," AIAA Paper No. 2021-3218. DOI: 10.2514/6.2021-3218
24. Ferrari, S., and Stengel, R. F., 2004, "Online Adaptive Critic Flight Control," J. Guid., Control, Dyn., 27(5), pp. 777–786. DOI: 10.2514/1.12597
25. Lee, J. H., and Van Kampen, E.-J., 2021, "Online Reinforcement Learning for Fixed-Wing Aircraft Longitudinal Control," AIAA Paper No. 2021-0392. DOI: 10.2514/6.2021-0392
26. Dally, K., and Van Kampen, E.-J., 2022, "Soft Actor-Critic Deep Reinforcement Learning for Fault Tolerant Flight Control," AIAA Paper No. 2022-2078. DOI: 10.2514/6.2022-2078
27. Völker, W., Li, Y., and Van Kampen, E.-J., 2023, "Twin-Delayed Deep Deterministic Policy Gradient for Altitude Control of a Flying-Wing Aircraft With an Uncertain Aerodynamic Model," AIAA Paper No. 2023-2678. DOI: 10.2514/6.2023-2678
28. Clarke, S. G., and Hwang, I., 2020, "Deep Reinforcement Learning Control for Aerobatic Maneuvering of Agile Fixed-Wing Aircraft," AIAA Paper No. 2020-0136. DOI: 10.2514/6.2020-0136
29. Hein, F., Notter, S., and Fichter, W., 2023, "Sim-to-Real Transfer of a Deep Reinforcement Learning Approach for Active Stall Protection," AIAA Paper No. 2023-2536. DOI: 10.2514/6.2023-2536
30. Fletcher, L. J., Clarke, R. J., Richardson, T. S., and Hansen, M., 2022, "Integrating Throttle Into a Reinforcement Learning Controller for a Perched Landing of a Variable Sweep Wing UAV," AIAA Paper No. 2022-1288. DOI: 10.2514/6.2022-1288
31. Shukla, D., Lal, R., Hauptman, D., Keshmiri, S. S., Prabhakar, P., and Beckage, N., 2020, "Flight Test Validation of a Safety-Critical Neural Network Based Longitudinal Controller for a Fixed-Wing UAS," AIAA Paper No. 2020-3093. DOI: 10.2514/6.2020-3093
32. Benyamen, H., Chowdhury, M., and Keshmiri, S. S., 2023, "Reinforcement Learning Based Aircraft Controller Enhanced by Gaussian Process Trim Finding," ASME Lett. Dyn. Syst. Control, 3(3), p. 031002. DOI: 10.1115/1.4063605
33. Chowdhury, M., and Keshmiri, S., 2022, "Design and Flight Test Validation of an AI-Based Longitudinal Flight Controller for Fixed-Wing UASs," 2022 IEEE Aerospace Conference (AERO), Big Sky, MT, March 5–12, pp. 1–12. DOI: 10.1109/AERO53065.2022.9843777
34. Chowdhury, M., and Keshmiri, S., 2024, "Interchangeable Reinforcement-Learning Flight Controller for Fixed-Wing UASs," IEEE Trans. Aerosp. Electron. Syst., 60(2), pp. 2305–2318. DOI: 10.1109/TAES.2024.3351608
35. Bøhn, E., Coates, E. M., Reinhardt, D., and Johansen, T. A., 2024, "Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments," IEEE Trans. Neural Networks Learn. Syst., 35(3), pp. 3168–3180. DOI: 10.1109/TNNLS.2023.3263430
36. Jakobi, N., Husbands, P., and Harvey, I., 1995, "Noise and the Reality Gap: The Use of Simulation in Evolutionary Robotics," Advances in Artificial Life: Third European Conference on Artificial Life, Granada, Spain, June 4–6, pp. 704–720. DOI: 10.1007/3-540-59496-5_337
37. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P., 2017, "Domain Randomization for Transferring Deep Neural Networks From Simulation to the Real World," 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, Sept. 24–28, pp. 23–30. DOI: 10.1109/IROS.2017.8202133
38. Peng, X. B., Andrychowicz, M., Zaremba, W., and Abbeel, P., 2018, "Sim-to-Real Transfer of Robotic Control With Dynamics Randomization," 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 21–25, pp. 3803–3810. DOI: 10.1109/ICRA.2018.8460528
39. Wada, D., Araujo-Estrada, S., and Windsor, S., 2022, "Sim-to-Real Transfer for Fixed-Wing Uncrewed Aerial Vehicle: Pitch Control by High-Fidelity Modelling and Domain Randomization," IEEE Rob. Autom. Lett., 7(4), pp. 11735–11742. DOI: 10.1109/LRA.2022.3205442
40. Muratore, F., Gienger, M., and Peters, J., 2021, "Assessing Transferability From Simulation to Reality for Reinforcement Learning," IEEE Trans. Pattern Anal. Mach. Intell., 43(4), pp. 1172–1183. DOI: 10.1109/TPAMI.2019.2952353
41. Bacci, E., and Parker, D., 2020, "Probabilistic Guarantees for Safe Deep Reinforcement Learning," Formal Modeling and Analysis of Timed Systems: 18th International Conference (FORMATS), Vienna, Austria, Sept. 1–3, pp. 231–248. https://www.cs.ox.ac.uk/david.parker/papers/formats20.pdf
42. Kazak, Y., Barrett, C., Katz, G., and Schapira, M., 2019, "Verifying Deep-RL-Driven Systems," Proceedings of the 2019 Workshop on Network Meets AI & ML, Beijing, China, Aug. 23, pp. 83–89. DOI: 10.1145/3341216.3342218
43. Prabhakar, P., and Rahimi Afzal, Z., 2019, "Abstraction Based Output Range Analysis for Neural Networks," Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, Dec. 8–14, pp. 15788–15798. https://proceedings.neurips.cc/paper_files/paper/2019/file/5df0385cba256a135be596dbe28fa7aa-Paper.pdf
44. Cheng, R., Orosz, G., Murray, R. M., and Burdick, J. W., 2019, "End-to-End Safe Reinforcement Learning Through Barrier Functions for Safety-Critical Continuous Control Tasks," Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, Jan. 27–Feb. 1, pp. 3387–3395. DOI: 10.1609/aaai.v33i01.33013387
45. Design, Analysis and Research Corporation, 2024, "AAA in Publications," accessed April 20, 2024, https://www.darcorp.com/advanced-aircraft-analysis-publications/
46. Daşkiran, O., and Kavsaoğlu, M. Ş., 2013, "Flight Dynamics Analysis and Control of Transport Aircraft Subject to Failure," EUCASS Proceedings Series – Advances in AeroSpace Sciences, St. Petersburg, Russia, July 4–8, pp. 347–362. DOI: 10.1051/eucass/201306347
47. Lampton, A., and Valasek, J., 2012, "Prediction of Icing Effects on the Lateral/Directional Stability and Control of Light Airplanes," Aerosp. Sci. Technol., 23(1), pp. 305–311. DOI: 10.1016/j.ast.2011.08.005
48. Briggs, H. C., 2015, "A Survey of Integrated Tools for Air Vehicle Design, Part I," AIAA Paper No. 2015-0803. DOI: 10.2514/6.2015-0803
49. Roskam, J., 1985, Airplane Design, Parts I–VIII, DARcorporation, Lawrence, KS.
50. Roskam, J., 1998, Airplane Flight Dynamics and Automatic Flight Controls, DARcorporation, Lawrence, KS.
51. Finck, R., Ellison, D., and Malthan, L., 1978, USAF (United States Air Force) Stability and Control DATCOM (Data Compendium), Defense Technical Information Center, Wright-Patterson AFB, OH.
52. Benyamen, H., and Keshmiri, S., 2022, "Flight Test Validation Verification of @AIR Distributed Electric Propulsion Aircraft Dynamic Model," 2022 International Conference on Unmanned Aircraft Systems (ICUAS), Dubrovnik, Croatia, June 21–24, pp. 821–830. DOI: 10.1109/ICUAS54217.2022.9836206
53. Xu, J., McKinnis, A., Keshmiri, S., and Bowes, R., 2022, "Flight Test of the Novel Fixed-Wing Multireference Multiscale LN Guidance Logic for Complex Path Following," J. Intell. Rob. Syst., 105(3), p. 63. DOI: 10.1007/s10846-022-01660-x
54. Blevins, A. T., McKinnis, A., Xu, J., and Keshmiri, S. S., 2022, "Flight Test Validation of Real-Time UAS Mission Planning Autonomy and Optimal Path Planning for Flight Line Surveys," AIAA Paper No. 2022-2292. DOI: 10.2514/6.2022-2292
55. Kim, A. R., Keshmiri, S., Blevins, A., Shukla, D., and Huang, W., 2020, "Control of Multi-Agent Collaborative Fixed-Wing UASs in Unstructured Environment," J. Intell. Rob. Syst., 97(1), pp. 205–225. DOI: 10.1007/s10846-019-01057-3
56. Zhang, A., Lipton, Z. C., Li, M., and Smola, A. J., 2023, Dive Into Deep Learning, Cambridge University Press, Cambridge, UK.
57. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., et al., 2015, "TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems," accessed June 1, 2020, tensorflow.org
58. TensorFlow, 2022, "Time Series Forecasting Tutorial," accessed Mar. 15, 2022, https://www.tensorflow.org/tutorials/structured_data/time_series
59. Kingma, D. P., and Ba, J., 2015, "Adam: A Method for Stochastic Optimization," 3rd International Conference on Learning Representations (ICLR), San Diego, CA, May 7–9, pp. 1–15. DOI: 10.48550/arXiv.1412.6980
60. Keras, 2024, "Keras 3 API Documentation, Metrics, Regression Metrics," accessed Feb. 21, 2024, https://keras.io/api/metrics/regression_metrics/
61. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O., 2017, "Proximal Policy Optimization Algorithms," preprint arXiv:1707.06347. DOI: 10.48550/arXiv.1707.06347
62. Konda, V., and Tsitsiklis, J., 1999, "Actor-Critic Algorithms," Proceedings of the 12th International Conference on Neural Information Processing Systems, Denver, CO, Nov. 29–Dec. 4, pp. 1008–1014. https://proceedings.neurips.cc/paper_files/paper/1999/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf
63. Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y., 1999, "Policy Gradient Methods for Reinforcement Learning With Function Approximation," Proceedings of the 12th International Conference on Neural Information Processing Systems, Denver, CO, Nov. 29–Dec. 4, pp. 1057–1063. https://proceedings.neurips.cc/paper_files/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf
64. Stastny, T. J., Garcia, G. A., and Keshmiri, S. S., 2014, "Collision and Obstacle Avoidance in Unmanned Aerial Systems Using Morphing Potential Field Navigation and Nonlinear Model Predictive Control," ASME J. Dyn. Syst., Meas., Control, 137(1), p. 014503. DOI: 10.1115/1.4028034