[go: up one dir, main page]

WO2025050223A1 - Implicit dual control for uncertain stochastic systems - Google Patents

Implicit dual control for uncertain stochastic systems Download PDF

Info

Publication number
WO2025050223A1
WO2025050223A1 PCT/CA2024/051169 CA2024051169W WO2025050223A1 WO 2025050223 A1 WO2025050223 A1 WO 2025050223A1 CA 2024051169 W CA2024051169 W CA 2024051169W WO 2025050223 A1 WO2025050223 A1 WO 2025050223A1
Authority
WO
WIPO (PCT)
Prior art keywords
control
augmented
data structure
dual
states
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CA2024/051169
Other languages
French (fr)
Inventor
Andrew Craig MATHIS
Jonathon W. Sensinger
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of New Brunswick
Original Assignee
University of New Brunswick
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of New Brunswick filed Critical University of New Brunswick
Publication of WO2025050223A1 publication Critical patent/WO2025050223A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators

Definitions

  • System identification techniques can identify these uncertainties through testing before the system is put into normal operation, but this option is not always available and cannot identify time-varying uncertainties.
  • Adaptive control techniques can identify uncertain parameters while directing a system toward a desired objective, but cannot determine what actions would result in measurements with more information about the uncertainties, and are therefore passively adaptive.
  • Actively adaptive control techniques estimate the reductions in uncertainty that will result from their control actions and probe the system to identify the uncertain parameters to a sufficient level such that the desired goal is optimized. This actively adaptive control is known as dual control, as the controls are chosen to learn about the uncertainties and to reduce the cost.
  • Robust approaches consider the set of possible values that the parameters could take and can account for the uncertainties to limit their impact on achieving the control goals (e.g., [39]).
  • Stochastic approaches consider the probability density function of the uncertain parameters to account for parameter realizations with the higher chances of being true (e.g., [6]).
  • Adaptive approaches continuously update estimates of the uncertain parameters using measurements from the system and controls the system as if these estimates are the true values of the parameters (e.g., [15]).
  • dual approaches consider what actions could be taken to improve the information in future measurements in such a way that the resulting reduction in future costs is greater than the cost of these probing actions (e.g., [47]).
  • Adaptive control methods are passively adaptive, in that they only consider changes to parameter uncertainties due to past measurements.
  • Dual control on the other hand, is an actively adaptive control method, in that it actively modifies its control actions to seek out future measurements that will reduce parameter uncertainties.
  • Dual control has three features: it is probing, cautious, and selective [28]. Dual control is probing in that it modifies its control signals to obtain more information-rich measurements, it is cautious in that it will tend to make smaller control actions when uncertainties are high, and it is selective in that it will only attempt to identify parameters that impact the system’s performance.
  • An analogy for dual control that has been used is driving a new car from location A to location B [32].
  • Implicit dual controllers on the other hand, approximate the Bellman equations in such a way that the reduction of uncertainty due to future measurements is estimated, which generally comes at the cost of higher computational effort compared to explicit methods [22].
  • the relative importance of the dual features and the other objectives does not have to be quantified with the implicit approach as they are linked, but probing for increased parameter information is only done if it will reduce future costs.
  • implicit methods can give better results than explicit methods [7].
  • Bayard and Schumitzky [7] use an iteration in policy space algorithm that combines particle filtering with Monte Carlo simulations to estimate the cost-to-go and iteratively improve on a given control policy but is limited to control inputs that only take on two discrete values.
  • Sehr and Bitmead [53] approximate the system dynamics as a Partially Observable Markov Decision Processes, allowing the dual stochastic MPC problem to be solved explicitly for small and medium-sized problems.
  • Thangavel et al. [60] and Hanssen and Foss [20] take a multi-stage approach to implicit dual control and consider a branching network of scenarios, where each branch represents the system’s predicted response for a single realization of the uncertain parameters.
  • the optimization problem that is solved is to determine the control actions (over the control horizon) which minimize the sum of the costs of the scenarios over the prediction horizon.
  • the reduction in the uncertainties due to future measurements is estimated for each time step in a “robust” horizon, and this reduction is reflected in the selection of the discrete uncertainty values upon which the subsequent scenarios are based.
  • the robust horizon is usually chosen to be less than the control horizon, and the uncertainties after that point are assumed to be equal to the nominal value, no longer incorporating the dual features.
  • the existing implicit dual approaches are limited by Bellman’s curse of dimensionality as they do not take a derivative-based approach in continuous state space.
  • Optimal Control the control actions to a dynamic system over a period of time are determined by solving an optimization problem where a given objective function is minimized [63].
  • the objective function is chosen to cause the system to demonstrate a particular behaviour, such as following a particular state trajectory or minimizing energy usage.
  • Objective, or cost, functions can have terms that impose costs at each point in time, known as stage costs, or terms that only impose costs at the final time, known as terminal costs. Constraints can also be imposed on the states and/or controls in the form of inequalities or equalities [63].
  • a common example of an optimal control problem is finding the cheapest connections to fly to a destination [63].
  • the state x represents the cities
  • the control u represents the choices of which flights to take
  • cost(x, u) represents the cost of the plane ticket
  • next(x, u) represents the city where the flight u from city x lands.
  • the optimal cost-to-go can be determined by selecting the lowest cost-to-go at the initial time step, which provides the optimal control trajectory.
  • this approach means that the problem can be solved by considering flights to a single city at a time and keeping running totals of the flight costs. The cost of flights to the final destination can be recorded first, giving the cost-to-go (to the final destination) from each of those cities. The cost of flights to those second-to-last destinations from other cities can then be added to their respective cost-to-go functions, giving the cost-to-go from those cities.
  • This control approach is only locally-optimal due to it being a trajectory-based approach [35].
  • iLQR considers deviations around an initial nominal control trajectory, u ⁇ p , the resulting solution can only be optimal within that region of control space, and no claims of global optimality can be made.
  • iLQR can also be applied in a moving horizon setting where feedback is used to improve performance by accounting for inaccuracies due to the linearization process.
  • the algorithm is initialized with a nominal control trajectory as described above that could be a vector of zeros, randomly generated, or any other form of seed.
  • the iLQR algorithm is then run for a given finite time horizon, resulting in an improved control policy.
  • the control policy is then applied to the true system for a single time step, after which a state measurement from the system is obtained.
  • the length of the time horizon and the improved control policy can then be adjusted and used to run the iLQR algorithm. This process can be repeated for a problem with a finite time horizon until the final time step is reached, or continuously for problems with infinite time horizons.
  • iLQG Iterative Linear Quadratic Gaussian (iLQG) iLQG extends iLQR to nonlinear systems that are stochastic and do not have quadratic costs.
  • the cost function is “quadratized” about the nominal state-control trajectory in a similar way that the system dynamics are linearized (non-quadratic cost functions can be handled in the same way for iLQR). Since the states are uncertain, the measurement dynamics are also included in the iLQG algorithm, and a filter is required to estimate the states’ mean values and covariances.
  • the forward integration of the system dynamics to obtain the nominal state trajectory from the nominal control trajectory and the calculations of the derivatives required for the linearization of the system dynamics as well as the “quadratization” of the cost function are grouped together in what’s known as a forward pass.
  • the next step in the algorithm is the estimator, which uses the noisy measurements to infer the value of the states and their covariances.
  • a backward pass is required to calculate a quadratic approximation to the cost-to-go function, and then the optimal control deviations can be found.
  • iLQR iLQG
  • iLQG gives optimal solutions due to its trajectory-based approach, which also allows it to be very efficient.
  • the algorithm can be initialized multiple times with different nominal control trajectories, known as seeds, to search a larger control space. These seeds may converge to different local minima, and a selection criteria can be determined to select a single solution from this set of solutions.
  • H has negative eigenvalues
  • the approximation of the cost-to-go function may become negative, and cost-to-go always non-negative.
  • the first option [57] is to set + .
  • the second option is to regularize H and G through the quadratic cost-to-go t erms ([57]) as where min(eig(H)) is the minimum eigenvalue of H.
  • a fourth option [37] i s to set , (2.59) where [V, D] eig(H) is the eigenvalue decomposition of H, but the elements in the diagonal matrix D that are less than are replaced with before computing H.
  • Tassa et al. present a quadratic modification schedule for the regularization term in [57].
  • cost terms should be imposed on both u and s(u).
  • the other option to introduce control constraints is by solving the quadratic program of minimizing the cost-to-go approximation from equation (2.34), subject to the box control constraints bmin and bmax.
  • the quadratic optimization problem to be solved is min ( , ) (2.68) subject to + . (2.69)
  • This approach directly solves for the sequence of control actions that minimize the approximation of the total cost, by solving a sequence of these problems in a backward pass.
  • the gain matrices lk and Lk are the final result, such that the improved policy for the control deviations can be determined as shown in equation (2.52).
  • this optimization process can result in control deviations that are arbitrarily large and can therefore send the state trajectory outside of the region where the linear approximation to the system dynamics is reasonably accurate.
  • a line search is performed to sequentially reduce the improved control deviations until a solution is found that is estimated to cause a reduction in the total cost.
  • This locally-linear policy is determined by performing a forward pass through the estimated system dynamics: where is a backtracking search parameter that is set to 1 and then sequentially reduced. If the line search fails to find a reduced cost solution, the regularization parameter is increased as shown in equations (2.62) and (2.63), and the control policy backward pass and line search forward pass are repeated until the algorithm converges to a locally optimal control policy.
  • Stopping criteria Four stopping criteria are used to define convergence and control how long this iterative algorithm runs. First, if the gradient of the control deviations is less than a predefined threshold, the algorithm terminates.
  • Moving horizon implementation iLQG can be implemented with a moving horizon approach as described for iLQR, but since there is noise in the system dynamics and the measurements, a filter may be used to obtain an estimate of the states. A system diagram for this approach is shown in FIG.2.
  • the iLQG block implements the iterative iLQG algorithm, which is referred to as the inner loop 110, while the outer loop 120 contains the control signal being sent to the true system 130 for a single time step, the state dynamics being sampled with a sampling period of T as well as the measurement 140 and filtering steps.
  • the inner loop 110 accepts, as input, a nominal control trajectory , the current state and parameter values and and generates a new control policy after executing the forward and backward pass operations.
  • the new control policy enables the generation of an updated control trajectory for use as the nominal control trajectory for future iterations of the inner loop.
  • Adaptive Control is the control of systems with uncertain parameters that are constant or time-varying. This uncertainty can be due to the cost or complexity of accurately measuring the parameters being high, or when the control scheme is to be applied to multiple similar systems that have different values for the parameters [15].
  • adaptive control methods are divided into four categories: gain scheduling, model- reference adaptive control, self-tuning adaptive control, and dual control.
  • gain scheduling is to change the controller or its parameters based on pre-defined conditions.
  • a series of local controllers are required to implement gain scheduling, which are each tuned for their operating range, and a method of switching between them.
  • the selection of the local controller can be seen as an adaptive process.
  • a model reference adaptive controller adjusts the estimates of a model’s parameters such that the tracking error of the plant converges to zero.
  • the control and adaptation laws are coupled, and these can be derived using Lyapunov theory such that stability can be shown.
  • the plant parameters are recursively estimated from the input-output data and then used by a controller as if they were the true plant parameters.
  • Using the estimated value as if it were the true value is often called the certainty equivalence principle, allowing the design of the estimation process and controller to be independent, unlike in model-reference adaptive control.
  • self-tuning adaptive control there are also no guarantees of parameter convergence without sufficient richness, and the design of the estimator and the controller being independent makes the stability of the system harder to prove [55].
  • Dual control is theoretically the ideal adaptive control method [28], and one of the major differences is that dual control takes into account the fact that uncertainties will be reduced in the future. This consideration of future information allows for a controller that can probe the plant and make control actions to reduce the parameter uncertainty in the future in a cautious way and focus on the most relevant parameters.
  • Dual Control Bayard et al. divide stochastic control policies into three classes: open-loop, feedback, and closed-loop [7]. Open-loop control policies do not use any process measurements and therefore no learning occurs. Feedback control policies use all measurements up to the current time step, and therefore learning can occur, but the learning is passive since the data generated is only due to performing the control task.
  • Closed-loop control policies also use all measurements up to the current time step, but additionally, they anticipate that future measurements are going to be made in the future. Taking future measurements into account to determine the current control action allows planned, or active, learning to take place.
  • these stochastic control methods are known as dual controllers. Due to the curse of dimensionality, Bellman’s equations cannot be efficiently solved for problems of arbitrary size, and therefore the explicit and implicit approximations of dual control are needed for its practical implementation.
  • the explicit approximation involves modifying the cost function to elicit one or more of the dual behaviours of probing, caution, and selectiveness.
  • the implicit approximation on the other hand elicits these dual behaviours through a method that allows the control algorithm to consider the impact of probing actions on future costs.
  • Adding terms to the cost function with the explicit approximation requires an explicit trade-off between the control objectives and the system identification, while with the implicit approximation, the controller can balance these dual objectives without additional information. While it is simple to add extra terms to the cost function to elicit the dual features of caution, probing, and/or selectiveness, this approach fixes the value of system information relative to the minimization of the rest of the cost function. Even with methods that vary the value of system information based on a specific measure, explicit approaches are likely to overvalue or undervalue system information at different points in time compared to implicit approaches.
  • Bar-Shalom and Tse work is extended in [28] by estimating the system dynamics using a parametric, Gaussian process and neural network regression. Like Bar- Shalom and Tse’s work on which it is based, this control method is explicitly dual and only applicable to additive noise.
  • One application shows how dual NMPC can be applied to control the climate of a building, and this example is also detailed in [27].
  • Bar-Shalom and Tse also refer to time-scale separation and reformulating this implicit dual NMPC method without the dynamic programming portion to make it solely NMPC-based.
  • Implicit dual control As with the work on explicit dual control, many of the implicit dual control publications have been based on MPC, but several have taken other approaches. These works also make different assumptions on the uncertain parameters and use different estimation approaches. Multi-stage NMPC is used as an implicit approximation to dual control in [60], where the unknown system parameters were assumed to be bounded, parametric, and time- invariant. Here, the Fisher information matrix is used to estimate the future reduction in uncertainties, but the least squares estimate of the uncertainties is assumed to be constant.
  • the parameter sensitivities are included in the scenario tree in the same way as the states, allowing the controller to make decisions based on probing for information on specific parameters. More involved estimation schemes for the uncertainties and their bounds are mentioned, including guaranteed parameter estimation.
  • This dual control method is then applied to the control of a simulated chemical batch reactor.
  • This work was extended in [59], and the assumption that the least squares estimate has converged to the true value of the uncertainty is removed. This assumption is replaced by an over-approximation factor to estimate the future changes in the point estimation.
  • guaranteed parameter estimation is mentioned as a possible future extension of this work, but was not explored. This approach is used as a basis of comparison throughout this work and is therefore explained in more detail in the following section.
  • Multi-stage NMPC considers a branching network of scenarios, where each branch represents the system’s predicted response for a single realization of the uncertainties.
  • This scenario tree is illustrated in FIG.3, where x represents the system states, u represents the control actions, and d represents the discrete realizations of the uncertainties with superscripts indicating the scenario number and subscripts indicating the time index into the future.
  • control actions that share a node are equal, and this requirement is known as the non-anticipatory constraint.
  • the reduction in the uncertainties due to future measurements is estimated for each time step in a robust horizon, and this reduction is reflected in the selection of the uncertainty values upon which the subsequent scenarios are based.
  • the robust horizon is usually chosen to be less than the control horizon, and the uncertainties after that point are assumed to be equal to the nominal value.
  • the optimization problem that is solved is to determine the control actions (over the control horizon) that minimize the sum of the costs of the scenarios over the prediction horizon. To ensure that none of the scenarios are excited more than necessary, the selected control action is cautious when uncertainties are high.
  • Probing is also introduced as small control actions that may increase a scenario’s cost initially but can lead to larger reductions in future costs associated with the uncertainty.
  • selectiveness is introduced, as probing actions that have a higher “return on investment” will be prioritized, and therefore the most important uncertainties will be reduced.
  • the multi-stage NMPC method relies on predicting the future reductions in uncertainties for given control actions, and even when representing uncertainties by a discrete set of realizations, the number of scenarios grows exponentially and must be limited. The selection of the discrete realizations of the uncertainties impacts the controller’s robustness and computational For linear systems, using the minimum, maximum, and nominal values of the uncertainties (assuming they are bounded) keeps the number of scenarios per branch low and can be shown to be “usually” robust [60].
  • SUMMARY Systems and methods are provided for determining control actions for controlling a stochastic system via an implicit dual control using a computer processor and associated memory encoded with an augmented mathematical model of dynamics of the system, the augmented mathematical model characterizing the dynamics of the states and uncertain parameters of the system.
  • a stochastic differential-dynamic-programming-based algorithm is employed to process an augmented state data structure characterizing the states of the stochastic system and the one or more uncertain parameters, and an augmented state covariance data structure comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the uncertain parameters, according to the augmented mathematical model and a cost function, in which the uncertain parameters are treated as additional states subject to the augmented mathematical model, to determine a control policy for reducing cost through implicitly generated dual features of probing, caution, and selectiveness.
  • a computer-implemented method of determining control actions for controlling a stochastic system according to an implicit dual controller comprising: providing control and processing circuity comprising at least one processor and associated memory, the memory comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model a mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the augmented mathematical model augmented to include dynamics of one or more uncertain parameters; storing, in the memory, an initialized form of an augmented state data structure, the augmented state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; storing, in the memory, via the processor, an initialized form of an augmented state covariance data structure, the augmented state co
  • the mathematical model is configured such that all of the states are observable, and performing step d), the one or more uncertain parameters are updated via a filter.
  • at least one of the states is unobservable, and wherein, when performing step d), a filter is configured to employ the augmented mathematical model to estimate updated values of unknown states and the one or more uncertain parameters by processing the augmented state data structure and the augmented state covariance data structure.
  • step b) is performed after performing steps c) and d), such that the control policy is determined based on newly determined states.
  • the control action is autonomously applied to the stochastic system for at least one time step.
  • the control action is not applied to the stochastic system for at least one time step.
  • the control and processing circuity is encoded with the augmented mathematical model such that the augmented mathematical model is characterized by multiplicative noise.
  • the control and processing circuity is encoded with the augmented mathematical model such that at least one of the uncertain parameters is modeled as a time-dependent parameter.
  • the control and processing circuity is encoded to employ a moving control horizon.
  • the mathematical model is obtained by data- driven modeling.
  • the mathematical model is obtained by regression-based data-driven modeling.
  • the augmented stochastic differential- dynamic-programming-based algorithm is based on an algorithm selected from the group consisting of the iterative linear quadratic Gaussian (iLQG) algorithm, the stochastic dual dynamic programming (SDDP) algorithm, and variations thereof.
  • the augmented stochastic differential- dynamic-programming-based algorithm is based on an algorithm selected from the group consisting of dual dynamic programming, the sequential linear quadratic (SLQ) algorithm, and the iterative linear quadratic regulator (iLQR) algorithm, and variations thereof, all being modified to include stochastic terms.
  • the stochastic system is an industrial system for producing or refining a product.
  • At least one of the one or more uncertain parameters may be a composition of a feedstock, and wherein the control action comprises controlling an input rate of the feedstock.
  • the industrial system may be an anerobic digestion system and the feedstock is an organic feedstock suitable for digestion by
  • the stochastic system comprises a population, wherein the states comprises a plurality of infection status states of the population, wherein the mathematical model simulates spread and dynamics of an infectious disease among the population, and wherein the control policy is configured to determine, at least in part, a severity of public policy actions for containing spread of the infectious disease, and wherein the one or more uncertain parameters comprise a minimum rate of infection when a maximum severity of public policy is applied.
  • the stochastic system is an autonomous vehicle, and where at least one uncertain parameter is associated with an uncertainty caused by an impact of an environment on dynamics of the autonomous vehicle.
  • the one or more uncertain parameters may comprise at least one of a friction coefficient and a drag coefficient having uncertainty due to external environmental conditions.
  • the stochastic system is an individual undergoing rehabilitation, wherein the states characterize at least one of participation, activities, health condition, body functions and structures, environmental factors, and personal factors, and wherein the one or more uncertain parameters comprise gains and time-constants involving interactions between the states in response to rehabilitation control actions.
  • the stochastic system is a wearable robotic system, and wherein at least one uncertain parameter is tunable on a per-user basis.
  • the stochastic system is an industrial system, and wherein at least one uncertain parameter is associated with degradation of the industrial system, and wherein the method further comprises employing updated values of the at least one uncertain parameter and/or its updated uncertainty, obtained during control of the industrial system, to detect a fault associated with degradation of the industrial system.
  • the stochastic system is a building climate control system, and wherein the one or more uncertain parameters comprise at least one of an uncertain parameter associated with external factor and an uncertain parameter characterizing a building-specific factor.
  • a computer-implemented method of determining control actions for controlling a stochastic system according to an adaptive controller comprising: providing control and processing circuity comprising at least one processor and associated memory, the memory comprising a mathematical model of dynamics of the stochastic system, the mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the memory also comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model comprising a mathematical model augmented to dynamics of one or more uncertain parameters; storing, in the memory, an initialized form of a state data structure, the state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system; storing, in the memory, an initialized form of a state covariance data structure, the state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of
  • the stochastic differential-dynamic- programming-based algorithm is based on an algorithm selected from the group consisting of the iterative linear quadratic Gaussian (iLQG) algorithm, the stochastic dual dynamic programming (SDDP) algorithm, and variations thereof.
  • the stochastic differential-dynamic- programming-based algorithm is based on an algorithm selected from the group consisting of dual dynamic programming, the sequential linear quadratic (SLQ) algorithm, and the iterative linear quadratic regulator (iLQR) algorithm, and variations thereof, all being modified to include stochastic terms.
  • an implicit dual controller for controlling a stochastic system
  • the implicit dual controller comprising: control and processing circuity comprising at least one processor and associated memory, the memory comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model based on a mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the augmented mathematical model augmented to include dynamics of one or more uncertain parameters; the memory further comprising: an initialized form of an augmented state data structure, the augmented state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the
  • an adaptive controller for controlling a stochastic system
  • the adaptive controller comprising: control and processing circuity comprising at least one processor and associated memory, the memory comprising a mathematical model of dynamics of the stochastic system, the mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the memory also comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model comprising a mathematical model augmented to include dynamics of one or more uncertain parameters; the memory further comprising: an initialized form of a state data structure, the state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system; an initialized form of a state covariance data structure, the state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system; an initialized form of an augmented state data structure, the augmented state
  • a system comprising: a stochastic physical subsystem; one or more sensors associated with said stochastic physical subsystem for measuring an output associated with said stochastic physical subsystem; and control and processing circuitry operably coupled to said stochastic physical system and said one or more sensors, said control and processing circuitry comprising at least one processor and associated memory, said memory comprising instructions executable by said at least one processor for performing operations comprising: (a) employing an augmented stochastic differential-dynamic-programming- based algorithm to determine control actions for controlling the stochastic physical subsystem, the augmented stochastic differential-dynamic-programming-based algorithm modelling the stochastic system, at least in part, according to a set of states and one or more uncertain parameters, and employing an augmented state vector comprising the states and the one or more uncertain parameters, an augmented covariance matrix that combines a covariance matrix of the states and a covariance of the one or more uncertain parameters; (b) applying the control actions to the stochastic physical subsystem; (c)
  • a method of controlling a stochastic physical system according to an implicit dual controller the stochastic system being modeled, at least in part, according to a set of states and one or more uncertain parameters
  • the method comprising: (a) employing an augmented stochastic differential-dynamic-programming-based algorithm to determine control actions for controlling the stochastic physical system, the augmented stochastic differential-dynamic-programming-based algorithm employing an augmented state vector comprising the states and the one or more uncertain parameters, and an augmented covariance matrix that combines a covariance matrix of the states and a covariance matrix of the one or more uncertain parameters; (b) applying the control actions to the stochastic physical system; (c) measuring an output of the stochastic physical system; (d) processing the output and the control actions to determine an augmented state vector estimate and an augmented covariance matrix estimate; and (e) repeating steps (a)-(d) to determine control actions for a plurality of time steps, such that each time that step (a) is
  • FIG.1A shows a comparison of DDP-like controllers.
  • FIG.1B shows a detailed comparison of DDP-like controllers according to their titles.
  • FIG.1C shows a detailed comparison of DDP-like controllers according to their algorithms.
  • FIG.1D shows a detailed comparison of DDP-like controllers according to their treatment of noise.
  • FIG.1E shows a detailed comparison of DDP-like controllers according to their approximations.
  • FIG.1F shows a detailed comparison of DDP-like controllers according to their regularization.
  • FIG.1G shows a detailed comparison of DDP-like controllers according to their constraints.
  • FIG.2 shows a closed-loop iLQG system diagram (terms are defined in the list of variables).
  • FIG.3 shows the tree of scenarios considered in multi-stage NMPC [59].
  • FIGS.4A, 4B and 4C show flow charts for the three variations of iLQG discussed in the present disclosure.
  • FIG.4A is the same as FIG.2 and is shown here for ease of comparison, while FIGS.4B and 4C illustrate example adaptive and implicit dual control methods, respectively. Terms defined in the list of variables.
  • FIG.4A is the same as FIG.2 and is shown here for ease of comparison, while FIGS.4B and 4C illustrate example adaptive and implicit dual control methods, respectively. Terms defined in the list of variables.
  • FIG.4A is the same as FIG.2 and is shown here for ease of comparison, while FIGS.
  • FIG. 5A shows a control comparison between dual and adaptive iLQG and MS-SP- NMPC on linear example.
  • FIG.5B shows a state comparison between dual and adaptive iLQG and MS-SP-NMPC on a linear example.
  • FIG.5C shows a parameter comparison between dual and adaptive iLQG and MS-SP- NMPC on a linear example.
  • FIG.5D shows a cost comparison between dual and adaptive iLQG and MS-SP-NMPC on a linear example.
  • FIG.5E shows a control comparison between dual and adaptive iLQG on a time-varying parameters example.
  • FIG.5F shows a state comparison between dual and adaptive iLQG on time-varying parameters example.
  • FIG.7C shows a parameter comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller.
  • FIG.7D shows a parameter ratio comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller.
  • FIG.7E shows a cost comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller.
  • FIG.8A shows a model-reference adaptive control flowchart for Rohr’s example [55].
  • FIG.8B shows a control comparison between dual and adaptive iLQG and model- reference adaptive control on an unmodelled dynamics example.
  • FIG.8C shows a control comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example for first seven seconds.
  • FIG.8D shows a state comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example.
  • FIG.8E shows a state comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for first seven seconds.
  • FIG.8F shows a parameter comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example.
  • FIG.8G shows a parameter comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example for first seven seconds.
  • FIG.8H shows a parameter ratio comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example.
  • FIG.8I shows a parameter ratio comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for first seven seconds.
  • FIG.8J shows an output comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example.
  • FIG.8K shows an output comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example for first seven seconds.
  • FIG.8L shows a cost comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example.
  • FIG.8M shows a cost comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for first seven seconds.
  • FIG. 9A shows an example implicit dual iLQG control system.
  • FIG. 9B shows an example implicit dual iLQG control system with a joint estimation filter.
  • FIG. 9C shows an example implicit dual iLQG control system with dual estimation filters.
  • FIG. 9D shows an example implicit dual system employed for system with fully observable states.
  • FIG. 9E illustrates how, within the inner loop of the method, the model structure and parameters are provided to the iLQG algorithm.
  • FIG. 9F shows various example approaches to model generation.
  • FIG. 10 shows a Venn Diagram schematically illustrating relationships between different types of differential dynamic programming-based algorithms.
  • FIG.11 shows an example system with an implicit dual controller.
  • FIG.12 shows parameters for the example AM2 model.
  • FIG.13A shows a control comparison between dual and adaptive iLQG and dual MS- SP-NMPC on an AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.13B shows a state comparison between dual and adaptive iLQG and dual MS-SP- NMPC on the AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.13C shows a parameter comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.13D shows a biogas production comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the AM2 anaerobic digestion model with two un-certain parameters.
  • FIG.13E shows a cost comparison between dual and adaptive iLQG and dual MS-SP- NMPC on the AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.14 shows parameter values for the AM2 comparison with seventeen uncertainties.
  • FIG.15A shows a control comparison between dual and adaptive iLQG on an AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.15B shows a state comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.15C shows a parameter comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.15D shows a biogas production comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.15E shows a cost comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.16 shows a SIR model compartmental diagram [42].
  • FIG.17A shows a SIDARTHE model compartmental diagram [29].
  • FIG.17B shows parameters for the SIDARTHE model.
  • FIG.18 shows a partial compartmental diagram for considering the impact of an over- whelmed ICU [29].
  • FIG.19A shows a control comparison between dual and adaptive iLQG and dual MS- SP-NMPC on a SIDARTHE COVID-19 with two uncertain parameters.
  • FIG.19B shows a state comparison between dual and adaptive iLQG and dual MS-SP- NMPC on the SIDARTHE COVID-19 model with two uncertain parameters.
  • FIG.19C shows a parameter comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the SIDARTHE COVID-19 model with two uncertain parameters.
  • FIG.19D shows a cost comparison between dual and adaptive iLQG and dual MS-SP- NMPC on the SIDARTHE COVID-19 model with two uncertain parameters.
  • FIG. 20A shows results from 100 seeded runs of the 2 parameter SIDARTHE COVID-19 model with several rolling horizon lengths.
  • FIG. 20B shows a frequency (%) of final cost of 100 seeded runs of the iLQG algorithms for the 2 parameter SIDARTHE COVID-19 model with a rolling horizon length of 40.
  • FIG. 20C shows a distribution of the simulation time required for 100 seeded runs of the iLQG algorithms.
  • FIG.21 shows parameter values for the SIDARTHE comparison with sixteen uncertainties.
  • FIG.22A shows a control comparison between dual and adaptive iLQG on a modified SIDARTHE COVID-19 model with sixteen uncertain parameters.
  • FIG.22B shows a state comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters.
  • FIG.22C shows a parameter comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters.
  • FIG.22D shows a cost comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters.
  • the terms “comprises” and “comprising” are to be construed as being inclusive and open ended, and not exclusive. Specifically, when used in the specification and claims, the terms “comprises” and “comprising” and variations thereof mean the specified features, steps or components are included. These terms are not to be interpreted to exclude the presence of other features, steps or components.
  • the term “exemplary” means “serving as an example, instance, or illustration,” and should not be construed as preferred or advantageous over other configurations disclosed herein.
  • any specified range or group is as a shorthand way of referring to each and every member of a range or group individually, as well as each and every possible sub-range or sub-group encompassed therein and similarly with respect to any sub-ranges or sub-groups therein. Unless otherwise specified, the present disclosure relates to and explicitly incorporates each and every specific member and combination of sub-ranges or sub-groups.
  • the iterative linear quadratic Gaussian (iLQG) method is a powerful control technique due to its ability to handle nonlinear and stochastic systems with multiplicative noise but has not been extended to handle systems with uncertain parameters in either an adaptive or dual manner.
  • iLQG which can calculate locally-optimal control policies for nonlinear stochastic systems, is based on continuous state space with the use of derivatives of a linearized system about a nominal control trajectory and is related to Pontryagin’s maximum principle.
  • iLQG is not a dual or adaptive control algorithm, it can handle nonlinear and stochastic systems with many states through its derivative-based approach in continuous state space.
  • the present inventors realized that by modifying iLQG to treat the uncertain parameters as uncertain states, the resulting adaptive and dual iLQG control algorithm can predict how changes to the inputs and states can result in future reductions in the parameter uncertainty and therefore increase overall performance (lower costs).
  • dual iLQG and variations thereof employing other stochastic DDP-based algorithms) can identify changes to the inputs that can decrease parameter uncertainty, and although these actions have an associated cost, they decrease the overall cost over the control trajectory.
  • Adaptive and dual iLQG represents a fast (due to the linearization of the system) and feasible (due to working with derivatives about a nominal state-control trajectory) solution to the implicit dual control of small and large systems while avoiding Bellman’s curse of dimensionality. Accordingly, in present work, an existing derivative-based control method is extended to handle systems with uncertain parameters in either an adaptive or dual manner.
  • Adaptive iLQG To extend iLQG to uncertain systems in an adaptive manner, two changes are made to the closed-loop iLQG approach shown in FIG.4A (and FIG.2). First, the initial estimates of the uncertain parameters are to the iLQG inner loop as constants.
  • the augmentation of the state vector with the parameters for the closed-loop filter is an approach that is known as joint simultaneous state and parameter estimation [49].
  • an augmented constants vector c a is created similar to the augmented state vector, through the concatenation of the constants and the parameters, Moreover, when the processing the augmented forms of the state vector and the covariance matrix by the filter in the outer loop, an augmented form of the state dynamics where ( , ) is the parameter dynamics and ( , ) is the augmented system dynamics.
  • This approach allows adaptive iLQG to update its parameter estimates in the outer loop after getting new measurements and pass them to the inner loop iLQG algorithm as constants, which is known as the certainty equivalence principle.
  • the system dynamics are also adapted in [48] using Locally Weighted Projection Regression.
  • Dual iLQG To extend this adaptive iLQG approach to be dual, the uncertainty associated with the parameters influences the control policy. Therefore, instead of treating the parameters as constants in the inner loop iLQG algorithm, the parameters are treated as states and an augmented state vector is formed as shown in equation (3.1). An augmented state covariance is also created as shown in equation (3.2).
  • the present dual iLQG method involves the processing of the augmented state dynamics in the inner loop. The inclusion of the parameter dynamics makes dual iLQG able to handle time-varying parameters. These changes are represented in the system diagram in FIG.4C.
  • the augmented state vector is employed when executing the inner loop 110 according to the augmented form of the state dynamics, such that the uncertain parameters are treated as states by the inner loop, with the uncertain parameters governed by the parameter dynamics prescribed by the augmented state dynamics.
  • the iLQG algorithm treats the parameters as unmeasured states and allows the control algorithm to predict how changes to the inputs and states can result in future reductions in the parameter uncertainty, through the backward pass of the cost-to- go function, to lower the total cost of the control trajectory.
  • the parameter uncertainty influences the control actions through the inner loop Kalman filter gain, which impacts the estimates of the augmented state vector and the cost function at each time step.
  • a first augmented data structure is an augmented state data structure, provided in a matrix, for example, a 1-D array (or alternatively in a multi-dimensional array), which is a concatenation of the data of the state vector and the data elements of the uncertain parameter vector, as shown in equation (3.1).
  • a second augmented data structure is an augmented state covariance matrix, which is initialized as a combination of the data elements from the state covariance matrix and data elements from the uncertain parameter covariance matrix in a block diagonal manner, as shown in equation (3.2).
  • This new data structure contains the confidence information for the associated augmented state vector, and when used with the augmented state vector, allows for the states and parameters to be treated as a single stochastic entity in whatever part of the algorithm it is applied.
  • the augmented state dynamics are also encoded, in functional logical form, into the memory of the computer system for processing, in which the state dynamics function and the parameter dynamics function are concatenated, as shown in equation (3.4).
  • This encoded functional form allows the modelled dynamics of the augmented state vector to be calculated as a single dynamic system.
  • the augmented state and augmented state covariance data structures are employed in the outer-loop filter of the closed-loop iLQG algorithm to give the algorithm the new ability of being adaptive, thus creating the new adaptive iLQG algorithm.
  • These two data structures allow the outer loop filter to treat the parameters as states, and through the use of the information contained in the measurements from the true system, leads to updated estimates of the augmented state vector and augmented state covariance matrix for the next iteration of the algorithm.
  • these data structures allows for the estimates of the uncertain parameters to be updated for each iteration of the algorithm, which can lead the adaptive iLQG algorithm to perform better than the closed-loop iLQG algorithm in terms of the evaluation of the total cost function.
  • these data structures take up more space on the computer memory than the data elements from the state vector or state covariance matrix alone, the adaptive ability that they create when used in the closed-loop iLQG algorithm can allow for the computer to produce a better solution for controlling the physical system than the closed-loop iLQG algorithm. Additionally, the adaptive iLQG algorithm can produce this better solution with a similar number of processor cycles, and therefore can be more computationally efficient than the closed-loop iLQG algorithm.
  • an augmented covariance data structure that includes the covariance matrix of the states and the covariance matrix of the one or more uncertain parameters will include the data elements corresponding to those present in the covariance matrix of the states and those present in the covariance matrix of the one or more uncertain parameters, but need not be provided in a standard covariance matrix form, provided that the data elements of the augmented covariance data structure can be accessed and processed by the computer hardware implementing the method.
  • the augmented state data structure and augmented state covariance matrix data structures are employed outer-loop filter and in the inner-loop iLQG algorithm along with the augmented state dynamics function data structure to give the closed-loop iLQG algorithm the new ability of being dual, thus creating the new dual iLQG algorithm.
  • the augmented state data structure and augmented state covariance data structures allow the outer-loop filter to treat the parameters as states, and through the use of the information contained in the measurements from the true system, leads to updated estimates of the augmented state vector and augmented state covariance matrix for the next iteration of the algorithm.
  • the use of the augmented state data structure in the inner-loop iLQG algorithm allows the dual iLQG algorithm to treat the parameters as unmeasured states and allows the dual iLQG algorithm to predict how changes to the control inputs and states can result in future reductions in the parameter uncertainty, through the backward pass of the cost-to-go function, and allow for a lower total cost of the control trajectory.
  • the parameter uncertainty that is encoded in the augmented state covariance data structure influences the control actions through the inner loop Kalman filter gain, which impacts the estimates of the augmented state vector data structure and the cost function at each time step.
  • dual iLQG can identify changes to the inputs that can decrease parameter uncertainties while also decreasing the total cost of the control trajectory.
  • the use of these data structures in this way can lead the dual iLQG algorithm to perform better than the closed-loop iLQG and adaptive iLQG algorithms in terms of the evaluation of the total cost function.
  • These data structures allow for a dual approach that is derivative-based and can handle applications with higher numbers of states and parameters without the common issue of Bellman’s curse of dimensionality that limits the use of conventional implicit dual control algorithms on computer systems.
  • the curse of dimensionality is a phenomenon in which the size of a stochastic computational problem, in terms of memory and/or processor requirements, grows exponentially with a linear increase in the number of states and parameters, as described previously for the case of dual SP-MS-NMPC.
  • the derivative-based approach of the closed-loop iLQG algorithm allows it to search along a nominal control trajectory for where changes to the control trajectory are likely to reduce the total cost.
  • the use of these data structures takes this existing closed-loop iLQG algorithm and significantly improves it by making it dual, allowing the dual iLQG algorithm to use the same computer resources, in terms of memory and processing, to solve larger problems than other implicit dual control approaches.
  • Dual iLQG is implicitly dual through the augmented state and covariance data structures that allow it to identify changes to the control trajectory that can decrease parameter uncertainty, through the use of derivatives of the cost function. Although these control changes may have an associated cost, they decrease the overall cost of the control trajectory.
  • a seeding method may be incorporated to initialize the optimization problem with different sets of initial conditions.
  • the nominal control trajectory is the variable that is used to initialize the optimization and is iteratively improved upon through successive runs of the inner loop iLQG algorithm [35].
  • This seeding process increases the performance of the dual iLQG algorithm, it linearly increases the computation time for each time step in which it is used.
  • Moving horizon approaches When performing example implementations of the adaptive and implicit dual control methods, two different example moving horizon options were coded into the dual iLQG algorithm: shrinking and rolling horizon approaches. In the shrinking horizon approach, the controller simulates the system and solves for the control actions for the entire time horizon, from the present time step to the final time step.
  • the time horizon is reduced by a single time step and the inner loop of the algorithm is reinitialized. This process repeats until the final time step is reached, meaning that the calculation effort for each time step decreases as the algorithm progresses.
  • the rolling horizon approach has a fixed time horizon, such that after each inner loop of the algorithm is complete, the next outer loop iteration is initialized by shifting this fixed time horizon one step forward, dropping the present time step and adding a new time step on the end. Once the final time step is reached, although the controls and system responses have been determined for future time steps, they are not considered in the results.
  • the true value of the constant parameter vector is [125, 50], while its initial estimate is [100, d of [8100, 4500; 4500, 5625].
  • This system is simulated over 6 time steps that are 0.05 seconds long. Both states are measured with a noise scaling factor of 10 - 2 , and as the MS-SP-MPC does not consider the states to be uncertain, the noise scaling factors for dual iLQG were set to 10 - 15 for both the state and parameter dynamics.
  • the cost function can be minimized by maintaining u2 at zero and driving x1 to 5 using u1, as u2 does not influence x1.
  • a dual control method on the other hand will temporarily use u 2 to reduce the uncertainty associated with d2, even though it will incur a cost, to reduce the overall cost.
  • Adaptive MS-SP-MPC, dual MS-SP-MPC, and dual iLQG were compared using this system, and the resulting controls are in FIG.5A. Starting with u 2 , it can be seen that the adaptive MS-SP-MPC maintains u2 at zero as expected.
  • the dual MS-SP-MPC has a non-zero value of u 2 for the first time step only, while the magnitude of u 2 for the dual iLQG controller gradually steps down to zero over four time steps.
  • the dual iLQG algorithm took an average of 3.5 seconds to run, while the adaptive iLQG algorithm took 0.4 seconds, the adaptive MS-SP-MPC algorithm took 3.0 seconds and the dual MS-SP-MPC algorithm took 75 seconds.
  • the dual iLQG controller significantly outperformed dual MS- SP-MPC by being less cautious with its initial value for u1 and by continuing to u2 after the first time step to improve its estimate of the uncertain parameters. Robustness to time-varying parameters
  • parameters that are modelled as constant may vary over time. The ability of a controller to achieve the desired objective in the face of uncertain parameter dynamics can be described as the controller’s robustness to time-varying parameters.
  • the previous linear example is modified to have time-varying parameters to explore dual iLQG’s robustness to time-varying parameters.
  • the system dynamics are the same as shown in equation (3.5), and the initial state and parameter estimates also are the same and both the states and the controls remain unconstrained.
  • the initial true value of the parameters is the same as before, but the true parameters vary with dynamics of: ) , (3.8) while the controller believes that the parameters remain constant over time.
  • the time step of 0.05 seconds was kept the same as the previous example, instead of simulating the system over six steps, twenty time steps were used to show the impact of the time-varying parameters.
  • FIG.5G shows each iLQG algorithm’s estimates of the time-varying parameters.
  • the dual controller quickly converged to the true parameter values, while the adaptive iLQG controller took a couple of extra time steps to converge to the true value of the second parameter, and had a small offset error in its estimate of the first parameter.
  • the dual controller’s estimates for the parameters were better than the adaptive controller’s estimates, and both controllers were better able to estimate the second parameter.
  • FIG.5H shows each iLQG algorithm’s cumulative cost over the control horizon. Although the dual iLQG controller has a higher cost after the first time step, it subsequently maintains a lower cumulative cost than the adaptive iLQG algorithm.
  • the dual iLQG algorithm’s higher cost for the first time step is due to its increased distance from the goal value of 5 for the first state and its use of the second control action to probe the system. This probing action allows the dual controller to have a final cost that is 16% lower than the adaptive controller.
  • 100 initial control sequence seeds were run, with each element of each sequence drawn from a normal distribution with a mean of zero and a variance of 0.01. This variance was sufficient to generate a diversity of results for these simulations.
  • FIGS.6A and 6B A histogram of the final costs and a box plot of the simulation times for each of the algorithms is shown in FIGS.6A and 6B. In FIG.6A, the expected successive improvement between the three algorithms can be seen.
  • the best iLQG solution has roughly the same value as the median adaptive iLQG solution, and likewise the best adaptive iLQG solution has roughly the same value as the median dual iLQG solution.
  • the variance of the solutions visibly decreases with the change in the algorithm, with dual iLQG giving more consistent results than the other two algorithms.
  • the dual iLQG controller achieved a 16% reduction in the cost function compared to the adaptive controller. This difference in performance was primarily due to dual iLQG’s increased ability to determine the uncertain parameters, although both dual and adaptive iLQG showed robustness to time-varying parameters in this example.
  • the consistent improvement of dual iLQG over adaptive iLQG and adaptive iLQG over iLQG was shown, as well as that they have comparable run times.
  • Importance of compensating for multiplicative noise Dual iLQG can handle multiplicative noise where Bar-Shalom and Tse’s wide- sense dual control is limited to additive noise and MS-SP-NMPC does not consider process noise, only measurement noise.
  • multiplicative noise appears in many applications such as the control of prostheses using myoelectric signals [54], robot motion planning when using sensors with distance-dependent errors [14], stochastic fluid dynamics [12], and power grids where renewable energy represents a significant portion of total power generation [19].
  • the importance of compensating for multiplicative noise in the controller was demonstrated using the simple system dynamics shown in equation (3.5) with multiplicative noise.
  • dual iLQG was used twice, but in one case the multiplicative noise was accounted for in the controller, and in the other case it was not. This approach was used to keep all other variables except the one in question the same and is equivalent to using a controller that can only deal with additive noise when interacting with systems that have multiplicative noise.
  • the controller that is compensating for the multiplicative noise uses a smaller first u2 control action than the other controller, before making a larger control action in the second step. This action allows the compensating controller to gain some information on the system before having additional noise from a larger control step.
  • the controller that is not compensating for the multiplicative noise takes the opposite strategy in the first two time steps. The remaining u 2 and the u 1 results are relatively similar between the controllers.
  • FIG.7B shows the effect of the controls on the states, where the influence of the objective function being the square of the distance between the first state and a value of five can be seen.
  • the non-compensating controller overshoots to 5.45 at the end of the first time step, while the compensating controller overshoots to 5.31.
  • the non-compensating controller has an x 1 value of 5.31, while the compensating controller has a value of 5.03.
  • the non-compensating controller then takes three more time steps to get lower than 5.03, while the compensating controller stays within ⁇ 0.04 of five.
  • a plot of the estimates of the two uncertain parameters is shown in FIG.7C. Both controllers start to converge to estimates lower than the true value of the parameters, the non- compensating controller from above and the compensating controller from below. As with the unmodelled dynamics example, in this case it is not the absolute parameter values that are important for the controller, but the ratio of the parameter values.
  • FIG.7D shows the ratio of the parameters for the two controllers over the simulation.
  • FIG.7E shows the cumulative cost of these controllers over the simulation period.
  • the major changes in cost occur in two time steps, with the compensating controller’s cost leveling out after the first time step and the non- compensating controller’s cost leveling out after the third time step.
  • the controller that is compensating for the multiplicative noise has a total cost that is 62.1% lower than the controller that does not compensate for the multiplicative noise.
  • This section demonstrates the importance of using a dual controller that can properly compensate for multiplicative noise in systems where multiplicative noise exists. In the simple example presented, the difference in cost was significant at 62.1%.
  • Rohr’s example is used to illustrate the effect of unmodelled dynamics, showing that, unless a dead zone is used, the given adaptation law causes the first-order system to become unstable when driven to a constant reference in the face of a sinusoidal measurement noise.
  • This simple example is used here to demonstrate dual iLQG’s robustness to unmodelled dynamics in this case.
  • Rohr’s example considers a desired performance that is described by a reference system that is a first-order system with a transfer function of: (3.11) where and are parameters both with a value of 3.
  • the true system is a third-order where and are the true system parameters with values of 2 and 1, respectively.
  • model-reference adaptive control is used to control this system and is implemented as shown in FIG.8A.
  • model-reference adaptive control the desired system behaviour is specified through the use of a reference model. The controller then attempts to make the true system’s output equal to the output from the reference model by adjusting the input to the true system based on a set of known regressors and adapting parameters.
  • model-reference adaptive control to adapt a set of compensating parameters to account for the true system’s behaviour and impose the desired behaviour onto the system. This is different than the dual iLQG approach which is to adapt its estimates of the true system’s parameters to a sufficient level to minimize the desired cost function.
  • a control law of + (3.17) was used where r is the input reference and y is the output of the true system after the addition of noise, and the a terms are the corresponding adaptation parameters.
  • the measurement noise scaling factors were 0.5 and the noise associated with the state and parameter dynamics was 10 - 15 .
  • the sinusoidal noise from the problem was implemented on the true system, which the algorithms had no knowledge of other than the 0.5 noise scaling factor.
  • FIGS.8D and 8E show the true and estimated states for all three controllers over the entire 70 seconds and for just the first seven seconds, respectively.
  • the response to the high-frequency control actions from the iLQG controllers can be seen in the true dynamics for the first two states, while their estimates remained at zero with increasing uncertainty because there were no dynamics for those states in the modelled system and they were unmeasured.
  • the true state response for the third state also reflected the high-frequency input but was very small compared to the estimated state values due to the 229 scaling factor in the true system output.
  • FIGS. 8F and 8G show that for the iLQG controllers, after an initial adaptation period, the estimates of the parameters for the controllers remained largely constant and that there was no significant parameter drifting occurring although the estimates of the second parameter did decrease slightly over the 70 second simulation.
  • the ratio of can be plotted, as shown in FIG.8H for the entire time horizon, and for the first seven seconds in FIG.8I.
  • FIG.8I shows that dual iLQG had a better estimate of this ratio for the first 1.5 seconds, after which adaptive iLQG is largely better.
  • FIG.8H also more clearly shows that the slight drifting of the second parameter in FIG.8F resulted in the ratio of the parameter estimates from both controllers nearly converging to the true value by the end of the 70 second simulation.
  • Model-reference adaptive control oscillates around the true parameter ratio, but over time diverges as the parameter drift.
  • FIGS. 8J and 8K show the full and initial plots of the modelled and true system output.
  • the modelled system output is shown to approach 2.01, very close to the desired value of 2, while the true system output approached 1.91. Since the modelled system had practically reached the desired output reference, the controllers’ input stopped changing and the true system’s output also stopped changing, but not necessarily at the desired output reference due to the unmodelled dynamics and measurement noise.
  • Model- reference adaptive control oscillates around the desired output of 2 but diverges quickly 60 seconds into the simulation. The stage and total costs for the adaptive and dual iLQG algorithms are shown in FIGS.
  • Dual iLQG will be applied to a complex biochemical system, an anaerobic digestion, and a complex public policy control problem, the spread of COVID-19 in a population.
  • the focus of this disclosure is the control of systems with uncertain parameters in such a way that the reduction of uncertainty is implicit in the minimization of a given cost function.
  • These dual goals of system identification and cost minimization are often at odds with each other, and this tension creates a set of three features that characterize dual control.
  • Dual control demonstrates caution, minimizing the magnitude of control actions when uncertainties are high, probing, varying the control actions to gain information about the uncertain parameters, and selectiveness, only seeking to gain information on those parameters which will are likely to cause a reduction in future costs [28].
  • the optimization problem in MS-SP-NMPC grows exponentially with increasing uncertain parameters, and its current implementation can only handle two uncertain parameters.
  • This issue of the problem size increasing exponentially with the number of states is known as Bellman’s curse of dimensionality [63], and this limits existing implicit dual methods as they do not take a derivative-based approach in continuous state space.
  • the dual iLQG methods presented in this disclosure fill this gap by extending the derivative-based iLQG method to be implicitly dual. Both the dual and adaptive iLQG methods presented were demonstrated to be robust to time-varying parameters, and unmodelled dynamics, and can handle multiplicative noise.
  • Dual iLQG is applicable to and nonlinear control problems with many uncertain parameters, as shown in its application to the control of anaerobic digestion and COVID-19 in the Examples provided below. Since MS-SP-NMPC is only able to handle systems with two uncertain parameters, simplified versions of these systems were used to compare dual and adaptive iLQG with MS-SP-NMPC, and dual iLQG outperformed MS-SP- NMPC in all but one case. When the systems with all of the uncertain parameters were used, dual iLQG consistently outperformed adaptive iLQG.
  • Example iLQG implementation Selected Aspects of Example iLQG Implementation Some aspects of the Example iLQG implementation are as follows: 1. A high-speed adaptive iLQG algorithm was developed that estimates individual parameters of a system instead of the entire dynamic model of the system. The parameters were treated as constants in the iLQG algorithm through the creation of an augmented constants vector, but treated as states in the outer loop filter through the creation of an augmented state vector and augmented state covariance matrix. 2. Bellman’s “curse of dimensionality” that limits conventional implicit dual control algorithms was overcome by extending the iLQG control algorithm to be applicable to dual control problems.
  • Dual iLQG Algorithms While the preceding section of the present disclosure disclosed an example implementation of a dual iLQG algorithm based on the configuration shown in FIG.4C, it will be understood that the specific implementation shown in FIG.4C is not intended to be limiting, and that different configurations of dual iLQG-based algorithms that employ augmented states and an augmented covariance matrix, may be implemented without departing from the intended scope of the present disclosure. Furthermore, as discussed in additional detail below, the present dual implicit augmented iLQG algorithms may be adapted according to a wide variety of stochastic differential-dynamic-programming-based algorithms.
  • the present section contemplates some example and non-limiting implementations involving different configurations involving an iLQG-based algorithm.
  • the states are augmented to generate an augmented state vector that includes the uncertain parameter(s), such that each uncertain parameter is treated as a state by the iLQG algorithm.
  • the state covariance matrix is augmented with the covariance matrix of the uncertain parameter(s) to form an augmented covariance matrix that is employed by the iLQG algorithm.
  • FIG.9A illustrates an example closed-loop dual iLQG algorithm similar to that shown in FIG.4C.
  • the state and parameter estimates are concatenated into an augmented state vector and the covariances are likewise combined in a block diagonal fashion.
  • the figure shows the closed-loop, dual augmented implementation of the iLQG algorithm, which shows an outer loop of the dual iLQG algorithm, and also includes an iLQG box that represents an inner loop.
  • the iLQG box representing the inner loop of the algorithm, and which employs augmented states and covariance matrices, iteratively converges to an updated control trajectory and policy that contains the dual features of caution, probing, and selectiveness.
  • This is the core of the algorithm, the outer closed-loop implementation shown in the figure, combined with the augmentation, provides the implicit dual functionality of the overall algorithm.
  • the core iLQG algorithm may be implemented by employing the equations in the iLQG section of the present disclosure, but using the augmented state and covariance matrix.
  • the algorithm when implemented, may include the following steps, as previously described: (i) forward pass; (ii) simulating system dynamics using the given control trajectory; (iii) obtaining state, control, and cost derivatives at each time step – used to linearize the system; (iv) employing an inner loop estimator (filter); (v) performing a backward pass; (vi), approximating the cost-to-go function at each time step; (vii) obtaining a control policy for each time step; (viii) obtaining a new control trajectory; (ix) applying the control policy to state estimates for each time step to obtain control deviations; and (x) adding these control deviations to the previous control trajectory.
  • a zero-order hold may be employed to convert the control trajectory from being discrete to continuous in
  • the first control action may then be sent to the true system.
  • the new state of the system is sampled and measured, and the measurement is then filtered, utilizing other system information, to create an updated estimate of the augmented state vector and its covariance, as shown in the outer loop portion of the figure.
  • the control trajectory horizon is then updated, and after a time delay, the iLQG algorithm is run again to determine the control action for the next time step.
  • the outer loop filter(s) can be any algorithm that can provide updated estimates of the states and parameters and their covariances given at least a set of measurements and prior estimates of the states and parameters and their covariances.
  • this filtering would be performed with some type of recursive Bayesian estimator such as a Kalman or particle filter, particularly one suited to nonlinear systems.
  • These filters typically rely on inputs including the dynamic equations, the measurement function, and noise estimates and covariances along with the measurements and prior estimates of the states and parameters and their covariances to provide updated estimates of the states and parameters and their covariances.
  • a single filter could be used, known as joint filtering, or separate filters could be used, known as dual filtering.
  • FIG.9B is the same as FIG.9A other than the fact that it explicitly states that a joint filter is being used.
  • the augmented state vector can be separated before filtering and the updated estimates can be concatenated after the filtering.
  • FIGS.9A and 9B show example implementations in which a single filter is employed to estimate the augmented state vector and covariance matrix based on the measurements, in other example implementations, the measurements may be filtered using separate filters for the systems states and the one or more uncertain parameters, utilizing other system information. An example implementation of such a system is shown in FIG.9C.
  • FIGS.9A-9C illustrate the use of a moving horizon implementation, where the moving control horizon is updated with each time step in the outer loop.
  • the horizon may be updated according to several different approaches, including, but not limited to, moving horizon and rolling horizon approaches.
  • FIGS.9A-9C show additional features of example implicit dual iLQG algorithms relative to FIG.4A, including the covariances of states and uncertain parameter(s), and the augmented covariance matrices, and a loop to show that the control trajectory is reused between iterations.
  • the certain parameters, or "constants" c are also shown as optionally being provided as a function of time.
  • the figures also include dashed lines to show the initialization of the algorithm. Initialization may be performed, for example, with the estimates of the states and parameters and their covariances, certain variables, and a seed control trajectory or policy, as described in the previous section.
  • FIG.9A-9C do not show all of the inputs provided to the components within the inner and outer loops of the augmented iLQG algorithm.
  • additional inputs such as, but not limited to, a dynamic model of the system, a cost function, noise covariances, and a measurement model, are not shown in the figure despite being implemented.
  • the filter block also does not show all necessary inputs to the filter.
  • FIGS.9A-9C show non-limiting example cases in which the augmented state vector is not broken up (de-augmented) before each new iteration of the outer loop.
  • the augmentation can be performed according to many different implementations.
  • the uncertain parameters could be the first component in the concatenation of the state and parameter vectors, or the elements of the parameter vector could be interspersed with the elements of the state vector, provided that the correlating covariances are augmented in the same order (but in a block diagonal fashion instead of concatenated).
  • the example implementations shown in FIGS.9A-9C may be well suited for cases in which at least some of the states are observable, with the filter in the outer loop providing estimates of the unobservable states based on the measurements.
  • a partially observable system is one in which some of the states can be inferred from measurements, and an unobservable system is one on which none of the states can be inferred from measurements.
  • a system can be fully observable, with all states being determinable from measurements without the need for a filter.
  • a filter can be employed to provide estimates of the one or more uncertain parameters, and the states can optionally be computed from the measurements in the absence of the use of a filter.
  • FIG.9D An example implementation of such a system is illustrated in FIG.9D. As shown in the figure, the new state of the system is sampled and measured, and the measurement is then filtered to create an updated estimate of the one or more uncertain parameters, utilizing other system information. The states are determined by a function of the measurement(s) and then concatenated into the updated augmented state vector and updated augmented covariance matrix.
  • the filter employed in the outer loop to update the estimates of the parameters does not strictly require the use of the state dynamic model.
  • an Extended Kalman Filter were used, it contains a term that is the derivative of the states with respect to the parameters which would require the state model, but a Sigma Point Kalman Filter does not require the state dynamics.
  • Gaussian Process regression or other machine-learning techniques can be used to formulate the system dynamics [28, 34].
  • measurements from the system are used to estimate the relation between the change of the states over time and the values of the states, the control inputs, a set of constants, and a set of parameters.
  • This estimate of the system dynamics can then be used in the dual iLQG algorithm in place of a white box dynamic model.
  • FIG.9E illustrates how, within the inner loop 110, the model structure and parameters are provided to the iLQG algorithm.
  • FIG.9F shows various example approaches to model generation, including, for example, data-driven (machine learning approaches), such as regression-based approaches.
  • One non-limiting example of a regression- based approach to model generation is the Sparse Identification of Non-linear Dynamics (SINDy) algorithm.
  • SINDy Sparse Identification of Non-linear Dynamics
  • the ability of state constraints to be imposed on the system could be improved by combining this work with the augmented Lagrangian method for state constraints in differential dynamic programming [24].
  • This is an established method for imposing state constraints in differential-dynamic-programming-based algorithms.
  • the adaptive and implicit dual methods described above with reference to FIGS.4B and 4C and FIGS.9A-9D show the newly determined control action generated by the execution of the inner loop, this control action may not be applied to the system in some cases.
  • a user may overrule the control action, applying a different control action between successive inner loop executions, or choosing to not apply any control action at all between successive inner loop executions.
  • the control policy from the last completed iteration can be used to calculate the new control action instead of using the second element of the control trajectory from the previous (outer loop) iteration of the algorithm.
  • This updated state estimate could come from filtering a system measurement, allowing for the generation of a new control trajectory.
  • An estimation of the time to measure the system and filter the data to obtain a new state estimate could be employed to determine at what point this use of the previous control policy was necessary.
  • control action may be determined, from the updated control policy that is provided by executing the inner loop, based on the state estimates employed as inputs to the inner loop, or, alternatively, based on the update values of the states determined within the outer loop, after performing the measurement of the system output.
  • the parameter dynamics employed in the augmented mathematical model of the system dynamics provides a functional description that models how the uncertain parameters are expected to evolve over time.
  • This model is a set of equations that could be informed by knowledge of the physical system or generated analytically with the use of data and it describes the rate of change of each of the uncertain parameters.
  • This model is not expected to be a prefect reflection of reality, but higher fidelity models would allow for better results from the algorithm.
  • the parameter dynamics are used to predict the values of the parameters in the future in order to ultimately reduce the cost of the control trajectory, and to update the estimates of the parameters in the present time step using measurements of the true system.
  • the parameter dynamics may model one or more uncertain parameters as constants, and in such cases, the for those parameters would reflect zero rate of change over time as per the modeled dynamics.
  • the parameter dynamics may prescribe a constant uncertain parameter, which nonetheless changes its value as successive iterations and control actions are performed, as the implicit dual method takes control actions that refine the uncertain parameter.
  • the preceding example embodiments involving an implicit dual control algorithm each employ an augmented iLQG algorithm as the inner loop of the overall closed-loop algorithm, with an augmented state vector and augmented covariance matrix as described above. It will be understood, however, that a wide variety of alternative algorithms may be employed in the inner loop, provided that the algorithm is based on differential dynamic programming and is adapted for stochastic systems.
  • Such a broader class of algorithms that are augmentable, as per the present method of augmentation of the state vector with uncertain parameters, and the augmentation of the state covariance matrix with the uncertain parameter covariance matrix, are henceforth referred to as stochastic differential-dynamic-programming-based (DDP-based) algorithms.
  • this broad class of augmented algorithms are referred to as augmented stochastic differential-dynamic-programming-based algorithms.
  • stochastic differential-dynamic-programming-based algorithms that can be augmented, and employed in a closed loop control structure as described above, to obtain an implicit dual controller, include iLQG and stochastic DDP (SDDP), and variations thereof.
  • Variations of other differential-dynamic-programming-based algorithms that are not stochastic can be made to incorporate stochastic terms such that they can be augmented, and employed in a closed loop control structure as described above, and include DDP, the Sequential Linear Quadratic (SLQ) algorithm and the Iterative Linear Quadratic Regulator (iLQR) algorithm, and variations thereof.
  • DDP the Sequential Linear Quadratic
  • iLQR Iterative Linear Quadratic Regulator
  • the iLQG inner loop algorithm in FIGS.9A-9D may be substituted with another type of stochastic differential-dynamic-programming-based algorithm, such as SDDP and a stochastic variation of iLQR, to obtain another implementation of an implicit dual controller.
  • SLQ was developed as a variation in the 2000s, which does not use the exact Hessian in step (ii), whereas the DDP algorithm uses the exact Hessian when calculating B in step (ii).
  • the iLQR algorithm is another variation that was independently developed in the 2000s.
  • the difference between iLQR and DDP is that in step (iv) a nonlinear system can be used in iLQR, whereas in DDP the linearized version is used. All three of these techniques are similar, with each having minor differences that can affect efficiency, depending on the application.
  • the state dynamics are augmented with the parameter dynamics to create the augmented state dynamics, in the same manner in which the state vector is augmented with the uncertain parameter vector to create the augmented state vector, and in which the state covariance is augmented with the uncertain parameter covariance matrix to obtain the augmented covariance matrix.
  • the system can be a linear system.
  • the present example implicit dual control systems and methods may find useful and beneficial application, relative to conventional control approaches, when applied to linear systems with multiplicative noise.
  • Dual iLQG, and other example implicit dual differential-dynamic-programming- based control algorithms disclosed herein show significant promise as a control approach that can account for and actively reduce the uncertainties that are inherent to systems while improving their performance. Since this approach is so general and requires no training prior to implementation, it is applicable to a wide range of fields including applications that have been identified as high-impact.
  • 2017 review paper “Systems and Control for the future of civilization, research agenda: Current and future roles, impact and grand challenges” by Lamnabhi- Lagarrigue et al.
  • the present implicit dual algorithms can be applied to inform health care systems, government policy, and financial systems.
  • the review paper by Lamnabhi-Lagarrigue et al. mentions several high-impact system and control applications for the future, and the present implicit dual control algorithms could be applied to many of them, including automotive control, spacecraft control, renewable energy and smart grid, assistive devices for people with disabilities, and advanced building control.
  • dual iLQG was applied to both a government policy problem (COVID-19) and a renewable energy problem (anaerobic digestion).
  • many of the present example implicit dual differential-dynamic- programming-based algorithms can handle multiplicative noise whereas other dual control algorithms such as MS-SP-NMPC and wide-sense dual cannot.
  • multiplicative noise is common, such as in the control of prostheses using myoelectric signals [54], robot motion planning when using sensors with distance-dependent errors [14], stochastic fluid dynamics [12], and power grids where renewable energy represents a significant portion of total power generation [19, 70].
  • Other applications that involve multiplicative noise include financial stochastic volatility models [67], systems involving wireless communication [68], and batch chemical process control [69].
  • the present disclosure provides the closed-loop implicit dual differential-dynamic-programming-based control of physical systems, in which, in some example implementations, signals are transduced from sensors associated with the physical system, transformed into digital signals that are processed to infer states and uncertain parameters associated with the physical system, and to determine, via implicit dual control, suitable control actions for controlling the physical system, thereby providing the control signals to the physical system, such that the control signals are transduced into physical controls that are applied to the physical system, with the physical system being controlled according to the computed control signals.
  • the transformation of these sensor signals into physical changes in the controlled system through closed-loop implicit dual control can thereby improve the performance of the physical system according to a specified objective or cost function.
  • the MS-SP-NMPC control problem can quickly become intractable with increasing uncertain parameters, realizations of each uncertain parameter, and the length of the “robust horizon.
  • the computer is unable to solve such intractable problems due to the large number of variables and calculations involved.
  • the number and size of the variables can exceed the computer’s memory and the quantity of calculations can push the computer’s processor to its maximum limits, extending the time required to solve the problem. In some cases, this execution time may be so long as to preclude the method’s beneficial application in a real-time physical setting.
  • closed-loop implicit dual differential-dynamic-programming- based control as described in this disclosure overcomes Bellman’s “curse of dimensionality” by taking a derivative-based approach in continuous space.
  • This closed-loop implicit dual control method allows the computer to solve dual control problems with large numbers of variables without reaching memory or processor limits through the extension of a differential- dynamic-based algorithm, that can handle, for example, high-dimensional stochastic nonlinear systems, to be an implicit approximation of dual control.
  • closed-loop implicit dual control as described in this disclosure can avoid the discretization of the control and state space that leads to Bellman’s “curse of dimensionality”. Due to the ability of the closed-loop implicit dual control method to handle larger control problems efficiently, it can be applied to improve a variety of complex physical systems.
  • FIG.11 an example system is shown that includes a controllable subsystem 400 that is controllable via, and operatively coupled to, implicit dual controller 200.
  • the subsystem 400 may include separate control and processing circuity, including components such as those shown in implicit dual controller 200, which may be operatively connected to one or more mechanical or non-mechanical output components of the subsystem 400 (such as one or more motors, actuators, or devices that are responsive to a control signal provided by the implicit dual controller 200 for controlling the controllable subsystem).
  • the system includes one or more sensors 410 for sensing signals suitable to determine, either directly, or to estimate via a filter, the augmented state vector and augmented covariance matrix of the controllable subsystem. The types of sensors employed will depend on the application.
  • example sensors can include, but are not limited to, pressure sensors, electrical contact sensors, torque sensors, force sensors, position sensors, current sensors, velocity myoelectric sensors.
  • example sensors can include, but are not limited to, pressure sensors, gas sensors, flow sensors, ultrasonic sensors, alcohol sensors, temperature sensors, and humidity sensors.
  • example sensors can include, but are not limited to, voltage sensors, current sensors, light sensors, pressure sensors, rain sensors, temperature sensors, and anemometer sensors.
  • example sensors can include, but are not limited to, epidemiological surveillance, medical data collection, and laboratory data collection.
  • example sensors can include, but are not limited to, market data collection and economic indicators.
  • one or more of the sensors 410 may be integrated with the controllable subsystem 400, i.e. the sensors may be external sensors or internal sensors (e.g. sensors residing on or within the controllable subsystem 400).
  • implicit dual controller 200 may include a processor 210, a memory 215, a system bus 205, a control and data acquisition interface 220 for acquiring sensor data and user input and for sending control commands to the controllable physical subsystem 400, a power source 225, and a plurality of optional additional devices or components such as storage device 230, communications interface 235, display 240, and one or more input/output devices 245.
  • the methods described herein can be partially implemented via hardware logic in processor 210 and partially using the instructions stored in memory 215. Some embodiments may be implemented using processor 210 without additional instructions stored in memory 215. Some embodiments are implemented using the instructions stored in memory 215 for execution by one or more microprocessors.
  • the example methods described herein for controlling a subsystem can be implemented via processor 210 and/or memory 215.
  • the inner loop of the implicit dual control algorithm is executed by the augmented stochastic differential-dynamic-programming-based algorithm module shown at 300, based on the augmented state and augmented covariance data structures 310, based on estimates provided by the filter 320, employing measurements obtained from the sensors 410.
  • the example system shown in the figure is not intended to be limited to the components that may be employed in a given implementation.
  • the implicit dual controller 200 may be provided on a computing device that is mechanically supported by the controllable subsystem 400.
  • the implicit dual controller 200 may be physically separate from the controllable subsystem 400.
  • the implicit dual controller 200 may include a mobile computing device, such as a tablet or smartphone that is connected to a local processing hardware supported by the controllable subsystem 400 via one or more wired or wireless connections.
  • a portion of the implicit dual controller 200 may be implemented, at least in part, on a remote computing system that connects to a local processing hardware via a remote network, such that some aspects of the processing are (e.g. in the cloud).
  • a remote computing system that connects to a local processing hardware via a remote network, such that some aspects of the processing are (e.g. in the cloud).
  • FIG.11 any number of each component can be included.
  • a computer typically contains a number of different data storage media.
  • bus 205 is depicted as a single connection between all of the components, it will be appreciated that the bus 205 may represent one or more circuits, devices or communication channels which link two or more of the components.
  • bus 205 often includes or is a motherboard.
  • some example embodiments of the present disclosure can be implemented in fully functioning computers and computer systems, various embodiments are capable of being distributed as a computing product in a variety of forms and are capable of being applied regardless of the particular type of machine or computer readable media used to actually effect the distribution.
  • a computer readable storage medium can be used to store software and data which when executed by a data processing system causes the system to perform various methods.
  • the executable software and data may be stored in various places including for example ROM, volatile RAM, nonvolatile memory and/or cache. Portions of this software and/or data may be stored in any one of these storage devices.
  • the phrases “computer readable material” and “computer readable storage medium” refers to all computer-readable media, except for a transitory propagating signal per se.
  • Anaerobic digestion is a chemical process where organic materials are broken down by various types of bacteria in a low-oxygen environment to produce a gas that is commonly known as biogas, which consists primarily of methane and carbon dioxide [9].
  • This biogas has various uses as a fuel, and can, for instance, be used directly for cooking or heating. It can also be compressed into a liquid fuel that is similar to natural gas, or it can be used to generate heat and power with a combined heat and power system [25].
  • biogas can provide consistent power independent of the weather, can be located anywhere there is a consistent supply of organic waste, and the energy can easily be stored for later use.
  • anaerobic digestion represents a controllable renewable energy source that is free of many of the intermittency, storage, and site location issues common with other sources of renewable energy.
  • Anaerobic digestion has several that make it an ideal application for dual control.
  • First of all, anaerobic digestion models have uncertain parameters, and in particular those associated with the dynamics of the bacteria populations are difficult to measure and have a significant impact on the rest of the system dynamics.
  • the composition of the organic feedstocks to the digester is not known precisely. This combination of uncertainties in the feedstock composition, bacteria population dynamics, and the measurements makes this application well-suited for dual control.
  • ADM1 Anaerobic Digestion Model No.1
  • AM2 Anaerobic Digestion Model No.9
  • ADM1 is a comprehensive model that has 24 state variables and is commonly used to simulate anaerobic digestion systems, but due to its complexity has limited use for control approaches [26].
  • AM2 on the other hand has six state variables while still capturing the main dynamics of the process and is commonly used for model-based control and parameter estimation [25]. For these reasons, the AM2 model was used for the present example.
  • the AM2 model represents the anaerobic digestion process as: where X1 is the concentration of acidogenic bacteria, X2 is the concentration of methanogenic bacteria, S 1 is the organic substrate concentration, S 2 is the volatile fatty acid concentration, Z is the total alkalinity concentration, C is the total inorganic carbon concentration, and the model parameters are described in FIG.12 [50].
  • the inputs to this model are the dilution rate, D, along with the inlet concentrations of organic substrate, volatile fatty acids, total alkalinity, and total inorganic carbon, , , , and respectively.
  • Two-parameter comparison with dual MS-SP-NMPC The true and estimated initial values of the state vector, [X 1 , X 2 , S 1 , S 2 , Z, C], are both [0.5, 1, 1, 5, 40, 50] and the state are all constrained to be greater than 0.
  • the controls, [q a , q b , q c ], are all constrained to be between 0.0001 and 1.
  • the true value of the uncertain parameter vector, ⁇ ⁇ 1 , Z C is [1.2, 100], while its initial estimate is [2, 50] with a covariance matrix of diag[0.16, 1111.11].
  • This system is simulated over 70 time steps that are 0.1 days long. Only states [S1, S2, C] are measured with a noise term G of 0.5I3.
  • the process noise scaling matrix F for the dual iLQG state and parameter dynamics were set to 10 - 15 I23 to be comparable with the lack of MS-SP-NMPC to handle noise.
  • FIG.13A which shows the three feedstock controls for the AM2 model for each of the three controllers
  • the dual and adaptive iLQG algorithms took a similar approach, while the dual MS-SP-NMPC algorithm converged to a very different solution.
  • the iLQG algorithms varied all three feedstock controls
  • the MS- SP-NMPC algorithm focused primarily on the first feedstock control and only momentarily used the other two.
  • the relative performance of these controls could only be determined by their impact on tracking the desired rate of biogas production.
  • FIG.13B The plot of the anaerobic digestion states is shown in FIG.13B, with solid lines for the true state trajectories and markers for the state estimate means and covariances. Similar to the controls, the MS-SP-NMPC algorithm took a different approach to solve this problem, as the state trajectories are significantly different from the iLQG algorithms. Additionally, the impact of ⁇ ⁇ 1 being uncertain can be seen in the large covariance associated with the third state, S2, even though it was being measured.
  • FIG.13C shows the plot of the parameter estimates, with the true values shown as black lines.
  • the MS-SP-NMPC algorithm estimate of the first parameter, ⁇ ⁇ 1, was close but not as accurate, and not accurate at all for the second parameter, ZC.
  • the reason for the MS-SP- NMPC algorithm’s estimate of Z C being poor was that it hardly used the third control, q C , and therefore had little opportunity to get feedback on ZC.
  • the MS-SP-NMPC algorithm’s strategy of focusing on only one of the feedstock controls was successful, as it was largely able to maintain the biogas production level near the desired flow rate.
  • Dual MS-SP-NMPC uses the full nonlinear system dynamics and may have had an advantage in this problem with these initial conditions. Seventeen-parameter comparison with adaptive iLQG To show the ability of dual iLQG to handle many uncertain parameters, it was compared to adaptive iLQG for the AM2 system with seventeen uncertain parameters. Dual MS-SP-NMPC was not included in this comparison as it is unable to handle more than two uncertain parameters.
  • the true and estimated initial values of the state vector, [X 1 , X 2 , S 1 , S 2 , Z, C], are both [0.5, 1, 1, 5, 40, 50] and the state are all constrained to be greater than 0.
  • the controls, [qa, qb, qc], are all constrained to be between 0 and 1.
  • the true and initial estimated values This system was simulated over 70 time steps that are 0.1 days long. Only states [S1, S2, C] were measured with a noise term of of 0.5I.
  • the multiplicative noise terms for the dual iLQG state and parameter dynamics were set to 10 - 15 I to be comparable with the lack of MS-SP-NMPC to handle noise.
  • the cost function was as given for the two- parameter example, shown in equation (4.19).
  • FIG. 15A shows the results for the two iLQG algorithms’ control trajectories. As with the previous example, these results by themselves did not indicate the relative performance of the algorithms as there were no terms in the cost function to explicitly penalize or encourage the use of the feedstock controls.
  • FIG.15B shows the true and estimated values for the anaerobic digestion system states. The two iLQG algorithms had similar values for the first four states, but there was a significant difference for the last two states, as the dual controller had higher total concentrations of alkalinity and inorganic carbon. Looking at the parameter estimates in FIG.15C, the two iLQG algorithms gave similar results for most of the parameters with only a few that approached the true parameter values.
  • Non- limiting examples of industrial manufacturing, synthesis or fabrication system include automated assembly lines (such as automobile assembly lines), additive manufacturing systems, chemical synthesis reactors, injection molding systems, semiconductor fabrication systems, CNC machining systems, metalworking systems, bio-manufacturing systems, electrochemical and thin film deposition systems, and autonomous textile weaving systems.
  • Example 2 Application to Control of the COVID-19 Outbreak The control of the COVID-19 pandemic continues to represent an enormous challenge for governments all over the world.
  • the movement of the population through these states can then be represented graphically as arrows between each compartment (state), and these population flows can then be described with equations based on the states themselves as well as parameters and controls.
  • These parameters generally represent infection and fatality rates for different populations, and the controls are methods of influencing these dynamics.
  • These compartmental models have been tailored to better represent the dynamics of the COVID-19 virus, with different compartments or states being considered by different researchers.
  • the SIDARTHE model can be expressed as: where the description of the parameters can be found in FIG.17B.
  • the recovery and mortality rates of the Threatened state are modelled as being dependent on the Threatened state to represent the impact of the health care system being overwhelmed.
  • this effect was achieved in a two-step process, whereby a model was created where the Threatened population was divided into those in the limited-capacity intensive care unit (ICU) and those not, and then this model was simplified to maintain the eight states described above.
  • ICU limited-capacity intensive care unit
  • T1 as the Threatened population that do not require ICU treatment
  • T2 as those that do, and assuming that there are no transfers between these two populations
  • the (T )T and (T )T terms can therefore be represented as: and used in equations (5.9) to (5.11).
  • SIDARTHE model limitations Although the SIDARTHE model can capture the major aspects of the dynamics of COVID-19, there are several limitations to this model. First of all, this model represents the population as static, other than deaths due to COVID-19.
  • SIDARTHE does not include population changes due to travel, births, or non-COVID-related deaths. A more complex model that did include non-COVID-related deaths would also be an interesting application for dual control. Additionally, the SIDARTHE model only represents the public health policies as a single control input, lumping the impact of these policies into a single value representing the severity of the restrictions. Although this makes the implementation of the model much easier, it would be difficult for health agencies to get precise recommendations from such a lumped term. Additionally, this single control action limits the potential probing that a dual control method could implement, as in reality multiple policies can be varied over time.
  • the public health policies that were considered were media campaigns (u1), enforcing social distancing and mask use (u2), performing asymptomatic testing (u 3 ), performing symptomatic testing (u 4 ), quarantining of positive cases (u5), increasing non-ICU hospital resources (u6), and increasing ICU resources (u7). Since many of these public health policies influence more than one parameter in the SIDARTHE model to varying levels, effectiveness parameters were introduced. For instance, both media campaigns and enforcing social distancing and mask use will lower , but the media campaigns may do so less effectively.
  • Two-parameter comparison with dual MS-SP-NMPC Since the dual MS-SP-NMPC code was only able to handle two parameters, it was compared with dual and adaptive iLQG for the original single-control SIDARTHE model.
  • the uncertain parameters were chosen to be min and min , as the extent to which the spread of COVID-19 can be reduced by imposing full restrictions is a critical factor in determining the trade-off between reducing cases and the socio-economic impacts of restrictions.
  • the values of the total population and the parameters used in these simulations were taken from [29], where the COVID-19 outbreak in Germany was modelled.
  • the states were constrained to [0, 82999999] and the true and estimated initial values of the state vector were both [82998999, 1000, 0, 0, 0, 0, 0, 0] .
  • the control inputs were constrained to [0, 1].
  • the true value of the constant parameter vector was [0.0422, 0.0422], while its initial estimate was [0.15, 0.1] with a covariance matrix of [0.0025, 0; 0, 0.0011].
  • This system was simulated over 40-time steps that were 1.0 days long, with a rolling horizon approach.
  • the Diagnosed, Recognized, Threatened, and Extinct states were measured with a noise term of diag([10, 7.5, 5, 2.5]), and as the MS-SP-MPC did not consider the states to be uncertain, the process noise scaling matrix for the dual iLQG state and parameter dynamics were set to 10 - 15 I10 and the noise was not implemented in the true system in the outer loop.
  • This cost function was very similar to the one used in [51] for the suppression of COVID-19, but c u was increased here to increase the cost of imposing restrictions. Additionally, this is a representative cost function; its values would have to be set by policymakers, but the benefits illustrated in this work are robust to various cost functions. Instead of using a shrinking horizon approach for the SIDARTHE simulations, a rolling horizon approach was used to match how this problem would be approached by governments.
  • FIG.19A shows the controls for the adaptive iLQG, dual iLQG, and dual MS-SP- NMPC algorithms. Both of the iLQG algorithms used the full range of the constrained control, while the MS-SP-NMPC algorithm’s control remained near zero for the entire simulation duration.
  • the dual iLQG controls use a high degree of restrictions early on, with a small variation that likely helped with parameter identification, before tapering off over the 40-day time frame.
  • the effect of these controls on the system states can be seen in FIG.19B, where the near lack of control from the MS-SP-NMPC algorithm is clear from the rapid increase in case numbers, while the iLQG algorithms did better at keeping the case numbers under control, with dual iLQG performing better than adaptive iLQG.
  • the estimation of the two uncertain parameters is shown in FIG.19C.
  • the dual MS-SP-NMPC algorithm’s estimates remained at the initial values, while the iLQG algorithms converged to the true values in a similar manner.
  • both adaptive and dual iLQG are an improvement on iLQG, but where the iLQG and adaptive iLQG solutions are strongly skewed to the left, the dual iLQG solutions do not show this same pattern. It may be that for this application the dual probing actions do not result in the cost reductions that the algorithm determined were probable.
  • the dual iLQG algorithm does find a similar minimum solution as the adaptive algorithm though, and the two have a similar variance. Looking at the simulation time required for these seeded runs in FIG.20C, the results are similar between the three algorithms. Although the dual algorithm would normally be expected to have the longest times, and it does have the largest extreme times in this case, it actually has the lowest median time of the three algorithms.
  • the complexity of the model and the length of the horizons appear to have more of an impact on the simulation times than the differences between the Sixteen-parameter comparison with adaptive iLQG
  • the model expanded from the two-parameter case to sixteen parameters in this case to compare dual and adaptive iLQG.
  • Dual MS-SP-NMPC was not included in this comparison as it is unable to handle more than two uncertain parameters.
  • the modified SIDARTHE model used for this comparison used 5 control inputs, u1 to u5, as described in equations (5.20) to (5.24).
  • the states are constrained to [0, 82999999] and the true and estimated initial values of the state vector are both [82998999, 1000, 0, 0, 0, 0, 0, 0] .
  • the 5 control inputs are constrained to [0, 1].
  • the true and initial estimated values of the uncertain parameters used in this simulation are shown in FIG.21. This system is simulated over 30 time steps that are 1.0 days long, with a rolling horizon approach.
  • the Diagnosed, Recognized, Threatened, and Extinct states are measured with a noise term of diag([10, 7.5, 5, 2.5]), and the noise term for dual iLQG were set to 10 - 5 for the state dynamics and 10 - 15 for the parameter dynamics, but the noise was not implemented in the true system in the outer loop.
  • the controls resulting from each of the iLQG algorithms are shown in FIG.22A.
  • the dual controller has significantly lower values for the 2nd, 3rd, and 4th controls (enforcing social distancing and mask use, asymptomatic testing, and symptomatic testing) that have higher weights in the cost function, while the 1st and 5th controls are very similar to adaptive iLQG.
  • FIG.22B shows the true and estimated states for each algorithm, along with the covariance of the estimates.
  • the dual controller has only slightly higher case numbers than the adaptive controller. Due to the significant measurement noise and large covariance of the states in this example, the parameter estimates shown in FIG.22C also have large covariances, and the estimates do not converge to the true values in the 30-day time horizon.
  • a control action may be taken by the public health entity that is different from a recommended control action as per a currently determined control policy at a given time step or iteration (which may include taking no action at all), and the method may proceed to the following iteration without application of the recommended control action.
  • Example 3 Autonomous Vehicle Control Autonomous vehicles must be able to operate in uncertain and changing environments. Many of these uncertainties can be expressed as uncertain parameters in stochastic mathematical models of the dynamics of the autonomous vehicle system, and additional information on these parameters can be gained through measurements. With a large number of states and uncertain parameters depending on the type of land, water, or air vehicle, computing control actions for a vehicle in real-time would be challenging for any conventional algorithm.
  • uncertain parameters may include, but would not be limited to, any one or more of the road friction coefficient and tire contact parameters.
  • these parameters may include, but would not be limited to, any one or more of the effectiveness coefficients of the control surfaces or lift, drag, or moment coefficients or other nondimensional coefficients.
  • the mass of the vehicle may also be considered if it is deemed sufficiently uncertain in a given case.
  • the adaptive and dual control methods described above would calculate control policies that would be implemented on the autonomous vehicles with the objective of minimizing the given cost functions. These control policies would inform for example the use of wheel torques, braking, and steering in the case of a land vehicle, or the thrust, control surface position, or ballast tanks in the case of a water vehicle.
  • the example adaptive control or implicit dual control methods described herein, when applied to the present example application of autonomous vehicle control, would employ an encoded augmented mathematical model of the dynamics of the autonomous vehicle.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the autonomous vehicle within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieving convergence faster than a corresponding method absent of augmentation.
  • Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the autonomous vehicle, and how the information it gains will help it achieve its goal as expressed in the cost function, such that the control policy determined for controlling the dynamics of the autonomous vehicle adapts to uncertain model parameters, such as uncertain and potentially hazardous conditions.
  • a dual control algorithm (implemented according to a method employing an augmented stochastic DDP-based algorithm, as noted above) for an autonomous land vehicle would be cautious in that it would limit large and sudden control actions, probing in that it would make movements to better assess the current road conditions, and it would be selective in that it would prioritize identifying the road conditions over other uncertain parameters depending on the given situation.
  • Example 4 Personalized Healthcare Rehabilitation Understanding the best rehabilitation treatment plan for an individual is difficult for many reasons. A significant one is that existing treatment pathways are built on studies that aggregate across clinics. By the time these studies are published, the pathways that they have analyzed are decades old, no longer representing the entire set of available actions. In addition, these pathways are aggregated.
  • rehabilitation is deterministic if we modeled the precise quality and quantity of sleep every night, the exact diet, and many other details, but absent of these, rehabilitation presents as an extremely stochastic process in which the current best practice is to rely on aggregated data vs. responding to the daily stochastic fluctuations presented by the patient.
  • These unconventional methods employ derivate-based stochastic DDP-based algorithms and involve the processing, by computer hardware, of an augmented state data structure that includes the both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain parameters, according to an encoded augmented mathematical model of the personalized rehabilitation dynamics, where the uncertain parameters are treated as states.
  • This unconventional approach enables, as described above, the efficient computation of a suitable control policy with improved computational efficiency.
  • a wide range of states can be considered.
  • one or more states may be selected from the taxonomy identified by the World Health Organization (International Classification of Functioning, Disability, and Health: ICF), that includes categories of participation, activities, health condition, body functions and structures, environmental factors, and personal factors. States within the participation domain could include the number of times a person was able to go to work; engage in their favourite sport; or participate in other meaningful activities for them. As well, it could include states including quality of life. Measurements of states within this domain include the short Form 36 and other questionnaire, as well as location-data that can be tracked and logged.
  • ICF World Health Organization
  • States within the activity level could include their level of performance in various activities, including walking on level ground, walking up stairs, getting into and out of cars, and other activities pertinent to their intended participation and their health condition. Measurements of these states include validated tests such as the timed up and go (TUG) test, or the 6-minute walk test. States related to health condition could include the functional ability to use a wheelchair or exoskeleton, and measured by physiotherapists using a rating scale. States related to body functions and structures would include the location and type of For example, in the domain of spinal cord rehabilitation it would include the level of injury (e.g., T5) and whether it was complete or incomplete, typically measured using an ASIA Impairment scale.
  • T5 the level of injury
  • ASIA Impairment scale e.g., T5
  • States within the category of environmental factors could include states such as the distance to and accessibility of the various treatment options given their geography, as well as whether clinicians were able to speak their native language and/or translation services were available.
  • personal factors could include states such as the age, height, weight, sex, and ethnicity of the person; the level of motivation of the individual; their level of executive function; and other intrinsic attributes and motivators. These states can be measured, for example, through IQ and EQ tests and other tests of motivation.
  • the relationship between the states, combined with inputs of treatment options can be dynamically modelled using a combination of data-driven processes and existing dynamical information on available actions.
  • Non-limiting examples of parameters that are uncertain within a mathematical model of personalized rehabilitation include the gains and time-constants between each of the states.
  • uncertain parameters include the gain and time-constant that relate FES training to walking ability, the gain and time- constant that relates FES training to the ability to participate in a specific activity such as golf, and the gain and time-constant that relate FES training to the ability to participate with community members within the golfing community.
  • gains and time-constants between each of the actions and each of the states as well as between each of the states and each other.
  • control actions may include pharmacological treatments (e.g., spasticity-reducing drugs such as Botox), surgical treatments (e.g., tendon release), conventional therapeutic approaches (e.g., stretching or walking), and/or robotic/assistive technological approaches (e.g., exoskeletons, functional electrical stimulation).
  • pharmacological treatments e.g., spasticity-reducing drugs such as Botox
  • surgical treatments e.g., tendon release
  • conventional therapeutic approaches e.g., stretching or walking
  • robotic/assistive technological approaches e.g., exoskeletons, functional electrical stimulation.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the personalized rehabilitation within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieving convergence faster than a corresponding method absent of augmentation.
  • Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the treatment decisions employed during personalized rehabilitation, and how the information it gains will help it achieve its goal as expressed in the cost function.
  • example methods may include applying the control actions that are determined as the example methods are executed and an improved control policy is determined on each iteration of the method
  • other example implementations of the methods may be absent of the step of applying the control action – and thus absent of performing a medical treatment or therapeutic intervention – and may only be employed to communicate potential control actions that can be considered, and optionally implemented, by a provider.
  • a control action may be taken by the provider that is different from a control action as per a currently determined control policy at a given time step or iteration (which may include taking no action at all), and the method may proceed to the following iteration without application of the recommended control action.
  • Example 5 Wearable Robotics
  • Wearable robotics include upper and lower-limb exoskeletons and exosuits, which can either be rigid or soft. The goal of these devices is to augment, assist, or rehabilitate individuals. However, every individual moves slightly differently. Human movement is inherently stochastic due to the biologically stochastic nature of force generation within human muscles, and each person comes up with slightly different control strategies based on the dimensions of their various limbs; their strength and flexibility, and any underlying impairments they may have.
  • One of the largest goals for wearable robotics has been to make life easier for people – typically defined as reducing the metabolic cost for them to do their desired activities.
  • developing efficient personalized tuning settings that reduce metabolic cost has been challenging.
  • These unconventional methods employ derivate-based stochastic DDP-based algorithms and involve the processing, by the computer hardware controlling the wearable robotic system, of an augmented state data structure that includes the both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain parameters, according to an encoded augmented mathematical model of the dynamics of the wearable robotic system, where the uncertain parameters are treated as states.
  • This unconventional approach enables, as described above, the efficient computation of a suitable control policy that is tailored to the user of the wearable robotic system with improved computational efficiency.
  • Non-limiting examples of states of such a stochastic system may include, for example, relevant states used by the wearable robot, which typically include joint angles, torques, forces, power, and energy, along with metabolic consumption as measured by CO2 expiration.
  • Some wearable exoskeletons also use states including myoelectric activation as recorded using electromyographic electrodes, sonomyography, or other devices.
  • Other devices include states that measure user intent as measured using EEG, or other devices.
  • Others enforce state-based impedance regimes in which, depending on which phase of gait a person is, certain parameters are tuned to determine an impedance and equilibrium position.
  • Others enforce a phase portrait or virtual holonomic constraint.
  • Many models are used in the field, but all of them have tunable parameters that are unique to each individual, and the majority of the models in the field have multiple parameters, making tuning them difficult given how stochastic both the user’s signals are as well as how stochastic and time-delayed the measurement of metabolic activity is.
  • the uncertain parameters employed to define an augmented state data structure and augmented covariance data structure will in general depend on the particular mathematical model used.
  • the uncertain parameters could include, for example, any one or more of the stiffness, damping, and inertia values along with the equilibrium position within each state.
  • the uncertain can include, for example, one or more of parameters describing a minimum jerk trajectory or other kinematic profile (e.g. according to a model described in US Patent No. 10,213,324, Sensinger and Lenzi 2019).
  • the uncertain parameters can include, for example, one or more parameters that determine the profile of the phase portrait (e.g. according to a model described US Patent No. 10,314,723, Sensinger and Gregg.2019). It will be understood that a wide variety of models and associate parameters may be employed (e.g. another example model is described in US Patent No. 10,799,373, Lenzi and Sensinger.2020), and in general, the various models each have tunable parameters that are unique to each individual.
  • Non-limiting examples of control actions include generating joint kinematics (position or velocity profiles or phase portraits), kinetics (producing torques or torque trajectories), and applying the control actions to actuators (e.g. motors).
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the wearable robotic system in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence, computing efficiency, and personalization of the operation of the wearable robotic system.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented model of the dynamics of the wearable robotic system within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieving convergence faster than a corresponding method absent of augmentation.
  • Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the wearable robotic system, and how the information it gains will help it control the wearable robotic system in a manner that is tailored to individual user preferences, thereby leading to improved adoption, utilization and clinical efficacy.
  • Example 6 Fault Detection
  • controllers such as electrical systems, electromechanical systems (systems or subsystems that include electrically driven/actuated mechanical components), hydraulic systems, pneumatic systems, thermal systems, and combinations thereof.
  • Such industrial systems or subsystems often face degradation over time, with the degradation causing failure. Having the ability to monitor the health of the system and detect faults indicative of potential failure can prevent expensive repairs, downtime, and loss of life.
  • This unconventional approach enables, as described above, the efficient computation of a suitable control policy that accounts for system degradation with improved computational efficiency. Moreover, such an approach enables the monitoring, during the control of the system, of the uncertainty associated with one or more uncertain parameters, thereby enabling a control method that can facilitate the early detection of system degradation.
  • Mathematical models of industrial systems such as those described above may already exist, or may be generated to characterize the time evolution of system states according to parameters that can include one or more uncertain parameters. It will be understood that the specific states, parameters, and controls would be specific to a given industrial system.
  • Non-limiting examples of industrial systems and associated uncertain parameters include the detection of scaling in boilers (for which example uncertain parameters could include heat transfer efficiency or fouling factor) and DC motor fault detection (for which example uncertain parameters could include motor resistance, friction torque coefficient, and magnetic flux linkage).
  • Additional examples of industrial systems that are controllable for autonomous fault detection in the presence of uncertain model parameters include, but are not limited to, robotic systems, autonomous and non-autonomous vehicles, wind and tidal turbines, and HVAC equipment.
  • the adaptive and dual control methods described above would calculate control policies that would be implemented on the industrial system with the objective of minimizing the given cost functions and also providing the features of fault detection and system health monitoring. The nature of the controls would depend on the specific industrial system to which this control approach was applied.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the industrial system within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieve convergence faster than a corresponding method absent of augmentation, and facilitating the monitoring of uncertainty associated with its parameter estimates while the system controlled during normal operation.
  • the augmented stochastic DDP-based algorithm can be motivated to probe and monitor these parameters so that the overall cost was not impacted.
  • the uncertain parameters and their uncertainties can be employed to facilitate fault detection according to many different implementations.
  • criteria such as thresholds
  • one or more specific uncertain parameters defining bounds on their normal operating ranges, and/or uncertainty e.g. variances and/or covariances as determined by the augmented covariance data structure
  • uncertainty e.g. variances and/or covariances as determined by the augmented covariance data structure
  • Example 7 Building climate Control With 12% of global energy usage being used for heating and cooling buildings, building climate control significantly impacts climate change (for example, see González-Torres et al., Energ. Rep.8, 626-637, 2022).
  • the control of the climate of a building can involve a variety of devices, including all types of HVAC equipment, automatic blinds, lighting, and heat energy storage equipment to maintain a building’s temperature and humidity levels. Sensors for such a system could include temperature, humidity, occupancy, sunlight, and anemometer sensors.
  • Such a control system can also be informed by external factors including weather predictions, the scheduled usage of the building, and time-of-use power pricing, all of which contribute to uncertainty in a mathematical model employed by a controller.
  • MPC-based controllers can maintain the climate of the building by predicting and accounting for the impact of disturbances, but this approach requires a sufficiently accurate model of the building, including numerous parameters that have associated uncertainty.
  • Some of these uncertain parameters involve external factors as noted above, while other uncertain parameters are building-specific and can vary with time. Accordingly, identifying these uncertain parameters and their appropriate respective values would have to be performed for each building for which this control method is used, and this would be an expensive and potentially cost-prohibitive approach. Accordingly, a technical problem exists in the field of building climate control in that existing control methods fail to accommodate, learn and refine uncertain parameters related to external factors and/or building-specific aspects that impact the interior climate of a building.
  • the states involved in a building climate control model can vary depending on climate control implementation, but in some non-limiting examples, can include temperatures and humidities, and the uncertain parameters could consist of thermal conductivities, thermal capacities, radiation heat transfer coefficients, and HVAC equipment efficiencies.
  • thermal conductivities thermal capacities
  • radiation heat transfer coefficients thermal capacities
  • HVAC equipment efficiencies are critical for determining the impact of sunlight on a building’s external and internal temperatures, and fouling of surfaces and glazings can significantly change these values.
  • the efficiencies of many types of HVAC equipment, specifically heat pumps for example are dependent on the outside conditions and can vary significantly throughout a single day.
  • time-varying parameters could be included in the dual iLQG augmented state vector so that they could be identified during normal operation of the building climate control system.
  • Mathematical models for building climate control typically describe the rate of change of temperatures and humidities of interest. This is usually done in a lumped approach to simplify the control model and avoid the use of partial differential equations, such as for example considering the entire exterior south face of a building to be a single temperature.
  • the states of the model could include for example the temperature of each external and internal wall of the building, the air temperature of each room in the building, the humidity of each room in the building, and the temperature of any heat storage equipment.
  • the model would relate the rates of change of these states to the values of these states, the thermal properties of the building both known and uncertain, the impact of the control equipment, and the impact of external factors as described above.
  • the adaptive and dual control methods described above would calculate control policies that would be implemented with the building climate control equipment with the objective of minimizing the given cost functions. These control policies would inform for example the scheduling of the HVAC equipment, automatic blinds, lighting, and heat energy storage equipment.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the building climate control system in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence, computing efficiency, and enabling the online monitoring of the uncertain parameters degree of uncertainty, as per updated values generated by the outer loop filter.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the building climate control system within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to refinement of uncertain parameters associated with external factors and building-specific aspects during climate control, and to achieve convergence faster than a corresponding method absent of augmentation.
  • Each of these sources can have a different impedance (resistance, inductance, and capacitance), and can run at slightly different frequencies and amplitudes, making the grid itself a stochastic system. Ensuring that power from each component of the grid can be used without the entire grid becoming unstable is a challenging problem. Accordingly, a technical challenge exists in regulating smart grids in that the system is stochastic and that there are many uncertain parameters that govern the dynamics of its states, making it difficult to ensure stability during operation. This technical problem involving the need to account for uncertain parameters that are associated with potential instability of a smart grid during its regulation can be solved by the adaptive and dual control methods of the present disclosure, as described in further detail below.
  • These unconventional methods employ derivate-based stochastic DDP-based algorithms and involve the processing, by the computer hardware controlling the smart grid, of an augmented state data structure that includes the both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain parameters, according to an encoded augmented mathematical model of the dynamics of the smart grid, where the uncertain parameters are treated as states.
  • This unconventional approach enables, as described above, the efficient computation of a suitable control policy that accounts for and determines improved estimates of the uncertain parameters during control, with improved computational efficiency, thereby providing a customized control method that can lead to improved stability during regulation of the smart grid.
  • a mathematical model of a smart grid may map the various impedance parameters of each power source within the grid to the net power supply of the grid, and non- limiting examples of states include voltage amplitude and frequency being produced by each source within the grid.
  • states include voltage amplitude and frequency being produced by each source within the grid.
  • many of the parameters that govern the dynamics of these states are uncertain, due to the stochastic of the system and variations among components within systems or within a given system.
  • Non-limiting examples of uncertain parameters include the impedance parameters of each source within the grid, along with the electrical connections that connect them.
  • Examples of control actions that can be taken, when implementing an adaptive or implicit dual control method according to the example embodiments described above or variations thereof, include selecting which sources are connected to the grid, along with the addition or removal extra impedance to the grid.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

Systems and methods are provided for determining control actions for controlling a stochastic system via an implicit dual control using a computer processor and associated memory encoded with an augmented mathematical model of dynamics of the system, the augmented mathematical model characterizing the dynamics of the states and uncertain parameters of the system. A stochastic differential-dynamic-programming-based algorithm is employed to process an augmented state data structure characterizing the states of the stochastic system and the one or more uncertain parameters, and an augmented state covariance data structure comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the uncertain parameters, according to the augmented mathematical model and a cost function, in which the uncertain parameters are treated as additional states subject to the augmented mathematical model, to determine a control policy for reducing cost through implicitly generated dual features of probing, caution, and selectiveness.

Description

IMPLICIT DUAL CONTROL FOR UNCERTAIN STOCHASTIC SYSTEMS CROSS-REFERENCE TO RELATED APPLICATION This application claims priority to U.S. Provisional Patent Application No. 63/537,243, titled “IMPLICIT DUAL CONTROL FOR UNCERTAIN STOCHASTIC SYSTEMS” and filed on September 8, 2023, the entire contents of which is incorporated herein by reference. BACKGROUND The present disclosure relates to the control of stochastic systems with uncertain dynamics. In real-world control problems, there are often uncertainties due to the system dynamics being difficult to model and/or measurements being inaccurate or unavailable. These uncertainties make it difficult to determine what control actions should be taken to have a system achieve a desired objective. System identification techniques can identify these uncertainties through testing before the system is put into normal operation, but this option is not always available and cannot identify time-varying uncertainties. Adaptive control techniques can identify uncertain parameters while directing a system toward a desired objective, but cannot determine what actions would result in measurements with more information about the uncertainties, and are therefore passively adaptive. Actively adaptive control techniques, on the other hand, estimate the reductions in uncertainty that will result from their control actions and probe the system to identify the uncertain parameters to a sufficient level such that the desired goal is optimized. This actively adaptive control is known as dual control, as the controls are chosen to learn about the uncertainties and to reduce the cost. When designing a control system for systems with uncertain parameters, there are several approaches that could be used. Robust approaches consider the set of possible values that the parameters could take and can account for the uncertainties to limit their impact on achieving the control goals (e.g., [39]). Stochastic approaches consider the probability density function of the uncertain parameters to account for parameter realizations with the higher chances of being true (e.g., [6]). Adaptive approaches continuously update estimates of the uncertain parameters using measurements from the system and controls the system as if these estimates are the true values of the parameters (e.g., [15]). Finally, dual approaches consider what actions could be taken to improve the information in future measurements in such a way that the resulting reduction in future costs is greater than the cost of these probing actions (e.g., [47]). Adaptive control methods are passively adaptive, in that they only consider changes to parameter uncertainties due to past measurements. Dual control on the other hand, is an actively adaptive control method, in that it actively modifies its control actions to seek out future measurements that will reduce parameter uncertainties. Dual control has three features: it is probing, cautious, and selective [28]. Dual control is probing in that it modifies its control signals to obtain more information-rich measurements, it is cautious in that it will tend to make smaller control actions when uncertainties are high, and it is selective in that it will only attempt to identify parameters that impact the system’s performance. An analogy for dual control that has been used is driving a new car from location A to location B [32]. Although it may be more efficient to just start driving under assumptions of how cars generally operate and changing your driving based on differences that you notice (taking an adaptive control approach), taking the time to quickly test (probe) the car’s responsiveness to the brakes and the gas pedals at a safe time (cautious) will be informative, and although this approach will slow down the start of your trip a bit, could prevent an accident. Also, although understanding how the air conditioning works may also be helpful, it is not as important to know as the car’s responsiveness (selectiveness). Dual control was first identified by Feldbaum in 1960 [18], and he recognized that the optimal dual control problem could be solved using stochastic dynamic programming. Unfortunately, stochastic dynamic programming involves solving the Bellman equations, which are generally computationally inefficient due to what Bellman termed the curse of dimensionality, where the size of the problem grows exponentially with the number of states [63]. Although dual control is the only class of stochastic control policies that exhibits active learning, computationally tractable approximations of dual control have been developed [7]. There are two types of approximations of dual control: explicit and implicit. Explicit dual control methods directly incorporate one or more of the three features of dual control into the objective function. Although this approach is simple, the relative importance of the regulation and identification functions must be defined [59]. Implicit dual controllers, on the other hand, approximate the Bellman equations in such a way that the reduction of uncertainty due to future measurements is estimated, which generally comes at the cost of higher computational effort compared to explicit methods [22]. The relative importance of the dual features and the other objectives does not have to be quantified with the implicit approach as they are linked, but probing for increased parameter information is only done if it will reduce future costs. By not prescribing the relative importance of the control objective and these dual features, implicit methods can give better results than explicit methods [7]. There are several existing dual implicit approaches, but each only considers limited control or parameter realizations to make the control problem tractable or is only applicable for systems with a limited number of states. Bayard and Schumitzky [7] use an iteration in policy space algorithm that combines particle filtering with Monte Carlo simulations to estimate the cost-to-go and iteratively improve on a given control policy but is limited to control inputs that only take on two discrete values. Sehr and Bitmead [53] approximate the system dynamics as a Partially Observable Markov Decision Processes, allowing the dual stochastic MPC problem to be solved explicitly for small and medium-sized problems. Thangavel et al. [60] and Hanssen and Foss [20] take a multi-stage approach to implicit dual control and consider a branching network of scenarios, where each branch represents the system’s predicted response for a single realization of the uncertain parameters. The optimization problem that is solved is to determine the control actions (over the control horizon) which minimize the sum of the costs of the scenarios over the prediction horizon. The reduction in the uncertainties due to future measurements is estimated for each time step in a “robust” horizon, and this reduction is reflected in the selection of the discrete uncertainty values upon which the subsequent scenarios are based. As this process causes the number of scenarios to grow exponentially, the robust horizon is usually chosen to be less than the control horizon, and the uncertainties after that point are assumed to be equal to the nominal value, no longer incorporating the dual features. The existing implicit dual approaches are limited by Bellman’s curse of dimensionality as they do not take a derivative-based approach in continuous state space. Optimal Control In optimal control, the control actions to a dynamic system over a period of time are determined by solving an optimization problem where a given objective function is minimized [63]. The objective function is chosen to cause the system to demonstrate a particular behaviour, such as following a particular state trajectory or minimizing energy usage. Objective, or cost, functions can have terms that impose costs at each point in time, known as stage costs, or terms that only impose costs at the final time, known as terminal costs. Constraints can also be imposed on the states and/or controls in the form of inequalities or equalities [63]. A common example of an optimal control problem is finding the cheapest connections to fly to a destination [63]. Here, the state x represents the cities, the control u represents the choices of which flights to take, cost(x, u) represents the cost of the plane ticket, and next(x, u) represents the city where the flight u from city x lands. The goal is therefore to find the series of flights (u0, u1, ... , u 1) to get from the initial departure city x0 to the final destination xn that minimizes the total cost:
Figure imgf000004_0001
( , ) = cost( , ), (2.1) where the dynamics are xk+1 = next(xk, uk), x X , and u U (x), where X and U (x) are finite sets. Problems such as this one can be solved using Bellman’s optimality principle, which states that “an optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy” [8]. Therefore, optimal control problems can be simplified from a single optimization of a control sequence to a sequence of optimizations of individual control decisions starting from the final control decision and proceeding backward in time. Once the optimal control action at a given time step is determined, the cost of that time step is used to calculate the optimal cost-to-go function v(x), which is the minimum total cost to reach the final state from each state, and can be represented as: ( , ) + (next( , ))]. (2.2)
Figure imgf000005_0001
cost-to-go has been calculated at every time step by starting at the final time step and doing this backward pass through time, the optimal cost-to-go can be determined by selecting the lowest cost-to-go at the initial time step, which provides the optimal control trajectory. In the context of the flight example given above, this approach means that the problem can be solved by considering flights to a single city at a time and keeping running totals of the flight costs. The cost of flights to the final destination can be recorded first, giving the cost-to-go (to the final destination) from each of those cities. The cost of flights to those second-to-last destinations from other cities can then be added to their respective cost-to-go functions, giving the cost-to-go from those cities. This process then continues until all of these possible routes start with the pre-determined initial departure city. The optimal control problem is then solved by simply selecting the lowest cost-to-go from the initial departure city. This approach is known as dynamic programming and it is well-suited to optimal control problems with defined final costs and a small number of states. As dynamic programming is an exhaustive method that considers all of the possible control trajectories, the size of the optimization problem grows exponentially with the number of states. This is Bellman’s curse of dimensionality, and it likely has no general solution [63], but several approximately optimal control strategies can circumvent this issue. (a) The linear quadratic regulator (LQR) A special case of optimal control is the linear quadratic regulator (LQR), which is applicable to deterministic linear systems have quadratic costs [13]. For the discrete-time linear system = + (2.3) with the quadratic cost function over a finite horizon N of = + + + 2 , (2.4) the optimal control at time step k is , (2.5) where: = ( + ) ( + ) (2.6) and Sk is found by using the Riccati equation = ( + )( + ) ( + ) + (2.7) iteratively backwards in time from the terminal condition of SN = Q. (b) Iterative LQR (iLQR) Iterative LQR (iLQR) extends LQR to nonlinear deterministic systems by linearizing the system about a nominal state-control trajectory. Consider a discrete- time nonlinear deterministic dynamic system with state xp Rnx and control up Rnu: = , . (2.8)
Figure imgf000006_0001
= , where xp is the desired terminal state, Qf and Q are the state cost-weighting matrices that are symmetric positive semi-definite, and R is the control cost- weighting matrix which is positive definite. This iterative algorithm starts with a given nominal control trajectory, u¯p, which is used to determine the nominal state x¯p through integration of the discretized version of (2.8), = + ( , . (2.10) Variables representing the deviations between the actual and nominal state and control trajectories can then be expressed as xk = xpk x¯pk and uk = upk u¯pk, and the dynamic system and output equation in equations (2.8) can be expressed as: x + = + , + . (2.11) Linearizing equation (2.11) about (x¯p k, u¯p k) and subtracting equation (2.10), the linear approximation of equation (2.8) are given as = + (2.12) .
Figure imgf000007_0001
As shown in [35], the locally-optimal controller can be derived as
Figure imgf000007_0002
= ( ) + , (2.18) which is solved iteratively backwards in time from the terminal conditions of SN = Qf and vN nominal control trajectory is given as
Figure imgf000007_0003
= + . (2.19) This control approach is only locally-optimal due to it being a trajectory-based approach [35]. As iLQR considers deviations around an initial nominal control trajectory, u¯p, the resulting solution can only be optimal within that region of control space, and no claims of global optimality can be made. iLQR can also be applied in a moving horizon setting where feedback is used to improve performance by accounting for inaccuracies due to the linearization process. To do this, the algorithm is initialized with a nominal control trajectory as described above that could be a vector of zeros, randomly generated, or any other form of seed. The iLQR algorithm is then run for a given finite time horizon, resulting in an improved control policy. The control policy is then applied to the true system for a single time step, after which a state measurement from the system is obtained. The length of the time horizon and the improved control policy can then be adjusted and used to run the iLQR algorithm. This process can be repeated for a problem with a finite time horizon until the final time step is reached, or continuously for problems with infinite time horizons. (c) Iterative Linear Quadratic Gaussian (iLQG) iLQG extends iLQR to nonlinear systems that are stochastic and do not have quadratic costs. To handle non-quadratic cost functions, the cost function is “quadratized” about the nominal state-control trajectory in a similar way that the system dynamics are linearized (non-quadratic cost functions can be handled in the same way for iLQR). Since the states are uncertain, the measurement dynamics are also included in the iLQG algorithm, and a filter is required to estimate the states’ mean values and covariances. The forward integration of the system dynamics to obtain the nominal state trajectory from the nominal control trajectory and the calculations of the derivatives required for the linearization of the system dynamics as well as the “quadratization” of the cost function are grouped together in what’s known as a forward pass. The next step in the algorithm is the estimator, which uses the noisy measurements to infer the value of the states and their covariances. Next, a backward pass is required to calculate a quadratic approximation to the cost-to-go function, and then the optimal control deviations can be found. Since the linear approximation of the nonlinear system dynamics loses accuracy for larger control deviations, a line search is implemented to iteratively reduce the control deviations if the solution’s estimated cost is not less than the cost of the initial nominal trajectory, improving the algorithm’s convergence. Finally, as with iLQR, iLQG gives optimal solutions due to its trajectory-based approach, which also allows it to be very efficient. To account for this, the algorithm can be initialized multiple times with different nominal control trajectories, known as seeds, to search a larger control space. These seeds may converge to different local minima, and a selection criteria can be determined to select a single solution from this set of solutions. Additional details on iLQR and iLQG publications and their differences can be found in FIGS.1A-1G. (i) The forward pass Linearizing the system dynamics Consider a nonlinear stochastic dynamic system with state trajectory xp Rnx , control trajectory up Rnu , and output trajectory yp Rny with standard and interdependent Brownian noise w R nw and v R nv
Figure imgf000009_0001
Starting with a nominal control trajectory u¯p, a nominal state trajectory x¯p can be determined by applying u¯p to a deterministic and discreteized version of equation (2.20): = + ( , , (2.22) through Euler integration with x¯p(0) = xp(0). The nominal output trajectory can then be obtained with both x¯p and u¯p with a deterministic and discreteized version of equation (2.21), = , . (2.23) Variables representing the deviations of the state, control, and output can then be expressed as = , = , and = respectively, and the dynamic system and output equation in equations (2.20) and (2.21) can be expressed as + = + + ( + , + + , + . (2.24) + = ( + , + + ( + , + . (2.25) Linearizing equations (2.24) and about the nominal state-control trajectory (x¯pk , u¯p k) and subtracting equations (2.22) and (2.23), locally linear discrete forms of equations (2.20) and (2.21) are obtained as = + + ( , ) (2.26)
Figure imgf000010_0001
where the Gaussian noise terms k R nw and k R nv are independent and zero-mean with and the superscript [i] indicates the i th columns
Figure imgf000010_0002
Quadratizing the cost function
Figure imgf000010_0003
an initial state xp to a terminal time N can be defined in terms of a stage cost (t, xp k, up k) 0 and a terminal cost h(xp N) 0 as
Figure imgf000010_0004
( , ) = , , ) . (2.28) Since xp can be obtained through the integration of equation (2.20) from xp, p* 0 the optimal control problem is to find the optimal control trajectory u which minimizes J, ( , ). (2.29) The cost-to-go Jk is the sum of the costs occurring from time step k until the terminal time N , during which the partial control trajectory: : , , } is applied, starting from , =
Figure imgf000011_0001
The optimal cost-to-go at time k starting from xpk is given as
Figure imgf000011_0002
and its value at the terminal time N is: ( ) . (2.32) As described in [58], the Dynamic Programming Principle then reduces the optimal control problem to a sequence of minimizations over a single control starting at the terminal time, progressing backwards in time to the initial state,
Figure imgf000011_0003
Quadratizing the stage cost term in equation (2.33) and writing it in terms of the deviation variables, as was done with the system dynamics in equation (2.26), results
Figure imgf000011_0004
, .
Figure imgf000012_0001
(ii) The estimator Since the states and measurements for the system in question are subject to noise, an estimator is required to infer the values of the states, as they are required to determine the control policy deviations. Li and Todorov suggest that a filter that is based on a sequence of fixed gains be used to determine the control policy in the inner loop of this iterative algorithm, but that in the outer loop moving horizon implementation an adaptive filter could be used [37]. For this fixed-gain inner loop filter, assuming that the system has an initial state estimate with a mean of x0 and a covariance of 0 and the unconditional means and covariances are defined as E [ e
Figure imgf000012_0002
xk] , m k E k [ek], x k E xkx , e k E eke , and xek E xke , the optimal filter gain can be computed as:
Figure imgf000012_0003
+ , + , + , + , + , + ,
Figure imgf000013_0001
(iii) The backward pass Approximating the cost-to-go function Assuming a quadratic form for the cost-to-go function vk(xk,xk) as
Figure imgf000013_0002
the parameters Sx k , Sx k, Sxx k , sx k , sx k , and sk can be computed recursively backwards
Figure imgf000013_0003
= ( ) + ( ) + ( )
Figure imgf000014_0001
Determining the control policy If H is positive semi-definite, then the unconstrained optimal control deviation can be computed as: ( + ) = + , (2.52) where G is given as:
Figure imgf000014_0002
+ , , + , , , (2.53) and lk and Lk are the control policy gains. If H has negative eigenvalues, then the approximation of the cost-to-go function may become negative, and cost-to-go always non-negative. To ensure that H always has non-negative eigenvalues, a regularized version of H, or H and G, H and G, are used to capture the second-order information in these matrices to compute the unconstrained optimal control deviation as: ( + ) = + . (2.54)
Figure imgf000015_0001
There are several options of how perform this regularization, and all use a positive regularization term . The first option [57] is to set + . (2.55) The second option is to regularize H and G through the quadratic cost-to-go terms ([57]) as
Figure imgf000015_0002
where min(eig(H)) is the minimum eigenvalue of H. Finally, a fourth option [37] is to set , (2.59) where [V, D] = eig(H) is the eigenvalue decomposition of H, but the elements in the diagonal matrix D that are less than are replaced with before computing H. In order to have be as small as possible so that the algorithm converges quickly but also have increase quickly if H is not being positive definite, Tassa et al. present a quadratic modification schedule for the regularization term in [57]. Given a minimum vale for of min (typically 10 6) and a minimum modification 0 (typically 2), is either increased with
Figure imgf000015_0003
or decreased with ,
Figure imgf000016_0001
not being positive definite or if the line search fails to find a solution that improves upon the initial control trajectory. The regularization parameter is decreased when the line search finds an improved control trajectory or when the algorithm terminates due to a small control deviation gradient. A parameter defining the maximum regularization term, max, is used for another stopping criteria. Control constraints As discussed in [58], there are two main options for introducing control constraints in iLQG; by using squashing functions or by solving a quadratic program that is subject to box constraints. A squashing function, s(u) is an element-wise sigmoid with vector limits of lim ( ) = (2.64) lim ( ) = (2.65) that is introduced in the system dynamics as: = ( , ( )), (2.66) where bmin and bmax are the vector limits to be imposed on the control inputs. An example of a squashing function is: ( ) = tanh( ) + (2.67) To keep u from taking on extreme values, cost terms should be imposed on both u and s(u). The other option to introduce control constraints is by solving the quadratic program of minimizing the cost-to-go approximation from equation (2.34), subject to the box control constraints bmin and bmax. The quadratic optimization problem to be solved is min ( , ) (2.68) subject to + . (2.69)
Figure imgf000017_0001
This approach directly solves for the sequence of control actions that minimize the approximation of the total cost, by solving a sequence of these problems in a backward pass. For more details, please refer to [58]. (iv) The line search Regardless of the method used to impose control constraints, the gain matrices lk and Lk are the final result, such that the improved policy for the control deviations can be determined as shown in equation (2.52). As discussed previously, this optimization process can result in control deviations that are arbitrarily large and can therefore send the state trajectory outside of the region where the linear approximation to the system dynamics is reasonably accurate. To account for this, a line search is performed to sequentially reduce the improved control deviations until a solution is found that is estimated to cause a reduction in the total cost. This locally-linear policy is determined by performing a forward pass through the estimated system dynamics:
Figure imgf000017_0002
where is a backtracking search parameter that is set to 1 and then sequentially reduced. If the line search fails to find a reduced cost solution, the regularization parameter is increased as shown in equations (2.62) and (2.63), and the control policy backward pass and line search forward pass are repeated until the algorithm converges to a locally optimal control policy. (v) Stopping criteria Four stopping criteria are used to define convergence and control how long this iterative algorithm runs. First, if the gradient of the control deviations is less than a predefined threshold, the algorithm terminates. Second, if the positive reduction in the cost function is less than a predefined threshold, the algorithm also terminates. Third, if is greater than the predefined max value, that algorithm is not able to regularize H and the algorithm terminates. Finally, if the number that the algorithm has completed exceeds a predefined maximum number of iterations, the algorithm also terminates. (vi) Moving horizon implementation iLQG can be implemented with a moving horizon approach as described for iLQR, but since there is noise in the system dynamics and the measurements, a filter may be used to obtain an estimate of the states. A system diagram for this approach is shown in FIG.2. As shown in the figure, the iLQG block implements the iterative iLQG algorithm, which is referred to as the inner loop 110, while the outer loop 120 contains the control signal being sent to the true system 130 for a single time step, the state dynamics being sampled with a sampling period of T as well as the measurement 140 and filtering steps. The inner loop 110 accepts, as input, a nominal control trajectory , the current state and
Figure imgf000018_0001
parameter values and and generates a new control policy after executing the forward and backward pass operations. The new control policy enables the generation of an updated control trajectory for use as the nominal control trajectory for future iterations of the inner loop. For each outer loop iteration, the measurements and the existing knowledge about the system can be used to update the estimate of the states with the use of a filter 150 such as a sigma-point Kalman filter [46]. Here, where the iLQG algorithm is neither adaptive nor dual, the uncertain parameters d are treated as constants, and an augmented constants vector can be formed by their concatenation. Adaptive Control Adaptive control is the control of systems with uncertain parameters that are constant or time-varying. This uncertainty can be due to the cost or complexity of accurately measuring the parameters being high, or when the control scheme is to be applied to multiple similar systems that have different values for the parameters [15]. In [28], adaptive control methods are divided into four categories: gain scheduling, model- reference adaptive control, self-tuning adaptive control, and dual control. The idea behind gain scheduling is to change the controller or its parameters based on pre-defined conditions. A series of local controllers are required to implement gain scheduling, which are each tuned for their operating range, and a method of switching between them. The selection of the local controller can be seen as an adaptive process. A model reference adaptive controller adjusts the estimates of a model’s parameters such that the tracking error of the plant converges to zero. Here, the control and adaptation laws are coupled, and these can be derived using Lyapunov theory such that stability can be shown. Although perfect can be achieved, this condition does not imply that the parameter estimates have converged on their true values; a control task must have sufficient richness for parameter convergence [55]. In the self-tuning adaptive control method, the plant parameters are recursively estimated from the input-output data and then used by a controller as if they were the true plant parameters. Using the estimated value as if it were the true value is often called the certainty equivalence principle, allowing the design of the estimation process and controller to be independent, unlike in model-reference adaptive control. With self-tuning adaptive control there are also no guarantees of parameter convergence without sufficient richness, and the design of the estimator and the controller being independent makes the stability of the system harder to prove [55]. Dual control is theoretically the ideal adaptive control method [28], and one of the major differences is that dual control takes into account the fact that uncertainties will be reduced in the future. This consideration of future information allows for a controller that can probe the plant and make control actions to reduce the parameter uncertainty in the future in a cautious way and focus on the most relevant parameters. Dual Control Bayard et al. divide stochastic control policies into three classes: open-loop, feedback, and closed-loop [7]. Open-loop control policies do not use any process measurements and therefore no learning occurs. Feedback control policies use all measurements up to the current time step, and therefore learning can occur, but the learning is passive since the data generated is only due to performing the control task. Closed-loop control policies also use all measurements up to the current time step, but additionally, they anticipate that future measurements are going to be made in the future. Taking future measurements into account to determine the current control action allows planned, or active, learning to take place. As the control policy has two objectives, system regulation and identification, these stochastic control methods are known as dual controllers. Due to the curse of dimensionality, Bellman’s equations cannot be efficiently solved for problems of arbitrary size, and therefore the explicit and implicit approximations of dual control are needed for its practical implementation. The explicit approximation involves modifying the cost function to elicit one or more of the dual behaviours of probing, caution, and selectiveness. The implicit approximation on the other hand elicits these dual behaviours through a method that allows the control algorithm to consider the impact of probing actions on future costs. Adding terms to the cost function with the explicit approximation requires an explicit trade-off between the control objectives and the system identification, while with the implicit approximation, the controller can balance these dual objectives without additional information. While it is simple to add extra terms to the cost function to elicit the dual features of caution, probing, and/or selectiveness, this approach fixes the value of system information relative to the minimization of the rest of the cost function. Even with methods that vary the value of system information based on a specific measure, explicit approaches are likely to overvalue or undervalue system information at different points in time compared to implicit approaches. In implicit approaches, the value of system information is determined through the consideration of its impact on the unmodified cost function over the control horizon. For this reason, implicit dual approximations are considered superior to explicit approaches. (a) Explicit dual control As the explicit dual approach is manifested through the cost function, many control approaches have been used as a basis for explicit dual control. In [10], a self-tuning adaptive controller for linear systems was modified to include caution and probing terms. In [43], a linear MPC approach was modified to include the dual behaviour of probing by adding a sufficient richness condition such that each control input considers the control inputs that have recently been made, allowing for persistent excitation. A similar approach was used in [23] for linear single input single output systems and was extended in [21] to include disturbances, and in [30] to multiple input multiple output systems with unmeasured stochastic disturbances. A robust tube-based MPC was combined with a partially closed- loop stochastic MPC to give the cautious and probing features of dual control in an explicit way in [16], and was later extended to use an unscented Kalman filter for state estimation in [17]. In [66], a robust invariant set-based MPC method was applied to linear systems with additive and parametric uncertainties using MPC that was formulated to select the controls that maximize probing and regulation objectives. In [31, 32], explicit dual NMPC is developed using sequential optimal experimental design. Due to the relative ease of modifying a cost function to encourage dual behaviours, there have been more publications on explicit dual control than the implicit approximation. Lastly, and very relevant to the present work, in 1976, Bar-Shalom and Tse presented an explicit dual control algorithm known as wide-sense dual control in [4]. Wide-sense dual control is a dual version of differential dynamic programming, which was developed by Mayne in 1966 [44], and Bar-Shalom and Tse appear to not have known about Mayne’s prior work. Bar-Shalom and Tse use a partial certainty equivalence assumption to treat the estimates of the states and parameters as their true values, which is only suitable for the additive noise that they use. This assumption makes wide-sense dual control essentially an explicit dual fully observable iLQG with a second-order approximation of the dynamics instead of a first-order approximation. Li and Todorov developed iLQR in 2004 [35] and in 2006 [36], and they appear to not have known about Bar-Shalom and Tse’s prior work, basing their work on Mayne’s differential dynamic programming [3]. In 2010, Theodorou, Tassa, and Todorov [62] derived stochastic differential dynamic programming for multiplicative noise, which, other than not being dual, is a more general version of wide-sense dual control. For more details on these comparisons, see FIGS.1A-1G. Comparing wide-sense dual control with fully observable iLQG, other than the use of a second-order approximation of the dynamics and the use of an augmented state vector, there is a difference in the scalar term of the quadratic approximation of the cost-to- go function. Using different notation, Bar-Shalom and Tse list the scalar component of the assumed quadratic form of the cost-to-go function as +
Figure imgf000021_0001
which can be written in the notation used by Li and Todorov in [37] as
Figure imgf000021_0002
where f x is the state covariance along the nominal trajectory. These terms provide an explicit addition to the cost function used for differential dynamic programming [58] to balance the notion of caution and probing. They go beyond the terms needed for non-dual control, as evidenced by the fact that the additive noise should have no impact on the control policy for differential dynamic programming [62]. For comparison, the non-dual scalar term of the quadratic approximation of the cost-to-go function is: = . (2.75) For context, if
Figure imgf000021_0003
and Tse had added multiplicative noise, the equation would have gained , , , and if the states were not fully observable, it
Figure imgf000021_0004
also gains the term , , [37], as in
Figure imgf000021_0005
equation (2.50). Bar-Shalom and Tse point to their terms + , as implementing the dual features of caution and probing respectively [4], and this makes sense because changing the coefficient in front of of those terms will explicitly affect the balance of caution versus probing. The inclusion of that term accordingly makes their wide- sense dual control approach an explicit dual control algorithm that can accommodate additive noise. Bar-Shalom and Tse’s work is extended in [28] by estimating the system dynamics using a parametric, Gaussian process and neural network regression. Like Bar- Shalom and Tse’s work on which it is based, this control method is explicitly dual and only applicable to additive noise. One application shows how dual NMPC can be applied to control the climate of a building, and this example is also detailed in [27]. Bar-Shalom and Tse also refer to time-scale separation and reformulating this implicit dual NMPC method without the dynamic programming portion to make it solely NMPC-based. Although this control strategy was formulated considering unknown time-varying system parameters, the stability of the controller and providing guarantees on the parameter estimation were not considered. (b) Implicit dual control As with the work on explicit dual control, many of the implicit dual control publications have been based on MPC, but several have taken other approaches. These works also make different assumptions on the uncertain parameters and use different estimation approaches. Multi-stage NMPC is used as an implicit approximation to dual control in [60], where the unknown system parameters were assumed to be bounded, parametric, and time- invariant. Here, the Fisher information matrix is used to estimate the future reduction in uncertainties, but the least squares estimate of the uncertainties is assumed to be constant. The parameter sensitivities are included in the scenario tree in the same way as the states, allowing the controller to make decisions based on probing for information on specific parameters. More involved estimation schemes for the uncertainties and their bounds are mentioned, including guaranteed parameter estimation. This dual control method is then applied to the control of a simulated chemical batch reactor. This work was extended in [59], and the assumption that the least squares estimate has converged to the true value of the uncertainty is removed. This assumption is replaced by an over-approximation factor to estimate the future changes in the point estimation. Here, guaranteed parameter estimation is mentioned as a possible future extension of this work, but was not explored. This approach is used as a basis of comparison throughout this work and is therefore explained in more detail in the following section. In [20], a multi-stage implicit dual MPC method is presented where the unknown system parameters are assumed to be time-invariant. Here, the future measurements at each stage are used to update the states and parameters for all of the other scenarios in that stage using an ensemble filter. A similar approach was used in [56], except that an unscented Kalman filter is used instead of an ensemble Kalman filter, and the model parameters are not updated (and therefore this method is not dual). Using a policy iteration approach and particle filtering for the nonlinear estimation problem an implicit dual control method is developed in [7]. This strategy is applied to a generic system with process and measurement noise, and the computations are performed in a novel H-block format. This control scheme is then applied to the simulation of a linear pendulum with uncertain length, mass, and control gain. In [53], the finite-horizon implicit dual control problem is made tractable for problems of reasonable dimensions by approximating the system dynamics by a partially observable Markov decision process. This control method is then applied to a health care decision-making process. One method that has been developed to implement implicit dual control is multi-stage NMPC. Multi-stage NMPC considers a branching network of scenarios, where each branch represents the system’s predicted response for a single realization of the uncertainties. This scenario tree is illustrated in FIG.3, where x represents the system states, u represents the control actions, and d represents the discrete realizations of the uncertainties with superscripts indicating the scenario number and subscripts indicating the time index into the future. It is noted that the control actions that share a node are equal, and this requirement is known as the non-anticipatory constraint. The reduction in the uncertainties due to future measurements is estimated for each time step in a robust horizon, and this reduction is reflected in the selection of the uncertainty values upon which the subsequent scenarios are based. As this process causes the number of scenarios to grow exponentially, the robust horizon is usually chosen to be less than the control horizon, and the uncertainties after that point are assumed to be equal to the nominal value. The optimization problem that is solved is to determine the control actions (over the control horizon) that minimize the sum of the costs of the scenarios over the prediction horizon. To ensure that none of the scenarios are excited more than necessary, the selected control action is cautious when uncertainties are high. Probing is also introduced as small control actions that may increase a scenario’s cost initially but can lead to larger reductions in future costs associated with the uncertainty. In the same manner, selectiveness is introduced, as probing actions that have a higher “return on investment” will be prioritized, and therefore the most important uncertainties will be reduced. The multi-stage NMPC method relies on predicting the future reductions in uncertainties for given control actions, and even when representing uncertainties by a discrete set of realizations, the number of scenarios grows exponentially and must be limited. The selection of the discrete realizations of the uncertainties impacts the controller’s robustness and computational For linear systems, using the minimum, maximum, and nominal values of the uncertainties (assuming they are bounded) keeps the number of scenarios per branch low and can be shown to be “usually” robust [60]. For nonlinear systems, this approach is not robust but has shown good results in practice [60]. In [11], uncertainties in a stochastic NMPC method were propagated using the unscented Kalman filter, but this control approach was neither dual nor multi-stage. Dual multi-stage NMPC with the unscented Kalman filter method was shown to be robust in [38]. Present implementations of dual and non-dual multi-stage NMPC [1, 40, 41, 45, 61] use IPOPT, an interior-point optimization routine for large-scale problems [65], and CasADi, an open-source numerical optimization framework designed to solve optimal control problems [2], from within Python, MATLAB, or Octave. To symbolically calculate the confidence regions that are needed to set up and solve the dual multi-stage NMPC problem, the eigenvectors and eigenvalues of the parameter covariance matrix are required. Currently, CasADi’s symbolic eigenvector and eigenvalue functions only support matrices up to 3×3 in size, limiting the number of uncertain parameters to three. The code from [61] that was used in this work was only able to handle a maximum of two parameters. Therefore in this work, comparing the present dual iLQG approach with the dual MS-SP-NMPC approach was only done on systems with two parameters. SUMMARY Systems and methods are provided for determining control actions for controlling a stochastic system via an implicit dual control using a computer processor and associated memory encoded with an augmented mathematical model of dynamics of the system, the augmented mathematical model characterizing the dynamics of the states and uncertain parameters of the system. A stochastic differential-dynamic-programming-based algorithm is employed to process an augmented state data structure characterizing the states of the stochastic system and the one or more uncertain parameters, and an augmented state covariance data structure comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the uncertain parameters, according to the augmented mathematical model and a cost function, in which the uncertain parameters are treated as additional states subject to the augmented mathematical model, to determine a control policy for reducing cost through implicitly generated dual features of probing, caution, and selectiveness. Accordingly, in a first aspect, there is provided a computer-implemented method of determining control actions for controlling a stochastic system according to an implicit dual controller, the method comprising: providing control and processing circuity comprising at least one processor and associated memory, the memory comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model a mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the augmented mathematical model augmented to include dynamics of one or more uncertain parameters; storing, in the memory, an initialized form of an augmented state data structure, the augmented state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; storing, in the memory, via the processor, an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the one or more uncertain parameters, thereby grouping data elements of the respective covariance matrices of the states and the one or more uncertain parameters within the augmented state covariance data structure; storing, in the memory, via the processor, an initialized form of a nominal control trajectory data structure; performing a control loop iteration via the processor by: a) processing, according to an augmented stochastic differential-dynamic- programming-based algorithm, the augmented state data structure, the augmented state covariance data structure and the nominal control trajectory data structure according to the augmented mathematical model and a cost function such that a forward pass and a backward pass performed when executing the augmented stochastic differential-dynamic-programming-based algorithm, in which the one or more uncertain parameters are treated as additional states subject to the augmented mathematical model, results in a control policy configured to reduce cost through implicitly generated dual features of probing, caution, and selectiveness, thereby achieving convergence faster than a corresponding method absent of augmentation; b) processing the control policy to determine a control action for controlling the stochastic system; c) receiving one or more output measurements of an output of the stochastic system; and d) processing the one or more output measurements to update the augmented state data structure and to update the augmented state covariance data structure; and repeating steps a) to d) one or more times based on an updated nominal control trajectory data structure to determine and apply control actions over a plurality of time steps, thereby implicitly incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to implicitly lower a total cost of the control actions. In one example implementation of the method, the mathematical model is configured such that all of the states are observable, and performing step d), the one or more uncertain parameters are updated via a filter. In one example implementation of the method, at least one of the states is unobservable, and wherein, when performing step d), a filter is configured to employ the augmented mathematical model to estimate updated values of unknown states and the one or more uncertain parameters by processing the augmented state data structure and the augmented state covariance data structure. In one example implementation of the method, for at least one time step, step b) is performed after performing steps c) and d), such that the control policy is determined based on newly determined states. In one example implementation of the method, the control action is autonomously applied to the stochastic system for at least one time step. In one example implementation of the method, the control action is not applied to the stochastic system for at least one time step. In one example implementation of the method, the control and processing circuity is encoded with the augmented mathematical model such that the augmented mathematical model is characterized by multiplicative noise. In one example implementation of the method, the control and processing circuity is encoded with the augmented mathematical model such that at least one of the uncertain parameters is modeled as a time-dependent parameter. In one example implementation of the method, the control and processing circuity is encoded to employ a moving control horizon. In one example implementation of the method, the mathematical model is obtained by data- driven modeling. In one example implementation of the method, the mathematical model is obtained by regression-based data-driven modeling. In one example implementation of the method, the augmented stochastic differential- dynamic-programming-based algorithm is based on an algorithm selected from the group consisting of the iterative linear quadratic Gaussian (iLQG) algorithm, the stochastic dual dynamic programming (SDDP) algorithm, and variations thereof. In one example implementation of the method, the augmented stochastic differential- dynamic-programming-based algorithm is based on an algorithm selected from the group consisting of dual dynamic programming, the sequential linear quadratic (SLQ) algorithm, and the iterative linear quadratic regulator (iLQR) algorithm, and variations thereof, all being modified to include stochastic terms. In one example implementation of the method, the stochastic system is an industrial system for producing or refining a product. At least one of the one or more uncertain parameters may be a composition of a feedstock, and wherein the control action comprises controlling an input rate of the feedstock. The industrial system may be an anerobic digestion system and the feedstock is an organic feedstock suitable for digestion by In one example implementation of the method, the stochastic system comprises a population, wherein the states comprises a plurality of infection status states of the population, wherein the mathematical model simulates spread and dynamics of an infectious disease among the population, and wherein the control policy is configured to determine, at least in part, a severity of public policy actions for containing spread of the infectious disease, and wherein the one or more uncertain parameters comprise a minimum rate of infection when a maximum severity of public policy is applied. In one example implementation of the method, the stochastic system is an autonomous vehicle, and where at least one uncertain parameter is associated with an uncertainty caused by an impact of an environment on dynamics of the autonomous vehicle. The one or more uncertain parameters may comprise at least one of a friction coefficient and a drag coefficient having uncertainty due to external environmental conditions. In one example implementation of the method, the stochastic system is an individual undergoing rehabilitation, wherein the states characterize at least one of participation, activities, health condition, body functions and structures, environmental factors, and personal factors, and wherein the one or more uncertain parameters comprise gains and time-constants involving interactions between the states in response to rehabilitation control actions. In one example implementation of the method, the stochastic system is a wearable robotic system, and wherein at least one uncertain parameter is tunable on a per-user basis. In one example implementation of the method, the stochastic system is an industrial system, and wherein at least one uncertain parameter is associated with degradation of the industrial system, and wherein the method further comprises employing updated values of the at least one uncertain parameter and/or its updated uncertainty, obtained during control of the industrial system, to detect a fault associated with degradation of the industrial system. In one example implementation of the method, the stochastic system is a building climate control system, and wherein the one or more uncertain parameters comprise at least one of an uncertain parameter associated with external factor and an uncertain parameter characterizing a building-specific factor. In another aspect, there is provided a computer-implemented method of determining control actions for controlling a stochastic system according to an adaptive controller, the method comprising: providing control and processing circuity comprising at least one processor and associated memory, the memory comprising a mathematical model of dynamics of the stochastic system, the mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the memory also comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model comprising a mathematical model augmented to dynamics of one or more uncertain parameters; storing, in the memory, an initialized form of a state data structure, the state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system; storing, in the memory, an initialized form of a state covariance data structure, the state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system; storing, in the memory, an initialized form of an augmented state data structure, the augmented state data structure being provided as a third data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; storing, in the memory, an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a fourth data array of data elements comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the one or more uncertain parameters, thereby grouping data elements of the respective covariance matrices of the states and the one or more uncertain parameters within the augmented state covariance data structure; storing, in the memory, an initialized form of a nominal control trajectory data structure; performing a control loop iteration via the processor by: a) processing, according to a stochastic differential-dynamic-programming-based algorithm, the state data structure, the state covariance data structure and the nominal control trajectory data structure according to the mathematical model and a cost function, such that a forward pass and a backward pass performed when executing the stochastic differential-dynamic- programming-based algorithm, results in a control policy; b) processing the control policy to determine a control action for controlling the stochastic system; c) receiving one or more output measurements of an output of the stochastic system; and d) processing the one or more output measurements to update the state data structure, the state covariance data structure and the one or more uncertain parameters, wherein a filter is configured to employ the augmented mathematical model to estimate updated values of unknown states and the one or more uncertain parameters by processing the augmented state data structure and the augmented state covariance data structure; and repeating steps a) to d) one or more times based on an updated nominal control trajectory data structure to determine and apply control actions over a plurality of time steps, thereby incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to lower a total cost of the control actions. In one example implementation of the the stochastic differential-dynamic- programming-based algorithm is based on an algorithm selected from the group consisting of the iterative linear quadratic Gaussian (iLQG) algorithm, the stochastic dual dynamic programming (SDDP) algorithm, and variations thereof. In one example implementation of the method, the stochastic differential-dynamic- programming-based algorithm is based on an algorithm selected from the group consisting of dual dynamic programming, the sequential linear quadratic (SLQ) algorithm, and the iterative linear quadratic regulator (iLQR) algorithm, and variations thereof, all being modified to include stochastic terms. In another aspect, there is provided an implicit dual controller for controlling a stochastic system, the implicit dual controller comprising: control and processing circuity comprising at least one processor and associated memory, the memory comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model based on a mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the augmented mathematical model augmented to include dynamics of one or more uncertain parameters; the memory further comprising: an initialized form of an augmented state data structure, the augmented state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the one or more uncertain parameters, thereby grouping data elements of the respective covariance matrices of the states and the one or more uncertain parameters within the augmented state covariance data structure; an initialized form of a nominal control trajectory data structure; the memory further comprising instructions executable by said at least one processor for performing operations comprising: performing a control loop iteration by: a) processing, according to an augmented stochastic differential-dynamic- programming-based algorithm, the augmented state data structure, the augmented state covariance data structure and the nominal control trajectory data structure according to the augmented mathematical model and a cost function such that a forward pass and a backward pass performed when executing the augmented stochastic differential-dynamic-programming-based algorithm, in which the one or more uncertain parameters are as additional states subject to the augmented mathematical model, results in a control policy configured to reduce cost through implicitly generated dual features of probing, caution, and selectiveness, thereby achieving convergence faster than a corresponding method absent of augmentation; b) processing the control policy to determine a control action for controlling the stochastic system; c) receiving one or more output measurements of an output of the stochastic system; and d) processing the one or more output measurements to update the augmented state data structure and to update the augmented state covariance data structure; and repeating steps a) to d) one or more times based on an updated nominal control trajectory data structure to determine and apply control actions over a plurality of time steps, thereby implicitly incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to implicitly lower a total cost of the control actions. In another aspect, there is provided an adaptive controller for controlling a stochastic system, the adaptive controller comprising: control and processing circuity comprising at least one processor and associated memory, the memory comprising a mathematical model of dynamics of the stochastic system, the mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the memory also comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model comprising a mathematical model augmented to include dynamics of one or more uncertain parameters; the memory further comprising: an initialized form of a state data structure, the state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system; an initialized form of a state covariance data structure, the state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system; an initialized form of an augmented state data structure, the augmented state data structure being provided as a third data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a fourth data array of data elements comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the one or more uncertain parameters, thereby grouping data elements of the respective covariance matrices of the states and the one or more uncertain parameters the augmented state covariance data structure; and an initialized form of a nominal control trajectory data structure; the memory further comprising instructions executable by said at least one processor for performing operations comprising: performing a control loop iteration by: a) processing, according to a stochastic differential-dynamic-programming- based algorithm, the state data structure, the state covariance data structure and the nominal control trajectory data structure according to the mathematical model and a cost function, such that a forward pass and a backward pass performed when executing the stochastic differential-dynamic- programming-based algorithm, results in a control policy; b) processing the control policy to determine a control action for controlling the stochastic system; c) receiving one or more output measurements of an output of the stochastic system; and d) processing the one or more output measurements to update the state data structure, the state covariance data structure and the one or more uncertain parameters, wherein a filter is configured to employ the augmented mathematical model to estimate updated values of unknown states and the one or more uncertain parameters by processing the augmented state data structure and the augmented state covariance data structure; and repeating steps a) to d) one or more times based on an updated nominal control trajectory data structure to determine and apply control actions over a plurality of time steps, thereby incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to lower a total cost of the control actions. In another aspect, there is provided a system comprising: a stochastic physical subsystem; one or more sensors associated with said stochastic physical subsystem for measuring an output associated with said stochastic physical subsystem; and control and processing circuitry operably coupled to said stochastic physical system and said one or more sensors, said control and processing circuitry comprising at least one processor and associated memory, said memory comprising instructions executable by said at least one processor for performing operations comprising: (a) employing an augmented stochastic differential-dynamic-programming- based algorithm to determine control actions for controlling the stochastic physical subsystem, the augmented stochastic differential-dynamic-programming-based algorithm modelling the stochastic system, at least in part, according to a set of states and one or more uncertain parameters, and employing an augmented state vector comprising the states and the one or more uncertain parameters, an augmented covariance matrix that combines a covariance matrix of the states and a covariance of the one or more uncertain parameters; (b) applying the control actions to the stochastic physical subsystem; (c) receiving signals from the one or more sensors to measure output of the stochastic physical subsystem; (d) processing the output and the control actions to determine an augmented state vector estimate and an augmented covariance matrix estimate; and (e) repeating steps (a)-(d) to determine control actions for a plurality of time steps, such that each time that step (a) is repeated, the most recently determined augmented state vector estimate and augmented covariance matrix estimate are employed by the augmented stochastic differential-dynamic-programming-based algorithm; the augmented stochastic differential-dynamic-programming-based algorithm thereby implicitly incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to implicitly lower a total cost of a control trajectory. In another aspect, there is provided a method of controlling a stochastic physical system according to an implicit dual controller, the stochastic system being modeled, at least in part, according to a set of states and one or more uncertain parameters, the method comprising: (a) employing an augmented stochastic differential-dynamic-programming-based algorithm to determine control actions for controlling the stochastic physical system, the augmented stochastic differential-dynamic-programming-based algorithm employing an augmented state vector comprising the states and the one or more uncertain parameters, and an augmented covariance matrix that combines a covariance matrix of the states and a covariance matrix of the one or more uncertain parameters; (b) applying the control actions to the stochastic physical system; (c) measuring an output of the stochastic physical system; (d) processing the output and the control actions to determine an augmented state vector estimate and an augmented covariance matrix estimate; and (e) repeating steps (a)-(d) to determine control actions for a plurality of time steps, such that each time that step (a) is repeated, the most recently determined augmented state vector estimate and augmented covariance matrix estimate are employed by the augmented stochastic differential-dynamic-programming-based algorithm; the augmented stochastic differential-dynamic-programming-based algorithm thereby implicitly incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to implicitly lower a total cost of a control trajectory. A further understanding of the functional and advantageous aspects of the disclosure can be realized by reference to the following detailed description and drawings. BRIEF DESCRIPTION OF THE DRAWINGS Embodiments are described with to the accompanying drawings. In the drawings, like reference numbers can indicate identical or functionally similar elements. FIG.1A shows a comparison of DDP-like controllers. FIG.1B shows a detailed comparison of DDP-like controllers according to their titles. FIG.1C shows a detailed comparison of DDP-like controllers according to their algorithms. FIG.1D shows a detailed comparison of DDP-like controllers according to their treatment of noise. FIG.1E shows a detailed comparison of DDP-like controllers according to their approximations. FIG.1F shows a detailed comparison of DDP-like controllers according to their regularization. FIG.1G shows a detailed comparison of DDP-like controllers according to their constraints. FIG.2 shows a closed-loop iLQG system diagram (terms are defined in the list of variables). FIG.3 shows the tree of scenarios considered in multi-stage NMPC [59]. FIGS.4A, 4B and 4C show flow charts for the three variations of iLQG discussed in the present disclosure. FIG.4A is the same as FIG.2 and is shown here for ease of comparison, while FIGS.4B and 4C illustrate example adaptive and implicit dual control methods, respectively. Terms defined in the list of variables. FIG. 5A shows a control comparison between dual and adaptive iLQG and MS-SP- NMPC on linear example. FIG.5B shows a state comparison between dual and adaptive iLQG and MS-SP-NMPC on a linear example. FIG.5C shows a parameter comparison between dual and adaptive iLQG and MS-SP- NMPC on a linear example. FIG.5D shows a cost comparison between dual and adaptive iLQG and MS-SP-NMPC on a linear example. FIG.5E shows a control comparison between dual and adaptive iLQG on a time-varying parameters example. FIG.5F shows a state comparison between dual and adaptive iLQG on time-varying parameters example. FIG.5G shows a parameter comparison between dual and adaptive iLQG on a time-varying parameters example. FIG.5H shows a cost comparison between dual and adaptive iLQG on a time-varying parameters example. FIG. 6A shows the frequency (%) of final cost of 100 seeded runs of the iLQG algorithms for the time-varying parameter example. FIG. 6B shows the distribution of the simulation time required for 100 seeded runs of the iLQG algorithms for the time-varying parameter example. FIG.7A shows a control comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in controller. FIG.7B shows a state comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller. FIG.7C shows a parameter comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller. FIG.7D shows a parameter ratio comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller. FIG.7E shows a cost comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller. FIG.8A shows a model-reference adaptive control flowchart for Rohr’s example [55]. FIG.8B shows a control comparison between dual and adaptive iLQG and model- reference adaptive control on an unmodelled dynamics example. FIG.8C shows a control comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example for first seven seconds. FIG.8D shows a state comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example. FIG.8E shows a state comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for first seven seconds. FIG.8F shows a parameter comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example. FIG.8G shows a parameter comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example for first seven seconds. FIG.8H shows a parameter ratio comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example. FIG.8I shows a parameter ratio comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for first seven seconds. FIG.8J shows an output comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example. FIG.8K shows an output comparison between dual and adaptive iLQG and model- reference adaptive control on the unmodelled dynamics example for first seven seconds. FIG.8L shows a cost comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example. FIG.8M shows a cost comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for first seven seconds. FIG. 9A shows an example implicit dual iLQG control system. FIG. 9B shows an example implicit dual iLQG control system with a joint estimation filter. FIG. 9C shows an example implicit dual iLQG control system with dual estimation filters. FIG. 9D shows an example implicit dual system employed for system with fully observable states. FIG. 9E illustrates how, within the inner loop of the method, the model structure and parameters are provided to the iLQG algorithm. FIG. 9F shows various example approaches to model generation. FIG. 10 shows a Venn Diagram schematically illustrating relationships between different types of differential dynamic programming-based algorithms. FIG.11 shows an example system with an implicit dual controller. FIG.12 shows parameters for the example AM2 model. FIG.13A shows a control comparison between dual and adaptive iLQG and dual MS- SP-NMPC on an AM2 anaerobic digestion model with two uncertain parameters. FIG.13B shows a state comparison between dual and adaptive iLQG and dual MS-SP- NMPC on the AM2 anaerobic digestion model with two uncertain parameters. FIG.13C shows a parameter comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the AM2 anaerobic digestion model with two uncertain parameters. FIG.13D shows a biogas production comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the AM2 anaerobic digestion model with two un-certain parameters. FIG.13E shows a cost comparison between dual and adaptive iLQG and dual MS-SP- NMPC on the AM2 anaerobic digestion model with two uncertain parameters. FIG.14 shows parameter values for the AM2 comparison with seventeen uncertainties. FIG.15A shows a control comparison between dual and adaptive iLQG on an AM2 anaerobic digestion model with seventeen uncertain parameters. FIG.15B shows a state comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters. FIG.15C shows a parameter comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters. FIG.15D shows a biogas production comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters. FIG.15E shows a cost comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters. FIG.16 shows a SIR model compartmental diagram [42]. FIG.17A shows a SIDARTHE model compartmental diagram [29]. FIG.17B shows parameters for the SIDARTHE model. FIG.18 shows a partial compartmental diagram for considering the impact of an over- whelmed ICU [29]. FIG.19A shows a control comparison between dual and adaptive iLQG and dual MS- SP-NMPC on a SIDARTHE COVID-19 with two uncertain parameters. FIG.19B shows a state comparison between dual and adaptive iLQG and dual MS-SP- NMPC on the SIDARTHE COVID-19 model with two uncertain parameters. FIG.19C shows a parameter comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the SIDARTHE COVID-19 model with two uncertain parameters. FIG.19D shows a cost comparison between dual and adaptive iLQG and dual MS-SP- NMPC on the SIDARTHE COVID-19 model with two uncertain parameters. FIG. 20A shows results from 100 seeded runs of the 2 parameter SIDARTHE COVID-19 model with several rolling horizon lengths. FIG. 20B shows a frequency (%) of final cost of 100 seeded runs of the iLQG algorithms for the 2 parameter SIDARTHE COVID-19 model with a rolling horizon length of 40. FIG. 20C shows a distribution of the simulation time required for 100 seeded runs of the iLQG algorithms. FIG.21 shows parameter values for the SIDARTHE comparison with sixteen uncertainties. FIG.22A shows a control comparison between dual and adaptive iLQG on a modified SIDARTHE COVID-19 model with sixteen uncertain parameters. FIG.22B shows a state comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters. FIG.22C shows a parameter comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters. FIG.22D shows a cost comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters. DETAILED DESCRIPTION Various embodiments and aspects of the disclosure will be described with reference to details discussed below. The following description and drawings are illustrative of the disclosure and are not to be construed as limiting the disclosure. Numerous specific details are described to provide a thorough understanding of various embodiments of the present disclosure. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments of the present disclosure. As used herein, the terms “comprises” and “comprising” are to be construed as being inclusive and open ended, and not exclusive. Specifically, when used in the specification and claims, the terms “comprises” and “comprising” and variations thereof mean the specified features, steps or components are included. These terms are not to be interpreted to exclude the presence of other features, steps or components. As used herein, the term “exemplary” means “serving as an example, instance, or illustration,” and should not be construed as preferred or advantageous over other configurations disclosed herein. As used herein, the terms “about” and “approximately” are meant to cover variations that may exist in the upper and lower limits of the ranges of values, such as variations in properties, parameters, and dimensions. Unless otherwise specified, the terms “about” and “approximately” mean plus or minus 25 percent or less. It is to be understood that unless otherwise specified, any specified range or group is as a shorthand way of referring to each and every member of a range or group individually, as well as each and every possible sub-range or sub-group encompassed therein and similarly with respect to any sub-ranges or sub-groups therein. Unless otherwise specified, the present disclosure relates to and explicitly incorporates each and every specific member and combination of sub-ranges or sub-groups. The iterative linear quadratic Gaussian (iLQG) method is a powerful control technique due to its ability to handle nonlinear and stochastic systems with multiplicative noise but has not been extended to handle systems with uncertain parameters in either an adaptive or dual manner. iLQG, which can calculate locally-optimal control policies for nonlinear stochastic systems, is based on continuous state space with the use of derivatives of a linearized system about a nominal control trajectory and is related to Pontryagin’s maximum principle. Although iLQG is not a dual or adaptive control algorithm, it can handle nonlinear and stochastic systems with many states through its derivative-based approach in continuous state space. The present inventors realized that by modifying iLQG to treat the uncertain parameters as uncertain states, the resulting adaptive and dual iLQG control algorithm can predict how changes to the inputs and states can result in future reductions in the parameter uncertainty and therefore increase overall performance (lower costs). In particular, by calculating the derivatives of the (implicit) cost function at each time step, dual iLQG (and variations thereof employing other stochastic DDP-based algorithms) can identify changes to the inputs that can decrease parameter uncertainty, and although these actions have an associated cost, they decrease the overall cost over the control trajectory. Adaptive and dual iLQG, described below, represents a fast (due to the linearization of the system) and feasible (due to working with derivatives about a nominal state-control trajectory) solution to the implicit dual control of small and large systems while avoiding Bellman’s curse of dimensionality. Accordingly, in present work, an existing derivative-based control method is extended to handle systems with uncertain parameters in either an adaptive or dual manner. Adaptive iLQG To extend iLQG to uncertain systems in an adaptive manner, two changes are made to the closed-loop iLQG approach shown in FIG.4A (and FIG.2). First, the initial estimates of the uncertain parameters are to the iLQG inner loop as constants. Second, in the outer loop, the parameters are estimated along with the states in the filter. This is done by concatenating the uncertain states and parameters together into a single augmented state vector, =
Figure imgf000038_0001
where is the augmented state vector and is the uncertain parameter trajectory, and their respective covariance matrices are also combined in a block diagonal manner, (3.2)
Figure imgf000038_0002
where is the augmented covariance matrix and
Figure imgf000038_0003
is the parameter covariance matrix. The augmentation of the state vector with the parameters for the closed-loop filter is an approach that is known as joint simultaneous state and parameter estimation [49]. To pass the parameters to the inner loop iLQG algorithm, an augmented constants vector ca is created similar to the augmented state vector, through the concatenation of the constants and the parameters,
Figure imgf000038_0004
Moreover, when the processing the augmented forms of the state vector and the covariance matrix by the filter in the outer loop, an augmented form of the state dynamics
Figure imgf000038_0005
where ( , ) is the parameter dynamics and ( , ) is the augmented system dynamics. This approach allows adaptive iLQG to update its parameter estimates in the outer loop after getting new measurements and pass them to the inner loop iLQG algorithm as constants, which is known as the certainty equivalence principle. Critically, because the parameters are treated as constants in the inner loop iLQG algorithm, the uncertainty associated with the parameters does not impact the determination of the control policy. These changes are shown in the system diagram for closed-loop adaptive iLQG in FIG.4B. As can be seen in the figure, the state vector is employed by the filter 150 in the outer loop, but the non-augmented forms of the current state and parameter values and are employed when executing the inner loop 110. Although adaptive iLQG controllers have been developed, they are generally focused on identifying the system dynamics as a whole as opposed to a set of parameters in an existing model of the system. In [34], an adaptive iLQG algorithm was developed where the entire Gaussian Process system dynamics were adapted using Gaussian Process regression. The system dynamics are also adapted in [48] using Locally Weighted Projection Regression. Dual iLQG To extend this adaptive iLQG approach to be dual, the uncertainty associated with the parameters influences the control policy. Therefore, instead of treating the parameters as constants in the inner loop iLQG algorithm, the parameters are treated as states and an augmented state vector is formed as shown in equation (3.1). An augmented state covariance is also created as shown in equation (3.2). Notably, unlike the adaptive method described above, the present dual iLQG method involves the processing of the augmented state dynamics in the inner loop. The inclusion of the parameter dynamics makes dual iLQG able to handle time-varying parameters. These changes are represented in the system diagram in FIG.4C. As can be seen in the figure, the augmented state vector is employed when executing the inner loop 110 according to the augmented form of the state dynamics, such that the uncertain parameters are treated as states by the inner loop, with the uncertain parameters governed by the parameter dynamics prescribed by the augmented state dynamics. In this way, the iLQG algorithm treats the parameters as unmeasured states and allows the control algorithm to predict how changes to the inputs and states can result in future reductions in the parameter uncertainty, through the backward pass of the cost-to- go function, to lower the total cost of the control trajectory. The parameter uncertainty influences the control actions through the inner loop Kalman filter gain, which impacts the estimates of the augmented state vector and the cost function at each time step. By calculating the derivatives of the cost function at each time step, dual iLQG can identify changes to the inputs that can decrease parameter uncertainties while also decreasing the total cost of the control trajectory. When implementing the aforementioned adaptive and implicit dual algorithms on a computing system having processing hardware and memory, augmented data structures are employed that facilitate the processing of the augmented dynamics within the iLQG inner loop and/or the filter residing in the outer loop. A first augmented data structure is an augmented state data structure, provided in a matrix, for example, a 1-D array (or alternatively in a multi-dimensional array), which is a concatenation of the data of the state vector and the data elements of the uncertain parameter vector, as shown in equation (3.1). This new augmented data structure allows the uncertain parameters to be treated as states in whatever part of the algorithm it is applied. A second augmented data structure is an augmented state covariance matrix, which is initialized as a combination of the data elements from the state covariance matrix and data elements from the uncertain parameter covariance matrix in a block diagonal manner, as shown in equation (3.2). This new data structure contains the confidence information for the associated augmented state vector, and when used with the augmented state vector, allows for the states and parameters to be treated as a single stochastic entity in whatever part of the algorithm it is applied. As noted above, the augmented state dynamics are also encoded, in functional logical form, into the memory of the computer system for processing, in which the state dynamics function and the parameter dynamics function are concatenated, as shown in equation (3.4). This encoded functional form allows the modelled dynamics of the augmented state vector to be calculated as a single dynamic system. In the case of the adaptive control method, the augmented state and augmented state covariance data structures are employed in the outer-loop filter of the closed-loop iLQG algorithm to give the algorithm the new ability of being adaptive, thus creating the new adaptive iLQG algorithm. These two data structures allow the outer loop filter to treat the parameters as states, and through the use of the information contained in the measurements from the true system, leads to updated estimates of the augmented state vector and augmented state covariance matrix for the next iteration of the algorithm. Using these data structures allows for the estimates of the uncertain parameters to be updated for each iteration of the algorithm, which can lead the adaptive iLQG algorithm to perform better than the closed-loop iLQG algorithm in terms of the evaluation of the total cost function. Although these data structures take up more space on the computer memory than the data elements from the state vector or state covariance matrix alone, the adaptive ability that they create when used in the closed-loop iLQG algorithm can allow for the computer to produce a better solution for controlling the physical system than the closed-loop iLQG algorithm. Additionally, the adaptive iLQG algorithm can produce this better solution with a similar number of processor cycles, and therefore can be more computationally efficient than the closed-loop iLQG algorithm. It will be understood that when implementing the systems and methods disclosed herein, an augmented covariance data structure that includes the covariance matrix of the states and the covariance matrix of the one or more uncertain parameters will include the data elements corresponding to those present in the covariance matrix of the states and those present in the covariance matrix of the one or more uncertain parameters, but need not be provided in a standard covariance matrix form, provided that the data elements of the augmented covariance data structure can be accessed and processed by the computer hardware implementing the method. In the implicit dual control method, the augmented state data structure and augmented state covariance matrix data structures are employed outer-loop filter and in the inner-loop iLQG algorithm along with the augmented state dynamics function data structure to give the closed-loop iLQG algorithm the new ability of being dual, thus creating the new dual iLQG algorithm. The augmented state data structure and augmented state covariance data structures allow the outer-loop filter to treat the parameters as states, and through the use of the information contained in the measurements from the true system, leads to updated estimates of the augmented state vector and augmented state covariance matrix for the next iteration of the algorithm. The use of the augmented state data structure in the inner-loop iLQG algorithm allows the dual iLQG algorithm to treat the parameters as unmeasured states and allows the dual iLQG algorithm to predict how changes to the control inputs and states can result in future reductions in the parameter uncertainty, through the backward pass of the cost-to-go function, and allow for a lower total cost of the control trajectory. The parameter uncertainty that is encoded in the augmented state covariance data structure influences the control actions through the inner loop Kalman filter gain, which impacts the estimates of the augmented state vector data structure and the cost function at each time step. By calculating the derivatives of the cost function at each time step, dual iLQG can identify changes to the inputs that can decrease parameter uncertainties while also decreasing the total cost of the control trajectory. The use of these data structures in this way can lead the dual iLQG algorithm to perform better than the closed-loop iLQG and adaptive iLQG algorithms in terms of the evaluation of the total cost function. These data structures allow for a dual approach that is derivative-based and can handle applications with higher numbers of states and parameters without the common issue of Bellman’s curse of dimensionality that limits the use of conventional implicit dual control algorithms on computer systems. The curse of dimensionality is a phenomenon in which the size of a stochastic computational problem, in terms of memory and/or processor requirements, grows exponentially with a linear increase in the number of states and parameters, as described previously for the case of dual SP-MS-NMPC. The derivative-based approach of the closed-loop iLQG algorithm allows it to search along a nominal control trajectory for where changes to the control trajectory are likely to reduce the total cost. The use of these data structures takes this existing closed-loop iLQG algorithm and significantly improves it by making it dual, allowing the dual iLQG algorithm to use the same computer resources, in terms of memory and processing, to solve larger problems than other implicit dual control approaches. Although these augmented data structures and encoded augmented state dynamics take up more space on the computer memory than the state vector, state covariance matrix, or state dynamics function alone, the dual ability that they create when used in the closed-loop iLQG algorithm can allow for the computer to produce a better solution for controlling the stochastic system than the closed-loop iLQG algorithm or the adaptive iLQG algorithm. Additionally, the dual iLQG algorithm can produce this better solution with a similar number of processor cycles, and therefore can be more computationally efficient than the closed-loop iLQG algorithm and the adaptive iLQG algorithm, as shown below. It is important to note that the present dual iLQG methods are implicitly dual. Unlike the explicit dual approach of including terms in the cost function that promote one or more of the dual features of caution, probing, and selectiveness, implicit dual approaches allow for flexibility in varying the trade off between cost function minimization and parameter identification in the optimization process, without including specific terms in the cost function that explicitly promote these dual features. Dual iLQG is implicitly dual through the augmented state and covariance data structures that allow it to identify changes to the control trajectory that can decrease parameter uncertainty, through the use of derivatives of the cost function. Although these control changes may have an associated cost, they decrease the overall cost of the control trajectory. By not prescribing the relative importance of cost function minimization and the three dual features, implicit methods can give better results than explicit methods [8]. Seeding approach Due to the dual iLQG approach being locally optimal, a seeding method may be incorporated to initialize the optimization problem with different sets of initial conditions. In the iLQG algorithm, the nominal control trajectory is the variable that is used to initialize the optimization and is iteratively improved upon through successive runs of the inner loop iLQG algorithm [35]. By running multiple instances of a single inner loop run with different initial control trajectory seeds, a larger control space is explored, allowing a better solution to be found. Although this seeding process increases the performance of the dual iLQG algorithm, it linearly increases the computation time for each time step in which it is used. For applications where the computational time required for the dual iLQG algorithm is much shorter than the desired control update time for the true dynamic system, seeding could take place at every time step. Otherwise, significant performance increases can still be gained by having the seeding process only take place during the first time step. Moving horizon approaches When performing example implementations of the adaptive and implicit dual control methods, two different example moving horizon options were coded into the dual iLQG algorithm: shrinking and rolling horizon approaches. In the shrinking horizon approach, the controller simulates the system and solves for the control actions for the entire time horizon, from the present time step to the final time step. After the inner loop determines the optimal control policy for this time horizon and implements the first control action, the time horizon is reduced by a single time step and the inner loop of the algorithm is reinitialized. This process repeats until the final time step is reached, meaning that the calculation effort for each time step decreases as the algorithm progresses. The rolling horizon approach on hand has a fixed time horizon, such that after each inner loop of the algorithm is complete, the next outer loop iteration is initialized by shifting this fixed time horizon one step forward, dropping the present time step and adding a new time step on the end. Once the final time step is reached, although the controls and system responses have been determined for future time steps, they are not considered in the results. In this way, the calculation effort for each time step stays the same as the algorithm progresses. Comparison with dual MS-SP-NMPC To compare the dual iLQG approach with the dual multi-stage sigma-point MPC (MS-SP-MPC) approach described in [61], the linear system from that paper was used. The dynamic model of the system is: = (3.5) = where = [ , ] is the state vector, = [ , ] is the control input vector, and = [ , ] is the vector of the uncertain parameters. The true and estimated initial values of the state vector are both [0, 0] and the states are unconstrained, and the control inputs are also unconstrained. The true value of the constant parameter vector is [125, 50], while its initial estimate is [100, d of [8100, 4500; 4500, 5625]. This system is simulated over 6 time steps that are 0.05 seconds long. Both states are measured with a noise scaling factor of 10- 2, and as the MS-SP-MPC does not consider the states to be uncertain, the noise scaling factors for dual iLQG were set to 10- 15 for both the state and parameter dynamics. The cost function is given as: ( , ) = ( + 10 . (3.6) This problem is designed to demonstrate the advantage of dual control over adaptive control. In an adaptive optimal control method, the cost function can be minimized by maintaining u2 at zero and driving x1 to 5 using u1, as u2 does not influence x1. A dual control method on the other hand will temporarily use u2 to reduce the uncertainty associated with d2, even though it will incur a cost, to reduce the overall cost. Adaptive MS-SP-MPC, dual MS-SP-MPC, and dual iLQG were compared using this system, and the resulting controls are in FIG.5A. Starting with u2, it can be seen that the adaptive MS-SP-MPC maintains u2 at zero as expected. The dual MS-SP-MPC has a non-zero value of u2 for the first time step only, while the magnitude of u2 for the dual iLQG controller gradually steps down to zero over four time steps.
Figure imgf000044_0001
FIG.5B shows the resulting states for each of these controllers, the dual and adaptive MS-SP-MPCs are shown to undershoot the tracking reference of x1 = 5 by the end of the first time step, instead reaching 3.1 and 2.9 respectively, while the dual iLQG controller overshot to 5.6. After this point, both dual controllers tracked the state reference with only small errors, while the adaptive MS-SP-MPC took until the last time step to reach a similar performance. These controls result in both of the dual controllers accurately estimating the true parameter values of 125 and 50 after the first time step, as shown in FIG.5C, while the adaptive MS-SP-MPC did not reach a similar level of accuracy until the last time step. Additionally, the dual iLQG controller’s estimate of the uncertain parameters was maintained near the true values, while the dual MS-SP-MPC’s estimates varied. The cumulative cost at each time step for these controllers is shown in FIG.5D, with totals of 0.4, 3.61, and 4.65 for the dual iLQG, dual MS-SP-MPC, and adaptive MS-SP- MPC respectively. The dual iLQG algorithm took an average of 3.5 seconds to run, while the adaptive iLQG algorithm took 0.4 seconds, the adaptive MS-SP-MPC algorithm took 3.0 seconds and the dual MS-SP-MPC algorithm took 75 seconds. In this linear example, the dual iLQG controller significantly outperformed dual MS- SP-MPC by being less cautious with its initial value for u1 and by continuing to
Figure imgf000044_0002
u2 after the first time step to improve its estimate of the uncertain parameters. Robustness to time-varying parameters In many applications, parameters that are modelled as constant may vary over time. The ability of a controller to achieve the desired objective in the face of uncertain parameter dynamics can be described as the controller’s robustness to time-varying parameters. In this section, the previous linear example is modified to have time-varying parameters to explore dual iLQG’s robustness to time-varying parameters. For this example, the system dynamics are the same as shown in equation (3.5), and the initial state and parameter estimates also are the same and both the states and the controls remain unconstrained. The initial true value of the parameters is the same as before, but the true parameters vary with dynamics of:
Figure imgf000044_0003
), (3.8) while the controller believes that the parameters remain constant over time. Although the time step of 0.05 seconds was kept the same as the previous example, instead of simulating the system over six steps, twenty time steps were used to show the impact of the time-varying parameters. The measurement noise scaling factors were kept at 10- 2, but the noise associated with the state and parameter dynamics was increased to 5- 1 from 10- 15. The cost function also remained the same, as shown in equation (3.6). As would be expected from the results of the previous example, the dual controller used the second control action to gain information about the second parameter while the adaptive controller’s second control action remained near zero, as shown in FIG.5E. In terms of the first control action, both controllers had the same value for the first time step, after which they had similar but different values. In FIG.5F, both controllers can be seen to maintain the first state near 5 according to the cost function. Other than the second time step, adaptive iLQG appeared to track the desired reference at least as well as dual iLQG on average. FIG.5G shows each iLQG algorithm’s estimates of the time-varying parameters. In the first several time steps, the dual controller quickly converged to the true parameter values, while the adaptive iLQG controller took a couple of extra time steps to converge to the true value of the second parameter, and had a small offset error in its estimate of the first parameter. The dual controller’s estimates for the parameters were better than the adaptive controller’s estimates, and both controllers were better able to estimate the second parameter. FIG.5H shows each iLQG algorithm’s cumulative cost over the control horizon. Although the dual iLQG controller has a higher cost after the first time step, it subsequently maintains a lower cumulative cost than the adaptive iLQG algorithm. The dual iLQG algorithm’s higher cost for the first time step is due to its increased distance from the goal value of 5 for the first state and its use of the second control action to probe the system. This probing action allows the dual controller to have a final cost that is 16% lower than the adaptive controller. To explore the stability and robustness of these results, 100 initial control sequence seeds were run, with each element of each sequence drawn from a normal distribution with a mean of zero and a variance of 0.01. This variance was sufficient to generate a diversity of results for these simulations. A histogram of the final costs and a box plot of the simulation times for each of the algorithms is shown in FIGS.6A and 6B. In FIG.6A, the expected successive improvement between the three algorithms can be seen. Interestingly in this case, the best iLQG solution has roughly the same value as the median adaptive iLQG solution, and likewise the best adaptive iLQG solution has roughly the same value as the median dual iLQG solution. the variance of the solutions visibly decreases with the change in the algorithm, with dual iLQG giving more consistent results than the other two algorithms. These results show that although these algorithms give different results with different initial control trajectories, adaptive iLQG is likely to outperform iLQG, and dual iLQG is likely to outperform adaptive iLQG. Looking at the simulation time required for these seeded runs in FIG.6B, the results are not as may have been expected. Instead of seeing a pattern of increasing time with increasing algorithm complexity from iLQG to adaptive iLQG and dual iLQG, the classical iLQG solution appears to take more time in this example than would be expected. Perhaps the adaptation of the parameters in both the adaptive and dual cases that may have initially slowed the algorithms down led to solutions that subsequently converged faster at each time step than the non-adaptive iLQG case. Either way, the relationship between the adaptive and dual iLQG times are as expected, with the dual algorithm taking longer and having more of a spread. Interestingly, the results for both the adaptive and dual algorithm times are skewed downward, and only occasionally have longer run times. Overall, all three of these algorithms are within the same order of magnitude for the time they take to run. Overall, the dual iLQG controller achieved a 16% reduction in the cost function compared to the adaptive controller. This difference in performance was primarily due to dual iLQG’s increased ability to determine the uncertain parameters, although both dual and adaptive iLQG showed robustness to time-varying parameters in this example. In the seeded results, the consistent improvement of dual iLQG over adaptive iLQG and adaptive iLQG over iLQG was shown, as well as that they have comparable run times. Importance of compensating for multiplicative noise Dual iLQG can handle multiplicative noise where Bar-Shalom and Tse’s wide- sense dual control is limited to additive noise and MS-SP-NMPC does not consider process noise, only measurement noise. This can be important, as multiplicative noise appears in many applications such as the control of prostheses using myoelectric signals [54], robot motion planning when using sensors with distance-dependent errors [14], stochastic fluid dynamics [12], and power grids where renewable energy represents a significant portion of total power generation [19]. In this example, the importance of compensating for multiplicative noise in the controller was demonstrated using the simple system dynamics shown in equation (3.5) with multiplicative noise. For this comparison, dual iLQG was used twice, but in one case the multiplicative noise was accounted for in the controller, and in the other case it was not. This approach was used to keep all other variables except the one in question the same and is equivalent to using a controller that can only deal with additive noise when interacting with systems that have multiplicative noise. For this example, the system in equation (3.5) were modified to include multiplicative Brownian noise based on the control signals: = ( ) + ( ) (3.9)
Figure imgf000047_0001
. true values, and covariances are the same as in the Section titled “Comparison with dual MS-SP-NMPC”, and both the states and the controls remain unconstrained. The cost function, time step length, and number of time steps are also all the same as in this section. The results of this comparison in terms of the control signals are shown in FIG. 7A. As mentioned previously, in the example, dual controllers will use the second control action to gain information about the second parameter, allowing for increased ability to use the first control to get the first state to the desired value of 5. Here, the controller that is compensating for the multiplicative noise uses a smaller first u2 control action than the other controller, before making a larger control action in the second step. This action allows the compensating controller to gain some information on the system before having additional noise from a larger control step. The controller that is not compensating for the multiplicative noise takes the opposite strategy in the first two time steps. The remaining u2 and the u1results are relatively similar between the controllers. FIG.7B shows the effect of the controls on the states, where the influence of the objective function being the square of the distance between the first state and a value of five can be seen. The non-compensating controller overshoots to 5.45 at the end of the first time step, while the compensating controller overshoots to 5.31. By the second time step, the non-compensating controller has an x1 value of 5.31, while the compensating controller has a value of 5.03. The non-compensating controller then takes three more time steps to get lower than 5.03, while the compensating controller stays within ± 0.04 of five. A plot of the estimates of the two uncertain parameters is shown in FIG.7C. Both controllers start to converge to estimates lower than the true value of the parameters, the non- compensating controller from above and the compensating controller from below. As with the unmodelled dynamics example, in this case it is not the absolute parameter values that are important for the controller, but the ratio of the parameter values. FIG.7D shows the ratio of the parameters for the two controllers over the simulation. Here, it is much more obvious that the compensating controller’s estimate of the parameter ratio is more accurate than that of the non-compensating controller for all but a single time step. FIG.7E shows the cumulative cost of these controllers over the simulation period. The major changes in cost occur in two time steps, with the compensating controller’s cost leveling out after the first time step and the non- compensating controller’s cost leveling out after the third time step. The controller that is compensating for the multiplicative noise has a total cost that is 62.1% lower than the controller that does not compensate for the multiplicative noise. This section demonstrates the importance of using a dual controller that can properly compensate for multiplicative noise in systems where multiplicative noise exists. In the simple example presented, the difference in cost was significant at 62.1%. Robustness to unmodelled dynamics System models are often simplifications of the true dynamics of a given system, where nonlinearities or higher-order dynamics are excluded from the control model. These simplifications can be due to the cost associated with a more complete system identification or the difficulty of quantifying the mathematical relationship of a given phenomenon. The difference between the response that the algorithm expects from the system and the true response of the system can cause the values of the adapting parameters to drift over time, possibly leading to the system becoming unstable. In Slotine and Li’s Applied Nonlinear Control textbook [55], Rohr’s example is used to illustrate the effect of unmodelled dynamics, showing that, unless a dead zone is used, the given adaptation law causes the first-order system to become unstable when driven to a constant reference in the face of a sinusoidal measurement noise. This simple example is used here to demonstrate dual iLQG’s robustness to unmodelled dynamics in this case. Rohr’s example considers a desired performance that is described by a reference system that is a first-order system with a transfer function of:
Figure imgf000048_0001
(3.11) where and are parameters both with a value of 3. The true system is a third-order
Figure imgf000048_0002
where and are the true system parameters with values of 2 and 1, respectively. For this example, the controller only has knowledge of the first-order portion of this model, leaving the second-order portion of the transfer function unmodelled. These transfer functions give state models of: + (3.13) = (3.14) for the first-order system, where is the system output, and:
Figure imgf000049_0001
for the true third-order system. The system also has a measurement noise of 0.5sin(16.1 ). In [55], model-reference adaptive control is used to control this system and is implemented as shown in FIG.8A. In model-reference adaptive control, the desired system behaviour is specified through the use of a reference model. The controller then attempts to make the true system’s output equal to the output from the reference model by adjusting the input to the true system based on a set of known regressors and adapting parameters. This allows model-reference adaptive control to adapt a set of compensating parameters to account for the true system’s behaviour and impose the desired behaviour onto the system. This is different than the dual iLQG approach which is to adapt its estimates of the true system’s parameters to a sufficient level to minimize the desired cost function. To implement model-reference adaptive control on this example, a control law of = + (3.17) was used where r is the input reference and y is the output of the true system after the addition of noise, and the a terms are the corresponding adaptation parameters. These parameters were updated with an adaptation law of , (3.18) where is the adaptation parameter vector [ , ] , is a positive constant, is the regressor vector which is equal to [ , ] in this case, and is the error between and the reference model output . To run this example with dual iLQG, the true values and initial estimates listed previously are used, with a covariance of I2. The states are initialized at [0, 0, 0] for both the true values and the estimates, and a state covariance of 10- 15 is used as the problem does not include state uncertainties. Both the states are unconstrained for this example, and the parameters are constants. A time step length of 0.1 seconds was used and the system was simulated over 700 time steps to match the period shown in [55]. The measurement noise scaling factors were 0.5 and the noise associated with the state and parameter dynamics was 10- 15. The sinusoidal noise from the problem was implemented on the true system, which the algorithms had no knowledge of other than the 0.5 noise scaling factor. The cost function was: ( , ) = ( , (3.19) where the modelled and true output equations are given in equations (3.14) and (3.16). Additionally, to have the first-order system’s state align with the proper state of the third- order system, the first-order system was modelled as a third-order system with the first two states having zero dynamics and not being measured. The resulting controls from both the adaptive and dual iLQG controllers as well as the model-reference adaptive controller are shown in FIG.8B for the entire 70 seconds, and in FIG.8C for the first seven seconds. In these figures, it can be seen that both of the iLQG controllers provided similar high-frequency control actions with amplitudes that decreased over the first 1.5 seconds before evening out at a nearly steady-state value of around 1. During this initial oscillatory period, dual iLQG generally had a slightly smaller control amplitude than the adaptive iLQG, and in the steady region, both controllers continued to have small variations in their controls. In contrast, the model-reference adaptive controller starts with a lower frequency control action before oscillating around a value of 1. The amplitude of this oscillation about a control action of 1 grows until just after 60 seconds when the controller fails. FIGS.8D and 8E show the true and estimated states for all three controllers over the entire 70 seconds and for just the first seven seconds, respectively. The response to the high-frequency control actions from the iLQG controllers can be seen in the true dynamics for the first two states, while their estimates remained at zero with increasing uncertainty because there were no dynamics for those states in the modelled system and they were unmeasured. The true state response for the third state also reflected the high-frequency input but was very small compared to the estimated state values due to the 229 scaling factor in the true system output. These large discrepancies between the true and estimated states were because they are different systems, with only the relationship between the inputs and the outputs being comparable. For the third state estimates, after some initial oscillations, there was a sustained offset between the dual and adaptive estimates. Finally, in FIG.8D it can be seen that the system remained stable over the entire 70 second period. As the model- reference adaptive controller does not estimate the states, only true values are shown. The low- frequency response in the first two states is to the low-frequency controls, and the low value of the third state is again due to the 229 scaling factor. The estimates of the uncertain parameters can be seen in FIGS. 8F and 8G, with FIG.8G again being a zoomed-in version. Although the model-reference adaptive controller does not directly estimate these parameters and instead adapts the compensation parameters discussed, these parameter values can be derived from the definitions of the compensation parameters as:
Figure imgf000051_0001
Using these relations, the parameter estimates are shown in FIGS. 8F and 8G were determined. After about 2 seconds, these estimates stop changing as quickly and are close to the true values, but over the full simulation time, they drift until they cause the true system to become unstable just after 60 seconds. FIG.8F shows that for the iLQG controllers, after an initial adaptation period, the estimates of the parameters for the controllers remained largely constant and that there was no significant parameter drifting occurring although the estimates of the second parameter did decrease slightly over the 70 second simulation. Zooming in to the initial adaptation period shown in FIG.8G, both controllers’ estimates of the first parameter initially approached the true value of 2, but then gradually converged to a final value near 2.6. For the second parameter, there were no initial rapid changes in the estimates, just a gradual convergence to 1.4. Additionally, adaptive iLQG appeared to have better parameter estimates than dual iLQG, which was unexpected considering that the dual controller ended up with a lower cost. To understand how dual iLQG performed better in this example while the adaptive controller had better parameter estimates, a closer look at the system is required. Knowing that this system quickly reaches a steady state, an expression for the steady-state output of the first-order system can be shown to be: = , (3.22) where the subscript ss denotes steady state. With the sole term in the objective function being the squared difference between the output and a reference of 2, it becomes clear that it is not the individual values of the parameters that are relevant for this problem, but the ratio of . Knowing this, the ratio of can be plotted, as shown in FIG.8H for the entire time horizon, and for the first seven seconds in FIG.8I. Although difficult to see in FIG. 8H, FIG.8I shows that dual iLQG had a better estimate of this ratio for the first 1.5 seconds, after which adaptive iLQG is largely better. FIG.8H also more clearly shows that the slight drifting of the second parameter in FIG.8F resulted in the ratio of the parameter estimates from both controllers nearly converging to the true value by the end of the 70 second simulation. Model-reference adaptive control oscillates around the true parameter ratio, but over time diverges as the parameter drift. FIGS. 8J and 8K show the full and initial plots of the modelled and true system output. In FIG.8J, the modelled system output is shown to approach 2.01, very close to the desired value of 2, while the true system output approached 1.91. Since the modelled system had practically reached the desired output reference, the controllers’ input stopped changing and the true system’s output also stopped changing, but not necessarily at the desired output reference due to the unmodelled dynamics and measurement noise. Model- reference adaptive control oscillates around the desired output of 2 but diverges quickly 60 seconds into the simulation. The stage and total costs for the adaptive and dual iLQG algorithms are shown in FIGS. 8L and 8M, with the latter figure being only the initial seven seconds of the simulation. These plots directly reflect the square of the distance of the true system output in FIGS.8J and 8K from the desired reference of 2. Although both controllers ended up with small but non-zero stage costs for the second part of the simulation, dual iLQG outperformed adaptive iLQG for the first part of the simulation, resulting in dual iLQG having an 11% lower total cost. Model-reference adaptive control is not shown as it is not based on a cost function. For the model-reference adaptive controller in [55], the constant reference does not provide sufficient information for the parameter identification to distinguish information from noise, causing the parameters to slowly drift. After drifting significantly beyond their true values, the parameter estimates cause the system to become unstable and suddenly diverge around 70 seconds into the simulation. On the other hand, both dual and adaptive iLQG demonstrated robustness to unmodelled dynamics and were able to get the system’s output very close to the desired reference. Unlike the nonlinear controller in [55], these controllers did not cause their parameter estimates to drift and avoided the instability that caused Slotine and Li’s controller to fail. In the preceding section titled “Dual Control”, after describing how iLQG can be extended to be adaptive or dual, dual iLQG was compared with dual MS-SP-NMPC on a simple linear system where dual iLQG significantly outperformed the other controllers. The robustness of dual and adaptive iLQG to time-varying parameters and unmodelled dynamics on specific systems was then investigated. Although dual iLQG outperformed adaptive iLQG in these two examples, both controllers demonstrated robustness to time-varying parameters and unmodelled dynamics. The importance of using controllers that are designed to handle multiplicative noise was also Dual iLQG shows promise as a high-speed dual control algorithm that can be applied to larger problems than other dual control algorithms due to its derivative-based approach. Dual iLQG can also handle multiplicative noise where wide-sense dual control and MS-SP-NMPC cannot. Below, dual iLQG will be applied to a complex biochemical system, an anaerobic digestion, and a complex public policy control problem, the spread of COVID-19 in a population. The focus of this disclosure is the control of systems with uncertain parameters in such a way that the reduction of uncertainty is implicit in the minimization of a given cost function. These dual goals of system identification and cost minimization are often at odds with each other, and this tension creates a set of three features that characterize dual control. Dual control demonstrates caution, minimizing the magnitude of control actions when uncertainties are high, probing, varying the control actions to gain information about the uncertain parameters, and selectiveness, only seeking to gain information on those parameters which will are likely to cause a reduction in future costs [28]. Although the cost function can be modified to explicitly introduce one or more of these dual features, these explicit approximations of dual features require that the relative importance between the dual features and the original cost function is pre- determined. Implicit approximations of dual control, on the other hand, consider the impact of future reductions in parameter uncertainty on the original cost function over the control horizon, increasing the flexibility of the control algorithm to balance system identification and cost minimization when compared to explicit methods. Due to this increased flexibility, implicit methods can give better results than explicit methods [7], generally at a cost of higher computational effort [22]. Several implicit dual control algorithms exist, but each only considers limited control or parameter realizations to make the control problem tractable, or are only applicable for systems with a limited number of states. For example, the optimization problem in MS-SP-NMPC grows exponentially with increasing uncertain parameters, and its current implementation can only handle two uncertain parameters. This issue of the problem size increasing exponentially with the number of states is known as Bellman’s curse of dimensionality [63], and this limits existing implicit dual methods as they do not take a derivative-based approach in continuous state space. The dual iLQG methods presented in this disclosure fill this gap by extending the derivative-based iLQG method to be implicitly dual. Both the dual and adaptive iLQG methods presented were demonstrated to be robust to time-varying parameters, and unmodelled dynamics, and can handle multiplicative noise. Neither wide-sense dual control nor MS-SP-NMPC can handle multiplicative noise, making dual iLQG ideally suited to the wide range of control problems that inherently involve multiplicative noise. Dual iLQG is applicable to and nonlinear control problems with many uncertain parameters, as shown in its application to the control of anaerobic digestion and COVID-19 in the Examples provided below. Since MS-SP-NMPC is only able to handle systems with two uncertain parameters, simplified versions of these systems were used to compare dual and adaptive iLQG with MS-SP-NMPC, and dual iLQG outperformed MS-SP- NMPC in all but one case. When the systems with all of the uncertain parameters were used, dual iLQG consistently outperformed adaptive iLQG. Selected Aspects of Example iLQG Implementation Some aspects of the Example iLQG implementation are as follows: 1. A high-speed adaptive iLQG algorithm was developed that estimates individual parameters of a system instead of the entire dynamic model of the system. The parameters were treated as constants in the iLQG algorithm through the creation of an augmented constants vector, but treated as states in the outer loop filter through the creation of an augmented state vector and augmented state covariance matrix. 2. Bellman’s “curse of dimensionality” that limits conventional implicit dual control algorithms was overcome by extending the iLQG control algorithm to be applicable to dual control problems. The parameters were treated as states in the iLQG algorithm and the outer loop filter through the creation of an augmented state vector, augmented state covariance matrix, and an augmented dynamic model. 3. The robustness of dual and adaptive iLQG to both time-varying parameter dynamics and unmodelled dynamics was explored. 4. The importance of having a control method that can account for multiplicative noise in systems where it exists was demonstrated. 5. Dual and adaptive iLQG were applied to two complex systems; COVID- 19 and anaerobic digestion (see Examples below). 6. The superior performance of dual iLQG over dual MS-SP-NPMC and wide-sense dual control was demonstrated. Adaptations, Variations and Generalization of Dual iLQG Algorithms While the preceding section of the present disclosure disclosed an example implementation of a dual iLQG algorithm based on the configuration shown in FIG.4C, it will be understood that the specific implementation shown in FIG.4C is not intended to be limiting, and that different configurations of dual iLQG-based algorithms that employ augmented states and an augmented covariance matrix, may be implemented without departing from the intended scope of the present disclosure. Furthermore, as discussed in additional detail below, the present dual implicit augmented iLQG algorithms may be adapted according to a wide variety of stochastic differential-dynamic-programming-based algorithms. The present section contemplates some example and non-limiting implementations involving different configurations involving an iLQG-based algorithm. In each example implementation of a dual iLQG algorithm, the states are augmented to generate an augmented state vector that includes the uncertain parameter(s), such that each uncertain parameter is treated as a state by the iLQG algorithm. Likewise, the state covariance matrix is augmented with the covariance matrix of the uncertain parameter(s) to form an augmented covariance matrix that is employed by the iLQG algorithm. This augmentation of the states and the covariance matrices, such that each uncertain parameter is treated as a state by the iLQG algorithm, results in an algorithm that implicitly provides the dual features of caution, probing, and selectiveness. It will be understood that the capability to handle multiplicative noise is inherent to the iLQG algorithm (residing in the inner loop). FIG.9A illustrates an example closed-loop dual iLQG algorithm similar to that shown in FIG.4C. As shown in the figure, before being passed to the iLQG algorithm, the state and parameter estimates are concatenated into an augmented state vector and the covariances are likewise combined in a block diagonal fashion. The figure shows the closed-loop, dual augmented implementation of the iLQG algorithm, which shows an outer loop of the dual iLQG algorithm, and also includes an iLQG box that represents an inner loop. The iLQG box, representing the inner loop of the algorithm, and which employs augmented states and covariance matrices, iteratively converges to an updated control trajectory and policy that contains the dual features of caution, probing, and selectiveness. This is the core of the algorithm, the outer closed-loop implementation shown in the figure, combined with the augmentation, provides the implicit dual functionality of the overall algorithm. The core iLQG algorithm may be implemented by employing the equations in the iLQG section of the present disclosure, but using the augmented state and covariance matrix. The algorithm, when implemented, may include the following steps, as previously described: (i) forward pass; (ii) simulating system dynamics using the given control trajectory; (iii) obtaining state, control, and cost derivatives at each time step – used to linearize the system; (iv) employing an inner loop estimator (filter); (v) performing a backward pass; (vi), approximating the cost-to-go function at each time step; (vii) obtaining a control policy for each time step; (viii) obtaining a new control trajectory; (ix) applying the control policy to state estimates for each time step to obtain control deviations; and (x) adding these control deviations to the previous control trajectory. As shown in FIG.9A, after performing the inner iLQG loop based on the augmented state vector and augmented covariance, a zero-order hold may be employed to convert the control trajectory from being discrete to continuous in The first control action may then be sent to the true system. The new state of the system is sampled and measured, and the measurement is then filtered, utilizing other system information, to create an updated estimate of the augmented state vector and its covariance, as shown in the outer loop portion of the figure. The control trajectory horizon is then updated, and after a time delay, the iLQG algorithm is run again to determine the control action for the next time step. The outer loop filter(s) can be any algorithm that can provide updated estimates of the states and parameters and their covariances given at least a set of measurements and prior estimates of the states and parameters and their covariances. Typically, this filtering would be performed with some type of recursive Bayesian estimator such as a Kalman or particle filter, particularly one suited to nonlinear systems. These filters typically rely on inputs including the dynamic equations, the measurement function, and noise estimates and covariances along with the measurements and prior estimates of the states and parameters and their covariances to provide updated estimates of the states and parameters and their covariances. To estimate both the states and parameters, either a single filter could be used, known as joint filtering, or separate filters could be used, known as dual filtering. FIG.9B is the same as FIG.9A other than the fact that it explicitly states that a joint filter is being used. In the case of a filter that accepts and outputs the states and vectors separately, such as with dual filtering, the augmented state vector can be separated before filtering and the updated estimates can be concatenated after the filtering. While FIGS.9A and 9B show example implementations in which a single filter is employed to estimate the augmented state vector and covariance matrix based on the measurements, in other example implementations, the measurements may be filtered using separate filters for the systems states and the one or more uncertain parameters, utilizing other system information. An example implementation of such a system is shown in FIG.9C. The updated estimates of the states and parameters, and their covariances, as provided by the separate state and parameter estimation filters, are then concatenated into the updated augmented state vector and updated augmented covariance matrix. The control trajectory horizon is then updated, and after a time delay, the iLQG algorithm is run again to determine the control action for the next time step. FIGS.9A-9C illustrate the use of a moving horizon implementation, where the moving control horizon is updated with each time step in the outer loop. The horizon may be updated according to several different approaches, including, but not limited to, moving horizon and rolling horizon approaches. FIGS.9A-9C show additional features of example implicit dual iLQG algorithms relative to FIG.4A, including the covariances of states and uncertain parameter(s), and the augmented covariance matrices, and a loop to show that the control trajectory is reused between iterations. The certain parameters, or "constants" c, are also shown as optionally being provided as a function of time. The figures also include dashed lines to show the initialization of the algorithm. Initialization may be performed, for example, with the estimates of the states and parameters and their covariances, certain variables, and a seed control trajectory or policy, as described in the previous section. It will be understood that FIG.9A-9C do not show all of the inputs provided to the components within the inner and outer loops of the augmented iLQG algorithm. For example, additional inputs such as, but not limited to, a dynamic model of the system, a cost function, noise covariances, and a measurement model, are not shown in the figure despite being implemented. It will also be understood that the filter block also does not show all necessary inputs to the filter. It is also noted that FIGS.9A-9C show non-limiting example cases in which the augmented state vector is not broken up (de-augmented) before each new iteration of the outer loop. With regard to the augmentation of the states and covariances, it will be understood that the augmentation can be performed according to many different implementations. For example, the uncertain parameters could be the first component in the concatenation of the state and parameter vectors, or the elements of the parameter vector could be interspersed with the elements of the state vector, provided that the correlating covariances are augmented in the same order (but in a block diagonal fashion instead of concatenated). While the example implementations shown in FIGS.9A-9C may be well suited for cases in which at least some of the states are observable, with the filter in the outer loop providing estimates of the unobservable states based on the measurements. For clarity, a partially observable system is one in which some of the states can be inferred from measurements, and an unobservable system is one on which none of the states can be inferred from measurements. In some implementations, a system can be fully observable, with all states being determinable from measurements without the need for a filter. In such a case, a filter can be employed to provide estimates of the one or more uncertain parameters, and the states can optionally be computed from the measurements in the absence of the use of a filter. An example implementation of such a system is illustrated in FIG.9D. As shown in the figure, the new state of the system is sampled and measured, and the measurement is then filtered to create an updated estimate of the one or more uncertain parameters, utilizing other system information. The states are determined by a function of the measurement(s) and then concatenated into the updated augmented state vector and updated augmented covariance matrix. The control trajectory horizon is then updated, and after a time delay, the iLQG algorithm is run again to determine the control for the next time step. In such cases in which the states can be determined algorithmically from the measurements without the use of a filter, the filter employed in the outer loop to update the estimates of the parameters does not strictly require the use of the state dynamic model. For example, if an Extended Kalman Filter were used, it contains a term that is the derivative of the states with respect to the parameters which would require the state model, but a Sigma Point Kalman Filter does not require the state dynamics. Some example implementations of a dual iLQG algorithm require a dynamic model for each system that it is applied to, and the noise characteristics of that system need to be known. In some example implementations, in the absence of a white box dynamic model, Gaussian Process regression or other machine-learning techniques can be used to formulate the system dynamics [28, 34]. In such cases, measurements from the system are used to estimate the relation between the change of the states over time and the values of the states, the control inputs, a set of constants, and a set of parameters. This estimate of the system dynamics can then be used in the dual iLQG algorithm in place of a white box dynamic model. For example, FIG.9E illustrates how, within the inner loop 110, the model structure and parameters are provided to the iLQG algorithm. FIG.9F shows various example approaches to model generation, including, for example, data-driven (machine learning approaches), such as regression-based approaches. One non-limiting example of a regression- based approach to model generation is the Sparse Identification of Non-linear Dynamics (SINDy) algorithm. In some example implementations, the ability of state constraints to be imposed on the system could be improved by combining this work with the augmented Lagrangian method for state constraints in differential dynamic programming [24]. This is an established method for imposing state constraints in differential-dynamic-programming-based algorithms. Although the adaptive and implicit dual methods described above with reference to FIGS.4B and 4C and FIGS.9A-9D show the newly determined control action generated by the execution of the inner loop, this control action may not be applied to the system in some cases. For example, a user may overrule the control action, applying a different control action between successive inner loop executions, or choosing to not apply any control action at all between successive inner loop executions. In some example implementations, if a control action is needed or desired to operate the true system before the inner loop iLQG algorithm could generate an updated control trajectory, the control policy from the last completed iteration can be used to calculate the new control action instead of using the second element of the control trajectory from the previous (outer loop) iteration of the algorithm. The optimal control deviation is computed by the iLQG algorithm as: = + (3.23) where lk and Lk are the scalar and state-dependant control gains (and the augmented state vector would be used in the cases of adaptive or dual iLQG). The updated control trajectory can then be computed as: = + (3.24)
Figure imgf000059_0001
considered, if an updated control horizon is not available, the control policy in the form: = + + (3.25) could give a better result than the second element of the previous control trajectory, given an updated state estimate. This updated state estimate could come from filtering a system measurement, allowing for the generation of a new control trajectory. An estimation of the time to measure the system and filter the data to obtain a new state estimate could be employed to determine at what point this use of the previous control policy was necessary. An alternative approach, as the inner loop iLQG algorithm is iterative, would be to stop the algorithm before its other convergence conditions and to use the control trajectory it had generated up until this point. The advantage of the first approach is that a given control policy could be used for multiple time steps with no other algorithm calculations required depending on the situation that made this special case necessary. With regard to the adaptive and implicit dual methods described above, it is to be understood that the control action may be determined, from the updated control policy that is provided by executing the inner loop, based on the state estimates employed as inputs to the inner loop, or, alternatively, based on the update values of the states determined within the outer loop, after performing the measurement of the system output. It is to be understood that the parameter dynamics employed in the augmented mathematical model of the system dynamics provides a functional description that models how the uncertain parameters are expected to evolve over time. This model is a set of equations that could be informed by knowledge of the physical system or generated analytically with the use of data and it describes the rate of change of each of the uncertain parameters. This model is not expected to be a prefect reflection of reality, but higher fidelity models would allow for better results from the algorithm. Similar to the state dynamics, the parameter dynamics are used to predict the values of the parameters in the future in order to ultimately reduce the cost of the control trajectory, and to update the estimates of the parameters in the present time step using measurements of the true system. In some applications, the parameter dynamics may model one or more uncertain parameters as constants, and in such cases, the for those parameters would reflect zero rate of change over time as per the modeled dynamics. Accordingly, while some example implementations of implicit dual methods have been described within the context of time- varying uncertain parameters, it will be understood that in some cases, the parameter dynamics may prescribe a constant uncertain parameter, which nonetheless changes its value as successive iterations and control actions are performed, as the implicit dual method takes control actions that refine the uncertain parameter. Generalization to other Algorithms The preceding example embodiments involving an implicit dual control algorithm each employ an augmented iLQG algorithm as the inner loop of the overall closed-loop algorithm, with an augmented state vector and augmented covariance matrix as described above. It will be understood, however, that a wide variety of alternative algorithms may be employed in the inner loop, provided that the algorithm is based on differential dynamic programming and is adapted for stochastic systems. Such a broader class of algorithms that are augmentable, as per the present method of augmentation of the state vector with uncertain parameters, and the augmentation of the state covariance matrix with the uncertain parameter covariance matrix, are henceforth referred to as stochastic differential-dynamic-programming-based (DDP-based) algorithms. When augmented according to the present augmentation method, this broad class of augmented algorithms are referred to as augmented stochastic differential-dynamic-programming-based algorithms. Non-limiting examples of stochastic differential-dynamic-programming-based algorithms that can be augmented, and employed in a closed loop control structure as described above, to obtain an implicit dual controller, include iLQG and stochastic DDP (SDDP), and variations thereof. Variations of other differential-dynamic-programming-based algorithms that are not stochastic can be made to incorporate stochastic terms such that they can be augmented, and employed in a closed loop control structure as described above, and include DDP, the Sequential Linear Quadratic (SLQ) algorithm and the Iterative Linear Quadratic Regulator (iLQR) algorithm, and variations thereof. For example, the iLQG inner loop algorithm in FIGS.9A-9D may be substituted with another type of stochastic differential-dynamic-programming-based algorithm, such as SDDP and a stochastic variation of iLQR, to obtain another implementation of an implicit dual controller. Examples of suitable stochastic differential-dynamic-programming-based algorithms, and their relationship to other such algorithms and the more general class of differential-dynamic- programming-based algorithms, are shown in FIG.10 below. In general, a stochastic differential-dynamic-programming-based algorithm may be implemented for a nonlinear stochastic system as follows: (i) linearize the system, (ii) updates become linear in nature ( xi+1 = A xi + B ui), (iii) true linear optimal control (LQG) can then be applied to calculate the optimal control law for the update, (iv) this update can then be applied to the true state of the nonlinear system, and (v) a new linearization can be applied (iteratively) until convergence is reached. SLQ was developed as a variation in the 2000s, which does not use the exact Hessian in step (ii), whereas the DDP algorithm uses the exact Hessian when calculating B in step (ii). The iLQR algorithm is another variation that was independently developed in the 2000s. The difference between iLQR and DDP is that in step (iv) a nonlinear system can be used in iLQR, whereas in DDP the linearized version is used. All three of these techniques are similar, with each having minor differences that can affect efficiency, depending on the application. It will be understood that when augmenting a given stochastic differential-dynamic- programming-based algorithm according to the present disclosure, the state dynamics are augmented with the parameter dynamics to create the augmented state dynamics, in the same manner in which the state vector is augmented with the uncertain parameter vector to create the augmented state vector, and in which the state covariance is augmented with the uncertain parameter covariance matrix to obtain the augmented covariance matrix. While many example implementations of the present disclosure relate to implicit dual control of nonlinear stochastic systems, it will be understood that in some implementations, the system can be a linear system. For example, the present example implicit dual control systems and methods may find useful and beneficial application, relative to conventional control approaches, when applied to linear systems with multiplicative noise. Dual iLQG, and other example implicit dual differential-dynamic-programming- based control algorithms disclosed herein, show significant promise as a control approach that can account for and actively reduce the uncertainties that are inherent to systems while improving their performance. Since this approach is so general and requires no training prior to implementation, it is applicable to a wide range of fields including applications that have been identified as high-impact. In the 2017 review paper “Systems and Control for the future of humanity, research agenda: Current and future roles, impact and grand challenges” by Lamnabhi- Lagarrigue et al. [33], three requirements are listed that “call for a paramount role for data- driven modeling, which is integrated into virtually all future complex engineering systems.” The present implicit dual control algorithms address two of these three requirements, specifically the need for models to adapt to changing parameters as well as the need for approaches that enable active learning, which is described as “probing the system/environment to generate sensor information that is suitable for model adaptation”. Considering the ability of the present implicit dual control algorithms to handle nonlinear systems and efficiently handle large systems, and the capability to handle both linear and nonlinear stochastic systems with multiplicative noise, they are in a favourable position to meet these requirements to solve current and future control problems. Since uncertainty is common in real-world systems, the present implicit dual control algorithms are applicable to a wide spectrum of applications. Aside from controlling mechatronic systems, the present implicit dual algorithms can be applied to inform health care systems, government policy, and financial systems. The review paper by Lamnabhi-Lagarrigue et al. mentions several high-impact system and control applications for the future, and the present implicit dual control algorithms could be applied to many of them, including automotive control, spacecraft control, renewable energy and smart grid, assistive devices for people with disabilities, and advanced building control. In the present disclosure, dual iLQG was applied to both a government policy problem (COVID-19) and a renewable energy problem (anaerobic digestion). Additionally, many of the present example implicit dual differential-dynamic- programming-based algorithms can handle multiplicative noise whereas other dual control algorithms such as MS-SP-NMPC and wide-sense dual cannot. This feature is particularly well-suited for applications where multiplicative noise is common, such as in the control of prostheses using myoelectric signals [54], robot motion planning when using sensors with distance-dependent errors [14], stochastic fluid dynamics [12], and power grids where renewable energy represents a significant portion of total power generation [19, 70]. Other applications that involve multiplicative noise include financial stochastic volatility models [67], systems involving wireless communication [68], and batch chemical process control [69]. In some example embodiments, the present disclosure provides the closed-loop implicit dual differential-dynamic-programming-based control of physical systems, in which, in some example implementations, signals are transduced from sensors associated with the physical system, transformed into digital signals that are processed to infer states and uncertain parameters associated with the physical system, and to determine, via implicit dual control, suitable control actions for controlling the physical system, thereby providing the control signals to the physical system, such that the control signals are transduced into physical controls that are applied to the physical system, with the physical system being controlled according to the computed control signals. The transformation of these sensor signals into physical changes in the controlled system through closed-loop implicit dual control can thereby improve the performance of the physical system according to a specified objective or cost function. It is also noted, however, that the implicit dual differential-dynamic-programming- based algorithms of the present disclosure provide significant benefits relative to prior schemes with regard to efficiency of the underlying computer system. Prior implicit dual control methods have been limited due to Bellman’s “curse of dimensionality” where, as the number of variables in the dual control problem increase, the computational resources required to solve the problem increase exponentially to the point that the problem can be intractable, or extremely difficult to efficiently solve given present computational power and methods. An example of the exponential growth of the control problem can be seen in MS- SP-NMPC, and specifically in FIG.3, where branching future scenarios considered are limited by both the robust horizon and the number of realizations of each uncertain parameter considered (three in the case of the example shown). The MS-SP-NMPC control problem can quickly become intractable with increasing uncertain parameters, realizations of each uncertain parameter, and the length of the “robust horizon. The computer is unable to solve such intractable problems due to the large number of variables and calculations involved. The number and size of the variables can exceed the computer’s memory and the quantity of calculations can push the computer’s processor to its maximum limits, extending the time required to solve the problem. In some cases, this execution time may be so long as to preclude the method’s beneficial application in a real-time physical setting. On the other hand, closed-loop implicit dual differential-dynamic-programming- based control as described in this disclosure overcomes Bellman’s “curse of dimensionality” by taking a derivative-based approach in continuous space. This closed-loop implicit dual control method allows the computer to solve dual control problems with large numbers of variables without reaching memory or processor limits through the extension of a differential- dynamic-based algorithm, that can handle, for example, high-dimensional stochastic nonlinear systems, to be an implicit approximation of dual control. With derivatives of the cost function and the system dynamics, closed-loop implicit dual control as described in this disclosure can avoid the discretization of the control and state space that leads to Bellman’s “curse of dimensionality”. Due to the ability of the closed-loop implicit dual control method to handle larger control problems efficiently, it can be applied to improve a variety of complex physical systems. Even though the closed-loop implicit dual differential-dynamic-programming- based method described in this disclosure involves fewer calculations than prior implicit dual methods, attempting to perform in the absence of a computer would take sufficient time that it would preclude the method’s beneficial application in a real-time physical setting. Referring now to FIG.11, an example system is shown that includes a controllable subsystem 400 that is controllable via, and operatively coupled to, implicit dual controller 200. Although not shown in the figure, the subsystem 400 may include separate control and processing circuity, including components such as those shown in implicit dual controller 200, which may be operatively connected to one or more mechanical or non-mechanical output components of the subsystem 400 (such as one or more motors, actuators, or devices that are responsive to a control signal provided by the implicit dual controller 200 for controlling the controllable subsystem). The system includes one or more sensors 410 for sensing signals suitable to determine, either directly, or to estimate via a filter, the augmented state vector and augmented covariance matrix of the controllable subsystem. The types of sensors employed will depend on the application. For example, in applications involving the control of a robotic system, example sensors can include, but are not limited to, pressure sensors, electrical contact sensors, torque sensors, force sensors, position sensors, current sensors, velocity myoelectric sensors. In chemical process control applications, example sensors can include, but are not limited to, pressure sensors, gas sensors, flow sensors, ultrasonic sensors, alcohol sensors, temperature sensors, and humidity sensors. In power grid applications, example sensors can include, but are not limited to, voltage sensors, current sensors, light sensors, pressure sensors, rain sensors, temperature sensors, and anemometer sensors. In healthcare applications such as the COVID-19 public policy example, example sensors can include, but are not limited to, epidemiological surveillance, medical data collection, and laboratory data collection. In financial market applications, example sensors can include, but are not limited to, market data collection and economic indicators. Moreover, as shown in the figure by dashed box 420, one or more of the sensors 410 may be integrated with the controllable subsystem 400, i.e. the sensors may be external sensors or internal sensors (e.g. sensors residing on or within the controllable subsystem 400). As shown in the example embodiment illustrated in FIG.11, implicit dual controller 200 may include a processor 210, a memory 215, a system bus 205, a control and data acquisition interface 220 for acquiring sensor data and user input and for sending control commands to the controllable physical subsystem 400, a power source 225, and a plurality of optional additional devices or components such as storage device 230, communications interface 235, display 240, and one or more input/output devices 245. The methods described herein can be partially implemented via hardware logic in processor 210 and partially using the instructions stored in memory 215. Some embodiments may be implemented using processor 210 without additional instructions stored in memory 215. Some embodiments are implemented using the instructions stored in memory 215 for execution by one or more microprocessors. For example, the example methods described herein for controlling a subsystem can be implemented via processor 210 and/or memory 215. As shown in FIG.11, the inner loop of the implicit dual control algorithm is executed by the augmented stochastic differential-dynamic-programming-based algorithm module shown at 300, based on the augmented state and augmented covariance data structures 310, based on estimates provided by the filter 320, employing measurements obtained from the sensors 410. It is to be understood that the example system shown in the figure is not intended to be limited to the components that may be employed in a given implementation. For example, in one example implementation, the implicit dual controller 200 may be provided on a computing device that is mechanically supported by the controllable subsystem 400. Alternatively, one or more components of the implicit dual controller 200 may be physically separate from the controllable subsystem 400. For example, the implicit dual controller 200 may include a mobile computing device, such as a tablet or smartphone that is connected to a local processing hardware supported by the controllable subsystem 400 via one or more wired or wireless connections. In another example implementation, a portion of the implicit dual controller 200 may be implemented, at least in part, on a remote computing system that connects to a local processing hardware via a remote network, such that some aspects of the processing are (e.g. in the cloud). Although only one of each component is illustrated in FIG.11, any number of each component can be included. For example, a computer typically contains a number of different data storage media. Furthermore, although the bus 205 is depicted as a single connection between all of the components, it will be appreciated that the bus 205 may represent one or more circuits, devices or communication channels which link two or more of the components. For example, in many computers, bus 205 often includes or is a motherboard. Although some example embodiments of the present disclosure can be implemented in fully functioning computers and computer systems, various embodiments are capable of being distributed as a computing product in a variety of forms and are capable of being applied regardless of the particular type of machine or computer readable media used to actually effect the distribution. A computer readable storage medium can be used to store software and data which when executed by a data processing system causes the system to perform various methods. The executable software and data may be stored in various places including for example ROM, volatile RAM, nonvolatile memory and/or cache. Portions of this software and/or data may be stored in any one of these storage devices. As used herein, the phrases “computer readable material” and “computer readable storage medium” refers to all computer-readable media, except for a transitory propagating signal per se. The following examples are presented to enable those skilled in the art to understand and to practice embodiments of the present disclosure. They should not be considered as a limitation on the scope of the disclosure, but merely as being illustrative and representative thereof. EXAMPLES Example 1: Application to Anaerobic Digestion Anaerobic digestion is a chemical process where organic materials are broken down by various types of bacteria in a low-oxygen environment to produce a gas that is commonly known as biogas, which consists primarily of methane and carbon dioxide [9]. This biogas has various uses as a fuel, and can, for instance, be used directly for cooking or heating. It can also be compressed into a liquid fuel that is similar to natural gas, or it can be used to generate heat and power with a combined heat and power system [25]. Compared to other renewable energy sources, biogas can provide consistent power independent of the weather, can be located anywhere there is a consistent supply of organic waste, and the energy can easily be stored for later use. Therefore, anaerobic digestion represents a controllable renewable energy source that is free of many of the intermittency, storage, and site location issues common with other sources of renewable energy. Anaerobic digestion has several that make it an ideal application for dual control. First of all, anaerobic digestion models have uncertain parameters, and in particular those associated with the dynamics of the bacteria populations are difficult to measure and have a significant impact on the rest of the system dynamics. Second, it is common for not all of the states to be measured, and those that are measured are noisy. Finally, although not commonly modelled, the composition of the organic feedstocks to the digester is not known precisely. This combination of uncertainties in the feedstock composition, bacteria population dynamics, and the measurements makes this application well-suited for dual control. System model Two widely used models used to represent the anaerobic digestion dynamics are Anaerobic Digestion Model No.1 (or ADM1) [5] and AM2 [9]. ADM1 is a comprehensive model that has 24 state variables and is commonly used to simulate anaerobic digestion systems, but due to its complexity has limited use for control approaches [26]. AM2 on the other hand has six state variables while still capturing the main dynamics of the process and is commonly used for model-based control and parameter estimation [25]. For these reasons, the AM2 model was used for the present example. The AM2 model The AM2 model represents the anaerobic digestion process as:
Figure imgf000066_0001
where X1 is the concentration of acidogenic bacteria, X2 is the concentration of methanogenic bacteria, S1 is the organic substrate concentration, S2 is the volatile fatty acid concentration, Z is the total alkalinity concentration, C is the total inorganic carbon concentration, and the model parameters are described in FIG.12 [50]. The inputs to this model are the dilution rate, D, along with the inlet concentrations of organic substrate, volatile fatty acids, total alkalinity, and total inorganic carbon, , , , and respectively.
Figure imgf000066_0002
The growth rates of the acidogenic bacteria, µ1, are assumed to follow Michaelis-Menten kinetics: ( ) = , (4.7) and the growth rates of the methanogenic bacteria, , are assumed to follow Haldane kinetics:
Figure imgf000067_0001
where the molar flow rate of carbon dioxide, , is given by: = ( + ), (4.9) where and are the maximum respective growth rates,
Figure imgf000067_0002
and the molar flow rate methane, , is given by: = ( ) . (4.12) The flow rate of biogas, , is then assumed to be consisting only of carbon dioxide and methane, and is therefore given by: = + . (4.13) AM2 with uncertain feedstocks To model the uncertainty in the digester feedstocks, the inputs to the AM2 model were modified and parameters to characterize the feedstocks were added. For a case with three feedstocks, the flow rates of these feedstocks become the new inputs to the model, and can be represented by qa, qb, and qc. The dilution rate, therefore, becomes the sum of the individual flow rates,
Figure imgf000067_0003
= + + . (4.14) The inlet concentrations can then be as: =
Figure imgf000068_0001
where , , , and characterize the organic substrate concentration, volatile fatty acid concentration, total alkalinity concentration, and total inorganic carbon concentration for feedstock A, and , , , and and , , , and do the same for feedstocks B and C respectively. Two-parameter comparison with dual MS-SP-NMPC The true and estimated initial values of the state vector, [X1, X2, S1, S2, Z, C], are both [0.5, 1, 1, 5, 40, 50] and the state are all constrained to be greater than 0. The controls, [qa, qb, qc], are all constrained to be between 0.0001 and 1. The true value of the uncertain parameter vector, µ¯1, ZC , is [1.2, 100], while its initial estimate is [2, 50] with a covariance matrix of diag[0.16, 1111.11]. This system is simulated over 70 time steps that are 0.1 days long. Only states [S1, S2, C] are measured with a noise term G of 0.5I3. The process noise scaling matrix F for the dual iLQG state and parameter dynamics were set to 10- 15I23 to be comparable with the lack of MS-SP-NMPC to handle noise. The cost function is given as ( , ) = ( . (4.19) Starting with FIG.13A which shows the three feedstock controls for the AM2 model for each of the three controllers, it can be seen that the dual and adaptive iLQG algorithms took a similar approach, while the dual MS-SP-NMPC algorithm converged to a very different solution. While the iLQG algorithms varied all three feedstock controls, the MS- SP-NMPC algorithm focused primarily on the first feedstock control and only momentarily used the other two. As there were no terms in the cost function to explicitly penalize or encourage the use of the feedstock controls, the relative performance of these controls could only be determined by their impact on tracking the desired rate of biogas production. The plot of the anaerobic digestion states is shown in FIG.13B, with solid lines for the true state trajectories and markers for the state estimate means and covariances. Similar to the controls, the MS-SP-NMPC algorithm took a different approach to solve this problem, as the state trajectories are significantly different from the iLQG algorithms. Additionally, the impact of µ¯1 being uncertain can be seen in the large covariance associated with the third state, S2, even though it was being measured. FIG.13C shows the plot of the parameter estimates, with the true values shown as black lines. While the iLQG algorithms both approached the true values of the parameters, the MS-SP-NMPC algorithm’s estimate of the first parameter, µ¯1, was close but not as accurate, and not accurate at all for the second parameter, ZC. The reason for the MS-SP- NMPC algorithm’s estimate of ZC being poor was that it hardly used the third control, qC, and therefore had little opportunity to get feedback on ZC. Looking next at the plot of the biogas production in FIG.13D, the MS-SP-NMPC algorithm’s strategy of focusing on only one of the feedstock controls was successful, as it was largely able to maintain the biogas production level near the desired flow rate. Although the iLQG algorithms’ results were similar, the dual controller outperformed the adaptive controller consistently after the first couple of time steps. The ability of the controllers to track the desired biogas production level was directly reflected in their costs, as shown in FIG.13E Comparing the total costs, dual iLQG had a 29.1% improvement over adaptive iLQG, while dual MS-SP-NMPC had a 55.5% improvement over dual iLQG. This two-parameter example demonstrates how dual iLQG does not consistently outperform dual MS-SP-NMPC. Looking at the state plots in FIG. 13B, MS-SP-NMPC was operating in a very different region of state space compared to dual or adaptive iLQG. Since the iLQG algorithms use linearizations of the system dynamics, they can get stuck in local minima. Dual MS-SP-NMPC on the other hand uses the full nonlinear system dynamics and may have had an advantage in this problem with these initial conditions. Seventeen-parameter comparison with adaptive iLQG To show the ability of dual iLQG to handle many uncertain parameters, it was compared to adaptive iLQG for the AM2 system with seventeen uncertain parameters. Dual MS-SP-NMPC was not included in this comparison as it is unable to handle more than two uncertain parameters. The true and estimated initial values of the state vector, [X1, X2, S1, S2, Z, C], are both [0.5, 1, 1, 5, 40, 50] and the state are all constrained to be greater than 0. The controls, [qa, qb, qc], are all constrained to be between 0 and 1. The true and initial estimated values
Figure imgf000069_0001
This system was simulated over 70 time steps that are 0.1 days long. Only states [S1, S2, C] were measured with a noise term of of 0.5I. The multiplicative noise terms for the dual iLQG state and parameter dynamics were set to 10- 15I to be comparable with the lack of MS-SP-NMPC to handle noise. The cost function was as given for the two- parameter example, shown in equation (4.19). The results for the two iLQG algorithms’ control trajectories are shown in FIG. 15A. As with the previous example, these results by themselves did not indicate the relative performance of the algorithms as there were no terms in the cost function to explicitly penalize or encourage the use of the feedstock controls. FIG.15B shows the true and estimated values for the anaerobic digestion system states. The two iLQG algorithms had similar values for the first four states, but there was a significant difference for the last two states, as the dual controller had higher total concentrations of alkalinity and inorganic carbon. Looking at the parameter estimates in FIG.15C, the two iLQG algorithms gave similar results for most of the parameters with only a few that approached the true parameter values. The most notable exception was parameter 16, ZC, where the dual controller’s estimate quickly approached the true value while the adaptive controller lagged behind. Other than this, there were several instances where dual iLQG had a better estimate than adaptive iLQG and several examples where the opposite was true. One may keep in mind that parameter identification is not the objective of these algorithms and that a feature of dual control is being selective in identifying parameters to such a level that the objective function can be minimized. For the plot of biogas production in FIG.15D, the dual controller approached the desired production level by oscillating around the setpoint while the adaptive iLQG did not demonstrate that behaviour. After performing worse than adaptive iLQG for several time steps in the first half of the simulation, dual iLQG consistently outperformed adaptive iLQG over the second half of the simulation. These results are reflected in the plot of the objective function in FIG.15E, where it can be seen that the first and third time steps are where dual iLQG gained the most advantage over adaptive iLQG. Overall, the dual iLQG algorithm’s total cost was 18.6% lower than that of the adaptive iLQG algorithm. Summary The present example application demonstrated how dual iLQG can be used to control an uncertain nonlinear biochemical system. To allow for dual iLQG to be compared with dual MS-SP-NPMC, the AM2 anaerobic digestion system was simplified to have only two uncertain parameters. Although dual MS-SP-NMPC outperformed dual iLQG in this two-parameter case, dual MS-SP-NMPC was unable to handle the following seventeen-parameter case. In this seventeen-parameter case, dual iLQG outperformed adaptive iLQG by 18.6%. While the present example has focused on the application of the aforementioned adaptive and dual control methods to the determination of control actions for controlling a biogas synthesis system, it will be understood that the present example can be adapted for the control of any industrial manufacturing, synthesis or fabrication system for producing or refining a product. Moreover, while specific examples of mathematical models of the dynamics of the biogas synthesis system have been employed in the present example, with specific associated states, uncertain parameters, and control actions, it will be understood that the skilled artisan may adapt the present example to alternative mathematical models, associated states, uncertain parameters, and control actions that may be suitable for a given implementation. Non- limiting examples of industrial manufacturing, synthesis or fabrication system include automated assembly lines (such as automobile assembly lines), additive manufacturing systems, chemical synthesis reactors, injection molding systems, semiconductor fabrication systems, CNC machining systems, metalworking systems, bio-manufacturing systems, electrochemical and thin film deposition systems, and autonomous textile weaving systems. Moreover, although the present example implementation involved the use of the aforementioned adaptive and implicit dual control methods to adaptively model and refine uncertainty in parameters associated with the composition of a feedstock employed as an input to a synthesis process, it will be understood that in alternative implementations, the uncertain parameters employed in the augmented mathematical model, the augmented state data structure, and the augmented covariance data structure may characterize other types of uncertainty in an industrial manufacturing, fabrication or synthesis process that is autonomously controlled. Example 2: Application to Control of the COVID-19 Outbreak The control of the COVID-19 pandemic continues to represent an enormous challenge for governments all over the world. Although policies to limit deaths and hospitalizations have been determined, such as enforcing mask use and lockdowns, these policies have social and economic costs, and precisely how effective these policies are is uncertain [29]. Additionally, there are uncertainties associated with the dynamics of the virus’s spread through populations [52]. Having an objective of balancing deaths and hospitalizations with social and economic a dynamic system that has uncertain parameters makes COVID-19 a suitable application for dual control. The purpose of using COVID-19 as an application for dual control here is not to model the spread of the virus through a real population or to suggest that dual control could have saved lives, but to apply dual control to a complex nonlinear system that people understand. To that end, where available, values for the states, parameters, and cost- weighting factors from the literature were used, and where not, illustrative values were used. System model To model the spread of infectious diseases through populations, compartmental models have been used since 1927 [42]. These models divide a population into a series of compartments that represent stages of the disease and then describe how these groups change over time. One of the simplest of these compartment models was the SIR model [42] as shown in FIG.16, where a population is divided into being Susceptible (S), Infected (I), or Removed (R) (that is, deceased). The movement of the population through these states can then be represented graphically as arrows between each compartment (state), and these population flows can then be described with equations based on the states themselves as well as parameters and controls. These parameters generally represent infection and fatality rates for different populations, and the controls are methods of influencing these dynamics. For instance, the SIR model can be expressed as:
Figure imgf000072_0001
, where is the transmission rate between the infected and susceptible populations, is the inverse infectious period, and N is the total population (S + I + R = N ). Note that in the SIR model, having a population that is recovered is not considered, and neither is a hospitalized population that may have a different transmission rate. These compartmental models have been tailored to better represent the dynamics of the COVID-19 virus, with different compartments or states being considered by different researchers. In [29], eight compartments are considered: Susceptible, Infected (those that are asymptomatic, infected, and undetected), Diagnosed (those that are asymptomatic, infected, and detected), Ailing (those that are symptomatic, infected, and undetected), Recognized (those that are symptomatic, infected, and detected), Threatened (those that are acutely symptomatic, infected, and detected), Healed (either after being detected or not, and assumed immune after being infected), and Extinct assumed to be detected), giving the SIDARTHE model shown in FIG.17A. In the SIDARTHE model, the infected populations other than the threatened population infect the susceptible population with different rates of transmission. Once infected, there are 5 different transitions between populations considered, shown in different colours in FIG.17A: developing symptoms, getting diagnosed, getting healed, becoming critical, or dying. With the parameters shown in FIG.17A describing these transitions between the states, the SIDARTHE model can be expressed as:
Figure imgf000073_0001
where the description of the parameters can be found in FIG.17B. In this model, the recovery and mortality rates of the Threatened state are modelled as being dependent on the Threatened state to represent the impact of the health care system being overwhelmed. In [29], this effect was achieved in a two-step process, whereby a model was created where the Threatened population was divided into those in the limited-capacity intensive care unit (ICU) and those not, and then this model was simplified to maintain the eight states described above. The compartmental diagrams for these two steps can be seen in FIG.18. Defining T1 as the Threatened population that do not require ICU treatment and T2 as those that do, and assuming that there are no transfers between these two populations, the dynamics of these states can be represented as
Figure imgf000073_0002
= + + , where 1 and 1 are independent parameters and 2 and 2 are dependent on T2. This model is approximated to a lumped model as shown
Figure imgf000073_0003
the right of FIG.5.3, for a defined ICU capacity of TICU , by assuming that if T2 TICU : ( ) = + , (5.14)
Figure imgf000074_0001
but if T2 > TICU , (T ) increases to crit for the remaining T2 population who require ICU treatment but cannot access it, and
Figure imgf000074_0002
recovery rate for this group also drops to 0. The (T )T and (T )T terms can therefore be represented as:
Figure imgf000074_0003
and used in equations (5.9) to (5.11). To implement controls in this model, public health policies are seen to have a direct influence on the infection rates and , and this relationship can be modelled as (5.18)
Figure imgf000074_0004
(5.19) where u(t) is constrained to [0, 1] and can be used to vary the infection rates between minimum values of min and min with u = 1 and maximum values of max and max with u = 0. SIDARTHE model limitations Although the SIDARTHE model can capture the major aspects of the dynamics of COVID-19, there are several limitations to this model. First of all, this model represents the population as static, other than deaths due to COVID-19. SIDARTHE does not include population changes due to travel, births, or non-COVID-related deaths. A more complex model that did include non-COVID-related deaths would also be an interesting application for dual control. Additionally, the SIDARTHE model only represents the public health policies as a single control input, lumping the impact of these policies into a single value representing the severity of the restrictions. Although this makes the implementation of the model much easier, it would be difficult for health agencies to get precise recommendations from such a lumped term. Additionally, this single control action limits the potential probing that a dual control method could implement, as in reality multiple policies can be varied over time. For instance, media campaigns, enforcing social distancing and mask use, performing asymptomatic testing, performing symptomatic testing, quarantining of positive cases, increasing non-ICU hospital resources, and increasing ICU resources could all be considered independent controls, and the of the SIDARTHE model to include these controls will be discussed in the next section. Changes to the SIDARTHE model To extend the SIDARTHE model to have separate control inputs representing different types of public health policies, a similar approach to that of equations (5.18) and (5.19) was used. The public health policies that were considered were media campaigns (u1), enforcing social distancing and mask use (u2), performing asymptomatic testing (u3), performing symptomatic testing (u4), quarantining of positive cases (u5), increasing non-ICU hospital resources (u6), and increasing ICU resources (u7). Since many of these public health policies influence more than one parameter in the SIDARTHE model to varying levels, effectiveness parameters were introduced. For instance, both media campaigns and enforcing social distancing and mask use will lower , but the media campaigns may do so less effectively. The impact of the control inputs on the SIDARTHE model parameters can therefore be
Figure imgf000075_0001
, where the terms are effectiveness factors for the controls, describing the influence they have For each of the SIDARTHE model parameters, the maximum and minimum values are maintained. For instance,
Figure imgf000075_0002
+ + = 1, (5.30) and therefore when these effectiveness factors are used as uncertain parameters in dual iLQG, one of the factors from each set can from the others. Two-parameter comparison with dual MS-SP-NMPC Since the dual MS-SP-NMPC code was only able to handle two parameters, it was compared with dual and adaptive iLQG for the original single-control SIDARTHE model. The uncertain parameters were chosen to be min and min, as the extent to which the spread of COVID-19 can be reduced by imposing full restrictions is a critical factor in determining the trade-off between reducing cases and the socio-economic impacts of restrictions. The values of the total population and the parameters used in these simulations were taken from [29], where the COVID-19 outbreak in Germany was modelled. The states were constrained to [0, 82999999] and the true and estimated initial values of the state vector were both [82998999, 1000, 0, 0, 0, 0, 0, 0] . The control inputs were constrained to [0, 1]. The true value of the constant parameter vector was [0.0422, 0.0422], while its initial estimate was [0.15, 0.1] with a covariance matrix of [0.0025, 0; 0, 0.0011]. This system was simulated over 40-time steps that were 1.0 days long, with a rolling horizon approach. The Diagnosed, Recognized, Threatened, and Extinct states were measured with a noise term of diag([10, 7.5, 5, 2.5]), and as the MS-SP-MPC did not consider the states to be uncertain, the process noise scaling matrix for the dual iLQG state and parameter dynamics were set to 10- 15I10 and the noise was not implemented in the true system in the outer loop. The cost function was set as ( , ) = + , (5.31) where cx = [0, 0, 0, 0, 0, 0.0033, 0, 0.0267], and cu = 10. This cost function was very similar to the one used in [51] for the suppression of COVID-19, but cu was increased here to increase the cost of imposing restrictions. Additionally, this is a representative cost function; its values would have to be set by policymakers, but the benefits illustrated in this work are robust to various cost functions. Instead of using a shrinking horizon approach for the SIDARTHE simulations, a rolling horizon approach was used to match how this problem would be approached by governments. Since COVID-19 is an ongoing issue, the control approach should reflect that, and ignoring this fact was shown to cause unexpected results from the controllers. When using a shrinking horizon, all of the controllers had knowledge of the fixed terminal simulation time, and as that terminal time approached, they changed their behaviour. Due to the cost of the control actions and the propagation delay between infected individuals becoming hospitalized or dying, the drop all restrictions, or controls, to zero near the end of the simulation to reduce costs, as there were no costs associated with the soon-to-be spiking hospitalization levels after the simulation ended. In essence, the controllers were behaving as if they knew the world was ending at a particular time, and there was no need for restrictions. As this behaviour was not representative of the approach governments would likely take, a rolling horizon approach was used, where even during the last time step of the simulation, the controller was still trying to minimize costs over a time horizon that extended past the final time step. This rolling horizon approach gave the desired controller behaviour and was used for all of the SIDARTHE simulations. FIG.19A shows the controls for the adaptive iLQG, dual iLQG, and dual MS-SP- NMPC algorithms. Both of the iLQG algorithms used the full range of the constrained control, while the MS-SP-NMPC algorithm’s control remained near zero for the entire simulation duration. The dual iLQG controls use a high degree of restrictions early on, with a small variation that likely helped with parameter identification, before tapering off over the 40-day time frame. The effect of these controls on the system states can be seen in FIG.19B, where the near lack of control from the MS-SP-NMPC algorithm is clear from the rapid increase in case numbers, while the iLQG algorithms did better at keeping the case numbers under control, with dual iLQG performing better than adaptive iLQG. The estimation of the two uncertain parameters is shown in FIG.19C. Here, with the control hardly used, the dual MS-SP-NMPC algorithm’s estimates remained at the initial values, while the iLQG algorithms converged to the true values in a similar manner. Finally, the plot of the cumulative cost for each algorithm is shown in FIG.19D. The MS-SP-NMPC’s cost increased each time step with the increasing case counts, remaining lower than the cost of the iLQG algorithms until after the 30th day. Comparing the total costs, dual iLQG outperformed adaptive iLQG and dual MS-SP-NMPC by 28% and 66% respectively. Robustness and stability of results To explore the robustness and stability of these results, the above simulation was run 100 times with each time step of the initial control policy being drawn randomly from a normal distribution with a mean of 0.5 and a variance of 0.0625. This process was repeated with the iLQG, adaptive iLQG, and dual iLQG algorithms for three different rolling horizon lengths. The impact of the horizon length on the simulation time of these algorithms was of particular interest to compare with similar dual MS-SP-NMPC results. These results are shown in FIG. 20A where the abbreviations A-iLQG and D-iLQG are used for adaptive iLQG and dual iLQG. Looking to the left-hand side of Table 5.2, the best solution of the 100 seeds is shown along with a measure of how close 90% of the solutions were to this minimum in brackets. For instance, the first entry reads that the minimum solution was 427 and that 90% of the 100 seeds had results within 2.04% of this minimum value (therefore the results were very consistent). With a horizon length of 5, all three of these algorithms preformed poorly compared to the longer horizon options shown, but the solutions were concentrated with 90% of them with 2-3% of their respective minimums, therefore they consistently performed poorly. The results for the other two horizon lengths were similar to one another in terms of their minimums, but the longer horizon length of 40 improved the concentration of the solutions around the best solution for all three algorithms. If the horizon length of 5 was too short, causing the algorithms to consistently give poor results, the longer horizon options do not appear to have that issue. To compare these results with individual runs of the dual MS-SP- NMPC algorithm, for horizons of 5 and 25 had results of 486.6 and 509.9, while the algorithm ran out of available memory when trying to run with the full horizon of 40. All three of the iLQG algorithms outperform dual MS-SP-NMPC in this application. Moving to the right-hand side of FIG.20A, the median simulation times from the 100 seeds are shown along with the interquartile range of the times, both shown in seconds. Here, longer horizons are correlated with longer median run times, and in most cases, increasing complexity of the iLQG algorithm also increases the median run time. The exceptions to this last point are that for a horizon of 25, the adaptive iLQG algorithm has a lower median time than the iLQG algorithm, and for the horizon of 40 the dual iLQG algorithm has the lowest median time. These simulation times are all within the same order of magnitude, which cannot be said for the dual MS-SP-NMPC algorithm. Individual runs of the dual MS-SP-NMPC algorithm with horizons of 5 and 25 had simulation times of 594.6 and 11301 seconds, while the algorithm ran out of available memory when trying to run with the full horizon of 40. This demonstrates the significant advantage of dual iLQG over dual MS-SP-NMPC. To get a better idea of these results for the case of with the rolling horizon length of 40, a histogram of the costs and a box plot of the simulation times for each of the algorithms is shown in FIGS.20B and 20C. In FIG.20B, both adaptive and dual iLQG are an improvement on iLQG, but where the iLQG and adaptive iLQG solutions are strongly skewed to the left, the dual iLQG solutions do not show this same pattern. It may be that for this application the dual probing actions do not result in the cost reductions that the algorithm determined were probable. The dual iLQG algorithm does find a similar minimum solution as the adaptive algorithm though, and the two have a similar variance. Looking at the simulation time required for these seeded runs in FIG.20C, the results are similar between the three algorithms. Although the dual algorithm would normally be expected to have the longest times, and it does have the largest extreme times in this case, it actually has the lowest median time of the three algorithms. In this application, the complexity of the model and the length of the horizons appear to have more of an impact on the simulation times than the differences between the Sixteen-parameter comparison with adaptive iLQG As in the anaerobic digestion application, the model expanded from the two-parameter case to sixteen parameters in this case to compare dual and adaptive iLQG. Dual MS-SP-NMPC was not included in this comparison as it is unable to handle more than two uncertain parameters. The modified SIDARTHE model used for this comparison used 5 control inputs, u1 to u5, as described in equations (5.20) to (5.24). The states are constrained to [0, 82999999] and the true and estimated initial values of the state vector are both [82998999, 1000, 0, 0, 0, 0, 0, 0] . The 5 control inputs are constrained to [0, 1]. The true and initial estimated values of the uncertain parameters used in this simulation are shown in FIG.21. This system is simulated over 30 time steps that are 1.0 days long, with a rolling horizon approach. The Diagnosed, Recognized, Threatened, and Extinct states are measured with a noise term of diag([10, 7.5, 5, 2.5]), and the noise term for dual iLQG were set to 10- 5 for the state dynamics and 10- 15 for the parameter dynamics, but the noise was not implemented in the true system in the outer loop. The cost function is given as ( , ) = + , (5.32) where cx = [0, 0, 0, 0, 0, 0.033, 0, 0.267], and cu = [0.01, 10, 0.75, 0.75, 0.5]. The controls resulting from each of the iLQG algorithms are shown in FIG.22A. Compared to the adaptive controller, the dual controller has significantly lower values for the 2nd, 3rd, and 4th controls (enforcing social distancing and mask use, asymptomatic testing, and symptomatic testing) that have higher weights in the cost function, while the 1st and 5th controls are very similar to adaptive iLQG. FIG.22B shows the true and estimated states for each algorithm, along with the covariance of the estimates. With lower controls, the case counts for dual iLQG are higher, especially for the 2nd, 3rd, and 4th states (Infected, Diagnosed, and Ailing), but for the 6th and 8th states (Threatened and Extinct) that have non-zero cost terms, the dual controller has only slightly higher case numbers than the adaptive controller. Due to the significant measurement noise and large covariance of the states in this example, the parameter estimates shown in FIG.22C also have large covariances, and the estimates do not converge to the true values in the 30-day time horizon. Even though the dual iLQG controller has higher case counts, it can maintain similar Threatened and Extinct cases while fewer restrictions, and this results in a total cost that is 21% lower than the adaptive controller, as shown in FIG.22D. The dual controller has slightly higher costs than the adaptive controller initially, before having similar results in the middle of the simulation, and then significantly lower results for the last part of the simulation. Summary This present example application demonstrates how dual iLQG can be used to inform government policy for uncertain nonlinear systems. To allow for dual iLQG to be compared with dual MS-SP-NPMC, the SIDARTHE model of COVID-19 was simplified to having only two uncertain parameters. Dual iLQG outperformed dual MS-SP-NMPC by 7% and adaptive iLQG by 26%, but dual MS-SP-NMPC was unable to handle the following sixteen-parameter case. With sixteen uncertain parameters, dual iLQG outperformed adaptive iLQG by 21%. While the present example has focused on the application of the aforementioned adaptive and dual control methods to the determination of public health controls for controlling the outbreak of COVID-19, it will be understood that the present example can be adapted for the determination of public health controls for controlling the outbreak of any disease. Moreover, while specific examples of mathematical models of the dynamics of the population system have been employed in the present example, with specific associated states, uncertain parameters, and control actions, it will be understood that the skilled artisan may adapt the present example to alternative mathematical models, associated states, uncertain parameters, and control actions that may be suitable for a given implementation. Moreover, while some example implementations may include applying the control actions that are determined as an improved control policy is determined on each iteration of the adaptive or implicit dual control method, other example implementations and variations of the present example may be absent of the step of applying the recommended control action and may only be employed to communicate potential control actions that can be considered, and optionally implemented, by a public health entity. In some example implementations, during one or more iterations of the method, a control action may be taken by the public health entity that is different from a recommended control action as per a currently determined control policy at a given time step or iteration (which may include taking no action at all), and the method may proceed to the following iteration without application of the recommended control action. Example 3: Autonomous Vehicle Control Autonomous vehicles must be able to operate in uncertain and changing environments. Many of these uncertainties can be expressed as uncertain parameters in stochastic mathematical models of the dynamics of the autonomous vehicle system, and additional information on these parameters can be gained through measurements. With a large number of states and uncertain parameters depending on the type of land, water, or air vehicle, computing control actions for a vehicle in real-time would be challenging for any conventional algorithm. This technical problem of the need for an improved control method that accounts for the uncertainty of parameters affecting the dynamics of the autonomous vehicle can be solved by the derivative-based adaptive and dual control methods of the present disclosure, as described below. These unconventional methods employ derivate-based stochastic DDP-based algorithms and involve the processing, by the computer hardware controlling the autonomous vehicle, of an augmented state data structure that includes both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain parameters, according to an encoded augmented mathematical model of the dynamics of the autonomous vehicle, where the uncertain parameters are treated as states. This unconventional approach enables, as described above, the efficient computation of a suitable control policy with improved computational efficiency. The adaptive and dual control methods described above could be adapted for this application as follows. Depending on the type of vehicle, a subset of the vehicle’s position and angle in each of the three dimensions, as well as time derivatives of those variables, could be employed as states of the system. It will be understood that the specific states selected for a given mathematical model may vary among different example implementations. Furthermore, depending on the specific properties and configuration of the autonomous vehicle being considered, a wide variety of parameters could be modeled as uncertain parameters. For example, in the case of a land vehicle, uncertain parameters may include, but would not be limited to, any one or more of the road friction coefficient and tire contact parameters. For air or water vehicles, these parameters may include, but would not be limited to, any one or more of the effectiveness coefficients of the control surfaces or lift, drag, or moment coefficients or other nondimensional coefficients. The mass of the vehicle may also be considered if it is deemed sufficiently uncertain in a given case. The adaptive and dual control methods described above would calculate control policies that would be implemented on the autonomous vehicles with the objective of minimizing the given cost functions. These control policies would inform for example the use of wheel torques, braking, and steering in the case of a land vehicle, or the thrust, control surface position, or ballast tanks in the case of a water vehicle. The example adaptive control or implicit dual control methods described herein, when applied to the present example application of autonomous vehicle control, would employ an encoded augmented mathematical model of the dynamics of the autonomous vehicle. Although simplifications or linearization of this set of equations describing the autonomous vehicle dynamics are possible, this would not be necessary in some example implementations, such as, for example, in the case of a dual iLQG implementation that can handle a nonlinear system. When implemented according to the adaptive control methods described above, the use of an augmented state data structure, the state covariance data structure, and the encoded augmented mathematical model of the dynamics of the autonomous vehicle in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence and computing efficiency. Moreover, when implemented according to the implicit dual control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the autonomous vehicle within the augmented stochastic DDP-based algorithm within the inner processing loop, enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieving convergence faster than a corresponding method absent of augmentation. Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the autonomous vehicle, and how the information it gains will help it achieve its goal as expressed in the cost function, such that the control policy determined for controlling the dynamics of the autonomous vehicle adapts to uncertain model parameters, such as uncertain and potentially hazardous conditions. For example, when an autonomous vehicle is operating under conditions in which ice may be present on the road, a dual control algorithm (implemented according to a method employing an augmented stochastic DDP-based algorithm, as noted above) for an autonomous land vehicle would be cautious in that it would limit large and sudden control actions, probing in that it would make movements to better assess the current road conditions, and it would be selective in that it would prioritize identifying the road conditions over other uncertain parameters depending on the given situation. Example 4: Personalized Healthcare Rehabilitation Understanding the best rehabilitation treatment plan for an individual is difficult for many reasons. A significant one is that existing treatment pathways are built on studies that aggregate across clinics. By the time these studies are published, the pathways that they have analyzed are decades old, no longer representing the entire set of available actions. In addition, these pathways are aggregated. For example, many studies have found that what works best for females does not work best for males; or what works best for elderly does not work best for youth – even when many of the conditions and symptoms are the same. As well, every individual has different demographics in terms of how accessible different treatment options are; how motivated they are to do certain things; and how much time they can spend to achieve their goals. Rehabilitation is also stochastic in nature – even in the same individual, the same treatment does not always create the same effect. In the same way that we could say rolling a dice was deterministic if we could accurately model the forces used to tumble it, we could say that rehabilitation is deterministic if we modeled the precise quality and quantity of sleep every night, the exact diet, and many other details, but absent of these, rehabilitation presents as an extremely stochastic process in which the current best practice is to rely on aggregated data vs. responding to the daily stochastic fluctuations presented by the patient. The number of available options, ranging from pharmacological options to surgical options to therapeutic options (including exoskeleton training, functional electrical stimulation, and conventional physiotherapy), along with the fact that they can be employed in parallel, combined with the stochastic and time-delayed nature of the process (rehabilitation now may only produce an improvement in the weeks to come) make it challenging to develop a suitable and computationally efficient control method to prescribe rehabilitation treatment pathways. This technical problem of the need for an improved control method that accounts for the uncertainty of parameters affecting the personalized and stochastic nature of the rehabilitation process can be solved by the derivative-based adaptive and dual control methods of the present disclosure, as described below. These unconventional methods employ derivate-based stochastic DDP-based algorithms and involve the processing, by computer hardware, of an augmented state data structure that includes the both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain parameters, according to an encoded augmented mathematical model of the personalized rehabilitation dynamics, where the uncertain parameters are treated as states. This unconventional approach enables, as described above, the efficient computation of a suitable control policy with improved computational efficiency. When considering the development of a mathematical model characterizing the dynamics of personalized rehabilitation, a wide range of states can be considered. For example, in one non- limiting example implementation, one or more states may be selected from the taxonomy identified by the World Health Organization (International Classification of Functioning, Disability, and Health: ICF), that includes categories of participation, activities, health condition, body functions and structures, environmental factors, and personal factors. States within the participation domain could include the number of times a person was able to go to work; engage in their favourite sport; or participate in other meaningful activities for them. As well, it could include states including quality of life. Measurements of states within this domain include the short Form 36 and other questionnaire, as well as location-data that can be tracked and logged. States within the activity level could include their level of performance in various activities, including walking on level ground, walking up stairs, getting into and out of cars, and other activities pertinent to their intended participation and their health condition. Measurements of these states include validated tests such as the timed up and go (TUG) test, or the 6-minute walk test. States related to health condition could include the functional ability to use a wheelchair or exoskeleton, and measured by physiotherapists using a rating scale. States related to body functions and structures would include the location and type of For example, in the domain of spinal cord rehabilitation it would include the level of injury (e.g., T5) and whether it was complete or incomplete, typically measured using an ASIA Impairment scale. States within the category of environmental factors could include states such as the distance to and accessibility of the various treatment options given their geography, as well as whether clinicians were able to speak their native language and/or translation services were available. Personal factors could include states such as the age, height, weight, sex, and ethnicity of the person; the level of motivation of the individual; their level of executive function; and other intrinsic attributes and motivators. These states can be measured, for example, through IQ and EQ tests and other tests of motivation. The relationship between the states, combined with inputs of treatment options can be dynamically modelled using a combination of data-driven processes and existing dynamical information on available actions. For example, there is data available that shows how individuals respond within various states (including states such as walking ability, activity level, or participation level), based on specific interventions (such as, for example, functional electrostimulation (FES) therapy or exoskeleton training), and that this relationship is temporally dynamic (e.g., well represented as a second order phenomenon with a time-constant of weeks). The dynamic model would link each of the states within each level to the states within the next level, combined with the available actions at each level. These linkages can flow both ways given the matrix structure of the model (e.g., states within the participation domain can affect the activity domain, and states within the activity domain can affect the participation domain). Non-limiting examples of parameters that are uncertain within a mathematical model of personalized rehabilitation include the gains and time-constants between each of the states. For example, in the example case of FES training, non-limiting examples of uncertain parameters include the gain and time-constant that relate FES training to walking ability, the gain and time- constant that relates FES training to the ability to participate in a specific activity such as golf, and the gain and time-constant that relate FES training to the ability to participate with community members within the golfing community. There are gains and time-constants between each of the actions and each of the states; as well as between each of the states and each other. These parameters are well represented as the cells within a matrix mapping the updated states to the existing states and the control actions, where the vector of states includes all possible states plus a time-delayed version of them (to accommodate the inclusion of a time-constant). Non-limiting examples of control actions may include pharmacological treatments (e.g., spasticity-reducing drugs such as Botox), surgical treatments (e.g., tendon release), conventional therapeutic approaches (e.g., stretching or walking), and/or robotic/assistive technological approaches (e.g., exoskeletons, functional electrical stimulation). The example adaptive control or implicit dual control methods described herein, when applied to the present example application of personalized rehabilitation, would employ an encoded augmented mathematical model of the time evolution of the personalized rehabilitation process. Although simplifications or this set of equations describing the dynamics are possible, this would not be necessary in some example implementations, such as, for example, in the case of a dual iLQG implementation that can handle a nonlinear system. When implemented according to the adaptive control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the personalized rehabilitation in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence and computing efficiency. Moreover, when implemented according to the implicit dual control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the personalized rehabilitation within the augmented stochastic DDP-based algorithm within the inner processing loop, enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieving convergence faster than a corresponding method absent of augmentation. Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the treatment decisions employed during personalized rehabilitation, and how the information it gains will help it achieve its goal as expressed in the cost function. While the example above has been illustrated within the context of control methods for improving personalized physical rehabilitation, whereby the uncertainty of one or more parameters governing the rehabilitation dynamics is accounted for during the developing of a control policy according to an adaptive or dual control method based on data augmentation, it will be understood that the present example may be adapted to a wide variety of rehabilitation applications, including, for example, sports medicine, orthopedics, respiratory, stroke recovery (e.g. motor, speech, vision), cardiology, oncology, mental health, nutrition, rheumatology and chronic pain management. Each of these medical domains involve rehabilitation and can benefit from the improved personalization in tailored therapeutic decision support that can be provided by the present adaptive and implicit dual methods, in which uncertain parameters are treated as states through data structure and model augmentation. It is noted that while some example methods may include applying the control actions that are determined as the example methods are executed and an improved control policy is determined on each iteration of the method, other example implementations of the methods may be absent of the step of applying the control action – and thus absent of performing a medical treatment or therapeutic intervention – and may only be employed to communicate potential control actions that can be considered, and optionally implemented, by a provider. In some example implementations, during one or more iterations of the method, a control action may be taken by the provider that is different from a control action as per a currently determined control policy at a given time step or iteration (which may include taking no action at all), and the method may proceed to the following iteration without application of the recommended control action. Example 5: Wearable Robotics Wearable robotics include upper and lower-limb exoskeletons and exosuits, which can either be rigid or soft. The goal of these devices is to augment, assist, or rehabilitate individuals. However, every individual moves slightly differently. Human movement is inherently stochastic due to the biologically stochastic nature of force generation within human muscles, and each person comes up with slightly different control strategies based on the dimensions of their various limbs; their strength and flexibility, and any underlying impairments they may have. One of the largest goals for wearable robotics has been to make life easier for people – typically defined as reducing the metabolic cost for them to do their desired activities. However, because of the differences between individuals and the stochasticity within all individuals, developing efficient personalized tuning settings that reduce metabolic cost has been challenging. Only recently have a few groups been able to reduce metabolic costs, and their tuning strategies are lengthy and cumbersome. Accordingly, a technical problem exists in the field of wearable robotics in that existing control methods fail to accommodate, learn and refine user-preferred parameters during operation. This technical problem involving the need to account for the uncertainty of parameters affecting the dynamics of a wearable robotic system, when attempting to find a control policy that is tailored to the personalized needs of the wearer, can be solved by the derivative-based adaptive and dual control methods of the present disclosure, as described below. These unconventional methods employ derivate-based stochastic DDP-based algorithms and involve the processing, by the computer hardware controlling the wearable robotic system, of an augmented state data structure that includes the both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain parameters, according to an encoded augmented mathematical model of the dynamics of the wearable robotic system, where the uncertain parameters are treated as states. This unconventional approach enables, as described above, the efficient computation of a suitable control policy that is tailored to the user of the wearable robotic system with improved computational efficiency. Non-limiting examples of states of such a stochastic system may include, for example, relevant states used by the wearable robot, which typically include joint angles, torques, forces, power, and energy, along with metabolic consumption as measured by CO2 expiration. Some wearable exoskeletons also use states including myoelectric activation as recorded using electromyographic electrodes, sonomyography, or other devices. Other devices include states that measure user intent as measured using EEG, or other devices. There are a variety of models that have been used and tuned to work with wearable robotics. Some of these simply map states and actions, where actions would be the forces, torques, and/or kinematic trajectories enacted by the wearable robot in response to the user’s actions (e.g., forces, myoelectric activity, or intent). Others enforce state-based impedance regimes in which, depending on which phase of gait a person is, certain parameters are tuned to determine an impedance and equilibrium position. Others enforce a phase portrait or virtual holonomic constraint. Many models are used in the field, but all of them have tunable parameters that are unique to each individual, and the majority of the models in the field have multiple parameters, making tuning them difficult given how stochastic both the user’s signals are as well as how stochastic and time-delayed the measurement of metabolic activity is. The uncertain parameters employed to define an augmented state data structure and augmented covariance data structure will in general depend on the particular mathematical model used. In an example implementation employing state-based impedance control, the uncertain parameters could include, for example, any one or more of the stiffness, damping, and inertia values along with the equilibrium position within each state. In another example implementation involving control of kinematic trajectories during the swing phase of gait, for example, the uncertain can include, for example, one or more of parameters describing a minimum jerk trajectory or other kinematic profile (e.g. according to a model described in US Patent No. 10,213,324, Sensinger and Lenzi 2019). In an example implementation involving virtual holonomic constraints, the uncertain parameters can include, for example, one or more parameters that determine the profile of the phase portrait (e.g. according to a model described US Patent No. 10,314,723, Sensinger and Gregg.2019). It will be understood that a wide variety of models and associate parameters may be employed (e.g. another example model is described in US Patent No. 10,799,373, Lenzi and Sensinger.2020), and in general, the various models each have tunable parameters that are unique to each individual. Non-limiting examples of control actions include generating joint kinematics (position or velocity profiles or phase portraits), kinetics (producing torques or torque trajectories), and applying the control actions to actuators (e.g. motors). When implemented according to the adaptive control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the wearable robotic system in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence, computing efficiency, and personalization of the operation of the wearable robotic system. Moreover, when implemented according to the implicit dual control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented model of the dynamics of the wearable robotic system within the augmented stochastic DDP-based algorithm within the inner processing loop, enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieving convergence faster than a corresponding method absent of augmentation. Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the wearable robotic system, and how the information it gains will help it control the wearable robotic system in a manner that is tailored to individual user preferences, thereby leading to improved adoption, utilization and clinical efficacy. Example 6: Fault Detection Various industrial systems and subsystems are operated by controllers, such as electrical systems, electromechanical systems (systems or subsystems that include electrically driven/actuated mechanical components), hydraulic systems, pneumatic systems, thermal systems, and combinations thereof. Such industrial systems or subsystems often face degradation over time, with the degradation causing failure. Having the ability to monitor the health of the system and detect faults indicative of potential failure can prevent expensive repairs, downtime, and loss of life. While databases of past failures may be available for some systems, allowing data-driven health monitoring and fault detection subsystems to be implemented, this data may not exist for systems that fail infrequently. While the states of such systems are often observable, allowing for feedback pertaining to state-based system health and faults to be provided to operators, degradation can be associated with uncertain (and potentially time-varying) system parameters that can be difficult to identify during normal system operation. Periodic testing during dedicated time intervals may be required to properly identify these uncertain parameters, incurring costs associated with taking the system out of normal operation. The degradation of the components of the system that causes these parameters to change over time can impact the performance of the system controller and may eventually cause the failure of the system overall. Accordingly, a technical problem exists in the field of industrial system control in that existing control methods fail to facilitate the monitoring of parameters associated uncertainty and potential system degradation. This technical problem involving the need to account for and facilitate monitoring of uncertain parameters that are associated with potential degradation of an industrial system can be solved by the adaptive and dual control methods of the present disclosure, as described in further detail below. These unconventional methods employ derivate-based stochastic DDP- based algorithms and involve the processing, by the computer hardware controlling the industrial system, of an augmented state data structure that includes the both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain according to an encoded augmented mathematical model of the dynamics of the industrial system, where the uncertain parameters are treated as states. This unconventional approach enables, as described above, the efficient computation of a suitable control policy that accounts for system degradation with improved computational efficiency. Moreover, such an approach enables the monitoring, during the control of the system, of the uncertainty associated with one or more uncertain parameters, thereby enabling a control method that can facilitate the early detection of system degradation. Mathematical models of industrial systems such as those described above may already exist, or may be generated to characterize the time evolution of system states according to parameters that can include one or more uncertain parameters. It will be understood that the specific states, parameters, and controls would be specific to a given industrial system. Non-limiting examples of industrial systems and associated uncertain parameters include the detection of scaling in boilers (for which example uncertain parameters could include heat transfer efficiency or fouling factor) and DC motor fault detection (for which example uncertain parameters could include motor resistance, friction torque coefficient, and magnetic flux linkage). Additional examples of industrial systems that are controllable for autonomous fault detection in the presence of uncertain model parameters include, but are not limited to, robotic systems, autonomous and non-autonomous vehicles, wind and tidal turbines, and HVAC equipment. The adaptive and dual control methods described above would calculate control policies that would be implemented on the industrial system with the objective of minimizing the given cost functions and also providing the features of fault detection and system health monitoring. The nature of the controls would depend on the specific industrial system to which this control approach was applied. When implemented according to the adaptive control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the industrial system in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence, computing efficiency, and enabling the online monitoring of the uncertain parameters and their degree of uncertainty, as per updated values generated by the outer loop filter. Moreover, when implemented according to the implicit dual control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the industrial system within the augmented stochastic DDP-based algorithm within the inner processing loop, enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieve convergence faster than a corresponding method absent of augmentation, and facilitating the monitoring of uncertainty associated with its parameter estimates while the system controlled during normal operation. In this case of an implicit dual controller, if the parameter dynamics included a sufficiently high uncertainty, the augmented stochastic DDP-based algorithm can be motivated to probe and monitor these parameters so that the overall cost was not impacted. The uncertain parameters and their uncertainties can be employed to facilitate fault detection according to many different implementations. In some example implementations, criteria, (such as thresholds) one or more specific uncertain parameters defining bounds on their normal operating ranges, and/or uncertainty (e.g. variances and/or covariances as determined by the augmented covariance data structure) can be employed, allowing for feedback to operators on both uncertain state- and parameter-based system health and faults. Example 7: Building Climate Control With 12% of global energy usage being used for heating and cooling buildings, building climate control significantly impacts climate change (for example, see González-Torres et al., Energ. Rep.8, 626-637, 2022). The control of the climate of a building can involve a variety of devices, including all types of HVAC equipment, automatic blinds, lighting, and heat energy storage equipment to maintain a building’s temperature and humidity levels. Sensors for such a system could include temperature, humidity, occupancy, sunlight, and anemometer sensors. Such a control system can also be informed by external factors including weather predictions, the scheduled usage of the building, and time-of-use power pricing, all of which contribute to uncertainty in a mathematical model employed by a controller. Given a mathematical model of the dynamics of the climate within a building, MPC-based controllers can maintain the climate of the building by predicting and accounting for the impact of disturbances, but this approach requires a sufficiently accurate model of the building, including numerous parameters that have associated uncertainty. Some of these uncertain parameters involve external factors as noted above, while other uncertain parameters are building-specific and can vary with time. Accordingly, identifying these uncertain parameters and their appropriate respective values would have to be performed for each building for which this control method is used, and this would be an expensive and potentially cost-prohibitive approach. Accordingly, a technical problem exists in the field of building climate control in that existing control methods fail to accommodate, learn and refine uncertain parameters related to external factors and/or building-specific aspects that impact the interior climate of a building. This technical problem can be solved by the adaptive and dual control methods of the present disclosure, as described in further detail below. These unconventional methods employ derivate- based stochastic DDP-based algorithms and involve the processing, by the computer hardware controlling the industrial system, of an augmented state data structure that includes the both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain parameters, according to an encoded augmented mathematical model of the dynamics of the building climate control system, where the uncertain parameters are treated as states. This unconventional approach enables, as described above, the efficient computation of a suitable control policy that accounts for the aforementioned uncertainty with improved computational efficiency. The states involved in a building climate control model can vary depending on climate control implementation, but in some non-limiting examples, can include temperatures and humidities, and the uncertain parameters could consist of thermal conductivities, thermal capacities, radiation heat transfer coefficients, and HVAC equipment efficiencies. Although the thermal properties of common building materials are well-documented, differences in humidity and compaction from the laboratory testing conditions can impact these values throughout the building’s lifespan. Radiation heat transfer coefficients are critical for determining the impact of sunlight on a building’s external and internal temperatures, and fouling of surfaces and glazings can significantly change these values. The efficiencies of many types of HVAC equipment, specifically heat pumps for example, are dependent on the outside conditions and can vary significantly throughout a single day. All of these time-varying parameters could be included in the dual iLQG augmented state vector so that they could be identified during normal operation of the building climate control system. Mathematical models for building climate control typically describe the rate of change of temperatures and humidities of interest. This is usually done in a lumped approach to simplify the control model and avoid the use of partial differential equations, such as for example considering the entire exterior south face of a building to be a single temperature. The states of the model could include for example the temperature of each external and internal wall of the building, the air temperature of each room in the building, the humidity of each room in the building, and the temperature of any heat storage equipment. The model would relate the rates of change of these states to the values of these states, the thermal properties of the building both known and uncertain, the impact of the control equipment, and the impact of external factors as described above. The adaptive and dual control methods described above would calculate control policies that would be implemented with the building climate control equipment with the objective of minimizing the given cost functions. These control policies would inform for example the scheduling of the HVAC equipment, automatic blinds, lighting, and heat energy storage equipment. When implemented according to the adaptive control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the building climate control system in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence, computing efficiency, and enabling the online monitoring of the uncertain parameters degree of uncertainty, as per updated values generated by the outer loop filter. Moreover, when implemented according to the implicit dual control methods described above, the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the building climate control system within the augmented stochastic DDP-based algorithm within the inner processing loop, enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to refinement of uncertain parameters associated with external factors and building-specific aspects during climate control, and to achieve convergence faster than a corresponding method absent of augmentation. Example 8: Smart Grid Control Regulating Smart-grids is another challenging application, because the grid often does not have control over the sources of power being added to it. Each of these sources can have a different impedance (resistance, inductance, and capacitance), and can run at slightly different frequencies and amplitudes, making the grid itself a stochastic system. Ensuring that power from each component of the grid can be used without the entire grid becoming unstable is a challenging problem. Accordingly, a technical challenge exists in regulating smart grids in that the system is stochastic and that there are many uncertain parameters that govern the dynamics of its states, making it difficult to ensure stability during operation. This technical problem involving the need to account for uncertain parameters that are associated with potential instability of a smart grid during its regulation can be solved by the adaptive and dual control methods of the present disclosure, as described in further detail below. These unconventional methods employ derivate-based stochastic DDP-based algorithms and involve the processing, by the computer hardware controlling the smart grid, of an augmented state data structure that includes the both the states and the uncertain parameters, an augmented state covariance data structure that characterizes the covariance both the states and the uncertain parameters, according to an encoded augmented mathematical model of the dynamics of the smart grid, where the uncertain parameters are treated as states. This unconventional approach enables, as described above, the efficient computation of a suitable control policy that accounts for and determines improved estimates of the uncertain parameters during control, with improved computational efficiency, thereby providing a customized control method that can lead to improved stability during regulation of the smart grid. For example, a mathematical model of a smart grid may map the various impedance parameters of each power source within the grid to the net power supply of the grid, and non- limiting examples of states include voltage amplitude and frequency being produced by each source within the grid. As noted above, many of the parameters that govern the dynamics of these states are uncertain, due to the stochastic of the system and variations among components within systems or within a given system. Non-limiting examples of uncertain parameters include the impedance parameters of each source within the grid, along with the electrical connections that connect them. Examples of control actions that can be taken, when implementing an adaptive or implicit dual control method according to the example embodiments described above or variations thereof, include selecting which sources are connected to the grid, along with the addition or removal extra impedance to the grid. The specific embodiments described above have been shown by way of example, and it should be understood that these embodiments may be susceptible to various modifications and alternative forms. It should be further understood that the claims are not intended to be limited to the particular forms disclosed, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and scope of this disclosure.
REFERENCES [1] Yehia Abdelsalam, Sankaranarayanan Subramanian, and Sebastian Engell, Asymptotically Stabilizing Multi-Stage Model Predictive Control, 202059th IEEE Conference on Decision and Control (CDC) (Jeju, Korea (South)), vol.2020-Decem, IEEE, dec 2020, pp. 710–717. [2] Joel A. E. Andersson, Joris Gillis, Greg Horn, James B. Rawlings, and Moritz Diehl, CasADi: a software framework for nonlinear optimization and optimal control, Mathematical Programming Computation 11 (2019), no. 1, 1–36. [3] Yaakov Bar-Shalom and Edison Tee, Dual Effect, Certainty Equivalence and Separation in Stochastic Control., (1974), no. 5, 250–257. [4] Yaakov Bar-Shalom and Edison Tse, Caution, Probing, and the Value of Information in the Control of Uncertain Systems, Annals of Economic and Social Measurement 5 (1976), no. 3, 323–337. [5] D. J. Batstone, J. Keller, I. Angelidaki, S. V. Kalyuzhnyi, S. G. Pavlostathis, A. Rozzi, W. T. Sanders, H. Siegrist, and V. A. Vavilin, The IWA Anaerobic Digestion Model No 1 (ADM1)., Water science and technology : a journal of the International Association on Water Pollution Research 45 (2002), no.10, 65–73. [6] Vinay A. Bavdekar and Ali Mesbah, Stochastic Model Predictive Control with Integrated Experiment Design for Nonlinear Systems, IFAC-PapersOnLine 49 (2016), no. 7, 49–54. [7] David S. Bayard and Alan Schumitzky, Implicit dual control based on particle filtering and forward dynamic programming, International Journal of Adaptive Control and Signal Processing 24 (2010), 155–177. [8] Richard Bellman, A Markovian Decision Process, Indiana University Mathematics Journal 6 (1957), no. 4, 679–684. [9] Olivier Bernard, Zakaria Hadj-Sadok, Denis Dochain, Antoine Genovesi, Jean-philippe Philippe Steyer, Comore Project, and Sophia-antipolis Cedex, Dynamical model development and parameter identification for an anaerobic wastewater treatment process. Biotechnology and bioengineering 75.4 (2001) 424-438..pdf, Biotechnology and Bioengineering 75 (2001), no.4, 424–438. [10] Vladim´ Boba´l, Petr Chalupa, Petr Dosta´l, and Petr Dosta´l Vladim´ Boba´l, Petr Chalupa, Adaptive Dual Control of Nolinear Liquid System, IFAC Proceedings Volumes 40 (2007), no. 13, 483–488. [11] Eric Bradford and Lars Imsland, Economic Stochastic Model Predictive Control Using the Unscented Kalman Filter, IFAC-PapersOnLine 51 (2018), no. 18, 417–422. [12] Jingyi Chao and Thomas Scha¨fer, Multiplicative noise and the diffusion of conserved densities, Journal of High Energy Physics 2021 (2021), no. 1, 71. Gregory Chow, Analysis and control of dynamic economic systems, Wiley, New York, 1975. [14] Noel E. Du Toit and J. W. Burdick, Robot Motion Planning in Dynamic, Uncertain Environments, IEEE Transactions on Robotics 28 (2012), no. 1, 101–115. [15] Lorenzo Fagiano, Georg Schildbach, Marko Tanaskovic, and Manfred Morari, Scenario and Adaptive Model Predictive Control of Uncertain Systems, IFAC- PapersOnLine 48 (2015), no. 23, 352–359. [16] M. Farrokhsiar and H. Najjaran, A robust probing motion planning scheme: A tube- based MPC approach, 2013 American Control Conference (Washington, DC, USA), IEEE, jun 2013, pp. 6492–6498. [17] Morteza Farrokhsiar, MPC-based Mobile Robots Motion Planning and Control in Uncertain Dynamic Environment, Doctoral, The University of British Columbia, 2015, p. 114. [18] A Feldbaum, Dual control theory. I, Avtomat. i Telemekh. 21 (1960), no. 9, 1240– 1249. [19] Yi Guo and Tyler H. Summers, A Performance and Stability Analysis of Low-inertia Power Grids with Stochastic System Inertia, 2019 American Control Conference (ACC), IEEE, jul 2019, pp. 1965–1970. [20] Kristian G. Hanssen and Bjarne Foss, Based Implicit Dual Model Predictive Control, IFAC-PapersOnLine 48 (2015), no.23, 416–421. [21] Tor Aksel N. Heirung, Bjarne Foss, and B. Erik Ydstie, MPC-based dual control with online experiment design, Journal of Process Control 32 (2015), 64–76. [22] Tor Aksel N Heirung, Joel A Paulson, Jared O’Leary, Ali Mesbah, Jared O Leary, Ali Mesbah, Jared O’Leary, and Ali Mesbah, Stochastic model predictive control - how does it work?, Computers & Chemical Engineering 114 (2018), 158–170. [23] Tor Aksel N Heirung, B. Erik Ydstie, and Bjarne Foss, An Adaptive Model Predictive Dual Controller, 11th IFAC International Workshop on Adaptation and Learning in Control and Signal Processing (Caen, France), vol. 46, IFAC, 2013, pp. 62–67. [24] Wilson Jallet, Antoine Bambade, Nicolas Mansard, and Justin Carpentier, Constrained Differential Dynamic Programming: A primal-dual augmented Lagrangian approach, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (Kyoto, Japan), 2022. [25] Julie Jimenez, Eric Latrille, Je´rome Harmand, Angel Robles, Jose´ Ferrer, Daniel Gaida, Christian Wolf, Francis Mairet, Olivier Bernard, Victor Alcaraz-Gonzalez, Hugo Mendez-Acosta, Daniel Zitomer, Dennis Totzke, Henri Spanjers, Fabian Jacobi, Alan Guwy, Richard Dinsdale, Giuliano Premier, Sofiane Mazhegrane, Gonzalo Ruiz-Filippi, Aurora Seco, Thierry Ribeiro, Andre´ Pauss, and Jean Philippe Steyer, Instrumentation and control of anaerobic digestion processes: a review and some research challenges, Reviews in Environmental Science and Biotechnology 14 (2015), no. 4, 615–648. Hoil Kil, Dewei Li, Yugeng Xi, and Jiwei Li, Model predictive control with on-line model identification for anaerobic digestion processes, Biochemical Engineering Journal 128 (2017), 63–75. [27] Edgar D. Klenske, Philipp Hennig, Bernhard Scholkopf, and Melanie N. Zeilinger, Approximate dual control maintaining the value of information with an application to (ECC) (Aalborg, Denmark), IEEE,
Figure imgf000096_0001
[28] Edgar Dietrich Klenske, Nonparametric Disturbance Correction and Nonlinear Dual Control, Ph.D. thesis, 2017, p. 251. [29] Johannes Ko¨hler, Lukas Schwenkel, Anne Koch, Julian Berberich, Patricia Pauli, and Frank Allgo¨wer, Robust and optimal predictive control of the COVID-19 outbreak, Annual Reviews in Control 51 (2021), 525–539. [30] Kunal Kumar, Tor Aksel N Heirung, Sachin C. Patwardhan, and Bjarne Foss, Experimental evaluation of a MIMO adaptive dual MPC, IFAC-PapersOnLine 28 (2015), no. 8, 545– 550. [31] H. C. La, A. Potschka, J. P. Schlo¨der, and H. G. Bock, Dual Control and Information Gain in Controlling Uncertain Processes, IFAC-PapersOnLine 49 (2016), no. 7, 139– 144. [32] Huu Chuong La, Dual Control for Nonlinear Model Predictive Control, Ph.d., Heidelberg University, 2016, p. 140. [33] Francoise Lamnabhi-Lagarrigue, Anuradha Annaswamy, Sebastian Engell, Alf Isaksson, Pramod Khargonekar, Richard M. Murray, Henk Nijmeijer, Tariq Samad, Dawn Tilbury, and Paul Van den Hof, Systems & Control for the future of humanity, research agenda: Current and future roles, impact and grand challenges, Annual Reviews in Control 43 (2017), 1–64. [34] Gilwoo Lee, Siddhartha S. Srinivasa, and Matthew T. Mason, GP-ILQG: Data-driven Robust Optimal Control for Uncertain Nonlinear Dynamical Systems, arXiv abs/1705.0 (2017). [35] Weiwei Li and Emanuel Todorov, Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems, Proceedings of the First International Conference on Informatics in Control, Automation and Robotics, SciTePress - Science and Technology Publications, 2004, pp. 222–229. [36] , An iterative optimal control and estimation design for nonlinear stochastic system, Proceedings of the IEEE Conference on Decision and Control (San Diego, CA, USA), 2006, pp. 3242–3247. [37] , Iterative linearization methods for approximately optimal control and estimation of non-linear stochastic system, International Journal of Control 80 (2007), no. 9, 1439–1453. [38] Sergio Lucia, Robust Multi-stage Nonlinear Model Predictive Control, Ph.d., Technical University of Dortmund, 2015. [39] Sergio Lucia and Sebastian Engell, Potential and limitations of multi-stage nonlinear model predictive control, IFAC-PapersOnLine 28 (2015), no. 8, 1015–1020. [40] Sergio Lucia, Sankaranarayanan Subramanian, Daniel Limon, and Sebastian Engell, Stability properties of multi-stage nonlinear model predictive control, Systems and Control Letters 143 (2020), 104743. [41] Sergio Lucia, Alexandru Tatulea-Codrean, Christian Schoppmeyer, and Sebastian Engell, An environment for the efficient testing and implementation of robust NMPC, 2014 IEEE Conference on Control Applications (2014), no. October, 1843–1848. [42] Harsh Maheshwari, Shreyas Shetty, Nayana Bannur, and Srujana Merugu, CoSIR: Optimal control of SIR epidemic dynamics by mapping to Lotka- Volterra System, medRxiv (2021). [43] Giancarlo Marafioti, Enhanced Model Predictive Control: Dual Control Approach and State Estimation Issues, Doctoral, Norwegian University of Science and Technology, 2010, p. 150. [44] David Mayne, A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems, International Journal of Control 3 (1966), no. 1, 85–95. [45] Zawadi Mdoe, Dinesh Krishnamoorthy, and Johannes Ja¨schke, Adaptive horizon economic nonlinear model predictive control, Journal of Process Control 92 (2021), 108–118. [46] Rudolph Van Der Merwe and Eric Wan, Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, in Proceedings of the 60th Annual Meeting of The Institute of Navigation (ION) (2004). [47] Ali Mesbah, Stochastic model predictive control with active uncertainty learning: A Survey on dual control, Annual Reviews in Control 45 (2018), 107–117. [48] Djordje Mitrovic, Stefan Klanke, and Sethu Vijayakumar, Adaptive optimal feedback control with learned internal dynamics models, Studies in Computational Intelligence 264 (2010), 65–84. [49] Altan Onat, A Novel and Computationally Efficient Joint Unscented Kalman Filtering Scheme for Parameter Estimation of a Class of Nonlinear Systems, IEEE Access 7 (2019), 31634– 31655. [50] Guaranteed parameter estimation of non-linear dynamic systems using high-order bounding techniques with domain and CPU-time reduction strategies, IMA Journal of Mathematical Control and Information 33 (2016), no. 3, 563–587. [51] Tama´s Pe´ni, Bala´zs Csutak, Ga´bor Szederke´nyi, and Gergely Ro¨st, Nonlinear model predictive control with logic constraints for COVID-19 management, Nonlinear Dynamics 102 (2020), no. 4, 1965–1986. [52] Jay Prakash Tripathi and Richa Tripathi, Adaptive nonlinear control scheme for containment of COVID-19 spread, Simulation 98 (2022), no. 9. [53] Martin A. Sehr and Robert R. Bitmead, Tractable dual optimal stochastic model predictive control: An example in healthcare, 2017 IEEE Conference on Control Technology and Applications (CCTA) (Kohala Coast, Hawaii, USA), IEEE, aug 2017, pp. 1223–1228. [54] Jonathon W. Sensinger and Strahinja Dosen, A Review of Sensory Feedback in Upper-Limb Prostheses From the Perspective of Human Motor Control, Frontiers in Neuroscience 14 (2020). [55] Jean-Jacques Slotine and Weiping Li, Applied Nonlinear Control, Prentice Hall, 1991. [56] Sankaranarayanan Subramanian, Sergio Lucia, and Sebastian Engell, Economic multi- stage output feedback NMPC using the Unscented Kalman Filter, IFAC-PapersOnLine 28 (2015), no. 8, 38–43. [57] Yuval Tassa, Tom Erez, and Emanuel Todorov, Synthesis and stabilization of complex behaviors through online trajectory optimization, IEEE International Conference on Intelligent Robots and Systems (Vilamoura, Algarve, Portugal), 2012, pp. 4906–4913. [58] Yuval Tassa, Nicolas Mansard, and Emo Todorov, Control-limited differential dynamic programming, Proceedings - IEEE International Conference on Robotics and Automation (Hong Kong, China), 2014, pp. 1168–1175. [59] S. Thangavel, S. Lucia, R. Paulen, and S. Engell, Dual robust nonlinear model predictive control: A multi-stage approach, Journal of Process Control 72 (2018), 39–51.
[60] Sakthi Thangavel, Sergio Lucia, Radoslav Paulen, and Sebastian Engell, Towards dual robust nonlinear model predictive control: A multi-stage approach, 2015 American Control Conference (ACC) (Chicago, Illinois, USA), IEEE, jul 2015, pp. 428–433. [61] Sakthi Thangavel, Radoslav Paulen, and Sebastian Engell, Dual multi-stage NMPC using sigma point principles, IFAC-PapersOnLine, vol. 53, Elsevier B.V., 2020, pp. 11243– 11250. [62] Evangelos Theodorou, Yuval Tassa, and Emo Todorov, Stochastic differential dynamic programming, Proceedings of the 2010 American Control Conference (Baltimore, Maryland, USA), 2010, pp. 1125–1132. [63] Emanuel Todorov, Optimal Control Theory, Bayesian Brain, 2006. [64] Edison Tse, Yaakov Bar-Shalom, and Lewis Meier, Wide-Sense Adaptive Dual Control for Nonlinear Stochastic Systems, IEEE Transactions on Automatic Control 18 (1973), no. 2, 98–108. [65] Andreas Wa¨chter and Lorenz T. Biegler, On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, Mathematical Programming 106 (2006), no. 1, 25–57. [66] Avishai Weiss and Stefano Di Cairano, Robust dual control MPC with guaranteed constraint satisfaction, Proceedings of the IEEE Conference on Decision and Control (2014), 6713–6718. [67] Thomas Mikosch and Mohsen Rezapour, Stochastic volatility models with possible extremal clustering, Bernoulli 19 (2013), no. 5A. [68] Vyacheslav Tuzlukov, Generalized Approach to Signal Processing in Wireless Communications: The Main Aspects and some Examples, Wireless Communications and Networks - Recent Advances, InTech, mar 2012. [69]S.Joe Qin and Thomas A. Badgwell, A survey of industrial model predictive control technology, Control Engineering Practice 11 (2003), no.7, 733–764. [70]Fateme Aghaee, Nima Mahdian Dehkordi, Bayati, and Amin Hajizadeh, Delay and General Multiplicative Noise-Resilient Secondary Frequency and Voltage Control for an Autonomous Microgrid, 202112th Power Electronics, Drive Systems, and Technologies Conference (PEDSTC), IEEE, Feb 2021, pp.1–6. LIST OF ABBREVIATIONS LQR Linear quadratic regulator iLQR Iterative linear quadratic regulator iLQG Iterative linear quadratic Gaussian MPC Model predictive control NMPC Nonlinear model predictive control MS-SP-NMPC Multi-Stage Sigma Point Nonlinear Model Predictive Control SIDARTHE Susceptible, Infected, Diagnosed, Ailing, Recognized, Threatened, Healed, Extinct ICU Intensive Care Unit LIST OF VARIABLES A linear process state coefficient matrix, ailing state for COVID-19 model. B linear process control coefficient matrix
Figure imgf000102_0001
total inorganic carbon concentration C vector of linearized process noise terms
Figure imgf000102_0002
total inorganic carbon concentration
Figure imgf000102_0003
inlet concentrations of total inorganic carbon con- centration
Figure imgf000102_0004
Cu linearized process noise control coefficient matrix Cx linearized process noise state coefficient matrix D eigenvalues from eigenvalue decomposition, dilution rate in the anerobic digestion section, detected population state for COVID-19 model D vector of linearized measurement noise terms Du linearized noise control coefficient matrix Dx linearized measurement noise state coefficient matrix linearized measurement state coefficient matrix, extinct state for COVID-19 model E linearized measurement control coefficient matrix F process noise scaling matrix stage cost G regularized cost-to-go quadratic state dependent shorthand term G measurement noise scaling matrix Gx cost-to-go quadratic state dependent shorthand term G x cost-to-go quadratic state estimate dependent shorthand term H cost-to-go quadratic control dependent shorthand term, transfer function in the section titled “ Robustness to time-varying parameters”, healed state for COVID-19 model H regularized cost-to-go quadratic control dependent shorthand term I infected population state for COVID-19 model J cost function K gain KH Henry’s constant KI2 inhibition constant associated with µ2 KS1 half-saturation constant associated with µ1 KS2 half-saturation constant associated with µ2 L state dependant control gain M estimator process noise shorthand term N control horizon, total population for COVID-19 model P state-control quadratic cost-weighting matrix PCO2 partial pressure of carbon dioxide P estimator measurement noise shorthand term Pt total pressure Q state quadratic cost-weighting matrix Qf terminal state quadratic cost-weighting matrix R control quadratic cost-weighting matrix, recovered population for COVID-19 model S susceptible population state for COVID-19 model S1 organic substrate concentration S a 1 organic substrate concentration of feedstock a S in 1 inlet concentrations of organic substrate concen- tration S2 volatile fatty acid concentration S
Figure imgf000104_0001
volatile fatty acid concentration S in 2 inlet concentrations of volatile fatty acid concen- tration Sx state deviation dependent quadratic cost-to-go term S x estimate of state deviation dependent quadratic cost-to- go term S xx state deviation and estimate of state deviation de- pendent quadratic cost-to-go term T discrete sampling period, threatened state for COVID-19 model T1 threatened state for COVID-19 model that do not require ICU treatment T2 threatened state for COVID-19 model that do re- quire ICU treatment TICU capacity of the threatened state in the ICU TICUmax maximum capacity of the threatened state in the ICU TICUmin minimum capacity of the threatened state in the ICU V eigenvectors from eigenvalue decomposition Vk optimal cost-to-go from time k X1 concentration of acidogenic bacteria X2 concentration of methanogenic bacteria Z total alkalinity concentration Za total alkalinity concentration Zin inlet concentrations of total alkalinity concentra tion a adaptation parameter vector am negative pole of reference model transfer function ap negative pole of plant transfer function ar input reference adaptation parameter ay output adaptation parameter bmax squashing function maximum vector limit bmin squashing function minimum vector limit c constant vector ca augmented constant vector c scalar linearized process noise terms d uncertain parameter vector d scalar linearized measurement noise terms d estimate of uncertain parameter vector dp uncertain parameter trajectory vector e output error f system dynamics g measurement equation g cost-to-go linear control dependent shorthand term h terminal cost k discrete time step counter k1 yield for substrate degradation k2 yield for volatile fatty acid production k3 yield for volatile fatty acid consumption k4 yield for carbon dioxide production k5 yield for carbon dioxide production k6 yield for methane production kLa liquid/gas transfer constant km gain for reference model kp gain for plant l scalar control gain Q quadratized stage cost me unconditional mean of the error of the estimate of state deviation vector mx unconditional mean of the estimate of the state deviation vector p Laplace variable q state scalar cost-weighting matrix qa flow rate of feedstock A qbiogas molar flow rate of biogas q state linear cost-weighting matrix qCH4 molar flow rate of methane qCO2 molar flow rate of carbon dioxide r input reference r control linear cost-weighting matrix s(u) squashing function s scalar cost-to-go term sx state deviation dependent linear cost-to-go term s x estimate of state deviation dependent linear cost- to- go term u control vector u control deviation vector up control trajectory vector u¯p nominal control vector
Figure imgf000107_0001
up optimal control trajectory uss steady state control input v cost-to-go function v measurement noise, regressor vector in section titled “Robustness to time-varying parameters” w process noise x state vector xa augmented state vector x state deviation vector x time derivative of x x estimate of state deviation vector xp state trajectory vector x pa augmented state trajectory vectorp nominal state vector xp desired terminal state y measurement deviation vector ym output of reference model yp measurement trajectory vector y¯p nominal measurement vector yss steady state output regularization term modification factor 0 minimum regularization term modification factor t discrete time step covariance x state covariance
Figure imgf000108_0001
parameter covariance

Claims

CLAIMS 1. A computer-implemented method of determining control actions for controlling a stochastic system according to an implicit dual controller, the method comprising: providing control and processing circuity comprising at least one processor and associated memory, the memory comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model based on a mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the augmented mathematical model augmented to include dynamics of one or more uncertain parameters; storing, in the memory, an initialized form of an augmented state data structure, the augmented state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; storing, in the memory, via the processor, an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the one or more uncertain parameters, thereby grouping data elements of the respective covariance matrices of the states and the one or more uncertain parameters within the augmented state covariance data structure; storing, in the memory, via the processor, an initialized form of a nominal control trajectory data structure; performing a control loop iteration via the processor by: a) processing, according to an augmented stochastic differential-dynamic-programming- based algorithm, the augmented state data structure, the augmented state covariance data structure and the nominal control trajectory data structure according to the augmented mathematical model and a cost function such that a forward pass and a backward pass performed when executing the augmented stochastic differential-dynamic-programming-based algorithm, in which the one or more uncertain parameters are treated as additional states subject to the augmented mathematical model, results in a control policy configured to reduce cost through implicitly generated dual features of probing, caution, and selectiveness, thereby achieving convergence faster than a corresponding method absent of augmentation; b) processing the control policy to a control action for controlling the stochastic system; c) receiving one or more output measurements of an output of the stochastic system; and d) processing the one or more output measurements to update the augmented state data structure and to update the augmented state covariance data structure; and repeating steps a) to d) one or more times based on an updated nominal control trajectory data structure to determine and apply control actions over a plurality of time steps, thereby implicitly incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to implicitly lower a total cost of the control actions.
2. The method according to claim 1 wherein the mathematical model is configured such that all of the states are observable, and wherein, when performing step d), the one or more uncertain parameters are updated via a filter.
3. The method according to claim 1 wherein at least one of the states is unobservable, and wherein, when performing step d), a filter is configured to employ the augmented mathematical model to estimate updated values of unknown states and the one or more uncertain parameters by processing the augmented state data structure and the augmented state covariance data structure.
4. A computer-implemented method of determining control actions for controlling a stochastic system according to an adaptive controller, the method comprising: providing control and processing circuity comprising at least one processor and associated memory, the memory comprising a mathematical model of dynamics of the stochastic system, the mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the memory also comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model comprising a mathematical model augmented to include dynamics of one or more uncertain parameters; storing, in the memory, an initialized form of a state data structure, the state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system; storing, in the memory, an initialized form of a state covariance data structure, the state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system; storing, in the memory, an initialized form of an augmented state data structure, the augmented state data structure being provided as a third data of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; storing, in the memory, an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a fourth data array of data elements comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the one or more uncertain parameters, thereby grouping data elements of the respective covariance matrices of the states and the one or more uncertain parameters within the augmented state covariance data structure; storing, in the memory, an initialized form of a nominal control trajectory data structure; performing a control loop iteration via the processor by: a) processing, according to a stochastic differential-dynamic-programming-based algorithm, the state data structure, the state covariance data structure and the nominal control trajectory data structure according to the mathematical model and a cost function, such that a forward pass and a backward pass performed when executing the stochastic differential-dynamic-programming-based algorithm, results in a control policy; b) processing the control policy to determine a control action for controlling the stochastic system; c) receiving one or more output measurements of an output of the stochastic system; and d) processing the one or more output measurements to update the state data structure, the state covariance data structure and the one or more uncertain parameters, wherein a filter is configured to employ the augmented mathematical model to estimate updated values of unknown states and the one or more uncertain parameters by processing the augmented state data structure and the augmented state covariance data structure; and repeating steps a) to d) one or more times based on an updated nominal control trajectory data structure to determine and apply control actions over a plurality of time steps, thereby incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to lower a total cost of the control actions.
5. The method according to any one of claims 1 to 4 wherein for at least one time step, step b) is performed after performing steps c) and d), such that the control policy is determined based on newly determined states.
6. The method according to any one of claims 1 to 5 wherein the control action is autonomously applied to the stochastic system for at least one time step.
7. The method according to any one of claims 1 to 6 wherein the control action is not applied to the stochastic system for at least one time step.
8. The method according to any one of claims 1 to 7 wherein the control and processing circuity is encoded with the augmented mathematical model such that the augmented mathematical model is characterized by multiplicative noise.
9. The method according to any one of claims 1 to 7 wherein the control and processing circuity is encoded with the augmented mathematical model such that at least one of the uncertain parameters is modeled as a time-dependent parameter.
10. The method according to any one of claims 1 to 9 wherein the control and processing circuity is encoded to employ a moving control horizon.
11. The method according to any one of claims 1 to 10 wherein the mathematical model is obtained by data-driven modeling.
12. The method according to claim 11 wherein the mathematical model is obtained by regression-based data-driven modeling.
13. The method according to claim 1 wherein the augmented stochastic differential-dynamic- programming-based algorithm is based on an algorithm selected from the group consisting of the iterative linear quadratic Gaussian (iLQG) algorithm, the stochastic dual dynamic programming (SDDP) algorithm, and variations thereof.
14. The method according to claim 1 wherein the augmented stochastic differential-dynamic- programming-based algorithm is based on an algorithm selected from the group consisting of dual dynamic programming, the sequential linear quadratic (SLQ) algorithm, and the iterative linear quadratic regulator (iLQR) algorithm, and variations thereof, all being modified to include stochastic terms.
15. The method according to claim 4 wherein the stochastic differential-dynamic-programming-based algorithm is based on an algorithm selected from the group consisting of the iterative linear quadratic Gaussian (iLQG) algorithm, the stochastic dual dynamic programming (SDDP) algorithm, and variations thereof.
16. The method according to claim 4 wherein the stochastic differential-dynamic-programming-based algorithm is based on an algorithm selected from the group consisting of dual dynamic programming, the sequential linear quadratic (SLQ) algorithm, and the iterative linear quadratic regulator (iLQR) algorithm, and variations thereof, all being modified to include stochastic terms.
17. The method according to any one of claims 1 to 16 wherein the stochastic system is an industrial system for producing or refining a product.
18. The method according to claim 17 wherein at least one of the one or more uncertain parameters is a composition of a feedstock, and wherein the control action comprises controlling an input rate of the feedstock.
19. The method according to claim 18 wherein the industrial system is an anerobic digestion system and the feedstock is an organic feedstock suitable for digestion by microorganisms.
20. The method according to any one of claims 1 to 16 wherein the stochastic system comprises a population, wherein the states comprises a plurality of infection status states of the population, wherein the mathematical model simulates spread and dynamics of an infectious disease among the population, and wherein the control policy is configured to determine, at least in part, a severity of public policy actions for containing spread of the infectious disease, and wherein the one or more uncertain parameters comprise a minimum rate of infection when a maximum severity of public policy is applied.
21. The method according to any one of claims 1 to 16 wherein the stochastic system is an autonomous vehicle, and where at least one uncertain parameter is associated with an uncertainty caused by an impact of an environment on dynamics of the autonomous vehicle.
22. The method according to claim 21 wherein the one or more uncertain parameters comprise at least one of a friction coefficient and a drag coefficient having uncertainty due to external environmental conditions.
23. The method according to any one of claims 1 wherein the stochastic system is an individual undergoing rehabilitation, wherein the states characterize at least one of participation, activities, health condition, body functions and structures, environmental factors, and personal factors, and wherein the one or more uncertain parameters comprise gains and time-constants involving interactions between the states in response to rehabilitation control actions.
24. The method according to any one of claims 1 to 16 wherein the stochastic system is a wearable robotic system, and wherein at least one uncertain parameter is tunable on a per-user basis.
25. The method according to any one of claims 1 to 16 wherein the stochastic system is an industrial system, and wherein at least one uncertain parameter is associated with degradation of the industrial system, and wherein the method further comprises employing updated values of the at least one uncertain parameter and/or its updated uncertainty, obtained during control of the industrial system, to detect a fault associated with degradation of the industrial system.
26. The method according to any one of claims 1 to 16 wherein the stochastic system is a building climate control system, and wherein the one or more uncertain parameters comprise at least one of an uncertain parameter associated with external factor and an uncertain parameter characterizing a building-specific factor.
27. An implicit dual controller for controlling a stochastic system, the implicit dual controller comprising: control and processing circuity comprising at least one processor and associated memory, the memory comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model based on a mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the augmented mathematical model augmented to include dynamics of one or more uncertain parameters; the memory further comprising: an initialized form of an augmented state data structure, the augmented state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; an initialized form of an covariance data structure, the augmented state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the one or more uncertain parameters, thereby grouping data elements of the respective covariance matrices of the states and the one or more uncertain parameters within the augmented state covariance data structure; an initialized form of a nominal control trajectory data structure; the memory further comprising instructions executable by said at least one processor for performing operations comprising: performing a control loop iteration by: a) processing, according to an augmented stochastic differential-dynamic- programming-based algorithm, the augmented state data structure, the augmented state covariance data structure and the nominal control trajectory data structure according to the augmented mathematical model and a cost function such that a forward pass and a backward pass performed when executing the augmented stochastic differential-dynamic-programming-based algorithm, in which the one or more uncertain parameters are treated as additional states subject to the augmented mathematical model, results in a control policy configured to reduce cost through implicitly generated dual features of probing, caution, and selectiveness, thereby achieving convergence faster than a corresponding method absent of augmentation; b) processing the control policy to determine a control action for controlling the stochastic system; c) receiving one or more output measurements of an output of the stochastic system; and d) processing the one or more output measurements to update the augmented state data structure and to update the augmented state covariance data structure; and repeating steps a) to d) one or more times based on an updated nominal control trajectory data structure to determine and apply control actions over a plurality of time steps, thereby implicitly incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to implicitly lower a total cost of the control actions.
28. An adaptive controller for controlling a stochastic system, the adaptive controller comprising: control and processing circuity comprising at least one processor and associated memory, the memory comprising a mathematical model of dynamics of the stochastic system, the mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the memory an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model comprising a mathematical model augmented to include dynamics of one or more uncertain parameters; the memory further comprising: an initialized form of a state data structure, the state data structure being provided in as a first data array of data elements characterizing the states of the stochastic system; an initialized form of a state covariance data structure, the state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system; an initialized form of an augmented state data structure, the augmented state data structure being provided as a third data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a fourth data array of data elements comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the one or more uncertain parameters, thereby grouping data elements of the respective covariance matrices of the states and the one or more uncertain parameters within the augmented state covariance data structure; and an initialized form of a nominal control trajectory data structure; the memory further comprising instructions executable by said at least one processor for performing operations comprising: performing a control loop iteration by: a) processing, according to a stochastic differential-dynamic-programming-based algorithm, the state data structure, the state covariance data structure and the nominal control trajectory data structure according to the mathematical model and a cost function, such that a forward pass and a backward pass performed when executing the stochastic differential-dynamic-programming-based algorithm, results in a control policy; b) processing the control policy to determine a control action for controlling the stochastic system; c) receiving one or more output measurements of an output of the stochastic system; and d) processing the one or more output measurements to update the state data structure, the state covariance data structure and the one or more uncertain parameters, wherein a filter is configured to employ the augmented mathematical model to estimate updated values of unknown states and the one or more uncertain parameters the augmented state data structure and the augmented state covariance data structure; and repeating steps a) to d) one or more times based on an updated nominal control trajectory data structure to determine and apply control actions over a plurality of time steps, thereby incorporating prediction of how changes to the control actions result in reductions in parameter uncertainty to lower a total cost of the control actions.
PCT/CA2024/051169 2023-09-08 2024-09-09 Implicit dual control for uncertain stochastic systems Pending WO2025050223A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363537243P 2023-09-08 2023-09-08
US63/537,243 2023-09-08

Publications (1)

Publication Number Publication Date
WO2025050223A1 true WO2025050223A1 (en) 2025-03-13

Family

ID=94922791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2024/051169 Pending WO2025050223A1 (en) 2023-09-08 2024-09-09 Implicit dual control for uncertain stochastic systems

Country Status (1)

Country Link
WO (1) WO2025050223A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120065758A (en) * 2025-04-28 2025-05-30 天目山实验室 Adaptive control method and computing equipment for aircraft

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140278303A1 (en) * 2013-03-15 2014-09-18 Wallace LARIMORE Method and system of dynamic model identification for monitoring and control of dynamic machines with variable structure or variable operation conditions
CA2953385A1 (en) * 2014-06-30 2016-01-07 Evolving Machine Intelligence Pty Ltd A system and method for modelling system behaviour
EP3008528B1 (en) * 2013-06-14 2020-02-26 Wallace E. Larimore A method and system of dynamic model identification for monitoring and control of dynamic machines with variable structure or variable operation conditions
US20210050116A1 (en) * 2019-07-23 2021-02-18 The Broad Institute, Inc. Health data aggregation and outbreak modeling
US20210373513A1 (en) * 2020-05-29 2021-12-02 Mitsubishi Electric Research Laboratories, Inc. Nonlinear Optimization Method for Stochastic Predictive Control
US20220187793A1 (en) * 2020-12-10 2022-06-16 Mitsubishi Electric Research Laboratories, Inc. Stochastic Model-Predictive Control of Uncertain System
US20230022510A1 (en) * 2021-07-01 2023-01-26 Mitsubishi Electric Research Laboratories, Inc. Stochastic Nonlinear Predictive Controller and Method based on Uncertainty Propagation by Gaussian-assumed Density Filters

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140278303A1 (en) * 2013-03-15 2014-09-18 Wallace LARIMORE Method and system of dynamic model identification for monitoring and control of dynamic machines with variable structure or variable operation conditions
EP3008528B1 (en) * 2013-06-14 2020-02-26 Wallace E. Larimore A method and system of dynamic model identification for monitoring and control of dynamic machines with variable structure or variable operation conditions
CA2953385A1 (en) * 2014-06-30 2016-01-07 Evolving Machine Intelligence Pty Ltd A system and method for modelling system behaviour
US20210050116A1 (en) * 2019-07-23 2021-02-18 The Broad Institute, Inc. Health data aggregation and outbreak modeling
US20210373513A1 (en) * 2020-05-29 2021-12-02 Mitsubishi Electric Research Laboratories, Inc. Nonlinear Optimization Method for Stochastic Predictive Control
US20220187793A1 (en) * 2020-12-10 2022-06-16 Mitsubishi Electric Research Laboratories, Inc. Stochastic Model-Predictive Control of Uncertain System
US20230022510A1 (en) * 2021-07-01 2023-01-26 Mitsubishi Electric Research Laboratories, Inc. Stochastic Nonlinear Predictive Controller and Method based on Uncertainty Propagation by Gaussian-assumed Density Filters

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120065758A (en) * 2025-04-28 2025-05-30 天目山实验室 Adaptive control method and computing equipment for aircraft

Similar Documents

Publication Publication Date Title
Mesbah et al. Fusion of machine learning and MPC under uncertainty: What advances are on the horizon?
Dutta et al. A survey and comparative evaluation of actor‐critic methods in process control
Cao et al. Deep neural network approximation of nonlinear model predictive control
Hedrea et al. Results on tensor product-based model transformation of magnetic levitation systems
Yassin et al. Recent advancements & methodologies in system identification: A review
WO2025050223A1 (en) Implicit dual control for uncertain stochastic systems
Rajasekhar et al. Exploring reinforcement learning in process control: a comprehensive survey
Jeyaraj et al. Real‐time data‐driven PID controller for multivariable process employing deep neural network
Dang et al. Online self-learning fuzzy recurrent stochastic configuration networks for modeling nonstationary dynamics
Alsmeier et al. Imitation Learning of MPC with Neural Networks: Error Guarantees and Sparsification
Pozzi et al. Imitation learning-driven approximation of stochastic control models
Ławryńczuk Introduction to model predictive control
van Lith Hybrid fuzzy-first principles modeling
Vrabie et al. Biologically inspired scheme for continuous-time approximate dynamic programming
Li et al. Expensive Optimization
Truong et al. Ensemble Bidirectional Long Short-Term Memory Network Identification for Nonlinear Autoregressive Exogenous Model: Application to Dual Double-Acting Piston Pump
Mathis Dual iterative linear quadratic Gaussian control for uncertain nonlinear systems
Banker et al. Model-free Reinforcement Learning for Model-based Control: Towards Safe, Interpretable and Sample-efficient Agents
Lv et al. A novel non-contact recognition approach of walking intention based on long short-term memory network
Abonyi et al. Adaptive Sugeno fuzzy control: A case study
Niu et al. Enhancing Control Performance through ESN-Based Model Compensation in MPC for Dynamic Systems
Chatterjee et al. Robust Fault Detection Of A Hybrid Control System Using Derivative Free Estimator And Reinforcement Learning Method
Chang Neural Lyapunov Methods for Learning-based Control
Huebotter et al. Spiking Neural Networks for Continuous Control via End-to-End Model-Based Learning
Kontovourkis et al. Adaptive kinetic structural behavior through machine learning: optimizing the process of kinematic transformation using artificial neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24861429

Country of ref document: EP

Kind code of ref document: A1