
WO2025050223A1 - Implicit dual control for uncertain stochastic systems - Google Patents

Implicit dual control for uncertain stochastic systems

Info

Publication number
WO2025050223A1
Authority
WO
WIPO (PCT)
Prior art keywords
control
augmented
data structure
dual
states
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CA2024/051169
Other languages
English (en)
Inventor
Andrew Craig MATHIS
Jonathon W. Sensinger
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of New Brunswick
Original Assignee
University of New Brunswick
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of New Brunswick filed Critical University of New Brunswick
Publication of WO2025050223A1


Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B 13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators

Definitions

  • System identification techniques can identify these uncertainties through testing before the system is put into normal operation, but this option is not always available and cannot identify time-varying uncertainties.
  • Adaptive control techniques can identify uncertain parameters while directing a system toward a desired objective, but cannot determine what actions would result in measurements with more information about the uncertainties, and are therefore passively adaptive.
  • Actively adaptive control techniques estimate the reductions in uncertainty that will result from their control actions and probe the system to identify the uncertain parameters to a sufficient level such that the desired goal is optimized. This actively adaptive control is known as dual control, as the controls are chosen to learn about the uncertainties and to reduce the cost.
  • Robust approaches consider the set of possible values that the parameters could take and can account for the uncertainties to limit their impact on achieving the control goals (e.g., [39]).
  • Stochastic approaches consider the probability density function of the uncertain parameters to account for parameter realizations with the higher chances of being true (e.g., [6]).
  • Adaptive approaches continuously update estimates of the uncertain parameters using measurements from the system and control the system as if these estimates are the true values of the parameters (e.g., [15]).
  • Dual approaches consider what actions could be taken to improve the information in future measurements in such a way that the resulting reduction in future costs is greater than the cost of these probing actions (e.g., [47]).
  • Adaptive control methods are passively adaptive, in that they only consider changes to parameter uncertainties due to past measurements.
  • Dual control, on the other hand, is an actively adaptive control method, in that it actively modifies its control actions to seek out future measurements that will reduce parameter uncertainties.
  • Dual control has three features: it is probing, cautious, and selective [28]. Dual control is probing in that it modifies its control signals to obtain more information-rich measurements, it is cautious in that it will tend to make smaller control actions when uncertainties are high, and it is selective in that it will only attempt to identify parameters that impact the system’s performance.
  • An analogy for dual control that has been used is driving a new car from location A to location B [32].
  • Implicit dual controllers, on the other hand, approximate the Bellman equations in such a way that the reduction of uncertainty due to future measurements is estimated, which generally comes at the cost of higher computational effort compared to explicit methods [22].
  • With the implicit approach, the relative importance of the dual features and the other objectives does not have to be quantified, as they are linked: probing for increased parameter information is only done if it will reduce future costs.
  • implicit methods can give better results than explicit methods [7].
  • Bayard and Schumitzky [7] use an iteration in policy space algorithm that combines particle filtering with Monte Carlo simulations to estimate the cost-to-go and iteratively improve on a given control policy but is limited to control inputs that only take on two discrete values.
  • Sehr and Bitmead [53] approximate the system dynamics as a Partially Observable Markov Decision Processes, allowing the dual stochastic MPC problem to be solved explicitly for small and medium-sized problems.
  • Thangavel et al. [60] and Hanssen and Foss [20] take a multi-stage approach to implicit dual control and consider a branching network of scenarios, where each branch represents the system’s predicted response for a single realization of the uncertain parameters.
  • the optimization problem that is solved is to determine the control actions (over the control horizon) which minimize the sum of the costs of the scenarios over the prediction horizon.
  • the reduction in the uncertainties due to future measurements is estimated for each time step in a “robust” horizon, and this reduction is reflected in the selection of the discrete uncertainty values upon which the subsequent scenarios are based.
  • the robust horizon is usually chosen to be less than the control horizon, and the uncertainties after that point are assumed to be equal to the nominal value, no longer incorporating the dual features.
  • the existing implicit dual approaches are limited by Bellman’s curse of dimensionality as they do not take a derivative-based approach in continuous state space.
  • Optimal Control In optimal control, the control actions applied to a dynamic system over a period of time are determined by solving an optimization problem in which a given objective function is minimized [63].
  • the objective function is chosen to cause the system to demonstrate a particular behaviour, such as following a particular state trajectory or minimizing energy usage.
  • Objective, or cost, functions can have terms that impose costs at each point in time, known as stage costs, or terms that only impose costs at the final time, known as terminal costs. Constraints can also be imposed on the states and/or controls in the form of inequalities or equalities [63].
  • a common example of an optimal control problem is finding the cheapest connections to fly to a destination [63].
  • the state x represents the cities
  • the control u represents the choices of which flights to take
  • cost(x, u) represents the cost of the plane ticket
  • next(x, u) represents the city where the flight u from city x lands.
  • the optimal total cost is the lowest cost-to-go at the initial time step, and selecting the flight choices that achieve it provides the optimal control trajectory.
  • this approach means that the problem can be solved by considering flights to a single city at a time and keeping running totals of the flight costs. The cost of flights to the final destination can be recorded first, giving the cost-to-go (to the final destination) from each of those cities. The cost of flights to those second-to-last destinations from other cities can then be added to their respective cost-to-go functions, giving the cost-to-go from those cities.
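As a concrete illustration of this backward recursion, the following sketch computes the cost-to-go for a small, hypothetical flight network; the city names, fares, and dictionary layout are illustrative only and do not appear in the present disclosure.

```python
# Backward dynamic programming over a hypothetical flight network.
# flights[x] lists (next(x, u), cost(x, u)) pairs for each available flight u.
flights = {
    "A": [("B", 120.0), ("C", 90.0)],
    "B": [("D", 80.0)],
    "C": [("B", 40.0), ("D", 200.0)],
}
destination = "D"

cost_to_go = {destination: 0.0}   # cost-to-go from the final destination is zero
best_flight = {}

# Visit cities in reverse order of travel (the network is assumed acyclic),
# keeping a running total of fares, exactly as described above.
for city in ["B", "C", "A"]:
    options = [(fare + cost_to_go[nxt], nxt)
               for nxt, fare in flights[city] if nxt in cost_to_go]
    cost_to_go[city], best_flight[city] = min(options)

print(cost_to_go["A"])   # cheapest total fare from the initial city
print(best_flight)       # the optimal flight choice from each city
```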
  • This control approach is only locally-optimal due to it being a trajectory-based approach [35].
  • iLQR considers deviations around an initial nominal control trajectory ū; the resulting solution can only be optimal within that region of control space, and no claims of global optimality can be made.
  • iLQR can also be applied in a moving horizon setting where feedback is used to improve performance by accounting for inaccuracies due to the linearization process.
  • the algorithm is initialized with a nominal control trajectory as described above that could be a vector of zeros, randomly generated, or any other form of seed.
  • the iLQR algorithm is then run for a given finite time horizon, resulting in an improved control policy.
  • the control policy is then applied to the true system for a single time step, after which a state measurement from the system is obtained.
  • the length of the time horizon and the improved control policy can then be adjusted and used to run the iLQR algorithm. This process can be repeated for a problem with a finite time horizon until the final time step is reached, or continuously for problems with infinite time horizons.
  • iLQG Iterative Linear Quadratic Gaussian (iLQG) extends iLQR to nonlinear systems that are stochastic and do not have quadratic costs.
  • the cost function is “quadratized” about the nominal state-control trajectory in a similar way that the system dynamics are linearized (non-quadratic cost functions can be handled in the same way for iLQR). Since the states are uncertain, the measurement dynamics are also included in the iLQG algorithm, and a filter is required to estimate the states’ mean values and covariances.
  • the forward integration of the system dynamics (to obtain the nominal state trajectory from the nominal control trajectory) and the calculations of the derivatives required for the linearization of the system dynamics, as well as the “quadratization” of the cost function, are grouped together in what is known as a forward pass.
  • the next step in the algorithm is the estimator, which uses the noisy measurements to infer the value of the states and their covariances.
  • a backward pass is required to calculate a quadratic approximation to the cost-to-go function, and then the optimal control deviations can be found.
  • iLQG gives locally-optimal solutions due to its trajectory-based approach, which also allows it to be very efficient.
  • the algorithm can be initialized multiple times with different nominal control trajectories, known as seeds, to search a larger control space. These seeds may converge to different local minima, and a selection criterion can be defined to select a single solution from this set of solutions.
  • when H has negative eigenvalues, the approximation of the cost-to-go function may become negative, even though the true cost-to-go is always non-negative.
  • the first option [57] is to set H̃ = H + μI, where μ is a regularization parameter.
  • the second option is to regularize H and G through the quadratic cost-to-go terms ([57]), with the amount of regularization chosen based on min(eig(H)), the minimum eigenvalue of H.
  • a fourth option [37] is to set H̃ = V D̃ Vᵀ, (2.59) where [V, D] = eig(H) is the eigenvalue decomposition of H and the elements in the diagonal matrix D that are less than μ are replaced with μ before computing H̃.
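A minimal sketch of this eigenvalue-clamping option, assuming NumPy and a symmetric H (the threshold name mu is illustrative):

```python
import numpy as np

def clamp_eigenvalues(H, mu=1e-6):
    """Rebuild a symmetric H after replacing eigenvalues below mu with mu,
    keeping the quadratic cost-to-go approximation positive definite."""
    D, V = np.linalg.eigh(H)      # eigenvalue decomposition [V, D] = eig(H)
    D = np.maximum(D, mu)         # clamp eigenvalues that fall below mu
    return V @ np.diag(D) @ V.T   # regularized H
```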
  • Tassa et al. present a quadratic modification schedule for the regularization term in [57].
  • when control constraints are introduced through a squashing function s(u), cost terms should be imposed on both u and s(u).
  • the other option to introduce control constraints is by solving the quadratic program of minimizing the cost-to-go approximation from equation (2.34), subject to the box control constraints bmin and bmax.
  • the quadratic optimization problem to be solved is the minimization, over the control deviation δu, of the cost-to-go approximation, (2.68) subject to bmin ≤ u + δu ≤ bmax. (2.69)
  • This approach directly solves for the sequence of control actions that minimize the approximation of the total cost, by solving a sequence of these problems in a backward pass.
  • the gain matrices lk and Lk are the final result, such that the improved policy for the control deviations can be determined as shown in equation (2.52).
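As a sketch of one such box-constrained subproblem, a generic bounded optimizer can stand in for a specialized QP solver; the gradient g and Hessian H of the cost-to-go approximation with respect to the control deviation are assumed to be available from the backward pass.

```python
import numpy as np
from scipy.optimize import minimize

def box_constrained_step(g, H, u, b_min, b_max):
    """Minimize q(du) = g.du + 0.5 du'H du subject to b_min <= u + du <= b_max."""
    q = lambda du: g @ du + 0.5 * du @ (H @ du)
    jac = lambda du: g + H @ du
    # Bounds on du follow from the box constraints on u + du.
    bounds = [(lo - ui, hi - ui) for ui, lo, hi in zip(u, b_min, b_max)]
    res = minimize(q, np.zeros_like(u), jac=jac, bounds=bounds)
    return res.x  # improved control deviation for this time step
```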
  • this optimization process can result in control deviations that are arbitrarily large and can therefore send the state trajectory outside of the region where the linear approximation to the system dynamics is reasonably accurate.
  • a line search is performed to sequentially reduce the improved control deviations until a solution is found that is estimated to cause a reduction in the total cost.
  • This locally-linear policy is determined by performing a forward pass through the estimated system dynamics, where α is a backtracking search parameter that is set to 1 and then sequentially reduced. If the line search fails to find a reduced cost solution, the regularization parameter is increased as shown in equations (2.62) and (2.63), and the control policy backward pass and line search forward pass are repeated until the algorithm converges to a locally optimal control policy.
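A minimal sketch of this backtracking line search, with hypothetical rollout and total_cost helpers standing in for the forward pass and cost evaluation:

```python
def line_search(x_nom, u_nom, l, L, rollout, total_cost,
                alpha=1.0, beta=0.5, max_tries=10):
    """Scale the feedforward term by alpha until a forward pass through the
    estimated dynamics yields a trajectory with a reduced total cost."""
    J_nom = total_cost(x_nom, u_nom)
    for _ in range(max_tries):
        # Locally-linear policy: u_k = u_nom_k + alpha*l_k + L_k(x_k - x_nom_k)
        x_new, u_new = rollout(x_nom, u_nom, l, L, alpha)
        if total_cost(x_new, u_new) < J_nom:
            return x_new, u_new          # accept the reduced-cost trajectory
        alpha *= beta                    # shrink the step and retry
    return None  # failure: the caller increases the regularization parameter
```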
  • Stopping criteria Four stopping criteria are used to define convergence and control how long this iterative algorithm runs. First, if the gradient of the control deviations is less than a predefined threshold, the algorithm terminates.
  • Moving horizon implementation iLQG can be implemented with a moving horizon approach as described for iLQR, but since there is noise in the system dynamics and the measurements, a filter may be used to obtain an estimate of the states. A system diagram for this approach is shown in FIG.2.
  • the iLQG block implements the iterative iLQG algorithm, which is referred to as the inner loop 110, while the outer loop 120 contains the control signal being sent to the true system 130 for a single time step, the state dynamics being sampled with a sampling period of T, as well as the measurement 140 and filtering steps.
  • the inner loop 110 accepts, as input, a nominal control trajectory and the current state and parameter values, and generates a new control policy after executing the forward and backward pass operations.
  • the new control policy enables the generation of an updated control trajectory for use as the nominal control trajectory for future iterations of the inner loop.
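A schematic of this inner/outer loop structure, with hypothetical ilqg_inner, apply_control, measure, and filter_update helpers (these names do not come from the disclosure):

```python
def closed_loop_ilqg(x_hat, Sigma, u_nom, n_steps, ilqg_inner,
                     apply_control, measure, filter_update):
    """Moving-horizon iLQG: the inner loop refines the control policy; the
    outer loop applies one control step, measures, and filters the state."""
    for _ in range(n_steps):
        policy, u_nom = ilqg_inner(x_hat, Sigma, u_nom)  # inner loop (110)
        u = policy(x_hat)                    # first action of the new policy
        apply_control(u)                     # sent to the true system (130)
        y = measure()                        # noisy measurement (140)
        x_hat, Sigma = filter_update(x_hat, Sigma, u, y)  # filtering step
        u_nom = u_nom[1:]                    # shift the horizon by one step
    return x_hat, Sigma
```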
  • Adaptive Control is the control of systems with uncertain parameters that are constant or time-varying. This uncertainty can arise when the cost or complexity of accurately measuring the parameters is high, or when the control scheme is to be applied to multiple similar systems that have different values for the parameters [15].
  • adaptive control methods are divided into four categories: gain scheduling, model- reference adaptive control, self-tuning adaptive control, and dual control.
  • the idea behind gain scheduling is to change the controller or its parameters based on pre-defined conditions.
  • implementing gain scheduling requires a series of local controllers, each tuned for its own operating range, together with a method of switching between them.
  • the selection of the local controller can be seen as an adaptive process.
  • a model reference adaptive controller adjusts the estimates of a model’s parameters such that the tracking error of the plant converges to zero.
  • the control and adaptation laws are coupled, and these can be derived using Lyapunov theory such that stability can be shown.
  • the plant parameters are recursively estimated from the input-output data and then used by a controller as if they were the true plant parameters.
  • Using the estimated value as if it were the true value is often called the certainty equivalence principle, allowing the design of the estimation process and controller to be independent, unlike in model-reference adaptive control.
  • in self-tuning adaptive control there are also no guarantees of parameter convergence without sufficient richness, and the independent design of the estimator and the controller makes the stability of the system harder to prove [55].
  • Dual control is theoretically the ideal adaptive control method [28], and one of the major differences is that dual control takes into account the fact that uncertainties will be reduced in the future. This consideration of future information allows for a controller that can probe the plant and make control actions to reduce the parameter uncertainty in the future in a cautious way and focus on the most relevant parameters.
  • Dual Control Bayard et al. divide stochastic control policies into three classes: open-loop, feedback, and closed-loop [7]. Open-loop control policies do not use any process measurements and therefore no learning occurs. Feedback control policies use all measurements up to the current time step, and therefore learning can occur, but the learning is passive since the data generated is only due to performing the control task.
  • Closed-loop control policies also use all measurements up to the current time step, but additionally, they anticipate that future measurements are going to be made in the future. Taking future measurements into account to determine the current control action allows planned, or active, learning to take place.
  • these stochastic control methods are known as dual controllers. Due to the curse of dimensionality, Bellman’s equations cannot be efficiently solved for problems of arbitrary size, and therefore the explicit and implicit approximations of dual control are needed for its practical implementation.
  • the explicit approximation involves modifying the cost function to elicit one or more of the dual behaviours of probing, caution, and selectiveness.
  • the implicit approximation on the other hand elicits these dual behaviours through a method that allows the control algorithm to consider the impact of probing actions on future costs.
  • Adding terms to the cost function with the explicit approximation requires an explicit trade-off between the control objectives and the system identification, while with the implicit approximation, the controller can balance these dual objectives without additional information. While it is simple to add extra terms to the cost function to elicit the dual features of caution, probing, and/or selectiveness, this approach fixes the value of system information relative to the minimization of the rest of the cost function. Even with methods that vary the value of system information based on a specific measure, explicit approaches are likely to overvalue or undervalue system information at different points in time compared to implicit approaches.
  • Bar-Shalom and Tse’s work is extended in [28] by estimating the system dynamics using parametric, Gaussian process, and neural network regression. Like Bar-Shalom and Tse’s work on which it is based, this control method is explicitly dual and only applicable to additive noise.
  • One application shows how dual NMPC can be applied to control the climate of a building, and this example is also detailed in [27].
  • Bar-Shalom and Tse also refer to time-scale separation and reformulating this implicit dual NMPC method without the dynamic programming portion to make it solely NMPC-based.
  • Implicit dual control As with the work on explicit dual control, many of the implicit dual control publications have been based on MPC, but several have taken other approaches. These works also make different assumptions on the uncertain parameters and use different estimation approaches. Multi-stage NMPC is used as an implicit approximation to dual control in [60], where the unknown system parameters were assumed to be bounded, parametric, and time-invariant. Here, the Fisher information matrix is used to estimate the future reduction in uncertainties, but the least squares estimate of the uncertainties is assumed to be constant.
  • the parameter sensitivities are included in the scenario tree in the same way as the states, allowing the controller to make decisions based on probing for information on specific parameters. More involved estimation schemes for the uncertainties and their bounds are mentioned, including guaranteed parameter estimation.
  • This dual control method is then applied to the control of a simulated chemical batch reactor.
  • This work was extended in [59], and the assumption that the least squares estimate has converged to the true value of the uncertainty is removed. This assumption is replaced by an over-approximation factor to estimate the future changes in the point estimation.
  • guaranteed parameter estimation is mentioned as a possible future extension of this work, but was not explored. This approach is used as a basis of comparison throughout this work and is therefore explained in more detail in the following section.
  • Multi-stage NMPC considers a branching network of scenarios, where each branch represents the system’s predicted response for a single realization of the uncertainties.
  • This scenario tree is illustrated in FIG.3, where x represents the system states, u represents the control actions, and d represents the discrete realizations of the uncertainties with superscripts indicating the scenario number and subscripts indicating the time index into the future.
  • control actions that share a node are equal, and this requirement is known as the non-anticipatory constraint.
  • the reduction in the uncertainties due to future measurements is estimated for each time step in a robust horizon, and this reduction is reflected in the selection of the uncertainty values upon which the subsequent scenarios are based.
  • the robust horizon is usually chosen to be less than the control horizon, and the uncertainties after that point are assumed to be equal to the nominal value.
  • the optimization problem that is solved is to determine the control actions (over the control horizon) that minimize the sum of the costs of the scenarios over the prediction horizon. To ensure that none of the scenarios are excited more than necessary, the selected control action is cautious when uncertainties are high.
  • Probing is also introduced as small control actions that may increase a scenario’s cost initially but can lead to larger reductions in future costs associated with the uncertainty.
  • selectiveness is introduced, as probing actions that have a higher “return on investment” will be prioritized, and therefore the most important uncertainties will be reduced.
  • the multi-stage NMPC method relies on predicting the future reductions in uncertainties for given control actions, and even when representing uncertainties by a discrete set of realizations, the number of scenarios grows exponentially and must be limited, as illustrated below. The selection of the discrete realizations of the uncertainties impacts the controller’s robustness and computational effort. For linear systems, using the minimum, maximum, and nominal values of the uncertainties (assuming they are bounded) keeps the number of scenarios per branch low and can be shown to be “usually” robust [60].
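The exponential growth can be made concrete: with r discrete realizations per uncertain branch over a robust horizon of N_r steps, the tree holds r to the power N_r scenarios (the variable names here are illustrative):

```python
# Scenario count in a multi-stage NMPC tree: r realizations branch at each
# of the N_r steps of the robust horizon.
r, N_r = 3, 4                # e.g., {min, nominal, max} over 4 steps
n_scenarios = r ** N_r       # 81 scenarios before any pruning or limiting
print(n_scenarios)
```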
  • SUMMARY Systems and methods are provided for determining control actions for controlling a stochastic system via an implicit dual control using a computer processor and associated memory encoded with an augmented mathematical model of dynamics of the system, the augmented mathematical model characterizing the dynamics of the states and uncertain parameters of the system.
  • a stochastic differential-dynamic-programming-based algorithm is employed to process an augmented state data structure characterizing the states of the stochastic system and the one or more uncertain parameters, and an augmented state covariance data structure comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the uncertain parameters, according to the augmented mathematical model and a cost function, in which the uncertain parameters are treated as additional states subject to the augmented mathematical model, to determine a control policy for reducing cost through implicitly generated dual features of probing, caution, and selectiveness.
  • a computer-implemented method of determining control actions for controlling a stochastic system according to an implicit dual controller, the method comprising: providing control and processing circuitry comprising at least one processor and associated memory, the memory comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model being based on a mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the augmented mathematical model being augmented to include dynamics of the one or more uncertain parameters; storing, in the memory, an initialized form of an augmented state data structure, the augmented state data structure being provided as a first data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; storing, in the memory, via the processor, an initialized form of an augmented state covariance data structure, the augmented state covariance data structure …
  • the mathematical model is configured such that all of the states are observable, and wherein, when performing step d), the one or more uncertain parameters are updated via a filter.
  • at least one of the states is unobservable, and wherein, when performing step d), a filter is configured to employ the augmented mathematical model to estimate updated values of unknown states and the one or more uncertain parameters by processing the augmented state data structure and the augmented state covariance data structure.
  • step b) is performed after performing steps c) and d), such that the control policy is determined based on newly determined states.
  • the control action is autonomously applied to the stochastic system for at least one time step.
  • the control action is not applied to the stochastic system for at least one time step.
  • the control and processing circuitry is encoded with the augmented mathematical model such that the augmented mathematical model is characterized by multiplicative noise.
  • the control and processing circuitry is encoded with the augmented mathematical model such that at least one of the uncertain parameters is modeled as a time-dependent parameter.
  • the control and processing circuitry is encoded to employ a moving control horizon.
  • the mathematical model is obtained by data- driven modeling.
  • the mathematical model is obtained by regression-based data-driven modeling.
  • the augmented stochastic differential- dynamic-programming-based algorithm is based on an algorithm selected from the group consisting of the iterative linear quadratic Gaussian (iLQG) algorithm, the stochastic dual dynamic programming (SDDP) algorithm, and variations thereof.
  • the augmented stochastic differential- dynamic-programming-based algorithm is based on an algorithm selected from the group consisting of dual dynamic programming, the sequential linear quadratic (SLQ) algorithm, and the iterative linear quadratic regulator (iLQR) algorithm, and variations thereof, all being modified to include stochastic terms.
  • the stochastic system is an industrial system for producing or refining a product.
  • At least one of the one or more uncertain parameters may be a composition of a feedstock, and wherein the control action comprises controlling an input rate of the feedstock.
  • the industrial system may be an anaerobic digestion system and the feedstock is an organic feedstock suitable for digestion by …
  • the stochastic system comprises a population, wherein the states comprise a plurality of infection status states of the population, wherein the mathematical model simulates spread and dynamics of an infectious disease among the population, and wherein the control policy is configured to determine, at least in part, a severity of public policy actions for containing spread of the infectious disease, and wherein the one or more uncertain parameters comprise a minimum rate of infection when a maximum severity of public policy is applied.
  • the stochastic system is an autonomous vehicle, and where at least one uncertain parameter is associated with an uncertainty caused by an impact of an environment on dynamics of the autonomous vehicle.
  • the one or more uncertain parameters may comprise at least one of a friction coefficient and a drag coefficient having uncertainty due to external environmental conditions.
  • the stochastic system is an individual undergoing rehabilitation, wherein the states characterize at least one of participation, activities, health condition, body functions and structures, environmental factors, and personal factors, and wherein the one or more uncertain parameters comprise gains and time-constants involving interactions between the states in response to rehabilitation control actions.
  • the stochastic system is a wearable robotic system, and wherein at least one uncertain parameter is tunable on a per-user basis.
  • the stochastic system is an industrial system, and wherein at least one uncertain parameter is associated with degradation of the industrial system, and wherein the method further comprises employing updated values of the at least one uncertain parameter and/or its updated uncertainty, obtained during control of the industrial system, to detect a fault associated with degradation of the industrial system.
  • the stochastic system is a building climate control system, and wherein the one or more uncertain parameters comprise at least one of an uncertain parameter associated with an external factor and an uncertain parameter characterizing a building-specific factor.
  • a computer-implemented method of determining control actions for controlling a stochastic system according to an adaptive controller, the method comprising: providing control and processing circuitry comprising at least one processor and associated memory, the memory comprising a mathematical model of dynamics of the stochastic system, the mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the memory also comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model comprising the mathematical model augmented to include dynamics of the one or more uncertain parameters; storing, in the memory, an initialized form of a state data structure, the state data structure being provided as a first data array of data elements characterizing the states of the stochastic system; storing, in the memory, an initialized form of a state covariance data structure, the state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system …
  • the stochastic differential-dynamic- programming-based algorithm is based on an algorithm selected from the group consisting of the iterative linear quadratic Gaussian (iLQG) algorithm, the stochastic dual dynamic programming (SDDP) algorithm, and variations thereof.
  • the stochastic differential-dynamic- programming-based algorithm is based on an algorithm selected from the group consisting of dual dynamic programming, the sequential linear quadratic (SLQ) algorithm, and the iterative linear quadratic regulator (iLQR) algorithm, and variations thereof, all being modified to include stochastic terms.
  • an implicit dual controller for controlling a stochastic system
  • the implicit dual controller comprising: control and processing circuitry comprising at least one processor and associated memory, the memory comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model based on a mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the augmented mathematical model augmented to include dynamics of the one or more uncertain parameters; the memory further comprising: an initialized form of an augmented state data structure, the augmented state data structure being provided as a first data array of data elements characterizing the states of the stochastic system and the one or more uncertain parameters, thereby grouping the states and the one or more uncertain parameters within the augmented state data structure; an initialized form of an augmented state covariance data structure, the augmented state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system …
  • an adaptive controller for controlling a stochastic system
  • the adaptive controller comprising: control and processing circuitry comprising at least one processor and associated memory, the memory comprising a mathematical model of dynamics of the stochastic system, the mathematical model including states of the stochastic system that dynamically evolve over time and a set of parameters characterizing dynamic evolution of the stochastic system, the set of parameters comprising one or more uncertain parameters, the memory also comprising an augmented mathematical model of dynamics of the stochastic system, the augmented mathematical model comprising the mathematical model augmented to include dynamics of the one or more uncertain parameters; the memory further comprising: an initialized form of a state data structure, the state data structure being provided as a first data array of data elements characterizing the states of the stochastic system; an initialized form of a state covariance data structure, the state covariance data structure being provided as a second data array of data elements comprising a covariance matrix of the states of the stochastic system; an initialized form of an augmented state data structure …
  • a system comprising: a stochastic physical subsystem; one or more sensors associated with said stochastic physical subsystem for measuring an output associated with said stochastic physical subsystem; and control and processing circuitry operably coupled to said stochastic physical subsystem and said one or more sensors, said control and processing circuitry comprising at least one processor and associated memory, said memory comprising instructions executable by said at least one processor for performing operations comprising: (a) employing an augmented stochastic differential-dynamic-programming-based algorithm to determine control actions for controlling the stochastic physical subsystem, the augmented stochastic differential-dynamic-programming-based algorithm modelling the stochastic system, at least in part, according to a set of states and one or more uncertain parameters, and employing an augmented state vector comprising the states and the one or more uncertain parameters, and an augmented covariance matrix that combines a covariance matrix of the states and a covariance matrix of the one or more uncertain parameters; (b) applying the control actions to the stochastic physical subsystem; (c) …
  • a method of controlling a stochastic physical system according to an implicit dual controller, the stochastic system being modeled, at least in part, according to a set of states and one or more uncertain parameters
  • the method comprising: (a) employing an augmented stochastic differential-dynamic-programming-based algorithm to determine control actions for controlling the stochastic physical system, the augmented stochastic differential-dynamic-programming-based algorithm employing an augmented state vector comprising the states and the one or more uncertain parameters, and an augmented covariance matrix that combines a covariance matrix of the states and a covariance matrix of the one or more uncertain parameters; (b) applying the control actions to the stochastic physical system; (c) measuring an output of the stochastic physical system; (d) processing the output and the control actions to determine an augmented state vector estimate and an augmented covariance matrix estimate; and (e) repeating steps (a)-(d) to determine control actions for a plurality of time steps, such that each time that step (a) is …
  • FIG.1A shows a comparison of DDP-like controllers.
  • FIG.1B shows a detailed comparison of DDP-like controllers according to their titles.
  • FIG.1C shows a detailed comparison of DDP-like controllers according to their algorithms.
  • FIG.1D shows a detailed comparison of DDP-like controllers according to their treatment of noise.
  • FIG.1E shows a detailed comparison of DDP-like controllers according to their approximations.
  • FIG.1F shows a detailed comparison of DDP-like controllers according to their regularization.
  • FIG.1G shows a detailed comparison of DDP-like controllers according to their constraints.
  • FIG.2 shows a closed-loop iLQG system diagram (terms are defined in the list of variables).
  • FIG.3 shows the tree of scenarios considered in multi-stage NMPC [59].
  • FIGS.4A, 4B and 4C show flow charts for the three variations of iLQG discussed in the present disclosure.
  • FIG.4A is the same as FIG.2 and is shown here for ease of comparison, while FIGS.4B and 4C illustrate example adaptive and implicit dual control methods, respectively. Terms defined in the list of variables.
  • FIG.5A shows a control comparison between dual and adaptive iLQG and MS-SP-NMPC on a linear example.
  • FIG.5B shows a state comparison between dual and adaptive iLQG and MS-SP-NMPC on a linear example.
  • FIG.5C shows a parameter comparison between dual and adaptive iLQG and MS-SP-NMPC on a linear example.
  • FIG.5D shows a cost comparison between dual and adaptive iLQG and MS-SP-NMPC on a linear example.
  • FIG.5E shows a control comparison between dual and adaptive iLQG on a time-varying parameters example.
  • FIG.5F shows a state comparison between dual and adaptive iLQG on the time-varying parameters example.
  • FIG.7C shows a parameter comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller.
  • FIG.7D shows a parameter ratio comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller.
  • FIG.7E shows a cost comparison between dual iLQG with and without the system’s multiplicative noise being compensated for in the controller.
  • FIG.8A shows a model-reference adaptive control flowchart for Rohr’s example [55].
  • FIG.8B shows a control comparison between dual and adaptive iLQG and model-reference adaptive control on an unmodelled dynamics example.
  • FIG.8C shows a control comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for the first seven seconds.
  • FIG.8D shows a state comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example.
  • FIG.8E shows a state comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for the first seven seconds.
  • FIG.8F shows a parameter comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example.
  • FIG.8G shows a parameter comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for the first seven seconds.
  • FIG.8H shows a parameter ratio comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example.
  • FIG.8I shows a parameter ratio comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for the first seven seconds.
  • FIG.8J shows an output comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example.
  • FIG.8K shows an output comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for the first seven seconds.
  • FIG.8L shows a cost comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example.
  • FIG.8M shows a cost comparison between dual and adaptive iLQG and model-reference adaptive control on the unmodelled dynamics example for the first seven seconds.
  • FIG. 9A shows an example implicit dual iLQG control system.
  • FIG. 9B shows an example implicit dual iLQG control system with a joint estimation filter.
  • FIG. 9C shows an example implicit dual iLQG control system with dual estimation filters.
  • FIG. 9D shows an example implicit dual system employed for a system with fully observable states.
  • FIG. 9E illustrates how, within the inner loop of the method, the model structure and parameters are provided to the iLQG algorithm.
  • FIG. 9F shows various example approaches to model generation.
  • FIG. 10 shows a Venn Diagram schematically illustrating relationships between different types of differential dynamic programming-based algorithms.
  • FIG.11 shows an example system with an implicit dual controller.
  • FIG.12 shows parameters for the example AM2 model.
  • FIG.13A shows a control comparison between dual and adaptive iLQG and dual MS-SP-NMPC on an AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.13B shows a state comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.13C shows a parameter comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.13D shows a biogas production comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.13E shows a cost comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the AM2 anaerobic digestion model with two uncertain parameters.
  • FIG.14 shows parameter values for the AM2 comparison with seventeen uncertainties.
  • FIG.15A shows a control comparison between dual and adaptive iLQG on an AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.15B shows a state comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.15C shows a parameter comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.15D shows a biogas production comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.15E shows a cost comparison between dual and adaptive iLQG on the AM2 anaerobic digestion model with seventeen uncertain parameters.
  • FIG.16 shows a SIR model compartmental diagram [42].
  • FIG.17A shows a SIDARTHE model compartmental diagram [29].
  • FIG.17B shows parameters for the SIDARTHE model.
  • FIG.18 shows a partial compartmental diagram for considering the impact of an overwhelmed ICU [29].
  • FIG.19A shows a control comparison between dual and adaptive iLQG and dual MS-SP-NMPC on a SIDARTHE COVID-19 model with two uncertain parameters.
  • FIG.19B shows a state comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the SIDARTHE COVID-19 model with two uncertain parameters.
  • FIG.19C shows a parameter comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the SIDARTHE COVID-19 model with two uncertain parameters.
  • FIG.19D shows a cost comparison between dual and adaptive iLQG and dual MS-SP-NMPC on the SIDARTHE COVID-19 model with two uncertain parameters.
  • FIG. 20A shows results from 100 seeded runs of the 2 parameter SIDARTHE COVID-19 model with several rolling horizon lengths.
  • FIG. 20B shows a frequency (%) of final cost of 100 seeded runs of the iLQG algorithms for the 2 parameter SIDARTHE COVID-19 model with a rolling horizon length of 40.
  • FIG. 20C shows a distribution of the simulation time required for 100 seeded runs of the iLQG algorithms.
  • FIG.21 shows parameter values for the SIDARTHE comparison with sixteen uncertainties.
  • FIG.22A shows a control comparison between dual and adaptive iLQG on a modified SIDARTHE COVID-19 model with sixteen uncertain parameters.
  • FIG.22B shows a state comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters.
  • FIG.22C shows a parameter comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters.
  • FIG.22D shows a cost comparison between dual and adaptive iLQG on the modified SIDARTHE COVID-19 model with sixteen uncertain parameters.
  • the terms “comprises” and “comprising” are to be construed as being inclusive and open ended, and not exclusive. Specifically, when used in the specification and claims, the terms “comprises” and “comprising” and variations thereof mean the specified features, steps or components are included. These terms are not to be interpreted to exclude the presence of other features, steps or components.
  • the term “exemplary” means “serving as an example, instance, or illustration,” and should not be construed as preferred or advantageous over other configurations disclosed herein.
  • any specified range or group is as a shorthand way of referring to each and every member of a range or group individually, as well as each and every possible sub-range or sub-group encompassed therein and similarly with respect to any sub-ranges or sub-groups therein. Unless otherwise specified, the present disclosure relates to and explicitly incorporates each and every specific member and combination of sub-ranges or sub-groups.
  • the iterative linear quadratic Gaussian (iLQG) method is a powerful control technique due to its ability to handle nonlinear and stochastic systems with multiplicative noise but has not been extended to handle systems with uncertain parameters in either an adaptive or dual manner.
  • iLQG, which can calculate locally-optimal control policies for nonlinear stochastic systems, is based on continuous state space with the use of derivatives of a linearized system about a nominal control trajectory and is related to Pontryagin’s maximum principle.
  • although iLQG is not a dual or adaptive control algorithm, it can handle nonlinear and stochastic systems with many states through its derivative-based approach in continuous state space.
  • the present inventors realized that by modifying iLQG to treat the uncertain parameters as uncertain states, the resulting adaptive and dual iLQG control algorithm can predict how changes to the inputs and states can result in future reductions in the parameter uncertainty and therefore increase overall performance (lower costs).
  • dual iLQG (and variations thereof employing other stochastic DDP-based algorithms) can identify changes to the inputs that can decrease parameter uncertainty, and although these actions have an associated cost, they decrease the overall cost over the control trajectory.
  • Adaptive and dual iLQG represents a fast (due to the linearization of the system) and feasible (due to working with derivatives about a nominal state-control trajectory) solution to the implicit dual control of small and large systems while avoiding Bellman’s curse of dimensionality. Accordingly, in the present work, an existing derivative-based control method is extended to handle systems with uncertain parameters in either an adaptive or dual manner.
  • Adaptive iLQG To extend iLQG to uncertain systems in an adaptive manner, two changes are made to the closed-loop iLQG approach shown in FIG.4A (and FIG.2). First, the initial estimates of the uncertain parameters are passed to the iLQG inner loop as constants.
  • the augmentation of the state vector with the parameters for the closed-loop filter is an approach that is known as joint simultaneous state and parameter estimation [49].
  • an augmented constants vector cᵃ is created, similar to the augmented state vector, through the concatenation of the constants and the parameters. Moreover, when processing the augmented forms of the state vector and the covariance matrix in the outer-loop filter, an augmented form of the state dynamics is employed, in which the parameter dynamics and the state dynamics are concatenated to form the augmented system dynamics, as shown in equation (3.4).
  • This approach allows adaptive iLQG to update its parameter estimates in the outer loop after getting new measurements and pass them to the inner loop iLQG algorithm as constants, which is known as the certainty equivalence principle.
  • the system dynamics are also adapted in [48] using Locally Weighted Projection Regression.
  • Dual iLQG To extend this adaptive iLQG approach to be dual, the uncertainty associated with the parameters must influence the control policy. Therefore, instead of treating the parameters as constants in the inner loop iLQG algorithm, the parameters are treated as states and an augmented state vector is formed as shown in equation (3.1). An augmented state covariance is also created as shown in equation (3.2).
  • the present dual iLQG method involves the processing of the augmented state dynamics in the inner loop. The inclusion of the parameter dynamics makes dual iLQG able to handle time-varying parameters. These changes are represented in the system diagram in FIG.4C.
  • the augmented state vector is employed when executing the inner loop 110 according to the augmented form of the state dynamics, such that the uncertain parameters are treated as states by the inner loop, with the uncertain parameters governed by the parameter dynamics prescribed by the augmented state dynamics.
  • the iLQG algorithm treats the parameters as unmeasured states and allows the control algorithm to predict how changes to the inputs and states can result in future reductions in the parameter uncertainty, through the backward pass of the cost-to- go function, to lower the total cost of the control trajectory.
  • the parameter uncertainty influences the control actions through the inner loop Kalman filter gain, which impacts the estimates of the augmented state vector and the cost function at each time step.
  • a first augmented data structure is an augmented state data structure, provided in a matrix, for example, a 1-D array (or alternatively in a multi-dimensional array), which is a concatenation of the data of the state vector and the data elements of the uncertain parameter vector, as shown in equation (3.1).
  • a second augmented data structure is an augmented state covariance matrix, which is initialized as a combination of the data elements from the state covariance matrix and data elements from the uncertain parameter covariance matrix in a block diagonal manner, as shown in equation (3.2).
  • This new data structure contains the confidence information for the associated augmented state vector, and when used with the augmented state vector, allows for the states and parameters to be treated as a single stochastic entity in whatever part of the algorithm it is applied.
  • the augmented state dynamics are also encoded, in functional logical form, into the memory of the computer system for processing, in which the state dynamics function and the parameter dynamics function are concatenated, as shown in equation (3.4).
  • This encoded functional form allows the modelled dynamics of the augmented state vector to be calculated as a single dynamic system.
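A minimal sketch of these augmented data structures, assuming NumPy and generic state dynamics f(x, u, d) and parameter dynamics phi(d) (the signatures are illustrative, not the disclosure's notation):

```python
import numpy as np

def augment(x, d, Sigma_x, Sigma_d):
    """Augmented state (eq. (3.1)) and block-diagonal augmented covariance
    (eq. (3.2)) built from the states x and uncertain parameters d."""
    x_a = np.concatenate([x, d])
    Sigma_a = np.block([
        [Sigma_x, np.zeros((len(x), len(d)))],
        [np.zeros((len(d), len(x))), Sigma_d],
    ])
    return x_a, Sigma_a

def f_augmented(x_a, u, f, phi, n_x):
    """Augmented dynamics (eq. (3.4)): the state dynamics f and the parameter
    dynamics phi are concatenated into a single dynamic system (for constant
    parameters, phi returns zeros)."""
    x, d = x_a[:n_x], x_a[n_x:]
    return np.concatenate([f(x, u, d), phi(d)])
```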
  • the augmented state and augmented state covariance data structures are employed in the outer-loop filter of the closed-loop iLQG algorithm to give the algorithm the new ability of being adaptive, thus creating the new adaptive iLQG algorithm.
  • These two data structures allow the outer loop filter to treat the parameters as states, and through the use of the information contained in the measurements from the true system, leads to updated estimates of the augmented state vector and augmented state covariance matrix for the next iteration of the algorithm.
  • these data structures allow the estimates of the uncertain parameters to be updated for each iteration of the algorithm, which can lead the adaptive iLQG algorithm to perform better than the closed-loop iLQG algorithm in terms of the evaluation of the total cost function.
  • although these data structures take up more space in the computer memory than the data elements from the state vector or state covariance matrix alone, the adaptive ability that they create when used in the closed-loop iLQG algorithm can allow the computer to produce a better solution for controlling the physical system than the closed-loop iLQG algorithm. Additionally, the adaptive iLQG algorithm can produce this better solution with a similar number of processor cycles, and therefore can be more computationally efficient than the closed-loop iLQG algorithm.
  • an augmented covariance data structure that includes the covariance matrix of the states and the covariance matrix of the one or more uncertain parameters will include the data elements corresponding to those present in the covariance matrix of the states and those present in the covariance matrix of the one or more uncertain parameters, but need not be provided in a standard covariance matrix form, provided that the data elements of the augmented covariance data structure can be accessed and processed by the computer hardware implementing the method.
  • the augmented state data structure and augmented state covariance matrix data structures are employed in the outer-loop filter and in the inner-loop iLQG algorithm along with the augmented state dynamics function data structure to give the closed-loop iLQG algorithm the new ability of being dual, thus creating the new dual iLQG algorithm.
  • the augmented state data structure and augmented state covariance data structures allow the outer-loop filter to treat the parameters as states, and through the use of the information contained in the measurements from the true system, leads to updated estimates of the augmented state vector and augmented state covariance matrix for the next iteration of the algorithm.
  • the use of the augmented state data structure in the inner-loop iLQG algorithm allows the dual iLQG algorithm to treat the parameters as unmeasured states and allows the dual iLQG algorithm to predict how changes to the control inputs and states can result in future reductions in the parameter uncertainty, through the backward pass of the cost-to-go function, and allow for a lower total cost of the control trajectory.
  • the parameter uncertainty that is encoded in the augmented state covariance data structure influences the control actions through the inner loop Kalman filter gain, which impacts the estimates of the augmented state vector data structure and the cost function at each time step.
  • dual iLQG can identify changes to the inputs that can decrease parameter uncertainties while also decreasing the total cost of the control trajectory.
  • the use of these data structures in this way can lead the dual iLQG algorithm to perform better than the closed-loop iLQG and adaptive iLQG algorithms in terms of the evaluation of the total cost function.
  • These data structures allow for a dual approach that is derivative-based and can handle applications with higher numbers of states and parameters without the common issue of Bellman’s curse of dimensionality that limits the use of conventional implicit dual control algorithms on computer systems.
  • the curse of dimensionality is a phenomenon in which the size of a stochastic computational problem, in terms of memory and/or processor requirements, grows exponentially with a linear increase in the number of states and parameters, as described previously for the case of dual SP-MS-NMPC.
  • the derivative-based approach of the closed-loop iLQG algorithm allows it to search along a nominal control trajectory for where changes to the control trajectory are likely to reduce the total cost.
  • the use of these data structures takes this existing closed-loop iLQG algorithm and significantly improves it by making it dual, allowing the dual iLQG algorithm to use the same computer resources, in terms of memory and processing, to solve larger problems than other implicit dual control approaches.
  • Dual iLQG is implicitly dual through the augmented state and covariance data structures that allow it to identify changes to the control trajectory that can decrease parameter uncertainty, through the use of derivatives of the cost function. Although these control changes may have an associated cost, they decrease the overall cost of the control trajectory.
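• For illustration only, the following non-limiting Python sketch shows one possible way to form the augmented state vector and augmented covariance data structures described above; the function and variable names (augment, x_hat, P_x, theta_hat, P_theta) are illustrative assumptions and do not appear in the original disclosure.

```python
import numpy as np

def augment(x_hat, P_x, theta_hat, P_theta):
    """Form the augmented state vector and augmented covariance matrix.

    x_hat:     (n,)   state estimate
    P_x:       (n, n) state covariance
    theta_hat: (p,)   uncertain-parameter estimate
    P_theta:   (p, p) parameter covariance
    """
    # Concatenate states and parameters into one augmented state vector.
    xa_hat = np.concatenate([x_hat, theta_hat])
    # Combine the covariances block-diagonally; the state/parameter
    # cross-covariance blocks are initialized to zero here.
    n, p = len(x_hat), len(theta_hat)
    Pa = np.block([
        [P_x,              np.zeros((n, p))],
        [np.zeros((p, n)), P_theta],
    ])
    return xa_hat, Pa
```

• In this sketch the cross-covariance between states and parameters is initialized to zero; subsequent filter updates would populate those blocks as correlations are learned from measurements.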
  • a seeding method may be incorporated to initialize the optimization problem with different sets of initial conditions.
  • the nominal control trajectory is the variable that is used to initialize the optimization and is iteratively improved upon through successive runs of the inner loop iLQG algorithm [35].
• Although this seeding process increases the performance of the dual iLQG algorithm, it linearly increases the computation time for each time step in which it is used (a non-limiting sketch of the seeding procedure is provided below).
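• As a non-limiting illustration of the seeding method, the following Python sketch draws several random nominal control trajectories and retains the lowest-cost inner-loop solution. The run_ilqg callable is an assumed placeholder for the inner-loop solve, and the default scale corresponds to the zero-mean, 0.01-variance seeds used in the simulations described below.

```python
import numpy as np

def seeded_ilqg(run_ilqg, horizon, n_controls, n_seeds=100, scale=0.1, rng=None):
    """Run the inner-loop iLQG optimization from several random initial
    control trajectories (seeds) and keep the lowest-cost solution.

    run_ilqg(u_seed) is assumed to return (u_traj, policy, cost).
    """
    if rng is None:
        rng = np.random.default_rng()
    best = None
    for _ in range(n_seeds):
        # Each seed is a small random nominal control trajectory
        # (zero mean; scale=0.1 gives a variance of 0.01 per element).
        u_seed = scale * rng.standard_normal((horizon, n_controls))
        u_traj, policy, cost = run_ilqg(u_seed)
        if best is None or cost < best[2]:
            best = (u_traj, policy, cost)
    return best
```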
Moving horizon approaches
• When performing example implementations of the adaptive and implicit dual control methods, two different example moving horizon options were coded into the dual iLQG algorithm: shrinking and rolling horizon approaches. In the shrinking horizon approach, the controller simulates the system and solves for the control actions for the entire time horizon, from the present time step to the final time step.
  • the time horizon is reduced by a single time step and the inner loop of the algorithm is reinitialized. This process repeats until the final time step is reached, meaning that the calculation effort for each time step decreases as the algorithm progresses.
  • the rolling horizon approach has a fixed time horizon, such that after each inner loop of the algorithm is complete, the next outer loop iteration is initialized by shifting this fixed time horizon one step forward, dropping the present time step and adding a new time step on the end. Once the final time step is reached, although the controls and system responses have been determined for future time steps, they are not considered in the results.
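• The two horizon options can be summarized in the following minimal Python sketch; the function and argument names are illustrative only and are not part of the original disclosure.

```python
def next_horizon(t, t_final, window, mode="shrinking"):
    """Return the (start, end) time-step indices for the next inner-loop run.

    'shrinking': optimize from the present step to the fixed final step,
                 so the horizon shortens by one step each outer iteration.
    'rolling':   optimize over a fixed-length window that shifts forward
                 by one step each outer iteration.
    """
    if mode == "shrinking":
        return t, t_final
    elif mode == "rolling":
        return t, t + window
    raise ValueError(f"unknown horizon mode: {mode}")
```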
• the true value of the constant parameter vector is [125, 50], while its initial estimate is [100, …] with a covariance matrix of [8100, 4500; 4500, 5625].
• This system is simulated over 6 time steps that are 0.05 seconds long. Both states are measured with a noise scaling factor of 10^-2, and as the MS-SP-MPC does not consider the states to be uncertain, the noise scaling factors for dual iLQG were set to 10^-15 for both the state and parameter dynamics.
  • the cost function can be minimized by maintaining u2 at zero and driving x1 to 5 using u1, as u2 does not influence x1.
• a dual control method, on the other hand, will temporarily use u2 to reduce the uncertainty associated with d2, even though it will incur a cost, to reduce the overall cost.
• Adaptive MS-SP-MPC, dual MS-SP-MPC, and dual iLQG were compared using this system, and the resulting controls are shown in FIG. 5A. Starting with u2, it can be seen that the adaptive MS-SP-MPC maintains u2 at zero as expected.
• the dual MS-SP-MPC has a non-zero value of u2 for the first time step only, while the magnitude of u2 for the dual iLQG controller gradually steps down to zero over four time steps.
  • the dual iLQG algorithm took an average of 3.5 seconds to run, while the adaptive iLQG algorithm took 0.4 seconds, the adaptive MS-SP-MPC algorithm took 3.0 seconds and the dual MS-SP-MPC algorithm took 75 seconds.
• the dual iLQG controller significantly outperformed dual MS-SP-MPC by being less cautious with its initial value for u1 and by continuing to use u2 after the first time step to improve its estimate of the uncertain parameters.
Robustness to time-varying parameters
  • parameters that are modelled as constant may vary over time. The ability of a controller to achieve the desired objective in the face of uncertain parameter dynamics can be described as the controller’s robustness to time-varying parameters.
  • the previous linear example is modified to have time-varying parameters to explore dual iLQG’s robustness to time-varying parameters.
• the system dynamics are the same as shown in equation (3.5), the initial state and parameter estimates are also the same, and both the states and the controls remain unconstrained.
• the initial true value of the parameters is the same as before, but the true parameters vary according to the time-varying dynamics given in equation (3.8), while the controller believes that the parameters remain constant over time.
• although the time step of 0.05 seconds was kept the same as in the previous example, the system was simulated over twenty time steps instead of six to show the impact of the time-varying parameters.
  • FIG.5G shows each iLQG algorithm’s estimates of the time-varying parameters.
  • the dual controller quickly converged to the true parameter values, while the adaptive iLQG controller took a couple of extra time steps to converge to the true value of the second parameter, and had a small offset error in its estimate of the first parameter.
  • the dual controller’s estimates for the parameters were better than the adaptive controller’s estimates, and both controllers were better able to estimate the second parameter.
  • FIG.5H shows each iLQG algorithm’s cumulative cost over the control horizon. Although the dual iLQG controller has a higher cost after the first time step, it subsequently maintains a lower cumulative cost than the adaptive iLQG algorithm.
  • the dual iLQG algorithm’s higher cost for the first time step is due to its increased distance from the goal value of 5 for the first state and its use of the second control action to probe the system. This probing action allows the dual controller to have a final cost that is 16% lower than the adaptive controller.
  • 100 initial control sequence seeds were run, with each element of each sequence drawn from a normal distribution with a mean of zero and a variance of 0.01. This variance was sufficient to generate a diversity of results for these simulations.
• A histogram of the final costs and a box plot of the simulation times for each of the algorithms are shown in FIGS. 6A and 6B. In FIG. 6A, the expected successive improvement between the three algorithms can be seen.
  • the best iLQG solution has roughly the same value as the median adaptive iLQG solution, and likewise the best adaptive iLQG solution has roughly the same value as the median dual iLQG solution.
  • the variance of the solutions visibly decreases with the change in the algorithm, with dual iLQG giving more consistent results than the other two algorithms.
  • the dual iLQG controller achieved a 16% reduction in the cost function compared to the adaptive controller. This difference in performance was primarily due to dual iLQG’s increased ability to determine the uncertain parameters, although both dual and adaptive iLQG showed robustness to time-varying parameters in this example.
• the consistent improvement of dual iLQG over adaptive iLQG, and of adaptive iLQG over iLQG, was shown, as was the fact that they have comparable run times.
Importance of compensating for multiplicative noise
• Dual iLQG can handle multiplicative noise, whereas Bar-Shalom and Tse’s wide-sense dual control is limited to additive noise and MS-SP-NMPC does not consider process noise, only measurement noise.
  • multiplicative noise appears in many applications such as the control of prostheses using myoelectric signals [54], robot motion planning when using sensors with distance-dependent errors [14], stochastic fluid dynamics [12], and power grids where renewable energy represents a significant portion of total power generation [19].
  • the importance of compensating for multiplicative noise in the controller was demonstrated using the simple system dynamics shown in equation (3.5) with multiplicative noise.
  • dual iLQG was used twice, but in one case the multiplicative noise was accounted for in the controller, and in the other case it was not. This approach was used to keep all other variables except the one in question the same and is equivalent to using a controller that can only deal with additive noise when interacting with systems that have multiplicative noise.
  • the controller that is compensating for the multiplicative noise uses a smaller first u2 control action than the other controller, before making a larger control action in the second step. This action allows the compensating controller to gain some information on the system before having additional noise from a larger control step.
• the controller that is not compensating for the multiplicative noise takes the opposite strategy in the first two time steps. The remaining u2 and u1 results are relatively similar between the controllers.
  • FIG.7B shows the effect of the controls on the states, where the influence of the objective function being the square of the distance between the first state and a value of five can be seen.
  • the non-compensating controller overshoots to 5.45 at the end of the first time step, while the compensating controller overshoots to 5.31.
  • the non-compensating controller has an x 1 value of 5.31, while the compensating controller has a value of 5.03.
• the non-compensating controller then takes three more time steps to get lower than 5.03, while the compensating controller stays within ±0.04 of five.
  • a plot of the estimates of the two uncertain parameters is shown in FIG.7C. Both controllers start to converge to estimates lower than the true value of the parameters, the non- compensating controller from above and the compensating controller from below. As with the unmodelled dynamics example, in this case it is not the absolute parameter values that are important for the controller, but the ratio of the parameter values.
  • FIG.7D shows the ratio of the parameters for the two controllers over the simulation.
  • FIG.7E shows the cumulative cost of these controllers over the simulation period.
• the major changes in cost occur in two time steps, with the compensating controller’s cost leveling out after the first time step and the non-compensating controller’s cost leveling out after the third time step.
  • the controller that is compensating for the multiplicative noise has a total cost that is 62.1% lower than the controller that does not compensate for the multiplicative noise.
  • This section demonstrates the importance of using a dual controller that can properly compensate for multiplicative noise in systems where multiplicative noise exists. In the simple example presented, the difference in cost was significant at 62.1%.
• Rohr’s example is used to illustrate the effect of unmodelled dynamics, showing that, unless a dead zone is used, the given adaptation law causes the first-order system to become unstable when driven to a constant reference in the face of sinusoidal measurement noise.
  • This simple example is used here to demonstrate dual iLQG’s robustness to unmodelled dynamics in this case.
• Rohr’s example considers a desired performance that is described by a reference system that is a first-order system with a transfer function of $k_m/(s + a_m)$ (3.11), where $k_m$ and $a_m$ are parameters both with a value of 3.
• the true system is a third-order system with a transfer function of $\frac{k_p}{s + a_p} \cdot \frac{229}{s^2 + 30s + 229}$, where $k_p$ and $a_p$ are the true system parameters with values of 2 and 1, respectively.
  • model-reference adaptive control is used to control this system and is implemented as shown in FIG.8A.
  • model-reference adaptive control the desired system behaviour is specified through the use of a reference model. The controller then attempts to make the true system’s output equal to the output from the reference model by adjusting the input to the true system based on a set of known regressors and adapting parameters.
• model-reference adaptive control adapts a set of compensating parameters to account for the true system’s behaviour and impose the desired behaviour onto the system. This is different from the dual iLQG approach, which adapts its estimates of the true system’s parameters to a sufficient level to minimize the desired cost function.
• a control law of $u = a_r r + a_y y$ (3.17) was used, where r is the input reference, y is the output of the true system after the addition of noise, and the a terms are the corresponding adaptation parameters.
• the measurement noise scaling factors were 0.5 and the noise associated with the state and parameter dynamics was 10^-15.
  • the sinusoidal noise from the problem was implemented on the true system, which the algorithms had no knowledge of other than the 0.5 noise scaling factor.
  • FIGS.8D and 8E show the true and estimated states for all three controllers over the entire 70 seconds and for just the first seven seconds, respectively.
  • the response to the high-frequency control actions from the iLQG controllers can be seen in the true dynamics for the first two states, while their estimates remained at zero with increasing uncertainty because there were no dynamics for those states in the modelled system and they were unmeasured.
  • the true state response for the third state also reflected the high-frequency input but was very small compared to the estimated state values due to the 229 scaling factor in the true system output.
• FIGS. 8F and 8G show that, for the iLQG controllers, after an initial adaptation period the estimates of the parameters remained largely constant and no significant parameter drift occurred, although the estimates of the second parameter did decrease slightly over the 70-second simulation.
• the ratio of the parameter estimates can be plotted, as shown in FIG. 8H for the entire time horizon, and for the first seven seconds in FIG. 8I.
• FIG. 8I shows that dual iLQG had a better estimate of this ratio for the first 1.5 seconds, after which adaptive iLQG was largely better.
  • FIG.8H also more clearly shows that the slight drifting of the second parameter in FIG.8F resulted in the ratio of the parameter estimates from both controllers nearly converging to the true value by the end of the 70 second simulation.
• Model-reference adaptive control oscillates around the true parameter ratio, but over time diverges as the parameters drift.
  • FIGS. 8J and 8K show the full and initial plots of the modelled and true system output.
  • the modelled system output is shown to approach 2.01, very close to the desired value of 2, while the true system output approached 1.91. Since the modelled system had practically reached the desired output reference, the controllers’ input stopped changing and the true system’s output also stopped changing, but not necessarily at the desired output reference due to the unmodelled dynamics and measurement noise.
  • Model- reference adaptive control oscillates around the desired output of 2 but diverges quickly 60 seconds into the simulation. The stage and total costs for the adaptive and dual iLQG algorithms are shown in FIGS.
• Dual iLQG will be applied to a complex biochemical system, anaerobic digestion, and a complex public policy control problem, the spread of COVID-19 in a population.
  • the focus of this disclosure is the control of systems with uncertain parameters in such a way that the reduction of uncertainty is implicit in the minimization of a given cost function.
  • These dual goals of system identification and cost minimization are often at odds with each other, and this tension creates a set of three features that characterize dual control.
• Dual control demonstrates caution, minimizing the magnitude of control actions when uncertainties are high; probing, varying the control actions to gain information about the uncertain parameters; and selectiveness, only seeking to gain information on those parameters that are likely to cause a reduction in future costs [28].
• the optimization problem in MS-SP-NMPC grows exponentially with an increasing number of uncertain parameters, and its current implementation can only handle two uncertain parameters.
  • This issue of the problem size increasing exponentially with the number of states is known as Bellman’s curse of dimensionality [63], and this limits existing implicit dual methods as they do not take a derivative-based approach in continuous state space.
• the dual iLQG methods presented in this disclosure fill this gap by extending the derivative-based iLQG method to be implicitly dual. Both the dual and adaptive iLQG methods presented were demonstrated to be robust to time-varying parameters and unmodelled dynamics, and can handle multiplicative noise.
• Dual iLQG is applicable to linear and nonlinear control problems with many uncertain parameters, as shown in its application to the control of anaerobic digestion and COVID-19 in the Examples provided below. Since MS-SP-NMPC is only able to handle systems with two uncertain parameters, simplified versions of these systems were used to compare dual and adaptive iLQG with MS-SP-NMPC, and dual iLQG outperformed MS-SP-NMPC in all but one case. When the systems with all of the uncertain parameters were used, dual iLQG consistently outperformed adaptive iLQG.
Example iLQG implementation
Selected Aspects of Example iLQG Implementation
Some aspects of the Example iLQG implementation are as follows:
1. A high-speed adaptive iLQG algorithm was developed that estimates individual parameters of a system instead of the entire dynamic model of the system. The parameters were treated as constants in the iLQG algorithm through the creation of an augmented constants vector, but treated as states in the outer loop filter through the creation of an augmented state vector and augmented state covariance matrix.
2. Bellman’s “curse of dimensionality” that limits conventional implicit dual control algorithms was overcome by extending the iLQG control algorithm to be applicable to dual control problems.
Dual iLQG Algorithms
• While the preceding section of the present disclosure disclosed an example implementation of a dual iLQG algorithm based on the configuration shown in FIG. 4C, it will be understood that the specific implementation shown in FIG. 4C is not intended to be limiting, and that different configurations of dual iLQG-based algorithms that employ augmented states and an augmented covariance matrix may be implemented without departing from the intended scope of the present disclosure. Furthermore, as discussed in additional detail below, the present dual implicit augmented iLQG algorithms may be adapted according to a wide variety of stochastic differential-dynamic-programming-based algorithms.
  • the present section contemplates some example and non-limiting implementations involving different configurations involving an iLQG-based algorithm.
  • the states are augmented to generate an augmented state vector that includes the uncertain parameter(s), such that each uncertain parameter is treated as a state by the iLQG algorithm.
  • the state covariance matrix is augmented with the covariance matrix of the uncertain parameter(s) to form an augmented covariance matrix that is employed by the iLQG algorithm.
  • FIG.9A illustrates an example closed-loop dual iLQG algorithm similar to that shown in FIG.4C.
  • the state and parameter estimates are concatenated into an augmented state vector and the covariances are likewise combined in a block diagonal fashion.
• the figure shows the closed-loop, dual augmented implementation of the iLQG algorithm, comprising an outer loop of the dual iLQG algorithm and an iLQG box that represents an inner loop.
• the iLQG box, which represents the inner loop of the algorithm and employs the augmented states and covariance matrices, iteratively converges to an updated control trajectory and policy that contains the dual features of caution, probing, and selectiveness.
• This is the core of the algorithm; the outer closed-loop implementation shown in the figure, combined with the augmentation, provides the implicit dual functionality of the overall algorithm.
  • the core iLQG algorithm may be implemented by employing the equations in the iLQG section of the present disclosure, but using the augmented state and covariance matrix.
• the algorithm, when implemented, may include the following steps, as previously described: (i) performing a forward pass; (ii) simulating the system dynamics using the given control trajectory; (iii) obtaining state, control, and cost derivatives at each time step, which are used to linearize the system; (iv) employing an inner loop estimator (filter); (v) performing a backward pass; (vi) approximating the cost-to-go function at each time step; (vii) obtaining a control policy for each time step; (viii) obtaining a new control trajectory; (ix) applying the control policy to the state estimates for each time step to obtain control deviations; and (x) adding these control deviations to the previous control trajectory. A non-limiting skeleton of these steps is sketched below.
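• In the following non-limiting Python skeleton of steps (i)-(x), forward_pass and backward_pass are assumed helper callables standing in for the full iLQG equations referenced above; they are not part of the original disclosure.

```python
import numpy as np

def ilqg_inner_loop(x0_aug, P0_aug, u_traj, forward_pass, backward_pass,
                    n_iters=50):
    """Skeleton of one inner-loop iLQG solve on the augmented system,
    following steps (i)-(x) described above.
    """
    policy = None
    for _ in range(n_iters):
        # (i)-(iv): simulate the augmented dynamics along the current
        # control trajectory, gather the state/control/cost derivatives
        # used to linearize the system, and run the inner-loop filter.
        traj, derivs, x_devs = forward_pass(x0_aug, P0_aug, u_traj)
        # (v)-(vii): backward pass approximating the cost-to-go at each
        # time step, yielding an affine policy (k_t, K_t) per step.
        policy = backward_pass(traj, derivs)
        # (viii)-(x): apply the policy to the state estimates to obtain
        # control deviations, and add them to the previous trajectory.
        du = np.stack([k + K @ dx for (k, K), dx in zip(policy, x_devs)])
        u_traj = u_traj + du
    return u_traj, policy
```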
• a zero-order hold may be employed to convert the control trajectory from being discrete to continuous in time.
  • the first control action may then be sent to the true system.
  • the new state of the system is sampled and measured, and the measurement is then filtered, utilizing other system information, to create an updated estimate of the augmented state vector and its covariance, as shown in the outer loop portion of the figure.
  • the control trajectory horizon is then updated, and after a time delay, the iLQG algorithm is run again to determine the control action for the next time step.
  • the outer loop filter(s) can be any algorithm that can provide updated estimates of the states and parameters and their covariances given at least a set of measurements and prior estimates of the states and parameters and their covariances.
  • this filtering would be performed with some type of recursive Bayesian estimator such as a Kalman or particle filter, particularly one suited to nonlinear systems.
  • These filters typically rely on inputs including the dynamic equations, the measurement function, and noise estimates and covariances along with the measurements and prior estimates of the states and parameters and their covariances to provide updated estimates of the states and parameters and their covariances.
  • a single filter could be used, known as joint filtering, or separate filters could be used, known as dual filtering.
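• As a non-limiting sketch of a joint filtering update on the augmented state, the following Python function performs a single Kalman-style measurement update. The measurement function h, Jacobian H, and noise covariance R are assumed inputs; as noted above, a practical implementation for a nonlinear system would typically use an Extended or Sigma Point Kalman Filter or a particle filter instead.

```python
import numpy as np

def joint_filter_update(xa_pred, Pa_pred, y, h, H, R):
    """One joint (Kalman-style) measurement update on the augmented state.

    xa_pred, Pa_pred: predicted augmented state estimate and covariance
    y: measurement vector
    h: measurement function, h(xa) -> predicted measurement
    H: measurement Jacobian evaluated at xa_pred
    R: measurement-noise covariance
    """
    S = H @ Pa_pred @ H.T + R                 # innovation covariance
    K = Pa_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    xa_new = xa_pred + K @ (y - h(xa_pred))   # corrected augmented state
    Pa_new = (np.eye(len(xa_pred)) - K @ H) @ Pa_pred
    return xa_new, Pa_new
```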
  • FIG.9B is the same as FIG.9A other than the fact that it explicitly states that a joint filter is being used.
  • the augmented state vector can be separated before filtering and the updated estimates can be concatenated after the filtering.
  • FIGS.9A and 9B show example implementations in which a single filter is employed to estimate the augmented state vector and covariance matrix based on the measurements, in other example implementations, the measurements may be filtered using separate filters for the systems states and the one or more uncertain parameters, utilizing other system information. An example implementation of such a system is shown in FIG.9C.
  • FIGS.9A-9C illustrate the use of a moving horizon implementation, where the moving control horizon is updated with each time step in the outer loop.
• the horizon may be updated according to several different approaches, including, but not limited to, shrinking horizon and rolling horizon approaches.
• FIGS. 9A-9C show additional features of example implicit dual iLQG algorithms relative to FIG. 4A, including the covariances of the states and uncertain parameter(s), the augmented covariance matrices, and a loop to show that the control trajectory is reused between iterations.
• the certain parameters, or "constants" c, are also shown as optionally being provided as a function of time.
  • the figures also include dashed lines to show the initialization of the algorithm. Initialization may be performed, for example, with the estimates of the states and parameters and their covariances, certain variables, and a seed control trajectory or policy, as described in the previous section.
• FIGS. 9A-9C do not show all of the inputs provided to the components within the inner and outer loops of the augmented iLQG algorithm.
• additional inputs such as, but not limited to, a dynamic model of the system, a cost function, noise covariances, and a measurement model are not shown in the figures despite being implemented.
  • the filter block also does not show all necessary inputs to the filter.
  • FIGS.9A-9C show non-limiting example cases in which the augmented state vector is not broken up (de-augmented) before each new iteration of the outer loop.
  • the augmentation can be performed according to many different implementations.
  • the uncertain parameters could be the first component in the concatenation of the state and parameter vectors, or the elements of the parameter vector could be interspersed with the elements of the state vector, provided that the correlating covariances are augmented in the same order (but in a block diagonal fashion instead of concatenated).
  • the example implementations shown in FIGS.9A-9C may be well suited for cases in which at least some of the states are observable, with the filter in the outer loop providing estimates of the unobservable states based on the measurements.
• a partially observable system is one in which some of the states can be inferred from measurements, and an unobservable system is one in which none of the states can be inferred from measurements.
  • a system can be fully observable, with all states being determinable from measurements without the need for a filter.
  • a filter can be employed to provide estimates of the one or more uncertain parameters, and the states can optionally be computed from the measurements in the absence of the use of a filter.
• An example implementation of such a system is illustrated in FIG. 9D. As shown in the figure, the new state of the system is sampled and measured, and the measurement is then filtered to create an updated estimate of the one or more uncertain parameters, utilizing other system information. The states are determined by a function of the measurement(s) and then concatenated into the updated augmented state vector and updated augmented covariance matrix.
  • the filter employed in the outer loop to update the estimates of the parameters does not strictly require the use of the state dynamic model.
• if an Extended Kalman Filter were used, it would contain a term that is the derivative of the states with respect to the parameters, which would require the state model, but a Sigma Point Kalman Filter does not require the state dynamics.
  • Gaussian Process regression or other machine-learning techniques can be used to formulate the system dynamics [28, 34].
  • measurements from the system are used to estimate the relation between the change of the states over time and the values of the states, the control inputs, a set of constants, and a set of parameters.
  • This estimate of the system dynamics can then be used in the dual iLQG algorithm in place of a white box dynamic model.
  • FIG.9E illustrates how, within the inner loop 110, the model structure and parameters are provided to the iLQG algorithm.
  • FIG.9F shows various example approaches to model generation, including, for example, data-driven (machine learning approaches), such as regression-based approaches.
  • One non-limiting example of a regression- based approach to model generation is the Sparse Identification of Non-linear Dynamics (SINDy) algorithm.
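• For illustration, a minimal Python sketch of a SINDy-style regression using sequential thresholded least squares is given below; the library callable, threshold, and iteration count are assumptions, and a practical implementation would typically use an established package rather than this sketch.

```python
import numpy as np

def sindy_stlsq(X, Xdot, library, threshold=0.1, n_iters=10):
    """Minimal sparse identification of nonlinear dynamics (SINDy) via
    sequential thresholded least squares.

    X:       (m, n) snapshots of states (and, optionally, controls)
    Xdot:    (m, n_out) estimated time derivatives
    library: callable mapping X -> (m, k) matrix of candidate functions
    Returns a sparse coefficient matrix Xi with Xdot ~= library(X) @ Xi.
    """
    Theta = library(X)
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]
    for _ in range(n_iters):
        small = np.abs(Xi) < threshold          # prune small coefficients
        Xi[small] = 0.0
        for j in range(Xi.shape[1]):            # refit active terms per output
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], Xdot[:, j],
                                             rcond=None)[0]
    return Xi
```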
• the ability to impose state constraints on the system could be improved by combining this work with the augmented Lagrangian method for state constraints in differential dynamic programming [24].
  • This is an established method for imposing state constraints in differential-dynamic-programming-based algorithms.
• Although the adaptive and implicit dual methods described above with reference to FIGS. 4B and 4C and FIGS. 9A-9D show the newly determined control action generated by the execution of the inner loop, this control action may not be applied to the system in some cases.
  • a user may overrule the control action, applying a different control action between successive inner loop executions, or choosing to not apply any control action at all between successive inner loop executions.
  • the control policy from the last completed iteration can be used to calculate the new control action instead of using the second element of the control trajectory from the previous (outer loop) iteration of the algorithm.
  • This updated state estimate could come from filtering a system measurement, allowing for the generation of a new control trajectory.
  • An estimation of the time to measure the system and filter the data to obtain a new state estimate could be employed to determine at what point this use of the previous control policy was necessary.
• the control action may be determined, from the updated control policy that is provided by executing the inner loop, based on the state estimates employed as inputs to the inner loop or, alternatively, based on the updated values of the states determined within the outer loop after performing the measurement of the system output. A non-limiting sketch of evaluating a previously computed policy is provided below.
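• As a non-limiting sketch, the affine time-varying policy from the last completed iteration can be evaluated at the latest augmented state estimate as follows; the names are illustrative assumptions, and the arguments are expected to be NumPy arrays.

```python
def control_from_policy(u_nom, k, K, x_hat, x_nom):
    """Evaluate a previously computed affine iLQG policy at the latest
    augmented state estimate, yielding a control action without
    requiring a fresh inner-loop solve.

    u_nom: nominal control for this time step
    k, K:  feedforward term and feedback gain for this time step
    x_hat: latest (filtered) augmented state estimate
    x_nom: nominal augmented state the policy was linearized about
    """
    return u_nom + k + K @ (x_hat - x_nom)
```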
  • the parameter dynamics employed in the augmented mathematical model of the system dynamics provides a functional description that models how the uncertain parameters are expected to evolve over time.
• This model is a set of equations that could be informed by knowledge of the physical system or generated with the use of data, and it describes the rate of change of each of the uncertain parameters.
• This model is not expected to be a perfect reflection of reality, but higher fidelity models would allow for better results from the algorithm.
  • the parameter dynamics are used to predict the values of the parameters in the future in order to ultimately reduce the cost of the control trajectory, and to update the estimates of the parameters in the present time step using measurements of the true system.
• the parameter dynamics may model one or more uncertain parameters as constants, and in such cases, the modelled dynamics for those parameters would reflect a zero rate of change over time.
• the parameter dynamics may prescribe a constant uncertain parameter whose estimate nonetheless changes as successive iterations and control actions are performed, as the implicit dual method takes control actions that refine the estimate of the uncertain parameter.
  • the preceding example embodiments involving an implicit dual control algorithm each employ an augmented iLQG algorithm as the inner loop of the overall closed-loop algorithm, with an augmented state vector and augmented covariance matrix as described above. It will be understood, however, that a wide variety of alternative algorithms may be employed in the inner loop, provided that the algorithm is based on differential dynamic programming and is adapted for stochastic systems.
  • Such a broader class of algorithms that are augmentable, as per the present method of augmentation of the state vector with uncertain parameters, and the augmentation of the state covariance matrix with the uncertain parameter covariance matrix, are henceforth referred to as stochastic differential-dynamic-programming-based (DDP-based) algorithms.
  • this broad class of augmented algorithms are referred to as augmented stochastic differential-dynamic-programming-based algorithms.
  • stochastic differential-dynamic-programming-based algorithms that can be augmented, and employed in a closed loop control structure as described above, to obtain an implicit dual controller, include iLQG and stochastic DDP (SDDP), and variations thereof.
  • Variations of other differential-dynamic-programming-based algorithms that are not stochastic can be made to incorporate stochastic terms such that they can be augmented, and employed in a closed loop control structure as described above, and include DDP, the Sequential Linear Quadratic (SLQ) algorithm and the Iterative Linear Quadratic Regulator (iLQR) algorithm, and variations thereof.
  • the iLQG inner loop algorithm in FIGS.9A-9D may be substituted with another type of stochastic differential-dynamic-programming-based algorithm, such as SDDP and a stochastic variation of iLQR, to obtain another implementation of an implicit dual controller.
• SLQ was developed as a variation in the 2000s that does not use the exact Hessian in step (ii), whereas the DDP algorithm uses the exact Hessian when calculating B in step (ii).
  • the iLQR algorithm is another variation that was independently developed in the 2000s.
  • the difference between iLQR and DDP is that in step (iv) a nonlinear system can be used in iLQR, whereas in DDP the linearized version is used. All three of these techniques are similar, with each having minor differences that can affect efficiency, depending on the application.
  • the state dynamics are augmented with the parameter dynamics to create the augmented state dynamics, in the same manner in which the state vector is augmented with the uncertain parameter vector to create the augmented state vector, and in which the state covariance is augmented with the uncertain parameter covariance matrix to obtain the augmented covariance matrix.
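• A non-limiting Python sketch of this dynamics augmentation is shown below; by default the parameter dynamics are taken to be zero (constant parameters), consistent with the discussion of parameter dynamics above. The helper and argument names are illustrative assumptions.

```python
import numpy as np

def make_augmented_dynamics(f_states, n_states, f_params=None):
    """Build augmented dynamics from separate state and parameter dynamics.

    f_states(x, theta, u, t) -> dx/dt for the original states
    f_params(theta, t) -> dtheta/dt; defaults to zero, i.e. the uncertain
    parameters are modelled as constants
    """
    def f_aug(xa, u, t):
        # Split the augmented state into states and parameters.
        x, theta = xa[:n_states], xa[n_states:]
        dx = f_states(x, theta, u, t)
        dtheta = (f_params(theta, t) if f_params is not None
                  else np.zeros_like(theta))
        # Stack the rates of change back into augmented form.
        return np.concatenate([dx, dtheta])
    return f_aug
```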
  • the system can be a linear system.
  • the present example implicit dual control systems and methods may find useful and beneficial application, relative to conventional control approaches, when applied to linear systems with multiplicative noise.
  • Dual iLQG, and other example implicit dual differential-dynamic-programming- based control algorithms disclosed herein show significant promise as a control approach that can account for and actively reduce the uncertainties that are inherent to systems while improving their performance. Since this approach is so general and requires no training prior to implementation, it is applicable to a wide range of fields including applications that have been identified as high-impact.
• as identified in the 2017 review paper “Systems and Control for the future of civilization, research agenda: Current and future roles, impact and grand challenges” by Lamnabhi-Lagarrigue et al., the present implicit dual algorithms can be applied to inform health care systems, government policy, and financial systems.
  • the review paper by Lamnabhi-Lagarrigue et al. mentions several high-impact system and control applications for the future, and the present implicit dual control algorithms could be applied to many of them, including automotive control, spacecraft control, renewable energy and smart grid, assistive devices for people with disabilities, and advanced building control.
  • dual iLQG was applied to both a government policy problem (COVID-19) and a renewable energy problem (anaerobic digestion).
  • many of the present example implicit dual differential-dynamic- programming-based algorithms can handle multiplicative noise whereas other dual control algorithms such as MS-SP-NMPC and wide-sense dual cannot.
  • multiplicative noise is common, such as in the control of prostheses using myoelectric signals [54], robot motion planning when using sensors with distance-dependent errors [14], stochastic fluid dynamics [12], and power grids where renewable energy represents a significant portion of total power generation [19, 70].
  • Other applications that involve multiplicative noise include financial stochastic volatility models [67], systems involving wireless communication [68], and batch chemical process control [69].
  • the present disclosure provides the closed-loop implicit dual differential-dynamic-programming-based control of physical systems, in which, in some example implementations, signals are transduced from sensors associated with the physical system, transformed into digital signals that are processed to infer states and uncertain parameters associated with the physical system, and to determine, via implicit dual control, suitable control actions for controlling the physical system, thereby providing the control signals to the physical system, such that the control signals are transduced into physical controls that are applied to the physical system, with the physical system being controlled according to the computed control signals.
  • the transformation of these sensor signals into physical changes in the controlled system through closed-loop implicit dual control can thereby improve the performance of the physical system according to a specified objective or cost function.
• the MS-SP-NMPC control problem can quickly become intractable with increasing numbers of uncertain parameters, realizations of each uncertain parameter, and the length of the “robust horizon.”
  • the computer is unable to solve such intractable problems due to the large number of variables and calculations involved.
  • the number and size of the variables can exceed the computer’s memory and the quantity of calculations can push the computer’s processor to its maximum limits, extending the time required to solve the problem. In some cases, this execution time may be so long as to preclude the method’s beneficial application in a real-time physical setting.
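• As a rough, non-limiting illustration of this scaling (the exact growth law depends on the scenario-tree formulation assumed here), a method whose scenario count is the number of realizations per parameter raised to the number of parameters, compounded over the robust horizon, grows exponentially, whereas the size of the augmented state used in the present method grows only linearly with the number of parameters:

```python
# Illustrative-only comparison of problem sizes; the scenario-count
# formula is an assumption for a generic multi-stage scenario tree.
def scenario_count(n_params, n_realizations, robust_horizon):
    return (n_realizations ** n_params) ** robust_horizon

def augmented_state_size(n_states, n_params):
    return n_states + n_params

print(scenario_count(2, 3, 2))      # 81 scenarios for 2 parameters
print(scenario_count(17, 3, 2))     # ~1.7e16 scenarios for 17 parameters
print(augmented_state_size(6, 17))  # 23 augmented states for the same case
```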
  • closed-loop implicit dual differential-dynamic-programming- based control as described in this disclosure overcomes Bellman’s “curse of dimensionality” by taking a derivative-based approach in continuous space.
• This closed-loop implicit dual control method allows the computer to solve dual control problems with large numbers of variables without reaching memory or processor limits, through the extension of a differential-dynamic-programming-based algorithm, which can handle, for example, high-dimensional stochastic nonlinear systems, to be an implicit approximation of dual control.
  • closed-loop implicit dual control as described in this disclosure can avoid the discretization of the control and state space that leads to Bellman’s “curse of dimensionality”. Due to the ability of the closed-loop implicit dual control method to handle larger control problems efficiently, it can be applied to improve a variety of complex physical systems.
  • FIG.11 an example system is shown that includes a controllable subsystem 400 that is controllable via, and operatively coupled to, implicit dual controller 200.
  • the subsystem 400 may include separate control and processing circuity, including components such as those shown in implicit dual controller 200, which may be operatively connected to one or more mechanical or non-mechanical output components of the subsystem 400 (such as one or more motors, actuators, or devices that are responsive to a control signal provided by the implicit dual controller 200 for controlling the controllable subsystem).
  • the system includes one or more sensors 410 for sensing signals suitable to determine, either directly, or to estimate via a filter, the augmented state vector and augmented covariance matrix of the controllable subsystem. The types of sensors employed will depend on the application.
• example sensors can include, but are not limited to, pressure sensors, electrical contact sensors, torque sensors, force sensors, position sensors, current sensors, velocity sensors, and myoelectric sensors.
  • example sensors can include, but are not limited to, pressure sensors, gas sensors, flow sensors, ultrasonic sensors, alcohol sensors, temperature sensors, and humidity sensors.
  • example sensors can include, but are not limited to, voltage sensors, current sensors, light sensors, pressure sensors, rain sensors, temperature sensors, and anemometer sensors.
  • example sensors can include, but are not limited to, epidemiological surveillance, medical data collection, and laboratory data collection.
  • example sensors can include, but are not limited to, market data collection and economic indicators.
  • one or more of the sensors 410 may be integrated with the controllable subsystem 400, i.e. the sensors may be external sensors or internal sensors (e.g. sensors residing on or within the controllable subsystem 400).
  • implicit dual controller 200 may include a processor 210, a memory 215, a system bus 205, a control and data acquisition interface 220 for acquiring sensor data and user input and for sending control commands to the controllable physical subsystem 400, a power source 225, and a plurality of optional additional devices or components such as storage device 230, communications interface 235, display 240, and one or more input/output devices 245.
  • the methods described herein can be partially implemented via hardware logic in processor 210 and partially using the instructions stored in memory 215. Some embodiments may be implemented using processor 210 without additional instructions stored in memory 215. Some embodiments are implemented using the instructions stored in memory 215 for execution by one or more microprocessors.
  • the example methods described herein for controlling a subsystem can be implemented via processor 210 and/or memory 215.
  • the inner loop of the implicit dual control algorithm is executed by the augmented stochastic differential-dynamic-programming-based algorithm module shown at 300, based on the augmented state and augmented covariance data structures 310, based on estimates provided by the filter 320, employing measurements obtained from the sensors 410.
  • the example system shown in the figure is not intended to be limited to the components that may be employed in a given implementation.
  • the implicit dual controller 200 may be provided on a computing device that is mechanically supported by the controllable subsystem 400.
  • the implicit dual controller 200 may be physically separate from the controllable subsystem 400.
  • the implicit dual controller 200 may include a mobile computing device, such as a tablet or smartphone that is connected to a local processing hardware supported by the controllable subsystem 400 via one or more wired or wireless connections.
• a portion of the implicit dual controller 200 may be implemented, at least in part, on a remote computing system that connects to a local processing hardware via a remote network, such that some aspects of the processing are performed remotely (e.g. in the cloud).
• Although FIG. 11 depicts one of each component, any number of each component can be included.
  • a computer typically contains a number of different data storage media.
  • bus 205 is depicted as a single connection between all of the components, it will be appreciated that the bus 205 may represent one or more circuits, devices or communication channels which link two or more of the components.
  • bus 205 often includes or is a motherboard.
  • some example embodiments of the present disclosure can be implemented in fully functioning computers and computer systems, various embodiments are capable of being distributed as a computing product in a variety of forms and are capable of being applied regardless of the particular type of machine or computer readable media used to actually effect the distribution.
  • a computer readable storage medium can be used to store software and data which when executed by a data processing system causes the system to perform various methods.
  • the executable software and data may be stored in various places including for example ROM, volatile RAM, nonvolatile memory and/or cache. Portions of this software and/or data may be stored in any one of these storage devices.
• the phrases “computer readable material” and “computer readable storage medium” refer to all computer-readable media, except for a transitory propagating signal per se.
  • Anaerobic digestion is a chemical process where organic materials are broken down by various types of bacteria in a low-oxygen environment to produce a gas that is commonly known as biogas, which consists primarily of methane and carbon dioxide [9].
  • This biogas has various uses as a fuel, and can, for instance, be used directly for cooking or heating. It can also be compressed into a liquid fuel that is similar to natural gas, or it can be used to generate heat and power with a combined heat and power system [25].
  • biogas can provide consistent power independent of the weather, can be located anywhere there is a consistent supply of organic waste, and the energy can easily be stored for later use.
  • anaerobic digestion represents a controllable renewable energy source that is free of many of the intermittency, storage, and site location issues common with other sources of renewable energy.
• Anaerobic digestion has several features that make it an ideal application for dual control.
  • First of all, anaerobic digestion models have uncertain parameters, and in particular those associated with the dynamics of the bacteria populations are difficult to measure and have a significant impact on the rest of the system dynamics.
  • the composition of the organic feedstocks to the digester is not known precisely. This combination of uncertainties in the feedstock composition, bacteria population dynamics, and the measurements makes this application well-suited for dual control.
• Two models commonly used to represent anaerobic digestion are the Anaerobic Digestion Model No. 1 (ADM1) and the AM2 model.
  • ADM1 is a comprehensive model that has 24 state variables and is commonly used to simulate anaerobic digestion systems, but due to its complexity has limited use for control approaches [26].
  • AM2 on the other hand has six state variables while still capturing the main dynamics of the process and is commonly used for model-based control and parameter estimation [25]. For these reasons, the AM2 model was used for the present example.
• the AM2 model represents the anaerobic digestion process as a set of six ordinary differential equations (a non-limiting reconstruction from the published literature is provided below), where X1 is the concentration of acidogenic bacteria, X2 is the concentration of methanogenic bacteria, S1 is the organic substrate concentration, S2 is the volatile fatty acid concentration, Z is the total alkalinity concentration, C is the total inorganic carbon concentration, and the model parameters are described in FIG. 12 [50].
  • the inputs to this model are the dilution rate, D, along with the inlet concentrations of organic substrate, volatile fatty acids, total alkalinity, and total inorganic carbon, , , , and respectively.
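• Because the model equations were not reproduced above, the following reconstruction of the standard AM2 equations from the published literature is offered for reference only; it may differ in detail from the formulation used in this disclosure (in particular, the three-feedstock variant with controls [qa, qb, qc] used below).

$$
\begin{aligned}
\dot{X}_1 &= \big(\mu_1(S_1) - \alpha D\big) X_1, \\
\dot{X}_2 &= \big(\mu_2(S_2) - \alpha D\big) X_2, \\
\dot{S}_1 &= D\,(S_1^{in} - S_1) - k_1 \mu_1(S_1) X_1, \\
\dot{S}_2 &= D\,(S_2^{in} - S_2) + k_2 \mu_1(S_1) X_1 - k_3 \mu_2(S_2) X_2, \\
\dot{Z} &= D\,(Z^{in} - Z), \\
\dot{C} &= D\,(C^{in} - C) - q_C + k_4 \mu_1(S_1) X_1 + k_5 \mu_2(S_2) X_2,
\end{aligned}
$$

with Monod and Haldane growth kinetics

$$
\mu_1(S_1) = \mu_{1,\max} \frac{S_1}{S_1 + K_{S1}}, \qquad
\mu_2(S_2) = \mu_{2,\max} \frac{S_2}{S_2 + K_{S2} + S_2^2 / K_{I2}},
$$

where $q_C$ denotes the gaseous CO2 transfer rate.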
Two-parameter comparison with dual MS-SP-NMPC
• The true and estimated initial values of the state vector, [X1, X2, S1, S2, Z, C], are both [0.5, 1, 1, 5, 40, 50], and the states are all constrained to be greater than 0.
• the controls, [qa, qb, qc], are all constrained to be between 0.0001 and 1.
• the true value of the uncertain parameter vector, [μ1, ZC], is [1.2, 100], while its initial estimate is [2, 50] with a covariance matrix of diag[0.16, 1111.11].
  • This system is simulated over 70 time steps that are 0.1 days long. Only states [S1, S2, C] are measured with a noise term G of 0.5I3.
• the process noise scaling matrix F for the dual iLQG state and parameter dynamics was set to 10^-15 I to be comparable with the inability of MS-SP-NMPC to handle process noise.
• FIG. 13A shows the three feedstock controls for the AM2 model for each of the three controllers. The dual and adaptive iLQG algorithms took a similar approach, while the dual MS-SP-NMPC algorithm converged to a very different solution: the iLQG algorithms varied all three feedstock controls, whereas the MS-SP-NMPC algorithm focused primarily on the first feedstock control and only momentarily used the other two.
  • the relative performance of these controls could only be determined by their impact on tracking the desired rate of biogas production.
• The plot of the anaerobic digestion states is shown in FIG. 13B, with solid lines for the true state trajectories and markers for the state estimate means and covariances. Similar to the controls, the MS-SP-NMPC algorithm took a different approach to solve this problem, as its state trajectories are significantly different from those of the iLQG algorithms. Additionally, the impact of μ1 being uncertain can be seen in the large covariance associated with S2, even though it was being measured.
  • FIG.13C shows the plot of the parameter estimates, with the true values shown as black lines.
• the MS-SP-NMPC algorithm’s estimate of the first parameter, μ1, was close but not as accurate, and its estimate of the second parameter, ZC, was not accurate at all.
• the reason for the MS-SP-NMPC algorithm’s estimate of ZC being poor was that it hardly used the third control, qC, and therefore had little opportunity to get feedback on ZC.
  • the MS-SP-NMPC algorithm’s strategy of focusing on only one of the feedstock controls was successful, as it was largely able to maintain the biogas production level near the desired flow rate.
• Dual MS-SP-NMPC uses the full nonlinear system dynamics and may have had an advantage in this problem with these initial conditions.
Seventeen-parameter comparison with adaptive iLQG
• To show the ability of dual iLQG to handle many uncertain parameters, it was compared to adaptive iLQG for the AM2 system with seventeen uncertain parameters. Dual MS-SP-NMPC was not included in this comparison as it is unable to handle more than two uncertain parameters.
• the true and estimated initial values of the state vector, [X1, X2, S1, S2, Z, C], are both [0.5, 1, 1, 5, 40, 50], and the states are all constrained to be greater than 0.
  • the controls, [qa, qb, qc], are all constrained to be between 0 and 1.
• This system was simulated over 70 time steps that are 0.1 days long, using the specified true and initial estimated values of the uncertain parameters. Only states [S1, S2, C] were measured, with a noise term of 0.5I.
• the multiplicative noise terms for the dual iLQG state and parameter dynamics were set to 10^-15 I to be comparable with the inability of MS-SP-NMPC to handle noise.
• the cost function was as given for the two-parameter example, shown in equation (4.19).
  • FIG. 15A shows the results for the two iLQG algorithms’ control trajectories. As with the previous example, these results by themselves did not indicate the relative performance of the algorithms as there were no terms in the cost function to explicitly penalize or encourage the use of the feedstock controls.
  • FIG.15B shows the true and estimated values for the anaerobic digestion system states. The two iLQG algorithms had similar values for the first four states, but there was a significant difference for the last two states, as the dual controller had higher total concentrations of alkalinity and inorganic carbon. Looking at the parameter estimates in FIG.15C, the two iLQG algorithms gave similar results for most of the parameters with only a few that approached the true parameter values.
• Non-limiting examples of industrial manufacturing, synthesis or fabrication systems include automated assembly lines (such as automobile assembly lines), additive manufacturing systems, chemical synthesis reactors, injection molding systems, semiconductor fabrication systems, CNC machining systems, metalworking systems, bio-manufacturing systems, electrochemical and thin film deposition systems, and autonomous textile weaving systems.
Example 2: Application to Control of the COVID-19 Outbreak
• The control of the COVID-19 pandemic continues to represent an enormous challenge for governments all over the world.
  • the movement of the population through these states can then be represented graphically as arrows between each compartment (state), and these population flows can then be described with equations based on the states themselves as well as parameters and controls.
  • These parameters generally represent infection and fatality rates for different populations, and the controls are methods of influencing these dynamics.
  • These compartmental models have been tailored to better represent the dynamics of the COVID-19 virus, with different compartments or states being considered by different researchers.
• the SIDARTHE model can be expressed as a set of eight coupled ordinary differential equations (a reconstruction is reproduced below for reference), where the description of the parameters can be found in FIG. 17B.
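• Because the model equations were garbled in the source text, the published SIDARTHE formulation (Giordano et al.) is reproduced below for reference only; note that, as described in the following item, this disclosure additionally makes the $\sigma$ and $\tau$ terms functions of $T$.

$$
\begin{aligned}
\dot{S} &= -S(\alpha I + \beta D + \gamma A + \delta R) \\
\dot{I} &= S(\alpha I + \beta D + \gamma A + \delta R) - (\varepsilon + \zeta + \lambda) I \\
\dot{D} &= \varepsilon I - (\eta + \rho) D \\
\dot{A} &= \zeta I - (\theta + \mu + \kappa) A \\
\dot{R} &= \eta D + \theta A - (\nu + \xi) R \\
\dot{T} &= \mu A + \nu R - (\sigma + \tau) T \\
\dot{H} &= \lambda I + \rho D + \kappa A + \xi R + \sigma T \\
\dot{E} &= \tau T
\end{aligned}
$$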
  • the recovery and mortality rates of the Threatened state are modelled as being dependent on the Threatened state to represent the impact of the health care system being overwhelmed.
  • this effect was achieved in a two-step process, whereby a model was created where the Threatened population was divided into those in the limited-capacity intensive care unit (ICU) and those not, and then this model was simplified to maintain the eight states described above.
• Defining T1 as the Threatened population that does not require ICU treatment and T2 as those that do, and assuming that there are no transfers between these two populations, the σ(T)T and τ(T)T terms can therefore be expressed in terms of T1 and T2 and used in equations (5.9) to (5.11).
SIDARTHE model limitations
• Although the SIDARTHE model can capture the major aspects of the dynamics of COVID-19, there are several limitations to this model. First of all, this model represents the population as static, other than deaths due to COVID-19.
  • SIDARTHE does not include population changes due to travel, births, or non-COVID-related deaths. A more complex model that did include non-COVID-related deaths would also be an interesting application for dual control. Additionally, the SIDARTHE model only represents the public health policies as a single control input, lumping the impact of these policies into a single value representing the severity of the restrictions. Although this makes the implementation of the model much easier, it would be difficult for health agencies to get precise recommendations from such a lumped term. Additionally, this single control action limits the potential probing that a dual control method could implement, as in reality multiple policies can be varied over time.
• the public health policies that were considered were media campaigns (u1), enforcing social distancing and mask use (u2), performing asymptomatic testing (u3), performing symptomatic testing (u4), quarantining of positive cases (u5), increasing non-ICU hospital resources (u6), and increasing ICU resources (u7). Since many of these public health policies influence more than one parameter in the SIDARTHE model to varying levels, effectiveness parameters were introduced. For instance, both media campaigns and enforcing social distancing and mask use will lower the transmission parameters, but the media campaigns may do so less effectively.
Two-parameter comparison with dual MS-SP-NMPC
• Since the dual MS-SP-NMPC code was only able to handle two parameters, it was compared with dual and adaptive iLQG for the original single-control SIDARTHE model.
• the uncertain parameters were chosen to be the minimum values to which the transmission parameters can be driven under full restrictions, as the extent to which the spread of COVID-19 can be reduced by imposing full restrictions is a critical factor in determining the trade-off between reducing cases and the socio-economic impacts of restrictions.
  • the values of the total population and the parameters used in these simulations were taken from [29], where the COVID-19 outbreak in Germany was modelled.
  • the states were constrained to [0, 82999999] and the true and estimated initial values of the state vector were both [82998999, 1000, 0, 0, 0, 0, 0, 0] .
  • the control inputs were constrained to [0, 1].
  • the true value of the constant parameter vector was [0.0422, 0.0422], while its initial estimate was [0.15, 0.1] with a covariance matrix of [0.0025, 0; 0, 0.0011].
• This system was simulated over 40 time steps that were 1.0 days long, with a rolling horizon approach.
• the Diagnosed, Recognized, Threatened, and Extinct states were measured with a noise term of diag([10, 7.5, 5, 2.5]), and as the MS-SP-NMPC did not consider the states to be uncertain, the process noise scaling matrix for the dual iLQG state and parameter dynamics was set to 10^-15 I10 and the noise was not implemented in the true system in the outer loop.
  • This cost function was very similar to the one used in [51] for the suppression of COVID-19, but c_u was increased here to raise the cost of imposing restrictions. Additionally, this is a representative cost function; its values would have to be set by policymakers, but the benefits illustrated in this work are robust to various cost functions. Instead of using a shrinking horizon approach for the SIDARTHE simulations, a rolling horizon approach was used to match how this problem would be approached by governments.
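The setup above can be summarized in a short configuration sketch. The following Python outline is a minimal sketch of the two-parameter rolling-horizon simulation, assuming the numbers quoted above; inner_loop_policy is a hypothetical stand-in for the dual iLQG inner loop, and the outer-loop filter update is indicated only in comments.

```python
import numpy as np

# Configuration taken from the two-parameter comparison described above;
# inner_loop_policy is a hypothetical placeholder, not the actual solver.
x_hat = np.array([82_998_999, 1000, 0, 0, 0, 0, 0, 0], dtype=float)
theta_true = np.array([0.0422, 0.0422])   # true constant parameter vector
theta_hat = np.array([0.15, 0.1])         # initial parameter estimate
P_theta = np.diag([0.0025, 0.0011])       # initial parameter covariance
R = np.diag([10.0, 7.5, 5.0, 2.5])        # measurement noise (D, R, T, E)
n_steps, dt = 40, 1.0                     # forty one-day steps

def inner_loop_policy(x, theta, horizon):
    """Hypothetical stand-in for the augmented stochastic DDP inner loop."""
    return 0.5                            # constant restriction level, illustration only

for k in range(n_steps):                  # rolling-horizon outer loop
    u_k = float(np.clip(inner_loop_policy(x_hat, theta_hat, n_steps), 0.0, 1.0))
    # Here the control would be applied to the true system, the D, R, T, E
    # states measured with noise R, and x_hat, theta_hat, and P_theta updated
    # by the outer-loop filter before the horizon rolls forward one day.
```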
  • FIG.19A shows the controls for the adaptive iLQG, dual iLQG, and dual MS-SP-NMPC algorithms. Both of the iLQG algorithms used the full range of the constrained control, while the MS-SP-NMPC algorithm’s control remained near zero for the entire simulation duration.
  • the dual iLQG controls use a high degree of restrictions early on, with a small variation that likely helped with parameter identification, before tapering off over the 40-day time frame.
  • the effect of these controls on the system states can be seen in FIG.19B, where the near lack of control from the MS-SP-NMPC algorithm is clear from the rapid increase in case numbers, while the iLQG algorithms did better at keeping the case numbers under control, with dual iLQG performing better than adaptive iLQG.
  • the estimation of the two uncertain parameters is shown in FIG.19C.
  • the dual MS-SP-NMPC algorithm’s estimates remained at the initial values, while the iLQG algorithms converged to the true values in a similar manner.
  • both adaptive and dual iLQG are an improvement on iLQG, but where the iLQG and adaptive iLQG solutions are strongly skewed to the left, the dual iLQG solutions do not show this same pattern. It may be that for this application the dual probing actions do not result in the cost reductions that the algorithm determined were probable.
  • the dual iLQG algorithm does find a similar minimum solution as the adaptive algorithm though, and the two have a similar variance. Looking at the simulation time required for these seeded runs in FIG.20C, the results are similar between the three algorithms. Although the dual algorithm would normally be expected to have the longest times, and it does have the largest extreme times in this case, it actually has the lowest median time of the three algorithms.
  • the complexity of the model and the length of the horizons appear to have more of an impact on the simulation times than the differences between the algorithms.
  • Sixteen-parameter comparison with adaptive iLQG The model was expanded from the two-parameter case to sixteen parameters to compare dual and adaptive iLQG.
  • Dual MS-SP-NMPC was not included in this comparison as it is unable to handle more than two uncertain parameters.
  • the modified SIDARTHE model used for this comparison used 5 control inputs, u1 to u5, as described in equations (5.20) to (5.24).
  • the states are constrained to [0, 82999999] and the true and estimated initial values of the state vector are both [82998999, 1000, 0, 0, 0, 0, 0, 0]^T.
  • the 5 control inputs are constrained to [0, 1].
  • the true and initial estimated values of the uncertain parameters used in this simulation are shown in FIG.21. This system is simulated over 30 time steps that are 1.0 day long, with a rolling horizon approach.
  • the Diagnosed, Recognized, Threatened, and Extinct states are measured with a noise term of diag([10, 7.5, 5, 2.5]), and the process noise scaling terms for dual iLQG were set to 10^(-5) for the state dynamics and 10^(-15) for the parameter dynamics, but the noise was not implemented in the true system in the outer loop.
  • the controls resulting from each of the iLQG algorithms are shown in FIG.22A.
  • the dual controller has significantly lower values for the 2nd, 3rd, and 4th controls (enforcing social distancing and mask use, asymptomatic testing, and symptomatic testing) that have higher weights in the cost function, while the 1st and 5th controls are very similar to adaptive iLQG.
  • FIG.22B shows the true and estimated states for each algorithm, along with the covariance of the estimates.
  • the dual controller has only slightly higher case numbers than the adaptive controller. Due to the significant measurement noise and large covariance of the states in this example, the parameter estimates shown in FIG.22C also have large covariances, and the estimates do not converge to the true values in the 30-day time horizon.
  • a control action may be taken by the public health entity that is different from a recommended control action as per a currently determined control policy at a given time step or iteration (which may include taking no action at all), and the method may proceed to the following iteration without application of the recommended control action.
  • Example 3 Autonomous Vehicle Control Autonomous vehicles must be able to operate in uncertain and changing environments. Many of these uncertainties can be expressed as uncertain parameters in stochastic mathematical models of the dynamics of the autonomous vehicle system, and additional information on these parameters can be gained through measurements. With a large number of states and uncertain parameters depending on the type of land, water, or air vehicle, computing control actions for a vehicle in real-time would be challenging for any conventional algorithm.
  • uncertain parameters may include, but would not be limited to, any one or more of the road friction coefficient and tire contact parameters.
  • these parameters may include, but would not be limited to, any one or more of the effectiveness coefficients of the control surfaces or lift, drag, or moment coefficients or other nondimensional coefficients.
  • the mass of the vehicle may also be considered if it is deemed sufficiently uncertain in a given case.
  • the adaptive and dual control methods described above would calculate control policies that would be implemented on the autonomous vehicles with the objective of minimizing the given cost functions. These control policies would inform for example the use of wheel torques, braking, and steering in the case of a land vehicle, or the thrust, control surface position, or ballast tanks in the case of a water vehicle.
  • the example adaptive control or implicit dual control methods described herein, when applied to the present example application of autonomous vehicle control, would employ an encoded augmented mathematical model of the dynamics of the autonomous vehicle.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the autonomous vehicle within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieve convergence faster than a corresponding method absent of augmentation.
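A minimal sketch of the augmentation itself may help. The following Python snippet, written under the assumption of generic dynamics f and a block-diagonal initial covariance, shows how states and uncertain parameters (for example, a road friction coefficient) are stacked so the inner loop can treat the parameters like states; the function names and the toy vehicle model are introduced here for illustration only.

```python
import numpy as np

def augment(x, theta, P_x, P_theta):
    """Stack states and uncertain parameters into one augmented state with a
    block-diagonal augmented covariance, so the inner-loop DDP-based
    algorithm can treat the parameters like additional states."""
    x_aug = np.concatenate([x, theta])
    n, p = len(x), len(theta)
    P_aug = np.zeros((n + p, n + p))
    P_aug[:n, :n] = P_x          # state covariance block
    P_aug[n:, n:] = P_theta      # parameter covariance block
    return x_aug, P_aug

def f_aug(x_aug, u, f, n):
    """Augmented dynamics: states evolve under the model f(x, u, theta);
    constant uncertain parameters obey theta_{k+1} = theta_k."""
    x, theta = x_aug[:n], x_aug[n:]
    return np.concatenate([f(x, u, theta), theta])

# Illustrative use with a toy one-state vehicle model and an uncertain
# friction coefficient (hypothetical, for demonstration only).
f = lambda x, u, th: x + 0.1 * (u - th[0] * x)   # one Euler step
x_aug, P_aug = augment(np.array([1.0]), np.array([0.8]),
                       np.array([[0.01]]), np.array([[0.05]]))
print(f_aug(x_aug, u=0.5, f=f, n=1))
```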
  • Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the autonomous vehicle, and how the information it gains will help it achieve its goal as expressed in the cost function, such that the control policy determined for controlling the dynamics of the autonomous vehicle adapts to uncertain model parameters, such as uncertain and potentially hazardous conditions.
  • a dual control algorithm (implemented according to a method employing an augmented stochastic DDP-based algorithm, as noted above) for an autonomous land vehicle would be cautious in that it would limit large and sudden control actions, probing in that it would make movements to better assess the current road conditions, and it would be selective in that it would prioritize identifying the road conditions over other uncertain parameters depending on the given situation.
  • Example 4 Personalized Healthcare Rehabilitation Understanding the best rehabilitation treatment plan for an individual is difficult for many reasons. A significant one is that existing treatment pathways are built on studies that aggregate across clinics. By the time these studies are published, the pathways that they have analyzed are decades old, no longer representing the entire set of available actions. In addition, these pathways are aggregated rather than personalized.
  • rehabilitation would be deterministic if we modelled the precise quality and quantity of sleep every night, the exact diet, and many other details, but absent these, rehabilitation presents as an extremely stochastic process in which the current best practice is to rely on aggregated data rather than responding to the daily stochastic fluctuations presented by the patient.
  • These unconventional methods employ derivative-based stochastic DDP-based algorithms and involve the processing, by computer hardware, of an augmented state data structure that includes both the states and the uncertain parameters, and an augmented state covariance data structure that characterizes the covariance of both the states and the uncertain parameters, according to an encoded augmented mathematical model of the personalized rehabilitation dynamics, where the uncertain parameters are treated as states.
  • This unconventional approach enables, as described above, the efficient computation of a suitable control policy with improved computational efficiency.
  • a wide range of states can be considered.
  • one or more states may be selected from the taxonomy identified by the World Health Organization (International Classification of Functioning, Disability, and Health: ICF), which includes categories of participation, activities, health condition, body functions and structures, environmental factors, and personal factors. States within the participation domain could include the number of times a person was able to go to work, engage in their favourite sport, or participate in other meaningful activities for them. It could also include states such as quality of life. Measurements of states within this domain include the Short Form 36 and other questionnaires, as well as location data that can be tracked and logged.
  • States within the activity level could include their level of performance in various activities, including walking on level ground, walking up stairs, getting into and out of cars, and other activities pertinent to their intended participation and their health condition. Measurements of these states include validated tests such as the timed up and go (TUG) test or the 6-minute walk test. States related to health condition could include the functional ability to use a wheelchair or exoskeleton, as measured by physiotherapists using a rating scale. States related to body functions and structures would include the location and type of injury. For example, in the domain of spinal cord rehabilitation it would include the level of injury (e.g., T5) and whether it was complete or incomplete, typically measured using the ASIA Impairment Scale.
  • States within the category of environmental factors could include states such as the distance to and accessibility of the various treatment options given their geography, as well as whether clinicians were able to speak their native language and/or translation services were available.
  • personal factors could include states such as the age, height, weight, sex, and ethnicity of the person; the level of motivation of the individual; their level of executive function; and other intrinsic attributes and motivators. These states can be measured, for example, through IQ and EQ tests and other tests of motivation.
  • the relationships between the states, combined with inputs of treatment options, can be dynamically modelled using a combination of data-driven processes and existing dynamical information on available actions.
  • Non-limiting examples of parameters that are uncertain within a mathematical model of personalized rehabilitation include the gains and time-constants between each of the states.
  • uncertain parameters include the gain and time-constant that relate FES training to walking ability, the gain and time-constant that relate FES training to the ability to participate in a specific activity such as golf, and the gain and time-constant that relate FES training to the ability to participate with community members within the golfing community.
  • More generally, the uncertain parameters include the gains and time-constants between each of the actions and each of the states, as well as between each of the states and each other; a minimal first-order sketch follows below.
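As flagged above, the following Python snippet is a hedged sketch of the gain/time-constant structure: a first-order response of a rehabilitation state (for example, walking ability) to a treatment input (for example, an FES training dose). K and tau are the kind of uncertain parameters the dual controller would estimate; all values are illustrative.

```python
def first_order_step(w, u, K, tau, dt=1.0):
    """One Euler step of dw/dt = (K * u - w) / tau, where w is a
    rehabilitation state, u a treatment input, K the gain, and tau the
    time-constant (illustrative units of days)."""
    return w + dt * (K * u - w) / tau

w, K, tau = 0.0, 1.5, 20.0   # illustrative gain and time constant
for day in range(60):        # sixty days of daily training at full dose
    w = first_order_step(w, u=1.0, K=K, tau=tau)
print(round(w, 3))           # approaches the gain K as training continues
```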
  • control actions may include pharmacological treatments (e.g., spasticity-reducing drugs such as Botox), surgical treatments (e.g., tendon release), conventional therapeutic approaches (e.g., stretching or walking), and/or robotic/assistive technological approaches (e.g., exoskeletons, functional electrical stimulation).
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the personalized rehabilitation within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieve convergence faster than a corresponding method absent of augmentation.
  • Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the treatment decisions employed during personalized rehabilitation, and how the information it gains will help it achieve its goal as expressed in the cost function.
  • example methods may include applying the control actions that are determined as the example methods are executed and an improved control policy is determined on each iteration of the method.
  • other example implementations of the methods may be absent of the step of applying the control action – and thus absent of performing a medical treatment or therapeutic intervention – and may only be employed to communicate potential control actions that can be considered, and optionally implemented, by a provider.
  • a control action may be taken by the provider that is different from a control action as per a currently determined control policy at a given time step or iteration (which may include taking no action at all), and the method may proceed to the following iteration without application of the recommended control action.
  • Example 5 Wearable Robotics
  • Wearable robotics include upper- and lower-limb exoskeletons and exosuits, which can either be rigid or soft. The goal of these devices is to augment, assist, or rehabilitate individuals. However, every individual moves slightly differently. Human movement is inherently stochastic due to the biologically stochastic nature of force generation within human muscles, and each person comes up with slightly different control strategies based on the dimensions of their various limbs, their strength and flexibility, and any underlying impairments they may have.
  • One of the largest goals for wearable robotics has been to make life easier for people – typically defined as reducing the metabolic cost for them to do their desired activities.
  • developing efficient personalized tuning settings that reduce metabolic cost has been challenging.
  • These unconventional methods employ derivative-based stochastic DDP-based algorithms and involve the processing, by the computer hardware controlling the wearable robotic system, of an augmented state data structure that includes both the states and the uncertain parameters, and an augmented state covariance data structure that characterizes the covariance of both the states and the uncertain parameters, according to an encoded augmented mathematical model of the dynamics of the wearable robotic system, where the uncertain parameters are treated as states.
  • This unconventional approach enables, as described above, the efficient computation of a suitable control policy that is tailored to the user of the wearable robotic system with improved computational efficiency.
  • Non-limiting examples of states of such a stochastic system may include, for example, relevant states used by the wearable robot, which typically include joint angles, torques, forces, power, and energy, along with metabolic consumption as measured by CO2 expiration.
  • Some wearable exoskeletons also use states including myoelectric activation as recorded using electromyographic electrodes, sonomyography, or other devices.
  • Other devices include states that measure user intent, as recorded using EEG or other sensors.
  • Others enforce state-based impedance regimes in which, depending on which phase of gait a person is in, certain parameters are tuned to determine an impedance and equilibrium position.
  • Others enforce a phase portrait or virtual holonomic constraint.
  • Many models are used in the field, but all of them have tunable parameters that are unique to each individual, and the majority of the models in the field have multiple parameters, making them difficult to tune given how stochastic the user’s signals are and how stochastic and time-delayed the measurement of metabolic activity is.
  • the uncertain parameters employed to define an augmented state data structure and augmented covariance data structure will in general depend on the particular mathematical model used.
  • the uncertain parameters could include, for example, any one or more of the stiffness, damping, and inertia values along with the equilibrium position within each state.
  • the uncertain parameters can include, for example, one or more of the parameters describing a minimum jerk trajectory or other kinematic profile (e.g., according to a model described in US Patent No. 10,213,324, Sensinger and Lenzi, 2019).
  • the uncertain parameters can include, for example, one or more parameters that determine the profile of the phase portrait (e.g., according to a model described in US Patent No. 10,314,723, Sensinger and Gregg, 2019). It will be understood that a wide variety of models and associated parameters may be employed (e.g., another example model is described in US Patent No. 10,799,373, Lenzi and Sensinger, 2020), and in general, the various models each have tunable parameters that are unique to each individual; a minimal impedance-parameter sketch follows below.
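As flagged above, the following Python snippet is a hedged sketch of a per-phase impedance law whose stiffness, damping, and equilibrium position are the kind of uncertain, per-user parameters the augmented controller would estimate. All numerical values are illustrative, not taken from the cited patents.

```python
def impedance_torque(q, q_dot, k, b, q_eq):
    """Joint torque command for one gait phase: a spring-damper law
    tau = -k * (q - q_eq) - b * q_dot, where k (stiffness), b (damping),
    and q_eq (equilibrium angle) are uncertain per-user parameters."""
    return -k * (q - q_eq) - b * q_dot

# Illustrative call: knee angle 0.30 rad, flexing at 0.8 rad/s.
tau_cmd = impedance_torque(q=0.30, q_dot=-0.8, k=120.0, b=4.0, q_eq=0.15)
print(tau_cmd)  # torque (N*m) that would be sent to the actuator this phase
```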
  • Non-limiting examples of control actions include generating joint kinematics (position or velocity profiles or phase portraits), kinetics (producing torques or torque trajectories), and applying the control actions to actuators (e.g. motors).
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the wearable robotic system in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence, computing efficiency, and personalization of the operation of the wearable robotic system.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented model of the dynamics of the wearable robotic system within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieve convergence faster than a corresponding method absent of augmentation.
  • Such an autonomous controller based on dual implicit control enables the anticipation of the learning that will result from a given set of control actions for controlling the wearable robotic system, and how the information it gains will help it control the wearable robotic system in a manner that is tailored to individual user preferences, thereby leading to improved adoption, utilization and clinical efficacy.
  • Example 6 Fault Detection
  • Fault detection applies to industrial systems or subsystems such as electrical systems, electromechanical systems (systems or subsystems that include electrically driven/actuated mechanical components), hydraulic systems, pneumatic systems, thermal systems, and combinations thereof.
  • Such industrial systems or subsystems often face degradation over time, with the degradation causing failure. Having the ability to monitor the health of the system and detect faults indicative of potential failure can prevent expensive repairs, downtime, and loss of life.
  • This unconventional approach enables, as described above, the efficient computation of a suitable control policy that accounts for system degradation with improved computational efficiency. Moreover, such an approach enables the monitoring, during the control of the system, of the uncertainty associated with one or more uncertain parameters, thereby enabling a control method that can facilitate the early detection of system degradation.
  • Mathematical models of industrial systems such as those described above may already exist, or may be generated to characterize the time evolution of system states according to parameters that can include one or more uncertain parameters. It will be understood that the specific states, parameters, and controls would be specific to a given industrial system.
  • Non-limiting examples of industrial systems and associated uncertain parameters include the detection of scaling in boilers (for which example uncertain parameters could include heat transfer efficiency or fouling factor) and DC motor fault detection (for which example uncertain parameters could include motor resistance, friction torque coefficient, and magnetic flux linkage).
  • Additional examples of industrial systems that are controllable for autonomous fault detection in the presence of uncertain model parameters include, but are not limited to, robotic systems, autonomous and non-autonomous vehicles, wind and tidal turbines, and HVAC equipment.
  • the adaptive and dual control methods described above would calculate control policies that would be implemented on the industrial system with the objective of minimizing the given cost functions and also providing the features of fault detection and system health monitoring. The nature of the controls would depend on the specific industrial system to which this control approach was applied.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the industrial system within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to achieve convergence faster than a corresponding method absent of augmentation, and facilitating the monitoring of uncertainty associated with its parameter estimates while the system is controlled during normal operation.
  • the augmented stochastic DDP-based algorithm can be motivated to probe and monitor these parameters so that the overall cost is not impacted.
  • the uncertain parameters and their uncertainties can be employed to facilitate fault detection according to many different implementations.
  • For example, fault criteria such as thresholds can be applied to one or more specific uncertain parameters, defining bounds on their normal operating ranges, and/or to their uncertainty (e.g., variances and/or covariances as determined by the augmented covariance data structure).
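A minimal sketch of such threshold criteria, assuming the DC-motor example above and hypothetical bounds and variance limits, might look as follows.

```python
import numpy as np

def check_faults(theta_hat, P_theta, bounds, var_max):
    """Flag parameters whose estimate leaves its normal operating range or
    whose variance (from the augmented covariance data structure) grows
    beyond a trusted level. Bounds and limits are illustrative."""
    flags = {}
    variances = np.diag(P_theta)
    for i, (name, (lo, hi)) in enumerate(bounds.items()):
        out_of_range = not (lo <= theta_hat[i] <= hi)
        too_uncertain = variances[i] > var_max[name]
        if out_of_range or too_uncertain:
            flags[name] = {"estimate": theta_hat[i], "variance": variances[i]}
    return flags

# Hypothetical DC-motor parameters: resistance (ohm) and friction coefficient.
bounds = {"motor_resistance": (0.9, 1.3), "friction_coeff": (0.0, 0.02)}
var_max = {"motor_resistance": 0.05, "friction_coeff": 0.001}
print(check_faults(np.array([1.45, 0.01]), np.diag([0.01, 0.002]), bounds, var_max))
```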
  • Example 7 Building Climate Control With 12% of global energy usage being used for heating and cooling buildings, building climate control significantly impacts climate change (for example, see González-Torres et al., Energ. Rep. 8, 626-637, 2022).
  • the control of the climate of a building can involve a variety of devices, including all types of HVAC equipment, automatic blinds, lighting, and heat energy storage equipment to maintain a building’s temperature and humidity levels. Sensors for such a system could include temperature, humidity, occupancy, sunlight, and anemometer sensors.
  • Such a control system can also be informed by external factors including weather predictions, the scheduled usage of the building, and time-of-use power pricing, all of which contribute to uncertainty in a mathematical model employed by a controller.
  • MPC-based controllers can maintain the climate of the building by predicting and accounting for the impact of disturbances, but this approach requires a sufficiently accurate model of the building, including numerous parameters that have associated uncertainty.
  • Some of these uncertain parameters involve external factors as noted above, while other uncertain parameters are building-specific and can vary with time. Accordingly, identifying these uncertain parameters and their appropriate respective values would have to be performed for each building for which this control method is used, and this would be an expensive and potentially cost-prohibitive approach. Accordingly, a technical problem exists in the field of building climate control in that existing control methods fail to accommodate, learn and refine uncertain parameters related to external factors and/or building-specific aspects that impact the interior climate of a building.
  • the states involved in a building climate control model can vary depending on climate control implementation, but in some non-limiting examples, can include temperatures and humidities, and the uncertain parameters could consist of thermal conductivities, thermal capacities, radiation heat transfer coefficients, and HVAC equipment efficiencies.
  • Radiation heat transfer coefficients are critical for determining the impact of sunlight on a building’s external and internal temperatures, and fouling of surfaces and glazings can significantly change these values.
  • the efficiencies of many types of HVAC equipment (heat pumps, for example) are dependent on the outside conditions and can vary significantly throughout a single day.
  • time-varying parameters could be included in the dual iLQG augmented state vector so that they could be identified during normal operation of the building climate control system.
  • Mathematical models for building climate control typically describe the rate of change of temperatures and humidities of interest. This is usually done in a lumped approach to simplify the control model and avoid the use of partial differential equations, such as for example considering the entire exterior south face of a building to be a single temperature.
  • the states of the model could include for example the temperature of each external and internal wall of the building, the air temperature of each room in the building, the humidity of each room in the building, and the temperature of any heat storage equipment.
  • the model would relate the rates of change of these states to the values of these states, the thermal properties of the building both known and uncertain, the impact of the control equipment, and the impact of external factors as described above.
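A minimal two-node sketch of such a lumped model, with illustrative (not building-specific) values for the uncertain conductances and capacities, is given below; the node layout and all numbers are assumptions introduced here.

```python
def thermal_step(T_wall, T_room, T_out, q_hvac,
                 UA_out, UA_in, C_wall, C_room, dt=60.0):
    """One Euler step of a two-node lumped RC model:

    dT_wall/dt = (UA_out*(T_out - T_wall) + UA_in*(T_room - T_wall)) / C_wall
    dT_room/dt = (UA_in*(T_wall - T_room) + q_hvac) / C_room

    UA_out, UA_in (W/K) and C_wall, C_room (J/K) are the kind of uncertain,
    building-specific parameters the augmented state vector would carry.
    """
    dT_wall = (UA_out * (T_out - T_wall) + UA_in * (T_room - T_wall)) / C_wall
    dT_room = (UA_in * (T_wall - T_room) + q_hvac) / C_room
    return T_wall + dt * dT_wall, T_room + dt * dT_room

Tw, Tr = 5.0, 18.0
for _ in range(60):  # one hour of 60 s steps with 2 kW of heating
    Tw, Tr = thermal_step(Tw, Tr, T_out=-5.0, q_hvac=2000.0,
                          UA_out=300.0, UA_in=500.0, C_wall=5e6, C_room=2e5)
print(round(Tw, 2), round(Tr, 2))
```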
  • the adaptive and dual control methods described above would calculate control policies that would be implemented with the building climate control equipment with the objective of minimizing the given cost functions. These control policies would inform for example the scheduling of the HVAC equipment, automatic blinds, lighting, and heat energy storage equipment.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the building climate control system in a filter residing in the outer processing loop enables the outer loop filter to provide an updated estimate of the aforementioned uncertain parameters, taking into account their uncertainty, thereby leading to an improvement in convergence and computing efficiency, and enabling the online monitoring of the uncertain parameters’ degree of uncertainty, as per updated values generated by the outer loop filter.
  • the use of an augmented state data structure, the augmented state covariance data structure, and the encoded augmented mathematical model of the dynamics of the building climate control system within the augmented stochastic DDP-based algorithm within the inner processing loop enables the aforementioned uncertain parameters to be treated like states during the computation of the updated control policy, thereby enabling the implicit dual features of probing, caution, and selectiveness to refine uncertain parameters associated with external factors and building-specific aspects during climate control, and to achieve convergence faster than a corresponding method absent of augmentation.
  • Example 8 Smart Grid Regulation A smart grid combines power from many different sources. Each of these sources can have a different impedance (resistance, inductance, and capacitance), and can run at slightly different frequencies and amplitudes, making the grid itself a stochastic system. Ensuring that power from each component of the grid can be used without the entire grid becoming unstable is a challenging problem. Accordingly, a technical challenge exists in regulating smart grids in that the system is stochastic and there are many uncertain parameters that govern the dynamics of its states, making it difficult to ensure stability during operation. This technical problem involving the need to account for uncertain parameters that are associated with potential instability of a smart grid during its regulation can be solved by the adaptive and dual control methods of the present disclosure, as described in further detail below.
  • These unconventional methods employ derivative-based stochastic DDP-based algorithms and involve the processing, by the computer hardware controlling the smart grid, of an augmented state data structure that includes both the states and the uncertain parameters, and an augmented state covariance data structure that characterizes the covariance of both the states and the uncertain parameters, according to an encoded augmented mathematical model of the dynamics of the smart grid, where the uncertain parameters are treated as states.
  • This unconventional approach enables, as described above, the efficient computation of a suitable control policy that accounts for and determines improved estimates of the uncertain parameters during control, with improved computational efficiency, thereby providing a customized control method that can lead to improved stability during regulation of the smart grid.
  • a mathematical model of a smart grid may map the various impedance parameters of each power source within the grid to the net power supply of the grid, and non-limiting examples of states include the voltage amplitude and frequency being produced by each source within the grid.
  • many of the parameters that govern the dynamics of these states are uncertain, due to the stochastic nature of the system and variations among components within systems or within a given system.
  • Non-limiting examples of uncertain parameters include the impedance parameters of each source within the grid, along with the electrical connections that connect them.
  • Examples of control actions that can be taken, when implementing an adaptive or implicit dual control method according to the example embodiments described above or variations thereof, include selecting which sources are connected to the grid, along with the addition or removal of extra impedance to the grid.
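As a hedged illustration of the source-selection action, the following Python sketch disconnects sources whose estimated impedance falls outside a safe band or remains too uncertain according to the augmented covariance; the band, variance limit, and all values are hypothetical.

```python
import numpy as np

def select_sources(Z_hat, P_Z, z_band=(0.5, 2.0), var_max=0.1):
    """Return a boolean connection mask over grid sources: a source stays
    connected only if its estimated impedance lies in the safe band and
    its variance (from the augmented covariance) is below var_max."""
    variances = np.diag(P_Z)
    lo, hi = z_band
    return (Z_hat >= lo) & (Z_hat <= hi) & (variances <= var_max)

Z_hat = np.array([1.2, 0.3, 1.8])   # estimated source impedances (ohm)
P_Z = np.diag([0.01, 0.02, 0.5])    # their estimate covariances
print(select_sources(Z_hat, P_Z))   # -> [ True False False]
```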


Abstract

Systems and methods are provided for determining control actions for controlling a stochastic system via implicit dual control using a computer processor and associated memory encoded with an augmented mathematical model of the dynamics of the system, the augmented mathematical model characterizing the dynamics of the states and of uncertain parameters of the system. A stochastic differential dynamic programming-based algorithm is employed to process an augmented state data structure characterizing the states of the stochastic system and the one or more uncertain parameters, and an augmented state covariance data structure comprising a covariance matrix of the states of the stochastic system and a covariance matrix of the uncertain parameters, according to the augmented mathematical model and a cost function, wherein the uncertain parameters are treated as additional states subject to the augmented mathematical model, to determine a control policy for reducing the cost via implicitly generated dual features of probing, caution, and selectiveness.
PCT/CA2024/051169 2023-09-08 2024-09-09 Implicit dual control for uncertain stochastic systems Pending WO2025050223A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363537243P 2023-09-08 2023-09-08
US63/537,243 2023-09-08

Publications (1)

Publication Number Publication Date
WO2025050223A1 true WO2025050223A1 (fr) 2025-03-13

Family

ID=94922791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2024/051169 Pending WO2025050223A1 (fr) Implicit dual control for uncertain stochastic systems

Country Status (1)

Country Link
WO (1) WO2025050223A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120065758A (zh) * 2025-04-28 2025-05-30 天目山实验室 一种飞行器的自适应控制方法和计算设备

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140278303A1 (en) * 2013-03-15 2014-09-18 Wallace LARIMORE Method and system of dynamic model identification for monitoring and control of dynamic machines with variable structure or variable operation conditions
CA2953385A1 (fr) * 2014-06-30 2016-01-07 Evolving Machine Intelligence Pty Ltd Systeme et procede pour modeliser un comportement de systeme
EP3008528B1 (fr) * 2013-06-14 2020-02-26 Wallace E. Larimore Procédé et système d'identification de modèle dynamique de surveillance et de commande de machines dynamiques à structure variable ou à conditions de mise en oeuvre variables
US20210050116A1 (en) * 2019-07-23 2021-02-18 The Broad Institute, Inc. Health data aggregation and outbreak modeling
US20210373513A1 (en) * 2020-05-29 2021-12-02 Mitsubishi Electric Research Laboratories, Inc. Nonlinear Optimization Method for Stochastic Predictive Control
US20220187793A1 (en) * 2020-12-10 2022-06-16 Mitsubishi Electric Research Laboratories, Inc. Stochastic Model-Predictive Control of Uncertain System
US20230022510A1 (en) * 2021-07-01 2023-01-26 Mitsubishi Electric Research Laboratories, Inc. Stochastic Nonlinear Predictive Controller and Method based on Uncertainty Propagation by Gaussian-assumed Density Filters



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24861429

Country of ref document: EP

Kind code of ref document: A1