
WO2002031606A1 - Expectation sampling and perception control - Google Patents

Expectation sampling and perception control

Info

Publication number
WO2002031606A1
Authority
WO
WIPO (PCT)
Prior art keywords
perception
information
action
expectation
control method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/DK2001/000660
Other languages
English (en)
Inventor
Preben ALSTRØM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Core AS
Original Assignee
Core AS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Core AS filed Critical Core AS
Priority to AU2001295435A priority Critical patent/AU2001295435A1/en
Publication of WO2002031606A1 publication Critical patent/WO2002031606A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only

Definitions

  • the present invention (expectation sampling) samples incoming data in a way that selects the most relevant data and completes missing data based on information extracted from the previous data flow.
  • the invention can advantageously be coupled to a control system and thereby act as an operation system (perception control).
  • Data acquisition is an important element of handling information.
  • In this process, there are two generic problems that are often encountered.
  • One is the selection of relevant information from the plenitude of information that is received or offered. For example, one can here think of the information that is received every day through the media, including the Internet.
  • the problem is known in production systems, where a lot of information is often available, but where it is also not known whether, or to what degree, the various accessible data are relevant. In surveys, the problem of selection is well known: which questions are really relevant to ask?
  • the present invention provides a way, called expectation sampling, to perform a selection that is far from random. Instead, the sampling is based on a learning process where previously encountered relations are used to select the next pattern of activity.
  • Another generic problem of data acquisition is missing data.
  • Information could for various reasons reach a control system only in an incomplete form.
  • a sensor could fail during a process, or some other cut in communication could occur. Often one would not like to stop the ongoing process, but rather have the system continue acting as well as possible under the given circumstances.
  • the fact that information is typically obtained at different rates is another factor where it could be advantageous to speed up the next process at the expense of risking missing some data.
  • the present invention also completes missing data based on previously encountered relations.
  • the invention can advantageously be coupled to a control system and thereby act as an operation system that selects and completes information. We call this perception control.
  • One aim of the present invention is to provide a method which provides an action corresponding to coming information.
  • This aim implies the problem of providing an action corresponding to information that is either not available, not yet available, or a combination thereof.
  • a method which meets this aim may in some sense utilise a system being able - or configured - to provide an action based on at least an expectation of coming information.
  • a method utilising two separate systems: a perception system and an action system.
  • the perception system is configured to provide perceptions of coming information, that is, information that either is not yet available or has not yet been received by the action system; the action system is configured to provide an action based on the perception of the coming information.
  • the present invention relates to a method, preferably a control method, utilising a computerised perception system (1) and a computerised action system (2) which in common determine an action (a), preferably a control signal. Said perception system (1) establishes a perception (π) of coming information (S) based on expectations (π′), and said action system (2) determines an action (a) corresponding to the perception (π) of the coming information (S) based on motivations (expectations of rewards). The method comprises: determining an expectation (π′) of coming information (S) based on past perceptions (π) of past coming information (S); combining the expectation (π′) and present incoming information (S) so as to establish a perception (π) of coming information (S), by use of the perception system (1) receiving said present incoming information (S) from an external environment (3); and determining an action (a) based on said perception (π) of the coming information (S), by use of the action system (2) receiving said perception (π).
  • Note that expectation and perception are typically vectors, and when reference is made in the following to combinations of either expectation or perception, reference is made to combinations of the elements of the vector, to combinations of vectors, or a combination thereof.
  • The term past perception indicates, correctly, that perceptions are formed as time evolves, that is, perceptions follow one after another in the time domain. Thus, the formation of a present perception of coming information is followed later by the formation of a new present perception, at which point the former present perception of coming information has become a past perception of past incoming information.
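  • As an illustration only, this claimed loop can be sketched as follows in Python. The class and method names (PerceptionControl, form_expectation, form_perception, select) are assumptions for exposition, not the patent's own terminology:

```python
class PerceptionControl:
    """Hypothetical skeleton of the claimed loop: expectation from past
    perceptions -> perception from expectation plus incoming information
    -> action from perception (via motivations)."""

    def __init__(self, perception_system, action_system):
        self.perception_system = perception_system  # establishes perceptions (pi)
        self.action_system = action_system          # determines actions (a)

    def step(self, incoming_information):
        # 1. Determine an expectation (pi') of coming information
        #    based on past perceptions of past coming information.
        expectation = self.perception_system.form_expectation()
        # 2. Combine the expectation with the present incoming information
        #    (possibly incomplete) into a perception (pi) of coming information.
        perception = self.perception_system.form_perception(expectation, incoming_information)
        # 3. The action system maps the perception to an action (a),
        #    e.g. a control signal, based on motivations.
        return self.action_system.select(perception)
```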
  • the perception system is configured to produce expectations (π′) in a self-driven and self-organising activation process in order to provide expectations on which the perceptions are to be based.
  • the expectation (π′) on which a perception is to be based is determined as a combination of past perceptions (π), such as a weighted sum of past perceptions (π). Furthermore, it is preferred that the perception (π) of coming information is based on a combination of expectation(s) (π′) and incoming information (S).
  • the combination of expectation(s) (π′) and incoming information (S) includes weighting the expectation(s) (π′) and the incoming information (S) by the confidence levels for the expectation, and adding up the weighted expectation(s) and incoming information (S).
  • the past perceptions (π) on which the expectation(s) (π′) are based are stored in a memory, such as a short-term memory.
  • the perception system utilises a neural network, but other computer means such as statistical means may advantageously be utilised in connection with the perception system.
  • In embodiments of the method according to the invention in which a neural network based perception system is utilised, such a system must, of course, be trained. In some situations the neural network is trained during use, and in such cases training of the perception system (1) preferably takes place when the difference between the perception (π) of coming information and a combination of expectation (π′), perception (π) and incoming information (S) is not within a predefined limit. This on-going learning may very advantageously be combined with an initial learning of the neural network.
  • the information (S) on which perception(s) (π) and expectation(s) (π′) are based is selected from a plenitude of information.
  • Such selection is particularly useful, and in some cases even necessary, for instance if the plenitude of information is so vast that the information on which perception and expectation are to be based is drowned by the presence of non-relevant information.
  • the selection is performed by the perception system (1), said selection preferably being performed based on previously encountered relations.
  • the method according to the present invention is configured to perform addition of information, and in particular embodiments it is preferred that the information (S) on which perception(s) (π) and expectation(s) (π′) are based is supplemented with additional information. It is often preferred in such cases that the additional information is provided based on previously encountered relations.
  • these two measures may very advantageously be combined in one embodiment.
  • This may be utilised, for instance, in a situation where the information on which the perception and expectation are to be based is extracted from a plenitude of information, and where this plenitude of information misses parts of the information on which the perception and expectation are to be based.
  • the perception system extracts the information available from the plenitude of information and adds additional information to the extracted information, so that expectation and perception may be provided.
  • the action system determines an action.
  • the action (a) determined by the action system (2) is determined based on a combination of perceptions (π) of coming information, said perceptions (π) being determined by the perception system (1).
  • the action (a) determined by the action system (2) is determined based on a combination of motivation (V) dependent weights (w) and the perceptions (π) of coming information determined by the perception system (1).
  • the motivation dependent weights (w) represent tendency levels to perform certain actions.
  • the combination is a summation in which the motivation dependent weights (w) are multiplied by the perceptions (π) of coming information.
  • the motivations (V) are formed as a combination of perceptions (π) of the coming information, preferably as a weighted sum of perceptions (π).
  • the action system forms motivations (V) in an ongoing process. This ongoing formation may, of course, also be utilised in other situations.
  • noise in the system. This is preferably implemented by configuring the action system so that actions (a) determined by the action system (2) comprise noise. Preferably, a random action is selected with a probability ε that is related to, or is, the level of the noise.
  • the action system (2) utilises a neural network.
  • the neural network is trained by motivation learning, said motivation (V) being realised in the form of expectations of receiving rewards or punishments.
  • When utilising motivation learning, it may be preferred that the motivation learning is conducted when a measure of how well the motivation anticipates a reward is not within a pre-defined limit.
  • the invention relates in another aspect to a system configured to carry out the method according to the present invention.
  • the system comprises at least means embodying the perception system and the action system.
  • these means or additional means are configured to carry out any of the features of the method according to the present invention.
  • the present invention relates to a data processing system, such as a computer system, for performing a control method.
  • said data processing system comprises a computerised perception system and a computerised action system.
  • the systems according to the present invention comprise processor means, such as one or more electronic processing unit(s), for processing data and storage means, such as RAM, ROM, PROM and/or EPROM, for storing data, and these systems determine in common by use of the processor means and the storage means an action (a), preferably being a control signal.
  • the perception system (1) establishes a perception (π) of coming information (S) based on expectations (π′), and said action system (2) determines an action (a) corresponding to the perception (π) of the coming information (S) based on motivations (expectations of rewards), which establishment and determination are also done by use of processor means and storage means similar to the ones listed above.
  • the data processing system preferably comprises: means for determining an expectation (π′) of coming information (S) based on past perceptions (π) of past coming information (S); means for combining the expectation (π′) and present incoming information (S) so as to establish a perception (π) of coming information (S), by use of the perception system (1) receiving said present incoming information (S) from an external environment (3); and means for determining an action (a) based on said perception (π) of the coming information (S), by use of the action system (2) receiving said perception (π).
  • processor means, such as an electronic processing unit, and storage means, such as RAM, ROM, PROM and/or EPROM.
  • a data processing system, such as a computer system, is considered within the scope of the present invention, and such a processing system accordingly comprises computer means, such as the ones listed above, for executing one or more of the steps of the control method according to the present invention.
  • Figure 1 shows schematically a perception control system.
  • Figure 2 shows schematically a perception system and an example of Expectation Learning (EL).
  • the new perception is a combination of both expectation and the sensory input from the environment.
  • the previous perceptions form an expectation, π′, which is a prediction of the next perception to be formed. This is then compared to the actual perception formed, which is a weighted combination of the previous perceptions and the externally provided sensory input. Should the perception differ from the expected one, the weights u and the confidence measure β are adjusted.
  • Figure 3 shows Appetitive Conditioning and Extinction.
  • the stimulus configuration is:Trials 0-100: “bell” followed by "food", Trials 101 - : “bell” alone to test for conditioning.
  • the CS was activated prior to the US.
  • the expectation of "food” remains. This causes the perception of "food” to be activated until the 185 th trial, 85 trials after the actual food input was present. At this point extinction becomes effective.
  • Figure 4 shows Blocking.
  • the stimulus configuration is: Trials 0-100: “bell” followed by "food”, Trials 101-150: “bell” and “light” followed by "food”, Trials 151-200 trials: "light” alone to test for conditioning, Trials 201 - : “bell” alone to test for conditioning.
  • Figure 5 shows Motivation Learning (ML).
  • Action system: the motivation V formed provides predictions of future "rewards". By comparison with the actual reward received, the weights v and the tendency weights w may be modified.
  • Figure 6 shows an example of use of Instrumental Reinforcement of the present invention.
  • the rewarded action is continuously chosen after an initial period, defined by the amount of time taken for the rewarded action first to be selected in a random fashion.
  • Figure 8 shows the Law of Effect: a higher rewarded action is chosen at the expense of a lower rewarded action.
  • Figure 9 shows the perception I × a, illustrated for controlling a cart-pole system. In this case, four components of information may be received, and only two actions can be taken.
  • Figure 10 shows an example of the perception system.
  • Figure 11 shows the cart-pole task.
  • the "random" system and the “extra unit” system have the worst performance.
  • the "same as last” system performs well, but is clearly worse than perception control, marked by the value of the parameter ⁇ .
  • Each point is a mean over 40 runs. The maximum number of trials given is 5000.
  • Figure 13 shows the Mountain-Car task. On the steep sides, gravity is stronger than the motor.
  • Figure 15 shows average number of received components per trial. The same systems as in Figure 14. Each point is a mean of 30 runs. Note the logarithmic vertical axis.
  • Figure 17 shows results for the timing problem.
  • λ = 0, 0.01, 0.05, 0.2, 0.3, 0.5, 0.9 and 1.0.
  • Each point is a mean of 40 runs.
  • the present invention is based on ideas from human perception. It is well known that human perception is not always consistent with reality. General phenomena such as "we see what we expect to see", "selective perception", the "halo effect", and "self-fulfilling prophecies" are all accepted concepts in the description of how our perception may be responsible for erroneous and inexpedient behaviour. However, despite the strong focus on the disadvantages of perception, there are important advantages as well. For example, our perception selects information from a continuously incoming, highly complex information stream in a way that, for instance, allows us to separate even a weak voice in a noisy environment.
  • Another, equally important feature of human perception is its ability to complete information that the sensory system has not provided, but is needed in order to act adequately. Through our senses, one often receives only partial information, for instance a few words of a sentence. Still, one may be able to form an expectation of what was said, or even reconstruct the sentence.
  • the present invention involves a "perception" system that selects and completes information from an external environment with the aim of controlling it (Figure 1).
  • the success or failure of control may typically be described in terms of success or failure criteria that serve as a "motivational" drive for the actions taken.
  • the computational approach used may for instance be reinforcement learning [see e.g. R.S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction", MIT Press 1998], which we shall use in the specific applications described below.
  • the success/failure criteria are expressed in terms of "rewards"/"punishments", and "motivation" is realized in the form of "expectations" of receiving rewards and punishments.
  • the motivation strongly influences the action tendencies, thus evoking the selection of appropriate control actions.
  • Actions are not given by a simple mapping of the perception, but influenced by motivation (expectations of rewards).
  • perception is not a simple mapping of the input, but influenced by expectations of incoming information.
  • expectations are formed in an ongoing process of one activity pattern initiating another.
  • specific realizations of an expectation-based "perception system" are introduced and coupled to a motivation-based "action system".
  • the full perception control system is applied to a number of generic control tasks, for which we exemplify the information-completing feature.
  • motivation learning (ML), an essential characteristic of which is that learning is driven by a basic reward system.
  • An important ingredient is the existence of "hard-wired” needs or goals that act as a primary drive in the development of motivation. Motivation is valuable because rewards and punishments are not always instantly given, or may be intermittent in nature.
  • the other, equally important and essential feature of this invention is the expectation of incoming information. These expectations are not related to any predefined drive, but are produced in a self-driven and self-organizing activation process, where the currents propagating through the junctions cause one activity pattern to initiate another.
  • the learning method basic to the invention and associated with these expectations is denoted expectation learning (EL), to distinguish it from the reward-driven ML method.
  • the expectations are valuable because the information on which action is taken is not always complete or may be "polluted" by irrelevant information.
  • Conditioning: the basic test of learning is conditioning. As is the case for expectations, there are two distinct types of conditioning tasks: one is called classical conditioning, and the other is called instrumental conditioning. Instrumental conditioning concerns a reward- or punishment-driven change in performance. Below, this is specified in the context of the action system and the ML method, applying a reinforcement learning algorithm.
  • Classical conditioning concerns the pairing of inputs, where the action associated with a certain input is given (the unconditioned stimulus).
  • the reward system is irrelevant for classical conditioning.
  • ML is the relevant learning method for instrumental conditioning
  • EL is the appropriate method for classical conditioning, as elaborated further below.
  • The perception system
  • the irradiation-driven dynamics introduced in this invention and tested for classical conditioning does not involve a reward system.
  • By separating the action system from the perception system, we allow the action system to be optimized for the (relatively) easier task of associating the internal representation, or the "perception", to the action.
  • the expectation dynamics processes associations between different stimuli to form the perception. These two association processes may happen at very different time scales.
  • the particular action called the unconditioned response is by definition connected strongly to the perception of the input called the unconditioned stimulus.
  • the conditioned stimulus perception is in the expectation framework not directly associated with the unconditioned response. Rather, the conditioned stimulus evokes the perception of the unconditioned stimulus thus producing the unconditioned response.
  • a memory structure of the perception may be introduced, whereby the perception is "remembered” for a given time period - a type of "short-term memory".
  • an internal history of the perceived environment can be built up, and then used to attempt to attain predictions for the near-future environment. This type of bootstrapping of the perception can be regarded as a type of internal classical conditioning.
  • perception is here described by a vector, π, in general comprising the perceptions of the last τ timesteps, τ being the duration or size of the short-term memory. This size may be assumed fixed, so that a constant historical snapshot of what has happened is maintained. As time increases, the perceptions are moved systematically back in the short-term memory, and new perceptions for the current time are added (fig. A).
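  • A minimal sketch of such a fixed-size short-term memory, assuming a simple shift register over the last τ perception vectors (τ = 3 is the size used in the conditioning examples below):

```python
from collections import deque

import numpy as np

TAU = 3                      # duration/size of the short-term memory
memory = deque(maxlen=TAU)   # the oldest perception drops off automatically

def remember(perception):
    """Move perceptions systematically back and add the current one in front."""
    memory.appendleft(np.asarray(perception, dtype=float))

def flatten_memory():
    """Concatenate the stored perceptions into one vector (a fixed snapshot)."""
    return np.concatenate(list(memory)) if memory else np.array([])
```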
  • the fundamental characteristic of the perception system of the invention is that it produces a vector π′, the expectation, which in a simple form is given by a weighted sum of past perceptions, π′_i = Σ_j u_ij π_j, where j runs over the entire short-term memory of the perception (fig. B).
  • the basic point of expectation is that it strongly influences the perception of a sensory input.
  • the new perception may be given in the following form:
  • π_i(t+1) = f_WTA[β_i π′_i(t) + (1 − β_i) s_i(t)], where β is a weighting vector, weighting the expectation against the input.
  • the weighting factors β_i give the confidence levels for the expectation.
  • a process f_WTA called "winners-take-all" is here applied, whereby the entries with the largest magnitude, the winners, are set to unit value and the rest are set to zero.
  • the total "activity" (number of winners) is assumed to be fixed.
  • η_u is a constant defining the adaptation rate of the weights u.
  • the confidence level β_i is here taken to be the average of the fraction of times the expectation element π′_i matches the actual sensory input s_i. This may also be capped, so that, for instance, β_i runs between 0.1 and 0.9 (which is used in the examples presented) rather than between 0 and 1, thus allowing the input and the expectation to always have some role in the determination of the perception.
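  • A minimal sketch of this perception update, assuming binary unit activities, a fixed number of winners, and confidence levels β capped to [0.1, 0.9]; the function names are illustrative, not the patent's notation:

```python
import numpy as np

def f_wta(x, n_winners):
    """Winners-take-all: the n_winners entries of largest magnitude are set
    to unit value, the rest to zero, so the total activity stays fixed."""
    out = np.zeros_like(x, dtype=float)
    out[np.argsort(np.abs(x))[-n_winners:]] = 1.0
    return out

def update_perception(u, memory_vec, s, beta, n_winners):
    """One perception step: pi(t+1) = f_WTA[beta * pi' + (1 - beta) * s].

    u          : expectation weights over the short-term memory
    memory_vec : flattened vector of the last tau perceptions
    s          : current sensory input
    beta       : per-element confidence levels, capped to [0.1, 0.9]
    """
    expectation = u @ memory_vec                     # pi'_i = sum_j u_ij * pi_j
    pi = f_wta(beta * expectation + (1.0 - beta) * s, n_winners)
    return pi, expectation
```

  • The weights u and the confidence measure β would then be adjusted, at rate η_u, whenever the perception formed differs from the expectation (expectation learning).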
  • the perception system is shown in Figure 2.
  • Classical conditioning has three basic ingredients: activation, inhibition and irradiation. The expectation dynamics considered here embodies these ingredients, including the cases outlined below.
  • Basic classical conditioning: after repeated presentations of the CS followed by the US, the CS elicits a response (CR, the conditioned response), which resembles the UR.
  • Blocking: two stimuli, A and B, are presented and paired with a US. Firstly, A is presented followed by the US, causing A to become a CS. Then the combination of A and B is presented, followed by the US. In this case, the conditioning of stimulus B is blocked by the prior conditioning of A, and B will not become a CS.
  • Stimulus generalization: if a new stimulus, C, is presented, where C possesses similarities to an existing CS, then the UR is elicited.
  • Conditioned inhibition: a new stimulus, D, is repeatedly presented together with a CS without subsequent presentation of the US. When D is afterwards presented together with another CS, the UR may not appear. To illustrate the behavior of the perception system, some examples are analyzed in detail below. The size of the "short-term memory" was taken to be 3 time steps, but this is not crucial for the conditioning phenomena discussed.
  • Our model also embodies blocking [case (iii)], where a new stimulus, say a "light", is presented together with a CS, the "bell", subsequently followed by presentation of the US, the "food". The expected perception of the US appears, and therefore no learning takes place - nothing is changed. Thus, the "light" will not be conditioned.
  • In instrumental conditioning, a reward- or punishment-driven change in performance is tested.
  • instrumental conditioning is tested in the context of the reward-driven motivation approach, applying a reinforcement learning algorithm.
  • Alternative implementations could also be considered.
  • An important characteristic of the motivation system is the existence of innate "hard-wired" needs or goals that act as a primary drive in the development of motivation. This drive may formally be introduced through a scalar r, the value of which represents an accomplished success (r positive) or failure (r negative) in fulfilling the needs or reaching the goals set.
  • the action system is characterized by the ongoing formation of another scalar, the motivation V, formed as a weighted sum of perceptions, V = Σ_i v_i π_i.
  • Actions are formed based on the perception ⁇ .
  • the action a is given by the following form, a_k = f_WTA[Σ_i w_ki π_i], where w are motivation dependent weights, representing the tendency levels to perform certain actions in given situations (fig. C).
  • the number of "winners” in the “winners-take- all” process sets the total action activity.
  • Noise can be introduced, and is in the examples below.
  • a random action (rather than the action above) is selected with probability ε (the noise level).
  • a certain level of noise may be essential. If the system is not allowed to "explore”, more highly rewarded actions (or the possibility of avoidance of punishment) may remain undiscovered. When there is a non-zero noise level, the system can indeed discover more highly rewarded actions within the system, and act accordingly.
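  • A minimal sketch of this noisy action selection, assuming the winners-take-all combination of tendency weights w and perception π described above; the parameter names are illustrative:

```python
import numpy as np

def select_action(w, pi, n_winners, epsilon, rng):
    """Action formation a = f_WTA(w @ pi), overridden by exploration noise:
    with probability epsilon a random action pattern is selected instead,
    so more highly rewarded actions can still be discovered."""
    n_actions = w.shape[0]
    a = np.zeros(n_actions)
    if rng.random() < epsilon:
        winners = rng.choice(n_actions, size=n_winners, replace=False)
    else:
        winners = np.argsort(w @ pi)[-n_winners:]    # winners-take-all
    a[winners] = 1.0
    return a

# Example: eight perception units, two possible actions, one winner, 5% noise.
rng = np.random.default_rng(0)
a = select_action(np.zeros((2, 8)), np.ones(8), n_winners=1, epsilon=0.05, rng=rng)
```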
  • the motivation is modified by adapting the weights v.
  • e_ij is called the eligibility of the pathway between i and j.
  • the concept of eligibility is an integral part of reinforcement learning algorithms (see e.g. R.S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction", MIT Press 1998).
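  • The patent text does not reproduce the update equations here; as a hedged illustration, a standard TD(λ)-style adaptation of the motivation weights v with eligibility traces, in the spirit of the Sutton and Barto framework cited above (all parameter values are assumptions):

```python
import numpy as np

def motivation_step(v, e, pi_prev, pi_now, r, eta_v=0.1, gamma=0.95, lam=0.8):
    """One TD(lambda)-style update of the motivation weights v.

    The motivation V = v . pi predicts future rewards; delta measures how
    well the previous motivation anticipated the reward r actually received,
    and e holds the eligibility of each pathway (a decaying activity trace).
    """
    V_prev, V_now = v @ pi_prev, v @ pi_now
    delta = r + gamma * V_now - V_prev   # reward-prediction (anticipation) error
    e = gamma * lam * e + pi_prev        # eligibility traces of recently active units
    v = v + eta_v * delta * e            # strengthen eligible pathways
    return v, e, delta
```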
  • the action system is shown in Figure 5.
  • Instrumental conditioning involves reward-based emission of specific actions, i.e. the increase and decrease in the tendency to perform specific actions.
  • the motivation dynamics considered here models instrumental conditioning in various aspects, including the following broad cases:
  • Positive reinforcement A certain action is rewarded. As a result, the rewarded action is observed more frequently.
  • Aversive reinforcement A certain action is punished. As a result, the punished action is observed less frequently.
  • Negative reinforcement A certain action relieves punishment. As a result, the relieving action is observed more frequently.
  • the percentage of time that the specific action is taken, A, is illustrated as a function of the number of actions, N_A, and as a function of the noise level ε (the level of random choice of action). In the case of no rewards, we have A = 1/N_A, corresponding to a purely random choice among the available actions.
  • a positively rewarded action is selected a much higher proportion of the time than unrewarded actions. Indeed, once the positively rewarded action has been selected among the available actions (at first randomly, before learning occurs), the positively rewarded action will continually be taken (see Figure 7), and others are only taken due to the presence of noise.
  • When input is incident, the perception is activated, and an action is produced. Initially, all the weights are set to zero, and so the action taken is selected randomly. In this case of positive reinforcement, no changes are made to the weights if the action is not rewarded. Once the rewarded action is selected, however, the weight between the perception and this action is strengthened, meaning that from this point onwards that action is always selected when the perception is activated, apart from when it is overridden by the noise added in the action selection.
  • the partnership of classical conditioning via expectation learning and instrumental conditioning via motivation learning provides a comprehensive method that embodies conditioning phenomena.
  • the separability of instrumental and classical conditioning phenomena can utilize the strengths of the motivation dynamical protocols to encode instrumental conditioning between the perception and the action for the system, whilst allowing the expectation sampling to do the associative work between incoming information.
  • the use of binary valued units in the perception provides an advantage in that the expectation sampling is learnt quickly. Whilst the system may employ a large number of computational units (in perception) leading to a large number of weights, the weights are not readjusted once the expectation and perception agree, and the connections to the actions are sparsely activated.
  • the fixed activity level requires that the perception be fully activated (i.e. a fixed number of "winners") at any one time step. This protocol allows for imperfect stimuli. If the system is missing or overloaded with sensory information from the environment, the expectation and confidence form a filter with which prior experience (conditioning) helps to form the perception.
  • the current implementation of the expectation dynamics uses only positive expectation weights. This means that the strengths of the connections are only decreased relative to other weights (i.e. to decrease a particular connection, the others need to be higher relative to this one).
  • the system is not directly inhibited by a particular connection; rather, inhibition is modeled by increasing the connection to other (competing) units.
  • the actions may act only upon the environment. However, generally the actions (or part of them) are also transferred to the perception itself, so that the perception on which the system acts is made up of both internal and external components. Both of these can then also be modified according to the expectation learning method.
  • In order to evaluate the performance of perception control, we shall compare with three simple systems coupled to the same action system, but where the perception system is replaced by a simple information completion mechanism.
  • the "random" system missing components are assigned a random value within a reasonable interval.
  • the "same as last” system a missing component is assigned the same value as last time it was available. In a continuously changing environment this value may often be close to correct, and a reasonable control action may be taken.
  • the "extra unit” system a special unit is assigned and activated if and only if some component is missing. One can think of the unit as a warning indicator: Some information is missing.
  • an information vector I may be defined, having elements I_i, where i ∈ C_c for some component c.
  • Perception is introduced as a two-dimensional array of units that consists of as many units as the size of the information vector I times the number of possible control actions (Figure 9).
  • the perception is I × a, where a is the action vector.
  • the action vector is zero in all places but the current control action.
  • Figure 10 shows the perception system.
  • the expectation π′(t) is formed as a weighted sum from the perception calculated at time t − 1.
  • For all k ∈ C_c:
  • the expectation π′ is also used for completing the received information.
  • the i-th element of the completed information, ŝ_i(t), at time step t is ŝ_i(t) = s_i(t) if the i-th component is received, and ŝ_i(t) = π′_i(t) otherwise.
  • the dynamics for the weights is such that the expectation comes to resemble the actually received information. This part is the expectation learning. Learning takes place when the expectation is wrong. To this end, an error term δ is calculated. If the k-th component is received, the error term δ_k is
  • δ_k(t) = s_k(t) − π′_k(t).
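  • A minimal sketch of this completion-and-learning step, assuming a simple delta-rule adjustment of the expectation weights at an assumed rate η_u; the patent's exact update is not reproduced here:

```python
import numpy as np

def complete_and_learn(u, memory_vec, received, eta_u=0.05):
    """Complete missing components with the expectation and learn from errors.

    received   : dict {k: s_k} of the components actually received this step
    memory_vec : flattened short-term memory of past perceptions
    """
    expectation = u @ memory_vec               # pi'(t) from the perception at t - 1
    completed = expectation.copy()             # missing components <- expectation
    for k, s_k in received.items():
        completed[k] = s_k                     # received components override
        delta_k = s_k - expectation[k]         # error term: the expectation was wrong
        u[k] += eta_u * delta_k * memory_vec   # pull the expectation toward reality
    return completed, u
```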
  • the system to be controlled contains a cart with a pole attached.
  • the pole must be held upright within ±12° measured from the vertical position, and the cart must not move more than ±2.4 m from the start position. If either of these things happens, a punishment is received.
  • two actions are considered: giving the cart a 10 N push to the left or to the right.
  • the environmental information has four components: x (the position of the cart on the track), ẋ (the cart velocity), θ (the angle of the pole measured from the vertical), and θ̇ (the angular velocity of the pole). Contrary to the original situation, we now introduce for each of the components a probability p that the component is missing.
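  • A small sketch of how such component dropout can be simulated, with each of the four components missing independently with probability p (None marks a missing component):

```python
import random

def drop_components(obs, p):
    """obs = (x, x_dot, theta, theta_dot); each component is missing
    independently with probability p."""
    return [None if random.random() < p else o for o in obs]

# With p = 0.28 the chance that at least one component is missing is
# 1 - (1 - 0.28)**4, roughly 0.73, matching the 73% quoted below.
```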
  • the component units are defined by discretizing the phase space into intervals, based on the following quantization thresholds:
  • Figure 12 shows the results using perception control (eight different values of λ), as well as results from using the three simple methods described above.
  • Control was considered obtained if the control system could balance the pole for 10,000 time steps.
  • the learning time is defined as the number of times the pole falls or the cart enters a forbidden area before control is obtained. In our example the maximum number of trials allowed is 5000 - if a system does not obtain control within this time, a learning time of 5000 is then assigned, in order to make it possible to calculate a mean.
  • the learning times shown in Figure 12 are means of 40 runs. Worst performance is obtained by the "random” system and by the “extra unit” system. These systems are unable to obtain control if the probability of a component missing is higher than 2%. A better performance is obtained by using the "same as last" system.
  • the learning time is notably reduced. Curves for eight values (0, 0.01, 0.05, 0.2, 0.3, 0.5, 0.9 and 1.0) of λ are shown. This parameter has no significant influence on the learning time. In all cases, the perception system was able to obtain control up to a 28% probability of a component missing, which corresponds to a 73% probability that the information is incomplete.
  • the information received by the control system is generally several time steps old.
  • the agent only has a 0.3% probability of receiving all the information, and with 36% probability the information will at any given time step be totally lacking for the next 10 time steps.
  • the "same as last" system will at large values of p often perform the same action over and over again, which is the kind of behavior that actually solves the task.
  • the third and last example considered is a discrete, non-continuous problem.
  • the purpose is to demonstrate that the perception control system can handle discrete problems, in contrast to the "same as last" system that strongly relies on continuity.
  • the example is an abstract formulation of a simple but general timing problem.
  • the environment consists of an arbitrary number of states, here chosen to be eleven, and the (control) system can choose to be active or passive in each state. State n+1 follows state n, as illustrated in Figure 16.
  • One state is now selected as a state where the system has to act, here taken to be state seven.
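  • A sketch of this timing environment; the structure (eleven cyclic states, acting required in state seven) is as stated, while the reward values of +1 and -1 are assumptions for illustration only:

```python
import random

def timing_trial(policy, n_states=11, act_state=7, p_missing=0.0):
    """One pass through the cyclic timing task: state n+1 follows state n,
    and the system should be active exactly in act_state.  The observed
    state may be missing (None) with probability p_missing."""
    total = 0.0
    for state in range(n_states):
        observed = None if random.random() < p_missing else state
        if policy(observed) == "active":
            total += 1.0 if state == act_state else -1.0  # assumed reward scheme
    return total
```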
  • the results for the three simple methods are the three curves on the left.
  • the "random" system and the “same as last” system have the worst performance. Their performance is easy to understand. In both cases, for small probabilities the system acts when it finds that the environment is in the selected state (even if it is not).
  • the mean return, normalized by the maximal possible return, can be calculated to be 1 − 1.6p. For larger probabilities (p > 0.7625), the systems "give up" and stay passive all the time, receiving the associated punishment. In this case, the fraction of the maximal possible return is −0.22.
  • the examples show that perception control works far better than some simple methods.
  • the perception dynamics makes it possible for the subsequent action dynamics to obtain control even when large amounts of environmental information are missing.
  • the operational feedback has a time delay, e.g., in regulation of fluid transport where there is a time delay between a change in input and the resulting change in output,
  • the input S is the state of the process
  • the action a is the regulation of control parameters.

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention concerns a method (expectation sampling) for sampling incoming data in such a way as to select the most relevant data and to complete missing data on the basis of information extracted from the previous data flow. The method can advantageously be coupled to a control system and thereby act as an operating system (perception control).
PCT/DK2001/000660 2000-10-10 2001-10-10 Expectation sampling and perception control Ceased WO2002031606A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001295435A AU2001295435A1 (en) 2000-10-10 2001-10-10 Expectation sampling and perception control

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DKPA200001508 2000-10-10
DKPA200001508 2000-10-10

Publications (1)

Publication Number Publication Date
WO2002031606A1 true WO2002031606A1 (fr) 2002-04-18

Family

ID=8159780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK2001/000660 Ceased WO2002031606A1 (fr) 2000-10-10 2001-10-10 Expectation sampling and perception control

Country Status (2)

Country Link
AU (1) AU2001295435A1 (fr)
WO (1) WO2002031606A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5504692A (en) * 1992-06-15 1996-04-02 E. I. Du Pont De Nemours Co., Inc. System and method for improved flow data reconciliation
US5613041A (en) * 1992-11-24 1997-03-18 Pavilion Technologies, Inc. Method and apparatus for operating neural network with missing and/or incomplete data
US6216048B1 (en) * 1993-03-02 2001-04-10 Pavilion Technologies, Inc. Method and apparatus for determining the sensitivity of inputs to a neural network on output parameters
WO1997030400A1 (fr) * 1996-02-02 1997-08-21 Rodney Michael John Cotterill Method for processing data streams in a neural network, and neural network
US6169981B1 (en) * 1996-06-04 2001-01-02 Paul J. Werbos 3-brain architecture for an intelligent decision and control system

Also Published As

Publication number Publication date
AU2001295435A1 (en) 2002-04-22

Similar Documents

Publication Publication Date Title
Hintzman Judgments of frequency and recognition memory in a multiple-trace memory model.
Schmidhuber Making the world differentiable: on using self supervised fully recurrent neural networks for dynamic reinforcement learning and planning in non-stationary environments
Passino Biomimicry for optimization, control, and automation
US5606646A (en) Recurrent neural network-based fuzzy logic system
Kasabov On-line learning, reasoning, rule extraction and aggregation in locally optimized evolving fuzzy neural networks
US5802506A (en) Adaptive autonomous agent with verbal learning
Chernova et al. Multi-thresholded approach to demonstration selection for interactive robot learning
Nauck et al. Choosing appropriate neuro-fuzzy models
WO2002031606A1 (fr) Expectation sampling and perception control
Stavroulakis Neuro-fuzzy and fuzzy-neural applications in telecommunications
Hatano et al. GBDT modeling of deep reinforcement learning agents using distillation
Dubova Generalizing with overly complex representations
McMahon et al. An autonomous explore/exploit strategy
Nauck et al. The evolution of neuro-fuzzy systems
Wu et al. Human cognitive learning in shared control via differential game with bounded rationality and incomplete information
Vefghi et al. Dynamic monitoring and control of patient anaesthetic and dose levels: time-delay, moving-average neural networks, and principal components analysis
Suh et al. The context-aware learning model: Reward-based and experience-based logistic regression backpropagation
Scholtes Kohonen feature maps in natural language processing
Shi Reinforcement-learning based control for nonlinear systems
US20060224543A1 (en) Guidance system
Eriksson et al. Dynamic network architectures for deep q-learning: Modelling neurogenesis in artificial intelligence
US20250131279A1 (en) Training neural networks for policy adaptation
Coster et al. Expectation and conditioning
Zitar Machine learning with rule extraction by genetic assisted reinforcement (REGAR): application to nonlinear control
Lemhaouri et al. A Developmental Robot Model of Early Language Acquisition

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ CZ DE DE DK DK DM DZ EC EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP