SELECTIVE ATTENTION ADAPTIVE RESONANCE THEORY
TECHNICAL FIELD
This invention relates to a pattern recognition system that uses an artificial neural network architecture with novel mechanisms for object and 2-D pattern (signal) recognition in cluttered and noisy images. The proposed neural network, hereinafter referred to as Selective Attention Adaptive Resonance Theory (SAART), extends the capability of known Adaptive Resonance Theory (ART) neural networks (as first proposed by G. Carpenter and S. Grossberg) to difficult pattern and object recognition problems. This invention proposes an object recognition architecture that uses flexible neural layers and novel mechanisms for selective information transfer which enable the network to use its stored memory more effectively in order to selectively attend to and recognise a learnt object when that object appears in a complex or noisy background.
BACKGROUND ART
In the ART neural networks of G. Carpenter and S. Grossberg, a comparison is made between the spatial patterns across two neural layers, one of which receives an external input only, whereas the second neural layer may also receive a top-down input (a learned pattern) from the activated memory of the network. An input pattern is deemed to be recognised when this match exceeds a certain pre-set threshold value. However, because ART neural networks do not have suitable mechanisms by which non-relevant clutter (and noise) may be filtered out, ART networks cannot recognise learned objects when they are embedded in a cluttered and noisy background scene. This limits the application of ART neural networks (and all the existing approaches to object recognition) in that they do not suggest how the memory within a neural network may be used to extract and recognise an object in a complex and cluttered environment such as a visual environment. This deficiency is due to ART's rigid attentional sub-system, which does not allow a match to occur between the recalled memory and those portions of the input that could match it.
It is an object of this invention to improve on the existing pattern recognition methods and systems. It is a further object of this invention to provide a new multi-dimensional artificial neural network architecture that is based on flexible neural layers and novel mechanisms that overcome at least some of the problems that are faced by the current object and pattern recognition systems.
In particular, this invention proposes how a multi-dimensional (in this case two-dimensional) artificial neural network may be designed to automatically and selectively filter the non-relevant clutter in order to recognize the desired input pattern in a complex and cluttered input such as an image.
The object of the invention is achieved in part by the use of "memory guided selective attention", which is implemented by a mechanism called top-down presynaptic facilitation whereby the recalled top-down memory is used to modulate the bottom-up signal transmission gain. That is, the spatial pattern from the recalled long term memory (LTM) is used to selectively amplify certain bottom-up signal transmission gains, while competition in the network removes the unwanted signals.
DISCLOSURE OF THE INVENTION
Therefore, in one form of the invention, though this need not be the only or indeed the broadest form, there is proposed a pattern recognition system comprising a neural network architecture consisting of at least two interconnected layers, wherein one of the at least two layers selectively modulates input information into another of the at least two layers.
In a further form of the invention there is proposed a pattern recognition system comprising a neural network architecture consisting of at least two layers in communication with each other; the first layer being a memory layer and having means to store a plurality of memory fields; the second layer adapted to receive an external input and comprising a modulation means that modulates said external input to produce a modulated signal; a comparison means that compares the memory fields of the memory layer with the modulated signal and amplifies those portions of the modulated signal that are located in the memory fields, thereby producing an indicator of a match between the memory field and the modulated signal.
Preferably, if no portions of the modulated signal are matched with those located in the memory fields, the modulated signal is subsequently stored as a new memory field.
Preferably, those portions of the modulated signal that are not located in the memory fields are attenuated from the modulated signal, resulting in a resonance condition between the first and second layers.
In preference the modulation is weighted according to a pre-determined formula.
In a yet further form of the invention there is proposed a signal recognition system comprising: a plurality of network layers, at least one of the layers being a memory layer comprising a memory means adapted to store a plurality of signals; wherein at least two of the layers are adapted to receive an external signal and output that signal to at least some of the other layers; the system further comprising: a comparator means that compares the received signal with the stored signals in the memory fields and effects a matching output signal whose value depends on the degree of correlation between the received signal and the stored signals, said matching output signal being input into the at least two layers adapted to receive the external signal; an amplifier means that amplifies the matching output signal which is input into the first of the two receiving layers and compared with the received external signal; and an attenuator means that attenuates the matching output signal which is input into the second of the receiving layers and compared with the received external signal; whereby the part of the received signal that was initially stored in the memory fields becomes the dominant signal in the first of the receiving layers and is totally absent in the second of the receiving layers, that dominant signal subsequently being stored in one of the memory fields.
In preference the system further comprises a modulating means that selectively modulates the signals between the layers that are in communication with each other.
In preference each of the layers further comprises a plurality of cells in communication with each other that selectively amplify or attenuate the signal input into the layer.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic diagram of a first embodiment of the invention showing the architecture of the SAART neural network;
Figure 2 is a schematic diagram of a one-dimensional modulated competitive neural layer;
Figure 3 is a schematic diagram illustrating the model of a facilitated chemical synapse;
Figure 4 is a schematic diagram illustrating the key property of the facilitated competitive neural layer;
Figure 5 is a simulated example illustrating the effect of facilitation on the steady state output of the neural layer;
Figure 6 is a progressive computer simulation output of the layer on target objects whose resultant edges are used as the 2-dimensional facilitatory spatial pattern;
Figure 7 is an example of selective facilitation on cluttered visual images as produced by computer simulations;
Figure 8 is a one-dimensional schematic of the feedforward-feedback interactions between two 2-dimensional presynaptically modulated competitive neural layers F0 and F1;
Figure 9 is an example of three object shapes used to generate a set of noisy inputs for the network;
Figure 10 is an example of the noisy inputs that we use to train the neural network;
Figure 11 shows the temporal evolution of the top-down and bottom-up long term memories whilst the neural network is engaged in learning input patterns;
Figure 12 shows images of 3 objects whose shape is learnt by the network;
Figure 13 is an example of a complex test image and the recognition results;
Figure 14 is a schematic diagram of the extended feedforward excitation-feedback presynaptic facilitation neural circuit;
Figure 15 is a schematic diagram showing how the neural network can enable the feedforward excitation-feedback presynaptic facilitation circuit of Figure 14 to test for the presence of the relevant 2-dimensional shape within the input; and
Figure 16 is a temporal simulation of the extended feedforward excitation-feedback presynaptic facilitation circuit of Figure 14.
DESCRIPTION OF THE INVENTION
Although it is possible to design the SAART neural network with a minimum of only three processing layers, the following discussion describes the interconnectivity and the dynamics of a network that consists of six processing layers. Thus, although many variants of the invention may exist, the main requirement is to provide means by which the established memory within the network can modulate the bottom-up signal transmission gains into a layer of competitive neurons, thus enabling a match to occur between the recalled (or active) memory and a portion of the overall input. The following discussion also concentrates on visual recognition data; however, it is to be understood that the invention can be used for any pattern recognition regardless of the format of the data, be it visual, aural or radio signals, to mention a few.
Turning now to the drawings in detail, FIG 1 shows a preferred architecture of the SAART neural network. The network consists of six
dynamic layers called Presynaptically Facilitated Excitatory Shunting Competitive Neural Layers (L1, L2, L3, L4, L5 and L6). Cells within each layer are engaged in lateral competition. The net steady state activity of each cell in the layer is thus determined by the combined effect of the input signal and the lateral competition within the layer. These neural layers are interconnected via dynamic synaptic pathways 2, 3, 4, 5, 9, 10, 11, 12, 13, 14 and 15 (models of chemical synapses) whose signal transmission gain may be selectively modulated by the output of other layers. Memories of learned spatial patterns are retained in the adaptive bottom-up long-term memory (BU-LTM) pathways 12 (from L4 to L6) and the adaptive top-down long-term memory (TD-LTM) pathways 13 (from L6 to L5). These memories represent a 2-D spatial pattern (i.e., the shape of an object).
A synapse is the neurobiological term for the chemical junction between the terminal ending of one axonal pathway and the dendrite (or cell body) of another cell, where nerve impulses pass from one cell to another by the release of small quantities of a specific chemical substance (a neurotransmitter) and its diffusion across the small gap between the pre- and postsynaptic neurons. The released neurotransmitter is received by the receptors on the postsynaptic side and is converted into an electrical signal that excites (or inhibits) the target cell. It is to be understood that the dynamics of chemical synapses can be modelled (in software, in silicon hardware, or in a hybrid implementation) and embedded into artificial neural networks to provide a number of processing enhancements. Embedding a model of synaptic dynamics into an artificial neural network provides the network with added processing flexibility (such as selective gain control).
In object recognition applications, the input to the network shown in Figure 1 at 1 is a 2-dimensional edge-processed image. This input enters the network via two layers (L1 and L2) whose bottom-up signal transmission gains along pathways 2 and 3 are modulated by the activity of cells in layer L3. In the simplest possible case, each cell in layers L1 and L2 samples the bottom-up input via a single signal pathway. Each cell in layer L3 projects a single facilitatory pathway to the input synapse of a corresponding cell in layer L1. Signals along these pathways interact with the synaptic dynamics by enhancing the transmitter mobilization rate in the corresponding input pathways 2 into layer L1. The effect of this is that if a certain spatial pattern from the recalled memory is active across layer L3, then this spatial pattern acts to amplify the gain of the bottom-up synapses into layer L1, but only in those spatial locations that correspond to the active cells in layer L3. This mechanism, coupled with lateral competition in each layer, enables past memories to amplify the features of the relevant input while simultaneously filtering out the non-matching portions of the overall input at 1. However, the gains of the input pathways into layer L2 (which are normally high) are suppressed by the active neurons in L2. The effect of this is that the spatial pattern that is filtered out of L1 will appear across L2. That is, the input that is attended by the network will appear across L1 while all the unattended portions of the input will appear across L2.
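By way of non-limiting illustration only, the following toy Python sketch shows the idea of memory-guided presynaptic facilitation: a recalled memory pattern raises the bottom-up transmission gain at matching locations, and the rest of the input is suppressed. The numerical values, the simple multiplicative gain and the hard threshold standing in for lateral competition are assumptions made for clarity; they are not the shunting and transmitter dynamics given later in this description.

```python
import numpy as np

def facilitated_bottom_up(input_pattern, memory_pattern, base_gain=0.1, boost=10.0):
    """Recalled memory selectively raises the bottom-up gain into L1 at
    matching locations; a crude threshold stands in for lateral competition."""
    gain = base_gain * (1.0 + boost * memory_pattern)   # presynaptic facilitation
    facilitated = input_pattern * gain                  # modulated bottom-up signal
    threshold = 0.5 * facilitated.max()                 # stand-in for competition
    return np.where(facilitated >= threshold, facilitated, 0.0)

# Toy example: clutter and target edges are equally strong in the input,
# but only the locations marked in the recalled memory survive.
scene = np.array([1.0, 1.0, 1.0, 1.0])       # bottom-up input (target + clutter)
memory = np.array([0.0, 1.0, 1.0, 0.0])      # recalled top-down memory pattern
print(facilitated_bottom_up(scene, memory))  # -> [0.  1.1 1.1 0. ]
```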
Since Layer L3 contains a sum of two spatial patterns or neural activities (the bottom-up activity from Layer L1 and the top-down activity from Layer L5), the resultant neural activity that ends up circulating in the loop L3→L4→L5→L3 will depend not only on what has entered this loop from layer L1 but also on the signals entering this loop from the recalled memory (top-down signals from the activated cells in layer L6). Very weak inputs to layer L5 from layer L6 (i.e., weak top-down memory) will not have any effect on the system, allowing the bottom-up input activity to enter and resonate in the loop, which then leads to learning of the new input stimulus. However, strong inputs from layer L6 will totally overwrite the activity of layer L5 that was initially due to the input signals that are passed through L1 and L3. This strong top-down activity will circulate (reverberate) in the L3→L4→L5→L3 loop while at the same time it will amplify the corresponding bottom-up neural signals that may be available at the input to L1. Whether this leads to stable resonance (equilibrium) and learning depends on whether the reverberating neural activity across L3 can select a portion of the bottom-up activity across L1 with which it can match. Comparison of the activities across L1 and L3 at 17 indicates whether the spatial patterns across the two layers are matched above the required threshold level 16. The long term memory is updated whenever the cellular activities (spatial patterns) across L1 and L3 are matched to within the desired threshold level (typically chosen to be high) and when the time rate of change of the match has fallen below a preset steady state threshold level. If the two spatial patterns cannot be matched, the network is reset and then attempts to access another previously learned memory or it will learn the total input activity across L1. The strength of the circulating activity in the L3→L4→L5→L3 loop can be controlled by adjusting the gains in pathways 9, 14 and 15.
Layer L5 is designed so that its cells are more sensitive to the top-down signals from layer L6 than to signals from layer L4 and also have a greater saturation level (by at least a factor of 10). Long term memory is updated whenever the 2-D spatial patterns of cellular activity across layers L1 and L3 are matched to within a pre-set tolerance level (or vigilance) and when the network reaches a stable state (i.e., when dR/dt is approx 0 where R is the degree of match). In the simplest possible case, the degree of match between the spatial patterns across layers L1 and L3 can be determined by calculating the cosine of the angle that is formed by the two multidimensional vectors.
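By way of example only, this cosine measure of match could be expressed as follows in Python. The function name and the handling of zero-activity patterns are illustrative assumptions; the measure itself is simply the cosine of the angle between the two activity patterns treated as flattened multidimensional vectors, as described above.

```python
import numpy as np

def degree_of_match(l1_activity, l3_activity):
    """Cosine of the angle between the two 2-D activity patterns across
    L1 and L3, treated as flattened multidimensional vectors."""
    a = np.asarray(l1_activity, dtype=float).ravel()
    b = np.asarray(l3_activity, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom > 0 else 0.0
```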
Memory-gated bottom-up synaptic inputs to layer L5 are also gated by a model of dynamic postsynaptic receptors. Inactivation (or desensitization) of the postsynaptic receptors along the active bottom-up pathways that are linked to active cells in layer L6 can induce attentional shifts by temporarily biasing the network against continuously attending to the same stimulus. However, attentional shifts to other familiar inputs are driven by the direct bottom-up pathways from layer L2 (which contains all the currently non-attended input information). These direct pathways from L2 will release some LTM transmitter in the bottom-up LTM synapses of cells in layer L6, thus exciting the non-excited cells in layer L6. When this direct excitation of cells in layer L6 is strong enough to overcome the competitive feedback inhibition, a new cell (or cell sub-population) in L6 wins the competition and sends its top-down memory into the reverberatory loop, thus leading to a new resonance.
In Fig. 2 we show a one-dimensional schematic of a modulated competitive neural layer. The input to the layer is gated by the mobilized transmitter whose level is facilitated (enhanced) by another spatial pattern from outside the layer. The combined action of facilitation and the competition within the layer leads to a stable state where certain features of the input are enhanced at the expense of others. The terms used in the figure are as follows: z_ij - transmitter production rate, u_ij - stored transmitter, y_ij - mobilized transmitter. The figure further includes the slow inhibitory neurons 20, fast excitatory cells 21, postsynaptic feedback 22, excitatory postsynaptic potential 23, transmitter synapse 24, presynaptic facilitation 25, synaptic input 26, and facilitatory input 27.
The following five equations represent the minimum set of equations that specify the model of a facilitated competitive neural layer (the mathematical procedure for choosing the various parameters of the layer will be described in the Appendix).
Postsynaptic cellular activity (x_ij)

$$ x_{ij} = \frac{B\,G\,v_{ij}}{A + G\,v_{ij} + G^{-}\,v_{ij}^{-}} $$

where B is the upper saturation level, A is typically chosen to be 1, G is the excitatory postsynaptic gain, v_ij is the excitatory postsynaptic potential, v_ij^- is the lateral feedback inhibitory potential, while G^- is the gain of the lateral feedback inhibition.
Lateral competitive feedback inhibition (v_ij^-)

Lateral competition in the layer is mediated by slow inhibitory interneurons (represented by black circles in Fig 2), whose activity is given by

$$ \frac{dv_{ij}^{-}}{dt} = -A^{-}\,v_{ij}^{-} + B^{-}\sum_{p,q} f(x_{pq}) $$

where f(x_pq) = max(x_pq - Θ, 0) is a thresholding function (linear above Θ) which ensures that a cell contributes to lateral inhibition only when its activity is above the threshold Θ. A^- and B^- are constants; A^- is typically chosen to be A/10, while B^- is typically chosen to scale as 1/(10n), where n is the number of neurons in the layer.
Excitatory postsynaptic potential (v_ij)

$$ \frac{dv_{ij}}{dt} = -D\,v_{ij} + J_{ij}\,y_{ij}\left[\rho_{v} + \kappa_{v}\,f(x_{ij})\right] $$

where D is the decay rate, J_ij is the layer input at location (i,j), ρ_v and κ_v are constants (κ_v is typically chosen to be about 10 times larger than ρ_v); f(x_ij) is the thresholded postsynaptic feedback.
Note that the postsynaptic feedback signal interacts multiplicatively with the synaptic input signal J_ij to provide a further boost in the released transmitter and an increase in v_ij. Thus the postsynaptic cell can increase v_ij, but only under the condition of correlated firing with the synaptic input.
Stored transmitter (u_ij)

$$ \frac{du_{ij}}{dt} = \alpha_{u}\left(z_{ij} - u_{ij}\right) - \left[\beta_{u} + J_{ij}\,\kappa_{u}\,f(x_{ij})\right]\left(u_{ij} - y_{ij}\right) $$
The postsynaptic feedback signal also interacts multiplicatively with the input to deplete the stored transmitter u_ij.
Mobilized transmitter (y_ij)

$$ \frac{dy_{ij}}{dt} = \left[\beta_{y} + F_{ij}\right]\left(u_{ij} - y_{ij}\right) - J_{ij}\,y_{ij}\left[\rho_{y} + \kappa_{y}\,f(x_{ij})\right] - \gamma_{y}\,y_{ij} $$

The facilitatory signal F_ij in the above equation acts to increase the level of the mobilized transmitter and hence the input synaptic gain. The above model of a facilitated chemical synapse is schematized in Fig 3.
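By way of illustration only, the equations above can be integrated numerically. The following Python sketch steps them forward with a simple Euler scheme; the parameter values, the time step and the reduction of the lateral inhibition to a single shared pool are assumptions made for brevity, not values prescribed by this specification.

```python
import numpy as np

def simulate_facilitated_layer(J, F, steps=400, dt=0.05,
                               A=1.0, B=1.0, G=10.0, G_inh=10.0, theta=0.05,
                               A_inh=0.1, B_inh=0.01,
                               D=1.0, rho_v=0.1, kappa_v=1.0,
                               alpha_u=0.05, z=1.0, beta_u=0.01, kappa_u=0.5,
                               beta_y=0.05, rho_y=0.1, kappa_y=1.0, gamma_y=0.05):
    """Euler integration of the facilitated competitive layer equations.
    J : 2-D numpy array of synaptic inputs J_ij; F : 2-D facilitatory pattern F_ij."""
    J = np.asarray(J, dtype=float)
    F = np.asarray(F, dtype=float)
    v = np.zeros_like(J)      # excitatory postsynaptic potential v_ij
    u = np.full_like(J, z)    # stored transmitter u_ij
    y = np.zeros_like(J)      # mobilized transmitter y_ij
    v_inh = 0.0               # lateral feedback inhibitory potential (shared pool)
    for _ in range(steps):
        x = B * G * v / (A + G * v + G_inh * v_inh)   # shunting postsynaptic activity
        fx = np.maximum(x - theta, 0.0)               # thresholded feedback f(x_ij)
        v_inh += dt * A_inh * (-v_inh + B_inh * fx.sum())
        v += dt * (-D * v + J * y * (rho_v + kappa_v * fx))
        u += dt * (alpha_u * (z - u) - (beta_u + J * kappa_u * fx) * (u - y))
        y += dt * ((beta_y + F) * (u - y) - J * y * (rho_y + kappa_y * fx) - gamma_y * y)
    return B * G * v / (A + G * v + G_inh * v_inh)    # steady-state activity x_ij
```

A facilitatory pattern F that is non-zero only over the relevant locations will, through the mobilized-transmitter equation, raise the gain of the corresponding inputs, reproducing in miniature the behaviour described for Figures 4 and 5.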
Fig 4 illustrates the main property of the facilitated competitive neural layer. The figure depicts a 2-D spatial pattern being fed into the layer input. Also shown is the facilitatory spatial pattern which selectively amplifies the input gain into the layer. Since the facilitatory pattern (the car shape) is a subset of the input pattern, the resultant steady state output of the layer will thus represent only the relevant portion of the
input (provided that the facilitatory pattern exists in the input). In this figure the steady state activity in F0 30 can be compared with the pattern-selective facilitatory presynaptic gain control signals 31, and the input 32.
Fig 5 shows computer simulation results that demonstrate the processing property of the layer in the unfacilitated and the facilitated case. Figure 5(a) shows the input into the layer, 5(b) shows the steady state output of
the layer (cellular activity) for the case of zero facilitation, 5(c) shows the facilitatory pattern F_ij whilst 5(d) shows the steady state output of the facilitated layer. Note that although the strength of the facilitated input varies and is in some places much weaker than the unfacilitated input, all the relevant portions of the facilitated pattern are available at the output of the layer (and with approximately equal strength).
In Fig 6 and 7 there are shown several computer simulations of the layer on real world visual images. Fig 6(a) shows the first object, 6(b) the object edges, 6(c) the input, 6(d) the steady state of the layer. Fig 7 shows a number of test images, image edges, layer inputs and layer outputs clearly showing how the network has been able to differentiate the desirable signal from the noisy or cluttered background.
Fig 8 is a one-dimensional schematic of feedforward-feedback interactions between two 2-D presynaptically modulated competitive neural layers F0 and F1 (these correspond to layers L1 and L3 respectively in the SAART neural network). The external input 41 is compared to the memory input 45. The figure indicates the process of the network, with the steady state activity in F1 44, the facilitatory presynaptic feedback from F1 to F0 43 and the steady state activity in F0 42. Layer F0 provides a bottom-up excitatory input to layer F1, while layer F1 backprojects facilitatory presynaptic feedback signals to modulate the bottom-up input synapses of layer F0 (thus modulating their synaptic signal transmission gains). When layer F1 is designed to be more sensitive to the top-down inputs, these feedforward-feedback interactions lead to selective synchronization or selective resonance between F0 and F1, as indicated by the sketched steady state cellular activities.
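Purely for intuition, the feedforward-feedback interaction of Fig 8 can be caricatured as a fixed-point iteration. The Python sketch below is an assumption-laden simplification (no transmitter dynamics, ad hoc normalisation, and an arbitrary top-down weighting), not the circuit of the invention itself; it merely shows how repeated F0↔F1 exchanges can settle on the memory-matching portion of the input.

```python
import numpy as np

def resonate(external_input, top_down_memory, iterations=50,
             base_gain=0.1, boost=10.0, top_down_weight=0.7):
    """Toy stand-in for the F0/F1 loop: F1 presynaptically facilitates the
    bottom-up gain into F0, while F0 feeds F1 together with the top-down
    memory (F1 weighted toward the top-down input)."""
    external_input = np.asarray(external_input, dtype=float)
    top_down_memory = np.asarray(top_down_memory, dtype=float)
    f1 = top_down_memory.copy()
    f0 = np.zeros_like(f1)
    for _ in range(iterations):
        # facilitated bottom-up transmission into F0
        f0 = external_input * base_gain * (1.0 + boost * f1)
        f0 /= f0.max() + 1e-12
        # F1 combines its bottom-up drive from F0 with the recalled memory
        f1 = (1.0 - top_down_weight) * f0 + top_down_weight * top_down_memory
        f1 /= f1.max() + 1e-12
    return f0, f1
```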
The results of computer simulations employing the neural architecture of the invention are shown in Figures 9-11. These computer simulations of the complete SAART neural network were conducted for two applications:
(i) unsupervised real-time learning in noisy inputs; and
(ii) object recognition in cluttered visual images.
Fig 9 shows three object shapes (ship boundaries) that were used to generate a set of noisy inputs (examples of which are shown in Fig 10) to be used as the training inputs to the SAART network (each input is 32 by 32 elements and is shown in reverse contrast). Note that the ship boundaries in the original input arrays were prealigned to maximise their spatial overlap (the purpose of this was to increase the difficulty of pattern learning and recognition). Fig 9(a) shows the first shape or ship boundary 1, 9(b) shows the second shape or ship boundary 2, whilst 9(c) shows shape 3 or the boundary of ship 3.
Fig 11 shows the time-evolution of the top-down and the bottom-up long term memories (transmitter production rates) while the SAART neural network is engaged in learning the input patterns that are embedded in the noisy background. The vertical scale is the temporal evolution of the pattern.
Fig 12 shows images of three objects whose shape is learned by the network (one at a time). After the network is trained on the three object shapes, it is then exposed to a number of complex images within which the learned objects are embedded. The procedure for image rendering and threshold settings is described below, the first column is the object, the second column the object edges, the third column the input into the system, the fourth column the bottom-up long term memory whilst the fifth column is the top down long term memory.
Original grey level images (8 bits, 256 x 256 pixels, captured by a Vidicon camera and then digitized) were preprocessed by a 3 x 3 Sobel edge operator to obtain object edges. The resultant edge processed images were then scaled (by a simple averaging procedure) to the size of the network layers (32 x 32 cells). This reduces the resolution of the target and the test images by a factor of 8. The edge processed images of the three shown target objects are initially learned by the network. The network is then tested on the cluttered visual images. Note that as a result of the competition in the network, some of the weaker edges in the edge images of the target objects did not survive and were absent in the memory. The shape of each target object was learned with a high learning rate (0.5). Note that this learning is not switched off during the test phase.
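By way of example only, the preprocessing just described could be realised as follows in Python. The hand-rolled 3x3 convolution and the final normalisation are illustrative assumptions; any standard Sobel edge operator followed by 8x8 block averaging would serve equally well.

```python
import numpy as np

def edge_and_downsample(img):
    """256x256 grey-level image -> 3x3 Sobel edge magnitude -> 32x32 input array."""
    img = np.asarray(img, dtype=float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    def conv3(a, k):
        # valid 3x3 convolution via explicit shifts (keeps the sketch dependency-free)
        out = np.zeros((a.shape[0] - 2, a.shape[1] - 2))
        for di in range(3):
            for dj in range(3):
                out += k[di, dj] * a[di:di + out.shape[0], dj:dj + out.shape[1]]
        return out
    gx, gy = conv3(img, kx), conv3(img, ky)
    edges = np.pad(np.hypot(gx, gy), 1)            # restore 256x256
    small = edges.reshape(32, 8, 32, 8).mean(axis=(1, 3))  # average 8x8 blocks
    return small / (small.max() + 1e-12)           # normalise to [0, 1]
```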
Recognition in the network is achieved when the match between the spatial patterns of activity across L1 and L3 exceeds the pre-set threshold level of 0.90 (currently measured by the cosine of the angle between the two multidimensional vectors) and when the time-rate of change of the match (which is measured over four iterations of the network) is below the pre-set steady state threshold level of 0.00005 (i.e., the computational decision is taken at the steady state). The threshold for recognition is determined by first finding the highest level of the match between the three target shapes (which was 0.851), and then setting the threshold above this value. The recognition threshold of 0.9 was thus chosen simply on the basis of requiring a sufficiently high discriminatory power between the target shapes, while providing sufficient flexibility for the cases when a fraction of the object edges are not detected in the cluttered images. This threshold is then set and remains fixed thereafter.
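By way of example only, this recognition rule might be expressed as follows; the thresholds 0.90 and 0.00005 are taken from the description above, while the exact way the rate of change is measured over the four iterations is an assumption (here, the change between the first and last of the last four match values, e.g., as produced by the cosine measure sketched earlier).

```python
def recognised(match_history, vigilance=0.90, steady_state_eps=0.00005):
    """Recognition decision: the current match must exceed the vigilance
    threshold and the change in the match over the last four network
    iterations must be below the steady-state threshold."""
    if len(match_history) < 4:
        return False
    recent = match_history[-4:]
    at_steady_state = abs(recent[-1] - recent[0]) < steady_state_eps
    return recent[-1] >= vigilance and at_steady_state

# Example: a match that has settled just above threshold is accepted.
print(recognised([0.70, 0.89, 0.90001, 0.90002, 0.90003, 0.90004]))  # True
```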
Fig 13 shows some of the complex test images and the recognition results.
To summarise the simulation results, we note that only one object was not recognised (in Test image 11). The reason for this is that a large fraction of the object's shape blends in with the cluttered background and its edges are not detected. The recognition of several objects was just above the threshold level. This is mainly due to the loss of resolution when reducing the size of the original images. However, none of the objects were misclassified.
The following is a description of a simple extension that can be made at the input of the network to increase its robustness against small variations in the input shape relative to the stored shape.
Two Dimensional Sampling of the Input
If we extend layer L1 so that each cell in the layer samples its inputs via a set of synaptic pathways or a 2-D filter (such as a 2-D Gaussian, as shown in Fig 14), then the robustness of the network against small changes in the input shape can be increased dramatically (i.e., the network becomes insensitive to small distortions or small displacement of the input shape relative to the stored shape). This extension is illustrated
in Fig 14 where we show the modified Feedforward Excitation-Feedback Presynaptic Facilitation neural circuit. In Figure 14 field F0 samples the bottom-up inputs via a number of input synapses whose transmitter production rate is modulated by a 2-dimensional Gaussian and whose signal transmission gain (mobilized transmitter) is facilitated by active cells in F1. Postsynaptic feedback from active cells in F0 interacts multiplicatively with the sampled bottom-up inputs to further regulate the synaptic signal transmission gain by the depletion of the stored transmitter in the activated bottom-up pathways. As indicated in Fig 15, the circuit can now search for a particular shape over a much larger input region. The circuit tests for the presence of the relevant 2-dimensional shape within the facilitated input receptive fields. Computer simulations in Fig 16 demonstrate some of the capabilities of the extension on several distorted versions of the same shape (with and without clutter). Figure 16 shows simulation of the extended FFE-FBPF neural circuit on distorted 2-dimensional shapes (with a top-down reference). The data is shown in reverse contrast (black = 1.0, white = 0). The input array is 40x40 elements, while the two neural layers in the circuit are both 34x34 cells in size. Each input shown in the left column was presented to the extended FFE-FBPF circuit for 50 iterations. The resultant steady state spatial patterns across Fields F0 and F1 are shown in the second and the third column respectively. The bracketed numbers in the rightmost column indicate the steady state match between the spatial patterns across Fields F0 and F1 (the required match for recognition is greater than or equal to 0.9800; steady state is assumed when the rate of change of the match falls below 0.0001).
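By way of illustration only, the two-dimensional sampling extension could be sketched as follows in Python. The kernel radius, sigma and the simple multiplicative facilitation are assumptions, and the facilitation pattern is taken to be the same size as the input for brevity; the full circuit additionally includes the transmitter and postsynaptic-feedback dynamics described earlier.

```python
import numpy as np

def gaussian_kernel(radius=3, sigma=1.5):
    """2-D Gaussian weighting over a cell's input receptive field."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def sample_with_receptive_fields(input_array, facilitation, radius=3, sigma=1.5, boost=10.0):
    """Each F0 cell samples its bottom-up input through a Gaussian-weighted
    receptive field whose gain is raised where the F1 facilitation pattern is
    active, making the circuit tolerant to small distortions or displacements
    of the input shape relative to the stored shape."""
    k = gaussian_kernel(radius, sigma)
    inp = np.asarray(input_array, dtype=float)
    fac = np.asarray(facilitation, dtype=float)
    h, w = inp.shape
    padded = np.pad(inp, radius)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = (k * patch).sum() * (1.0 + boost * fac[i, j])
    return out
```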
Throughout this specification various indications are given as to the scope of the invention. However, the invention is not limited to any one of these and may reside in two or more of them combined together. The purpose of the description is for illustration only and not for limitation.