Detailed Description
The present disclosure will now be described with reference to the accompanying drawings, in which preferred example embodiments of the disclosure are shown. However, the present disclosure may be embodied in other forms and should not be construed as limited to the embodiments set forth herein. The disclosed embodiments are provided to fully convey the scope of the disclosure to the skilled artisan.
Terminology
Hereinafter referred to as "nodes". The term "node" may refer to a neuron, such as a neuron of an artificial neural network, another processing element of a network of processing elements, such as a processor, or a combination thereof. Thus, the term "network" (NW) may refer to an artificial neural network, a network of processing elements, or a combination thereof.
Hereinafter referred to as "processing unit". The processing unit may also be referred to as a synapse, such as an input unit (with processing unit) for a node. However, in some embodiments, the processing unit is a (general purpose) processing unit (other than a synapse) associated with (connected to, connectable to, or included in) a node of the NW, or a (general purpose) processing unit located between two different nodes of the NW.
Hereinafter referred to as "context". Context is the environment or situation involved. The context is related to the type of (input) data that is intended, e.g. different types of tasks, where each different task has its own context. As an example, if the system input is a pixel from an image sensor, and the image sensor is exposed to different lighting conditions, each different lighting condition may be a different context of an object imaged by the image sensor, such as a ball, car, or tree. As another example, if the system input is an audio frequency band from one or more microphones, each different speaker may be a different context of phonemes present in the one or more audio frequency bands.
Hereinafter referred to as "measurable". The term "measurable" should be interpreted as something that can be measured or detected, i.e. something that is detectable. The terms "measurement" and "sensing" should be interpreted as synonyms.
Hereinafter referred to as "entities". The term entity is to be interpreted as an entity, such as a physical entity or a more abstract entity, such as a financial entity, e.g. one or more financial data sets. The term "physical entity" will be interpreted as an entity having physical presence, such as an object, a feature (of an object), a gesture, an applied pressure, a speaker, a spoken letter, a syllable, a phoneme, a word or a phrase.
Hereinafter referred to as "update unit". The update unit may be an update module or an update object.
Embodiments will be described below, wherein FIG. 1 is a schematic block diagram of a data processing system 100 in accordance with some embodiments, and FIG. 2 is a schematic block diagram of a data processing system 100 in accordance with some other embodiments. In some embodiments, the data processing system 100 is a network (NW), or the data processing system 100 includes an NW. In some embodiments, the data processing system 100 is or includes a deep neural network, a deep belief network, a deep reinforcement learning system, a recurrent neural network, or a convolutional neural network.
The data processing system 100 has or is configured with one or more system inputs 110a, 110b, …, 110z. One or more of the system inputs 110a, 110b, …, 110z comprise data to be processed. The data may be multidimensional, e.g., a plurality of signals provided in parallel. In some embodiments, the system inputs 110a, 110b, …, 110z include or consist of time-continuous data. In some embodiments, the data to be processed includes data from sensors, such as image sensors, touch sensors, and/or sound sensors (e.g., microphones). Further, in some embodiments, one or more of the system inputs 110a, 110b, …, 110z include sensor data for a plurality of contexts/tasks, e.g., when the data processing system 100 is in a learning mode and/or when the data processing system 100 is in an execution mode. That is, in some embodiments, the data processing system 100 has both an execution mode and a learning mode.
Further, the data processing system 100 has or is configured with a system output 120. The data processing system 100 includes a network (NW) 130. The NW 130 includes a plurality of nodes 130a, 130b, …, 130x. Each node 130a, 130b, …, 130x has or is configured with a plurality of inputs 132a, 132b, …, 132y. In some embodiments, at least one of the plurality of inputs 132a, 132b, …, 132y is a system input 110a, 110b, …, 110z. Further, in some embodiments, all of the system inputs 110a, 110b, …, 110z are used as inputs 132a, 132b, …, 132y to one or more of the nodes 130a, 130b, …, 130x. Further, in some embodiments, each of the nodes 130a, 130b, …, 130x has one or more of the system inputs 110a, 110b, …, 110z as inputs 132a, 132b, …, 132y. Each node 130a, 130b, …, 130x has or includes a weight wa, wb, …, wy for each input 132a, 132b, …, 132y, i.e., each input 132a, 132b, …, 132y is associated with a respective weight wa, wb, …, wy. In some embodiments, each weight wa, wb, …, wy has a value in the range from 0 to 1. Further, the NW 130, or each node thereof, generates or is configured to generate an output 134a, 134b, …, 134x. In some embodiments, each node 130a, 130b, …, 130x calculates a combination of its inputs 132a, 132b, …, 132y multiplied by the respective weights wa, wb, …, wy, such as a (linear) sum, a sum of squares, or an average, to produce the output 134a, 134b, …, 134x.
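For illustration only, the paragraph above can be summarized in a short Python sketch of how a single node may combine its weighted inputs into an output; all names (node_output, mode) and the choice of NumPy are assumptions of this sketch, not elements of the disclosure.

```python
import numpy as np

def node_output(inputs: np.ndarray, weights: np.ndarray, mode: str = "sum") -> float:
    """Combine the inputs 132a..132y of one node, each multiplied by its
    respective weight wa..wy, into the node output 134a..134x."""
    weighted = inputs * weights          # element-wise: input times its weight
    if mode == "sum":                    # (linear) sum
        return float(np.sum(weighted))
    if mode == "square_sum":             # sum of squares
        return float(np.sum(weighted ** 2))
    if mode == "average":                # average
        return float(np.mean(weighted))
    raise ValueError(f"unknown mode: {mode}")

# Example: a node with three inputs and weights in the range 0 to 1.
x = np.array([0.2, 0.9, 0.5])
w = np.array([0.7, 0.1, 0.4])
print(node_output(x, w))                 # -> 0.43
```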
The data processing system 100 comprises one or more update units 150 configured to update the weights wa, …, wy of each node during the learning mode based on the correlation of each respective input 132a, …, 132c of the node (e.g., 130a) with the corresponding output (e.g., 134a), i.e., with the output of the same node (e.g., 130a). In some embodiments, the weights are not updated during the execution mode. In one example, the updating of the weights wa, wb, wc is based on the correlation of each respective input 132a, …, 132c to a node 130a with the combined activation of all inputs 132a, …, 132c to that node 130a, i.e., the correlation of each respective input 132a, …, 132c to a node 130a with the output 134a of that node 130a (node 130a is an example; the same applies to all other nodes 130b, …, 130x). Thus, the correlation (value) between the first input 132a and the respective output 134a is calculated, the correlation (value) between the second input 132b and the respective output 134a is calculated, and the correlation (value) between the third input 132c and the respective output 134a is calculated. In some embodiments, the different calculated correlation values (or series of values) are compared to each other, and the weights are updated based on the comparison. In some embodiments, updating the weights wa, …, wy of each node based on the correlation of each respective input (e.g., 132a, …, 132c) of the node (e.g., 130a) with the corresponding output (e.g., 134a) includes evaluating each input (e.g., 132a, …, 132c) of the node (e.g., 130a) based on a scoring function. The scoring function gives an indication of how useful each input (e.g., 132a, …, 132c) of a node (e.g., 130a) is in space, e.g., for the corresponding output (e.g., 134a) compared to the other inputs (e.g., 132a, …, 132c) to the node, and/or in time, e.g., as the data processing system 100 processes the input (e.g., 132a). As described above, the updating of the weights wa, …, wy of each node is based on the correlation of each respective input 132a, …, 132c of the node (e.g., 130a) with the corresponding output (e.g., 134a), i.e., with the output of (only) the same node. Thus, the updating of the weights of each node is independent of the updating/learning in the other nodes, i.e., each node learns independently.
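As a non-limiting illustration of the correlation-based, per-node update described above, the following Python sketch computes one correlation value per input against the node's own output and nudges the weights from a comparison of those values; the Pearson correlation, the mean comparison, and all names are assumptions of this sketch.

```python
import numpy as np

def correlation_scores(input_traces: np.ndarray, output_trace: np.ndarray) -> np.ndarray:
    """input_traces: (n_inputs, T) samples of the inputs 132a..132c;
    output_trace: (T,) samples of the same node's output 134a.
    Returns one correlation value per input."""
    x = input_traces - input_traces.mean(axis=1, keepdims=True)
    y = output_trace - output_trace.mean()
    return (x @ y) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y) + 1e-12)

def update_weights(weights, input_traces, output_trace, gain=0.01):
    # Inputs correlating above the node average are strengthened, the rest
    # weakened -- one possible reading of the comparison step above. The
    # update uses only this node's own output, so each node learns independently.
    c = correlation_scores(input_traces, output_trace)
    return np.clip(weights + gain * (c - c.mean()), 0.0, 1.0)
```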
Further, the data processing system 100 includes one or more processing units 140x configured to receive a processing unit input 142x and configured to generate a processing unit output 144x by changing the sign of the received processing unit input 142x. In some embodiments, the sign of the received processing unit input 142x is changed by multiplying the processing unit input 142x by −1. However, in other embodiments, the sign of the received processing unit input 142x is changed by phase-shifting the received processing unit input 142x by 180 degrees, or by inverting the sign of the received processing unit input 142x, for example from positive to negative or from negative to positive. The system output 120 includes the output 134a, 134b, …, 134x of each node 130a, 130b, …, 130x. In some embodiments, the system output 120 is an array of the outputs 134a, 134b, …, 134x. Further, in some embodiments, the system output 120 is used, while in the execution mode, to identify one or more entities or measurable characteristics (or properties) thereof, e.g., from sensor data.
In some embodiments, the NW 130 includes only a first set 160 of the plurality of nodes 130a, 130b, …, 130x (as shown in FIG. 1). However, in some embodiments, the NW 130 includes a first set 160 of the plurality of nodes 130a, 130b, …, 130x and a second set 162 of the plurality of nodes 130a, 130b, …, 130x (as shown in FIG. 2). Each node (e.g., 130a, 130b) in the first set 160 of the plurality of nodes (i.e., the excitatory nodes) is configured to excite one or more other nodes (e.g., 130x) of the plurality of nodes 130a, 130b, …, 130x by providing the output (e.g., 134a, 134b) of each node (e.g., 130a, 130b) in the first set 160 of nodes (directly) as an input (132d, …, 132y) to the one or more other nodes (e.g., 130x) of the plurality of nodes 130a, 130b, …, 130x, such as to all other nodes 130b, …, 130x.
Further, the nodes (e.g., 130x) in the second set 162 of the plurality of nodes (i.e., the inhibitory nodes) are configured to inhibit one or more other nodes 130a, 130b, …, such as all other nodes of the plurality of nodes 130a, 130b, …, 130x, by providing the output (e.g., 134x) of each node (e.g., 130x) in the second set 162 as a processing unit input 142x to a respective processing unit (e.g., 140x), each respective processing unit (e.g., 140x) being configured to provide a processing unit output 144x as an input (e.g., 132b, 132e) to the one or more other nodes (e.g., 130a, 130b). Each node of the plurality of nodes 130a, 130b, …, 130x belongs to one of the first and second sets (160, 162) of nodes. Further, as described above, in some embodiments, all nodes 130a, 130b, …, 130x belong to the first set 160 of nodes. In some embodiments, each node 130a, 130b, …, 130x is configured to inhibit or excite some or all of the other nodes 130b, …, 130x of the plurality of nodes 130a, 130b, …, 130x by multiplying its output 134a, 134b, …, 134x by −1 or by providing it directly as an input 132d, …, 132y to the one or more other nodes 130b, …, 130x. By configuring some of the nodes to inhibit other nodes and others of the nodes to excite other nodes, and by performing the updates based on the correlation during the learning mode, a more efficient network may be provided, e.g., the utilization of the available network capacity may be maximized, thereby providing a more efficient data processing system.
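A minimal sketch of the excitatory/inhibitory wiring described above follows; modeling the sign-inverting processing unit as multiplication by −1 and the mask-based propagation are assumptions of this sketch.

```python
import numpy as np

def processing_unit(x):
    """Processing unit 140x: the output 144x is the sign-inverted input 142x."""
    return -1.0 * x

def propagate(outputs: np.ndarray, excitatory: np.ndarray) -> np.ndarray:
    """outputs: outputs 134a..134x of all nodes; excitatory: True for nodes
    in the first set 160, False for nodes in the second set 162. Returns the
    signal each node contributes as input to the other nodes: excitatory
    outputs pass unchanged, inhibitory outputs pass through a processing unit."""
    return np.where(excitatory, outputs, processing_unit(outputs))

# Example: the first two nodes excite, the third inhibits.
print(propagate(np.array([0.4, 0.7, 0.6]), np.array([True, True, False])))
```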
In some embodiments, the update unit 150 comprises a probability value Pa, …, Py for each weight wa, …, wy for increasing the weight (and possibly a probability value Pad, …, Pyd for decreasing the weight; in some embodiments, the decrease probability is 1-Pa, …, 1-Py, i.e., Pad = 1-Pa, Pbd = 1-Pb, etc.). In some embodiments, the update unit 150 comprises a look-up table (LUT) for storing the probability values Pa, …, Py. During the learning mode, the data processing system 100 is configured to limit the ability of a node (e.g., 130a) to inhibit or excite one or more other nodes (e.g., 130b, …, 130x) by: providing a first set point for the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, …, 132y) to the one or more other nodes (e.g., 130b, …, 130x); comparing the first set point to the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, …, 132y) to the one or more other nodes (e.g., 130b, …, 130x); decreasing the probability values (e.g., Pd, Py) associated with the weights (e.g., wd, wy) if the first set point is less than the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, …, 132y) to the one or more other nodes (e.g., 130b, …, 130x); and increasing the probability values (e.g., Pd, Py) associated with the weights (e.g., wd, wy) if the first set point is greater than the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, …, 132y) to the one or more other nodes (e.g., 130b, …, 130x).
Further, in some embodiments, during the learning mode, the data processing system 100 is configured to limit the ability of a system input (e.g., 110z) to inhibit or excite one or more nodes (e.g., 130b, 130x) by: providing the first set point for the sum of all weights (e.g., wg, wx) associated with the inputs (e.g., 132g, 132x) to the one or more nodes (e.g., 130b, 130x); comparing the first set point to the sum of all weights (e.g., wg, wx) associated with the inputs (e.g., 132g, 132x) to the one or more nodes (e.g., 130b, 130x); decreasing the probability values (e.g., Pg, Px) associated with the weights (e.g., wg, wx) if the first set point is less than the sum of all weights (e.g., wg, wx) associated with the inputs (e.g., 132g, 132x) to the one or more nodes (e.g., 130b, 130x); and increasing the probability values (e.g., Pg, Px) associated with the weights (e.g., wg, wx) if the first set point is greater than the sum of all weights (e.g., wg, wx) associated with the inputs (e.g., 132g, 132x) to the one or more nodes (e.g., 130b, 130x).
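The two limiting mechanisms above share one compare-and-nudge pattern, sketched below for illustration; the step size and all names are assumptions of this sketch, and the same function applies whether the fan-out belongs to a node or to a system input.

```python
import numpy as np

def limit_fan_out(weights: np.ndarray, probs: np.ndarray,
                  first_set_point: float, step: float = 0.01) -> np.ndarray:
    """weights: all weights (e.g., wd, wy) fed by one node or system input;
    probs: the probabilities (e.g., Pd, Py) of increasing those weights."""
    total = weights.sum()
    if first_set_point < total:      # fan-out too strong: increases become less likely
        probs = probs - step
    elif first_set_point > total:    # fan-out too weak: increases become more likely
        probs = probs + step
    return np.clip(probs, 0.0, 1.0)
```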
Further, in some embodiments, each input (e.g., 132d, 132y) to the one or more other nodes (e.g., 130b, 130x) has coordinates in a network space, and the amount by which the weight (e.g., wd, wy) of an input (e.g., 132d, 132y) to the one or more other nodes (e.g., 130b, 130x) is decreased or increased is based on the distance between the coordinates in the network space of the inputs (e.g., 132d, 132y) associated with the weights (e.g., wd, wy). In these embodiments, the decrease or increase of a weight is thus based both on the probability of decreasing or increasing the weight (indicated by the probability value) and on the amount by which the weight is decreased or increased (calculated from the distance between the coordinates of the inputs in the network space).
In some embodiments, the data processing system 100 is (further) configured to set a weight wa, …, wy (e.g., any one or more of the weights) to zero if the weight wa, …, wy has not increased within a (first) preset time period. Moreover, in some embodiments, the data processing system 100 is (further) configured to increase the probability values Pa, …, Py of weights wa, …, wy having a zero value if the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, 132y) to the one or more other nodes (e.g., 130b, 130x) does not exceed the first set point within a (second) preset time period.
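For illustration, the following sketch combines the probabilistic application of an update with the distance-based change amount and the zeroing of stale weights described above; reading the distance as the distance from each input's coordinates to the receiving node's coordinates is an assumption of this sketch, as are all names.

```python
import numpy as np

rng = np.random.default_rng(0)

def change_amount(input_coords, node_coords, base_step=0.05):
    # Assumed interpretation: the change amount falls off with the distance
    # in network space between an input and the node that receives it.
    dist = np.linalg.norm(input_coords - node_coords, axis=1)
    return base_step / (1.0 + dist)

def probabilistic_update(weights, probs, input_coords, node_coords):
    # Each weight is increased with probability P and decreased otherwise.
    amount = change_amount(input_coords, node_coords)
    increase = rng.random(weights.shape) < probs
    return np.clip(np.where(increase, weights + amount, weights - amount), 0.0, 1.0)

def prune_stale(weights, steps_since_increase, max_steps):
    # A weight that has not increased within the preset period is set to zero.
    return np.where(steps_since_increase > max_steps, 0.0, weights)
```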
In some embodiments, during the learning mode, the data processing system 100 is configured to increase the relevance of the output (e.g., 134a) of a node (e.g., 130a) to one or more other nodes (e.g., 130b, 130x) by: providing a first set point for the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, 132y) to the one or more other nodes (e.g., 130b, 130x); comparing the first set point to the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, 132y) to the one or more other nodes (e.g., 130b, 130x) over a first time period; increasing the probability of changing the weights (e.g., wa, wb, wc) of the inputs (e.g., 132a, 132b, 132c) to the node (e.g., 130a) if the first set point is greater than the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, 132y) to the one or more other nodes (e.g., 130b, 130x) over the entire length of the first time period; and decreasing the probability of changing the weights (e.g., wa, wb, wc) of the inputs (e.g., 132a, 132b, 132c) to the node (e.g., 130a) if the first set point is less than the sum of all weights (e.g., wd, wy) associated with the inputs (e.g., 132d, 132y) to the one or more other nodes (e.g., 130b, 130x) over the entire length of the first time period. In this way, a node whose output is little used by the other nodes is encouraged to change its input weights until its output becomes relevant to the other nodes.
Furthermore, in some embodiments, the update unit 150 comprises a probability value Pa, …, Py for each weight wa, …, wy for increasing the weight (and possibly a probability value Pad, …, Pyd for decreasing the weight; in some embodiments, the decrease probability is 1-Pa, …, 1-Py, i.e., Pad = 1-Pa, Pbd = 1-Pb, etc.). In these embodiments, during the learning mode, the data processing system 100 is configured to provide a second set point for the sum of all weights wa, wb, wc associated with the inputs 132a, 132b, 132c to a node 130a, to calculate the sum of all weights wa, wb, wc associated with the inputs 132a, 132b, 132c to the node 130a, to compare the calculated sum with the second set point, and, if the calculated sum is greater than the second set point, to decrease the probability values Pa, Pb, Pc associated with the weights wa, wb, wc associated with the inputs 132a, 132b, 132c to the node 130a, and, if the calculated sum is less than the second set point, to increase the probability values Pa, Pb, Pc associated with the weights wa, wb, wc associated with the inputs 132a, 132b, 132c to the node 130a (node 130a is an example; the same applies to all other nodes 130b, …, 130x).
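This incoming-side ("second set point") normalization mirrors the fan-out limit sketched earlier, only over the weights entering one node; a minimal sketch under the same assumptions:

```python
import numpy as np

def normalize_fan_in(node_weights: np.ndarray, node_probs: np.ndarray,
                     second_set_point: float, step: float = 0.01) -> np.ndarray:
    """node_weights: weights wa, wb, wc of the inputs 132a..132c to one node;
    node_probs: the increase probabilities Pa, Pb, Pc of those weights."""
    total = node_weights.sum()
    if total > second_set_point:     # too much incoming weight: damp increases
        node_probs = node_probs - step
    elif total < second_set_point:   # too little incoming weight: favor increases
        node_probs = node_probs + step
    return np.clip(node_probs, 0.0, 1.0)
```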
Further, in some embodiments, during the learning mode, the data processing system 100 is configured to detect whether the network 130 is sparsely connected by comparing a cumulative weight change of the one or more system inputs 110a, 110b, …, 110z over a second time period to a threshold. The cumulative weight change refers to the change in the weights wa, wf, wg, wx associated with the one or more system inputs 110a, 110b, …, 110z over the second time period. The second time period may be a predetermined time period. If the cumulative weight change is greater than the threshold, the network 130 is determined to be sparsely connected. Further, the data processing system 100 is configured to: if the data processing system 100 detects that the network 130 is sparsely connected, increase the output 134a, 134b, …, 134x of one or more of the plurality of nodes 130a, 130b, …, 130x by adding a predetermined waveform to the output 134a, 134b, …, 134x of the one or more of the plurality of nodes 130a, 130b, …, 130x for the duration of a third time period. The third time period may be a predetermined time period. By adding a predetermined waveform to the outputs 134a, 134b, …, 134x of one or more of the plurality of nodes 130a, 130b, …, 130x for the duration of the third time period, the nodes may be better grouped together.
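A minimal sketch of this sparsity check and the corrective waveform follows; the sinusoidal waveform and all names are assumptions of this sketch.

```python
import numpy as np

def is_sparsely_connected(weight_history: np.ndarray, threshold: float) -> bool:
    """weight_history: (T, n) samples over the second time period of the
    weights associated with the system inputs. Per the text above, a
    cumulative change greater than the threshold indicates sparse connectivity."""
    return float(np.abs(np.diff(weight_history, axis=0)).sum()) > threshold

def add_waveform(outputs: np.ndarray, t: np.ndarray,
                 amplitude: float = 0.1, freq_hz: float = 5.0) -> np.ndarray:
    """outputs: (T, n_nodes) node outputs over the third time period; t: (T,)
    time stamps. Adds a predetermined waveform (a sine is assumed here)."""
    return outputs + amplitude * np.sin(2.0 * np.pi * freq_hz * t)[:, None]
```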
Furthermore, in some embodiments, each node includes an update unit 150. Each update unit 150 is configured to update the weights wa, wb, wc of a respective node 130a based on the correlation of each respective input 132a, …, 132c of that node 130a with the output 134a of that node 130a. Furthermore, in order to update the weights wa, wb, wc during the learning mode, each update unit 150 is configured to apply a first function to the correlation if the associated node belongs to the first set 160 of the plurality of nodes, and to apply a second function, different from the first function, to the correlation if the associated node belongs to the second set 162 of the plurality of nodes (node 130a is an example; the same applies to all other nodes 130b, …, 130x). In some embodiments, the first (learning) function is a function in which the output, i.e., the weight change (value), increases exponentially if the input, i.e., the correlation (value), increases, and vice versa (a decreasing input results in an exponentially decreasing output). In some embodiments, the second (learning) function is a function in which the output, i.e., the weight change (value), decreases exponentially if the input, i.e., the correlation (value), increases, and vice versa (a decreasing input results in an exponentially increasing output).
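For illustration, the two exponential learning functions described above may be sketched as follows; the gain and rate constants are assumptions of this sketch.

```python
import numpy as np

def first_function(correlation, gain=0.01, rate=4.0):
    # First set 160: the weight change grows exponentially with the correlation.
    return gain * np.exp(rate * correlation)

def second_function(correlation, gain=0.01, rate=4.0):
    # Second set 162: the weight change shrinks exponentially as the correlation grows.
    return gain * np.exp(-rate * correlation)

def weight_change(correlation, in_first_set: bool):
    return first_function(correlation) if in_first_set else second_function(correlation)
```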
In some embodiments, the data processing system 100 is configured to calculate the overall variance of the outputs 134a, 134b, …, 134x of the nodes 130a, 130b, …, 130x of the network after the updating of the weights wa, …, wy has been performed, to compare the calculated overall variance to a power law, and to minimize the error between the overall variance and the power law, such as the mean absolute error or the mean squared error, by adjusting parameters of the network. Thus, the overall variance of the outputs 134a, 134b, …, 134x of the nodes 130a, 130b, …, 130x of the network may approximate a power-law distribution. Hereby, an optimal resource utilization is achieved and/or each node is allowed to contribute in an optimal way, thereby providing a more efficient data utilization. The power law may, for example, relate the logarithm of the amount of variance to the logarithm of the component number resulting from a principal component analysis. In another example, the power law is based on a principal component analysis of finite-time vectors of the activity/output across all neurons, with each principal component score on the abscissa being replaced by a node number. It is assumed that the input data to which the system is exposed has a greater number of principal components than there are nodes. In this case, each node added to the system potentially expands the maximum capacity of the system when following the power law. Examples of (adjustable) parameters of the network include: the type of scaling learned (how the weights are composed, the range of the weights, etc.), the induced change in synaptic weights at an update (e.g., exponential or linear), the amount of gain in learning, one or more time constants of the state memory of the or each node, the specific learning function (e.g., the first function and/or the second function), the transfer function of each node, the total capacity of the connection between the nodes and the sensor, and the total capacity of the nodes across all nodes.
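A minimal sketch of the power-law comparison follows: a principal component analysis of a window of node outputs yields a variance per component, which is compared against an assumed power law c·n^(−α) using the mean absolute error; the specific fit target and all names are assumptions of this sketch.

```python
import numpy as np

def variance_spectrum(outputs: np.ndarray) -> np.ndarray:
    """outputs: (T, n_nodes) finite-time window of the node outputs 134a..134x.
    Returns the variance along each principal component, largest first."""
    centered = outputs - outputs.mean(axis=0)
    cov = centered.T @ centered / (len(outputs) - 1)
    return np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)

def power_law_error(variances: np.ndarray, c: float, alpha: float) -> float:
    """Mean absolute error between the variance spectrum and c * n**(-alpha),
    with the component index n on the abscissa (replaced by a node number
    in the variant described above)."""
    n = np.arange(1, len(variances) + 1)
    return float(np.mean(np.abs(variances - c * n ** (-alpha))))
```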
Further, in some embodiments, the data processing system 100 is configured to learn from the sensor data to identify one or more (unidentified) entities or (unidentified) measurable characteristics (or properties) thereof while in the learning mode, and thereafter to identify the one or more entities or their measurable characteristics (or properties) while in the execution mode, e.g., from the sensor data. In some embodiments, the identified entities are one or more of speakers, spoken letters, syllables, phonemes, words, or phrases present in the (audio) sensor data. Alternatively or additionally, the identified entities are one or more objects or one or more features (e.g., pixels) of objects present in the sensor data. As a further alternative, or in addition, the identified entity is a new contact event present in the (touch) sensor data, the end of a contact event, a gesture, or an applied pressure. Although in some embodiments all sensor data is of a particular type, such as audio sensor data, image sensor data, or touch sensor data, in other embodiments the sensor data is a mix of different types of sensor data, such as audio sensor data, image sensor data, and touch sensor data, i.e., the sensor data includes different modalities. In some embodiments, the data processing system 100 is configured to learn from the sensor data to identify a measurable characteristic (or property) of an entity. The measurable characteristic may be a feature of an object, a portion of a feature, a time-evolving trajectory of a location, a trajectory of an applied pressure, or a frequency feature or time-evolving frequency feature of a certain speaker when speaking a certain letter, syllable, phoneme, word, or phrase. Such measurable characteristics may then be mapped to an entity. For example, a feature of an object may be mapped to the object, a portion of a feature may be mapped to the feature (of the object), a trajectory of locations may be mapped to a gesture, a trajectory of applied pressure may be mapped to a (maximum) applied pressure, a frequency feature of a certain speaker may be mapped to the speaker, and a spoken letter, syllable, phoneme, word, or phrase may be mapped to the actual letter, syllable, phoneme, word, or phrase. Such a mapping may simply be a lookup in a memory, a look-up table, or a database. The lookup may be based on finding the entity, among a plurality of physical entities, having the characteristic closest to the identified measurable characteristic. From this lookup, the actual entity can be identified. Furthermore, the data processing system 100 may be used in a warehouse, e.g., as part of a fully automated warehouse (machine); in a robot, e.g., connected to a robot actuator (or robot control circuitry) via middleware (for connecting the data processing system 100 to the actuator); or in a system with a low-complexity event-based camera, whereby trigger data from the event-based camera may be fed/sent directly to the data processing system 100.
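The mapping from an identified measurable characteristic to an actual entity may be as simple as a nearest-neighbour lookup, as this illustrative sketch shows; the reference table, the Euclidean distance, and all names are assumptions of this sketch.

```python
import numpy as np

def identify_entity(characteristic: np.ndarray, reference_table: dict) -> str:
    """Return the entity whose stored characteristic lies closest to the
    identified measurable characteristic."""
    names = list(reference_table)
    refs = np.stack([reference_table[name] for name in names])
    distances = np.linalg.norm(refs - characteristic, axis=1)
    return names[int(np.argmin(distances))]

# Example: map a two-component frequency feature to a speaker.
table = {"speaker_A": np.array([0.8, 0.1]), "speaker_B": np.array([0.2, 0.9])}
print(identify_entity(np.array([0.75, 0.2]), table))   # -> "speaker_A"
```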
FIG. 3 is a flowchart illustrating example method steps according to some embodiments. FIG. 3 illustrates a computer-implemented or hardware-implemented method 300 for processing data. The method may be implemented in analog hardware/electronic circuitry, in digital circuitry such as gates and flip-flops, in mixed-signal circuitry, in software, or in any combination thereof. In some embodiments, the method 300 includes entering a learning mode. Alternatively, the method 300 includes providing a data processing system 100 that has already been trained; in this case, steps 370 and 380 (steps g and h) are not performed. The method 300 includes receiving 310 one or more system inputs 110a, 110b, …, 110z comprising data to be processed. Further, the method 300 includes providing 320 a plurality of inputs 132a, 132b, …, 132y to the network NW 130 comprising the plurality of first nodes 130a, 130b, …, 130x, at least one of the plurality of inputs being a system input. Further, the method 300 includes receiving 330 an output 134a, 134b, …, 134x from each first node 130a, 130b, …, 130x. The method 300 includes providing 340 a system output 120 comprising the output 134a, 134b, …, 134x of each first node 130a, 130b, …, 130x. Further, the method 300 includes exciting 350, by the nodes 130a, 130b in the first set 160 of nodes, one or more other nodes …, 130x of the plurality of nodes 130a, 130b, …, 130x by providing the output 134a, 134b of each node 130a, 130b in the first set 160 of nodes as an input 132d, …, 132y to the one or more other nodes …, 130x of the plurality of nodes 130a, 130b, …, 130x. Further, the method 300 includes suppressing 360, by the nodes 130x in the second set 162 of the plurality of nodes, one or more other nodes 130a, 130b, … of the plurality of nodes 130a, 130b, …, 130x by providing the output 134x of each node 130x in the second set 162 as a processing unit input 142x to a respective processing unit 140x, each respective processing unit 140x being configured to provide a processing unit output 144x as an input 132b, 132e, … to the one or more other nodes 130a, 130b, …. The method 300 includes updating 370 the weights wa, …, wy by the one or more update units 150 based on the correlations (during the learning mode, as described above in connection with FIGS. 1 and 2). Further, the method 300 includes repeating 380 (during the learning mode) steps 310, 320, 330, 340, 350, 360, and 370 (as described above) until a learning criterion is met (thus exiting the learning mode when the learning criterion is met). In some embodiments, the learning criterion is that the data processing system 100 is fully trained. In some embodiments, the learning criterion is that the weights wa, wb, …, wy converge and/or that the total error is below an error threshold. In some embodiments, the method 300 includes entering an execution/recognition mode. Further, the method 300 includes repeating 390 (during the execution/recognition mode) steps 310, 320, 330, 340, 350, and 360 (as described above) until a stopping criterion is met (thus exiting the execution/recognition mode when the stopping criterion is met). The stopping criterion/condition may be that all pending data has been processed or that a certain amount of data/a certain number of cycles has been processed/executed. Alternatively, the stopping criterion is that the entire data processing system 100 is shut down.
As another alternative, the stopping criterion is that the data processing system 100 (or a user of the system 100) has found that further training of the data processing system 100 is required. In this case, the data processing system 100 enters/re-enters the learning mode (and performs steps 310, 320, 330, 340, 350, 360, 370, 380, and 390). Each node of the plurality of nodes 130a, 130b, …, 130x belongs to one of the first set 160 and the second set 162 of nodes.
In some embodiments, the method 300 includes initializing 304 the weights wa, …, wy by setting the weights wa, …, wy to zero. Alternatively, the method 300 includes initializing 306 the weights wa, …, wy by randomly assigning values between 0 and 1 to the weights wa, …, wy. Further, in some embodiments, the method 300 includes adding 308 a predetermined waveform to the output 134a, 134b, …, 134x of one or more of the plurality of nodes 130a, 130b, …, 130x for the duration of the third time period. In some embodiments, the third time period begins simultaneously with receiving 310 the one or more system inputs 110a, 110b, …, 110z comprising data to be processed.
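For illustration, method 300 may be sketched as the loop below, with trivial stand-ins for steps 310-360 (a weighted-sum pass) and step 370 (a correlation-like update); weight convergence as the learning criterion and a fixed cycle count as the stopping criterion are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_pass(x, w):
    # Stand-in for steps 310-360: one weighted-sum pass through the nodes.
    return x @ w

def update_all(w, x, y, gain=0.01):
    # Stand-in for step 370: strengthen weights of co-active input/output pairs.
    return np.clip(w + gain * np.outer(x, y), 0.0, 1.0)

def run(x, w, learn_cycles=1000, exec_cycles=100, tol=1e-6):
    for _ in range(learn_cycles):                 # learning mode: steps 310-380
        y = forward_pass(x, w)
        new_w = update_all(w, x, y)
        converged = np.max(np.abs(new_w - w)) < tol   # learning criterion
        w = new_w
        if converged:
            break
    for _ in range(exec_cycles):                  # execution mode: steps 310-360, 390
        y = forward_pass(x, w)
    return y, w

x = np.array([0.2, 0.9, 0.5])
w0 = rng.random((3, 2))                           # initialization 306: random in [0, 1]
outputs, weights = run(x, w0)
```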
According to some embodiments, a computer program product includes a non-transitory computer-readable medium 400, such as a Universal Serial Bus (USB) memory, a plug-in card, an embedded drive, a Digital Versatile Disc (DVD), or a Read-Only Memory (ROM). FIG. 4 illustrates an example computer-readable medium in the form of a Compact Disc (CD) ROM 400. The computer-readable medium has stored thereon a computer program comprising program instructions. The computer program may be loaded into a data processing unit (PROC) 420, which may, for example, be comprised in a computer or computing device 410. When loaded into the data processing unit, the computer program may be stored in a memory (MEM) 430 associated with or included in the data processing unit. According to some embodiments, the computer program may, when loaded into and executed by the data processing unit, perform the method steps of the method shown in FIG. 3. Furthermore, in some embodiments, a computer program product is provided comprising instructions which, when executed on at least one processor of a processing device, cause the processing device to perform the method shown in FIG. 3. Further, in some embodiments, a non-transitory computer-readable storage medium storing one or more programs configured for execution by one or more processors of a processing device is provided, the one or more programs comprising instructions which, when executed by the processing device, cause the processing device to perform the method shown in FIG. 3.
FIG. 5 illustrates an update unit according to some embodiments. The update unit 150a is for the node 130a; however, all update units 150, 150a (for all nodes) are the same or similar. The update unit 150a receives each respective input 132a, …, 132c of the node 130a (or of all nodes in the case of a central update unit 150). In addition, the update unit 150a receives the output 134a of the node 130a (or, in the case of a central update unit 150, the outputs of all nodes). Further, the update unit 150a includes a correlator 152a. The correlator 152a calculates the correlation of each respective input 132a, …, 132c of the node 130a with the corresponding output 134a during the learning mode, producing one or more correlation values for each input 132a, …, 132c. In some embodiments, the different calculated (series of) correlation values are compared to each other (to produce a correlation ratio), and the updating of the weights is based on the comparison. Furthermore, in some embodiments, in order to update the weights wa, wb, wc during the learning mode, the update unit 150a is configured to apply a first function 154 to the correlations (values, ratios) if the node 130a belongs to the first set 160 of the plurality of nodes, and to apply a second function 156, different from the first function, to the correlations (values, ratios) if the node 130a belongs to the second set 162 of the plurality of nodes. In some embodiments, the update unit 150a keeps track of whether a node belongs to the first set 160 or the second set 162 by means of look-up tables (LUTs). Further, in some embodiments, the update unit 150a includes a probability value Pa, Pb, Pc for each weight wa, wb, wc for increasing the weight. In some embodiments, the update unit 150a includes a probability value Pad, Pbd, Pcd for each weight wa, wb, wc for decreasing the weight (in some embodiments, the decrease probability is 1-Pa, 1-Pb, 1-Pc, i.e., Pad = 1-Pa, Pbd = 1-Pb, and Pcd = 1-Pc). In some embodiments, the probability values Pa, Pb, Pc, and optionally the probability values Pad, Pbd, Pcd, are included in a memory unit 158a of the update unit 150a. In some embodiments, the memory unit 158a is a look-up table (LUT). In some embodiments, the update unit 150a applies one of the first and second functions and/or the probability values Pa, Pb, Pc, and optionally the probability values Pad, Pbd, Pcd, to the calculated (series of) correlation values (or the resulting correlation ratios) to obtain an update signal 159, and then applies the update signal 159 to the weights wa, wb, wc to update the weights wa, wb, wc. The function/structure of the update units 150b, …, 150x for the other nodes 130b, …, 130x is the same as that of the update unit 150a for the node 130a. Further, in some embodiments, a central update unit 150 includes each of the update units for each of the nodes 130a, 130b, …, 130x.
FIG. 6 illustrates a compartment according to some embodiments. In some embodiments, each node 130a, 130b, …, 130x includes a plurality of compartments 900. Each compartment is configured with a plurality of compartment inputs 910a, 910b, …, 910x. Further, each compartment 900 includes a compartment weight 920a, 920b, …, 920x for each compartment input 910a, 910b, …, 910x. Further, each compartment 900 is configured to produce a compartment output 940. In some embodiments, the compartment output 940 is calculated by the compartment as a weighted combination, such as a sum, of all compartment inputs 910a, 910b, …, 910x to the compartment. To calculate the sum/combination, the compartment may be equipped with a summer (adder/summing unit) 935. Each compartment 900 includes an update unit 995 configured to update the compartment weights 920a, 920b, …, 920x based on the correlations during the learning mode (in the same manner as described above for the update unit 150a in connection with FIG. 5 and elsewhere; for one or more compartments, this may include evaluating each input of the node based on a scoring function). Further, the compartment output 940 of each compartment is used to adjust, based on a transfer function, the output 134a, 134b, …, 134x (e.g., 134a) of the node 130a, 130b, …, 130x (e.g., 130a) that includes the compartment 900. The transfer function may, for example, be implemented with a time constant, e.g., by means of one or more of an RC filter, a resistor, a spike generator, and an active element such as a transistor or an operational amplifier. The compartments 900a, 900b, …, 900x may include sub-compartments 900aa, 900ab, …, 900ba, 900bb, …, 900xx. Thus, each compartment 900a, 900b, …, 900x may have sub-compartments 900aa, 900ab, …, 900ba, 900bb, etc. functionally connected to the compartment, i.e., the compartments are cascaded. The number of compartments (and sub-compartments) of a particular node is based on the types of inputs to that particular node, such as inhibitory inputs, sensor inputs, and excitatory inputs. Further, the compartments 900 of a node are arranged such that each compartment 900 has a majority of one type of input (e.g., inhibitory input, sensor input, or excitatory input). Thus, no type of input (e.g., inhibitory input, sensor input, or excitatory input) is allowed to become too dominant.
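A minimal sketch of a compartment follows, with an RC-style first-order filter assumed as the transfer function; the class layout and names are assumptions of this sketch.

```python
import numpy as np

class Compartment:
    """One compartment 900: weights its compartment inputs 910a..910x with
    the compartment weights 920a..920x, sums them (summer 935), and shapes the
    result with a transfer function to produce the compartment output 940."""

    def __init__(self, n_inputs: int, time_constant: float = 10.0):
        self.weights = np.zeros(n_inputs)   # compartment weights 920a..920x
        self.tau = time_constant            # time constant of the transfer function
        self.state = 0.0                    # state memory of the compartment

    def step(self, inputs: np.ndarray) -> float:
        summed = float(self.weights @ inputs)            # weighted sum of inputs
        self.state += (summed - self.state) / self.tau   # RC-style low-pass
        return self.state                                # compartment output 940
```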
In some embodiments, still referring to FIG. 6, the update unit 995 of each compartment 900 includes a probability value PCa, …, PCy for each compartment weight 920a, 920b, …, 920x for increasing the weight (and possibly a probability value PCad, …, PCyd for decreasing the weight; in some embodiments, the decrease probability is 1-PCa, …, 1-PCy, i.e., PCad = 1-PCa, PCbd = 1-PCb, etc.). In these embodiments, during the learning mode, the data processing system 100 is configured to provide a third set point for the sum of all compartment weights 920a, 920b, …, 920x associated with the compartment inputs 910a, 910b, …, 910x to the compartment 900, to calculate the sum of all compartment weights 920a, 920b, …, 920x associated with the compartment inputs 910a, 910b, …, 910x to the compartment 900, to compare the calculated sum with the third set point, and, if the calculated sum is greater than the third set point, to decrease the probability values PCa, …, PCy associated with the compartment weights 920a, 920b, …, 920x of the compartment inputs 910a, 910b, …, 910x to the compartment 900, and, if the calculated sum is less than the third set point, to increase the probability values PCa, …, PCy associated with the compartment weights 920a, 920b, …, 920x of the compartment inputs 910a, 910b, …, 910x to the compartment 900. The third set point is based on the type of input, such as a system input (sensor input), an input from a node in the first set 160 of the plurality of nodes (excitatory input), or an input from a node in the second set 162 of the plurality of nodes (inhibitory input).
In some embodiments, the data processing system 100 is a time-continuous data processing system, i.e., all signals within the data processing system 100, including the signals between different nodes as well as the one or more system inputs 110a, 110b, …, 110z and the system output 120, are time-continuous (e.g., without spikes).
List of embodiments:
Example 1, a data processing system (100) configured with one or more system inputs (110a, 110b, …, 110z) including data to be processed and a system output (120), the data processing system comprising:
A network NW (130) comprising a plurality of nodes (130a, 130b, …, 130x), each node being configured to have a plurality of inputs (132a, 132b, …, 132y), each node (130a, 130b, …, 130x) comprising a weight (wa, …, wy) for each input (132a, 132b, …, 132y), and each node being configured to produce an output (134a, 134b, …, 134x); and
One or more update units (150) configured to update the weights (wa, …, wy) of each node (130a) during a learning mode based on a correlation of each respective input (132a, …, 132c) of the node with the corresponding output (134a);
One or more processing units (140x) configured to receive a processing unit input and configured to generate a processing unit output by changing a sign of the received processing unit input; and
Wherein the system output (120) comprises the output (134a, 134b, …, 134x) of each node (130a, 130b, …, 130x),
Wherein the nodes (130a, 130b) in a first set (160) of the plurality of nodes are configured to excite one or more other nodes (…, 130x) of the plurality of nodes (130a, 130b, …, 130x) by providing an output (134a, 134b) of each node (130a, 130b) in the first set (160) of nodes as an input (132d, …, 132y) to the one or more other nodes (…, 130x),
Wherein a node (130x) in a second set (162) of the plurality of nodes is configured to suppress one or more other nodes (130a, 130b, …) of the plurality of nodes (130a, 130b, …, 130x) by providing the output (134x) of each of the nodes (130x) in the second set (162) as a processing unit input to a respective processing unit (140x), each respective processing unit (140x) being configured to provide the processing unit output as an input (132b, 132e, …) to the one or more other nodes (130a, 130b, …), and
Wherein each node of the plurality of nodes (130a, 130b, …, 130x) belongs to one of the first and second sets (160, 162) of nodes.
Example 2, the data processing system of example 1, wherein the system input includes sensor data for a plurality of contexts/tasks.
Example 3, the data processing system of any of examples 1-2, wherein the update unit (150) includes a probability value (Pa, …, Py) for each weight (wa, …, wy) for increasing the weight, and wherein, during the learning mode, the data processing system is configured to limit the ability of a node (130a) to inhibit or excite the one or more other nodes (130b, …, 130x) by: providing a first set point for a sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x); comparing the first set point with the sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x); decreasing the probability values (Pd, Py) associated with the weights (wd, wy) if the first set point is less than the sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x); and increasing the probability values (Pd, Py) associated with the weights (wd, wy) if the first set point is greater than the sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x).
Example 4, the data processing system of any of examples 1 to 3, wherein, during the learning mode, the data processing system is configured to limit the ability of a system input (110z) to inhibit or excite one or more nodes (130a, …, 130x) by: providing the first set point for a sum of all weights (wg, wx) associated with the inputs (132g, 132x) to the one or more nodes (130a, …, 130x); comparing the first set point with the sum of all weights (wg, wx) associated with the inputs (132g, 132x) to the one or more nodes (130a, …, 130x); decreasing the probability values (Pg, Px) associated with the weights (wg, wx) if the first set point is less than the sum of all weights (wg, wx) associated with the inputs (132g, 132x) to the one or more nodes (130a, …, 130x); and increasing the probability values (Pg, Px) associated with the weights (wg, wx) if the first set point is greater than the sum of all weights (wg, wx) associated with the inputs (132g, 132x) to the one or more nodes (130a, …, 130x).
Example 5, the data processing system of any of examples 3 to 4, wherein each of the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x) has coordinates in a network space, and wherein the decrease/increase of the weight (wd, wy) of an input (132d, 132y) to the one or more other nodes (130b, …, 130x) is based on a distance between the coordinates in the network space of the inputs (132d, 132y) associated with the weights (wd, wy).
Example 6, the data processing system of any of examples 3-5, wherein the system is further configured to set a weight (wa, …, wy) to zero if the weight (wa, …, wy) does not increase within a preset time period; and/or
Wherein the system is further configured to increase the probability value (Pa, …, Py) of a weight (wa, …, wy) having a zero value if the sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x) does not exceed the first set point within a preset time period.
Example 7, the data processing system of any of examples 1-2, wherein, during the learning mode, the data processing system is configured to increase a correlation of the output (134a) of a node (130a) with the one or more other nodes (130b, …, 130x) by: providing a first set point for a sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x); comparing the first set point with the sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x) over a first time period; increasing a probability of changing the weights (wa, wb, wc) of the inputs (132a, 132b, 132c) to the node (130a) if the first set point is greater than the sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x) over the entire length of the first time period; and decreasing the probability of changing the weights (wa, wb, wc) of the inputs (132a, 132b, 132c) to the node (130a) if the first set point is less than the sum of all weights (wd, wy) associated with the inputs (132d, …, 132y) to the one or more other nodes (130b, …, 130x) over the entire length of the first time period.
Example 8, the data processing system of any of examples 1-2, wherein the update unit (150) comprises a probability value (Pa, …, Py) for each weight (wa, …, wy) for increasing the weight, and wherein, during the learning mode, the data processing system is configured to provide a second set point for a sum of all weights (wa, wb, wc) associated with the inputs (132a, 132b, 132c) to a node (130a), to calculate the sum of all weights (wa, wb, wc) associated with the inputs (132a, 132b, 132c) to the node (130a), to compare the calculated sum with the second set point, and, if the calculated sum is greater than the second set point, to decrease the probability values (Pa, Pb, Pc) associated with the weights (wa, wb, wc), and, if the calculated sum is less than the second set point, to increase the probability values (Pa, Pb, Pc) associated with the weights (wa, wb, wc) associated with the inputs (132a, 132b, 132c) to the node (130a).
Example 9, the data processing system of any of examples 1-2, wherein each node (130a, 130b, …, 130x) comprises a plurality of compartments (900), each compartment being configured to have a plurality of compartment inputs (910a, 910b, …, 910x), each compartment (900) comprising a compartment weight (920a, 920b, …, 920x) for each compartment input (910a, 910b, …, 910x), and each compartment (900) being configured to generate a compartment output (940), wherein each compartment (900) comprises an update unit (995) configured to update the compartment weights (920a, 920b, …, 920x) based on a correlation during the learning mode, and wherein the compartment output (940) of each compartment is used to adjust, based on a transfer function, the output (134a, 134b, …, 134x) of the node (130a, 130b, …, 130x) comprising the compartment.
Example 10, the data processing system of example 9, wherein the update unit (995) of each compartment (900) includes a probability value (PCa, …, PCy) for each compartment weight (920a, 920b, …, 920x) for increasing the weight, and wherein, during the learning mode, the data processing system is configured to provide a third set point for a sum of all compartment weights (920a, 920b, …, 920x) associated with the compartment inputs (910a, 910b, …, 910x) to a compartment (900), to calculate the sum of all compartment weights (920a, 920b, …, 920x) associated with the compartment inputs (910a, 910b, …, 910x) to the compartment (900), to compare the calculated sum with the third set point, and, if the calculated sum is greater than the third set point, to decrease the probability values (PCa, …, PCy) associated with the compartment weights (920a, 920b, …, 920x) of the compartment inputs (910a, 910b, …, 910x) to the compartment (900), and, if the calculated sum is less than the third set point, to increase the probability values (PCa, …, PCy) associated with the compartment weights (920a, 920b, …, 920x) of the compartment inputs (910a, 910b, …, 910x) to the compartment (900), and wherein the third set point is based on a type of input, such as a system input, an input from a node in the first set (160) of the plurality of nodes, or an input from a node in the second set (162) of the plurality of nodes.
Example 11, the data processing system of any of examples 1-2, wherein during the learning mode, the data processing system is configured to:
Detecting whether the network (130) is sparsely connected by comparing a cumulative weight change of the system inputs (110a, 110b, …, 110z) over a second time period with a threshold; and
If the data processing system detects a sparse connection of the network (130), increasing the output (134a, 134b, …, 134x) of one or more of the plurality of nodes (130a, 130b, …, 130x) by adding a predetermined waveform to the output (134a, 134b, …, 134x) of the one or more of the plurality of nodes (130a, 130b, …, 130x) for a duration of a third time period.
Example 12, the data processing system of any of examples 1-11, wherein each node comprises an update unit (150), wherein each update unit (150) is configured to update the weight (wa, wb, wc) of each respective input (132a, …, 132c) of the node (130a) based on a correlation of the respective input with the output (134a) of the node (130a), and wherein, in order to update the weights (wa, wb, wc) during the learning mode, each update unit (150) is configured to apply a first function to the correlation if the associated node belongs to the first set (160) of the plurality of nodes, and to apply a second function, different from the first function, to the correlation if the associated node belongs to the second set (162) of the plurality of nodes.
Example 13, the data processing system of any of examples 1-12, wherein the data processing system is configured to: after the updating of the weights (wa, …, wy) has been performed, calculate an overall variance of the outputs (134a, 134b, …, 134x) of the nodes (130a, 130b, …, 130x) of the network; compare the calculated overall variance with a power law; and minimize an error, such as a mean absolute error or a mean squared error, between the overall variance and the power law by adjusting parameters of the network.
Example 14, the data processing system of any of examples 2-13, wherein the data processing system is configured to learn from the sensor data to identify one or more entities when in a learning mode, and thereafter configured to identify the one or more entities when in an execution mode, and wherein the identified entities are one or more of: a speaker, spoken letter, syllable, phoneme, word or phrase present in the sensor data, or an object or feature of an object present in the sensor data, or a new contact event present in the sensor data, an end of a contact event, a gesture or an applied pressure.
Example 15, a computer-implemented or hardware-implemented method (300) for processing data, comprising:
a) Receiving (310) one or more system inputs (110a, 110b, …, 110z) comprising data to be processed;
b) Providing (320) a plurality of inputs (132a, 132b, …, 132y) to a network NW (130) comprising a plurality of first nodes (130a, 130b, …, 130x), at least one of the plurality of inputs being a system input;
c) Receiving (330) an output (134a, 134b, …, 134x) from each first node (130a, 130b, …, 130x);
d) Providing (340) a system output (120), the system output (120) comprising the output (134a, 134b, …, 134x) of each first node (130a, 130b, …, 130x);
e) Exciting (350), by a node (130a, 130b) in a first set (160) of the plurality of nodes, one or more other nodes (…, 130x) of the plurality of nodes (130a, 130b, …, 130x) by providing an output (134a, 134b) of each node (130a, 130b) in the first set (160) as an input (132d, …, 132y) to the one or more other nodes (…, 130x);
f) Suppressing (360), by a node (130x) in a second set (162) of the plurality of nodes, one or more other nodes (130a, 130b, …) of the plurality of nodes (130a, 130b, …, 130x) by providing an output (134x) of each node (130x) in the second set (162) as a processing unit input to a respective processing unit (140x), each respective processing unit (140x) being configured to provide the processing unit output as an input (132b, 132e, …) to the one or more other nodes (130a, 130b, …);
g) Optionally, updating (370) the weights (wa, …, wy) by one or more update units (150) based on the correlations; and
h) Optionally, repeating (380) a)-g) until a learning criterion is met;
i) Repeating (390) a)-f) until a stopping criterion is met, and
Wherein each node of the plurality of nodes (130a, 130b, …, 130x) belongs to one of the first and second sets (160, 162) of nodes.
Example 16, the method of example 15, further comprising:
Initializing (304) the weights (wa, …, wy) by setting the weights (wa, …, wy) to zero; and
A predetermined waveform is added (308) to the output (134a, 134b, …, 134x) of one or more of the plurality of nodes (130a, 130b, …, 130x) for a duration of a third time period, the third time period beginning simultaneously with receiving (310) the one or more system inputs (110a, 110b, …, 110z) comprising the data to be processed.
Example 17, the method of example 15, further comprising:
Initializing (306) the weights (wa, …, wy) by randomly assigning values between 0 and 1 to the weights (wa, …, wy); and
A predetermined waveform is added (308) to the output (134a, 134b, …, 134x) of one or more of the plurality of nodes (130a, 130b, …, 130x) for a duration of a third time period.
Example 18, a computer program product comprising a non-transitory computer readable medium (400), having stored thereon a computer program comprising program instructions, the computer program being loadable into a data-processing unit (420) and configured to perform the method according to any of examples 15-17 when the computer program is run by the data-processing unit (420).
Those skilled in the art will recognize that the present disclosure is not limited to the preferred embodiments described above. Those skilled in the art will further recognize that modifications and variations may be made within the scope of the appended claims. For example, signals from other sensors, such as fragrance sensors or flavor sensors, may be processed by the data processing system. Furthermore, the described data processing system may equally well be used for unsegmented, connected handwriting recognition, speech recognition, speaker recognition, and anomaly detection in network traffic or intrusion detection systems (IDSs). Further, variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed disclosure, from a study of the drawings, the disclosure, and the appended claims.