
WO2020152129A1 - Method and device for constructing a neural network - Google Patents

Method and device for constructing a neural network

Info

Publication number
WO2020152129A1
WO2020152129A1 · PCT/EP2020/051343 · EP2020051343W
Authority
WO
WIPO (PCT)
Prior art keywords
node
nodes
neural network
output
focus set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2020/051343
Other languages
English (en)
Inventor
Claes STRANNEGÅRD
Niklas ENGSNER
Fredrik MÄKELÄINEN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dynamic Topologies Sweden AB
Original Assignee
Dynamic Topologies Sweden AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dynamic Topologies Sweden AB filed Critical Dynamic Topologies Sweden AB
Publication of WO2020152129A1 publication Critical patent/WO2020152129A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to the field of artificial intelligence and machine learning, and in particular to a machine learning system capable of constructing a neural network.
  • AI: artificial intelligence
  • a neural network comprises interconnected nodes. The output of one node can be the input of another node. The nodes may be arranged in layers. Generally, the leftmost layer of the neural network is called the input layer, and the rightmost layer the output layer. The output layer is arranged downstream from the input layer.
  • a neural network comprises an input layer, which provides input to the output layer.
  • input data may be any kind of data and is represented by a vector of an arbitrary but fixed dimension.
  • the output may be a vector of an arbitrary but fixed dimension.
  • neural networks more often comprise several layers. These neural networks are called multi-level networks and the layers between the input layer and the output layer are called hidden layers.
  • Figure 1 shows a simple example of a neural network 100.
  • the three nodes 110a-c to the far left represent the input nodes.
  • the next layer of four nodes 120a-d in the middle represents a hidden layer.
  • the hidden layer thus comprises four hidden nodes.
  • the node 130 to the far right represents the output layer.
  • the neural network typically takes an input, passes it through the layers of hidden nodes and outputs a prediction representing the combined input of all the nodes.
  • the hidden layers apply an activation function before passing on the result to the next layer, wherein the activation function modifies the received data and characterizes the relationship between input and output layers.
  • the activation functions give neural networks their power, i.e. allowing them to model complex non-linear relationships. By modifying inputs with non-linear functions, neural networks can model highly complex relationships between features.
  • Traditional neural networks typically comprise fully-connected layers, i.e. all nodes within two adjacent layers are typically connected such that each node receives input from all of the previous layer's nodes and sends its output to every node in the next layer.
  • Each connection between two nodes has an associated weight.
  • the associated activation function is parameterized by the weights.
  • various learning tasks can be performed by minimizing a cost function over the network function, i.e. by adjusting the weights associated with the connections until the received output data is correct.
  • the nodes within the neural network may be associated with biases.
  • a bias of a node may be added to the total weighted sum of inputs to the node.
  • the bias may serve as a threshold to shift the activation function, i.e. when the node should be activated.
  • the use of biases in a neural network may increase the capacity of the network to solve problems.
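  • As a simple illustration of the relationship between weights, bias and activation function described above, the following sketch (in Python, with illustrative names and a sigmoid chosen only as an example, not taken from the disclosure) computes the output of a single node:

```python
import math

def node_output(inputs, weights, bias, activation=lambda s: 1.0 / (1.0 + math.exp(-s))):
    """Weighted sum of the inputs plus the node's bias, passed through an activation function."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(s)

# The bias shifts the threshold at which the node activates.
print(node_output([0.5, 0.2], weights=[1.0, -0.3], bias=0.1))
```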
  • In order for a neural network to develop, to learn new features and to make better decisions, the neural network can continuously be reconfigured.
  • the deep learning paradigm is widely used and is an artificial intelligence function that imitates the workings of the human brain in processing data and creating patterns for use in the decision making. Deep learning has been successful, e.g. in the domains of image recognition, speech recognition, and language technology. In deep learning, each layer learns to transform its input data in the correct manner, such that the neural network may perform accurate decision making and provide correct output. However, deep learning is computationally expensive. The learning process is both data-hungry and energy-hungry.
  • the reasons for the high-energy consumption include the large number of trainable parameters, catastrophic forgetting that requires re-training, and lack of one-shot learning capabilities.
  • deep neural networks are typically trained starting from neural architectures that are hand-made by engineers for each application. This locks the networks into architectures that do not change even if circumstances heavily affecting the input change, meaning that those architectures become completely inadequate. This inflexibility clearly limits the model's usefulness.
  • Some approaches instead use network architectures that grow incrementally, e.g. by adding layers with a dense connectivity pattern and randomized weights. Such methods are typically based on raw computational power that is used for searching among network architectures. In that case, the increased flexibility comes at the price of high energy consumption.
  • If a neural network is too small, a task may not be properly performed or a proper decision may not be made by the neural network. However, if the neural network grows too large, it may lead to overfitting and may increase the consumption of time, data and energy. Hence, high energy consumption and architectural inflexibility are well-known drawbacks of the deep learning paradigm.
  • a method for configuring a neural network is implemented by at least one computing device.
  • the method comprises providing input data to at least one input node of the neural network and receiving output data from at least one downstream node in the neural network.
  • the output data is generated based on the input data by a focus set of the neural network, wherein the focus set comprises at least one node that fulfils at least one criterion related to generation of output data.
  • the method further comprises receiving a signal comprising information indicating the accuracy of the received output data.
  • the method thereafter comprises comparing information indicating accuracy of the received output data to a threshold. Based on the comparison, the neural network is configured by adding at least one node to the neural network and connecting the at least one added node to the focus set of the nodes within the neural network.
  • the at least one criterion for the focus set comprises at least one performance criterion.
  • the at least one performance criterion is related to at least one of an activation level of the node, a change in the activation level of the node, a frequency of activation of the node and an accuracy of previous output of the node.
  • the at least one criterion further comprises at least one structural criterion related to at least one of a position of the node within the neural network, a number of nodes within the neural network and a number of nodes within an area of the neural network.
  • the information indicating the accuracy of the received output data is an error value determined based on a difference between the received output data and a reference output data, and adding at least one node is performed if the error value is greater than or equal to the threshold.
  • the information indicating the accuracy of the received output data is determined based on a reward signal representing how well the neural network performed based on said provided input data.
  • the at least one downstream node is represented by at least one focus set output node and adding nodes to the neural network and connecting the added nodes to the focus set further comprises, for each focus set output node, adding a respective value node.
  • the value node receives input only from the respective node in the focus set.
  • a concept node is added to the neural network and connected to the focus set to act as an output of the focus set, and the concept node is connected to each of the added value nodes.
  • At least one trainable parameter of at least one of the value nodes and concept node is set such that the output of the concept node matches a reference output.
  • the method further comprises configuring the neural network by performing backpropagation.
  • the step of performing backpropagation is performed only along the focus set.
  • the step of performing backpropagation comprises connecting at least two temporary nodes to the focus set and adjusting at least one trainable parameter of at least one node within the focus set and/or of at least one of the at least two temporary nodes in order to increase the accuracy of the output data received from the neural network. Thereafter, the at least two temporary nodes are removed.
  • the at least two temporary nodes comprise at least one temporary normalization node and at least one temporary output node and the at least one temporary normalization node is a node that rescales the output of an output node of the focus set.
  • the at least one downstream node is represented by at least one focus set output node.
  • the step of connecting at least two temporary nodes to the focus set then further comprises, for each focus set output node, adding a respective temporary normalization node to the neural network.
  • each temporary normalization node is connected to its focus set output node and the outgoing weight from the focus set output node to the temporary normalization node is set to 1.
  • the step of connecting at least two temporary nodes to the focus set further comprises adding a respective temporary output node to the neural network for each dimension of the output of the focus set output nodes.
  • Each temporary output node is connected to each temporary normalization node.
  • the outgoing weights from each temporary normalization node to the temporary output nodes are set such that each temporary normalization node has a weighted output that represents an output value of its respective focus set output node.
  • the method further comprises performing generalization to merge nodes with corresponding output data if at least two concept nodes receive input from at least two value nodes that receive input from a common node and produce corresponding output data, and if at least one of the two concept nodes was added in a previous step.
  • performing generalization further comprises adding a respective value node corresponding to each of the common nodes.
  • the respective value node receives input only from the respective common node.
  • a concept node is added to act as a common output of the common nodes.
  • the concept node is connected to each of the added value nodes and at least one trainable parameter of the value nodes and the concept node is set such that the output of the added concept node is a function of the at least two concept nodes.
  • the method further comprises removing the concept node and at least one corresponding value node that provides input to the concept node.
  • the at least one removal criterion is related to at least one of a frequency of appearances of the concept node in a focus set, an average signal comprising information indicating the accuracy of the received output data, and a position of the concept node within the neural network.
  • feedback from the environment in the form of a reward signal is received and used for computing a reference vector that in turn constitutes the basis for the update of the network.
  • According to another aspect, a computing device for configuring a neural network is provided.
  • the computing device is configured to perform the method according to one of the aspects provided in the disclosure.
  • a computer readable storage medium encoded with instructions that, when executed on a processor, performs the method according to one of the aspects provided in the disclosure.
  • a carrier containing the computer program according to one of the aspects provided in the disclosure, wherein the carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
  • Figure 1 shows a schematic overview of a traditional neural network
  • Figures 2a, 2b, 2c and 2d show three examples of nodes used within the present disclosure and an example of a neural network
  • Figures 3a, 3b and 3c show flowcharts for a method according to the present disclosure for configuring a neural network
  • Figures 4a and 4b illustrate an example embodiment of adding nodes to a focus set according to the present disclosure
  • Figures 5a, 5b, 5c and 5d show an example embodiment of performing backpropagation according to the present disclosure
  • Figures 6a and 6b show an example embodiment of a neural network before and after performing generalization according to the present disclosure
  • Figures 7a and 7b show an example embodiment of a neural network before and after a removal criterion has been fulfilled
  • Figure 8a is a block diagram illustrating a neural network configuring unit according to the present disclosure
  • Figure 8b is a schematic view of the components of a computing device according to the present disclosure.
  • Figure 8c illustrates a schematic overview of a computing device according to the present disclosure
  • Figure 8d shows a schematic view of a computer-readable medium according to one embodiment of the present disclosure.
  • Embodiments of the present disclosure relate to constructing neural networks incrementally.
  • the present disclosure relates to a method for performing this, wherein the proposed method is implemented by a computing device.
  • Figures 2a, 2b and 2c show three different types of node used within the disclosed method, and Figure 2d illustrates an example of a neural network.
  • Figure 3 shows a flowchart for a method 300 for configuring a neural network. The method may be implemented by at least one computing device, such as the computing device shown in Figure 8.
  • a neural network may be configured with three types of nodes. These three types of nodes are input nodes, value nodes and concept nodes.
  • An input node 210 as illustrated in Figure 2a, has no incoming connections and is represented in the input layer. Thus, an input node is a node without any incoming connections from elsewhere within the network, and which is provided in the input layer.
  • the input node 210 may accordingly receive input signals that are provided to the network. These signals might come from a variety of sources, e.g. from hardware sensors, from other neural networks, from nodes in a convolutional neural network that may represent image features, or from any other type of pre-processing device.
  • a value node 220 is defined as a node with exactly one incoming connection.
  • the value node 220 has an incoming connection from an input node 210.
  • the connection may come from another node in a network, such as from a concept node, which will be described in relation to Figure 2c.
  • Value nodes are represented in the drawings by ellipses.
  • the value node 220 has an activation function, wherein the activation function of a value node defines the output produced by the value node given an input, as previously described.
  • the value node may for example, have a Gaussian activation function.
  • An example of a concept node 230 is illustrated in Figure 2c.
  • the concept node 230 is a node whose incoming connections all come from value nodes 220a, b, ... , n.
  • a concept node 230 must have at least one incoming connection.
  • each concept node 230 may have an associated vector, y(u).
  • the vector y(u) is the output y generated by the node u, i.e. the decision for y.
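  • As a rough, non-authoritative sketch of the three node types described above, the following Python classes may help fix the ideas; all names, the Gaussian activation for value nodes (mentioned above only as an example) and the averaging used for the concept-node activation are assumptions, not the disclosed implementation:

```python
import math
from dataclasses import dataclass, field

@dataclass
class InputNode:              # no incoming connections; its activation is set from the input data
    activation: float = 0.0

@dataclass
class ValueNode:              # exactly one incoming connection (from an input node or a concept node)
    source: object
    mean: float = 0.0         # trainable parameters of an (assumed) Gaussian activation function
    std: float = 1.0
    def output(self) -> float:
        x = self.source.activation
        return math.exp(-((x - self.mean) ** 2) / (2 * self.std ** 2))

@dataclass
class ConceptNode:            # all incoming connections come from value nodes (at least one)
    inputs: list = field(default_factory=list)   # list of ValueNode
    y: tuple = ()             # the associated output vector y(u)
    activation: float = 0.0
    def forward(self) -> float:
        # averaging the value-node outputs is only an illustrative choice
        self.activation = sum(v.output() for v in self.inputs) / max(len(self.inputs), 1)
        return self.activation
```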
  • Figure 2d shows an example of a neural network using the different types of node disclosed above.
  • the neural network may receive taste and visual inputs and based on these inputs, the neural network may identify a source of the inputs.
  • the input nodes 210a-d are input for sweetness, sourness, saltiness and redness, respectively.
  • Each of the input nodes 210a-d has a respective value node 220a-d attached to it, i.e. a value node is connected to each input node.
  • the concept node 230a may represent a memory and is a concept node of raspberry taste.
  • the concept node 230b may represent a memory and is a concept node of a red raspberry.
  • the neural network illustrated in Figure 2d is a multi-level neural network. The associated vectors of the concept nodes are not shown. Accordingly, if taste and visual inputs are received by the neural network illustrated in Figure 2d, the concept node 230a may determine if the taste input belongs to a raspberry. The concept node 230b may then combine this information with input received from the input node 210d and value node 220d to determine if the inputted information should be identified not solely as a raspberry, but as a red raspberry.
  • the method for configuring a neural network will now be described with reference to Figures 3a-c.
  • the method may be performed to configure any arbitrary network with a number of input nodes, such as the network illustrated in Figure 2d for example.
  • the network may be, for example, a feed-forward network or a recurrent network.
  • the method starts at step 301 with providing input data to at least one input node of the neural network.
  • the input data may be any kind of data and is generally represented by real-valued vectors of dimension m. Alternatively, the input data may be represented by discrete values.
  • the method is implemented by at least one computing device.
  • output data is received from at least one downstream node in the neural network.
  • a node arranged downstream in the neural network is arranged such that the downstream node, or a node prior to the downstream node, receives data from the input node and performs some operation on that data.
  • the nodes that are used in the generation of the output data received in step 302 can be collectively called a focus set of the neural network for that output data.
  • the received output data is thus generated by the focus set based on the input data, and is received by the computing device performing the method from the at least one downstream node in the neural network.
  • the focus set comprises at least one node that fulfils at least one criterion related to generation of output data, as will be discussed below.
  • a focus set can be a subset of the total neural network.
  • the at least one criterion for the focus set may comprise at least one performance criterion.
  • the criterion applied to determine which nodes that are to be included in the focus set may generally relate to how a node within the neural network compares to other nodes within the neural network.
  • the at least one performance criterion may, for example, relate to at least one of an activation level of the node, a change in the activation level of the node, a frequency of activation of the node and an accuracy of previous output of the node. Accordingly, the at least one performance criterion may relate to how a node acted when input data was introduced to the neural network, relate to how the node has acted historically and how accurate the output from the node has been historically.
  • the at least one criterion may further comprise at least one structural criterion related to at least one of a position of the node within the neural network, a number of nodes within the neural network and a number of nodes within an area of the neural network.
  • the structural criterion may thus relate to where a node is located within the neural network and used in combination with the at least one performance criterion.
  • the focus set may accordingly be a subset of the total neural network.
  • the focus set of the input vector x is a set, possibly empty, of input nodes and concept nodes of the neural network. This may make the configuration of the neural network much more efficient. It may not be necessary to change the complete neural network when the network performs poorly, it may be possible to adapt only the focus set.
  • the focus set comprises nodes that fulfil three criteria. These three criteria may, for example, be that the node should have activation of at least Q, e.g. 0.2, when the neural network receives input data, that there are no other nodes downstream of the node that has activation of at least Q, e.g. 0.2, and that the node is among the M, e.g. 5, nodes with highest activation that satisfy the two previous criteria.
  • Q is an activation parameter
  • M is a focus set size parameter. Both of these parameters affect the degree of sparsity of the neural network. Accordingly, the criterion used to determine the focus set will affect the size of the focus set, which in turn affects the sparsity of the neural network. It is therefore possible to control the sparsity of the neural network with the proposed method.
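  • A minimal sketch of such a focus-set selection, assuming the three example criteria above and the hypothetical Q and M parameters, could look as follows:

```python
def select_focus_set(nodes, downstream, activation, Q=0.2, M=5):
    """nodes: node identifiers; downstream[n]: identifiers downstream of n;
    activation[n]: activation of node n for the current input."""
    candidates = [
        n for n in nodes
        if activation[n] >= Q                                             # criterion 1: activation at least Q
        and not any(activation[d] >= Q for d in downstream.get(n, ()))    # criterion 2: no active downstream node
    ]
    # criterion 3: keep only the M most active candidates
    return sorted(candidates, key=lambda n: activation[n], reverse=True)[:M]

acts = {"a": 0.9, "b": 0.3, "c": 0.1, "d": 0.5}
down = {"a": ["d"], "b": [], "c": [], "d": []}
print(select_focus_set(acts, down, acts))   # ['d', 'b']: 'a' is excluded because 'd' is active downstream of it
```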
  • a signal comprising information indicating accuracy of the received output data is received at step 303.
  • the information indicating the accuracy of the received output data may be, for example, an error value based on a difference between the received output data and a reference output. This may be used in supervised learning, as known in the art. Alternatively, the information indicating the accuracy of the received output data may be determined based on a reward signal representing how well the neural network performed based on said provided input data. This is an example of reinforcement learning, as also known in the art. By using rewards, it may be possible to focus directly on performance of the neural network instead of comparing input/output pairs.
  • a scalar cost may be computed based on the expected reward versus the actual reward, where the scalar cost thus is a measure of how wrong the neural network was in terms of its ability to estimate the relationship between the input and output.
  • the reward signal is received from the environment.
  • the network works as a function approximator that represents the so-called Q-matrix, with estimated quality values, or Q-values, of pairs of states and actions.
  • a target vector is then computed, e.g., as in deep Q-learning, as known in the art, whereupon the reinforcement learning case proceeds as in the case of supervised learning.
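  • The reinforcement-learning case can be sketched as follows; the discount factor, the squared-error cost and all names are assumptions, shown only to illustrate how a reward signal may be turned into a reference (target) vector in the style of deep Q-learning:

```python
def q_learning_target(q_values, action, reward, next_q_values, gamma=0.9):
    """Turn a reward into a target vector for the Q-values of the current state."""
    target = list(q_values)
    target[action] = reward + gamma * max(next_q_values)   # Bellman-style update for the action taken
    return target

def scalar_cost(q_values, target):
    """A scalar measure of how wrong the network's Q-value estimates were."""
    return sum((q - t) ** 2 for q, t in zip(q_values, target))

target = q_learning_target([0.2, 0.7], action=1, reward=1.0, next_q_values=[0.4, 0.1])
print(target, scalar_cost([0.2, 0.7], target))   # [0.2, 1.36] and a cost of about 0.44
```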
  • the received information indicating the accuracy of the received output data is compared to a threshold.
  • By comparing the information indicating the accuracy of the received output against a threshold, it may be possible to determine how well the neural network performed. The result from the comparison is then used in the method to decide how the network is going to be configured.
  • the information indicating the accuracy of the received output data is an error value determined based on a difference between the received output data and a reference output, where the reference output data may be the correct output data that should be produced by the neural network based on the input data.
  • the correct output data is accordingly a desired output data.
  • feedback from the environment in the form of a reward signal is received and used for computing a reference vector that in turn constitutes the basis for the update of the network.
  • Desired output data may interchangeably be referred to as a target output data.
  • If the determined difference between the received output data and the correct output data is large, the error value is greater than or equal to the threshold and the neural network is deemed to have performed poorly.
  • If the determined difference between the received output data and the correct output data is small, i.e. the received output data is close to the correct output data, the error value is less than the threshold and the neural network is deemed to be performing better.
  • If there is no difference between the received output data and the correct output data, the neural network is performing very well and in accordance with what is expected from the neural network.
  • the information indicating the accuracy of the received output data is determined based on a reward signal representing how well the neural network performed based on said provided input data.
  • the update of a value estimating the quality of an action depends on the reward received, i.e. the most recently received reward, and the best estimated value of the actions in the next step, i.e. the estimated future reward. If the reward signal differs much from the expected reward signal, i.e. the reward signal is far from the expected reward signal, the comparison is greater than or equal to the threshold and the neural network is deemed to have performed poorly.
  • If the reward signal is close to the expected reward signal, the comparison is less than the threshold and the neural network is deemed to be performing better. Additionally, if there is no difference between the reward signal and the expected reward signal, the neural network is performing very well and in accordance with what is expected from the neural network.
  • the information indicating the accuracy of the received output data may be an accuracy score. For example, this may be a score between 0 and 1, where 0 represents a completely incorrect output and 1 represents a perfectly correct output.
  • the information is compared to an accuracy threshold, for example one indicating that an accuracy above 0.8 is acceptable.
  • If the score is below the threshold, the neural network is deemed to have performed poorly.
  • If the score is above the threshold, the neural network is deemed to be performing better.
  • If the information is exactly 1, the neural network is performing perfectly.
  • Depending on the result of the comparison, the method may perform different actions, as also illustrated in Figure 3a. If the information indicating the accuracy of the received output data indicates that the neural network performed poorly, the method moves to step 310. If the information indicating the accuracy of the received output data indicates a sufficient but not perfect accuracy of the output data, the method moves to step 320. Finally, if the network performed perfectly, the method moves to step 330. Step 330 may also be performed after step 310 or 320 has been performed.
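  • Putting the branching above together, a high-level sketch of one configuration step could look as follows; all helper methods are hypothetical placeholders for the steps described in this disclosure, not an actual implementation:

```python
def configure_step(network, input_data, reference, threshold):
    output, focus_set = network.forward(input_data)      # steps 301-302: provide input, receive output
    error = network.error(output, reference)             # step 303: accuracy information
    if error >= threshold:                                # poor performance
        network.add_nodes(focus_set, reference)           # step 310: grow the network around the focus set
    elif error > 0:                                       # sufficient but not perfect accuracy
        network.backpropagate(focus_set, reference)       # step 320: fine-tune along the focus set
    network.generalize()                                  # step 330: may also run after step 310 or 320
```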
  • the present disclosure provides a method where it is decided whether to perform reconfiguration of the network based on a comparison of information indicating the accuracy of the received output data to a threshold that indicates how the neural network performed on certain input data.
  • the network is only reconfigured when the network performs poorly, i.e. when the information indicating the accuracy of the received output data is not good enough.
  • the disclosed method provides a neural network that grows gradually and which is generally kept small, since the neural network only grows as long as needed. This in turn increases the energy efficiency of the network, as energy may be saved in the computing phase, i.e. when the neural network is used to calculate output data based on received input data. Energy is saved as the neural network comprises a smaller number of nodes and connections required to produce an accurate output compared to previously known solutions. Furthermore, with the proposed method it may be possible to make sure that the neural network does not grow uncontrollably.
  • the provided method may start with an arbitrary network and may build arbitrary depth.
  • the neural network is configured by adding at least one node to the neural network and connecting the at least one added node to the focus set of the nodes within the neural network.
  • the step 310 of adding nodes to the neural network and connecting the added nodes to the focus set may further comprise, for each node in the focus set, step 311 of adding a respective value node to the node in the focus set.
  • a value node may receive input only from its respective node in the focus set.
  • a concept node is added to the neural network and connected to the focus set to act as an output of the focus set.
  • the concept node is connected to each of the added value nodes.
  • at least one trainable parameter of at least one of the value nodes and concept node is set such that the output of the concept node matches a target output, i.e. a reference output.
  • a trainable parameter may be, for example, a bias or an incoming weight of a node.
  • Other examples of trainable parameters are the mean and standard deviation of a Gaussian activation function. More generally, a trainable parameter may be any coefficient appearing inside an activation function.
  • step 310 of adding nodes to the neural network and connecting the added nodes to the focus set adds new memory structures to the neural network. It adds one or more value nodes and a concept node that unites them. Accordingly, step 310 forms a partial memory of the present situation and the disclosed method provides a neural network that grows gradually. Thus, relatively sparse networks are configured, which do not grow uncontrollably. The energy efficiency in the computing phase may be increased as the network may comprise the smallest number of nodes and connections required to produce an accurate output.
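  • A rough sketch of step 310, building on the hypothetical node classes above, is given below; how the trainable parameters are set so that the concept node matches the reference output is an assumption (here the value nodes are simply centred on the current activations and the new concept node stores the reference as its y-vector), and the network attributes are illustrative:

```python
def add_memory(network, focus_set, reference_output):
    """Steps 311-313: add one value node per focus-set node and a concept node uniting them."""
    value_nodes = []
    for node in focus_set:
        v = ValueNode(source=node, mean=node.activation, std=0.1)  # receives input only from its focus-set node
        value_nodes.append(v)
    concept = ConceptNode(inputs=value_nodes, y=tuple(reference_output))  # acts as output of the focus set
    network.value_nodes.extend(value_nodes)      # 'network' with these attributes is assumed for illustration
    network.concept_nodes.append(concept)
    return concept
```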
  • Figures 4a and 4b illustrate an example of the network before and after the step 310 of adding nodes to the neural network and connecting the added nodes to the focus set is performed.
  • Figure 4a illustrates the network before the addition of nodes.
  • the network illustrated in Figure 4a only shows the nodes in the focus set of the neural network.
  • the focus set comprises three input nodes 210a-c.
  • the input nodes may provide input to the neural network.
  • a signal comprising information indicating the accuracy of the received output data from these three input nodes 210a-c has indicated that the focus set illustrated in Figure 4a performed poorly.
  • step 310 of adding nodes to the neural network and connecting the added nodes to the focus set is performed.
  • Figure 4b illustrates the network after such an addition of nodes.
  • a respective value node 220a-c is added for each output node in the focus set, i.e. 210a-c.
  • the value nodes 220a-c receive input only from the respective input node 210a-c in the focus set.
  • a concept node 230 is added to the neural network and connected to the focus set to act as an output of the focus set.
  • the concept node 230 is connected to each of the value nodes 220a-c. Accordingly, the concept node 230 is added to the neural network and connected to the focus set via the value nodes 220a-c.
  • the proposed method makes it possible to support one-shot learning, i.e. the neural network may learn a certain task or feature at the first attempt.
  • the proposed method supports multi-level learning: the neural network may first learn the notion of frog; then, when it encounters a poisonous frog that is red, the neural network may learn the notion of red frog with one-shot learning.
  • the proposed method will generate sparse networks.
  • the method may move to step 320, which comprises deciding to configure the neural network by performing backpropagation.
  • step 320 comprises deciding to configure the neural network by performing backpropagation.
  • the neural network only needs some fine-tuning, which is performed by the backpropagation that shapes the parameters of the architecture. This is contrary to when the neural network does not perform as well as desired, in which case the architecture of the neural network is shaped by adding nodes.
  • the backpropagation is known in the art. In order to make the proposed method even more efficient, the backpropagation according to one exemplary embodiment is only performed along paths of the temporary network that end in the temporary output nodes. Thus, the backpropagation may only be performed along the focus set and not along the complete neural network.
  • the next step of the method is to connect at least two temporary nodes to the focus set output nodes.
  • the focus set output nodes represent the at least one downstream node from which the output data is received.
  • the temporary nodes are added in order to emphasise the importance of the respective output nodes of the focus set. The number of temporary nodes may thus depend on the focus set.
  • a temporary normalization node is a node that rescales the output of an output node of the focus set, i.e. the at least one temporary normalization node reflects the importance or contribution of the focus set output node to the eventual output of the network/focus set.
  • Another example of a temporary node may be a temporary output node.
  • the temporary output node may function as an output node for the temporarily added structure.
  • Steps 321 to 325 give an example of how at least two temporary nodes are added to the neural network and connected to the focus set.
  • a respective temporary normalization node is added to the neural network for each focus set output node.
  • the temporary normalization nodes are connected to their respective focus set output node. Accordingly, a temporary normalization layer having the same number of nodes as there are concept nodes of the focus set is added to the neural network and connected to the focus set. The normalization layer is used for rescaling purposes.
  • the outgoing weight from the focus set output node to the temporary normalization node is set to 1 in accordance with step 323. Thereafter, the method may further comprise the step 324 of adding at least one temporary output node to the neural network.
  • the number of temporary output nodes is equal to the dimension of the output from the focus set output nodes.
  • Each temporary normalization node is, in step 325, connected to each of the temporary output nodes. Accordingly, a temporary output layer that is fully connected to the normalization layer is added.
  • the outgoing weights from the temporary normalization nodes to the temporary output nodes are set in step 326. These weights are set such that each temporary normalization node has a weighted output that represents an output value of its respective concept node, as will be explained in relation to Figure 5.
  • the output of the temporary nodes is computed as a normalized linear combination of the output data of the concept nodes of the focus set.
  • the temporary output nodes may be computed as a linear combination of the temporary normalization nodes.
  • At step 327 at least one trainable parameter of at least one node within the focus set and/or of at least one of the temporary nodes is adjusted.
  • the trainable parameter is adjusted in order to increase an accuracy of the received output data of the neural network. For example, by increasing the accuracy, the determined difference between the received output data and the reference output data may be reduced.
  • the at least two temporary nodes are removed at step 328. Thus, a better performing neural network with a small structure is obtained, and the network can be retested with its original architecture.
  • the focus set of the neural network of which the backpropagation is going to be performed is illustrated in Figure 5a.
  • the focus set illustrated in Figure 5a comprises four input nodes 210a-d, which are connected to four respective value nodes 220a-d. Two of the value nodes 220a and 220b are connected to one concept node 230a, and the other two of the value nodes 220c and 220d are connected to another concept node 230b.
  • the output data from concept node 230a is (4,5) and the output data from concept node 230b is (8,9).
  • the reference output data for the focus set in this case is (5,4).
  • the activity of concept node 230a is higher than the activity of the concept node 230b.
  • concept node 230a has an activity of 8 and concept node 230b has an activity of 2. This means that the output from concept node 230a is considered more important and concept node 230a should contribute 80% to the output from the focus set, while concept node 230b should contribute only 20%.
  • the concept nodes of the focus set illustrated in Figure 5a comprise the concept nodes 230a and 230b.
  • the at least two temporary nodes according to the present example therefore comprise two temporary normalization nodes.
  • the outputs from the concept nodes 230a and 230b, (4,5) and (8,9) respectively, are two dimensional.
  • the at least two temporary nodes according to the present example therefore comprise two temporary output nodes.
  • the focus set with the added temporary normalization and output layers is illustrated in Figure 5b.
  • Two temporary normalization nodes 230c and 230d are added and connected to the focus set output nodes 230a and 230b, and two temporary output nodes 230e and 230f are added and connected to the temporary normalization nodes 230c and 230d.
  • the normalization layer is used for rescaling purposes, and the rescaling values of the normalization nodes according to the present example are thus set to 0.8 and 0.2, in line with their respective activities.
  • the outgoing weight from the focus set output node to the temporary normalization node is set to 1. Thereafter, for each normalization node, a respective temporary output node is added to the neural network and connected to each temporary normalization node. Accordingly, an output layer that is fully connected to the normalization layer is added.
  • the outgoing weights w1 to w4 from the temporary normalization nodes to the temporary output nodes are set such that each temporary normalization node has a weighted output that represents an output value of its respective concept node.
  • the outputs from the focus set output nodes 230a and 230b were (4,5) and (8,9).
  • w1 is set to 4 and w2 is set to 5.
  • w3 is set to 8 and w4 to 9. This results in the output from temporary output node 230e being 4.8, i.e. 0.8·4 + 0.2·8 = 4.8, and the output from temporary output node 230f being 5.8, i.e. 0.8·5 + 0.2·9 = 5.8.
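  • To make the arithmetic above concrete, the following sketch reproduces the numbers of the example under the assumption that each temporary output is the rescaling-weighted sum of the corresponding outgoing weights:

```python
rescale = [0.8, 0.2]      # temporary normalization nodes 230c and 230d (activities 8 and 2, normalized)
w = [[4, 5],              # w1, w2: outgoing weights from 230c to temporary output nodes 230e, 230f
     [8, 9]]              # w3, w4: outgoing weights from 230d to temporary output nodes 230e, 230f

temporary_output = [sum(rescale[i] * w[i][j] for i in range(2)) for j in range(2)]
print(temporary_output)   # approximately [4.8, 5.8], to be pulled towards the reference output (5, 4)
```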
  • At least one trainable parameter of at least one node within the focus set and/or of at least one of the at least two temporary nodes is then adjusted in order to increase the accuracy of the received output data received from the focus set output nodes.
  • the trainable parameter should be adjusted such that the reference output data (5,4) is received from the neural network.
  • the gradient descent algorithm may be used as the basis of the parameter updates.
  • Figure 5c illustrates an example of the result after this has been performed.
  • the illustrated weights of w5 to w8, i.e. generally all weights leading into a value node, and the weights set to 1 from the focus set output node to the temporary normalization node are treated as constants and not updated.
  • weights w1-w4 and w9-w12 are changed in order to change the output of the temporary output nodes.
  • the weights w1 and w3 are increased in order to increase the output of the temporary output node 230e from 4.8 towards the reference value 5.
  • weights w2 and w4 are decreased, in order to decrease the output of the temporary output node 230f from 5.8 towards the reference value 4.
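  • A minimal gradient-descent sketch of this adjustment, under an assumed squared-error loss and with the rescaling values and all upstream weights held constant, is shown below; the learning rate is arbitrary:

```python
def backprop_step(rescale, w, reference, lr=0.1):
    """One gradient step on the outgoing weights w[i][j] of the temporary normalization nodes."""
    out = [sum(rescale[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
    for i in range(len(w)):
        for j in range(len(w[0])):
            grad = 2 * (out[j] - reference[j]) * rescale[i]   # derivative of the squared error w.r.t. w[i][j]
            w[i][j] -= lr * grad
    return out

w = [[4.0, 5.0], [8.0, 9.0]]
print(backprop_step([0.8, 0.2], w, reference=[5.0, 4.0]))  # output before the step: roughly [4.8, 5.8]
print(w)  # w1 and w3 have increased (towards 5), w2 and w4 have decreased (towards 4)
```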
  • the temporary layers are removed. That is, the temporary normalisation nodes and the temporary output nodes are removed.
  • the result is the structure illustrated in Figure 5d, where several parameters, e.g. w9-w12 and the vectors y and y', have been adjusted in order to refine the output data from the focus set such that it comes closer to the reference output data.
  • the output of concept node 230a is now above 4 and below 5
  • the output of concept node 230b is now above 8 and below 9.
  • By performing backpropagation, a controlled fine-tuning of the network may be provided.
  • Backpropagation will only be performed when the information indicating the accuracy of the received output data indicates a sufficient but not perfect accuracy of the output data. This means that backpropagation will not be performed when the neural network performed very badly; backpropagation will only be performed when smaller changes are needed to adjust the neural network, whereas in current methods, backpropagation is performed more frequently.
  • the method according to the present disclosure may further comprise step 330 of performing generalization to merge nodes with corresponding output data.
  • the step 330 of performing generalization may generalize the network such that intersections of concept nodes with similar target vectors may construct simpler networks. Accordingly, this step may be performed even if the network performs perfectly, e.g. if output data from at least one downstream node in the neural network correspond to a reference output data.
  • generalisation is performed if at least two concept nodes receive input from at least two value nodes that receive input from a common node and produce corresponding output data, and if at least one of the two concept nodes was added in a previous step.
  • Figure 6a illustrates an arbitrary neural network, which in this case comprises four input nodes 210a-d, six value nodes connected to the input nodes and two concept nodes 230a and 230b.
  • the input nodes 210a-d provide the input to the neural network and the two concept nodes 230a and 230b generate the output data.
  • the step of performing 330 generalization comprises step 331 of adding a respective value node corresponding to each of the common nodes.
  • the common input nodes in the present example are input nodes 210b and 210c.
  • concept nodes 230a and 230b both receive input from the input nodes 210b and 210c, and thus, input nodes 210b and 210c are nodes in common, i.e. common input nodes.
  • Figure 6b illustrates the added value nodes 220a and 220b. Each respective value node receives input only from its respective common input node.
  • common input nodes need not be input nodes to the network, but could be concept nodes that provide input to one or more downstream nodes.
  • step 332 of adding a concept node to act as a common output of the common input nodes is performed.
  • the concept node is connected, in step 333, to each of the added value nodes.
  • the added common output node is illustrated in Figure 6b as concept node 230c.
  • the common output node 230c receives input from the added value nodes 220a and 220b.
  • At least one trainable parameter of the value nodes 220a and 220b and the concept node 230c is set in step 334, such that the output of the added concept node 230c is a function of the at least two concept nodes 230a and 230b.
  • This function may be, for example, such that the output of the concept node is an average of the at least two concept nodes 230a and 230b, or a weighted average of the at least two concept nodes 230a and 230b.
  • the step of performing generalization may generalize concepts with similar target vectors by adding a concept node with a similar target vector and fewer predecessors, wherein the predecessors of a concept node are the set of concept nodes and input nodes that are connected to the concept node via a value node.
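  • A rough sketch of steps 331-334, again using the hypothetical node classes above and the averaging mentioned as one example of the combining function, could look as follows:

```python
def generalize(network, concept_a, concept_b, common_nodes):
    """Merge over the common predecessors of two concept nodes with corresponding output data."""
    new_values = [ValueNode(source=n, mean=n.activation, std=0.1) for n in common_nodes]   # step 331
    merged_y = tuple((a + b) / 2 for a, b in zip(concept_a.y, concept_b.y))                 # step 334: average
    merged = ConceptNode(inputs=new_values, y=merged_y)                                     # steps 332-333
    network.value_nodes.extend(new_values)
    network.concept_nodes.append(merged)
    return merged
```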
  • the generalization may reduce the neural network’s sensitivity to noise, enable network reduction and generalize concepts so that wider classes of data may be represented.
  • the method may further comprise removing 340 the concept node and at least one corresponding value node that provides input to the concept node. Removal of the concept and value nodes is performed if the concept node fulfils at least one removal criterion related to generation of output data.
  • the at least one removal criterion may, for example, be related to at least one of a frequency of appearances of the concept node in a focus set, an average signal comprising information indicating the accuracy of the received output data, and a position of the concept node within the neural network.
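  • A sketch of such a removal step is given below; the frequency-based criterion, the threshold and the focus_appearances counter are purely illustrative assumptions:

```python
def prune(network, min_focus_appearances=3):
    """Step 340: remove concept nodes fulfilling a removal criterion, together with their value nodes."""
    for concept in list(network.concept_nodes):
        if getattr(concept, "focus_appearances", 0) < min_focus_appearances:   # hypothetical counter
            network.concept_nodes.remove(concept)
            for v in concept.inputs:                 # the corresponding value nodes feeding the concept node
                if v in network.value_nodes:
                    network.value_nodes.remove(v)
```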
  • An example embodiment of when the removal criterion is applied is illustrated in Figures 7a and 7b.
  • the focus set comprises the concept node 230b and the removal criterion is related to a difference between received output data and a correct output data.
  • the output data of the concept node 230b differs much from the correct output data, and the removal criterion is thus fulfilled.
  • Figure 7a illustrates the neural network before the removal criterion is applied
  • Figure 7b illustrates the neural network after the concept node and at least one corresponding value node that provides input to the concept node have been removed.
  • a possible advantage of applying a removal criterion is that unnecessarily large neural networks are counteracted.
  • the provided method may be advantageous to use in products where data points may be expensive or scarce.
  • Examples of such products may be medical support systems that warn of upcoming epileptic seizures or blood sugar dips based on individual data, or drug discovery systems.
  • the proposed method may be used in products that require extraordinary adaptability, e.g. household robots that can adapt to new homes or do cleaning, cooking and dishwashing.
  • existing products may also be improved with the proposed method, as it saves energy and time and increases the ability to understand and explain neural networks.
  • Figure 8a is a block diagram illustrating a neural network configuring unit 800 according to the present disclosure.
  • the neural network configuring unit 800 is adapted to implement the method for configuring a neural network according to the present disclosure.
  • the neural network configuring unit 800 comprises an input data providing unit 805, an output data receiving unit 810, an accuracy signal receiving unit 815, a comparison unit 820 and a configuration unit 825.
  • the input data providing unit 805 of the neural network configuring unit 800 is adapted to provide input data to at least one input node of the neural network.
  • the output data receiving unit 810 of the neural network configuring unit 800 is further adapted to receive output data from at least one downstream node in the neural network.
  • the output data is generated based on the input data by a focus set of the neural network.
  • the focus set comprises at least one node that fulfils at least one criterion related to generation of output data.
  • the accuracy signal receiving unit 815 of the neural network configuring unit 800 is further adapted to receive a signal comprising information indicating the accuracy of the received output data.
  • the comparison unit 820 of the neural network configuring unit 800 is further adapted to compare the information indicating the accuracy of the received output data to a threshold.
  • the configuration unit 825 of the neural network configuring unit 800 is adapted to configure the neural network. Based on the result of the comparison, the configuration unit 825 of the neural network configuring unit 800 is adapted to configure the neural network by adding at least one node to the neural network and connecting the at least one added node to the focus set of the nodes within the neural network.
  • the computing device 830 is a physical computer (a hardware system), which may be dedicated to run one or more services, as a host, to serve the needs of users of the other computers or communication nodes on a network.
  • the computing device 830 may comprise a controller 835, which is responsible for the overall operation of the computing device 830 and is preferably implemented by any commercially available CPU ("Central Processing Unit"), DSP ("Digital Signal Processor") or any other electronic programmable logic device.
  • CPU: Central Processing Unit
  • DSP: Digital Signal Processor
  • the controller 835 may be implemented using instructions that enable hardware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium (disk, memory etc.) 855 to be executed by such a processor.
  • the controller 835 may be configured to read instructions from the memory 855 and execute these instructions to control the operation of the computing device 830.
  • the memory 855 may be implemented using any commonly known technology for computer-readable memories such as ROM, RAM, SRAM, DRAM, CMOS, FLASH, DDR, EEPROM memory, flash memory, hard drive, optical storage or any combination thereof.
  • the computing device 830 may further comprise one or more applications 860.
  • the applications may be sets of instructions that when executed by the controller 835 control the operation of the computing device 830.
  • the memory 855 may be used for various purposes by the controller 835, one of them being for storing application data and program instructions 880 for various software modules in the computing device 830.
  • the software modules include a real-time operating system, drivers for a man-machine interface 840, an application handler as well as various applications 860.
  • the computing device 830 may, according to some embodiments, further comprise a user interface 840, which may, for example, comprise a display and a keypad or a touch screen. Other user interface elements known in the art may equally form part of the user interface 840.
  • the computing device 830 may further comprise a radio frequency interface 845, which is adapted to allow the server to communicate with other devices, such as other computing devices, through a radio frequency band through the use of different radio frequency technologies. Examples of such technologies are WIFI, Bluetooth®, W-CDMA, GSM, UTRAN, LTE, and NMT to name a few.
  • the computing device 830 may further comprise a wired interface 850, which is adapted to allow the server to communicate with other devices through the use of different network technologies. Examples of such technologies are USB, Ethernet, Local Area Network, and TCP/IP (Transport Control Protocol/Internet Protocol) to name a few.
  • a wired interface 850 is adapted to allow the server to communicate with other devices through the use of different network technologies. Examples of such technologies are USB, Ethernet, Local Area Network, and TCP/IP (Transport Control Protocol/Internet Protocol) to name a few.
  • the RF interface 845 may comprise an internal or external antenna as well as appropriate radio circuitry for establishing and maintaining a wireless link to a base station.
  • the radio circuitry comprises a series of analogue and digital electronic components, together forming a radio receiver and transmitter. These components include, for example, band-pass filters, amplifiers, mixers, local oscillators, low-pass filters, AD/DA converters, etc.
  • the computing device 830 implementing the method according to the present disclosure may comprise a plurality of interconnected computing devices.
  • An example of such a distributed computing device 830 is illustrated in Figure 8c.
  • the distributed computing devices 830 may be connected with each other over a network 865.
  • the network 865 may be the internet.
  • the internet is a global system of interconnected computer networks that use the standard Internet protocol suite (TCP/IP - Transmission Control Protocol/Internet Protocol) to serve billions of users worldwide. It is a network of networks that consists of millions of private, public, academic, business, and government networks, of local to global scope, that are linked by a broad array of electronic, wireless and optical networking technologies.
  • the internet carries a vast range of information resources and services, such as the inter-linked hypertext documents of the World Wide Web (WWW).
  • WWW World Wide Web
  • the internet is full of possibilities and variations of how to connect communication nodes and the embodiments disclosed herein are for purely exemplary purposes and should not be construed to be limiting.
  • the computing devices 830 may be connected through a wired connection, or a wireless connection, or any combination of known connection methods for example through dedicated networks or connections.
  • In Figure 8c, there are three computing devices 830, of which one is a desktop computer and two are illustrated as servers.
  • any other suitable number of computing devices 830 could be present in the network, and other examples of such computing devices 830 may be possible, such as a personal computer, desktop or laptop, an internet tablet, a mobile telephone, a smart phone, a personal digital assistant, a server, electronic key, machine-to-machine device (M2M) and a work station to name a few.
  • the at least one computing device 830 may further be embodied as a virtual machine, i.e. a software implementation of a physical computing device.
  • any computing device 830 may be connected to the internet and the number and type of computing devices in Figure 8c should not be construed as limiting.
  • the computing devices 830 may be configured for network communication, either wireless or wired. Alternatively, the computing devices 830 may be configured for both wireless and wired network communication.
  • Figure 8d shows a schematic view of a computer-readable medium 855 encoded with instructions 880 that, when executed on a processor, performs the methods described above.
  • the computer-readable medium 855 is in this embodiment a data disc.
  • the data disc may be a magnetic data storage disc.
  • the data disc may be configured to carry instructions 880 that when loaded into a controller, such as a processor, executes a method or procedure according to the embodiments disclosed above.
  • the data disc may be arranged to be connected to or within and read by a reading device 885, for loading the instructions into the controller.
  • a reading device 885 in combination with one (or several) data disc(s) is a hard drive.
  • the computer-readable medium can also be other mediums such as compact discs, digital video discs, flash memories or other memory technologies commonly used.
  • the data disc may be one type of a tangible computer-readable medium 855.
  • the instructions 880 may also be downloaded to a computer data reading device 875, such as a computer or other device capable of reading computer coded data on a computer-readable medium, by comprising the instructions 880 in a computer-readable signal 870 which is transmitted via a wireless (or wired) interface (for example via the Internet) to the computer data reading device 875 for loading the instructions 880 into a controller.
  • the computer-readable signal 870 is one type of a non-tangible computer-readable medium 855.
  • the instructions may be stored in a memory (not shown explicitly in Figure 8d, but referenced 855 in Figure 8b as part of the computing device 830) of the computer data reading device 875.
  • references to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device whether instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for configuring a neural network, the method being implemented by at least one computing device. The method comprises providing input data to at least one input node. Output data is received from at least one downstream node. The output data is generated by a focus set of the neural network based on the input data. The focus set comprises at least one node that fulfils at least one criterion related to the generation of the output data. The method further comprises receiving a signal comprising information indicating the accuracy of the received output data. The received signal is then compared with a threshold. Based on the result of the comparison, the method further comprises configuring the neural network by adding at least one node to the neural network and connecting the added node(s) to the focus set of nodes within the neural network.
PCT/EP2020/051343, priority date 2019-01-24, filed 2020-01-21: Method and device for constructing a neural network (Procédé et dispositif de construction d'un réseau neuronal), status Ceased, published as WO2020152129A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE1950084-2 2019-01-24
SE1950084 2019-01-24

Publications (1)

Publication Number Publication Date
WO2020152129A1 true WO2020152129A1 (fr) 2020-07-30

Family

ID=69182529

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/051343 WO2020152129A1 (fr) Method and device for constructing a neural network (Procédé et dispositif de construction d'un réseau neuronal)

Country Status (1)

Country Link
WO (1) WO2020152129A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE2151100A1 (en) * 2021-09-03 2023-03-04 IntuiCell AB A computer-implemented or hardware-implemented method for processing data, a computer program product, a data processing system and a first control unit therefor
US11941510B2 (en) 2020-06-16 2024-03-26 IntuiCell AB Computer-implemented or hardware-implemented method of entity identification, a computer program product and an apparatus for entity identification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
German I. Parisi et al.: "Continual Lifelong Learning with Neural Networks: A Review", arXiv, Cornell University Library, 21 February 2018 (2018-02-21), XP081036610, DOI: 10.1016/j.neunet.2019.01.012 *
Fredrik Mäkeläinen et al.: "Efficient Concept Formation in Large State Spaces", 21 July 2018, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, pp. 140-150, ISBN: 978-3-642-17318-9, XP047481655 *
Tianjun Xiao et al.: "Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification", ACM Multimedia, New York, NY, USA, 3 November 2014 (2014-11-03), pp. 177-186, XP058058691, ISBN: 978-1-4503-3063-3, DOI: 10.1145/2647868.2654926 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11941510B2 (en) 2020-06-16 2024-03-26 IntuiCell AB Computer-implemented or hardware-implemented method of entity identification, a computer program product and an apparatus for entity identification
SE2151100A1 (en) * 2021-09-03 2023-03-04 IntuiCell AB A computer-implemented or hardware-implemented method for processing data, a computer program product, a data processing system and a first control unit therefor
WO2023033697A1 (fr) * 2021-09-03 2023-03-09 IntuiCell AB Procédé mis en œuvre par ordinateur ou mis en œuvre par matériel pour traiter des données, produit-programme informatique, système de traitement de données et première unité de commande associée
SE546526C2 (en) * 2021-09-03 2024-11-26 IntuiCell AB A computer-implemented or hardware-implemented method for processing data, a computer program product, a data processing system and a first control unit therefor

Similar Documents

Publication Publication Date Title
US20240354576A1 (en) System and method for self constructing deep neural network design through adversarial learning
US10776668B2 (en) Effective building block design for deep convolutional neural networks using search
Zhang et al. Efficient federated learning for cloud-based AIoT applications
Seo et al. Semantics-native communication with contextual reasoning
CN113785503A (zh) Beam management using adaptive learning
Wu et al. A hybrid constructive algorithm for single-layer feedforward networks learning
CN115866610A (zh) Radio access network resource management based on reinforcement learning (RL) and graph neural networks (GNN)
CN112887239B (zh) Fast and accurate underwater acoustic signal modulation recognition method based on a deep hybrid neural network
EP4425382A1 (fr) Model training method and communication apparatus
WO2020152129A1 (fr) Method and device for constructing a neural network
CN112634992A (zh) Molecular property prediction method, training method for its model, and related apparatus and device
CN111353717A (zh) Blockchain consensus node selection system and method
Perenda et al. Evolutionary optimization of residual neural network architectures for modulation classification
US20200175350A1 (en) System and method for online reconfiguration of a neural network system
CN116129888A (zh) Audio data classification method, apparatus, device and medium
Huang et al. Bayesian-learning-based diffusion least mean square algorithms over networks
CN118485134A (zh) Target search method and device based on incremental reinforcement learning
WO2025074369A1 (fr) System and method for efficient collaborative MARL learning using tensor networks
Deng et al. Efficient real-time recognition model of plant diseases for low-power consumption platform
KR20240126334A (ko) Temperature decay method in differentiable architecture search
US20240378450A1 (en) Methods and apparatuses for training a model based reinforcement learning model
Uykan Clustering-based algorithms for single-hidden-layer sigmoid perceptron
Bodyanskiy et al. Adaptive double neo-fuzzy neuron and its combined learning
Isah et al. Gft-cosmep: Beyond 5g network digital twin failure classification with graph neural network
US20240379091A1 (en) Voice assistant application for automated voice responses by licensed voices

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20701321

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry into the European phase

Ref document number: 20701321

Country of ref document: EP

Kind code of ref document: A1