WO2020177863A1 - Training of algorithms - Google Patents
Training of algorithms
- Publication number
- WO2020177863A1 (PCT/EP2019/055537)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- algorithm
- obtaining
- trainable weights
- arithmetic operations
- perturbation vectors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- the present specification relates to training of algorithms, such as neural networks, having at least some trainable weights.
- An algorithm such as a neural network, can be trained using a loss function. Obstacles to practical hardware implementations of such algorithms include high memory requirements and computational complexity. There remains scope for further developments in this area.
- this specification describes an apparatus comprising: means for initialising parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; means for generating or obtaining training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; means for generating or obtaining a set of random (e.g. pseudo-random) perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm (e.g. all mathematical operations that might introduce an error, such as due to the use of fixed-point arithmetic), wherein said perturbation vectors have a distribution (e.g. a probability density function) depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; means for updating at least some of the trainable weights of the algorithm based on a loss function; and means for repeating the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
- the perturbation vector may be random or pseudo-random perturbations within a defined distribution.
- the perturbation vectors may be evenly distributed within a defined set (wherein the defined set defines the distribution of the perturbation vectors).
- the algorithm may comprise a neural network.
- the algorithm may have a plurality of layers (e.g. a plurality of dense layers), each layer having at least some trainable weights and each layer including one or more arithmetic operations.
- the arithmetic operations may include multiplications.
- the perturbation vectors are applied after each multiplication.
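The idea of applying a perturbation after each multiplication can be sketched as follows. This is an illustrative reading, not the patent's exact formulation: the value of K_F and the choice of a uniform (truncation-style) distribution are assumptions.

```python
import numpy as np

# Sketch: a dense-layer forward pass in which a random perturbation,
# drawn from a distribution chosen to mimic the target device's
# arithmetic error, is added after every multiplication.
K_F = 8  # assumed number of fractional bits on the target device

def perturb(shape, rng):
    # Truncation-style error: uniform on [-2**-K_F, 0] (an assumption).
    return rng.uniform(-2.0 ** -K_F, 0.0, size=shape)

def dense_forward(x, W, b, rng):
    # Each product W[u, v] * x[v] is perturbed before the summation.
    products = W * x[np.newaxis, :] + perturb(W.shape, rng)
    return np.maximum(products.sum(axis=1) + b, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((3, 4))
b = np.zeros(3)
a = dense_forward(x, W, b, rng)
```

Because the perturbation is additive, the layer remains differentiable with respect to W and b, which is what allows gradient-based training to proceed as usual.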
- Some embodiments comprise means for determining an error distribution of the target device for use in defining said distribution of said perturbation vectors.
- the error distribution may be determined by measurement. One example is a histogram approach.
- the distribution of the perturbation vectors may be based on a "best fit" to the measured error distribution.
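One way to obtain such a measured error distribution is the histogram approach mentioned above, which can be sketched as follows. The truncating "device" arithmetic simulated here and the bin count are assumptions for illustration.

```python
import numpy as np

# Sketch of the histogram approach: run the same operations with exact
# arithmetic and with the device's arithmetic (simulated here by
# truncation to K_F fractional bits), then histogram the per-operation
# error to characterise its distribution.
K_F = 5
rng = np.random.default_rng(1)
exact = rng.uniform(-1.0, 1.0, 10_000)
device = np.floor(exact * 2 ** K_F) / 2 ** K_F  # simulated device result
errors = device - exact                          # per-operation errors
hist, edges = np.histogram(errors, bins=8, range=(-2.0 ** -K_F, 0.0))
```

A perturbation distribution for training can then be chosen as a best fit to `hist` (here, approximately uniform on the negative truncation range).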
- the target device may be one of an application-specific integrated circuit and a field-programmable gate array.
- the target device may implement a fixed-point arithmetic.
- training may be conducted using a floating-point arithmetic.
- Some embodiments comprise means for quantizing said trainable weights, such that said weights can only take values within a codebook having a finite number of entries that is a subset of the possible values available during updating.
- the said means for repeating may comprise means for repeating the generation or obtaining of training data, the generating or obtaining of random perturbation vectors, the updating of said trainable weights and the quantizing of said trainable weights until the first condition is reached.
- the said loss function may comprise a penalty term related to the quantization of the trainable weights.
- the said penalty term may include a variable that is adjusted on each repetition of the updating and quantizing such that, on each repetition, more weight is given in the loss function to a difference between the trainable weights and the quantized trainable weights.
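The effect of that penalty term can be illustrated with a toy computation. The names and the quadratic form of the penalty are assumptions; the point is only that a growing variable gives more weight to the gap between the weights and their quantized versions.

```python
import numpy as np

# Toy illustration: the loss adds mu * ||w - q(w)||^2, and mu is
# increased on each repetition, so the difference between the trainable
# weights and the quantized weights is penalised more heavily over time.
def penalized_loss(base_loss, w, w_quantized, mu):
    return base_loss + mu * np.sum((w - w_quantized) ** 2)

w = np.array([0.4, -1.2, 0.9])
wq = np.round(w)  # toy quantizer: nearest integer
early = penalized_loss(1.0, w, wq, mu=0.1)  # early repetition
late = penalized_loss(1.0, w, wq, mu=1.0)   # later repetition, larger mu
```

Once the weights coincide with their quantized values, the penalty vanishes regardless of how large the variable has become.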
- the first condition may be met when a performance metric for the algorithm is met.
- the first condition may be met when a performance metric has been unchanged (e.g. unchanged within a margin) for a defined number of iterations.
- the first condition may comprise a defined number of iterations (e.g. a maximum number of iterations).
- the loss function may be related to one or more of mean squared error, block error rate and categorical cross-entropy.
- the said at least some weights of the algorithm may be trained using stochastic gradient descent (or methods based on stochastic gradient descent or some similar algorithm).
- the algorithm implements at least part of a transmission system comprising a transmitter, a channel and a receiver.
- the said means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the performance of the apparatus.
- this specification describes a method (e.g. a method of training an algorithm, such as a neural network) comprising: initialising parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; generating or obtaining training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; generating or obtaining a set of random (e.g. pseudo-random) perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm (e.g. all mathematical operations that might introduce an error), wherein said perturbation vectors have a distribution (e.g. a probability density function) depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; updating at least some of the trainable weights of the algorithm based on a loss function; and repeating the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
- the perturbation vector may be random or pseudo-random perturbations within a defined distribution.
- the perturbation vectors may be evenly distributed within a defined set (wherein the defined set defines the distribution of the perturbation vectors).
- the algorithm may comprise a neural network.
- the algorithm may have a plurality of layers (e.g. a plurality of dense layers), each layer having at least some trainable weights and each layer including one or more arithmetic operations.
- Some embodiments comprise determining an error distribution of the target device for use in defining said distribution of said perturbation vectors.
- the error distribution may be determined by measurement.
- Some embodiments comprise quantizing said trainable weights, such that said weights can only take values within a codebook having a finite number of entries that is a subset of the possible values available during updating.
- the said loss function may comprise a penalty term related to the quantization of the trainable weights.
- the said penalty term may include a variable that is adjusted on each repetition of the updating and quantizing such that, on each repetition, more weight is given in the loss function to a difference between the trainable weights and the quantized trainable weights.
- the first condition may be met when a performance metric for the algorithm is met.
- the first condition may be met when a performance metric has been unchanged (e.g. unchanged within a margin) for a defined number of iterations.
- the first condition may comprise a defined number of iterations (e.g. a maximum number of iterations).
- the said at least some weights of the algorithm may be trained using stochastic gradient descent (or methods based on stochastic gradient descent).
- this specification describes any apparatus configured to perform any method as described with reference to the second aspect.
- this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the second aspect.
- this specification describes a computer program comprising instructions for causing an apparatus to perform at least the following: initialise parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; generate or obtain training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; generate or obtain a set of random perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm, wherein said perturbation vectors have a distribution depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; update at least some of the trainable weights of the algorithm based on a loss function; and repeat the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
- this specification describes a computer-readable medium (such as a non-transitory computer readable medium) comprising program instructions stored thereon for performing at least the following: initialising parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; generating or obtaining training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; generating or obtaining a set of random (e.g. pseudo-random) perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm (e.g. all mathematical operations that might introduce an error), wherein said perturbation vectors have a distribution (e.g. a probability density function) depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; updating at least some of the trainable weights of the algorithm based on a loss function; and repeating the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
- this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: initialise parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; generate or obtain training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; generate or obtain a set of random perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm, wherein said perturbation vectors have a distribution depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; update at least some of the trainable weights of the algorithm based on a loss function; and repeat the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
- this specification describes an apparatus comprising: an initialisation module for initialising parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; a training data module for generating or obtaining training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; a perturbation vector generating module for generating or obtaining a set of random (e.g. pseudo-random) perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm (e.g. all mathematical operations that might introduce an error, such as due to the use of fixed-point arithmetic), wherein said perturbation vectors have a distribution (e.g. a probability density function) depending on properties of a target device implementing said algorithm and depending on the arithmetic operations.
- the perturbation vector may be random or pseudo-random perturbations within a defined distribution.
- the perturbation vectors may be evenly distributed within a defined set (wherein the defined set defines the distribution of the perturbation vectors).
- FIG. 1 is a block diagram of a system in accordance with an example embodiment
- FIG. 2 is a block diagram of an example deep neural network
- FIG. 3 is a flow chart showing an algorithm in accordance with an example
- FIG. 4 is a flow chart showing an algorithm in accordance with an example
- FIG. 5 is a flow chart showing an algorithm in accordance with an example
- FIG. 6 is a block diagram of an example communication system in which example embodiments may be implemented.
- FIG. 7 is a block diagram of components of a system in accordance with an example embodiment.
- FIGS. 8A and 8B show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which when run by a computer perform operations according to embodiments.
- FIG. 1 is a block diagram of a system, indicated generally by the reference numeral 10, in accordance with an example embodiment.
- the system 10 comprises a training data module 12, an algorithm 14 and a loss function module 16.
- the algorithm 14 includes a number of trainable weights and implements at least some arithmetic functions.
- the training data module 12 and the loss function module 16 are used to train the algorithm 14, as described further below.
- the training data module 12 provides training data including inputs and corresponding desired outputs for training the algorithm 14.
- the inputs are provided to the algorithm 14 and the outputs provided to the loss function module 16.
- the output of the algorithm 14 in response to the inputs received from the training data module 12 is also provided to the loss function module, such that the loss function module can generate an output based on a comparison of actual outputs and expected outputs of the algorithm 14.
- the trainable weights of the algorithm 14 are trained based on the output of the loss function module 16, for example using stochastic gradient descent, or some similar approach.
- the algorithm 14 may be implemented using a neural network, such as a deep neural network.
- FIG. 2 shows an example deep neural network 20 comprising a plurality of interconnected nodes arranged in layers.
- a neural network such as the network 20 can be trained by adjusting the connections between nodes and the relative weights of those connections.
- Neural networks are often trained and exploited on computationally powerful platforms with graphics processing unit (GPU) acceleration, supporting high precision floating point arithmetic (e.g. 32-bit or 64-bit floating point arithmetic). Such hardware may not be available for many applications. Accordingly, the neural networks implementing the algorithm 14 of the system 10 described above may be compressed to meet practical constraints. This may be achieved by using a more compact representation of the neural network parameters, at the cost of reduced precision.
- compression of weights and/or biases of the neural networks may be achieved through quantization, such that the weights are forced to take values within a codebook with a finite number of entries (e.g. at a lower precision than that provided by a training module).
- the capabilities of the systems used for training an algorithm (such as the algorithm 14) may be different to the capabilities of the algorithm in use.
- FIG. 3 is a flow chart showing an algorithm, indicated generally by the reference numeral 30, in accordance with an example embodiment.
- the algorithm 30 starts at operation 31, where the algorithm 14 described above is trained.
- the algorithm 14 may be a neural network that is trained in a supervised manner using a stochastic gradient descent (SGD) algorithm.
- parameters of the algorithm 14 are quantized.
- the algorithm 14 can then be implemented using the quantized parameters. This approach can significantly reduce memory requirements within the algorithm 14 and the complexity of arithmetic operations implemented by the algorithm 14, but typically results in reduced precision.
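The train-then-quantize step described above can be sketched as follows. The ternary codebook is an example only; any finite codebook could be used.

```python
import numpy as np

# Minimal sketch: after training, each weight is snapped to the nearest
# entry of a finite codebook, reducing memory and arithmetic complexity
# at the cost of precision.
def quantize(weights, codebook):
    codebook = np.asarray(codebook)
    idx = np.argmin(np.abs(weights[..., None] - codebook), axis=-1)
    return codebook[idx]

w = np.array([0.7, -0.1, -0.9, 0.2])  # weights after training
wq = quantize(w, [-1.0, 0.0, 1.0])    # weights as deployed
```

The precision loss noted in the text comes precisely from this snapping: the deployed weights differ from the trained ones by up to half the codebook spacing.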
- Example embodiments are described below with reference to a neural network including a number of dense layers (such as the neural network 20 referred to above). It should be noted that this is one example implementation and that other embodiments are possible.
- a dense layer may be made of U ≥ 1 units, each implementing the following equation:
- a_u = σ( Σ_v w_{u,v} x_v + b_u )     (1)
- where a_u is the u-th activation (i.e. the output of the u-th unit), w_{u,v} and b_u are the unit's trainable weights and bias, x is the layer's input, and σ is an activation function (e.g. ReLU, sigmoid etc.).
- the basic arithmetic operations required by the implementation of such a layer typically include addition, multiplication and an activation function (e.g. identity, ReLU etc.).
- convolutional or recurrent layers require only simple arithmetic operations such as the addition, multiplication and activation functions referred to above.
- each operation potentially introduces an error.
- for example, the target hardware may use a fixed-point representation with K_I bits for the integer part, K_F bits for the fractional part, and one sign bit, so that each number takes the form ± Σ_{i=−K_F}^{K_I−1} b_i 2^i, where the bits b_i take values in {0, 1}. The set of values representable in this format forms a codebook C.
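A quantization function for such a fixed-point format can be sketched as follows. The use of truncation (rather than rounding) and the particular bit widths are assumptions.

```python
import numpy as np

# Hypothetical quantization function Q for a fixed-point format with K_I
# integer bits, K_F fractional bits and one sign bit: values are clipped
# to the representable range and truncated to the fractional resolution.
K_I, K_F = 3, 4
SCALE = 2.0 ** K_F
MAX_VALUE = 2.0 ** K_I - 2.0 ** -K_F  # largest codebook entry

def Q(x):
    x = np.clip(x, -MAX_VALUE, MAX_VALUE)
    return np.floor(x * SCALE) / SCALE  # truncate toward -infinity

q = Q(np.array([1.23, -0.517, 100.0]))
```

Every output of `Q` lies in the codebook C of the format, which is what lets a floating-point training environment mimic the target arithmetic.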
- Assuming one has some knowledge of the targeted hardware and arithmetic, it may be possible to define a quantization function Q which maps each real number to an element of the codebook C, in order to simulate the targeted hardware arithmetic. With knowledge of Q, one could define the neural network, at training, so that it behaves like the targeted hardware: for example, a dense layer as defined in equation (1) above may be implemented at training with Q applied after each arithmetic operation.
- a 2-dimensional convolutional layer may be implemented at training as:
- Y^l_{i,j} = Σ_{m=1}^{M} Σ_{n=1}^{N} Σ_{q=1}^{Q} W^l_{m,n,q} · X_{i+m, j+n, q} + P^l_{i,j}
- where Y^l_{i,j} is the output of the l-th filter with co-ordinate (i, j), e.g. a pixel in a picture, M and N are the filter width and length, Q is the number of filters in the layer's input, X is the layer's input, W^l are the weights of the l-th filter, and P^l is the added perturbation.
- non-trivially implemented activation functions, such as hyperbolic tangent and sigmoid, are often implemented by piece-wise linear approximations, for which perturbations can also be added for increased robustness.
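A piece-wise linear approximation of this kind can be sketched as follows. The choice of 17 breakpoints on [−4, 4] is arbitrary and for illustration only.

```python
import numpy as np

# Sketch: tanh approximated by linear interpolation between sampled
# breakpoints, as might be done on hardware where the exact function is
# costly. Outside the breakpoint range the approximation saturates.
xs = np.linspace(-4.0, 4.0, 17)
ys = np.tanh(xs)

def pwl_tanh(x):
    # Linear interpolation between breakpoints; clamps beyond the ends.
    return np.interp(x, xs, ys)

approx = pwl_tanh(np.array([0.0, 1.3, 10.0]))
```

Perturbations of the kind described above could be added to the output of `pwl_tanh` during training to make the network robust to the approximation error.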
- since a neural network obtained using the principles of random perturbation outlined above is differentiable, such a neural network can be trained using stochastic gradient descent (SGD) or some similar methodology, as follows.
- FIG. 4 is a flow chart showing an algorithm, indicated generally by the reference numeral 40, in accordance with an example embodiment.
- the algorithm 40 shows an example training method for an algorithm (such as a neural network) having at least some trainable weights, wherein the algorithm is to be implemented on a target device (such as an application-specific integrated circuit or a field-programmable gate array).
- the algorithm 40 starts at operation 41 where parameters of the algorithm 14 are initialised.
- the initialised parameters may be set to θ₀ and a variable j (if used) set to 0.
- the training parameters θ may be initialised randomly.
- training data is obtained.
- the training data may include a number of inputs together with target desired outputs of the algorithm in response to said inputs.
- the input-output pairs may be generated randomly.
- the perturbation vectors may be applied after at least some arithmetic operations of the algorithm being trained.
- the perturbation vectors may be random (e.g. random within a defined distribution).
- the perturbation vectors may have a distribution depending on properties of a target device
- a loss function estimate is generated based on the target outputs of the input-output pairs and the corresponding estimated outputs of the algorithm.
- parameters of the algorithm are updated based on the loss function (e.g. using stochastic gradient descent or some similar process).
- it is determined whether the algorithm 40 is complete (e.g. by determining whether a first condition has been reached). If so, the algorithm terminates at operation 47. If not, the variable j (if used) is incremented, the algorithm returns to operation 42, and the training process is repeated.
- the operation 46 may be implemented in many ways. For example, the algorithm 40 may be deemed to be complete when a defined number of iterations (e.g. as indicated by the variable j) have been carried out.
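The loop of FIG. 4 can be sketched end-to-end for a toy linear model. All names, the MSE loss, the learning rate and the iteration budget are assumptions made for illustration.

```python
import numpy as np

# Sketch of the training loop: initialise (operation 41), obtain data
# (42), draw perturbations (43), compute a loss (44), update by SGD
# (45), and stop after a fixed number of iterations (condition, 46).
rng = np.random.default_rng(0)
K_F = 6
w = rng.standard_normal(3)             # operation 41: initialise
true_w = np.array([0.5, -1.0, 2.0])    # generates the target outputs

for j in range(500):                   # operation 46: fixed budget
    x = rng.standard_normal((16, 3))   # operation 42: inputs
    y = x @ true_w                     # operation 42: target outputs
    p = rng.uniform(-2.0 ** -K_F, 0.0, x.shape)  # operation 43
    y_hat = (x * w + p).sum(axis=1)    # perturbed forward pass
    grad = 2.0 * (y_hat - y) @ x / len(x)        # operations 44-45: MSE
    w -= 0.1 * grad                    # SGD step
```

Despite the injected perturbations, the weights converge close to the generating weights, illustrating that training remains effective while robustness to arithmetic error is baked in.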
- the algorithm 40 may be used to train a neural network (or some other algorithm) to be more robust to arithmetic errors, and therefore more suitable to the hardware on which it is deployed.
- the hardware may be an ASIC or an FPGA, although this is not essential to all embodiments.
- the target hardware may implement a fixed-point arithmetic while the algorithm may, for example, be trained using a floating-point arithmetic.
- the choice of a probability density function (pdf) from which the perturbation of operation 43 is drawn may depend on a number of factors, such as a quantization function and assumptions on the potential errors.
- the error is uniformly distributed on [−2^(−K_F), 0].
- Quantizing the weights can significantly reduce the complexity of an algorithm. For example, by forcing the weights to take values within the codebook {−1, 0, 1}, all multiplications can be reduced to a zeroing or a sign change. Similarly, by forcing the weights to take values within a codebook of powers of two {±2^n}, all multiplications can be reduced to bit shifts.
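Why these codebooks cut complexity can be shown directly. Integer activations are assumed for the shift example; the function names are illustrative.

```python
# Multiplication by a ternary weight is a zeroing or a sign change, and
# multiplication by a power-of-two weight is a bit shift: neither needs
# a hardware multiplier.
def mul_ternary(x, w):
    # w is restricted to the codebook {-1, 0, 1}
    return x if w == 1 else (-x if w == -1 else 0)

def mul_pow2(x, n):
    # weight is 2**n; x is assumed to be an integer
    return x << n if n >= 0 else x >> -n

r1 = mul_ternary(7, -1)  # sign change instead of a multiply
r2 = mul_pow2(6, 2)      # 6 * 4 as a left shift
```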
- FIG. 5 is a flow chart showing an algorithm, indicated generally by the reference numeral 50, in accordance with an example embodiment.
- the algorithm 50 combines the teaching described above with learning-compression principles.
- the algorithm 50 starts at operation 51, where an algorithm (such as the algorithm 14) is initialised, thereby providing a means for initialising parameters of an algorithm.
- an algorithm such as the algorithm 14
- Operations 52 to 54 comprise a learning operation 52, a compression operation 53 and a parameter updating operation 54 that collectively implement a learning-compression algorithm, as discussed further below.
- at operation 55, it is determined whether or not the learning-compression algorithm is complete. If not, the operations 52 to 54 are repeated.
- the algorithm 50 terminates at operation 56.
- the learning operation 52 adjusts the weights of the algorithm 14 (e.g. of a neural network) in the non-quantized space.
- the operation 52 may be similar to the algorithm 40 described above, but includes an additional penalty term, related to the allowed codebook values in the quantized space.
- the learning operation 52 implements a means for updating trainable parameters of the transmission system based on a loss function.
- L is the loss function minimized during training, for example the cross-entropy.
- the learning operation 52 solves the following optimization problem:
- μ is a parameter which increases as the training progresses (so that the quantization penalty is weighted more heavily);
- the compression operation 53 implements a means for quantizing trainable parameters of the transmission system.
- the compression operation 53 may perform an element-wise quantization, mapping each trainable parameter to the nearest element of the codebook.
- the initialisation operation 51 may comprise:
- the learning operation 52 may comprise:
- the compression operation 53 may comprise:
- the update parameters operation 54 may comprise:
- the complete operation 55 may comprise:
- the algorithm 50 converges to a local solution as μ → ∞. This could be achieved, for example, by following a multiplicative schedule μ^(i) ← μ^(i−1) · a, where a > 1 and μ^(0) are parameters of the algorithm. It should be noted that the sequence μ^(i) can be generated in many ways. Using a multiplicative schedule (as discussed above), the initial value μ^(0), as well as a, are optimisation parameters (and may be optimised as part of the training operation 52 described above). It should also be noted that the batch size N as well as the learning rate (and possibly other parameters of the chosen SGD variant, e.g. ADAM, RMSProp, Momentum) could be optimization parameters of the algorithms 40 and 50 described above.
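A learning-compression iteration of the kind described for FIG. 5 can be sketched on a toy problem. The quadratic coupling between the weights and their quantized versions, the toy quadratic loss, the step sizes and the schedule constants are all assumptions.

```python
import numpy as np

# Sketch of a learning-compression loop. L-step: minimise
# loss + mu * ||w - wq||^2 by gradient descent; C-step: project onto the
# codebook; then mu follows the multiplicative schedule mu <- mu * a.
rng = np.random.default_rng(0)
codebook = np.array([-1.0, 0.0, 1.0])
target = np.array([0.8, -0.2, 1.4, -1.1])  # minimiser of the toy loss
w = rng.standard_normal(4)                  # operation 51: initialise
mu, a = 0.05, 1.5

def project(v):
    # Compression (operation 53): nearest codebook entry, element-wise.
    return codebook[np.argmin(np.abs(v[:, None] - codebook), axis=1)]

for i in range(12):
    wq = project(w)                         # operation 53
    for _ in range(50):                     # operation 52: learning step
        grad = 2.0 * (w - target) + 2.0 * mu * (w - wq)
        w -= 0.05 * grad
    mu *= a                                 # operation 54: grow mu
```

As μ grows, the learned weights are pulled toward their quantized projections, so the final quantized network is close to a local optimum of the unconstrained loss.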
- the principles described herein may be used in many applications (e.g. relating to many example algorithms 14). By way of example, the principles described herein may be used in a communication system.
- FIG. 6 is a block diagram of an example communication system, indicated generally by the reference numeral 60, in which example embodiments may be implemented.
- the system 60 includes a transmitter 62, a channel 63 and a receiver 64. Viewed at a system level, the system 60 converts an input symbol s (also called a message) received at the input of the transmitter 62 into an output symbol ŝ at the output of the receiver 64.
- the transmitter 62 implements a transmitter algorithm.
- the receiver 64 implements a receiver algorithm.
- the algorithms of the transmitter 62 and the receiver 64 may be trained using the principles described herein in order to optimise the performance of the system 60 as a whole.
- the transmitter 62 may include a dense layer of one or more units 70, 71 (e.g. including one or more neural networks).
- the dense layers 70, 71 may include an embedding module.
- the modules within the transmitter 62 are provided by way of example and modifications are possible.
- the receiver 64 may include a dense layer of one or more units 74, 75 (e.g. including one or more neural networks), a softmax module 76 and an arg max module 77.
- the output of the softmax module 76 is a probability vector that is provided to the input of the arg max module 77.
- the modules within the receiver 64 are provided by way of example and modifications are possible.
- the system 60 therefore provides an autoencoder implementing an end-to-end communication system.
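The autoencoder view can be sketched with an untrained toy model. The embedding dimensions, the matched-filter style receiver and the noise level are all assumptions for illustration, not the patent's architecture.

```python
import numpy as np

# Toy sketch of the end-to-end system of FIG. 6: a message index s is
# mapped to a normalised symbol by the transmitter, sent over a noisy
# channel, and recovered by the receiver via softmax and arg max.
rng = np.random.default_rng(0)
M = 4                               # number of possible messages
E = rng.standard_normal((M, 2))     # transmitter embedding (untrained)

def transmit(s):
    x = E[s]
    return x / np.linalg.norm(x)    # energy normalisation

def receive(y):
    logits = E @ y                  # correlate with each candidate
    p = np.exp(logits) / np.exp(logits).sum()  # softmax: probabilities
    return int(np.argmax(p))        # arg max: message estimate

s = 2
y = transmit(s) + 0.01 * rng.standard_normal(2)  # near-noiseless channel
s_hat = receive(y)
```

Training the transmitter and receiver jointly against a loss such as categorical cross-entropy, with perturbation vectors injected as described above, would adapt this system to its target hardware.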
- the autoencoder can be trained with respect to an arbitrary loss function that is relevant for a certain performance metric, such as block error rate (BLER).
- one obstacle to practical hardware implementation of such systems is the high memory requirement and computational complexity of the involved neural networks.
- Hardware acceleration, e.g. using graphical processing units (GPUs), may be provided to achieve a reasonable inference time for the neural network (or some other parametric algorithm).
- Such hardware may not be available for some systems (such as some implementations of the communication system 60).
- neural networks may be compressed to meet practical constraints. This may be achieved by using a more compact representation of neural network parameters, at the cost of reduced precision.
- compression of weights and/or biases of the neural networks may be achieved through quantization, such that the weights are forced to take values within a codebook with a finite number of entries (e.g. at a lower precision than that provided by a training module).
- each weight may take a binary value (e.g. either -1 or +1).
- FIG. 7 is a schematic diagram of components of one or more of the modules described previously (e.g. the transmitter or receiver neural networks), which hereafter are referred to generically as processing systems 110.
- a processing system 110 may have a processor 112, a memory 114 closely coupled to the processor and comprised of a RAM 124 and ROM 122, and, optionally, hardware keys 120 and a display 128.
- the processing system 110 may comprise one or more network interfaces 118 for connection to a network, e.g. a modem which may be wired or wireless.
- the processor 112 is connected to each of the other components in order to control operation thereof.
- the memory 114 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD).
- the ROM 122 of the memory 114 stores, amongst other things, an operating system 125 and may store software applications 126.
- the RAM 124 of the memory 114 is used by the processor 112 for the temporary storage of data.
- the operating system 125 may contain code which, when executed by the processor, implements aspects of the algorithms 30, 40 and 50.
- the processor 112 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.
- the processing system 110 may be a standalone computer, a server, a console, or a network thereof. In some embodiments, the processing system 110 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications.
- the processing system 110 may be in
- FIGS. 8A and 8B show tangible media, respectively a removable memory unit 165 and a compact disc (CD) 168, storing computer-readable code which, when run by a computer, may perform methods according to embodiments described above.
- the removable memory unit 165 may be a memory stick, e.g. a USB memory stick, having internal memory 166 storing the computer-readable code.
- the memory 166 may be accessed by a computer system via a connector 167.
- the CD 168 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.
- Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic.
- the software, application logic and/or hardware may reside on memory, or any computer media.
- the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media.
- a "memory" or "computer-readable medium" may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
- specialised circuits such as field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor, or firmware such as the programmable content of a hardware device, whether as instructions for a processor or as configured or configuration settings for a fixed function device, gate array, or programmable logic device.
- circuitry refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s), or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
- example embodiments have generally been described above with reference to populations of neural networks. This is not essential to all embodiments.
- other forms of algorithms may be used, such as a parametric algorithm whose trainable parameters (e.g. the parameters of a neural network) are updated during training.
- a population of algorithms may be updated by updating one or more parameters and/or one or more structures of one or more of the algorithms of a population.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Disclosed are an apparatus, a method and a computer program comprising: initialising parameters of an algorithm; generating or obtaining training data comprising inputs and desired target outputs for training said algorithm; generating or obtaining a set of random perturbation vectors to be applied after at least some arithmetic operations of the algorithm; updating at least some trainable weights of the algorithm based on a loss function; and repeating the generating or obtaining of training data, the generating or obtaining of random perturbation vectors and the updating of said trainable weights until a first condition is reached.
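The abstract's steps can be sketched as a training loop in which random perturbation vectors are added after the algorithm's arithmetic operations (the toy linear model, learning rate and stopping condition below are all illustrative assumptions, not the claimed implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) initialise parameters of a toy trainable algorithm y = w * x
w = rng.normal()

def forward(x, w, perturbation):
    # (3) a perturbation vector applied after the arithmetic operation,
    # e.g. modelling quantization noise that will be present at inference
    return x * w + perturbation

for step in range(2000):
    # (2) generate training data: inputs and desired target outputs (y = 3x)
    x = rng.normal(size=32)
    y_target = 3.0 * x
    # (3) generate a random perturbation vector for this batch
    p = rng.uniform(-0.05, 0.05, size=32)
    # (4) update the trainable weight based on a squared-error loss
    y = forward(x, w, p)
    grad = np.mean(2.0 * (y - y_target) * x)
    w -= 0.1 * grad
    # (5) repeat until a first condition is met (here: a fixed step budget)

print(abs(w - 3.0) < 0.1)  # the weight converges close to the target
```

Training with the perturbations present makes the learned weights robust to the same kind of noise (e.g. quantization error) at inference time.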
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2019/055537 WO2020177863A1 (fr) | 2019-03-06 | 2019-03-06 | Training of algorithms |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2019/055537 WO2020177863A1 (fr) | 2019-03-06 | 2019-03-06 | Training of algorithms |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020177863A1 (fr) | 2020-09-10 |
Family
ID=65718002
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2019/055537 Ceased WO2020177863A1 (fr) | 2019-03-06 | 2019-03-06 | Apprentissage d'algorithmes |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2020177863A1 (fr) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160328647A1 (en) * | 2015-05-08 | 2016-11-10 | Qualcomm Incorporated | Bit width selection for fixed point neural networks |
- 2019-03-06: WO application PCT/EP2019/055537 filed (published as WO2020177863A1; status: not active, Ceased)
Non-Patent Citations (5)
| Title |
|---|
| CHAIM BASKIN ET AL: "NICE: Noise Injection and Clamping Estimation for Neural Network Quantization", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 29 September 2018 (2018-09-29), XP081422330 * |
| CHAIM J BASKIN ET AL: "UNIQ: Uniform Noise Injection for Non-uniform Quantization of Neural Networks", 2 October 2018 (2018-10-02), XP055641141, Retrieved from the Internet <URL:https://arxiv.org/pdf/1804.10969.pdf> [retrieved on 20191111], DOI: 10.1016/j.addbeh.2016.11.010 * |
| DARRYL D LIN ET AL: "Fixed Point Quantization of Deep Convolutional Networks", 2 June 2016 (2016-06-02), pages 1 - 10, XP055561866, Retrieved from the Internet <URL:https://arxiv.org/pdf/1511.06393.pdf> [retrieved on 20190226] * |
| FAYCAL AIT AOUDIA ET AL: "Towards Hardware Implementation of Neural Network-based Communication Algorithms", 2019 IEEE 20TH INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS (SPAWC), 19 February 2019 (2019-02-19), pages 1 - 5, XP055641018, ISBN: 978-1-5386-6528-2, DOI: 10.1109/SPAWC.2019.8815398 * |
| HAO LI ET AL: "Training Quantized Nets: A Deeper Understanding", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 June 2017 (2017-06-07), XP081281619 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4016879A1 (fr) * | 2020-12-16 | 2022-06-22 | Nokia Technologies Oy | Estimating delay spread and Doppler spread |
| US11924007B2 (en) | 2020-12-16 | 2024-03-05 | Nokia Technologies Oy | Estimating delay spread and doppler spread |
| CN114626635A (zh) * | 2022-04-02 | 2022-06-14 | 北京乐智科技有限公司 | Steel logistics cost prediction method and system based on a hybrid neural network |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113424202B (zh) | Adjusting activation compression for neural network training | |
| Child | Very deep VAEs generalize autoregressive models and can outperform them on images | |
| EP3915056B1 (fr) | Neural network activation compression with non-uniform mantissas | |
| CN108345939B (zh) | Neural network based on fixed-point arithmetic | |
| EP3906616B1 (fr) | Neural network activation compression with outlier block floating-point | |
| US10789734B2 | Method and device for data quantization | |
| US11657254B2 | Computation method and device used in a convolutional neural network | |
| US12159228B2 | End-to-end learning in communication systems | |
| WO2020142223A1 (fr) | Delayed quantization of parameters during training using a machine learning tool | |
| US11082264B2 | Learning in communication systems | |
| CN110663048A (zh) | Execution method, execution device, learning method, learning device, and program for deep neural networks | |
| CN114416351B (zh) | Resource allocation method, apparatus, device, medium and computer program product | |
| EP4303770A1 (fr) | Identifying one or more quantisation parameters for quantising values to be processed by a neural network | |
| CN115668229A (zh) | Low-resource computation blocks for trained neural networks | |
| WO2020177863A1 (fr) | Training of algorithms | |
| US12107679B2 | Iterative detection in a communication system | |
| CN113177634B (zh) | Image analysis system, method and device based on neural network input/output quantization | |
| US20220083870A1 | Training in Communication Systems | |
| US20210125064A1 | Method and apparatus for training neural network | |
| CN114548360A (zh) | Method for updating an artificial neural network | |
| US20220405561A1 | Electronic device and controlling method of electronic device | |
| US9355363B2 | Systems and methods for virtual parallel computing using matrix product states | |
| CN116108915A (zh) | Non-transitory computer-readable recording medium, operation method, and operation device | |
| JP2024508596A (ja) | Hierarchical shared-exponent floating-point data type | |
| CN114118358A (zh) | Image processing method and apparatus, electronic device, medium, and program product | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19709688 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 19709688 Country of ref document: EP Kind code of ref document: A1 |