
WO2020177863A1 - Training of algorithms - Google Patents

Training of algorithms

Info

Publication number
WO2020177863A1
WO2020177863A1 · PCT/EP2019/055537
Authority
WO
WIPO (PCT)
Prior art keywords
algorithm
obtaining
trainable weights
arithmetic operations
perturbation vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2019/055537
Other languages
English (en)
Inventor
Faycal AIT AOUDIA
Jakob Hoydis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to PCT/EP2019/055537
Publication of WO2020177863A1
Anticipated expiration
Legal status: Ceased (current)


Classifications

    • G PHYSICS — G06 COMPUTING OR CALCULATING; COUNTING — G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS — G06N3/00 Computing arrangements based on biological models — G06N3/02 Neural networks
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06N3/09 Supervised learning
    • G06N3/048 Activation functions

Definitions

  • the present specification relates to training of algorithms, such as neural networks, having at least some trainable weights.
  • An algorithm such as a neural network, can be trained using a loss function. Obstacles to practical hardware implementations of such algorithms include high memory requirements and computational complexity. There remains scope for further developments in this area.
  • this specification describes an apparatus comprising: means for initialising parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; means for generating or obtaining training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; means for generating or obtaining a set of random (e.g. pseudo-random) perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm (e.g. all mathematical operations that might introduce an error, such as due to the use of fixed-point arithmetic), wherein said perturbation vectors have a distribution (e.g. a probability density function) depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; means for updating at least some of the trainable weights of the algorithm based on a loss function; and means for repeating the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
  • the perturbation vector may be random or pseudo-random perturbations within a defined distribution.
  • the perturbation vectors may be evenly distributed within a defined set (wherein the defined set defines the distribution of the perturbation vectors).
  • the algorithm may comprise a neural network.
  • the algorithm may have a plurality of layers (e.g. a plurality of dense layers), each layer having at least some trainable weights and each layer including one or more arithmetic operations.
  • the arithmetic operations may include multiplications.
  • the perturbation vectors are applied after each multiplication.
  • Some embodiments comprise means for determining an error distribution of the target device for use in defining said distribution of said perturbation vectors.
  • the error distribution may be determined by measurement. One example is a histogram approach.
  • the distribution of the perturbation vectors may be based on a "best fit" to the measured error distribution.
  • the target device may be one of an application-specific integrated circuit and a field-programmable gate array.
  • the target device may implement a fixed-point arithmetic.
  • training may be conducted using a floating-point arithmetic.
  • Some embodiments comprise means for quantizing said trainable weights, such that said weights can only take values within a codebook having a finite number of entries that is a subset of the possible values available during updating.
  • the said means for repeating may comprise means for repeating the generation or obtaining of training data, the generating or obtaining of random perturbation vectors, the updating of said trainable weights and the quantizing of said trainable weights until the first condition is reached.
  • the said loss function may comprise a penalty term related to the quantization of the trainable weights.
  • the said penalty term may include a variable that is adjusted on each repetition of the updating and quantizing such that, on each repetition, more weight is given in the loss function to a difference between the trainable weights and the quantized trainable weights.
  • the first condition may be met when a performance metric for the algorithm is met.
  • the first condition may be met when a performance metric has been unchanged (e.g. unchanged within a margin) for a defined number of iterations.
  • the first condition may comprise a defined number of iterations (e.g. a maximum number of iterations).
  • the loss function may be related to one or more of mean squared error, block error rate and categorical cross-entropy.
  • the said at least some weights of the algorithm may be trained using stochastic gradient descent (or methods based on stochastic gradient descent or some similar algorithm).
  • the algorithm implements at least part of a transmission system comprising a transmitter, a channel and a receiver.
  • the said means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the performance of the apparatus.
  • this specification describes a method (e.g. a method of training an algorithm, such as a neural network) comprising: initialising parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; generating or obtaining training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; generating or obtaining a set of random (e.g. pseudo-random) perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm (e.g. all mathematical operations that might introduce an error), wherein said perturbation vectors have a distribution (e.g. a probability density function) depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; updating at least some of the trainable weights of the algorithm based on a loss function; and repeating the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
  • the perturbation vector may be random or pseudo-random perturbations within a defined distribution.
  • the perturbation vectors may be evenly distributed within a defined set (wherein the defined set defines the distribution of the perturbation vectors).
  • the algorithm may comprise a neural network.
  • the algorithm may have a plurality of layers (e.g. a plurality of dense layers), each layer having at least some trainable weights and each layer including one or more arithmetic operations.
  • Some embodiments comprise determining an error distribution of the target device for use in defining said distribution of said perturbation vectors.
  • the error distribution may be determined by measurement.
  • Some embodiments comprise quantizing said trainable weights, such that said weights can only take values within a codebook having a finite number of entries that is a subset of the possible values available during updating.
  • the said loss function may comprise a penalty term related to the quantization of the trainable weights.
  • the said penalty term may include a variable that is adjusted on each repetition of the updating and quantizing such that, on each repetition, more weight is given in the loss function to a difference between the trainable weights and the quantized trainable weights.
  • the first condition may be met when a performance metric for the algorithm is met.
  • the first condition may be met when a performance metric has been unchanged (e.g. unchanged within a margin) for a defined number of iterations.
  • the first condition may comprise a defined number of iterations (e.g. a maximum number of iterations).
  • the said at least some weights of the algorithm may be trained using stochastic gradient descent (or methods based on stochastic gradient descent).
  • this specification describes any apparatus configured to perform any method as described with reference to the second aspect.
  • this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the second aspect.
  • this specification describes a computer program comprising instructions for causing an apparatus to perform at least the following: initialise parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; generate or obtain training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; generate or obtain a set of random perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm, wherein said perturbation vectors have a distribution depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; update at least some of the trainable weights of the algorithm based on a loss function; and repeat the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
  • this specification describes a computer-readable medium (such as a non-transitory computer-readable medium) comprising program instructions stored thereon for performing at least the following: initialising parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; generating or obtaining training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; generating or obtaining a set of random (e.g. pseudo-random) perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm (e.g. all mathematical operations that might introduce an error), wherein said perturbation vectors have a distribution (e.g. a probability density function) depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; updating at least some of the trainable weights of the algorithm based on a loss function; and repeating the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
  • this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: initialise parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; generate or obtain training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; generate or obtain a set of random perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm, wherein said perturbation vectors have a distribution depending on properties of a target device implementing said algorithm and depending on the arithmetic operations; update at least some of the trainable weights of the algorithm based on a loss function; and repeat the generation or obtaining of training data, generation or obtaining of random perturbation vectors and updating of said trainable weights until a first condition is reached.
  • this specification describes an apparatus comprising: an initialisation module for initialising parameters of an algorithm, wherein said algorithm has at least some trainable weights and at least some arithmetic operations; a training data module for generating or obtaining training data comprising inputs and desired target outputs for training said algorithm, wherein the target outputs are expected outputs of the algorithm in response to the respective inputs; a perturbation vector generating module for generating or obtaining a set of random (e.g. pseudo-random) perturbation vectors to be applied after at least some of said arithmetic operations of the algorithm (e.g. all mathematical operations that might introduce an error, such as due to the use of fixed-point arithmetic), wherein said perturbation vectors have a distribution (e.g. a probability density function) depending on properties of a target device implementing said algorithm and depending on the arithmetic operations.
  • the perturbation vector may be random or pseudo-random perturbations within a defined distribution.
  • the perturbation vectors may be evenly distributed within a defined set (wherein the defined set defines the distribution of the perturbation vectors).
  • FIG. 1 is a block diagram of a system in accordance with an example embodiment;
  • FIG. 2 is a block diagram of an example deep neural network;
  • FIG. 3 is a flow chart showing an algorithm in accordance with an example embodiment;
  • FIG. 4 is a flow chart showing an algorithm in accordance with an example embodiment;
  • FIG. 5 is a flow chart showing an algorithm in accordance with an example embodiment;
  • FIG. 6 is a block diagram of an example communication system in which example embodiments may be implemented.
  • FIG. 7 is a block diagram of components of a system in accordance with an example embodiment.
  • FIGS. 8A and 8B show tangible media, respectively a removable memory unit and a compact disc (CD), storing computer-readable code which, when run by a computer, performs operations according to embodiments.
  • FIG. 1 is a block diagram of a system, indicated generally by the reference numeral 10, in accordance with an example embodiment.
  • the system 10 comprises a training data module 12, an algorithm 14 and a loss function module 16.
  • the algorithm 14 includes a number of trainable weights and implements at least some arithmetic functions.
  • the training data module 12 and the loss function module 16 are used to train the algorithm 14, as described further below.
  • the training data module 12 provides training data including inputs and corresponding desired outputs for training the algorithm 14.
  • the inputs are provided to the algorithm 14 and the outputs provided to the loss function module 16.
  • the output of the algorithm 14 in response to the inputs received from the training data module 12 is also provided to the loss function module, such that the loss function module can generate an output based on a comparison of actual outputs and expected outputs of the algorithm 14.
  • the trainable weights of the algorithm 14 are trained based on the output of the loss function module 16, for example using stochastic gradient descent, or some similar approach.
  • the algorithm 14 may be implemented using a neural network, such as a deep neural network.
  • FIG. 2 shows an example deep neural network 20 comprising a plurality of layers of interconnected nodes.
  • a neural network such as the network 20 can be trained by adjusting the connections between nodes and the relative weights of those connections.
  • Neural networks are often trained and exploited on computationally powerful platforms with graphical processing unit (GPU) acceleration, supporting high-precision floating-point arithmetic (e.g. 32-bit or 64-bit floating-point arithmetic). Such hardware may not be available for many applications. Accordingly, the neural networks implementing the algorithm 14 of the system 10 described above may be compressed to meet practical constraints. This may be achieved by using a more compact representation of neural network parameters, at the cost of reduced precision.
  • compression of weights and/or biases of the neural networks may be achieved through quantization, such that the weights are forced to take values within a codebook with a finite number of entries (e.g. at a lower precision than that provided by a training module).
  • the capabilities of the systems used for training an algorithm (such as the algorithm 14) may be different to the capabilities of the algorithm in use.
  • FIG. 3 is a flow chart showing an algorithm, indicated generally by the reference numeral 30, in accordance with an example embodiment.
  • the algorithm 30 starts at operation 31, where the algorithm 14 described above is trained.
  • the algorithm 14 may be a neural network that is trained in a supervised manner using a stochastic gradient descent (SGD) algorithm.
  • parameters of the algorithm 14 are quantized.
  • the algorithm 14 can then be implemented using the quantized parameters. This approach can significantly reduce memory requirements within the algorithm 14 and the complexity of arithmetic operations implemented by the algorithm 14, but typically results in reduced precision.
  • Example embodiments are described below with reference to a neural network including a number of dense layers (such as the neural network 20 referred to above). It should be noted that this is one example implementation and that other embodiments are possible.
  • a dense layer may be made of U ≥ 1 units, each implementing the following equation:

a_u = σ( Σ_{v=1..V} w_{u,v}·x_v + b_u )    (1)

  • a_u is the u-th activation (i.e. the output of the u-th unit), x_v is the v-th element of the layer's input, and w_{u,v} and b_u are trainable weights and biases
  • σ is an activation function (e.g. ReLU, sigmoid etc.).
  • the basic arithmetic operations required by the implementation of such a layer typically include addition, multiplication and an activation function (e.g. identity, ReLU etc.).
  • convolutional or recurrent layers similarly require only simple arithmetic operations, such as the addition, multiplication and activation functions referred to above.
  • each operation potentially introduces an error.
  • with a fixed-point arithmetic, each number may be represented using K_I bits for the integer part, K_F bits for the fractional part, and one sign bit, where each bit b_i takes a value in {0, 1}. Such a representation defines a codebook C of values representable on the target hardware.
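As an illustrative sketch of such a fixed-point codebook (the function name and the 4/4-bit parameter choice are ours, not taken from this specification), a quantization function mapping a real number to its nearest representable value might look like:

```python
def fixedpoint_quantize(x, k_int=4, k_frac=4):
    """Round x to the nearest value representable with k_int integer bits,
    k_frac fractional bits and one sign bit, saturating at the extremes
    (an illustrative sketch of the codebook C described above)."""
    scale = 2 ** k_frac
    max_val = 2 ** k_int - 2 ** (-k_frac)  # largest representable magnitude
    q = round(x * scale) / scale           # snap to the fixed-point grid
    return max(-max_val, min(max_val, q))  # saturate to the representable range
```

For example, with 4 integer and 4 fractional bits, 0.3 maps to 5/16 = 0.3125 and any magnitude above 15.9375 saturates.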
  • assuming one has some knowledge of the targeted hardware and arithmetic, it may be possible to define a quantization function Q, which maps each real number to an element from the codebook C in order to simulate the targeted hardware arithmetic. With the knowledge of Q, one could define the neural network, at training, so that the neural network behaves as the targeted hardware. For example, considering the dense layer defined in equation (1) above, the neural network could be implemented in training as:

a_u = Q( σ( Q( Σ_{v=1..V} Q(w_{u,v}·x_v) + b_u ) ) )    (2)
  • using the random perturbations described herein, a dense layer as defined in (1) above may be implemented in training as:

a_u = σ( Σ_{v=1..V} ( w_{u,v}·x_v + p_{u,v} ) + b_u )    (3)

where the p_{u,v} are random perturbations applied after each multiplication.
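The perturbed dense layer can be made concrete with a minimal Python sketch (the uniform truncation-error model and the ReLU activation are illustrative assumptions, not requirements of the specification):

```python
import random

def perturbed_dense_layer(x, weights, biases, k_frac=8):
    """Forward pass of a dense layer in which a random perturbation drawn
    uniformly from [-2**-k_frac, 0] is added after every multiplication,
    mimicking fixed-point truncation error during training."""
    outputs = []
    for w_row, b in zip(weights, biases):
        acc = b
        for w, xv in zip(w_row, x):
            # perturbation applied after the multiplication, as in the text
            acc += w * xv + random.uniform(-2 ** -k_frac, 0.0)
        outputs.append(max(0.0, acc))  # ReLU activation (illustrative choice)
    return outputs
```

Because the perturbation is added (rather than applying a non-differentiable rounding), the layer remains differentiable with respect to its weights.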
  • similarly, a 2-dimensional convolutional layer may be implemented at training as:

Y^l_{i,j} = Σ_{q=1..Q} Σ_{m=1..M} Σ_{n=1..N} ( W^l_{m,n,q}·X_{i+m,j+n,q} + P^l_{i,j,m,n,q} )

  • Y^l_{i,j} is the output of the l-th filter at co-ordinate (i, j), e.g. a pixel in a picture;
  • M and N are the filter width and length;
  • Q is the number of filters in the layer's input;
  • X is the layer's input;
  • W^l is the weights of the l-th filter;
  • P^l is the added perturbation.
  • non-trivially implemented activation functions, such as the hyperbolic tangent and sigmoid, are often implemented by piece-wise linear approximations, for which perturbations can also be added for increased robustness.
  • since a neural network obtained using the principles of random perturbation outlined above is differentiable, such a neural network can be trained using stochastic gradient descent (SGD) or some similar methodology, as follows.
  • FIG. 4 is a flow chart showing an algorithm, indicated generally by the reference numeral 40, in accordance with an example embodiment.
  • the algorithm 40 shows an example training method for an algorithm (such as a neural network) having at least some trainable weights, wherein the algorithm is to be implemented on a target device (such as an application-specific integrated circuit or a field-programmable gate array).
  • the algorithm 40 starts at operation 41 where parameters of the algorithm 14 are initialised.
  • the initialised parameters may be set to θ_0 and a variable j (if used) set to 0.
  • the training parameters θ may be initialised randomly.
  • training data is obtained.
  • the training data may include a number of inputs together with target desired outputs of the algorithm in response to said inputs.
  • the input-output pairs may be generated randomly.
  • the perturbation vectors may be applied after at least some arithmetic operations of the algorithm being trained.
  • the perturbation vectors may be random (e.g. random within a defined distribution).
  • the perturbation vectors may have a distribution depending on properties of a target device
  • a loss function estimate is generated based on the target outputs of the input-output pairs and the estimated outputs of the algorithm.
  • parameters of the algorithm are updated based on the loss function (e.g. using stochastic gradient descent or some similar process).
  • at operation 46, it is determined whether the algorithm 40 is complete (e.g. by determining whether a first condition has been reached). If so, the algorithm terminates at operation 47. If not, the variable j (if used) is incremented and the algorithm returns to operation 42, and the training process is repeated.
  • the operation 46 may be implemented in many ways. For example, the algorithm 40 may be deemed to be complete when a defined number of iterations (e.g. indicated by the variable j) have been carried out.
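The loop of operations 41 to 47 can be sketched on a toy one-weight model (entirely illustrative; the specification's algorithm 14 would be a neural network):

```python
import random

def train_with_perturbations(steps=2000, lr=0.05, k_frac=6):
    """Minimal instance of the training loop of FIG. 4: initialise a
    parameter, draw training data, inject a perturbation after the
    multiplication, update by gradient descent, and repeat until a fixed
    iteration count (the 'first condition') is reached.  The model
    y = 2x is a toy example of ours, not the specification's network."""
    random.seed(0)
    theta = random.uniform(-1, 1)                  # operation 41: initialise
    for _ in range(steps):                         # operations 42-46: repeat
        x = random.uniform(-1, 1)                  # training input
        y = 2.0 * x                                # desired target output
        p = random.uniform(-2 ** -k_frac, 0.0)     # perturbation after multiply
        y_hat = theta * x + p
        grad = 2.0 * (y_hat - y) * x               # gradient of squared error
        theta -= lr * grad                         # SGD update (operation 45)
    return theta

theta = train_with_perturbations()
```

Despite the injected perturbations, the learned weight converges close to the true value of 2, illustrating that training remains effective under the simulated hardware noise.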
  • the algorithm 40 may be used to train a neural network (or some other algorithm) to be more robust to arithmetic errors, and therefore more suitable to the hardware on which it is deployed.
  • the hardware may be an ASIC or an FPGA, although this is not essential to all embodiments.
  • the target hardware may implement a fixed-point arithmetic while the algorithm may, for example, be trained using a floating-point arithmetic.
  • the choice of a probability density function (pdf) from which the perturbation of operation 43 is drawn may depend on a number of factors, such as a quantization function and assumptions on the potential errors.
  • the error may be assumed to be uniformly distributed on [−2^(−K_F), 0] (e.g. in the case of truncation of the fractional part).
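The measurement-based determination of the error distribution (the histogram approach mentioned above) might be sketched as follows; the truncating-multiply example and all names are illustrative assumptions:

```python
import math
import random

def measure_error_histogram(op, op_exact, samples=10000, bins=8):
    """Estimate the error distribution of a hardware operation by comparing
    it against an exact reference on random inputs and histogramming the
    differences (a sketch of the histogram approach mentioned above)."""
    errors = []
    for _ in range(samples):
        a, b = random.uniform(-1, 1), random.uniform(-1, 1)
        errors.append(op(a, b) - op_exact(a, b))
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / bins or 1.0      # guard against an all-equal sample
    hist = [0] * bins
    for e in errors:
        hist[min(int((e - lo) / width), bins - 1)] += 1
    return lo, hi, [h / samples for h in hist]

# Example: a multiply that truncates to 6 fractional bits vs. an exact multiply
random.seed(1)
trunc_mul = lambda a, b: math.floor(a * b * 64) / 64
lo, hi, pdf = measure_error_histogram(trunc_mul, lambda a, b: a * b)
```

Here every observed error falls in (−2^(−6), 0], matching the uniform-truncation assumption above; a "best fit" distribution for the perturbation vectors could then be chosen from such a histogram.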
  • Quantizing the weights can significantly reduce the complexity of an algorithm. For example, by forcing the weights to take values within the codebook {−1, 0, 1}, all multiplications can be reduced to zeroing or a sign change. Similarly, by forcing the weights to take values within the codebook of powers of two {±2^n}, all multiplications can be implemented as simple bit shifts.
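Both codebooks can be sketched as simple quantizers (the ternary threshold and the power-of-two exponent range are illustrative choices of ours):

```python
import math

def quantize_ternary(w):
    """Map a weight to the codebook {-1, 0, 1}: multiplication by the
    result is just a zeroing or a sign change (illustrative threshold)."""
    return 0.0 if abs(w) < 0.5 else math.copysign(1.0, w)

def quantize_pow2(w, n_min=-8, n_max=0):
    """Map a nonzero weight to the nearest power of two +/-2**n, so that
    multiplication reduces to a bit shift (the exponent range is an
    illustrative assumption)."""
    if w == 0.0:
        return 0.0
    n = min(n_max, max(n_min, round(math.log2(abs(w)))))
    return math.copysign(2.0 ** n, w)
```

With the ternary codebook, w·x becomes 0, x or −x; with the power-of-two codebook it becomes a sign change plus a bit shift.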
  • FIG. 5 is a flow chart showing an algorithm, indicated generally by the reference numeral 50, in accordance with an example embodiment.
  • the algorithm 50 combines the teaching described above with learning-compression principles.
  • the algorithm 50 starts at operation 51, where an algorithm (such as the algorithm 14) is initialised, thereby providing a means for initialising parameters of an algorithm.
  • Operations 52 to 54 implement a learning operation 52, a compression operation 53 and a parameter updating operation 54 that collectively implement a learning- compression algorithm, as discussed further below.
  • operation 55 it is determined whether or not the learning-compression algorithm is complete. If not, the operations 52 to 54 are repeated.
  • the algorithm 50 terminates at operation 56.
  • the learning operation 52 adjusts the weights of the algorithm 14 (e.g. neural networks) in the non-quantized space.
  • the operation 52 may be similar to the algorithm 40 described above, but includes an additional penalty term, related to the allowed codebook values in the quantized space.
  • the learning operation 52 implements a means for updating trainable parameters of the transmission system based on a loss function.
  • let L denote the categorical cross-entropy, which is the loss function minimized during training, defined by:

L(θ) = −(1/N) Σ_{i=1..N} log( p̂_i )

where p̂_i is the probability assigned by the algorithm to the correct output for the i-th training example and N is the batch size.
  • the learning operation 52 solves the following optimization problem:

θ^(i) = argmin_θ  L(θ) + (μ/2)·‖θ − θ_c^(i−1)‖²

where θ_c^(i−1) denotes the quantized (compressed) parameters from the previous compression operation;
  • μ is a parameter which increases as the training progresses (such that 1/μ reduces);
  • the compression operation 53 implements a means for quantizing trainable parameters of the transmission system.
  • the compression operation 53 may perform the element-wise quantization of the trainable parameters, mapping each parameter to the closest element of the codebook C.
  • the initialisation operation 51 may comprise:
  • the learning operation 52 may comprise:
  • the compression operation 53 may comprise:
  • the update parameters operation 54 may comprise:
  • the complete operation 55 may comprise:
  • the algorithm 50 converges to a local solution as μ → ∞. This could be achieved, for example, by following a multiplicative schedule μ^(i) ← μ^(i−1)·a, where a > 1 and μ^(0) are parameters of the algorithm. It should be noted that the sequence μ^(i) can be generated in many ways. Using a multiplicative schedule (as discussed above), the initial value μ^(0), as well as a, are optimisation parameters (and may be optimised as part of the training operation 52 described above). It should also be noted that the batch size N as well as the learning rate (and possibly other parameters of the chosen SGD variant, e.g. ADAM, RMSProp, Momentum) could be optimization parameters of the algorithms 40 and 50 described above.
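A minimal sketch of the learning-compression loop on a single scalar weight follows; the quadratic-penalty form and all parameter values are assumptions drawn from the general learning-compression literature rather than from this specification:

```python
def learning_compression(loss_grad, theta0, codebook,
                         mu0=0.01, a=1.5, outer=30, inner=200, lr=0.05):
    """Alternate a learning step (operation 52), a compression step
    (operation 53) and a mu update (operation 54) on a single scalar
    weight, using the multiplicative schedule mu <- mu * a described
    above and a quadratic penalty (mu/2)*(theta - theta_c)**2."""
    theta = theta0
    theta_c = min(codebook, key=lambda c: abs(c - theta))  # initial quantization
    mu = mu0
    for _ in range(outer):
        lr_eff = min(lr, 1.0 / (2.0 + mu))   # heuristic cap: keep descent stable as mu grows
        for _ in range(inner):               # L-step: minimise loss + penalty
            theta -= lr_eff * (loss_grad(theta) + mu * (theta - theta_c))
        theta_c = min(codebook, key=lambda c: abs(c - theta))  # C-step: snap to codebook
        mu *= a                              # increase the penalty weight
    return theta_c

# Toy example: loss (theta - 0.8)^2 with the ternary codebook {-1, 0, 1}
q = learning_compression(lambda t: 2.0 * (t - 0.8), 0.0, [-1.0, 0.0, 1.0])
```

The returned value is the quantized weight; as μ grows, the free weight is pulled onto the codebook, here to the entry 1.0 nearest the unconstrained optimum 0.8.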
  • the principles described herein may be used in many applications (e.g. relating to many example algorithms 14). By way of example, the principles described herein may be used in a communication system.
  • FIG. 6 is a block diagram of an example communication system, indicated generally by the reference numeral 60, in which example embodiments may be implemented.
  • the system 60 includes a transmitter 62, a channel 63 and a receiver 64. Viewed at a system level, the system 60 converts an input symbol s (also called a message) received at the input to the transmitter 62 into an output symbol ŝ at the output of the receiver 64.
  • the transmitter 62 implements a transmitter algorithm.
  • the receiver 64 implements a receiver algorithm.
  • the algorithms of the transmitter 62 and the receiver 64 may be trained using the principles described herein in order to optimise the performance of the system 60 as a whole.
  • the transmitter 62 may include a dense layer of one or more units 70, 71 (e.g. including one or more neural networks).
  • the dense layers 70, 71 may include an embedding module.
  • the modules within the transmitter 62 are provided by way of example and modifications are possible.
  • the receiver 64 may include a dense layer of one or more units 74, 75 (e.g. including one or more neural networks), a softmax module 76 and an arg max module 77.
  • the output of the softmax module is a probability vector that is provided to the input of an arg max module 77.
  • the modules within the receiver 64 are provided by way of example and modifications are possible.
  • the system 60 therefore provides an autoencoder implementing an end-to-end communication system.
  • the autoencoder can be trained with respect to an arbitrary loss function that is relevant for a certain performance metric, such as block error rate (BLER).
  • one obstacle to practical hardware implementation of systems is the high memory requirement and computational complexity of the involved neural networks.
  • Hardware acceleration (e.g. using graphical processing units (GPUs)) may be provided to achieve a reasonable inference time for the neural network (or some other parametric algorithm).
  • Such hardware may not be available for some systems (such as some implementations of the communication system 60).
  • neural networks may be compressed to meet practical constraints. This may be achieved by using a more compact representation of neural network parameters, at the cost of reduced precision.
  • compression of weights and/or biases of the neural networks may be achieved through quantization, such that the weights are forced to take values within a codebook with a finite number of entries (e.g. at a lower precision than that provided by a training module).
  • each weight may take a binary value (e.g. either -1 or +1).
  • FIG. 7 is a schematic diagram of components of one or more of the modules described previously (e.g. the transmitter or receiver neural networks), which hereafter are referred to generically as processing systems 110.
  • a processing system 110 may have a processor 112, a memory 114 closely coupled to the processor and comprised of a RAM 124 and ROM 122, and, optionally, hardware keys 120 and a display 128.
  • the processing system 110 may comprise one or more network interfaces 118 for connection to a network, e.g. a modem which may be wired or wireless.
  • the processor 112 is connected to each of the other components in order to control operation thereof.
  • the memory 114 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD).
  • the ROM 122 of the memory 114 stores, amongst other things, an operating system 125 and may store software applications 126.
  • the RAM 124 of the memory 114 is used by the processor 112 for the temporary storage of data.
  • the operating system 125 may contain code which, when executed by the processor, implements aspects of the algorithms 30, 40 and 50.
  • the processor 112 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.
  • the processing system 110 may be a standalone computer, a server, a console, or a network thereof. In some embodiments, the processing system 110 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications.
  • the processing system 110 may be in
  • FIGS. 8A and 8B show tangible media, respectively a removable memory unit 165 and a compact disc (CD) 168, storing computer-readable code which when run by a computer may perform methods according to embodiments described above.
  • the removable memory unit 165 may be a memory stick, e.g. a USB memory stick, having internal memory 166 storing the computer-readable code.
  • the memory 166 may be accessed by a computer system via a connector 167.
  • the CD 168 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.
  • Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic.
  • the software, application logic and/or hardware may reside on memory, or any computer media.
  • the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media.
  • a "memory" or "computer-readable medium" may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
  • embodiments may also be implemented using specialised circuits such as field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor, or firmware such as the programmable content of a hardware device, whether instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device etc.
  • circuitry refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • example embodiments have generally been described above with reference to populations of neural networks. This is not essential to all embodiments; other forms of algorithms, such as a parametric algorithm, may be used instead, in which case the trainable parameters of the algorithm (e.g. parameters of a neural network) are updated during training.
  • a population of algorithms may be updated by updating one or more parameters and/or one or more structures of one or more of the algorithms of a population.
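The quantization described in the bullets above (weights forced into a codebook with a finite number of entries, e.g. binary values of -1 or +1) can be sketched as a nearest-neighbour mapping. The function name and codebooks below are illustrative, not taken from the application:

```python
import numpy as np

def quantize_to_codebook(weights, codebook):
    """Map each weight to the nearest entry of a finite codebook."""
    codebook = np.asarray(codebook, dtype=float)
    w = np.asarray(weights, dtype=float)
    # For every weight, find the index of the closest codebook entry.
    idx = np.argmin(np.abs(w[..., None] - codebook), axis=-1)
    return codebook[idx]

# Binary codebook: every weight is forced to either -1 or +1.
w = np.array([0.3, -0.7, 1.2, -0.1])
print(quantize_to_codebook(w, [-1.0, 1.0]))  # -> [ 1. -1.  1. -1.]
```

A coarser or finer codebook trades memory footprint against precision; training with random perturbations applied after arithmetic operations, as claimed, is one way to make the weights robust to this rounding.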

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An apparatus, method and computer program are described, comprising: initialising the parameters of an algorithm; generating or obtaining training data comprising inputs and desired target outputs for training said algorithm; generating or obtaining a set of random perturbation vectors to be applied after at least some arithmetic operations of the algorithm; updating at least some trainable weights of the algorithm on the basis of a loss function; and repeating the generating or obtaining of the training data, the generating or obtaining of the random perturbation vectors and the updating of said trainable weights until a first condition is reached.
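The steps of the abstract can be illustrated with a toy example. The linear model, target function, hyper-parameters and stopping condition below are all illustrative assumptions, not taken from the application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: initialise the parameters of the algorithm (a toy linear model y = W x).
W = 0.1 * rng.normal(size=(2, 3))
sigma, lr = 0.01, 0.1  # assumed perturbation scale and learning rate

avg_loss = 1.0  # running average of the loss, used as the stopping condition
for step in range(5000):
    # Step 2: generate training data (an input and its desired target output).
    x = rng.normal(size=3)
    target = 2.0 * x[:2]  # assumed toy target function

    # Step 3: generate a random perturbation vector, applied after the
    # arithmetic operation W @ x (e.g. to mimic quantization noise).
    eps = rng.normal(scale=sigma, size=2)
    y = W @ x + eps

    # Step 4: update the trainable weights from the gradient of a squared-error loss.
    err = y - target
    avg_loss = 0.9 * avg_loss + 0.1 * float(np.mean(err ** 2))
    W -= lr * np.outer(err, x)

    # Step 5: repeat until a first condition is reached (here, an averaged loss threshold).
    if avg_loss < 1e-3:
        break
```

In the claimed method, training under such perturbations makes the resulting weights robust to the limited precision of the eventual hardware; here a single noise draw per step plays that role.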
PCT/EP2019/055537 2019-03-06 2019-03-06 Apprentissage d'algorithmes Ceased WO2020177863A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/055537 WO2020177863A1 (fr) 2019-03-06 2019-03-06 Apprentissage d'algorithmes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/055537 WO2020177863A1 (fr) 2019-03-06 2019-03-06 Apprentissage d'algorithmes

Publications (1)

Publication Number Publication Date
WO2020177863A1 true WO2020177863A1 (fr) 2020-09-10

Family

ID=65718002

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/055537 Ceased WO2020177863A1 (fr) 2019-03-06 2019-03-06 Apprentissage d'algorithmes

Country Status (1)

Country Link
WO (1) WO2020177863A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114626635A (zh) * 2022-04-02 2022-06-14 北京乐智科技有限公司 一种基于混合神经网络的钢铁物流成本预测方法及系统
EP4016879A1 (fr) * 2020-12-16 2022-06-22 Nokia Technologies Oy Estimation de l'étalement de retard et de l'étalement doppler

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328647A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Bit width selection for fixed point neural networks

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328647A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Bit width selection for fixed point neural networks

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHAIM BASKIN ET AL: "NICE: Noise Injection and Clamping Estimation for Neural Network Quantization", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 29 September 2018 (2018-09-29), XP081422330 *
CHAIM J BASKIN ET AL: "UNIQ: Uniform Noise Injection for Non-uniform Quantization of Neural Networks", 2 October 2018 (2018-10-02), XP055641141, Retrieved from the Internet <URL:https://arxiv.org/pdf/1804.10969.pdf> [retrieved on 20191111], DOI: 10.1016/j.addbeh.2016.11.010 *
DARRYL D LIN ET AL: "Fixed Point Quantization of Deep Convolutional Networks", 2 June 2016 (2016-06-02), pages 1 - 10, XP055561866, Retrieved from the Internet <URL:https://arxiv.org/pdf/1511.06393.pdf> [retrieved on 20190226] *
FAYCAL AIT AOUDIA ET AL: "Towards Hardware Implementation of Neural Network-based Communication Algorithms", 2019 IEEE 20TH INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS (SPAWC), 19 February 2019 (2019-02-19), pages 1 - 5, XP055641018, ISBN: 978-1-5386-6528-2, DOI: 10.1109/SPAWC.2019.8815398 *
HAO LI ET AL: "Training Quantized Nets: A Deeper Understanding", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 June 2017 (2017-06-07), XP081281619 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4016879A1 (fr) * 2020-12-16 2022-06-22 Nokia Technologies Oy Estimation de l'étalement de retard et de l'étalement doppler
US11924007B2 (en) 2020-12-16 2024-03-05 Nokia Technologies Oy Estimating delay spread and doppler spread
CN114626635A (zh) * 2022-04-02 2022-06-14 北京乐智科技有限公司 一种基于混合神经网络的钢铁物流成本预测方法及系统

Similar Documents

Publication Publication Date Title
CN113424202B (zh) 针对神经网络训练调整激活压缩
Child Very deep vaes generalize autoregressive models and can outperform them on images
EP3915056B1 (fr) Compression d&#39;activation de réseau neuronal avec des mantisses non uniformes
CN108345939B (zh) 基于定点运算的神经网络
EP3906616B1 (fr) Compression d&#39;activation de réseau neuronal avec virgule flottante de bloc aberrant
US10789734B2 (en) Method and device for data quantization
US11657254B2 (en) Computation method and device used in a convolutional neural network
US12159228B2 (en) End-to-end learning in communication systems
WO2020142223A1 (fr) Quantification différée de paramètres pendant l&#39;apprentissage à l&#39;aide d&#39;un outil d&#39;apprentissage machine
US11082264B2 (en) Learning in communication systems
CN110663048A (zh) 用于深度神经网络的执行方法、执行装置、学习方法、学习装置以及程序
CN114416351B (zh) 资源分配方法、装置、设备、介质及计算机程序产品
EP4303770A1 (fr) Identification d&#39;un ou plusieurs paramètres de quantification pour quantifier des valeurs à traiter par un réseau neuronal
CN115668229A (zh) 用于经训练神经网络的低资源计算块
WO2020177863A1 (fr) Apprentissage d&#39;algorithmes
US12107679B2 (en) Iterative detection in a communication system
CN113177634B (zh) 基于神经网络输入输出量化的图像分析系统、方法和设备
US20220083870A1 (en) Training in Communication Systems
US20210125064A1 (en) Method and apparatus for training neural network
CN114548360A (zh) 用于更新人工神经网络的方法
US20220405561A1 (en) Electronic device and controlling method of electronic device
US9355363B2 (en) Systems and methods for virtual parallel computing using matrix product states
CN116108915A (zh) 非暂态计算机可读记录介质、操作方法和操作设备
JP2024508596A (ja) 階層的な共有指数浮動小数点データタイプ
CN114118358A (zh) 图像处理方法、装置、电子设备、介质及程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19709688

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19709688

Country of ref document: EP

Kind code of ref document: A1