
WO2019127363A1 - Neural network weight encoding method, computing device and hardware system - Google Patents

Neural network weight encoding method, computing device and hardware system

Info

Publication number
WO2019127363A1
WO2019127363A1 (PCT/CN2017/119821, CN2017119821W)
Authority
WO
WIPO (PCT)
Prior art keywords
weight
matrix
splicing
analog circuit
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/119821
Other languages
English (en)
French (fr)
Inventor
张悠慧
季宇
张优扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201780042640.0A priority Critical patent/CN109791626B/zh
Priority to PCT/CN2017/119821 priority patent/WO2019127363A1/zh
Publication of WO2019127363A1 publication Critical patent/WO2019127363A1/zh
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present invention relates generally to the field of neural network technologies, and more particularly to a weight coding method, a computing device, and a hardware system for a neural network.
  • In recent years, neural network computing has made breakthrough progress and achieved high accuracy in many fields such as image recognition, speech recognition, and natural language processing.
  • However, neural networks require massive computing resources.
  • General-purpose processors can hardly satisfy the computational demands of deep learning, so designing dedicated chips has become an important direction of development.
  • memristor provides an efficient solution for neural network chip design.
  • Memristors have the advantages of high density, non-volatility, low power consumption, fusion of storage and computation, and ease of 3D integration.
  • In neural network computation, their adjustable resistance can serve as programmable weights, and their in-memory computing property makes them suitable as high-speed multiply-accumulate units.
  • The basic unit of a neural network is the neuron, and a network is formed by interconnecting a large number of neurons.
  • The connections between neurons can be regarded as directed edges with weights: the output of a neuron is weighted by its connections and passed to the connected neurons, and all inputs received by a neuron are accumulated for further processing to produce that neuron's output.
  • Neural networks are usually modeled by grouping several neurons into a layer and connecting layers to one another.
  • Figure 1 shows a chain-like neural network. Each circle in the figure represents a neuron and each arrow represents a connection between neurons; every connection carries a weight. The structure of an actual neural network is not limited to a chain-like topology.
  • the core computation of the neural network is a matrix vector multiplication operation.
  • The output produced by a layer L_n containing n neurons can be represented by a vector V_n of length n. If this layer is fully connected to a layer L_m containing m neurons, the connection weights can be expressed as a matrix M_{n×m} with n rows and m columns, each matrix element representing the weight of one connection.
  • The weighted vector fed into L_m is then M_{n×m}V_n, and such matrix-vector multiplication is the core calculation of the neural network.
  • Neural network accelerator chips therefore also take accelerating matrix multiplication as their main design goal.
  • A memristor crossbar array is well suited to this task.
  • V is a set of input voltages.
  • Each voltage is multiplied by the corresponding memristor conductance G, and the resulting currents are summed on the output line.
  • The output current multiplied by the grounding resistance Rs gives the output voltage V'.
  • The whole process is realized in analog circuitry, and is both fast and area-efficient.
  • However, chip computation based on memristors also suffers from low precision, large disturbances, costly digital-to-analog/analog-to-digital conversion, and limited matrix size.
  • Although the memristor can perform matrix-vector multiplication efficiently, the multiplication on a memristor chip is implemented in an analog circuit, so noise and disturbance are inevitable; compared with the ideal neural network, the memristor's computation results are therefore inaccurate.
  • Due to memristor process limitations, representing a weight with a memristor carries a certain error. As shown in Figure 3, the weight distributions of different levels overlap. To avoid this overlap, existing methods generally splice several low-precision memristors to represent one high-precision weight; when each memristor stores only a few bits, the weight data can be regarded as accurate. Taking the representation of a 4-bit weight with two 2-bit memristors as an example, one 2-bit memristor represents the low 2 bits of the weight and the other represents the high 2 bits.
  • The existing ISAAC technique first trains a neural network with floating-point numbers and then "writes" the weight data into memristors.
  • ISAAC uses four 2-bit memristor devices to represent an 8-bit weight, which allows more resources to be used to improve matrix operation accuracy.
  • ISAAC represents weights by splicing, which is relatively inefficient and requires many resources; for example, representing a single weight requires four memristor devices.
  • The existing PRIME technique first trains a neural network with floating-point numbers, then uses two 3-bit-precision input voltages to represent a 6-bit input and two 4-bit memristor devices to represent an 8-bit weight.
  • Positive and negative weights are represented by two separate arrays.
  • PRIME represents weights through positive/negative summation and high/low-bit splicing, which also requires a large amount of resources: representing a single weight takes four memristor devices.
  • the present invention has been made in view of the above circumstances.
  • A non-splicing weight training method for a neural network, comprising: a weight fixed-pointing step of converting each matrix element of a weight matrix into a first number having a predetermined number of bits; an error introduction step of introducing noise having a predetermined standard deviation into the first number to obtain a second number; and a training step of training the weight matrix represented by the second number until convergence to obtain a training result, wherein the training result serves as the final weight matrix, each matrix element being written one by one into a single analog circuit device representing one matrix element, so that a single matrix element is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices.
  • In the weight fixed-pointing step, the conversion into the first number can be performed according to a linear or a logarithmic relationship.
  • The noise may model the read/write error of an analog circuit and follow a normal distribution.
  • the analog circuit device may be a memristor, a capacitor comparator or a voltage comparator.
  • the first number may be a fixed point number and the second number may be a floating point number.
  • A non-splicing weight encoding method for a neural network, comprising the step of writing each matrix element of a weight matrix one by one into a single analog circuit device representing one matrix element, so that a single matrix element is represented by a single analog circuit device rather than a splicing of a plurality of analog circuit devices, wherein the weight matrix is obtained by the non-splicing weight training method described above.
  • Before the writing step, the method may further include: a weight fixed-pointing step of converting each matrix element of the weight matrix into a first number having a predetermined number of bits; an error introduction step of introducing noise with a predetermined standard deviation into the first number to obtain a second number; and a training step of training the weight matrix represented by the second number until convergence to obtain a training result.
  • A neural network chip having a basic module for performing the matrix-vector multiplication operation in hardware by means of analog circuit devices, wherein each matrix element of the weight matrix is written one by one into a single analog circuit device representing one matrix element, so that a single matrix element of the weight matrix is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices.
  • the weight matrix may be obtained by the above non-splicing weight training method.
  • A computing device includes a memory and a processor, the memory having stored thereon computer-executable instructions that, when executed by the processor, perform the non-splicing weight training method described above or the non-splicing weight encoding method described above.
  • a neural network system comprising: the computing device according to the above; and the neural network chip according to the above.
  • An encoding method for a neural network is provided which can greatly reduce resource consumption without affecting performance, thereby saving resource overhead so that a large-scale neural network can be deployed under conditions of limited resources.
  • Figure 1 shows a schematic of a chained neural network.
  • Figure 2 shows a schematic diagram of a memristor based crossbar switch structure.
  • Figure 3 shows the statistical distribution of weights when one memristor is divided into eight weight levels.
  • Fig. 4 shows a schematic diagram of an application scenario of an encoding technique of a neural network according to the present invention.
  • Figure 5 shows a general flow chart of an encoding method in accordance with the present invention.
  • Fig. 6 shows a comparison of experimental results using the existing high/low-bit splicing method and the encoding method according to the present invention.
  • The present application provides a new encoding method (hereinafter referred to as the RLevel encoding method). Its essential difference from existing methods is that the new encoding method does not require that the weight values represented by a single device be non-overlapping; instead, this kind of error is introduced into training.
  • By training the weight matrix containing noise until it converges and finally writing the converged values into single devices, the RLevel encoding method enhances the noise immunity of the model while reducing the number of devices needed to represent a matrix element, thereby lowering cost and resource consumption.
  • Figure 3 shows the statistical distribution of weights when one memristor is divided into eight weight levels.
  • circuit devices other than the memristor capable of realizing matrix vector multiplication are also possible, such as a capacitor or a voltage comparator.
  • l and h denote the low-bit and high-bit devices respectively.
  • The weight is expressed as 2^n·h + l.
  • The errors of the low and high parts are L ~ N(l, σ²) and H ~ N(h, σ²) respectively, so 2^n·H ~ N(2^n·h, 2^(2n)·σ²).
  • The range of weight values is 2^(2n) - 1.
  • The standard deviation of the weight error is σ·sqrt(2^(2n) + 1); the ratio of the value range to the standard deviation is used as the measure of final precision.
  • The precision of the splicing weight method is therefore (2^(2n) - 1) / (σ·sqrt(2^(2n) + 1)), which is approximately 2^n/σ.
  • In comparison, using one device to represent the high-precision weight gives a precision of (2^n - 1)/σ, which is essentially the same.
  • Fig. 4 shows a schematic diagram of an application scenario of an encoding technique of a neural network according to the present invention.
  • The overall inventive concept of the present disclosure is as follows: the network model 1200 employed by the neural network application 1100 is weight-encoded by the encoding method 1300, and the result is written into the memristor devices of the neural network chip 1400.
  • This solves the problem that weight representation in memristor-based neural networks requires a large number of devices, and ultimately saves substantial resources without significant loss of accuracy.
  • FIG. 5 shows a general flow diagram of an encoding method in accordance with the present invention, comprising the following steps:
  • Weight fixed-pointing step S210: convert each matrix element of the weight matrix into a first number having a predetermined number of bits.
  • Each weight value is converted into a fixed-point number with a certain precision, yielding the fixed-point weights.
  • For example, each weight value may be converted into a 4-bit fixed-point number.
  • Error introduction step S220, in which noise having a predetermined standard deviation is introduced into the first number to obtain a second number.
  • the first number is set to a fixed point number
  • the second number is equal to the first number plus noise, so the second number is a floating point number.
  • For example, the fixed-point numbers 0, 1, 2, and 3 become, after adding noise, the four floating-point numbers -0.1, 1.02, 2.03, and 2.88.
  • The first number may also be a floating-point number.
  • Training step S230: train the weight matrix represented by the second number until convergence, then write the training result, as the final weight matrix, into the circuit devices used for the weight matrix computation.
  • the fixed point matrix B (Table 2) is decomposed into a high order matrix H (Table 3) and a low order matrix L (Table 4):
  • The fixed-point matrix B is converted into conductance values in the range 4×10⁻⁶ to 4×10⁻⁵ according to the RLevel method and the high/low-bit splicing method respectively, yielding the RLevel conductance matrix RC, the high-bit conductance matrix HC, and the low-bit conductance matrix LC of Table 5.
  • Note that the training process according to the present invention does not convert the matrix into conductance values; instead, it adds normally distributed noise with standard deviation σ on top of the first number.
  • The conversion is done here because actual errors arise from noise and disturbance during reading and writing of the memristor device or other circuit device used; the data analysis below is therefore based on conductance values as the analog quantities.
  • With the same input, the outputs of the RLevel conductance matrix RC, the high-bit conductance matrix HC, and the low-bit conductance matrix LC are compared.
  • The spliced output is computed as high-bit output × 4 + low-bit output.
  • Whether or not noise is added, the RLevel encoding method according to the present invention yields outputs whose precision is very close to that of the prior-art high/low-bit splicing method; this verifies the practicality and feasibility of the present solution from a theoretical standpoint.
  • Fig. 6 shows a comparison of experimental results using the existing high/low-bit splicing method and the RLevel encoding method according to the present invention.
  • This experiment used a convolutional neural network to classify the CIFAR10 data set.
  • the data set has 60,000 32*32 pixel color images, each of which belongs to one of 10 categories.
  • The abscissa is the weight precision and the ordinate is the classification accuracy.
  • The lower line uses the RLevel method, representing 2-, 4-, 6-, and 8-bit weights with a single device, while the upper line splices two devices of 1, 2, 3, and 4 bits respectively to represent 2-, 4-, 6-, and 8-bit weights.
  • The accuracy of the RLevel method is very close to that of the high/low-bit splicing method, but since only one device is used and no splicing of multiple devices is required (i.e., the encoding is non-splicing), 50% of the resources can be saved.
  • According to the weight encoding method of the present invention, substantially the same accuracy as existing high/low-bit splicing can be provided without using splicing; this solves the problem that computing the weight matrix of a neural network with analog circuits such as memristors requires a large number of circuit devices, while also reducing cost and saving resources.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A non-splicing weight encoding method for a neural network, comprising: a weight fixed-pointing step of converting each matrix element of a weight matrix into a first number having a predetermined number of bits (S210); an error introduction step of introducing noise with a predetermined standard deviation into the first number to obtain a second number (S220); and a training step of training the weight matrix represented by the second number until convergence and then writing the training result, as the final weight matrix, into single analog circuit devices each representing one matrix element (S230), wherein a single matrix element is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices. According to this encoding method for a neural network, resource consumption can be greatly reduced without affecting performance, saving resource overhead so that a very large neural network can be deployed under conditions of limited resources.

Description

Neural network weight encoding method, computing device and hardware system
Technical Field
The present invention relates generally to the field of neural network technology, and more particularly to a weight encoding method, a computing device, and a hardware system for a neural network.
Background Art
As Moore's law gradually breaks down and progress in existing chip processes slows, attention has turned to new applications and new devices. In recent years, neural network (NN) computing has made breakthrough progress and achieved high accuracy in many fields such as image recognition, speech recognition, and natural language processing. However, neural networks require massive computing resources, and existing general-purpose processors can hardly satisfy the computational demands of deep learning, so designing dedicated chips has become an important direction of development. Meanwhile, the emergence of the memristor provides an efficient solution for neural network chip design. Memristors offer high density, non-volatility, low power consumption, fusion of storage and computation, and easy 3D integration; in neural network computation their adjustable resistance can serve as programmable weights, and their in-memory computing property makes them suitable as high-speed multiply-accumulate units.
The basic unit of a neural network is the neuron, and a network is formed by interconnecting a large number of neurons. The connections between neurons can be regarded as directed edges with weights: the output of a neuron is weighted by its connections and then passed to the connected neurons, and all inputs received by a neuron are accumulated for further processing to produce that neuron's output. Neural networks are usually modeled by grouping several neurons into a layer and connecting layers to one another. Figure 1 shows a chain-like neural network, where each circle represents a neuron and each arrow represents a connection between neurons; every connection carries a weight. The structure of an actual neural network is not limited to a chain-like topology.
The core computation of a neural network is matrix-vector multiplication. The output produced by a layer L_n containing n neurons can be represented by a vector V_n of length n. If this layer is fully connected to a layer L_m containing m neurons, the connection weights can be expressed as a matrix M_{n×m} with n rows and m columns, each matrix element representing the weight of one connection. The weighted vector fed into L_m is then M_{n×m}V_n, and this kind of matrix-vector multiplication is the most central computation of a neural network.
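For concreteness, this core layer computation can be sketched in a few lines of NumPy (an illustration only; the layer sizes and values are arbitrary assumptions):

```python
import numpy as np

n, m = 4, 3                      # neurons in layer L_n and layer L_m (arbitrary sizes)
V_n = np.random.rand(n)          # output vector of layer L_n
M = np.random.rand(n, m)         # connection weight matrix M_{n x m}

# Weighted input arriving at layer L_m: each column of M collects the
# weighted contributions of all n neurons feeding one neuron of L_m.
input_to_Lm = V_n @ M            # shape (m,)
```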
Because matrix-vector multiplication is extremely compute-intensive, performing large numbers of matrix multiplications on existing general-purpose processors takes a great deal of time, so neural network accelerator chips likewise take accelerating matrix multiplication as their main design goal. A memristor crossbar array is well suited to this task. V is a set of input voltages; each voltage is multiplied by the corresponding memristor conductance G, the resulting currents are summed on the output line, and the output current multiplied by the grounding resistance Rs gives the output voltage V'. The entire process is carried out in analog circuitry, and is both fast and area-efficient.
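The crossbar computation just described can be modeled numerically in the same spirit; the sketch below assumes an ideal, noise-free device, and the conductance values and grounding resistance are illustrative assumptions only:

```python
import numpy as np

G = np.array([[1.6e-5, 4.0e-5],    # conductance matrix (S), one memristor per matrix element
              [1.84e-5, 3.28e-5]])
V = np.array([0.10, 0.15])         # input voltages (V), one per row line
Rs = 1.0e3                         # grounding resistance (ohm), illustrative value

I_out = V @ G                      # column currents: sum_i V_i * G_ij (Kirchhoff's current law)
V_out = I_out * Rs                 # output voltages V' = I * Rs
```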
However, computation with memristor-based chips also suffers from low precision, large disturbances, expensive digital-to-analog/analog-to-digital conversion, and limited matrix size. Moreover, although memristors can perform matrix-vector multiplication efficiently, the multiplication on a memristor chip is implemented in analog circuits, so noise and disturbance are inevitable; compared with the ideal neural network, the memristor's computation results are therefore inaccurate.
Due to memristor process limitations, representing a weight with a memristor carries a certain error. As shown in Figure 3, the weight distributions of different levels overlap to some extent. To avoid this overlap, existing methods generally splice several low-precision memristors to represent one high-precision weight; when each memristor stores only a few bits, the weight data can be regarded as accurate. Taking the representation of a 4-bit weight with two 2-bit memristors as an example, one 2-bit memristor represents the low 2 bits of the weight and the other represents the high 2 bits.
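As an illustration of this splicing scheme (a sketch of the prior-art representation, not of the present invention), a 4-bit value can be split into the two 2-bit levels written to the high and low devices:

```python
def split_4bit(w):
    """Split a 4-bit integer weight into (high 2 bits, low 2 bits)."""
    assert 0 <= w <= 15
    return w >> 2, w & 0b11

def merge_2bit(high, low):
    """Reassemble the spliced weight: w = 4*high + low."""
    return (high << 2) | low

assert merge_2bit(*split_4bit(13)) == 13   # e.g. 13 -> high=3, low=1
```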
The existing ISAAC technique first trains a neural network with floating-point numbers and then "writes" the weight data into memristors. ISAAC uses four 2-bit memristor devices to represent one 8-bit weight, so that more resources can be spent to improve the precision of the matrix operation.
ISAAC represents weights by splicing, which is relatively inefficient and consumes many resources: to represent a single weight, four memristor devices are needed.
Similar to ISAAC, the existing PRIME technique also first trains a neural network with floating-point numbers, then uses two 3-bit-precision input voltages to represent one 6-bit input and two 4-bit memristor devices to represent one 8-bit weight, with positive and negative weights represented by two separate arrays.
PRIME represents weights through positive/negative summation and high/low-bit splicing, which also requires a large amount of resources: representing a single weight takes four memristor devices.
To implement a neural network on memristor devices, the problem of weight read error must be overcome; this problem is caused by device characteristics and current fabrication processes and is difficult to avoid at present. The existing techniques splice several low-precision memristors, which can be regarded as "error-free", to represent one high-precision weight, which requires a large amount of resources and uses them inefficiently.
Therefore, a weight representation technique for memristor-based neural networks is needed to solve the above problems.
Summary of the Invention
The present invention has been made in view of the above circumstances.
According to one aspect of the present invention, a non-splicing weight training method for a neural network is provided, comprising: a weight fixed-pointing step of converting each matrix element of a weight matrix into a first number having a predetermined number of bits; an error introduction step of introducing noise with a predetermined standard deviation into the first number to obtain a second number; and a training step of training the weight matrix represented by the second number until convergence to obtain a training result, wherein the training result serves as the final weight matrix, each of whose matrix elements is written one by one into a single analog circuit device representing one matrix element, so that a single matrix element is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices.
According to the above non-splicing weight training method, in the weight fixed-pointing step the conversion into the first number may be performed according to a linear or a logarithmic relationship.
According to the above non-splicing weight training method, the noise may model the read/write error of the analog circuit and follow a normal distribution.
According to the above non-splicing weight training method, the analog circuit device may be a memristor, a capacitive comparator, or a voltage comparator.
According to the above non-splicing weight training method, the first number may be a fixed-point number and the second number may be a floating-point number.
According to another aspect of the present invention, a non-splicing weight encoding method for a neural network is provided, comprising the step of writing each matrix element of a weight matrix one by one into a single analog circuit device representing one matrix element, so that a single matrix element is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices, wherein the weight matrix is obtained by the above non-splicing weight training method.
According to the above non-splicing weight encoding method, before the writing step the method may further comprise: a weight fixed-pointing step of converting each matrix element of the weight matrix into a first number having a predetermined number of bits; an error introduction step of introducing noise with a predetermined standard deviation into the first number to obtain a second number; and a training step of training the weight matrix represented by the second number until convergence to obtain a training result.
According to another aspect of the present invention, a neural network chip is provided, having a basic module that performs the matrix-vector multiplication operation in hardware by means of analog circuit devices, wherein each matrix element of the weight matrix is written one by one into a single analog circuit device representing one matrix element, so that a single matrix element of the weight matrix is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices.
According to the above neural network chip, the weight matrix may be obtained by the above non-splicing weight training method.
According to yet another aspect of the present invention, a computing device is provided, comprising a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, perform the above non-splicing weight training method or the above non-splicing weight encoding method.
According to yet another aspect of the present invention, a neural network system is provided, comprising the computing device described above and the neural network chip described above.
According to the present invention, an encoding method for a neural network is provided that can greatly reduce resource consumption without affecting performance, thereby saving resource overhead so that a very large neural network can be deployed under conditions of limited resources.
Brief Description of the Drawings
These and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the present invention in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic diagram of a chain-like neural network.
Figure 2 is a schematic diagram of a memristor-based crossbar structure.
Figure 3 is a statistical distribution plot of the weights when one memristor is divided into eight weight levels.
Figure 4 is a schematic diagram of an application scenario of the neural network encoding technique according to the present invention.
Figure 5 is an overall flowchart of the encoding method according to the present invention.
Figure 6 compares experimental results of the existing high/low-bit splicing method and the encoding method according to the present invention.
Detailed Description
To enable those skilled in the art to better understand the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The present application provides a new encoding method (hereinafter referred to as the RLevel encoding method). Its essential difference from existing methods is that the new encoding method does not require that the weight values represented by a single device be non-overlapping; instead, it introduces this error into training. By training the weight matrix containing noise until it converges and finally writing the converged values into single devices, the noise immunity of the model is enhanced while the number of devices needed to represent a matrix element is reduced, lowering cost and resource consumption.
The technical principles and embodiments of the present application are analyzed in detail below with reference to the drawings.
Figure 3 shows the statistical distribution of weights when one memristor is divided into eight weight levels.
As shown in Figure 3, the error caused by the memristor device is approximately normally distributed. Assume the device error follows a normal distribution N(μ, σ²); if the memristor conductance is used to represent an n-bit value, then μ has 2^n possible values. Those skilled in the art will understand that, to simplify the calculation, the same standard deviation σ is used for the different conductance values μ.
Although the memristor is used as an example in the following embodiments, other circuit devices capable of implementing matrix-vector multiplication, such as capacitors or voltage comparators, are also possible.
According to the superposition property of normal distributions: for statistically independent normal random variables X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²), their sum also follows a normal distribution, U = X + Y ~ N(μ_x + μ_y, σ_x² + σ_y²).
Assume, as in the prior art, that two devices are spliced to represent one high-precision weight. Let l and h denote the low-bit and high-bit devices respectively, so the weight is expressed as 2^n·h + l. The errors of the low and high parts are L ~ N(l, σ²) and H ~ N(h, σ²) respectively, so 2^n·H ~ N(2^n·h, 2^(2n)·σ²). The range of weight values is 2^(2n) - 1, and the standard deviation of the weight error is
σ_w = sqrt(2^(2n)·σ² + σ²) = σ·sqrt(2^(2n) + 1).
Taking the ratio of the value range to the standard deviation as the measure of final precision, the precision of the splicing weight method is:
(2^(2n) - 1) / (σ·sqrt(2^(2n) + 1)), which is approximately 2^n/σ.
In comparison, in the present application one device is used to represent the high-precision weight, and its precision is (2^n - 1)/σ.
From the above results it can be seen that the precision of representing weights with the high/low-bit splicing method and with a single device is essentially the same.
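These two expressions can be checked numerically; the following sketch simply evaluates the formulas above for a few values of n and an arbitrary assumed σ:

```python
import math

def splicing_precision(n_bits, sigma):
    """Range / std for a weight spliced from two n-bit devices."""
    value_range = 2 ** (2 * n_bits) - 1
    std = sigma * math.sqrt(2 ** (2 * n_bits) + 1)
    return value_range / std

def single_device_precision(n_bits, sigma):
    """Range / std for a weight held in one device: (2^n - 1) / sigma."""
    return (2 ** n_bits - 1) / sigma

sigma = 0.1
for n in (2, 3, 4):
    print(n, splicing_precision(n, sigma), single_device_precision(n, sigma))
# For each n the two values are of the same order (~2^n / sigma),
# matching the conclusion that the precisions are essentially the same.
```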
Figure 4 is a schematic diagram of an application scenario of the neural network encoding technique according to the present invention. As shown in Figure 4, the overall inventive concept of the present disclosure is as follows: the network model 1200 used by the neural network application 1100 is weight-encoded by the encoding method 1300, and the result is written into the memristor devices of the neural network chip 1400. This solves the problem that weight representation in memristor-based neural networks requires a large number of devices, and ultimately saves substantial resources without significant loss of accuracy.
I. Encoding method
Figure 5 shows the overall flowchart of the encoding method according to the present invention, which comprises the following steps:
1. Weight fixed-pointing step S210: convert each matrix element of the weight matrix into a first number having a predetermined number of bits.
According to the hardware design (the precision of a single memristor device requires hardware support), in the forward network each weight value is converted into a fixed-point number of a certain precision, yielding the fixed-point weights.
Here, to better illustrate the method of the present invention, the 2×2 weight matrix A of Table 1 below is used as an example.
Table 1. Initial weight matrix A
0.2641 0.8509
0.3296 0.6740
When 4 bits is taken as the predetermined number of bits, each weight value is converted into a 4-bit fixed-point number. The maximum value in the matrix, 0.8509, corresponds to the 4-bit maximum, i.e. 2^4 - 1 = 15, and the other values are scaled linearly accordingly to obtain the fixed-point weights, giving the fixed-point matrix of Table 2.
Table 2. Fixed-point matrix B
5.0000 15.0000
6.0000 12.0000
Note that the fixed-point conversion above is performed linearly, but those skilled in the art will understand that the conversion may also be performed non-linearly, for example logarithmically or by other computations.
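As an illustration of this linear fixed-pointing (the rounding convention is an assumption, but it reproduces Table 2 from Table 1):

```python
import numpy as np

A = np.array([[0.2641, 0.8509],
              [0.3296, 0.6740]])     # Table 1, initial weight matrix A
bits = 4
levels = 2 ** bits - 1               # 15: the 4-bit maximum

# Linear fixed-pointing: the matrix maximum maps to 15, other values scale linearly.
B = np.round(A / A.max() * levels)
print(B)    # [[ 5. 15.]
            #  [ 6. 12.]]  -> Table 2, fixed-point matrix B
```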
2. Error introduction step S220: introduce noise with a predetermined standard deviation into the first number to obtain a second number.
According to the characteristics of the memristor device, normally distributed noise with standard deviation σ is added for training, i.e. weight w = w + Noise, where Noise ~ N(0, σ²). Note that here the first number is set to be a fixed-point number, and the second number equals the first number plus noise, so the second number is a floating-point number. For example, the four fixed-point numbers 0, 1, 2, 3 become, after adding noise, the four floating-point numbers -0.1, 1.02, 2.03, 2.88. This setting is not limiting, however, and the first number may also be a floating-point number.
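A minimal sketch of this noise injection (σ here is an assumed value chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.05                                   # assumed noise standard deviation
w_fixed = np.array([0.0, 1.0, 2.0, 3.0])       # first numbers (fixed-point values)
w_noisy = w_fixed + rng.normal(0.0, sigma, w_fixed.shape)   # second numbers
# w_noisy is floating point, analogous to the -0.1, 1.02, 2.03, 2.88 example in the text
```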
3. Training step S230: train the weight matrix represented by the second number until convergence, then write the training result, as the final weight matrix, into the circuit devices used for the weight matrix computation.
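The following is a highly simplified, illustrative sketch of such noise-aware training on a single fully connected layer; the loss, data, and hyperparameters are assumptions and are not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, sigma, lr = 8, 4, 0.05, 0.05

# Master weights kept in the fixed-point value range [0, 15] (4-bit example).
W = rng.uniform(0, 15, (n_in, n_out))
X = rng.normal(0, 1, (256, n_in))
Y = X @ rng.uniform(0, 15, (n_in, n_out))        # toy regression targets

for step in range(5000):
    W_noisy = W + rng.normal(0, sigma, W.shape)  # second numbers: weight + device noise
    pred = X @ W_noisy
    grad = X.T @ (pred - Y) / len(X)             # MSE gradient, evaluated at the noisy weights
    W = np.clip(W - lr * grad, 0, 15)            # update and keep weights in range
# After convergence, round W and write each element once into its single analog device.
```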
II. Theoretical verification
A concrete example is given below to show, from a theoretical standpoint, that with the same input the output of the RLevel encoding method according to the present invention and the output of the prior-art high/low-bit splicing method have very close precision.
If two 2-bit devices are used for splicing, the fixed-point matrix B (Table 2) is decomposed into a high-bit matrix H (Table 3) and a low-bit matrix L (Table 4):
Table 3. High-bit matrix H
1.0000 3.0000
1.0000 3.0000
Table 4. Low-bit matrix L
1.0000 3.0000
2.0000 0.0000
In the splicing scheme, the fixed-point matrix B equals the high-bit matrix H times 4 plus the low-bit matrix L, i.e. B = 4·H + L; for both the high and the low part, the maximum value corresponds to the 2-bit maximum, i.e. 3.
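This decomposition can be reproduced directly (an illustrative check of the splicing arithmetic):

```python
import numpy as np

B = np.array([[5, 15],
              [6, 12]])           # Table 2, fixed-point matrix B
H = B // 4                         # high 2 bits -> Table 3
L = B % 4                          # low 2 bits  -> Table 4
assert (4 * H + L == B).all()
print(H)   # [[1 3]
           #  [1 3]]
print(L)   # [[1 3]
           #  [2 0]]
```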
Next, to better simulate the introduction of actual errors, the fixed-point matrix B is converted into conductance values in the range 4×10⁻⁶ to 4×10⁻⁵ according to the RLevel method and the high/low-bit splicing method respectively, giving the RLevel conductance matrix RC, the high-bit conductance matrix HC, and the low-bit conductance matrix LC of Table 5.
Note that the training process according to the present invention does not convert the matrix into conductance values; instead, it adds normally distributed noise with standard deviation σ on top of the first number during training. The conversion is done here because actual errors arise from noise and disturbance during reading and writing of the memristor device (or other circuit device used), so the data analysis below is based on conductance values as the analog quantities.
Table 5. Conductance matrices
Figure PCTCN2017119821-appb-000003
Assume the input voltages are:
0.10    0.15
[Without noise]
If there is no noise, then with the above input voltages the outputs of the RLevel conductance matrix RC, the high-bit conductance matrix HC, and the low-bit conductance matrix LC are respectively:
Table 6. Conductance matrix outputs (no noise)
RLevel output RC_out | High-bit output HC_out | Low-bit output LC_out | Spliced output HLC_out
4.36000000E-06 | 4.0000E-06 | 5.8000E-06 | 2.18000000E-05
8.92000000E-06 | 1.0000E-05 | 4.6000E-06 | 4.46000000E-05
The spliced output above is computed as high-bit output × 4 + low-bit output.
If the results of Table 6, namely the RLevel output RC_out and the spliced output HLC_out, are converted into 8-bit fixed-point numbers for comparison, both are found to be:
125.     255.
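The noise-free figures above can be reproduced with the following sketch; the linear value-to-conductance mapping is inferred from the stated 4×10⁻⁶ to 4×10⁻⁵ range and is an assumption, but it reproduces the tabulated RC_out, the spliced output, and the 8-bit comparison:

```python
import numpy as np

G_MIN, G_MAX = 4e-6, 4e-5

def to_conductance(M, max_level):
    """Linearly map fixed-point levels 0..max_level onto [G_MIN, G_MAX]."""
    return G_MIN + M / max_level * (G_MAX - G_MIN)

B = np.array([[5, 15], [6, 12]])           # Table 2
H, L = B // 4, B % 4                        # Tables 3 and 4
RC = to_conductance(B, 15)                  # RLevel conductance matrix
HC, LC = to_conductance(H, 3), to_conductance(L, 3)

V = np.array([0.10, 0.15])                  # input voltages
RC_out, HC_out, LC_out = V @ RC, V @ HC, V @ LC
HLC_out = 4 * HC_out + LC_out               # spliced output = 4*high + low

def to_8bit(x):
    return np.round(x / x.max() * 255)

print(RC_out, HLC_out)                      # [4.36e-06 8.92e-06] [2.18e-05 4.46e-05]
print(to_8bit(RC_out), to_8bit(HLC_out))    # both: [125. 255.]
```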
[With noise]
If noise with mean 0 and standard deviation 0.05 × 4×10⁻⁵ (i.e. about 5%) is added to the conductance matrices, the noise matrices of Table 7 are obtained.
Table 7. Noise matrices
Figure PCTCN2017119821-appb-000004
Still assuming the input voltages are:
0.10     0.15
Then the outputs of the RLevel, high-bit, and low-bit noise matrices are respectively:
Table 8. Noise matrix outputs
RLevel output RN_out | High-bit output HN_out | Low-bit output LN_out | Spliced output HLN_out
4.5550E-06 | 4.2578E-06 | 6.3242E-06 | 2.3355E-05
9.0081E-06 | 9.9704E-06 | 4.1181E-06 | 4.4000E-05
If the results of Table 8, namely the RLevel output RN_out and the spliced output HLN_out, are converted into 8-bit fixed-point numbers for comparison, they are respectively:
RLevel output: 129.00  255.00
Spliced output: 135.00  255.00
From the final results it can be seen that, with or without added noise, the output of the RLevel encoding method according to the present invention has precision very close to that of the prior-art high/low-bit splicing method; this verifies the practicality and feasibility of the present solution from a theoretical standpoint.
III. Experimental verification
To verify the effectiveness of the encoding method of the present invention from the standpoint of experimental data, the applicant performed a series of experiments.
Figure 6 compares experimental results of the existing high/low-bit splicing method and the RLevel encoding method according to the present invention.
In this experiment a convolutional neural network was used to classify the CIFAR10 dataset. The dataset contains 60,000 32×32-pixel color images, each belonging to one of 10 classes. As shown in Figure 6, the abscissa is the weight precision and the ordinate is the classification accuracy. There are two lines in the figure: the lower line uses the RLevel method, representing 2-, 4-, 6-, and 8-bit weights with a single device, while the upper line splices two devices of 1, 2, 3, and 4 bits respectively to represent 2-, 4-, 6-, and 8-bit weights.
As shown in Figure 6, in this experiment the accuracy of the RLevel method is very close to that of the high/low-bit splicing method, yet only one device is used and no splicing of multiple devices is required, i.e. the encoding is non-splicing, so 50% of the resources can be saved. Thus, according to the weight encoding method of the present invention, substantially the same accuracy as the existing high/low-bit splicing can be provided without splicing; this both solves the problem that computing the weight matrix of a neural network with analog circuits such as memristors requires a large number of circuit devices, and reduces cost and saves resources.
The embodiments of the present invention have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. Therefore, the protection scope of the present invention shall be determined by the scope of the claims.

Claims (11)

  1. A non-splicing weight training method for a neural network, comprising:
    a weight fixed-pointing step of converting each matrix element of a weight matrix into a first number having a predetermined number of bits;
    an error introduction step of introducing noise with a predetermined standard deviation into the first number to obtain a second number; and
    a training step of training the weight matrix represented by the second number until convergence to obtain a training result,
    wherein the training result serves as the final weight matrix, each matrix element of which is written one by one into a single analog circuit device representing one matrix element, so that a single matrix element is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices.
  2. The non-splicing weight training method according to claim 1, wherein in the weight fixed-pointing step the conversion into the first number is performed according to a linear or a logarithmic relationship.
  3. The non-splicing weight training method according to claim 1, wherein the noise is the read/write error of an analog circuit and follows a normal distribution.
  4. The non-splicing weight training method according to claim 1, wherein the analog circuit device is a memristor, a capacitive comparator, or a voltage comparator.
  5. The non-splicing weight training method according to claim 1, wherein the first number is a fixed-point number and the second number is a floating-point number.
  6. A non-splicing weight encoding method for a neural network, comprising the step of writing each matrix element of a weight matrix one by one into a single analog circuit device representing one matrix element, so that a single matrix element is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices,
    wherein the weight matrix is obtained by the non-splicing weight training method according to any one of claims 1 to 5.
  7. The non-splicing weight encoding method according to claim 6, further comprising, before the writing step:
    a weight fixed-pointing step of converting each matrix element of the weight matrix into a first number having a predetermined number of bits;
    an error introduction step of introducing noise with a predetermined standard deviation into the first number to obtain a second number; and
    a training step of training the weight matrix represented by the second number until convergence to obtain a training result.
  8. A neural network chip having a basic module that performs the matrix-vector multiplication operation in hardware by means of analog circuit devices,
    wherein each matrix element of the weight matrix is written one by one into a single analog circuit device representing one matrix element, so that a single matrix element of the weight matrix is represented by a single analog circuit device rather than a splicing of multiple analog circuit devices.
  9. The neural network chip according to claim 8, wherein the weight matrix is obtained by the non-splicing weight training method according to any one of claims 1 to 5.
  10. A computing device comprising a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, perform the non-splicing weight training method according to any one of claims 1 to 5 or the non-splicing weight encoding method according to any one of claims 6 to 7.
  11. A neural network system, comprising:
    the computing device according to claim 10; and
    the neural network chip according to any one of claims 8 to 9.
PCT/CN2017/119821 2017-12-29 2017-12-29 Neural network weight encoding method, computing device and hardware system Ceased WO2019127363A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780042640.0A CN109791626B (zh) 2017-12-29 2017-12-29 神经网络权重编码方法、计算装置及硬件系统
PCT/CN2017/119821 WO2019127363A1 (zh) 2017-12-29 2017-12-29 神经网络权重编码方法、计算装置及硬件系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/119821 WO2019127363A1 (zh) 2017-12-29 2017-12-29 神经网络权重编码方法、计算装置及硬件系统

Publications (1)

Publication Number Publication Date
WO2019127363A1 true WO2019127363A1 (zh) 2019-07-04

Family

ID=66495542

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119821 Ceased WO2019127363A1 (zh) 2017-12-29 2017-12-29 神经网络权重编码方法、计算装置及硬件系统

Country Status (2)

Country Link
CN (1) CN109791626B (zh)
WO (1) WO2019127363A1 (zh)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10678244B2 (en) 2017-03-23 2020-06-09 Tesla, Inc. Data synthesis for autonomous control systems
US11157441B2 (en) 2017-07-24 2021-10-26 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US10671349B2 (en) 2017-07-24 2020-06-02 Tesla, Inc. Accelerated mathematical engine
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US12307350B2 (en) 2018-01-04 2025-05-20 Tesla, Inc. Systems and methods for hardware-based pooling
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11215999B2 (en) 2018-06-20 2022-01-04 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11361457B2 (en) 2018-07-20 2022-06-14 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
KR20250078625A (ko) 2018-10-11 2025-06-02 Tesla, Inc. Systems and methods for training machine models with augmented data
US11196678B2 (en) 2018-10-25 2021-12-07 Tesla, Inc. QOS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US11150664B2 (en) 2019-02-01 2021-10-19 Tesla, Inc. Predicting three-dimensional features for autonomous driving
US10997461B2 (en) 2019-02-01 2021-05-04 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US10956755B2 (en) 2019-02-19 2021-03-23 Tesla, Inc. Estimating object properties using visual image data
CN114341884B (zh) * 2019-09-09 2025-07-25 Qualcomm Incorporated System and method for modifying a neural network for binary processing applications
CN110796241B (zh) * 2019-11-01 2022-06-17 Tsinghua University Training method and training apparatus for a memristor-based neural network
CN111027619B (zh) * 2019-12-09 2022-03-15 Huazhong University of Science and Technology K-means classifier based on a memristor array and classification method thereof
CN113344170B (zh) 2020-02-18 2023-04-25 Hangzhou Zhicun Intelligent Technology Co., Ltd. Neural network weight matrix adjustment method, write control method and related apparatus
WO2021163866A1 (zh) * 2020-02-18 2021-08-26 Hangzhou Zhicun Intelligent Technology Co., Ltd. Neural network weight matrix adjustment method, write control method and related apparatus
CN115481562B (zh) * 2021-06-15 2023-05-16 Institute of Microelectronics, Chinese Academy of Sciences Multi-parallelism optimization method and apparatus, recognition method and electronic device
US12462575B2 (en) 2021-08-19 2025-11-04 Tesla, Inc. Vision-based machine learning model for autonomous driving with adjustable virtual camera
CN114282478B (zh) * 2021-11-18 2023-11-17 Nanjing University Method for correcting dot-product errors of a variable-resistance device array


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10580401B2 (en) * 2015-01-27 2020-03-03 Google Llc Sub-matrix input for neural network layers
CN106796668B (zh) * 2016-03-16 2019-06-14 Hong Kong Applied Science and Technology Research Institute Co., Ltd. Method and system for bit-depth reduction in artificial neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170061281A1 (en) * 2015-08-27 2017-03-02 International Business Machines Corporation Deep neural network training with native devices
CN105224986A (zh) * 2015-09-29 2016-01-06 Tsinghua University Deep neural network system based on memristive devices
CN106650922A (zh) * 2016-09-29 2017-05-10 Tsinghua University Hardware neural network conversion method, computing device, compiling method and neural network software/hardware collaboration system
CN107085628A (zh) * 2017-03-21 2017-08-22 Southeast University Simulation method for an adjustable-weight module of a cellular neural network

Also Published As

Publication number Publication date
CN109791626B (zh) 2022-12-27
CN109791626A (zh) 2019-05-21

Similar Documents

Publication Publication Date Title
CN109791626B (zh) Neural network weight encoding method, computing device and hardware system
US20230409893A1 (en) On-chip training of memristor crossbar neuromorphic processing systems
CN108009640B (zh) Training apparatus and training method for a memristor-based neural network
US20240185050A1 (en) Analog neuromorphic circuit implemented using resistive memories
CN109791628B (zh) Neural network model block compression method, training method, computing device and system
US20220374688A1 (en) Training method of neural network based on memristor and training device thereof
CN106203625B (zh) Deep neural network training method based on multiple pre-training
US12217164B2 (en) Neural network and its information processing method, information processing system
Kim et al. Input voltage mapping optimized for resistive memory-based deep neural network hardware
CN107423816A (zh) Multi-precision neural network processing method and system
US20210209450A1 (en) Compressed weight distribution in networks of neural processors
CN108446766A (zh) Method for rapidly training a stacked autoencoder deep neural network
CN107944545A (zh) Computing method and computing device applied to neural networks
US20250200348A1 (en) Model Compression Method and Apparatus, and Related Device
CN111539522A (zh) Method for constructing a large-scale NCS fault-tolerant framework based on fixed-size memristor arrays
CN118839766A (zh) Collaborative inference method for Transformer models oriented to multiple edge devices
CN109977470A (zh) Circuit for implementing sparse coding based on a memristive Hopfield neural network and operation method thereof
CN113705784B (zh) Neural network weight encoding method based on matrix sharing and hardware system
CN117273109A (zh) Method and apparatus for constructing a hybrid neural network based on quantum neurons
CN114997385B (zh) Operation method, apparatus and device for an in-memory computing architecture applied to neural networks
Lu et al. NVMLearn: a simulation platform for non-volatile-memory-based deep learning hardware
US11977432B2 (en) Data processing circuit and fault-mitigating method
Li et al. Memory saving method for enhanced convolution of deep neural network
CN106169094A (zh) RNNLM system based on distributed neurons and design method thereof
CN119476396B (zh) Neural network training method suitable for hardware deployment on memristive brain-inspired chips

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17936307

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17936307

Country of ref document: EP

Kind code of ref document: A1