RU2739340C1

RU2739340C1 - Method for neuromorphic data processing and device for implementation thereof

Info

Publication number: RU2739340C1
Application number: RU2019136697A
Authority: RU
Inventors: Дмитрий Олегович Малашин; Роман Олегович Малашин
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2020-12-23

Abstract

FIELD: data processing.

SUBSTANCE: the invention relates to a method and a device for neuromorphic data processing. In the method, when power is supplied to a field programmable logic device (FPLD), the configuration of a neural network unit is loaded from configuration memory; this unit implements the selection of a neural network that satisfies the selection criteria for solving the current task. Based on analysis of the input data of the task being solved and in accordance with those criteria, the unit selects the corresponding neural network from the configuration memory and loads it into the memory of the FPLD, thereby providing dynamic reconfiguration of the FPLD. The loaded neural network processes the data to solve the current task. If the solution obtained by the loaded neural network does not satisfy predetermined criteria, the neural network unit dynamically reconfigures the FPLD by loading from the configuration memory the image of another neural network that satisfies the current operating conditions, providing full or partial dynamic reconfiguration.

EFFECT: technical result is faster configuration of the FPLD.

2 cl, 6 dwg

Description

The proposed technical solution relates to the field of informatics and computer engineering and can be used to create compact high-performance computing devices, including in robotics.

There is the problem of simulating the operation of the human brain in real time, as well as many other problems that require neuromorphic processing.

Neuromorphic supercomputers exist (https://www.datacenterdynamics.com/news/us-air-force-ibm-unveil-worlds-largest-neuromorphic-digital-synaptic-supercomputer/), for example the Spiking Neural Network Architecture (http://apt.cs.manchester.ac.uk/projects/SpiNNaker/, http://wp.doc.ic.ac.uk/hipeds/wp-content/uploads/sites/78/2016/01/The-SpiNNaker-Project-Seminar-Slides.pdf), which contains about 1 million hardware cores with a limited instruction set as well as programmable logic chips (FPGAs). This device and its principle of operation are taken as the prototype.

The main disadvantages of the prototype are its excessive dimensions (it occupies large rooms), high power consumption and low reliability.

The technical problem is that implementing many neural networks in a neuromorphic supercomputer requires a large amount of high-speed memory, logic elements, hardware cores, specialized microcircuits, power supply components and other elements. Because the number of elements is exceptionally large, the resulting number of interconnections reduces reliability. A further technical problem is that the prototype cannot be used in vehicle-mounted or wearable equipment because of its weight and size.

To solve this problem, a method of neuromorphic data processing is proposed that includes neural network processing operations. When power is supplied to a field-programmable gate array (FPGA) containing a neural network block that implements the selection of a neural network for solving the current task, the configuration of this block is loaded from configuration memory. The selected network must satisfy selection criteria that include the highest expected accuracy of solving the current task and the smallest computational resources spent on solving it. Based on analysis of the input data of the current task and in accordance with these criteria, the block then selects the corresponding neural network from the configuration memory and loads it into the FPGA memory, thus providing dynamic reconfiguration of the FPGA during operation. The loaded neural network then processes the data to solve the current task. If the solution obtained by the loaded neural network does not satisfy predetermined criteria, the block dynamically reconfigures the FPGA by loading from the configuration memory the image of another neural network that satisfies the current operating conditions. The reconfiguration is either full or partial; in the partial case, which applies when the main part of the configuration image of the newly loaded neural network coincides with that of the previously loaded one, the configuration of only part of the FPGA logic is updated.

A neuromorphic data processing device is also proposed. It includes neural network blocks implemented on the basis of processors with a limited instruction set and field-programmable gate arrays (FPGAs), and is characterized in that it additionally comprises: an FPGA whose interconnect configuration is based on static random access memory and which contains a neural network block that implements the selection of a neural network for solving the current task, satisfying selection criteria that include the highest expected accuracy of solving the current task and the smallest computational resources spent on solving it; a configuration memory; and a high-speed data transfer channel between the FPGA and the configuration memory with a bandwidth of 50-100 Tb/s. The FPGA is configured to load, when power is applied, the configuration of the said neural network block from the configuration memory over the high-speed data transfer channel. The block is configured to select the corresponding neural network from the configuration memory, based on analysis of the input data of the current task and in accordance with the above criteria, and to load its image into the FPGA. If the solution obtained by the loaded neural network does not satisfy predetermined criteria, the block is also configured to dynamically reconfigure the FPGA by loading from the configuration memory the image of another neural network that satisfies the current operating conditions. This reconfiguration is full or, when the main part of the configuration image of the newly loaded neural network coincides with that of the previously loaded one, partial, updating the configuration of only part of the FPGA logic.

As a consequence, a device implementing the proposed method occupies a volume of about 12-36 cubic centimeters. Because the number of electronic components and, accordingly, the number of soldered connections between them is reduced, the reliability of the device increases.

Thus, the technical result of using the proposed method and device consists, first, in increasing the reliability of the device and reducing its size and weight by several orders of magnitude and, second, in faster FPGA configuration, achieved by loading into the FPGA the image of a neural network that satisfies predetermined criteria while providing partial dynamic reconfiguration of the FPGA.

Fig. 1 shows the general architecture of the proposed method.

Fig. 2 shows the structure of the dedicated high-speed data transfer channel between the FPGA and the configuration memory.

Fig. 3 shows the state of the FPGA at the initial moment, when power is applied.

Fig. 4 shows a functional diagram of selection by the special neural network (SpNN) and loading of the corresponding neural network from ROM.

Fig. 5 shows the state of the FPGA and ROM after the FPGA has been configured with the selected neural network.

Fig. 6 shows the state of the FPGA and ROM after the operating conditions of the device have changed and the FPGA has been configured with a selected neural network different from the one previously used.

In simplified form, modern FPGAs are a set of logic gates with programmable connections between them, together with a set of peripheral controllers for interacting with external chips. The connections between logic gates can be programmed on various principles (fusible links, controllable links, non-volatile memory, etc.), but the proposed solution is concerned with FPGAs in which the logical connections are configured using static RAM. This type of FPGA can be configured as quickly as possible when power is applied.

The proposed method of neuromorphic processing is implemented by introducing into the FPGA configuration, at the initial moment, a special operation performed by a special trained neural network. This special neural network performs the following tasks:

1) Determining from the input data the neural network (or other algorithm) that is best at a given moment for solving the task (Fig. 1), depending on changes in the operating conditions of the product (external and internal noise, dynamic range, image content, operating temperature, etc.). The main purpose of this additional operation is to select, from a set of neural networks (or other solutions), the best one under the specified quality criteria.

2) Assessing the quality of the neural network in use at a given moment.

The loadable neural networks used in the proposed method can be convolutional neural networks (CNNs) with various numbers of convolutional layers as well as other types of networks: recurrent neural networks and combined neural networks.

The approach to implementing the SpNN architecture within the claimed method can be formulated as follows:

1. Choosing the objective function. As an example implementation, this can be a function that takes into account computational complexity and image classification accuracy.

2. Introducing a set of classifiers $A$ (a set of CNNs or other classifiers), a function $\Phi_1$ for choosing the next classifier, and a function $\Phi_2$ for updating the state based on the result of the classifier.

3. Approximating the selection function $\Phi_1$ and the update function $\Phi_2$ for the task at hand. The functions $\Phi_1$ and $\Phi_2$ should minimize the computational resources spent and maximize the expected accuracy of the solution. The selection function

$$\Phi_1 : (s_t, x) \mapsto a$$

maps the current state $s_t$ and the data $x$ (an image, in the case considered) to the space of classifier "keys"

$$a \in A \cup \{\varepsilon\},$$

where $\varepsilon$ is a symbol meaning that the computation should stop. The term "key" is used here to emphasize that $\Phi_1$ does not generate an algorithm $a$ but only selects it from a list. A classifier $a \in A$ takes the data $x$ as input and returns a decision vector $y^*$: $a(x) \to y^*$. For convenience, the "key" of $a$ can be assumed to also carry information about the "complexity" of the algorithm $a$. The state update function

$$\Phi_2 : (s_t, y^*) \mapsto (s_{t+1}, y)$$

maps the current state and the result of the selected classifier to the state space and a new generalized solution $y$ that takes into account the previous answers of the classifiers. Within the claimed method this can be implemented in simplified form using a naive Bayes classifier and minimization of the expected entropy.
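The selection and update loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `phi1`, `phi2` and `classify`, the dictionary layout of the classifiers, and the entropy threshold are all hypothetical. The update step fuses classifier answers with a naive Bayes product. The selection rule is a simple stand-in for minimizing expected entropy: it looks only at the state, stops once the fused belief is confident enough, and otherwise picks the cheapest unused classifier.

```python
import math

STOP = None  # plays the role of the stop symbol epsilon


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def phi2(state, y_star):
    """State update: fuse a new classifier answer into the belief (naive Bayes product)."""
    fused = [s * y for s, y in zip(state, y_star)]
    total = sum(fused)
    if total == 0.0:  # contradictory evidence: keep the previous belief
        return state
    return [f / total for f in fused]


def phi1(state, used, classifiers, max_entropy):
    """Selection: return the key of the next classifier, or STOP."""
    if entropy(state) <= max_entropy:
        return STOP
    remaining = [k for k in classifiers if k not in used]
    if not remaining:
        return STOP
    # Assumed heuristic: cheapest classifier first.
    return min(remaining, key=lambda k: classifiers[k]["cost"])


def classify(x, classifiers, n_classes, max_entropy=0.3):
    state = [1.0 / n_classes] * n_classes  # uniform prior
    used = []
    while True:
        key = phi1(state, used, classifiers, max_entropy)
        if key is STOP:
            return state, used
        state = phi2(state, classifiers[key]["fn"](x))
        used.append(key)


# Toy classifiers that return fixed probability vectors.
classifiers = {
    "small": {"cost": 1.0, "fn": lambda x: [0.6, 0.4]},
    "large": {"cost": 5.0, "fn": lambda x: [0.95, 0.05]},
}
belief, used = classify(x=None, classifiers=classifiers, n_classes=2)
```

In this toy run the cheap classifier alone leaves the belief too uncertain, so the more expensive one is also invoked before the loop stops.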

More generally, to approximate the optimal selection function $\Phi_1(s_{t-1})$ and the state update function $\Phi_2$, it is proposed to use the SpNN as a controlling recurrent neural network agent that interacts with the source data (the image) indirectly, through trained classifier neural networks. The SpNN can be trained in the reinforcement learning paradigm, with the classifiers acting as the environment for the agent.

The objective when training the agent network is maximum recognition accuracy on the images of the training set with a minimum of resources spent; the reward $R$ during training therefore combines the accuracy of the answer obtained and the time spent on it:

$$R = r - \lambda \sum_i T(A_i),$$

where $r = 1$ if the object is recognized correctly and $0$ otherwise, $T(A_i)$ is the execution time of classifier $A_i$ in physical units, and $\lambda > 0$ is a hyperparameter that describes how long the recognition process may take on average. At $\lambda = 0$ the computational resources are unlimited, since the "cost of computation" does not affect the reward. Since $T(A_i) > 0$ for any $i$, a reward received in the distant future is reduced without the use of additional hyperparameters. If the "answer" is a vector of probabilities, $r$ corresponds to the cross-entropy between the reference and obtained distributions.
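As a small illustration of how such a reward trades accuracy against time, the sketch below assumes an additive form, $R = r - \lambda \sum_i T(A_i)$. Both the additive form and the function name `reward` are assumptions made for this example; the text only states that the reward combines the accuracy term $r$ with the classifier run times $T(A_i)$ weighted by $\lambda$.

```python
def reward(correct, run_times, lam):
    """Assumed additive reward: accuracy term minus lambda-weighted total run time.

    correct   -- True if the object was recognized correctly (r = 1), else r = 0
    run_times -- execution times T(A_i) of the classifiers that were invoked, in seconds
    lam       -- hyperparameter lambda >= 0, the price of one second of computation
    """
    r = 1.0 if correct else 0.0
    return r - lam * sum(run_times)


# Two classifiers were run (2 ms and 10 ms) and the final answer was correct.
with_cost = reward(True, [0.002, 0.010], lam=10.0)
# With lam = 0 the time spent does not affect the reward.
without_cost = reward(True, [0.002, 0.010], lam=0.0)
```

Each additional classifier call lowers the reward by a positive amount, so long recognition episodes are penalized without a separate discount factor.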

A network based on long short-term memory (LSTM) (Hochreiter S., Schmidhuber J. Long short-term memory // Neural Computation. V. 9. No. 8. 1997. PP. 1735-1780) or its analogs can be used as the controlling neural network of the agent.

To reduce the computational cost it is also advisable to use an "attention mechanism"; the action space during agent training thus includes the space of classifier keys

$$A \cup \{\varepsilon\},$$

where $\varepsilon$ is a symbol meaning that the computation should stop, as well as control of the classified window on the image, which lies in $\Omega$, the domain of the image. As a result, the agent needs to learn a policy

$$\pi_\theta(a_t \mid s_t),$$

where $\theta$ are the neural network weights tuned during training and $s_t$ is the state of the system, which includes the history of all previous actions and classifier answers.

In implementing the proposed method, the special neural network for choosing a neural network is a specialized neural network block that performs two tasks: choosing the best neural network and assessing the quality of the neural network in use at a given moment.

The loading time of a neural network, once the neural network block has chosen the best technical solution, is limited by the bandwidth of the channel between the FPGA and the configuration memory (ROM). For example, for an image of about 70 Mb (which corresponds to a multilayer convolutional neural network for processing video) and a channel bandwidth of 100 Tb/s, full reconfiguration of the FPGA takes about 0.7 μs. With partial dynamic reconfiguration and a less complex technical solution, the FPGA reconfiguration time can be reduced to a few nanoseconds.
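The timing estimate can be checked with a few lines of arithmetic. In the sketch below the function name is hypothetical, the image size and bandwidth are the figures quoted in the text (taken in matching units), and the 1% share of the image rewritten during partial reconfiguration is purely an illustrative assumption.

```python
def reconfiguration_time_s(image_bits, channel_bits_per_s):
    """Lower bound on load time: image size divided by channel bandwidth."""
    return image_bits / channel_bits_per_s


IMAGE_BITS = 70e6      # configuration image of about 70 Mb
CHANNEL_BPS = 100e12   # channel bandwidth of 100 Tb/s

full = reconfiguration_time_s(IMAGE_BITS, CHANNEL_BPS)

# Assumption for illustration: partial reconfiguration rewrites 1% of the image.
partial = reconfiguration_time_s(0.01 * IMAGE_BITS, CHANNEL_BPS)

print(f"full: {full * 1e6:.2f} us, partial: {partial * 1e9:.1f} ns")
```

Under these assumptions the full load takes about 0.7 μs and the partial load a few nanoseconds, in line with the estimates in the text.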

To provide the required bandwidth, differential data transmission lines are proposed (Fig. 2). The current limit on the data rate over a single differential pair in an FPGA is about 100 Gb/s (Intel Agilex FPGA Advanced Information Brief (Device Overview), 02.07.2019, https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/agilex/ag-overview.pdf). Thus, to provide the required bandwidth, it is proposed to include about 1000 differential pairs in the FPGA configuration channel, using about 2000 FPGA pins and 2000 ROM pins, respectively.
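The pair and pin counts follow directly from the two rates. The short sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes two pins per differential pair on each chip with no allowance for encoding overhead or clock lines.

```python
def pairs_needed(channel_bps, per_pair_bps):
    """Number of differential pairs required, rounded up (integer arithmetic)."""
    return -(-channel_bps // per_pair_bps)


CHANNEL_BPS = 100 * 10**12   # target bandwidth, 100 Tb/s
PER_PAIR_BPS = 100 * 10**9   # about 100 Gb/s per differential pair

pairs = pairs_needed(CHANNEL_BPS, PER_PAIR_BPS)
pins_per_chip = 2 * pairs    # two pins per pair, on the FPGA and on the ROM alike

print(pairs, pins_per_chip)
```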

ROM chips measuring about 216 square millimeters (B17A FF_512Gb_iTb_2Tb_4Tb_8Tb_Async_Sync_NAND_Datasheet.pdf - Rev. L 03/29/19 EN, https://www.micron.com/products/nand-flash/3d-nand/part-catalog/mt29f8t08ewhafj6-3r) have a capacity of 8 Tb, which allows storing 114,284 neural networks for image processing (with a single configuration image of 70 Mb).

The proposed method of neuromorphic processing can be implemented by a device that contains an FPGA, inside which a special neural network (SpNN) block is implemented, and a ROM block, which holds the convolutional neural networks CNN_1, CNN_2, …, CNN_N.

The device operates as follows. At the initial moment (Fig. 3), when power is applied, the configuration of the SpNN block is loaded into the FPGA. The SpNN occupies relatively little space on the FPGA die. The rest of the FPGA is unused and is disconnected from static power consumption. Accordingly, the power consumption of the FPGA at this moment is low.

Next, based on analysis of the input data, the SpNN selects the best neural network (for example, the convolutional neural network CNN_1) and loads it from ROM into the FPGA. The FPGA is thus dynamically reconfigured during operation (Fig. 4).

The loaded CNN_1 (Fig. 5) performs the processing. CNN_1 occupies part of the free space inside the FPGA. In this state the power consumption is orders of magnitude higher than in the initial configuration.

Then, if the solution obtained by CNN_1 does not satisfy some criterion, the FPGA undergoes ultra-fast dynamic reconfiguration with a new image from the configuration memory (Fig. 6). In this case the new solution is the convolutional neural network CNN_2. Depending on the operating conditions of the device, the SpNN could have chosen any neural network stored in ROM, from CNN_1 to CNN_N, as the best solution.

Each time the operating conditions of the device change, the special neural network (SpNN) again selects from ROM the neural network that is best at that moment and loads the corresponding configuration from ROM into the FPGA.
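
The selection cycle described above can be illustrated with a minimal Python sketch. It is not part of the claimed device. The names `NetworkImage`, `select_network`, and `run_cycle`, the rule of ranking by expected accuracy first and resource cost second, the condition labels, and all numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkImage:
    """A configuration image stored in ROM (hypothetical model)."""
    name: str
    logic_elements: int       # resource cost of the image (assumed metric)
    expected_accuracy: dict   # operating condition -> expected accuracy in [0, 1]


def select_network(images, condition, excluded=()):
    """Stand-in for the selector block: highest expected accuracy, then lowest cost."""
    candidates = [img for img in images if img.name not in excluded]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda img: (img.expected_accuracy.get(condition, 0.0),
                         -img.logic_elements),
    )


def run_cycle(images, condition, measured_accuracy, threshold):
    """Load networks until one meets the threshold; return (load order, final choice)."""
    loaded, rejected = [], set()
    while True:
        image = select_network(images, condition, rejected)
        if image is None:
            return loaded, None            # nothing in ROM meets the criterion
        loaded.append(image.name)          # models one FPGA reconfiguration
        if measured_accuracy(image, condition) >= threshold:
            return loaded, image.name
        rejected.add(image.name)


# Hypothetical ROM contents; the figures are invented for the example.
ROM = [
    NetworkImage("CNN_1", 40_000, {"day": 0.90, "night": 0.60}),
    NetworkImage("CNN_2", 55_000, {"day": 0.90, "night": 0.85}),
]
```

In this sketch a tie in expected accuracy is broken in favour of the smaller image, which is one possible way to combine the two selection criteria; a weighted score would be an equally valid assumption.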

In the presented example implementation, CNN_2 has the same architecture as CNN_1 but with one additional convolutional layer. When reconfiguring to CNN_2, it is therefore advisable to use partial dynamic reconfiguration, which speeds up configuring the FPGA with the CNN_2 image. Partial dynamic reconfiguration updates the configuration of only a small part of the logic and leaves the main part of the configuration image untouched.
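
The benefit of partial reconfiguration can be sketched as follows. Assume, purely for illustration, that a configuration image is a sequence of fixed-size frames; then only the frames that differ between the currently loaded image and the new one need to be rewritten. The frame size, the toy byte strings, and the helper names `split_frames` and `changed_frames` are hypothetical and do not describe any vendor's bitstream format.

```python
def split_frames(image: bytes, frame_size: int) -> list:
    """Split a configuration image into fixed-size frames."""
    return [image[i:i + frame_size] for i in range(0, len(image), frame_size)]


def changed_frames(current: bytes, new: bytes, frame_size: int) -> list:
    """Indices of frames that must be rewritten to go from `current` to `new`."""
    old_frames = split_frames(current, frame_size)
    new_frames = split_frames(new, frame_size)
    changed = []
    for index, frame in enumerate(new_frames):
        # A frame is rewritten if it is new or its contents differ.
        if index >= len(old_frames) or old_frames[index] != frame:
            changed.append(index)
    return changed


# Toy images: the second network reuses the first one's frames
# and fills one previously empty frame with the extra layer.
cnn_1 = b"AAAA" + b"BBBB" + b"CCCC" + b"\x00\x00\x00\x00"
cnn_2 = b"AAAA" + b"BBBB" + b"CCCC" + b"DDDD"

to_write = changed_frames(cnn_1, cnn_2, frame_size=4)
fraction = len(to_write) / len(split_frames(cnn_2, 4))
```

Here only one frame out of four is rewritten, so the amount of data transferred from the configuration memory is a quarter of the full image.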

Besides neural networks, in another embodiment of the claimed method the special neural network (SpNN) can also select and load into the FPGA other mathematical blocks of various architectures.

In yet another embodiment of the claimed method, the special neural network (SpNN) selects not a single neural network (or other algorithmic solution) to load into the FPGA, but a whole sequence of optimal solutions to be loaded as the operating conditions change.

The applicants developed an experimental prototype of a device implementing the claimed method, based on a Cyclone V family system-on-chip with 110,000 programmable logic elements and a Micron Fortis Flash family configuration memory measuring 12x18 mm with a capacity of 512 Gb. Experiments were carried out with four different neural network images stored in ROM. The experimental prototype was limited in memory bandwidth and, as a consequence, in the speed of dynamic (partial dynamic) reconfiguration. The experimental studies confirmed that the claimed technical solutions are workable.
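
The dependence of reconfiguration speed on memory bandwidth can be made concrete with a short estimate: the transfer time cannot be less than the number of bits written divided by the channel bandwidth. The sketch below is only an illustration; the function name `transfer_time_s` and every numeric value are hypothetical and are not taken from the experiment.

```python
def transfer_time_s(image_bits: float, bandwidth_bits_per_s: float) -> float:
    """Lower bound on reconfiguration time: bits to write divided by bandwidth."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return image_bits / bandwidth_bits_per_s


# Hypothetical values, chosen only to show the arithmetic.
IMAGE_BITS = 32e6          # size of a full configuration image, bits
BANDWIDTH = 400e6          # memory-to-FPGA bandwidth, bits per second
CHANGED_FRACTION = 0.25    # share of the image rewritten in a partial update

full_time = transfer_time_s(IMAGE_BITS, BANDWIDTH)
partial_time = transfer_time_s(IMAGE_BITS * CHANGED_FRACTION, BANDWIDTH)
```

Under these assumed numbers a full reconfiguration takes at least 0.08 s and a partial one at least 0.02 s, which shows why both a faster channel and partial reconfiguration shorten the switch between networks.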

Compared with the prototype, using special neural networks to select neural networks and moving to dynamically reconfigurable neural networks makes it possible to build compact, universal neuromorphic processing devices that approach the capabilities of the human brain and can perform a broad class of loosely specified tasks.

Claims

1. A method of neuromorphic data processing, including neural network processing operations, characterized in that, when power is supplied to a programmable logic integrated circuit (FPGA) containing a neural network block that selects, for solving the current task, a neural network satisfying selection criteria, which include the highest expected accuracy of solving the current task and the smallest computational resources spent on solving it, the configuration of said neural network block is loaded from a configuration memory; said neural network block then, based on analysis of the input data of the current task and in accordance with the above criteria, selects the corresponding neural network and loads it into the FPGA memory, thereby providing dynamic reconfiguration of the FPGA during operation; the loaded neural network then performs data processing to solve the current task; and, if the solution of the current task obtained by the loaded neural network does not satisfy predetermined criteria, said neural network block performs dynamic reconfiguration of the FPGA, loading from the configuration memory an image of another neural network that satisfies the current operating conditions, thereby providing either full dynamic reconfiguration or, when the main part of the configuration image of the newly loaded neural network coincides with that of the previously loaded neural network, partial dynamic reconfiguration, in which the configuration of only part of the FPGA logic is updated.

2. A device for neuromorphic data processing, including neural network blocks implemented on the basis of reduced-instruction-set processors and programmable logic integrated circuits (FPGA), characterized in that the device contains an FPGA whose interconnect configuration is based on static random-access memory and which contains a neural network block that selects, for solving the current task, a neural network satisfying selection criteria, which include the highest expected accuracy of solving the current task and the smallest computational resources spent on solving it; a configuration memory; and a high-speed data transmission channel between the FPGA and the configuration memory with a bandwidth of 50 to 100 TB/s; wherein the FPGA is configured so that, when power is applied, the configuration of said neural network block is loaded into it from the configuration memory over the high-speed data transmission channel; and said neural network block is configured, based on analysis of the input data of the current task and in accordance with the above criteria, to select the corresponding neural network from the configuration memory and load its image into the FPGA, and, if the solution of the task obtained by the loaded neural network does not satisfy predetermined criteria, to perform dynamic reconfiguration of the FPGA, loading from the configuration memory an image of another neural network that satisfies the current operating conditions, thereby providing either full dynamic reconfiguration or, when the main part of the configuration image of the newly loaded neural network coincides with that of the previously loaded neural network, partial dynamic reconfiguration, in which the configuration of only part of the FPGA logic is updated.