RU2848159C1

RU2848159C1 - Device for multiplying binary matrices

Info

Publication number: RU2848159C1
Application number: RU2025104287A
Authority: RU
Inventors: Алексей Владимирович Болгак; Эдуард Игоревич Ватутин
Filing date: 2025-02-25
Publication date: 2025-10-16

Abstract

FIELD: computer technology.

SUBSTANCE: a device for multiplying matrices contains a matrix of n×n operational blocks (where n is the size of the square matrices being multiplied), first and second blocks of matrix coefficients, a shift register and a group of n two-stage registers. Each operational block comprises first, second and third flip-flops, first, second and third AND elements, first and second OR elements and an inverter. Each of the matrix coefficient blocks contains n×n storage blocks, and each storage block contains an AND element, a flip-flop, a group of n AND elements and a group of n OR elements. A first group of n flip-flops, a second group of n flip-flops and a third group of n flip-flops are introduced into each storage block. A first group of (n–1) two-stage registers, a second group of (n–1) two-stage registers, a group of n×n flip-flops and a group of n×(n–1) OR elements are introduced into the coefficient block.

EFFECT: reduction in the processing time of square binary matrices by means of pipelining the read operation from a specialised multi-port memory.

1 cl, 5 dwg

Description

The device relates to computing technology and can be used to multiply square binary matrices of size n × n elements.
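The text does not define "multiplying binary matrices" explicitly. Because the operational blocks described below combine operands with AND elements and accumulate with OR elements, we assume the Boolean product, with AND as multiplication and OR as addition, rather than integer or GF(2) arithmetic. The following is a minimal reference implementation under that assumption. The function name `bool_matmul` is hypothetical.

```python
def bool_matmul(a, b):
    """Boolean product of two square 0/1 matrices: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


a = [[1, 0],
     [1, 1]]
b = [[0, 1],
     [1, 0]]
print(bool_matmul(a, b))  # [[0, 1], [1, 1]]
```

This is the result the hardware is expected to reproduce. It serves only as a check, not as a description of the device's internal schedule.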

A device for multiplying arbitrary square matrices is known (Patent of the Russian Federation for Utility Model No. 193927, "Device for Multiplying Binary Matrices", filed June 26, 2019, published November 21, 2019, Bulletin No. 33). It comprises a matrix of n × n operational blocks (where n is the size of the square matrices being multiplied), first and second blocks of matrix coefficients, a shift register and a group of n two-stage registers. Each operational block comprises first, second and third flip-flops, first, second and third AND elements, first and second OR elements and an inverter. Each of the matrix coefficient blocks contains n × n storage blocks, and each storage block contains an AND element, a flip-flop, a group of n AND elements and a group of n OR elements [1].

The disadvantage of this device is the long time required to multiply binary matrices and, as a consequence, its low performance.

The closest in technical essence to the claimed device is the device described in Patent of the Russian Federation for Utility Model No. 157948 ("Device for Matrix Multiplication", Class G06F 17/16, filed July 8, 2015, published December 20, 2015, Bulletin No. 35). It contains a matrix of n × n operational blocks (where n is the size of the square matrices being multiplied), each of which contains first, second and third registers, a multiplier and an adder, a multiplexer, first and second blocks of matrix coefficients, a shift register and a group of n two-stage registers. Each of the blocks of matrix coefficients contains n × n storage blocks and a group of n output OR elements, and each storage block contains a register, an AND element, a group of n AND elements and a group of n OR elements [2].

The disadvantage of this device is hardware redundancy and, as a result, high hardware complexity.

The technical objective of the proposed invention is to reduce the processing time of square binary matrices by pipelining the read operation from a specialized multi-port memory.
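The pipelined read mentioned here takes place inside each storage block. According to the claim below, the incoming read row and read column select bits are latched in the first and second groups of n flip-flops before they gate the stored bit. The gated bit is then ORed with the data arriving from the previous row, and the result is latched in the third group. The Python sketch below models one such block at clock-cycle level, reading each of the n lanes as one read port.

This is an illustration under stated assumptions, not the patented circuit:
- The class and method names are hypothetical.
- The three groups of flip-flops are assumed to share one read clock, since this part of the text does not say which clock drives them.
- Write and read are modeled as separate events.

```python
class StorageBlock:
    """Cycle-level sketch of one storage block 15 (hypothetical model, not the patent's netlist)."""

    def __init__(self, n):
        self.n = n
        self.bit = 0                 # flip-flop 16: the stored matrix element
        self.reset()

    def reset(self):
        # The reset input clears the three groups of n flip-flops (flip-flop 16 is not listed).
        self.row_sel = [0] * self.n  # first group: latched read row address bits
        self.col_sel = [0] * self.n  # second group: latched read column address bits
        self.data = [0] * self.n     # third group: latched data for the next row

    def write(self, row_addr, col_addr, value):
        """Write clock pulse: AND element 17 clocks flip-flop 16 only if both write addresses are active."""
        if row_addr and col_addr:
            self.bit = value

    def clock(self, row_in, col_in, prev_data):
        """One (assumed) read clock edge; returns (row address outputs, column address outputs, data outputs)."""
        # AND group 18 sees the already latched address bits, so the data lags the addresses by one cycle.
        nxt = [(self.bit & self.row_sel[k] & self.col_sel[k]) | prev_data[k]
               for k in range(self.n)]
        self.row_sel = list(row_in)
        self.col_sel = list(col_in)
        self.data = nxt
        return self.row_sel, self.col_sel, self.data


blk = StorageBlock(n=2)
blk.write(1, 1, 1)                             # store a 1
blk.clock([1, 0], [1, 0], [0, 0])              # cycle 1: lane 0 selects this block
_, _, out = blk.clock([0, 0], [0, 0], [0, 0])  # cycle 2: the selected bit reaches the data outputs
print(out)  # [1, 0]
```

In this reading, each block adds exactly one register stage to every lane. A column of blocks therefore behaves as a pipelined OR-reduction, so a new read can enter while earlier ones are still moving down the column.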

The technical problem is solved as follows. The device contains a matrix of n × n operational blocks (where n is the size of the square matrices being multiplied), first and second blocks of matrix coefficients, a shift register and a group of n two-stage registers. Each operational block contains first, second and third flip-flops, first, second and third AND elements, first and second OR elements and an inverter. Each of the blocks of matrix coefficients contains n × n storage blocks, and each storage block contains an AND element, a flip-flop, a group of n AND elements and a group of n OR elements. In this device, the first input of the (i, j)-th operational block (where 1 ≤ i ≤ n, 2 ≤ j ≤ n) is connected to the first output of the (i, j - 1)-th operational block, the second input of the (l, k)-th operational block (where 2 ≤ l ≤ n, 1 ≤ k ≤ n) is connected to the second output of the (l - 1, k)-th operational block, the k-th output of the group of device outputs (where 1 ≤ k ≤ n) is connected to the third output of the (n, k)-th operational block, the clock inputs of all operational blocks are connected to the clock input of the device's matrix of operational blocks, the third input of the (l, k)-th operational block (where 2 ≤ l ≤ n, 1 ≤ k ≤ n) is connected to the third output of the (l - 1, k)-th operational block, the first input of the (i, 1)-th operational block (where 1 ≤ i ≤ n) is connected to the i-th output of the group of outputs of the first block of matrix coefficients, the second input of the (1, k)-th operational block (where 1 ≤ k ≤ n) is connected to the k-th output of the group of outputs of the second block of matrix coefficients, the control inputs of the operational blocks are connected to the control input of the device, and the reset inputs of the operational blocks are connected to the reset input of the device, which is also connected to the reset inputs of each of the n two-stage registers, the data input of the m-th two-stage register (where 2 ≤ m ≤ n) is connected to the output of the (m - 1)-th two-stage register, the data input of the first two-stage register is connected to the output of the shift register, whose clock input is connected to the clock input of the device, which is also connected to the clock inputs of each of the n two-stage registers, the output of the k-th two-stage register (where 1 ≤ k ≤ n) is connected to the k-th input of the group of read row addresses of the second block of matrix coefficients and to the k-th input of the group of read column addresses of the first block of matrix coefficients, and the clock inputs of the first and second blocks of matrix coefficients are connected to the write clock input of the device. In each block of matrix coefficients, the clock input of the block is connected to the clock inputs of each of the n × n storage blocks, the k-th input (where 1 ≤ k ≤ n) of the group of read row addresses of the block is connected to the read row address input of the (1, 1)-th storage block, the k-th input (where 1 ≤ k ≤ n) of the group of read column addresses of the block is connected to the read column address input of the (1, 1)-th storage block, the k-th input (where 1 ≤ k ≤ n) of the group of data inputs from the previous row of the (m, p)-th storage block (where 2 ≤ m ≤ n, 1 ≤ p ≤ n) is connected to the k-th data output for the next row of the (m - 1, p)-th storage block, the k-th input (where 1 ≤ k ≤ n) of the group of write row addresses of the block is connected to the write row address input of the (i, k)-th storage block (where 1 ≤ i ≤ n), the k-th input (where 1 ≤ k ≤ n) of the group of write column addresses of the block is connected to the write column address input of the (k, i)-th storage block (where 1 ≤ i ≤ n), and the write data input of the block is connected to the write data input of each of the n × n storage blocks. In each storage block, the first input of the AND element is connected to the write column address input of the storage block, the second input to the write row address input and the third input to the clock input of the storage block, the output of the AND element is connected to the clock input of the flip-flop, the data input of the flip-flop is connected to the write data input of the storage block, the output of the flip-flop is connected to the first inputs of each of the n elements of the group of AND elements, the output of the k-th element (where 1 ≤ k ≤ n) of the group of AND elements is connected to the first input of the k-th element of the group of OR elements, and the second input of the k-th element of the group of OR elements is connected to the k-th data input from the previous row of the storage block. In each operational block, the input and output of the second flip-flop are connected respectively to the first input and first output of the operational block, the input and output of the first flip-flop are connected respectively to the second input and second output of the operational block, the third output of the operational block is connected to the output of the third flip-flop, the output of the first flip-flop is connected to the first input of the first AND element, the output of the second flip-flop is connected to the second input of the first AND element, the second input of the first OR element is connected to the output of the first AND element, whose first and second inputs are connected respectively to the first and second outputs of the operational block, the clock input of the operational block is connected to the clock inputs of the first, second and third flip-flops, the output of the first OR element is connected to the second input of the second AND element, the second input of the third AND element is connected to the third input of the operational block and its output to the second input of the second OR element, whose output is connected to the input of the third flip-flop, the control input of the operational block is connected to the first input of the second AND element and to the input of the inverter, whose output is connected to the first input of the third AND element, the reset input of the operational block is connected to the reset inputs of the first, second and third flip-flops, the output of the second AND element is connected to the first input of the second OR element, and the output of the third flip-flop is connected to the first input of the first OR element. The device is additionally provided with a first group of n flip-flops, a second group of n flip-flops and a third group of n flip-flops in each storage block, and with a first group of (n - 1) two-stage registers, a second group of (n - 1) two-stage registers, a group of n × n flip-flops and a group of n × (n - 1) OR elements in each block of matrix coefficients. The data input of the k-th flip-flop (where 1 ≤ k ≤ n) of the first group is connected to the k-th input of the group of read row address inputs of the storage block, and its output is connected to the second input of the k-th element of the group of AND elements and to the k-th output of the group of read row address outputs of the storage block. The data input of the k-th flip-flop (where 1 ≤ k ≤ n) of the second group is connected to the k-th input of the group of read column address inputs of the storage block, and its output is connected to the third input of the k-th element of the group of AND elements and to the k-th output of the group of read column address outputs of the storage block. The data input of the k-th flip-flop (where 1 ≤ k ≤ n) of the third group is connected to the output of the k-th element of the group of OR elements, and its output is connected to the k-th output of the group of data outputs of the storage block. The reset input of the storage block is connected to the reset inputs of the first, second and third groups of flip-flops, the input of the first element of the first group of two-stage registers is connected to the read column address input of the block of matrix coefficients, the output of the k-th element (where 1 ≤ k ≤ n - 1) of the first group of two-stage registers is connected to the k-th input of the group of read column address inputs of the (1, k + 1)-th storage block and to the input of the (k + 1)-th element of the first group of two-stage registers, the input of the first element of the second group of two-stage registers is connected to the read row address input of the block of matrix coefficients, the output of the k-th element (where 1 ≤ k ≤ n - 1) of the second group of two-stage registers is connected to the k-th input of the group of read row address inputs of the (k + 1, 1)-th storage block and to the input of the (k + 1)-th element of the second group of two-stage registers, the k-th output (where 1 ≤ k ≤ n) of the group of read row address outputs of the (m, p)-th storage block (where 1 ≤ m ≤ n, 1 ≤ p ≤ n - 1) is connected to the k-th input of the group of read row address inputs of the (m, p + 1)-th storage block, the k-th output (where 1 ≤ k ≤ n) of the group of read column address outputs of the (m, p)-th storage block (where 1 ≤ m ≤ n - 1, 1 ≤ p ≤ n) is connected to the k-th input of the group of read column address inputs of the (m + 1, p)-th storage block, and the reset input of the block of matrix coefficients is connected to the reset inputs of each of the n × n storage blocks, of the first and second groups of (n - 1) two-stage registers and of the group of n × n flip-flops. The k-th bit of the group of data outputs of the (n, 1)-th storage block is connected to the data input of the k-th element (where 1 ≤ k ≤ n) of the group of flip-flops, the k-th bit of the group of data outputs of the (n, p)-th storage block (where 2 ≤ p ≤ n) is connected to the first input of the (k, p - 1)-th element (where 1 ≤ k ≤ n) of the group of OR elements, the output of the (k, p)-th element (where 1 ≤ k ≤ n, 1 ≤ p ≤ n - 1) of the group of OR elements is connected to the data input of the (k, p + 1)-th element of the group of flip-flops, the output of the (k, p)-th element of the group of flip-flops is connected to the second input of the (k, p + 1)-th element of the group of OR elements, and the output of the (k, n)-th element (where 1 ≤ k ≤ n) of the group of flip-flops is connected to the k-th output of the group of outputs of the block of matrix coefficients.

Fig. 1 shows a functional diagram of the device; Fig. 2, a diagram of the operational block; Fig. 3, a diagram of the block of matrix coefficients; Fig. 4, a diagram of the storage block; Fig. 5, the time required to perform multiplication with the proposed device and with the prototype.

The device for multiplying matrices (Fig. 1) contains a matrix of n × n operational blocks 1, a first block 2 of matrix coefficients, a second block 3 of matrix coefficients, a shift register 4 and a group of n two-stage registers 5. Each operational block 1 (Fig. 2) contains first 6, second 7 and third 8 flip-flops, an AND element 9, an OR element 10, an AND element 11, an AND element 12, an OR element 13 and an inverter 14. The first 2 and second 3 blocks of matrix coefficients have the same structure (Fig. 3); each of them includes n × n storage blocks 15, each of which (Fig. 4) contains a flip-flop 16, an AND element 17, a group of AND elements 18, a group of OR elements 19, a group of flip-flops 28, a group of flip-flops 29, a group of flip-flops 30, a clock input 20, a write row address input 21, a write column address input 22, a write data input 23, a group of read row address inputs 24, a group of read column address inputs 25, a group of data inputs 26 from the previous row, a group of data outputs 27 for the next row, a group of next-row address outputs 31, a group of next-column address outputs 38 and a reset input. Each of the blocks 2 and 3 of matrix coefficients also includes a group of read row address inputs 34, a group of read column address inputs 37, a write row address input 32, a write column address input 36, a write data input 35, a write clock input 33, a reset input, a group of two-stage registers 39, a group of two-stage registers 40, a group of flip-flops 41, a group of OR elements 42 and a group of outputs 43. The connections are as follows: the first input of the (i, j)-th operational block 1 (where 1 ≤ i ≤ n, 2 ≤ j ≤ n) is connected to the first output of the (i, j - 1)-th operational block 1, the second input of the (l, k)-th operational block 1 (where 2 ≤ l ≤ n, 1 ≤ k ≤ n) is connected to the second output of the (l - 1, k)-th operational block 1, the k-th output of the group of device outputs is connected to the third output of the (n, k)-th operational block 1, the clock inputs of all operational blocks 1 are connected to the clock input of the device's matrix of operational blocks, the third input of the (l, k)-th operational block 1 (where 2 ≤ l ≤ n, 1 ≤ k ≤ n) is connected to the third output of the (l - 1, k)-th operational block 1, the first input of the (i, 1)-th operational block 1 (where 1 ≤ i ≤ n) is connected to the i-th output of the group of outputs of the first block 2 of matrix coefficients, the second input of the (1, k)-th operational block 1 (where 1 ≤ k ≤ n) is connected to the k-th output of the group of outputs of the second block 3 of matrix coefficients, the control
inputs of the operational blocks 1 are connected to the control input of the device, and the reset inputs of the operational blocks 1 are connected to the reset input of the device, which is also connected to the reset inputs of each of the n two-stage registers 5; the information input of the m-th two-stage register 5 (where m = 2, …, n) is connected to the output of the (m − 1)-th two-stage register 5, and the information input of the first two-stage register 5 is connected to the output of shift register 4, whose clock input is connected to the clock input of the device, which is also connected to the clock inputs of each of the n two-stage registers 5; the output of the k-th two-stage register 5 (where k = 1, …, n) is connected to the k-th input of the group of read row address inputs 34 of the second matrix coefficient block 3 and to the k-th input of the group of read column address inputs 37 of the first matrix coefficient block 2; the write clock inputs 33 of the first 2 and second 3 matrix coefficient blocks are connected to the write clock input of the device. Within each matrix coefficient block, write clock input 33 is connected to clock input 20 of each of the n × n storage blocks 15; the k-th input (where k = 1, …, n) of the group of read row address inputs 34 of the matrix coefficient block is connected to the k-th input of the group of read row address inputs 24 of the (1, 1)-th storage block 15 and to the k-th input bit of the first two-stage register 40₁; the output of the k-th two-stage register 40 (where k = 1, …, n − 1) is connected to the read row address inputs 24 of the (1, k + 1)-th storage block 15 and (for k < n − 1) to the input of the (k + 1)-th two-stage register 40; the k-th input (where k = 1, …, n) of the group of read column address inputs 37 of the matrix coefficient block is connected to the k-th input 25 of the (1, 1)-th storage block 15 and to the k-th input bit of the first two-stage register 39₁; the output of the k-th two-stage register 39 (where k = 1, …, n − 1) is connected to the read column address inputs 25 of the (k + 1, 1)-th storage block 15 and (for k < n − 1) to the input of the (k + 1)-th two-stage register 39; the k-th output of the group of next-row address outputs 31 of the (m, p)-th storage block 15 is connected to the k-th input of the group of read row address inputs 24 of the (m, p + 1)-th storage block 15; the k-th output of the group of next-column address outputs 38 of the (m, p)-th storage block 15 is connected to the k-th input of the group of read column address inputs 25 of the (m + 1, p)-th storage block 15; the reset input of the matrix coefficient block is connected to the reset inputs of each of the n × n storage blocks 15, of the groups of (n − 1) two-stage registers 39 and 40, and of the group of n × n flip-flops 41; the k-th input of the group of data inputs 26 from the previous row of the (m, p)-th storage block 15 (where m = 2, …, n; p = 1, …, n) is connected to the k-th output of the group of data outputs 27 for the next row of the (m − 1, p)-th storage block 15; the k-th input (where k = 1, …, n) of the group of write row address inputs 32 of the matrix coefficient block is connected to write row address input 21 of the (k, i)-th storage block 15 (where i = 1, …, n); the k-th input (where k = 1, …, n) of the group of write column address inputs 36 of the matrix coefficient block is connected to write column address input 22 of the (i, k)-th storage block 15 (where i = 1, …, n); the write data input 35 of the matrix coefficient
block is connected to write data input 23 of each of the n × n storage blocks 15; the k-th bit of the data output group 27 of the (n, 1)-th storage block 15 is connected to the information input of the k-th flip-flop of group 41 (where k = 1, …, n), and the k-th bit of the data output group 27 of the (n, p)-th storage block 15 is connected to the first input of the (k, p − 1)-th element of OR element group 42; the output of the (k, p)-th element of OR element group 42 is connected to the information input of the (k, p + 1)-th flip-flop of group 41; the output of the (k, p)-th flip-flop of group 41 is connected to the second input of the (k, p + 1)-th element of OR element group 42; and the output of the (k, n)-th flip-flop of group 41 (where k = 1, …, n) is connected to the k-th output 43 of the output group of the matrix coefficient block. Within each storage block 15, the information input of flip-flop 16 is connected to write data input 23, and the clock input of flip-flop 16 is connected to the output of AND element 17. The first input of AND element 17 is connected to write column address input 22, its second input to write row address input 21 and its third input to clock input 20 of the storage block 15. The output of flip-flop 16 is connected to the first inputs of each of the n AND elements of group 18; the output of the k-th AND element of group 18 (where k = 1, …, n) is connected to the first input of the k-th OR element of group 19, whose second input is connected to the k-th data input 26 from the previous row and whose output is connected to the information input of the k-th flip-flop of group 30; the output of the k-th flip-flop 30 is connected to the k-th data output 27 for the next row; the second input of the k-th AND element of group 18 is connected to the output of the k-th flip-flop of group 28, and its third input to the output of the k-th flip-flop of group 29. Within each operational block 1, the input and output of the second flip-flop 7 are connected, respectively, to the first input and first output of the operational block 1, and the input and output of the first flip-flop 6 to its second input and second output; the third output of the operational block 1 is connected to the output of the third flip-flop 8; the output of the first flip-flop 6 is connected to the first input of AND element 9, and the output of the second flip-flop 7 to its second input; the output of AND element 9 is connected to the second input of OR element 10; the clock input of the operational block 1 is connected to the clock inputs of the first 6, second 7 and third 8 flip-flops; the output of OR element 10 is connected to the second input of AND element 11; the second input of AND element 12 is connected to the third input of the operational block 1, and its output to the second input of OR element 13, whose output is connected to the input of flip-flop 8; the control input of the operational block 1 is connected to the first input of AND element 11 and to the input of inverter 14, whose output is connected to the first input of AND element 12; the reset input of the operational block 1 is connected to the reset inputs of the first 6, second 7 and third 8 flip-flops; the output of AND element 11 is connected to the first input of OR element 13; and the output of flip-flop 8 is connected to the first input of OR element 10.
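Read as logic equations, the connections of operational block 1 give a simple next-state rule. The Python sketch below is an illustrative behavioural model and is not part of the patent. The names `OpBlockState`, `op_block_step` and `op_block_outputs` are hypothetical, bits are modelled as the integers 0 and 1, and all three flip-flops are assumed to be clocked on the same edge.

```python
from typing import NamedTuple


class OpBlockState(NamedTuple):
    """Contents of flip-flops 6, 7 and 8 of one operational block 1."""
    q6: int  # coefficient received on the second input (moves down the column)
    q7: int  # coefficient received on the first input (moves along the row)
    q8: int  # accumulated result bit of matrix C


def op_block_step(s: OpBlockState, in1: int, in2: int, in3: int, ctrl: int) -> OpBlockState:
    """One clock edge of operational block 1 (behavioural sketch, bits are 0/1)."""
    and9 = s.q6 & s.q7           # AND element 9: product of the two stored coefficients
    or10 = s.q8 | and9           # OR element 10: add it to the running disjunction
    and11 = ctrl & or10          # AND element 11: pass the accumulation in compute mode
    and12 = (1 - ctrl) & in3     # inverter 14 + AND element 12: take the bit from above in output mode
    q8_next = and11 | and12      # OR element 13 drives the input of flip-flop 8
    return OpBlockState(q6=in2, q7=in1, q8=q8_next)


def op_block_outputs(s: OpBlockState) -> tuple[int, int, int]:
    """First, second and third outputs are the outputs of flip-flops 7, 6 and 8."""
    return s.q7, s.q6, s.q8
```

With the control input at 1, flip-flop 8 keeps ORing in the product of the two stored coefficients. With the control input at 0, the block passes on the bit arriving from the block above, and this is how the result is shifted out column by column.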

Operational blocks 1, combined into a systolic matrix structure, multiply the matrix elements in parallel-pipeline fashion and store the result. The first 2 and second 3 matrix coefficient blocks store the coefficient values aᵢⱼ and bᵢⱼ of the first and second matrices. In each block, they also select the required group of n coefficients per clock cycle, in accordance with the operating logic of the matrix of operational blocks 1. Shift register 4 forms the sequence of unitary-coded values used by register group 5. The group of two-stage registers 5 stores the addresses of a group of n rows of the first matrix coefficient block 2 and of a group of n columns of the second matrix coefficient block 3. These addresses are needed to read the groups of coefficients of the matrices being multiplied in accordance with the operating logic of the matrix of operational blocks 1. Flip-flops 6 and 7 store the coefficients of the matrices being multiplied while operational blocks 1 are working. Flip-flop 8 stores the intermediate values of the disjunctions of conjunctions of the coefficients of the matrices being multiplied; after (3n − 2) steps of device operation, these flip-flops contain the coefficients of the resulting matrix C. AND element 9 performs the logical multiplication of a pair of coefficients aᵢₖ and bₖⱼ. OR element 10 computes the disjunctions of conjunctions of the coefficients. During the operating stage, AND element 11 and OR element 13 feed the disjunction of conjunctions to the input of flip-flop 8. During the result output stage, inverter 14 and AND element 12 move the multiplication result (the coefficients of the resulting matrix C) between the flip-flops 8 of the operational blocks 1 in parallel-pipeline mode.

Storage blocks 15, combined into an n × n matrix (Fig. 3), make up the first 2 and second 3 coefficient blocks of the matrices A and B being multiplied. Flip-flops 16 store the initial values of the coefficients of the binary matrices being multiplied. AND elements 17 gate the clock signal from input 20 to the clock inputs of flip-flops 16 according to the selected row i and column j within the matrix coefficient block. AND elements 18₁ - 18ₙ gate a group of n values from the outputs of flip-flops 16 to the inputs of the corresponding OR elements 19₁ - 19ₙ. The values that pass are determined by the coordinates of the requested rows and columns on inputs 24₁ - 24ₙ and 25₁ - 25ₙ, in accordance with the operating logic of the matrix of operational blocks 1. OR elements 19₁ - 19ₙ and flip-flop group 30₁ - 30ₙ in the storage blocks 15 work together with flip-flop group 41₁₁ - 41ₙₙ and OR element group 42₁₁ - 42ₙₙ. Together they deliver the next group of matrix coefficients, in pipelined mode, to outputs 43₁ - 43ₙ of the matrix coefficient block in accordance with the operating logic of the matrix of operational blocks 1. The n row and column address pairs of the requested coefficients are set by the values on inputs 24₁ - 24ₙ and 25₁ - 25ₙ of the storage blocks 15.

Clock input 20 is used when writing the initial coefficient values into flip-flops 16. Write row address input 21 (row i) and write column address input 22 (column j) select the ij-th storage block 15 within the matrix coefficient block. They do this by gating the clock signal on input 20 through AND element 17 when the required coefficient from input 23 is written into flip-flop 16 of the selected ij-th storage block 15. Write data input 23 receives the coefficients of the matrices being multiplied so that they can be written into flip-flops 16. The group of read row address inputs 24₁ - 24ₙ and the group of read column address inputs 25₁ - 25ₙ work together with flip-flop groups 28₁ - 28ₙ and 29₁ - 29ₙ, respectively. They ensure the timely passage of a group of n matrix coefficients from flip-flops 16 of the storage blocks 15 to outputs 43₁ - 43ₙ of the matrix coefficient block. The group of data inputs 26₁ - 26ₙ from the previous row, together with the group of data outputs 27₁ - 27ₙ for the next row, forms the required disjunction of the selected coefficient values along each column of storage blocks 15. This disjunction appears at outputs 27₁ - 27ₙ of the storage blocks 15 in the n-th row of the matrix coefficient block. Read row address input 34 and read column address input 37 supply the matrix coefficient block with the row and column addresses of the required elements, so that these elements are selected from flip-flops 16 and delivered to outputs 43₁ - 43ₙ. Write row address input 32 and write column address input 36 supply the matrix coefficient block with row i and column j, so that the coefficient applied to write data input 35 can then be written into the selected ij-th storage block 15. Write clock input 33 receives the strobe that writes the data from input 35 into the selected ij-th storage block 15. Flip-flop group 41₁₁ - 41ₙₙ and OR element group 42₁₁ - 42ₙₙ time the data and form, at outputs 43₁ - 43ₙ, the disjunctions of the values from outputs 27₁ - 27ₙ of the storage blocks 15. The result is the group of n requested matrix coefficients, selected according to their addresses on inputs 34 and 37 and the operating logic of the matrix of operational blocks 1. Outputs 43₁ - 43ₙ of the matrix coefficient block present this group of n coefficients for transfer to the matrix of operational blocks 1.
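The (3n − 2)-step figure follows from the usual skewing of a systolic array. The sketch below is a simplified model that rests on several assumptions:

- Row i of the array (0-based) receives a[i][k] at step i + k, and column j receives b[k][j] at step j + k.
- Each block latches its neighbours' coefficients and accumulates in a single step.
- The read pipeline of blocks 2 and 3 and the output stage are ignored.

Under these assumptions, aᵢₖ and bₖⱼ meet in block (i, j) at step i + j + k. The last pair (i = j = k = n − 1) therefore meets at step 3n − 3, which means 3n − 2 steps are needed. The function names are hypothetical.

```python
def bool_matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Reference Boolean product: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def systolic_bool_matmul(a, b, steps=None):
    """Skewed systolic-array model; returns (result, number_of_steps_run)."""
    n = len(a)
    if steps is None:
        steps = 3 * n - 2                  # enough for the last operand pair to meet
    ra = [[0] * n for _ in range(n)]       # A coefficient held in each block (moves right)
    rb = [[0] * n for _ in range(n)]       # B coefficient held in each block (moves down)
    c = [[0] * n for _ in range(n)]        # accumulated result bits
    for t in range(steps):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:                 # row i is fed a[i][t - i] (skew of i steps)
                    k = t - i
                    new_a[i][j] = a[i][k] if 0 <= k < n else 0
                else:
                    new_a[i][j] = ra[i][j - 1]
                if i == 0:                 # column j is fed b[t - j][j] (skew of j steps)
                    k = t - j
                    new_b[i][j] = b[k][j] if 0 <= k < n else 0
                else:
                    new_b[i][j] = rb[i - 1][j]
                c[i][j] |= new_a[i][j] & new_b[i][j]   # AND then OR-accumulate
        ra, rb = new_a, new_b
    return c, steps
```

Running one step fewer than 3n − 2 can leave the bottom-right coefficient of C incomplete, which shows that the bound is tight in this model.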

The device operates as follows. At the initial data loading stage, an external device feeds the values aᵢⱼ (i, j = 1, …, n) of the elements of the first matrix one after another to input 35 of the first matrix coefficient block 2. At the same time, the addresses of row i and column j, in unitary code, are applied to inputs 32 and 36 of block 2, respectively. Each element is written by applying a clock pulse to write clock input 33 of the first matrix coefficient block 2.

The initial values bᵢⱼ (i, j = 1, …, n) of the elements of the second matrix are loaded in the same way. An external device feeds them to input 35 of the second matrix coefficient block 3, the addresses i and j in unitary code are applied to inputs 32 and 36 of block 3, respectively, and a clock pulse is applied to write clock input 33 of block 3. Loading of the first and second matrices can be overlapped in time.

The coefficient value from input 35 of the matrix coefficient block is fed to inputs 23 of the storage blocks 15 and from there to the information input of flip-flop 16 in each storage block. The row and column address values from inputs 32 and 36 are fed to inputs 21 and 22 of the storage blocks 15, respectively. The i-th bit of input 32 goes to inputs 21 of the storage blocks 15 in the i-th row of the matrix coefficient block, and the j-th bit of input 36 goes to inputs 22 of the storage blocks 15 in the j-th column (i, j = 1, …, n). The clock pulse from write clock input 33 of the matrix coefficient block is fed to inputs 20 of the storage blocks 15 and from there to an input of AND element 17. In the selected storage block 15ᵢⱼ, the ones on inputs 21 and 22 open AND element 17. This lets the clock pulse reach the clock input of flip-flop 16, and the current coefficient is written. In every other storage block 15, at least one of inputs 21 and 22 carries a zero. AND element 17 therefore stays closed, the clock pulse does not pass, and nothing is written to flip-flop 16.
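The write path above amounts to a two-dimensional one-hot decode. The sketch below is hypothetical and rests on three conventions:

- `ff16` stands for the n × n flip-flops 16 of one matrix coefficient block.
- Unitary codes are bit lists in which index 0 is the least significant bit (row or column 1).
- `load_matrix` plays the role of the external device that writes the elements one at a time.

```python
def one_hot(index: int, n: int) -> list[int]:
    """Unitary code of a 1-based index as a bit list (index 0 = least significant bit)."""
    return [1 if k == index - 1 else 0 for k in range(n)]


def write_coefficient(ff16, data, row_code, col_code, clk=1):
    """Clock one coefficient into the grid of flip-flops 16 (modified in place).

    Storage block (i, j) sees row bit i on input 21, column bit j on input 22 and
    the write clock on input 20; AND element 17 passes the clock only if all are 1.
    """
    n = len(ff16)
    for i in range(n):
        for j in range(n):
            if row_code[i] & col_code[j] & clk:   # AND element 17 is open
                ff16[i][j] = data                 # flip-flop 16 latches write data input 23
    return ff16


def load_matrix(m):
    """Write a Boolean matrix element by element, as the external device would."""
    n = len(m)
    ff16 = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            write_coefficient(ff16, m[i - 1][j - 1], one_hot(i, n), one_hot(j, n))
    return ff16
```

Because exactly one row bit and one column bit are set, only one storage block sees all three inputs of AND element 17 at 1 during each write.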

At the initialization stage, a reset signal is applied to the reset inputs of the following elements, setting them to zero:

- two-stage register group 5;
- flip-flops 6, 7 and 8;
- flip-flop groups 28, 29 and 30;
- two-stage register groups 39₁ - 39ₙ₋₁ and 40₁ - 40ₙ₋₁;
- flip-flop group 41₁₁ - 41ₙₙ.

A one is applied to data input 23.

The value 00…01₂ is written into shift register 4. The read row address inputs 34 of the first matrix coefficient block 2 receive, respectively, the values 00…01₂, 00…010₂, …, 100…0₂. This subsequently ensures that the elements aₖ₁, aₖ₂, …, aₖₙ are output at output 43ₖ of the first matrix coefficient block 2 (k = 1, …, n).

The read column address inputs 37 of the second matrix coefficient block 3 receive, respectively, the values 00…01₂, 00…010₂, …, 100…0₂. This subsequently ensures that the elements b₁ₖ, b₂ₖ, …, bₙₖ are output at output 43ₖ of the second matrix coefficient block 3 (k = 1, …, n). Ones are applied to the control inputs of all operational blocks 1 of the n × n matrix. Zeros are applied to inputs 26₁ - 26ₙ of storage blocks 15₁₁ - 15₁ₙ.

At the operating stage, a first-stage write signal is applied to the clock inputs of two-stage register group 5. This records the values according to the following scheme: register 5₁ takes the value of register 4, and register 5ₘ takes the value of register 5ₘ₋₁ (m = 2, …, n). In the next clock cycle, a second-stage write signal is applied to registers 5, copying the information from the first stage into the second. At the same time, a signal is applied to the clock input of shift register 4 that shifts its contents toward the most significant bits. At the first step of the operating stage, register 4 therefore receives the value 00…010₂, and registers 5 receive the values 00…01₂, 00…0₂, …, 00…0₂, respectively.
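The address sequence produced by shift register 4 and the chain of registers 5 can be traced with the short simulation below. It is a sketch under two assumptions that the text does not spell out. First, unitary codes are held as n-bit integers (bit 0 = index 1). Second, the bit shifted out of register 4 after n steps is lost, so register 4 then holds zero. Each two-stage register is modelled as taking its predecessor's value from the previous step. The function name is hypothetical.

```python
def address_wavefront(n: int, steps: int) -> list[list[int]]:
    """Contents of two-stage registers 5_1..5_n after each operating step."""
    mask = (1 << n) - 1          # registers are assumed to be n bits wide
    reg4 = 1                     # shift register 4 initialised to 00...01
    regs5 = [0] * n              # registers 5_1..5_n after reset
    history = []
    for _ in range(steps):
        # register 5_1 takes register 4; register 5_m takes register 5_(m-1)
        regs5 = [reg4] + regs5[:-1]
        # register 4 shifts toward the most significant bit; overflow is discarded
        reg4 = (reg4 << 1) & mask
        history.append(regs5)
    return history
```

For n = 3, the first three steps give [1, 0, 0], [2, 1, 0] and [4, 2, 1]. Each register lags its predecessor by one step, which produces the staggered address feed that the systolic array needs.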

The values from the outputs of register group 5₁ - 5ₙ are fed, respectively, to the read column address inputs 37 of the first matrix coefficient block 2 and from there to inputs 25 of storage block 15₁₁. In parallel, the values from the outputs of register group 5₁ - 5ₙ are fed, respectively, to the read row address inputs 34 of the second matrix coefficient block 3 and from there to inputs 24 of storage block 15₁₁. The k-th bit of input 37 goes to the k-th bit of input 25, and the k-th bit of input 34 goes to the k-th bit of input 24. The same values from input 37 of matrix coefficient blocks 2 and 3 also go to the first stage of register 39₁, and the values from inputs 34 go to the first stage of register 40₁; they are latched there when the clock pulse arrives. On the same clock pulse, the values from inputs 24₁ - 24ₙ are latched in flip-flops 28₁ - 28ₙ, and those from inputs 25₁ - 25ₙ are latched in flip-flops 29₁ - 29ₙ.

The outputs of flip-flop groups 28 and 29 are connected to the second and third inputs of AND elements 18₁–18ₙ. As a result, the signal from the output of flip-flop 16 reaches the inputs of elements 19₁–19ₙ only when both the second and the third input of AND element 18ₖ (k = 1, …, n) carry logic 1. The value of flip-flop 16 is therefore read only for the selected row–column pair.

The values read in this way pass through OR elements 19₁–19ₙ and are then applied to the data inputs of flip-flops 30₁–30ₙ, where they are latched in the first stage on arrival of the clock signal.
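The gating performed in one storage block 15 can be summarized as a bitwise expression. The function below is an illustrative sketch, and its function and argument names are hypothetical. Lane k passes the bit stored in flip-flop 16 only when both the row-select bit (flip-flop 28ₖ) and the column-select bit (flip-flop 29ₖ) are 1. It then ORs that result with the value arriving from the previous row of storage blocks, which the claims connect to the second input of OR element 19ₖ.

```python
def storage_block_read(stored: int, row_sel: list[int], col_sel: list[int],
                       prev_row: list[int]) -> list[int]:
    """One read step of a storage block 15 (illustrative model, 0/1 signals).

    stored   -- bit held in flip-flop 16
    row_sel  -- bits latched in flip-flops 28_1..28_n
    col_sel  -- bits latched in flip-flops 29_1..29_n
    prev_row -- lane values arriving from the storage block in the previous row
    Returns the bits presented to flip-flops 30_1..30_n."""
    return [(stored & r & c) | p                   # AND 18_k, then OR 19_k
            for r, c, p in zip(row_sel, col_sel, prev_row)]


# Lane 1 (0-based) is selected by both address words; lane 2 already carries a 1 from the previous row.
print(storage_block_read(1, [0, 1, 0], [1, 1, 0], [0, 0, 1]))   # [0, 1, 1]
```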

In the first cycle of operation, the required value of matrix coefficient a₁₁ is formed at output 27 of storage block 15₁₁. Then, on arrival of the clock signal, the information in the first stages of registers 39₁–39ₙ₋₁ and 40₁–40ₙ₋₁ and of flip-flop groups 28₁–28ₙ, 29₁–29ₙ and 30₁–30ₙ is written to their respective second stages.

Thus, in the first step of the operating stage, the inputs of the first matrix coefficient block 2 carry the values , , …, (1, 2, …, n in decimal form, given the unary coding used). The inputs of the second matrix coefficient block 3 carry the values , , …, (1, 0, …, 0 in decimal form).

On the next cycle, the values in the second stages of registers 39ₖ₋₁ are written to the first stages of registers 39ₖ, and the values in the second stages of registers 40ₖ₋₁ are written to the first stages of registers 40ₖ, for the applicable values of k. In parallel, the values in the second stages of flip-flops 28₁–28ₙ of storage blocks 15_{i,j−1} are written to the first stages of flip-flops 28₁–28ₙ of storage blocks 15ᵢⱼ. Likewise, the values in the second stages of flip-flops 29₁–29ₙ of storage blocks 15_{i−1,j} are written to the first stages of flip-flops 29₁–29ₙ of storage blocks 15ᵢⱼ.

On the second cycle, the required information is read at outputs 38 of storage blocks 15₂₁ and 15₁₂. On the third cycle, the required information is read in the same way at outputs 38 of storage blocks 15₃₁, 15₂₂ and 15₁₃, and so on. On the n-th cycle, the required value of matrix coefficient aᵢⱼ is formed at output 27 of storage block 15ₙ₁ and is then latched in the first stages of flip-flops 41₁₁–41ₙ₁. Next, the values are transferred from the first to the second stages of flip-flops 41₁₁–41ₙₙ. On cycle (n + 1), the value of matrix coefficient aᵢⱼ is formed in the same way at output 27 of storage block 15ₙ₂. This pair of values is applied to OR elements 42₁₁–42ₙ₁, whose outputs feed the first stages of flip-flops 41₁₂–41ₙ₂, where the result is latched on arrival of the clock signal. Over the next (n − 2) clock cycles, the data pass through OR element groups 42₁₃–42ₙ₃, …, 42₁ₙ–42ₙₙ and flip-flop groups 41₁₃–41ₙ₃, …, 41₁ₙ–41ₙₙ and arrive at outputs 43₁–43ₙ of the matrix coefficient block.
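The order in which the storage blocks are read, and the latency of the OR-element/flip-flop chain, can be reproduced with the small model below. The sketch assumes the sequence described above: block 15ᵢⱼ produces its value on cycle i + j − 1 (15₁₁ on cycle 1; 15₂₁ and 15₁₂ on cycle 2; and so on), and each column of flip-flops 41 adds one cycle. The function names are hypothetical.

```python
def active_blocks(n: int, cycle: int) -> list[tuple[int, int]]:
    """Storage blocks 15_ij (1-based) whose read value is formed on the given cycle,
    assuming block (i, j) is reached on cycle i + j - 1."""
    return [(i, cycle + 1 - i) for i in range(n, 0, -1) if 1 <= cycle + 1 - i <= n]


def or_chain(bottom_row: list[list[int]]) -> tuple[list[int], int]:
    """Fold the lane vectors read from blocks 15_n1..15_nn through OR elements 42
    and flip-flops 41; return the value at outputs 43 and the cycle it is latched on."""
    n = len(bottom_row)
    acc = [0] * len(bottom_row[0])
    cycle = 0
    for p, lanes in enumerate(bottom_row, start=1):
        acc = [x | y for x, y in zip(acc, lanes)]   # OR 42 (column 1 is latched directly)
        cycle = n + p - 1                           # block 15_np is read on cycle n + p - 1
    return acc, cycle


print(active_blocks(3, 3))                            # [(3, 1), (2, 2), (1, 3)]
print(or_chain([[1, 0, 0], [0, 0, 0], [0, 1, 0]]))    # ([1, 1, 0], 5)
```

In this model, the first result reaches outputs 43 on cycle 2n − 1. From then on, a new result appears every cycle.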

From then on, in each new clock cycle, outputs 43₁–43ₙ carry the required matrix coefficient values for the addresses that were applied earlier to inputs 34 and 37 of the matrix coefficient block.

The values formed at outputs 43 of the first matrix coefficient block 2 are applied to the inputs of the corresponding flip-flops 7 in the first column of the matrix of operational blocks 1. The values from outputs 43 of the second matrix coefficient block 3 are applied to the inputs of the corresponding flip-flops 6 in the first row of that matrix. In both cases, the values are latched on arrival of the corresponding clock signal. At the same time, the values in the first stages of flip-flops 8 of operational blocks 1 are written to the second stages.

Each operational block 1 implements the function $c^{(t)} = c^{(t-1)} \lor \left(a^{(t)} \land b^{(t)}\right)$, where $a^{(t)}$ and $b^{(t)}$ are the operand bits held in flip-flops 7 and 6, $c$ is the value accumulated in flip-flop 8, and t is the device's operating step number. The coefficients of the resulting matrix C are computed using the following recurrence formulas:

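$$c_{ij}^{(0)} = 0,$$

$$c_{ij}^{(t)} = c_{ij}^{(t-1)} \lor \left(a_{it} \land b_{tj}\right), \quad t = 1, \dots, n,$$

$$c_{ij} = c_{ij}^{(n)}.$$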
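These recurrence formulas accumulate the Boolean product, in which each result bit is the OR over t of (a_it AND b_tj). A direct, non-systolic reference implementation is useful for checking any cycle-level model of the device. The sketch below (function name hypothetical) evaluates the recurrence literally. Python's integer operators `&` and `|` act as AND and OR on 0/1 values.

```python
def bool_matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Boolean product C = A * B of two n x n 0/1 matrices via the recurrence
    c_ij^(0) = 0,  c_ij^(t) = c_ij^(t-1) OR (a_it AND b_tj),  t = 1..n."""
    n = len(a)
    c = [[0] * n for _ in range(n)]               # c_ij^(0) = 0
    for t in range(n):                            # one OR-accumulate step per t
        for i in range(n):
            for j in range(n):
                c[i][j] |= a[i][t] & b[t][j]
    return c


print(bool_matmul([[1, 0], [1, 1]], [[0, 1], [1, 0]]))   # [[0, 1], [1, 1]]
```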

The values $b$ and $a$ from the outputs of flip-flops 6 and 7, respectively, are applied to the inputs of AND element 9. Its output forms their conjunction $a \land b$, which goes to the second input of OR element 10. The first input of OR element 10 receives the value $c^{(t-1)}$ from flip-flop 8, and the resulting value $c^{(t-1)} \lor (a \land b)$ is applied to the second input of AND element 11. AND element 12 blocks the signal from the third input of the operational block, because its first input carries a logic 0 from the output of inverter 14. The value at the second input of AND element 11 passes to its output, because the first input of AND element 11 carries a logic 1 control signal from the control input of the operational block. The output of AND element 11 passes through OR element 13 to the first-stage input of flip-flop 8, where it is latched on arrival of the corresponding clock signal. In this way, the required values $c^{(t)}$ are formed in flip-flops 8 of operational blocks 1.
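The behaviour of a single operational block 1 in both modes can be written as one Boolean update of flip-flop 8. The sketch below is illustrative, and its function and argument names are hypothetical. It follows the gate connections described in the text and in the claims. With the control signal at 1, the block accumulates through elements 9, 10, 11 and 13. With the control signal at 0, inverter 14 opens AND element 12, so flip-flop 8 takes the value on the third input.

```python
def operational_block_next(a: int, b: int, c: int, c_in: int, control: int) -> int:
    """Next first-stage value of flip-flop 8 in an operational block 1 (0/1 signals).

    a, b    -- operand bits from flip-flops 7 and 6
    c       -- current value of flip-flop 8
    c_in    -- value on the third input (from the neighbouring block)
    control -- 1 at the operating stage, 0 at the result-output stage"""
    and9 = a & b                  # AND element 9
    or10 = c | and9               # OR element 10
    and11 = control & or10        # AND element 11, enabled by the control input
    and12 = (1 - control) & c_in  # AND element 12, enabled through inverter 14
    return and11 | and12          # OR element 13 -> first stage of flip-flop 8
```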

Next, a first-stage write signal is again applied to the clock inputs of the group of two-stage registers 5, and the operation repeats as described above. Thus, in the second step of the operating stage, register 4 receives the value , and registers 5 receive the values , , , …, , respectively. The values formed at outputs 43 of the first matrix coefficient block 2 go to the first column of operational blocks 1 and are latched in the corresponding flip-flops 7. The values formed at outputs 43 of the second matrix coefficient block 3 go to the first row of operational blocks 1 and are latched in the corresponding flip-flops 6.

At the result-output stage, AND element 11 blocks the signal from the output of flip-flop 8, because its first input carries a logic 0 from the control input of the operational block. The value at the second input of AND element 12 passes to its output, because the first input of AND element 12 carries a logic 1 from the output of inverter 14. The output of AND element 12 passes through OR element 13 to the first-stage input of flip-flop 8, where it is latched on arrival of the corresponding clock signal. Each flip-flop 8 thus takes the value arriving at the third input of its operational block. Clock pulses are then issued that write alternately to the first stage and then to the second stage of flip-flops 8. Over n steps, this outputs the elements of the resulting matrix C row by row from the bottom row of the matrix of operational blocks 1 in a parallel-pipelined mode.
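Putting the pieces together, the whole array of operational blocks can be simulated step by step. The sketch below is an illustration under assumptions rather than the device's exact schedule. It feeds row i of A into the first column and column j of B into the first row, with a conventional one-step skew per row and column, so that a_ik and b_kj meet in block (i, j). Every block accumulates with the control signal at 1. Then, with the control signal at 0, the accumulated values are shifted down one row per step and read from the bottom row, as described above. The function name and the skewed feeding order are hypothetical. The loop runs 3n − 2 accumulation steps, which this particular skew needs for the last pair to reach block (n, n).

```python
def systolic_bool_matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Step-level sketch of an n x n array of operational blocks (assumed input skew).

    Accumulation (control = 1): a-bits move right through flip-flops 7, b-bits move down
    through flip-flops 6, and each block updates c = c OR (a AND b) in flip-flop 8.
    Output (control = 0): c-values shift down one row per step and leave from the bottom row."""
    n = len(a)
    ra = [[0] * n for _ in range(n)]   # flip-flops 7
    rb = [[0] * n for _ in range(n)]   # flip-flops 6
    c = [[0] * n for _ in range(n)]    # flip-flops 8 (cleared by reset)
    for t in range(1, 3 * n - 1):      # a_ik and b_kj meet in block (i, j) on step k + i + j (0-based i, j)
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:             # first column: fed by the first matrix coefficient block
                    k = t - i
                    new_a[i][j] = a[i][k - 1] if 1 <= k <= n else 0
                else:                  # otherwise take the left neighbour's flip-flop 7
                    new_a[i][j] = ra[i][j - 1]
                if i == 0:             # first row: fed by the second matrix coefficient block
                    k = t - j
                    new_b[i][j] = b[k - 1][j] if 1 <= k <= n else 0
                else:                  # otherwise take the upper neighbour's flip-flop 6
                    new_b[i][j] = rb[i - 1][j]
        ra, rb = new_a, new_b
        for i in range(n):
            for j in range(n):
                c[i][j] |= ra[i][j] & rb[i][j]
    rows_out = []
    for _ in range(n):                 # control = 0: n output steps
        rows_out.append(list(c[n - 1]))                 # device outputs = bottom-row flip-flops 8
        c = [[0] * n] + [list(row) for row in c[:-1]]   # each block takes c from the block above
    return rows_out[::-1]              # rows leave bottom-up, so reverse to get C in order
```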

Let us evaluate the advantages of the proposed device in terms of the time required to multiply binary matrices.

The operating time of the device is

$$t_{\text{total}} = t_{\text{init}} + t_{\text{proc}},$$

where $t_{\text{init}}$ is the device initialization time and $t_{\text{proc}}$ is the data processing time.

When the device is initialized, the group of two-stage registers 5, flip-flops 6, 7 and 8, flip-flop groups 28, 29 and 30, groups of two-stage registers 39 and 40, and flip-flop group 41 are reset. This takes $t_{\text{init}} = 2t_0$.

The operating stage of the device consists of (2n − 1) iterations. In each iteration, the operational part works as a group of linear synchronous pipelines driven by a common clock signal. The pipeline clock period $t_k$ is the larger of two times: the time to read data from the specialized memory and the operating time of operational block 1:

$$t_k = \max\left(t_{k,\text{read}},\ t_{k,\text{op}}\right).$$

At the start of operation, the row address from one of the read-row-address inputs 34 of the matrix coefficient block is written into the first stage of two-stage register 40₁ and, from input 24, into the first stage of flip-flops 28₁–28ₙ of storage block 15₁₁, which takes 2t₀. In parallel, the column address from one of the read-column-address inputs 37 of the matrix coefficient block is written into the first stage of two-stage register 39₁ and, from input 25, into the first stage of flip-flops 29₁–29ₙ of storage block 15₁₁. The data are then transferred from the first to the second stages of flip-flops 30₁–30ₙ, which takes 2t₀.

Next, information is written from the second stages of two-stage registers 40ᵢ₋₁ to the first stages of registers 40ᵢ, and from the second stages of two-stage registers 39ᵢ₋₁ to the first stages of registers 39ᵢ (for the applicable values of i), which takes 2t₀. Information is also written from the second stages of flip-flops 28₁–28ₙ of storage blocks 15ᵢⱼ to the first stages of flip-flops 28₁–28ₙ of storage blocks 15_{i,j+1}, and from the second stages of flip-flops 29₁–29ₙ of storage blocks 15ᵢⱼ to the first stages of flip-flops 29₁–29ₙ of storage blocks 15_{i+1,j} (over the applicable ranges of i and j). This also takes 2t₀.

Next, the signals from the second-stage outputs of flip-flop groups 28₁–28ₙ and 29₁–29ₙ open the three-input AND elements 18₁–18ₙ (2t₀). This lets the signal from the output of flip-flop 16 pass to the inputs of OR elements 19 (t₀), after which it is written into the first stage of flip-flop group 30 (2t₀).

In parallel with the writing of information into the second stages of two-stage registers 39 and 40 and flip-flops 28 and 29, the following happens: the signal passes through OR elements 42 (t₀), is written into the first stage of flip-flops 41 (2t₀), and is then transferred from the first to the second stage of flip-flops 41 (2t₀).

The time to read one portion of data from memory in pipelined mode is

$$t_{k,\text{read}} = \max(2t_0, 2t_0, 2t_0, 2t_0, 2t_0) + \max\bigl(\max(2t_0, 2t_0, 2t_0, 2t_0) + 2t_0 + t_0 + 2t_0,\; t_0 + 2t_0 + 2t_0\bigr) = 2t_0 + \max(7t_0, 5t_0) = 9t_0.$$
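The read-time expression is a critical-path calculation: delays add along a path, and parallel branches contribute their maximum. The sketch below recomputes it in units of t₀ from the per-element delays quoted in the description of the read path. The function name is hypothetical. The delay of one OR element 42 is taken as t₀ (a two-input OR, like OR elements 19); this value is an assumption, chosen because it is consistent with the 5t₀ branch.

```python
def read_path_delay(t0: float = 1.0) -> float:
    """Pipelined read time t_k,read of the proposed memory (critical-path sum)."""
    stage = 2 * t0      # one write into a flip-flop or register stage
    and3 = 2 * t0       # three-input AND element 18
    or2 = t0            # two-input OR element 19 or 42 (assumed t0 for 42)
    # First-stage loads of 40_1, 28, 39_1, 29 and the transfer in flip-flops 30 run in parallel.
    load = max(stage, stage, stage, stage, stage)
    # Shifts in 39/40/28/29, then AND 18, OR 19 and the first stage of flip-flops 30.
    read_branch = max(stage, stage, stage, stage) + and3 + or2 + stage
    # OR 42, then both stages of flip-flops 41.
    merge_branch = or2 + stage + stage
    return load + max(read_branch, merge_branch)


print(read_path_delay())   # 9.0, i.e. 2t0 + max(7t0, 5t0)
```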

During operation of an operational block of the device for processing binary matrices based on systolic structures, signal $a$ passes through two-stage flip-flop 7 in 4t₀. In parallel, signal $b$ passes through two-stage flip-flop 6 in the same time. Signals $b$ and $a$ are first written into the first stages of flip-flops 6 and 7, respectively, which takes 2t₀. They then move to the second stages of these flip-flops, which takes another 2t₀.

The longest signal path through the logic of operational block 1 runs through flip-flops 6, 7 and 8, AND elements 9 and 11, and OR elements 10 and 13. It requires

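$$t_{k,\text{op}} = \max\left(t_{T6},\ t_{T7}\right) + t_{\text{AND}9} + t_{\text{OR}10} + t_{\text{AND}11} + t_{\text{OR}13} + t_{T8} = 10t_0,$$

where $t_X$ denotes the delay contributed by element X (T denotes a flip-flop);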

the write into the second stage of flip-flop 8 is performed in parallel with the writes into the first stages of flip-flops 6 and 7.

The pipeline clock period is therefore

$$t_k = \max(9t_0, 10t_0) = 10t_0.$$
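The same critical-path reasoning gives the operational-block time and the pipeline clock. The text states only the total of 10t₀ for the operational block. The per-element split used below is an assumption chosen to be consistent with that total: 2t₀ per flip-flop stage, t₀ per two-input AND/OR element, both stages of flip-flops 6 and 7 on the path, and only the first stage of flip-flop 8. The function names are hypothetical.

```python
def op_path_delay(t0: float = 1.0) -> float:
    """Operating time t_k,op of operational block 1 under an assumed per-element split."""
    stage = 2 * t0               # assumed: one flip-flop stage
    gate = t0                    # assumed: one two-input AND or OR element
    ff6 = ff7 = 2 * stage        # first and second stages of flip-flops 6 and 7
    # AND 9, OR 10, AND 11, OR 13, then the first stage of flip-flop 8
    return max(ff6, ff7) + gate + gate + gate + gate + stage


def pipeline_clock(t0: float = 1.0, read_time: float = 9.0) -> float:
    """t_k = max(t_k,read, t_k,op); read_time is given in units of t0 (9 t0 per the text)."""
    return max(read_time * t0, op_path_delay(t0))


print(op_path_delay(), pipeline_clock())   # 10.0 10.0
```

Because the memory read (9t₀) is now shorter than the operational block (10t₀), the clock is set by the operational block rather than by the memory.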

The data processing time is $t_{\text{proc}} = t_k \times (2n - 1)$, because with the chosen matrix multiplication algorithm the coefficients are supplied over (2n − 1) iterations. The total operating time of the device is therefore

$$t_{\text{total}} = 2t_0 + 10t_0 \times (2n - 1) = 2t_0 + 20t_0 n - 10t_0 = (20n - 8)\,t_0.$$
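The closed form can be checked numerically. The sketch below (function name hypothetical) builds the total time from $t_{\text{init}} = 2t_0$ and $t_k = 10t_0$ over (2n − 1) iterations and compares it with $(20n - 8)t_0$. With t₀ = 1 ns, as used for Fig. 5, the result is in nanoseconds.

```python
def device_total_time(n: int, t0: float = 1.0) -> float:
    """t_total = t_init + t_k * (2n - 1) for the proposed device."""
    t_init = 2 * t0            # reset of registers and flip-flops
    t_k = 10 * t0              # pipeline clock period
    return t_init + t_k * (2 * n - 1)


for n in (1, 16, 2048):
    assert device_total_time(n) == 20 * n - 8   # closed form with t0 = 1
    print(n, device_total_time(n), "ns")
```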

Let us now estimate the time the prototype needs to multiply binary matrices.

The operating time of the prototype is

$$t_{\text{total}}^{\text{proto}} = t_{\text{init}}^{\text{proto}} + t_{\text{proc}}^{\text{proto}},$$

where $t_{\text{init}}^{\text{proto}}$ is the initialization time and $t_{\text{proc}}^{\text{proto}}$ is the data processing time.

The initialization time and the operating time of the operational blocks are the same for the proposed device and the prototype:

$$t_{\text{init}}^{\text{proto}} = t_{\text{init}};$$

$$t_{k,\text{op}}^{\text{proto}} = t_{k,\text{op}}.$$

The operating stage of the prototype consists of (2n − 1) iterations. In each iteration, the operational part works as a group of linear synchronous pipelines driven by a common clock signal. The pipeline clock period is the larger of two times: the time to read data from the specialized memory and the operating time of operational block 1:

$$t_k^{\text{proto}} = \max\left(t_{k,\text{read}}^{\text{proto}},\ t_{k,\text{op}}^{\text{proto}}\right).$$

When data are read, the signal from the output of flip-flop 16 first passes through one of the three-input AND elements 18, which takes 2t₀ (all AND elements 18 in the storage blocks operate in parallel). It then passes through one of the two-input OR elements 19, which takes t₀, and through the n-input OR element 34. Built as a pyramid of two-input elements, the latter takes $\lceil \log_2 n \rceil\, t_0$, where $\lceil \cdot \rceil$ denotes rounding up. Thus, the total data read time is

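$$t_{k,\text{read}}^{\text{proto}} = 2t_0 + t_0 n + \lceil \log_2 n \rceil\, t_0,$$

$$t_k^{\text{proto}} = \max\left(10t_0,\ 2t_0 + t_0 n + \lceil \log_2 n \rceil\, t_0\right).$$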


For the practically important cases (n > 4),

$$t_k^{\text{proto}} = 2t_0 + t_0 n + \lceil \log_2 n \rceil\, t_0,$$

that is, the prototype's operating time is limited by the rate at which input data arrive from the specialized multiport memory. The data processing time is $t_{\text{proc}}^{\text{proto}} = t_k^{\text{proto}} \times (2n - 1)$, because with the chosen matrix multiplication algorithm the coefficients are supplied over (2n − 1) iterations. Hence

$$
\begin{aligned}
t_{\text{total}}^{\text{proto}} &= 2t_0 + \left(2t_0 + t_0 n + \lceil \log_2 n \rceil\, t_0\right)(2n - 1) \\
&= 2t_0 + 4t_0 n + 2t_0 n^2 + 2n \lceil \log_2 n \rceil\, t_0 - 2t_0 - t_0 n - \lceil \log_2 n \rceil\, t_0 \\
&= 2t_0 n^2 + 3t_0 n + 2n \lceil \log_2 n \rceil\, t_0 - \lceil \log_2 n \rceil\, t_0 \\
&= \left(2n^2 + 3n + 2n \lceil \log_2 n \rceil - \lceil \log_2 n \rceil\right) t_0.
\end{aligned}
$$
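The prototype formulas can be evaluated in the same way. In the sketch below (function names hypothetical), the ceiling $\lceil \log_2 n \rceil$ is computed exactly with integer arithmetic as `(n - 1).bit_length()`. The clock period uses the general form $\max(10t_0, t_{k,\text{read}}^{\text{proto}})$. The loop checks the closed form above for n > 4, where the read time dominates.

```python
def ceil_log2(n: int) -> int:
    """Exact ceil(log2 n) for n >= 1 (depth of a pyramid of two-input OR elements)."""
    return (n - 1).bit_length()


def proto_read_time(n: int, t0: float = 1.0) -> float:
    """t_k,read^proto = 2 t0 + t0 n + ceil(log2 n) t0."""
    return 2 * t0 + t0 * n + ceil_log2(n) * t0


def proto_total_time(n: int, t0: float = 1.0) -> float:
    """t_total^proto = t_init + max(10 t0, t_k,read^proto) * (2n - 1)."""
    t_k = max(10 * t0, proto_read_time(n, t0))
    return 2 * t0 + t_k * (2 * n - 1)


for n in range(5, 4097):
    L = ceil_log2(n)
    assert proto_total_time(n) == 2 * n * n + 3 * n + 2 * n * L - L
```

For n ≤ 4, the read time is below 10t₀, so the prototype runs at the same clock as the operational blocks.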

The gain in processing time is defined as

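$$\eta = \frac{t_{\text{total}}^{\text{proto}}}{t_{\text{total}}} = \frac{2n^2 + 3n + 2n\lceil \log_2 n \rceil - \lceil \log_2 n \rceil}{20n - 8} \quad (n > 4).$$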
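With both totals available, η can be tabulated for any n. The sketch below (function name hypothetical) uses the general prototype clock $\max(10t_0, t_{k,\text{read}}^{\text{proto}})$. For small n, where both devices run at the same 10t₀ clock, it therefore returns exactly 1. Note that t₀ cancels in the ratio. The printed values come from the formulas in this section and are not a reproduction of Fig. 5.

```python
def gain(n: int) -> float:
    """eta = t_total^proto / t_total (t0 cancels)."""
    L = (n - 1).bit_length()                      # ceil(log2 n)
    t_k_proto = max(10, 2 + n + L)                # prototype clock period in units of t0
    t_proto = 2 + t_k_proto * (2 * n - 1)         # prototype total time in units of t0
    t_device = 20 * n - 8                         # proposed device total time in units of t0
    return t_proto / t_device


for n in (4, 8, 64, 512, 2048):
    print(f"n = {n:5d}   eta = {gain(n):7.2f}")
```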

The time costs of the proposed device and the prototype, calculated for various n with t₀ = 1 ns, are shown in Fig. 5.

The data show that the proposed device reduces processing time by up to 206.3 times when multiplying square binary matrices of size n ≤ 2048.

Sources of information

1. Russian Federation Utility Model Patent No. 193927. Device for multiplying binary matrices. Filed June 26, 2019; published November 21, 2019, Bulletin No. 33.

2. Russian Federation Utility Model Patent No. 157948. Device for multiplying matrices. Class G06F 17/16. Filed July 8, 2015; published December 20, 2015, Bulletin No. 35.

Claims

A device for multiplying binary matrices, comprising a matrix of n×n operational units, where n is the size of the square matrices being multiplied, first and second matrix coefficient units, a shift register, a group of n two-stage registers, each operational unit comprising a first, second, third trigger, a first, second, third AND elements, a first, second OR elements, an inverter, each of the matrix coefficient units contains n×n storage units, each of the storage units contains an AND element, a trigger, a group of n AND elements, a group of n OR elements, and the first input of the (i, j)th operational unit, where connected to the first output of the (i, j-1)-th operational unit, the second input -th operating block, where connected to the second output -th operational unit, the k-th output of the device output group is connected to the third output of the (n, k)-th operational unit, the synchronization inputs of all operational units are connected to the synchronization input of the matrix of operational units of the device, the third input -th operating block, where connected to the third output -th operating unit, the first input of the (i, 1)-th operating unit, where connected to the i-th output as part of the group of outputs of the first block of matrix coefficients, the second input of the (1, k)-th operational block, where connected to the k-th output as part of the output group of the second block of matrix coefficients, the control inputs of the operational blocks are connected to the control input of the device, and the reset inputs of the operational blocks are connected to the reset input of the device, which is also connected to the reset inputs of each of the n two-stage registers, the information input of the m-th two-stage register, where connected to the output of the (m-1)-th two-stage register, the information input of the first two-stage register is connected to the output of the shift register, the clock input of which is connected to 
the clock input of the device, which is also connected to the clock inputs of each of the n two-stage registers, the output of the k-th, where the two-stage register is connected to the k-th input as part of the group of read row addresses of the second block of matrix coefficients and the k-th input as part of the group of read column addresses of the first block of matrix coefficients, the sync inputs of the first and second blocks of matrix coefficients are connected to the sync input of the device write, as part of each of the blocks of matrix coefficients, the sync input of the block of matrix coefficients is connected to the sync inputs of each of the n×n storage blocks, the k-th input, where as part of the group of read line addresses of the matrix coefficient block, it is connected to the input of the read line addresses of the (1, 1)-th storage block, the k-th input, where as part of the group of read column addresses of the matrix coefficient block, it is connected to the input of the read column addresses of the (1, 1)-th storage block, the k-th input, where as part of a group of data inputs from the previous row of the (m, p)-th storage block, where connected to the k-th data output for the next row of the (m-1, p)-th storage block, the k-th input, where as part of the group of addresses of recording rows of the matrix coefficient block, it is connected to the input of the addresses of recording rows of the (i, k)-th storage block, where k-th input, where as part of the group of column addresses of the recording block of matrix coefficients, it is connected to the input of the column addresses of the recording block of the (k, i)-th storage block, where the write data input of the matrix coefficient block is connected to the write data input of each of the n×n storage blocks, within each of the storage blocks the first input of the AND element is connected to the input of the column address of the write storage block, the second input is connected to 
the input of the row address of the write storage block, and the third input is connected to the synchronization input of the storage block, the output of the AND element is connected to the synchronization input of the trigger, the information input of the trigger is connected to the write data input of the storage block, the output of the trigger is connected to the first inputs of each of the n elements of the group of AND elements, the output of the k-th element, where group of elements AND is connected to the first input of the k-th, where element of the OR group of elements, the second input of the k-th, where element of the group of elements OR is connected to the k-th, where the data input from the previous row of the storage block, in each of the operational blocks the input and output of the second trigger are connected respectively to the first input and output of the operational block, the input and output of the first trigger are connected respectively to the second input and output of the operational block, the third output of the operational block is connected to the output of the third trigger, the output of the first trigger is connected to the first input of the first AND element, the output of the second trigger is connected to the second input of the first AND element, the second input of the first OR element is connected to the output of the first AND element, the first and second inputs of which are connected respectively to the first and second outputs of the operational block, the clock input of which is connected to the clock inputs of the first, second and third triggers, the output of the first OR element is connected to the second input of the second AND element, the second input of the third AND element is connected to the third input of the operational block, and the output is connected to the second input of the second OR element, the output of which is connected to the input of the third trigger, the control input of the operational 
block is connected to the first input of the second AND element and to the input of the inverter, the output of which is connected to the first input of the third AND element, and the reset input of the operational block is connected to the reset inputs of the first, second and third triggers, the output of the second AND element is connected to the first input of the second OR element, the output of the third trigger is connected to the first input of the first OR element, additionally introduced into each storage block are a first group of n triggers, a second group of n triggers and a third group of n triggers, and into the matrix coefficient block a first group of (n-1) two-stage registers, a second group of (n-1) two-stage registers, a group of n×n triggers and a group of n×(n-1) OR elements, wherein the information input of the k-th trigger of the first group is connected to the k-th input of the group of read row address inputs of the storage block, and the output of the k-th trigger of the first group is connected to the second input of the k-th element of the group of AND elements and to the k-th output of the group of read row address outputs of the storage block, the information input of the k-th trigger of the second group is connected to the k-th input of the group of read column address inputs of the storage block, and the output of the k-th trigger of the second group is connected to the third input of the k-th element of the group of AND elements and to the k-th output of the group of read column address outputs of the storage block, the information input of the k-th trigger of the third group is connected to the output of the k-th element of the group of OR elements, and the output of the k-th trigger of the third group is connected to the k-th output of the group of data outputs of the current storage
block, the reset input of the storage block is connected to the reset inputs of the first, second and third groups of triggers, the input of the first element of the first group of two-stage registers is connected to the read column address input of the matrix coefficient block, the output of the k-th element of the first group of two-stage registers is connected to the k-th input of the group of read column address inputs of the (1, k+1)-th storage block and to the input of the (k+1)-th element of the first group of two-stage registers, the input of the first element of the second group of two-stage registers is connected to the read row address input of the matrix coefficient block, the output of the k-th element of the second group of two-stage registers is connected to the k-th input of the group of read row address inputs of the (k+1, 1)-th storage block and to the input of the (k+1)-th element of the second group of two-stage registers, the k-th output of the group of read row address outputs of the (m, p)-th storage block is connected to the k-th input of the group of read row address inputs of the (m, p+1)-th storage block, the k-th output of the group of read column address outputs of the (m, p)-th storage block is connected to the k-th input of the group of read column address inputs of the (m+1, p)-th storage block, the reset input of the matrix coefficient block of the device is connected to the reset inputs of each of the n×n storage blocks, of the first and second groups of (n-1) two-stage registers and of the group of n×n triggers, the k-th bit of the group of data outputs of the (n, 1)-th storage block is connected to the information input of the k-th element of the group of triggers, and the k-th bit of the group of data outputs of the (n, p)-th storage block is connected to the
first input of the (k, p-1)-th element of the group of OR elements, the output of the (k, p)-th element of the group of OR elements is connected to the information input of the (k, p+1)-th element of the group of triggers, the output of the (k, p)-th element of the group of triggers is connected to the second input of the (k, p+1)-th element of the group of OR elements, and the output of the (k, n)-th element of the group of triggers is connected to the k-th output of the group of outputs of the matrix coefficient block.
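The gate-level connections of the operational block listed in the claim reduce to a simple clocked update: the first and second triggers latch the block's second and first inputs, and the third trigger loads either T3 OR (T1 AND T2) when the control input is 1, or the third input when it is 0 (via the inverter and the third AND element). The Python sketch below models only that behaviour. The names `in1`, `in2`, `in3`, `control`, `boolean_dot` and `boolean_matmul` are hypothetical, the control input is assumed to carry 0/1 values, and the sketch does not reproduce the systolic wiring between neighbouring blocks or the source of the third input, which this part of the claim does not describe. It is an illustration under these assumptions, not the patented device.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OperationalBlock:
    """Behavioural sketch of one operational block; one call to clock() = one edge."""
    t1: int = 0  # first trigger: loaded from the second input, drives the second output
    t2: int = 0  # second trigger: loaded from the first input, drives the first output
    t3: int = 0  # third trigger: accumulator, drives the third output

    def reset(self) -> None:
        # The reset input clears all three triggers.
        self.t1 = self.t2 = self.t3 = 0

    def clock(self, in1: int, in2: int, in3: int, control: int) -> Tuple[int, int, int]:
        and1 = self.t1 & self.t2        # first AND element
        or1 = self.t3 | and1            # first OR element (accumulate)
        and2 = control & or1            # second AND element
        and3 = (1 - control) & in3      # inverter + third AND element (assumes control is 0/1)
        or2 = and2 | and3               # second OR element feeds the third trigger
        # Shared clock: all three triggers load on the same edge.
        self.t2, self.t1, self.t3 = in1, in2, or2
        return self.t2, self.t1, self.t3  # (first, second, third) outputs


def boolean_dot(a: List[int], b: List[int]) -> int:
    """Hypothetical helper: OR over k of (a[k] AND b[k]) in one block with control = 1."""
    blk = OperationalBlock()
    blk.reset()
    # The first AND element sees the values latched on the previous edge,
    # so one extra clock with zero inputs flushes the last pair into T3.
    for ai, bi in list(zip(a, b)) + [(0, 0)]:
        blk.clock(in1=ai, in2=bi, in3=0, control=1)
    return blk.t3


def boolean_matmul(A: List[List[int]], B: List[List[int]]) -> List[List[int]]:
    """Hypothetical helper: C[i][j] = OR_k (A[i][k] AND B[k][j]), one block per element."""
    n = len(A)
    return [[boolean_dot(A[i], [B[k][j] for k in range(n)]) for j in range(n)]
            for i in range(n)]


print(boolean_matmul([[1, 1], [0, 1]], [[0, 1], [1, 0]]))  # [[1, 1], [1, 0]]
```

Because the product term is formed from already-latched values, each block needs one clock more than the number of operand pairs before its third trigger holds the final result; switching the control input to 0 then replaces the accumulator contents with the third input.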
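Functionally, the read side of the matrix coefficient block acts as the specialised multi-port memory mentioned in the abstract: in each storage block the k-th AND element gates the stored bit with the k-th read row address bit and the k-th read column address bit, the k-th OR element combines the result with the k-th data bit from the previous row, and the bits leaving the last row are OR-combined across the columns. The Python sketch below models only this AND gating and OR reduction and ignores every register (the three trigger groups in each storage block, the two-stage registers and the output triggers), so the pipeline latency that the device introduces is not represented. The one-hot address encoding per read port, the reading of (m, p) as (row, column), and all function names are assumptions inferred from the wiring described above, not taken from the claim.

```python
from typing import List


def storage_block_read(bit: int, row_sel: List[int], col_sel: List[int],
                       prev_row: List[int]) -> List[int]:
    """Combinational read path of one storage block, one entry per read port k:
    data_k = (stored bit AND row address bit k AND column address bit k)
             OR data bit k from the previous row."""
    return [(bit & r & c) | d for r, c, d in zip(row_sel, col_sel, prev_row)]


def multiport_read(mem: List[List[int]], row_addr: List[int],
                   col_addr: List[int]) -> List[int]:
    """Port k returns mem[row_addr[k]][col_addr[k]]; pipeline latency is ignored."""
    n = len(mem)
    # Assumed one-hot encoding: bit k of row m's address vector is 1 when
    # read port k addresses row m (likewise for columns).
    row_sel = [[int(row_addr[k] == m) for k in range(n)] for m in range(n)]
    col_sel = [[int(col_addr[k] == p) for k in range(n)] for p in range(n)]

    last_row_outputs = []  # data outputs of the (n, p)-th storage blocks
    for p in range(n):
        data = [0] * n  # nothing arrives from above the first row
        for m in range(n):  # OR-accumulate down column p
            data = storage_block_read(mem[m][p], row_sel[m], col_sel[p], data)
        last_row_outputs.append(data)

    # OR chain across the columns of the last row
    result = last_row_outputs[0]
    for p in range(1, n):
        result = [x | y for x, y in zip(result, last_row_outputs[p])]
    return result


mem = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(multiport_read(mem, row_addr=[0, 1, 2], col_addr=[0, 0, 1]))  # [1, 0, 1]
```

Under these assumptions all n ports can be served in the same cycle, and several ports may address the same cell; in the device itself the results would appear only after the registers along the address and data paths have been clocked.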