RU157948U1 - DEVICE FOR MATRIX MULTIPLICATION

Info

Publication number: RU157948U1
Application number: RU2015127533/08U
Authority: RU
Inventors: Эдуард Игоревич Ватутин; Илья Александрович Мартынов; Виталий Семенович Титов
Priority date: 2015-07-08
Filing date: 2015-07-08
Publication date: 2015-12-20

Abstract

A device for matrix multiplication, containing a matrix of n × n operating blocks (where n is the size of the square matrices being multiplied), each of which contains first, second and third registers, a multiplier and an adder, wherein the input and output of the second register are connected to the first input and first output of the operating block, respectively, the input and output of the first register are connected to the second input and second output of the operating block, respectively, the third output of the operating block is connected to the output of the third register, the second input of the adder is connected to the output of the multiplier, the first and second inputs of which are connected to the first and second outputs of the operating block, respectively, the clock input of the operating block is connected to the clock inputs of all the registers, the multiplier and the adder, the first input of the (i, j)-th operating block (where i = 1, …, n and j = 2, …, n) is connected to the first output of the (i, j−1)-th operating block, the second input of the (l, k)-th operating block (where l = 2, …, n and k = 1, …, n) is connected to the second output of the (l−1, k)-th operating block, the k-th output of the group of device outputs is connected to the third output of the (n, k)-th operating block, and the clock inputs of all operating blocks are connected to the clock input of the matrix of operating blocks of the device, characterized in that it additionally includes a multiplexer in each operating block, first and second matrix coefficient blocks, a shift register and a group of n two-stage registers, each of the matrix coefficient blocks contains n × n storage blocks and a group of n output OR gates, each of the storage blocks contains a register, an AND gate, a group of n AND gates and a group of n OR gates, and the output of the adder is connected to the first input of the mult

Description

The utility model relates to computer engineering and can be used to multiply arbitrary square matrices of size n × n.

A device for matrix operations is known (USSR Author's Certificate No. 1429127, cl. G06F 15/347, filed 04.03.87, published 07.10.88, Bulletin No. 37). It contains a band matrix of n² − (n−p)(n−p+1)/2 − (n−q)(n−q+1)/2 operating blocks, where n is the dimension of the square matrices, and p and q are the numbers of elements in the first column and the first row of the band matrix, respectively.
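As a quick check of the block-count expression above, the following Python sketch compares it with a brute-force count of the cells of an n × n band matrix. The function names, and the convention that cell (i, j) lies inside the band when i − j < p and j − i < q (0-based indices), are assumptions made for this illustration and are not taken from the cited certificate.

```python
def band_block_count(n: int, p: int, q: int) -> int:
    """Closed form from the text: n^2 - (n-p)(n-p+1)/2 - (n-q)(n-q+1)/2."""
    # Each product of two consecutive integers is even, so // is exact here.
    return n * n - (n - p) * (n - p + 1) // 2 - (n - q) * (n - q + 1) // 2


def band_block_count_brute(n: int, p: int, q: int) -> int:
    """Count the band cells by enumeration (assumed band convention)."""
    return sum(1 for i in range(n) for j in range(n) if i - j < p and j - i < q)


if __name__ == "__main__":
    # A tridiagonal 4 x 4 band (p = q = 2) has 4 + 3 + 3 = 10 cells.
    print(band_block_count(4, 2, 2), band_block_count_brute(4, 2, 2))
```

With p = q = n the band covers the whole matrix and the count reduces to n², which is the dense case the known device does not handle.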

The disadvantage of this device is that it cannot multiply dense matrices.

The closest in technical essence to the claimed utility model is a device for multiplying a band matrix by a full matrix (USSR Author's Certificate No. 1534471, cl. G06F 15/347, filed 15.01.88, published 07.01.90, Bulletin No. 1). It contains an m × n matrix of operating blocks (where m is the bandwidth of the multiplicand matrix and n is the number of columns of the multiplier matrix), each of which contains three registers, a multiplier and an adder, and a matrix of (m−1) × n delay elements.

The technical task of the proposed utility model is to extend the functionality of the device by making it possible to multiply arbitrary square matrices, C = A × B, of size n × n.

The technical problem is solved as follows. The device contains a matrix of n × n operating blocks (where n is the size of the square matrices being multiplied), each of which contains first, second and third registers, a multiplier and an adder. The input and output of the second register are connected to the first input and first output of the operating block, respectively; the input and output of the first register are connected to the second input and second output of the operating block, respectively; the third output of the operating block is connected to the output of the third register; the second input of the adder is connected to the output of the multiplier, whose first and second inputs are connected to the first and second outputs of the operating block, respectively; and the clock input of the operating block is connected to the clock inputs of all the registers, the multiplier and the adder. The first input of the (i, j)-th operating block (where i = 1, …, n and j = 2, …, n) is connected to the first output of the (i, j−1)-th operating block, the second input of the (l, k)-th operating block (where l = 2, …, n and k = 1, …, n) is connected to the second output of the (l−1, k)-th operating block, the k-th output of the group of device outputs is connected to the third output of the (n, k)-th operating block, and the clock inputs of all operating blocks are connected to the clock input of the matrix of operating blocks of the device.

The following are additionally introduced into this device: a multiplexer in each operating block, first and second matrix coefficient blocks, a shift register and a group of n two-stage registers. Each matrix coefficient block contains n × n storage blocks and a group of n output OR gates, and each storage block contains a register, an AND gate, a group of n AND gates and a group of n OR gates.

In each operating block, the output of the adder is connected to the first input of the multiplexer, whose second input is connected to the third input of the operating block and whose output is connected to the input of the third register; the output of the third register is connected to the first input of the adder. The control input of the operating block is connected to the control input of the multiplexer, and the reset input of the operating block is connected to the reset inputs of the first, second and third registers. The third input of the (l, k)-th operating block (where l = 2, …, n and k = 1, …, n) is connected to the third output of the (l−1, k)-th operating block. The first input of the (i, 1)-th operating block (where i = 1, …, n) is connected to the i-th output of the group of outputs of the first matrix coefficient block, and the second input of the (1, k)-th operating block (where k = 1, …, n) is connected to the k-th output of the group of outputs of the second matrix coefficient block. The control inputs of the operating blocks are connected to the control input of the device, and the reset inputs of the operating blocks are connected to the reset input of the device, which is also connected to the reset input of each of the n two-stage registers.

The data input of the m-th two-stage register (where m = 2, …, n) is connected to the output of the (m−1)-th two-stage register, and the data input of the first two-stage register is connected to the output of the shift register, whose clock input is connected to the clock input of the device, which is also connected to the clock input of each of the n two-stage registers. The output of the k-th two-stage register (where k = 1, …, n) is connected to the k-th input of the group of read row addresses of the second matrix coefficient block and to the k-th input of the group of read column addresses of the first matrix coefficient block. The clock inputs of the first and second matrix coefficient blocks are connected to the write clock input of the device.

Within each matrix coefficient block, the clock input of the block is connected to the clock input of each of the n × n storage blocks. In what follows i, k and p each run over 1, …, n, and m runs over 2, …, n. The k-th input of the group of read row addresses of the block is connected to the read row address input of the (i, k)-th storage block. The k-th input of the group of read column addresses of the block is connected to the read column address input of the (k, i)-th storage block. The k-th input of the group of data inputs from the previous row of the (m, p)-th storage block is connected to the k-th data output for the next row of the (m−1, p)-th storage block. The k-th input of the group of write row addresses of the block is connected to the write row address input of the (i, k)-th storage block, and the k-th input of the group of write column addresses of the block is connected to the write column address input of the (k, i)-th storage block. The write data input of the block is connected to the write data input of each of the n × n storage blocks. The k-th output of the group of data outputs for the next row of the (n, p)-th storage block is connected to the p-th input of the k-th output OR gate of the block, and the output of the k-th output OR gate is connected to the k-th output of the group of outputs of the block.

Within each storage block, the data input of the register is connected to the write data input of the storage block, and the clock input of the register is connected to the output of the AND gate, whose first input is connected to the write column address input of the storage block, whose second input is connected to the write row address input of the storage block, and whose third input is connected to the clock input of the storage block. The output of the register is connected to the first input of each of the n AND gates of the group. For each k = 1, …, n, the output of the k-th AND gate of the group is connected to the first input of the k-th OR gate of the group, the second input of the k-th OR gate is connected to the k-th data input from the previous row of the storage block, and the output of the k-th OR gate is connected to the k-th data output for the next row of the storage block; the second input of the k-th AND gate is connected to the k-th bit of the read row address input of the storage block, and the third input of the k-th AND gate is connected to the k-th bit of the read column address input of the storage block.
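The connection pattern of the operating blocks described above (one operand passed to the right-hand neighbour, the other passed downwards, a third register that either accumulates or takes the value from the block above) can be illustrated with a clock-by-clock Python model. This is a minimal sketch, not the authors' design: the function name `simulate_array`, the skewed feeding order (row i of A delayed by i clocks, column j of B delayed by j clocks), the treatment of the multiplier and adder as combinational, and the zero fed to the third input of the top row are all assumptions made for the illustration.

```python
def simulate_array(A, B):
    """Clock-by-clock model of the n x n operating-block array (illustrative)."""
    n = len(A)

    def zeros():
        return [[0] * n for _ in range(n)]

    a_reg = zeros()  # second register of each block: passes A values to the right
    b_reg = zeros()  # first register of each block: passes B values downwards
    acc = zeros()    # third register of each block: running sum of one C element

    # Phase 1: the multiplexer selects the adder output, so each block accumulates.
    for t in range(3 * n - 1):
        new_a, new_b, new_acc = zeros(), zeros(), zeros()
        for i in range(n):
            for j in range(n):
                ka, kb = t - i, t - j  # assumed skew of the edge inputs
                # First input: left neighbour, or row i of A at the left edge.
                if j > 0:
                    new_a[i][j] = a_reg[i][j - 1]
                else:
                    new_a[i][j] = A[i][ka] if 0 <= ka < n else 0
                # Second input: upper neighbour, or column j of B at the top edge.
                if i > 0:
                    new_b[i][j] = b_reg[i - 1][j]
                else:
                    new_b[i][j] = B[kb][j] if 0 <= kb < n else 0
                # The multiplier sees the register outputs; the sum is latched.
                new_acc[i][j] = acc[i][j] + a_reg[i][j] * b_reg[i][j]
        a_reg, b_reg, acc = new_a, new_b, new_acc

    # Phase 2: the multiplexer selects the third input, so results shift down
    # and leave through the bottom row, last row of C first.
    C = zeros()
    for step in range(n):
        C[n - 1 - step] = acc[n - 1][:]
        acc = [[0] * n] + [row[:] for row in acc[:-1]]  # top row gets 0 (assumed)
    return C


if __name__ == "__main__":
    print(simulate_array([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Under these assumptions the element of row i of A and the element of column j of B with the same inner index reach block (i, j) on the same clock, so the third register of that block ends up holding the (i, j) element of C before the unloading phase begins.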

Fig. 1 shows the functional diagram of the device; Fig. 2, the diagram of an operating block; Fig. 3, the diagram of a matrix coefficient block; Fig. 4, the diagram of a storage block; Fig. 5, an example of data being fed into the matrix of operating blocks; Fig. 6, an example of data processing by the matrix of operating blocks; Fig. 7, an example of unloading the multiplication result from the matrix of operating blocks; Fig. 8, an example of the operation of the matrix coefficient blocks.

The device for matrix multiplication (Fig. 1) contains a matrix of n × n operating blocks 1, a first matrix coefficient block 2, a second matrix coefficient block 3, a shift register 4 and a group of n two-stage registers 5. Each operating block 1 (Fig. 2) contains first 6, second 7 and third 8 registers, a multiplier 9, an adder 10 and a multiplexer 11. The first 2 and second 3 matrix coefficient blocks have the same structure (Fig. 3): each includes n × n storage blocks 12, each of which (Fig. 4) contains a register 13, an AND gate 14, a group of AND gates 15, a group of OR gates 16, a clock input 17, a write row address input 18, a write column address input 19, a write data input 20, a group of read row address inputs 21, a group of read column address inputs 22, a group of data inputs 23 from the previous row and a group of data outputs 24 for the next row. Each of the matrix coefficient blocks 2 and 3 also includes a group of read row address inputs 25, a group of read column address inputs 26, a write row address input 27, a write column address input 28, a write data input 29, a clock input 30, a group of OR gates 31 and a group of outputs 32.

In each operating block 1, the input and output of the second register 7 are connected to the first input and first output of the operating block 1, respectively; the input and output of the first register 6 are connected to the second input and second output of the operating block 1, respectively; the third output of the operating block 1 is connected to the output of the third register 8; the second input of the adder 10 is connected to the output of the multiplier 9, whose first and second inputs are connected to the first and second outputs of the operating block 1, respectively; and the clock input of the operating block 1 is connected to the clock inputs of the first 6, second 7 and third 8 registers, the multiplier 9 and the adder 10. The output of the adder 10 is connected to the first input of the multiplexer 11, whose second input is connected to the third input of the operating block 1 and whose output is connected to the input of the third register 8; the output of the third register 8 is connected to the first input of the adder 10. The control input of the operating block 1 is connected to the control input of the multiplexer 11, and the reset input of the operating block 1 is connected to the reset inputs of the first 6, second 7 and third 8 registers.

The first input of the (i, j)-th operating block 1 (where i = 1, …, n and j = 2, …, n) is connected to the first output of the (i, j−1)-th operating block 1, and the second input of the (l, k)-th operating block 1 (where l = 2, …, n and k = 1, …, n) is connected to the second output of the (l−1, k)-th operating block 1. The k-th output of the group of device outputs is connected to the third output of the (n, k)-th operating block 1, and the clock inputs of all operating blocks 1 are connected to the clock input of the matrix of operating blocks of the device. The third input of the (l, k)-th operating block 1 (where l = 2, …, n and k = 1, …, n) is connected to the third output of the (l−1, k)-th operating block 1. The first input of the (i, 1)-th operating block 1 (where i = 1, …, n) is connected to the i-th output of the group of outputs of the first matrix coefficient block 2, and the second input of the (1, k)-th operating block 1 (where k = 1, …, n) is connected to the k-th output of the group of outputs of the second matrix coefficient block 3. The control inputs of the operating blocks 1 are connected to the control input of the device, and the reset inputs of the operating blocks 1 are connected to the reset input of the device, which is also connected to the reset input of each of the n two-stage registers 5.

The data input of the m-th two-stage register 5 (where m = 2, …, n) is connected to the output of the (m−1)-th two-stage register 5, and the data input of the first two-stage register 5 is connected to the output of the shift register 4, whose clock input is connected to the clock input of the device, which is also connected to the clock input of each of the n two-stage registers 5. The output of the k-th two-stage register 5 (where k = 1, …, n) is connected to the k-th input of the group of read row addresses 25 of the second matrix coefficient block 3 and to the k-th input of the group of read column addresses 26 of the first matrix coefficient block 2. The clock inputs 30 of the first 2 and second 3 matrix coefficient blocks are connected to the write clock input of the device.

Within each matrix coefficient block, the clock input 30 of the block is connected to the clock input 17 of each of the n × n storage blocks 12. In what follows i, k and p each run over 1, …, n, and m runs over 2, …, n. The k-th input of the group of read row addresses 25 of the block is connected to the read row address input 21 of the (i, k)-th storage block 12. The k-th input of the group of read column addresses 26 of the block is connected to the read column address input 22 of the (k, i)-th storage block 12. The k-th input of the group of data inputs 23 from the previous row of the (m, p)-th storage block 12 is connected to the k-th data output 24 for the next row of the (m−1, p)-th storage block 12. The k-th input of the group of write row addresses 27 of the block is connected to the write row address input 18 of the (i, k)-th storage block 12, and the k-th input of the group of write column addresses 28 of the block is connected to the write column address input 19 of the (k, i)-th storage block 12. The write data input 29 of the block is connected to the write data input 20 of each of the n × n storage blocks 12. The k-th output of the group of data outputs 24 for the next row of the (n, p)-th storage block 12 is connected to the p-th input of the k-th OR gate of the group of OR gates 31 of the block, and the output of the k-th OR gate of the group 31 is connected to the k-th output 32 of the group of outputs of the block.

Within each storage block 12, the data input of the register 13 is connected to the write data input 20 of the storage block, and the clock input of the register 13 is connected to the output of the AND gate 14, whose first input is connected to the write column address input 19 of the storage block 12, whose second input is connected to the write row address input 18 of the storage block 12, and whose third input is connected to the clock input 17 of the storage block 12. The output of the register 13 is connected to the first input of each of the n AND gates of the group 15. For each k = 1, …, n, the output of the k-th AND gate of the group 15 is connected to the first input of the k-th OR gate of the group 16, the second input of the k-th OR gate of the group 16 is connected to the k-th data input 23 from the previous row of the storage block 12, and the output of the k-th OR gate of the group 16 is connected to the k-th data output 24 for the next row of the storage block 12; the second input of the k-th AND gate of the group 15 is connected to the k-th bit of the read row address input 21 of the storage block 12, and the third input of the k-th AND gate of the group 15 is connected to the k-th bit of the read column address input 22 of the storage block 12.

Operating blocks 1, combined into a systolic array, multiply the matrix elements in a parallel-pipelined manner and store the result. The first 2 and second 3 blocks of matrix coefficients store the coefficients a_ij and b_ij of the first and second matrices and, on each clock cycle, select in each block the required group of n coefficients in accordance with the operating logic of the array of operating blocks 1 (see Fig. 5). Shift register 4 generates the sequence of values 00…01₂, 00…010₂, …, 100…0₂, 00…0₂, …, 00…0₂ used by the group of registers 5. The group of two-stage registers 5 stores the addresses for the group of n rows of the first block 2 of matrix coefficients and for the group of n columns of the second block 3 of matrix coefficients; these addresses are needed to read the groups of coefficients of the multiplied matrices in accordance with the operating logic of the array of operating blocks 1.

Registers 6 and 7 hold the coefficients of the multiplied matrices while the operating blocks 1 are working. Register 8 holds the intermediate sums of products of the coefficients during operation; after 3n-2 steps of device operation these registers contain the coefficients of the resulting matrix C. Multiplier 9 multiplies a pair of coefficients a_ik and b_kj. Adder 10 computes the sum of the products of the coefficients. Multiplexer 11 routes the sum of products to the input of register 8 during the working stage and advances the multiplication result (the coefficients of the resulting matrix C) between registers 8 of the operating blocks 1 in parallel-pipelined mode during the result output stage.

Storage blocks 12, combined into an n × n matrix (Fig. 3), form the first 2 and second 3 blocks of coefficients of the multiplied matrices A and B. Registers 13 store the initial values of the coefficients. AND elements 14 gate the clock signal 17 to the clock inputs of registers 13 according to the selected row i and column j within the blocks of matrix coefficients. The groups of AND elements 15₁-15_n gate a group of n values from the outputs of registers 13 to the inputs of the corresponding OR elements 16₁-16_n; the selected values are determined by the coordinates of the required rows and columns at inputs 21₁-21_n and 22₁-22_n, in accordance with the operating logic of the array of operating blocks 1. OR elements 16₁-16_n within the storage blocks 12, together with OR elements 31₁-31_n, deliver the next group of matrix coefficients to outputs 32₁-32_n of the block of matrix coefficients, the n pairs of row and column addresses of the required coefficients being set by the values at inputs 21₁-21_n and 22₁-22_n of the storage blocks 12.

Clock input 17 is used when the initial coefficient values are written into registers 13. Inputs 18 and 19 (write row address i and write column address j) select the ij-th storage block 12 within the block of matrix coefficients by gating clock signal 17 through AND element 14 when the required coefficient is written from input 20 into register 13 of the selected ij-th storage block 12. Write data input 20 receives the coefficients of the multiplied matrices to be written into registers 13. The group of read row address inputs 21₁-21_n together with the group of read column address inputs 22₁-22_n controls the passage of a group of n matrix coefficients from registers 13 of the storage blocks 12 to outputs 32₁-32_n of the block of matrix coefficients. The group of inputs 23₁-23_n (data from the previous row) together with the group of outputs 24₁-24_n (data for the next row) forms the disjunction of the selected coefficient values down the columns of storage blocks 12; the result appears at outputs 24₁-24_n of the storage blocks 12 in the n-th row of the block of matrix coefficients.

Input 25 (read row addresses) together with input 26 (read column addresses) receives the row and column addresses of the required elements so that they can be selected from registers 13 and delivered to outputs 32₁-32_n. Input 27 (write row address) together with input 28 (write column address) receives the values of row i and column j so that the coefficient applied to write data input 29 can be written into the selected ij-th storage block. Clock input 30 receives the strobe signal that writes the data from input 29 of the block of matrix coefficients into the selected ij-th storage block 12. The group of OR elements 31₁-31_n forms, at outputs 32₁-32_n of the block of matrix coefficients, the disjunctions of the values from outputs 24₁-24_n of the storage blocks 12, producing the group of n required matrix coefficients according to their addresses at inputs 25 and 26 and the operating logic of the array of operating blocks 1. Outputs 32₁-32_n of the block of matrix coefficients deliver the group of n matrix coefficients for transfer to the array of operating blocks 1.

The device operates as follows. At the data loading stage, an external device feeds the values a_ij of the elements of the first matrix one at a time to input 29 of the first block 2 of matrix coefficients; the addresses of row i and column j, in unitary (one-hot) code, are applied to inputs 27 and 28 of the first block 2 respectively; each element a_ij is written by applying a clock pulse to the write clock input 30 of the first block 2 of matrix coefficients.

The initial values b_ij of the elements of the second matrix are loaded in the same way: the external device applies their values to input 29 of the second block 3 of matrix coefficients, applies the addresses i and j in unitary code to inputs 27 and 28 of the second block 3 respectively, and applies a clock pulse to the write clock input 30 of the second block 3. Loading of the first and second matrices can overlap in time.

The coefficient value from input 29 of the block of matrix coefficients goes to inputs 20 of the storage blocks 12 and then to the information input of register 13 in each storage block. The row and column addresses from inputs 27 and 28 go to inputs 18 and 19 of the storage blocks 12 respectively: the i-th bit of input 27 is applied to inputs 18 of the storage blocks 12 in the i-th row of the block of matrix coefficients, and the j-th bit of input 28 is applied to inputs 19 of the storage blocks 12 in the j-th column. The clock pulse from input 30 of the block of matrix coefficients goes to inputs 17 of the storage blocks and on to an input of AND element 14. The 1 values at inputs 18 and 19 of the selected storage block 12_ij reach AND element 14 and open it, letting the clock signal through to the clock input of register 13, which writes the next coefficient. In all other storage blocks 12, at least one of inputs 18 or 19 is zero, so AND elements 14 block the clock signal and nothing is written to register 13.
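The write addressing described above can be sketched in a few lines of Python. This is a behavioural illustration only, not the patented circuit: the names `one_hot` and `write_coefficient` are hypothetical, and the n × n list `regs` stands in for registers 13 of the storage blocks 12.

```python
def one_hot(index, n):
    """Unitary (one-hot) code for a 1-based index, as a list of n bits.

    An index outside 1..n yields all zeros.
    """
    return [1 if k == index else 0 for k in range(1, n + 1)]


def write_coefficient(regs, row_addr, col_addr, data, clock=1):
    """Model of AND element 14 in every storage block.

    Register (i, j) is clocked, and so takes `data`, only when row bit i,
    column bit j and the write clock are all 1.
    """
    n = len(regs)
    for i in range(n):
        for j in range(n):
            if row_addr[i] & col_addr[j] & clock:
                regs[i][j] = data
    return regs


n = 3
regs = [[0] * n for _ in range(n)]
write_coefficient(regs, one_hot(2, n), one_hot(3, n), 7)  # write a_23 = 7
print(regs)  # [[0, 0, 0], [0, 0, 7], [0, 0, 0]]
```

Because both addresses are one-hot, exactly one register sees all three gate inputs high, which is why no separate address decoder is needed.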

At the initialization stage, a reset signal is applied to the reset inputs of the block of registers 5 and of registers 6, 7 and 8, which sets them to zero. The value 00…01₂ is written into shift register 4. The read row address inputs 25₁-25_n of the first block 2 of matrix coefficients are supplied with the values 00…01₂, 00…010₂, …, 100…0₂ respectively, which subsequently causes elements a_i1, a_i2, …, a_in to be delivered at the corresponding output 32_i of the first block 2 of matrix coefficients. The read column address inputs 26₁-26_n of the second block 3 of matrix coefficients are supplied with the values 00…01₂, 00…010₂, …, 100…0₂ respectively, which subsequently causes elements b_1j, b_2j, …, b_nj to be delivered at the corresponding output 32_j of the second block 3 of matrix coefficients. A zero value is applied to the address inputs of multiplexers 11.

At the working stage, a write signal for the first stage is applied to the clock inputs of the block of two-stage registers 5, which writes values according to the scheme Rg5_i := Rg5_{i-1}, Rg5_1 := Rg4 (Rg denotes a register). In the next clock cycle a write signal for the second stage of registers 5 is applied, which copies the information from the first stage into the second, and a signal is applied to the clock input of shift register 4 that shifts its contents towards the high-order bits. At the first step of the working stage, register 4 takes the value 00…010₂ and registers 5 take the values 00…01₂, 00…0₂, …, 00…0₂ respectively.
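The address sequence produced by shift register 4 and the chain of registers 5 can be traced with a short simulation. The sketch below is an assumption-laden model: it stores each unitary code as an integer, treats the two-stage registers as a chain that all shift at once, and uses the hypothetical name `address_sequence`.

```python
def address_sequence(n, steps):
    """Return [(R4, [R5_1, ..., R5_n]), ...] after each working step.

    Values are integers whose binary digits form the unitary code.
    """
    mask = (1 << n) - 1
    r4 = 1            # 00...01, written at initialization
    r5 = [0] * n      # cleared by the reset signal
    history = []
    for _ in range(steps):
        r5 = [r4] + r5[:-1]      # R5_1 := R4, R5_i := R5_(i-1)
        r4 = (r4 << 1) & mask    # shift towards the high-order bits
        history.append((r4, list(r5)))
    return history


for r4, r5 in address_sequence(3, 4):
    print(format(r4, "03b"), [format(v, "03b") for v in r5])
# 010 ['001', '000', '000']
# 100 ['010', '001', '000']
# 000 ['100', '010', '001']
# 000 ['000', '100', '010']
```

For n = 3 the first two rows reproduce the register contents given in the text for the first and second steps, and the third row shows the 1 leaving register 4 at step n.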

The values from the outputs of the group of registers 5₁-5_n go to the read column address inputs 26₁-26_n of the first block 2 of matrix coefficients and then to the corresponding inputs 22 of the storage blocks 12 of the first block 2, the k-th bit of register 5_j being applied to inputs 22_j of the storage blocks in the k-th column of the first block 2 of matrix coefficients. The same register outputs also go to the read row address inputs 25₁-25_n of the second block 3 of matrix coefficients and then to the corresponding inputs 21 of the storage blocks 12 of the second block 3, the k-th bit of register 5_i being applied to inputs 21_i of the storage blocks in the k-th row of the second block 3 of matrix coefficients.

From inputs 21 and 22 of the storage blocks 12, the binary values go to the inputs of the corresponding AND elements 15, opening them so that the value from register 13 passes to the input of the corresponding OR element 16; AND element 15_k is open only if both signals at inputs 21_k and 22_k are 1. This reads the value of register 13 of the selected storage block 12_ij through the k-th pair of inputs 21_k and 22_k and AND element 15_k; in the remaining storage blocks, the output of AND element 15_k is zero because at least one of inputs 21_k and 22_k carries a zero.

The values read pass sequentially through OR elements 16 from top to bottom (first OR elements 16₁-16_n of storage block 12_1j, then OR elements 16₁-16_n of storage block 12_2j, and so on down to OR elements 16₁-16_n of storage block 12_nj) and, from outputs 24₁-24_n of the storage blocks 12 in the n-th row of the block of matrix coefficients, go to the inputs of the corresponding OR elements 31₁-31_n. The output of each OR element 31₁-31_n forms the disjunction of the values from outputs 24₁-24_n, which corresponds to the required ij-th element of the stored matrix; each of the k pairs of values i and j is set by the corresponding values at inputs 25₁-25_n and 26₁-26_n. This allows k values to be selected independently and in parallel from among the matrix coefficients stored in the block.
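The read path (AND elements 15, the column chains of OR elements 16, and the final OR elements 31) behaves like n independent one-hot selectors working on the same stored matrix. A minimal behavioural sketch follows, assuming non-negative integer coefficients so that a bitwise OR models the disjunction; `one_hot` and `read_block` are hypothetical names.

```python
def one_hot(index, n):
    """Unitary code for a 1-based index; out-of-range gives all zeros."""
    return [1 if k == index else 0 for k in range(1, n + 1)]


def read_block(regs, row_addrs, col_addrs):
    """Return the n values on outputs 32_1..32_n.

    row_addrs[k] / col_addrs[k] are the one-hot codes on inputs 25_k / 26_k.
    """
    n = len(regs)
    outputs = []
    for k in range(n):
        value = 0
        for i in range(n):
            for j in range(n):
                # AND element 15_k passes register 13 only if both bits are 1
                if row_addrs[k][i] and col_addrs[k][j]:
                    value |= regs[i][j]   # OR 16_k chain, then OR 31_k
        outputs.append(value)
    return outputs


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows = [one_hot(1, 3), one_hot(2, 3), one_hot(3, 3)]   # fixed row addresses
cols = [one_hot(2, 3), one_hot(1, 3), one_hot(0, 3)]   # registers 5 at step 2
print(read_block(A, rows, cols))  # [2, 4, 0], i.e. a_12, a_21, 0
```

An all-zero address on either input leaves every AND element 15_k closed, so the corresponding output is 0, which is how the zeros for not-yet-started rows arise.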

Thus, at the first step of the working stage, inputs 25₁-25_n of the first block 2 of matrix coefficients carry the values 00…01₂, 00…010₂, …, 100…0₂ (1, 2, …, n in decimal, given the unitary coding used), and inputs 26₁-26_n of the first block 2 carry the values 00…01₂, 00…0₂, …, 00…0₂ (1, 0, …, 0 in decimal), so that outputs 32₁-32_n of the first block 2 of matrix coefficients deliver the values a₁₁, 0, 0, …, 0 respectively (nonexistent matrix elements a_i0 correspond to zeros). Similarly, inputs 25₁-25_n of the second block 3 of matrix coefficients carry the values 00…01₂, 00…0₂, …, 00…0₂ (1, 0, …, 0 in decimal), and inputs 26₁-26_n of the second block 3 carry the values 00…01₂, 00…010₂, …, 100…0₂ (1, 2, …, n in decimal), so that outputs 32₁-32_n of the second block 3 of matrix coefficients deliver the values b₁₁, 0, 0, …, 0 respectively.

The values formed at outputs 32 of the first block 2 of matrix coefficients go to the inputs of the corresponding registers 7 in the first column of the array of operating blocks 1, and the values from outputs 32 of the second block 3 of matrix coefficients go to the inputs of the corresponding registers 6 in the first row of the array of operating blocks 1, where they are latched on arrival of the corresponding clock signal; at the same time, the values in the first stage of registers 8 of the operating blocks 1 are written into the second stage.

Each operating block 1 implements its functions at every step t, where t is the step number of device operation. The coefficients of the resulting matrix C are calculated by recurrence over these steps.
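The expressions themselves do not survive in this text. One plausible form of the recurrence, reconstructed only from the surrounding description (coefficients of A move along a row, coefficients of B move down a column, and register 8 accumulates products) and not taken from the patent's own formulas, is given below. The superscript (t) and the boundary terms with index 0 are notation assumed here.

```latex
\begin{aligned}
a_{i,j}^{(t)} &= a_{i,j-1}^{(t-1)}, \qquad a_{i,0}^{(t-1)} := a_{i,\,t-i+1},\\
b_{i,j}^{(t)} &= b_{i-1,j}^{(t-1)}, \qquad b_{0,j}^{(t-1)} := b_{t-j+1,\,j},\\
c_{i,j}^{(t)} &= c_{i,j}^{(t-1)} + a_{i,j}^{(t)}\, b_{i,j}^{(t)}, \qquad c_{i,j}^{(0)} = 0 .
\end{aligned}
```

Here coefficients with an index outside 1, …, n are taken as zero. Under these assumptions block (i, j) sees the pair a_ik, b_kj at step t = k + i + j - 2, so that c_ij after step 3n-2 equals the sum of a_ik b_kj over k = 1, …, n, in agreement with the step count stated in the description.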

From the outputs of registers 6 and 7, the values a_ik and b_kj go to the input of multiplier 9, whose output forms their product a_ik·b_kj, which goes to the first input of adder 10. The second input of adder 10 receives the value from register 8, and the adder output goes to the first information input of multiplexer 11. During the working stage, the value at the first input of multiplexer 11 passes to the input of the first stage of register 8, because the signal at the multiplexer's address input is zero, and is latched there on arrival of the corresponding clock signal, so that the required values build up in registers 8 of the operating blocks 1.
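The datapath of a single operating block can be written as one function. This is a sketch under assumptions: the name `operating_block_step` is hypothetical, and the second multiplexer input is taken to come from register 8 of the block above (`c_above`), which this part of the text implies for the output stage but does not spell out.

```python
def operating_block_step(a_in, b_in, c_acc, c_above, mux_select):
    """One clock step of an operating block.

    a_in, b_in  -- values arriving at registers 7 and 6
    c_acc       -- current content of register 8
    c_above     -- value offered on the second multiplexer input (assumed)
    mux_select  -- 0 during the working stage, 1 during result output
    Returns the new contents of registers 7, 6 and 8.
    """
    a_reg, b_reg = a_in, b_in          # registers 7 and 6 latch their inputs
    product = a_reg * b_reg            # multiplier 9
    total = c_acc + product            # adder 10
    c_next = total if mux_select == 0 else c_above   # multiplexer 11
    return a_reg, b_reg, c_next


print(operating_block_step(2, 3, 10, 99, 0))  # (2, 3, 16)
print(operating_block_step(2, 3, 10, 99, 1))  # (2, 3, 99)
```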

Next, the write signal for the first stage is again applied to the clock inputs of the block of two-stage registers 5, and the operation of the device repeats as described above. Thus, at the second step of the working stage, register 4 takes the value 00…100₂ and registers 5 take the values 00…10₂, 00…01₂, 00…0₂, …, 00…0₂ respectively. Outputs 32 of the first block 2 of matrix coefficients deliver the values a₁₂, a₂₁, 0, …, 0, which go to the first column of operating blocks 1 and are latched in the corresponding registers 7; outputs 32 of the second block 3 of matrix coefficients deliver the values b₂₁, b₁₂, 0, …, 0, which go to the first row of operating blocks 1 and are latched in the corresponding registers 6. The order in which the elements of the multiplied matrices enter the array of operating blocks 1 is shown in Fig. 5.
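The skewed order in which coefficients of the first matrix leave outputs 32₁-32_n can be tabulated directly from the register values above: at step t, output 32_i carries a_{i, t-i+1} when that column index exists, and 0 otherwise. The helper `feed_schedule` below is a hypothetical illustration of that rule, not part of the device.

```python
def feed_schedule(n):
    """Labels on outputs 32_1..32_n of the first block for steps 1..2n-1."""
    rows = []
    for t in range(1, 2 * n):
        rows.append([
            f"a{i}{t - i + 1}" if 1 <= t - i + 1 <= n else "0"
            for i in range(1, n + 1)
        ])
    return rows


for t, row in enumerate(feed_schedule(3), start=1):
    print(t, row)
# 1 ['a11', '0', '0']
# 2 ['a12', 'a21', '0']
# 3 ['a13', 'a22', 'a31']
# 4 ['0', 'a23', 'a32']
# 5 ['0', '0', 'a33']
```

The second block follows the same rule with rows and columns swapped, so output 32_j carries b_{t-j+1, j}.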

After n-1 operating steps, shift register 4 holds the value 100…0₂. At the start of the n-th step this value is shifted toward the most significant bits, so the leading one is shifted out of the register and the value becomes 00…0₂, which does not change in subsequent steps of the device (see FIG. 8).
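
The behavior of shift register 4 can be traced with a few lines of code. This is a sketch under two assumptions not stated in this passage: the register is n bits wide, and it holds 00…01₂ before the first step. The function name `shift_register_trace` is hypothetical.

```python
def shift_register_trace(n, steps):
    """Return the register value after each of `steps` left shifts."""
    mask = (1 << n) - 1        # keep only n bits; higher bits fall off
    value = 1                  # assumed initial state 00...01
    trace = []
    for _ in range(steps):
        value = (value << 1) & mask
        trace.append(value)
    return trace


# Example for n = 4: format each value as a 4-bit binary string.
bits = [format(v, "04b") for v in shift_register_trace(4, 6)]
```

Under these assumptions the single one reaches the most significant bit after n-1 steps and is lost at step n, after which the register stays at zero.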

After 3n-2 operating steps, the desired elements c_ij of the result matrix C are formed in registers 8 of operating elements 1, as shown in FIG. 6 for the case n = 3.
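
The step count of 3n-2 can be checked with a behavioral simulation of an n × n array in which A values move one unit to the right and B values one unit down per step while each unit accumulates its own product. This is a generic systolic-array sketch, not the exact circuit: the function name `systolic_matmul` is hypothetical, steps are counted from 0, and the coefficient blocks are replaced by direct indexing into A and B with the skew assumed to be one step per row or column.

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices by simulating 3n-2 steps of the array."""
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]   # values travelling left to right
    b_reg = [[0] * n for _ in range(n)]   # values travelling top to bottom
    acc = [[0] * n for _ in range(n)]     # accumulators (register 8)
    for t in range(3 * n - 2):
        for i in range(n):
            # Shift row i to the right, starting from the far end.
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                      # row i is delayed by i steps
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            # Shift column j down, starting from the bottom.
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                      # column j is delayed by j steps
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

In this model unit (i, j) sees the pair a_ik, b_kj at step i + j + k (0-based), so the last product, for i = j = k = n-1, is accumulated at step 3n-3, which is the (3n-2)-th step.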

At the result readout stage, a logic one is applied to the address inputs of multiplexers 11, which connects the second inputs of multiplexers 11 to their outputs. Clock pulses are then issued that write alternately to the first and then to the second stage of registers 8. In n steps this outputs the elements of the result matrix C row by row from the bottom row of the matrix of operating elements 1 in a parallel pipelined mode, as shown schematically in FIG. 7 for the case n = 3.
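
The readout stage can be sketched as a downward shift of the accumulator contents. The sketch assumes that each unit takes over the value of the unit directly above it, that zeros enter the top row, and that the bottom row is therefore the first to appear on the device outputs; none of this is confirmed by the passage above beyond the output being taken from the bottom row. The function name `read_out` is hypothetical.

```python
def read_out(acc):
    """Return the rows seen on the device outputs over n readout steps."""
    n = len(acc)
    regs = [row[:] for row in acc]        # copy so the caller's data is untouched
    emitted = []
    for _ in range(n):
        emitted.append(regs[-1][:])       # bottom row drives the outputs
        regs = [[0] * n] + regs[:-1]      # every row moves down by one unit
    return emitted
```

Under these assumptions the rows of C leave the array in reverse order, so a consumer would reverse the emitted list to recover C.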

To multiply a band matrix by a full matrix, or to multiply matrices that have fewer than n elements and are in general not square, zero values must be loaded into the corresponding storage units 12 of the first (2) and second (3) matrix coefficient blocks.
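
Zero loading can be illustrated by padding smaller or non-square operands to n × n before multiplication. The sketch interprets "fewer than n elements" as dimensions smaller than n, which is an assumption. The helper names `pad_to_square` and `matmul` are hypothetical, and `matmul` is an ordinary reference product, not a model of the device.

```python
def pad_to_square(M, n):
    """Embed a rows x cols matrix in the top-left corner of an n x n zero matrix."""
    rows = len(M)
    cols = len(M[0]) if rows else 0
    if rows > n or cols > n:
        raise ValueError("matrix does not fit into an n x n array")
    return [[M[i][j] if i < rows and j < cols else 0 for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Reference product of two n x n matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# A 2 x 3 matrix times a 3 x 2 matrix on a 3 x 3 array.
A = pad_to_square([[1, 2, 3], [4, 5, 6]], 3)
B = pad_to_square([[1, 0], [0, 1], [1, 1]], 3)
C = matmul(A, B)
```

The meaningful 2 × 2 result occupies the top-left corner of `C`; the padded positions contribute only zeros.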

Claims

A device for matrix multiplication, containing a matrix of n × n operating units (where n is the size of the square matrices being multiplied), each of which contains first, second and third registers, a multiplier and an adder, wherein the input and output of the second register are connected respectively to the first input and first output of the operating unit, the input and output of the first register are connected respectively to the second input and second output of the operating unit, the third output of the operating unit is connected to the output of the third register, the second input of the adder is connected to the output of the multiplier, the first and second inputs of which are connected respectively to the first and second outputs of the operating unit, the clock input of which is connected to the clock inputs of all the registers, the multiplier and the adder, the first input of the (i, j)-th operating unit (where

) is connected to the first output of the (i, j-1)-th operating unit, the second input of the (l, k)-th operating unit (where

) is connected to the second output of the (l-1, k)-th operating unit, the k-th output of the group of device outputs is connected to the third output of the (n, k)-th operating unit, and the clock inputs of all operating units are connected to the clock input of the matrix of operating units of the device, characterized in that it additionally includes a multiplexer in each operating unit, first and second matrix coefficient blocks, a shift register and a group of n two-stage registers, wherein each of the matrix coefficient blocks contains n × n storage units and a group of n output OR elements, each of the storage units contains a register, an AND element, a group of n AND elements and a group of n OR elements, the output of the adder is connected to the first input of the multiplexer, the second input of which is connected to the third input of the operating unit and the output of which is connected to the input of the third register, the output of which is connected to the first input of the adder, the control input of the operating unit is connected to the control input of the multiplexer, the reset input of the operating unit is connected to the reset inputs of the first, second and third registers, the third input of the (l, k)-th operating unit (where

) is connected to the i-th output of the group of outputs of the first matrix coefficient block, the second input of the (l, k)-th operating unit (where

) is connected to the k-th output of the group of outputs of the second matrix coefficient block, the control inputs of the operating units are connected to the control input of the device, and the reset inputs of the operating units are connected to the reset input of the device, which is also connected to the reset inputs of each of the n two-stage registers, the data input of the m-th two-stage register (where

), k-th input (where

,

), k-th input (where