
TWI766211B - Configuration load and unload of a reconfigurable data processor - Google Patents

Publication number
TWI766211B
Authority
TW
Taiwan
Prior art keywords
configurable
configuration
unit
file
array
Prior art date
Application number
TW108142191A
Other languages
Chinese (zh)
Other versions
TW202032383A (en)
Inventor
曼尼斯 夏
倫 西瓦拉瑪
馬克 盧特雷爾
大衛 傑克森
拉古 帕拉哈卡
蘇姆蒂 賈拉斯
格雷戈里 格羅霍斯
普拉莫德 娜塔拉雅
Original Assignee
美商聖巴諾瓦系統公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/198,086 (US11188497B2)
Priority claimed from US16/197,826 (US10831507B2)
Application filed by 美商聖巴諾瓦系統公司
Publication of TW202032383A
Application granted
Publication of TWI766211B

Landscapes

  • Logic Circuits (AREA)

Abstract

A reconfigurable data processor comprises a bus system and an array of configurable units connected to the bus system. Configurable units in the array include configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units. Configurable units in the plurality of configurable units each include logic to execute a unit configuration load process, including receiving, via the bus system, sub-files of a unit file particular to the configurable unit, and loading the received sub-files into the configuration store of the configurable unit. A configuration load controller connected to the bus system includes logic to execute an array configuration load process, including distributing a configuration file comprising unit files for a plurality of the configurable units in the array.

Description

Configuration Load and Unload of a Reconfigurable Data Processor

The present technology relates to configuration of reconfigurable architectures, and can be applied in particular to configuration of coarse-grain reconfigurable architectures.

Reconfigurable processors, including field programmable gate arrays (FPGAs), can be configured to implement a variety of functions more efficiently or faster than might be achieved using a general purpose processor executing a computer program. So-called coarse-grain reconfigurable architectures (CGRAs) are being developed in which the configurable units in the array are more complex than those used in typical, more fine-grained FPGAs, and may enable faster or more efficient execution of various classes of functions. For example, CGRAs have been proposed that can enable implementation of energy-efficient accelerators for machine learning and artificial intelligence workloads. See Prabhakar et al., “Plasticine: A Reconfigurable Architecture for Parallel Patterns,” ISCA ’17, June 24-28, 2017, Toronto, ON, Canada.

Configuration of a reconfigurable processor involves compiling a configuration description to produce a configuration file, sometimes referred to as a bitstream or bit file, and distributing the configuration file to the configurable units on the processor. To start a process, the configuration file for that process must be loaded. To change a process, the configuration file must be replaced with a new configuration file.

The procedures and supporting structures for distributing and loading configuration files can be complex, and executing the procedures can be time consuming.

In order to maximize operating efficiency, and to be able to swap programs on a reconfigurable processor, a means of efficiently loading configuration state and storing configuration and program state is needed.

A technology is described which enables efficient loading and unloading of configuration and control state for a coarse-grain reconfigurable array processor, which contains programmable elements arranged in a grid, or tile, and for other types of reconfigurable processors.

The technology described herein provides the ability to load configuration data from a formatted configuration file stored in memory and transferred to the reconfigurable processor via a combination of parallel and serial techniques. Likewise, the technology provides an efficient means of unloading program control and data state into a similarly formatted unload configuration file. In combination, the load and unload technologies can support protocols to quickly swap programs into and out of a reconfigurable processor to enable time-sharing and other virtualization techniques.

Configuration and reconfiguration procedures and structures are described herein that can be used for a reconfigurable processor comprising a bus system and one or more arrays of configurable units connected to the bus system. Configurable units in the one or more arrays include configuration data stores, implemented using, for example, serial chains of latches, to store configuration data referred to herein as unit files. The unit file particular to a configurable unit can comprise a plurality of sub-files of configuration data. In the examples described herein, each sub-file consists of a “chunk” of data sized for efficient distribution using the bus system.

Configurable units in the plurality of configurable units can each include logic to execute a unit configuration load process, including receiving, via the bus system, sub-files of a unit file particular to the configurable unit, and loading the received sub-files into the configuration store of the configurable unit. In some embodiments, the routes in the bus system that configurable units use during execution, after configuration, are also used in the configuration load process.

A configuration load controller is described that includes logic to execute an array configuration load process. The array configuration load process includes distributing a configuration file, comprising unit files for a plurality of the configurable units in the array, to implement a machine.

In one aspect of the technology, the unit files can be organized to comprise a plurality of ordered sub-files. In some embodiments, the unit files particular to different configurable units may have different numbers of ordered sub-files. The configuration file for an array of configurable units is arranged so that sub-files of the unit files are interleaved with the sub-files of the same order for other unit files, and so that the location of a sub-file in the configuration file implies both the configurable unit in the array to which the sub-file belongs and its order in the unit file particular to that configurable unit.

An example of the array configuration load process described herein executes by sending sub-files to a plurality of configurable units in the array in a distribution sequence of N rounds (round R(i) for i = 0 to N-1). In each round R(i), the process transfers one sub-file of order (i) via the bus system to the configurable units having unit files that include up to (i+1) sub-files.
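As a rough illustration of the round-based distribution sequence, the following Python sketch models each unit file as a list of chunks and lists which units receive a sub-file in each round R(i). The unit names, the dictionary representation, and the reading of the participation rule (a unit takes part in round R(i) only while its unit file still contains a sub-file of order i) are assumptions made for illustration, not details taken from the described hardware.

```python
from typing import Dict, List, Tuple


def distribution_rounds(
    unit_files: Dict[str, List[bytes]],
) -> List[List[Tuple[str, bytes]]]:
    """Return, for each round R(i), the (unit, sub-file) pairs sent in that round."""
    # N is the length of the longest unit file.
    n_rounds = max((len(chunks) for chunks in unit_files.values()), default=0)
    rounds = []
    for i in range(n_rounds):
        # Assumed rule: only units that still have a sub-file of order i take part.
        sent = [
            (unit, chunks[i])
            for unit, chunks in unit_files.items()
            if len(chunks) > i
        ]
        rounds.append(sent)
    return rounds


# Hypothetical example: a switch with 2 sub-files and a PCU with 3 sub-files.
example = {"switch0": [b"s0", b"s1"], "pcu0": [b"p0", b"p1", b"p2"]}
for i, sent in enumerate(distribution_rounds(example)):
    print(f"R({i}):", [unit for unit, _ in sent])
```

In this sketch the last round reaches only the unit with the longer unit file, which is the behavior the interleaved file layout relies on.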

The configuration data stores in configurable units in the plurality of configurable units can comprise serial chains. The unit configuration load process can execute by receiving, in one bus cycle, all or part of a first sub-file of the unit file particular to the configurable unit from the bus system in one round of the distribution sequence, and beginning to push the received first sub-file into the serial chain during subsequent bus cycles, before a second sub-file is received in the next round of the distribution sequence. The process then receives the second sub-file from the bus system in a later bus cycle, in the next round of the distribution sequence, and begins to push the received second sub-file into the serial chain after the earlier received sub-file has been pushed into the serial chain.

In some rounds of the distribution sequence, the first sub-file is consumed by the unit configuration load process in the configurable unit before the second sub-file in the plurality of ordered sub-files is received by the configurable unit.

The array can include more than one type of configurable unit, and the unit files for different types of configurable units can include different numbers of sub-files of configuration data. For example, the unit files for a first type of configurable unit include Z1 chunks, and the unit files for a second type of configurable unit include Z2 chunks, where Z1 is less than Z2. The array configuration load process can include retrieving segments of the configuration file that include sub-file (i) of the unit files for all of the configurable units of the first type and the second type, to be distributed in round R(i), for (i) going from 0 to Z1-1, and then retrieving segments of the configuration file that include sub-file (i) of the unit files for all of the configurable units of the second type, to be distributed in round R(i), for (i) going from Z1 to Z2-1. This protocol can be extended to any number of types of configurable units having different numbers of sub-files in their unit files.

In one technique for initiating the array configuration load process, a configuration load command identifying the location in memory of the configuration file can be received from a host process, and in response to the command, the process generates one or more memory access requests. As portions of the requested configuration file are returned, the distribution sequence can be executed.

The sub-files of the plurality of unit files can be arranged in the configuration file in an interleaved fashion that matches the distribution sequence. This arrangement of the configuration file enables the configuration load process to infer, from the location of each sub-file in the configuration file, the configurable unit to which it belongs and its position in the plurality of ordered sub-files. The array configuration load process can include routing the sub-files to configurable units based on the locations of the sub-files in the configuration file.
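The way a position in the interleaved configuration file can imply both the destination unit and the sub-file order can be sketched in Python as follows. This is a minimal model under stated assumptions: positions are counted in whole chunks, units appear in a fixed order within every round, and the unit names and sizes (two switches with Z1 = 2 chunks, one PCU with Z2 = 3 chunks) are hypothetical.

```python
from typing import List, Tuple

Sizes = List[Tuple[str, int]]  # (unit name, number of sub-files), in array order


def interleaved_layout(sizes: Sizes) -> List[Tuple[str, int]]:
    """List the (unit, order) of every chunk in file order: all sub-files of
    order 0 first, then all sub-files of order 1, and so on."""
    rounds = max((z for _, z in sizes), default=0)
    return [(unit, i) for i in range(rounds) for unit, z in sizes if i < z]


def locate(position: int, sizes: Sizes) -> Tuple[str, int]:
    """Recover (unit, order) from a chunk position using only the sizes."""
    if position < 0:
        raise IndexError("position must be non-negative")
    i = 0
    while True:
        # Units whose unit file still has a sub-file of order i.
        active = [unit for unit, z in sizes if z > i]
        if not active:
            raise IndexError("position is beyond the end of the configuration file")
        if position < len(active):
            return active[position], i
        position -= len(active)
        i += 1


# Two hypothetical unit types: switches with Z1 = 2 chunks, a PCU with Z2 = 3.
sizes = [("switch0", 2), ("switch1", 2), ("pcu0", 3)]
layout = interleaved_layout(sizes)
print(layout[6], locate(6, sizes))
```

Because `locate` needs only the per-unit chunk counts, no per-chunk destination tag has to be stored in the file itself in this model.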

The plurality of configurable units that receive configuration data by the array configuration load process can include all of the configurable units in the array. In instances in which the machine implemented by the configuration file does not use all of the configurable units, the unit file for one or more unused configurable units can implement a no-operation configuration. Also, the array configuration load process can be configured so that the plurality of configurable units receiving configuration data includes fewer than all of the configurable units in the array.

In an example described herein, the configurable units in the array include respective load complete status logic connected in a daisy chain that starts and ends at the array configuration load logic. In a procedure to confirm successful loading of a configuration file using the daisy chain, the array configuration load logic forwards a configuration load complete signal on the daisy chain after the configuration file is distributed. In each configurable unit in the array, the configuration load complete status logic forwards the configuration load complete signal on the daisy chain when the signal has been received from the previous member of the chain and loading of the unit's own unit file is complete.
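The daisy-chain confirmation can be modeled with a few lines of Python. The sketch below is illustrative only: the cycle numbers are hypothetical, and the one-cycle forwarding delay per unit is an assumption that the text does not state. It computes the cycle at which the load complete signal returns to the array configuration load logic.

```python
from typing import Sequence


def load_done_cycle(
    distributed_at: int,
    unit_finish_cycles: Sequence[int],
    hop_delay: int = 1,  # assumed forwarding delay per unit, in cycles
) -> int:
    """Cycle at which the load complete signal arrives back at the controller.

    The controller drives the signal onto the chain once the configuration
    file has been distributed. Each unit forwards it only when it has both
    received the signal from its predecessor and finished its own load.
    """
    t = distributed_at
    for finish in unit_finish_cycles:  # units listed in chain order
        t = max(t, finish) + hop_delay
    return t


# Hypothetical chain of three units finishing their loads at cycles 8, 15, 12.
print(load_done_cycle(distributed_at=10, unit_finish_cycles=[8, 15, 12]))
```

The slowest unit on the chain holds the signal back, so its arrival at the controller implies that every unit on the chain has finished loading.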

A bus system is described herein that supports a plurality of arrays of configurable units, where each array can be referred to as a tile. The bus system described includes a top level network and an array level network. The top level network is connected to an external data interface (such as one or more PCIE or DDR type interfaces) and to an array interface for each tile. The array level network is connected to the array interface for the corresponding tile and to the configurable units in that array. The array configuration load process can include receiving, from a host process, a configuration load command identifying the location in memory of the configuration file, and, in response to the command, generating one or more memory access requests via the top level network to retrieve the configuration file through the external data interface. The array configuration load process can route sub-files to the configurable units via the array level network using addresses implied by the locations of the sub-files in the configuration file.

A configuration unload controller is described that includes logic to execute an array configuration unload process. The process includes distributing an unload command to a plurality of the configurable units in the array to unload the unit files particular to the corresponding configurable units, the unit files each comprising a plurality of ordered sub-files, and receiving sub-files from the configurable units of the array at the configuration unload controller. An unload configuration file is assembled by arranging the received sub-files in memory according to the configurable unit of the unit file of which each sub-file is a part, and the order of the sub-file in that unit file. The structure of the unload configuration file can be the same as the structure of the configuration file described above. Configurable units in the plurality of configurable units can include logic to execute a unit configuration unload process, including unloading the sub-files from the configuration store of the configurable unit and transmitting the sub-files of the unit file particular to the configurable unit to the configuration unload controller via the bus system (for example, via the array level network). The unloaded sub-files need not be received by the configuration unload controller in any particular order. The configuration unload controller then transmits the unit sub-files to memory over the bus system (for example, via the top level network).
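Assembly of the unload configuration file from sub-files that arrive in arbitrary order can be sketched as follows. The Python model assumes 16-byte chunks, assumes the unload file uses the same round-interleaved layout as the load file (the text says only that the structures can be the same), and uses hypothetical unit names; each sub-file is written to the memory offset determined by its unit and its order.

```python
from typing import Dict, Iterable, List, Tuple


def assemble_unload_file(
    received: Iterable[Tuple[str, int, bytes]],  # (unit, order, payload)
    sizes: List[Tuple[str, int]],                # (unit, number of sub-files)
    chunk_bytes: int = 16,                       # assumed 128-bit chunks
) -> bytes:
    """Place each received sub-file at the offset implied by (unit, order)."""
    # Chunk position of every (unit, order) in the interleaved layout.
    offsets: Dict[Tuple[str, int], int] = {}
    rounds = max((z for _, z in sizes), default=0)
    for i in range(rounds):
        for unit, z in sizes:
            if i < z:
                offsets[(unit, i)] = len(offsets)
    image = bytearray(len(offsets) * chunk_bytes)
    for unit, order, payload in received:
        if len(payload) != chunk_bytes:
            raise ValueError("payload must be exactly one chunk")
        start = offsets[(unit, order)] * chunk_bytes
        image[start:start + chunk_bytes] = payload
    return bytes(image)


# Sub-files arrive out of order; the assembled file is still in layout order.
arrivals = [("b", 1, b"C" * 16), ("a", 0, b"A" * 16), ("b", 0, b"B" * 16)]
image = assemble_unload_file(arrivals, sizes=[("a", 1), ("b", 2)])
print(image[:1], image[16:17], image[32:33])
```

Since every sub-file has a fixed destination offset, the arrival order does not affect the assembled file.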

Methods for configuring a reconfigurable data processor are also provided.

Other aspects and advantages of the technology described herein can be seen on review of the drawings, the detailed description, and the claims that follow.

The following description will typically be with reference to specific structural embodiments and methods. It is to be understood that there is no intention to limit the technology to the specifically disclosed embodiments and methods; rather, the technology may be practiced using other features, elements, methods, and embodiments. Preferred embodiments are described to illustrate the present technology, not to limit its scope, which is defined by the claims. Those of ordinary skill in the art will recognize a variety of equivalent variations on the description that follows.

FIG. 1 is a system diagram illustrating a system including a host 120, a memory 140, and a reconfigurable data processor 110. As shown in the example of FIG. 1, the reconfigurable data processor 110 includes an array 190 of configurable units and a configuration load/unload controller 195. The phrase “configuration load/unload controller”, as used herein, refers to a combination of a configuration load controller and a configuration unload controller. The configuration load controller and the configuration unload controller may be implemented using separate logic and data path resources, or may be implemented using shared logic and data path resources as suits a particular embodiment. In some embodiments, a system may include only a configuration load controller of the types described herein. In some embodiments, a system may include only a configuration unload controller of the types described herein.

The processor 110 includes an external I/O interface 130 connected to the host 120, and an external I/O interface 150 connected to the memory 140. The I/O interfaces 130, 150 connect via a bus system 115 to the array 190 of configurable units and to the configuration load/unload controller 195. The bus system 115 may have a bus width that carries one chunk of data, which for this example can be 128 bits (references to 128 bits throughout can be considered as an example chunk size more generally). In general, a chunk of the configuration file can have N bits of data, and the bus system can be configured to transfer N bits of data in one bus cycle, where N is any practical bus width. A sub-file distributed in the distribution sequence can consist of one chunk, or of other amounts of data as suits a particular embodiment. Procedures are described herein using sub-files consisting of one chunk of data each. Of course, the technology can be configured to distribute sub-files of different sizes, including sub-files that may consist of two chunks distributed in two bus cycles, for example.
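For a concrete picture of chunk-sized sub-files, the Python sketch below splits a unit file into chunks of N bits, with N = 128 as in the example. The zero padding of a final partial chunk is an assumption made here so that every sub-file fills one bus transfer; the text does not specify how a unit file that is not a multiple of the chunk size is handled.

```python
from typing import List

CHUNK_BITS = 128  # example chunk size from the text; any practical N could be used


def split_into_chunks(unit_file: bytes, chunk_bits: int = CHUNK_BITS) -> List[bytes]:
    """Split a unit file into sub-files of one chunk each."""
    if chunk_bits <= 0 or chunk_bits % 8:
        raise ValueError("chunk_bits must be a positive multiple of 8")
    chunk_bytes = chunk_bits // 8
    # Round the length up to a whole number of chunks (assumed zero padding).
    padded_len = -(-len(unit_file) // chunk_bytes) * chunk_bytes
    padded = unit_file.ljust(padded_len, b"\x00")
    return [padded[k:k + chunk_bytes] for k in range(0, padded_len, chunk_bytes)]


# A hypothetical 40-byte unit file becomes three 16-byte chunks.
chunks = split_into_chunks(bytes(range(40)))
print(len(chunks), [len(c) for c in chunks])
```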

To configure configurable units in the array 190 of configurable units with a configuration file, the host 120 can send the configuration file to the memory 140 via the interface 130, the bus system 115, and the interface 150 in the reconfigurable data processor 110. The configuration file can be loaded in many ways, as suits a particular architecture, including in data paths outside the configurable processor 110. The configuration file can be retrieved from the memory 140 via the memory interface 150. Chunks of the configuration file can then be sent in a distribution sequence as described herein to configurable units in the array 190 of configurable units in the reconfigurable data processor 110.

An external clock generator 170 or other clock signal source can provide a clock signal 175 or clock signals to elements in the reconfigurable data processor 110, including the array 190 of configurable units, the bus system 115, and the external data I/O interfaces.

FIG. 2 is a simplified block diagram of components of a CGRA (Coarse Grain Reconfigurable Architecture) processor. In this example, the CGRA processor has two tiles (Tile1, Tile2). A tile comprises an array of configurable units connected to a bus system, including an array level network in this example. The bus system includes a top level network connecting the tiles to an external I/O interface 205 (or any number of interfaces). In other embodiments, different bus system configurations may be utilized. The configurable units in each tile are nodes on the array level network in this example.

Each of the two tiles has four AGCUs (Address Generation and Coalescing Units) (e.g., MAGCU1, AGCU12, AGCU13, AGCU14). The AGCUs are nodes on the top level network and nodes on the array level networks, and include resources for routing data among nodes on the top level network and nodes on the array level network in each tile.

Nodes on the top level network in this example include one or more external I/O interfaces, including interface 205. The interfaces to external devices include resources for routing data among nodes on the top level network and external devices connected to the interfaces, such as high-capacity memory, host processors, other CGRA processors, FPGA devices, and so on.

One of the AGCUs in a tile is configured in this example to be a master AGCU, which includes an array configuration load/unload controller for the tile. In other embodiments, more than one array configuration load/unload controller can be implemented, and one array configuration load/unload controller may be implemented by logic distributed among more than one AGCU.

MAGCU1 includes a configuration load/unload controller for Tile1, and MAGCU2 includes a configuration load/unload controller for Tile2. In other embodiments, a configuration load/unload controller can be designed for loading and unloading the configuration of more than one tile. In other embodiments, more than one configuration controller can be designed for configuration of a single tile. Also, the configuration load/unload controller can be implemented in other portions of the system, including as a stand-alone node on the top level network and the array level network or networks.

The top level network is constructed using top level switches (211-216) connected to each other as well as to other nodes on the top level network, including the AGCUs and the I/O interface 205. The top level network includes links (e.g., L11, L12, L21, L22) connecting the top level switches. Data travel in packets between the top level switches on the links, and from the switches to the nodes on the network connected to the switches. For example, top level switches 211 and 212 are connected by a link L11, top level switches 214 and 215 are connected by a link L12, top level switches 211 and 214 are connected by a link L13, and top level switches 212 and 213 are connected by a link L21. The links can include one or more buses and supporting control lines, including for example a chunk-wide bus (vector bus). For example, the top level network can include data, request and response channels operable in coordination for transfer of data in a manner analogous to an AXI compatible protocol. See AMBA® AXI and ACE Protocol Specification, ARM, 2017.

Top level switches can be connected to AGCUs. For example, top level switches 211, 212, 214 and 215 are connected to MAGCU1, AGCU12, AGCU13 and AGCU14 in the tile Tile1, respectively. Top level switches 212, 213, 215 and 216 are connected to MAGCU2, AGCU22, AGCU23 and AGCU24 in the tile Tile2, respectively.

Top level switches can be connected to one or more external I/O interfaces (e.g., interface 205).

FIG. 3 is a simplified diagram of a tile and an array level network usable in the configuration of FIG. 2, where the configurable units in the array are nodes on the array level network.

In this example, the array of configurable units 300 includes a plurality of types of configurable units. The types of configurable units in this example include Pattern Compute Units (PCU), Pattern Memory Units (PMU), switch units (S), and Address Generation and Coalescing Units (each including two address generators AG and a shared CU). For an example of the functions of these types of configurable units, see Prabhakar et al., “Plasticine: A Reconfigurable Architecture For Parallel Patterns”, ISCA ’17, June 24-28, 2017, Toronto, ON, Canada, which is incorporated by reference as if fully set forth herein. Each of these configurable units contains a configuration store comprising a set of registers or flip-flops that represent either the setup or the sequence to run a program, and can include the number of nested loops, the limits of each loop iterator, the instructions to be executed for each stage, the source of the operands, and the network parameters for the input and output interfaces.

Additionally, each of these configurable units contains a configuration store comprising a set of registers or flip-flops that store status usable to track progress in nested loops or otherwise. The configuration file contains a bitstream representing the initial configuration, or starting state, of each of the components that execute the program. This bitstream is referred to as a bit file. Program load is the process of setting up the configuration stores in the array of configurable units, based on the contents of the bit file, to allow all the components to execute a program (i.e., a machine). Program load may also require the load of all PMU memories.

The array level network includes links interconnecting configurable units in the array. The links in the array level network include one or more, and in this case three, kinds of physical buses: a chunk-level vector bus (e.g., 128 bits of data), a word-level scalar bus (e.g., 32 bits of data), and a multiple bit-level control bus. For instance, interconnect 321 between switch units 311 and 312 includes a vector bus interconnect with a vector bus width of 128 bits, a scalar bus interconnect with a scalar bus width of 32 bits, and a control bus interconnect.

The three kinds of physical buses differ in the granularity of data being transferred. In one embodiment, the vector bus can carry a chunk that includes 16 bytes (=128 bits) of data as its payload. The scalar bus can have a 32-bit payload, and carry scalar operands or control information. The control bus can carry control handshakes such as tokens and other signals. The vector and scalar buses can be packet switched, including headers that indicate a destination of each packet and other information such as sequence numbers that can be used to reassemble a file when the packets are received out of order. Each packet header can contain a destination identifier that identifies the geographical coordinates of the destination switch unit (e.g., the row and column in the array), and an interface identifier that identifies the interface on the destination switch (e.g., North, South, East, West, etc.) used to reach the destination unit. The control network can be circuit switched based on timing circuits in the device, for example. The configuration load/unload controller can generate a header for each chunk of configuration data of 128 bits. The header is transmitted on a header bus to each configurable unit in the array of configurable units.

In one example, a chunk of data of 128 bits is transmitted on the vector bus, which provides the chunk as vector inputs to a configurable unit. The vector bus can include 128 payload lines and a set of header lines. The header can include a sequence ID for each chunk, which can include:
• A bit that indicates whether the chunk is scratchpad memory data or configuration store data.
• Bits that form a chunk number.
• Bits that indicate a column identifier.
• Bits that indicate a row identifier.
• Bits that indicate a component identifier.
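
The sequence ID fields listed above can be pictured as one packed header word. The following is a minimal sketch only: the field widths, the field order within the word, and the names `SequenceId`, `pack`, and `unpack` are assumptions made for illustration, since the description lists the fields but not their sizes or layout.

```python
from dataclasses import dataclass

# Hypothetical field widths; the description names the fields but not their sizes.
IS_SCRATCHPAD_BITS = 1
CHUNK_BITS = 4
COLUMN_BITS = 5
ROW_BITS = 5
COMPONENT_BITS = 3


@dataclass(frozen=True)
class SequenceId:
    is_scratchpad: bool  # True: scratchpad memory data, False: configuration store data
    chunk: int           # chunk number
    column: int          # column identifier
    row: int             # row identifier
    component: int       # component identifier


def pack(seq: SequenceId) -> int:
    """Pack the fields into one integer, first-listed field in the highest bits."""
    fields = [
        (int(seq.is_scratchpad), IS_SCRATCHPAD_BITS),
        (seq.chunk, CHUNK_BITS),
        (seq.column, COLUMN_BITS),
        (seq.row, ROW_BITS),
        (seq.component, COMPONENT_BITS),
    ]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> SequenceId:
    """Inverse of pack(): peel fields off starting from the lowest bits."""
    component = word & ((1 << COMPONENT_BITS) - 1)
    word >>= COMPONENT_BITS
    row = word & ((1 << ROW_BITS) - 1)
    word >>= ROW_BITS
    column = word & ((1 << COLUMN_BITS) - 1)
    word >>= COLUMN_BITS
    chunk = word & ((1 << CHUNK_BITS) - 1)
    word >>= CHUNK_BITS
    return SequenceId(bool(word & 1), chunk, column, row, component)
```

A receiver modeled this way would call `unpack` on the header lines to decide which unit and which chunk slot the 128 payload bits belong to.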

For a load operation, the configuration load controller can send N chunks to a configurable unit in order from N-1 to 0. In this example, the 6 chunks are sent in most significant bit first order: Chunk 5 -> Chunk 4 -> Chunk 3 -> Chunk 2 -> Chunk 1 -> Chunk 0. (Note that this most significant bit first order results in Chunk 5 being distributed in round 0 of the distribution sequence from the array configuration load controller.) For an unload operation, the configuration unload controller can write the unload data to the memory in order. For both load and unload operations, the shifting in the configuration serial chains in the configuration data store of a configurable unit is from LSB (least-significant-bit) to MSB (most-significant-bit), that is, MSB out first.
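
As a small illustration of the send order described above, the sketch below lists the chunk indices in the order a load controller would emit them. The function name `load_order` is hypothetical.

```python
def load_order(num_chunks: int) -> list[int]:
    """Chunk indices in the order they are sent for a load: N-1 down to 0."""
    return list(range(num_chunks - 1, -1, -1))
```

For the six-chunk example, `load_order(6)` yields Chunk 5 first and Chunk 0 last.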

FIG. 3A illustrates an example switch unit connecting elements in an array level network. As shown in the example of FIG. 3A, a switch unit can have 8 interfaces. The north, south, east, and west interfaces of a switch unit are used for connections between switch units. The northeast, southeast, northwest, and southwest interfaces of a switch unit are each used to make connections to PCU or PMU instances. A set of 2 switch units in each tile quadrant has connections to an address generation and coalescing unit (AGCU) that includes multiple address generation (AG) units and a coalescing unit (CU) connected to the multiple address generation units. The coalescing unit (CU) arbitrates between the AGs and processes memory requests. Each of the 8 interfaces of a switch unit can include a vector interface, a scalar interface, and a control interface to communicate with the vector network, the scalar network, and the control network.

During execution of the machine after configuration, data can be sent to the configurable units via one or more unit switches and one or more links between the unit switches, using the vector bus and the vector interfaces of the one or more switch units on the array level network.

In the embodiments described herein, before configuration of the tile, data can be sent from the configuration load controller using the same vector bus, via one or more unit switches and one or more links between the unit switches, to the configurable units using the vector bus and the vector interfaces of the one or more switch units on the array level network. For instance, a chunk of configuration data in a unit file particular to configurable unit PMU 341 can be sent from the configuration load/unload controller 301 to the PMU 341 via a link 320 between the configuration load/unload controller 301 and the West (W) vector interface of switch unit 311, the switch unit 311 itself, and a link 331 between the Southeast (SE) vector interface of switch unit 311 and the PMU 341.

In this example, one of the AGCUs is configured to be a master AGCU, which includes a configuration load/unload controller (e.g. 301). The master AGCU implements a register through which the host (120, FIG. 1) can send commands via the bus system to the master AGCU. The master AGCU controls operations on the array of configurable units in a tile and implements a program control state machine that tracks the state of the tile based on the commands it receives from the host through writes to the register. For every state transition, the master AGCU issues commands to all components on the tile over a daisy-chained command bus (FIG. 4). The commands include a program reset command to reset the configurable units in the array of configurable units in a tile, and a program load command to load a configuration file to the configurable units.

The configuration load controller in the master AGCU is responsible for reading the configuration file from the memory and sending the configuration data to every configurable unit of the tile. The master AGCU can read the configuration file from the memory, preferably at the maximum throughput of the top level network. The data read from the memory is transmitted by the master AGCU over the vector interface on the array level network to the corresponding configurable unit, according to the distribution sequence described herein.

In one embodiment, in a way that can reduce the wiring requirements within a configurable unit, the configuration and status registers in a component that hold the unit files to be loaded in a configuration load process, or unloaded in a configuration unload process, are connected in a serial chain and can be loaded by shifting bits through the serial chain. In some embodiments, there may be more than one serial chain arranged in parallel or in series. When a configurable unit receives, for example, 128 bits of configuration data from the master AGCU in one bus cycle, the configurable unit shifts this data through its serial chain at the rate of 1 bit per cycle, where the shifter cycles can run at the same rate as the bus cycles. A configurable unit therefore takes 128 shifter cycles to load 128 configuration bits from the 128 bits of data received over the vector interface. The 128 bits of configuration data are referred to as a chunk. A configurable unit can require multiple chunks of data to load all of its configuration bits. An example shift register structure is shown in FIG. 6.
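
The one-bit-per-cycle shifting described above can be modeled in a few lines. This is a behavioral sketch under stated assumptions, not the hardware design: the class `SerialChain`, the helper `load_chunk`, and the choice to treat list index 0 as the input end of the chain are all illustrative.

```python
class SerialChain:
    """A shift-register chain; index 0 is the input end, the last index the output end."""

    def __init__(self, length: int) -> None:
        self.bits = [0] * length

    def shift_in(self, bit: int) -> int:
        """One shifter cycle: push one bit in at the input end, return the bit that falls out."""
        out = self.bits[-1]
        self.bits = [bit] + self.bits[:-1]
        return out


def load_chunk(chain: SerialChain, chunk: list[int]) -> int:
    """Shift a chunk into the chain one bit per cycle and return the cycles used."""
    cycles = 0
    for bit in chunk:
        chain.shift_in(bit)
        cycles += 1
    return cycles
```

Loading one 128-bit chunk with this model takes 128 calls to `shift_in`, which corresponds to the 128 shifter cycles mentioned in the text.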

The configurable units interface with the memory through multiple memory interfaces (150, FIG. 1). Each of the memory interfaces can be accessed using several AGCUs. Each AGCU contains a reconfigurable scalar datapath to generate requests for the off-chip memory. Each AGCU contains FIFOs (first-in-first-out buffers for organizing data) to buffer outgoing commands, data, and incoming responses from the off-chip memory.

The address generators (AGs) in the AGCUs can generate memory commands that are either dense or sparse. Dense requests can be used to bulk transfer contiguous off-chip memory regions, and can be used to read chunks of data from, or write chunks of data to, configurable units in the array of configurable units. Dense requests can be converted to multiple off-chip memory burst requests by the coalescing unit (CU) in the AGCUs. Sparse requests can enqueue a stream of addresses into the coalescing unit. The coalescing unit uses a coalescing cache to maintain metadata on issued off-chip memory requests and to combine sparse addresses that belong to the same off-chip memory request, minimizing the number of issued off-chip memory requests.
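
The idea of combining sparse addresses that fall into the same off-chip request can be sketched as follows. This simplifies the coalescing cache to a dictionary of burst base addresses; the function name `coalesce` and the 64-byte burst size are assumptions chosen for illustration and do not come from the description.

```python
def coalesce(addresses: list[int], burst_bytes: int = 64) -> list[int]:
    """Return the base address of each distinct burst touched, in first-seen order."""
    issued: dict[int, None] = {}  # stands in for the metadata kept on issued requests
    for addr in addresses:
        base = (addr // burst_bytes) * burst_bytes
        issued.setdefault(base, None)  # repeat hits on a burst add no new request
    return list(issued)
```

In this model six sparse addresses that touch three bursts produce three requests instead of six.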

FIG. 4 is a block diagram illustrating an example configurable unit 400, such as a Pattern Compute Unit (PCU). Configurable units in the array of configurable units include configuration data stores 420 (e.g. serial chains) to store unit files comprising a plurality of chunks (or sub-files of other sizes) of configuration data particular to the corresponding configurable units. Configurable units in the array of configurable units each include unit configuration load logic 440, connected to the configuration data store 420 via line 422, to execute a unit configuration load process. The unit configuration load process includes receiving, via the bus system (e.g. the vector inputs), chunks of a unit file particular to the configurable unit, and loading the received chunks into the configuration data store 420 of the configurable unit. The unit configuration load process is further described with reference to FIG. 5.

The configuration data stores in the configurable units of the plurality of configurable units in this example comprise serial chains of latches, where the latches store bits that control the configuration of the resources in the configurable unit. A serial chain in a configuration data store can include a shift register chain for configuration data and a second shift register chain for state information and counter values, connected in series. A configuration data store is further described with reference to FIG. 6.

A configurable unit can interface with the scalar, vector, and control buses using three corresponding sets of inputs and outputs (IO): scalar inputs/outputs, vector inputs/outputs, and control inputs/outputs. Scalar IOs can be used to communicate single words of data (e.g. 32 bits). Vector IOs can be used to communicate chunks of data (e.g. 128 bits), in cases such as receiving configuration data in a unit configuration load process, and transmitting and receiving data across a long pipeline between multiple PCUs during operation after configuration. Control IOs can be used to communicate control signals such as the start or end of execution of a configurable unit. Control inputs are received by control block 470, and control outputs are provided by control block 470.

Each vector input is buffered using a vector FIFO in a vector FIFO block 460, which can include one or more vector FIFOs. Each scalar input is buffered using a scalar FIFO 450. Using input FIFOs decouples timing between data producers and consumers, and simplifies the control logic between configurable units by making it robust to input delay mismatches.

Input configuration data 410 can be provided to a vector FIFO as vector inputs, and then be transferred to the configuration data store 420. Output configuration data 430 can be unloaded from the configuration data store 420 using vector outputs.

The CGRA uses a daisy-chained completion bus to indicate when a load/unload command has been completed. The master AGCU transmits the program load and unload commands to the configurable units in the array of configurable units over a daisy-chained command bus (to transition from S0 to S1, FIG. 5). As shown in the example of FIG. 4, a daisy-chained completion bus 491 and a daisy-chained command bus 492 are connected to daisy chain logic 493, which communicates with the unit configuration load logic 440. The daisy chain logic 493 can include load complete status logic, as described below. The daisy-chained completion bus is further described below. Other topologies for the command and completion buses are clearly possible but are not described here.

A configurable unit includes multiple reconfigurable datapaths in block 480. A datapath in a configurable unit can be organized as a multi-stage (Stage 1 … Stage N), reconfigurable SIMD (Single Instruction, Multiple Data) pipeline. The chunks of data pushed into the configuration serial chain in a configurable unit include configuration data for each stage of each datapath in the configurable unit. The configuration serial chain in the configuration data store 420 is connected to the multiple datapaths in block 480 via line 421.

A Pattern Memory Unit (PMU) can contain scratchpad memory coupled with a reconfigurable scalar datapath intended for address calculation, along with the bus interfaces used in the PCU. PMUs can be used to distribute on-chip memory throughout the array of reconfigurable units. In one embodiment, address calculation within the memory in the PMUs is performed on the PMU datapath, while the core computation is performed within the PCU.

FIG. 5 illustrates one example of a state machine that can be used to control a unit configuration load process in a configurable unit. In general, a unit configuration load process receives a first chunk (or sub-file) of the unit file particular to the configurable unit from the bus system in one bus cycle, and begins pushing the received first chunk into the serial chain during subsequent shifter cycles, which occur at the same rate as the bus cycles, before a second chunk of the unit file is received. Upon receiving the second chunk of the unit file particular to the configurable unit from the bus system in a later bus cycle, the process begins pushing the received second chunk into the serial chain after the earlier received chunk has been pushed into the serial chain. In some or all rounds of the configuration load process, a first chunk can be consumed by the unit configuration load process in a configurable unit before a second chunk (the next in the order of chunks of the unit file) in the plurality of ordered chunks is received by the configurable unit.

The state machine of FIG. 5 includes six states, S0 to S5. In state S0 (idle), the unit configuration load process waits for a configuration load/unload command from the configuration load/unload controller in the master AGCU. The configuration load/unload controller is responsible for loading and unloading configuration data from/to the off-chip memory (140, FIG. 1) and to/from the array (190, FIG. 1) of configurable units. When a load command is received at the configuration load/unload controller, the unit configuration load process enters state S1.

In state S1 (wait for quiescent), the functional flip-flops (flops) in the multiple datapaths are disabled so that they are not cycling, and the scalar outputs, vector outputs, and control outputs are turned off so that the outputs are not driving any loads. If a load command has been received, the unit configuration load process enters state S2. When an unload command is received, the unit configuration load process enters state S4.

In state S2 (wait for input valid), the unit configuration load process waits for an input FIFO (610, FIG. 6) to become valid. When the input FIFO becomes valid, it has received a chunk of configuration data of the configuration file via the bus system. For instance, a chunk of configuration data can include 128 bits of load data, received on the vector network of the bus system, where the vector network has a vector bus width of 128 bits. When the input FIFO becomes valid, the unit configuration load process enters state S3.

In state S3 (load shift), a chunk of configuration data of 128 bits is first de-queued from the input FIFO in one clock cycle, and then the chunk of configuration data of 128 bits is shifted into an input shift register (620, FIG. 6) in 128 clock cycles. The input shift register can have the same length (e.g. 128 bits) as a chunk of configuration data, so shifting the chunk into the input shift register takes as many shifter clock cycles (e.g. 128) as the chunk has bits. As mentioned above, the shifter clock and the bus clock (or bus cycles) can run at the same rate in some embodiments.

The configuration data store in a configurable unit comprises a configuration serial chain (630, 640, FIG. 6), which can be configured as a FIFO chain, to store a unit file comprising a plurality of chunks of configuration data particular to the configurable unit. The plurality of chunks of configuration data includes a first chunk of configuration data and a last chunk of configuration data. A chunk of configuration data in the input shift register is further serially shifted into the configuration data store in subsequent clock cycles. A configuration data store is further described with reference to FIG. 6.

After a first chunk of the unit file particular to the configurable unit is shifted into the input shift register in state S3, the unit configuration load process determines whether the first chunk of configuration data is the last chunk of configuration data particular to the configurable unit. If so, loading of the unit file for the configurable unit is complete, and the unit configuration load process enters state S0. If not, the unit configuration load process enters state S2 and waits for the input FIFO to become valid for a second chunk of configuration data particular to the configurable unit.

When an unload command is received in state S1, the unit configuration load process enters state S4.

In state S4 (unload shift), a chunk of configuration data from the configuration data store is shifted into an output shift register (650, FIG. 6). A chunk of configuration data can include 128 bits of unload data. The output shift register can have the same length (e.g. 128 bits) as a chunk of configuration data, so shifting the chunk from the configuration data store into the output FIFO takes as many shifter clock cycles (e.g. 128) as the chunk has bits. When the chunk of configuration data has been shifted into the output shift register, the unit configuration load process enters state S5 (wait for output valid).

In state S5 (wait for output valid), the unit configuration load process waits for an output FIFO (660, FIG. 6) to become valid. When the output FIFO becomes valid, the chunk of configuration data of 128 bits from the output shift register is inserted into the output FIFO in one clock cycle. The chunk of configuration data in the output FIFO can then be sent to the bus system (FIG. 3).

After a first chunk of configuration data is shifted into the output FIFO in state S5, the unit configuration load process determines whether the first chunk of configuration data is the last chunk of configuration data in the configuration data store. If so, unloading of the configuration data for the configurable unit is complete, and the unit configuration load process enters state S0. If not, the unit configuration load process enters state S4, and a second chunk of configuration data from the configuration data store is serially shifted into the output shift register.
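
The transitions among states S0 to S5 described above can be summarized as a transition function. This is a minimal sketch, not the actual logic of FIG. 5: the names `State` and `next_state` and the keyword arguments are hypothetical, and the sketch assumes that an unload command, like a load command, moves the process from S0 to S1.

```python
from enum import Enum


class State(Enum):
    S0_IDLE = 0
    S1_WAIT_QUIESCENT = 1
    S2_WAIT_INPUT_VALID = 2
    S3_LOAD_SHIFT = 3
    S4_UNLOAD_SHIFT = 4
    S5_WAIT_OUTPUT_VALID = 5


def next_state(state, *, command=None, input_valid=False,
               output_valid=False, last_chunk=False):
    """Return the next state at one decision point of the unit load/unload process."""
    if state is State.S0_IDLE:
        # Assumption: either command leaves idle and goes to S1.
        return State.S1_WAIT_QUIESCENT if command in ("load", "unload") else state
    if state is State.S1_WAIT_QUIESCENT:
        if command == "load":
            return State.S2_WAIT_INPUT_VALID
        if command == "unload":
            return State.S4_UNLOAD_SHIFT
        return state
    if state is State.S2_WAIT_INPUT_VALID:
        return State.S3_LOAD_SHIFT if input_valid else state
    if state is State.S3_LOAD_SHIFT:
        # Evaluated once the chunk has been shifted into the input shift register.
        return State.S0_IDLE if last_chunk else State.S2_WAIT_INPUT_VALID
    if state is State.S4_UNLOAD_SHIFT:
        # Evaluated once a chunk has been shifted into the output shift register.
        return State.S5_WAIT_OUTPUT_VALID
    if state is State.S5_WAIT_OUTPUT_VALID:
        if not output_valid:
            return state
        return State.S0_IDLE if last_chunk else State.S4_UNLOAD_SHIFT
    raise ValueError(f"unknown state: {state}")
```

Stepping this function through a two-chunk load visits S0, S1, S2, S3, S2, S3 and returns to S0, which matches the loop between S2 and S3 in the text.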

FIG. 6 is a logical representation of a configuration store in a configurable unit. In this embodiment, the configuration data store (420, FIG. 4) in a configurable unit comprises a configuration serial chain that includes a first shift register chain 630 and a second shift register chain 640. The first shift register chain 630 includes a set of registers or latches. The second shift register chain 640 includes another set of registers or latches (flip-flops). In this embodiment, the first shift register chain and the second shift register chain are connected in series to form a single chain.

A configuration file includes a plurality of chunks of configuration data for each configurable unit in a plurality of configurable units in an array of configurable units. The chunks of configuration data represent the initial configuration, or starting state, of the respective configurable units. A configuration load operation in this system is the process of setting up the unit files of configuration data in an array of configurable units so that all the configurable units can execute a program.

The set of registers in the first shift register chain 630 can represent either the setup or the sequence used to run a program, including a definition of the operation of the configurable unit containing the registers. These registers can register the number of nested loops, the limits of each loop iterator, the instructions to be executed for each stage, the source of the operands, and the network parameters for the input and output interfaces. The set of registers in the second shift register chain can contain data about the cycle-by-cycle running state of a program loaded in a configurable unit.

As shown in the example of FIG. 6, the first shift register chain 630 and the second shift register chain 640 are connected in series, so that the MSB (most significant bit) of the first shift register chain is connected to the LSB (least significant bit) of the second shift register chain. A load signal or an unload signal can act as a shift enable signal coupled to the LSB of the first shift register chain and the LSB of the second shift register chain, to control a load/unload operation on the first shift register chain and the second shift register chain. Input FIFO 610 is coupled to the input shift register 620 via a selector 670. The selector 670 connects the input shift register 620 to the input of the configuration data store (the LSB of the first shift register chain 630) when the load signal is active.

When a load signal is active, the configuration data in the input shift register 620 can be shifted into the first shift register chain 630 and the second shift register chain 640 in the configuration serial chain. Here the load signal can act as an enable signal for the input shift register, the first shift register chain, and the second shift register chain. The load operation can repeat until all chunks of configuration data for a configurable unit are loaded into the configuration data store in the configurable unit. When the length of the serial chain is not an integer number of chunks (or sub-files), the first chunk in the series can be padded with the difference, and the pad bits will be shifted out to the end of the chain when the last chunk is shifted in. For example, a configuration data store in a configurable unit can store a unit file having a size of 760 bits. The unit configuration load process can load an integer number N of chunks. In this example, N=6, and the N chunks are Chunk 5, Chunk 4, Chunk 3, Chunk 2, Chunk 1, and Chunk 0. A vector bus has a vector width of 128 bits, a chunk of configuration data has 128 bits, and a chunk can be sent to a configurable unit in one bus clock cycle. The N chunks have a total size of N × 128 = 6 × 128 = 768 bits, which includes 8 pad bits to match the unit file size of 760 bits.
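
The padding arithmetic in the 760-bit example can be checked with a short calculation. The function name `chunks_and_padding` is hypothetical; the 128-bit default chunk size is the one used in the text.

```python
def chunks_and_padding(unit_file_bits: int, chunk_bits: int = 128) -> tuple[int, int]:
    """Return (number of chunks N, pad bits) for a unit file of the given size."""
    num_chunks = -(-unit_file_bits // chunk_bits)  # ceiling division
    pad_bits = num_chunks * chunk_bits - unit_file_bits
    return num_chunks, pad_bits
```

For a 760-bit unit file this gives N = 6 chunks and 8 pad bits, in agreement with the example.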

In order to recover from errors, an unload operation can checkpoint the state of each configurable unit. The unload operation can save the execution state of each configurable unit that is needed for a restart, and enables the application to be restarted if an error occurs. It also allows the state of the configurable units to be saved or transferred for debugging purposes. The state that needs to be saved includes at least the contents of part of the first or second shift registers, and optionally the contents of the PMU memories. Program unload may also require unloading the state of all of the first and second shift registers.

Output FIFO 660 is coupled to the output shift register 650, which in turn is coupled to the output of the configuration data store (the MSB of the second shift register chain 640). For an unload operation, when an unload signal is active, the configuration data in the second shift register chain 640 and the first shift register chain 630 can be shifted into the output shift register 650. When the output FIFO 660 is valid, the configuration data (e.g. 128 bits) in the output shift register 650 can be inserted into the output FIFO 660 in one clock cycle. The unload operation can repeat until all chunks of configuration data in the configuration data store in a configurable unit are unloaded into the output FIFO.

In order to synchronize and communicate the completion of configuration load commands issued by the configuration load controller in a MAGCU, a single-wire daisy chain scheme is implemented in one example, supported by logic included in the daisy chain logic (e.g. daisy chain logic 493 in FIG. 4) in each component of the chain. This scheme requires every component to have the following 2 ports:

1. Input port, called PROGRAM_LOAD_DONE_IN

2. Output port, called PROGRAM_LOAD_DONE_OUT

A component will drive its PROGRAM_LOAD_DONE_OUT signal when it has completed executing the command issued by the MAGCU and its PROGRAM_LOAD_DONE_IN input is driven high. The MAGCU will initiate the daisy chain by driving its PROGRAM_LOAD_DONE_OUT when it has completed all the steps necessary for executing a command. The last component in the chain will drive its PROGRAM_LOAD_DONE_OUT, which is connected to the PROGRAM_LOAD_DONE_IN of the MAGCU. The PROGRAM_LOAD_DONE_IN of the MAGCU going high indicates the completion of a command. After delivering the data corresponding to all chunks of all components, the MAGCU drives its PROGRAM_LOAD_DONE_OUT port high. All components will drive their respective PROGRAM_LOAD_DONE_OUT ports high when they have completed loading all of their configuration bits.

When the MAGCU input port PROGRAM_LOAD_DONE_IN is asserted, the configuration file load is complete.
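The completion handshake can be illustrated with a small behavioral sketch. The function names and the list-of-booleans model are assumptions made for illustration only; the description specifies the two ports and their wiring, not this code. The sketch also assumes that the MAGCU, as the head of the chain, drives its output as soon as it has finished distributing data.

```python
def propagate_done(finished):
    """Return PROGRAM_LOAD_DONE_OUT for each component, in chain order.

    finished[0] models the MAGCU (True once it has delivered every chunk);
    finished[k] for k > 0 says whether component k has loaded all its bits.
    """
    outs = []
    done_in = True  # assumption: the MAGCU starts the chain, so nothing gates its output
    for has_finished in finished:
        done_out = has_finished and done_in  # OUT goes high only if done AND IN is high
        outs.append(done_out)
        done_in = done_out  # this OUT is wired to the next component's IN
    return outs


def load_complete(finished):
    """The last OUT is wired back to the MAGCU's PROGRAM_LOAD_DONE_IN."""
    return propagate_done(finished)[-1]
```

A single unfinished component holds every downstream output low, so the MAGCU sees its input asserted only after every component in the chain has finished.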

FIG. 7 is a flowchart illustrating operations of a host coupled to a reconfigurable data processor. At Step 711, the host (120, FIG. 1) sends a configuration file for an array of configurable units to an off-chip memory (140, FIG. 1) via a PCIE interface (130, FIG. 1) and the top level network (115, FIG. 1), or otherwise stores the configuration file in memory accessible to the configurable processor.

At Step 712, when loading the configuration file into the memory is complete, the host 120 sends a configuration load command to the configuration load controller in the processor (part of the master AGCU in this example). The master AGCU can implement a register through which the host can send a configuration load command to the configuration load controller. The configuration load command can identify a location in memory accessible via a memory interface on the configurable processor. The configuration load controller can then generate one or more memory access requests via the top level network in response to the command to retrieve the configuration file. The host can then monitor the configurable processor for a signal that the configuration file has been completely loaded (714). When the file loading is complete, the host can initiate the function to be executed by the machine (716).

FIG. 8 is a flowchart illustrating operations of the configuration load controller, which can be part of a MAGCU or otherwise in communication with the array of configurable units in a tile. The configuration load controller is responsible for reading the configuration file from the off-chip memory (140, FIG. 1) and sending the configuration data to every configurable unit in the array of configurable units. The flowchart begins with the configuration load controller waiting for a configuration load command (810). As mentioned above, the configuration load command identifies a configuration file and its location in memory accessible to the processor.

Upon receiving a load command, at Step 811, the configuration load controller issues load requests to the memory (140, FIG. 1) connected to the reconfigurable data processor (110, FIG. 1). At Step 812, the configuration load controller retrieves chunks of the configuration file on the top level network via the memory interface. At Step 813, the configuration load controller distributes chunks of the configuration file in ordered rounds to the configurable units in the array on the array level network. At Step 814, when all the chunks of the configuration file have been received and distributed, the configuration load controller generates a distribution complete signal (e.g., its PROGRAM_LOAD_DONE_OUT). At Step 815, the configuration load controller then waits for confirmation from the configurable units that their respective unit files have been loaded, indicated for example by assertion of its PROGRAM_LOAD_DONE_IN. Once a successful configuration load is confirmed, the configuration load controller can notify the host (816).

FIG. 9 illustrates one example organization of a configuration file. Other organizations can be used as well, arranged as suits a particular protocol for loading and unloading configuration files. In the example described with reference to FIG. 9, configurable units in an array of configurable units include the Switch, PCU, PMU, and AGCU. Each of these configurable units contains a set of registers that represent either the setup or the sequence to run a program. These registers include data to define the operation of the configurable unit containing them, such as the number of nested loops, the limits of each loop iterator, the instructions to be executed for each stage, the source of operands, and the network parameters for the input and output interfaces. Additionally, each configuration file can include data to set context in a set of counters that track its progress in each nested loop.

A program executable contains a bit-stream representing the initial configuration, or starting state, of each of the configurable units that execute the program. This bit-stream is referred to as a bit file, or herein as a configuration file. Program load is the process of setting up the configuration stores in the configurable units based on the contents of the configuration file, to allow all the configurable units to execute a program. Program unload is the process of unloading the configuration stores from the configurable units and assembling a bit-stream, called herein an unload configuration file. In the examples described herein, the unload configuration file has the same arrangement of chunks or sub-files as the configuration file used for program load.

The configuration file includes a plurality of chunks of configuration data for each configurable unit in the array of configurable units, the chunks being arranged in the configuration file in a fashion that matches the sequence in which they are to be distributed. This organization of the configuration file enables the array configuration load process to route the chunks to configurable units based on the locations of the chunks in the configuration file.

As illustrated in FIG. 9, the configuration file (and an unload configuration file arranged in the same manner) includes a plurality of chunks of unit files for each configurable unit in a plurality of configurable units, the unit files having up to M (Z4 = 6 in this example) sub-files having an order (i) in the unit file. In FIG. 9, M is six, and the chunks are ordered from first to sixth (i.e., the first through the sixth chunks correspond to chunks (0) to (5) in this indexing). The chunks are arranged so that all sub-files of order (i), for (i) going from 0 to M-1, for all the unit files in the load or unload configuration file are stored in a corresponding block (i) of address space in the memory. The chunks of order (0) are stored in block (0), which includes addresses A0 to A1-1. In this example, the chunks of order (0) for switch units are in a group of contiguous addresses within block (0). The chunks of order (0) for PCUs are in a group of contiguous addresses within block (0). The chunks of order (0) for PMUs are in a group of contiguous addresses within block (0). The chunks of order (0) for AGCUs are in a group of contiguous addresses within block (0). The chunks of order (1) are stored in block (1), which includes addresses A1 to A2-1. In this example, the chunks of order (1) for switch units are stored in a group of contiguous addresses within block (1). The chunks of order (1) for PCUs are in a group of contiguous addresses within block (1). The chunks of order (1) for PMUs are in a group of contiguous addresses within block (1). The chunks of order (1) for AGCUs are in a group of contiguous addresses within block (1). The chunks of the remaining orders are arranged as shown in FIG. 9, following the same pattern in blocks (2) to (5).

As can be seen, the linear address space is allocated within the blocks for a configuration file on line boundaries in this example. In other embodiments, the linear address space can be allocated on word boundaries or chunk boundaries. The boundaries can be chosen to match efficiency characteristics of the memory being used. Thus, the configuration file in this example comprises lines of the memory with sequential line addresses.

Also, the array includes more than one type of configurable unit, and the unit files for different types of configurable units include different numbers of sub-files of configuration data. Within a block (i) of address space, the sub-files for each type of configurable unit are stored in a corresponding group of contiguous addresses within the block (i) of address space.

The array can include more than one type of configurable unit, and the unit files for different types of configurable units can include different numbers of chunks of configuration data. For instance, as shown in FIG. 3, types of configurable units in the array can include Switch Units, PCUs (Pattern Compute Units), PMUs (Pattern Memory Units), and AGCUs (Address Generation and Coalescing Units).

An example configuration file organization includes:
W (e.g., 28 in FIG. 3) Switch units, each unit requiring Z1 chunks of configuration bits;
X (e.g., 9) PCU units, each unit requiring Z2 chunks of configuration bits;
Y (e.g., 9) PMU units, each unit requiring Z3 chunks of configuration bits;
Z (e.g., 4) AGCU units, each unit requiring Z4 chunks of configuration bits.

Thus, the unit files for a first type of configurable unit can include Z1 chunks, and the unit files for a second type of configurable unit include Z2 chunks, where Z1 is less than Z2. The array configuration load process can include retrieving segments of the configuration file including chunk (i) of the unit files for all of the configurable units of the first type and the second type, in Z1 rounds for (i) going from 0 to Z1-1, and then retrieving segments of the configuration file including chunk (i) of the unit files for all of the configurable units of the second type, in subsequent rounds for (i) going from Z1 to Z2-1. The unit files for a third type of configurable unit can include Z3 chunks, and the unit files for a fourth type of configurable unit include Z4 chunks, where Z1 is less than Z2, Z2 is less than Z3, and Z3 is less than Z4. The distribution sequence can continue in this mode with one round for each chunk (i), covering all the different types of configurable units that require at least (i+1) chunks.

As shown in the example configuration file organization, the chunks of configuration data in a configuration file are arranged in an interleaved fashion:
• The first of 2 chunks of configuration bits for each of the switch units, for round R(i = 0);
• The first of 3 chunks of configuration bits for each of the PCU units, for round R(i = 0);
• The first of 5 chunks of configuration bits for each of the PMU units, for round R(i = 0);
• The first of 6 chunks of configuration bits for each of the AGCU units, for round R(i = 0);
• The second of 2 chunks of configuration bits for each of the switch units, for round R(i = 1);
• The second of 3 chunks of configuration bits for each of the PCU units, for round R(i = 1);
• The second of 5 chunks of configuration bits for each of the PMU units, for round R(i = 1);
• The second of 6 chunks of configuration bits for each of the AGCU units, for round R(i = 1);
• The third of 3 chunks of configuration bits for each of the PCU units, for round R(i = 2);
• The third of 5 chunks of configuration bits for each of the PMU units, for round R(i = 2);
• The third of 6 chunks of configuration bits for each of the AGCU units, for round R(i = 2);
• The fourth of 5 chunks of configuration bits for each of the PMU units, for round R(i = 3);
• The fourth of 6 chunks of configuration bits for each of the AGCU units, for round R(i = 3);
• The fifth of 5 chunks of configuration bits for each of the PMU units, for round R(i = 4);
• The fifth of 6 chunks of configuration bits for each of the AGCU units, for round R(i = 4);
• The sixth of 6 chunks of configuration bits for each of the AGCU units, for round R(i = 5).
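The interleaved ordering listed above can be generated mechanically from the per-type chunk counts. The following is a minimal sketch under that reading; the names `CHUNKS_PER_TYPE` and `interleaved_order` are hypothetical and do not come from the description.

```python
# Chunks per unit file for each unit type, in the order the types appear in the file.
# The counts (2, 3, 5, 6) are the ones used in the example list.
CHUNKS_PER_TYPE = [("Switch", 2), ("PCU", 3), ("PMU", 5), ("AGCU", 6)]


def interleaved_order(chunks_per_type):
    """Return (round i, unit type) pairs in the order they appear in the file."""
    rounds = max(count for _, count in chunks_per_type)
    order = []
    for i in range(rounds):
        for unit_type, count in chunks_per_type:
            if count > i:  # this type still has a chunk of order (i)
                order.append((i, unit_type))
    return order


ORDER = interleaved_order(CHUNKS_PER_TYPE)
```

Each type drops out of the sequence once its last chunk has been placed, which is why later rounds contain fewer entries.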

The unit files can be organized to comprise a plurality of ordered chunks (or other sized sub-files). The unit files particular to different configurable units may have different numbers of ordered chunks in some embodiments. The configuration file for an array of configurable units is arranged so that chunks of the unit files are grouped with chunks of the same order for other unit files. Also, the configuration file is arranged so that the location of a chunk in the configuration file implies the configurable unit in the array to which the chunk belongs and the order of the chunk in the unit file particular to that configurable unit.

The array configuration load process can retrieve segments of the configuration file including chunk (i) of the unit files for all of the configurable units of the first type (Switch type), the second type (PCU type), the third type (PMU type), and the fourth type (AGCU type), for (i) going from 0 to Z1-1 (= 1). Chunk (0) of the unit files for all of the configurable units of the four types is retrieved in a first round, and chunk (1) of the unit files for all of the configurable units of the four types is retrieved in a second round. After the first and second rounds, all (2) chunks of the unit files for all of the configurable units of the first type (Switch type) have been retrieved. The unit files for all of the configurable units of the first, second, third, and fourth types have 0, 1, 3, and 4 chunks remaining to be retrieved, respectively.

The array configuration load process can then retrieve segments of the configuration file in a third round, including chunk (i) of the unit files for all of the configurable units of the second, third, and fourth types. After the third round, all (3) chunks of the unit files for all of the configurable units of the second type (PCU type) have been retrieved. The unit files for all of the configurable units of the first, second, third, and fourth types have 0, 0, 2, and 3 chunks remaining to be retrieved, respectively.

The array configuration load process can then retrieve segments of the configuration file in a fourth round, including chunk (i) of the unit files for all of the configurable units of the third and fourth types. After the fourth round, four of the five chunks of the unit files for all of the configurable units of the third type (PMU type) have been retrieved. The unit files for all of the configurable units of the first, second, third, and fourth types have 0, 0, 1, and 2 chunks remaining to be retrieved, respectively.

The array configuration load process can then retrieve segments of the configuration file in fifth and sixth rounds, for (i) going from Z3-1 (= 4) to Z4-1 (= 5), including chunk (i) of the unit files for the configurable units of the third type in the fifth round and of the fourth type in both rounds. After the sixth round, all (6) chunks of the unit files for all of the configurable units of the fourth type (AGCU type) have been retrieved. The unit files for all of the configurable units of the first, second, third, and fourth types have 0, 0, 0, and 0 chunks remaining to be retrieved, respectively.

In the manner described above, the array configuration load process can continue until the unit files for all of the configurable units of the first, second, third, and fourth types have no chunks remaining to be retrieved.
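The remaining-chunk counts quoted for each round follow directly from the per-type chunk counts of the example (2, 3, 5, and 6). A short sketch of that arithmetic, with a hypothetical helper name:

```python
def remaining_after_each_round(chunk_counts):
    """For each completed round, list the chunks still to be retrieved per unit type."""
    rounds = max(chunk_counts)
    return [
        tuple(max(count - done, 0) for count in chunk_counts)
        for done in range(1, rounds + 1)
    ]


# Switch, PCU, PMU, AGCU chunk counts from the example organization.
REMAINING = remaining_after_each_round((2, 3, 5, 6))
```

Entry `REMAINING[k]` holds the counts after round k+1, so `REMAINING[1]` corresponds to the state after the first and second rounds.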

The array configuration load process routes chunks of the configuration data to configurable units via the array level network using addresses implied by the location of the chunks in the configuration file. For instance, the first of 2 chunks of the configuration data for each of the 198 switch units has linear memory addresses 0-12288, and the second of 2 chunks of the configuration data for each of the 198 switch units has linear memory addresses 33792-46080.

In some embodiments, the chunks of the configuration file may be returned out of order to the configuration load controller from memory. The location of the chunks in the configuration file can be used to route each chunk to the correct configurable unit. Because of the organization of the rounds in the distribution sequence, the configurable units are guaranteed to receive the chunks of their unit files in order.
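Routing by position can be sketched as a lookup from a chunk's offset in the file to its destination. The sketch below assumes, for illustration only, that each chunk occupies exactly one address unit and that the array holds 28 switch units, 9 PCUs, 9 PMUs, and 4 AGCUs with 2, 3, 5, and 6 chunks each; `UNIT_TYPES` and `locate` are hypothetical names.

```python
# (type name, number of units, chunks per unit file), in file order.
UNIT_TYPES = [("Switch", 28, 2), ("PCU", 9, 3), ("PMU", 9, 5), ("AGCU", 4, 6)]


def locate(offset, unit_types=UNIT_TYPES):
    """Map a chunk offset in the file to (chunk order i, unit type, unit index)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    rounds = max(chunks for _, _, chunks in unit_types)
    base = 0  # offset of the first chunk in the group being examined
    for i in range(rounds):
        for name, units, chunks in unit_types:
            if chunks <= i:
                continue  # this type has no chunk of order (i)
            if offset < base + units:
                return (i, name, offset - base)
            base += units
    raise ValueError("offset lies beyond the end of the configuration file")
```

Because the destination depends only on the offset, a chunk that returns from memory out of order can still be routed without any extra header.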

FIG. 10 is a state machine diagram illustrating one example of logic to execute an array configuration load process for a system like that of FIGS. 2 and 3, including distributing a configuration file comprising unit files for a plurality of the configurable units in the array, the unit files each comprising a plurality of ordered chunks (or sub-files), by sending in a sequence of N rounds (R(i) for i = 0 to N-1) one unit chunk of order (i) via the bus system to all of the configurable units including up to N sub-files in the plurality of configurable units, until the unit files in the configuration file are distributed to the configurable units in the plurality of configurable units.

In this example, the state machine includes six states S1 to S6. At State S1 (Idle), the configuration load controller waits for a configuration load command from the host. When a configuration load command is received, the load process enters State S2 to begin executing a first round R(0) of the distribution sequence. Each round traverses States S2 to S6. In the example described herein, there are six rounds because the maximum number of chunks to be distributed to a configurable unit in the array is six.

At State S2 (Switch Req), the configuration load controller generates memory access requests via the top level network to retrieve chunks for state S2 of round R(i) of the configuration unit files for respective switch units, and distributes the retrieved chunks to the respective switch units. For i=0, in round R(0), the configuration load controller generates memory access requests for chunk (0) in the multiple chunks for respective switch units, and sends chunk (0) to the respective switch units. For i=1, in round R(1), the configuration load controller generates memory access requests for chunk (1) in the multiple chunks for respective switch units, and sends the chunks to the respective switch units. In round R(i), when the configuration load controller has generated memory access requests for chunk (i) in the multiple chunks for the respective switch units, and distributed the chunks to all the switch units, the load process enters State S3.

At State S3 (PCU Req), the configuration load controller generates memory access requests via the top level network to retrieve chunks for round R(i) of the configuration unit files for respective PCU units (Pattern Compute Units), and distributes the retrieved chunks to the respective PCU units. In state S3 of round R(i), the configuration load controller generates memory access requests for chunk (i) in the multiple chunks for respective PCU units, and sends chunk (i) to the respective PCU units. In round R(i), when the configuration load controller has generated memory access requests for chunk (i) in the multiple chunks for the respective PCU units and distributed the chunks, the load process enters State S4.

At State S4 (PMU Req), the configuration load controller generates memory access requests via the top level network to retrieve chunks of the configuration unit files for respective PMUs (Pattern Memory Units) in the array of configurable units, and sends the retrieved chunks to the respective PMU units. In state S4 of round R(i), the configuration load controller generates memory access requests for chunk (i) in the multiple chunks for respective PMU units, and sends chunk (i) to the respective PMU units. For instance, for i=0, in round R(0), the configuration load controller generates memory access requests for chunk (0) in the multiple chunks for respective PMU units, and sends chunk (0) to the respective PMU units. For i=1, in round R(1), the configuration load controller generates memory access requests for chunk (1) in the multiple chunks for respective PMU units, and sends chunk (1) to the respective PMU units. In round R(i), when the configuration load controller has generated memory access requests for chunk (i) in the multiple chunks for the respective PMU units and distributed the chunks, the load process enters State S5.

At State S5 (AGCU Req), the configuration load controller generates memory access requests via the top level network to retrieve chunks of the configuration unit files for respective AGCUs (Address Generation and Coalescing Units) in the array of configurable units, and sends the retrieved chunks to the respective AGCU units. In State S5 of round R(i), the configuration load controller generates memory access requests for chunk (i) in the multiple chunks for respective AGCU units, and sends chunk (i) to the respective AGCU units. In State S5 of round R(i), when the configuration load controller has generated memory access requests for chunk (i) in the multiple chunks for the respective AGCU units and distributed the chunks, the load process enters State S6 of round R(i).

At State S6 (Response Wait), the configuration load controller waits to ensure that the configurable units (switch, PCU, PMU, AGCU units) in the array are ready to receive more chunks of configuration data in a next round. If all chunks for the switch units are not sent, the load process increments (i) and proceeds to State S2 to start the next round R(i+1). If all chunks for the switch units are sent but all chunks for the PCU units are not sent, the load process increments (i) and proceeds to State S3 to start the next round R(i+1). If all chunks for the switch units and the PCU units are sent but all chunks for the PMU units are not sent, the load process increments (i) and proceeds to State S4 to start the next round R(i+1). If all chunks for the switch units, the PCU units, and the PMU units are sent but all chunks for the AGCU units are not sent, the load process increments (i) and proceeds to State S5 to start the next round R(i+1). If all chunks for all configurable units (switch, PCU, PMU, AGCU units) are sent (i.e., all rounds complete), the load process proceeds to State S1.
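The state transitions described for FIG. 10 can be traced with a small simulation. This is a sketch only: it assumes the per-type chunk counts of the example (2, 3, 5, 6, non-decreasing from switch to AGCU), and `REQUEST_STATES` and `load_trace` are hypothetical names.

```python
REQUEST_STATES = ["S2", "S3", "S4", "S5"]  # Switch, PCU, PMU, AGCU requests


def load_trace(chunk_counts=(2, 3, 5, 6)):
    """Return the states visited while serving one configuration load command.

    chunk_counts[k] is the number of chunks per unit for REQUEST_STATES[k];
    the counts are assumed to be positive and non-decreasing.
    """
    trace = ["S1"]  # idle until the load command arrives
    i = 0           # current round R(i)
    start = 0       # index of the first request state visited in this round
    while True:
        trace.extend(REQUEST_STATES[start:])  # request and distribute chunk (i)
        trace.append("S6")                    # wait until the units are ready
        i += 1
        pending = [k for k, count in enumerate(chunk_counts) if count > i]
        if not pending:
            trace.append("S1")                # all rounds complete
            return trace
        start = pending[0]                    # skip types whose chunks are all sent
```

With the example counts, each request state is visited as many times as its unit type has chunks, and State S6 is visited once per round.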

FIG. 11 is a timing diagram illustrating the timing of early rounds of a distribution sequence like that of FIG. 10. In this example, a chunk of the configuration unit file has a number B of bits of data (e.g., B = 128), a round in the distribution sequence can include a number X of configurable units, and an array of configurable units can include a number Y of configurable units (e.g., Y = 148). For round R(0), X can be equal to Y. In subsequent rounds, X can be less than or equal to Y.

In this example, round R(0) includes Y = 148 configurable units. For rounds R(0) and R(1), X = Y. After the first two rounds R(0) and R(1), the switch units have received all (2) of their chunks, so the third round R(2) includes fewer than 128 configurable units.

As shown in the example of FIG. 11, in round R(0), a first chunk P11 of the configuration unit file is received at a configurable unit via the bus system in a first bus cycle C0. The first chunk is then loaded into the configuration store of a first configurable unit "Unit 1" by serially shifting, in a parallel task at the configurable unit, the B bits of data in the first chunk P11 in B clock cycles (which can run at the same rate as the bus clock), while other chunks of the round are distributed by the configuration load controller to other configurable units. A second chunk P21 of the configuration file is received via the bus system in a second bus cycle C1. The second chunk is then loaded in a parallel task into the configuration store of a second configurable unit "Unit 2", by serially shifting the B bits of data in the second chunk P21 in B clock cycles. A third chunk P31 of the configuration file is received via the bus system in a third bus cycle C2. The third chunk P31 is then loaded into the configuration store of a third configurable unit "Unit 3", by serially shifting the B bits of data in the third chunk P31 in B clock cycles. This round proceeds until all the configurable units receive the first chunk of the unit file particular to them.

Round R(0) includes distributing a first set of Y chunks of the configuration file (P11, P21, P31 … PY1) to the Y respective configurable units (Unit 1 … Unit Y) in the array. A chunk of the configuration file has a number B of bits of data, and the array of configurable units has the number Y of configurable units. When round R(0) is completed, the Y chunks of the configuration file (P11, P21, P31 … PY1) in the first set have been received at the Y configurable units in the array in Y bus cycles (C0 to CY-1), and the first chunk P11 has been loaded or serially shifted into the configuration store of the first configurable unit "Unit 1" in B clock cycles. The B clock cycles are subsequent to the first clock cycle C0 in which the first chunk P11 is received.

The next round R(1) includes receiving a second set of Y chunks of the configuration file (P12, P22, P32 … PY2) at the Y respective configurable units (Unit 1 … Unit Y) in the array. When round R(1) is complete, the Y chunks of the configuration file in the second set (P12, P22, P32 … PY2) have been received at the Y respective configurable units in the array in Y bus cycles (Cy to C2y-1). When round R(1) is complete, the second chunk P12 for the first configurable unit "Unit 1" has also been loaded, that is, serially shifted, into the configuration store of "Unit 1" in the B clock cycles that follow the first clock cycle (Cy) of round R(1). Likewise, when the second round is complete, the last chunk PY1 of the first set of Y chunks received in round R(0) has been loaded, that is, serially shifted, into the configuration store of the last configurable unit "Unit Y".

As long as the number B of bits in a chunk (128) is less than the number X of configurable units in a round, a configurable unit receives the next chunk of its unit configuration file only after the previous chunk has been loaded, so the configurable unit should be ready without requiring the sequence to stall. In this example, the number B of bits in a chunk is 128, and the number X of configurable units in round R(0) is X = Y = 148. Because serially shifting the 128 bits of a chunk into the configuration data store of a configurable unit takes 128 clock cycles, there are effectively 20 (Y - B = 148 - 128) buffer cycles after the shifting is complete, which ensures that the first configurable unit "Unit 1" is ready to accept the next chunk (P12) in the next round R(1). When the number B of bits in a chunk is greater than the number X of configurable units in a round, the next chunk may be received while the previous chunk is still being consumed. Here, "consumed" refers to serially shifting the bits of a chunk into the configuration data store of a configurable unit.
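The timing argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the patented logic: it assumes one chunk is delivered per bus cycle, that units are served in a fixed order, and that every round serves the same number of units. The function names are illustrative only.

```python
def receive_cycle(unit_index: int, round_index: int, units_per_round: int) -> int:
    """Bus cycle in which a unit receives its chunk for a given round.

    Assumes (for illustration) one chunk per bus cycle and a fixed unit
    order, so Unit 1 (index 0) receives in cycle 0 of round 0.
    """
    return round_index * units_per_round + unit_index


def slack_cycles(bits_per_chunk: int, units_per_round: int) -> int:
    """Buffer cycles left after a chunk is shifted in, before the next arrives.

    Shifting takes one clock cycle per bit, and the next chunk for the same
    unit arrives one full round later. A negative result means the next
    chunk arrives while the previous one is still being consumed.
    """
    return units_per_round - bits_per_chunk
```

With the figures from the text, `slack_cycles(128, 148)` evaluates to 20, and `receive_cycle(0, 1, 148)` evaluates to 148, the first bus cycle of round R(1).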

In general, the unit configuration load process receives a first chunk (or sub-file) of the unit file specific to the configurable unit from the bus system in one bus cycle; begins pushing the received first chunk into the serial chain during subsequent bus cycles, before a second chunk of the unit file is received in the next round; receives the second chunk of the unit file specific to the configurable unit from the bus system in a later bus cycle, for the next round of the sequence; and begins pushing the received second chunk into the serial chain during cycles of the sequence after the earlier received chunk has been pushed into the serial chain. In some rounds, all of a received chunk can be consumed before the next chunk is received.

Because different types of configurable units can have different numbers of configuration bits, the configurable units may require different numbers of chunks. Once the configurable units that require fewer chunks have loaded all of their configuration bits, the configuration load controller stops sending data to them. This results in fewer configurable units (the number X) being interleaved, and can lead to a configurable unit receiving a new chunk before it has finished processing the previous chunk. This can cause back-pressure on the array-level network.
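To make the shrinking round size concrete, the sketch below counts how many units still take part in each round when unit files have different lengths. The unit counts used in the usage note are hypothetical and are not taken from the source.

```python
def units_in_round(chunks_per_unit: list[int], round_index: int) -> int:
    """Number of units that still need a chunk in round `round_index`.

    A unit whose unit file has n chunks takes part in rounds 0 .. n-1.
    """
    return sum(1 for n in chunks_per_unit if n > round_index)


def may_receive_early(chunks_per_unit: list[int], round_index: int,
                      bits_per_chunk: int) -> bool:
    """True if a round is too short to hide the serial shift of one chunk.

    Uses the condition from the text: a unit is guaranteed to be ready only
    while the number of units in the round is at least the chunk size in bits.
    """
    return units_in_round(chunks_per_unit, round_index) < bits_per_chunk
```

For a hypothetical array of 100 units needing 2 chunks and 48 units needing 4 chunks, rounds 0 and 1 interleave 148 units, while rounds 2 and 3 interleave only 48, so with 128-bit chunks the later rounds are where back-pressure would be expected.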

The back-pressure can be handled on the array-level network via a credit mechanism. For example, each input FIFO can have hop-to-hop credits, so if the input FIFO of a PCU fills up, no switch in the array-level network that is trying to send configuration data to that input FIFO can send data until the input FIFO frees one entry and returns a credit to the sending switch. Eventually, because the links are busy, the back-pressure may stall the AGCU from sending data. However, once the configurable unit consumes all 128 bits of a chunk, it frees one input FIFO entry, a credit is released, and the sender can then send a new chunk, if one is available.
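The credit mechanism can be sketched as a sender that holds one credit per free FIFO entry. This is a simplified single-hop model under assumed behavior; the class name, the FIFO depth, and the method names are illustrative and do not come from the source.

```python
from collections import deque


class CreditedLink:
    """One hop: a sender holding credits for a receiver's input FIFO."""

    def __init__(self, fifo_depth: int) -> None:
        self.credits = fifo_depth      # one credit per free FIFO entry
        self.fifo: deque = deque()     # receiver's input FIFO

    def try_send(self, chunk) -> bool:
        """Send a chunk if a credit is available; otherwise the sender stalls."""
        if self.credits == 0:
            return False               # back-pressure: FIFO is full
        self.credits -= 1
        self.fifo.append(chunk)
        return True

    def consume(self):
        """Receiver finishes a chunk, frees the entry, and returns a credit."""
        chunk = self.fifo.popleft()
        self.credits += 1
        return chunk
```

In this model a stalled `try_send` succeeds again as soon as `consume` has been called once, which mirrors the release of a credit after a full chunk has been shifted in.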

FIG. 12 is a flowchart illustrating a unit configuration load process in a configurable unit. At step 1221, the unit configuration load process waits for an input FIFO (610, FIG. 6) to become valid. When it is valid, the input FIFO has received, via the bus system, a chunk of configuration data of the configuration file for configuring the configurable unit. When the input FIFO is valid, the flow proceeds to step 1222.

At step 1222, the input FIFO is dequeued. At step 1223, the chunk of configuration data from the input FIFO is loaded in parallel into an input shift register (620, FIG. 6). At step 1224, the chunk of configuration data in the input shift register is shifted into a configuration serial chain in the configuration data store of the configurable unit.

At step 1225, the unit configuration load process determines whether the loaded chunk of configuration data is the last chunk of configuration data for the configurable unit. If so, loading of configuration data for the configurable unit is complete. If not, the flow returns to step 1221, and the unit configuration load process waits for the input FIFO to become valid for the next chunk of configuration data. The unit configuration load process in a configurable unit is further described with reference to FIGS. 5 and 6.
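Steps 1221 to 1225 can be modeled in software as a loop over the chunks of a unit file. The sketch below is a behavioral illustration only: the hardware wait at step 1221 is replaced by an error on an empty FIFO, and the bit order within a chunk is an assumption that the text does not specify.

```python
from collections import deque


def unit_config_load(input_fifo: deque, num_chunks: int) -> list[int]:
    """Load `num_chunks` chunks from the input FIFO into a serial chain.

    Each chunk is a sequence of bits. Returns the serial chain contents in
    the order the bits were shifted in (assumed order, for illustration).
    """
    serial_chain: list[int] = []
    loaded = 0
    while loaded < num_chunks:                 # step 1225: last chunk yet?
        if not input_fifo:                     # step 1221: hardware would wait here
            raise RuntimeError("input FIFO is not valid (no chunk available)")
        chunk = input_fifo.popleft()           # step 1222: dequeue the input FIFO
        input_shift_register = list(chunk)     # step 1223: parallel load
        for bit in input_shift_register:       # step 1224: serial shift, 1 bit/cycle
            serial_chain.append(bit)
        loaded += 1
    return serial_chain
```

A caller would fill the deque with the chunks routed to the unit and pass the number of chunks in its unit file, for example `unit_config_load(deque([[1, 0], [0, 1]]), 2)`.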

FIG. 13 is a state machine diagram illustrating one example of logic to execute an array configuration unload process for a system like that of FIGS. 2 and 3.

In this example, the state machine includes three states S1 to S3. In state S1 (Idle), the configuration unload controller waits for a configuration unload command from the host. The configuration unload controller implements two counts for the array configuration unload process, "next_unld_req_count" and "next_unld_resp_count". The count "next_unld_req_count" keeps track of the next unload request count. The count "next_unld_resp_count" keeps track of the next unload response count. In state S1, both counts are reset to an initial value, such as 0. When a configuration unload command is received, the unload process enters state S2.

In state S2 (Generate Requests), the configuration unload controller generates an unload request for each of the configurable units in the array of configurable units, including the switch units, the PCUs, the PMUs, and the AGCUs in the array. The count "next_unld_req_count" is incremented for each unload request generated. The count "next_unld_req_count" is compared against a predetermined number PROGRAM_UNLOAD_REQ_COUNT, which represents the total number of configurable units in the array of configurable units. As long as the count "next_unld_req_count" is less than PROGRAM_UNLOAD_REQ_COUNT, the unload process stays in state S2. When the count "next_unld_req_count" is equal to PROGRAM_UNLOAD_REQ_COUNT, an unload request has been generated for each of the configurable units in the array, and the unload process enters state S3.

In state S3 (Response Wait), the configuration unload controller increments the count "next_unld_resp_count" for each response received from the configurable units in the array. A response includes a chunk (sub-file) of the unit file of configuration data for a configurable unit. In some examples, a response can also include PMU scratchpad data. During the unload process, a response is provided to a vector output of a configurable unit and sent on the vector bus to the configuration load controller. As long as the count "next_unld_resp_count" is less than PROGRAM_UNLOAD_REQ_COUNT, the unload process stays in state S3.

In state S3, the unload process generates a memory address for each response received, and inserts each received response, along with the generated memory address, on the top-level network. Each response includes an unload chunk and a sequence ID. The memory address is generated from the headers that accompany the packets carrying the chunks in the array-level network, which include a chunk number expressed by the sequence ID, a column identifier, a row identifier, and a component identifier. The component identifier can indicate whether a configurable unit is a switch unit, a PCU, a PMU, or an AGCU. The sequence ID is further described with reference to FIG. 3.

When the count "next_unld_resp_count" is equal to PROGRAM_UNLOAD_REQ_COUNT, responses have been received from each of the configurable units in the array and inserted on the top-level network, and the unload process transitions back to state S1.
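The three-state unload controller can be summarized as a small state machine. The following sketch follows the counters and transitions described above; the class and method names are illustrative, the events are driven by the caller rather than by hardware, and it assumes a positive unit count.

```python
from enum import Enum


class State(Enum):
    IDLE = "S1"
    GENERATE_REQUESTS = "S2"
    WAIT_RESPONSES = "S3"


class UnloadController:
    """Behavioral model of the array configuration unload state machine."""

    def __init__(self, program_unload_req_count: int) -> None:
        # Total number of configurable units in the array (must be > 0 here).
        self.program_unload_req_count = program_unload_req_count
        self._enter_idle()

    def _enter_idle(self) -> None:
        # S1: both counts are reset to their initial value.
        self.state = State.IDLE
        self.next_unld_req_count = 0
        self.next_unld_resp_count = 0

    def on_unload_command(self) -> None:
        """Host issues a configuration unload command: S1 -> S2."""
        if self.state is not State.IDLE:
            raise RuntimeError("unload already in progress")
        self.state = State.GENERATE_REQUESTS

    def generate_request(self) -> None:
        """Generate one unload request; S2 -> S3 once every unit has one."""
        if self.state is not State.GENERATE_REQUESTS:
            raise RuntimeError("not in state S2")
        self.next_unld_req_count += 1
        if self.next_unld_req_count == self.program_unload_req_count:
            self.state = State.WAIT_RESPONSES

    def on_response(self) -> None:
        """Count one received response; S3 -> S1 once the count is reached."""
        if self.state is not State.WAIT_RESPONSES:
            raise RuntimeError("not in state S3")
        self.next_unld_resp_count += 1
        if self.next_unld_resp_count == self.program_unload_req_count:
            self._enter_idle()
```

The model compares both counters against the same constant, as the text does; address generation and insertion on the top-level network are left out to keep the state transitions visible.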

In one embodiment, the order of the linear memory addresses for configuration data in the switch units is the first chunk of each row in the first column of switch units, followed by the first chunk of each row in the second column of switch units, followed by the first chunk of each row in the third column of switch units, and so on, until the first chunk of each row in the last column. This groups the first chunks of all switch units in the linear address space. The first chunks of the other types of configurable units are loaded in groups in adjacent address space. The order then continues with the second chunk of each row in the first column of switch units, followed by the second chunk of each row in the second column of switch units, followed by the second chunk of each row in the third column of switch units, and so on, until the last chunk of the last row in the last column of switch units, and likewise for the second chunks of all types of configurable units.

Using the order of memory addresses for configuration data in the switch units described above, the pseudo code below illustrates how to generate a linear memory address for a switch unit (comp_switch). The pseudo code uses 4 inputs: comp_id: component identifier; comp_col: column identifier; comp_row: row identifier; comp_chunk: chunk number; and produces one output: linear_address: the linear memory address for an unload chunk.

The pseudo code to generate a linear memory address for a particular unload chunk of a switch unit is as follows:

Figure 02_image001

where
• comp_switch indicates a switch unit;
• NUM_ROW_SW is the number of rows of all switch units;
• COMP_COUNT_ALL is the total number of all configurable units.

To generate a linear memory address for a particular unload chunk of a PCU, a PMU, or an AGCU, similar code can be used. One difference is that the number of rows of all switch units differs from the number of rows of all PCUs, the number of rows of all PMUs, and the number of rows of all AGCUs. Another difference is that the linear memory addresses for the switch units can start at a base address (e.g., 0), while the linear memory addresses for the PCUs, the PMUs, and the AGCUs start at the address after the last chunk of the switch units, the PCUs, and the PMUs, respectively.
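Since the pseudo code itself survives only as an image reference, the following is a hedged reconstruction from the ordering described in the text, not a transcription of the figure. It assumes addresses are counted in units of chunks, that units of one type are ordered column by column with rows varying fastest, and that each unit type's block follows the previous type's block within a chunk group. The `UnitType` record, the type names, and the array dimensions in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitType:
    name: str        # e.g. "switch", "pcu", "pmu", "agcu" (illustrative names)
    num_rows: int    # number of rows of units of this type (cf. NUM_ROW_SW)
    num_cols: int    # number of columns of units of this type


def linear_address(types: list[UnitType], comp_id: str, comp_col: int,
                   comp_row: int, comp_chunk: int) -> int:
    """Linear address (in chunks) of an unload chunk, under the stated assumptions.

    `types` lists the unit types in address order, switch units first.
    """
    # Total number of configurable units (cf. COMP_COUNT_ALL).
    comp_count_all = sum(t.num_rows * t.num_cols for t in types)
    base = 0  # offset of this type's block within one chunk group
    for t in types:
        if t.name == comp_id:
            return (comp_chunk * comp_count_all   # skip earlier chunk groups
                    + base                        # skip earlier unit types
                    + comp_col * t.num_rows       # skip earlier columns
                    + comp_row)                   # row within the column
        base += t.num_rows * t.num_cols
    raise ValueError(f"unknown component identifier: {comp_id!r}")
```

For a hypothetical layout of 3 rows by 2 columns of switch units and 2 rows by 2 columns of PCUs, the first chunk of the switch unit at column 1, row 0 maps to address 3, the first chunk of the first PCU maps to address 6, and the second chunk of the first switch unit maps to address 10.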

FIG. 14 is a flowchart illustrating a unit configuration unload process in a configurable unit. At step 1431, a chunk of configuration data from the configuration serial chain in the configuration data store is serially shifted into an output shift register (650, FIG. 6). The flow then proceeds to step 1432.

At step 1432, the unit configuration unload process waits for an output FIFO (660, FIG. 6), or another type of output buffer circuit, to become valid. At step 1433, when the output FIFO becomes valid, the chunk of configuration data from the output shift register is inserted into the output FIFO. At step 1434, the chunk of configuration data in the output FIFO is written to the bus system (FIG. 3).

At step 1435, the unit configuration unload process determines whether the first chunk of configuration data is the last chunk of configuration data in the configuration data store. If so, unloading of configuration data for the configurable unit is complete. If not, the flow transitions back to step 1431, and a second chunk of configuration data from the configuration data store is serially shifted into the output shift register.
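The unload flow of steps 1431 to 1435 mirrors the load flow and can be modeled the same way. In this sketch the output FIFO is assumed to always be able to accept a chunk, so the wait at step 1432 never blocks, and the bus is represented by a plain list; both are simplifications for illustration.

```python
from collections import deque


def unit_config_unload(serial_chain: list[int], bits_per_chunk: int) -> list[list[int]]:
    """Unload a non-empty serial chain as a list of chunks written to the bus."""
    if not serial_chain or bits_per_chunk <= 0:
        raise ValueError("need a non-empty serial chain and a positive chunk size")
    bus: list[list[int]] = []
    output_fifo: deque = deque()
    position = 0
    while True:
        # Step 1431: serially shift one chunk into the output shift register.
        output_shift_register = serial_chain[position:position + bits_per_chunk]
        position += bits_per_chunk
        # Step 1432: wait for the output FIFO (modeled as always ready).
        # Step 1433: insert the chunk into the output FIFO.
        output_fifo.append(output_shift_register)
        # Step 1434: write the chunk from the output FIFO to the bus system.
        bus.append(output_fifo.popleft())
        # Step 1435: stop after the last chunk in the configuration data store.
        if position >= len(serial_chain):
            return bus
```

Calling `unit_config_unload([1, 0, 0, 1, 1, 1], 2)` yields three two-bit chunks in chain order.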

While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than in a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims.

110: reconfigurable data processor
115: bus system
120: host
130: external I/O interface
140: memory
150: external I/O interface
170: external clock generator
175: clock signal
190: array of configurable units
195: configuration load/unload controller
205: external I/O interface
211, 212, 213, 214, 215, 216: top-level switch
300: array of configurable units
301: configuration load/unload controller
311, 312: switch unit
320: link
321: interconnect
331: link
341: pattern memory unit
400: configurable unit
410: input configuration data
420: configuration data store
421, 422: line
430: output configuration data
440: unit configuration load logic
450: scalar FIFO
460: FIFO block
470: control block
480: block
491: daisy-chained completion bus
492: daisy-chained command bus
493: daisy chain logic
610: input FIFO
620: input shift register
630, 640: configuration serial chain
650: output shift register
660: output FIFO
670: selector
711, 712, 714, 716: step
810, 811, 812, 813, 814, 815, 816: step
1221, 1222, 1223, 1224, 1225: step
1431, 1432, 1433, 1434, 1435: step
A0, A1, A2, A3, A4, A5: address
AG: address generator
AGCU12, AGCU13, AGCU14, AGCU22, AGCU23, AGCU24: address generation and coalescing unit
C0, C1, C2, C2y-1: bus cycle
CU: coalescing unit
Cy, Cy+1, Cy+2, Cy-1: bus cycle
L11, L12, L21, L22: link
LSB: least significant bit
MAGCU1, MAGCU2: address generation and coalescing unit
MSB: most significant bit
P11, P12, P21, P22, P31, P32: chunk of the configuration file
PCU: pattern compute unit
PMU: pattern memory unit
PY1, PY2: chunk of the configuration file
S: switch unit
Stage 1, Stage N: stage
Tile1, Tile2: tile

[FIG. 1] is a system diagram illustrating a system including a host, a memory, and a reconfigurable data processor.

[FIG. 2] is a simplified block diagram of a top-level network and components of a Coarse Grain Reconfigurable Architecture (CGRA).

[FIG. 3] is a simplified diagram of a tile and an array-level network usable in the configuration of FIG. 2, where the configurable units in the array are nodes on the array-level network.

[FIG. 3A] illustrates an example of a switch unit connecting elements in an array-level network.

[FIG. 4] is a block diagram illustrating an example configurable unit.

[FIG. 5] illustrates one example of a state machine diagram that can be used to control a unit configuration load process in a configurable unit.

[FIG. 6] is a logical representation of a structure supporting loading of a configuration store in a configurable unit.

[FIG. 7] is a flowchart illustrating operations of a host coupled to a reconfigurable data processor.

[FIG. 8] is a flowchart illustrating operations of the configuration load controller, which can be part of a master AGCU or otherwise in communication with the configurable units of the array in a tile.

[FIG. 9] illustrates an example organization of a configuration file.

[FIG. 10] is a state machine diagram illustrating one example of logic to execute an array configuration load process for a system like that of FIGS. 2 and 3.

[FIG. 11] is a timing diagram illustrating the timing of early rounds of a distribution sequence like that of FIG. 10.

[FIG. 12] is a flowchart illustrating a unit configuration load process in a configurable unit.

[FIG. 13] is a state machine diagram illustrating one example of logic to execute an array configuration unload process for a system like that of FIGS. 2 and 3.

[FIG. 14] is a flowchart illustrating a unit configuration unload process in a configurable unit.

110: reconfigurable data processor

115: bus system

120: host

130: external I/O interface

140: memory

150: external I/O interface

170: external clock generator

175: clock signal

190: array of configurable units

195: configuration load/unload controller

Claims (60)

一種可重組態資料處理器,包含: 一匯流排系統; 連接至該匯流排系統之一陣列的可組態單元,於該陣列中之可組態單元包括組態資料儲存器以儲存包含特定於對應的可組態單元之組態資料的複數個子檔案之單元檔案; 其中於該複數個可組態單元中之可組態單元各包括用以執行一單元組態加載處理之邏輯,該單元組態加載處理包括經由該匯流排系統來接收特定於該可組態單元之一單元檔案的子檔案、及將所接收的子檔案加載至該可組態單元之組態儲存器內;及 連接至該匯流排系統之一組態加載控制器,包括用以執行一陣列組態加載處理之邏輯,該陣列組態加載處理包括對於於該陣列中之複數個可組態單元分配包含單元檔案之一組態檔案,該等單元檔案各包含複數個排序過的子檔案,藉由在N個回合(R(i),i從0至N-1)的序列中經由該匯流排系統將次序(i)的一個單元子檔案發送至於該複數個可組態單元中之所有的包括最多(i+1)個子檔案之可組態單元。A reconfigurable data processor comprising: a busbar system; Configurable cells connected to an array of the bus system, the configurable cells in the array including configuration data storage to store a plurality of subfiles containing configuration data specific to the corresponding configurable cells unit file; wherein each of the configurable units in the plurality of configurable units includes logic for performing a unit configuration loading process that includes receiving, via the bus system, a process specific to the configurable unit a subfile of a unit file, and loading the received subfile into the configuration memory of the configurable unit; and A configuration load controller connected to the bus system includes logic for executing an array configuration load process, the array configuration load process including assigning a cell file to a plurality of configurable cells in the array A configuration file, each of the unit files containing a plurality of ordered subfiles, by the bus system in a sequence of N rounds (R(i), i from 0 to N-1) A unit subfile of (i) is sent to all configurable units including at most (i+1) subfiles among the plurality of configurable units. 如請求項1之處理器,其中該複數個可組態單元包括於該陣列的可組態單元中之所有的可組態單元,且用於該等可組態單元中之一或多者的該單元檔案實現一無操作組態。The processor of claim 1, wherein the plurality of configurable cells comprise all of the configurable cells in the array and are used for the processing of one or more of the configurable cells The unit file implements a no-op configuration. 
如請求項1之處理器,其中於該複數個可組態單元中之可組態單元中的該等組態資料儲存器包含序列鏈,且該單元組態加載處理於一個匯流排週期中從該匯流排系統接收特定於該可組態單元之該單元檔案的一第一子檔案、在該單元檔案之一第二子檔案被接收之前在隨後的匯流排週期期間開始將所接收的第一子檔案推入該序列鏈內、於一稍後的匯流排週期中對於該序列之下一個回合從該匯流排系統接收特定於該可組態單元之該單元檔案的該第二子檔案、及在將稍早所接收的子檔案推入該序列鏈內之後在該序列之週期期間開始將所接收的該第二子檔案推入該序列鏈內。The processor of claim 1, wherein the configuration data stores in the configurable cells of the plurality of configurable cells comprise serial chains, and the cell configuration load processing from The bus system receives a first sub-file of the cell file specific to the configurable cell, begins to process the received first sub-file during a subsequent bus cycle before a second sub-file of the cell file is received subfiles are pushed into the sequence chain, the second subfile of the cell file specific to the configurable cell is received from the bus system for the next turn of the sequence in a later bus cycle, and Pushing the second subfile received into the sequence chain begins during the cycle of the sequence after the earlier received subfile is pushed into the sequence chain. 如請求項3之處理器,其中於該複數個排序過的子檔案中之該第二子檔案被該可組態單元接收之前,該第一子檔案於該可組態單元中被該單元組態加載處理消耗。The processor of claim 3, wherein the first subfile is grouped by the unit in the configurable unit before the second subfile of the plurality of sorted subfiles is received by the configurable unit state loading processing consumption. 如請求項1之處理器,其中該陣列組態加載處理包括從一主處理接收識別該組態檔案於記憶體中之位置的組態加載命令、及因應該命令而產生一或多個記憶體存取請求以擷取該組態檔案。The processor of claim 1, wherein the array configuration load process includes receiving a configuration load command from a host process identifying the location of the configuration file in memory, and generating one or more memories in response to the command Access request to retrieve the configuration file. 
如請求項1之處理器,其中對於在複數個可組態單元中之各可組態單元,該組態檔案包括單元檔案之複數個子檔案,該等子檔案被以符合該序列的交錯方式設置於該組態檔案中,且其中該陣列組態加載處理包括基於該等子檔案於該組態檔案中之位置將該等子檔案路由至可組態單元。The processor of claim 1, wherein for each configurable unit in the plurality of configurable units, the configuration file includes a plurality of sub-files of the unit file, the sub-files being arranged in an interleaved manner consistent with the sequence in the configuration file, and wherein the array configuration loading process includes routing the subfiles to configurable units based on their location in the configuration file. 如請求項1之處理器,其中子檔案具有數量N位元的資料,且該匯流排系統經組構以在一個匯流排週期中傳送N位元的資料。The processor of claim 1, wherein the subfile has an amount of N bits of data, and the bus system is configured to transfer the N bits of data in one bus cycle. 如請求項7之處理器,其中於該複數個可組態單元中之可組態單元中的該等組態資料儲存器包含序列鏈,且該單元組態加載處理於一個匯流排週期中從該匯流排系統接收特定於該可組態單元之該單元檔案的一第一子檔案、在N個隨後的匯流排週期期間將所接收的第一子檔案推入該序列鏈內、及於一稍後的匯流排週期中從該匯流排系統接收特定於該可組態單元之該單元檔案的一第二子檔案、及在將稍早所接收的子檔案推入該序列鏈內之後在N個隨後的匯流排週期期間將所接收的第二子檔案推入該序列鏈內。The processor of claim 7, wherein the configuration data stores in the configurable cells of the plurality of configurable cells comprise serial chains, and the cell configuration load processing is performed in one bus cycle from The bus system receives a first subfile of the cell file specific to the configurable cell, pushes the received first subfile into the sequence chain during N subsequent bus cycles, and at a A second subfile of the cell file specific to the configurable cell is received from the bus system in a later bus cycle, and at N after pushing the earlier received subfile into the sequence chain The received second subfile is pushed into the sequence chain during a subsequent bus cycle. 如請求項8之處理器,其中該陣列包括多於N個的可組態單元。The processor of claim 8, wherein the array includes more than N configurable cells. 
如請求項1之處理器,其中該陣列包括多於一個類型的可組態單元,且用於不同類型的可組態單元之該等單元檔案包括不同數量的組態資料之子檔案。The processor of claim 1, wherein the array includes more than one type of configurable cell, and the cell files for different types of configurable cells include different numbers of subfiles of configuration data. 如請求項1之處理器,其中用於一第一類型的可組態單元之該等單元檔案包括Z1子檔案,且用於一第二類型的可組態單元之該等單元檔案包括Z2子檔案,其中Z1小於Z2,且該陣列組態加載處理包括: 擷取包括用於所有第一類型與第二類型的可組態單元之該等單元檔案的子檔案(i)之該組態檔案的片段,其中(i)從0至Z1-1,及接著擷取包括用於所有第二類型的可組態單元之該等單元檔案的子檔案(i)之該組態檔案的片段,其中(i)從Z1至Z2-1。The processor of claim 1, wherein the unit files for a configurable unit of a first type include Z1 sub-files, and the unit files for a second type of configurable unit include Z2 sub-files file, where Z1 is less than Z2, and the array configuration load process includes: Retrieve a segment of the configuration file that includes sub-file (i) of the unit files for all configurable units of the first type and second type, where (i) goes from 0 to Z1-1, and then A segment of the configuration file including sub-file (i) of the unit files for all configurable units of the second type is retrieved, where (i) is from Z1 to Z2-1. 如請求項1之處理器,其中於該陣列的可組態單元中之可組態單元包括個別的以在該陣列組態加載邏輯處開始與結束之菊鍊連接的加載完成狀態邏輯。The processor of claim 1, wherein the configurable cells in the array of configurable cells include individual load completion status logic connected in a daisy chain starting and ending at the array configuration load logic. 如請求項12之處理器,其中該陣列組態加載邏輯在該組態檔案被分配之後於該菊鍊上轉送加載完成訊號,且在該陣列中之各可組態單元中,當來自該菊鍊之一先前的成員之該加載完成訊號被接收且其單元檔案之加載被完成時,該加載完成狀態邏輯於該菊鍊上轉送該加載完成訊號。The processor of claim 12, wherein the array configuration load logic forwards a load complete signal on the daisy chain after the configuration file is allocated, and in each configurable unit in the array, when the When the load complete signal is received for a previous member of the chain and the loading of its cell file is complete, the load complete state logic forwards the load complete signal on the daisy chain. 
如請求項1之處理器,其中該匯流排系統包括一頂層網路與一陣列層網路,該頂層網路包括一外部資料介面與一陣列介面,且該陣列層網路係連接至該陣列介面及至該陣列的可組態單元中之該等可組態單元。The processor of claim 1, wherein the bus system includes a top layer network and an array layer network, the top layer network includes an external data interface and an array interface, and the array layer network is connected to the array interface and to the configurable units of the array's configurable units. 如請求項14之處理器,其中該陣列組態加載處理包括從一主處理接收識別該組態檔案於記憶體中之位置的組態加載命令、及因應該命令經由該頂層網路而產生一或多個記憶體存取請求以透過該外部資料介面來擷取該組態檔案。The processor of claim 14, wherein the array configuration load process includes receiving a configuration load command from a host process identifying the location of the configuration file in memory, and generating a configuration load command via the top-level network in response to the command or more memory access requests to retrieve the configuration file through the external data interface. 如請求項15之處理器,其中該陣列組態加載處理使用於該組態檔案中之該等子檔案的位置所暗示之位址將該組態資料之子檔案經由該陣列層網路路由至該等可組態單元。The processor of claim 15, wherein the array configuration load process routes the subfiles of configuration data to the array layer network using addresses implied by the locations of the subfiles in the configuration file and other configurable units. 如請求項1之處理器,其中於該複數個可組態單元中之可組態單元在執行期間在組態亦被使用於該組態加載處理中之後使用於該匯流排系統中之路由。The processor of claim 1, wherein the configurable elements of the plurality of configurable elements are used for routing in the bus system during execution after configuration is also used in the configuration loading process. 
A method for operating a reconfigurable data processor comprising a bus system and an array of configurable units connected to the bus system, configurable units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units, the method comprising: distributing a configuration file comprising unit files for a plurality of the configurable units in the array, the unit files each comprising a plurality of ordered sub-files, by sending, in a sequence of N rounds (R(i) for i from 0 to N-1), one unit sub-file of order (i) via the bus system to all of the configurable units in the plurality of configurable units that include up to (i+1) sub-files; and receiving sub-files of a unit file particular to the configurable unit, and loading the received sub-files into the configuration store of the configurable unit.
The method of claim 18, wherein the plurality of configurable units includes all of the configurable units in the array of configurable units, and the unit file for one or more of the configurable units implements a no-operation configuration.
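The round-based distribution recited in the method above can be sketched as follows. This is an illustrative model, not the claimed implementation: the function name `distribute_in_rounds` and the unit identifiers are hypothetical, and the sketch assumes that a unit takes part in round (i) exactly when its unit file contains a sub-file of order (i).

```python
def distribute_in_rounds(unit_files):
    """Build the send order for a configuration file.

    unit_files maps a unit id to its list of ordered sub-files.
    Returns a list of (round, unit_id, sub_file) tuples in send order.
    """
    n_rounds = max(len(sub_files) for sub_files in unit_files.values())
    schedule = []
    for i in range(n_rounds):
        # Round R(i): one sub-file of order (i) per participating unit.
        for unit_id, sub_files in unit_files.items():
            if len(sub_files) > i:
                schedule.append((i, unit_id, sub_files[i]))
    return schedule


# Two hypothetical units whose unit files have different lengths.
schedule = distribute_in_rounds({"u0": ["a0", "a1"], "u1": ["b0", "b1", "b2"]})
for entry in schedule:
    print(entry)
```

Under these assumptions each unit receives at most one sub-file per round, so it has the rest of the round to consume that sub-file before the next one arrives.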
The method of claim 18, wherein the configuration data stores in configurable units of the plurality of configurable units comprise serial chains, the method including receiving a first sub-file of the unit file in a particular configurable unit in one bus cycle, beginning to push the received first sub-file into the serial chain during subsequent bus cycles before a second sub-file of the unit file is received, receiving the second sub-file of the unit file in the particular configurable unit for a next round of the sequence in a later bus cycle, and beginning to push the received second sub-file into the serial chain during cycles of the sequence after pushing of the earlier received sub-file into the serial chain is complete.
The method of claim 20, wherein the first sub-file is consumed by the unit configuration load process in a configurable unit before the second sub-file in the plurality of ordered sub-files is received by the configurable unit.
The method of claim 18, including, before said distributing, receiving from a host process a configuration load command identifying a location in memory of the configuration file, and generating one or more memory access requests in response to the command to retrieve the configuration file.
The method of claim 18, wherein the configuration file includes a plurality of sub-files of the unit file for each configurable unit in the plurality of configurable units, the sub-files being arranged in the configuration file in an interleaved fashion, the method including routing the sub-files to configurable units based on the locations of the sub-files in the configuration file.
The method of claim 23, wherein a sub-file has a number N of bits of data, and the bus system is configured to transfer N bits of data in one bus cycle.
The method of claim 24, wherein the configuration data stores in configurable units of the plurality of configurable units comprise serial chains, the method including receiving, at a configurable unit in one bus cycle, a first sub-file of the unit file particular to the configurable unit, pushing the received first sub-file into the serial chain during N subsequent bus cycles, receiving a second sub-file of the unit file particular to the configurable unit in a later bus cycle, and pushing the received second sub-file into the serial chain during N subsequent bus cycles after the earlier received sub-file has been pushed into the serial chain.
The method of claim 25, wherein the array includes more than N configurable units.
The method of claim 18, wherein the array includes more than one type of configurable unit, and the unit files for different types of configurable units include different numbers of sub-files of configuration data.
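The claims above that recite pushing an N-bit sub-file into a serial chain over N bus cycles can be pictured with a toy shift-register model. This is a sketch under assumptions: the function name `shift_in` is hypothetical, one bit is shifted per cycle, and the chain is modelled as a fixed-length `collections.deque` rather than as hardware latches.

```python
from collections import deque


def shift_in(chain, sub_file_bits):
    """Shift one sub-file into the serial chain, one bit per cycle.

    chain is a deque with a fixed maxlen; appendleft() on a full deque
    discards the element at the far (right) end, like a shift register.
    Returns the number of cycles spent shifting.
    """
    cycles = 0
    for bit in sub_file_bits:
        chain.appendleft(bit)
        cycles += 1
    return cycles


N = 4                                    # bits per sub-file (assumed value)
chain = deque([0] * (2 * N), maxlen=2 * N)
first = [1, 0, 1, 1]                     # received in one bus cycle
second = [0, 0, 1, 0]                    # received in a later bus cycle

used = shift_in(chain, first)            # takes N cycles
used += shift_in(chain, second)          # starts only after the first is done
print(used, list(chain))
```

In this model the first sub-file ends up at the far end of the chain once the second has been shifted in behind it, which is why the second push has to wait for the first to finish.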
The method of claim 18, wherein the unit files for a first type of configurable unit include Z1 sub-files, and the unit files for a second type of configurable unit include Z2 sub-files, where Z1 is less than Z2, the method including: retrieving segments of the configuration file including sub-file (i) of the unit files for all of the configurable units of the first type and the second type, for (i) going from 0 to Z1-1, and then retrieving segments of the configuration file including sub-file (i) of the unit files for all of the configurable units of the second type, for (i) going from Z1 to Z2-1.
The method of claim 18, including passing load complete status in a daisy chain of configurable units.
The method of claim 29, including forwarding a load complete signal from a first node on the daisy chain after the configuration file is distributed, and in each configurable unit in the array, forwarding the load complete signal on the daisy chain when the load complete signal from a previous node of the daisy chain is received and loading of its own unit file is completed.
The method of claim 18, including receiving from a host process a configuration load command identifying a location in memory of the configuration file, and generating one or more memory access requests via a top level network in response to the command to retrieve the configuration file.
The method of claim 31, wherein the array configuration load process routes sub-files of the configuration data to the configurable units via the array level network using addresses implied by the locations of the sub-files in the configuration file.
The method of claim 18, including using routes in the bus system during execution after configuration that are also used in said distributing.
A reconfigurable data processor, comprising: a bus system; an array of configurable units connected to the bus system, configurable units in the array including configuration data stores arranged in serial chains to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units; and a configuration load controller connected to the bus system, including logic to distribute a configuration file comprising unit files for a plurality of the configurable units in the array in parallel sub-files.
A method for operating a reconfigurable data processor comprising a bus system and an array of configurable units connected to the bus system, configurable units in the array including configuration data stores arranged in serial chains to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units, the method comprising: distributing a configuration file comprising unit files for a plurality of the configurable units in the array in parallel sub-files.
A reconfigurable data processor, comprising: a bus system; an array of configurable units connected to the bus system, configurable units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units; and a configuration unload controller connected to the bus system, including logic to execute an array configuration unload process, including distributing a command to a plurality of the configurable units in the array to unload the unit files particular to the corresponding configurable units, the unit files each comprising a plurality of ordered sub-files, receiving sub-files via the bus system from the array of configurable units, and assembling an unload configuration file by arranging the received sub-files in memory according to the configurable unit of the unit file of which the sub-file is a part and the order of the sub-file in the unit file; wherein configurable units in the plurality of configurable units each include logic to execute a unit configuration unload process, including unloading the sub-files from the configuration store of the configurable unit and transmitting, via the bus system, sub-files of a unit file particular to the configurable unit to the configuration unload controller.
The processor of claim 36, wherein the configuration data store in a configurable unit of the plurality of configurable units comprises a serial chain and an output buffer coupled to the serial chain, and the unit configuration unload process shifts the sub-files of the unit file out of the serial chain into the output buffer, and transmits the sub-files from the output buffer on the bus system.
The processor of claim 36, wherein the array configuration unload process includes receiving, from a host process, a configuration unload command identifying an address location in memory at which to store an unload configuration file, and the assembling includes computing address offsets from the address location for the sub-files.
The processor of claim 36, wherein the configuration file includes a plurality of sub-files of the unit file for each configurable unit in the plurality of configurable units, the unit files having up to M sub-files having an order (i) in the unit file, and being arranged in the unload configuration file so that all sub-files of order (i) for all the unit files in the unload configuration file are stored in a corresponding block (i) of address space in the memory, for (i) going from 0 to M-1.
The processor of claim 39, wherein the array includes more than one type of configurable unit, and the unit files for different types of configurable units include different numbers of sub-files of configuration data, and wherein, within a block (i) of address space, the sub-files for each type of configurable unit are stored in a group of contiguous addresses within the block (i).
The processor of claim 36, wherein a sub-file has a number N of bits of data, and the bus system is configured to transfer N bits of data in one bus cycle.
The processor of claim 36, wherein the unit files of the configurable units in the array of configurable units have at most M sub-files, and said arranging the received sub-files in memory includes: storing the unload configuration file in memory at addresses in a plurality of blocks (i), for (i) going from 0 to at most M-1, and storing in block (i) sub-files (i) of the unit files for all of the configurable units in the plurality of configurable units; and said transmitting sub-files includes sending packets on the bus system having a header and a payload, the payload including the sub-file, and the header identifying the configurable unit from which the sub-file is sent and identifying the order of the sub-file.
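The block-organised unload file and the packet header recited above suggest how a controller might place each received sub-file in memory. The following is a minimal sketch, not the claimed implementation: the names `Packet` and `assemble` are hypothetical, and it assumes a single type of configurable unit, so that every unit has the same number of sub-files and block (i) holds one fixed-size slot per unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    unit_index: int  # header: which configurable unit sent the sub-file
    order: int       # header: order (i) of the sub-file in its unit file
    payload: bytes   # the sub-file itself


def assemble(packets, num_units, max_sub_files, sub_file_bytes):
    """Place each sub-file at an offset implied by its header fields.

    Block (i) starts at i * num_units * sub_file_bytes and holds
    sub-file (i) of every unit, in unit-index order.
    """
    image = bytearray(num_units * max_sub_files * sub_file_bytes)
    for packet in packets:
        slot = packet.order * num_units + packet.unit_index
        offset = slot * sub_file_bytes
        image[offset:offset + sub_file_bytes] = packet.payload
    return bytes(image)


# Packets may arrive in any order; the header alone fixes the location.
arrived = [
    Packet(1, 1, b"b1"),
    Packet(0, 0, b"a0"),
    Packet(1, 0, b"b0"),
    Packet(0, 1, b"a1"),
]
print(assemble(arrived, num_units=2, max_sub_files=2, sub_file_bytes=2))
```

Because the offset depends only on the header, the assembled file is the same regardless of arrival order. A base address supplied by a host command would simply be added to each offset.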
The processor of claim 36, wherein the bus system includes a top level network and an array level network, the top level network including an external data interface and an array interface, and the array level network being connected to the array interface and to the configurable units in the array of configurable units.
The processor of claim 43, wherein the array configuration unload process routes sub-files of the unload configuration file to memory via the top level network using addresses implied by the order of the sub-files in the unit files of the configurable units.
The processor of claim 43, wherein the unit configuration unload process routes sub-files of the unload configuration file to memory via the top level network using addresses implied by the order of the sub-files in the unit files of the configurable units.
The processor of claim 36, wherein configurable units in the plurality of configurable units use routes in the bus system during execution before unloading the configuration file that are also used in the configuration unload process.
The processor of claim 36, wherein the unit files comprise a plurality of ordered sub-files, and the unload configuration file for an array of configurable units is assembled so that sub-files of the same order for all configurable units of the same type are stored in a block of address space, and so that the location of a sub-file in the unload configuration file corresponds to the configurable unit in the array of the sub-file and its order in the unit file particular to that configurable unit.
A method for operating a reconfigurable data processor comprising a bus system and an array of configurable units connected to the bus system, configurable units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units, the method comprising: distributing a command to a plurality of the configurable units in the array to unload the unit files particular to the corresponding configurable units, the unit files each comprising a plurality of ordered sub-files; and receiving sub-files from the bus system from the array of configurable units, and assembling an unload configuration file by arranging the received sub-files in memory according to the configurable unit of the unit file of which the sub-file is a part and the order of the sub-file in the unit file.
The method of claim 48, including unloading the sub-files from the configuration store of the configurable unit and transmitting, via the bus system, sub-files of a unit file particular to the configurable unit to the configuration unload controller.
The method of claim 48, wherein the configuration data store in a configurable unit of the plurality of configurable units comprises a serial chain and an output buffer coupled to the serial chain, and said unloading includes shifting the sub-files of the unit file out of the serial chain into the output buffer, and transmitting the sub-files from the output buffer on the bus system.
The method of claim 48, including receiving, from a host process, a configuration unload command identifying an address location in memory at which to store an unload configuration file, and said assembling includes computing address offsets from the address location for the sub-files.
The method of claim 48, wherein the configuration file includes a plurality of sub-files of the unit file for each configurable unit in the plurality of configurable units, the unit files having up to M sub-files having an order (i) in the unit file, and being arranged in the unload configuration file so that all sub-files of order (i) for all the unit files in the unload configuration file are stored in a corresponding block (i) of address space in the memory, for (i) going from 0 to M-1.
The method of claim 52, wherein the array includes more than one type of configurable unit, and the unit files for different types of configurable units include different numbers of sub-files of configuration data, and wherein, within a block (i) of address space, the sub-files for each type of configurable unit are stored in a group of contiguous addresses within the block (i) of address space.
The method of claim 48, wherein a sub-file has a number N of bits of data, and the bus system is configured to transfer N bits of data in one bus cycle.
The method of claim 48, wherein the unit files of the configurable units in the array of configurable units have at most M sub-files, and said arranging the received sub-files in memory includes: storing the unload configuration file in memory in a plurality of blocks (i) of address space, for (i) going from 0 to at most M-1, and storing in block (i) of address space sub-files (i) of the unit files for all of the configurable units in the plurality of configurable units; and transmitting sub-files from the configurable units in the array by sending packets on the bus system having a header and a payload, the payload including a sub-file, and the header identifying the configurable unit from which the sub-file in the payload is sent and identifying the order of the sub-file.
The method of claim 48, wherein the bus system includes a top level network and an array level network, the top level network including an external data interface and an array interface, and the array level network being connected to the array interface and to the configurable units in the array of configurable units.
The method of claim 56, wherein the array configuration unload process routes sub-files of the unload configuration file to memory via the top level network using addresses implied by the order of the sub-files in the unit files of the configurable units.
The method of claim 56, wherein the unit configuration unload process routes sub-files of the unload configuration file to memory via the top level network using addresses implied by the order of the sub-files in the unit files of the configurable units.
The method of claim 48, including using routes in the bus system during execution before unloading the configuration file that are also used to receive the sub-files.
The method of claim 48, wherein the unit files comprise a plurality of ordered sub-files, and the unload configuration file for an array of configurable units is assembled so that sub-files of the same order for all configurable units of the same type are stored in a linear address space, and so that the location of a sub-file in the unload configuration file corresponds to the configurable unit in the array of the sub-file and its order in the unit file particular to that configurable unit.
TW108142191A 2018-11-21 2019-11-20 Configuration load and unload of a reconfigurable data processor TWI766211B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16/198,086 US11188497B2 (en) 2018-11-21 2018-11-21 Configuration unload of a reconfigurable data processor
US16/197,826 US10831507B2 (en) 2018-11-21 2018-11-21 Configuration load of a reconfigurable data processor
US16/198,086 2018-11-21
US16/197,826 2018-11-21

Publications (2)

Publication Number Publication Date
TW202032383A TW202032383A (en) 2020-09-01
TWI766211B true TWI766211B (en) 2022-06-01

Family

ID=73643865

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108142191A TWI766211B (en) 2018-11-21 2019-11-20 Configuration load and unload of a reconfigurable data processor

Country Status (1)

Country Link
TW (1) TWI766211B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201346758A (en) * 2011-12-29 2013-11-16 Intel Corp Method, device and system for controlling execution of an instruction sequence in a data stream accelerator
TW201610708A (en) * 2014-03-18 2016-03-16 萬國商業機器公司 Common boot sequence for control utilities that can be initialized in multiple architectures
US20180089117A1 (en) * 2016-09-26 2018-03-29 Wave Computing, Inc. Reconfigurable fabric accessing external memory
US20180189231A1 (en) * 2016-12-30 2018-07-05 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator

Also Published As

Publication number Publication date
TW202032383A (en) 2020-09-01
