TWI408600B

TWI408600B - Compute unit with an internal bit FIFO circuit

Info

Publication number: TWI408600B
Application number: TW097109496A
Authority: TW
Inventors: James Wilson; Joshua Kablotsky; Yosef Stein
Original assignee: Analog Devices Inc
Priority date: 2007-03-26
Filing date: 2008-03-18
Publication date: 2013-09-11
Also published as: CN101657803A; JP5191532B2; EP2130132A4; TW200903325A; WO2008118277A1; US7882284B2; US20080244237A1; CN101657803B; JP2010522928A; EP2130132A1; EP2130132B1

Abstract

A compute unit with an internal bit FIFO circuit includes at least one data register, a lookup table, a configuration register including FIFO base address, length and read/write mode fields for configuring a portion of the lookup table as a bit FIFO circuit and a read/write pointer register responsive to an instruction having a lookup table identification field, length of bits field and register extract/deposit field for selectively transferring in a single cycle between the FIFO circuit and the data register a bit field of specified length.

Description

Computing unit with internal bit first-in first-out circuit

The present invention relates to a computing unit having an internal bit FIFO circuit.

This application is related to U.S. Patent Application Serial No. 11/258,801, entitled "IMPROVED PIPELINED DIGITAL SIGNAL PROCESSOR", filed October 26, 2005 by Wilson et al. (AD-432J), which is incorporated herein by reference.

A digital signal processor (DSP) is a special-purpose processor optimized for digital signal processing applications such as digital filtering, speech analysis and synthesis, and video encoding and decoding to produce a compressed bit stream. Some communication or video applications may use Huffman coding, a variable-length coding scheme (as distinct from schemes that use a fixed number of bits per codeword). Huffman coding minimizes the total number of bits by giving the most frequently occurring codewords the fewest bits. The number of bits is selected on the basis of known probabilities so that a data bit stream can be decoded as its bits arrive. This achieves tighter data compression because the most frequently occurring characters are short and the infrequent ones are long, the shortest character, with the highest probability of occurrence, being only one bit long. Most digital signal processors are designed to manipulate data having a fixed word size (e.g., 8-bit, 16-bit or 32-bit words). When the processor needs to manipulate non-standard word sizes, this is typically done with a bit FIFO circuit, which can handle bit fields of any specified length. One shortcoming of such a device is that it is implemented in storage external to the computing unit, so that stalls can occur whenever it must be accessed for reading or writing. This is aggravated by the fact that the external storage can be accessed only through the data address generator (DAG). Another problem with relying on an external bit FIFO circuit is that it increases the distance that signals must travel and therefore limits the speed of the operating cycle.
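The variable-length decoding described above can be illustrated with a small Python sketch. The code table, symbol names and the function `decode_prefix_stream` are hypothetical and chosen only for illustration; the patent does not give a concrete code table.

```python
# Hypothetical prefix-code table: the most frequent symbol gets a 1-bit code.
CODE_TABLE = {"0": "e", "10": "t", "110": "a", "111": "q"}

def decode_prefix_stream(bits: str) -> list:
    """Decode a string of '0'/'1' characters as the bits arrive."""
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        if current in CODE_TABLE:  # a complete codeword has arrived
            symbols.append(CODE_TABLE[current])
            current = ""
    if current:
        raise ValueError("stream ended in the middle of a codeword")
    return symbols
```

Because codewords are 1, 2 or 3 bits long, the decoder consumes bit fields that do not line up with 8-, 16- or 32-bit words, which is the situation a bit FIFO is meant to handle.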

It is therefore an object of this invention to provide an improved computing unit having an internal bit FIFO circuit.

It is a further object of this invention to provide such an improved computing unit having an internal bit FIFO circuit which uses the lookup table of the computing unit to implement the bit FIFO circuit.

It is a further object of this invention to provide such an improved computing unit having an internal bit FIFO circuit which can conditionally fill from and spill to external memory.

It is a further object of this invention to provide such an improved computing unit having an internal bit FIFO circuit which sets a high water mark and a low water mark to define a window for continuous bit stream operands.

It is a further object of this invention to provide such an improved computing unit having an internal bit FIFO circuit which fills and spills with 32-bit memory-aligned words.

It is a further object of this invention to provide such an improved computing unit having an internal bit FIFO circuit in which fills and spills occur conditionally on the high and low water marks.

It is a further object of this invention to provide such an improved computing unit having an internal bit FIFO circuit which can transfer a bit field of specified length, in the form of a continuous bit stream, between an external memory and any computing unit data register in one cycle.

It is a further object of this invention to provide such an improved computing unit having an internal bit FIFO circuit which can use just a portion of a lookup table, and in which more than one bit FIFO circuit may reside in one or more lookup tables.

It is a further object of this invention to provide such an improved computing unit having an internal bit FIFO circuit which can deposit/extract data left to right (big endian) or right to left (little endian).

The invention results from the realization that a bit FIFO can be provided internally within a computing unit by configuring a lookup table in the computing unit to define a bit FIFO base address, length and read/write mode, and by using a read/write pointer register, responsive to an instruction having a lookup table identification field, a length of bits field and a register extract/deposit field, to selectively transfer a bit field of specified length between the FIFO circuit and the data register in a single cycle.

The subject invention, however, in other embodiments, need not achieve all these objectives, and the claims hereof should not be limited to structures or methods capable of achieving these objectives.

This invention features a computing unit with an internal bit FIFO circuit including: at least one data register; a lookup table; a configuration register including FIFO base address, length and read/write mode fields for configuring a portion of the lookup table as a bit FIFO circuit; and a read/write pointer register responsive to an instruction having a lookup table identification field, a length of bits field and a register extract/deposit field for selectively transferring, in a single cycle, a bit field of specified length between the FIFO circuit and the data register.

In a preferred embodiment the configuration register may further include a little endian/big endian mode field. Transferring a bit field may include extracting a bit field from the FIFO circuit and storing it in the computing unit data register in response to the information in the configuration register and the pointer register and to the instruction. Transferring a bit field may include depositing a bit field from a data register into the bit FIFO circuit in response to the information in the configuration register and the pointer register and to the instruction. Extracting may include updating the read pointer in the read pointer register by the specified length, modulo the FIFO length. Depositing may include updating the write pointer in the write pointer register by the specified length, modulo the FIFO length. The read/write pointer register may include a word address field and a bit position field for tracking the specified length. The read/write pointer register may further include a water mark register defining a high water mark, above which transfers to the bit FIFO circuit are inhibited and the bit FIFO circuit must be spilled to an external memory, and a low water mark, below which transfers to the bit FIFO circuit are permitted and filling of the bit FIFO from the external memory with a continuous bit stream operand is enabled. Fills from and spills to external memory may occur in 32-bit words. The 32-bit words are memory aligned. The lookup table may include a random access memory. The data register may be one of the registers of the computing unit register file. Extracting may include updating the read pointer in the read pointer register and generating a low water mark signal if the bits left in the FIFO are below the low water mark. Depositing may include updating the write pointer in the write pointer register and generating a high water mark signal if the bits in the FIFO are above the high water mark. The lookup table may contain a plurality of bit FIFOs.

Aside from the preferred embodiment or embodiments disclosed below, this invention is capable of other embodiments and of being practiced or carried out in various ways. Thus, it is to be understood that the invention is not limited in its application to the details of construction and the arrangements of components set forth in the following description or illustrated in the drawings. If only one embodiment is described herein, the claims hereof are not to be limited to that embodiment. Moreover, the claims hereof are not to be read restrictively unless there is clear and convincing evidence manifesting a certain exclusion, restriction, or disclaimer.

The digital signal processor 210 shown in FIG. 1 includes: an address unit 212 having one or more data address generators 214, 216; a control unit, such as program sequencer 218; and one or more computing units 220, each of which contains a number of circuits such as arithmetic logic unit 222, multiplier/accumulator 224 and shifter 226. There are typically two, four or more computing units in a digital signal processor. The digital signal processor is connected over memory buses 228 to one or more memories, such as level one (L1) memory 230, which includes program memory 232 and data memory 234, or additional memory 236. Memory 230 may be a level one memory, which is typically very fast and quite expensive. Memory 236 may be a level three (L3) memory, which is less expensive and slower. With digital signal processor 210 operating at 1 GHz and beyond, the cycles of operation are so fast that the address unit and the computing units require more than one cycle to complete their operations. To improve the throughput of digital signal processor 210 and enhance its performance, it is typically deeply pipelined.

In pipelined operation, pipeline efficiency is preserved when there is no dependency between the result of a previous instruction and a subsequent one across all of the processor's parallel building blocks. If there is such a dependency, however, a pipeline stall can occur, in which the pipeline stops and waits for the offending instruction to finish before resuming work. For example, if a computed result cannot be directly stored but must be used to generate an address at which the related function of that result can be found in memory, there is a dependency between the computing unit result and the data address generator that breaks the smooth operation of the pipeline. An example will suffice to illustrate.

Suppose a computing unit computes a result that is an angle α, but it is a function of that angle, sine α, that is to be used in executing the subsequent operation. The computing unit must then deliver the computed result to address unit 212, where data address generator 214 or 216 generates the proper address to fetch the sine of that angle from memory 230 or 236 and bring it back to the computing unit. This stall or break in the pipeline wastes time. One feature of digital signal processor 210 is that address unit 212, and only address unit 212, can address memories 230 and 236. Thus, any time a computing unit needs information from L1 memory 230 or L3 memory 236 in order to operate, the pipelined operation stalls, because the computing unit result becomes valid at a stage later than the one at which the data address generator registers of address unit 212 are loaded.

In accordance with this invention, in digital signal processor 10a, each computing unit 20a, 20b, 20c, 20d of FIG. 2 is provided with a local reconfigurable fill and spill random access memory array, for example lookup table (LUT) 50a. A computing unit such as 20a may typically include: multiplier 52; a number of select circuits 54 and 56; a polynomial multiplier 58, such as for Galois field operations; barrel shifter 60; arithmetic logic unit 62; accumulator 64; and multiplexer 66, among other things. In addition, each computing unit includes a register file 68. The data register may be one of the registers of the computing unit register file. Typically, when there is more than one computing unit (for example, computing units 20a, 20b, 20c and 20d in FIG. 2), the computing units may all share the same register file 68. Each computing unit also has its own local reconfigurable fill and spill random access memory array (lookup tables 50a, 50b, 50c and 50d). These arrays are small enough to fit in a conventional computing unit and to be accessible in one cycle, yet large enough to support most applications inside the computing unit without going to external memory and incurring pipeline stalls.
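The benefit of a local lookup table can be illustrated with the sine example. The following Python sketch is only an illustration: the 256-entry table size, the way the angle is quantized into an index, and the names `SINE_LUT` and `sine_from_local_lut` are assumptions, not details taken from the patent.

```python
import math

TABLE_SIZE = 256  # assumed size; the patent does not specify one

# Local table filled once; entry i holds sin(2*pi*i / TABLE_SIZE).
SINE_LUT = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def sine_from_local_lut(angle_index: int) -> float:
    """Return the sine for a quantized angle using only the local table.

    The computed result (the index) addresses the table directly, so no
    round trip through a separate address unit is modeled here.
    """
    return SINE_LUT[angle_index % TABLE_SIZE]
```

In this model the computed index is used immediately as the table address, which mirrors the idea of keeping the lookup inside the computing unit.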

A computing unit with an internally configured bit FIFO circuit (for example, one using an internal computing unit lookup table) is suited to both encoding and decoding operations. In the encoding operation, computing unit 10 of FIG. 3A includes an arithmetic logic unit 12, one or more data registers 14 and a bit FIFO 16, together with the other components normally present in a computing unit. In operation, raw data or an uncompressed bit stream 18 is supplied on line 20 to arithmetic logic unit 12, which compresses the data according to some algorithm (e.g., H.264, Windows Media, MP3 or the like). The compressed data is typically delivered to data register 14 in blocks of operands, such as macroblocks. Data register 14 then transfers the compressed data to bit FIFO 16, which provides the continuous bit stream operands (such as video macroblocks) on line 22 as compressed bit stream 24.

Beyond providing an internally configured bit FIFO 16 in computing unit 10, this invention has the additional feature of providing a high water mark function in bit FIFO 16. During the encoding operation, if the number of bits in bit FIFO 16 exceeds the high water mark, there is not enough room left in the bit FIFO to deposit the bits needed to encode one complete operand (for example, a complete macroblock), so some of the bits in bit FIFO 16 must be spilled to external memory (typically an L3 level memory).

When computing unit 10 of FIG. 3B is operating in the decoding operation, a compressed bit stream 26 (such as from an L3 level memory device) is delivered on line 28 to bit FIFO 16. These bits are transferred through data register 14 to arithmetic logic unit 12, which decodes the data in operands of macroblocks (typically in video applications). The uncompressed or decoded data is then provided on line 30 as uncompressed data 32.

An additional feature of this invention is the low water mark operation in bit FIFO 16. The low water mark establishes the limit below which decoding of an operand (for example, a macroblock) cannot be completed efficiently and the bit FIFO must be filled from external memory. When the number of bits in bit FIFO 16 is above the low water mark, bit FIFO 16 is assured of holding at least the minimum number of bits needed for a macroblock or other defined operand to be decoded without a stall occurring during processing.

Bit FIFO 16 of FIG. 4 is configured in lookup table 40, which may be one of the lookup tables contained within computing unit 10, such as LUT0 50a, LUT1 50c, LUT2 50b or LUT3 50d of FIG. 2. In addition to arithmetic logic unit 12 and one or more data registers 14, computing unit 10 also includes a bit FIFO configuration register 42, a read/write pointer register 44 and a water mark register 46. Configuration register 42 actually configures bit FIFO 16 in lookup table 40. Configuration register 42 has a read/write field 48, which indicates whether bit FIFO 16 is to be read or written, and an endian field 50, which indicates whether bit FIFO 16 operates in big endian mode, most significant bit first and left to right, or in little endian mode, least significant bit first and right to left. Configuration register 42 also has a field 52 that defines the length of bit FIFO 16 in lookup table 40 and a start or base address 54 that defines where bit FIFO 16 begins in lookup table 40. Pointer register 44 includes a write pointer 56 and a read pointer 58. Write pointer 56 of FIG. 5A includes a word address 60 and a bit position 62. Read pointer 58 likewise includes a word address field 64 and a bit position field 66. In each case the word address indicates the address in bit FIFO 16 to be written or read, respectively, and the bit position indicates the number of bits of that address already specified by the write or read instructions of FIGS. 5C and 5B, respectively. Read instruction 70 of FIG. 5B includes: a bit FIFO identification field 72, which identifies the particular lookup table in which the bit FIFO has been configured; a field 74, which indicates the length of bits to be read; and a field 76, which identifies the register in which the extracted bits are to be stored. Write instruction 80 of FIG. 5C likewise has: a field 82, which identifies the lookup table in which the bit FIFO to be written has been configured; a field 84, which specifies the length of bits to be written; and a field 86, which identifies the register from which the bits to be deposited are taken.
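The register and instruction fields described above can be summarized as plain data structures. This Python sketch is a reading aid only: the class and function names, the 32-bit lookup table word width and the choice to express the FIFO length in bits are assumptions, since the text does not give field widths or encodings.

```python
from dataclasses import dataclass

WORD_BITS = 32  # assumed width of one lookup table word

@dataclass(frozen=True)
class FifoConfig:
    """Models configuration register 42."""
    base_address: int   # field 54: first lookup table word of the FIFO
    length_bits: int    # field 52: FIFO length, assumed here to be in bits
    write_mode: bool    # field 48: True = write/deposit, False = read/extract
    big_endian: bool    # field 50: True = MSB first, False = LSB first

@dataclass(frozen=True)
class BitPointer:
    """Models read pointer 58 or write pointer 56."""
    word_address: int   # fields 60/64
    bit_position: int   # fields 62/66: bits of that word already consumed

@dataclass(frozen=True)
class FifoInstruction:
    """Models read instruction 70 or write instruction 80."""
    lut_id: int         # fields 72/82: which lookup table holds the FIFO
    length: int         # fields 74/84: number of bits to transfer
    register: int       # fields 76/86: data register to extract to or deposit from

def pointer_for_offset(cfg: FifoConfig, bit_offset: int) -> BitPointer:
    """Split a bit offset within the FIFO into word address and bit position."""
    wrapped = bit_offset % cfg.length_bits
    word, bit = divmod(wrapped, WORD_BITS)
    return BitPointer(cfg.base_address + word, bit)
```

For a 512-bit FIFO based at word 16, a bit offset of 70 lands in word 18 at bit position 6 under these assumptions.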

Returning to FIG. 4, water mark register 46 includes a high water mark field 90 and a low water mark field 92. The high water mark field applies during the encoding operation, when bits are being deposited or written. The low water mark field applies during the decoding operation, when bits are being extracted or read. Pointer register 44 responds to a read or write instruction to selectively transfer, in a single cycle, from the bit FIFO circuit to the data register or from the data register to the bit FIFO circuit. Transferring the specified bit field can mean extracting a bit field from the FIFO circuit and storing it in the computing unit data register, or depositing a bit field from the data register into the bit FIFO circuit. The extract or deposit action includes updating the read pointer in the read pointer register or the write pointer in the write pointer register, respectively, by the specified length modulo the FIFO length, so as to keep proper track of the state of bit FIFO 16, which is a circular memory. That is, assuming for example that bit FIFO 16 is a 512-bit memory, when the bits overflow 512, bit FIFO 16 wraps around to zero and starts again, as indicated by arrow 94. The word address and bit position in the pointer register of FIG. 5A keep track of the data in bit FIFO 16.
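The modulo pointer update can be sketched as a small behavioral model. The class `BitFifo` and its attribute names are hypothetical, the bit order is assumed to be most significant bit first, and the per-bit loop is only a software convenience: the patent describes the transfer as taking a single cycle in hardware.

```python
class BitFifo:
    """Behavioral model of a circular bit FIFO (one list entry per bit)."""

    def __init__(self, length_bits: int = 512) -> None:
        self.length = length_bits
        self.bits = [0] * length_bits
        self.read_ptr = 0    # bit offset of the next bit to extract
        self.write_ptr = 0   # bit offset of the next free bit
        self.count = 0       # bits currently held

    def deposit(self, value: int, nbits: int) -> None:
        """Write the low nbits of value, MSB first, at the write pointer."""
        if nbits > self.length - self.count:
            raise OverflowError("not enough free space in the FIFO")
        for k in range(nbits):
            bit = (value >> (nbits - 1 - k)) & 1
            self.bits[(self.write_ptr + k) % self.length] = bit
        # Pointer advances by the specified length, modulo the FIFO length.
        self.write_ptr = (self.write_ptr + nbits) % self.length
        self.count += nbits

    def extract(self, nbits: int) -> int:
        """Read nbits, MSB first, from the read pointer."""
        if nbits > self.count:
            raise ValueError("not enough bits in the FIFO")
        value = 0
        for k in range(nbits):
            value = (value << 1) | self.bits[(self.read_ptr + k) % self.length]
        self.read_ptr = (self.read_ptr + nbits) % self.length
        self.count -= nbits
        return value
```

With a 512-bit FIFO whose write pointer sits at 505, depositing a 13-bit field leaves the write pointer at (505 + 13) mod 512 = 6, which is the wrap back to zero described in the text.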

During the encoding operation, water mark register 46 uses high water mark field 90 to signal that the FIFO is nearly full and that room must be made by spilling or unloading bits from bit FIFO 16 to off-chip memory 100 (such as L3 level memory); otherwise data overflow will occur. In the decoding operation, the low water mark field signals that the FIFO does not hold enough data to process the next macroblock and must be filled from off-chip bit stream memory 100; otherwise data underflow will occur. The low water mark is set at the minimum number of bits that ensures there are enough bits for a complete operand (for example, a video macroblock) to be processed without a stall caused by lack of data.
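The two water mark checks can be written as a minimal Python sketch. The names `WaterMarks`, `after_deposit` and `after_extract` are hypothetical, and the threshold values used in the usage note are arbitrary examples; the patent only states that a signal is generated when the bit count is above the high mark or below the low mark.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WaterMarks:
    """Models water mark register 46."""
    low: int    # field 92: below this, the FIFO should be filled
    high: int   # field 90: above this, the FIFO should be spilled

def after_deposit(count: int, nbits: int, marks: WaterMarks):
    """Return (new bit count, high water mark signal) after a deposit."""
    new_count = count + nbits
    return new_count, new_count > marks.high

def after_extract(count: int, nbits: int, marks: WaterMarks):
    """Return (new bit count, low water mark signal) after an extract."""
    new_count = count - nbits
    return new_count, new_count < marks.low
```

For example, with assumed marks of 128 and 384 bits, depositing 8 bits into a FIFO holding 380 bits raises the high water mark signal, and extracting 8 bits from a FIFO holding 130 bits raises the low water mark signal.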

A graphic representation of low water mark 102 and high water mark 104 is shown in FIG. 6, in which lookup table 40 is filled from the top down. Also shown are the position of base address 106 and the positions of read pointer 108 and write pointer 110. FIG. 6 further shows big endian path 112, MSB first, left to right, and little endian path 114, LSB first, right to left. Fills from and spills to external memory 100 preferably occur in standard words, such as 32-bit words, that are typically memory aligned; that is, fills and spills are handled in bytes (8 bits), short words (16 bits), words (32 bits) or double words (64 bits). Lookup table 40 may be implemented with a conventional random access memory device.
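The difference between the two bit orders can be shown on a single 32-bit word. In this Python sketch the function names and the 32-bit word width are assumptions, and fields that straddle a word boundary are deliberately not handled.

```python
WORD_BITS = 32  # assumed lookup table word width

def extract_big_endian(word: int, bit_position: int, nbits: int) -> int:
    """Take nbits starting bit_position bits in from the MSB (left to right)."""
    if bit_position + nbits > WORD_BITS:
        raise ValueError("field crosses the word boundary")
    shift = WORD_BITS - bit_position - nbits
    return (word >> shift) & ((1 << nbits) - 1)

def extract_little_endian(word: int, bit_position: int, nbits: int) -> int:
    """Take nbits starting bit_position bits in from the LSB (right to left)."""
    if bit_position + nbits > WORD_BITS:
        raise ValueError("field crosses the word boundary")
    return (word >> bit_position) & ((1 << nbits) - 1)
```

For the word 0xA5000003, the first 4-bit field in big endian order is 0xA (taken from the left), while the first 2-bit field in little endian order is 0x3 (taken from the right).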

The operation of the high and low water marks can be better understood with reference to FIGS. 7A and 7B. In FIG. 7A there are a large number of bits 120 in bit FIFO 16 and write pointer 110 is very close to high water mark 104, at which point the FIFO must be spilled to avoid data overflow. In FIG. 7B there are only a few bits 122 in bit FIFO 16 and read pointer 108 has just barely reached the low water mark, which indicates that bit FIFO 16 holds only just enough bits 122 and that the FIFO must be filled in order to fully process an operand (for example, a video macroblock) without stall or data underflow.

The operations of filling from and spilling to off-chip L3 level memory 100 are depicted in FIGS. 8 and 9, respectively. In the fill operation of FIG. 8, bit FIFO 16 is filled from off-chip L3 level memory 100 through data register 14, which accepts four 8-bit bytes or one 32-bit word and transfers it to bit FIFO 16 at write pointer 110, in the available space indicated at 122. The spill operation from bit FIFO 16 to off-chip memory 100 takes place in FIG. 9, where the data above read pointer 108 is transferred to data register 14 for delivery to off-chip memory 100. Although the invention has so far been explained with reference to a single bit FIFO in a single lookup table in a single computing unit, a computing unit can have more than one configured bit FIFO; in fact, there can be more than one bit FIFO 16a, 16b in the single lookup table 40 of FIG. 10.
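Fill and spill in 32-bit words can be sketched as follows. This Python model is an assumption-laden illustration: the FIFO is a `collections.deque` of single bits, external memory is a plain list of 32-bit integers, bits are ordered most significant bit first, and the function names `fill` and `spill` are hypothetical.

```python
from collections import deque

WORD_BITS = 32

def fill(fifo: deque, external: list) -> None:
    """Move one 32-bit word from external memory to the write end of the FIFO."""
    word = external.pop(0)
    for k in range(WORD_BITS):
        fifo.append((word >> (WORD_BITS - 1 - k)) & 1)

def spill(fifo: deque, external: list) -> None:
    """Move the oldest 32 bits from the read end of the FIFO to external memory."""
    if len(fifo) < WORD_BITS:
        raise ValueError("fewer than 32 bits available to spill")
    word = 0
    for _ in range(WORD_BITS):
        word = (word << 1) | fifo.popleft()
    external.append(word)
```

A word assembled from the four bytes 0xDE, 0xAD, 0xBE, 0xEF can be filled into an empty FIFO and spilled back out unchanged, which mirrors the four-bytes-or-one-word transfer through the data register.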

Although specific features of the invention are shown in some drawings and not in others, this is for convenience only, as each feature may be combined with any or all of the other features in accordance with the invention. The words "including", "comprising", "having" and "with" as used herein are to be interpreted broadly and comprehensively and are not limited to any physical interconnection. Moreover, any embodiments disclosed in the subject application are not to be taken as the only possible embodiments.

In addition, any amendment presented during the prosecution of the patent application for this patent is not a disclaimer of any claim element presented in the application as filed: those skilled in the art cannot reasonably be expected to draft a claim that would literally encompass all possible equivalents, many equivalents will be unforeseeable at the time of the amendment and are beyond a fair interpretation of what is to be surrendered (if anything), the rationale underlying the amendment may bear no more than a tangential relation to many equivalents, and/or there are many other reasons the applicant cannot be expected to describe certain insubstantial substitutes for any of the appended claims.

Other embodiments will occur to those skilled in the art and are within the following claims.

10‧‧‧computing unit

10a‧‧‧digital signal processor

12‧‧‧arithmetic logic unit

14‧‧‧data register

16‧‧‧bit FIFO

16a‧‧‧bit FIFO

16b‧‧‧bit FIFO

18‧‧‧uncompressed bit stream (raw data)

20‧‧‧line

20a‧‧‧computing unit

20b‧‧‧computing unit

20c‧‧‧computing unit

20d‧‧‧computing unit

22‧‧‧line

24‧‧‧compressed bit stream

26‧‧‧compressed bit stream from L3

28‧‧‧line

30‧‧‧line

32‧‧‧uncompressed data

40‧‧‧lookup table

42‧‧‧configuration register

44‧‧‧read/write pointer register

46‧‧‧water mark register

48‧‧‧read/write field

50a‧‧‧lookup table

50b‧‧‧lookup table

50c‧‧‧lookup table

50d‧‧‧lookup table

50‧‧‧endian field

52‧‧‧multiplier

52‧‧‧field

54‧‧‧select circuit

54‧‧‧start address or base address

56‧‧‧select circuit

56‧‧‧write pointer

58‧‧‧polynomial multiplier

58‧‧‧read pointer

60‧‧‧barrel shifter

62‧‧‧arithmetic logic unit

64‧‧‧accumulator

66‧‧‧multiplexer

68‧‧‧register file

70‧‧‧read instruction

72‧‧‧bit FIFO identification field

74‧‧‧field

76‧‧‧field

80‧‧‧write instruction

82‧‧‧field

84‧‧‧field

86‧‧‧field

90‧‧‧high water mark field

92‧‧‧low water mark field

94‧‧‧arrow indicating wrap back to start

100‧‧‧off-chip memory

102‧‧‧low water mark

104‧‧‧high water mark

106‧‧‧base address

108‧‧‧read pointer

110‧‧‧write pointer

112‧‧‧big endian: MSB first, left to right

114‧‧‧little endian: LSB first, right to left

120‧‧‧bits

122‧‧‧bits

210‧‧‧digital signal processor

212‧‧‧address unit

214‧‧‧data address generator

216‧‧‧data address generator

218‧‧‧program sequencer

220‧‧‧computing unit

222‧‧‧arithmetic logic unit

224‧‧‧multiplier/accumulator

226‧‧‧shifter

228‧‧‧memory bus

230‧‧‧level one (L1) memory

232‧‧‧program memory

234‧‧‧data memory

236‧‧‧additional memory

FIG. 1 is a simplified block diagram of a prior art digital signal processor (DSP) with external memory and memory buses; FIG. 2 is a block diagram of a digital signal processor having multiple compute units with local reconfigurable lookup tables according to the invention; FIGS. 3A and 3B are simplified schematic block diagrams of encoding and decoding systems, respectively, according to the invention; FIG. 4 is a schematic diagram of an internal bit FIFO implemented in a lookup table in a compute unit according to the invention; FIGS. 5A, 5B and 5C show the field layouts of the pointer register, the read instruction and the write instruction, respectively; FIG. 6 is a view similar to FIG. 4 illustrating the use of the low watermark and high watermark features with big endian and little endian options according to the invention; FIGS. 7A and 7B show the effect of using the high watermark and the low watermark, respectively.

FIGS. 8 and 9 show fill and spill operations, respectively, between the lookup table implementing the bit FIFO and external storage; and FIG. 10 illustrates implementing more than one bit FIFO in a single lookup table.

Claims

A digital signal processor comprising an address unit, a control unit and a compute unit, the compute unit comprising: at least one data register; an arithmetic logic unit; a local random access memory array; a configuration register, including a FIFO base address field, a length field and a read/write mode field, for configuring a portion of the memory array as a bit FIFO circuit to provide or store sequential bit-stream operands; and a read/write pointer register responsive to an instruction having an identification field, a bit length field and a register extract/deposit field, for selectively transferring, in a single clock cycle, a bit field of the length specified in the bit length field between the bit FIFO circuit and the data register.

The compute unit of claim 1, wherein the configuration register further comprises a little endian/big endian mode field.

The compute unit of claim 1, wherein transferring a bit field includes: extracting a bit field from the bit FIFO circuit, in response to the information in the configuration register and the pointer register and to the instruction, and storing the bit field in the compute unit data register.

The compute unit of claim 1, wherein transferring a bit field includes: depositing a bit field from a data register into the bit FIFO circuit, in response to the information in the configuration register and the pointer register and to the instruction.

The compute unit of claim 3, wherein the extracting includes: updating the read pointer in the read pointer register by the specified length, modulo the FIFO length.

The compute unit of claim 3, wherein the depositing includes: updating the write pointer in the write pointer register by the specified length, modulo the FIFO length.

The compute unit of claim 1, wherein the read/write pointer register includes a word address field and a bit position field for tracking the specified length.

The compute unit of claim 1, further comprising a watermark register, wherein the watermark register defines a high watermark, above which the bit FIFO circuit must be spilled to an external storage, and a low watermark, below which the bit FIFO must be filled from the external storage with sequential bit-stream operands.

The compute unit of claim 1, wherein filling from and spilling to an external memory occur in 32-bit words.

The compute unit of claim 9, wherein the 32-bit words are memory aligned.

The compute unit of claim 1, wherein the lookup table comprises a random access memory.

The compute unit of claim 1, wherein the data register is one of the registers of the compute unit register file.

The compute unit of claim 8, wherein the extracting includes: updating the read pointer in the read pointer register and, if the bits left in the FIFO are below the low watermark, generating a low watermark signal.

The compute unit of claim 8, wherein the depositing includes: updating the write pointer in the write pointer register and, if the bits in the FIFO are above the high watermark, generating a high watermark signal.

The compute unit of claim 1, wherein the lookup table can include a plurality of bit FIFOs.
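
The extract and deposit operations recited in the claims above can be illustrated with a small software model. The following C sketch is offered only as an illustration under stated assumptions and is not the claimed circuit: the names `bit_fifo`, `fifo_deposit` and `fifo_extract`, the 128-bit FIFO length, the MSB-first (big endian) bit order, and the explicit `count` of held bits are all hypothetical choices made for the example. The hardware transfers a bit field in a single clock cycle, whereas this model loops over the bits for clarity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 4-word (128-bit) region of a local memory array
 * configured as a bit FIFO. Sizes and names are assumptions. */
#define FIFO_WORDS 4u
#define FIFO_BITS  (FIFO_WORDS * 32u)

typedef struct {
    uint32_t mem[FIFO_WORDS]; /* portion of the array used as the FIFO      */
    uint32_t rd;              /* read pointer, in bits from the FIFO base   */
    uint32_t wr;              /* write pointer, in bits from the FIFO base  */
    uint32_t count;           /* bits currently held (modeling convenience) */
    uint32_t low_wm;          /* low watermark, in bits                     */
    uint32_t high_wm;         /* high watermark, in bits                    */
} bit_fifo;

/* Deposit the low `len` bits of `value`, most significant bit first.
 * A bit pointer p splits into word address p / 32 and bit position p % 32.
 * Returns false if the request is invalid or would overflow the FIFO.
 * *above_high models the high watermark signal. */
static bool fifo_deposit(bit_fifo *f, uint32_t value, unsigned len,
                         bool *above_high)
{
    if (len == 0u || len > 32u || f->count + len > FIFO_BITS)
        return false;
    for (unsigned i = 0; i < len; i++) {
        uint32_t bit  = (value >> (len - 1u - i)) & 1u;
        uint32_t word = f->wr / 32u;
        uint32_t pos  = 31u - (f->wr % 32u);
        f->mem[word] = (f->mem[word] & ~(1u << pos)) | (bit << pos);
        f->wr = (f->wr + 1u) % FIFO_BITS;   /* update modulo FIFO length */
    }
    f->count += len;
    *above_high = f->count > f->high_wm;
    return true;
}

/* Extract `len` bits into *out, right-aligned.
 * Returns false if the request is invalid or the FIFO holds too few bits.
 * *below_low models the low watermark signal. */
static bool fifo_extract(bit_fifo *f, unsigned len, uint32_t *out,
                         bool *below_low)
{
    if (len == 0u || len > 32u || len > f->count)
        return false;
    uint32_t v = 0;
    for (unsigned i = 0; i < len; i++) {
        uint32_t word = f->rd / 32u;
        uint32_t pos  = 31u - (f->rd % 32u);
        v = (v << 1) | ((f->mem[word] >> pos) & 1u);
        f->rd = (f->rd + 1u) % FIFO_BITS;   /* update modulo FIFO length */
    }
    f->count -= len;
    *out = v;
    *below_low = f->count < f->low_wm;
    return true;
}
```

In this model, depositing the 3-bit field 101 and then a 10-bit field, followed by a 3-bit extract, returns 101 again. Once a pointer passes bit 127 it wraps to bit 0, which corresponds to the modulo-FIFO-length pointer update recited above. The two boolean outputs stand in for the watermark signals that would prompt a fill from, or a spill to, external storage.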