TWI394461B

TWI394461B - Image deblocking filter and image processing device utilizing the same

Info

Publication number: TWI394461B
Application number: TW98137512A
Authority: TW
Inventors: Yu Chen Shen
Original assignee: Hon Hai Prec Ind Co Ltd
Priority date: 2009-11-05
Filing date: 2009-11-05
Publication date: 2013-04-21
Also published as: TW201117616A

Description

Image deblocking filter and image processing device

The present invention relates to image processing technology, and in particular to an image deblocking filter and an image processing apparatus using the image deblocking filter.

Current video coding standards generally treat a video frame as a combination of video blocks, each block being the basic unit of intra-frame or inter-frame coding. For example, the MPEG-4 standard developed by the Moving Picture Experts Group (MPEG) divides video frames into macroblocks. Different standards support video blocks of different sizes. The H.264 standard supports video blocks of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels.

Although block-based intra-frame or inter-frame coding exploits the temporal and spatial similarity of video frames to achieve high compression ratios, it causes blocky discontinuities at the boundaries between blocks in a video sequence. Different video coding techniques, such as H.264, VC-1, and MPEG-2, use different deblocking methods to mitigate these blocking artifacts. A single video coding technique may also use several deblocking formulas.

Different deblocking methods are usually implemented with dedicated deblocking circuits. Integrating these dedicated circuits into one device to support different video coding methods may complicate circuit design and make circuit miniaturization more difficult, and the circuit design must be changed again whenever the device is to support a new deblocking method. Using a general-purpose processor to execute different video coding methods is more flexible but comparatively inefficient.

To solve the above problems, the present invention provides an image deblocking filter comprising an instruction memory, an execution unit, and a decision unit. The instruction memory stores a plurality of instructions, wherein a first instruction among the plurality of instructions represents three arithmetic operations and two bitwise operations acting on three variable operands and two constant operands. The execution unit executes instructions stored in the instruction memory and comprises a first arithmetic logic unit, which executes the first instruction within a single clock cycle. The decision unit executes instructions stored in the instruction memory to determine one of a first fetch path and a second fetch path along which a portion of the plurality of instructions is fetched from the instruction memory to the execution unit. When the execution unit executes instructions according to the first fetch path or the second fetch path, it performs a deblocking formula of a first or a second image compression standard, respectively, and the determined fetch path includes the first instruction. When executed by the execution unit, the first instruction directs the image deblocking filter to perform the three arithmetic operations and two bitwise operations, comprising: a left shift operation, as a first bitwise operation, applied to a first variable operand; a first addition operation, as a first arithmetic operation, applied to the left-shifted first variable operand and a second variable operand; a second addition operation or a subtraction operation, as a second arithmetic operation selected by a mode setting of the first instruction, wherein the second addition operation adds a third variable operand to the result of the first addition operation, and the subtraction operation subtracts the third variable operand from the result of the first addition operation; a third addition operation, as a third arithmetic operation, applied to the result of the second arithmetic operation and a first constant operand; and a right shift operation, as a second bitwise operation, which shifts the result of the third addition operation to the right according to the value of a second constant operand.

Embodiments of the present invention also provide an image processing apparatus comprising an image deblocking filter, the image deblocking filter comprising an instruction memory, an execution unit, and a decision unit. The instruction memory stores a plurality of instructions, wherein a first instruction among the plurality of instructions represents three arithmetic operations and two bitwise operations acting on three variable operands and two constant operands. The execution unit executes instructions stored in the instruction memory and comprises a first arithmetic logic unit, which executes the first instruction within a single clock cycle. The decision unit executes instructions stored in the instruction memory to determine one of a first fetch path and a second fetch path along which a portion of the plurality of instructions is fetched from the instruction memory to the execution unit. When the execution unit executes instructions according to the first fetch path or the second fetch path, it performs a deblocking formula of a first or a second image compression standard, respectively, and the determined fetch path includes the first instruction. When executed by the execution unit, the first instruction directs the image deblocking filter to perform the three arithmetic operations and two bitwise operations, comprising: a left shift operation, as a first bitwise operation, applied to a first variable operand; a first addition operation, as a first arithmetic operation, applied to the left-shifted first variable operand and a second variable operand; a second addition operation or a subtraction operation, as a second arithmetic operation selected by a mode setting of the first instruction, wherein the second addition operation adds a third variable operand to the result of the first addition operation, and the subtraction operation subtracts the third variable operand from the result of the first addition operation; a third addition operation, as a third arithmetic operation, applied to the result of the second arithmetic operation and a first constant operand; and a right shift operation, as a second bitwise operation, which shifts the result of the third addition operation to the right according to the value of a second constant operand.

100, 101‧‧‧Image processing apparatus

151‧‧‧Processor

152‧‧‧Main memory

153‧‧‧Non-volatile memory

154‧‧‧Mass storage device

155‧‧‧Content protection unit

156‧‧‧Demodulator

157‧‧‧Tuner

158‧‧‧Power supply

159‧‧‧Crystal oscillator

160‧‧‧I/O unit

161‧‧‧Audio output unit

162‧‧‧Video output unit

163‧‧‧Antenna

164‧‧‧Port

165, 200‧‧‧Image deblocking filter

170‧‧‧Network interface

171‧‧‧Network

210‧‧‧Internal memory

211‧‧‧Configuration register

212‧‧‧Main finite state machine

213‧‧‧Dynamic link library FSM

214‧‧‧Instruction memory

215‧‧‧Data loading FSM

216‧‧‧Write-back FSM

217‧‧‧Video decoder

218‧‧‧External memory

220‧‧‧Execution unit

230‧‧‧Image data

300‧‧‧Pixel loading FSM

310‧‧‧Decision stage

312‧‧‧Mode register

313, 323, 333‧‧‧Pixel register

314, 324, 334‧‧‧Fetch stage

315, 325, 335‧‧‧Decode stage

316, 326, 336‧‧‧Execution stage

322, 332‧‧‧Path register

320‧‧‧Operation stage

330‧‧‧Operation stage

340‧‧‧Pixel update FSM

351-353‧‧‧Memory region

B1-B24‧‧‧Block

51-54‧‧‧Edge

55‧‧‧Macroblock

56‧‧‧Region

50‧‧‧Unit pixel set

L00-L04‧‧‧Instruction

L10-L14‧‧‧Instruction

L20-L24‧‧‧Instruction

L30-L34‧‧‧Instruction

41, 42‧‧‧Bus

3261, 3262‧‧‧Arithmetic logic unit

3263‧‧‧Multiplexer

3264‧‧‧Pixel transfer circuit

51-56‧‧‧Register

61-65‧‧‧Operator subcircuit

600‧‧‧Partial circuit of the arithmetic logic unit

FIG. 1A is a structural block diagram of an embodiment of an image processing apparatus 100 including an image deblocking filter 165.

FIG. 1B is a structural block diagram of a second embodiment of an image processing apparatus, which receives digital content from a network.

FIG. 2 is a block diagram of an embodiment of the image deblocking filter.

FIG. 3 is a block diagram of an embodiment of the execution unit of the image deblocking filter.

FIG. 4 is a schematic diagram of an image.

FIGS. 5-8 show operation stage instruction sequences.

FIG. 9 is a schematic diagram of the structure of an operation stage.

FIG. 10 is a schematic diagram of an arithmetic logic unit.

FIG. 11 shows the instruction set of the decision stage.

FIG. 12 shows the instruction set of the operation stages.

Embodiments of the image deblocking filter and of an image processing apparatus using it are described below.

1. System Overview

The image deblocking filter disclosed in the present invention can be implemented in a wide range of image processing apparatuses, such as optical disc players, multimedia players, digital cameras, set-top boxes, personal digital assistants (PDAs), laptop or desktop computers, or any device with image processing capability, such as a television, a mobile phone, or a video conferencing device. FIG. 1A is a structural block diagram of an image processing apparatus 100 including an image deblocking filter 165.

1.1 Embodiment of Image Processing Apparatus

The image deblocking filter 165 is integrated in the central processing unit of the image processing apparatus 100, namely the processor 151. The processor 151 may be packaged as a single chip or as multiple chips, and the multiple chips may be connected by a bus. The power supply 158 supplies power to the components of the image processing apparatus 100. The crystal oscillator 159 provides clock signals to the processor 151 and to the other components of the image processing apparatus 100. FIG. 1A shows how the components of the image processing apparatus 100 are connected; the connections may be serial or parallel buses. The input/output devices include control buttons, a seven-segment display, and an infrared receiver or transceiver that communicates with a remote control.

One of the ports 164 may be connected to an external computer to debug the image processing apparatus 100. The ports 164 may be physical ports conforming to Recommended Standard 232 (RS-232) and/or Recommended Standard 11 (RS-11) defined by the Electronic Industries Association (EIA), Serial ATA (SATA), and/or High-Definition Multimedia Interface (HDMI). The non-volatile memory 153 stores the operating system and the applications executed by the processor 151. The processor 151 loads running programs and data into the main memory 152 and stores digital content in the mass storage device 154. The main memory 152 may be random access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM). The non-volatile memory 153 may be electrically erasable programmable read-only memory (EEPROM), such as NOR flash memory or NAND flash memory.

The content protection unit 155 provides access control for the digital content produced by the image processing apparatus 100, and includes the memory and devices needed to implement the Digital Video Broadcasting Common Interface (DVB-CI) and/or Conditional Access (DVB-CA). The image processing apparatus 100 may obtain digital content from digital signals delivered through the antenna 163, the tuner 157, and the demodulator 156. FIG. 1B shows another embodiment, in which the image processing apparatus 101 obtains digital content from a network such as the Internet through a network access interface. The video output unit 162 includes filters and amplifiers that filter and amplify the video signals output by the processor 151. The audio output unit 161 includes a digital-to-analog converter that converts the audio signals output by the processor 151 from digital format to analog format.

1.2 Embodiment of Image Deblocking Filter

FIG. 2 is a structural block diagram of an image deblocking filter 200, an embodiment of the image deblocking filter 165 of FIGS. 1A and 1B. The constituent elements of the image deblocking filter 200 in FIG. 2 may be implemented as circuits. The image deblocking filter 200 is connected to a video decoder 217 and an external memory 218. The video decoder 217 decodes images or video in various formats, for example the MPEG-1, MPEG-2, and MPEG-4 standards developed by MPEG, H.263 and H.264 developed by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), QuickTime™ developed by Apple Computer Inc., or VC-1 developed by Microsoft Corporation. The video decoder 217 can write settings into the configuration register 211, and the main finite state machine (FSM) 212 controls the operations of the image deblocking filter 200 according to the settings in the configuration register 211.

The external memory 218 may comprise the main memory 152 or a cache memory whose capacity is specified relative to that of the internal memory 210. A preferred embodiment of the internal memory 210 comprises SRAM. The external memory 218 stores deblocking instructions and the image data decoded by the decoder 217. The dynamic link library (DLL) FSM 213 loads deblocking instructions from the external memory 218 into the instruction memory 214, and the data loading FSM 215 loads image data from the external memory 218 into the internal memory 210. The execution unit 220 fetches deblocking instructions from the instruction memory 214, obtains image data from the internal memory 210, applies the fetched instructions to the obtained image data to perform deblocking, and stores the deblocked image data in the internal memory 210. The write-back FSM 216 stores the deblocked image data to the external memory 218.

2. Operation Example of Image Deblocking Filter

FIG. 3 shows an embodiment of the execution unit 220. The constituent elements of the execution unit 220 in FIG. 3 may be implemented as circuits. The instruction memory 214 may include memory regions 351-353. After the decoder 217 finishes decoding a unit of image data (for example, a still image, a video frame, or a macroblock), it outputs the image data to the external memory 218. The main FSM 212 determines the format of the image data and writes into the configuration register 211 the register setting of the deblocking mode suited to that format. The data loading FSM 215 loads the decoded image data into the internal memory 210 as image data 230.

A deblocking mode comprises a plurality of deblocking decisions, and each deblocking decision is associated with one of a plurality of deblocking formulas. Each deblocking decision consists of decision stage instructions stored in memory region 351, and each deblocking formula consists of operation stage instructions stored in memory regions 352 and 353. The pixel loading FSM 300 writes the register setting corresponding to the deblocking mode into the mode register 312 and loads the image data 230 from the internal memory 210 into the pixel register 313. In the decision stage 310, the fetch stage 314 fetches decision stage instructions from memory region 351 based on the setting in the mode register 312; the decode stage 315 decodes and the execution stage 316 executes the fetched decision stage instructions, thereby determining the instruction fetch path that corresponds to one deblocking formula, and the register setting corresponding to the determined fetch path is written into a path register, for example path registers 322 and 332. The register value of path register 322 can be forwarded to path register 332.

The pixel data in pixel register 313 can be forwarded to pixel register 323. In operation stage 320, the fetch stage 324 fetches operation stage instructions from memory region 352 based on the setting in path register 322; the decode stage 325 decodes and the execution stage 326 executes the fetched operation stage instructions to produce intermediate data or deblocked pixel values, which are stored in pixel register 323. The pixel data in pixel register 323 can be forwarded to pixel register 333, and the setting in path register 322 can be forwarded to path register 332. Likewise, in operation stage 330, the fetch stage 334 fetches operation stage instructions from memory region 353 based on the setting in path register 332; the decode stage 335 decodes and the execution stage 336 executes the fetched operation stage instructions to produce intermediate data or deblocked pixel values, which are stored in pixel register 333. The setting of path register 332 can be received either from the output of the execution stage 316 of the decision stage 310 or from path register 322.

The fetch stage 324, decode stage 325, and execution stage 326 form three pipeline stages within operation stage 320. Path register 322 and pixel register 323 may be included in the fetch stage 324. The decode stage 325 and the execution stage 326 may be merged into a single stage that completes instruction decoding and execution within one clock cycle. Likewise, the fetch stage 334, decode stage 335, and execution stage 336 form three pipeline stages within operation stage 330. Path register 332 and pixel register 333 may be included in the fetch stage 334. The decode stage 335 and the execution stage 336 may be merged into a single stage that completes instruction decoding and execution within one clock cycle.

The pixel update FSM 340 stores the deblocked pixel values in the internal memory 210 to update the image data 230. The write-back FSM 216 transfers the updated image data 230 to the external memory 218 and stores it there.

The instruction set of the image deblocking filter 200 is described in the following paragraphs. FIG. 11 shows the decision stage instructions. The text in the "Description" column uses pseudocode to describe what a stage does when it executes each instruction.

Each of the instructions SAC, APR, and MFA carries mode setting bits: one bit specifies whether the instruction is in dual or non-dual mode, and another bit specifies whether the instruction is an end instruction (end) or a non-end instruction (not-end). When an operation stage executes an instruction in dual mode, the instruction directs the operation stage to perform a single instruction stream, multiple data streams (SIMD) operation. When an operation stage executes an end instruction, the end instruction directs the operation stage to hand the set of instructions that follows the end instruction over to another operation stage for execution.

R1, R2, R3, TP, and TF are register names, and A, A1, and A2 are variables referred to as constant operands. F1-F4 are two-state flags. The symbol "=" denotes an assignment operation, which assigns the value on the right of "=" to the variable or register on the left. "TF=(| R1-R2 |<R3)?1:0" means that when the SAC instruction is executed, TF is set to 1 if (| R1-R2 |<R3) is true and to 0 otherwise. When the JMP instruction is executed, execution switches to the address indicated by A only if flag F1 is true. When the APR instruction is executed, the value of A1 is assigned to register TP if flag F1 is true, and the value of A2 is assigned to register TP otherwise. The symbol "&" denotes a logical AND operation. FIG. 12 shows the operation stage instruction set.
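The semantics of SAC, APR, and JMP stated above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit of the patent: the function names, the use of plain integers for registers and flags, and the fall-through address `pc + 1` for an untaken JMP are all hypothetical, and the mode setting bits and the other instructions of FIG. 11 are not modeled.

```python
def sac(r1: int, r2: int, r3: int) -> int:
    """SAC: TF = (|R1 - R2| < R3) ? 1 : 0."""
    return 1 if abs(r1 - r2) < r3 else 0


def apr(f1: int, a1: int, a2: int) -> int:
    """APR: TP receives A1 when flag F1 is true, otherwise A2."""
    return a1 if f1 else a2


def jmp(pc: int, f1: int, a: int) -> int:
    """JMP: continue at address A only when flag F1 is true.

    Falling through to pc + 1 is an assumption made for this sketch.
    """
    return a if f1 else pc + 1


# Hypothetical values: a threshold test on two neighboring pixel values.
tf = sac(120, 100, 25)   # |120 - 100| = 20 < 25, so TF is 1
tp = apr(tf, 7, 9)       # F1 true selects A1
next_pc = jmp(4, tf, 10)
```

A threshold comparison of this kind is the sort of test a deblocking decision needs before a fetch path is chosen, which is why SAC produces a one-bit result that later instructions can consume as a flag.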

TR is a register name. A and S are variables referred to as constant operands. The symbols "<<" and ">>" denote the bitwise left shift and right shift operations, respectively. Specifically, for a variable x and a positive integer variable y, x>>y denotes the arithmetic right shift of the two's complement representation of x by y bits; the bits shifted into the most significant bit (MSB) positions take the value of the MSB of x before the shift. Likewise, x<<y denotes the left shift of the two's complement representation of x by y bits; the bits shifted into the least significant bit (LSB) positions are 0. The ASH instruction carries a mode setting bit that specifies the operation between registers R2 and R3. When the mode setting bit specifies the addition "+", the ASH instruction represents TR=[(R1<<1)+R2+R3+A]>>S. When the mode setting bit specifies the subtraction "-", the ASH instruction represents TR=[(R1<<1)+R2-R3+A]>>S. The function Clip( ) is called the clipping function and is defined as follows:

Clip(a, b, c) = b, if a < b; c, if a > c; a, otherwise.

The variables b and c are the lower limit and the upper limit, respectively, used to clip the variable a. When an execution stage executes a clipping instruction, such as CLP or UCP, the clipping instruction directs the image deblocking filter to perform a clipping operation that restricts the value of the instruction's operand to the interval between the lower and upper limits.
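The ASH and clipping operations described above can be illustrated with a short Python sketch. It is an assumption-laden model rather than the hardware: the function names and the boolean `subtract` parameter (standing in for the mode setting bit) are hypothetical, and unbounded Python integers replace fixed-width registers. Python's `>>` on negative integers behaves as an arithmetic shift, which matches the MSB-filling right shift described in the text.

```python
def ash(r1: int, r2: int, r3: int, a: int, s: int, subtract: bool = False) -> int:
    """ASH: TR = [(R1 << 1) + R2 +/- R3 + A] >> S.

    `subtract` models the mode setting bit that selects "-" instead of "+".
    """
    third = -r3 if subtract else r3
    return ((r1 << 1) + r2 + third + a) >> s


def clip(a: int, b: int, c: int) -> int:
    """Clip: restrict a to the interval [b, c] (b lower limit, c upper limit)."""
    if a < b:
        return b
    if a > c:
        return c
    return a


# Hypothetical operand values.
added = ash(3, 4, 5, 1, 1)                       # (6 + 4 + 5 + 1) >> 1
subtracted = ash(3, 4, 5, 1, 1, subtract=True)   # (6 + 4 - 5 + 1) >> 1
limited = clip(300, 0, 255)
```

Folding the shift, the additions, and the rounding constant into one instruction mirrors the shape of many deblocking formulas, which typically combine a doubled sample, neighboring samples, and a rounding offset before a final right shift.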

FIG. 4 shows the image blocks of the image data 230 and the edges between the blocks. The image data 230 includes a region 56, which comprises blocks B17-B24 and a macroblock 55 made up of blocks B1-B16. As illustrated, each of the blocks B1-B24 contains 16 pixels, and each pixel is drawn as a square. The arrangement of blocks and pixels in FIG. 4 reflects their arrangement in the image data 230. After the data loading FSM 215 loads the image data 230 from the external memory 218 into the internal memory 210, the pixel loading FSM 300 loads one unit pixel set of the image data 230 into the pixel register 313 in order to deblock an edge of the image data 230. For example, the pixel loading FSM 300 loads the unit pixel set 50, which comprises pixels P0-P3 of block B17 and pixels Q0-Q3 of block B1, into the pixel register 313. Arrows 51-54 each represent a horizontal edge of the macroblock 55. The edge represented by an arrow comprises the two rows of blocks on either side of the arrow. For example, the edge represented by arrow 51 comprises blocks B1-B4 and B17-B20. While deblocking the edge represented by arrow 51, the pixel loading FSM 300 may load the 8 pixels to the right of pixels P0-P3 and Q0-Q3 as the next unit pixel set for the subsequent deblocking operation, and so on.
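The traversal of unit pixel sets along a horizontal edge can be sketched as follows. The sketch is hypothetical in several respects: the image is modeled as a list of rows, `edge_row` is the index of the first row below the edge, and P0 and Q0 are taken to be the pixels nearest the edge (the H.264 convention, which the text above does not state explicitly).

```python
from typing import Iterator, List, Tuple


def unit_pixel_sets(
    image: List[List[int]], edge_row: int, x_start: int, x_end: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield one (P, Q) unit pixel set per column along a horizontal edge.

    P[0..3] lie above the edge and Q[0..3] below it, with index 0 assumed
    to be nearest to the edge.
    """
    for x in range(x_start, x_end):
        p = [image[edge_row - 1 - i][x] for i in range(4)]
        q = [image[edge_row + i][x] for i in range(4)]
        yield p, q


# Hypothetical 8x4 image: one 4x4 block above the edge, one below it.
image = [[row * 10 + col for col in range(4)] for row in range(8)]
sets = list(unit_pixel_sets(image, edge_row=4, x_start=0, x_end=4))
```

Each yielded pair corresponds to one load into the pixel register; stepping `x` to the right reproduces the "next 8 pixels to the right" order described for the edge represented by arrow 51.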

Deblocking operations can be applied separately to the luma and the chroma of the unit pixel set 50. The order in which the vertical and horizontal edges of an image are deblocked is specified in each image coding standard and is therefore not repeated here.

2.1 Example 1 of the execution instruction

The deblocking formulas of the various coding standards are represented by different instruction extraction paths (also called instruction execution paths). Each extraction path comprises a plurality of operation-stage instructions that are extracted to an operation stage. The decision stage 310 executes instructions to output the extraction path. For example, the decision stage 310 outputs the extraction path corresponding to the following deblocking formula:

P1bSLT4(P2, P1, P0, Q0) = P1 + Clip3(-TC0, TC0, (P2 + ((P0 + Q0 + 1) >> 1) - (P1 << 1)) >> 1) (2)

where TC0 is a variable. The function Clip3(x, y, z) in formula (2) is defined in H.264 as:

Clip3(x, y, z) = x if z < x; y if z > y; z otherwise (3)

Formula (2) is the formula used in the H.264 standard to calculate the value of pixel P1. The decision stage 310 executes the decision-stage instructions corresponding to the deblocking strategy and thereby determines, from the boundary strength and the sampling flag (SF) of each edge of the image data 230, the extraction path comprising instructions L00-L04. The SF may include the flag FilterSamplesFlag defined in the H.264 standard. FIG. 5 shows the extraction path comprising instructions L00-L04. A comment on each instruction appears below it after the symbol "//", where W, X, Y and Z are variables.

The extraction stage 324 sequentially extracts instructions L00-L04 to the decoding stage 325. The decoding stage 325 and the execution stage 326 decode and execute instructions L00-L04, respectively. Instruction L00 represents X=-P1 and, when executed, outputs that result. Instruction L01 represents Y=(P0+Q0+1)>>1 and, when executed, outputs that result. Instruction L02 represents Z=[((-P1)<<1)+Y+P2]>>1 and, when executed, outputs that result. Instruction L03 represents W=Clip(Z,-TC0,TC0) and, when executed, outputs that result. Instruction L04, when executed, outputs the result of formula (2) to update pixel P1.
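The decomposition of formula (2) into instructions L00-L04 can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, L04 is assumed to compute P1+W (the text says only that it outputs the result of formula (2)), and Python's `>>` on negative integers is taken to stand in for an arithmetic right shift in hardware.

```python
def clip3(lo, hi, v):
    """Clamp v to the range [lo, hi], following the H.264 Clip3 argument order."""
    return lo if v < lo else hi if v > hi else v


def p1_direct(p2, p1, p0, q0, tc0):
    """Formula (2) evaluated as a single expression."""
    return p1 + clip3(-tc0, tc0, (p2 + ((p0 + q0 + 1) >> 1) - (p1 << 1)) >> 1)


def p1_by_path(p2, p1, p0, q0, tc0):
    """The same value computed step by step, mirroring instructions L00-L04."""
    x = -p1                        # L00: X = -P1
    y = (p0 + q0 + 1) >> 1         # L01: Y = (P0 + Q0 + 1) >> 1
    z = ((x << 1) + y + p2) >> 1   # L02: Z = [(X << 1) + Y + P2] >> 1
    w = clip3(-tc0, tc0, z)        # L03: W = Clip(Z, -TC0, TC0)
    return p1 + w                  # L04 (assumed form): P1 + W


print(p1_by_path(60, 70, 80, 90, 4))  # prints 72
```

Both functions return the same value for any integer inputs, because the step sequence is an algebraic regrouping of formula (2).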

Each execution stage, such as execution stages 326 and 336, includes two arithmetic logic units (ALUs). When the bit of an instruction that indicates dual mode is asserted, a decoding stage (for example decoding stage 325 or 335) generates two instances of the instruction while decoding it, and outputs the two instances, together with the two corresponding operand sets, to the execution stage connected to that decoding stage. A multiplexer can select two pixel sets from the pixel register and output them to the two ALUs as the two operand sets. The two ALUs in the execution stage execute the two instances of the instruction on the two operand sets, respectively, within the same clock cycle. The pixels in one of the two operand sets are symmetric to the pixels in the other.

For example, as shown in FIG. 9, the execution stage 326 includes ALUs 3261 and 3262. When decoding the instruction L00, whose dual-mode bit is asserted, the decoding stage 325 generates two instances of instruction L00 and outputs them to the execution stage 326. The multiplexer 3263 outputs pixels P1 and Q1 from the pixel register to ALUs 3261 and 3262, respectively. As shown in FIG. 6, the decoding stage 325 generates instruction L10 as a duplicate instance of instruction L00. In the geometric arrangement of the unit pixel set 50, pixel Q1, the operand of instruction L10, is symmetric to pixel P1, the operand of instruction L00, with respect to the edge indicated by arrow 51. In the execution stage 326, ALU 3261 executes instruction L00 on operand P1 in one clock cycle, and ALU 3262 executes instruction L10 on operand Q1 in the same clock cycle. Similarly, the decoding stage 325 generates instructions L11, L12, L13 and L14 as duplicate instances of instructions L01, L02, L03 and L04, and ALU 3262 executes these duplicate instances.
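The dual-mode behaviour described above can be modelled in a few lines of Python. This is a minimal sketch, not the circuit itself: `mirror` and `run_dual` are hypothetical helper names, pixel values are made up, and the two ALUs are modelled as two sequential function calls rather than parallel hardware.

```python
def mirror(name):
    """Map a pixel name to its counterpart across the edge (P1 <-> Q1)."""
    if name[0] in "PQ" and name[1:].isdigit():
        return ("Q" if name[0] == "P" else "P") + name[1:]
    return name  # non-pixel operands such as TC0 are shared by both instances


def run_dual(op, operands, pixels):
    """Apply op to the named operands and to their mirrored counterparts."""
    first = op(*(pixels[n] for n in operands))            # stands in for ALU 3261
    second = op(*(pixels[mirror(n)] for n in operands))   # stands in for ALU 3262
    return first, second


pixels = {"P0": 80, "P1": 70, "Q0": 90, "Q1": 95}  # illustrative values only

# L00 (X = -P1) and its duplicate instance L10 (X = -Q1)
print(run_dual(lambda v: -v, ["P1"], pixels))  # prints (-70, -95)
```

Because the P-side and Q-side formulas have the same shape, one decoded instruction plus a mirrored operand selection is enough to update both sides of the edge.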

FIG. 10 is a schematic diagram of part of the structure of an ALU in an execution stage. The arrows in FIG. 10 show how the elements are connected. Circuit 600 may be an implementation of any ALU in operation stage 320 or 330, and executes an ASH instruction in a single clock cycle. Registers 51-55 store the operands R1, R2, R3, A and S of the ASH instruction, respectively. Operator circuit 61 shifts the value of register 51 left by one bit and outputs the shifted result. Operator circuit 62 adds the output of operator circuit 61 to the value of register 52 and outputs the sum to operator circuit 63. Register 56 stores a mode setting bit obtained by decoding the instruction, which specifies whether addition or subtraction is performed between R2 and R3. Operator circuit 63 performs addition or subtraction on the operands it receives, according to the bit value stored in register 56, and outputs the result. Operator circuit 64 adds the output of operator circuit 63 to the value of register 54 and outputs the sum to operator circuit 65. Operator circuit 65 shifts the output of operator circuit 64 right by the amount specified by the value of register 55 and outputs the shifted result.
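The data path of circuit 600 can be written out as a small Python function. This is a sketch under assumptions: the function name `ash` and the `subtract` parameter are illustrative, register widths and overflow are not modelled, and the mappings of instructions L02 and L34 onto ASH in the usage lines are guesses that happen to fit the formulas, not something the text states.

```python
def ash(r1, r2, r3, a, s, subtract=False):
    """Model of the ASH data path: ((R1 << 1) + R2 +/- R3 + A) >> S."""
    shifted = r1 << 1                                  # operator circuit 61
    total = shifted + r2                               # operator circuit 62
    total = total - r3 if subtract else total + r3     # operator circuit 63 (mode bit)
    total = total + a                                  # operator circuit 64
    return total >> s                                  # operator circuit 65


# Assumed mapping of L02: Z = [((-P1) << 1) + Y + P2] >> 1 with P1=70, Y=85, P2=60
print(ash(-70, 85, 60, 0, 1))                  # prints 2

# Assumed mapping of a VC-1 style step: (2X - J + 4) >> 3 with X=10, J=25
print(ash(10, 0, 25, 4, 3, subtract=True))     # prints -1
```

Folding the shift, the three additions and the final shift into one instruction is what lets a single ALU pass cover a whole sub-expression of a deblocking formula.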

When the decoded mode setting bit of an instruction indicates that the instruction is an end instruction, the decoding stage 325 notifies the extraction stage 334, rather than the extraction stage 324, to fetch the set of instructions that follows the end instruction. The decoding stage 325 can carry out this notification by moving the register setting in the path register 322 to the path register 332. In addition, as shown in FIG. 9, the pixel transfer circuit 3264 transfers pixels from the pixel register 323 to the pixel register 333 in response to the setting bit of the end instruction carried on bus 41. The decoding stage 335 and the execution stage 336 then decode and execute that set of instructions, respectively, which realizes a pipelined execution architecture between the execution stages. For example, the decoding stage 325 can decode the instructions in FIG. 7 in order, from the first to the last. On decoding instruction L22, whose end-instruction mode setting bit is asserted, the decoding stage 325 notifies the extraction stage 324 not to fetch the instructions L23-L24 that follow L22, and notifies the extraction stage 334 to fetch them. The decoding stage 335 and the execution stage 336 decode and execute instructions L23-L24, respectively. Under the control of the control signal on bus 42, the control circuit 3265 enables the pixel data output by the execution stage 326 to be written back to the corresponding registers in the pixel register 323.
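The hand-off at an end instruction amounts to splitting one extraction path between the two operation stages. The Python sketch below shows only that split; `split_at_end` is a hypothetical name, the instructions before L22 (written here as L20 and L21) are assumed for illustration, and the path-register and pixel-register transfers are not modelled.

```python
def split_at_end(path):
    """Split a path of (name, is_end) pairs after the first end instruction.

    Returns (instructions for the first stage, instructions for the second stage).
    """
    for i, (_, is_end) in enumerate(path):
        if is_end:
            first = [name for name, _ in path[: i + 1]]
            second = [name for name, _ in path[i + 1 :]]
            return first, second
    return [name for name, _ in path], []  # no end instruction: nothing handed off


# L22 carries the end-instruction bit; L20 and L21 are assumed predecessors.
path = [("L20", False), ("L21", False), ("L22", True), ("L23", False), ("L24", False)]
print(split_at_end(path))  # prints (['L20', 'L21', 'L22'], ['L23', 'L24'])
```

Once the second stage has taken over L23-L24, the first stage is free to start on the next unit pixel set, which is where the pipelining gain comes from.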

2.2 Example 2 of the execution instruction

The VC-1 standard also uses various formulas for deblocking, including:

a0=(2×(p3-p6)-5×(p4-p5)+4)>>3 (4)

a1=(2×(p1-p4)-5×(p2-p3)+4)>>3 (5)

a2=(2×(p5-p8)-5×(p6-p7)+4)>>3 (6)

Here p1-p8 are pixel values, which may be pixel luma or chroma. For example, the decision stage 310 outputs the execution path corresponding to deblocking formula (4), shown in FIG. 8, which comprises instructions L30-L34, where a0-a2, I, J, X and W are variables. The extraction stage 324 sequentially extracts instructions L30-L34 to the decoding stage 325. The decoding stage 325 and the execution stage 326 decode and execute instructions L30-L34, respectively. Instruction L30 represents X=(p3-p6) and, when executed, outputs that result. Instruction L31 represents W=(p4-p5) and, when executed, outputs that result. Instruction L32 represents I=2W=2(p4-p5) and, when executed, outputs that result. Instruction L33 represents J=2I+W=5(p4-p5) and, when executed, outputs that result. Instruction L34 represents a0=(2X-J+4)>>3 and, when executed, outputs that result, which is the result of formula (4). Similarly, the decision stage 310 can output the execution paths corresponding to formulas (5) and (6). The extraction stage 324 sequentially extracts the instructions in such an execution path to the decoding stage 325, and the decoding stage 325 and the execution stage 326 decode and execute them, respectively.
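The instruction sequence L30-L34 can be checked against formula (4) with a short Python sketch. The function names are hypothetical, and Python's `>>` on negative integers is assumed to match an arithmetic right shift in hardware; the multiplication by 5 is built from shifts and one addition, as the instruction descriptions suggest.

```python
def a0_direct(p3, p4, p5, p6):
    """Formula (4) evaluated as a single expression."""
    return (2 * (p3 - p6) - 5 * (p4 - p5) + 4) >> 3


def a0_by_path(p3, p4, p5, p6):
    """The same value computed step by step, mirroring instructions L30-L34."""
    x = p3 - p6                      # L30: X = p3 - p6
    w = p4 - p5                      # L31: W = p4 - p5
    i = w << 1                       # L32: I = 2W
    j = (i << 1) + w                 # L33: J = 2I + W = 5W
    return ((x << 1) - j + 4) >> 3   # L34: a0 = (2X - J + 4) >> 3


print(a0_by_path(100, 90, 60, 50))  # prints -6
```

Formulas (5) and (6) have the same shape with different pixel indices, so the same five-step pattern applies once the operands are changed.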

As described above, the image processing device can store instructions that implement the deblocking methods of various image or video technologies or compression standards (for example VC-1, MPEG2 and H.264). The more instructions for such technologies or standards the image processing device integrates, the more flexibly it can support different standards. Each operation stage includes two ALUs to realize SIMD execution. In addition, the two operation stages respond to the register setting bit of an end instruction to balance the instruction execution load between them, thereby realizing a pipeline architecture between the operation stages and increasing the overall efficiency of the deblocking filter. In summary, the deblocking filter circuit, which is suitable for various image processing devices, can be applied to set-top boxes, media players, televisions and video conferencing devices.

In summary, the present invention meets the requirements for an invention patent, and a patent application is filed in accordance with the law. However, the above describes only preferred embodiments of the present invention; equivalent modifications or variations made by those skilled in the art in accordance with the spirit of the invention shall all fall within the scope of the following claims.

220‧‧‧Execution unit

300‧‧‧Pixel loading finite state machine

310‧‧‧Decision stage

312‧‧‧Mode register

313, 323, 333‧‧‧Pixel registers

314, 324, 334‧‧‧Extraction stages

315, 325, 335‧‧‧Decoding stages

316, 326, 336‧‧‧Execution stages

322, 332‧‧‧Path registers

320‧‧‧Operation stage

330‧‧‧Operation stage

340‧‧‧Pixel update finite state machine

351-353‧‧‧Memory areas

Claims

An image deblocking filter, comprising: an instruction memory for storing a plurality of instructions, wherein a first instruction of the plurality of instructions represents three arithmetic operations and two bit operations acting on three variable operands and two constant operands; an execution unit for executing instructions stored in the instruction memory, comprising a first arithmetic logic unit configured to execute the first instruction in a single clock cycle; and a decision unit for executing instructions stored in the instruction memory to determine one of a first extraction path and a second extraction path along which a portion of the plurality of instructions is extracted from the instruction memory to the execution unit; wherein the execution unit, when executing instructions according to the first extraction path or the second extraction path, carries out a deblocking formula of a first or a second image compression standard, respectively, and the determined extraction path includes the first instruction; wherein the first instruction, when executed by the execution unit, directs the image deblocking filter to perform the three arithmetic operations and the two bit operations, comprising: a left shift operation, as a first bit operation, applied to a first variable operand; a first addition operation, as a first arithmetic operation, applied to the left-shifted first variable operand and a second variable operand; a second addition operation or a subtraction operation, as a second arithmetic operation, selected according to a mode setting of the first instruction, wherein the second addition operation adds a third variable operand to the result of the first addition operation, and the subtraction operation subtracts the third variable operand from the result of the first addition operation; a third addition operation, as a third arithmetic operation, applied to the result of the second arithmetic operation and a first constant operand; and a right shift operation, as a second bit operation, which shifts the result of the third addition operation to the right according to the value of a second constant operand.

The image deblocking filter of claim 1, wherein the plurality of instructions include a clipping instruction having an upper limit value and a lower limit value, and the execution unit, when executing the clipping instruction: performs a fourth addition operation on two variable operands; outputs the result of the fourth addition operation when that result is within the range between the lower limit value and the upper limit value; outputs the upper limit value when the result of the fourth addition operation is greater than the upper limit value; and outputs the lower limit value when the result of the fourth addition operation is less than the lower limit value.

The image deblocking filter of claim 1, wherein the execution unit includes a first operation stage and a second operation stage, the first operation stage includes the first arithmetic logic unit, the first operation stage, in response to a mode setting of the first instruction, transfers the set of instructions that follows the first instruction in the determined extraction path to the second operation stage, and the second operation stage executes that set of instructions.

The image deblocking filter of claim 3, wherein the execution unit further comprises a second arithmetic logic unit, and the first and second arithmetic logic units, in response to the mode setting of the first instruction, execute two instances of the first instruction on two operand sets, respectively, in the same clock cycle.

The image deblocking filter of claim 1, wherein the image deblocking filter operates in a set-top box.

An image processing device comprising an image deblocking filter, the image deblocking filter comprising: an instruction memory for storing a plurality of instructions, wherein a first instruction of the plurality of instructions represents three arithmetic operations and two bit operations acting on three variable operands and two constant operands; an execution unit for executing instructions stored in the instruction memory, comprising a first arithmetic logic unit configured to execute the first instruction in a single clock cycle; and a decision unit for executing instructions stored in the instruction memory to determine one of a first extraction path and a second extraction path along which a portion of the plurality of instructions is extracted from the instruction memory to the execution unit; wherein the execution unit, when executing instructions according to the first extraction path or the second extraction path, carries out a deblocking formula of a first or a second image compression standard, respectively, and the determined extraction path includes the first instruction; wherein the first instruction, when executed by the execution unit, directs the image deblocking filter to perform the three arithmetic operations and the two bit operations, comprising: a left shift operation, as a first bit operation, applied to a first variable operand; a first addition operation, as a first arithmetic operation, applied to the left-shifted first variable operand and a second variable operand; a second addition operation or a subtraction operation, as a second arithmetic operation, selected according to a mode setting of the first instruction, wherein the second addition operation adds a third variable operand to the result of the first addition operation, and the subtraction operation subtracts the third variable operand from the result of the first addition operation; a third addition operation, as a third arithmetic operation, applied to the result of the second arithmetic operation and a first constant operand; and a right shift operation, as a second bit operation, which shifts the result of the third addition operation to the right according to the value of a second constant operand.

The image processing device of claim 6, wherein the plurality of instructions include a clipping instruction having an upper limit value and a lower limit value, and the execution unit, when executing the clipping instruction: performs a fourth addition operation on two variable operands; outputs the result of the fourth addition operation when that result is within the range between the lower limit value and the upper limit value; outputs the upper limit value when the result of the fourth addition operation is greater than the upper limit value; and outputs the lower limit value when the result of the fourth addition operation is less than the lower limit value.

The image processing device of claim 6, wherein the execution unit includes a first operation stage and a second operation stage, the first operation stage includes the first arithmetic logic unit, the first operation stage, in response to a mode setting of the first instruction, transfers the set of instructions that follows the first instruction in the determined extraction path to the second operation stage, and the second operation stage executes that set of instructions.

The image processing device of claim 8, wherein the execution unit further comprises a second arithmetic logic unit, and the first and second arithmetic logic units, in response to the mode setting of the first instruction, execute two instances of the first instruction on two operand sets, respectively, in the same clock cycle.

The image processing device of claim 6, wherein the image deblocking filter is a set-top box.