TWI851495B - Page buffer circuit and operating method thereof adapted for page read device - Google Patents
- Publication number
- TWI851495B
- Authority
- TW
- Taiwan
- Prior art keywords
- bit data
- page buffer
- input
- weight
- latch
- Prior art date
Landscapes
- Complex Calculations (AREA)
Abstract
Description
The present disclosure relates to a semiconductor device, and more particularly to a page buffer circuit for a memory device and an operating method thereof.
With the rise of artificial intelligence, the basic operations that artificial intelligence computing requires have been developed, such as vector-vector multiplication (vector-vector-multiply, VVM) and multiply-accumulate (MAC). Because memory offers high-speed access, VVM and MAC operations can be carried out by in-memory computing (IMC) performed in the memory itself.
However, when the bit width of the VVM and MAC operations is large (that is, when multi-bit operations are performed), the execution time required for in-memory computing increases significantly.
To address this issue, improved page read and page buffer mechanisms are needed to read the page data stored in the memory array more efficiently and to work with a pipeline operation mechanism, thereby reducing the execution time of VVM and MAC operations.
According to one aspect of the present disclosure, a page buffer circuit adapted for a page read device is provided, wherein the page read device includes a memory array having a plurality of pages and a plurality of bit lines. The page buffer circuit includes the following elements. A plurality of first latches receive a weight vector from a corresponding one of the pages via the bit lines and import an input vector via a data input/output path, wherein the weight vector has a plurality of weight bit data and the input vector has a plurality of input bit data. A plurality of second latches store the input bit data of the input vector. A plurality of logic operation units are coupled to the first latches to receive the weight bit data and coupled to the second latches to receive the input bit data; each logic operation unit performs a logic operation on a corresponding one of the input bit data and a corresponding one of the weight bit data to generate a logic operation result, and the logic operation result is sent to one of the first latches. A control circuit selectively enables the logic operation units to perform the logic operation.
According to another aspect of the present disclosure, an operating method of a page buffer circuit adapted for a page read device is provided, wherein the page read device includes a memory array having a plurality of pages and a plurality of bit lines. The operating method includes the following steps. A plurality of first latches of the page buffer circuit receive a weight vector from a corresponding one of the pages via the bit lines, and an input vector is imported into the first latches via a data input/output path, wherein the weight vector has a plurality of weight bit data and the input vector has a plurality of input bit data. The input bit data of the input vector are stored in a plurality of second latches of the page buffer circuit. A plurality of logic operation units of the page buffer circuit receive the weight bit data from the first latches and the input bit data from the second latches. Each logic operation unit performs a logic operation on a corresponding one of the input bit data and a corresponding one of the weight bit data to generate a logic operation result. The logic operation result is sent to one of the first latches. A control circuit of the page buffer circuit selectively enables the logic operation units to perform the logic operation.
Other aspects and advantages of the present disclosure will be apparent from the following drawings, detailed description, and claims.
21: Sense amplifier
100: Latch circuit
200: Decoding circuit
300: Logic operation circuit
400: Control circuit
1001, 1001b, 1001c: Page buffer circuit
1500: Memory array
1800: Accumulation circuit
2000: Memory device
31, 32, 33, 34, 3(N-2), 3(N-1): Logic operation unit
311, 312, 314, 321, 322, 324, 331, 332, 334: Input terminal
341, 342, 344: Input terminal
313, 323, 333, 343: Output terminal
42: Multiplexer
426: Input terminal
421, 422, 423, 424: Input terminal
425: Output terminal
PB: Page buffer unit
pg(0), pg(m+1), pg(m): Page
BL1, BL2, BL(M-1), BLM: Bit line
P1: Data input/output path
In: Input vector
We: Weight vector
In(0)~In(3): Bit data
We(0)~We(3): Bit data
L1, L2, L3, L4, L5: Latch
L(N-1), LN: Latch
DL, WDL, CDL: Latch
T1~T4: Operation cycle
t1~t11: Time point
t2’~t10’: Time point
t2”~t7”: Time point
T_ac_1, T_ac_2: Period
T_im_1~T_im_4: Period
T_op_1~T_op_3: Period
T_rd_1, T_rd_2: Period
T_int_rd_1, T_int_rd_2: Period
S100~S118, S200~S206, S300~S308: Step
S400~S412, S600~S610: Step
FIG. 1 is a circuit diagram of a memory device 2000 according to an embodiment of the present disclosure.
FIG. 2A is a circuit diagram of a page buffer circuit 1001 according to an embodiment of the present disclosure.
FIG. 2B is a circuit diagram of a page buffer circuit 1001b according to another embodiment of the present disclosure.
FIG. 2C is a circuit diagram of a page buffer circuit 1001c according to yet another embodiment of the present disclosure.
FIG. 3 is a schematic diagram of the basic operation of the page buffer circuit 1001.
FIG. 4A is a flowchart of the main procedure of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800.
FIG. 4B is a flowchart of the reading procedure of the weight vector We.
FIG. 4C is a flowchart of the import procedure of the input vector In.
FIG. 4D is a flowchart of the VVM operation.
FIGS. 5A to 5H are schematic diagrams of the operation of the page buffer circuit 1001.
FIG. 6 is a flowchart of another embodiment of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800.
FIG. 7 is a timing diagram of the operation of the page buffer circuit 1001 in the embodiment of FIGS. 5A to 5H.
FIG. 8 is a timing diagram of the operation of a vector-vector multiply-adder of a comparative example.
The technical terms in this specification follow the customary terms of the technical field. Where this specification explains or defines a term, that term is interpreted according to the explanation or definition given here. Each embodiment of the present disclosure has one or more technical features. Where implementation is possible, a person of ordinary skill in the art may selectively implement some or all of the technical features of any embodiment, or selectively combine some or all of the technical features of these embodiments.
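Please refer to FIG. 1, which is a circuit diagram of a memory device 2000 according to an embodiment of the present disclosure. The memory device 2000 includes a memory array 1500, a page buffer unit PB, and an accumulation circuit 1800. The memory device 2000 has a configuration suited to performing page read operations (for example, it has a page read feature), so the memory device 2000 may be referred to as a "page read device". Correspondingly, the memory array 1500 is suited to page read operations. For example, the memory array 1500 may be a non-volatile memory or a volatile memory, including NAND flash memory, NOR flash memory, phase-change memory (PCM), dynamic random access memory (DRAM), static random access memory (SRAM), or magnetoresistive random access memory (MRAM). The memory array 1500 may have a two-dimensional (2D) structure or a three-dimensional (3D) structure, and may have a single-plane structure or a multi-plane structure.
The memory array 1500 includes a plurality of pages, for example pages pg(1), ..., pg(m), and pg(m+1). Each page includes a plurality of memory blocks (not shown), and each memory block includes a plurality of memory cells. A memory cell may be an SLC (single-level cell), an MLC (multi-level cell), a TLC (triple-level cell), a QLC (quad-level cell), a PLC (penta-level cell), and so on. The memory cells store data, for example weight data. In this embodiment, each page correspondingly stores a weight vector We.
The memory array 1500 is coupled to the page buffer unit PB via a plurality of bit lines, for example M bit lines BL1, BL2, BL3, BL4, ..., BL(M-1), and BLM. The page buffer unit PB can perform page-read operations on pages pg(1)~pg(m+1) of the memory array 1500. The page buffer unit PB includes a plurality of page buffer circuits, for example M page buffer circuits 1001, 1002, 1003, 1004, ..., 100(M-1), and 100M. The number of page buffer circuits 1001~100M equals the number "M" of bit lines BL1~BLM, and the page buffer circuits 1001~100M are respectively coupled to bit lines BL1~BLM. The weight vector We stored in pages pg(1)~pg(m+1) can be read into the page buffer circuits 1001~100M via the corresponding ones of bit lines BL1~BLM.
The weight vector We may have a bit width "N", where "N" is less than or equal to the number "M". The weight vector We includes bit data We_x(0), We_x(1), ..., We_x(N-1) stored in one of the pages pg(1)~pg(m+1) of the memory array 1500. The bit width "N" is the number of bits of the weight vector We, and the index "x" denotes the x-th dimension. For example, page pg(1) has a data size of 16 KB and can store a total of 64 weight vectors We, each with bit width "4" and dimension "512". Bit data We_x(n) is the bit data of the n-th bit in the x-th dimension. The following paragraphs use a bit width of "4" (that is, N=4) and the first dimension (that is, x=1) as the example. The bit data of the weight vector We then include bit data We(0), We(1), We(2), and We(3), and four page buffer circuits 1001~1004 of the page buffer unit PB process bit data We(0)~We(3) respectively. Bit data We(0)~We(3) can be provided to page buffer circuits 1001~1004 via bit lines BL1~BL4 respectively.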
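The page-capacity figure and the bit-to-circuit mapping in the paragraph above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function name `vectors_per_page` and the dictionary layout are assumptions made for the example, not part of the disclosure.

```python
# Sizes taken from the example in the text; names are hypothetical.
PAGE_BYTES = 16 * 1024   # one page holds 16 KB
BIT_WIDTH = 4            # N, bits per weight element
DIMENSION = 512          # elements per weight vector


def vectors_per_page(page_bytes: int, bit_width: int, dimension: int) -> int:
    """Number of weight vectors that fit in one page."""
    page_bits = page_bytes * 8
    bits_per_vector = bit_width * dimension
    return page_bits // bits_per_vector


# Bit n of the weight vector goes to page buffer circuit 100(n+1) over BL(n+1).
bit_to_circuit = {f"We({n})": (f"BL{n + 1}", 1001 + n) for n in range(BIT_WIDTH)}

print(vectors_per_page(PAGE_BYTES, BIT_WIDTH, DIMENSION))  # 64
print(bit_to_circuit["We(2)"])                             # ('BL3', 1003)
```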
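Each of the page buffer circuits 1001~1004 is coupled to a data input/output path. For example, page buffer circuit 1001 is coupled to data input/output path P1, page buffer circuit 1002 to data input/output path P2, page buffer circuit 1003 to data input/output path P3, and page buffer circuit 1004 to data input/output path P4 (not shown in FIG. 1). The data input/output paths P1~P4 may correspond to bit lines BL1~BL4. An input vector In having a bit width of "4" is imported into a corresponding one of the page buffer circuits 1001~1004 via the corresponding data input/output path P1~P4. Each of the page buffer circuits 1001~1004 performs a logic operation on the input vector In and the weight vector We.
The page buffer circuits 1001~1004 are coupled to the accumulation circuit 1800. The accumulation circuit 1800 performs an accumulation operation on the results of the logic operations performed by the page buffer circuits 1001~1004. The logic operations of the page buffer circuits 1001~1004, together with the accumulation operation of the accumulation circuit 1800, form a vector-vector multiplication (VVM) operation.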
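Next, please refer to FIG. 2A, which is a circuit diagram of the page buffer circuit 1001 of the embodiment of FIG. 1. The page buffer circuit 1001 is coupled to the memory array 1500 via bit line BL1. The page buffer circuit 1001 includes a latch circuit 100, a decoding circuit 200, a logic operation circuit 300, and a control circuit 400. The latch circuit 100 includes a plurality of latches, for example eight latches DL, WDL, CDL, and L1~L5. Latch DL may be called the "first-stage latch" and is located at the first stage of the latch circuit 100. Latch WDL may be called the "weight latch" and is located at an address between latch L1 and latch L2. Latch CDL may be called the "last-stage latch" and is located at the last stage of the latch circuit 100. Latches L2~L5 are located at consecutive addresses between latch WDL and latch CDL. The logic operation circuit 300 includes a plurality of logic operation units 31~34. The number of logic operation units 31~34 equals the number of latches L2~L5 (namely "4").
Latch WDL may be provided optionally, depending on the design constraint of the page buffer circuit 1001. If the design constraint is that the latency (that is, the execution time) of the reading procedure of the weight vector We is less than the latency of the VVM operation procedure on the weight vector We and the input vector In, latch WDL may be provided in the page buffer circuit 1001. If the latency of the reading procedure of the weight vector We need not be considered, latch WDL is not provided.
Bit line BL1 is coupled to the decoding circuit 200 via a sense amplifier (SA) 21, and the decoding circuit 200 is coupled to latch DL. Data carried on bit line BL1 is processed by the sense amplifier 21 and then sent to the decoding circuit 200 for decoding. Taking TLC memory cells in the memory array 1500 as an example, the decoding circuit 200 decodes the 3 bits of data of each TLC cell. In other examples, the memory cells of the memory array 1500 may be SLC, MLC, QLC, or PLC cells. If the memory cells are SLC cells, the decoding circuit 200 is not needed.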
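The logic operation unit 31 has input terminals 311, 312, and 314 and an output terminal 313. Input terminal 311 is coupled to latch L2, and input terminal 312 is coupled to latch WDL. The logic operation unit 31 performs a logic operation on the data stored in latch L2 and the data stored in latch WDL, for example a logic AND, OR, exclusive-OR (XOR), or exclusive-NOR (XNOR) operation. The control circuit 400 sends a control signal to input terminal 314 of the logic operation unit 31 to enable the logic operation unit 31 to perform the logic operation. The operation result is sent to latch CDL via output terminal 313.
The operation of logic operation units 32, 33, and 34, and the coupling of their input and output terminals, are similar to those of the logic operation unit 31. For example, input terminals 321, 331, and 341 of logic operation units 32, 33, and 34 are coupled to latches L3, L4, and L5 respectively. Input terminals 322, 332, and 342 of logic operation units 32, 33, and 34 are commonly coupled to latch WDL. Logic operation units 32, 33, and 34 perform logic operations on the data of latches L3, L4, and L5, respectively, and the data of latch WDL. In this embodiment, logic operation units 31~34 all perform the same type of logic operation, for example a logic AND operation. Output terminals 313, 323, 333, and 343 of logic operation units 31~34 are commonly coupled to latch CDL. The control circuit 400 sends control signals to input terminals 314, 324, 334, and 344 of logic operation units 31~34, so that in any one operation cycle only one of logic operation units 31~34 sends its operation result to latch CDL.
The data input/output path P1 is coupled to latch CDL. The operation result stored in latch CDL is sent to an external circuit (for example, the accumulation circuit 1800) via the data input/output path P1.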
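The wiring described above can be pictured with a small behavioral model. It is a sketch under stated assumptions, not the patented circuit: the class `PageBufferModel`, its field names, and the `OPS` table are hypothetical, each latch is reduced to a single integer bit, and the control circuit 400 is reduced to the index of the one unit it enables.

```python
from dataclasses import dataclass, field

# Single-bit versions of the logic operations named in the text.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}


@dataclass
class PageBufferModel:
    """Hypothetical model: WDL holds one weight bit, L2..L5 hold input bits."""
    wdl: int = 0
    inputs: list = field(default_factory=lambda: [0, 0, 0, 0])  # L2..L5
    cdl: int = 0
    op: str = "AND"

    def step(self, enabled_unit: int) -> int:
        """Enable exactly one unit (0..3 stands for units 31..34) this cycle."""
        self.cdl = OPS[self.op](self.inputs[enabled_unit], self.wdl)
        return self.cdl  # CDL then drives data input/output path P1


pb = PageBufferModel(wdl=1, inputs=[1, 0, 1, 1])
results = [pb.step(k) for k in range(4)]
print(results)  # [1, 0, 1, 1]
```

Only one unit writes CDL per call to `step`, which mirrors the statement that a single logic operation unit delivers its result in each operation cycle.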
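Next, please refer to FIG. 2B, which is a circuit diagram of a page buffer circuit 1001b according to another embodiment of the present disclosure. The page buffer circuit 1001b of this embodiment may be provided with more latches according to design requirements or design constraints, for example N latches L1~LN. Correspondingly, the page buffer circuit 1001b is provided with (N-1) logic operation units 31~3(N-1), which respectively perform logic operations on the data of latches L2~LN and the data of latch WDL.
Next, please refer to FIG. 2C, which is a circuit diagram of a page buffer circuit 1001c according to yet another embodiment of the present disclosure. The page buffer circuit 1001c of this embodiment further includes a multiplexer 42. The multiplexer 42 selects the operation result of one of logic operation units 31~34 and sends it to latch CDL, so the control signals of the control circuit 400 are not needed to enable and select logic operation units 31~34.
Output terminals 313, 323, 333, and 343 of logic operation units 31, 32, 33, and 34 are coupled to input terminals 421, 422, 423, and 424 of the multiplexer 42, respectively, to deliver the operation results. Input terminal 426 of the multiplexer 42 receives a control signal from the control circuit 400, which selects the operation result received at one of input terminals 421, 422, 423, and 424 to be passed to output terminal 425 and then to latch CDL.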
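The multiplexer variant of FIG. 2C can be sketched the same way. The assumption here, which the text does not state explicitly, is that all four logic operation units evaluate continuously and the multiplexer alone decides which result reaches latch CDL; the function names are hypothetical.

```python
def logic_unit_outputs(weight_bit: int, input_bits: list) -> list:
    """Outputs 313, 323, 333, 343: every unit ANDs its input bit with WDL."""
    return [bit & weight_bit for bit in input_bits]


def multiplexer_42(inputs: list, select: int) -> int:
    """Pass one of inputs 421..424 (index 0..3) to output 425."""
    return inputs[select]


outputs = logic_unit_outputs(1, [0, 1, 1, 0])
cdl = multiplexer_42(outputs, select=2)   # control signal arrives at input 426
print(outputs, cdl)  # [0, 1, 1, 0] 1
```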
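Next, please refer to FIG. 3, which is a schematic diagram of the basic operation of the page buffer circuit 1001. The page buffer circuit 1001 performs logic operations on the input vector In and the weight vector We. The input vector In has a bit width of "N" and includes N bit data In(0), In(1), ..., and In(N-1). The bit data In(0)~In(N-1) of the input vector In are imported into the page buffer circuit 1001 sequentially, one after another, via the data input/output path P1.
The weight vector We, on the other hand, is stored in the memory array 1500. Its bit width also equals "N", so it includes N bit data We(0), We(1), ..., and We(N-1). The bit data We(0)~We(N-1) of the weight vector We are stored in one page of the memory array 1500 and are read in parallel, via the corresponding bit lines, into the corresponding page buffer circuits. Taking a bit width "N" of "4" as an example, bit data We(0) is read into page buffer circuit 1001, bit data We(1) into page buffer circuit 1002, bit data We(2) into page buffer circuit 1003, and bit data We(3) into page buffer circuit 1004.
Then, in each of the page buffer circuits 1001~1004, partial-product operations are performed in sequence on the bit data of the input vector In and the corresponding bit data of the weight vector We.
Then, the page buffer circuit 1001 sends the results of the partial-product operations to the accumulation circuit 1800. The accumulation circuit 1800 performs a weighted accumulation operation sequentially to obtain the final result of the VVM/MAC operation.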
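To see why AND partial products followed by weighted accumulation yield a multiplication, the following sketch multiplies two unsigned values bit by bit. The weighting by 2^(i+j) is an assumption (ordinary unsigned binary weighting); the text only says the accumulation is "weighted". The function names are hypothetical.

```python
def to_bits(value: int, width: int) -> list:
    """Little-endian bit list: index n holds bit n."""
    return [(value >> n) & 1 for n in range(width)]


def bit_serial_multiply(inp: int, weight: int, width: int = 4) -> int:
    in_bits = to_bits(inp, width)
    we_bits = to_bits(weight, width)
    acc = 0
    for i in range(width):          # serial: one input bit In(i) per cycle
        for j in range(width):      # in hardware: one page buffer per We(j)
            partial = in_bits[i] & we_bits[j]   # logic AND partial product
            acc += partial << (i + j)           # assumed binary weighting
    return acc


def vvm(inputs: list, weights: list, width: int = 4) -> int:
    """Vector-vector multiply as a sum of bit-serial products."""
    return sum(bit_serial_multiply(a, b, width) for a, b in zip(inputs, weights))


print(bit_serial_multiply(5, 11))   # 55
print(vvm([1, 2, 3], [4, 5, 6]))    # 32
```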
Next, please refer to FIGS. 4A to 4D, which are flowcharts of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800, together with FIGS. 5A to 5H, which are schematic diagrams of the operation of the page buffer circuit 1001. FIGS. 4A to 4D and FIGS. 5A to 5H are described using the example in which the input vector In and the weight vector We both have a bit width of "4" and logic operation units 31~34 all perform a logic AND operation.
First, please refer to FIG. 4A, which is a flowchart of the main procedure of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800. In step S100, the input vector In is imported into the page buffer circuit 1001 via the corresponding bit line (for example, bit line BL1). Then, in step S102, it is confirmed that the import of the input vector In is complete. In addition, step S104 is performed: the weight vector We is read into the page buffer circuit 1001. Step S104 may be performed concurrently with step S100, or before or after step S100. The weight vector We is originally stored in the current page (for example, page pg(m)) of the memory array 1500; it is read into the page buffer circuit 1001 via bit line BL1 and decoded by the decoding circuit 200. The corresponding bit data of the weight vector We (for example, We(0)) is then stored in latch DL of the corresponding page buffer circuit 1001.
In the embodiment of FIG. 4A, the design constraint of the page buffer circuit 1001 is that the latency (that is, the required execution time) of the reading procedure of the weight vector We is less than the latency of the VVM/MAC operation procedure on the weight vector We and the input vector In. The flow of FIG. 4A therefore includes steps S106, S110, and S112, which transfer the weight vector We stored in latch DL to latch WDL. More specifically, in steps S106, S110, and S112, the weight vector We is selectively transferred from latch DL to latch WDL according to the value of a flag. The weight vector We is passed inside the page buffer circuit 1001, as an "internal transfer", from latch DL to latch WDL.
First, step S106 determines whether the value of the flag equals "0". If the result is "yes", step S110 is performed: the value of the flag is triggered to "1". If the result is "no", step S106 is performed again.
When the value of the flag is triggered to "1" in step S110, the weight vector We should be transferred to latch WDL, so step S112 follows: the weight vector We is transferred from latch DL to latch WDL. Then step S114 is performed: the VVM operation on the weight vector We and the input vector In is performed inside the page buffer circuit 1001. The VVM operation of step S114 may include partial-product operations and an accumulation operation. The partial-product operations are performed as follows: partial products of the bit data of the weight vector We and the corresponding bit data of the input vector In are computed in sequence. For example, the partial product of bit data We(0) and bit data In(0) is computed, then that of We(0) and In(1), then that of We(0) and In(2), and so on. The accumulation operation is performed as follows: the results of the partial-product operations are summed. For example, the product of We(0) and In(0) is added to the product of We(0) and In(1), the product of We(0) and In(2) is then added, and so on.
Then step S116 is performed: it is determined whether the partial-product operations on every bit data of the weight vector We and the input vector In have been completed. If the result of step S116 is "yes", step S118 is performed: it is determined whether there is a new request, that is, a request to perform the operation on the next input vector In and the weight vector We of the first page pg(1). If the result of step S116 is "no", step S108 is performed again: the flag is reset to "0".
In step S118, if the result is "no", the flow ends. If the result is "yes", step S100 is performed again to import the new input vector In into the page buffer circuit 1001, and step S104 is performed concurrently to read the new weight vector We’ into the page buffer circuit 1001.
Next, please refer to FIG. 4B, which is a flowchart of the reading procedure of the weight vector We (that is, the detailed flow of step S104 of FIG. 4A). The flow of FIG. 4B can be explained with reference to FIG. 5A, a schematic diagram of the operation of the page buffer circuit 1001. First, step S200 is performed: the weight vector We stored in the current page (for example, page pg(m)) is read from the memory array 1500. Then step S202 is performed: the weight vector We is decoded by the decoding circuit 200.
Then step S204 is performed: the corresponding bit data of the decoded weight vector We are stored in latches DL of the corresponding page buffer circuits 1001~1004. For example, bit data We(0) is stored in latch DL of page buffer circuit 1001, bit data We(1) in latch DL of page buffer circuit 1002, bit data We(2) in latch DL of page buffer circuit 1003, and bit data We(3) in latch DL of page buffer circuit 1004.
Then step S206 is performed: the value of the flag is triggered to "1", and the weight vector We stored in latch DL is transferred to latch WDL. Latches WDL of page buffer circuits 1001~1004 store bit data We(0)~We(3) respectively. For example, latch WDL of page buffer circuit 1001 stores bit data We(0), latch WDL of page buffer circuit 1002 stores bit data We(1), latch WDL of page buffer circuit 1003 stores bit data We(2), and latch WDL of page buffer circuit 1004 stores bit data We(3).
Next, please refer to FIG. 4C, which is a flowchart of the import procedure of the input vector In (that is, the detailed flow of step S100 of FIG. 4A). The flow of FIG. 4C can be explained with reference to FIGS. 5B to 5D, schematic diagrams of the operation of the page buffer circuit 1001.
Referring first to FIG. 5B, in step S300 of FIG. 4C, the first bit data In(0) of the input vector In is imported into latch CDL via the corresponding data input/output path P1. At this time, the count value cnt has its initial value of "0". Then step S302 is performed: bit data In(0) of the input vector In is stored from latch CDL into the corresponding latch L(i) (for example, latch L2). Then, in step S304, it is determined whether the count value cnt equals the bit width N of the input vector In (N equals "4" in this embodiment). If the result is "no", the bit data of the input vector In have not all been imported into the page buffer circuit 1001, so step S306 is performed: the count value cnt is incremented from "0" to "1". Step S300 is then performed again.
Referring also to FIG. 5C, in the repeated step S300, the second bit data In(1) of the input vector In is sent to latch CDL via the data input/output path P1. Then step S302 is performed: bit data In(1) is stored from latch CDL into the corresponding latch L3. Then step S304 is performed: it is determined whether the count value cnt equals the bit width "4" of the input vector In. If the result is "no", step S306 is performed to increment the count value cnt to "2", and step S300 is performed again.
In the same way, in the repeated steps S300 to S306, the remaining two bit data In(2) and In(3) of the input vector In are sent to latch CDL via the data input/output path P1 corresponding to bit line BL1 and then stored in the corresponding latches L4 and L5. Referring to FIG. 5D, latches L2~L5 now store bit data In(0)~In(3) of the input vector In respectively, and the count value cnt has been incremented to "4". Then step S308 is performed: the count value cnt is reset to "0".
In other examples, bit data In(0)~In(3) of the input vector In may be stored in latches L2~L5 in a different order. For example, bit data In(0) may be stored in latch L3, bit data In(1) in latch L2, and so on.
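The import procedure of FIG. 4C amounts to a counted loop that moves one bit at a time through latch CDL. The sketch below is a simplification under assumptions: the function name and dictionary keys are hypothetical, the loop bound is arranged so that exactly N bits are moved and cnt reaches N before the reset, and the default order In(0)→L2 ... In(3)→L5 is used.

```python
def import_input_vector(serial_bits: list, width: int = 4):
    """Move bits In(0)..In(width-1) from path P1 through CDL into L2..L5."""
    latches = {}
    cnt = 0                               # initial count value
    while cnt < width:
        cdl = serial_bits[cnt]            # S300: bit arrives in latch CDL
        latches[f"L{cnt + 2}"] = cdl      # S302: CDL -> latch L(i)
        cnt += 1                          # S306: increment the count value
    final_cnt = cnt                       # equals the bit width at this point
    cnt = 0                               # S308: reset the count value
    return latches, final_cnt


latches, final_cnt = import_input_vector([1, 0, 1, 1])
print(latches)     # {'L2': 1, 'L3': 0, 'L4': 1, 'L5': 1}
print(final_cnt)   # 4
```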
Next, please refer to FIG. 4D, which is a flowchart of the VVM operation (that is, the detailed flow of step S114 of FIG. 4A). The flow of FIG. 4D can be explained with reference to FIGS. 5E to 5H, schematic diagrams of the operation of the page buffer circuit 1001.
First, step S400 is performed: the control circuit 400 controls the enable states of logic operation units 31~34 so that they selectively perform their logic operations in different operation cycles. In this embodiment, the control circuit 400 may control the enable states according to a finite-state machine (FSM), enabling logic operation units 31, 32, 33, and 34 in operation cycles T1, T2, T3, and T4 respectively to perform the logic operation and produce an operation result. For example, as shown in FIG. 5E, in operation cycle T1 the logic operation unit 31 is enabled to perform a logic operation (for example, a logic AND operation) on bit data We(0) and bit data In(0), producing the operation result In(0).We(0). Meanwhile, the weight vector We’ of the next page pg(m+1) of the memory array 1500 is read into the page buffer circuit 1001 via bit line BL1.
Then step S402 is performed: the operation result In(0).We(0) of the logic operation unit 31 is stored in latch CDL. Meanwhile, the decoding circuit 200 decodes the weight vector We’.
Then step S404 is performed: the operation result In(0).We(0) of the logic operation unit 31 is output from latch CDL to the accumulation circuit 1800 for the accumulation operation. Meanwhile, the decoded weight vector We’ is stored in latch DL.
Then step S406 is performed: it is determined whether the count value cnt equals the bit width "4". If the result is "no", step S408 is performed to increment the count value cnt, and steps S400 to S404 are performed again (see FIG. 5F): in the next operation cycle T2, the control circuit 400 enables another logic operation unit 32 to perform a logic AND operation on bit data We(0) and bit data In(1), producing the operation result In(1).We(0). The operation result In(1).We(0) is sent to latch CDL and then output to the accumulation circuit 1800.
In the same way, if step S406 determines that the count value cnt still does not equal the bit width "4", steps S400 to S404 are performed again. As shown in FIG. 5G, in operation cycle T3 the logic operation unit 33 performs a logic AND operation on bit data We(0) and bit data In(2) to produce the operation result In(2).We(0), which is sent to latch CDL and then output to the accumulation circuit 1800 for the accumulation operation. Next, as shown in FIG. 5H, in operation cycle T4 the logic operation unit 34 performs a logic AND operation on bit data We(0) and bit data In(3) to produce the operation result In(3).We(0), which is sent to latch CDL and then output to the accumulation circuit 1800.
If step S406 determines that the count value cnt has reached the bit width "4", step S410 is performed: the result of the accumulation operation of the accumulation circuit 1800 is stored. Then step S412 is performed: the count value cnt is reset to "0".
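The loop of FIG. 4D can be traced cycle by cycle. In this sketch the finite-state machine is reduced to a counter that walks through units 31 to 34; the function name and the shape of the returned log are assumptions made for illustration, and the concurrent read of the next page is left out.

```python
UNITS = [31, 32, 33, 34]   # enabled in operation cycles T1..T4


def vvm_schedule(weight_bit: int, input_bits: list):
    """Return a per-cycle log and the values handed to the accumulation circuit."""
    log = []
    to_accumulator = []
    cnt = 0
    while cnt < len(UNITS):
        unit = UNITS[cnt]                       # S400: enable one unit
        cdl = input_bits[cnt] & weight_bit      # S402: AND result into CDL
        to_accumulator.append(cdl)              # S404: CDL -> circuit 1800
        log.append((f"T{cnt + 1}", unit, cdl))
        cnt += 1                                # S408: increment cnt
    return log, to_accumulator                  # S410/S412 would follow


log, sent = vvm_schedule(weight_bit=1, input_bits=[1, 1, 0, 1])
for entry in log:
    print(entry)
# ('T1', 31, 1)
# ('T2', 32, 1)
# ('T3', 33, 0)
# ('T4', 34, 1)
```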
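On the other hand, refer to FIG. 6, which is a flowchart of another embodiment of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800. In the embodiment of FIG. 6, the latency of the reading procedure of the weight vector We is not considered, so the page buffer circuit 1001 is not provided with latch WDL, and step S604 of FIG. 6 is followed directly by step S606: the VVM operation on the weight vector We and the input vector In is performed. The weight vector We does not need to be transferred from latch DL to latch WDL.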
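Next, refer to FIG. 7, which is a timing diagram of the operation of the page buffer circuit 1001 in the embodiment of FIGS. 5A to 5H. The timing diagram of FIG. 7 can be explained with reference to the flowcharts of FIGS. 4B, 4C, and 4D. First, during time points t0~t4, the four bit data In(0)~In(3) of the input vector In are imported into latch CDL one after another and transferred to the corresponding latches L2~L5 (corresponding to steps S300 to S306 of FIG. 4C). For example, in period T_im_1 (time points t0~t1), bit data In(0) of the input vector In is imported into latch CDL and transferred to latch L2. Next, in period T_im_2 (time points t1~t2), the next bit data In(1) is imported into latch CDL and transferred to latch L3. In period T_im_3 (time points t2~t3), the third bit data In(2) is imported into latch CDL and transferred to latch L4. In period T_im_4 (time points t3~t4), the fourth bit data In(3) is imported into latch CDL and transferred to latch L5. Each of the periods T_im_1~T_im_4 has the same length (for example, 30.72 μs), so their total length is 122.88 μs (that is, 4×30.72 μs).
The page buffer circuit 1001 of the present disclosure is based on a pipeline operation mechanism: during time points t0 to t3, the corresponding bit data of the weight vector We (for example, We(0)) can concurrently be read into latch DL and transferred to latch WDL (corresponding to steps S200 to S206 of FIG. 4B). For example, in period T_rd_1 (time points t0~t2’), the weight vector We is first read into latch DL. The length of period T_rd_1 is, for example, 70 μs. Then, in period T_int_rd_1 (time points t2’~t2”), the weight vector We is transferred to latch WDL. The length of period T_int_rd_1 is, for example, 5 μs.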
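With the example durations given above, the weight read and the internal transfer both finish before the third import period ends, which is what allows them to be hidden behind the import. The check below is a sketch using only those example values, with t0 taken as 0; the variable names are hypothetical.

```python
T_IM = 30.72       # length of each import period T_im_k, in microseconds
T_RD = 70.0        # length of T_rd_1 (page read into latch DL)
T_INT_RD = 5.0     # length of T_int_rd_1 (DL -> WDL internal transfer)
N = 4

t = [k * T_IM for k in range(N + 1)]       # t0..t4 with t0 = 0
t2_prime = t[0] + T_RD                     # t2': end of T_rd_1
t2_double_prime = t2_prime + T_INT_RD      # t2'': end of T_int_rd_1

# Both events fall between t2 and t3, consistent with their labels.
fits = t[2] <= t2_prime and t2_double_prime <= t[3]
print(fits)  # True
```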
Next, in period T_op_1 (time points t4~t4’), the logic operation unit 31 performs the logic operation on bit data We(0) and bit data In(0) to produce the operation result In(0).We(0) and stores the operation result In(0).We(0) in latch CDL (corresponding to steps S400 and S402 of FIG. 4D). The length of period T_op_1 is, for example, 5 μs.
Then, in period T_ac_1 (time points t4’~t5), the accumulation circuit 1800 performs the accumulation operation on the operation result In(0).We(0) (corresponding to step S404 of FIG. 4D). The length of period T_ac_1 is, for example, 30.72 μs. Operation cycle T1 of FIG. 5E may include period T_op_1 and period T_ac_1. Under the pipeline operation mechanism, reading of the weight vector We’ of the next page pg(m+1) can begin concurrently at time point t4.
Next, in period T_op_2 (time points t5~t5’), the logic operation unit 32 performs the logic operation on bit data We(0) and bit data In(1) to produce the operation result In(1).We(0) and stores the operation result In(1).We(0) in latch CDL. Then, in period T_ac_2 (time points t5’~t6), the accumulation circuit 1800 adds the operation result In(1).We(0) to the operation result In(0).We(0). Operation cycle T2 of FIG. 5F may include period T_op_2 and period T_ac_2. At time point t6, storage of the weight vector We’ of the next page pg(m+1) in latch DL can be complete; that is, the weight vector We’ is stored in latch DL during period T_rd_2 (time points t4~t6).
Similarly, in the subsequent period T_op_3 (time points t6~t6’), the logic operation unit 33 performs the logic operation on bit data We(0) and bit data In(2), and the operation result is stored in latch CDL. Then, in period T_ac_3 (time points t6’~t7), the accumulation circuit 1800 performs accumulation. Operation cycle T3 of FIG. 5G may include period T_op_3 and period T_ac_3. Operation cycle T4 of FIG. 5H may include period T_op_4 and period T_ac_4: period T_op_4 (time points t7~t7’) is used to perform the logic operation on bit data We(0) and bit data In(3) and store the operation result in latch CDL, and period T_ac_4 (time points t7’~t8) is used to perform the accumulation operation on that logic operation result.
Then, in period T_int_rd_2 (time points t8~t9), the weight vector We’ of page pg(m+1) is transferred from latch DL to latch WDL.
Then, period T_op_1 (time points t9~t9’) is used to perform the logic operation on bit data We(0) of the weight vector We’ of page pg(m+1) and bit data In(0), and period T_ac_1 (time points t9’~t10) is used to perform the accumulation operation. Next, period T_op_2 (time points t10~t10’) is used to perform the logic operation on bit data We(0) of the weight vector We’ of page pg(m+1) and bit data In(1), and period T_ac_2 (time points t10’~t11) is used to perform the accumulation operation. In addition, under the pipeline operation mechanism, storage of the weight vector We” of the subsequent page pg(m+2) in latch DL can be completed concurrently during period T_rd_3 (time points t9~t11).
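In one example, the page buffer circuit 1001 performs logic operations with a bit width of "4" and a dimension of "512", for a total of 512 VVM/MAC operations. The storage space of the page buffer circuit 1001 is, for example, 16 KB (that is, 16×1024×8=131072 bits). To perform an operation with bit width "4" and dimension "512", 2048 memory cells of the memory array 1500 must be used (that is, 4×512=2048). When a total of 512 VVM operations are performed (each with bit width "4" and dimension "512"), the weight vectors We must be read from 8 pages (for example, pages pg(m)~pg(m+7)), so the number of read requests R_rd is "8". Accordingly, the total execution time T_total of the VVM/MAC operations of dimension "512" is 1305.92 μs, as shown in equations (1) and (2):
T_total = (N × T_im_1) + {R_rd × [N × (T_op_1 + T_ac_1) + T_int_rd_1]}   (1)
1305.92 μs = (4 × 30.72 μs) + {8 × [4 × (5 μs + 30.72 μs) + 5 μs]}   (2)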
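Equations (1) and (2) can be reproduced numerically. The sketch below simply evaluates equation (1) with the example values; the function name is hypothetical, and the page count is derived on the assumption that the 512 weight vectors are packed back to back into 16 KB pages.

```python
def total_time_us(n, t_im, t_op, t_ac, t_int_rd, r_rd):
    """Equation (1): one import phase plus r_rd pipelined page iterations."""
    return n * t_im + r_rd * (n * (t_op + t_ac) + t_int_rd)


# Pages needed: 512 vectors x (4 x 512) bits, over 131072 bits per page.
pages_needed = (512 * 4 * 512) // (16 * 1024 * 8)

total = total_time_us(n=4, t_im=30.72, t_op=5.0, t_ac=30.72,
                      t_int_rd=5.0, r_rd=pages_needed)
print(pages_needed)       # 8
print(round(total, 2))    # 1305.92
```

The page read time T_rd does not appear in equation (1) because, under the pipeline mechanism, each read overlaps with the import or with the operation cycles of the preceding page.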
Next, refer to FIG. 8, which is a timing diagram of the operation of a vector-vector multiply-adder of a comparative example. The vector-vector multiply-adder of the comparative example of FIG. 8 performs VVM/MAC operations under a cycle-by-cycle mechanism. In period T_im_1 (time points t0~t1), bit data In(0) of the input vector In is imported into a latch (not shown). In period T_rd_1 (time points t1~t2), the weight vector We is read into another latch (not shown). In period T_op_1 (time points t2~t2’), the logic operation on bit data We(0) and bit data In(0) is performed to produce the operation result In(0).We(0). In period T_ac_1 (time points t2’~t3), the accumulation circuit performs the accumulation operation on the operation result In(0).We(0). Because the comparative example of FIG. 8 runs under the cycle-by-cycle mechanism (rather than the pipeline operation mechanism of the present disclosure), no other operation is performed concurrently during periods T_im_1, T_rd_1, T_op_1, and T_ac_1 (time points t0~t3). Only after the accumulation operation ends at time point t3 are the import, reading, and logic operation for the next bit data In(1) and bit data We(0) performed. For example, in period T_im_2 (time points t3~t3’), the next bit data In(1) of the input vector In is imported into the latch. In period T_op_2 (time points t3’~t3”), the logic operation on bit data We(0) and bit data In(1) is performed, and in period T_ac_2 (time points t3”~t4), the accumulation operation is performed.
In the same way, under the cycle-by-cycle mechanism, the import, logic operation, and accumulation operation for the input vector are performed in periods T_im_3, T_op_3, and T_ac_3 (time points t4~t5). Then, the import, logic operation, and accumulation operation for the next bit data of the input vector are performed in periods T_im_4, T_op_4, and T_ac_4 (time points t5~t6).
Then, during time points t6~t8, the import of the input vector, the reading of the weight vector of the next page, the logic operation, and the accumulation operation are performed in periods T_im_1, T_rd_2, and T_ac_1.
根據第7圖之本揭示的頁緩衝電路1001的時序圖與第8圖之比較例的時序圖進行效能比較。本揭示的頁緩衝電路1001係根據管線運作機制而運作。在期間T_im_1~T_im_3執行輸入向量In的位元資料之匯入的同時,可同步執行兩個運作:第一個運作:於期間T_rd_1將目前頁面pg(m)的權重向量We儲存於鎖存器DL。第二個運作:於期間T_int_rd_1以內部傳送形式將權重向量We儲存至鎖存器WDL。
The performance comparison is performed based on the timing diagram of the
並且,根據管線運作機制,在期間T_op_1與T_op_2執行位元資料之邏輯運算、及期間T_ac_1與T_ac_2執行累加運算的同時,可在期間T_rd_2同步地將下一個頁面pg(m+1)的權重向量We’儲存於鎖存器DL。 Furthermore, according to the pipeline operation mechanism, while the logical operation of bit data is performed in periods T_op_1 and T_op_2, and the accumulation operation is performed in periods T_ac_1 and T_ac_2, the weight vector We’ of the next page pg(m+1) can be synchronously stored in the latch DL in period T_rd_2.
Therefore, compared with the cycle-by-cycle mechanism of the comparative example of FIG. 8, the total execution time required for the VVM/MAC operation performed by the page buffer circuit 1001 of the present disclosure in cooperation with the accumulation circuit 1800 can be significantly reduced.
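The effect of the overlap can be illustrated with a rough timing estimate. The following Python sketch is a deliberately simplified, hypothetical model and not the exact schedule of FIG. 7: the function names, the numeric durations, and the rule that the DL-to-WDL internal transfer waits until the previous page's computation has finished are all assumptions made for illustration. It captures only the two overlaps described above: the weight read and internal transfer run alongside the input import, and the next page's read runs alongside the logic and accumulation operations.

```python
def cycle_by_cycle_total(n_pages, n_bits, t_im, t_rd, t_op, t_ac):
    """Total time when every phase waits for the previous one (comparative example)."""
    return n_pages * (t_rd + n_bits * (t_im + t_op + t_ac))


def pipelined_total(n_pages, n_bits, t_im, t_rd, t_int, t_op, t_ac):
    """Estimated total time when weight reads overlap with import and compute.

    Assumptions (hypothetical, for illustration only):
      * reading a page from the array into latch DL takes t_rd,
      * the internal DL -> WDL transfer takes t_int and waits until the
        previous page's computation no longer needs WDL,
      * DL is free for the next page's read once that transfer finishes,
      * import, logic operation and accumulation stay serial per input bit.
    """
    read_done = t_rd      # time at which page 0 is available in DL
    compute_done = 0      # time at which the previous page's compute finished
    for _ in range(n_pages):
        # The transfer needs the page in DL and WDL released by the previous page.
        weights_ready = max(read_done, compute_done) + t_int
        # The next page's array read starts as soon as DL is free again.
        read_done = weights_ready + t_rd
        # The first input import overlaps with the weight read and transfer.
        first_op_start = max(compute_done + t_im, weights_ready)
        compute_done = first_op_start + t_op + t_ac
        # Remaining bits reuse the weights already held in WDL.
        compute_done += (n_bits - 1) * (t_im + t_op + t_ac)
    return compute_done


if __name__ == "__main__":
    args = dict(n_pages=2, n_bits=4, t_im=1, t_rd=5, t_op=1, t_ac=1)
    print("cycle-by-cycle:", cycle_by_cycle_total(**args))
    print("pipelined:     ", pipelined_total(t_int=1, **args))
```

In this model the saving grows with the number of pages, because each page after the first hides its array read behind the computation of the page before it. The actual gain depends on the real durations of the periods in FIG. 7 and FIG. 8, which this sketch does not reproduce.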
Although the present disclosure has been described in detail above by way of preferred embodiments and examples, it is to be understood that these examples are intended to be illustrative rather than restrictive. It is expected that a person having ordinary skill in the art can conceive of various modifications and combinations, and that such modifications and combinations fall within the spirit of the present disclosure and the scope of the appended claims.
21: sense amplifier
31, 3(N-2), 3(N-1): logic operation unit
100: latch unit
200: decoding circuit
300: logic operation circuit
400: control circuit
1001b: page buffer circuit
BL1: bit line
L1, L2, L(N-1), LN: latch
DL, WDL, CDL: latch
P1: data input/output path
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW112148012A TWI851495B (en) | 2023-12-11 | 2023-12-11 | Page buffer circuit and operating method thereof adapted for page read device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TWI851495B (en) | 2024-08-01 |
| TW202524320A (en) | 2025-06-16 |
Family
ID=93283893
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI851495B (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115220690A (en) * | 2021-04-16 | 2022-10-21 | 旺宏电子股份有限公司 | Memory device and operation method thereof |
| TW202321952A (en) * | 2021-11-22 | 2023-06-01 | 旺宏電子股份有限公司 | Memory device and operation method thereof |
| US11837290B2 (en) * | 2018-12-17 | 2023-12-05 | Samsung Electronics Co., Ltd. | Nonvolatile memory device and operation method thereof |
- 2023-12-11 TW TW112148012A patent/TWI851495B/en active
Also Published As
| Publication number | Publication date |
|---|---|
| TW202524320A (en) | 2025-06-16 |