TWI851495B - Page buffer circuit and operating method thereof adapted for page read device

Info

Publication number: TWI851495B
Application number: TW112148012A
Authority: TW
Inventors: 林柏榕; 胡瀚文; 李永駿; 王淮慕
Original assignee: 旺宏電子股份有限公司
Priority date: 2023-12-11
Filing date: 2023-12-11
Publication date: 2024-08-01
Also published as: TW202524320A

Abstract

A page buffer circuit is adapted for a page-read device that includes a memory array having a plurality of pages and a plurality of bit lines. The page buffer circuit comprises the following elements. First latches receive a weight vector from a corresponding one of the pages through the bit lines and import an input vector through a data input/output path. The weight vector has a plurality of weight bit data, and the input vector has a plurality of input bit data. Second latches store the input bit data of the input vector. Logic operation units, coupled to the first latches to receive the weight bit data and to the second latches to receive the input bit data, perform a logic operation on the input bit data and the weight bit data to generate a logic operation result. The logic operation result is sent to one of the first latches. A control circuit selectively enables the logic operation units to perform the logic operation.

Description

Page buffer circuit suitable for page reading device and its operation method

The present disclosure relates to a semiconductor device, and more particularly to a page buffer circuit for a memory device and an operating method thereof.

With the rise of artificial intelligence (AI) technology, various basic operations required for AI computing have been developed, such as vector-vector multiplication (VVM) and multiply-accumulate (MAC). Because memory offers high-speed access, VVM and MAC operations can be carried out by in-memory computing (IMC) performed by the memory itself.

However, when the bit width of the VVM and MAC operations is large (i.e., the operations are performed on multiple bits), the execution time required for in-memory computing increases significantly.

To address this issue, improved page read and page buffer mechanisms are needed that read the page data stored in the memory array more efficiently and that work with a pipeline mechanism, thereby reducing the execution time of VVM and MAC operations.

According to one aspect of the present disclosure, a page buffer circuit is provided, which is adapted for a page read device, wherein the page read device includes a memory array having a plurality of pages and a plurality of bit lines. The page buffer circuit includes the following elements. A plurality of first latches receive a weight vector from a corresponding one of the pages via the bit lines and import an input vector via a data input/output path, wherein the weight vector has a plurality of weight bit data and the input vector has a plurality of input bit data. A plurality of second latches store the input bit data of the input vector. A plurality of logic operation units are coupled to the first latches to receive the weight bit data and coupled to the second latches to receive the input bit data. Each of the logic operation units performs a logic operation on a corresponding one of the input bit data and a corresponding one of the weight bit data to generate a logic operation result, and the logic operation result is transmitted to one of the first latches. A control circuit selectively enables the logic operation units to perform the logic operation.

According to another aspect of the present disclosure, an operating method of a page buffer circuit adapted for a page read device is provided, wherein the page read device includes a memory array having a plurality of pages and a plurality of bit lines, and the operating method includes the following steps. A weight vector is received from a corresponding one of the pages via the bit lines by a plurality of first latches of the page buffer circuit, and an input vector is imported into the first latches via a data input/output path, wherein the weight vector has a plurality of weight bit data and the input vector has a plurality of input bit data. The input bit data of the input vector are stored in a plurality of second latches of the page buffer circuit. A plurality of logic operation units of the page buffer circuit receive the weight bit data from the first latches and the input bit data from the second latches. Each of the logic operation units performs a logic operation on a corresponding one of the input bit data and a corresponding one of the weight bit data to generate a logic operation result. The logic operation result is transmitted to one of the first latches. A control circuit of the page buffer circuit selectively enables the logic operation units to perform the logic operation.

Other aspects and advantages of the present disclosure will become apparent from the following drawings, detailed description, and claims.

21: sense amplifier

100: latch circuit

200: decoding circuit

300: logic operation circuit

400: control circuit

1001, 1001b, 1001c: page buffer circuit

1500: memory array

1800: accumulation circuit

2000: memory device

31, 32, 33, 34, 3(N-2), 3(N-1): logic operation unit

311, 312, 314, 321, 322, 324, 331, 332, 334: input terminal

341, 342, 344: input terminal

313, 323, 333, 343: output terminal

42: multiplexer

426: input terminal

421, 422, 423, 424: input terminal

425: output terminal

PB: page buffer unit

pg(0), pg(m+1), pg(m): page

BL1, BL2, BL(M-1), BLM: bit line

P1: data input/output path

In: input vector

We: weight vector

In(0)~In(3): bit data

We(0)~We(3): bit data

L1, L2, L3, L4, L5: latch

L(N-1), LN: latch

DL, WDL, CDL: latch

T1~T4: operation cycle

t1~t11: time point

t2'~t10': time point

t2"~t7": time point

T_ac_1, T_ac_2: period

T_im_1~T_im_4: period

T_op_1~T_op_3: period

T_rd_1, T_rd_2: period

T_int_rd_1, T_int_rd_2: period

S100~S118, S200~S206, S300~S308: step

S400~S412, S600~S610: step

FIG. 1 is a circuit diagram of a memory device 2000 according to an embodiment of the present disclosure.

FIG. 2A is a circuit diagram of a page buffer circuit 1001 according to an embodiment of the present disclosure.

FIG. 2B is a circuit diagram of a page buffer circuit 1001b according to another embodiment of the present disclosure.

FIG. 2C is a circuit diagram of a page buffer circuit 1001c according to yet another embodiment of the present disclosure.

FIG. 3 is a schematic diagram of the basic operation of the page buffer circuit 1001.

FIG. 4A is a flowchart of the main procedure of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800.

FIG. 4B is a flowchart of the read procedure of the weight vector We.

FIG. 4C is a flowchart of the import procedure of the input vector In.

FIG. 4D is a flowchart of the VVM operation.

FIGS. 5A to 5H are schematic diagrams of the operation of the page buffer circuit 1001.

FIG. 6 is a flowchart of another embodiment of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800.

FIG. 7 is a timing diagram of the operation of the page buffer circuit 1001 in the embodiment of FIGS. 5A to 5H.

FIG. 8 is a timing diagram of the operation of a vector-vector multiplier-adder of a comparative example.

The technical terms in this specification follow the customary usage in this technical field. Where this specification explains or defines a term, that term shall be interpreted according to the explanation or definition given here. Each embodiment of the present disclosure has one or more technical features. Where implementation is possible, a person having ordinary skill in the art may selectively implement some or all of the technical features of any embodiment, or selectively combine some or all of the technical features of these embodiments.

Please refer to FIG. 1, which is a circuit diagram of a memory device 2000 according to an embodiment of the present disclosure. The memory device 2000 includes a memory array 1500, a page buffer unit PB, and an accumulation circuit 1800. The memory device 2000 has a configuration suitable for performing a page read operation (e.g., it has a page read feature), so the memory device 2000 may be referred to as a "page read device". Correspondingly, the memory array 1500 is suitable for page read operations. For example, the memory array 1500 may be a non-volatile memory or a volatile memory, including NAND flash memory, NOR flash memory, phase change memory (PCM), dynamic random access memory (DRAM), static random access memory (SRAM), or magnetoresistive random access memory (MRAM). The memory array 1500 may have a two-dimensional (2D) structure or a three-dimensional (3D) structure. The memory array 1500 may also have a single-plane structure or a multi-plane structure.

The memory array 1500 includes a plurality of pages, for example, page pg(1), ..., page pg(m), and page pg(m+1). Each page includes a plurality of memory blocks (not shown in the figure), and each memory block includes a plurality of memory cells. The memory cells may be SLC (single-level cell), MLC (multi-level cell), TLC (triple-level cell), QLC (quad-level cell), or PLC (penta-level cell) memory cells, and so on. The memory cells are used to store data, for example, weight data. In this embodiment, one page stores one weight vector We.

The memory array 1500 is accessed via a plurality of bit lines, for example M bit lines BL1, BL2, BL3, BL4, ..., BL(M-1), and BLM. The page buffer unit PB can perform a page-read operation on pages pg(1) to pg(m+1) of the memory array 1500. The page buffer unit PB includes a plurality of page buffer circuits, for example M page buffer circuits 1001, 1002, 1003, 1004, ..., 100(M-1), and 100M. The number of page buffer circuits 1001 to 100M is the same as the number of bit lines BL1 to BLM, namely "M". The page buffer circuits 1001 to 100M are coupled to the bit lines BL1 to BLM, respectively. The weight vector We stored in pages pg(1) to pg(m+1) can be read into the page buffer circuits 1001 to 100M via the corresponding ones of the bit lines BL1 to BLM.

The weight vector We may have a bit width "N", where "N" is less than or equal to the number "M". The weight vector We includes bit data We_x(0), We_x(1), ..., We_x(N-1) stored in one of the pages pg(1) to pg(m+1) of the memory array 1500. The bit width "N" is the number of bits of the weight vector We, and the index "x" denotes the x-th dimension. For example, page pg(1) has a data size of 16 KB and can store a total of 64 weight vectors We, each having a bit width of "4" and a dimension of "512". The bit data We_j(n) is the bit data of the j-th bit and the n-th dimension. In the following paragraphs, a bit width of "4" (i.e., N=4) and the first dimension (i.e., x=1) are used as an example. The bit data We_i(n) of the weight vector We include bit data We(0), We(1), We(2), and We(3), and four page buffer circuits 1001 to 1004 in the page buffer unit PB process the bit data We(0) to We(3), respectively. The bit data We(0) to We(3) can be provided to the page buffer circuits 1001 to 1004 via the bit lines BL1 to BL4, respectively.
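The page-capacity example above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the constant names are illustrative, and it assumes 1 KB = 1024 bytes.

```python
# Capacity check for the example page layout (values taken from the text).
PAGE_BYTES = 16 * 1024   # 16 KB page, assuming 1 KB = 1024 bytes
NUM_VECTORS = 64         # weight vectors stored in one page
BIT_WIDTH = 4            # bit width of each weight vector
DIMENSION = 512          # dimension of each weight vector

bits_needed = NUM_VECTORS * BIT_WIDTH * DIMENSION
bits_available = PAGE_BYTES * 8

print(bits_needed, bits_available)  # both are 131072
```

Under these assumptions, 64 weight vectors of bit width 4 and dimension 512 fill the 16 KB page exactly.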

Each of the page buffer circuits 1001 to 1004 is coupled to a data input/output path. For example, the page buffer circuit 1001 is coupled to the data input/output path P1, the page buffer circuit 1002 to the data input/output path P2, the page buffer circuit 1003 to the data input/output path P3, and the page buffer circuit 1004 to the data input/output path P4 (not shown in FIG. 1). The data input/output paths P1 to P4 may correspond to the bit lines BL1 to BL4. An input vector In having a bit width of "4" is imported into one of the page buffer circuits 1001 to 1004 through the corresponding one of the data input/output paths P1 to P4. Each of the page buffer circuits 1001 to 1004 performs a logic operation on the input vector In and the weight vector We.

The page buffer circuits 1001 to 1004 are coupled to the accumulation circuit 1800. The accumulation circuit 1800 performs an accumulation operation on the results of the logic operations performed by the page buffer circuits 1001 to 1004. The logic operations of the page buffer circuits 1001 to 1004, together with the accumulation operation of the accumulation circuit 1800, form a vector-vector multiplication (VVM) operation.

Next, please refer to FIG. 2A, which is a circuit diagram of the page buffer circuit 1001 of the embodiment of FIG. 1. The page buffer circuit 1001 is coupled to the memory array 1500 via the bit line BL1. The page buffer circuit 1001 includes a latch circuit 100, a decoding circuit 200, a logic operation circuit 300, and a control circuit 400. The latch circuit 100 includes a plurality of latches, for example eight latches DL, WDL, CDL, and L1 to L5. The latch DL may be referred to as the "first-stage latch" and is disposed at the first stage of the latch circuit 100. The latch WDL may be referred to as the "weight latch" and is disposed at an address between the latch L1 and the latch L2. The latch CDL may be referred to as the "last-stage latch" and is disposed at the last stage of the latch circuit 100. The latches L2 to L5 are disposed at consecutive addresses between the latch WDL and the latch CDL. The logic operation circuit 300 includes a plurality of logic operation units 31 to 34. The number of logic operation units 31 to 34 equals the number of latches L2 to L5 (i.e., "4").

The latch WDL may be provided optionally, according to the design constraint of the page buffer circuit 1001. If the design constraint is that the latency (i.e., execution time) of the read procedure of the weight vector We is less than the latency of the VVM operation procedure on the weight vector We and the input vector In, the latch WDL may be provided in the page buffer circuit 1001. If the latency of the read procedure of the weight vector We need not be considered, the latch WDL is omitted.

The bit line BL1 is coupled to the decoding circuit 200 via a sense amplifier (SA) 21, and the decoding circuit 200 is coupled to the latch DL. The data carried on the bit line BL1 are processed by the sense amplifier 21 and then sent to the decoding circuit 200 for decoding. Taking TLC memory cells in the memory array 1500 as an example, the decoding circuit 200 decodes the 3 bits of data of each TLC cell. In other examples, the memory cells of the memory array 1500 may be SLC, MLC, QLC, or PLC cells. If the memory cells are SLC cells, the decoding circuit 200 is not needed.
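As a small illustration of the rule above, the sketch below tabulates the commonly used bits-per-cell figure for each cell type and flags whether a decoding circuit would be needed. The text states this explicitly only for TLC (3 bits, decoded) and SLC (no decoding circuit); treating every multi-bit cell as needing decoding is an assumption, and the names `BITS_PER_CELL` and `needs_decoding_circuit` are hypothetical.

```python
# Commonly used bits-per-cell values; only TLC (3 bits) is stated in the text.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4, "PLC": 5}


def needs_decoding_circuit(cell_type: str) -> bool:
    """Assumption: any cell that stores more than one bit needs decoding circuit 200."""
    return BITS_PER_CELL[cell_type] > 1


for cell in BITS_PER_CELL:
    print(cell, BITS_PER_CELL[cell], needs_decoding_circuit(cell))
```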

The logic operation unit 31 has input terminals 311, 312, and 314 and an output terminal 313. The input terminal 311 is coupled to the latch L2, and the input terminal 312 is coupled to the latch WDL. The logic operation unit 31 performs a logic operation on the data stored in the latch L2 and the data stored in the latch WDL, for example a logic AND, OR, exclusive-OR (XOR), or exclusive-NOR (XNOR) operation. The control circuit 400 sends a control signal to the input terminal 314 of the logic operation unit 31 to enable the logic operation unit 31 to perform the logic operation. The operation result is sent to the latch CDL via the output terminal 313.

The operation of the logic operation units 32, 33, and 34 and the coupling of their input and output terminals are similar to those of the logic operation unit 31. For example, the input terminals 321, 331, and 341 of the logic operation units 32, 33, and 34 are coupled to the latches L3, L4, and L5, respectively. The input terminals 322, 332, and 342 of the logic operation units 32, 33, and 34 are commonly coupled to the latch WDL. The logic operation units 32, 33, and 34 perform logic operations on the data of the latches L3, L4, and L5, respectively, and the data of the latch WDL. In this embodiment, the logic operation units 31 to 34 all perform the same type of logic operation, for example a logic AND operation. The output terminals 313 to 343 of the logic operation units 31 to 34 are commonly coupled to the latch CDL. The control circuit 400 sends control signals to the input terminals 314 to 344 of the logic operation units 31 to 34, so that in any one operation cycle only one of the logic operation units 31 to 34 sends its operation result to the latch CDL.
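The one-unit-per-cycle behaviour described above can be pictured with a small behavioural model. This is only a sketch, not the circuit itself: the class and method names are hypothetical, the logic operation is fixed to AND as in this embodiment, and the latches L2 to L5 are assumed to hold the input bit data (consistent with the abstract, where the second latches store the input bit data).

```python
from dataclasses import dataclass, field


@dataclass
class PageBufferModel:
    """Behavioural toy model of page buffer circuit 1001 (FIG. 2A)."""

    wdl: int = 0  # weight latch WDL, holding one weight bit
    inputs: list = field(default_factory=lambda: [0, 0, 0, 0])  # latches L2..L5
    cdl: int = 0  # last-stage latch CDL

    def run_cycle(self, enable_index: int) -> int:
        """Enable exactly one logic operation unit (0 -> unit 31, ..., 3 -> unit 34)."""
        if not 0 <= enable_index < len(self.inputs):
            raise ValueError("enable_index must select one of the logic operation units")
        # Only the enabled unit drives CDL in this operation cycle.
        self.cdl = self.inputs[enable_index] & self.wdl
        return self.cdl


pb = PageBufferModel(wdl=1, inputs=[1, 0, 1, 1])
results = [pb.run_cycle(i) for i in range(4)]
print(results)  # [1, 0, 1, 1]
```

With the weight bit equal to 1 the AND results simply reproduce the input bits; with a weight bit of 0 every cycle would latch 0 into CDL.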

The data input/output path P1 is coupled to the latch CDL. The operation result stored in the latch CDL is sent to an external circuit (for example, the accumulation circuit 1800) via the data input/output path P1.

Next, please refer to FIG. 2B, which is a circuit diagram of a page buffer circuit 1001b according to another embodiment of the present disclosure. The page buffer circuit 1001b of this embodiment may be provided with more latches, for example N latches L1 to LN, according to design requirements or design constraints. Correspondingly, the page buffer circuit 1001b is provided with (N-1) logic operation units 31 to 3(N-1), which perform logic operations on the data of the latches L2 to LN, respectively, and the data of the latch WDL.

Next, please refer to FIG. 2C, which is a circuit diagram of a page buffer circuit 1001c according to yet another embodiment of the present disclosure. The page buffer circuit 1001c of this embodiment further includes a multiplexer 42. The multiplexer 42 selects the operation result of one of the logic operation units 31 to 34 and sends it to the latch CDL, so the control signal of the control circuit 400 is not needed to enable and select the logic operation units 31 to 34.

The output terminals 313, 323, 333, and 343 of the logic operation units 31, 32, 33, and 34 are coupled to the input terminals 421, 422, 423, and 424 of the multiplexer 42, respectively, to deliver the operation results. The input terminal 426 of the multiplexer 42 receives the control signal of the control circuit 400, which selects the operation result received at one of the input terminals 421, 422, 423, and 424 to be passed to the output terminal 425 and then to the latch CDL.
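The multiplexer-based selection of FIG. 2C can be sketched in the same behavioural style. Here all four logic operation units are assumed to compute their AND results continuously, and the select value received at input terminal 426 picks which result reaches the latch CDL. The function names `mux42` and `page_buffer_1001c_cycle` are hypothetical.

```python
def mux42(unit_outputs, select):
    """Model of multiplexer 42: pass one of inputs 421..424 to output 425."""
    if not 0 <= select < len(unit_outputs):
        raise ValueError("select must address one of the multiplexer inputs")
    return unit_outputs[select]


def page_buffer_1001c_cycle(wdl, l2_to_l5, select):
    """Return the value latched into CDL for one operation cycle."""
    # Every unit computes its result; none of them needs an enable signal.
    unit_outputs = [bit & wdl for bit in l2_to_l5]
    return mux42(unit_outputs, select)


cdl_values = [page_buffer_1001c_cycle(1, [0, 1, 1, 0], s) for s in range(4)]
print(cdl_values)  # [0, 1, 1, 0]
```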

Next, please refer to FIG. 3, which is a schematic diagram of the basic operation of the page buffer circuit 1001. The page buffer circuit 1001 performs logic operations on the input vector In and the weight vector We. The bit width of the input vector In is "N", and the input vector In includes N bit data In(0), In(1), ..., and In(N-1). The bit data In(0) to In(N-1) of the input vector In are imported into the page buffer circuit 1001 sequentially, one after another, via the data input/output path P1.

The weight vector We, on the other hand, is stored in the memory array 1500. The bit width of the weight vector We is also "N", so the weight vector We includes N bit data We(0), We(1), ..., and We(N-1). The bit data We(0) to We(N-1) of the weight vector We are stored in one page of the memory array 1500 and are read in parallel to the corresponding page buffer circuits via the corresponding bit lines. Taking a bit width "N" of "4" as an example, the bit data We(0) is read into the page buffer circuit 1001, the bit data We(1) into the page buffer circuit 1002, the bit data We(2) into the page buffer circuit 1003, and the bit data We(3) into the page buffer circuit 1004.

Then, in each of the page buffer circuits 1001 to 1004, partial-product operations are performed sequentially on the bit data of the input vector In and the corresponding bit data of the weight vector We.

Then, the page buffer circuit 1001 sends the results of the partial-product operations to the accumulation circuit 1800. The accumulation circuit 1800 performs a weighted accumulation operation in a sequential manner to obtain the final result of the VVM/MAC operation.
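The partial-product and weighted-accumulation scheme described above can be illustrated with a short sketch. The text does not spell out the weights used by the accumulation circuit 1800, so the sketch assumes unsigned operands, least-significant-bit-first ordering, and a weight of 2^(i+k) for the partial product of In(i) and We(k), which is the usual shift-and-add identity. All function names are hypothetical.

```python
def bits_lsb_first(value, width):
    """Split an unsigned integer into `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def bit_serial_multiply(input_value, weight_value, width=4):
    """Multiply two unsigned `width`-bit values from 1-bit AND partial products."""
    in_bits = bits_lsb_first(input_value, width)
    we_bits = bits_lsb_first(weight_value, width)
    acc = 0
    for k, we_bit in enumerate(we_bits):      # one page buffer circuit per weight bit
        for i, in_bit in enumerate(in_bits):  # one operation cycle per input bit
            partial = in_bit & we_bit         # logic AND as the partial product
            acc += partial << (i + k)         # assumed weighting in the accumulator
    return acc


def vvm(inputs, weights, width=4):
    """Vector-vector multiplication as a sum of element-wise bit-serial products."""
    return sum(bit_serial_multiply(a, b, width) for a, b in zip(inputs, weights))


print(bit_serial_multiply(5, 3))   # 15
print(vvm([1, 2, 3], [4, 5, 6]))   # 32
```

Under these assumptions the accumulated value equals the ordinary integer product, which is why 1-bit AND operations inside the page buffer circuits, followed by weighted accumulation, can realize a multi-bit multiply.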

Next, please refer to FIGS. 4A to 4D, which are flowcharts of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800, together with FIGS. 5A to 5H, which are schematic diagrams of the operation of the page buffer circuit 1001. FIGS. 4A to 4D and FIGS. 5A to 5H are described using an example in which the bit widths of the input vector In and the weight vector We are both "4" and the logic operation units 31 to 34 all perform a logic AND operation.

First, please refer to FIG. 4A, which is a flowchart of the main procedure of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800. In step S100, the input vector In is imported into the page buffer circuit 1001 via the corresponding bit line (e.g., bit line BL1). Then, in step S102, it is confirmed that the import of the input vector In is complete. Meanwhile, step S104 is executed: the weight vector We is read into the page buffer circuit 1001. Step S104 may be executed at the same time as step S100, or before or after step S100. The weight vector We is originally stored in the current page (e.g., page pg(m)) of the memory array 1500. The weight vector We is read into the page buffer circuit 1001 via the bit line BL1 and is decoded by the decoding circuit 200. The corresponding bit data of the weight vector We (e.g., We(0)) is then stored in the latch DL of the corresponding page buffer circuit 1001.

In the embodiment of FIG. 4A, the design constraint of the page buffer circuit 1001 is that the latency (i.e., the required execution time) of the read procedure of the weight vector We is less than the latency of the VVM/MAC operation procedure on the weight vector We and the input vector In. Therefore, the flow of FIG. 4A includes steps S106, S110, and S112, which transfer the weight vector We stored in the latch DL to the latch WDL. More specifically, in steps S106, S110, and S112, the weight vector We is selectively transferred from the latch DL to the latch WDL according to the value of a flag. The weight vector We is transferred from the latch DL and stored in the latch WDL by an "internal transfer" within the page buffer circuit 1001.

First, step S106 determines whether the value of the flag equals "0". If the result is "yes", step S110 is executed: the value of the flag is set to "1". If the result is "no", step S106 is executed again.

When the flag is set to "1" in step S110, the weight vector We is to be transferred to the latch WDL, so step S112 is executed next: the weight vector We is transferred from the latch DL to the latch WDL. Then, step S114 is executed: the VVM operation on the weight vector We and the input vector In is performed inside the page buffer circuit 1001. The VVM operation of step S114 may include a partial-product operation and an accumulation operation. The partial-product operation is performed as follows: partial products of the bit data of the weight vector We and the corresponding bit data of the input vector In are computed in sequence. For example, the partial product of the bit data We(0) and the bit data In(0) is computed, then the partial product of We(0) and In(1), then the partial product of We(0) and In(2), and so on. The accumulation operation is performed as follows: the results of the partial-product operations are summed. For example, the product of We(0) and In(0) is added to the product of We(0) and In(1), the sum is then added to the product of We(0) and In(2), and so on.

而後，執行步驟S116：判斷權重向量We與輸入向量In的每個位元資料的部分乘積運算是否執行完畢。若步驟S116的判斷結果為「是」，則執行步驟S118：判斷是否有新的請求(request)，該新請求是用於請求執行下一筆輸入向量In與第一頁面pg(1)的權重向量We的運算。若步驟S116的判斷結果為「否」則重新執行步驟S108：將旗標重設為「0」。 Then, execute step S116: determine whether the partial product operation of the weight vector We and each bit data of the input vector In has been completed. If the judgment result of step S116 is "yes", execute step S118: determine whether there is a new request, which is used to request the execution of the operation of the next input vector In and the weight vector We of the first page pg(1). If the judgment result of step S116 is "no", re-execute step S108: reset the flag to "0".

在步驟S118中，若判斷結果為「否」，則結束本流程。若判斷結果為「是」，則重新執行步驟S100以將新的輸入向量In匯入至頁緩衝電路1001，且同步執行步驟S104以將新的權重向量We’讀取至頁緩衝電路1001。 In step S118, if the judgment result is "No", the process ends. If the judgment result is "Yes", step S100 is re-executed to import the new input vector In into the page buffer circuit 1001, and step S104 is synchronously executed to read the new weight vector We' into the page buffer circuit 1001.

接著，請參見第4B圖，其為權重向量We的讀取程序的流程圖(即，第4A圖的步驟S104的詳細流程)。可配合於第5A圖的頁緩衝電路1001的運作之示意圖來說明第4B圖的流程。首先，執行步驟S200：從記憶陣列1500中讀取目前頁面(例如，頁面pg(m))儲存的權重向量We。而後，執行步驟S202：藉由解碼電路200對於權重向量We進行解碼。 Next, please refer to FIG. 4B, which is a flowchart of the reading procedure of the weight vector We (i.e., the detailed process of step S104 in FIG. 4A). The process of FIG. 4B can be explained in conjunction with the schematic diagram of the operation of the page buffer circuit 1001 in FIG. 5A. First, execute step S200: read the weight vector We stored in the current page (e.g., page pg(m)) from the memory array 1500. Then, execute step S202: decode the weight vector We by the decoding circuit 200.

Then, step S204 is executed: the corresponding bit data of the decoded weight vector We are stored in the latches DL of the corresponding page buffer circuits 1001~1004. For example, the bit data We(0) is stored in the latch DL of the page buffer circuit 1001, the bit data We(1) in the latch DL of the page buffer circuit 1002, the bit data We(2) in the latch DL of the page buffer circuit 1003, and the bit data We(3) in the latch DL of the page buffer circuit 1004.

而後，執行步驟S206：將旗標的數值觸發為「1」，並將鎖存器DL儲存的權重向量We傳送至鎖存器WDL。頁緩衝電路1001~1004的鎖存器WDL分別儲存位元資料We(0)~We(3)。例如，頁緩衝電路1001的鎖存器WDL儲存位元資料We(0)，頁緩衝電路1002的鎖存器WDL儲存位元資料We(1)，頁緩衝電路1003的鎖存器WDL儲存位元資料We(2)，頁緩衝電路1004的鎖存器WDL儲存位元資料We(3)。 Then, execute step S206: trigger the value of the flag to "1", and transfer the weight vector We stored in the latch DL to the latch WDL. The latches WDL of the page buffer circuits 1001~1004 store bit data We(0)~We(3) respectively. For example, the latch WDL of the page buffer circuit 1001 stores bit data We(0), the latch WDL of the page buffer circuit 1002 stores bit data We(1), the latch WDL of the page buffer circuit 1003 stores bit data We(2), and the latch WDL of the page buffer circuit 1004 stores bit data We(3).
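The reading procedure of FIG. 4B (steps S200 to S206) can be sketched as follows. This is only a minimal behavioral model under stated assumptions: the `PageBuffer` class, the `decode` stand-in (an identity mapping here) and the list-of-lists `memory_array` are hypothetical names introduced for illustration, not structures defined in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageBuffer:
    """One page buffer circuit (e.g. 1001); only the two weight latches are modeled."""
    DL: Optional[int] = None   # latch that first receives the decoded weight bit
    WDL: Optional[int] = None  # latch that holds the weight bit for the logic operation


def decode(raw_page: List[int]) -> List[int]:
    """Stand-in for decoding circuit 200 (assumption: identity mapping)."""
    return list(raw_page)


def read_weight_vector(memory_array: List[List[int]], m: int,
                       buffers: List[PageBuffer]) -> int:
    raw = memory_array[m]             # S200: read the weight vector of page pg(m)
    weight_bits = decode(raw)         # S202: decode the weight vector
    for pb, bit in zip(buffers, weight_bits):
        pb.DL = bit                   # S204: We(k) goes to latch DL of buffer k
    flag = 1                          # S206: trigger the flag to 1 ...
    for pb in buffers:
        pb.WDL = pb.DL                # ... and transfer DL to WDL
    return flag


memory_array = [[1, 0, 1, 1]]                 # one page holding We(0)~We(3)
buffers = [PageBuffer() for _ in range(4)]    # page buffer circuits 1001~1004
flag = read_weight_vector(memory_array, 0, buffers)
```

After the call, each modeled buffer holds the same bit in `DL` and `WDL`, mirroring the example in which the latches WDL of the page buffer circuits 1001~1004 store We(0)~We(3).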

接著，請參見第4C圖，其為輸入向量In的匯入程序的流程圖(即，第4A圖的步驟S100的詳細流程)。可配合於第5B~5D圖的頁緩衝電路1001的運作之示意圖來說明第4C圖的流程。 Next, please refer to Figure 4C, which is a flowchart of the import process of the input vector In (i.e., the detailed process of step S100 in Figure 4A). The process of Figure 4C can be explained in conjunction with the schematic diagram of the operation of the page buffer circuit 1001 in Figures 5B to 5D.

先配合參見第5B圖，在第4C圖的步驟S300中，將輸入向量In的第1個位元資料In(0)經由對應的資料輸入/輸出路徑P1匯入至鎖存器CDL。此時，計數值cnt的初始值為「0」。而後，執行步驟S302：將輸入向量In的位元資料In(0)從鎖存器CDL儲存至對應的鎖存器L(i)(例如鎖存器L2)。而後，在步驟S304中，判斷計數值cnt是否等於輸入向量In的位元寬度N(本實施例的N等於「4」)。若判斷結果為「否」，表示輸入向量In的位元資料尚未全部匯入至頁緩衝電路1001，則執行步驟S306：將計數值cnt由「0」遞增為「1」。而後，重新執行步驟S300。 First, referring to FIG. 5B, in step S300 of FIG. 4C, the first bit data In(0) of the input vector In is imported into the latch CDL via the corresponding data input/output path P1. At this time, the initial value of the count value cnt is "0". Then, step S302 is executed: the bit data In(0) of the input vector In is stored from the latch CDL to the corresponding latch L(i) (e.g., latch L2). Then, in step S304, it is determined whether the count value cnt is equal to the bit width N of the input vector In (N in this embodiment is equal to "4"). If the judgment result is "No", it means that the bit data of the input vector In has not been fully imported into the page buffer circuit 1001, then execute step S306: increment the count value cnt from "0" to "1". Then, re-execute step S300.

同時參見第5C圖，在重新執行的步驟S300中，將輸入向量In的第2個位元資料In(1)經由資料輸入/輸出路徑P1傳送至鎖存器CDL。而後執行步驟S302：將輸入向量In的位元資料In(1)從鎖存器CDL儲存至對應的鎖存器L3。而後執行步驟S304：判斷計數值cnt是否等於輸入向量In的位元寬度「4」。若判斷結果為「否」，則執行步驟S306以將計數值cnt遞增為「2」，並重新執行步驟S300。 Meanwhile, referring to FIG. 5C, in the re-executed step S300, the second bit data In(1) of the input vector In is transmitted to the latch CDL via the data input/output path P1. Then, step S302 is executed: the bit data In(1) of the input vector In is stored from the latch CDL to the corresponding latch L3. Then, step S304 is executed: it is determined whether the count value cnt is equal to the bit width "4" of the input vector In. If the determination result is "no", step S306 is executed to increase the count value cnt to "2", and step S300 is re-executed.

依此類推，在重新執行的步驟S300至步驟S306中，將輸入向量In的另外2個位元資料In(2)與In(3)經由位元線BL1的輸入/輸出路徑P1傳送至鎖存器CDL，而後儲存至對應的鎖存器L4與L5。配合參見第5D圖，此時，鎖存器L2~L5已分別儲存了輸入向量In的位元資料In(0)~In(3)。並且計數值cnt已經遞增至「4」。而後，執行步驟S308：將計數值cnt重設為「0」。 Similarly, in the re-executed steps S300 to S306, the other two bits of data In(2) and In(3) of the input vector In are transmitted to the latch CDL via the input/output path P1 of the bit line BL1, and then stored in the corresponding latches L4 and L5. Referring to FIG. 5D, at this time, the latches L2~L5 have respectively stored the bit data In(0)~In(3) of the input vector In. And the count value cnt has been incremented to "4". Then, execute step S308: reset the count value cnt to "0".

In other examples, the bit data In(0)~In(3) of the input vector In can be stored in the latches L2~L5 in a different order. For example, the bit data In(0) can be stored in the latch L3, the bit data In(1) can be stored in the latch L2, and so on.
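The import loop of FIG. 4C (steps S300 to S308) can be sketched as below for the default ordering In(0)~In(3) into L2~L5. The function name and the dictionary model of the latches are hypothetical. The sketch also assumes that the comparison of step S304 lets the loop end once N bits have been imported, since the text does not pin down exactly where the comparison sits relative to the increment.

```python
from typing import Dict, List, Tuple


def import_input_vector(input_bits: List[int], n: int = 4) -> Tuple[Dict[str, int], int]:
    """Import n input bits one at a time through latch CDL into latches L2, L3, ..."""
    latches: Dict[str, int] = {}         # models latches L2..L(n+1)
    cnt = 0                              # count value cnt, initially 0
    while cnt != n:                      # S304 (assumed to gate the loop)
        cdl = input_bits[cnt]            # S300: bit In(cnt) arrives in CDL via path P1
        latches[f"L{cnt + 2}"] = cdl     # S302: CDL -> corresponding latch L(i)
        cnt += 1                         # S306: increment cnt
    cnt = 0                              # S308: reset cnt to 0
    return latches, cnt


latches, cnt = import_input_vector([1, 0, 0, 1])
```

A different storage order, as in the alternative example above, would only change the key computed in the S302 line.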

接著，請參見第4D圖，其為VVM運算的流程圖(即，第4A圖的步驟S114的詳細流程)。可配合於第5E~5H圖的頁緩衝電路1001的運作之示意圖來說明第4D圖的流程。 Next, please refer to Figure 4D, which is a flowchart of the VVM operation (i.e., the detailed process of step S114 in Figure 4A). The process of Figure 4D can be explained in conjunction with the schematic diagram of the operation of the page buffer circuit 1001 in Figures 5E to 5H.

First, step S400 is executed: the control circuit 400 controls the enable states of the logic operation units 31~34 so that each of them selectively performs its logic operation in a different operation cycle. In this embodiment, the control circuit 400 can control the enable states of the logic operation units 31~34 according to a finite-state machine (FSM), enabling the logic operation units 31, 32, 33 and 34 in operation cycles T1, T2, T3 and T4, respectively, to perform logic operations and generate operation results. For example, as shown in FIG. 5E, in operation cycle T1 the logic operation unit 31 is enabled to perform a logic operation (e.g., a logical AND) on the bit data We(0) and the bit data In(0) to generate the operation result In(0)·We(0). At the same time, the weight vector We' of the next page pg(m+1) of the memory array 1500 is read into the page buffer circuit 1001 via the bit line BL1.

Then, step S402 is executed: the operation result In(0)·We(0) of the logic operation unit 31 is stored in the latch CDL. At the same time, the decoding circuit 200 decodes the weight vector We'.

Then, step S404 is executed: the operation result In(0)·We(0) of the logic operation unit 31 is output from the latch CDL to the accumulation circuit 1800 for the accumulation operation. At the same time, the decoded weight vector We' is stored in the latch DL.

Then, step S406 is executed: determine whether the count value cnt is equal to the bit width "4". If the result is "no", step S408 is executed to increment the count value cnt, and steps S400 to S404 are executed again (see FIG. 5F): in the next operation cycle T2, the control circuit 400 enables another logic operation unit 32 to perform a logical AND of the bit data We(0) and the bit data In(1), generating the operation result In(1)·We(0). The operation result In(1)·We(0) is transferred to the latch CDL and then output to the accumulation circuit 1800.

Similarly, if step S406 determines that the count value cnt is still not equal to the bit width "4", steps S400 to S404 are executed again. As shown in FIG. 5G, in operation cycle T3 the logic operation unit 33 performs a logical AND of the bit data We(0) and the bit data In(2) to generate the operation result In(2)·We(0), which is transferred to the latch CDL and then output to the accumulation circuit 1800 for accumulation. Next, as shown in FIG. 5H, in operation cycle T4 the logic operation unit 34 performs a logical AND of the bit data We(0) and the bit data In(3) to generate the operation result In(3)·We(0), which is transferred to the latch CDL and then output to the accumulation circuit 1800.

若在步驟S406判斷計數值cnt已達到位元寬度「4」，則執行步驟S410：儲存累加電路1800的累加運算的運算結果。而後執行步驟S412：將計數值cnt重設為「0」。 If it is determined in step S406 that the count value cnt has reached the bit width "4", then step S410 is executed: the calculation result of the accumulation circuit 1800 is stored. Then step S412 is executed: the count value cnt is reset to "0".
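The VVM flow of FIG. 4D (steps S400 to S412) for a single page buffer circuit can be sketched as follows. This is a behavioral sketch only. The function name and the `trace` list are hypothetical, and the accumulation is modeled as a plain sum of the partial products, which is an assumption: the text says only that the results are added up and does not state whether the accumulation circuit 1800 applies any bit-position weighting.

```python
from typing import List, Tuple


def vvm_single_weight_bit(weight_bit: int, input_latches: List[int]) -> Tuple[int, List[int]]:
    """AND one weight bit (held in WDL) with each input bit (L2..L5), one unit per cycle."""
    n = len(input_latches)     # bit width N of the input vector
    accumulator = 0            # models accumulation circuit 1800 (assumed plain sum)
    trace: List[int] = []      # value seen in latch CDL in each operation cycle
    cnt = 0
    while cnt != n:            # S406: stop once cnt reaches the bit width
        # S400: only logic operation unit number `cnt` is enabled in this cycle
        product = input_latches[cnt] & weight_bit
        cdl = product          # S402: store the operation result in latch CDL
        accumulator += cdl     # S404: output CDL to the accumulation circuit
        trace.append(cdl)
        cnt += 1               # S408: increment cnt
    result = accumulator       # S410: store the accumulated result
    cnt = 0                    # S412: reset cnt to 0
    return result, trace


result, trace = vvm_single_weight_bit(1, [1, 0, 1, 1])
```

With We(0) = 1 the trace simply reproduces the input bits; with We(0) = 0 every partial product is 0.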

On the other hand, see FIG. 6, which is a flowchart of another embodiment of the VVM/MAC operation performed by the page buffer unit PB and the accumulation circuit 1800. In the embodiment of FIG. 6, the delay of the reading procedure of the weight vector We is not taken into account, so the page buffer circuit 1001 is not provided with the latch WDL, and step S604 of FIG. 6 is followed directly by step S606: performing the VVM operation of the weight vector We and the input vector In. The weight vector We need not be transferred from the latch DL to the latch WDL.

接著，參見第7圖，其為第5A~5H圖的實施例的頁緩衝電路1001之運作之時序圖。可配合於第4B、4C與4D圖的流程圖來說明第7圖的時序圖。首先，在時間點t0~t4的期間，輸入向量In的4個位元資料In(0)~In(3)依序匯入至鎖存器CDL、並傳送至對應的鎖存器L2~L5(對應於第4C圖的步驟S300至步驟S306)。例如，在時間點t0~t1的期間T_im_1，輸入向量In的位元資料In(0)匯入至鎖存器CDL並傳送至鎖存器L2。接著，在時間點t1~t2的期間T_im_2，下一個位元資料In(1)匯入至鎖存器CDL並傳送至鎖存器L3。接著，在時間點t2~t3的期間T_im_3，第3個位元資料In(2)匯入至鎖存器CDL並傳送至鎖存器L4。接著，在時間點t3~t4的期間T_im_4，第4個位元資料In(3)匯入至鎖存器CDL並傳送至鎖存器L5。期間T_im_1~T_im_4的每一者具有相同的時間長度(例如30.72μs)，期間T_im_1~T_im_4的總時間長度為122.88μs(即，4×30.72μs)。 Next, refer to FIG. 7, which is a timing diagram of the operation of the page buffer circuit 1001 of the embodiment of FIGS. 5A to 5H. The timing diagram of FIG. 7 can be explained in conjunction with the flow charts of FIGS. 4B, 4C and 4D. First, during the period of time points t0 to t4, the 4-bit data In(0) to In(3) of the input vector In are sequentially imported into the latch CDL and transmitted to the corresponding latches L2 to L5 (corresponding to steps S300 to S306 of FIG. 4C). For example, during the period T_im_1 from the time point t0 to t1, the bit data In(0) of the input vector In is imported into the latch CDL and transmitted to the latch L2. Then, during the time period T_im_2 from time point t1 to t2, the next bit data In(1) is imported into the latch CDL and transmitted to the latch L3. Then, during the time period T_im_3 from time point t2 to t3, the third bit data In(2) is imported into the latch CDL and transmitted to the latch L4. Then, during the time period T_im_4 from time point t3 to t4, the fourth bit data In(3) is imported into the latch CDL and transmitted to the latch L5. Each of the periods T_im_1 to T_im_4 has the same time length (e.g., 30.72μs), and the total time length of the periods T_im_1 to T_im_4 is 122.88μs (i.e., 4×30.72μs).

本揭示的頁緩衝電路1001是基於「管線(pipeline)」操作機制，在時間點t0至時間點t3的期間可同步將權重向量We的對應位元資料(例如We(0))讀取至鎖存器DL、並傳送至鎖存器WDL(對應於第4B圖的步驟S200至步驟S206)。例如，在時間點t0~t2’的期間T_rd_1，先將權重向量We讀取至鎖存器DL。期間T_rd_1的時間長度例如是70μs。而後，在時間點t2’~t2”的期間T_int_rd_1，將權重向量We傳送至鎖存器WDL。期間T_int_rd_1的時間長度例如是5μs。 The page buffer circuit 1001 disclosed in the present invention is based on a "pipeline" operation mechanism. During the period from time point t0 to time point t3, the corresponding bit data of the weight vector We (e.g., We(0)) can be synchronously read into the latch DL and transmitted to the latch WDL (corresponding to step S200 to step S206 in FIG. 4B). For example, during the period T_rd_1 from time point t0 to t2', the weight vector We is first read into the latch DL. The time length of the period T_rd_1 is, for example, 70μs. Then, during the period T_int_rd_1 from time point t2' to t2", the weight vector We is transmitted to the latch WDL. The time length of the period T_int_rd_1 is, for example, 5μs.

接著，在時間點t4~t4’的期間T_op_1，邏輯運算單元31執行位元資料We(0)與位元資料In(0)的邏輯運算以產生運算結果In(0)．We(0)，並且將運算結果In(0)．We(0)儲存至鎖存器CDL(對應於第4D圖的步驟S400與S402)。期間T_op_1的時間長度例如是5μs。 Then, during the period T_op_1 from time point t4 to t4', the logic operation unit 31 performs a logic operation on the bit data We(0) and the bit data In(0) to generate the operation result In(0)．We(0), and stores the operation result In(0)．We(0) in the latch CDL (corresponding to steps S400 and S402 of Figure 4D). The time length of the period T_op_1 is, for example, 5μs.

Then, during the period T_ac_1 from time point t4' to t5, the accumulation circuit 1800 performs the accumulation operation on the operation result In(0)·We(0) (corresponding to step S404 of FIG. 4D). The length of the period T_ac_1 is, for example, 30.72 μs. The operation cycle T1 of FIG. 5E may include the period T_op_1 and the period T_ac_1. Owing to the pipeline mechanism, reading of the weight vector We' of the next page pg(m+1) can start concurrently at time point t4.

Next, during the period T_op_2 from time point t5 to t5', the logic operation unit 32 performs a logic operation on the bit data We(0) and the bit data In(1) to generate the operation result In(1)·We(0), and stores the operation result In(1)·We(0) in the latch CDL. Then, during the period T_ac_2 from time point t5' to t6, the accumulation circuit 1800 adds the operation result In(1)·We(0) to the operation result In(0)·We(0). The operation cycle T2 of FIG. 5F may include the period T_op_2 and the period T_ac_2. At time point t6, storage of the weight vector We' of the next page pg(m+1) in the latch DL can be completed. That is, the weight vector We' is stored in the latch DL during the period T_rd_2 from time point t4 to t6.

類似的，在後續的時間點t6~t6’的期間T_op_3，邏輯運算單元33執行位元資料We(0)與位元資料In(2)的邏輯運算，並且運算結果儲存至鎖存器CDL。而後，在時間點t6’~t7的期間T_ac_3，累加電路1800進行累加。第5G圖之運算週期T3可包括期間T_op_3與期間T_ac_3。並且，第5H圖之運算週期T4可包括期間T_op_4與期間T_ac_4，其中：時間點t7~t7’的期間T_op_4用於執行位元資料We(0)與位元資料In(3)的邏輯運算、並儲存運算結果至鎖存器CDL。並且，在時間點t7’~t8的期間T_ac_4根據上述邏輯運算結果執行累加運算。 Similarly, in the subsequent time period T_op_3 from t6 to t6', the logic operation unit 33 performs a logic operation on the bit data We(0) and the bit data In(2), and stores the operation result in the latch CDL. Then, in the time period T_ac_3 from t6' to t7, the accumulation circuit 1800 performs accumulation. The operation cycle T3 of FIG. 5G may include the period T_op_3 and the period T_ac_3. Furthermore, the operation cycle T4 of Figure 5H may include the period T_op_4 and the period T_ac_4, wherein: the period T_op_4 from time point t7 to t7' is used to perform the logic operation of the bit data We(0) and the bit data In(3), and store the operation result to the latch CDL. Furthermore, the period T_ac_4 from time point t7' to t8 performs the accumulation operation according to the above logic operation result.

而後，在時間點t8~t9的期間T_int_rd_2，頁面pg(m+1)的權重向量We’從鎖存器DL傳送至鎖存器WDL。 Then, during the period T_int_rd_2 from time point t8 to t9, the weight vector We’ of page pg(m+1) is transferred from the latch DL to the latch WDL.

Then, the period T_op_1 from time point t9 to t9' is used to perform the logic operation of the bit data We(0) of the weight vector We' of page pg(m+1) and the bit data In(0), and the period T_ac_1 from time point t9' to t10 is used to perform the accumulation operation. Next, the period T_op_2 from time point t10 to t10' is used to perform the logic operation of the bit data We(0) of the weight vector We' of page pg(m+1) and the bit data In(1), and the period T_ac_2 from time point t10' to t11 is used to perform the accumulation operation. Moreover, owing to the pipeline mechanism, the weight vector We" of the subsequent page pg(m+2) can be stored in the latch DL concurrently during the period T_rd_3 from time point t9 to t11.
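As a rough consistency check of the FIG. 7 timeline, the time points t0~t9 can be reconstructed from the example durations given above. The sketch below assumes that every import period lasts 30.72 μs, every operation cycle lasts 5 μs + 30.72 μs, and that the later read and transfer periods (T_rd_2, T_int_rd_2) take the same 70 μs and 5 μs as T_rd_1 and T_int_rd_1. The latter is an assumption, since the text gives example lengths only for the first occurrences. The variable names are illustrative.

```python
# Example durations in microseconds, taken from the description of FIG. 7
T_IM, T_RD, T_INT_RD, T_OP, T_AC = 30.72, 70.0, 5.0, 5.0, 30.72
N = 4  # bit width of the input vector

t = {}
for k in range(N + 1):
    t[k] = k * T_IM                      # t0..t4: four import periods
for k in range(1, N + 1):
    t[N + k] = t[N] + k * (T_OP + T_AC)  # t5..t8: four operation cycles T1..T4
t[9] = t[8] + T_INT_RD                   # t9: end of T_int_rd_2 (assumed 5 us)

# First page: read (70 us) plus DL->WDL transfer (5 us) should finish by t3
first_read_done = T_RD + T_INT_RD
fits_first = first_read_done <= t[3]

# Next page: the read overlapped with cycles T1 and T2 has the window t4..t6
second_read_window = t[6] - t[4]
fits_second = T_RD <= second_read_window

for k in sorted(t):
    print(f"t{k} = {t[k]:.2f} us")
```

Under these assumptions both checks hold: the first read and transfer end before t3, and a 70 μs read fits in the two operation cycles between t4 and t6, which is what allows the weight read to be hidden behind the import and compute periods.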

In one example, the page buffer circuit 1001 performs logic operations with a bit width of "4" and a dimension of "512", executing 512 VVM/MAC operations in total. The storage space of the page buffer circuit 1001 is, for example, 16 KB (i.e., 16×1024×8=131072 bits). An operation with a bit width of "4" and a dimension of "512" uses 2048 memory cells of the memory array 1500 (i.e., 4×512=2048). Executing a total of 512 VVM operations (each with a bit width of "4" and a dimension of "512") therefore requires reading the weight vector We from 8 pages (e.g., pages pg(m)~pg(m+7)), since 512×2048=1048576 bits corresponds to 8×131072 bits, and the number of read requests R_rd is "8". Accordingly, the total execution time T_total of the VVM/MAC operation of dimension "512" is 1305.92 μs, as shown in equations (1) and (2):

T_total=(N×T_im_1)+{R_rd×[N×(T_op_1+T_ac_1)+T_int_rd_1]} (1)

1305.92μs=(4×30.72μs)+{8×[4×(5μs+30.72μs)+5μs]} (2)
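Equations (1) and (2) can be checked numerically with a few lines of code. The function and parameter names below are illustrative, and the values are the example durations given in the text.

```python
def total_time_us(n: int, r_rd: int, t_im: float, t_op: float,
                  t_ac: float, t_int_rd: float) -> float:
    """Equation (1): one input import, then r_rd pipelined compute rounds."""
    return n * t_im + r_rd * (n * (t_op + t_ac) + t_int_rd)


t_total = total_time_us(n=4, r_rd=8, t_im=30.72, t_op=5.0, t_ac=30.72, t_int_rd=5.0)
print(round(t_total, 2))  # 1305.92
```

Note that the 70 μs read period T_rd_1 does not appear in equation (1), which is consistent with the page reads being overlapped with the import and compute periods in the pipelined operation.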

Next, refer to FIG. 8, which is a timing diagram of the operation of a vector-vector multiplier-adder of a comparative example. The vector-vector multiplier-adder of the comparative example of FIG. 8 performs VVM/MAC operations according to a cycle-by-cycle mechanism. During the period T_im_1 from time point t0 to t1, the bit data In(0) of the input vector In is imported into a latch (not shown). During the period T_rd_1 from time point t1 to t2, the weight vector We is read into another latch (not shown). During the period T_op_1 from time point t2 to t2', a logic operation is performed on the bit data We(0) and the bit data In(0) to generate the operation result In(0)·We(0). During the period T_ac_1 from time point t2' to t3, the accumulation circuit performs the accumulation operation on the operation result In(0)·We(0). Because the comparative example of FIG. 8 follows the cycle-by-cycle mechanism (rather than the pipeline mechanism of the present disclosure), no other operation is performed concurrently during the periods T_im_1, T_rd_1, T_op_1 and T_ac_1 from time point t0 to t3. Only after the accumulation operation ends at time point t3 are the import, reading and logic operation for the next bit data In(1) and the bit data We(0) performed. For example, during the period T_im_2 from time point t3 to t3', the next bit data In(1) of the input vector In is imported into the latch. During the period T_op_2 from time point t3' to t3", the logic operation of the bit data We(0) and the bit data In(1) is performed, and during the period T_ac_2 from time point t3" to t4, the accumulation operation is performed.

Similarly, under the cycle-by-cycle mechanism, the import, logic operation and accumulation operation for the input vector are performed in the periods T_im_3, T_op_3 and T_ac_3 from time point t4 to t5. Then, the import, logic operation and accumulation operation for the next input bit data are performed in the periods T_im_4, T_op_4 and T_ac_4 from time point t5 to t6.

Then, in the periods T_im_1, T_rd_2 and T_ac_1 from time point t6 to t8, the import of the input vector, the reading of the weight vector of the next page, the logic operation and the accumulation operation are performed.

根據第7圖之本揭示的頁緩衝電路1001的時序圖與第8圖之比較例的時序圖進行效能比較。本揭示的頁緩衝電路1001係根據管線運作機制而運作。在期間T_im_1~T_im_3執行輸入向量In的位元資料之匯入的同時，可同步執行兩個運作：第一個運作：於期間T_rd_1將目前頁面pg(m)的權重向量We儲存於鎖存器DL。第二個運作：於期間T_int_rd_1以內部傳送形式將權重向量We儲存至鎖存器WDL。 The performance comparison is performed based on the timing diagram of the page buffer circuit 1001 disclosed in FIG. 7 and the timing diagram of the comparison example in FIG. 8. The page buffer circuit 1001 disclosed in the present invention operates according to the pipeline operation mechanism. While the bit data of the input vector In is imported during the period T_im_1~T_im_3, two operations can be executed synchronously: the first operation: during the period T_rd_1, the weight vector We of the current page pg(m) is stored in the latch DL. The second operation: during the period T_int_rd_1, the weight vector We is stored in the latch WDL in the form of internal transmission.

並且，根據管線運作機制，在期間T_op_1與T_op_2執行位元資料之邏輯運算、及期間T_ac_1與T_ac_2執行累加運算的同時，可在期間T_rd_2同步地將下一個頁面pg(m+1)的權重向量We’儲存於鎖存器DL。 Furthermore, according to the pipeline operation mechanism, while the logical operation of bit data is performed in periods T_op_1 and T_op_2, and the accumulation operation is performed in periods T_ac_1 and T_ac_2, the weight vector We’ of the next page pg(m+1) can be synchronously stored in the latch DL in period T_rd_2.

因此，相較於第8圖之比較例的逐週期機制，本揭示的頁緩衝電路1001協同於累加電路1800執行的VVM/MAC運算所需的總執行時間能夠大幅降低。 Therefore, compared to the cycle-by-cycle mechanism of the comparison example in FIG. 8 , the total execution time required for the VVM/MAC operation performed by the page buffer circuit 1001 in cooperation with the accumulation circuit 1800 disclosed herein can be significantly reduced.
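To get a feel for the size of the reduction, the pipelined total of equation (1) can be set against a rough estimate for the cycle-by-cycle mechanism. The serial estimate below is not a figure from the disclosure. It assumes that the comparative example of FIG. 8 uses the same example durations as FIG. 7, and that for each page it performs one 70 μs weight read plus N strictly sequential import, operate and accumulate rounds with nothing overlapped. All names are illustrative.

```python
N, R_RD = 4, 8  # bit width and number of page reads from the example
T_IM, T_RD, T_INT_RD, T_OP, T_AC = 30.72, 70.0, 5.0, 5.0, 30.72  # microseconds


def pipelined_total_us() -> float:
    # Equation (1): the page reads are hidden behind import and compute periods
    return N * T_IM + R_RD * (N * (T_OP + T_AC) + T_INT_RD)


def serial_estimate_us() -> float:
    # Assumed reading of FIG. 8: per page, one weight read plus N sequential
    # import / logic operation / accumulation rounds, with no overlap
    return R_RD * (T_RD + N * (T_IM + T_OP + T_AC))


pipelined = pipelined_total_us()
serial = serial_estimate_us()
speedup = serial / pipelined
print(f"pipelined: {pipelined:.2f} us, serial estimate: {serial:.2f} us")
```

Under these assumptions the serial estimate is roughly twice the pipelined total. The exact ratio depends on the durations assumed for the comparative example, which the text does not specify.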

雖然本揭示已以較佳實施例及範例詳細揭示如上，可理解的是，此些範例意指說明而非限制之意義。可預期的是，所屬技術領域中具有通常知識者可想到多種修改及組合，其多種修改及組合落在本揭示之精神以及後附之申請專利範圍之範圍內。 Although the present disclosure has been disclosed in detail with preferred embodiments and examples, it is understood that these examples are intended to be illustrative rather than restrictive. It is expected that a person with ordinary knowledge in the relevant technical field can think of various modifications and combinations, and the various modifications and combinations fall within the spirit of the present disclosure and the scope of the attached patent application.

21: Sense amplifier

31, 3(N-2), 3(N-1): Logic operation units

100: Latch unit

200: Decoding circuit

300: Logic operation circuit

400: Control circuit

1001b: Page buffer circuit

BL1: Bit line

L1, L2, L(N-1), LN: Latches

DL, WDL, CDL: Latches

P1: Data input/output path

Claims

A page buffer circuit adapted for a page reading device, wherein the page reading device includes a memory array having a plurality of pages and a plurality of bit lines, the page buffer circuit comprising: a plurality of first latches, for receiving a weight vector from a corresponding one of the pages via the bit lines, and importing an input vector via a data input/output path, wherein the weight vector has a plurality of weight bit data, and the input vector has a plurality of input bit data, the first latches include a first-level latch and a weight latch, wherein the first-level latch receives one of the weight bit data, and, in response to a delay in a reading procedure of the weight vector, the weight latch receives the corresponding weight bit data from the first-level latch; a plurality of second latches, for storing the input bit data of the input vector; a plurality of logic operation units, coupled to the first latches to receive the weight bit data, and coupled to the second latches to receive the input bit data, each of the logic operation units being used to perform a logic operation on a corresponding one of the input bit data and a corresponding one of the weight bit data to generate a logic operation result, the logic operation result being transmitted to one of the first latches; and a control circuit, for selectively enabling the logic operation units to perform the logic operation.

A page buffer circuit as described in claim 1, wherein the logical operations of the input bit data and the weight bit data together form the logical operations of the input vector and the weight vector.

A page buffer circuit as described in claim 1, wherein the page buffer circuit operates in a plurality of operation calculation cycles, and the control circuit selectively enables one of the logic operation units in a corresponding one of the operation calculation cycles.

The page buffer circuit as described in claim 1 further includes: a decoding circuit coupled to a corresponding one of the bit lines and used to decode the weight vector to obtain one of the weight bit data.

A page buffer circuit as described in claim 4, wherein the first-level latch is coupled to the decoding circuit to receive a corresponding one of the weight bit data.

The page buffer circuit as described in claim 5, wherein the weight latch further provides the corresponding weight bit data to the logic operation units.

A page buffer circuit as described in claim 1, wherein the first latches include: a final latch coupled to the data input/output path to receive the corresponding input bit data.

A page buffer circuit as described in claim 7, wherein the final latch provides the corresponding input bit data to the second latches.

The page buffer circuit as described in claim 7, wherein the final latch is further coupled to the logic operation units to receive the logic operation results.

A page buffer circuit as described in claim 9, wherein the page buffer circuit is coupled to an accumulation circuit, and the final latch provides the logic operation results to the accumulation circuit via the data input/output path.

An operating method of a page buffer circuit adapted for a page read device, wherein the page read device includes a memory array having a plurality of pages and a plurality of bit lines, the operating method comprising: receiving, by a plurality of first latches of the page buffer circuit, a weight vector from a corresponding one of the pages via the bit lines, and importing an input vector to the first latches via a data input/output path, wherein the weight vector has a plurality of weight bit data, and the input vector has a plurality of input bit data, wherein a first-level latch among the first latches receives one of the weight bit data, and, in response to a delay in a reading procedure of the weight vector, a weight latch among the first latches receives the corresponding weight bit data from the first-level latch; storing the input bit data of the input vector in a plurality of second latches of the page buffer circuit; receiving, by a plurality of logic operation units of the page buffer circuit, the weight bit data from the first latches and the input bit data from the second latches; performing, by each of the logic operation units, a logic operation on a corresponding one of the input bit data and a corresponding one of the weight bit data to generate a logic operation result; transmitting the logic operation result to one of the first latches; and selectively enabling, by a control circuit of the page buffer circuit, the logic operation units to perform the logic operation.

The operating method as described in claim 11, wherein the logical operations of the input bit data and the weight bit data together form the logical operations of the input vector and the weight vector.

An operating method as described in claim 11, wherein the page buffer circuit operates with a plurality of operation calculation cycles, and the control circuit selectively enables one of the logic operation units in a corresponding one of the operation calculation cycles.

The operating method as described in claim 11 further includes: decoding the weight vector by a decoding circuit of the page buffer circuit to obtain one of the weight bit data.

The operating method as described in claim 14 further includes: receiving a corresponding one of the weight bit data from the decoding circuit via the first-level latch.

The operating method as described in claim 15 further includes: providing the corresponding weight bit data to the logic operation units through the weight latch.

The operating method as described in claim 11 further includes: receiving the corresponding input bit data from the data input/output path through a final-stage latch among the first latches.

The operating method as described in claim 17 further includes: providing the corresponding input bit data to the second latches through the final-stage latch.

The operating method as described in claim 17 further includes: receiving the logical operation results through the final-stage latch.

The operating method as described in claim 19, wherein the page buffer circuit is coupled to an accumulation circuit, and the operating method further includes: providing the logic operation results to the accumulation circuit via the data input/output path through the final-stage latch.