WO2025038372A1 - System, method, and apparatus for wafer-scale memory - Google Patents
System, method, and apparatus for wafer-scale memory
- Publication number
- WO2025038372A1 (application PCT/US2024/041403)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- write
- modules
- memory
- wafer
- read
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D62/00—Semiconductor bodies, or regions thereof, of devices having potential barriers
- H10D62/40—Crystalline structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/16—Constructional details or arrangements
- G06F1/18—Packaging or power distribution
- G06F1/183—Internal mounting support structures, e.g. for printed circuit boards, internal connecting means
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/02—Detection or location of defective auxiliary circuits, e.g. defective refresh counters
- G11C29/022—Detection or location of defective auxiliary circuits, e.g. defective refresh counters in I/O circuitry
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1051—Data output circuits, e.g. read-out amplifiers, data output buffers, data output registers, data output level conversion circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1078—Data input circuits, e.g. write amplifiers, data input buffers, data input registers, data input level conversion circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C8/00—Arrangements for selecting an address in a digital store
- G11C8/12—Group selection circuits, e.g. for memory block selection, chip selection, array selection
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K19/00—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
- H03K19/02—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components
- H03K19/173—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
- H03K19/177—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
- H03K19/17748—Structural details of configuration resources
- H03K19/1776—Structural details of configuration resources for memories
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10B—ELECTRONIC MEMORY DEVICES
- H10B51/00—Ferroelectric RAM [FeRAM] devices comprising ferroelectric memory transistors
- H10B51/30—Ferroelectric RAM [FeRAM] devices comprising ferroelectric memory transistors characterised by the memory core region
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D30/00—Field-effect transistors [FET]
- H10D30/01—Manufacture or treatment
- H10D30/021—Manufacture or treatment of FETs having insulated gates [IGFET]
- H10D30/0415—Manufacture or treatment of FETs having insulated gates [IGFET] of FETs having ferroelectric gate insulators
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D30/00—Field-effect transistors [FET]
- H10D30/60—Insulated-gate field-effect transistors [IGFET]
- H10D30/701—IGFETs having ferroelectric gate insulators, e.g. ferroelectric FETs
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D64/00—Electrodes of devices having potential barriers
- H10D64/01—Manufacture or treatment
- H10D64/031—Manufacture or treatment of data-storage electrodes
- H10D64/033—Manufacture or treatment of data-storage electrodes comprising ferroelectric layers
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D64/00—Electrodes of devices having potential barriers
- H10D64/60—Electrodes characterised by their materials
- H10D64/66—Electrodes having a conductor capacitively coupled to a semiconductor by an insulator, e.g. MIS electrodes
- H10D64/68—Electrodes having a conductor capacitively coupled to a semiconductor by an insulator, e.g. MIS electrodes characterised by the insulator, e.g. by the gate insulator
- H10D64/689—Electrodes having a conductor capacitively coupled to a semiconductor by an insulator, e.g. MIS electrodes characterised by the insulator, e.g. by the gate insulator having ferroelectric layers
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D30/00—Field-effect transistors [FET]
- H10D30/60—Insulated-gate field-effect transistors [IGFET]
- H10D30/67—Thin-film transistors [TFT]
- H10D30/674—Thin-film transistors [TFT] characterised by the active materials
- H10D30/6755—Oxide semiconductors, e.g. zinc oxide, copper aluminium oxide or cadmium stannate
- H10D30/6756—Amorphous oxide semiconductors
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D99/00—Subject matter not provided for in other groups of this subclass
Definitions
- The present disclosure relates to integrated circuits and, more particularly, to wafer-scale memory.
- Wafers serve as substrates in semiconductor manufacturing and play a role in the production of integrated circuits (ICs). They are typically thin slices of semiconductor material, such as silicon, which is used due to its electrical properties, cost-effectiveness, and the availability of high-quality single crystals. Wafers can vary in size, with diameters ranging from a few millimeters to 450 millimeters, and their thickness is typically controlled to ensure uniformity during processing.
- The manufacturing process for semiconductor devices on wafers may involve multiple stages, including doping, oxidation, photolithography, etching, and deposition. These processes collectively form intricate and precise patterns of transistors, capacitors, resistors, and other electronic components that constitute an integrated circuit. Modern semiconductor devices often incorporate millions to billions, or more, of these components to provide the desired functionality.
- Wafer-scale integration involves fabricating multiple dies on a single wafer and interconnecting them to function as a unified system. This approach can improve performance by reducing latency, increasing bandwidth, and enabling more efficient power distribution compared to conventional packaging methods.
- An integrated circuit is disclosed herein that may be part of a semiconductor device.
- A method of manufacturing and a method of writing data to and reading data from modules are also disclosed herein and can be used with all examples, embodiments, and aspects described herein.
- a semiconductor package may include a memory wafer comprising a plurality of modules groups, where each modules group includes multiple modules such as a first module and a second module.
- the package is designed with a write port configured to write to the entire modules group, along with a first read port dedicated to reading data from the first module and a second read port dedicated to reading data from the second module.
- an application wafer is bonded to the memory wafer and is in operative communication with the plurality of modules groups.
- the semiconductor package may include a memory wafer and an application wafer where the outer perimeter of the memory wafer and the outer perimeter of the application wafer are coextensive relative to each other.
- the semiconductor package may involve an application wafer that optionally includes one or more of various processing elements.
- processing elements may include an FPGA core, a microcontroller, a GPU processing element, an ASIC, a CPU core, a neural network accelerator, a digital signal processor (DSP), a reconfigurable processing unit, a machine learning inference engine, a tensor processing unit (TPU), a system-on-chip (SoC), a network processing unit (NPU), an embedded processor, a security module, and a programmable logic device.
- the semiconductor package may include a shared write port configured to write to both the first module and the second module within the modules group.
- the semiconductor package may feature a shared write port that is configured to write to an address space, where the shared write port is arranged to write to the first module through a first portion of the address space and to the second module through a second portion of the address space.
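The address-space split described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the shared write port divides one flat address space into equal contiguous portions, one per module. The class `ModulesGroup`, its method names, and the module size are hypothetical and do not come from the disclosure.

```python
class ModulesGroup:
    """Toy model: one shared write port, one read port per module."""

    def __init__(self, num_modules: int, module_size: int):
        # Assumption: every module holds the same number of addressable words.
        self.module_size = module_size
        self.modules = [[0] * module_size for _ in range(num_modules)]

    def decode(self, write_addr: int) -> tuple[int, int]:
        """Map a shared write address to (module index, local offset)."""
        if not 0 <= write_addr < self.module_size * len(self.modules):
            raise IndexError("write address outside the group's address space")
        return divmod(write_addr, self.module_size)

    def write(self, write_addr: int, value: int) -> None:
        """Write through the shared port; the address selects the module."""
        module, offset = self.decode(write_addr)
        self.modules[module][offset] = value

    def read(self, module: int, offset: int) -> int:
        """Read through the read port dedicated to one module."""
        return self.modules[module][offset]


group = ModulesGroup(num_modules=2, module_size=4)
group.write(1, 0xAA)  # first portion of the address space -> first module
group.write(5, 0xBB)  # second portion of the address space -> second module
print(group.read(0, 1), group.read(1, 1))
```

Other partitions (interleaved, unequal portions) would satisfy the same description; the equal contiguous split is chosen here only because it is the simplest to decode.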
- the semiconductor package may include a write port that is specifically a shared write port, wherein this shared write port is configured to write solely to the modules group.
- the semiconductor package may include a plurality of modules groups that comprise at least one write-group. Each write-group within these modules groups includes a respective read port and a common write port.
- the semiconductor package may include at least one write-group, which can be further divided into a first write group and a second write group. The first write group can comprise approximately the first half of the plurality of modules groups, while the second write group comprises approximately the second half of the plurality of modules groups.
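As a small illustration of the "approximately half" wording, the sketch below splits a list of modules groups into two write groups. The function name `split_into_write_groups` and the choice to give the extra element to the first group when the count is odd are assumptions made for this example only.

```python
def split_into_write_groups(modules_groups: list) -> tuple[list, list]:
    """Split modules groups into a first and a second write group.

    Assumption: with an odd count, the first write group gets the extra one,
    which is one way to read "approximately the first half".
    """
    half = (len(modules_groups) + 1) // 2
    return modules_groups[:half], modules_groups[half:]


first, second = split_into_write_groups(list(range(5)))
print(first, second)
```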
- the semiconductor package may be configured such that the first write group is the modules group.
- a semiconductor package may feature a modules group designated as a write group, where the write port serves as a common write port for the entire write group.
- the semiconductor package may include a write-group that encompasses all the modules of the memory wafer. This configuration allows the write-group to span across the entirety of the memory wafer, ensuring that all modules are included within the write-group.
- the semiconductor package includes a memory wafer that is cut into a rectangle.
- the semiconductor package may include a memory wafer that is cut into a square.
- the semiconductor package may include a memory wafer consisting of a plurality of modules groups, wherein each module within these groups is designated as a microvault.
- the semiconductor package may feature each module within the plurality of modules having an independent read port. This configuration allows for concurrent reading via a respective independent read port for any of the modules.
- the first read port is configured to read data from the first module while, simultaneously, the second read port reads data from the second module.
- the semiconductor package may feature a modules group designed to have a single write address space. This configuration allows for writing to the plurality of modules within the modules group using this single write address space.
- the semiconductor package may feature a first read port that is configured to have a first read address space, along with a second read port that is configured to have a second read address space.
- the semiconductor package may be configured such that the first read address space numerically overlaps with the second read address space.
- the semiconductor package may be configured such that the first read address space is coextensive with the second read address space. This means that the read address space designated for the first read port and the read address space designated for the second read port encompass the same range of addresses.
- the semiconductor package may be configured such that the first read address space is contiguous with the second read address space.
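The three relationships between read address spaces named above (overlapping, coextensive, contiguous) can be made concrete with a short sketch. It assumes each read address space is a non-empty, half-open range of integer addresses modeled with Python's built-in `range` (step 1); the helper names are hypothetical.

```python
def overlaps(a: range, b: range) -> bool:
    """True if the two address ranges share at least one numeric address."""
    return a.start < b.stop and b.start < a.stop


def coextensive(a: range, b: range) -> bool:
    """True if both ranges cover exactly the same addresses."""
    return a.start == b.start and a.stop == b.stop


def contiguous(a: range, b: range) -> bool:
    """True if one range begins exactly where the other ends."""
    return a.stop == b.start or b.stop == a.start


port1 = range(0, 1024)
same = range(0, 1024)      # e.g. each port numbers its own module from 0
next_ = range(1024, 2048)  # e.g. ports carve up one larger numbering

print(overlaps(port1, same), coextensive(port1, same), contiguous(port1, same))
print(overlaps(port1, next_), coextensive(port1, next_), contiguous(port1, next_))
```

Under this reading, coextensive spaces are a special case of overlapping ones, while contiguous spaces do not overlap at all.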
- the semiconductor package may include an application wafer that integrates a plurality of processing modules.
- the semiconductor package may further include a redistribution layer on either the memory wafer or the application wafer to provide power.
- the semiconductor package may include another wafer that is bonded to the memory wafer to provide power supply routing to all of the plurality of modules.
- the semiconductor package may include an application wafer that features an application programming interface.
- the semiconductor package may further include a plurality of vias coupled to the memory wafer and the application wafer.
- the semiconductor package may include a plurality of vias positioned on the periphery of the memory wafer.
- the semiconductor package may include a plurality of vias positioned at the center of the memory wafer.
- the semiconductor package may include a heat sink material that is interposed between the application wafer and the memory wafer.
- the semiconductor package may feature an application wafer that includes a Network-on-Chip (NoC) with routers interconnecting a plurality of processing elements.
- the semiconductor package may include a memory wafer configured with a single write-grouping that features a single write port. This configuration ensures that the write port operates as the sole write interface for the entire memory wafer.
- the semiconductor package may be configured such that the plurality of modules is designed for a write endurance sufficient for at least one write per day over 10 years.
- the semiconductor package may be configured such that the plurality of modules is designed for a write endurance sufficient for at least one write per week over 10 years.
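The two endurance figures translate into fairly small write-cycle counts. The arithmetic below is a sketch that assumes an average year of 365.25 days and one full write per interval; the function name `min_write_cycles` is hypothetical.

```python
import math

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap days


def min_write_cycles(writes_per_day: float, years: float) -> int:
    """Minimum number of write cycles a module must survive."""
    return math.ceil(writes_per_day * DAYS_PER_YEAR * years)


daily = min_write_cycles(1, 10)       # one write per day for 10 years
weekly = min_write_cycles(1 / 7, 10)  # one write per week for 10 years
print(daily, weekly)
```

Both counts are in the hundreds to low thousands of cycles, which suggests a read-dominated workload in which writes are rare compared with reads.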
- the semiconductor package may include a memory wafer that is formed from a complete slice of an ingot.
- the semiconductor package may include a memory wafer that is round.
- the semiconductor package may feature a memory wafer that has a diameter of 12 inches and includes an aggregate memory capacity of the plurality of modules configured to provide at least 10 trillion bytes of storage.
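To give a sense of scale, the sketch below estimates the average storage density implied by 10 trillion bytes on a 12-inch round wafer. It assumes the full circular area is usable and takes 12 inches literally as 304.8 mm; edge exclusion and the area used by peripheral circuitry are ignored, so the result is only a lower bound on the density actually required. The function name is hypothetical.

```python
import math

MM_PER_INCH = 25.4


def required_density_bytes_per_mm2(diameter_in: float, capacity_bytes: float) -> float:
    """Average bytes per square millimetre over the whole wafer surface."""
    radius_mm = diameter_in * MM_PER_INCH / 2
    area_mm2 = math.pi * radius_mm ** 2
    return capacity_bytes / area_mm2


density = required_density_bytes_per_mm2(12, 10e12)
print(f"{density / 1e6:.0f} MB per mm^2")
```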
- the semiconductor package may comprise an application wafer and a memory wafer that are interconnected using bump-less bonding technology.
- the semiconductor package may include a memory wafer that incorporates memory repair mechanisms configured for in-situ repair of faulty memory modules.
- the semiconductor package may be configured such that each module of the plurality of modules includes fault-tolerance routing. This configuration enables any module within the memory wafer to route data around a failed module.
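Routing around a failed module can be sketched as a shortest-path search that skips faulty nodes. The example assumes, for illustration only, that modules form a 2D mesh with links to their four nearest neighbors and that the set of failed modules is known; the disclosure does not specify a topology or a routing algorithm, and the function name `route` is hypothetical.

```python
from collections import deque


def route(rows, cols, src, dst, failed):
    """Breadth-first search for a shortest hop path that avoids failed modules.

    Modules are (row, col) tuples on a rows x cols mesh. Returns the path as a
    list from src to dst, or None if no fault-free path exists.
    """
    if src in failed or dst in failed:
        return None
    prev = {src: None}  # visited modules and the hop each was reached from
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_mesh = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_mesh and nxt not in failed and nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# The direct neighbor (0, 1) has failed, so traffic detours through row 1.
path = route(3, 3, src=(0, 0), dst=(0, 2), failed={(0, 1)})
print(path)
```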
- the semiconductor package may include functionality such that when a module within the semiconductor package is determined to be faulty, a corresponding processing element is also determined to be faulty.
- the semiconductor package may further comprise an interconnection between the memory wafer and the application wafer. This interconnection is configured to utilize through-silicon vias (TSVs), which are designed to form interconnected buses between the memory wafer and the application wafer.
- the semiconductor package may include an AI inference fabric disposed on the application wafer.
- the semiconductor package may include a plurality of modules configured to support deep neural network operations through large-scale dataflow mapping.
- the semiconductor package may be configured such that each module within the plurality of modules receives configuration data corresponding to a spatial dataflow. This configuration data is used to optimize the execution of an AI inference task.
- a semiconductor package may feature a first read port disposed on the surface of the memory wafer.
- the first module within this configuration comprises a memory array that includes a plurality of memory unit cells arranged within the first module. Additionally, the module may be equipped with read peripheral circuitry that is configured to read data stored within the memory array via the first read port.
- the semiconductor package may feature an arrangement where the footprint of the read peripheral circuitry on the surface overlaps with the footprint of the memory array on the surface.
- a semiconductor package may include a first read port on the surface of the memory wafer, where the footprint of the first read port overlaps the footprint of the memory array on the same surface.
- the semiconductor package may feature a configuration where the footprint on the surface of the read peripheral circuitry is coextensive with the footprint on the surface of the memory array.
- the semiconductor package may include a memory wafer where the footprint on the surface of the first read port is coextensive with the footprint on the surface of the memory array.
- the memory array, the read peripheral circuitry, and the first read port are vertically disposed to minimize the module footprint on the surface of the memory wafer. This vertical arrangement ensures that the memory array, the read peripheral circuitry, and the first read port are stacked in a vertical configuration on the surface of the memory wafer. Consequently, the surface area occupied by each module is minimized.
- the semiconductor package may also include a write peripheral circuitry connected to the write port on the surface of the memory wafer.
- This write peripheral circuitry defines a first footprint on the surface, which does not overlap with the footprint of the memory array and does not overlap with the footprint of the first read port.
- the write port defines a second footprint on the surface, which does not overlap with the footprint of the memory array and does not overlap with the footprint of the first read port.
- the semiconductor package may include a write peripheral circuitry connected to the write port on the surface of the memory wafer, wherein the first footprint of the write peripheral circuitry overlaps with the second footprint of the write port. This configuration ensures that the two footprints share a common area on the surface of the memory wafer.
- the semiconductor package may include a plurality of through-silicon vias (TSVs) positioned in a central portion of the first wafer pair.
- the semiconductor package may include a heatsink material disposed between the first wafer pair and the second wafer pair to facilitate thermal management.
- the semiconductor package may further include a redistribution layer disposed between the first and second wafer pairs.
- the semiconductor package may include a redistribution layer that supplies power to one of the first and second wafer pairs via a through-silicon via (TSV).
- the semiconductor package may further include another redistribution layer disposed on a second side of the first wafer pair.
- a method of forming a semiconductor package may involve forming a memory wafer having a plurality of modules groups, where each modules group includes multiple modules such as a first module and a second module.
- the method also includes configuring the modules group with a write port that is capable of writing data to the entire group, a first read port designed specifically for reading data from the first module, and a second read port designed specifically for reading data from the second module.
- This procedural step can be followed by bonding an application wafer to the memory wafer, thereby enabling operative communication between the application wafer and the plurality of modules groups on the memory wafer.
- the method of forming a semiconductor package may involve forming the memory wafer and the application wafer such that the outer perimeter of the memory wafer and the outer perimeter of the application wafer are coextensive relative to each other.
- the method of forming a semiconductor package may include forming the memory wafer and the application wafer such that the application wafer includes at least one of the following processing elements: an FPGA core, a microcontroller, a GPU processing element, an ASIC, a CPU core, a neural network accelerator, a digital signal processor (DSP), a reconfigurable processing unit, a machine learning inference engine, a tensor processing unit (TPU), a system-on-chip (SoC), a network processing unit (NPU), an embedded processor, a security module, and a programmable logic device.
- the method of forming a semiconductor package may involve configuring the write port as a shared write port, which is set up to write to both the first module and the second module within the modules group.
- the method of forming a semiconductor package may involve configuring the shared write port to write to an address space. Specifically, the shared write port may be set up to write to the first module through a first portion of the address space and to the second module through a second portion of the address space.
- the method of forming a semiconductor package may involve configuring the write port as a shared write port, which is specifically set up to write solely to the modules group. This ensures that the shared write port directs its write operations only within the confines of the designated modules group and does not extend its write functionality to other parts of the semiconductor package.
- the method of forming a semiconductor package may involve forming at least one write-group within the plurality of modules groups. This write-group is designed to include a respective read port for each write-group and a common write port that services the entire write-group, enabling efficient data writing and reading operations within the designated modules.
- the method of forming a semiconductor package may include creating a first write group and a second write group within at least one write-group. The first write group encompasses approximately the first half of the plurality of modules groups, while the second write group includes approximately the second half of the plurality of modules groups.
- the semiconductor package may be configured such that the first write group is identified as the modules group.
- the method of forming a semiconductor package may involve creating the modules group as a write group and configuring the write port as a common write port for the entire write group. This configuration allows the write port to serve as the sole interface for writing data to all the modules within the modules group.
- the semiconductor package may include a write group that encompasses all the modules of the memory wafer. This configuration allows the write group to span across the entirety of the memory wafer, ensuring that all modules are included within the write group.
- the method of forming a semiconductor package may further involve cutting the memory wafer into a rectangle.
- the method of forming a semiconductor package may include cutting the memory wafer into a square.
- the method of forming a semiconductor package may involve configuring each module within the plurality of modules as a microvault.
- a method of forming a semiconductor package may involve configuring each module within the plurality of modules to include an independent read port. This configuration allows for concurrent reading via a respective independent read port for any of the modules. Specifically, the first read port is designed to read data from the first module while simultaneously, another read operation can be conducted from the second module using the second read port.
- the method of forming a semiconductor package may include configuring the modules group to have a single write address space. This configuration is designed to enable writing data to the plurality of modules within the modules group using this single write address space.
- the method of forming a semiconductor package may include forming the first read port to have a first read address space and the second read port to have a second read address space.
- the method of forming a semiconductor package may further involve configuring the first read address space such that it numerically overlaps with the second read address space.
- the method of forming a semiconductor package may also involve configuring the first read address space to be coextensive with the second read address space.
- the method of forming a semiconductor package may involve configuring the first read address space to be contiguous with the second read address space.
- the method of forming the semiconductor package may include forming a plurality of processing modules in the application wafer.
- the method of forming a semiconductor package may include providing a redistribution layer on either the memory wafer or the application wafer to supply power.
- the method of forming a semiconductor package may involve bonding another wafer to the memory wafer to provide power supply routing to all of the plurality of modules. This additional wafer is integrated in such a way to ensure effective power delivery across the various modules located within the memory wafer.
- the method of forming a semiconductor package may involve forming an application programming interface (API) within the application wafer.
- the method of forming the semiconductor package may involve coupling a plurality of vias to the memory wafer and the application wafer.
- the method of forming a semiconductor package may involve coupling a plurality of vias to the memory wafer and the application wafer and disposing the plurality of vias on the periphery of the memory wafer. This arrangement positions the vias around the edges of the memory wafer to facilitate the interconnection between the memory wafer and the application wafer.
- the method of forming a semiconductor package may involve coupling a plurality of vias to the memory wafer and the application wafer. Specifically, this embodiment further includes disposing the plurality of vias at the center of the memory wafer.
- the method of forming a semiconductor package may involve interposing a heat sink material between the application wafer and the memory wafer.
- the method of forming a semiconductor package may include forming a Network-on-Chip (NoC) within the application wafer.
- the NoC is designed with routers that interconnect a plurality of processing elements, facilitating efficient communication and data transfer between these elements on the application wafer.
- the method of forming a semiconductor package may involve configuring the memory wafer to have a single write-grouping with a single write port. This configuration ensures that the single write port operates as the sole write interface for the entire memory wafer.
- the method of forming a semiconductor package may include configuring the plurality of modules to ensure a write endurance of at least once a day for a period of 10 years.
- the method of forming a semiconductor package may involve configuring the plurality of modules to endure writes at a frequency of at least once a week for a duration of 10 years.
- the method of forming a semiconductor package may involve forming the memory wafer from a complete slice of an ingot.
- the method of forming a semiconductor package may include forming the memory wafer as a round wafer.
- the method may include configuring the memory wafer to be a 12-inch wafer.
- This 12-inch memory wafer may be designed to incorporate an aggregate memory from the plurality of modules, providing a combined storage capacity of at least 10 trillion bytes.
- a method of forming a semiconductor package may further involve interconnecting the application wafer and the memory wafer using bump-less bonding technology. This method ensures an efficient and reliable connection between the two wafers.
- the method of forming a semiconductor package may include incorporating memory repair mechanisms within the memory wafer. These mechanisms are configured to enable in-situ repair of faulty memory modules, ensuring continued functionality and reliability of the memory components within the semiconductor package.
- the method of forming a semiconductor package may involve configuring each module within the plurality of modules to include fault-tolerance routing. This fault-tolerance routing is designed to enable any module within the memory wafer to route data around a failed module. This ensures continued data flow and communication integrity within the memory wafer, allowing the remaining operational modules to bypass the affected area efficiently.
- the method may further involve determining a corresponding processing element to be faulty when it is determined that a module within the semiconductor package has failed.
- the method of forming a semiconductor package may involve interconnecting the memory wafer and the application wafer using through-silicon vias (TSVs). These TSVs are configured to form interconnected buses between the memory wafer and the application wafer.
- the method of forming a semiconductor package may involve forming an AI inference fabric on the application wafer.
- the method of forming a semiconductor package may involve configuring the plurality of modules to support deep neural network operations through large-scale dataflow mapping.
- the method of forming a semiconductor package may involve configuring each module within the plurality of modules to receive configuration data corresponding to a spatial dataflow. This configuration data is used to optimize the execution of an AI inference task.
- the method involves forming the memory array, the read peripheral circuitry, and the first read port in a vertical arrangement on the surface of the memory wafer.
- This vertical configuration is designed to reduce the footprint of the module on the surface of the memory wafer, effectively minimizing the space occupied by each module.
- the method of forming a semiconductor package may include creating a write peripheral circuitry connected to the write port on the surface of the memory wafer.
- This write peripheral circuitry defines a first footprint on the surface, and this first footprint does not overlap with the footprint of the memory array nor with the footprint of the first read port.
- the write port defines a second footprint on the surface, and this second footprint does not overlap with the footprint of the memory array nor with the footprint of the first read port.
- a method of forming a semiconductor package may involve creating a write peripheral circuitry connected to the write port on the surface of the memory wafer.
- This write peripheral circuitry is configured to define a first footprint on the surface, which does not overlap with the footprint of the memory array and does not overlap with the footprint of the first read port.
- the write port defines a second footprint on the surface, which does not overlap with the footprint of the memory array and does not overlap with the footprint of the first read port.
- the method further includes forming the first footprint to overlap the second footprint.
- the method of forming a semiconductor package may involve forming a plurality of wafer pairs, where the application wafer and the memory wafer combine to form a first wafer pair within the plurality of wafer pairs.
- a method of forming a semiconductor package may include connecting the first wafer pair to a second wafer pair among the plurality of wafer pairs using a plurality of through-silicon vias (TSVs).
- the method of forming a semiconductor package may further involve disposing the plurality of through-silicon vias (TSVs) on a periphery of the first wafer pair.
- the method of forming a semiconductor package may involve positioning the plurality of through-silicon vias (TSVs) on a central portion of the first wafer pair.
- the method of forming a semiconductor package may involve positioning a heatsink material between the first wafer pair and the second wafer pair.
- the method of forming a semiconductor package may further involve disposing a redistribution layer between the first and second wafer pairs.
- the method of forming a semiconductor package may further comprise supplying power to one of the first and second wafer pairs via a through-silicon via (TSV) using the redistribution layer.
- the method of forming a semiconductor package may further involve disposing an additional redistribution layer on a second side of the first wafer pair.
- a method of using a semiconductor package may involve writing data to a modules group using a write port, where the modules group comprises a plurality of modules, including a first module and a second module. The method further includes reading data from the first module using a first read port and reading data from the second module using a second read port. Additionally, the method encompasses operatively communicating between a memory wafer that contains the modules group and an application wafer that is bonded to the memory wafer.
- a method of using a semiconductor package may involve writing data to a modules group using a write port, where the modules group comprises a plurality of modules, including a first module and a second module. The method further includes reading data from the first module using a first read port and reading data from the second module using a second read port. Additionally, the method encompasses operatively communicating between a memory wafer that contains the modules group and an application wafer that is bonded to the memory wafer. The method also involves aligning the outer perimeter of the memory wafer to be coextensive with the outer perimeter of the application wafer. The method of using a semiconductor package may involve writing data to the first module and the second module through the use of a shared write port. This approach facilitates the writing process by utilizing a common interface to manage data input to both modules within the modules group.
- the method may involve configuring the shared write port to write to an address space.
- the shared write port may be arranged to write to the first module through a first portion of the address space and to the second module through a second portion of the address space.
- the method may involve configuring a shared write port specifically to write solely to the modules group.
- the method of using a semiconductor package may involve configuring a plurality of modules groups. Each of these modules groups may be designed to have its own respective read port and a common write port, ensuring efficient data operations within the modules groups.
- the method of using a semiconductor package may involve configuring the modules groups into a first write group and a second write group.
- the first write group may comprise approximately the first half of the plurality of modules groups, while the second write group comprises approximately the second half of the plurality of modules groups.
- the method may involve configuring the first write group to be the modules group.
- the method may involve configuring the modules group as a write group and utilizing a common write port for the write group.
- the method of forming a semiconductor package may involve configuring all the modules of the memory wafer into a single write group. This configuration allows the write port to serve as the common write interface for all the modules within the memory wafer, thereby enabling unified and coordinated data writing operations across the entire memory wafer.
- the method may involve configuring each module within the plurality of modules to include an independent read port. This configuration allows for concurrent reading via respective independent read ports, ensuring that data can be read simultaneously from the modules without contention. This means that each module can be accessed directly and independently for reading operations, facilitating efficient and parallel data retrieval from the memory wafer.
- the method of using a semiconductor package may involve configuring the modules group to possess a single write address space. This configuration allows data to be written to the plurality of modules within the modules group using this unified write address space.
- the method of forming a semiconductor package may involve configuring the first read port to have a first read address space and configuring the second read port to have a second read address space.
- the method of forming a semiconductor package involves configuring the first read address space to numerically overlap with the second read address space.
- the method may also involve configuring the first read address space to be coextensive with the second read address space.
- the method of configuring a semiconductor package may involve setting the first read address space to be contiguous with the second read address space. This configuration ensures that the read address spaces for the first and second read ports are adjacent to each other in the memory address space.
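- The three relationships between read address spaces named above (numerically overlapping, coextensive, and contiguous) can be made concrete with a small sketch. It assumes each read address space is a half-open integer range; the `AddressSpace` class, the helper names, and the example ranges are illustrative and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSpace:
    """Half-open address range [start, end) seen through one read port (assumed model)."""
    start: int
    end: int


def overlaps(a: AddressSpace, b: AddressSpace) -> bool:
    # The two ranges share at least one numeric address.
    return a.start < b.end and b.start < a.end


def coextensive(a: AddressSpace, b: AddressSpace) -> bool:
    # The two ranges cover exactly the same numeric addresses.
    return a.start == b.start and a.end == b.end


def contiguous(a: AddressSpace, b: AddressSpace) -> bool:
    # One range ends exactly where the other begins.
    return a.end == b.start or b.end == a.start


# Hypothetical example ranges.
first = AddressSpace(0x0000, 0x1000)
same = AddressSpace(0x0000, 0x1000)      # coextensive with `first`
adjacent = AddressSpace(0x1000, 0x2000)  # contiguous with `first`
```

In this model, coextensive read spaces let every processing element use the same local addresses for its own module, while contiguous read spaces place the modules side by side in one larger map.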
- the method for using a semiconductor package may involve configuring a plurality of processing modules on the application wafer.
- the method may involve supplying power to either the memory wafer or the application wafer using a redistribution layer.
- the method of using a semiconductor package may involve routing power to all of the plurality of modules.
- the method of using a semiconductor package may further include querying an application programming interface (API) on the application wafer.
- the method of using a semiconductor package may involve communicating between a plurality of processing elements on the application wafer via a Network-on-Chip (NoC) with routers.
- the method of using a semiconductor package may involve configuring the memory wafer to feature a single write-grouping that includes a single write port. This configuration ensures that the write port operates as the sole write interface for the entire memory wafer, thereby facilitating data writing operations across the memory modules groupings.
- the method of using a semiconductor package may encompass configuring the plurality of modules to support write endurance of at least once a day for a duration of 10 years. This configuration ensures that each module can endure the specified write frequency over the designated time period, maintaining reliable performance throughout the package's operational lifespan.
- the method of using a semiconductor package may involve configuring the plurality of modules for a write endurance of at least once a week for 10 years.
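- The two endurance figures above translate into small minimum write-cycle counts, which the following arithmetic sketch makes explicit. The 365.25-day year and the function name are assumptions for illustration; the disclosure states only the write frequency and the 10-year duration.

```python
import math

DAYS_PER_YEAR = 365.25  # assumed average year length, including leap days


def min_write_cycles(writes_per_day: float, years: float) -> int:
    """Smallest whole number of write cycles a cell must survive."""
    return math.ceil(writes_per_day * DAYS_PER_YEAR * years)


daily = min_write_cycles(1, 10)       # one write per day for 10 years
weekly = min_write_cycles(1 / 7, 10)  # one write per week for 10 years
```

Under these assumptions the daily case needs on the order of a few thousand cycles and the weekly case a few hundred.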
- the method of forming a semiconductor package may involve facilitating communication between the application wafer and the memory wafer.
- the method of using a semiconductor package may involve detecting a faulty memory module within the modules group. This detection process ensures that any defective modules are identified during the operation of the memory wafer and the application wafer.
- the method of using a semiconductor package may involve routing around a faulted module within the modules group. This process ensures that data can be redirected to avoid the faulty module, maintaining the integrity and flow of data within the system.
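- The rerouting described above can be sketched as a shortest-path search that skips failed modules. This is a minimal illustration only: it assumes the modules form a two-dimensional mesh with links between horizontal and vertical neighbours, and the `route` function and its parameters are hypothetical rather than taken from the disclosure.

```python
from collections import deque


def route(rows, cols, src, dst, failed):
    """Return a shortest hop path on a rows x cols mesh that avoids failed
    modules, or None when no such path exists."""
    if src in failed or dst in failed:
        return None
    prev = {src: None}  # visited set and back-pointers in one mapping
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            in_mesh = 0 <= nr < rows and 0 <= nc < cols
            if in_mesh and nxt not in failed and nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Module (0, 1) has failed, so traffic from (0, 0) to (0, 2) detours through row 1.
path = route(3, 3, (0, 0), (0, 2), failed={(0, 1)})
```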
- the method involves determining a corresponding processing element to be faulty when a module within the modules group is determined to have failed. This process ensures that any defective modules are identified alongside their associated processing elements, maintaining the overall integrity and functionality of the semiconductor package.
- a method of using a semiconductor package may involve establishing communication between the memory wafer and the application wafer through the use of through-silicon vias (TSVs). These TSVs are configured to form an interconnected bus system, facilitating the transfer of data and signals between the two wafers.
- a method of using a semiconductor package may involve configuring each module within the plurality of modules to receive configuration data corresponding to a spatial dataflow. This configuration data is designed to optimize the execution of an AI inference task.
- Fig. 1 is a block diagram of an integrated circuit that may be part of a semiconductor device such as a chiplet in accordance with an embodiment of the present disclosure
- Fig. 2 shows a perspective view of an assembly having the integrated circuit of Fig. 1 implemented on a semiconductor device that is electrically connected to another device to form the assembly in accordance with an embodiment of the present disclosure
- FIG. 3 shows a block diagram illustrating the memory address space of the integrated circuit of Fig. 1 in accordance with an embodiment of the present disclosure
- FIG. 4 shows a block diagram illustrating the memory address space with the signal interfaces of the integrated circuit of Fig. 1 in accordance with an embodiment of the present disclosure
- FIG. 5 shows an illustration of an integrated circuit that may be part of a semiconductor device such as a chiplet in accordance with an embodiment of the present disclosure
- Fig. 6 shows a perspective of an assembly having the integrated circuit of Fig.
- FIG. 7 shows a perspective view of an assembly having a semiconductor device with an array of processing elements and a second semiconductor device having an array of microvaults;
- Fig. 8 shows an assembly of semiconductor devices including several memory types in accordance with an embodiment of the present disclosure
- FIG. 9 shows an assembly of semiconductor devices including a semiconductor device with a system-on-chip and another semiconductor with microvaults disposed on top in accordance with an embodiment of the present disclosure
- Fig. 10 shows a semiconductor assembly incorporating a daisy-chained configuration of microvaults operatively connected to a multiplexer and managed by a counter for coordinated data selection and retrieval, in accordance with an embodiment of the present disclosure.
- FIG. 11 shows a semiconductor assembly incorporating a daisy-chained configuration of microvaults in multiple semiconductor devices that are operatively connected to multiplexers and managed by counters for coordinated data selection and retrieval, in accordance with an embodiment of the present disclosure
- Fig. 12 illustrates a three-dimensional (3D) memory column configured as a 3D-NOR or 3D-AND structure, featuring a series of ferroelectric field-effect transistors (FeFETs) with interconnected drain terminals linked to a common select line and individual gate terminals connected to respective read/write enable lines, all coupled to a common bit line, in accordance with an embodiment of the present disclosure;
- Fig. 13 depicts a three-dimensional (3D) memory column configured as a 3D-NAND structure, consisting of a vertical stack of ferroelectric field-effect transistors (FeFETs), in accordance with an embodiment of the present disclosure
- Fig. 14 depicts a three-dimensional (3D) memory column configured as a 3D-NAND with an integrated pass gate, in accordance with an embodiment of the present disclosure
- Fig. 15 illustrates a three-dimensional (3D) memory column 1500, which may be configured as either a 3D-NOR or a 3D-AND structure with independent Read/Write enable capabilities, in accordance with an embodiment of the present disclosure
- Fig. 16 shows a cross-sectional view of a 3D memory structure configured as a single-port 3D NAND, in accordance with an embodiment of the present disclosure
- Fig. 17 shows a cross-sectional view of a 3D memory structure that is a dual-port 3D NAND arrangement, in accordance with an embodiment of the present disclosure
- Fig. 18 illustrates a 3D memory structure that can be configured as a 3D NOR Vertical Transistor memory array, in accordance with an embodiment of the present disclosure
- Fig. 19 shows a planar FeFET in accordance with an embodiment of the present disclosure
- Fig. 20 shows electrical characteristics of an embodiment of a FeFET in accordance with an embodiment of the present disclosure.
- Fig. 21 shows a semiconductor package using a memory wafer and an application wafer in accordance with an embodiment of the present disclosure.
- Fig. 1 shows a block diagram of an integrated circuit 100 that may be packaged as a bondable chiplet (e.g., face-to-face chiplet bondable) in accordance with an embodiment of the present disclosure.
- the integrated circuit (IC) 100 includes a modules group 106 consisting of modules 108, 110, 112, and 114.
- the IC 100 also features a shared write port 102, configured to write to the modules group 106 using a write peripheral 104. Additionally, it includes read peripherals 116, 118, 120, and 122 and read ports 124, 126, 128, and 130, configured to read from the modules 108, 110, 112, 114.
- the write port 102 may be configured to provide a single write address space for all of the modules group 106 where each of the modules 108, 110, 112, 114 has a dedicated read port 124, 126, 128, 130 respectively.
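- One way to picture the single write address space alongside dedicated read ports is the behavioural sketch below. It assumes the shared write address is split evenly so that each module owns one contiguous portion; the `ModulesGroup` class, the module count default, and the 256-word module size are illustrative assumptions, not details from the disclosure.

```python
class ModulesGroup:
    """Behavioural model: one shared write port, one read port per module."""

    def __init__(self, num_modules=4, words_per_module=256):
        self.words_per_module = words_per_module
        self.modules = [[0] * words_per_module for _ in range(num_modules)]

    def write(self, address, value):
        # Shared write port: the flat address selects a module and an offset.
        module_index, offset = divmod(address, self.words_per_module)
        if not 0 <= module_index < len(self.modules):
            raise IndexError("address outside the shared write address space")
        self.modules[module_index][offset] = value

    def read(self, port, offset):
        # Dedicated read port: each port reaches exactly one module.
        return self.modules[port][offset]


group = ModulesGroup()
group.write(0x105, 42)  # lands in the second module (index 1) at offset 5
```

Reading back through port 1 at offset 5 returns the written value, while the same offset on any other port is unaffected.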
- the integrated circuit 100 may be packaged as part of a chiplet configured to be electrically connected to another integrated circuit device (e.g., another chiplet, or IC package, with or without electrical contacts, electrical bumps, etc.).
- the chiplet may be electrically connected to another device by, for example, bonding, soldering, wafer-to-wafer bonding, face-to-face chiplet bonding, chiplet-to-wafer bonding, or chiplet-to-interposer bonding, and/or may be connected with an interposer or other interfacing technology. Zero, one, or more interposers, or other interfacing technologies common to heterogeneous 3D system-in-package solutions, may be used to electrically connect a chiplet to another device.
- Each read port (124, 126, 128, 130) in the chiplet may feature electrical contacts on a side of the chiplet or on multiple sides of the chiplet.
- the read ports 124, 126, 128, 130 may use multi-cycle pipelined circuitry.
- the electrical contacts may line up in a manner that provides dedicated access to specific modules of the modules 108, 110, 112, 114.
- a processing/computing element may have exclusive access to module 108 via the read port 124, which may contain the neural network weights in a register file.
- a different processing/computing element may have exclusive read access to module 110 via the read port 126, which includes a different register file.
- this arrangement of the electrical contacts ensures that each computing/processing element has the dedicated access it needs to carry out its specific computation efficiently, thereby providing a compact, modular, and scalable system that allows different processing elements to maintain dedicated access to specific modules 108, 110, 112, 114. Without dedicated access, different processing elements might have to queue to use the same resource, which would slow overall processing.
- the proposed chiplet ensures that each processing element can operate at its maximum capability without interference from other computing elements in this specific embodiment.
- the write peripheral 104 is a peripheral circuitry responsible for processing and writing data into the memory cells found within the modules 108, 110, 112, 114.
- the write peripheral 104 may include dedicated contacts so that a chip electrically connected (e.g., bonded) to a chiplet of the integrated circuit can access the write port 102 via a shared write logic system. This shared write logic system may use a shift-register-based, different-voltage design (preferably a high-voltage design) with shared write address and data components.
- This shared write logic system is designed to be accessed via a bonded chip, another bonded chiplet, and/or via other circuitry in the same package as the integrated circuit 100.
- a shift register could allow the system to move data through a series of stages, with each subsequent stage receiving the data from the previous stage. By utilizing a shift register, the system can increase the data throughput while maintaining a low rate of data transfers.
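- The stage-by-stage movement of data described above can be illustrated with a minimal software model of a serial-in shift register. The `ShiftRegister` class, the `load_serial` helper, and the four-stage width are assumptions chosen for illustration; the disclosure does not specify the register's width or interface.

```python
class ShiftRegister:
    """Serial-in shift register: each clock moves every bit one stage onward."""

    def __init__(self, stages):
        self.stages = [0] * stages

    def clock(self, bit_in):
        # The last stage falls out; every other stage takes its predecessor's bit.
        bit_out = self.stages[-1]
        self.stages = [bit_in] + self.stages[:-1]
        return bit_out


def load_serial(register, bits):
    """Clock a sequence of bits into the register and return its contents."""
    for bit in bits:
        register.clock(bit)
    return list(register.stages)


contents = load_serial(ShiftRegister(4), [1, 0, 1, 1])
```

After four clocks the first bit shifted in sits in the last stage, so a wide write word can be assembled from a narrow serial input.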
- the shared write address space refers to the location where data is written in the chiplet.
- an interlock 132 may disable the read ports 124, 126, 128, 130 while data is being written to the modules group 106 via the write port 102. Likewise, the interlock 132 may disable the write port 102 when read operations are being carried out on the read ports 124, 126, 128, 130.
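- The mutual exclusion provided by the interlock 132 can be sketched as a small state holder that refuses a write while reads are active and refuses reads while a write is in progress. The class below is a behavioural illustration under that assumption; the method names are hypothetical, and the actual interlock would be a hardware circuit.

```python
class Interlock:
    """Allows many concurrent reads or one write, never both at once."""

    def __init__(self):
        self.writing = False
        self.active_reads = 0

    def begin_write(self):
        # The write port is disabled while any read is in flight.
        if self.writing or self.active_reads:
            return False
        self.writing = True
        return True

    def end_write(self):
        self.writing = False

    def begin_read(self):
        # The read ports are disabled while a write is in progress.
        if self.writing:
            return False
        self.active_reads += 1
        return True

    def end_read(self):
        self.active_reads = max(0, self.active_reads - 1)


lock = Interlock()
```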
- the written data can later be accessed concurrently by all processing elements that need to read the data via a respective one of the read ports 124, 126, 128, 130. This ensures that all processing elements have the most commonly used data available to them without regard to other reads being concurrently carried out by other processing elements.
- the write peripheral 104 circuit includes a write driver. This unit receives the data to be written and converts it into suitable signals that can change the state of the memory cells. Depending on the type of memory technology used, these signals could involve voltage levels, current pulses, or other types of energy.
- the shared write logic system may be high voltage due to the specific voltage requirements of the chiplet.
- the write driver must provide enough power to reliably change the state of the memory cells, but it must also operate within suitable parameters to avoid causing damage or unnecessary wear.
- the write peripheral 104 circuit may also feature a data buffer or write buffer. This component temporarily stores the data to be written, allowing the write operation to be performed at an optimal pace. By balancing the speed of incoming data with the speed at which the memory cells can be written, the write buffer helps prevent data loss and optimizes system performance.
- the write peripheral 104 may also include, in some embodiments, a write control unit that orchestrates the sequence of operations in the write process. It generates control signals to activate the write driver at the appropriate times, controls the flow of data from the write buffer, and coordinates the timing of the write operations. By synchronizing these various activities, the write control unit ensures efficient and reliable write operations.
- the write peripheral 104 may also include data encoding mechanisms to improve reliability and data integrity. For example, before the data is written to the memory cells, these mechanisms encode it in a way that allows potential errors to be detected, and in some cases, corrected when the data is later read. This can be helpful in systems where data integrity has a higher priority, such as in servers or scientific research devices.
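- As a minimal illustration of encoding data so that errors can be detected on read, the sketch below appends a single even-parity bit to each word. The disclosure does not name a specific code, so this choice is an assumption; a single parity bit only detects an odd number of flipped bits, and correction would need a stronger code that is not shown here.

```python
def encode(word: int) -> int:
    """Append an even-parity bit as the least significant bit."""
    parity = bin(word).count("1") & 1
    return (word << 1) | parity


def decode(stored: int):
    """Return (word, ok), where ok is False when the parity check fails."""
    word, parity = stored >> 1, stored & 1
    ok = (bin(word).count("1") & 1) == parity
    return word, ok


stored = encode(0b1011)
corrupted = stored ^ 0b100  # flip one stored bit to simulate a cell error
```

Decoding `stored` returns the original word with a passing check, while decoding `corrupted` reports a failed check.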
- the write peripheral 104 may also include a timing unit that serves as the system's heartbeat, supplying clock signals that synchronize the operation of the system's various components. In some systems, it may include components like oscillators, clock generators, or phase-locked loops.
- the timing unit may ensure that all operations occur at the suitable time relative to each other.
- the IC 100 may be implemented as a face-to-face bonded chiplet, with modules 108, 110, 112, and 114 formed from a non-volatile memory.
- the IC 100 may also feature a dynamic allocation circuitry to allocate memory blocks to the modules group 106 based on the usage of the modules group 106 (e.g., each module 108 may include dynamic allocation circuitry for dynamically allocating a range of read locations for a respective processing element).
- the IC 100 features a plurality of clocks, with each clock of the plurality of clocks feeding a respective module of the plurality of modules, providing each respective module with decoupled timing relative to the other modules of the plurality of modules.
- the modules group 106 may be arranged in any topology known to one of ordinary skill in the relevant art. Bit-cell density can be up to 10 times more dense than embedded SRAM cells in the modules group 106.
- the IC 100 may be formed on a chiplet that includes a first side and a second side, with the second side configured for bonding to a second semiconductor device.
- the IC 100 may include a high voltage write logic adjacent to the first side of the chiplet.
- a decoder circuitry, a driver circuitry, and a register circuitry may be formed on the silicon substrate portion of the chiplet, while the modules group 106 is formed on a second layer portion of the chiplet.
- the second semiconductor device may comprise a plurality of processing elements. Each processing element includes a respective interface to communicate with a respective module of the plurality of modules on the modules group 106 when the second semiconductor device is bonded to the chiplet.
- the silicon substrate traditionally serves as the initial stage of IC fabrication, focusing on the creation of active components, particularly transistors. Techniques like diffusion, ion implantation, oxidation, and material deposition are employed to fashion the intricate structures of transistors. These processes operate at small scales. The application of photolithography, etching, and implantation techniques enables the definition of transistor structures with precision.
- the silicon substrate’s significance lies in its ability to establish the fundamental building blocks necessary for signal processing, amplification, and control within the IC. This layer is sometimes called Front-End-Of-The-Line (“FEOL”).
- a second layer may be added that traditionally takes on the role of interconnect fabrication, facilitating the electrical connections between various IC components.
- the second layer processes typically differ from the processes used on the silicon substrate in terms of precision and scale.
- the interconnects are formed by depositing and patterning metal layers, typically aluminum or copper, to construct the wiring network.
- Dielectric layers such as silicon dioxide or low-k dielectrics, are introduced to insulate the interconnects and prevent signal interference between different wiring layers.
- the second layer’s traditional function is to establish the necessary interconnections that enable the routing and distribution of electrical signals throughout the IC.
- circuitry may be utilized within this second layer (sometimes referred to as Back-End-Of-The-Line (“BEOL”)).
- Alternate embodiments of the IC 100 may be implemented as a stacked die, as a monolithic design, or using through-silicon vias (TSVs).
- In a stacked die design, several dies may be stacked on top of each other, with each die performing different functions, such as memory and processing.
- the stacked die may communicate through wire bonds, microbumps, or bump-less bonds.
- In a monolithic design, the various functions and modules of the IC 100 may be integrated onto a single die, forming a more compact and power-efficient design.
- the IC 100 may include one or more interlocks 132 to prevent conflicts in reading and writing data.
- the modules group 106 may be formed from a variety of non-volatile or semi-volatile (e.g., very long refresh periods) memory technologies, such as Static Random-Access Memory (SRAM), Ferroelectric Field Effect Transistor (FeFET), Ferroelectric Random Access Memory (FeRAM), Resistive Random Access Memory (ReRAM), Spin-Orbit Torque (SOT) Memory, Spin Transfer Torque (STT) Memory, charge trap, floating gate memories, and/or Schottky diodes.
- the modules group 106 may utilize a Static Random-Access Memory (SRAM) Topology.
- the SRAM topology may employ a cross-coupled flip-flop structure (e.g., latching flip-flops), ensuring the stored data remains intact as long as power is supplied.
- the modules group 106 may utilize heterogeneous types of memory including volatile and non-volatile memory types.
- the modules group 106 may utilize a Flash Memory Topology.
- the Flash memory is a non-volatile memory technology used in applications where data persistence is needed, such as solid-state drives (SSDs) and USB flash drives.
- the flash memory topology disclosed herein features a matrix of memory cells, each consisting of a floating-gate transistor or charge trap device.
- the modules group 106 may also use wear-leveling techniques to prolong the lifespan of the memory cells.
- the modules group 106 may utilize a Ferroelectric Random-Access Memory (FeRAM) Topology.
- the FeRAM topology utilizes a ferroelectric material capable of retaining polarization states.
- One such memory topology may, in specific embodiments, utilize a FeFET to retain state information and program the ferroelectric material. These ferroelectric materials may be used to retain state information and act as a memory bit cell.
- the modules group 106 may utilize a Phase Change Memory (PCM) Topology, which is a non-volatile memory technology that utilizes reversible phase changes in materials to store data.
- the PCM topology may include any phase change material, for example a chalcogenide alloy or a chalcogenide glass housed within a memory cell.
- the modules group 106 may utilize a Resistive Random-Access Memory (ReRAM) Topology, which is a non-volatile memory technology based on resistive switching phenomena.
- the ReRAM topology may utilize a thin-film material that exhibits reversible changes in resistance upon the application of electrical stimuli.
- the modules group 106 may utilize a Spin-Orbit Torque (SOT) Magnetic Random-Access Memory Topology.
- SOT-MRAM is a type of non-volatile memory that utilizes spin-orbit torque to switch the magnetic state of a storage element.
- the SOT-MRAM topology may incorporate a magnetic tunnel junction (MTJ) structure and leverage the spin-orbit coupling effect to write and read data.
- the magnetic tunnel junction may have a dielectric layer between a magnetic fixed layer and a magnetic free layer. Writing may be done by switching magnetization of the free magnetic layer by injecting an in-plane current in an adjacent SOT layer. Reading may be done by putting current into the magnetic tunnel junction.
- the SOT-MRAM can optimize the spin-orbit materials by using current-driven switching schemes while minimizing write energy consumption, in some specific embodiments.
- the modules group 106 may utilize a Spin Transfer Torque (STT) Magnetic Random-Access Memory Topology.
- STT-MRAM is another type of non-volatile memory that relies on spin transfer torque to manipulate the magnetic state of a storage element.
- the STT-MRAM topology can use a magnetic tunnel junction (MTJ) structure, where the magnetization orientation determines the stored data. Additionally, the orientation of a magnetic layer in a magnetic tunnel junction or spin valve can be changed using a spin-polarized current, for example.
- the IC 100 may include a single write peripheral 104 with a dedicated clock, or each module 108, 110, 112, 114 may have its own dedicated write peripheral utilizing a shared clock (not shown in Fig. 1). Additionally, the modules group 106 may be organized into separate partitions, each with a dedicated read peripheral 116, 118, 120, 122 having an independent clock.
- IC 100 includes an interface (e.g., the same, different, higher or lower voltage) to enable data transfer external to the packaging of the IC 100.
- the IC 100 may also include an integrated microcontroller unit (MCU) or a digital signal processor (DSP) for processing data within the IC in yet additional specific embodiments.
- Fig. 2 shows a perspective view of an assembly 200 of the integrated circuit 212 of Fig. 1 implemented on a chiplet 230 that is bonded to a second device 226 in accordance with an embodiment of the present disclosure.
- the integrated circuit 212 is the circuitry within the chiplet 230.
- the second device 226 may be a chiplet, semiconductor wafer, semiconductor package, encased circuitry, etc.
- the second device 226 may be an AI accelerator such that each processing unit has read access to one module (or a predetermined set) of the modules group 236.
- the second device 226 may be a network controller where there is an offload circuit to read the data from each of the modules to process incoming/outgoing packets, etc.
- the assembly 200 includes a modules group 236 having a plurality of modules, including a first module 232 and a second module 234. Fig. 2 shows several modules, however, for clarity, only modules 232, 234 have reference numbers.
- the integrated circuit 212 further comprises a shared write port 222. The shared write port 222 interfaces into the write peripheral 202.
- the second device 226 may use the shared write port 222 via an address & data bus with a clock and an enable signal to write data to any module within the modules group 236; other ways of writing data may be considered.
- serial connections, parallel connections, various buses, or ports may be used, such as a DDR (Double Data Rate) Interface, an SRAM (Static Random-Access Memory) Interface, a NAND Flash Memory Interface, a NOR Flash Memory Interface, an HBM (High Bandwidth Memory) Interface, a GDDR (Graphics Double Data Rate) Interface, an NVMe (Non-Volatile Memory Express) Interface, SPI, I2C, etc.
- Each of the modules has a read port with a read address 218 (to send an address to a module 234) and read data 214 (which is the data read from the module 232).
- the modules group 236 is formed on a chiplet 230 having two sides including a surface 228 that can be bonded to and complement a second device 226.
- the chiplet 230 may be formed by forming circuitry on a silicon substrate 204 and then by adding a second layer 206. In other embodiments, these layers may be reversed and/or other layers may be added, removed, etc.
- the read address 218 and read data 220 are used for reading the module 232.
- the second device 226 may use an address & data bus with a clock and an enable signal to read data from the module 232; other ways of reading data may be considered.
- serial connections, parallel connections, various buses, or ports may be used, such as a DDR (Double Data Rate) Interface, an SRAM (Static Random-Access Memory) Interface, a NAND Flash Memory Interface, a NOR Flash Memory Interface, an HBM (High Bandwidth Memory) Interface, a GDDR (Graphics Double Data Rate) Interface, an NVMe (Non-Volatile Memory Express) Interface, SPI, I2C, etc.
- All of the read ports are configured to be inactive when a write operation is applied to the shared write port 222.
- the read ports may also be configured to process reads concurrently with each other.
- the shared write port 222 is configured to write to an address space, where the shared write port 222 is configured to write to the first module 232 via a first portion of the address space and write to the second module 234 via a second portion of the address space.
- Each module of the plurality of modules 236 includes an independent read port for concurrent reading via a respective independent read port of any of the plurality of modules.
- Each read port for a respective module may include contacts for circuitry found within the second device 226 to interface via metallic contacts.
- metallic contacts on the top layer 208 that are configured to interface with metallic contacts on the surface 228 of the chiplet 230 such that the metallic contacts allow for a read space that is coextensive with a read space of a module of the modules 236.
- the read spaces of the modules group 236 may all be coextensive with each other (as is described with reference to Figs. 3 and 4).
- the read peripheral for the first module 232 is implemented on a silicon substrate 204 (sometimes referred to as a Front-end-of-the-line).
- the second layer 206 (sometimes called the Back-end-of-the-line) may be built next in the manufacturing process on top of the silicon substrate 204 (and any circuitry) and may contain the respective memory bit cells.
- the read peripheral for the first module 232 is implemented in the second layer 206 and is disposed between the modules group 236 and the surface 228 of the chiplet 230.
- the modules group 236 may be configured to process write commands only during reset.
- the write commands may be “slow write” commands. That is, the modules group 236 may have very low write speeds relative to its read speed.
- the write logic may be frozen (or disabled) when the modules group 236 are used for reading data.
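- By way of illustration only, the gating behaviour described above (writes accepted only during reset, write logic frozen while the modules are read) can be sketched as a small behavioural model. The class name `GatedModulesGroup`, the `in_reset` flag, and the choice to treat the reset phase as the only write phase are assumptions made for this sketch and are not taken from the disclosure.

```python
class GatedModulesGroup:
    """Behavioural sketch: writes are honoured only while in reset."""

    def __init__(self, num_modules, module_size):
        self.cells = [[0] * module_size for _ in range(num_modules)]
        self.in_reset = True  # assumption: the group powers up in reset

    def release_reset(self):
        """Leave reset; from here on the write logic is frozen."""
        self.in_reset = False

    def write(self, module, offset, value):
        if not self.in_reset:
            raise PermissionError("write logic is frozen outside reset")
        self.cells[module][offset] = value

    def read(self, module, offset):
        # Assumption: read ports stay inactive while writes are possible.
        if self.in_reset:
            raise PermissionError("read ports are inactive during writes")
        return self.cells[module][offset]
```

In this sketch, data such as weights would be loaded once during reset and then only read.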
- the integrated circuit 212 provides functionality to allocate memory blocks to the modules group 236 based on the usage of the modules group 236. In other embodiments, the memory addresses are fixed along with the allocation.
- the integrated circuit 212 may be implemented as a face-to-face bonded chiplet 230. The face-to-face bonding may be bumpless wafer bonding.
- the modules group 236 can have a single write peripheral 202.
- each module of the modules group 236 may have a dedicated write peripheral that utilizes a shared clock.
- the modules group 236 may also be organized into separate partitions, with each partition having a dedicated read peripheral, where each dedicated read peripheral has an independent clock. The partitions may be one, two, or more modules of the modules group 236.
- the write operation is instigated by a write enable signal. When a write command is initiated, this signal propels the write drivers and decoders into the writing process.
- Data input latches may be used as temporary storage units, retaining the data set to be written into the memory until the write operation is implemented.
- a data bus with a transmission route can be used to facilitate the movement of data from the data input latches to the memory cells.
- a write operation to the modules group may be performed through a priority arbitration circuit that facilitates the modules to be accessed in a predetermined order, and the shared write port 222 may be configured to write to a virtual address space that is mapped onto a physical memory space.
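- As a non-limiting sketch of the priority arbitration described above, the following function orders pending writes so that modules are served in a predetermined order. The function name `arbitrate`, the request format, and the fixed-priority policy are assumptions for illustration; the disclosure does not specify the arbitration scheme.

```python
def arbitrate(requests, priority_order):
    """Return pending writes in the predetermined module order.

    requests: dict mapping a module index to a list of (offset, value) writes.
    priority_order: sequence of module indices, highest priority first.
    """
    granted = []
    for module in priority_order:
        # Modules with no pending request are simply skipped.
        for offset, value in requests.get(module, []):
            granted.append((module, offset, value))
    return granted
```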
- the integrated circuit 212 may include a high voltage write logic used within the write peripheral 202, and the second semiconductor device 226 may comprise a plurality of processing elements, whereby each processing element includes a respective interface to communicate with a respective module of the modules group 236. Furthermore, the chiplet 230 may include an interface to the shared write port 222 on the second side to thereby interface with a complementary interface on the second semiconductor device 226.
- the integrated circuit 212 may also include a power gating circuitry that selectively powers down a module of the modules 236 when not in use. Additionally, the integrated circuit 212 may have a write peripheral 202 of the modules group 236 connected to a dedicated I/O pad to enable data transfer external to the package of the integrated circuit.
- the integrated circuit 212 may utilize multiple modules of the modules group 236 grouped together. These modules may be synchronized with one another in specific embodiments. In some cases, all the modules are synchronized, while in other instances, only specific modules are to be synchronized. For instance, the circuit on a second device 226 may need to synchronize with a specific module when reading data from one of the modules in the modules group 236.
- the integrated circuit 212 may use various timing technologies.
- a plurality of clocks may feed each respective module of the modules group 236, thereby allowing each module to have decoupled timing relative to the other modules in the group. This decoupling ensures that any delay in one module will not affect the functioning of other modules.
- the clocks used may or may not need to be synchronized.
- a common clock can be used to synchronize the modules.
- the clock signal or signals may be provided by the second device 226.
- synchronization techniques can be used, such as phase comparison of the clock signals or a phase-locked loop (PLL) synchronization method.
- Another embodiment for synchronizing the modules in the IC could use delay-locked loop (DLL) synchronization. In this method, a delay element is added to the clock signal path, and the output is compared to the input clock signal. The feedback loop adjusts the delay element until the output of the DLL matches the input, resulting in synchronization of the clock signals.
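- The DLL feedback loop described above can be illustrated with a simple numerical sketch in which a delay value is nudged until the delayed clock edge lines up with an input clock edge. The function name `lock_dll`, the fixed step size, and the bang-bang (fixed-step) update rule are assumptions for this sketch; an actual DLL would use a phase detector and a voltage- or digitally-controlled delay line.

```python
def lock_dll(clock_period, initial_delay, step=0.01, max_iters=10_000):
    """Adjust a delay until the delayed clock aligns with the input clock."""
    delay = initial_delay
    for _ in range(max_iters):
        # Phase error: delay modulo one period, folded around zero.
        error = delay % clock_period
        if error > clock_period / 2:
            error -= clock_period  # closer to the next edge than the last
        if abs(error) < step:
            return delay  # locked: output edge matches an input edge
        # Negative feedback: move the delay toward the nearest edge.
        delay += -step if error > 0 else step
    raise RuntimeError("delay line did not lock")
```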
- the integrated circuit 212 could use a combination of different synchronization techniques to achieve synchronization between the modules. For example, some modules may use PLL synchronization while others use clock delay lines or DLL synchronization, depending on their specific requirements.
- the integrated circuit 212 can also use redundant synchronization techniques to ensure reliability and redundancy in case one method fails.
- the integrated circuit 212 could use both PLL synchronization and DLL synchronization simultaneously, so that if one method fails, the other can still maintain synchronization.
- Fig. 3 shows a block diagram 300 illustrating the memory address space of the integrated circuit of Fig. 1 in accordance with an embodiment of the present disclosure.
- the memory address space includes a write address space 316 and read data address spaces 310, 312, 314.
- the write address space 316 consists of various units where data, e.g., weights, and/or instructions can be stored. These units are referred to as memory addresses.
- the modules group 302 includes multiple memory modules 304, 306, 308.
- the write address space 316 may be distributed among the memory modules 304, 306, 308 such that the write address space 316 spans from 0 to N*M-1.
- the modules group 302 has N memory modules 304, 306, 308, where N is a positive integer, and each module has a memory size of M.
- the total number of unique write memory addresses in the write address space will be N*M, which can be referenced by an integer from 0 to N*M-1.
- memory addresses of the write address space 316 are ordered sequentially up to N*M-1.
- the first address is 0 and the final address is N*M-1, encompassing a total of N*M addresses.
- This ordering can be linear (each address increases by one) or some other specified pattern, depending on the implementation.
- the write memory addressing can be implemented in a variety of ways based on the system architecture.
- One method used in a specific embodiment is to use the base and limit registers.
- the base register holds the smallest legal physical write memory address, and the limit register specifies the size of the range. Therefore, to generate a physical address, the base is added to the relative address.
- a memory addressing scheme may be used where the base used is set to be 0. Yet additional write addressing techniques will be appreciated by one of ordinary skill in the relevant art.
- each memory module can possess a unique set of write memory addresses such that all memory addresses within the modules group 302 are unique with respect to writing data, e.g., the first module starting at 0 and the last one ending at N*M-1.
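- The base and limit register scheme described above can be sketched as follows. The function name `translate` and the decision to raise an error for an out-of-range address are assumptions for illustration only.

```python
def translate(relative_address, base, limit):
    """Map a relative write address onto a physical write address.

    base:  smallest legal physical write address.
    limit: size of the permitted range.
    """
    if not 0 <= relative_address < limit:
        raise ValueError("relative address outside the permitted range")
    return base + relative_address
```

With the base set to 0, as in the embodiment noted above, the relative and physical addresses coincide.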
- This allocation may be dependent on the memory management system of the device writing data to the modules 304, 306, 308, which could range from simple fixed partitioning schemes to more complex dynamic partitioning models.
- each module (304, 306, or 308) has an equal size of M addresses
- the first module 304 would possess write addresses 0 to M-1
- the second module would have write addresses M to 2*M-1
- the third module would have write addresses 2*M to 3*M-1
- the Nth module 308, therefore, would possess write addresses from (N-1)*M to N*M-1.
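- The equal-size partitioning listed above implies a simple decode from a flat write address to a module and an offset within that module. The sketch below assumes zero-based module indices (index 0 is the first module 304); the function name `decode_write_address` is hypothetical.

```python
def decode_write_address(address, num_modules, module_size):
    """Split a write address in 0 .. N*M-1 into (module index, offset).

    Module k (zero-based) owns write addresses k*M .. (k+1)*M - 1.
    """
    if not 0 <= address < num_modules * module_size:
        raise ValueError("write address outside 0 .. N*M-1")
    return divmod(address, module_size)
```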
- the modules group 302 has different read data address spaces 310, 312, 314. These read address spaces 310, 312, 314 may have overlapping addresses spaces, may have contiguous address spaces, or may have coextensive address spaces.
- the read address spaces 310, 312, 314 may be independent relative to each other.
- the system includes three independent read address spaces, labeled as read address spaces 310, 312, and 314. Each of these read address spaces is distinct from the others, meaning that reads can be performed in each space without affecting the others.
- the read address spaces 310, 312, 314 may be defined as contiguous blocks of memory addresses, each with its own starting address and ending address.
- each read address space 310, 312, 314 may have a range of addresses that corresponds to values from 0 to M-1, where M is a maximum value determined by the size of the modules 304, 306, 308 being used.
- by allowing one processing unit to interface with each read address space 310, 312, 314, the concurrent reads may be implemented as described herein.
- the independence of the read address spaces 310, 312, 314 ensures that each processing unit can access its desired data without causing any interference or conflict with other processing units.
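- The relationship between the single write address space 316 and the independent read address spaces 310, 312, 314 can be sketched as a behavioural model. The class name `ModulesGroup` and its method names are assumptions; the model only illustrates that one flat write space (0 to N*M-1) fans out to per-module read spaces that each run from 0 to M-1.

```python
class ModulesGroup:
    """N modules of M cells: one shared write space, N read spaces."""

    def __init__(self, num_modules, module_size):
        self.n = num_modules
        self.m = module_size
        self.modules = [[0] * module_size for _ in range(num_modules)]

    def write(self, address, value):
        """Shared write port: flat address in 0 .. N*M-1."""
        if not 0 <= address < self.n * self.m:
            raise IndexError("write address outside 0 .. N*M-1")
        module, offset = divmod(address, self.m)
        self.modules[module][offset] = value

    def read(self, module, address):
        """Independent read port: per-module address in 0 .. M-1."""
        if not 0 <= address < self.m:
            raise IndexError("read address outside 0 .. M-1")
        return self.modules[module][address]
```

In this sketch the same read address refers to different data in different modules, which is what allows each processing unit to read without conflicting with the others.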
- Fig. 4 shows a block diagram illustrating the memory address space with the signal interfaces of the integrated circuit of Fig. 1 in accordance with an embodiment of the present disclosure.
- the signals used in Fig. 4 may be used with any embodiment described herein. However, one of ordinary skill in the relevant art will appreciate that different signaling schemes may be used.
- the modules group 402 includes modules 404, 406, 408 that share a common write peripheral 411.
- the write peripheral 411 includes a write address bus that carries the address of the data being written, a write data bus that carries the data, and a write clock that causes the writes to occur (e.g., either on a leading or trailing edge of the clock signal, etc.). The writes only occur if the write enable signal indicates a write should occur.
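- The write signalling described above can be illustrated with a minimal clocked model in which a write is committed only on a clock edge while the write enable signal is asserted. The class name `WritePeripheral`, the `tick` method, and the choice of the rising (leading) edge are assumptions for this sketch.

```python
class WritePeripheral:
    """Sketch of an edge-triggered write gated by a write enable signal."""

    def __init__(self, size):
        self.memory = [0] * size
        self._prev_clock = 0

    def tick(self, clock, write_enable, address, data):
        """Sample the buses; commit on a rising edge with enable high."""
        rising_edge = self._prev_clock == 0 and clock == 1
        self._prev_clock = clock
        if rising_edge and write_enable:
            self.memory[address] = data
```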
- the write peripheral 411 may be on the chiplet 230 and in other embodiments, the write peripheral 411 is on the second device 226.
- the modules group 402 has modules 404, 406, 408 where each has a respective read peripheral 410, 412, 414.
- Each of the read peripherals 410, 412, 414 has a read address bus to send an address for reading, a read data bus to receive the data, a read clock which is the clock used to control the timing of the output of the digital data, and an output enable that is a precondition to outputting data. Any logic may be used, e.g., high voltage may correspond to 1 and a low voltage may correspond to 0, or vice versa. In yet additional embodiments, multibit or analog data storage may be used.
- one or more of the read peripherals 410, 412, 414 may be on the chiplet 230 and in other embodiments, one or more of the read peripherals 410, 412, 414 are on the second device 226.
- Fig. 5 shows an illustration of an integrated circuit 500 that may be part of a semiconductor device such as a chiplet in accordance with an embodiment of the present disclosure.
- the integrated circuit 500 may be disposed on a semiconductor device, such as a chiplet, that has a silicon substrate 506 and a second layer portion 508.
- the integrated circuit 500 may include a modules group having a plurality of modules including a first module and a second module, etc. even though only a single module 502 is shown.
- the memory bit cells 522 are written to by the shared write port 512, 516, which includes both a write address bus line 512 and a write data bus 516. These buses run through the second layer 508 and can be connected to a second semiconductor device via an interposer. The second device has electrical contacts that complement those on the surface 518, allowing it to be electrically coupled to the write address and data buses.
- the memory bit cells 522 can be read from via the read port 524, 526, which includes a read address bus line 524 and a read data bus 526. Both of these buses can also run through the second layer 508 to the second semiconductor device coupled to the surface 518, which also has complementary electrical contacts to allow it to be electrically coupled to the read address and data buses.
- Various kinds of memory technologies may be used for the memory bit cells 522, such as a vertical connectivity fabric structure formed from non-volatile memory unit cells arranged in a three-dimensional column array 522.
- the memory bit cells 522 may utilize one or more of a cross-point, 3D NANDs, 3D NORs, 3D ANDs, and/or a stacked planar layer.
- the integrated circuit 500 is electrically connected to a second semiconductor device (not shown in Fig. 5) comprising another integrated circuit, which may be a system-on-chip or a Field-Programmable-Gate-Array.
- the memory bit cells 522 may be formed from various non-volatile memory types, such as FeFET, FeRAM, ReRAM, SOT, or STT. Additionally, alternatively, or optionally, the memory bit cells may be formed from non-volatile memory unit cells having 2-terminal devices, 3-terminal devices, or 4-terminal devices.
- the memory unit bit cells 522 may be formed from ferroelectric materials, such as a ferroelectric tunnel junction, a diode, a capacitor, a single-gate transistor, or a dual-gate transistor.
- the memory unit bit cells 522 may be formed from memristive materials, such as at least one ReRAM, or magnetic materials, such as at least one spin-orbit-torque device or at least one spin-transfer-torque device.
- the non-volatile memory unit cells 522 may also be formed from phase-change materials or anti-ferroelectric materials.
- the non-volatile memory unit cells 522 can be formed from other types of materials, such as phase change materials, anti-ferroelectric materials, or multi-bit PCM materials.
- the non-volatile unit cells can be formed utilizing different structures, such as resistive random-access memory (RRAM) technology, magnetic random-access memory (MRAM) technology, or ferroelectric random-access memory (FRAM) technology.
- 3D NAND technology may be utilized to form the memory unit bit cells 522.
- the memory unit bit cells 522 may be formed from stacked memory layers where each layer includes a plurality of memory cells that can be accessed using shared bit lines.
- the read port 524, 526 may be coupled to the bit lines
- the write port 512, 516 may be coupled to the word lines that control the access to each layer.
- the 3D connectivity fabric structure can be built with stacked layers of either NAND gates, NOR gates, or AND gates, and in some cases, different types of logic gates may be combined to optimize the structure's functionality.
- the 3D connectivity fabric structure may be formed utilizing through-silicon-via (TSV) technology, which allows the vertical interconnection of the different layers of the structure.
- the non-volatile memory unit cells may include 2-terminal devices, such as a capacitive or a memristive device with or without an additional selector device such as a diode in series, 3-terminal devices, such as a floating-gate transistor, a transistor with an access gate, or 4-terminal devices, such as a transistor with two access gates.
- the type and configuration of the non-volatile memory unit cells 522 may depend on the specific application requirements, including the speed, power consumption, and reliability of the circuit.
- the memory unit cell may include or be a single ferroelectric transistor or 6T SRAM cell.
- the memory unit cell may be a combination of many different devices, including, but not limited to, one or more of a transistor, a memristor, a capacitor, etc.
- a ferroelectric material can be utilized to form the nonvolatile memory unit cells 522.
- the ferroelectric material may be implemented as any kind of device, including, but not limited to, a thin-film device, such as a ferroelectric tunnel junction, a capacitor, a single-gate transistor, or dual-gate transistors, etc.
- the non-volatile memory unit cells 522 may be formed from a memristive material, such as a Metal Oxide Memristor (MOM), Conductive-Bridging RAM (CBRAM), or valence change memory (VCM), each of which provides different benefits regarding power consumption, speed, endurance, etc.
- the non-volatile memory unit cells 522 may be formed from a magnetic material, such as spin-orbit-torque (SOT) devices, spin-transfer-torque (STT) devices, or perpendicular magnetic tunnel junctions (p-MTJ).
- the modules group may include many modules, each of which can be accessed through dedicated read ports 524, 526 with a dedicated read peripheral 520 while sharing the same write port 512, 516 and shared write peripheral 510.
- the shared write port 512, 516 can be configured to selectively write to one or more of the plurality of modules within the modules group including the memory bit cells 522.
- Each of the modules may have the same or different sizes, and different module sizes may be configured to optimize the utilization of the memory array with different operating scenarios, etc.
- the integrated circuit 500 may be formed utilizing different manufacturing processes and techniques, including but not limited to, a CMOS or Bipolar-CMOS-DMOS (BCD) process, a silicon-on-insulator (SOI) process, a FinFET process, a silicon germanium (SiGe) process, a gallium arsenide (GaAs) process, etc.
- a three-dimensional column array of memory bit cells 522 is configured as a microvault.
- each of the microvaults will have a dedicated read peripheral 520 and a write peripheral 510 that is also a dedicated write peripheral rather than a shared write peripheral.
- each microvault includes a dedicated write connection and a dedicated read connection; alternatively, a predetermined number of microvaults (e.g., 2 or 4) may have a dedicated write connection and a dedicated read connection, with or without dedicated respective peripheries, etc.
- the SOC 610 includes a silicon substrate 602 on which a plurality of processing elements is formed, including a processing element 606.
- the processing elements can communicate with each other through a Network-on-Chip (“NOC”) 604, which is a communication fabric that directs data transfer between the processing elements.
- the communication fabric can take various forms, including buses, switches, NOCs, etc.
- the NOC 604 in the SOC 610 directs data traffic between the various nodes (e.g., the processing element 606) and links, which provide the communication paths between the nodes.
- the plurality of processing elements, including the processing element 606, can be any suitable type of processor capable of executing instructions, including microprocessors, graphics processing units (GPUs), digital signal processors (DSPs), or application-specific integrated circuits (ASICs).
- the SOC 610 may comprise various modules, such as module 232, which are grouped together to provide memory functionality to the assembly 600 as described here.
- the modules in modules group 236 can each be coupled to a respective processing element to provide it with readable memory.
- the coupling between the module (e.g., module 232) and the processing element (e.g., 606) can be achieved through interconnects on the silicon substrate 602.
- a second layer 608 can be disposed on top of the substrate.
- the second layer 608 can be any suitable material, such as an insulating material, a metal, a dielectric, or interconnect layer, and it may be bonded to the chiplet 230.
- the bonding can be done using any suitable technique, including but not limited to, adhesives, soldering, or welding, etc.
- the assembly 600 provides a means of integrating the chiplet 230, which can include the integrated circuit of Fig. 1, with the SOC 610. Integrating the chiplet 230 provides various advantages, such as enhanced functionality, higher performance, and lower power consumption. Moreover, the integration of the chiplet 230 with the SOC 610 can be accomplished in various ways, depending on the particular application and design objectives of the system.
- the assembly 600 can incorporate various variations and modifications, depending on the specific requirements of the system.
- the processing elements formed on the silicon substrate 602 can vary in their number, type, and arrangement.
- the modules in modules group 236 can vary in their number, type, and function.
- the second layer 608 can be modified to include additional functionality.
- the second layer 608 can include passive components, such as resistors, capacitors, and inductors, or active components, such as transistors or diodes. Incorporating these components in the second layer 608 can further enhance the functionality and performance of the system.
- the assembly 600 can incorporate a heterogeneous integration approach, where the chiplet 230 is fabricated using a different technology than that used for the SOC 610. This approach allows for the optimal use of different fabrication technologies for different parts of the system, resulting in improved performance and reduced power consumption.
- FIG. 7 shows a perspective view of an assembly 700 having a semiconductor device 707 with an array of processing elements 706 (on a grid of processing elements 706a,a to 706n,n, where the first subscript is the column and the second subscript is the row), and a second semiconductor device 709 having an array of microvaults 708.
- the array of microvaults 708 is on a grid of microvaults 708a,a to 708n,n (where the first subscript is the column and the second subscript is the row). These subscripts may line up such that a respective subscript of a processing element 706 corresponds to a respective subscript of a microvault 708.
- each microvault 708 is a type of module described herein that is positioned in a vertical direction, e.g., above a respective processing element 706.
- the semiconductor device 707 may be a chiplet.
- the semiconductor device 709 may also be a chiplet.
- the chiplets 707,709 may be bonded together.
- the different layers 710 e.g., 710a through 710d can correspond
- the assembly 700 is the overarching structure that houses the various components shown in Fig. 7. It provides mechanical support and integration for the other elements, allowing them to function as a unified system.
- the assembly 700 includes two semiconductor devices - the semiconductor device 707 and the semiconductor device 709.
- the semiconductor device 707 contains an array of processing elements labeled 706a,a to 706n,n.
- the semiconductor device 709 contains an array of microvaults labeled 708a,a to 708n,n.
- the subscripts a,a to n,n indicate that the processing elements 706 and microvaults 708 are arranged in a grid pattern, with the first subscript referring to the column and the second subscript referring to the row.
- This grid arrangement allows each processing element 706 to have a corresponding microvault 708 positioned vertically above it.
- processing element 706a,a has microvault 708a,a above it
- processing element 706b,b has microvault 708b,b above it, and so on.
- the alignment of the grid allows tight integration between the processing and storage components.
- the semiconductor devices 707 and 709 are potentially separate chiplets that are integrated using packaging techniques into the unified assembly 700.
- the chiplet form factor allows greater flexibility and customization in assembling the system. This arrangement is such that each processing element 706 can access its respective microvault 708 located above it to retrieve relevant data, such as weights for neural networks or AI models. This may provide high bandwidth and low latency access to the data needed for efficient processing.
- Input data enters the system via the input DRAM memories 702. This data flows into the processing elements 706, where it is operated on locally using weights or parameters from the vertically integrated microvaults 708. The processing results are output via the output DRAM memories 704.
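- The data flow described above (input DRAM to processing elements, weights read from the overlying microvaults, results to output DRAM) can be sketched as follows. The function name `run_inference_step` is hypothetical, and the dot product stands in for whatever operation a processing element actually performs; the disclosure does not specify the computation.

```python
def run_inference_step(inputs, weights):
    """Pair each processing element's input with its microvault weights.

    inputs[i]:  vector delivered to processing element i from input DRAM.
    weights[i]: vector read from the microvault above processing element i.
    Returns the per-element results destined for the output DRAM.
    """
    outputs = []
    for x, w in zip(inputs, weights):
        # Assumed local operation: a dot product of input and weights.
        outputs.append(sum(a * b for a, b in zip(x, w)))
    return outputs
```

Each element reads only its own weights, so no element contends with another for weight data in this sketch.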
- the input DRAM memories 702 consist of multiple individual DRAM modules labeled 702a, 702b, and 702c.
- the DRAM memories 702 can be any type of dynamic random access memory, including but not limited to DDR SDRAM, LPDDR SDRAM, GDDR SDRAM, and HBM.
- the DRAM memories provide high-bandwidth data input capabilities to feed data, such as inference inputs or training data, into the processing pipeline.
- each individual DRAM module 702a, 702b, and 702c has a dedicated interface and data path to each processing element 706.
- DRAM module 702a may feed data only to processing element 706a,a
- DRAM module 702b feeds data only to processing element 706b,b. This provides modular scalability, as additional DRAM modules can be added to feed more processing elements.
- the number of input DRAM memories 702 and individual modules 702a-702c may vary depending on the application requirements. For instance, there could be 4, 8, 16, or more input DRAM modules.
- the capacity of each module can range from gigabytes to terabytes depending on factors such as access speed, power, and cost budget.
- High-speed interfaces like DDR5, GDDR6, or HBM3 may be used to maximize data transfer bandwidth between the input DRAM memories 702 and the processing elements 706 across the semiconductor device 707.
- Shared data buses, crossbar switches, or on-chip networks may interconnect groups of DRAM modules 702 and processing elements 706.
- the input DRAM modules 702 may be stacked or arranged in a multi-dimensional configuration to increase overall memory capacity and bandwidth while reducing latency and power consumption.
- Specialized memory controllers and schedulers may manage parallel data access across multiple input DRAM modules 702.
- the input DRAM memories 702 supply the high-bandwidth data needs of the parallel processing elements 706, enabling fast and efficient data-intensive computations such as neural network inferencing.
- Each processing element 706 can directly access the required input data from its dedicated DRAM module 702 without contending with other processors for data access.
- adjacent accelerator chiplets may be in communication with the semiconductor device 707. That is, there may be a grid-like arrangement of assemblies 700 in communication with each other to perform AI inference and/or AI training (e.g., transformer inference, CNN inference, ANN inference, etc.). In some embodiments, there may be clusters of semiconductor devices 707 that share a bank or portion of DRAM memories 702 and/or 704. In some embodiments, the input DRAM memories 702 and the output DRAM memories 704 may be combined into the same DRAM memory.
- the assembly 700 includes a semiconductor device 707 that comprises an array of processing elements labeled from 706a,a to 706n,n.
- Each processing element in the array may be configured to execute specialized computations and data processing operations.
- the processing elements could be optimized for artificial intelligence workloads like neural network inference.
- the processing elements may focus more on general-purpose capabilities.
- the capabilities of each processing element depend on its specific microarchitecture which can be tailored for certain applications if desired.
- the processing elements 706 can access nearby memory storage to retrieve data that feeds into their computations. This memory may be physically separate from the processing element arrays 706 as is the case with the microvaults 708 shown in Figure 7.
- the processing elements 706 and microvaults 708 are aligned so that each microvault is positioned directly above its corresponding processing element in a vertical configuration. This tight coupling provides fast data transfer speeds between each vault-element pair.
- the array of processing elements 706 resides within the semiconductor device 707.
- the semiconductor device 707 could potentially be manufactured as a standalone chiplet using advanced packaging techniques.
- This modular chiplet can then be integrated with other components like the microvault chiplet 709 through high-density interconnections.
- Some options include bumpless hybrid bonding, interposers, or even monolithic 3D integration.
- combining chiplets allows creating powerful heterogeneous systems with optimized dies.
- the number of processing elements 706 can vary between implementations of assembly 700. For instance, simpler systems may need only a 2x2 grid of elements whereas a sophisticated AI accelerator could feature a 32x32 array.
- the processing elements 706 themselves can also have different memory access routes across the assemblies. Point-to-point links, crossbar switches, or shared buses are possible connection structures. Such architectural decisions depend on the performance and area constraints to be met.
- the second semiconductor device 709 is a separate device from the first semiconductor device 707. Like the first semiconductor device 707, the second semiconductor device 709 may also be implemented as a chiplet.
- the second semiconductor device 709 includes an array of microvaults 708, arranged on a grid spanning from microvault 708a,a to 708n,n.
- the microvaults 708 on the second semiconductor device 709 are positioned vertically above the processing elements 706 on the first semiconductor device 707.
- Each microvault 708 lines up with and corresponds to the processing element 706 underneath it, based on the subscripts identifying their position in the grid.
- microvault 708a,a is vertically aligned with and corresponds to processing element 706a,a. This allows each processing element 706 to access the microvault 708 above it.
- the microvaults 708 act as a memory structure, storing data such as AI model weights that can be accessed by the processing elements 706 underneath during operations like neural network inference.
- the microvaults 708 may be optimized for very fast read times but slower write times. This allows the processing elements 706 to rapidly access the weights and data needed for their computations, while less frequently updated data can still be written at a slower pace.
- the second semiconductor device 709 containing the array of microvaults 708 is directly bonded to the first semiconductor device 707 with the processing elements 706. This bonding aligns each microvault 708 with its corresponding processing element 706 underneath. Electrically conductive interconnects between the devices allow each processing element 706 to communicate directly upwards with its respective overlying microvault 708.
- the microvaults 708 may contain multiple memory layers, labeled 710a to 710d, each storing weights or data for a different AI model. For example, layer 710a contains the weights for model A, layer 710b contains the weights for model B, and so on. Stacking these layers vertically contributes to the high density and fast access times of the microvault design.
- the microvaults 708 can be implemented using various memory technologies, including but not limited to SRAM, FeFET, ReRAM, SOT, and STT, optimized for fast readout times to supply data to the processing elements 706 with minimal latency. Specific embodiments may configure the microvaults 708 to have much faster read speeds compared to their write speeds. The microvaults 708 may be implemented using any FeFET or memory structure described herein.
- each microvault 708 may have a capacity between 4 kilobytes and 128 kilobytes for storing parameters for machine learning models or other data.
- the bit density per layer may exceed 0.4 gigabits per square millimeter.
- the compact size of the microvaults 708, less than 100 micrometers on each side in one embodiment or 12 micrometers by 12 micrometers in another, allows high-density integration of the memory modules.
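- As a rough worked example of the figures above, the following sketch estimates the capacity of one microvault from the stated bit density and footprint. It assumes decimal gigabits, the 0.4 gigabit per square millimeter lower bound, the 12 micrometer by 12 micrometer footprint, and four layers (710a to 710d). The function name is illustrative and peripheral-circuit overhead is ignored.

```python
def microvault_capacity_bytes(side_um: float, density_gbit_per_mm2: float, layers: int) -> float:
    """Estimate the capacity of a square microvault, ignoring peripheral overhead."""
    side_mm = side_um / 1000.0             # micrometers -> millimeters
    area_mm2 = side_mm * side_mm           # footprint of one layer
    bits_per_layer = density_gbit_per_mm2 * 1e9 * area_mm2
    return bits_per_layer * layers / 8.0   # bits -> bytes


# Assumed inputs: 12 um side, 0.4 Gbit/mm^2 per layer, 4 stacked layers.
capacity = microvault_capacity_bytes(side_um=12, density_gbit_per_mm2=0.4, layers=4)
print(f"{capacity / 1000:.1f} kB")
```

- Under these assumptions the estimate is about 28.8 kB, which falls inside the 4 kilobyte to 128 kilobyte range given above.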
- the array-based arrangement of the microvaults 708 may enable concurrent parallel data access by the processing elements 706, supporting high-throughput data processing by the assembly 700.
- the one-to-one alignment of the microvaults 708 and processing elements 706 also ensures that each processing element 706 has dedicated access to its required data without contention.
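- The one-to-one pairing can be pictured as a shared grid index: the processing element at a given row and column only ever reads the microvault at the same row and column. The sketch below is a minimal software model of that idea. The class names and the word-addressed storage are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Microvault:
    """Hypothetical stand-in for one microvault 708 holding addressable words."""
    words: dict[int, int] = field(default_factory=dict)

    def read(self, address: int) -> int:
        # Unwritten addresses read as 0 in this sketch (an assumption).
        return self.words.get(address, 0)


class Assembly:
    """n x n grid in which processing element (r, c) is paired with vault (r, c)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.vaults = [[Microvault() for _ in range(n)] for _ in range(n)]

    def vault_for(self, row: int, col: int) -> Microvault:
        # Each element touches only the vault stacked directly above it,
        # so no two elements ever contend for the same vault.
        if not (0 <= row < self.n and 0 <= col < self.n):
            raise IndexError("processing element outside the grid")
        return self.vaults[row][col]


assembly = Assembly(n=2)
assembly.vault_for(0, 1).words[0x10] = 42
value_seen_by_pe_01 = assembly.vault_for(0, 1).read(0x10)
value_seen_by_pe_00 = assembly.vault_for(0, 0).read(0x10)
```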
- the microvaults 708 share the semiconductor device interface provided by the second semiconductor device 709, which facilitates writing data to the microvaults 708 from the input DRAM memories 702. Reading data from the microvaults 708 to the processing elements 706 and output DRAM memories 704 is handled through dedicated pathways between each vertically aligned microvault 708 and processing element pair.
- microvault memory layers 710 refer to multiple layers of microvault memories stacked vertically within the second semiconductor device 709. As illustrated in Fig. 7, there are four separate microvault memory layers labeled as 710a, 710b, 710c, and 710d. Each layer contains an array of microvaults, such as the array of microvaults 708 shown in the diagram.
- the microvaults 708 may utilize a stacked 3D NAND architecture built from multiple layers of NAND memory arrays using charge trap flash technology. Each microvault 708 may contain a dedicated set of wordline drivers on the bottom layer to facilitate access to the 3D NAND cell arrays above, spaced by alternating dielectric layers.
- the 3D NAND implementation may be used to maximize density and throughput by leveraging vertical scaling.
- the microvaults 708 employ a 3D NOR architecture constructed from multiple tiers of NOR flash memory arrays. Each plane features NOR strings with a source line and bit line architecture, stacked on top of each other using vias. The 3D NOR arrangement optimizes random read access times to stored data.
- the microvaults 708 may also adopt a hybrid configuration with different types of volatile and/or non-volatile memory, such as combining FeRAM and ReRAM cells, organized into vertical sub-arrays. This heterogeneous 3D integration allows optimizing for speed, endurance, and retention within the same vault structure.
- the microvaults 708 integrate processing logic like analog computing directly into the memory array stack itself. This processing-in-memory approach places basic computational operators within the memory peripheral or bit cells, enabling highly parallel and efficient in-situ data processing.
- Some implementations may utilize 2.5D or 3D stacking to integrate the microvaults 708 with other components like logic, CPUs, GPUs or application-specific accelerators.
- This tight packaging integration via techniques like high-bandwidth memory cube architectures reduces data transfer latency and power consumption.
- the microvaults 708 may also employ a virtualized architecture, with an external memory controller handling translation between the physical array organization and dynamically allocated virtual memory domains. These virtual domains, mapped onto the physical array, effectively create separate virtual vaults with flexible capacities tailored to application needs.
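- One way to picture the translation step is a controller that assigns whole microvaults to named virtual domains and maps a byte offset within a domain to a physical vault index and local offset. The sketch below assumes a first-fit allocation over a free list and a fixed vault size. The class, its methods, and the domain names are hypothetical.

```python
class VaultController:
    """Hypothetical controller mapping virtual domains onto physical microvaults."""

    def __init__(self, num_vaults: int, vault_bytes: int) -> None:
        self.vault_bytes = vault_bytes
        self.free = list(range(num_vaults))    # physical vault indices not yet assigned
        self.domains: dict[str, list[int]] = {}

    def allocate(self, name: str, size_bytes: int) -> None:
        # Round the request up to whole vaults (ceiling division).
        needed = -(-size_bytes // self.vault_bytes)
        if needed > len(self.free):
            raise MemoryError("not enough free microvaults")
        self.domains[name] = [self.free.pop(0) for _ in range(needed)]

    def translate(self, name: str, offset: int) -> tuple[int, int]:
        """Map (domain, byte offset) to (physical vault index, offset within vault)."""
        vaults = self.domains[name]
        index, local = divmod(offset, self.vault_bytes)
        if offset < 0 or index >= len(vaults):
            raise IndexError("offset outside the virtual domain")
        return vaults[index], local


ctrl = VaultController(num_vaults=8, vault_bytes=4096)
ctrl.allocate("model_a", 10_000)   # rounds up to 3 vaults of 4096 bytes
ctrl.allocate("model_b", 4_096)    # exactly 1 vault
```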
- the microvaults 708 are designed as Computational RAM (CRAM) with integrated processing capabilities within the bit cell peripheral to enable highly parallel in-memory computing architectures. Gateless transistor structures integrated into the CRAM arrays facilitate efficient execution of bulk bitwise operations.
- the microvaults 708 may be organized into modular Memory Processing Unit (MPU) structures containing dedicated processing logic tailored for workloads like AI inferencing.
- the MPU architecture couples vault arrays to vector processors via high-speed interfaces like HBM2, enabling low-latency data transfers.
- the microvaults 708 may also implement content-addressable capabilities by integrating comparison logic into the memory peripheral. This facilitates searching or accessing data based on content rather than explicit addresses, enabling powerful pattern matching capabilities.
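- A content-addressable lookup can be sketched as comparing a search key against every stored word and returning the addresses that match. In hardware the comparisons would run in parallel in the memory peripheral; the loop below only models the result. The `care_mask` parameter, which lets some bits be ignored for pattern matching, is an assumption added for illustration.

```python
def cam_search(stored_words: list[int], key: int, care_mask: int = ~0) -> list[int]:
    """Return addresses whose stored word matches `key` on the bits set in `care_mask`.

    Hardware would compare all rows at once; this loop models the result only.
    """
    return [
        address
        for address, word in enumerate(stored_words)
        if ((word ^ key) & care_mask) == 0
    ]


memory = [0b1010, 0b0110, 0b1011, 0b1010]
exact_hits = cam_search(memory, 0b1010)
prefix_hits = cam_search(memory, 0b1010, care_mask=0b1110)  # ignore the lowest bit
```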
- Certain embodiments may stack multiple microvault dies on top of base logic dies such as GPUs or AI accelerators. This creates dense, high-bandwidth heterogeneous systems optimized for data-centric workloads while minimizing data movement.
- microvault memory layers 710 may be fabricated utilizing three-dimensional integrated circuit manufacturing processes to stack multiple dies or wafers containing microvault 708 arrays on top of each other.
- Through-silicon vias (TSVs) or other vertical interconnect technologies can be employed to enable communication between the layers.
- each microvault memory layer 710 corresponds to a different artificial intelligence (AI) model or application.
- layer 710a could store the weights and parameters for AI model A.
- layer 710b could store the weights and parameters for AI model B, and so on. This allows multiple AI models to be stored efficiently within the same microvault memory 708 structure.
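- The layer-per-model arrangement can be sketched as a lookup keyed by layer label, from which a processing element fetches the weights of whichever model it is running. The weight values, the dictionary layout, and the toy dot-product step below are placeholders invented for illustration.

```python
# Hypothetical contents: each layer of one microvault holds one model's weights.
microvault_layers = {
    "710a": {"model": "A", "weights": [0.12, -0.53, 0.98]},
    "710b": {"model": "B", "weights": [0.40, 0.07, -0.31]},
}


def weights_for_model(layers: dict, model: str) -> list[float]:
    """Return the weights stored in whichever layer is assigned to `model`."""
    for layer in layers.values():
        if layer["model"] == model:
            return layer["weights"]
    raise KeyError(f"no layer stores model {model!r}")


def dot(weights: list[float], activations: list[float]) -> float:
    """Toy inference step a processing element might run on the fetched weights."""
    return sum(w * a for w, a in zip(weights, activations))


result_b = dot(weights_for_model(microvault_layers, "B"), [1.0, 2.0, 3.0])
```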
- the microvaults 708 may possess capacities ranging from 4 kilobytes to 128 kilobytes in some embodiments. In other cases, the capacity could be between 4 kilobytes and 16 kilobytes. Each microvault could have lateral dimensions less than 100 micrometers by less than 100 micrometers, while extending vertically to incorporate potentially over 200 memory cell layers in some implementations. In some embodiments, the bit density per square millimeter per layer within the microvault memory layers 710 may facilitate high-capacity storage with a small footprint. The layers may utilize non-volatile memory technologies, such as FeFET, STT-MRAM, or ReRAM, to retain data when power is removed.
- the processing elements 706 may access weights or parameters from the microvault memory layers 710 to perform neural network inferencing or other machine learning computations.
- the inference results of the various AIs may be sent to the output DRAM memories 704, which comprise individual DRAM memory modules labeled 704a, 704b, and 704c.
- the output DRAM memories 704 are positioned adjacent to the array of microvaults 708 and the second semiconductor device 709.
- the output DRAM memories 704 may serve as temporary data storage that can buffer output data retrieved from the microvaults 708 before it is transmitted externally.
- Each DRAM memory module 704a, 704b, and 704c may have similar or different storage capacities, depending on the design requirements. For example, in one embodiment, each module contains 16 megabits of storage.
- the DRAM storage cells utilize a capacitor to retain data bits in the form of electrical charges. Due to charge leakage, the DRAM memories require periodic refresh cycles to maintain the stored data integrity.
- each DRAM module 704a, 704b, and 704c can have dedicated internal control circuitry and I/O ports.
- the data outputs from the individual microvaults 708 may get aggregated and buffered in the output DRAM memories 704 before being transmitted to external components via peripheral circuitry. Buffering the data allows the transmission rate to be regulated to match the requirements of the external interfaces. It also enables data processing operations like formatting, encoding, or encryption to be performed by the second semiconductor device 709 prior to output.
- the output DRAM modules 704a, 704b, and 704c are designed to provide high-density, low-cost temporary data storage to support the high-bandwidth parallel reads from the array of microvaults 708. Optimizing these performance parameters allows efficient extraction of data from the microvaults to feed the computational workflows hosted on external chips or devices. Specific implementations may utilize various types of DRAM, including asynchronous DRAM, synchronous DRAM, graphics DRAM, and low-power DRAM tailored to the application. Overall, the output DRAM memories 704 facilitate seamless data movement from the integrated microvaults to external execution pipelines.
- FIG. 8 shows an assembly 800 of semiconductor devices 802, 804, 806, 808, 810, 812, 814 including several memory types in accordance with an embodiment of the present disclosure.
- Fig. 8 shows an exemplary assembly 800 of semiconductor devices 802, 804, 806, 808, 810, 812, 814 configured to provide a hierarchical memory structure.
- the assembly 800 is modular and scalable, allowing for various combinations and numbers of semiconductor devices, which may be implemented as chiplets in certain embodiments, to be stacked to meet specific performance and density requirements.
- the assembly 800 includes an application semiconductor device 802, which may include a plurality of processing elements as described herein.
- On top of the semiconductor device 802 is semiconductor device 814, which includes an array of microvaults.
- On top of the semiconductor device 814 is semiconductor device 812, which may also include an array of microvaults.
- On top of the semiconductor device 812 are semiconductor devices 810, 808, which may be SRAM vaulted dies.
- semiconductor devices 806, 804 may be DRAM vaulted dies.
- These vaults 816 may be arranged in a grid-like fashion, indexed by subscripts running from 816a,a to 816n,n.
- Each of these vaults may include respective microvaults from semiconductor devices 814, 812, respective SRAM vaults from semiconductor devices 810, 808, and respective DRAM vaults from semiconductor devices 806, 804.
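- The composition of a single vault 816 can be sketched as one slice taken from the same grid position on every memory die. The sketch follows the stack order described for Fig. 8 (microvault dies 814 and 812, SRAM dies 810 and 808, DRAM dies 806 and 804). The `VaultSlice` type and `build_vault` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Memory dies from bottom to top, using the reference numerals of Fig. 8.
STACK = [
    (814, "microvault"),
    (812, "microvault"),
    (810, "SRAM"),
    (808, "SRAM"),
    (806, "DRAM"),
    (804, "DRAM"),
]


@dataclass(frozen=True)
class VaultSlice:
    die: int     # semiconductor device holding this slice
    kind: str    # memory type of the slice
    row: int     # grid row shared by every slice of the vault
    col: int     # grid column shared by every slice of the vault


def build_vault(row: int, col: int) -> list[VaultSlice]:
    """Collect the slice at grid position (row, col) from every memory die."""
    return [VaultSlice(die, kind, row, col) for die, kind in STACK]


vault = build_vault(0, 0)
kinds = [s.kind for s in vault]
```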
- the semiconductor device 802 comprises a plurality of processing elements. These processing elements execute computational tasks and facilitate data flow within the system.
- semiconductor device 814 includes an array of microvaults.
- These microvaults utilize Ferroelectric Field-Effect Transistors (FeFETs) known for their non-volatile characteristics and suitability for high-density memory applications.
- the FeFET-based microvaults may be designed to enable high-speed read operations essential for rapid data retrieval during processing tasks, such as AI inferencing, while supporting slower write operations that are more tolerant to latency.
- Stacked on top of semiconductor device 814 is semiconductor device 812, which similarly includes an array of microvaults.
- the presence of multiple layers of microvaults in semiconductor devices 814 and 812 exemplifies the scalable nature of the assembly, where additional memory capacities and functionalities can be integrated through additional layers.
- semiconductor devices 810 and 808, positioned above semiconductor device 812, are depicted as SRAM vaulted dies.
- SRAM provides fast access memory that can serve as a cache or buffer to the slower, but denser, FeFET microvault memory layers beneath.
- semiconductor devices 806 and 804 are depicted as DRAM vaulted dies.
- DRAM is typically used for main memory because it offers higher density and lower cost per bit than SRAM while remaining relatively fast, offering a balance between performance and economy.
- This vertical stacking and alignment ensure that data and control signals can be directly routed between processing elements and their respective memory stacks, facilitated by interconnect technologies such as through-silicon vias (TSVs) and micro-bumps, which are used in the 3D integrated circuit architecture of the assembly 800.
- the modular and scalable design of assembly 800 allows for various combinations of semiconductor devices or chiplets to be integrated into more extensive systems.
- the flexibility in the number and combination of stacks provides the adaptability to tailor the assembly to the requirements of different applications and performance demands.
- Each vault among the vaults 816 of the assembly 800 presents a multi-die structure that contributes to the overall capacity and performance of the system.
- Fig. 9 shows an assembly of semiconductor devices including a semiconductor device with a system-on-chip 914 and another semiconductor device 906 with microvaults disposed on top in accordance with an embodiment of the present disclosure.
- the assembly 900 integrates a semiconductor device 906 and a semiconductor device 914, which may be implemented as separate chiplets bonded together.
- the semiconductor device 914 includes various components to facilitate reading data from the microvaults on the semiconductor device 906, such as a read address register input interconnect 924, read data register 922, and read data register output interconnect 950. These components pass the read address to the microvaults on semiconductor device 906 and return the read data back to the semiconductor device 914.
- the read address enters via interconnect 924 into the read address register 926.
- the output of this register 926 connects through interconnects and bumpless bonds to another read address register 938 on the semiconductor device 906, which then addresses the target microvault 936.
- the microvault 936 outputs read data via interconnect 940 to a read data register 942, which passes the data back through bumpless bonds 910, 918 to read data register 922 on semiconductor device 914. This data can then be accessed externally via the read data register output interconnect 950.
- the semiconductor devices 914 and 906 have interconnected Through-Silicon Vias 916 and 944 to allow communication with devices potentially stacked above semiconductor device 906.
- the semiconductor device 906 features various memory structures to provide data storage capabilities.
- the microvault 936 resides on the BEOL portion of the chiplet, allowing dense 3D integration of memory layers.
- the microvault utilizes non-volatile memory technologies like FeFET or STT-MRAM for data retention without power.
- the semiconductor device 914 comprises processing elements and data routing circuitry to retrieve and manipulate data stored in semiconductor device 906. Components like read address register 926 and read data register 922 handle sending read addresses and receiving data from the microvault 936 respectively.
- the device 914 also includes interconnects 924, 950 and Through-Silicon Via 916 to communicate externally.
- the two devices 906 and 914 integrate via fine-pitch interconnects like bumpless hybrid bonds 908, 910, 918, 920, 930 and 932. This allows direct data transfer pathways between processing components in device 914 and memory structures in device 906. Alignment during bonding ensures dedicated access; for instance, read data register output interconnect 950 on device 914 links directly to read data register 922 to receive requested data.
- a read address enters through interconnect 924 into read address register 926 on device 914. This gets communicated via interconnects and bumpless bonds to read address register 938 on device 906, which then addresses microvault 936. Requested data gets passed via interconnect 940 to read data register 942, then transfers through bonds back to read data register 922 on 914, where it becomes available externally via interconnect 950.
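- The read sequence above can be sketched as a value passing through the named registers in order. The model below keeps the reference numerals in attribute names so each step can be traced back to the text. The class names are hypothetical, clocking and bond-level signaling are not modeled, and unwritten locations are assumed to read as 0.

```python
class Device906:
    """Memory die: read address register 938, microvault 936, read data register 942."""

    def __init__(self, contents: dict[int, int]) -> None:
        self.microvault_936 = contents
        self.read_address_register_938 = None
        self.read_data_register_942 = None

    def read(self, address: int) -> int:
        self.read_address_register_938 = address
        # Unwritten locations read as 0 in this sketch (an assumption).
        self.read_data_register_942 = self.microvault_936.get(address, 0)
        return self.read_data_register_942


class Device914:
    """Logic die: read address register 926 and read data register 922."""

    def __init__(self, memory_die: Device906) -> None:
        self.memory_die = memory_die
        self.read_address_register_926 = None
        self.read_data_register_922 = None

    def read(self, address: int) -> int:
        self.read_address_register_926 = address   # arrives via interconnect 924
        data = self.memory_die.read(address)       # address crosses the bumpless bonds
        self.read_data_register_922 = data         # data returns across the bonds
        return self.read_data_register_922         # exposed on interconnect 950


soc = Device914(Device906({0x04: 0xBEEF}))
word = soc.read(0x04)
```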
- the assembly 900 exemplifies a modular, high-density architecture optimized for data-centric applications like AI inferencing. Tight integration of processing and storage dies via advanced packaging techniques allows localized data access with minimal latency and power. Scalability is also enabled by incorporating multiple chiplets, in this case devices 906 and 914.
- the assembly 900 illustrates a potential configuration suited for space-constrained, high-performance computing systems.
- the Through-Silicon Via (TSV) 916 is an electrical connection that passes vertically through the semiconductor device 914. Its purpose is to provide a pathway for signals to travel between the top of the semiconductor device 914 and a processing element within the semiconductor device 914. This allows the device to be stacked and interconnected with other components in a vertical configuration.
- the TSV 916 along with other TSVs on the device, facilitates high-density 3D integration and heterogeneous stacking of multiple devices like chiplets.
- the TSV 916 interacts with several other components within the system. On the top side of semiconductor device 914, it connects to interconnect 912, which couples it to bumpless bonds 918. These bonds interface with complementary bumpless bonds 910 on the bottom side of semiconductor device 906 when the two devices are stacked. This allows signals to travel from device 914 to device 906 through the TSV 916. The route continues as signals go through interconnect 902 to TSV 944 on device 906. TSV 944 provides a vertical signal pathway to the top surface of device 906 where additional devices could be stacked. In the reverse direction, signals can travel from TSV 944 down through device 906, back up TSV 916, and down into device 914. So the TSV 916 provides bidirectional vertical communication across device boundaries.
- There are a few possible variations for the TSV 916 implementation.
- the dimensions and materials of the TSV could be optimized; for example, smaller TSV diameters using denser materials like tungsten could be advantageous.
- the interface circuitry driving signals into the TSV, like interconnects 912 and 902, could employ variable line drivers to support different voltage levels or signal integrity enhancements. Further embodiments may include integrated monitoring circuitry within TSV 916 to track metrics like temperature and link utilization. Alternative signaling schemes besides electrical signals could also be employed in future cases. For instance, integrated silicon photonics utilizing modulated light to convey data through the TSVs could enable very high bandwidth and low latency connectivity.
- Overall, many variations are possible for TSV-based vertical links like TSV 916 within these complex 3D integrated architectures.
- there are several variations and alternatives for the interconnect 912 implementation.
- different conductive materials such as copper or aluminum may be utilized to fabricate the pathways forming interconnect 912 and optimize for conductivity or thermal dissipation.
- interconnect 912 may feature redundant signal paths or self-repair capabilities using spare interconnect lines to improve reliability and resilience.
- the bumpless bonds 918 and 910 connecting devices 906 and 914 could also be replaced with other high-density bonding approaches like hybrid bonding or Through-Silicon Vias.
- alternate signaling schemes besides simple digital logic could be employed on interconnect 912, such as analog signaling or multi-level digital waveforms to enhance data transmission capabilities.
- the routing and dimensions of interconnect 912 can also be adapted according to bandwidth requirements or circuit layout considerations. Overall, many structural and functional alternatives exist for crafting interconnect 912 to meet application needs.
- the bumpless bonds 918 are electrical connections located on the semiconductor device 914 between an interconnect 912 and bumpless bonds 910 of the semiconductor device 906.
- the bumpless bonds 918 provide an electrical pathway for signals to travel between the semiconductor device 914 and any additional semiconductor devices, such as the semiconductor device 906, stacked on top of the assembly 900.
- the signals communicated over the bumpless bonds 918 can include data signals, control signals, address signals, or any other signals needed to coordinate operations between the multiple semiconductor devices.
- the number of individual bond sites can range from just a few to hundreds, depending on signal bandwidth requirements.
- the bonding method can utilize techniques like direct bonding, plasma-activated bonding, adhesive bonding, or compression bonding. Hybrid bonding approaches are also possible, combining direct wafer bonds with intermediate metal bonds.
- the size and pitch of each bond site can vary and may use pitches under 10 micrometers to enable high-density connections. Redundant bonds can provide backup pathways. Shielding structures may surround bonds for noise immunity. Overall, many embodiments of bumpless bonds 918 are possible to meet cost, reliability, and performance needs.
- the bumpless bonds 910 provide an interface for communicating signals between the semiconductor device 906 and semiconductor device 914. Specifically, the bumpless bonds 910 of the semiconductor device 906 are electrically coupled to the complementary bumpless bonds 918 of the semiconductor device 914. This allows signals like read/write data and addresses to be transmitted between the two devices. The bumpless nature of the bonds allows for a low-profile, high-density interconnection.
- the bumpless bonds 910 interact with other components in the system to facilitate data transfer operations.
- data enters the semiconductor device 914 via the Through-Silicon Via 916, passes through interconnect 912 and bumpless bonds 918 before reaching bumpless bonds 910 of device 906.
- addresses flow from the read address register 926 of device 914 through interconnects 928, 930 and bumpless bonds 932 into the read address register 938 on device 906. Read data then returns through bumpless bonds 908 and 920 back to device 914. So the bumpless bonds 910 provide key data and address routing between the devices.
- Possible variations of the bumpless bonds 910 include using different bond densities, materials, or electrical contact configurations to optimize performance.
- the bonds can use alloying or doping techniques to improve conductivity.
- the routing of signals can be changed, for example by using separate ports for input and output instead of shared ports.
- More bumpless bonds can be added to increase bandwidth between devices. Shielding may be added around the bonds to reduce interference.
- many modifications to the bumpless bonds 910 are possible within the scope of electrically interconnecting multiple devices.
- the Through-Silicon Via (TSV) 944 is an electrical connection that passes vertically through the semiconductor device 906 from the top surface to the bottom surface. Its purpose is to facilitate communication of signals and data between the semiconductor device 906 and any additional semiconductor devices potentially stacked on top of it in a 3D integrated circuit configuration.
- the TSV 944 enables high-density interconnections between multiple stacked semiconductor layers, providing an efficient means for data routing and signaling.
- the TSV 944 interfaces with surrounding circuitry within the semiconductor device 906, allowing signals to be transmitted upwards or downwards depending on the system configuration.
- the TSV 944 couples to the read data register 942 via interconnect 946.
- the read data register 942 can use the TSV 944 path to transfer read data from the microvault 936 to external semiconductor devices. This enables efficient data offloading from the on-chip memory.
- the TSV 944 continues through to the top surface of semiconductor device 906, where it may interface with complementary contacts or interconnects on the bonded semiconductor above it. This facilitates the vertical transfer of signals and data along the assembly 900.
- There can be many variations in the specific implementation of the TSV 944. Its dimensions can range from a few microns to tens of microns to match pitch requirements.
- the TSV 944 can be tapered, straight, or have non-uniform cross-sections. It may utilize different conductive materials as liners and fills, including metals like copper, tungsten or alloys. Insulating liners made of materials like silicon dioxide can separate the conductive fill from the substrate.
- the contacts and interconnects coupling into the TSV 944 can also have diverse layouts. Multiple TSVs can be placed adjacent to each other in a high-density array configuration if desired. Overall, many architectural optimizations in the design and fabrication process of the TSV 944 are possible within the scope of the present disclosure.
- FIG. 10 shows a semiconductor assembly 1000 incorporating a daisy-chained configuration of microvaults 1036, 1058, operatively connected to a multiplexer 1060 and managed by a counter 1062 for coordinated data selection and retrieval, in accordance with an embodiment of the present disclosure.
- This assembly 1000 is designed to carry out data processing tasks, potentially for applications such as artificial intelligence (AI) and machine learning, where high-speed data access and processing are utilized.
- the assembly 1000 comprises two primary semiconductor devices: semiconductor device 1006 and semiconductor device 1014.
- Semiconductor device 1014 is depicted as containing several interfaces and registers for data communication, including a read address register input interconnect 1024.
- This interconnect 1024 facilitates the delivery of read addresses to a read address register 1026, which temporarily holds these addresses before they are transmitted to corresponding microvaults 1036, 1058 in semiconductor device 1006 for data retrieval operations.
- interconnect 1028 serves as a pathway for read addresses from the read address register 1026 to transition to bumpless bonds 1030.
- Bumpless bonds 1030 and 1032 represent high-density, low-profile electrical connections between semiconductor device 1014 and semiconductor device 1006, ensuring the transmission of read addresses with minimal signal loss and physical space requirements.
- Microvault 1036, a memory storage unit, may encompass a variety of memory technologies, such as FeFETs and/or 3D-NAND structures as described herein, to facilitate the storage and rapid retrieval of data.
- the multiplexer 1060 selects the appropriate data stream from multiple microvault 1036, 1058 outputs. Controlled by a counter 1062, which may operate according to a predefined sequence or be driven by external control signals, the multiplexer 1060 arbitrates between the outputs of microvault 1036 and another microvault, denoted as microvault 1058. Microvault 1058, similar in function and potential memory technology to microvault 1036, provides an additional source of data for the multiplexer 1060 to select from.
- Once the desired data is selected by the multiplexer 1060, it is temporarily stored in a read data register 1042, also located within semiconductor device 1006.
- This register 1042 acts as a buffer, holding the data for subsequent processing or transmission.
- the read data is then routed via interconnect 1004 to bumpless bonds 1008, which facilitate the transfer of data to the semiconductor device 1014.
- Bumpless bonds 1010, 1018 continue the data's journey through the assembly 1000, ensuring data transfer from semiconductor device 1006 to semiconductor device 1014.
- Once the read data arrives at semiconductor device 1014, it is channeled via interconnect 1048 to a read data register, specifically read data register 1022, where it can be accessed by external systems, such as an application-specific integrated circuit (ASIC) or a system-on-chip (SoC), via the read data register output interconnect 1050.
- the assembly 1000 encompasses Through-Silicon Vias (TSVs) 1016 and 1044, providing vertical electrical connections through the semiconductor devices 1014 and 1006, respectively.
- Interconnects 1012 and 1046 serve as horizontal pathways for signals to travel to and from the TSVs 1016 and 1044, respectively.
- the assembly 1000 may be subject to various modifications and alternative embodiments.
- the number and arrangement of microvaults, the specific types of memory technologies employed within the microvaults, and the configuration of interconnects and bonding areas may be tailored to meet the requirements of different applications.
- the semiconductor devices 1006 and 1014 may be designed to accommodate additional functionality, such as thermal management layers for heat dissipation, hardware-based encryption modules for data security, or power management circuits to optimize energy consumption.
- the detailed structure of Fig. 10, therefore, serves as a foundation upon which a variety of sophisticated semiconductor systems can be constructed, each tailored to the specific needs of its intended application.
- microvaults 1036 and 1058 present a daisy-chaining configuration that allows for an expandable and flexible memory architecture within the semiconductor device 1006. This daisy-chaining is facilitated through a series of interconnected pathways and controlled by the multiplexer 1060 in coordination with the counter 1062.
- Each microvault, such as 1036 and 1058, is designed to hold and provide rapid access to data, which may be in the form of stored charge, magnetic states, ferroelectric material states, or other physical embodiments of binary information.
- the microvaults are interconnected such that the output of one microvault can be routed to the input of another, creating a chain of memory elements. This is achieved through a series of interconnects, such as interconnect 1034 for microvault 1036 and interconnect 1056 for microvault 1058, which serve as conduits for the read data signals emanating from the microvaults.
- the multiplexer 1060 manages the flow of data from this daisy chain of microvaults. It is designed with multiple inputs, each connected to the output of a microvault via respective interconnects.
- For example, interconnect 1040 carries the read data from microvault 1036, and interconnect 1056 carries the read data from microvault 1058, to the multiplexer 1060.
- the multiplexer 1060 is capable of selecting which input to connect to its output at any given time, thus controlling which microvault's data is forwarded to the read data register 1042.
- the counter 1062 orchestrates the operation of the multiplexer 1060. It may be a binary counter or any form of sequential logic circuit that produces a series of output states in response to a clock signal.
- the counter 1062 progresses through its states with each tick of the clock, which may be provided by an external clock source or generated internally within the semiconductor device 1006. As the counter 1062 advances, it outputs a control signal that instructs the multiplexer 1060 on which input to select.
- For example, in one counter state, the counter 1062 may instruct the multiplexer 1060 to connect the output from microvault 1036 to the read data register 1042.
- In the next state, the counter may switch the connection to microvault 1058's output, and so on, cycling through the available microvaults in a predefined order.
- the counter's sequence and timing can be configured based on the desired data access patterns and the specific requirements of the processing tasks at hand.
- This clock-driven coordination allows for an efficient and organized retrieval of data from a potentially large array of microvaults. It ensures that each microvault has an equal opportunity to present its data for processing, and it simplifies the control scheme by reducing it to a predictable, rhythmic progression of states. This is particularly advantageous in systems where a large volume of data must be processed in parallel, as it provides a systematic method for accessing and utilizing the stored information.
- Although Fig. 10 illustrates only two microvaults, the described daisy-chaining mechanism can be extended to accommodate any arbitrary number of microvaults. Additional microvaults can be added to the chain, with each new microvault connected to the multiplexer via an additional input line.
- the multiplexer 1060 and counter 1062 would be scaled accordingly to manage the increased number of inputs, maintaining the same clock-driven, sequential data retrieval process across the expanded memory architecture.
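The counter-driven selection described above can be sketched in a few lines of Python. This is a minimal behavioral model, not the disclosed circuit: the class and function names (`Counter`, `multiplexer`, `select_bits`, `read_out`) and the placeholder payload strings are assumptions introduced for illustration, and the counter is assumed to be a simple modulo-N binary counter that advances once per clock tick.

```python
from math import ceil, log2


class Counter:
    """Hypothetical modulo-N counter that advances once per clock tick."""

    def __init__(self, num_states):
        self.num_states = num_states
        self.state = 0

    def tick(self):
        """Return the current state, then advance to the next one."""
        current = self.state
        self.state = (self.state + 1) % self.num_states
        return current


def multiplexer(inputs, select):
    """Forward the input chosen by the select signal."""
    return inputs[select]


def select_bits(num_inputs):
    """Width of a binary select signal for num_inputs multiplexer inputs."""
    return max(1, ceil(log2(num_inputs)))


def read_out(microvault_outputs, num_ticks):
    """Collect what the read data register would latch on each tick."""
    counter = Counter(len(microvault_outputs))
    return [multiplexer(microvault_outputs, counter.tick()) for _ in range(num_ticks)]


# Two microvaults as in Fig. 10; the payload strings are placeholders.
outputs = ["data_from_1036", "data_from_1058"]
latched = read_out(outputs, 4)
```

Under these assumptions, adding microvaults only lengthens the `outputs` list; the counter modulus follows the list length, and `select_bits` gives the select-signal width a binary counter would need for the larger multiplexer.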
- This daisy-chaining of microvaults, in conjunction with the multiplexing and counter-driven control system, exemplifies a modular and scalable approach to memory design in semiconductor devices. It allows for the customization of memory arrays to match the capacity and performance needs of a wide range of applications, from embedded systems to large-scale data centers, providing a versatile solution for modern computing challenges.
- FIG. 11 shows a semiconductor assembly 1100 incorporating a daisy-chained configuration of microvaults in multiple semiconductor devices 1106, 1116 that are operatively connected to multiplexers and managed by counters for coordinated data selection and retrieval, in accordance with an embodiment of the present disclosure.
- Fig. 11 of the accompanying drawings illustrates an embodiment of an assembly 1100 as part of an integrated circuit.
- the assembly 1100 may be seen as a hierarchical structure that includes a bottom semiconductor device 1122, a middle semiconductor device 1116, and a top semiconductor device 1106 (each of these may be chiplets).
- Each semiconductor device is configured to interface with the others through a series of bumpless bonds, such as bumpless bonds 1128, 1129, 1158, 1159, 1160, 1161, 1162, and 1163, which facilitate electrical connectivity without the added profile of traditional bonding methods, thus enabling a compact and dense stacking of semiconductor layers.
- the bottom semiconductor device 1122 includes a read address register 1126, which may be configured to store and communicate read addresses to microvaults located across the assembly 1100.
- the read address register 1126 communicates via an interconnect 1174, which serves as a conduit for signals directed to bumpless bonds 1128. These bonds in turn engage with bumpless bonds 1129 of the middle semiconductor device 1116, thus transferring the read addresses into the middle semiconductor device.
- the bottom semiconductor device 1122 also comprises a read data register 1124, which may serve as a repository for read data received. Read data is received through an interconnect 1176 that connects to bumpless bonds 1158, which are in communication with bumpless bonds 1159 of the middle semiconductor device 1116.
- the middle semiconductor device 1116 serves as an intermediary layer within the assembly 1100, housing microvaults such as microvault 1132 and microvault 1118, each of which may be designed to store and rapidly provide access to data. These microvaults are linked to other components within the device via interconnects, such as interconnect 1130 and interconnect 1134, which guide the flow of read addresses and read data, respectively.
- the middle semiconductor device 1116 also features a read address register 1146, which receives read addresses from interconnect 1130, and a read data register 1154, which collects read data from TSV 1152.
- the multiplexer 1114 within the middle semiconductor device 1116 selects between various data streams. This multiplexer is controlled by a phase counter 1110, which determines the sequence of data selection based on the input received from the top semiconductor device 1106 via the TSV 1152.
- the read data register 1112 serves as a holding area for the selected data stream from the multiplexer 1114.
- the top semiconductor device 1106 features a phase counter 1102 and a read data register 1104, which are used for coordinating data flow within the assembly 1100.
- the phase counter 1102, in conjunction with multiplexer 1150, dictates the output of read data from microvaults 1138 and 1164 based on the selected phase.
- the read data register 1104 captures the output from the multiplexer 1150, which is then relayed through interconnect 1108 to bumpless bonds 1162, facilitating communication with the middle semiconductor device 1116.
- the assembly 1100 illustrates the integration of multiple microvaults across different semiconductor devices. For example, the microvault 1132 on the middle semiconductor device 1116 may receive read addresses from read address register 1146 via an interconnect 1134, path "1".
- microvault 1118 may receive read addresses from the same register via an interconnect 1156, path "2".
- Microvaults 1138 and 1164 on the top semiconductor device 1106 receive read addresses via interconnect 1140 and paths "3" and "4", respectively, from read address register 1142, which is in communication with the middle semiconductor device 1116 via TSV 1136 and bumpless bonds 1160 and 1161.
- Each microvault, such as 1132, 1118, 1138, and 1164, can potentially output read data to the multiplexer 1114 or 1150, where the data is then selected based on the configuration of the respective phase counter, 1110 or 1102.
- the selected data is temporarily stored in a read data register, either 1112 or 1104, before being transmitted down the assembly 1100 through the respective bumpless bonds and interconnects, ultimately reaching the read data register 1124 of the bottom semiconductor device 1122.
- This arrangement allows for a synchronized read-out of data from all microvaults, which may be essential in applications requiring parallel processing and high-speed data access.
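The two-level selection in Fig. 11 can be illustrated with a small Python sketch. It assumes that multiplexer 1114 has exactly three inputs (microvaults 1132 and 1118 plus read data register 1154) and that multiplexer 1150 has two (microvaults 1138 and 1164). The ordering of inputs, the mapping of phase values to inputs, and the function names are hypothetical, and register delays are ignored, so the sketch shows selection only, not timing.

```python
# Hypothetical input ordering for each multiplexer; the source does not
# specify which phase value selects which input.
TOP_MUX_1150_INPUTS = ["microvault_1138", "microvault_1164"]
MIDDLE_MUX_1114_INPUTS = ["microvault_1132", "microvault_1118", "register_1154"]


def source_at_register_1124(phase_1110, phase_1102):
    """Name the microvault whose data would reach read data register 1124.

    Register delays are ignored, so this shows selection only, not timing.
    """
    chosen = MIDDLE_MUX_1114_INPUTS[phase_1110 % len(MIDDLE_MUX_1114_INPUTS)]
    if chosen == "register_1154":
        # Register 1154 holds whatever the top device forwarded through
        # multiplexer 1150 and read data register 1104.
        chosen = TOP_MUX_1150_INPUTS[phase_1102 % len(TOP_MUX_1150_INPUTS)]
    return chosen


def full_sweep():
    """Visit every microvault once by stepping both phase counters."""
    order = []
    for phase_1110 in range(len(MIDDLE_MUX_1114_INPUTS)):
        if MIDDLE_MUX_1114_INPUTS[phase_1110] == "register_1154":
            for phase_1102 in range(len(TOP_MUX_1150_INPUTS)):
                order.append(source_at_register_1124(phase_1110, phase_1102))
        else:
            order.append(source_at_register_1124(phase_1110, 0))
    return order
```

With this ordering, `full_sweep()` visits the two middle-device microvaults first and then the two top-device microvaults. A real design would also have to account for the extra register stages on the path from the top device.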
- the TSVs provide vertical connectivity across the semiconductor devices, enabling the integration of additional layers or functionalities atop the existing assembly 1100. These TSVs are coupled to various interconnects and bumpless bonds that establish the necessary pathways for signal transmission both within and between the semiconductor devices.
- the microvaults within the assembly 1100 may include various memory technologies, such as 3D-NAND or 3D-NOR structures, and are arranged to facilitate parallel processing and efficient data retrieval. Each microvault may include additional features, such as thermal management layers for heat dissipation, hardware-based encryption modules for data security, or power management circuits to optimize energy consumption.
- Datapath 1 within the assembly 1100 exemplifies a route through which read addresses and corresponding read data are transmitted across the assembly, specifically directing operations from the read address register 1126 located on the bottom semiconductor device 1122 to the read data register 1124 within the same device.
- the process begins with the read address register 1126 holding a specific read address. This address is sent through interconnect 1174, which acts as a channel for the signal. The read address is then transmitted to bumpless bonds 1128, which are meticulously designed to create a reliable electrical connection without the physical protrusion associated with traditional bonding methods. These bonds ensure a low-profile interface that preserves the compactness of the semiconductor stack.
- the signal continues from bumpless bonds 1128 to engage with bumpless bonds 1129 of the middle semiconductor device 1116.
- the read address is carried forward by interconnect 1130, which delivers the address to the read address register 1146 of the middle semiconductor device.
- Read address register 1146 propagates the read address through interconnect 1134, designated as path "1", guiding the signal to the microvault 1132.
- Upon receiving the read address, microvault 1132 accesses the requested data. This data is then outputted through interconnect 1170 and directed to the multiplexer 1114.
- multiplexer 1114 functions as a selective switch that chooses between data streams based on the configuration determined by phase counter 1110. This phase counter may be designed to cycle through a sequence that dictates the timing and selection of data streams, ensuring that each microvault is read in a coordinated manner.
- the selected data from multiplexer 1114 is then captured by read data register 1112, which holds the data momentarily.
- the data is subsequently sent via interconnect 1120, which carries the signal to bumpless bonds 1159. These bonds are part of a sophisticated electrical interconnection system that, along with bumpless bonds 1158 on the bottom semiconductor device 1122, enables vertical and horizontal integration within the semiconductor stack.
- the signal, now in the form of read data, traverses from bumpless bonds 1159 to bumpless bonds 1158 and is finally introduced into interconnect 1176. This interconnect completes the connection to read data register 1124, which is configured to receive and hold the read data.
- the read data register 1124 may be equipped to retain the data for subsequent processing or external communication.
- Datapath 2 within the assembly 1100 delineates a route specifically designed for the transmission of read addresses from the read address register 1126 on the bottom semiconductor device 1122 to the microvault 1118 located on the middle semiconductor device 1116 and the subsequent transfer of read data back to the read data register 1124 on the bottom device.
- the journey commences at the read address register 1126, where a read address is held in preparation for dispatch.
- This register is a part of the semiconductor device's control mechanism, orchestrating the retrieval of data by issuing specific addresses to the memory units.
- the read address is sent through the interconnect 1174, which provides a secure and reliable pathway for electrical signals within the integrated circuit.
- the read address continues from the interconnect 1174 to bumpless bonds 1128, which offer a seamless and low-profile connection to the middle semiconductor device 1116 via the corresponding bumpless bonds 1129. These bonds maintain the signal's integrity during inter-layer communication and are designed to accommodate the requirements of modern semiconductor architectures.
- the signal is then channeled via interconnect 1130 to the read address register 1146 within the middle semiconductor device 1116.
- the read address register 1146 acts as a secondary store and hold register. From there, the read address is directed down interconnect 1156, labeled as path "2," which terminates at the microvault 1118.
- Upon receipt of the read address, microvault 1118 accesses the corresponding data. This data retrieval process is facilitated by the microvault's internal architecture, which may comprise an array of memory cells optimized for rapid access and data stability. The read data is outputted from microvault 1118 and travels through interconnect 1172, which leads to the multiplexer 1114.
- Multiplexer 1114 determines which data stream to forward based on input from the phase counter 1110.
- the phase counter 1110 operates in synchronization with the system clock or an external control signal, cycling through various states to control the selection process of the multiplexer 1114 in a precise and predictable manner.
- the output of the multiplexer 1114, now carrying the selected read data, is conveyed to read data register 1112. This register temporarily stores the read data, acting as a buffer. The read data is then dispatched via interconnect 1120 towards bumpless bonds 1159.
- Bumpless bonds 1159 form the interface with bumpless bonds 1158 on the bottom semiconductor device 1122, where the signal is transmitted downward through the assembly.
- the read data then traverses the interconnect 1176 to reach its final destination, the read data register 1124.
- the read data register 1124 captures the read data, holding it in readiness for further processing or transmission to external circuits.
- Datapath 3 within the assembly 1100 is another communication route that illustrates the data transfer sequence from the read address register 1126 on the bottom semiconductor device 1122, through various components, ultimately to the microvault 1138 on the top semiconductor device 1106, and then back to the read data register 1124 on the bottom device.
- the sequence initiates at the read address register 1126, which serves as the origin point for read addresses.
- the register 1126 securely holds the address before it is dispatched through the interconnect 1174.
- Interconnect 1174 acts as a dedicated channel, ensuring that the read address is conveyed with precision to the bumpless bonds 1128.
- These bumpless bonds 1128 facilitate a streamlined connection to the bumpless bonds 1129 of the middle semiconductor device 1116, preserving the integrity and compactness of the signal pathway.
- Upon reaching the middle semiconductor device 1116, the read address is relayed through interconnect 1130 to the read address register 1146.
- the read address register 1146 acts as a juncture that further propagates the address signal through the Through-Silicon Via (TSV) 1136.
- the TSV 1136 is a vertical interconnect that pierces through the semiconductor substrate, providing a direct link from the middle semiconductor device 1116 to the top semiconductor device 1106, thus exemplifying the 3D integration capabilities of semiconductor design.
- the read address ascends from TSV 1136 and emerges onto bumpless bonds 1160 on the middle semiconductor device 1116.
- the bumpless bonds 1160 are connected to the bumpless bonds 1161 of the top semiconductor device 1106.
- the address signal is conducted through interconnect 1140 to the read address register 1142 on the top device 1106.
- the read address register 1142, upon receiving the read address, directs the signal along interconnect 1144. This path, denoted as "3," leads the address to the microvault 1138.
- Microvault 1138, designed for data storage, retrieves the requested information in response to the read address.
- the read data is outputted through interconnect 1166, which feeds the data into the multiplexer 1150.
- the multiplexer 1150 in the top semiconductor device 1106 is governed by the phase counter 1102, which dictates the selection of the data stream to be channeled to the read data register 1104.
- the chosen data stream is temporarily housed in the read data register 1104, where it awaits downstream transmission.
- the read data departs from the read data register 1104 via interconnect 1108, which connects to bumpless bonds 1162. These bonds 1162 engage with the corresponding bumpless bonds 1163 on the middle semiconductor device 1116, transferring the read data to TSV 1152.
- TSV 1152 operates as a vertical conduit, allowing the read data to traverse down into the middle semiconductor device 1116, where it is received by the read data register 1154.
- the read data register 1154 holds the read data momentarily before it is directed to the multiplexer 1114 through interconnect 1156.
- the multiplexer 1114 in the middle semiconductor device 1116, coordinated by the phase counter 1110, selects the appropriate data for output.
- the read data is then channeled to the read data register 1112, where it is briefly stored. Following this, the read data travels via interconnect 1120 to bumpless bonds 1159.
- the bumpless bonds 1159 form an interface with bumpless bonds 1158 on the bottom semiconductor device 1122.
- the read data signal is then carried through interconnect 1176, culminating its journey at the read data register 1124 on the bottom device.
- Datapath 4 within the assembly 1100 is a path that establishes the flow of read addresses from the read address register 1126 on the bottom semiconductor device 1122 to the microvault 1164 located on the top semiconductor device 1106, and subsequently facilitates the movement of read data back down to the read data register 1124 on the bottom device.
- This datapath begins at the read address register 1126, which is responsible for holding and issuing the read addresses necessary for data retrieval from the microvaults.
- the read address is sent from the register 1126 through interconnect 1174, a pathway that maintains signal integrity and facilitates electrical communication.
- the read address is directed to bumpless bonds 1128. These bonding areas create an interconnection between the bottom semiconductor device 1122 and the middle semiconductor device 1116 through bumpless bonds 1129. The design of these bumpless bonds facilitates data transmission.
- Once the read address reaches the middle semiconductor device 1116, it is carried forward by interconnect 1130 to the read address register 1146. This register acts as an intermediary, preparing the address for its vertical ascent through the device stack. The address is then transmitted via the Through-Silicon Via (TSV) 1136, which facilitates vertical integration by providing a direct electrical link through the semiconductor substrate.
- After ascending through TSV 1136, the read address emerges onto bumpless bonds 1160, which are aligned to connect with bumpless bonds 1161 on the top semiconductor device 1106. The read address then proceeds along interconnect 1140 to the read address register 1142 located on the top device.
- the read address register 1142 serves to forward the read address to its final destination, the microvault 1164, through interconnect 1148, labeled as path "4".
- Microvault 1164, upon receiving the read address, retrieves the requested data, which is then outputted through interconnect 1168. This data is directed to the multiplexer 1150, which is under the control of phase counter 1102.
- the phase counter 1102 determines which data stream is selected by the multiplexer 1150, which then sends the read data to the read data register 1104.
- the read data register 1104 acts as a temporary repository, holding the data until it can be sent downwards through the device stack.
- the data leaves the read data register 1104 and travels via interconnect 1108 to bumpless bonds 1162. These bonds maintain a connection to bumpless bonds 1163 on the middle semiconductor device 1116.
- the read data is then transferred to TSV 1152, which carries the data vertically down to the read data register 1154 on the middle semiconductor device 1116.
- the read data register 1154 temporarily holds the read data before it is fed into the multiplexer 1114 via interconnect 1156.
- the multiplexer 1114, coordinated by the phase counter 1110, channels the appropriate data stream to the read data register 1112. This register serves as a staging area for the read data, which is then sent through interconnect 1120 to bumpless bonds 1159.
- Bumpless bonds 1159 interface with bumpless bonds 1158 on the bottom semiconductor device 1122 to provide a downward transmission of the read data. Finally, the signal is routed through interconnect 1176 and arrives at read data register 1124, where the data is made available for subsequent processing or external communication.
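Datapaths 1 through 4 share most of their hops, which is easier to see when each path is written out as an ordered list. The Python sketch below transcribes the hops named in the description above. The variable and function names are illustrative only, and counting register stages is offered as a rough proxy for relative path depth, not as a latency figure from the disclosure.

```python
# Hops shared by every datapath on the way up from the bottom device.
ADDRESS_PREFIX = [
    "read address register 1126", "interconnect 1174",
    "bumpless bonds 1128", "bumpless bonds 1129",
    "interconnect 1130", "read address register 1146",
]
# Hops shared by every datapath on the way back down.
DATA_SUFFIX = [
    "multiplexer 1114", "read data register 1112", "interconnect 1120",
    "bumpless bonds 1159", "bumpless bonds 1158",
    "interconnect 1176", "read data register 1124",
]
# Extra hops used only by microvaults on the top device.
TOP_ASCENT = [
    "TSV 1136", "bumpless bonds 1160", "bumpless bonds 1161",
    "interconnect 1140", "read address register 1142",
]
TOP_DESCENT = [
    "multiplexer 1150", "read data register 1104", "interconnect 1108",
    "bumpless bonds 1162", "bumpless bonds 1163", "TSV 1152",
    "read data register 1154", "interconnect 1156",
]

DATAPATHS = {
    1: ADDRESS_PREFIX
    + ["interconnect 1134", "microvault 1132", "interconnect 1170"]
    + DATA_SUFFIX,
    2: ADDRESS_PREFIX
    + ["interconnect 1156", "microvault 1118", "interconnect 1172"]
    + DATA_SUFFIX,
    3: ADDRESS_PREFIX + TOP_ASCENT
    + ["interconnect 1144", "microvault 1138", "interconnect 1166"]
    + TOP_DESCENT + DATA_SUFFIX,
    4: ADDRESS_PREFIX + TOP_ASCENT
    + ["interconnect 1148", "microvault 1164", "interconnect 1168"]
    + TOP_DESCENT + DATA_SUFFIX,
}


def register_stages(path):
    """Count the register hops along a datapath."""
    return sum(1 for hop in path if "register" in hop)


stages = {number: register_stages(path) for number, path in DATAPATHS.items()}
```

Under this transcription, datapaths 1 and 2 pass through four registers each, while datapaths 3 and 4 pass through seven, reflecting the extra trip through the top device.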
- Fig. 12 illustrates a three-dimensional (3D) memory column 1200 configured as a 3D-NOR or 3D-AND structure, featuring a series of ferroelectric field-effect transistors (FeFETs) 1202 with interconnected drain terminals 1204 linked to a common select line 1212 and individual gate terminals 1206 connected to respective read/write enable lines 1214 (e.g., 1214a for FeFET 1202a), all coupled to a common bit line, in accordance with an embodiment of the present disclosure.
- Fig. 12 depicts a three-dimensional (3D) memory column, designated as element 1200, which can be configured in various embodiments as either a 3D-NOR or 3D-AND structure, providing flexibility in the application and use of the integrated circuit.
- This memory column is an assembly of multiple ferroelectric field-effect transistors (FeFETs), collectively referred to as FeFETs 1202, where each FeFET is indicated by elements such as 1202a, 1202b, 1202c, and 1202d, among others potentially present in the array.
- For each FeFET, such as 1202a, there is a drain terminal, such as 1204a.
- This drain terminal is part of the memory cell's output path and is connected to a common select line 1212.
- the common select line 1212 serves as a control mechanism that enables the selection of a particular FeFET for data read or write operations.
- The gate terminal of each FeFET, exemplified by 1206a for FeFET 1202a, is connected to a respective read/write enable line, such as 1214a. This enables control of the FeFET's state, allowing it to be in a conductive (on) state for reading or writing data or in a non-conductive (off) state to prevent data flow.
- the presence of individual read/write lines for each FeFET may allow for precise control and operation of each memory cell.
- each FeFET, such as 1202a, comprises a source terminal, such as 1208a, which is coupled to a common bit line 1210.
- the bit line 1210 provides a conduit for data being written to or read from the FeFETs.
- this bit line can be shared across multiple memory columns, which can facilitate parallel processing and increased data throughput.
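The parallel wiring of the column in Fig. 12 can be modeled behaviorally in Python. This is a simplified sketch under stated assumptions: each FeFET is reduced to a single stored bit, a stored 1 is assumed to mean the cell conducts when its enable line is asserted, and the class name `NorColumn` and its methods are hypothetical. It does not model voltages, polarization, or write disturb.

```python
class NorColumn:
    """Behavioral model of FeFETs wired in parallel between shared lines."""

    def __init__(self, num_cells):
        # One stored bit per FeFET; 1 is assumed to mean "conducts when enabled".
        self.cells = [0] * num_cells

    def write(self, index, bit):
        """Store a bit in the FeFET at the given position."""
        self.cells[index] = bit

    def bit_line(self, enable_lines, select_line=True):
        """Return 1 if any enabled FeFET conducts onto the shared bit line."""
        if not select_line:
            return 0
        return int(any(en and bit for en, bit in zip(enable_lines, self.cells)))

    def read(self, index):
        """Read one cell by asserting only its read/write enable line."""
        enable = [i == index for i in range(len(self.cells))]
        return self.bit_line(enable)


column = NorColumn(4)
column.write(2, 1)
```

Because the cells sit in parallel, the shared bit line behaves like an OR of the enabled cells, which is why a read in this model asserts exactly one enable line at a time.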
- the 3D memory column 1200 may incorporate additional elements and configurations to enhance performance and functionality.
- the 3D memory column 1200 may include insulating materials, conductive pathways, and other structural components not explicitly shown in Fig. 12 but which are inherent to the implementation of such 3D memory structures.
- the FeFETs 1202 may also exhibit variations in terms of material composition, structural dimensions, and electrical properties, contributing to a range of performance characteristics suitable for different applications.
- the memory column 1200 may be incorporated into larger memory arrays, forming part of a memory module or system. These arrays can be arranged in various configurations, such as rows and columns, to create a matrix that efficiently addresses the demands of high-density data storage.
- the memory column 1200 can also be interfaced with other circuit elements and control logic, which may govern the operation of the memory array, including data management protocols, error correction algorithms, and power optimization strategies.
- the memory column 1200 may be fabricated using advanced semiconductor manufacturing techniques, such as photolithography, etching, deposition, and planarization processes.
- the choice of materials for the FeFETs, including the ferroelectric material, the semiconductor channel, and the conductive elements, can be selected based on desired electrical characteristics, such as charge retention, switching speed, and energy efficiency.
- the 3D memory column 1200 may include FeFETs, such as 1202, fabricated from a variety of materials that provide the necessary electrical and physical properties to achieve the desired functionality.
- the channel layer of each FeFET in the FeFETs 1202 could be constructed from materials such as Indium Gallium Zinc Oxide (IGZO) or other Amorphous Oxide Semiconductors (AOS) like Zinc Tin Oxide or Indium Tungsten Oxide (IWO). These materials are selected for their electronic properties, such as carrier mobility and stability.
- the ferroelectric material rigidly coupled to the channel layer in each FeFET may comprise hafnium zirconium oxide (HfZrO2) or other transition metal oxides, perovskites, etc. These ferroelectric materials are chosen for their ability to maintain a polarization state when an electric field is applied, which is used for the non-volatile memory characteristics of the FeFETs.
- the thickness, crystalline structure, and stoichiometry of the ferroelectric layer can be controlled to achieve the desired coercive voltage, remanent polarization, and other electrical parameters for reliable data storage and retrieval.
- the drain 1204 and source 1208 terminals of the FeFETs 1202 are connected to the common select line 1212 and common bit line 1210, respectively.
- These common lines may be formed from conductive materials such as tungsten, titanium nitride, or other metals and metal alloys that provide low-resistance pathways for electrical signals. The configuration of these terminals and their respective common lines ensures that the FeFETs can be accessed and controlled effectively during operation.
- Each gate terminal, such as the gate 1206a of the FeFET 1202a, is connected to its respective read/write enable line, such as 1214a.
- the gate terminals help control the state of the FeFET, and the materials chosen for these terminals may include various conductive materials that can provide a reliable electrical interface with the ferroelectric material.
- the read/write enable lines 1214 are designed to deliver suitable voltage levels to the gates 1206 of the FeFETs 1202 for switching between states.
- the memory column 1200 as a whole is designed to support a range of operating parameters.
- these parameters may include, but are not limited to, an off-state current of less than 10^-8 amps per centimeter cubed, an on-state current greater than 10^-7 amps per centimeter cubed, and a channel mobility that is maintained despite the presence of the ferroelectric layer.
- the channel layer's thickness can be less than 30 nm to ensure high device density, while the ferroelectric layer's characteristics, such as coercive voltage and remanent polarization, are optimized to provide the necessary memory functionality.
- the FeFETs 1202 may include additional materials or dopants to enhance their electrical properties.
- dopants such as gallium (Ga), indium (In), or zinc (Zn) may be introduced into the channel layer to modulate the carrier concentration or to adjust the threshold voltage of the FeFETs.
- the ferroelectric layer may include dopants like lanthanum (La) or niobium (Nb) to adjust its ferroelectric properties.
- the 3D memory column 1200 may be integrated with additional semiconductor devices and structures to form complex memory systems. These systems can provide storage capabilities and support various memory architectures.
- Fig. 13 depicts a three-dimensional (3D) memory column 1300 configured as a 3D-NAND structure, consisting of a vertical stack of ferroelectric field-effect transistors (FeFETs) 1302, each with source 1304 and drain 1308 terminals.
- the source 1304 of each FeFET, such as 1304a for FeFET 1302a, is coupled to the start of a bit line 1310 or connected to the drain of the preceding FeFET, exemplified by source 1304b of FeFET 1302b coupled to drain 1308a of FeFET 1302a.
- Each FeFET includes a gate 1306, such as 1306a for FeFET 1302a, connected to a respective read/write enable line, illustrated by 1314a for FeFET 1302a, in accordance with an embodiment of the present disclosure.
- the 3D memory column 1300 is composed of a series of vertically stacked ferroelectric field-effect transistors (FeFETs), identified collectively as FeFETs 1302. These transistors, which include FeFETs 1302a, 1302b, 1302c, 1302d, and so on, are characterized by their incorporation of ferroelectric materials within their gate structure. Each FeFET in the series of the memory column contributes to the memory storage capabilities of the device.
- In the depicted embodiment, each FeFET, such as FeFET 1302a, includes a source terminal 1304, for instance, source 1304a, which is coupled to a bit line 1310.
- the bit line 1310 serves as a conduit for electrical signals that are used to read from and write to the memory cell associated with FeFET 1302a.
- For a subsequent FeFET, such as FeFET 1302b, the source 1304b may be connected to the drain 1308a of the immediately preceding FeFET, such as FeFET 1302a, facilitating a serial connection that defines the vertical NAND architecture.
- Each FeFET within the FeFETs 1302 is further equipped with a gate terminal 1306, exemplified by gate 1306a for FeFET 1302a.
- This gate terminal 1306 is coupled to a respective read/write enable line, exemplified by 1314a for FeFET 1302a.
- the read/write enable line 1314a is responsible for controlling the state of the FeFET, allowing it to either conduct or prevent the flow of current through the device, thereby enabling the writing or reading of data.
- each FeFET of the FeFETs 1302 also includes a drain terminal 1308, such as drain 1308a for FeFET 1302a, which is typically connected to the source of the subsequent FeFET in the vertical stack. This arrangement ensures that the charge stored in the ferroelectric material of the gate can modulate the current flowing from the source to the drain, allowing for the storage and retrieval of data.
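The series connection in Fig. 13 can be contrasted with a parallel column by a short Python sketch. The read scheme used here is an assumption and is not stated in the source: it follows the conventional way a NAND string is sensed, where unselected cells are driven so that they conduct regardless of their stored state and only the selected cell's state decides whether the string conducts. The class name `NandString` and its methods are hypothetical.

```python
class NandString:
    """Behavioral model of FeFETs connected source-to-drain in series."""

    def __init__(self, num_cells):
        # One stored bit per FeFET; 1 is assumed to mean "conducts at read level".
        self.cells = [0] * num_cells

    def write(self, index, bit):
        """Store a bit in the FeFET at the given position."""
        self.cells[index] = bit

    def string_conducts(self, selected):
        """Current flows only if every FeFET in the series chain conducts.

        Assumed scheme: unselected cells are forced on (pass), so only the
        selected cell's stored state can block the string.
        """
        per_cell = [
            bool(bit) if i == selected else True
            for i, bit in enumerate(self.cells)
        ]
        return all(per_cell)

    def read(self, index):
        """Sense the bit line while the selected cell is at read level."""
        return int(self.string_conducts(index))


string = NandString(4)
string.write(1, 1)
```

The key difference from the parallel column is the `all` in place of an OR: a single non-conducting FeFET anywhere in the chain blocks the current, which is why the unselected cells have to be held in a conducting state during a read.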
- the memory column 1300 within Fig. 13 is indicative of a memory architecture that can be utilized in various applications, from portable electronics to enterprise-level data storage systems.
- the ferroelectric material used in the FeFETs may include various compositions, such as hafnium oxide, zirconium oxide, or any combination thereof, which can be doped with elements such as lanthanum or yttrium to adjust the ferroelectric properties as required.
- the 3D memory column 1300 can incorporate additional features that enhance performance, reliability, or manufacturability.
- the FeFETs 1302 may include protective layers to shield the ferroelectric material from environmental factors or process-induced damage.
- the column 1300 may also be integrated with other circuit elements, such as capacitors or diodes, to facilitate operations like charge pumping or to provide additional functionality within the memory array.
- the 3D memory column 1300 may be fabricated from a variety of materials that confer specific electrical properties to enhance device performance.
- the channel layer of each FeFET may be formed from materials such as Indium Gallium Zinc Oxide (IGZO).
- Other materials for the channel layer could include Amorphous Oxide Semiconductors (AOS) like Zinc Tin Oxide or Aluminum Zinc Oxide.
- the ferroelectric layer within the FeFETs 1302 may comprise materials such as Hafnium Zirconium Oxide (HfZrO2).
- the ferroelectric layer's thickness and material composition can be controlled through methods like Atomic Layer Deposition (ALD) to achieve the desired coercive voltages, remanent polarizations, and endurance characteristics.
- The coercive voltage of the ferroelectric layer may be tuned to be between -3 V and +3 V, facilitating low-voltage operation of the memory devices.
- The source and drain terminals of the FeFETs 1302 may be composed of conductive materials such as Tungsten or Titanium Nitride. These materials may also be selected to optimize the contact resistance with the channel layer, reducing overall power consumption and improving the Ion/Ioff ratio of the device.
- the FeFETs 1302 may be engineered to exhibit specific electrical parameters.
- the channel layer's thickness may be less than 30 nm in some embodiments.
- The channel layer may demonstrate a carrier concentration of 10^17 to 10^20 per cubic centimeter, which can be adjusted through doping with elements such as Gallium, Indium, or Zinc to modulate the electrical properties.
- the memory cells formed by the FeFETs 1302 within the 3D memory column 1300 may also target operational parameters such as read and write latencies, endurance, and energy consumption. For example, read and write operations may be executed with energies less than 10 picojoules and within timeframes less than 20 nanoseconds, contributing to the low power and high-speed attributes of the memory column.
- the 3D-NAND configuration of the memory column 1300 may be designed to achieve a high off-state resistance to on-state resistance ratio (Roff/Ron), which is critical for distinguishing between different data states and ensuring reliable data retention.
- This ratio may be about 10^3 or greater, which helps to maintain a high signal-to-noise ratio during memory operations.
- The FeFETs 1302 in the memory column 1300 may also be designed to sustain a high degree of reliability, with endurance ratings greater than or equal to 10^11 cycles, ensuring the longevity and durability of the memory device. This endurance is complemented by the ferroelectric layer's ability to maintain data retention for at least 1 minute at room temperature, which is 25°C.
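- To give a feel for the figures quoted above, the following sketch works through a few back-of-the-envelope calculations. The constants are the limits stated in the text; the function names, the ohmic read model, and the back-to-back write scenario are assumptions made only for illustration.

```python
ENDURANCE_CYCLES = 1e11   # "greater than or equal to 10^11 cycles"
OP_ENERGY_J = 10e-12      # "less than 10 picojoules" per operation
OP_TIME_S = 20e-9         # "less than 20 nanoseconds" per operation
ROFF_OVER_RON = 1e3       # "about 10^3 or greater"


def seconds_to_reach_endurance(write_interval_s: float,
                               cycles: float = ENDURANCE_CYCLES) -> float:
    """Time for one cell to accumulate `cycles` writes at one write per interval."""
    return cycles * write_interval_s


def read_current_ratio(read_voltage: float, r_on: float,
                       ratio: float = ROFF_OVER_RON) -> float:
    """Ohmic estimate (assumption) of on-state over off-state read current."""
    i_on = read_voltage / r_on
    i_off = read_voltage / (r_on * ratio)
    return i_on / i_off


# Hypothetical stress case: the same cell is rewritten every 20 ns without pause.
stress_time_s = seconds_to_reach_endurance(OP_TIME_S)

# Power drawn if an operation used exactly 10 pJ over exactly 20 ns.
power_at_limits_w = OP_ENERGY_J / OP_TIME_S

print(stress_time_s, power_at_limits_w, read_current_ratio(1.0, 1e4))
```

- At the quoted limits, a single cell rewritten continuously would reach 10^11 cycles after about 2000 seconds, and an operation at 10 pJ over 20 ns corresponds to about 0.5 mW; under the ohmic assumption the read-current ratio simply equals Roff/Ron.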
- Fig. 14 depicts a three-dimensional (3D) memory column configured as a 3D-NAND with an integrated pass gate, in accordance with an embodiment of the present disclosure.
- This figure illustrates a series of ferroelectric field-effect transistors (FeFETs) 1402, each including source 1404 and drain 1408 terminals, gated by respective gate terminals 1406 and coupled to read/write enable lines 1414.
- the FeFETs are interconnected, forming a vertical memory structure with pass gates 1418 linked to a pass gate line 1416.
- Fig. 14 illustrates a three-dimensional (3D) memory column, designated as element 1400, which can be configured as a 3D-NAND structure with an integrated pass gate. This configuration enables enhanced control over individual memory cells within the 3D structure, potentially improving read/write operations and facilitating efficient memory management.
- The 3D memory column 1400 comprises multiple ferroelectric field-effect transistors (FeFETs), collectively referred to as FeFETs 1402.
- These FeFETs are utilized for their ability to retain data in a non-volatile manner due to the ferroelectric properties of their gate material, which allows for data retention without continuous power supply for a time.
- Each FeFET in the series 1402 includes a source, exemplified by source 1404a for FeFET 1402a.
- the source 1404 for each FeFET is either coupled to a bit line, illustrated as bit line 1410 for FeFET 1402a, or is connected to the drain of the preceding FeFET in the series.
- source 1404b of FeFET 1402b is electrically coupled to drain 1408a of FeFET 1402a.
- This serial connection forms the basis for the daisy-chain configuration, which is used in NAND architectures, allowing for sequential access to the array of FeFETs.
- each FeFET within the FeFETs 1402 is equipped with a gate terminal, such as gate 1406a for FeFET 1402a.
- the gates of the FeFETs are connected to their respective read/write enable lines, which are depicted as element 1414 in the figure.
- gate 1406a of FeFET 1402a is influenced by read/write enable line 1414a. These enable lines control the application of appropriate voltages for the reading and writing of data.
- each FeFET in the FeFETs 1402 series includes a drain, such as drain 1408a for FeFET 1402a. This drain is connected to the source of the subsequent FeFET in the series, thus establishing the continuity for the columnar structure of the 3D memory stack.
- each FeFET of the FeFETs 1402 incorporates a pass gate, for example, pass gate 1418a, which is connected to a pass gate line, represented by 1416 in the figure.
- the pass gate line 1416 is a conductive pathway that provides electrical signals to control the pass gates 1418 of the FeFETs.
- the inclusion of pass gates in the FeFETs may allow for improved isolation between memory cells during operation, thereby reducing interference and potentially enhancing the reliability of data storage and retrieval.
- the 3D memory column 1400 as depicted in Fig. 14 also encompasses a diverse range of materials and parameters that could be utilized to optimize its performance in various embodiments.
- Each FeFET 1402 within the column could be fabricated using a variety of semiconductor materials.
- the channel layer of the FeFETs could be formed from materials such as Indium Gallium Zinc Oxide (IGZO).
- The ferroelectric layer, which is a defining characteristic of the FeFETs, may be composed of materials like Hafnium Zirconium Oxide (HfZrO2) or perovskite materials, which are known for their remanent polarization. This property determines the data retention capabilities of the FeFETs.
- The coercive voltage of this layer, which affects the energy required to switch the polarization state, is another critical parameter that can be adjusted according to the requirements of the specific application, with a range in one embodiment being between -3 V and 3 V.
- The source and drain terminals of the FeFETs, which include elements 1404 and 1408, respectively, could be composed of conductive materials such as Tungsten or Titanium Nitride. These materials provide the pathways for the current flow used in switching.
- the read/write enable lines 1414, which control the gates 1406 of the FeFETs, could also be fabricated from similar materials, ensuring consistent electrical characteristics throughout the device.
- the channel layer thickness may be less than
- The electron mobility within these channel layers may be maintained at a predetermined level even when the layer is less than 30 nm in thickness.
- the pass gates 1418 may be manufactured using low-resistance materials to enable quick switching times, which is beneficial when the memory column is accessed frequently during operation.
- Fig. 15 illustrates a three-dimensional (3D) memory column 1500, which may be configured as either a 3D-NOR or a 3D-AND structure with independent Read/Write enable capabilities, in accordance with an embodiment of the present disclosure.
- This memory column encompasses a series of vertically aligned FeFETs 1502, such as FeFETs 1502a, 1502b, 1502c, 1502d, and so forth, each integrated with a source 1504 (e.g., source 1504a for FeFET 1502a) linked to a respective read enable line 1520 (e.g., read enable line 1520a for FeFET 1502a), and a gate 1506 (e.g., gate 1506a for FeFET 1502a) connected to a corresponding write enable line 1522 (e.g., write enable line 1522a for FeFET 1502a). All FeFETs within the column share a common bit line 1510 connected to their drains 1508, enabling the column to perform coordinated memory operations.
- Fig. 15 presents a detailed depiction of a three-dimensional (3D) memory column 1500, which can be configured as a 3D-NOR or 3D-AND structure with independent Read/Write enable functionalities.
- This memory column is an assembly of Field-Effect Transistors with ferroelectric gate layers, commonly referred to as FeFETs 1502, which are individually identified, for example, as 1502a, 1502b, 1502c, 1502d, etc., each representing a memory cell within the column.
- each FeFET 1502 includes a source 1504, such as source 1504a corresponding to FeFET 1502a.
- the source 1504 is designed to be electrically coupled to a respective read enable line 1520, such as read enable line 1520a which is dedicated to FeFET 1502a.
- the read enable line 1520 functions to selectively activate the FeFET 1502 for reading operations, allowing the readout of stored data from the memory cell.
- each FeFET 1502 is equipped with a gate 1506, exemplified by gate 1506a for FeFET 1502a.
- the gate 1506 is connected to a respective write enable line 1522, such as write enable line 1522a, which is specific to FeFET 1502a.
- the write enable line 1522 serves to selectively activate the FeFET 1502 for writing operations, enabling the storage of data within the memory cell.
- each FeFET 1502 includes a drain 1508, for instance, drain 1508a affiliated with FeFET 1502a.
- the drain 1508 is connected to a common bit line 1510.
- the bit line 1510 acts as a conduit for transferring data to and from the memory cells during read and write operations.
- the commonality of the bit line 1510 across multiple FeFETs 1502 signifies that data from any activated memory cell can be routed through this shared path.
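- The addressing scheme described above (one read enable line and one write enable line per cell, with a bit line shared by the whole column) can be sketched as a small behavioral model. This is a hedged illustration only: the class and attribute names are hypothetical, and the one-line-at-a-time selection policy is an assumption, not something the text specifies.

```python
from typing import List, Optional


class NorColumn:
    """Toy model: per-cell read-enable and write-enable lines, one shared bit line."""

    def __init__(self, n_cells: int) -> None:
        self.bits: List[int] = [0] * n_cells
        self.read_enable: List[bool] = [False] * n_cells
        self.write_enable: List[bool] = [False] * n_cells

    def write(self, index: int, bit: int) -> None:
        # Assert only the selected write-enable line; the bit line carries the data.
        self.write_enable[index] = True
        bit_line = bit
        for i, enabled in enumerate(self.write_enable):
            if enabled:
                self.bits[i] = bit_line
        self.write_enable[index] = False

    def read(self, index: int) -> Optional[int]:
        # Assert only the selected read-enable line; that cell drives the bit line.
        self.read_enable[index] = True
        driven = [self.bits[i] for i, en in enumerate(self.read_enable) if en]
        self.read_enable[index] = False
        # With exactly one cell enabled, the shared bit line value is unambiguous.
        return driven[0] if len(driven) == 1 else None


column = NorColumn(4)
column.write(1, 1)
print(column.read(1), column.read(0))  # prints 1 0
```

- Unlike a serial string, each cell here reaches the bit line directly, so a read does not depend on the state of the other cells in the column.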
- the configuration of the FeFETs 1502 allows for a high density of memory cells vertically stacked within a compact footprint.
- the ferroelectric material utilized in the gate 1506 of the FeFETs 1502 may comprise various compositions, such as hafnium oxide-based materials, which can be deposited using atomic layer deposition techniques.
- the ferroelectric property of the material allows for data retention, enabling the memory cells to maintain stored information even when power is not supplied.
- The source 1504, gate 1506, and drain 1508 of each FeFET 1502 can be fabricated from materials that provide predetermined electrical performance. These materials may include metals such as tungsten or copper, or metal nitrides such as titanium nitride.
- the read enable lines 1520 and write enable lines 1522 can be designed to minimize crosstalk and interference between adjacent lines, in some embodiments. In some specific embodiments, shielding layers or insulating materials may be included to further isolate the signal paths.
- the described memory column 1500 may be integrated within a larger semiconductor device, such as a processor or a storage module. It may form part of a system-on-chip (SoC) or be included in a multi-chip module (MCM), contributing to a data storage and retrieval system.
- the materials that constitute the FeFETs 1502 within the memory column 1500 are selected to provide specific electrical and physical properties to optimize the performance of the integrated circuit.
- the channel layer in each FeFET may be formed from advanced semiconductor materials, such as Indium Gallium Zinc Oxide (IGZO) or other Amorphous Oxide Semiconductors (AOS) like Zinc Tin Oxide or Cadmium Oxide. These materials are chosen for their excellent electron mobility characteristics and stability.
- The ferroelectric layer, integral to the FeFETs 1502, may be fabricated from various ferroelectric materials that exhibit suitable polarization properties.
- Materials such as Hafnium Zirconium Oxide (HfZrO2) or Lead Zirconate Titanate (PZT) could be utilized. These materials can be doped with elements such as Lanthanum, Yttrium, or other suitable dopants to modify their ferroelectric properties, such as coercive voltage, remanent polarization, and crystallization temperature.
- The ferroelectric layer's thickness and material composition may be adjusted to achieve desired memory characteristics, such as write endurance and retention time, while ensuring the layer remains compatible with the overall semiconductor manufacturing process and other design considerations.
- the source 1504, gate 1506, and drain 1508 terminals of the FeFETs 1502 may be composed of conductive materials like Tungsten, Titanium Nitride, Nickel, or Molybdenum. Connections to the read enable lines 1520 and the write enable lines 1522 may be facilitated through conductive vias or contacts.
- The read enable lines 1520 and write enable lines 1522, along with the common bit line 1510, may be patterned using lithographic techniques to achieve the predetermined precision and alignment for proper functionality. These lines may be insulated from one another using dielectric materials like Silicon Dioxide (SiO2), Silicon Nitride (Si3N4), or low-k dielectrics to reduce parasitic capacitance and crosstalk.
- The design of each element within the memory column 1500 may take into account factors such as line width, spacing, and aspect ratio to ensure manufacturability, functionality, and/or other goals or characteristics.
- the materials and processes used in the construction of the memory column 1500 are chosen to ensure compatibility with standard semiconductor fabrication techniques, such as photolithography, etching, deposition, and annealing, while also enabling the integration of materials and structures.
- the fabrication of the FeFETs 1502 within the memory column 1500 may involve deposition techniques such as atomic layer deposition (ALD), chemical vapor deposition (CVD), or physical vapor deposition (PVD) to create uniform and/or non-uniform layers.
- Fig. 16 presents a cross-sectional view of a 3D memory structure, designated as 1600, configured as a single-port 3D NAND, in accordance with an embodiment of the present disclosure.
- the structure includes a first vertical structure 1608a and a second identical vertical structure 1608b, each comprising a dielectric column 1610a, 1610b, a channel column 1612a, 1612b disposed around the dielectric column, and a ferroelectric column 1614a, 1614b disposed around the channel column.
- A series of horizontal gate-electrode layers 1606a-c are disposed at predetermined distances from each other, adjacent to the ferroelectric column along the length of the vertical structures.
- the assembly further includes a drain select layer 1602 and a source select layer 1604, with respective end dielectric columns 1618a, 1618b, and 1616a, 1616b positioned at the interfaces with the vertical structures, illustrating a detailed and intricate design for high-density data storage.
- Fig. 16 provides a cross-sectional view of a three-dimensional (3D) memory structure, designated as 1600, which is configured as a single-port 3D NAND architecture.
- This structure incorporates a pair of vertical structures, 1608a and 1608b, which may be fabricated to be substantially identical, as indicated by their respective subscripts a and b, suggesting the potential for a modular and scalable memory array design.
- Each vertical structure exemplified by the first vertical structure 1608a, includes a dielectric column 1610a.
- the dielectric column may adopt various geometric forms — it can be cylindrical, substantially cylindrical, or feature curves. Additionally, it may present a tapered form, having different diameters at each end, implying a design that narrows towards the top. Both solid and hollow configurations of the dielectric column are contemplated within the scope of the disclosure, offering design flexibility for different electrical and structural requirements.
- Surrounding the dielectric column 1610a is a channel column 1612a, which is the locus for charge carriers during device operation.
- The channel column may likewise be cylindrical, substantially cylindrical, or feature curves, and may exhibit similar variations in diameter along its length as the dielectric column.
- Enveloping the channel column 1612a is a ferroelectric column 1614a, which extends along the length of the channel column but may recede at the ends, meaning that the ferroelectric column 1614a does not extend the entire length of the channel column 1612a.
- A drain select layer 1602 runs parallel to the horizontal gate-electrode layers 1606. Where the drain select layer 1602 meets the vertical structures 1608a, 1608b, end dielectric columns 1618a and 1618b are discernible. These end dielectric columns 1618 interface with the channel column 1612 and the drain select layer 1602, contributing to the isolation and control of the charge carriers within the channel column. They may contact the ferroelectric layer 1614, as they envelop the channel column 1612 at different positions along its length.
- a source select layer 1604 is situated at the bottom of the structure 1600, again parallel to the horizontal gate-electrode layers 1606.
- the horizontal gate-electrode 1606 layers could be constructed from a range of conductive materials, including metals and metal compounds, which may offer different work functions, conductivity, and compatibility with other materials in the structure.
- the ferroelectric column 1614 might incorporate a variety of ferroelectric materials each with its unique polarization characteristics, coercive fields, and dielectric constants, affecting the device's memory retention and switching behaviors.
- the channel column 1612 materials can be chosen based on their electronic properties, such as carrier mobility and bandgap, to achieve the desired levels of on-state and off-state current.
- the dielectric column 1610 provides the electrical insulation necessary to prevent leakage currents and ensure the proper functioning of the device.
- The dielectric column, such as 1610a for the first vertical structure, may be constructed from materials that offer insulating properties to mitigate any potential leakage currents.
- Choices for the dielectric material may be Hafnium Oxide (HfO2) or Silicon Dioxide (SiO2).
- The channel material of the channel column 1612a can be selected from a wide range of semiconducting materials that offer predetermined carrier mobility, such as Indium Gallium Zinc Oxide (IGZO).
- the channel layer's thickness may vary, with some embodiments considering a thickness less than 30 nm. This thickness is chosen to achieve a predetermined electrical performance.
- The ferroelectric column, like 1614a, may include perovskite structures, such as Lead Zirconate Titanate (PZT).
- the horizontal gate-electrode layers are composed of conductive materials that facilitate the application of an electric field to the ferroelectric column, such as Tungsten or Titanium Nitride, which may be chosen for their electrical behavior.
- the selection of gate-electrode materials also takes into consideration factors such as work function, thermal stability, and ease of integration with the existing semiconductor manufacturing processes.
- the drain and source select layers, 1602 and 1604 respectively, are incorporated to enable the addressing of individual memory cells within the array.
- the materials used for these layers are chosen for their conductive properties and compatibility with the channel and ferroelectric materials.
- the design of these layers may also incorporate considerations for reducing parasitic capacitance and ensuring swift data access.
- The end dielectric columns, like 1618a and 1616a, provide electrical insulation at the ends of the channel column, where the ferroelectric material does not extend.
- the disclosed embodiments within the 3D memory structure 1600 outline an assembly capable of providing data storage.
- the design allows for variations in structural dimensions, such as the diameter of the cylindrical columns, which can be uniform or tapered. Additionally, the option for solid or hollow configurations may be used.
- Fig. 17 shows a 3D memory structure that is a dual-port 3D NAND arrangement in accordance with an embodiment of the present disclosure.
- The three-dimensional (3D) memory structure illustrated in Fig. 17, referred to as 3D memory structure 1700, exemplifies a dual-port 3D NAND arrangement to provide a memory functionality.
- This structure is characterized by two primary vertical formations, designated as the first vertical structure 1708a and the second vertical structure 1708b, which may be identical or near identical, as evidenced by the designating subscripts ‘a’ and ‘b’.
- the first vertical structure 1708a includes a hollow or solid, tapered pass-gate electrode column 1718a that is substantially cylindrical in shape.
- the pass-gate electrode column 1718a may be made of titanium nitride and may have a larger diameter at the bottom end compared to the top end.
- Surrounding the pass-gate electrode column 1718a is a dielectric column 1710a that may be made of hafnium oxide.
- the dielectric column 1710a is also substantially cylindrical with a slightly tapered shape, having a marginally larger diameter at the top.
- the dielectric column 1710a provides electrical isolation between the pass-gate electrode and subsequent layers.
- Disposed around the dielectric column 1710a is a cylindrical channel column 1712a that may be made of IGZO semiconductor material.
- the channel column 1712a features curves along its length and has a uniform diameter throughout.
- the thickness of the channel column may be less than 30 nm.
- Enclosing the channel column 1712a is a PZT ferroelectric column 1714a that covers most of the length of the channel column 1712a but recedes at the ends, leaving a portion of the channel column 1712a uncovered.
- the ferroelectric column 1714a is substantially cylindrical and contains lead, zirconium and titanium as key elemental constituents.
- the vertical structures 1708a and 1708b traverse through several horizontal gate-electrode layers 1706a, 1706b and 1706c that may be made of tungsten, which are positioned at fixed intervals to form an interconnected grid layout. These layers influence the electric field within the ferroelectric column 1714a during memory operations.
- A drain select layer 1702 (e.g., titanium nitride) runs parallel to the horizontal gate-electrode layers 1706. Where the drain select layer 1702 intersects the vertical structures 1708a and 1708b, end dielectric columns 1718a and 1718b are visible. These end columns (e.g., made of HfO2) touch the ferroelectric column 1714a on one end and surround the uncovered portion of channel column 1712a, providing insulation.
- a source select layer 1704 (e.g., made of tungsten), also parallel to the electrode layers 1706, interfaces with the vertical structures. End dielectric columns 1716a and 1716b can be observed at these intersection points, enclosing the open ends of the channel columns 1712a and 1712b.
- Thin horizontal dielectric layers 1720a and 1720b (e.g., made of HfO2) may be placed near the bottom terminals. These layers seal off the bottom open ends of the vertical hollow voids.
- Fig. 18 illustrates a 3D memory structure 1800 that can be configured as a 3D NOR Vertical Transistor memory array.
- the 3D memory structure 1800 comprises a first vertical structure 1808a and an identical (or substantially identical) second vertical structure 1808b arranged adjacent to one another.
- the first vertical structure 1808a includes a vertical plug column 1802a that provides an electrical connection to the lower portions of the 3D memory structure.
- the vertical plug column 1802a may have a uniform diameter along its entire length or may have a larger diameter on its lower end than on its upper end.
- the plug column 1802a can be fabricated as a solid column or as a hollow column in various embodiments.
- Disposed adjacent to the vertical plug column 1802a are a source electrode column 1804a and a drain electrode column 1816a.
- the source electrode column 1804a and drain electrode column 1816a provide electrical connections to the source and drain nodes of the vertical transistors formed along the vertical structure 1808a.
- The source electrode column 1804a and drain electrode column 1816a may be composed of various conducting materials including, but not limited to, tungsten, titanium nitride, tantalum nitride, nickel, molybdenum, platinum, palladium, cobalt, gold, aluminum, copper, hafnium, hafnium nitride, iridium, iridium oxide, ruthenium, ruthenium oxide, silicides, graphene, carbon nanotubes, doped polysilicon, indium tin oxide, silver, aluminum-doped zinc oxide, gallium, gallium arsenide, indium gallium zinc oxide, metal alloys such as AlCu and TiW, and conducting polymers.
- Surrounding the vertical plug column 1802a, source electrode column 1804a, and drain electrode column 1816a is a channel column 1812a that provides the semiconductor channel region for the vertical transistors along the first vertical structure 1808a.
- the channel column 1812a may be formed from materials including, but not limited to, indium gallium zinc oxide (IGZO), indium zinc oxide (IZO), zinc tin oxide (ZTO), aluminum zinc oxide (AZO), indium tungsten oxide (IWO), gallium zinc oxide (GZO), hafnium indium oxide (HIO), cadmium oxide (CdO), polysilicon, polygermanium, cadmium selenide (CdSe), copper indium gallium selenide (CIGS), crystalline silicon, crystalline germanium, gallium arsenide (GaAs), indium phosphide (InP), indium antimonide (InSb), silicon carbide (SiC), gallium nitride (GaN), zinc oxide (Z
- Surrounding the channel column 1812a is a ferroelectric column 1814a that provides the gate dielectric for the vertical transistors along the first vertical structure 1808a.
- The ferroelectric column 1814a may be composed of ferroelectric materials including, but not limited to, perovskite oxides, lead zirconate titanate (PZT), barium titanate (BaTiO3), strontium titanate (SrTiO3), bismuth ferrite (BiFeO3), potassium niobate (KNbO3), lithium niobate (LiNbO3), lithium tantalate (LiTaO3), sodium bismuth titanate (Na0.5Bi0.5TiO3), bismuth titanate (Bi4Ti3O12), bismuth zinc niobate (Bi(Zn1/2Ti1/2)O3), bismuth lanthanum titanate (BiLaTiO3), bismuth nickel titanate
- the 3D memory structure 1800 further comprises multiple horizontal gate electrode layers 1806 including layers 1806a, 1806b, 1806c etc.
- the horizontal gate electrode layers 1806 are disposed at regular intervals along the vertical structures 1808 and provide the gate electrodes for the vertical transistors.
- The gate electrode layers 1806 may be formed from materials such as tungsten, titanium nitride, tantalum nitride, nickel, molybdenum, platinum, palladium, cobalt, gold, aluminum, copper, hafnium, hafnium nitride, iridium, iridium oxide, ruthenium, ruthenium oxide, silicides, graphene, carbon nanotubes, doped polysilicon, indium tin oxide, silver, aluminum-doped zinc oxide, gallium, gallium arsenide, indium gallium zinc oxide, metal alloys such as AlCu and TiW, and conducting polymers.
- Each of the horizontal gate electrode layers 1806 may be surrounded by an oxide/nitride/oxide (ONO) stack 1810, such as 1810a surrounding gate electrode layer 1806a, to provide insulation between the gate electrodes.
- The second vertical structure 1808b in the 3D memory structure 1800 is configured identically to the first vertical structure 1808a.
- the two vertical structures 1808a and 1808b are arranged horizontally adjacent to each other with a spacing that allows integration of the gate electrode layers 1806 and ONO stacks 1810. Together, the first and second vertical structures 1808a, 1808b along with the horizontal gate electrode layers 1806 can be configured as a 3D NOR memory architecture.
- Fig. 19 illustrates an embodiment of a planar FeFET 1900.
- the FeFET 1900 comprises a substrate 1910 upon which various layers and components are formed.
- the substrate 1910 may be comprised of silicon or other suitable semiconductor materials.
- Disposed on top of the substrate 1910 is a layer of TiN 1912.
- the TiN layer 1912 may act as an electrode and can be deposited by sputtering or other suitable deposition techniques.
- HZO 1908 comprises hafnium, zirconium, and oxygen and can exhibit ferroelectric properties.
- the HZO 1908 may be deposited by ALD, CVD, PVD or other suitable deposition methods and can have a thickness in the range of 5-50 nm. Acting as a ferroelectric layer, the HZO 1908 enables the non-volatile storage of data in the FeFET 1900.
- IWO 1906 comprises indium, tungsten, and oxygen. It can be deposited by sputtering or other suitable techniques and may have a thickness in various ranges.
- the IWO 1906 layer serves as a control oxide layer in the FeFET 1900.
- drain contact 1904 and source contact 1914 are formed on top of the IWO 1906 layer.
- the drain contact 1904 and source contact 1914 may comprise metals such as copper, aluminum, or alloys thereof and can be deposited by PVD, CVD or other suitable methods.
- the drain contact 1904 and source contact 1914 allow electrical connection to the FeFET 1900. They may have thicknesses in the range of 50-500 nm.
- a voltage applied to the drain 1904, source 1914, and TiN gate contact 1912 can control the ferroelectric polarization of the HZO 1908 layer.
- the polarization state can be used to store information in a non-volatile manner, enabling memory storage capabilities.
- the IWO 1906 layer helps improve the switching speed and endurance of the FeFET 1900. Overall, the layered structure shown in Fig. 19 enables a FeFET 1900 suitable for non-volatile memory applications.
- Fig. 20 presents the transfer characteristics of a Ferroelectric FET (FeFET) device, illustrating the relationship between the gate voltage (V_GS) on the x-axis and the resulting drain current (I_D) on the y-axis.
- Fig. 20 may show the characteristics of a FeFET as disclosed herein.
- The x-axis spans from -1 V to 1 V, while the y-axis, on a logarithmic scale, displays current values from 10^-12 A/µm to 10^-4 A/µm.
- The red curve starts at approximately 10^-11 A/µm at -1 V and increases more gradually as it approaches 0 V. At around 1 V, it then follows closely with the blue curve past 1 V. This demonstrates comparable drain current behavior under reverse bias conditions regardless of the polarization state.
- The separation between the red and blue curves spans several orders of magnitude in the negative voltage range near -1 V.
- This substantial difference in off-state current highlights the non-volatile memory effect achievable with the FeFET depending on its polarization direction.
- This large memory window is explicitly called out in the green box labeled “Large Memory Window” at the top left.
- Additional key details provided include the FeFET device dimensions, with a width/length ratio of 1 µm/50 nm specified.
- Fig. 20 depicts the bidirectional transfer characteristics of the FeFET device, highlighting the large memory window achievable through polarization switching and providing detailed voltage, current, and dimensional specifications to fully convey the measurement conditions and transistor performance.
- the paired curves effectively compare the clockwise and counterclockwise operation modes over the full gate voltage range.
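The separation between the two curves described for Fig. 20 can be expressed in decades (orders of magnitude) of drain current. The following is a minimal sketch of that arithmetic. The function name `current_separation_decades` is illustrative, and the blue-curve current used here is a hypothetical value chosen only for the example; only the red-curve value of roughly 10^-11 A/µm at -1 V comes from the description.

```python
import math


def current_separation_decades(i_a: float, i_b: float) -> float:
    """Return the separation between two drain currents in decades."""
    if i_a <= 0 or i_b <= 0:
        raise ValueError("currents must be positive to take a logarithm")
    return abs(math.log10(i_b / i_a))


# Approximate red-curve current at V_GS = -1 V, as stated in the description.
red_at_minus_1v = 1e-11  # A/µm

# HYPOTHETICAL blue-curve current at the same gate voltage (assumption for
# illustration only; the description does not state this value).
blue_at_minus_1v_assumed = 1e-6  # A/µm

separation = current_separation_decades(red_at_minus_1v, blue_at_minus_1v_assumed)
```

Under the assumed blue-curve value, the two polarization states would differ by about five decades of off-state current near -1 V.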
- Fig. 21 shows a semiconductor package using a memory wafer 2102 and an application wafer 2104 formed into a wafer pair 2106 in accordance with an embodiment of the present disclosure.
- the semiconductor package 2100 includes multiple wafer pairs 2106, 2108, 2110 stacked vertically and bonded together.
- the wafer pair 2106 comprises a memory wafer 2102 bonded face-to-face to an application wafer 2104.
- the memory wafer 2102 contains modules 2122 grouped into modules groups 2124.
- Each modules group 2124 includes multiple modules, such as first module 2126 and second module 2128.
- Each module 2122 has an independent read port (not shown) to allow concurrent reading of data from the modules.
- the modules group 2124 shares a common write port with write peripheral to write data to the modules.
- the write peripheral contains circuitry such as drivers, decoders, buffers and controllers to facilitate writing data to the modules.
- the application wafer 2104 contains processing elements (not shown) to operate on data read from the memory wafer 2102.
- the wafers 2102, 2104 may be connected through fine-pitch interconnects like hybrid bonds 2134 to enable high-bandwidth local data transfer between the processing logic and memory modules.
- An interposer or interface may be integrated between wafer pair 2106 and the next wafer pair 2108 to facilitate communication of signals/power between the vertically stacked wafers.
- Conductive through-silicon vias TSVs 2120 provide vertical connectivity across the wafer stack.
- a heat sink thermal material layer 2118 may be inserted between wafer pairs to dissipate heat. Additional wafer pairs like 2110 can be incorporated for increased functionality, storage capacity, etc. Redistribution layers may be included to route signals/power. The redistribution layers (RDL) on each wafer can be used for routing of electrical signals and/or power throughout the wafer stack. Each RDL may be composed of one or multiple metal layers, separated by dielectric materials (in some embodiments). The metal layers may be made of copper, aluminum, or other metals and/or alloys.
- the RDL may include bypass capacitors integrated on the RDL. These capacitors may be placed to stabilize the power supply by filtering out noise and providing a reservoir of charge to meet transient current demands. Placing these capacitors directly on the RDL reduces the distance between the charge reservoir and the active circuits, thereby minimizing inductive and resistive losses.
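As a rough illustration of how a bypass capacitor acts as a charge reservoir, the capacitance needed to hold a supply droop within a limit during a current transient can be estimated from C = I·Δt/ΔV. The sketch below applies that standard relation. All numeric values are hypothetical and are not taken from the disclosure.

```python
def required_bypass_capacitance(transient_current_a: float,
                                transient_duration_s: float,
                                allowed_droop_v: float) -> float:
    """Estimate the capacitance (farads) needed so that supplying
    `transient_current_a` for `transient_duration_s` drops the rail
    by no more than `allowed_droop_v` (C = I * dt / dV)."""
    if allowed_droop_v <= 0:
        raise ValueError("allowed droop must be positive")
    return transient_current_a * transient_duration_s / allowed_droop_v


# HYPOTHETICAL example values (assumptions, not from the source):
# a 2 A load step lasting 1 ns with a 50 mV droop budget.
capacitance_f = required_bypass_capacitance(2.0, 1e-9, 0.05)
```

With these assumed numbers the estimate is about 40 nF. A real design would also account for the series inductance and resistance that the on-RDL placement is meant to reduce.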
- the RDL may incorporate power supply Through-Silicon Vias (TSVs) on the back-side of the wafer.
- These TSVs are vertical interconnects that pass through the silicon wafer, providing a direct and low-resistance path for power supply connections from the top to the bottom of the wafer stack, for example.
- the back-side TSVs can be used in multi-wafer stacks to reduce the power delivery path length.
- the design of the RDL may be laid out to minimize crosstalk and signal degradation to maintain high-speed data transfer rates.
- the signal routing on the RDL may be complemented by the use of shielding techniques, such as the incorporation of ground planes and the strategic placement of vias, to further enhance signal integrity, in some embodiments.
- the RDL may be formed using processes such as photolithography for patterning, electroplating for metal deposition, and chemical-mechanical polishing (CMP), among others.
- the memory wafer 2102 comprises multiple modules groups, including a modules group 2124.
- Each modules group, such as modules group 2124, contains multiple modules, including a first module 2126 and a second module 2128.
- the exact number of modules in each modules group may vary in different embodiments.
- the modules group 2124 may have a shared write port that allows data to be written to all modules within the group via a write peripheral.
- the write port utilizes a write address space that spans across the modules in modules group 2124.
- the shared write port may be facilitated by the write peripheral, which includes necessary circuitry like drivers, decoders, buffers, and controllers to ensure data is appropriately written to the modules.
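A write address space that spans the modules of a group implies some decoding step that turns a group-level write address into a target module and a location inside it. The following is a minimal sketch of one such decoder. It assumes a contiguous (non-interleaved) mapping with equally sized modules, which the description does not specify, and the names `decode_write_address` and `DecodedAddress` are illustrative.

```python
from typing import NamedTuple


class DecodedAddress(NamedTuple):
    module_index: int  # which module in the group receives the write
    offset: int        # word offset inside that module


def decode_write_address(address: int,
                         words_per_module: int,
                         num_modules: int) -> DecodedAddress:
    """Map a group-wide write address onto (module, offset).

    ASSUMPTION: modules are equally sized and laid out back to back in
    the write address space; an interleaved mapping is equally plausible.
    """
    if words_per_module <= 0 or num_modules <= 0:
        raise ValueError("sizes must be positive")
    if not 0 <= address < words_per_module * num_modules:
        raise IndexError("address outside the group's write address space")
    module_index, offset = divmod(address, words_per_module)
    return DecodedAddress(module_index, offset)
```

For a two-module group with 1024 words per module, address 1024 would decode to the first word of the second module under this assumed layout.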
- Each module in the modules group 2124, such as the first module 2126 and the second module 2128, has its own dedicated read port.
- the first module 2126 has a first read port that allows data to be read specifically from the first module 2126.
- the second module 2128 has a second read port for reading data only from the second module 2128.
- the dedicated read ports enable concurrent data reads across multiple modules. For instance, data can be read independently from the first module 2126 via the first read port concurrently with data being read from the second module 2128 via the second read port.
- the memory wafer 2102 may contain multiple such modules groups 2124 with dedicated read ports per module and shared write ports per group to simultaneously enable writes to the group and parallel reads from individual modules. This architecture provides flexibility and performance.
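The shared-write, independent-read arrangement can be modeled in software as one serialized write path per group and one read method per module. The sketch below is a behavioral illustration only: the class names are hypothetical, the lock stands in for the single write port, and Python threads only loosely approximate the hardware concurrency described above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Module:
    """A storage module with its own dedicated read port."""

    def __init__(self, size: int) -> None:
        self._cells = [0] * size

    def read(self, offset: int) -> int:
        # Dedicated read port: no state is shared with other modules.
        return self._cells[offset]

    def _store(self, offset: int, value: int) -> None:
        self._cells[offset] = value


class ModulesGroup:
    """A group of modules that share a single write port."""

    def __init__(self, num_modules: int, words_per_module: int) -> None:
        self.modules = [Module(words_per_module) for _ in range(num_modules)]
        self._write_port = threading.Lock()  # models the one shared write port

    def write(self, module_index: int, offset: int, value: int) -> None:
        # All writes to the group are serialized through the shared port.
        with self._write_port:
            self.modules[module_index]._store(offset, value)


group = ModulesGroup(num_modules=2, words_per_module=4)
group.write(0, 1, 0xAA)
group.write(1, 1, 0xBB)

# Read both modules at the same time through their own read ports.
with ThreadPoolExecutor(max_workers=2) as pool:
    first = pool.submit(group.modules[0].read, 1)
    second = pool.submit(group.modules[1].read, 1)
    results = (first.result(), second.result())
```

The point of the model is that reads never take the write lock, so a read of one module cannot be delayed by a read of another.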
- the memory wafer 2102 is directly bonded to the application wafer 2104 to facilitate high-bandwidth communication between the processing elements on the application wafer 2104 and the data storage offered by the modules on the memory wafer 2102.
- the application wafer 2104 may contain various integrated circuit components to enable processing and communication with the modules on the memory wafer 2102.
- the application wafer 2104 includes processing elements such as CPUs, GPUs, DSPs, machine learning accelerators, and other logic circuits to manipulate data stored on and retrieved from the memory wafer 2102. These processing elements may be interconnected through an on-chip network or bus architecture to coordinate data transfers.
- the application wafer 2104 may feature an interface to an application programming interface (API) 2114 to facilitate interaction with the memory contents.
- This API 2114 may abstract the underlying memory implementation and provide a consistent interface through which applications can store data to and access data from the package 2100, the application wafer 2104, the memory wafer 2102, one or more of the wafer pairs 2106, 2108, 2110, or some combination thereof.
- the API 2114 may be a CXL or PCI interface, for example.
- the application wafer 2104 may also comprise data routing circuitry such as the DRAM data input 2112 and DRAM data output 2116. These components manage the flow of data in and out of the wafer pair 2106, enabling communication with external systems. In some embodiments, the DRAM data input 2112 and/or the DRAM data output 2116 can also access the memory wafer 2102.
- the application wafer 2104 contains voltage regulation and power delivery subsystems to provide stable power to the wafer pair 2106. It may also incorporate test, characterization, and built-in self-test (BIST) circuits to validate functionality.
- the application wafer 2104 can have through-silicon vias (TSVs) 2120 and/or redistribution layers to assist with vertical system integration, thermal management, and intra-package signaling. Additional wafers may be stacked above wafer pair 2106 and interconnected using TSVs 2120.
- the semiconductor package 2100 includes multiple wafer pairs, one of which is designated as wafer pair 2106.
- Wafer pair 2106 comprises a memory wafer 2102 bonded to an application wafer 2104.
- the memory wafer 2102 includes modules, such as module 2122, arranged into modules groups like modules group 2124.
- the modules serve as data storage components, potentially utilizing non-volatile memory technologies.
- the modules group 2124 features dedicated data input/output capabilities. For example, as previously mentioned, in some embodiments a DRAM data input 2112 allows external input of data to the modules, while a DRAM data output 2116 enables data retrieval from the modules. In some embodiments, the DRAM data input 2112 and the DRAM data output 2116 instead interface to the application wafer 2104.
- the application wafer 2104 may contain various processing elements and routing circuitry to facilitate computation using data stored in the bonded memory wafer 2102.
- the application wafer 2104 could integrate components like CPUs, GPUs, AI accelerators, FPGAs, and other logic blocks that need to access large volumes of data.
- the application programming interface 2114 provides the signaling and protocols enabling communication between the application wafer 2104 and memory wafer 2102 to facilitate data transmission across the wafer pair 2106.
- Heat sink thermal material 2118 may be incorporated between wafer pairs like 2106, 2108, and 2110 for thermal management.
- Through-silicon vias 2120 also enable high-density signaling across vertically stacked wafers. Together, these techniques facilitate wafer-scale heterogeneous integration to create computing systems combining processing and storage.
- the wafer pair 2106 is architected to enable customized matching of compute engines (application wafer 2104) to large data repositories (memory wafer 2102), while optimizing metrics like throughput, latency, power efficiency, and reliability through advanced integration. Multiple wafer pairs can be interconnected across device layers to achieve increased capabilities. In some implementations, wafer pair 2106 may be designed based on a specific workload’s computational requirements and data access patterns. Overall, wafer pair 2106 facilitates an efficient hardware mapping between processing logic and stored data.
- the DRAM data input 2112 is configured to receive input data that can be written to the semiconductor package 2100.
- the data received by the DRAM data input 2112 may originate from various sources external to the package 2100, such as data buses, processing elements, sensor inputs, etc., that are coupled to a DRAM memory bank.
- the DRAM data input 2112 may be implemented as a high-speed interface port supporting standards such as DDR, GDDR, HBM, or custom protocols optimized for the target application.
- the specific signaling schemes, bus widths, clock rates and other physical layer attributes can be tailored to meet bandwidth, power, and latency requirements.
- the DRAM data input 2112 can feature dedicated connections for each module or subsystem within the package 2100.
- separate links may exist for the three modules 2102 shown connected to corresponding processing elements in the application wafer 2104. This enables concurrent data transfers to different memories without contention.
- the links may utilize various topologies such as point-to-point, daisy-chained, muxed or switched interconnects as per design requirements.
- the connectivity between the DRAM data input 2112 and the internal DRAMs in package 2100 may involve intermediate buffers, queues, control logic or internal data networks like crossbar or mesh topologies to enable efficient routing, scheduling and quality-of-service minimums.
- This infrastructure transparently manages the data movement from input to destination DRAM modules.
- the DRAM data input 2112 may also include circuitry for serializing/deserializing, encoding/decoding, encrypting/decrypting, and integrity checking to transform the incoming data streams appropriately before routing them towards the DRAM storage elements.
- Additional features for handshake protocols, flow control, quality-of-service and traffic class differentiation may also be implemented.
- the DRAM data input 2112 can incorporate redundancy and fail-over mechanisms. This provides resiliency against faults in the communication paths to any specific DRAM module. Multiple input ports with routing flexibility help facilitate memory access even in degraded conditions.
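One simple way to picture the fail-over behavior is a priority list of input ports in which the first healthy port is used. The sketch below illustrates that idea only. The port names, the health map, and the function `select_input_port` are hypothetical, and the disclosure does not state which selection policy is used.

```python
from typing import Mapping, Sequence


def select_input_port(preferred_ports: Sequence[str],
                      healthy: Mapping[str, bool]) -> str:
    """Return the first port in priority order that is marked healthy.

    ASSUMPTION: a static priority order with a per-port health flag;
    this is one possible fail-over policy, not the disclosed one.
    """
    for port in preferred_ports:
        if healthy.get(port, False):
            return port
    raise RuntimeError("no healthy input port available")


# HYPOTHETICAL port names and fault state for illustration.
ports = ["port0", "port1", "port2"]
health = {"port0": False, "port1": True, "port2": True}
chosen = select_input_port(ports, health)
```

Here the primary path `port0` is marked faulty, so traffic falls back to `port1`, which keeps the target DRAM module reachable in a degraded condition.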
- the application programming interface (API) 2114 provides an interface for the application wafer 2104 to communicate with and access the memory on the memory wafer 2102.
- the API 2114 abstracts the underlying memory implementation and provides a simplified way for the application to store and retrieve data.
- the API 2114 can write data to the memory in the memory wafer 2102.
- the API 2114 may provide functions such as OpenMemory, ReadMemory, WriteMemory, and CloseMemory.
- the OpenMemory function initializes access to a memory module within one of the modules groups 2124 on the memory wafer 2102. It takes as parameters information such as the module ID or address range to open.
- the ReadMemory and WriteMemory functions actually read and write data to the opened memory module. They take parameters such as a memory address, data buffer, and length.
- the CloseMemory function deinitializes access to the memory module when it is no longer needed.
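The four functions named above suggest an open/read/write/close lifecycle. The following is a minimal in-memory sketch of such an interface, written with Python naming (`open_memory` for OpenMemory, and so on). The class `WaferMemoryApi`, the integer handle, the byte-oriented addressing, and the error behavior are all assumptions made for illustration; the disclosure gives only the function names and their general parameters.

```python
class WaferMemoryApi:
    """Toy stand-in for the API 2114, backed by bytearrays."""

    def __init__(self, num_modules: int, bytes_per_module: int) -> None:
        self._storage = {m: bytearray(bytes_per_module) for m in range(num_modules)}
        self._open: set[int] = set()

    def open_memory(self, module_id: int) -> int:
        """Initialize access to a module and return a handle (assumed to be the ID)."""
        if module_id not in self._storage:
            raise KeyError(f"unknown module {module_id}")
        self._open.add(module_id)
        return module_id

    def write_memory(self, handle: int, address: int, data: bytes) -> int:
        """Write `data` at `address` in the opened module; return bytes written."""
        self._check(handle, address, len(data))
        self._storage[handle][address:address + len(data)] = data
        return len(data)

    def read_memory(self, handle: int, address: int, length: int) -> bytes:
        """Read `length` bytes starting at `address` from the opened module."""
        self._check(handle, address, length)
        return bytes(self._storage[handle][address:address + length])

    def close_memory(self, handle: int) -> None:
        """Deinitialize access to the module."""
        self._open.discard(handle)

    def _check(self, handle: int, address: int, length: int) -> None:
        if handle not in self._open:
            raise RuntimeError("module is not open")
        if address < 0 or length < 0 or address + length > len(self._storage[handle]):
            raise IndexError("access outside the module")


api = WaferMemoryApi(num_modules=2, bytes_per_module=16)
handle = api.open_memory(1)
api.write_memory(handle, 4, b"abc")
readback = api.read_memory(handle, 4, 3)
api.close_memory(handle)
```

The caller works only with a module ID, an address, and a buffer, which matches the stated goal of hiding port selection and low-level signaling behind the interface.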
- the API 2114 may handle various interactions with the memory wafer 2102. This includes interfacing with any write ports and read ports to access the desired memory module.
- the API 2114 can utilize the appropriate ports, commands, and protocols to activate the target module, initiating row/column decoding, sense amplification, and data transfers.
- the modular memory architecture maximizes parallelism and throughput.
- the API 2114 allows the application wafer 2104 flexibility in reading, writing, and accessing groups of memory modules in the memory wafer 2102 without having to handle low-level responsibilities. In this specific embodiment, the application sees a high-level interface exposing the memory storage capability. The API 2114 may therefore serve as an intermediary ensuring efficient inter- or intra-wafer communication.
- the DRAM data output 2116 refers to the data output connections to external DRAM (Dynamic Random Access Memory) components in communication with the semiconductor package 2100. The DRAM data outputs 2116 provide a means for reading out stored data from the memory arrays and transmitting that information externally for further processing.
- the DRAM data outputs 2116 may consist of a series of conductive traces, contact pads, solder bumps or pillars formed on the surface of the memory wafer 2102, the application wafer 2104, or the package 2100. These conductive elements can link to internal and/or external sense amplifiers, buffers and peripheral control logic.
- the DRAM data outputs 2116 may also connect to complementary interfaces on the application wafer 2104, allowing data transfer between the memory and application dies in a face-to-face configuration.
- High-density microbumps or hybrid bonding may be employed to reduce parasitics and enable wide parallel data buses capable of high bandwidth operation.
- the DRAM data outputs 2116 may have a predetermined pitch and size compatible with wafer-scale production and established interface standards. This may ensure interoperability with application dies from various sources and facilitate modular system design. Self-aligning bonding processes may provide precise alignment between the fine-pitch outputs 2116 and the application die contacts.
- the Heat Sink Thermal Material layer 2118 is an optional layer that may be included between wafer pair 2106 and wafer pair 2108.
- the Heat Sink Thermal Material layer 2118 serves to facilitate heat dissipation and cooling between the stacked wafer pairs 2106 and 2108.
- the Heat Sink Thermal Material layer 2118 may be implemented using materials with high thermal conductivity such as aluminum, copper, graphite sheets, thermal adhesives, or thermal greases/compounds.
- the thickness and coverage area of the Heat Sink Thermal Material layer 2118 can be adjusted to provide the desired level of heat spreading and cooling capacity.
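The effect of thickness and coverage area on through-plane heat flow can be estimated with the one-dimensional conduction relation R = t / (k·A). The sketch below applies that textbook formula. The layer thickness, the coverage area, and the choice of copper are hypothetical example inputs, and the model ignores interface resistance and lateral spreading.

```python
import math


def conduction_resistance(thickness_m: float,
                          conductivity_w_per_mk: float,
                          area_m2: float) -> float:
    """Through-plane thermal resistance in K/W of a uniform slab (R = t / (k * A))."""
    if conductivity_w_per_mk <= 0 or area_m2 <= 0:
        raise ValueError("conductivity and area must be positive")
    return thickness_m / (conductivity_w_per_mk * area_m2)


# HYPOTHETICAL example: a 100 µm copper layer covering a 300 mm round wafer.
# 400 W/(m*K) is an approximate room-temperature conductivity for copper.
wafer_area_m2 = math.pi * (0.300 / 2) ** 2
layer_resistance = conduction_resistance(100e-6, 400.0, wafer_area_m2)
```

Halving the thickness or doubling the coverage area halves the resistance in this model, which is the trade-off the thickness and area adjustment refers to.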
- the Heat Sink Thermal Material layer 2118 may contain microstructures, porous features, fins, pins, or channels to increase the surface area for more effective heat transfer. These enhancements to the geometry of the layer can improve lateral and vertical heat conduction.
- the Heat Sink Thermal Material layer 2118 absorbs heat generated during the operation of wafer pair 2106 and spreads this thermal energy laterally to prevent localized hot spots. The heat is then conducted in the vertical direction towards wafer pair 2108 which can serve as a heat sink, facilitating cooling through greater heat dissipation across the larger surface area of the wafer pair 2108. This thermal management approach lowers the overall temperature rise within the semiconductor package 2100, enabling reliable operation of the stacked devices.
- the inclusion of the Heat Sink Thermal Material layer 2118 is optional and may be advantageous in certain configurations of the semiconductor package 2100 where cooling demands are greater due to increased power density or device integration. By assisting in heat diffusion and cooling, the layer helps mitigate potential temperature build-up issues within the package.
- the semiconductor package 2100 may include through silicon vias (TSVs) 2120 to facilitate vertical interconnections between the multiple wafer pairs 2106, 2108, 2110.
- TSVs 2120 are electrical connections that pass vertically through a wafer, allowing signals and power to be transmitted between vertically stacked wafers.
- the TSVs 2120 are located around the periphery of the wafer pairs 2106, 2108, 2110. The peripheral placement allows the central areas of the wafers to be dedicated to active circuits like the modules groups 2124. In other embodiments, the TSVs 2120 may be positioned towards the centers of the wafer pairs to optimize signal routing.
- the TSVs 2120 enable high-density 3D integration of multiple dies or chiplets in the semiconductor package 2100.
- the TSVs 2120 may have straight, tapered, or variable diameters along their depths. They may be lined with insulating materials like silicon dioxide to prevent electrical leakage. The TSVs 2120 may then be filled with conductive materials such as copper or tungsten to carry signals and power between layers.
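For a straight, fully filled via, the DC resistance follows R = ρ·L / (π·r²). The sketch below evaluates that relation for a copper-filled TSV. The via diameter and depth are hypothetical, the resistivity is the approximate room-temperature bulk value for copper, and the model ignores the insulating liner, tapering, and high-frequency effects.

```python
import math


def tsv_resistance(resistivity_ohm_m: float,
                   length_m: float,
                   diameter_m: float) -> float:
    """DC resistance in ohms of a straight cylindrical via (R = rho * L / (pi * r^2))."""
    if diameter_m <= 0:
        raise ValueError("diameter must be positive")
    radius = diameter_m / 2
    return resistivity_ohm_m * length_m / (math.pi * radius ** 2)


COPPER_RESISTIVITY = 1.68e-8  # ohm*m, approximate bulk value near 20 °C

# HYPOTHETICAL geometry: 5 µm diameter, 50 µm deep.
example_resistance = tsv_resistance(COPPER_RESISTIVITY, 50e-6, 5e-6)
```

With the assumed geometry the result is on the order of tens of milliohms per via, which is why many TSVs in parallel can form a low-resistance power path through the stack.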
- the TSVs 2120 can carry various signals between the wafer pairs 2106, 2108, 2110 including data, address, control, and clock signals. This allows different dies in separate wafer pairs to coordinate their operation.
- the TSVs also distribute power across the stack, eliminating, reducing, or supplementing the need for external power delivery, in some specific embodiments.
- the TSVs 2120 enable communication between the memory wafer 2102 containing the modules groups 2124 and the application wafer 2104 which utilizes the memory.
- the TSVs 2120 provide power delivery and data transfer pathways between the stacked wafers. This allows high-bandwidth signaling with reduced latency between the computation and memory elements.
- TSVs 2120 can be optimized to meet thermal, electrical, and mechanical constraints. Additional circuitry may be embedded within the TSV structure for performance monitoring or redundancy.
- the semiconductor package 2100 includes multiple wafer pairs 2106, 2108, 2110 stacked vertically.
- the wafer pair 2106 comprises a memory wafer 2102 bonded to an application wafer 2104.
- the memory wafer 2102 contains multiple modules groups 2124 including modules such as module 2122.
- the modules 2122 are arranged in an array configuration across the memory wafer 2102. Each module 2122 functions as an independent data storage unit with dedicated read/write control circuitry and signal ports.
- the module 2122 is implemented as a microvault containing non-volatile memory cells optimized for high density and fast, parallel data access.
- the microvault design utilizes vertical 3D integration techniques to stack multiple layers of memory arrays, achieving high capacity within a compact footprint.
- the module 2122 features independent read peripheral circuitry and a dedicated read port (not shown) to enable concurrent, contention-free reads by multiple processing elements on the application wafer 2104.
- the read peripheral circuitry facilitates the transfer of stored data through the read port to the bonded application wafer 2104.
- the module 2122 contains write control circuitry coupled to a shared write port 2112 on the memory wafer 2102.
- the shared write port 2112 allows an external processor to write data to any module 2122 within the modules group 2124 by providing a common write interface. Appropriate write addressing schemes ensure data is routed to the correct module destination.
- the module 2122 may exchange data signals with the application wafer 2104 through fine-pitch vertical connections like micro-bumps or through-silicon vias. Advanced integration techniques enable direct high-bandwidth links between each module 2122 and its corresponding processing element on the application wafer 2104.
- the module 2122 provides an independently-accessible data storage unit optimized for fast parallel reads while also offering external write accessibility through shared write circuitry and signaling ports. Multiple similar modules can be integrated efficiently to construct dense, high-performance memory systems coupled to processing logic.
- the semiconductor package 2100 includes a modules group 2124 on the memory wafer 2102.
- the modules group 2124 comprises a plurality of modules, including a first module 2126 and a second module 2128.
- the modules group 2124 has a shared write port (not shown) that is configured to write data to the plurality of modules in the group. This allows a common write logic system, potentially involving high voltage circuitry, to write data that can be shared across the modules.
- Each module in the modules group 2124 also has an independent read port to enable concurrent reads from the modules.
- the first module 2126 has a first read port and the second module 2128 has a second read port. This allows each module to be read independently without contention.
- the modules group 2124 may be organized into partitions, each partition having its own read peripheral circuitry and independent clocking.
- the number of modules in each partition can vary. In some cases, each partition may correspond to a single module.
- the modules within the modules group 2124 can be implemented with various memory technologies, including but not limited to FeFET, ReRAM, SOT and STT.
- the modules can also have different capacities and speeds as suited for the application.
- the independent read ports of the modules in modules group 2124 may interface with processing elements on the application wafer 2104. This allows each processing element to have dedicated access to a module, enabling highly parallel and interference-free operation.
- the modules group 2124, through its shared write port and independent read ports, facilitates an efficient manycore architecture with a shared write space and independent read spaces. This allows data to be updated globally while being read locally without contention.
- a semiconductor package comprising: a memory wafer having a plurality of modules groups including a modules group, the modules group having: a plurality of modules including a first module and a second module; a write port configured to write to the modules group; a first read port configured to read from the first module; and a second read port configured to read from the second module; and an application wafer bonded to the memory wafer, the application wafer in operative communication with the plurality of modules groups.
- the application wafer includes at least one of an FPGA core, a microcontroller, a GPU processing element, an ASIC, a CPU core, a neural network accelerator, a digital signal processor (DSP), a reconfigurable processing unit, a machine learning inference engine, a tensor processing unit (TPU), a system-on-chip (SoC), a network processing unit (NPU), an embedded processor, a security module, and a programmable logic device.
- each write-group of the at least one write-group includes a respective read port and a common write port.
- the at least one write-group includes a first write group and a second write group, wherein the first write group comprises about a first half of the plurality of modules groups and the second write group comprises about a second half of the plurality of modules groups.
- each module of the plurality of modules is a microvault.
- each module of the plurality of modules includes an independent read port for concurrent reading via a respective independent read port of any of the plurality of modules whereby the first read port is configured to read from the first module concurrently with another read of the second module using the second read port.
- each module of the plurality of modules is configured to include fault-tolerance routing to thereby enable a module of the plurality of modules to route data around a failed module within the memory wafer.
- each module of the plurality of modules is configured to receive configuration data corresponding to a spatial dataflow to thereby optimize an execution of an AI inference task.
- the first read port is disposed on a surface of the memory wafer, the first module comprising: a memory array comprising a plurality of memory unit cells arranged within the first module; and a read peripheral circuitry configured to read data stored within the memory array via the first read port.
- the semiconductor package according to aspect 44 further comprising a write peripheral circuitry connected to the write port on the surface of the memory wafer, wherein: the write peripheral circuitry defines a first footprint on the surface, the first footprint does not overlap with a footprint of the memory array and the first footprint does not overlap with a footprint of the first read port, and the write port defines a second footprint on the surface, the second footprint does not overlap with the footprint of the memory array and does not overlap with the footprint of the first read port.
- a method of forming a semiconductor package comprising: forming a memory wafer having a plurality of modules groups including a modules group, the modules group having: a plurality of modules including a first module and a second module; a write port configured to write to the modules group; a first read port configured to read from the first module; and a second read port configured to read from the second module; and bonding an application wafer to the memory wafer, the application wafer being in operative communication with the plurality of modules groups.
- the application wafer includes at least one of an FPGA core, a microcontroller, a GPU processing element, an ASIC, a CPU core, a neural network accelerator, a digital signal processor (DSP), a reconfigurable processing unit, a machine learning inference engine, a tensor processing unit (TPU), a system-on-chip (SoC), a network processing unit (NPU), an embedded processor, a security module, and a programmable logic device.
- each module of the plurality of modules is a microvault.
- further comprising forming each module of the plurality of modules to include an independent read port for concurrent reading via a respective independent read port of any of the plurality of modules, whereby the first read port is configured to read from the first module concurrently with another read of the second module using the second read port.
- further comprising forming each module of the plurality of modules to include fault-tolerance routing to thereby enable a module of the plurality of modules to route data around a failed module within the memory wafer.
- a method of using a semiconductor package comprising: writing to a modules group using a write port, the modules group having a plurality of modules including a first module and a second module; reading from the first module using a first read port; reading from the second module using a second read port; and communicating operatively between a memory wafer having the modules group and an application wafer bonded to the memory wafer.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Power Engineering (AREA)
- Microelectronics & Electronic Packaging (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Semiconductor Integrated Circuits (AREA)
- Semiconductor Memories (AREA)
- Non-Volatile Memory (AREA)
- Static Random-Access Memory (AREA)
- Medicinal Preparation (AREA)
- Testing Or Measuring Of Semiconductors Or The Like (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
- Thin Film Transistor (AREA)
Abstract
A semiconductor package includes a memory wafer and an application wafer bonded together to create a data processing and memory storage system. The memory wafer hosts multiple modules groups, each comprising several modules, a shared write port, and independent read ports, enabling concurrent read operations. The application wafer, which is in operative communication with the memory wafer, may include various processing elements such as FPGA cores, GPUs, CPUs, and neural network accelerators. Both wafers may have coextensive outer perimeters, support configurations such as rectangular, square, and round shapes, and provide substantial memory capacity. The package may use a bonding technology without auxiliary memory to reduce thermal and electrical resistance, and incorporates through-silicon vias (TSVs) for robust interconnectivity.
Applications Claiming Priority (14)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363518988P | 2023-08-11 | 2023-08-11 | |
| US63/518,988 | 2023-08-11 | | |
| US202363602737P | 2023-11-27 | 2023-11-27 | |
| US202363602733P | 2023-11-27 | 2023-11-27 | |
| US63/602,733 | 2023-11-27 | | |
| US63/602,737 | 2023-11-27 | | |
| US202463567649P | 2024-03-20 | 2024-03-20 | |
| US63/567,649 | 2024-03-20 | | |
| US202463637742P | 2024-04-23 | 2024-04-23 | |
| US202463637764P | 2024-04-23 | 2024-04-23 | |
| US63/637,742 | 2024-04-23 | | |
| US63/637,764 | 2024-04-23 | | |
| US202463674471P | 2024-07-23 | 2024-07-23 | |
| US63/674,471 | 2024-07-23 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025038372A1 (fr) | 2025-02-20 |
Family
ID=92538715
Family Applications (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/041403 Pending WO2025038372A1 (fr) | 2023-08-11 | 2024-08-08 | System, method, and apparatus for wafer-scale memory |
| PCT/US2024/041396 Pending WO2025038368A1 (fr) | 2023-08-11 | 2024-08-08 | Integrated circuit with microvault-type memories |
| PCT/US2024/041390 Pending WO2025038365A1 (fr) | 2023-08-11 | 2024-08-08 | Integrated circuit having memories and a shared write port |
| PCT/US2024/041395 Pending WO2025042587A1 (fr) | 2023-08-11 | 2024-08-08 | Assembly having a face-to-face bonded microchip |
| PCT/US2024/041393 Pending WO2025038367A1 (fr) | 2023-08-11 | 2024-08-08 | System and method for correct-by-construction timing closure for face-to-face bonding |
| PCT/US2024/041400 Pending WO2025038369A1 (fr) | 2023-08-11 | 2024-08-08 | FeFET structures using amorphous oxide semiconductor channels on integrated circuits |
| PCT/US2024/041392 Pending WO2025038366A1 (fr) | 2023-08-11 | 2024-08-08 | Method and system for testing stacked face-to-face bonded microchips |
Country Status (2)
| Country | Link |
|---|---|
| TW (7) | TW202527683A (fr) |
| WO (7) | WO2025038372A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190325950A1 (en) * | 2018-04-24 | 2019-10-24 | Arm Limited | Multi-Port Memory Circuitry |
| WO2021028723A2 (fr) * | 2019-08-13 | 2021-02-18 | Neuroblade Ltd. | Memory-based processors |
| US20220107888A1 (en) * | 2016-09-27 | 2022-04-07 | Integrated Silicon Solution, (Cayman) Inc. | Heuristics for selecting subsegments for entry in and entry out operations in an error cache system with coarse and fine grain segments |
| US20230059491A1 (en) * | 2019-05-31 | 2023-02-23 | Kepler Computing Inc. | 3d stacked compute and memory with copper-to-copper hybrid bond |
Family Cites Families (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2509075B1 (fr) * | 2006-12-14 | 2019-05-15 | Rambus Inc. | Multi-die memory device |
| US7978721B2 (en) * | 2008-07-02 | 2011-07-12 | Micron Technology Inc. | Multi-serial interface stacked-die memory architecture |
| JP2011081732A (ja) * | 2009-10-09 | 2011-04-21 | Elpida Memory Inc | Semiconductor device, adjustment method thereof, and data processing system |
| US8547774B2 (en) * | 2010-01-29 | 2013-10-01 | Mosys, Inc. | Hierarchical multi-bank multi-port memory organization |
| JP2012208975A (ja) * | 2011-03-29 | 2012-10-25 | Renesas Electronics Corp | Semiconductor device |
| KR101466013B1 (ko) * | 2012-08-13 | 2014-11-27 | 한국표준과학연구원 | Amorphous oxide semiconductor layer and thin film transistor including the same |
| US10289604B2 (en) * | 2014-08-07 | 2019-05-14 | Wisconsin Alumni Research Foundation | Memory processing core architecture |
| CN106250321B (zh) * | 2016-07-28 | 2019-03-01 | 盛科网络(苏州)有限公司 | Data processing method and data processing system for 2R1W memory |
| US10586786B2 (en) * | 2016-10-07 | 2020-03-10 | Xcelsis Corporation | 3D chip sharing clock interconnect layer |
| WO2018125118A1 (fr) * | 2016-12-29 | 2018-07-05 | Intel Corporation | Back-end ferroelectric field-effect transistor devices |
| WO2018236353A1 (fr) * | 2017-06-20 | 2018-12-27 | Intel Corporation | Embedded non-volatile memory based on ferroelectric field-effect transistors |
| WO2019066948A1 (fr) * | 2017-09-29 | 2019-04-04 | Intel Corporation | Double-gated ferroelectric field-effect transistor |
| US11687472B2 (en) * | 2020-08-20 | 2023-06-27 | Global Unichip Corporation | Interface for semiconductor device and interfacing method thereof |
| US20220352379A1 (en) * | 2021-04-29 | 2022-11-03 | Taiwan Semiconductor Manufacturing Company Limited | Ferroelectric memory devices having improved ferroelectric properties and methods of making the same |
2024
- 2024-08-08 TW TW113129743A patent/TW202527683A/zh unknown
- 2024-08-08 WO PCT/US2024/041403 patent/WO2025038372A1/fr active Pending
- 2024-08-08 TW TW113129768A patent/TW202526570A/zh unknown
- 2024-08-08 TW TW113129781A patent/TW202523076A/zh unknown
- 2024-08-08 WO PCT/US2024/041396 patent/WO2025038368A1/fr active Pending
- 2024-08-08 TW TW113129713A patent/TW202526960A/zh unknown
- 2024-08-08 WO PCT/US2024/041390 patent/WO2025038365A1/fr active Pending
- 2024-08-08 WO PCT/US2024/041395 patent/WO2025042587A1/fr active Pending
- 2024-08-08 TW TW113129693A patent/TW202531863A/zh unknown
- 2024-08-08 WO PCT/US2024/041393 patent/WO2025038367A1/fr active Pending
- 2024-08-08 TW TW113129726A patent/TW202533421A/zh unknown
- 2024-08-08 WO PCT/US2024/041400 patent/WO2025038369A1/fr active Pending
- 2024-08-08 WO PCT/US2024/041392 patent/WO2025038366A1/fr active Pending
- 2024-08-08 TW TW113129719A patent/TW202527723A/zh unknown
Also Published As
| Publication number | Publication date |
|---|---|
| TW202531863A (zh) | 2025-08-01 |
| TW202526570A (zh) | 2025-07-01 |
| TW202533421A (zh) | 2025-08-16 |
| TW202527723A (zh) | 2025-07-01 |
| WO2025038368A1 (fr) | 2025-02-20 |
| WO2025038365A1 (fr) | 2025-02-20 |
| WO2025038367A1 (fr) | 2025-02-20 |
| TW202527683A (zh) | 2025-07-01 |
| TW202526960A (zh) | 2025-07-01 |
| TW202523076A (zh) | 2025-06-01 |
| WO2025038366A1 (fr) | 2025-02-20 |
| WO2025042587A1 (fr) | 2025-02-27 |
| WO2025038369A1 (fr) | 2025-02-20 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US11552056B2 (en) | Three-dimensional memory device with three-dimensional phase-change memory | |
| TWI740319B (zh) | Semiconductor devices having programmable logic device and heterogeneous memories and methods for forming the same | |
| US11024600B2 (en) | Unified semiconductor devices having programmable logic device and heterogeneous memories and methods for forming the same | |
| US12219780B2 (en) | High-density memory device with planar thin film transistor (TFT) selector and methods for making the same | |
| TWI709139B (zh) | In-memory computing in three-dimensional memory devices | |
| WO2020211322A1 (fr) | Unified semiconductor devices having programmable logic device and heterogeneous memories and methods for forming the same | |
| CN104659030B (zh) | Electronic device | |
| EP3891807A1 (fr) | Unified semiconductor devices having programmable logic device and heterogeneous memories and methods for forming the same | |
| TW202042379A (zh) | Bonded memory devices having flash memory controller and fabrication and operation methods thereof | |
| WO2023287908A1 (fr) | Three-dimensional memory string array of thin-film ferroelectric transistors | |
| TW201921631A (zh) | Decoding circuitry coupled to a memory array | |
| US20130083048A1 (en) | Integrated circuit with active memory and passive variable resistive memory with shared memory control logic and method of making same | |
| WO2025038372A1 (fr) | System, method, and apparatus for wafer-scale memory | |
| CN120676642A (zh) | Memory chip and method for manufacturing a memory chip | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24762144; Country of ref document: EP; Kind code of ref document: A1 |