WO2025038365A1 - Integrated circuit having memories and a shared write port - Google Patents
Integrated circuit having memories and a shared write port
- Publication number
- WO2025038365A1 (application PCT/US2024/041390)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- read
- integrated circuit
- write
- modules
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D62/00—Semiconductor bodies, or regions thereof, of devices having potential barriers
- H10D62/40—Crystalline structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/16—Constructional details or arrangements
- G06F1/18—Packaging or power distribution
- G06F1/183—Internal mounting support structures, e.g. for printed circuit boards, internal connecting means
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/02—Detection or location of defective auxiliary circuits, e.g. defective refresh counters
- G11C29/022—Detection or location of defective auxiliary circuits, e.g. defective refresh counters in I/O circuitry
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1051—Data output circuits, e.g. read-out amplifiers, data output buffers, data output registers, data output level conversion circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1078—Data input circuits, e.g. write amplifiers, data input buffers, data input registers, data input level conversion circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C8/00—Arrangements for selecting an address in a digital store
- G11C8/12—Group selection circuits, e.g. for memory block selection, chip selection, array selection
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K19/00—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
- H03K19/02—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components
- H03K19/173—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
- H03K19/177—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
- H03K19/17748—Structural details of configuration resources
- H03K19/1776—Structural details of configuration resources for memories
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10B—ELECTRONIC MEMORY DEVICES
- H10B51/00—Ferroelectric RAM [FeRAM] devices comprising ferroelectric memory transistors
- H10B51/30—Ferroelectric RAM [FeRAM] devices comprising ferroelectric memory transistors characterised by the memory core region
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D30/00—Field-effect transistors [FET]
- H10D30/01—Manufacture or treatment
- H10D30/021—Manufacture or treatment of FETs having insulated gates [IGFET]
- H10D30/0415—Manufacture or treatment of FETs having insulated gates [IGFET] of FETs having ferroelectric gate insulators
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D30/00—Field-effect transistors [FET]
- H10D30/60—Insulated-gate field-effect transistors [IGFET]
- H10D30/701—IGFETs having ferroelectric gate insulators, e.g. ferroelectric FETs
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D64/00—Electrodes of devices having potential barriers
- H10D64/01—Manufacture or treatment
- H10D64/031—Manufacture or treatment of data-storage electrodes
- H10D64/033—Manufacture or treatment of data-storage electrodes comprising ferroelectric layers
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D64/00—Electrodes of devices having potential barriers
- H10D64/60—Electrodes characterised by their materials
- H10D64/66—Electrodes having a conductor capacitively coupled to a semiconductor by an insulator, e.g. MIS electrodes
- H10D64/68—Electrodes having a conductor capacitively coupled to a semiconductor by an insulator, e.g. MIS electrodes characterised by the insulator, e.g. by the gate insulator
- H10D64/689—Electrodes having a conductor capacitively coupled to a semiconductor by an insulator, e.g. MIS electrodes characterised by the insulator, e.g. by the gate insulator having ferroelectric layers
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D30/00—Field-effect transistors [FET]
- H10D30/60—Insulated-gate field-effect transistors [IGFET]
- H10D30/67—Thin-film transistors [TFT]
- H10D30/674—Thin-film transistors [TFT] characterised by the active materials
- H10D30/6755—Oxide semiconductors, e.g. zinc oxide, copper aluminium oxide or cadmium stannate
- H10D30/6756—Amorphous oxide semiconductors
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10D—INORGANIC ELECTRIC SEMICONDUCTOR DEVICES
- H10D99/00—Subject matter not provided for in other groups of this subclass
Definitions
- The present disclosure relates to integrated circuits. More particularly, the present disclosure relates to integrated circuits having multiple modules with separate read address spaces and a shared write address space.
- Chiplets refer to miniature chips that are designed to work as a single entity while using advanced packaging technology. These miniaturized chips are created by dividing the larger chip into several smaller chips, each with its own function or capability. The concept originated from the semiconductor industry's need to overcome the physical restrictions of traditional monolithic chip designs and achieve higher levels of integration. The idea behind chiplets is to create a modular system of interconnected and interchangeable chips that can be combined in different configurations to create advanced computing systems with improved performance, power efficiency, and functionality.
- Chiplets can be based on different architectures, such as CPU, GPU, memory, or IO, and can be assembled and stacked in a variety of ways, depending on the specific application requirements.
- One of the advantages of the chiplet approach is the ability to mix and match different chiplets from different manufacturers to create custom solutions that meet specific computing needs. This approach also allows for faster time to market, reduced development costs, and increased flexibility, as chiplets can be upgraded or replaced without the need for a complete system redesign.
- Chiplets may be used in various industries, including consumer electronics, cloud computing, and data centers, where the demand for high-performance computing and energy efficiency is high. Chiplets are expected to play a significant role in the future of computing and are likely to unlock new possibilities for creating more powerful and/or sophisticated electronic devices.
- An integrated circuit is disclosed herein that may be part of a semiconductor device.
- A method of manufacturing and a method of writing data to and reading data from modules are also disclosed herein and can be used with all examples, embodiments, and aspects described herein.
- the integrated circuit may include a modules group having a plurality of modules including a first module and a second module.
- the integrated circuit may also include a shared write port configured to write to the modules group.
- the integrated circuit may furthermore include a first read port configured to read from the first module.
- the integrated circuit may in addition include a second read port configured to read from the second module.
- the integrated circuit may have the first and second read ports configured to be inactive when a write operation is applied to the shared write port.
- the integrated circuit may be disposed on a first semiconductor device and may include: a shared write peripheral configured to write to the modules group via the shared write port, the shared write peripheral disposed on a silicon substrate of the first semiconductor device; a first read peripheral configured to read the first module via the first read port, where the first read peripheral is disposed on the silicon substrate of the first semiconductor device; and a second read peripheral configured to read the second module via the second read port, where the second read peripheral is disposed on the silicon substrate of the first semiconductor device, where the first module and the second module are disposed on a second layer of the first semiconductor device.
- The integrated circuit may be configured such that the first module is a three-dimensional column of memory bit cells disposed between a first surface of the first semiconductor device and the silicon substrate of the first semiconductor device.
- the integrated circuit may have the first read port configured to be in electrical communication with a second semiconductor device and/or the second semiconductor device configured to be secured to a first surface of the first semiconductor device.
- the first read port may include a read address bus that traverses through the second layer of the first semiconductor device.
- the first read port may include a read data bus that traverses through the second layer of the first semiconductor device.
- the shared write port may be configured to be in electrical communication with a second semiconductor device.
- the second semiconductor device may be configured to be secured to a first surface of the first semiconductor device.
- the integrated circuit may include an electrically connecting interposer disposed between the first surface of the first semiconductor device and the second semiconductor device.
- the shared write port may include a write address bus that traverses through the second layer of the first semiconductor device.
- the shared write port may include a write data bus that traverses through the second layer of the first semiconductor device.
- the first and second read ports may be configured to process reads concurrently relative to each other.
- the shared write port may be configured to write to an address space.
- the shared write port may be configured to write to the first module via a first portion of the address space and write to the second module via a second portion of the address space.
- Each module of the plurality of modules may include an independent read port for concurrent reading via a respective independent read port of any of the plurality of modules where the first read port is configured to read from the first module concurrently with another read of the second module using the second read port.
- the modules group may be configured to have a single write address space configured to write to the plurality of modules.
- the first read port may be configured to have a first read address space and the second read port is configured to have a second read address space.
- The first read address space may numerically overlap with the second read address space.
- the first read address space may be coextensive with the second read address space and/or contiguous with the second read address space.
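- The relationship between the single write address space and the per-module read address spaces can be illustrated with a minimal sketch. It assumes a simple contiguous layout in which each module occupies one portion of the shared write space and every read address space starts at zero, so the read spaces numerically overlap. The class names `Module` and `ModulesGroup`, the method names, and the module sizes are hypothetical and do not come from the disclosure.

```python
class Module:
    """One memory module; its read address space runs from 0 to size - 1."""

    def __init__(self, size):
        self.cells = [0] * size


class ModulesGroup:
    """Hypothetical model: one shared write address space, one read port per module."""

    def __init__(self, module_sizes):
        self.modules = [Module(size) for size in module_sizes]

    def decode_write_address(self, address):
        """Map a shared write address to (module index, offset inside that module)."""
        if address < 0:
            raise IndexError("write address must be non-negative")
        base = 0
        for index, module in enumerate(self.modules):
            if address < base + len(module.cells):
                return index, address - base
            base += len(module.cells)
        raise IndexError(f"write address {address} is outside the shared space")

    def write(self, address, value):
        """Write through the shared write port."""
        index, offset = self.decode_write_address(address)
        self.modules[index].cells[offset] = value

    def read(self, port, address):
        """Read through the port dedicated to one module."""
        cells = self.modules[port].cells
        if not 0 <= address < len(cells):
            raise IndexError(f"read address {address} is outside port {port}")
        return cells[address]


group = ModulesGroup([4, 4])
group.write(1, 0xA)  # first portion of the write space lands in module 0
group.write(5, 0xB)  # second portion of the write space lands in module 1
print(group.read(0, 1), group.read(1, 1))
```

- In this sketch, read address 1 is valid on both ports yet returns different data, because each port decodes the address against its own module.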
- the modules group may be formed on a second layer portion of an integrated circuit device.
- The integrated circuit may be implemented as a chiplet configured to be face-to-face bonded.
- The chiplet may include a plurality of input/output (I/O) bonds, where each of the plurality of I/O bonds is configured to provide a respective read port for a respective module of the plurality of modules, and where each of the plurality of I/O bonds is configured to interface with a complementary respective I/O bond of a second device as bonded with the chiplet.
- the integrated circuit may be part of a semiconductor device.
- the semiconductor device may be implemented as a die, a wafer, or a Chiplet die and may be bonded with a second semiconductor device, which is one of a GPU (Graphics Processing Unit), an SoC (System-On-chip), and a GPPU (General Purpose Processing Unit).
- The second semiconductor device may be implemented as a Chiplet die, a wafer, an ASIC (Application-Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array).
- The semiconductor device may include a plurality of input/output (I/O) bonds, where each of the plurality of I/O bonds is configured to provide a respective read port for a respective module of the plurality of modules, and each of the plurality of I/O bonds is configured to electrically connect with a complementary respective I/O bond of the second semiconductor device.
- the Chiplet die has a first side and a second side where the second side is configured to be face-to-face bonded to the second semiconductor device.
- the Chiplet die and second semiconductor device may be part of a multi-chip package.
- the semiconductor device may include an electrically connecting interposer disposed between the Chiplet die and the second semiconductor device.
- the first read port may be disposed on the Chiplet die.
- the Chiplet die has a surface and the first module may include: a memory array having a plurality of memory unit cells arranged within the first module; and a read peripheral circuitry configured to read data stored within the memory array via the first read port.
- A footprint is the space that a device projects onto a surface (e.g., how much x-y space the device occupies); the device may or may not also extend into a third dimension (e.g., along the z-axis).
- a footprint on the surface of the read peripheral circuitry may overlap with a footprint on the surface of the memory array.
- The plurality of memory unit cells may be arranged in a three-dimensional configuration.
- a footprint on the surface of the first read port may overlap a footprint on the surface of the memory array.
- a footprint on the surface of the read peripheral circuitry may be coextensive with a footprint on the surface of the memory array.
- a footprint on the surface of the first read port may be coextensive with a footprint on the surface of the memory array.
- the memory array, the read peripheral circuitry, and the first read port may be stacked vertically and configured to minimize a module footprint on the surface of the Chiplet die.
- write peripheral circuitry may be connected to the shared write port on the surface of the Chiplet die such that: the write peripheral circuitry defines a first footprint on the surface where the first footprint does not overlap with a footprint of the memory array, the first footprint does not overlap with a footprint of the first read port, and the shared write port defines a second footprint on the surface where the second footprint does not overlap with the footprint of the memory array and does not overlap with the footprint of the first read port.
- the first footprint may overlap the second footprint.
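- The footprint relationships described above (overlapping, coextensive, and non-overlapping) can be checked with a short sketch. It assumes each footprint is an axis-aligned rectangle in the x-y plane of the surface, which is a simplification not stated in the disclosure. The `Footprint` class and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Axis-aligned x-y projection of a structure onto the die surface (assumed shape)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def overlaps(self, other):
        """True when the two projections share some area."""
        return (self.x_min < other.x_max and other.x_min < self.x_max
                and self.y_min < other.y_max and other.y_min < self.y_max)

    def coextensive(self, other):
        """True when the two projections cover exactly the same area."""
        return self == other


# Hypothetical layout: read circuitry stacked on the memory array,
# write circuitry placed beside the array.
memory_array = Footprint(0, 0, 10, 10)
read_peripheral = Footprint(0, 0, 10, 10)
write_peripheral = Footprint(12, 0, 16, 4)
shared_write_port = Footprint(14, 2, 18, 6)

print(read_peripheral.coextensive(memory_array))
print(write_peripheral.overlaps(memory_array))
print(write_peripheral.overlaps(shared_write_port))
```

- With these example coordinates, the vertically stacked read path adds no area beyond the memory array, while the write path occupies its own region of the surface.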
- the plurality of memory unit cells may be formed from at least one non-volatile memory unit cell. These memory unit cells may comprise various materials such as ferroelectric, magnetic, spin orbit torque, spin-transfer-torque, phase-change, and/or antiferroelectric materials.
- One implementation may include organizing the plurality of modules into separate partitions, each with a dedicated read peripheral, where each dedicated read peripheral has an independent clock. Alternatively, each module may have its own dedicated read peripheral.
- When in a reset, the modules group may be configured to process write commands only and disable read capability. Conversely, when the modules group is not in the reset, write capability may be disabled.
- the modules group may include at least two different non-volatile memory technologies.
- One implementation may include an integrated circuit with a dynamic allocation circuitry to allocate memory blocks to the plurality of modules based on the usage of the modules group. Additionally, the integrated circuit may include a write peripheral connected to a dedicated I/O pad enabling data transfer external to the integrated circuit.
- the integrated circuit may be formed via an additive manufacturing process on a silicon substrate and may be electrically connected to a second semiconductor device with another integrated circuit.
- the integrated circuit may include a read peripheral for at least one of the plurality of modules that is disposed on the silicon substrate or the second semiconductor device.
- the second semiconductor device may be implemented as a system-on-chip, Chiplet die, wafer, ASIC, or FPGA.
- the plurality of modules is formed from non-volatile memory unit cells arranged in a three-dimensional connectivity fabric vertical to a silicon substrate and the second semiconductor device, and may utilize at least one of a cross-point, 3D NANDs, 3D NORs, 3D ANDs, and a stacked planar layer.
- the integrated circuit may include a single write peripheral with a dedicated clock or a plurality of clocks which feed to respective modules with decoupled timing relative to other modules.
- Each of the plurality of clocks is configured for clocking a respective read port of a respective module of the plurality of modules.
- Another implementation may include the modules group formed on a chiplet with a first side and a second side, the second side being configured for bonding to a second semiconductor device.
- the integrated circuit may also include a decoder circuit, a driver circuit, and a register circuit on a silicon substrate of the chiplet.
- the modules group may be formed on a second layer of the chiplet.
- the second semiconductor device may have a plurality of processing elements, where each processing element includes a respective interface to communicate with a respective module of the plurality of modules on the modules group when the second semiconductor device is bonded to the chiplet.
- the chiplet may also have an interface to the shared write port on the second side to thereby interface with a complementary interface on the second semiconductor device.
- the second semiconductor device may have a network-on-a-chip that is configured to provide inter-element communication for the plurality of processing elements.
- the plurality of processing elements may include at least one embedded FPGA, or one of a soft processor, a DSP block, an embedded processor, and a microcontroller.
- A write operation to the modules group is performed through a priority arbitration circuit that allows the modules to be accessed in a predetermined order.
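- One way to picture arbitration in a predetermined order is a fixed-priority sort of pending write requests. The sketch below is an assumption for illustration only: the disclosure does not specify the arbitration policy, and the request tuple format, the function name `arbitrate_writes`, and the module identifiers are hypothetical.

```python
def arbitrate_writes(pending_requests, priority_order):
    """Order pending writes by a predetermined module priority.

    pending_requests: list of (module_id, address, data) tuples (hypothetical format).
    priority_order: module ids listed from highest to lowest priority.
    """
    rank = {module_id: position for position, module_id in enumerate(priority_order)}
    # sorted() is stable, so writes to the same module keep their arrival order.
    return sorted(pending_requests, key=lambda request: rank[request[0]])


requests = [(2, 0x10, "c"), (0, 0x00, "a"), (2, 0x11, "d"), (1, 0x08, "b")]
for module_id, address, data in arbitrate_writes(requests, priority_order=[0, 1, 2, 3]):
    print(module_id, hex(address), data)
```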
- the integrated circuit may include a power gating circuitry that selectively powers down a module of the plurality of modules when not in use.
- the shared write port may be configured to write to a virtual address space that is mapped onto a physical memory space.
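- A virtual write address space mapped onto physical memory can be sketched with a page table. The page size, the table contents, and the function name `translate` are assumptions chosen for illustration; the disclosure does not describe the mapping mechanism.

```python
PAGE_SIZE = 256  # assumed page size in addressable words


def translate(virtual_address, page_table):
    """Translate a virtual write address into a physical address.

    page_table maps a virtual page number to a physical page number (hypothetical).
    """
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"virtual page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


page_table = {0: 3, 1: 0}
print(translate(5, page_table))
print(translate(PAGE_SIZE + 10, page_table))
```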
- the integrated circuit may also include a control circuit configured to enable the shared write port during write operations and disable the shared write port during read operations to conserve energy.
- the control circuit may disable the first read port during write operations and enable the first read port during read operations to conserve energy.
- a power management module may also be included, which selectively powers down the shared write port during read operations to conserve energy.
- the write circuitry may be configured to dynamically shift power allocation from write operations to read operations, or to enter a sleep mode and power down the shared write port during read operations to conserve energy.
- The non-volatile memory utilized in the plurality of modules may be selected from a group comprising a FeFET, a FeRAM, a ReRAM, an SOT (Spin Orbit Torque) memory, and an STT (Spin Transfer Torque) memory.
- The write peripheral for the modules group may be implemented on a silicon substrate and disposed between the modules group and the second side of the chiplet.
- the read peripheral for the first module may be implemented on a silicon substrate and disposed between the modules group and the first side of the chiplet, or alternatively, in a second layer.
- each module has its own dedicated write peripheral that utilizes a shared clock.
- Fig. 1 is a block diagram of an integrated circuit that may be part of a semiconductor device such as a chiplet in accordance with an embodiment of the present disclosure
- Fig. 2 shows a perspective view of an assembly having the integrated circuit of Fig. 1 implemented on a semiconductor device that is electrically connected to another device to form the assembly in accordance with an embodiment of the present disclosure
- Fig. 3 shows a block diagram illustrating the memory address space of the integrated circuit of Fig. 1 in accordance with an embodiment of the present disclosure
- Fig. 4 shows a block diagram illustrating the memory address space with the signal interfaces of the integrated circuit of Fig. 1 in accordance with an embodiment of the present disclosure
- Fig. 5 shows an illustration of an integrated circuit that may be part of a semiconductor device such as a chiplet in accordance with an embodiment of the present disclosure
- Fig. 6 shows a perspective of an assembly having the integrated circuit of Fig.
- Fig. 1 shows a block diagram of an integrated circuit 100 that may be packaged as a bondable chiplet (e.g., face-to-face chiplet bondable) in accordance with an embodiment of the present disclosure.
- the integrated circuit (IC) 100 includes a modules group 106 consisting of modules 108, 110, 112, and 114.
- the IC 100 also features a shared write port 102, configured to write to the modules group 106 using a write peripheral 104. Additionally, it includes read peripherals 116, 118, 120, and 122 and read ports 124, 126, 128, and 130, configured to read from the modules 108, 110, 112, 114.
- the IC 100 also includes an interlock 132.
- the write port 102 may be configured to provide a single write address space for all of the modules group 106 where each of the modules 108, 110, 112, 114 has a dedicated read port 124, 126, 128, 130 respectively.
- the integrated circuit 100 may be packaged as part of a chiplet configured to be electrically connected to another integrated circuit device (e.g., another chiplet, or IC package, with or without electrical contacts, electrical bumps, etc.).
- The chiplet may be electrically connected to another device by, for example, bonding, soldering, wafer-to-wafer bonding, face-to-face chiplet bonding, chiplet-to-wafer bonding, or chiplet-to-interposer bonding, and/or the two may be connected together with an interposer or other interfacing technology. Zero, one, or more interposers may be used, or other interfacing technologies common to heterogeneous 3D system-in-package solutions may be utilized, in electrically connecting a chiplet to another device.
- Each read port (124, 126, 128, 130) in the chiplet may feature electrical contacts on a side of the chiplet or on multiple sides of the chiplet.
- the read ports 124, 126, 128, 130 may use multi-cycle pipelined circuitry.
- the electrical contacts may line up in a manner that provides dedicated access to specific modules of the modules 108, 110, 112, 114.
- a processing/computing element may have exclusive access to module 108 via the read port 124, which may contain the neural network weights in a registry file.
- a different processing/computing element may have exclusive read access to module 110 via the read port 126, which includes a different registry file.
- this arrangement of the electrical contacts ensures that each computing/processing element has the dedicated access it needs to carry out its specific computation efficiently thereby providing a compact, modular, and scalable system that allows different processing elements to maintain dedicated access to specific modules 108, 110, 112, 114. Without dedicated access, different processing elements might have to queue up to use the same resource which would slow down overall processing speed.
- the proposed chiplet ensures that each processing element can operate at its maximum capability without interference from other computing elements in this specific embodiment.
- the write peripheral 104 is a peripheral circuitry responsible for processing and writing data into the memory cells found within the modules 108, 110, 112, 114.
- the write peripheral 104 may include dedicated contacts so that a chip electrically connected (e.g., bonded) to a chiplet of the integrated circuit can access the write port 102 via a shared write logic system.
- This system uses a shift-register-based design that operates at a different voltage, preferably a high voltage, and that has shared write address and data components.
- This shared write logic system is designed to be accessed via a bonded chip, another bonded chiplet, and/or via other circuitry in the same package as the integrated circuit 100.
- a shift register could allow the system to move data through a series of stages, with each subsequent stage receiving the data from the previous stage. By utilizing a shift register, the system can increase the data throughput while maintaining a low rate of data transfers.
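- The stage-by-stage movement of data through a shift register can be sketched as follows. This is a behavioral model only, assuming single-bit stages and one shift per clock; the class name `ShiftRegister` and the four-stage depth are hypothetical and are not taken from the disclosure.

```python
class ShiftRegister:
    """Behavioral model of a serial-in shift register with single-bit stages."""

    def __init__(self, stage_count):
        self.stages = [0] * stage_count

    def clock(self, bit_in):
        """Advance one clock: each stage takes the value of the stage before it.

        Returns the bit that falls out of the last stage.
        """
        bit_out = self.stages[-1]
        self.stages = [bit_in] + self.stages[:-1]
        return bit_out

    def parallel_out(self):
        """Snapshot of all stages, first stage first."""
        return list(self.stages)


register = ShiftRegister(4)
for bit in [1, 0, 1, 1]:  # serially load a 4-bit word, one bit per clock
    register.clock(bit)
print(register.parallel_out())
```

- A serial interface of this kind needs only one data contact plus a clock to load a multi-bit address or data word, which is one plausible reason to pair it with a shared write path.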
- the shared write address space refers to the location where data is written in the chiplet.
- an interlock 132 may disable the read ports 124, 126, 128, 130 while data is being written to the modules group 106 via the write port 102. Likewise, the interlock 132 may disable the write port 102 when read operations are being carried out on the read ports 124, 126, 128, 130.
- the written data can later be accessed concurrently by all processing elements that need to read the data via a respective one of the read ports 124, 126, 128, 130. This ensures that all processing elements have the most commonly used data available to them without regard to other reads being concurrently carried out by other processing elements.
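- The interlock behavior (reads disabled during a write, the write port disabled during reads, and reads on different ports proceeding concurrently) can be modeled with a small state machine. The class `Interlock` and its method names are hypothetical, and the sketch ignores timing, assuming each operation is bracketed by explicit begin and end calls.

```python
class Interlock:
    """Hypothetical mutual-exclusion model: one write port, several read ports."""

    def __init__(self, read_port_count):
        self.write_active = False
        self.active_reads = [False] * read_port_count

    def begin_write(self):
        """Grant the write port only when no read is in flight."""
        if any(self.active_reads):
            return False
        self.write_active = True
        return True

    def end_write(self):
        self.write_active = False

    def begin_read(self, port):
        """Grant a read port unless a write is in progress; other reads do not block it."""
        if self.write_active:
            return False
        self.active_reads[port] = True
        return True

    def end_read(self, port):
        self.active_reads[port] = False


interlock = Interlock(read_port_count=4)
print(interlock.begin_read(0), interlock.begin_read(1))  # concurrent reads on two ports
print(interlock.begin_write())                           # refused while reads are active
```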
- the write peripheral 104 circuit includes a write driver. This unit receives the data to be written and converts it into suitable signals that can change the state of the memory cells. Depending on the type of memory technology used, these signals could involve voltage levels, current pulses, or other types of energy.
- the shared write logic system may be high voltage due to the specific voltage requirements of the chiplet.
- the write driver must provide enough power to reliably change the state of the memory cells, but it must also operate within suitable parameters to avoid causing damage or unnecessary wear.
- the write peripheral 104 circuit may also feature a data buffer or write buffer. This component temporarily stores the data to be written, allowing the write operation to be performed at an optimal pace. By balancing the speed of incoming data with the speed at which the memory cells can be written, the write buffer helps prevent data loss and optimizes system performance.
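- The rate-matching role of a write buffer can be sketched as a bounded first-in, first-out queue. The buffer depth, the backpressure signal (a `False` return value), and the names `WriteBuffer`, `push`, and `drain_one` are assumptions for illustration; a dictionary stands in for the memory cells.

```python
from collections import deque


class WriteBuffer:
    """Bounded FIFO that decouples the incoming data rate from the cell write rate."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = deque()

    def push(self, address, data):
        """Accept an incoming write; return False (backpressure) when the buffer is full."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((address, data))
        return True

    def drain_one(self, memory):
        """Commit the oldest pending write; return False when nothing is pending."""
        if not self.entries:
            return False
        address, data = self.entries.popleft()
        memory[address] = data
        return True


memory = {}
write_buffer = WriteBuffer(depth=2)
write_buffer.push(0, 0x11)
write_buffer.push(1, 0x22)
accepted = write_buffer.push(2, 0x33)  # buffer is full, so the sender must retry
write_buffer.drain_one(memory)
print(accepted, memory)
```

- Refusing a write when the buffer is full, rather than overwriting a pending entry, is what prevents data loss in this model.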
- the write peripheral 104 may also include, in some embodiments, a write control unit that orchestrates the sequence of operations in the write process. It generates control signals to activate the write driver at the appropriate times, controls the flow of data from the write buffer, and coordinates the timing of the write operations. By synchronizing these various activities, the write control unit ensures efficient and reliable write operations.
- the write peripheral 104 may also include data encoding mechanisms to improve reliability and data integrity. For example, before the data is written to the memory cells, these mechanisms encode it in a way that allows potential errors to be detected, and in some cases, corrected when the data is later read. This can be helpful in systems where data integrity has a higher priority, such as in servers or scientific research devices.
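- As an example of encoding that allows errors to be detected and, in some cases, corrected, the sketch below uses a Hamming(7,4) code, which corrects any single-bit error in a 7-bit codeword. The disclosure does not name a specific code, so the choice of Hamming(7,4) and the function names are assumptions.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); a nonzero syndrome is the 1-based position that was flipped back."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], syndrome


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one memory cell flipping after the write
recovered, syndrome = hamming74_decode(stored)
print(recovered, syndrome)
```

- Encoding happens on the write path and decoding on the read path, so a scheme like this costs three extra cells per four data bits in exchange for single-bit correction.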
- the write peripheral 104 may also include a timing unit that serves as the system's heartbeat, supplying clock signals that synchronize the operation of the system's various components. In some systems, it may include components like oscillators, clock generators, or phase-locked loops.
- the timing unit may ensure that all operations occur at the suitable time relative to each other.
- the IC 100 may be implemented as a face-to-face bonded chiplet, with modules 108, 110, 112, and 114 formed from a non-volatile memory.
- the IC 100 may also feature a dynamic allocation circuitry to allocate memory blocks to the modules group 106 based on the usage of the modules group 106 (e.g., each module 108, 110, 112, and 114 may include dynamic allocation circuitry for dynamically allocating a range of read locations for a respective processing element).
- the IC 100 features a plurality of clocks, with each clock of the plurality of clocks feeding a respective module of the plurality of modules, providing each respective module with decoupled timing relative to the other modules of the plurality of modules.
- the modules group 106 may be arranged in any topology known to one of ordinary skill in the relevant art. Bit-cell density can be up to 10 times more dense than embedded SRAM cells in the modules group 106.
- the IC 100 may be formed on a chiplet that includes a first side and a second side, with the second side configured for bonding to a second semiconductor device.
- the IC 100 may include a high voltage write logic adjacent to the first side of the chiplet.
- a decoder circuitry, a driver circuitry, and a register circuitry may be formed on the silicon substrate portion of the chiplet, while the modules group 106 is formed on a second layer portion of the chiplet.
- the second semiconductor device may comprise a plurality of processing elements. Each processing element includes a respective interface to communicate with a respective module of the plurality of modules on the modules group 106 when the second semiconductor device is bonded to the chiplet.
- the silicon substrate traditionally serves as the initial stage of IC fabrication, focusing on the creation of active components, particularly transistors. Techniques like diffusion, ion implantation, oxidation, and material deposition are employed to fashion the intricate structures of transistors. These processes operate at small scales. The application of photolithography, etching, and implantation techniques enables the definition of transistor structures with precision.
- the silicon substrate is used to establish the fundamental building blocks for signal processing, amplification, and control within the IC. This layer is sometimes called the Front-End-Of-The-Line (“FEOL”).
- a second layer may be added that traditionally takes on the role of interconnect fabrication, facilitating the electrical connections between various IC components.
- the second layer processes typically differ from the processes used on the silicon substrate in terms of precision and scale.
- the interconnects are formed by depositing and patterning metal layers, typically aluminum or copper, to construct the wiring network. Dielectric layers, such as silicon dioxide or low-k dielectrics, are introduced to insulate the interconnects and prevent signal interference between different wiring layers.
- the second layer’s traditional function is to establish the necessary interconnections that enable the routing and distribution of electrical signals throughout the IC. However, as described herein, circuitry may be utilized within this second layer (sometimes referred to as Back-End-Of-The-Line (“BEOL”)).
- Alternate embodiments of the IC 100 may be implemented as a stacked die, as a monolithic design, or with through-silicon vias (TSVs).
- In a stacked die design, several dies may be stacked on top of each other, with each die performing a different function, such as memory and processing.
- the stacked die may communicate through wire bonds, microbumps, or bump-less bonds.
- the various functions and modules of the IC 100 may be integrated onto a single die, forming a more compact and power-efficient design.
- the IC 100 may include one or more interlocks 132 to manage conflicts in reading and writing data.
- the modules group 106 may be formed from a variety of non-volatile or semi-volatile (e.g., very long refresh periods) memory technologies, such as Static Random-Access Memory (SRAM), Ferroelectric Field Effect Transistor (FeFET), Ferroelectric Random Access Memory (FeRAM), Resistive Random Access Memory (ReRAM), Spin-Orbit Torque (SOT) Memory, Spin Transfer Torque (STT) Memory, charge trap, floating gate memories, and/or Schottky diodes.
- the modules group 106 may utilize a Static Random-Access Memory (SRAM) topology.
- the SRAM topology may employ a cross-coupled flip-flop structure (e.g., latching flip-flops), ensuring the stored data remains intact as long as power is supplied.
- the modules group 106 may utilize heterogeneous types of memory including volatile and non-volatile memory types.
- the modules group 106 may utilize a Flash Memory Topology.
- Flash memory is a non-volatile memory technology used in applications where data persistence is needed, such as solid-state drives (SSDs) and USB flash drives.
- the flash memory topology disclosed herein features a matrix of memory cells, each consisting of a floating-gate transistor or charge trap device.
- the modules group 106 may also use wear-leveling techniques to prolong the lifespan of the memory cells.
- the modules group 106 may utilize a Ferroelectric Random-Access Memory (FeRAM) topology.
- the FeRAM topology utilizes a ferroelectric material capable of retaining polarization states.
- One such memory topology may, in specific embodiments, utilize a FeFET to retain state information and program the ferroelectric material. These ferroelectric materials may be used to retain state information and act as a memory bit cell.
- the modules group 106 may utilize a Phase Change Memory (PCM) Topology, which is a non-volatile memory technology that utilizes reversible phase changes in materials to store data.
- the PCM topology may include any phase change material, for example a chalcogenide alloy or a chalcogenide glass housed within a memory cell.
- the modules group 106 may utilize a Resistive Random-Access Memory (ReRAM) Topology, which is a non-volatile memory technology based on resistive switching phenomena.
- the ReRAM topology may utilize a thin-film material that exhibits reversible changes in resistance upon the application of electrical stimuli.
- the modules group 106 may utilize a Spin-Orbit Torque (SOT) Magnetic Random-Access Memory (MRAM) topology.
- SOT-MRAM is a type of non-volatile memory that utilizes spin-orbit torque to switch the magnetic state of a storage element.
- the SOT-MRAM topology may incorporate a magnetic tunnel junction (MTJ) structure and leverage the spin-orbit coupling effect to write and read data.
- the magnetic tunnel junction may have a dielectric layer between a magnetic fixed layer and a magnetic free layer. Writing may be done by switching the magnetization of the free magnetic layer by injecting an in-plane current in an adjacent SOT layer. Reading may be done by passing current through the magnetic tunnel junction.
- the SOT-MRAM can optimize the spin-orbit materials by using current-driven switching schemes while minimizing write energy consumption, in some specific embodiments.
- the modules group 106 may utilize a Spin Transfer Torque (STT) Magnetic Random-Access Memory Topology.
- STT-MRAM is another type of non-volatile memory that relies on spin transfer torque to manipulate the magnetic state of a storage element.
- the STT-MRAM topology can use a magnetic tunnel junction (MTJ) structure, where the magnetization orientation determines the stored data. Additionally, the orientation of a magnetic layer in a magnetic tunnel junction or spin valve can be changed using a spin-polarized current, for example.
- the IC 100 may include a single write peripheral 104 with a dedicated clock, or each module 108, 110, 112, 114 may have its own dedicated write peripheral utilizing a shared clock (not shown in Fig. 1). Additionally, the modules group 106 may be organized into separate partitions, each with a dedicated read peripheral 116, 118, 120, 122 having an independent clock.
- Another possible embodiment of the IC 100 includes an interface (e.g., the same, different, higher or lower voltage) to enable data transfer external to the packaging of the IC 100.
- the IC 100 may also include an integrated microcontroller unit (MCU) or a digital signal processor (DSP) for processing data within the IC in yet additional embodiments.
- Fig. 2 shows a perspective view of an assembly 200 of the integrated circuit 212 of Fig. 1 implemented on a chiplet 230 that is bonded to a second device 226 in accordance with an embodiment of the present disclosure.
- the integrated circuit 212 is the circuitry within the chiplet 230.
- the second device 226 may be a chiplet, semiconductor wafer, semiconductor package, encased circuitry, etc.
- the second device 226 may be an AI accelerator such that each processing unit has read access to one module (or a predetermined set) of the modules group 236.
- the second device 226 may be a network controller where there is an offload circuit to read the data from each of the modules to process incoming/outgoing packets, etc.
- the assembly 200 includes a modules group 236 having a plurality of modules, including a first module 232 and a second module 234. Fig. 2 shows several modules, however, for clarity, only modules 232, 234 have reference numbers.
- the integrated circuit 212 further comprises a shared write port 222. The shared write port 222 interfaces into the write peripheral 202.
- the second device 226 may use the shared write port 222 via an address and data bus with a clock and an enable signal to write data to any module within the modules group 236; other ways of writing data may be considered.
- serial connections, parallel connections, various buses, or ports may be used, such as a DDR (Double Data Rate) Interface, an SRAM (Static Random-Access Memory) Interface, a NAND Flash Memory Interface, a NOR Flash Memory Interface, an HBM (High Bandwidth Memory) Interface, a GDDR (Graphics Double Data Rate) Interface, an NVMe (Non-Volatile Memory Express) Interface, SPI, I2C, etc.
- Each of the modules has a read port with a read address 218 (to send an address to a module 234) and read data 220 (which is the data read from the module 232).
- the modules group 236 is formed on a chiplet 230 having two sides including a surface 228 that can be bonded to and complement a second device 226.
- the chiplet 230 may be formed by forming circuitry on a silicon substrate 204 and then by adding a second layer 206. In other embodiments, these layers may be reversed and/or other layers may be added, removed, etc.
- the read address 218 and read data 220 are used for reading the module 232.
- the second device 226 may use an address and data bus with a clock and an enable signal to read data from the module 232; other ways of reading data may be considered.
- serial connections, parallel connections, various buses, or ports may be used, such as a DDR (Double Data Rate) Interface, an SRAM (Static Random-Access Memory) Interface, a NAND Flash Memory Interface, a NOR Flash Memory Interface, an HBM (High Bandwidth Memory) Interface, a GDDR (Graphics Double Data Rate) Interface, an NVMe (Non-Volatile Memory Express) Interface, SPI, I2C, etc.
- All of the read ports are configured to be inactive when a write operation is applied to the shared write port 222.
- the read ports may also be configured to process reads concurrently with each other.
- the shared write port 222 is configured to write to an address space, where the shared write port 222 is configured to write to the first module 232 via a first portion of the address space and write to the second module 234 via a second portion of the address space.
- Each module of the plurality of modules group 236 includes an independent read port for concurrent reading via a respective independent read port of any of the plurality of modules.
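- A minimal behavioral sketch of the arrangement described above, under stated assumptions: one shared write port addresses every module through its own portion of a single write address space, each module has its own read port, and the read ports are inactive while a write is in progress. The class name `SharedWriteGroup`, its methods, and the use of a software flag for the write enable signal are hypothetical modeling choices, not circuitry taken from the disclosure.

```python
from bisect import bisect_right
from itertools import accumulate


class SharedWriteGroup:
    """Toy model: one shared write port, one independent read port per module."""

    def __init__(self, module_sizes):
        # Start address of each module's portion of the write address space.
        ends = list(accumulate(module_sizes))
        self.starts = [0] + ends[:-1]
        self.total = ends[-1]
        self.cells = [[0] * size for size in module_sizes]
        self.write_enable = False

    def set_write_enable(self, enabled):
        self.write_enable = bool(enabled)

    def write(self, address, value):
        """Write through the shared port; ignored unless write enable is set."""
        if not self.write_enable:
            return False
        if not 0 <= address < self.total:
            raise IndexError("address outside the write address space")
        module = bisect_right(self.starts, address) - 1
        self.cells[module][address - self.starts[module]] = value
        return True

    def read(self, module, offset):
        """Read through one module's own port; inactive while writing."""
        if self.write_enable:
            raise RuntimeError("read ports are inactive during a write")
        return self.cells[module][offset]

    def read_all(self, offsets):
        """Model concurrent reads: one offset per module, one result per port."""
        return [self.read(m, off) for m, off in enumerate(offsets)]
```

In this model, address 5 of a group built from two four-entry modules lands in the second module at offset 1, and both modules can then be read at the same offset once the write enable is cleared.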
- Each read port for a respective module may include contacts for circuitry found within the second device 226 to interface via metallic contacts.
- metallic contacts on the top layer 208 may be configured to interface with metallic contacts on the surface 228 of the chiplet 230 such that the metallic contacts allow for a read space that is coextensive with a read space of a module of the modules group 236.
- the read spaces of the modules group 236 may all be coextensive with each other (as is described with reference to Figs. 3 and 4).
- the read peripheral for the first module 232 is implemented on a silicon substrate 204 (sometimes referred to as a Front-end-of-the-line).
- the second layer 206 (sometimes called the Back-end-of-the-line) may be built next in the manufacturing process on top of the silicon substrate 204 (and any circuitry) and may contain the respective memory bit cells.
- the read peripheral for the first module 232 is implemented in the second layer 206 and is disposed between the modules group 236 and the surface 228 of the chiplet 230.
- the modules group 236 may be configured to process write commands only during reset.
- the write commands may be “slow write” commands. That is, the modules group 236 may have very low write speeds relative to its read speed.
- the write logic may be frozen (or disabled) when the modules group 236 are used for reading data.
- the integrated circuit 212 provides functionality to allocate memory blocks to the modules group 236 based on the usage of the modules group 236. In other embodiments, the memory addresses are fixed along with the allocation.
- the integrated circuit 212 may be implemented as a face-to-face bonded chiplet 230. The face-to-face bonding may be bumpless wafer bonding.
- the modules group 236 can have a single write peripheral 202.
- each module of the modules group 236 may have a dedicated write peripheral that utilizes a shared clock.
- the modules group 236 may also be organized into separate partitions, with each partition having a dedicated read peripheral, where each dedicated read peripheral has an independent clock.
- the partitions may be one, two, or more modules of the modules group 236.
- the overall architecture of the write peripheral 202 circuitry may include a series of different components, including write drivers, address decoders, sense amplifiers, data input latches, a data bus, etc., and/or some combination thereof. Write drivers, or write buffers, may be tasked with transferring data onto the memory cell.
- Address decoders may be used to interpret the memory address that is fed as an input where the data needs to be written. By activating the specific row and column of the memory array linked to that address, they may be used to select the target memory cell.
- Sense amplifiers may be used to identify and boost the signal from the memory cells during read operations, and also participate in refreshing the memory cell after data write operations. The write operation is instigated by a write enable signal. When a write command is initiated, this signal propels the write drivers and decoders into the writing process.
- Data input latches may be used as temporary storage units, retaining the data set to be written into the memory until the write operation is implemented.
- a data bus with a transmission route can be used to facilitate the movement of data from the data input latches to the memory cells.
- a write operation to the modules group 236 may be performed through a priority arbitration circuit that facilitates the modules to be accessed in a predetermined order, and the shared write port 222 may be configured to write to a virtual address space that is mapped onto a physical memory space.
- the integrated circuit 212 may include a high voltage write logic used within the write peripheral 202, and the second semiconductor device 226 may comprise a plurality of processing elements, whereby each processing element includes a respective interface to communicate with a respective module of the modules group 236.
- the chiplet 230 may include an interface to the shared write port 222 on the surface 228 to interface with a complementary interface on the second semiconductor device 226.
- the integrated circuit 212 may also include a power gating circuitry that selectively powers down a module of the modules group 236 when not in use. Additionally, the integrated circuit 212 may have the write peripheral 202 of the modules group 236 connected to a dedicated I/O pad to enable data transfer external to the package of the integrated circuit.
- the integrated circuit 212 may utilize multiple modules of the modules group 236 grouped together. These modules may be synchronized with one another in specific embodiments. In some cases, all the modules are synchronized, while in other instances, only specific modules are to be synchronized. For instance, the circuit on a second device 226 may need to synchronize with a specific module when reading data from one of the modules in the modules group 236.
- To synchronize the modules, the integrated circuit 212 may use various timing technologies. In some cases, a plurality of clocks may feed each respective module of the modules group 236, thereby allowing each module to have decoupled timing relative to the other modules in the group. This decoupling ensures that any delay in one module will not affect the functioning of other modules. It is worth noting that the clocks used may or may not need to be synchronized. In some cases, a common clock can be used to synchronize the modules. In yet other embodiments, the clock signal or signals may be provided by the second device 226.
- synchronization techniques can be used, such as phase comparison of the clock signals or a phase-locked loop (PLL) synchronization method.
- Another embodiment for synchronizing the modules in the IC could use delay-locked loop (DLL) synchronization. In this method, a delay element is added to the clock signal path, and the output is compared to the input clock signal. The feedback loop adjusts the delay element until the output of the DLL matches the input, resulting in synchronization of the clock signals.
- the integrated circuit 212 could use a combination of different synchronization techniques to achieve synchronization between the modules of the modules group 236. For example, some modules may use PLL synchronization while others use clock delay lines or DLL synchronization, depending on their specific requirements. Additionally, the integrated circuit 212 can also use redundant synchronization techniques to ensure reliability and redundancy in case one method fails. For example, the integrated circuit 212 could use both PLL synchronization and DLL synchronization simultaneously, so that if one method fails, the other can still maintain synchronization.
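- The delay-locked loop feedback described above can be pictured with a small numerical sketch. It is an idealized model rather than a circuit from the disclosure: the delay element is a single number, the phase detector returns the exact phase error, and the loop filter is a plain proportional gain. The function name `dll_lock` and the parameters `gain`, `tolerance`, and `max_steps` are assumptions made for illustration.

```python
def dll_lock(period, initial_delay, gain=0.5, tolerance=1e-6, max_steps=1000):
    """Adjust a delay until the delayed clock edge lines up with an input edge.

    Returns (locked_delay, steps_taken). The output is aligned with the
    input when the delay is a whole number of clock periods.
    """
    delay = initial_delay
    for step in range(max_steps):
        # Phase error between delayed and input clocks, wrapped into
        # the interval [-period / 2, period / 2).
        error = (delay + period / 2) % period - period / 2
        if abs(error) < tolerance:
            return delay, step
        # Feedback: shorten the delay when the output lags, lengthen it
        # when the output leads.
        delay -= gain * error
    raise RuntimeError("delay-locked loop did not lock")


# Usage: a delay line that starts at 1.3 clock periods settles near 1.0.
locked_delay, steps = dll_lock(period=1.0, initial_delay=1.3)
```

With a gain of 0.5 the phase error halves on every iteration, so the loop settles in a few tens of steps; a real loop would also have to respect the minimum and maximum delay of the delay line.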
- Fig. 3 shows a block diagram 300 illustrating the memory address space of the integrated circuit of Fig. 1 in accordance with an embodiment of the present disclosure.
- the memory address space includes a write address space 316 and read data address spaces 310, 312, 314.
- the write address space 316 consists of various units where data, e.g., weights, and/or instructions can be stored. These units are referred to as memory addresses.
- the module group 302 includes multiple memory modules 304, 306, 308.
- the write address space 316 may be distributed among the memory modules 304, 306, 308 such that the write address space 316 spans from 0 to N*M-1.
- the modules group 302 has N memory modules 304, 306, 308, where N is a positive integer, and each module has a memory size of M.
- the total number of unique write memory addresses in the write address space will be N*M, which can be referenced by an integer from 0 to N*M-1.
- memory addresses of the write address space 316 are ordered sequentially up to N*M-1.
- the first address is 0 and the final address is N*M-1, encompassing a total of N*M addresses.
- This ordering can be linear (each address increases by one) or some other specified pattern depending on the implementation.
- the write memory addressing can be implemented in a variety of ways based on the system architecture.
- One method used in a specific embodiment is to use the base and limit registers.
- the base register holds the smallest legal physical write memory address, and the limit register specifies the size of the range. Therefore, to generate a physical address, the base is added to the relative address.
- a memory addressing scheme may be used where the base is set to 0. Yet additional write addressing techniques will be appreciated by one of ordinary skill in the relevant art.
- each memory module can possess a unique set of write memory addresses such that every memory address within the modules group 302 is unique with respect to writing data, e.g., the first module starting at 0 and the last one ending at N*M-1.
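- The base and limit register scheme can be sketched as follows. This is a software analogy under stated assumptions, not the register logic of the disclosure: the function name `translate` is hypothetical, and the range check against the limit is the conventional use of a limit register.

```python
def translate(relative_address, base, limit):
    """Map a relative write address to a physical one using base and limit.

    `base` is the smallest legal physical address and `limit` is the size
    of the range, so legal relative addresses are 0 .. limit - 1.
    """
    if not 0 <= relative_address < limit:
        raise ValueError("relative address outside the permitted range")
    return base + relative_address


# Usage: with the base set to 0, relative and physical addresses coincide.
same = translate(5, base=0, limit=16)
shifted = translate(5, base=100, limit=16)
```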
- This allocation may be dependent on the memory management system of the device writing data to the modules 304, 306, 308, which could range from simple fixed partitioning schemes to more complex dynamic partitioning models.
- each module (304, 306, or 308) has an equal size of M addresses.
- the first module 304 would possess write addresses 0 to M-1.
- the second module would have write addresses M to 2*M-1.
- the third module would have write addresses 2*M to 3*M-1.
- the Nth module 308, therefore, would possess write addresses from (N-1)*M to N*M-1.
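- The equal-size allocation listed above reduces to simple integer arithmetic. The sketch below assumes a linear ordering with a base of 0; the helper names `write_range` and `decode_write_address` are hypothetical, and modules are indexed from 0 here, so the Nth module has index N-1.

```python
def write_range(module_index, m):
    """First and last write address owned by a module (0-based index)."""
    return module_index * m, (module_index + 1) * m - 1


def decode_write_address(address, n, m):
    """Split a write address in 0 .. n*m-1 into (module index, local offset)."""
    if not 0 <= address < n * m:
        raise IndexError("address outside the write address space")
    return divmod(address, m)


# Usage: N = 3 modules of M = 4 addresses give a write space of 0 .. 11.
ranges = [write_range(k, 4) for k in range(3)]
target = decode_write_address(9, n=3, m=4)
```

The local offset returned by the decode always falls in 0 to M-1, which matches the range of each module's read address space.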
- the modules group 302 has different read data address spaces 310, 312, 314. These read address spaces 310, 312, 314 may have overlapping address spaces, may have contiguous address spaces, or may have coextensive address spaces.
- the read address spaces 310, 312, 314 may be independent relative to each other.
- the system includes three independent read address spaces, labeled as read address spaces 310, 312, and 314. Each of these read address spaces is distinct from the others, meaning that reads can be performed in each space without affecting the others.
- the read address spaces 310, 312, 314 may be defined as contiguous blocks of memory addresses, each with its own starting address and ending address.
- each read address space 310, 312, 314 may have a range of addresses that corresponds to values from 0 to M-1, where M is a maximum value determined by the size of the modules 304, 306, 308 being used.
- each processing unit may interface with each read address space 310, 312, 314.
- the concurrent reads may be implemented as described herein.
- the independence of the read address spaces 310, 312, 314 ensures that each processing unit can access its desired data without causing any interference or conflict with other processing units.
- Fig. 4 shows a block diagram illustrating the memory address space with the signal interfaces of the integrated circuit of Fig. 1 in accordance with an embodiment of the present disclosure.
- the signals used in Fig. 4 may be used with any embodiment described herein. However, one of ordinary skill in the relevant art will appreciate that different signaling schemes may be used.
- the modules group 402 includes modules 404, 406, 408 that share a common write peripheral 411.
- the write peripheral 411 includes a write address bus that carries the address of the data being written, a write data bus that carries the data, and a write clock that causes the writes to occur (e.g., either on a leading or trailing edge of the clock signal, etc.). The writes only occur if the write enable signal indicates a write should occur. Any logic may be used, e.g., high voltage may correspond to 1 and a low voltage may correspond to 0, or vice versa.
- In some embodiments, the write peripheral 411 is on the chiplet 230; in other embodiments, the write peripheral 411 is on the second device 226.
- the modules group 402 comprises modules 404, 406, 408 where each has a respective read peripheral 410, 412, 414.
- Each of the read peripherals 410, 412, 414 has a read address bus to send an address for reading, a read data bus to receive the data, a read clock that controls the timing of the output of the digital data, and an output enable that is a precondition to outputting data.
- Any logic may be used, e.g., high voltage may correspond to 1 and a low voltage may correspond to 0, or vice versa.
- multi-bit or analog data storage may be used.
- Fig. 5 shows an illustration of an integrated circuit 500 that may be part of a semiconductor device such as a chiplet in accordance with an embodiment of the present disclosure.
- the integrated circuit 500 may be disposed on a semiconductor device, such as a chiplet, that has a silicon substrate 506 and a second layer portion 508.
- the integrated circuit 500 may include a modules group having a plurality of modules including a first module and a second module, etc. even though only a single module 502 is shown.
- the memory bit cells 522 are written to by the shared write port 512, 516, which includes both a write address bus line 512 and a write data bus 516. These buses run through the second layer 508 and can be connected to a second semiconductor device via an interposer. The second device has electrical contacts that complement those on the surface 518, allowing it to be electrically coupled to the write address and data buses.
- the memory bit cells 522 can be read from via the read port 524, 526, which includes a read address bus line 524 and a read data bus 526. Both of these buses can also run through the second layer 508 to the second semiconductor device coupled to the surface 518, which also has complementary electrical contacts to allow it to be electrically coupled to the read address and data buses.
- Various kinds of memory technologies may be used for the memory bit cells 522, such as a vertical connectivity fabric structure formed from non-volatile memory unit cells arranged in a three-dimensional column array 522.
- the memory bit cells 522 may utilize one or more of a cross-point, 3D NANDs, 3D NORs, 3D ANDs, and/or a stacked planar layer.
- the integrated circuit 500 is electrically connected to a second semiconductor device (not shown in Fig. 5) comprising another integrated circuit, which may be a system-on-chip or a Field-Programmable-Gate-Array.
- the memory bit cells 522 may be formed from various non-volatile memory types, such as FeFET, FeRAM, ReRAM, SOT, or STT. Additionally, alternatively, or optionally, the memory bit cells may be formed from non-volatile memory unit cells having 2-terminal devices, 3-terminal devices, or 4-terminal devices.
- the memory unit bit cells 522 may be formed from ferroelectric materials, such as a ferroelectric tunnel junction, a diode, a capacitor, a single-gate transistor, or a dual-gate transistor.
- the memory unit bit cells 522 may be formed from memristive materials, such as at least one ReRAM, or magnetic materials, such as at least one spin-orbit-torque device or at least one spin-transfer-torque device.
- the non-volatile memory unit cells 522 may also be formed from phase-change materials or anti-ferroelectric materials.
- the non-volatile memory unit cells 522 can be formed from other types of materials, such as phase change materials, anti-ferroelectric materials, or multi-bit PCM materials.
- the non-volatile unit cells can be formed utilizing different structures, such as resistive random-access memory (RRAM) technology, magnetic random-access memory (MRAM) technology, or ferroelectric random-access memory (FRAM) technology.
- the memory unit bit cells 522 may be formed from stacked memory layers where each layer includes a plurality of memory cells that can be accessed using shared bit lines.
- the read port 524, 526 may be coupled to the bit lines.
- the write port 512, 516 may be coupled to the word lines that control the access to each layer.
- the 3D connectivity fabric structure can be built with stacked layers of either NAND gates, NOR gates, or AND gates, and in some cases, different types of logic gates may be combined to optimize the structure's functionality.
- the 3D connectivity fabric structure may be formed utilizing through-silicon-via (TSV) technology, which allows the vertical interconnection of the different layers of the structure.
- the non-volatile memory unit cells may include 2-terminal devices, such as a capacitive or a memristive device with or without an additional selector device such as a diode in series, 3-terminal devices, such as a floating-gate transistor, a transistor with an access gate, or 4-terminal devices, such as a transistor with two access gates.
- the type and configuration of the non-volatile memory unit cells 522 may depend on the specific application requirements, including the speed, power consumption, and reliability of the circuit.
- the memory unit cell may include or be a single ferroelectric transistor or 6T SRAM cell.
- the memory unit cell may be a combination of many different devices, including, but not limited to, one or more of a transistor, a memristor, a capacitor, etc.
- a ferroelectric material can be utilized to form the non-volatile memory unit cells 522.
- the ferroelectric material may be implemented as any kind of device, including, but not limited to, a thin-film device, such as a ferroelectric tunnel junction, a capacitor, a single-gate transistor, or dual-gate transistors, etc.
- the non-volatile memory unit cells 522 may be formed from a memristive material, such as a Metal Oxide Memristor (MOM), Conductive-Bridging RAM (CBRAM), or valence change memory (VCM), each of which provides different benefits regarding power consumption, speed, endurance, etc.
- the non-volatile memory unit cells 522 may be formed from a magnetic material, such as spin-orbit-torque (SOT) devices, spin-transfer-torque (STT) devices, or perpendicular magnetic tunnel junctions (p-MTJ).
- the modules group may include many modules, each of which can be accessed through dedicated read ports 524, 526 with a dedicated read peripheral 520 while sharing the same write port 512, 516 and shared write peripheral 510.
- the shared write port 512, 516 can be configured to selectively write to one or more of the plurality of modules within the modules group including the memory bit cells 522.
- Each of the modules may have the same or different sizes, and different module sizes may be configured to optimize the utilization of the memory array with different operating scenarios, etc.
- the integrated circuit 500 may be formed utilizing different manufacturing processes and techniques, which include, but are not limited to, a CMOS or Bipolar-CMOS-DMOS (BCD) process, a silicon-on-insulator (SOI) process, a FinFET process, a silicon germanium (SiGe) process, a gallium arsenide (GaAs) process, etc.
- FIG. 6 shows a perspective of an assembly 600 having the integrated circuit of Fig. 1 implemented on a semiconductor device, such as the chiplet 230, that is electrically connected to a system-on-a-chip (“SOC”) 610 in accordance with an embodiment of the present disclosure.
- the semiconductor device in this embodiment, is the chiplet 230 that is electrically connected to a system-on-a-chip (“SOC”) 610.
- the SOC 610 includes a silicon substrate 602 on which a plurality of processing elements is formed, including a processing element 606.
- the processing elements can communicate with each other through a Network-on-Chip (“NOC”) 604, which is a communication fabric that directs data transfer between the processing elements.
- the communication fabric can take various forms, including buses, switches, NOCs, etc.
- the NOC 604 in the SOC 610 directs data traffic between the various nodes (e.g., the processing element 606) and links, which provide the communication paths between the nodes.
- the plurality of processing elements, including the processing element 606, can be any suitable type of processor capable of executing instructions, including microprocessors, graphics processing units (GPUs), digital signal processors (DSPs), or application-specific integrated circuits (ASICs).
- processors capable of executing instructions, including microprocessors, graphics processing units (GPUs), digital signal processors (DSPs), or application-specific integrated circuits (ASICs).
- GPUs graphics processing units
- DSPs digital signal processors
- ASICs application-specific integrated circuits
- the SOC 610 may comprise various modules, such as module 232, which are grouped together to provide memory functionality to the assembly 600 as described herein.
- the modules in modules group 236 can each be coupled to a respective processing element to provide it with readable memory.
- the coupling between the module (e.g., module 232) and the processing element (e.g., 606) can be achieved through interconnects on the silicon substrate 602.
- a second layer 608 can be disposed on top of the substrate.
- the second layer 608 can be any suitable material, such as an insulating material, a metal, a dielectric, or interconnect layer, and it may be bonded to the chiplet 230.
- the bonding can be done using any suitable technique, including, but not limited to, adhesives, soldering, or welding.
- the assembly 600 provides a means of integrating the chiplet 230, which can include the integrated circuit of FIG. 1, with the SOC 610. Integrating the chiplet 230 provides various advantages, such as enhanced functionality, higher performance, and lower power consumption. Moreover, the integration of the chiplet 230 with the SOC 610 can be accomplished in various ways, depending on the particular application and design objectives of the system.
- the assembly 600 can incorporate various variations and modifications, depending on the specific requirements of the system.
- the processing elements formed on the silicon substrate 602 can vary in their number, type, and arrangement.
- the modules in modules group 236 can vary in their number, type, and function.
- the second layer 608 can be modified to include additional functionality.
- the second layer 608 can include passive components, such as resistors, capacitors, and inductors, or active components, such as transistors or diodes. Incorporating these components in the second layer 608 can further enhance the functionality and performance of the system.
- the assembly 600 can incorporate a heterogeneous integration approach, where the chiplet 230 is fabricated using a different technology than that used for the SOC 610. This approach allows for the optimal use of different fabrication technologies for different parts of the system, resulting in improved performance and reduced power consumption.
- An integrated circuit comprising: a modules group having a plurality of modules including a first module and a second module; a shared write port configured to write to the modules group; a first read port configured to read from the first module; and a second read port configured to read from the second module.
- the first and second read ports are configured to be inactive when a write operation is applied to the shared write port.
- each module of the plurality of modules includes an independent read port for concurrent reading via a respective independent read port of any of the plurality of modules whereby the first read port is configured to read from the first module concurrently with another read of the second module using the second read port.
- the chiplet further comprises a plurality of input/output (I/O) bonds, wherein each of the plurality of input/output (I/O) bonds is configured to provide a respective read port for a respective module of the plurality of modules, wherein each of the plurality of input/output (I/O) bonds is configured to interface with a complementary respective input/output (I/O) bond of a second device as bonded with the chiplet.
- a semiconductor device comprising: the integrated circuit according to aspect 1.
- the semiconductor device further comprises a plurality of input/output (I/O) bonds, wherein each of the plurality of input/output (I/O) bonds is configured to provide a respective read port for a respective module of the plurality of modules, wherein each of the plurality of input/output (I/O) bonds is configured to electrically connect with a complementary respective input/output (I/O) bond of the second semiconductor device.
- Chiplet die has a first side and a second side, wherein the second side is configured to be face-to-face bonded to the second semiconductor device.
- the first read port is disposed on the Chiplet die, the Chiplet die having a surface, the first module comprising: a memory array comprising a plurality of memory unit cells arranged within the first module; and a read peripheral circuitry configured to read data stored within the memory array via the first read port.
- the semiconductor device further comprising a write peripheral circuitry connected to the shared write port on the surface of the Chiplet die, wherein: the write peripheral circuitry defines a first footprint on the surface, the first footprint does not overlap with a footprint of the memory array and the first footprint does not overlap with a footprint of the first read port, and the shared write port defines a second footprint on the surface, the second footprint does not overlap with the footprint of the memory array and does not overlap with the footprint of the first read port.
- the memory unit cells include at least one of a magnetic material, a spin orbit torque material, a spin-transfer-torque material, a phase-change material, and an anti-ferroelectric material.
- each dedicated read peripheral has an independent clock.
- each clock of the plurality of clocks feeds a respective module of the plurality of modules whereby each respective module has decoupled timing relative to other modules of the plurality of modules, wherein each of the plurality of clocks is configured for clocking a respective read port of a respective module of the plurality of modules.
- the second semiconductor device comprising a plurality of processing elements, whereby each processing element includes a respective interface to communicate with a respective module of the plurality of modules on the modules group when the second semiconductor device is bonded to the chiplet.
- the chiplet comprising an interface to the shared write port on the second side to thereby interface with a complementary interface on the second semiconductor device.
- the integrated circuit according to aspect 1, wherein the plurality of modules is formed from a non-volatile memory selected from the group consisting of a FeFET, a FeRAM, a ReRAM, a SOT (Spin Orbit Torque), and a STT (Spin Transfer Torque).
- each dedicated write peripheral is configured to utilize a shared clock.
- the integrated circuit is disposed on a first semiconductor device, the integrated circuit further comprising: a shared write peripheral configured to write to the modules group via the shared write port, the shared write peripheral disposed on a silicon substrate of the first semiconductor device; a first read peripheral configured to read the first module via the first read port, wherein the first read peripheral is disposed on the silicon substrate of the first semiconductor device; and a second read peripheral configured to read the second module via the second read port, wherein the second read peripheral is disposed on the silicon substrate of the first semiconductor device, wherein the first module and the second module are disposed on a second layer of the first semiconductor device.
- shared write port includes a write data bus that traverses through the second layer of the first semiconductor device.
- a method comprising: forming an integrated circuit of any one of aspects 1-98.
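The arrangement recited above (one shared write port for a modules group, an independent read port per module, and read ports held inactive while a write is applied) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the disclosed circuit: the names `Module`, `ModulesGroup`, `write`, `read`, and `read_all` are hypothetical, memory cells are modeled as a Python list of integers, and concurrency is modeled as one list of per-module reads served in a single step.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One memory module with its own read port (hypothetical model)."""
    size: int
    cells: list = field(init=False)

    def __post_init__(self):
        # All cells start at 0 in this model.
        self.cells = [0] * self.size


class ModulesGroup:
    """Behavioral model: one shared write port, one read port per module."""

    def __init__(self, sizes):
        # Modules may have the same or different sizes.
        self.modules = [Module(size) for size in sizes]
        self.write_active = False

    def write(self, targets, address, value):
        """Shared write port: write `value` only to the selected modules."""
        self.write_active = True  # read ports are inactive during a write
        try:
            for index in targets:
                self.modules[index].cells[address] = value
        finally:
            self.write_active = False

    def read(self, index, address):
        """Independent read port of module `index`."""
        if self.write_active:
            raise RuntimeError("read ports are inactive during a write")
        return self.modules[index].cells[address]

    def read_all(self, addresses):
        """Model concurrent reads: one address per module, served together."""
        return [self.read(i, a) for i, a in enumerate(addresses)]


# Usage: two modules of different sizes behind one shared write port.
group = ModulesGroup([4, 8])
group.write(targets=[0, 1], address=2, value=7)  # write to both modules
group.write(targets=[1], address=5, value=9)     # selective write to module 1
print(group.read_all([2, 5]))  # [7, 9]
```

The `targets` argument stands in for the selective write described for the shared write port, and `read_all` stands in for each processing element reading its own module at the same time. Per-module read clocks are not modeled here.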
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Power Engineering (AREA)
- Microelectronics & Electronic Packaging (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Semiconductor Integrated Circuits (AREA)
- Semiconductor Memories (AREA)
- Testing Or Measuring Of Semiconductors Or The Like (AREA)
- Medicinal Preparation (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
- Thin Film Transistor (AREA)
- Static Random-Access Memory (AREA)
- Non-Volatile Memory (AREA)
Abstract
The invention relates to a semiconductor package that includes a memory wafer and an application wafer bonded together to create a data processing and memory storage system. The memory wafer hosts multiple module groups, each having several modules, a shared write port, and independent read ports, enabling simultaneous read operations. The application wafer, which is in operative communication with the memory wafer, may include various processing elements such as FPGA cores, GPUs, CPUs, and neural network accelerators. The two wafers may have coextensive outer perimeters, support configurations such as rectangular, square, and round shapes, and provide substantial memory capacity. The package may use bumpless bonding technology to reduce thermal and electrical resistance and incorporates through-silicon vias (TSVs) for robust interconnectivity.
Applications Claiming Priority (14)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363518988P | 2023-08-11 | 2023-08-11 | |
| US63/518,988 | 2023-08-11 | ||
| US202363602737P | 2023-11-27 | 2023-11-27 | |
| US202363602733P | 2023-11-27 | 2023-11-27 | |
| US63/602,737 | 2023-11-27 | ||
| US63/602,733 | 2023-11-27 | ||
| US202463567649P | 2024-03-20 | 2024-03-20 | |
| US63/567,649 | 2024-03-20 | ||
| US202463637742P | 2024-04-23 | 2024-04-23 | |
| US202463637764P | 2024-04-23 | 2024-04-23 | |
| US63/637,742 | 2024-04-23 | ||
| US63/637,764 | 2024-04-23 | ||
| US202463674471P | 2024-07-23 | 2024-07-23 | |
| US63/674,471 | 2024-07-23 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025038365A1 (fr) | 2025-02-20 |
Family
ID=92538715
Family Applications (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/041390 Pending WO2025038365A1 (fr) | 2023-08-11 | 2024-08-08 | Integrated circuit having memories and a shared write port |
| PCT/US2024/041392 Pending WO2025038366A1 (fr) | 2023-08-11 | 2024-08-08 | Method and system for testing stacked face-to-face bonded chiplets |
| PCT/US2024/041396 Pending WO2025038368A1 (fr) | 2023-08-11 | 2024-08-08 | Integrated circuit with microvault-type memories |
| PCT/US2024/041393 Pending WO2025038367A1 (fr) | 2023-08-11 | 2024-08-08 | System and method for correct-by-construction timing closure for face-to-face bonding |
| PCT/US2024/041400 Pending WO2025038369A1 (fr) | 2023-08-11 | 2024-08-08 | FeFET structures using amorphous oxide semiconductor channels on integrated circuits |
| PCT/US2024/041395 Pending WO2025042587A1 (fr) | 2023-08-11 | 2024-08-08 | Assembly having a face-to-face bonded chiplet |
| PCT/US2024/041403 Pending WO2025038372A1 (fr) | 2023-08-11 | 2024-08-08 | System, method, and apparatus for wafer-scale memory |
Family Applications After (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/041392 Pending WO2025038366A1 (fr) | 2023-08-11 | 2024-08-08 | Method and system for testing stacked face-to-face bonded chiplets |
| PCT/US2024/041396 Pending WO2025038368A1 (fr) | 2023-08-11 | 2024-08-08 | Integrated circuit with microvault-type memories |
| PCT/US2024/041393 Pending WO2025038367A1 (fr) | 2023-08-11 | 2024-08-08 | System and method for correct-by-construction timing closure for face-to-face bonding |
| PCT/US2024/041400 Pending WO2025038369A1 (fr) | 2023-08-11 | 2024-08-08 | FeFET structures using amorphous oxide semiconductor channels on integrated circuits |
| PCT/US2024/041395 Pending WO2025042587A1 (fr) | 2023-08-11 | 2024-08-08 | Assembly having a face-to-face bonded chiplet |
| PCT/US2024/041403 Pending WO2025038372A1 (fr) | 2023-08-11 | 2024-08-08 | System, method, and apparatus for wafer-scale memory |
Country Status (2)
| Country | Link |
|---|---|
| TW (7) | TW202527683A (fr) |
| WO (7) | WO2025038365A1 (fr) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130336074A1 (en) * | 2010-01-29 | 2013-12-19 | Mosys, Inc. | Hierarchical Multi-Bank Multi-Port Memory Organization |
| US20190278520A1 (en) * | 2016-07-28 | 2019-09-12 | Centec Networks (Su Zhou) Co., Ltd. | Data processing method and system for 2r1w memory |
| US20190325950A1 (en) * | 2018-04-24 | 2019-10-24 | Arm Limited | Multi-Port Memory Circuitry |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2102867B1 (fr) * | 2006-12-14 | 2013-07-31 | Rambus Inc. | Multi-die memory device |
| US7978721B2 (en) * | 2008-07-02 | 2011-07-12 | Micron Technology Inc. | Multi-serial interface stacked-die memory architecture |
| JP2011081732A (ja) * | 2009-10-09 | 2011-04-21 | Elpida Memory Inc | Semiconductor device, adjustment method thereof, and data processing system |
| JP2012208975A (ja) * | 2011-03-29 | 2012-10-25 | Renesas Electronics Corp | Semiconductor device |
| KR101466013B1 (ko) * | 2012-08-13 | 2014-11-27 | 한국표준과학연구원 | Amorphous oxide semiconductor layer and thin film transistor including the same |
| US10289604B2 (en) * | 2014-08-07 | 2019-05-14 | Wisconsin Alumni Research Foundation | Memory processing core architecture |
| US11119910B2 (en) * | 2016-09-27 | 2021-09-14 | Spin Memory, Inc. | Heuristics for selecting subsegments for entry in and entry out operations in an error cache system with coarse and fine grain segments |
| US10586786B2 (en) * | 2016-10-07 | 2020-03-10 | Xcelsis Corporation | 3D chip sharing clock interconnect layer |
| WO2018125118A1 (fr) * | 2016-12-29 | 2018-07-05 | Intel Corporation | Back-end ferroelectric field-effect transistor devices |
| WO2018236353A1 (fr) * | 2017-06-20 | 2018-12-27 | Intel Corporation | Embedded non-volatile memory based on ferroelectric field-effect transistors |
| DE112017007888T5 (de) * | 2017-09-29 | 2020-05-07 | Intel Corporation | Ferroelectric double-gate field-effect transistor |
| US11043472B1 (en) * | 2019-05-31 | 2021-06-22 | Kepler Compute Inc. | 3D integrated ultra high-bandwidth memory |
| TW202122993A (zh) * | 2019-08-13 | 2021-06-16 | 埃利亞德 希勒爾 | Memory-based processor |
| US11687472B2 (en) * | 2020-08-20 | 2023-06-27 | Global Unichip Corporation | Interface for semiconductor device and interfacing method thereof |
| US20220352379A1 (en) * | 2021-04-29 | 2022-11-03 | Taiwan Semiconductor Manufacturing Company Limited | Ferroelectric memory devices having improved ferroelectric properties and methods of making the same |
-
2024
- 2024-08-08 WO PCT/US2024/041390 patent/WO2025038365A1/fr active Pending
- 2024-08-08 TW TW113129743A patent/TW202527683A/zh unknown
- 2024-08-08 WO PCT/US2024/041392 patent/WO2025038366A1/fr active Pending
- 2024-08-08 TW TW113129726A patent/TW202533421A/zh unknown
- 2024-08-08 WO PCT/US2024/041396 patent/WO2025038368A1/fr active Pending
- 2024-08-08 WO PCT/US2024/041393 patent/WO2025038367A1/fr active Pending
- 2024-08-08 TW TW113129693A patent/TW202531863A/zh unknown
- 2024-08-08 TW TW113129719A patent/TW202527723A/zh unknown
- 2024-08-08 TW TW113129781A patent/TW202523076A/zh unknown
- 2024-08-08 TW TW113129713A patent/TW202526960A/zh unknown
- 2024-08-08 WO PCT/US2024/041400 patent/WO2025038369A1/fr active Pending
- 2024-08-08 WO PCT/US2024/041395 patent/WO2025042587A1/fr active Pending
- 2024-08-08 WO PCT/US2024/041403 patent/WO2025038372A1/fr active Pending
- 2024-08-08 TW TW113129768A patent/TW202526570A/zh unknown
Non-Patent Citations (1)
| Title |
|---|
| HIROFUMI SHINOHARA ET AL: "A FLEXIBLE MULTIPORT RAM COMPILER FOR DATA PATH", IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE, USA, vol. 26, no. 3, 1 March 1991 (1991-03-01), pages 343 - 348, XP000222612, ISSN: 0018-9200, DOI: 10.1109/4.75013 * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025038367A1 (fr) | 2025-02-20 |
| TW202523076A (zh) | 2025-06-01 |
| WO2025042587A1 (fr) | 2025-02-27 |
| TW202526570A (zh) | 2025-07-01 |
| TW202533421A (zh) | 2025-08-16 |
| WO2025038366A1 (fr) | 2025-02-20 |
| TW202526960A (zh) | 2025-07-01 |
| TW202527683A (zh) | 2025-07-01 |
| WO2025038372A1 (fr) | 2025-02-20 |
| TW202527723A (zh) | 2025-07-01 |
| TW202531863A (zh) | 2025-08-01 |
| WO2025038368A1 (fr) | 2025-02-20 |
| WO2025038369A1 (fr) | 2025-02-20 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US11923341B2 (en) | Memory device including modular memory units and modular circuit units for concurrent memory operations | |
| US12243575B2 (en) | Memory system having combined high density, low bandwidth and low density, high bandwidth memories | |
| TWI767489B (zh) | High-capacity memory module including wafer-level memory circuits | |
| US12027513B2 (en) | Layout design methodology for stacked devices | |
| US20230051480A1 (en) | Signal routing between memory die and logic die for mode based operations | |
| CN113626374B (zh) | Stacked chip | |
| CN216118778U (zh) | Stacked chip | |
| CN113626373A (zh) | Integrated chip | |
| WO2025038365A1 (fr) | Integrated circuit having memories and a shared write port | |
| CN216118777U (zh) | Integrated chip | |
| CN113722268B (zh) | Stacked chip with integrated storage and computing | |
| US20200019348A1 (en) | Apparatuses, systems, and methods to store pre-read data associated with a modify-write operation | |
| CN120676642A (zh) | Memory chip and method for preparing a memory chip | |
| CN120676644A (zh) | Memory chip and method for preparing a memory chip | |
| CN120547881A (zh) | In-memory computing chip | |
| CN119960678A (zh) | Hybrid storage-computing architecture and storage-computing method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24761798; Country of ref document: EP; Kind code of ref document: A1 |