WO2025078035A1

WO2025078035A1 - Mechanisms for reporting data over die-to-die interfaces

Info

Publication number: WO2025078035A1
Application number: PCT/EP2024/053370
Authority: WO
Inventors: Sachin ATHANIKAR; Francois Piednoel
Original assignee: Mercedes Benz Group AG
Current assignee: Mercedes Benz Group AG
Priority date: 2023-10-12
Filing date: 2024-02-09
Publication date: 2025-04-17
Anticipated expiration: 2026-04-12

Abstract

A multi-chiplet computing device (100) includes a primary chiplet (110) comprising a first mainband communication interface (112) and a first sideband communication interface (114). The computing device (100) further includes a secondary chiplet (120) comprising a second mainband communication interface (122) and a second sideband communication interface (124). The computing device (100) also includes an interconnect (130) comprising (i) a mainband data path (115) to transmit communications between the first mainband communication interface (112) and the second mainband communication interface (122), and (ii) a sideband data path (125) to transmit reporting and scheduling data between the first sideband communication interface (114) and the second sideband communication interface (124).

Description

MECHANISMS FOR REPORTING DATA OVER DIE-TO-DIE INTERFACES

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit of priority of U.S. Provisional Application No. 63/543,791, filed on October 12, 2023; which is hereby incorporated by reference in its entirety.

BACKGROUND

[0002] Universal Chiplet Interconnect Express (UCIe) provides an open specification for an interconnect and serial bus between chiplets, which enables the production of large system-on-chip (SoC) packages with intermixed components from different silicon manufacturers. It is contemplated that autonomous vehicle computing systems may operate using chiplet arrangements that follow the UCIe specification. One goal of creating such computing systems is to achieve the robust safety integrity levels of other important electrical and electronic (E/E) automotive components of the vehicle.

SUMMARY

[0003] Systems and methods are described herein in which a multi-chiplet computing apparatus includes at least a primary chiplet and a secondary chiplet. In certain implementations, the primary and secondary chiplets can be included on a SoC comprising any number of chiplets communicating over interconnects (e.g., standardized UCIe interconnects) that include a mainband (e.g., constituting the main data path for UCIe) and a sideband used to, for example, communicate register accesses, parameter exchanges, debugging communications, and compliance communications. In certain implementations, the mainband data path can include a forwarded clock, data valid pin, track pin, and any number of lanes of data per module.

[0004] A multi-chiplet computing apparatus can include a primary chiplet comprising a first mainband communication interface and a first sideband communication interface, and a secondary chiplet comprising a second mainband communication interface and a second sideband communication interface. In various examples, the primary chiplet and secondary chiplet can be connected by an interconnect that includes a mainband data path and a sideband data path. In some embodiments, the mainband data path can include communications transmitted between the first mainband communication interface and the second mainband communication interface, and the sideband data path can include reporting and scheduling data transmitted between the first sideband communication interface and the second sideband communication interface.

[0005] In various implementations, each of the primary chiplet and secondary chiplet can comprise a set of hardware components that can include processing resources, sensors, high-bandwidth memory, caches or other shared memory, machine learning accelerators, general compute resources, and the like. The reporting data can correspond to power consumption data for the first set of hardware components of the primary chiplet, which can indicate how much power each component is using at any given time. The power consumption data can comprise measurements of voltage, current, resistance, line frequency, power factor, wattage per unit time, and/or leakage.
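The disclosure does not define a format for power reports. The following is a minimal sketch of how such a report might be structured and summarized. It assumes per-component samples carrying voltage, current and power factor, real power computed as V × I × PF, and equally spaced samples. `PowerSample` and `summarize_power` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerSample:
    """One hypothetical power reading for a hardware component of the primary chiplet."""
    component: str
    voltage_v: float
    current_a: float
    power_factor: float = 1.0  # 1.0 for DC / purely resistive loads

    @property
    def power_w(self) -> float:
        # Real power P = V * I * PF
        return self.voltage_v * self.current_a * self.power_factor


def summarize_power(samples: list[PowerSample], interval_s: float) -> dict[str, tuple[float, float]]:
    """Return {component: (average power in W, energy in J)} for equally spaced samples."""
    per_component: dict[str, list[float]] = {}
    for s in samples:
        per_component.setdefault(s.component, []).append(s.power_w)
    summary = {}
    for name, powers in per_component.items():
        average_w = sum(powers) / len(powers)
        energy_j = sum(powers) * interval_s  # rectangle-rule integration over the window
        summary[name] = (average_w, energy_j)
    return summary
```

A secondary chiplet receiving such samples over the sideband could compare the per-component averages against expected budgets.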

[0006] In certain examples, the reporting and scheduling data can comprise aging data for the first set of hardware components of the primary chiplet, which can indicate whether one or more of the hardware components are operating outside nominal ranges (e.g., factory-tested and/or calibrated ranges). As provided herein, these ranges can correspond to processing speed, temperature ranges, resistance, and the like. The aging data can be measured by one or more sensors on the primary chiplet and can be processed and/or monitored by the secondary chiplet to ensure the primary chiplet is operating nominally. In some embodiments, the aging data can comprise directly measured data and/or sensor data from the first set of hardware components, where the sensor data can indicate at least one of voltage data, resistance data, or thermal data for each component of the first set of hardware components.
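A minimal sketch of the nominal-range check a secondary chiplet might apply to aging data. The range values, the `(component, metric)` keys and the function names are placeholders, not calibrated figures from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NominalRange:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical factory-calibrated ranges per (component, metric); values are placeholders.
NOMINAL_RANGES = {
    ("cpu0", "temperature_c"): NominalRange(-40.0, 105.0),
    ("cpu0", "voltage_v"): NominalRange(0.72, 0.88),
    ("hbm", "resistance_ohm"): NominalRange(45.0, 55.0),
}


def out_of_range(aging_report: dict[tuple[str, str], float]) -> list[tuple[str, str, float]]:
    """Return (component, metric, value) entries that fall outside their nominal range.

    Metrics without a configured range are ignored rather than flagged.
    """
    flagged = []
    for key, value in aging_report.items():
        rng = NOMINAL_RANGES.get(key)
        if rng is not None and not rng.contains(value):
            flagged.append((key[0], key[1], value))
    return flagged
```

Ignoring unknown metrics is a design choice. A stricter monitor might instead treat an unexpected metric as a reporting fault.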

[0007] According to some embodiments, the reporting and scheduling data can include timing data based on one or more clock signals from the primary chiplet (e.g., clock forwarding signals). In some examples, the clock signals are used by the secondary chiplet to, for example, dynamically determine whether the primary chiplet is operating as calibrated. In certain aspects, each chiplet can include an embedded or internal clock, and can transmit timestamps associated with transmitted data, which can be used to compare with timestamps associated with other data to be processed by the chiplet. In further examples, the reporting and scheduling data can correspond to timing data for scheduling workloads by the primary chiplet and/or an SoC comprising a plurality of chiplets. For example, the primary chiplet can report timestamps for when a particular sensor data item was received and/or when the primary chiplet has completed processing a particular sensor data item. Such applications can be performed, for example, in autonomous vehicle implementations in which workloads are processed based on sensor data providing a sensor view of a surrounding environment of the vehicle.
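The timestamp reporting described above can be sketched as a latency check on the secondary chiplet. This assumes both timestamps are already expressed in a common time base in nanoseconds, and that a fixed per-item latency budget applies. `TimingReport`, `check_latencies` and the budget value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingReport:
    """Hypothetical timing record sent by the primary chiplet over the sideband."""
    item_id: int
    received_ns: int   # when the sensor data item arrived at the primary chiplet
    completed_ns: int  # when the primary chiplet finished processing it


def check_latencies(reports: list[TimingReport], budget_ns: int) -> list[int]:
    """Return IDs of items whose processing latency exceeds the budget,
    or whose timestamps are inconsistent (completion before receipt)."""
    flagged = []
    for r in reports:
        latency_ns = r.completed_ns - r.received_ns
        if latency_ns < 0 or latency_ns > budget_ns:
            flagged.append(r.item_id)
    return flagged
```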

[0008] In various examples, the secondary chiplet can predict, based on the reporting data from the primary chiplet, whether a fault or failure will occur in the first set of hardware components of the primary chiplet within a future timeframe. For example, when the secondary chiplet predicts that a fault or failure will occur in the first set of hardware components within the future timeframe, the secondary chiplet can transmit a mitigative message to the primary chiplet to perform one or more mitigative actions. The one or more mitigative actions can include one or more of: degrading or throttling the first set of hardware components (e.g., in accordance with a degradation scheme), switching processing from the primary chiplet to a backup chiplet, or shutting down the primary chiplet. In examples with multiple SoCs (e.g., a dual-SoC arrangement in which a primary SoC is monitored by a backup SoC), the mitigative actions can include switching roles between the primary and backup chiplets. Further description of degrading performance of one or more components of an SoC may be found in U.S. Patent Application No. 18/229,491, which is incorporated by reference in its entirety.
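The disclosure does not say how the prediction is made. The sketch below substitutes a simple technique, a least-squares linear trend extrapolated over a horizon, purely to illustrate the decision flow. The tiering (switch if the limit is already exceeded, throttle if it is predicted to be exceeded, otherwise do nothing) and all names are hypothetical.

```python
from enum import Enum


class Mitigation(Enum):
    NONE = "none"
    THROTTLE = "throttle"          # degrade per a degradation scheme
    SWITCH_TO_BACKUP = "switch"    # hand processing to a backup chiplet


def predict_value(times: list[float], values: list[float], t_future: float) -> float:
    """Ordinary least-squares line through (times, values), evaluated at t_future."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return mean_v + slope * (t_future - mean_t)


def choose_mitigation(times: list[float], values: list[float],
                      limit: float, horizon: float) -> Mitigation:
    """Pick a mitigative action from a monitored metric (e.g., temperature)."""
    if values[-1] >= limit:
        return Mitigation.SWITCH_TO_BACKUP
    if predict_value(times, values, times[-1] + horizon) >= limit:
        return Mitigation.THROTTLE
    return Mitigation.NONE
```

For a metric rising by 2 units per second from 80, the trend predicts 96 five seconds after the last sample. A limit of 95 would therefore trigger throttling before the limit is actually reached.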

[0009] As provided herein, the interconnect can comprise a standardized Universal Chiplet Interconnect Express (UCIe) interconnect, and the multi-chiplet computing apparatus can be included on a system-on-chip (SoC) that processes data (e.g., sensor data providing a sensor view for an external environment of an autonomous vehicle). In further implementations, the secondary chiplet and the primary chiplet can reverse roles or can each perform monitoring functions on the other chiplet based on reporting data from each chiplet.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] The disclosure herein is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements, and in which:

[0011] FIG. 1 is a block diagram depicting a multi-chiplet computing apparatus 100 that includes a die-to-die interconnect between chiplets, in accordance with examples described herein;

[0012] FIG. 2 is a block diagram depicting a system-on-chip (SoC) in which examples described herein may be implemented, in accordance with examples described herein;

[0013] FIG. 3 is a block diagram depicting an example multiple system-on-chip (MSoC), in accordance with examples described herein; and

[0014] FIGS. 4 and 5 are flow charts describing example methods of reporting information between chiplets, according to various examples.

DETAILED DESCRIPTION

[0015] In certain implementations, example computing systems described herein can perform one or more functions described herein using a learning-based approach, such as by executing an artificial neural network (e.g., a recurrent neural network, convolutional neural network, etc.) or one or more machine-learning models. Such learning-based approaches can further correspond to the computing system storing or including one or more machine-learned models. In an embodiment, the machine-learned models may include an unsupervised learning model. In an embodiment, the machine-learned models may include neural networks (e.g., deep neural networks) or other types of machine-learned models, including non-linear models and/or linear models. Neural networks may include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks or other forms of neural networks. Some example machine-learned models may leverage an attention mechanism such as self-attention. For example, some example machine-learned models may include multi-headed self-attention models (e.g., transformer models).

[0016] As provided herein, a "network" or "one or more networks" can comprise any type of network or combination of networks that allows for communication between devices. In an embodiment, the network may include one or more of a local area network, wide area network, the Internet, secure network, cellular network, mesh network, peer-to-peer communication link or some combination thereof and may include any number of wired or wireless links. Communication over the network(s) may be accomplished, for instance, via a network interface using any type of protocol, protection scheme, encoding, format, packaging, etc.

[0017] One or more examples described herein provide that methods, techniques, and actions performed by a computing device are performed programmatically, or as a computer implemented method. Programmatically, as used herein, means through the use of code or computer-executable instructions. These instructions can be stored in one or more memory resources of the computing device. A programmatically performed step may or may not be automatic. In some examples, a computing "apparatus" can comprise a computing system, such as a system of one or more servers, or an on-board, autonomous vehicle computing system. In variations, a computing apparatus can comprise a computing device, such as computing resources included on a circuit board, personal computer, smartphone computer, tablet computer, laptop, and the like.

[0018] One or more examples described herein can be implemented using programmatic modules, engines, or components. A programmatic module, engine, or component can include a program, a sub-routine, a portion of a program, or a software component or a hardware component capable of performing one or more stated tasks or functions. As used herein, a module or component can exist on a hardware component independently of other modules or components. Alternatively, a module or component can be a shared element or process of other modules, programs or machines.

[0019] Some examples described herein can generally require the use of computing devices, including processing and memory resources. For example, one or more examples described herein may be implemented, in whole or in part, on computing devices such as servers and/or personal computers using network equipment (e.g., routers). Memory, processing, and network resources may all be used in connection with the establishment, use, or performance of any example described herein (including with the performance of any method or with the implementation of any system).

[0020] Furthermore, one or more examples described herein may be implemented through the use of instructions that are executable by one or more processors. These instructions may be carried on a non-transitory computer-readable medium. Machines shown or described with figures below provide examples of processing resources and computer-readable mediums on which instructions for implementing examples disclosed herein can be carried and/or executed. In particular, the numerous machines shown with examples of the invention include processors and various forms of memory for holding data and instructions. Examples of non-transitory computer-readable mediums include permanent memory storage devices, such as hard drives on personal computers or servers. Other examples of computer storage mediums include portable storage units, such as flash memory or magnetic memory. Computers, terminals, network-enabled devices are all examples of machines and devices that utilize processors, memory, and instructions stored on computer-readable mediums. Additionally, examples may be implemented in the form of computer programs, or a computer usable carrier medium capable of carrying such a program.

[0021] MULTI-CHIPLET COMPUTING APPARATUS

[0022] FIG. 1 is a block diagram depicting a multi-chiplet computing apparatus 100 that includes a die-to-die interconnect 130 between chiplets 110, 120, in accordance with examples described herein. In an embodiment, the multi-chiplet computing apparatus 100 can include a primary chiplet 110 and a secondary chiplet 120. Each chiplet can include one or more control circuits that may include one or more processors (e.g., microprocessors), one or more processing cores, a programmable logic circuit (PLC) or a programmable logic/gate array (PLA/PGA), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or any other control circuit. For example, the multi-chiplet computing apparatus 100 can be included on a system-on-chip (SoC) comprising a plurality of chiplets arranged with interconnects similar to the die-to-die interconnect 130 shown in FIG. 1. In some implementations, the control circuit(s) and/or the multi-chiplet computing apparatus 100 may be part of, or may form, a vehicle control unit (also referred to as a vehicle controller) that is embedded or otherwise disposed in a vehicle (e.g., a Mercedes-Benz® car, truck, or van). For example, the vehicle controller may be or may include an infotainment system controller (e.g., an infotainment headunit), a telematics control unit (TCU), an electronic control unit (ECU), a central powertrain controller (CPC), a central exterior & interior controller (CEIC), a zone controller, an autonomous vehicle control system, or any other controller (the term "or" is used herein interchangeably with "and/or").

[0023] In an embodiment, the control circuit(s) may be programmed by one or more computer-readable or computer-executable instructions stored on the non-transitory computer-readable medium. The non-transitory computer-readable medium may be a memory device, also referred to as a data storage device, which may include an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. The non-transitory computer-readable medium may form, for example, a computer diskette, a hard disk drive (HDD), a solid state drive (SSD) or solid state integrated memory, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), dynamic random access memory (DRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), and/or a memory stick. In some cases, the non-transitory computer-readable medium may store computer-executable instructions or computer-readable instructions, such as instructions to perform the methods described below in connection with FIGS. 4 and 5.

[0024] In various embodiments, the terms "computer-readable instructions" and "computer-executable instructions" are used to describe software instructions or computer code configured to carry out various tasks and operations. In various embodiments, if the computer-readable or computer-executable instructions form modules, the term "module" refers broadly to a collection of software instructions or code configured to cause the control circuit to perform one or more functional tasks. The modules and computer-readable/executable instructions may be described as performing various operations or tasks when the control circuit(s) or other hardware components execute the modules or computer-readable instructions.

[0025] In further embodiments, the primary chiplet 110 and secondary chiplet 120 can each include a mainband communication interface 112, 122 that enables communications over a mainband data path 115 between the primary chiplet 110 and the secondary chiplet 120. The primary chiplet 110 and the secondary chiplet 120 can each further include a sideband communication interface 114, 124 that enables sideband communications over a sideband data path 125 between the primary chiplet 110 and the secondary chiplet 120.

[0026] As provided herein, the mainband data path 115 can constitute a performance-based data path for UCIe data communications, and can be used to transmit large amounts of data (e.g., processed sensor data to a shared memory). The sideband data path 125 can comprise a high-reliability data path that may be used to, for example, communicate reporting data (e.g., timing, power, and aging data), as well as functional safety data and error correction code (ECC) information. As an example, the mainband data path 115 can comprise a 16 Gbps data path whereas the sideband data path 125 can comprise an 800 MHz data path.
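The traffic split described above can be sketched as a routing rule: reporting and safety traffic goes to the sideband, bulk data to the mainband. The message-kind strings and `select_path` are hypothetical. The disclosure names these categories of traffic but does not define an encoding for them.

```python
from enum import Enum, auto


class Path(Enum):
    MAINBAND = auto()  # performance path, e.g., bulk processed sensor data
    SIDEBAND = auto()  # high-reliability path, e.g., reporting and FuSa data


# Hypothetical message categories carried on the sideband data path 125.
SIDEBAND_KINDS = {"timing", "power", "aging", "fusa", "ecc"}


def select_path(kind: str) -> Path:
    """Route reporting/safety traffic to the sideband, everything else to the mainband."""
    return Path.SIDEBAND if kind in SIDEBAND_KINDS else Path.MAINBAND
```

Defaulting unknown kinds to the mainband keeps the low-bandwidth sideband free for the small set of messages that need its reliability.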

[0027] In other examples, the mainband data path 115 can support high-bandwidth networking and/or datacenter applications (e.g., providing upwards of 32 Gbps/pin). The sideband data path 125 can also support increased bandwidth (e.g., 4 Gbps), such as for functional safety (FuSa) communications, error correction code (ECC) communications, parameter exchange and negotiations between two dies, timing data (e.g., clock forwarding), and the like. For advanced packages, the mainband data path 115 and sideband data path 125 can operate to provide redundancy in communications to, for example, provide for automotive safety integrity level (ASIL)-D ratings (e.g., for the combined hardware and communications over the D2D interconnect 130). In further examples, the mainband data path 115 and the sideband data path 125 can be independent of each other and share no common point of failure. In still further examples, the mainband data path 115 and the sideband data path 125 can have differing voltage and phase-locked loop (PLL) regulation (e.g., to reduce supply-noise sensitivity).
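One way the mainband/sideband redundancy might be used is to send a safety-critical message on both paths and compare the two copies at the receiver. The sketch below assumes a CRC-32 trailer (Python's `zlib.crc32`) for error detection. It also assumes that a single valid copy is accepted but flagged as degraded. The framing and function names are hypothetical and are not taken from the UCIe specification.

```python
import zlib
from typing import Optional


def frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 so each receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes) -> Optional[bytes]:
    """Return the payload if its CRC-32 checks out, else None."""
    if len(data) < 4:
        return None
    payload, crc = data[:-4], data[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        return None
    return payload


def receive_redundant(mainband_copy: bytes, sideband_copy: bytes) -> tuple[bytes, bool]:
    """Combine the two copies of a safety-critical message.

    Returns (payload, both_paths_ok). A single good copy is accepted but flagged
    so the caller can report the degraded path; disagreeing or missing copies raise.
    """
    a, b = unframe(mainband_copy), unframe(sideband_copy)
    if a is not None and b is not None:
        if a != b:
            raise RuntimeError("mainband and sideband copies disagree")
        return a, True
    if a is not None or b is not None:
        return (a if a is not None else b), False
    raise RuntimeError("both copies failed the CRC check")
```

Because the two paths share no common point of failure, a corrupted copy on one path can still be recovered from the other. The `False` flag ensures the fault is not silently masked.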

[0028] In one embodiment, a datacenter may utilize the mainband and sideband communication techniques described herein to provide information technology applications, store and process data for application services, provide virtual networks to support applications and workloads, provide customer relations management (CRM) services, provide messaging and file sharing services, perform remote artificial intelligence and/or machine learning services (e.g., for big data applications), perform enterprise services (e.g., resource management and planning), and the like.

[0029] EXAMPLE SYSTEM-ON-CHIP

[0030] FIG. 2 is a block diagram illustrating an example system-on-chip (SoC) 200, in accordance with examples described herein. The example SoC 200 shown in FIG. 2 can include additional components, and the components of the SoC 200 may be arranged in various alternative configurations other than the example shown. Thus, the SoC 200 of FIG. 2 is described herein as an example arrangement for illustrative purposes and is not intended to limit the scope of the present disclosure in any manner. Furthermore, the interconnects between chiplets in FIG. 2 can comprise the die-to-die interconnect 130 described with respect to FIG. 1, which can include a mainband data path and sideband data path for transmitting reporting data between the chiplets.

[0031] Referring to FIG. 2, a sensor data input chiplet 210 of the SoC 200 can receive sensor data from various vehicle sensors 205 of the vehicle. These vehicle sensors 205 can include any combination of image sensors (e.g., single cameras, binocular cameras, fisheye lens cameras, etc.), LIDAR sensors, radar sensors, ultrasonic sensors, proximity sensors, and the like. The sensor data input chiplet 210 can automatically write incoming sensor data into a cache memory 231 of the central chiplet 220 as it arrives. The sensor data input chiplet 210 can also include an image signal processor (ISP) responsible for capturing, processing, and enhancing images taken from the various vehicle sensors 205. The ISP takes the raw image data and performs a series of complex image processing operations, such as color, contrast, and brightness correction, noise reduction, and image enhancement, to create a higher-quality image that is ready for further processing or analysis by the other chiplets of the SoC 200. The ISP may also include features such as auto-focus, image stabilization, and advanced scene recognition to further enhance the quality of the captured images. The ISP can then store the higher-quality images in the cache memory 231.

[0032] In some aspects, the sensor data input chiplet 210 publishes identifying information for each item of sensor data (e.g., images, point cloud maps, etc.) to a shared memory 230 of a central chiplet 220, which acts as a central mailbox for synchronizing workloads for the various chiplets. The identifying information can include details such as an address in the cache memory 231 where the data is stored, the type of sensor data, which sensor captured the data, and a timestamp of when the data was captured.
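The identifying information published to the central mailbox can be sketched as a small descriptor record. The fields follow the list above: cache address, sensor type, capturing sensor and capture timestamp. `SensorItemDescriptor`, `Mailbox` and the `latest` query are hypothetical stand-ins for the shared memory 230, not its actual layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensorItemDescriptor:
    cache_address: int  # where the item is stored in cache memory 231
    sensor_type: str    # e.g., "camera", "lidar", "radar"
    sensor_id: str      # which sensor captured the item
    captured_ns: int    # capture timestamp


class Mailbox:
    """Hypothetical stand-in for the central mailbox in shared memory 230."""

    def __init__(self) -> None:
        self._items: list[SensorItemDescriptor] = []

    def publish(self, desc: SensorItemDescriptor) -> None:
        self._items.append(desc)

    def latest(self, sensor_type: str) -> Optional[SensorItemDescriptor]:
        """Most recently captured item of the given type, or None if there is none."""
        matches = [d for d in self._items if d.sensor_type == sensor_type]
        return max(matches, key=lambda d: d.captured_ns, default=None)
```

A consuming chiplet would then read the data itself from `cache_address` rather than copying it through the mailbox.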

[0033] To communicate with the central chiplet 220, the sensor data input chiplet 210 transmits data through an interconnect 211a. Interconnects 211a-f each represent die-to-die (D2D) interfaces between the chiplets of the SoC 200. In some aspects, the interconnects 211a-f can include high-bandwidth data paths used for general data purposes to the cache memory 231 and high-reliability data paths to transmit functional safety (FuSa) and scheduler information to the shared memory 230. Depending on bandwidth requirements, an interconnect 211a-f may include more than one die-to-die interface. For example, interconnect 211a can include two interfaces to support higher bandwidth communications between the sensor data input chiplet 210 and the central chiplet 220.

[0034] In one aspect, the interconnects 211a-f implement the Universal Chiplet Interconnect Express (UCIe) standard and communicate through an indirect mode to allow each of the chiplet host processors to access remote memory as if it were local memory. This is achieved by using a specialized Network-on-Chip (NoC) Network Interface Unit (NIU) (e.g., one that provides freedom from interference between devices connected to the network) that provides hardware-level support for remote direct memory access (RDMA) operations. In UCIe indirect mode, the host processor sends requests to the NIU, which then accesses the remote memory and returns the data to the host processor. This approach allows for efficient and low-latency access to remote memory, which can be particularly useful in distributed computing and data-intensive applications. Additionally, UCIe indirect mode provides a high degree of flexibility, as it can be used with a wide range of different network topologies and protocols.
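The indirect-mode request flow can be modeled in software. The host issues a read or write against a global address, and the NIU resolves it to a remote memory and offset. This sketch models only the address translation, not the UCIe or NoC transaction layers. `RemoteMemory`, `NetworkInterfaceUnit` and the example address window are hypothetical.

```python
class RemoteMemory:
    """A remote chiplet's memory, modeled as a byte array."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


class NetworkInterfaceUnit:
    """Hypothetical NIU: translates a global address into (remote memory, offset)."""

    def __init__(self) -> None:
        self._windows: list[tuple[int, int, RemoteMemory]] = []  # (base, size, memory)

    def map_window(self, base: int, memory: RemoteMemory) -> None:
        self._windows.append((base, len(memory.data), memory))

    def _resolve(self, addr: int, length: int) -> tuple[RemoteMemory, int]:
        for base, size, mem in self._windows:
            if base <= addr and addr + length <= base + size:
                return mem, addr - base
        raise ValueError(f"address range {addr:#x}+{length} is not mapped")

    def read(self, addr: int, length: int) -> bytes:
        # The host sees a plain load; the NIU performs the remote access.
        mem, off = self._resolve(addr, length)
        return bytes(mem.data[off:off + length])

    def write(self, addr: int, payload: bytes) -> None:
        mem, off = self._resolve(addr, len(payload))
        mem.data[off:off + len(payload)] = payload
```

From the host's perspective, `niu.write(0x8000_0010, b"hello")` followed by `niu.read(0x8000_0010, 5)` behaves like a local memory access, which is the property the paragraph above attributes to indirect mode.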

[0035] In various examples, the SoC 200 can include additional chiplets that can store, alter, or otherwise process the sensor data cached by the sensor data input chiplet 210. The SoC 200 can include an autonomous drive chiplet 240 that can perform the perception, sensor fusion, trajectory prediction, and/or other autonomous driving algorithms of the autonomous vehicle. The autonomous drive chiplet 240 can be connected to a dedicated HBM-RAM chiplet 235 in which the autonomous drive chiplet 240 can publish all status information, variables, statistical information, and/or processed sensor data as processed by the autonomous drive chiplet 240.

[0036] In various examples, the system on chip 200 can further include a machine-learning (ML) accelerator chiplet 250 that is specialized for accelerating machine-learned or AI workloads, such as image inferences or other sensor inferences using machine learning, in order to achieve high performance and low power consumption for these workloads. The ML accelerator chiplet 250 can include an engine designed to efficiently process graph-based data structures, which are commonly used in AI workloads, and a highly parallel processor, allowing for efficient processing of large volumes of data. The ML accelerator chiplet 250 can also include specialized hardware accelerators for common AI operations such as matrix multiplication and convolution as well as a memory hierarchy designed to optimize memory access for AI workloads, which often have complex memory access patterns.

[0037] The general compute chiplets 245 can provide general purpose computing for the system on chip 200. For example, the general compute chiplets 245 can comprise high-powered central processing units and/or graphical processing units that can support the computing tasks of the central chiplet 220, autonomous drive chiplet 240, and/or the ML accelerator chiplet 250.

[0038] In various implementations, the shared memory 230 can store programs and instructions for performing autonomous driving tasks. The shared memory 230 of the central chiplet 220 can further include a reservation table that provides the various chiplets with the information needed (e.g., sensor data items and their locations in memory) for performing their individual tasks. In various aspects, the central chiplet 220 also includes the large cache memory 231, which supports invalidate and flush operations for stored data. Further description of the shared memory 230 in the context of the central chiplet 220 is provided below with respect to FIG. 3.

[0039] Cache misses and evictions from the cache memory 231 are sent to a high-bandwidth memory (HBM) RAM chiplet 255 connected to the central chiplet 220. The HBM-RAM chiplet 255 can include status information, variables, statistical information, and/or sensor data for all other chiplets. In certain examples, the information stored in the HBM-RAM chiplet 255 can be stored for a predetermined period of time (e.g., ten seconds) before deleting or otherwise flushing the data. For example, when a fault occurs on the autonomous vehicle, the information stored in the HBM-RAM chiplet 255 can include all information necessary to diagnose and resolve the fault. Cache memory 231 keeps fresh data available with lower latency and lower power consumption than accessing data from the HBM-RAM chiplet 255.
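The fixed retention period can be sketched as a sliding time window. Entries older than the window (ten seconds in the example above) are flushed whenever a new entry arrives. This assumes timestamps arrive in non-decreasing order. `RetentionBuffer` is a hypothetical name.

```python
from collections import deque


class RetentionBuffer:
    """Keeps (timestamp, record) pairs for a fixed time window so that recent
    state remains available for fault diagnosis."""

    def __init__(self, window_s: float = 10.0) -> None:
        self.window_s = window_s
        self._entries: deque = deque()

    def append(self, t_s: float, record: dict) -> None:
        self._entries.append((t_s, record))
        self._expire(t_s)

    def _expire(self, now_s: float) -> None:
        # Oldest entries sit at the left; drop those older than the window.
        while self._entries and now_s - self._entries[0][0] > self.window_s:
            self._entries.popleft()

    def snapshot(self) -> list:
        """Records currently retained, oldest first (e.g., dumped on a fault)."""
        return [record for _, record in self._entries]
```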

[0040] As provided herein, the shared memory 230 can house a mailbox architecture in which a reflex program comprising a suite of instructions is used to execute workloads by the central chiplet 220, general compute chiplets 245, and/or autonomous drive chiplet 240. In certain examples, the central chiplet 220 can further execute a FuSa program that operates to compare and verify outputs of respective pipelines to ensure consistency in the ML inference operations. In still further examples, the central chiplet 220 can execute a thermal management program to ensure that the various components of the SoC 200 operate within normal temperature ranges.
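The FuSa comparison and thermal check described above can be sketched as two small functions. Redundant pipelines running on different hardware may not produce bit-identical results, so the sketch compares outputs within a tolerance. That tolerance-based comparison, the default tolerances and the function names are assumptions for illustration.

```python
import math


def outputs_consistent(a: list[float], b: list[float],
                       rel_tol: float = 1e-3, abs_tol: float = 1e-6) -> bool:
    """True if two redundant pipeline outputs agree element-wise within tolerance."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b))


def over_temperature(temps_c: dict[str, float], limit_c: float) -> list[str]:
    """Names of components whose reported temperature exceeds limit_c, sorted."""
    return sorted(name for name, t in temps_c.items() if t > limit_c)
```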

[0041] EXAMPLE MULTIPLE-SYSTEM-ON-CHIP

[0042] FIG. 3 is a block diagram depicting an example multiple system-on-chip (MSoC), in accordance with examples described herein. In various examples, the MSoC 300 can include a first SoC 310 having a first memory 315 and a second SoC 320 having a second memory 325 coupled by an interconnect 340 (e.g., an ASIL-D rated interconnect) that enables each of the first SoC 310 and second SoC 320 to read each other's memories 315, 325. During any given session, the first SoC 310 and the second SoC 320 may alternate roles, between a primary SoC and a backup SoC. As provided herein, each of the first SoC 310 and the second SoC 320 can comprise a system-on-chip 200 as shown and described with respect to FIG. 2, and may also be connected to each other via interconnect 340, which can correspond to the D2D interconnect 130 comprising the mainband data path 115 and sideband data path 125 as shown and described with respect to FIG. 1.

[0043] In some examples, the primary SoC can perform various autonomous driving tasks, such as perception, object detection and classification, grid occupancy determination, sensor data fusion and processing, motion prediction (e.g., of dynamic external entities), motion planning, and vehicle control tasks. The backup SoC can maintain a set of computational components (e.g., CPUs, ML accelerators, and/or memory chiplets) in a low power state, and continuously or periodically read the memory of the primary SoC. In variations, the primary SoC and backup SoC may be implemented in alternative computing environments, such as datacenter computing environments for managing and/or hosting application services.

[0044] In certain examples, if the first SoC 310 is the primary SoC and the second SoC 320 is the backup SoC, then the first SoC 310 performs a set of primary tasks (e.g., autonomous driving tasks) and publishes state information corresponding to these tasks in the first memory 315. The second SoC 320 reads the published state information in the first memory 315 to continuously check that the first SoC 310 is operating within nominal thresholds (e.g., temperature thresholds, bandwidth and/or memory thresholds, etc.), and that the first SoC 310 is performing the set of autonomous driving tasks properly. As such, the second SoC 320 performs health monitoring and error management tasks for the first SoC 310, and takes over control of the set of tasks (e.g., autonomous driving tasks) when a triggering condition is met. As provided herein, the triggering condition can correspond to a fault, failure, or other error experienced by the first SoC 310 that may affect the performance of the set of tasks by the first SoC 310. Such faults or failures may be reported via reporting data over the sideband data path of the interconnect 340, as described herein.
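The backup SoC's health check can be sketched as a comparison of the primary's published state against thresholds. Any violated threshold or reported fault counts as a triggering condition for takeover. The state fields, threshold values and names are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class PublishedState:
    """Hypothetical subset of the state an SoC publishes in its memory."""
    max_temp_c: float
    bandwidth_util: float  # fraction of available bandwidth, 0.0 to 1.0
    free_memory_mib: int
    faults: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Thresholds:
    max_temp_c: float = 105.0
    max_bandwidth_util: float = 0.95
    min_free_memory_mib: int = 256


def triggering_conditions(state: PublishedState, th: Thresholds) -> list[str]:
    """List the reasons (if any) why the monitored SoC is outside nominal bounds."""
    reasons = []
    if state.max_temp_c > th.max_temp_c:
        reasons.append("temperature")
    if state.bandwidth_util > th.max_bandwidth_util:
        reasons.append("bandwidth")
    if state.free_memory_mib < th.min_free_memory_mib:
        reasons.append("memory")
    reasons.extend(f"fault:{f}" for f in state.faults)
    return reasons


def should_take_over(state: PublishedState, th: Thresholds) -> bool:
    return bool(triggering_conditions(state, th))
```

The same check could run in the opposite direction, with the primary SoC reading the backup SoC's published state. In that case the response would be a reset of the backup rather than a takeover.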

[0045] In various implementations, the second SoC 320 can publish state information corresponding to its computational components being maintained in a standby state (e.g., a low power state in which the second SoC 320 maintains readiness to take over the set of tasks from the first SoC 310). In such examples, the first SoC 310 can monitor the state information of the second SoC 320 by continuously or periodically reading the memory 325 of the second SoC 320 to also perform health check monitoring and error management on the second SoC 320. For example, if the first SoC 310 detects a fault, failure, or other error in the second SoC 320, the first SoC 310 can trigger the second SoC 320 to perform a system reset or reboot.

[0046] In certain examples, the first SoC 310 and the second SoC 320 can each include a functional safety (FuSa) component that performs the health monitoring (e.g., monitoring the state of the SoC and/or SoC transistor health) and error management tasks. The FuSa component can be maintained in a powered state for each SoC, whether the SoC operates in a primary or backup manner. As such, the backup SoC may maintain its other components in a low-power state, with its FuSa component powered up and performing the health monitoring and error management tasks described herein.

[0047] In various aspects, when the first SoC 310 operates as the primary SoC, the state information published in the first memory 315 can correspond to the set of tasks being performed by the first SoC 310. For example, the first SoC 310 can publish any information corresponding to the surrounding environment of an autonomous vehicle (e.g., any external entities identified by the first SoC 310 along with their locations and predicted trajectories, and detected objects such as traffic signals, signage, lane markings, crosswalks, and the like). The state information can further include the operating temperatures of the computational components of the first SoC 310, the bandwidth usage and available memory of the chiplets of the first SoC 310, and/or any faults or errors in these components, or information indicating such faults or errors.

[0048] In further aspects, when the second SoC 320 operates as the backup SoC, the state information published in the second memory 325 can correspond to the state of each computational component of the second SoC 320. In particular, these components may operate in a low power state in which the components are ready to take over the set of tasks being performed by the first SoC 310. The state information can include whether the components are operating within nominal temperatures and other nominal ranges (e.g., available bandwidth, power, memory, etc.).

[0049] As described throughout the present disclosure, the first SoC 310 and the second SoC 320 can switch between operating as the primary SoC and the backup SoC (e.g., each time the system 300 is rebooted). For example, in a computing session subsequent to a session in which the first SoC 310 operated as the primary SoC and the second SoC 320 operated as the backup SoC, the second SoC 320 can assume the role of the primary SoC and the first SoC 310 can assume the role of the backup SoC. It is contemplated that this process of switching roles between the two SoCs can provide substantially even wear of the hardware components of each SoC, which can prolong the lifespan of the computing system 300 as a whole.
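A minimal sketch of the role alternation described above, assuming (hypothetically) that a boot counter persists across reboots and that the roles simply swap on each boot. `roles_for_session` and the `soc_310`/`soc_320` keys are illustrative names only; the disclosure does not specify how the role for a session is selected.

```python
from typing import Dict


def roles_for_session(boot_count: int) -> Dict[str, str]:
    """Hypothetical policy: swap primary/backup roles on every reboot."""
    if boot_count % 2 == 0:
        return {"soc_310": "primary", "soc_320": "backup"}
    return {"soc_310": "backup", "soc_320": "primary"}


# Tally how many sessions each SoC serves as primary across ten reboots.
primary_sessions = {"soc_310": 0, "soc_320": 0}
for boot in range(10):
    for soc, role in roles_for_session(boot).items():
        if role == "primary":
            primary_sessions[soc] += 1

print(primary_sessions)  # {'soc_310': 5, 'soc_320': 5}
```

Over ten sessions each SoC serves as primary five times, which is the even-wear property the paragraph describes.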

[0050] According to embodiments, the first SoC 310 can be powered by a first power source and the second SoC 320 can be powered by a second power source that is independent or isolated from the first power source. For example, in an electric vehicle, the first power source can comprise the battery pack used for propelling the electric motors of the vehicle, and the second power source can comprise the auxiliary power source of the vehicle (e.g., a 12-volt battery). In other implementations, the first and second power sources can comprise other types of power sources, such as dedicated batteries for each SoC 310, 320 or other power sources that are electrically isolated from, or otherwise independent of, each other.

[0051] It is contemplated that the mSoC arrangement of the computing system 300 can be provided to increase the safety integrity level (e.g., ASIL rating) of the computing system 300 and the overall autonomous driving system of the vehicle. As described herein, the autonomous driving system can include any number of dual SoC arrangements, each of which can perform a set of autonomous driving tasks. In doing so, the backup SoC dynamically monitors the health of the primary SoC in accordance with a set of functional safety operations, such that when a fault, failure, or other error is detected, the backup SoC can readily power up its components and take over the set of tasks from the primary SoC.

[0052] In certain implementations, the reporting data transmitted over the sideband data path 125 of the interconnect 340 between SoC 310 and SoC 320 can include a degradation command or trigger to initiate a certain level of degradation of the primary SoC (e.g., based on the aging data and/or other reporting data). Further description of degrading the performance of one or more components of an SoC may be found in U.S. Patent Application No. 18/229,491, which is incorporated by reference herein in its entirety. Such a degradation scheme may include predetermined or preconfigured levels of degradation based on the reporting data, and can involve selectively reducing certain compute tasks in a software structure based on, for example, safety ratings associated with runnables and/or connections between runnables.
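The following sketch illustrates one way preconfigured degradation levels could map to the runnables that keep executing. It is an assumption-laden illustration, not the scheme of U.S. Patent Application No. 18/229,491. The QM and ASIL A to D ordering follows ISO 26262, but `MIN_RATING_FOR_LEVEL`, the level-selection rule, and the runnable names are hypothetical, and connections between runnables are ignored.

```python
from typing import Dict, List

# Safety ratings from lowest to highest (ISO 26262 QM and ASIL A to D).
RATING_ORDER = ["QM", "A", "B", "C", "D"]

# Hypothetical: lowest rating still executed at each preconfigured degradation level.
MIN_RATING_FOR_LEVEL = {0: "QM", 1: "A", 2: "C", 3: "D"}


def select_level(out_of_range_metrics: int) -> int:
    """Hypothetical rule: one level per out-of-range reporting metric, capped at 3."""
    return min(out_of_range_metrics, max(MIN_RATING_FOR_LEVEL))


def retained_runnables(runnables: Dict[str, str], level: int) -> List[str]:
    """Keep only runnables whose safety rating meets the level's minimum rating."""
    floor = RATING_ORDER.index(MIN_RATING_FOR_LEVEL[level])
    return sorted(name for name, rating in runnables.items()
                  if RATING_ORDER.index(rating) >= floor)


runnables = {
    "cabin_display_overlay": "QM",
    "traffic_sign_recognition": "B",
    "lane_keeping": "D",
    "emergency_braking": "D",
}
print(retained_runnables(runnables, select_level(2)))  # ['emergency_braking', 'lane_keeping']
```

Raising the level only ever removes lower-rated work, so the highest-rated runnables are the last to be shed.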

[0053] METHODOLOGY

[0054] FIGS. 4 and 5 are flow charts describing example methods of reporting information between chiplets, according to various examples. In the below discussions of the methods of FIGS. 4 and 5, reference may be made to reference characters representing certain features described with respect to the diagrams of FIGS. 1, 2, and 3. Furthermore, the steps described with respect to the flow charts of FIGS. 4 and 5 may be performed by the multi-chiplet computing apparatus 100 or any combination of chiplets as shown and described with respect to FIGS. 1, 2, and 3. Further still, certain steps described with respect to the flow charts of FIGS. 4 and 5 may be performed prior to, in conjunction with, or subsequent to any other step, and need not be performed in the respective sequences shown. In further implementations, the processes described in connection with FIGS. 4 and 5 can be performed by one or more chiplets of a multi-chiplet computing apparatus included on a robotic system, autonomous vehicle, server farm or datacenter, personal computing device, graphics processing unit, and the like.

[0055] Referring to FIG. 4, at block 400, a primary chiplet 110 can transmit, over a sideband data path 125 of a die-to-die interconnect 130, reporting and scheduling data to a secondary chiplet 120. At block 405, the secondary chiplet 120 can process the reporting and scheduling data based on a set of nominal operation metrics, such as factory-specified, tested, and/or calibrated settings and operational ranges (e.g., nominal voltage, power consumption, and bandwidth ranges). At block 410, the secondary chiplet 120 can further predict, based on the reporting data, that a fault or failure will occur on the primary chiplet 110 within a future time period. At block 415, the secondary chiplet 120 and/or the primary chiplet 110 can implement a set of mitigative actions to prevent or mitigate the predicted fault or failure.
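Block 410 leaves the prediction technique open. One simple possibility, shown here as a hypothetical sketch rather than the method of this disclosure, is to fit a least-squares line to a reported aging metric (for example, leakage current) over time and check whether the extrapolated trend reaches a failure limit within the future time period. The function names, the choice of linear extrapolation, and the example values are assumptions.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (time in hours, metric value)


def fit_line(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit: value = slope * time + intercept."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def predict_crossing_time(samples: Sequence[Sample], limit: float) -> Optional[float]:
    """Time at which the fitted trend reaches `limit`, or None if it is not rising."""
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


def fault_predicted(samples: Sequence[Sample], limit: float,
                    now: float, horizon: float) -> bool:
    """True if the trend is expected to reach `limit` within `horizon` of `now`."""
    t_cross = predict_crossing_time(samples, limit)
    return t_cross is not None and t_cross <= now + horizon
```

For example, leakage samples of 1.0, 1.1, 1.2, and 1.3 (arbitrary units) at hours 0 to 3 extrapolate to a limit of 2.0 at about hour 10, so a 24-hour horizon from hour 3 flags a predicted fault while a 5-hour horizon does not. A real predictor would likely need noise handling and a physically motivated aging model.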

[0056] FIG. 5 is a flow chart describing another method of reporting data over a sideband data path 125 of a die-to-die interconnect 130, in accordance with examples described herein. Referring to FIG. 5, at block 500, a secondary chiplet 120 can receive reporting and scheduling data from a primary chiplet 110 via a sideband data path 125 of a die-to-die interconnect 130. As described herein, the reporting and scheduling data can include aging data at block 502, power data at block 503, and/or timing data at block 504. As further described herein, the reporting and scheduling data may be detected by one or more sensors of the primary chiplet 110 or measured directly by one or more components of the primary chiplet 110, and can include voltage data, frequency data, power leakage information, power consumption data, bandwidth data, resistance data, thermal information, and the like.
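To make the content of a sideband report concrete, the sketch below packs one measurement record into a fixed-size byte payload using Python's `struct` module. The layout, field widths, units, and report-type codes are hypothetical; the UCIe specification defines its own sideband packet formats, which this sketch does not attempt to reproduce.

```python
import struct
from dataclasses import astuple, dataclass

# Hypothetical little-endian payload: type, chiplet id, sequence number,
# voltage (mV), frequency (MHz), leakage (uA), power (mW), temperature (0.1 C).
REPORT_FORMAT = "<BBHIIIIi"
AGING, POWER, TIMING = 0x01, 0x02, 0x03  # hypothetical report-type codes


@dataclass
class Report:
    report_type: int
    chiplet_id: int
    seq: int
    voltage_mv: int
    frequency_mhz: int
    leakage_ua: int
    power_mw: int
    temperature_dc: int  # signed, tenths of a degree Celsius


def encode(report: Report) -> bytes:
    """Pack a report into a fixed 24-byte payload (field order matches REPORT_FORMAT)."""
    return struct.pack(REPORT_FORMAT, *astuple(report))


def decode(payload: bytes) -> Report:
    """Unpack a payload produced by encode()."""
    return Report(*struct.unpack(REPORT_FORMAT, payload))


payload = encode(Report(AGING, chiplet_id=1, seq=42, voltage_mv=812,
                        frequency_mhz=2000, leakage_ua=350, power_mw=18500,
                        temperature_dc=715))
print(len(payload))  # 24
```

A fixed layout keeps decoding on the receiving chiplet trivial; the type code lets aging, power, and timing reports share one record shape.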

[0057] In various examples, at block 505, the secondary chiplet 120 can process the reporting and scheduling data to perform health monitoring on the primary chiplet 110. At decision block 510, the secondary chiplet 120 can dynamically determine whether the reporting and scheduling data is within nominal parameters. For example, the secondary chiplet 120 can determine whether the voltage data, frequency data, power leakage information, power consumption data, bandwidth data, resistance data, and/or thermal information are all within factory specified and/or calibrated operating ranges. If so, then the secondary chiplet 120 can continue to receive and process reporting and scheduling data from the primary chiplet 110, at blocks 500 and 505.

[0058] However, if the reporting and scheduling data is not within nominal parameters, then, at block 515, the secondary chiplet 120 can perform one or more mitigative actions on the primary chiplet 110. As provided herein, the mitigative actions can include throttling the primary chiplet at block 517, switching to a backup chiplet at block 518, or shutting down the primary chiplet 110 at block 519. It is contemplated that the roles of the primary chiplet and the secondary chiplet can be reversed, or that each chiplet can perform secondary chiplet tasks for the other. The reporting data may be temporarily stored at the secondary chiplet 120 in a high-bandwidth memory device or random-access memory (e.g., a shared memory or cache).
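The decision at block 510 and the choice among blocks 517 to 519 can be sketched as follows. The nominal ranges and the escalation policy (throttle for a single thermal or power excursion, switch to a backup chiplet for any other single excursion, shut down when several metrics are out of range) are hypothetical choices for illustration; the disclosure does not specify how a particular mitigative action is selected.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Action(Enum):
    CONTINUE = "continue monitoring"  # back to blocks 500 and 505
    THROTTLE = "throttle"  # block 517
    SWITCH_TO_BACKUP = "switch to backup chiplet"  # block 518
    SHUT_DOWN = "shut down primary chiplet"  # block 519


# Hypothetical nominal (min, max) ranges per reported metric.
NOMINAL: Dict[str, Tuple[float, float]] = {
    "voltage_mv": (720, 880),
    "temperature_c": (-40, 105),
    "power_w": (0, 45),
}


def out_of_range(report: Dict[str, float]) -> List[str]:
    """Decision block 510: list reported metrics outside their nominal range."""
    return [name for name, (lo, hi) in NOMINAL.items()
            if name in report and not lo <= report[name] <= hi]


def choose_action(report: Dict[str, float]) -> Action:
    """Block 515 with a hypothetical escalation policy."""
    bad = out_of_range(report)
    if not bad:
        return Action.CONTINUE
    if len(bad) > 1:
        return Action.SHUT_DOWN
    if bad[0] in ("temperature_c", "power_w"):
        return Action.THROTTLE
    return Action.SWITCH_TO_BACKUP


print(choose_action({"voltage_mv": 800, "temperature_c": 110, "power_w": 20}))  # Action.THROTTLE
```

The policy escalates with the number of simultaneous excursions, so a lone thermal spike is handled by the least disruptive action.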

[0059] It is contemplated for examples described herein to extend to individual elements and concepts described herein, independently of other concepts, ideas or systems, as well as for examples to include combinations of elements recited anywhere in this application. Although examples are described in detail herein with reference to the accompanying drawings, it is to be understood that the concepts are not limited to those precise examples. As such, many modifications and variations will be apparent to practitioners skilled in this art. Accordingly, it is intended that the scope of the concepts be defined by the following claims and their equivalents. Furthermore, it is contemplated that a particular feature described either individually or as part of an example can be combined with other individually described features, or parts of other examples, even if the other features and examples make no mention of the particular feature.

Claims

WHAT IS CLAIMED IS:

1. A multi-chiplet computing apparatus comprising: a primary chiplet comprising a first mainband communication interface and a first sideband communication interface; a secondary chiplet comprising a second mainband communication interface and a second sideband communication interface; and an interconnect comprising (i) a mainband data path to transmit communications between the first mainband communication interface and the second mainband communication interface, and (ii) a sideband data path to transmit reporting and scheduling data between the first sideband communication interface and the second sideband communication interface.

2. The multi-chiplet computing apparatus of claim 1, wherein the primary chiplet comprises a first set of hardware components and the secondary chiplet comprises a second set of hardware components.

3. The multi-chiplet computing apparatus of claim 2, wherein the reporting and scheduling data corresponds to power consumption data for the first set of hardware components of the primary chiplet.

4. The multi-chiplet computing apparatus of claim 2, wherein the reporting data corresponds to aging data for the first set of hardware components of the primary chiplet.

5. The multi-chiplet computing apparatus of claim 4, wherein the aging data comprises measured data comprising at least one of voltage data, resistance data, or thermal data for at least one component of the first set of hardware components.

6. The multi-chiplet computing apparatus of claim 5, wherein at least some of the aging data is received from one or more sensors of the primary chiplet.

7. The multi-chiplet computing apparatus of claim 2, wherein the reporting data corresponds to timing data for scheduling a set of workloads by the first set of hardware components of the primary chiplet.

8. The multi-chiplet computing apparatus of claim 2, wherein the secondary chiplet predicts, based on the reporting data from the primary chiplet, whether a fault or failure will occur in the first set of hardware components of the primary chiplet within a future timeframe.

9. The multi-chiplet computing apparatus of claim 8, wherein when the secondary chiplet predicts that a fault or failure will occur in the first set of hardware components within the future timeframe, the secondary chiplet transmits a mitigative message to the primary chiplet to perform one or more mitigative actions.

10. The multi-chiplet computing apparatus of claim 9, wherein the one or more mitigative actions include one or more of: throttling the first set of hardware components, switching processing from the primary chiplet to a backup chiplet, or shutting down the primary chiplet.

11. The multi-chiplet computing apparatus of claim 1, wherein the interconnect comprises a standardized Universal Chiplet Interconnect Express (UCIe) interconnect.

12. The multi-chiplet computing apparatus of claim 1, wherein the multi-chiplet computing apparatus is included on a system-on-chip (SoC) that processes data.

13. The multi-chiplet computing apparatus of claim 12, wherein the SoC processes sensor data to enable operation of an autonomous vehicle.

14. The multi-chiplet computing apparatus of claim 13, wherein the sensor data comprises at least one of image data, LIDAR data, or radar data of a surrounding environment of the autonomous vehicle.

15. The multi-chiplet computing apparatus of claim 1, wherein the primary chiplet and secondary chiplet are included in a computing system of a datacenter.

16. The multi-chiplet computing apparatus of claim 2, wherein the secondary chiplet transmits reporting data corresponding to the second set of hardware components to the primary chiplet over the sideband data path.

17. The multi-chiplet computing apparatus of claim 1, wherein the sideband data path of the interconnect comprises an ASIL-D rating.

18. The multi-chiplet computing apparatus of claim 1, wherein the multi-chiplet computing apparatus comprises a multiple-system-on-chip in which the primary chiplet is included on a first system-on-chip, the secondary chiplet is included on a second system-on-chip, and the interconnect connects the first system-on-chip with the second system-on-chip.

19. A computer-implemented method of operating a multi-chiplet computing apparatus, the method being performed by one or more processors and comprising: transmitting communications between a first mainband communication interface of a primary chiplet and a second mainband communication interface of a secondary chiplet using a mainband data path of an interconnect; and transmitting reporting and scheduling data between a first sideband communication interface of the primary chiplet and a second sideband communication interface of the secondary chiplet using a sideband data path of the interconnect.

20. A non-transitory computer readable medium storing instructions that, when executed by one or more processors of a multi-chiplet computing apparatus, cause the one or more processors to: transmit communications between a first mainband communication interface of a primary chiplet and a second mainband communication interface of a secondary chiplet using a mainband data path of an interconnect; and transmit reporting and scheduling data between a first sideband communication interface of the primary chiplet and a second sideband communication interface of the secondary chiplet using a sideband data path of the interconnect.