US20240430125A1 - Functional safety for system-on-chip arrangements - Google Patents
Functional safety for system-on-chip arrangements
- Publication number
- US20240430125A1 (application Ser. No. 18/212,442)
- Authority
- US
- United States
- Prior art keywords
- chiplet
- soc
- fusa
- central
- sensor data
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/40—Bus networks
- H04L12/40006—Architecture of a communication node
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1048—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7825—Globally asynchronous, locally synchronous, e.g. network on chip
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/485—Task life-cycle, e.g. stopping, restarting, resuming execution
- G06F9/4856—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0736—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in functional embedded systems, i.e. in a data processing system designed as a combination of hardware and software dedicated to performing a certain function
- G06F11/0739—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in functional embedded systems, i.e. in a data processing system designed as a combination of hardware and software dedicated to performing a certain function in a data processing system embedded in automotive or aircraft systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1044—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1629—Error detection by comparing the output of redundant processing systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/167—Interprocessor communication using a common memory, e.g. mailbox
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/40—Bus networks
- H04L2012/40267—Bus for use in transportation systems
- H04L2012/40273—Bus for use in transportation systems the transportation system being a vehicle
Definitions
- Universal Chiplet Interconnect Express provides an open specification for an interconnect and serial bus between chiplets, which enables the production of large system-on-chip (SoC) packages with intermixed components from different silicon manufacturers.
- Autonomous vehicle computing systems may operate using chiplet arrangements that follow the UCIe specification.
- One goal of creating such computing systems is to achieve the robust safety integrity levels of other important electrical and electronic (E/E) automotive components of the vehicle.
- a computing system can include a sensor data input chiplet to obtain sensor data from a sensor system of a vehicle, and one or more workload processing chiplets that execute workloads based on the sensor data.
- the computing system can further include a first central chiplet comprising a shared memory including a functional safety (FuSa) program that causes the first central chiplet to dynamically compare and verify output of workloads being executed by the set of workload processing chiplets.
- the computing system can be included on a vehicle, and the workloads can comprise inference tasks based on the sensor data for autonomously operating the vehicle.
- the workloads can be executed by the set of workload processing chiplets in independent pipelines, and the FuSa program can dynamically compare and verify output of the independent pipelines by executing a set of FuSa workloads in a FuSa pipeline.
- the computing system can comprise a first system-on-chip (SoC) that includes the first central chiplet and a second SoC that includes a second central chiplet.
- the first SoC and the second SoC can be communicatively coupled by an interconnect, and the FuSa program can also be included in the second central chiplet of the second SoC.
- the FuSa program included in the second central chiplet of the second SoC causes the second SoC to monitor the shared memory of the first central chiplet of the first SoC to dynamically determine whether the first SoC is functioning within nominal operating parameters.
- in response to determining that the first SoC is not functioning within nominal operating parameters, the FuSa program of the second SoC can cause a second set of workload processing chiplets of the second SoC to take over execution of the workloads.
- the nominal operating parameters can correspond to nominal temperature ranges, voltage ranges, or the absence of any faults, failures, or errors on the first SoC.
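- As an illustration of this monitoring relationship, the sketch below shows one way a backup SoC's FuSa program might poll a health block maintained in the primary SoC's shared memory. The HealthBlock layout, field names, and numeric limits are assumptions for illustration, not details from the disclosure.

```cpp
#include <cstdint>
#include <atomic>

// Hypothetical memory-mapped health block refreshed by the primary SoC's FuSa program.
struct HealthBlock {
    std::atomic<uint64_t> heartbeat;       // incremented each FuSa cycle
    std::atomic<int32_t>  temperature_mC;  // millidegrees Celsius
    std::atomic<int32_t>  core_voltage_mV;
    std::atomic<uint32_t> fault_flags;     // nonzero => fault/failure/error latched
};

// Assumed nominal limits; the patent does not specify values.
constexpr int32_t NOMINAL_TEMP_MAX_mC = 105'000;
constexpr int32_t NOMINAL_VOLT_MIN_mV = 720;
constexpr int32_t NOMINAL_VOLT_MAX_mV = 880;

// Returns true if the primary SoC appears to be within nominal operating parameters.
// A false return would trigger the backup SoC's takeover of the workloads.
bool primary_nominal(const HealthBlock& hb, uint64_t last_seen_heartbeat) {
    if (hb.heartbeat.load() == last_seen_heartbeat) return false;  // stalled
    if (hb.temperature_mC.load() > NOMINAL_TEMP_MAX_mC) return false;
    const int32_t v = hb.core_voltage_mV.load();
    if (v < NOMINAL_VOLT_MIN_mV || v > NOMINAL_VOLT_MAX_mV) return false;
    return hb.fault_flags.load() == 0;
}
```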
- the sensor data input chiplet, the central chiplet, and the one or more workload processing chiplets can communicate over a performance network comprising a plurality of network hubs.
- the performance network can comprise a high-bandwidth network for the transmission of raw sensor data, processed sensor data, and messages.
- the FuSa program can monitor communications through the plurality of network hubs between the sensor data input chiplet, the central chiplet, and the one or more workload processing chiplets. For example, the FuSa program can monitor communications through the plurality of network hubs using a set of FuSa accounting hubs that communicate over a high-reliability FuSa network.
- the central chiplet can comprise a dedicated FuSa CPU executing the FuSa program to communicate over the performance network via a performance network-on-chip (NoC), and communicate over the high-reliability FuSa network via a FuSa NoC.
- the one or more workload processing chiplets can transmit processed sensor data corresponding to the execution of workloads to a cache memory of the central chiplet over the performance network.
- the workload processing chiplets can further transmit a first error correction code (ECC) along the high-reliability FuSa network to the central chiplet based on the processed sensor data.
- upon receiving the processed sensor data, the central chiplet can generate a second ECC using the processed sensor data.
- the FuSa CPU of the central chiplet may then perform a functional safety call in the central chiplet to verify that the first ECC and the second ECC match to ensure that the processed data was transmitted correctly.
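- The disclosure does not specify which code is used; as a minimal sketch of this compute-then-compare flow, the following uses a CRC-32 checksum as a stand-in integrity code computed independently by the sending chiplet and the central chiplet.

```cpp
#include <cstdint>
#include <cstddef>

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320) over a payload.
uint32_t crc32(const uint8_t* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// FuSa-side check: the first code arrives over the high-reliability FuSa
// network, the payload over the performance network; recompute and compare.
bool fusa_verify(const uint8_t* payload, size_t len, uint32_t received_code) {
    return crc32(payload, len) == received_code;  // mismatch => transmission error
}
```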
- FIG. 1 is a block diagram depicting an example computing system in which embodiments described herein may be implemented, in accordance with examples described herein;
- FIG. 2 is a block diagram depicting a system-on-chip (SoC) in which examples described herein may be implemented, in accordance with examples described herein;
- FIG. 3 is a block diagram illustrating an example central chiplet of an SoC arrangement for executing workloads, in accordance with examples described herein;
- FIG. 4 depicts workloads being executed in a set of independent pipelines, and further depicts a functional safety (FuSa) pipeline operable to compare and verify the outputs of the independent pipelines, according to examples described herein;
- FIG. 5 is a block diagram depicting an example multiple system-on-chip (MSoC), in accordance with examples described herein;
- FIG. 6 is a block diagram depicting a performance network and a FuSa network for performing health monitoring and error correction, according to examples described herein;
- FIG. 7 is a flow chart describing a method of dynamically comparing and verifying workload outputs by a set of workload processing chiplets, according to various examples;
- FIG. 8 is a flow chart describing a method of performing backup operations in a multiple system-on-chip (MSoC) arrangement, according to various examples.
- FIG. 9 is a flow chart describing a method of monitoring communications in a high-bandwidth performance network by a FuSa program, according to various examples described herein.
- As autonomous driving features continue to advance (e.g., beyond Level 3 autonomy) and autonomous vehicles begin operating more commonly on public road networks, the qualification and certification of E/E components related to autonomous operation of the vehicle will be advantageous to ensure the operational safety of these vehicles.
- novel methods for qualifying and certifying hardware, software, and/or hardware/software combinations will also be advantageous in increasing public confidence and assurance that autonomous driving systems are safe beyond current standards.
- certain safety standards for autonomous driving systems include safety thresholds that correspond to average human abilities and care.
- these statistics include vehicle incidents involving impaired or distracted drivers and do not factor in specified time windows in which vehicle operations are inherently riskier (e.g., inclement weather conditions, late night driving, winding mountain roads, etc.).
- Automotive safety integrity level (ASIL) is a risk classification scheme defined by ISO 26262 (the functional safety standard for road vehicles), and is typically established for the E/E components of the vehicle by performing a risk analysis of potential hazards, which involves determining respective levels of severity (i.e., the severity of injuries the hazard can be expected to cause; classified between S0 (no injuries) and S3 (life-threatening injuries)), exposure (i.e., the relative expected frequency of the operational conditions in which the injury can occur; classified between E0 (incredibly unlikely) and E4 (high probability of injury under most operating conditions)), and controllability (i.e., the relative likelihood that the driver can act to prevent the injury; classified between C0 (controllable in general) and C3 (difficult to control or uncontrollable)) of the vehicle operating scenario.
- the safety goal(s) for any potential hazard event includes a set of ASIL requirements.
- the lowest-risk hazard events require only quality management (QM) and carry no ASIL requirements.
- these QM hazards may be any combination of low probability of exposure to the hazard, low level of severity of potential injuries resulting from the hazard, and a high level of controllability by the driver in avoiding the hazard and/or preventing injuries.
- Other hazard events are classified as ASIL-A, ASIL-B, ASIL-C, or ASIL-D depending on the various levels of severity, exposure, and controllability corresponding to the potential hazard.
- ASIL-D events correspond to the highest integrity requirements (ASIL requirements) on the safety system or E/E components of the safety system, and ASIL-A comprises the lowest integrity requirements.
- the airbags, anti-lock brakes, and power steering system of a vehicle will typically have an ASIL-D grade, where the risks associated with the failure of these components (e.g., the probable severity of injury and lack of vehicle controllability to prevent those injuries) are relatively high.
- the ASIL may refer to both risk and risk-dependent requirements, where the various combinations of severity, exposure, and controllability are quantified to form an expression of risk (e.g., an airbag system of a vehicle may have a relatively low exposure classification, but high values for severity and controllability).
- the quantities for severity, exposure, and controllability for a given hazard are traditionally determined using values for severity (e.g., S0 through S3), exposure (e.g., E0 through E4), and controllability (e.g., C0 through C3) in the ISO 26262 series, where these values are then utilized to classify the ASIL requirements for the components of a particular safety system.
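- As a worked encoding of this classification step, the sketch below implements the ASIL determination table from ISO 26262-3 as a lookup over the severity (S), exposure (E), and controllability (C) values; the enum and function names are illustrative.

```cpp
#include <cstdio>

enum class Asil { QM, A, B, C, D };

// ASIL determination per the ISO 26262-3 risk graph. Indices run over
// S1..S3 (severity), E1..E4 (exposure), C1..C3 (controllability);
// S0, E0, or C0 short-circuit to QM before the table is consulted.
constexpr Asil kTable[3][4][3] = {
    // S1
    {{Asil::QM, Asil::QM, Asil::QM},   // E1: C1 C2 C3
     {Asil::QM, Asil::QM, Asil::QM},   // E2
     {Asil::QM, Asil::QM, Asil::A},    // E3
     {Asil::QM, Asil::A,  Asil::B}},   // E4
    // S2
    {{Asil::QM, Asil::QM, Asil::QM},
     {Asil::QM, Asil::QM, Asil::A},
     {Asil::QM, Asil::A,  Asil::B},
     {Asil::A,  Asil::B,  Asil::C}},
    // S3
    {{Asil::QM, Asil::QM, Asil::A},
     {Asil::QM, Asil::A,  Asil::B},
     {Asil::A,  Asil::B,  Asil::C},
     {Asil::B,  Asil::C,  Asil::D}},
};

Asil classify(int s, int e, int c) {   // s: 0-3, e: 0-4, c: 0-3
    if (s == 0 || e == 0 || c == 0) return Asil::QM;
    return kTable[s - 1][e - 1][c - 1];
}

int main() {
    // Airbag-like hazard: life-threatening (S3), frequent (E4), uncontrollable (C3).
    printf("%d\n", static_cast<int>(classify(3, 4, 3)));  // 4 => ASIL-D
}
```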
- certain safety systems can perform variable mitigation measures, which can range from alerts (e.g., visual, auditory, or haptic alerts), minor interventions (e.g., brake assist or steer assist), major interventions and/or avoidance maneuvering (e.g., taking over control of one or more control mechanisms, such as the steering, acceleration, or braking systems), and full autonomous control of the vehicle.
- Some autonomous driving systems use non-deterministic inference models in which the system executes one or more perception, object detection, object classification, motion prediction, motion planning, and vehicle control techniques based on, for example, two-dimensional image data, to perform all autonomous driving tasks. It is contemplated that such implementations may be difficult or impossible to certify and provide with an ASIL rating for the overall autonomous driving system.
- an autonomous driving system is provided herein that may perform deterministic, reflexive inference operations on specified hardware arrangements that allow for the certification and ASIL grading of various components, software aspects of the system, and/or the entire autonomous driving system itself.
- Example computing systems are described herein which comprise multiple system-on-chip (MSoC) arrangements comprising a primary SoC and a backup SoC, with each SoC executing a FuSa program to monitor a shared memory of the other SoC, compare and verify independent pipeline outputs based on workload processing chiplets executing workloads, and generate error correction codes for communications between the sensor data input chiplet, central chiplet, and/or the workload processing chiplets.
- the FuSa program can be executed by a dedicated FuSa CPU in the central chiplet to perform these tasks.
- the FuSa program can facilitate redundancy between the primary and backup SoCs, perform verification of inference tasks performed by the central chiplet and/or workload processing chiplets, and reduce errors in communications between the chiplets using one or more error correction code techniques.
- Such FuSa tasks can contribute to an enhanced ASIL rating of various components of an autonomous drive system, as well as the autonomous drive system of the vehicle itself.
- the FuSa tasks and robust hardware components described herein can result in an ASIL-D rating for the autonomous drive system.
- the example computing systems can perform one or more functions described herein using a learning-based approach, such as by executing an artificial neural network (e.g., a recurrent neural network, convolutional neural network, etc.) or one or more machine-learning models.
- Such learning-based approaches can further correspond to the computing system storing or including one or more machine-learned models.
- the machine-learned models may include an unsupervised learning model.
- the machine-learned models may include neural networks (e.g., deep neural networks) or other types of machine-learned models, including non-linear models and/or linear models.
- Neural networks may include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks or other forms of neural networks.
- Some example machine-learned models may leverage an attention mechanism such as self-attention.
- some example machine-learned models may include multi-headed self-attention models (e.g., transformer models).
- a “network” or “one or more networks” can comprise any type of network or combination of networks that allows for communication between devices.
- the network may include one or more of a local area network, wide area network, the Internet, secure network, cellular network, mesh network, peer-to-peer communication link or some combination thereof and may include any number of wired or wireless links. Communication over the network(s) may be accomplished, for instance, via a network interface using any type of protocol, protection scheme, encoding, format, packaging, etc.
- One or more examples described herein provide that methods, techniques, and actions performed by a computing device are performed programmatically, or as a computer implemented method.
- Programmatically means through the use of code or computer-executable instructions. These instructions can be stored in one or more memory resources of the computing device.
- a programmatically performed step may or may not be automatic.
- a programmatic module, engine, or component can include a program, a sub-routine, a portion of a program, or a software component or a hardware component capable of performing one or more stated tasks or functions.
- a module or component can exist on a hardware component independently of other modules or components. Alternatively, a module or component can be a shared element or process of other modules, programs or machines.
- Some examples described herein can generally require the use of computing devices, including processing and memory resources.
- one or more examples described herein may be implemented, in whole or in part, on computing devices such as servers and/or personal computers using network equipment (e.g., routers).
- Memory, processing, and network resources may all be used in connection with the establishment, use, or performance of any example described herein (including with the performance of any method or with the implementation of any system).
- one or more examples described herein may be implemented through the use of instructions that are executable by one or more processors. These instructions may be carried on a non-transitory computer-readable medium.
- Machines shown or described with figures below provide examples of processing resources and computer-readable mediums on which instructions for implementing examples disclosed herein can be carried and/or executed.
- the numerous machines shown with examples of the invention include processors and various forms of memory for holding data and instructions.
- Examples of non-transitory computer-readable mediums include permanent memory storage devices, such as hard drives on personal computers or servers.
- Other examples of computer storage mediums include portable storage units, such as flash memory or magnetic memory.
- Computers, terminals, and network-enabled devices are all examples of machines and devices that utilize processors, memory, and instructions stored on computer-readable mediums. Additionally, examples may be implemented in the form of computer programs, or a computer usable carrier medium capable of carrying such a program.
- FIG. 1 is a block diagram depicting an example computing system 100 in which embodiments described herein may be implemented, in accordance with examples described herein.
- the computing system 100 can include one or more control circuits 110 that may include one or more processors (e.g., microprocessors), one or more processing cores, a programmable logic circuit (PLC) or a programmable logic/gate array (PLA/PGA), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), systems on chip (SoCs), or any other control circuit.
- the control circuit(s) 110 and/or computing system 100 may be part of, or may form, a vehicle control unit (also referred to as a vehicle controller) that is embedded or otherwise disposed in a vehicle (e.g., a Mercedes-Benz® car, truck, or van).
- the vehicle controller may be or may include an infotainment system controller (e.g., an infotainment head-unit), a telematics control unit (TCU), an electronic control unit (ECU), a central powertrain controller (CPC), a central exterior & interior controller (CEIC), a zone controller, an autonomous vehicle control system, or any other controller (the term “or” is used herein interchangeably with “and/or”).
- control circuit(s) 110 may be programmed by one or more computer-readable or computer-executable instructions stored on the non-transitory computer-readable medium 120 .
- the non-transitory computer-readable medium 120 may be a memory device, also referred to as a data storage device, which may include an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof.
- the non-transitory computer-readable medium 120 may form, for example, a computer diskette, a hard disk drive (HDD), a solid state drive (SSD) or solid state integrated memory, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), dynamic random access memory (DRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), and/or a memory stick.
- the non-transitory computer-readable medium 120 may store computer-executable instructions or computer-readable instructions, such as instructions to perform the below methods described in connection with FIGS. 7 - 9 .
- the terms “computer-readable instructions” and “computer-executable instructions” are used to describe software instructions or computer code configured to carry out various tasks and operations.
- the term “module” refers broadly to a collection of software instructions or code configured to cause the control circuit 110 to perform one or more functional tasks.
- the modules and computer-readable/executable instructions may be described as performing various operations or tasks when the control circuit(s) 110 or other hardware components execute the modules or computer-readable instructions.
- the computing system 100 can include a communication interface 140 that enables communications over one or more networks 150 to transmit and receive data.
- the computing system 100 can communicate, over the one or more networks 150 , with fleet vehicles using the communication interface 140 to receive sensor data and implement the methods described throughout the present disclosure.
- the communication interface 140 may be used to communicate with one or more other systems.
- the communication interface 140 may include any circuits, components, software, etc. for communicating via one or more networks 150 (e.g., a local area network, wide area network, the Internet, secure network, cellular network, mesh network, and/or peer-to-peer communication link).
- the communication interface 140 may include for example, one or more of a communications controller, receiver, transceiver, transmitter, port, conductors, software and/or hardware for communicating data/information.
- the control circuit(s) 110 of the computing system 100 can include a SoC arrangement that facilitates the various methods and techniques described throughout the present disclosure.
- the SoC can include a set of chiplets, including a central chiplet comprising a shared memory in which a reservation table is utilized to execute various autonomous driving workloads in independent deterministic pipelines.
- the shared memory of the central chiplet can include a FuSa program executable by the control circuit 110 to perform functional safety tasks for the SoC arrangement, as described in detail below.
- FIG. 2 is a block diagram illustrating an example SoC 200 , in accordance with examples described herein.
- the example SoC 200 shown in FIG. 2 can include additional components, and the components of system on chip 200 may be arranged in various alternative configurations other than the example shown.
- the system on chip 200 of FIG. 2 is described herein as an example arrangement for illustrative purposes and is not intended to limit the scope of the present disclosure in any manner.
- a sensor data input chiplet 210 of the system on chip 200 can receive sensor data from various vehicle sensors 205 of the vehicle.
- vehicle sensors 205 can include any combination of image sensors (e.g., single cameras, binocular cameras, fisheye lens cameras, etc.), LIDAR sensors, radar sensors, ultrasonic sensors, proximity sensors, and the like.
- the sensor data input chiplet 210 can automatically dump the received sensor data as it is received into a cache memory 231 of the central chiplet 220 .
- the sensor data input chiplet 210 can also include an image signal processor (ISP) responsible for capturing, processing, and enhancing images taken from the various vehicle sensors 205 .
- the ISP takes the raw image data and performs a series of complex image processing operations, such as color, contrast, and brightness correction, noise reduction, and image enhancement, to create a higher-quality image that is ready for further processing or analysis by the other chiplets of the SoC 200 .
- the ISP may also include features such as auto-focus, image stabilization, and advanced scene recognition to further enhance the quality of the captured images.
- the ISP can then store the higher-quality images in the cache memory 231 .
- the sensor data input chiplet 210 publishes identifying information for each item of sensor data (e.g., images, point cloud maps, etc.) to a shared memory 230 of a central chiplet 220 , which acts as a central mailbox for synchronizing workloads for the various chiplets.
- the identifying information can include details such as an address in the cache memory 231 where the data is stored, the type of sensor data, which sensor captured the data, and a timestamp of when the data was captured.
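- A minimal sketch of what one such mailbox record might look like, assuming a packed descriptor layout (the struct and field names are hypothetical, chosen to mirror the identifying information listed above):

```cpp
#include <cstdint>

enum class SensorType : uint8_t { Camera, Lidar, Radar, Ultrasonic };

// Hypothetical record the sensor data input chiplet publishes to the central
// chiplet's shared memory for each cached item of sensor data.
struct SensorDataDescriptor {
    uint64_t   cache_addr;       // where the payload sits in cache memory 231
    uint32_t   payload_bytes;    // size of the cached item
    SensorType type;             // image, point cloud map, etc.
    uint8_t    sensor_id;        // which physical sensor captured the data
    uint64_t   capture_time_ns;  // timestamp of capture
};
```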
- Interconnects 211 a - f each represent die-to-die (D2D) interfaces between the chiplets of the SoC 200 .
- the interconnects include a high-bandwidth data path used for general data purposes to the cache memory 231 and a high-reliability data path to transmit functional safety and scheduler information to the shared memory 230 .
- an interconnect may include more than one die-to-die interface.
- interconnect 211 a can include two interfaces to support higher bandwidth communications between the sensor data input chiplet 210 and the central chiplet 220 .
- the interconnects 211 a - f implement the Universal Chiplet Interconnect Express (UCIe) standard and communicate through an indirect mode to allow each of the chiplet host processors to access remote memory as if it were local memory.
- in UCIe indirect mode, the host processor sends requests to the network interface unit (NIU), which then accesses the remote memory and returns the data to the host processor. This approach allows for efficient and low-latency access to remote memory, which can be particularly useful in distributed computing and data-intensive applications.
- UCIe indirect mode provides a high degree of flexibility, as it can be used with a wide range of different network topologies and protocols.
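- The sketch below illustrates the indirect access pattern described above: the host never maps remote memory directly, but posts a request to its NIU and waits for the completion. The NiuPort/NiuRequest API is hypothetical, since UCIe specifies the die-to-die link rather than this software interface.

```cpp
#include <cstdint>

// Hypothetical request descriptor handed to the NIU.
struct NiuRequest {
    uint64_t remote_addr;   // address in the remote chiplet's memory
    uint32_t length;        // bytes to fetch
    void*    local_buffer;  // completion target in local memory
};

class NiuPort {
public:
    // Blocking convenience wrapper: post the request, wait for completion.
    bool read_remote(uint64_t remote_addr, void* dst, uint32_t len) {
        NiuRequest req{remote_addr, len, dst};
        post(req);
        return wait_complete();
    }
private:
    void post(const NiuRequest&) { /* enqueue descriptor, ring doorbell */ }
    bool wait_complete() { return true; /* poll the completion queue */ }
};
```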
- the system on chip 200 can include additional chiplets that can store, alter, or otherwise process the sensor data cached by the sensor data input chiplet 210 .
- the system on chip 200 can include an autonomous drive chiplet 240 that can perform the perception, sensor fusion, trajectory prediction, and/or other autonomous driving algorithms of the autonomous vehicle.
- the autonomous drive chiplet 240 can be connected to a dedicated HBM-RAM chiplet 235 in which the autonomous drive chiplet 240 can publish all status information, variables, statistical information, and/or processed sensor data as processed by the autonomous drive chiplet 240 .
- the system on chip 200 can further include a machine-learning (ML) accelerator chiplet 250 that is specialized for accelerating AI workloads, such as image inferences or other sensor inferences using machine learning, in order to achieve high performance and low power consumption for these workloads.
- the ML accelerator chiplet 250 can include an engine designed to efficiently process graph-based data structures, which are commonly used in AI workloads, and a highly parallel processor, allowing for efficient processing of large volumes of data.
- the ML accelerator chiplet 250 can also include specialized hardware accelerators for common AI operations such as matrix multiplication and convolution, as well as a memory hierarchy designed to optimize memory access for AI workloads, which often have complex memory access patterns.
- the general compute chiplets 245 can provide general purpose computing for the system on chip 200 .
- the general compute chiplets 245 can comprise high-powered central processing units and/or graphical processing units that can support the computing tasks of the central chiplet 220 , autonomous drive chiplet 240 , and/or the ML accelerator chiplet 250 .
- the shared memory 230 can store programs and instructions for performing autonomous driving tasks.
- the shared memory 230 of the central chiplet 220 can further include a reservation table that provides the various chiplets with the information needed (e.g., sensor data items and their locations in memory) for performing their individual tasks. Further description of the shared memory 230 in the context of the dual SoC arrangements described herein is provided below with respect to FIG. 4 .
- the central chiplet 220 also includes the large cache memory 231 , which supports invalidate and flush operations for stored data.
- the HBM-RAM chiplet 255 can include status information, variables, statistical information, and/or sensor data for all other chiplets.
- the information stored in the HBM-RAM chiplet 255 can be stored for a predetermined period of time (e.g., ten seconds) before deleting or otherwise flushing the data. For example, when a fault occurs on the autonomous vehicle, the information stored in the HBM-RAM chiplet 255 can include all information necessary to diagnose and resolve the fault.
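- A sketch of this ten-second retention behavior, assuming a software-managed window over timestamped telemetry records (the record layout and freeze hook are assumptions):

```cpp
#include <cstdint>
#include <deque>

// Illustrative telemetry record written to the HBM-RAM chiplet.
struct TelemetryRecord {
    uint64_t timestamp_ns;
    uint32_t source_chiplet_id;
    uint64_t payload;            // status word, variable snapshot, etc.
};

class RetentionBuffer {
public:
    explicit RetentionBuffer(uint64_t window_ns) : window_ns_(window_ns) {}

    void append(const TelemetryRecord& r) {
        records_.push_back(r);
        // Flush anything older than the retention window, treating the
        // newest record's timestamp as "now".
        while (!records_.empty() &&
               r.timestamp_ns - records_.front().timestamp_ns > window_ns_)
            records_.pop_front();
    }

    // On a fault, the surviving window is handed to the diagnostic path.
    const std::deque<TelemetryRecord>& snapshot() const { return records_; }

private:
    uint64_t window_ns_;
    std::deque<TelemetryRecord> records_;
};

// Usage: RetentionBuffer buf(10'000'000'000ull);  // ten seconds, per the example
```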
- Cache memory 231 keeps fresh data available with low latency and less power required compared to accessing data from the HBM-RAM chiplet 255 .
- the shared memory 230 can house a mailbox architecture in which a reflex program comprising a suite of instructions is used to execute workloads by the central chiplet 220 , general compute chiplets 245 , and/or autonomous drive chiplet 240 .
- the central chiplet 220 can further execute a functional safety (FuSa) program that operates to compare and verify outputs of respective pipelines to ensure consistency in the ML inference operations.
- the central chiplet 220 can execute a thermal management program to ensure that the various components of the SoC 200 operate within normal temperature ranges. Further description of the shared memory 230 in the context of out-of-order workload execution in independent deterministic pipelines is provided below with respect to FIG. 3 .
- FIG. 3 is a block diagram illustrating an example central chiplet 300 of an SoC arrangement, in accordance with examples described herein.
- the central chiplet 300 shown in FIG. 3 can correspond to the central chiplet 220 of the SoC 200 as shown in FIG. 2 .
- the sensor data input chiplet 310 of FIG. 3 can correspond to the sensor data input chiplet 210 shown in FIG. 2
- the workload processing chiplets 320 shown in FIG. 3 can correspond to any one or more of the general compute chiplets 245 , ML accelerator chiplet 250 , and/or the autonomous drive chiplet 240 shown in FIG. 2 .
- the central chiplet 300 can include a shared memory 360 that includes a reflex program 330 , an application program 335 , a thermal management program 337 , and a FuSa program 338 .
- the reflex program 330 can comprise a set of instructions for executing reflex workloads in independent pipelines.
- the reflex workloads can comprise sensor data acquisition, sensor fusion, and inference tasks that facilitate scene understanding of the surrounding environment of the vehicle. These tasks can comprise two-dimensional image processing, sensor fused data processing (e.g., three-dimensional LIDAR, radar, and image fusion data), neural radiance field (NeRF) scene reconstruction, occupancy grid determination, object detection and classification, motion prediction, and other scene understanding tasks for autonomous vehicle operation.
- the application program 335 can comprise a set of instructions for operating the vehicle controls of the autonomous vehicle based on the outputs of the reflex workload pipelines.
- the application program 335 can be executed by one or more processors 340 of the central chiplet 300 and/or one or more of the workload processing chiplets 320 (e.g., the autonomous drive chiplet 240 of FIG. 2 ) to dynamically generate a motion plan for the vehicle based on the execution of the reflex workloads, and operate the vehicle's controls (e.g., acceleration, braking, steering, and signaling systems) to execute the motion plan accordingly.
- the thermal management program 337 can be executed by one or more processors 340 of the central chiplet to manage heat generated by the SoC.
- the thermal management program 337 can operate a set of cooling components (e.g., heat sinks, fans, cold plates, heat pipes, synthetic jets, etc.) included with the SoC to manage local temperatures of computational components within normal operating ranges.
- the thermal management program 337 can be executed by a dedicated CPU of the central chiplet 300 and can communicate with the workload processing chiplets 320 and sensor data input chiplet 310 for adjusting clocking or throttling the computing components to further manage temperatures.
- the thermal management program 337 can execute on each SoC of a dual or multiple-SoC arrangement, and can further communicate with the FuSa program 338 to, for example, power down a primary SoC when temperatures exceed nominal ranges and enable a backup SoC to take over autonomous inference and driving tasks.
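- A minimal sketch of such a thermal policy loop, with assumed thresholds and a hypothetical hardware-abstraction interface standing in for the cooling components, clock controls, and the FuSa failover signal:

```cpp
#include <cstdint>

// Hypothetical hardware abstraction; bodies are stubs for illustration.
struct ThermalHal {
    int32_t temp_mC = 80'000;                    // stand-in sensor reading
    int32_t read_temp_mC() { return temp_mC; }   // hottest on-die sensor
    void    set_fan_duty(uint8_t /*pct*/) {}
    void    throttle_clocks(bool /*on*/) {}
    void    request_soc_failover() {}            // signal FuSa: backup takes over
};

void thermal_step(ThermalHal& hal) {
    const int32_t t = hal.read_temp_mC();
    if (t > 110'000) {                // beyond nominal: hand off to backup SoC
        hal.request_soc_failover();
    } else if (t > 95'000) {          // hot: throttle compute and max cooling
        hal.throttle_clocks(true);
        hal.set_fan_duty(100);
    } else {                          // nominal: proportional cooling only
        hal.throttle_clocks(false);
        hal.set_fan_duty(static_cast<uint8_t>(t / 1000));
    }
}
```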
- the FuSa program 338 can be executed by the one or more processors 340 (e.g., a dedicated FuSa CPU) of the central chiplet 300 to perform functional safety tasks for the SoC.
- these tasks can comprise acquiring and comparing output from multiple independent pipelines that correspond to inference and/or autonomous vehicle control tasks.
- one independent pipeline can comprise workloads corresponding to identifying other vehicles operating around the vehicle in image data
- a second independent pipeline can comprise workloads corresponding to identifying other vehicles operating around the vehicle in radar and LIDAR data.
- the FuSa program 338 can execute FuSa workloads in another independent pipeline that acquires the output of the first and second independent pipelines to dynamically verify that they have identified the same vehicles.
- the FuSa program 338 can operate to perform SoC monitoring in a dual SoC arrangement in which a primary SoC performs the inference and vehicle control tasks, and a backup SoC performs health monitoring on the primary SoC with its chiplets in a low power standby mode, ready to take over these tasks if any errors, faults, or failures are detected in the primary SoC. Further description of these FuSa functions is provided below with respect to FIG. 5 .
- the FuSa program 338 can operate to monitor communications between chiplets and provide redundancy (e.g., via error correction code techniques) to ensure communication reliability between the chiplets. Further description of these techniques is provided below with respect to FIG. 6 .
- the central chiplet 300 can include a set of one or more processors 340 (e.g., a transient-resistant CPU and general compute CPUs) that can execute a scheduling program 342 for out-of-order execution of workloads in a set of deterministic pipelines.
- the processors 340 can execute reflex workloads in accordance with the reflex program 330 and/or application workloads in accordance with the application program 335 .
- the processors 340 of the central chiplet 300 can reference, monitor, and update dependency information in workload entries of the reservation table 350 as workloads become available and are executed accordingly.
- when a chiplet completes a workload, the chiplet updates the dependency information of other workloads in the reservation table 350 to indicate that the workload has been completed.
- This can include changing a bitwise operator or binary value representing the workload (e.g., from 0 to 1) to indicate in the reservation table 350 that the workload has been completed. Accordingly, the dependency information for all workloads having dependency on the completed workload is updated accordingly.
- the reservation table 350 can include workload entries, each of which indicates a workload identifier that describes the workload to be performed, an address in the cache memory 315 and/or HBM-RAM of the location of raw or processed sensor data required for executing the workload, and any dependency information corresponding to dependencies that need to be resolved prior to executing the workload.
- the dependencies can correspond to other workloads that need to be executed.
- the workload entry can be updated (e.g., by the chiplet executing the dependent workloads, or by the processors 240 of the central chiplet 300 through execution of the scheduling program 342 ).
- the workload can be executed in a respective pipeline by a corresponding workload processing chiplet 320 .
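- The sketch below models this bookkeeping with a fixed 64-entry table and a dependency bitmask per entry; the layout, field names, and table size are assumptions for illustration.

```cpp
#include <cstdint>
#include <array>
#include <cstddef>

// One reservation-table entry: identifier, data address, and a mask of
// prerequisite workloads (bit i set => still waiting on workload i).
struct WorkloadEntry {
    uint32_t workload_id;
    uint64_t data_addr;       // cache/HBM-RAM address of the input data
    uint64_t pending_deps;
    bool     done = false;
};

using ReservationTable = std::array<WorkloadEntry, 64>;

// Completing workload i flips its status bit and resolves that dependency
// in every dependent entry, mirroring the bitwise update described above.
void mark_complete(ReservationTable& table, uint32_t finished_id) {
    table[finished_id].done = true;
    const uint64_t bit = 1ull << finished_id;
    for (auto& entry : table)
        entry.pending_deps &= ~bit;
}

bool ready(const WorkloadEntry& e) { return !e.done && e.pending_deps == 0; }
```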
- the sensor data input chiplet 310 obtains sensor data from the sensor system of the vehicle, and stores the sensor data (e.g., image data, LIDAR data, radar data, ultrasonic data, etc.) in a cache 315 of the central chiplet 300 .
- the sensor data input chiplet 310 can generate workload entries for the reservation table 350 comprising identifiers for the sensor data (e.g., an identifier for each obtained image from various cameras of the vehicle's sensor system) and provide an address of the sensor data in the cache memory 315 .
- An initial set of workloads can be executed on the raw sensor data by the processors 340 of the central chiplet 300 and/or workload processing chiplets 320 , which can update the reservation table 350 to indicate that the initial set of workloads have been completed.
- the workload processing chiplets 320 monitor the reservation table 350 to determine whether particular workloads in their respective pipelines are ready for execution.
- the workload processing chiplets 320 can continuously monitor the reservation table using a workload window 355 (e.g., an instruction window for multimedia data) in which a pointer can sequentially read through each workload entry to determine whether the workloads have any unresolved dependencies. If one or more dependencies still exist in the workload entry, the pointer progresses to the next entry without the workload being executed. However, if the workload indicates that all dependencies have been resolved (e.g., all workloads upon which the particular workload depends have been executed), then the relevant workload processing chiplet 320 and/or processors 340 of the central chiplet 300 can execute the workload accordingly.
- the reservation table 350 comprises an out-of-order buffer that enables the workload processing chiplets 320 to execute the workloads in an order governed by the resolution of their dependencies in a deterministic manner. It is contemplated that out-of-order execution of workloads in the manner described herein can increase speed, increase power efficiency, and decrease complexity in the overall execution of the workloads.
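- Building on the reservation-table sketch above (reusing WorkloadEntry, ReservationTable, ready(), and mark_complete()), the scan below walks a workload window and dispatches any entry whose dependencies have all been resolved; dispatch_to_chiplet is a hypothetical stand-in for handing the workload to a processing chiplet, and execution is treated as synchronous for brevity.

```cpp
void dispatch_to_chiplet(const WorkloadEntry&) { /* hand off for execution */ }

void scan_window(ReservationTable& table, size_t window_start, size_t window_len) {
    for (size_t i = 0; i < window_len; ++i) {
        WorkloadEntry& e = table[(window_start + i) % table.size()];
        if (!ready(e))
            continue;                         // unresolved deps: move to next entry
        dispatch_to_chiplet(e);               // executes out of arrival order
        mark_complete(table, e.workload_id);  // resolve dependents' bits
    }
}
```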
- the workload processing chiplets 320 can execute workloads in each pipeline in a deterministic manner, such that successive workloads of the pipeline are dependent on the outputs of preceding workloads in the pipeline.
- the processors 340 and workload processing chiplets 320 can execute multiple independent workload pipelines in parallel, with each workload pipeline including a plurality of workloads to be executed in a deterministic manner.
- Each workload pipeline can provide sequential outputs (e.g., for other workload pipelines or for processing by the application program 335 for autonomously operating the vehicle).
- the application program 335 can autonomously operate the controls of the vehicle along a travel route.
- the scheduling program 342 can cause the processors 340 and workload processing chiplets 320 to perform out-of-order execution on the workloads in independent pipelines.
- under in-order execution, each image generated by the camera system of the vehicle would be processed or inferred on as the image becomes available.
- the instruction set would involve acquiring the image, scheduling inference on the image by a workload processing chiplet, performing inference on the image, acquiring a second image, scheduling inference on the second image by the workload processing chiplet, and performing inference on the second image, and so on across the suite of cameras of the vehicle.
- the use of the workload window 355 and reservation table 350 referencing dependency information for workloads enables the workload processing chiplets 320 to operate more efficiently by performing out-of-order execution on the workloads. Instead of performing inference on images based on when they are available, a workload processing chiplet 320 can acquire all images from all cameras first, and then perform inference on all the images together. Accordingly, the workload processing chiplet 320 executes its workloads with significantly reduced complexity, increased speed, and reduced power requirements.
- the shared memory 360 can include a thermal management program 337 executable by the one or more processors 340 to manage the various temperatures of the SoC 200 , operate cooling components, perform hardware throttling, switch to backup components (e.g., a backup SoC), and the like.
- the shared memory 360 can include a FuSa program 338 that performs functional safety tasks for the SoC 200 , such as monitoring communications within the SoC (e.g., using error correction code), comparing outputs of different pipelines, and monitoring hardware performance of the SoC.
- the thermal management program 337 and FuSa program 338 can perform their respective tasks in independent pipelines.
- FIG. 4 depicts workloads being executed in a set of independent pipelines 400 , 410 , and further depicts a functional safety (FuSa) pipeline 420 operable to compare and verify the outputs of independent pipelines, according to examples described herein.
- various workloads can be executed in independent deterministic pipelines by one or more processors 340 of the central chiplet 300 and/or the workload processing chiplets 320 through execution of the reflex program 330 , application program 335 , thermal program 337 , FuSa program 338 , and/or scheduling program 342 as depicted in FIG. 3 .
- pipeline 400 and pipeline 410 are executed in parallel by one or more chiplets of the SoC. While only workload pipelines 400 and 410 are shown in FIG. 4 , any number of pipelines can be executed in parallel by the central chiplet 300 and/or workload processing chiplets 320 in performing the reflex and application tasks described throughout the present disclosure.
- the reflex and application tasks can comprise sensor data acquisition, sensor fusion, inference tasks that facilitate scene understanding of the surrounding environment of the vehicle, motion prediction, motion planning, and vehicle control tasks for autonomously operating a vehicle. Additional tasks may also be executed in individual pipelines, such as power control tasks, thermal management tasks, health monitoring tasks, and the like.
- the scheduling program 342 can cause the workloads represented by the workload entries in the reservation table 350 to be executed deterministically in independent pipelines, such that the order of workload execution in each pipeline is consistent and non-reversible.
- the workloads executed in each pipeline can comprise a chain of dependency, such that the outputs of the pipelines are based on the same or similar workloads being sequentially executed in each pipeline. As such, complexity in the inference operations is significantly reduced, which can facilitate certification of each individual pipeline for autonomous driving purposes.
- pipeline 400 can be tasked with performing inference on two-dimensional image data (e.g., to identify and classify other dynamic entities proximate to the vehicle in the images).
- a first workload in pipeline 400 can comprise obtaining images captured by each camera of the vehicle at a given time.
- a second workload in pipeline 400 can comprise stitching the images to form a 360-degree ribbon of the surrounding environment of the vehicle.
- a third workload in pipeline 400 can comprise performing inference on the two-dimensional image data (e.g., pixel analysis to identify the dynamic entities).
- an output of pipeline 400 can comprise a two-dimensional ribbon with dynamic entities identified (e.g., with a bounding box) and/or classified (e.g., as bicyclists, other vehicles, pedestrians, etc.).
- pipeline 410 can be tasked with performing inference on three-dimensional sensor fusion data (e.g., comprising fused LIDAR, image, and/or radar data).
- pipeline 410 can also be tasked with identifying external dynamic entities in the three-dimensional data.
- a first workload in pipeline 410 can comprise acquiring point clouds generated by LIDAR sensors of the vehicle at a given time, and acquiring radar and ultrasonic data from the same time.
- a second workload in pipeline 410 can comprise fusing the sensor data to provide a three-dimensional, fused sensor view of the surrounding environment of the vehicle.
- a third workload in pipeline 410 can comprise performing inference on the three-dimensional sensor fusion data to identify and/or classify the external dynamic entities.
- the workload processing chiplets (e.g., workload processing chiplets 320 ) and the central chiplet 300 of FIG. 3 can execute respective workloads in various other deterministic pipelines (e.g., in accordance with the reflex program 330 and/or application program 335 shown in FIG. 3 ).
- a first pipeline can be dedicated for identifying traffic signals in two-dimensional image data
- a second pipeline can be dedicated for identifying traffic signals in three-dimensional sensor fusion data
- a third pipeline can be dedicated for identifying and classifying lane markings
- a fourth pipeline can be dedicated for generating occupancy grid maps from the sensor data
- a fifth pipeline can be dedicated for predicting the motion of external dynamic entities
- a sixth pipeline can be dedicated for planning the motion of the vehicle based on the inferences from other pipelines
- a seventh pipeline can be dedicated for controlling the vehicle's control systems to execute the motion plan generated by the sixth pipeline, and so on.
- the workloads or tasks performed in each pipeline are ordered deterministically (e.g., by the scheduling program 342 of FIG. 3 ), which can significantly reduce complexity in certifying the autonomous drive system.
- a single inference mechanism for an autonomous drive system that performs natural order processing using image data may not be certifiable due to the complexity and randomness of its workload executions, as well as the potential for outliers in the single inference mechanism (e.g., confusion about certain detected objects and lack of comparison between multiple inference mechanisms). These outliers may result in stuck states or collisions for the autonomous vehicle.
- any outliers from one pipeline can be mitigated or otherwise overcome by comparison and confirmation mechanisms from other pipelines.
- the various workloads of pipeline 400 and pipeline 410 can be executed as runnables on one or more processors of one or more chiplets of the SoC 200 , such as a transient-resistant CPU (e.g., of central chiplet 220 and/or general compute chiplets 245 ).
- the use of robust, transient-resistant CPUs (e.g., ASIL-D rated CPUs) for executing workloads in the independent deterministic pipelines can further bolster the ASIL rating of the autonomous drive system as a whole.
- transient-resistant CPUs can be manufactured for robustness in terms of reliability, resistance to heat, cold, radiation, wear, age, vibration, shock, etc. It is further contemplated that transient-resistant CPUs may not have the computing power of modern, non-transient-resistant CPUs (e.g., having an ASIL-B rating) that are designed and manufactured to maximize bandwidth and processing speed.
- the workloads in pipeline 400 and pipeline 410 can be executed as runnables on multiple CPUs of the SoC 200 and/or multiple chiplets of the SoC 200 .
- a transient-resistant CPU can execute workloads in each pipeline 400 , 410 and can be backed up by one or more state-of-the art CPUs that execute the same workloads in each pipeline 400 , 410 .
- the transient-resistant CPU(s) may execute workloads in each pipeline 400 , 410 at a lower frequency than the other CPUs.
- the transient-resistant CPU(s) can execute the workloads in each pipeline 400 , 410 and provide outputs on the order of microseconds, whereas the other CPUs can provide outputs for each pipeline 400 , 410 on the order of nanoseconds.
- the transient-resistant CPUs may execute workloads in deterministic pipeline 400 and identify external dynamic objects (e.g., other vehicles, bicyclists, pedestrians, etc.) in two-dimensional image data every few microseconds.
- the other CPU may execute the same workloads in deterministic pipeline 400 to identify the same external dynamic entities every few nanoseconds (e.g., at the same frequency at which the images are generated by the cameras).
- the outputs by the transient-resistant CPU(s) can be verified or confirmed by the outputs of the other CPU(s) in each deterministic pipeline. This process can occur for each independent pipeline performing inference operations (e.g., the reflex program 330 ), and can further be utilized for the application program 335 , thermal management program 337 , and/or the FuSa program 338 .
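- By way of a non-limiting illustration, the following sketch shows one way such cross-rate verification could be organized in code: the faster CPU publishes per-frame outputs, and the slower transient-resistant CPU's output for the same frame is checked against them. The class, field names, and the simple equality criterion are assumptions made for illustration only and are not specified by this disclosure.

```cpp
// Minimal sketch of cross-rate output verification: a fast CPU publishes
// per-frame detections at a high rate, and the slower transient-resistant
// CPU's output for the same frame is checked against it. All names and the
// match criterion (equality of detected-object counts) are illustrative
// assumptions, not a mechanism specified by the disclosure.
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct PipelineOutput {
    uint64_t frame_id;      // which sensor frame this inference covers
    int      objects_found; // simplified stand-in for a full detection list
};

class CrossRateVerifier {
    std::unordered_map<uint64_t, PipelineOutput> fast_outputs_;
public:
    // Called at high frequency by the non-transient-resistant CPU's pipeline.
    void publishFast(const PipelineOutput& out) { fast_outputs_[out.frame_id] = out; }

    // Called at lower frequency by the transient-resistant CPU's pipeline;
    // returns true when both CPUs agree on the same frame.
    bool verifySlow(const PipelineOutput& out) {
        auto it = fast_outputs_.find(out.frame_id);
        return it != fast_outputs_.end() &&
               it->second.objects_found == out.objects_found;
    }
};

int main() {
    CrossRateVerifier v;
    v.publishFast({42, 3});                    // fast CPU: frame 42, 3 objects
    std::cout << std::boolalpha
              << v.verifySlow({42, 3}) << '\n'; // slow CPU agrees -> true
}
```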
- the workloads of pipeline 400 and pipeline 410 can be executed by one or more CPUs of the central chiplet 220 and/or one or more CPUs of the general compute chiplets 245 .
- FIG. 4 further shows an example FuSa pipeline 420 that dynamically compares and verifies the outputs of the runnables for each pipeline 400 , 410 , according to various examples.
- the FuSa pipeline 420 can compare the outputs of multiple runnables performed by different CPUs in each pipeline 400 , 410 , as well as comparing the outputs of pipeline 400 with the outputs of pipeline 410 .
- the two-dimensional outputs from pipeline 400 can identify entities in image data that lacks precise distance information for each entity, whereas the three-dimensional outputs from pipeline 410 may lack information, such as color and edge detail, that facilitates classification of the external entities.
- the sensor fused data processed in pipeline 410 can include radar and/or ultrasonic data that can provide detailed proximity and/or speed differential information for the external entities.
- the outputs of pipeline 400 and pipeline 410 have different outliers that, when viewed alone, can affect the accuracy of the autonomous drive system's capabilities.
- to bolster the accuracy of the various workload processing chiplets (e.g., chiplets 320 and central chiplet 300 of FIG. 3 ), the outputs of certain pipelines can be compared with the outputs of other pipelines through the execution of one or more FuSa pipelines 420 that acquire and dynamically verify the respective outputs of different independent pipelines.
- the FuSa pipeline 420 can acquire the outputs of pipeline 400 and pipeline 410 and compare and verify their outputs. As described herein, the outputs can correspond to any inference operations relating to the processing of sensor data from the sensor system of the vehicle. In various examples, the runnable of the FuSa pipeline 420 can be executed on a dedicated CPU (e.g., on the central chiplet 220 of the SoC 200 arrangement).
- the FuSa pipeline 420 acquires the two-dimensional outputs of pipeline 400 and the three-dimensional outputs of pipeline 410 .
- the FuSa pipeline 420 compares the two-dimensional and three-dimensional outputs to determine whether they are consistent with each other. For inferences involving the identification and/or classification of external dynamic entities, the FuSa pipeline 420 will confirm whether pipeline 400 and pipeline 410 have both separately identified and/or classified the same external dynamic entities in the surrounding environment of the vehicle using different sensor data and/or techniques having different outliers.
- for other sets of independent pipelines performing related inference tasks, a FuSa pipeline is likewise utilized to compare and dynamically verify their outputs.
- this can include a FuSa pipeline that compares outputs of multiple pipelines tasked to identify traffic signals and traffic signal states, outputs of motion prediction pipelines tasked to predict the motion of external dynamic entities, and comparable outputs of other deterministic pipelines that facilitate in autonomously operating the vehicle.
- any issue that occurs in any pipeline can be readily detected and flagged by a FuSa pipeline.
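- As a simplified, non-limiting sketch of the comparison step a FuSa pipeline 420 could perform, the example below confirms that the two-dimensional image pipeline and the three-dimensional sensor-fusion pipeline independently detected the same set of entity identifiers. The set-equality rule and all identifiers are illustrative assumptions; the disclosure does not fix a particular comparison rule.

```cpp
// Hedged sketch of a FuSa-pipeline comparison: the 2D image pipeline and the
// 3D sensor-fusion pipeline each report the set of entity IDs they detected,
// and the FuSa workload confirms the sets agree before the result is trusted.
#include <iostream>
#include <set>

// Returns true when every entity seen by one pipeline was also seen by the
// other, i.e. neither pipeline produced an unconfirmed outlier.
bool fusaConfirm(const std::set<int>& ids_2d, const std::set<int>& ids_3d) {
    return ids_2d == ids_3d;
}

int main() {
    std::set<int> from_2d{101, 102, 103};   // entities from image data
    std::set<int> from_3d{101, 102, 103};   // entities from fused LIDAR/radar
    if (fusaConfirm(from_2d, from_3d))
        std::cout << "outputs verified\n";
    else
        std::cout << "mismatch: flag for mitigation\n";
}
```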
- the deterministic pipelines, the FuSa pipelines 420 , and the transient-resistant CPUs with support from general compute CPUs can all combine to provide an increased ASIL rating (e.g., an ASIL-D rating) for the autonomous driving system of the vehicle.
- FIG. 5 is a block diagram depicting an example computing system 500 implementing a multiple system-on-chip (MSoC), in accordance with examples described herein.
- the computing system 500 can include a first SoC 510 having a first memory 515 and a second SoC 520 having a second memory 525 coupled by an interconnect 540 (e.g., an ASIL-D rated interconnect) that enables each of the first SoC 510 and second SoC 520 to read each other's memories 515 , 525 .
- the first SoC 510 and the second SoC 520 may alternate roles between a primary SoC and a backup SoC.
- the primary SoC can perform various autonomous driving tasks, such as perception, object detection and classification, grid occupancy determination, sensor data fusion and processing, motion prediction (e.g., of dynamic external entities), motion planning, and vehicle control tasks.
- the backup SoC can maintain a set of computational components (e.g., CPUs, ML accelerators, and/or memory chiplets) in a low power state, and continuously or periodically read the memory of the primary SoC.
- the first SoC 510 performs a set of autonomous driving tasks and publishes state information corresponding to these tasks in the first memory 515 .
- the second SoC 520 reads the published state information in the first memory 515 to continuously check that the first SoC 510 is operating within nominal thresholds (e.g., temperature thresholds, bandwidth and/or memory thresholds, etc.), and that the first SoC 510 is performing the set of autonomous driving tasks properly.
- the second SoC 520 performs health monitoring and error management tasks for the first SoC 510 , and takes over control of the set of autonomous driving tasks when a triggering condition is met.
- the triggering condition can correspond to a fault, failure, or other error experienced by the first SoC 510 that may affect the performance of the set of tasks by the first SoC 510 .
- the second SoC 520 can publish state information corresponding to its computational components being maintained in a standby state (e.g., a low power state in which the second SoC 520 maintains readiness to take over the set of tasks from the first SoC 510 ).
- the first SoC 510 can monitor the state information of the second SoC 520 by continuously or periodically reading the memory 525 of the second SoC 520 to also perform health check monitoring and error management on the second SoC 520 . For example, if the first SoC 510 detects a fault, failure, or other error in the second SoC 520 , the first SoC 510 can trigger the second SoC 520 to perform a system reset or reboot.
- the state information published in the first memory 515 can correspond to the set of tasks being performed by the first SoC 510 .
- the first SoC 510 can publish any information corresponding to the surrounding environment of the vehicle (e.g., any external entities identified by the first SoC 510 , their locations and predicted trajectories, and detected objects, such as traffic signals, signage, lane markings, crosswalks, and the like).
- the state information can further include the operating temperatures of the computational components of the first SoC 510 , bandwidth usage and available memory of the chiplets of the first SoC 510 , and/or any faults or errors, or information indicating faults or errors in these components.
- the state information published in the second memory 525 can correspond to the state of each computational component of the second SoC 520 .
- these components may operate in a low power state in which the components are ready to take over the set of tasks being performed by the first SoC 510 .
- the state information can include whether the components are operating within nominal temperatures and other nominal ranges (e.g., available bandwidth, power, memory, etc.).
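- The following non-limiting sketch illustrates the kind of state block a primary SoC might publish to its shared memory and the backup's nominal-range check. The field names, thresholds, and polling model are hypothetical; the disclosure only requires that published state be read and checked against nominal parameters.

```cpp
// Illustrative state block a primary SoC could publish, plus the backup's
// nominal-range check. Fields and thresholds are invented for illustration.
#include <cstdint>
#include <iostream>

struct PublishedState {
    float    temperature_c;    // hottest chiplet temperature
    uint32_t free_memory_mb;   // available shared-memory headroom
    uint32_t fault_flags;      // nonzero if any fault/failure was latched
};

bool withinNominal(const PublishedState& s) {
    constexpr float    kMaxTempC  = 95.0f;  // assumed threshold
    constexpr uint32_t kMinFreeMb = 256;    // assumed threshold
    return s.temperature_c < kMaxTempC &&
           s.free_memory_mb >= kMinFreeMb &&
           s.fault_flags == 0;
}

int main() {
    PublishedState primary{88.5f, 512, 0};  // as read from the primary's memory
    std::cout << (withinNominal(primary) ? "nominal\n" : "trigger takeover\n");
}
```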
- the first SoC 510 and the second SoC 520 can switch between operating as the primary SoC and the backup SoC (e.g., each time the system 500 is rebooted). For example, in a computing session subsequent to a session in which the first SoC 510 operated as the primary SoC and the second SoC 520 operated as the backup SoC, the second SoC 520 can assume the role of the primary SoC and the first SoC 510 can assume the role of the backup SoC. It is contemplated that this process of switching roles between the two SoCs can provide substantially even wear of the hardware components of each SoC, which can prolong the lifespan of the computing system 500 as a whole.
- the first SoC 510 can be powered by a first power source and the second SoC 520 can be powered by a second power source that is independent or isolated from the first power source.
- the first power source can comprise the battery pack used to power the electric motors that propel the vehicle
- the second power source can comprise the auxiliary power source of the vehicle (e.g., a 12-volt battery).
- the first and second power sources can comprise other types of power sources, such as dedicated batteries for each SoC 510 , 520 or other power sources that are electrically isolated or otherwise independent of each other.
- the MSoC arrangement of the computing system 500 can be provided to increase the safety integrity level (e.g., ASIL rating) of the computing system 500 and the overall autonomous driving system of the vehicle.
- the autonomous driving system can include any number of dual SoC arrangements, each of which can perform a set of autonomous driving tasks.
- the backup SoC dynamically monitors the health of the primary SoC in accordance with a set of functional safety operations, such that when a fault, failure, or other error is detected, the backup SoC can readily power up its components and take over the set of tasks from the primary SoC.
- FIG. 6 is a block diagram depicting a performance network and a FuSa network for performing health monitoring and error correction, according to examples described herein.
- the FuSa CPU(s) 600 can be included on a central chiplet 300 of an SoC, or on each central chiplet 300 of an MSoC 500 , as described herein.
- Each FuSa CPU 600 can execute a FuSa program 602 , which can correspond to the FuSa program 338 as shown and described with respect to FIG. 3 .
- execution of the FuSa program 602 can cause the FuSa CPU(s) 600 to perform the primary and backup SoC monitoring tasks described with respect to FIG. 5 , and the execution of FuSa workloads in a FuSa pipeline 420 for comparison and verification of independent pipeline output described with respect to FIG. 4 .
- multiple chiplets of the SoC can communicate with each other over a high-bandwidth performance network comprising respective sets of interconnects (e.g., interconnect 610 and interconnect 660 ) and network hubs (e.g., network hubs 615 , 635 , and 665 ).
- the multiple chiplets can comprise the sensor data input chiplet 310 , central chiplet 300 , and workload processing chiplets 320 of FIG. 3 , which are represented by chiplet A 605 , chiplet B 655 , and any number of additional chiplets (not shown) in FIG. 6 .
- the cache memories 625 , 675 shown in FIG. 6 may represent cache memories associated with multiple chiplets and/or the cache memory 315 of the central chiplet 300 as shown and described with respect to FIG. 3 .
- raw sensor data, processed sensor data, and various communications between chiplet A 605 , chiplet B 655 , and the FuSa CPU(s) 600 can be transmitted over the high-bandwidth performance network comprising the interconnects 610 , 660 , network hubs 615 , 635 , 665 , and caches 625 , 675 .
- in an example in which chiplet A 605 comprises a sensor data input chiplet, chiplet A 605 can obtain sensor data from the various sensors of the vehicle and transmit the sensor data to cache 625 via interconnect 610 and network hub 615 .
- in an example in which chiplet B 655 comprises a workload processing chiplet, chiplet B 655 can acquire the sensor data from cache 625 via network hubs 615 , 635 , 665 and interconnect 660 to execute respective inference workloads based on the sensor data.
- the FuSa CPU(s) 600 , through execution of the FuSa program 602 , can communicate with the high-bandwidth performance network via a performance network-on-chip (NoC) 607 coupled to a network hub 635 .
- These communications can comprise, for example, acquiring output data from independent pipelines to perform the comparison and verification steps described herein.
- the communications over the high-bandwidth performance network can further comprise communications to access the shared memories 515 , 525 of each SoC 510 , 520 in an MSoC 500 that comprises a primary SoC and a backup SoC.
- the FuSa CPUs 600 of each SoC 510 , 520 access the shared memory 515 , 525 of each other to determine whether any faults, failures, or other errors have occurred.
- when the backup SoC detects a fault, failure, or error, the backup SoC takes over the primary SoC's tasks (e.g., inference, scene understanding, vehicle control tasks, etc.).
- the interconnects 610 , 660 serve as a high-bandwidth data path for general data purposes to the cache memories 625 , 675 , while the health control modules 620 , 670 and FuSa accounting hubs 630 , 640 , and 680 serve as a high-reliability data path that transmits functional safety and scheduler information to the shared memory of the SoC.
- NoCs and network interface units (NIUs) on chiplet A 605 and chiplet B 655 can be configured to generate error-correcting code (ECC) data on both the high-bandwidth and high-reliability data paths.
- the FuSa CPU(s) 600 communicates, via a FuSa NoC 609 , over a FuSa network comprising the FuSa accounting hubs 630 , 640 , 680 and health control modules 620 , 670 .
- the FuSa network facilitates the communication monitoring and error correction code techniques.
- a FuSa accounting hub 630 , 640 , 680 can monitor communications transmitted through each network hub 615 , 635 , 665 of the high-bandwidth network.
- Each of chiplet A 605 and chiplet B 655 can communicate with or include a health control module 620 , 670 through which ECC data, workload start and end communications, and scheduling information can be transmitted.
- the NIUs can transmit the functional safety and scheduler information through the health control modules 620 , 670 in two redundant transactions, with the second transaction ordering the bits in reverse (e.g., from bit 31 to 0 on a 32-bit bus) of the order of the first transaction. Furthermore, if errors are detected in the data transfers between chiplet A 605 and chiplet B 655 over the high-reliability FuSa network, the NIUs can reduce the transmission rate to improve reliability.
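- A minimal sketch of the two-transaction redundancy described above follows: a 32-bit word is sent once in normal bit order and once bit-reversed, and the receiver re-reverses the second copy and checks it against the first. Framing around a real NIU is omitted; only the bit-order check is shown.

```cpp
// Sketch of the redundant two-transaction scheme: the second transaction
// carries the same 32-bit word with its bits in reverse order (bit 31 first),
// and the receiver verifies the pair for consistency.
#include <cstdint>
#include <iostream>

uint32_t reverseBits(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);  // shift LSB-first input into MSB-first order
        v >>= 1;
    }
    return r;
}

// Receiver-side check: the second transaction must be the bit-reversal of
// the first, otherwise the transfer is flagged as corrupted.
bool redundantPairValid(uint32_t first, uint32_t second) {
    return reverseBits(second) == first;
}

int main() {
    uint32_t word = 0xA5F00F5Au;
    uint32_t tx1 = word;               // first transaction: bits 0..31
    uint32_t tx2 = reverseBits(word);  // second transaction: bits 31..0
    std::cout << std::boolalpha << redundantPairValid(tx1, tx2) << '\n';
}
```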
- certain processors of chiplet A 605 , chiplet B 655 , and/or the FuSa CPU(s) 600 can include a transient-resistant CPU core to run the scheduling program 342 of FIG. 3 , which schedules workloads belonging to the reflex program 330 , application program 335 , thermal management program 337 , and/or the FuSa program 602 .
- the transient-resistant CPU cores are designed to resist and recover from transient faults caused by environmental factors such as cosmic radiation, power surges, and electromagnetic interference. These faults can cause the CPU to malfunction or produce incorrect results, potentially leading to system failures or security vulnerabilities.
- the transient-resistant CPU cores can include a range of hardware-based fault detection and recovery mechanisms, such as redundant execution units, error-correcting code (ECC) memory, and register duplication. These mechanisms can detect and correct errors in real-time, ensuring that the CPU continues to function correctly even in the presence of transient faults. Additionally, the transient-resistant CPU cores may include various software-based fault tolerance techniques, such as checkpointing and rollback, to further enhance system reliability and resilience.
- the health control modules 620 , 670 and FuSa accounting hubs 630 , 640 , 680 can detect and correct errors in real-time, ensuring that the CPUs continue to function correctly even in the presence of transient faults.
- a workload processing chiplet (e.g., chiplet B 655 ) and the central chiplet perform an error correction check to verify that the processed data was sent and stored in the cache memories 625 , 675 completely and without corruption.
- the workload processing chiplets can generate an error correction code (ECC) using the processed data and transmit the ECC to the central chiplet.
- the ECC is sent along a high-reliability FuSa network via the FuSa accounting hubs 630 , 640 , 680 .
- the central chiplet can generate its own ECC using the processed data, and the FuSa CPU 600 can perform a functional safety call in the central chiplet mailbox to compare the two ECCs to ensure that they match, which verifies that the data was transmitted correctly.
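- The sender/receiver ECC handshake can be sketched as follows, with a CRC-32 standing in for whatever code the hardware actually generates (the disclosure does not specify the code). Both sides derive a check value from the same processed-data buffer, and the FuSa call verifies that the two values match.

```cpp
// Hedged sketch of the dual-ECC check: the sending chiplet and the central
// chiplet each compute a check code over the same buffer, and a mismatch
// indicates in-flight corruption. CRC-32 is a stand-in, not the actual ECC.
#include <cstdint>
#include <iostream>
#include <vector>

uint32_t crc32(const std::vector<uint8_t>& data) {
    uint32_t crc = 0xFFFFFFFFu;
    for (uint8_t byte : data) {
        crc ^= byte;
        for (int i = 0; i < 8; ++i)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

int main() {
    std::vector<uint8_t> processed{0x10, 0x20, 0x30};  // data written to cache
    uint32_t sender_ecc   = crc32(processed);  // generated by workload chiplet
    uint32_t receiver_ecc = crc32(processed);  // regenerated by central chiplet
    // FuSa call: a mismatch would mean the transfer was corrupted in flight.
    std::cout << (sender_ecc == receiver_ecc ? "verified\n" : "corrupted\n");
}
```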
- the communications over the high-bandwidth performance network and high-reliability FuSa network, as well as the ECC techniques described herein, can provide additional redundancy and mitigative actions that can bolster the ASIL rating of individual SoC components and the autonomous drive system as a whole.
- a safety authority may readily verify and certify the various methods and components described throughout the present disclosure such that an autonomous drive system implementing these methods and components can be deemed safe and reliable for use on public road networks.
- FIGS. 7 through 9 are flow charts describing various methods of implementing the functional safety techniques described above.
- the steps described with respect to the flow charts of FIGS. 7 through 9 may be performed by the computing system 100 , the workload processing chiplets 320 and central chiplet 300 of the SoC 200 , and/or the MSoC 500 as shown and described with respect to FIGS. 1 through 6 .
- certain steps described with respect to the flow charts of FIGS. 7 through 9 may be performed prior to, in conjunction with, or subsequent to any other step, and need not be performed in the respective sequences shown.
- FIG. 7 is a flow chart describing a method of dynamically comparing and verifying workload outputs by a set of workload processing chiplets, according to various examples.
- a sensor data input chiplet 310 can obtain sensor data from a set of vehicle sensors 205 .
- the sensor data from the vehicle sensors 205 can comprise any combination of LIDAR data, image data, radar data, and/or other forms of sensor data (e.g., ultrasonic data, IR data, etc.).
- a set of workload processing chiplets can execute workloads based on the sensor data in a set of independent pipelines.
- the scheduling program 342 of the central chiplet 300 can schedule specified sets of workloads to be executed in a deterministic manner within independent pipelines.
- the scheduling program 342 can impart dependency information in the workload entries such that they are not executed until the dependency information is resolved. This dependency information can comprise other workloads that need to be executed prior to execution of that particular workload.
- the workload processing chiplets can execute the workloads in the set of independent pipelines deterministically using the reservation table 350 as an out-of-order buffer (e.g., by sequentially analyzing workload entries in a workload window 355 using an instruction pointer for multimedia content).
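- A condensed, non-limiting sketch of deterministic dispatch from such a reservation table is shown below: each workload entry lists the workloads it depends on, and an entry becomes eligible only once all of its dependencies have completed. The table layout and identifiers are illustrative assumptions rather than the format of the reservation table 350 .

```cpp
// Sketch of dependency-gated dispatch: a workload runs only after every
// workload it depends on has completed, which yields a deterministic order.
#include <iostream>
#include <vector>

struct WorkloadEntry {
    int              id;
    std::vector<int> deps;   // workloads that must finish first
    bool             done = false;
};

// One scheduling pass: execute every entry whose dependencies are resolved.
void dispatchPass(std::vector<WorkloadEntry>& table) {
    for (auto& w : table) {
        if (w.done) continue;
        bool ready = true;
        for (int d : w.deps)
            if (!table[d].done) { ready = false; break; }
        if (ready) {
            std::cout << "executing workload " << w.id << '\n';
            w.done = true;   // in practice, set via a workload-end message
        }
    }
}

int main() {
    // 0: detect objects, 1: predict motion (needs 0), 2: plan (needs 1)
    std::vector<WorkloadEntry> table{{0, {}}, {1, {0}}, {2, {1}}};
    for (int pass = 0; pass < 3; ++pass) dispatchPass(table);
}
```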
- the workload processing chiplets can execute the workloads in the set of independent pipelines to perform a set of tasks for operating a vehicle.
- the set of tasks can comprise a plurality of image stitching tasks, sensor fusion tasks, machine learning inference tasks, object detection tasks, object classification tasks, scene understanding tasks, motion prediction tasks, and the like. These tasks can comprise inference operations to process the surrounding environment of the vehicle such that an application program 335 can successfully operate the vehicle along a travel route.
- the set of independent pipelines can provide output (e.g., an inferred sensor view of the surrounding environment) to the application program 335 for autonomously operating the vehicle.
- the central chiplet 300 can include a FuSa program 338 that dynamically compares and verifies output of a plurality of independent pipelines in a FuSa pipeline in a deterministic manner, as shown in the example FuSa pipeline 420 of FIG. 4 .
- the FuSa program 338 can dynamically compare and verify output of workloads executed by the set of workload processing chiplets 320 , the workloads being executed across the set of workload processing chiplets 320 based on the sensor data. Execution of the workloads in deterministic pipelines (e.g., via the reflex program 330 ) can result in generating an inferred sensor view of a surrounding environment of the vehicle, which can be provided to an application program 335 for autonomously operating the vehicle.
- each independent deterministic pipeline corresponding to the reflex program 330 can be certified (e.g., for use on public roads by a safety authority). These pipelines can include all inference operations that correspond to perception, object detection and classification, occupancy grid determination, motion prediction and/or planning, and any other scene understanding task for autonomously operating the vehicle.
- FIG. 8 is a flow chart describing a method of performing backup operations in a multiple system-on-chip (MSoC) arrangement, according to various examples.
- a first SoC 510 can receive sensor data from a set of vehicle sensors 205 .
- the first SoC 510 can perform a set of autonomous driving tasks based on the sensor data.
- the first SoC 510 can include a set of chiplets, as shown in FIG. 2 , that each perform one or more autonomous driving tasks that include one or more perception, inference, object detection and/or classification, right-of-way determination, occupancy grid determination, motion prediction, motion planning, and/or vehicle control tasks for the autonomous vehicle.
- the autonomous driving tasks can comprise sensor data perception and inference tasks for autonomously operating a vehicle along a travel route.
- the first SoC 510 can further publish state information in a shared memory 515 of the first SoC 510 .
- the first SoC 510 can continuously read state information of the second SoC 520 .
- the state information of the second SoC 520 can indicate the operating parameters of the various computational components (e.g., chiplets) of the second SoC 520 in the low powered state, and can further indicate whether these components are operating within those parameters (e.g., whether the components are warmed up and ready to take over the set of autonomous driving tasks).
- the first SoC 510 can dynamically determine whether a trigger has been detected in the state information of the second SoC 520 .
- the trigger can correspond to any of the components of the second SoC 520 operating outside nominal parameters, or a fault, failure, or error experienced by the second SoC 520 . If no trigger is detected, then the first SoC 510 can continue monitoring the state information of the second SoC 520 . However, if at any time a trigger is detected, then at block 825 , the first SoC 510 can, for example, transmit a command to the second SoC 520 to cause the second SoC 520 to perform a system reboot. As described herein, information communicated between SoC 510 and SoC 520 can be transmitted via a robust, ASIL-D rated interconnect (e.g., interconnect 540 shown in FIG. 5 ) using an error correction code (ECC), which provides redundancy algorithmically (e.g., through use of block codes, convolutional codes, and the like).
- the second SoC 520 can maintain a plurality of computational components in a low power state. As described above, these components can include any one or more of the chiplets as shown and described with respect to FIG. 2 .
- the second SoC 520 , through execution of the FuSa program 338 , can continuously read the state information as published by the first SoC 510 .
- the second SoC 520 can determine whether a trigger is detected in the state information.
- the trigger can correspond to the first SoC 510 experiencing a fault or a failure, where the fault or the failure can correspond to the first SoC 510 experiencing degraded performance, such as overheating, a power surge, or an error in the first SoC 510 .
- the second SoC 520 can continue to monitor the state information of the first SoC 510 .
- the second SoC 520 can power up its computational components and take over the set of autonomous driving tasks from the first SoC 510 , while the first SoC 510 powers down its components and assumes the role of backup SoC.
- the second SoC 520 can continue to read the state information of the first SoC 510 .
- the second SoC 520 can determine whether the first SoC 510 is still degraded. If so, at block 865 , the second SoC 520 can initiate a set of mitigative or emergency measures.
- these measures can comprise reducing the speed of the vehicle, providing a notification to any passengers in the vehicle (e.g., to take over manual control of the vehicle), autonomously operating the vehicle to a safe location (e.g., pulling over the vehicle or driving to a home location), and/or autonomously operating the vehicle to a service center to resolve the degraded status of the first SoC 510 .
- the second SoC 520 may further transmit a command to cause the first SoC 510 to perform a system reboot.
- the first SoC 510 may then perform the backup SoC tasks, such as maintaining a subset of its components in a low power state and dynamically monitoring state information as published by the primary SoC 520 . If, at any time, the primary and secondary SoCs are unable to communicate (e.g., one of the SoCs is unable to boot up), the autonomous drive system of the vehicle will not engage. It is contemplated that this arrangement provides necessary redundancy for an increased ASIL rating of the autonomous drive system of the vehicle (e.g., contributes to an ASIL-D rating).
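- The backup-side takeover flow of FIG. 8 can be condensed into the following illustrative sketch: on a detected trigger the backup assumes the primary role, and if the degraded SoC remains unhealthy after a commanded reboot, mitigation begins. The states and transitions are a simplified reading of the flow, not its literal steps.

```cpp
// Condensed sketch of the backup SoC's takeover flow. The two booleans stand
// in for values read from the peer SoC's shared memory; all names are
// illustrative.
#include <iostream>

enum class Role { Backup, Primary };

struct Soc {
    Role role = Role::Backup;
    void takeOver() { role = Role::Primary; std::cout << "backup now primary\n"; }
};

int main() {
    Soc second;
    bool trigger_detected    = true;  // fault read from the primary's memory
    bool peer_still_degraded = true;  // re-read after commanding a reboot
    if (trigger_detected) {
        second.takeOver();
        if (peer_still_degraded)
            std::cout << "initiate mitigation: slow vehicle, notify, pull over\n";
    }
}
```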
- the first SoC 510 and the second SoC 520 can switch between primary and backup roles to maintain substantially even wear on the MSoC components, such as the various chiplets of each SoC.
- the SoCs can be electrically coupled via one or more eFuses that protect the SoCs from each other (e.g., from voltage or current surges).
- the first SoC 510 and the second SoC 520 can be powered by distinct power sources, such as the battery pack used for propulsion of the vehicle, and the auxiliary power source of the vehicle used for powering the auxiliary components (e.g., ECU, lights, radio, etc.).
- the state information monitoring and error management functions performed by the first and second SoCs 510 , 520 can be performed by functional safety components of each SoC (e.g., the FuSa program 338 and FuSa processor).
- the FuSa components remain powered up to perform their functional safety tasks while the remaining components are maintained in the low power state, and ready to assume the primary SoC tasks.
- the first SoC 510 and the second SoC 520 being arranged to dynamically read state information and take over the set of tasks of the primary SoC provides redundancy to facilitate an automotive safety integrity level rating for the autonomous drive computing system (e.g., achieve an ASIL-D rating).
- FIG. 9 is a flow chart describing a method of monitoring communications in a high-bandwidth performance network by a FuSa program 602 , according to various examples described herein.
- the sensor data input chiplet 310 , the central chiplet 300 , and the one or more workload processing chiplets 320 can communicate data over a performance network comprising a plurality of network hubs.
- the sensor data input chiplet 310 can communicate raw sensor data, at block 902 , to a cache memory 315 , where the sensor data may be accessed by the central chiplet 300 and/or workload processing chiplets 320 to execute their respective workloads.
- the workload processing chiplets 320 and/or central chiplet 300 can transmit and receive processed sensor data (e.g., to and from the cache memory 315 ).
- each sensor data item, such as images, point cloud maps, radar pulses, etc., can be transmitted using an encryption technique (e.g., a cipher using public-private key encryption) in which the recipient chiplet (e.g., central chiplet 300 and/or workload processing chiplet 320 ) decrypts the transmission to access the sensor data item.
- the cipher can comprise a coded algorithm associated with a public key of the sensor data component at which the sensor data item originated.
- Each sensor data item from each sensor data component (e.g., image sensors, LIDAR sensors, radar sensors, etc.) is generated sequentially. In accordance with examples described herein, any subsequent data item from any sensor data component cannot be accessed by recipient chiplets without decryption and verification of the previous sensor data item from the respective sensor data components.
- the sensor data input chiplet 310 or central chiplets 300 of the primary SoC can generate a cipher for each sensor data item (e.g., individual images, point cloud maps, radar data pulses, etc.) that identifies the data source of the sensor data item (e.g., individual image sensors, LIDAR sensors, radar sensors, etc.).
- the generated cipher can be transmitted to the backup SoC to facilitate immediate takeover of the primary SoC's functions.
- the sensor data input chiplet 310 or central chiplet 300 can transmit the cipher associated with the raw sensor data item to the recipient chiplet(s) and/or the backup SoC for decryption and verification to associate the sensor data item with the sensor data source.
- the cipher can be transmitted over the performance network via the network hubs 615 , 635 , 665 , the high-reliability network via the health control modules 620 , 670 and FuSa accounting hubs 630 , 640 , 680 , or both.
- decryption of the cipher enables the backup SoC to readily take over if any faults, failures, or errors occur in the primary SoC.
- the backup SoC can utilize a corresponding private key to decode the cipher and verify the data source of each of the sensor data items.
- each of the primary and backup SoCs has direct access to the sensor components of the vehicle. Accordingly, when the primary SoC experiences an issue, the backup SoC can provide a verification indicator to the sensor data components indicating that the previously sent sensor data items were decrypted accordingly, thereby enabling access to additional sensor data items from the individual sensor data components. Thereafter, the backup SoC can assume the role of primary SoC, as described above.
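- A loose sketch of the sequential gating described above follows: each sensor data item carries a tag derived from its source and sequence position, and item N+1 is withheld until item N's tag has been verified. A keyed hash stands in for the public-private key cipher here, since real asymmetric cryptography is beyond the scope of a sketch; all names and the key value are hypothetical.

```cpp
// Stand-in "cipher": a keyed hash over source ID and sequence number.
// Verification recomputes the tag and enforces strict sequential order, so
// item N+1 cannot be consumed until item N has been verified.
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

uint64_t makeTag(const std::string& source, uint64_t seq, uint64_t secret) {
    return std::hash<std::string>{}(source + '#' + std::to_string(seq)) ^ secret;
}

int main() {
    const uint64_t secret = 0x5EC12E7u;  // illustrative shared key
    uint64_t verified_up_to = 0;         // highest sequence number verified

    for (uint64_t seq = 1; seq <= 3; ++seq) {
        uint64_t tag = makeTag("front_camera", seq, secret);    // from sender
        bool ok = (tag == makeTag("front_camera", seq, secret)) // recompute
                  && (seq == verified_up_to + 1);               // strict order
        if (!ok) { std::cout << "item " << seq << " rejected\n"; break; }
        verified_up_to = seq;
        std::cout << "item " << seq << " verified, next item released\n";
    }
}
```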
- the FuSa program 602 is executed (e.g., by one or more dedicated FuSa CPUs 600 ) to monitor communications through the plurality of network hubs 615 , 635 , 665 of the performance network between the sensor data input chiplet 310 , the central chiplet 300 , and the one or more workload processing chiplets 320 .
- a set of FuSa accounting hubs can be connected to the network hubs of the performance network to receive communication data (e.g., a 32-bit register that indicates whether any errors are present in a particular communication) for each communication.
- the FuSa program 602 can receive this information from the FuSa accounting hubs 630 , 640 , 680 via a FuSa NoC 609 that enables the FuSa program 602 to communicate with each chiplet of the SoC over a high-reliability FuSa network. Furthermore, the chiplets can communicate workload start and end indicators over the health control modules 620 , 670 and the FuSa accounting hubs 630 , 640 , 680 , which can, for example, cause dependency information in the reservation table 350 to be updated accordingly.
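- Purely as a hypothetical decode of the per-communication 32-bit register mentioned above, the sketch below splits the register into an error flag, a hub index, and an error code. The bit layout is invented to make the mechanism concrete; the disclosure does not define the register's fields.

```cpp
// Hypothetical decode of a per-communication 32-bit status register reported
// by a FuSa accounting hub. The field layout below is invented.
#include <cstdint>
#include <iostream>

struct CommStatus {
    bool     any_error;
    uint8_t  source_hub;   // which network hub observed the communication
    uint16_t error_code;
};

CommStatus decode(uint32_t reg) {
    return {
        (reg & 0x1u) != 0,                           // bit 0: error present
        static_cast<uint8_t>((reg >> 1) & 0x7Fu),    // bits 1-7: hub index
        static_cast<uint16_t>((reg >> 8) & 0xFFFFu)  // bits 8-23: error code
    };
}

int main() {
    CommStatus s = decode(0x00002103u);  // error from hub 1, code 0x21
    std::cout << "error=" << s.any_error << " hub=" << int(s.source_hub)
              << " code=0x" << std::hex << s.error_code << '\n';
}
```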
- the central chiplet 300 can include a dedicated FuSa CPU 600 executing the FuSa program 602 to communicate over the performance network via the performance NoC 607 , and communicate over the high-reliability FuSa network via the FuSa NoC 609 .
- the chiplets can generate and transmit ECCs over the high-reliability FuSa network (e.g., via the health control modules 620 , 670 and FuSa accounting hubs 630 , 640 , 680 ).
- when a workload processing chiplet 320 communicates with the shared memory 360 of the central chiplet 300 , the workload processing chiplet 320 can also generate and transmit a first ECC based on the communicated data along the FuSa network. Upon receiving the data, the central chiplet 300 can generate a second ECC based on the received data.
- the central chiplet 300 can perform a FuSa call to verify that the first ECC and the second ECC match to ensure that the data was transmitted correctly and confirm the communication.
- the ECC techniques can be performed for each communication to the central chiplet 300 .
- the ECC techniques may also be performed for communications between workload processing chiplets 320 , between the central chiplet 300 and workload processing chiplets 320 , and/or between the sensor data input chiplet 310 and central chiplet 300 .
Abstract
A sensor data input chiplet obtains sensor data from a sensor system. A central chiplet executes a functional safety program to dynamically compare and verify output of workloads being executed by a set of workload processing chiplets, where the workloads are executed across the set of workload processing chiplets based on the sensor data.
Description
- Universal Chiplet Interconnect Express (UCIe) provides an open specification for an interconnect and serial bus between chiplets, which enables the production of large system-on-chip (SoC) packages with intermixed components from different silicon manufacturers. Autonomous vehicle computing systems may operate using chiplet arrangements that follow the UCIe specification. One goal of creating such computing systems is to achieve the robust safety integrity levels of other important electrical and electronic (E/E) automotive components of the vehicle.
- A computing system can include a sensor data input chiplet to obtain sensor data from a sensor system of a vehicle, and one or more workload processing chiplets that execute workloads based on the sensor data. The computing system can further include a first central chiplet comprising a shared memory including a functional safety (FuSa) program that causes the first central chiplet to dynamically compare and verify output of workloads being executed by the set of workload processing chiplets. In various examples, the computing system can be included on a vehicle, and the workloads can comprise inference tasks based on the sensor data for autonomously operating the vehicle. In certain implementations, the workloads can be executed by the set of workload processing chiplets in independent pipelines, and the FuSa program can dynamically compare and verify output of the independent pipelines by executing a set of FuSa workloads in a FuSa pipeline.
- In some embodiments, the computing system can comprise a first system-on-chip (SoC) that includes the first central chiplet and a second SoC that includes a second central chiplet. The first SoC and the second SoC can be communicatively coupled by an interconnect, and the FuSa program can also be included in the second central chiplet of the second SoC. In such embodiments, the FuSa program included in the second central chiplet of the second SoC causes the second SoC to monitor the shared memory of the first central chiplet of the first SoC to dynamically determine whether the first SoC is functioning within nominal operating parameters. In response to determining that the first SoC is not functioning within nominal operating parameters, the FuSa program of the second SoC can cause a second set of workload processing chiplets of the second SoC to take over execution of the workloads. As provided herein, the nominal operating parameters can correspond to nominal temperature ranges, voltage ranges, or any faults, failures, or errors on the first SoC.
- In various implementations, the sensor data input chiplet, the central chiplet, and the one or more workload processing chiplets can communicate over a performance network comprising a plurality of network hubs. The performance network can comprise a high-bandwidth network for the transmission of raw sensor data, processed sensor data, and messages. In further implementations, the FuSa program can monitor communications through the plurality of network hubs between the sensor data input chiplet, the central chiplet, and the one or more workload processing chiplets. For example, the FuSa program can monitor communications through the plurality of network hubs using a set of FuSa accounting hubs that communicate over a high-reliability FuSa network.
- In certain examples, the central chiplet can comprise a dedicated FuSa CPU executing the FuSa program to communicate over the performance network via a performance network-on-chip (NoC), and communicate over the high-reliability FuSa network via a FuSa NoC. The one or more workload processing chiplets can transmit processed sensor data corresponding to the execution of workloads to a cache memory of the central chiplet over the performance network. The workload processing chiplets can further transmit a first error correction code (ECC) along the high-reliability FuSa network to the central chiplet based on the processed sensor data. Upon receiving the processed sensor data, the central chiplet can generate a second ECC using the processed sensor data. The FuSa CPU of the central chiplet may then perform a functional safety call in the central chiplet to verify that the first ECC and the second ECC match to ensure that the processed data was transmitted correctly.
- The disclosure herein is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements, and in which:
- FIG. 1 is a block diagram depicting an example computing system in which embodiments described herein may be implemented, in accordance with examples described herein;
- FIG. 2 is a block diagram depicting a system-on-chip (SoC) in which examples described herein may be implemented, in accordance with examples described herein;
- FIG. 3 is a block diagram illustrating an example central chiplet of an SoC arrangement for executing workloads, in accordance with examples described herein;
- FIG. 4 depicts workloads being executed in a set of independent pipelines, and further depicts a functional safety (FuSa) pipeline operable to compare and verify the outputs of the independent pipelines, according to examples described herein;
- FIG. 5 is a block diagram depicting an example multiple system-on-chip (MSoC), in accordance with examples described herein;
- FIG. 6 is a block diagram depicting a performance network and a FuSa network for performing health monitoring and error correction, according to examples described herein;
- FIG. 7 is a flow chart describing a method of dynamically comparing and verifying workload outputs by a set of workload processing chiplets, according to various examples;
- FIG. 8 is a flow chart describing a method of performing backup operations in a multiple system-on-chip (MSoC) arrangement, according to various examples; and
- FIG. 9 is a flow chart describing a method of monitoring communications in a high-bandwidth performance network by a FuSa program, according to various examples described herein.
- In experimentation and controlled testing environments, system redundancies and automotive safety integrity level (ASIL) ratings for autonomy systems are not typically a priority consideration. As autonomous driving features continue to advance (e.g., beyond Level 3 autonomy), and autonomous vehicles begin operating more commonly on public road networks, the qualification and certification of E/E components related to autonomous operation of the vehicle will be advantageous to ensure operational safety of these vehicles. Furthermore, novel methods for qualifying and certifying hardware, software, and/or hardware/software combinations will also be advantageous in increasing public confidence and assurance that autonomous driving systems are safe beyond current standards. For example, certain safety standards for autonomous driving systems include safety thresholds that correspond to average human abilities and care. Yet, these statistics include vehicle incidences involving impaired or distracted drivers and do not factor in specified time windows in which vehicle operations are inherently riskier (e.g., inclement weather conditions, late night driving, winding mountain roads, etc.).
- Automotive safety integrity level (ASIL) is a risk classification scheme defined by ISO 26262 (the functional safety for road vehicles standard), and is typically established for the E/E components of the vehicle by performing a risk analysis of potential hazards, which involves determining respective levels of severity (i.e., the severity of injuries the hazard can be expected to cause; classified between S0 (no injuries) and S3 (life-threatening injuries)), exposure (i.e., the relative expected frequency of the operational conditions in which the injury can occur; classified between E0 (incredibly unlikely) and E4 (high probability of injury under most operating conditions)), and controllability (i.e., the relative likelihood that the driver can act to prevent the injury; classified between C0 (controllable in general) and C3 (difficult to control or uncontrollable)) of the vehicle operating scenario. As such, the safety goal(s) for any potential hazard event includes a set of ASIL requirements.
- Hazards that are identified as quality management (QM) do not dictate any safety requirements. As an illustration, these QM hazards may be any combination of low probability of exposure to the hazard, low level of severity of potential injuries resulting from the hazard, and a high level of controllability by the driver in avoiding the hazard and/or preventing injuries. Other hazard events are classified as ASIL-A, ASIL-B, ASIL-C, or ASIL-D depending on the various levels of severity, exposure, and controllability corresponding to the potential hazard. ASIL-D events correspond to the highest integrity requirements (ASIL requirements) on the safety system or E/E components of the safety system, and ASIL-A comprises the lowest integrity requirements. As an example, the airbags, anti-lock brakes, and power steering system of a vehicle will typically have an ASIL-D grade, where the risks associated with the failure of these components (e.g., the probable severity of injury and lack of vehicle controllability to prevent those injuries) are relatively high.
- As provided herein, the ASIL may refer to both risk and risk-dependent requirements, where the various combinations of severity, exposure, and controllability are quantified to form an expression of risk (e.g., an airbag system of a vehicle may have a relatively low exposure classification, but high values for severity and controllability). As provided above, the quantities for severity, exposure, and controllability for a given hazard are traditionally determined using values for severity (e.g., S0 through S3), exposure (e.g., E0 through E4), and controllability (e.g., C0 through C3) in the ISO 26262 series, where these values are then utilized to classify the ASIL requirements for the components of a particular safety system. As provided herein, certain safety systems can perform variable mitigation measures, which can range from alerts (e.g., visual, auditory, or haptic alerts), minor interventions (e.g., brake assist or steer assist), major interventions and/or avoidance maneuvering (e.g., taking over control of one or more control mechanisms, such as the steering, acceleration, or braking systems), and full autonomous control of the vehicle.
- Current fully autonomous driving systems can comprise non-deterministic inference models, in which the system executes one or more perception, object detection, object classification, motion prediction, motion planning, and vehicle control techniques based on, for example, two-dimensional image data, to perform all autonomous driving tasks. It is contemplated that such implementations may be difficult or impossible to certify and provide an ASIL rating for the overall autonomous driving system. To address these shortcomings in current implementations, an autonomous driving system is provided herein that may perform deterministic, reflexive inference operations on specified hardware arrangements that allow for the certification and ASIL grading of various components, software aspects of the system, and/or the entire autonomous driving system itself.
- Example computing systems are described herein which comprise MSoC arrangements comprising a primary SoC and backup SoC, with each SoC executing a FuSa program to monitor a shared memory of the other SoC, compare and verify independent pipeline outputs based on workload processing chiplets executing workloads, and generate error correction codes for communications between the sensor data input chiplet, central chiplet, and/or the workload processing chiplets. In certain embodiments, the FuSa program can be executed by a dedicated FuSa CPU in the central chiplet to perform these tasks. In the context of autonomous driving, the FuSa program can facilitate redundancy between the primary and backup SoCs, perform verification of inference tasks performed by the central chiplet and/or workload processing chiplets, and reduce errors in communications between the chiplets using one or more error correction code techniques. Such FuSa tasks can contribute to an enhanced ASIL rating of various components of an autonomous drive system, as well as the autonomous drive system of the vehicle itself. For example, it is contemplated that the FuSa tasks and robust hardware components described herein can result in an ASIL-D rating for the autonomous drive system.
- In certain implementations, the example computing systems can perform one or more functions described herein using a learning-based approach, such as by executing an artificial neural network (e.g., a recurrent neural network, convolutional neural network, etc.) or one or more machine-learning models. Such learning-based approaches can further correspond to the computing system storing or including one or more machine-learned models. In an embodiment, the machine-learned models may include an unsupervised learning model. In an embodiment, the machine-learned models may include neural networks (e.g., deep neural networks) or other types of machine-learned models, including non-linear models and/or linear models. Neural networks may include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks or other forms of neural networks. Some example machine-learned models may leverage an attention mechanism such as self-attention. For example, some example machine-learned models may include multi-headed self-attention models (e.g., transformer models).
- As provided herein, a “network” or “one or more networks” can comprise any type of network or combination of networks that allows for communication between devices. In an embodiment, the network may include one or more of a local area network, wide area network, the Internet, secure network, cellular network, mesh network, peer-to-peer communication link or some combination thereof and may include any number of wired or wireless links. Communication over the network(s) may be accomplished, for instance, via a network interface using any type of protocol, protection scheme, encoding, format, packaging, etc.
- One or more examples described herein provide that methods, techniques, and actions performed by a computing device are performed programmatically, or as a computer implemented method. Programmatically, as used herein, means through the use of code or computer-executable instructions. These instructions can be stored in one or more memory resources of the computing device. A programmatically performed step may or may not be automatic.
- One or more examples described herein can be implemented using programmatic modules, engines, or components. A programmatic module, engine, or component can include a program, a sub-routine, a portion of a program, or a software component or a hardware component capable of performing one or more stated tasks or functions. As used herein, a module or component can exist on a hardware component independently of other modules or components. Alternatively, a module or component can be a shared element or process of other modules, programs or machines.
- Some examples described herein can generally require the use of computing devices, including processing and memory resources. For example, one or more examples described herein may be implemented, in whole or in part, on computing devices such as servers and/or personal computers using network equipment (e.g., routers). Memory, processing, and network resources may all be used in connection with the establishment, use, or performance of any example described herein (including with the performance of any method or with the implementation of any system).
- Furthermore, one or more examples described herein may be implemented through the use of instructions that are executable by one or more processors. These instructions may be carried on a non-transitory computer-readable medium. Machines shown or described with figures below provide examples of processing resources and computer-readable mediums on which instructions for implementing examples disclosed herein can be carried and/or executed. In particular, the numerous machines shown with examples of the invention include processors and various forms of memory for holding data and instructions. Examples of non-transitory computer-readable mediums include permanent memory storage devices, such as hard drives on personal computers or servers. Other examples of computer storage mediums include portable storage units, such as flash memory or magnetic memory. Computers, terminals, network-enabled devices are all examples of machines and devices that utilize processors, memory, and instructions stored on computer-readable mediums. Additionally, examples may be implemented in the form of computer programs, or a computer usable carrier medium capable of carrying such a program.
- FIG. 1 is a block diagram depicting an example computing system 100 in which embodiments described herein may be implemented, in accordance with examples described herein. In an embodiment, the computing system 100 can include one or more control circuits 110 that may include one or more processors (e.g., microprocessors), one or more processing cores, a programmable logic circuit (PLC) or a programmable logic/gate array (PLA/PGA), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), systems on chip (SoCs), or any other control circuit. In some implementations, the control circuit(s) 110 and/or computing system 100 may be part of, or may form, a vehicle control unit (also referred to as a vehicle controller) that is embedded or otherwise disposed in a vehicle (e.g., a Mercedes-Benz® car, truck, or van). For example, the vehicle controller may be or may include an infotainment system controller (e.g., an infotainment head-unit), a telematics control unit (TCU), an electronic control unit (ECU), a central powertrain controller (CPC), a central exterior & interior controller (CEIC), a zone controller, an autonomous vehicle control system, or any other controller (the term "or" is used herein interchangeably with "and/or").
- In an embodiment, the control circuit(s) 110 may be programmed by one or more computer-readable or computer-executable instructions stored on the non-transitory computer-readable medium 120 . The non-transitory computer-readable medium 120 may be a memory device, also referred to as a data storage device, which may include an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. The non-transitory computer-readable medium 120 may form, for example, a computer diskette, a hard disk drive (HDD), a solid state drive (SSD) or solid state integrated memory, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), dynamic random access memory (DRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), and/or a memory stick. In some cases, the non-transitory computer-readable medium 120 may store computer-executable instructions or computer-readable instructions, such as instructions to perform the methods described below in connection with FIGS. 7-9 .
- In various embodiments, the terms "computer-readable instructions" and "computer-executable instructions" are used to describe software instructions or computer code configured to carry out various tasks and operations. In various embodiments, if the computer-readable or computer-executable instructions form modules, the term "module" refers broadly to a collection of software instructions or code configured to cause the control circuit 110 to perform one or more functional tasks. The modules and computer-readable/executable instructions may be described as performing various operations or tasks when the control circuit(s) 110 or other hardware components execute the modules or computer-readable instructions.
- In further embodiments, the computing system 100 can include a communication interface 140 that enables communications over one or more networks 150 to transmit and receive data. In various examples, the computing system 100 can communicate, over the one or more networks 150 , with fleet vehicles using the communication interface 140 to receive sensor data and implement the methods described throughout the present disclosure. In certain embodiments, the communication interface 140 may be used to communicate with one or more other systems. The communication interface 140 may include any circuits, components, software, etc. for communicating via one or more networks 150 (e.g., a local area network, wide area network, the Internet, secure network, cellular network, mesh network, and/or peer-to-peer communication link). In some implementations, the communication interface 140 may include, for example, one or more of a communications controller, receiver, transceiver, transmitter, port, conductors, software and/or hardware for communicating data/information.
- As an example embodiment, the control circuit(s) 110 of the computing system 100 can include a SoC arrangement that facilitates the various methods and techniques described throughout the present disclosure. In various examples, the SoC can include a set of chiplets, including a central chiplet comprising a shared memory in which a reservation table is utilized to execute various autonomous driving workloads in independent deterministic pipelines. According to embodiments described herein, the shared memory of the central chiplet can include a FuSa program executable by the control circuit 110 to perform functional safety tasks for the SoC arrangement, as described in detail below.
FIG. 2 is a block diagram illustrating anexample SoC 200, in accordance with examples described herein. Theexample SoC 200 shown inFIG. 2 can include additional components, and the components of system onchip 200 may be arranged in various alternative configurations other than the example shown. Thus, the system onchip 200 ofFIG. 2 is described herein as an example arrangement for illustrative purposes and is not intended to limit the scope of the present disclosure in any manner. - Referring to
FIG. 2, a sensor data input chiplet 210 of the system on chip 200 can receive sensor data from various vehicle sensors 205 of the vehicle. These vehicle sensors 205 can include any combination of image sensors (e.g., single cameras, binocular cameras, fisheye lens cameras, etc.), LIDAR sensors, radar sensors, ultrasonic sensors, proximity sensors, and the like. The sensor data input chiplet 210 can automatically dump the received sensor data, as it is received, into a cache memory 231 of the central chiplet 220. The sensor data input chiplet 210 can also include an image signal processor (ISP) responsible for capturing, processing, and enhancing images taken from the various vehicle sensors 205. The ISP takes the raw image data and performs a series of complex image processing operations, such as color, contrast, and brightness correction, noise reduction, and image enhancement, to create a higher-quality image that is ready for further processing or analysis by the other chiplets of the SoC 200. The ISP may also include features such as auto-focus, image stabilization, and advanced scene recognition to further enhance the quality of the captured images. The ISP can then store the higher-quality images in the cache memory 231.

In some aspects, the sensor
data input chiplet 210 publishes identifying information for each item of sensor data (e.g., images, point cloud maps, etc.) to a shared memory 230 of a central chiplet 220, which acts as a central mailbox for synchronizing workloads for the various chiplets. The identifying information can include details such as an address in the cache memory 231 where the data is stored, the type of sensor data, which sensor captured the data, and a timestamp of when the data was captured.
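By way of a simplified, non-limiting illustration, the following Python sketch models the kind of identifying record the sensor data input chiplet 210 might publish to the shared-memory mailbox; the field names, types, and values are assumptions introduced here for illustration and are not part of the disclosed arrangement.

    from dataclasses import dataclass
    import time

    @dataclass(frozen=True)
    class SensorDataDescriptor:
        # Identifying information for one item of sensor data; all
        # field names here are illustrative assumptions.
        cache_address: int   # where the payload sits in cache memory 231
        data_type: str       # e.g., "image", "point_cloud", "radar"
        sensor_id: str       # which vehicle sensor captured the data
        captured_at: float   # capture timestamp, in seconds

    # The sensor data input chiplet publishing one descriptor to the
    # mailbox (a plain list standing in for shared memory 230).
    mailbox = []
    mailbox.append(SensorDataDescriptor(
        cache_address=0x8000_0000,
        data_type="image",
        sensor_id="front_fisheye_0",
        captured_at=time.time(),
    ))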
To communicate with the central chiplet 220, the sensor data input chiplet 210 transmits data through an interconnect 211a. Interconnects 211a-f each represent die-to-die (D2D) interfaces between the chiplets of the SoC 200. In some aspects, the interconnects include a high-bandwidth data path used for general data purposes to the cache memory 231 and a high-reliability data path to transmit functional safety and scheduler information to the shared memory 230. Depending on bandwidth requirements, an interconnect may include more than one die-to-die interface. For example, interconnect 211a can include two interfaces to support higher bandwidth communications between the sensor data input chiplet 210 and the central chiplet 220.

In one aspect, the interconnects 211a-f implement the Universal Chiplet Interconnect Express (UCIe) standard and communicate through an indirect mode to allow each of the chiplet host processors to access remote memory as if it were local memory. This is achieved by using a specialized Network on Chip (NoC) Network Interface Unit (NIU), which allows the devices connected to the network to communicate without interfering with one another, and which provides hardware-level support for remote direct memory access (RDMA) operations. In UCIe indirect mode, the host processor sends requests to the NIU, which then accesses the remote memory and returns the data to the host processor. This approach allows for efficient and low-latency access to remote memory, which can be particularly useful in distributed computing and data-intensive applications. Additionally, UCIe indirect mode provides a high degree of flexibility, as it can be used with a wide range of different network topologies and protocols.
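UCIe and RDMA are established technologies, but no programming interface is specified above; the hypothetical classes below only illustrate the indirect-mode flow just described, in which the host hands a request to the NIU and the NIU performs the remote access on its behalf.

    class RemoteMemory:
        # Stand-in for another chiplet's memory, addressed by byte offset.
        def __init__(self, size):
            self._cells = bytearray(size)

        def read(self, offset, length):
            return bytes(self._cells[offset:offset + length])

        def write(self, offset, data):
            self._cells[offset:offset + len(data)] = data

    class NetworkInterfaceUnit:
        # Hypothetical NIU: the host hands it a request; it performs the
        # remote access and returns the result, so remote memory can be
        # used as if it were local.
        def __init__(self, remote):
            self._remote = remote

        def rdma_read(self, offset, length):
            return self._remote.read(offset, length)

        def rdma_write(self, offset, data):
            self._remote.write(offset, data)

    # Host-side usage: requests go to the NIU, never to the memory directly.
    niu = NetworkInterfaceUnit(RemoteMemory(4096))
    niu.rdma_write(0x100, b"\xde\xad\xbe\xef")
    assert niu.rdma_read(0x100, 4) == b"\xde\xad\xbe\xef"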
In various examples, the system on
chip 200 can include additional chiplets that can store, alter, or otherwise process the sensor data cached by the sensor data input chiplet 210. The system on chip 200 can include an autonomous drive chiplet 240 that can perform the perception, sensor fusion, trajectory prediction, and/or other autonomous driving algorithms of the autonomous vehicle. The autonomous drive chiplet 240 can be connected to a dedicated HBM-RAM chiplet 235 in which the autonomous drive chiplet 240 can publish all status information, variables, statistical information, and/or processed sensor data as processed by the autonomous drive chiplet 240.

In various examples, the system on
chip 200 can further include a machine-learning (ML) accelerator chiplet 250 that is specialized for accelerating AI workloads, such as image inferences or other sensor inferences using machine learning, in order to achieve high performance and low power consumption for these workloads. The ML accelerator chiplet 250 can include an engine designed to efficiently process graph-based data structures, which are commonly used in AI workloads, and a highly parallel processor, allowing for efficient processing of large volumes of data. The ML accelerator chiplet 250 can also include specialized hardware accelerators for common AI operations such as matrix multiplication and convolution, as well as a memory hierarchy designed to optimize memory access for AI workloads, which often have complex memory access patterns.

The
general compute chiplets 245 can provide general purpose computing for the system on chip 200. For example, the general compute chiplets 245 can comprise high-powered central processing units and/or graphical processing units that can support the computing tasks of the central chiplet 220, autonomous drive chiplet 240, and/or the ML accelerator chiplet 250.

In various implementations, the shared
memory 230 can store programs and instructions for performing autonomous driving tasks. The shared memory 230 of the central chiplet 220 can further include a reservation table that provides the various chiplets with the information needed (e.g., sensor data items and their locations in memory) for performing their individual tasks. Further description of the shared memory 230 in the context of the dual SoC arrangements described herein is provided below with respect to FIG. 4. The central chiplet 220 also includes the large cache memory 231, which supports invalidate and flush operations for stored data.

Cache misses and evictions from the
cache memory 231 are handled by a high-bandwidth memory (HBM) RAM chiplet 255 connected to the central chiplet 220. The HBM-RAM chiplet 255 can include status information, variables, statistical information, and/or sensor data for all other chiplets. In certain examples, the information stored in the HBM-RAM chiplet 255 can be stored for a predetermined period of time (e.g., ten seconds) before deleting or otherwise flushing the data. For example, when a fault occurs on the autonomous vehicle, the information stored in the HBM-RAM chiplet 255 can include all information necessary to diagnose and resolve the fault. Cache memory 231 keeps fresh data available with low latency and lower power requirements compared to accessing data from the HBM-RAM chiplet 255.
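As a rough sketch of the time-bounded retention described for the HBM-RAM chiplet 255, the following assumes a fixed retention window (e.g., ten seconds): entries older than the window are flushed, while everything inside it remains available for fault diagnosis. The class and method names are illustrative.

    import collections
    import time

    class RetentionBuffer:
        def __init__(self, retention_s=10.0):
            self._retention_s = retention_s
            self._entries = collections.deque()  # (timestamp, item) pairs

        def record(self, item):
            self._entries.append((time.monotonic(), item))
            self._flush_expired()

        def snapshot_for_diagnosis(self):
            # On a fault, return everything still inside the window.
            self._flush_expired()
            return [item for _, item in self._entries]

        def _flush_expired(self):
            cutoff = time.monotonic() - self._retention_s
            while self._entries and self._entries[0][0] < cutoff:
                self._entries.popleft()

    buf = RetentionBuffer(retention_s=10.0)
    buf.record({"chiplet": "autonomous_drive", "temp_c": 71.2})
    assert len(buf.snapshot_for_diagnosis()) == 1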
As provided herein, the shared memory 230 can house a mailbox architecture in which a reflex program comprising a suite of instructions is used to execute workloads by the central chiplet 220, general compute chiplets 245, and/or autonomous drive chiplet 240. In certain examples, the central chiplet 220 can further execute a functional safety (FuSa) program that operates to compare and verify outputs of respective pipelines to ensure consistency in the ML inference operations. In still further examples, the central chiplet 220 can execute a thermal management program to ensure that the various components of the SoC 200 operate within normal temperature ranges. Further description of the shared memory 230 in the context of out-of-order workload execution in independent deterministic pipelines is provided below with respect to FIG. 3.
FIG. 3 is a block diagram illustrating an example central chiplet 300 of an SoC arrangement, in accordance with examples described herein. The central chiplet 300 shown in FIG. 3 can correspond to the central chiplet 220 of the SoC 200 as shown in FIG. 2. Furthermore, the sensor data input chiplet 310 of FIG. 3 can correspond to the sensor data input chiplet 210 shown in FIG. 2, and the workload processing chiplets 320 shown in FIG. 3 can correspond to any one or more of the general compute chiplets 245, ML accelerator chiplet 250, and/or the autonomous drive chiplet 240 shown in FIG. 2.

Referring to
FIG. 3, the central chiplet 300 can include a shared memory 360 that includes a reflex program 330, an application program 335, a thermal management program 337, and a FuSa program 338. As provided herein, the reflex program 330 can comprise a set of instructions for executing reflex workloads in independent pipelines. The reflex workloads can comprise sensor data acquisition, sensor fusion, and inference tasks that facilitate scene understanding of the surrounding environment of the vehicle. These tasks can comprise two-dimensional image processing, sensor fused data processing (e.g., three-dimensional LIDAR, radar, and image fusion data), neural radiance field (NeRF) scene reconstruction, occupancy grid determination, object detection and classification, motion prediction, and other scene understanding tasks for autonomous vehicle operation.

As further provided herein, the
application program 335 can comprise a set of instructions for operating the vehicle controls of the autonomous vehicle based on the outputs of the reflex workload pipelines. For example, the application program 335 can be executed by one or more processors 340 of the central chiplet 300 and/or one or more of the workload processing chiplets 320 (e.g., the autonomous drive chiplet 240 of FIG. 2) to dynamically generate a motion plan for the vehicle based on the execution of the reflex workloads, and operate the vehicle's controls (e.g., acceleration, braking, steering, and signaling systems) to execute the motion plan accordingly.

The
thermal management program 337 can be executed by one or more processors 340 of the central chiplet to manage heat generated by the SoC. In various examples, the thermal management program 337 can operate a set of cooling components (e.g., heat sinks, fans, cold plates, heat pipes, synthetic jets, etc.) included with the SoC to manage local temperatures of computational components within normal operating ranges. In further examples, the thermal management program 337 can be executed by a dedicated CPU of the central chiplet 300 and can communicate with the workload processing chiplets 320 and sensor data input chiplet 310 for adjusting clocking or throttling the computing components to further manage temperatures. In certain implementations, the thermal management program 337 can execute on each SoC of a dual or multiple-SoC arrangement, and can further communicate with the FuSa program 338 to, for example, power down a primary SoC when temperatures exceed nominal ranges and enable a backup SoC to take over autonomous inference and driving tasks.

According to examples described herein, the
FuSa program 338 can be executed by the one or more processors 340 (e.g., a dedicated FuSa CPU) of the central chiplet 300 to perform functional safety tasks for the SoC. As described throughout the present disclosure, these tasks can comprise acquiring and comparing output from multiple independent pipelines that correspond to inference and/or autonomous vehicle control tasks. For example, one independent pipeline can comprise workloads corresponding to identifying other vehicles operating around the vehicle in image data, and a second independent pipeline can comprise workloads corresponding to identifying other vehicles operating around the vehicle in radar and LIDAR data. The FuSa program 338 can execute FuSa workloads in another independent pipeline that acquires the output of the first and second independent pipelines to dynamically verify that they have identified the same vehicles.

In further examples, the
FuSa program 338 can operate to perform SoC monitoring in a dual SoC arrangement in which a primary SoC performs the inference and vehicle control tasks, and a backup SoC performs health monitoring on the primary SoC with its chiplets in a low power standby mode, ready to take over these tasks if any errors, faults, or failures are detected in the primary SoC. Further description of these FuSa functions is provided below with respect to FIG. 5. In still further examples, the FuSa program 338 can operate to monitor communications between chiplets and provide redundancy (e.g., via error correction code techniques) to ensure communication reliability between the chiplets. Further description of these techniques is provided below with respect to FIG. 6.

In various implementations, the
central chiplet 300 can include a set of one or more processors 340 (e.g., a transient-resistant CPU and general compute CPUs) that can execute a scheduling program 342 for out-of-order execution of workloads in a set of deterministic pipelines. In certain examples, one or more of the processors 340 can execute reflex workloads in accordance with the reflex program 330 and/or application workloads in accordance with the application program 335. As such, the processors 340 of the central chiplet 300 can reference, monitor, and update dependency information in workload entries of the reservation table 350 as workloads become available and are executed accordingly. For example, when a workload is executed by a particular chiplet, the chiplet updates the dependency information of other workloads in the reservation table 350 to indicate that the workload has been completed. This can include changing a bitwise operator or binary value representing the workload (e.g., from 0 to 1) to indicate in the reservation table 350 that the workload has been completed. Accordingly, the dependency information for all workloads having dependency on the completed workload is updated accordingly.

According to examples described herein, the reservation table 350 can include workload entries, each of which indicates a workload identifier that describes the workload to be performed, an address in the
cache memory 315 and/or HBM-RAM of the location of raw or processed sensor data required for executing the workload, and any dependency information corresponding to dependencies that need to be resolved prior to executing the workload. In certain aspects, the dependencies can correspond to other workloads that need to be executed. Once the dependencies for a particular workload are resolved, the workload entry can be updated (e.g., by the chiplet executing the dependent workloads, or by the processors 340 of the central chiplet 300 through execution of the scheduling program 342). When no dependencies exist for a particular workload as referenced in the reservation table 350, the workload can be executed in a respective pipeline by a corresponding workload processing chiplet 320.
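A minimal sketch of a reservation-table entry and the dependency update described above, under the assumption that dependency state can be modeled as a set of pending workload identifiers (the binary 0-to-1 flip in the text corresponds to removal from the set). All names are illustrative.

    from dataclasses import dataclass, field

    @dataclass
    class WorkloadEntry:
        workload_id: str
        data_address: int            # location of input data in cache/HBM
        pending_deps: set = field(default_factory=set)

        @property
        def ready(self):
            return not self.pending_deps

    class ReservationTable:
        def __init__(self):
            self.entries = {}  # workload_id -> WorkloadEntry

        def add(self, entry):
            self.entries[entry.workload_id] = entry

        def mark_complete(self, workload_id):
            # A chiplet reporting completion clears the corresponding
            # dependency in every dependent entry, then retires the entry.
            for entry in self.entries.values():
                entry.pending_deps.discard(workload_id)
            self.entries.pop(workload_id, None)

    table = ReservationTable()
    table.add(WorkloadEntry("stitch", 0x1000, pending_deps={"acquire"}))
    table.add(WorkloadEntry("acquire", 0x0800))
    table.mark_complete("acquire")
    assert table.entries["stitch"].ready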
In various implementations, the sensor data input chiplet 310 obtains sensor data from the sensor system of the vehicle, and stores the sensor data (e.g., image data, LIDAR data, radar data, ultrasonic data, etc.) in a cache 315 of the central chiplet 300. The sensor data input chiplet 310 can generate workload entries for the reservation table 350 comprising identifiers for the sensor data (e.g., an identifier for each obtained image from various cameras of the vehicle's sensor system) and provide an address of the sensor data in the cache memory 315. An initial set of workloads can be executed on the raw sensor data by the processors 340 of the central chiplet 300 and/or workload processing chiplets 320, which can update the reservation table 350 to indicate that the initial set of workloads have been completed.

As described herein, the
workload processing chiplets 320 monitor the reservation table 350 to determine whether particular workloads in their respective pipelines are ready for execution. As an example, the workload processing chiplets 320 can continuously monitor the reservation table using a workload window 355 (e.g., an instruction window for multimedia data) in which a pointer can sequentially read through each workload entry to determine whether the workloads have any unresolved dependencies. If one or more dependencies still exist in the workload entry, the pointer progresses to the next entry without the workload being executed. However, if the workload entry indicates that all dependencies have been resolved (e.g., all workloads upon which the particular workload depends have been executed), then the relevant workload processing chiplet 320 and/or processors 340 of the central chiplet 300 can execute the workload accordingly.
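Building on the ReservationTable sketch above, one pass of the workload-window pointer might reduce to the following loop: entries with unresolved dependencies are skipped, and ready entries are dispatched and retired, so workloads complete out of order as their dependencies resolve.

    def scan_workload_window(table, execute):
        # One pass of the pointer over the workload window.
        for entry in list(table.entries.values()):
            if not entry.ready:
                continue                  # pointer advances past it
            execute(entry)                # run on the relevant chiplet
            table.mark_complete(entry.workload_id)

    # Using the classes from the preceding sketch: the first pass runs
    # "acquire"; the second pass runs the now-ready dependent workload.
    table = ReservationTable()
    table.add(WorkloadEntry("fuse", 0x2000, pending_deps={"acquire"}))
    table.add(WorkloadEntry("acquire", 0x0800))
    scan_workload_window(table, execute=lambda e: print("run", e.workload_id))
    scan_workload_window(table, execute=lambda e: print("run", e.workload_id))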
As such, the workloads are executed in an out-of-order manner, where certain workloads are buffered until their dependencies are resolved. Accordingly, to facilitate out-of-order execution of workloads, the reservation table 350 comprises an out-of-order buffer that enables the workload processing chiplets 320 to execute the workloads in an order governed by the resolution of their dependencies in a deterministic manner. It is contemplated that out-of-order execution of workloads in the manner described herein can increase speed, increase power efficiency, and decrease complexity in the overall execution of the workloads.

As described herein, the
workload processing chiplets 320 can execute workloads in each pipeline in a deterministic manner, such that successive workloads of the pipeline are dependent on the outputs of preceding workloads in the pipeline. In various implementations, the processors 340 and workload processing chiplets 320 can execute multiple independent workload pipelines in parallel, with each workload pipeline including a plurality of workloads to be executed in a deterministic manner. Each workload pipeline can provide sequential outputs (e.g., for other workload pipelines or for processing by the application program 335 for autonomously operating the vehicle). Through concurrent execution of the reflex workloads in deterministic pipelines, the application program 335 can autonomously operate the controls of the vehicle along a travel route.

As an illustration, the
scheduling program 342 can cause the processors 340 and workload processing chiplets 320 to perform out-of-order execution on the workloads in independent pipelines. In previous implementations, each image generated by the camera system of the vehicle would be processed or inferred on as the image became available. The instruction set would involve acquiring the image, scheduling inference on the image by a workload processing chiplet, performing inference on the image, acquiring a second image, scheduling inference on the second image by the workload processing chiplet, performing inference on the second image, and so on across the suite of cameras of the vehicle. By reorganizing the order in which workloads are processed, the complexity of computation is significantly reduced. Specifically, for validating an autonomous driving system that utilizes out-of-order workload execution as described herein, the number of computational combinations for verification (e.g., by a safety authority) is significantly reduced.

As provided herein, the use of the
workload window 355 and reservation table 350 referencing dependency information for workloads enables the workload processing chiplets 320 to operate more efficiently by performing out-of-order execution on the workloads. Instead of performing inference on images based on when they are available, a workload processing chiplet 320 can acquire all images from all cameras first, and then perform inference on all the images together. Accordingly, the workload processing chiplet 320 executes its workloads with significantly reduced complexity, increased speed, and reduced power requirements.

In further examples, the shared
memory 360 can include a thermal management program 337 executable by the one or more processors 340 to manage the various temperatures of the SoC 200, operate cooling components, perform hardware throttling, switch to backup components (e.g., a backup SoC), and the like. In still further examples, the shared memory 360 can include a FuSa program 338 that performs functional safety tasks for the SoC 200, such as monitoring communications within the SoC (e.g., using error correction code), comparing outputs of different pipelines, and monitoring hardware performance of the SoC. According to examples described herein, the thermal management program 337 and FuSa program 338 can perform their respective tasks in independent pipelines.
FIG. 4 depicts workloads being executed in a set of independent pipelines 400, 410, and further depicts a functional safety (FuSa) pipeline 420 operable to compare and verify the outputs of the independent pipelines, according to examples described herein. In the below discussion of FIG. 4, various workloads can be executed in independent deterministic pipelines by one or more processors 340 of the central chiplet 300 and/or the workload processing chiplets 320 through execution of the reflex program 330, application program 335, thermal management program 337, FuSa program 338, and/or scheduling program 342 as depicted in FIG. 3.

Referring to
FIG. 4, pipeline 400 and pipeline 410 are executed in parallel by one or more chiplets of the SoC. While only workload pipelines 400 and 410 are shown in FIG. 4, any number of pipelines can be executed in parallel by the central chiplet 300 and/or workload processing chiplets 320 in performing the reflex and application tasks described throughout the present disclosure. As described herein, the reflex and application tasks can comprise sensor data acquisition, sensor fusion, inference tasks that facilitate scene understanding of the surrounding environment of the vehicle, motion prediction, motion planning, and vehicle control tasks for autonomously operating a vehicle. Additional tasks may also be executed in individual pipelines, such as power control tasks, thermal management tasks, health monitoring tasks, and the like.

In various implementations, the
scheduling program 342 can cause the workloads represented by the workload entries in the reservation table 350 to be executed deterministically in independent pipelines, such that the order of workload execution in each pipeline is consistent and non-reversible. Furthermore, the workloads executed in each pipeline can comprise a chain of dependency, such that the outputs of the pipelines are based on the same or similar workloads being sequentially executed in each pipeline. As such, complexity in the inference operations is significantly reduced, which can facilitate certification of each individual pipeline for autonomous driving purposes.

As an example,
pipeline 400 can be tasked with performing inference on two-dimensional image data (e.g., to identify and classify other dynamic entities proximate to the vehicle in the images). A first workload in pipeline 400 can comprise obtaining images captured by each camera of the vehicle at a given time. A second workload in pipeline 400 can comprise stitching the images to form a 360-degree ribbon of the surrounding environment of the vehicle. A third workload in pipeline 400 can comprise performing inference on the two-dimensional image data (e.g., pixel analysis to identify the dynamic entities). Accordingly, an output of pipeline 400 can comprise a two-dimensional ribbon with dynamic entities identified (e.g., with a bounding box) and/or classified (e.g., as bicyclists, other vehicles, pedestrians, etc.).

As another example,
pipeline 410 can be tasked with performing inference on three-dimensional sensor fusion data (e.g., comprising fused LIDAR, image, and/or radar data). For example, pipeline 410 can also be tasked with identifying external dynamic entities in the three-dimensional data. A first workload in pipeline 410 can comprise acquiring point clouds generated by LIDAR sensors of the vehicle at a given time, and acquiring radar and ultrasonic data from the same time. A second workload in pipeline 410 can comprise fusing the sensor data to provide a three-dimensional, fused sensor view of the surrounding environment of the vehicle. A third workload in pipeline 410 can comprise performing inference on the three-dimensional sensor fusion data to identify and/or classify the external dynamic entities.
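As a simplified illustration of the deterministic pipelines just described, each pipeline can be modeled as a fixed, ordered chain of stage functions, so that its output depends only on its input; the stages below are stubs standing in for the acquisition, stitching/fusion, and inference workloads.

    def run_pipeline(stages, data):
        # A deterministic pipeline: stages run in a fixed, non-reversible order.
        for stage in stages:
            data = stage(data)
        return data

    pipeline_400 = [
        lambda d: {"images": d},                    # obtain camera images
        lambda d: {"ribbon": d["images"]},          # stitch 360-degree ribbon
        lambda d: {"entities_2d": ["vehicle_12"]},  # 2-D inference (stub)
    ]
    pipeline_410 = [
        lambda d: {"points": d},                    # acquire LIDAR/radar/ultrasonic
        lambda d: {"fused": d["points"]},           # fuse into a 3-D view
        lambda d: {"entities_3d": ["vehicle_12"]},  # 3-D inference (stub)
    ]

    out_2d = run_pipeline(pipeline_400, "raw camera frames")
    out_3d = run_pipeline(pipeline_410, "raw point clouds")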
As described herein, the workload processing chiplets (e.g., workload processing chiplets 320 and the central chiplet 300 of FIG. 3) can execute respective workloads in various other deterministic pipelines (e.g., in accordance with the reflex program 330 and/or application program 335 shown in FIG. 3). For example, a first pipeline can be dedicated for identifying traffic signals in two-dimensional image data, a second pipeline can be dedicated for identifying traffic signals in three-dimensional sensor fusion data, a third pipeline can be dedicated for identifying and classifying lane markings, a fourth pipeline can be dedicated for generating occupancy grid maps from the sensor data, a fifth pipeline can be dedicated for predicting the motion of external dynamic entities, a sixth pipeline can be dedicated for planning the motion of the vehicle based on the inferences from other pipelines, a seventh pipeline can be dedicated for controlling the vehicle's control systems to execute the motion plan generated by the sixth pipeline, and so on.

According to various examples, the workloads or tasks performed in each pipeline are ordered deterministically (e.g., by the
scheduling program 342 of FIG. 3), which can significantly reduce complexity in certifying the autonomous drive system. For example, a single inference mechanism for an autonomous drive system that performs natural order processing using image data may not be certifiable due to the complexity and randomness of its workload executions, as well as the potential for outliers in the single inference mechanism (e.g., confusion about certain detected objects and lack of comparison between multiple inference mechanisms). These outliers may result in stuck states or collisions for the autonomous vehicle. With the use of deterministic pipelines that independently execute workloads, any outliers from one pipeline can be mitigated or otherwise overcome by comparison and confirmation mechanisms from other pipelines.

As shown in
FIG. 4, the various workloads of pipeline 400 and pipeline 410 can be executed as runnables on one or more processors of one or more chiplets of the SoC 200. In certain examples, a transient-resistant CPU (e.g., of central chiplet 220 and/or general compute chiplets 245) can execute the workloads in pipeline 400 and pipeline 410. It is contemplated that the use of robust, transient-resistant CPUs (e.g., ASIL-D rated CPUs) for executing workloads in the independent deterministic pipelines can further bolster the ASIL rating of the autonomous drive system as a whole. These transient-resistant CPUs can be manufactured for robustness in terms of reliability and resistance to heat, cold, radiation, wear, age, vibration, shock, etc. It is further contemplated that transient-resistant CPUs may not have the computing power of modern, non-transient-resistant CPUs (e.g., having an ASIL-B rating) that are designed and manufactured to maximize bandwidth and processing speed.

As further shown in
FIG. 4, the workloads in pipeline 400 and pipeline 410 can be executed as runnables on multiple CPUs of the SoC 200 and/or multiple chiplets of the SoC 200. For example, a transient-resistant CPU can execute workloads in each pipeline 400, 410 and can be backed up by one or more state-of-the-art CPUs that execute the same workloads in each pipeline 400, 410. The transient-resistant CPU(s) may execute workloads in each pipeline 400, 410 at a lower frequency than the other CPUs. For example, the transient-resistant CPU(s) can execute the workloads in each pipeline 400, 410 and provide outputs on the order of microseconds, whereas the other CPUs can provide outputs for each pipeline 400, 410 on the order of nanoseconds.

In an example, the transient-resistant CPUs may execute workloads in
deterministic pipeline 400 and identify external dynamic objects (e.g., other vehicles, bicyclists, pedestrians, etc.) in two-dimensional image data every few microseconds. The other CPU may execute the same workloads in deterministic pipeline 400 to identify the same external dynamic entities every few nanoseconds (e.g., or at the same frequency that the images are generated by the cameras). Thus, the outputs by the transient-resistant CPU(s) can be verified or confirmed by the outputs of the other CPU(s) in each deterministic pipeline. This process can occur for each independent pipeline performing inference operations (e.g., the reflex program 330), and can further be utilized for the application program 335, thermal management program 337, and/or the FuSa program 338.

In certain aspects, the workloads of
pipeline 400 and pipeline 410 can be executed by one or more CPUs of the central chiplet 220 and/or one or more CPUs of the general compute chiplets 245. FIG. 4 further shows an example FuSa pipeline 420 that dynamically compares and verifies the outputs of the runnables for each pipeline 400, 410, according to various examples. In certain implementations, the FuSa pipeline 420 can compare the outputs of multiple runnables performed by different CPUs in each pipeline 400, 410, as well as comparing the outputs of pipeline 400 with the outputs of pipeline 410. In the example of identifying and classifying dynamic external entities, the two-dimensional outputs from pipeline 400 can indicate the entities in image data that lacks precise distance information to each entity, whereas the three-dimensional outputs from pipeline 410 may lack information such as color and edge detail that facilitates classification of the external entities. Furthermore, the sensor fused data processed in pipeline 410 can include radar and/or ultrasonic data that can provide detailed proximity and/or speed differential information of the external entities.

As such, the outputs of
pipeline 400 and pipeline 410 have different outliers that, when viewed alone, can affect the accuracy of the autonomous drive system's capabilities. As described herein, the various workload processing chiplets (e.g., chiplets 320 and central chiplet 300 of FIG. 3) can execute workloads in any number of pipelines, with each pipeline having different outliers based on the sensor data being processed. As further described herein, the outputs of certain pipelines can be compared with the outputs of other pipelines through the execution of one or more FuSa pipelines 420 that acquire and dynamically verify the respective outputs of different independent pipelines.

As shown in
FIG. 4, the FuSa pipeline 420 can acquire the outputs of pipeline 400 and pipeline 410 and compare and verify their outputs. As described herein, the outputs can correspond to any inference operations relating to the processing of sensor data from the sensor system of the vehicle. In various examples, the runnable of the FuSa pipeline 420 can be executed on a dedicated CPU (e.g., on the central chiplet 220 of the SoC 200 arrangement).

In the example shown in
FIG. 4, the FuSa pipeline 420 acquires the two-dimensional outputs of pipeline 400 and the three-dimensional outputs of pipeline 410. The FuSa pipeline 420 then compares the two-dimensional and three-dimensional outputs to determine whether they are consistent with each other. For inferences involving the identification and/or classification of external dynamic entities, the FuSa pipeline 420 will confirm whether pipeline 400 and pipeline 410 have both separately identified and/or classified the same external dynamic entities in the surrounding environment of the vehicle using different sensor data and/or techniques having different outliers.
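In simplified form, and assuming each pipeline's output can be reduced to a set of identified entity labels, the comparison performed by the FuSa pipeline 420 might look like the following; the mismatch handling is illustrative only.

    def fusa_verify(entities_2d, entities_3d):
        # Confirm that two independent pipelines separately identified the
        # same external dynamic entities; flag any disagreement.
        mismatch = set(entities_2d) ^ set(entities_3d)
        if mismatch:
            # A real system would raise a functional-safety event here.
            print("FuSa mismatch on entities:", sorted(mismatch))
            return False
        return True

    assert fusa_verify({"vehicle_12", "ped_3"}, {"vehicle_12", "ped_3"})
    assert not fusa_verify({"vehicle_12"}, {"vehicle_12", "ped_3"})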
While the examples shown in FIG. 4 involve pipelines that process different types of sensor data, numerous other deterministic pipelines are contemplated in which a FuSa pipeline is utilized to compare and dynamically verify their outputs. For example, this can include a FuSa pipeline that compares outputs of multiple pipelines tasked to identify traffic signals and traffic signal states, outputs of motion prediction pipelines tasked to predict the motion of external dynamic entities, and comparable outputs of other deterministic pipelines that facilitate autonomously operating the vehicle. As such, any issue that occurs in any pipeline can be readily detected and flagged by a FuSa pipeline. It is contemplated that the use of transient-resistant CPUs with support from general compute CPUs, the execution of workloads in verifiable deterministic pipelines, and the use of FuSa pipelines to dynamically compare and verify the outputs from the deterministic pipelines can all combine to provide an increased ASIL rating (e.g., an ASIL-D rating) for the autonomous driving system of the vehicle.
FIG. 5 is a block diagram depicting an example computing system 500 implementing a multiple system-on-chip (MSoC), in accordance with examples described herein. In various examples, the computing system 500 can include a first SoC 510 having a first memory 515 and a second SoC 520 having a second memory 525, coupled by an interconnect 540 (e.g., an ASIL-D rated interconnect) that enables each of the first SoC 510 and second SoC 520 to read each other's memories 515, 525. During any given session, the first SoC 510 and the second SoC 520 may alternate roles between a primary SoC and a backup SoC. As provided herein, the primary SoC can perform various autonomous driving tasks, such as perception, object detection and classification, grid occupancy determination, sensor data fusion and processing, motion prediction (e.g., of dynamic external entities), motion planning, and vehicle control tasks. The backup SoC can maintain a set of computational components (e.g., CPUs, ML accelerators, and/or memory chiplets) in a low power state, and continuously or periodically read the memory of the primary SoC.

For example, if the
first SoC 510 is the primary SoC and the second SoC 520 is the backup SoC, then the first SoC 510 performs a set of autonomous driving tasks and publishes state information corresponding to these tasks in the first memory 515. The second SoC 520 reads the published state information in the first memory 515 to continuously check that the first SoC 510 is operating within nominal thresholds (e.g., temperature thresholds, bandwidth and/or memory thresholds, etc.), and that the first SoC 510 is performing the set of autonomous driving tasks properly. As such, the second SoC 520 performs health monitoring and error management tasks for the first SoC 510, and takes over control of the set of autonomous driving tasks when a triggering condition is met. As provided herein, the triggering condition can correspond to a fault, failure, or other error experienced by the first SoC 510 that may affect the performance of the set of tasks by the first SoC 510.
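In simplified form, the backup SoC's health-monitoring pass reduces to a predicate over the primary's published state; the state fields and temperature threshold below are assumptions introduced for illustration.

    from dataclasses import dataclass

    @dataclass
    class SocState:
        temperature_c: float   # published operating temperature
        fault: bool = False    # published fault/failure indication

    def takeover_triggered(peer, temp_limit_c=105.0):
        # One monitoring pass by the backup SoC's FuSa component: True
        # means a triggering condition is met and the backup should
        # power up its components and take over the task set.
        return peer.fault or peer.temperature_c > temp_limit_c

    assert not takeover_triggered(SocState(temperature_c=72.0))
    assert takeover_triggered(SocState(temperature_c=118.0))
    assert takeover_triggered(SocState(temperature_c=70.0, fault=True))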
In various implementations, the second SoC 520 can publish state information corresponding to its computational components being maintained in a standby state (e.g., a low power state in which the second SoC 520 maintains readiness to take over the set of tasks from the first SoC 510). In such examples, the first SoC 510 can monitor the state information of the second SoC 520 by continuously or periodically reading the memory 525 of the second SoC 520 to also perform health check monitoring and error management on the second SoC 520. For example, if the first SoC 510 detects a fault, failure, or other error in the second SoC 520, the first SoC 510 can trigger the second SoC 520 to perform a system reset or reboot.

In certain examples, the
first SoC 510 and the second SoC 520 can each include a functional safety (FuSa) component (e.g., a FuSa program 338 executed by one or more processors 340 of a central chiplet 300, as shown and described with respect to FIG. 3) that performs the health monitoring and error management tasks. The FuSa component can be maintained in a powered state for each SoC, whether the SoC operates in a primary or backup manner. As such, the backup SoC may maintain its other components in a low powered state, with its FuSa component being powered up and performing the health monitoring and error management tasks described herein.

In various aspects, when the
first SoC 510 operates as the primary SoC, the state information published in the first memory 515 can correspond to the set of tasks being performed by the first SoC 510. For example, the first SoC 510 can publish any information corresponding to the surrounding environment of the vehicle (e.g., any external entities identified by the first SoC 510, their locations and predicted trajectories, detected objects such as traffic signals, signage, lane markings, and crosswalks, and the like). The state information can further include the operating temperatures of the computational components of the first SoC 510, bandwidth usage and available memory of the chiplets of the first SoC 510, and/or any faults or errors, or information indicating faults or errors in these components.

In further aspects, when the
second SoC 520 operates as the backup SoC, the state information published in the second memory 525 can correspond to the state of each computational component of the second SoC 520. In particular, these components may operate in a low power state in which the components are ready to take over the set of tasks being performed by the first SoC 510. The state information can include whether the components are operating within nominal temperatures and other nominal ranges (e.g., available bandwidth, power, memory, etc.).

As described throughout the present disclosure, the
first SoC 510 and the second SoC 520 can switch between operating as the primary SoC and the backup SoC (e.g., each time the system 500 is rebooted). For example, in a computing session subsequent to a session in which the first SoC 510 operated as the primary SoC and the second SoC 520 operated as the backup SoC, the second SoC 520 can assume the role of the primary SoC and the first SoC 510 can assume the role of the backup SoC. It is contemplated that this process of switching roles between the two SoCs can provide substantially even wear of the hardware components of each SoC, which can prolong the lifespan of the computing system 500 as a whole.

According to embodiments, the
first SoC 510 can be powered by a first power source and the second SoC 520 can be powered by a second power source that is independent or isolated from the first power source. For example, in an electric vehicle, the first power source can comprise the battery pack used for propelling the electric motors of the vehicle, and the second power source can comprise the auxiliary power source of the vehicle (e.g., a 12-volt battery). In other implementations, the first and second power sources can comprise other types of power sources, such as dedicated batteries for each SoC 510, 520 or other power sources that are electrically isolated or otherwise not dependent on each other.

It is contemplated that the MSoC arrangement of the
computing system 500 can be provided to increase the safety integrity level (e.g., ASIL rating) of the computing system 500 and the overall autonomous driving system of the vehicle. As described herein, the autonomous driving system can include any number of dual SoC arrangements, each of which can perform a set of autonomous driving tasks. In doing so, the backup SoC dynamically monitors the health of the primary SoC in accordance with a set of functional safety operations, such that when a fault, failure, or other error is detected, the backup SoC can readily power up its components and take over the set of tasks from the primary SoC.
FIG. 6 is a block diagram depicting a performance network and a FuSa network for performing health monitoring and error correction, according to examples described herein. In various examples, the FuSa CPU(s) 600 can be included on a central chiplet 300 of an SoC, or each central chiplet 300 of an MSoC 500 as described herein. Each FuSa CPU 600 can execute a FuSa program 602, which can correspond to the FuSa program 338 as shown and described with respect to FIG. 3. As described herein, execution of the FuSa program 602 can cause the FuSa CPU(s) 600 to perform the primary and backup SoC monitoring tasks described with respect to FIG. 5, and the execution of FuSa workloads in a FuSa pipeline 420 for comparison and verification of independent pipeline output described with respect to FIG. 4.

Furthermore, in the example shown in
FIG. 6, multiple chiplets of the SoC can communicate with each other over a high-bandwidth performance network comprising respective sets of interconnects (e.g., interconnect 610 and interconnect 660) and network hubs (e.g., network hubs 615, 635, and 665). As described herein, the multiple chiplets can comprise the sensor data input chiplet 310, central chiplet 300, and workload processing chiplets 320 of FIG. 3, which are represented by chiplet A 605, chiplet B 655, and any number of additional chiplets (not shown) in FIG. 6. Still further, the cache memories 625, 675 shown in FIG. 6 may represent cache memories associated with multiple chiplets and/or the cache memory 315 of the central chiplet 300 as shown and described with respect to FIG. 3.

In various examples, raw sensor data, processed sensor data, and various communications between
chiplet A 605, chiplet B 655, and the FuSa CPU(s) 600 can be transmitted over the high-bandwidth performance network comprising the interconnects 610, 660, network hubs 615, 635, 665, and caches 625, 675. For example, if chiplet A 605 comprises a sensor data input chiplet, then chiplet A 605 can obtain sensor data from the various sensors of the vehicle and transmit the sensor data to cache 625 via interconnect 610 and network hub 615. In this example, if chiplet B 655 comprises a workload processing chiplet, then chiplet B 655 can acquire the sensor data from cache 625 via network hubs 615, 635, 665 and interconnect 660 to execute respective inference workloads based on the sensor data.

In certain implementations, the FuSa CPU(s) 600, through execution of the FuSa program 602, can communicate with the high-bandwidth performance network via a performance network-on-chip (NoC) 607 coupled to a
network hub 635. These communications can comprise, for example, acquiring output data from independent pipelines to perform the comparison and verification steps described herein. The communications over the high-bandwidth performance network can further comprise communications to access the shared memories 515, 525 of each SoC 510, 520 in a multiple SoC arrangement 500 that comprises a primary SoC and a backup SoC. In such examples, the FuSa CPUs 600 of each SoC 510, 520 access the shared memory 515, 525 of the other to determine whether any faults, failures, or other errors have occurred. As described herein, when the backup SoC detects a fault, failure, or error, the backup SoC takes over the primary SoC tasks (e.g., inference, scene understanding, vehicle control tasks, etc.).

In some aspects, the
interconnects 610, 660 are used as a high-bandwidth data path for general data purposes to the cache memories 625, 675, and the health control modules 620, 670 and FuSa accounting hubs 630, 640, and 680 are used as a high-reliability data path to transmit functional safety and scheduler information to the shared memory of the SoC. NoCs and network interface units (NIUs) on chiplet A 605 and chiplet B 655 can be configured to generate error-correcting code (ECC) data on both the high-bandwidth and high-reliability data paths. Each corresponding NIU on each pairing die has the same ECC configuration, which generates and checks the ECC data to ensure end-to-end error correction coverage.

According to various embodiments, the FuSa CPU(s) 600 communicate via a FuSa network comprising the
FuSa accounting hubs 630, 640, 680 and health control modules 620, 670 via a FuSa NoC 609. As provided herein, the FuSa network facilitates the communication monitoring and error correction code techniques. As shown in FIG. 6, a FuSa accounting hub 630, 640, 680 can monitor communications transmitted through each network hub 615, 635, 665 of the high-bandwidth network. Each of chiplet A 605 and chiplet B 655 can communicate with or include a health control module 620, 670 through which ECC data, workload start and end communications, and scheduling information can be transmitted.

For the FuSa network data paths, the NIUs can transmit the functional safety and scheduler information through the
health control modules 620, 670 in two redundant transactions, with the second transaction ordering the bits in reverse (e.g., from bit 31 to 0 on a 32-bit bus) of the order of the first transaction. Furthermore, if errors are detected in the data transfers between chiplet A 605 and chiplet B 655 over the high-reliability FuSa network, the NIUs can reduce the transmission rate to improve reliability.
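The redundant dual-transaction scheme can be sketched as follows: the second copy of each 32-bit word is transmitted with its bits in reverse order, and the receiver un-reverses it and compares the two copies. This is a simplified behavioral model, not the actual NIU implementation.

    def reverse_bits32(word):
        # Reverse the bit order of a 32-bit word (bit 31 <-> bit 0).
        return int(format(word & 0xFFFFFFFF, "032b")[::-1], 2)

    def send_redundant(word):
        # Two redundant transactions: the second copy is bit-reversed.
        return word & 0xFFFFFFFF, reverse_bits32(word)

    def check_redundant(first, second):
        # Receiver-side check: un-reversing the second transaction must
        # reproduce the first; a mismatch indicates a transfer error.
        return reverse_bits32(second) == first

    first, second = send_redundant(0x1234ABCD)
    assert check_redundant(first, second)
    assert not check_redundant(first ^ 0x1, second)  # injected bit error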
In some examples, certain processors of chiplet A 605, chiplet B 655, and/or the FuSa CPU(s) 600 can include a transient-resistant CPU core to run the scheduling program 342 of FIG. 3, which schedules workloads belonging to the reflex program 330, application program 335, thermal management program 337, and/or the FuSa program 602. The transient-resistant CPU cores are designed to resist and recover from transient faults caused by environmental factors such as cosmic radiation, power surges, and electromagnetic interference. These faults can cause the CPU to malfunction or produce incorrect results, potentially leading to system failures or security vulnerabilities. To address these issues, the transient-resistant CPU cores can include a range of hardware-based fault detection and recovery mechanisms, such as redundant execution units, error-correcting code (ECC) memory, and register duplication. These mechanisms can detect and correct errors in real-time, ensuring that the CPU continues to function correctly even in the presence of transient faults. Additionally, the transient-resistant CPU cores may include various software-based fault tolerance techniques, such as checkpointing and rollback, to further enhance system reliability and resilience.

In some aspects, the
health control modules 620, 670 and FuSa accounting hubs 630, 640, 680 can detect and correct errors in real-time, ensuring that the CPUs continue to function correctly even in the presence of transient faults. For example, a workload processing chiplet (e.g., chiplet A 605) and the central chiplet perform an error correction check to verify that the processed data was sent and stored in the cache memories 625, 675 completely and without corruption. For each processed data communication, the workload processing chiplets can generate an error correction code (ECC) using the processed data and transmit the ECC to the central chiplet. While the data itself is transmitted along the high-bandwidth performance network between chiplets, the ECC is sent along the high-reliability FuSa network via the FuSa accounting hubs 630, 640, 680. Upon receiving the processed data, the central chiplet can generate its own ECC using the processed data, and the FuSa CPU 600 can perform a functional safety call in the central chiplet mailbox to compare the two ECCs to ensure that they match, which verifies that the data was transmitted correctly.

It is contemplated that the communications over the high-bandwidth performance network and high-reliability FuSa network, as well as the ECC techniques described herein, can provide additional redundancy and mitigative actions that can bolster the ASIL rating of individual SoC components and the autonomous drive system as a whole. As such, a safety authority may readily verify and certify the various methods and components described throughout the present disclosure such that an autonomous drive system implementing these methods and components can be deemed safe and reliable for use on public road networks.
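In simplified form, the end-to-end check described above amounts to recomputing a code over the payload that arrived on the performance network and comparing it with the code that arrived on the FuSa network. The sketch below uses CRC32 purely as a self-contained stand-in; a true ECC, unlike a CRC, can also correct detected errors.

    import zlib

    def make_check_code(payload):
        # Stand-in for the ECC the sending chiplet generates over the
        # processed data (CRC32 detects, but does not correct, errors).
        return zlib.crc32(payload)

    def verify_transfer(payload_via_performance_net, code_via_fusa_net):
        # The receiver recomputes the code over the data that arrived on
        # the high-bandwidth network and compares it with the code that
        # arrived on the high-reliability FuSa network.
        return make_check_code(payload_via_performance_net) == code_via_fusa_net

    data = b"processed sensor frame"
    assert verify_transfer(data, make_check_code(data))
    assert not verify_transfer(b"corrupted frame", make_check_code(data))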
FIGS. 7 through 9 are flow charts describing various methods of implementing the functional safety techniques described above. In the below discussions of the methods of FIGS. 7 through 9, reference may be made to reference characters representing certain features described with respect to the diagrams of FIGS. 1 through 6. Furthermore, the steps described with respect to the flow charts of FIGS. 7 through 9 may be performed by the computing system 100, the workload processing chiplets 320 and central chiplet 300 of the SoC 200, and/or the MSoC 500 as shown and described with respect to FIGS. 1 through 6. Further still, certain steps described with respect to the flow charts of FIGS. 7 through 9 may be performed prior to, in conjunction with, or subsequent to any other step, and need not be performed in the respective sequences shown.
FIG. 7 is a flow chart describing a method of dynamically comparing and verifying workload outputs by a set of workload processing chiplets, according to various examples. Referring to FIG. 7, at block 700, a sensor data input chiplet 310 can obtain sensor data from a set of vehicle sensors 205. As described herein, the sensor data from the vehicle sensors 205 can comprise any combination of LIDAR data, image data, radar data, and/or other forms of sensor data (e.g., ultrasonic data, IR data, etc.).

At
block 705, a set of workload processing chiplets (e.g., processing chiplets 320 and/or the central chiplet 300) can execute workloads based on the sensor data in a set of independent pipelines. In particular, the scheduling program 342 of the central chiplet 300 can schedule specified sets of workloads to be executed in a deterministic manner within independent pipelines. For example, the scheduling program 342 can impart dependency information in the workload entries such that they are not executed until the dependency information is resolved. This dependency information can comprise other workloads that need to be executed prior to execution of that particular workload. In further examples, the workload processing chiplets can execute the workloads in the set of independent pipelines deterministically using the reservation table 350 as an out-of-order buffer (e.g., by sequentially analyzing workload entries in a workload window 355 using an instruction pointer for multimedia content).

As provided herein, the workload processing chiplets can execute the workloads in the set of independent pipelines to perform a set of tasks for operating a vehicle. In various examples, the set of tasks can comprise a plurality of image stitching tasks, sensor fusion tasks, machine learning inference tasks, object detection tasks, object classification tasks, scene understanding tasks, motion prediction tasks, and the like. These tasks can comprise inference operations to process the surrounding environment of the vehicle such that an
application program 335 can successfully operate the vehicle along a travel route. As such, the set of independent pipelines can provide output (e.g., an inferred sensor view of the surrounding environment) to the application program 335 for autonomously operating the vehicle.

At
block 710, the central chiplet 300 can include a FuSa program 338 that dynamically compares and verifies output of a plurality of independent pipelines in a FuSa pipeline in a deterministic manner, as shown in the example FuSa pipeline 420 of FIG. 4. In particular, the FuSa program 338 can dynamically compare and verify output of workloads executed by the set of workload processing chiplets 320, the workloads being executed across the set of workload processing chiplets 320 based on the sensor data. Execution of the workloads in deterministic pipelines (e.g., via the reflex program 330) can result in generating an inferred sensor view of a surrounding environment of the vehicle, which can be provided to an application program 335 for autonomously operating the vehicle. It is contemplated that each independent deterministic pipeline corresponding to the reflex program 330 can be certified (e.g., for use on public roads by a safety authority). These pipelines can include all inference operations that correspond to perception, object detection and classification, occupancy grid determination, motion prediction and/or planning, and any other scene understanding task for autonomously operating the vehicle.
FIG. 8 is a flow chart describing a method of performing backup operations in a multiple system-on-chip (MSoC) arrangement, according to various examples. Referring to FIG. 8, at block 800, a first SoC 510 can receive sensor data from a set of vehicle sensors 205. At block 805, the first SoC 510 can perform a set of autonomous driving tasks based on the sensor data. For example, the first SoC 510 can include a set of chiplets, as shown in FIG. 2, that each perform one or more autonomous driving tasks that include one or more perception, inference, object detection and/or classification, right-of-way determination, occupancy grid determination, motion prediction, motion planning, and/or vehicle control tasks for the autonomous vehicle. In certain examples, the autonomous driving tasks can comprise sensor data perception and inference tasks for autonomously operating a vehicle along a travel route. At block 810, the first SoC 510 can further publish state information in a shared memory 515 of the first SoC 510.

At
block 815, through execution of the FuSa program 338, the first SoC 510 can continuously read state information of the second SoC 520. For example, the state information of the second SoC 520 can indicate the operating parameters of the various computational components (e.g., chiplets) of the second SoC 520 in the low powered state, and can further indicate whether these components are operating within those parameters (e.g., whether the components are warmed up and ready to take over the set of autonomous driving tasks). At decision block 820, the first SoC 510 can dynamically determine whether a trigger has been detected in the state information of the second SoC 520. As provided herein, the trigger can correspond to any of the components of the second SoC 520 operating outside nominal parameters, or a fault, failure, or error experienced by the second SoC 520. If no trigger is detected, then the first SoC 510 can continue monitoring the state information of the second SoC 520. However, if at any time a trigger is detected, then at block 825, the first SoC 510 can, for example, transmit a command to the second SoC 520 to cause the second SoC 520 to perform a system reboot. As described herein, information communicated between SoC 510 and SoC 520 can be transmitted via a robust, ASIL-D rated interconnect (e.g., interconnect 540 shown in FIG. 5) using an error correction code (ECC), which provides redundancy algorithmically (e.g., through use of block codes, convolutional codes, and the like).

At
block 830, the second SoC 520 can maintain a plurality of computational components in a low power state. As described above, these components can include any one or more of the chiplets as shown and described with respect to FIG. 2. At block 835, the second SoC 520, through execution of the FuSa program 338, can continuously read the state information as published by the first SoC 510. At decision block 840, the second SoC 520 can determine whether a trigger is detected in the state information. As provided herein, the trigger can correspond to the first SoC 510 experiencing a fault or a failure, where the fault or the failure can correspond to the first SoC 510 experiencing degraded performance, such as overheating, a power surge, or an error in the first SoC 510. If no trigger is detected, then the second SoC 520 can continue to monitor the state information of the first SoC 510. However, if a trigger is detected at any time, at block 845, the second SoC 520 can power up its computational components and take over the set of autonomous driving tasks from the first SoC 510, while the first SoC 510 powers down its components and assumes the role of backup SoC.

At
block 850, based on the FuSa program 338, the second SoC 520 can continue to read the state information of the first SoC 510. At decision block 860, the second SoC 520 can determine whether the first SoC 510 is still degraded. If so, at block 865, the second SoC 520 can initiate a set of mitigative or emergency measures. In certain aspects, these measures can comprise reducing the speed of the vehicle, providing a notification to any passengers in the vehicle (e.g., to take over manual control of the vehicle), autonomously operating the vehicle to a safe location (e.g., pulling over the vehicle or driving to a home location), and/or autonomously operating the vehicle to a service center to resolve the degraded status of the first SoC 510.

In some examples, at
In some examples, at block 870, the second SoC 520 may further transmit a command to cause the first SoC 510 to perform a system reboot. At block 855, the first SoC 510 may then perform the backup SoC tasks, such as maintaining a subset of its components in a low-power state and dynamically monitoring state information as published by the primary SoC 520. If at any time the primary and secondary SoCs are unable to communicate (e.g., one of the SoCs is unable to boot up), the autonomous drive system of the vehicle will not engage. It is contemplated that this arrangement provides the redundancy necessary for an increased ASIL rating of the autonomous drive system of the vehicle (e.g., contributes to an ASIL-D rating).
In various examples, each time the MSoC arrangement reboots, the first SoC 510 and the second SoC 520 can switch between primary and backup roles to maintain substantially even wear on the MSoC components, such as the various chiplets of each SoC (see the sketch below). Furthermore, the SoCs can be electrically coupled via one or more eFuses that protect the SoCs from each other (e.g., from voltage or current surges). Along these lines, the first SoC 510 and the second SoC 520 can be powered by distinct power sources, such as the battery pack used for propulsion of the vehicle and the auxiliary power source of the vehicle used for powering the auxiliary components (e.g., ECU, lights, radio, etc.).
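A hedged sketch of this role alternation, assuming a boot counter persisted in non-volatile storage; the file path and function name are invented for the example.

```python
# Illustrative only: alternate primary/backup assignments on every reboot
# so wear is spread roughly evenly across both SoCs.
from pathlib import Path

COUNTER_FILE = Path("/var/lib/msoc/boot_count")  # hypothetical location

def roles_for_this_boot() -> dict:
    count = int(COUNTER_FILE.read_text()) if COUNTER_FILE.exists() else 0
    COUNTER_FILE.write_text(str(count + 1))
    # Even boot counts: SoC 510 is primary; odd counts: SoC 520 is primary.
    if count % 2 == 0:
        return {"SoC-510": "primary", "SoC-520": "backup"}
    return {"SoC-510": "backup", "SoC-520": "primary"}
```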
As provided herein, the state information monitoring and error management functions performed by the first and second SoCs 510, 520 can be performed by functional safety components of each SoC (e.g., the FuSa program 338 and FuSa processor). As further provided herein, for the backup SoC, the FuSa components remain powered up to perform their functional safety tasks while the remaining components are maintained in the low-power state, ready to assume the primary SoC tasks. It is contemplated that arranging the first SoC 510 and the second SoC 520 to dynamically read state information and take over the set of tasks of the primary SoC provides redundancy that facilitates an automotive safety integrity level rating for the autonomous drive computing system (e.g., achieving an ASIL-D rating).
FIG. 9 is a flow chart describing a method of monitoring communications in a high-bandwidth performance network by a FuSa program 602, according to various examples described herein. As provided herein, at block 900, the sensor data input chiplet 310, the central chiplet 300, and the one or more workload processing chiplets 320 can communicate data over a performance network comprising a plurality of network hubs. For example, at block 902, the sensor data input chiplet 310 can communicate raw sensor data to a cache memory 315, where the sensor data may be accessed by the central chiplet 300 and/or workload processing chiplets 320 to execute their respective workloads. In further examples, at block 904, the workload processing chiplets 320 and/or central chiplet 300 can transmit and receive processed sensor data (e.g., to and from the cache memory 315).

To provide increased security, certain encryption methods may be used in the communications between chiplets and/or between the primary and backup SoC. In certain examples, each sensor data item, such as an image, point cloud map, or radar pulse, can be transmitted using an encryption technique (e.g., a cipher using public-private key encryption) in which the recipient chiplet (e.g.,
the central chiplet 300 and/or a workload processing chiplet 320) decrypts the transmission to access the sensor data item. In certain aspects, the cipher can comprise a coded algorithm associated with a public key of the sensor data component at which the sensor data item originated. Each sensor data item from each sensor data component (e.g., image sensors, LIDAR sensors, radar sensors, etc.) is generated sequentially. In accordance with examples described herein, any subsequent data item from any sensor data component cannot be accessed by recipient chiplets without decryption and verification of the previous sensor data item from the respective sensor data component.
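To make the sequential-release property concrete, the sketch below binds each item to a digest of its predecessor, so item N+1 cannot be verified until item N has been decrypted and checked. An HMAC chain is used here as a simple stand-in for the public-private key cipher described above; all names are illustrative.

```python
# Hedged sketch: a per-sensor verification chain over sequential data items.
import hashlib
import hmac

def make_chained_tag(key: bytes, item: bytes, prev_digest: bytes) -> bytes:
    """Tag binds the item to its predecessor in the sensor's sequence."""
    return hmac.new(key, prev_digest + item, hashlib.sha256).digest()

def verify_chain(key: bytes, items: list) -> bool:
    prev = b"\x00" * 32                     # genesis digest for the first item
    for item, tag in items:
        if not hmac.compare_digest(tag, make_chained_tag(key, item, prev)):
            return False                    # break in the chain: reject item
        prev = hashlib.sha256(item).digest()
    return True

# Sender side: tag each item against the digest of the previous one.
key, items, prev = b"shared-fusa-key", [], b"\x00" * 32
for payload in (b"frame-0", b"frame-1", b"frame-2"):
    items.append((payload, make_chained_tag(key, payload, prev)))
    prev = hashlib.sha256(payload).digest()
assert verify_chain(key, items)
```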
For multiple SoC arrangements, the sensor data input chiplet 310 or central chiplet 300 of the primary SoC can generate a cipher for each sensor data item (e.g., each individual image, point cloud map, or radar data pulse) that identifies the data source of the sensor data item (e.g., an individual image sensor, LIDAR sensor, radar sensor, etc.). The generated cipher can be transmitted to the backup SoC to facilitate immediate takeover of the primary SoC's functions.
At block 905, for each received raw sensor data item, the sensor data input chiplet 310 or central chiplet 300 (e.g., via the FuSa program 338) can transmit the cipher associated with the raw sensor data item to the recipient chiplet(s) and/or the backup SoC for decryption and verification, to associate the sensor data item with its sensor data source. In certain examples, the cipher can be transmitted over the performance network via the network hubs 615, 635, 665, over the high-reliability network via the health control modules 620, 670 and FuSa accounting hubs 630, 640, 680, or both.

As provided herein, decryption of the cipher enables the backup SoC to readily take over if any faults, failures, or errors occur in the primary SoC. In certain examples, when such faults, failures, or errors occur (e.g., a power surge, overheating, etc.), the backup SoC can utilize a corresponding private key to decode the cipher and verify the data source of each of the sensor data items. In various aspects, each of the primary and backup SoCs has direct access to the sensor components of the vehicle. Accordingly, when the primary SoC experiences an issue, the backup SoC can provide a verification indicator to the sensor data components indicating that the previously sent sensor data items were successfully decrypted, thereby enabling access to additional sensor data items from the individual sensor data components. Thereafter, the backup SoC can assume the role of primary SoC, as described above.
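The takeover handshake just described might be structured as follows, with the private-key decryption routine and the per-sensor verification indicators passed in as stand-ins; none of these interfaces are specified by the patent.

```python
# Hedged sketch: the backup SoC verifies each buffered item's source,
# signals the sensors to release further items, then assumes the primary role.
from typing import Callable, Dict, List

def backup_takeover(
    decrypt_cipher: Callable[[bytes], str],            # stand-in for private-key decryption
    buffered_ciphers: List[bytes],                     # ciphers received from the primary SoC
    send_verification: Dict[str, Callable[[], None]],  # per-sensor verification indicators
) -> str:
    for cipher in buffered_ciphers:
        source_id = decrypt_cipher(cipher)   # recover and verify the data source
        send_verification[source_id]()       # unlock further items from that sensor
    return "primary"                         # assume the primary SoC role
```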
In various implementations, at block 910, the FuSa program 602 is executed (e.g., by one or more dedicated FuSa CPUs 600) to monitor communications through the plurality of network hubs 615, 635, 665 of the performance network between the sensor data input chiplet 310, the central chiplet 300, and the one or more workload processing chiplets 320. In such implementations, a set of FuSa accounting hubs can be connected to the network hubs of the performance network to receive communication data (e.g., a 32-bit register that indicates whether any errors are present in a particular communication) for each communication. The FuSa program 602 can receive this information from the FuSa accounting hubs connected to the network hubs 615, 635, 665 via a FuSa NoC 609 that enables the FuSa program 602 to communicate with each chiplet of the SoC over a high-reliability FuSa network. Furthermore, the chiplets can communicate workload start and end indicators over the health control modules 620, 670 and the FuSa accounting hubs 630, 640, 680, which can, for example, cause dependency information in the reservation table 350 to be updated accordingly.
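Purely for illustration, the per-communication status word mentioned above could be decoded as below; the patent states only that a 32-bit register indicates whether errors are present, so the field layout here is invented.

```python
# Hedged sketch: decoding a hypothetical 32-bit communication status word.
ERROR_MASK = 0x0000_FFFF   # low 16 bits: error flags (0 == clean)
SEQ_MASK   = 0x00FF_0000   # invented field: sequence number
SEQ_SHIFT  = 16

def communication_has_error(status_word: int) -> bool:
    return (status_word & ERROR_MASK) != 0

def sequence_number(status_word: int) -> int:
    return (status_word & SEQ_MASK) >> SEQ_SHIFT

assert not communication_has_error(0x0042_0000)   # no error flags set
assert sequence_number(0x0042_0000) == 0x42
```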
As provided herein, the central chiplet 300 can include a dedicated FuSa CPU 600 executing the FuSa program 602 to communicate over the performance network via the performance NoC 607, and over the high-reliability FuSa network via the FuSa NoC 609. For each data transmission on the performance network, at block 915, the chiplets can generate and transmit ECCs over the high-reliability FuSa network (e.g., via the health control modules 620, 670 and FuSa accounting hubs 630, 640, 680). For example, when a workload processing chiplet 320 communicates with the shared memory 360 of the central chiplet 300, the workload processing chiplet 320 can also generate and transmit a first ECC, based on the communicated data, along the FuSa network. Upon receiving the data, the central chiplet 300 can generate a second ECC based on the received data.
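A minimal sketch of the ECC exchange of block 915 and the comparison performed at block 920 (described next), using CRC-32 as a detection-only stand-in for the block or convolutional codes the patent mentions, which can also correct errors.

```python
# Hedged sketch: sender generates a first ECC over the transmitted data;
# the receiver recomputes a second ECC and the FuSa call compares the two.
import zlib

def make_ecc(data: bytes) -> int:
    return zlib.crc32(data)

def fusa_verify(data_received: bytes, first_ecc: int) -> bool:
    """Receiver-side FuSa call: recompute and compare the two ECCs."""
    second_ecc = make_ecc(data_received)
    return second_ecc == first_ecc

payload = b"processed sensor data"
assert fusa_verify(payload, make_ecc(payload))              # clean transmission
assert not fusa_verify(payload + b"!", make_ecc(payload))   # corruption detected
```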
At block 920, for each received data transmission, the central chiplet 300 can perform a FuSa call to verify that the first ECC and the second ECC match, ensuring that the data was transmitted correctly and confirming the communication. As provided herein, the ECC techniques can be performed for each communication to the central chiplet 300. In further examples, the ECC techniques may also be performed for communications between workload processing chiplets 320, between the central chiplet 300 and the workload processing chiplets 320, and/or between the sensor data input chiplet 310 and the central chiplet 300.

It is contemplated that examples described herein extend to individual elements and concepts described herein, independently of other concepts, ideas, or systems, and that examples include combinations of elements recited anywhere in this application. Although examples are described in detail herein with reference to the accompanying drawings, it is to be understood that the concepts are not limited to those precise examples. As such, many modifications and variations will be apparent to practitioners skilled in this art. Accordingly, it is intended that the scope of the concepts be defined by the following claims and their equivalents. Furthermore, it is contemplated that a particular feature described either individually or as part of an example can be combined with other individually described features, or with parts of other examples, even if the other features and examples make no mention of the particular feature.
Claims (20)
1. A computing system, comprising:
a sensor data input chiplet to obtain sensor data from a sensor system;
a set of workload processing chiplets; and
a first central chiplet comprising a shared memory including a functional safety (FuSa) program that causes one or more processors of the first central chiplet to:
dynamically compare and verify output of workloads executed by the set of workload processing chiplets, the workloads being executed across the set of workload processing chiplets based on the sensor data.
2. The computing system of claim 1, wherein the sensor data input chiplet, the central chiplet, and the one or more workload processing chiplets communicate over a performance network comprising a plurality of network hubs, and wherein the FuSa program monitors communications through the plurality of network hubs between the sensor data input chiplet, the central chiplet, and the one or more workload processing chiplets.
3. The computing system of claim 2, wherein the FuSa program monitors communications through the plurality of network hubs using a set of FuSa accounting hubs that communicate over a high-reliability FuSa network.
4. The computing system of claim 3, wherein the central chiplet comprises a dedicated FuSa CPU executing the FuSa program to (i) communicate over the performance network via a performance network-on-chip (NoC), and (ii) communicate over the high-reliability FuSa network via a FuSa NoC.
5. The computing system of claim 4, wherein the one or more workload processing chiplets transmit (i) processed data to a cache memory of the central chiplet over the performance network, and (ii) a first error correction code (ECC) along the high-reliability FuSa network to the central chiplet based on the processed data.
6. The computing system of claim 5, wherein upon receiving the processed data, the central chiplet generates a second ECC using the processed sensor data, and wherein the FuSa CPU performs a functional safety call in the central chiplet to verify that the first ECC and the second ECC match to ensure that the processed data was transmitted correctly.
7. The computing system of claim 1, wherein the computing system is included on a vehicle, and wherein the workloads comprise inference tasks based on the sensor data for autonomously operating the vehicle.
8. The computing system of claim 1, wherein the workloads are executed by the set of workload processing chiplets in independent pipelines, and wherein the FuSa program dynamically compares and verifies output of the independent pipelines by executing a set of FuSa workloads in a FuSa pipeline.
9. The computing system of claim 1, wherein the computing system comprises a first system-on-chip (SoC) that includes the first central chiplet and a second SoC that includes a second central chiplet, the first SoC and the second SoC being communicatively coupled by an interconnect, and wherein the FuSa program is further included in the second central chiplet of the second SoC.
10. The computing system of claim 9, wherein the FuSa program included in the second central chiplet of the second SoC causes one or more processors of the second SoC to:
monitor the shared memory of the first central chiplet of the first SoC to dynamically determine whether the first SoC is functioning within nominal operating parameters.
11. The computing system of claim 10, wherein the FuSa program in the second central chiplet of the second SoC further causes the one or more processors of the second SoC to:
in response to determining that the first SoC is not functioning within nominal operating parameters, cause a second set of workload processing chiplets of the second SoC to take over execution of the workloads.
12. The computing system of claim 10, wherein determining that the first SoC is not operating within nominal operating parameters corresponds to one or more of the first SoC overheating, a power surge, or an error in the first SoC.
13. The computing system of claim 11, wherein for each respective sensor data item generated by each respective sensor data component of the sensor system, the first SoC generates a cipher associated with the respective sensor data component at which the respective sensor data item originates, and transmits the cipher to the second SoC.
14. The computing system of claim 13, wherein, upon determining that the first SoC is not functioning within the nominal operating parameters, the second SoC decrypts the cipher to verify the respective sensor data item to take over execution of the workloads.
15. A non-transitory computer readable medium storing instructions that, when executed by one or more processors of a computing system, cause the computing system to:
obtain, by a sensor data input chiplet of the computing system, sensor data from a sensor system;
on a central chiplet of the computing system, execute a functional safety program to dynamically compare and verify output of workloads executed by a set of workload processing chiplets, the workloads being executed across the set of workload processing chiplets based on the sensor data.
16. The non-transitory computer readable medium of claim 15, wherein the sensor data input chiplet, the central chiplet, and the one or more workload processing chiplets communicate over a performance network comprising a plurality of network hubs, and wherein the FuSa program monitors communications through the plurality of network hubs between the sensor data input chiplet, the central chiplet, and the one or more workload processing chiplets.
17. The non-transitory computer readable medium of claim 16, wherein the FuSa program monitors communications through the plurality of network hubs using a set of FuSa accounting hubs that communicate over a high-reliability FuSa network.
18. The non-transitory computer readable medium of claim 17, wherein the central chiplet comprises a dedicated FuSa CPU executing the FuSa program to (i) communicate over the performance network via a performance network-on-chip (NoC), and (ii) communicate over the high-reliability FuSa network via a FuSa NoC.
19. The non-transitory computer readable medium of claim 18, wherein the one or more workload processing chiplets transmit (i) processed data to a cache memory of the central chiplet over the performance network, and (ii) a first error correction code (ECC) along the high-reliability FuSa network to the central chiplet based on the processed data.
20. A computer-implemented method of implementing functional safety on a computing system, the method being performed by one or more processors and comprising:
obtaining, by a sensor data input chiplet of the computing system, sensor data from a sensor system;
on a central chiplet of the computing system, executing a functional safety program to dynamically compare and verify output of workloads executed by a set of workload processing chiplets, the workloads being executed across the set of workload processing chiplets based on the sensor data.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/212,442 US20240430125A1 (en) | 2023-06-21 | 2023-06-21 | Functional safety for system-on-chip arrangements |
| PCT/EP2024/059468 WO2024260605A1 (en) | 2023-06-21 | 2024-04-08 | Functional safety for system-on-chip arrangements |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/212,442 US20240430125A1 (en) | 2023-06-21 | 2023-06-21 | Functional safety for system-on-chip arrangements |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20240430125A1 (en) | 2024-12-26 |
Family
ID=90720949
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/212,442 Pending US20240430125A1 (en) | 2023-06-21 | 2023-06-21 | Functional safety for system-on-chip arrangements |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US20240430125A1 (en) |
| WO (1) | WO2024260605A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20250068499A1 (en) * | 2021-09-28 | 2025-02-27 | Bayerische Motoren Werke Aktiengesellschaft | Method and Device for Sequence Monitoring of Multiple Threads |
| US12362911B1 (en) * | 2022-05-19 | 2025-07-15 | Ceremorphic, Inc. | PRNG-based chiplet-to-chiplet secure communication using chaining of message blocks |
Citations (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11144027B2 (en) * | 2019-06-29 | 2021-10-12 | Intel Corporation | Functional safety controls based on soft error information |
| US11269799B2 (en) * | 2019-05-03 | 2022-03-08 | Arm Limited | Cluster of processing elements having split mode and lock mode |
| US11360846B2 (en) * | 2019-09-27 | 2022-06-14 | Intel Corporation | Two die system on chip (SoC) for providing hardware fault tolerance (HFT) for a paired SoC |
| US11538287B2 (en) * | 2019-09-20 | 2022-12-27 | Sonatus, Inc. | System, method, and apparatus for managing vehicle data collection |
| US11636063B2 (en) * | 2021-08-02 | 2023-04-25 | Nvidia Corporation | Hardware accelerated anomaly detection using a min/max collector in a system on a chip |
| US20230176577A1 (en) * | 2017-11-10 | 2023-06-08 | Nvidia Corporation | Systems and methods for safe and reliable autonomous vehicles |
| US20230342161A1 (en) * | 2022-04-26 | 2023-10-26 | Motional Ad Llc | Boot process system-on-chip node configuration |
| US12136002B1 (en) * | 2024-01-24 | 2024-11-05 | Mercedes-Benz Group AG | Simultaneous multi-threaded processing for executing multiple workloads with interference prevention |
| US20240378090A1 (en) * | 2023-05-10 | 2024-11-14 | Mercedes-Benz Group AG | Out-of-order workload execution |
| US20240375670A1 (en) * | 2023-05-10 | 2024-11-14 | Mercedes-Benz Group AG | Autonomous vehicle system on chip |
| US12199838B2 (en) * | 2022-04-26 | 2025-01-14 | Motional Ad Llc | Software-defined compute nodes on multi-SoC architectures |
| US12229079B2 (en) * | 2023-05-10 | 2025-02-18 | Mercedes-Benz Group AG | Multiple system-on-chip arrangement for vehicle computing systems |
| US12276963B2 (en) * | 2022-03-31 | 2025-04-15 | Intel Corporation | Apparatus, system, and method of functional safety |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024260605A1 (en) | 2024-12-26 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US12045348B2 (en) | Methods and arrangements for multi-layer in-vehicle network intrusion detection and characterization | |
| US12377867B2 (en) | Independent safety monitoring of an automated driving system | |
| US20240375670A1 (en) | Autonomous vehicle system on chip | |
| EP3724763B1 (en) | System and method for online functional testing for error-correcting code function | |
| EP3663921B1 (en) | Workload repetition redundancy | |
| WO2024260605A1 (en) | Functional safety for system-on-chip arrangements | |
| US20200039530A1 (en) | Secure system that includes driving related systems | |
| US20190108160A1 (en) | Vehicle Control System Verification Device, Vehicle Control System, and Vehicle Control System Verification Method | |
| CN114435382A (en) | Autonomous driving vehicle operation condition monitoring | |
| JP7176488B2 (en) | Data storage device and data storage program | |
| WO2024230971A1 (en) | Multiple system-on-chip arrangement for vehicle computing systems | |
| US20210316742A1 (en) | Error handling in an autonomous vehicle | |
| CN108108262B (en) | Integrated circuit with hardware checking unit checking selected memory accesses | |
| US20230259293A1 (en) | Vehicle data storage method and vehicle data storage system | |
| US20240409106A1 (en) | System on chip automotive safety monitoring | |
| WO2025157629A1 (en) | Simultaneous multi-threaded processing for executing multiple workloads with interference prevention | |
| CN111897304A (en) | Method, apparatus and system for real-time diagnostics and fault monitoring in machine systems | |
| US20240378090A1 (en) | Out-of-order workload execution | |
| US20250042418A1 (en) | Adapting performance level of runnables based on safety ratings | |
| US20240411606A1 (en) | Autonomous vehicle system on chip mailbox architecture | |
| US20240391477A1 (en) | Workload execution in deterministic pipelines | |
| US20250291748A1 (en) | Interconnect providing freedom from interference | |
| KR20250169604A (en) | Multi-system-on-chip array for vehicle computing systems | |
| WO2025078035A1 (en) | Mechanisms for reporting data over die-to-die interfaces | |
| US20250284650A1 (en) | System on chip for freedom from interference |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: MERCEDES-BENZ GROUP AG, GERMANY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PIEDNOEL, FRANCOIS;POELLNY, OLIVER;SIGNING DATES FROM 20230622 TO 20230702;REEL/FRAME:064227/0175 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION COUNTED, NOT YET MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |