
CN120163113B - Wafer-level chip system design space construction and rapid parameter searching method - Google Patents

Wafer-level chip system design space construction and rapid parameter searching method

Info

Publication number
CN120163113B
CN120163113B (application CN202510366146.6A)
Authority
CN
China
Prior art keywords
task
graph
model
parameters
wafer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510366146.6A
Other languages
Chinese (zh)
Other versions
CN120163113A (en)
Inventor
张国和
姚鲁
王金磊
王宇木
朱思宇
张雯烁
李忠良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202510366146.6A priority Critical patent/CN120163113B/en
Publication of CN120163113A publication Critical patent/CN120163113A/en
Application granted granted Critical
Publication of CN120163113B publication Critical patent/CN120163113B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • Geometry (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention discloses a method for constructing the design space of a wafer-level chip system and rapidly searching its parameters. The method comprises task-graph definition, quantization of prefabricated die parameters, construction of a joint model based on task-graph features and die features, and a Bayesian-optimization implementation strategy. The technical scheme guides the construction and search of the design space so as to reduce the number of simulator calls, enabling construction of the wafer-level chip system architecture design space, efficient parameter search, and performance evaluation. The method can be used for agile design and hardware realization of multi-die-integrated wafer-level chip system architectures.

Description

Wafer-level chip system design space construction and rapid parameter searching method
Technical Field
The invention relates to the technical field of wafer-level chip system architecture design, and in particular to a design space construction and rapid parameter search method for wafer-level chip system architectures.
Background
With the rapid development of integrated circuit technology, wafer-level chip systems (Wafer-Scale System, WSS) have become an important direction in high-performance computing. Compared with traditional chip design, a wafer-level chip greatly improves computing power and energy efficiency by integrating the functions of many chips on the same wafer. However, this highly integrated design also introduces significant design complexity, covering software-hardware co-design, task mapping and scheduling, performance optimization, and power management.
In wafer-level chip system design, Design Space Exploration (DSE) is a critical task aimed at finding, within a vast design-parameter space, an optimal design that satisfies performance, power, and area (PPA) constraints. However, as the design scale increases, the dimensionality and complexity of the parameter space also grow significantly. Traditional design space exploration methods mostly rely on heuristic algorithms or experience-based manual tuning; although effective for small-scale designs, they often show the following defects when facing the ultra-high-dimensional design space of a wafer-level system:
These defects include: (1) low computational efficiency: the number of parameter combinations grows exponentially, so traditional methods require large amounts of computing resources to cover the design space; (2) insufficient global optimization capability: traditional methods are often trapped in local optima and lack effective exploration of the global design space; (3) difficult task mapping and resource scheduling: the mapping between system-level tasks and hardware resources is complex and changes dynamically, and traditional methods cannot effectively handle multi-task concurrency and hardware resource contention; (4) lack of multi-modal data processing capability: wafer-level chip system design involves software-hardware interaction, multi-level performance metrics, and multi-modal data (such as structured task graphs and hardware parameters), which traditional methods struggle to fuse and exploit efficiently.
In recent years, advances in machine learning and optimization algorithms have provided new solutions for design space exploration. For example, Bayesian optimization algorithms have excellent global search capability and can optimize efficiently when each evaluation is costly, while graph neural networks (Graph Neural Network, GNN) are good at processing structured data and can represent task graphs or hardware topologies. However, applying these two techniques to wafer-level chip system design still faces challenges: Bayesian optimization may converge slowly in a high-dimensional parameter space, the design and computational efficiency of its acquisition function need to be improved, and how to combine it with the specific requirements of wafer-level chip system design remains an open problem. Integrating the two requires an efficient framework that also supports interpretation and verification of the design results.
Therefore, an intelligent design space exploration method combining Bayesian optimization and graph neural networks is needed, one that can effectively process complex multi-modal input data, provide global optimization capability, and offer both computational efficiency and interpretability, so as to meet the requirements of wafer-level chip system design.
Disclosure of Invention
The invention aims to provide a design space construction and rapid parameter search method for wafer-level chip system architectures. The method combines a Bayesian optimization algorithm with a joint model and iterates through initialization, feature extraction, solution-space generation, Bayesian optimization, and model update. It is highly efficient, can significantly improve the performance of wafer-level chip system designs, and is particularly suitable for highly complex system design and multi-task scheduling optimization.
The invention is realized by adopting the following technical scheme:
The design space construction and parameter search method mainly comprises the following five stages:
1. Initialization phase
At this stage, the computing workload is represented by a task graph. Each node in the task graph represents a computing task, and edges between nodes represent the dependencies between tasks. At the same time, the parameters of the prefabricated dies are quantized into dynamic parameters and static parameters.
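The following is a minimal sketch of this initialization step, assuming a networkx-based task graph; the attribute names (latency_cycles, mem_mb, data_kb, and so on) are illustrative placeholders, not a schema prescribed by the invention.

import networkx as nx

def build_task_graph():
    """Represent the computing tasks as a directed acyclic graph (DAG)."""
    g = nx.DiGraph()
    # Each node is a computing task annotated with its resource demand.
    g.add_node("t0", latency_cycles=1200, mem_mb=64, data_out_kb=256)
    g.add_node("t1", latency_cycles=800, mem_mb=32, data_out_kb=128)
    g.add_node("t2", latency_cycles=1500, mem_mb=96, data_out_kb=512)
    # Each edge is a data-flow dependency with an (assumed) priority attribute.
    g.add_edge("t0", "t1", data_kb=256, priority=1)
    g.add_edge("t0", "t2", data_kb=256, priority=2)
    assert nx.is_directed_acyclic_graph(g)
    return g

# Die parameters are quantized into a static part and a dynamic part.
static_params = {"process_nm": 7, "num_cores": 16, "l2_cache_kb": 2048}
dynamic_params = {"avg_cpi": 1.4, "mem_bw_util": 0.62, "power_w": 35.0}  # from a simulator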
2. Feature extraction stage
Features of the task graph are extracted with a graph convolutional network. Each task node carries node features that mainly reflect the task's demand for computing resources, including but not limited to computational latency, memory requirement, data-transfer requirement, and computational load. Edges between tasks carry edge features that mainly describe the data-flow dependencies and priorities between tasks. These features play a vital role in task scheduling because they affect the order and dependencies of task execution.
The hardware characteristics of a die fall into static parameters and dynamic parameters. Static parameters include the process node, number of cores, cache hierarchy, and so on; they are determined by the physical design of the chip and do not vary with the task. Dynamic parameters reflect the influence of the task load on hardware performance, including instruction cycle count, memory-bandwidth occupancy, and power-consumption curve, and are usually obtained with simulation tools. The die parameters are feature-extracted with a Transformer network.
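The PyTorch sketch below illustrates this stage under stated assumptions: a single hand-written graph-convolution layer stands in for the GCN, a stock nn.TransformerEncoder stands in for the Transformer branch, and all tensor dimensions and layer counts are illustrative.

import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        a_hat = adj + torch.eye(adj.size(0))      # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        return torch.relu(d_inv_sqrt @ a_hat @ d_inv_sqrt @ self.lin(h))

# Task-graph node features (num_tasks x feat_dim) and adjacency matrix.
node_feats = torch.randn(8, 16)
adj = torch.zeros(8, 8)
adj[0, 1] = adj[0, 2] = 1.0
task_embed = SimpleGCNLayer(16, 32)(node_feats, adj)          # (8, 32)

# Die parameters (static + dynamic, one token per die) go through a Transformer encoder.
enc_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
hw_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
die_tokens = torch.randn(1, 12, 32)                            # (batch, num_dies, param_dim)
hw_embed = hw_encoder(die_tokens)                              # (1, 12, 32)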
3. Solution space generation stage
The main task of this stage is to feed the task-graph features and the die parameter features into a joint model that generates a solution space. The joint model uses a dynamic graph neural network (D-GNN) for feature fusion; it extracts features of the nodes and edges of the graph structure and captures the dependencies between tasks. Through graph convolution, the D-GNN effectively extracts the topology of the task graph, providing a basis for subsequent task partitioning and die selection.
The joint model also employs a cross-modal attention mechanism to handle the heterogeneity between task features and hardware features. The attention mechanism automatically identifies which task features are strongly correlated with which hardware features, aligning the two feature spaces and further improving the expressive power of the model.
The resulting solution space contains two key elements: the task-partitioning scheme and the die-selection sequence. The task-partitioning scheme determines how each task is distributed across the different dies, and the die-selection sequence determines which dies are actually used to execute the tasks.
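A minimal sketch of the cross-modal attention step follows, assuming the task embeddings act as queries over the die embeddings; the head count and dimensions are illustrative, and the decoder that maps the fused features to a concrete solution is omitted.

import torch
import torch.nn as nn

embed_dim = 32
cross_attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

task_embed = torch.randn(1, 8, embed_dim)    # (batch, num_tasks, dim) from the GCN branch
hw_embed = torch.randn(1, 12, embed_dim)     # (batch, num_dies, dim) from the Transformer branch

# Queries come from tasks, keys/values from hardware: each task gathers the
# hardware features most relevant to it, aligning the two feature spaces.
fused, attn_weights = cross_attn(query=task_embed, key=hw_embed, value=hw_embed)

# A decoder head (not shown) would map `fused` to a task-partitioning scheme
# and a die-selection sequence, i.e. the elements of the solution space.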
4. Bayesian optimization stage
The goal of this stage is to rapidly search the generated solution space for the optimal design with a Bayesian optimization algorithm. Bayesian optimization first constructs a Gaussian Process (GP) as a proxy model that predicts the performance of different designs. The inputs to the proxy model are a task-partitioning scheme and a die-selection sequence; the outputs are the corresponding performance metrics such as power consumption, latency, and area.
The Gaussian process uses a kernel function to model uncertainty across the design space, and the hyperparameters of the proxy model are updated by methods such as maximum marginal likelihood estimation.
Bayesian optimization uses methods such as Expected Improvement (EI) as the acquisition function; EI guides the search direction according to the uncertainty of the current model and selects the evaluation points most likely to yield a performance improvement. By calling the simulator a limited number of times, Bayesian optimization gradually converges to the design with the best performance.
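A minimal sketch of this stage is given below, assuming each design (task partition plus die selection) has already been encoded as a flat numeric vector; it uses scikit-learn's Gaussian process and a hand-written Expected Improvement function.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(gp, X_cand, y_best, xi=0.01):
    """EI for minimization: balances predicted improvement against model uncertainty."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Encoded designs already evaluated by the simulator, with their measured cost
# (e.g., latency); random numbers are used here as stand-ins.
X_train = np.random.rand(10, 6)
y_train = np.random.rand(10)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_train, y_train)     # kernel hyperparameters fit by maximizing the marginal likelihood

X_cand = np.random.rand(200, 6)   # candidate designs drawn from the joint model's solution space
next_point = X_cand[np.argmax(expected_improvement(gp, X_cand, y_train.min()))]
# `next_point` is the design handed to the simulator in the next iteration.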
5. Model update phase
After each optimization step, a loss function is computed from the simulation result, and the weights of the graph neural networks in the joint model are updated by back-propagation. The loss function consists of two parts: the cross-entropy loss of the task-partitioning scheme and the die-selection sequence, which ensures the rationality of task partitioning and hardware selection, and the mean squared error (MSE) against the simulation result, which measures the gap between the predicted and the actual performance metrics.
The hyperparameters of the proxy model are updated by methods such as marginal-likelihood gradient descent (MLE) to improve the prediction accuracy of the model. In each iteration, the joint model and the proxy model are updated according to the new simulation results, and the design is gradually optimized until the preset convergence condition is met.
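A minimal sketch of the combined loss is shown below; the tensor shapes, target encodings, and the weighting factor alpha are illustrative assumptions rather than values specified by the invention.

import torch
import torch.nn.functional as F

def joint_loss(partition_logits, partition_target,
               select_logits, select_target,
               perf_pred, perf_sim, alpha=0.5):
    # Cross-entropy keeps the predicted task partition and die selection consistent
    # with the reference decisions validated by the simulator.
    ce = F.cross_entropy(partition_logits, partition_target) \
         + F.cross_entropy(select_logits, select_target)
    # MSE penalizes the gap between predicted and simulated performance metrics.
    mse = F.mse_loss(perf_pred, perf_sim)
    return ce + alpha * mse

# Typical use: loss = joint_loss(...); loss.backward(); optimizer.step()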
Compared with the prior art, the invention discloses a design space construction and rapid parameter search method for wafer-level chip system architectures with the following effects: it reveals the temporal and spatial statistical characteristics of application programs running on a wafer-level chip system, establishes a design theory and methodology for domain-specific software-hardware co-computing wafer-level chip system architectures, forms the basic theory and methods of a wafer-level chip software development environment, and provides theoretical support and architectural guidance for the design and implementation of domain-specific wafer-level chips.
Drawings
FIG. 1 is a schematic diagram of the initialization, feature extraction, and solution-space generation process of the present invention.
FIG. 2 is a schematic flow chart of the Bayesian optimization in the present invention.
FIG. 3 is a flow chart of the implementation of the design space construction and rapid parameter search method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the input computing task and prefabricated dies are initialized: the computing task is expressed as a directed acyclic graph and the quantized die parameters are obtained; the features of the task graph and of the die parameters are extracted with a graph convolutional network and a Transformer, respectively; the features are fed into an encoder network with an attention layer; and the solution space is finally output through a decoder network.
As shown in FIG. 2, a Gaussian model is first pre-trained by running the simulator on a small number of random solutions. Solutions output by the joint model are then given to the Gaussian model, which outputs evaluation values; the next evaluation point is selected according to the distribution of performance over the solution space and evaluated with the simulator. If the performance meets the requirement, the optimal solution is output; otherwise, the parameters of the joint model and the Gaussian model are updated according to the simulator's evaluation result.
As shown in FIG. 3, the implementation flow of the design space construction and rapid parameter search method of the present invention comprises an initialization stage, a feature extraction stage, a solution-space generation stage, a Bayesian optimization stage, and a model update stage. The specific steps of each stage are as follows.
1. An initialization stage:
Task graph construction: first, the computing tasks are represented as a graph in which each node represents a computing task and the dependencies between tasks are represented by edges between nodes. The structure of the task graph clearly shows the execution order and dependencies of the tasks.
Quantization of the prefabricated die parameters (a minimal sketch follows this list):
Static parameters: fixed by the hardware design, such as the process node, the number of cores per chip, and the cache hierarchy; they do not change during computation.
Dynamic parameters: the relationship between hardware performance and load during task execution, such as instruction cycle count, memory-bandwidth occupancy, and power consumption. These parameters are typically obtained with simulation tools and reflect the impact of the task on the hardware.
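A minimal sketch of die-parameter quantization follows, assuming a simple normalization into one feature vector; the field names and scale factors are illustrative, not values fixed by the invention.

from dataclasses import dataclass
import numpy as np

@dataclass
class DieParams:
    # Static parameters: fixed by the physical design, independent of the task.
    process_nm: int
    num_cores: int
    cache_levels: int
    # Dynamic parameters: measured by the simulator under a given task load.
    avg_cpi: float
    mem_bw_util: float
    power_w: float

    def to_vector(self, scales=(16.0, 64.0, 4.0, 4.0, 1.0, 100.0)):
        """Normalize the heterogeneous fields into one feature vector for the encoder."""
        raw = np.array([self.process_nm, self.num_cores, self.cache_levels,
                        self.avg_cpi, self.mem_bw_util, self.power_w], dtype=float)
        return raw / np.array(scales)

die = DieParams(process_nm=7, num_cores=16, cache_levels=3,
                avg_cpi=1.4, mem_bw_util=0.62, power_w=35.0)
feature_vec = die.to_vector()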
2. Feature extraction:
Task-graph feature extraction: the task graph is processed with a graph convolutional network (GCN). Each task node carries features reflecting the task's computational requirements, such as computational latency, memory requirement, and data-transfer requirement; these features reflect each task's demand for computing resources.
Edges in the task graph connect different tasks, and the edge features mainly describe the data-flow dependencies and priorities between tasks. Edge features are particularly important in task scheduling because they affect the order and dependencies of task execution.
Die feature extraction: static and dynamic die parameters are processed with a Transformer network to extract hardware features, such as the chip's architectural characteristics and the influence of load on hardware performance. The key at this stage is to obtain accurate information on how the hardware resources affect task execution.
3. Solution-space generation stage:
Joint-model input: the task features from the task graph and the die parameter features are fed into the joint model, which fuses the task features with the hardware features to generate a solution space.
Feature fusion with the D-GNN: the topology of the task graph is extracted and the dependencies between tasks are captured. Graph neural networks (GNN) have excellent graph-data processing capability and can effectively mine the complex dependency structures among tasks.
Cross-modal attention mechanism: the heterogeneity between task features and hardware features is handled by a cross-modal attention mechanism, which automatically identifies which task features are strongly correlated with which hardware features and aligns them to improve the expressive power of the model.
Solution-space generation: the solution space comprises two key elements (see the sketch after this list):
Task-partitioning scheme: decides how tasks are partitioned and distributed to different hardware units (dies) for execution.
Die-selection sequence: determines which dies are actually used to execute the respective tasks.
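The greedy decoding below is an illustrative stand-in for the joint model's decoder (the claims describe a reinforcement-learning strategy with a gated recurrent unit); it only shows how fused embeddings could be mapped to the two solution-space elements.

import torch

def decode_solution(task_embed, die_embed):
    """Return a (task -> die) assignment and the ordered list of dies actually used."""
    scores = task_embed @ die_embed.T          # compatibility of every task with every die
    assignment = scores.argmax(dim=1)          # task-partitioning scheme
    # Die-selection sequence: dies in the order they are first assigned a task.
    seen, sequence = set(), []
    for die_idx in assignment.tolist():
        if die_idx not in seen:
            seen.add(die_idx)
            sequence.append(die_idx)
    return assignment, sequence

assignment, die_sequence = decode_solution(torch.randn(8, 32), torch.randn(12, 32))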
4. Bayesian optimization:
Proxy-model construction: the core of Bayesian optimization is a proxy model, typically a Gaussian Process (GP) model, used to predict the performance (e.g., power consumption, latency) of different design schemes. The proxy model is fitted from simulation or experimental data.
Gaussian-process modeling: by selecting an appropriate kernel function to model the uncertainty of the design space, the performance metrics can be predicted from the task-partitioning scheme and the die-selection sequence.
Acquisition function (EI): Bayesian optimization uses Expected Improvement (EI) to automatically select the scheme most likely to improve performance for the next evaluation.
Performance evaluation: Bayesian optimization guides the search direction according to the uncertainty of the current model and gradually converges to the optimal design by repeatedly evaluating designs with the simulator.
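A minimal sketch of the outer search loop, assuming that simulate and propose_candidates callables are provided by the surrounding framework and that expected_improvement is defined as in the earlier sketch; the call budget and convergence test are illustrative.

import numpy as np

def optimize(gp, simulate, propose_candidates, X_init, y_init,
             max_calls=50, target_cost=None):
    X, y = list(X_init), list(y_init)
    for _ in range(max_calls):
        gp.fit(np.array(X), np.array(y))         # refit the proxy on all evaluations so far
        cands = propose_candidates()              # solution space produced by the joint model
        ei = expected_improvement(gp, cands, min(y))
        x_next = cands[np.argmax(ei)]             # point most likely to improve performance
        cost = simulate(x_next)                   # expensive simulator call
        X.append(x_next)
        y.append(cost)
        if target_cost is not None and cost <= target_cost:
            break                                 # performance requirement met
    best = int(np.argmin(y))
    return X[best], y[best]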
5. Model updating stage:
Loss computation: after each optimization step, the actual performance of the task-partitioning and die-selection scheme is measured with the simulator. The simulation result is compared with the model's prediction and a loss function is computed.
The loss function consists of two parts:
Cross-entropy loss: optimizes the rationality of the task-partitioning scheme and the die-selection sequence, ensuring that task allocation and hardware selection meet the actual requirements.
Mean squared error (MSE): measures the gap between the performance metrics predicted by the model (e.g., power consumption, latency) and the simulation results.
Back-propagation update: according to the loss function, the weights of the graph neural networks in the joint model are updated with the back-propagation algorithm, gradually refining the model.
Proxy-model update: the Gaussian-process model used in the Bayesian optimization also needs to be updated. The hyperparameters of the proxy model are adjusted by marginal-likelihood gradient descent (MLE) to improve the accuracy of the performance prediction.
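A minimal sketch of the proxy-model update with scikit-learn, which refits the kernel hyperparameters by maximizing the log-marginal likelihood whenever new simulator data arrive; the WhiteKernel noise term and the restart count are illustrative choices.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

kernel = Matern(nu=2.5) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5, normalize_y=True)

X_all = np.random.rand(20, 6)     # all encoded designs evaluated so far (stand-in data)
y_all = np.random.rand(20)        # corresponding simulator-reported costs

gp.fit(X_all, y_all)              # gradient-based maximization of the marginal likelihood
print(gp.kernel_)                 # updated length-scale and noise hyperparameters
print(gp.log_marginal_likelihood_value_)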

Claims (9)

1. A method for constructing a wafer-level chip system design space and rapidly searching its parameters, the method comprising a Bayesian optimization algorithm and a joint model, characterized in that:
(1) an initialization stage: a task graph (Task Graph) is a directed graph composed of a set of nodes and edges, wherein each node represents a computing task and the edges represent the dependencies between tasks; the parameters of the selectable prefabricated dies are quantized, including static parameters and dynamic parameters;
(2) a feature extraction stage: graph-structure analysis is performed on the task graph to extract node features and edge features, wherein the node features comprise the tasks' computing-resource requirements and the edge features comprise the inter-task data-flow dependencies and priorities;
(3) a solution-space generation stage: the task-graph features and the die features are fed into the joint model, feature fusion is performed through a dynamic graph neural network and a cross-modal attention mechanism, and a solution space composed of a task-partitioning scheme and a die-selection sequence is output;
(4) a Bayesian optimization stage: a Gaussian-process proxy model is constructed on the solution space, evaluation points are selected with an Expected Improvement acquisition function, and a simulator is called to obtain the system performance metrics;
(5) a model update stage: a loss function is computed from the simulation result, the graph-neural-network weights of the joint model are updated by back-propagation, the hyperparameters of the proxy model are updated, and steps (2)-(4) are executed iteratively until a preset convergence condition is reached.
2. The method of claim 1, wherein the quantization of the prefabricated die parameters in step (1) includes static parameters, comprising process node, number of cores, and cache hierarchy, and dynamic parameters based on task-load simulation, comprising instruction cycle count, memory-bandwidth occupancy, and power-consumption curve.
3. The method of claim 1, wherein the task-graph feature extraction in step (2) uses a deep-learning model or a conventional machine-learning algorithm to extract the node features and edge features.
4. The method of claim 1, wherein the joint model in step (3) employs an encoder-decoder architecture.
5. The method of claim 4, wherein the encoder side uses a graph convolutional network to extract high-dimensional features and achieves cross-modal alignment of the task features with the die parameters through a multi-head attention layer.
6. The method of claim 4, wherein the decoder side uses a reinforcement-learning strategy to generate the task-partitioning scheme and a gated recurrent unit to output the die-selection sequence.
7. The method of claim 1, wherein the Bayesian optimization in step (4) uses a Gaussian-process model as the proxy model, wherein the hyperparameters of the covariance function and the noise variance are updated by maximum marginal likelihood estimation, and an Expected Improvement (EI) function is used as the acquisition function to guide the Bayesian optimization.
8. The method of claim 1, wherein the model update rule in step (5) is that the loss function of the joint model comprises the cross-entropy loss of the task-partitioning scheme and the die-selection sequence together with the mean squared error of the simulation metrics, and the proxy-model hyperparameters are updated by marginal-likelihood gradient descent.
9. The method of claim 1, wherein the wafer-level chip system architecture design-space exploration selects and matches prefabricated dies for different task mappings in complex application scenarios.
CN202510366146.6A 2025-03-26 2025-03-26 Wafer-level chip system design space construction and rapid parameter searching method Active CN120163113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510366146.6A CN120163113B (en) 2025-03-26 2025-03-26 Wafer-level chip system design space construction and rapid parameter searching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510366146.6A CN120163113B (en) 2025-03-26 2025-03-26 Wafer-level chip system design space construction and rapid parameter searching method

Publications (2)

Publication Number Publication Date
CN120163113A CN120163113A (en) 2025-06-17
CN120163113B true CN120163113B (en) 2025-11-11

Family

ID=96008779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510366146.6A Active CN120163113B (en) 2025-03-26 2025-03-26 Wafer-level chip system design space construction and rapid parameter searching method

Country Status (1)

Country Link
CN (1) CN120163113B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120353684B (en) * 2025-06-24 2025-10-03 北京尊冠科技有限公司 A method and system for evaluating computer hardware performance based on simulation model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115043A (en) * 2022-06-20 2022-09-27 上海交通大学 Method and system for designing hardware architecture of on-chip-to-chip interconnection neural network chip
CN117057305A (en) * 2023-07-03 2023-11-14 中国人民解放军战略支援部队信息工程大学 Wafer-level chip system architecture design method and device based on software and hardware cooperation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium
US20240289607A1 (en) * 2023-02-27 2024-08-29 International Business Machines Corporation Co-design of a model and chip for deep learning background
CN118821709B (en) * 2024-09-20 2025-03-18 河南嵩山实验室产业研究院有限公司洛阳分公司 Heuristic-based domain-specific wafer-level chip design optimization method, system and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115043A (en) * 2022-06-20 2022-09-27 上海交通大学 Method and system for designing hardware architecture of on-chip-to-chip interconnection neural network chip
CN117057305A (en) * 2023-07-03 2023-11-14 中国人民解放军战略支援部队信息工程大学 Wafer-level chip system architecture design method and device based on software and hardware cooperation

Also Published As

Publication number Publication date
CN120163113A (en) 2025-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant