CN106610816A - Avoidance method for conflict between instruction sets in RISC-CPU and avoidance system thereof

Info

Publication number: CN106610816A
Application number: CN201611246947.6A
Authority: CN
Inventors: 孙建辉; 王春兴; 王公堂; 李登旺
Original assignee: Shandong Normal University
Current assignee: Shandong Normal University
Priority date: 2016-12-29
Filing date: 2016-12-29
Publication date: 2017-05-03
Anticipated expiration: 2036-12-29
Also published as: CN106610816B

Abstract

The invention discloses a method and system for avoiding conflicts between instruction sets in a RISC-CPU. The method comprises the following steps. Step 1: determine the conflict type according to the data dependencies between different instruction sets in the RISC-CPU. Step 2: for the current instruction, judge whether it needs to access the register file or the memory; if so, proceed to Step 3, otherwise continue with the next instruction. Step 3: define the "coherent window" for the current instruction, search the window for "coherent instructions" and judge whether any exist; if so, proceed to Step 4. Step 4: select a conflict resolution method according to the conflict type between the instruction sets, and use a specific strategy to resolve the data conflict while preserving pipeline throughput.

Description

Method and system for avoiding conflicts between instruction sets in a RISC-CPU

Technical field

The invention relates to the technical field of computer processing, and in particular to a method and system for avoiding conflicts between instruction sets in a RISC-CPU.

Background

The design of an embedded RISC-CPU, from the selection of the hardware instruction set and the design of custom instructions through to the design of the corresponding compiler, is very important.

The related art is as follows. Invention patent application CN200810191060, "Reducing instruction conflicts in a processor", filed by 世意法(北京)半导体研发有限责任公司, resolves instruction conflicts at the instruction issue stage: two candidate instructions are selected and sent to the subsequent parallel functional units, and whichever instruction is found to be conflict-free is chosen by arbitration. The method is used to select a conflict-free instruction from a multi-issue instruction window.

Intel Corporation's invention patent application CN200710087737, "Technique for performing memory disambiguation", gives a solution for avoiding conflicts between adjacent memory access instructions, but it does not address how to define a coherent window within or between the other instruction sets, nor how to avoid conflicts between coherent instructions.

Xidian University's invention patent application CN201310054280, "An assembler design method based on a very long instruction word application-specific instruction set processor", performs instruction scheduling in the assembler and uses register renaming and similar methods to remove the write-after-write (WAW) and write-after-read (WAR) conflicts caused by out-of-order execution.

Intel Corporation's application CN200710087737, "Technique for performing memory disambiguation", addresses conflict avoidance only between adjacent memory access instructions. It does not define a coherent window within or between the other instruction sets, and it does not address conflict avoidance between coherent instructions. It also differs from the present patent in the number of pipeline stages of the embedded RISC-CPU concerned, and it does not mention the CPU hardware engineering design strategy proposed here, which proceeds from a prototype CPU based on an instruction subset to a CPU with the full instruction set.

It can be seen that none of the prior art gives a method for resolving RAW coherent conflicts from the perspective of the instruction system as a whole, nor does it propose an engineering strategy for CPU hardware design that proceeds from a CPU prototype with a minimal local instruction set to the full instruction set.

Summary of the invention

To overcome the deficiencies of the prior art, the invention discloses a method and system for avoiding conflicts between instruction sets in a RISC-CPU, provides a mechanism for resolving conflicts between instruction sets, and provides an intelligent control hardware device with which a RISC-CPU resolves read-write conflicts.

To achieve the above object, the specific scheme of the invention is as follows:

A method for avoiding conflicts between instruction sets in a RISC-CPU, comprising the following steps:

Step 1: determine the conflict type according to the data dependencies between different instruction sets in the RISC-CPU;

Step 2: for the current instruction, judge whether it needs to access the register file or the memory; if so, proceed to Step 3; otherwise, continue with the next instruction;

Step 3: define the "coherent window" for the current instruction, search the coherent window for "coherent instructions" and judge whether any exist; if so, proceed to Step 4; otherwise, conclude that there is no read-write conflict;

Step 4: select a conflict resolution method according to the conflict type between the instruction sets, and use a specific strategy to resolve the data conflict while preserving pipeline throughput.

Further, in Step 1, the architecture, instruction set and pipeline stages of the RISC-CPU are statistically analyzed and classified, and the full instruction set is divided into a data processing instruction set, a memory access instruction set, a branch instruction set and an immediate instruction set.
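
The four-way classification can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the names `InstrClass`, `CLASS_OF`, `classify` and `group_by_class` are hypothetical, and the mnemonics BEQ and ADDI are placeholders that do not appear in the source.

```python
from enum import Enum


class InstrClass(Enum):
    DATA_PROCESSING = "data-processing"
    MEMORY_ACCESS = "memory-access"
    BRANCH = "branch"
    IMMEDIATE = "immediate"


# Hypothetical mnemonic-to-class mapping. ADD, XOR, LDR, STR and JMP are
# named in the source; BEQ and ADDI are illustrative placeholders only.
CLASS_OF = {
    "ADD": InstrClass.DATA_PROCESSING,
    "XOR": InstrClass.DATA_PROCESSING,
    "LDR": InstrClass.MEMORY_ACCESS,
    "STR": InstrClass.MEMORY_ACCESS,
    "JMP": InstrClass.BRANCH,
    "BEQ": InstrClass.BRANCH,
    "ADDI": InstrClass.IMMEDIATE,
}


def classify(mnemonic):
    """Return the instruction class of a mnemonic (case-insensitive)."""
    return CLASS_OF[mnemonic.upper()]


def group_by_class(mnemonics):
    """Partition a list of mnemonics into the four instruction classes."""
    groups = {cls: [] for cls in InstrClass}
    for m in mnemonics:
        groups[classify(m)].append(m)
    return groups
```

For instance, `group_by_class(["ADD", "LDR", "XOR"])` places ADD and XOR in the data processing group and LDR in the memory access group.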

Further, in Step 1, once the four instruction sets have been obtained, the conflicts between them are statistically analyzed: the types of read-write conflict between the instruction sets are enumerated and classified, which yields the conflicts introduced by data dependencies between instructions. First, a later coherent instruction may read stale register data or overwrite newly written data that has not yet been read; that is, the data dependencies of the original instruction order are violated and a functional error results. Second, the original instruction pipeline may be interrupted, including the loss of instruction throughput caused by freezing the pipeline or inserting NOP instructions into it.

Further, the conflict types are classified as those within and between the data processing, memory access and immediate instruction sets.

Further, in Step 3, the "coherent window" is defined as follows. For superscalar instruction processing, out-of-order execution need not be considered. If the current instruction operates on a specific register in the register file or on a memory location (for example, it writes a register), its coherent window is the sequence of instructions that follow it in the instruction stream, from the point at which the current instruction is fetched until its result is committed.
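
As a rough sketch of this definition, the window can be computed from the number of pipeline stages the current instruction occupies. The function name `coherent_window` and the length formula are assumptions made for illustration: in an in-order pipeline that fetches `issue_width` instructions per cycle, an instruction occupying `stages_used` stages commits `stages_used - 1` cycles after it is fetched, so roughly `(stages_used - 1) * issue_width` later instructions enter the pipeline in the meantime. The source does not give a formula.

```python
def coherent_window(program, index, stages_used, issue_width=1):
    """Return the instructions that follow program[index] and are fetched
    before it commits.

    Assumed model: in-order issue, `issue_width` instructions fetched per
    cycle, and the current instruction commits `stages_used - 1` cycles
    after its own fetch.
    """
    length = (stages_used - 1) * issue_width
    return program[index + 1 : index + 1 + length]
```

With `program = ["ADD", "XOR", "STR", "LDR", "ADD", "XOR"]`, an ADD at index 0 that occupies four stages has the window `["XOR", "STR", "LDR"]` under this model.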

Further, in Step 3, a "coherent instruction" is defined as follows. Based on the coherent window already defined, and ignoring out-of-order execution, if the current instruction writes a register or a memory offset address, then its coherent instructions are the one or more instructions, found by searching downward from the current instruction within the coherent window, that also write or read the location written by the current instruction. Instructions with such data dependencies may have read-write conflicts, which interrupt the pipeline and reduce instruction throughput.
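
The search for coherent instructions might look like the sketch below. The `Instr` record and the register names are hypothetical; the only rule taken from the text is that a later instruction in the window is coherent if it reads or writes the location written by the current instruction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str]          # register or memory location written, if any
    srcs: Tuple[str, ...] = ()   # registers or memory locations read


def coherent_instructions(current, window):
    """Return the instructions in `window` that read or write the location
    written by `current`."""
    if current.dest is None:
        return []
    return [i for i in window
            if current.dest in i.srcs or i.dest == current.dest]
```

For an ADD that writes R1, followed by an XOR reading R1, a STR storing R1 and an unrelated LDR, the function returns the XOR and the STR, which matches the ADD example the patent gives for Figure 2.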

Further, within the coherent window, the conflict resolution methods include: (1) feeding the result computed by the current instruction before its commit stage directly into the execution unit of the following coherent instruction, instead of waiting until the current instruction has committed and only then feeding the final result to the later instruction's execution unit; (2) producing a delayed version of the operation result of the coherent instruction that follows the current instruction, and using the output of the delayed version as a backup register resource for resolving the read-write conflict.

Further, in Step 4, the conflict resolution methods specifically include:

(1) early feed-forward of intermediate results before the final result is committed, that is, the intermediate result is fed directly to the input of the next instruction without waiting for the final commit stage;

(2) feed-forward of a delayed version of the data before the final result is committed;

(3) coexistence of a pipeline-delayed version of the intermediate result with the original version: in the pipeline stage of the next coherent instruction, an additional delayed-version register is added, delaying by one or several beats, while the register version in the original normal pipeline stage is retained;

(4) the original pipeline is not frozen;

(5) NOP instructions are not inserted.

Further, specifically, RAW conflict resolution between the ADD and XOR instructions: the result of the ADD instruction is fed forward early as an input to XOR, which need not wait for ADD to commit before executing;

RAW conflict resolution between the STORE and LOAD instructions: the result of STORE serves as the address calculation input of LOAD, by adding a delay register in front of the LOAD instruction;

RAW conflict resolution between the ADD and STORE instructions: the result of the ADD instruction is fed directly to the next stage as the offset address used to compute the address of the STORE instruction;

RAW conflict resolution between the LOAD and ADD instructions: the result of the LOAD instruction is used directly as the input of ADD, and a version of ADD's intermediate result delayed by one beat is used as the offset address for the DMEM access.
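
The four cases above can be collected into a small lookup keyed by the producer and consumer instructions. This is only an illustrative sketch: the names `Strategy`, `RAW_STRATEGIES` and `select_strategy` are hypothetical, and the mapping of each case to "early feed-forward" or "delayed version" is a reading of the descriptions above.

```python
from enum import Enum


class Strategy(Enum):
    EARLY_FORWARD = "feed the pre-commit result forward"
    DELAYED_VERSION = "use a delayed copy held in an extra register"


# (producer, consumer) -> strategies, transcribed from the four RAW cases.
RAW_STRATEGIES = {
    ("ADD", "XOR"): (Strategy.EARLY_FORWARD,),
    ("STORE", "LOAD"): (Strategy.DELAYED_VERSION,),
    ("ADD", "STORE"): (Strategy.EARLY_FORWARD,),
    ("LOAD", "ADD"): (Strategy.EARLY_FORWARD, Strategy.DELAYED_VERSION),
}


def select_strategy(producer, consumer):
    """Return the strategies for a RAW pair, or None if the pair is not listed."""
    return RAW_STRATEGIES.get((producer.upper(), consumer.upper()))
```

A pair that is not in the table returns `None`, which a caller could treat as "no pre-analyzed resolution available".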

Further, in Step 1, after all possible read-write conflicts have been classified, they are stored in a local table, the conflict lookup table, whose entries hold an index and a conflict type. Using superscalar techniques, multiple instructions are fetched from the instruction memory and stored in an instruction buffer window, so that the coherent window can be defined and the coherent instructions found ahead of time. Read-write conflicts in the instruction stream are located through the definitions of the coherent window and the coherent instruction; if a coherent conflict exists between instructions, the locally pre-stored conflict types between instruction sets are searched so that the conflict can be avoided quickly and in advance.
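
One possible shape for the conflict lookup table is sketched below. The source specifies only that each entry has an index and a conflict type; the choice of enumerating every (producer class, consumer class, hazard) combination, and the names `build_conflict_table` and `lookup_index`, are assumptions for illustration. The branch class is left out because the classification of conflict types above covers only the data processing, memory access and immediate sets.

```python
from itertools import product

CLASSES = ("data-processing", "memory-access", "immediate")
HAZARDS = ("RAW", "WAR", "WAW")


def build_conflict_table():
    """Enumerate every (producer class, consumer class, hazard) combination
    and give each one an integer index."""
    return dict(enumerate(product(CLASSES, CLASSES, HAZARDS)))


def lookup_index(table, producer, consumer, hazard):
    """Return the index of a conflict type, or None if it is not in the table."""
    for index, entry in table.items():
        if entry == (producer, consumer, hazard):
            return index
    return None
```

In hardware such a table would more likely be a small ROM addressed by the encoded classes; the dictionary here only shows the index-to-type relationship.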

A system for avoiding conflicts between instruction sets in a RISC-CPU comprises a conflict resolution control unit applied in the RISC-CPU. The conflict resolution control unit is used to: classify all possible read-write conflicts and store them in a local table, the conflict lookup table, whose entries hold an index and a conflict type; use superscalar techniques to fetch multiple instructions from the instruction memory and store them in an instruction buffer window, so that the coherent window can be defined and the coherent instructions found ahead of time; and locate read-write conflicts in the instruction stream through the definitions of the coherent window and the coherent instruction, so that, if a coherent conflict exists between instructions, the locally pre-stored conflict types between instruction sets are searched and the conflict is avoided quickly and in advance.

Beneficial effects of the invention:

For the instruction set of the current RISC-CPU, the invention statistically analyzes and classifies in advance the conflicts between instruction sets according to the types of conflict that exist between instructions; the result of the search for coherent instructions then controls the "conflict search and resolution" unit, so that the pipeline runs smoothly inside the coherent window.

The invention provides an effective strategy for resolving conflicts between instruction sets in RISC-CPU design. It defines the "coherent window" and the "coherent instruction" in order to resolve read/write, write/read and write/write conflicts among the coherent instructions inside a coherent window. Using RAW conflict resolution as the example, this document analyzes the conflicts between several major instruction sets. The engineering method quantitatively delimits the dependence between instructions and resolves the conflicts; by adding a hardware unit that locates and resolves read-write conflicts, it effectively avoids the instruction conflicts between different instruction sets of the RISC-CPU and at the same time improves pipeline throughput.

Brief description of the drawings

Figure 1 shows the classification of the instruction sets of the RISC-CPU;

Figure 2 is a schematic of the "coherent window" and the "coherent instruction";

Figure 3 shows the background and workflow of the invention;

Figure 4 shows the added hardware unit for conflict location and resolution;

Figure 5 shows RAW conflict resolution between the ADD and XOR instructions;

Figure 6 shows RAW conflict resolution between the STORE and LOAD instructions;

Figure 7 shows RAW conflict resolution between the ADD and STORE instructions;

Figure 8 shows RAW conflict resolution between the LOAD and ADD instructions.

Detailed description

The invention is described in detail below with reference to the accompanying drawings:

As shown in Figure 1, the invention proposes an effective method for resolving CPU instruction conflicts. The invention divides all instructions in the design into four classes (data processing instructions, branch instructions, immediate instructions and memory access instructions), extracts a representative subset from each class, and resolves the direct RAW (read-after-write) data conflicts within and between those subsets. All potential RAW conflicts are statistically analyzed in advance and are then located and resolved, so that the pipeline runs smoothly.

The RISC-CPU design proposed by the invention divides the full instruction set into four instruction classes: data processing (Data-Processing), memory access (STORE/LOAD), branch (Branch) and immediate (Immediate). Within and between the instruction classes (except Branch), RAW (read-after-write) conflict resolution is used as the explanatory example. There are several kinds of RAW conflict (within and between the data processing, memory access and immediate instruction sets) for which the coherent conflicts caused by data dependencies must be resolved.
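
For reference, the three standard data hazard types between an earlier and a later instruction can be distinguished from the locations each instruction reads and writes. The sketch below uses the conventional definitions (RAW: the later instruction reads what the earlier one writes; WAR: the later one writes what the earlier one reads; WAW: both write the same location); the function name `hazard_types` and the register names are illustrative.

```python
def hazard_types(first_reads, first_writes, second_reads, second_writes):
    """Classify the data hazards between an earlier and a later instruction."""
    hazards = set()
    if set(first_writes) & set(second_reads):
        hazards.add("RAW")   # later instruction reads a location written earlier
    if set(first_reads) & set(second_writes):
        hazards.add("WAR")   # later instruction overwrites a location read earlier
    if set(first_writes) & set(second_writes):
        hazards.add("WAW")   # both instructions write the same location
    return hazards
```

An ADD writing R1 followed by an XOR reading R1 gives `{"RAW"}`, the case the patent concentrates on.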

Definition of the "coherent window": for superscalar instruction processing, out-of-order execution need not be considered. If the current instruction operates on a specific register in the register file or on a memory location (for example, it writes a register), its coherent window is the sequence of instructions that follow it in the instruction stream, from the point at which the current instruction is fetched until its result is committed.

Definition of the "coherent instruction": based on the coherent window defined above, and ignoring out-of-order execution, if the current instruction writes a register or a memory offset address, then its coherent instructions are the one or more instructions, found by searching downward from the current instruction within the coherent window, that also write or read the location written by the current instruction. Instructions with such data dependencies may have read-write conflicts, which interrupt the pipeline and degrade instruction throughput (CPI).

For each case, the coherent window and the coherent instructions are given. Figure 2 illustrates coherent windows, for example those of the ADD (thick solid line) and XOR (thin dashed line) instructions, and coherent instructions: for example, the coherent instructions of the ADD instruction are XOR and STORE. Among instructions of the same class, for example the data processing subset ADD and XOR listed in Tables 1 and 2, the RAW conflicts are resolved individually, as shown in Figure 3. The memory access instructions (LOAD/STORE) are listed in Tables 3, 4, 5 and 6:

Table 1: Instruction format of the data processing instruction set

Table 2: Representative instructions of the data processing instruction set (ADD addition, XOR exclusive-OR)

Table 3: Memory-to-register (LOAD) instruction format

Table 4: Representative of the LOAD instruction subset

LDR    R_D ← [R_S]    LDR R_D, [R_S]

Table 5: Register-to-memory (STORE) instruction format

Table 6: Representative of the STORE instruction subset

STR    R_D → [R_S]    STR R_D, [R_S]

The RAW conflicts are resolved individually, as shown in Figure 5.

Between instruction sets of different classes, the RAW conflicts are resolved individually, as shown in Figures 6, 7 and 8.

For the remaining instruction sets (immediate instructions and branch instructions), the RAW conflicts between them are not analyzed further.

Within the coherent window, the conflict resolution methods include: (1) feeding the result computed by the current instruction before its commit stage directly into the execution unit of the following coherent instruction, instead of waiting until the current instruction has committed and only then feeding the final result to the later instruction's execution unit; (2) producing a delayed version of the operation result of the coherent instruction that follows the current instruction, and using the output of the delayed version as a backup register resource for resolving the read-write conflict.

This design analyzes only the RAW conflicts between data processing and memory access instructions. Conflicts between the other instruction sets, and the remaining WAR/WAW conflicts and their avoidance, are not illustrated with specific examples.

The instruction conflict resolution of the invention is described with reference to Figures 3 and 4. In RISC-CPU hardware design, adjacent instructions may have data read-write dependencies; that is, accesses to the same register in the register file, or to the same memory offset address, may conflict in their read-write order (RAW/WAR/WAW), forcing the pipeline to stall and reducing efficiency. In a RISC-CPU design, conflicts between instruction sets therefore interrupt the pipeline and lower instruction throughput. The classic MIPS pipeline has five stages: stage 1, instruction fetch (fetch); stage 2, instruction decode (decode); stage 3, execute (execute); stage 4, memory access (for the internal memory, covering the two classes load and store); and stage 5, write back (the result is written back to the register file, or register bank, in the RISC-CPU). In the classic MIPS architecture, the longest instructions occupy five pipeline stages (for example the memory access instruction STORE followed by write-back to the register file); ordinary data operation instructions that do not access memory occupy four stages (for example ADD, whose final sum is written to a register in the register file without passing through the memory access stage); some instructions complete in the decode stage (for example the jump instruction JMP); and conditional branch instructions change the order of the instruction stream and so cause non-sequential execution. Thus, in a RISC-CPU design, because different instruction sets occupy different numbers of pipeline stages and the arrangement of instructions is highly irregular, reads and writes of the same register in the register file, or of the same memory offset address, are very likely to have data dependency conflicts.
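
The throughput cost of stalling can be made concrete with a simple cycle count. The sketch below assumes an idealized in-order pipeline that issues one instruction per cycle, so that `n` instructions on a `stages`-deep pipeline take `stages + n - 1` cycles plus any bubbles. The functions `total_cycles` and `cpi`, and the example figure of two bubbles per hazard, are illustrative assumptions and are not taken from the patent.

```python
def total_cycles(n_instructions, stages=5, hazards=0, bubbles_per_hazard=0):
    """Cycles needed by an idealized in-order, single-issue pipeline.

    The first instruction needs `stages` cycles; each further instruction
    completes one cycle later, and every unresolved hazard adds
    `bubbles_per_hazard` stall (or NOP) cycles.
    """
    return stages + (n_instructions - 1) + hazards * bubbles_per_hazard


def cpi(n_instructions, **kwargs):
    """Average cycles per instruction for the same model."""
    return total_cycles(n_instructions, **kwargs) / n_instructions
```

Under this model, 100 instructions with 20 hazards costing two bubbles each take 144 cycles, against 104 cycles when the hazards are resolved without stalling, which is the gap that forwarding and delayed-version registers aim to close.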

By analyzing the conflicts that register read-write dependencies can cause between different instruction sets in the classic MIPS architecture, the invention defines a coherent window for the current instruction and searches that window for the current instruction's coherent instructions. Using the data dependencies between the instruction sets of the RISC-CPU, analyzed in advance, it determines the conflict type and avoids the loss of pipeline efficiency. The conflicts are of two kinds: (1) a later coherent instruction reads stale register data or overwrites newly written data that has not yet been read, that is, the data dependencies of the original instruction order are violated and a functional error results; (2) the original instruction pipeline is interrupted (including reduced instruction throughput caused by freezing the pipeline or inserting NOP instructions). Once the pipeline is frozen, or simply delayed as a whole, the advantage of pipelining is weakened and instruction throughput falls.

In the invention, the analyzed examples use the following strategies to resolve data conflicts while preserving pipeline throughput: (1) early feed-forward of intermediate results before the final result is committed; (2) feed-forward of a delayed version of the data before the final result is committed; (3) coexistence of a pipeline-delayed version of the intermediate result with the original version; (4) the original pipeline is not frozen; (5) NOP instructions are not inserted. For example, in method (1) the intermediate result is fed forward early, that is, directly to the input of the next instruction without waiting for the final commit stage; in method (3) an additional delayed-version register (delayed by one or several beats) is added in the pipeline stage of the next coherent instruction, while the register version in the original normal pipeline stage is retained.
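
Method (3) can be modeled in software as a pipeline register with a shadow delay chain. The class below is a behavioral sketch only: `DelayedVersionRegister`, its `clock` method and the `delay_beats` parameter are hypothetical names, and real hardware would implement this with extra flip-flops rather than a queue.

```python
from collections import deque


class DelayedVersionRegister:
    """Pipeline register plus a shadow chain that holds older copies of its value."""

    def __init__(self, delay_beats=1, reset=0):
        self.current = reset
        # Oldest value sits at the left end; the chain length is the delay.
        self._history = deque([reset] * delay_beats, maxlen=delay_beats)

    def clock(self, new_value):
        """Advance one beat: shift the present value into the delay chain,
        then latch the new value into the normal pipeline register."""
        self._history.append(self.current)
        self.current = new_value

    @property
    def delayed(self):
        """Value the register held `delay_beats` beats ago."""
        return self._history[0]
```

After `clock(10)` and `clock(20)` on a one-beat register, `current` is 20 and `delayed` is 10, so both the normal and the delayed version are available at the same time and the original pipeline value is never overwritten by the copy.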

For the instruction set of the current RISC-CPU, the conflicts between instruction sets are statistically analyzed and classified in advance according to the types of conflict that exist between instructions; the result of the search for coherent instructions then controls the "conflict search and resolution" unit, so that the pipeline runs smoothly inside the coherent window. Figures 4 and 6 use the early feed-forward of intermediate results; Figures 5 and 7 use the delayed version of the computed result. The existing method is the NOP mechanism, which delays the pipeline; this patent no longer uses it. This patent also does not freeze the pipeline. Its delayed version keeps the original pipeline register and, based on the classification of coherent instruction conflicts known in advance, backs up a delayed copy; this resolves local instruction conflicts inside the coherent window without disturbing the data transfer of the original, normal pipeline stages. The patent makes full use of data feed-forward, delayed versions and other methods to keep the pipeline smooth inside the coherent window. It manages instruction conflict resolution systematically, from statistical classification to location and resolution. The conflict resolution framework of this patent can also absorb new instruction conflict resolution techniques, making the overall system more complete.

Explanation of Figure 1: Figure 1 shows conflict resolution between the several instruction sets (data processing, immediate, branch and memory access).

Explanation of Figure 2: Figure 2 gives the coherent windows of two current instructions, ADD and XOR; for the ADD instruction it also gives the coherent instructions with which a data read-write conflict exists, namely XOR and STORE, both of which are coherent instructions of ADD. Explanation of Figure 3: Figure 3 gives the overall flow of the invention's method for resolving instruction conflicts based on the coherent window and the coherent instruction. The method can be added to the RISC-CPU hard core as an intelligent module that locates and resolves instruction read-write conflicts. The RISC-CPU hard core must of course be developed after the architecture, number of pipeline stages, instruction set types, instruction formats and so on of the RISC-CPU being designed have been fixed. Traditional methods do not locate and resolve conflicts quantitatively from a system-level perspective.

The patent first performs an advance statistical analysis of the read-write conflicts between instructions of different classes or of the same class and classifies the conflicts strictly. After all possible read-write conflicts have been classified, they are stored in a local table, the conflict lookup table (index, conflict type). Using superscalar techniques, multiple instructions are fetched from the instruction memory and stored in an instruction buffer window, so that the coherent window can be defined and the coherent instructions found ahead of time. Read-write conflicts in the instruction stream are located through the strict definitions of the coherent window and the coherent instruction; if a coherent conflict exists between instructions, the locally pre-stored conflict types between instruction sets are searched so that the conflict can be avoided quickly and in advance (by the added conflict resolution control unit). Tables 1, 2 and 3 of the invention list several instructions: Table 1 gives the data format and mnemonics of the data processing instructions, and Tables 2 and 3 give the data format and mnemonics of the LOAD/STORE memory access instructions.

Figures 5 and 7 show early feedforward of the preceding instruction's result before it is committed, and Figures 6 and 8 show the output of a delayed version of the intermediate operation result of the subsequent coherent instruction; both methods serve to prevent pipeline interruption. Outputting a coherent instruction's result at different delay beats requires some additional registers. These registers hold delayed versions of the computed result and do not corrupt the results of the original pipeline. Other conflict resolution techniques can also be incorporated into the technical framework of the present invention.
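The delayed-version registers can be modeled in software as a short shift register that runs alongside the original pipeline value. This is a minimal behavioral sketch under assumptions: the class name `DelayLine`, its `tick` method, and the initial value of 0 are hypothetical and not taken from the patent.

```python
from collections import deque


class DelayLine:
    """Behavioral model of extra registers holding a delayed copy of a result.

    `beats` is the number of clock beats of delay and must be at least 1.
    """

    def __init__(self, beats, initial=0):
        # One slot per beat of delay; the leftmost slot is the oldest value.
        self._stages = deque([initial] * beats, maxlen=beats)

    def tick(self, value):
        """Clock in the current result and return the one from `beats` ticks ago."""
        oldest = self._stages[0]
        self._stages.append(value)  # maxlen drops the oldest slot automatically
        return oldest


# The original results (10, 20, 30, 40) stay untouched in the caller;
# the delay line only produces a copy that trails them by two beats.
line = DelayLine(beats=2)
outputs = [line.tick(v) for v in (10, 20, 30, 40)]
```

The delay line never writes back to the value it copies, which corresponds to the statement that the extra registers do not corrupt the results of the original pipeline.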

If the current instruction does not access the register file or memory, the next instruction is analyzed and the "coherent window" slides down. If there is no "coherent instruction" within the "coherent window", there is no read-write conflict and the analysis ends.
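The sliding-window analysis can be sketched in a few lines. This is an illustrative model under stated assumptions rather than the patented circuit: the `Instr` record, the fixed `window` length, and the rule that only an instruction that writes a location opens a window are simplifications chosen here, and the register names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str] = None  # register or memory location written, if any
    srcs: tuple = ()            # registers or memory locations read


def coherent_instructions(program, i, window):
    """Indices of instructions after program[i], within the window, that
    read or write the location written by program[i]."""
    cur = program[i]
    if cur.dest is None:        # nothing written: no window is opened
        return []
    hits = []
    for j in range(i + 1, min(i + 1 + window, len(program))):
        nxt = program[j]
        if cur.dest in nxt.srcs or cur.dest == nxt.dest:
            hits.append(j)
    return hits


def analyze(program, window):
    """Slide the window over the program; keep only instructions with hits."""
    report = {}
    for i in range(len(program)):
        hits = coherent_instructions(program, i, window)
        if hits:
            report[i] = hits
    return report


# Mirrors Figure 2: XOR and STORE are both coherent instructions of ADD.
program = [
    Instr("ADD", "r1", ("r2", "r3")),
    Instr("XOR", "r4", ("r1", "r5")),
    Instr("STORE", "mem[r6]", ("r1", "r6")),
    Instr("NOP"),
]
report = analyze(program, window=3)
```

Here `report` maps the index of ADD to the indices of XOR and STORE; the other instructions have no coherent instructions in their windows and are omitted.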

Explanation of Figure 5: The operation result of the ADD instruction is fed forward early as an input to XOR, so XOR need not wait for the ADD instruction to commit before executing, which keeps the pipeline flowing smoothly.
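The effect of the early feedforward can be shown with a small behavioral model. This is a minimal sketch, assuming the sequence ADD r1, r2, r3 followed by XOR r4, r1, r5, with XOR executing before ADD commits; the function name `run_add_xor` and the register names are hypothetical.

```python
def run_add_xor(regs, forward):
    """Model ADD r1, r2, r3 followed by XOR r4, r1, r5.

    XOR executes before ADD has committed. With `forward` set, XOR takes
    the ADD result straight from the execute stage; without it, XOR reads
    the stale r1 from the register file.
    """
    regs = dict(regs)                          # do not mutate the caller's state
    add_result = regs["r2"] + regs["r3"]       # ADD executed, not yet committed
    operand = add_result if forward else regs["r1"]
    xor_result = operand ^ regs["r5"]          # XOR executes in the next beat
    regs["r1"] = add_result                    # ADD commits
    regs["r4"] = xor_result                    # XOR commits
    return regs


start = {"r1": 0, "r2": 5, "r3": 3, "r4": 0, "r5": 1}
with_forwarding = run_add_xor(start, forward=True)
without_forwarding = run_add_xor(start, forward=False)
```

With forwarding, r4 ends up as (5 + 3) XOR 1 = 9, the value the program order requires. Without it, XOR consumes the old r1 and r4 becomes 0 XOR 1 = 1, which is the functional error that would otherwise have to be prevented by stalling the pipeline.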

Explanation of Figure 6: The result of STORE serves as the address calculation input of LOAD; a delay register added in front of the LOAD instruction keeps the pipeline flowing smoothly.

Explanation of Figure 7: The operation result of the ADD instruction is fed directly to the next stage as the offset address used to compute the address accessed by the STORE instruction, which keeps the pipeline flowing smoothly.

Explanation of Figure 8: The result of the LOAD instruction is sent directly to the coherent instruction that follows: it serves directly as the input of ADD, and a version of ADD's intermediate result delayed by one beat serves as the offset address for the DMEM access, which keeps the pipeline flowing smoothly.

The present invention uses the "coherent window" and "coherent instructions" to find and avoid instruction conflicts. It is a simple and effective strategy for resolving conflicts between multiple instruction sets and is useful for resolving common conflicts among RISC-CPU hardware instructions.

The intelligent conflict detection and resolution unit is added to the RISC-CPU design and can be applied to superscalar CPU designs. The present invention is a technique for keeping the pipeline flowing smoothly that combines early data forwarding, no pipeline freezing, no NOP insertion, delayed feedforward, and multi-beat delayed versions of intermediate operation results.

Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the present invention without creative effort still fall within the scope of protection of the present invention.

Claims

1. A method for avoiding conflicts between instruction sets in a RISC-CPU, characterized in that it comprises the following steps:

Step 1: Determine the type of conflict according to the data dependencies between different instruction sets in the RISC-CPU;

Step 2: For the current instruction, determine whether it needs to access the "register file" or "memory"; if so, proceed to step 3; otherwise, continue to analyze the next instruction;

Step 3: Define the "coherent window" for the current instruction, search for "coherent instructions" within the "coherent window", and determine whether any "coherent instruction" exists; if so, go to step 4; otherwise, determine that there is no read-write conflict;

Step 4: According to the type of conflict between instruction sets, select a conflict resolution method, and use the specific strategy to resolve the data conflict while maintaining pipeline throughput efficiency.

2. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1, characterized in that, in step 1, the structure, instruction set, and pipeline stages of the RISC-CPU are statistically analyzed and classified, and all instructions are divided into a data processing instruction set, a memory access instruction set, a branch instruction set, and an immediate instruction set.

3. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 2, characterized in that, in step 1, after the four instruction sets are obtained, the conflicts between instruction sets are statistically analyzed, and the types of read-write conflicts between the instruction sets are traversed and classified, yielding the conflicts introduced by existing instruction data dependencies: a following "coherent instruction" reads stale register data or overwrites newly written data that has not yet been read, that is, the data dependency of the original instruction sequence is broken, causing a functional error; and the original instruction pipeline is interrupted, including the reduction in instruction throughput caused by freezing the pipeline or inserting NOP instructions into it.

4. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1 or 3, characterized in that the conflict types are classified as conflicts within and between the data processing, memory access, and immediate instruction sets;

Preferably, RAW conflict resolution between ADD and XOR instructions: the operation result of the ADD instruction is fed forward early as an input to XOR, so that XOR need not wait for the ADD instruction to commit before executing;

RAW conflict resolution between STORE and LOAD instructions: the result of STORE serves as the address calculation input of LOAD, and a delay register is added in front of the LOAD instruction;

RAW conflict resolution between ADD and STORE instructions: the operation result of the ADD instruction is fed directly to the next stage as the offset address used to compute the address accessed by the STORE instruction;

RAW conflict resolution between LOAD and ADD instructions: the result of the LOAD instruction serves directly as the input of ADD, and a version of ADD's intermediate result delayed by one beat serves as the offset address for the DMEM access.

5. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1, characterized in that, in step 3, the "coherent window" is defined as follows: for superscalar instruction processing, without considering out-of-order execution, if the current instruction operates on a specific register in the register file or a specific location in memory, the window is the instruction sequence that extends from the current instruction into the following instruction stream, covering the span from the fetch of the current instruction to the commit of its result.

6. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 5, characterized in that, in step 3, a "coherent instruction" is defined as follows: based on the "coherent window" already defined, and without considering out-of-order execution, if the "current instruction" needs to write to a register or to an offset address in memory, then the one or more instructions, found by searching from the current instruction through the following instructions within the "coherent window", that also write or read the location accessed by the "current instruction" are coherent instructions. These instructions with data dependencies may have data read-write conflicts, interrupting the pipeline and reducing instruction throughput.

7. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1, characterized in that, within the "coherent window", the conflict resolution methods comprise: (1) the result of the logic operation preceding the current instruction's commit stage is fed directly into the operation unit of the following coherent instruction, instead of waiting until the current instruction has committed and then feeding the final result into the operation unit of the following instruction; (2) a delayed version is made of the operation result of the coherent instruction following the current instruction, and the output of the delayed version serves as a backup register resource for resolving read-write conflicts.

8. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1, characterized in that, in step 4, the conflict resolution methods specifically comprise:

(1) early feedforward of the intermediate calculation data before the final calculation result is committed, that is, the data does not have to reach the final commit stage and is fed directly to the input of the next instruction;

(2) feedforward of a delayed version of the data before the final calculation result is committed;

(3) coexistence of the pipeline-delayed version of the intermediate operation result with the original version: in the pipeline stage of the next coherent instruction, an additional delay-version register is added to delay the result by one or several beats, while the register version in the original normal pipeline stage is retained;

(4) a strategy of not freezing the original pipeline;

(5) not using the method of inserting NOP instructions.

9. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1, characterized in that, in step 1, after all possible read-write conflicts have been classified, they are stored in a local file table: the conflict retrieval table, comprising an index and a conflict type; using superscalar techniques, multiple instructions are fetched from the instruction memory and stored in the instruction cache window, and the "coherent window" is delimited and the "coherent instructions" are searched for in advance; with the "coherent window" and "coherent instruction" defined, the instruction stream is searched for read-write conflicts, and if a coherent conflict exists between instructions, the conflict types between instruction sets already stored locally are looked up so that the conflict is avoided quickly and in advance.

10. A system for avoiding conflicts between instruction sets in a RISC-CPU, characterized in that it comprises a conflict resolution control unit applied in a RISC-CPU, the conflict resolution control unit being used to: store all possible read-write conflicts, after they have been classified, in a local file table, namely the conflict retrieval table comprising an index and a conflict type; using superscalar techniques, fetch multiple instructions from the instruction memory, store them in the instruction cache window, and delimit the "coherent window" and search for "coherent instructions" in advance; and, with the "coherent window" and "coherent instruction" defined, search the instruction stream for read-write conflicts and, if a coherent conflict exists between instructions, look up the conflict types between instruction sets already stored locally so that the conflict is avoided quickly and in advance.