CN1673954A - Method for maintaining data consistency during register file load operations under multiple processes
- Publication number
- CN1673954A (application CN 200410029747)
- Authority
- CN
- China
- Prior art keywords
- value
- scoreboard
- instruction
- register file
- zone bit
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Advance Control (AREA)
Abstract
The invention discloses a method for maintaining data consistency during register file load operations under multiple processes, comprising the following steps: a scoreboard flag bit is set up in advance for each entry of the register file; when a load instruction executes, the scoreboard flag bit is set to the busy value, and when the load instruction completes, the scoreboard flag bit is set to the idle value; it is determined whether the scoreboard flag bit of the register file entry that an instruction needs to access holds the busy value, and if so, the process executing the current instruction is suspended and the next process is executed; otherwise the instruction is executed. The method strictly guarantees data consistency during register file load operations while keeping the pipeline running continuously, without interrupting pipeline execution, thereby greatly improving the efficiency with which the CPU executes instructions.
Description
Technical field
The invention belongs to datapath technology, and in particular relates to a method for maintaining data consistency during register file load operations under multiple processes.
Background art
Most computers today are developed on the basis of the programmable computer model proposed by von Neumann. According to this model, the central processing unit (CPU) consists mainly of a controller and a datapath. The datapath comprises the logic circuits needed to implement each instruction, such as addition or logical operations, while the controller sends control signals to the datapath according to the fetched instruction in order to activate the corresponding processing circuits in the datapath.
The datapath mainly comprises a register file, an arithmetic logic unit (ALU), and a local memory. The register file is a set of general-purpose registers that hold the data words used in the current chain of computation. The ALU is a group of circuits that provides all arithmetic and logical operations. The local memory, also called cache memory, communicates with both the CPU and main system memory, and serves mainly to speed up the CPU's access to main system memory.
The register file supports three modes of operation: register-file-to-register-file, store, and load. In the register-file-to-register-file mode, the data source is the register file and the processed result is written back to the register file. The store mode transfers data from the register file to the local memory, and the load mode transfers data from the local memory to the register file.
Figure 1 is a schematic diagram of the data flow in the load mode. As shown in Figure 1, the data source is the local memory and the register file receives data from it. When a load instruction executes, the ALU supplies a data address signal; the load/store unit (LSU) uses this address to fetch the data from the local memory and sends the fetched data to the register file. The LSU mainly handles the load and store instructions in the CPU system. A load instruction reads data from the local memory back into the register file, and a store instruction writes data from the register file into the local memory. Because load and store instructions take a long time to execute, and so as not to disturb the normal operation of the CPU's main pipeline, the LSU does not hold the load and store instructions themselves but translated load and store commands. These commands mainly contain the source address, destination address, data length, and similar information.
The LSU generally comprises a command first-in-first-out buffer (FIFO), a result FIFO, a read-command control module, an output control module, and a command decoding and MEM bus arbitration module. Because the pipeline may need to execute several load and store instructions, the command FIFO buffers the load and store commands and the result FIFO buffers the result data that is read back. When the pipeline executes a load instruction, the load command is placed in the command FIFO and the data read from the local memory is placed in the result FIFO. The read-command control module in the LSU continuously reads the contents of the command FIFO and passes each command to the command decoding and MEM bus arbitration module, which generates the signals that operate the local memory according to the content of the command. For a load command, this module first extracts the source address and drives it, together with the read chip-select signal, onto the interface pins between it and the local memory. Once the data has been read back successfully from the local memory, the module combines it with the destination address in the load command word to form a new result datum and writes it into the result FIFO. For a store command, the module first extracts the destination address and the data from the command and drives them, together with the write chip-select signal, onto the interface pins between it and the local memory.
To operate on the register file correctly in a computer system, data must have been transferred from memory into the register file before the ALU accesses it. Data consistency must therefore be considered: each time the ALU operates, the value held in the register it uses must be the value the ALU is supposed to use. For example, suppose the current instruction is a load that writes the data at some local memory address into a register in the register file, and a following instruction uses the value of that register in a computation. Before that following instruction can use the register's data as an operand, the preceding load must already have finished transferring the data from memory into the register; otherwise the result obtained will not match the expected result.
In the prior art, when the instruction pipeline finds at the decode stage that an instruction is a load instruction, it generally inserts several wait cycles into the pipeline to ensure that the load completes correctly, and only then executes the following instructions. For example, suppose a process executes a register file load instruction, the CPU executes 4 hardware processes concurrently by time-division multiplexing, and the LSU needs 4 cycles to read the data from the external memory space. Under the prior-art design, the process executing the load must wait 4 cycles before executing its next instruction. Because each hardware process is scheduled once every 4 cycles, instruction execution in this case is no different from the normal case. Now suppose the LSU needs 8 cycles to read the data from the external memory space. The process executing the load must then wait 8 cycles before executing its next instruction: the next time the process is scheduled it performs no operation, and only when it is scheduled once more, exactly 8 cycles later, can it execute its next instruction. The longer it takes to read the data back from memory, the more wait cycles must be inserted.
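The arithmetic in this example can be checked with a minimal sketch. It assumes strict round-robin scheduling in which each hardware process receives one slot every `num_hw_processes` cycles; the function name and parameters are illustrative and do not come from the patent.

```python
import math


def idle_slots_after_load(load_latency: int, num_hw_processes: int) -> int:
    """Count the scheduling slots a process wastes as no-ops after a load.

    Assumption: the process is scheduled once every num_hw_processes cycles
    and may issue its next instruction only in the first of its slots that
    falls at or after load_latency cycles.
    """
    slots_until_ready = math.ceil(load_latency / num_hw_processes)
    return max(slots_until_ready - 1, 0)


# 4-cycle load with 4 hardware processes: no slot is wasted.
print(idle_slots_after_load(4, 4))
# 8-cycle load with 4 hardware processes: one slot is a no-op.
print(idle_slots_after_load(8, 4))
```

Under these assumptions the two cases in the text give 0 and 1 wasted slots respectively.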
Figure 2 is a flowchart of instruction execution in the prior art. As shown in Figure 2, it comprises the following steps:
Step 201: Fetch the instruction and decode it;
Step 202: Determine whether the instruction is a load instruction; if so, go to step 203; if not, go to step 204;
Step 203: Insert wait cycles after the load instruction;
Step 204: Execute the instruction and end.
Figure 2 shows that in the prior art a load instruction is handled differently from other instructions. An instruction that is not a load is executed directly. When a load is executed, the instructions after it can proceed normally only once the load has finished, so the execution of the whole pipeline is affected.
Figure 3 is a timing diagram of instruction pipeline execution in the prior art. As shown in Figure 3, the CPU pipeline comprises the stages instruction fetch (Fetch), instruction decode (Decode), operand read (Read), instruction execute (Execute), and register write-back (Write back). The instruction named load in Figure 3 is one example of a load instruction. In the Fetch stage, the load instruction is fetched first, followed by the sub, addic, multi, and addi instructions. In the Decode stage, if the instruction is found to be a load and the whole load operation takes 2 cycles to finish, 2 wait cycles are inserted after the load instruction, and the sub instruction begins executing only after these 2 wait cycles.
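The prior-art behaviour of Figures 2 and 3 can be illustrated with a short sketch. The function name and the string representation of instructions are assumptions made for illustration; only the instruction mnemonics and the 2-cycle wait come from the text.

```python
def insert_wait_cycles(program, load_wait_cycles):
    """Return the issue sequence of the prior-art scheme.

    Every "load" is followed by load_wait_cycles "wait" slots (step 203),
    during which no later instruction can issue.
    """
    issued = []
    for op in program:
        issued.append(op)
        if op == "load":
            issued.extend(["wait"] * load_wait_cycles)
    return issued


sequence = insert_wait_cycles(["load", "sub", "addic", "multi", "addi"], 2)
print(sequence)
```

In this model the sub instruction moves from the second issue slot to the fourth, which is the pipeline bubble that the invention aims to avoid.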
The prior art guarantees data consistency during a load by inserting wait cycles after the load instruction. If the CPU executes instructions as a single process, inserting wait cycles is a necessary measure. Under a multi-process mechanism, however, it forfeits the advantage of multi-process execution: during the wait cycles the pipeline cannot perform any operation of any instruction that follows the load, so the efficiency with which the CPU executes instructions is greatly reduced.
Summary of the invention
In view of this, the main object of the present invention is to provide a method for guaranteeing data consistency during register file load operations under multiple processes, so as to improve the efficiency with which the CPU executes instructions.
To achieve the above object, the technical solution of the present invention is realized as follows:
A method for maintaining data consistency during register file load operations under multiple processes, comprising the following steps:
A1. Set up a scoreboard flag bit in advance for each entry of the register file. When a load operation begins, set the scoreboard flag bit of the destination entry to the busy value; when the load operation completes, set the scoreboard flag bit of the destination entry to the idle value.
A2. Determine whether the scoreboard flag bit of the register file entry that an instruction needs to access holds the busy value. If it does, set the state of the process executing this instruction to suspended and execute the next process; if it does not, execute the instruction.
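Steps A1 and A2 can be sketched as a small data structure. This is an illustrative model only: the class and method names are hypothetical, and the choice of 1 for busy and 0 for idle is one of the two encodings the description allows.

```python
class Scoreboard:
    """One flag bit per register file entry (illustrative model)."""

    BUSY, IDLE = 1, 0

    def __init__(self, num_entries):
        self.flags = [self.IDLE] * num_entries

    def start_load(self, dest):
        # Step A1: mark the destination entry busy when a load begins.
        self.flags[dest] = self.BUSY

    def finish_load(self, dest):
        # Step A1: mark the destination entry idle when the load completes.
        self.flags[dest] = self.IDLE

    def must_suspend(self, entries):
        # Step A2: suspend if any entry the instruction accesses is busy.
        return any(self.flags[e] == self.BUSY for e in entries)


sb = Scoreboard(8)
sb.start_load(3)
print(sb.must_suspend([1, 3]))
sb.finish_load(3)
print(sb.must_suspend([1, 3]))
```

An instruction that touches entry 3 would be held back between `start_load(3)` and `finish_load(3)`, while instructions that touch only other entries proceed.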
In step A1, the scoreboard flag bit of a register file entry is provided by using a D register as the scoreboard flag bit of the entry, or by using one or more bits of random access memory (RAM) as the scoreboard flag bit of the entry.
In step A1, the scoreboard flag bit is set to the busy or idle value as follows: when a load operation begins, the ALU sets the scoreboard flag bit to the busy value; when the load operation completes, the LSU sets the scoreboard flag bit to the idle value.
Step A2 is: the ALU determines whether the scoreboard flag bit of the register file entry that the instruction needs to access holds the busy value. If it does, the LSU sends a scoreboard-flag-busy signal to the process switch selection module,
and the process switch selection module, in response to that signal, sets the state of the process executing this instruction to suspended and executes the next process. If the flag bit does not hold the busy value, the instruction is executed.
After the process is suspended in step A2, the process switch selection module further determines whether all processes are suspended. If so, it inserts a null instruction (no-op) into the instruction execution pipeline; otherwise it executes the next process.
The method further comprises: after a process has been suspended, when the load instruction completes, the LSU updates the scoreboard flag bit to the idle value and sends a scoreboard-flag-idle signal to the process switch selection module. The process switch selection module determines whether any other cause of the suspension remains. If not, it sets the state of the process to ready; if so, the process continues to wait until the other signals that allow its state to be set to ready have been received.
As the above technical solution shows, the present invention combines scoreboard technology with a process suspension mechanism to guarantee data consistency during register file load operations under multiple processes. A scoreboard flag bit is first set up for each register file entry. When a load instruction runs, the scoreboard flag bit of its destination entry in the register file is set to the busy value, and once the entry has been written back successfully the flag bit is set to the idle value. When a CPU instruction attempts to access a register file entry whose scoreboard flag bit holds the busy value, the process is suspended and another process is selected for execution in the next cycle, instead of wait cycles being inserted after the load instruction and interrupting pipeline execution as in the prior art. The invention therefore strictly guarantees data consistency during register file load operations while keeping the pipeline running continuously. Pipeline execution is not interrupted, and it is unaffected when a process has to wait because the register file entry targeted by its load instruction has not yet been updated. The efficiency with which the CPU executes instructions is thereby greatly improved.
Brief description of the drawings
Figure 1 is a schematic diagram of the data flow in the register file load mode.
Figure 2 is a flowchart of instruction execution in the prior art.
Figure 3 is a timing diagram of instruction pipeline execution in the prior art.
Figure 4 is a schematic diagram of CPU multi-process execution.
Figure 5 is a flowchart of instruction execution according to an embodiment of the present invention.
Figure 6 is a flowchart of the state change of a process that was suspended because it needed to access a register file entry that had not finished updating, according to an embodiment of the present invention.
Figure 7 is a timing diagram of instruction pipeline execution according to an embodiment of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
The main idea of the present invention is to strictly guarantee data consistency during register file load operations by combining scoreboard flag bits with a process suspension mechanism. A scoreboard flag bit is first set up for each register file entry. When a load instruction executes, the ALU sets the scoreboard flag bit of the load's destination entry to the busy value, and once the data has been written back successfully to that register file entry, the LSU sets the flag bit to the idle value. When a CPU instruction attempts to access a register file entry whose scoreboard flag bit holds the busy value, the process executing that instruction is suspended, and in the next cycle another process is selected for execution. A null instruction is inserted into the pipeline only when all the other processes are suspended as well. As long as not every process is suspended, the pipeline executes normally and is unaffected when an earlier process has to wait because the register file entry targeted by its load has not yet been updated.
The CPU can execute several hardware processes concurrently by time-division multiplexing, and each hardware process can in turn contain several software processes. The number of hardware processes and the number of software processes can both be adjusted, and the order in which the hardware processes execute is fixed. Assume there are 4 hardware processes; Figure 4 is a schematic diagram of CPU multi-process execution in that case. As shown in Figure 4, the CPU uses time-division multiplexing to execute 4 hardware processes within one scheduling cycle, in a fixed order. For example, within one scheduling cycle, hardware process process0 executes first, followed in turn by process1, process2, and process3. One hardware process executes in each clock (CLK), so each hardware process occupies 25% of the CPU bandwidth. Each hardware process can contain several software processes. If there are 4 hardware processes and each contains n software processes, the whole system contains 4n software processes, and process switching takes place among these 4n software processes. Process switching means that a software process whose conditions for execution are not met yields to one whose conditions are met: the former is suspended so that other software processes can execute first, and once its conditions are met it competes with the other software processes for execution again. Switching between processes is performed once every clock cycle by a dedicated process switch selection module according to the usage of CPU resources.
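The fixed-order time-division multiplexing of hardware processes can be sketched as a modulo on the clock count. This is a simplified model under the assumption of strict round-robin with one hardware process per clock; the function name is illustrative.

```python
from collections import Counter


def hardware_slot(clock: int, num_hw_processes: int = 4) -> int:
    """Index of the hardware process that owns the given clock (fixed order)."""
    return clock % num_hw_processes


# Two full scheduling cycles of 4 clocks each.
slots = [hardware_slot(c) for c in range(8)]
share = Counter(slots)
print(slots)
print({p: share[p] / len(slots) for p in sorted(share)})
```

Each of the 4 hardware processes owns one clock in four, which corresponds to the 25% bandwidth share stated in the text.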
During execution a software process is always in one of three states: active (Active), suspended (Suspend), or ready (Ready). After power-on reset, the n software processes belonging to a hardware process all start in the ready state. One of these n software processes is then selected according to a given policy and moved to the active state. A software process enters the suspended state from the active state when a resource conflict arises during execution; for example, the CPU accessing a register that should have been updated but has not yet been updated causes such a transition from active to suspended. After a software process has been suspended for some reason, as soon as its condition for execution is satisfied again its state changes immediately from suspended to ready. In each clock cycle one hardware process is selected, one software process in the active state is selected from within it, and the selected software process is passed to the next pipeline stage so that the next stage can process it. In the next clock cycle a software process is selected from the next hardware process and passed to the next pipeline stage in the same way.
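The three software process states and the transitions described above can be written as a small state machine. This sketch models only the three transitions that the text names; any other transition is rejected here as an assumption of the sketch, not as a statement about the patented design.

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    ACTIVE = "active"
    SUSPENDED = "suspended"


# Only the transitions named in the description are modeled.
ALLOWED = {
    (State.READY, State.ACTIVE),      # selected for execution
    (State.ACTIVE, State.SUSPENDED),  # resource conflict, e.g. busy register entry
    (State.SUSPENDED, State.READY),   # execution condition satisfied again
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise ValueError for an unmodeled transition."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"transition {current.name} -> {target.name} not modeled")
    return target


state = State.READY                       # state after power-on reset
state = transition(state, State.ACTIVE)
state = transition(state, State.SUSPENDED)
state = transition(state, State.READY)
print(state.name)
```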
The present invention strictly guarantees data consistency during register file load operations mainly by combining scoreboard flag bits with a process suspension mechanism. When an instruction attempts to access a register file entry whose content has not finished updating, the process executing that instruction is suspended, and the scoreboard flag bit indicates the update status of the destination entry of the load instruction.
The content of the register file can be regarded logically as a set of entries, and a scoreboard flag bit is set up in advance for each entry. The scoreboard flag bit may be a D register of the register file, or one or more bits of random access memory (RAM). In this embodiment, a flag value of 1 means that the content of the register file entry has not finished updating, and a flag value of 0 means that it has.
Based on the CPU multi-process execution scheme shown in Figure 4, Figure 5 is a flowchart of instruction execution according to an embodiment of the present invention. As shown in Figure 5, it comprises the following steps:
Step 501: Fetch the instruction and decode it;
Step 502: Determine whether the instruction is a load instruction; if so, go to step 503 and its subsequent steps; if not, go to step 504 and its subsequent steps;
Step 503: Determine whether the scoreboard flag bit of the load instruction's destination entry is 1; if so, go to step 508; if not, go to step 505 and its subsequent steps;
Step 504: Determine whether the scoreboard flag bit of the register file entry to be accessed is 1; if so, go to step 508; if not, go to step 509;
Step 505: The ALU sets the scoreboard flag bit to 1;
Step 506: Perform the load operation;
Step 507: The LSU sets the scoreboard flag bit to 0, sends a scoreboard-flag-idle signal to the process switch selection module, and the flow ends;
Step 508: The LSU sends a scoreboard-flag-busy signal to the process switch selection module, the process switch selection module sets the process state to suspended, and the flow ends;
Step 509: The process executes the instruction and the flow ends.
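The decision flow of steps 502 to 509 can be sketched as two functions. The dictionary layout of an instruction, the return strings, and the split of the load into an issue part and a completion part are assumptions made for illustration; the step numbers in the comments refer to Figure 5.

```python
BUSY, IDLE = 1, 0


def issue(instr, flags):
    """Decide what happens to a decoded instruction (steps 502 to 509).

    instr is a dict: loads carry "dest", other instructions carry "entries"
    (the register file entries they access). flags is the list of
    scoreboard flag bits, one per entry.
    """
    if instr["op"] == "load":                              # step 502
        dest = instr["dest"]
        if flags[dest] == BUSY:                            # step 503
            return "suspended"                             # step 508
        flags[dest] = BUSY                                 # step 505 (ALU)
        return "load_started"                              # step 506
    if any(flags[e] == BUSY for e in instr["entries"]):    # step 504
        return "suspended"                                 # step 508
    return "executed"                                      # step 509


def complete_load(dest, flags):
    """Step 507: the LSU clears the flag and reports the entry as idle."""
    flags[dest] = IDLE
    return "scoreboard_idle"


flags = [IDLE] * 4
print(issue({"op": "load", "dest": 2}, flags))
print(issue({"op": "sub", "entries": [1, 2]}, flags))
print(complete_load(2, flags))
print(issue({"op": "sub", "entries": [1, 2]}, flags))
```

In this model the dependent sub instruction is held back while entry 2 is busy and proceeds once the load has completed, without any wait cycles being inserted for other processes.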
以上过程中,也可以预先设定当记分牌标志位的值为0代表记分牌标志位被占用,当记分牌标志位为的值1代表记分牌标志位空闲。In the above process, it can also be preset that when the value of the scoreboard flag is 0, it means that the scoreboard flag is occupied, and when the value of the scoreboard flag is 1, it means that the scoreboard flag is free.
以上过程中,当进程挂起后,每一个时钟周期将从硬件进程中选择一个硬件进程,同时在选中的硬件进程中再选择一个处于激活状态的软件进程,并将选中的软件进程放到流水线的下一级,使下一级能够处理这个软件进程。In the above process, when the process is suspended, each clock cycle will select a hardware process from the hardware process, and at the same time select an active software process from the selected hardware process, and put the selected software process into the pipeline The next level of , so that the next level can handle this software process.
以上过程中,寄存器文件的条目还可以对应多于1个记分牌标志位。当一个条目对应多个记分牌标志位的时候,在置入指令写入条目的时候,将这个条目所对应的所有记分牌标志位都设置为预先设定的占用值,当置入指令完成写入该条目时,设置该条目所对应的所有记分牌标志位为预先设定的空闲值。In the above process, an entry in the register file may also correspond to more than one scoreboard flag. When an entry corresponds to multiple scoreboard flags, when the insert instruction writes the entry, all the scoreboard flags corresponding to this entry are set to the preset occupancy value, and when the insert instruction finishes writing When entering this entry, set all the scoreboard flags corresponding to this entry to the preset free values.
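In the above process, after step 508 the process switching selection module may further determine whether all processes have been suspended. If so, a no-op instruction is inserted into the pipeline; otherwise the next selected software process is executed.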
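The fallback after step 508 amounts to a simple check, sketched below. The function name `next_issue`, the state strings, and the rule of taking the first non-suspended process are assumptions for illustration only.

```python
def next_issue(states):
    """Return the process to run next, or "NOP" if every process is suspended.

    states: dict mapping process name -> "ready" or "suspended".
    """
    runnable = [name for name, state in states.items() if state != "suspended"]
    if not runnable:
        return "NOP"    # all processes suspended: insert a no-op
    return runnable[0]  # assumed rule: first runnable process


some_ready = next_issue({"p0": "suspended", "p1": "ready", "p2": "ready"})
all_blocked = next_issue({"p0": "suspended", "p1": "suspended"})
```

Only when no process can run does the pipeline receive a no-op; in every other case a real instruction from another process fills the slot.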
When a process attempts to access an entry of the register file, the ALU unit determines whether the scoreboard flag of the entry that the instruction needs to access holds the occupied value. If it does, the LSU unit sends a scoreboard flag occupied signal to the process switching selection module, the process is suspended, and the process switching selection module records the reason for the suspension. When the LSU unit finishes updating the register file entry, it updates the scoreboard flag and sends a scoreboard flag idle signal to the process switching selection module. On receiving this signal, the module concludes that this reason for suspension no longer exists and checks whether any other reason for suspension remains. If none remains, the process switching selection module changes the state of the process from suspended to ready.
Building on Figures 4 and 5, Figure 6 is a flow chart, according to an embodiment of the present invention, of the state change of a process that was suspended because it needed to access a register file entry whose update had not yet completed. As shown in Figure 6, the flow includes the following steps:
Step 601: When a process is suspended because the scoreboard flag of the register file entry it needs to access holds the occupied value, the process switching selection module records the reason for the suspension;
Step 602: After the register file entry has been updated, the LSU unit sets the scoreboard flag to the idle value and sends a scoreboard flag idle signal to the process switching selection module;
Step 603: The process switching selection module determines whether any other reason for suspension remains; if so, step 604 is executed, otherwise step 605 is executed;
Step 604: The process remains in the suspended state and waits for other signals that change the process state; the flow then ends;
Step 605: The process switching selection module sets the process state to ready in response to the scoreboard flag idle signal sent by the LSU unit; the flow then ends.
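Steps 601 to 605 describe bookkeeping of suspension reasons, which can be sketched as a set of pending reasons per process. The class and method names are hypothetical, and the second reason used in the example (`"cache_miss"`) is invented for illustration; the text only says that other reasons for suspension may exist.

```python
class ProcessSwitchSelector:
    """Hypothetical model of the suspension-reason bookkeeping."""

    def __init__(self):
        self.state = {}    # process name -> "ready" or "suspended"
        self.reasons = {}  # process name -> set of pending reasons

    def suspend(self, proc, reason):
        # Step 601: suspend the process and record why.
        self.state[proc] = "suspended"
        self.reasons.setdefault(proc, set()).add(reason)

    def clear_reason(self, proc, reason):
        # Steps 602-603: a reason has gone away; check what remains.
        pending = self.reasons.setdefault(proc, set())
        pending.discard(reason)
        if not pending:
            self.state[proc] = "ready"  # step 605
        return self.state[proc]         # step 604: still "suspended"


sel = ProcessSwitchSelector()
sel.suspend("p0", ("scoreboard", 3))  # waiting on register file entry 3
sel.suspend("p0", "cache_miss")       # invented second reason
after_flag_idle = sel.clear_reason("p0", ("scoreboard", 3))
after_second = sel.clear_reason("p0", "cache_miss")
```

The process stays suspended after the scoreboard flag idle signal because another reason is still pending, and becomes ready only once that reason is cleared too.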
Once the state of the process is ready, the process switching selection module can schedule the process for execution according to its selection policy.
Figure 7 is a schematic diagram of the instruction pipeline execution timing according to an embodiment of the present invention. As shown in Figure 7, the sub instruction enters the Decode stage immediately after the load instruction, without any wait cycles being inserted. The load instruction therefore passes through the CPU pipeline in the same way as any other instruction and has no effect on the rest of the pipeline, so efficiency is not reduced by inserted wait cycles.
The above description uses four hardware processes as an example to illustrate switching between processes on a multi-process CPU. In practice, the number of hardware processes can be adjusted as needed, as can the number of software processes.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (6)
- 1. A method for maintaining data consistency during register file load operations under multiple processes, characterized by comprising the following steps: A1. a scoreboard flag is provided in advance for each register file entry; when a load operation is performed, the scoreboard flag of the destination entry is set to the occupied value, and when the load operation completes, the scoreboard flag of the destination entry is set to the idle value; A2. it is determined whether the scoreboard flag of the register file entry that an instruction needs to access holds the occupied value; if so, the state of the process executing the instruction is set to suspended and the next process is executed; otherwise the instruction is executed.
- 2. The method according to claim 1, characterized in that the scoreboard flag of a register file entry in step A1 is provided as follows: a D register is used as the scoreboard flag of the register file entry, or one or more bits of a random access memory (RAM) are used as the scoreboard flag of the register file entry.
- 3. The method according to claim 1, characterized in that the scoreboard flag in step A1 is set to the occupied value or the idle value as follows: when a load operation is performed, the ALU unit sets the scoreboard flag to the occupied value; when the load operation completes, the LSU unit sets the scoreboard flag to the idle value.
- 4. The method according to claim 1, characterized in that step A2 is: the ALU unit determines whether the scoreboard flag of the register file entry that the instruction needs to access holds the occupied value; if so, the LSU unit sends a scoreboard flag occupied signal to the process switching selection module, and the process switching selection module, according to that signal, sets the state of the process executing the instruction to suspended and executes the next process; otherwise the instruction is executed.
- 5. The method according to claim 1, characterized in that, after the process is suspended in step A2, the process switching selection module further determines whether all processes are suspended; if so, a no-op instruction is inserted into the instruction execution pipeline; otherwise the next process is executed.
- 6. The method according to claim 3, characterized in that the method further comprises: after the process is suspended, once the load instruction completes, the LSU unit updates the scoreboard flag to the idle value and sends a scoreboard flag idle signal to the process switching selection module; the process switching selection module determines whether any other reason for suspension remains; if not, the process switching selection module sets the state of the process to ready; if so, the process continues to wait until another signal is received that sets the process state to ready.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2004100297476A CN1310138C (en) | 2004-03-24 | 2004-03-24 | Method for holding data consistency when register document inbedding operation under multi process |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2004100297476A CN1310138C (en) | 2004-03-24 | 2004-03-24 | Method for holding data consistency when register document inbedding operation under multi process |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1673954A (en) | 2005-09-28 |
| CN1310138C (en) | 2007-04-11 |
Family
ID=35046520
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2004100297476A (Expired - Fee Related) CN1310138C (en) | 2004-03-24 | 2004-03-24 | Method for holding data consistency when register document inbedding operation under multi process |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN1310138C (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107430509A (en) * | 2015-04-19 | 2017-12-01 | 森蒂彼得塞米有限公司 | The run time parallelization of the code execution of specification is accessed based on Approximation Register |
| WO2020108212A1 (en) * | 2018-11-26 | 2020-06-04 | 深圳云天励飞技术有限公司 | Register access timing sequence management method, processor, electronic device and computer-readable storage medium |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0636985B1 (en) * | 1993-07-27 | 1998-04-08 | International Business Machines Corporation | Process monitoring in a multiprocessing server |
| JP2000099354A (en) * | 1998-09-18 | 2000-04-07 | Nec Ibaraki Ltd | Device and method for replacing process for multiprocessor system |
| JP2000349909A (en) * | 1999-06-03 | 2000-12-15 | Nec Corp | Virtual multi-processing system by flow definition file |
Also Published As
| Publication number | Publication date |
|---|---|
| CN1310138C (en) | 2007-04-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP4170218B2 (en) | Method and apparatus for improving the throughput of a cache-based embedded processor by switching tasks in response to a cache miss | |
| KR100936601B1 (en) | Multiprocessor system | |
| US9842056B2 (en) | Systems and methods for non-blocking implementation of cache flush instructions | |
| US7908443B2 (en) | Memory controller and method for optimized read/modify/write performance | |
| CN103077132B (en) | A kind of cache handles method and protocol processor high-speed cache control module | |
| CN1890631A (en) | Transitioning from instruction cache to trace cache on label boundaries | |
| CN101414252B (en) | Data processing apparatus | |
| CN1410893A (en) | Microprocessor with prefetching instructions and method of prefetching to its cache | |
| CN112559389B (en) | Storage control device, processing device, computer system and storage control method | |
| CN110806900A (en) | Memory access instruction processing method and processor | |
| CN105786448A (en) | Instruction scheduling method and device | |
| CN101546293B (en) | Cache control apparatus, information processing apparatus, and cache control method | |
| US6862670B2 (en) | Tagged address stack and microprocessor using same | |
| CN100533428C (en) | Semiconductor device | |
| CN1310138C (en) | Method for holding data consistency when register document inbedding operation under multi process | |
| US7155718B1 (en) | Method and apparatus to suspend and resume on next instruction for a microcontroller | |
| KR100190377B1 (en) | Bus interface unit of microprocessor | |
| CN111078289A (en) | Method for executing sub-threads of a multi-threaded system and multi-threaded system | |
| CN117055811A (en) | Bus access command processing method, device, chip and storage medium | |
| WO2022257898A1 (en) | Task scheduling method, system, and hardware task scheduler | |
| CN115080121A (en) | Instruction processing method, apparatus, electronic device, and computer-readable storage medium | |
| CN118605941B (en) | CPU capable of quickly processing memory copy instructions and method thereof | |
| CN119127748B (en) | Flash Memory Controllers, Microcontrollers | |
| CN1720503A (en) | Method and apparatus for high speed cross-thread interrupts in a multithreaded processor | |
| CN114995884A (en) | Instruction retirement unit, instruction execution unit, and related apparatus and method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | Effective date of registration: 20201223. Address after: Pan Du Zhen pan Du Cun Nan, yuncheng county, Heze City, Shandong Province. Patentee after: Yuncheng Pandu branch of Shandong Zhongyou Construction Engineering Co.,Ltd. Address before: Unit 2414-2416, main building, no.371, Wushan Road, Tianhe District, Guangzhou City, Guangdong Province. Patentee before: GUANGDONG GAOHANG INTELLECTUAL PROPERTY OPERATION Co.,Ltd. |
| | TR01 | Transfer of patent right | Effective date of registration: 20201223. Address after: Unit 2414-2416, main building, no.371, Wushan Road, Tianhe District, Guangzhou City, Guangdong Province. Patentee after: GUANGDONG GAOHANG INTELLECTUAL PROPERTY OPERATION Co.,Ltd. Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen. Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd. |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20070411. Termination date: 20200324. |