WO2025036099A1 - Method and system for processing task to be processed, and storage medium and electronic apparatus - Google Patents
- Publication number
- WO2025036099A1 (PCT/CN2024/106811)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- executed
- control signal
- processed
- task
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
- G06F30/327—Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
- G06F30/33—Design verification, e.g. functional simulation or model checking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30029—Logical and Boolean instructions, e.g. XOR, NOT
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the embodiments of the present disclosure relate to the field of communications, and in particular, to a method and system for processing tasks to be processed, a storage medium, and an electronic device.
- One of the major design difficulties of ASIC chips is the division between software and hardware, that is, deciding which services are implemented by the CPU and which by the hardware accelerator. Because the chip development cycle is long, the scenarios encountered in commercial use often differ considerably from those assumed in the early stages of design. This requires reserving a certain degree of flexibility early in chip design, but that flexibility increases chip cost, which leads to a dilemma.
- the embodiments of the present disclosure provide a method and system for processing a task to be processed, a storage medium, and an electronic device, so as to at least solve the problem of low flexibility in chip design in the related art.
- a method for processing a task to be processed which is applied to a hardware accelerator, and includes: determining an instruction set corresponding to the task to be processed based on parameter information and requirement information of the task to be processed, wherein the requirement information is used to indicate a processing result corresponding to the task to be processed; determining a first instruction address of each instruction to be executed in the instruction set; and executing each instruction to be executed based on the first instruction address of each instruction to be executed to process the task to be processed.
- a processing system for a task to be processed, including: an instruction extraction unit, configured to determine an instruction set corresponding to the task to be processed based on parameter information and requirement information of the task to be processed, wherein the requirement information is used to indicate a processing result corresponding to the task to be processed, and to determine an instruction address of each instruction to be executed in the instruction set; and an execution unit, configured to execute each instruction to be executed according to the instruction address of each instruction to be executed to process the task to be processed.
- a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps of any one of the above method embodiments when running.
- an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
- FIG1 is a hardware structure block diagram of a computer terminal of a method for processing a task to be processed according to an embodiment of the present disclosure
- FIG2 is a flow chart of a method for processing a task to be processed according to an embodiment of the present disclosure
- FIG3 is a flow chart of a method for processing a task to be processed according to an optional embodiment of the present disclosure
- FIG4 is an internal structure diagram of a hardware accelerator according to an embodiment of the present disclosure.
- FIG5 is a schematic diagram of a general register according to an embodiment of the present disclosure.
- FIG6 is a schematic diagram of an instruction according to an embodiment of the present disclosure (I);
- FIG7 is a schematic diagram of an instruction according to an embodiment of the present disclosure (II);
- FIG8 is a schematic diagram of an instruction according to an embodiment of the present disclosure (III);
- FIG9 is a schematic diagram of an instruction according to an embodiment of the present disclosure (IV).
- FIG10 is a schematic diagram of an instruction according to an embodiment of the present disclosure (V);
- FIG11 is a schematic diagram of an instruction according to an embodiment of the present disclosure (VI);
- FIG12 is a schematic diagram of an instruction according to an embodiment of the present disclosure (VII);
- FIG13 is a schematic diagram of a packet header format according to an embodiment of the present disclosure.
- FIG. 14 is a structural block diagram of a system for processing tasks to be processed according to an embodiment of the present disclosure.
- FIG1 is a hardware structure block diagram of a computer terminal of a processing method for a task to be processed in an embodiment of the present disclosure.
- the computer terminal may include one or more processors 102 (only one is shown in FIG1) and a memory 104 for storing data. The processor 102 may include, but is not limited to, a hardware accelerator, a microprocessor (Microcontroller Unit, MCU), a programmable logic device (Field Programmable Gate Array, FPGA), an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a digital signal processor, a floating-point unit, a coprocessor, a multi-core processor, a multi-threaded processor, etc. The above-mentioned computer terminal may also include a transmission device 106 for communication functions and an input/output device 108.
- FIG1 is only for illustration and does not limit the structure of the above-mentioned computer terminal.
- the computer terminal may also include more or fewer components than those shown in FIG1 , or have a configuration different from that shown in FIG1 .
- the memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the processing methods of the tasks to be processed in the embodiments of the present disclosure.
- the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, that is, to implement the above-mentioned methods.
- the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
- the memory 104 may further include a memory remotely arranged relative to the processor 102, and these remote memories may be connected to the computer terminal via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
- the transmission device 106 is configured to receive or send data via a network.
- Specific examples of the above-mentioned network may include a wireless network provided by a communication provider of a computer terminal.
- the transmission device 106 includes a network adapter (Network Interface Controller, referred to as NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
- the transmission device 106 can be a radio frequency (Radio Frequency, referred to as RF) module, which is configured to communicate with the Internet wirelessly.
- FIG. 2 is a flow chart of the method for processing a pending task according to an embodiment of the present disclosure. As shown in FIG. 2 , the process includes the following steps:
- Step S202 determining an instruction set corresponding to the task to be processed according to parameter information and requirement information of the task to be processed, wherein the requirement information is used to indicate a processing result corresponding to the task to be processed;
- the task requirement information is the specific requirements and goals of the task, including the content, time requirements, quantity requirements, etc.
- the parameter information refers to the parameters needed in the task, including input parameters and output parameters.
- the input parameters are the data that need to be input when the task is executed, and the output parameters are the results obtained after the task is completed. These parameters can be different types of data such as numbers, text, dates, files, etc.
- the task execution and the output of the results are completed according to the requirement information and parameter information.
- Step S204 determining the first instruction address of each instruction to be executed in the instruction set
- the instruction address refers to the address of the memory location or storage unit that indicates the specific instruction to be executed in the computer program.
- each instruction has a unique address, through which the instruction can be loaded into the processor for execution.
- the instruction address can be represented as a number or a memory address, which is not limited in the embodiments of the present disclosure.
- Step S206 Execute each of the to-be-executed instructions according to the first instruction address of each of the to-be-executed instructions to process the to-be-processed tasks.
- the task to be processed is sent to the hardware accelerator located on the computer terminal. The hardware accelerator can determine the instruction set of the task to be processed according to preset instructions (i.e., the task to be processed is made programmable) and then process the task by executing the instructions in that instruction set, so the hardware accelerator can also flexibly process different tasks. Therefore, the problem of low flexibility in chip design can be solved, thereby achieving the effect of improving chip flexibility.
- the execution subject of the above steps may be a hardware accelerator in the terminal, etc., but is not limited thereto.
- step S202 can be implemented by the following steps:
- Step 11 Determine multiple logical processing modes of the task to be processed according to the parameter information and the requirement information;
- Step 12 Determine the instructions corresponding to the multiple logical processing modes, and determine the multiple instructions as the instruction set.
- the above-mentioned multiple logical processing methods can be understood as a method of determining the result obtained after the task is executed according to the parameter information and requirement information of the task to be processed.
- the above logic processing methods include but are not limited to:
- bit0 to bit11 are DA fields, used to indicate the destination address
- bit12 to bit23 are SA fields, used to indicate the source address
- bit24 to bit27 are the type field, used to indicate the frame type
- bit28 to bit31 are the subtype field, used to indicate the frame subtype.
- the message type is determined by the subtype field. For example, messages with subtype equal to 0 are classified into the first category, and other cases are classified into the second category.
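The field layout and the subtype-based classification described above can be sketched in software as follows. This is only an illustration, not the patent's hardware implementation: it assumes bit0 is the least significant bit of a 32-bit header word, and the function names `parse_header` and `classify` are hypothetical.

```python
def parse_header(word: int) -> dict:
    """Split a 32-bit header word into DA, SA, type and subtype fields."""
    return {
        "da": word & 0xFFF,             # bit0..bit11: destination address
        "sa": (word >> 12) & 0xFFF,     # bit12..bit23: source address
        "type": (word >> 24) & 0xF,     # bit24..bit27: frame type
        "subtype": (word >> 28) & 0xF,  # bit28..bit31: frame subtype
    }


def classify(word: int) -> int:
    """Return 1 (first category) when subtype is 0, otherwise 2."""
    return 1 if parse_header(word)["subtype"] == 0 else 2


# Example header: subtype=3, type=1, SA=0x0AB, DA=0x0CD
header = (0x3 << 28) | (0x1 << 24) | (0x0AB << 12) | 0x0CD
print(parse_header(header))
print(classify(header))
```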
- the instructions are as follows; the master CPU writes them into the instruction tightly coupled memory (ITCM) through register writes:
- LW (32’b000000000000_00000_010_00001_0000011) reads the data at DTCM address 0 into regfile address 1 (the LW instruction reads 32 bits of data from memory and writes it back to register rd);
- SW (32’b0000000_00000_00000_010_00001_0100011) writes 0 to DTCM address 1;
- SW (32’b0000000_00011_00000_010_00001_0100011) writes 1 to DTCM address 1.
- After the device is started, the downstream only needs to read the data in DTCM address 1. If it is 0, the message is of the second category; if it is 1, the message is of the first category.
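As a cross-check of the bit patterns quoted above, the following sketch packs the fields in the standard RV32I I-type and S-type layouts, which the disclosure says its instructions are based on. The helper names `encode_i` and `encode_s` are hypothetical. The second SW stores the content of register 3, so "writes 1" holds only under the assumption that register 3 contains 1 at that point.

```python
LW_OPCODE = 0b0000011
SW_OPCODE = 0b0100011
FUNCT3_WORD = 0b010


def encode_i(imm: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """I-type: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_s(imm: int, rs2: int, rs1: int, funct3: int, opcode: int) -> int:
    """S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) \
        | ((imm & 0x1F) << 7) | opcode


# LW: load DTCM[0 + 0] into register 1
lw = encode_i(imm=0, rs1=0, funct3=FUNCT3_WORD, rd=1, opcode=LW_OPCODE)
# SW: store register 0 (always 0) to DTCM[0 + 1]
sw_zero = encode_s(imm=1, rs2=0, rs1=0, funct3=FUNCT3_WORD, opcode=SW_OPCODE)
# SW: store register 3 to DTCM[0 + 1]
sw_reg3 = encode_s(imm=1, rs2=3, rs1=0, funct3=FUNCT3_WORD, opcode=SW_OPCODE)

for word in (lw, sw_zero, sw_reg3):
    print(format(word, "032b"))
```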
- DA, SA, and type need to be used as the basis for classification judgment.
- step S206 can be implemented by the following steps:
- Step 21 Processing step: obtaining a second instruction address in a program counter, wherein the program counter is used to store an instruction address of a next instruction to be executed;
- Step 22 determining the instruction to be executed corresponding to the second instruction address in the instruction set according to the first instruction address of each instruction to be executed, and executing the instruction to be executed;
- Step 23 Execute the processing steps in a loop until each instruction to be executed in the instruction set is completed.
- the program counter is a register in the computer that stores the address of the currently executed instruction or the address of the next instruction. It is continuously incremented during the execution of instructions to indicate the location of the next instruction to be executed. When the computer executes an instruction, the program counter automatically increments to the address of the next instruction so that the instruction can be fetched and executed.
- the value of the program counter is usually expressed in binary form and is updated in every clock cycle of the computer. Therefore, the instruction to be executed is determined according to the instruction address in the program counter until all instructions are executed.
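The loop formed by Steps 21 to 23 can be sketched as a simple fetch-and-execute routine. This is a minimal software model under stated assumptions: the instruction set is represented as a mapping from instruction address to instruction, execution stops when the program counter points at an address holding no instruction, and the names `run` and `execute` as well as the toy `nop`/`jump` instructions are hypothetical.

```python
def run(program: dict, execute, start: int = 0, max_steps: int = 1000) -> list:
    """Fetch the instruction at the program counter, execute it, repeat.

    `program` maps instruction address -> instruction.
    `execute(instr, pc)` returns the address of the next instruction.
    Returns the list of addresses that were executed, in order.
    """
    pc = start
    trace = []
    for _ in range(max_steps):      # guard against endless loops
        if pc not in program:       # assumed stop condition: nothing to fetch
            break
        trace.append(pc)
        pc = execute(program[pc], pc)
    return trace


def execute(instr: tuple, pc: int) -> int:
    """Toy executor: 'jump' sets the PC, everything else increments it."""
    if instr[0] == "jump":
        return instr[1]
    return pc + 1


program = {0: ("nop",), 1: ("jump", 3), 2: ("nop",), 3: ("nop",)}
trace = run(program, execute)
print(trace)  # [0, 1, 3]
```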
- the instructions need to be written.
- the instructions can be written in the following way: determine the second instruction information of the instruction to be written, wherein the second instruction information includes at least one of the following: a second instruction code, a second source operand, a second target operand, and a second immediate value, the second instruction code is used to indicate the function of the instruction to be written, and the second source operand and the second target operand are used to indicate the read and write addresses of the instruction to be written; determine the instruction to be written according to the second instruction information.
- the instructions to be written in the embodiments of the present disclosure include: DTCM read and write instructions, basic integer operation instructions (addition, comparison, OR, AND, shift), unconditional jump instructions, conditional jump instructions (equal, unequal, greater than, less than, greater than or equal to, less than or equal to).
- the contents of the instructions to be written include:
- the instruction to be executed is executed in the following manner, including: parsing the instruction to be executed to obtain first instruction information corresponding to the instruction to be executed, wherein the first instruction information includes at least one of the following: a first instruction code, a first source operand, a first target operand, and a first immediate number; generating a control signal corresponding to the first instruction information, and sending the control signal to the corresponding execution unit of the hardware accelerator to execute the instruction to be executed.
- generating a control signal corresponding to the first instruction information includes at least one of the following:
- when the first instruction information includes the first instruction code and the first source operand: generating a first sub-control signal according to the type of the first instruction code, and generating a second sub-control signal for reading an operand register according to the first source operand, wherein the control signal includes the first sub-control signal and the second sub-control signal;
- when the first instruction information includes the first instruction code, the first source operand and the first target operand: generating a first sub-control signal according to the type of the first instruction code, generating a second sub-control signal for reading an operand register according to the first source operand, and generating a third sub-control signal for writing into a general register according to the first target operand, wherein the control signal includes the first sub-control signal, the second sub-control signal and the third sub-control signal;
- when the first instruction information includes the first instruction code, the first immediate number and the first target operand: generating a first sub-control signal according to the type
- the first sub-control signal is a control signal for implementing operations such as addition, multiplication, displacement, and logical operations;
- the second sub-control signal is a control signal for reading the operand register;
- the third sub-control signal is a control signal for writing target information to the target location of the general register;
- the fourth sub-control signal is a control signal for reading the first immediate value.
- the Add Immediate (ADDI) instruction adds the integer value in the operand register rs1 to the 12-bit immediate value (with sign bit extension) and writes the result back to the register rd.
- when the ADDI instruction is parsed, the first sub-control signal is the control signal for the addition operation, the second sub-control signal is the control signal for reading the integer value in the operand register rs1, and the third sub-control signal is the control signal for writing the result back to the general register rd.
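The parsing and control-signal generation for ADDI can be illustrated with the sketch below. It assumes the standard RV32I I-type field layout and ADDI encoding (opcode 0010011, funct3 000); the dictionary keys standing in for the four sub-control signals are hypothetical names, not signals defined in the disclosure.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def decode_i_type(instr: int) -> dict:
    """Extract instruction code, operands and immediate from an I-type word."""
    return {
        "opcode": instr & 0x7F,
        "rd": (instr >> 7) & 0x1F,
        "funct3": (instr >> 12) & 0x7,
        "rs1": (instr >> 15) & 0x1F,
        "imm": sign_extend(instr >> 20, 12),
    }


def control_signals(fields: dict) -> dict:
    """Map decoded fields to the four kinds of sub-control signals."""
    is_addi = fields["opcode"] == 0b0010011 and fields["funct3"] == 0b000
    return {
        "alu_op": "add" if is_addi else None,  # first: which operation to run
        "read_reg": fields["rs1"],             # second: operand register to read
        "write_reg": fields["rd"],             # third: general register to write
        "imm": fields["imm"],                  # fourth: immediate value to read
    }


# addi x5, x2, -1  ->  imm=0xFFF, rs1=2, funct3=000, rd=5, opcode=0010011
signals = control_signals(decode_i_type(0xFFF10293))
print(signals)
```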
- FIG3 is a flow chart of the method for processing a task to be processed according to an optional embodiment of the present disclosure. As shown in FIG3 , the details are as follows:
- the disclosed embodiment uses a two-stage pipeline, in which instruction fetch takes one clock cycle (beat), and decoding, execution, memory access, and write-back together take one clock cycle. By arranging different instructions, the state machine can be programmed.
- FIG. 4 is an internal structure diagram of a hardware accelerator according to an embodiment of the present disclosure. As shown in FIG. 4 , the hardware accelerator specifically includes:
- Data Tightly-Coupled Memory (DTCM);
- Instruction Fetch Unit (IFU);
- Instruction Tightly-Coupled Memory (ITCM);
- Instruction Decode Unit (DeCode for short), configured to decode instructions;
- Execution Unit (EXU);
- Register File (RegFile for short), configured to carry out operations on the internal general registers.
- the present disclosure imposes certain constraints on the input content (instructions), that is, a solution for instruction design is provided in the embodiment of the present disclosure, specifically:
- the instructions include DTCM read and write instructions, basic integer operation instructions (addition, comparison, OR, AND, shift), unconditional jump instructions, and conditional jump instructions (equal, unequal, greater than, less than, greater than or equal to, less than or equal to).
- the content of the instruction needs to include:
- the instruction code (required), used to identify the action the instruction is intended to complete;
- the regfile consists of 32 general registers, each 32 bits wide, which are used together with the instructions, as shown in Figure 5.
- the general register at address 0 is always 0 and cannot be written.
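A minimal software model of such a register file is sketched below: 32 registers of 32 bits, with register 0 reading as 0 and ignoring writes. The class name `RegFile` and its methods are hypothetical and only illustrate the behaviour described above.

```python
class RegFile:
    """32 general registers, 32 bits each; register 0 is hard-wired to 0."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, index: int) -> int:
        return self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index == 0:
            return                              # writes to register 0 are dropped
        self._regs[index] = value & 0xFFFFFFFF  # keep only the low 32 bits


rf = RegFile()
rf.write(0, 5)              # ignored
rf.write(3, 0x1_0000_0001)  # truncated to 32 bits
print(rf.read(0), rf.read(3))
```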
- PC is the address of the instruction currently being processed.
- the designed instructions are written into the ITCM by the CPU and the device is started; the IFU controls the instruction reading address and reads the instruction; the DeCode completes the instruction parsing; and the EXU completes the instruction execution and data writeback.
- the instructions designed in the disclosed embodiments are based on the RV32I base instruction set of RISC-V, directly borrowing some instructions and supplementing them with some custom instructions.
- the instruction design can be freely defined and expanded.
- the necessary instructions are read and write instructions, basic logical operation instructions and jump instructions.
- This group of instructions performs memory read or write operations, and the address used to access the memory is obtained by adding the value in the operand register rs1 to a 12-bit immediate value (with sign bit extension).
- the LW instruction as shown in Figure 6, reads 32 bits of data from the memory and writes it back to register rd;
- the SW instruction as shown in Figure 7, writes the 32-bit data in operand register rs2 back to the memory;
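The address calculation shared by LW and SW (value of rs1 plus the sign-extended 12-bit immediate) can be sketched as follows. The model assumes a word-addressed DTCM represented as a dictionary, matching the "DTCM address 0 / address 1" usage earlier in the text; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sext12(imm: int) -> int:
    """Sign-extend a 12-bit immediate."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def effective_address(regs: list, rs1: int, imm: int) -> int:
    """Memory address = value of rs1 + sign-extended 12-bit immediate."""
    return (regs[rs1] + sext12(imm)) & MASK32


def lw(regs: list, dtcm: dict, rd: int, rs1: int, imm: int) -> None:
    """Read 32 bits from memory and write them back to register rd."""
    value = dtcm.get(effective_address(regs, rs1, imm), 0)
    if rd != 0:                       # register 0 cannot be written
        regs[rd] = value & MASK32


def sw(regs: list, dtcm: dict, rs2: int, rs1: int, imm: int) -> None:
    """Write the 32-bit value of register rs2 to memory."""
    dtcm[effective_address(regs, rs1, imm)] = regs[rs2] & MASK32


regs = [0] * 32
dtcm = {0: 0x12345678}
lw(regs, dtcm, rd=1, rs1=0, imm=0)       # regs[1] <- DTCM[0]
sw(regs, dtcm, rs2=1, rs1=0, imm=1)      # DTCM[1] <- regs[1]
regs[2] = 5
sw(regs, dtcm, rs2=1, rs1=2, imm=0xFFF)  # immediate is -1, so address is 4
print(regs[1], sorted(dtcm))
```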
- This group of instructions performs basic integer operations on registers and immediate values.
- the ADDI instruction adds the integer value in the operand register rs1 to the 12-bit immediate value (with sign bit extension), and writes the result back to the register rd. If the result overflows, no special processing is required: the overflowed bits are discarded, leaving only the lower 32 bits of the result.
- the SLTIU instruction compares the integer value in operand register rs1 with the 12-bit immediate value (with sign bit extension), treating both as unsigned numbers. If the value in rs1 is less than the immediate value, the result is 1; otherwise it is 0. The result is written back to register rd.
- the ANDI instruction performs an AND operation on the integer value in operand register rs1 and the 12-bit immediate value (with sign bit extension), and writes the result back to register rd.
- the ORI instruction performs an OR operation on the integer value in the operand register rs1 and the 12-bit immediate value (with sign bit extension), and writes the result back to the register rd.
- the XORI instruction performs an exclusive OR (XOR) operation on the integer value in the operand register rs1 and the 12-bit immediate value (with sign bit extension), and writes the result back to the register rd.
- the SLLI instruction performs a logical left shift operation on the integer value in the operand register rs1 (filling the low bits with 0), the shift amount is a 5-bit immediate value, and the result is written back to the register rd.
- the SRLI instruction performs a logical right shift operation on the integer value in the operand register rs1 (filling the high bits with 0), the shift amount is a 5-bit immediate value, and the result is written back to the register rd.
- the LUI instruction shifts the value of the 20-bit immediate number left by 12 bits (fills the lower 12 bits with 0) to become a 32-bit number, and writes the number back to register rd.
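The arithmetic of this instruction group can be summarised in a short reference model. It is a sketch that follows the textual descriptions above rather than any particular hardware; `alu_imm` and `lui` are hypothetical names, and the rs1 value is assumed to already be an unsigned 32-bit number.

```python
MASK32 = 0xFFFFFFFF


def sext12(imm: int) -> int:
    """Sign-extend a 12-bit immediate."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def alu_imm(op: str, rs1_val: int, imm: int) -> int:
    """Result written to rd for the register/immediate instructions."""
    ext = sext12(imm) & MASK32          # sign-extended, viewed as 32 bits
    if op == "ADDI":
        return (rs1_val + ext) & MASK32  # overflow bits are discarded
    if op == "SLTIU":
        return 1 if rs1_val < ext else 0  # unsigned comparison
    if op == "ANDI":
        return rs1_val & ext
    if op == "ORI":
        return rs1_val | ext
    if op == "XORI":
        return rs1_val ^ ext
    if op == "SLLI":
        return (rs1_val << (imm & 0x1F)) & MASK32  # low bits filled with 0
    if op == "SRLI":
        return rs1_val >> (imm & 0x1F)             # high bits filled with 0
    raise ValueError(f"unknown operation: {op}")


def lui(imm20: int) -> int:
    """Shift the 20-bit immediate left by 12, low 12 bits are 0."""
    return ((imm20 & 0xFFFFF) << 12) & MASK32


print(hex(alu_imm("ADDI", 0xFFFFFFFF, 1)))  # 0x0
print(hex(lui(0x12345)))                    # 0x12345000
```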
- Figure 11 shows an unconditional jump instruction
- the JAL instruction uses a 20-bit immediate number (signed) as an offset and adds it to the PC of the instruction to generate the final jump target address.
- the JAL instruction also writes the PC of the next instruction (that is, the current instruction PC+1) into its result register rd.
- the JALR instruction uses a 12-bit immediate number (signed) as an offset and adds it to the value in operand register rs1 to obtain the final jump target address.
- the JALR instruction also writes the PC of the next instruction (that is, the current instruction PC+1) into its result register rd.
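The target and link-value calculations for the two unconditional jumps can be sketched as below. The sketch follows the text literally: the offset is added without scaling and the link value is PC+1, which differs from standard RV32I (byte addressing, PC+4). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def jal(pc: int, imm20: int) -> tuple:
    """Return (jump target, value written to rd) for JAL."""
    target = (pc + sign_extend(imm20, 20)) & MASK32
    return target, pc + 1


def jalr(pc: int, rs1_val: int, imm12: int) -> tuple:
    """Return (jump target, value written to rd) for JALR."""
    target = (rs1_val + sign_extend(imm12, 12)) & MASK32
    return target, pc + 1


print(jal(10, 0xFFFFE))  # offset -2 -> (8, 11)
print(jalr(10, 100, 4))  # (104, 11)
```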
- Figure 12 shows the conditional jump instruction
- This group of instructions is a conditional jump instruction, which uses a 12-bit immediate number (signed number) as an offset, and then adds it to the PC of the instruction to generate the final jump target address.
- the conditional jump instruction needs to jump only when the condition is true, as follows.
- the BEQI instruction jumps only if the value in operand register rs1 is equal to the 5-bit immediate value.
- the BNEI instruction jumps only if the value in operand register rs1 is not equal to the 5-bit immediate value.
- the BGTUI instruction jumps only if the unsigned number in operand register rs1 is greater than the unsigned 5-bit immediate value.
- the BLEUI instruction jumps only if the unsigned number in operand register rs1 is less than or equal to the unsigned 5-bit immediate value.
- the BLTUI instruction jumps only if the unsigned number in operand register rs1 is less than the unsigned 5-bit immediate value.
- the BGEUI instruction jumps only if the unsigned number in operand register rs1 is greater than or equal to the unsigned 5-bit immediate value.
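The six conditions can be captured in a small lookup table. This sketch assumes that a branch that is not taken continues at PC+1 (consistent with the PC+1 convention used for the jump instructions above) and that the 5-bit immediate is compared as an unsigned value; the `branch` function and the `CONDITIONS` table are hypothetical names.

```python
import operator

MASK32 = 0xFFFFFFFF

# Comparison applied to (unsigned rs1 value, unsigned 5-bit immediate)
CONDITIONS = {
    "BEQI": operator.eq,
    "BNEI": operator.ne,
    "BGTUI": operator.gt,
    "BLEUI": operator.le,
    "BLTUI": operator.lt,
    "BGEUI": operator.ge,
}


def sext12(imm: int) -> int:
    """Sign-extend a 12-bit offset."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def branch(op: str, pc: int, rs1_val: int, imm5: int, offset12: int) -> int:
    """Return the next PC: pc + offset if the condition holds, else pc + 1."""
    taken = CONDITIONS[op](rs1_val & MASK32, imm5 & 0x1F)
    return (pc + sext12(offset12)) & MASK32 if taken else pc + 1


print(branch("BEQI", 4, 3, 3, 0xFFE))  # taken, offset -2 -> 2
print(branch("BEQI", 4, 2, 3, 0xFFE))  # not taken -> 5
```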
- the disclosed embodiment designs a hardware architecture and imposes certain constraints on its input content (instructions). On top of a conventional hardware design it adds software-like programmability, making the state machine and even the logic functions programmable, and it provides a degree of flexibility without increasing chip power consumption or area. To a certain extent, this resolves the chip-design dilemma of needing both flexibility and low cost.
- the computer software product is stored in a storage medium (such as a read-only memory/random access memory (ROM/RAM), a disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods described in the various embodiments of the present disclosure.
- a processing system for a task to be processed is also provided, and the system is used to implement the above-mentioned embodiment and preferred implementation mode, and the descriptions that have been made are not repeated.
- the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
- although the system described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceived.
- FIG. 14 is a structural block diagram of the processing system for the task to be processed according to the embodiment of the present disclosure. As shown in FIG. 14 , the system includes:
- the instruction extraction unit 1402 is configured to determine an instruction set corresponding to the task to be processed according to parameter information and requirement information of the task to be processed, wherein the requirement information is used to indicate a processing result corresponding to the task to be processed; and determine an instruction address of each instruction to be executed in the instruction set;
- the execution unit 1404 is configured to execute each of the to-be-executed instructions according to the instruction address of each of the to-be-executed instructions to process the to-be-processed tasks.
- since the hardware accelerator can determine the instruction set of the task to be processed according to preset instructions (i.e., the task to be processed is made programmable) and then process the task by executing the instructions in that instruction set, the hardware accelerator can also flexibly process different tasks. Therefore, the problem of low flexibility in chip design can be solved, thereby achieving the effect of improving chip flexibility.
- the processing system for the tasks to be processed in the embodiment of the present disclosure is located in the above-mentioned hardware accelerator, and the hardware accelerator processes the tasks to be processed through the processing system for the tasks to be processed. Therefore, the system structure of the processing system for the tasks to be processed in the embodiment of the present disclosure is similar to the internal structure of the above-mentioned hardware accelerator.
- the processing system for the task to be processed also includes: a data tightly coupled memory, an instruction tightly coupled memory unit, an instruction decoding unit, an execution unit, and a register file.
- a register file is a component inside a processing system that is used to store and manage system registers.
- a register is a high-speed memory inside a system that is used to temporarily store instructions and data.
- a regfile usually contains multiple registers, such as general registers, program counter (PC), etc.
- a register file can provide the system with fast data storage and access capabilities to support the execution of instructions and the processing of data.
- the instruction extraction unit 1402 is configured to determine, based on the parameter information and the requirement information, a plurality of logical processing modes for the task to be processed; determine instructions corresponding to the plurality of logical processing modes; and determine the plurality of instructions as the instruction set.
- the instruction extraction unit is further configured to obtain a second instruction address in a program counter, wherein the program counter is used to store the instruction address of the next instruction to be executed; determine the instruction to be executed corresponding to the second instruction address in the instruction set based on the first instruction address of each instruction to be executed; and the execution unit is further configured to execute the instruction to be executed.
- an instruction decoding unit is configured to parse the instruction to be executed to obtain first instruction information corresponding to the instruction to be executed, wherein the first instruction information includes at least one of the following: a first instruction code, a first source operand, a first target operand, and a first immediate number; generate a control signal corresponding to the first instruction information, and send the control signal to a corresponding execution unit to execute the instruction to be executed.
- the instruction decoding unit is further configured to perform one of the following:
- when the first instruction information includes the first instruction code and the first source operand, generating a first sub-control signal according to the type of the first instruction code, and generating a second sub-control signal for reading an operand register according to the first source operand, wherein the control signal includes: the first sub-control signal and the second sub-control signal;
- when the first instruction information includes the first instruction code, the first source operand, and the first target operand, generating a first sub-control signal according to the type of the first instruction code, generating a second sub-control signal for reading an operand register according to the first source operand, and generating a third sub-control signal for writing into a general register according to the first target operand, wherein the control signal includes: the first sub-control signal, the second sub-control signal, and the third sub-control signal;
- when the first instruction information includes the first instruction code, the first immediate value, and the first target operand, generating a first sub-control signal according to the type of the first instruction code, generating a fourth sub-control signal for reading the first immediate value, and generating a third sub-control signal for writing into a general register according to the first target operand, wherein the control signal includes: the first sub-control signal, the fourth sub-control signal, and the third sub-control signal.
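By way of illustration only, the three decoding cases can be sketched as a function that maps the parsed instruction information to a list of sub-control signals. The names `InstructionInfo` and `generate_control_signals`, and the string form of the signals, are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstructionInfo:
    """Parsed instruction information; absent fields are None (hypothetical type)."""
    code: str                         # instruction code
    source: Optional[int] = None      # source operand (register address)
    target: Optional[int] = None      # target operand (register address)
    immediate: Optional[int] = None   # immediate value


def generate_control_signals(info: InstructionInfo) -> List[str]:
    """Build the control signal as a list of sub-control signals."""
    signals = [f"op:{info.code}"]                       # first sub-control signal
    if info.source is not None:
        signals.append(f"read_reg:{info.source}")       # second sub-control signal
    if info.immediate is not None:
        signals.append(f"read_imm:{info.immediate}")    # fourth sub-control signal
    if info.target is not None:
        signals.append(f"write_reg:{info.target}")      # third sub-control signal
    return signals


case_one = generate_control_signals(InstructionInfo("BEQI", source=1))
case_two = generate_control_signals(InstructionInfo("SLLI", source=1, target=2))
case_three = generate_control_signals(InstructionInfo("ADDI", immediate=1, target=3))
```

Representing each sub-control signal as a string keeps the sketch readable; in hardware these would be separate control lines driven by the decoding unit.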
- the above-mentioned device also includes: an instruction writing unit, configured to determine second instruction information of the instruction to be written, wherein the second instruction information includes at least one of the following: a second instruction code, a second source operand, a second target operand, and a second immediate value, the second instruction code is used to indicate the function of the instruction to be written, and the second source operand and the second target operand are used to indicate the read and write addresses of the instruction to be written; the instruction to be written is determined according to the second instruction information.
- the programmed instructions are stored in the instruction tightly coupled memory unit, and the instruction set is also stored in the instruction tightly coupled memory unit before being fetched.
- the above modules can be implemented by software or hardware. For the latter, this can be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
- An embodiment of the present disclosure further provides a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the steps of any of the above method embodiments when running.
- the computer-readable storage medium may include, but is not limited to, various media that can store computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, a magnetic disk, or a CD.
- An embodiment of the present disclosure further provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
- the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
- the modules or steps of the present disclosure can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network composed of multiple computing devices, and they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be executed in a different order than described here, or the modules or steps can each be made into an individual integrated circuit module, or several of them can be made into a single integrated circuit module.
- the present disclosure is not limited to any specific combination of hardware and software.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Geometry (AREA)
- Evolutionary Computation (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Executing Machine-Instructions (AREA)
Description
This disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on August 15, 2023, with application number 202311034305.X and entitled "Processing method and system for pending tasks, storage medium and electronic device", the entire contents of which are incorporated herein by reference.
The embodiments of the present disclosure relate to the field of communications, and in particular to a method and system for processing a task to be processed, a storage medium, and an electronic device.
Existing large-scale ASIC chips generally combine a CPU with hardware accelerators. The CPU has a large area and high power consumption but is highly versatile; it performs poorly on complex services and is generally used for flexible, changeable services. Specific complex services, or complex services that need little flexibility, are generally implemented with a hardware accelerator, which improves performance and saves power and area but is inflexible.
One of the major design difficulties for ASIC chips is the division between software and hardware, that is, which services are implemented by the CPU and which by the hardware accelerator. Because the chip development cycle is long, the scenarios in commercial use often differ considerably from those envisaged early in the design. A certain degree of flexibility therefore has to be reserved early in the chip design, but this increases chip cost, which creates a dilemma.
No effective solution has yet been proposed for problems in the related art such as the low flexibility of chip design.
Summary of the invention
The embodiments of the present disclosure provide a method and system for processing a task to be processed, a storage medium, and an electronic device, so as to at least solve the problem of low flexibility in chip design in the related art.
According to an embodiment of the present disclosure, a method for processing a task to be processed is provided, which is applied to a hardware accelerator and includes: determining an instruction set corresponding to the task to be processed based on parameter information and requirement information of the task to be processed, wherein the requirement information is used to indicate a processing result corresponding to the task to be processed; determining a first instruction address of each instruction to be executed in the instruction set; and executing each instruction to be executed based on the first instruction address of each instruction to be executed, so as to process the task to be processed.
According to another embodiment of the present disclosure, a system for processing a task to be processed is provided, including: an instruction extraction unit, configured to determine an instruction set corresponding to the task to be processed based on parameter information and requirement information of the task to be processed, wherein the requirement information is used to indicate a processing result corresponding to the task to be processed, and to determine an instruction address of each instruction to be executed in the instruction set; and an execution unit, configured to execute each instruction to be executed according to the instruction address of each instruction to be executed, so as to process the task to be processed.
According to another embodiment of the present disclosure, a computer-readable storage medium is provided, in which a computer program is stored, wherein the computer program is configured to execute the steps of any one of the above method embodiments when running.
According to another embodiment of the present disclosure, an electronic device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
FIG. 1 is a hardware structure block diagram of a computer terminal for a method for processing a task to be processed according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for processing a task to be processed according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a method for processing a task to be processed according to an optional embodiment of the present disclosure;
FIG. 4 is an internal structure diagram of a hardware accelerator according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a general register according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram (I) of an instruction according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram (II) of an instruction according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram (III) of an instruction according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram (IV) of an instruction according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram (V) of an instruction according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram (VI) of an instruction according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram (VII) of an instruction according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a packet header format according to an embodiment of the present disclosure;
FIG. 14 is a structural block diagram of a system for processing a task to be processed according to an embodiment of the present disclosure.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings and in combination with the embodiments.
It should be noted that the terms "first", "second", etc. in the specification and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.
The method embodiments provided in the embodiments of the present disclosure can be executed in a computer terminal or a similar computing device. Taking execution on a computer terminal as an example, FIG. 1 is a hardware structure block diagram of a computer terminal for a method for processing a task to be processed according to an embodiment of the present disclosure. As shown in FIG. 1, the computer terminal may include one or more processors 102 (only one is shown in FIG. 1) and a memory 104 for storing data. The processor 102 may include, but is not limited to, a processing device such as a hardware accelerator, a microcontroller unit (MCU), a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a digital signal processor, a floating-point unit, a coprocessor, a multi-core processor, or a multi-threaded processor. The computer terminal may also include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the computer terminal. For example, the computer terminal may include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the method for processing a task to be processed in the embodiments of the present disclosure. The processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, that is, it implements the above method. The memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory arranged remotely from the processor 102, and this remote memory may be connected to the computer terminal via a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is configured to receive or send data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the computer terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 can be a radio frequency (RF) module, which is configured to communicate with the Internet wirelessly.
In this embodiment, a method for processing a task to be processed that runs on the above computer terminal is provided. FIG. 2 is a flow chart of the method for processing a task to be processed according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:
Step S202, determining an instruction set corresponding to the task to be processed according to parameter information and requirement information of the task to be processed, wherein the requirement information is used to indicate a processing result corresponding to the task to be processed;
It should be noted that the requirement information of a task describes its specific requirements and goals, including its content, time requirements, quantity requirements, and so on. The parameter information refers to the parameters used in the task, including input parameters and output parameters. Input parameters are the data supplied when the task is executed, and output parameters are the results obtained after the task completes. These parameters can be data of different types, such as numbers, text, dates, and files. During execution, the task is carried out and its result is output according to the requirement information and the parameter information.
Step S204, determining the first instruction address of each instruction to be executed in the instruction set;
It should be noted that an instruction address is the address of the memory location or storage unit that holds a specific instruction to be executed in a computer program. In a computer's instruction set architecture, each instruction has a unique address, through which the instruction can be loaded into the processor for execution. The instruction address can be represented as a number or a memory address, which is not limited in the embodiments of the present disclosure.
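By way of illustration only, step S204 can be pictured as assigning one unique address to each instruction in the instruction set. The sketch below assumes consecutive word addresses starting from a base of 0; the function name `assign_addresses` and the placeholder instruction words are hypothetical and are not specified by the disclosure.

```python
from typing import Dict, List


def assign_addresses(instructions: List[int], base: int = 0) -> Dict[int, int]:
    """Map each instruction address to its instruction word.

    Assumes one address per instruction and consecutive addresses from `base`.
    """
    return {base + offset: word for offset, word in enumerate(instructions)}


# Placeholder 32-bit words standing in for a real instruction set.
instruction_set = [0x11111111, 0x22222222, 0x33333333]
instruction_memory = assign_addresses(instruction_set)
```

A dictionary keyed by address makes the uniqueness of each instruction address explicit; a plain list indexed by address would serve equally well under the same assumption.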
Step S206, executing each instruction to be executed according to the first instruction address of each instruction to be executed, so as to process the task to be processed.
Through the above steps, when the computer terminal receives a task to be processed, it sends the task to the hardware accelerator located on the computer terminal. Because the hardware accelerator can determine the instruction set of the task to be processed according to preset instructions (that is, it makes the task to be processed programmable) and then processes the task by executing the instructions in the instruction set, the hardware accelerator can also flexibly process different tasks. Therefore, the problem of low flexibility in chip design can be solved, thereby improving chip flexibility.
The above steps may be executed by a hardware accelerator in the terminal or the like, but are not limited thereto.
Optionally, the above step S202 can be implemented by the following steps:
Step 11: determine multiple logical processing modes of the task to be processed according to the parameter information and the requirement information;
Step 12: determine the instructions corresponding to the multiple logical processing modes, and determine the multiple instructions as the instruction set.
That is to say, the multiple logical processing modes can be understood as the ways, determined from the parameter information and requirement information of the task to be processed, of obtaining the result of the task after it has been executed.
The above logical processing modes include, but are not limited to:
1. AND operation: when both logical values are true, the result is true; otherwise, the result is false.
2. OR operation: when at least one of the two logical values is true, the result is true; otherwise, the result is false.
3. NOT operation: inverts a logical value, that is, true becomes false and false becomes true.
For example, take the parsing and classification of messages. Assume a packet header format as shown in FIG. 13, where bit0 to bit11 are the DA field, used to indicate the destination address; bit12 to bit23 are the SA field, used to indicate the source address; bit24 to bit27 are the type field, used to indicate the frame type; and bit28 to bit31 are the subtype field, used to indicate the frame subtype.
In the embodiment of the present disclosure, the message category is determined by the subtype field. For example, messages with subtype equal to 0 are classified into the first category, and all other messages are classified into the second category.
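By way of illustration only, the field layout and the classification rule can be sketched in software as follows. The sketch assumes that bit0 is the most significant bit of the 32-bit header word; the disclosure does not state the bit ordering, and the names `FIELDS`, `extract_field`, and `classify` are hypothetical.

```python
HEADER_BITS = 32

# Field layout from the description: (first bit, last bit), both inclusive.
FIELDS = {
    "DA": (0, 11),
    "SA": (12, 23),
    "type": (24, 27),
    "subtype": (28, 31),
}


def extract_field(header: int, name: str) -> int:
    """Return the named field, assuming bit0 is the most significant bit."""
    first, last = FIELDS[name]
    width = last - first + 1
    shift = HEADER_BITS - 1 - last  # distance of the field's last bit from the LSB
    return (header >> shift) & ((1 << width) - 1)


def classify(header: int) -> int:
    """Return 1 for the first category (subtype == 0), otherwise 2."""
    return 1 if extract_field(header, "subtype") == 0 else 2


example_header = 0xABCDEF50  # DA=0xABC, SA=0xDEF, type=0x5, subtype=0x0
```

Changing the classification criterion to another field then only requires a different entry of `FIELDS` in `classify`, which mirrors the flexibility the embodiment aims for.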
The packet header is written to address 0 of the data tightly coupled memory (DTCM) by the upstream logic.
The following instructions are written and then stored in the instruction tightly coupled memory (ITCM) by having the master CPU write registers:
0: ADDI (32'b000000000001_00000_000_00011_0010011): initializes the data at regfile address 3 to 1;
1: LW (32'b000000000000_00000_010_00001_0000011): reads the data at DTCM address 0 into regfile address 1 (the LW instruction reads 32 bits of data back from memory and writes them to register rd);
2: SLLI (32'b0000000_11100_00001_001_00010_0010011): shifts the data at regfile address 1 left by 28 bits and writes the result to regfile address 2;
3: SRLI (32'b0000000_11100_00010_101_00001_0010011): shifts the data at regfile address 2 right by 28 bits and writes the result to regfile address 1;
4: BEQI (32'b0000000_00000_00001_010_00010_1100111): if the data at regfile address 1 equals 0, jumps to the instruction at line 6; otherwise execution continues with the instruction at line 5;
5: SW (32'b0000000_00000_00000_010_00001_0100011): writes 0 to DTCM address 1;
6: SW (32'b0000000_00011_00000_010_00001_0100011): writes 1 to DTCM address 1.
After the device is started, the downstream logic only needs to read the data at DTCM address 1: 0 indicates that the message belongs to the second category, and 1 indicates that it belongs to the first category.
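By way of illustration only, the data flow of the example program can be followed in software: a left shift by 28 bits followed by a logical right shift by 28 bits keeps only the lowest four bits of the 32-bit word loaded from DTCM address 0, and the comparison with 0 selects the value the downstream logic reads from DTCM address 1. The helper names below are hypothetical, and the sketch models only the intended result, not the hardware.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32 bits wide


def slli(value: int, shamt: int) -> int:
    """Logical left shift within 32 bits."""
    return (value << shamt) & MASK32


def srli(value: int, shamt: int) -> int:
    """Logical right shift within 32 bits."""
    return (value & MASK32) >> shamt


def classify_header(header: int) -> int:
    """Return the value the downstream logic is expected to read from DTCM address 1."""
    r1 = header & MASK32   # instruction 1: load the header from DTCM address 0
    r2 = slli(r1, 28)      # instruction 2: discard the upper 28 bits
    r1 = srli(r2, 28)      # instruction 3: move the remaining 4 bits to the bottom
    return 1 if r1 == 0 else 0  # 1: first category, 0: second category


first_category = classify_header(0xABCDEF50)
second_category = classify_header(0xABCDEF53)
```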
Optionally, if the message classification requirement changes so that DA, SA, and type must be used as the classification criteria, only the instruction combination in the ITCM needs to be modified in order to evaluate any combination of fields in the 32-bit packet header. The classification is thus no longer constrained by fixed hardware logic, which improves the flexibility of the chip.
Optionally, the above step S206 can be implemented by the following steps:
Step 21: a processing step: obtain a second instruction address from a program counter, wherein the program counter is used to store the instruction address of the next instruction to be executed;
Step 22: determine, according to the first instruction address of each instruction to be executed, the instruction to be executed in the instruction set that corresponds to the second instruction address, and execute that instruction;
Step 23: repeat the processing step until every instruction to be executed in the instruction set has been executed.
It should be noted that the program counter is a register in a computer that stores the address of the instruction currently being executed or of the next instruction. It is incremented continually during instruction execution to indicate the location of the next instruction to be executed. When the computer executes an instruction, the program counter is automatically incremented to the address of the next instruction so that the instruction can be fetched and executed. The value of the program counter is usually expressed in binary form and is updated in every clock cycle. The instruction to be executed is therefore determined according to the instruction address in the program counter until all instructions have been executed.
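By way of illustration only, steps 21 to 23 can be sketched as a fetch-and-execute loop driven by a program counter. The sketch assumes that execution ends when the program counter no longer points at an instruction in the instruction set; the disclosure does not specify the stop condition, and `run` and `step` are hypothetical names.

```python
from typing import Callable, Dict, List


def run(program: Dict[int, int],
        execute: Callable[[int, int], int],
        start: int = 0) -> int:
    """Fetch by program counter and execute until no instruction is found.

    `execute(pc, word)` performs the instruction and returns the next
    program-counter value. Returns the number of executed instructions.
    """
    pc = start
    executed = 0
    while pc in program:           # step 23: repeat while instructions remain
        word = program[pc]         # steps 21 and 22: look up the instruction at PC
        pc = execute(pc, word)     # step 22: execute it and update the PC
        executed += 1
    return executed


trace: List[int] = []


def step(pc: int, word: int) -> int:
    """Toy executor: record the word and advance to the next address."""
    trace.append(word)
    return pc + 1


executed_count = run({0: 0xA, 1: 0xB, 2: 0xC}, step)
```

Letting the executor return the next program-counter value keeps jumps and sequential execution under one interface: a jump instruction simply returns its target address instead of `pc + 1`.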
Optionally, before the instruction set corresponding to the task to be processed is obtained, the instructions need to be written. Specifically, the instructions can be written in the following way: determine second instruction information of the instruction to be written, wherein the second instruction information includes at least one of the following: a second instruction code, a second source operand, a second target operand, and a second immediate value, the second instruction code is used to indicate the function of the instruction to be written, and the second source operand and the second target operand are used to indicate the read and write addresses of the instruction to be written; and determine the instruction to be written according to the second instruction information.
The instructions to be written in the embodiments of the present disclosure include DTCM read and write instructions, basic integer operation instructions (add, compare, OR, AND, shift), unconditional jump instructions, and conditional jump instructions (equal, not equal, greater than, less than, greater than or equal to, less than or equal to).
The contents of an instruction to be written include:
1) Instruction code (required), used to identify the action the instruction is intended to perform;
2) Operation address (required): a DTCM read/write address (corresponding to the source operand in the above embodiment) or a regfile read/write address (corresponding to the target operand in the above embodiment);
3) Immediate value (optional): a specific operand value.
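By way of illustration only, writing an instruction from its instruction code, operation addresses, and immediate value can be sketched as packing those fields into a 32-bit word. The 12/5/3/5/7-bit grouping is read off the ADDI word in the example program; the field names, the function name `encode_instruction`, and the restriction to non-negative immediates are assumptions.

```python
def encode_instruction(imm: int, src: int, sub_op: int, dst: int, opcode: int) -> int:
    """Pack the fields into a 32-bit word: imm[12] | src[5] | sub_op[3] | dst[5] | opcode[7]."""
    fields = (("imm", imm, 12), ("src", src, 5), ("sub_op", sub_op, 3),
              ("dst", dst, 5), ("opcode", opcode, 7))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (imm << 20) | (src << 15) | (sub_op << 12) | (dst << 7) | opcode


# The ADDI word from the example program: regfile address 3 is initialized to 1.
addi_word = encode_instruction(imm=1, src=0, sub_op=0, dst=3, opcode=0b0010011)
```

The range check makes an out-of-range field fail loudly instead of silently corrupting a neighbouring field.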
Optionally, the instruction to be executed is executed in the following manner: parsing the instruction to be executed to obtain first instruction information corresponding to the instruction to be executed, wherein the first instruction information includes at least one of the following: a first instruction code, a first source operand, a first target operand, and a first immediate value; and generating a control signal corresponding to the first instruction information and sending the control signal to the corresponding execution unit of the hardware accelerator to execute the instruction to be executed.
That is to say, in the embodiment of the present disclosure, the instruction to be executed is executed by the execution unit of the hardware accelerator, and the execution unit performs the corresponding arithmetic and logical operations, such as addition, multiplication, shifts, and logical operations.
具体地,生成所述第一指令信息对应的控制信号,至少包括以下之一:在所述第一指令信息包括:所述第一指令编码和所述第一源操作数的情况下,根据所述第一指令编码的类型生成第一子控制信息,以及根据所述第一源操作数生成读取操作数寄存器的第二子控制信号,其中,所述控制信号包括:所述第一子控制信号和所述第二子控制信号;在所述第一指令信息包括:所述第一指令编码、所述第一源操作数和所述第一目标操作数的情况下,根据所述第一指令编码的类型生成第一子控制信息,根据所述第一源操作数生成读取操作数寄存器的第二子控制信号,以及根据所述第一目标操作数生成写入通用寄存器的第三子控制信号,其中,所述控制信号包括:所述第一子控制信号、所述第二子控制信号和所述第三子控制信号;在所述第一指令信息包括:所述第一指令编码、所述第一立即数和所述第一目标操作数的情况下,根据所述第一指令编码的类型生成第一子控制信息,生成读取所述第一立即数的第四子控制信号,以及根据所述第一目标操作数生成写入通用寄存器的第三子控制信号,其中,所述控制信号包括:所述第一子控制信号、所述第四子控制信号和所述第三子控制信号。Specifically, generating a control signal corresponding to the first instruction information includes at least one of the following: when the first instruction information includes: the first instruction code and the first source operand, generating first sub-control information according to the type of the first instruction code, and generating a second sub-control signal for reading an operand register according to the first source operand, wherein the control signal includes: the first sub-control signal and the second sub-control signal; when the first instruction information includes: the first instruction code, the first source operand and the first target operand, generating first sub-control information according to the type of the first instruction code, generating a second sub-control signal for reading an operand register according to the first source operand, and generating a third sub-control signal for writing into a general register according to the first target operand, wherein the control signal includes: the first sub-control signal, the second sub-control signal and the third sub-control signal; when the first instruction information includes: the first instruction code, the first immediate number and the first target operand, generating first sub-control information according to the type of the first instruction code, generating a fourth sub-control signal for reading the first immediate number, and generating a third sub-control signal for writing into a general register 
according to the first target operand, wherein the control signal includes: the first sub-control signal, the fourth sub-control signal and the third sub-control signal.
可以理解的是,第一子控制信号为用于实现加法、乘法、位移、逻辑运算等运算的控制信号,第二子控制信号为读取操作数寄存器的控制信号,第三子控制信号为写入通用寄存器的目标位置写入目标信息的控制信号,第四子控制信号为读取所述第一立即数的控制信号。It can be understood that the first sub-control signal is a control signal for implementing operations such as addition, multiplication, displacement, and logical operations; the second sub-control signal is a control signal for reading the operand register; the third sub-control signal is a control signal for writing target information to the target location of the general register; and the fourth sub-control signal is a control signal for reading the first immediate value.
例如:加立即数指令(Add Immediate Instruction,简称为ADDI)指令用于指示将操作数寄存器rs1中的整数值与12位立即数(进行符号位扩展)进行加法操作,结果写回寄存器RD中,将SW指令进行解析,第一子控制信号为加法运算的控制信号,第二子控制信号为读取操作数寄存器rs1中的整数值的控制信号,第三子控制信号为结果写回通用寄存器RD中的控制信号。For example: the Add Immediate Instruction (ADDI) instruction is used to instruct the addition operation of the integer value in the operand register rs1 and the 12-bit immediate value (with sign bit extension), and the result is written back to the register RD. The SW instruction is parsed, the first sub-control signal is the control signal for the addition operation, the second sub-control signal is the control signal for reading the integer value in the operand register rs1, and the third sub-control signal is the control signal for writing the result back to the general register RD.
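The mapping from parsed instruction information to sub-control signals can be sketched as follows. This is only an illustration under stated assumptions: the function name `generate_control_signals`, the dictionary keys, and the tuple encoding of signals are hypothetical and are not taken from the disclosure.

```python
def generate_control_signals(info):
    """Map first instruction information to a list of sub-control signals.

    `info` is a dict with a required "code" key and optional "src", "imm"
    and "dst" keys (hypothetical encoding chosen for this sketch).
    """
    signals = [("operation", info["code"])]              # first sub-control signal
    if "src" in info:
        signals.append(("read_operand_reg", info["src"]))  # second sub-control signal
    if "imm" in info:
        signals.append(("read_immediate", info["imm"]))    # fourth sub-control signal
    if "dst" in info:
        signals.append(("write_general_reg", info["dst"])) # third sub-control signal
    return signals
```

With `{"code": "ADDI", "src": 1, "dst": 2}` the sketch yields the operation, read-register, and write-register signals named in the ADDI example, in that order.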
To facilitate understanding of the above method for processing a task to be processed, its implementation flow is described below with reference to an optional embodiment, which is not intended to limit the technical solutions of the embodiments of the present disclosure.
This embodiment provides a method for processing a task to be processed. FIG. 3 is a flowchart of the method for processing a task to be processed according to an optional embodiment of the present disclosure. As shown in FIG. 3, the details are as follows:
The flow comprises instruction fetch, instruction decode, instruction execution, memory access, and result write-back. The embodiment of the present disclosure uses a two-stage pipeline: instruction fetch takes one beat (clock cycle), and decode, execution, memory access, and write-back together share one beat. By arranging different instructions, a programmable state machine is obtained.
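The overlap of the two pipeline stages can be visualised with a small sketch. It assumes an ideal case with no jumps or stalls, which the disclosure does not discuss; the function name `pipeline_schedule` is hypothetical. Under that assumption, n instructions occupy n + 1 beats, because the fetch of each instruction overlaps with the combined decode/execute/memory/write-back stage of the previous one.

```python
def pipeline_schedule(n):
    """Return one (beat, fetched_index, executed_index) row per beat.

    Stage 1 fetches instruction `beat`; stage 2 decodes, executes, accesses
    memory and writes back instruction `beat - 1`. None marks an idle stage.
    """
    rows = []
    for beat in range(n + 1):
        fetched = beat if beat < n else None
        executed = beat - 1 if beat >= 1 else None
        rows.append((beat, fetched, executed))
    return rows
```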
This embodiment further provides a hardware accelerator for processing a task to be processed. FIG. 4 is an internal structure diagram of the hardware accelerator according to an embodiment of the present disclosure. As shown in FIG. 4, the hardware accelerator specifically includes:
a Data Tightly-Coupled Memory (DTCM), configured as a high-speed data cache;
an Instruction Fetch Unit (IFU), configured to fetch instructions and perform program counter (PC) jumps;
an Instruction Tightly-Coupled Memory (ITCM), which is the instruction cache space and supports offline reading and writing;
an Instruction Decode Unit (DeCode), configured to decode instructions;
an Execution Unit (EXU), configured to execute instructions;
a Register File (RegFile), configured to perform operations on the internal general-purpose registers.
In order to improve the flexibility of the chip, the present disclosure imposes certain constraints on the input content (instructions); that is, the embodiments of the present disclosure provide an instruction design scheme, specifically:
The instructions include DTCM read and write instructions, basic integer operation instructions (addition, comparison, OR, AND, shift), unconditional jump instructions, and conditional jump instructions (equal, not equal, greater than, less than, greater than or equal to, less than or equal to).
The content of an instruction needs to include:
an instruction code (required), which identifies the action the instruction is to perform;
an operation address (required), which is a DTCM read/write address or a regfile read/write address;
an immediate value (optional), which is a specific operand value.
The regfile consists of 32 general-purpose registers of 32 bits each, used together with the instructions. As shown in FIG. 5, the general-purpose register at address 0 always holds 0 and cannot be written; PC is the address of the instruction currently being processed.
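The register file behaviour described above (32 registers of 32 bits, with address 0 fixed at 0) can be modelled by the following minimal sketch. The class name `RegFile` and its method names are assumptions made for illustration.

```python
class RegFile:
    """Behavioural model of 32 general-purpose 32-bit registers."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, addr):
        if not 0 <= addr < 32:
            raise IndexError("register address out of range")
        return self._regs[addr]

    def write(self, addr, value):
        if not 0 <= addr < 32:
            raise IndexError("register address out of range")
        if addr != 0:                       # address 0 always reads 0 and is not writable
            self._regs[addr] = value & 0xFFFFFFFF   # keep only 32 bits
```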
In the embodiments of the present disclosure, the CPU writes the designed instructions into the ITCM and starts the apparatus; the IFU controls the instruction read address and reads the instruction; DeCode parses the instruction; and the EXU executes the instruction and writes back the data.
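The division of work between the ITCM, IFU, DeCode, and EXU can be sketched as a tiny interpreter loop. This is a simplified illustration, not the disclosed hardware: the tuple instruction format, the function name `run`, the halt-on-leaving-ITCM rule, and the use of already sign-extended Python integers for immediates and offsets are all assumptions of this sketch. Only two instructions are modelled.

```python
def run(itcm, max_steps=1000):
    """Execute instructions held in `itcm` and return the final register list."""
    regs = [0] * 32
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(itcm):      # assumed halt condition: PC leaves the ITCM
            break
        op, a, b, c = itcm[pc]           # IFU fetches, DeCode splits the fields
        if op == "ADDI":                 # fields: rd, rs1, immediate
            if a != 0:                   # register 0 is not writable
                regs[a] = (regs[b] + c) & 0xFFFFFFFF
            pc += 1
        elif op == "BLTUI":              # fields: rs1, 5-bit immediate, offset
            pc = pc + c if regs[a] < (b & 0x1F) else pc + 1
        else:
            raise ValueError("instruction not modelled in this sketch: " + op)
    return regs
```

A two-instruction program such as `[("ADDI", 1, 1, 1), ("BLTUI", 1, 3, -1)]` increments register 1 until it reaches 3, which shows how arranging instructions yields a small programmable state machine.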
The instructions designed in the embodiments of the present disclosure are based on the RV32I base instruction set of RISC-V: some instructions are borrowed directly, and some custom instructions are added. The instruction design can be freely defined and extended; the necessary instructions are read and write instructions, basic logical operation instructions, and jump instructions.
1. Memory read and write instructions:
This group of instructions performs memory read or write operations. The memory address accessed is obtained by adding the value in operand register rs1 to a 12-bit immediate value (sign-extended).
LW instruction, as shown in FIG. 6: reads 32 bits of data from the memory and writes it to register rd;
SW instruction, as shown in FIG. 7: writes the 32-bit data in operand register rs2 to the memory.
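The address calculation shared by LW and SW can be sketched as below. The helper names, the use of a Python list for the registers and a dict for the DTCM, and the default value 0 for unwritten locations are assumptions of this sketch; the address unit (byte or word) is not stated in the text, so the computed address is used directly as a key. The special handling of register 0 is omitted here for brevity.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def effective_address(rs1_value, imm12):
    """Address = rs1 + sign-extended 12-bit immediate, truncated to 32 bits."""
    return (rs1_value + sign_extend(imm12, 12)) & 0xFFFFFFFF


def lw(regs, dtcm, rd, rs1, imm12):
    """Load 32 bits from the memory into register rd."""
    regs[rd] = dtcm.get(effective_address(regs[rs1], imm12), 0)


def sw(regs, dtcm, rs1, rs2, imm12):
    """Store the 32-bit value of register rs2 to the memory."""
    dtcm[effective_address(regs[rs1], imm12)] = regs[rs2] & 0xFFFFFFFF
```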
2. Basic integer operation instructions:
This group of instructions performs basic integer operations on a register and an immediate value.
ADDI instruction, as shown in FIG. 8: adds the integer value in operand register rs1 to the 12-bit immediate value (sign-extended) and writes the result to register rd. If the result overflows, no special handling is required; the overflow bits are discarded and only the lower 32 bits of the result are kept.
SLTIU instruction, as shown in FIG. 9: compares the integer value in operand register rs1 with the 12-bit immediate value (sign-extended), treating both as unsigned numbers. If the value in rs1 is less than the immediate value, the result is 1; otherwise it is 0. The result is written to register rd.
ANDI instruction, as shown in FIG. 9: performs an AND operation on the integer value in operand register rs1 and the 12-bit immediate value (sign-extended), and writes the result to register rd.
ORI instruction, as shown in FIG. 9: performs an OR operation on the integer value in operand register rs1 and the 12-bit immediate value (sign-extended), and writes the result to register rd.
XORI instruction, as shown in FIG. 9: performs an exclusive OR (XOR) operation on the integer value in operand register rs1 and the 12-bit immediate value (sign-extended), and writes the result to register rd.
SLLI instruction, as shown in FIG. 9: performs a logical left shift on the integer value in operand register rs1 (zeros are shifted into the low bits); the shift amount is a 5-bit immediate value, and the result is written to register rd.
SRLI instruction, as shown in FIG. 9: performs a logical right shift on the integer value in operand register rs1 (zeros are shifted into the high bits); the shift amount is a 5-bit immediate value, and the result is written to register rd.
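The register-immediate operations above can be written as a handful of pure functions on 32-bit values. This is a behavioural sketch only; the function names mirror the mnemonics, and the helper `sext12` is an assumed name. Each function takes the value held in rs1 and returns the value that would be written to rd.

```python
MASK32 = 0xFFFFFFFF


def sext12(imm):
    """Sign-extend a 12-bit immediate to a Python integer."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def addi(rs1, imm):
    return (rs1 + sext12(imm)) & MASK32      # overflow bits are discarded


def sltiu(rs1, imm):
    # Both operands are compared as unsigned 32-bit numbers.
    return 1 if (rs1 & MASK32) < (sext12(imm) & MASK32) else 0


def andi(rs1, imm):
    return rs1 & sext12(imm) & MASK32


def ori(rs1, imm):
    return (rs1 | sext12(imm)) & MASK32


def xori(rs1, imm):
    return (rs1 ^ sext12(imm)) & MASK32


def slli(rs1, shamt):
    return (rs1 << (shamt & 0x1F)) & MASK32  # zeros enter the low bits


def srli(rs1, shamt):
    return (rs1 & MASK32) >> (shamt & 0x1F)  # zeros enter the high bits
```

Note that, because the immediate is sign-extended before the unsigned comparison, `sltiu(5, 0xFFF)` compares 5 against 0xFFFFFFFF.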
3. Data construction instructions:
LUI instruction, as shown in FIG. 10: shifts the 20-bit immediate value left by 12 bits (the lower 12 bits are filled with 0) to form a 32-bit number, and writes this number to register rd.
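The LUI data construction is a single shift, sketched below with an assumed function name `lui`. It returns the 32-bit value that would be written to rd.

```python
def lui(imm20):
    """Place a 20-bit immediate in bits 31..12 and clear the lower 12 bits."""
    return ((imm20 & 0xFFFFF) << 12) & 0xFFFFFFFF
```

For instance, `lui(0x12345)` gives `0x12345000`; the lower 12 bits could then be filled by a following instruction that takes a 12-bit immediate, if the program needs a full 32-bit constant.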
4. Jump instructions, as shown in FIG. 11 and FIG. 12:
FIG. 11 shows the unconditional jump instructions:
JAL instruction: uses a 20-bit immediate value (signed) as an offset, which is added to the PC of this instruction to produce the final jump target address. The JAL instruction writes the PC of its next instruction (that is, the current instruction's PC+1) into its result register rd.
JALR instruction: uses a 12-bit immediate value (signed) as an offset, which is added to the value in operand register rs1 to produce the final jump target address. The JALR instruction writes the PC of its next instruction (that is, the current instruction's PC+1) into its result register rd.
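The target and link values of the two unconditional jumps can be computed as in this sketch. It assumes that the offset is expressed in the same unit as the PC (one instruction per PC step, consistent with the PC+1 link value in the text); the function names and the returned `(target, link)` tuple are illustrative choices.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def jal(pc, imm20):
    """Return (jump target, link value written to rd) for JAL."""
    target = (pc + sign_extend(imm20, 20)) & 0xFFFFFFFF
    return target, pc + 1


def jalr(pc, rs1_value, imm12):
    """Return (jump target, link value written to rd) for JALR."""
    target = (rs1_value + sign_extend(imm12, 12)) & 0xFFFFFFFF
    return target, pc + 1
```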
FIG. 12 shows the conditional jump instructions:
The instructions in this group are conditional jump instructions. Each uses a 12-bit immediate value (signed) as an offset, which is added to the PC of the instruction to produce the final jump target address. A conditional jump instruction jumps only when its condition is true, as follows.
BEQI instruction: jumps only if the value in operand register rs1 is equal to the 5-bit immediate value.
BNEI instruction: jumps only if the value in operand register rs1 is not equal to the 5-bit immediate value.
BGTUI instruction: jumps only if the unsigned value in operand register rs1 is greater than the unsigned 5-bit immediate value.
BLEUI instruction: jumps only if the unsigned value in operand register rs1 is less than or equal to the unsigned 5-bit immediate value.
BLTUI instruction: jumps only if the unsigned value in operand register rs1 is less than the unsigned 5-bit immediate value.
BGEUI instruction: jumps only if the unsigned value in operand register rs1 is greater than or equal to the unsigned 5-bit immediate value.
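The six branch conditions can be expressed as a lookup table of comparisons, as in the following sketch. Several points are assumptions rather than statements of the disclosure: the 5-bit immediate is treated as zero-extended for all six instructions, a branch that is not taken falls through to PC+1, and the names `BRANCH_CONDITIONS` and `next_pc` are hypothetical.

```python
import operator

BRANCH_CONDITIONS = {
    "BEQI": operator.eq,
    "BNEI": operator.ne,
    "BGTUI": operator.gt,
    "BLEUI": operator.le,
    "BLTUI": operator.lt,
    "BGEUI": operator.ge,
}


def next_pc(mnemonic, pc, rs1_value, imm5, offset12):
    """Return the next PC for a conditional jump instruction."""
    taken = BRANCH_CONDITIONS[mnemonic](rs1_value & 0xFFFFFFFF, imm5 & 0x1F)
    if not taken:
        return pc + 1                        # assumed fall-through
    offset12 &= 0xFFF
    if offset12 & 0x800:                     # sign-extend the 12-bit offset
        offset12 -= 0x1000
    return (pc + offset12) & 0xFFFFFFFF
```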
The embodiments of the present disclosure design a hardware architecture and impose certain constraints on its input content (instructions). On top of an ordinary hardware design, a software-like programmable capability is added, which makes the state machine programmable and even makes logic functions programmable, providing a degree of flexibility with almost no increase in chip power consumption or area. To a certain extent, this resolves the dilemma that chip design requires both flexibility and low cost.
From the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, and may of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory/random access memory (ROM/RAM), a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present disclosure.
This embodiment further provides a system for processing a task to be processed. The system is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the system described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
An embodiment of the present disclosure provides a system for processing a task to be processed, and this system is located in the above hardware accelerator. FIG. 14 is a structural block diagram of the system for processing a task to be processed according to an embodiment of the present disclosure. As shown in FIG. 14, the system includes:
an instruction fetch unit 1402, configured to determine an instruction set corresponding to the task to be processed according to parameter information and requirement information of the task to be processed, wherein the requirement information indicates the processing result corresponding to the task to be processed, and to determine an instruction address of each instruction to be executed in the instruction set;
an execution unit 1404, configured to execute each instruction to be executed according to the instruction address of that instruction, so as to process the task to be processed.
With the above apparatus, the hardware accelerator can determine the instruction set of the task to be processed according to preset instructions (that is, task programming of the task to be processed is achieved) and then process the task by executing the instructions in the instruction set, so the hardware accelerator can also handle different tasks flexibly. Therefore, the problem of low flexibility in chip design can be solved, and the flexibility of the chip is improved.
It should be noted that the system for processing a task to be processed in the embodiments of the present disclosure is located in the above hardware accelerator, and the hardware accelerator processes the task to be processed through this system. Therefore, the structure of the system is similar to the internal structure of the above hardware accelerator.
Because the structure of the system for processing a task to be processed is similar to the internal structure of the above hardware accelerator, as shown in FIG. 4, the system further includes: a data tightly-coupled memory, an instruction tightly-coupled memory, an instruction decode unit, an execution unit, and a register file.
It should be noted that, when processing of the task to be processed is complete, the processing result is written back to the data tightly-coupled memory in a preset format.
The register file is an internal component of the system for processing a task to be processed and is configured to store and manage the registers of the system. A register is a high-speed storage element inside the system, configured to temporarily hold instructions and data. The regfile usually contains multiple registers, such as general-purpose registers and the program counter (PC). The register file provides the system with fast data storage and access to support instruction execution and data processing.
In an exemplary embodiment, the instruction fetch unit 1402 is configured to determine a plurality of logical processing modes of the task to be processed according to the parameter information and the requirement information, determine the instructions corresponding to the plurality of logical processing modes, and determine the plurality of instructions as the instruction set.
In an exemplary embodiment, the instruction fetch unit is further configured to obtain a second instruction address from a program counter, wherein the program counter stores the instruction address of the next instruction to be executed, and to determine, according to the first instruction address of each instruction to be executed, the instruction to be executed in the instruction set that corresponds to the second instruction address; the execution unit is further configured to execute that instruction.
In an exemplary embodiment, the instruction decode unit is configured to parse the instruction to be executed to obtain first instruction information corresponding to the instruction to be executed, wherein the first instruction information includes at least one of the following: a first instruction code, a first source operand, a first target operand, and a first immediate value; and to generate a control signal corresponding to the first instruction information and send the control signal to the corresponding execution unit to execute the instruction to be executed.
In an exemplary embodiment, the instruction decode unit is further configured to perform one of the following:
in a case where the first instruction information includes the first instruction code and the first source operand, generating a first sub-control signal according to the type of the first instruction code, and generating, according to the first source operand, a second sub-control signal for reading an operand register, wherein the control signal includes the first sub-control signal and the second sub-control signal;
in a case where the first instruction information includes the first instruction code, the first source operand, and the first target operand, generating a first sub-control signal according to the type of the first instruction code, generating, according to the first source operand, a second sub-control signal for reading an operand register, and generating, according to the first target operand, a third sub-control signal for writing to a general-purpose register, wherein the control signal includes the first sub-control signal, the second sub-control signal, and the third sub-control signal;
in a case where the first instruction information includes the first instruction code, the first immediate value, and the first target operand, generating a first sub-control signal according to the type of the first instruction code, generating a fourth sub-control signal for reading the first immediate value, and generating, according to the first target operand, a third sub-control signal for writing to a general-purpose register, wherein the control signal includes the first sub-control signal, the fourth sub-control signal, and the third sub-control signal.
In an exemplary embodiment, the above apparatus further includes an instruction writing unit, configured to determine second instruction information of the instruction to be written, wherein the second instruction information includes at least one of the following: a second instruction code, a second source operand, a second target operand, and a second immediate value; the second instruction code indicates the function of the instruction to be written, and the second source operand and the second target operand indicate the read and write addresses of the instruction to be written; and to determine the instruction to be written according to the second instruction information.
It should be noted that the written instructions are stored in the instruction tightly-coupled memory, and the instruction set is likewise stored in the instruction tightly-coupled memory before it is fetched.
It should be noted that each of the above modules may be implemented by software or hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
To facilitate understanding of the technical solutions provided by the present disclosure, they are described in detail below with reference to embodiments in specific scenarios.
An embodiment of the present disclosure further provides a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps of any of the above method embodiments.
In an exemplary embodiment, the above computer-readable storage medium may include, but is not limited to, various media that can store a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of the present disclosure further provides an electronic apparatus, including a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps of any of the above method embodiments.
In an exemplary embodiment, the above electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor and the input/output device is connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present disclosure described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. They may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be executed in an order different from that given here, or they may each be made into an individual integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. Thus, the present disclosure is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the principles of the present disclosure shall fall within the protection scope of the present disclosure.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311034305.X | 2023-08-15 | ||
| CN202311034305.XA CN119514430A (en) | 2023-08-15 | 2023-08-15 | Method and system for processing tasks to be processed, storage medium and electronic device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025036099A1 true WO2025036099A1 (en) | 2025-02-20 |
Family
ID=94632162
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/106811 Pending WO2025036099A1 (en) | 2023-08-15 | 2024-07-22 | Method and system for processing task to be processed, and storage medium and electronic apparatus |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN119514430A (en) |
| WO (1) | WO2025036099A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119960830A (en) * | 2025-04-10 | 2025-05-09 | 北京开源芯片研究院 | Method, device, equipment and storage medium for configuring immediate value of jump instruction |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113010213A (en) * | 2021-04-15 | 2021-06-22 | 清华大学 | Simplified instruction set storage and calculation integrated neural network coprocessor based on resistance change memristor |
| US20210255860A1 (en) * | 2018-08-29 | 2021-08-19 | Cerebras Systems Inc. | Isa enhancements for accelerated deep learning |
| CN113869504A (en) * | 2021-12-02 | 2021-12-31 | 之江实验室 | A Memristor-Based Programmable Neural Network Accelerator |
| CN114761920A (en) * | 2019-12-09 | 2022-07-15 | 亚马逊技术股份有限公司 | Hardware accelerator with reconfigurable instruction set |
| CN116431214A (en) * | 2023-03-31 | 2023-07-14 | 北京大学 | An instruction set device for a reconfigurable deep neural network accelerator |
- 2023-08-15: CN application CN202311034305.XA, published as CN119514430A (status: active, pending)
- 2024-07-22: WO application PCT/CN2024/106811, published as WO2025036099A1 (status: active, pending)
Also Published As
| Publication number | Publication date |
|---|---|
| CN119514430A (en) | 2025-02-25 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN111782270B (en) | A data processing method, device and storage medium | |
| JP4986431B2 (en) | Processor | |
| EP1468367A4 (en) | MULTI-THREAD PROCESSOR WITH EFFICIENT PROCESSING FOR CONVERGENCE APPLICATIONS | |
| US10241794B2 (en) | Apparatus and methods to support counted loop exits in a multi-strand loop processor | |
| US8959501B2 (en) | Type and length abstraction for data types | |
| WO2025036099A1 (en) | Method and system for processing task to be processed, and storage medium and electronic apparatus | |
| US20250103330A1 (en) | Vector shift method, processor, and electronic device | |
| CN111158756A (en) | Method and apparatus for processing information | |
| KR101927858B1 (en) | Rsa algorithm acceleration processors, methods, systems, and instructions | |
| WO2021120713A1 (en) | Data processing method, decoding circuit, and processor | |
| TW201712534A (en) | Decoding information about a group of instructions including a size of the group of instructions | |
| CN115686628A (en) | A register allocation method and device | |
| JP2016006632A (en) | Processor with conditional instructions | |
| CN111459546B (en) | Device and method for realizing variable bit width of operand | |
| CN118779011A (en) | Data normalization RISC-V instruction set extension method and hardware acceleration device | |
| EP3953808A1 (en) | Method and apparatus for processing data splicing instruction | |
| CN114020332B (en) | Instruction processing method and device | |
| CN112130899A (en) | Stack computer | |
| US8572147B2 (en) | Method for implementing a bit-reversed increment in a data processing system | |
| CN118550590B (en) | Low-power consumption cryptographic algorithm processor micro-architecture and working method thereof | |
| US20240427597A1 (en) | Conditional branch instructions for aggregating conditional branch operations | |
| JPH1173301A (en) | Information processing device | |
| CN118426831A (en) | Vector processor and operation method thereof | |
| CN116991481A (en) | Execution method, device and medium of operation instruction | |
| CN118057308A (en) | Instruction processing optimization method and related device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24853511; Country of ref document: EP; Kind code of ref document: A1 |