WO2019020028A1 - Method and apparatus for allocating shared resource - Google Patents
Method and apparatus for allocating shared resource
- Publication number
- WO2019020028A1 (PCT/CN2018/096869; priority application CN2018096869W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- shared resource
- target thread
- clock cycles
- state
- shared
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
Definitions
- the present application relates to the field of information technology, and in particular, to a shared resource allocation method and apparatus.
- A thread-mixed deployment strategy deploys multiple threads together on the same data center server. Because the co-deployed threads must share the server's hardware and software resources (that is, shared resources), resource contention occurs, making the performance of the co-deployed threads unpredictable and seriously degrading the quality of service of the target thread (such as a high-priority thread).
- the embodiments of the present application provide a shared resource allocation method and apparatus that detect the working state of a shared resource while multiple threads access it simultaneously, and adjust the target thread's allocation of the shared resource according to the detected working state, thereby ensuring the quality of service of the target thread.
- an embodiment of the present application provides a method for allocating a shared resource, including: detecting the working state of the shared resource and counting while multiple threads access the shared resource simultaneously, where the working state includes a first state in which the target thread uses the shared resource without blocking, a second state in which the target thread causes the shared resource to block, and a third state in which a non-target thread causes the shared resource to block; reading the count values to obtain a basic clock cycle count, a waiting clock cycle count, and an interference clock cycle count, where the basic clock cycle count is the count value of the shared resource in the first state, the waiting clock cycle count is the count value of the shared resource in the second state, and the interference clock cycle count is the count value of the shared resource in the third state; and adjusting the target thread's allocation quota of the shared resource according to the basic, waiting, and interference clock cycle counts.
- because the basic, waiting, and interference clock cycle counts of the target thread are counted separately, and the target thread's allocation quota of the shared resource is adjusted according to these counts, the allocation quota can be adjusted dynamically according to the working state of the shared resource, thereby ensuring the quality of service of the target thread.
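The per-state counting described above can be sketched in software as follows. This is a simplified illustration only, not the patented hardware mechanism; the `State` enumeration and the `CycleCounters` class are hypothetical names introduced for the example.

```python
from enum import Enum

class State(Enum):
    """Working states of the shared resource, as defined in the summary above."""
    FIRST = 1   # target thread uses the shared resource without blocking
    SECOND = 2  # target thread causes the shared resource to block
    THIRD = 3   # a non-target thread causes the shared resource to block

class CycleCounters:
    """Adds one count per clock cycle to the counter for the current state."""
    def __init__(self):
        self.basic = 0         # cycles spent in the first state
        self.waiting = 0       # cycles spent in the second state
        self.interference = 0  # cycles spent in the third state

    def tick(self, state):
        if state is State.FIRST:
            self.basic += 1
        elif state is State.SECOND:
            self.waiting += 1
        else:
            self.interference += 1

# Example: a per-cycle state trace over one statistical period
counters = CycleCounters()
trace = [State.FIRST] * 6 + [State.SECOND] * 2 + [State.THIRD] * 2
for s in trace:
    counters.tick(s)
print(counters.basic, counters.waiting, counters.interference)  # 6 2 2
```

In the described apparatus these three counters correspond to the basic, waiting, and interference clock counters read out at the end of each statistical period.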
- in a possible implementation, the shared resource includes multiple sub-shared resources that the target thread and the non-target threads access in a serial manner. When the target thread uses all of the sub-shared resources without blocking, the shared resource is in the first state; when the target thread causes any of the sub-shared resources to block, the shared resource is in the second state; and when a non-target thread causes any of the sub-shared resources to block, the shared resource is in the third state.
- in a possible implementation, adjusting the target thread's allocation quota of the shared resource according to the basic, waiting, and interference clock cycle counts includes: determining the exclusive running time and the actual running time of the target thread from the three counts, and increasing the target thread's allocation quota of the shared resource when the ratio between the exclusive running time and the actual running time is less than the quality of service (QoS) indicator.
- the exclusive running time and the actual running time are derived from the basic, waiting, and interference clock cycle counts and directly reflect the target thread's use of the shared resource. By comparing the ratio of the exclusive running time to the actual running time with the QoS indicator set by the user, it can be determined whether the target thread's use of the shared resource meets the user's expectation. When the expectation is met, the target thread's allocation can be maintained, or the non-target threads' allocation can be increased, improving utilization of the shared resource while preserving the target thread's performance. When the expectation is not met, the quality of service of the target thread can be guaranteed by increasing its allocation quota of the shared resource.
- in a possible implementation, the target thread's allocation of the shared resource is maintained when the ratio between the exclusive running time and the actual running time is not less than the quality of service (QoS) indicator.
- in a fourth possible implementation of the first aspect, the non-target threads' allocation quota of the shared resource is increased when the ratio between the exclusive running time and the actual running time is not less than the quality of service (QoS) indicator.
- in a possible implementation, determining the exclusive running time and the actual running time of the target thread from the basic, waiting, and interference clock cycle counts includes: summing the basic, waiting, and interference clock cycle counts to obtain the actual running time, and summing the basic and waiting clock cycle counts to obtain the exclusive running time.
- the exclusive running time of the target thread represents the time the target thread would need if it used the shared resource exclusively, without interference from non-target threads; it can therefore be taken as the sum of the basic and waiting clock cycle counts. In practical applications, unless the target thread is given absolute priority and can monopolize the shared resource, it is inevitably affected by non-target threads, so the actual running time of the target thread is the sum of the basic, waiting, and interference clock cycle counts.
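The two running times defined above can be written directly as formulas in code. The function names below are illustrative, not part of the specification:

```python
def exclusive_running_time(basic, waiting):
    # Time the target thread would need with no interference from non-target
    # threads: cycles it ran plus cycles it blocked itself.
    return basic + waiting

def actual_running_time(basic, waiting, interference):
    # Time actually observed, including cycles lost to non-target threads.
    return basic + waiting + interference

# Example: 600 basic, 200 waiting, and 200 interference cycles
excl = exclusive_running_time(600, 200)      # 800
actual = actual_running_time(600, 200, 200)  # 1000
print(excl / actual)  # 0.8; this ratio is compared against the QoS indicator
```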
- in a possible implementation, the shared resource includes a front-end shared resource and a back-end shared resource, and increasing the target thread's allocation quota of the shared resource includes: first increasing the target thread's allocation quota of the front-end shared resource, and then increasing the target thread's allocation quota of the back-end shared resource.
- by preferentially increasing the target thread's allocation quota of the front-end shared resource, the interference clock cycle count can be reduced relatively quickly, so that the ratio between the exclusive running time and the actual running time rises quickly to a value greater than or equal to the QoS indicator.
- in a possible implementation, the front-end shared resource includes an instruction fetch unit and a first-level instruction cache, and the back-end shared resource includes a memory access queue, a first-level data cache, an instruction queue, an instruction reorder buffer, and physical rename registers.
- an embodiment of the present application provides a thread resource allocation apparatus, including: a state detection module, configured to detect the working state of a shared resource while multiple threads access it simultaneously, where the working state includes a first state in which the target thread uses the shared resource without blocking, a second state in which the target thread causes the shared resource to block, and a third state in which a non-target thread causes the shared resource to block; a counting module, configured to count in each working state; a clock cycle reading module, configured to read the count values to obtain the basic, waiting, and interference clock cycle counts, where the basic clock cycle count is the count value of the shared resource in the first state, the waiting clock cycle count is the count value of the shared resource in the second state, and the interference clock cycle count is the count value of the shared resource in the third state; and a resource allocation quota adjustment module, configured to adjust the target thread's allocation quota of the shared resource according to the basic, waiting, and interference clock cycle counts.
- the second aspect and its implementations are the apparatus counterparts of the first aspect and its implementations; the descriptions of the first aspect and any of its implementations apply equally to the second aspect and its implementations, and details are not repeated here.
- an embodiment of the present application provides a thread resource allocation apparatus, including a processor and a memory, where the memory stores program instructions and the processor runs the program instructions to perform the steps of the first aspect and its various possible implementations.
- a fourth aspect of the present application provides a multi-threaded control device comprising at least one processing element (or chip) for performing the method of the above first aspect.
- a fifth aspect of the present application provides a program that, when executed by a processor, performs the method of the first aspect and its various possible implementations.
- a sixth aspect of the present application provides a program product, such as a computer readable storage medium, comprising the program of the fifth aspect.
- a seventh aspect of the present application provides a computer readable storage medium having stored therein instructions that, when run on a computer, cause the computer to perform the method of the first aspect described above.
- FIG. 1 is a schematic structural diagram of a shared resource allocation system according to an embodiment of the present application.
- FIG. 2 is a flowchart of a shared resource allocation method according to an embodiment of the present application.
- FIG. 3 is another schematic structural diagram of a shared resource allocation system according to an embodiment of the present application.
- FIG. 4 is a schematic diagram of connection between a primary instruction cache and a primary instruction cache state detection module according to an embodiment of the present application
- FIG. 5 is a schematic diagram of connection between an instruction fetch unit and an instruction fetch unit state detection module according to an embodiment of the present application
- FIG. 6 is a schematic diagram of a connection between a physical rename register and a physical rename register state detection module according to an embodiment of the present application
- FIG. 7 is a schematic diagram of a connection of an instruction reordering cache and an instruction reordering buffer state detecting module according to an embodiment of the present application
- FIG. 8 is a schematic diagram of a connection between an instruction queue and an instruction queue state detection module according to an embodiment of the present application.
- FIG. 9 is a schematic diagram of connection between a memory access queue and a memory access queue state detecting module according to an embodiment of the present application.
- FIG. 10 is a schematic diagram of connection between a primary data cache and a primary data cache state detection module according to an embodiment of the present application
- FIG. 11 is a schematic diagram of a connection relationship between a first state detecting module and a basic clock counter according to an embodiment of the present application;
- FIG. 12 is a schematic diagram of a connection relationship between a second state detecting module and a waiting clock counter according to an embodiment of the present application.
- FIG. 13 is a connection diagram of a third state detecting module and an interference clock counter according to an embodiment of the present application.
- FIG. 14 is a schematic diagram of a hardware structure of a shared resource allocation apparatus according to an embodiment of the present application.
- FIG. 1 is a schematic structural diagram of a shared resource allocation system according to an embodiment of the present application.
- the shared resource allocation system 100 includes a target thread 102, a non-target thread 103, a shared resource 300, and a shared resource allocation device 200.
- the target thread 102 refers to a thread whose quality of service must be guaranteed: it requires low latency, and the shared resources it occupies must be kept sufficient. The non-target thread 103 refers to a thread whose quality of service need not be guaranteed; its latency requirement may be looser than that of the target thread, and the shared resources it occupies may be sacrificed provided the quality of service of the target thread 102 is guaranteed.
- the target thread 102 and the non-target thread 103 each include at least one instruction, and each instruction carries instruction content and a thread number identifying the thread to which the instruction belongs: the instructions of the target thread 102 carry the target thread number, and the instructions of the non-target thread 103 carry the non-target thread number.
- the target thread 102 and the non-target thread 103 access the shared resource 300 simultaneously, and the shared resource 300 can be in different working states according to how the target thread 102 and/or the non-target thread 103 use it.
- the working state of the shared resource 300 includes a first state in which the target thread 102 uses the shared resource 300 without blocking, a second state in which the target thread 102 causes the shared resource 300 to block, and a third state in which the non-target thread 103 causes the shared resource 300 to block.
- the first state is, specifically, the state in which the shared resource 300 is sufficient and the instructions of the target thread 102 can use it directly without waiting. The second state is the state in which the shared resource 300 is insufficient: some instructions of the target thread 102 are using the shared resource 300 while its other instructions wait for it. It is worth noting that, in the second state, the other instructions of the target thread 102 wait because the shared resource 300 is already occupied by instructions of the target thread 102 itself, causing the blocking. The third state is the state in which the shared resource 300 is insufficient, instructions of the non-target thread 103 are using the shared resource 300, and instructions of the target thread 102 wait for it; in the third state, the target thread's instructions wait because the shared resource 300 is already occupied by instructions of the non-target thread 103, causing the blocking.
- the shared resource allocation apparatus 200 is connected to the shared resource 300 to detect its working state and to adjust, according to the detected working state, the allocation quota of the shared resource 300 that the target thread 102 can use, so that the quota corresponds to the working state of the target thread 102.
- the shared resource includes a plurality of sub-shared resources 1, 2, 3, and 4, which the target thread 102 and the non-target thread 103 access in a serial manner. When the target thread 102 uses all of the sub-shared resources 1, 2, 3, and 4 without blocking, the shared resource 300 is in the first state; when the target thread 102 causes any of them to block, the shared resource 300 is in the second state; and when the non-target thread 103 causes any of them to block, the shared resource 300 is in the third state.
- the number of sub-shared resources is not limited to the number shown in FIG. 1. In an alternative embodiment, the number of sub-shared resources may be set according to actual needs.
- the shared resource allocation apparatus 200 includes a state detecting module 210, a counting module 240, a clock cycle reading module 220, and a resource allocation amount adjustment module 230.
- the counting module 240 includes a basic clock counter 282, a waiting clock counter 284, and an interference clock counter 286.
- the state detecting module 210 includes a first sub-shared resource state detection module 2101, a second sub-shared resource state detection module 2102, a third sub-shared resource state detection module 2103, and a fourth sub-shared resource state detection module 2104. Each sub-shared resource is assigned a sub-shared resource state detection module, which can detect the working state of the corresponding sub-shared resource, where the working state includes the first state, the second state, and the third state.
- the state detecting module 210 further includes a first state detecting module 281, a second state detecting module 283, and a third state detecting module 285.
- the first state detecting module 281 learns from the first, second, third, and fourth sub-shared resource state detection modules 2101 to 2104 whether the sub-shared resources 1 to 4 operate in the first state. When all of the sub-shared resources 1 to 4 operate in the first state, it is confirmed that the target thread 102 uses each of the sub-shared resources 1, 2, 3, and 4 without blocking; the shared resource 300 as a whole is then in the first state, and the first state detecting module 281 notifies the basic clock counter 282 to count.
- the second state detecting module 283 learns from the first, second, third, and fourth sub-shared resource state detection modules 2101 to 2104 whether the sub-shared resources 1 to 4 operate in the second state. When any of the sub-shared resources 1 to 4 operates in the second state, it is confirmed that the target thread 102 causes one of the sub-shared resources 1, 2, 3, and 4 to block; the shared resource 300 as a whole is then in the second state, and the second state detecting module 283 notifies the waiting clock counter 284 to count.
- the third state detecting module 285 learns from the first, second, third, and fourth sub-shared resource state detection modules 2101 to 2104 whether the sub-shared resources 1 to 4 operate in the third state. When any of the sub-shared resources 1 to 4 operates in the third state, it is confirmed that the non-target thread 103 causes one of the sub-shared resources 1, 2, 3, and 4 to block; the shared resource 300 as a whole is then in the third state, and the third state detecting module 285 notifies the interference clock counter 286 to count.
- because the sub-shared resources are serially connected, in the embodiment of the present application, when no sub-shared resource is blocked, the shared resource 300 is confirmed to operate in the first state; when any sub-shared resource is blocked, the shared resource 300 is confirmed to operate in the second state or the third state, the choice between the two being made by determining whether the source of the sub-shared resource blocking is the target thread 102 or the non-target thread 103. The method of judgment is described in detail below.
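Under the serial-connection assumption just described, the overall working state can be derived from the per-sub-resource observations roughly as follows. This is an illustrative sketch only; representing each sub-resource's status as a `(blocked, blocked_by_target)` pair is an assumption made for the example, and if several sub-resources were blocked at once with different causes, a real implementation would need a tie-breaking rule that the sketch does not model.

```python
def classify(sub_states):
    """Derive the overall working state of the shared resource.

    sub_states: list of (blocked, blocked_by_target) pairs,
                one per sub-shared resource.
    Returns 1, 2, or 3 for the first, second, or third state.
    """
    for blocked, by_target in sub_states:
        if blocked:
            # Any blocked sub-resource puts the whole resource in state 2
            # or 3, depending on which thread caused the blocking.
            return 2 if by_target else 3
    return 1  # no sub-resource blocked: target thread runs without blocking

print(classify([(False, False)] * 4))             # 1
print(classify([(False, False), (True, True)]))   # 2
print(classify([(False, False), (True, False)]))  # 3
```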
- the clock cycle reading module 220 is configured to read the count values of the counting module 240 to obtain the basic, waiting, and interference clock cycle counts. The basic clock cycle count is output by the basic clock counter 282 and is the count value of the shared resource 300 in the first state; the waiting clock cycle count is output by the waiting clock counter 284 and is the count value of the shared resource 300 in the second state; and the interference clock cycle count is output by the interference clock counter 286 and is the count value of the shared resource 300 in the third state.
- the resource allocation quota adjustment module 230 is configured to adjust the allocation quota of the shared resource 300 of the target thread according to the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles.
- FIG. 2 is a flowchart of a shared resource allocation method according to an embodiment of the present application. As shown in FIG. 2, the shared resource allocation method includes the following steps:
- Step 401 The shared resource allocation device 200 obtains a quality of service (QoS) indicator.
- the resource allocation quota adjustment module 230 in the shared resource allocation apparatus 200 stores a QoS indicator in advance, which can be input by the user or set to a default value.
- the QoS indicator is a quality of service indicator for the target thread 102, with a value ranging from 0 to 100%, used to indicate the user's priority requirement for the target thread 102. For example, a QoS indicator of 100% indicates that the user wants the target thread 102 to monopolize the shared resource 300; when the target thread 102 is not using the shared resource 300, it may be released to the non-target thread 103. A QoS indicator of 50% indicates that the user wants the target thread 102 and the non-target thread 103 each to occupy half of the shared resource 300, and a QoS indicator of 0 indicates that the user wants the target thread 102 not to occupy the shared resource 300.
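The comparison that the adjustment step performs against this indicator can be illustrated as follows. `meets_qos` is a hypothetical helper introduced for the example, combining the running-time formulas given earlier with the user-set QoS indicator:

```python
def meets_qos(basic, waiting, interference, qos):
    """True if the target thread's exclusive-to-actual running-time ratio
    reaches the QoS indicator (qos given as a fraction, e.g. 0.5 for 50%)."""
    exclusive = basic + waiting
    actual = basic + waiting + interference
    return exclusive / actual >= qos

# With 600 basic, 200 waiting, and 200 interference cycles, the ratio is 0.8.
print(meets_qos(600, 200, 200, 0.5))  # True  (0.8 >= 0.5)
print(meets_qos(600, 200, 200, 0.9))  # False (0.8 <  0.9)
```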
- Step 402 The shared resource allocation device 200 detects the working state of the shared resource 300 and counts while multiple threads access the shared resource 300 simultaneously.
- the state detecting module 210 of the shared resource allocation apparatus 200 detects the working state of the shared resource 300 while the target thread 102 and the non-target thread 103 simultaneously access the plurality of sub-shared resources 1 to 4 in the shared resource 300 in a serial manner. The counting module 240 cumulatively counts the duration of each working state: the basic clock counter 282 outputs the count value of the shared resource 300 in the first state, the waiting clock counter 284 outputs the count value of the shared resource 300 in the second state, and the interference clock counter 286 outputs the count value of the shared resource 300 in the third state.
- the number of target threads 102 may be one or more, and the number of non-target threads 103 may also be one or more.
- Step 403 The clock cycle reading module 220 of the shared resource allocation device 200 reads the count value from the counting module 240 to obtain the basic clock cycle number, the waiting clock cycle number, and the number of interference clock cycles.
- the number of basic clock cycles is the count value output by the basic clock counter 282
- the number of waiting clock cycles is the count value output by the wait clock counter 284
- the number of interference clock cycles is the count value output by the interference clock counter 286.
- the clock cycle read module 220 acquires the number of basic clock cycles from the basic clock counter 282, the number of wait clock cycles from the wait clock counter 284, and the number of interference clock cycles from the interference clock counter 286.
- because the shared resource 300 is used simultaneously by the target thread 102 and the non-target thread 103, it dynamically switches its working state among the first, second, and third states. Because of this dynamic switching, the duration of each state is not one continuous period of time but a set of discrete periods, so the time spent in the first, second, and third states must be accumulated separately to obtain the proportion of time each state occupies within a predetermined time period.
- the predetermined time period described in this example may be a statistical period.
- the basic, waiting, and interference clock cycle counts are obtained in one statistical period, and the target thread 102's allocation quota of the shared resource 300 is adjusted according to them. The counts are then acquired again in the next statistical period to determine whether, after the adjustment made in the previous statistical period, the counts in the current statistical period satisfy the QoS indicator. The statistical period can be set, for example, between 1 ms and 1 s, depending on the actual application scenario.
- Step 404 The resource allocation quota adjustment module 230 of the shared resource allocation apparatus 200 determines the exclusive running time of the target thread 102 and the actual running time of the target thread 102 according to the basic clock cycle number, the number of waiting clock cycles, and the number of interference clock cycles.
- the exclusive running time of the target thread 102 represents the target thread 102 exclusively using the shared resource 300 without interference from the non-target thread 103; it may therefore be, for example, the sum of the basic and waiting clock cycle counts. In practical applications, unless the target thread 102 is given absolute priority and can monopolize the shared resource 300, it is inevitably affected by the non-target thread 103, so the actual running time of the target thread 102 is the sum of the basic, waiting, and interference clock cycle counts.
- Step 405 The resource allocation quota adjustment module 230 of the shared resource allocation device 200 determines whether the ratio between the exclusive running time and the actual running time is smaller than the QoS indicator. If yes, step 406 is performed; if no, step 407 is performed.
- Step 406 The resource allocation quota adjustment module 230 of the shared resource allocation device 200 increases the allocation quota of the shared resource 300 of the target thread 102 when the ratio between the exclusive running time and the actual running time is less than the quality of service QoS indicator.
- when the ratio between the exclusive running time and the actual running time is less than the QoS indicator, the target thread 102's share of the shared resource 300 has fallen to a level unacceptable to the user, and the shared resource allocation device 200 needs to increase the target thread 102's allocation quota of the shared resource 300.
- the shared resource 300 includes a front-end shared resource and a back-end shared resource.
- the sub-shared resources 1 and 2 in FIG. 1 may be front-end shared resources, and the sub-shared resources 3 and 4 may be back-end shared resources.
- increasing the target thread 102's allocation quota of the shared resource 300 may include: first increasing the target thread 102's allocation quota of the front-end shared resources, and then increasing the target thread 102's allocation quota of the back-end shared resources.
- Because the instructions of both the target thread 102 and the non-target thread 103 use the front-end shared resources before the back-end shared resources, preferentially increasing the allocation quota of the front-end shared resource of the target thread 102 quickly reduces the number of interference clock cycles, so that the ratio between the exclusive running time and the actual running time rapidly increases to a value greater than or equal to the QoS indicator.
- After the allocation quota of the front-end shared resources of the target thread 102 is increased, if the ratio is still smaller than the QoS indicator, increasing the allocation quota of the back-end shared resources of the target thread 102 may be further considered, thereby further reducing the number of interference clock cycles and raising the ratio between the exclusive running time and the actual running time to a value greater than or equal to the QoS indicator.
- the front-end shared resource includes an instruction fetch unit and a first-level instruction cache
- The back-end shared resources include a memory access queue, a first-level data cache, an instruction queue, an instruction reorder buffer, and physical rename registers.
- First increasing the allocation quota of the front-end shared resource of the target thread 102 can immediately reduce the number of interference clock cycles, thereby making the ratio between the exclusive running time and the actual running time increase relatively quickly to a value not less than the QoS indicator.
- Step 407 The resource allocation quota adjustment module 230 of the shared resource allocation device 200 maintains the allocation quota of the shared resource 300 of the target thread 102, and/or increases the allocation quota of the shared resource 300 of the non-target thread 103, when the ratio between the exclusive running time and the actual running time is not less than the quality of service QoS indicator.
- In this case, the resource allocation quota adjustment module 230 does not need to adjust the allocation quota of the shared resource 300 of the target thread 102. In an alternative, the resource allocation quota adjustment module 230 may appropriately increase the allocation quota of the shared resource 300 of the non-target thread 103, thereby effectively utilizing the spare shared resource 300. In another alternative, the resource allocation quota adjustment module 230 may maintain the allocation quota of the shared resource 300 of the target thread 102 while appropriately increasing the allocation quota of the shared resource 300 of the non-target thread 103.
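The decision in steps 405 to 407 reduces to comparing the measured ratio against the QoS indicator. A minimal sketch follows; the returned action strings are illustrative labels, not names from the embodiment:

```python
def adjust_quota(ratio, qos):
    """Sketch of steps 405-407: choose a quota action from the ratio
    between exclusive and actual running time and the QoS indicator."""
    if ratio < qos:
        # Step 406: the target thread's quality of service is not met,
        # so its quota is increased (front-end shared resources first;
        # back-end only if a later measurement still falls short).
        return "increase_target_thread_quota"
    # Step 407: the target thread meets its QoS indicator; its quota is
    # maintained and/or spare capacity goes to non-target threads.
    return "maintain_or_increase_non_target_quota"

action_low = adjust_quota(0.85, 0.9)   # ratio below QoS: step 406
action_ok = adjust_quota(0.95, 0.9)    # ratio meets QoS: step 407
```

In hardware the ratio would be re-measured after each quota change rather than computed once, so this sketch represents a single pass of the decision loop.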
- The embodiment of the present application detects the working state of the shared resource 300 when multiple threads access the shared resource 300 at the same time, and adjusts the allocation quota of the shared resource 300 of the target thread 102 according to the detected working state, thereby ensuring the quality of service of the target thread 102.
- FIG. 3 is another schematic structural diagram of a shared resource allocation system according to an embodiment of the present application.
- the shared resource allocation system shown in FIG. 3 includes a target thread 102 and a non-target thread 103.
- The shared resource 300 includes an instruction fetch unit 301, a first-level instruction cache 302, a memory access queue 306, a first-level data cache 307, an instruction queue 304, an instruction reorder buffer 305, and a physical rename register 303.
- The instruction fetch unit 301 and the first-level instruction cache 302 are front-end shared resources; the memory access queue 306, the first-level data cache 307, the instruction queue 304, the instruction reorder buffer 305, and the physical rename register 303 are back-end shared resources.
- the shared resource allocation apparatus 200 of the embodiment of the present application is applicable to a RISC processor architecture.
- The RISC processor architecture includes the shared resource 300 and the functional modules shown in FIG. 3 (including the decoding unit 501, the rename/allocation unit 502, and the functional unit 503); the application of the shared resource allocation device 200 to the RISC processor architecture is described below as an example.
- The shared resource allocation apparatus 200 of the embodiment of the present application is also applicable to a Complex Instruction Set Computer (CISC) processor architecture or other processor architectures.
- FIG. 3 specifically illustrates the example shown in FIG. 1 in a RISC processor architecture.
- The instruction fetch unit 301, the first-level instruction cache 302, the memory access queue 306, the first-level data cache 307, the instruction queue 304, the instruction reorder buffer 305, and the physical rename register 303 are specific implementations of the sub-shared resources shown in FIG. 1.
- The instructions of the target thread 102 and the non-target thread 103 are placed in the first-level instruction cache 302, and when the first-level instruction cache 302 detects a hit, the hit instruction is sent to the instruction fetch unit 301.
- The instruction fetch unit 301 sends the fetched instruction to the decoding unit 501 for decoding.
- The decoded instruction is reordered in the rename/allocation unit 502 and assigned a corresponding physical rename register 303.
- The rename/allocation unit 502 places the instructions that have been reordered and assigned a corresponding physical rename register 303 into the instruction reorder buffer 305; when the functional unit 503 is idle, an instruction is fetched from the instruction reorder buffer 305 and sent to the instruction queue 304.
- the function unit 503 executes the instruction.
- The rename/allocation unit 502 sends memory access instructions to the memory access queue 306 via the instruction queue 304, and the memory access instructions access the first-level data cache 307 through the memory access queue 306.
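The instruction path just described can be summarized as an ordered list of stages; the list below simply labels the units of FIG. 3 in the order an instruction passes through them:

```python
# Stages an instruction passes through in the RISC pipeline of FIG. 3.
# Memory access instructions additionally pass through the memory
# access queue 306 to reach the first-level data cache 307.
PIPELINE_STAGES = [
    "first-level instruction cache 302",  # instruction hit detected
    "instruction fetch unit 301",         # instruction fetched
    "decoding unit 501",                  # instruction decoded
    "rename/allocation unit 502",         # physical rename register 303 assigned
    "instruction reorder buffer 305",     # reordered instructions buffered
    "instruction queue 304",              # issued when functional unit is idle
    "functional unit 503",                # instruction executed
]
```

Blocking at any shared stage in this list is what the state detecting modules described next attribute either to the target thread or to a non-target thread.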
- The state detecting module 210 is configured to detect the working states of the instruction fetch unit 301, the first-level instruction cache 302, the memory access queue 306, the first-level data cache 307, the instruction queue 304, the instruction reorder buffer 305, and the physical rename register 303, and to count in each state so as to obtain the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles of the target thread 102.
- The state detecting module 210 includes an instruction fetch unit state detecting module 2108, a first-level instruction cache state detecting module 2109, and a physical rename register state detecting module 2110.
- FIG. 4 is a schematic diagram of a connection between a first-level instruction cache and a first-level instruction cache state detection module according to an embodiment of the present application.
- The first-level instruction cache 302 stores a cache address and a thread number 2, and the cache space pointed to by the cache address holds a cached instruction.
- The first-level instruction cache 302 is configured to receive the instruction 1000 from the target thread 102 or the non-target thread 103 and to determine whether the memory access address carried by the received instruction 1000 is consistent with the cache address stored in the cache. If they are consistent, the memory access hits, indicating that the first-level instruction cache 302 has no memory access conflict; the first-level instruction cache 302 caches the received instruction and generates a hit enable signal at a high level (hereinafter, 1 indicates a high level).
- If they are inconsistent, the first-level instruction cache 302 has a memory access conflict, and the first-level instruction cache 302 generates a hit enable signal at a low level (hereinafter, 0 indicates a low level).
- The memory access address is calculated by the first-level instruction cache 302 according to the check bit, the thread number, and the instruction content carried by the instruction 1000; the specific calculation manner is not relevant to the embodiments of the present application and is not described herein.
- the first-level instruction cache 302 caches the instruction 1000.
- The first-level instruction cache state detecting module 2109 needs to determine whether the instruction 1000 being cached in the first-level instruction cache 302 belongs to the target thread 102 or to the non-target thread 103, i.e., whether the target thread 102 is using the first-level instruction cache 302.
- The first determining unit 224 determines whether the thread number carried by the instruction 1000 received by the first-level instruction cache 302 is consistent with the target thread number. If it is, the instruction 1000 belongs to the target thread 102, that is, the target thread 102 can use the first-level instruction cache 302 without blocking, and the first determining unit 224 outputs 1. If it is not, the instruction belongs to the non-target thread 103, the non-target thread 103 can use the first-level instruction cache 302 without blocking, and the first determining unit 224 outputs 0.
- When the first-level instruction cache 302 is not blocked, the target thread 102 can use the first-level instruction cache 302 without blocking; at this time, the first state output interface 223 outputs 1, the second state output interface 221 outputs 0, and the third state output interface 222 outputs 0.
- When the first-level instruction cache 302 is blocked, the first-level instruction cache state detecting module 2109 further determines whether the source of the blocking is an instruction of the target thread 102 or an instruction of the non-target thread 103. Specifically, the first determining unit 224 compares the thread number carried by the instruction 1000 received by the first-level instruction cache 302 with the target thread number. If the two are consistent, the thread waiting for the first-level instruction cache 302 is the target thread 102, and 1 is output; if the two are inconsistent, the thread waiting for the first-level instruction cache 302 is the non-target thread 103, and 0 is output.
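Assuming single-bit signals as in FIG. 4, the behaviour of the first-level instruction cache state detecting module 2109 can be modelled roughly as follows. This is a software analogy of the detector, not the circuit itself, and the function and parameter names are illustrative:

```python
def icache_state_outputs(hit_enable, instr_thread_id, target_thread_id):
    """Rough model of module 2109's three state output interfaces.

    Returns (first_state, second_state, third_state):
      first_state  - target thread uses the cache without blocking
      second_state - target thread is the source of the blocking
      third_state  - a non-target thread is the source of the blocking
    """
    is_target = 1 if instr_thread_id == target_thread_id else 0
    if hit_enable == 1:
        # No blocking: the first state is asserted only when the
        # instruction being cached belongs to the target thread.
        return (is_target, 0, 0)
    # Blocked: attribute the blocking to the target thread (second
    # state) or to a non-target thread (third state) by thread number.
    return (0, is_target, 1 - is_target)
```

For example, a cache miss on a non-target thread's instruction asserts only the third state output, which later feeds the interference clock counter.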
- FIG. 5 is a schematic diagram of the connection between an instruction fetch unit and an instruction fetch unit state detecting module according to an embodiment of the present application.
- The instruction fetch unit 301 is configured to obtain the instruction 1000 from the first-level instruction cache 302.
- The instruction fetch unit state detecting module 2108 is configured to compare the thread number carried by the instruction 1000 acquired by the instruction fetch unit 301 with the target thread number.
- Specifically, the second determining unit 213 compares the thread number 1 carried by the instruction 1000 acquired by the instruction fetch unit 301 with the target thread number. If they are consistent, the second determining unit 213 outputs 1, the second state output interface 211 outputs 1, and the third state output interface 212 outputs 0; if they are inconsistent, the second determining unit 213 outputs 0, the second state output interface 211 outputs 0, and the third state output interface 212 outputs 1.
- FIG. 6 is a schematic diagram of a connection between a physical rename register and a physical rename register state detection module according to an embodiment of the present application.
- When the physical rename register 303 is not blocked, the instruction 1000 can use the physical rename register 303 without waiting; in this case, the physical rename register 303 sets the physical rename register blocking set signal to 0, and the first state output interface 233 outputs 1.
- When the physical rename register 303 is blocked, the physical rename register 303 sets the physical rename register blocking set signal to 1, and the first state output interface 233 outputs 0.
- The third determining unit 234 is enabled by the physical rename register blocking set signal: when the blocking set signal is 1, the third determining unit 234 operates, and when the blocking set signal is 0, the third determining unit 234 does not operate.
- the physical rename register state detecting module 2110 needs to further determine whether the instruction 1000 causing the physical rename register 303 to block is an instruction of the target thread 102 or an instruction of the non-target thread 103.
- The third determining unit 234 determines whether the thread number carried by the instruction 1000 is consistent with the target thread number. If it is, an instruction of the target thread 102 caused the physical rename register 303 to block, and the third determining unit 234 outputs 1; if it is not, an instruction of the non-target thread 103 caused the physical rename register 303 to block, and the third determining unit 234 outputs 0.
- When the third determining unit 234 outputs 1, the second state output interface 231 outputs 1 and the third state output interface 232 outputs 0; when the third determining unit 234 outputs 0, the third state output interface 232 outputs 1 and the second state output interface 231 outputs 0.
- FIG. 7 to FIG. 9 respectively introduce an instruction reorder buffer state detecting module 2111, an instruction queue state detecting module 2105, and a memory access queue state detecting module 2106, each of which has a structure and function similar to those of the physical rename register state detecting module 2110 and differs only in the source of the blocking set signal.
- The instruction reorder buffer blocking set signal shown in FIG. 7 is 1 when the instruction reorder buffer 305 is blocked, and 0 otherwise.
- The instruction queue blocking set signal shown in FIG. 8 is 1 when the instruction queue 304 is blocked, and 0 otherwise.
- The memory access queue blocking set signal shown in FIG. 9 is 1 when the memory access queue 306 is blocked, and 0 otherwise.
- the above modules are similar to the physical rename register state detecting module 2110, and are not described herein.
- FIG. 10 introduces a primary data cache state detecting module 2107.
- The first-level data cache state detecting module 2107 is similar to the first-level instruction cache state detecting module 2109 shown in FIG. 4, and details are not described herein.
- FIG. 11 is a schematic diagram of a connection relationship between a first state detecting module and a basic clock counter according to an embodiment of the present application.
- the first state detecting module 281 of FIG. 1 is specifically implemented as an AND gate 281 .
- The input terminals of the AND gate 281 are respectively connected to the first state output interfaces 223, 233, 243, 253, 263, and 273 shown in FIG. 4 and FIG. 6 to FIG. 10, and the output signal of the AND gate 281 enables the basic clock counter 282.
- The output level of the AND gate 281 indicates the first state: when the output level of the AND gate 281 is 1, the shared resource 300 is in the first state; when the output level of the AND gate 281 is 0, the shared resource 300 is not in the first state.
- FIG. 12 is a schematic diagram of a connection relationship between a second state detecting module and a waiting clock counter according to an embodiment of the present application.
- The second state detecting module 283 shown in FIG. 1 is specifically implemented as an OR gate 283. The input terminals of the OR gate 283 are respectively connected to the second state output interfaces 211, 221, 231, 241, 251, 261, and 271 shown in FIG. 4 to FIG. 10, and the output signal of the OR gate 283 enables the waiting clock counter 284: when any second state output interface outputs 1, the waiting clock counter 284 counts, and when all second state output interfaces output 0, the waiting clock counter 284 does not count.
- The output level of the OR gate 283 indicates the second state: when the output level of the OR gate 283 is 1, the shared resource 300 is in the second state; when the output level of the OR gate 283 is 0, the shared resource 300 is not in the second state.
- FIG. 13 is a connection diagram of a third state detecting module and an interference clock counter according to an embodiment of the present application.
- The third state detecting module 285 shown in FIG. 1 is specifically implemented as an OR gate 285. The input terminals of the OR gate 285 are respectively connected to the third state output interfaces 212, 222, 232, 242, 252, 262, and 272 shown in FIG. 4 to FIG. 10, and the output signal of the OR gate 285 enables the interference clock counter 286: when any third state output interface outputs 1, the interference clock counter 286 counts, and when all third state output interfaces output 0, the interference clock counter 286 does not count.
- The output level of the OR gate 285 indicates the third state: when the output level of the OR gate 285 is 1, the shared resource 300 is in the third state; when the output level of the OR gate 285 is 0, the shared resource 300 is not in the third state.
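Taking FIG. 11 to FIG. 13 together, the gate-and-counter logic can be modelled per clock cycle as follows. This is a software analogy of the hardware, with the per-cycle interface values passed in as lists of 0/1 bits; the class and method names are illustrative:

```python
class StateCounters:
    """Software model of the AND/OR gates and counters of FIGs. 11-13."""

    def __init__(self):
        self.basic = 0         # basic clock counter 282, enabled by AND gate 281
        self.waiting = 0       # waiting clock counter 284, enabled by OR gate 283
        self.interference = 0  # interference clock counter 286, enabled by OR gate 285

    def tick(self, first_ifaces, second_ifaces, third_ifaces):
        # AND gate 281: all first state output interfaces must be 1
        # for the shared resource to be in the first state.
        if all(first_ifaces):
            self.basic += 1
        # OR gate 283: any second state output interface at 1 means
        # the target thread is waiting somewhere in the pipeline.
        if any(second_ifaces):
            self.waiting += 1
        # OR gate 285: any third state output interface at 1 means
        # a non-target thread is interfering somewhere.
        if any(third_ifaces):
            self.interference += 1
```

For example, a cycle in which every first state interface outputs 1 increments only the basic clock counter, while a cycle with one third state interface at 1 increments only the interference clock counter.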
- The basic clock counter 282 counts the accumulated time of the shared resource 300 in the first state to obtain the number of basic clock cycles, the waiting clock counter 284 counts the accumulated time of the shared resource 300 in the second state to obtain the number of waiting clock cycles, and the interference clock counter 286 counts the accumulated time of the shared resource 300 in the third state to obtain the number of interference clock cycles.
- The clock cycle number reading module acquires the number of basic clock cycles from the basic clock counter 282, the number of waiting clock cycles from the waiting clock counter 284, and the number of interference clock cycles from the interference clock counter 286.
- The resource allocation quota adjustment module 230 adjusts the allocation quota of the shared resources of the target thread according to the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles.
- The resource allocation quota adjustment module 230 can adjust the allocation quota of the shared resources of the target thread in the following manner: adjusting the allocation quota for the instructions of the target thread in the first-level instruction cache 302, the first-level data cache 307, and the instruction reorder buffer 305, or adjusting the allocation quota for the instructions of the target thread in the physical rename register 303 and the instruction fetch resources.
- The shared resource allocation method and apparatus provided by the embodiments of the present application can detect the working state of a target thread when multiple threads access a shared resource at the same time, and adjust the allocation quota of the shared resource of the target thread according to the detected working state, thereby ensuring the quality of service of the target thread.
- The shared resource allocation device may be implemented by an integrated circuit including logic gates.
- The shared resource allocation device may also be implemented by a Field Programmable Gate Array (FPGA), specifically by program instructions written to the FPGA chip to achieve the corresponding functions.
- FIG. 14 is a schematic diagram of a hardware structure of a shared resource allocation apparatus according to an embodiment of the present application.
- The shared resource allocation apparatus 200 includes a memory 501, a bus 503, and a processor 502.
- The memory 501 and the processor 502 are connected to the bus 503; the memory 501 stores program instructions, and the processor 502 runs the program instructions to implement the functions of the shared resource allocation device 200 described above.
- Any of the device embodiments described above is merely illustrative: the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network elements.
- Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
- The connection relationships between modules indicate that they have communication connections with each other, which may specifically be implemented as one or more communication buses or signal lines.
- The storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, or the like, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the various embodiments of the present application.
Description
The present application relates to the field of information technology, and in particular, to a shared resource allocation method and apparatus.
With the development of the Internet and cloud computing, more and more threads are being migrated from local machines to cloud data center servers. To reduce the hardware overhead of a data center server and improve its resource utilization, a mixed-deployment strategy is generally adopted, in which multiple threads are deployed together on the same data center server. Because the mixed-deployed threads need to share the hardware and software resources (i.e., the shared resources) of the data center server, resource competition occurs, making the performance of the mixed-deployed threads unpredictable and seriously affecting the quality of service of target threads (such as high-priority threads).
Summary of the Invention
To solve the problems of the prior art, the embodiments of the present application provide a shared resource allocation method and apparatus, which detect the working state of a target thread when multiple threads access a shared resource at the same time and adjust the allocation quota of the shared resource of the target thread according to the detected working state, thereby ensuring the quality of service of the target thread.
In a first aspect, an embodiment of the present application provides a shared resource allocation method, including: detecting the working state of a shared resource when multiple threads access the shared resource at the same time and counting in each working state, where the working states include a first state in which the target thread uses the shared resource without blocking, a second state in which the target thread causes the shared resource to block, and a third state in which a non-target thread causes the shared resource to block; reading the count values to obtain the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles, where the number of basic clock cycles is the count value of the shared resource in the first state, the number of waiting clock cycles is the count value of the shared resource in the second state, and the number of interference clock cycles is the count value of the shared resource in the third state; and adjusting the allocation quota of the shared resource of the target thread according to the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles.
In the embodiments of the present application, because the target thread's use of the shared resource is detected, the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles of the target thread are counted separately, and the allocation quota of the shared resource of the target thread is adjusted according to these three counts, so that the allocation quota can be dynamically adjusted according to the working state of the shared resource, thereby ensuring the quality of service of the target thread.
With reference to the first aspect, in a first possible implementation of the first aspect, the shared resource includes multiple sub-shared resources, and the target thread and the non-target thread access the multiple sub-shared resources serially. When the target thread uses any of the multiple sub-shared resources without blocking, the shared resource is in the first state; when the target thread causes any of the multiple sub-shared resources to block, the shared resource is in the second state; and when a non-target thread causes any of the multiple sub-shared resources to block, the shared resource is in the third state.
When multiple sub-shared resources are accessed by multiple threads in a pipelined manner, detecting the working state of each sub-shared resource and integrating these states into a global working state of the shared resource allows the target thread's occupancy of the shared resource to be judged as a whole, which can effectively improve the accuracy of shared resource quota adjustment.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, adjusting the allocation quota of the shared resource of the target thread according to the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles specifically includes: determining the exclusive running time of the target thread and the actual running time of the target thread according to the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles, and increasing the allocation quota of the shared resource of the target thread when the ratio between the exclusive running time and the actual running time is smaller than the quality of service (QoS) indicator.
Because the exclusive running time and the actual running time are obtained from the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles, they directly reflect the target thread's use of the shared resource. By obtaining the QoS indicator set by the user and comparing the ratio between the exclusive running time and the actual running time with the QoS indicator, it can be determined whether the target thread's use of the shared resource meets the user's expectation. When the expectation is met, the allocation of the shared resource to the target thread is maintained, or the allocation of the shared resource to non-target threads is increased, which improves the utilization of the shared resource while preserving the performance of the target thread; when the expectation is not met, increasing the allocation quota of the shared resource of the target thread ensures the quality of service of the target thread.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the allocation quota of the shared resource of the target thread is maintained when the ratio between the exclusive running time and the actual running time is not smaller than the quality of service QoS indicator.
With reference to the second possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the allocation quota of the shared resource of the non-target thread is increased when the ratio between the exclusive running time and the actual running time is not smaller than the quality of service QoS indicator.
In a possible implementation, determining the exclusive running time of the target thread and the actual running time of the target thread according to the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles specifically includes: summing the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles to obtain the actual running time, and summing the number of basic clock cycles and the number of waiting clock cycles to obtain the exclusive running time.
The exclusive running time of the target thread represents the time the target thread would take if it used the shared resource exclusively, without interference from non-target threads; the exclusive running time can therefore be, for example, the sum of the number of basic clock cycles and the number of waiting clock cycles. In practice, in a multi-thread mixed-running scenario where the target thread is not given a priority that lets it monopolize the shared resource, the target thread is inevitably affected by non-target threads, so the actual running time of the target thread should be the sum of the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles.
In a possible implementation, the shared resource includes a front-end shared resource and a back-end shared resource, and increasing the allocation quota of the shared resource of the target thread specifically includes: increasing the allocation quota of the front-end shared resource of the target thread, and, if the ratio between the exclusive running time and the actual running time is still smaller than the quality of service QoS indicator after the front-end quota is increased, increasing the allocation quota of the back-end shared resource of the target thread.
Because the instructions of both the target thread and the non-target threads use the front-end shared resources before the back-end shared resources, preferentially increasing the allocation quota of the front-end shared resource of the target thread quickly reduces the number of interference clock cycles, so that the ratio between the exclusive running time and the actual running time rapidly increases to a value greater than or equal to the QoS indicator.
在一种可能的实现方式中,前端共享资源包括取指单元和一级指令缓存,后端共享资源包括访存队列、一级数据缓存、指令队列、指令重排序缓存和物理重命名寄存器。In a possible implementation manner, the front-end shared resource includes an fetch unit and a first-level instruction cache, and the back-end shared resource includes a fetch queue, a first-level data cache, an instruction queue, an instruction reorder cache, and a physical rename register.
In a second aspect, an embodiment of this application provides a thread resource allocation apparatus, including: a counting module, configured to detect the working state of a shared resource while multiple threads access it simultaneously, where the working state includes a first state in which the target thread uses the shared resource without blocking, a second state in which the target thread causes the shared resource to block, and a third state in which a non-target thread causes the shared resource to block, and further configured to count in each working state; a clock cycle reading module, configured to read the count values to obtain the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles, where the number of basic clock cycles is the count value of the shared resource in the first state, the number of waiting clock cycles is the count value in the second state, and the number of interference clock cycles is the count value in the third state; and a resource allocation quota adjustment module, configured to adjust the target thread's allocation quota of the shared resource according to the numbers of basic, waiting, and interference clock cycles.

The second aspect, or any implementation of the second aspect, is the apparatus counterpart of the first aspect or any implementation of the first aspect; the descriptions given for the first aspect and its implementations also apply to the second aspect and its implementations and are not repeated here.

In a third aspect, an embodiment of this application provides a thread resource allocation apparatus including a processor and a memory, where the memory stores program instructions and the processor runs the program instructions to perform the steps of the first aspect and its various possible implementations.

A fourth aspect of this application provides a multi-thread control device including at least one processing element (or chip) configured to perform the method of the first aspect.

A fifth aspect of this application provides a program that, when executed by a processor, performs the method of the first aspect and its various possible implementations.

A sixth aspect of this application provides a program product, for example a computer-readable storage medium, including the program of the fifth aspect.

A seventh aspect of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method of the first aspect.
FIG. 1 is a schematic structural diagram of a shared resource allocation system according to an embodiment of this application;

FIG. 2 is a flowchart of a shared resource allocation method according to an embodiment of this application;

FIG. 3 is another schematic structural diagram of a shared resource allocation system according to an embodiment of this application;

FIG. 4 is a schematic diagram of the connection between a level-1 instruction cache and a level-1 instruction cache state detection module according to an embodiment of this application;

FIG. 5 is a schematic diagram of the connection between an instruction fetch unit and an instruction fetch unit state detection module according to an embodiment of this application;

FIG. 6 is a schematic diagram of the connection between physical rename registers and a physical rename register state detection module according to an embodiment of this application;

FIG. 7 is a schematic diagram of the connection between an instruction reorder buffer and an instruction reorder buffer state detection module according to an embodiment of this application;

FIG. 8 is a schematic diagram of the connection between an instruction queue and an instruction queue state detection module according to an embodiment of this application;

FIG. 9 is a schematic diagram of the connection between a load/store queue and a load/store queue state detection module according to an embodiment of this application;

FIG. 10 is a schematic diagram of the connection between a level-1 data cache and a level-1 data cache state detection module according to an embodiment of this application;

FIG. 11 is a schematic diagram of the connection between a first state detection module and a basic clock counter according to an embodiment of this application;

FIG. 12 is a schematic diagram of the connection between a second state detection module and a waiting clock counter according to an embodiment of this application;

FIG. 13 is a schematic diagram of the connection between a third state detection module and an interference clock counter according to an embodiment of this application;

FIG. 14 is a schematic diagram of the hardware structure of a shared resource allocation apparatus according to an embodiment of this application.
First, refer to FIG. 1, a schematic structural diagram of a shared resource allocation system according to an embodiment of this application. As shown in FIG. 1, the shared resource allocation system 100 includes a target thread 102, a non-target thread 103, a shared resource 300, and a shared resource allocation apparatus 200.

In this embodiment, the target thread 102 is a thread whose quality of service must be guaranteed: it has a tight latency requirement and must be given sufficient shared resources. The non-target thread 103 is a thread whose quality of service need not be guaranteed and whose latency requirement may be looser than that of the target thread; the shared resources occupied by the non-target thread 103 may be sacrificed provided the quality of service of the target thread 102 is preserved.

Specifically, the target thread 102 and the non-target thread 103 each include at least one instruction, and every instruction carries instruction content and a thread ID identifying the thread it belongs to. In this embodiment, instructions of the target thread 102 carry the target thread ID and instructions of the non-target thread 103 carry a non-target thread ID. The target thread 102 and the non-target thread 103 access the shared resource 300 simultaneously, and depending on how the target thread 102 and/or the non-target thread 103 use it, the shared resource 300 can be in different working states.

The working states of the shared resource 300 include a first state in which the target thread 102 uses the shared resource 300 without blocking, a second state in which the target thread 102 causes the shared resource 300 to block, and a third state in which the non-target thread 103 causes the shared resource 300 to block.

Specifically, the first state is the state in which, when the shared resource 300 is plentiful, instructions of the target thread 102 can use it directly without waiting. The second state is the state in which, when the shared resource 300 is insufficient, some instructions of the target thread 102 use it while other instructions of the target thread 102 wait for it; notably, in the second state the other instructions of the target thread 102 wait because the shared resource 300 is already occupied by instructions of the target thread 102 itself, which causes the blocking. The third state is the state in which, when the shared resource 300 is insufficient, instructions of the non-target thread 103 use it while instructions of the target thread 102 wait for it; in the third state, the instructions of the target thread 102 wait because the shared resource 300 is already occupied by instructions of the non-target thread 103, which causes the blocking.
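The three-way classification above can be sketched as follows (the boolean inputs and all names are hypothetical; in the patent these conditions are detected by hardware state detection modules, not software):

```python
from enum import Enum

class State(Enum):
    FIRST = 1   # target thread uses the shared resource without blocking
    SECOND = 2  # target thread's own instructions cause the blocking
    THIRD = 3   # a non-target thread's instructions cause the blocking

def classify(resource_blocked: bool, blocking_thread_is_target: bool) -> State:
    """Classify the working state of a shared resource for the target thread."""
    if not resource_blocked:
        return State.FIRST
    return State.SECOND if blocking_thread_is_target else State.THIRD
```

The second argument is only consulted when the resource is blocked, mirroring the text: the first state needs no thread-ID comparison, while distinguishing the second state from the third does.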
In this embodiment, the shared resource allocation apparatus 200 is connected to the shared resource 300, detects the working state of the shared resource 300, and adjusts, according to the detected working state, the allocation quota of the shared resource 300 available to the target thread 102, so that the shared-resource quota available to the target thread 102 matches the target thread's working state.

Referring further to FIG. 1, the shared resource includes a plurality of sub-shared resources 1, 2, 3 and 4, which the target thread 102 and the non-target thread 103 access in a serial manner. When the target thread 102 uses any of the sub-shared resources 1, 2, 3 and 4 without blocking, the shared resource 300 is in the first state; when the target thread 102 causes any of them to block, the shared resource 300 is in the second state; and when the non-target thread 103 causes any of them to block, the shared resource 300 is in the third state.

It should be noted that in this embodiment the number of sub-shared resources is not limited to the number shown in FIG. 1; in alternative embodiments it may be set according to actual needs.

The shared resource allocation apparatus 200 includes a state detection module 210, a counting module 240, a clock cycle reading module 220, and a resource allocation quota adjustment module 230. The counting module 240 includes a basic clock counter 282, a waiting clock counter 284, and an interference clock counter 286; the state detection module 210 includes a first sub-shared-resource state detection module 2101, a second sub-shared-resource state detection module 2102, a third sub-shared-resource state detection module 2103, and a fourth sub-shared-resource state detection module 2104. In this embodiment, one sub-shared-resource state detection module is provided for each sub-shared resource and detects the working state of its corresponding sub-shared resource, where the working state includes the first state, the second state, and the third state.
The state detection module 210 further includes a first state detection module 281, a second state detection module 283, and a third state detection module 285. The first state detection module 281 learns from the first through fourth sub-shared-resource state detection modules 2101 to 2104 whether the sub-shared resources 1 to 4 are each operating in the first state; when all of the sub-shared resources 1 to 4 operate in the first state, it confirms that the target thread 102 is using each of the sub-shared resources 1, 2, 3 and 4 without blocking, the shared resource 300 as a whole is then in the first state, and the first state detection module 281 instructs the basic clock counter 282 to count.

Further, the second state detection module 283 learns from the sub-shared-resource state detection modules 2101 to 2104 whether the sub-shared resources 1 to 4 are operating in the second state; when any of the sub-shared resources 1 to 4 operates in the second state, it confirms that the target thread 102 has caused one of the sub-shared resources 1, 2, 3 and 4 to block, the shared resource 300 as a whole is then in the second state, and the second state detection module 283 instructs the waiting clock counter 284 to count.

Similarly, the third state detection module 285 learns from the sub-shared-resource state detection modules 2101 to 2104 whether the sub-shared resources 1 to 4 are operating in the third state; when any of the sub-shared resources 1 to 4 operates in the third state, it confirms that the non-target thread 103 has caused one of the sub-shared resources 1, 2, 3 and 4 to block, the shared resource 300 as a whole is then in the third state, and the third state detection module 285 instructs the interference clock counter 286 to count.

It should be noted that, because the sub-shared resources are connected in series, in this embodiment the shared resource 300 can be confirmed to be in the first state only when every sub-shared resource is unblocked, whereas when any sub-shared resource is blocked, the shared resource 300 is in the second or third state; which of the two can be determined by checking whether the source of the blocking is the target thread 102 or the non-target thread 103, as described in detail below.
The clock cycle reading module 220 is configured to read the count values of the counting module 240 to obtain the numbers of basic, waiting, and interference clock cycles, where the number of basic clock cycles, output by the basic clock counter 282, is the count value of the shared resource 300 in the first state; the number of waiting clock cycles, output by the waiting clock counter 284, is the count value of the shared resource 300 in the second state; and the number of interference clock cycles, output by the interference clock counter 286, is the count value of the shared resource 300 in the third state.

The resource allocation quota adjustment module 230 is configured to adjust the allocation quota of the shared resource 300 for the target thread according to the numbers of basic, waiting, and interference clock cycles.

For further clarity, refer to FIG. 2, a flowchart of a shared resource allocation method according to an embodiment of this application. As shown in FIG. 2, the shared resource allocation method includes the following steps:
Step 401: The shared resource allocation apparatus 200 obtains a quality-of-service (QoS) indicator.

Specifically, the resource allocation quota adjustment module 230 in the shared resource allocation apparatus 200 stores the QoS indicator in advance; the QoS indicator may be entered by a user or set to a default value.

The QoS indicator is a quality-of-service indicator for the target thread 102, with a value between 0 and 100%, expressing the priority the user requires for the target thread 102. For example, a QoS indicator of 100% means that the user wants the target thread 102 to monopolize the shared resource 300, which is released to the non-target thread 103 only after the target thread 102 has finished using it; a QoS indicator of 50% means that the user wants the target thread 102 and the non-target thread 103 to each occupy half of the shared resource 300; and a QoS indicator of 0 means that the user wants the target thread 102 to occupy none of the shared resource 300.
Step 402: The shared resource allocation apparatus 200 detects the working state of the shared resource 300, and counts, while multiple threads access the shared resource 300 simultaneously.

Specifically, as described with reference to FIG. 1, the state detection module 210 of the shared resource allocation apparatus 200 detects the working state of the shared resource 300 while the target thread 102 and the non-target thread 103 simultaneously access the sub-shared resources 1 to 4 of the shared resource 300 in a serial manner, and the counting module 240 accumulates the duration of each working state: the basic clock counter 282 outputs the count value of the shared resource 300 in the first state, the waiting clock counter 284 outputs the count value in the second state, and the interference clock counter 286 outputs the count value in the third state.

It should be noted that in this embodiment there may be one or more target threads 102 and one or more non-target threads 103.
Step 403: The clock cycle reading module 220 of the shared resource allocation apparatus 200 reads the count values from the counting module 240 to obtain the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles.

The number of basic clock cycles is the count value output by the basic clock counter 282, the number of waiting clock cycles is the count value output by the waiting clock counter 284, and the number of interference clock cycles is the count value output by the interference clock counter 286.

Specifically, the clock cycle reading module 220 obtains the number of basic clock cycles from the basic clock counter 282, the number of waiting clock cycles from the waiting clock counter 284, and the number of interference clock cycles from the interference clock counter 286.

In this embodiment, because the shared resource 300 is used by the target thread 102 and the non-target thread 103 at the same time, it switches dynamically among the first, second, and third states. Owing to this dynamic switching, the duration of each state is not one continuous interval but a set of discrete intervals; the cumulative time spent in each of the three states must therefore be counted separately to obtain the proportion of a predetermined period occupied by each state.

The predetermined period may be, for example, one statistics period. In this embodiment, after the numbers of basic, waiting, and interference clock cycles are obtained in one statistics period, the allocation quota of the shared resource 300 for the target thread 102 is adjusted according to those three counts; in the next statistics period the three counts are collected again, to further determine whether, after the adjustment made in the previous statistics period, the counts obtained in the current statistics period satisfy the QoS indicator.
Depending on the actual application scenario, the statistics period may be set, for example, between 1 ms and 1 s.
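Under the assumption that the three cycle counts are sampled once per statistics period, the period-by-period decision of steps 404 to 407 can be sketched as follows (the function and its input format are illustrative only, not part of the patent):

```python
def run_periods(samples, qos):
    """Simulate successive statistics periods.

    `samples` is a list of (basic, waiting, interference) cycle counts,
    one tuple per period (hypothetical data; real counts would come from
    the hardware counters). Returns the decision taken in each period.
    """
    decisions = []
    for basic, waiting, interference in samples:
        exclusive = basic + waiting                  # step 404
        actual = basic + waiting + interference      # step 404
        if exclusive < qos * actual:                 # step 405 comparison
            decisions.append("increase target quota")        # step 406
        else:
            decisions.append("maintain / grow non-target quota")  # step 407
    return decisions
```

For example, with a QoS indicator of 95%, a period measuring (80, 10, 10) cycles gives a ratio of 90/100 and triggers an increase, while a later period measuring (80, 10, 2) gives 90/92 and leaves the target thread's quota unchanged.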
Step 404: The resource allocation quota adjustment module 230 of the shared resource allocation apparatus 200 determines, from the numbers of basic, waiting, and interference clock cycles, the exclusive running time of the target thread 102 and the actual running time of the target thread 102.

The exclusive running time of the target thread 102 represents the target thread 102 using the shared resource 300 exclusively, free of interference from the non-target thread 103; it may therefore be taken, for example, as the sum of the number of basic clock cycles and the number of waiting clock cycles.

In practice, however, in a multi-thread mixed-running scenario where the target thread 102 has not been given a priority setting that lets it monopolize the shared resource 300, the target thread 102 is inevitably affected by the non-target thread 103, so the actual running time of the target thread 102 is the sum of the numbers of basic, waiting, and interference clock cycles.
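Step 404 reduces to two additions over the counter values; a minimal sketch (the function name is illustrative, not from the patent):

```python
def runtimes(basic, waiting, interference):
    """Exclusive and actual running times of the target thread, in clock
    cycles, as defined in step 404."""
    exclusive = basic + waiting              # no non-target interference
    actual = basic + waiting + interference  # interference included
    return exclusive, actual
```

The ratio `exclusive / actual`, compared against the QoS indicator in step 405, then follows directly.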
Step 405: The resource allocation quota adjustment module 230 of the shared resource allocation apparatus 200 determines whether the ratio of the exclusive running time to the actual running time is smaller than the QoS indicator; if yes, step 406 is performed, and if no, step 407 is performed.

Step 406: When the ratio of the exclusive running time to the actual running time is smaller than the QoS indicator, the resource allocation quota adjustment module 230 of the shared resource allocation apparatus 200 increases the allocation quota of the shared resource 300 for the target thread 102.

When the ratio of the exclusive running time to the actual running time is smaller than the QoS indicator, the share of the shared resource 300 occupied by the target thread 102 is at a level unacceptable to the user, and the shared resource allocation apparatus 200 needs to increase the target thread's allocation quota of the shared resource 300.
In some examples, the shared resource 300 includes front-end shared resources and back-end shared resources. Taking FIG. 1 as an example, the sub-shared resources 1 and 2 may be front-end shared resources and the sub-shared resources 3 and 4 back-end shared resources. In this step, increasing the allocation quota of the shared resource 300 for the target thread 102 may specifically include: increasing the target thread's front-end quota, and then, if the ratio of the exclusive running time to the actual running time is still smaller than the QoS indicator, increasing the target thread's back-end quota.

Because the instructions of both the target thread 102 and the non-target thread 103 use the front-end shared resources before the back-end shared resources, preferentially increasing the target thread's front-end quota reduces the number of interference clock cycles relatively quickly, so that the ratio of the exclusive running time to the actual running time rises rapidly to a value greater than or equal to the QoS indicator.

If, after the target thread's front-end quota has been increased, the ratio still fails to reach the QoS indicator, increasing the target thread's back-end quota can be considered next, further reducing the number of interference clock cycles so that the ratio of the exclusive running time to the actual running time grows to a value greater than or equal to the QoS indicator.
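The two-stage increase of step 406 can be sketched as follows, assuming a `measure_ratio` callback that re-measures the exclusive-to-actual ratio after an adjustment (in the patent this re-measurement happens over the next statistics period; all names here are hypothetical):

```python
def raise_quota(measure_ratio, front_end, back_end, qos, step=1):
    """Front-end-first quota increase: raise the front-end quota, and fall
    back to raising the back-end quota only if the QoS ratio is still
    not met afterwards."""
    front_end["quota"] += step
    if measure_ratio() < qos:        # still below the QoS indicator?
        back_end["quota"] += step    # then also grow the back-end quota
```

Modeling the quotas as mutable dicts keeps the sketch self-contained; a hardware implementation would instead reprogram partition registers of the individual sub-shared resources.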
For example, in a reduced instruction set computer (RISC) processor architecture, the front-end shared resources include the instruction fetch unit and the level-1 instruction cache, and the back-end shared resources include the load/store queue, the level-1 data cache, the instruction queue, the instruction reorder buffer, and the physical rename registers.

Because instructions of the target thread 102 access the front-end shared resources before the back-end shared resources, increasing the target thread's front-end quota first can immediately reduce the number of interference clock cycles, so that the ratio of the exclusive running time to the actual running time grows comparatively quickly to a value not smaller than the QoS indicator.

Step 407: When the ratio of the exclusive running time to the actual running time is not smaller than the QoS indicator, the resource allocation quota adjustment module 230 of the shared resource allocation apparatus 200 maintains the allocation quota of the shared resource 300 for the target thread 102 and/or increases the allocation quota of the shared resource 300 for the non-target thread 103.

In this step, when the ratio of the exclusive running time to the actual running time is not smaller than the QoS indicator, the share of the shared resource 300 occupied by the target thread 102 is at a level acceptable to the user. In one option, the resource allocation quota adjustment module 230 makes no adjustment to the target thread's quota; in another option, it may appropriately increase the non-target thread's quota so as to make effective use of spare shared resources 300; and in yet another option, it may maintain the target thread's quota while also appropriately increasing the non-target thread's quota.

In summary, this embodiment of the application detects the working state of the shared resource 300 while multiple threads access it simultaneously, and adjusts the allocation quota of the shared resource 300 for the target thread 102 according to the detected working state, thereby guaranteeing the quality of service of the target thread 102.
For further clarity, refer to FIG. 3, another schematic structural diagram of a shared resource allocation system according to an embodiment of this application. The shared resource allocation system shown in FIG. 3 includes a target thread 102, a non-target thread 103, a shared resource 300, a shared resource allocation apparatus 200, and functional modules, where the functional modules include a decoding unit 501, a rename/allocation unit 502, and a functional unit 503. The shared resource 300 includes an instruction fetch unit 301, a level-1 instruction cache 302, a load/store queue 306, a level-1 data cache 307, an instruction queue 304, an instruction reorder buffer 305, and physical rename registers 303. The instruction fetch unit 301 and the level-1 instruction cache 302 are front-end shared resources; the load/store queue 306, the level-1 data cache 307, the instruction queue 304, the instruction reorder buffer 305, and the physical rename registers 303 are back-end shared resources. The shared resource allocation apparatus 200 of this embodiment is suitable for a RISC processor architecture; specifically, the RISC processor architecture includes the shared resource 300 and the functional modules (the decoding unit 501, the rename/allocation unit 502, and the functional unit 503) shown in FIG. 3, and the application of the resource allocation apparatus 200 to a RISC processor architecture is taken as the example below.

It should be noted that, in alternative embodiments, the shared resource allocation apparatus 200 of this application may also be applied to a complex instruction set computer (CISC) processor architecture or other processor architectures.

FIG. 3 makes the example of FIG. 1 concrete for a RISC processor architecture: the instruction fetch unit 301, the level-1 instruction cache 302, the load/store queue 306, the level-1 data cache 307, the instruction queue 304, the instruction reorder buffer 305, and the physical rename registers 303 are specific implementations of the sub-shared resources shown in FIG. 1.

In the RISC processor architecture shown in FIG. 3, instructions of the target thread 102 and the non-target thread 103 are placed in the level-1 instruction cache 302. When the level-1 instruction cache 302 detects a hit instruction, it sends the hit instruction to the instruction fetch unit 301, which sends the fetched instruction to the decoding unit 501 for decoding. The decoded instruction is reordered in the rename/allocation unit 502 and allocated a corresponding physical rename register 303; the rename/allocation unit 502 places the reordered and register-allocated instruction into the instruction reorder buffer 305, from which the instruction is taken and sent to the instruction queue 304, and the functional unit 503 executes the instruction when it is idle. In addition, when the rename/allocation unit 502 determines that the instruction is a memory access instruction, it sends the access instruction via the instruction queue 304 to the load/store queue 306, through which the instruction accesses the level-1 data cache 307.
Further, the state detection module 210 is configured to detect the working states of the instruction fetch unit 301, the level-1 instruction cache 302, the load/store queue 306, the level-1 data cache 307, the instruction queue 304, the instruction reorder buffer 305, and the physical rename registers 303, and the counting module 240 counts the cumulative time of each working state of those units, obtaining the number of basic clock cycles, the number of waiting clock cycles, and the number of interference clock cycles of the target thread 102.

Specifically, as shown in FIGS. 4 to 13, in this embodiment the state detection module 210 includes an instruction fetch unit state detection module 2108, a level-1 instruction cache state detection module 2109, a physical rename register state detection module 2110, an instruction reorder buffer state detection module 2111, an instruction queue state detection module 2105, a load/store queue state detection module 2106, and a level-1 data cache state detection module 2107; these functional modules are specific implementations of the sub-shared-resource state detection modules shown in FIG. 1.
First refer to FIG. 4, a schematic diagram of the connection between the level-1 instruction cache and the level-1 instruction cache state detection module according to an embodiment of this application. As shown in FIG. 4, the level-1 instruction cache 302 stores a cache address and a thread ID, and the cache space pointed to by the cache address records the cached instruction.

The level-1 instruction cache 302 is configured to receive an instruction 1000 from the target thread 102 or the non-target thread 103 and to determine whether the request address carried by the received instruction 1000 matches a cache address it stores. If yes, the access hits, meaning no access conflict has occurred in the level-1 instruction cache 302; the level-1 instruction cache 302 caches the received instruction and generates a high-level hit enable signal (high level is denoted 1 below). If no, an access conflict occurs in the level-1 instruction cache 302, which generates a low-level hit enable signal (low level is denoted 0 below).

In some examples, the access address is computed by the level-1 instruction cache 302 from the check bits of the instruction 1000; in other examples, it may be computed from the thread ID and instruction content carried by the instruction 1000. The specific computation is not relevant to this embodiment and is not expanded on here.

Further, on an access hit, the level-1 instruction cache 302 caches the instruction 1000, and the level-1 instruction cache state detection module 2109 must determine whether the instruction 1000 being cached belongs to the target thread 102 or the non-target thread 103, that is, whether it is the target thread 102 that is using the level-1 instruction cache 302. In this embodiment, the first judging unit 224 determines whether the thread ID carried by the instruction 1000 received by the level-1 instruction cache 302 matches the target thread ID. If yes, the instruction 1000 belongs to the target thread 102, meaning the target thread's instructions can use the level-1 instruction cache 302 without blocking, and the first judging unit 224 outputs 1; if no, the instruction belongs to the non-target thread 103, whose instructions can use the level-1 instruction cache 302 without blocking, and the first judging unit 224 outputs 0.
Therefore, when the access hits and the thread ID carried by the instruction 1000 received by the level-1 instruction cache 302 matches the target thread ID, the level-1 instruction cache 302 is not blocked and the target thread 102 can use it without blocking; in this case the first state output interface 223 outputs 1, the third state output interface 222 outputs 0, and the second state output interface 221 outputs 0.

When an access conflict occurs, the level-1 instruction cache 302 is blocked; it generates a hit enable signal of 0, which is fed to an input of the AND gate 229 so that the output of the AND gate 229 is 0 and the first state output interface 223 outputs 0. The level-1 instruction cache state detection module 2109 then further determines whether the source of the blocking is an instruction of the target thread 102 or of the non-target thread 103. Specifically, the first judging unit 224 compares the thread ID carried by the instruction 1000 received by the level-1 instruction cache 302 with the target thread ID: if the two match, the thread waiting for the level-1 instruction cache 302 is the target thread 102 and the unit outputs 1; if they do not match, the waiting thread is the non-target thread 103 and the unit outputs 0.

Therefore, when the hit enable signal is 0 and the first judging unit 224 outputs 1, the third state output interface 222 outputs 0 and the second state output interface 221 outputs 1; when the hit enable signal is 0 and the first judging unit 224 outputs 0, the third state output interface 222 outputs 1 and the second state output interface 221 outputs 0.
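The resulting behavior of the level-1 instruction cache state detection module amounts to a small truth table over the hit enable signal and the thread-ID comparison; the following software sketch models that gate logic (function and argument names are hypothetical, and the hardware uses gates rather than code):

```python
def icache_state_outputs(hit, insn_tid, target_tid):
    """Return the (first, second, third) state output signals of the
    level-1 instruction cache state detection module (FIG. 4)."""
    match = (insn_tid == target_tid)            # first judging unit 224
    first = 1 if hit and match else 0           # target uses cache unblocked
    second = 1 if not hit and match else 0      # target thread caused the block
    third = 1 if not hit and not match else 0   # non-target caused the block
    return first, second, third
```

At most one of the three signals is 1 at a time, which is what lets the downstream AND/OR gates of FIGS. 11 to 13 enable exactly one counter.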
Refer now to FIG. 5, a schematic diagram of the connection between the instruction fetch unit and the instruction fetch unit state detection module according to an embodiment of this application. As shown in FIG. 5, the instruction fetch unit 301 is configured to fetch the instruction 1000 from the level-1 instruction cache 302, and the instruction fetch unit state detection module 2108 is configured to compare the thread ID carried by the instruction 1000 fetched by the instruction fetch unit 301 with the target thread ID.

Specifically, the second judging unit 213 compares the thread ID 1 carried by the instruction 1000 fetched by the instruction fetch unit 301 with the target thread ID. If they match, the second judging unit 213 outputs 1, the second state output interface 211 outputs 1, and the third state output interface 212 outputs 0; if not, the second judging unit 213 outputs 0, the second state output interface 211 outputs 0, and the third state output interface 212 outputs 1.
Refer now to FIG. 6, a schematic diagram of the connection between the physical rename registers and the physical rename register state detection module according to an embodiment of this application. As shown in FIG. 6, when the physical rename registers 303 are not blocked, the instruction 1000 can use them without waiting; the physical-rename-register blocking set signal output by the physical rename registers 303 is then 0 and the first state output interface 233 outputs 1.

When the physical rename registers 303 are blocked, the blocking set signal they output is 1 and the first state output interface 233 outputs 0.

In this embodiment, the third judging unit 234 is enabled by the physical-rename-register blocking set signal: when the signal is 1 the third judging unit 234 operates, and when the signal is 0 it does not operate.

Therefore, when the blocking set signal is 1, the physical rename register state detection module 2110 must further determine whether the instruction 1000 that caused the physical rename registers 303 to block is an instruction of the target thread 102 or of the non-target thread 103. The third judging unit 234 determines whether the thread ID carried by the instruction 1000 matches the target thread ID: if yes, an instruction of the target thread 102 caused the blocking and the third judging unit 234 outputs 1; if no, an instruction of the non-target thread 103 caused the blocking and the third judging unit 234 outputs 0.

When the third judging unit 234 outputs 1, the second state output interface 231 outputs 1 and the third state output interface 232 outputs 0; when the third judging unit 234 outputs 0, the second state output interface 231 outputs 0 and the third state output interface 232 outputs 1.
It should be noted that FIGS. 7 to 9 respectively show the instruction reorder buffer state detection module 2111, the instruction queue state detection module 2105, and the load/store queue state detection module 2106, which have structures and functions similar to those of the physical rename register state detection module 2110; the only difference lies in what generates the blocking set signal. The instruction-reorder-buffer blocking set signal shown in FIG. 7 is 1 when the instruction reorder buffer 305 is blocked and 0 otherwise; the instruction-queue blocking set signal shown in FIG. 8 is 1 when the instruction queue 304 is blocked and 0 otherwise; and the load/store-queue blocking set signal shown in FIG. 9 is 1 when the load/store queue 306 is blocked and 0 otherwise. As these modules are similar to the physical rename register state detection module 2110, they are not described again here.

Further, FIG. 10 shows the level-1 data cache state detection module 2107, which is similar to the level-1 instruction cache state detection module 2109 shown in FIG. 4 and is likewise not described again here.
Refer now to FIG. 11, a schematic diagram of the connection between the first state detection module and the basic clock counter according to an embodiment of this application. In FIG. 11, the first state detection module 281 of FIG. 1 is specifically implemented as an AND gate 281 whose inputs are connected to the first state output interfaces 223, 233, 243, 253, 263, and 273 shown in FIG. 4 and FIGS. 6 to 10, and whose output signal enables the basic clock counter 282. When any first state output interface outputs 0, the output of the AND gate 281 is 0 and the basic clock counter 282 does not count; when all first state output interfaces output 1, the output of the AND gate 281 is 1 and the basic clock counter 282 counts.

In this embodiment, the output level of the AND gate 281 represents the first state: when it is 1, the shared resource 300 is in the first state; when it is 0, the shared resource 300 is not in the first state.

Refer also to FIG. 12, a schematic diagram of the connection between the second state detection module and the waiting clock counter according to an embodiment of this application. In FIG. 12, the second state detection module 283 of FIG. 1 is specifically implemented as an OR gate 283 whose inputs are connected to the second state output interfaces 211, 221, 231, 241, 251, 261, and 271 shown in FIGS. 4 to 10, and whose output signal enables the waiting clock counter 284. When any second state output interface outputs 1, the waiting clock counter 284 counts; when all second state output interfaces output 0, the waiting clock counter 284 does not count.

In this embodiment, the output level of the OR gate 283 represents the second state: when it is 1, the shared resource 300 is in the second state; when it is 0, the shared resource 300 is not in the second state.

Refer also to FIG. 13, a connection diagram of the third state detection module and the interference clock counter according to an embodiment of this application. In FIG. 13, the third state detection module 285 of FIG. 1 is specifically implemented as an OR gate 285 whose inputs are connected to the third state output interfaces 212, 222, 232, 242, 252, 262, and 272 shown in FIGS. 4 to 10, and whose output signal enables the interference clock counter 286. When any third state output interface outputs 1, the interference clock counter 286 counts; when all third state output interfaces output 0, the interference clock counter 286 does not count.

In this embodiment, the output level of the OR gate 285 represents the third state: when it is 1, the shared resource 300 is in the third state; when it is 0, the shared resource 300 is not in the third state.

Therefore, based on the architecture shown in FIGS. 4 to 13, the basic clock counter 282 counts the cumulative time of the first state of the shared resource 300 to obtain the number of basic clock cycles, the waiting clock counter 284 counts the cumulative time of the second state to obtain the number of waiting clock cycles, and the interference clock counter 286 counts the cumulative time of the third state to obtain the number of interference clock cycles. The clock cycle reading module 220 shown in FIG. 1 obtains the number of basic clock cycles from the basic clock counter 282, the number of waiting clock cycles from the waiting clock counter 284, and the number of interference clock cycles from the interference clock counter 286.
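The gate logic of FIGS. 11 to 13 can be summarized in software as follows (a sketch only; the signal lists stand in for the per-module state output interfaces, and the hardware realizes this with one AND gate and two OR gates):

```python
def counter_enables(first_signals, second_signals, third_signals):
    """Enable signals for the three counters: the basic clock counter
    counts only when every sub-resource reports the first state (AND
    gate 281), while the waiting and interference counters count when
    any sub-resource reports the second or third state (OR gates 283
    and 285)."""
    basic_en = all(first_signals)          # AND gate 281
    waiting_en = any(second_signals)       # OR gate 283
    interference_en = any(third_signals)   # OR gate 285
    return basic_en, waiting_en, interference_en
```

Because a blocked sub-resource drives its first-state output to 0 while raising exactly one of its second- or third-state outputs, at most one of the waiting and interference counters is enabled alongside a disabled basic counter.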
The resource allocation quota adjustment module 230 adjusts the target thread's shared-resource allocation quota according to the numbers of basic, waiting, and interference clock cycles. For example, the resource allocation quota adjustment module 230 may implement the adjustment by changing the quota allocated to the target thread's instructions in the level-1 instruction cache 302, the level-1 data cache 307, and the instruction reorder buffer 305, or by changing the number of entries allocated to the target thread's instructions in the physical rename registers 303, the instruction fetch unit 301, the instruction queue 304, and the load/store queue 306. For details, refer to steps 404 to 407 shown in FIG. 2.

In summary, the shared resource allocation method and apparatus provided in the embodiments of this application can detect the working state while multiple threads simultaneously access a shared resource, and adjust the target thread's shared-resource allocation quota according to the detected working state, thereby guaranteeing the quality of service of the target thread.

In the embodiments of this application, the shared resource allocation apparatus may be implemented by an integrated circuit including logic gates; in alternative embodiments, it may also be implemented with a field programmable gate array (FPGA), specifically by program instructions written to the FPGA chip that realize the corresponding functions. Refer now to FIG. 14, a schematic diagram of the hardware structure of a shared resource allocation apparatus according to an embodiment of this application. As shown in FIG. 14, the shared resource allocation apparatus 200 includes a memory 501, a bus 503, and a processor 502; the processor 502 and the memory 501 are each connected to the bus 503, the memory 501 stores program instructions, and the processor 502 runs the program instructions to realize the functions of the shared resource allocation apparatus 200 described above.
需说明的是,以上描述的任意装置实施例都仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部进程来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,进程之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。It should be noted that any of the device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components displayed as the cells may or may not be Physical units can be located in one place or distributed to multiple network elements. Some or all of the processes may be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, in the drawings of the apparatus embodiments provided by the present application, the connection relationship between processes indicates that there is a communication connection between them, and specifically may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement without any creative effort.
From the description of the above embodiments, a person skilled in the art can clearly understand that this application may be implemented by software plus the necessary general-purpose hardware, or of course by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may also vary, for example analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is in most cases the better choice. Based on such an understanding, the technical solution of this application, in essence or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
A person skilled in the art can clearly understand that, for the specific working processes of the systems, apparatuses, or units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
The foregoing describes only specific embodiments of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (17)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710619844.8A CN109308220B (en) | 2017-07-26 | 2017-07-26 | Shared resource allocation method and device |
| CN201710619844.8 | 2017-07-26 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019020028A1 true WO2019020028A1 (en) | 2019-01-31 |
Family
ID=65040433
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/096869 Ceased WO2019020028A1 (en) | 2017-07-26 | 2018-07-24 | Method and apparatus for allocating shared resource |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN109308220B (en) |
| WO (1) | WO2019020028A1 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111597034B (en) * | 2019-02-21 | 2023-04-28 | 阿里巴巴集团控股有限公司 | Processor resource scheduling method and device, terminal equipment and computer storage medium |
| US11144353B2 (en) * | 2019-09-27 | 2021-10-12 | Advanced Micro Devices, Inc. | Soft watermarking in thread shared resources implemented through thread mediation |
| CN112445619A (en) * | 2020-11-30 | 2021-03-05 | 海光信息技术股份有限公司 | Management system and method for dynamically sharing ordered resources in a multi-threaded system |
| CN113467939B (en) * | 2021-06-24 | 2024-09-24 | 深圳前海微众银行股份有限公司 | Capacity management method, device, platform and storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103596285A (en) * | 2012-08-16 | 2014-02-19 | 华为技术有限公司 | Wireless resource scheduling method, wireless resource scheduler and system thereof |
| US20140082626A1 (en) * | 2012-09-14 | 2014-03-20 | International Business Machines Corporation | Management of resources within a computing environment |
| CN105872043A (en) * | 2016-03-29 | 2016-08-17 | 清华大学 | Delay difference balancing method of interactive application in cloud deployment |
| US20160246652A1 (en) * | 2015-02-20 | 2016-08-25 | Andrew J. Herdrich | Techniques to Dynamically Allocate Resources of Configurable Computing Resources |
| CN106126336A (en) * | 2016-06-17 | 2016-11-16 | 上海兆芯集成电路有限公司 | Processor and dispatching method |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7657893B2 (en) * | 2003-04-23 | 2010-02-02 | International Business Machines Corporation | Accounting method and logic for determining per-thread processor resource utilization in a simultaneous multi-threaded (SMT) processor |
| US9342801B2 (en) * | 2010-03-29 | 2016-05-17 | Amazon Technologies, Inc. | Managing committed processing rates for shared resources |
| CN102662763B (en) * | 2012-04-11 | 2014-03-26 | 华中科技大学 | Virtual machine resource scheduling method based on service quality |
| WO2015096031A1 (en) * | 2013-12-24 | 2015-07-02 | 华为技术有限公司 | Method and apparatus for allocating thread shared resource |
| US9626218B1 (en) * | 2014-03-10 | 2017-04-18 | Altera Corporation | Repartitioning and reordering of multiple threads into subsets based on possible access conflict, for sequential access to groups of memory banks in a shared memory |
| CN105183565B (en) * | 2015-09-30 | 2018-12-07 | 华为技术有限公司 | Computer, method for controlling quality of service and device |
| CN106712998A (en) * | 2015-11-18 | 2017-05-24 | 中兴通讯股份有限公司 | Cloud platform resource management method, device and system |
| CN105808357B (en) * | 2016-03-29 | 2021-07-27 | 沈阳航空航天大学 | Performance with precise control over multi-core and multi-threaded processors |
| CN206115425U (en) * | 2016-03-29 | 2017-04-19 | 沈阳航空航天大学 | But performance accurate control multinuclear multi -thread processor |
| CN105893120B (en) * | 2016-04-21 | 2019-07-30 | 北京京东尚科信息技术有限公司 | A kind of acquisition methods and device of thread synchronization resource |
| CN106020973A (en) * | 2016-05-10 | 2016-10-12 | 广东睿江云计算股份有限公司 | CPU (Central Processing Unit) scheduling method and device in cloud host system |
| CN106920025A (en) * | 2016-10-27 | 2017-07-04 | 蔚来汽车有限公司 | Shared resource scheduling method and system |
- 2017-07-26: CN application CN201710619844.8A filed; granted as CN109308220B (Active)
- 2018-07-24: PCT application PCT/CN2018/096869 filed; published as WO2019020028A1 (Ceased)
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113804470A (en) * | 2021-04-14 | 2021-12-17 | 山东省计算中心(国家超级计算济南中心) | Fault detection feedback method for plug seedling assembly line |
| CN113804470B (en) * | 2021-04-14 | 2023-12-01 | 山东省计算中心(国家超级计算济南中心) | A fault detection feedback method for plug seedling cultivation line |
| CN113505084A (en) * | 2021-06-24 | 2021-10-15 | 中国科学院计算技术研究所 | Memory resource dynamic regulation and control method and system based on memory access and performance modeling |
| WO2022267443A1 (en) * | 2021-06-24 | 2022-12-29 | 中国科学院计算技术研究所 | Memory resource dynamic regulation and control method and system based on memory access and performance modeling |
| CN113505084B (en) * | 2021-06-24 | 2023-09-12 | 中国科学院计算技术研究所 | Memory resource dynamic control method and system based on memory access and performance modeling |
| CN114138685A (en) * | 2021-12-06 | 2022-03-04 | 海光信息技术股份有限公司 | Cache resource allocation method and device, electronic device and storage medium |
| CN114138685B (en) * | 2021-12-06 | 2023-03-10 | 海光信息技术股份有限公司 | Cache resource allocation method and device, electronic device and storage medium |
| CN119759584A (en) * | 2025-03-04 | 2025-04-04 | 山东浪潮科学研究院有限公司 | A shared memory allocation method, device, equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109308220B (en) | 2021-12-14 |
| CN109308220A (en) | 2019-02-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2019020028A1 (en) | Method and apparatus for allocating shared resource | |
| Ausavarungnirun et al. | Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems | |
| US8024735B2 (en) | Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution | |
| US20170017412A1 (en) | Shared Memory Controller And Method Of Using Same | |
| CN105389211B (en) | Memory allocation method and delay perception-Memory Allocation device suitable for NUMA architecture | |
| US10061723B2 (en) | Techniques for handling queued interrupts in a data processing system based on a saturation value | |
| EP2790106A2 (en) | Performance measurement unit, processor core including the same and process profiling method | |
| CN104679593A (en) | Task scheduling optimization method based on SMP system | |
| US10705985B1 (en) | Integrated circuit with rate limiting | |
| KR20130048786A (en) | Coordinating device and application break events for platform power saving | |
| US8190924B2 (en) | Computer system, processor device, and method for controlling computer system | |
| CN106325995B (en) | A method and system for allocating GPU resources | |
| US7007138B2 (en) | Apparatus, method, and computer program for resource request arbitration | |
| CN106325996A (en) | GPU resource distribution method and system | |
| US9274827B2 (en) | Data processing apparatus, transmitting apparatus, transmission control method, scheduling method, and computer product | |
| US9958928B2 (en) | Method and apparatus for controlling an operating mode of a processing module | |
| CN115408153B (en) | Instruction distribution method, device and storage medium of multithreaded processor | |
| US20140032792A1 (en) | Low pin count controller | |
| CN112540934B (en) | Method and system for ensuring quality of service when multiple delay-critical programs are jointly executed | |
| CN118245188B (en) | Thread control method and device, processor and computer readable storage medium | |
| CN104866370A (en) | Dynamic time slice dispatching method and system for parallel application under cloud computing environment | |
| US7533201B2 (en) | Queue management mechanism in network processor wherein packets stored at memory device corresponds to addresses stored in plurity of queues within queue management | |
| Deri et al. | Exploiting commodity multi-core systems for network traffic analysis | |
| EP2700014B1 (en) | Variable length arbitration | |
| US12204941B2 (en) | Preserving quality of service for client applications having workloads for execution by a compute core or a hardware accelerator |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18839350; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18839350; Country of ref document: EP; Kind code of ref document: A1 |