[go: up one dir, main page]

WO2024248652A1 - Method and electronic device for instrumentation for profile-guided optimization - Google Patents

Method and electronic device for instrumentation for profile-guided optimization Download PDF

Info

Publication number
WO2024248652A1
WO2024248652A1 PCT/RU2023/000159 RU2023000159W WO2024248652A1 WO 2024248652 A1 WO2024248652 A1 WO 2024248652A1 RU 2023000159 W RU2023000159 W RU 2023000159W WO 2024248652 A1 WO2024248652 A1 WO 2024248652A1
Authority
WO
WIPO (PCT)
Prior art keywords
callsite
function
path
electronic device
execution count
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/RU2023/000159
Other languages
French (fr)
Inventor
Pavel Vladimirovich KOSOV
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/RU2023/000159 priority Critical patent/WO2024248652A1/en
Publication of WO2024248652A1 publication Critical patent/WO2024248652A1/en
Anticipated expiration legal-status Critical
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • G06F8/4441Reducing the execution time required by the program code
    • G06F8/4443Inlining

Definitions

  • Embodiments of the present application relate to the field of techniques for compiling software programs for execution on computing systems, and more specifically, to a method and electronic device for instrumentation for a profile-guided optimization.
  • PGO Profile-guided optimization
  • the first approach has lower runtime overhead as well as lower accuracy.
  • probes are inserted into source code to collect profile data (such as the number of times a function has been called) while the source code is running.
  • profile data such as the number of times a function has been called
  • CFG control-flow graph
  • MST minimal spanning tree
  • Embodiments of this application provide a method and electronic device for instrumentation for a profile-guided optimization.
  • the technical solution may improve binary performance of source code and reduce code size.
  • an embodiment of this application provides a method for instrumentation for profile-guided optimization, including: inserting a callsite list into a callee function of source code or a binary representation of the source code, where a first callsite ID in the callsite list corresponds to a set of counters, and the set of counters is configured to record execution count of each path in the callee function at a first callsite; and while executing a first binary of the source code, collecting profile data about the source code that includes the execution count of each path in the callee function at the first callsite.
  • the profile data with the execution count of each path in the callee function at each callsite may be used in different optimizations (like partial inlining and function specialization) to generate code with better performance.
  • the code generated by using the profile data can maintain a small size.
  • the method further includes: generating a second binary of the source code by using the profile data.
  • the method further includes: while executing a first binary of the source code, if the callee function is called at a second callsite, writing a second callsite ID into the callsite list, and recording, by another set of counters that corresponds to the second callsite ID, execution count of each path in the callee function at the second callsite.
  • the method further includes: determining whether the execution count of each path in the callee function at the first callsite is greater than or equal to a first threshold; and responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inlining the first path into the first callsite.
  • the responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inlining the first path into the first callsite includes: responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold and total execution count of all paths in the callee function at the first callsite is greater than or equal to a second threshold, inlining the first path into the first callsite.
  • an embodiment of this application provides an electronic device for instrumentation for a profile-guided optimization, including: one or more processors; and at least one memory, storing instructions that, when executed by the one or more processor, cause the electronic device to: insert a callsite list into a callee function of source code or a binary representation of the source code, where a first callsite ID in the callsite list corresponds to a set of counters, and the set of counters is configured to record execution count of each path in the callee function at a first callsite; and while executing a first binary of the source code, collect profile data about the source code that includes the execution count of each path in the callee function at the first callsite.
  • the electronic device is further configured to: generate a second binary of the source code by using the profile data.
  • the electronic device is further configured to: while executing a first binary of the source code, if the callee function is called at a second callsite, write the second callsite ID into the callsite list, and another set of counters that corresponds to the second callsite ID is confugrued to record execution count of each path in the callee function at the second callsite.
  • the electronic device is further configured to: determine whether the execution count of each path in the callee function at the first callsite is greater than or equal to a first threshold; and responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inline the first path into the first callsite.
  • the electronic device is further configured to: responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold and total execution count of all paths in the callee function at the first callsite is greater than or equal to a second threshold, inline the first path into the first callsite.
  • the first callsite ID is coded based on an address of a caller function of the first callsite and an ordinal number of the first callsite inside the caller function.
  • an embodiment of this application provides a computer readable storage medium including instructions. When the instructions are run on a computer, the computer is enabled to perform the method in the first aspect or any possible design of the first aspect.
  • an embodiment of this application provides a computer program product.
  • the computer program product When the computer program product is run on an electronic device, the electronic device is enabled to perform the method in the first aspect or any possible design of the first aspect.
  • an embodiment of this application provides a computer system including modules to perform the method in the first aspect or any possible design of the first aspect.
  • an embodiment of this application provides a chip system.
  • the chip system includes a memory and a processor, the memory is configured to store a computer program, and the processor is configured to invoke the computer program from the memory and run the computer program, so that an electronic device on which the chip system is disposed performs the method in the first aspect or any possible design of the first aspect.
  • FIG. 1 is an example of source code according to an embodiment of this application.
  • FIG. 2 shows an example of a callee function’s control flow graph according to an embodiment of this application.
  • FIG. 3 shows a callsite list inserted into function “alpha” according to an embodiment of this application.
  • FIG. 4 shows a callsite list inserted into function “beta” according to an embodiment of this application.
  • FIG. 5 shows execution count of each instruction (or path) of function “alpha” at each callsite.
  • FIG. 6 is a flowchart of a method for instrumentation for a profile-guide optimization according to an embodiment of this application.
  • FIG. 7 is a schematic block diagram of an electronic device according to an embodiment of this application.
  • a compiler typically performs code optimizations in order to generate code with faster execution speed and efficient use of memory.
  • Profile-guided optimization is a technique that uses information generated from the runtime of a program to guide the compiler to perform optimizations. PGO improves program performance by reducing code sizes and branch mispredictions, changing register allocation, and reorganizing code layouts to reduce instruction-cache.
  • Some applications contain multiple function calls and nested function calls. These function calls may be profiled in order to understand how many times they are executed. Each call to a function may result in several instructions being executed, including the function call itself and the function return, which may be costly.
  • PGO provides profile data to the compiler about areas of an application that are most frequently executed. By knowing these areas, the compiler is able to be more selective and specific in optimizing the application.
  • Instrumentation-based PGO is one type of PGO technique.
  • probes or special instructions are inserted into a binary representation of a program.
  • the probes are used to collect information during execution of the program.
  • the information collected during the program runs is referred to as profile data.
  • the instrumented binary is then compiled to form an instrumented executable file that is run with test input.
  • the profile data resulting from these sample runs is then used in a subsequent compilation of the program, without the instrumented code, to optimize the program.
  • Profile data enables better optimization. For example, knowing that a branch is taken very frequently helps the compiler make better decisions when ordering basic blocks. However, for the current instrumentation approach, all callers (and callsites) use the same counters, so it is not possible to determine hot paths for different callsites, and it leads to a missing optimization opportunity, and also it is not possible to perform different partial inlining decisions for different callsites.
  • embodiments of the present application provide a method and electronic device for instrumentation for a profile-guided optimization, which is helpful to generate code with better performance.
  • FIG. 1 is an example of source code.
  • caller refers to a function that calls another function, which is also called a calling function or a parent function.
  • Callee refers to a function that is called by another function, which is also called a called function or a child function.
  • Callsite refers to a location in the code where a callee function is called by a caller function.
  • given source code may contain a function “foo” which calls a function named “alpha” at POS 1.
  • function “foo” calls function “alpha”, “foo” is a caller function of “alpha”, and the function “alpha” is a callee function, and the callsite, POS 1, is where the actual call to function “alpha” inside of function “foo” exists.
  • function “foo” has three callsites, one for invoking function “alpha” at POS 1, one for invoking function “beta” at POS 2, and one for invoking function “alpha” at POS 3.
  • function “beta” has one callsite for invoking function “alpha” at POS 8.
  • function “alpha” has three instructions at POS 4, POS 5, and POS 6, and function “beta” has one instruction at POS 7.
  • each of the instructions in FIG. 1, such as instruction 1 at position 4, may be a set of instructions which form a path.
  • the instructions in the function in FIG. 1 are only an example, which is not specifically defined in the embodiments of the present application.
  • every callsite has its own unique ID, which is called callsite ID.
  • the callsite ID may be encoded in the other manner, which is not specifically defined in the embodiments of the present application.
  • a callsite ID at position 1 in the function foo may be written as foo- callsitel
  • a callsite ID at position 3 in the function foo may be written as foo-callsite2
  • a callsite ID at position 8 in the function foo may be written as beta-callsite8.
  • this type of callsite ID will also be used for illustration.
  • FIG. 2 shows an example of a callee function’s control flow graph. It should be understood that a callee function in FIG. 2 may correspond to the function “alpha” in FIG. 1.
  • the callee function shown in FIG. 2 may include three paths, namely pathl : Entry - If.thenl - Exit, path2: Entry - If. else 1 - If.then2 - Exit, and path3: Entry - If.elsel - If.else2 - Exit.
  • the instrumented binary being inserted into a callee function may record execution count of the callee function at each callsite, and execution count of each instruction (or path) of the callee function at each callsite.
  • FIG. 3 shows a callsite list inserted into function “alpha”.
  • the callsite list is used to store the current callsite ID (CCID).
  • the callsite list includes foo- callsitel, foo-callsite2, and beta-callsite 1, with three counters corresponding to each callsite.
  • the three counters record execution count of each of three instructions (or paths) in the function “alpha” at each callsite.
  • Counter 1 of foo-callsitel records the execution count of the instruction! of function “alpha” at foo-callsitel.
  • Counter2 of foo-callsitel records the execution count of the instruction of function “alpha” at foo-callsitel.
  • Counter? of foo-callsitel records the
  • Each callee function in source code has its own unique callsite list.
  • FIG. 4 shows a callsite list inserted into function “beta”. As illustrated in FIG. 4, Counterl of
  • foo-callsitel records the execution count of the instruction of function “beta” at foo-callsitel.
  • Counter? of foo-callsitel records the execution count of the “invoke alpha” of function “beta” at foo-callsitel.
  • the foo-callsitel in the callsite list of function “alpha” is different from the foo-callsitel in the callsite list of function “beta”.
  • the foo-callsitel in FIG.? refers to callsite ordinal number 1 of function “foo” calling function “alpha”
  • the foo-callsitel in FIG.4 refers to callsite ordinal number 1 of function “foo” calling function “beta”.
  • callsite lists of different callee functions can be distinguished by callee function IDs or the other unique ID, which is not specifically defined in
  • the callsite ID is written to the callsite list of the callee function.
  • the callsite in the callsite list may be called the current callsite.
  • the callee function chooses counters which correspond to a callsite ID from callsite list. For example, when function “foo” calls function “alpha” at foo-callsitel, the foo-callsitel is written to the callsite list of function “alpha”.
  • function “alpha” execute instruction! it chooses counterl which corresponds to foo-callsitel from the callsite list.
  • all unregistered callsites’ ID occupies a reserved slot, and a set of counters that corresponds to the reserved slot record execution count of each path in the callee function at all unregistered callsites.
  • Callee function can be used in different callsites. For example, in large C/C++ projects such as LLVM or ArkVM with many functions, each function may be used in different callsites. And for these callsites, different paths of the callee function may be hot or cold.
  • FIG. 5 shows execution count of each instruction (or path) of function “alpha” at each callsite. As illustrated in FIG. 5, pathl of function “alpha” was called at foo-callsitel for 100 times, path2 of function “alpha” was called at foo-callsitel for 10 times, and path3 of function “alpha” was called at foo-callsitel for 20 times. Therefore, path2 may be hot for foo-callsitel, and pathl and path3 may be cold for foo-callsitel. Similarly, path2 and path3 may be hot for foo-callsite2, and path3 may be hot for beta-callsite 1.
  • the execution count of each instruction (or path) of each current callsite may be compared with a particular first threshold. If the number of times that the instruction (or path) of the current callsite is executed is greater than (or equal to) the first threshold, the instruction (or path) may be considered “hot” and may be inlined into the current callsite.
  • the total execution count of all paths in the callee function at the current callsite may be compared with a particular second threshold. If the number of times that the current callsite is executed is greater than (or equal to) the second threshold, the current callsite may be considered “hot”. Only if the current callsite is “hot” and the instruction (or path) of the current callsite is “hot”, the instruction (or path) may be inlined into the current callsite. In this way, it reduces the number of callsites that needs to be inlined, and the compilation time and binary size can be lower.
  • FIG. 6 is a flowchart of an embodiment of a method for instrumentation for a profileguide optimization.
  • ID in the callsite list corresponds to a set of counters.
  • the set of counters is used to record execution count of each path in the callee function at a first callsite, and each of the set of counters corresponds to each of the paths in the callee function.
  • each of the paths in the callee function may include at least one instruction.
  • the source code is a program’s or an application’s source code, which is not specifically defined in the embodiments of the present application.
  • the first binary of the source code is a binary file into which probes or special instructions are inserted.
  • the probes are used to collect profile data during execution of the first binary, and the probes or special instructions may include the callsite list.
  • the probes are inserted into the source code, and then the source code with the probes is compiled into the first binary.
  • the callsite list is inserted into a binary representation of source code, which generates the first binary.
  • the profile data is used in compilation of the source code, without any instrumented code (e.g. probes or special instructions), to generate a second binary.
  • the second binary is the optimized binary of source code. Compared to the first binary, the base blocks of the second binary may be reordered, the registers may be allocated in a more profitable way, and so on.
  • the second callsite ID is written into the callsite list, and another set of counters that corresponds to the second callsite ID records execution count of each path in the callee function at the second callsite.
  • the coldest callsite ID in the callsite list is replaced by a new current callsite ID, and a set of counters that corresponds to the new current callsite ID records execution count of each path in the callee function at the new current callsite.
  • all unregistered callsites’ ID occupies a reserved slot, and a set of counters that corresponds to the reserved slot records execution count of each path in the callee function at all unregistered callsites.
  • a callsite ID in the callsite list is coded based on an address of a caller function of the callsite and the ordinal number of the callsite inside the caller function.
  • the first callsite ID is coded based on an address of a caller function of the first callsite and the ordinal number of the first callsite inside the caller function.
  • the specific encoding method for callsite ID can refer to the description in the previous text, which will not be elaborated here.
  • an electronic device 700 may include a transceiver 701 , a processor 702, and a memory 703.
  • the memory 703 may be configured to store code, instructions, and the like executed by the processor 702.
  • the processor 702 may be an integrated circuit chip and has a signal processing capability. In an implementation process, steps of the foregoing method embodiments may be completed by using a hardware integrated logic circuit in the processor, or by using instructions in a form of software.
  • the processor may be a general purpose processor, a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), a system on chip (SoC) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor may implement or perform the methods, the steps, and the logical block diagrams that are disclosed in the embodiments of the present application.
  • the general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed with reference to the embodiments of the present application may be directly performed and completed by a hardware decoding processor, or may be performed and completed by using a combination of hardware in the decoding processor and a software module.
  • the software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor reads information in the memory and completes the steps of the foregoing methods in combination with hardware in the processor.
  • the memory 703 in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (Random Access Memory, RAM) and is used as an external cache.
  • RAMs may be used, and are, for example, a static random access memory (Static RAM, SRAM), a dynamic random access memory (Dynamic RAM, DRAM), a synchronous dynamic random access memory (Synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), a synchronous link dynamic random access memory (Synchronous link DRAM, SLDRAM), and a direct rambus random access memory (Direct Rambus RAM, DR RAM).
  • Static RAM static random access memory
  • DRAM dynamic random access memory
  • DRAM synchronous dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • DDR SDRAM double data rate synchronous dynamic random access memory
  • Enhanced SDRAM, ESDRAM enhanced synchronous dynamic random access memory
  • SLDRAM synchronous link dynamic random access memory
  • Direct Rambus RAM Direct Rambus RAM
  • the memory in the system and the method described in this specification includes but is not limited to these memories and a memory of any other appropriate type.
  • An embodiment of this application further provides a system chip, where the system chip includes an input/output interface, at least one processor, at least one memory, and a bus.
  • the at least one memory is configured to store instructions
  • the at least one processor is configured to invoke the instructions of the at least one memory to perform operations in the methods in the foregoing embodiments.
  • An embodiment of this application further provides a computer storage medium, where the computer storage medium may store a program instruction for performing any of the foregoing methods.
  • the storage medium may be specifically the memory 703.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the described apparatus embodiment is merely an example.
  • the unit division is merely logical function division and may be other division in actual implementation.
  • a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces.
  • the indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
  • the units described as separate parts may be or may not be physically separate, and parts displayed as units may be or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of the embodiments.
  • the functions When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer readable storage medium. Based on such an understanding, the technical solutions in this application essentially, or the part contributing to the prior art, or some of the technical solutions may be implemented in a form of a software product.
  • the computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application.
  • the foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
  • program code such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of this application provide a method and electronic device for instrumentation for a profile-guided optimization. The method includes: inserting a callsite list into a callee function of source code or a binary representation of the source code, where a first callsite ID in the callsite list corresponds to a set of counters, and the set of counters is configured to record execution count of each path in the callee function at a first callsite; and while executing a first binary of the source code, collecting profile data about the source code that includes the execution count of each path in the callee function at the first callsite. The proposed technique makes it possible to improve binary performance of source code and reduce code size.

Description

METHOD AND ELECTRONIC DEVICE FOR INSTRUMENTATION
FOR PROFILE-GUIDED OPTIMIZATION
TECHNICAL FIELD
[0001] Embodiments of the present application relate to the field of techniques for compiling software programs for execution on computing systems, and more specifically, to a method and electronic device for instrumentation for a profile-guided optimization.
BACKGROUND
[0002] Profile-guided optimization (PGO), also known as feedback-driven optimization, is usually performed by using two approaches: (1) sampling with a profiler and (2) instrumentation with additional code. The first approach has lower runtime overhead as well as lower accuracy.
[0003] For the instrumentation approach, probes are inserted into source code to collect profile data (such as the number of times a function has been called) while the source code is running. Currently in C/C++ compilers, the instrumentation approach is achieved in the following way: firstly, a compiler builds a control-flow graph (CFG) of the function which should be instrumented, then based on CFG it builds a minimal spanning tree (MST), and each leaf of the MST is considered as a place to insert a counter for profiling.
[0004] The current instrumentation approach has shown good results; however, it has some defects. For example, all callers (and callsites) use the same counters, so it is not possible to determine hot paths for different callsites, which leads to missing optimization opportunities, and also it is not possible to perform different partial inlining decisions for different callsites.
SUMMARY
[0005] Embodiments of this application provide a method and electronic device for instrumentation for a profile-guided optimization. The technical solution may improve binary performance of source code and reduce code size.
[0006] According to a first aspect, an embodiment of this application provides a method for instrumentation for profile-guided optimization, including: inserting a callsite list into a callee function of source code or a binary representation of the source code, where a first callsite ID in the callsite list corresponds to a set of counters, and the set of counters is configured to record execution count of each path in the callee function at a first callsite; and while executing a first binary of the source code, collecting profile data about the source code that includes the execution count of each path in the callee function at the first callsite.
[0007] According to the method above, the profile data with the execution count of each path in the callee function at each callsite may be used in different optimizations (like partial inlining and function specialization) to generate code with better performance. And the code generated by using the profile data can maintain a small size.
[0008] In a possible design, the method further includes: generating a second binary of the source code by using the profile data.
[0009] In a possible design, the method further includes: while executing a first binary of the source code, if the callee function is called at a second callsite, writing a second callsite ID into the callsite list, and recording, by another set of counters that corresponds to the second callsite ID, execution count of each path in the callee function at the second callsite.
[0010] In a possible design, the method further includes: determining whether the execution count of each path in the callee function at the first callsite is greater than or equal to a first threshold; and responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inlining the first path into the first callsite.
[0011] In a possible design, the responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inlining the first path into the first callsite, includes: responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold and total execution count of all paths in the callee function at the first callsite is greater than or equal to a second threshold, inlining the first path into the first callsite.
[0012] In a possible design, the first callsite ID is coded based on an address of a caller function of the first callsite and an ordinal number of the first callsite inside the caller function. [0013] According to a third aspect, an embodiment of this application provides an electronic device for instrumentation for a profile-guided optimization, including: one or more processors; and at least one memory, storing instructions that, when executed by the one or more processor, cause the electronic device to: insert a callsite list into a callee function of source code or a binary representation of the source code, where a first callsite ID in the callsite list corresponds to a set of counters, and the set of counters is configured to record execution count of each path in the callee function at a first callsite; and while executing a first binary of the source code, collect profile data about the source code that includes the execution count of each path in the callee function at the first callsite.
[0014] In a possible design, the electronic device is further configured to: generate a second binary of the source code by using the profile data.
[0015] In a possible design, the electronic device is further configured to: while executing a first binary of the source code, if the callee function is called at a second callsite, write the second callsite ID into the callsite list, and another set of counters that corresponds to the second callsite ID is confugrued to record execution count of each path in the callee function at the second callsite.
[0016] In a possible design, the electronic device is further configured to: determine whether the execution count of each path in the callee function at the first callsite is greater than or equal to a first threshold; and responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inline the first path into the first callsite.
[0017] In a possible design, the electronic device is further configured to: responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold and total execution count of all paths in the callee function at the first callsite is greater than or equal to a second threshold, inline the first path into the first callsite.
[0018] In a possible design, the first callsite ID is coded based on an address of a caller function of the first callsite and an ordinal number of the first callsite inside the caller function. [0019] According to a third aspect, an embodiment of this application provides a computer readable storage medium including instructions. When the instructions are run on a computer, the computer is enabled to perform the method in the first aspect or any possible design of the first aspect.
[0020] According to a fourth aspect, an embodiment of this application provides a computer program product. When the computer program product is run on an electronic device, the electronic device is enabled to perform the method in the first aspect or any possible design of the first aspect.
[0021] According to a fifth aspect, an embodiment of this application provides a computer system including modules to perform the method in the first aspect or any possible design of the first aspect.
[0022] According to a sixth aspect, an embodiment of this application provides a chip system. The chip system includes a memory and a processor, the memory is configured to store a computer program, and the processor is configured to invoke the computer program from the memory and run the computer program, so that an electronic device on which the chip system is disposed performs the method in the first aspect or any possible design of the first aspect.
DESCRIPTION OF DRAWINGS
[0023] FIG. 1 is an example of source code according to an embodiment of this application. [0024] FIG. 2 shows an example of a callee function’s control flow graph according to an embodiment of this application.
[0025] FIG. 3 shows a callsite list inserted into function “alpha” according to an embodiment of this application.
[0026] FIG. 4 shows a callsite list inserted into function “beta” according to an embodiment of this application.
[0027] FIG. 5 shows execution count of each instruction (or path) of function “alpha” at each callsite.
[0028] FIG. 6 is a flowchart of a method for instrumentation for a profile-guide optimization according to an embodiment of this application.
[0029] FIG. 7 is a schematic block diagram of an electronic device according to an embodiment of this application.
DESCRIPTION OF EMBODIMENTS
[0030] The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application. Apparently, the described embodiments are merely some but not all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative efforts shall fall within the protection scope of the present application.
[0031] In the specification, claims, and accompanying drawings of the present application, the term “include” and any other variants of the term mean to cover the non-exclusive inclusion. For example, a process, a method, a system, a product, or a device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes an unlisted step or unit, or optionally further includes another inherent step or unit of the process, the method, the product, or the device.
[0032] This application will present various aspects, embodiments or features in the context of a system including a plurality of devices, components, modules, and the like. It is to be understood and appreciated that the various systems may include additional devices, components, modules, etc., and/or may not include all of the devices, components, modules, etc. discussed in connection with the figures. In addition, combinations of these schemes can also be used.
[0033] In addition, in the embodiments of the present application, words such as “exemplary” and “for example” are used to represent examples or illustrations. Any embodiment or design described in this application as “exemplary” should not be construed as preferred or advantageous embodiment over other embodiments or designs. Rather, the use of the word example is intended to present a concept in a concrete way.
[0034] In this application, “relevant” and “corresponding” can sometimes be used interchangeably, it should be noted that, when the difference is not emphasized, the meanings to be expressed are the same.
[0035] The service scenarios described in the embodiments of the present application are for the purpose of illustrating the technical solutions of the embodiments of the present application more clearly, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application. In spite of the evolution of the architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of the present application are also applicable to solving similar technical problems.
[0036] A compiler typically performs code optimizations in order to generate code with faster execution speed and efficient use of memory. Profile-guided optimization (PGO) is a technique that uses information generated from the runtime of a program to guide the compiler to perform optimizations. PGO improves program performance by reducing code sizes and branch mispredictions, changing register allocation, and reorganizing code layouts to reduce instruction-cache.
[0037] Some applications contain multiple function calls and nested function calls. These function calls may be profiled in order to understand how many times they are executed. Each call to a function may result in several instructions being executed, including the function call itself and the function return, which may be costly. PGO provides profile data to the compiler about areas of an application that are most frequently executed. By knowing these areas, the compiler is able to be more selective and specific in optimizing the application.
[0038] Instrumentation-based PGO is one type of PGO technique. In this technique, probes or special instructions are inserted into a binary representation of a program. The probes are used to collect information during execution of the program. The information collected during the program runs is referred to as profile data. The instrumented binary is then compiled to form an instrumented executable file that is run with test input. The profile data resulting from these sample runs is then used in a subsequent compilation of the program, without the instrumented code, to optimize the program.
[0039] Profile data enables better optimization. For example, knowing that a branch is taken very frequently helps the compiler make better decisions when ordering basic blocks. However, for the current instrumentation approach, all callers (and callsites) use the same counters, so it is not possible to determine hot paths for different callsites, and it leads to a missing optimization opportunity, and also it is not possible to perform different partial inlining decisions for different callsites.
[0040] To solve the problem above, embodiments of the present application provide a method and electronic device for instrumentation for a profile-guided optimization, which is helpful to generate code with better performance.
[0041] FIG. 1 is an example of source code. In an embodiment of this application, caller refers to a function that calls another function, which is also called a calling function or a parent function. Callee refers to a function that is called by another function, which is also called a called function or a child function. Callsite refers to a location in the code where a callee function is called by a caller function. For example, given source code may contain a function “foo” which calls a function named “alpha” at POS 1. Since function “foo” calls function “alpha”, “foo” is a caller function of “alpha”, and the function “alpha” is a callee function, and the callsite, POS 1, is where the actual call to function “alpha” inside of function “foo” exists. [0042] In this case, function “foo” has three callsites, one for invoking function “alpha” at POS 1, one for invoking function “beta” at POS 2, and one for invoking function “alpha” at POS 3. And function “beta” has one callsite for invoking function “alpha” at POS 8. Besides, function “alpha” has three instructions at POS 4, POS 5, and POS 6, and function “beta” has one instruction at POS 7.
[0043] It should be understood that each of the instructions in FIG. 1, such as instruction 1 at position 4, may be a set of instructions which form a path. The instructions in the function in FIG. 1 are only an example, which is not specifically defined in the embodiments of the present application.
[0044] In the embodiments of the present application, every callsite has its own unique ID, which is called callsite ID. A callsite ID may be encoded in the following manner: take an address of a caller function, and replace the most significant byte with an ordinal number of the callsite inside the caller function (from 1 to 255). If a caller function has more than 255 callsites, the remaining callsites are marked with Callsite ID = 0 (unregistered). Namely, the callsite ID in the callsite list is coded based on an address of a caller function of the callsite’s and the ordinal number of the callsite inside the caller function.
[0045] Here is an example of callsite IDs. void foo() { // address of function “foo” is 0xffffl234 call alpha(); // callsite ordinal number 1, callsite ID - 01f l234 call beta(); // callsite ordinal number 1, callsite ID = 01 ffl234 call gamma(); // callsite ordinal number 1, callsite ID = 01ffl234 call alpha(); // callsite ordinal number 2, callsite ID = 02ffl234 call gamma(); // callsite ordinal number 2, callsite ID = 02f l234 call alpha (); // callsite ordinal number 3, callsite ID = 03ffl234 call alpha(); // if there already more or equal than 255 callsites ,
// for this callsite invoking function “alpha”, callsite ID = 00ffl234
[0046] It should be understood that the callsite ID may be encoded in the other manner, which is not specifically defined in the embodiments of the present application. For example, for ease of understanding, a callsite ID at position 1 in the function foo may be written as foo- callsitel, a callsite ID at position 3 in the function foo may be written as foo-callsite2, a callsite ID at position 8 in the function foo may be written as beta-callsite8. In later sections, this type of callsite ID will also be used for illustration.
[0047] FIG. 2 shows an example of a callee function’s control flow graph. It should be understood that a callee function in FIG. 2 may correspond to the function “alpha” in FIG. 1. The callee function shown in FIG. 2 may include three paths, namely pathl : Entry - If.thenl - Exit, path2: Entry - If. else 1 - If.then2 - Exit, and path3: Entry - If.elsel - If.else2 - Exit.
[0048] In the embodiments of the present application, the instrumented binary being inserted into a callee function may record execution count of the callee function at each callsite, and execution count of each instruction (or path) of the callee function at each callsite.
[0049] FIG. 3 shows a callsite list inserted into function “alpha”. The callsite list is used to store the current callsite ID (CCID). As illustrated in FIG. 3, the callsite list includes foo- callsitel, foo-callsite2, and beta-callsite 1, with three counters corresponding to each callsite. The three counters record execution count of each of three instructions (or paths) in the function “alpha” at each callsite. Counter 1 of foo-callsitel records the execution count of the instruction! of function “alpha” at foo-callsitel. Counter2 of foo-callsitel records the execution count of the instruction of function “alpha” at foo-callsitel. Counter? of foo-callsitel records the
5 execution count of the instruction? of function “alpha” at foo-callsitel. Similarly, Counter 1 of beta-callsite 1 records the execution count of the instruction! of function “alpha” at betacallsite 1.
[0050] Each callee function in source code has its own unique callsite list. As an example, FIG. 4 shows a callsite list inserted into function “beta”. As illustrated in FIG. 4, Counterl of
10 foo-callsitel records the execution count of the instruction of function “beta” at foo-callsitel. Counter? of foo-callsitel records the execution count of the “invoke alpha” of function “beta” at foo-callsitel.
[0051] It should be understood that the foo-callsitel in the callsite list of function “alpha” is different from the foo-callsitel in the callsite list of function “beta”. As illustrated in FIG. 4 and FIG. 4, the foo-callsitel in FIG.? refers to callsite ordinal number 1 of function “foo” calling function “alpha”, and the foo-callsitel in FIG.4 refers to callsite ordinal number 1 of function “foo” calling function “beta”.
[0052] It should be understood that the callsite lists of different callee functions can be distinguished by callee function IDs or the other unique ID, which is not specifically defined in
?0 the embodiments of the present application.
[0053] In the embodiments of the present application, when the caller function calls the callee function at a callsite, the callsite ID is written to the callsite list of the callee function. The callsite in the callsite list may be called the current callsite. When the callee function starts, it chooses counters which correspond to a callsite ID from callsite list. For example, when function “foo” calls function “alpha” at foo-callsitel, the foo-callsitel is written to the callsite list of function “alpha”. When function “alpha” execute instruction! , it chooses counterl which corresponds to foo-callsitel from the callsite list.
[0054] As an example embodiment, if all slots of the callsite list have been already registered by other callsites, the coldest callsite ID in the callsite list will be replaced by a new
?0 current callsite ID, and a set of counters that corresponds to the new current callsite ID will record execution count of each path in the callee function at the new current callsite.
[0055] As an example embodiment, if all slots of the callsite list have been already registered by other callsites, all unregistered callsites’ ID occupies a reserved slot, and a set of counters that corresponds to the reserved slot record execution count of each path in the callee function at all unregistered callsites.
[0056] Callee function can be used in different callsites. For example, in large C/C++ projects such as LLVM or ArkVM with many functions, each function may be used in different callsites. And for these callsites, different paths of the callee function may be hot or cold. For example, FIG. 5 shows execution count of each instruction (or path) of function “alpha” at each callsite. As illustrated in FIG. 5, pathl of function “alpha” was called at foo-callsitel for 100 times, path2 of function “alpha” was called at foo-callsitel for 10 times, and path3 of function “alpha” was called at foo-callsitel for 20 times. Therefore, path2 may be hot for foo-callsitel, and pathl and path3 may be cold for foo-callsitel. Similarly, path2 and path3 may be hot for foo-callsite2, and path3 may be hot for beta-callsite 1.
[0057] As an example embodiment, it is possible to make partial inlining of pathl to foo- callsitel, path2 and path3 to foo-callsite2, path3 to beta-callsite 1, thereby improving performance and keeping code size lower than in the case of full inlining whole function “alpha” to all callsites.
[0058] Using profile data from embodiments of the present application it is possible to create different function specializations for different callsites (e.g. reorder basic blocks to improve instruction cache locality, and to allocate registers (inside each specialization) in a more profitable way).
[0059] As an example embodiment, the execution count of each instruction (or path) of each current callsite may be compared with a particular first threshold. If the number of times that the instruction (or path) of the current callsite is executed is greater than (or equal to) the first threshold, the instruction (or path) may be considered “hot” and may be inlined into the current callsite.
[0060] As an example embodiment, the total execution count of all paths in the callee function at the current callsite may be compared with a particular second threshold. If the number of times that the current callsite is executed is greater than (or equal to) the second threshold, the current callsite may be considered “hot”. Only if the current callsite is “hot” and the instruction (or path) of the current callsite is “hot”, the instruction (or path) may be inlined into the current callsite. In this way, it reduces the number of callsites that needs to be inlined, and the compilation time and binary size can be lower.
[0061] FIG. 6 is a flowchart of an embodiment of a method for instrumentation for a profileguide optimization.
[0062] 101, inserting a callsite list into a callee function of source code, where a first callsite
ID in the callsite list corresponds to a set of counters.
[0063] The set of counters is used to record execution count of each path in the callee function at a first callsite, and each of the set of counters corresponds to each of the paths in the callee function.
[0064] It should be understood that each of the paths in the callee function may include at least one instruction.
[0065] In the embodiments of the present application, the source code is a program’s or an application’s source code, which is not specifically defined in the embodiments of the present application.
[0066] 102, while executing a first binary of the source code, collecting profile data about the source code that includes the execution count of each path in the callee function at the first callsite.
[0067] It should be understood that the first binary of the source code is a binary file into which probes or special instructions are inserted. The probes are used to collect profile data during execution of the first binary, and the probes or special instructions may include the callsite list.
[0068] In some cases, the probes (including the callsite list) are inserted into the source code, and then the source code with the probes is compiled into the first binary. In some other cases, the callsite list is inserted into a binary representation of source code, which generates the first binary.
[0069] In some embodiments, the profile data is used in compilation of the source code, without any instrumented code (e.g. probes or special instructions), to generate a second binary. The second binary is the optimized binary of source code. Compared to the first binary, the base blocks of the second binary may be reordered, the registers may be allocated in a more profitable way, and so on.
[0070] In some embodiments, while executing a first binary of the source code, if the callee function is called at a second callsite, the second callsite ID is written into the callsite list, and another set of counters that corresponds to the second callsite ID records execution count of each path in the callee function at the second callsite.
[0071] In some cases, if all slots of the callsite list have been already registered by other callsites, the coldest callsite ID in the callsite list is replaced by a new current callsite ID, and a set of counters that corresponds to the new current callsite ID records execution count of each path in the callee function at the new current callsite.
[0072] In some other cases, if all slots of the callsite list have been already registered by other callsites, all unregistered callsites’ ID occupies a reserved slot, and a set of counters that corresponds to the reserved slot records execution count of each path in the callee function at all unregistered callsites.
[0073] In some embodiments, for each path in the callee function at the first callsite, whether the execution count is greater than or equal to a first threshold is determined. And responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, the first path is inlined into the first callsite. [0074] In some embodiments, responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold and total execution count of all paths in the callee function at the first callsite is greater than or equal to a second threshold, the first path is inlined into the first callsite.
[0075] In some embodiments, a callsite ID in the callsite list is coded based on an address of a caller function of the callsite and the ordinal number of the callsite inside the caller function. For example, the first callsite ID is coded based on an address of a caller function of the first callsite and the ordinal number of the first callsite inside the caller function. The specific encoding method for callsite ID can refer to the description in the previous text, which will not be elaborated here.
[0076] As shown in FIG.7, an electronic device 700 may include a transceiver 701 , a processor 702, and a memory 703. The memory 703 may be configured to store code, instructions, and the like executed by the processor 702.
[0077] It should be understood that the processor 702 may be an integrated circuit chip and has a signal processing capability. In an implementation process, steps of the foregoing method embodiments may be completed by using a hardware integrated logic circuit in the processor, or by using instructions in a form of software. The processor may be a general purpose processor, a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), a system on chip (SoC) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, the steps, and the logical block diagrams that are disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of the present application may be directly performed and completed by a hardware decoding processor, or may be performed and completed by using a combination of hardware in the decoding processor and a software module. The software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads information in the memory and completes the steps of the foregoing methods in combination with hardware in the processor.
[0078] It may be understood that the memory 703 in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM) and is used as an external cache. By way of example rather than limitation, many forms of RAMs may be used, and are, for example, a static random access memory (Static RAM, SRAM), a dynamic random access memory (Dynamic RAM, DRAM), a synchronous dynamic random access memory (Synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), a synchronous link dynamic random access memory (Synchronous link DRAM, SLDRAM), and a direct rambus random access memory (Direct Rambus RAM, DR RAM).
[0079] It should be noted that the memory in the system and the method described in this specification includes but is not limited to these memories and a memory of any other appropriate type.
[0080] An embodiment of this application further provides a system chip, where the system chip includes an input/output interface, at least one processor, at least one memory, and a bus. The at least one memory is configured to store instructions, and the at least one processor is configured to invoke the instructions of the at least one memory to perform operations in the methods in the foregoing embodiments.
[0081 ] An embodiment of this application further provides a computer storage medium, where the computer storage medium may store a program instruction for performing any of the foregoing methods.
[0082] Optionally, the storage medium may be specifically the memory 703.
[0083] A person of ordinary skill in the art may be aware that, in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.
[0084] It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiment. Details are not described herein again.
[0085] In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely an example. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms. [0086] The units described as separate parts may be or may not be physically separate, and parts displayed as units may be or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of the embodiments.
[0087] In addition, functional units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.
[0088] When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer readable storage medium. Based on such an understanding, the technical solutions in this application essentially, or the part contributing to the prior art, or some of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
[0089] The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

CLAIMS What is claimed is:
1. A method for instrumentation for a profile-guided optimization, comprising: inserting a callsite list into a callee function of source code or a binary representation of the source code, wherein a first callsite ID in the callsite list corresponds to a set of counters, and the set of counters is configured to record execution count of each path in the callee function at a first callsite; and while executing a first binary of the source code, collecting profile data about the source code that includes the execution count of each path in the callee function at the first callsite.
2. The method according to claim 1, wherein the method further comprises: generating a second binary of the source code by using the profile data.
3. The method according to claim 1 or 2, wherein the method further comprises: while executing a first binary of the source code, if the callee function is called at a second callsite, writing a second callsite ID into the callsite list, and recording, by another set of counters that corresponds to the second callsite ID, execution count of each path in the callee function at the second callsite.
4. The method according to any one of claims 1-3, wherein the method further comprises: determining whether the execution count of each path in the callee function at the first callsite is greater than or equal to a first threshold; and responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inlining the first path into the first callsite.
5. The method according to claim 4, wherein the responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inlining the first path into the first callsite, comprises: responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold and total execution count of all paths in the callee function at the first callsite is greater than or equal to a second threshold, inlining the first path into the first callsite.
6. The method according to any one of claims 1-5, wherein the first callsite ID is coded based on an address of a caller function of the first callsite and an ordinal number of the first callsite inside the caller function.
7. An electronic device for instrumentation for profile-guided optimization, comprising: one or more processors; and at least one memory, storing instructions that, when executed by the one or more processors, cause the electronic device to: insert a callsite list into a callee function of source code or a binary representation of the source code, wherein a first callsite ID in the callsite list corresponds to a set of counters, and the set of counters is configured to record execution count of each path in the callee function at a first callsite; and while executing a first binary of the source code, collect profile data about the source code that includes the execution count of each path in the callee function at the first callsite.
8. The electronic device according to claim 6, wherein the electronic device is further configured to: generate a second binary of the source code by using the profile data.
9. The electronic device according to claim 6 or 7, wherein the electronic device is further configured to: while executing a first binary of the source code, if the callee function is called at a second callsite, write a second callsite ID into the callsite list, and another set of counters that corresponds to the second callsite ID is confugrued to record execution count of each path in the callee function at the second callsite.
10. The electronic device according to any one of claims 6-8, wherein the electronic device is further configured to: determine whether the execution count of each path in the callee function at the first callsite is greater than or equal to a first threshold; and responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold, inline the first path into the first callsite.
11. The electronic device according to any one of claims 9, wherein the electronic device is further configured to: responsive to determining that execution count of a first path in the callee function at the first callsite is greater than or equal to a first threshold and total execution count of all paths in the callee function at the first callsite is greater than or equal to a second threshold, inline the first path into the first callsite.
12. The electronic device according to any one of claims 9, wherein the first callsite ID is coded based on an address of a caller function of the first callsite and an ordinal number of the first callsite inside the caller function.
13. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and when the instructions run on an electronic device, the electronic device is enabled to perform the method according to any one of claims 1 to 6.
14. A computer program product, wherein when the computer program product runs on an electronic device, the electronic device is enabled to perform the method according to any one of claims 1 to 6.
15. A computer system, wherein the computer system comprises modules to perform the method according to any one of claims 1 to 6.
16. A chip system, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program from the memory and run the computer program, so that an electronic device on which the chip system is disposed performs the method according to any one of claims 1 to 6.
PCT/RU2023/000159 2023-06-01 2023-06-01 Method and electronic device for instrumentation for profile-guided optimization Pending WO2024248652A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/RU2023/000159 WO2024248652A1 (en) 2023-06-01 2023-06-01 Method and electronic device for instrumentation for profile-guided optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2023/000159 WO2024248652A1 (en) 2023-06-01 2023-06-01 Method and electronic device for instrumentation for profile-guided optimization

Publications (1)

Publication Number Publication Date
WO2024248652A1 true WO2024248652A1 (en) 2024-12-05

Family

ID=87519855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2023/000159 Pending WO2024248652A1 (en) 2023-06-01 2023-06-01 Method and electronic device for instrumentation for profile-guided optimization

Country Status (1)

Country Link
WO (1) WO2024248652A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6986130B1 (en) * 2000-07-28 2006-01-10 Sun Microsystems, Inc. Methods and apparatus for compiling computer programs using partial function inlining
US9009691B1 (en) * 2013-07-12 2015-04-14 Google Inc. Using an inline stack to improve performance of an applications binary

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6986130B1 (en) * 2000-07-28 2006-01-10 Sun Microsystems, Inc. Methods and apparatus for compiling computer programs using partial function inlining
US9009691B1 (en) * 2013-07-12 2015-04-14 Google Inc. Using an inline stack to improve performance of an applications binary

Similar Documents

Publication Publication Date Title
US9940229B2 (en) Technologies for persistent memory programming
US9946523B2 (en) Multiple pass compiler instrumentation infrastructure
US8479176B2 (en) Register mapping techniques for efficient dynamic binary translation
US5778233A (en) Method and apparatus for enabling global compiler optimizations in the presence of exception handlers within a computer program
US9286190B2 (en) Inserting implicit sequence points into computer program code to support debug operations
US20070150880A1 (en) Post-register allocation profile directed instruction scheduling
US9213531B2 (en) Methods to eliminate extra memory loads while accessing global variables in position independent code
US9690552B2 (en) Technologies for low-level composable high performance computing libraries
WO2022105492A1 (en) Method and apparatus for fixing weak memory ordering problem
US6360360B1 (en) Object-oriented compiler mechanism for automatically selecting among multiple implementations of objects
CN102804142B (en) Use double byte sequence compiler optimized code
US7383402B2 (en) Method and system for generating prefetch information for multi-block indirect memory access chains
US9117017B2 (en) Debugger with previous version feature
US8056066B2 (en) Method and apparatus for address taken refinement using control flow information
US7779393B1 (en) System and method for efficient verification of memory consistency model compliance
US7856628B2 (en) Method for simplifying compiler-generated software code
US9146719B2 (en) Data layout using data type information
WO2024248652A1 (en) Method and electronic device for instrumentation for profile-guided optimization
US9940267B2 (en) Compiler global memory access optimization in code regions using most appropriate base pointer registers
WO2025241640A1 (en) Program processing method, compiler, system on chip, electronic device and storage medium
US7383401B2 (en) Method and system for identifying multi-block indirect memory access chains
CN116700729B (en) Code compilation method and related device
US20030217356A1 (en) Register allocation for program execution analysis
US11704101B2 (en) Function-level redundancy detection and optimization
US20050251795A1 (en) Method, system, and program for optimizing code

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23748626

Country of ref document: EP

Kind code of ref document: A1