[go: up one dir, main page]

WO2024074163A1 - Symbolic execution of machine code for software testing - Google Patents

Symbolic execution of machine code for software testing Download PDF

Info

Publication number
WO2024074163A1
WO2024074163A1 PCT/CZ2023/050058 CZ2023050058W WO2024074163A1 WO 2024074163 A1 WO2024074163 A1 WO 2024074163A1 CZ 2023050058 W CZ2023050058 W CZ 2023050058W WO 2024074163 A1 WO2024074163 A1 WO 2024074163A1
Authority
WO
WIPO (PCT)
Prior art keywords
context
program
risc
execution
successive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CZ2023/050058
Other languages
French (fr)
Inventor
René Christoph KIRSCH
Stefanie Muroya LEI
Michael Starzinger
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Czech Technical University In Prague
Original Assignee
Czech Technical University In Prague
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Czech Technical University In Prague filed Critical Czech Technical University In Prague
Publication of WO2024074163A1 publication Critical patent/WO2024074163A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Prevention of errors by analysis, debugging or testing of software
    • G06F11/3604Analysis of software for verifying properties of programs
    • G06F11/3608Analysis of software for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/42Syntactic analysis

Definitions

  • This present disclosure relates to a method of exploring a RISC-V program, which method may be used for testing all or part of a software module, a computer system adapted to execute such a method, and a non-transitory computer-readable storage medium including computer-executable instructions configured to carry out the method.
  • Testing and analysing programs is an integral part of software development and computer research. At a basic level, testing involves selecting a specific input, executing the program with said input and checking the result. In such testing, all or part of a program is put through a suite of regression tests after each revision or modification and the outputs are evaluated for correctness against a test scenario.
  • a software program may include any number of modules, and each module may be tested or validated individually, or multiple modules may be tested or validated in combination.
  • a software module may be designed, tested and validated manually or automatically, and executed under the test cases.
  • a software testing tool implemented as computer software or hardware, may automatically generate test cases for a software module under analysis, execute the module while simulating the test cases, and check for module behavior or output that does not agree with the test cases/scenarios.
  • Programs can be tested using symbolic execution systems. Instead of relying on a specific input, symbolic execution bases analysis of a program on symbolic inputs and tracking the execution in terms of those symbolic values.
  • a method of symbolically executing a program on a computer device comprising:
  • Semantics of lower-level representations are closer to the semantics of the lower level activity which determines actual program behavior on a machine.
  • some embodiments enable symbolic execution of the RISC-V program to operate on a binary level, enabling low level exploration of the program. Subsequently, inspection of, and determining and solving errors within, the program is made easier and more efficient.
  • use of contexts in the manner described above facilitates simulation that is much closer to physical hardware by directly simulating the RISC-V machine states. Owing to the bit-precise nature of RISC-V representations and the binary nature of the symbolic execution as described above, the source code of the software under test is not required to carry out extensive testing.
  • selecting a context may comprise selecting, from the work list, a context having the lowest program counter.
  • Such selection enables selection of deeper contexts first. It is preferable to explore deeper contexts first such that paths with recursive calls are fully explored, before the depth of the context reaches the n-bound. Such decisions conceptually ensure a depth-first exploration of looping/recursive program paths. It may be useful for understanding some embodiments to consider that in some instances it may be preferable to explore contexts with numerically lower program counter values, such that loops are fully explored before the loop-termination point is explored.
  • the RISC-V code is a RISC-U code, wherein RISC-U is a small subset of 64-bit RISC-V machine code.
  • the at least one SMT formula is a subset of an SMT-LIB language formula.
  • the method comprises determining a program counter of at least one selected context and at least one successor context, determining whether the program counter of the selected context and the program counter of the successor context match, wherein if it is determined that there is a match, conducting eager merging of the selected context and the successor context.
  • the method comprises determining and tracking a call stack of each context, and merging contexts having matching program counters in reverse order of the depth of the call stack of each context.
  • the method comprises translating an origin program into a RISC-V program, and exploring the RISC-V program.
  • the method comprises comprising solving the SMT formulae via an SMT solver to determine at least one solution for identifying error states within the RISC-V program.
  • the method may comprise configuring a computer device to run the RISC-V representation of the program with the witness as an input for exploring error states of the program.
  • the method may comprise applying a merging condition.
  • Merging may drastically reduce the number of contexts requiring exploration by MONSTER, thereby reducing the overall complexity of the symbolic execution of the RISC-V program. This in turn lightens the processing burden of a computer device or data carrier or the like configured to carry out the translation of the RISC-V program into SMT functions.
  • the merging condition may, for example, be at least one of eager merging or a merging condition based on the depth of the context (i.e. minimum/maximum merging).
  • a nontransitory computer-readable storage medium including computer-executable instructions configured to carry out the above method.
  • a system comprising a memory comprising instructions executable by one or more processors, and the one or more processors coupled to the memory and operable to execute the instructions, the one or more processors being operable when executing the instructions to:
  • the one or more processors is operable when executing the instructions solve the SMT formula for determining at least one error state of the RISC-V program.
  • the system comprises output means for outputting the at least one SMT formula, wherein the output means is at least one of a register of a data carrier, a memory storage means and a screen.
  • Reference to software module may refer to either a self-contained software application or portion or subset of a software application.
  • the method may be used for symbolic testing all or part of a software module, wherein the all or part of the software module is represented by RISC-V code.
  • Some embodiments herein may provide a method of symbolically executing a program on a computer device, the method comprising:
  • RISC-V Reduced instruction set computer five
  • RISC-U Reduced instruction set computer five
  • RISC-U consists of 14 machine instructions limited to unsigned 64-bit integer arithmetic. RISC-U may also have 4GB of byte-addressed main memory. RISC-U programs may be obtained by compiling C* programs, which itself is a small subset of the programming language C. Similar to RISC-U, C* is also limited to unsigned 64-bit integer arithmetic. RISC-U code may thus be a highly simplified representation of complex code.
  • a self-compiling compiler may compile a small, but still fast, subset of C, C* to a small subset of 64-bit RISC-V, RISC-U. At least part of Selfie is itself written in C* comprises a non-optimizing linear-time C* compiler that targets RISC-U. In this way RISC-U derived from a C* code, via Selfie, is a linear binary code.
  • An input program in RISC-U binary code may itself be the program to be tested or a representation of the program to be tested.
  • the program may be a C code, which is translated into RISC-U code prior to symbolic execution.
  • the example method 100 of represents an example set of steps that may be applied to a RISC-U program as an implementation of a symbolic execution process, in order to transform the program into an SMT formulae. This process is referred to hereon as MONSTER .
  • the method 100 may be carried out by at least one computer device and/or data carrier.
  • a computer configured to execute MONSTER may receive the RISC-U program to be transformed, and a bound, n , on the path length.
  • the bound specifies the maximum number of instructions MONSTER executes symbolically along each path until exploration of that particular path is abandoned. This is one way in which impact of path explosion when executing MONSTER can be limited.
  • the bound may be manually set by a user/tester.
  • a context may exist as a triple, more specifically as the triple ( pi, s, d) , wherein pi represents a path condition, s represents a symbolic store representing all values stored in registers and memory, and d represents an execution depth, or the number of instructions executed on that path pi at that point.
  • step 104 all contexts c are stored in a work list C. It will become evident that the work list C represents a list of contexts for all paths that have not yet terminated or exceeded the bound during the symbolic execution.
  • MONSTER then beings an iterative process 106.
  • MONSTER effectively iteratively selects and removes a context from C, symbolically executes the corresponding instruction of the RISC-U program.
  • the work list When running the first iteration the work list will be fully populated with contexts. However, it is explained below how this will change over time.
  • MONSTER selects and removes the context with the smallest program counter (which is stored as a concrete value in store s ) from the work list – step 108. Subsequently, at step 110, MONSTER decodes and evaluates that context, and the corresponding RISC-U instruction. At step 112 the effect(s) of evaluating c, and thereby evaluating the corresponding RISC-U instruction, on the machine state is interpreted and reflected bit-precisely in s . The effect on the machine state may be evidenced by a change in a register of a computer or the like involved in execution of MONSTER.
  • successor context c’ is generated, said successor context comprising an execution depth which is an increment of 1 from the execution depth of the context which was removed at step 108.
  • MONSTER determines whether the execution depth of the successor context has reached a bound condition determined by a pre-set bound, for example bound n mentioned above.
  • the successor context is added into the work list (step 120). If, however, the bound condition has been reached, the successor context is discarded (step 122) as that path is considered to have been explored to the maximum desired depth. In this instance the overall number of contexts in the work list decreases.
  • MONSTER determines whether evaluation of the path condition of selected context lead to a non-zero exit value. This is indicative of that particular path causing an error state in the program. If a non-zero exit value is arrived at, that path is added to a path store F (step 126). If the exit value is a zero value, the path is not added to the path store.
  • MONSTER determines whether the work list still comprises at least one context. It is noted given sufficient iterations of the process 106 and specifically sufficient iterations of step 122, the work list will eventually comprise no contexts as bound conditions are met by the execution depths of successor contexts.
  • step 108 If contexts remain, the iterative process returns to step 108 and continues through to step 128 until the work list is empty.
  • the iterative process ends and the path store F is output.
  • the output may comprise display at a screen or transmission of F to a solver.
  • Path store F comprises error states of the RISC-U program as SMT formulae.
  • error states may be SMT-LIB formulae.
  • the formulae may be passed onto an SMT solver.
  • MONSTER may implement merging of contexts, particularly eager mering or static state merging.
  • eager merging When eager merging is implemented, MONSTER will determine whether a program counter of a successive context matches the program counter of a context already existing in the work list. If the respective program counters do match, MONSTER acts to merge these contexts. In this way only one context exists in the work list for each program counter at any point in time.
  • merging two contexts may comprise combining their respective path conditions into a logical disjunction requiring that the merge point is reachable if one of the two paths can be satisfied.
  • Such eager merging drastically reduces the number of contexts requiring exploration by MONSTER, thereby reducing the overall complexity of the symbolic execution of the RISC-V program. This in turn lightens the burden of processing from a computer device or data carrier or the like configured to carry out the translation of the RISC-V program into SMT functions using MONSTER, and reduces. processing time.
  • a RISC-V program 202 is input to MONSTER 204 so as to undergo symbolic execution.
  • the program may by a RISC-U program.
  • the output of execution of the RISC-V program by MONSTER is a set of SMT formulae 206, specifically SMT-LIB formulae in a particularly simplified example.
  • the output SMT formulae can then be passed to an SMT solver 208 so as to arrive at a solution/solutions identifying error states of the RISC-V program.
  • the RISC-V program itself may be a translation, wherein an original program for testing by MONSTER was translated into the RISC-V program, optionally RISC-U, for evaluation by MONSTER.
  • the some embodiments herein relate to exploration of a RISC-V program.
  • the some embodiments comprise a symbolic execution engine for exploring all possible execution paths in a given RISC-V program, up to a given bound on the path length. Exploration of the program in this way leads to potential error conditions being expressed as SMT formulae and which can subsequently be solved using existing SMT solvers.
  • the claimed symbolic execution process does not need to settle with using higher-level representations of an input program, which can lead to sub-optimal exploration of the input program.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to a method of symbolically executing a program on a computer device, a corresponding non-transitory computer-readable storage medium and a corresponding system.

Description

Symbolic Execution of Machine Code for Software Testing
This present disclosure relates to a method of exploring a RISC-V program, which method may be used for testing all or part of a software module, a computer system adapted to execute such a method, and a non-transitory computer-readable storage medium including computer-executable instructions configured to carry out the method.
Testing and analysing programs is an integral part of software development and computer research. At a basic level, testing involves selecting a specific input, executing the program with said input and checking the result. In such testing, all or part of a program is put through a suite of regression tests after each revision or modification and the outputs are evaluated for correctness against a test scenario.
A software program may include any number of modules, and each module may be tested or validated individually, or multiple modules may be tested or validated in combination. A software module may be designed, tested and validated manually or automatically, and executed under the test cases. A software testing tool, implemented as computer software or hardware, may automatically generate test cases for a software module under analysis, execute the module while simulating the test cases, and check for module behavior or output that does not agree with the test cases/scenarios.
Known testing processes can be cumbersome, inefficient, rely heavily on selection of a correct input at the beginning of the process, provide only limited coverage and have a tendency to miss bugs.
Programs can be tested using symbolic execution systems. Instead of relying on a specific input, symbolic execution bases analysis of a program on symbolic inputs and tracking the execution in terms of those symbolic values.
In this way it is a non-explicit state model-checking technique, but a more generalized system when compared with basic input-execution-output testing.
Symbolic execution is conventionally carried out on high level representations of a program or algorithm. Semantics of such higher level representations are by definition further away from the semantics of the lower-level which determines actual program behavior on the machine. In this way known simulations rely on representations of a program that are far from the fully compiled program product.
According to a first aspect of some embodiments, there is provided a method of symbolically executing a program on a computer device, the method comprising:
  • representing a program as a RISC-V program, the RISC-V program comprising plural paths;
  • extending a machine state of each path of the RISC-V program into a context, wherein the context comprises a path condition, a symbolic store comprising a program counter and an execution depth,
  • deriving a work list comprising the contexts;
  • setting a bound for the execution depth;
  • conducting an iterative process of evaluating each context, said iterative process comprising the steps of:
  1. selecting a context and removing the selected context from the work list;
  2. executing the selected context and determining the effect of the execution on the symbolic store of the selected context, and representing the execution in the symbolic store of the selected context as a bit value;
  3. generating a successive context, wherein the execution depth of the successive context is greater by 1 than the execution depth of the selected context;
  4. determining whether the execution depth of the successive context has met a bound condition determined by the bound for the execution depth, and:
    1. adding the successive context to the work list if the execution depth of the successive context has not met the bound condition; or
    2. discarding the successive context;
  5. determining whether evaluation of the path condition leads to a nonzero exit value, and if it is determined that evaluation of the path condition does lead to a non-zero exit value, storing the path condition as a satisfiability modulo theories, SMT, formula and generating a witness; and
  6. repeating steps a) to e) until the work list no longer comprises a context; and outputting the at least one SMT formula and the witness.
Semantics of lower-level representations are closer to the semantics of the lower level activity which determines actual program behavior on a machine.
Therefore symbolic execution with higher-level representations of an input program can lead to sub-optimal exploration of programs.
By using bit-precise modeling of RISC-V machine instructions some embodiments enable symbolic execution of the RISC-V program to operate on a binary level, enabling low level exploration of the program. Subsequently, inspection of, and determining and solving errors within, the program is made easier and more efficient. In particular, use of contexts in the manner described above facilitates simulation that is much closer to physical hardware by directly simulating the RISC-V machine states. Owing to the bit-precise nature of RISC-V representations and the binary nature of the symbolic execution as described above, the source code of the software under test is not required to carry out extensive testing.
Generation of witnesses in the instances of non-zero exit paths during execution of contexts can be used to identify points of error in the RISC-V representation of the program, without requiring the source code of the program itself.
In an example selecting a context may comprise selecting, from the work list, a context having the lowest program counter.
Such selection enables selection of deeper contexts first. It is preferable to explore deeper contexts first such that paths with recursive calls are fully explored, before the depth of the context reaches the n-bound. Such decisions conceptually ensure a depth-first exploration of looping/recursive program paths. It may be useful for understanding some embodiments to consider that in some instances it may be preferable to explore contexts with numerically lower program counter values, such that loops are fully explored before the loop-termination point is explored.
In an example the RISC-V code is a RISC-U code, wherein RISC-U is a small subset of 64-bit RISC-V machine code.
In an example the at least one SMT formula is a subset of an SMT-LIB language formula.
Many SMT-LIB operators map bit-precisely from RISC-U instructions. In this way use of SMT-LIB with RISC-U minimises the need for translations and the like during evaluation of the program by MONSTER. This in turn reduces the number of steps required to be taken by the method when evaluating the program to give the claimed output. In this way some embodiments facilitate more efficient exploration of a RISC-U program.
In an example the method comprises determining a program counter of at least one selected context and at least one successor context, determining whether the program counter of the selected context and the program counter of the successor context match, wherein if it is determined that there is a match, conducting eager merging of the selected context and the successor context.
In this way only one context exists in the work list for each program counter at any point in time. This dramatically reduces the overall complexity of the symbolic execution of the RISC-V program, and in turn reduces the number of proof obligations in the SMT formula output. This lightens the burden of processing from a computer device or data carrier or the like configured to carry out the translation of the RISC-V program into SMT functions using MONSTER, and reduces processing time.
In an example the method comprises determining and tracking a call stack of each context, and merging contexts having matching program counters in reverse order of the depth of the call stack of each context.
In an example the method comprises translating an origin program into a RISC-V program, and exploring the RISC-V program.
In an example the method comprises comprising solving the SMT formulae via an SMT solver to determine at least one solution for identifying error states within the RISC-V program.
In this way errors in the RISC-V program can be located, thereby enabling a user to quickly investigate, and where relevant, correct the corresponding portion of the program.
In an example the method may comprise configuring a computer device to run the RISC-V representation of the program with the witness as an input for exploring error states of the program.
In an example the method may comprise applying a merging condition. Merging may drastically reduce the number of contexts requiring exploration by MONSTER, thereby reducing the overall complexity of the symbolic execution of the RISC-V program. This in turn lightens the processing burden of a computer device or data carrier or the like configured to carry out the translation of the RISC-V program into SMT functions. The merging condition may, for example, be at least one of eager merging or a merging condition based on the depth of the context (i.e. minimum/maximum merging).
According to another aspect of some embodiments, there is provided a nontransitory computer-readable storage medium including computer-executable instructions configured to carry out the above method.
According to another aspect of the some embodiments, there is provided a system comprising a memory comprising instructions executable by one or more processors, and the one or more processors coupled to the memory and operable to execute the instructions, the one or more processors being operable when executing the instructions to:
  • represent a program as a RISC-V program, the RISC-V program comprising plural paths;
  • extend a machine state of each path of the RISC-V program into a context, wherein the context comprises a path condition, a symbolic store comprising a program counter and an execution depth,
  • derive a work list comprising the contexts;
  • set a bound for the execution depth;
  • conduct an iterative process of evaluating each context, said iterative process comprising the steps of:
  1. selecting a context and removing the selected context from the work list;
  2. executing the selected context and determining the effect of the execution on the symbolic store of the selected context, and representing the execution in the symbolic store of the selected context as a bit value;
  3. generating a successive context, wherein the execution depth of the successive context is greater by 1 than the execution depth of the selected context;
  4. determining whether the execution depth of the successive context has met a bound condition determined by the bound for the execution depth, and:
    1. adding the successive context to the work list if the execution depth of the successive context has not met the bound condition; or
    2. discarding the successive context;
  5. determining whether evaluation of the path condition leads to a non-zero exit value, and if it is determined that evaluation of the path condition does lead to a non-zero exit value, storing the path condition as an SMT formula and generating a witness and; and
  6. repeating steps a) to e) until the work list no longer comprises a context; and outputting the at least one SMT formula and the witness.
In an example the one or more processors is operable when executing the instructions solve the SMT formula for determining at least one error state of the RISC-V program.
In an example the system comprises output means for outputting the at least one SMT formula, wherein the output means is at least one of a register of a data carrier, a memory storage means and a screen.
Reference to software module may refer to either a self-contained software application or portion or subset of a software application. The method may be used for symbolic testing all or part of a software module, wherein the all or part of the software module is represented by RISC-V code.
Fig.1
shows schematically an example embodiment of the invention.
Fig.2
shows schematically another example embodiment of the invention.
Examples
In conventional symbolic execution, translation requires simulation of the control flow in the program. Symbolic execution must consider not one but plural possible machine states, and in the instances of branches in the program, multiple paths need to be considered. This may lead to an explosion in the number of paths needing to be explored in the symbolic execution. Such exploration can be energy-intensive, tedious and subject to faults and crashes.
Traditionally, when executing a program symbolically, all values that are part of the machine state are no longer necessarily concrete, they can also be symbolic thereby representing an entire set of possible assignments. Hence symbolic execution is not always able to decide which path is taken at branching points and potentially forced to explore both paths. This means there is no longer just one machine state to consider, but a (potentially unbounded) set of machine state, and path explosion can be time and computationally expensive.
Some embodiments herein may provide a method of symbolically executing a program on a computer device, the method comprising:
  • representing a program as a RISC-V program, the RISC-V program comprising plural paths;
  • extending a machine state of each path of the RISC-V program into a context, wherein the context comprises a path condition, a symbolic store comprising a program counter and an execution depth,
  • deriving a work list comprising the contexts;
  • setting a bound for the execution depth;
  • conducting an iterative process of evaluating each context, said iterative process comprising the steps of:
  1. selecting a context and removing the selected context from the work list;
  2. executing the selected context and determining the effect of the execution on the symbolic store of the selected context, and representing the execution in the symbolic store of the selected context as a bit value;
  3. generating a successive context, wherein the execution depth of the successive context is greater by 1 than the execution depth of the selected context;
  4. determining whether the execution depth of the successive context has met a bound condition determined by the bound for the execution depth, and:
    1. adding the successive context to the work list if the execution depth of the successive context has not met the bound condition; or
    2. discarding the successive context;
  5. determining whether evaluation of the path condition leads to a nonzero exit value, and if it is determined that evaluation of the path condition does lead to a non-zero exit value, storing the path condition as a satisfiability modulo theories (SMT) formula and generating a witness; and
  6. repeating steps a) to e) until the work list no longer comprises a context; and outputting the at least one SMT formula and the witness.
Reduced instruction set computer five (RISC-V) is a type of microprocessor architecture that divides instructions into two categories – memory access and combinational digital circuit operations. In embodiments described below, a small subset of 64-bit RISC-V machine code, referred to as RISC-U, is the binary machine code that will be subjected to symbolic execution according to the some embodiments herein.
RISC-U consists of 14 machine instructions limited to unsigned 64-bit integer arithmetic. RISC-U may also have 4GB of byte-addressed main memory. RISC-U programs may be obtained by compiling C* programs, which itself is a small subset of the programming language C. Similar to RISC-U, C* is also limited to unsigned 64-bit integer arithmetic. RISC-U code may thus be a highly simplified representation of complex code.
A self-compiling compiler, referred to herein as “Selfie” may compile a small, but still fast, subset of C, C* to a small subset of 64-bit RISC-V, RISC-U. At least part of Selfie is itself written in C* comprises a non-optimizing linear-time C* compiler that targets RISC-U. In this way RISC-U derived from a C* code, via Selfie, is a linear binary code.
Generation of the RISC-U input is not shown in the figures for simplicity.
An input program in RISC-U binary code may itself be the program to be tested or a representation of the program to be tested. In an example the program may be a C code, which is translated into RISC-U code prior to symbolic execution.
shows schematically an example embodiment of some embodiments herein. The example method 100 of represents an example set of steps that may be applied to a RISC-U program as an implementation of a symbolic execution process, in order to transform the program into an SMT formulae. This process is referred to hereon as MONSTER. The method 100 may be carried out by at least one computer device and/or data carrier.
Prior to execution of the method 100, a computer configured to execute MONSTER may receive the RISC-U program to be transformed, and a bound, n, on the path length. The bound specifies the maximum number of instructions MONSTER executes symbolically along each path until exploration of that particular path is abandoned. This is one way in which impact of path explosion when executing MONSTER can be limited. The bound may be manually set by a user/tester.
At step 102 the machine state of each path of the program is extended into a context, c. Such a context may exist as a triple, more specifically as the triple (pi, s, d), wherein pi represents a path condition, s represents a symbolic store representing all values stored in registers and memory, and d represents an execution depth, or the number of instructions executed on that path pi at that point.
At step 104 all contexts c are stored in a work list C. It will become evident that the work list C represents a list of contexts for all paths that have not yet terminated or exceeded the bound during the symbolic execution.
MONSTER then beings an iterative process 106. MONSTER effectively iteratively selects and removes a context from C, symbolically executes the corresponding instruction of the RISC-U program. When running the first iteration the work list will be fully populated with contexts. However, it is explained below how this will change over time.
With each iteration of the symbolic execution method, MONSTER selects and removes the context with the smallest program counter (which is stored as a concrete value in store s) from the work list – step 108. Subsequently, at step 110, MONSTER decodes and evaluates that context, and the corresponding RISC-U instruction. At step 112 the effect(s) of evaluating c, and thereby evaluating the corresponding RISC-U instruction, on the machine state is interpreted and reflected bit-precisely in s. The effect on the machine state may be evidenced by a change in a register of a computer or the like involved in execution of MONSTER. Following evaluation of the context, at step 114 successor context c’ is generated, said successor context comprising an execution depth which is an increment of 1 from the execution depth of the context which was removed at step 108. In other words, if context c has an execution depth d, the successor context c’ has an execution depth d = d + 1.
At step 116 MONSTER determines whether the execution depth of the successor context has reached a bound condition determined by a pre-set bound, for example bound n mentioned above.
If the bound condition has not yet been reached, the successor context is added into the work list (step 120). If, however, the bound condition has been reached, the successor context is discarded (step 122) as that path is considered to have been explored to the maximum desired depth. In this instance the overall number of contexts in the work list decreases.
In this way the work list evolves as MONSTER works through the work list as part of the symbolic execution process.
In most cases evaluating an instruction will produce one successor context at most. In the instance where MONSTER cannot decide on a branch outcome, for example when there are two symbolic values to consider in a path, two successor contexts can be produced, and optionally both added to the work list.
At step 124 MONSTER determines whether evaluation of the path condition of selected context lead to a non-zero exit value. This is indicative of that particular path causing an error state in the program. If a non-zero exit value is arrived at, that path is added to a path store F (step 126). If the exit value is a zero value, the path is not added to the path store.
Following this, at step 128, MONSTER determines whether the work list still comprises at least one context. It is noted given sufficient iterations of the process 106 and specifically sufficient iterations of step 122, the work list will eventually comprise no contexts as bound conditions are met by the execution depths of successor contexts.
If contexts remain, the iterative process returns to step 108 and continues through to step 128 until the work list is empty.
At step 128 when MONSTER determines that there are no more contexts remaining, the iterative process ends and the path store F is output. The output may comprise display at a screen or transmission of F to a solver.
Path store F comprises error states of the RISC-U program as SMT formulae.
In a particular embodiment the error states may be SMT-LIB formulae. In either case, the formulae may be passed onto an SMT solver.
In some instance MONSTER may implement merging of contexts, particularly eager mering or static state merging. When eager merging is implemented, MONSTER will determine whether a program counter of a successive context matches the program counter of a context already existing in the work list. If the respective program counters do match, MONSTER acts to merge these contexts. In this way only one context exists in the work list for each program counter at any point in time.
In an example, merging two contexts may comprise combining their respective path conditions into a logical disjunction requiring that the merge point is reachable if one of the two paths can be satisfied. Such eager merging drastically reduces the number of contexts requiring exploration by MONSTER, thereby reducing the overall complexity of the symbolic execution of the RISC-V program. This in turn lightens the burden of processing from a computer device or data carrier or the like configured to carry out the translation of the RISC-V program into SMT functions using MONSTER, and reduces. processing time.
shows schematically another example embodiment of the invention.
Herein there is shown a process 200 for identifying error states of a program.
A RISC-V program 202 is input to MONSTER 204 so as to undergo symbolic execution. In a specific simplified example the program may by a RISC-U program. The output of execution of the RISC-V program by MONSTER is a set of SMT formulae 206, specifically SMT-LIB formulae in a particularly simplified example. The output SMT formulae can then be passed to an SMT solver 208 so as to arrive at a solution/solutions identifying error states of the RISC-V program.
The RISC-V program itself may be a translation, wherein an original program for testing by MONSTER was translated into the RISC-V program, optionally RISC-U, for evaluation by MONSTER.
The some embodiments herein relate to exploration of a RISC-V program. In particular the some embodiments comprise a symbolic execution engine for exploring all possible execution paths in a given RISC-V program, up to a given bound on the path length. Exploration of the program in this way leads to potential error conditions being expressed as SMT formulae and which can subsequently be solved using existing SMT solvers. By using bit-precise modeling of RISC-V machine instructions, the claimed symbolic execution process does not need to settle with using higher-level representations of an input program, which can lead to sub-optimal exploration of the input program.

Claims (15)

  1. A method of symbolically executing a program on a computer device, the method comprising:
    • representing a program as a Reduced Instruction Set Computer Five, RISC-V, program, the RISC-V program comprising plural paths; extending a machine state of each path of the RISC-V program into a context, wherein the context comprises a path condition, a symbolic store comprising a program counter and an execution depth,
    • deriving a work list comprising the contexts;
    • setting a bound for the execution depth;
    • conducting an iterative process of evaluating each context, said iterative process comprising the steps of:
    1. selecting a context and removing the selected context from the work list;
    2. executing the selected context and determining an effect of the execution on the symbolic store of the selected context, and representing the execution in the symbolic store of the selected context as a bit value;
    3. generating a successive context, wherein the execution depth of the successive context is greater by 1 than the execution depth of the selected context;
    4. determining whether the execution depth of the successive context has met a bound condition determined by the bound for the execution depth, and:
      i) adding the successive context to the work list if the execution depth of the successive context has not met the bound condition; or
      ii) discarding the successive context;
    5. determining whether evaluation of the path condition leads to a non-zero exit value, and if it is determined that the evaluation of the path condition does lead to the non-zero exit value, storing the path condition as a satisfiability modulo theories, SMT, formula and generating a witness and; and
    6. repeating steps a) to e) until the work list no longer comprises a context; and outputting the at least one SMT formula and the witness.
  2. The method according to claim 1, wherein selecting a context comprises selecting, from the work list, a context having a lowest program counter.
  3. The method according to any of claims 1 to 2, wherein the RISC-V code is a RISC-U code, wherein RISC-U is a small subset of 64-bit RICV-V machine code.
  4. The method according to claim 3, wherein the at least one SMT formula is a subset of an SMT-LIB language formula.
  5. The method according to any of claims 1 to 4, comprising determining a program counter of at least one selected context and at least one successor context, determining whether the program counter of the selected context and the program counter of the successor context match, wherein if it is determined that there is a match, conducting eager merging of the selected context and the successor context.
  6. The method according to claim 4, comprising determining and tracking a call stack of each context, and merging contexts having matching program counters in reverse order of the execution depth of the call stack of each context.
  7. The method according to any of claims 1 to 6, comprising solving the SMT formulae via an SMT solver to determine at least one solution for identifying error states within the RISC-V program.
  8. The method according to any of claims 1 to 7, comprising configuring a computer device to run the RISC-V representation of the program with the witness as an input for exploring error states of the program.
  9. The method according to any of claims 1 to 8, comprising applying a merging condition to the RISC-V program.
  10. The method according to claim 9, wherein the merging condition is an eager merging condition.
  11. The method according to claim 9, wherein the merging condition is based on the depth of the context.
  12. A non-transitory computer-readable storage medium including computer-executable instructions configured to carry out the method according to any of claims 1 to 11.
  13. A system comprising: a memory comprising instructions executable by one or more processors, and the one or more processors coupled to the memory and operable to execute the instructions, the one or more processors being operable when executing the instructions to:
    • represent a program as a Reduced Instruction Set Computer Five, RISC-V, program, the RISC-V program comprising plural paths; extend a machine state of each path of the RISC-V program into a context, wherein the context comprises a path condition, a symbolic store comprising a program counter and an execution depth,
    • derive a work list comprising the contexts;
    • set a bound for the execution depth;
    • conduct an iterative process of evaluating each context, said iterative process comprising the steps of:
    1. selecting a context and removing the selected context from the work list;
    2. executing the selected context and determining an effect of the execution on the symbolic store of the selected context, and representing the execution in the symbolic store of the selected context as a bit value;
    3. generating a successive context, wherein the execution depth of the successive context is greater by 1 than the execution depth of the selected context;
    4. determining whether the execution depth of the successive context has met a bound condition determined by the bound for the execution depth, and:
      i) adding the successive context to the work list if the execution depth of the successive context has not met the bound condition; or
      ii) discarding the successive context;
    5. determining whether evaluation of the path condition leads to a non-zero exit value, and if it is determined that the evaluation of the path condition does lead to the non-zero exit value, storing the path condition as a satisfiability modulo theories, SMT, formula and generating a witness and; and
    6. repeating steps a) to e) until the work list no longer comprises a context; and outputting the at least one SMT formula and the witness.
  14. The system according to claim 13, wherein the one or more processors is operable when executing the instructions to solve the SMT formula for determining at least one error state of the RISC-V program.
  15. The system according to any of claims 13 to 14, comprising an output means for outputting the at least one SMT formula, wherein the output means is at least one of a register of a data carrier, a memory storage means and a screen.
PCT/CZ2023/050058 2022-10-06 2023-09-01 Symbolic execution of machine code for software testing Ceased WO2024074163A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263378564P 2022-10-06 2022-10-06
US63/378,564 2022-10-06

Publications (1)

Publication Number Publication Date
WO2024074163A1 true WO2024074163A1 (en) 2024-04-11

Family

ID=88297195

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CZ2023/050058 Ceased WO2024074163A1 (en) 2022-10-06 2023-09-01 Symbolic execution of machine code for software testing

Country Status (1)

Country Link
WO (1) WO2024074163A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130283236A1 (en) * 2012-04-23 2013-10-24 Ecole Polytechnique Federale De Lausanne (Epfl) Advantageous State Merging During Symbolic Analysis

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130283236A1 (en) * 2012-04-23 2013-10-24 Ecole Polytechnique Federale De Lausanne (Epfl) Advantageous State Merging During Symbolic Analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KLOIBHOFER SEBASTIAN SEBASTIAN KLOIBHOFER@JKU AT ET AL: "SymJEx: symbolic execution on the GraalVM", PROCEEDINGS OF THE 2021 5TH INTERNATIONAL CONFERENCE ON ELECTRONIC INFORMATION TECHNOLOGY AND COMPUTER ENGINEERING, ACMPUB27, NEW YORK, NY, USA, 4 November 2020 (2020-11-04), pages 63 - 72, XP058873828, ISBN: 978-1-4503-8432-2, DOI: 10.1145/3426182.3426187 *

Similar Documents

Publication Publication Date Title
Zhao Data-flow-based unit testing of aspect-oriented programs
US20060253739A1 (en) Method and apparatus for performing unit testing of software modules with use of directed automated random testing
US10915302B2 (en) Identification and visualization of associations among code generated from a model and sources that affect code generation
JP5923636B2 (en) Loop abstraction for model checking
Biere et al. The auspicious couple: Symbolic execution and WCET analysis
Clarisó et al. Smart bound selection for the verification of UML/OCL class diagrams
JP6528465B2 (en) State parameterization in symbolic execution for software testing
Joshi et al. Property-driven fence insertion using reorder bounded model checking
Godboley et al. Enhanced modified condition/decision coverage using exclusive-nor code transformer
US7283945B2 (en) High level verification of software and hardware descriptions by symbolic simulation using assume-guarantee relationships with linear arithmetic assumptions
Chatelain et al. VeriTracer: Context-enriched tracer for floating-point arithmetic analysis
WO2024074163A1 (en) Symbolic execution of machine code for software testing
EP2718821A2 (en) Verification of computer-executable code generated from a model
Arndt et al. Let this graph be your witness! an attestor for verifying Java pointer programs
WO2019142266A1 (en) Test case generation device, test case generation method, and test case generation program
EP3872663A1 (en) Method and device for symbolic analysis of a software program
Costa et al. Observability analysis of embedded software for coverage-directed validation
WO2024046516A1 (en) Symbolic execution in testing software
Wilhelm et al. Symbolic state traversal for WCET analysis
US6968523B2 (en) Design method of logic circuit using data flow graph
Xavier et al. Type checking Circus specifications
Mandal et al. A static analyzer for Industrial robotic applications
Ribon et al. Assertion support in high-level synthesis design flow
Madhavan et al. Purity analysis: An abstract interpretation formulation
Sri et al. A Holistic Approach to CPU Verification using Formal Techniques

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23786173

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23786173

Country of ref document: EP

Kind code of ref document: A1