[go: up one dir, main page]

WO2007099322A2 - Procedural abstraction for executable code - Google Patents

Procedural abstraction for executable code Download PDF

Info

Publication number
WO2007099322A2
WO2007099322A2 PCT/GB2007/000712 GB2007000712W WO2007099322A2 WO 2007099322 A2 WO2007099322 A2 WO 2007099322A2 GB 2007000712 W GB2007000712 W GB 2007000712W WO 2007099322 A2 WO2007099322 A2 WO 2007099322A2
Authority
WO
WIPO (PCT)
Prior art keywords
code
sequence
repeated
instance
blx
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2007/000712
Other languages
French (fr)
Other versions
WO2007099322A3 (en
Inventor
Mel Pullen
Christopher Redpath
Andrew Langstaff
Andrew Sizer
Mark Divall
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Symbian Software Ltd
Original Assignee
Symbian Software Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0604136A external-priority patent/GB0604136D0/en
Application filed by Symbian Software Ltd filed Critical Symbian Software Ltd
Publication of WO2007099322A2 publication Critical patent/WO2007099322A2/en
Publication of WO2007099322A3 publication Critical patent/WO2007099322A3/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • G06F8/4434Reducing the memory space required by the program code
    • G06F8/4436Exlining; Procedural abstraction

Definitions

  • This invention relates to a method for reducing the size of a set of computer executable code, and to a reduced set of code resulting from the method.
  • the term 'computing device' is intended to include, without being limited to, Desktop and Laptop computers, Personal Digital Assistants (PDAs), Mobile Telephones, Smartphones, Digital Cameras and Digital Music Players. It also includes converged devices incorporating the functionality of one or more of the classes of device already mentioned, together with many other industrial and domestic electronic appliances.
  • NAND Flash The type of masked ROMs used in older computing devices are relatively expensive and inflexible, and modern computing devices typically rely on less expensive and more easily updated technologies, such as NAND Flash, to provide a persistent store for embedded software.
  • NAND flash is unlike the older type of ROM in that it does not permit code to be executed directly from persistent store. It has to be copied to a different type of storage, such as Random Access Memory (RAM) to be executed.
  • RAM Random Access Memory
  • embedded program code on NAND Flash behaves in the same way as hard disk drives on conventional desktop computers.
  • LZW compression scans sequentially through the data to be compressed and looks for repeated sequences. If it is possible to replace a repeated sequence with a token that occupies less space than the original, then the replacement is made and the original sequence and the token are saved in a compression dictionary. When decoding, tokens are looked up in the dictionary and replaced with the corresponding sequence, enabling the source material to be recovered from the compressed text.
  • LZW compression works best with data such as language-based text files, where relatively long repetitive sequences are extremely common. Compression results are not anything like as good for non-textual binaries such as executable code. Repeated sequences in such data are either short or, if long, are not that frequent, so the compression opportunities are much lower.
  • the compressed code remains in executable form, so it can be run without having to be decompressed first; so the cost saving in ROM is matched by a comparable saving in RAM, making devices less expensive to manufacture.
  • a method for reducing the size of a set of computer executable code to form a reduced executable version of the set of code comprising: (a) analysing the set of code to identify a sequence of instructions that is repeated within the set of code; (b) for each instance of the repeated sequence, replacing that instance with an abbreviated sequence; and (c) providing a common sequence of instructions representing a portion of code that is common to each instance of the repeated sequence; wherein each abbreviated sequence is arranged to call the common sequence, thereby providing the functionality of the instance of the repeated sequence.
  • the set of computer code could comprise instructions that include constants, and each instance of the repeated sequence is preferably identical in structure and differs only in its included constants.
  • the corresponding abbreviated sequence may comprise instructions for storing in predetermined storage locations the constants included in that instance; and the step of providing a common sequence may comprise replacing those instructions in the instances of the repeated sequence that include constants, with corresponding instructions that include identifiers of the predetermined storage locations.
  • the method could further comprise the steps of, prior to step (b), analysing the set of code to determine one or more storage locations that will not be utilised on execution of the set of code, and defining the said one or more storage locations as the predetermined storage locations.
  • the predetermined storage locations are preferably the same for each instance of the repeated sequence.
  • the predetermined storage locations could include registers and/or stacks.
  • the method could further comprise: repeating steps (b) and (c) for further repeated sequences of instructions; and collecting all resulting common sequences in a pattern dictionary.
  • the computer executable code could be binary code.
  • the method could also comprise determining from the pattern dictionary and the repeated sequences of instructions which of a plurality of compilers was used to compile the set of computer executable code.
  • the method could further comprise compressing the reduced version of the code such that it is no longer executable, the step of compressing comprising: (d) analysing the reduced version of the code to identify a binary sequence that is repeated within the reduced version of the code; (e) for each instance of the repeated binary sequence, replacing that instance with an identifier of the binary sequence; and (f) maintaining a binary sequence dictionary for associating the identified binary sequence with the corresponding identifier.
  • an operating system comprising a reduced set of computer executable code according to the second aspect.
  • a data carrier carrying the reduced set of computer executable code according to the second aspect.
  • a computing device containing the reduced set of computer executable code according to the second aspect.
  • the common sequence is preferably stored in the computing device in such a way that it is always available to programs running on the device.
  • the common sequence may be stored in a dynamic link library, and/or it may be stored in execute-in-place memory.
  • the computing device could contain further software in addition to the said reduced set of computer executable code, and the pattern dictionary could be available to be accessed by at least some of the further software such that the further software may be provided with abbreviated sequences in place of instances of the repeated sequence.
  • executable is used herein to refer to code that is in a form such that it can be acted upon by a CPU. This could be compiled, binary code, or it could be code that needs to be interpreted by an interpreter before it can be processed by a CPU.
  • the invention provides a method of compressing program code without losing the ability for the code to be executable, by identifying repeated opcode sequences which differ only in the constant parameters associated with the opcodes. These common sequences are replaced by a single occurrence of a refactored sequence which uses register contents rather than constant values, with each occurrence of the original being replaced with shim sequences which load the appropriate constant into the appropriate register before branching to or calling the common sequence.
  • executable code since executable code is generated by a compiler, it consists of a finite number of code generation sequences or patterns that are modified by parameters specifying limited variability, such as the value of a constant or the identity of a register. Because the patterns are modified by parameters each time they are generated by the compiler, they cannot be recognised by common algorithms such as LZW that rely on sequential scanning. However, the inventors have identified that the underlying patterns may be identified in the executable code to advantageous effect, as described fully below.
  • a dictionary for an executable code file to be compressed can be constructed from a parameterised set of the patterns utilised by the compiler which was used to generate the executable code from source code.
  • This technique can result in greater space savings over the original binary code format than any technique based on sequential scanning. For example, an executable file which contained multiple calls to a subroutine, each of which passed a different parameter, would appear as different sequences if scanned in a linear fashion; but using an embodiment of this invention, the repetitive nature of the pattern (albeit with a different parameter) would be recognised.
  • Figure 1 shows a set of executable code
  • Figure 2 shows the code of Figure 1 reduced in size in accordance with an embodiment of the invention
  • FIGS 3 and 4 show further example sets of code
  • Figure 5 shows a list of instructions
  • Figure 6 shows the code of Figure 5 reduced in size in accordance with an embodiment of the invention
  • Figure 7 is an overview of a technique for compressing an executable file
  • Figures 8-11 illustrate details of the compression technique shown in Figure 7.
  • Figure 1 shows an example of the phenomenon of repeated sequences with different parameters. It shows the executable code generated by a compiler for three different functions used by Symbian OS, the advanced operating system for mobile telephones developed by Symbian Ltd, London, UK, and compiled for the ARM microprocessor and the THUMB instruction set (which contains 16-bit instructions). It will be readily appreciated by those skilled in the art that the principles underlying this invention are applicable to any operating system and any instruction set for any microprocessor. Different instruction sets have instructions of different forms, and while the Thumb instruction set is discussed here by way of example, the concepts of the invention are equally applicable to other instruction sets such as, for example, the Intel instruction set (http://www.iegerlehner.ch/intel/opcode.html). In this instruction set, instructions can be of different lengths. They may include constant values. For example, a MOV operation can refer to a particular value that is to be "moved" into a register.
  • instruction is intended to include the constants that may be used by an operation, as well as the operation itself.
  • Figure 1 shows an original version of a set of computer executable code.
  • the code includes three functions, whose entry point addresses are indicated in the first line of each function (00005888 etc.).
  • Each function includes 12 instructions: PUSH, LSL , LDR, etc.
  • the definitions of these instructions can be found in the ARM Architectural Reference Manual edited by Dave Jaggar and published by Prentice Hall, or at www.arm.com.
  • Embodiments of the present invention aim to reduce the size of code such as that shown in Figure 1 by taking advantage of the similarities in the structure of the different functions, even though the binary sequences are non-identical. This is achieved by extracting sequences of instructions that are repeated within the set of code, and replacing them with abbreviated sequences that can each call to the common elements of the repeated sequences.
  • the repeated sequence LSL , LDR , CMP , B, LSL, ADD, BL/BLX, BL, Mov, STR, POP is extracted from each of the three functions and placed in a separate commonly accessible function, referred to here as a "core" function.
  • Figure 2 shows a set of executable code that is identical in functionality to the code of Figure 1, but has been reduced in size in accordance with an embodiment of the invention.
  • Each of the three original functions now exists in an abbreviated form, which is referred to here as a "shim" function.
  • the first, second and third functions each now contain only three instructions: PUSH , MOV, MOV, together with a branch instruction, B, in the case of the first and second functions, as discussed further below.
  • Each shim function thus has the effect of moving the values of the constant parameters referred to in the original function into registers.
  • registers 05 and 06 are unused in the original functions, and they are therefore available as temporary stores for holding the parameters referred to by the functions.
  • the first shim function has the effect of moving the value 4c into register 05. 4c is 4 multiplied by the constant value 13, which was referenced in line 3 of the first function of Figure 1. Thus, the first shim multiplies the first referenced parameter by 4, and moves it into one of the pre-defined available registers. Additionally, the first shim moves the value eo, which is 4 times the constant 38, into register 06.
  • the core function is the fourth function shown in Figure 2. It contains the instructions which make up the sequence found in each of the original three functions: LSL , LDR , CMP , B , LSL , ADD , BL/BLX , BL , MOV, STR, POP.
  • the shims each call to the core function. In the case of the first and second shims, this is achieved by means of the branch instruction at the end of the shim. In the case of the third shim, the end of the shim is adjacent in terms of its memory address to the beginning of the core function, and therefore the core function runs automatically when the final instruction of the third shim is executed.
  • the code for the three functions has been rearranged in such a way that their previously differing constant parameters are instead stored in the computing device's registers. Since code to achieve this is available, the constants embedded in the code are no longer required, and not merely the instruction sequences but the entire binary sequences become identical, enabling the previous three separate sequences to be replaced with one single common sequence. We refer to this rearrangement as refactoring the code.
  • the code of Figure 1 contains three functions, each having 12 16-bit Thumb instructions. This gives a total code size of 3x12x2 bytes, i.e. 72 bytes.
  • the code of Figure 2 contains three shim instructions, having 4, 4 and 3 instructions respectively, and a core function having 11 instructions. This gives a total code size of (4x2 + 4x2 + 3x2 + 11x2), i.e. 44 bytes.
  • the technique exemplified in Figures 1 and 2 turns a 72 byte set of code into a 44 byte set of code having three entry points. This constitutes a code saving of almost 40%.
  • FIGS. 3 and 4 show how this modified technique can be implemented, using as an example code from Symbian OS compiled for an ARM microprocessor. It should be noted that this specific code is shown merely to facilitate explanation of this exemplary embodiment, and is not intended to be limiting.
  • the first shim is specific to each function in a particular executable, while the second shim is for use by all the functions in that executable.
  • Figure 4 shows the same techniques applied to the function from the executable BT . DLL.
  • a further application of the concepts of this invention enables functions that differ only in small sub-sequences to be rewritten to remove those common code sequences and turn them into subfunctions. While this type of optimisation can certainly be carried out by hand on source code, the impenetrability of the compilation process means that identifying such common code sequences in binary code cannot be carried out by such conventional means.
  • the first step in this application is for the functions to have any differences identified and removed; this is very similar to the parameterisation technique previously discussed.
  • Figure 5 shows two functions, which differ only in a few opcodes, but are otherwise identical. For ease of reading, the sections with differences are shown in italics in the figure; they can be found in the 8 th and the last lines of the functions.
  • each function is rewritten to call a common subfunction for those portions of the original code that are identical.
  • the resulting code is shown in Figure 6; it can be clearly seen than subfunctioning results in smaller code size.
  • the extra compression step follows the refactoring and parameterisation described above. Each remaining function can then be broken down into small commonly occurring sequences, which are then added to an additional conventional compression dictionary.
  • CAgnRptDef : SetStartDate
  • CAgnRptDef :SetEndDate
  • the first function contributes four such sequences: 1 PUSH SUB ADD LSL MOV BL/BLX BL
  • the second function contributes two sequences:
  • the third function contributes one sequence:
  • the fourth function doesn't contribute anything, as it can be completely built from entries already in the dictionary.
  • the entries for each of these functions is constructed from the parameterisation block (that holds all the constant values) together with the list of dictionary entries that comprise each function.
  • TAgnlnstanceDateTimeld : SetStartDate ... // constant data 1 5 6
  • CAgnRptDef : SetStartDate ... // constant data 1 7 4
  • CAgnRptDef SetEndDate ... // constant data 1 7 6 It should be noted that the exact nature of the splits between the common code sequences depends on the contents of the entire body of compressed code, not simply on these four functions. So, for example, in the context of the complete code, it might be more efficient to split sequence 1 into
  • the conventional compression dictionary needs to be stored only once, in the NAND Flash, together with the compressed program files; when the contents of the ROM are copied to XIP memory (either as a core OS image at boot time or as single executables, on demand, at run time), the decompressor references the conventional compression dictionary to rebuild the original executables.
  • These executables may themselves be compressed with reference to the pattern dictionary, which will itself have been included in the body of embedded software and copied to XIP memory, either at boot time or on demand.
  • Figures 7 to 11 illustrate the implementation that was used to generate the specific compression examples discussed above.
  • Figure 7 shows the basic flow of control for a stand-alone code compression engine based on the application of the pattern dictionary technology disclosed by this invention to a single executable (either a DLL or an EXE). It relies on the information produced by the compiler and linker that produced the executable and saved it in the mapfile during the program building process.
  • mapfiles contain details of how the various functions contained in an executable are laid out in memory. There are standard mechanisms for laying this information out; many compilers use the Executable and Linking Format (ELF), as developed by Bell Labs Unix System Laboratories, for this purpose.
  • ELF format is well known to those skilled in the art and is extensively documented on the Internet; the Wikipedia article at http://en.wikipedia.org/wiki/Executable and Linkable Format is a good place to start.
  • the first steps shown in the overview in Figure 7 are to load the executable and the mapfile and use the information made available through the ELF standard to build a tree of objects (including functions) that have been generated by the compiler and included in the executable.
  • This object tree is systematically traversed, and the patterns common to the code objects in the tree are identified, and the functions containing them refactored.
  • the code section for the executable is then rebuilt; unchanged functions are copied to the new code sections, while refactored functions are reassembled.
  • a new table is built of all the object offsets in the new executable, and this is mapped onto the original table of object offsets in the old executable. This enables the relocation information needed for the new executable to be regenerated, and applied to the new executable, which can then be saved to disk.
  • Figure 8 shows how each code object is taken in turn from the object tree constructed at the start of the process, and a hash generated for each operation code (or opcode) sequence for that object. This sublist of hashes for the code object is then added to a list of hashes for the entire executable.
  • Figure 9 shows one aspect of the techniques described herein.
  • the new (refactored) executable we maintain a table mapping references to objects in the original executable which need to be mapped to different objects in the new executable.
  • a repeated hash indicates a common code sequence, and a difference in the parameters for the code sequence indicates that refactoring is needed.
  • Figure 10 shows how the opcode sequence is rewritten to use registers rather than constants, and how shim functions are constructed to take account of the differences in the parameters.
  • the reference mapping table is updated, and how the references to the refactored code sequences replace references to the original. The steps in this figure are repeated for each hash in the sublist.
  • Figure 11 shows how we traverse the new data tree and (using the references stored in Figure 8 and the mapping table generated in Figure 9) replace all references to objects in the old executable with the corresponding references to objects in the new executable.
  • the compressed code remains executable and does not have to be decompressed before it can run; it retains all the functionality of the original but with a considerably reduced size.
  • ClrmuxSAP SendMUXConnectConfi rmFrame (Thumb mode) (0064 bytes)
  • 00004a9a fOO7 BL/BLX I I I 1000007
  • 00004ae2 f007 BL/BLX I I I 1000007
  • 00004b20 a902 ADD 101 I 102 ISP I I m 00004b22: f7fe BL/BLX [ I
  • 00004a60 XXXX PUSH 1 I I I I I -XXXXXXX X I I I 00004ada: XXXX MOV
  • 00004ac4 XXXX PUSH I I I I I I -XXXXXX X I I I 00004ada: XXXX MOV I 106 144 I I I ; 0x11 * 4 00004ada: XXXX MOV I 107
  • 00004ac6 0004 LSL
  • 000Q4ad4 f7fd BL/BLX I I
  • 00004ae2 f007 BL/BLX I I I 1000007 I
  • ROXOR O52flalO Sequence: FUw"F ⁇ qFUwqt,wl ⁇ 3wqt3 >qF5!»[OFl&P;l
  • 00009b80 foio BL/BLX I I I I 1000010 I
  • H O0009b8c fOlO BL/BLX 1 1 I 1 1000010 I H BLX I I 1 I I0003de
  • 01a34c cleanupStack: :PushL(CBase*) ro m
  • I 0O009b94 fOlO BL/BLX I I I I [000010 1 m BLX I 1 I 1 I I 1 1 [0OO68e
  • 00009b9c 0020 LSL 100 104 I I loo I I I 1
  • registers for storing constants embedded in the original code.
  • stacks could be used.
  • RAM could also be used, although the advantages of such an implementation would be negligible since in practice there would need to be a separate stage of storing details of the RAM locations used for storing the constants, so that the constants could be retrieved by the shim functions.
  • registers are used, a technique known in the art as packing could be applied, whereby multiple items of information are held in a single register. For example, if a 32-bit register is available, 4 items of information each 8 bits long could be packed into the same register.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

A method for reducing the size of a set of computer executable code to form a reduced executable version of the set of code, the method comprising: (a) analysing the set of code to identify a sequence of instructions that is repeated within the set of code; (b) for each instance of the repeated sequence, replacing that instance with an abbreviated sequence; and (c) providing a common sequence of instructions representing a portion of code that is common to each instance of the repeated sequence; wherein each abbreviated sequence is arranged to call the common sequence, thereby providing the functionality of the instance of the repeated sequence.

Description

DATA PROCESSING
This invention relates to a method for reducing the size of a set of computer executable code, and to a reduced set of code resulting from the method.
In the present discussion the term 'computing device' is intended to include, without being limited to, Desktop and Laptop computers, Personal Digital Assistants (PDAs), Mobile Telephones, Smartphones, Digital Cameras and Digital Music Players. It also includes converged devices incorporating the functionality of one or more of the classes of device already mentioned, together with many other industrial and domestic electronic appliances.
It is very common for computing devices, especially portable ones, to embed their controlling software as well as some or all of their application software in a persistent memory store, which is normally protected in some way from accidental erasure. Storage media for such embedded software stores are known generically as ROMs, as Read-Only Memory was historically the first type to be used.
The type of masked ROMs used in older computing devices are relatively expensive and inflexible, and modern computing devices typically rely on less expensive and more easily updated technologies, such as NAND Flash, to provide a persistent store for embedded software. However, NAND flash is unlike the older type of ROM in that it does not permit code to be executed directly from persistent store. It has to be copied to a different type of storage, such as Random Access Memory (RAM) to be executed. In this respect, embedded program code on NAND Flash behaves in the same way as hard disk drives on conventional desktop computers.
Because of the expense of ROMs, and the continual downward pressure on the price of those computing devices marketed as consumer products (such as mobile telephones and PDAs) there has always been a pressure to use some type of compression on part or all of its embedded software; a computing device that compresses its embedded software needs a smaller ROM and is therefore less expensive to manufacture. The fact that this meant a loss of the ability to execute code in place was often sufficient reason not to compress embedded software. But on modern computing devices which making use of NAND Flash to store embedded software, nothing can be executed in place in any case; so the arguments for compressing its contents are much more compelling. Apart from the saving of space, it can be the case, if the processor is fast and the NAND flash is slow, that the time saved from having to read less content from the ROM compensates for the extra time spent on decompression.
The most common compression mechanism in use is some type of Lempel Ziv Welch (LZW) compression, as described in the Unisys patent US 4,558,302. LZW compression scans sequentially through the data to be compressed and looks for repeated sequences. If it is possible to replace a repeated sequence with a token that occupies less space than the original, then the replacement is made and the original sequence and the token are saved in a compression dictionary. When decoding, tokens are looked up in the dictionary and replaced with the corresponding sequence, enabling the source material to be recovered from the compressed text.
There are two main deficiencies in the known methods for compressing programs in ROM.
1. It is known that LZW compression works best with data such as language-based text files, where relatively long repetitive sequences are extremely common. Compression results are not anything like as good for non-textual binaries such as executable code. Repeated sequences in such data are either short or, if long, are not that frequent, so the compression opportunities are much lower.
2. Any executables that are compressed using such prior art techniques have to be decompressed in order to be run. In respect of programs in ROM, this inevitably results in an expansion of data. Therefore, the cost saving of using technologies such as NAND flash has to be offset against the increased requirement of eXecute In Place (XIP) memory such as RAM. This is particularly the case for core OS images in ROM, for which XIP memory cannot be readily reused once it has been allocated owing the nature of the static linkage typically used to reference it. (The term "core OS" is used here in a general sense to refer to those parts of an operating system that are automatically loaded at device power-up time, because their contents are required for the general running of the device.)
With a view to overcoming these deficiencies, we disclose an improved method of compressing executable code which may be particularly suitable for computing devices that are manufactured to store their ROM code in NAND Flash, it offers a solution to both of the problems described above:
1. It is specifically designed to optimise the compression of executable code.
2. The compressed code remains in executable form, so it can be run without having to be decompressed first; so the cost saving in ROM is matched by a comparable saving in RAM, making devices less expensive to manufacture.
According to a first aspect of the present invention there is provided a method for reducing the size of a set of computer executable code to form a reduced executable version of the set of code, the method comprising: (a) analysing the set of code to identify a sequence of instructions that is repeated within the set of code; (b) for each instance of the repeated sequence, replacing that instance with an abbreviated sequence; and (c) providing a common sequence of instructions representing a portion of code that is common to each instance of the repeated sequence; wherein each abbreviated sequence is arranged to call the common sequence, thereby providing the functionality of the instance of the repeated sequence.
The set of computer code could comprise instructions that include constants, and each instance of the repeated sequence is preferably identical in structure and differs only in its included constants. For each instance, the corresponding abbreviated sequence may comprise instructions for storing in predetermined storage locations the constants included in that instance; and the step of providing a common sequence may comprise replacing those instructions in the instances of the repeated sequence that include constants, with corresponding instructions that include identifiers of the predetermined storage locations.
The method could further comprise the steps of, prior to step (b), analysing the set of code to determine one or more storage locations that will not be utilised on execution of the set of code, and defining the said one or more storage locations as the predetermined storage locations.
The predetermined storage locations are preferably the same for each instance of the repeated sequence. The predetermined storage locations could include registers and/or stacks.
The method could further comprise: repeating steps (b) and (c) for further repeated sequences of instructions; and collecting all resulting common sequences in a pattern dictionary. The computer executable code could be binary code.
The method could also comprise determining from the pattern dictionary and the repeated sequences of instructions which of a plurality of compilers was used to compile the set of computer executable code.
The method could further comprise compressing the reduced version of the code such that it is no longer executable, the step of compressing comprising: (d) analysing the reduced version of the code to identify a binary sequence that is repeated within the reduced version of the code; (e) for each instance of the repeated binary sequence, replacing that instance with an identifier of the binary sequence; and (f) maintaining a binary sequence dictionary for associating the identified binary sequence with the corresponding identifier.
According to a second aspect of the invention there is provided a reduced set of computer executable code generated by any of the methods defined above.
According to a third aspect of the invention there is provided an operating system comprising a reduced set of computer executable code according to the second aspect.
According to a fourth aspect of the invention there is provided a data carrier carrying the reduced set of computer executable code according to the second aspect.
According to a fifth aspect of the invention there is provided a computing device containing the reduced set of computer executable code according to the second aspect.
The common sequence is preferably stored in the computing device in such a way that it is always available to programs running on the device. The common sequence may be stored in a dynamic link library, and/or it may be stored in execute-in-place memory. The computing device could contain further software in addition to the said reduced set of computer executable code, and the pattern dictionary could be available to be accessed by at least some of the further software such that the further software may be provided with abbreviated sequences in place of instances of the repeated sequence.
The term "executable" is used herein to refer to code that is in a form such that it can be acted upon by a CPU. This could be compiled, binary code, or it could be code that needs to be interpreted by an interpreter before it can be processed by a CPU.
In a preferred embodiment, the invention provides a method of compressing program code without losing the ability for the code to be executable, by identifying repeated opcode sequences which differ only in the constant parameters associated with the opcodes. These common sequences are replaced by a single occurrence of a refactored sequence which uses register contents rather than constant values, with each occurrence of the original being replaced with shim sequences which load the appropriate constant into the appropriate register before branching to or calling the common sequence.
The inventors have appreciated that since executable code is generated by a compiler, it consists of a finite number of code generation sequences or patterns that are modified by parameters specifying limited variability, such as the value of a constant or the identity of a register. Because the patterns are modified by parameters each time they are generated by the compiler, they cannot be recognised by common algorithms such as LZW that rely on sequential scanning. However, the inventors have identified that the underlying patterns may be identified in the executable code to advantageous effect, as described fully below.
In the present discussion we follow the standard practice in referring to a collection of patterns of this type as a dictionary, since this is a term commonly used in the literature on compression for collections of instances of repetitive patterns. However, it should be noted that there are considerable differences between dictionaries of the type referred to in the following specific description, and the dictionaries generated and used by prior art compression methods. In particular, it should be noted that the use of a conventional compression dictionary requires the recognition by a decompressor of some type of distinctively recognizable token which was inserted into the compressed bytestream by the compressor. When the decompressor comes across such a token, it looks for accompanying index data that can be directly translated into an entry in the conventional dictionary, with that entry then being substituted for both the token and the index data. Many such conventional compression techniques are lossless, in that the original can be completely reconstituted from the compressed version.
However, with the use of embodiments of the present invention no compression dictionary is required. The selection of a specific pattern in the collection of patterns that make up the dictionary is made by embedding a conventional branch, call or jump instruction in the code. Furthermore, while the compressed code is functionally equivalent to the uncompressed code, the compression offered by embodiments of the invention is lossy, as no guarantees can be offered that the original code can be reconstituted from the compressed version.
In the following description of embodiments of the present invention, we explain how a dictionary for an executable code file to be compressed can be constructed from a parameterised set of the patterns utilised by the compiler which was used to generate the executable code from source code. This technique can result in greater space savings over the original binary code format than any technique based on sequential scanning. For example, an executable file which contained multiple calls to a subroutine, each of which passed a different parameter, would appear as different sequences if scanned in a linear fashion; but using an embodiment of this invention, the repetitive nature of the pattern (albeit with a different parameter) would be recognised.
The invention will now be described by way of example, with reference to the accompanying drawings, in which:
Figure 1 shows a set of executable code;
Figure 2 shows the code of Figure 1 reduced in size in accordance with an embodiment of the invention;
Figures 3 and 4 show further example sets of code;
Figure 5 shows a list of instructions;
Figure 6 shows the code of Figure 5 reduced in size in accordance with an embodiment of the invention; Figure 7 is an overview of a technique for compressing an executable file; and Figures 8-11 illustrate details of the compression technique shown in Figure 7.
Single executable parameterised compression
Figure 1 shows an example of the phenomenon of repeated sequences with different parameters. It shows the executable code generated by a compiler for three different functions used by Symbian OS, the advanced operating system for mobile telephones developed by Symbian Ltd, London, UK, and compiled for the ARM microprocessor and the THUMB instruction set (which contains 16-bit instructions). It will be readily appreciated by those skilled in the art that the principles underlying this invention are applicable to any operating system and any instruction set for any microprocessor. Different instruction sets have instructions of different forms, and while the Thumb instruction set is discussed here by way of example, the concepts of the invention are equally applicable to other instruction sets such as, for example, the Intel instruction set (http://www.iegerlehner.ch/intel/opcode.html). In this instruction set, instructions can be of different lengths. They may include constant values. For example, a MOV operation can refer to a particular value that is to be "moved" into a register.
In the context of this invention, the term "instruction" is intended to include the constants that may be used by an operation, as well as the operation itself.
The code shown in Figure 1 will now be described in some detail for the benefit of those unfamiliar with assembly language. Figure 1 shows an original version of a set of computer executable code. The code includes three functions, whose entry point addresses are indicated in the first line of each function (00005888 etc.). Each function includes 12 instructions: PUSH, LSL , LDR, etc. The definitions of these instructions can be found in the ARM Architectural Reference Manual edited by Dave Jaggar and published by Prentice Hall, or at www.arm.com.
It can be seen from a review of the three functions shown in Figure 1 that the instructions in each function are of the same form, and that they differ only in the constant parameters to which they refer. For example, in line 3 of the first function, the LDR (Load Register) instruction references the parameter 13 (shown in italics and underlined). This indicates that the constant value 13 is to be loaded into a register. In contrast, the instruction at line 3 of the second function references od and the instruction at line 3 of the third function references 07. In all other respects the instructions at the third lines of each function are identical. Similar differences in the constant parameters referenced by the functions are indicated by italic font and underlining in Figure 1.
It can thus be seen from a comparison of the object code for the three functions shown in Figure 1 that while the form of the instructions generated by the compiler are identical, the binary sequences are non-identical: different constant parameters accompany the LDR , ADD and STR instructions in the three functions.
Embodiments of the present invention aim to reduce the size of code such as that shown in Figure 1 by taking advantage of the similarities in the structure of the different functions, even though the binary sequences are non-identical. This is achieved by extracting sequences of instructions that are repeated within the set of code, and replacing them with abbreviated sequences that can each call to the common elements of the repeated sequences. Thus, in the example of Figure 1 , the repeated sequence: LSL , LDR , CMP , B, LSL, ADD, BL/BLX, BL, Mov, STR, POP is extracted from each of the three functions and placed in a separate commonly accessible function, referred to here as a "core" function.
Figure 2 shows a set of executable code that is identical in functionality to the code of Figure 1, but has been reduced in size in accordance with an embodiment of the invention. Each of the three original functions now exists in an abbreviated form, which is referred to here as a "shim" function. Thus, the first, second and third functions each now contain only three instructions: PUSH , MOV, MOV, together with a branch instruction, B, in the case of the first and second functions, as discussed further below. Each shim function thus has the effect of moving the values of the constant parameters referred to in the original function into registers.
An analysis of the original code reveals that registers 05 and 06 are unused in the original functions, and they are therefore available as temporary stores for holding the parameters referred to by the functions. On inspection of Figure 2, it can be seen that the first shim function has the effect of moving the value 4c into register 05. 4c is 4 multiplied by the constant value 13, which was referenced in line 3 of the first function of Figure 1. Thus, the first shim multiplies the first referenced parameter by 4, and moves it into one of the pre-defined available registers. Additionally, the first shim moves the value eo, which is 4 times the constant 38, into register 06. The reason for multiplying the constants by 4 before storing them in the registers derives from a characteristic of ARM processors: it is only possible to access those addresses that are "word aligned", that is, at the start of a word boundary, with a word being defined as 4 bytes. Thus, to ensure that a value added to a register is a whole number of words, the original value (here, the values 13 and 38) is multiplied by 4.
Having loaded the constant parameters referenced in the original function into the predefined registers, the shim can then perform its second role: enabling the use of a further block of code, here a common core function. The core function is the fourth function shown in Figure 2. It contains the instructions which make up the sequence found in each of the original three functions: LSL , LDR , CMP , B , LSL , ADD , BL/BLX , BL , MOV, STR, POP. In order to provide this functionality to each of the three reduced functions, the shims each call to the core function. In the case of the first and second shims, this is achieved by means of the branch instruction at the end of the shim. In the case of the third shim, the end of the shim is adjacent in terms of its memory address to the beginning of the core function, and therefore the core function runs automatically when the final instruction of the third shim is executed.
By virtue of this arrangement, when the core function comes to be executed one of the shim functions has already run and caused the parameters specific to the corresponding original function to be available in pre-defined registers (05 and 06) which are then accessed by appropriate instructions in the core function. In this example, the core function cannot be accessed directly; it can only run by being linked to from another set of code, in this case the three shims.
Thus, in this example of the invention, the code for the three functions has been rearranged in such a way that their previously differing constant parameters are instead stored in the computing device's registers. Since code to achieve this is available, the constants embedded in the code are no longer required, and not merely the instruction sequences but the entire binary sequences become identical, enabling the previous three separate sequences to be replaced with one single common sequence. We refer to this rearrangement as refactoring the code.
It can be seen that the code of Figure 1 contains three functions, each having 12 16-bit Thumb instructions. This gives a total code size of 3x12x2 bytes, i.e. 72 bytes. The code of Figure 2 contains three shim instructions, having 4, 4 and 3 instructions respectively, and a core function having 11 instructions. This gives a total code size of (4x2 + 4x2 + 3x2 + 11x2), i.e. 44 bytes. In other words, the technique exemplified in Figures 1 and 2 turns a 72 byte set of code into a 44 byte set of code having three entry points. This constitutes a code saving of almost 40%.
Multiple executable parameterised compression
While the example above shows how the size of the code in a single module can be substantially reduced, it is important to note that repetitive code patterns such as these are a feature of compiler technology rather than a feature of individual programs. Consequently, all executable code that has been generated by the same compiler is likely to include the same code patterns.
Where the same patterns occur in multiple executables, it can often be extremely efficient to refactor them together and implement any necessary core functions in a central location, such as a DLL (Dynamic Linked Library) or an XIP (execute In Place) core OS (Operating System), which is guaranteed to be always available.
However, when refactoring functions in different executables, a modified technique has to be used to refactor them to make use of the common pattern. Figures 3 and 4 show how this modified technique can be implemented, using as an example code from Symbian OS compiled for an ARM microprocessor. It should be noted that this specific code is shown merely to facilitate explanation of this exemplary embodiment, and is not intended to be limiting.
It will be noted immediately by those skilled in the art that the function CL2CAPProtocol : : RemoveidleτimerEntry which is taken from one of 14 similar functions in the executable BT . DLL and appears at the start of Figure 3 is clearly an identical pattern to all three of the functions taken from the executable IR . DLL which appeared in Figure 1. All have the same checksum (which appears as RoXoR : 583c5bO9 at the end of each function). However, it will also be noted that the BL Branch with Link instruction that appears in the 9th line of the functions in Figure 1 takes the parameter obocβ, while the same the BL (Branch with Link) instruction that appears in the 9th line of function CL2CAPProtocol : : Remove idleTimerEntry takes the parameter
018d8.
To take account of this difference, we need to use two shims rather than one. The first shim, as before, is specific to each function in a particular executable, while the second shim is for use by all the functions in that executable.
This can be seen in the case of the functions from IR . DLL, where the new shim structure appears in Figure 3. The same shims appear as in Figure 2, but now they branch not to the common core function, but instead to a second shim, which loads the parameter obo c 6 into a register before branching off to a common core function located not within the same executable, but in a commonly accessible location such as an XIP core OS image. Note that this would require an import stub were it to look like another DLL.
Figure 4 shows the same techniques applied to the function from the executable BT . DLL. The specific Shim for the function CL2CAPProtocol : : RemoveIdleTimerEntry branches to a shim for all the 14 functions in that executable which use the same pattern; it loads the parameter oi8d8 (as used by BT . DLL) into a register before branching to the same common core function (found in a location such as an XIP core OS image) as used by IR . DLL which appears at the bottom of Figure 4.
With this modified technique, it can be seen that once a dictionary of the patterns used by a compiler has been constructed, all the binary executables in any collection that were developed with that compiler may make use of the same dictionary; this is because the dictionary is derived from the characteristics of the compiler used to generate the code.
This is a clear improvement on standard LZW compression techniques; with a single pattern dictionary in an accessible location such as XIP memory, there is no need to store a separate dictionary in each compressed file, and the compressed code does not need to be decompressed to be executed. Additional examples of parameterised compression
Two more examples of parameterisation refactoring techniques are provided below; those skilled in the art will be able to study them in conjunction with Figures 1-4 in order to better appreciate the power of embodiments of this invention.
1. Subfunctioning
A further application of the concepts of this invention enables functions that differ only in small sub-sequences to be rewritten to remove those common code sequences and turn them into subfunctions. While this type of optimisation can certainly be carried out by hand on source code, the impenetrability of the compilation process means that identifying such common code sequences in binary code cannot be carried out by such conventional means.
The first step in this application is for the functions to have any differences identified and removed; this is very similar to the parameterisation technique previously discussed.
Figure 5 shows two functions, which differ only in a few opcodes, but are otherwise identical. For ease of reading, the sections with differences are shown in italics in the figure; they can be found in the 8th and the last lines of the functions.
Using the techniques described above, each function is rewritten to call a common subfunction for those portions of the original code that are identical. The resulting code is shown in Figure 6; it can be clearly seen than subfunctioning results in smaller code size.
It should be noted that the technique only works a) if the data and registers involved in the common sequences are similar enough; and b) if there are no direct dependencies between the common sequences and the differences.
It should also be remembered that we have previously made clear that the use of the ARM instruction set in our examples is for illustration, and that the principles behind this invention are applicable to all instruction sets for all microprocessors. Those familiar with ARM assembly language programming will note that Figure 6 includes the use of a BL/BX R14 pair to call the common subfunctions. While this is a common way of calling a subfunction in ARM Thumb code, it will be appreciated that because this alters the stack frame, it should be modified if the register pair needs to be preserved.
2. Non-executable dictionary compression
Should a greater degree of code compression be desirable, an extra step may be added to the processes described above. However, a consequence of this further compression is that the code is no longer executable and has to be decompressed before it can be run. It is therefore not necessarily a suitable step to take should the executable compressed code already reside in XIP memory, as XIP memory is often expensive and the requirement to have two copies of code (the compressed and the uncompressed versions) would increase the cost of manufacture of the computing device.
However, where the executable compressed code resides in non-XIP memory (such as NAND Flash) and would in any case have to copied to XIP memory in order to be run, there are clearly definite cost advantages in the extra compression step. There may also be speed advantages in circumstances where the time taken to decompress the code is less than the extra time it would take to read a larger code block from relatively slow NAND flash.
The extra compression step follows the refactoring and parameterisation described above. Each remaining function can then be broken down into small commonly occurring sequences, which are then added to an additional conventional compression dictionary.
We illustrate the extra compression step with four small functions shown below:
TAgnException: :SetDate
PUSH SUB ADD LSL MOV BL/BLX BL MOV LSL MOV BL/BLX BLX MOV STR ADD POP
TAgnlnstanceDateTimeld: : SetStartDate
PUSH SUB ADD LSL MOV BL/BLX BL MOV LSL ADD MOV BL/BLX BLX B
CAgnRptDef : : SetStartDate
PUSH SUB ADD LSL MOV BL/BLX BL LSL MOV BL/BLX BL ADD POP
CAgnRptDef: :SetEndDate
PUSH SUB ADD LSL MOV BL/BLX BL LSL MOV BL/BLX BL B
When we remove the commonly occurring sequences in these four functions, we can generate a dictionary of small code sequences. The first function contributes four such sequences: 1 PUSH SUB ADD LSL MOV BL/BLX BL
2 MOV LSL MOV BL/BLX BLX
3 MOV STR
4 ADD POP
The second function contributes two sequences:
5 MOV LSL ADD MOV BL/BLX BLX
6 B
The third function contributes one sequence:
7 LSL MOV BL/BLX BL
The fourth function doesn't contribute anything, as it can be completely built from entries already in the dictionary.
When the compressed code block is built, the entries for each of these functions is constructed from the parameterisation block (that holds all the constant values) together with the list of dictionary entries that comprise each function.
TAgnException : : SetDate ... // constant data 1 2 3 4
TAgnlnstanceDateTimeld: : SetStartDate ... // constant data 1 5 6
CAgnRptDef : : SetStartDate ... // constant data 1 7 4
CAgnRptDef : : SetEndDate ... // constant data 1 7 6 It should be noted that the exact nature of the splits between the common code sequences depends on the contents of the entire body of compressed code, not simply on these four functions. So, for example, in the context of the complete code, it might be more efficient to split sequence 1 into
la PUSH SUB ADD LSL Ib MOV BL/BLX BL even though every one of our four functions contains it, if the two sequences appear separately a large number of times but appear less frequently together. Similarly, it may be the case that the sequence
3 +4 MOV STR ADD POP
may appear often enough to warrant its inclusion as a separate entry in the dictionary in addition to the original sequences 3 and 4.
Using both pattern and convention dictionary compression
All the techniques described herein can be effectively combined and applied to the collection of binary executables which make up the embedded software of a computing device; as explained above, devices which utilise NAND Flash ROM for this purpose are of particular interest in this context.
There are numerous benefits that arise from the more efficient and economical storage of software in such devices. The conventional compression dictionary needs to be stored only once, in the NAND Flash, together with the compressed program files; when the contents of the ROM are copied to XIP memory (either as a core OS image at boot time or as single executables, on demand, at run time), the decompressor references the conventional compression dictionary to rebuild the original executables. These executables may themselves be compressed with reference to the pattern dictionary, which will itself have been included in the body of embedded software and copied to XIP memory, either at boot time or on demand.
Note that standard LZW may still be used to compress embedded components such as non-executable data files wherever that method provides superior performance. However, for executable code, it is likely that the set of such patterns generated by the use of the concepts of this invention will be smaller than the dictionary size of a large compressed text file. In the case of code compiled for a Reduced Instruction Set Computer (RISC) processor this will almost certainly be true.
Implementation of this invention
In any given situation, the most appropriate manner of implementing the concepts of this invention will be closely dependent on the technologies used by the associated compilers and linkers. Indeed, were the general techniques disclosed herein to be adopted by the manufacturers and vendors of such tools, it is likely that an optimal level of compression might be attained.
However, for the purposes of illustration only, a sample implementation is shown in the flowcharts contained in Figures 7, 8, 9, 10 and 11. This sample implementation is not intended to restrict the application of the principles underlying this invention, in any way; indeed, those skilled in the art will be able to read the flowcharts and readily recognise the applicability of those principles to a wide variety of circumstances.
Figures 7 to 11 illustrate the implementation that was used to generate the specific compression examples discussed above. Figure 7 shows the basic flow of control for a stand-alone code compression engine based on the application of the pattern dictionary technology disclosed by this invention to a single executable (either a DLL or an EXE). It relies on the information produced by the compiler and linker that produced the executable and saved it in the mapfile during the program building process. Such mapfiles contain details of how the various functions contained in an executable are laid out in memory. There are standard mechanisms for laying this information out; many compilers use the Executable and Linking Format (ELF), as developed by Bell Labs Unix System Laboratories, for this purpose. The ELF format is well known to those skilled in the art and is extensively documented on the Internet; the Wikipedia article at http://en.wikipedia.org/wiki/Executable and Linkable Format is a good place to start.
The first steps shown in the overview in Figure 7 are to load the executable and the mapfile and use the information made available through the ELF standard to build a tree of objects (including functions) that have been generated by the compiler and included in the executable. This object tree is systematically traversed, and the patterns common to the code objects in the tree are identified, and the functions containing them refactored. The code section for the executable is then rebuilt; unchanged functions are copied to the new code sections, while refactored functions are reassembled. A new table is built of all the object offsets in the new executable, and this is mapped onto the original table of object offsets in the old executable. This enables the relocation information needed for the new executable to be regenerated, and applied to the new executable, which can then be saved to disk.
More detail on specific aspects of the overview given in Figure 7 are shown in the remaining figures. Figure 8 shows how each code object is taken in turn from the object tree constructed at the start of the process, and a hash generated for each operation code (or opcode) sequence for that object. This sublist of hashes for the code object is then added to a list of hashes for the entire executable.
At this point, we depart to Figure 9 which shows one aspect of the techniques described herein. When building the new (refactored) executable, we maintain a table mapping references to objects in the original executable which need to be mapped to different objects in the new executable. A repeated hash indicates a common code sequence, and a difference in the parameters for the code sequence indicates that refactoring is needed. At this point, Figure 10 shows how the opcode sequence is rewritten to use registers rather than constants, and how shim functions are constructed to take account of the differences in the parameters. Returning to Figure 9, we observe how the reference mapping table is updated, and how the references to the refactored code sequences replace references to the original. The steps in this figure are repeated for each hash in the sublist.
Returning to Figure 8, we mark all references to other code objects and store their locations, and we repeat the steps in this figure for each code object. Finally, Figure 11 shows how we traverse the new data tree and (using the references stored in Figure 8 and the mapping table generated in Figure 9) replace all references to objects in the old executable with the corresponding references to objects in the new executable. From the above discussion, it can be seen that the general concepts of the present invention can be used to achieve various benefits. Advantages resulting from certain embodiments of the invention are summarized below:
• Allowing reductions in the amount of relatively expensive XIP memory such as RAM that is required in computing devices for any given combination of software modules. This permits devices to be manufactured at lower cost.
• Alternatively, allowing an increased number of software modules to be run effectively in a given amount of XIP memory such as RAM.
• Unlike other methods of compressing software, the compressed code remains executable and does not have to be decompressed before it can run; it retains all the functionality of the original but with a considerably reduced size.
• Because executables are reduced in size, the resulting software can be loaded faster. As well as providing an enhanced user experience, this saves power. On mobile devices, this gives increased battery life; on devices obtaining power from conventional sources, reduced power consumption equates to less pollution and a lower carbon footprint, thereby diminishing (albeit by a small amount) global warming.
• Allowing computer programs in binary code format to be manipulated more flexibly. Where an embedded computer program has to be replaced (in persistent NAND Flash, for example) in order to update the software, only a relatively small amount of code needs to be replaced. If the part to be replaced is contained in the dictionary of common sequences, then this is particularly true since only the relevant common sequence needs to be replaced, and the functions that call the common sequence are unaffected. This means that OTA updates can be made more quickly. Even if the part to be replaced is not in the common sequence, then if the entire body of code in the OS has been compressed according to the invention, a smaller quantity of code will need to be updated.
• Facilitating maximal packing and fast loading of program code from NAND Flash. The compressed code can be loaded more quickly from NAND Flash since it is reduced in size, and because the loaded code is already executable no subsequent decompression step is needed, meaning that the code can typically be executed more quickly than is the case with standard compression techniques. • Because the dictionary of common patterns is specific to a particular compiler, it is possible to work backwards from a dictionary of patterns to find out the identity of the compiler that was used to create any executable program files. Effectively, the pattern dictionary acts as a compiler signature.
• The unique relationship between a compiler and a parameterised pattern dictionary can also enable the procedure described in the flowcharts of figures 7 to 11 to be optimised further once a pattern dictionary for a compiler already exists. After very few steps of the analysis, the correct compression dictionary can be identified and selected and the optimal code compression can be carried out.
• Conversely, if a collection of binary code program files exists, such as in a ROM for an embedded system, then multiple compression dictionaries can be stored, one for each type of compiler used to generate the program. This allows appropriate handling of the program to optimise loading and running of the program, among other actions.
Examples
In the following tables, some specific examples of code reduction are considered to further illustrate the principles of the invention.
Table 1
EXAMPLE A
This shows 2 similar large functions
Function @ 00004a60: ClrmuxSAP: : SendMUXConnectConfi rmFrame (Thumb mode) (0064 bytes)
I Cond I Rd I Rn I Rm IRS I#C3) I#(7J |#(8) |#(«) !#(+/-) ISP/PC [R 01234567 LR |#(+/-) |#S
00004a60: b53e PUSH I I -XXXXX— X I
00004a62: 0004 LSL [04 [ 100 100
00004a64: a802 ADD 100 [ 102 ISP
00004a66: f7fd BL/BLX I I |0007fd
BL I I |0003a9 |0021bc
00004a6a: 6cel LDR 101 |04 113
00004a6c: a802 ADD 100 I 102 [SP
(/) 00004a6e: 3108 ADD [01 [ |08
C 00004a70: f7fd BL/BLX I I |0007fd
CO BL I I |0003c0 |0021f4
( _/)ι 00004a74: 7f60 LDR 100 104 Ud
—1 00004a76: 2180 MOV 101 1 |80
C 00004a78: 0005 LSL I 105 1 |00 100 to mm 00004a7a: 9802 LDRH 102 |00 100 O
00004a7c: 43Od ORR 105
Ui I 101
I 00004a7e: f007 BL/BLX 1 I 1000007 m BL I I IOOOOdd |00bc3c m 00004a82: 7085 STR 105 loo 102
™1
00004a84: 7f25 LDR 105 |04 Hc
00004a86: 9802 LDRH 102 100 100
C 00004a88: f007 BL/BLX I 1 [ 1000007 m BL [ 1 1 10000d8 |00bc3c
K) 00004a8c: 70c5 STR 105 |00 103 I σ> 00004a8e: 9802 LDRH I 102 |00 100 I
00004a90: f007 BL/BLX I 1 1000007
BL I I |0000d4 |00bc3c
00004a94: 2181 MOV 101 I 181 I
00004a96: 7101 STR 101 100 104 I
00004a98: 9802 LDRH 102 100 100 I
00004a9a: fOO7 BL/BLX I I I 1000007
BL I I I IOOOOcf I00bc3c
00004a9e: 2100 MOV [01 I |00
00004aaO: 7141 STR 101 100 105
00004aa2: 6ceO LDR [00 |04 113
00004aa4: 2800 CMP I 100 |00
00004aa6: d008 B IEQ 108 1004aba
EXAMPLE A CONTINUED
00004aa8: 6cal LDR 101 |04 I 112
00004aaa: 4668 MOV 100 I 105 I
00004aac: f006 BL/BLX I 1 I I 1000006
BLX I I I I0004bc |00b428
00Q04ab0: 9900 LDRH 100 100 104
00004ab2: 9aθl LDRH 101 100 |08
00004ab4: a802 ADD 100 I 102 ISP
00004ab6: f7fd BL/BLX I I |0007fd
BL I I 1000670 100279a
00004aba: 6a20 LDR 100 104 108
00004abc: a902 ADD ioi I 102 lsp
00004abe: f7fe BL/BLX I I I0007fe
BL I I |0002cl 1003044
|#$
N)
Figure imgf000023_0001
00004ade: 9802 LDRH 102 100 100
00004aeO: 43Od ORR 105 I 101
00004ae2: f007 BL/BLX I I I 1000007
BL I I I 1 IOOOOab |00bc3c
00004ae6: 7085 STR 105 100 102
00004ae8: 7f25 LDR 105 |04 Uc
00004aea: 9802 LDRH 102 100 |00 I
00004aec: f007 BL/BLX I I I 1000007
BL I I I |0000a6 |00bc3c
00004af0: 70c5 STR 105 100 103 I
00004af2: 9802 LDRH 102 100 |00
00004af4: f007 BL/BLX 1000007
BL |0000a2 100bc3c
EXAMPLE A CONTINUED
00C04af8: 2101 MOV 01 I I 101
00004afa: 7101 STR 01 100 I |04
00004afc: 9802 LDRH 02 loo ! 100
00004afe: f007 BL/BLX I I 1000007 I
BL I I |00009d |00bc3c
00004b02: 2100 MOV 01 I I |00
00004b04: 7141 STR 01 100 I 105
00004b06: 6c60 LDR 00 104 I 111
00004b08: 2800 CMP 1 loo i 100
00004b0a: d008 B I EQ I I |08 1004ble
O0C04b0c: 6c21 LDR 01 104 110 I
OOQ04bOe: 4668 MOV 100 I [05 I
00004bl0: f006 BL/BLX I I [000006 I
BLX I I 100048a |00b428
00004bl4: 9900 LDRH 100 100 104
C 00004bl6: 9aθl LDRH ioi 100 108 CU 00004bl8: a802 ADD 100 [ 102 ISP I I
(/) 00004bla: f7fd BL/BLX I I |0007fd I
BL I I [00063e 100279a
00004ble: 6a20 LDR loo 104 |08 I I
00004b20: a902 ADD 101 I 102 ISP I I m 00004b22: f7fe BL/BLX [ I |0007fe I
(/) x BL [ I |00028f 1003044
00004b26: bd3e POP 1 I -XXXXX— X mm ROXoR: 655a6214
?0 There are 3 constants that need to be made variables
C mι- Function @ 00004a60: CIrmuxSAP: :SendMUXConnectConfirmFrame (Thumb mode) (000a bytes)
K) I Cond I Rd I Rn I Rm IRS l#(3) l#(7) |#(8) |#(«) [#(+/-) ISP/PC |R 01234567 LR [#(+/-) |#$
00004a60: XXXX PUSH 1 I I I I -XXXXXXX X I I 00004ada: XXXX MOV |06 14c I I I ; 0x13 * 4 00004ada: XXXX MOV 107 |48 1 I I ; 0x12 * 4 00004ada: XXXX MOV 103 181 I I I ; 0x81 00004b0a: XXXX B I I I ; core
Function ® 00004ac4: CIrmuxSAP: :sendMUXConnectFrame (Thumb mode) (0008 bytes)
ICond IRd I Rn IRm IRS l#(3) l#(7) |#(8) [#(«) |#(+/-) sp/pc IR 01234567 LR I#(+/-) 1#S
00004ac4: XXXX PUSH I I I I I -XXXXXXX X I I 00004ada: XXXX MOV I 106 144 I I I ; 0x11 * 4 00004ada: XXXX MOV I 107 |40 I I I ; 0x10 * 4 00004ada: XXXX MOV I 103 101 1 I 1 ; 0x01
Function @ 00004ac4: Core function (0062 bytes)
EXAMPLE A CONTINUED
00004ac6: 0004 LSL |04 1 1 00 100 I I
00Q04ac8: a802 ADD |00 I 102 ISP I I
00004aca: f7fd BL/BLX I I 10007fd 1
BL I I 1000377 |0021bc
00004ace: XXXX LDR 101 104 06
00C04ad0: a802 ADD 100 I 102 lsp
00004ad2: 3108 ADD 101 I 108
000Q4ad4: f7fd BL/BLX I I |0007fd I
BL I I |00038e |0021f4
00004ad8: 7f60 LDR |00 |04 Ud
00004ada: 2180 MOV 101 I |80
00004adc: 0005 LSL 105 I 00 100
00004ade: 9802 LDRH 102 [00 |00
00004aeO: 43Od ORR 105 I 01 1
00004ae2: f007 BL/BLX I I I 1000007 I
(/) BL I I I IOOOOab |00bc3c
C 00004ae6: 7085 STR 105 100 102 CO c/> 00004ae8: 7f25 LDR 105 |04 Hc
00004aea: 9802 LDRH [02 100 |00
00004aec: f007 BL/BLX I I I [000007 I
BL I I 1 |0000a6 |00bc3c CO m 00004af0: 70c5 STR 105 100 103
(/) 00004af2: 9802 LDRH 102 |00 loo x 00004af4: f007 BL/BLX I [ 1000007 I m BL I I 10000a2 |00bc3c m 00004af8: XXXX MOV ioi 103 I
00004afa: 7101 STR [01 |00 |04 I
Ti 00004afc: 9802 LDRH 102 |00 |00 I c ι 00004afe: f007 BL/BLX I I 1000007 m— BL I I |00009d I00bc3c
N) 00004b02: 2100 MOV ioi I |00
C0004b04: 7141 STR ioi [00 105
00004b06: XXXX LDR [00 |04 06
00004b08: 2800 CMP I 100 |00
00004b0a: d008 B EQ I I 108 |004ble
00004b0c: XXXX LDR I ioi |04 [07
00004b0e: 4668 MOV I loo I 105
00004blO: fOO6 BL/BLX I 1 I 1000006
BLX I I I I [00048a I00b428
00004bl4: 9900 LDRH I loo 100 I |04 I
00004bl6: 9a01 LDRH I 101 100 I |08 1
Q0004bl8: a802 ADD [ 100 I I 102 lsp I
00004bla: f7fd BL/BLX I I I I |0007fd
BL I I 1 I I00063e 100279a
EXAMPLE A CONTINUED
00004ble: 6a20 LDR [OO |04 [08
00004b20: a902 ADD [Ol 102 ISP
00004b22: f7fe BL/BLX I |0007fe I
BL I [ |00028f 1003044
00004b26: XXXX POP I I -XXXXXXX X I
RoXoR: 6S5a6214
EXAMPLE B
(/)
C Function @ 000099dO; cContactDatabase: :PersistGroupIdsLO (Thumb mode) (0044 bytes) CO ICoπd |Rd |Rπ |Rm [Rs [#(3) l#(7) [#(8) |#(«) |#(+-) |SP/PC|R 01234567 LR !#(+/-) l#ϊ (/) OOOOggdO: b570 PUSH I I XXX- X I I
000099d2: 0004 LSL |04 |00 |00
000099d4: 0005 LSL 105 |00 |00
000099de: 355c ADD [05 I 15c I m 000099d8: 0028 LSL [00 105 100 (/) 000099da: foio BL/BLX [ 1 1000010
X BLX I I I 1000744 |01a864: RDbRowSet: :updateL() m m [1] 000099de: 484c LDR 100 I 14c I PC |009bl0: ReIoc
000099e0: 0029 LSL |01 |05 |00 I
000099e2: fOlO BL/BLX I I 1000010 I
TJ c BLX I I [ 0004b0 |01a344: CleanupStack: :PushL(τcleanupltem)
000099e6: 0028 LSL [00 105 I 100 I I ιm— 000099e8: fOlO BL/BLX I I i 1000010 [
N) BLX I I I 100077c [01a8e4: RDbRowSet: :ColSetL() const
000099ec: 0006 LSL |06 |00 |00 I I
OO0099ee: fOlO BL/BLX I I 1000010
BLX I I I 0004ae |01a34c: cleanupStack: :PushL(CBase*)
[2] OOO099f2: 4948 LDR ioi I I [48 I PC I |009bl4: ReIoc
000099f4: 0030 LSL 100 106 100 I I I
000099f6: fOlO BL/BLX I I I I 1000010 I
BLX I I |00075e [01a8b4: CDbCoiSet: :ColNo(const TDescl6&) const
000099fa: 0001 LSL ioi [00 100 [ I I
[3] 000099fc: 6b62 LDR 102 04 I I [Od
000099fe: 0020 LSL 100 104 100
00009a00: f7ff BL/BLX 1 1 I [0007ff I
[4] BL 1 I |0007bb [00997a: CContactDatabase: :writeIdsCo"lL(int, cContactldArray*]
00009a04: fOlO BL/BLX I I 1000010 I
BLX I I I I 100048a |01a31c: CleanupStack: :PopAπdDestroyO
EXAMPLE B CONTINUED
00009a08: 0028 LSL |00 105 100 I I I
00009a0a: foio BL/BLX I I 1000010 I
BLX 10006f4 101a7f4: RDbRowSet: : PutLO
00009a0e: fOlO BL/BLX 1000010 I
BLX I 100048e |01a32c: CleanupStack: :PopO
00009al2: bd70 POP I XXX- X I 1
ROXOR: O52flalO Sequence: FUw"F§qFUwqt,wl{3wqt3 >qF5!»[OFl&P;l
Look at which registers are PUSHed, POPed, read from, and written to.
Types of parameterisation w LDR Rd, [PC, #yy]
C BL/BLX #ZZZZ
ID OP Rd, Rn, #xx (/)
^ in cases [1] and [2], we need to load a register Cor the stack) with the immediate offset, the instruction in the core function then becomes
C m LDR Rd, [PC, Rm] en new instruction then becomes
address to be passed in, and then branched to
Figure imgf000027_0001
Work backwards from the end (leaving some empty slots for POP and SP adjust)
At each different constant try to find a spare register to use as a parameter
It needs to have not been written to at that point, or read from within the whole function if a register is found then replace the #immediate version of the function with one that uses Rn/Rm/Rs in the core function
And add in the load of the new parameter to the shim
Mark the register as used. if the function has a Push, then this needs to be extended to include the new registers, the POP needs updating, along with the Exception Handler. The Push becomes the start of the shim functions
Function @ 00009b6e: ccontactDatabase: :PersistCardτemplateIdsLθ (Thumb mode) (0044 bytes)
ICond |Rd lRn |Rm |Rs |#(3) |#(7) |#(8) I#(«)!#(+-) [SP/PCI R 01234567 LR [#(+/-) l#$ O0009b6e: b570 PUSH | i 1 I I I I I I I 1 1 XXX- X I I
EXAMPLE E CONTINUED
O0009b70: 0004 LSL 104 |00 I I IOO I
00009b72: 0005 LSL 105 |00 1 I 100
00OO9b74: 355c ADD 105 1 I I 15c I I I
00009b7S: 0028 LSL 100 |05 I 1 100 1 I
00009b78: fOlO BL/BLX I I I 1 [OOOOIO I
BLX i I I 1 I 1000674 |01a864: RDbRowSet: :UpdateLO
00009b7c: 48e9 LDR |00 I I I |e9 lPC I |009f24: Reioc
00009b7e: 0029 LSL ioi 105 I I |00 I I
00009b80: foio BL/BLX I I I I 1000010 I
CΛ BLX I I I I |0003e0 |01a344: CleanupStack::PushL(τcleanupIteπϋ
C 00009b84: 0028 LSL [OO 105 I 1 100 I I
CO O0009b86: fOlO BL/BLX I 1 I i looooio I
CΛ BLX 1 I I I |0006ae |01a8e4: RDbRowSet: :Co1SetLO const
H 0000%8a: 0006 LSL 106 100 1 I loo 1 1
H O0009b8c: fOlO BL/BLX 1 1 I 1 1000010 I H BLX I I 1 I I0003de |01a34c: cleanupStack: :PushL(CBase*) ro m 0O009b9O: 49e5 LDR |01 I I 1 |e5 I PC I 1009f28: ReIoc CD
CΛ 0O009b92: 0030 LSL 100 106 I 1 |00 1 I
I 0O009b94: fOlO BL/BLX I I I I [000010 1 m BLX I 1 I 1 I I 1 1 [0OO68e |Ola8b4: CDbColset: :ColNo(const TDesd6S0 const m 00009b98: 0001 LSL ioi |00 I I 100 I I 1
H 00009b9a: 6b22 LDR 102 04 1 I I IOc 1 I I
00009b9c: 0020 LSL 100 104 I I loo I I I 1
C 00009b9e: f7ff BL/BLX I I I I |0007ff 1 ι- m BL I 1 1 |0007bb [009bl8: CContactDatabase::WriteTemp1ateIdsColLCint, CContactldArray*'
00009ba2: fOlO BL/BLX 1 I 1 1 1 1000010 I σ> BLX I 1 I |0003bc |01a31c: CleanupStack: :PopAndDestroyO
00009ba6: 0028 LSL 100 105 1 100 I I
00009ba8: fOlO BL/BLX I 1 I 1000010 [
BLX I I I 1000624 101a7f4: RDbRowSet: :PUtLO
00009bac: fOlO BL/BLX I I 1 1000010 I
BLX I I I |0003be 101a32c: CleanupStack::PopO
00009bbO: bd70 POP 1 I I I XXX- X
ROXOR: 052flalt )
Sequence: FUw"FSqFUv>qt,wl{αwqf] >qF%[rjFI&P;l
In the specific description and the examples above, the described embodiments of the invention utilise registers for storing constants embedded in the original code. It will be understood by the skilled person that other types of storage may be used to achieve the same purpose: for example, stacks could be used. In principle, RAM could also be used, although the advantages of such an implementation would be negligible since in practice there would need to be a separate stage of storing details of the RAM locations used for storing the constants, so that the constants could be retrieved by the shim functions.
Where registers are used, a technique known in the art as packing could be applied, whereby multiple items of information are held in a single register. For example, if a 32-bit register is available, 4 items of information each 8 bits long could be packed into the same register.
In general, it will be understood by the skilled person that various modifications of the methods and implementations described above may be made within the scope of the invention, as defined by the appended claims.

Claims

1. A method for reducing the size of a set of computer executable code to form a reduced executable version of the set of code, the method comprising:
(a) analysing the set of code to identify a sequence of instructions that is repeated within the set of code;
(b) for each instance of the repeated sequence, replacing that instance with an abbreviated sequence; and
(c) providing a common sequence of instructions representing a portion of code that is common to each instance of the repeated sequence; wherein each abbreviated sequence is arranged to call the common sequence, thereby providing the functionality of the instance of the repeated sequence.
2. A method according to claim 1 wherein the set of computer code comprises instructions that include constants, and each instance of the repeated sequence is identical in structure and differs only in its included constants, and wherein: for each instance, the corresponding abbreviated sequence comprises instructions for storing in predetermined storage locations the constants included in that instance; and the step of providing a common sequence comprises replacing those instructions in the instances of the repeated sequence that include constants, with corresponding instructions that include identifiers of the predetermined storage locations.
3. A method according to claim 2 further comprising the steps of, prior to step (b), analysing the set of code to determine one or more storage locations that will not be utilised on execution of the set of code, and defining the said one or more storage locations as the said predetermined storage locations.
4. A method according to claim 2 or claim 3 wherein the said predetermined storage locations are the same for each instance of the repeated sequence.
5. A method according to any of claims 2 to 4 wherein the predetermined storage locations include registers.
6. A method according to any of claims 2 to 5 wherein the predetermined storage locations include stacks.
7. A method according to any preceding claim further comprising: repeating steps (b) and (c) for further repeated sequences of instructions; and collecting all resulting common sequences in a pattern dictionary.
8. A method according to claim 7 further comprising determining from the pattern dictionary and the repeated sequences of instructions which of a plurality of compilers was used to compile the set of computer executable code.
9. A method according to any preceding claim wherein the set of computer executable code is binary code.
10. A method according to any preceding claim further comprising compressing the reduced version of the code such that it is no longer executable, the step of compressing comprising:
(d) analysing the reduced version of the code to identify a binary sequence that is repeated within the reduced version of the code;
(e) for each instance of the repeated binary sequence, replacing that instance with an identifier of the binary sequence; and
(f) maintaining a binary sequence dictionary for associating the identified binary sequence with the corresponding identifier.
11. A reduced set of computer executable code generated by the method of any of claims 1 to 10.
12. An operating system comprising a reduced set of computer executable code according to claim 11.
13. A data carrier carrying the reduced set of computer executable code according to claim 11. B2007/000712
30
14. A computing device containing the reduced set of computer executable code according to claim 11.
15. A computing device according to claim 14 wherein the common sequence is stored in such a way that it is always available to programs running on the device.
16. A computing device according to claim 15 wherein the common sequence is stored in a dynamic link library.
17. A computing device according to claim 15 or claim 16 wherein the common sequence is stored in execute-in-place memory.
18. A computing device according to any of claims 14 to 17 as dependent on claim 7, the device containing further software in addition to the said reduced set of computer executable code, wherein the pattern dictionary is available to be accessed by at least some of the further software such that the further software may be provided with abbreviated sequences in place of instances of the repeated sequence as claimed in claim 1.
PCT/GB2007/000712 2006-03-01 2007-03-01 Procedural abstraction for executable code Ceased WO2007099322A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0604136.2 2006-03-01
GB0604136A GB0604136D0 (en) 2006-03-01 2006-03-01 Improvements related to the delivery of embedded software and usage of memory in a computing device
GB0701856A GB2435701A (en) 2006-03-01 2007-01-31 Reducing the size of computer executable code
GB0701856.7 2007-01-31

Publications (2)

Publication Number Publication Date
WO2007099322A2 true WO2007099322A2 (en) 2007-09-07
WO2007099322A3 WO2007099322A3 (en) 2007-10-25

Family

ID=38069315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2007/000712 Ceased WO2007099322A2 (en) 2006-03-01 2007-03-01 Procedural abstraction for executable code

Country Status (1)

Country Link
WO (1) WO2007099322A2 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6263429B1 (en) * 1998-09-30 2001-07-17 Conexant Systems, Inc. Dynamic microcode for embedded processors
DE10216602A1 (en) * 2002-04-15 2003-10-30 Giesecke & Devrient Gmbh Optimization of compiler generated program code
US7293264B2 (en) * 2003-09-17 2007-11-06 Nokia Corporation Method and a device for abstracting instruction sequences with tail merging

Also Published As

Publication number Publication date
WO2007099322A3 (en) 2007-10-25

Similar Documents

Publication Publication Date Title
Wang et al. Uroboros: Instrumenting stripped binaries with static reassembling
CN102231117B (en) Software installment method and system for embedded platform
US6460178B1 (en) Shared library optimization for heterogeneous programs
US7712092B2 (en) Binary translation using peephole translation rules
US6110227A (en) Systems and methods for pre-processing variable initializers
US5960197A (en) Compiler dispatch function for object-oriented C
US7117507B2 (en) Software atomization
KR20050081869A (en) Views for software atomization
WO2017014318A1 (en) Instruction set simulator and simulator generation method therefor
WO2016148762A1 (en) Methods and systems for removing plt stubs from dynamically linked binaries
Lopes et al. Shrink: Reducing the isa complexity via instruction recycling
CN102364433B (en) Method for realizing Wine construction tool transplanting on ARM (Advanced RISC Machines) processor
WO2025020398A1 (en) Smart-contract calling method and execution method, and computer device and storage medium
US8656381B2 (en) Presenting machine instructions in a machine-independent tree form suitable for post-link optimizations
WO2007099324A1 (en) Duplicate code detection
US20110113409A1 (en) Symbol capabilities support within elf
CN101782860B (en) Method and device for linking program
CN113760236A (en) Relocation method, relocation device, linker and compiling system
CN1828576A (en) Method and system for data optimization and protection in digital signal processing firmware
US7698534B2 (en) Reordering application code to improve processing performance
KR100478463B1 (en) Dynamic Linking Method for Application Program
WO2007099322A2 (en) Procedural abstraction for executable code
CN119225902A (en) Binary translation method, device, electronic device and readable storage medium
WO2025020397A1 (en) Method for optimizing wasm byte code, execution method, computer device and storage medium
Engelke et al. Using LLVM for optimized lightweight binary re-writing at runtime

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07705297

Country of ref document: EP

Kind code of ref document: A2