CN108197035A

CN108197035A - A kind of method for detecting memory boundary overflow error

Info

Publication number: CN108197035A
Application number: CN201810101223.5A
Authority: CN
Inventors: 殷萍; 高翠芳
Original assignee: Jiangnan University
Current assignee: Jiangnan University
Priority date: 2018-02-01
Filing date: 2018-02-01
Publication date: 2018-06-22

Abstract

The invention provides a method for detecting memory boundary overflow errors, which belongs to the field of computers. Using a regular method of storing objects, it is convenient to obtain and store the boundary information of the object; realize a more efficient and smaller boundary table, store the logarithm of the object allocation boundary size; judge the binary of the logarithm of the address pointed to by the pointer The value of the pointer is used to determine whether the pointer overflows; through the loop checking optimization method with a weaker premise as the core, the pointer overflowing the boundary can be found in advance, and the detection efficiency can be improved. The invention supports efficient checking of memory overflow errors of C language and C++ language programs, can solve all space memory safety problems, and effectively improves software reliability.

Description

A Method of Detecting Memory Boundary Overflow Errors

技术领域technical field

本发明涉及一种检测内存边界溢出错误的方法，属于计算机领域。The invention relates to a method for detecting memory boundary overflow errors, which belongs to the field of computers.

背景技术Background technique

在日常生活中，各种各样的软件正以前所未有的速度融入到我们的日常生活中。而软件已经成为整个社会运行必不可少的一部分，银行，通信，教育，交通，乃至核工业等关键性行业都在利用软件来推动行业和社会的发展。然而，近年来我们的软件系统的大小和复杂度呈现出迅猛增长的态势，而且这种趋势在未来仍然不会停止。随着社会对软件依赖性的愈发严重，软件失效可能会给商业系统，甚至是安全关键系统带来巨大损失。因此，软件的可靠性正在成为软件行业发展中不可忽视的一个重要方面。In daily life, all kinds of software are being integrated into our daily life at an unprecedented speed. And software has become an indispensable part of the operation of the whole society. Key industries such as banking, communication, education, transportation, and even the nuclear industry are using software to promote the development of industries and society. However, the size and complexity of our software systems have grown exponentially in recent years, and this trend will not stop in the future. As society becomes more and more dependent on software, software failures can bring huge losses to commercial systems, even safety-critical systems. Therefore, software reliability is becoming an important aspect that cannot be ignored in the development of the software industry.

软件可靠性是指在特定环境以及特定时间段内软件保持运行不失效的概率，而软件可靠性工程是指利用一些工程上的技术来开发和维护软件系统，使得这些系统的可靠性能够进行量化的评价。按照错误产生的周期可以概括出提高软件系统可靠性的四个主要方面：通过规范开发进行错误预防和错误避免；通过验证和确认技术实现错误检测和移除；通过冗余设计使系统服务在错误发生情况下依然满足要求；通过软件可靠性模型等对错误存在性，未来失效发生可能性和失效影响等进行预测。其中软件验证技术一直是研究的重要方面，现有的软件验证技术主要包括如下几种，静态分析，模型检测，软件测试。Software reliability refers to the probability that software will keep running without failure in a specific environment and within a specific period of time, and software reliability engineering refers to the use of some engineering technologies to develop and maintain software systems, so that the reliability of these systems can be quantified evaluation of. According to the cycle of error generation, four main aspects of improving the reliability of software systems can be summarized: error prevention and error avoidance through specification development; error detection and removal through verification and confirmation technology; redundancy design to make system services in error The requirements are still met in the event of occurrence; the existence of errors, the possibility of future failures, and the impact of failures are predicted through software reliability models. Among them, software verification technology has always been an important aspect of research. Existing software verification technologies mainly include the following types, static analysis, model checking, and software testing.

静态分析指的是通过类型推断，数据流分析和约束分析等找出可能存在的错误，优点是可以在系统开发早期发现错误，但同时也会带来大量的误报和漏报；模型检测通过对软件结构的分析，将系统转化成有限状态空间模型，并利用算法验证其完备性，优点是模型检测是可证明的，缺点是形式化验证会带来状态空间爆炸问题，验证需要时间较长且模型的行为和系统的实际行为之间会存在一定的差距；软件测试是指通过设计测试用例，并在此基础上运行程序，观察程序的行为和结果，优点是实现了测试用例管理和执行的自动化，缺点是测试用例无法实现全覆盖，错误会出现遗漏，另外验证性质的描述采用的是直接定义的方式，无法通过形式化表达自动生成，限制了验证的能力。Static analysis refers to finding possible errors through type inference, data flow analysis, and constraint analysis. The advantage is that errors can be found in the early stage of system development, but it will also bring a large number of false positives and negative negatives; model detection passes The analysis of the software structure transforms the system into a finite state space model, and verifies its completeness by using an algorithm. The advantage is that model checking is provable, but the disadvantage is that formal verification will cause the problem of state space explosion, and verification takes a long time And there will be a certain gap between the behavior of the model and the actual behavior of the system; software testing refers to designing test cases and running the program on this basis to observe the behavior and results of the program. The advantage is that it realizes test case management and execution. The disadvantage of automation is that test cases cannot achieve full coverage, and errors will be missed. In addition, the description of the verification nature is directly defined, which cannot be automatically generated through formal expression, which limits the ability of verification.

和前述四种主要在软件开发过程中应用的验证技术不同的是，运行时验证是一种在软件实际运行阶段进行性质分析验证的技术，作为一种新近提出的轻量级验证技术，能够实现对软件运行的实时监控，验证软件是否符合或违反给定的正确性属性，对提高软件的可靠性有重要的意义。运行时验证是运行时监控技术和形式化规约技术的结合体，其通过监控软件运行过程获取实时状态，避免了模型检测中形式化验证带来的状态空间爆炸和复杂度问题，并考虑到了实际运行过程的环境等影响因素，同时运行时验证采用形式化语言对待验证性质进行描述，例如线性时序逻辑，正则表达式，自动机等，使运行时验证对性质表达更加丰富和灵活。运行时验证在性质分析验证的基础上，往往还支持对程序运行流程的控制，当程序运行进入某种错误状态，可以引导程序进入安全状态或者直接终止程序的运行，避免程序的错误运行带来更大的损失或者严重的事故。目前运行时验证技术已经被广泛应用在硬件运行控制Web应用和Android应用的监控，网络等各个领域。Different from the aforementioned four verification technologies that are mainly used in the software development process, runtime verification is a technology that performs property analysis and verification during the actual software running stage. As a newly proposed lightweight verification technology, it can realize Real-time monitoring of software operation to verify whether the software conforms to or violates a given correctness attribute is of great significance for improving the reliability of software. Runtime verification is a combination of runtime monitoring technology and formal specification technology. It obtains real-time status by monitoring the software running process, avoids the state space explosion and complexity problems caused by formal verification in model checking, and takes into account the actual Influencing factors such as the environment of the running process, while the runtime verification uses a formal language to describe the properties to be verified, such as linear sequential logic, regular expressions, automata, etc., making the runtime verification more rich and flexible in expressing properties. On the basis of property analysis and verification, runtime verification often also supports the control of the program running process. When the program runs into a certain error state, it can guide the program into a safe state or directly terminate the running of the program to avoid the error caused by the wrong running of the program. Greater loss or serious accident. At present, the runtime verification technology has been widely used in various fields such as the monitoring of hardware operation control Web applications and Android applications, and the network.

在运行时验证实现的关键技术中，主要包括两个部分，一个是待验证性质的描述和相应验证器的生成，二是监控器和验证器的系统集成。在性质描述方面，运行时验证采用了和模型检测相类似的技术，通过线性时序逻辑，参数化命题，正则表达式，自动机等逻辑语言进行描述。在验证器生成方面，验证器是指一类特殊的组件，可以读取一个有限事件序列，并给出一个特定的结论，验证器是依据用户所指定的性质自动生成，并随着对应的执行代码集成到待监控的系统当中，目前验证器主要采用基于自动机的验证算法和基于公式重写的验证算法生成。由于在性质描述和验证器生成方面，部分继承了模型检测中的内容，目前已有大量的成熟理论或算法可供参考，具有普遍的语言和算法通用性；而在监控器和验证器的系统集成方面，对应不同的开发平台，不同的目标语言，不同的性质描述规范会带来集成方法的不同，集成方法的设计反过来又会影响运行时验证本身性质的表达能力。因此，如何设计并实现运行时验证监控器和验证器的高效系统集成，成为运行时验证领域研究的一个重要和热点的方面，对于推动运行时验证在实际系统中的应用具有重要价值。In the key technology of runtime verification, it mainly includes two parts, one is the description of the properties to be verified and the generation of corresponding verifiers, and the other is the system integration of monitors and verifiers. In terms of property description, runtime verification adopts techniques similar to model checking, and is described by logic languages such as linear temporal logic, parameterized proposition, regular expression, and automata. In terms of validator generation, a validator refers to a special class of components that can read a finite sequence of events and give a specific conclusion. The validator is automatically generated according to the properties specified by the user, and with the corresponding execution The code is integrated into the system to be monitored. At present, the verifier is mainly generated by the verification algorithm based on automata and the verification algorithm based on formula rewriting. Due to the partial inheritance of the content of model checking in terms of property description and verifier generation, there are already a large number of mature theories or algorithms for reference, which have universal language and algorithm versatility; while in the system of monitor and verifier In terms of integration, corresponding to different development platforms, different target languages, and different property description specifications will lead to differences in integration methods, and the design of integration methods will in turn affect the ability to express the nature of runtime verification itself. Therefore, how to design and realize the efficient system integration of runtime verification monitor and verifier has become an important and hot topic in the field of runtime verification research, which is of great value in promoting the application of runtime verification in practical systems.

发明内容Contents of the invention

为解决上述问题，本发明提供一种检测内存边界溢出错误的方法。To solve the above problems, the present invention provides a method for detecting memory boundary overflow errors.

本发明提供的检测内存边界溢出错误的方法包括如下步骤：The method for detecting memory boundary overflow error provided by the present invention comprises the following steps:

步骤一：编码器读取待检测的源代码并将其转变成中间代码；Step 1: The encoder reads the source code to be detected and converts it into an intermediate code;

步骤二：分析中间代码，采用基于对象的边界检测技术完成对象的对齐分配并实现边界表；Step 2: Analyze the intermediate code, use the object-based boundary detection technology to complete the alignment allocation of objects and realize the boundary table;

步骤三：在中间代码中插入检测函数，执行边界检查；Step 3: Insert the detection function in the intermediate code and perform boundary check;

步骤四：通过分析优化，简化冗余操作，生成优化后的中间代码；Step 4: Through analysis and optimization, redundant operations are simplified, and optimized intermediate code is generated;

步骤五：将优化后的中间代码链接到二进制的库文件，生成可执行文件；Step 5: Link the optimized intermediate code to the binary library file to generate an executable file;

步骤六：运行可执行文件，检测函数会在指针解引用其所指向对象之前判断当前指针是否越出对象边界，当遇到有指针越界，缓冲区溢出等内存安全错误，程序会调用终止执行并报错。Step 6: Run the executable file, the detection function will judge whether the current pointer is beyond the boundary of the object before the pointer dereferences the object it points to. error.

在一种实施方式中，所述步骤二中采用基于对象的边界检测技术完成对象的对齐分配并实现边界表的具体过程为：In one embodiment, the specific process of using the object-based boundary detection technology in step 2 to complete the alignment allocation of objects and realize the boundary table is as follows:

首先，以对齐方式分配对象：对象的对齐分配直接利用内存分配器对程序产生的对象进行边界控制，利用二进制的内存分配器来满足内存对象的填充与对齐存储；First, allocate objects in an aligned manner: the aligned allocation of objects directly uses the memory allocator to control the boundaries of the objects generated by the program, and uses the binary memory allocator to satisfy the filling and aligned storage of memory objects;

然后，边界表的实现方法：在程序对象分配的时对对象的边界进行控制，通过填充使得对象所占内存为2的幂，在边界表中存储的是分配大小的对数的二进制表达(e＝log2(size))，通过(size＝1<<e；base＝p&～(size-1)；)操作还原出对象的分配大小和指向分配地址开始位置的指针。Then, the implementation method of the boundary table: when the program object is allocated, the boundary of the object is controlled, and the memory occupied by the object is made to be a power of 2 by filling, and the binary expression of the logarithm of the allocated size is stored in the boundary table (e =log2(size)), through the (size=1<<e; base=p&~(size-1);) operation, restore the allocation size of the object and the pointer to the start position of the allocation address.

在一种实施方式中，是使用连续数组来实现边界表。In one embodiment, a contiguous array is used to implement the boundary table.

在一种实施方式中，所述步骤三中在中间代码中插入检测函数，执行边界检查的具体操作如下：In one embodiment, in the step 3, the detection function is inserted into the intermediate code, and the specific operation of performing the boundary check is as follows:

使用p指针的值和分配大小e的对数的二进制值，其中的e是从边界表中获取的，以此保证如果q和p只在最低有效位有区别那么q指针就在边界内部。Use the value of the p pointer and the binary value of the logarithm of the allocation size e, where e is obtained from the boundary table, to ensure that the q pointer is inside the boundary if q and p differ only in the least significant bit.

在一种实施方式中，所述步骤四中具体优化步骤如下：In one embodiment, the specific optimization steps in the step 4 are as follows:

首先，由之前第三步已经在中间代码中为每个指针运算都添加了检查代码；First, check codes have been added for each pointer operation in the intermediate code from the previous third step;

其次，对存在于循环体内的所有指针运算进行最大范围的值域分析。Secondly, the maximum range of value domain analysis is performed on all pointer operations existing in the loop body.

在一种实施方式中，最大范围指的是在一个循环体内部的多个指向同一缓冲区的指针变量的最大指向范围。In one embodiment, the maximum range refers to the maximum pointing range of multiple pointer variables pointing to the same buffer within a loop body.

在一种实施方式中，所述步骤六中检测函数通过检查指针当前的指针地址是否在该指针所指对象的内部，对象的地址是通过边界表查询得到。In one embodiment, the detection function in step six checks whether the current pointer address of the pointer is inside the object pointed to by the pointer, and the address of the object is obtained by querying the boundary table.

在一种实施方式中，所述待检测的源代码为C或C++语言。In one embodiment, the source code to be detected is C or C++ language.

本发明提供的检测内存边界溢出错误的方法，使用一种规整的存储对象的方法，方便获取并存储对象的边界信息；实现更高效占据内存更小的边界表，存储对象分配边界大小的对数；通过判断指针所指向地址的对数的二进制的值来确定指针是否边界溢出；通过以更弱前提为核心的循环检查优化方法，提前发现边界溢出的指针，提高检测效率，能够快速的对软件进行验证；针对不同硬件的系统，如32位或64位的，可以调整对象分配的大小以达到最高的内存利用效率，减少内存浪费。本发明支持C语言和C++语言程序的内存溢出错误的高效检查，由于对象是以对齐方式存储的，这样既可以省去对边界值的查询时间，也可以省去存储常规对象地址的内存空间，能够解决所有的空间内存安全问题，从而有效的提高软件的可靠性。The method for detecting memory boundary overflow error provided by the present invention adopts a regular method of storing objects, which facilitates obtaining and storing object boundary information; realizes a more efficient boundary table occupying smaller memory, and stores the logarithm of object allocation boundary size ;By judging the binary value of the logarithm of the address pointed by the pointer to determine whether the pointer overflows; through the cycle inspection optimization method with a weaker premise as the core, the pointer overflowing the boundary can be found in advance, the detection efficiency can be improved, and the software can be quickly Verify; for systems with different hardware, such as 32-bit or 64-bit, you can adjust the size of object allocation to achieve the highest memory utilization efficiency and reduce memory waste. The present invention supports the efficient checking of memory overflow errors of C language and C++ language programs. Since the objects are stored in an aligned manner, the query time for boundary values can be saved, and the memory space for storing conventional object addresses can also be saved. It can solve all space memory security issues, thereby effectively improving the reliability of the software.

附图说明Description of drawings

图1为本发明实施例一提供的检测内存边界溢出错误的方法流程示意图；FIG. 1 is a schematic flowchart of a method for detecting memory boundary overflow errors provided by Embodiment 1 of the present invention;

图2为对象的实际分配边界与对象大小的关系图。FIG. 2 is a relationship diagram between the actual allocation boundary of an object and the size of the object.

具体实施方式Detailed ways

以下结合附图和具体实施例对本发明提出的一种检测内存边界溢出错误的方法作进一步详细说明。根据下面说明和权利要求书，本发明的优点和特征将更清楚。需说明的是，附图均采用非常简化的形式且均使用非精准的比例，仅用以方便、明晰地辅助说明本发明实施例的目的。A method for detecting memory boundary overflow errors proposed by the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments. Advantages and features of the present invention will be apparent from the following description and claims. It should be noted that all the drawings are in a very simplified form and use imprecise scales, and are only used to facilitate and clearly assist the purpose of illustrating the embodiments of the present invention.

实施例一Embodiment one

在实际的对象内存分配中，使用大小固定的分配边界而不是对象的实际大小作为边界。这里，我们将这种方法用于高效的向下兼容边界检查。我们使用内存分配器(如二进制BD分配器)来让分配的边界足够的简化：由于所有的分配大小都是2的幂，因此单个字节足以存储二进制对数的分配大小。进一步来说，因为一个大小为S的分配的基地址可以通过清除任何指向已分配内存区域的指针的最低有效位(log2(s))计算出来，所以没必要去存储一些没必要的信息。这样我们就可以为边界信息表设计一个在空间上和时间上同时有效的数据结构。本方法中使用一个连续数组而不是十分耗费内存的数据结构(如展开树)。In actual object memory allocation, use a fixed-size allocation boundary rather than the actual size of the object as the boundary. Here, we use this approach for efficient backward compatibility bounds checking. We use a memory allocator (such as the binary BD allocator) to make the allocation bounds sufficiently simple: since all allocation sizes are powers of 2, a single byte is sufficient to store the binary logarithm of the allocation size. Further, since the base address of an allocation of size S can be computed by clearing the least significant bits (log2(s)) of any pointers to allocated memory regions, there is no need to store unnecessary information. In this way, we can design a data structure that is both spatially and temporally efficient for the boundary information table. This method uses a contiguous array instead of a very memory-intensive data structure (such as a splay tree).

本实施例一的内容如下：The content of this embodiment one is as follows:

步骤一：编码器读取待检测的源代码并将其转变成中间代码；所述待检测的源代码为C或C++语言。Step 1: the coder reads the source code to be detected and converts it into an intermediate code; the source code to be detected is C or C++ language.

步骤二：分析步骤一中生成的中间代码，完成对象的对齐分配并实现边界表。本步骤中采用的是基于对象的边界检测技术，基于对象的含义就是在边界表中记录每个对象的边界信息。基于对象的方法是对含有指针运算的程序语句上执行检测。它使用源指针在边界表中查询对象边界信息，并且检测目标指针是否还在边界内部。如果目标不是指向它应该指向的对象，就将这个指针标记为边界溢出指针，并且拒绝该指针的任何解引用请求。但是我们会允许这个非法指针进行进一步的指针运算，因为运算最终可能得到一个在边界内部的指针。Step 2: Analyze the intermediate code generated in Step 1, complete the alignment allocation of objects and realize the boundary table. In this step, the object-based boundary detection technology is adopted, and the meaning of object-based is to record the boundary information of each object in the boundary table. The object-based approach is to perform detection on program statements that contain pointer arithmetic. It uses the source pointer to look up the object boundary information in the boundary table, and checks whether the destination pointer is still inside the boundary. If the target does not point to the object it should point to, the pointer is marked as a bounds overflow pointer and any dereference requests for the pointer are denied. But we will allow this illegal pointer to perform further pointer operations, because the operation may end up with a pointer inside the bounds.

具体过程如下：The specific process is as follows:

(1)以对齐方式分配对象：对象的对齐分配可以直接利用现有的内存分配器(如buddy分配器)对程序产生的对象进行边界控制。利用二进制的内存分配器来满足内存对象的填充与对齐存储，通俗来说就是将内存分成一个个大小相同的块(slot)，每个块的大小(slot-size)都是2的幂。这就保证了下一步的边界表能够高效的存储边界信息。图2为对象的实际分配边界与对象大小的关系图；(1) Allocate objects in alignment: The alignment allocation of objects can directly use the existing memory allocator (such as the buddy allocator) to control the boundaries of the objects generated by the program. Using a binary memory allocator to satisfy the filling and alignment storage of memory objects, generally speaking, divides the memory into blocks (slots) of the same size, and the size of each block (slot-size) is a power of 2. This ensures that the boundary table in the next step can efficiently store boundary information. FIG. 2 is a relationship diagram between the actual allocation boundary of an object and the size of the object;

(2)边界表的高效实现方法：使用非常简洁的方式来表示边界信息。在程序对象分配的时候就对对象的边界进行控制，通过填充使得对象所占内存为2的幂。这样就使得边界表中存储的是对象的分配边界而不是对象实际边界。原本存储对象的基地址和它的大小至少需要八个字节，而本实施例一中的边界表只需要一个字节就可以记录边界信息，从而可以大大降低内存的开销。我们在边界表中存储的是分配大小的对数的二进制表达(e＝log2(size))。有了这条信息，我们可以用(size＝1<<e；base＝p&～(size-1)；)操作还原出对象的分配大小和指向分配地址开始位置的指针。(2) Efficient implementation method of boundary table: use a very concise way to represent boundary information. When the program object is allocated, the boundary of the object is controlled, and the memory occupied by the object is made to be a power of 2 by filling. In this way, what is stored in the boundary table is the allocation boundary of the object instead of the actual boundary of the object. Originally, at least eight bytes are required to store the base address and size of an object, but the boundary table in the first embodiment only needs one byte to record boundary information, thereby greatly reducing memory overhead. What we store in the boundary table is the binary representation of the logarithm of the allocation size (e=log2(size)). With this information, we can use the (size=1<<e; base=p&~(size-1);) operation to restore the allocation size of the object and the pointer to the beginning of the allocation address.

本发明中使用连续数组来实现边界表，因为表内的每个条目都仅仅使用一个字节，所以边界表很小。由于上一步中对象都是以固定块大小进行存储的，并且每个块的大小slot-size都是2的幂，每个对象所占据的内存块都在边界表中有一个对象条目，因此表所占的空间是原本的1/slot size，并且我们可以调整slot-size大小来平衡内存浪费程度和表大小。In the present invention, a continuous array is used to realize the boundary table, because each entry in the table only uses one byte, so the boundary table is very small. Since the objects in the previous step are stored in a fixed block size, and the size of each block slot-size is a power of 2, the memory block occupied by each object has an object entry in the boundary table, so the table The space occupied is the original 1/slot size, and we can adjust the slot-size to balance the memory waste and table size.

步骤三：在中间代码中插入检测函数，执行边界检查。由于本步骤三是基于对象的，并不需要关注每个指针，只需要关注每个指向当前对象的指针以及其算数运算是否仍在该对象的边界内部。Step 3: Insert the detection function in the intermediate code and perform boundary check. Since the third step is based on the object, it is not necessary to pay attention to each pointer, but only needs to pay attention to whether each pointer to the current object and its arithmetic operation are still within the boundary of the object.

一般而言，检查一个关于p指针的算术运算(q＝p+i)的结果q的边界需要检查q是否在上下边界内。本实施例提供了一个不需要计算对象的上下边界的优化的边界检查方法。这个方法只需要使用p指针的值和分配大小e的对数的二进制值，其中的e是从边界表中获取的。这种对分配大小和对齐的控制可以保证：如果q和p只在最低有效位有区别，那么q指针就在边界内部。In general, checking the bounds of the result q of an arithmetic operation (q=p+i) on p pointers requires checking whether q is within the upper and lower bounds. This embodiment provides an optimized bounds checking method that does not need to calculate the upper and lower bounds of the object. This method takes only the value of the p pointer and the binary value of the logarithm of the allocation size e, where e is obtained from the bounds table. This control over allocation size and alignment guarantees that if q and p differ only in the least significant bits, then the q pointer is inside the bounds.

进一步来说，对于指针q其中sizeof(*q)>1，我们也需要检查(char*)q+sizeof(*q)–1的值是否在边界内，以预防后续的绕过分配边界直接访问*q。宽松边界检查可以避免检查q指针指向一个内置类型这种情况。对这些对齐的类型的访问不会发生重叠，因为它们的值都是2的幂且都小于slot-size。Further, for the pointer q where sizeof(*q)>1, we also need to check whether the value of (char*)q+sizeof(*q)–1 is within the boundary to prevent subsequent direct access bypassing the allocation boundary *q. Relaxed bounds checking avoids checking that the q pointer points to a built-in type. Accesses to these aligned types do not overlap because their values are all powers of 2 and are smaller than slot-size.

以一个指针操作(char*p＝buf[i])为例：Take a pointer operation (char*p=buf[i]) as an example:

第一步，右移源指针buf以获取与其对应的对象在边界表中的边界。The first step is to move the source pointer buf to the right to obtain the boundary of the corresponding object in the boundary table.

第二步，将分配大小e的对数从边界表载入到寄存器al中。In the second step, the logarithm of the allocation size e is loaded from the bounds table into register al.

第三步，将指针运算的结果与源指针buf做异或运算。In the third step, the result of the pointer operation is XORed with the source pointer buf.

第四步，通过al右移以去掉低位。The fourth step is to remove the low bits by shifting al to the right.

如果buf和p都在分配的边界内部，那他们只在log2e的最低有效位有区别。If both buf and p are within the bounds of the allocation, they differ only in the least significant bits of log2e.

步骤四：通过分析优化，简化不必要的冗余操作，生成优化后的中间代码。在边界检查中用到的典型优化方法包括：消除冗余检查，提取循环内部检查或者是仅仅提取出循环内的边界表查询操作。优化内部循环可以有效的提升性能。当一个循环内的所有访问都是同一个对象我们就尝试将边界表查询操作从循环内部提取出来。具体优化步骤如下：Step 4: Through analysis and optimization, unnecessary redundant operations are simplified, and optimized intermediate code is generated. Typical optimization methods used in bounds checking include: eliminating redundant checks, extracting checks inside loops, or extracting only bound table lookup operations inside loops. Optimizing the inner loop can effectively improve performance. When all accesses within a loop are to the same object we try to extract the boundary table lookup operation from inside the loop. The specific optimization steps are as follows:

a.由之前的几个步骤已经中间代码中为每个指针运算都添加了检查代码。a. From the previous steps, check codes have been added for each pointer operation in the intermediate code.

b.对存在于循环体内的所有指针运算进行最大范围的值域分析。最大范围指的是在一个循环体内部的多个指向同一缓冲区的指针变量的最大指向范围。但是这些指针变量的范围是不一样的，假设指针p的范围为a[0]～a[i-1]，指针q的范围为a[1]～a[i+1]，对这些指向同一缓冲区的数组指针的范围取并集，就可以得到最大范围a[0]～a[i+1]，多个指针的最大范围分析和上述过程一样。例如在下面的一个循环示例中，如果L足够的大，不优化的代码的执行将会带来非常大的运行时开销。在C语言中a[i]就相当于一个指针，我们可以在循环开始前判断a[i]的最弱的边界溢出条件，在本例中a[i]的范围为a[1]到a[L]。如果a[i]连这个最弱的条件都不满足即说明其已经溢出，则终止执行。如果满足了最弱条件则执行循环体，在循环体内部判断是否存在内部溢出。b. Analyze the maximum range of value ranges for all pointer operations that exist in the loop body. The maximum range refers to the maximum pointing range of multiple pointer variables pointing to the same buffer within a loop body. However, the ranges of these pointer variables are different. Assume that the range of pointer p is a[0]～a[i-1], and the range of pointer q is a[1]～a[i+1]. The maximum range a[0]～a[i+1] can be obtained by taking the union of the range pointers of the buffer array, and the analysis of the maximum range of multiple pointers is the same as the above process. For example, in the following loop example, if L is large enough, the execution of unoptimized code will bring a very large runtime overhead. In C language, a[i] is equivalent to a pointer. We can judge the weakest boundary overflow condition of a[i] before the loop starts. In this example, a[i] ranges from a[1] to a [L]. If a[i] does not even meet this weakest condition, it means that it has overflowed, and the execution is terminated. If the weakest condition is satisfied, the loop body is executed, and whether there is an internal overflow is judged inside the loop body.

c.将同源对象的指针的最弱条件合并。如上面代码中的a[i]和a[i-1],a[i]的最弱条件范围为a[1]到a[L]，a[i-1]的范围为a[0]到a[L-1]，合并后的同源最弱条件则为a[0]到a[L]。c. Merge the weakest conditions of pointers to objects of the same origin. For example, a[i] and a[i-1] in the above code, the weakest condition range of a[i] is a[1] to a[L], and the range of a[i-1] is a[0] to a[L-1], the weakest homologous condition after merging is a[0] to a[L].

通过这样的进入循环体前对对象指针的范围进行分析，可以简化很多不必要的冗余操作。当静态分析能确定循环体内指针值的大概的边界，那么将整个检查从循环体内提取出来是非常有效的。但是只有当静态分析能确定这些边界在每一步执行中都能被访问到时才能将这些检查从循环体内提取出来。一个循环体的边界可以很容易确定但是循环可能会在到达上边界前就终止了。在循环跳出前溢出了对象边界，这种情况下将一个检测从循环内取出就会在运行时抛出一个警报。By analyzing the scope of the object pointer before entering the loop body, many unnecessary redundant operations can be simplified. Extracting the entire check from the loop body is very efficient when static analysis can determine approximate bounds on pointer values within the loop body. But these checks can only be extracted from the loop body if the static analysis can determine that these boundaries can be visited at every step of execution. The bounds of a loop body can be easily determined but the loop may terminate before reaching the upper bound. Object bounds are overflowed before the loop breaks out, in which case taking a check out of the loop will throw an alert at runtime.

步骤五：将优化后的中间代码链接到二进制的库文件，生成可执行文件。Step 5: Link the optimized intermediate code to the binary library file to generate an executable file.

步骤六：用户运行已插入检测函数后的可执行文件，当遇到有指针越界，缓冲区溢出等内存安全错误，程序会调用终止执行并报错。在第三步中所插入的检测函数会通过检查指针当前的指针地址是否在该指针所指对象的内部，对象的地址是通过边界表查询得到的。如果在边界内部则继续执行下面的语句，如果不在内部则说明该指针已溢出，执行终止程序并报错Step 6: The user runs the executable file after the detection function has been inserted. When encountering memory security errors such as pointer out of bounds and buffer overflow, the program will call to terminate the execution and report an error. The detection function inserted in the third step checks whether the current pointer address of the pointer is inside the object pointed to by the pointer, and the address of the object is obtained by querying the boundary table. If it is inside the boundary, continue to execute the following statement, if it is not inside, it means that the pointer has overflowed, and the execution terminates the program and reports an error

本实施例一提供的支持所有针对C和C++的向下兼容边界溢出错误的检查。该方法将源代码转换成一种中间代码表示(IR)，并寻找潜在的含有不安全指针运算的操作语句，并且插入检查代码来保证他们的结果始终在边界内部。然后，将IR链接到我们的运行库和二进制库，从而产生健壮的可执行文件。Embodiment 1 provides support for checking all backward compatible boundary overflow errors for C and C++. This method converts the source code into an intermediate code representation (IR), looks for potentially unsafe pointer arithmetic operation statements, and inserts checking code to ensure that their results are always within bounds. Then, link the IR against our runtime and binary libraries, resulting in a robust executable.

虽然本发明已以较佳实施例公开如上，但其并非用以限定本发明，任何熟悉此技术的人，在不脱离本发明的精神和范围内，都可做各种的改动与修饰，因此本发明的保护范围应该以权利要求书所界定的为准。Although the present invention has been disclosed above with preferred embodiments, it is not intended to limit the present invention. Any person familiar with this technology can make various changes and modifications without departing from the spirit and scope of the present invention. Therefore The scope of protection of the present invention should be defined by the claims.

Claims

A kind of 1. method for detecting memory boundary overflow error, which is characterized in that include the following steps：

Step 1：Encoder reads source code to be detected and is converted into intermediate code；

Step 2：Intermediate code is analyzed, distributed using the alignment of object-based bound test technology completion object and realizes side Boundary's table；

Step 3：Detection function is inserted into intermediate code；

Step 4：By analysis optimization, simplify redundant operation, the intermediate code after generation optimization；

Step 5：Intermediate code after optimization is linked to binary library file, generates executable file；

Step 6：Executable file is run, detection function can judge current pointer before pointer dereference its pointed object Whether object bounds are run off, there is pointer to cross the border when encountering, the memories security error such as buffer overflow, program can call termination to perform And it reports an error.
2. the method for detection memory boundary overflow error as described in claim 1, which is characterized in that used in the step 2 The alignment that object-based bound test technology completes object distributes and realizes that the detailed process of boundary table is：

First, distribution object in aligned fashion：The object that the alignment distribution of object directly generates program using memory allocator Boundary Control is carried out, meets the filling of memory object with being aligned storage using binary memory allocator；

Then, the implementation method of boundary table：The boundary of object is controlled when program object distributes, passes through filling 2 power is saved as in shared by object, that stored in the table of boundary is the binary expression (e=log2 of the logarithm of allocated size (size)), pass through (size=1<<e；Base=p＆~(size-1)；) operation restores the allocated size of object and direction is divided Pointer with address starting position.
3. the method for detection memory boundary overflow error as claimed in claim 2, which is characterized in that come using continuous array Realize boundary table.
4. the method for detection memory boundary overflow error as described in claim 1, which is characterized in that in the step 3 in Between be inserted into detection function in code, the concrete operations of exercise boundary inspection are as follows：

Using the binary value of the logarithm of the value and allocated size e of p pointers, e therein is obtained from the table of boundary, is protected with this Card is if q and p is only if least significant bit has any different so q pointers in border inner.
5. the method for detection memory boundary overflow error as described in claim 1, which is characterized in that specific in the step 4 Optimization Steps are as follows：

First, by before the step of three in be that each pointer operation is added to inspection code in intermediate code；

Secondly, all pointer operations being present in loop body are carried out with the range analysis of maximum magnitude.
6. the method for detection memory boundary overflow error as claimed in claim 5, which is characterized in that the maximum magnitude is Refer to：The maximum of the pointer variable of the same buffering area of multiple directions inside a loop body is directed toward range.
7. the method for detection memory boundary overflow error as described in claim 1, which is characterized in that detected in the step 6 For function by whether checking the current pointer address of pointer in the inside of the pointer referent, the address of object is to pass through boundary Table is inquired to obtain.
8. the method for detection memory boundary overflow error as described in claim 1, which is characterized in that the source generation to be detected Code is C or C Plus Plus.