CN117370134A

CN117370134A - Micro service performance evaluation method and device, electronic equipment and storage medium

Info

Publication number: CN117370134A
Application number: CN202311352116.7A
Authority: CN
Inventors: 吴长鹏
Original assignee: Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong Technology Information Technology Co Ltd
Priority date: 2023-10-18
Filing date: 2023-10-18
Publication date: 2024-01-09

Abstract

The present disclosure relates to a method, an apparatus, an electronic device, and a storage medium for evaluating microservice performance. The method includes: monitoring the running process of a microservice system to obtain perceived topology information, where the perceived topology information covers at least one of the topological relationships of resource scheduling and service invocation of the microservices in the microservice system; constructing fault simulation information according to the perceived topology information; performing fault scenario simulation in the microservice system according to the fault simulation information; and analyzing real-time performance data of preset monitoring indicators before, during, and after the fault scenario simulation to obtain a stability evaluation result for the microservice system. The scheme automates both the fault scenario simulation and the stability evaluation of the microservice system, and yields a more accurate evaluation result.

Description

Method, apparatus, electronic device, and storage medium for evaluating microservice performance

Technical field

The present disclosure relates to the technical field of microservices, and in particular to a method, an apparatus, an electronic device, and a storage medium for evaluating microservice performance.

Background

A microservice system is a collection of multiple microservices, a form of software architecture that can be flexibly organized, upgraded, and iterated. The functions of a complex application can be implemented by splitting it into multiple microservices. Each microservice can be deployed, run, and upgraded independently and can be developed independently in a different programming language, which makes the system flexible to extend and convenient to maintain. While a microservice system provides its required functions, a failure of one or more microservices may make the entire system unavailable or impair its functions, so stability testing and evaluation of the microservice system is necessary.

In the course of realizing the concept of the present disclosure, the inventor found that the related art has at least the following technical problems. Most related approaches require technicians to set test rules manually and to run the stability test of the microservice system manually. Writing test rules presupposes that the technicians are proficient in the microservice system and able to design suitable test cases, which places high demands on them and costs time and labor. Relying on preset test rules and test cases also leads to incomplete testing and evaluation, cannot adapt the test to the actual operating scenario, and leaves the evaluation accuracy in need of improvement.

Summary of the invention

In order to solve the above technical problems, or at least partially solve them, embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a storage medium for evaluating microservice performance.

In a first aspect, embodiments of the present disclosure provide a method for evaluating microservice performance. The evaluation method includes: monitoring the running process of a microservice system to obtain perceived topology information, where the perceived topology information covers at least one of the topological relationships of resource scheduling and service invocation of the microservices in the microservice system; constructing fault simulation information according to the perceived topology information; performing fault scenario simulation in the microservice system according to the fault simulation information; and analyzing real-time performance data of preset monitoring indicators before, during, and after the fault scenario simulation to obtain a stability evaluation result for the microservice system.

According to an embodiment of the present disclosure, constructing the fault simulation information according to the perceived topology information includes: determining, according to the perceived topology information and a preset fault type, a target object on which fault simulation is to be performed; and constructing, for the target object, fault simulation content corresponding to the preset fault type. The fault simulation information includes the target object and the fault simulation content.

According to an embodiment of the present disclosure, the preset fault type includes at least one of the following: instance down, CPU fully loaded, memory fully loaded, disk full, network packet loss, network delay, process blocking, dependent service unavailable, and dependent service delay.

According to an embodiment of the present disclosure, the perceived topology information includes: node information of first microservice nodes that have call dependencies, and the call dependencies among the first microservice nodes. Determining, according to the perceived topology information and the preset fault type, the target object on which fault simulation is to be performed includes: when the preset fault type is a first fault type, determining a first target microservice node for fault simulation according to the call dependencies and the node information of the first microservice nodes, where the first fault type includes at least one of dependent service unavailable and dependent service delay. Constructing, for the target object, the fault simulation content corresponding to the preset fault type includes: constructing, for the first target microservice node, fault simulation content corresponding to the first fault type.
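As a rough illustration of how a first target microservice node might be chosen, the following Python sketch ranks the services that other services call by their number of distinct callers. The edge list, the fault-type strings, the extra order-to-face-recognition edge, and the ranking rule are all assumptions made for this example; the disclosure only states that the choice is based on the call dependencies and the node information.

```python
from collections import defaultdict

# Hypothetical call-dependency edges as (caller, callee) pairs.
CALL_EDGES = [
    ("login", "captcha"),
    ("login", "face-recognition"),
    ("order", "payment"),
    ("order", "face-recognition"),  # illustrative edge, not from the disclosure
]

# Hypothetical identifiers for the first fault type.
DEPENDENCY_FAULTS = ("dependency_unavailable", "dependency_delay")


def select_dependency_targets(edges, fault_type):
    """Return called services, most depended-on first."""
    if fault_type not in DEPENDENCY_FAULTS:
        raise ValueError(f"not a dependency fault type: {fault_type}")
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    # Sort by descending caller count; break ties by name for determinism.
    return sorted(callers, key=lambda s: (-len(callers[s]), s))


targets = select_dependency_targets(CALL_EDGES, "dependency_unavailable")
print(targets)  # ['face-recognition', 'captcha', 'payment']
```

Ranking by caller count is only one plausible heuristic; a real selection could equally weigh node information such as instance count or business criticality.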

According to an embodiment of the present disclosure, the perceived topology information includes: node information of second microservice nodes deployed in the microservice system, and the allocation topology and priority information of the second microservice nodes in resource scheduling. Determining, according to the perceived topology information and the preset fault type, the target object on which fault simulation is to be performed includes: when the preset fault type is a second fault type, determining a second target microservice node for fault simulation according to the node information of the second microservice nodes and the priority information; or determining the second target microservice node according to the node information of the second microservice nodes, the priority information, and the allocation topology. The second fault type includes at least one of the following: instance down, CPU fully loaded, memory fully loaded, disk full, network packet loss, network delay, and process blocking. Constructing, for the target object, the fault simulation content corresponding to the preset fault type includes: constructing, for the second target microservice node, fault simulation content corresponding to the second fault type.
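The two selection variants described above (priority only, or priority together with the allocation topology) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Node` type, the fault-type strings, the "larger number means higher priority" convention, and the rule of picking the highest-priority node are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the second fault type.
RESOURCE_FAULTS = {
    "instance_down", "cpu_full", "memory_full", "disk_full",
    "packet_loss", "network_delay", "process_blocked",
}


@dataclass(frozen=True)
class Node:
    name: str
    priority: int  # assumption: a larger value means higher scheduling priority


def select_resource_targets(nodes, fault_type, allocation=None, resource=None, top_n=1):
    """Pick the top_n highest-priority nodes, optionally limited to one resource."""
    if fault_type not in RESOURCE_FAULTS:
        raise ValueError(f"not a resource fault type: {fault_type}")
    candidates = list(nodes)
    if allocation is not None and resource is not None:
        # Keep only nodes that the allocation topology places on this resource.
        on_resource = set(allocation.get(resource, []))
        candidates = [n for n in candidates if n.name in on_resource]
    ranked = sorted(candidates, key=lambda n: (-n.priority, n.name))
    return [n.name for n in ranked[:top_n]]


nodes = [Node("microservice-1", 3), Node("microservice-2", 2), Node("microservice-3", 1)]
allocation = {"cpu1": ["microservice-1", "microservice-3"], "cpu2": ["microservice-2"]}

print(select_resource_targets(nodes, "cpu_full"))                      # ['microservice-1']
print(select_resource_targets(nodes, "cpu_full", allocation, "cpu2"))  # ['microservice-2']
```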

According to an embodiment of the present disclosure, analyzing the real-time performance data of the preset monitoring indicators before, during, and after the fault scenario simulation to obtain the stability evaluation result for the microservice system includes: analyzing how the real-time performance data changes, to obtain real-time damage degree information describing how much each fault scenario disrupts the steady running state of the microservice system during and after the simulation; determining a stability score for each fault scenario according to the preset score assigned to that scenario and the real-time damage degree information; and generating a comprehensive stability score for the microservice system according to the stability scores and the weights assigned in advance to the fault scenarios, the comprehensive stability score serving as the stability evaluation result.
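The disclosure does not give the scoring formulas, so the following Python sketch is only one possible reading. It assumes the damage degree is a fraction in [0, 1], that a scenario's score is its assigned score reduced in proportion to the damage, and that the comprehensive score is a weighted average. The function names and sample numbers are hypothetical.

```python
def scenario_score(assigned_score, damage):
    """Score one scenario; damage is a fraction in [0, 1] (0 = no disruption)."""
    if not 0.0 <= damage <= 1.0:
        raise ValueError("damage must be within [0, 1]")
    return assigned_score * (1.0 - damage)


def composite_score(scenarios):
    """Weighted average of per-scenario scores.

    scenarios maps a scenario name to (assigned_score, damage, weight).
    """
    total_weight = sum(w for _, _, w in scenarios.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(scenario_score(s, d) * w for s, d, w in scenarios.values())
    return weighted / total_weight


# Hypothetical inputs: (assigned score, damage degree, weight) per scenario.
scenarios = {
    "instance_down": (100, 0.2, 5),
    "network_delay": (100, 0.5, 3),
    "cpu_full": (100, 0.0, 2),
}
print(round(composite_score(scenarios), 2))
```

With these sample inputs the per-scenario scores are 80, 50, and 100, and the weighted average is (80*5 + 50*3 + 100*2) / 10 = 75.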

According to an embodiment of the present disclosure, the evaluation method further includes: obtaining first historical state data recorded while each microservice in the microservice system is in a steady running state, and second historical state data recorded when an abnormality occurs; determining, according to the first and second historical state data, candidate monitoring indicators that represent how the microservices are running, where the candidate monitoring indicators serve as the indicator options for each microservice in an indicator configuration interface; receiving the user's selections among the indicator options and the user's custom indicator information in the indicator configuration interface; and generating the preset monitoring indicators for each microservice according to the selections and the custom indicator information.
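How candidate indicators are derived from the two sets of historical data is not specified. As a sketch under an assumed rule, the Python example below treats a metric as a candidate when its mean during abnormal periods shifts by more than `k` steady-state standard deviations, and then combines the user's selections with custom indicators. The metric names, sample values, threshold, and function names are all hypothetical.

```python
from statistics import mean, pstdev


def candidate_metrics(steady, abnormal, k=3.0):
    """Return metric names whose abnormal mean shifts by more than k steady std devs."""
    candidates = []
    for name, steady_values in steady.items():
        abnormal_values = abnormal.get(name)
        if not abnormal_values:
            continue
        shift = abs(mean(abnormal_values) - mean(steady_values))
        if shift > k * pstdev(steady_values):
            candidates.append(name)
    return candidates


def preset_metrics(candidates, selected, custom):
    """Keep the user's selections that are valid candidates, then add custom metrics."""
    chosen = [m for m in candidates if m in selected]
    return chosen + [m for m in custom if m not in chosen]


# Hypothetical historical data for one microservice.
steady = {"order_success_rate": [0.99, 0.98, 0.99, 1.0], "cpu_temperature": [50, 52, 51, 49]}
abnormal = {"order_success_rate": [0.60, 0.55], "cpu_temperature": [51, 50]}

candidates = candidate_metrics(steady, abnormal)
presets = preset_metrics(candidates, {"order_success_rate"}, ["payment_redirect_ok"])
print(candidates, presets)
```

Here only the order success rate reacts to the abnormal periods, so it is the only candidate offered; the custom indicator is appended regardless of the candidate list.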

In a second aspect, embodiments of the present disclosure provide an apparatus for evaluating microservice performance. The evaluation apparatus includes a monitoring module, a construction module, a fault simulation module, and an evaluation module. The monitoring module is configured to monitor the running process of a microservice system to obtain perceived topology information, where the perceived topology information covers at least one of the topological relationships of resource scheduling and service invocation of the microservices in the microservice system. The construction module is configured to construct fault simulation information according to the perceived topology information. The fault simulation module is configured to perform fault scenario simulation in the microservice system according to the fault simulation information. The evaluation module is configured to analyze real-time performance data of preset monitoring indicators before, during, and after the fault scenario simulation to obtain a stability evaluation result for the microservice system.

In a third aspect, embodiments of the present disclosure provide an electronic device. The electronic device includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus. The memory is configured to store a computer program. The processor is configured to implement the microservice performance evaluation method described above when executing the program stored in the memory.

In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium. A computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the microservice performance evaluation method described above is implemented.

The technical solutions provided by the embodiments of the present disclosure have at least some or all of the following advantages:

By monitoring the running process of the microservice system, perceived topology information is obtained. Because the perceived topology information covers at least one of the topological relationships of resource scheduling and service invocation of the microservices in the microservice system, at least one of the following can be sensed automatically: the allocation topology and relative priorities of multiple microservices during resource scheduling, and the network topology of the microservices during service invocation. Fault scenario simulation based on fault simulation information constructed from this perceived topology information can therefore more accurately reproduce the fault scenarios that may occur while the microservice system is running, and the stability evaluation result obtained by analyzing the real-time performance data of the preset monitoring indicators before, during, and after the simulation is correspondingly more accurate and complete. The scheme automates both the fault scenario simulation and the stability evaluation of the microservice system. Compared with schemes in which test rules and test cases are set manually and the stability test is run manually, it lowers the technical threshold for testers and improves both the degree of test automation and the accuracy of the test results.

Description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

In order to explain the embodiments of the present disclosure or the technical solutions in the related art more clearly, the drawings used in the description of the embodiments or the related art are briefly introduced below. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Figure 1 schematically shows a system architecture to which the microservice performance evaluation method of the embodiments of the present disclosure is applicable;

Figure 2 schematically shows a flow chart of a microservice performance evaluation method according to an embodiment of the present disclosure;

Figure 3 schematically shows a detailed implementation flow chart of step S220 according to an embodiment of the present disclosure;

Figure 4A schematically shows the implementation process of steps S310 and S320 according to an embodiment of the present disclosure;

Figure 4B schematically shows the implementation process of steps S310 and S320 according to another embodiment of the present disclosure;

Figure 5 schematically shows a flow chart of a microservice performance evaluation method according to another embodiment of the present disclosure;

Figure 6 schematically shows a structural block diagram of a microservice performance evaluation apparatus according to an embodiment of the present disclosure;

Figure 7 schematically shows a structural block diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed description

To make the purposes, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

A first exemplary embodiment of the present disclosure provides a method for evaluating microservice performance.

Figure 1 schematically shows a system architecture to which the microservice performance evaluation method of the embodiments of the present disclosure is applicable.

In the embodiments of the present disclosure, the collection of microservices that application software needs in order to implement one or more functions (for example, a particular business function) is described as a microservice system. Referring to Figure 1, the system architecture 100 to which the microservice performance evaluation method of the embodiments of the present disclosure is applicable includes a microservice system 110 and a performance evaluation application 120. The microservice system 110 can be deployed in various ways, including but not limited to: single-machine multi-process deployment, multi-machine multi-process deployment, container-based deployment, deployment based on a container orchestrator (for example, k8s), and deployment based on cloud functions (for example, API functions).

Single-machine multi-process deployment runs the microservices as multiple processes on a single server.

Multi-machine multi-process deployment is an upgraded form of single-machine multi-process deployment that achieves high availability and scalability by providing multiple servers.

Container-based deployment packages the microservices in containers for deployment. It supports high concurrency and can run multiple instances of a container image without conflicts.

Deployment based on a container orchestrator uses the orchestrator to deploy containers and schedule resources. Taking the k8s (short for Kubernetes) container orchestrator as an example, the smallest running unit, the pod (the smallest resource management component in Kubernetes and the smallest resource object for running a containerized application), is scheduled onto a worker node. A pod can contain one or more containers, each container runs one microservice, and the pod acts as a process running in the cluster.

Deployment based on cloud functions builds each microservice as a cloud function interface, and a microservice is called through its corresponding interface when needed. The embodiments of the present disclosure do not limit the specific deployment method of the microservice system.

The performance evaluation application 120 executes the microservice performance evaluation method provided by the embodiments of the present disclosure, automating both the simulation of fault scenarios and the evaluation of the stability of the microservice system. The performance evaluation application 120 can be installed on the server side corresponding to the microservice system, and that server side serves as one example of the microservice performance evaluation apparatus. The server side on which the performance evaluation application 120 is installed may be, but is not limited to: a host (for single-machine multi-process deployment), the master node of a service cluster (for multi-machine multi-process deployment), a service node running containers, a master node that has a container orchestrator and controls resource scheduling, or the management and control device of a cloud server.

The microservices include, but are not limited to, various types of services that can meet the needs of business, operations, development, testing, and operation-and-maintenance applications, such as login services, face recognition services, identity authentication services, permission assignment services, AI algorithm services, order services, logistics services, and payment services.

Figure 2 schematically shows a flow chart of a microservice performance evaluation method according to an embodiment of the present disclosure.

Referring to Figure 2, the microservice performance evaluation method provided by the embodiment of the present disclosure includes steps S210 to S240.

In step S210, the running process of the microservice system is monitored to obtain perceived topology information. The perceived topology information covers at least one of the topological relationships of resource scheduling and service invocation of the microservices in the microservice system.

In some embodiments, the perceived topology information can be obtained by monitoring at least one of the resource invocation and service invocation processes while the microservice system is running.

The microservice system 110 includes multiple microservices, such as microservice 1, microservice 2, and microservice 3 illustrated in Figure 1. When one or more functions of the application software are implemented, microservices call other services. In some embodiments, in a manner adapted to the deployment of the microservice system, a first sensing module can be installed on the server side, such as the host, the master node of the service cluster, the service node running containers, the master node that has a container orchestrator and controls resource scheduling, or the management and control device of the cloud server. The first sensing module senses service invocations: when a microservice makes a service call, it constructs the corresponding network call link, and it merges multiple network call links to obtain the network topology of the microservice system during service invocation. For example, the login microservice calls the verification code service and the face recognition service during execution, so when these calls are detected, network call links are constructed between the login microservice and the verification code service and between the login microservice and the face recognition service. Likewise, the order microservice calls the payment service during order settlement, so when this call is detected, a network call link is constructed between the order microservice and the payment service.
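The link construction and merging described above can be sketched in Python as follows. This is a minimal illustration that assumes each detected call is reported as a (caller, callee) pair; the event format and the service names used as identifiers are hypothetical.

```python
from collections import defaultdict


def build_call_topology(call_events):
    """Merge observed (caller, callee) call links into one adjacency map."""
    topology = defaultdict(set)
    for caller, callee in call_events:
        topology[caller].add(callee)   # one network call link
        topology.setdefault(callee, set())  # make sure leaf services appear too
    return {service: sorted(deps) for service, deps in topology.items()}


# Hypothetical observations; the repeated login-to-captcha call is deduplicated.
events = [
    ("login", "captcha"),
    ("login", "face-recognition"),
    ("order", "payment"),
    ("login", "captcha"),
]
topology = build_call_topology(events)
print(topology)
```

Storing the merged links as an adjacency map keeps every service as a key, which makes it straightforward to look up what a given service depends on when choosing fault targets later.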

When one or more functions of the application software are implemented, resources are also scheduled. While the microservices run or are called as processes (single-machine or multi-machine multi-process deployment), as containerized microservices (container-based or container-orchestrator-based deployment), or as cloud function interfaces (cloud-function-based deployment), resources such as CPU, memory, input/output (IO), network, and disk are scheduled for them. In some embodiments, in a manner adapted to the deployment of the microservice system, a second sensing module can be installed on the server side, such as the host, the master node of the service cluster, the service node running containers, the master node that has a container orchestrator and controls resource scheduling, or the management and control device of the cloud server. The second sensing module senses resource scheduling: when a resource is scheduled to a microservice, it constructs the allocation topology, and it determines the relative priorities of the microservices in resource allocation from the monitored scheduling logic. For example, when microservices 1 to 3 all request CPU core resources, the period T1 to T2 of CPU1 in a dual-core operating system (for example, a left-closed, right-open interval that excludes the right endpoint) is allocated to microservice 1, and the period T1 to T3 of CPU2 is allocated to microservice 2, while microservice 3 waits to be scheduled. Once microservice 1 releases its core, the period T2 to T4 of CPU1 is allocated to microservice 3. The allocation topology can be constructed from this scheduling record, and, given allocation logic that serves higher priorities first, it can be determined that microservice 1 has a higher priority than microservice 3.
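The CPU example above can be reproduced with a small Python sketch. It assumes the sensing module records each allocation with the request time and a half-open time slice, and it infers "A outranks B" only when both requested at the same time and A was served first on the same resource. The record layout and this inference rule are assumptions for illustration, with T1 to T4 encoded as the integers 1 to 4.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Allocation:
    service: str
    resource: str
    requested: int  # time the service asked for the resource
    start: int      # slice start, inclusive
    end: int        # slice end, exclusive


def allocation_topology(allocations):
    """Group time slices by resource: resource -> [(service, start, end), ...]."""
    topology = {}
    for a in allocations:
        topology.setdefault(a.resource, []).append((a.service, a.start, a.end))
    return topology


def relative_priorities(allocations):
    """Return (higher, lower) pairs for same-time requests served in sequence."""
    pairs = set()
    for a, b in combinations(allocations, 2):
        same_contest = a.resource == b.resource and a.requested == b.requested
        if same_contest and a.start != b.start:
            first, second = (a, b) if a.start < b.start else (b, a)
            pairs.add((first.service, second.service))
    return pairs


records = [
    Allocation("microservice-1", "cpu1", requested=1, start=1, end=2),
    Allocation("microservice-2", "cpu2", requested=1, start=1, end=3),
    Allocation("microservice-3", "cpu1", requested=1, start=2, end=4),
]
print(allocation_topology(records))
print(relative_priorities(records))  # {('microservice-1', 'microservice-3')}
```

The sketch compares only allocations on the same resource, so it says nothing about microservice 2 relative to the others; a fuller implementation would need the scheduler's queue state to rank services across resources.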

In step S220, fault simulation information is constructed according to the perceived topology information.

Because the perceived topology information covers at least one of the topological relationships of resource scheduling and service invocation of the microservices in the microservice system, at least one of the following can be sensed automatically: the allocation topology and relative priorities of multiple microservices during resource scheduling, and the network topology of the microservices during service invocation. Fault scenario simulation based on fault simulation information constructed from this perceived topology information can therefore more accurately reproduce the fault scenarios that may occur while the microservice system is running.

In step S230, fault scenario simulation is performed in the microservice system according to the fault simulation information.

Fault simulation, also called a fault drill, is a practice that follows the principles of chaos engineering. Various types of faults are orchestrated in software to simulate the abnormal states that may occur in a production system. Analyzing and responding to these abnormal states makes it possible to further optimize the microservice architecture, resource scheduling, or processing logic, which helps improve the fault tolerance of the microservice system and avoid the catastrophic consequences of real emergencies.

In some embodiments, the fault simulation information includes the target object on which fault simulation is to be performed and the fault simulation content. The fault simulation content may correspond to the following fault types: instance downtime, CPU full load, memory full load, disk full, network packet loss, network delay, process blocking, dependent service unavailable, dependent service delay, and so on. The fault simulation content specifically includes, but is not limited to, the simulation parameters, simulation duration, simulation frequency, and fault simulation trigger conditions.
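As a non-limiting illustration, the fault simulation information described above could be held in a simple record. The sketch below is in Python; the names `FaultType`, `FaultSimulationInfo`, `target`, `params`, `duration_s`, `frequency`, and `trigger`, as well as the example values, are assumptions introduced for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FaultType(Enum):
    """The fault types listed in the description."""
    INSTANCE_DOWN = "instance_down"
    CPU_FULL = "cpu_full"
    MEMORY_FULL = "memory_full"
    DISK_FULL = "disk_full"
    PACKET_LOSS = "network_packet_loss"
    NETWORK_DELAY = "network_delay"
    PROCESS_BLOCKED = "process_blocked"
    DEPENDENCY_UNAVAILABLE = "dependency_unavailable"
    DEPENDENCY_DELAY = "dependency_delay"


@dataclass
class FaultSimulationInfo:
    """Target object plus fault simulation content (field names are hypothetical)."""
    target: str                                  # identifier of the target object
    fault_type: FaultType                        # which fault to simulate
    params: dict = field(default_factory=dict)   # simulation parameters
    duration_s: float = 60.0                     # simulation duration, in seconds
    frequency: int = 1                           # how many times the fault is injected
    trigger: Optional[str] = None                # trigger condition, if any


# Example: simulate a delayed dependency for a hypothetical order service.
info = FaultSimulationInfo(
    target="order-service",
    fault_type=FaultType.DEPENDENCY_DELAY,
    params={"delay_ms": 500},
    duration_s=120.0,
)
```

Keeping the target object separate from the fault content mirrors the two parts named in the description, so the same content can be reused for different targets.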

In some embodiments, individual fault scenarios can be simulated one at a time at staggered times to test the response capability and stability of the microservice system under a single fault; multiple fault scenarios can also be combined and simulated together to test the response capability and stability of the microservice system under compound faults.
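The difference between staggered single-fault simulation and combined simulation can be sketched as two scheduling helpers. This is only an illustration under assumed names (`staggered_schedule`, `combined_schedule`) and an assumed representation of a schedule as `(fault, start, end)` tuples in seconds; the disclosure does not prescribe a scheduling format.

```python
def staggered_schedule(faults, duration_s, gap_s):
    """Give each fault its own non-overlapping time window."""
    schedule = []
    start = 0.0
    for fault in faults:
        schedule.append((fault, start, start + duration_s))
        start += duration_s + gap_s
    return schedule


def combined_schedule(faults, duration_s):
    """Run all faults in the same time window to form a compound fault."""
    return [(fault, 0.0, duration_s) for fault in faults]


print(staggered_schedule(["cpu_full", "network_delay"], 60.0, 30.0))
# [('cpu_full', 0.0, 60.0), ('network_delay', 90.0, 150.0)]
```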

In step S240, the real-time performance data of the preset monitoring indicators before, during, and after the fault scenario simulation are analyzed to obtain the stability evaluation result of the microservice system.

The preset monitoring indicators are indicators associated with the functional evaluation of the microservices. For example, they may be indicators that change in an objective, repeatable way as fault scenarios occur and that reflect changes in microservice performance, so that the stability of the microservice system can be evaluated from changes in these indicators. For an order service, the corresponding preset monitoring indicators include, but are not limited to: the order success rate, whether the page displays normally, whether the jump to the payment page works normally, and whether the payment succeeds.

In some embodiments, the real-time performance data of the microservice system before, during, and after the fault scenario simulation are scored with a preset scoring algorithm. The real-time performance data corresponding to each preset monitoring indicator serve as the input of the scoring algorithm, which outputs a stability score for each moment during and after the simulation of each fault scenario.

In some embodiments, the scoring algorithm scores mainly according to the importance of the fault, the degree to which the fault disrupts the business steady-state indicators, and other relevant factors.

In step S240, analyzing the real-time performance data of the preset monitoring indicators before, during, and after the fault scenario simulation to obtain the stability evaluation result of the microservice system includes: analyzing the change pattern of the real-time performance data to obtain real-time damage degree information describing how much each fault scenario disrupts the steady state of the microservice system during and after the simulation; determining the stability score under each fault scenario according to the preset assigned score of that scenario and the real-time damage degree information; and generating a comprehensive stability score of the microservice system according to the stability scores and the weights pre-assigned to the fault scenarios, the comprehensive stability score serving as the stability evaluation result.

The steady state of the microservice system means that the business, functions, or operation of the microservice system are in a healthy and stable state. As a fault scenario simulation proceeds, the real-time performance data of the preset monitoring indicators change in a recognizable pattern, for example by deteriorating. The effect of that deterioration on the performance of the microservice system is then considered: the larger the effect, the greater the damage degree represented by the corresponding real-time damage degree information. For example, while the microservice system is implementing the ordering function and the fault scenario being simulated is that the payment service on which the order microservice depends is unavailable, indicators such as the order success rate, whether the page displays normally, whether the payment page jump is normal, and whether the payment succeeds change accordingly. The data for the last two indicators, for instance, change from a normal jump to states such as no jump and payment exception, from which the real-time damage degree information for the steady state of the microservice system can be obtained. In some embodiments, scores and their change trends can be allotted according to how strongly each preset monitoring indicator is correlated with stability: the stronger the correlation, the larger the allotted score. The change in a monitoring indicator after a fault scenario occurs may be qualitative or quantitative. For qualitative changes, the score can be divided into corresponding grade intervals; for quantitative changes, the specific score is adjusted linearly according to the degree of the change.
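One way to read the qualitative and quantitative cases is as two small mapping functions that each return a damage degree between 0 and 1. The sketch below is an assumption-laden illustration: the function names, the three grade labels and their values in `QUALITATIVE_LEVELS`, and the choice to measure a quantitative change as a relative drop from a pre-simulation baseline are all hypothetical and are not specified in the disclosure.

```python
# Hypothetical grade intervals for indicators that change qualitatively,
# e.g. "payment page jump is normal" -> "no jump".
QUALITATIVE_LEVELS = {"normal": 0.0, "degraded": 0.5, "failed": 1.0}


def qualitative_damage(state):
    """Map a qualitative indicator state to a damage degree in [0, 1]."""
    return QUALITATIVE_LEVELS[state]


def quantitative_damage(baseline, observed):
    """Linear damage degree for a 'higher is better' indicator.

    `baseline` is the value before the simulation (for example the order
    success rate), `observed` is the value during or after the simulation.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    drop = (baseline - observed) / baseline
    return min(1.0, max(0.0, drop))  # clamp so improvements count as no damage


print(quantitative_damage(1.0, 0.75))  # 0.25
print(qualitative_damage("failed"))    # 1.0
```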

In some embodiments, similar faults can be grouped into one major category, and the score can be divided evenly among the major categories. For example, with a full score of 100 for the steady state and four major fault categories, each category has a total of 25 points, which represents stable operation of the microservice system under that category. If a major category contains several faults, the score given to each fault can be allocated flexibly: evenly, or according to the relative importance of the faults, their actual frequency of occurrence, and so on. In some embodiments, the stability score under a fault scenario = the preset assigned score of the fault scenario × (1 - real-time damage degree information), where the real-time damage degree information may, for example, be a percentage. In some embodiments, the stability score under a fault scenario = the preset assigned score of the fault scenario × (1 - real-time damage degree information) × adjustment coefficient. The adjustment coefficient is used to tune the accuracy of the stability score and can be adjusted and optimized according to the gap between the simulation results and the behavior observed in fault scenarios that actually occurred.
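The two scoring formulas in this paragraph can be written out directly. The following Python sketch uses hypothetical function names and represents the real-time damage degree as a fraction between 0 and 1 rather than a percentage; with the default adjustment coefficient of 1.0 it reduces to the first formula.

```python
def allocate_category_scores(categories, full_score=100.0):
    """Divide the full score evenly among the major fault categories."""
    share = full_score / len(categories)
    return {category: share for category in categories}


def scenario_stability_score(assigned_score, damage_degree, adjustment=1.0):
    """assigned score x (1 - damage degree) x adjustment coefficient."""
    if not 0.0 <= damage_degree <= 1.0:
        raise ValueError("damage_degree must be a fraction between 0 and 1")
    return assigned_score * (1.0 - damage_degree) * adjustment


# Four major categories share 100 points, so each receives 25.
scores = allocate_category_scores(["resource", "network", "process", "dependency"])
print(scores["network"])                        # 25.0
print(scenario_stability_score(25.0, 0.5))      # 12.5
```

For instance, a scenario assigned 25 points whose simulation causes a damage degree of 50% keeps 12.5 of its points.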

In some embodiments, generating the comprehensive stability score of the microservice system according to the stability scores and the weights pre-assigned to the fault scenarios includes: calculating the weighted sum of the stability scores under the fault scenarios and their corresponding weights, the weighted sum serving as the comprehensive stability score of the microservice system.
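The weighted sum can be sketched in a few lines. The function name and the dictionary-based inputs are assumptions for illustration; the disclosure does not state whether the weights are normalized, so the sketch leaves them as given.

```python
def comprehensive_stability_score(scores, weights):
    """Weighted sum of per-scenario stability scores.

    `scores` and `weights` are dicts keyed by fault scenario name.
    """
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight for scenarios: {sorted(missing)}")
    return sum(scores[name] * weights[name] for name in scores)


scores = {"cpu_full": 20.0, "dependency_unavailable": 10.0}
weights = {"cpu_full": 1.0, "dependency_unavailable": 0.5}
print(comprehensive_stability_score(scores, weights))  # 25.0
```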

In the embodiment including steps S210 to S240, perceived topology information is obtained by monitoring the running process of the microservice system. Because the perceived topology information covers at least one topological relationship among the resource scheduling and the service invocation of the microservices, at least one of the following can be sensed automatically: the allocation topology relationship and relative priorities of multiple microservices during resource scheduling, and the network topology relationship of the microservices during service invocation. Fault scenario simulation based on fault simulation information constructed from the perceived topology information can therefore more accurately reproduce the fault scenarios that may occur during operation, and the stability evaluation result obtained by analyzing the real-time performance data of the preset monitoring indicators before, during, and after the simulation is correspondingly more accurate and complete. The scheme simulates fault scenarios and evaluates the stability of the microservice system automatically. Compared with a scheme in which test rules and test cases are set by hand and the stability test is run manually, it lowers the technical threshold for testers and improves both the degree of test automation and the accuracy of the test results.

FIG. 3 schematically shows a detailed implementation flowchart of step S220 according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, referring to FIG. 3, step S220 of constructing fault simulation information based on the perceived topology information includes the following steps: S310 and S320.

In step S310, the target object on which fault simulation is to be performed is determined according to the perceived topology information and a preset fault type.

According to an embodiment of the present disclosure, the preset fault type includes at least one of the following: instance downtime, CPU full load, memory full load, disk full, network packet loss, network delay, process blocking, dependent service unavailable, and dependent service delay. The preset fault types can be configured and customized; users can set the fault types through the visual interface of the performance evaluation application.

In step S320, fault simulation content corresponding to the preset fault type is constructed for the target object.

For fault types such as instance downtime, CPU full load, memory full load, disk full, network packet loss, network delay, and process blocking, the perceived topology information corresponding to resource scheduling is required, because the objects for fault simulation and for probing the monitoring indicators must be determined in advance. For example, a pod contains one or more containers (each container runs one microservice), and the pod acts as a process running in the cluster. Simulating a CPU fault or a memory fault for a given pod requires the perceived allocation topology relationship and relative priorities of resource scheduling: the object whose fault is to be simulated is located first, and the fault simulation is then performed. That is, the previously perceived topological relationships are used to determine the object at which the simulated fault is aimed (for example, which specific object is to undergo each kind of fault simulation).

For fault types such as dependent service unavailable and dependent service delay, the perceived topology information corresponding to service invocation is required, because the dependent services for which unavailability or delay is to be simulated must be determined in advance.
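The correspondence between fault types and the kind of perceived topology information they rely on can be summarized as a lookup. This is a minimal sketch; the string labels and the names `RESOURCE_FAULTS`, `DEPENDENCY_FAULTS`, and `required_topology` are hypothetical.

```python
# Fault types that need the resource scheduling topology.
RESOURCE_FAULTS = {
    "instance_down", "cpu_full", "memory_full", "disk_full",
    "network_packet_loss", "network_delay", "process_blocked",
}
# Fault types that need the service invocation topology.
DEPENDENCY_FAULTS = {"dependency_unavailable", "dependency_delay"}


def required_topology(fault_type):
    """Return which perceived topology is used to locate the target object."""
    if fault_type in RESOURCE_FAULTS:
        return "resource_scheduling"
    if fault_type in DEPENDENCY_FAULTS:
        return "service_invocation"
    raise ValueError(f"unknown fault type: {fault_type}")


print(required_topology("cpu_full"))          # resource_scheduling
print(required_topology("dependency_delay"))  # service_invocation
```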

In the embodiment including steps S310 and S320, at least one topological relationship among the resource scheduling and the service invocation of the microservices in the microservice system is perceived and used, so the target object for fault simulation can be located relatively objectively and accurately and the corresponding fault scenario can be simulated. This effectively improves how closely the simulated fault matches a real fault scenario, so the real-time performance data of the preset monitoring indicators during fault simulation are also closer to the system's reaction in a real fault scenario, which improves the accuracy of the stability evaluation of the microservice system.

FIG. 4A schematically shows the implementation process of steps S310 and S320 according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, referring to FIG. 4A, the perceived topology information includes: node information of first microservice nodes that have call dependency relationships, and the call dependency relationships between the first microservice nodes. The first microservice nodes may be some or all of the microservices in the microservice system; a first microservice node is a node that is called by other services or a node that calls other services.

In step S310, determining the target object on which fault simulation is to be performed according to the perceived topology information and the preset fault type includes the following step S310a: when the preset fault type is a first fault type, determining a first target microservice node for fault simulation according to the call dependency relationships and the node information of the first microservice nodes. The first fault type includes at least one of the following: dependent service unavailable, dependent service delay.

In step S320, constructing fault simulation content corresponding to the preset fault type for the target object includes the following step S320a: constructing, for the first target microservice node, the fault simulation content corresponding to the first fault type.
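Step S310a can be illustrated with a small call graph. The sketch below assumes the call dependency relationships are available as `(caller, callee)` pairs and treats the direct dependencies of a chosen service as the first target microservice nodes; that selection rule, the function names, and the service names are assumptions, since the disclosure only states which inputs are used.

```python
from collections import defaultdict


def build_call_graph(calls):
    """Build a caller -> set of callees mapping from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
    return graph


def dependency_targets(calls, service):
    """Return the services that `service` calls directly, sorted by name."""
    return sorted(build_call_graph(calls).get(service, set()))


calls = [
    ("order", "payment"),
    ("order", "inventory"),
    ("payment", "bank-gateway"),
]
print(dependency_targets(calls, "order"))  # ['inventory', 'payment']
```

Each returned node could then be paired with fault simulation content for the first fault type, for example marking the payment service as unavailable while the order service is observed.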

FIG. 4B schematically shows the implementation process of steps S310 and S320 according to another embodiment of the present disclosure.

According to another embodiment of the present disclosure, referring to FIG. 4B, the perceived topology information includes: node information of second microservice nodes deployed in the microservice system, and the allocation topology relationship and priority information of the second microservice nodes for resource scheduling. The second microservice nodes may be some or all of the microservices in the microservice system; a second microservice node is a running node to which resources have been allocated, or a node waiting in a resource queue.

In step S310, determining the target object on which fault simulation is to be performed according to the perceived topology information and the preset fault type includes the following step S310b-1: when the preset fault type is a second fault type, determining a second target microservice node for fault simulation according to the node information of the second microservice nodes and the priority information; or includes the following step S310b-2: when the preset fault type is the second fault type, determining the second target microservice node for fault simulation according to the node information of the second microservice nodes, the priority information, and the allocation topology relationship. The second fault type includes at least one of the following: instance downtime, CPU full load, memory full load, disk full, network packet loss, network delay, and process blocking.

In step S320, constructing fault simulation content corresponding to the preset fault type for the target object includes the following step S320b: constructing, for the second target microservice node, the fault simulation content corresponding to the second fault type.

In some implementation scenarios, the resource scheduling process in the microservice system changes dynamically, and the scenario of the previously perceived allocation topology relationship (for example, the T1~T2 period of CPU1 is allocated to microservice 1, during which microservice 3 is in a scheduling waiting state; the T1~T3 period of CPU2 is allocated to microservice 2; the T2~T4 period of CPU1 is allocated to microservice 3) differs from the scenario faced when the fault simulation is actually performed (for example, a T5 period later than both T4 and T3). In that case, only the priority information is used as the reference factor for resource scheduling allocation. That is, the resource scheduling allocation result at the time of the current fault simulation is determined according to the priority information for resource scheduling among the second microservice nodes and the node information of the second microservice nodes, and the second target microservice node for fault simulation is determined according to that allocation result.

In other implementation scenarios, the scenario faced when the fault simulation is performed is exactly the scenario to which the constructed allocation topology relationship corresponds, so the allocation topology relationship and the priorities can be used together as reference factors for resource scheduling allocation. That is, the resource scheduling allocation result at the time of the current fault simulation is determined according to the allocation topology relationship, the priority information for resource scheduling among the second microservice nodes, and the node information of the second microservice nodes, and the second target microservice node for fault simulation is determined according to that allocation result.
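The two implementation scenarios (steps S310b-1 and S310b-2) differ only in whether the previously perceived allocation topology is still applicable. A minimal Python sketch follows. It assumes priorities are integers where a larger value means a higher priority, that the allocation topology is a mapping from node name to the resource it holds, and that the highest-priority eligible node is taken as the second target microservice node; none of these choices is specified in the disclosure.

```python
def pick_second_target(nodes, priorities, allocation=None, scene_is_current=False):
    """Choose a target node for a resource-type fault.

    nodes:            names of the second microservice nodes
    priorities:       dict of node name -> int (larger means higher priority)
    allocation:       dict of node name -> resource, the perceived allocation topology
    scene_is_current: True if the perceived allocation still describes the
                      scenario at simulation time (step S310b-2)
    """
    if not nodes:
        raise ValueError("no candidate nodes")
    candidates = list(nodes)
    if scene_is_current and allocation:
        # Prefer nodes that currently hold a resource according to the topology.
        allocated = [n for n in candidates if n in allocation]
        if allocated:
            candidates = allocated
    # Otherwise (step S310b-1) only the priority information is used.
    return max(candidates, key=lambda n: priorities[n])


nodes = ["ms1", "ms2", "ms3"]
priorities = {"ms1": 3, "ms2": 2, "ms3": 1}
print(pick_second_target(nodes, priorities))  # ms1
print(pick_second_target(nodes, priorities,
                         allocation={"ms2": "CPU2", "ms3": "CPU1"},
                         scene_is_current=True))  # ms2
```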

FIG. 5 schematically shows a flowchart of a method for evaluating microservice performance according to another embodiment of the present disclosure.

According to an embodiment of the present disclosure, in addition to steps S210 to S240, the evaluation method further includes a process of constructing the preset monitoring indicators, which includes the following steps: S510, S520, S530, and S540. To simplify the illustration, only steps S510 to S540 are shown in FIG. 5. Steps S510 to S540 are executed before step S240.

In step S510, first historical state data of each microservice in the microservice system in the steady operating state, and second historical state data corresponding to the occurrence of anomalies, are obtained.

The first historical state data are state data from the real operation of each microservice, and the second historical state data are the state data of each microservice when anomalies occurred in real fault scenarios.

In step S520, candidate monitoring indicators that represent the operating condition of the microservices are determined according to the first historical state data and the second historical state data.

The candidate monitoring indicators serve as the indicator options corresponding to each microservice in an indicator configuration interface.

In step S530, the user's selection information for the indicator options and the user's custom indicator information are received from the indicator configuration interface.

In some embodiments, to meet personalized performance evaluation needs or special testing needs, a custom indicator configuration function is provided in the indicator configuration interface. Users can not only select from the existing indicator options but also use the custom indicator configuration function to set custom indicator information according to their business performance evaluation needs.

In step S540, the preset monitoring indicators corresponding to each microservice are generated according to the selection information and the custom indicator information.

In the embodiment including steps S510 to S540, most of the indicator options can be generated by the performance evaluation application through intelligent analysis, which reduces the time and labor needed to set indicators manually. User configuration of custom indicators is also supported, which makes the construction of preset monitoring indicators more intelligent and flexible, helps meet the needs of various performance testing and evaluation scenarios, and gives the scheme wide applicability.
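Steps S510 to S540 can be sketched as two functions: one proposes candidate indicators from historical data, the other merges the user's selection with custom indicators. The disclosure does not say how candidates are derived, so the rule used here (keep a metric if its mean differs by at least a relative threshold between steady-state and abnormal history) is purely an assumption, as are the function names, the `min_relative_change` parameter, and the metric names.

```python
from statistics import mean


def candidate_indicators(steady, abnormal, min_relative_change=0.2):
    """Propose metrics whose mean shifts noticeably when anomalies occur.

    steady / abnormal: dicts of metric name -> list of historical values.
    """
    candidates = []
    for name, steady_values in steady.items():
        if name not in abnormal:
            continue
        base = mean(steady_values)
        if base == 0:
            continue  # relative change is undefined for a zero baseline
        change = abs(mean(abnormal[name]) - base) / abs(base)
        if change >= min_relative_change:
            candidates.append(name)
    return candidates


def build_preset_indicators(candidates, selected, custom):
    """Keep the candidates the user selected, then append custom indicators."""
    chosen = [name for name in candidates if name in selected]
    return chosen + [name for name in custom if name not in chosen]


steady = {"order_success_rate": [0.99, 0.98], "log_lines_per_s": [100.0, 102.0]}
abnormal = {"order_success_rate": [0.40, 0.50], "log_lines_per_s": [101.0, 99.0]}
options = candidate_indicators(steady, abnormal)
print(options)  # ['order_success_rate']
print(build_preset_indicators(options, {"order_success_rate"}, ["payment_redirect_ok"]))
# ['order_success_rate', 'payment_redirect_ok']
```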

A second exemplary embodiment of the present disclosure provides a device for evaluating microservice performance.

FIG. 6 schematically shows a structural block diagram of a device for evaluating microservice performance according to an embodiment of the present disclosure.

Referring to FIG. 6, the microservice performance evaluation device 600 provided by the embodiment of the present disclosure includes: a monitoring module 601, a construction module 602, a fault simulation module 603, and an evaluation module 604.

The monitoring module 601 is configured to monitor the running process of the microservice system to obtain perceived topology information; the perceived topology information covers at least one topological relationship among the resource scheduling and the service invocation of the microservices in the microservice system.

The construction module 602 is configured to construct fault simulation information according to the perceived topology information.

The fault simulation module 603 is configured to perform fault scenario simulation in the microservice system according to the fault simulation information.

The evaluation module 604 is configured to analyze the real-time performance data of the preset monitoring indicators before, during, and after the fault scenario simulation to obtain the stability evaluation result of the microservice system.
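The data flow through modules 601 to 604 can be pictured as a simple chain. In the sketch below each module is stood in for by a plain callable; the class name `EvaluationDevice`, its attribute names, and the placeholder lambdas are assumptions for illustration and say nothing about how the modules are actually implemented.

```python
from typing import Any, Callable


class EvaluationDevice:
    """Chains four callables standing in for modules 601 to 604."""

    def __init__(
        self,
        monitor: Callable[[], Any],        # module 601: returns perceived topology
        builder: Callable[[Any], Any],     # module 602: topology -> fault simulation info
        simulator: Callable[[Any], Any],   # module 603: fault info -> performance data
        evaluator: Callable[[Any], Any],   # module 604: performance data -> evaluation
    ) -> None:
        self.monitor = monitor
        self.builder = builder
        self.simulator = simulator
        self.evaluator = evaluator

    def run(self) -> Any:
        topology = self.monitor()
        fault_info = self.builder(topology)
        performance_data = self.simulator(fault_info)
        return self.evaluator(performance_data)


# Placeholder modules that only pass small values along the chain.
device = EvaluationDevice(
    monitor=lambda: {"order": ["payment"]},
    builder=lambda topo: [("payment", "dependency_unavailable")],
    simulator=lambda faults: {"damage": 0.5, "faults": faults},
    evaluator=lambda data: 100.0 * (1.0 - data["damage"]),
)
print(device.run())  # 50.0
```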

According to an embodiment of the present disclosure, the evaluation device further includes a monitoring indicator construction module.

The monitoring indicator construction module is configured to: obtain first historical state data of each microservice in the microservice system in the steady operating state and second historical state data corresponding to the occurrence of anomalies; determine, according to the first historical state data and the second historical state data, candidate monitoring indicators that represent the operating condition of the microservices, the candidate monitoring indicators serving as the indicator options corresponding to each microservice in an indicator configuration interface; receive the user's selection information for the indicator options and the user's custom indicator information from the indicator configuration interface; and generate the preset monitoring indicators corresponding to each microservice according to the selection information and the custom indicator information.

For further details and beneficial effects of this embodiment, refer to the detailed description of the first embodiment; they are not repeated here.

Any number of the functional modules included in the evaluation device 600 may be combined and implemented in one module, or any one of the modules may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. At least one of the functional modules included in the evaluation device 600 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system in a package, or an application specific integrated circuit (ASIC); or may be implemented in hardware or firmware by any other reasonable way of integrating or packaging circuits; or may be implemented in any one of software, hardware, and firmware, or in an appropriate combination of any of them. Alternatively, at least one of the functional modules included in the evaluation device 600 may be implemented at least partially as a computer program module that performs the corresponding function when run.

本公开的第三个示例性实施例提供了一种电子设备。A third exemplary embodiment of the present disclosure provides an electronic device.

图7示意性示出了本公开实施例提供的电子设备的结构框图。FIG. 7 schematically shows a structural block diagram of an electronic device provided by an embodiment of the present disclosure.

参照图7所示，本公开实施例提供的电子设备700包括处理器701、通信接口702、存储器703和通信总线704，其中，处理器701、通信接口702和存储器703通过通信总线704完成相互间的通信；存储器703，用于存放计算机程序；处理器701，用于执行存储器上所存放的程序时，实现如上所述的微服务性能的评价方法。Referring to FIG. 7 , an electronic device 700 provided by an embodiment of the present disclosure includes a processor 701 , a communication interface 702 , a memory 703 , and a communication bus 704 . The processor 701 , the communication interface 702 , and the memory 703 complete interactions with each other through the communication bus 704 . communication; the memory 703 is used to store computer programs; the processor 701 is used to implement the above-mentioned evaluation method of microservice performance when executing the program stored in the memory.

本公开的第四个示例性实施例还提供了一种计算机可读存储介质。上述计算机可读存储介质上存储有计算机程序，上述计算机程序被处理器执行时实现如上所述的微服务性能的评价方法。A fourth exemplary embodiment of the present disclosure also provides a computer-readable storage medium. The computer program is stored on the computer-readable storage medium. When the computer program is executed by the processor, the evaluation method of microservice performance as described above is implemented.

The computer-readable storage medium may be included in the device or apparatus described in the above embodiments, or it may exist independently without being assembled into that device or apparatus. The computer-readable storage medium carries one or more programs, and when the one or more programs are executed, the method according to the embodiments of the present disclosure is implemented.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

It should be noted that, in the technical solutions provided by the embodiments of the present disclosure, the acquisition, collection, updating, analysis, processing, use, transmission, and storage of users' personal information all comply with relevant laws and regulations, serve legitimate purposes, and do not violate public order and good customs. Necessary measures are taken to prevent unlawful access to users' personal information data and to safeguard the security of users' personal information, network security, and national security.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprises", "includes", and any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprises a..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above descriptions are only specific embodiments of the present disclosure, provided to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims

1. An evaluation method for microservice performance, characterized by comprising:

monitoring the running process of a microservice system to obtain perceived topology information, wherein the perceived topology information covers at least one of the following topological relationships of the microservices in the microservice system: resource scheduling and service invocation;

constructing fault simulation information according to the perceived topology information;

performing fault scenario simulation in the microservice system according to the fault simulation information; and

analyzing real-time performance data of preset monitoring indicators before, during, and after the fault scenario simulation to obtain a stability evaluation result of the microservice system.

2. The evaluation method according to claim 1, characterized in that constructing the fault simulation information according to the perceived topology information comprises:

determining, according to the perceived topology information and a preset fault type, a target object on which fault simulation is to be performed; and

constructing, for the target object, fault simulation content corresponding to the preset fault type;

wherein the fault simulation information includes the target object and the fault simulation content.

3. The evaluation method according to claim 2, wherein the preset fault type includes at least one of the following fault types: instance downtime, CPU full load, memory full load, disk full, network packet loss, network delay, process blocking, dependent service unavailability, and dependent service delay.
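As an illustration only, the pairing of a target object with fault simulation content in claims 2 and 3 could be represented as follows. This is a minimal sketch: the names `FaultType`, `FaultSimulationInfo`, and `build_fault_simulation_info`, and the `duration_s` and `delay_ms` parameters, are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum


class FaultType(Enum):
    """The nine preset fault types listed in claim 3."""
    INSTANCE_DOWNTIME = "instance_downtime"
    CPU_FULL_LOAD = "cpu_full_load"
    MEMORY_FULL_LOAD = "memory_full_load"
    DISK_FULL = "disk_full"
    NETWORK_PACKET_LOSS = "network_packet_loss"
    NETWORK_DELAY = "network_delay"
    PROCESS_BLOCKING = "process_blocking"
    DEPENDENT_SERVICE_UNAVAILABLE = "dependent_service_unavailable"
    DEPENDENT_SERVICE_DELAY = "dependent_service_delay"


@dataclass
class FaultSimulationInfo:
    """Pairs a target object with its fault simulation content (claim 2)."""
    target: str
    fault_type: FaultType
    content: dict = field(default_factory=dict)


def build_fault_simulation_info(target, fault_type, duration_s=60, **params):
    # `duration_s` and `params` are hypothetical knobs, not taken from the claims.
    content = {"duration_s": duration_s, **params}
    return FaultSimulationInfo(target=target, fault_type=fault_type, content=content)


info = build_fault_simulation_info("order-service", FaultType.NETWORK_DELAY, delay_ms=200)
```

Keeping the fault type as an enumeration makes it straightforward to route each type to a different target-selection rule, which is what claims 4 and 5 distinguish.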

4. The evaluation method according to claim 2, wherein the perceived topology information includes: node information of first microservice nodes that have call dependency relationships, and the call dependency relationships between the first microservice nodes;

wherein determining, according to the perceived topology information and the preset fault type, the target object on which fault simulation is to be performed comprises: when the preset fault type is a first fault type, determining a first target microservice node for fault simulation according to the call dependency relationships and the node information of the first microservice nodes, wherein the first fault type includes at least one of the following: dependent service unavailability and dependent service delay; and

constructing, for the target object, the fault simulation content corresponding to the preset fault type comprises: constructing, for the first target microservice node, fault simulation content corresponding to the first fault type.
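The claim does not fix a rule for choosing the first target microservice node. One possible rule, shown here purely as an assumption, is to pick the node that the largest number of callers depend on, since a dependent-service fault there affects the most call paths. The function name and the edge-list representation are hypothetical.

```python
from collections import Counter


def pick_dependency_target(call_edges, node_info):
    """Return the callee that the most callers depend on, or None.

    call_edges: iterable of (caller, callee) pairs from the perceived topology.
    node_info:  dict mapping node name -> metadata; only known nodes are eligible.
    """
    in_degree = Counter(callee for _, callee in call_edges if callee in node_info)
    if not in_degree:
        return None
    # Highest in-degree wins; ties are broken by name so the result is deterministic.
    return min(in_degree, key=lambda n: (-in_degree[n], n))


edges = [("gateway", "order"), ("gateway", "user"), ("order", "user"), ("order", "stock")]
nodes = {"gateway": {}, "order": {}, "user": {}, "stock": {}}
target = pick_dependency_target(edges, nodes)
```

In this example `user` is called by both `gateway` and `order`, so it is selected as the target.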

5. The evaluation method according to claim 2, wherein the perceived topology information includes: node information of second microservice nodes deployed in the microservice system, and an allocation topological relationship and priority information of resource scheduling for the second microservice nodes;

wherein determining, according to the perceived topology information and the preset fault type, the target object on which fault simulation is to be performed comprises: when the preset fault type is a second fault type, determining a second target microservice node for fault simulation according to the node information of the second microservice nodes and the priority information; or determining the second target microservice node for fault simulation according to the node information of the second microservice nodes, the priority information, and the allocation topological relationship; wherein the second fault type includes at least one of the following: instance downtime, CPU full load, memory full load, disk full, network packet loss, network delay, and process blocking; and

constructing, for the target object, the fault simulation content corresponding to the preset fault type comprises: constructing, for the second target microservice node, fault simulation content corresponding to the second fault type.
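The two alternatives in this claim (priority only, or priority together with the allocation topology) can be sketched as a single function with an optional argument. The selection policy below, highest priority first and at most one target per host when allocation data is supplied, is an assumption made for illustration; the names and the `limit` parameter are hypothetical.

```python
def pick_priority_targets(node_info, priority, allocation=None, limit=2):
    """Select up to `limit` deployed nodes, highest priority first.

    node_info:  dict node name -> metadata (the deployed microservice nodes).
    priority:   dict node name -> int; a larger value means more important (assumed).
    allocation: optional dict node name -> host the node was scheduled onto.
    """
    ranked = sorted(node_info, key=lambda n: (-priority.get(n, 0), n))
    if allocation is None:
        return ranked[:limit]
    chosen, used_hosts = [], set()
    for node in ranked:
        if len(chosen) >= limit:
            break
        host = allocation.get(node)
        if host is not None and host in used_hosts:
            continue  # assumed policy: at most one injected fault per host
        chosen.append(node)
        if host is not None:
            used_hosts.add(host)
    return chosen


nodes = {"order": {}, "user": {}, "stock": {}}
prio = {"order": 3, "user": 2, "stock": 1}
hosts = {"order": "host-a", "user": "host-a", "stock": "host-b"}
by_priority = pick_priority_targets(nodes, prio)
by_priority_and_host = pick_priority_targets(nodes, prio, allocation=hosts)
```

With priority alone the two most important nodes are chosen; with the allocation topology, `user` is skipped because it shares a host with `order`.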

6. The evaluation method according to claim 1, characterized in that analyzing the real-time performance data of the preset monitoring indicators before, during, and after the fault scenario simulation to obtain the stability evaluation result of the microservice system comprises:

analyzing the change patterns of the real-time performance data to obtain real-time damage degree information describing how much each fault scenario disrupts the steady state of the microservice system during and after the fault scenario simulation;

determining a stability score for each fault scenario according to a preset assigned score for that fault scenario and the real-time damage degree information; and

generating a comprehensive stability score of the microservice system according to the stability score and a pre-assigned weight of each fault scenario, and using the comprehensive stability score as the stability evaluation result.
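The claim leaves the scoring formulas open. A minimal sketch under stated assumptions: the damage degree is taken as the largest relative deviation from the pre-fault baseline, clipped to [0, 1]; the per-scenario score is the assigned score reduced in proportion to the damage; and the comprehensive score is a weighted average. All function names and sample values are hypothetical.

```python
def damage_degree(baseline, observed):
    """Largest relative deviation of `observed` from the baseline mean, clipped to [0, 1]."""
    base = sum(baseline) / len(baseline)
    if base == 0:
        return 0.0 if all(v == 0 for v in observed) else 1.0
    worst = max(abs(v - base) / abs(base) for v in observed)
    return min(worst, 1.0)


def scenario_score(assigned_score, damage):
    # Assumed rule: the preset score is reduced in proportion to the damage degree.
    return assigned_score * (1.0 - damage)


def comprehensive_score(scores, weights):
    """Weighted average of the per-scenario stability scores."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


before = [100.0, 100.0]  # e.g. latency in ms sampled before the simulation
during_after = {"network_delay": [150.0, 110.0], "cpu_full_load": [120.0, 100.0]}
scores = {
    name: scenario_score(100.0, damage_degree(before, values))
    for name, values in during_after.items()
}
overall = comprehensive_score(scores, {"network_delay": 2.0, "cpu_full_load": 1.0})
```

Here the network delay scenario deviates by 50% and the CPU scenario by 20%, giving scenario scores of about 50 and 80 and a comprehensive score of about 60 once the weights 2 and 1 are applied.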

7. The evaluation method according to claim 1, further comprising:

obtaining first historical state data of each microservice in the microservice system when the microservice is in a steady running state, and second historical state data of the microservice when an abnormality occurs;

determining, according to the first historical state data and the second historical state data, candidate monitoring indicators used to represent the running state of the microservice, wherein the candidate monitoring indicators serve as the indicator options corresponding to each microservice in an indicator configuration interface;

receiving the user's selection information for the indicator options in the indicator configuration interface and the user's custom indicator information; and

generating the preset monitoring indicators corresponding to each microservice according to the selection information and the custom indicator information.
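One way to read this claim, offered only as an assumption, is that a metric becomes a candidate when its historical mean shifts noticeably between the steady and abnormal data, and that the final preset indicators are the user's picks among the candidates plus any custom indicators. The threshold `min_shift` and all names below are hypothetical.

```python
from statistics import mean


def candidate_indicators(steady, abnormal, min_shift=0.2):
    """Return metrics whose mean shifts by at least `min_shift` (relative) under anomaly.

    steady / abnormal: dict metric name -> list of historical samples.
    """
    candidates = []
    for metric in sorted(steady.keys() & abnormal.keys()):
        base = mean(steady[metric])
        shift = abs(mean(abnormal[metric]) - base)
        if base == 0:
            relative = float("inf") if shift > 0 else 0.0
        else:
            relative = shift / abs(base)
        if relative >= min_shift:
            candidates.append(metric)
    return candidates


def preset_indicators(candidates, selected, custom):
    """Keep the user's picks among the candidates, then append custom indicators."""
    chosen = [m for m in candidates if m in selected]
    return chosen + [m for m in custom if m not in chosen]


steady = {"latency_ms": [100, 102], "error_rate": [0.01, 0.01], "threads": [50, 50]}
abnormal = {"latency_ms": [300, 280], "error_rate": [0.2, 0.3], "threads": [51, 50]}
options = candidate_indicators(steady, abnormal)
indicators = preset_indicators(options, selected={"latency_ms"}, custom=["queue_depth"])
```

In this example `threads` barely changes under anomaly and is not offered as an option, while `latency_ms` and `error_rate` are; the user keeps `latency_ms` and adds the custom `queue_depth`.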

8. An evaluation device for microservice performance, characterized by comprising:

a monitoring module, configured to monitor the running process of a microservice system to obtain perceived topology information, wherein the perceived topology information covers at least one of the following topological relationships of the microservices in the microservice system: resource scheduling and service invocation;

a construction module, configured to construct fault simulation information according to the perceived topology information;

a fault simulation module, configured to perform fault scenario simulation in the microservice system according to the fault simulation information; and

an evaluation module, configured to analyze real-time performance data of preset monitoring indicators before, during, and after the fault scenario simulation to obtain a stability evaluation result of the microservice system.

9. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program; and

the processor is configured to implement the method according to any one of claims 1-7 when executing the program stored in the memory.

10. A computer-readable storage medium with a computer program stored thereon, characterized in that when the computer program is executed by a processor, the method according to any one of claims 1-7 is implemented.