WO2023206875A1 - Indicator distance-based indicator deduplication method and apparatus - Google Patents

Indicator distance-based indicator deduplication method and apparatus

Info

Publication number
WO2023206875A1
Authority
WO
WIPO (PCT)
Prior art keywords
indicator
indicators
benchmark
distance
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/114362
Other languages
French (fr)
Chinese (zh)
Inventor
蒋乾
李扬
韩卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kyligence Information Technology Co Ltd
Original Assignee
Shanghai Kyligence Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kyligence Information Technology Co Ltd
Publication of WO2023206875A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket

Definitions

  • The present application relates to the field of Internet technology and, specifically, to an indicator deduplication method, apparatus, computer device and storage medium based on indicator distance.
  • Indicator development work is usually carried out by different teams in the data department and the business departments. Because there is no unified indicator development specification, a large number of duplicate indicators are produced that have the same meaning but different names. On the one hand, duplicate indicators easily lead to inconsistent indicator definitions, so that the people who use the indicators do not trust them; on the other hand, duplicate indicators take up a large amount of storage space.
  • Embodiments of the present invention provide an indicator deduplication method, apparatus, computer device and storage medium based on indicator distance, to solve the problem in the related art that duplicate indicators easily lead to inconsistent indicator definitions and occupy a large amount of storage space.
  • A first aspect of the embodiments of the present invention provides an indicator deduplication method based on indicator distance, including:
  • If the indicator distance is smaller than the indicator repetition threshold in multiple consecutive time periods, the two indicators are synonymous; otherwise they are not synonymous.
  • The method further includes:
  • A possible implementation of the first aspect includes:
  • Determining whether to complete or delete indicator data based on the start and end time of the benchmark indicator includes:
  • If the start and end time range of the benchmark indicator is greater than that of the indicator to be merged, the data of the indicator to be merged is automatically completed;
  • If the start and end time range of the benchmark indicator is smaller than that of the indicator to be merged, part of the data of the indicator to be merged is automatically deleted.
  • Comparing, in pairs, indicators with the same time dimension in the incremental data and calculating the indicator distance between the two indicators includes:
  • The indicator distance of the two indicators is calculated.
  • A second aspect of the embodiments of the present invention provides an indicator deduplication apparatus based on indicator distance, including:
  • An incremental data selection module, used to select incremental data of indicators within a preset time;
  • An indicator distance calculation module, used to compare, in pairs, indicators with the same time dimension in the incremental data and calculate the indicator distance between the two indicators;
  • A threshold judgment module, used to determine that two indicators are synonymous if their indicator distance is smaller than the indicator repetition threshold in multiple consecutive time periods, and that they are not synonymous otherwise.
  • The apparatus further includes:
  • An indicator synthesis module, used to generate indicator merging suggestions based on the indicator's dimension information, start and end time, owner, department, and information on the applications that reference the indicator.
  • The indicator synthesis module includes:
  • A selection unit, used to select a benchmark indicator for indicator merging, where the benchmark indicator is the indicator retained when two indicators are merged;
  • A judgment unit, used to judge whether to complete or delete indicator data based on the start and end time of the benchmark indicator;
  • A replacement unit, used to replace references to non-benchmark indicators in applications with the benchmark indicator;
  • A deletion unit, used to delete non-benchmark indicators.
  • A third aspect of the embodiments of the present invention provides a computer device, including a memory and a processor.
  • The memory stores a computer program that can be run on the processor.
  • When the processor executes the computer program, the steps in each of the above method embodiments are implemented.
  • A fourth aspect of the embodiments of the present invention provides a readable storage medium.
  • A computer program is stored in the readable storage medium. When the computer program is executed by a processor, it implements the steps of the method of the first aspect of the present invention and of the various possible designs of the first aspect.
  • The indicator deduplication method, apparatus, computer device and storage medium based on indicator distance provided by the present invention select incremental data of indicators within a preset time; compare, in pairs, indicators with the same time dimension in the incremental data and calculate the indicator distance between the two indicators; and, if the indicator distance is smaller than the indicator repetition threshold in multiple consecutive time periods, treat the two indicators as synonymous, and otherwise as not synonymous.
  • The invention can help indicator managers eliminate duplicate indicators, unify indicator definitions, and save indicator storage space.
  • Figure 1 is a flow chart of a first implementation of the indicator deduplication method based on indicator distance.
  • Figure 2 is a structural diagram of a first implementation of the indicator deduplication apparatus based on indicator distance.
  • The magnitude of the sequence number of each process does not imply its order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present invention.
  • “Plurality” means two or more.
  • “And/or” merely describes an association between related objects and indicates that three relationships can exist.
  • For example, “A and/or B” can mean: A alone exists, A and B exist simultaneously, or B alone exists.
  • The character “/” generally indicates that the related objects are in an “or” relationship.
  • “Includes A, B and C” and “includes A, B, C” mean that all three of A, B and C are included;
  • “includes A, B or C” means that one of A, B and C is included.
  • “Including A, B and/or C” means including any one, any two, or all three of A, B and C.
  • “B corresponding to A” means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B can also be determined from A and/or other information.
  • A match between A and B means that the similarity between A and B is greater than or equal to a preset threshold.
  • The present invention provides an indicator deduplication method based on indicator distance.
  • The flow chart is shown in Figure 1, and the method includes:
  • Step S110: Select incremental data of the indicators within a preset time.
  • The length of the preset time depends on the time dimension of the indicator. If the time dimension of the indicator is day-level granularity, the newly added indicator data of the previous day is generated every day; the indicator period is then one day, and the preset time is n indicator periods, that is, the last n days.
  • If the time dimension of the indicator is week-level granularity, the newly added indicator data of the previous week is generated every week; the indicator period is then one week, and the preset time is n indicator periods, that is, the last n weeks.
  • Step S120: Compare, in pairs, indicators with the same time dimension in the incremental data, and calculate the indicator distance between the two indicators.
  • The newly added indicator data includes indicators in different time dimensions: indicator data 1, which is added once a day; indicator data 2, which is added once a week; and indicator data 3, which is added once at a specific time according to a manual setting.
  • Therefore, before indicators are compared in pairs, the indicator data must first be grouped by time dimension. Any two indicators belonging to the same time dimension are then compared, and their indicator distance is calculated.
  • The indicator distance is a distance calculated from the differences between the data of the two indicators over n indicator periods.
  • The indicator distance is calculated as follows:
  • Step 1: Calculate the indicator difference between the two indicators in each indicator period i, where i = 1, 2, ..., n, as well as the mean and standard deviation of the indicator differences.
  • Step 2: Calculate the indicator distance of the two indicators based on the indicator differences and their mean and standard deviation.
  • Step S130: If the indicator distance is smaller than the indicator repetition threshold in multiple consecutive time periods, the two indicators are synonymous; otherwise, they are not synonymous.
  • The time period is determined by the time dimension of the two indicators used to calculate the indicator distance. That is, if the time dimension of the two indicators is the day, the indicator distance must be less than the repetition threshold on several consecutive days for the two indicators to be considered synonymous; if the time dimension of the two indicators is the week, the indicator distance must be less than the repetition threshold in several consecutive weeks.
  • The indicator repetition threshold k measures whether two indicators are duplicates. When dist(Indicator A, Indicator B) < k, the two indicators are synonymous; when dist(Indicator A, Indicator B) ≥ k, the two indicators are not synonymous.
  • The method further includes:
  • The present invention also proposes automatically generating indicator merging suggestions based on the dimension information of the indicator, the start and end time of the indicator data, the owner of the indicator, the department to which the indicator belongs, and the applications (such as BI reports) that reference the indicator. These suggestions serve as the basis on which the indicator administrator decides whether to merge indicators.
  • The method further includes:
  • Select the benchmark indicator for indicator merging, that is, the indicator retained when the two indicators are merged; determine whether to complete or delete indicator data based on the start and end time of the benchmark indicator; replace references to non-benchmark indicators in applications with the benchmark indicator; delete the non-benchmark indicators.
  • Determining whether to complete or delete indicator data based on the start and end time of the benchmark indicator includes: if the start and end time range of the benchmark indicator is greater than that of the indicator to be merged, automatically completing the data of the indicator to be merged; if the start and end time range of the benchmark indicator is smaller than that of the indicator to be merged, automatically deleting part of the data of the indicator to be merged.
  • The indicator merging suggestions fall mainly into the following three points:
  • The so-called benchmark indicator is the indicator that is retained when the two indicators are merged.
  • Indicator A and indicator B have the same meaning but different names.
  • The indicator data is completed or automatically deleted according to the start and end time of the benchmark indicator. That is, if the start and end time range of the benchmark indicator is greater than that of the indicator to be merged, the data of the indicator to be merged is automatically completed; if the start and end time range of the benchmark indicator is smaller than that of the indicator to be merged, part of the data of the indicator to be merged is automatically deleted. For example, suppose the data of indicator A runs from January to December and the data of indicator B runs from May to December. If indicator B is selected as the benchmark indicator, with a start and end time of May to December, the data of indicator A in the January-to-April range is automatically deleted; if indicator A is selected as the benchmark indicator, with a start and end time of January to December, the data of indicator B in the January-to-April range is automatically completed.
  • The method further includes: after duplicate indicators are merged, generating the difference between the storage space used after the merge and the storage space used before the merge, and viewing the space saved by the merge through a visual interface, specifically as follows:
  • The indicator deduplication method based on indicator distance selects incremental data of indicators within a preset time; compares, in pairs, indicators with the same time dimension in the incremental data and calculates the indicator distance between the two indicators; and, if the indicator distance is smaller than the indicator repetition threshold in multiple consecutive time periods, treats the two indicators as synonymous, and otherwise as not synonymous.
  • The invention can help indicator managers eliminate duplicate indicators, unify indicator definitions, and save indicator storage space.
  • Embodiments of the present invention also provide an indicator deduplication apparatus based on indicator distance, as shown in Figure 2, including:
  • An incremental data selection module, used to select incremental data of indicators within a preset time;
  • An indicator distance calculation module, used to compare, in pairs, indicators with the same time dimension in the incremental data and calculate the indicator distance between the two indicators;
  • A threshold judgment module, used to determine that two indicators are synonymous if their indicator distance is smaller than the indicator repetition threshold in multiple consecutive time periods, and that they are not synonymous otherwise.
  • The apparatus further includes:
  • An indicator synthesis module, used to generate indicator merging suggestions based on the indicator's dimension information, start and end time, owner, department, and information on the applications that reference the indicator.
  • The indicator synthesis module includes:
  • A selection unit, used to select the benchmark indicator for indicator merging, where the benchmark indicator is the indicator retained when the two indicators are merged;
  • A judgment unit, used to judge whether to complete or delete indicator data based on the start and end time of the benchmark indicator;
  • A replacement unit, used to replace references to non-benchmark indicators in applications with the benchmark indicator;
  • A deletion unit, used to delete non-benchmark indicators.
  • The indicator deduplication apparatus based on indicator distance selects incremental data of indicators within a preset time; compares, in pairs, indicators with the same time dimension in the incremental data and calculates the indicator distance between the two indicators; and, if the indicator distance is smaller than the indicator repetition threshold in multiple consecutive time periods, treats the two indicators as synonymous, and otherwise as not synonymous.
  • The invention can help indicator managers eliminate duplicate indicators, unify indicator definitions, and save indicator storage space.
  • The readable storage medium may be a computer storage medium or a communication medium.
  • Communication media include any medium that facilitates transfer of a computer program from one place to another.
  • Computer storage media can be any available media that can be accessed by a general-purpose or special-purpose computer.
  • A readable storage medium is coupled to a processor such that the processor can read information from the readable storage medium and write information to the readable storage medium.
  • The readable storage medium may also be an integral part of the processor.
  • The processor and the readable storage medium may be located in an application-specific integrated circuit (ASIC). Additionally, the ASIC can be located in the user equipment.
  • The processor and the readable storage medium may also exist as discrete components in the communication device.
  • Readable storage media can be read-only memory (ROM), random-access memory (RAM), CD-ROM, tapes, floppy disks, optical data storage devices, etc.
  • The present invention also provides a program product.
  • The program product includes execution instructions, and the execution instructions are stored in a readable storage medium.
  • At least one processor of the device can read the execution instructions from the readable storage medium, and the at least one processor executes the execution instructions to cause the device to implement the methods provided by the various embodiments described above.
  • The processor may be a central processing unit (CPU) or another general-purpose processor, a digital signal processor (DSP), etc.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
  • The steps of the method disclosed in the present invention can be directly implemented by a hardware processor, or executed by a combination of hardware and software modules in the processor.

Abstract

Disclosed in the embodiments of the present invention are an indicator distance-based indicator deduplication method and apparatus. Incremental data of indicators within a preset time is selected; pairwise comparison is performed on the indicators having the same time dimension in the incremental data, and the indicator distance of the two indicators is calculated; if the indicator distances in multiple consecutive time periods are all smaller than an indicator repetition threshold, the two indicators are determined to be synonymous; otherwise, they are non-synonymous. The present invention can help indicator management personnel eliminate duplicate indicators, unify the indicator standard, and save indicator storage space.

Description

Indicator deduplication method and apparatus based on indicator distance

Technical field

The present application relates to the field of Internet technology and, specifically, to an indicator deduplication method, apparatus, computer device and storage medium based on indicator distance.

Background

During the construction of an indicator system, in order to balance the conflict between the urgent demand for usable indicators and the slow pace of indicator development, indicator development work is usually carried out by different teams in the data department and the business departments. Because there is no unified indicator development specification, a large number of duplicate indicators are produced that have the same meaning but different names. On the one hand, duplicate indicators easily lead to inconsistent indicator definitions, so that the people who use the indicators do not trust them; on the other hand, duplicate indicators take up a large amount of storage space.

For the problem in the related art that duplicate indicators easily lead to inconsistent indicator definitions and occupy a large amount of storage space, no effective solution has yet been proposed.

Summary of the invention

Embodiments of the present invention provide an indicator deduplication method, apparatus, computer device and storage medium based on indicator distance, to solve the problem in the related art that duplicate indicators easily lead to inconsistent indicator definitions and occupy a large amount of storage space.

To achieve the above object, a first aspect of the embodiments of the present invention provides an indicator deduplication method based on indicator distance, including:

selecting incremental data of indicators within a preset time;

comparing, in pairs, indicators with the same time dimension in the incremental data, and calculating the indicator distance between the two indicators;

if the indicator distance is smaller than the indicator repetition threshold in each of multiple consecutive time periods, the two indicators are synonymous; otherwise, they are not synonymous.
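
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the record layout (indicator name mapped to a time dimension and its recent values), the function names, the threshold parameter `k` and the number of consecutive periods `m` are all hypothetical, and the per-period distances are taken as already computed.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: indicator name -> (time dimension, values of the last n periods)
Indicators = Dict[str, Tuple[str, Sequence[float]]]


def group_by_time_dimension(indicators: Indicators) -> Dict[str, List[str]]:
    """Group indicator names by time dimension (e.g. "day", "week")."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, (dimension, _values) in indicators.items():
        groups[dimension].append(name)
    return {dimension: sorted(names) for dimension, names in groups.items()}


def candidate_pairs(indicators: Indicators) -> List[Tuple[str, str]]:
    """Enumerate every pair of indicators that share a time dimension."""
    pairs: List[Tuple[str, str]] = []
    for names in group_by_time_dimension(indicators).values():
        pairs.extend(combinations(names, 2))
    return pairs


def is_synonymous(distances: Sequence[float], k: float, m: int) -> bool:
    """True if the distance stayed below k in each of the last m periods."""
    if len(distances) < m:
        return False  # not enough history to decide
    return all(d < k for d in distances[-m:])
```

Only indicators in the same time-dimension group are ever paired, so a daily indicator is never compared against a weekly one.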

Optionally, in a possible implementation of the first aspect, the method further includes:

generating indicator merging suggestions based on the indicator's dimension information, start and end time, owner, department, and information on the applications that reference the indicator.

Optionally, a possible implementation of the first aspect includes:

selecting a benchmark indicator for indicator merging, where the benchmark indicator is the indicator retained when two indicators are merged;

determining, based on the start and end time of the benchmark indicator, whether to complete or delete indicator data;

replacing references to non-benchmark indicators in applications with the benchmark indicator;

deleting the non-benchmark indicators.
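
The last two merge steps above (repointing applications and deleting the non-benchmark indicator) can be illustrated with a short Python sketch. The data structures are assumptions made for illustration: applications are modelled as a mapping from application name to the set of indicator names it references, and the indicator catalog as a plain set of names. The function names are hypothetical.

```python
from typing import Dict, Set


def replace_references(
    app_refs: Dict[str, Set[str]], benchmark: str, duplicate: str
) -> Dict[str, Set[str]]:
    """Return a copy of app_refs in which every reference to the
    non-benchmark (duplicate) indicator points to the benchmark instead."""
    updated: Dict[str, Set[str]] = {}
    for app, refs in app_refs.items():
        if duplicate in refs:
            updated[app] = (refs - {duplicate}) | {benchmark}
        else:
            updated[app] = set(refs)
    return updated


def delete_indicator(catalog: Set[str], duplicate: str) -> Set[str]:
    """Return the catalog without the non-benchmark indicator."""
    return catalog - {duplicate}
```

Replacing references before deleting the duplicate keeps every application pointing at an indicator that still exists.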

Optionally, in a possible implementation of the first aspect, determining, based on the start and end time of the benchmark indicator, whether to complete or delete indicator data includes:

if the start and end time range of the benchmark indicator is greater than that of the indicator to be merged, automatically completing the data of the indicator to be merged;

if the start and end time range of the benchmark indicator is smaller than that of the indicator to be merged, automatically deleting part of the data of the indicator to be merged.
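
The completion and deletion rule above can be sketched as a single alignment step. This is a hedged illustration: each indicator is modelled as a mapping from period number to value, the function name `align_to_benchmark` is hypothetical, and filling missing periods with the benchmark's own values is an assumption (the text only says the data is completed automatically, not how).

```python
from typing import Dict


def align_to_benchmark(
    benchmark: Dict[int, float], other: Dict[int, float]
) -> Dict[int, float]:
    """Align the data of the indicator to be merged with the benchmark's time range.

    Periods the benchmark covers but `other` lacks are completed
    (assumption: copied from the benchmark); periods outside the
    benchmark's range are dropped.
    """
    aligned: Dict[int, float] = {}
    for period, benchmark_value in benchmark.items():
        aligned[period] = other.get(period, benchmark_value)
    return aligned


# Hypothetical monthly data: A covers months 1-12, B covers months 5-12.
indicator_a = {month: 100.0 + month for month in range(1, 13)}
indicator_b = {month: 100.0 + month for month in range(5, 13)}

b_completed = align_to_benchmark(indicator_a, indicator_b)  # A is the benchmark
a_trimmed = align_to_benchmark(indicator_b, indicator_a)    # B is the benchmark
```

With A as the benchmark, B gains months 1 to 4; with B as the benchmark, A loses months 1 to 4.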

Optionally, in a possible implementation of the first aspect, comparing, in pairs, indicators with the same time dimension in the incremental data and calculating the indicator distance between the two indicators includes:

calculating the indicator difference between the two indicators, as well as the mean and standard deviation of the indicator difference;

calculating the indicator distance of the two indicators based on the indicator difference and on its mean and standard deviation.
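
A sketch of the two calculation steps above is given below. The exact distance formula is not reproduced in this text, so the combination used here is an assumption: the distance is taken as sqrt(mean² + std²) of the per-period differences, which equals their root-mean-square when the population standard deviation is used. The function name `indicator_distance` is hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def indicator_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two indicators over the same n indicator periods.

    Step 1: per-period differences, their mean and (population) standard deviation.
    Step 2 (assumed formula): sqrt(mean^2 + std^2), i.e. the RMS of the differences.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both indicators need the same non-zero number of periods")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = fmean(diffs)
    std_diff = pstdev(diffs)
    return math.sqrt(mean_diff ** 2 + std_diff ** 2)
```

Under this assumed formula, two indicators with identical values in every period have distance 0, and both a constant offset (through the mean) and fluctuating disagreement (through the standard deviation) increase the distance.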

A second aspect of the embodiments of the present invention provides an indicator deduplication apparatus based on indicator distance, including:

an incremental data selection module, used to select incremental data of indicators within a preset time;

an indicator distance calculation module, used to compare, in pairs, indicators with the same time dimension in the incremental data and calculate the indicator distance between the two indicators;

a threshold judgment module, used to determine that two indicators are synonymous if their indicator distance is smaller than the indicator repetition threshold in each of multiple consecutive time periods, and that they are not synonymous otherwise.

Optionally, in a possible implementation of the second aspect, the apparatus further includes:

an indicator synthesis module, used to generate indicator merging suggestions based on the indicator's dimension information, start and end time, owner, department, and information on the applications that reference the indicator.

Optionally, in a possible implementation of the second aspect, the indicator synthesis module includes:

a selection unit, used to select a benchmark indicator for indicator merging, where the benchmark indicator is the indicator retained when two indicators are merged;

a judgment unit, used to judge, based on the start and end time of the benchmark indicator, whether to complete or delete indicator data;

a replacement unit, used to replace references to non-benchmark indicators in applications with the benchmark indicator;

a deletion unit, used to delete non-benchmark indicators.

A third aspect of the embodiments of the present invention provides a computer device, including a memory and a processor. The memory stores a computer program that can be run on the processor, and when the processor executes the computer program, the steps in each of the above method embodiments are implemented.

A fourth aspect of the embodiments of the present invention provides a readable storage medium. A computer program is stored in the readable storage medium, and when the computer program is executed by a processor, it implements the steps of the method of the first aspect of the present invention and of the various possible designs of the first aspect.

The indicator deduplication method, apparatus, computer device and storage medium based on indicator distance provided by the present invention select incremental data of indicators within a preset time; compare, in pairs, indicators with the same time dimension in the incremental data and calculate the indicator distance between the two indicators; and, if the indicator distance is smaller than the indicator repetition threshold in each of multiple consecutive time periods, treat the two indicators as synonymous, and otherwise as not synonymous. The invention can help indicator managers eliminate duplicate indicators, unify indicator definitions, and save indicator storage space.

Description of the Drawings

Figure 1 is a flowchart of a first embodiment of the indicator deduplication method based on indicator distance;

Figure 2 is a structural diagram of a first embodiment of the indicator deduplication apparatus based on indicator distance.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

The terms "first", "second", "third", "fourth", and so on (if present) in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein.

It should be understood that, in the various embodiments of the present invention, the sequence numbers of the processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments in any way.

It should be understood that, in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

It should be understood that, in the present invention, "multiple" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Comprising A, B, and C" and "comprising A, B, C" mean that all three of A, B, and C are included; "comprising A, B, or C" means that one of A, B, and C is included; "comprising A, B, and/or C" means that any one, any two, or all three of A, B, and C are included.

It should be understood that, in the present invention, "B corresponding to A", "B that corresponds to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matches B when the similarity between A and B is greater than or equal to a preset threshold.

Depending on the context, "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting".

The technical solution of the present invention is described in detail below through specific embodiments. The specific embodiments below may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.

The present invention provides an indicator deduplication method based on indicator distance, whose flowchart is shown in Figure 1, comprising:

Step S110: select incremental data of indicators within a preset time.

In this step, the length of the preset time depends on the time dimension of the indicator. If the time dimension has daily granularity, the indicator data added on the previous day are generated each day, the indicator period is one day, and the preset time is n indicator periods, that is, the most recent n days. If the time dimension has weekly granularity, the indicator data added in the previous week are generated each week, the indicator period is one week, and the preset time is n indicator periods, that is, the most recent n weeks.
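The window selection in step S110 can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names `preset_window` and `select_incremental`, the `(date, value)` row layout, and the convention that weeks start on Monday are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical mapping from time-dimension granularity to the length of one
# indicator period; only "day" and "week" are named in the description.
PERIOD_LENGTH = {"day": timedelta(days=1), "week": timedelta(weeks=1)}


def preset_window(granularity, n, today):
    """Return the half-open range [start, end) of the most recent n complete periods."""
    if granularity not in PERIOD_LENGTH:
        raise ValueError(f"unsupported granularity: {granularity}")
    if granularity == "week":
        # Assumption: weeks start on Monday, so the window ends where the
        # current (incomplete) week begins.
        end = today - timedelta(days=today.weekday())
    else:
        end = today  # data for today is not complete yet
    start = end - n * PERIOD_LENGTH[granularity]
    return start, end


def select_incremental(rows, granularity, n, today):
    """Keep the (date, value) rows whose date falls inside the preset window."""
    start, end = preset_window(granularity, n, today)
    return [(d, v) for d, v in rows if start <= d < end]
```

Using a half-open range keeps the current, still incomplete period out of the comparison, which matches the description's wording that each run processes the data added in the previous period.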

Step S120: compare, pair by pair, the indicators in the incremental data that share the same time dimension, and calculate the indicator distance between the two indicators of each pair.

In step S120, the newly added indicator data contain indicators with different time dimensions: indicator data 1, added once a day; indicator data 2, added once a week; and indicator data 3, added at specific times set manually. Before pairwise comparison, the indicator data must therefore first be grouped by time dimension. Any two indicators belonging to the same time dimension are then compared and their indicator distance is calculated.
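The grouping and pairing described for step S120 might look like the sketch below. The function name `candidate_pairs` and the input layout (a mapping from indicator name to time dimension) are assumptions made for illustration; the description does not specify a data structure.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(indicators):
    """Group indicator names by time dimension, then list every unordered
    pair inside each group. `indicators` maps name -> time dimension."""
    groups = defaultdict(list)
    for name, dimension in indicators.items():
        groups[dimension].append(name)
    pairs = []
    for dimension in sorted(groups):
        # Only indicators that share a time dimension are compared.
        for a, b in combinations(sorted(groups[dimension]), 2):
            pairs.append((dimension, a, b))
    return pairs
```

A group of m indicators yields m(m-1)/2 pairs, so grouping first also avoids computing distances between indicators that could never be compared period by period.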

The indicator distance is a distance computed from the differences between the data of two indicators over n indicator periods. It is calculated as follows:

Step 1: calculate the indicator differences between the two indicators, and the mean and standard deviation of those differences.

Indicator difference: x_i = |A_i - B_i|, where A_i and B_i are the values of indicator A and indicator B in the i-th indicator period and i = 1, 2, ..., n;

Mean of the indicator differences:

Figure PCTCN2022114362-appb-000001

Standard deviation of the indicator differences:

Figure PCTCN2022114362-appb-000002

where i = 1, 2, ..., n.
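Step 1 can be sketched in a few lines. Because the formulas for the mean and the standard deviation survive here only as image references, the sketch assumes the ordinary arithmetic mean and the population standard deviation (dividing by n); the function name `difference_stats` is hypothetical.

```python
from statistics import fmean, pstdev


def difference_stats(a, b):
    """Return (differences, mean, standard deviation) for two equally long
    series of indicator values, one value per indicator period."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    # Assumption: population standard deviation (divide by n), since the
    # original formula image is not reproduced in the text.
    return diffs, fmean(diffs), pstdev(diffs)
```

If the original formula divides by n - 1 instead, `statistics.stdev` would be the drop-in replacement, with the caveat that it requires at least two periods.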

Step 2: calculate the indicator distance between the two indicators from the indicator differences and their mean and standard deviation.

Figure PCTCN2022114362-appb-000003

where i = 1, 2, ..., n.

Step S130: if the indicator distance is below the indicator duplication threshold in each of multiple consecutive time periods, the two indicators are synonymous; otherwise they are not synonymous.

In this step:

The time period is determined by the time dimension of the two indicators whose distance is being calculated. That is, if the time dimension of the two indicators is the day, the two indicators are synonymous only if their indicator distance stays below the duplication threshold over several consecutive days; if the time dimension is the week, the indicator distance must stay below the duplication threshold over several consecutive weeks.

The indicator duplication threshold k is the threshold used to decide whether two indicators are duplicates. When dist(indicator A, indicator B) < k, the two indicators are synonymous; when dist(indicator A, indicator B) ≥ k, they are not synonymous.
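The decision rule of step S130 (synonymous only when the distance stays below the threshold in every one of several consecutive periods) can be sketched as below. The function takes already computed distances as input, because the distance formula itself is available here only as an image reference. The parameter `m`, the number of consecutive periods required, is a hypothetical name; the description says only "multiple".

```python
def is_synonymous(distances, k, m):
    """Return True if the last m indicator distances (one per consecutive
    time period, oldest first) are all below the duplication threshold k."""
    if m <= 0 or len(distances) < m:
        return False  # not enough periods observed to decide
    return all(d < k for d in distances[-m:])
```

Requiring every period in the run to pass, rather than an average, means a single period in which the two indicators diverge is enough to keep them apart.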

In one embodiment, the method further comprises:

generating an indicator merge suggestion based on the indicator's dimension information, start and end times, owner, department, and the information of the applications that reference the indicator.

In this step, because synonymous indicators are found by pairwise comparison, their names necessarily differ, which gives rise to the problem of synonymous indicators with different names. To solve this problem, the present invention further proposes automatically generating an indicator merge suggestion from the indicator's dimension information, the start and end times of its data, its owner, the department it belongs to, and the information of the applications (for example, BI reports) that reference it. The suggestion serves as the basis on which the indicator administrator decides whether to merge the indicators.
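One way to picture a merge suggestion is as a side-by-side record of the metadata listed above. The sketch below is purely illustrative: the description does not define a record layout, so the class `IndicatorInfo`, its field names, and the shape of the returned dictionary are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class IndicatorInfo:
    # Field names are hypothetical; they mirror the metadata listed above.
    name: str
    dimensions: tuple
    start: str   # first period with data, e.g. "2022-01"
    end: str     # last period with data, e.g. "2022-12"
    owner: str
    department: str
    applications: list = field(default_factory=list)  # e.g. BI reports


def merge_suggestion(a, b):
    """Put the metadata of two synonymous indicators side by side and list
    the fields on which they differ, for the administrator to review."""
    fields = ("dimensions", "start", "end", "owner", "department", "applications")
    return {
        "pair": (a.name, b.name),
        "differences": {
            f: (getattr(a, f), getattr(b, f))
            for f in fields
            if getattr(a, f) != getattr(b, f)
        },
    }
```

Listing only the differing fields keeps the suggestion short; the decision itself is left to the administrator, as the description states.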

In one embodiment, the method further comprises:

selecting a benchmark indicator for the merge, the benchmark indicator being the indicator that is retained when two indicators are merged; determining, based on the start and end times of the benchmark indicator, whether to backfill or delete indicator data; replacing references to the non-benchmark indicator in application information with the benchmark indicator; and deleting the non-benchmark indicator.

Determining, based on the start and end times of the benchmark indicator, whether to backfill or delete indicator data comprises: if the start-to-end time range of the benchmark indicator is larger than that of the indicator to be merged, automatically backfilling the data of the indicator to be merged; if the start-to-end time range of the benchmark indicator is smaller than that of the indicator to be merged, automatically deleting part of the data of the indicator to be merged.
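The backfill-or-delete rule can be sketched with month numbers. This is a minimal sketch under assumptions: ranges are inclusive `(first_month, last_month)` pairs within one year, and the function name `reconcile_range` is hypothetical.

```python
def reconcile_range(benchmark, other):
    """Compare two inclusive (first_month, last_month) ranges.

    Returns (backfill, delete): the months the benchmark covers but the
    other indicator lacks, and the months the other indicator has outside
    the benchmark range."""
    bench_months = set(range(benchmark[0], benchmark[1] + 1))
    other_months = set(range(other[0], other[1] + 1))
    backfill = sorted(bench_months - other_months)
    delete = sorted(other_months - bench_months)
    return backfill, delete
```

For a benchmark covering months 1 to 12 and another indicator covering months 5 to 12, the call returns months 1 to 4 for backfilling; with the roles swapped, the same months are returned for deletion.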

In this step, the indicator merge suggestion consists mainly of the following three points:

First, one indicator is selected from the indicators to be merged as the benchmark indicator. The benchmark indicator is the indicator retained when the two indicators are merged. For example, if indicator A and indicator B are synonymous but have different names, and indicator A is kept and indicator B deleted when they are merged, then indicator A is the benchmark indicator.

Second, indicator data are backfilled or automatically deleted according to the start and end times of the benchmark indicator: if the start-to-end time range of the benchmark indicator is larger than that of the indicator to be merged, the data of the indicator to be merged are automatically backfilled; if it is smaller, part of the data of the indicator to be merged is automatically deleted. For example, suppose indicator A runs from January to December and indicator B from May to December. If indicator B is chosen as the benchmark indicator, the start and end times are set to May through December, and the data of indicator A for January through April are automatically deleted. If indicator A is chosen as the benchmark indicator, the start and end times are set to January through December, and the data of indicator B for January through April are automatically backfilled.

Finally, the applications that reference the non-benchmark indicator are switched to the benchmark indicator, and the non-benchmark indicator is deleted (alternatively, it need not be deleted; it can simply be taken offline so that it is no longer shown to indicator users).

Whether and how the duplicate indicators are merged is then decided based on the indicator merge suggestion and the user's choice.
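The last point, switching application references to the benchmark indicator and then deleting or retiring the duplicate, can be sketched as follows. The function name `apply_merge`, the dictionary layouts, and the status value `"offline"` are assumptions for illustration only.

```python
def apply_merge(app_refs, status, benchmark, duplicate, delete=True):
    """Return updated copies of the application references and the indicator
    status table after merging `duplicate` into `benchmark`.

    app_refs maps application name -> list of indicator names it uses;
    status maps indicator name -> state such as "online"."""
    new_refs = {
        # Using a set avoids listing the benchmark twice when an
        # application referenced both indicators.
        app: sorted({benchmark if name == duplicate else name for name in names})
        for app, names in app_refs.items()
    }
    new_status = dict(status)
    if delete:
        new_status.pop(duplicate, None)
    else:
        new_status[duplicate] = "offline"  # kept but hidden from users
    return new_refs, new_status
```

Returning copies rather than mutating the inputs makes it easy to preview a merge before the administrator confirms it.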

In one embodiment, the method further comprises: after the duplicate indicators are merged, calculating the difference between the storage space occupied before and after the merge, and displaying the space saved by the merge in a visual interface, for example:

Space occupied before the merge: 3.2 TB

Space occupied after the merge: 2.4 TB

Space saved by the merge: 0.8 TB (25%)
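The figures above follow from simple arithmetic, sketched here for checking. The function name `storage_savings` is hypothetical, and the percentage is taken relative to the space occupied before the merge.

```python
def storage_savings(before_tb, after_tb):
    """Return (space saved in TB, saving as a percentage of the original)."""
    if before_tb <= 0:
        raise ValueError("space before the merge must be positive")
    saved = before_tb - after_tb
    return saved, saved / before_tb * 100


saved, pct = storage_savings(3.2, 2.4)
# Formatting rounds away floating-point noise in the subtraction.
print(f"Space saved by the merge: {saved:.1f} TB ({pct:.0f}%)")
```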

The indicator deduplication method based on indicator distance provided by the present invention selects incremental data of indicators within a preset time; compares, pair by pair, the indicators in the incremental data that share the same time dimension, and calculates the indicator distance between the two indicators of each pair; and, if the indicator distance is below the indicator duplication threshold in each of multiple consecutive time periods, deems the two indicators synonymous, and otherwise not synonymous. The invention helps indicator administrators eliminate duplicate indicators, unify indicator definitions, and save indicator storage space.

Technical effects:

1. Duplicate indicators can be identified fully automatically;

2. Users can tune the duplicate-identification method by setting the indicator period and the indicator duplication threshold;

3. The indicator merge suggestion provides ample upstream and downstream information about the indicators as a basis for the indicator administrator to merge them;

4. A complete technical solution is provided, from identifying duplicate indicators and merging them to visualizing the storage space saved by the merge.

An embodiment of the present invention further provides an indicator deduplication apparatus based on indicator distance, as shown in Figure 2, comprising:

an incremental data selection module, configured to select incremental data of indicators within a preset time;

an indicator distance calculation module, configured to compare, pair by pair, the indicators in the incremental data that share the same time dimension and to calculate the indicator distance between the two indicators of each pair;

a threshold judgment module, configured to determine that the two indicators are synonymous if the indicator distance is below the indicator duplication threshold in each of multiple consecutive time periods, and not synonymous otherwise.

In one embodiment, the apparatus further comprises:

an indicator synthesis module, configured to generate an indicator merge suggestion based on the indicator's dimension information, start and end times, owner, department, and the information of the applications that reference the indicator.

In one embodiment, the indicator synthesis module comprises:

a selection unit, configured to select a benchmark indicator for the merge, the benchmark indicator being the indicator that is retained when two indicators are merged;

a judgment unit, configured to determine, based on the start and end times of the benchmark indicator, whether to backfill or delete indicator data;

a replacement unit, configured to replace references to the non-benchmark indicator in application information with the benchmark indicator;

a deletion unit, configured to delete the non-benchmark indicator.

The indicator deduplication apparatus based on indicator distance provided by the present invention selects incremental data of indicators within a preset time; compares, pair by pair, the indicators in the incremental data that share the same time dimension, and calculates the indicator distance between the two indicators of each pair; and, if the indicator distance is below the indicator duplication threshold in each of multiple consecutive time periods, deems the two indicators synonymous, and otherwise not synonymous. The invention helps indicator administrators eliminate duplicate indicators, unify indicator definitions, and save indicator storage space.

The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. A computer storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to the processor so that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC), and the ASIC may reside in user equipment. The processor and the readable storage medium may also exist as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

The present invention further provides a program product comprising execution instructions stored in a readable storage medium. At least one processor of a device can read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the device to carry out the methods provided by the various embodiments described above.

In the above embodiments of the terminal or server, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An indicator deduplication method based on indicator distance, characterized by comprising:
selecting incremental data of indicators within a preset time;
comparing, pair by pair, the indicators in the incremental data that share the same time dimension, and calculating the indicator distance between the two indicators of each pair;
if the indicator distance is below the indicator duplication threshold in each of multiple consecutive time periods, determining that the two indicators are synonymous, and otherwise that they are not synonymous.

2. The indicator deduplication method based on indicator distance according to claim 1, characterized in that the method further comprises:
generating an indicator merge suggestion based on the indicator's dimension information, start and end times, owner, department, and the information of the applications that reference the indicator.

3. The indicator deduplication method based on indicator distance according to claim 2, characterized by comprising:
selecting a benchmark indicator for the merge, the benchmark indicator being the indicator that is retained when two indicators are merged;
determining, based on the start and end times of the benchmark indicator, whether to backfill or delete indicator data;
replacing references to the non-benchmark indicator in application information with the benchmark indicator;
deleting the non-benchmark indicator.

4. The indicator deduplication method based on indicator distance according to claim 3, characterized in that determining, based on the start and end times of the benchmark indicator, whether to backfill or delete indicator data comprises:
if the start-to-end time range of the benchmark indicator is larger than that of the indicator to be merged, automatically backfilling the data of the indicator to be merged;
if the start-to-end time range of the benchmark indicator is smaller than that of the indicator to be merged, automatically deleting part of the data of the indicator to be merged.

5. The indicator deduplication method based on indicator distance according to claim 1, characterized in that comparing, pair by pair, the indicators in the incremental data that share the same time dimension and calculating the indicator distance between the two indicators comprises:
calculating the indicator differences between the two indicators, and the mean and standard deviation of the indicator differences;
calculating the indicator distance between the two indicators based on the indicator differences and their mean and standard deviation.

6. An indicator deduplication apparatus based on indicator distance, characterized by comprising:
an incremental data selection module, configured to select incremental data of indicators within a preset time;
an indicator distance calculation module, configured to compare, pair by pair, the indicators in the incremental data that share the same time dimension and to calculate the indicator distance between the two indicators of each pair;
a threshold judgment module, configured to determine that the two indicators are synonymous if the indicator distance is below the indicator duplication threshold in each of multiple consecutive time periods, and not synonymous otherwise.

7. The indicator deduplication apparatus based on indicator distance according to claim 6, characterized in that the apparatus further comprises:
an indicator synthesis module, configured to generate an indicator merge suggestion based on the indicator's dimension information, start and end times, owner, department, and the information of the applications that reference the indicator.

8. The indicator deduplication apparatus based on indicator distance according to claim 7, characterized in that the indicator synthesis module comprises:
a selection unit, configured to select a benchmark indicator for the merge, the benchmark indicator being the indicator that is retained when two indicators are merged;
a judgment unit, configured to determine, based on the start and end times of the benchmark indicator, whether to backfill or delete indicator data;
a replacement unit, configured to replace references to the non-benchmark indicator in application information with the benchmark indicator;
a deletion unit, configured to delete the non-benchmark indicator.

9. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
PCT/CN2022/114362 2022-04-29 2022-08-23 Indicator distance-based indicator deduplication method and apparatus Ceased WO2023206875A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210472179.5 2022-04-29
CN202210472179.5A CN114841559A (en) 2022-04-29 2022-04-29 Index deduplication method and device based on index distance

Publications (1)

Publication Number Publication Date
WO2023206875A1 true WO2023206875A1 (en) 2023-11-02

Family

ID=82568368

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114362 Ceased WO2023206875A1 (en) 2022-04-29 2022-08-23 Indicator distance-based indicator deduplication method and apparatus

Country Status (2)

Country Link
CN (1) CN114841559A (en)
WO (1) WO2023206875A1 (en)

Also Published As

Publication number Publication date
CN114841559A (en) 2022-08-02

CN110515923B (en) A data migration method and system between distributed databases

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22939701; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 22939701; Country of ref document: EP; Kind code of ref document: A1)