CN103327075A

CN103327075A - Distributed mass organization realizing method based on label interaction

Info

Publication number: CN103327075A
Application number: CN2013102004661A
Authority: CN
Inventors: 于秦; 赵一甲; 罗俊海
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2013-05-27
Filing date: 2013-05-27
Publication date: 2013-09-25
Anticipated expiration: 2033-05-27
Also published as: CN103327075B

Abstract

The invention discloses a distributed community discovery method based on tag interaction, which specifically includes: setting known information in an application scene, a network initialization process, and a tag update process. The method of the present invention is a general computing model. Under the condition that the global topology is unknown and the nodes only know the local topology information of one-hop logical relationship, the community relationship can be dynamically discovered and maintained; all nodes participate in the calculation, and through the label interaction between nodes To complete the community discovery process in a dynamic network environment, each node updates its own label number according to the policy by exchanging label information with its neighbor nodes, and does not care about the topology of the entire network.

Description

A Distributed Community Discovery Method Based on Tag Interaction

技术领域technical field

本发明属于通信技术领域，具体涉及移动社会网络中的社团发现方法。The invention belongs to the technical field of communication, and in particular relates to a community discovery method in a mobile social network.

背景技术Background technique

社团是网络中结点组成的分组，组内的边较多，而组间的边较少。复杂网络中的社团结构具有重要意义，因为社团结构通常对应于网络中的某一功能单元，例如，万维网中具有相同主题的站点组成的社团；生物网络中具有相同功能的细胞单元组成的社团。发现复杂网络中社团的过程可以看作是对复杂网络进行较粗粒化的过程，发现社团结构也可以揭示某个具体的结点在复杂网络中起的作用。A community is a grouping of nodes in a network, with more edges within a group and fewer edges between groups. The community structure in a complex network is of great significance, because the community structure usually corresponds to a certain functional unit in the network, for example, a community composed of sites with the same theme in the World Wide Web; a community composed of cell units with the same function in a biological network. The process of discovering communities in complex networks can be seen as a coarser-grained process of complex networks, and discovering community structures can also reveal the role of a specific node in complex networks.

随着研究的深入,人们发现许多实际网络均具有社团结构,即整个网络由若干个社团组成,社团之间的连接相对稀疏、社团内部的连接相对稠密。社团发现则是利用图拓扑结构中所蕴藏的信息从复杂网络中解析出其模块化的社团结构,该问题的深入研究有助于以一种分而治之的方式研究整个网络的模块、功能及其演化,更准确地理解复杂系统的组织原则、拓扑结构与动力学特性,具有十分重要的意义。With the deepening of research, it is found that many real networks have a community structure, that is, the entire network is composed of several communities, the connections between the communities are relatively sparse, and the connections within the communities are relatively dense. Community discovery is to use the information contained in the graph topology to analyze its modular community structure from the complex network. The in-depth study of this problem helps to study the modules, functions and evolution of the entire network in a divide-and-conquer manner. , it is of great significance to understand the organization principles, topology and dynamics of complex systems more accurately.

当前研究中的社团发现方法，大多数是集中式的社团发现算法，即一台拥有全网逻辑拓扑的服务器，通过各种算法和策略，发现整个网络中的社团结构。当前常见的分布式计算方法分为两种。Most of the community discovery methods in current research are centralized community discovery algorithms, that is, a server with a logical topology of the entire network can discover community structures in the entire network through various algorithms and strategies. There are currently two common distributed computing methods.

一种是通过不同的服务器对相同的网络数据分别计算，每个服务器采用不同的算法参数，然后对结果进行统计处理，得出最优的社团发现结果，这种方式将原本需要多次计算，将结果进行对比的算法分流水线进行了处理。One is to calculate the same network data separately through different servers. Each server uses different algorithm parameters, and then performs statistical processing on the results to obtain the optimal community discovery results. This method would originally require multiple calculations. The algorithm for comparing the results is divided into pipelines for processing.

另一种则是利用已有的分布式计算模型来实现社团发现算法的并行处理。CLPA(CliqueLabel Propagation Algorithm)算法就采用了这样的方式实现了并行计算，该方法采用google公司提出的一种编程模型MapReduce，该模型主要用于海量数据的处理，CLPA提出利用这种模型来进行数据输入，从而可以实现并行计算，将原本在一个服务器上进行的计算分配到服务器群组上完成，然后得到算法最终结果。The other is to use the existing distributed computing model to realize the parallel processing of the community discovery algorithm. The CLPA (CliqueLabel Propagation Algorithm) algorithm uses this method to realize parallel computing. This method uses a programming model MapReduce proposed by Google. This model is mainly used for processing massive data. CLPA proposes to use this model to process data. Input, so that parallel computing can be realized, the calculation originally performed on one server is distributed to the server group for completion, and then the final result of the algorithm is obtained.

这两种方式的本质都是通过将计算内容利用现有的模型进行分布式计算，算法计算的前提和内容并没有发生变化，都需要一个集中的服务器为分布的计算步骤提供整个网络的拓扑结构信息，在静态的、全网拓扑已知的网络中进行社团发现，即一台拥有全网逻辑拓扑的服务器，通过各种算法和策略，发现整个网络中的社团结构。The essence of these two methods is to use the existing model for distributed calculation of the calculation content. The premise and content of the algorithm calculation have not changed. Both require a centralized server to provide the topology of the entire network for the distributed calculation steps. Information, community discovery is performed in a static network with a known topology of the entire network, that is, a server with a logical topology of the entire network discovers the community structure in the entire network through various algorithms and strategies.

发明内容Contents of the invention

本发明的目的是为了解决现有技术存在的上述问题，提出了一种基于标签交互的分布式社团发现方法。The purpose of the present invention is to solve the above-mentioned problems existing in the prior art, and propose a distributed community discovery method based on tag interaction.

本发明的技术方案是：一种基于标签交互的分布式社团发现方法，具体包括如下步骤：The technical solution of the present invention is: a distributed community discovery method based on label interaction, which specifically includes the following steps:

步骤1．设置应用场景中可知的信息，具体包括：每个节点的初始化标签号，所述标签号具有唯一性；每个节点的逻辑邻居；step 1. Set the known information in the application scenario, specifically including: the initialization label number of each node, and the label number is unique; the logical neighbors of each node;

步骤2．网络初始化过程：网络中所有的节点根据本地唯一信息初始化自身的标签号；将该标签号的权重设置为1，标签号的传播因子初始化为1；设置本地迭代次数记为1，向所有的邻居节点广播自己的标签号；初始化本地保存的邻居标签号列表，设置保存的邻居标签号列表中的每个邻居标签为未更新状态；Step 2. Network initialization process: All nodes in the network initialize their own label numbers according to local unique information; set the weight of the label number to 1, and initialize the propagation factor of the label number to 1; set the number of local iterations to 1, and send The node broadcasts its own label number; initializes the locally saved neighbor label number list, and sets each neighbor label in the saved neighbor label number list to an unupdated state;

步骤3．标签更新过程：节点接收来自邻居的标签号广播，对比接收到的邻居标签号列表中的迭代次数与本地保存的对应邻居标签列表的迭代次数，若接收到的邻居标签列表中的迭代次数大于本地保存的邻居列表迭代次数，则更新邻居标签列表，设置对应列表状态为已更新，若小于或等于本地保存的邻居列表迭代次数，则忽略该广播报文；当节点记录的所有邻居标签号的更新标志为已更新时，更新本地标签号列表,更新策略为：将邻居的标签号列表相加，标签的传播因子取邻居相同标签中最大的传播因子减去预先设定的传播系数p，若标签传播因子小于0，则删除标签；相同标签号的标签权重相加，归一化所有标签权重，删除权重小于预先设定参数1/v的标签号，如果都小于1/v则保留权重最大的标签号，若有多个这样的标签号，则随机选择一个最大权重的标签，删除其余标签，然后归一化保留的标签号，迭代次数加1，向邻居广播新的本地标签号列表，将记录的所有邻居的标签列表更新标志设置为未更新；如果一次更新之后，节点的标签号列表并没有发生变化，则表明节点当前状态下归属的社团结构已经稳定。Step 3. Label update process: The node receives the label number broadcast from the neighbor, compares the number of iterations in the received neighbor label number list with the iteration number of the corresponding neighbor label list saved locally, if the iteration number in the received neighbor label list is greater than the local If the number of neighbor list iterations saved, update the neighbor label list, set the status of the corresponding list as updated, if it is less than or equal to the number of iterations of the neighbor list saved locally, then ignore the broadcast message; when all the neighbor label numbers recorded by the node are updated When the flag is updated, update the list of local label numbers. The update strategy is: add the list of label numbers of the neighbors, and the propagation factor of the label is the largest propagation factor among the same labels of the neighbors minus the preset propagation coefficient p. If the label If the propagation factor is less than 0, delete the label; add the label weights of the same label number, normalize all label weights, delete the label number with a weight less than the preset parameter 1/v, and keep the one with the largest weight if it is less than 1/v Tag number, if there are multiple such tag numbers, randomly select a tag with the largest weight, delete the rest of the tags, then normalize the reserved tag numbers, add 1 to the number of iterations, and broadcast a new list of local tag numbers to the neighbors. The tag list update flags of all recorded neighbors are set to not updated; if the tag number list of the node does not change after an update, it indicates that the community structure to which the node belongs in the current state has been stabilized.

进一步的，上述方法还包括动态维护过程，当出现新的节点或两个节点之间的逻辑邻居关系发生变化时触发动态维护过程，此时节点向所有邻居节点发布自己的标签列表，发起标签更新过程。Further, the above method also includes a dynamic maintenance process. When a new node appears or the logical neighbor relationship between two nodes changes, the dynamic maintenance process is triggered. At this time, the node publishes its own label list to all neighbor nodes and initiates a label update. process.

进一步的，步骤2所述的本地唯一信息具体为mac地址或用户名或节点ID。Further, the local unique information described in step 2 is specifically a mac address or a user name or a node ID.

本发明的有益效果：本发明提出的基于标签交互的分布式社团发现方法，是一种通用的计算模型，在全局拓扑未知，节点只知道一跳逻辑关系的局部拓扑的信息下，动态发现、维护社团关系；所有的节点都参与计算，通过节点间标签交互的方式，来完成动态网络环境中的社团发现过程，每个节点分别通过和邻居节点交互标签信息，根据策略更新自己的标签号，并不关心整个网络的拓扑结构。本发明方法中每一个节点都参与社团发现算法的计算过程，因此不依赖于一个知道全网拓扑的集中式服务器，节点不需要知道全网拓扑，时间复杂度不再成为社团发现算法的瓶颈；此外在标签更新过程中，可以采用不同的更新策略，从而达到更好的社团发现结果，因此本发明的方法可以作为一个框架模块，对所有基于标签的社团发现算法稍作改动，应用在不同模型中，从而适用于分布式的社团发现算法。Beneficial effects of the present invention: The distributed community discovery method based on label interaction proposed by the present invention is a general computing model, and it can dynamically discover, Maintain the community relationship; all nodes participate in the calculation, and complete the community discovery process in the dynamic network environment through label interaction between nodes. Each node updates its own label number according to the strategy by exchanging label information with neighbor nodes. It does not care about the topology of the entire network. In the method of the present invention, each node participates in the calculation process of the community discovery algorithm, so it does not depend on a centralized server that knows the topology of the entire network, the nodes do not need to know the topology of the entire network, and the time complexity no longer becomes the bottleneck of the community discovery algorithm; In addition, in the tag update process, different update strategies can be used to achieve better community discovery results. Therefore, the method of the present invention can be used as a framework module to slightly modify all tag-based community discovery algorithms and apply it to different models. Therefore, it is suitable for distributed community discovery algorithms.

附图说明Description of drawings

图1是本发明具体实施例的主流程图。Fig. 1 is the main flowchart of the specific embodiment of the present invention.

图2是本发明具体实施例步骤2的具体流程图。Fig. 2 is a specific flow chart of Step 2 of a specific embodiment of the present invention.

图3是本发明具体实施例步骤3的具体流程图。Fig. 3 is a specific flow chart of Step 3 of a specific embodiment of the present invention.

图4是本发明具体实施例步骤4的具体流程图。Fig. 4 is a specific flow chart of Step 4 of a specific embodiment of the present invention.

具体实施方式Detailed ways

下面结合附图和具体实施例对本发明作进一步说明。如图1所示，基于标签交互的分布式社团发现方法，具体包括步骤：The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments. As shown in Figure 1, the distributed community discovery method based on label interaction includes the following steps:

步骤1．设置应用场景中可知的信息：step 1. Set the known information in the application scenario:

分布式的社团发现算法需要一定的场景能力和网络环境。在方法初始化之前，首先使得应用场景中可知的信息包括：Distributed community discovery algorithms require certain scene capabilities and network environments. Before the method is initialized, the information that can be known in the application scene includes:

使每个节点知道自己的初始化标签号，并且该标签号具有唯一性；使每个节点知道自己的逻辑邻居有哪些。Let each node know its initial label number, and the label number is unique; let each node know which logical neighbors it has.

这里，每个节点通过和邻居节点实时交互信息，感知到其邻居节点的出现和消失。Here, each node perceives the appearance and disappearance of its neighbor nodes by exchanging information with neighbor nodes in real time.

步骤2．网络初始化过程：Step 2. Network initialization process:

网络中所有的节点根据本地唯一信息初始化自身的标签号；本地唯一信息可以采用如mac地址、用户名、节点ID等标识中的任一个。将该标签号的权重设置为1，设置该标签号的传播因子为1；设置本地迭代次数记为1，向所有的邻居节点广播自己的标签号。本步骤详细过程如图2所示：All nodes in the network initialize their own label numbers according to the local unique information; the local unique information can use any one of identifiers such as mac address, user name, and node ID. Set the weight of the label number to 1, set the propagation factor of the label number to 1; set the number of local iterations to 1, and broadcast its own label number to all neighbor nodes. The detailed process of this step is shown in Figure 2:

步骤21.所有节点初始化自身的标签号：网络中所有的节点根据本地唯一信息初始化自身的标签号，本地唯一信息可以采用如mac地址、用户名、节点ID等标识。Step 21. All nodes initialize their own tag numbers: All nodes in the network initialize their own tag numbers based on local unique information, which can be identified by mac address, user name, node ID, etc.

步骤22.将该标签号的权重设置为1，传播因子初始化为1。Step 22. Set the weight of the tag number to 1, and initialize the propagation factor to 1.

步骤23．设置本地迭代次数记为1。Step 23. Set the number of local iterations to 1.

步骤24.向所有的邻居节点广播自己的标签号。Step 24. Broadcast its own label number to all neighbor nodes.

步骤25.初始化本地保存的邻居标签号列表，设置保存的邻居标签号列表中的每个邻居标签号为未更新状态。Step 25. Initialize the list of neighbor label numbers saved locally, and set each neighbor label number in the saved neighbor label number list as an unupdated state.

步骤3.标签更新过程：Step 3. Tag update process:

节点接收来自邻居的标签号广播，对比接收到的邻居标签列表中的迭代次数与本地保存的对应邻居标签列表的迭代次数，若标签号大于本次保存的邻居列表迭代次数，则更新邻居标签列表，设置对应列表状态为已更新，若小于或等于本地保存的邻居列表迭代次数，则忽略该广播报文；当节点记录的所有邻居标签号的更新标志为已更新时，更新本地标签号列表,更新策略为：将邻居的标签列表相加，标签的传播因子取邻居相同标签中最大的传播因子减去固定的传播系数p，若标签传播因子小于0，则删除标签。相同签标号的权重相加，归一化所有标签号权重，删除权重小于预定参数1/v的标签号，如果都小于1/v则保留权重最大的标签号，若有多个这样的标签号，则随机选择一个最大权重的标签，删除其余标签，然后归一化保留的标签号的权重值，迭代次数加1，向邻居广播新的标签号列表，将记录的所有邻居的标签列表更新标志设置为未更新。v为预先设定的参数，代表节点最多属于的社团数目（状态机）。The node receives the label number broadcast from the neighbor, compares the number of iterations in the received neighbor label list with the iteration number of the corresponding neighbor label list saved locally, and if the label number is greater than the number of iterations of the saved neighbor list, update the neighbor label list , set the status of the corresponding list as updated, if it is less than or equal to the number of iterations of the locally saved neighbor list, ignore the broadcast message; when the update flags of all neighbor label numbers recorded by the node are updated, update the local label number list, The update strategy is: add up the label lists of the neighbors, and the propagation factor of the label is the largest propagation factor among the neighbors with the same label minus the fixed propagation coefficient p. If the label propagation factor is less than 0, delete the label. Add the weights of the same tag numbers, normalize the weights of all tag numbers, delete the tag numbers whose weight is less than the predetermined parameter 1/v, if they are all less than 1/v, keep the tag number with the largest weight, if there are multiple such tag numbers , then randomly select a label with the largest weight, delete the remaining labels, and then normalize the weight value of the reserved label number, increase the number of iterations by 1, broadcast a new list of label numbers to neighbors, and update the label list of all recorded neighbors Set to not updated. v is a pre-set parameter, representing the maximum number of communities (state machine) that a node belongs to.

标签更新过程中，每一个节点保留的标签列表中的标签号，就代表着当前节点所归属的社团结构id，如果一次更新之后，节点的标签号列表并没有发生变化，则证明节点当前状态下归属的社团结构已经稳定。不再发生变化，直到其社团关系发生变化或因邻居的社团关系发生变化而出现的社团归属变化。During the label update process, the label number in the label list retained by each node represents the community structure id to which the current node belongs. If the label number list of the node does not change after an update, it proves that the node is in the current state. The community structure of affiliation has stabilized. There will be no change until the community relationship changes or the community affiliation changes due to the change of the neighbor's community relationship.

需要说明的是，这里一个标签号携带的信息包括：标签号码，标签号的权重，传播因子等信息；每一个节点拥有一个本地标签号列表，由多个标签号组成（初始化时，每一个节点对应一个标签号，在后续的过程中即标签更新过程中，节点可以保存多个标签号），代表自己所拥有的标签号，本地标签号列表的信息包括：本地迭代次数，和多个标签号；节点广播自己的标签号列表时，广播的是本地标签号列表，包括本地迭代次数和多个标签号；每一个节点还维护一个自己邻居的标签号列表，该列表包括了多张表，每一个邻居对应一个标签号列表，每一个邻居的标签号列表包含信息：邻居的迭代次数，邻居拥有的标签号，以及该邻居的标签号更新状态。It should be noted that the information carried by a label number here includes: label number, label number weight, propagation factor and other information; each node has a local label number list, which is composed of multiple label numbers (initialization, each node Corresponding to a tag number, in the subsequent process, that is, during the tag update process, the node can save multiple tag numbers), representing the tag numbers it owns, and the information of the local tag number list includes: the number of local iterations, and multiple tag numbers ;When a node broadcasts its own label number list, what it broadcasts is a local label number list, including the number of local iterations and multiple label numbers; each node also maintains a label number list of its neighbors, which includes multiple tables, each A neighbor corresponds to a tag number list, and the tag number list of each neighbor contains information: the number of iterations of the neighbor, the tag number owned by the neighbor, and the tag number update status of the neighbor.

具体步骤如图3所示：The specific steps are shown in Figure 3:

步骤31.节点接收并记录来自邻居的标签号广播：对比接收到的邻居标签列表中的迭代次数与本地保存的对应邻居标签列表的迭代次数，若标签号大于本次保存的邻居列表迭代次数，则更新邻居标签列表，设置对应列表状态为已更新，若小于或等于本地保存的邻居列表迭代次数，则忽略该广播报文；当节点记录的所有邻居标签号的更新标志为已更新时，更新本地标签号列表。Step 31. The node receives and records the label number broadcast from the neighbor: compare the number of iterations in the received neighbor label list with the iteration number of the corresponding neighbor label list saved locally, if the label number is greater than the number of iterations of the neighbor list saved this time, Then update the neighbor label list, set the status of the corresponding list as updated, if it is less than or equal to the number of iterations of the neighbor list saved locally, then ignore the broadcast message; when the update flags of all neighbor label numbers recorded by the node are updated, update A list of local tag numbers.

步骤32.选择更新标签号：将邻居的标签列表相加，相同标号的权重相加，归一化所有标号权重，删除权重小于预定参数1/v的标签号，如果都小于1/v则保留权重最大的标签号，若有多个这样的标签号，则随机选择一个最大权重的标签，删除其余标签，然后归一化保留的标签号。标签号的传播因子设置为邻居标签列表中相同标签号的传播因子的最大值减去传播系数p的结果，删除传播因子小于0的标签号。其中，v为预先设定的参数，代表节点最多属于的社团数目；p为预先设定参数，代表传播因子衰减的速度，为区间[0，1]之间的实数。Step 32. Select to update the label number: add the neighbor's label list, add the weights of the same label, normalize all label weights, delete the label numbers whose weight is less than the predetermined parameter 1/v, and keep them if they are all less than 1/v The tag number with the largest weight. If there are multiple such tag numbers, randomly select a tag with the largest weight, delete the rest of the tags, and then normalize the reserved tag numbers. The propagation factor of the label number is set to the result of subtracting the propagation coefficient p from the maximum value of the propagation factor of the same label number in the neighbor label list, and the label number with a propagation factor less than 0 is deleted. Among them, v is a preset parameter, which represents the maximum number of communities that a node belongs to; p is a preset parameter, which represents the speed attenuation of the propagation factor, and is a real number in the interval [0, 1].

步骤33.迭代次数累加并向邻居广播新的标签号列表：迭代次数加1，向邻居广播新的标签号列表，将记录的所有邻居的标签列表更新标志设置为未更新。Step 33. Accumulate the number of iterations and broadcast a new list of label numbers to neighbors: add 1 to the number of iterations, broadcast a new list of label numbers to neighbors, and set the label list update flags of all recorded neighbors to not updated.

步骤34.检查节点的标签号列表是否发生变化：标签更新过程中，每一个节点保留的标签列表中的标签号，就代表着当前节点所归属的社团结构id，如果一次更新之后，节点的标签号列表并没有发生变化，则证明节点当前状态下归属的社团结构已经稳定。不再发生变化，直到其社团关系发生变化或因邻居的社团关系发生变化而出现的社团归属变化。Step 34. Check whether the label number list of the node has changed: during the label update process, the label number in the label list retained by each node represents the community structure id to which the current node belongs. If after an update, the node's label If the number list has not changed, it proves that the community structure that the node belongs to in the current state has been stabilized. There will be no change until the community relationship changes or the community affiliation changes due to the change of the neighbor's community relationship.

通过一定次数的更新迭代，全网的所有节点的社团归属达到了局部稳定。After a certain number of update iterations, the community ownership of all nodes in the entire network has reached local stability.

这里，作为一种优选方案，还包括步骤4.动态维护过程。Here, as a preferred solution, step 4. Dynamic maintenance process is also included.

社团结构的动态维护主要出现在两种情况下，一是当出现新的节点之后，二是两个节点之间的逻辑邻居关系发生了变化，即原本没有邻居关系的两个节点动态的变成了邻居关系或原本是邻居的两个节点不在相连。两种方式对已有的节点来说，都是发现了新的邻居关系；当逻辑关系发生变化时，相关的节点会发现出现了新的邻居或有邻居消失，此时节点向所有邻居发布自己当前的标签列表，发起标签更新过程。The dynamic maintenance of the community structure mainly occurs in two situations. One is when a new node appears, and the other is that the logical neighbor relationship between the two nodes changes, that is, the two nodes that originally had no neighbor relationship dynamically become Two nodes that lost the neighbor relationship or were originally neighbors are no longer connected. In both ways, new neighbor relationships are discovered for existing nodes; when the logical relationship changes, the relevant nodes will find that new neighbors have appeared or neighbors have disappeared. At this time, the node publishes itself to all neighbors. The current tag list, initiates the tag update process.

具体步骤如图4所示：The specific steps are shown in Figure 4:

步骤41.各节点检查逻辑邻居关系是否发生变化；Step 41. Each node checks whether the logical neighbor relationship changes;

步骤42.出现新的节点：当逻辑关系发生变化时，相关的节点会发现出现了新的邻居或有邻居消失，此时节点向所有邻居发布自己当前的标签列表，发起标签更新过程，即步骤3。Step 42. A new node appears: when the logical relationship changes, the relevant node will find that a new neighbor has appeared or a neighbor has disappeared. At this time, the node publishes its current label list to all neighbors, and initiates the label update process, that is, step 3.

步骤43.两个节点之间的逻辑邻居关系发生了变化：即原本没有邻居关系的两个节点动态的变成了邻居关系或原本是邻居的两个节点不再相连；当逻辑关系发生变化时，相关的节点会发现出现了新的邻居或有邻居消失，此时节点向所有邻居发布自己当前的标签列表，发起标签更新过程，即步骤3。Step 43. The logical neighbor relationship between two nodes has changed: that is, the two nodes that originally had no neighbor relationship dynamically become a neighbor relationship or the two nodes that were originally neighbors are no longer connected; when the logical relationship changes , the relevant nodes will find that new neighbors have appeared or some neighbors have disappeared. At this time, the node publishes its current label list to all neighbors, and initiates the label update process, that is, step 3.

本发明方法的贡献主要体现在如下几个方面：The contribution of the inventive method is mainly reflected in the following aspects:

邻居信息与标签交互：每个节点仅要负责计算自己的标签号，整个网络的拓扑信息并不需要被每个节点所知道，每个节点只需要知道邻居节点是哪些，并且可以得到的其标签号列表即可，因此节点应该具有标签交互的能力，可以将自己的标签发布给自己的邻居，从而使得每一个节点知道自己所有的邻居拥有的标签号。分布式社团发现方法中的邻居，对应于传统社团发现算法中相连的节点，并不代表物理链路相连，而是指的两个节点具有一定的社会关系，并可以相互通信。因此不做特殊注明，这里所述的邻居指的均是逻辑上的邻居。Neighbor information and label interaction: each node is only responsible for calculating its own label number, the topology information of the entire network does not need to be known by each node, each node only needs to know which neighbor nodes are, and can get its label The number list is enough, so the node should have the ability of label interaction, and can publish its own label to its neighbors, so that each node knows the label numbers owned by all its neighbors. The neighbor in the distributed community discovery method corresponds to the connected nodes in the traditional community discovery algorithm, and does not mean that the two nodes have a certain social relationship and can communicate with each other. Therefore, no special note is made, and the neighbors mentioned here refer to logical neighbors.

标签更新同步：在集中式的基于标签的社团发现算法中，分为同步和异步两种方式来进行计算，但是不管哪一种方式，当前计算的节点都可以同时看到自己所有邻居的标签号信息，因为集中式的服务器拥有整个网络的拓扑信息。而在分布式的社团发现过程中，由于节点向所有邻居广播自己的标签号，节点才能收到每一个邻居的标签号。但是由于分布式系统中，节点并不能同一时刻接收到邻居发布的标签广播，也不能判定当前收到的标签广播是邻居什么时候发布的。考虑到节点在更新自身标签时，只需要收集到邻居节点当前迭代次数的标签信息即可，因此在节点交互标签信息时，增加迭代次数这个参数，使得节点分辨出收到的邻居标签信息是否可用，节点认为迭代次数最新的标签号为有效标签，每次节点更新自己的标签后，广播的标签信息中携带的迭代次数加1。Label update synchronization: In the centralized label-based community discovery algorithm, there are two ways to calculate, synchronous and asynchronous, but no matter which way, the current calculation node can see the label numbers of all its neighbors at the same time information, because the centralized server has the topology information of the entire network. In the distributed community discovery process, the node can receive the label number of each neighbor because the node broadcasts its own label number to all neighbors. However, in a distributed system, nodes cannot receive label broadcasts issued by neighbors at the same time, nor can they determine when the currently received label broadcasts are issued by neighbors. Considering that when a node updates its own label, it only needs to collect the label information of the current iteration number of the neighbor node. Therefore, when the node exchanges label information, the parameter of iteration number is increased so that the node can distinguish whether the received neighbor label information is available. , the node considers the label number with the latest number of iterations to be a valid label, and each time the node updates its own label, the number of iterations carried in the broadcast label information is increased by 1.

动态维护：由于分布式系统中并没有整个网络的拓扑信息，因此当网络拓扑发生变化，如有新的节点加入，或者有新的逻辑关系出现时，新加入的节点或增加新逻辑关系的节点主动向所有邻居节点发布新的标签号信息，拓扑变化附近的节点根据策略决定是否更新自己的标签号信息，从而动态的维护社团信息。社团变动的涉及范围主要在社团内部以及和社团具有重叠节点的社团内，因此不会扩散到整个网络范围。Dynamic maintenance: Since there is no topology information of the entire network in the distributed system, when the network topology changes, such as new nodes joining or new logical relationships appearing, the newly added nodes or nodes with new logical relationships Actively publish new tag number information to all neighbor nodes, and the nodes near the topology change decide whether to update their own tag number information according to the policy, so as to dynamically maintain community information. The scope of community changes is mainly within the community and within the community that has overlapping nodes with the community, so it will not spread to the entire network.

标签传播控制：如果采用所有标签无限制的传播的方式，经过多次标签交互之后，会划分出多个大型社团，重复社团，从而降低社团发现的效果，这里通过增加传播因子的方式来控制标签号传播的范围，有效的改善了经过多次迭代之后社团划分结果性能下降的问题。Label dissemination control: If all labels are used for unlimited dissemination, after multiple label interactions, multiple large communities will be divided and repeated communities, thereby reducing the effect of community discovery. Here, the label is controlled by increasing the propagation factor The range of number propagation effectively improves the performance degradation of community division results after multiple iterations.

整个方法模型的工作流程如上所述，在标签更新过程中，可以采用不同的更新策略，从而达到更好的社团发现效果，因此本发明的方法可以作为一个框架模块，对所有基于标签的社团发现算法稍作改动，应用在不同的模型中，从而适用于分布式的社团发现算法。The workflow of the entire method model is as described above. During the tag update process, different update strategies can be used to achieve better community discovery results. Therefore, the method of the present invention can be used as a framework module to detect all tag-based community The algorithm is slightly modified and applied in different models, so that it is suitable for distributed community discovery algorithms.

本领域的普通技术人员将会意识到，这里所述的实施例是为了帮助读者理解本发明的原理，应被理解为本发明的保护范围并不局限于这样的特别陈述和实施例。本领域的普通技术人员可以根据本发明公开的这些技术启示做出各种不脱离本发明实质的其它各种具体变形和组合，这些变形和组合仍然在本发明的保护范围内。Those skilled in the art will appreciate that the embodiments described here are to help readers understand the principles of the present invention, and it should be understood that the protection scope of the present invention is not limited to such specific statements and embodiments. Those skilled in the art can make various other specific modifications and combinations based on the technical revelations disclosed in the present invention without departing from the essence of the present invention, and these modifications and combinations are still within the protection scope of the present invention.

Claims

1. one kind based on the mutual distributed Combo discovering method of label, specifically comprises the steps:

Step 1. arranges in the application scenarios information as can be known, specifically comprises: the initialization tag number of each node, and described tag number has uniqueness; The logic neighbours of each node;

Step 2. netinit process: all node is according to the tag number of local unique information initialization self in the network; The weight of this tag number is set to 1, and the propagation factor of tag number is initialized as 1; Local iterations is set is designated as 1, to the tag number of all neighbor node broadcasting oneself; The local neighbours' tag number tabulation of preserving of initialization, each the neighbours' label that arranges in neighbours' tag number tabulation of preservation is update mode not;

Step 3. tag update process: node receives the tag number broadcasting from neighbours, iterations in neighbours' tag number tabulation that contrast receives and the iterations of local respective neighbours list of labels of preserving, if the neighbor list iterations that the iterations in the neighbours' list of labels that receives is preserved greater than this locality, new neighbor list of labels more then, the corresponding lists state is set for upgrading, if be less than or equal to the local neighbor list iterations of preserving, then ignore this broadcasting packet; When the updating mark of all neighbours' tag numbers of nodes records when upgrading, upgrade local label number tabulation, update strategy is: with neighbours' tag number tabulation addition, the propagation factor of label is got propagation factor maximum in neighbours' same label and is deducted predefined propagation coefficient p, if the label propagation factor less than 0, is then deleted label; The label weight addition of same label number, all label weights of normalization, the deletion weight is less than the tag number of preliminary setting parameter 1/v, if all less than 1/v then keep the tag number of weight maximum, if a plurality of such tag numbers are arranged, then select at random the label of a weight limit, delete all the other labels, the tag number that keeps of normalization then, iterations adds 1, broadcast number tabulation of new local label to neighbours, all neighbours' of record list of labels updating mark is set to not upgrade; If after once upgrading, the tag number tabulation of node does not change, and shows that then the community structure that belongs under the node current state is stable.

2. according to claim 1 a kind of based on the mutual distributed Combo discovering method of label, it is characterized in that, also comprise the Dynamic Maintenance process, when occurring triggering the Dynamic Maintenance process when logic neighborhood between new node or two nodes changes, this moment, node was issued the list of labels of oneself to all neighbor nodes, initiated the tag update process.

3. according to claim 1 and 2 a kind ofly it is characterized in that based on the mutual distributed Combo discovering method of label, the described local unique information of step 2 is specially mac address or user name or node ID.