
WO2019109255A1 - Method for inferring scholars' temporal location in academic social network - Google Patents


Info

Publication number
WO2019109255A1
WO2019109255A1 PCT/CN2017/114646 CN2017114646W WO2019109255A1 WO 2019109255 A1 WO2019109255 A1 WO 2019109255A1 CN 2017114646 W CN2017114646 W CN 2017114646W WO 2019109255 A1 WO2019109255 A1 WO 2019109255A1
Authority
WO
WIPO (PCT)
Prior art keywords
time
space
function
correlations
author
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/114646
Other languages
French (fr)
Inventor
Jie Tang
Kan Wu
Bo Gao
Debing Liu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to PCT/CN2017/114646
Publication of WO2019109255A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • STFGM Space-Time Factor Graph Model
  • LEF Local Outlier Factor

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure provide a Space-Time Factor Graph Model (STFGM) that incorporates time and space correlations to infer authors' missing high-resolution affiliations over time in an academic social network. In addition, at a personal global level, different smoothing methods are devised to bridge the "holes" between years and trim the "glitches", according to one of two priority goals: increasing information items with the least precision loss, or increasing precision with the fewest information items trimmed. STFGM outperforms the baselines by 6%-27% on two datasets (Aminer and MAG) and on two precision metrics. Further, the devised smoothing methods gain 5%-18% more items with only a minor precision loss of about 0.05%-1%, or achieve a 2%-7% precision increase with a 3%-7% loss of items, depending on the priority setting. Finally, two applications are developed based on the inference model and smoothing method, which further demonstrate their effectiveness.

Description

METHOD FOR INFERRING SCHOLARS’ TEMPORAL LOCATION IN ACADEMIC SOCIAL NETWORK
FIELD
The present disclosure relates to the field of social network technology, and more particularly, to a method for inferring scholars’ temporal location in an academic social network.
BACKGROUND
Tough competition among personalized information services in a variety of domains has driven the demand for more precise user profiling. As the saying goes, “You cannot judge of a man till you know his whole story”: knowing the affiliations where a person has studied or worked at different times helps to profile him/her better. One common way is to extract a person’s working or studying experiences from the curriculum vitae on his/her personal home page. However, automatically extracting formatted affiliations with time from an unstructured biography paragraph is still a big challenge.
There are many studies concerning the mobility of scientists, which need a lot of affiliation information about researchers. Traditional methods, such as biographical questionnaires or individual interviews, can only collect limited data. It is necessary to develop a powerful approach to automatically infer one’s affiliation.
Luckily, with the development of academic social networks such as Aminer and MAG, we find a new opening. We can obtain an author’s coauthors and affiliations at different times from his/her included papers. However, there are some obstacles. Our sample statistics of about 1.5 million authors in the Aminer academic network show that 0.55 million authors (about 1/3 of the sample) don’t have any affiliations in their papers. And in a modern academic network with more than one hundred million authors, it’s often hard to ascribe papers to the right authors, since there are different authors with the same name, the same author with different name spellings and abbreviations, even different papers with the same name, etc. So we need to infer the missing information from sparse and noisy data.
Moreover, traditional location inferring methods rarely consider time, although people tend not to stay at one place all their lives. Inferring temporal location is more challenging. What’s more, many location inferring methods work only at a very coarse granularity with very limited classification labels, e.g., inferring a country or a state/city in a country. However, the number of higher-resolution institutions, such as individual universities and research institutions, is huge.
Most existing methods infer users’ affiliations independently, using an indirect metric over neighbors such as the most frequent value or the geometric median. In the present disclosure, we illustrate how to infer all the affiliations simultaneously by introducing space correlations between authors.
Another challenge that cannot be neglected is that an author may not publish papers every year. After inferring the missing affiliations from authors’ papers, the question is how to fill in the gaps for years without papers without hurting overall precision.
SUMMARY
Embodiments of the present disclosure aim to solve at least one of the technical problems in the related art.
In order to achieve the above object, embodiments of the present disclosure provide a method for inferring scholars’ temporal location in an academic social network.
To the best of our knowledge, this is the first attempt to infer people’s affiliations at different times in history. The present disclosure proposes a Space-Time Factor Graph Model (STFGM) which outperforms the baselines by 6%-27% on two datasets and on two precision metrics. At the personal global level, the devised smoothing methods gain 5%-18% more items with a minor precision loss of about 0.05%-1%, or achieve a 2%-7% precision increase with a 3%-7% loss of items, depending on the priority setting. Based on the proposed model and smoothing method, the present disclosure develops an application which can automatically list a given author’s career experiences and draw the trajectory on a map. The service will soon be open to the public. Based on the inferred trajectories of many individuals, the present disclosure develops another application to show and study scientists’ group migrations covering a century, and obtains some interesting findings.
Before proceeding, we first introduce two baseline solutions for this problem. The first is based on the idea that the opportunities to work with someone in the same affiliation are much greater than with someone in a different affiliation. Our survey of 2 million randomly selected papers confirmed this assumption: about 71.3% of papers have two or more coauthors from the same affiliation. To predict a missing affiliation of an author at a time t, we can count the affiliations that her/his coauthors belong to and simply assign the affiliation with the maximum count to her/him. We call this method the Statistics-Based model.
The binary classification problem is defined as follows.
At time t, each author a is associated with individual features vec (a) t while each coauthor relation (ai1, ai2) is associated with coauthor features vec (ai1, ai2) t. For each coauthor relation (ai1, ai2) , a binary label
Figure PCTCN2017114646-appb-000001
is used to indicate whether ai1 and ai2 belong to the same affiliation. Then, given training data, a two-class rankSVM model is built based on Maximum Likelihood Estimation (MLE) :
Figure PCTCN2017114646-appb-000002
Figure PCTCN2017114646-appb-000003
is a combination of individual and coauthor features. This approach is called the Pair-wise Comparison Model.
Previous works usually focused on expertise matching, but seldom considered whether an expert would decline the invitation.
The Statistics-based Model is easy to implement, but it is coarse and cannot capture the properties and behavior features of the target author and his/her coauthors at the microscopic level. The Pair-wise Comparison Model focuses on the features of the different authors at the microscopic level, and thus can fit the training datasets better than the former. However, it assumes that the instances to be predicted are independent of each other, which is not the case. In fact, there are many correlations among the instances that can help increase performance.
Embodiments of the present disclosure propose the Space-Time Factor Graph here. Let G= (V, E) denote an undirected graph, where V and E are the sets of nodes and edges, respectively. Each node is a tuple (t, ai1, ai2) indicating that authors ai1 and ai2 co-wrote a paper at time t. In terms of edges, two types of correlations are considered, namely space correlations and time correlations. Additionally, individual and coauthor features have been mentioned before. The objective is to maximize the probability P (Y|G, X) .
Three kinds of heuristic knowledge are incorporated in our Space-Time Factor Graph Model for predicting the affiliation likeness between authors. They are introduced below.
Heuristic knowledge one: the first one is very straightforward: the respective demographic features of the two authors and the things they did together must have a direct connection with the likeness between them. For example, if two persons are both undergraduate students, are of similar age, have coauthored a lot of papers, and the topics of their respective papers are close, we can infer with much confidence that they may be in the same school or university.
Heuristic knowledge two: the second one is that if we know author B and author C are in the same affiliation or have many common connections, and the likeness between author A and author B is high, then we can infer that the probability of A and C being in the same affiliation is also high. We refer to the correlation between different authors at the same time as space correlation.
Heuristic knowledge three: thirdly, if we know author A and author B were in the same affiliation last year, then A and B may continue to be in the same affiliation this year. Moreover, if we know A and B are in the same affiliation next year as well, then the probability of them being in the same affiliation this year also increases. We refer to the correlation between different times on the same author pair as time correlation.
A detailed description of the proposed STFGM is given. In STFGM, each tuple of Year t, Author ai1, Author ai2 corresponds to an observation instance. We define the same number of hidden binary-valued variables, one associated with each observation instance, representing the relation between the two authors in Year t. More concretely, if Author ai1 has the same affiliation as Author ai2 at time t, then the value of the hidden variable related to the tuple (t, ai1, ai2) is 1. Otherwise, it is 0. Based on the previous heuristic knowledge, we define three factors.
Attribute factor function: It captures the features of each tuple (t, ai1, ai2) , including the respective features of the two authors and the concurrent features between them. The function characterizes how the observed tuple features contribute to the likeness of the authors in the tuple. The function is defined as an exponential-linear function:
Figure PCTCN2017114646-appb-000004
Figure PCTCN2017114646-appb-000005
denotes a classification label (whether the two authors in author pairs <ai1, ai2> are in the same affiliation at time t ) , 
Figure PCTCN2017114646-appb-000006
is the weighting vector, Φ is the vector of feature functions, 
Figure PCTCN2017114646-appb-000007
is the corresponding feature vector of the observation tuple (t, ai1, ai2) concatenated by the vectors of ai1 and ai2’s respective features and shared common features at time t.
Space factor function: The space factor function is constructed from heuristic knowledge two above, and captures the space correlation between hidden variables at the same time. It is also defined as an exponential-linear function:
Figure PCTCN2017114646-appb-000009
denotes neighbors which have space correlations with
Figure PCTCN2017114646-appb-000010
Figure PCTCN2017114646-appb-000011
C is the number of types of space correlations; 
Figure PCTCN2017114646-appb-000012
Figure PCTCN2017114646-appb-000013
Time factor function: The time factor function is constructed from heuristic knowledge three above, and captures the time correlation between hidden variables. It is also defined as an exponential-linear function:
Figure PCTCN2017114646-appb-000014
Figure PCTCN2017114646-appb-000015
denotes neighbors which have time correlations with
Figure PCTCN2017114646-appb-000016
Figure PCTCN2017114646-appb-000017
C' is the number of types of time correlations; 
Figure PCTCN2017114646-appb-000018
Figure PCTCN2017114646-appb-000019
Model Learning: Once the authors’ attributes and the space and time correlations are modeled, the next goal is to combine all the factors, observation instances and hidden variables into a unified model. NS and NT are reused to denote the sets of all space and time relations, without confusion. Define
Figure PCTCN2017114646-appb-000020
and
Figure PCTCN2017114646-appb-000021
which are two sets representing all the observation instances (including instance attributes and relations) and the hidden variables in the model respectively. Directly modeling the joint probability P (X, Y) is very difficult, because it requires modeling a distribution over all the possible values of X. Fortunately, we can compute the joint conditional probability P (Y|X) , which avoids computing P (X) .
Figure PCTCN2017114646-appb-000022
Figure PCTCN2017114646-appb-000023
G is an aggregation of the factor functions over all the hidden variables. 
Figure PCTCN2017114646-appb-000024
is the parameter configuration of the model, 
Figure PCTCN2017114646-appb-000025
is the global normalization term that keeps the joint conditional probability between 0 and 1. Because Y is partially labeled, we define our log-likelihood objective function on the labeled data YL. We use Y'|YL to denote a label configuration Y' that satisfies all the known labels YL.
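As a rough illustration of why conditioning on X is convenient, the joint conditional probability of a tiny graph can be computed by brute force. This is only a sketch under stated assumptions: the per-node attribute scores, the single shared `edge_weight`, and the agreement-based edge term are hypothetical stand-ins for the factor functions, whose exact forms are given in the formula images and are not reproduced here.

```python
import math
from itertools import product

def log_score(y, attr_scores, edges, edge_weight):
    """Unnormalized log-score of one label configuration y.

    attr_scores[i][label] is a hypothetical attribute score for node i;
    each edge (i, j) adds edge_weight when the two hidden labels agree.
    """
    s = sum(attr_scores[i][label] for i, label in enumerate(y))
    s += sum(edge_weight for i, j in edges if y[i] == y[j])
    return s

def log_likelihood(attr_scores, edges, edge_weight, known):
    """log P(Y^L | X) for partially labeled data.

    known maps node index -> observed label. The numerator sums over all
    configurations Y' that agree with the known labels; the denominator is
    the global normalization term over every configuration.
    """
    n = len(attr_scores)
    all_scores, consistent = [], []
    for y in product((0, 1), repeat=n):
        s = log_score(y, attr_scores, edges, edge_weight)
        all_scores.append(s)
        if all(y[i] == label for i, label in known.items()):
            consistent.append(s)
    log_z = math.log(sum(math.exp(s) for s in all_scores))
    return math.log(sum(math.exp(s) for s in consistent)) - log_z

# Hypothetical three-node graph: node 0 is labeled 1, the others are hidden.
attr = [(0.0, 1.2), (0.3, 0.1), (0.0, 0.0)]
edges = [(0, 1), (1, 2)]
print(log_likelihood(attr, edges, edge_weight=0.5, known={0: 1}))
```

Enumerating all 2^n configurations is only feasible for toy graphs; it is used here purely to make the role of the normalization term and of Y'|YL concrete.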
Additional aspects and advantages of embodiments of the present disclosure will be given in part in the following descriptions, become apparent in part from the following descriptions, or be learned from the practice of the embodiments of the present disclosure.
DETAILED DESCRIPTION
Reference will be made in detail to embodiments of the present disclosure. The embodiments described herein are explanatory and illustrative, and are not to be construed as limiting the present disclosure.
Before proceeding, two baseline solutions for this problem are introduced first. The first is based on the idea that the opportunities to work with someone in the same affiliation are much greater than with someone in a different affiliation. Our survey of 2 million randomly selected papers confirmed this assumption: about 71.3% of papers have two or more coauthors from the same affiliation. To predict a missing affiliation of an author at a time t, we can count the affiliations that her/his coauthors belong to and simply assign the affiliation with the maximum count to her/him. We call this method the Statistics-Based model.
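The counting rule of the Statistics-Based model can be sketched in a few lines. The function name and the sample data are hypothetical; `None` stands for a coauthor whose affiliation is itself unknown, and how ties are broken is an assumption, since the text does not specify it.

```python
from collections import Counter

def predict_affiliation(coauthor_affiliations):
    """Return the most frequent known affiliation among coauthors, or None."""
    known = [aff for aff in coauthor_affiliations if aff is not None]
    if not known:
        return None  # no coauthor evidence for this author at this time
    # most_common(1) yields [(affiliation, count)]; among equal counts,
    # the affiliation encountered first wins (an arbitrary tie-break).
    return Counter(known).most_common(1)[0][0]

# Hypothetical affiliations of one target author's coauthors at time t
coauthors_at_t = ["Tsinghua University", None, "Tsinghua University", "MIT"]
print(predict_affiliation(coauthors_at_t))
```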
By merely counting the frequency of affiliations among one’s coauthors, we ignore some helpful information. Intuitively, we can formulate the problem as a multi-label classification where the label classes are the set of available affiliations. However, in our basic statistics of about 2 million papers, the number of distinct affiliations exceeds 10 thousand. The 1 vs. 10k+ classification model is likely to fail with limited features. To relieve this problem, we first restrict the candidate affiliations to those of one’s coauthors. Furthermore, we change the multi-label classification problem into a binary classification problem: instead of predicting the affiliation label directly, the model is designed to predict whether the target author belongs to the same affiliation as a given coauthor.
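The two reductions described above (restricting candidates to coauthors' affiliations, then turning the task into same-affiliation binary instances) might look as follows. This is a minimal sketch: the function names, the dictionary layout keyed by `(author, year)`, and the use of `None` for a label still to be predicted are all illustrative assumptions.

```python
def candidate_affiliations(coauthors, year, known_affiliation):
    """Candidate set for a target author: only affiliations of its coauthors."""
    return {known_affiliation[(c, year)]
            for c in coauthors if (c, year) in known_affiliation}

def build_pair_instances(target, coauthors_by_year, known_affiliation):
    """Build (year, target, coauthor, label) binary instances.

    label is 1 if both affiliations are known and equal, 0 if known and
    different, and None when either side is unknown (to be predicted).
    """
    instances = []
    for year in sorted(coauthors_by_year):
        target_aff = known_affiliation.get((target, year))
        for coauthor in coauthors_by_year[year]:
            coauthor_aff = known_affiliation.get((coauthor, year))
            if target_aff is None or coauthor_aff is None:
                label = None
            else:
                label = int(target_aff == coauthor_aff)
            instances.append((year, target, coauthor, label))
    return instances

# Hypothetical data: author "a" has no known affiliation in 2016
known = {("a", 2015): "U1", ("b", 2015): "U1", ("c", 2015): "U2", ("b", 2016): "U1"}
pairs = build_pair_instances("a", {2015: ["b", "c"], 2016: ["b"]}, known)
print(pairs)
```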
We define the binary classification problem as follows.
At time t, each author a is associated with individual features vec (a) t while each coauthor relation (ai1, ai2) is associated with coauthor features vec (ai1, ai2) t. For each coauthor relation (ai1, ai2) , a binary label
Figure PCTCN2017114646-appb-000026
is used to indicate whether ai1 and ai2 belong to the same affiliation. Then, given training data, a two-class rankSVM model is built based on Maximum Likelihood Estimation (MLE) :
Figure PCTCN2017114646-appb-000027
Figure PCTCN2017114646-appb-000028
is the combination of individual and coauthor features. This approach is called the Pair-wise Comparison Model.
Previous works usually focused on expertise matching, but seldom considered whether an expert would decline the invitation.
The Statistics-based Model is easy to implement, but it is coarse and cannot capture the properties and behavior features of the target author and his/her coauthors at the microscopic level. The Pair-wise Comparison Model focuses on the features of the different authors at the microscopic level, and thus can fit the training datasets better than the former. However, it assumes that the instances to be predicted are independent of each other, which is not the case. In fact, there are many correlations among the instances that can help increase performance.
Embodiments of the present disclosure propose the Space-Time Factor Graph here. Let G= (V, E) denote an undirected graph, where V and E are the sets of nodes and edges, respectively. Each node is a tuple (t, ai1, ai2) indicating that authors ai1 and ai2 co-wrote a paper at time t. In terms of edges, two types of correlations are considered, namely space correlations and time correlations. Additionally, individual and coauthor features have been mentioned before. The objective is to maximize the probability P (Y|G, X) .
Three kinds of heuristic knowledge are incorporated in our Space-Time Factor Graph Model for predicting the affiliation likeness between authors. They are introduced below.
Heuristic knowledge one: the first one is very straightforward: the respective demographic features of the two authors and the things they did together must have a direct connection with the likeness between them. For example, if two persons are both undergraduate students, are of similar age, have coauthored a lot of papers, and the topics of their respective papers are close, we can infer with much confidence that they may be in the same school or university.
Heuristic knowledge two: the second one is that if we know author B and author C are in the same affiliation or have many common connections, and the likeness between author A and author B is high, then we can infer that the probability of A and C being in the same affiliation is also high. We refer to the correlation between different authors at the same time as space correlation.
Heuristic knowledge three: thirdly, if we know author A and author B were in the same affiliation last year, then A and B may continue to be in the same affiliation this year. Moreover, if we know A and B are in the same affiliation next year as well, then the probability of them being in the same affiliation this year also increases. We refer to the correlation between different times on the same author pair as time correlation.
A detailed description of the proposed STFGM is given. In STFGM, each tuple of Year t, Author ai1, Author ai2 corresponds to an observation instance. We define the same number of hidden binary-valued variables, one associated with each observation instance, representing the relation between the two authors in Year t. More concretely, if Author ai1 has the same affiliation as Author ai2 at time t, then the value of the hidden variable related to the tuple (t, ai1, ai2) is 1. Otherwise, it is 0. Based on the previous heuristic knowledge, we define three factors.
Attribute factor function: It captures the features of each tuple (t, ai1, ai2) , including the respective features of the two authors and the concurrent features between them. The function characterizes how the observed tuple features contribute to the likeness of the authors in the tuple. The function is defined as an exponential-linear function:
Figure PCTCN2017114646-appb-000029
Figure PCTCN2017114646-appb-000030
denotes a classification label (whether the two authors in the author pair <ai1, ai2> have the same affiliation at time t),
Figure PCTCN2017114646-appb-000031
is the weighting vector, Φ is the vector of feature functions, 
Figure PCTCN2017114646-appb-000032
is the corresponding feature vector of the observation tuple (t, ai1, ai2) concatenated by the vectors of ai1 and ai2’s respective features and shared common features at time t.
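To make the notion of an exponential-linear attribute factor concrete, the sketch below evaluates an unnormalized factor exp(λ · Φ(y, x)) for both values of the hidden label and normalizes locally. The particular feature function Φ(y, x) = y · x, the local normalization over y ∈ {0, 1}, the function names, and the feature values are assumptions for illustration; the disclosure defines the factor through its own equation.

```python
import math

def attribute_factor(y, x, lam):
    """Unnormalized exponential-linear factor exp(lam . phi(y, x)).

    Assumption: phi(y, x) = y * x, so the features only contribute when the
    hidden label y is 1 (same affiliation).
    """
    score = sum(l * (y * xi) for l, xi in zip(lam, x))
    return math.exp(score)

def local_probability(x, lam):
    """P(y = 1 | x) when only this one factor is considered."""
    f1 = attribute_factor(1, x, lam)
    f0 = attribute_factor(0, x, lam)
    return f1 / (f0 + f1)

# Hypothetical tuple features: [number of coauthored papers, topic similarity]
x = [3.0, 0.8]
lam = [0.5, 1.0]  # hypothetical weighting vector
p_same = local_probability(x, lam)
```

Under these assumptions the factor reduces to a logistic model of the tuple features, which is why the space and time factors are needed to capture dependencies between tuples.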
Space factor function: The construction of the space factor function is based on heuristic knowledge two above, which captures the space correlation between the hidden variables at the same time. It is also defined as an exponential-linear function:
Figure PCTCN2017114646-appb-000033
Figure PCTCN2017114646-appb-000034
denotes neighbors which have space correlations with
Figure PCTCN2017114646-appb-000035
Figure PCTCN2017114646-appb-000036
C is the number of types of space correlations; 
Figure PCTCN2017114646-appb-000037
Figure PCTCN2017114646-appb-000038
Time factor function: The construction of the time factor function is based on heuristic knowledge three above, which captures the time correlation between the hidden variables. It is also defined as an exponential-linear function:
Figure PCTCN2017114646-appb-000039
Figure PCTCN2017114646-appb-000040
denotes neighbors which have time correlations with
Figure PCTCN2017114646-appb-000041
Figure PCTCN2017114646-appb-000042
C' is the number of types of time correlations; 
Figure PCTCN2017114646-appb-000043
Figure PCTCN2017114646-appb-000044
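For orientation, exponential-linear correlation factors in factor-graph models are commonly written in the form below. This is a hedged sketch consistent with the definitions of C, C', NS and NT in the text, not a transcription of the equation figures: the weights alpha_c and beta_c, the pairwise feature functions g_c and h_c, the normalizers Z_alpha and Z_beta, and the per-type neighbor sets are assumed notation.

```latex
% Assumed notation: y_i is the hidden variable of tuple i;
% N_S^c(y_i), N_T^c(y_i) are its neighbors under correlation type c.
\begin{align}
f_S\bigl(y_i, N_S(y_i)\bigr) &= \frac{1}{Z_\alpha}
  \exp\Bigl\{ \sum_{c=1}^{C} \sum_{y_j \in N_S^{c}(y_i)} \alpha_c \, g_c(y_i, y_j) \Bigr\} \\
f_T\bigl(y_i, N_T(y_i)\bigr) &= \frac{1}{Z_\beta}
  \exp\Bigl\{ \sum_{c=1}^{C'} \sum_{y_j \in N_T^{c}(y_i)} \beta_c \, h_c(y_i, y_j) \Bigr\}
\end{align}
```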
Model Learning: Once the authors’ attributes and the space and time correlations have been modeled, the next goal is to combine all the factors, observation instances and hidden variables into a unified model. NS and NT are reused, without confusion, to denote the sets of all space and time relations. Define
Figure PCTCN2017114646-appb-000045
and
Figure PCTCN2017114646-appb-000046
which are two sets representing all the observation instances (including instance attributes and relations) and the hidden variables in the model, respectively. Directly modeling the joint probability P(X, Y) is very difficult, because it requires modeling the distribution over all possible values of X. Instead, the conditional probability P(Y|X) can be computed, which avoids computing P(X).
Figure PCTCN2017114646-appb-000047
Figure PCTCN2017114646-appb-000048
G is an aggregation of the factor functions over all the hidden variables. 
Figure PCTCN2017114646-appb-000049
is the parameter configuration of the model, 
Figure PCTCN2017114646-appb-000050
is the global normalization term, which keeps the joint conditional probability between 0 and 1. Because Y is only partially labeled, the log-likelihood objective function is defined on the labeled data YL. Y'|YL denotes a label configuration Y' that satisfies all the known labels YL.
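The role of the global normalization term and of the configurations Y'|YL can be illustrated by brute-force enumeration on a tiny graph. The sketch below is an assumption-laden toy, not the learning procedure of the disclosure: it uses one scalar attribute feature per tuple with Φ(y, x) = y · x, a single agreement feature on every edge, hypothetical parameters `lam` and `mu`, and exhaustive enumeration, which is feasible only for a handful of hidden variables.

```python
import math
from itertools import product

def total_score(y, x, edges, lam, mu):
    """theta . S(Y, X): attribute terms plus pairwise agreement terms.

    Assumptions: attribute feature phi(y_i, x_i) = y_i * x_i (scalar), and a
    single correlation feature that fires when two linked labels agree.
    """
    attr = sum(lam * yi * xi for yi, xi in zip(y, x))
    corr = sum(mu for i, j in edges if y[i] == y[j])
    return attr + corr

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(x, edges, labels, lam, mu):
    """log P(Y^L | X): log-sum over configurations consistent with the
    known labels, minus the log of the global normalization term Z."""
    n = len(x)
    all_cfg = list(product((0, 1), repeat=n))
    consistent = [y for y in all_cfg
                  if all(y[i] == v for i, v in labels.items())]
    num = log_sum_exp([total_score(y, x, edges, lam, mu) for y in consistent])
    log_z = log_sum_exp([total_score(y, x, edges, lam, mu) for y in all_cfg])
    return num - log_z

x = [2.0, -1.0, 0.5]      # hypothetical scalar feature per tuple
edges = [(0, 1), (1, 2)]  # hypothetical space/time links between tuples
labels = {0: 1}           # only tuple 0 carries a known label
ll = log_likelihood(x, edges, labels, lam=1.0, mu=0.5)
```

The numerator sums over every labeling that agrees with the known labels, so unlabeled tuples are marginalized out. Because the number of configurations grows as 2^n, a realistic graph would require an approximate inference method in place of the enumeration used here.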
After an author’s affiliations in different years have been inferred, “holes” and/or “glitches” may still exist in individual years. For example, no papers may have been collected, or the author may not have published any papers, in some years; the preceding algorithms cannot infer affiliations without observation instances, so “holes” appear. As another example, a coauthor predicted to share the affiliation may have two or more affiliations in a given year, some of which do not belong to the query author. For that year it is not easy to tell which one is wrong, so “glitches” arise. Smoothing consists of stretching the data to bridge the “holes” and trimming out the “glitches”. These are two inverse processes: stretching the data may introduce wrong information and reduce precision, whereas trimming the data increases precision but may discard useful information.
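One simple way to stretch the data across a hole is sketched below. The text does not specify the stretching rule, so the rule used here is an assumption: a hole is filled only when it spans at most `max_gap` years and the predictions on both sides of it agree. The function name, the parameter and the affiliation labels are hypothetical.

```python
def stretch(timeline, max_gap=2):
    """Fill missing years when the same affiliation brackets the gap.

    timeline: dict mapping year -> affiliation for years with a prediction.
    Assumption: a hole is filled only if it spans at most max_gap years and
    both neighboring predictions agree.
    """
    filled = dict(timeline)
    years = sorted(timeline)
    for prev, nxt in zip(years, years[1:]):
        gap = nxt - prev - 1
        if 0 < gap <= max_gap and timeline[prev] == timeline[nxt]:
            for year in range(prev + 1, nxt):
                filled[year] = timeline[prev]
    return filled

# Hypothetical inferred affiliations with holes in 2011 and 2014-2016.
predicted = {2010: "Univ X", 2012: "Univ X", 2013: "Univ Y", 2017: "Univ Y"}
smoothed = stretch(predicted)
```

Here 2011 is filled with "Univ X", while the three-year hole before 2017 is left open. A larger `max_gap` bridges more holes at the cost of precision, which is the trade-off described above.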
The trimming algorithm used is Local Outlier Factor (LOF), which identifies density-based local anomalies. The Google Maps API is used to convert the affiliations into latitude-longitude pairs. Each point fed into LOF is a tuple of latitude, longitude and time t, so that the information is smoothed in both space and time.
Although explanatory embodiments have been shown and described, it would be appreciated by those skilled in the art that the above embodiments cannot be construed to limit the present disclosure, and changes, alternatives, and modifications can be made in the embodiments without departing from the scope of the present disclosure.
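The trimming step can be sketched with a small, brute-force LOF computation over (latitude, longitude, year) points. This is a simplified illustration under stated assumptions: exactly k nearest neighbors are used (ties are not expanded), plain Euclidean distance is applied to unscaled coordinates, the points are distinct, the coordinates are hard-coded illustrative values in place of a geocoding call, and the cut-off `threshold` of 1.5 is hypothetical.

```python
import math

def lof_scores(points, k=2):
    """Local Outlier Factor of each point (brute force, Euclidean distance)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbors of each point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # Distance from each point to its k-th nearest neighbor.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        return len(reach) / sum(reach)

    dens = [lrd(i) for i in range(n)]
    # LOF: average neighbor density relative to the point's own density.
    return [sum(dens[j] for j in knn[i]) / (k * dens[i]) for i in range(n)]

# Illustrative (latitude, longitude, year) points; the last one is a glitch.
points = [
    (40.00, 116.33, 2010),
    (40.00, 116.33, 2011),
    (40.00, 116.33, 2012),
    (40.00, 116.33, 2013),
    (37.43, -122.17, 2011),
]
scores = lof_scores(points)
threshold = 1.5  # hypothetical cut-off
kept = [p for p, s in zip(points, scores) if s <= threshold]
```

Points in a dense, consistent run score close to 1, while the isolated point scores far above the threshold and is trimmed. In practice the relative scaling of degrees and years would need to be chosen deliberately, since it determines how strongly space and time each influence the density estimate.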

Claims (1)

  1. A method for inferring scholars’ temporal location in academic social network, comprising: building a two-class rankSVM model based on Maximum Likelihood Estimation (MLE) :
    Figure PCTCN2017114646-appb-100001
    wherein, 
    Figure PCTCN2017114646-appb-100002
    is a combination of individual and coauthor features, this approach being called the Pair-wise Comparison Model;
    capturing an attribute factor function of each tuple (t, ai1, ai2) by an exponential-linear function:
    Figure PCTCN2017114646-appb-100003
    wherein, the function characterizes how the observed tuple features contribute to the likeness of the authors in the tuple; 
    Figure PCTCN2017114646-appb-100004
    denotes a classification label (whether the two authors in the author pair <ai1, ai2> have the same affiliation at time t); 
    Figure PCTCN2017114646-appb-100005
    is the weighting vector; Φ is the vector of feature functions; 
    Figure PCTCN2017114646-appb-100006
    is the corresponding feature vector of the observation tuple (t, ai1, ai2) concatenated by the vectors of ai1 and ai2’s respective features and shared common features at time t;
    capturing a space factor function by an exponential-linear function:
    Figure PCTCN2017114646-appb-100007
    wherein, the construction of the space factor function is based on the heuristic knowledge two mentioned above which captures the space correlation between the hidden variables at the same time; 
    Figure PCTCN2017114646-appb-100008
    denotes neighbors which have space correlations with
    Figure PCTCN2017114646-appb-100009
    C is the number of types of space correlations; 
    Figure PCTCN2017114646-appb-100010
    Figure PCTCN2017114646-appb-100011
    capturing a time factor function by an exponential-linear function:
    Figure PCTCN2017114646-appb-100012
    wherein, the construction of the time factor function is based on the third heuristic knowledge which captures the time correlation between the hidden variables; 
    Figure PCTCN2017114646-appb-100013
    denotes neighbors which have time correlations with
    Figure PCTCN2017114646-appb-100014
    C' is the number of types of time correlations; 
    Figure PCTCN2017114646-appb-100015
    obtaining a unified model based on features of author’s attributes, space and time correlations, observation instance and hidden variables, and computing a joint conditional probability P(Y|X) by formula:
    Figure PCTCN2017114646-appb-100016
    wherein, 
    Figure PCTCN2017114646-appb-100017
    G is an aggregation of the factor functions over all the hidden variables; 
    Figure PCTCN2017114646-appb-100018
    is the parameter configuration of the model, 
    Figure PCTCN2017114646-appb-100019
    is the global normalization term making the joint conditional probability value between 0 and 1; because Y is partially labeled, the log-likelihood objective function is defined on the labeled data YL; Y'|YL denotes the label configuration Y' that satisfies all the known labels YL.
PCT/CN2017/114646 2017-12-05 2017-12-05 Method for inferring scholars' temporal location in academic social network Ceased WO2019109255A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/114646 WO2019109255A1 (en) 2017-12-05 2017-12-05 Method for inferring scholars' temporal location in academic social network

Publications (1)

Publication Number Publication Date
WO2019109255A1 (en) 2019-06-13

Family

ID=66750411

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/114646 Ceased WO2019109255A1 (en) 2017-12-05 2017-12-05 Method for inferring scholars' temporal location in academic social network

Country Status (1)

Country Link
WO (1) WO2019109255A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115481215A (en) * 2021-05-31 2022-12-16 华东师范大学 Partner prediction method and prediction system based on temporal partner knowledge graph

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855552A (en) * 2011-06-13 2013-01-02 索尼公司 Information processing apparatus, information processing method, and program
US20130110754A1 (en) * 2011-10-28 2013-05-02 Research In Motion Limited Factor-graph based matching systems and methods

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIN, HUIJIE ET AL.: "Detecting Stress Based on Social Interactions in Social Networks", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, vol. 29, no. 9, 30 September 2017 (2017-09-30), XP55615233 *
TANG, WENBIN ET AL.: "Learning to Infer Social Ties in Large Networks", PROCEEDINGS OF THE EUROPEAN CONFERENCE ON MACHINE LEARNING AND PRINCIPLES AND PRACTICE KNOWLEDGE DISCOVERY IN DATABASES (ECML/PKDD'11, 23 September 2016 (2016-09-23), XP047463629 *
WANG, CHI ET AL.: "Mining Advisor-Advisee Relationships from Research Publication Networks", PROCEEDINGS OF THE SIXTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'10, 28 July 2010 (2010-07-28), pages 1 - 7, XP058270578 *

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17934279

Country of ref document: EP

Kind code of ref document: A1