
CN109299167B - A visualization method to display family migration history and family development - Google Patents

Info

Publication number
CN109299167B
Authority
CN
China
Prior art keywords
family
character
influence
migration
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201811158830.1A
Other languages
Chinese (zh)
Other versions
CN109299167A (en)
Inventor
夏理超
陈锦言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201811158830.1A
Publication of CN109299167A
Application granted
Publication of CN109299167B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a visualization method for displaying family migration history and family development, comprising the following steps: collating person information; calculating each person's influence; building the person's family tree; visualizing influence, in which a rose diagram shows the influence of all clan members who lived in the same dynasty, were registered at the same place, and share the same ancestor as the person, the area of each petal representing one person's influence and the total area of the rose representing the summed influence of all clan members living in that dynasty and place, which indicates the prosperity and social prestige of the person's family; visualizing the family migration process; predicting the causes of family migration; and spatio-temporal visualization of family migration and social influence.

Description

Visualization method for displaying family migration history and family development condition
Technical Field
The invention relates to a machine learning-based visualization method.
Background
Advances in data mining and visualization offer a new approach to traditional historical research in the humanities. By digitizing historical documents, historians provide computer scientists with good research material. Computer scientists can visualize these electronic documents to reveal historical patterns more clearly, while machine learning algorithms can help historians make better-informed judgments.
Disclosure of Invention
The invention provides an effective visualization method that better shows the migration and development of a person's family. The technical solution is as follows:
A visualization method for displaying family migration history and family development comprises the following steps:
(1) Collating person information
The data are preprocessed into records containing each person's name, the latitude and longitude of the person's place of household registration, the dynasty in which the person lived, the person's father, and the person's achievement information. The achievement information comprises: the person's literary works, occupation, mode of entry into officialdom, social relations, kinship relations, and the major events in which the person took part;
(2) Calculating personal influence
(3) Building the person's family tree
Using a recursive algorithm, the father-son relationships of each person are searched recursively and checked against the rule 0 < father's birth time - son's birth time < 100, until no further father-son relationship can be found; the person reached at that point is recorded as the ancestor of the clan. All persons whose final ancestor is the same person belong to the same clan;
(4) Visualizing personal influence
A rose diagram is used to visualize the influence of all clan members who lived in the same dynasty, were registered at the same place, and share the same ancestor as the person. The area of each petal represents the influence of one person, and the total area of the rose represents the summed influence of all clan members living in that dynasty and place, which indicates the prosperity and social prestige of the person's family;
(5) Visualizing the family migration process
Animated straight lines with arrows represent the direction and scale of the family's migrations, and the dynasty in which each migration occurred is recorded. The arrow of a migration line gives the direction of migration, the two ends of the line represent the family's original location and its destination, and the width of the line represents the number of clan members who migrated;
(6) Predicting the causes of family migration
The factors in the database that may influence family migration are collated and encoded as 1 (present) or 0 (absent). A PCA dimensionality-reduction algorithm is first applied to the features to prevent overfitting; a multi-class logistic regression model is then trained on the reduced data, with cross-validation used to balance bias and variance. The number of dimensions retained after PCA and the training parameters of the logistic regression model are tuned repeatedly until the model's accuracy is maximized, and the model is saved. The final model predicts the cause of each recorded migration;
(7) Spatio-temporal visualization of family migration and social influence
With dynasties as the time axis and a two-dimensional map as the display plane, the geographic locations where persons lived are marked and the predicted migration causes are displayed, so that the family's migration process and the change in the family's social influence are shown dynamically over time.
In step (2), crawler technology is used to obtain the search-engine entry attention of the literary works stored in the database, and the influence factor of the person's literary works is calculated according to formula (1); the person's occupations are scored according to occupational status in ancient China, and the occupational influence factor is calculated according to formula (2); the search-engine entry attention of the major events in which the person took part is counted and normalized to the range 0-100, and the influence factors of those events are added up according to formula (3); the influence of the person's social relations is computed from the degree to which others attach to the person, as in formula (4); finally, the overall score of the person's social influence is given by formula (5);
Figure BDA0001819551960000021 (formula (1))
$I_1$ represents the influence factor of the person's literary works, $P_{1i}$ represents the influence factor of the $i$-th work the person helped to write or compile, and $E_1$ is the influence factor of any work currently stored in the database;
Figure BDA0001819551960000022 (formula (2))
$I_2$ represents the influence factor of the person's occupations, and $P_{2i}$ represents the influence factor of the $i$-th occupation the person held;
Figure BDA0001819551960000031 (formula (3))
$I_3$ represents the influence factor of the major events in which the person took part, $P_{3i}$ represents the influence factor of the $i$-th such event, and $E_3$ is the influence factor of any major event currently stored in the database;
Figure BDA0001819551960000032 (formula (4))
$I_4$ represents the influence factor of the person's social relations, $P_{4i}$ represents the number of people attached to the person, and $E_4$ is the number of people attached to any person currently stored in the database;
$I_{sum} = I_1 + I_2 + I_3 + I_4$    formula (5)
$I_{sum}$ represents the person's influence.
Drawings
FIG. 1 is a flow chart of the method
FIG. 2 is a flow chart of family generation for a person
FIG. 3 is a flow chart of training and prediction for the family migration cause model
FIG. 4 is a visualization of a family's migration and influence
Detailed Description
The invention combines spatio-temporal information to provide a comprehensive and effective visualization of how a family's influence changes and how the family migrates. A family relationship graph is obtained through data mining, influence values for each person are obtained through statistics, and all of the data are visualized together on a two-dimensional map using the times and places in which the members of the person's family lived; a machine learning algorithm provides a useful auxiliary conjecture about the causes of migration. The visualization method effectively displays a family's migration history and its influence in different periods, and supports analysis of the reasons for migration. The specific steps are as follows:
1. Collating person information
The data are preprocessed into records containing each person's name, the latitude and longitude of the person's place of household registration, the dynasty in which the person lived, the person's father, and the person's achievement information. The achievement information comprises: the person's literary works, occupation, mode of entry into officialdom, social relations, kinship relations, the major events in which the person took part, and the like.
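One way to hold the collated records is a small typed structure. The sketch below is illustrative only: the class and field names (`PersonRecord`, `Achievements`, `entry_mode`, and so on) are assumptions rather than names from the patent, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Achievements:
    """Achievement information listed in step 1 (field names are hypothetical)."""
    literary_works: List[str] = field(default_factory=list)
    occupations: List[str] = field(default_factory=list)
    entry_mode: Optional[str] = None  # mode of entry into officialdom
    social_relations: List[str] = field(default_factory=list)
    kinship_relations: List[str] = field(default_factory=list)
    major_events: List[str] = field(default_factory=list)


@dataclass
class PersonRecord:
    """One preprocessed person record (hypothetical layout)."""
    name: str
    latitude: float   # place of household registration
    longitude: float
    dynasty: str
    father: Optional[str]  # None when the father is unknown
    achievements: Achievements = field(default_factory=Achievements)

    def __post_init__(self) -> None:
        # Basic sanity check on the coordinates during preprocessing.
        if not (-90.0 <= self.latitude <= 90.0):
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not (-180.0 <= self.longitude <= 180.0):
            raise ValueError(f"longitude out of range: {self.longitude}")


# Invented sample record, for illustration only.
sample = PersonRecord(
    name="Person A", latitude=39.1, longitude=117.2, dynasty="Song",
    father=None, achievements=Achievements(occupations=["official"]),
)
```

Keeping the achievement fields in their own class mirrors the way the text separates basic identity data from achievement information.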
2. Calculating personal influence
First, crawler technology is used to obtain the Baidu entry attention of the stored literary works, and the influence factor of the person's literary works is calculated according to formula (1). The person's occupations are scored according to occupational status in ancient China, and the occupational influence factor is calculated according to formula (2). The Baidu entry attention of the major events in which the person took part is counted and normalized to the range 0-100, and the influence factors of those events are added up according to formula (3). The influence of the person's social relations is computed from the degree to which others attach to the person, as in formula (4). Finally, the overall score of the person's social influence is given by formula (5).
Figure BDA0001819551960000041 (formula (1))
Figure BDA0001819551960000042 (formula (2))
Figure BDA0001819551960000043 (formula (3))
Figure BDA0001819551960000044 (formula (4))
$I_{sum} = I_1 + I_2 + I_3 + I_4$    Formula (5)
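Formulas (1)-(4) survive here only as image references, so the sketch below does not reproduce them. It assumes, purely for illustration, that each factor is the sum of the person's per-item scores divided by the largest such score in the database and scaled by 100; only the final sum, formula (5), is taken from the text. The function names and all numbers are hypothetical.

```python
from typing import Sequence


def scaled_factor(person_scores: Sequence[float], all_scores: Sequence[float]) -> float:
    """Hypothetical stand-in for formulas (1)-(4): the sum of the person's item
    scores, divided by the largest score in the database, scaled by 100.
    This normalization is an assumption, not the patent's formula."""
    largest = max(all_scores, default=0.0)
    if largest <= 0:
        return 0.0
    return 100.0 * sum(person_scores) / largest


def total_influence(i1: float, i2: float, i3: float, i4: float) -> float:
    """Formula (5): I_sum = I_1 + I_2 + I_3 + I_4."""
    return i1 + i2 + i3 + i4


# Invented numbers, for illustration only.
i1 = scaled_factor([30.0, 10.0], all_scores=[30.0, 10.0, 80.0])  # literary works
i2 = scaled_factor([5.0], all_scores=[5.0, 10.0])                # occupations
i3 = scaled_factor([], all_scores=[40.0])                        # major events
i4 = scaled_factor([3.0], all_scores=[3.0, 12.0])                # attached persons
i_sum = total_influence(i1, i2, i3, i4)
```

Whatever form the four factors actually take, they are combined by plain addition, so each one should be placed on a comparable scale before summing.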
3. Building the person's family tree
Using a recursive algorithm, the father-son relationships of each person are searched recursively and checked against the rule 0 < father's birth time - son's birth time < 100, until no further father-son relationship can be found; the process is shown in FIG. 2. The person reached at that point is recorded as the ancestor of the clan. All persons whose final ancestor is the same person belong to the same clan.
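The recursive ancestor search can be sketched as follows. The text writes the rule as father's birth time minus son's birth time; this sketch assumes births are stored as calendar years and reads the rule as a gap of more than 0 and fewer than 100 years between the two births, so it computes the son's year minus the father's year. The data layout, function names, and sample people are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# name -> (father's name or None, birth year); invented sample data.
People = Dict[str, Tuple[Optional[str], int]]


def find_ancestor(name: str, people: People, max_gap: int = 100) -> str:
    """Follow father links recursively until no valid father is found."""
    father, birth = people[name]
    if father is None or father not in people:
        return name  # no recorded father: this person is the clan ancestor
    father_birth = people[father][1]
    # Assumption: the rule is read as a birth gap of 0-100 years (exclusive).
    if not (0 < birth - father_birth < max_gap):
        return name  # implausible link: stop the search here
    return find_ancestor(father, people, max_gap)


def group_by_clan(people: People) -> Dict[str, List[str]]:
    """Group persons by the ancestor returned by find_ancestor."""
    clans: Dict[str, List[str]] = {}
    for name in people:
        clans.setdefault(find_ancestor(name, people), []).append(name)
    return clans


people: People = {
    "A": (None, 1000),
    "B": ("A", 1025),
    "C": ("B", 1050),
    "D": (None, 1010),
    "E": ("D", 1200),  # 190-year gap: rejected by the rule
}
clans = group_by_clan(people)
```

Because every accepted link requires the father to be born strictly earlier than the son, the recursion always terminates, even if the source data contains circular father references.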
4. Visualizing personal influence
A rose diagram is used to visualize the influence of all clan members who lived in the same dynasty, were registered at the same place, and share the same ancestor as the person. The area of each petal represents the influence of one person, and the total area of the rose represents the summed influence of all clan members living in that dynasty and place, which indicates the prosperity and social prestige of the person's family.
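Since influence is mapped to petal area rather than petal length, the radius has to grow with the square root of the value. The sketch below assumes every petal spans the same angle, which the text does not state; the function names and sample values are hypothetical.

```python
import math
from typing import Dict


def petal_radii(influence: Dict[str, float], area_per_unit: float = 1.0) -> Dict[str, float]:
    """Radius of each equal-angle petal so that its sector area is
    proportional to the person's (non-negative) influence.

    Sector area = 0.5 * angle * radius**2, hence
    radius = sqrt(2 * area / angle).
    """
    if not influence:
        return {}
    angle = 2 * math.pi / len(influence)  # every petal spans the same angle
    return {
        name: math.sqrt(2 * value * area_per_unit / angle)
        for name, value in influence.items()
    }


def rose_area(radii: Dict[str, float]) -> float:
    """Total area of the rose: the sum of all petal sector areas."""
    if not radii:
        return 0.0
    angle = 2 * math.pi / len(radii)
    return sum(0.5 * angle * r ** 2 for r in radii.values())


# Invented influence values for three clan members.
influence = {"A": 125.0, "B": 50.0, "C": 25.0}
radii = petal_radii(influence)
total = rose_area(radii)  # equals the summed influence when area_per_unit is 1
```

Drawing radii proportional to the raw values instead would exaggerate large influences, because area grows with the square of the radius.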
5. Visualizing the family migration process
Animated straight lines with arrows represent the direction and scale of the family's migrations, and the dynasty in which each migration occurred is recorded. The arrow of a migration line gives the direction of migration, the two ends of the line represent the family's original location and its destination, and the width of the line represents the number of clan members who migrated.
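The line data for this step can be derived by counting migrants per route. The sketch below is a minimal illustration: the `Move` record, the linear width scaling, and the sample moves are all assumptions, and the output dictionaries are a generic shape rather than the input format of any particular charting library.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Move(NamedTuple):
    person: str
    origin: str
    destination: str
    dynasty: str


def migration_lines(moves: List[Move], min_width: float = 1.0,
                    max_width: float = 8.0) -> List[Dict[str, object]]:
    """One line per (origin, destination, dynasty); width grows with head count."""
    counts = Counter((m.origin, m.destination, m.dynasty) for m in moves)
    if not counts:
        return []
    largest = max(counts.values())
    lines = []
    for (origin, destination, dynasty), n in sorted(counts.items()):
        # Linear scaling of head count onto the width range (an assumption).
        width = min_width + (max_width - min_width) * n / largest
        lines.append({"from": origin, "to": destination, "dynasty": dynasty,
                      "people": n, "width": width})
    return lines


# Invented moves, for illustration only.
moves = [
    Move("A", "Luoyang", "Hangzhou", "Song"),
    Move("B", "Luoyang", "Hangzhou", "Song"),
    Move("C", "Luoyang", "Chengdu", "Tang"),
]
lines = migration_lines(moves)
```

The direction of each arrow follows from the `from` and `to` fields, and the `dynasty` field lets a renderer show only the lines belonging to the current step on the time axis.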
6. Predicting the causes of family migration
The factors in the database that may influence family migration are collated and encoded as 1 (present) or 0 (absent). A PCA dimensionality-reduction algorithm is first applied to the features to prevent overfitting; a multi-class logistic regression model is then trained on the reduced data, with cross-validation used to balance bias and variance. The number of dimensions retained after PCA and the training parameters of the logistic regression model are tuned repeatedly until the model's accuracy is maximized, and the model is saved. The final model predicts the cause of each recorded migration; the specific flow is shown in FIG. 3.
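The training flow described above can be sketched with scikit-learn, which the description names as the library used. This is a minimal illustration under stated assumptions: the data are synthetic, the three cause classes and the parameter grid are invented, and grid search stands in for the manual tuning described in the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: 120 migrations, 12 binary factors (1 = present,
# 0 = absent), 3 invented cause classes. Not real historical data.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(120, 12))
y = X[:, 0] + X[:, 1]  # toy label in {0, 1, 2} derived from two factors

pipeline = Pipeline([
    ("pca", PCA()),                               # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),   # multi-class classifier
])

# Tune the retained dimensions and the regularization strength together.
param_grid = {
    "pca__n_components": [2, 4, 8],
    "clf__C": [0.1, 1.0, 10.0],
}
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_model = search.best_estimator_  # refit on all data with the best settings
predicted_causes = best_model.predict(X)
```

Putting PCA inside the pipeline means it is refitted on each training fold, so the cross-validated accuracy is not inflated by information from the held-out fold. The fitted `best_model` could then be saved, for example with `joblib.dump`.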
7. Spatio-temporal visualization of family migration and social influence
With dynasties as the time axis and a two-dimensional map as the display plane, the geographic locations where persons lived are marked and the predicted migration causes are displayed, so that the family's migration process and the change in the family's social influence are shown dynamically over time. FIG. 4 shows the influence and migration of one family in a particular dynasty.
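Using dynasties as the time axis amounts to producing one map frame per dynasty. The sketch below groups clan members by dynasty and place and sums their influence for each frame; the `Resident` record, the frame layout, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Resident(NamedTuple):
    name: str
    dynasty: str
    place: str
    influence: float


def timeline_frames(residents: List[Resident],
                    dynasty_order: List[str]) -> List[Dict[str, object]]:
    """One frame per dynasty, holding the summed clan influence at each place."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for r in residents:
        totals[r.dynasty][r.place] += r.influence
    # Keep only dynasties that have data, in the caller's chronological order.
    return [
        {"dynasty": d, "influence_by_place": dict(totals[d])}
        for d in dynasty_order if d in totals
    ]


# Invented residents, for illustration only.
residents = [
    Resident("A", "Tang", "Luoyang", 125.0),
    Resident("B", "Tang", "Luoyang", 50.0),
    Resident("C", "Song", "Hangzhou", 25.0),
]
frames = timeline_frames(residents, ["Tang", "Song", "Yuan"])
```

Each frame could be rendered as one step of an animated map, with the migration lines and predicted causes for that dynasty drawn on top.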
The migration cause prediction model is fitted with the PCA and logistic regression algorithms in Python's sklearn module. The map component of the ECharts framework serves as the geographic information platform, and the Java Spring MVC and MyBatis frameworks serve as the back-end frameworks. With dynasties as the time axis and a map of China as the two-dimensional display plane, the migration of the person's family and the change in its social influence are displayed dynamically, together with the predicted migration causes. The method yields a good visualization effect.

Claims (1)

1. A visualization method for displaying family migration history and family development, comprising the following steps:
(1) Collating person information
The data are preprocessed into records containing each person's name, the latitude and longitude of the person's place of household registration, the dynasty in which the person lived, the person's father, and the person's achievement information, wherein the achievement information comprises: the person's literary works, occupation, mode of entry into officialdom, social relations, kinship relations, and the major events in which the person took part;
(2) Calculating personal influence, as follows:
Crawler technology is used to obtain the search-engine entry attention of the stored literary works, and the influence factor of the person's literary works is calculated according to formula (1); the person's occupations are scored according to occupational status in ancient China, and the occupational influence factor is calculated according to formula (2); the search-engine entry attention of the major events in which the person took part is counted and normalized to the range 0-100, and the influence factors of those events are added up according to formula (3); the influence of the person's social relations is computed from the degree to which others attach to the person, as in formula (4); and the final overall score of the person's social influence is given by formula (5):
Figure FDA0003121413320000011 (formula (1))
$I_1$ represents the influence factor of the person's literary works, $P_{1i}$ represents the influence factor of the $i$-th work the person helped to write or compile, and $E_1$ is the influence factor of any work currently stored in the database;
Figure FDA0003121413320000012 (formula (2))
$I_2$ represents the influence factor of the person's occupations, and $P_{2i}$ represents the influence factor of the $i$-th occupation the person held;
Figure FDA0003121413320000013 (formula (3))
$I_3$ represents the influence factor of the major events in which the person took part, $P_{3i}$ represents the influence factor of the $i$-th such event, and $E_3$ is the influence factor of any major event currently stored in the database;
Figure FDA0003121413320000014 (formula (4))
$I_4$ represents the influence factor of the person's social relations, $P_4$ represents the number of people attached to the person, and $E_4$ is the number of people attached to any person currently stored in the database;
$I_{sum} = I_1 + I_2 + I_3 + I_4$    formula (5)
$I_{sum}$ represents the person's influence;
(3) Building the person's family tree
Using a recursive algorithm, the father-son relationships of each person are searched recursively and checked against the rule 0 < father's birth time - son's birth time < 100, until no further father-son relationship can be found, and the person reached at that point is recorded as the ancestor of the clan; all persons whose final ancestor is the same person belong to the same clan;
(4) Visualizing personal influence
A rose diagram is used to visualize the influence of all clan members who lived in the same dynasty, were registered at the same place, and share the same ancestor as the person, wherein the area of each petal represents the influence of one person, and the total area of the rose represents the summed influence of all clan members living in that dynasty and place, which indicates the prosperity and social prestige of the person's family;
(5) Visualizing the family migration process
Animated straight lines with arrows represent the direction and scale of the family's migrations, and the dynasty in which each migration occurred is recorded; the arrow of a migration line gives the direction of migration, the two ends of the line represent the family's original location and its destination, and the width of the line represents the number of clan members who migrated;
(6) Predicting the causes of family migration
The factors in the database that may influence family migration are collated and encoded as 1 (present) or 0 (absent); a PCA dimensionality-reduction algorithm is first applied to the features to prevent overfitting; a multi-class logistic regression model is then trained on the reduced data, with cross-validation used to balance bias and variance; the number of dimensions retained after PCA and the training parameters of the logistic regression model are tuned repeatedly until the model's accuracy is maximized, and the model is saved; the final model predicts the cause of each recorded migration;
(7) Spatio-temporal visualization of family migration and social influence
With dynasties as the time axis and a two-dimensional map as the display plane, the geographic locations where persons lived are marked and the predicted migration causes are displayed, so that the family's migration process and the change in the family's social influence are shown dynamically over time.
CN201811158830.1A | 2018-09-30 | 2018-09-30 | A visualization method to display family migration history and family development | Expired - Fee Related | CN109299167B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811158830.1A CN109299167B (en) | 2018-09-30 | 2018-09-30 | A visualization method to display family migration history and family development

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811158830.1A CN109299167B (en) | 2018-09-30 | 2018-09-30 | A visualization method to display family migration history and family development

Publications (2)

Publication Number | Publication Date
CN109299167A (en) | 2019-02-01
CN109299167B (en) | 2021-08-13

Family

ID=65161507

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811158830.1A (Expired - Fee Related) CN109299167B (en) | A visualization method to display family migration history and family development | 2018-09-30 | 2018-09-30

Country Status (1)

Country | Link
CN (1) | CN109299167B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110110270B * | 2019-04-25 | 2021-01-15 | 武汉大学 | A method and device for generating large-scale genealogy graphs with parallel processing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101988119A * | 2009-07-31 | 2011-03-23 | 刘晓明 | Method for calculating family branch of family name and tracing pedigree by using DNA
CN106540448A * | 2016-09-30 | 2017-03-29 | 浙江大学 | The visual analysis method affected on its consuming behavior is exchanged between a kind of game player
CN107346337A * | 2017-06-30 | 2017-11-14 | 福州大学 | A kind of family tree with history age mark and ancestral hall information linkage method for visualizing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JPH10250261A * | 1997-03-12 | 1998-09-22 | Noritsu Koki Co Ltd | Family tree and family tree generation system
US6416325B2 * | 2000-04-14 | 2002-07-09 | Jeffrey J. Gross | Genealogical analysis tool

Also Published As

Publication number Publication date
CN109299167A (en) 2019-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210813
