CN113571177A

CN113571177A - On-line forum user depression detection model based on gated convolution network

Info

Publication number: CN113571177A
Application number: CN202110713423.8A
Authority: CN
Inventors: 饶国政; 张越; 丛庆
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2021-06-25
Filing date: 2021-06-25
Publication date: 2021-10-29

Abstract

The invention discloses a method for detecting depression of online forum users based on a gated convolutional network. A model for detecting depression of online paper users based on a gated convolutional network is used to realize the detection. The detection of the model is realized through post-level operations and user-level operations. The post-level representation is obtained through post-level operations; the user state representation is obtained through user-level operations, and then the probability distribution on the label is output by the softmax layer to achieve detection. The method of the present invention can use gating weights to effectively identify language associated with negative emotions in user posts, thereby enabling depression detection.

Description

A Depression Detection Model for Online Forum Users Based on Gated Convolutional Networks

技术领域technical field

本发明涉及抑郁检测技术领域，特别是涉及一种基于门控卷积网络的在线论坛用户抑郁检测模型。The invention relates to the technical field of depression detection, in particular to an online forum user depression detection model based on a gated convolutional network.

背景技术Background technique

抑郁症是最常见的与精神健康相关的疾病之一，是导致全球范围内自我伤害和自杀并影响数百万人的主要原因。早期抑郁症的发现和治疗可以减少由疾病引起的损害。但早期发现和治疗抑郁症和其他精神疾病的服务非常有限。而且许多患者不愿意主动寻求医疗保健提供者的帮助。这些问题使患者无法得到及时的治疗，导致病情进一步恶化。Depression is one of the most common mental health-related disorders and a leading cause of self-harm and suicide worldwide and affects millions of people. Early detection and treatment of depression can reduce damage caused by the disorder. But services for early detection and treatment of depression and other mental illnesses are very limited. And many patients are reluctant to proactively seek help from a healthcare provider. These problems keep patients from getting timely treatment, which can lead to further deterioration of the condition.

越来越多的抑郁症患者开始使用在线资源(Twitter，网站，Reddit等)来表达自己的心理问题并寻求帮助。特别是一些可选择匿名或匿名的在线论坛更受欢迎。利用社交媒体数据早期发现抑郁症任务已成为一种有效的手段。同时，海量社交媒体数据难以手动识别患有抑郁症或有自杀倾向的用户，这使得自动抑郁症检测技术的开发变得更为关键。More and more people with depression are using online resources (Twitter, website, Reddit, etc.) to express their psychological problems and seek help. In particular, some online forums that can choose to be anonymous or anonymous are more popular. The early detection of depression task using social media data has emerged as an effective means. At the same time, the massive amount of social media data makes it difficult to manually identify users with depression or suicidal tendencies, making the development of automated depression detection technology even more critical.

论坛中，经常有大量带有敏感内容的帖子post，表明用户有自杀和自残风险。使用适当深度学习模型和社交媒体数据进行早期抑郁症检测可防止潜在自我伤害。但是现有的抑郁症检测模型功能不足，无法从每个用户发布的大量帖子中捕获关键的情绪信息，这使得这些模型的性能无法令人满意。In the forum, there are often a large number of posts with sensitive content, indicating that users are at risk of suicide and self-harm. Early depression detection using appropriate deep learning models and social media data prevents potential self-harm. But the existing depression detection models are not powerful enough to capture key emotional information from the large number of posts made by each user, which makes the performance of these models unsatisfactory.

发明内容SUMMARY OF THE INVENTION

本发明的目的是针对现有技术中存在的技术缺陷，而提供一种基于门控卷积网络的在线论坛用户抑郁检测模型，并提供基于门控卷积网络的在线论坛用户抑郁检测模型进行抑郁检测的方法。The purpose of the present invention is to provide an online forum user depression detection model based on gated convolution network for the technical defects existing in the prior art, and to provide an online forum user depression detection model based on gated convolution network for depression detection method of detection.

为实现本发明的目的所采用的技术方案是：The technical scheme adopted for realizing the purpose of the present invention is:

一种基于门控卷积网络的在线论坛用户抑郁检测方法，包括：A method for detecting depression in online forum users based on gated convolutional networks, including:

利用基于门控卷积网络对在线论文用户进行抑郁检测的模型实现检测，该模型的检测步骤如下：A model for depression detection of online paper users based on gated convolutional network is used to achieve detection. The detection steps of this model are as follows:

S1.帖子级操作S1. Post-level actions

模型的输入通过多层带有门控单元的卷积神经网络，卷积神经网络利用有限的上下文信息来获取帖子表示的关键特征；The input of the model is passed through a multi-layer convolutional neural network with gated units, and the convolutional neural network utilizes limited contextual information to obtain the key features of the post representation;

其中，帖子级操作的处理层包括两个卷积层和一个全局平均池；第一卷积层使用两个大小不同的卷积内核来获得抽象特征图，然后，具有门控单元的第二卷积层使用卷积核来获得不同的门控权重，应用所述门控权重与第一个卷积层生成的特征图进行逐元素乘积，以此获得帖子级别的表征；Among them, the processing layers of post-level operations consist of two convolutional layers and a global average pooling; the first convolutional layer uses two convolutional kernels of different sizes to obtain abstract feature maps, and then, the second convolutional layer with gated units The product layer uses convolution kernels to obtain different gating weights, and applies the gating weights to the element-wise product of the feature map generated by the first convolution layer to obtain post-level representations;

每个单词由词嵌入矩阵L_ω∈R^d×|V|表示，|V|是词汇量中的单词数，d是单词向量的维数，每条帖子表示为n个单词，分别为{ω₁，ω₂，...ω_i...ω_n}，令x_i∈R^d为与帖子中第i个单词相对应的d维单词向量，长度为n的帖子嵌入后表示为Each word is represented by a word embedding matrix L _ω ∈ R ^d×|V| , |V| is the number of words in the vocabulary, d is the dimension of the word vector, and each post is represented by n words, respectively {ω ₁ , ω ₂ , ... ω _i ... ω _n }, let x _i ∈ R ^d be the d-dimensional word vector corresponding to the ith word in the post, the post embedding of length n is expressed as

X_1：n＝[x₁，x₂，...，x_n]X _{1 : n} = [x ₁ , x ₂ , . . . , x _n ]

第一卷积层中，使用CNN和多个不同宽度的卷积滤波器生成帖子的表示形式；将具有不同宽度的卷积滤波器视为提取器以获取的多粒度局部信息；使用不同窗口大小的多个卷积滤波器以获得多个特征图；In the first convolutional layer, a CNN and multiple convolutional filters of different widths are used to generate a representation of the post; convolutional filters with different widths are treated as multi-granular local information obtained by the extractor; different window sizes are used of multiple convolutional filters to obtain multiple feature maps;

设K∈R^s×d是步长为1的卷积滤波器，应用于s个单词的窗口以产生新的特征，[x_i+1，x_i+2，...，x_i+s-1]表示长度固定为s的窗口中的词嵌入，将两个向量串联起来即X_i：i+s-1，由X_i：i+s-1生成一个新的特征a_i Let K∈R ^s×d be a convolutional filter with stride 1 applied to a window of s words to produce new features, [x _i+1 , x _i+2 , ..., x _{i+s -1} ] represents the word embedding in a window with a fixed length of s, concatenate the two vectors, namely X _{i: i+s-1} , and generate a new feature a _i from X _{i: i+s-1}

a_i＝f(K*X_i：i+s+1+b)a _i =f(K*X _i:i+s+1 +b)

其中，b∈R是偏置项，*表示卷积运算，f是激活函数LeakyReLU，该过滤器应用于帖子[X_1：s，X_2：s+1，...，X_n-s+1：n]中每个可能窗口的s个单词来生成特征图A，where b∈R is the bias term, * denotes the convolution operation, and f is the activation function LeakyReLU, the filter applied to the posts [X1 _:s , X2 _:s+1 ,...,Xn _{-s+ 1:n} ] s words for each possible window to generate feature map A,

A＝[a₁，a₂，...，a_n-s+1]A=[a ₁ , a ₂ , ..., an _-s+1 ]

其中，A∈R^(n-s+1)×1表示通过不同大小的过滤器得到所有的特征图，然后将每个特征图A都作为输入传给第二卷积层；Among them, A∈R ^(n-s+1)×1 means that all feature maps are obtained through filters of different sizes, and then each feature map A is passed as an input to the second convolutional layer;

第二卷积层包括卷积层和门控单元，用来生成不同的门控权重，包含一个核F∈R^h ^×1的卷积运算，用于提取特征图A，卷积核F以窗口h在特征a₁上滑动，以此来生成门控权重；The second convolutional layer includes convolutional layers and gating units, which are used to generate different gating weights, and a convolution operation with a kernel F∈R ^h ^{× 1} , which is used to extract the feature map A, and the convolution kernel F is a window h slides on feature a ₁ to generate gating weights;

其中，g_l∈R，l＝1，2，...，n-s+1，所有门控权重元素均由特征图A和卷积核F所生成，并组成门控权重矩阵G：Among them, g _l ∈ R, l=1,2,...,n-s+1, all gating weight elements are generated by the feature map A and the convolution kernel F, and form the gating weight matrix G:

G＝[g₁，g₂，…，g_n-s+1]G=[g ₁ , g ₂ , ..., g _n-s+1 ]

其中，G∈R^(n-s+1)×1，m为第二卷积层中的卷积核的数量，利用门控单元提取不同的门控权重矩阵G₁，G₂，...，G_m，然后通过门控权重矩阵G获得输出特征图O：Among them, G∈R ^(n-s+1)×1 , m is the number of convolution kernels in the second convolutional layer, and the gating unit is used to extract different gating weight matrices G ₁ , G ₂ ,... , G _m , and then obtain the output feature map O by gating the weight matrix G:

其中，

是矩阵之间的按元素乘积，O∈R^m×(n-s+1)；in,

is the element-wise product between matrices, O∈R ^m×(n-s+1) ;

第一卷积层的输出特征图O由门控权重矩阵G控制；The output feature map O of the first convolutional layer is controlled by the gating weight matrix G;

接着将第二个卷积层的输出特征图O输入到全局平均池化层，并连接所有输出以获取帖子的表征；Then input the output feature map O of the second convolutional layer to the global average pooling layer and concatenate all outputs to obtain the representation of the post;

S2.用户级操作S2. User-level operations

将获得的帖子表征输入送到用户级操作处理中，使用与帖子级操作相同的方法，获得用户状态表征，然后将获得的用户状态表征传递到全连接的softmax图层，该softmax图层输出标签上的概率分布，从而实现检测；模型损失函数使用的是分类交叉熵，每个文档的目标情感分布记为p^T，p为预测文档情感分布；Send the obtained post representation input to the user-level operation processing, use the same method as the post-level operation to obtain the user state representation, and then pass the obtained user state representation to the fully connected softmax layer, which outputs the labels The probability distribution on , to achieve detection; the model loss function uses the classification cross entropy, the target sentiment distribution of each document is recorded as p ^T , and p is the predicted document sentiment distribution;

其中，T为训练数据，C为分类数，i是文档的索引，j是类别索引，训练的目的是使所有训练文档的p^T和p之间的交叉熵的误差最小化。Among them, T is the training data, C is the number of categories, i is the index of the document, j is the category index, and the purpose of training is to minimize the error of the cross-entropy between p ^T and p of all training documents.

本发明模型与基准之间的差异具有统计学意义(McNemar检验，p＜0.05)。使用具有两组功能的MNB和SVM分类器，将本发明模型与多个基准进行比较。针对抑郁用户提出的MGL-CNN模型在精度，召回率和F1方面均优于基线(分别增长了6.8％，6.7％和5.9％)。结果证明，方法可以使用选通权重来有效地识别与用户帖子中的负面情绪相关的语言。The difference between the inventive model and the benchmark was statistically significant (McNemar test, p<0.05). The inventive model was compared to multiple benchmarks using MNB and SVM classifiers with two sets of functions. The proposed MGL-CNN model for depressed users outperforms the baseline in precision, recall, and F1 (increases by 6.8%, 6.7%, and 5.9%, respectively). It turns out that the method can use gating weights to effectively identify language associated with negative sentiment in user posts.

附图说明Description of drawings

图1是本发明的基于门控卷积网络对在线论文用户进行抑郁检测模型(MG-CNN)整体结构图；Fig. 1 is the overall structure diagram of the present invention's depression detection model (MG-CNN) for online paper users based on gated convolutional network;

图2是本发明的帖子级操作模型结构图。FIG. 2 is a structural diagram of a post-level operation model of the present invention.

具体实施方式Detailed ways

以下结合附图和具体实施例对本发明作进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

如图1-2所示，本发明的基于门控卷积网络的在线论坛用户抑郁检测方法，利用基于门控卷积网络对在线论文用户进行抑郁检测的模型实现检测，该模型的检测步骤如下：As shown in Figures 1-2, the method for detecting depression of online forum users based on gated convolutional network of the present invention utilizes a model for detecting depression of online paper users based on gated convolutional network to realize detection. The detection steps of the model are as follows :

步骤一：帖子级操作Step 1: Post-level actions

模型的输入通过多层带有门控单元的卷积神经网络，卷积神经网络利用有限的上下文信息来获取帖子表示的关键特征。The input to the model is passed through multiple layers of convolutional neural networks with gated units, which utilize limited contextual information to obtain key features of post representations.

MG-CNN中的帖子级操作，由两个卷积层和一个全局平均池组成。在MG-CNN中的帖子级操作的演示模型中，第一卷积层使用两个大小不同的卷积内核来获得抽象特征图。然后，具有门控单元的第二卷积层使用卷积核来获得不同的门控权重。应用这些门控权重来与第一个卷积层生成的特征图进行逐元素乘积，以此获得帖子级别的表征。Post-level operations in MG-CNN, consisting of two convolutional layers and a global average pooling. In the demonstration model of post-level operations in MG-CNN, the first convolutional layer uses two convolutional kernels of different sizes to obtain abstract feature maps. Then, the second convolutional layer with gating units uses convolution kernels to obtain different gating weights. These gating weights are applied to perform element-wise multiplication with the feature map generated by the first convolutional layer to obtain a post-level representation.

每个单词都由词嵌入矩阵L_ω∈R^d×|V|表示，|V|是词汇量中的单词数，d是单词向量的维数。将用户的每条帖子都表示为n个单词，分别为{ω₁，ω₂，...ω_i...ω_n}，令x_i∈R^d为与帖子中第i个单词(英文单词)相对应的d维单词向量，比如Iam depress，要变成[0，0，1]，[0，1，0][0，1，1]，维度为3。长度为n的帖子嵌入后表示为Each word is represented by a word embedding matrix L _ω ∈ R ^d×|V| , where |V| is the number of words in the vocabulary and d is the dimension of the word vector. Each post of the user is represented as n words, respectively {ω ₁ , ω ₂ , ... ω _i ... ω _n }, let x _i ∈ R ^d be the same as the ith word in the post (English word) The corresponding d-dimensional word vector, such as Iam depress, should become [0, 0, 1], [0, 1, 0][0, 1, 1], and the dimension is 3. Posts of length n are embedded as

X_1：n＝[x₁，x₂，...，x_n] (1)X _{1 : n} = [x ₁ , x ₂ , . . . , x _n ] (1)

在第一卷积层中，使用CNN和多个不同宽度的卷积滤波器来生成帖子的表示形式。可以将具有不同宽度的卷积滤波器视为提取器，以获取诸如N-Grams之类的多粒度局部信息。同样，宽度为2的卷积滤波器实质上可捕获用户帖子中的双单词组的语义。使用不同窗口大小的多个卷积滤波器以获得多个特征图。设K∈R^s×d是步长为1的卷积滤波器，将其应用于s个单词的窗口以产生新的特征。[x_i+1，x_i+2，...，x_i+s-1]表示长度固定为s的窗口中的词嵌入，将两个向量串联起来即X_i：i+s-1。由X_i：i+s-1生成一个新的特征a_i In the first convolutional layer, a CNN and multiple convolutional filters of different widths are used to generate the representation of the post. Convolutional filters with different widths can be considered as extractors to obtain multi-granularity local information such as N-Grams. Likewise, a convolutional filter of width 2 essentially captures the semantics of bigrams in user posts. Use multiple convolutional filters with different window sizes to obtain multiple feature maps. Let K ∈ R ^{s × d} be a convolutional filter with stride 1 applied to a window of s words to produce new features. [x _i+1 , x _i+2 , . . . , x _i+s-1 ] represents the word embedding in a window of fixed length s, concatenating the two vectors, namely X _{i: i+s-1} . Generate a new feature a _i by X _{i: i+s-1}

a_i＝f(K*X_i：i+s+1+b) (2)a _i =f(K*X _i:i+s+1 +b) (2)

其中b∈R是偏置项，*表示卷积运算，f是激活函数(LeakyReLU)。此过滤器应用于帖子[X_1：s，X_2：s+1，...，X_n-s+1：n]中的每个可能窗口的s个单词来生成特征图A，where b∈R is the bias term, * denotes the convolution operation, and f is the activation function (LeakyReLU). This filter is applied to the s words of each possible window in posts [X1 _:s , X2 _:s+1 ,...,Xn _-s+1:n ] to generate feature map A,

A＝[a₁，a₂，...，a_n-s+1] (3)A=[a ₁ , a ₂ , ..., an _-s+1 ] (3)

其中A∈R^(n-s+1)×1。通过不同大小的过滤器得到所有的特征图，然后将每个特征图A都作为输入传给第二卷积层。where A∈R ^(n-s+1)×1 . All feature maps are obtained through filters of different sizes, and then each feature map A is passed as input to the second convolutional layer.

第二卷积层由卷积层和门控单元组成。该层用来生成不同的门控权重。它包含一个核F∈R^h×1的卷积运算，主要应用于提取上下文特征A。卷积核F以窗口h在特征a₁上滑动，以此来生成门控权重。The second convolutional layer consists of convolutional layers and gating units. This layer is used to generate different gating weights. It contains a convolution operation with a kernel F ∈ R ^h×1 , which is mainly used to extract the contextual feature A. The convolution kernel F slides on the feature a ₁ with a window h to generate the gating weights.

其中g_l∈R，l＝1，2，...，n-s+1。所有门控权重元素均由特征图A和卷积核F所生成，并组成门控权重矩阵：where g _l ∈ R, l=1,2,...,n-s+1. All gating weight elements are generated by feature map A and convolution kernel F, and form a gating weight matrix:

G＝[g₁，g₂，…，g_n-s+1] (5)G=[g ₁ , g ₂ , ..., g _n-s+1 ] (5)

其中G∈R^(n-s+1)×1。令m为第二卷积层中使用的卷积核的数量。利用MGL-CNN的门控单元提取不同的门控权重矩阵：G₁，G₂，...，G_m。然后，通过门控权重矩阵G获得输出特征图O：where G∈R ^(n-s+1)×1 . Let m be the number of convolution kernels used in the second convolutional layer. Different gating weight matrices are extracted using the gating unit of MGL-CNN: G ₁ , G ₂ , ..., G _m . Then, the output feature map O is obtained by gating the weight matrix G:

其中

是矩阵之间的按元素乘积。在MG-CNN中，O∈R^m×(n-s+1)。in

is the element-wise product between matrices. In MG-CNN, O∈Rm ^×(n-s+1) .

第一卷积层的输出O由门控权重G控制。这些门控权重乘以特征图A，并控制应在层中传播哪些信息。为了捕获帖子的全局信息，接着将第二个卷积层的输出输入到全局平均池化层，并连接所有输出以获取帖子的表征。The output O of the first convolutional layer is controlled by the gating weight G. These gating weights are multiplied by the feature map A and control what information should be propagated in the layers. To capture the global information of the post, the output of the second convolutional layer is then fed into the global average pooling layer, and all outputs are concatenated to obtain the representation of the post.

步骤二：用户级操作Step 2: User-level operations

将获得的帖子表征输入送到用户级操作，使用与帖子级操作相同的方法，获得用户的状态表征。然后，将获得的用户特征传递到全连接的softmax图层，该图层的输出是标签上的概率分布。模型的损失函数使用的是分类交叉熵。每个文档的目标情感分布记为p^T，p为预测文档情感分布。The obtained post representation input is sent to the user-level operation, and the user's state representation is obtained using the same method as the post-level operation. Then, the obtained user features are passed to a fully connected softmax layer whose output is a probability distribution over the labels. The loss function of the model uses categorical cross entropy. The target sentiment distribution of each document is denoted as p ^T , where p is the predicted document sentiment distribution.

其中T为训练数据，C为分类数，i是文档的索引，j是类别索引。训练的目的是使所有训练文档的p^T和p之间的交叉熵的误差最小化。where T is the training data, C is the number of categories, i is the index of the document, and j is the category index. The goal of training is to minimize the error of the cross-entropy between p ^T and p of all training documents.

实验验证：Experimental verification:

实验基于Reddit自我报告的抑郁诊断(RSDD)数据集和抑郁的早期检测数据集(eRisk 2017)。通过与其他强大的基线模型进行比较来评估提出的模型的性能，并分析模型的性能。The experiments are based on the Reddit Self-Reported Depression Diagnosis (RSDD) dataset and the Early Detection of Depression dataset (eRisk 2017). The performance of the proposed model is evaluated by comparison with other strong baseline models, and the performance of the model is analyzed.

RSDD数据集由训练，验证和测试数据集组成，每个数据集包含大约3,000个已诊断用户和35,000个对照用户。使用验证集来调整模型和基线的超参数。eRisk 2017由训练和测试数据集组成。eRisk 2017的训练集包含83个抑郁的用户和403个对照用户，而测试集包含52个抑郁的用户和349个对照用户。The RSDD dataset consists of training, validation and testing datasets, each containing approximately 3,000 diagnosed users and 35,000 control users. Use the validation set to tune the hyperparameters of the model and baseline. eRisk 2017 consists of training and testing datasets. The training set of eRisk 2017 contains 83 depressed users and 403 control users, while the test set contains 52 depressed users and 349 control users.

RSDD验证集用于选择抑郁症检测模型的超参数，测试集用于报告结果。不使用预训练(如公开的Glove)的嵌入来初始化嵌入层。抑郁模型的输入的原始编码由one-hot向量组成，然后使用输入层学习50维和100维的嵌入。学习率(lr)设置为0.001。对于RSDD，eRisk2017，将batch大小分别设置为64和128。将Maxm定义为每个用户的最大帖子数，Maxn定义为每个帖子的最大特征数。当一个用户的文档超过了最大帖子数时，将随机打乱帖子，然后随机选择帖子并截取Maxm的长度。例如，将模型设置为接收RSDD数据集中最多600个帖子(Maxm＝600)，其中每个帖子最多包含100个单词(Maxn＝100)。本发明提出的抑郁症检测模型是用Keras框架实现的。The RSDD validation set was used to select hyperparameters for the depression detection model, and the test set was used to report the results. Embedding layers are initialized without pre-trained (like publicly available Glove) embeddings. The raw encoding of the input to the depression model consists of one-hot vectors, and then the input layer is used to learn 50- and 100-dimensional embeddings. The learning rate (lr) is set to 0.001. For RSDD, eRisk2017, set the batch size to 64 and 128, respectively. Define Maxm as the maximum number of posts per user and Maxn as the maximum number of features per post. When a user's document exceeds the maximum number of posts, the posts will be randomly scrambled, then the posts will be randomly selected and the length of Maxm will be intercepted. For example, set the model to receive up to 600 posts (Maxm=600) in the RSDD dataset, where each post contains up to 100 words (Maxn=100). The depression detection model proposed by the present invention is implemented with Keras framework.

对MG-CNN帖子级操作，将第一卷积层的卷积核(s)的窗口大小分别设置为2、3、4、5、6，每个窗口大小有30个不同的卷积核。第二个卷积层中，卷积核(h)的窗口大小为1、3、5、7，每个窗口大小有30个不同的卷积核。在SG-CNN第二层中使用窗口大小为3的卷积核。对于模型的用户级操作，设置与帖子级操作相同的参数，不会对数据集进行任何特定的调整。For MG-CNN post-level operations, the window sizes of the convolution kernel(s) of the first convolutional layer are set to 2, 3, 4, 5, and 6, respectively, and each window size has 30 different convolution kernels. In the second convolutional layer, the convolution kernels (h) have window sizes of 1, 3, 5, and 7, and each window size has 30 different convolution kernels. A convolution kernel with a window size of 3 is used in the second layer of SG-CNN. For the user-level operations of the model, the same parameters are set as for the post-level operations, without any specific adjustments to the dataset.

本发明模型使用softmax函数和分类交叉熵作为其损失函数，对于类别的平衡问题，是通过“分类交叉”的方式进行处理的。所有模型都通过随机梯度下降法使用Adam优化器进行训练的。The model of the present invention uses the softmax function and the classification cross entropy as its loss function, and handles the class balance problem by means of "classification cross". All models are trained by stochastic gradient descent using the Adam optimizer.

上表中显示了RSDD中从方法和其他基准中识别抑郁用户的结果。本发明模型与基准之间的差异具有统计学意义(McNemar检验，p<0.05)。使用具有两组功能的MNB和SVM分类器，将本发明模型与多个基准进行比较。Results for identifying depressed users from methods and other benchmarks in RSDD are shown in the table above. The difference between the model of the present invention and the benchmark was statistically significant (McNemar test, p<0.05). The inventive model was compared to multiple benchmarks using MNB and SVM classifiers with two sets of functions.

本发明针对抑郁用户提出的MGL-CNN模型在精度，召回率和F1方面均优于基线(分别增长了6.8％，6.7％和5.9％)。结果证明，方法可以使用选通权重来有效地识别与用户帖子中的负面情绪相关的语言。The MGL-CNN model proposed by the present invention for depressed users outperforms the baseline in terms of precision, recall and F1 (increases by 6.8%, 6.7% and 5.9%, respectively). It turns out that the method can use gating weights to effectively identify language associated with negative sentiment in user posts.

上述表中显示了模型和当前表现最佳方法的抑郁症早期检测数据集的结果。来自基线的度量标准的绝对值表明，抑郁症任务的早期检测很困难。The results of the model and the current best-performing methods on the depression early detection dataset are shown in the table above. Absolute values of metrics from baseline suggest that early detection of depression tasks is difficult.

就F1而言，性能低下，最高F1为0.64。某些方法(如FHDO-BCSGB)选择优化精度，但召回率较低，而其他方法(如UNSLA)选择优化召回率，但精度较低。这可能与数据集的规模和创建有关。本发明提出的模型(MGL-CNN)在抑郁用户上的精度，召回率和F1方面的性能接近几种最新方法。As far as F1 is concerned, the performance is low, with a top F1 of 0.64. Some methods (like FHDO-BCSGB) choose to optimize for precision but lower recall, while others (like UNSLA) choose to optimize for recall but lower precision. This may be related to the size and creation of the dataset. The performance of the proposed model (MGL-CNN) in terms of precision, recall and F1 on depressed users is close to several state-of-the-art methods.

此外，本发明模型并非旨在像基线模型那样改善一个指标，而是在所有三个指标(Precision，Recall和F1)上均表现良好。本发明模型的结果和最先进的方法表明，提出的通用神经网络架构也可以用于在不同的在线论坛上对抑郁症进行早期检测。Furthermore, the inventive model is not designed to improve one metric like the baseline model, but performs well on all three metrics (Precision, Recall and F1). The results of the present model and the state-of-the-art method show that the proposed general neural network architecture can also be used for early detection of depression in different online forums.

以上所述仅是本发明的优选实施方式，应当指出的是，对于本技术领域的普通技术人员来说，在不脱离本发明原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也应视为本发明的保护范围。The above are only the preferred embodiments of the present invention. It should be noted that, for those skilled in the art, without departing from the principles of the present invention, several improvements and modifications can be made. These improvements and Retouching should also be considered within the scope of protection of the present invention.

Claims

1. The method for detecting the depression of the online forum user based on the gated convolutional network is characterized in that the model for detecting the depression of the online thesis user based on the gated convolutional network is used for realizing detection, and the detection steps of the model are as follows:

s1, post level operations

Inputting the model through a multilayer convolutional neural network with a gate control unit, wherein the convolutional neural network acquires key features represented by posts by using limited context information;

the processing layer of the post level operation comprises two convolution layers and a global average pool; the first convolutional layer obtains an abstract feature map by using two convolution kernels with different sizes, then a second convolutional layer with a gating unit obtains different gating weights by using the convolution kernels, and the gating weights and the feature map generated by the first convolutional layer are subjected to element-by-element multiplication to obtain a post-level representation;

each word is embedded by a word into a matrix L_ω∈R^d×|V|Where | V | is the number of words in the vocabulary, d is the dimension of the word vector, and each post is represented as n words, each being { ω [ ]₁，ω₂，...ω_i...ω_nLet x_i∈R^dFor the d-dimensional word vector corresponding to the ith word in the post, the post of length n is shown embedded as

X_1：n＝[x₁，x₂，...，x_n]

In the first convolutional layer, generating a representation form of the post by using a CNN and a plurality of convolutional filters with different widths; treating convolution filters with different widths as extractors to obtain multi-granularity local information; using a plurality of convolution filters of different window sizes to obtain a plurality of feature maps;

let K be an element of R^s×dIs a convolution filter of step size 1 applied to a window of s words to generate a new feature, [ x ]_i+1，x_i+2，...，x_i+s-1]Meaning word embedding in a window of fixed length s, concatenating two vectors, i.e. X_i：i+s-1From X_i：i+s-1Generating a new feature a_i

a_i＝f(K*X_i：i+s+1+b)

Where b ∈ R is a bias term, representing a convolution operation, and f is the activation function LeakyReLU, the filter is applied to the post [ X ∈ R_1：s，X_2：s+1，...，X_n-s+1：n]The s words in each possible window in the feature map a are generated,

A＝[a₁，a₂，...，a_n-s+1]

wherein A ∈ R^(n-s+1)×1Representing all feature maps obtained by filters of different sizes, and then transmitting each feature map A as input to the second convolutional layer;

the second convolutional layer comprises convolutional layer and gating unit for generating different gating weights, including a core F e R^h×1Is used for extracting the feature map A, and the convolution kernel F is arranged at the feature a by a window h₁Sliding up to generate gating weights;

wherein, g_lE R, l 1, 2.., n-s +1, all gating weight elements are generated by the feature map a and the convolution kernel F, and constitute a gating weight matrix G:

G＝[g₁，g₂，…，g_n-s+1]

wherein G ∈ R^(n-s+1)×1M is the number of convolution kernels in the second convolution layer, and different gating weight matrixes G are extracted by using the gating units₁，G₂，...，G_mThen, obtaining an output characteristic diagram O through a gating weight matrix G:

wherein,

is the product by element between matrices，O∈R^m×(n-s+1)；

The output characteristic diagram O of the first convolution layer is controlled by a gating weight matrix G;

then inputting the output feature graph O of the second convolutional layer into the global average pooling layer and connecting all the outputs to obtain the representation of the post;

s2, user-level operation

The obtained post representation input is sent to user level operation processing, a user state representation is obtained by using the same method as the post level operation, then the obtained user state representation is transmitted to a fully connected softmax layer, and the probability distribution on a label is output by the softmax layer, so that detection is realized; the model loss function uses the classification cross entropy, and the target emotion distribution of each document is marked as p^TP is the predicted document emotion distribution;

wherein T is training data, C is classification number, i is document index, j is class index, and training is to make p of all training documents^TAnd p minimizes the error in cross entropy.

2. The gated convolutional network-based online forum user depression detection method as claimed in claim 1, wherein said model is implemented with a Keras framework.

3. The gated convolutional network-based online forum user depression detection method as claimed in claim 2, wherein for MG-CNN post level operation, the window sizes of the convolutional kernels in the first convolutional layer are set to 2, 3, 4, 5, 6, respectively, each window size having 30 different convolutional kernels; the window sizes of the convolution kernels in the second convolutional layer are 1, 3, 5, 7, and each window size has 30 different convolution kernels; a convolution kernel with a window size of 3 is used in the second layer.

4. The gated convolutional network-based online forum user depression detection method as claimed in claim 3, wherein for user-level operations, the same parameters as for post-level operations are set.