Background Art
Computer networks now play an important role in social life. Server protection is the core of computer network security and an important part of keeping a network secure and stable. In the attack and defense of servers, most intruders use WebShells as their weapon for obtaining server privileges. A WebShell is a command execution environment that usually takes the form of a web page, such as an ASP, PHP, JSP or CGI page, and is therefore also called a web page backdoor.
Administrators usually prevent or detect WebShells with static detection or dynamic monitoring. Static detection identifies a web shell by its characteristic values (signatures) and dangerous functions. This approach is simple to operate and can quickly detect the presence of a WebShell. Hujian Kang et al. extracted different features of target pages and classified and detected WebShells with a decision tree; Ye Fei et al. analyzed the structural and textual features of pages, extracted keywords with a bag-of-words model, and then classified and detected WebShells with an SVM. For some of the newest WebShells, however, the miss rate of these methods is relatively high. Dynamic monitoring can handle WebShells that intruders encrypt to evade detection, but although it detects WebShells to a certain extent, it has difficulty detecting certain specific WebShells. Existing methods also have the following drawbacks:
1. The two methods above are usually combined to complete detection, but for large companies with many hosts this is very costly and inefficient.
2. Traditional detection methods take a long time, and both their precision and recall are relatively low.
3. Existing schemes have difficulty discovering new backdoors that grant website management privileges, and their accuracy is low.
Summary of the Invention
To overcome the shortcomings and defects of the prior art, the present invention provides a WebShell detection method based on matrix decomposition, which improves detection accuracy.
To achieve the above technical purpose, the present invention provides a WebShell detection method based on matrix decomposition, the detection method comprising:
establishing a score matrix that stores score information as triples;
selecting a preset number of features from the text to be detected to construct a feature set, predicting the score parameter in the triples based on the features in the feature set, and determining, according to the prediction result, the likelihood that the text to be detected contains a WebShell.
Optionally, establishing the score matrix that stores score information as triples comprises:
defining a score matrix $R$ of size $m_u \times m_i$, using a triple $s = (u, i, r_{ui})$ to represent one score message, and storing all such messages in the data set $S = \{(u, i, r_{ui}) \mid r_{ui} \text{ is known}\}$.
$\hat{r}_{ui}$ denotes the prediction of $r_{ui}$ and is computed as (Formula 1):
$$\hat{r}_{ui} = q_i^{\top} p_u$$
where $p_u$ is the $k$-dimensional factor vector of each column $u$, and $q_i$ is the $k$-dimensional factor vector of each row $i$.
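Optionally, the detection method further comprises:
regularizing the parameters as shown in Formula 2 to avoid overfitting:
$$\min_{p_*,\,q_*} \sum_{(u,i,r_{ui}) \in S} \left(r_{ui} - q_i^{\top} p_u\right)^2 + \lambda\left(\lVert q_i \rVert^2 + \lVert p_u \rVert^2\right)$$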
Optionally, selecting a preset number of features from the text to be detected to construct a feature set, and predicting the score parameter in the triples based on the features in the feature set, comprises:
selecting a feature set of appropriate size whose features are as influential as possible, and converting these features into a form usable by the matrix decomposition model: each text feature and each other feature is grouped, and the resulting combinations are used as the features $u$ and $i$, which gives the numbers of corresponding column and row vectors of the matrix;
computing the combination of features, including the text features, according to Formula 3:
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u$$
where $\mu$ is the average of all predictions in the rating matrix, and $b_u$ and $b_i$ are the terms for the text-feature combination $u$ and the other-feature combination $i$, respectively.
Optionally, determining from the prediction result the likelihood that the text to be detected contains a WebShell comprises:
if $\hat{r}_{ui}$ is greater than 0.5, the file is more likely to contain a WebShell; conversely, if it is less than 0.5, the file is more likely not to contain a WebShell.
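Optionally, the detection method further comprises a training process, the training process comprising:
obtaining the prediction error according to Formula 4:
$$\varepsilon_{ui} = r_{ui} - \hat{r}_{ui}$$
updating $p_u$ and $q_i$ according to the gradient descent rules:
$$p_u \leftarrow p_u + \gamma\,(\varepsilon_{ui} q_i - \lambda p_u)$$
$$q_i \leftarrow q_i + \gamma\,(\varepsilon_{ui} p_u - \lambda q_i)$$
where $\gamma$ is the step size of the gradient descent algorithm and $\lambda$ is the regularization coefficient.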
Optionally, the detection method further comprises:
optimizing the training process based on Formula 6.
The technical solution provided by the present invention has the following beneficial effects:
Based on a machine learning algorithm, the method can quickly and accurately learn the characteristics of WebShell pages. It overcomes the shortcomings of traditional feature-matching methods and improves the accuracy and recall of web shell detection. By analyzing and learning the features of pages known to contain a WebShell and pages known not to, the method can make predictions for unknown pages efficiently, with high precision and recall.
Embodiment one
The present invention provides a WebShell detection method based on matrix decomposition. As shown in Figure 1, the detection method comprises:
11. establishing a score matrix that stores score information as triples;
12. selecting a preset number of features from the text to be detected to construct a feature set, predicting the score parameter in the triples based on the features in the feature set, and determining, according to the prediction result, the likelihood that the text to be detected contains a WebShell.
In an implementation, the steps of the matrix-decomposition-based WebShell detection method of the invention are, in order: (1) building the matrix decomposition model; (2) measuring the features; (3) training with the matrix decomposition model; (4) predicting WebShells with the model.
The matrix-decomposition-based WebShell detection method of this embodiment is based on a machine learning algorithm and can quickly and accurately learn the characteristics of WebShell pages. By analyzing and learning the features of pages known to contain a WebShell and pages known not to, it can make predictions for unknown pages, overcoming the shortcomings of traditional feature-matching methods and improving the accuracy and recall of web shell detection. It also overcomes the technical problem that prior-art methods, which only match characteristic values or detect the traffic or services a WebShell generates, have difficulty discovering new types of web backdoors.
Optionally, establishing the score matrix that stores score information as triples comprises:
defining a score matrix $R$ of size $m_u \times m_i$, using a triple $s = (u, i, r_{ui})$ to represent one score message, and storing all such messages in the data set $S = \{(u, i, r_{ui}) \mid r_{ui} \text{ is known}\}$.
$\hat{r}_{ui}$ denotes the prediction of $r_{ui}$ and is computed as (Formula 1):
$$\hat{r}_{ui} = q_i^{\top} p_u$$
where $p_u$ is the $k$-dimensional factor vector of each column $u$, and $q_i$ is the $k$-dimensional factor vector of each row $i$.
In an implementation, a score matrix $R$ of size $m_u \times m_i$ is defined, a triple $s = (u, i, r_{ui})$ represents one score message, and all messages are stored in the data set $S = \{(u, i, r_{ui}) \mid r_{ui} \text{ is known}\}$.
To predict $r_{ui}$: $r_{ui} \in [0, 1]$ is the score corresponding to object $u$, and the closer $r_{ui}$ is to 1, the more likely it is that column feature $u$ together with the corresponding row feature $i$ contains a WebShell. The prediction of $r_{ui}$ is $\hat{r}_{ui}$, computed with Formula 1.
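Optionally, the detection method further comprises:
regularizing the parameters as shown in Formula 2 to avoid overfitting:
$$\min_{p_*,\,q_*} \sum_{(u,i,r_{ui}) \in S} \left(r_{ui} - q_i^{\top} p_u\right)^2 + \lambda\left(\lVert q_i \rVert^2 + \lVert p_u \rVert^2\right)$$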
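The sketch below assumes that Formula 2 is the usual L2-regularized squared error, $\sum_{(u,i,r_{ui}) \in S} (r_{ui} - q_i^{\top} p_u)^2 + \lambda(\lVert q_i \rVert^2 + \lVert p_u \rVert^2)$, which is the objective consistent with the gradient update rules of Formulas 5 and 6. It simply evaluates that loss; the function name `regularized_loss` and the toy values are hypothetical.

```python
import numpy as np


def regularized_loss(S, P, Q, lam):
    """Assumed form of Formula 2: for every known triple (u, i, r_ui), add
    (r_ui - q_i^T p_u)^2 + lam * (||q_i||^2 + ||p_u||^2)."""
    total = 0.0
    for u, i, r in S:
        err = r - Q[i] @ P[u]
        total += err ** 2 + lam * (Q[i] @ Q[i] + P[u] @ P[u])
    return float(total)


# Toy factors and known scores, chosen so the loss is easy to check by hand.
P = np.array([[1.0, 0.0], [0.0, 1.0]])
Q = np.array([[0.5, 0.5]])
S = [(0, 0, 1.0), (1, 0, 0.0)]
loss = regularized_loss(S, P, Q, lam=0.05)
```

Each triple contributes a squared error of 0.25 and a penalty of 0.05 × 1.5 = 0.075, so `loss` is about 0.65. The penalty term grows with the factor norms, which is what discourages overfitting.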
Optionally, selecting a preset number of features from the text to be detected to construct a feature set, and predicting the score parameter in the triples based on the features in the feature set, comprises:
selecting a feature set of appropriate size whose features are as influential as possible, and converting these features into a form usable by the matrix decomposition model: each text feature and each other feature is grouped, and the resulting combinations are used as the features $u$ and $i$, which gives the numbers of corresponding column and row vectors of the matrix;
computing the combination of features, including the text features, according to Formula 3:
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u$$
where $\mu$ is the average of all predictions in the rating matrix, and $b_u$ and $b_i$ are the terms for the text-feature combination $u$ and the other-feature combination $i$, respectively.
In an implementation, a feature set of appropriate size, with features that are as influential as possible, is selected. The method divides the features into text features and other features; the specific features are listed in Table 1.
These features are converted into a form usable by the matrix decomposition model by grouping each text feature and each other feature. For example, the number of special characters is divided into three groups (0 to 10, 10 to 50, and more than 50), word length is divided into three groups (0 to 10, 10 to 20, and more than 20), and so on. After the six text features are grouped, one group is taken from each feature to form a combination; because each feature has three groups, this gives $3^6 = 729$ combinations in total. The other features are combined in the same way: each is divided into two groups, giving $2^6 = 64$ combinations. Using the text-feature combinations and the other-feature combinations as the features $u$ and $i$, respectively, gives $m_u = 729$ and $m_i = 64$, that is, a 729 × 64 score matrix.
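The grouping and combination step can be sketched as a mixed-radix encoding: each feature value is mapped to a group index, and the six group indices are packed into one integer, giving $u \in [0, 728]$ for the text features (base 3) and $i \in [0, 63]$ for the other features (base 2). Only the special-character and word-length boundaries come from the text; the remaining bin edges, the example feature values, the treatment of a value exactly on a boundary (assigned to the upper group), and the function names `bin_value` and `combo_index` are hypothetical.

```python
from typing import Sequence


def bin_value(x: float, edges: Sequence[float]) -> int:
    """Group index of x: 0 if x < edges[0], 1 if edges[0] <= x < edges[1], and so on.
    Values exactly on a boundary go to the upper group (an assumption)."""
    for group, edge in enumerate(edges):
        if x < edge:
            return group
    return len(edges)


def combo_index(groups: Sequence[int], n_groups: int) -> int:
    """Pack one group index per feature into a single base-n_groups integer."""
    idx = 0
    for g in groups:
        idx = idx * n_groups + g
    return idx


# Six text features, three groups each. The first two edge pairs follow the text
# (special characters: 10/50, word length: 10/20); the other four are placeholders.
TEXT_EDGES = [[10, 50], [10, 20], [5, 15], [5, 15], [5, 15], [5, 15]]
text_values = [3, 25, 7, 0, 16, 9]
u = combo_index([bin_value(v, e) for v, e in zip(text_values, TEXT_EDGES)], 3)

# Six hypothetical binary "other" features, two groups each (0 or 1).
other_groups = [1, 0, 0, 1, 1, 0]
i = combo_index(other_groups, 2)
```

Here the text groups are [0, 2, 1, 0, 2, 1], so `u` is 196, and `i` is 38. The encoding is a bijection, so each of the 729 × 64 cells of the matrix corresponds to exactly one pair of feature combinations.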
Training is performed with the matrix decomposition model and specifically comprises the following steps:
(1) The combination of text features and other features is computed as follows (Formula 3):
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u$$
where $\mu$ is the average of all predictions in the rating matrix, that is, the average predicted WebShell probability, and $b_u$ and $b_i$ are the terms for the text-feature combination $u$ and the other-feature combination $i$, respectively.
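Formula 3 is the factor product $q_i^{\top} p_u$ plus the global mean $\mu$ and the two terms $b_u$ and $b_i$. A minimal sketch follows, assuming $\mu$ is taken as the mean of the known scores in $S$ (one reading of "the average of all predictions of the rating matrix"); the function name `predict_biased` and all numeric values are made up for illustration.

```python
import numpy as np


def predict_biased(mu: float, b_u: float, b_i: float,
                   p_u: np.ndarray, q_i: np.ndarray) -> float:
    """Formula 3: r_hat_ui = mu + b_u + b_i + q_i^T p_u."""
    return float(mu + b_u + b_i + q_i @ p_u)


# Hypothetical known scores; mu is assumed to be their mean.
S = [(0, 0, 1.0), (1, 0, 0.0), (2, 1, 1.0), (3, 1, 0.0)]
mu = float(np.mean([r for _, _, r in S]))

# Made-up bias terms and factor vectors for one (u, i) pair.
r_hat = predict_biased(mu, 0.1, -0.05, np.array([0.2, 0.4]), np.array([0.5, 0.25]))
```

Here $\mu = 0.5$ and $q_i^{\top} p_u = 0.2$, so `r_hat` is about 0.75, above the 0.5 decision threshold.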
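(2) Optimizing the training process: the training process is optimized to reduce the loss after training. The processing formula is as follows:
$$\min_{b_*,\,p_*,\,q_*} \sum_{(u,i,r_{ui}) \in S} \left(r_{ui} - \mu - b_u - b_i - q_i^{\top} p_u\right)^2 + \lambda\left(b_u^2 + b_i^2 + \lVert p_u \rVert^2 + \lVert q_i \rVert^2\right)$$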
The training process is as follows:
(2.1) For each given score $r_{ui}$ in the training set, the prediction $\hat{r}_{ui}$ is computed with Formula 3.
(2.2) The prediction error is computed:
$$\varepsilon_{ui} = r_{ui} - \hat{r}_{ui}$$
(2.3) $p_u$ and $q_i$ are updated according to the gradient descent rules:
$$p_u \leftarrow p_u + \gamma\,(\varepsilon_{ui} q_i - \lambda p_u) \qquad \text{(Formula 5)}$$
$$q_i \leftarrow q_i + \gamma\,(\varepsilon_{ui} p_u - \lambda q_i) \qquad \text{(Formula 6)}$$
where the parameter $\gamma$ is the step size of the gradient descent algorithm, set to 0.1 in the present invention, and the parameter $\lambda$ is the regularization coefficient, set to 0.05.
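The training steps (2.1) to (2.3) can be sketched as one stochastic gradient descent loop with $\gamma = 0.1$ and $\lambda = 0.05$ as stated above. Several details are assumptions not given in the text: the factor dimension $k$, the number of epochs, random initialization and shuffling, taking $\mu$ as the mean of the known scores, using the pre-update $p_u$ when updating $q_i$, and updating $b_u$ and $b_i$ with the analogous rule $b \leftarrow b + \gamma(\varepsilon_{ui} - \lambda b)$. The function `train_mf` and the toy data are hypothetical.

```python
import numpy as np


def train_mf(S, n_u, n_i, k=8, gamma=0.1, lam=0.05, epochs=200, seed=0):
    """SGD following steps (2.1)-(2.3). gamma and lam default to the values in the
    text; k, epochs, seed, the choice of mu and the bias updates are assumptions."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_u, k))  # factor vectors p_u
    Q = rng.normal(scale=0.1, size=(n_i, k))  # factor vectors q_i
    b_u = np.zeros(n_u)
    b_i = np.zeros(n_i)
    mu = float(np.mean([r for _, _, r in S]))  # assumed: mean of known scores

    for _ in range(epochs):
        for idx in rng.permutation(len(S)):  # visit triples in random order
            u, i, r = S[idx]
            r_hat = mu + b_u[u] + b_i[i] + Q[i] @ P[u]  # Formula 3
            err = r - r_hat  # prediction error, step (2.2)
            p_old = P[u].copy()
            P[u] += gamma * (err * Q[i] - lam * P[u])  # Formula 5
            Q[i] += gamma * (err * p_old - lam * Q[i])  # Formula 6, with pre-update p_u
            # Assumed bias updates (not specified in the text).
            b_u[u] += gamma * (err - lam * b_u[u])
            b_i[i] += gamma * (err - lam * b_i[i])
    return mu, b_u, b_i, P, Q


# Toy data: combination u = 0 is always scored 1, u = 1 always 0.
S = [(0, 0, 1.0), (0, 1, 1.0), (1, 0, 0.0), (1, 1, 0.0)]
mu, b_u, b_i, P, Q = train_mf(S, n_u=2, n_i=2)


def predict(u, i):
    return float(mu + b_u[u] + b_i[i] + Q[i] @ P[u])
```

On this toy data the prediction for $(0, 1)$ should end up above 0.5 and the one for $(1, 1)$ below it; the $\lambda$ terms keep the learned values somewhat short of exactly 1 and 0.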
Optionally, determining from the prediction result the likelihood that the text to be detected contains a WebShell comprises:
if $\hat{r}_{ui}$ is greater than 0.5, the file is more likely to contain a WebShell; conversely, if it is less than 0.5, the file is more likely not to contain a WebShell.
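Once the parameters are trained, every Formula 3 prediction can be computed at once with NumPy broadcasting and compared with the 0.5 threshold. The text does not say how a score of exactly 0.5 is handled; this sketch treats it as "not WebShell". The function names `predicted_matrix` and `is_likely_webshell` and the example parameter values are hypothetical.

```python
import numpy as np


def predicted_matrix(mu, b_u, b_i, P, Q):
    """All Formula 3 predictions at once; entry [u, i] = mu + b_u + b_i + q_i^T p_u."""
    return mu + b_u[:, None] + b_i[None, :] + P @ Q.T


def is_likely_webshell(r_hat, threshold=0.5):
    """True where the prediction exceeds the threshold; exactly 0.5 counts as
    'not WebShell' here because the text leaves that case open."""
    return np.asarray(r_hat) > threshold


# Hypothetical trained parameters (zero factors keep the arithmetic obvious).
mu = 0.5
b_u = np.array([0.3, -0.3])
b_i = np.array([0.0, 0.1])
P = np.zeros((2, 2))
Q = np.zeros((2, 2))
flags = is_likely_webshell(predicted_matrix(mu, b_u, b_i, P, Q))
```

The predicted matrix here is [[0.8, 0.9], [0.2, 0.3]], so only the combinations with $u = 0$ are flagged.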
The present invention provides a WebShell detection method based on matrix decomposition, comprising: establishing a score matrix that stores score information as triples; selecting a preset number of features from the text to be detected to construct a feature set; predicting the score parameter in the triples based on the features in the feature set; and determining, according to the prediction result, the likelihood that the text to be detected contains a WebShell. Based on a machine learning algorithm, the method can quickly and accurately learn the characteristics of WebShell pages. It overcomes the shortcomings of traditional feature-matching methods and improves the accuracy and recall of web shell detection. By analyzing and learning the features of pages known to contain a WebShell and pages known not to, the method can make predictions for unknown pages efficiently, with high precision and recall.
The serial numbers in the above embodiment are for description only and do not indicate the order in which components are assembled or used.
The above is only an embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within its protection scope.