CN117911057A - Second-hand car price prediction method, device, equipment and storage medium

Info

Publication number: CN117911057A
Application number: CN202410093849.1A
Authority: CN
Inventors: 叶亮; 张小庆; 冯晓祥; 许荣杰
Original assignee: Wuhan Polytechnic University
Current assignee: Wuhan Polytechnic University
Priority date: 2024-01-23
Filing date: 2024-01-23
Publication date: 2024-04-19

Abstract

The invention discloses a method, device, equipment and storage medium for predicting the price of a second-hand car. The method comprises: inputting target second-hand car data into a target second-hand car price prediction model to obtain a predicted second-hand car price, wherein the target second-hand car price prediction model is obtained by improving the Harris Hawk algorithm through the introduction of a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, and then optimizing the hyperparameters of a LightGBM model with the improved Harris Hawk algorithm. Because the price prediction is carried out with a model whose LightGBM hyperparameters have been optimized by the improved Harris Hawk algorithm, the accuracy and stability of second-hand car price prediction are effectively improved.

Description

Second-hand car price prediction method, device, equipment and storage medium

Technical Field

The present invention relates to the field of data processing technology, and in particular to a second-hand car price prediction method, device, equipment and storage medium.

Background

China's used car market has developed rapidly: as of 2021, the growth rate of used car sales in China had overtaken that of new car sales, reaching 22.62%. As the market has grown, however, the question of whether used car valuations are reasonable has become increasingly prominent. Traditional manual valuation relies mainly on industry experience, so it is costly and its results are rather arbitrary. Information asymmetry between consumers and dealers also leaves consumers unable to judge the price of a used car. It is therefore particularly important to build a reasonable used car price prediction model driven by historical data in order to regulate the used car market; such a model can serve as a reference in negotiations between consumers and dealers and can promote the steady development of the market.

At present, the machine learning models used for used car price prediction mainly include KNN, random forest, SVR and linear regression, all of which can provide a useful estimate of used car prices. Their prediction accuracy still leaves room for improvement, because the predictive performance of traditional machine learning models depends on the choice of hyperparameters, and traditional manual tuning, grid search and gradient descent cannot bring the prediction model to its best performance.

Therefore, there is an urgent need for a used car price prediction method that can effectively improve the accuracy and stability of used car price prediction.

Summary of the invention

The main purpose of the present invention is to provide a second-hand car price prediction method, device, equipment and storage medium, aiming to solve the technical problem that second-hand car price prediction in the prior art has low accuracy and stability.

To achieve the above object, the present invention provides a method for predicting the price of a used car, the method comprising the following steps:

acquiring vehicle attribute information of a used car, and preprocessing the vehicle attribute information to obtain target used car data;

inputting the target used car data into a target used car price prediction model to obtain a predicted used car price, wherein the target used car price prediction model is obtained by improving the Harris Hawk algorithm through the introduction of a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, and then optimizing the hyperparameters of a LightGBM model with the improved Harris Hawk algorithm.

Optionally, before the step of acquiring the vehicle attribute information of the used car and preprocessing the vehicle attribute information to obtain the target used car data, the method further includes:

obtaining a used car sales data set, and preprocessing the used car sales data set to obtain a training data set;

improving the Harris Hawk algorithm by introducing a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, to obtain an improved Harris Hawk algorithm;

optimizing the hyperparameters of the LightGBM model with the improved Harris Hawk algorithm to obtain a target parameter combination;

establishing an initial LightGBM model based on the target parameter combination, and then training the initial LightGBM model with the training data set to obtain the target used car price prediction model.

Optionally, the step of improving the Harris Hawk algorithm by introducing a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor to obtain an improved Harris Hawk algorithm includes:

introducing the Chebyshev chaotic map into the global exploration phase of the Harris Hawk algorithm to obtain an improved global exploration phase;

improving the prey escape energy in the Harris Hawk algorithm with a nonlinear decreasing disturbance factor to obtain an improved escape energy formula;

weighting the optimal individual in the population of the Harris Hawk algorithm with an adaptive weight factor to obtain an optimal individual update formula;

obtaining the improved Harris Hawk algorithm based on the improved global exploration phase, the improved escape energy formula and the optimal individual update formula.

Optionally, the Chebyshev chaotic map is:

$$CM = \cos\left(t \cos^{-1}(\gamma)\right)$$

where $t$ is the current iteration number and $\gamma$ is the chaotic initial value;

The model of the improved global exploration phase is:

where $X(t+1)$ and $X(t)$ are the position of the Harris hawk at the next iteration and its current position, respectively; $X_{\text{rabbit}}(t)$ is the position of the prey; $CM_1$, $CM_2$, $CM_3$, $CM_4$ and $q$ are all random numbers in $(0,1)$; $UB$ and $LB$ are the upper and lower bounds of the individual position search, respectively; and $X_m(t)$ is the average position of the population;

Correspondingly, the improved escape energy formula is:

$$E = 2E_0\left(2r\left(0.5 + \cos\left(\left(\pi \frac{t}{T}\right)^{1/2}\right)\right)\right)$$

where $E_0$ is the initial escape energy, $T$ is the maximum number of iterations, $t$ is the current iteration number, and $r$ is a random number in $(0,1)$;

The adaptive weight factor is:

The optimal individual update formula is:

$$X'_{\text{rabbit}} = w(t) \times X_{\text{rabbit}}$$

Optionally, the step of obtaining a used car sales data set and preprocessing the used car sales data set to obtain a training data set includes:

obtaining a used car sales data set and preprocessing it to obtain a preprocessed used car sales data set;

splitting the preprocessed used car sales data set according to a preset ratio to obtain a training data set and a test data set;

Correspondingly, after the step of training the initial LightGBM model with the training data set to obtain the target used car price prediction model, the method further includes:

predicting on the test data set with the target used car price prediction model to obtain predicted used car prices;

selecting evaluation metrics and evaluating the target used car price prediction model based on the predicted used car prices, wherein the evaluation metrics include the coefficient of determination, the mean squared error and the mean absolute error.
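The split and evaluation steps above can be illustrated with a minimal sketch in plain Python. The function names, the 0.8 split ratio and the fixed seed are assumptions chosen for illustration; the source only speaks of a "preset ratio" and names the three metrics.

```python
import random


def split_dataset(rows, train_ratio=0.8, seed=42):
    """Shuffle rows reproducibly and split them into (train, test).

    train_ratio and seed are illustrative assumptions, not values from the source.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted prices."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between true and predicted prices."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```

A higher coefficient of determination and lower error values indicate a better fit on the held-out test data.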

Optionally, the step of optimizing the hyperparameters of the LightGBM model with the improved Harris Hawk algorithm to obtain a target parameter combination includes:

setting the population size, number of iterations and problem dimension of the improved Harris Hawk algorithm to obtain a target Harris Hawk algorithm;

taking the number of leaf nodes per tree, the maximum depth of each tree, the model learning rate and the minimum number of samples required per leaf node as the hyperparameters of the LightGBM model;

inputting the training data set into the LightGBM model, and determining individual fitness values according to a preset fitness function;

iteratively updating the individuals of the population according to the target Harris Hawk algorithm based on the individual fitness values until the number of iterations is reached, to obtain the target parameter combination.
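One way to connect a population individual to the four hyperparameters listed above is to treat each individual as a 4-dimensional position and decode it into a parameter dictionary. The sketch below assumes positions normalized to [0, 1]; the search bounds are hypothetical, since the source gives none. The keys follow LightGBM's scikit-learn parameter names (`num_leaves`, `max_depth`, `learning_rate`, `min_child_samples`).

```python
# Hypothetical search bounds: the source does not specify any ranges.
BOUNDS = {
    "num_leaves": (8, 256),          # number of leaf nodes per tree
    "max_depth": (3, 12),            # maximum depth of each tree
    "learning_rate": (0.01, 0.3),    # model learning rate
    "min_child_samples": (5, 100),   # minimum samples required per leaf
}
INTEGER_PARAMS = {"num_leaves", "max_depth", "min_child_samples"}


def decode_position(position):
    """Map a 4-D position in [0, 1]^4 to a LightGBM hyperparameter dict.

    Coordinates outside [0, 1] are clipped, so every decoded value stays in bounds.
    """
    params = {}
    for value, (name, (low, high)) in zip(position, BOUNDS.items()):
        clipped = min(max(value, 0.0), 1.0)
        scaled = low + clipped * (high - low)
        params[name] = int(round(scaled)) if name in INTEGER_PARAMS else scaled
    return params
```

A fitness function would then train a LightGBM model with the decoded dictionary and return an error measure on the training data; the source does not state which fitness function is used.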

Optionally, the step of establishing an initial LightGBM model based on the target parameter combination and then training the initial LightGBM model with the training data set to obtain the target used car price prediction model includes:

establishing an initial LightGBM model based on the LightGBM model and the target parameter combination;

training the initial LightGBM model with the training data set to obtain a training result;

optimizing the initial LightGBM model according to the training result to obtain the target used car price prediction model.

In addition, to achieve the above object, the present invention also proposes a second-hand car price prediction device, the device comprising:

a data processing module, configured to acquire vehicle attribute information of a used car and preprocess the vehicle attribute information to obtain target used car data;

a price output module, configured to input the target used car data into a target used car price prediction model to obtain a predicted used car price, wherein the target used car price prediction model is obtained by improving the Harris Hawk algorithm through the introduction of a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, and then optimizing the hyperparameters of a LightGBM model with the improved Harris Hawk algorithm.

In addition, to achieve the above object, the present invention also proposes used car price prediction equipment, which includes a memory, a processor, and a used car price prediction program stored in the memory and executable on the processor, wherein the used car price prediction program is configured to implement the steps of the used car price prediction method described above.

In addition, to achieve the above object, the present invention also proposes a storage medium on which a used car price prediction program is stored; when the used car price prediction program is executed by a processor, the steps of the used car price prediction method described above are implemented.

The present invention acquires vehicle attribute information of a used car and preprocesses it to obtain target used car data, then inputs the target used car data into a target used car price prediction model to obtain a predicted used car price, wherein the target used car price prediction model is obtained by improving the Harris Hawk algorithm through the introduction of a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, and then optimizing the hyperparameters of a LightGBM model with the improved Harris Hawk algorithm. Compared with the prior art, the present invention introduces the Chebyshev chaotic map, the nonlinear decreasing disturbance factor and the adaptive weight factor to improve the global exploration and local exploitation capabilities of the Harris Hawk algorithm and raise its global search accuracy, and then uses the improved Harris Hawk algorithm to optimize the hyperparameters of LightGBM. This addresses the dependence of model prediction on hyperparameters and thereby effectively improves the accuracy and stability of used car price prediction.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of the second-hand car price prediction device in the hardware operating environment involved in the embodiments of the present invention;

FIG. 2 is a schematic flow chart of a first embodiment of the second-hand car price prediction method of the present invention;

FIG. 3 is a schematic flow chart of a second embodiment of the second-hand car price prediction method of the present invention;

FIG. 4 is a schematic flow chart of the improved Harris Hawk algorithm in the second-hand car price prediction method of the present invention;

FIG. 5 is a schematic flow chart of a third embodiment of the second-hand car price prediction method of the present invention;

FIG. 6 is a schematic diagram of the overall design of the target second-hand car price prediction model in the second-hand car price prediction method of the present invention;

FIG. 7 is a structural block diagram of a first embodiment of the second-hand car price prediction apparatus of the present invention.

The realization of the object, the functional features and the advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.

Refer to FIG. 1, which is a schematic structural diagram of the used car price prediction device in the hardware operating environment involved in the embodiments of the present invention.

As shown in FIG. 1, the used car price prediction device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

Those skilled in the art will appreciate that the structure shown in FIG. 1 does not limit the used car price prediction device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

As shown in FIG. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module and a used car price prediction program.

In the used car price prediction device shown in FIG. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 may be arranged in the used car price prediction device, which calls the used car price prediction program stored in the memory 1005 through the processor 1001 and executes the used car price prediction method provided by the embodiments of the present invention.

An embodiment of the present invention provides a used car price prediction method. Refer to FIG. 2, which is a schematic flow chart of a first embodiment of the used car price prediction method of the present invention.

In this embodiment, the used car price prediction method includes the following steps:

Step S10: Acquire vehicle attribute information of a used car, and preprocess the vehicle attribute information to obtain target used car data.

It should be noted that the execution subject of this embodiment may be a computing device with data processing, network communication and program execution functions, such as a server, tablet computer, personal computer or iPad, or another electronic device capable of realizing these functions, such as a used car price prediction device. This embodiment and the following embodiments are described using the used car price prediction device as an example.

It should be understood that the vehicle attribute information of the used car includes, but is not limited to, the car model, registration date, gearbox type, mileage, fuel type, road tax, fuel consumption, engine displacement and selling price.

It is understandable that the preprocessing of the vehicle attribute information may consist of cleaning or filling in the missing data in the vehicle attribute information to obtain the target used car data.
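As a minimal sketch of the data-filling option mentioned above, the function below replaces missing numeric attributes with the median of the values that are present. Median imputation, the use of `None` as the missing-value marker, and the field names in the usage note are assumptions for illustration; the source does not say which filling rule is applied.

```python
from statistics import median


def fill_missing(records, numeric_fields):
    """Return copies of records with None in numeric fields replaced by the field median.

    Median imputation is an assumed strategy; the source only mentions
    "data cleaning or data filling" without naming a rule.
    """
    medians = {}
    for field in numeric_fields:
        present = [r[field] for r in records if r.get(field) is not None]
        medians[field] = median(present) if present else None

    cleaned = []
    for record in records:
        row = dict(record)  # copy so the input records are left untouched
        for field in numeric_fields:
            if row.get(field) is None:
                row[field] = medians[field]
        cleaned.append(row)
    return cleaned
```

For instance, calling `fill_missing(cars, ["mileage"])` on a list of dictionaries with a hypothetical `mileage` key fills each missing mileage with the median of the known mileages.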

Step S20: Input the target used car data into the target used car price prediction model to obtain a predicted used car price, wherein the target used car price prediction model is obtained by improving the Harris Hawk algorithm through the introduction of a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, and then optimizing the hyperparameters of a LightGBM model with the improved Harris Hawk algorithm.

It should be explained that the Harris Hawk algorithm (HHO) is a metaheuristic algorithm proposed by Heidari et al. in 2019 that simulates the cooperative hunting behavior of Harris hawks. The algorithm has few parameters and is easy to tune and implement, and it performs well on many optimization problems. HHO is divided into three phases: global exploration, the transition from exploration to exploitation, and local exploitation.

It should be noted that chaotic sequences are ergodic and random, which can make the distribution of the population in the search space more uniform and helps improve the search ability of the algorithm. In the exploration phase of the traditional Harris Hawk algorithm, the hawks perch at random according to the traditional formula to observe prey, so the population is not distributed uniformly enough in space. This embodiment therefore introduces the Chebyshev chaotic map to improve the positions of the hawks in the exploration phase, so that the population is distributed more uniformly in space.

The escape energy E plays a key role in balancing the exploration and exploitation behaviors of the Harris hawks. In the traditional Harris Hawk algorithm, the linearly decreasing escape energy E easily causes the algorithm to fall into a local optimum during exploitation in the later iterations. This embodiment therefore uses a nonlinear decreasing disturbance factor to improve the prey escape energy, so that the algorithm retains some exploration behavior in the later iterations and is better able to escape local optima.

The inertia weight balances global search and local exploitation. A larger inertia weight widens the global search range of the algorithm and improves its global exploration capability, while a smaller inertia weight lets the algorithm search more finely near the optimal solution and thus improves its local exploitation capability. Therefore, to strengthen the ability of the Harris hawks to approach the optimal solution within a limited range, this embodiment proposes a new adaptive weight factor to weight the optimal individual in the population.

After the Harris Hawk algorithm (HHO) has been improved as described above, the improved Harris Hawk algorithm (iHHO) is used to optimize the hyperparameters of the LightGBM model, and the target used car price prediction model is established on that basis.

It should be noted that the Light Gradient Boosting Machine (LightGBM) is an improved variant of the gradient boosting decision tree (GBDT). The core idea of GBDT is that the input of each round depends on the training result of the previous round; it has to iterate over the entire data set many times during training, so its time and memory costs are very high. Traditional XGBoost uses a pre-sorting approach to find the best split points during training, but its time and space costs remain high for massive and high-dimensional data. LightGBM addresses these shortcomings of GBDT and XGBoost on high-dimensional and massive data, and offers fast training, low memory consumption and high accuracy.

This embodiment acquires vehicle attribute information of a used car and preprocesses it to obtain target used car data, then inputs the target used car data into the target used car price prediction model to obtain a predicted used car price, wherein the target used car price prediction model is obtained by improving the Harris Hawk algorithm through the introduction of a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, and then optimizing the hyperparameters of a LightGBM model with the improved Harris Hawk algorithm. Compared with the prior art, this embodiment introduces the Chebyshev chaotic map, the nonlinear decreasing disturbance factor and the adaptive weight factor to improve the global exploration and local exploitation capabilities of the Harris Hawk algorithm and raise its global search accuracy, and then uses the improved Harris Hawk algorithm to optimize the hyperparameters of LightGBM. This addresses the dependence of model prediction on hyperparameters and thereby effectively improves the accuracy and stability of used car price prediction.

Refer to FIG. 3, which is a schematic flow chart of a second embodiment of the used car price prediction method of the present invention.

Based on the first embodiment above, in this embodiment, before step S10, the method further includes:

Step S01: Obtain a used car sales data set, and preprocess the used car sales data set to obtain a training data set.

It is understandable that the used car sales data set may be obtained from an open data platform or by web crawling, which is not limited in this embodiment. This embodiment and the following embodiments are described using the VW used car sales data set from the Kaggle open data platform as an example.

It should be noted that the data preprocessing may consist of one-hot encoding the car model, gearbox type and fuel type in the used car sales data set and converting the registration date into the corresponding vehicle age in years, or it may consist of data cleaning, which is not limited in this embodiment.
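The two preprocessing operations named above can be sketched in plain Python as follows. The function names, the alphabetical ordering of the categories, and the use of a bare registration year are assumptions made for illustration.

```python
def one_hot(values):
    """One-hot encode a list of category labels.

    Returns (categories, rows), where categories is the sorted list of distinct
    labels and each row has a single 1 in the column of its label.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def vehicle_age(registration_year, reference_year):
    """Convert a registration year into a vehicle age in years (never negative)."""
    return max(reference_year - registration_year, 0)
```

Applied to a fuel-type column such as `["Petrol", "Diesel", "Petrol"]`, `one_hot` yields one column per fuel type, and the same function can be reused for the car model and gearbox type columns.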

Step S02: Improve the Harris Hawk algorithm by introducing a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, to obtain an improved Harris Hawk algorithm.

It should be explained that step S02 includes:

Step S021: Introduce the Chebyshev chaotic map into the global exploration phase of the Harris Hawk algorithm to obtain an improved global exploration phase.

It should be noted that the Chebyshev chaotic map is:

$$CM = \cos\left(t \cos^{-1}(\gamma)\right) \tag{1}$$

where $t$ is the current iteration number and $\gamma$ is the chaotic initial value.
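Formula (1) can be evaluated directly, as in the short sketch below. The function name is an assumption. As written, the formula yields values in [-1, 1] and needs γ in [-1, 1] for the inverse cosine to be defined; how these values are brought into the (0, 1) range mentioned for the CM terms is not stated in the source, so no rescaling is applied here.

```python
import math


def chebyshev_map(t, gamma):
    """Evaluate CM = cos(t * arccos(gamma)) for iteration t and chaotic initial value gamma."""
    if not -1.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [-1, 1] for arccos to be defined")
    return math.cos(t * math.acos(gamma))
```

For integer t this is the Chebyshev polynomial of degree t evaluated at γ, so successive iterations give a deterministic but irregular sequence.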

Step S022: Use a nonlinear decreasing disturbance factor to improve the prey escape energy in the Harris Hawk algorithm to obtain an improved escape energy formula.

It should be explained that the improved escape energy formula is:

$$E = 2E_0\left(2r\left(0.5 + \cos\left(\left(\pi \frac{t}{T}\right)^{1/2}\right)\right)\right) \tag{3}$$

where $E_0$ is the initial escape energy, $T$ is the maximum number of iterations, $t$ is the current iteration number, and $r$ is a random number in $(0,1)$.

步骤S023：利用自适应权重因子对所述哈里斯鹰算法中种群内的最优个体进行加权，获得最优个体更新公式。Step S023: weighting the optimal individual in the population in the Harris Hawk algorithm using an adaptive weight factor to obtain an optimal individual update formula.

需要说明的是，所述自适应权重因子为：It should be noted that the adaptive weight factor is:

所述最优个体更新公式：The optimal individual update formula:

X′_rabbit＝w(t)×X_rabbit (5)X′ _rabbit = w(t) × X _rabbit (5)

步骤S024：基于所述改进后的全局勘探阶段、所述改进后的逃逸能量公式和所述最优个体更新公式获得改进后的哈里斯鹰算法。Step S024: obtaining an improved Harris Hawk algorithm based on the improved global exploration phase, the improved escape energy formula and the optimal individual update formula.

需要解释的是，改进后的哈里斯鹰算法在局部开发阶段与传统哈里斯鹰算法一致。It needs to be explained that the improved Harris Hawk algorithm is consistent with the traditional Harris Hawk algorithm in the local development stage.

在局部开发阶段：哈里斯鹰种群可以根据猎物的逃跑行为，采用四种不同的策略更新种群位置。并根据逃逸能量E和狩猎随机选择因子r∈(0,1)来选择围攻策略。In the local development stage: Harris hawk populations can use four different strategies to update the population position according to the escape behavior of prey, and choose the siege strategy according to the escape energy E and the hunting random selection factor r∈(0,1).

1) When r ≥ 0.5 and |E| ≥ 0.5, the Harris hawks apply the soft besiege strategy, modeled as follows:

$$X(t+1) = \Delta X(t) - E\left|J X_{rabbit}(t) - X(t)\right| \tag{6}$$

$$\Delta X(t) = X_{rabbit}(t) - X(t) \tag{7}$$

$$J = 2(1 - r_5) \tag{8}$$

where J is the random jump strength of the prey throughout the escape, and X_rabbit(t) is the optimal solution in the current population.

2) When r ≥ 0.5 and |E| < 0.5, the Harris hawks apply the hard besiege strategy and capture the prey directly, modeled as follows:

$$X(t+1) = X_{rabbit}(t) - E\left|\Delta X(t)\right| \tag{9}$$

3) When r < 0.5 and |E| ≥ 0.5, the Harris hawks apply the soft besiege strategy with progressive rapid dives, modeled by formula (10),

where D is the dimension of the problem to be optimized, S is a D-dimensional random vector, and LF is the Lévy flight operator.

4) When r < 0.5 and |E| < 0.5, the Harris hawks apply the hard besiege strategy with progressive rapid dives, modeled by formula (11).
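The two besiege updates of formulas (6) to (9) can be sketched on plain Python lists, applied element by element. The function names and the optional `r5` argument are assumptions for the example; the dive-based strategies of formulas (10) and (11) are left out of this sketch.

```python
import random


def soft_besiege(x, x_rabbit, energy, r5=None):
    """Formulas (6)-(8): X(t+1) = dX - E*|J*X_rabbit - X|, with dX = X_rabbit - X."""
    if r5 is None:
        r5 = random.random()
    jump = 2.0 * (1.0 - r5)  # formula (8): random jump strength J
    return [(xr - xi) - energy * abs(jump * xr - xi) for xi, xr in zip(x, x_rabbit)]


def hard_besiege(x, x_rabbit, energy):
    """Formula (9): X(t+1) = X_rabbit - E*|X_rabbit - X|."""
    return [xr - energy * abs(xr - xi) for xi, xr in zip(x, x_rabbit)]


# Illustrative two-dimensional hawk and prey positions (made-up numbers).
hawk, prey = [1.0, 2.0], [3.0, 1.0]
soft = soft_besiege(hawk, prey, energy=0.5, r5=0.5)
hard = hard_besiege(hawk, prey, energy=0.5)
```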

For example, refer to FIG. 4, which is a flow chart of the improved Harris Hawk algorithm in the used car price prediction method of the present invention. First, the HHO parameters are initialized, the population is randomly initialized, and the individual fitness is calculated. The escape energy E is then updated according to formula (3), and it is judged whether E > 1. If E > 1, the position is updated according to formula (1) and formula (2), and it is then judged whether t > T: if t > T, the fitness and the optimal individual are output; if t ≤ T, the judgment of whether E > 1 is made again. If E ≤ 1, the position is updated according to formula (6) when r ≥ 0.5 and |E| ≥ 0.5; according to formula (9) when r ≥ 0.5 and |E| < 0.5; according to formulas (4), (5) and (10) when r < 0.5 and |E| ≥ 0.5; and according to formulas (4), (5) and (11) when r < 0.5 and |E| < 0.5. Finally, it is judged whether t > T, and if t > T, the fitness and the optimal individual are output.
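The branching in this flow can be summarized as a small selection function. This is a sketch only: the function name and the returned labels are hypothetical, and comparing |E| rather than E with 1 is an assumption made for symmetry with the |E| tests used by the four besiege strategies.

```python
def select_strategy(energy: float, r: float) -> str:
    """Return which position update the flow of FIG. 4 would apply."""
    if abs(energy) > 1.0:
        # Global exploration: formulas (1) and (2)
        return "exploration"
    if r >= 0.5:
        # Formula (6) or formula (9)
        return "soft besiege" if abs(energy) >= 0.5 else "hard besiege"
    # Formulas (4), (5) and (10), or formulas (4), (5) and (11)
    if abs(energy) >= 0.5:
        return "soft besiege with dives"
    return "hard besiege with dives"
```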

Step S03: The improved Harris Hawk algorithm is used to optimize the hyperparameters of the LightGBM model to obtain a target parameter combination.

It should be explained that, in this embodiment and the following embodiments, the hyperparameters of the LightGBM model may be the number of leaf nodes per tree, the maximum depth of each tree, the model learning rate, and the minimum number of samples required per leaf node.

In a specific implementation, the population size, number of iterations and problem dimension of the improved Harris Hawk algorithm may be set to obtain a target Harris Hawk algorithm; the number of leaf nodes per tree, the maximum depth of each tree, the model learning rate and the minimum number of samples required per leaf node are taken as the hyperparameters of the LightGBM model; the training data set is input into the LightGBM model, and the individual fitness value is determined according to a preset fitness function; based on the individual fitness value, the population individuals are iteratively updated according to the target Harris Hawk algorithm until the number of iterations is reached, yielding the target parameter combination.
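One way to connect a population individual to the four hyperparameters is to treat each individual as a four-dimensional vector in [0, 1] and scale it into a search range. The sketch below illustrates this; the numeric ranges in `SEARCH_SPACE` and the function name `decode_position` are hypothetical, while the parameter names are those listed for the LightGBM model in this document.

```python
# Hypothetical search ranges: (lower bound, upper bound, type).
SEARCH_SPACE = {
    "num_leaves": (8, 256, int),
    "max_depth": (3, 12, int),
    "learning_rate": (0.01, 0.3, float),
    "min_child_samples": (5, 100, int),
}


def decode_position(position):
    """Map a 4-dimensional individual in [0, 1]^4 to a hyperparameter dict."""
    params = {}
    for value, (name, (low, high, kind)) in zip(position, SEARCH_SPACE.items()):
        clipped = min(max(value, 0.0), 1.0)  # keep the individual inside the bounds
        scaled = low + clipped * (high - low)
        params[name] = int(round(scaled)) if kind is int else scaled
    return params


lowest = decode_position([0.0, 0.0, 0.0, 0.0])
highest = decode_position([1.0, 1.0, 1.0, 1.0])
clipped = decode_position([2.0, -1.0, 0.5, 0.5])
```

A dictionary of this shape could then be passed as keyword arguments to a LightGBM regressor when the fitness of the individual is evaluated.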

It should be noted that the preset fitness function is given by formula (12),

where y_pred is the predicted used car price for sample i, y_i is the actual used car price of sample i, n is the number of samples, and parameter denotes the LightGBM hyperparameters being optimized.

Step S04: After an initial LightGBM model is established based on the target parameter combination, the initial LightGBM model is trained using the training data set to obtain the target used car price prediction model.

In a specific implementation, an initial LightGBM model may be established based on the LightGBM model and the target parameter combination; the initial LightGBM model is trained using the training data set to obtain a training result; and the initial LightGBM model is optimized according to the training result to obtain the target used car price prediction model.

This embodiment obtains a used car sales data set and preprocesses it to obtain a training data set; improves the Harris Hawk algorithm by introducing the Chebyshev chaotic map, the nonlinear decreasing disturbance factor and the adaptive weight factor; uses the improved Harris Hawk algorithm to optimize the hyperparameters of the LightGBM model to obtain a target parameter combination; and, after establishing an initial LightGBM model based on the target parameter combination, trains the initial LightGBM model using the training data set to obtain the target used car price prediction model. Compared with the prior art, optimizing the LightGBM hyperparameters with the improved Harris Hawk algorithm addresses the tendency of subjectively chosen LightGBM parameters to fall into a local optimum, and improves the learning ability of the LightGBM model.

Refer to FIG. 5, which is a flow chart of a third embodiment of the used car price prediction method of the present invention.

Based on the above embodiments, in this embodiment, step S01 includes:

Step S011: obtaining a used car sales data set and preprocessing it to obtain a preprocessed used car sales data set.

Step S012: splitting the preprocessed used car sales data set according to a preset ratio to obtain a training data set and a test data set.

It can be understood that the preset ratio may be user-defined, or set according to the characteristics of the used car sales data set, for example 6:4, 7:3 or 8:2; this embodiment places no limit on it.

In a specific implementation of this embodiment, the preprocessed used car sales data set may be split in a ratio of 8:2 to obtain a training data set and a test data set, which are used for model training and testing respectively.
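The 8:2 split can be sketched as follows. The function name `split_dataset`, the shuffling step and the fixed seed are assumptions for the example; the document only specifies the ratio.

```python
import random


def split_dataset(rows, train_ratio=0.8, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Ten stand-in records split 8:2.
train, test = split_dataset(range(10))
```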

Step S05: using the target used car price prediction model to predict the test data set to obtain used car price prediction values.

Step S06: selecting evaluation indicators and evaluating the target used car price prediction model based on the used car price prediction values, wherein the evaluation indicators include the coefficient of determination, the mean square error and the mean absolute error.

For example, refer to FIG. 6, which is a schematic diagram of the overall design of the target used car price prediction model in the used car price prediction method of the present invention.

Step 1: Preprocess the data set and split it into a training set and a test set (that is, split the preprocessed used car sales data set according to a preset ratio to obtain the training data set and the test data set);

Step 2: Set the population size N, the number of iterations T and the problem dimension D of the iHHO algorithm (that is, the improved Harris Hawk algorithm);

Step 3: Set the search ranges of four LightGBM hyperparameters: the number of leaf nodes per tree num_leaves, the maximum depth of each tree max_depth, the model learning rate learning_rate, and the minimum number of samples required per leaf node min_child_samples;

Step 4: Input the training set (that is, the training data set) into the LightGBM model and calculate the individual fitness values according to formula (12);

Step 5: Iteratively update the population individuals according to the iHHO algorithm;

Step 6: If the iteration termination condition is reached, go to Step 7; otherwise, return to Step 4;

Step 7: Output the optimal parameter combination (that is, the target parameter combination);

Step 8: Establish the LightGBM model from the optimal parameter combination and train it on the training set (the training data set);

Step 9: Use the trained model to predict the test set;

Step 10: Output the used car price prediction values and evaluate the prediction model.
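Steps 4 to 7 form a generic evaluate-and-update loop, which can be sketched as below. This is not the iHHO update itself: `update` is a hypothetical stand-in for the position update, `fitness` is any error-like function where lower is better (an assumption), and the toy objective at the end exists only to exercise the loop.

```python
def optimise(fitness, update, population, max_iter):
    """Evaluate fitness, update individuals, and keep the best one found."""
    best = min(population, key=fitness)  # Step 4: evaluate the initial population
    for t in range(1, max_iter + 1):  # Step 6: stop after max_iter iterations
        # Step 5: move every individual (stand-in for the iHHO update)
        population = [update(ind, best, t, max_iter) for ind in population]
        candidate = min(population, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
    return best  # Step 7: best parameter combination found


def toy_fitness(x):
    """Toy one-dimensional objective with its minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_update(x, best, t, max_iter):
    """Toy update: move each individual halfway toward the current best."""
    return x + 0.5 * (best - x)


result = optimise(toy_fitness, toy_update, [0.0, 2.0, 5.0], max_iter=10)
```

In the setting of this document, an individual would be a four-dimensional hyperparameter vector and the fitness would come from training and scoring a LightGBM model.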

It should be noted that the coefficient of determination R², the root mean square error RMSE (the mean square error indicator referred to above) and the mean absolute error MAE are defined as follows:

where n is the number of samples, y_pred is the predicted used car price, y_mean is the mean, and y is the true used car price. MAE measures the overall prediction accuracy; a smaller value indicates a better fit. RMSE quantifies the magnitude of the prediction error; a smaller value indicates higher prediction accuracy. R² measures how well the model captures the variability of the target variable; a value closer to 1 indicates a better fit.
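The three indicators can be computed with a few lines of code. The sketch below uses the textbook definitions of R², RMSE and MAE, taking y_mean as the mean of the true prices; this is assumed to match the definitions intended here, and the function name and sample values are illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE and MAE from true and predicted prices."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n,
    }


# Made-up prices: three exact predictions and one that is off by 2.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```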

In a specific implementation, the open used car data set VW may also be introduced to validate the target used car price prediction model, and the Pearson correlation coefficient may be used to analyze the correlation of the features that affect used car prices, showing that the target used car price prediction model outperforms the comparison models in prediction accuracy and data fitting.
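For reference, the Pearson correlation coefficient between one feature and the price can be computed as in the sketch below. The function name and the mileage and price figures are invented for illustration and are not taken from the VW data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Made-up example: higher mileage goes with a lower price.
mileage = [10000.0, 40000.0, 80000.0, 120000.0]
price = [20000.0, 16000.0, 11000.0, 7000.0]
coefficient = pearson(mileage, price)
```

A coefficient near -1 or +1 marks a feature that moves strongly with price, while a value near 0 marks a weak linear relationship.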

In this embodiment, after the step of training the initial LightGBM model using the training data set to obtain the target used car price prediction model, the target used car price prediction model is used to predict the test data set to obtain used car price prediction values; evaluation indicators are then selected, and the target used car price prediction model is evaluated based on the used car price prediction values, the evaluation indicators including the coefficient of determination, the mean square error and the mean absolute error. Compared with the prior art, because this embodiment evaluates the target used car price prediction model with the coefficient of determination, the mean square error and the mean absolute error, the evaluation shows that the target used car price prediction model outperforms the comparison models in prediction accuracy and data fitting.

In addition, an embodiment of the present invention further provides a storage medium on which a used car price prediction program is stored. When the used car price prediction program is executed by a processor, the steps of the used car price prediction method described above are implemented.

Refer to FIG. 7, which is a structural block diagram of a first embodiment of the second-hand car price prediction device of the present invention.

As shown in FIG. 7, the second-hand car price prediction device provided in the embodiment of the present invention includes a data processing module 701 and a price output module 702.

The data processing module 701 is configured to obtain vehicle attribute information of a second-hand car and preprocess the vehicle attribute information to obtain target second-hand car data.

The price output module 702 is configured to input the target second-hand car data into the target second-hand car price prediction model to obtain a second-hand car price prediction value. The target second-hand car price prediction model is obtained by improving the Harris Hawk algorithm through the introduction of the Chebyshev chaotic map, the nonlinear decreasing disturbance factor and the adaptive weight factor, and then optimizing the hyperparameters of the LightGBM model based on the improved Harris Hawk algorithm.
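The division of work between the two modules can be pictured with the following sketch. The class names mirror modules 701 and 702, but the attribute names, the feature ordering and the stand-in linear predictor are all hypothetical; in practice the predictor would be the trained LightGBM model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataProcessingModule:
    """Module 701: turn raw vehicle attributes into model input."""
    feature_order: List[str]

    def process(self, attributes: Dict[str, float]) -> List[float]:
        # Minimal preprocessing: fix the feature order and cast to float.
        return [float(attributes[name]) for name in self.feature_order]


@dataclass
class PriceOutputModule:
    """Module 702: feed the processed data to the prediction model."""
    predict: Callable[[List[float]], float]

    def output(self, features: List[float]) -> float:
        return self.predict(features)


def stand_in_model(features: List[float]) -> float:
    """Made-up linear predictor used only to run the example."""
    return 100000.0 - 0.5 * features[0] - 2000.0 * features[1]


processing = DataProcessingModule(feature_order=["mileage_km", "age_years"])
pricing = PriceOutputModule(predict=stand_in_model)
predicted = pricing.output(processing.process({"mileage_km": 40000, "age_years": 3}))
```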

Based on the first embodiment of the second-hand car price prediction device of the present invention, a second embodiment of the second-hand car price prediction device of the present invention is proposed.

In this embodiment, the second-hand car price prediction module 701 is further configured to obtain a used car sales data set and preprocess it to obtain a training data set; improve the Harris Hawk algorithm by introducing the Chebyshev chaotic map, the nonlinear decreasing disturbance factor and the adaptive weight factor to obtain an improved Harris Hawk algorithm; use the improved Harris Hawk algorithm to optimize the hyperparameters of the LightGBM model to obtain a target parameter combination; and, after establishing an initial LightGBM model based on the target parameter combination, train the initial LightGBM model using the training data set to obtain the target second-hand car price prediction model.

The second-hand car price prediction module 701 is further configured to introduce the Chebyshev chaotic map into the global exploration phase of the Harris Hawk algorithm to obtain an improved global exploration phase; improve the prey escape energy in the Harris Hawk algorithm with a nonlinear decreasing disturbance factor to obtain an improved escape energy formula; weight the optimal individual in the population of the Harris Hawk algorithm with an adaptive weight factor to obtain an optimal individual update formula; and obtain the improved Harris Hawk algorithm based on the improved global exploration phase, the improved escape energy formula and the optimal individual update formula.

The second-hand car price prediction module 701 is further configured to set the population size, number of iterations and problem dimension of the improved Harris Hawk algorithm to obtain a target Harris Hawk algorithm; take the number of leaf nodes per tree, the maximum depth of each tree, the model learning rate and the minimum number of samples required per leaf node as the hyperparameters of the LightGBM model; input the training data set into the LightGBM model and determine the individual fitness value according to a preset fitness function; and, based on the individual fitness value, iteratively update the population individuals according to the target Harris Hawk algorithm until the number of iterations is reached, yielding the target parameter combination.

The second-hand car price prediction module 701 is further configured to establish an initial LightGBM model based on the LightGBM model and the target parameter combination; train the initial LightGBM model using the training data set to obtain a training result; and optimize the initial LightGBM model according to the training result to obtain the target second-hand car price prediction model.

For other embodiments or specific implementations of the second-hand car price prediction device of the present invention, refer to the method embodiments above; they are not repeated here.

It should be noted that, in this document, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or system that includes the element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory/random access memory, a magnetic disk or an optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims

1. A second-hand car price prediction method, characterized by comprising the following steps:

acquiring second-hand car attribute information, and preprocessing the vehicle attribute information to obtain target second-hand car data;

inputting the target second-hand car data into a target second-hand car price prediction model to obtain a second-hand car price prediction value, wherein the target second-hand car price prediction model is obtained by improving a Harris Hawk algorithm through the introduction of a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, and then optimizing the hyperparameters of a LightGBM model based on the improved Harris Hawk algorithm.

2. The second-hand car price prediction method according to claim 1, wherein, before the step of acquiring second-hand car attribute information and preprocessing the vehicle attribute information to obtain target second-hand car data, the method further comprises:

acquiring a second-hand car sales data set, and preprocessing the second-hand car sales data set to obtain a training data set;

improving the Harris Hawk algorithm by introducing the Chebyshev chaotic map, the nonlinear decreasing disturbance factor and the adaptive weight factor to obtain an improved Harris Hawk algorithm;

optimizing the hyperparameters of the LightGBM model using the improved Harris Hawk algorithm to obtain a target parameter combination;

and, after an initial LightGBM model is established based on the target parameter combination, training the initial LightGBM model using the training data set to obtain the target second-hand car price prediction model.

3. The second-hand car price prediction method according to claim 2, wherein the step of improving the Harris Hawk algorithm by introducing the Chebyshev chaotic map, the nonlinear decreasing disturbance factor and the adaptive weight factor to obtain an improved Harris Hawk algorithm comprises:

introducing the Chebyshev chaotic map into the global exploration phase of the Harris Hawk algorithm to obtain an improved global exploration phase;

improving the prey escape energy in the Harris Hawk algorithm with the nonlinear decreasing disturbance factor to obtain an improved escape energy formula;

weighting the optimal individual in the population of the Harris Hawk algorithm with the adaptive weight factor to obtain an optimal individual update formula;

and obtaining the improved Harris Hawk algorithm based on the improved global exploration phase, the improved escape energy formula and the optimal individual update formula.

4. The second-hand car price prediction method according to claim 3, wherein the Chebyshev chaotic map is:

$$CM = \cos\left(t \cos^{-1}(\gamma)\right)$$

where t is the current iteration number and γ is the chaotic initial value;

the improved global exploration phase model is as follows:

where X(t+1) and X(t) are the position of the Harris hawk at the next iteration and its current position, respectively; X_rabbit(t) is the position of the prey; CM₁, CM₂, CM₃, CM₄ and q are random numbers in (0, 1); UB and LB are the upper and lower bounds of the individual position search, respectively; and X_m(t) is the average position of the population;

correspondingly, the improved escape energy formula is:

$$E = 2E_0\left(2r\left(0.5 + \cos\left(\left(\pi\,\frac{t}{T}\right)^{1/2}\right)\right)\right);$$

where E₀ is the initial escape energy, T is the maximum number of iterations, t is the current iteration number, and r is a random number in (0, 1);

the adaptive weight factor is:

the optimal individual update formula is:

$$X'_{rabbit} = w(t) \times X_{rabbit}.$$

5. The second-hand car price prediction method according to claim 2, wherein the step of acquiring a second-hand car sales data set and preprocessing the second-hand car sales data set to obtain a training data set comprises:

acquiring a second-hand car sales data set, and preprocessing the second-hand car sales data set to obtain a preprocessed second-hand car sales data set;

splitting the preprocessed second-hand car sales data set according to a preset ratio to obtain a training data set and a test data set;

correspondingly, after the step of training the initial LightGBM model using the training data set to obtain the target second-hand car price prediction model, the method further comprises:

predicting the test data set using the target second-hand car price prediction model to obtain second-hand car price prediction values;

and selecting evaluation indicators and evaluating the target second-hand car price prediction model based on the second-hand car price prediction values, wherein the evaluation indicators comprise the coefficient of determination, the mean square error and the mean absolute error.

6. The second-hand car price prediction method according to claim 2, wherein the step of optimizing the hyperparameters of the LightGBM model using the improved Harris Hawk algorithm to obtain a target parameter combination comprises:

setting the population size, number of iterations and problem dimension of the improved Harris Hawk algorithm to obtain a target Harris Hawk algorithm;

taking the number of leaf nodes per tree, the maximum depth of each tree, the model learning rate and the minimum number of samples required per leaf node in the LightGBM model as the hyperparameters of the LightGBM model;

inputting the training data set into the LightGBM model, and determining an individual fitness value according to a preset fitness function;

and, based on the individual fitness value, iteratively updating the population individuals according to the target Harris Hawk algorithm until the number of iterations is reached, to obtain the target parameter combination.

7. The second-hand car price prediction method according to claim 6, wherein the step of training the initial LightGBM model using the training data set, after an initial LightGBM model is established based on the target parameter combination, to obtain the target second-hand car price prediction model comprises:

establishing an initial LightGBM model based on the LightGBM model and the target parameter combination;

training the initial LightGBM model using the training data set to obtain a training result;

and optimizing the initial LightGBM model according to the training result to obtain the target second-hand car price prediction model.

8. A second-hand car price prediction device, characterized in that the device comprises:

a data processing module, configured to acquire second-hand car attribute information and preprocess the vehicle attribute information to obtain target second-hand car data;

a price output module, configured to input the target second-hand car data into a target second-hand car price prediction model to obtain a second-hand car price prediction value, wherein the target second-hand car price prediction model is obtained by improving a Harris Hawk algorithm through the introduction of a Chebyshev chaotic map, a nonlinear decreasing disturbance factor and an adaptive weight factor, and then optimizing the hyperparameters of a LightGBM model based on the improved Harris Hawk algorithm.

9. Second-hand car price prediction equipment, characterized in that the equipment comprises: a memory, a processor, and a second-hand car price prediction program stored on the memory and executable on the processor, the second-hand car price prediction program being configured to implement the steps of the second-hand car price prediction method according to any one of claims 1 to 7.

10. A storage medium, characterized in that a second-hand car price prediction program is stored on the storage medium, and the second-hand car price prediction program, when executed by a processor, implements the steps of the second-hand car price prediction method according to any one of claims 1 to 7.