
CN111882114B - A short-term traffic flow prediction model construction method and prediction method - Google Patents


Info

Publication number
CN111882114B
Authority
CN
China
Prior art keywords
traffic flow
short
data
time traffic
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202010628317.5A
Other languages
Chinese (zh)
Other versions
CN111882114A (en)
Inventor
孙朝云
李伟
郝雪丽
凤少伟
曹磊
裴莉莉
户媛姣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University
Priority to CN202010628317.5A
Publication of CN111882114A
Application granted
Publication of CN111882114B
Expired - Fee Related (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Educational Administration (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a short-term traffic flow prediction model construction method and a short-term traffic flow prediction method. Because the choice of training set strongly affects a neural network's short-term traffic flow predictions, the K-Means clustering algorithm is used to cluster historical short-term traffic flow data so that prediction is targeted, which improves the accuracy of the prediction results.

Description

A short-term traffic flow prediction model construction method and prediction method

Technical field

The invention belongs to the field of intelligent transportation systems, and specifically relates to a short-term traffic flow prediction model construction method and a prediction method.

Background art

An intelligent transportation system applies modern technology to transportation planning and management and establishes an intelligent traffic management system whose main advantages are accuracy, efficiency and real-time operation. Through its feedback, the relevant authorities can promptly grasp the traffic conditions in areas of concern, guide traffic operation effectively, reduce or even avoid traffic congestion, and obtain information that supports urban road planning, energy saving and emission reduction.

With the rise of artificial intelligence research, neural network models have achieved remarkable results in short-term traffic flow prediction. However, factors such as road conditions, weather and date strongly affect the distribution of the short-term traffic flow time series, so existing neural network models cannot predict traffic flow accurately.

Summary of the invention

In view of the shortcomings of the prior art, the purpose of the present invention is to provide a short-term traffic flow prediction method based on K-Means clustering and a GRU network, solving the technical problem that the prior art cannot predict traffic flow accurately.

To solve the above technical problem, this application adopts the following technical solution:

A short-term traffic flow prediction model construction method comprises the following steps:

Step 1: obtain traffic flow data over a period of time to form a traffic flow data set, and preprocess it to obtain a preprocessed traffic flow data set;

Step 2: cluster the data in the preprocessed traffic flow data set obtained in step 1 with the K-Means algorithm to obtain short-term traffic flow pattern categories, and build a short-term traffic flow pattern library;

Step 3: determine the prediction date and obtain the short-term traffic flow data at the N time points preceding it; determine in turn the feature vector of the short-term traffic flow pattern at each of these N time points, and use a classification method to determine the short-term traffic flow pattern corresponding to each feature vector; then select the traffic flow data belonging to the pattern that occurs most frequently among the N time points to form the training data set;

Step 4: train the GRU neural network model on the training data set formed in step 3; after training, the short-term traffic flow prediction model is obtained.

The GRU neural network model adopts a stacked structure consisting of an input layer, a first GRU unit layer, a second GRU unit layer and a third GRU unit layer connected in sequence, with a dropout layer after the third GRU unit layer to prevent overfitting.

Specifically, the preprocessing in step 1 includes data deletion, data interpolation, data denoising and normalization.

Specifically, in step 2, clustering the short-term traffic flow data obtained in step 1 with the K-Means algorithm, to obtain the short-term traffic flow categories that make up the pattern library, includes selecting K points as initial cluster centers, where K ranges from 2 to 7.

Specifically, the parameter settings of the GRU network used in step 4 are: prediction step size 8, 12 hidden-layer neurons, learning rate 0.02 and 800 iterations.

A short-term traffic flow prediction method comprises the following steps:

Step 1: obtain the traffic flow data set to be predicted and preprocess it to obtain a preprocessed traffic flow data set to be predicted;

Step 2: input the preprocessed traffic flow data set to be predicted into the short-term traffic flow prediction model obtained by the construction method according to any one of claims 1 to 5, and obtain the category of the short-term traffic flow to be predicted.

Compared with the prior art, the present invention has the following beneficial technical effects:

The invention combines K-Means clustering with a GRU neural network. Because the choice of training set strongly affects the neural network's short-term traffic flow predictions, historical short-term traffic flow data are clustered with the K-Means algorithm so that prediction is targeted, which improves prediction accuracy.

Description of the drawings

Figure 1 is a program flow chart of the present invention;

Figure 2 shows the K-means clustering result obtained in Example 1 of the present invention;

Figure 3 is a flow chart of the GRU prediction model;

Figure 4 shows the short-term traffic flow prediction result obtained in Example 1 of the present invention, where the solid line is the actual short-term traffic flow and the dotted line is the predicted value;

Figure 5 compares the prediction errors of the KMeans-GRU model, the conventional GRU network model, the ARIMA model and the SAEs model.

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

Detailed description of the embodiments

The overall technical concept and principle of the present invention are as follows: traffic flow data over a period of time are clustered, the extracted data information is fed into a GRU neural network with the short-term traffic flow pattern category as output, and the GRU neural network model is trained; after training, the short-term traffic flow prediction model is obtained, enabling accurate prediction of short-term traffic flow.

The definitions and concepts involved in the present invention are explained below:

Euclidean metric (Euclidean distance): a commonly used distance, namely the true distance between two points in m-dimensional space, or the natural length of a vector (the distance from the point to the origin). In two- and three-dimensional space, the Euclidean distance is the actual distance between two points.

Two-norm: the 2-norm of a matrix $A$ is the square root of the largest eigenvalue of $A^{H}A$, the product of the conjugate transpose of $A$ with $A$. For a vector, the 2-norm is its Euclidean length, so the 2-norm of the difference of two vectors is the straight-line distance between them.
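The following minimal NumPy sketch (an illustration, not code from the patent) checks both statements numerically: the Euclidean distance between two vectors is the 2-norm of their difference, and the matrix 2-norm equals the square root of the largest eigenvalue of $A^{H}A$. The function names are hypothetical.

```python
import numpy as np


def euclidean_distance(x, y):
    """Straight-line distance between two vectors: the 2-norm of x - y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.linalg.norm(x - y))  # same as sqrt(sum((x - y) ** 2))


def matrix_two_norm(a):
    """Spectral (2-)norm: square root of the largest eigenvalue of A^H A."""
    a = np.asarray(a)
    gram = a.conj().T @ a                 # A^H A is Hermitian positive semidefinite
    eigvals = np.linalg.eigvalsh(gram)    # real eigenvalues in ascending order
    return float(np.sqrt(max(eigvals[-1], 0.0)))  # guard against tiny negative round-off


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.isclose(matrix_two_norm(A), np.linalg.norm(A, 2)))  # True
```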

To make the purpose and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and examples, and its advantages are demonstrated through analysis of comparative examples. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

Principle of the K-means clustering algorithm: with k as a parameter, n objects are divided into k clusters so that similarity within each cluster is high and similarity between clusters is low. The algorithm proceeds as follows: first, k objects are randomly selected as centroids, each initially representing the mean or center of a cluster; each remaining object is assigned to the cluster whose center is nearest to it; the algorithm then alternates between the data assignment step and the centroid update step until a stopping criterion is met (no data point changes cluster, the sum of within-cluster distances is minimized, or a maximum number of iterations is reached).

KNN classification: the core idea of the KNN model is simple. For each test sample, it computes the Euclidean distance to every sample in the training set, takes the K nearest training samples (K is the number of neighbors, chosen by hand; its value affects the result), counts how often each category occurs among these K samples, and assigns the most frequent category as the predicted category of the test sample.

The present invention also compares several evaluation metrics to assess the quality of the prediction results: mean absolute percentage error (MAPE), mean absolute error (MAE) and root mean square error (RMSE).

For all three metrics, a smaller error indicates a better prediction.
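For reference, the three metrics can be computed as below. This is a sketch assuming the usual definitions (MAPE reported in percent and averaged over all test points) and non-zero actual values; the patent does not spell out its exact formulas, and the function names are illustrative.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes y_true has no zeros."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


actual = [100, 200, 400]
predicted = [110, 190, 400]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```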

Example:

This embodiment comprises the following steps.

Step 1: obtain historical short-term traffic flow data over a period of time, then apply preprocessing (data filling, deletion, denoising and normalization) to obtain a preprocessed traffic flow data set.

As a specific implementation of the present invention, traffic flow data can be downloaded from an official database or collected independently. The historical traffic flow data in this example were downloaded from the US PeMS database and consist of traffic flow counts aggregated every 5 minutes from January 1, 2018 to January 31, 2018, 8,928 records in total.
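As a consistency check, 5-minute aggregation over the 31 days of January 2018 yields exactly the quoted record count, assuming no missing intervals in the download:

```latex
\frac{24 \times 60\ \text{min/day}}{5\ \text{min}} = 288\ \text{records/day},
\qquad 288 \times 31\ \text{days} = 8928\ \text{records}
```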

Step 11: for each input record of historical short-term traffic flow data, first check whether it is null; null records are deleted, and if more than 20% of a day's traffic flow records are null, that entire day's traffic flow data is deleted;
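A minimal sketch of Step 11, assuming the raw data are arranged as one row per day (288 five-minute intervals per row for the PeMS data) with `NaN` marking null records. The array layout and the function name are assumptions for illustration.

```python
import numpy as np


def drop_null_records(daily_flows, max_null_ratio=0.20):
    """Step 11 sketch.

    daily_flows: (days, intervals_per_day) array with NaN for null records.
    Days whose share of nulls exceeds max_null_ratio are dropped entirely;
    in the remaining days, the individual null records are deleted.
    Returns a list of 1-D arrays, one per kept day.
    """
    daily_flows = np.asarray(daily_flows, dtype=float)
    kept = []
    for day in daily_flows:
        null_mask = np.isnan(day)
        if null_mask.mean() > max_null_ratio:
            continue  # too many gaps: discard the whole day
        kept.append(day[~null_mask])  # delete the individual null records
    return kept


nan = np.nan
days = np.array([[1, 2, nan, 4, 5],      # 20% null: kept, null removed
                 [1, nan, nan, 4, 5]])   # 40% null: whole day dropped
print(drop_null_records(days))
```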

Step 12: if the record is not null, determine whether it is abnormal according to formula (1).

where $q$ is the short-term traffic flow, $C$ is the maximum road capacity, $T$ is the data collection interval, and $f_c$ is a correction coefficient. An abnormal value is first deleted and then replaced by the mean of the preceding and following values.
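Step 12 can be sketched as follows. Because formula (1) is not reproduced in this text, the sketch takes the resulting upper bound on plausible flow as a precomputed parameter `q_max` (a hypothetical name) instead of deriving it from $C$, $T$ and $f_c$. It also assumes that, when several anomalies are adjacent, the nearest valid value on each side is used, and that at either end of the series only the single available neighbour is used.

```python
import numpy as np


def repair_anomalies(flows, q_max):
    """Step 12 sketch: replace values above q_max by the mean of neighbouring valid values.

    q_max stands in for the bound given by formula (1), which is not derived here.
    """
    flows = np.asarray(flows, dtype=float).copy()
    bad = flows > q_max
    valid_idx = np.flatnonzero(~bad)
    if valid_idx.size == 0:
        raise ValueError("no valid values to interpolate from")
    for i in np.flatnonzero(bad):
        prev = valid_idx[valid_idx < i]   # valid positions before i
        nxt = valid_idx[valid_idx > i]    # valid positions after i
        neighbours = []
        if prev.size:
            neighbours.append(flows[prev[-1]])  # nearest valid value before
        if nxt.size:
            neighbours.append(flows[nxt[0]])    # nearest valid value after
        flows[i] = np.mean(neighbours)
    return flows


print(repair_anomalies([10, 12, 999, 14, 16], q_max=500))  # 999 -> mean of 12 and 14
```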

Step 13: normalize the short-term traffic flow data with the min-max method, as in formula (2):

$$X_i' = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \tag{2}$$

where $X_i$ is the i-th sample value, $X_i'$ is the normalized value of the i-th sample, $X_{\max}$ is the maximum value in the sample and $X_{\min}$ is the minimum value in the sample.
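A short sketch of formula (2). Returning $X_{\min}$ and $X_{\max}$ alongside the normalized series is a design choice of this illustration, not something the patent specifies: it lets the GRU predictions be mapped back to vehicle counts with the inverse transform.

```python
import numpy as np


def min_max_normalize(x):
    """Formula (2): scale values into [0, 1]; also return min and max for inversion."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("a constant series cannot be min-max normalized")
    return (x - x_min) / (x_max - x_min), x_min, x_max


def min_max_inverse(x_norm, x_min, x_max):
    """Map normalized values (e.g. model predictions) back to the original scale."""
    return np.asarray(x_norm, dtype=float) * (x_max - x_min) + x_min


normed, lo, hi = min_max_normalize([50, 100, 150])
print(normed)                              # [0.  0.5 1. ]
print(min_max_inverse(normed, lo, hi))     # [ 50. 100. 150.]
```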

Step 2: cluster the data in the preprocessed traffic flow data set obtained in step 1 with the K-Means algorithm to obtain short-term traffic flow pattern categories, and build a short-term traffic flow pattern library.

Step 21: first select K points as the initial cluster centers (K generally ranges from 2 to 7). K = 3 is used in this example.

Step 22: assign the data in the preprocessed traffic flow data set to clusters.

Each centroid defines a cluster. In this step, distances are computed as the Euclidean distance based on the 2-norm, and each data point is assigned to its nearest centroid. With $C$ the set of centroids $c_i$, each point $x$ in the data set is assigned to the cluster of the centroid given by formula (3):

$$\arg\min_{c_i \in C} \operatorname{dist}(c_i, x) \tag{3}$$

where $\operatorname{dist}(\cdot)$ is the Euclidean distance under the 2-norm.

Step 23: centroid update

In this step, each centroid is updated to the mean of all data points assigned to its cluster, as in formula (4), where $S_i$ is the set of data points assigned to the $i$-th cluster:

$$c_i = \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j \tag{4}$$

Steps 22 and 23 are repeated until no data point changes cluster, the sum of distances from each data point to its cluster centroid reaches a minimum, or the maximum number of iterations is reached. The resulting clusters are shown in Figure 2.
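Steps 21 to 23 amount to standard K-means (Lloyd's algorithm). The NumPy sketch below assumes each row of `data` is one feature vector to be clustered (the patent does not fix the vector layout) and draws the initial centroids at random from the data; the function name and the empty-cluster handling are assumptions.

```python
import numpy as np


def kmeans(data, k, max_iter=100, seed=0):
    """Steps 21-23 sketch: plain K-means with Euclidean (2-norm) distance.

    data: (n_samples, n_features) array, one feature vector per row.
    Returns (centroids, labels).
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 21: pick k distinct samples as the initial centroids
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Step 22: assign each sample to its nearest centroid, as in formula (3)
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no sample changed cluster: converged
        labels = new_labels
        # Step 23: move each centroid to the mean of its members, as in formula (4)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the previous centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels


points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, assignment = kmeans(points, k=2)
print(assignment)
```

In the example, K = 3 would correspond to calling `kmeans(data, k=3)` on the preprocessed vectors.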

Step 3: take the short-term traffic flow at the 24 time points before the prediction time (the preceding 2 hours) as the state vector, and use the Euclidean distance between these 24 values and the short-term traffic flow in the pattern library obtained in step 2 as the measure of data similarity.

Step 31: compute the Euclidean distance between all data points from 00:00 on January 1, 2018 to 24:00 on January 30, 2018 and the data at the 24 time points before 00:00 on January 31, 2018.

Step 32: sort the Euclidean distances computed in step 31 in ascending order.

Step 33: determine and select the N sample points closest to the unknown sample (N is chosen from the range [3, 7] in the application example).

Step 34: count how often each category occurs among the selected N points; N = 5 is used in this embodiment.

Step 35: take the most frequent category as the category of the current sample.
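Steps 31 to 35 are a K-nearest-neighbour majority vote against the pattern library. The sketch below assumes the library is stored as an array of vectors with the same length as the query (24 values in this example) and one cluster label per vector; the names are hypothetical. With `Counter.most_common`, a tie is resolved in favour of the category that appears first in the distance-sorted neighbour list.

```python
from collections import Counter

import numpy as np


def knn_pattern(query, library, library_labels, n_neighbors=5):
    """Steps 31-35 sketch: classify `query` by majority vote of its nearest library vectors."""
    library = np.asarray(library, dtype=float)
    query = np.asarray(query, dtype=float)
    dists = np.linalg.norm(library - query, axis=1)   # Step 31: Euclidean distances
    order = np.argsort(dists, kind="stable")          # Step 32: ascending order
    nearest = order[:n_neighbors]                     # Step 33: N closest samples
    votes = Counter(np.asarray(library_labels)[nearest].tolist())  # Step 34: frequencies
    return votes.most_common(1)[0][0]                 # Step 35: most frequent category


lib = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [12, 12]], dtype=float)
labels = [0, 0, 0, 1, 1, 1, 1]
print(knn_pattern([10.5, 10.5], lib, labels, n_neighbors=5))  # 1
```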

Step 4: select the data in the category most similar to the traffic flow pattern of January 31, 2018 as training data for the GRU network, then make predictions with the designed GRU network.
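Before training, the data of the selected category must be cut into input/target pairs. A minimal sketch, assuming a window of 12 time steps (matching the (None, 12, 1) input shape given for the network) and a one-step-ahead target; how the patent maps its "prediction step size 8" onto the windows is not stated, so `horizon` is left as a parameter. The function name is hypothetical.

```python
import numpy as np


def make_windows(series, window=12, horizon=1):
    """Build (samples, window, 1) inputs and targets `horizon` steps after each window."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window and horizon")
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window + horizon - 1: window + horizon - 1 + n]
    return X[..., None], y  # trailing axis = variable dimension (1)


X, y = make_windows(np.arange(20))
print(X.shape, y.shape)  # (8, 12, 1) (8,)
```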

The GRU network consists of one input layer, three GRU unit layers, one dropout layer and one output layer; its structure is shown in Figure 3. The stacked, multi-layer structure allows the model to represent the data at a deeper level. As shown in Figure 4.5, the three input parameters (None, 12, 1) of gru_1_input: InputLayer are the number of samples, the number of time steps and the variable dimension of the data fed to the input layer; the output shape of each layer must match the input shape of the next layer. The dropout layer after the three GRU layers (dropout rate between 0 and 0.2) reduces overfitting of the GRU network, and the final Dense (fully connected) layer converts the multi-dimensional GRU output into a one-dimensional output.
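To make the layer shapes concrete, the following NumPy sketch runs a forward pass through three stacked GRU layers of 12 units, the dropout layer (the identity at inference time) and a one-unit Dense output. It is an untrained illustration with random weights, not the patent's training code; in practice a framework such as Keras would be used. The gate update follows the common convention $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$; using 12 units per layer and feeding only the last time step of the third GRU layer to the Dense layer are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRULayer:
    """One GRU layer (forward pass only) returning the full hidden-state sequence."""

    def __init__(self, input_dim, units, rng):
        scale = 1.0 / np.sqrt(units)
        # input weights W, recurrent weights U and biases b for the
        # update (z), reset (r) and candidate (h) computations
        self.W = {g: rng.uniform(-scale, scale, (input_dim, units)) for g in "zrh"}
        self.U = {g: rng.uniform(-scale, scale, (units, units)) for g in "zrh"}
        self.b = {g: np.zeros(units) for g in "zrh"}
        self.units = units

    def __call__(self, x):
        batch, steps, _ = x.shape  # x: (batch, time_steps, input_dim)
        h = np.zeros((batch, self.units))
        outputs = []
        for t in range(steps):
            xt = x[:, t, :]
            z = sigmoid(xt @ self.W["z"] + h @ self.U["z"] + self.b["z"])  # update gate
            r = sigmoid(xt @ self.W["r"] + h @ self.U["r"] + self.b["r"])  # reset gate
            h_cand = np.tanh(xt @ self.W["h"] + (r * h) @ self.U["h"] + self.b["h"])
            h = z * h + (1.0 - z) * h_cand
            outputs.append(h)
        return np.stack(outputs, axis=1)  # (batch, time_steps, units)


def stacked_gru_forward(x, units=12, n_layers=3, seed=0):
    rng = np.random.default_rng(seed)
    seq = x
    for _ in range(n_layers):  # three stacked GRU unit layers
        seq = GRULayer(seq.shape[-1], units, rng)(seq)
    last = seq[:, -1, :]  # final hidden state of the third GRU layer
    # Dropout is only active during training, so it is the identity here.
    w_out = rng.uniform(-0.1, 0.1, (units, 1))
    return last @ w_out  # Dense layer -> (batch, 1)


batch = np.random.default_rng(1).random((4, 12, 1))  # shape (samples, 12, 1)
print(stacked_gru_forward(batch).shape)  # (4, 1)
```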

The hyperparameter settings of the GRU network are shown in Table 1.

Table 1 GRU network hyperparameter settings

Figure 4 shows the prediction results of this method, where the blue solid line is the actual short-term traffic flow and the orange solid line is the predicted value.

Figure 5 compares the errors of the KMeans-GRU model, the conventional GRU network model, the ARIMA model and the SAEs model.

Table 2 compares the prediction errors of the KMeans-GRU model with those of the conventional GRU network model, the ARIMA model and the SAEs model:

Table 2 Comparison of model evaluation metrics

Claims (3)

1. A short-term traffic flow prediction model construction method, characterized by comprising the following steps:
step 1, acquiring traffic flow data over a period of time to obtain a traffic flow data set, and preprocessing the obtained traffic flow data set to obtain a preprocessed traffic flow data set;
step 1.1: for the input historical short-term traffic flow data, first judging whether each record is null; if so, deleting it, and if more than 20% of the traffic flow records in one day are null, deleting that entire day's traffic flow data;
step 1.2: if the record is not null, judging whether it is abnormal according to formula (1);
wherein $q$ represents the short-term traffic flow, $C$ the maximum road capacity, $T$ the traffic data collection interval, and $f_c$ a correction coefficient;
if the record is abnormal, deleting it first and then interpolating with the mean of the previous value and the next value;
step 1.3: processing the short-term traffic flow data with the min-max normalization method, as shown in formula (2):
$$X_i' = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \tag{2}$$
wherein $X_i$ represents the i-th sample value, $X_i'$ the normalized value of the i-th sample, $X_{\max}$ the maximum value in the sample and $X_{\min}$ the minimum value in the sample;
step 2, clustering the data in the preprocessed traffic flow data set obtained in step 1 with the K-Means algorithm to obtain short-term traffic flow pattern categories, and establishing a short-term traffic flow pattern library;
step 2.1: first selecting K points as initial cluster centers, K ranging from 2 to 7;
step 2.2: assigning the data in the preprocessed traffic flow data set;
each centroid defines a cluster; in this step, the distance between data points is calculated as the Euclidean distance based on the 2-norm, and each data point is assigned to its nearest centroid, as shown in formula (3); with $C$ the set of centroids $c_i$, each point $x$ in the data set is assigned to the cluster of the centroid
$$\arg\min_{c_i \in C} \operatorname{dist}(c_i, x) \tag{3}$$
wherein $\operatorname{dist}(\cdot)$ is the Euclidean distance under the 2-norm;
step 2.3: centroid update;
in this step, each centroid is updated to the mean of all data points in its cluster $S_i$, as shown in formula (4):
$$c_i = \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j \tag{4}$$
iterating steps 2.2 and 2.3 until no data point changes cluster, the sum of distances from each data point to its cluster centroid is minimized, or the maximum number of iterations is reached;
step 3, determining a prediction date and acquiring the short-term traffic flow data at the N time points preceding it; determining in turn the feature vector of the short-term traffic flow pattern at each of the N time points, determining the short-term traffic flow pattern corresponding to each feature vector with a classification method, and selecting the traffic flow data belonging to the pattern that occurs most frequently among the N time points to form a training data set;
step 4, training the GRU neural network model on the training data set formed in step 3, and obtaining a short-term traffic flow prediction model after training is completed;
the GRU neural network model adopts a stacked structure comprising an input layer, a first GRU unit layer, a second GRU unit layer and a third GRU unit layer connected in sequence, and a dropout layer for preventing overfitting is connected after the third GRU unit layer;
the parameter settings of the GRU network model are: prediction step size 8, 12 hidden-layer neurons, learning rate 0.02 and 800 iterations.
2. The short-term traffic flow prediction model construction method according to claim 1, wherein the preprocessing in step 1 includes data deletion, data interpolation, data denoising and normalization.
3. A short-term traffic flow prediction method, characterized by comprising the following steps:
step 1, obtaining a traffic flow data set to be predicted and preprocessing it to obtain a preprocessed traffic flow data set to be predicted;
step 2, inputting the preprocessed traffic flow data set to be predicted into the short-term traffic flow prediction model obtained by the short-term traffic flow prediction model construction method according to claim 1 or 2, and obtaining the category of the short-term traffic flow to be predicted.
CN202010628317.5A 2020-07-01 2020-07-01 A short-term traffic flow prediction model construction method and prediction method Expired - Fee Related CN111882114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010628317.5A CN111882114B (en) 2020-07-01 2020-07-01 A short-term traffic flow prediction model construction method and prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010628317.5A CN111882114B (en) 2020-07-01 2020-07-01 A short-term traffic flow prediction model construction method and prediction method

Publications (2)

Publication Number Publication Date
CN111882114A CN111882114A (en) 2020-11-03
CN111882114B true CN111882114B (en) 2023-10-31

Family

ID=73149869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010628317.5A Expired - Fee Related CN111882114B (en) 2020-07-01 2020-07-01 A short-term traffic flow prediction model construction method and prediction method

Country Status (1)

Country Link
CN (1) CN111882114B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561188A (en) * 2020-12-22 2021-03-26 广州杰赛科技股份有限公司 People flow prediction method and device
CN113763700B (en) * 2021-04-26 2022-09-20 腾讯云计算(北京)有限责任公司 Information processing method, information processing device, computer equipment and storage medium
CN114255591A (en) * 2021-12-17 2022-03-29 重庆中信科信息技术有限公司 Short-term traffic flow prediction method and device considering space-time correlation and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
WO2013128486A1 (en) * 2012-02-29 2013-09-06 株式会社 日立製作所 Traffic amount prediction system
CN105701571A (en) * 2016-01-13 2016-06-22 南京邮电大学 Short-term traffic flow prediction method based on nerve network combination model
CN106875511A (en) * 2017-03-03 2017-06-20 深圳市唯特视科技有限公司 A kind of method for learning driving style based on own coding regularization network
CN107154150A (en) * 2017-07-25 2017-09-12 北京航空航天大学 A kind of traffic flow forecasting method clustered based on road with double-layer double-direction LSTM
CN109711640A (en) * 2019-01-23 2019-05-03 北京工业大学 A short-term traffic flow prediction method based on fuzzy C-means traffic flow clustering and error feedback convolutional neural network
CN110827544A (en) * 2019-11-11 2020-02-21 重庆邮电大学 A Short-Term Traffic Flow Control Method Based on Graph Convolutional Recurrent Neural Networks

Non-Patent Citations (3)

Title
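Cluster-Based LSTM Network for Short-Term Passenger Flow Forecasting in Urban Rail Transit; Jinlei Z. et al.; IEEE Access; full text *
Design and Implementation of a Passenger Flow Prediction Experiment for Major Urban Traffic Events; Xu Xiujuan; Zhang Wenxuan; Chen Zhunyue; Zhao Xiaowei; Xu Zhenzhen; Laboratory Science (No. 01); full text *
Traffic Flow Prediction Based on Deep Learning; Liu Mingyu et al.; Journal of System Simulation; pp. 4100-4114 *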

Also Published As

Publication number Publication date
CN111882114A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN115270965B (en) A distribution network line fault prediction method and device
CN109063911B (en) Load aggregation grouping prediction method based on gated cycle unit network
CN111882114B (en) A short-term traffic flow prediction model construction method and prediction method
CN109376772B (en) Power load combination prediction method based on neural network model
WO2021073462A1 (en) 10 kv static load model parameter identification method based on similar daily load curves
CN108416366A (en) A kind of power-system short-term load forecasting method of the weighting LS-SVM based on Meteorological Index
CN110414788A (en) A Power Quality Prediction Method Based on Similar Days and Improved LSTM
CN107220764A (en) A kind of electricity sales amount Forecasting Methodology compensated based on preamble analysis and factor and device
CN110059852A (en) A kind of stock yield prediction technique based on improvement random forests algorithm
CN114444378A (en) Short-term power prediction method for regional wind power cluster
CN108805213B (en) Power load curve double-layer spectral clustering method considering wavelet entropy dimensionality reduction
CN111008726B (en) A method for image-like conversion in power load forecasting
CN101699514B (en) SAR Image Segmentation Method Based on Immune Cloning Quantum Clustering
CN112036598B (en) A charging pile usage information prediction method based on multi-information coupling
CN110348608A (en) A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm
CN110503104A (en) A short-term prediction method for the number of remaining parking spaces based on convolutional neural network
CN113515512A (en) Quality control and improvement method for industrial internet platform data
CN113657671A (en) A flight delay prediction method based on ensemble learning
CN114936694B (en) A photovoltaic power prediction method based on dual integrated model
CN111178585A (en) Prediction method of fault reception volume based on multi-algorithm model fusion
CN111652444B (en) A method for predicting the number of daily tourists based on K-means and LSTM
CN112819208A (en) Spatial similarity geological disaster prediction method based on feature subset coupling model
CN110322075A (en) A kind of scenic spot passenger flow forecast method and system based on hybrid optimization RBF neural
CN114548212A (en) A method and system for evaluating water quality
CN116341929A (en) Prediction method based on clustering and adaptive gradient lifting decision tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2023-10-31