CN116703641A - Flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning - Google Patents
Flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning
- Publication number: CN116703641A
- Application number: CN202310703139.1A
- Authority: CN (China)
- Prior art keywords: workpiece, time, workshop, machine, decision
- Prior art date: 2023-06-14
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06Q50/04 — Manufacturing (ICT specially adapted for implementation of business processes of specific business sectors)
- G06N3/045 — Combinations of networks (neural network architectures; computing arrangements based on biological models)
- G06N3/092 — Reinforcement learning (neural network learning methods)
- G06Q10/06313 — Resource planning in a project environment (operations research; resource planning, allocation, distribution or scheduling for enterprises or organisations)
Description
Technical Field
The present invention relates to a flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning, and belongs to the technical field of dynamic scheduling.
Background Art
The job shop problem is one of the earliest and most widely studied production scheduling problems. Briefly: a workshop holds a set of workpieces, each with a fixed process route; every operation on a route must be performed on a machine, and the operations assigned to each machine and their processing sequence must be determined so as to meet one or more production objectives. The flexible multi-resource workshop scheduling problem is a more general case of this problem: compared with the job shop problem, it adds multi-resource constraints and relaxes the machine-assignment constraint. "Multi-resource" means that, besides workpieces and machine tools, transport resources such as Automated Guided Vehicles (AGVs) must also be considered; with the machine-assignment constraint relaxed, each operation of a workpiece may be processed on any one of several candidate machines. As flexible workshops have developed, workshop scheduling algorithms have been continuously enriched and extended.

At present, traditional scheduling methods fall into two main types: rule-based methods and heuristic algorithms. Dispatching rules can be simple or composite. Simple rules include first come first served, shortest processing time first, earliest due date first and shortest remaining processing time first; composite rules are combinations of several simple rules. At each scheduling moment, a rule-based method selects one workpiece for processing on one machine and one AGV to transport it. Such methods respond quickly and in real time, but they are inefficient over a long horizon and do not achieve good long-term results.

Commonly used heuristic algorithms include the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Simulated Annealing (SA) and Ant Colony Optimization (ACO). Heuristics search efficiently and can obtain a good solution within a limited time, but when facing dynamic problems they can only decompose the dynamic problem into a sequence of static sub-problems: an initial schedule is computed at the start of scheduling, and at every rescheduling moment the unscheduled resources are recomputed to obtain a new schedule. This requires a large amount of computation time and offers poor real-time performance, which makes it hard to apply to multi-resource scheduling problems in a changing environment.

An algorithm is therefore needed that balances responsiveness and optimization quality. In recent years, learning-based methods have gradually been applied to this problem. Q-learning is the most commonly used: it stores the Q-function in a table whose rows are states, whose columns are actions, and whose entries are the Q-values of the corresponding state-action pairs; at each scheduling moment, the action with the highest value in the current state is simply looked up in the table. In a real workshop, however, the state space is far too complex for building and maintaining such a table to be practical: it would not only reduce the efficiency of the algorithm but also introduce many useless states.
Summary of the Invention
To solve the problem that existing techniques cannot balance efficiency and performance, the main purpose of the present invention is to provide a flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning. The method defines the workshop state, the workshop scheduling actions and the decision reward function, and constructs the architecture and learning procedure of a neural network used for decision-making. At each decision moment of the workshop, the real-time production state is computed and fed into the trained neural network, which outputs a scheduling action according to the input state; the action is executed and the resulting reward is fed back to the network. In this way the total delay of the workpieces in the workshop is reduced and efficient production is achieved.

The purpose of the present invention is achieved through the following technical solution.

The flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning disclosed by the present invention comprises the following steps:

Step 1: Taking minimization of the total delay time of the workpieces as the optimization objective, construct the scheduling optimization problem, which determines the set of machines assigned to each operation, the processing sequence on each machine, and the AGV that transports each workpiece.

Step 2: Solve the scheduling optimization problem constructed in Step 1 with the D3QN (Dueling Double DQN) method to obtain a decision result. The decision result comprises the workpiece to be processed at the current time, the machine that processes it, and the AGV that transports it. Executing the decision result reduces the total delay time of all workpieces and improves the production efficiency of the workshop.
Step 2.1: The D3QN algorithm has two neural networks, the Main Q Network and the Target Q Network, with identical structures: an input layer, hidden layers and a dual-output subnet. The input dimension of the input layer equals the dimension of the workshop state; the hidden part uses one or more fully connected layers. The dual-output subnet comprises a state-value network Vn, which evaluates the value of the state s, and an advantage network An, which evaluates the relative merit of each action a in the current state. The value output Q of the network is obtained as the linear combination of Vn and An:

Q(s, a; ω, α, β) = Vn(s; ω, α) + An(s, a; ω, β) − (1/|A|) Σ_{a'∈A} An(s, a'; ω, β)

where ω denotes the parameters of the shared part of the dual-output subnet, α and β denote the parameters of the state-value network Vn and of the advantage network An respectively, and A is the set of all actions, a' ∈ A.
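By way of illustration, the following is a minimal PyTorch sketch of this dueling architecture. It is a sketch only: the class name, the two hidden layers and their sizes are assumptions chosen for illustration; the 14-dimensional input and 30 actions match the state features and composite dispatching rules defined in Steps 2.6 and 2.7 below.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Sketch of the dual-output architecture of Step 2.1.
    state_dim=14 matches the workshop state; action_dim=30 matches
    the composite dispatching rules. Layer sizes are illustrative."""
    def __init__(self, state_dim=14, action_dim=30, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden, 1)              # Vn: value of state s
        self.advantage_head = nn.Linear(hidden, action_dim) # An: advantage of each action

    def forward(self, s):
        h = self.backbone(s)
        v = self.value_head(h)       # shape (batch, 1)
        a = self.advantage_head(h)   # shape (batch, action_dim)
        # Combine V and A; subtracting the mean advantage keeps Q identifiable
        return v + a - a.mean(dim=1, keepdim=True)
```

The Target Q Network would be a second instance of the same class, updated by copying the Main Q Network's weights as described in Step 2.14.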
Step 2.2: Parameter setting. Set the parameters relevant to the flexible multi-resource workshop dynamic scheduling problem, including the hyperparameters and the neural-network parameters.

Step 2.3: Initialize the training episode counter episode = 0, the current time t = 0, and the state of the workshop resources.

Step 2.4: Judge whether the current episode has ended; if so, start the next episode and re-initialize the time and the workshop resource state, otherwise continue. The episode ends when all workpieces have finished processing.

Step 2.5: Judge whether the current moment is a decision point. A decision point is a moment at which there is a newly arrived workpiece, or a workpiece that has finished its previous operation and is waiting to be processed. If it is a decision point, make a decision and proceed to Step 2.6; otherwise jump to Step 2.15.
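A minimal sketch of the decision-point test of Step 2.5 is given below; the attribute names (`arrival_time`, `finished_previous_op`, `done`) are illustrative assumptions about how the simulation tracks workpieces.

```python
def is_decision_point(workshop, t):
    """A decision is made when a new workpiece has just arrived, or a
    workpiece has finished its previous operation and is waiting."""
    new_arrivals = any(j.arrival_time == t for j in workshop.jobs)
    ready_jobs = any(j.finished_previous_op and not j.done for j in workshop.jobs)
    return new_arrivals or ready_jobs
```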
Step 2.6: Compute the three classes of state features of the current workshop: the state of the arrived workpieces, the state of the machines and the state of the AGVs. Together the three classes comprise 14 features, as follows:

(1) Average completion rate of the arrived workpieces:

f1 = mean{jcr}

with workpiece completion rate jcr_i = OC_i / OP_i, where OP_i is the total number of operations of workpiece J_i and OC_i is the number of operations of J_i already completed;

(2) Standard deviation of the completion rate of the arrived workpieces:

f2 = std{jcr}

(3) Average completion rate of the arrived operations:

f3 = Σ_i OC_i / Σ_i OP_i

(4) Average transport waiting rate of the arrived operations:

f4 = mean{otwr}

with transport waiting rate of an operation otwr_ij = wtt_ij / mean_r{tt_ijr}, where wtt_ij is the transport waiting time of operation O_ij, tt_ijr is the transport time of O_ij on AGV A_r, and O_i = {O_i1, O_i2, ..., O_ij} is the operation set of workpiece J_i;

(5) Average processing waiting rate of the arrived operations:

f5 = mean{opwr}

with processing waiting rate opwr_ij = wpt_ij / mean_{M_k∈M_ij}{pt_ijk}, where wpt_ij is the processing waiting time of operation O_ij, pt_ijk is the processing time of O_ij on machine M_k, and M_ij is the set of machines available for O_ij;

(6) Average expected tardiness rate of the current operations of the arrived workpieces:

f6 = mean{cotr}

with expected tardiness rate of the current operation cotr_i = (ct_ic − dt_ic) / (C_i − at_i), where ct_ic is the completion time of operation O_ic, dt_ic is the operation due date of O_ic, C_i is the due date of workpiece J_i, O_ic is the operation of J_i currently being processed, and at_i is the arrival time of J_i;

(7) Standard deviation of the expected tardiness rate of the arrived workpieces:

f7 = std{cotr}

(8) Maximum expected tardiness rate of the arrived workpieces:

f8 = max{cotr}

(9) Average machine utilization:

f9 = mean{mur}

with machine utilization mur_k = RT_k / t, where RT_k is the cumulative working time of machine M_k and t is the current time;

(10) Standard deviation of machine utilization:

f10 = std{mur}

(11) Average remaining workload of the machines:

f11 = (1/m) Σ_k Σ_i Σ_j rpt_ij · x_ijk

where rpt_ij is the remaining processing time of operation O_ij, x_ijk indicates whether O_ij is processed on machine M_k, and m is the number of machines;

(12) Average AGV utilization:

f12 = mean{aur}

with AGV utilization aur_r = RT_r / t, where a is the number of AGVs and RT_r is the cumulative working time of AGV A_r;

(13) Standard deviation of AGV utilization:

f13 = std{aur}

(14) Average remaining task load of the AGVs:

f14 = (1/a) Σ_r Σ_i Σ_j rtt_ij · y_ijr

where rtt_ij is the remaining transport time of operation O_ij and y_ijr indicates whether O_ij is transported by AGV A_r.

The workshop state finally fed into the neural network is:

s = {f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14}
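As an illustration, here is a minimal NumPy sketch that computes a subset of these features (f1, f2 and f9–f14). The dictionary field names (`completed_ops`, `busy_time`, `remaining_work`, `remaining_transport`) are assumptions about the simulation's bookkeeping, and the waiting-rate and tardiness-rate features f3–f8 are elided for brevity.

```python
import numpy as np

def workshop_state(jobs, machines, agvs, t):
    """Sketch of part of the 14-dimensional state of Step 2.6.
    `jobs`, `machines` and `agvs` are assumed lists of dicts."""
    jcr = np.array([j["completed_ops"] / j["total_ops"] for j in jobs])
    mur = np.array([m["busy_time"] / t for m in machines]) if t > 0 else np.zeros(len(machines))
    aur = np.array([g["busy_time"] / t for g in agvs]) if t > 0 else np.zeros(len(agvs))
    f = [jcr.mean(), jcr.std()]                               # f1, f2
    # ... f3-f8 (operation completion, waiting and tardiness rates) elided ...
    f += [mur.mean(), mur.std()]                              # f9, f10
    f += [np.mean([m["remaining_work"] for m in machines])]   # f11
    f += [aur.mean(), aur.std()]                              # f12, f13
    f += [np.mean([g["remaining_transport"] for g in agvs])]  # f14
    return np.array(f, dtype=np.float32)
```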
Step 2.7: Make a decision. Generate a random number between 0 and 1; if it is less than or equal to the current exploration rate, select a composite dispatching rule at random; otherwise feed the current workshop state into the Main Q Network and select the composite dispatching rule with the highest value.

The composite dispatching rules are a set of rules generated by combining three classes of dispatching rules; each individual composite rule can directly select the corresponding workpiece, machine and AGV.

The three classes of dispatching rules are: rules for selecting the workpiece to process, rules for selecting the processing machine, and rules for selecting the transport AGV.
Rules for selecting the workpiece to process:

(1) Select the workpiece whose current operation has the shortest mean processing time:

J_d = argmin_{J_i∈L_t} (1/am_ic) Σ_{M_k∈M_ic} pt_ick

where J_d is the selected workpiece, L_t is the set of workpieces awaiting scheduling at the current time t, pt_ick is the processing time of operation O_ic on machine M_k, M_ic is the set of machines available for O_ic, and am_ic is the number of machines available for O_ic;

(2) Select the workpiece whose current operation has the longest mean processing time:

J_d = argmax_{J_i∈L_t} (1/am_ic) Σ_{M_k∈M_ic} pt_ick

(3) Select the workpiece with the shortest remaining processing time:

J_d = argmin_{J_i∈L_t} Σ_{O_ij remaining} (1/am_ij) Σ_{M_k∈M_ij} pt_ijk

where M_ij is the set of machines available for operation O_ij and am_ij is the number of machines available for O_ij;

(4) Select the workpiece with the longest remaining processing time:

J_d = argmax_{J_i∈L_t} Σ_{O_ij remaining} (1/am_ij) Σ_{M_k∈M_ij} pt_ijk

(5) Select the workpiece with the earliest due date:

J_d = argmin_{J_i∈L_t} D_i

where D_i is the due date of workpiece J_i.

Rules for selecting the processing machine:

(1) Among the available machines, select the one with the least remaining working time:

M_d = argmin_{M_k∈M_ic} mrt_k

where M_d is the selected machine and mrt_k is the remaining working time of machine M_k;

(2) Among the available machines, select the one with the lowest utilization:

M_d = argmin_{M_k∈M_ic} mur_k

(3) Among the available machines, select the fastest one:

M_d = argmin_{M_k∈M_ic} pt_ick

Rules for selecting the transport AGV:

(1) Select the AGV with the least remaining transport time:

A_d = argmin_{A_r} art_r

where A_d is the selected AGV and art_r is the remaining working time of AGV A_r;

(2) Select the AGV with the lowest utilization:

A_d = argmin_{A_r} aur_r

The five workpiece-selection rules, three machine-selection rules and two AGV-selection rules above are combined by permutation to generate thirty composite dispatching rules, as sketched below.
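A minimal sketch of how the thirty composite rules and the ε-greedy decision of Step 2.7 might be implemented follows; the short rule names and the calling convention of `main_q_network` (one value per rule) are illustrative assumptions.

```python
import itertools
import random

JOB_RULES = ["SPT", "LPT", "SRPT", "LRPT", "EDD"]       # five workpiece-selection rules
MACHINE_RULES = ["MinRemaining", "MinUtil", "Fastest"]  # three machine-selection rules
AGV_RULES = ["MinRemaining", "MinUtil"]                 # two AGV-selection rules

# 5 x 3 x 2 = 30 composite dispatching rules (the action space)
COMPOSITE_RULES = list(itertools.product(JOB_RULES, MACHINE_RULES, AGV_RULES))

def select_rule(state, main_q_network, epsilon):
    """Epsilon-greedy decision: explore with probability epsilon,
    otherwise pick the rule with the highest predicted Q-value."""
    if random.random() <= epsilon:
        return random.randrange(len(COMPOSITE_RULES))
    q_values = main_q_network(state)  # assumed to return one Q-value per rule
    return int(q_values.argmax())
```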
Step 2.8: Execute the dispatching rule. Use the composite rule decided upon to select the corresponding workpiece, machine and AGV.

Step 2.9: Compute the workshop state features after executing the dispatching rule, using the method of Step 2.6.

Step 2.10: Compute the reward. The optimization objective is to minimize the Total Delay Time (TDT), so the reward function must be designed so that the trend of maximizing the reward coincides with the optimization trend of the objective. Concretely: if the expected completion time of the operation of the workpiece selected by the decision is no later than the due date of that operation, the reward is 0; otherwise the reward is the negative of the difference between the expected completion time and the operation due date:
TDT = Σ_i max(0, C_i − D_i)

r_ij = −max(0, ct_ij − dt_ij)

where TDT is the total delay, C_i is the completion time of workpiece J_i, D_i is the due date of J_i, r_ij is the reward obtained from this decision, ct_ij is the completion time of operation O_ij, and dt_ij is the due date of operation O_ij.
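A one-function sketch of this reward is given below; both argument names are illustrative. Taking the reward as the negative tardiness of the selected operation means that maximizing the cumulative return minimizes the total delay time.

```python
def step_reward(expected_completion, op_due_date):
    """Reward of Step 2.10 (sketch): zero when the operation is expected
    to finish on time, otherwise the negative of its tardiness."""
    return -max(0.0, expected_completion - op_due_date)
```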
Step 2.11: Judge whether the current decision is the last decision of the episode; if so, done = 1, otherwise done = 0.

Step 2.12: Store each scheduling experience in the experience pool in the form (s_t, a_t, s_{t+1}, r, done), providing experience for the subsequent learning and training of the neural network. Here s_t is the workshop state before the decision, a_t is the dispatching rule decided upon, r is the reward obtained from the decision, s_{t+1} is the workshop state after executing the decision, and done indicates whether this decision is the last one.

Step 2.13: Experience replay. If the number of experiences stored in the pool exceeds the minimum batch size, randomly draw one batch of experiences from the pool for training and updating the neural network; otherwise do nothing.
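A minimal sketch of such an experience pool follows, assuming the capacity of 4000 and batch size of 32 used in the embodiment below; the class and method names are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch of the experience pool of Steps 2.12-2.13."""
    def __init__(self, capacity=4000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are evicted

    def store(self, s, a, s_next, r, done):
        self.buffer.append((s, a, s_next, r, done))

    def sample(self, batch_size=32):
        # Uniform sampling U(B); caller should check len(buffer) >= batch_size
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```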
Step 2.14: Update the Main Q Network of D3QN. Feed the state s_t into the Main Q Network to estimate its Q-value; feed the state s_{t+1} into the Target Q Network and combine it with the reward r to obtain the target value Y. The parameters of the Main Q Network are updated by gradient descent, while the parameters of the Target Q Network are obtained by copying the parameters of the Main Q Network at fixed intervals: every twenty decisions, the weights of the Target Q Network are updated.

Y = r + γ Q_t(s_{t+1}, argmax_{a_m} Q(s_{t+1}, a_m; ω, α, β); ω_t, α_t, β_t) · (1 − done)

L = E_{(s_t, a_t, s_{t+1}, r, done) ~ U(B)} [(Y − Q(s_t, a_t; ω, α, β))²]

where E[·]² denotes the mean-squared error, (s_t, a_t, s_{t+1}, r, done) ~ U(B) denotes a sample drawn uniformly from the experience pool B, γ is the discount factor, Q_t(·; ω_t, α_t, β_t) is the Target Q Network, and argmax_{a_m} Q(s_{t+1}, a_m; ω, α, β) is the action a_m of greatest value in the next state s_{t+1} as evaluated by the Main Q Network.
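The following is a minimal PyTorch sketch of this update, not the patented implementation itself: the Double-DQN target uses the Main Q Network to choose the next action and the Target Q Network to evaluate it, the loss is the mean-squared error above, and the target weights are hard-copied every twenty decisions. Tensor shapes and the optimizer are assumptions.

```python
import torch
import torch.nn.functional as F

def d3qn_update(main_q, target_q, optimizer, batch, gamma=0.99, step=0):
    """One gradient step of Step 2.14 (sketch).
    Assumed shapes: s, s_next (B, 14); a long (B,); r, done float (B,)."""
    s, a, s_next, r, done = batch
    q_sa = main_q(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s_t, a_t)
    with torch.no_grad():
        a_max = main_q(s_next).argmax(dim=1, keepdim=True)  # argmax_a Q(s_{t+1}, a)
        y = r + gamma * target_q(s_next).gather(1, a_max).squeeze(1) * (1 - done)
    loss = F.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 20 == 0:  # periodic hard update of the Target Q Network
        target_q.load_state_dict(main_q.state_dict())
```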
Step 2.15: Update the states of the workshop workpieces, machines and AGVs, and advance the current time to t = t + 1.

Step 2.16: Judge whether the number of training episodes has been reached. If training is complete, save the parameters of the neural network to obtain the trained network; otherwise jump back to Step 2.4.

Step 2.17: Feed the current state of the workshop into the trained neural network obtained in Step 2.16 to obtain the decision result. The decision result comprises the workpiece to be processed at the current time, the machine that processes it and the AGV that transports it. Executing the decision result reduces the total delay time of all workpieces and improves the production efficiency of the workshop.
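At deployment, the trained Main Q Network is queried greedily, with exploration disabled. A minimal sketch is given below, reusing the helpers assumed in the earlier sketches (`workshop_state`, `COMPOSITE_RULES`) together with an assumed `workshop.execute` method.

```python
import torch

def dispatch(main_q, workshop):
    """Sketch of the deployed decision loop of Step 2.17: compute the current
    state, query the trained network, and execute the chosen composite rule."""
    s = torch.as_tensor(workshop_state(workshop.jobs, workshop.machines,
                                       workshop.agvs, workshop.t)).unsqueeze(0)
    with torch.no_grad():
        rule = COMPOSITE_RULES[int(main_q(s).argmax())]
    workshop.execute(rule)  # selects the workpiece, machine and AGV
    return rule
```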
Beneficial effects:

1. The method for computing the real-time production state of the workshop disclosed by the present invention designs separate state features for the workpieces, machines and AGVs; through these feature values the production state of the workshop can be grasped quickly, supporting workshop decision-making.

2. The set of composite workshop dispatching rules disclosed by the present invention combines single-resource dispatching rules into composite rules. Each individual composite rule can be used directly for workshop decisions, directly selecting the corresponding workpiece, machine and AGV; the variety of composite rules can be chosen according to the situation, improving the flexibility of scheduling.

3. The reward function disclosed by the present invention reduces the total delay time of the workpieces by minimizing the delay of every operation.

4. The flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning disclosed by the present invention computes the workshop state features, feeds the state values into the neural network, decides the corresponding composite dispatching rule with the D3QN algorithm, executes the rule and computes the reward function to obtain the reward. It thereby realizes dynamic scheduling of the flexible multi-resource workshop, reduces the total delay time of the workpieces to a certain extent and improves the production efficiency of the workshop.
Brief Description of the Drawings

Fig. 1 shows the D3QN reward curve;

Fig. 2 is a flow chart of the flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning disclosed by the present invention.
Detailed Description of the Embodiments

The present invention is described in detail below with reference to the drawings and an embodiment, together with the technical problem solved by the technical solution and its beneficial effects. It should be noted that the described embodiment is intended only to aid understanding of the invention and does not limit it in any way.

The flexible job-shop scheduling problem of this embodiment is as follows. The workshop has 10 machines for processing workpieces and 7 Automated Guided Vehicles (AGVs) for transport tasks. A batch of 60 workpieces is to be processed: 15 are present at the initial moment and the remaining 45 arrive over time. Each workpiece has a fixed process route consisting of several operations that must be executed one at a time in a given order, and the transport of workpieces is performed by the AGVs. Initially the AGVs start from the warehouse, where the workpieces are also located. As time passes, subsequently arriving workpieces enter the workshop; their arrival times, process routes, processing times, candidate machines and other information are not known in advance, but only once the workpieces have entered the workshop.
The workpiece information is shown in the following table:

Table: Workpiece information

where dt_ij' is the due date of operation O_ij'.

The locations of the warehouse and the machines are shown in the following table:

Table: Workshop facility locations
From the locations of the workshop facilities, the travel distance and travel time between any two facilities can be computed:

t_kk' = d_kk' = |x_k − x_k'| + |y_k − y_k'|

where t_kk' and d_kk' are the AGV travel time and the distance between machines M_k and M_k' respectively, and (x_k, y_k) are the position coordinates of machine M_k.
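A trivial sketch of this travel-time computation follows, assuming (as the formula implies) unit AGV speed, so that travel time equals Manhattan distance; the function name and argument layout are illustrative.

```python
def travel_time(pos_k, pos_k2):
    """t_kk' = d_kk' = |x_k - x_k'| + |y_k - y_k'| (Manhattan metric)."""
    (xk, yk), (xk2, yk2) = pos_k, pos_k2
    return abs(xk - xk2) + abs(yk - yk2)

# e.g. travel_time((0, 0), (3, 4)) == 7
```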
To distinguish the above scheduling problem from other scheduling problems that consider different factors, the following assumptions are made for the flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning disclosed by the present invention:

A. The processing times of the operations and the travel times of the AGVs are deterministic;

B. Setup time is included in the operation processing time, and the loading and unloading times of workpieces are included in the AGV transport time;

C. Processing of a workpiece on a machine is continuous and cannot be preempted by other workpieces; transport of a workpiece on an AGV is likewise continuous and non-preemptive;

D. Each machine processes at most one workpiece at a time, and each AGV transports at most one workpiece at a time;

E. All AGVs have identical performance, so any one of them may be chosen for a given transport task;

F. Each machine and AGV has a buffer large enough to hold waiting workpieces;
G. Machine failures, machine shutdowns, low AGV battery, AGV failures and AGV congestion are not considered.

The method of the present invention is applied to this instance following Steps 1 and 2 above, with minimization of the total delay time of the workpieces as the optimization objective. The Main Q Network and the Target Q Network have the structure described in Step 2.1: an input layer followed by three fully connected layers and an output layer, with 256 nodes in each fully connected layer and ReLU as the activation function. The hyperparameters are set as follows: the size of the experience pool is 4000, the sampling batch size is 32, the discount factor is 0.99, the initial exploration rate is 1, the decay rate of the exploration rate is 0.999, the minimum exploration rate is 0.04, the learning rate is 0.00005, and the number of training episodes is 5000. Steps 2.3 through 2.17 — the training loop, the 14 workshop state features, the thirty composite dispatching rules, the reward function and the D3QN network update — are then executed exactly as described above.
Calculation results and analysis:

Under the conditions of the above embodiment, the reward curve obtained by the proposed method is shown in Fig. 1. The reward curve rises steadily as the number of training episodes increases and finally converges stably. The effect of the D3QN algorithm is compared with that of the thirty composite dispatching rules designed above, as shown in the following table:

Table: Comparison of algorithm performance

Analysis of the table shows that, on the problem of this embodiment, the D3QN algorithm improves on the best composite rule by about 19%. The D3QN algorithm therefore clearly outperforms the composite rules and effectively reduces the total delay; for the flexible multi-resource workshop scheduling problem of this embodiment, it balances responsiveness and optimization quality.

In analyzing the scheduling of the multiple resources of a flexible multi-resource job shop — workpieces, machines and AGVs — the present invention takes into account process-flexibility constraints and the dynamic event of urgent order insertion, proposes a scheduling method based on deep reinforcement learning, and optimizes the allocation of AGV transport tasks and machine processing tasks. When solving such problems it converges quickly and yields stable results, offers certain advantages over traditional algorithms, improves production efficiency and reduces the total delay.

The specific description above further explains the purpose, technical solution and beneficial effects of the invention in detail. It should be understood that the above is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (1)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310703139.1A | 2023-06-14 | 2023-06-14 | Flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116703641A (en) | 2023-09-05 |
Family
ID=87825294
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310703139.1A | Flexible multi-resource workshop dynamic scheduling method based on deep reinforcement learning | 2023-06-14 | 2023-06-14 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116703641A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118966715A (en) * | 2024-10-15 | 2024-11-15 | 贵州师范大学 | Dynamic intelligent scheduling method for multi-objective flexible workshop in industrial green microgrid |
| WO2025236661A1 (en) * | 2024-05-11 | 2025-11-20 | 浙江大学 | Workshop scheduling method and apparatus considering skill levels and fatigue degrees of workers |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |