
WO2023085560A1 - Demand response management method for a discrete industrial manufacturing system using constrained reinforcement learning - Google Patents

Demand response management method for a discrete industrial manufacturing system using constrained reinforcement learning

Info

Publication number
WO2023085560A1
WO2023085560A1 PCT/KR2022/012145 KR2022012145W WO2023085560A1 WO 2023085560 A1 WO2023085560 A1 WO 2023085560A1 KR 2022012145 W KR2022012145 W KR 2022012145W WO 2023085560 A1 WO2023085560 A1 WO 2023085560A1
Authority
WO
WIPO (PCT)
Prior art keywords
policy
manufacturing system
state
algorithm
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2022/012145
Other languages
English (en)
Korean (ko)
Inventor
홍승호
장시옹펑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nestfield Co Ltd
Original Assignee
Nestfield Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nestfield Co Ltd filed Critical Nestfield Co Ltd
Publication of WO2023085560A1 publication Critical patent/WO2023085560A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for AC mains or AC distribution networks
    • H02J3/12Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load
    • H02J3/14Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load by switching loads on to, or off from, network, e.g. progressively balanced loading
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for AC mains or AC distribution networks
    • H02J3/12Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load
    • H02J3/14Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load by switching loads on to, or off from, network, e.g. progressively balanced loading
    • H02J3/144Demand-response operation of the power transmission or distribution network
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S20/00Management or operation of end-user stationary applications or the last stages of power distribution; Controlling, monitoring or operating thereof
    • Y04S20/20End-user application control systems
    • Y04S20/222Demand response systems, e.g. load shedding, peak shaving

Definitions

  • the present invention relates to a demand response management method for a discrete industrial manufacturing system, and more specifically, to a demand response management method for a discrete industrial manufacturing system to which constrained reinforcement learning is applied, in which the system is modeled as a Constrained Markov Decision Process (CMDP) and a Constrained Reinforcement Learning (CRL) algorithm is adopted to determine a cost-effective operating strategy.
  • DR Demand response
  • RL reinforcement learning
  • First, RL is model-free. There is no need for a predefined model or predefined rules to determine how to select an action.
  • Second, RL is adaptive. Valuable knowledge can be learned from historical data to deal with highly uncertain system dynamics, and the extracted knowledge can be generalized and applied to newly emerging situations. These advantages make RL a promising tool for smart grid decision-making problems such as DR energy management, electric vehicle charging, and dynamic economic dispatch.
  • the present invention aims to provide a Constrained Reinforcement Learning (CRL) based DR management algorithm for an industrial manufacturing system that seeks to optimize energy costs while ensuring production targets.
  • CRL Constrained Reinforcement Learning
  • the present invention has the effect of determining a cost-effective operating strategy for a discrete industrial manufacturing system.
  • the present invention proposes a CRL-based DR algorithm for discrete manufacturing systems to minimize energy costs while achieving production targets.
  • the discrete manufacturing system is formulated as a CMDP, and the CRL algorithm is adopted to identify optimal operating schedules for all machines.
  • the DR scheme according to the present invention can be evaluated in many other types of industrial facilities that integrate energy storage systems and renewable energy resources (e.g. solar and wind power), and can also be implemented in actual industrial manufacturing systems to evaluate its performance.
  • FIG. 1 is a diagram showing the general configuration of an industrial discrete manufacturing system according to the present invention.
  • FIG. 2 is a diagram showing the configuration of an actor-critic scheme
  • FIG. 3 is a view showing a lithium ion battery assembly process to which the method according to the present invention is applied.
  • FIG. 4 is a diagram showing a hierarchical structure of a lithium ion battery module.
  • FIG. 5 is a diagram showing the accumulated reward in the learning process according to the present invention.
  • FIG. 7 shows the storage amount of final battery production at each time step (interval).
  • FIG. 8 shows the total cost obtained by the CSAC algorithm and the Gurobi solver.
  • FIG. 9 is a diagram showing the accumulated reward in the learning process from June 2 to 4, 2019.
  • FIG. 10 is a diagram showing the total energy demand corresponding to electricity prices from June 2 to 4, 2019.
  • FIG. 11 is a diagram showing the storage amount of final battery production in each time interval from June 2 to 4, 2019.
  • FIG. 12 is a flow chart showing processing steps in the method according to the present invention.
  • the operating cost optimization problem of an industrial manufacturing process, which considers both energy consumption and resource management, is formulated as a Constrained Markov Decision Process (CMDP) without requiring prediction of unknown variables or a dynamic model.
  • CMDP Constrained Markov Decision Process
  • the efficiency of the CRL algorithm according to the present invention is evaluated using a typical lithium ion battery assembly manufacturing process, which is an actual industrial case.
  • the evaluation results show that the CRL algorithm according to the present invention can balance energy demand and supply and reduce energy costs while ensuring production requirements. This indicates that the algorithm according to the present invention has the potential to handle complex industrial DR management problems.
  • the present invention is the first application of CRL to DR management of an industrial manufacturing system that simultaneously considers the operation of several continuous machines and the use of various resources (electricity and production materials).
  • EMC energy management center
  • RTP real-time pricing
  • M i,j and B i,j denote a machine and a corresponding buffer used to process and store intermediate products, respectively.
  • i denotes the ith serial production line branch
  • j denotes the jth machine or buffer of the ith branch.
  • each machine has two operating options: running and idle.
  • Running means the machine is in full operating mode and Idle means the machine is going into sleep mode.
  • each machine can select only one operating option. Therefore, the energy consumption of machine M i,j during step t can be expressed as Equation 1.
  • e i,j op and e i,j idle denote the energy consumption of machine M i,j in operating or sleep mode, respectively.
  • the energy consumption of the entire manufacturing system during step t can be summed using Equation 2.
  • the total energy consumption E_t of the entire system is bounded by the maximum capacity E_max of the local power grid, as shown in Equation 3.
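  • A minimal Python sketch of Equations 1 to 3 is given below; the machine names and the e_op/e_idle values are illustrative placeholders, not values taken from the present invention, and the 500 kW cap merely echoes the example in the description.

      # Sketch of Equations 1-3: per-machine energy, total energy, and the grid-capacity check.
      E_MAX = 500.0  # maximum capacity of the local power grid (Equation 3)

      # e_op / e_idle: energy consumption of each machine in running / idle mode (Equation 1)
      machines = {
          ("line1", "m1"): {"e_op": 40.0, "e_idle": 5.0},
          ("line1", "m2"): {"e_op": 60.0, "e_idle": 8.0},
      }

      def machine_energy(machine, running):
          # Equation 1: energy of machine M_{i,j} during step t for the selected operating mode.
          return machine["e_op"] if running else machine["e_idle"]

      def total_energy(actions):
          # Equation 2: sum the energy consumption of all machines for one step.
          return sum(machine_energy(machines[key], run) for key, run in actions.items())

      actions_t = {("line1", "m1"): True, ("line1", "m2"): False}
      E_t = total_energy(actions_t)
      assert E_t <= E_MAX  # Equation 3: total demand must respect the grid capacity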
  • the buffer B i,j serves as a storage space for the different parts of the product along the production process.
  • the material storage of buffer B i,j is expressed as Equation 4.
  • P i,j t (C i,j t ) represents the amount produced (consumed) in buffer B i,j during step t.
  • P i,j t and C i,j t are equal to Equations 5 and 6, respectively.
  • p i,j t (c i,j t ) denotes the production (consumption) rate of machine M i,j in operating mode z i,j t .
  • B i,j min and B i,j max represent the minimum and maximum amounts of material in buffer B i,j .
  • the objective function of the system can be expressed by Equations 8 and 9.
  • Equation 8 defines the daily energy cost minimization, where v_t represents the electricity price at step t.
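  • The buffer dynamics (Equations 4 to 7) and the cost objective (Equations 8 and 9) can be sketched as follows; the rates, storage bounds, and prices are illustrative assumptions.

      # Sketch of Equations 4-9: buffer update and the daily energy-cost objective.
      def buffer_update(B_prev, p_rate, c_rate, upstream_running, downstream_running):
          # Equations 4-6: B^t = B^{t-1} + P^t - C^t, where production (consumption)
          # only occurs while the upstream (downstream) machine is running.
          P = p_rate if upstream_running else 0.0
          C = c_rate if downstream_running else 0.0
          return B_prev + P - C

      def daily_energy_cost(prices, energy_per_step):
          # Equations 8-9: cost = sum_t v_t * E_t, to be minimized over the day.
          return sum(v * E for v, E in zip(prices, energy_per_step))

      B = buffer_update(B_prev=100.0, p_rate=25.0, c_rate=20.0,
                        upstream_running=True, downstream_running=True)
      assert 0.0 <= B <= 200.0  # Equation 7: storage stays within [B_min, B_max] (illustrative bounds)
      print(daily_energy_cost(prices=[0.1] * 24, energy_per_step=[300.0] * 24))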
  • CMDP is briefly described, then the industrial DR management problem is formulated as CMDP, and finally, the CRL methodology is applied to solve CMDP.
  • a CMDP is characterized by a six-tuple (S, A, π, P_r, R, R_c).
  • S represents the state space where there are available states.
  • A represents an action space with available actions.
  • π represents the distribution over actions in a given state.
  • P_r: S × A × S → [0,1] represents the transition probability function
  • R (or R_c): S × A × S → R (or R_c) represents the reward function (or cost function)
  • a CMDP generally involves an agent and an environment interacting with each other in discrete time steps. During each step t ∈ [0,T], the agent observes the state s_t ∈ S of the environment and chooses an action a_t ∈ A according to policy π. At the next step t+1, the agent receives the reward R(s_t, a_t, s_{t+1}) ∈ R and the cost R_c(s_t, a_t, s_{t+1}) ∈ R_c.
  • the environment then moves to the next state s_{t+1} according to the transition probability function P_r(s_{t+1} | s_t, a_t).
  • the agent's objective is to identify the optimal policy π that maximizes the expected discounted return J subject to an upper-bound constraint on the expected discounted total cost J_c.
  • the path (s_0, a_0, ..., s_{T-1}, a_{T-1}, s_T) is the sampled trajectory, γ ∈ [0,1] represents the discount rate, and R_t and R_t^c stand for R(s_t, a_t, s_{t+1}) and R_c(s_t, a_t, s_{t+1}), respectively.
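  • As a short illustration of this objective, the discounted return J and the discounted cost J_c can be computed from a sampled trajectory as follows; all numbers, including the cost bound d, are illustrative.

      # Sketch of the CMDP objective: maximize J subject to J_c <= d.
      def discounted_sum(values, gamma):
          # sum_t gamma^t * values[t]
          return sum((gamma ** t) * v for t, v in enumerate(values))

      gamma = 0.95
      rewards = [-12.0, -9.5, -11.0]   # R_t, e.g. negative energy cost at each step
      costs = [0.0, 3.0, 0.0]          # R_t^c, e.g. production-constraint violation

      J = discounted_sum(rewards, gamma)     # expected discounted return (to maximize)
      J_c = discounted_sum(costs, gamma)     # expected discounted cost (upper-bounded)
      d = 1.0                                # illustrative cost bound
      print(J, J_c, "feasible" if J_c <= d else "constraint violated")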
  • the state value function V^π(s) and the action value function Q^π(s,a) are defined as follows.
  • π is either a deterministic policy mapping from the state space S to the action space A, or a stochastic policy mapping a state to the probability of choosing each action.
  • the state value function and the action value function can be decomposed into the immediate reward and the discount value of the subsequent state according to the Bellman equation.
  • the industrial DR management problem is formulated as a CMDP, and the EMC is regarded as the learning agent that interacts with the environment, i.e. the entire discrete manufacturing system.
  • the formalized CMDP includes state space, action space, reward function, and cost function.
  • the optimization horizon consists of 24 steps with a total of 24 decisions to be made based on hourly price.
  • the goal of the present invention is to find an optimal strategy to meet production target constraints while minimizing the total energy cost of a discrete manufacturing system.
  • the state s observed at the beginning of each step in industrial DR management includes four parts: the time indicator t, the electricity price v_t, the machine energy consumption e^t_{i,j}, and the buffer storage B^t_{i,j}.
  • Equation 18 represents a sample of the state s_t at step t, including the time indicator t, electricity price v_t, machine energy consumption e^t_{i,j}, and buffer storage B^t_{i,j}.
  • the action space A thus contains the operating choices of all machines.
  • Equation 20 represents a sample of the action a_t at step t.
  • z^t_{i,j} represents the selection of the operating mode of machine M_{i,j} at step t.
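  • A small sketch of the state (Equation 18) and action (Equation 20) encodings; the numbers of machines and buffers and all values shown are illustrative assumptions.

      # Sketch of Equations 18 and 20: flat state and action vectors.
      def build_state(t, price, machine_energy, buffer_levels):
          # s_t = [t, v_t, e^t_{i,j} ..., B^t_{i,j} ...]
          return [float(t), price] + list(machine_energy) + list(buffer_levels)

      def build_action(machine_modes):
          # a_t = [z^t_{i,j} ...], one binary operating-mode choice per machine
          # (1 = running, 0 = idle).
          return [1 if running else 0 for running in machine_modes]

      s_t = build_state(t=7, price=0.12, machine_energy=[40.0, 8.0], buffer_levels=[120.0, 30.0])
      a_t = build_action([True, False])
      print(s_t, a_t)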
  • the objective function of industrial DR consists of two parts: satisfying the production task and minimizing energy costs. Therefore, in this CMDP framework, the reward function R t is defined as:
  • R t is the reciprocal of the energy cost of the manufacturing assembly system at step t.
  • the cost function R t c is defined as
  • the second line measures the amount by which the final production storage B^final_t exceeds the maximum allowable storage B^final_max.
  • the third line calculates the amount by which the final production storage B^final_t falls short of the minimum allowable storage B^final_min.
  • the fourth line covers the case where the final production storage B^final_t is between the minimum storage B^final_min and the maximum storage B^final_max.
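  • A sketch of the reward and cost definitions above; the reward follows the text (the reciprocal of the step energy cost), and the storage band B^final_min/B^final_max is an illustrative assumption.

      # Sketch of the reward R_t and the cost R^c_t described above.
      B_MIN_FINAL, B_MAX_FINAL = 450.0, 550.0   # illustrative allowable storage band

      def reward(price_t, energy_t):
          # R_t: reciprocal of the energy cost of the assembly system at step t.
          return 1.0 / (price_t * energy_t)

      def cost(B_final_t):
          # R^c_t: how far the final production storage leaves [B_min, B_max];
          # zero while the storage stays inside the band.
          if B_final_t > B_MAX_FINAL:
              return B_final_t - B_MAX_FINAL   # excess over the maximum allowable storage
          if B_final_t < B_MIN_FINAL:
              return B_MIN_FINAL - B_final_t   # shortfall below the minimum allowable storage
          return 0.0

      print(reward(0.1, 300.0), cost(430.0), cost(500.0))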
  • the CRL algorithm according to the present invention represents a constrained soft actor-critic (CSAC) for solving CMDP.
  • CSAC constrained soft actor-critic
  • RL algorithms can be classified as value-based, policy-based or actor-critic.
  • Value-based approaches such as Q-learning and SARSA use only value functions and do not have explicit formulations for policies.
  • Policy-based approaches, such as policy gradient, try to identify the optimal policy directly without any form of value function.
  • the third type is the actor-critic algorithm, shown in FIG. 2, that combines the above two approaches. The actor is responsible for generating actions, and the critic is responsible for processing rewards. During training, when the agent observes the most recent state of the environment, the actor outputs a set of actions based on the current policy. Meanwhile, the critic judges how good the current policy is through the value function. The deviation between the expected value and the reward received is then represented by the temporal-difference (TD) error, which is fed back to the actor and the critic simultaneously to adjust the policy and value functions.
  • TD temporal difference
  • the SAC algorithm improves the stochastic policy through an off-policy method.
  • a salient property of SAC is entropy regularization, where the agent learns a policy that strikes a balance between expected reward and entropy. This is closely related to the exploration-exploitation mechanism: as entropy increases, more exploration occurs. This allows SAC to accelerate learning and prevent premature convergence of the policy to a poor local optimum.
  • the value function changes to include the entropy bonus at each step.
  • the two entropy-regularized value functions V_h^π and Q_h^π are connected by Equation 27.
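  • A minimal sketch of the entropy-regularized (soft) state value for a discrete policy, assuming the standard SAC form V_h(s) = E_a[Q(s,a) - α·log π(a|s)]; the probabilities and Q-values below are illustrative.

      import math

      def soft_state_value(action_probs, q_values, alpha):
          # Soft value: expected Q-value plus an entropy bonus weighted by the temperature alpha.
          return sum(p * (q - alpha * math.log(p))
                     for p, q in zip(action_probs, q_values) if p > 0)

      print(soft_state_value(action_probs=[0.7, 0.3], q_values=[1.0, 0.5], alpha=0.02))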
  • based on Equation 27, the approximate solution of the policy is derived as in Equation 28.
  • the agent choosing actions according to this policy yields the optimal policy, and the optimal value V_h^*(s) can also be obtained.
  • the update mechanism of the Q-value function can be achieved through an off-policy scheme.
  • the SAC framework is presented in Algorithm 1. Implementation details such as clipped double Q-learning, baseline value function, and deferred update of value function are omitted here.
  • D is a replay buffer, that is, a collection of experience data.
  • V h l, ⁇ (s) is
  • Lagrange multipliers can be used to address constrained optimization problems. Given a multiplier λ_k ≥ 0 at the k-th iteration, the policy π_k can be obtained by maximizing the resulting Lagrangian over the policy domain.
  • ⁇ i is the step size for updating ⁇ .
  • [·]_+ denotes projection onto the non-negative real numbers.
  • V_h^π(s) is bounded within a range for all policies π.
  • the third step updates the parameter of the policy neural network.
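  • A sketch of the Lagrangian relaxation and the projected multiplier update described above, assuming the usual form λ ← [λ + step · (J_c - d)]_+; the sign convention and all numbers are illustrative, not taken from Equation 42 itself.

      # Sketch of the Lagrangian and the projected multiplier update.
      def lagrangian(J, J_c, d, lam):
          # L(pi, lambda) = J - lambda * (J_c - d), maximized over the policy for fixed lambda.
          return J - lam * (J_c - d)

      def update_multiplier(lam, J_c, d, step):
          # Projected gradient step on lambda; [.]_+ keeps the multiplier non-negative.
          return max(0.0, lam + step * (J_c - d))

      lam = 0.5
      for _ in range(3):
          J, J_c, d = -25.0, 1.8, 1.0          # placeholder evaluation of the current policy
          print(lagrangian(J, J_c, d, lam))
          lam = update_multiplier(lam, J_c, d, step=0.1)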
  • ⁇ ⁇ can actually be set smaller than ⁇ ⁇ .
  • the CSAC algorithm according to the present invention is summarized as Algorithm 2.
  • the input values are the policy neural network parameter, the state value function V parameter, the state-action-value function Q parameter, the corresponding update step sizes, the Lagrange multiplier λ, the temperature parameter α, and the discount rate γ.
  • the policy neural network parameter, the state value function V parameter, the state-action-value function Q parameter, the corresponding update step sizes, the Lagrange multiplier λ, the temperature parameter α, and the discount rate γ are initialized.
  • the CSAC algorithm consists of 9 neural networks, and the 9 neural networks can be classified into 3 sets.
  • the first four neural networks, two action-value networks and two state-value networks, are used to approximate the value functions appearing in the Lagrangian (Equation 32).
  • the next four neural networks are associated with the constraint: two action-value networks and two state-value networks are used to approximate the constraint value functions.
  • the last neural network is used to approximate the policy function.
  • the parameters of the 9 neural networks are used as inputs to the CSAC algorithm, that is, the state value function V parameters, the state-action-value function Q parameters, and the policy network parameter. The corresponding gradient descent update step sizes, the Lagrange multiplier λ, the temperature parameter α, and the discount rate γ are also used as inputs to the CSAC algorithm.
  • the entire learning process is controlled by three time indicators: i (line 3 of Algorithm 2), t (line 4 of Algorithm 2), and n (line 9 of Algorithm 2), of which i counts learning episodes.
  • t represents the daily hourly step.
  • n represents a gradient step.
  • the algorithm enters the process of accumulating experience.
  • the agent executes a_t according to the current policy π, receives the reward R_t and the cost R^c_t, and moves to the next state s_{t+1}.
  • each tuple (s_t, a_t, s_{t+1}, R_t, R^c_t) is stored in the replay pool D for future agent training.
  • the algorithm enters the learning phase, the gradient process.
  • the algorithm samples a random batch of B samples from the replay pool D, i.e. {(s_t, a_t, s_{t+1}, R_t, R^c_t)}.
  • B is the size of the mini-batch.
  • the training targets for the Q and V networks are calculated through Equations 33 to 37. Following Equations 34 and 35, the training target for Q is computed using R_t for the neural networks associated with the Lagrangian and R^c_t for the neural networks associated with the constraint; an additional copy of the V neural network serves as the target network, whose parameters are updated according to Equation 41.
  • in Equations 35 and 36, a clipped double Q-learning technique is used to reduce bias in the policy update process.
  • the training labels for the V neural networks are indicated by Equation 37.
  • Equation 37 indicates that the action is sampled from the current policy.
  • in line 12, to update the parameters of the action-value neural networks, a gradient descent step minimizing the mean squared error (MSE) is performed using the Adam optimizer.
  • MSE Mean squared error
  • likewise, a gradient descent update is performed through the Adam optimizer to minimize the MSE error.
  • a deferred update of the target V network parameters is performed according to Equation 41.
  • the Lagrange multiplier λ is updated according to Equation 42.
  • the algorithm completes the current gradient process based on the sampled mini-batch of B samples. Finally, the algorithm repeats the learning process until it enters the next episode and obtains the maximum cumulative reward, which means that the algorithm can generate an optimal operating policy.
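  • The gradient step described above can be sketched in PyTorch as follows. This is a simplified single-Q illustration: it omits the clipped double Q-learning, the separate constraint networks, and the policy and multiplier updates (see the multiplier sketch above), and all network sizes, data, and hyperparameters are illustrative assumptions.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      state_dim, n_actions, gamma, tau, alpha = 6, 4, 0.95, 0.005, 0.02
      q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
      v_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
      v_targ = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
      v_targ.load_state_dict(v_net.state_dict())
      opt_q = torch.optim.Adam(q_net.parameters(), lr=1e-3)
      opt_v = torch.optim.Adam(v_net.parameters(), lr=5e-4)

      # One illustrative mini-batch standing in for a sample from the replay pool D.
      s = torch.randn(32, state_dim)
      a = torch.randint(0, n_actions, (32,))
      r = torch.randn(32)
      s2 = torch.randn(32, state_dim)

      # Q target (in the spirit of Equations 33-35): r_t + gamma * V_targ(s_{t+1}).
      with torch.no_grad():
          q_target = r + gamma * v_targ(s2).squeeze(-1)
      q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
      loss_q = F.mse_loss(q_pred, q_target)
      opt_q.zero_grad(); loss_q.backward(); opt_q.step()

      # V target (in the spirit of Equation 37): soft value under the current policy;
      # here the softmax of the Q-values stands in for the policy network output.
      probs = torch.softmax(q_net(s).detach(), dim=1)
      with torch.no_grad():
          v_target = (probs * (q_net(s) - alpha * torch.log(probs + 1e-8))).sum(dim=1)
      loss_v = F.mse_loss(v_net(s).squeeze(-1), v_target)
      opt_v.zero_grad(); loss_v.backward(); opt_v.step()

      # Deferred (soft) update of the target V network (in the spirit of Equation 41).
      with torch.no_grad():
          for p, p_t in zip(v_net.parameters(), v_targ.parameters()):
              p_t.mul_(1 - tau).add_(tau * p)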
  • FIG. 12 illustrates the process of the demand response management method of a discrete industrial manufacturing system to which constrained reinforcement learning is applied according to the present invention.
  • the process shown in FIG. 12 is performed in the energy management unit (EMC) of the discrete industrial manufacturing system or in a separate computer device, and the optimal operating policy (energy strategy) generated through this process is applied to the discrete industrial manufacturing system.
  • EMC energy management unit
  • the process shown in FIG. 12 is a process for generating an optimal operating policy for demand response management of a discrete industrial manufacturing system, and this process is performed by an energy management device (EMC) or a separate computer device of a discrete industrial manufacturing system.
  • EMC energy management device
  • the method according to the present invention is largely divided into an experience accumulation step (S100), a parameter update step (S102), and an accumulated reward calculation step (S104).
  • the experience accumulation step (S100) is a process of generating a training set.
  • the state (S t ) includes the electricity price, energy consumption of each machine, and storage amount of each buffer in the corresponding time interval (Equation 18).
  • the action (a t ) represents the operation or idleness of each machine in the corresponding time interval (Equation 20).
  • the parameter update step (S102) is a process of determining the parameter values that serve as the inputs of Algorithm 2 described above.
  • the parameter updating step (S102) is a learning process (gradient step) for determining parameters of the neural network.
  • the input values of Algorithm 2 are the policy neural network (policy function) parameter, the state value function V parameter, the state-action-value function Q parameter, the corresponding update step sizes, the Lagrange multiplier λ, the temperature parameter α, and the discount rate γ.
  • policy function policy neural network
  • the processor randomly samples a mini-batch from the training set generated in the experience accumulation step (S100), calculates target values (target labels) of the state-action value function and the state value function for the mini-batch, and updates the parameters of the action value function and the state value function by gradient descent so that the error between the function values and the target values is minimized.
  • the processor then sequentially performs a process of updating the parameters of the policy function (Equation 40), a process of updating the parameters of the target state value function (Equation 41), and a process of updating the Lagrange multiplier (Equation 42).
  • the accumulated reward calculation step (S104) is a process of obtaining the accumulated reward for one episode. Based on the determined parameters, the processor calculates the accumulated reward by repeatedly executing an action according to the policy in the current state, obtaining a reward, and moving to the next state.
  • the processor determines whether the accumulated reward is at its maximum (S106); if the accumulated reward has not yet reached the maximum, the parameter update step (S102) is repeated, and when the accumulated reward is determined to have reached the maximum, the policy at that time is determined as the optimal operating policy (S108).
  • the optimal operating policy can be applied to the discrete industrial manufacturing system to achieve demand response management that can minimize energy costs while ensuring production targets.
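  • A compact, self-contained sketch of the S100-S108 flow described above; collect, update, and evaluate are hypothetical stand-ins for the experience accumulation, parameter update, and cumulative reward calculation steps.

      def train_demand_response(collect, update, evaluate, n_episodes=100):
          # S100-S108 outer loop: stop once the cumulative reward stops improving.
          best, policy = float("-inf"), None
          for _ in range(n_episodes):
              batch = collect()            # S100: experience accumulation
              policy = update(batch)       # S102: parameter update (gradient steps of Algorithm 2)
              ret = evaluate(policy)       # S104: cumulative reward for one episode
              if ret <= best:              # S106: has the reward reached its maximum?
                  break
              best = ret
          return policy                    # S108: optimal operating policy

      # Illustrative no-op callables so the sketch runs as written.
      print(train_demand_response(lambda: [], lambda batch: "policy", lambda policy: 0.0))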
  • the lithium ion battery assembly process includes four processes of assembly, saturation, formation and grading.
  • in the assembly process, components are assembled together to form a battery module having the hierarchical structure shown in FIG. 4.
  • the components include a side frame (SF), a battery cell (BC), a cooling plate (CP), an intermediate frame (IF) and a compressed foam (CF).
  • the module is injected with an appropriate amount of electrolyte.
  • the module is activated into a usable mode by an appropriate charge and discharge process.
  • Figure 3 provides an overview of the battery module assembly process which can be divided into 10 tasks with each task assigned to a related machine. Operational information for each machine, including available operating modes, production speed, power consumption and buffer capacity, is provided in Table 1.
  • the assembly system's target is set to produce 500 lithium-ion battery units per day (i.e., the machine's maximum buffer capacity).
  • the maximum amount of power drawn by the system is set at 500 kW.
  • the step sizes for the state-value networks V, the state-action-value networks Q, and the policy network are set to 5e-4, 1e-3, 1e-3, and 1e-5, respectively.
  • the number of hidden layers for each neural network is set to 2 and each layer has 64 neurons.
  • ReLU Rectified Linear Unit
  • the temperature coefficient α and the discount rate γ are set to 0.02 and 0.95, respectively.
  • the softmax function is applied in the last layer of the policy network to produce discrete actions. The replay buffer size and batch size are set to 500 and 256, respectively.
  • the weights of the 9 neural networks are randomly initialized and updated iteratively.
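  • The settings above, gathered into a single illustrative configuration dictionary; the assignment of the fourth step size is an assumption, since the text lists four step-size values.

      csac_config = {
          "lr_state_value": 5e-4,     # step size for the state-value networks V
          "lr_action_value": 1e-3,    # step size for the state-action-value networks Q
          "lr_policy": 1e-3,          # step size for the policy network
          "lr_other": 1e-5,           # fourth step size reported in the text (assignment assumed)
          "hidden_layers": 2,         # hidden layers per neural network
          "hidden_units": 64,         # neurons per hidden layer
          "activation": "ReLU",       # Rectified Linear Unit activations
          "temperature": 0.02,        # temperature coefficient alpha
          "discount": 0.95,           # discount rate gamma
          "replay_buffer_size": 500,  # replay buffer size
          "batch_size": 256,          # mini-batch size
      }
      print(csac_config)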
  • after setting all the parameters, the agent starts maximizing the cumulative reward. FIG. 5 shows the training process on September 5, 2019. The agent performs poorly initially, as evidenced by the low reward. However, as the number of iterations increases, the agent begins to perform actions that provide higher rewards through trial and error, finally achieving the maximum reward after about 30,000 iterations. Obtaining the maximum reward determines the corresponding optimal operating strategy. Table 3 lists the operating options of all machines at each stage, with "1" and "0" representing the operating and idle modes, respectively.
  • Figure 6 shows the total energy consumption of all machines under the optimal operating policy generated in the method according to the invention.
  • Machines consume more energy when prices are low, consume less when prices are high, and avoid consuming energy during peak hours.
  • the machine is consuming more energy during steps 1-15 and 19-24 and less during steps 16-18.
  • most machines reduce their energy consumption to the minimum value because power prices are at their peak at steps 16 and 17. This not only relieves stress on the power grid, but also reduces energy costs for industrial consumers.
  • to describe real-time battery production under the proposed DR scheme, the corresponding final battery production storage at each step t is shown in FIG. 7.
  • the system finally produces 500 battery modules (i.e. the production target) by executing the optimal schedule shown in Table 3.
  • Figure 8 compares the total cost obtained from the MILP Gurobi solver and the CSAC algorithm according to the present invention.
  • the MILP Gurobi solver uses an explicit model that takes into account all information of the system, taking short-sighted (myopic) measures to meet the production targets and minimize the energy costs defined in Equations 8 and 9.
  • the CSAC algorithm exhibits the self-learning ability to select different actions to maximize this reward, as described above.
  • the CRL method according to the present invention shows low performance in the initial training stage because it proceeds by trial and error. However, after experiencing more episodes, the agent adapts to the learning environment and adjusts its policy through the exploration and exploitation mechanism, finally obtaining the optimal policy. Since the method according to the present invention is model-free and does not require expertise in complex energy management scenarios, it can provide a promising solution for complex industrial energy management problems.
  • Figures 9 and 10 respectively show the convergence of cumulative rewards in the learning process and the corresponding total energy consumption of all machines under the DR scheme according to the present invention.
  • in FIG. 10, similar energy consumption trends were observed across the daily patterns, further demonstrating the validity of the DR method according to the present invention.
  • the final battery production storage at each step t over the three days is shown in FIG. 11. As can be seen in FIG. 11, the system produces 500 battery modules at the final step of each day.
  • the present invention can be applied in demand response (DR) management systems in industrial manufacturing systems, as well as in many other types of industrial facilities that integrate energy storage systems with renewable energy sources such as solar and wind power.
  • DR demand response

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Water Supply & Treatment (AREA)
  • Algebra (AREA)
  • General Business, Economics & Management (AREA)
  • Power Engineering (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Demand response (DR) is regarded as an effective method for improving the stability and economic efficiency of a power grid. Implementing DR for the industrial sector, the primary consumer, is urgent. The present invention proposes a new industrial price-based DR management approach for a discrete industrial manufacturing system that simultaneously takes energy costs and daily production targets into account. To this end, the discrete manufacturing system is modeled as a Constrained Markov Decision Process (CMDP), and a Constrained Reinforcement Learning (CRL) algorithm is used to determine cost-effective operating strategies for the discrete manufacturing system. To verify the performance of the method according to the present invention, a simulation was carried out using a real lithium-ion battery assembly system, and the evaluation results showed that the method according to the present invention can be used to optimize energy costs without missing production targets.
PCT/KR2022/012145 2021-11-15 2022-08-12 Procédé de gestion de réponse à la demande d'un système de fabrication industrielle discret à l'aide d'un apprentissage par renforcement contraint Ceased WO2023085560A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2021-0156719 2021-11-15
KR1020210156719A KR102707077B1 (ko) 2021-11-15 2021-11-15 제약 강화 학습이 적용된 이산 산업 제조 시스템의 수요반응 관리 방법

Publications (1)

Publication Number Publication Date
WO2023085560A1 true WO2023085560A1 (fr) 2023-05-19

Family

ID=86336242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/012145 Ceased WO2023085560A1 (fr) 2021-11-15 2022-08-12 Procédé de gestion de réponse à la demande d'un système de fabrication industrielle discret à l'aide d'un apprentissage par renforcement contraint

Country Status (2)

Country Link
KR (1) KR102707077B1 (fr)
WO (1) WO2023085560A1 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117376355A (zh) * 2023-10-31 2024-01-09 重庆理工大学 基于超图的b5g海量物联网资源分配方法及系统
CN117369286A (zh) * 2023-12-04 2024-01-09 中国海洋大学 一种海洋平台动力定位控制方法
CN117833307A (zh) * 2023-12-08 2024-04-05 三峡大学 一种基于近似集体策略和独立学习器的家庭微网群优化方法
CN118364943A (zh) * 2024-04-16 2024-07-19 中国科学院深圳先进技术研究院 数控加工工艺优化方法、装置、设备、存储介质及产品
CN118690632A (zh) * 2024-04-07 2024-09-24 北京交通大学 用于异质能源系统智能调度决策的可信解释方法及系统
CN119720727A (zh) * 2024-09-29 2025-03-28 重庆大学 一种基于柔性行动器‐评判器结合逻辑Benders分解的月度机组组合方法
CN120634208A (zh) * 2025-08-11 2025-09-12 上上德盛集团股份有限公司 一种智能工厂生产过程实时监控与优化系统
WO2025200753A1 (fr) * 2024-10-16 2025-10-02 南京邮电大学 Procédé et système de prédiction de données multidimensionnelles pour atelier de fabrication discrète

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101472582B1 (ko) * 2013-08-06 2014-12-16 국민대학교산학협력단 마이크로그리드 기반의 지능형 전력수요관리 방법 및 그 시스템
KR20180032492A (ko) * 2016-09-22 2018-03-30 한양대학교 에리카산학협력단 계층적 전력 시장을 고려한 인센티브 기반 수요반응 방법
KR20190061451A (ko) * 2017-11-28 2019-06-05 가천대학교 산학협력단 에너지수요 관리를 위한 경쟁 인지형 가격제어방법
KR20190132187A (ko) * 2018-05-18 2019-11-27 한양대학교 에리카산학협력단 복합적 dr 자원을 고려한 인센티브 기반 수요응답 방법 및 시스템
KR20190132193A (ko) * 2018-05-18 2019-11-27 한양대학교 에리카산학협력단 스마트 그리드에서 동적 가격 책정 수요반응 방법 및 시스템

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7147243B2 (ja) * 2018-04-23 2022-10-05 日本製鉄株式会社 スケジュール作成装置、方法及びプログラム
KR102238021B1 (ko) * 2020-05-29 2021-04-09 엘에스일렉트릭(주) 수요 반응을 고려한 공정 스케줄링 장치 및 그의 동작 방법

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101472582B1 (ko) * 2013-08-06 2014-12-16 국민대학교산학협력단 마이크로그리드 기반의 지능형 전력수요관리 방법 및 그 시스템
KR20180032492A (ko) * 2016-09-22 2018-03-30 한양대학교 에리카산학협력단 계층적 전력 시장을 고려한 인센티브 기반 수요반응 방법
KR20190061451A (ko) * 2017-11-28 2019-06-05 가천대학교 산학협력단 에너지수요 관리를 위한 경쟁 인지형 가격제어방법
KR20190132187A (ko) * 2018-05-18 2019-11-27 한양대학교 에리카산학협력단 복합적 dr 자원을 고려한 인센티브 기반 수요응답 방법 및 시스템
KR20190132193A (ko) * 2018-05-18 2019-11-27 한양대학교 에리카산학협력단 스마트 그리드에서 동적 가격 책정 수요반응 방법 및 시스템

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117376355A (zh) * 2023-10-31 2024-01-09 重庆理工大学 基于超图的b5g海量物联网资源分配方法及系统
CN117369286A (zh) * 2023-12-04 2024-01-09 中国海洋大学 一种海洋平台动力定位控制方法
CN117369286B (zh) * 2023-12-04 2024-02-09 中国海洋大学 一种海洋平台动力定位控制方法
CN117833307A (zh) * 2023-12-08 2024-04-05 三峡大学 一种基于近似集体策略和独立学习器的家庭微网群优化方法
CN117833307B (zh) * 2023-12-08 2024-06-11 三峡大学 一种基于近似集体策略和独立学习器的家庭微网群优化方法
CN118690632A (zh) * 2024-04-07 2024-09-24 北京交通大学 用于异质能源系统智能调度决策的可信解释方法及系统
CN118364943A (zh) * 2024-04-16 2024-07-19 中国科学院深圳先进技术研究院 数控加工工艺优化方法、装置、设备、存储介质及产品
CN119720727A (zh) * 2024-09-29 2025-03-28 重庆大学 一种基于柔性行动器‐评判器结合逻辑Benders分解的月度机组组合方法
WO2025200753A1 (fr) * 2024-10-16 2025-10-02 南京邮电大学 Procédé et système de prédiction de données multidimensionnelles pour atelier de fabrication discrète
CN120634208A (zh) * 2025-08-11 2025-09-12 上上德盛集团股份有限公司 一种智能工厂生产过程实时监控与优化系统

Also Published As

Publication number Publication date
KR20230070779A (ko) 2023-05-23
KR102707077B1 (ko) 2024-09-19

Similar Documents

Publication Publication Date Title
WO2023085560A1 (fr) Procédé de gestion de réponse à la demande d'un système de fabrication industrielle discret à l'aide d'un apprentissage par renforcement contraint
Wang et al. Safe off-policy deep reinforcement learning algorithm for volt-var control in power distribution systems
Du et al. Deep reinforcement learning from demonstrations to assist service restoration in islanded microgrids
Nguyen et al. Three-stage inverter-based peak shaving and Volt-VAR control in active distribution networks using online safe deep reinforcement learning
CN113541191B (zh) 考虑大规模可再生能源接入的多时间尺度调度方法
Sanseverino et al. An execution, monitoring and replanning approach for optimal energy management in microgrids
Li et al. Restoration strategy for active distribution systems considering endogenous uncertainty in cold load pickup
Martinez Ramos et al. Transmission power loss reduction by interior-point methods: implementation issues and practical experience
Gharavi et al. CVR and loss optimization through active voltage management: A trade-off analysis
WO2024177354A1 (fr) Procédé et système de négociation d'énergie basée sur une commande optimale prédictive dans un nanoréseau
Kumari et al. A state-of-the-art review on recent load frequency control architectures of various power system configurations
Chen et al. Multiagent soft actor–critic learning for distributed ess enabled robust voltage regulation of active distribution grids
Cui et al. An Intelligent Control Strategy for buck DC-DC Converter via Deep Reinforcement Learning
Li et al. Deep reinforcement learning for online scheduling of photovoltaic systems with battery energy storage systems
Gambino et al. Model predictive control for optimization of combined heat and electric power microgrid
Gbadega et al. Iterative LMI-tuned PID controller for robust decentralized multi-area automatic generation control in deregulated electricity markets
Capitanescu Challenges ahead risk-based ac optimal power flow under uncertainty for smart sustainable power systems
Pinthurat et al. Simultaneous Voltage Regulation and Unbalance Compensation in Distribution Systems With an Information-Driven Learning Approach
CN114336634B (zh) 一种电网系统的潮流计算方法、装置和设备
Nayak et al. Modified differential evolution optimization algorithm for multi-constraint optimal power flow
Lenin et al. Modified monkey optimization algorithm for solving optimal reactive power dispatch problem
Hazra et al. A study on real and reactive power optimization using particle swarm optimization
Kisengeu et al. Under voltage load shedding using hybrid metaheuristic algorithms for voltage stability enhancement: A review
Kodeeswara Kumaran et al. Two-area power system stability analysis by frequency controller with UPFC synchronization and energy storage systems by optimization approach
Zhang et al. Scalable and privacy-preserving distributed energy management for multimicrogrid

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892983

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 22892983

Country of ref document: EP

Kind code of ref document: A1