WO2025135608A1

WO2025135608A1 - Drone and control method therefor

Info

Publication number: WO2025135608A1
Application number: PCT/KR2024/019561
Authority: WO
Inventors: 박현빈
Original assignee: Posco Holdings Inc
Current assignee: Posco Holdings Inc
Priority date: 2023-12-18
Filing date: 2024-12-03
Publication date: 2025-06-26
Anticipated expiration: 2026-06-18
Also published as: KR20250094759A

Abstract

The embodiments of the present invention provide a drone and a control method therefor, in which, by converting a motor signal to a motor control signal in a general situation, the drone is controlled so as to fly along a specific flight trajectory, a specific situation is recognized by measuring attitude on the basis of sensor data, the difference between an input value and an estimated input value is estimated as estimated disturbance, the input value being outputted from a policy-based reinforcement learning network which has undergone reinforcement learning so as to output the input value by receiving a desired current state, and the estimated input value being outputted from a Bayesian network trained so as to output the estimated input value by receiving the current state and the next state, and the drone is controlled so as to fly while maintaining attitude by adjusting the motor control signal in the specific situation by means of a control signal in which disturbance that can actually occur is canceled by subtracting the estimated disturbance from the input value.

Description

Drone and method of controlling the drone

본 실시예들은 특정 상황에서 기체를 제어하는 드론 및 그 드론의 제어방법에 관한 것이다.The present embodiments relate to a drone for controlling an aircraft in a specific situation and a method for controlling the drone.

예를 들어, 아르헨티나에 위치한 염수 리튬을 생산하기 위해서는 각 폰드별 목표 농도를 유지해야 하며, 농도를 유지하기 위한 관리 작업으로서 염수 폰드별 채수 및 수심 측정 작업이 존재한다. 해당 작업의 효율성을 높이기 위한 방안 중 하나로서 드론을 활용할 수 있으나, 리튬 염호가 위치한 곳은 증발량을 극대화하기 위해 바람이 잦은 곳에 위치하게 된다. For example, in order to produce brine lithium in Argentina, the target concentration must be maintained for each pond, and as a management task to maintain the concentration, there are tasks such as collecting water and measuring water depth for each brine pond. Drones can be used as one way to increase the efficiency of this task, but the lithium salt lake is located in a windy place to maximize evaporation.

드론은 대표적인 부족구동(under-actuated) 시스템으로서 바람과 같은 외란에 취약하며, 또한, 가장 전형적으로 사용되는 PID 제어기는 경로 추적에 맞춰 하이퍼 매개변수가 조절돼 있기 때문에 돌풍에 취약하다.Drones are a typical under-actuated system and are vulnerable to external disturbances such as wind. In addition, the most typically used PID controllers are vulnerable to gusts because their hyperparameters are tuned for path following.

대부분의 드론에서 특정 상황, 예를 들어 돌풍의 영향을 받아 드론이 표류하거나 기울어 추락 또는 제어력 상실로 이어질 수 있는 위험성을 내포한다.Most drones pose a risk of drifting or tilting under certain conditions, such as when subjected to gusts of wind, which can lead to a crash or loss of control.

본 실시예들은 돌풍 상황과 같은 특정 상황에서 저지연으로 드론을 안정적으로 제어하는 드론 및 그 드론의 제어 방법을 제공할 수 있다.The present embodiments can provide a drone and a method for controlling the drone, which stably control the drone with low delay in specific situations such as a gust of wind.

본 발명의 실시예들은 일반적인 상황에서 모터 신호를 모터 제어 신호로 변환하여 특정 비행 궤적으로 비행하도록 제어하고, 센서 데이터를 기반으로 자세를 측정하여 특정 상황을 인지하고, 원하는 현재 상태를 입력받아 입력값을 출력하도록 강화학습된 정책 기반 강화학습 네트워크로부터 출력되는 입력값과 현재 상태와 다음 상태를 입력받아 추정 입력값을 출력하도록 학습된 베이지안 네트워크로부터 출력된 추정 입력값의 차이를 추정 외란으로 추정하고 입력값에 추정 외란을 빼서 실제 발생할 수 있는 외란을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어하는 드론 및 그 드론의 제어 방법을 제공한다.Embodiments of the present invention provide a drone and a method for controlling the drone to fly while maintaining the attitude by adjusting the motor control signal in a specific situation as a control signal by converting a motor signal into a motor control signal in a general situation and controlling the drone to fly along a specific flight trajectory, measuring an attitude based on sensor data to recognize a specific situation, and estimating the difference between an input value output from a policy-based reinforcement learning network that receives a desired current state as an input and outputs an input value and an estimated input value output from a Bayesian network that is trained to receive a current state and a next state as inputs and output an estimated input value, and controlling the drone to fly while maintaining the attitude by adjusting the motor control signal in a specific situation as a control signal by subtracting the estimated disturbance from the input value and offsetting the disturbance that may actually occur.

일 측면에서, 본 실시예들은 센서 데이터를 제공하는 드론 센서부, 모터 신호를 제공하는 드론 비행부 및 일반적인 상황에서 모터 신호를 모터 제어 신호로 변환하여 특정 비행 궤적으로 비행하도록 제어하고, 센서 데이터를 기반으로 자세를 측정하여 특정 상황을 인지하고, 원하는 현재 상태를 입력받아 입력값을 출력하도록 강화학습된 정책 기반 강화학습 네트워크로부터 출력되는 입력값과 현재 상태와 다음 상태를 입력받아 추정 입력값을 출력하도록 학습된 베이지안 네트워크로부터 출력된 추정 입력값의 차이를 추정 외란으로 추정하고 입력값에 추정 외란을 빼서 실제 발생할 수 있는 외란을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어하는 드론 제어부를 포함하는 드론을 제공한다.In one aspect, the present embodiments provide a drone including a drone sensor unit that provides sensor data, a drone flight unit that provides a motor signal, and a drone control unit that controls the drone to fly along a specific flight trajectory by converting the motor signal into a motor control signal in a general situation, measures an attitude based on the sensor data to recognize a specific situation, and estimates the difference between an input value output from a policy-based reinforcement learning network that receives a desired current state as an input and outputs an input value and an estimated input value output from a Bayesian network that is trained to receive a current state and a next state as an input and output an estimated input value, and controls the drone to fly while maintaining the attitude by adjusting the motor control signal in a specific situation as a control signal that offsets a disturbance that may actually occur by subtracting the estimated disturbance from the input value.

다른 측면에서, 본 실시예들은 일반적인 상황에서 모터 신호를 모터 제어 신호로 변환하여 특정 비행 궤적으로 비행하도록 제어하는 제1단계 및 센서 데이터를 기반으로 자세를 측정하여 특정 상황을 인지하고, 원하는 현재 상태를 입력받아 입력값을 출력하도록 강화학습된 정책 기반 강화학습 네트워크로부터 출력되는 입력값과 다음 상태를 입력받아 추정 입력값을 출력하도록 학습된 베이지안 네트워크로부터 출력된 추정 입력값의 차이를 추정 외란으로 추정하고 입력값에 추정 외란을 빼서 실제 발생할 수 있는 외란을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어하는 제2단계를 포함하는 드론의 제어 방법을 제공할 수 있다.In another aspect, the present embodiments can provide a method for controlling a drone, including a first step for controlling the drone to fly along a specific flight trajectory by converting a motor signal into a motor control signal in a general situation, and a second step for controlling the drone to fly while maintaining the attitude by measuring the attitude based on sensor data, recognizing a specific situation, estimating the difference between an input value output from a policy-based reinforcement learning network that receives a desired current state as an input and outputs an input value, and an estimated input value output from a Bayesian network that is trained to receive a next state as an input and output an estimated input value, and subtracting the estimated disturbance from the input value to cancel out a disturbance that may actually occur.

본 실시예들에 따른 드론 및 그 드론의 제어 방법에 의하면, 돌풍 상황과 같은 특정 상황에서 저지연으로 드론을 안정적으로 제어할 수 있다.According to the drone and the method for controlling the drone according to the present embodiments, the drone can be stably controlled with low delay in specific situations such as a gust of wind.

도 1은 실시예들이 적용되는 드론의 사시도이다. Figure 1 is a perspective view of a drone to which embodiments are applied.

도 2는 일 실시예에 따른 드론의 구성도이다. Figure 2 is a configuration diagram of a drone according to one embodiment.

도 3은 도 2의 드론 제어부의 일 예의 구성도이다.Figure 3 is a configuration diagram of an example of the drone control unit of Figure 2.

도 4는 도 2 및 도 3의 제1제어부와 드론 구동부의 관계도이다.Figure 4 is a relationship diagram between the first control unit and the drone drive unit of Figures 2 and 3.

도 5는 도 2 및 도 3의 제2제어부와 드론 구동부의 관계도이다.Figure 5 is a relationship diagram between the second control unit and the drone drive unit of Figures 2 and 3.

도 6은 도 5의 제2제어부(154)에서 사용되는 정책기반 강화학습 네트워크의 개념도이다. Figure 6 is a conceptual diagram of a policy-based reinforcement learning network used in the second control unit (154) of Figure 5.

도 7은 도 5의 제2제어부의 정책 기반 강화학습 네트워크와 베이지안 네트워크의 관계의 일 예를 도시하고 있다. Figure 7 illustrates an example of the relationship between the policy-based reinforcement learning network of the second control unit of Figure 5 and the Bayesian network.

도 8은 도 5의 제2제어부의 정책 기반 강화학습 네트워크와 베이지안 네트워크의 관계의 다른 예를 도시하고 있다.Figure 8 illustrates another example of the relationship between the policy-based reinforcement learning network of the second control unit of Figure 5 and the Bayesian network.

도 9은 학습 단계에서 도 2 및 도 3의 제2제어부와 드론 구동부의 적용예이다. Figure 9 is an example of application of the second control unit and drone drive unit of Figures 2 and 3 in the learning stage.

도 10은 사용 단계에서 도 2 및 도 3의 제2제어부와 드론 구동부의 적용예이다.Fig. 10 is an application example of the second control unit and the drone drive unit of Figs. 2 and 3 in the use stage.

도 11은 도 2 및 도 3의 드론 구동부의 개념도이다.Figure 11 is a conceptual diagram of the drone driving unit of Figures 2 and 3.

도 12는 도 2 및 도 3의 드론의 시뮬레이터의 일 예를 도시하고 있다. Figure 12 illustrates an example of a simulator of the drone of Figures 2 and 3.

도 13은 또 다른 실시예에 따른 드론 제어 방법의 흐름도이다.Figure 13 is a flowchart of a drone control method according to another embodiment.

이하, 본 개시의 일부 실시예들을 예시적인 도면을 통해 상세하게 설명한다. 각 도면의 구성요소들에 참조부호를 부가함에 있어서, 동일한 구성요소들에 대해서는 비록 다른 도면상에 표시되더라도 가능한 한 동일한 부호를 가지도록 하고 있음에 유의해야 한다. 또한, 본 개시를 설명함에 있어, 관련된 공지 구성 또는 기능에 대한 구체적인 설명이 본 개시의 요지를 흐릴 수 있다고 판단되는 경우에는 그 상세한 설명은 생략한다.Hereinafter, some embodiments of the present disclosure will be described in detail with reference to exemplary drawings. When adding reference numerals to components in each drawing, it should be noted that the same components are given the same numerals as much as possible even if they are shown in different drawings. In addition, when describing the present disclosure, if it is determined that a specific description of a related known configuration or function may obscure the gist of the present disclosure, the detailed description thereof will be omitted.

또한, 본 개시의 구성 요소를 설명하는 데 있어서, 제 1, 제 2, A, B, (a), (b) 등의 용어를 사용할 수 있다. 이러한 용어는 그 구성 요소를 다른 구성 요소와 구별하기 위한 것일 뿐, 그 용어에 의해 해당 구성 요소의 본질이나 차례 또는 순서 등이 한정되지 않는다. 어떤 구성 요소가 다른 구성요소에 "연결", "결합" 또는 "접속"된다고 기재된 경우, 그 구성 요소는 그 다른 구성요소에 직접적으로 연결되거나 또는 접속될 수 있지만, 각 구성 요소 사이에 또 다른 구성 요소가 "연결", "결합" 또는 "접속"될 수도 있다고 이해되어야 할 것이다.Also, in describing components of the present disclosure, terms such as first, second, A, B, (a), (b), etc. may be used. These terms are only intended to distinguish the components from other components, and the nature, order, or sequence of the components are not limited by the terms. When it is described that a component is "connected," "coupled," or "connected" to another component, it should be understood that the component may be directly connected or connected to the other component, but another component may also be "connected," "coupled," or "connected" between each component.

본 발명의 실시예들에 있어서, 에너지의 형태는 전기 에너지, 열 에너지, 빛 에너지 등일 수 있다. 이하, 본 발명의 실시예들에서는 에너지의 형태가 전기 에너지인 경우에 대해 주로 설명하나 에너지의 형태는 이에 한정되지 않는다.In embodiments of the present invention, the form of energy may be electrical energy, thermal energy, light energy, etc. Hereinafter, embodiments of the present invention will mainly describe a case where the form of energy is electrical energy, but the form of energy is not limited thereto.

이하에서는, 관련 도면을 참조하여 본 발명의 실시예들에 따른 에너지 관리 장치, 다중 에이전트 기반 배터리 냉각판 설계 장치 및 다중 에이전트 기반 배터리 냉각판 설계 장치에 대하여 설명하기로 한다.Hereinafter, an energy management device, a multi-agent based battery cooling plate design device, and a multi-agent based battery cooling plate design device according to embodiments of the present invention will be described with reference to related drawings.

이하 도면을 참조하여 실시예들을 상세히 설명한다. The embodiments are described in detail with reference to the drawings below.

도 1은 일 실시예에 따른 드론의 정면도이다 Figure 1 is a front view of a drone according to one embodiment.

도 1을 참조하면, 일 실시예에 따른 드론(100)은 프로펠러(110)와 전기모터(120)를 이용하여 비행할 수 있다. 전기모터(120)는 전력을 전기적 에너지로 변환하여 프로펠러(110)를 회전시키는데 사용된다. 이 회전운동은 공기를 밀어내어 드론(100)을 위쪽으로 추진하는 힘을 만들어낸다. 프로펠러(110)는 이 회전운동을 이용해 공기를 이동시키는데 도움을 주는데, 이런 원리를 이용해 드론(100)이 비행할 수 있다. Referring to FIG. 1, a drone (100) according to one embodiment can fly using a propeller (110) and an electric motor (120). The electric motor (120) is used to convert electric power into electrical energy to rotate the propeller (110). This rotational motion pushes out air, creating a force that propels the drone (100) upward. The propeller (110) helps to move the air using this rotational motion, and the drone (100) can fly using this principle.

드론(100)은 다양한 센서와 카메라, GPS, 통신 시스템 등을 탑재하여 비행 제어와 데이터 수집에 사용된다. Drones (100) are equipped with various sensors, cameras, GPS, communication systems, etc. and are used for flight control and data collection.

드론(100)은 여러 개(예를 들어 도 1의 4개)의 프로펠러들(110)를 가지고 있어 각각의 프로펠러(110)를 제어하여 상승, 하강, 전진, 후진, 좌우 이동, 회전 등을 수행할 수 있다. 이를 위해서는 각 프로펠러(110)의 회전 속도를 조절하여 드론(100)의 방향과 높이를 조절한다. The drone (100) has multiple (e.g., four in FIG. 1) propellers (110), and each propeller (110) can be controlled to perform ascent, descent, forward, backward, left and right movement, rotation, etc. To this end, the direction and height of the drone (100) are controlled by controlling the rotation speed of each propeller (110).

드론(100)에는 비행을 제어하기 위한 시스템이 내장되어 있다. 이 시스템은 사용자가 입력한 명령에 따라 각 전기모터(120)의 속도를 조절하고, 자세를 조절하여 원하는 방향으로 이동하도록 한다. The drone (100) has a built-in system for controlling flight. This system controls the speed of each electric motor (120) and adjusts the attitude to move in the desired direction according to the command entered by the user.

일 실시예에 따른 드론(100)은 돌풍 상황과 같은 특정 상황에서 저지연으로 표류하거나 기울어 추락 또는 제어력을 상실하지 않고 자세와 속도를 안정적으로 제어한다. A drone (100) according to one embodiment stably controls attitude and speed without drifting, tilting, crashing or losing control with low delay in specific situations such as gusts.

도 2는 도 1의 드론의 구성도이다. Figure 2 is a configuration diagram of the drone of Figure 1.

도 2를 참조하면, 일 실시예에 따른 드론(100)은 센서 데이터를 제공하는 드론 센서부(130), 모터 신호를 제공하는 드론 비행부(140) 및 모터 신호를 모터 제어 신호로 변환하여 특정 비행 궤적으로 비행하도록 제어하는 드론 제어부를 포함한다. Referring to FIG. 2, a drone (100) according to one embodiment includes a drone sensor unit (130) that provides sensor data, a drone flight unit (140) that provides motor signals, and a drone control unit that converts the motor signals into motor control signals and controls the drone to fly along a specific flight trajectory.

드론 제어부(150)는 일반적인 상황에서 모터 신호를 모터 제어 신호로 변환하여 특정 비행 궤적으로 비행하도록 제어하고, 센서 데이터를 기반으로 자세를 측정하여 특정 상황을 인지하고, 원하는 현재 상태를 입력받아 입력값을 출력하도록 강화학습된 정책 기반 강화학습 네트워크로부터 출력되는 입력값과 현재 상태와 다음 상태를 입력받아 추정 입력값을 출력하도록 학습된 베이지안 네트워크로부터 출력된 추정 입력값의 차이를 추정 외란으로 추정하고 입력값에 추정 외란을 빼서 실제 발생할 수 있는 외란을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어할 수 있다. The drone control unit (150) controls the drone to fly along a specific flight trajectory by converting a motor signal into a motor control signal in a general situation, recognizes a specific situation by measuring an attitude based on sensor data, and estimates the difference between an input value output from a policy-based reinforcement learning network that receives a desired current state as input and outputs an input value and an estimated input value output from a Bayesian network that is trained to receive a current state and a next state as inputs and outputs an estimated input value as an estimated disturbance, and controls the drone to fly while maintaining an attitude by adjusting the motor control signal in a specific situation as a control signal that offsets disturbances that may actually occur by subtracting the estimated disturbance from the input value.

이하에서 특정 상황은 전술한 바와 같이 돌풍 상황을 예시적으로 설명하나, 이에 제한되지 않고 일반적으로 드론(100)이 표류하거나 기울어 추락 또는 제어력을 상실할 수 있는 일반적 상황이 아닌 모든 경우를 포함한다. 즉, 본 명세서는 드론(10)이 두개의 상황들을 정의하고, 일반적 상황과 같은 제1상황에서 일반적인 제어 동작을 수행하다 특정 상황과 같은 제2상황에서 특수한 제어 동작을 수행할 수 있다. 후술하는 바와 같이 드론(100)은 일반적 상황과 특수 상황에서 소프트웨어적인 제어 동작을 수행할 수도 있고, 하드웨어적인 제어 동작을 수행할 수도 있다. Hereinafter, a specific situation is exemplified by a gust situation as described above, but is not limited thereto, and generally includes all cases other than a general situation in which a drone (100) may drift or tilt, crash, or lose control. That is, this specification defines two situations in which a drone (100) may perform a general control operation in a first situation such as a general situation, and perform a special control operation in a second situation such as a specific situation. As described below, the drone (100) may perform a software control operation or a hardware control operation in the general situation and the special situation.

드론 센서부(130)는 자이로 센서(132)와 가속도 센서(134)를 포함하고, 센서 데이터는 자이로 센서(132)에서 센싱된 각속도와 가속도 센서(134)에서 센싱한 가속도일 수 있으나, 이에 제한되지 않는다. The drone sensor unit (130) includes a gyro sensor (132) and an acceleration sensor (134), and the sensor data may be, but is not limited to, angular velocity sensed by the gyro sensor (132) and acceleration sensed by the acceleration sensor (134).

자이로 센서(132)는 각속도를 측정할 수 있다. 자이로센서(132)의 일예로 각속도를 측정하는 자이로스코프일 수 있다. 가속도 센서(134)는 가속도를 측정하는 센서이다. 가속도 센서(132)는 예를 들어 x축과 y축, z축 방향의 가속도를 측정할 수 있다. The gyro sensor (132) can measure angular velocity. An example of the gyro sensor (132) may be a gyroscope that measures angular velocity. The acceleration sensor (134) is a sensor that measures acceleration. The acceleration sensor (132) can measure acceleration in the x-axis, y-axis, and z-axis directions, for example.

드론 비행부(140)는 드론 제어부(150)에 의해 조정된 모터 제어 신호를 모터 속도와 방향을 조정하는 특정 PWM 신호로 변환할 수 있다. The drone flight unit (140) can convert a motor control signal adjusted by the drone control unit (150) into a specific PWM signal that adjusts the motor speed and direction.

드론 비행부(140)는 드론 제어부(150)에 의해 조정된 모터 제어 신호를 모터 속도와 방향을 조정하는 해당 PWM 신호로 변환한다. 드론 비행부(140)의 제어 로직은 공기역학 및 비행 역학 모델을 사용하여 바람의 방해를 보상하고 원하는 비행 궤적을 유지한다.The drone flight unit (140) converts the motor control signal adjusted by the drone control unit (150) into a corresponding PWM signal that adjusts the motor speed and direction. The control logic of the drone flight unit (140) uses aerodynamic and flight dynamics models to compensate for wind disturbance and maintain the desired flight trajectory.

드론 비행부(140)는 변환된 PWM 신호를 전기 모터(120)에 제공하고 전기 모터(120)를 제어하므로 드론(200)의 비행을 유도할 수 있다.The drone flight unit (140) provides the converted PWM signal to the electric motor (120) and controls the electric motor (120), thereby inducing the flight of the drone (200).

드론 제어부(150)는 마이크로 프로세서를 포함하고 있고, 드론(100)은 메모리(미도시)를 추가로 포함할 수 있다. 메모리는 각종 센서 데이터와 각종 프로그램을 저장하고 있다. 메모리는 휘발성 메모리(e.g. SRAM, DRAM) 또는 비휘발성 메모리(e.g. NAND Flash)일 수 있다.The drone control unit (150) includes a microprocessor, and the drone (100) may additionally include a memory (not shown). The memory stores various sensor data and various programs. The memory may be a volatile memory (e.g. SRAM, DRAM) or a nonvolatile memory (e.g. NAND Flash).

전술한 일 실시예에 따른 드론(100)은 사용하고자 하는 상용 드론의 시뮬레이터를 활용하여 해당 드론의 모델 다이나믹스를 베이지안 네트워크를 통해 학습한다. The drone (100) according to the above-described embodiment learns the model dynamics of the commercial drone to be used through a Bayesian network by utilizing a simulator of the commercial drone to be used.

베이지안 네트워크(Bayesian Network)는 확률적인 그래픽 모델 중 하나로, 변수 간의 확률적 의존 관계를 표현하는데 사용된다. 베이지안 네트워크의 파라미터는 네트워크를 구성하는 각 변수들의 조건부 확률(conditional probability)로, 이를 통해 변수들 간의 관계를 모델링한다. 예를 들어, 베이지안 네트워크의 학습 과정에서 파라미터를 추정하는 방법은 최대 우도 추정(Maximum Likelihood Estimation, MLE)이라는 방법을 활용한다. 이는 데이터로부터 가장 가능성이 높은 파라미터를 찾는 방법이다.Bayesian Network is one of the probabilistic graphical models, used to express probabilistic dependencies between variables. The parameters of the Bayesian Network are the conditional probabilities of each variable that constitutes the network, and the relationships between variables are modeled through this. For example, the method of estimating parameters in the learning process of the Bayesian Network utilizes a method called Maximum Likelihood Estimation (MLE). This is a method of finding the most probable parameters from data.

학습된 베이지안 네트워크는 새로운 데이터에 대한 예측을 수행할 때, 학습된 파라미터를 이용하여 확률을 계산한다. A trained Bayesian network uses learned parameters to calculate probabilities when making predictions on new data.

광의의 베이지안 네트워크는 일반적인 변수들 간의 조건부 의존 관계를 그래프로 표현한 베이지안 네트워크 (Bayesian Network) 뿐만 아니라, 노드와 에지로 이루어진 그래프 구조로 표현되며, 에지는 조건부 독립성을 나타내는 마르코프 네트워크 (Markov Network): 또는 마르코프 랜덤 필드(Markov Random Field), 시간에 따라 변화하는 상황에서 변수 간의 의존 관계를 모델링하는데 사용되는 다이나믹 베이지안 네트워크 (Dynamic Bayesian Network), 데이터로부터 자동으로 구조를 학습하여 네트워크를 생성하는 구조적 베이지안 네트워크 (Structural Bayesian Network) 등 다양한 종류들이 존재한다. Bayesian networks in a broad sense exist in various types, including Bayesian networks, which represent conditional dependencies between general variables as a graph, Markov networks, or Markov Random Fields, which are expressed as graph structures consisting of nodes and edges, where edges represent conditional independence, Dynamic Bayesian Networks, which are used to model dependencies between variables in situations that change over time, and Structural Bayesian Networks, which automatically learn structures from data to create networks.

또한, 사용되는 베이지안 네트워크는 학습 시, 매 단계마다 파라미터 분포에 대한 신호 대 잡음비를 계산하고, 해당 신호 대 잡음비 값이 작을 경우, 결과값에 영향이 적은 파라미터로 간주되므로 가지치기(pruning)를 진행하여 경량화한 회소 베이지안 네트워크(Sparse Bayesian Network)일 수도 있다. In addition, the Bayesian network used may be a sparse Bayesian network that calculates the signal-to-noise ratio for the parameter distribution at each step during learning, and if the signal-to-noise ratio value is small, it is considered a parameter with little influence on the result value, so it may be pruned to make it lightweight.

예를 들어, 전술한 일 실시예에 따른 드론(100)은 학습된 희소 베이지안 네트워크에서 출력된 시스템 입력 추정치와 의도한 시스템 입력 간의 차이를 외란으로 간주하여 다음 입력에서 상쇄시킨다. 희소 베이지안 네트워크는 학습 과정에서 네트워크를 압축하여 용량 및 추론 속도를 감소시킨다. 전술한 일 실시예에 따른 드론(100)으로 인해 상대적으로 많은 바람이 부는 아르헨티나 상공 지역에서 드론 사용 시 돌풍으로 인한 추락을 대비할 수 있다.For example, the drone (100) according to the above-described embodiment considers the difference between the system input estimate output from the learned sparse Bayesian network and the intended system input as a disturbance and offsets it in the next input. The sparse Bayesian network compresses the network during the learning process to reduce the capacity and inference speed. The drone (100) according to the above-described embodiment can prevent crashes due to gusts when using the drone in the relatively windy airspace over Argentina.

돌풍 상황과 같은 특정 상황에서 드론 추락 시, 강화학습을 이용해 호버링(공중 정지)을 할 수 있도록 제어할 수 있다. 하지만 강화학습의 특성 상, 학습 환경에서 경험하지 못한 외란에 취약하기 때문에, 실제 환경에서 이를 적용하기 위해서는 정책 기반 강화학습 네트워크의 강건성을 향상시킬 수 있는 방안이 필요하다. 강화학습의 강건성을 향상시키기 위한, 일반적인 인공신경망을 통해 모델 다이나믹스를 추정한 후, 외란 관측기를 적용할 수 있다. 하지만, 일반적인 인공신경망은 특유의 큰 용량과 상대적으로 느린 추론 속도 때문에 임베디드 시스템과 같은 성능이 제한된 환경에서는 사용이 제한된다.In certain situations, such as a gust of wind, when a drone crashes, it can be controlled to hover (suspend in the air) using reinforcement learning. However, due to the nature of reinforcement learning, it is vulnerable to disturbances not experienced in the learning environment, so a method to improve the robustness of the policy-based reinforcement learning network is needed to apply it in a real environment. To improve the robustness of reinforcement learning, a disturbance observer can be applied after estimating the model dynamics through a general artificial neural network. However, general artificial neural networks have limited use in performance-limited environments such as embedded systems due to their unique large capacity and relatively slow inference speed.

전술한 일 실시예에 따른 드론(100)은 베이지안 네트워크, 예를 들어 희소 베이지안 네트워크 및 외란 관측기를 활용하여 정책 기반 강화학습 네트워크의 강건성을 높이고 네트워크 용량 및 추론 속도를 줄일 수 있다. The drone (100) according to the above-described embodiment can increase the robustness of a policy-based reinforcement learning network and reduce network capacity and inference speed by utilizing a Bayesian network, for example, a sparse Bayesian network and a disturbance observer.

도 3을 참조하면, 드론 제어부(150)는 일반적인 상황에서 모터 신호를 모터 제어 신호로 변환하여 특정 비행 궤적으로 비행하도록 제어하는 제1제어부(152) 및 입력값에 정책 기반 강화학습 네트워크로부터 출력되는 입력값과 베이지안 네트워크로부터 출력된 추정 입력값의 차이인 추정 외란을 빼서 실제 발생할 수 있는 외란을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어하는 제2제어부(154)를 포함할 수 있다.Referring to FIG. 3, the drone control unit (150) may include a first control unit (152) that controls the drone to fly along a specific flight trajectory by converting a motor signal into a motor control signal in a general situation, and a second control unit (154) that controls the drone to fly while maintaining an attitude by adjusting the motor control signal in a specific situation with a control signal that offsets a disturbance that may actually occur by subtracting an estimated disturbance, which is a difference between an input value output from a policy-based reinforcement learning network and an estimated input value output from a Bayesian network, from the input value.

제1제어부(152)와 제2제어부(154)는 별도의 하드웨어들로 구현될 수도 있고, 소프트웨어적으로 구현될 수도 있고, 하나는 하드웨어로 구현되고 다른 하나는 소프트웨어로 구현될 수도 있다. The first control unit (152) and the second control unit (154) may be implemented as separate hardware, may be implemented as software, or may be implemented as one hardware and the other software.

도 4는 도 2 및 도 3의 제1제어부와 드론 구동부의 관계도이다. 도 5는 도 2 및 도 3의 제2제어부와 드론 구동부의 관계도이다.Fig. 4 is a relationship diagram between the first control unit and the drone drive unit of Figs. 2 and 3. Fig. 5 is a relationship diagram between the second control unit and the drone drive unit of Figs. 2 and 3.

도 4를 참조하면, 제1제어부(152)는 일반적인 상황에서 모터 신호를 모터 제어 신호로 변환하여 특정 비행 궤적으로 비행하도록 제어한다. 제1제어부(152)는 일반적인 상황에서 원하는 현재 상태(Sr)를 입력받아 모터 제어 신호(u)를 드론 구동부(140)에 출력한다. 드론 구동부(140)는 수신한 모터 제어 신호(u)로 전기 모터를 구동한다. Referring to FIG. 4, the first control unit (152) converts a motor signal into a motor control signal in a general situation and controls the flight to a specific flight trajectory. The first control unit (152) receives a desired current state (Sr) in a general situation and outputs a motor control signal (u) to the drone drive unit (140). The drone drive unit (140) drives the electric motor with the received motor control signal (u).

도 5를 참조하면, 제2제어부(154)는, 원하는 현재 상태(s_r)와 드론 구동부(140)의 출력값(s)의 차이값, 즉 에러값(e_s)를 입력받아 제어신호값(u)을 출력하도록 강화학습된 정책 기반 강화학습 네트워크(policy network, 156)로부터 출력되는 제어신호(u)과 현재 상태(s_r)와 다음 상태(s)를 입력받아 추정 입력값(

)을 출력하도록 학습된 베이지안 네트워크(158)로부터 출력된 추정 입력값(

)의 차이(

-u)를 추정 외란(

)으로 추정하고 입력값(u)에 추정 외란(

)을 빼서 실제 발생할 수 있는 외란(d)을 상쇄한 실제 제어신호(u_real)로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어할 수 있다. Referring to FIG. 5, the second control unit (154) receives the difference value (e s ), that is, the error value (e s ) between the desired current state (s _r ) and the output value ( _s ) of the drone drive unit (140), and outputs a control signal value (u) from a policy-based reinforcement learning network (policy network, 156) that has undergone reinforcement learning, and receives the control signal (u) and the current state (s _r ) and the next state (s) as inputs to output an estimated input value (

) is output from a Bayesian network (158) trained to output the estimated input value (

) difference(

-u) is estimated as a disturbance (

) is estimated and the input value (u) is estimated as a disturbance (

) can be used to control flight while maintaining attitude by adjusting the motor control signal in a specific situation with the real control signal (u _real ) that compensates for the disturbance (d) that may actually occur.

간단한 예를 들어, 드론 구동부(140)의 제어신호가 RPM값이라면, 정책 기반 강화학습 네트워크(policy network, 156)로부터 출력되는 제어신호(u)이 4000RPM이라고 가정하겠다. 이때 추정 외란(

)이 200RPM이라면 드론 구동부(140)에 입력되는 실제 제어신호값은 3800RPM이 될 수 있다. For a simple example, if the control signal of the drone drive unit (140) is an RPM value, it is assumed that the control signal (u) output from the policy-based reinforcement learning network (policy network, 156) is 4000 RPM. At this time, the estimated disturbance (

) is 200 RPM, the actual control signal value input to the drone drive unit (140) can be 3800 RPM.

일반적으로 입력값을 추정하기 위해 동역학 모델의 역함수를 계산하는 것과 달리, 베이지안 네트워크(158)은 후술하는 바와 같이 강화학습의 버퍼에 있는 학습 데이터 쌍을 활용한 지도 학습을 통해 입력값을 추정하도록 학습된다. 제2제어부(154)는 학습된 베이지안 네트워크(158)로 전술한 인공신경망inverse model dynamics을 대체함으로써 물리적인 모델링없이 정책 기반 강화학습 네트워크(154)에 외란 관측기를 구현할 수 있다.Unlike the general method of calculating the inverse function of a dynamic model to estimate input values, the Bayesian network (158) is trained to estimate input values through supervised learning using training data pairs in a buffer of reinforcement learning, as described below. The second control unit (154) can implement a disturbance observer in the policy-based reinforcement learning network (154) without physical modeling by replacing the artificial neural network inverse model dynamics described above with the trained Bayesian network (158).

이하 일반적인 강화 학습과 전술한 일 실시예에 따른 드론(100)의 제2제어부(154)에서 사용되는 정책기반 강화학습 네트워크와 베이지안 네트워크(158)에 대해 도 6 내지 도 10을 참조하여 상세히 설명한다. Below, the general reinforcement learning and the policy-based reinforcement learning network and Bayesian network (158) used in the second control unit (154) of the drone (100) according to the above-described embodiment are described in detail with reference to FIGS. 6 to 10.

도 6을 참조하면, 제2제어부(154)에서 사용되는 정책기반 강화학습 네트워크(158)는 환경(environment, 156a)과 에이전트(agent, 156b)가 상태(state)/보상(reward)와 액션(action)을 교환하는 시스템이다. 환경(156a)은 후술하는 바와 같이 다양한 형태의 시뮬레이션 환경 또는 시뮬레이터를 포함할 수 있다. 본 명세서는 환경(156a)과 시뮬레이터를 동일한 의미로 사용하나, 이에 제한되지 않는다. Referring to FIG. 6, the policy-based reinforcement learning network (158) used in the second control unit (154) is a system in which an environment (156a) and an agent (156b) exchange states/rewards and actions. The environment (156a) may include various forms of simulation environments or simulators as described below. In this specification, the environment (156a) and simulator are used interchangeably, but are not limited thereto.

환경(시뮬레이터)(156a)는 현재 상태(Current state, S_t)와 액션(action, A_t)을 입력 받아 다음 상태(Next State, S_t+1)와 보상(reward, R_t+1)를 출력할 수 있다. 에이전트(156b)는 이 과정에서 정책(policy, π(a/s): S==>R∈{0,1})을 학습하게 된다. The environment (simulator) (156a) can input the current state (Current state, S _t ) and action (action, A _t ) and output the next state (Next State, S _t+1 ) and reward (reward, R _t+1 ). In this process, the agent (156b) learns a policy (policy, π(a/s): S ==> R∈{0,1}).

도 7은 도 5의 제2제어부의 정책 기반 강화학습 네트워크와 베이지안 네트워크의 관계의 일 예를 도시하고 있다.Figure 7 illustrates an example of the relationship between the policy-based reinforcement learning network of the second control unit of Figure 5 and the Bayesian network.

도 7을 참조하면, 제2제어부(154)에서 사용되는 정책 기반 강화학습 네트워크(156)에서. 환경(156a)에서 발생된 상태(state)와 보상(reward)을 기반으로 에이전트(156b)가 다음 행동(action)을 반복적으로 수행하며, 축적된 보상의 합이 최대화가 되도록 행동 결정을 업데이트한다.Referring to Fig. 7, in a policy-based reinforcement learning network (156) used in the second control unit (154), an agent (156b) repeatedly performs the next action based on the state and reward generated in the environment (156a), and updates the action decision so that the sum of accumulated rewards is maximized.

제2제어부(154)에서 사용되는 베이지안 네트워크(158)는 정책 기반 강화학습 네트워크(156)의 정책 기반 강화학습에서 발생하는 학습 데이터 쌍(s_t,

, s_t+1)을 버퍼(159)에 저장한 후, 미니 배치 크기(M)만큼 샘플링하여 학습될 수 있다. The Bayesian network (158) used in the second control unit (154) is a learning data pair (s _t , s t ) generated from policy-based reinforcement learning of the policy-based reinforcement learning network (156).

, s _t+1 ) can be stored in a buffer (159) and then learned by sampling as much as the mini batch size (M).

수학식 1은 베이지안 네트워크(158)의 목적식의 일 예이다. Mathematical expression 1 is an example of the objective formula of a Bayesian network (158).

수학식 1에서,

_,θ 및 M은 각각 베이지안 네트워크의 로스 함수(Loss function), 베이지안 네트워크의 훈련 매개변수, 미니 배치 크기을 나타낸다. r_sbl은 스케링값, D_KL은 후술하는 바와 같이 KL 발산 항을 나타낸다.

을 학습하기 위한 데이터는 강화학습의 버퍼(159)에서 미니 배치 크기(M)만큼 샘플링된다. In mathematical expression 1,

_, θ, and M represent the loss function of the Bayesian network, the training parameters of the Bayesian network, and the mini-batch size, respectively. r _sbl represents a scaling value, and D _KL represents a KL divergence term as described below.

Data for learning is sampled from the buffer (159) of reinforcement learning in amounts equal to the mini-batch size (M).

예를 들어 미니 배치 크기(M)를 갖는 미니 배치는

로 표현할 수 있다. For example, a mini-batch with a mini-batch size (M)

can be expressed as

수학식 1의 첫번째 항목은 시뮬레이션(156a)에서 사용된 행동 데이터(

)와 정책 기반 강화학습 네트워크(156)에서 출력된 추정 행동(

) 사이의 평균 제곱 오차 (mean square error (MSE))를 최소화하도록 설계한다. 수학식 1의 두번째 항목인 KL 발산 항은 사후 분포가 사전 분포에 가깝게 유지되도록 장려하며, 영향력이 적은 네트워크 매개변수를 제거하기 위해 각 단계에서 신호 대 잡음비(Q(W))를 계산한다.The first item in Equation 1 is the behavioral data (

) and the estimated action output from the policy-based reinforcement learning network (156).

) is designed to minimize the mean square error (MSE) between the two networks. The second term in Equation 1, the KL divergence term, encourages the posterior distribution to remain close to the prior distribution, and the signal-to-noise ratio (Q(W)) is calculated at each step to remove less influential network parameters.

도 8은 도 5의 제2제어부의 정책 기반 강화학습 네트워크와 베이지안 네트워크의 관계의 다른 예를 도시하고 있다. Figure 8 illustrates another example of the relationship between the policy-based reinforcement learning network of the second control unit of Figure 5 and the Bayesian network.

도 8을 참조하면, 제2제어부(154)에서 사용되는 베이지안 네트워크(158)는 학습 시, 매 단계마다 파라미터 분포에 대한 신호 대 잡음비를 계산하고, 해당 신호 대 잡음비 값(SNR(Q(W))이 작을 경우, 결과값에 영향이 적은 파라미터로 간주되므로 가지치기(pruning)를 진행하여 경량화할 수 있다. 베이지안 네트워크(158)에서 가지치기를 진행하여 경량화한 네크워크가 도 8에 도시한 희소 베이지안 네트워크(158b)일 수 있다. 이하에서 베이지안 네트워크(158)는 경량화하지 않는 베이지안 네트워크(158)일 수도 있고, 희소 베이지안 네트워크(158b)일 수도 있다. 이하에서 희소 베이지안 네트워크(158b)를 베이지안 네트워크로(158)로 예시하여 설명한다. Referring to FIG. 8, the Bayesian network (158) used in the second control unit (154) calculates a signal-to-noise ratio for parameter distribution at each step during learning, and if the corresponding signal-to-noise ratio value (SNR(Q(W))) is small, it is considered a parameter that has little influence on the result value, so it can be pruned to make it lightweight. The network that is made lightweight by making pruning in the Bayesian network (158) can be the sparse Bayesian network (158b) illustrated in FIG. 8. Hereinafter, the Bayesian network (158) can be a Bayesian network (158) that is not made lightweight, or can be a sparse Bayesian network (158b). Hereinafter, the sparse Bayesian network (158b) will be described as an example of a Bayesian network (158).

학습된 정책 기반 강화학습 네트워크(156)와 희소 베이지안 네트워크(158)는 외란을 관측하는 제2제어부(154)와 드론 구동부(140)로 구성된 외란 관측기의 구조로 설계된다. The learned policy-based reinforcement learning network (156) and the sparse Bayesian network (158) are designed as a structure of a disturbance observer consisting of a second control unit (154) that observes disturbances and a drone driving unit (140).

도 8 및 도 9를 참조하면, 제2제어부(154)에서 사용되는 정책 기반 강화학습 네트워크(156)에서. 인공신경망으로 구현된 에이전트(156b)가 원하는 현재 상태(s_t)를 입력받아 입력값(

)을 출력하면 시뮬레이터 환경 또는 시뮬레이터(156a)가 이 입력값(

)을 입력받아 다음 상태(S_t+1)를 생성한다. 희소 베이지안 네트워크(158)은 현재 상태와 다음 상태를 입력받아 추정 입력값을 출력하고, 비교기에서 입력값과 추정 입력값의 차이값을 계산하고 이 과정이 반복적으로 수행되는 수학식 1에 따라 축적된 차이값이 최소화되도록 회소 베이지안 네트워크(158)이 학습된다. Referring to Figures 8 and 9, in the policy-based reinforcement learning network (156) used in the second control unit (154), an agent (156b) implemented as an artificial neural network receives a desired current state (s _t ) and inputs an input value (

) outputs the simulator environment or simulator (156a) to this input value (

) is input and generates the next state (S _t+1 ). The sparse Bayesian network (158) receives the current state and the next state, outputs an estimated input value, calculates the difference between the input value and the estimated input value in the comparator, and the sparse Bayesian network (158) is trained so that the accumulated difference value is minimized according to mathematical expression 1 in which this process is performed repeatedly.

전술한 바와 같이, 제2제어부(154)에서 사용되는 베이지안 네트워크(158)는 정책 기반 강화학습 네트워크(156)의 정책 기반 강화학습에서 발생하는 학습 데이터 쌍(s_t,

, s_t+1)을, 미니 배치 크기(M)만큼 샘플링하여 학습될 수 있다. As described above, the Bayesian network (158) used in the second control unit (154) is a learning data pair (s _t , s t ) generated from policy-based reinforcement learning of the policy-based reinforcement learning network (156).

, s _t+1 ) can be learned by sampling as much as the mini-batch size (M).

도 10을 참조하면, 학습한 희소 베이지안 네트워크(158)는 실제 드론(100)의 드론 구동부(140)에서 측정된 현재 상태(s_t) 및 다음 상태(s_t+1)와 쌍을 이루는 시스템 입력을 추정한다. Referring to FIG. 10, the learned sparse Bayesian network (158) estimates a system input paired with the current state (s _t ) and the next state (s _t+1 ) measured from the drone actuator (140) of the actual drone (100).

희소 베이지안 네트워크(158)의 출력(

)과 정책 기반 강화학습 네트워크(156)에서 얻은 출력(

)의 차이는 알려지지 않은 외란(d)과 모델 불확실성에 대한 영향으로 간주된다. 결과적으로, 다음 제어 입력에 해당 추정치를 반영함으로써 의도하지 않은 외란(d)을 상쇄할 수 있다.The output of the sparse Bayesian network (158)

) and the output obtained from the policy-based reinforcement learning network (156) (

) is considered as an effect of unknown disturbance (d) and model uncertainty. Consequently, the unintended disturbance (d) can be offset by incorporating its estimate into the next control input.

다시 말해, 제2제어부(154)는, 원하는 현재 상태(s_t)를 입력받아 입력값(

)을 출력하도록 강화학습된 정책 기반 강화학습 네트워크(policy network)로부터 출력되는 입력값(

)과 현재 상태(s_t)와 다음 상태(S_t+1)를 입력받아 추정 입력값(

)을 출력하도록 학습된 베이지안 네트워크로부터 출력된 추정 입력값(

)의 차이(

-

)를 추정 외란(

)으로 추정하고 입력값(

)에 추정 외란(

)을 빼서 실제 발생할 수 있는 외란(d)을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어할 수 있다. In other words, the second control unit (154) receives the desired current state (s _t ) and inputs the input value (

) is the input value output from a policy-based reinforcement learning network that has been reinforced to output a policy (

) and the current state (s _t ) and the next state (S _t+1 ) are input to estimate the input value (

) from the Bayesian network trained to output the estimated input values (

) difference(

-

) to estimate the disturbance (

) and the input value (

) estimated external disturbance(

) can be controlled to maintain attitude and fly by adjusting the motor control signal in a specific situation as a control signal that offsets the disturbance (d) that may actually occur.

도 11을 참조하면, 제2제어부의 정책 기반 강화학습 네트워크(156)는, 강화학습 문제에 적용 시, 상태 S 는 3차원 위치 (Px, Py, Pz), 3차원 회전(roll, pitch, yaw)이며, 행동 A 는 각 모터의 추력 (F1, F2, F3, F4) 이고, 희소 베이지안 네트워크(158)는 현재 상태(s_t)와 다음 상태(s_t+1)을 입력으로 받아 각 모터의 추력인 현재 입력(

)를 출력하도록 학습될 수 있다. Referring to Fig. 11, the policy-based reinforcement learning network (156) of the second control unit, when applied to a reinforcement learning problem, the state S is a three-dimensional position (Px, Py, Pz) and a three-dimensional rotation (roll, pitch, yaw), the action A is the thrust of each motor (F1, F2, F3, F4), and the sparse Bayesian network (158) receives the current state (s _t ) and the next state (s _t+1 ) as inputs and the current input (which is the thrust of each motor)

) can be learned to output.

도 12를 참조하면, 상용 드론의 시뮬레이터로서, 학습 시, 해당 시뮬레이터에서 나오는 상태 데이터를 기반으로 드론(100)의 입력을 출력하도록 강화학습하며. 해당 데이터는 버퍼(159)에 저장되어, 추후, 희소 베이지안 네트워크(158)로 해당 모델을 학습할 때 사용된다.Referring to Fig. 12, as a simulator of a commercial drone, reinforcement learning is performed to output the input of the drone (100) based on the state data coming from the simulator during learning. The data is stored in a buffer (159) and used later when learning the model with a sparse Bayesian network (158).

전술한 다른 실시예에 따른 드론(100)에 의하면, 돌풍 상황과 같은 특정 상황에서 저지연으로 안정적으로 제어할 수 있다.According to the drone (100) according to the other embodiment described above, it is possible to stably control with low delay in specific situations such as a gust of wind.

도 13을 참조하면, 또 다른 실시예에 따른 드론 제어 방법(200)은 일반적인 상황에서 모터 신호를 모터 제어 신호로 변환하여 특정 비행 궤적으로 비행하도록 제어하는 제1단계(S210) 및 센서 데이터를 기반으로 자세를 측정하여 특정 상황을 인지하고, 원하는 현재 상태를 입력받아 입력값을 출력하도록 강화학습된 정책 기반 강화학습 네트워크로부터 출력되는 입력값과 다음 상태를 입력받아 추정 입력값을 출력하도록 학습된 베이지안 네트워크로부터 출력된 추정 입력값의 차이를 추정 외란으로 추정하고 입력값에 추정 외란을 빼서 실제 발생할 수 있는 외란을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어하는 제2단계(S210)를 포함한다. Referring to FIG. 13, a drone control method (200) according to another embodiment includes a first step (S210) for controlling the drone to fly along a specific flight trajectory by converting a motor signal into a motor control signal in a general situation, and a second step (S210) for controlling the drone to fly while maintaining the attitude by adjusting the motor control signal in a specific situation by measuring the attitude based on sensor data, recognizing a specific situation, estimating the difference between an input value output from a policy-based reinforcement learning network that receives a desired current state as an input and outputs an input value, and an estimated input value output from a Bayesian network that is trained to receive a next state as an input and output an estimated input value, and subtracting the estimated disturbance from the input value to offset a disturbance that may actually occur.

제2단계(S220)에서, 조정된 모터 제어 신호를 모터 속도와 방향을 조정하는 특정 PWM 신호로 변환할 수 있다. In the second step (S220), the adjusted motor control signal can be converted into a specific PWM signal that adjusts the motor speed and direction.

도 4 및 도 5를 참조하여 전술한 바와 같이, 제2단계(S220)는 입력값에 정책 기반 강화학습 네트워크로부터 출력되는 입력값과 베이지안 네트워크로부터 출력된 추정 입력값의 차이인 추정 외란을 빼서 실제 발생할 수 있는 외란을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어할 수 있다. As described above with reference to FIGS. 4 and 5, the second step (S220) controls flight while maintaining attitude by adjusting the motor control signal in a specific situation by subtracting the estimated disturbance, which is the difference between the input value output from the policy-based reinforcement learning network and the estimated input value output from the Bayesian network, from the input value and canceling out the disturbance that may actually occur.

제2단계(S220)는, 원하는 제t상태(s_t)를 입력받아 입력값(

)과 제t상태(s_t)와 제t상태(S_t)의 다음 상태인 제t+1상태(S_t+1)를 입력받아 추정 입력값(

)의 차이(

-

)를 추정 외란(

)으로 추정하고 입력값(

)에 추정 외란(

)을 빼서 실제 발생할 수 있는 외란(d)을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어할 수 있다. Step 2 (S220) receives the desired t-state (s _t ) and inputs the input value (

) and the t-th state (s _t ) and the t+1 state (S _t ₊₁ ), which is the next state of the t-th state (S t ), are input to estimate the input values (

) from the Bayesian network trained to output the estimated input values (

) difference(

-

) to estimate the disturbance (

) and the input value (

) estimated external disturbance(

도 6 및 도 7을 참조하여 전술한 바와 같이, 제2단계(S220)에서 사용되는 베이지안 네트워크는 강화학습 네트워크의 정책 기반 강화학습에서 발생하는 학습 데이터 쌍 (s_t,

, s_t+1)을 버퍼에 저장한 후, 미니 배치 크기만큼 샘플링하여 학습될 수 있다. As described above with reference to FIGS. 6 and 7, the Bayesian network used in the second step (S220) is a learning data pair (s _t ,

, s _t+1 ) can be stored in a buffer and then learned by sampling as much as the mini-batch size.

도 8 및 도 9를 참조하여 전술한 바와 같이, 제2단계(S220)에서 사용되는 베이지안 네트워크는 학습 시, 매 단계마다 파라미터 분포에 대한 신호 대 잡음비를 계산하고, 해당 신호 대 잡음비 값이 작을 경우, 결과값에 영향이 적은 파라미터로 간주되므로 가지치기(pruning)를 진행하여 경량화할 수 있다. As described above with reference to FIGS. 8 and 9, the Bayesian network used in the second step (S220) calculates a signal-to-noise ratio for parameter distribution at each step during learning, and if the signal-to-noise ratio value is small, it is considered a parameter with little influence on the result value, so pruning can be performed to make it lighter.

도 10을 참조하여 전술한 바와 같이, 제2단계(S220)는, 원하는 제t상태(s_t)를 입력받아 입력값(

)의 차이(

-

)를 추정 외란(

)으로 추정하고 입력값(

)에 추정 외란(

)을 빼서 실제 발생할 수 있는 외란(d)을 상쇄한 제어신호로 특정 상황에서 모터 제어 신호를 조정하여 자세를 유지하면서 비행하도록 제어할 수 있다. As described above with reference to Fig. 10, the second step (S220) receives a desired t-state (s _t ) and inputs the input value (

) from the Bayesian network trained to output the estimated input values (

) difference(

-

) to estimate the disturbance (

) and the input value (

) estimated external disturbance(

도 11을 참조하여 전술한 바와 같이, 제2단계(S220)의 정책 기반 강화학습 네트워크는, 강화학습 문제에 적용 시, 상태 S 는 3차원 위치 (Px, Py, Pz), 3차원 회전(roll, pitch, yaw)이며, 행동 A 는 각 모터의 추력 (F1, F2, F3, F4) 이고, 희소 베이지안 네트워크는 현재 상태(s_t)와 다음 상태(s_t+1)을 입력으로 받아 각 모터의 추력인 현재 입력(

)를 출력하도록 학습될 수 있다. As described above with reference to FIG. 11, the policy-based reinforcement learning network of the second stage (S220), when applied to a reinforcement learning problem, the state S is a three-dimensional position (Px, Py, Pz) and a three-dimensional rotation (roll, pitch, yaw), the action A is the thrust of each motor (F1, F2, F3, F4), and the sparse Bayesian network receives the current state (s _t ) and the next state (s _t+1 ) as inputs and the current input (

) can be learned to output.

전술한 다른 실시예에 따른 드론의 제어 방법(200)에 의하면, 돌풍 상황과 같은 특정 상황에서 저지연으로 드론을 안정적으로 제어할 수 있다.According to the drone control method (200) according to the other embodiment described above, the drone can be stably controlled with low delay in specific situations such as a gust of wind.

이상 도면을 참조하여 실시예들에 따른 드론 및 그 드론의 제어 방법을 설명하였으나, 본 발명은 이에 제한되지 않는다. Although the drone and the control method thereof according to the embodiments have been described with reference to the above drawings, the present invention is not limited thereto.

전술한 드론(100)은, 프로세서, 메모리, 사용자 입력장치, 프레젠테이션 장치 중 적어도 일부를 포함하는 컴퓨팅 장치에 의해 구현될 수 있다. 메모리는, 프로세서에 의해 실행되면 특정 태스크를 수행할 있도록 코딩되어 있는 컴퓨터-판독가능 소프트웨어, 애플리케이션, 프로그램 모듈, 루틴, 인스트럭션(instructions), 및/또는 데이터 등을 저장하는 매체이다. 프로세서는 메모리에 저장되어 있는 컴퓨터-판독가능 소프트웨어, 애플리케이션, 프로그램 모듈, 루틴, 인스트럭션, 및/또는 데이터 등을 판독하여 실행할 수 있다. 사용자 입력장치는 사용자로 하여금 프로세서에게 특정 태스크를 실행하도록 하는 명령을 입력하거나 특정 태스크의 실행에 필요한 데이터를 입력하도록 하는 수단일 수 있다. 사용자 입력장치는 물리적인 또는 가상적인 키보드나 키패드, 키버튼, 마우스, 조이스틱, 트랙볼, 터치-민감형 입력수단, 또는 마이크로폰 등을 포함할 수 있다. 프레젠테이션 장치는 디스플레이, 프린터, 스피커, 또는 진동장치 등을 포함할 수 있다.The drone (100) described above may be implemented by a computing device including at least some of a processor, a memory, a user input device, and a presentation device. The memory is a medium that stores computer-readable software, applications, program modules, routines, instructions, and/or data, etc. that are coded to perform a specific task when executed by the processor. The processor can read and execute computer-readable software, applications, program modules, routines, instructions, and/or data stored in the memory. The user input device may be a means for allowing a user to input a command to cause the processor to perform a specific task or input data necessary for executing a specific task. The user input device may include a physical or virtual keyboard or keypad, key buttons, a mouse, a joystick, a trackball, a touch-sensitive input means, or a microphone. The presentation device may include a display, a printer, a speaker, or a vibration device.

컴퓨팅 장치는 스마트폰, 태블릿, 랩탑, 데스크탑, 서버, 클라이언트 등의 다양한 장치를 포함할 수 있다. 컴퓨팅 장치는 하나의 단일한 스탠드-얼론 장치일 수도 있고, 통신망을 통해 서로 협력하는 다수의 컴퓨팅 장치들로 이루어진 분산형 환경에서 동작하는 다수의 컴퓨팅 장치를 포함할 수 있다.Computing devices may include a variety of devices, such as smartphones, tablets, laptops, desktops, servers, and clients. A computing device may be a single, stand-alone device, or it may include multiple computing devices operating in a distributed environment with multiple computing devices cooperating with each other via a communications network.

또한 전술한 드론(100)은, 프로세서를 구비하고, 또한 프로세서에 의해 실행되면 전술한 드론의 제어 방법(200)을 수행할 수 있도록 코딩된 컴퓨터 판독가능 소프트웨어, 애플리케이션, 프로그램 모듈, 루틴, 인스트럭션, 및/또는 데이터 구조 등을 저장한 메모리를 구비하는 컴퓨팅 장치에 의해 실행될 수 있다.In addition, the drone (100) described above may be executed by a computing device having a processor and a memory storing computer-readable software, applications, program modules, routines, instructions, and/or data structures coded to perform the drone control method (200) described above when executed by the processor.

전술한 드론의 제어 방법(200)은 다양한 수단을 통해 구현될 수 있다. 예를 들어, 전술한 드론의 제어 방법(200)은 하드웨어, 펌웨어(firmware), 소프트웨어 또는 그것들의 결합 등에 의해 구현될 수 있다.The above-described drone control method (200) can be implemented through various means. For example, the above-described drone control method (200) can be implemented by hardware, firmware, software, or a combination thereof.

하드웨어에 의한 구현의 경우 전술한 드론의 제어 방법(200)은 하나 또는 그 이상의 ASICs(Application Specific Integrated Circuits), DSPs(Digital Signal Processors), DSPDs(Digital Signal Processing Devices), PLDs(Programmable Logic Devices), FPGAs(Field Programmable Gate Arrays), 프로세서, 컨트롤러, 마이크로 컨트롤러 또는 마이크로 프로세서 등에 의해 구현될 수 있다.In the case of hardware implementation, the drone control method (200) described above can be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), processors, controllers, microcontrollers, or microprocessors.

예를 들어 실시예들에 따른 전술한 드론의 제어 방법(200)은 심층 신경망의 뉴런(neuron)과 시냅스(synapse)가 반도체 소자들로 구현된 인공지능 반도체 장치를 이용하여 구현될 수 있다. 이때 반도체 소자는 현재 사용하는 반도체 소자들, 예를 들어 SRAM이나 DRAM, NAND 등일 수도 있고, 차세대 반도체 소자들, RRAM이나 STT MRAM, PRAM 등일 수도 있고, 이들의 조합일 수도 있다.For example, the above-described drone control method (200) according to the embodiments may be implemented using an artificial intelligence semiconductor device in which neurons and synapses of a deep neural network are implemented using semiconductor elements. At this time, the semiconductor elements may be currently used semiconductor elements, such as SRAM, DRAM, NAND, etc., or may be next-generation semiconductor elements, such as RRAM, STT MRAM, PRAM, etc., or may be a combination thereof.

전술한 드론의 제어 방법(200)을 인공지능 반도체 장치를 이용하여 구현할 때, 딥 러닝 모델을 소프트웨어로 학습한 결과(가중치)를 어레이로 배치된 시냅스 모방소자에 전사하거나 인공지능 반도체 장치에서 학습을 진행할 수도 있다.When implementing the above-described drone control method (200) using an artificial intelligence semiconductor device, the results (weights) of learning a deep learning model using software can be transferred to synapse-mimicking elements arranged in an array, or learning can be performed in the artificial intelligence semiconductor device.

펌웨어나 소프트웨어에 의한 구현의 경우, 전술한 드론의 제어 방법(200)은 이상에서 설명된 기능 또는 동작들을 수행하는 장치, 절차 또는 함수 등의 형태로 구현될 수 있다. 소프트웨어 코드는 메모리 유닛에 저장되어 프로세서에 의해 구동될 수 있다. 메모리 유닛은 상기 프로세서 내부 또는 외부에 위치하여, 이미 공지된 다양한 수단에 의해 프로세서와 데이터를 주고 받을 수 있다.In the case of implementation by firmware or software, the above-described drone control method (200) may be implemented in the form of a device, procedure, or function that performs the functions or operations described above. The software code may be stored in a memory unit and driven by a processor. The memory unit may be located inside or outside the processor and may exchange data with the processor by various means already known.

또한, 위에서 설명한 "시스템", "프로세서", "컨트롤러", "컴포넌트", "모듈", "인터페이스", "모델", 또는 "유닛" 등의 용어는 일반적으로 컴퓨터 관련 엔티티 하드웨어, 하드웨어와 소프트웨어의 조합, 소프트웨어 또는 실행 중인 소프트웨어를 의미할 수 있다. 예를 들어, 전술한 구성요소는 프로세서에 의해서 구동되는 프로세스, 프로세서, 컨트롤러, 제어 프로세서, 개체, 실행 스레드, 프로그램 및/또는 컴퓨터일 수 있지만 이에 국한되지 않는다. 예를 들어, 컨트롤러 또는 프로세서에서 실행 중인 애플리케이션과 컨트롤러 또는 프로세서가 모두 구성 요소가 될 수 있다. 하나 이상의 구성 요소가 프로세스 및/또는 실행 스레드 내에 있을 수 있으며, 구성 요소들은 하나의 장치(예: 시스템, 컴퓨팅 디바이스 등)에 위치하거나 둘 이상의 장치에 분산되어 위치할 수 있다.Additionally, the terms "system", "processor", "controller", "component", "module", "interface", "model", or "unit" as described above may generally refer to a computer-related entity hardware, a combination of hardware and software, software, or software in execution. For example, the aforementioned components may be, but are not limited to, a process driven by a processor, a processor, a controller, a control processor, an object, a thread of execution, a program, and/or a computer. For example, an application running on a controller or a processor and the controller or the processor can both be components. One or more components may be within a process and/or thread of execution, and the components may be located on a single device (e.g., a system, a computing device, etc.) or distributed across two or more devices.

한편, 전술한 드론의 제어 방법(200)을 수행하는, 컴퓨터 기록매체에 저장되는 컴퓨터 프로그램을 제공한다. 또한 또 다른 실시예는 전술한 다중 에이전트 기반 배터리 냉각판 설계 장치을 실현시키기 위한 프로그램을 기록한 컴퓨터로 읽을 수 있는 기록매체를 제공한다. Meanwhile, a computer program stored in a computer storage medium is provided for performing the above-described drone control method (200). In addition, another embodiment provides a computer-readable storage medium having recorded thereon a program for realizing the above-described multi-agent based battery cooling plate design device.

기록매체에 기록된 프로그램은 컴퓨터에서 읽히어 설치되고 실행됨으로써 전술한 단계들을 실행할 수 있다.The program recorded on the recording medium can be read, installed and executed by a computer, thereby executing the steps described above.

이와 같이, 컴퓨터가 기록매체에 기록된 프로그램을 읽어 들여 프로그램으로 구현된 기능들을 실행시키기 위하여, 전술한 프로그램은 컴퓨터의 프로세서(CPU)가 컴퓨터의 장치 인터페이스(Interface)를 통해 읽힐 수 있는 C, C++, JAVA, 기계어 등의 컴퓨터 언어로 코드화된 코드(Code)를 포함할 수 있다.In this way, in order for the computer to read the program recorded on the recording medium and execute the functions implemented by the program, the above-mentioned program may include code coded in a computer language such as C, C++, JAVA, or machine language that can be read by the computer's processor (CPU) through the computer's device interface.

이러한 코드는 전술한 기능들을 정의한 함수 등과 관련된 기능적인 코드(Function Code)를 포함할 수 있고, 전술한 기능들을 컴퓨터의 프로세서가 소정의 절차대로 실행시키는데 필요한 실행 절차 관련 제어 코드를 포함할 수도 있다.Such code may include functional code related to functions defining the aforementioned functions, and may also include control code related to execution procedures necessary for the computer's processor to execute the aforementioned functions according to a predetermined procedure.

또한, 이러한 코드는 전술한 기능들을 컴퓨터의 프로세서가 실행시키는데 필요한 추가 정보나 미디어가 컴퓨터의 내부 또는 외부 메모리의 어느 위치(주소 번지)에서 참조 되어야 하는지에 대한 메모리 참조 관련 코드를 더 포함할 수 있다.Additionally, such code may further include memory reference related code regarding where in the internal or external memory of the computer the additional information or media required for the computer's processor to execute the aforementioned functions should be referenced.

또한, 컴퓨터의 프로세서가 전술한 기능들을 실행시키기 위하여 원격(Remote)에 있는 어떠한 다른 컴퓨터나 서버 등과 통신이 필요한 경우, 코드는 컴퓨터의 프로세서가 컴퓨터의 통신 모듈을 이용하여 원격(Remote)에 있는 어떠한 다른 컴퓨터나 서버 등과 어떻게 통신해야만 하는지, 통신 시 어떠한 정보나 미디어를 송수신해야 하는지 등에 대한 통신 관련 코드를 더 포함할 수도 있다.In addition, if the computer's processor needs to communicate with another computer or server located remotely in order to execute the functions described above, the code may further include communication-related code regarding how the computer's processor should communicate with another computer or server located remotely using the computer's communication module, and what information or media should be sent and received during the communication.

이상에서 전술한 바와 같은 프로그램을 기록한 컴퓨터로 읽힐 수 있는 기록매체는, 일 예로, ROM, RAM, CD-ROM, 자기 테이프, 플로피디스크, 광 미디어 저장장치 등이 있으며, 또한 캐리어 웨이브(예를 들어, 인터넷을 통한 전송)의 형태로 구현되는 것도 포함할 수 있다.As described above, a computer-readable recording medium having recorded thereon a program may include, for example, a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical media storage device, etc., and may also include one implemented in the form of a carrier wave (e.g., transmission via the Internet).

또한 컴퓨터가 읽을 수 있는 기록매체는 네트워크로 연결된 컴퓨터 시스템에 분산되어, 분산방식으로 컴퓨터가 읽을 수 있는 코드가 저장되고 실행될 수 있다.Additionally, computer-readable recording media can be distributed across network-connected computer systems, allowing computer-readable code to be stored and executed in a distributed manner.

그리고, 본 발명을 구현하기 위한 기능적인(Functional) 프로그램과 이와 관련된 코드 및 코드 세그먼트 등은, 기록매체를 읽어서 프로그램을 실행시키는 컴퓨터의 시스템 환경 등을 고려하여, 본 발명이 속하는 기술분야의 프로그래머들에 의해 용이하게 추론되거나 변경될 수도 있다.In addition, the functional program for implementing the present invention and the codes and code segments related thereto may be easily inferred or changed by programmers in the technical field to which the present invention belongs, taking into consideration the system environment of the computer that reads the recording medium and executes the program.

전술한 드론의 제어방법(200)은, 컴퓨터에 의해 실행되는 애플리케이션이나 프로그램 모듈과 같은 컴퓨터에 의해 실행 가능한 명령어를 포함하는 기록 매체의 형태로도 구현될 수 있다. 컴퓨터 판독 가능 매체는 컴퓨터에 의해 액세스될 수 있는 임의의 가용 매체일 수 있고, 휘발성 및 비휘발성 매체, 분리형 및 비분리형 매체를 모두 포함한다. 또한, 컴퓨터 판독가능 매체는 컴퓨터 저장 매체를 모두 포함할 수 있다. 컴퓨터 저장 매체는 컴퓨터 판독가능 명령어, 데이터 구조, 프로그램 모듈 또는 기타 데이터와 같은 정보의 저장을 위한 임의의 방법 또는 기술로 구현된 휘발성 및 비휘발성, 분리형 및 비분리형 매체를 모두 포함한다.The above-described drone control method (200) may also be implemented in the form of a recording medium including computer-executable commands, such as an application or program module executed by a computer. The computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media, removable and non-removable media. In addition, the computer-readable medium may include all computer storage media. The computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information, such as computer-readable commands, data structures, program modules, or other data.

전술한 드론의 제어방법(200)은, 단말기에 기본적으로 설치된 애플리케이션(이는 단말기에 기본적으로 탑재된 플랫폼이나 운영체제 등에 포함된 프로그램을 포함할 수 있다)에 의해 실행될 수 있고, 사용자가 애플리케이션 스토어 서버, 애플리케이션 또는 해당 서비스와 관련된 웹 서버 등의 애플리케이션 제공 서버를 통해 마스터 단말기에 직접 설치한 애플리케이션(즉, 프로그램)에 의해 실행될 수도 있다. 이러한 의미에서, 전술한 다중 에이전트 기반 배터리 냉각판 설계 장치은 단말기에 기본적으로 설치되거나 사용자에 의해 직접 설치된 애플리케이션(즉, 프로그램)으로 구현되고 단말기에 등의 컴퓨터로 읽을 수 있는 기록매체에 기록될 수 있다.The drone control method (200) described above can be executed by an application that is basically installed on the terminal (which may include a program included in a platform or operating system basically installed on the terminal), and can also be executed by an application (i.e., a program) that the user directly installs on the master terminal through an application providing server such as an application store server, an application, or a web server related to the corresponding service. In this sense, the multi-agent-based battery cooling plate design device described above can be implemented as an application (i.e., a program) that is basically installed on the terminal or directly installed by the user, and can be recorded on a computer-readable recording medium such as the terminal.

이상에서, 본 개시의 실시예를 구성하는 모든 구성 요소들이 하나로 결합되거나 결합되어 동작하는 것으로 설명되었다고 해서, 본 개시는 반드시 이러한 실시예에 한정되는 것은 아니다. 즉, 본 개시의 목적 범위 안에서라면, 그 모든 구성 요소들이 하나 이상으로 선택적으로 결합하여 동작할 수도 있다. 또한, 그 모든 구성 요소들이 각각 하나의 독립적인 하드웨어로 구현될 수 있지만, 각 구성 요소들의 그 일부 또는 전부가 선택적으로 조합되어 하나 또는 복수 개의 하드웨어에서 조합된 일부 또는 전부의 기능을 수행하는 프로그램 모듈을 갖는 컴퓨터 프로그램으로서 구현될 수도 있다. 그 컴퓨터 프로그램을 구성하는 코드들 및 코드 세그먼트들은 본 개시의 기술 분야의 당업자에 의해 용이하게 추론될 수 있을 것이다. 이러한 컴퓨터 프로그램은 컴퓨터가 읽을 수 있는 저장매체(Computer Readable Media)에 저장되어 컴퓨터에 의하여 읽혀지고 실행됨으로써, 본 개시의 실시예를 구현할 수 있다. 컴퓨터 프로그램의 저장매체로서는 자기 기록매체, 광 기록매체, 등이 포함될 수 있다.In the above, even though all the components constituting the embodiments of the present disclosure have been described as being combined as one or combined and operating, the present disclosure is not necessarily limited to these embodiments. That is, within the scope of the purpose of the present disclosure, all of the components may be selectively combined and operating one or more times. In addition, although all of the components may be implemented as independent hardware, some or all of the components may be selectively combined and implemented as a computer program having a program module that performs some or all of the functions combined in one or more hardware. The codes and code segments constituting the computer program may be easily inferred by a person skilled in the art of the present disclosure. Such a computer program may be stored in a computer-readable storage medium and read and executed by a computer, thereby implementing the embodiments of the present disclosure. The storage medium of the computer program may include a magnetic recording medium, an optical recording medium, etc.

또한, 이상에서 기재된 "포함하다", "구성하다" 또는 "가지다" 등의 용어는, 특별히 반대되는 기재가 없는 한, 해당 구성 요소가 내재될 수 있음을 의미하는 것이므로, 다른 구성 요소를 제외하는 것이 아니라 다른 구성 요소를 더 포함할 수 있는 것으로 해석되어야 한다. 기술적이거나 과학적인 용어를 포함한 모든 용어들은, 다르게 정의되지 않는 한, 본 개시가 속하는 기술 분야에서 통상의 지식을 가진 자에 의해 일반적으로 이해되는 것과 동일한 의미를 가진다. 사전에 정의된 용어와 같이 일반적으로 사용되는 용어들은 관련 기술의 문맥 상의 의미와 일치하는 것으로 해석되어야 하며, 본 개시에서 명백하게 정의하지 않는 한, 이상적이거나 과도하게 형식적인 의미로 해석되지 않는다.In addition, the terms "include," "comprise," or "have" described above, unless otherwise specifically stated, mean that the corresponding component can be included, and therefore should be interpreted as including other components rather than excluding other components. All terms, including technical or scientific terms, unless otherwise defined, have the same meaning as commonly understood by a person of ordinary skill in the art to which this disclosure belongs. Commonly used terms, such as terms defined in a dictionary, should be interpreted as being consistent with the contextual meaning of the relevant technology, and shall not be interpreted in an ideal or overly formal sense, unless explicitly defined in this disclosure.

이상의 설명은 본 개시의 기술 사상을 예시적으로 설명한 것에 불과한 것으로서, 본 개시가 속하는 기술 분야에서 통상의 지식을 가진 자라면 본 개시의 본질적인 특성에서 벗어나지 않는 범위에서 다양한 수정 및 변형이 가능할 것이다. 따라서, 본 개시에 개시된 실시예들은 본 개시의 기술 사상을 한정하기 위한 것이 아니라 설명하기 위한 것이고, 이러한 실시예에 의하여 본 개시의 기술 사상의 범위가 한정되는 것은 아니다. 본 개시의 보호 범위는 아래의 청구범위에 의하여 해석되어야 하며, 그와 동등한 범위 내에 있는 모든 기술 사상은 본 개시의 권리범위에 포함되는 것으로 해석되어야 할 것이다.The above description is merely an illustrative description of the technical idea of the present disclosure, and those skilled in the art to which the present disclosure pertains may make various modifications and variations without departing from the essential characteristics of the present disclosure. Therefore, the embodiments disclosed in the present disclosure are not intended to limit the technical idea of the present disclosure but to explain it, and the scope of the technical idea of the present disclosure is not limited by these embodiments. The protection scope of the present disclosure should be interpreted by the following claims, and all technical ideas within a scope equivalent thereto should be interpreted as being included in the scope of rights of the present disclosure.

CROSS-REFERENCE TO RELATED APPLICATIONCROSS-REFERENCE TO RELATED APPLICATION

본 특허출원은 2023년 12월 18일 한국에 출원한 특허출원번호 제 10-2023-0184113 호에 대해 미국 특허법 119(a)조 (35 U.S.C § 119(a))에 따라 우선권을 주장하며, 그 모든 내용은 참고문헌으로 본 특허출원에 병합된다. 아울러, 본 특허출원은 미국 이외에 국가에 대해서도 위와 동일한 이유로 우선권을 주장하면 그 모든 내용은 참고문헌으로 본 특허출원에 병합된다.This patent application claims the benefit of priority under 35 U.S.C. § 119(a) to Korean patent application No. 10-2023-0184113, filed December 18, 2023, the entire contents of which are incorporated herein by reference. In addition, this patent application claims priority for other countries than the United States for the same reasons, the entire contents of which are incorporated herein by reference.

Claims

Drone sensor section that provides sensor data;

A drone flight section providing motor signals; and

A drone comprising a drone control unit that controls the drone to fly along a specific flight trajectory by converting the motor signal into the motor control signal in a general situation, and measures the attitude based on the sensor data to recognize a specific situation, and estimates the difference between the input value output from a policy-based reinforcement learning network trained to receive a desired current state as an input and output an input value and the estimated input value output from a Bayesian network trained to receive the current state and the next state as inputs, and controls the drone to fly while maintaining the attitude by adjusting the motor control signal in the specific situation as a control signal that offsets the disturbance that may actually occur by subtracting the estimated disturbance from the input value.

In the first paragraph,

A drone wherein the above drone flight section converts the adjusted motor control signal into a specific PWM signal that adjusts the motor speed and direction.

In the first paragraph,

The above drone control unit

In the above general situation, a first control unit that converts the motor signal into the motor control signal and controls the flight to a specific flight trajectory; and

A drone including a second control unit that controls the drone to fly while maintaining the attitude by adjusting the motor control signal in the specific situation with the control signal that offsets the disturbance that may actually occur by subtracting the estimated disturbance, which is the difference between the input value output from the policy-based reinforcement learning network and the estimated input value output from the Bayesian network, from the input value.

In the third paragraph,

The Bayesian network used in the second control unit above is a learning data pair (s _t ,

, s _t+1 ) are stored in the buffer, and then trained drones are sampled by the mini-batch size.

In paragraph 4,

The Bayesian network used in the second control unit calculates the signal-to-noise ratio for the parameter distribution at each step during learning, and if the signal-to-noise ratio value is small, it is considered a parameter with little influence on the result value, so pruning is performed to make the drone lightweight.

In the third paragraph,

The above second control unit receives the desired current state (s _t ) and inputs the input value (

) is output from the above policy-based reinforcement learning network (policy network) that has been reinforced to output the input value (

) from the Bayesian network trained to output the estimated input value (

) difference(

-

) to estimate the disturbance (

) is estimated as the input value (

) in the above estimated disturbance (

) is controlled to fly while maintaining the attitude by adjusting the motor control signal in the specific situation with a control signal that offsets the disturbance (d) that may actually occur.

In the third paragraph,

The policy-based reinforcement learning network of the second control unit, when applied to a reinforcement learning problem, the state S is a three-dimensional position (Px, Py, Pz) and a three-dimensional rotation (roll, pitch, yaw), and the action A is the thrust of each motor (F1, F2, F3, F4).

The above sparse Bayesian network receives the current state (s _t ) and the next state (s _t+1 ) as inputs and calculates the current input ( , which is the thrust of each motor)

) drone trained to output

A first step for controlling the flight along a specific flight trajectory by converting a motor signal into the motor control signal in a general situation; and

A method for controlling a drone, comprising a second step of controlling the drone to fly while maintaining the attitude by adjusting the motor control signal in the specific situation by estimating the difference between the input value output from a policy-based reinforcement learning network trained to receive a desired current state as input and output an input value and the estimated input value output from a Bayesian network trained to receive a next state as input and output an estimated input value as an estimated disturbance, and controlling the drone to fly while maintaining the attitude by subtracting the estimated disturbance from the input value and using a control signal that offsets disturbance that may actually occur.

In Article 8,

A method for controlling a drone by converting, in the second step, the adjusted motor control signal into a specific PWM signal for adjusting the motor speed and direction.

In Article 8,

The second step is a method for controlling a drone to fly while maintaining the attitude by adjusting the motor control signal in the specific situation with the control signal that offsets the disturbance that may actually occur by subtracting the estimated disturbance, which is the difference between the input value output from the policy-based reinforcement learning network and the estimated input value output from the Bayesian network, from the input value.

In Article 10,

The Bayesian network used in the second step above is a learning data pair (s _t ,

, s _t+1 ) in a buffer, and then a method for controlling a learned drone by sampling the size of the mini batch.

In Article 11,

The Bayesian network used in the second step calculates the signal-to-noise ratio for the parameter distribution at each step during learning, and if the signal-to-noise ratio value is small, it is considered a parameter with little influence on the result value, so pruning is performed to control a lightweight drone.

In Article 10,

The second step above receives the desired t-state (s _t ) and inputs the input value (

) and the t-th state (s _t ) and the t+1-th state (s _t ₊₁ ), which is the next state of the t-th state (s t ), are input to estimate the input value (

) from the Bayesian network trained to output the estimated input value (

) difference(

-

) to estimate the disturbance (

) is estimated as the input value (

) in the above estimated disturbance (

) is subtracted from the drone to control the drone so that it can fly while maintaining the attitude by adjusting the motor control signal in the specific situation with a control signal that offsets the disturbance (d) that may actually occur.

In Article 10,

The above second stage policy-based reinforcement learning network, when applied to a reinforcement learning problem, the state S is a three-dimensional position (Px, Py, Pz) and a three-dimensional rotation (roll, pitch, yaw), and the action A is the thrust of each motor (F1, F2, F3, F4).

The above sparse Bayesian network takes as input the current state (s _t ) and the next state (s _t+1 ) and outputs the current input ( , which is the thrust of each motor)

) A method for controlling a drone learned to output.