WO2024189997A1 - Information processing device - Google Patents
- Publication number: WO2024189997A1 (PCT/JP2023/042953)
- Authority: WIPO (PCT)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
- G06Q50/43—Business processes related to the sharing of vehicles, e.g. car sharing
Definitions
- the present disclosure relates to an information processing device that optimizes the reallocation of vehicles among multiple ports in a shared transportation service.
- a "port" refers to a space for parking multiple shared vehicles (such as a bicycle parking lot for parking multiple shared electric bicycles), and multiple ports are provided in multiple locations to provide a shared transportation service.
- there is a known technology for vehicle sharing services that improves the vehicle utilization rate across the entire service by collecting vehicles from ports with a surplus of vehicles and placing the collected vehicles at ports with a shortage of vehicles (see Patent Document 1).
- the vehicle sharing services mentioned above are provided by private companies, so it is desirable to operate them in a way that maximizes the rewards (profits) of the service as a whole, rather than simply replenishing bicycle shortages between ports. However, no technology has been proposed that reallocates vehicles with a primary focus on maximizing the overall reward (profit) of the sharing service.
- the present disclosure takes into consideration the above circumstances and aims to optimize vehicle reallocation between multiple ports in a shared transportation service so as to maximize rewards (profits).
- reinforcement learning is a type of machine learning that deals with the problem of an agent in an environment observing the current state and determining, based on a policy, the future actions to be taken from the state information obtained by the observation.
- the agent obtains rewards from the environment by executing the determined actions, and reinforcement learning learns the policy that will obtain the greatest rewards through a series of actions.
- the applicant has focused on the above-mentioned reinforcement learning and has devised an invention that utilizes a reinforcement learning framework to optimize vehicle reallocation between multiple ports in a shared transportation service so as to maximize the reward in reinforcement learning, and discloses this invention in the present specification.
- the information processing device includes an acquisition unit that acquires information to be input into an action value function that estimates the value of an action related to the reallocation of vehicles between multiple ports in a shared transportation service in which vehicles are shared, and a learning and inference unit that performs reinforcement learning on the action value function using the information acquired by the acquisition unit in order to make the action value function select an action that will maximize the value of a future action.
- the acquisition unit acquires information to be input into an action value function that estimates the value of an action related to the reallocation of vehicles between multiple ports in a shared transportation service. Examples of this information are described in detail in the embodiments, but "acquisition" here broadly includes not only obtaining information but also calculating information using a predetermined formula or the like.
- the learning and inference unit performs reinforcement learning on the action value function, using the information acquired by the acquisition unit, so that the action value function comes to select the action that will have the highest future value. As a result, through reinforcement learning, the action value function is trained into one that selects the action with the highest future value.
- the value of actions related to vehicle reallocation between multiple ports is estimated based on such an action value function, and the action that will have the highest value in the future is selected. Therefore, when reallocating vehicles between multiple ports in a shared transportation service, vehicle reallocation can be optimized to maximize the reward in reinforcement learning.
- FIG. 1 is a functional block diagram of an information processing device according to an embodiment of the present disclosure.
- FIG. 2 is a diagram for explaining an overview of reinforcement learning.
- FIG. 3 is a diagram showing an example of state information and the like when the present disclosure is applied to reinforcement learning.
- FIG. 4 is a diagram for explaining the discounted cumulative reward.
- FIG. 5 is a diagram for explaining the state value function.
- FIG. 6 is a diagram for explaining learning of the state value function.
- FIG. 7 is a diagram for explaining an example of calculating a reward in reinforcement learning.
- FIG. 8 is a diagram for explaining calculation of the average number of times a bicycle is used by being placed at a port.
- FIG. 9 is a diagram showing an example of information used in calculating the average number of times a bicycle is used by being placed at a port.
- FIG. 10 is a diagram for explaining calculation of the average number of impressions of a bicycle.
- FIG. 11 is a diagram showing an example of information used in calculating the average number of impressions of a bicycle.
- FIG. 12 is a flow diagram showing the processing of the learning stage.
- FIG. 13 is a flowchart showing details of the process in step S4 of FIG. 12.
- FIG. 14 is a flow diagram showing the processing of the inference stage.
- FIG. 15 is a diagram illustrating an example of the hardware configuration of the information processing device.
- the information processing device 10 includes an acquisition unit 11 and a learning and inference unit 12.
- the acquisition unit 11 is a functional unit having a function of acquiring information to be input to an action value function that estimates the value of an action related to the reallocation of vehicles among multiple ports in a shared transportation service in which vehicles are shared.
- the learning and inference unit 12 is a functional unit having a function of performing reinforcement learning on the action value function using the information acquired by the acquisition unit 11 in order to make the action value function select an action that will maximize the value of a future action.
- the vehicles shared are vehicles that can be returned to service by replacing the battery and that carry advertisements on the body; examples include rental electric bicycles, rental electric kick scooters, rental motorbikes, and rental cars. In the following, a service that shares rental electric bicycles is taken as the example, and "rental electric bicycle" is abbreviated to "bicycle."
- to provide the shared transportation service, multiple ports (here, bicycle parking areas for multiple shared bicycles) are provided at multiple locations, and multiple bicycles are parked at each port.
- Service staff use bicycle transport trucks (hereafter referred to as “relocation trucks”) to "dispatch" bicycles to a port, “collect” bicycles from a port, and “replace” the batteries installed in the bicycles, as actions related to the relocation of bicycles.
- the learning and inference unit 12 uses impressions, which are the number of times an advertisement can be viewed, as one element of the "reward" included in the value of an action, and calculates (a) a placement evaluation value for placement, (b) a recovery evaluation value for recovery, and (c) a battery replacement evaluation value for battery replacement for each port using a calculation formula described below, and performs reinforcement learning based on the (a) placement evaluation value, (b) recovery evaluation value, and (c) battery replacement evaluation value obtained for all ports.
- the learning and inference unit 12 also has the function of substituting the current state information acquired by the acquisition unit 11 into the action value function obtained by reinforcement learning, thereby obtaining the action value for each of various actions performed in the current state, and selecting the action that maximizes the obtained action value as the action to be taken. The learning-stage processing for this reinforcement learning and the inference-stage processing for inferring actions are described later with reference to FIGS. 12 to 14.
- reinforcement learning is known as one type of machine learning, which deals with the problem of an agent in an environment observing the current state and determining the future action to be taken based on a policy from the state information obtained from the observation.
- the agent obtains a reward from the environment by executing the determined action, and reinforcement learning learns a policy that will obtain the most reward through a series of actions.
- the applicant focuses on the above-mentioned reinforcement learning, and below discloses a technology that utilizes a reinforcement learning framework to optimize vehicle reallocation between multiple ports in a shared transportation service so as to maximize the reward in reinforcement learning.
- a "state” in reinforcement learning is information that an agent obtains from the environment, such as the current port state, demand forecast results, weather, distance, and restricted ports.
- An “action” is a behavior that an agent takes in the environment, such as placement, recovery, battery replacement, and which port to go to.
- a “reward” is a profit that an agent obtains in the environment, such as the “discounted cumulative reward R t ,” which will be described later with reference to FIG. 4.
- a “policy” is a function ⁇ that returns an action from a state.
- examples of information for the states S_t and S_t+1 include port status, port demand, weather, truck status, bicycle status, port location, and distance between ports.
- by inputting the state S_t into the policy function π, an action a_t is returned: for example, for "truck A" as the agent, "port A" as the next port, "-23" as the number of bicycles to place/collect, and "10" as the number of batteries to exchange. A positive placement/collection count means placing that many bicycles, and a negative count means collecting that many, so this example means collecting 23 bicycles at the next port A and exchanging 10 batteries.
- the agent (truck A) performs the action a_t to reach the state S_t+1, and by inputting the state S_t+1 into the policy function π, an action a_t+1 is returned: for example, for "truck A" as the agent, "port C" as the next port, "10" as the number of bicycles to place/collect, and "0" as the number of batteries to exchange.
- this action means that 10 bicycles will be placed at the next port C and no battery replacement will be performed (none is required).
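- As an illustrative, non-limiting sketch, the action record returned by the policy π can be pictured as a small data structure. The field names below are assumptions for illustration; the patent names these fields only informally.

```python
from dataclasses import dataclass

@dataclass
class Action:
    truck: str          # agent (relocation truck)
    next_port: str      # port to go to next
    place_collect: int  # >0: place this many bicycles, <0: collect this many
    battery_swaps: int  # number of batteries to replace at the port

# The two actions from the example above:
a_t  = Action(truck="truck_A", next_port="port_A", place_collect=-23, battery_swaps=10)
a_t1 = Action(truck="truck_A", next_port="port_C", place_collect=10, battery_swaps=0)
```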
- the "reward" in reinforcement learning is defined here as the "discounted cumulative reward R_t" of the following equation (1): R_t = R(S_t, a_t) + γR_t+1 … (1)
- γ (0 < γ ≤ 1) is a constant called the discount rate.
- the discount rate γ is applied once more for each step further into the future, as follows:
- the reward R(S_t+1, a_t+1) one step ahead is discounted to γ^1 R(S_t+1, a_t+1)
- the reward R(S_t+2, a_t+2) two steps ahead is discounted to γ^2 R(S_t+2, a_t+2)
- the reward R(S_t+3, a_t+3) three steps ahead is discounted to γ^3 R(S_t+3, a_t+3)
- accordingly, the discounted cumulative reward expands to R_t = R(S_t, a_t) + γ^1 R(S_t+1, a_t+1) + γ^2 R(S_t+2, a_t+2) + γ^3 R(S_t+3, a_t+3) + …; since the terms from γ^1 R(S_t+1, a_t+1) onward (the part enclosed by the dashed line in the bottom row of FIG. 4) correspond to γR_t+1, the above equation (1) is derived.
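- As an illustrative, non-limiting sketch, equation (1) can be computed in either its expanded or its recursive form; the reward values and γ = 0.9 below are made up for illustration.

```python
def discounted_cumulative_reward(rewards: list[float], gamma: float = 0.9) -> float:
    """Expanded form: R_t = R(S_t,a_t) + gamma*R(S_t+1,a_t+1) + gamma^2*R(S_t+2,a_t+2) + ..."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

def discounted_cumulative_reward_rec(rewards: list[float], gamma: float = 0.9) -> float:
    """Recursive form of equation (1): R_t = R(S_t,a_t) + gamma * R_t+1."""
    if not rewards:
        return 0.0
    return rewards[0] + gamma * discounted_cumulative_reward_rec(rewards[1:], gamma)

rewards = [100.0, 80.0, 120.0, 60.0]  # illustrative per-step rewards R(S_t+k, a_t+k)
assert abs(discounted_cumulative_reward(rewards)
           - discounted_cumulative_reward_rec(rewards)) < 1e-9
```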
- in reinforcement learning, to learn a better policy, a state value function that estimates the value of a state and an action value function that estimates the value of an action are defined.
- the "state value function V^π(S_t)" is a function that indicates how much discounted reward can be obtained in the future starting from the state S_t if the policy π is followed.
- the "action value function Q(S_t, a_t)" is a function that indicates how much discounted reward can be obtained in the future if a certain action a_t is taken from the state S_t, and it constitutes a part of the above state value function.
- written in the standard Bellman form consistent with the notation above, the state value function is V^π(S_t) = Σ_a_t π(a_t | S_t) Σ_S_t+1 T(S_t+1 | S_t, a_t) { R(S_t, a_t) + γV^π(S_t+1) } … (2). The latter part of the second summation, T(S_t+1 | S_t, a_t) { R(S_t, a_t) + γV^π(S_t+1) }, corresponds to the action value function Q(S_t, a_t). If the action value function can be estimated accurately, rewards can be obtained efficiently; as illustrated in FIG. 6, the function is learned so as to reduce the error between the estimated value and the value obtained from experience.
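- The excerpt does not fix a specific update algorithm for reducing this estimate-versus-experience error; as one standard, non-limiting realization, a tabular Q-learning-style temporal-difference update looks like the following sketch. The state/action labels, γ, and the learning rate are illustrative assumptions.

```python
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] -> current estimate of the action value
gamma, lr = 0.9, 0.1     # discount rate and learning rate (illustrative values)

def td_update(s, a, r, s_next, next_actions):
    """Move the estimate Q(s, a) toward the experienced target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in next_actions)
    Q[(s, a)] += lr * (target - Q[(s, a)])

td_update("port_A_surplus", "collect_23_swap_10", 150.0,
          "port_A_balanced", ["place_10", "collect_5"])
```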
- the learning and inference unit 12 uses the impression, which is the number of times an advertisement can be viewed, as one element of the "reward" included in the value of the action, and calculates for each port (a) a placement evaluation value related to placement, (b) a recovery evaluation value related to recovery, and (c) a battery replacement evaluation value related to battery replacement using a calculation formula described later, and performs reinforcement learning based on the (a) placement evaluation value, (b) recovery evaluation value, and (c) battery replacement evaluation value obtained for all ports.
- the calculation of the (a) placement evaluation value, (b) recovery evaluation value, and (c) battery replacement evaluation value will be described below with reference to Figures 7 to 11.
- (a) the placement evaluation value is calculated by the following formula (5):
placement evaluation value = (average number of times used by being placed at the port × usage fee × one-time member ratio × constant α + average number of impressions of the bicycle × bicycle visibility probability × advertising unit cost × constant β) × number of bicycles to be placed … (5)
- (b) the recovery evaluation value is calculated by the following formula (6):
recovery evaluation value = ((maximum average number of times used by being placed at another port − average number of times used by being placed at the port) × usage fee × one-time member ratio × constant α − (maximum average number of impressions of bicycles at other ports − average number of impressions of the bicycle) × bicycle visibility probability × advertising unit cost × constant β) × number of bicycles to be collected … (6)
- (c) the battery replacement evaluation value is calculated by the following formula (7):
battery replacement evaluation value = (average number of times used by being placed at the port × usage fee × one-time member ratio × constant α) × number of bicycles whose batteries are to be replaced … (7)
- each element of formulas (5) to (7) is outlined below; a short code sketch of the three formulas follows the list.
- the "average number of times a device is used by being placed in a port” will be described later with reference to FIG. 8 and FIG.
- the "average number of impressions for a bicycle” will be described later with reference to FIG. 10 and FIG.
- “Usage fee” is the usage fee for the electric bicycle sharing service, for example 165 yen.
- the "one-time member ratio" is the ratio of one-time members to the total number of members. Two types of members are assumed here: monthly members, who pay a fixed monthly amount, and one-time members, who pay the above usage fee each time they use the service.
- the one-time member ratio is calculated as: number of one-time members / (number of one-time members + number of monthly members).
- the "advertising unit cost" is the cost per impression of a dress guard advertisement placed on the body of a shared bicycle.
- the "constants α and β" are constants for adjusting importance, determined in consideration of the profit obtained by placing a bicycle at a port and the profit obtained from dress guard advertising; the default for each is "1."
- the "bicycle visibility probability" is the proportion of a bicycle's impressions for which the person is estimated to have actually viewed the advertisement.
- the "maximum average number of times used by being placed at another port" is the maximum value of the "average number of times used by being placed at a port" described later.
- the "maximum average number of impressions of bicycles at other ports" is the maximum value of the "average number of impressions of a bicycle" described later.
- the "number of bicycles to be placed," the "number of bicycles to be collected," and the "number of bicycles whose batteries are to be replaced" are information contained in the action a that is returned by inputting the state S into the policy π.
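- As an illustrative, non-limiting sketch, formulas (5) to (7) read directly as arithmetic. Parameter names below mirror the elements just listed; all numbers in the usage example are made up except the 165-yen fee mentioned above.

```python
def placement_value(avg_uses, fee, one_time_ratio, avg_impr,
                    visibility, ad_unit_cost, n_bikes, alpha=1.0, beta=1.0):
    """Formula (5): placement evaluation value for one port."""
    return (avg_uses * fee * one_time_ratio * alpha
            + avg_impr * visibility * ad_unit_cost * beta) * n_bikes

def recovery_value(max_avg_uses_other, avg_uses, fee, one_time_ratio,
                   max_avg_impr_other, avg_impr, visibility, ad_unit_cost,
                   n_bikes, alpha=1.0, beta=1.0):
    """Formula (6): recovery evaluation value for one port."""
    return ((max_avg_uses_other - avg_uses) * fee * one_time_ratio * alpha
            - (max_avg_impr_other - avg_impr) * visibility * ad_unit_cost * beta) * n_bikes

def battery_swap_value(avg_uses, fee, one_time_ratio, n_bikes, alpha=1.0):
    """Formula (7): battery replacement evaluation value for one port."""
    return avg_uses * fee * one_time_ratio * alpha * n_bikes

# Illustrative call: 165-yen fee from the text, all other numbers made up.
v = placement_value(avg_uses=4, fee=165, one_time_ratio=0.6,
                    avg_impr=50, visibility=0.2, ad_unit_cost=0.5, n_bikes=10)
```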
- the "average number of times used by leaving it at a port” is calculated by the following procedure: counting the number of times each bicycle is used from a certain time (reference time) until it is collected by a truck or the battery is replaced (step 1); and calculating the average number of times the bicycle is used by weekday/holiday, time, port, and weather (step 2).
- FIG. 8 illustrates an example of a situation in which, after the reference time of 12:00, bicycle A is used from port A to port B, then from port B to port C, then from port C to port D, and then from port D to port E, and then collected by a truck at port E.
- in the example of FIG. 8, bicycle A's battery has not been replaced since the reference time of 12:00, and bicycle A has been used four times before being collected by a truck, so in step 1 bicycle A is counted as "number of times used: 4." This counting is performed for each bicycle. In step 2, the average number of times used is calculated by weekday/holiday, time, port, and weather, as shown in the table in the lower right of FIG. 8. The weather is recorded as "sunny" if the hourly precipitation is 0 mm and "rainy" if the hourly precipitation is greater than 0 mm, and the "port" in the summary table is the port where the bicycle was first used after the reference time.
- the acquisition unit 11 extracts the usage history information for each bicycle from the usage table that stores the bicycle usage history information, and obtains the usage start date/time (reference time) and number of uses from the usage history information for bicycle A that is the subject of the calculation, as shown in FIG. 8.
- the acquisition unit 11 references the weather information at the usage start date/time to determine the weather at that time, and calculates the average number of times the bicycle is used for weekdays/holidays, time, port, and weather, as shown in the table in the lower right of FIG. 9.
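- As an illustrative, non-limiting sketch, steps 1 and 2 amount to a per-bicycle count followed by a grouped average. The table and column names below are assumptions for illustration; the patent describes the tables only conceptually.

```python
import pandas as pd

usage = pd.DataFrame({
    "bike_id":   ["A", "A", "B"],
    "port":      ["port_A", "port_A", "port_B"],  # first port used after the reference time
    "start":     pd.to_datetime(["2023-06-05 12:10", "2023-06-05 13:00", "2023-06-05 12:30"]),
    "use_count": [4, 3, 2],                       # step 1: uses until collection / battery swap
    "precip_mm": [0.0, 0.0, 1.5],
})
usage["weather"] = (usage["precip_mm"] > 0).map({False: "sunny", True: "rainy"})
usage["is_holiday"] = usage["start"].dt.dayofweek >= 5   # simplistic weekday/holiday flag
usage["hour"] = usage["start"].dt.hour

# Step 2: average number of uses per (weekday/holiday, hour, port, weather)
avg_uses = (usage.groupby(["is_holiday", "hour", "port", "weather"])["use_count"]
                 .mean().rename("avg_uses"))
```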
- the "average number of impressions for a bicycle” is calculated by tallying up the number of times an advertisement is likely to be viewed for each bicycle from a certain time (reference time) until the bicycle is collected by a truck or the battery is replaced (step 1), as described below, and then calculating the average number of impressions for the bicycle for weekdays/holidays, time, port, and weather (step 2). Note that the weather is determined in the same way as in the example of Figure 8 described above.
- the acquisition unit 11 extracts usage history information for each bicycle from a usage table that stores bicycle usage history information.
- the acquisition unit 11 also extracts people who are within a certain range of the location of the target bicycle (bicycle A in FIG. 11) from the bicycle location information and person location information, and counts up the number of people, using the obtained number as the number of impressions (the number of times the advertisement can be viewed) for the target bicycle A.
- This number of impressions is calculated for each bicycle usage history and merged with the usage history information for each bicycle extracted above, to obtain the number of impressions for each bicycle for each usage start date and time (reference time).
- the acquisition unit 11 then refers to the weather information at the usage start date and time (reference time) to determine the weather at that time, and calculates the average number of impressions for the bicycle for weekdays/holidays, time, port, and weather, as shown in the table in the lower right of FIG. 11.
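- As an illustrative, non-limiting sketch, the impression count for a bicycle reduces to counting the people within a certain range of its location. A simple haversine distance and a 50 m radius are assumptions for illustration; the patent does not specify the range or the distance measure.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))

def impressions(bike_pos, people_pos, radius_m=50.0):
    """Number of people within radius_m of the bicycle (its impression count)."""
    return sum(haversine_m(*bike_pos, *p) <= radius_m for p in people_pos)

bike = (35.6812, 139.7671)                         # illustrative coordinates
people = [(35.6813, 139.7670), (35.6900, 139.7000)]
print(impressions(bike, people))                   # -> 1
```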
- the acquisition unit 11 extracts the time from a certain time until collection and the time from a certain time until battery replacement for each bicycle from the bicycle usage table of FIG. 9 and FIG. 11 (step S1).
- the above "certain time” may be selected from a number of predetermined reference time candidates.
- the acquisition unit 11 calculates the average number of times the bicycle is used by being placed in a port using the procedure described above with reference to FIG. 8 and FIG. 9 (step S2).
- the acquisition unit 11 calculates the average number of times the bicycle is used for each weekday/holiday, time, port, and weather, as shown in the table at the bottom right of FIG. 9.
- the acquisition unit 11 calculates the average number of impressions using the procedure described above with reference to FIG. 10 and FIG. 11 (step S3).
- the average number of impressions of the bicycle is calculated for each weekday/holiday, time, port, and weather, as shown in the table at the bottom right of FIG. 11.
- the learning and inference unit 12 executes the accumulation process for the information (S_t, a_t, R(S_t, a_t), S_t+1) shown in FIG. 13 (step S4).
- the learning and inference unit 12 acquires the state S_t acquired by the acquisition unit 11 (step S4A in FIG. 13), and obtains an action a_t by inputting the acquired state S_t into the policy π (step S4B).
- in step S4B, multiple candidate actions a_1, a_2, …, a_n are obtained.
- the learning and inference unit 12 calculates the discounted cumulative reward R(S_t, a_t) for each of the candidate actions a_t (a_1, a_2, …, a_n) executed from the state S_t, using the formula for the discounted cumulative reward R_t (step S4C). Specifically, the "placement evaluation value," "recovery evaluation value," and "battery replacement evaluation value" described with reference to FIGS. 7 to 11 are calculated for each bicycle, and the discounted cumulative reward R(S_t, a_t) is calculated from these three evaluation values over all bicycles. The learning and inference unit 12 then determines the action a_k that maximizes the calculated discounted cumulative reward (step S4D).
- the learning and inference unit 12 performs a simulation using the action a_k obtained in step S4D to obtain the next state S_t+1 (step S4E), and accumulates the information (S_t, a_t, R(S_t, a_t), S_t+1) consisting of the state S_t, the action a_t (i.e., the action a_k with the largest discounted cumulative reward), the reward R(S_t, a_t) (i.e., the discounted cumulative reward when the action a_k is executed from the state S_t), and the next state S_t+1 obtained as described above (step S4F).
- in step S5, the learning and inference unit 12 learns the action value function Q(S_t, a_t) using the accumulated information, and the processes of steps S4 and S5 are repeated a predetermined number of times. After the loop has been repeated the predetermined number of times, the process of FIG. 12 ends.
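- As an illustrative, non-limiting sketch, one pass of steps S4A to S4F can be written as follows. The callables `get_state`, `candidate_actions`, `discounted_reward`, and `simulate` are stand-ins for the components described in the text and are not defined in the patent.

```python
def accumulate(buffer, get_state, candidate_actions, discounted_reward, simulate):
    """One pass of steps S4A-S4F; returns the next state for the following iteration."""
    s_t = get_state()                                          # S4A: observe the state
    actions = candidate_actions(s_t)                           # S4B: candidates a_1 ... a_n
    rewards = {a: discounted_reward(s_t, a) for a in actions}  # S4C: score each candidate
    a_k = max(rewards, key=rewards.get)                        # S4D: best action
    s_next = simulate(s_t, a_k)                                # S4E: simulate to get S_t+1
    buffer.append((s_t, a_k, rewards[a_k], s_next))            # S4F: accumulate transition
    return s_next
```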
- FIG. 14 shows a process flow of the inference stage.
- the acquisition unit 11 acquires state information S_t in the same manner as in the learning-stage process described above (step S11), and the learning and inference unit 12 calculates the action value for each action a_t using the action value function Q(S_t, a_t) obtained by learning (step S12). The learning and inference unit 12 then selects the action a_t with the maximum action value (step S13). In this way, the action with the highest future value is selected.
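- As an illustrative, non-limiting sketch, steps S12 and S13 are an argmax over the learned action value function. `Q` is assumed here to be a callable obtained from training; the toy values are made up.

```python
def select_action(Q, state, candidate_actions):
    """Steps S12-S13: evaluate Q(state, a) for each candidate and take the argmax."""
    return max(candidate_actions, key=lambda a: Q(state, a))

# Illustrative usage with a toy learned Q:
def toy_q(state, action):
    return {"place_10": 1.2, "collect_5": 0.7}[action]

best = select_action(toy_q, "port_C_low", ["place_10", "collect_5"])  # -> "place_10"
```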
- through the above reinforcement learning, the action value function is trained into one that selects the action with the highest future value. Based on this action value function, the value of actions related to vehicle reallocation between multiple ports is estimated, and control is exercised so that the action with the highest future value is selected. Therefore, when reallocating vehicles between multiple ports in a shared transportation service, vehicle reallocation can be optimized to maximize the reward in reinforcement learning.
- the actions assumed for the reallocation of bicycles shared in a shared transportation service are "placement,” “collection,” and “battery replacement.”
- Appropriate reinforcement learning is performed based on evaluation values from multiple aspects, namely placement, collection, and battery replacement, which allows for more appropriate optimization of vehicle reallocation and maximizes rewards.
- the bicycles shared in the shared transportation service have advertisements displayed on the body of the bicycle, and the learning and inference unit 12 calculates the placement evaluation value and the collection evaluation value while taking into account impressions, which are the number of times an advertisement can be viewed, so that rewards from advertising can be maximized.
- in the above embodiment, rental electric bicycles have been given as an example of the vehicles shared in the shared transportation service, but any vehicle that can be returned to service by replacing its battery and that carries advertisements on its body can be used; the present disclosure can also be applied to other types of vehicles such as rental electric kick scooters, rental motorbikes, and rental cars.
- the present disclosure can be applied to a wide range of vehicles, including rental electric kick scooters, which have become increasingly popular in recent years, and has the potential to be widely used by users.
- the action that maximizes the action value can be selected as the action to be taken, thereby maximizing the reward (profit) for the service.
- the gist of the present disclosure lies in the following [1] to [5].
- [1] An information processing device comprising: an acquisition unit that acquires information to be input into an action value function that estimates the value of an action related to the reallocation of vehicles among a plurality of ports in a sharing transportation service in which vehicles are shared; and a learning and inference unit that performs reinforcement learning on the action value function using the information acquired by the acquisition unit in order to make the action value function select an action that will have the highest value for a future action.
- [2] The information processing device according to [1], wherein a vehicle shared in the sharing transportation service is a vehicle that can be reused by replacing the installed battery; the actions related to the relocation of the vehicle include the deployment of the vehicle to a port, the recovery of the vehicle from a port, and the replacement of a battery installed in the vehicle; and the learning and inference unit uses a predetermined calculation formula to calculate, for each port, a placement evaluation value for the placement, a recovery evaluation value for the recovery, and a battery replacement evaluation value for the battery replacement as rewards included in the value of the action.
- [3] The information processing device according to [2], wherein a vehicle shared in the sharing transportation service is a vehicle on which advertisements are displayed, and the learning and inference unit calculates the placement evaluation value and the recovery evaluation value using impressions, which are the number of times an advertisement can be viewed, as one element.
- [4] The information processing device according to [3], wherein the vehicles shared in the sharing transportation service are rental electric bicycles, rental electric kick scooters, rental motorbikes, or rental cars that can be returned to service by replacing the batteries and have advertisements displayed on the vehicle bodies.
- [5] The information processing device according to any one of [1] to [4], wherein the learning and inference unit substitutes the current state information acquired by the acquisition unit into the action value function obtained by the reinforcement learning to obtain action values for various actions taken in the current state, and selects the action that maximizes the obtained action value as the action to be taken.
- each functional block may be realized using one device that is physically or logically coupled, or may be realized using two or more devices that are physically or logically separated and directly or indirectly connected (for example, using wires, wirelessly, etc.).
- the functional blocks may be realized by combining the one device or the multiple devices with software.
- Functions include, but are not limited to, judgment, determination, calculation, computation, processing, derivation, investigation, search, confirmation, reception, transmission, output, access, resolution, selection, establishment, comparison, assumption, expectation, regarding, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, and assignment.
- a functional block (component) that performs the transmission function is called a transmitting unit or a transmitter.
- an information processing device in an embodiment of the present disclosure may function as a computer that executes the processing of the present disclosure.
- FIG. 15 is a diagram showing an example of the hardware configuration of an information processing device 10 according to an embodiment of the present disclosure.
- the information processing device 10 described above may be physically configured as a computer device including a processor 1001, memory 1002, storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, etc.
- the word “apparatus” can be interpreted as a circuit, device, unit, etc.
- the hardware configuration of the information processing device 10 may be configured to include one or more of the devices shown in the figure, or may be configured to exclude some of the devices.
- Each function of the information processing device 10 is realized by loading a specific software (program) onto hardware such as the processor 1001 and memory 1002, causing the processor 1001 to perform calculations, control communications via the communication device 1004, and control at least one of the reading and writing of data in the memory 1002 and storage 1003.
- the processor 1001 for example, runs an operating system to control the entire computer.
- the processor 1001 may be configured as a central processing unit (CPU) that includes an interface with peripheral devices, a control device, an arithmetic unit, registers, etc.
- the processor 1001 also reads out programs (program codes), software modules, data, etc. from at least one of the storage 1003 and the communication device 1004 into the memory 1002, and executes various processes according to these.
- the programs used are those that cause a computer to execute at least some of the operations described in the above-mentioned embodiments. Although it has been described that the various processes are executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001.
- the processor 1001 may be implemented by one or more chips.
- the programs may be transmitted from a network via a telecommunications line.
- Memory 1002 is a computer-readable recording medium, and may be composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), RAM (Random Access Memory), etc. Memory 1002 may also be called a register, cache, main memory (primary storage device), etc. Memory 1002 can store executable programs (program codes), software modules, etc. for implementing a wireless communication method according to one embodiment of the present disclosure.
- Storage 1003 is a computer-readable recording medium, and may be, for example, at least one of an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (e.g., a compact disk, a digital versatile disk, a Blu-ray (registered trademark) disk), a smart card, a flash memory (e.g., a card, a stick, a key drive), a floppy (registered trademark) disk, a magnetic strip, etc.
- Storage 1003 may also be referred to as an auxiliary storage device.
- the above-mentioned storage medium may be, for example, a database, a server, or other suitable medium including at least one of memory 1002 and storage 1003.
- the communication device 1004 is hardware (transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also referred to as, for example, a network device, a network controller, a network card, a communication module, etc.
- the communication device 1004 may be configured to include a high-frequency switch, a duplexer, a filter, a frequency synthesizer, etc., to realize, for example, at least one of Frequency Division Duplex (FDD) and Time Division Duplex (TDD).
- the input device 1005 is an input device (e.g., a keyboard, a mouse, a microphone, a switch, a button, a sensor, etc.) that accepts input from the outside.
- the output device 1006 is an output device (e.g., a display, a speaker, an LED lamp, etc.) that performs output to the outside. Note that the input device 1005 and the output device 1006 may be integrated into one structure (e.g., a touch panel).
- each device such as the processor 1001 and memory 1002 is connected by a bus 1007 for communicating information.
- the bus 1007 may be configured using a single bus, or may be configured using different buses between each device.
- the information processing device 10 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a programmable logic device (PLD), or a field programmable gate array (FPGA), and some or all of the functional blocks may be realized by the hardware.
- the processor 1001 may be implemented using at least one of these pieces of hardware.
- the notification of information is not limited to the aspects/embodiments described in this disclosure, and may be performed using other methods.
- the notification of information may be performed by physical layer signaling (e.g., DCI (Downlink Control Information), UCI (Uplink Control Information)), higher layer signaling (e.g., RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination of these.
- RRC signaling may be referred to as an RRC message, and may be, for example, an RRC Connection Setup message, an RRC Connection Reconfiguration message, etc.
- each aspect/embodiment described in this disclosure may be applied to at least one of a mobile communication system compatible with LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), 6G (6th generation mobile communication system), xG (xth generation mobile communication system, where x is, for example, an integer or a decimal), FRA (Future Radio Access), systems using IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), other appropriate systems, and next-generation systems that are expanded, modified, created, or defined based on these. It may also be applied to a combination of multiple systems (for example, a combination of at least one of LTE and LTE-A with 5G).
- the input and output information may be stored in a specific location (e.g., memory) or may be managed using a management table.
- the input and output information may be overwritten, updated, or added to.
- the output information may be deleted.
- the input information may be sent to another device.
- the determination may be based on a value represented by one bit (0 or 1), a Boolean value (true or false), or a numerical comparison (e.g., a comparison with a predetermined value).
- notification of specific information is not limited to being done explicitly, but may be done implicitly (e.g., not notifying the specific information).
- Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
- Software, instructions, information, etc. may also be transmitted and received via a transmission medium.
- a transmission medium For example, if the software is transmitted from a website, server, or other remote source using at least one of wired technologies (such as coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL)), and/or wireless technologies (such as infrared, microwave), then at least one of these wired and wireless technologies is included within the definition of a transmission medium.
- wired technologies such as coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL)
- wireless technologies such as infrared, microwave
- the information, signals, etc. described in this disclosure may be represented using any of a variety of different technologies.
- the data, instructions, commands, information, signals, bits, symbols, chips, etc. that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
- the channel and the symbol may be a signal (signaling).
- the signal may be a message.
- the component carrier (CC) may be called a carrier frequency, a cell, a frequency carrier, etc.
- "system" and "network" are used interchangeably.
- a radio resource may be indicated by an index.
- the names used for the parameters described above are not intended to be limiting in any way. Furthermore, the formulas etc. using these parameters may differ from those explicitly disclosed in this disclosure.
- the various channels (e.g., PUCCH, PDCCH, etc.) and information elements may be identified by any suitable names, and therefore the various names assigned to these various channels and information elements are not intended to be limiting in any way.
- "determining" may encompass a wide variety of actions.
- "judging" and "determining" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up or searching (e.g., searching in a table, a database, or another data structure), and ascertaining as having "judged" or "determined."
- "judging" and "determining" may also include regarding receiving (e.g., receiving information), transmitting (e.g., transmitting information), input, output, and accessing (e.g., accessing data in memory) as having "judged" or "determined."
- "judging" and "determining" may further include regarding resolving, selecting, choosing, establishing, comparing, and the like as having "judged" or "determined"; in other words, "judging" and "determining" may include regarding some action as having been "judged" or "determined." In addition, "judgment (decision)" may be interpreted as "assuming," "expecting," "considering," and the like.
- the phrase “based on” does not mean “based only on,” unless expressly stated otherwise. In other words, the phrase “based on” means both “based only on” and “based at least on.”
- any reference to an element using a designation such as "first,” “second,” etc., used in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient method of distinguishing between two or more elements. Thus, a reference to a first and a second element does not imply that only two elements may be employed or that the first element must precede the second element in some way.
- the phrase "A and B are different" may mean "A and B are different from each other."
- the term may also mean “A and B are each different from C.”
- Terms such as “separate” and “combined” may also be interpreted in the same way as “different.”
- 10: Information processing device, 11: Acquisition unit, 12: Learning and inference unit, 1001: Processor, 1002: Memory, 1003: Storage, 1004: Communication device, 1005: Input device, 1006: Output device, 1007: Bus.
Landscapes
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
本開示は、シェアリング交通サービスにおける複数のポート間での車両の再配置の最適化を行う情報処理装置に関する。なお、「ポート」とは、シェアされる複数の車両を停めるためのスペース(例えば、シェアされる複数の電動自転車を停めるための駐輪場など)を意味し、シェアリング交通サービスを提供するために、複数のポートが複数の場所に設けられている。 The present disclosure relates to an information processing device that optimizes the reallocation of vehicles among multiple ports in a shared transportation service. Note that a "port" refers to a space for parking multiple shared vehicles (such as a bicycle parking lot for parking multiple shared electric bicycles), and multiple ports are provided in multiple locations to provide a shared transportation service.
従来より、車両のシェアリングサービスに関し、複数のポートのうち、車両が余っているポートから車両を回収し、回収された車両を車両が不足しているポートに配置することで、サービス全体での車両の利用率を向上させる技術が知られている(特許文献1参照)。 There is a known technology for vehicle sharing services that improves the vehicle utilization rate throughout the entire service by collecting vehicles from a port with a surplus of vehicles among multiple ports and placing the collected vehicles in a port with a shortage of vehicles (see Patent Document 1).
上記の車両のシェアリングサービスは民間企業により提供されるサービスであるため、単に、ポート間で不足自転車を適切に補充し合うのみではなく、サービス全体としての報酬(利益)を最大化するように運用することが望まれる。 The vehicle sharing services mentioned above are provided by private companies, so it is desirable to operate them in a way that maximizes the rewards (profits) of the service as a whole, rather than simply allowing ports to replenish bicycle shortages appropriately.
しかし、シェアリングサービスのサービス全体としての報酬(利益)を最大化することに主眼を置いて、車両の再配置を行う技術は提案されていない。 However, no technology has been proposed that reallocates vehicles with a primary focus on maximizing the overall reward (profit) of the sharing service.
本開示は、上記のような事情を考慮し、シェアリング交通サービスにおける複数のポート間での車両の再配置において報酬(利益)を最大化するように車両の再配置の最適化を行うことを目的とする。 The present disclosure takes into consideration the above circumstances and aims to optimize vehicle reallocation between multiple ports in a shared transportation service so as to maximize rewards (profits).
機械学習の1つとして、ある環境におけるエージェントが、現在の状態を観測し、観測で得られた状態情報から、方策(policy)に基づいて将来取るべき行動を決定する問題を扱う強化学習(reinforcement learning)が知られている。上記エージェントは、決定された行動を実行することで環境から報酬を得るが、強化学習は、一連の行動を通じて報酬が最も多く得られるような方策を学習する。 One type of machine learning known as reinforcement learning is the problem of an agent in an environment observing the current state and determining the future actions to be taken based on a policy from the state information obtained from the observation. The agent obtains rewards from the environment by executing the determined actions, and reinforcement learning learns the policy that will obtain the greatest rewards through a series of actions.
出願人は、上記のような強化学習に着目し、強化学習の枠組みを利用して、シェアリング交通サービスにおける複数のポート間での車両の再配置において強化学習における報酬を最大化するように車両の再配置の最適化を行う発明をしたので、本明細書にて開示する。 The applicant has focused on the above-mentioned reinforcement learning and has devised an invention that utilizes a reinforcement learning framework to optimize vehicle reallocation between multiple ports in a shared transportation service so as to maximize the reward in reinforcement learning, and discloses this invention in the present specification.
本開示に係る情報処理装置は、車両をシェアするシェアリング交通サービスにおける複数のポート間での車両の再配置に関する行動の価値を見積もる行動価値関数に入力される情報を取得する取得部と、将来の行動の価値が最も高くなる行動を選択する前記行動価値関数にするために、前記取得部により取得された情報を用いて前記行動価値関数に対し強化学習を行う学習推論部と、を備える。 The information processing device according to the present disclosure includes an acquisition unit that acquires information to be input into an action value function that estimates the value of an action related to the reallocation of vehicles between multiple ports in a shared transportation service in which vehicles are shared, and a learning and inference unit that performs reinforcement learning on the action value function using the information acquired by the acquisition unit in order to make the action value function select an action that will maximize the value of a future action.
上記の情報処理装置では、取得部が、シェアリング交通サービスにおける複数のポート間での車両の再配置に関する行動の価値を見積もる行動価値関数に入力される情報を取得する。当該情報の例は、発明の実施形態にて詳述するが、ここでの「取得」は、情報を入手することに加え、予め定められた数式等を用いて情報を算出することも広く含む。学習推論部は、上記の行動価値関数を、将来の行動の価値が最も高くなる行動を選択する行動価値関数にするために、取得部により取得された情報を用いて行動価値関数に対し強化学習を行う。これにより、強化学習によって、行動価値関数は、将来の行動の価値が最も高くなる行動を選択するような行動価値関数に学習されていく。 In the above information processing device, the acquisition unit acquires information to be input into an action value function that estimates the value of an action related to the reallocation of vehicles between multiple ports in a shared transportation service. Examples of the information will be described in detail in the embodiments of the invention, but "acquisition" here broadly includes not only obtaining information but also calculating information using a predetermined formula or the like. The learning and inference unit performs reinforcement learning on the action value function using the information acquired by the acquisition unit to turn the above action value function into an action value function that selects an action that will have the highest value for a future action. As a result, through reinforcement learning, the action value function learns to become an action value function that selects an action that will have the highest value for a future action.
このような行動価値関数に基づき複数のポート間での車両の再配置に関する行動の価値が見積もられ、将来の行動の価値が最も高くなる行動が選択されるよう制御されるため、シェアリング交通サービスにおける複数のポート間での車両の再配置において、強化学習における報酬を最大化するように車両の再配置の最適化を行うことができる。 The value of actions related to vehicle reallocation between multiple ports is estimated based on such an action value function, and the action that will have the highest value in the future is selected. Therefore, when reallocating vehicles between multiple ports in a shared transportation service, vehicle reallocation can be optimized to maximize the reward in reinforcement learning.
本開示によれば、シェアリング交通サービスにおける複数のポート間での車両の再配置において、強化学習における報酬を最大化するように車両の再配置の最適化を行うことができる。 According to the present disclosure, when reallocating vehicles between multiple ports in a shared transportation service, it is possible to optimize the reallocation of vehicles so as to maximize rewards in reinforcement learning.
以下、図面を参照しながら、シェアリング交通サービスにおける複数のポート間での車両の再配置の最適化を行う情報処理装置の一実施形態を説明する。 Below, we will explain one embodiment of an information processing device that optimizes vehicle reallocation between multiple ports in a shared transportation service, with reference to the drawings.
(情報処理装置の構成)
図1に示すように、情報処理装置10は、取得部11、および、学習推論部12を備える。このうち取得部11は、車両をシェアするシェアリング交通サービスにおける複数のポート間での車両の再配置に関する行動の価値を見積もる行動価値関数に入力される情報を取得する機能を備えた機能部である。学習推論部12は、将来の行動の価値が最も高くなる行動を選択する行動価値関数にするために、取得部11によって取得された情報を用いて行動価値関数に対し強化学習を行う機能を備えた機能部である。
(Configuration of information processing device)
As shown in Fig. 1, the
本実施形態におけるシェアリング交通サービスにおいてシェアされる車両は、バッテリ交換により再稼働可能とされ且つ車体に広告が掲載される車両であり、レンタル用電動自転車、レンタル用電動キックボード、レンタルバイク、レンタカーなどが挙げられる。以下では、上記のうち「レンタル用電動自転車」をシェアするサービスを例にして説明するが、「レンタル用電動自転車」は「自転車」と略称する。 In the shared transportation service of this embodiment, the vehicles shared are vehicles that can be restarted by replacing the battery and have advertisements placed on the body, and examples of such vehicles include rental electric bicycles, rental electric kick scooters, rental motorbikes, and rental cars. In the following, we will use as an example a service for sharing "rental electric bicycles," which will be abbreviated to "bicycles."
また、シェアリング交通サービスを提供するために、複数のポート(ここでは、シェアされる複数の自転車を停めるための駐輪場)が複数の場所に設けられており、各ポートには複数の自転車が停められている。サービス員は、自転車の再配置に関する行動として、自転車輸送用のトラック(以下「再配置トラック」という)を用いて、あるポートへの自転車の「配置」、あるポートからの自転車の「回収」、および、自転車に搭載されたバッテリの「バッテリ交換」を行う。 Furthermore, to provide a shared transportation service, multiple ports (here, bicycle parking areas for multiple shared bicycles) are provided in multiple locations, and multiple bicycles are parked at each port. Service staff use bicycle transport trucks (hereafter referred to as "relocation trucks") to "dispatch" bicycles to a port, "collect" bicycles from a port, and "replace" the batteries installed in the bicycles, as actions related to the relocation of bicycles.
上記のようなシェアリング交通サービスを想定した上で、学習推論部12は、行動の価値に含まれる「報酬」として、広告が視認されうる回数であるインプレッションを一要素としつつ、後述する算出式を用いて、(a)配置に係る配置評価値、(b)回収に係る回収評価値、および、(c)バッテリ交換に係るバッテリ交換評価値を各ポートについて算出し、全ポートについて得られた(a)配置評価値、(b)回収評価値、および、(c)バッテリ交換評価値に基づいて、強化学習を行う。
Assuming a shared transportation service as described above, the learning and
また、強化学習を行った後の「行動を推論する段階」では、学習推論部12は、取得部11により取得された現時点の状態情報を、強化学習により得られた行動価値関数に代入することで、現時点の状態で様々な行動のそれぞれを行った場合の行動価値を取得し、得られた行動価値が最大となる行動を、取るべき行動として選択する機能を有する。
In addition, in the "stage of inferring an action" after reinforcement learning, the
上記の強化学習に係る「学習段階の処理」、および、行動の推論に係る「推論段階の処理」は、図12~図14を用いて、後述する。 The "learning stage processing" related to the above reinforcement learning and the "inference stage processing" related to behavior inference will be described later using Figures 12 to 14.
(強化学習の概要と学習方法の補足的説明(図2~図6))
前述したように、機械学習の1つとして、ある環境におけるエージェントが、現在の状態を観測し、観測で得られた状態情報から、方策(policy)に基づいて将来取るべき行動を決定する問題を扱う強化学習(reinforcement learning)が知られている。上記エージェントは、決定された行動を実行することで環境から報酬を得るが、強化学習は、一連の行動を通じて報酬が最も多く得られるような方策を学習する。
(Supplementary explanation of reinforcement learning overview and learning method (Figures 2 to 6))
As mentioned above, reinforcement learning is known as one type of machine learning, which deals with the problem of an agent in an environment observing the current state and determining the future action to be taken based on a policy from the state information obtained from the observation. The agent obtains a reward from the environment by executing the determined action, and reinforcement learning learns a policy that will obtain the most reward through a series of actions.
出願人は、上記のような強化学習に着目し、以下では、強化学習の枠組みを利用して、シェアリング交通サービスにおける複数のポート間での車両の再配置において強化学習における報酬を最大化するように車両の再配置の最適化を行う技術を開示する。 The applicant focuses on the above-mentioned reinforcement learning, and below discloses a technology that utilizes a reinforcement learning framework to optimize vehicle reallocation between multiple ports in a shared transportation service so as to maximize the reward in reinforcement learning.
図2に示すように、強化学習における「状態(State)」は、エージェントが環境から得た情報であり、例えば、現在のポートの状態、需要予測結果、天候、距離、制約ポートなどの情報が例示される。「行動(Action)」は、エージェントが環境でとる行動であり、例えば、配置、回収、バッテリ交換、どこのポートに行くか、などが例示される。「報酬(Reward)」は、エージェントが環境で得た利益であり、ここでは、図4を用いて後述する「割引累積報酬Rt」が挙げられる。「方策(Policy)」は、状態から行動を返す関数πである。 As shown in FIG. 2, a "state" in reinforcement learning is information that an agent obtains from the environment, such as the current port state, demand forecast results, weather, distance, and restricted ports. An "action" is a behavior that an agent takes in the environment, such as placement, recovery, battery replacement, and which port to go to. A "reward" is a profit that an agent obtains in the environment, such as the "discounted cumulative reward R t ," which will be described later with reference to FIG. 4. A "policy" is a function π that returns an action from a state.
図2の下段に示すように、環境から得られた状態Stを、方策に係る関数πに入力すると、行動atが返される。そこで、状態Stにおいてエージェントが行動atを実施すると、状態St+1に至る。状態St+1に遷移する確率(状態遷移確率)は、T(St+1|St, at)と表記され、状態Stにおいてエージェントが行動atを実施した場合の報酬は報酬関数R(St, at)と表記される。 As shown in the lower part of Figure 2, when state S t obtained from the environment is input to function π related to the policy, action a t is returned. Therefore, when the agent performs action a t in state S t , it reaches state S t+1 . The probability of transitioning to state S t+1 (state transition probability) is expressed as T(S t+1 | S t , a t ), and the reward when the agent performs action a t in state S t is expressed as reward function R(S t , a t ).
図3に示すように、状態St、St+1の情報例としては、ポートの状態、ポートの需要、天候、トラックの状態、自転車の状態、ポートの位置、ポート間の距離などの情報が挙げられる。状態Stを、方策に係る関数πに入力することで、行動atとして、例えばエージェントとしての「トラックA」に関し「次に行くポート」として「ポートA」、「自転車の配置・回収数」として「-23」、「バッテリ交換数」として「10」といった行動が返される。なお、自転車の配置・回収数が正の値である場合は、その数だけ自転車を配置することを意味し、負の値である場合は、その数だけ自転車を回収することを意味するため、上記例は、次に行くポートAにおいて自転車を23台回収し、10台分のバッテリを交換する行動を意味する。状態Stにおいてエージェント(トラックA)が行動atを実施して状態St+1に至り、状態St+1を、方策に係る関数πに入力することで、行動at+1として、例えばエージェントとしての「トラックA」に関し「次に行くポート」として「ポートC」、「自転車の配置・回収数」として「10」、「バッテリ交換数」として「0」といった行動が返される。この行動は、次に行くポートCにおいて自転車を10台配置し、バッテリ交換は行わない(交換不要である)ことを意味する。 As shown in Fig. 3, examples of information for states S t and S t+1 include port status, port demand, weather, truck status, bicycle status, port location, and distance between ports. By inputting state S t into function π related to the policy, as action a t , for example, for "truck A" as an agent, actions such as "port A" as the "next port,""-23" as the number of bicycles placed/recovered, and "10" as the number of batteries exchanged are returned. Note that when the number of bicycles placed/recovered is a positive value, it means that the number of bicycles will be placed, and when it is a negative value, it means that the number of bicycles will be recovered, so the above example means that 23 bicycles will be recovered at the next port A and 10 batteries will be replaced. In state S t, an agent (truck A) performs action a t to reach state S t+1 , and by inputting state S t+1 into function π related to the policy, as action a t+1 , for example, for "truck A" as an agent, actions such as "port C" as the "next port,""10" as the number of bicycles placed/collected, and "0" as the number of batteries replaced are returned. This action means that 10 bicycles will be placed at the next port C, and no battery replacement will be performed (no replacement is required).
図4に示すように、本件では、強化学習における「報酬」は「割引累積報酬Rt」として以下の式(1)のように定義される。
1世代後の報酬R(St+1, at+1)は、γ1R(St+1, at+1)
2世代後の報酬R(St+2, at+2)は、γ2R(St+2, at+2)
3世代後の報酬R(St+3, at+3)は、γ3R(St+3, at+3)
となり、ここで、
割引累積報酬Rt
=γ1R(St+1, at+1)+γ1R(St+1, at+1)+γ1R(St+1, at+1)+γ1R(St+1, at+1)+…
と表され、図4の最下行にて破線で囲んだ部分はγRt+1に相当するため、上記式(1)が導かれる。
As shown in FIG. 4, in this case, the “reward” in reinforcement learning is defined as the “discounted cumulative reward Rt” as shown in the following equation (1).
The reward R(S t+1 , a t+1 ) after one generation is γ 1 R(S t+1 , a t+1 )
The reward R(S t+2 , a t+2 ) after two generations is γ 2 R(S t+2 , a t+2 )
The reward R(S t+3 , a t+3 ) after three generations is γ 3 R(S t+3 , a t+3 )
Here,
Discounted cumulative reward R t
=γ 1 R(S t+1 , a t+1 )+γ 1 R(S t+1 , a t+1 )+γ 1 R(S t+1 , a t+1 )+γ 1 R(S t+1 , a t+1 )+…
Since the part enclosed by the dashed line in the bottom row of FIG. 4 corresponds to γR t+1 , the above formula (1) is derived.
また、図5に示すように、強化学習では、よりよい方策を学習するために、状態の価値を見積もる状態価値関数、および、行動の価値を見積もる行動価値関数を定義する。ここで、「状態価値関数Vπ(St)」は、その方策πに従えば、その状態Stからスタートして将来どれだけの割引報酬を得られるかを表す関数である。「行動価値関数」は、状態Stから、ある行動をとった場合に将来どれだけの割引報酬を得られるかを表す関数であり、上記の状態価値関数の一部を構成する。
そこで、図6に示すように、見積もり値と経験値の誤差
(強化学習における報酬の算出例の説明(図7~図11))
前述したように、学習推論部12は、行動の価値に含まれる「報酬」として、広告が視認されうる回数であるインプレッションを一要素としつつ、後述する算出式を用いて、(a)配置に係る配置評価値、(b)回収に係る回収評価値、および、(c)バッテリ交換に係るバッテリ交換評価値を各ポートについて算出し、全ポートについて得られた(a)配置評価値、(b)回収評価値、および、(c)バッテリ交換評価値に基づいて、強化学習を行う。以下では、図7~図11を用いて、(a)配置評価値、(b)回収評価値、および、(c)バッテリ交換評価値の算出について説明する。
(Explanation of Reward Calculation Example in Reinforcement Learning (FIGS. 7 to 11))
As described above, the learning and
(a) As also shown in FIG. 7, the placement evaluation value is calculated by equation (5):

Placement evaluation value = (average number of uses when placed at the port × usage fee × one-time member ratio × constant α + average number of bicycle impressions × bicycle visibility probability × advertising unit price × constant β) × number of bicycles to be placed   (5)
(b) As also shown in FIG. 7, the recovery evaluation value is calculated by equation (6):

Recovery evaluation value = ((maximum average number of uses when placed at another port − average number of uses when placed at this port) × usage fee × one-time member ratio × constant α − (maximum average number of bicycle impressions at another port − average number of bicycle impressions) × bicycle visibility probability × advertising unit price × constant β) × number of bicycles to be recovered   (6)
(c) As also shown in FIG. 7, the battery exchange evaluation value is calculated by equation (7):

Battery exchange evaluation value = (average number of uses when placed at the port × usage fee × one-time member ratio × constant α) × number of bicycles whose batteries are exchanged   (7)
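As a concrete reading of equations (5) to (7), the following is a minimal Python sketch. Only the formula structure follows the text; every function and parameter name, and all sample numbers except the 165-yen fee cited below, are illustrative assumptions.

```python
# Illustrative computation of equations (5)-(7). Names and sample values
# are assumptions for this sketch; only the formula structure is from the text.

def placement_value(avg_uses, fee, one_time_ratio, avg_impressions,
                    visibility_prob, ad_unit_price, n_placed,
                    alpha=1.0, beta=1.0):
    """Equation (5): value of placing n_placed bicycles at a port."""
    per_bike = (avg_uses * fee * one_time_ratio * alpha
                + avg_impressions * visibility_prob * ad_unit_price * beta)
    return per_bike * n_placed

def recovery_value(avg_uses, max_avg_uses_elsewhere, fee, one_time_ratio,
                   avg_impressions, max_avg_impressions_elsewhere,
                   visibility_prob, ad_unit_price, n_recovered,
                   alpha=1.0, beta=1.0):
    """Equation (6): value of recovering n_recovered bicycles from a port."""
    per_bike = ((max_avg_uses_elsewhere - avg_uses) * fee * one_time_ratio * alpha
                - (max_avg_impressions_elsewhere - avg_impressions)
                * visibility_prob * ad_unit_price * beta)
    return per_bike * n_recovered

def battery_exchange_value(avg_uses, fee, one_time_ratio, n_exchanged, alpha=1.0):
    """Equation (7): value of exchanging batteries on n_exchanged bicycles."""
    return avg_uses * fee * one_time_ratio * alpha * n_exchanged

# Example with made-up numbers (only the 165-yen fee is cited in the text).
print(placement_value(avg_uses=3.2, fee=165, one_time_ratio=0.4,
                      avg_impressions=50, visibility_prob=0.1,
                      ad_unit_price=2.0, n_placed=10))
```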
Next, each element of equations (5) to (7) is outlined.
- The "average number of uses when placed at a port" is described later with reference to FIGS. 8 and 9.
- The "average number of bicycle impressions" is described later with reference to FIGS. 10 and 11.
- The "usage fee" is the fee for using the electric bicycle sharing service, for example 165 yen.
- The "one-time member ratio" is the proportion of one-time members among all members. Two types of members are assumed here: monthly members, who pay a fixed monthly amount, and one-time members, who pay the above usage fee each time they ride. The ratio is calculated as (number of one-time members) / (number of one-time members + number of monthly members).
- The "advertising unit price" is the price per impression of the dress guard advertisement displayed on the body of a shared bicycle.
- The "constants α and β" are importance-adjustment constants set in consideration of the profit obtained by placing bicycles at ports and the profit obtained from the dress guard advertisements; both default to 1.
- The "bicycle visibility probability" is the proportion of bicycle impressions for which the person is estimated to have actually viewed the advertisement.
- The "maximum average number of uses when placed at another port" is the maximum of the "average number of uses when placed at a port" described later.
- The "maximum average number of bicycle impressions at another port" is the maximum of the "average number of bicycle impressions" described later.
- The "number of bicycles to be placed", "number of bicycles to be recovered", and "number of bicycles whose batteries are exchanged" are information contained in the action a returned by inputting state S into the policy π.
As shown in FIG. 8, the "average number of uses when placed at a port" is obtained in two steps: counting, for each bicycle, how many times it was used between a certain time (the reference time) and the moment it was collected by a truck or had its battery exchanged (step 1), and then calculating the average number of uses per bicycle for each combination of weekday/holiday, time, port, and weather (step 2). FIG. 8 illustrates a situation in which, after the reference time of 12:00, bicycle A is ridden from port A to port B, then from port B to port C, then from port C to port D, then from port D to port E, and is finally collected by a truck at port E. In this situation, bicycle A's battery is not exchanged after the reference time of 12:00 and the bicycle is used four times before collection, so step 1 tallies "number of uses: 4" for bicycle A. This tally is made for every bicycle. In step 2, as in the table at the lower right of FIG. 8, the average number of uses is calculated for each combination of weekday/holiday, time, port, and weather. Here, the weather is classified, for example, as "clear" when the hourly precipitation is 0 mm and as "rain" when it exceeds 0 mm, and the "port" in the tally table is the port where the bicycle was first used after the reference time.
More specifically, as shown in FIG. 9, the acquisition unit 11 extracts usage history information for each bicycle from the usage table that stores bicycle usage histories, and determines the usage start date and time (reference time) and the number of uses for the target bicycle A from its usage history information, as in FIG. 8. The acquisition unit 11 then determines the weather at that point by referring to weather information for the usage start time, and calculates the average number of uses per bicycle for each combination of weekday/holiday, time, port, and weather, as in the table at the lower right of FIG. 9.
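The following is a sketch of steps 1 and 2 of FIGS. 8 and 9 using pandas. The column names ("bike_id", "start_time", "start_port", "episode_id") are assumptions, since the embodiment does not spell out the usage-table schema; "episode_id" here stands for the span between the reference time and the next truck collection or battery exchange, and the weekend-equals-holiday rule is a simplification.

```python
# Sketch of FIG. 8/9, steps 1-2, under assumed column names.
import pandas as pd

def avg_uses_table(trips: pd.DataFrame, rain_mm: pd.Series) -> pd.DataFrame:
    """trips: one row per ride; rain_mm: hourly precipitation indexed by hour."""
    t = trips.sort_values("start_time").copy()
    # Step 1: count rides per bicycle between the reference time and the
    # next truck collection / battery exchange (marked by "episode_id").
    counts = (t.groupby(["bike_id", "episode_id"])
                .agg(uses=("start_time", "size"),
                     start=("start_time", "min"),
                     port=("start_port", "first"))  # first port used after the reference time
                .reset_index())
    # Step 2: average by weekday/holiday, hour, port, and weather.
    counts["day_type"] = counts["start"].dt.dayofweek.map(
        lambda d: "holiday" if d >= 5 else "weekday")  # simplification: weekend = holiday
    counts["hour"] = counts["start"].dt.hour
    counts["weather"] = counts["hour"].map(
        lambda h: "rain" if rain_mm.get(h, 0) > 0 else "clear")
    return (counts.groupby(["day_type", "hour", "port", "weather"])["uses"]
                  .mean().rename("avg_uses").reset_index())
```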
As shown in FIG. 10, the "average number of bicycle impressions" is obtained by tallying, for each bicycle, the number of times its advertisement could be viewed between a certain time (the reference time) and the moment the bicycle was collected by a truck or had its battery exchanged, in the manner described below (step 1), and then calculating the average number of impressions per bicycle for each combination of weekday/holiday, time, port, and weather (step 2). The weather is determined in the same way as in the example of FIG. 8 described above.
More specifically, as shown in FIG. 11, the acquisition unit 11 extracts usage history information for each bicycle from the usage table that stores bicycle usage histories. The acquisition unit 11 also uses bicycle position information and people's position information to extract the people present within a fixed range of the position of the target bicycle (bicycle A in FIG. 11), and takes the tallied number of such people as the impression count (the number of times the advertisement could be viewed) for bicycle A. Such an impression count is obtained for each usage history entry and merged with the per-bicycle usage history information extracted above, yielding an impression count per usage start date and time (reference time) for each bicycle. The acquisition unit 11 then determines the weather at the usage start date and time (reference time) by referring to weather information, and calculates the average number of impressions per bicycle for each combination of weekday/holiday, time, port, and weather, as in the table at the lower right of FIG. 11.
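A sketch of the per-bicycle impression count is shown below: people within a fixed range of a bicycle's position are counted as potential ad viewers. The 100 m radius, the planar x/y coordinates, and all column names are assumptions for illustration; the embodiment does not specify them. The resulting counts would then be merged with the per-bicycle usage history and averaged by day type, hour, port, and weather exactly as in the previous sketch.

```python
# Sketch of the impression count in FIG. 10/11 under assumed inputs.
import numpy as np
import pandas as pd

RADIUS_M = 100.0  # assumed "fixed range" around the bicycle

def impressions(bikes: pd.DataFrame, people: pd.DataFrame) -> pd.DataFrame:
    """bikes/people: rows with x/y positions in metres on a local plane."""
    rows = []
    for b in bikes.itertuples():
        # Distance from this bicycle to every person; count those in range.
        d = np.hypot(people["x"] - b.x, people["y"] - b.y)
        rows.append({"bike_id": b.bike_id,
                     "impressions": int((d <= RADIUS_M).sum())})
    return pd.DataFrame(rows)
```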
(Explanation of the learning stage processing (FIGS. 12 to 13))
The "learning stage processing" and the "inference stage processing" executed in the information processing device 10 are described below in order.
FIG. 12 shows the process flow of the learning stage. First, for each bicycle, the acquisition unit 11 extracts from the bicycle usage tables of FIGS. 9 and 11 the time from a certain time until collection and the time from a certain time until battery exchange (step S1). The "certain time" may be selected, for example, from a plurality of predetermined reference time candidates. Next, the acquisition unit 11 calculates the average number of uses when placed at a port, following the procedure described above with reference to FIGS. 8 and 9 (step S2). This yields the average number of uses per bicycle for each combination of weekday/holiday, time, port, and weather, as in the table at the lower right of FIG. 9. Furthermore, the acquisition unit 11 calculates the average number of impressions, following the procedure described above with reference to FIGS. 10 and 11 (step S3). This yields the average number of impressions per bicycle for each combination of weekday/holiday, time, port, and weather, as in the table at the lower right of FIG. 11.
Next, the learning and inference unit 12 executes the accumulation processing of the information (St, at, R(St, at), St+1) shown in FIG. 13 (step S4). First, the learning and inference unit 12 obtains the state St acquired by the acquisition unit 11 (step S4A in FIG. 13) and inputs the obtained state St into the policy π to obtain an action at (step S4B). In practice, a plurality of possible actions a1, a2, …, an are obtained at this point. The learning and inference unit 12 then uses the formula for the discounted cumulative reward Rt to calculate the discounted cumulative reward R(St, at) for executing each of the actions a1, a2, …, an from state St (step S4C). Concretely, the "placement evaluation value", "recovery evaluation value", and "battery exchange evaluation value" explained with reference to FIGS. 7 to 11 are calculated for each bicycle, and the discounted cumulative reward R(St, at) is calculated from these three evaluation values over all bicycles. The learning and inference unit 12 then determines the action ak for which the calculated discounted cumulative reward is maximized (step S4D); that is, in step S4D, the action ak yielding the largest discounted cumulative reward is selected from among the possible actions a1, a2, …, an, and the corresponding information (St, ak, R(St, ak), St+1) is accumulated.

Returning to FIG. 12, the learning and inference unit 12 performs reinforcement learning on the action value function using the information (St, at, R(St, at), St+1) accumulated in step S4 (step S5).
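The accumulation step can be sketched as follows. The interfaces `policy`, `step_env`, and `discounted_reward` are names invented for this illustration; the embodiment does not define such an API.

```python
# Sketch of the accumulation step S4 in FIG. 13 under assumed interfaces.

def accumulate_transition(policy, step_env, discounted_reward, s_t, buffer):
    """Run steps S4A-S4D once and append (St, ak, R(St, ak), St+1) to buffer."""
    candidates = policy(s_t)                                      # S4B: a1, a2, ..., an
    returns = {a: discounted_reward(s_t, a) for a in candidates}  # S4C
    a_k = max(returns, key=returns.get)                           # S4D: maximize R(St, at)
    s_next = step_env(s_t, a_k)                                   # transition to St+1
    buffer.append((s_t, a_k, returns[a_k], s_next))
    return s_next
```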
(Explanation of the inference stage processing (FIG. 14))
FIG. 14 shows the process flow of the inference stage. First, the acquisition unit 11 acquires the state information St, as in the learning stage processing described above (step S11), and the learning and inference unit 12 calculates the action value of each action at using the action value function Q(St, at) obtained through learning (step S12). The learning and inference unit 12 then selects the action at with the maximum action value (step S13). In this way, the action with the highest future action value is selected.
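Steps S12 and S13 amount to an argmax over the learned action value function, as in the following sketch; `q_function` and `candidate_actions` are assumed interfaces, not names from the embodiment.

```python
# Sketch of the inference stage (FIG. 14) under assumed interfaces.

def select_action(q_function, candidate_actions, s_t):
    values = {a: q_function(s_t, a) for a in candidate_actions}  # S12
    return max(values, key=values.get)                           # S13: argmax of Q(St, a)
```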
(Effects of the above embodiment)
The effects of the above embodiment are described below.
Through the reinforcement learning described above, the action value function is trained into one that selects the action with the highest future action value. Because the value of actions relating to vehicle reallocation among multiple ports is estimated on the basis of this action value function, and control is exercised so that the action with the highest future value is selected, vehicle reallocation among multiple ports in a shared transportation service can be optimized so as to maximize the reward in reinforcement learning.
Specifically, "placement", "recovery", and "battery exchange" are assumed as the actions relating to the reallocation of the bicycles shared in the shared transportation service, and appropriate reinforcement learning is performed on the basis of evaluation values covering these multiple aspects, so vehicle reallocation is optimized more appropriately and the reward can be maximized.
The bicycles shared in the shared transportation service carry advertisements on their bodies, and the learning and inference unit 12 calculates the placement evaluation value and the recovery evaluation value taking into account impressions, the number of times an advertisement can be viewed, so the reward from advertising can be maximized.
Although rental electric bicycles were given as the example of vehicles shared in the shared transportation service, any vehicle that can be returned to service by battery exchange and that carries advertisements on its body may be used, and the disclosure is also applicable to other types of vehicles such as rental electric kick scooters, rental motorcycles, and rental cars. The present disclosure is thus applicable to a wide range of vehicles, including the rental electric kick scooters that have become increasingly popular in recent years, and has the potential to be widely used.
In the stage of inferring an action, executing the processing of FIG. 14 allows the action with the maximum action value to be selected as the action to take, and consequently the reward (profit) of the service can be maximized.
The above embodiment is one example of the present disclosure, and the types of information, the content of the reward, the content of the processing, and so on are not limited to those described above.
The gist of the present disclosure lies in the following [1] to [5].
[1] An information processing device comprising: an acquisition unit that acquires information to be input into an action value function that estimates the value of an action relating to the reallocation of vehicles among a plurality of ports in a sharing transportation service in which vehicles are shared; and a learning and inference unit that performs reinforcement learning on the action value function, using the information acquired by the acquisition unit, so that the action value function selects the action with the highest future action value.
[2] The information processing device according to [1], wherein the vehicles shared in the sharing transportation service are vehicles that can be reused by exchanging an installed battery; the actions relating to vehicle reallocation include placing a vehicle at a port, recovering a vehicle from a port, and exchanging a battery installed in a vehicle; and the learning and inference unit uses predetermined calculation formulas to calculate, for each port, a placement evaluation value for the placement, a recovery evaluation value for the recovery, and a battery exchange evaluation value for the battery exchange as the reward included in the value of the action, and performs the reinforcement learning based on the placement evaluation values, recovery evaluation values, and battery exchange evaluation values obtained for all ports.
[3] The information processing device according to [2], wherein the vehicles shared in the sharing transportation service are vehicles on which advertisements are displayed, and the learning and inference unit calculates the placement evaluation value and the recovery evaluation value using impressions, the number of times an advertisement can be viewed, as one element.
[4] The information processing device according to [3], wherein the vehicles shared in the sharing transportation service are rental electric bicycles, rental electric kick scooters, rental motorcycles, and rental cars that can be returned to service by battery exchange and that carry advertisements on their bodies.
[5] The information processing device according to any one of [1] to [4], wherein, in the stage of inferring an action, the learning and inference unit substitutes the current state information acquired by the acquisition unit into the action value function obtained by the reinforcement learning, thereby obtaining the action value of performing each of various actions in the current state, and selects the action with the maximum obtained action value as the action to take.
(Explanation of terms, hardware configuration (FIG. 15), etc.)
The block diagrams used in the description of the above embodiment show blocks in functional units. These functional blocks (components) are realized by any combination of at least one of hardware and software, and the method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one device that is physically or logically coupled, or using two or more physically or logically separate devices connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be realized by combining the one device or the plurality of devices with software.
Functions include, but are not limited to, judging, determining, deciding, calculating, computing, processing, deriving, investigating, looking up (searching, inquiring), ascertaining, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In either case, as described above, the method of realization is not particularly limited.
For example, the information processing device according to an embodiment of the present disclosure may function as a computer that executes the processing of the present disclosure. FIG. 15 is a diagram showing an example of the hardware configuration of the information processing device 10 according to an embodiment of the present disclosure. The information processing device 10 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and so on.
In the following description, the word "device" can be read as a circuit, a unit, or the like. The hardware configuration of the information processing device 10 may include one or more of each of the devices shown in the figure, or may be configured without some of the devices.
Each function of the information processing device 10 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, whereby the processor 1001 performs computation and controls communication by the communication device 1004 and at least one of the reading and writing of data in the memory 1002 and the storage 1003.
The processor 1001, for example, operates an operating system to control the entire computer. The processor 1001 may be configured as a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.
The processor 1001 also reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002 and executes various kinds of processing in accordance with them. A program that causes a computer to execute at least part of the operations described in the above embodiment is used. Although the various kinds of processing have been described as being executed by a single processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented by one or more chips. The program may be transmitted from a network via an electric communication line.
The memory 1002 is a computer-readable recording medium and may be configured, for example, by at least one of a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a RAM (Random Access Memory), and the like. The memory 1002 may be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store executable programs (program code), software modules, and the like for implementing a method according to an embodiment of the present disclosure.

The storage 1003 is a computer-readable recording medium and may be configured, for example, by at least one of an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy (registered trademark) disk, a magnetic strip, and the like. The storage 1003 may be called an auxiliary storage device. The above-mentioned storage medium may be, for example, a database, a server, or another appropriate medium including at least one of the memory 1002 and the storage 1003.

The communication device 1004 is hardware (a transmitting/receiving device) for communication between computers via at least one of a wired network and a wireless network, and is also called, for example, a network device, a network controller, a network card, or a communication module. The communication device 1004 may be configured to include a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize, for example, at least one of frequency division duplex (FDD) and time division duplex (TDD).
The input device 1005 is an input device that accepts input from the outside (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor). The output device 1006 is an output device that performs output to the outside (for example, a display, a speaker, or an LED lamp). The input device 1005 and the output device 1006 may be integrated (for example, as a touch panel).
The devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information. The bus 1007 may be configured using a single bus or using different buses between each pair of devices.
The information processing device 10 may also be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by such hardware. For example, the processor 1001 may be implemented using at least one of these kinds of hardware.
The notification of information is not limited to the aspects/embodiments described in the present disclosure and may be performed by other methods. For example, the notification of information may be performed by physical layer signaling (for example, DCI (Downlink Control Information), UCI (Uplink Control Information)), higher layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination of these. RRC signaling may be called an RRC message, and may be, for example, an RRC Connection Setup message or an RRC Connection Reconfiguration message.

Each aspect/embodiment described in the present disclosure may be applied to at least one of systems using LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), 6G (6th generation mobile communication system), xG (xth generation mobile communication system, where x is, for example, an integer or a decimal), FRA (Future Radio Access), NR (New Radio), New radio access (NX), Future generation radio access (FX), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), and other appropriate systems, as well as next-generation systems extended, modified, created, or defined based on these. A plurality of systems may also be combined and applied (for example, a combination of at least one of LTE and LTE-A with 5G).
The processing procedures, sequences, flowcharts, and the like of each aspect/embodiment described in the present disclosure may be reordered as long as no contradiction arises. For example, the methods described in the present disclosure present elements of various steps using an exemplary order and are not limited to the particular order presented.

Input and output information and the like may be stored in a specific location (for example, a memory) or may be managed using a management table. Input and output information and the like may be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.

A determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a numerical comparison (for example, a comparison with a predetermined value).

Each aspect/embodiment described in the present disclosure may be used alone, in combination, or switched in accordance with execution. Notification of predetermined information (for example, notification of "being X") is not limited to being performed explicitly and may be performed implicitly (for example, by not notifying the predetermined information).
Although the present disclosure has been described in detail above, it is clear to those skilled in the art that the present disclosure is not limited to the embodiments described herein. The present disclosure can be implemented in modified and altered forms without departing from the spirit and scope of the present disclosure as defined by the claims. Accordingly, the description of the present disclosure is intended to be illustrative and has no limiting meaning with respect to the present disclosure.

Software should be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like, whether referred to as software, firmware, middleware, microcode, hardware description language, or by another name.

Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, if software is transmitted from a website, a server, or another remote source using at least one of wired technologies (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technologies (infrared, microwave, etc.), then at least one of these wired and wireless technologies is included within the definition of a transmission medium.

The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.

The terms described in the present disclosure and the terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings. For example, at least one of a channel and a symbol may be a signal (signaling). A signal may be a message. A component carrier (CC) may be called a carrier frequency, a cell, a frequency carrier, or the like.

The terms "system" and "network" used in the present disclosure are used interchangeably.

The information, parameters, and the like described in the present disclosure may be represented using absolute values, using relative values from a predetermined value, or using other corresponding information. For example, a radio resource may be indicated by an index.

The names used for the above-described parameters are not limiting in any respect. Furthermore, formulas and the like using these parameters may differ from those explicitly disclosed in the present disclosure. Since various channels (for example, PUCCH and PDCCH) and information elements can be identified by any suitable names, the various names assigned to these various channels and information elements are not limiting in any respect.

The terms "determining" and "deciding" used in the present disclosure may encompass a wide variety of operations. "Determining" and "deciding" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up (searching, inquiring) (for example, looking up in a table, a database, or another data structure), or ascertaining as "determining" or "deciding". "Determining" and "deciding" may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory) as "determining" or "deciding". "Determining" and "deciding" may further include regarding resolving, selecting, choosing, establishing, comparing, and the like as "determining" or "deciding". That is, "determining" and "deciding" may include regarding some operation as "determining" or "deciding". "Determining (deciding)" may also be read as "assuming", "expecting", "considering", and the like.

The phrase "based on" used in the present disclosure does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".

Any reference to elements using designations such as "first" and "second" used in the present disclosure does not generally limit the quantity or order of those elements. These designations may be used in the present disclosure as a convenient way of distinguishing between two or more elements. Accordingly, references to first and second elements do not mean that only two elements may be employed or that the first element must precede the second element in some way.

Where "include", "including", and variations thereof are used in the present disclosure, these terms are intended to be inclusive in the same way as the term "comprising". Furthermore, the term "or" used in the present disclosure is not intended to be an exclusive OR.

Where articles such as a, an, and the in English are added by translation in the present disclosure, the present disclosure may include the case where a noun following these articles is plural.

In the present disclosure, the phrase "A and B are different" may mean "A and B are different from each other". The phrase may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may be interpreted in the same way as "different".
10: Information processing device, 11: Acquisition unit, 12: Learning and inference unit, 1001: Processor, 1002: Memory, 1003: Storage, 1004: Communication device, 1005: Input device, 1006: Output device, 1007: Bus.
Claims (5)
1. An information processing device comprising:
an acquisition unit that acquires information to be input into an action value function that estimates the value of an action relating to the reallocation of vehicles among a plurality of ports in a sharing transportation service in which vehicles are shared; and
a learning and inference unit that performs reinforcement learning on the action value function, using the information acquired by the acquisition unit, so that the action value function selects the action with the highest future action value.
2. The information processing device according to claim 1, wherein
the vehicles shared in the sharing transportation service are vehicles that can be reused by exchanging an installed battery,
the actions relating to vehicle reallocation include placing a vehicle at a port, recovering a vehicle from a port, and exchanging a battery installed in a vehicle, and
the learning and inference unit uses predetermined calculation formulas to calculate, for each port, a placement evaluation value for the placement, a recovery evaluation value for the recovery, and a battery exchange evaluation value for the battery exchange as the reward included in the value of the action, and performs the reinforcement learning based on the placement evaluation values, recovery evaluation values, and battery exchange evaluation values obtained for all ports.
3. The information processing device according to claim 2, wherein
the vehicles shared in the sharing transportation service are vehicles on which advertisements are displayed, and
the learning and inference unit calculates the placement evaluation value and the recovery evaluation value using impressions, the number of times an advertisement can be viewed, as one element.
4. The information processing device according to claim 3, wherein the vehicles shared in the sharing transportation service are rental electric bicycles, rental electric kick scooters, rental motorcycles, and rental cars that can be returned to service by battery exchange and that carry advertisements on their bodies.
5. The information processing device according to claim 1, wherein, in the stage of inferring an action, the learning and inference unit substitutes the current state information acquired by the acquisition unit into the action value function obtained by the reinforcement learning, thereby obtaining the action value of performing each of various actions in the current state, and selects the action with the maximum obtained action value as the action to take.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2025506474A (JPWO2024189997A1) | 2023-03-15 | 2023-11-30 | |

Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023-041118 | 2023-03-15 | | |
| JP2023041118 | 2023-03-15 | | |

Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024189997A1 (en) | 2024-09-19 |
Family
ID=92754669
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2023/042953 (WO2024189997A1, pending) | Information processing device | 2023-03-15 | 2023-11-30 |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JPWO2024189997A1 (en) |
| WO (1) | WO2024189997A1 (en) |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021060921A (en) * | 2019-10-09 | 2021-04-15 | 株式会社豊田中央研究所 | Recommended vehicle dispatch system and recommended vehicle dispatch program |
| WO2021176632A1 (en) * | 2020-03-05 | 2021-09-10 | 日本電信電話株式会社 | Optimization function generation device, optimization function generation method, and program |
| WO2022006873A1 (en) * | 2020-07-10 | 2022-01-13 | Beijing Didi Infinity Technology And Development Co., Ltd. | Vehicle repositioning on mobility-on-demand platforms |
Non-Patent Citations (1)
| Title |
|---|
| NAKAI, ETSUJI: "Introduction to reinforcement learning theory for IT engineers", Gijutsu-Hyoron Co., Ltd., 17 July 2020 (2020-07-17), XP093210282 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2024189997A1 (en) | 2024-09-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9285240B2 (en) | EV route optimization through crowdsourcing | |
| CN111523968B (en) | Method and equipment for spelling bill | |
| JPWO2015012144A1 (en) | Battery secondary usage management system, battery secondary usage management device, and battery secondary usage management method | |
| JP7542540B2 (en) | Demand forecasting device | |
| WO2018207878A1 (en) | Demand forecast device | |
| CN114691969A (en) | Processing method and device for recommended swapping station, electronic equipment and storage medium | |
| CN111629390B (en) | Network slice orchestration method and device | |
| CN118396723B (en) | User carwash service information pushing method, system, equipment and storage medium | |
| JP2019159663A (en) | Information processing system, information processing method, and information processing program | |
| WO2024189997A1 (en) | Information processing device | |
| JP7499619B2 (en) | Information processing device | |
| JP7083288B2 (en) | Contribution estimation device | |
| CN114358855B (en) | Charging pricing method, server, system, equipment and medium | |
| JP7397738B2 (en) | Aggregation device | |
| JP7499683B2 (en) | Information processing device | |
| CN111339468B (en) | Information pushing method, device, electronic equipment and storage medium | |
| JP2020064394A (en) | Economic index calculation device, economic index calculation method, and economic index calculation program | |
| CN111049892B (en) | Data processing method and device of sensing terminal | |
| CN117812624A (en) | Base station cluster complaint prediction method and device, electronic equipment and storage medium | |
| JP7629806B2 (en) | Prediction model construction device and demand forecasting device | |
| JP7478847B2 (en) | Simulation Equipment | |
| JP7679250B2 (en) | Information processing device | |
| JP7519466B2 (en) | Simulation Equipment | |
| JP2021185463A (en) | Demand prediction device | |
| JP7547644B2 (en) | Service demand potential forecasting device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23927602; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2025506474; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |